A Constant Time Algorithm for Estimating the Diversity of Large Chemical Libraries. Agrafiotis, D. K. J. Chem. Inf. Comput. Sci., 41:156-167, 2001.
We describe a novel diversity metric for use in the design of combinatorial chemistry and high-throughput screening experiments. The method estimates the cumulative probability distribution of intermolecular dissimilarities in the collection of interest and then measures the deviation of that distribution from the respective distribution of a uniform sample using the Kolmogorov-Smirnov statistic. The distinct advantage of this approach is that the cumulative distribution can be easily estimated using probability sampling and does not require exhaustive enumeration of all pairwise distances in the data set. The function is intuitive, very fast to compute, does not depend on the size of the collection, and can be used to perform diversity estimates on both global and local scale. More importantly, it allows meaningful comparison of data sets of different cardinality and is not affected by the curse of dimensionality, which plagues many other diversity indices. The advantages of this approach are demonstrated using examples from the combinatorial chemistry literature.
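The sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the dissimilarity measure, the reference distribution, or the sampling scheme, so the sketch assumes Euclidean distances, a reference set drawn uniformly from the unit hypercube, and simple random pair sampling. The function names `sample_pair_distances`, `ks_statistic`, and `diversity_deviation`, and the defaults `n_pairs` and `n_reference`, are hypothetical.

```python
import bisect
import math
import random


def sample_pair_distances(points, n_pairs, rng):
    """Euclidean distances for n_pairs randomly drawn pairs of distinct points."""
    dists = []
    for _ in range(n_pairs):
        # Drawing two indices costs O(1), independent of len(points).
        i, j = rng.sample(range(len(points)), 2)
        dists.append(math.dist(points[i], points[j]))
    return dists


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def diversity_deviation(points, n_pairs=2000, n_reference=1000, seed=0):
    """KS deviation of sampled distances from a uniform reference.

    Smaller values mean the distance distribution is closer to that of
    the reference sample.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Assumption: the reference is uniform on the unit hypercube [0, 1]^dim.
    reference = [[rng.random() for _ in range(dim)] for _ in range(n_reference)]
    observed = sample_pair_distances(points, n_pairs, rng)
    expected = sample_pair_distances(reference, n_pairs, rng)
    return ks_statistic(observed, expected)
```

Because the number of sampled pairs is fixed by `n_pairs`, the cost of the estimate does not grow with the size of the collection, which is the property the title refers to. Under these assumptions, a tightly clustered set of points should give a value near 1, while a set spread uniformly over the unit hypercube should give a value near 0.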
@article{Agrafiotis:2001aa,
	Abstract = { We describe a novel diversity metric for use in the design of combinatorial
	chemistry and high-throughput screening experiments. The method estimates
	the cumulative probability distribution of intermolecular dissimilarities
	in the collection of interest and then measures the deviation of
	that distribution from the respective distribution of a uniform sample
	using the Kolmogorov-Smirnov statistic. The distinct advantage of
	this approach is that the cumulative distribution can be easily estimated
	using probability sampling and does not require exhaustive enumeration
	of all pairwise distances in the data set. The function is intuitive,
	very fast to compute, does not depend on the size of the collection,
	and can be used to perform diversity estimates on both global and
	local scale. More importantly, it allows meaningful comparison of
	data sets of different cardinality and is not affected by the curse
	of dimensionality, which plagues many other diversity indices. The
	advantages of this approach are demonstrated using examples from
	the combinatorial chemistry literature. },
	Author = {Agrafiotis, D. K.},
	Date-Added = {2007-12-11 17:01:03 -0500},
	Date-Modified = {2009-04-27 17:32:57 -0400},
	Doi = {10.1021/ci000091j},
	Journal = {J.~Chem.~Inf.~Comput.~Sci.},
	Keywords = {diversity; ks; kolmogorov; smirnov; distribution},
	Pages = {156--167},
	Title = {A Constant Time Algorithm for Estimating the Diversity of Large Chemical Libraries},
	Volume = {41},
	Year = {2001},
}
