Less may be more: an informed reflection on molecular descriptors for drug design and discovery. Barnard, T., Hagan, H., Tseng, S., & Sosso, G. C. Molecular Systems Design & Engineering, 5(1):317–329, 2020.
The phenomenal advances of machine learning in the context of drug design have led to the development of a plethora of molecular descriptors. And yet, there might be value in using just a handful of them – inspired by our physical intuition.
The phenomenal advances of machine learning in the context of drug design and discovery have led to the development of a plethora of molecular descriptors. In fact, many of these “standard” descriptors are now readily available via open source, easy-to-use computational tools. As a result, it is not uncommon to take advantage of large numbers – up to thousands in some cases – of these descriptors to predict the functional properties of drug-like molecules. This “strength in numbers” approach does usually provide excellent flexibility – and thus, good numerical accuracy – to the machine learning framework of choice; however, it suffers from a lack of transparency, in that it becomes very challenging to pinpoint the – usually, few – descriptors that are playing a key role in determining the functional properties of a given molecule. In this work, we show that just a handful of well-tailored molecular descriptors may often be capable to predict the functional properties of drug-like molecules with an accuracy comparable to that obtained by using hundreds of standard descriptors. In particular, we apply feature selection and genetic algorithms to in-house descriptors we have developed building on junction trees and symmetry functions, respectively. We find that information from as few as 10–20 molecular fragments is often enough to predict with decent accuracy even complex biomedical activities. In addition, we demonstrate that the usage of small sets of optimised symmetry functions may pave the way towards the prediction of the physical properties of drugs in their solid phases – a pivotal challenge for the pharmaceutical industry. Thus, this work brings strong arguments in support of the usage of small numbers of selected descriptors to discover the structure–function relation of drug-like molecules – as opposed to blindly leveraging the flexibility of the thousands of molecular descriptors currently available.
@article{barnard_less_2020,
	title = {Less may be more: an informed reflection on molecular descriptors for drug design and discovery},
	volume = {5},
	issn = {2058-9689},
	shorttitle = {Less may be more},
	url = {http://xlink.rsc.org/?DOI=C9ME00109C},
	doi = {10.1039/C9ME00109C},
	abstract = {The phenomenal advances of machine learning in the context of drug design have led to the development of a plethora of molecular descriptors. And yet, there might be value in using just a handful of them – inspired by our physical intuition. The phenomenal advances of machine learning in the context of drug design and discovery have led to the development of a plethora of molecular descriptors. In fact, many of these “standard” descriptors are now readily available via open source, easy-to-use computational tools. As a result, it is not uncommon to take advantage of large numbers – up to thousands in some cases – of these descriptors to predict the functional properties of drug-like molecules. This “strength in numbers” approach does usually provide excellent flexibility – and thus, good numerical accuracy – to the machine learning framework of choice; however, it suffers from a lack of transparency, in that it becomes very challenging to pinpoint the – usually, few – descriptors that are playing a key role in determining the functional properties of a given molecule. In this work, we show that just a handful of well-tailored molecular descriptors may often be capable to predict the functional properties of drug-like molecules with an accuracy comparable to that obtained by using hundreds of standard descriptors. In particular, we apply feature selection and genetic algorithms to in-house descriptors we have developed building on junction trees and symmetry functions, respectively. We find that information from as few as 10–20 molecular fragments is often enough to predict with decent accuracy even complex biomedical activities. In addition, we demonstrate that the usage of small sets of optimised symmetry functions may pave the way towards the prediction of the physical properties of drugs in their solid phases – a pivotal challenge for the pharmaceutical industry. Thus, this work brings strong arguments in support of the usage of small numbers of selected descriptors to discover the structure–function relation of drug-like molecules – as opposed to blindly leveraging the flexibility of the thousands of molecular descriptors currently available.},
	language = {en},
	number = {1},
	urldate = {2020-04-08},
	journal = {Molecular Systems Design \& Engineering},
	author = {Barnard, Trent and Hagan, Harry and Tseng, Steven and Sosso, Gabriele C.},
	year = {2020},
	pages = {317--329},
}
