Deep Neural Nets as a Method for Quantitative Structure–Activity Relationships. Ma, J.; Sheridan, R. P.; Liaw, A.; Dahl, G. E.; and Svetnik, V. Journal of Chemical Information and Modeling, 55(2):263--274, February, 2015.
@article{ ma_deep_2015,
  title = {Deep {Neural} {Nets} as a {Method} for {Quantitative} {Structure}–{Activity} {Relationships}},
  volume = {55},
  issn = {1549-9596},
  url = {https://doi.org/10.1021/ci500747n},
  doi = {10.1021/ci500747n},
  abstract = {Neural networks were widely used for quantitative structure–activity relationships (QSAR) in the 1990s. Because of various practical issues (e.g., slow on large problems, difficult to train, prone to overfitting, etc.), they were superseded by more robust methods like support vector machine (SVM) and random forest (RF), which arose in the early 2000s. The last 10 years has witnessed a revival of neural networks in the machine learning community thanks to new methods for preventing overfitting, more efficient training algorithms, and advancements in computer hardware. In particular, deep neural nets (DNNs), i.e. neural nets with more than one hidden layer, have found great successes in many applications, such as computer vision and natural language processing. Here we show that DNNs can routinely make better prospective predictions than RF on a set of large diverse QSAR data sets that are taken from Merck's drug discovery effort. The number of adjustable parameters needed for DNNs is fairly large, but our results show that it is not necessary to optimize them for individual data sets, and a single set of recommended parameters can achieve better performance than RF for most of the data sets we studied. The usefulness of the parameters is demonstrated on additional data sets not used in the calibration. Although training DNNs is still computationally intensive, using graphical processing units (GPUs) can make this issue manageable.},
  number = {2},
  urldate = {2015-06-17},
  journal = {Journal of Chemical Information and Modeling},
  author = {Ma, Junshui and Sheridan, Robert P. and Liaw, Andy and Dahl, George E. and Svetnik, Vladimir},
  month = {February},
  year = {2015},
  pages = {263--274}
}