Denisko, D. and Hoffman, M. M. Classification and Interaction in Random Forests. Proceedings of the National Academy of Sciences, 115(8):1690–1692, 2018.
@article{deniskoClassificationInteractionRandom2018,
  title = {Classification and Interaction in Random Forests},
  author = {Denisko, Danielle and Hoffman, Michael M.},
  date = {2018-02},
  journaltitle = {Proceedings of the National Academy of Sciences},
  volume = {115},
  pages = {1690--1692},
  issn = {0027-8424},
  doi = {10.1073/pnas.1800256115},
  url = {https://doi.org/10.1073/pnas.1800256115},
  abstract = {Suppose you are a physician with a patient whose complaint could arise from multiple diseases. To attain a specific diagnosis, you might ask yourself a series of yes/no questions depending on observed features describing the patient, such as clinical test results and reported symptoms. As some questions rule out certain diagnoses early on, each answer determines which question you ask next. With about a dozen features and extensive medical knowledge, you could create a simple flow chart to connect and order these questions. If you had observations of thousands of features instead, you would probably want to automate. Machine learning methods can learn which questions to ask about these features to classify the entity they describe. Even when we lack prior knowledge, a classifier can tell us which features are most important and how they relate to, or interact with, each other. Identifying interactions with large numbers of features poses a special challenge. In PNAS, Basu et al. (1) address this problem with a new classifier based on the widely used random forest technique. The new method, an iterative random forest algorithm (iRF), increases the robustness of random forest classifiers and provides a valuable new way to identify important feature interactions.

Random forests came into the spotlight in 2001 after their description by Breiman (2). He was largely influenced by previous work, especially the similar "randomized trees" method of Amit and Geman (3), as well as Ho's "random decision forests" (4). Random forests have since proven useful in many fields due to their high predictive accuracy (5, 6). In biology and medicine, random forests have successfully tackled a range of problems, including predicting drug response in cancer cell lines (7), identifying DNA-binding proteins (8), and localizing cancer to particular tissues from a liquid biopsy (9). [...]},
  keywords = {*imported-from-citeulike-INRMM,~INRMM-MiD:c-14538760,classification,computational-science,modelling,predictor-selection,random-forest},
  number = {8}
}