Knowledge Discovery by Accuracy Maximization

Knowledge Discovery by Accuracy Maximization. Cacciatore, S., Luchinat, C., & Tenori, L. Proceedings of the National Academy of Sciences, 111(14):5117–5122, April 2014.

[Significance] We propose an innovative method to extract new knowledge from noisy and high-dimensional data. Our approach differs from previous methods in that it has an integrated procedure of validation of the results through maximization of cross-validated accuracy. In many cases, this method performs better than existing feature extraction methods and offers a general framework for analyzing any kind of complex data in a broad range of sciences. Examples ranging from genomics and metabolomics to astronomy and linguistics show the versatility of the method.

[Abstract] Here we describe KODAMA (knowledge discovery by accuracy maximization), an unsupervised and semisupervised learning algorithm that performs feature extraction from noisy and high-dimensional data. Unlike other data mining methods, the peculiarity of KODAMA is that it is driven by an integrated procedure of cross-validation of the results. The discovery of a local manifold's topology is led by a classifier through a Monte Carlo procedure of maximization of cross-validated predictive accuracy. Briefly, our approach differs from previous methods in that it has an integrated procedure of validation of the results. In this way, the method ensures the highest robustness of the obtained solution. This robustness is demonstrated on experimental datasets of gene expression and metabolomics, where KODAMA compares favorably with other existing feature extraction methods. KODAMA is then applied to an astronomical dataset, revealing unexpected features. Interesting and not easily predictable features are also found in the analysis of the State of the Union speeches by American presidents: KODAMA reveals an abrupt linguistic transition sharply separating all post-Reagan from all pre-Reagan speeches. The transition occurs during Reagan's presidency and not from its beginning.
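The abstract's central idea, a Monte Carlo search over class labels that keeps only changes which do not lower cross-validated accuracy, can be illustrated with a small sketch. This is not the authors' implementation: the k-nearest-neighbour classifier, leave-one-out validation, the random starting labels, the single-label proposal step, and the names `loo_accuracy` and `maximize_cv_accuracy` are all assumptions chosen for brevity.

```python
import random


def loo_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a k-nearest-neighbour classifier."""
    correct = 0
    for i, p in enumerate(points):
        # Squared distances from point i to every other point (i is held out).
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), labels[j])
            for j, q in enumerate(points) if j != i
        )[:k]
        votes = [lab for _, lab in neighbours]
        # Majority vote; ties are broken in favour of the smallest label.
        predicted = max(sorted(set(votes)), key=votes.count)
        correct += predicted == labels[i]
    return correct / len(points)


def maximize_cv_accuracy(points, n_classes=2, n_iter=300, k=3, seed=0):
    """Hypothetical Monte Carlo search: relabel one point at a time and
    accept the change only if cross-validated accuracy does not decrease."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_classes) for _ in points]  # assumed random start
    best = loo_accuracy(points, labels, k)
    for _ in range(n_iter):
        candidate = labels[:]
        i = rng.randrange(len(points))
        candidate[i] = rng.randrange(n_classes)
        score = loo_accuracy(points, candidate, k)
        if score >= best:
            labels, best = candidate, score
    return labels, best


if __name__ == "__main__":
    # Two well-separated toy clusters (illustrative data, not from the paper).
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3),
            (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.0, 5.3)]
    found, acc = maximize_cv_accuracy(data)
    print(found, acc)
```

Because the acceptance rule never lowers the score, the returned accuracy is at least that of the starting labels. Note that a labelling with every point in one class also scores perfectly under this criterion, and the sketch does nothing to avoid that degenerate solution.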

@article{cacciatoreKnowledgeDiscoveryAccuracy2014,
  title = {Knowledge Discovery by Accuracy Maximization},
  author = {Cacciatore, Stefano and Luchinat, Claudio and Tenori, Leonardo},
  date = {2014-04},
  journaltitle = {Proceedings of the National Academy of Sciences},
  volume = {111},
  pages = {5117--5122},
  issn = {1091-6490},
  doi = {10.1073/pnas.1220873111},
  url = {https://doi.org/10.1073/pnas.1220873111},
  abstract = {[Significance] We propose an innovative method to extract new knowledge from noisy and high-dimensional data. Our approach differs from previous methods in that it has an integrated procedure of validation of the results through maximization of cross-validated accuracy. In many cases, this method performs better than existing feature extraction methods and offers a general framework for analyzing any kind of complex data in a broad range of sciences. Examples ranging from genomics and metabolomics to astronomy and linguistics show the versatility of the method.

[Abstract] Here we describe KODAMA (knowledge discovery by accuracy maximization), an unsupervised and semisupervised learning algorithm that performs feature extraction from noisy and high-dimensional data. Unlike other data mining methods, the peculiarity of KODAMA is that it is driven by an integrated procedure of cross-validation of the results. The discovery of a local manifold's topology is led by a classifier through a Monte Carlo procedure of maximization of cross-validated predictive accuracy. Briefly, our approach differs from previous methods in that it has an integrated procedure of validation of the results. In this way, the method ensures the highest robustness of the obtained solution. This robustness is demonstrated on experimental datasets of gene expression and metabolomics, where KODAMA compares favorably with other existing feature extraction methods. KODAMA is then applied to an astronomical dataset, revealing unexpected features. Interesting and not easily predictable features are also found in the analysis of the State of the Union speeches by American presidents: KODAMA reveals an abrupt linguistic transition sharply separating all post-Reagan from all pre-Reagan speeches. The transition occurs during Reagan's presidency and not from its beginning.},
  keywords = {*imported-from-citeulike-INRMM,~INRMM-MiD:c-13121748,dimensionality-reduction,machine-learning,modelling},
  number = {14}
}
