Metabolite identification and molecular fingerprint prediction via machine learning. Heinonen, M., Shen, H., Zamboni, N., & Rousu, J. Bioinformatics, 28(18):2333–2341, 2012.
MOTIVATION: Metabolite identification from tandem mass spectra is an important problem in metabolomics, underpinning subsequent metabolic modelling and network analysis. Yet, currently this task requires matching the observed spectrum against a database of reference spectra originating from similar equipment and closely matching operating parameters, a condition that is rarely satisfied in public repositories. Furthermore, the computational support for identification of molecules not present in reference databases is lacking. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door for the development of a new genre of metabolite identification methods.
RESULTS: We introduce a novel framework for prediction of molecular characteristics and identification of metabolites from tandem mass spectra using machine learning with the support vector machine (SVM). Our approach is to first predict a large set of molecular properties of the unknown metabolite from salient tandem mass spectral signals, and in the second step to use the predicted properties for matching against large molecule databases, such as PubChem. We demonstrate that several molecular properties can be predicted to high accuracy, and that they are useful in de novo metabolite identification, where the reference database does not contain any spectra of the same molecule.
AVAILABILITY: A Matlab/Python package of the FingerID tool is freely available on the web at http://www.sourceforge.com/p/fingerid.
CONTACT: markus.heinonen@cs.helsinki.fi.
@Article{heinonen12metabolite,
  author    = {Markus Heinonen and Huibin Shen and Nicola Zamboni and Juho Rousu},
  title     = {Metabolite identification and molecular fingerprint prediction via machine learning},
  journal   = {Bioinformatics},
  year      = {2012},
  volume    = {28},
  number    = {18},
  pages     = {2333--2341},
  abstract  = {MOTIVATION: Metabolite identification from tandem mass spectra is an important problem in metabolomics, underpinning subsequent metabolic modelling and network analysis. Yet, currently this task requires matching the observed spectrum against a database of reference spectra originating from similar equipment and closely matching operating parameters, a condition that is rarely satisfied in public repositories. Furthermore, the computational support for identification of molecules not present in reference databases is lacking. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door for the development of a new genre of metabolite identification methods. RESULTS: We introduce a novel framework for prediction of molecular characteristics and identification of metabolites from tandem mass spectra using machine learning with the support vector machine (SVM). Our approach is to first predict a large set of molecular properties of the unknown metabolite from salient tandem mass spectral signals, and in the second step to use the predicted properties for matching against large molecule databases, such as PubChem. We demonstrate that several molecular properties can be predicted to high accuracy, and that they are useful in de novo metabolite identification, where the reference database does not contain any spectra of the same molecule. AVAILABILITY: A Matlab/Python package of the FingerID tool is freely available on the web at http://www.sourceforge.com/p/fingerid. CONTACT: markus.heinonen@cs.helsinki.fi.},
  doi       = {10.1093/bioinformatics/bts437},
  file      = {HeinonenEtAl_MetaboliteIdentificationMolecular_Bioinformatics_2012.pdf:2012/HeinonenEtAl_MetaboliteIdentificationMolecular_Bioinformatics_2012.pdf:PDF},
  keywords  = {metabolite ms; TrACReview},
  owner     = {fhufsky},
  pmid      = {22815355},
  timestamp = {2012.07.23},
}
