Taxonomically informed scoring enhances confidence in natural products annotation. Rutz, A., Dounoue-Kubo, M., Ollivier, S., Bisson, J., Bagheri, M., Saesong, T., Nejad Ebrahimi, S., Ingkaninan, K., Wolfender, J., & Allard, P. bioRxiv, Cold Spring Harbor Laboratory, 2019.
Mass spectrometry (MS) hyphenated to liquid chromatography (LC-MS) offers unrivalled sensitivity for metabolite profiling of complex biological matrices encountered in natural products (NP) research. With advanced LC-MS platforms, MS/MS spectra are acquired in an untargeted manner on most detected features. This generates massive and complex sets of spectral data that provide valuable structural information on most analytes. To interpret such datasets, computational methods are mandatory. To this end, computerized annotation of metabolites links spectral data to candidate structures. When profiling complex extracts, spectra are often organized in clusters by similarity via Molecular Networking (MN). A spectral matching score is usually established between the acquired data and experimental or theoretical spectral databases (DB). The process leads to various candidate structures for each MS feature. At this stage, obtaining a high annotation confidence level remains a challenge, notably due to the high chemodiversity of specialized metabolomes. The integration of additional information in a meta-score is a way to capture complementary experimental attributes and improve the annotation process. Here we show that integrating the unambiguous taxonomic position of analyzed samples and candidate structures enhances confidence in metabolite annotation. A script is proposed to automatically input such information at various granularity levels (species, genus, and family) and weight the score obtained between experimental spectral data and the output of available computational metabolite annotation tools (ISDB-DNP, MS-Finder, Sirius). In all cases, the consideration of the taxonomic distance allowed an efficient re-ranking of the candidate structures, leading to a systematic enhancement of the recall and precision rates of the tools (1.5- to 7-fold increase in the F1 score).
Our results clearly demonstrate the importance of considering taxonomic information in the process of specialized metabolite annotation. This requires access to structural data systematically documented with biological origin, both for new and previously reported NPs. In this respect, the establishment of an open structural DB of specialized metabolites and their associated metadata (particularly biological sources) is timely and critical for the NP research community.
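The re-ranking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' script: the bonus weights in LEVEL_BONUS, the additive combination of scores, and the class and function names are all hypothetical, chosen only to show how a spectral score might be weighted by the finest taxonomic level (species, genus, or family) shared between the analyzed sample and the reported biological source of each candidate structure.

```python
from dataclasses import dataclass

# Hypothetical bonus per shared taxonomic level (illustrative values,
# not taken from the paper).
LEVEL_BONUS = {"species": 3.0, "genus": 2.0, "family": 1.0}


@dataclass(frozen=True)
class Taxon:
    family: str
    genus: str
    species: str  # full binomial, so a species match implies a genus match


@dataclass(frozen=True)
class Candidate:
    name: str
    spectral_score: float  # score from an annotation tool, assumed in [0, 1]
    source: Taxon          # organism the structure was reported from


def taxonomic_bonus(sample: Taxon, source: Taxon) -> float:
    """Return the bonus for the finest taxonomic level shared by both taxa."""
    for level in ("species", "genus", "family"):
        if getattr(sample, level) == getattr(source, level):
            return LEVEL_BONUS[level]
    return 0.0


def rerank(sample: Taxon, candidates: list[Candidate]) -> list[Candidate]:
    """Sort candidates by spectral score plus taxonomic bonus, best first."""
    return sorted(
        candidates,
        key=lambda c: c.spectral_score + taxonomic_bonus(sample, c.source),
        reverse=True,
    )


# Example: a sample from Gentiana lutea and three placeholder candidates.
sample = Taxon("Gentianaceae", "Gentiana", "Gentiana lutea")
candidates = [
    Candidate("candidate_A", 0.80, Taxon("Asteraceae", "Artemisia", "Artemisia annua")),
    Candidate("candidate_B", 0.60, Taxon("Gentianaceae", "Swertia", "Swertia chirayita")),
    Candidate("candidate_C", 0.55, Taxon("Gentianaceae", "Gentiana", "Gentiana lutea")),
]
ranked = rerank(sample, candidates)
print([c.name for c in ranked])
```

In this toy example the candidate with the best spectral score comes from an unrelated family and drops to last place, while the candidate reported from the same species as the sample moves to the top. How strongly taxonomy should outweigh spectral similarity is a design choice that this sketch does not settle.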
@Article{rutz19taxonomically,
  author       = {Rutz, Adriano and Dounoue-Kubo, Miwa and Ollivier, Simon and Bisson, Jonathan and Bagheri, Mohsen and Saesong, Tongchai and Nejad Ebrahimi, Samad and Ingkaninan, Kornkanok and Wolfender, Jean-Luc and Allard, Pierre-Marie},
  journal      = {bioRxiv},
  title        = {Taxonomically informed scoring enhances confidence in natural products annotation},
  year         = {2019},
  abstract     = {Mass spectrometry (MS) hyphenated to liquid chromatography (LC-MS) offers unrivalled sensitivity for metabolite profiling of complex biological matrices encountered in natural products (NP) research. With advanced LC-MS platforms, MS/MS spectra are acquired in an untargeted manner on most detected features. This generates massive and complex sets of spectral data that provide valuable structural information on most analytes. To interpret such datasets, computational methods are mandatory. To this end, computerized annotation of metabolites links spectral data to candidate structures. When profiling complex extracts, spectra are often organized in clusters by similarity via Molecular Networking (MN). A spectral matching score is usually established between the acquired data and experimental or theoretical spectral databases (DB). The process leads to various candidate structures for each MS feature. At this stage, obtaining a high annotation confidence level remains a challenge, notably due to the high chemodiversity of specialized metabolomes. The integration of additional information in a meta-score is a way to capture complementary experimental attributes and improve the annotation process. Here we show that integrating the unambiguous taxonomic position of analyzed samples and candidate structures enhances confidence in metabolite annotation. A script is proposed to automatically input such information at various granularity levels (species, genus, and family) and weight the score obtained between experimental spectral data and the output of available computational metabolite annotation tools (ISDB-DNP, MS-Finder, Sirius). In all cases, the consideration of the taxonomic distance allowed an efficient re-ranking of the candidate structures, leading to a systematic enhancement of the recall and precision rates of the tools (1.5- to 7-fold increase in the F1 score).
Our results clearly demonstrate the importance of considering taxonomic information in the process of specialized metabolite annotation. This requires access to structural data systematically documented with biological origin, both for new and previously reported NPs. In this respect, the establishment of an open structural DB of specialized metabolites and their associated metadata (particularly biological sources) is timely and critical for the NP research community.},
  doi          = {10.1101/702308},
  elocation-id = {702308},
  eprint       = {https://www.biorxiv.org/content/early/2019/07/14/702308.full.pdf},
  publisher    = {Cold Spring Harbor Laboratory},
  url          = {https://www.biorxiv.org/content/early/2019/07/14/702308},
}
