Significance estimation for large scale untargeted metabolomics annotations. Scheubert, K., Hufsky, F., Petras, D., Wang, M., Nothias, L., Duehrkop, K., Bandeira, N., Dorrestein, P., & Boecker, S. bioRxiv, Cold Spring Harbor Labs Journals, 2017.
The annotation of small molecules in untargeted mass spectrometry relies on matching fragment spectra to reference library spectra. While various spectrum-spectrum match scores exist, the field lacks statistical methods for estimating the false discovery rates (FDR) of these annotations. We present empirical Bayes and target-decoy based methods to estimate the FDR. Using these FDR estimates, we explore the effect of different spectrum-spectrum match criteria on the number and the nature of the molecules annotated. We show that the spectral matching settings need to be adjusted for each project. By adjusting the scoring parameters and thresholds, the number of annotations rose, on average, by +139% (ranging from -92% up to +5705%) compared to a default parameter set available at GNPS. The presented FDR estimation methods will enable users to define scoring criteria for large-scale analysis of untargeted small molecule data, a capability that has been essential in the advancement of large-scale proteomics, transcriptomics, and genomics.
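As a rough illustration of the target-decoy idea mentioned in the abstract, the sketch below shows the generic estimate used in proteomics: at a score threshold t, the FDR among target (real library) matches is approximated by the number of decoy matches scoring at least t divided by the number of target matches scoring at least t. This is not the authors' implementation. It assumes equally sized target and decoy libraries and leaves open how decoy spectra are generated. The function names `target_decoy_fdr` and `score_threshold_for_fdr` and the example scores are hypothetical.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR of target matches scoring >= threshold.

    Assumes a 1:1 target/decoy library size, so the decoy count
    approximates the number of false target matches above the threshold.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0
    return min(1.0, n_decoy / n_target)


def score_threshold_for_fdr(target_scores, decoy_scores, max_fdr):
    """Return the lowest observed target score whose estimated FDR is <= max_fdr.

    Taking the lowest admissible threshold keeps the most annotations while
    respecting the FDR bound. Returns None if no threshold qualifies.
    """
    for t in sorted(set(target_scores)):
        if target_decoy_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None


# Hypothetical spectrum-spectrum match scores (e.g. cosine similarities).
targets = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6]
decoys = [0.75, 0.65, 0.5]
print(score_threshold_for_fdr(targets, decoys, 0.2))  # prints 0.7
```

Choosing the threshold per dataset, rather than fixing one cutoff, mirrors the abstract's point that matching settings need to be adjusted for each project.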
@Article{scheubert17significance-biorxiv,
  author    = {Scheubert, Kerstin and Hufsky, Franziska and Petras, Daniel and Wang, Mingxun and Nothias, Louis-Felix and Duehrkop, Kai and Bandeira, Nuno and Dorrestein, Pieter and Boecker, Sebastian},
  title     = {Significance estimation for large scale untargeted metabolomics annotations},
  journal   = {bioRxiv},
  year      = {2017},
  abstract  = {The annotation of small molecules in untargeted mass spectrometry relies on matching fragment spectra to reference library spectra. While various spectrum-spectrum match scores exist, the field lacks statistical methods for estimating the false discovery rates (FDR) of these annotations. We present empirical Bayes and target-decoy based methods to estimate the FDR. Using these FDR estimates, we explore the effect of different spectrum-spectrum match criteria on the number and the nature of the molecules annotated. We show that the spectral matching settings need to be adjusted for each project. By adjusting the scoring parameters and thresholds, the number of annotations rose, on average, by +139\% (ranging from -92\% up to +5705\%) compared to a default parameter set available at GNPS. The presented FDR estimation methods will enable users to define scoring criteria for large-scale analysis of untargeted small molecule data, a capability that has been essential in the advancement of large-scale proteomics, transcriptomics, and genomics.},
  doi       = {10.1101/109389},
  eprint    = {http://biorxiv.org/content/early/2017/02/17/109389.full.pdf},
  owner     = {Sebastian},
  publisher = {Cold Spring Harbor Labs Journals},
  timestamp = {2017.04.03},
  url       = {http://biorxiv.org/content/early/2017/02/17/109389},
}
