Why Batch Effects Matter in Omics Data, and How to Avoid Them

Why Batch Effects Matter in Omics Data, and How to Avoid Them. Goh, W. W. B., Wang, W., & Wong, L. Trends in Biotechnology, 35(6):498–507, June 2017.

Effective integration and analysis of new high-throughput data, especially gene-expression and proteomic-profiling data, are expected to deliver novel clinical insights and therapeutic options. Unfortunately, technical heterogeneity or batch effects (different experiment times, handlers, reagent lots, etc.) have proven challenging. Although batch effect-correction algorithms (BECAs) exist, we know little about effective batch-effect mitigation: even now, new batch effect-associated problems are emerging. These include false effects due to misapplying BECAs and positive bias during model evaluations. Depending on the choice of algorithm and experimental set-up, biological heterogeneity can be mistaken for batch effects and wrongfully removed. Here, we examine these emerging batch effect-associated problems, propose a series of best practices, and discuss some of the challenges that lie ahead.
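The warning that biological heterogeneity can be "wrongfully removed" is easiest to see in a toy case. The sketch below is an illustration, not a method from the paper. It assumes a single hypothetical gene, made-up expression values, and simple per-batch mean-centering as a stand-in for a batch effect-correction algorithm. It compares a design where disease status is fully confounded with batch against one where each batch contains both groups.

```python
from statistics import mean


def center_per_batch(values, batches):
    """Subtract each batch's own mean from its samples (a location-only correction).

    This is a deliberately simple stand-in for a BECA, chosen for illustration only.
    """
    batch_mean = {
        b: mean(v for v, vb in zip(values, batches) if vb == b) for b in set(batches)
    }
    return [v - batch_mean[b] for v, b in zip(values, batches)]


def group_difference(values, labels, case="tumour", control="normal"):
    """Mean of the case group minus mean of the control group."""
    case_mean = mean(v for v, lab in zip(values, labels) if lab == case)
    control_mean = mean(v for v, lab in zip(values, labels) if lab == control)
    return case_mean - control_mean


# Hypothetical data: tumours truly read 2 units higher than normals,
# and batch 2 adds a technical offset of +1 to every sample it contains.
labels = ["tumour"] * 4 + ["normal"] * 4

# Confounded design: every tumour ran in batch 1, every normal in batch 2.
confounded_values = [7, 8, 6, 7, 6, 5, 7, 6]
confounded_batches = [1, 1, 1, 1, 2, 2, 2, 2]

# Balanced design: each batch holds two tumours and two normals.
balanced_values = [7, 8, 8, 7, 5, 4, 6, 7]
balanced_batches = [1, 1, 2, 2, 1, 1, 2, 2]

for name, values, batches in [
    ("confounded", confounded_values, confounded_batches),
    ("balanced", balanced_values, balanced_batches),
]:
    raw = group_difference(values, labels)
    corrected = group_difference(center_per_batch(values, batches), labels)
    print(f"{name}: raw difference = {raw}, after correction = {corrected}")
```

In the confounded design the batch means coincide with the group means, so centering erases the tumour/normal difference entirely; in the balanced design the same correction removes the +1 offset and the two-unit biological difference survives. This is one concrete reason why experimental set-up, and not only the choice of algorithm, determines whether correction is safe.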

@article{goh_why_2017,
	title = {Why {Batch} {Effects} {Matter} in {Omics} {Data}, and {How} to {Avoid} {Them}},
	volume = {35},
	issn = {1879-3096},
	doi = {10.1016/j.tibtech.2017.02.012},
	abstract = {Effective integration and analysis of new high-throughput data, especially gene-expression and proteomic-profiling data, are expected to deliver novel clinical insights and therapeutic options. Unfortunately, technical heterogeneity or batch effects (different experiment times, handlers, reagent lots, etc.) have proven challenging. Although batch effect-correction algorithms (BECAs) exist, we know little about effective batch-effect mitigation: even now, new batch effect-associated problems are emerging. These include false effects due to misapplying BECAs and positive bias during model evaluations. Depending on the choice of algorithm and experimental set-up, biological heterogeneity can be mistaken for batch effects and wrongfully removed. Here, we examine these emerging batch effect-associated problems, propose a series of best practices, and discuss some of the challenges that lie ahead.},
	language = {eng},
	number = {6},
	journal = {Trends in Biotechnology},
	author = {Goh, Wilson Wen Bin and Wang, Wei and Wong, Limsoon},
	month = jun,
	year = {2017},
	pmid = {28351613},
	keywords = {Algorithms, Databases, Genetic, Gene Expression Profiling, Information Storage and Retrieval, Models, Theoretical, Proteomics, batch effect, cross-validation, data integration, heterogeneity, reproducibility},
	pages = {498--507},
}
