Identification and control for the effects of bioinformatic globin depletion on human RNA-seq differential expression analysis. Sheerin, D., Lakay, F., Esmail, H., Kinnear, C., Sansom, B., Glanzmann, B., Wilkinson, R. J., Ritchie, M. E., & Coussens, A. K. Scientific Reports, 13:1859, Nature Publishing Group, February 2023.
When profiling blood samples by RNA-sequencing (RNA-seq), RNA from haemoglobin (Hgb) can account for up to 70% of the transcriptome. Because of sequencing depth and the power to detect biological variation, Hgb RNA is typically depleted before sequencing by hybridisation-based methods. An alternative approach is to deplete reads arising from Hgb RNA bioinformatically. In the present study, we compared the impact of these two approaches on the outcome of differential gene expression analysis performed using RNA-seq data from 58 whole blood samples from human tuberculosis (TB) patients or contacts (29 globin kit-depleted and 29 matched non-depleted). A subset of these samples were taken from the same patients at TB diagnosis and at six months post-TB treatment. Bioinformatic depletion of Hgb genes from the non-depleted samples (bioinformatic-depleted) substantially reduced library sizes (median = 57.24%), and fewer long non-coding, micro, small nuclear and small nucleolar RNAs were captured in these libraries. Profiling published TB gene signatures across all samples revealed weaker correlation between kit-depleted and bioinformatic-depleted pairs when the proportion of reads mapping to Hgb genes was higher in the non-depleted sample, particularly at the TB diagnosis time point. A set of putative “globin-fingerprint” genes was identified by directly comparing kit-depleted and bioinformatic-depleted samples at each time point. Two TB treatment response signatures also showed decreased differential performance between TB diagnosis and six months post-TB treatment when profiled on the bioinformatic-depleted samples rather than on their kit-depleted counterparts. These results demonstrate that failure to deplete Hgb RNA before sequencing reduces the sensitivity to detect disease-relevant gene expression changes, even when bioinformatic removal is performed.
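Bioinformatic depletion, as summarised in the abstract, amounts to dropping the reads assigned to haemoglobin genes from each sample's count table, which shrinks the library. The following is a minimal Python sketch of that idea, not the authors' pipeline. It assumes a simple per-gene count dictionary. The globin gene list is an assumption, not necessarily the set used in the study, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet

# Assumed set of human haemoglobin genes; the study's exact list is not stated here.
GLOBIN_GENES: FrozenSet[str] = frozenset(
    {"HBA1", "HBA2", "HBB", "HBD", "HBG1", "HBG2", "HBE1", "HBZ", "HBM", "HBQ1"}
)


def deplete_globin(counts: Dict[str, int],
                   globin_genes: FrozenSet[str] = GLOBIN_GENES) -> Dict[str, int]:
    """Hypothetical helper: return a copy of the count table without globin genes."""
    return {gene: n for gene, n in counts.items() if gene not in globin_genes}


def library_size_reduction(counts: Dict[str, int],
                           globin_genes: FrozenSet[str] = GLOBIN_GENES) -> float:
    """Hypothetical helper: percentage of total reads removed by globin depletion."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    kept = sum(deplete_globin(counts, globin_genes).values())
    return 100.0 * (total - kept) / total


if __name__ == "__main__":
    # Toy counts for a single sample (made-up numbers, not data from the study).
    sample = {"HBA1": 300, "HBA2": 250, "HBB": 150, "CD4": 120, "GBP5": 80, "ACTB": 100}
    print(deplete_globin(sample))  # {'CD4': 120, 'GBP5': 80, 'ACTB': 100}
    print(f"{library_size_reduction(sample):.2f}% of reads removed")  # 70.00% of reads removed
```

In this toy sample, globin reads make up 70% of the library. Removing them leaves only 30% of the original depth for every other gene, which is the loss of sequencing depth that kit-based depletion before sequencing avoids.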
@article{Sheerin2023,
abstract = {When profiling blood samples by RNA-sequencing (RNA-seq), RNA from haemoglobin (Hgb) can account for up to 70{\%} of the transcriptome. Because of sequencing depth and the power to detect biological variation, Hgb RNA is typically depleted before sequencing by hybridisation-based methods. An alternative approach is to deplete reads arising from Hgb RNA bioinformatically. In the present study, we compared the impact of these two approaches on the outcome of differential gene expression analysis performed using RNA-seq data from 58 whole blood samples from human tuberculosis (TB) patients or contacts (29 globin kit-depleted and 29 matched non-depleted). A subset of these samples were taken from the same patients at TB diagnosis and at six months post-TB treatment. Bioinformatic depletion of Hgb genes from the non-depleted samples (bioinformatic-depleted) substantially reduced library sizes (median = 57.24{\%}), and fewer long non-coding, micro, small nuclear and small nucleolar RNAs were captured in these libraries. Profiling published TB gene signatures across all samples revealed weaker correlation between kit-depleted and bioinformatic-depleted pairs when the proportion of reads mapping to Hgb genes was higher in the non-depleted sample, particularly at the TB diagnosis time point. A set of putative ``globin-fingerprint'' genes was identified by directly comparing kit-depleted and bioinformatic-depleted samples at each time point. Two TB treatment response signatures also showed decreased differential performance between TB diagnosis and six months post-TB treatment when profiled on the bioinformatic-depleted samples rather than on their kit-depleted counterparts. These results demonstrate that failure to deplete Hgb RNA before sequencing reduces the sensitivity to detect disease-relevant gene expression changes, even when bioinformatic removal is performed.},
author = {Sheerin, Dylan and Lakay, Francisco and Esmail, Hanif and Kinnear, Craig and Sansom, Bianca and Glanzmann, Brigitte and Wilkinson, Robert J and Ritchie, Matthew E and Coussens, Anna K},
doi = {10.1038/s41598-023-28218-7},
issn = {2045-2322},
journal = {Scientific Reports},
keywords = {Data processing,OA,OA{\_}PMC,Quality control,Statistical methods,Tuberculosis,fund{\_}ack,original},
mendeley-tags = {OA,OA{\_}PMC,fund{\_}ack,original},
month = feb,
pages = {1859},
pmid = {36725870},
publisher = {Nature Publishing Group},
title = {{Identification and control for the effects of bioinformatic globin depletion on human RNA-seq differential expression analysis}},
url = {https://www.nature.com/articles/s41598-023-28218-7},
volume = {13},
year = {2023}
}
