Identification of cell types in a mouse brain single-cell atlas using low sampling coverage. Bhaduri, A., Nowakowski, T. J., Pollen, A. A., & Kriegstein, A. R. BMC Biol, 16(1):113, October, 2018.
@ARTICLE{Bhaduri2018-ew,
title = "Identification of cell types in a mouse brain single-cell atlas
using low sampling coverage",
author = "Bhaduri, Aparna and Nowakowski, Tomasz J and Pollen, Alex A and
Kriegstein, Arnold R",
abstract = "BACKGROUND: High throughput methods for profiling the
transcriptomes of single cells have recently emerged as
transformative approaches for large-scale population surveys of
cellular diversity in heterogeneous primary tissues. However, the
efficient generation of such atlases will depend on sufficient
sampling of diverse cell types while remaining cost-effective to
enable a comprehensive examination of organs, developmental
stages, and individuals. RESULTS: To examine the relationship
between sampled cell numbers and transcriptional heterogeneity in
the context of unbiased cell type classification, we explored the
population structure of a publicly available 1.3 million cell
dataset from E18.5 mouse brain and validated our findings in
published data from adult mice. We propose a computational
framework for inferring the saturation point of cluster discovery
in a single-cell mRNA-seq experiment, centered around cluster
preservation in downsampled datasets. In addition, we introduce a
``complexity index,'' which characterizes the heterogeneity of
cells in a given dataset. Using Cajal-Retzius cells as an example
of a limited complexity dataset, we explored whether the detected
biological distinctions relate to technical clustering.
Surprisingly, we found that clustering distinctions carrying
biologically interpretable meaning are achieved with far fewer
cells than the originally sampled, though technical saturation of
rare populations such as Cajal-Retzius cells is not achieved. We
additionally validated these findings with a recently published
atlas of cell types across mouse organs and again find using
subsampling that a much smaller number of cells recapitulates the
cluster distinctions of the complete dataset. CONCLUSIONS:
Together, these findings suggest that most of the biologically
interpretable cell types from the 1.3 million cell database can
be recapitulated by analyzing 50,000 randomly selected cells,
indicating that instead of profiling few individuals at high
``cellular coverage,'' cell atlas studies may instead benefit
from profiling more individuals, or many time points at lower
cellular coverage and then further enriching for populations of
interest. This strategy is ideal for scenarios where cost and
time are limited, though extremely rare populations of interest
($<$ 1\%) may be identifiable only with much higher cell numbers.",
journal = "BMC Biol",
volume = 16,
number = 1,
pages = "113",
month = oct,
year = 2018,
keywords = "Bioinformatics; Cell atlas studies; Downsampling; Single-cell
analysis",
language = "en"
}