Identification of cell types in a mouse brain single-cell atlas using low sampling coverage

Identification of cell types in a mouse brain single-cell atlas using low sampling coverage. Bhaduri, A., Nowakowski, T. J., Pollen, A. A., & Kriegstein, A. R. BMC Biol, 16(1):113, October 2018.

BACKGROUND: High throughput methods for profiling the transcriptomes of single cells have recently emerged as transformative approaches for large-scale population surveys of cellular diversity in heterogeneous primary tissues. However, the efficient generation of such atlases will depend on sufficient sampling of diverse cell types while remaining cost-effective to enable a comprehensive examination of organs, developmental stages, and individuals.
RESULTS: To examine the relationship between sampled cell numbers and transcriptional heterogeneity in the context of unbiased cell type classification, we explored the population structure of a publicly available 1.3 million cell dataset from E18.5 mouse brain and validated our findings in published data from adult mice. We propose a computational framework for inferring the saturation point of cluster discovery in a single-cell mRNA-seq experiment, centered around cluster preservation in downsampled datasets. In addition, we introduce a "complexity index," which characterizes the heterogeneity of cells in a given dataset. Using Cajal-Retzius cells as an example of a limited complexity dataset, we explored whether the detected biological distinctions relate to technical clustering. Surprisingly, we found that clustering distinctions carrying biologically interpretable meaning are achieved with far fewer cells than the originally sampled, though technical saturation of rare populations such as Cajal-Retzius cells is not achieved. We additionally validated these findings with a recently published atlas of cell types across mouse organs and again find using subsampling that a much smaller number of cells recapitulates the cluster distinctions of the complete dataset.
CONCLUSIONS: Together, these findings suggest that most of the biologically interpretable cell types from the 1.3 million cell database can be recapitulated by analyzing 50,000 randomly selected cells, indicating that instead of profiling few individuals at high "cellular coverage," cell atlas studies may instead benefit from profiling more individuals, or many time points at lower cellular coverage and then further enriching for populations of interest. This strategy is ideal for scenarios where cost and time are limited, though extremely rare populations of interest (< 1%) may be identifiable only with much higher cell numbers.

@ARTICLE{Bhaduri2018-ew,
  title    = "Identification of cell types in a mouse brain single-cell atlas
              using low sampling coverage",
  author   = "Bhaduri, Aparna and Nowakowski, Tomasz J and Pollen, Alex A and
              Kriegstein, Arnold R",
  abstract = "BACKGROUND: High throughput methods for profiling the
              transcriptomes of single cells have recently emerged as
              transformative approaches for large-scale population surveys of
              cellular diversity in heterogeneous primary tissues. However, the
              efficient generation of such atlases will depend on sufficient
              sampling of diverse cell types while remaining cost-effective to
              enable a comprehensive examination of organs, developmental
              stages, and individuals. RESULTS: To examine the relationship
              between sampled cell numbers and transcriptional heterogeneity in
              the context of unbiased cell type classification, we explored the
              population structure of a publicly available 1.3 million cell
              dataset from E18.5 mouse brain and validated our findings in
              published data from adult mice. We propose a computational
              framework for inferring the saturation point of cluster discovery
              in a single-cell mRNA-seq experiment, centered around cluster
              preservation in downsampled datasets. In addition, we introduce a
              ``complexity index,'' which characterizes the heterogeneity of
              cells in a given dataset. Using Cajal-Retzius cells as an example
              of a limited complexity dataset, we explored whether the detected
              biological distinctions relate to technical clustering.
              Surprisingly, we found that clustering distinctions carrying
              biologically interpretable meaning are achieved with far fewer
              cells than the originally sampled, though technical saturation of
              rare populations such as Cajal-Retzius cells is not achieved. We
              additionally validated these findings with a recently published
              atlas of cell types across mouse organs and again find using
              subsampling that a much smaller number of cells recapitulates the
              cluster distinctions of the complete dataset. CONCLUSIONS:
              Together, these findings suggest that most of the biologically
              interpretable cell types from the 1.3 million cell database can
              be recapitulated by analyzing 50,000 randomly selected cells,
              indicating that instead of profiling few individuals at high
              ``cellular coverage,'' cell atlas studies may instead benefit
              from profiling more individuals, or many time points at lower
              cellular coverage and then further enriching for populations of
              interest. This strategy is ideal for scenarios where cost and
              time are limited, though extremely rare populations of interest
              (< 1\%) may be identifiable only with much higher cell numbers.",
  journal  = "BMC Biol",
  volume   =  16,
  number   =  1,
  pages    = "113",
  month    =  oct,
  year     =  2018,
  keywords = "Bioinformatics; Cell atlas studies; Downsampling; Single-cell
              analysis",
  language = "en"
}
