A unified genealogy of modern and ancient genomes

A unified genealogy of modern and ancient genomes. Wohns, A. W., Wong, Y., Jeffery, B., Akbari, A., Mallick, S., Pinhasi, R., Patterson, N., Reich, D., Kelleher, J., & McVean, G. Science, 375(6583):eabi8264, February, 2022.



@article{wohns_unified_2022,
title = {A unified genealogy of modern and ancient genomes},
volume = {375},
issn = {0036-8075, 1095-9203},
url = {https://www.science.org/doi/10.1126/science.abi8264},
doi = {10.1126/science.abi8264},
abstract = {The sequencing of modern and ancient genomes from around the world has revolutionized our understanding of human history and evolution. However, the problem of how best to characterize ancestral relationships from the totality of human genomic variation remains unsolved. Here, we address this challenge with nonparametric methods that enable us to infer a unified genealogy of modern and ancient humans. This compact representation of multiple datasets explores the challenges of missing and erroneous data and uses ancient samples to constrain and date relationships. We demonstrate the power of the method to recover relationships between individuals and populations as well as to identify descendants of ancient samples. Finally, we introduce a simple nonparametric estimator of the geographical location of ancestors that recapitulates key events in human history.

Genomics and human ancestral genealogy

Hundreds of thousands of modern human genomes and thousands of ancient human genomes have been generated to date. However, different methods and data quality can make comparisons among them difficult. Furthermore, every human genome contains segments from ancestries of varying ages. Wohns et al. applied a tree recording method to ancient and modern human genomes to generate a unified human genealogy (see the Perspective by Rees and Andrés). This method allows for missing and erroneous data and uses ancient genomes to calibrate genomic coalescent times. This permits us to determine how our genomes have changed over time and between populations, informing upon the evolution of our species. —LMZ

A genealogy of modern and ancient genomes provides insight into human history and evolution.

INTRODUCTION
The characterization of modern and ancient human genome sequences has revealed previously unknown features of our evolutionary past. As genome data generation continues to accelerate—through the sequencing of population-scale biobanks and ancient samples from around the world—so does the potential to generate an increasingly detailed understanding of how populations have evolved.
However, such genomic datasets are highly heterogeneous. Samples from diverse times, geographic locations, and populations are processed, sequenced, and analyzed using a variety of techniques. The resulting datasets contain genuine variation but also complex patterns of missingness and error. This makes combining data challenging and hinders efforts to generate the most complete picture of human genomic variation.

RATIONALE
To address these challenges, we use the foundational notion that the ancestral relationships of all humans who have ever lived can be described by a single genealogy or tree sequence, so named because it encodes the sequence of trees that link individuals to one another at every point in the genome. This tree sequence of humanity is immensely complex, but estimates of the structure are a powerful means of integrating diverse datasets and gaining greater insights into human genetic diversity. In this work, we introduce statistical and computational methods to infer such a unified genealogy of modern and ancient samples, validate the methods through a mixture of computer simulation and analysis of empirical data, and apply the methods to reveal features of human diversity and evolution.

RESULTS
We present a unified tree sequence of 3601 modern and eight high-coverage ancient human genome sequences compiled from eight datasets. This structure is a lossless and compact representation of 27 million ancestral haplotype fragments and 231 million ancestral lineages linking genomes from these datasets back in time. The tree sequence also benefits from the use of an additional 3589 ancient samples compiled from more than 100 publications to constrain and date relationships.
Using simulations and empirical analyses, we demonstrate the ability to recover relationships between individuals and populations as well as to identify descendants of ancient samples. We calculate the distribution of the time to most recent common ancestry between the 215 populations of the constituent datasets, revealing patterns consistent with substantial variation in historical population size and evidence of archaic admixture in modern humans.
The tree sequence also offers insight into patterns of recurrent mutation and sequencing error in commonly used genetic datasets. We find pervasive signals of sequencing error as well as a small subset of variant sites that appear to be erroneous.
Finally, we introduce an estimator of ancestor geographic location that recapitulates key features of human history. We observe signals of very deep ancestral lineages in Africa, the out-of-Africa event, and archaic introgression in Oceania. The method motivates improved spatiotemporal inference methods that will better elucidate the paths and timings of historic migrations.

CONCLUSION
The profusion of genetic sequencing data creates challenges for integrating diverse data sources. Our results demonstrate that whole-genome genealogies provide a powerful platform for synthesizing genetic data and investigating human history and evolution.

Visualizing inferred human ancestral lineages over time and space.
Each line represents an ancestor-descendant relationship in our inferred genealogy of modern and ancient genomes. The width of a line corresponds to how many times the relationship is observed, and lines are colored on the basis of the estimated age of the ancestor.},
language = {English},
number = {6583},
urldate = {2023-12-28},
journal = {Science},
author = {Wohns, Anthony Wilder and Wong, Yan and Jeffery, Ben and Akbari, Ali and Mallick, Swapan and Pinhasi, Ron and Patterson, Nick and Reich, David and Kelleher, Jerome and McVean, Gil},
month = feb,
year = {2022},
keywords = {pending to read},
pages = {eabi8264},
}

