Leveraging base-pair mammalian constraint to understand genetic variation and human disease. Sullivan, P. F., Meadows, J. R. S., Gazal, S., Phan, B. N., Li, X., Genereux, D. P., Dong, M. X., Bianchi, M., Andrews, G., Sakthikumar, S., Nordin, J., Roy, A., Christmas, M. J., Marinescu, V. D., Wang, C., Wallerman, O., Xue, J., Yao, S., Sun, Q., Szatkiewicz, J., Wen, J., Huckins, L. M., Lawler, A., Keough, K. C., Zheng, Z., Zeng, J., Wray, N. R., Li, Y., Johnson, J., Chen, J., Zoonomia Consortium§, Paten, B., Reilly, S. K., Hughes, G. M., Weng, Z., Pollard, K. S., Pfenning, A. R., Forsberg-Nilsson, K., Karlsson, E. K., Lindblad-Toh, K., Andrews, G., Armstrong, J. C., Bianchi, M., Birren, B. W., Bredemeyer, K. R., Breit, A. M., Christmas, M. J., Clawson, H., Damas, J., Di Palma, F., Diekhans, M., Dong, M. X., Eizirik, E., Fan, K., Fanter, C., Foley, N. M., Forsberg-Nilsson, K., Garcia, C. J., Gatesy, J., Gazal, S., Genereux, D. P., Goodman, L., Grimshaw, J., Halsey, M. K., Harris, A. J., Hickey, G., Hiller, M., Hindle, A. G., Hubley, R. M., Hughes, G. M., Johnson, J., Juan, D., Kaplow, I. M., Karlsson, E. K., Keough, K. C., Kirilenko, B., Koepfli, K., Korstian, J. M., Kowalczyk, A., Kozyrev, S. V., Lawler, A. J., Lawless, C., Lehmann, T., Levesque, D. L., Lewin, H. A., Li, X., Lind, A., Lindblad-Toh, K., Mackay-Smith, A., Marinescu, V. D., Marques-Bonet, T., Mason, V. C., Meadows, J. R. S., Meyer, W. K., Moore, J. E., Moreira, L. R., Moreno-Santillan, D. D., Morrill, K. M., Muntané, G., Murphy, W. J., Navarro, A., Nweeia, M., Ortmann, S., Osmanski, A., Paten, B., Paulat, N. S., Pfenning, A. R., Phan, B. N., Pollard, K. S., Pratt, H. E., Ray, D. A., Reilly, S. K., Rosen, J. R., Ruf, I., Ryan, L., Ryder, O. A., Sabeti, P. C., Schäffer, D. E., Serres, A., Shapiro, B., Smit, A. F. A., Springer, M., Srinivasan, C., Steiner, C., Storer, J. M., Sullivan, K. A. M., Sullivan, P. F., Sundström, E., Supple, M. 
A., Swofford, R., Talbot, J., Teeling, E., Turner-Maier, J., Valenzuela, A., Wagner, F., Wallerman, O., Wang, C., Wang, J., Weng, Z., Wilder, A. P., Wirthlin, M. E., Xue, J. R., & Zhang, X. Science, 380(6643):eabn2937, April, 2023.
Thousands of genomic regions have been associated with heritable human diseases, but attempts to elucidate biological mechanisms are impeded by an inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function, agnostic to cell type or disease mechanism. Single-base phyloP scores from 240 mammals identified 3.3% of the human genome as significantly constrained and likely functional. We compared phyloP scores to genome annotation, association studies, copy-number variation, clinical genetics findings, and cancer data. Constrained positions are enriched for variants that explain common disease heritability more than other functional annotations. Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.

INTRODUCTION
Thousands of genetic variants have been associated with human diseases and traits through genome-wide association studies (GWASs). Translating these discoveries into improved therapeutics requires discerning which variants among hundreds of candidates are causally related to disease risk. To date, only a handful of causal variants have been confirmed. Here, we leverage 100 million years of mammalian evolution to address this major challenge.

RATIONALE
We compared genomes from hundreds of mammals and identified bases with unusually few variants (evolutionarily constrained). Constraint is a measure of functional importance that is agnostic to cell type or developmental stage. It can be applied to investigate any heritable disease or trait and is complementary to resources using cell type– and time point–specific functional assays like Encyclopedia of DNA Elements (ENCODE) and Genotype-Tissue Expression (GTEx).

RESULTS
Using constraint calculated across placental mammals, 3.3% of bases in the human genome are significantly constrained, including 57.6% of coding bases. Most constrained bases (80.7%) are noncoding. Common variants (allele frequency ≥ 5%) and low-frequency variants (0.5% ≤ allele frequency < 5%) are depleted for constrained bases (1.85 versus 3.26% expected by chance, P < 2.2 × 10^−308). Pathogenic ClinVar variants are more constrained than benign variants (P < 2.2 × 10^−16).

The most constrained common variants are more enriched for disease single-nucleotide polymorphism (SNP)–heritability in 63 independent GWASs. The enrichment of SNP-heritability in constrained regions is greater (7.8-fold) than previously reported in mammals and is even higher in primates (11.1-fold). It exceeds the enrichment of SNP-heritability in nonsynonymous coding variants (7.2-fold) and fine-mapped expression quantitative trait loci (eQTL)–SNPs (4.8-fold). The enrichment peaks near constrained bases, with a log-linear decrease of SNP-heritability enrichment as a function of the distance to a constrained base.

Zoonomia constraint scores improve functionally informed fine-mapping. Variants at sites constrained in mammals and primates have greater posterior inclusion probabilities and higher per-SNP contributions. In addition, using both constraint and functional annotations improves polygenic risk score accuracy across a range of traits. Finally, incorporating constraint information into the analysis of noncoding somatic variants in medulloblastomas identifies new candidate driver genes.

CONCLUSION
Genome-wide measures of evolutionary constraint can help discern which variants are functionally important. This information may accelerate the translation of genomic discoveries into the biological, clinical, and therapeutic knowledge that is required to understand and treat human disease.

Using evolutionary constraint in genomic studies of human diseases. (A) Constraint was calculated across 240 mammal species, including 43 primates (teal line). (B) Pathogenic ClinVar variants (N = 73,885) are more constrained across mammals than benign variants (N = 231,642; P < 2.2 × 10^−16). (C) More-constrained bases are more enriched for trait-associated variants (63 GWASs). (D) Enrichment of heritability is higher in constrained regions than in functional annotations (left), even in a joint model with 106 annotations (right). (E) Fine-mapping (PolyFun) using a model that includes constraint scores identifies an experimentally validated association at rs1421085. Error bars represent 95% confidence intervals. BMI, body mass index; LF, low frequency; PIP, posterior inclusion probability.
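
The fold figures quoted in the abstract are ratios of an observed share to a reference share. As a minimal sketch (not the authors' pipeline), the depletion of constrained bases among common and low-frequency variants can be recomputed from the two percentages given above. The function names are hypothetical, and the heritability-enrichment helper assumes the usual definition of enrichment as the share of SNP-heritability in an annotation divided by the share of SNPs in that annotation; its inputs below are illustrative values, not numbers from the paper.

```python
def fold_change(observed: float, expected: float) -> float:
    """Ratio of an observed proportion to the proportion expected by chance."""
    if expected <= 0:
        raise ValueError("expected proportion must be positive")
    return observed / expected


def heritability_enrichment(prop_h2: float, prop_snps: float) -> float:
    """Share of SNP-heritability in an annotation divided by its share of SNPs.

    Assumed definition (standard in partitioned-heritability analyses);
    the abstract itself does not spell out the formula.
    """
    if not 0 < prop_snps <= 1:
        raise ValueError("prop_snps must be in (0, 1]")
    return prop_h2 / prop_snps


# Percentages taken from the abstract: 1.85% observed vs 3.26% expected.
depletion = fold_change(1.85, 3.26)
print(f"constrained-base fold change among variants: {depletion:.3f}")

# Hypothetical inputs, for illustration only: an annotation holding 2% of
# SNPs that explains 10% of SNP-heritability.
example = heritability_enrichment(prop_h2=0.10, prop_snps=0.02)
print(f"illustrative heritability enrichment: {example:.1f}-fold")
```

A fold change below 1 indicates depletion; here the observed share of constrained bases is a little over half of what chance would predict.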
@article{sullivan_leveraging_2023,
	title = {Leveraging base-pair mammalian constraint to understand genetic variation and human disease},
	volume = {380},
	issn = {0036-8075, 1095-9203},
	url = {https://www.science.org/doi/10.1126/science.abn2937},
	doi = {10.1126/science.abn2937},
	abstract = {Thousands of genomic regions have been associated with heritable human diseases, but attempts to elucidate biological mechanisms are impeded by an inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function, agnostic to cell type or disease mechanism. Single-base phyloP scores from 240 mammals identified 3.3\% of the human genome as significantly constrained and likely functional. We compared phyloP scores to genome annotation, association studies, copy-number variation, clinical genetics findings, and cancer data. Constrained positions are enriched for variants that explain common disease heritability more than other functional annotations. Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.

		INTRODUCTION
		Thousands of genetic variants have been associated with human diseases and traits through genome-wide association studies (GWASs). Translating these discoveries into improved therapeutics requires discerning which variants among hundreds of candidates are causally related to disease risk. To date, only a handful of causal variants have been confirmed. Here, we leverage 100 million years of mammalian evolution to address this major challenge.

		RATIONALE
		We compared genomes from hundreds of mammals and identified bases with unusually few variants (evolutionarily constrained). Constraint is a measure of functional importance that is agnostic to cell type or developmental stage. It can be applied to investigate any heritable disease or trait and is complementary to resources using cell type– and time point–specific functional assays like Encyclopedia of DNA Elements (ENCODE) and Genotype-Tissue Expression (GTEx).

		RESULTS
		Using constraint calculated across placental mammals, 3.3\% of bases in the human genome are significantly constrained, including 57.6\% of coding bases. Most constrained bases (80.7\%) are noncoding. Common variants (allele frequency ≥ 5\%) and low-frequency variants (0.5\% ≤ allele frequency {\textless} 5\%) are depleted for constrained bases (1.85 versus 3.26\% expected by chance, P {\textless} 2.2 × 10$^{-308}$). Pathogenic ClinVar variants are more constrained than benign variants (P {\textless} 2.2 × 10$^{-16}$).

		The most constrained common variants are more enriched for disease single-nucleotide polymorphism (SNP)–heritability in 63 independent GWASs. The enrichment of SNP-heritability in constrained regions is greater (7.8-fold) than previously reported in mammals and is even higher in primates (11.1-fold). It exceeds the enrichment of SNP-heritability in nonsynonymous coding variants (7.2-fold) and fine-mapped expression quantitative trait loci (eQTL)–SNPs (4.8-fold). The enrichment peaks near constrained bases, with a log-linear decrease of SNP-heritability enrichment as a function of the distance to a constrained base.

		Zoonomia constraint scores improve functionally informed fine-mapping. Variants at sites constrained in mammals and primates have greater posterior inclusion probabilities and higher per-SNP contributions. In addition, using both constraint and functional annotations improves polygenic risk score accuracy across a range of traits. Finally, incorporating constraint information into the analysis of noncoding somatic variants in medulloblastomas identifies new candidate driver genes.

		CONCLUSION
		Genome-wide measures of evolutionary constraint can help discern which variants are functionally important. This information may accelerate the translation of genomic discoveries into the biological, clinical, and therapeutic knowledge that is required to understand and treat human disease.

		Using evolutionary constraint in genomic studies of human diseases. (A) Constraint was calculated across 240 mammal species, including 43 primates (teal line). (B) Pathogenic ClinVar variants (N = 73,885) are more constrained across mammals than benign variants (N = 231,642; P {\textless} 2.2 × 10$^{-16}$). (C) More-constrained bases are more enriched for trait-associated variants (63 GWASs). (D) Enrichment of heritability is higher in constrained regions than in functional annotations (left), even in a joint model with 106 annotations (right). (E) Fine-mapping (PolyFun) using a model that includes constraint scores identifies an experimentally validated association at rs1421085. Error bars represent 95\% confidence intervals. BMI, body mass index; LF, low frequency; PIP, posterior inclusion probability.},
	language = {en},
	number = {6643},
	urldate = {2023-04-28},
	journal = {Science},
	author = {Sullivan, Patrick F. and Meadows, Jennifer R. S. and Gazal, Steven and Phan, BaDoi N. and Li, Xue and Genereux, Diane P. and Dong, Michael X. and Bianchi, Matteo and Andrews, Gregory and Sakthikumar, Sharadha and Nordin, Jessika and Roy, Ananya and Christmas, Matthew J. and Marinescu, Voichita D. and Wang, Chao and Wallerman, Ola and Xue, James and Yao, Shuyang and Sun, Quan and Szatkiewicz, Jin and Wen, Jia and Huckins, Laura M. and Lawler, Alyssa and Keough, Kathleen C. and Zheng, Zhili and Zeng, Jian and Wray, Naomi R. and Li, Yun and Johnson, Jessica and Chen, Jiawen and {Zoonomia Consortium§} and Paten, Benedict and Reilly, Steven K. and Hughes, Graham M. and Weng, Zhiping and Pollard, Katherine S. and Pfenning, Andreas R. and Forsberg-Nilsson, Karin and Karlsson, Elinor K. and Lindblad-Toh, Kerstin and Andrews, Gregory and Armstrong, Joel C. and Bianchi, Matteo and Birren, Bruce W. and Bredemeyer, Kevin R. and Breit, Ana M. and Christmas, Matthew J. and Clawson, Hiram and Damas, Joana and Di Palma, Federica and Diekhans, Mark and Dong, Michael X. and Eizirik, Eduardo and Fan, Kaili and Fanter, Cornelia and Foley, Nicole M. and Forsberg-Nilsson, Karin and Garcia, Carlos J. and Gatesy, John and Gazal, Steven and Genereux, Diane P. and Goodman, Linda and Grimshaw, Jenna and Halsey, Michaela K. and Harris, Andrew J. and Hickey, Glenn and Hiller, Michael and Hindle, Allyson G. and Hubley, Robert M. and Hughes, Graham M. and Johnson, Jeremy and Juan, David and Kaplow, Irene M. and Karlsson, Elinor K. and Keough, Kathleen C. and Kirilenko, Bogdan and Koepfli, Klaus-Peter and Korstian, Jennifer M. and Kowalczyk, Amanda and Kozyrev, Sergey V. and Lawler, Alyssa J. and Lawless, Colleen and Lehmann, Thomas and Levesque, Danielle L. and Lewin, Harris A. and Li, Xue and Lind, Abigail and Lindblad-Toh, Kerstin and Mackay-Smith, Ava and Marinescu, Voichita D. and Marques-Bonet, Tomas and Mason, Victor C. and Meadows, Jennifer R. S. and Meyer, Wynn K. 
and Moore, Jill E. and Moreira, Lucas R. and Moreno-Santillan, Diana D. and Morrill, Kathleen M. and Muntané, Gerard and Murphy, William J. and Navarro, Arcadi and Nweeia, Martin and Ortmann, Sylvia and Osmanski, Austin and Paten, Benedict and Paulat, Nicole S. and Pfenning, Andreas R. and Phan, BaDoi N. and Pollard, Katherine S. and Pratt, Henry E. and Ray, David A. and Reilly, Steven K. and Rosen, Jeb R. and Ruf, Irina and Ryan, Louise and Ryder, Oliver A. and Sabeti, Pardis C. and Schäffer, Daniel E. and Serres, Aitor and Shapiro, Beth and Smit, Arian F. A. and Springer, Mark and Srinivasan, Chaitanya and Steiner, Cynthia and Storer, Jessica M. and Sullivan, Kevin A. M. and Sullivan, Patrick F. and Sundström, Elisabeth and Supple, Megan A. and Swofford, Ross and Talbot, Joy-El and Teeling, Emma and Turner-Maier, Jason and Valenzuela, Alejandro and Wagner, Franziska and Wallerman, Ola and Wang, Chao and Wang, Juehan and Weng, Zhiping and Wilder, Aryn P. and Wirthlin, Morgan E. and Xue, James R. and Zhang, Xiaomeng},
	month = apr,
	year = {2023},
	keywords = {Zoonomia},
	pages = {eabn2937},
}
