Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster

Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster. Ober, U., [...], Schlather, M., Mackay, T. F. C., & Simianer, H. PLoS Genetics, 8(5):e1002685, 2012.

Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied for single-nucleotide polymorphism (SNP) genotyping platforms, but not using complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ~2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability as the correlation between predicted genetic values and observed phenotypes by cross-validation, and found a predictive ability of 0.239 ± 0.008 (0.230 ± 0.012) for starvation resistance (startle response). The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than GBLUP. Selection of the 5% SNPs with either the highest absolute effect or variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium and not from population structure or SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.

The ability to accurately predict values of complex phenotypes from genotype data will revolutionize plant and animal breeding, personalized medicine, and evolutionary biology. To date, genomic prediction has utilized high-density single-nucleotide polymorphism (SNP) genotyping arrays, but the availability of sequence data opens new frontiers for genomic prediction methods. This article is the first application of genomic phenotype prediction using whole-genome sequence data in a substantial sample of a higher eukaryote. We use ~2.5 million SNPs with minor allele frequency greater than 2.5% derived from genomic sequences of the "Drosophila Genetic Reference Panel" to predict phenotypes for two traits, starvation resistance and startle-induced locomotor behavior. We systematically address prediction within versus across sexes, genomic best linear unbiased prediction (GBLUP) versus a Bayesian approach, and the effect of SNP density. We find that (i) genomic prediction can be efficiently implemented using sequence data via GBLUP, (ii) there is little gain in predictive ability if the number of SNPs is increased above 150,000, and (iii) neither implicit nor explicit marker selection substantially improves the predictive ability. Although the findings must be seen against the background of small sample sizes, the results illustrate both the potential of the approach and the challenges ahead.
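
The pipeline the abstract describes (a genomic relationship matrix built from SNPs, a GBLUP model, and predictive ability measured by cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give these details, so everything here is an assumption: the VanRaden-style scaling of the relationship matrix, the 0/2 genotype coding for inbred lines, the fixed heritability `h2` (a real analysis would typically estimate variance components, e.g. by REML), the five-fold split, and all function names.

```python
import numpy as np

def genomic_relationship_matrix(M):
    """Build G from an (n_lines x n_snps) genotype matrix coded 0/2.

    Assumed VanRaden-style scaling: center each SNP by twice its allele
    frequency, then divide by 2 * sum(p * (1 - p)).
    """
    p = M.mean(axis=0) / 2.0              # allele frequency per SNP
    Z = M - 2.0 * p                       # centered genotypes
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train, test, h2=0.5):
    """Predict genetic values of test lines from the training lines.

    h2 is an assumed, fixed heritability; it sets the shrinkage
    lambda = (1 - h2) / h2 applied to the relationship matrix.
    """
    lam = (1.0 - h2) / h2
    mu = y[train].mean()                  # intercept from training data only
    G_tt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G[np.ix_(test, train)] @ alpha

def cv_predictive_ability(G, y, k=5, seed=0):
    """Mean correlation between predictions and observed phenotypes over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cors = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = gblup_predict(G, y, train, test)
        cors.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(cors))

if __name__ == "__main__":
    # Synthetic stand-in data; not the Drosophila Genetic Reference Panel.
    rng = np.random.default_rng(42)
    M = 2.0 * rng.binomial(1, 0.3, size=(150, 2000))
    effects = rng.normal(scale=0.05, size=2000)
    y = M @ effects + rng.normal(size=150)
    G = genomic_relationship_matrix(M)
    print("predictive ability:", cv_predictive_ability(G, y))
```

Because the synthetic lines are simulated as unrelated, the printed correlation says nothing about the values reported in the paper. The sketch only shows how the pieces fit together.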

@article{Ober2012Using,
 abstract = {Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied for single-nucleotide polymorphism (SNP) genotyping platforms, but not using complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using $\sim$2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability as the correlation between predicted genetic values and observed phenotypes by cross-validation, and found a predictive ability of 0.239$\pm$0.008 (0.230$\pm$0.012) for starvation resistance (startle response). The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than GBLUP. Selection of the 5{\%} SNPs with either the highest absolute effect or variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP--based modeling of the subtle relationship structure caused by long-range linkage disequilibrium and not from population structure or SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms. The ability to accurately predict values of complex phenotypes from genotype data will revolutionize plant and animal breeding, personalized medicine, and evolutionary biology. To date, genomic prediction has utilized high-density single-nucleotide polymorphism (SNP) genotyping arrays, but the availability of sequence data opens new frontiers for genomic prediction methods. 
This article is the first application of genomic phenotype prediction using whole-genome sequence data in a substantial sample of a higher eukaryote. We use $\sim$2.5 million SNPs with minor allele frequency greater than 2.5{\%} derived from genomic sequences of the ``Drosophila Genetic Reference Panel'' to predict phenotypes for two traits, starvation resistance and startle-induced locomotor behavior. We systematically address prediction within versus across sexes, genomic best linear unbiased prediction (GBLUP) versus a Bayesian approach, and the effect of SNP density. We find that (i) genomic prediction can be efficiently implemented using sequence data via GBLUP, (ii) there is little gain in predictive ability if the number of SNPs is increased above 150,000, and (iii) neither implicit nor explicit marker selection substantially improves the predictive ability. Although the findings must be seen against the background of small sample sizes, the results illustrate both the potential of the approach and the challenges ahead.},
 author = {Ober, Ulrike and {[...]} and Schlather, Martin and Mackay, Trudy F. C. and Simianer, Henner},
 year = {2012},
 title = {Using whole-genome sequence data to predict quantitative trait phenotypes in \textit{Drosophila melanogaster}},
 url = {http://dx.doi.org/10.1371/journal.pgen.1002685},
 keywords = {gen;phd},
 pages = {e1002685},
 volume = {8},
 number = {5},
 journal = {PLoS Genetics},
 doi = {10.1371/journal.pgen.1002685},
 howpublished = {refereed}
}
