Genomic basis of seed colour in quinoa inferred from variant patterns using extreme gradient boosting. Sandell, F. L., Holzweber, T., Street, N. R., Dohm, J. C., & Himmelbauer, H. Plant Biotechnology Journal, 22(5):1312–1324, 2024.
Quinoa is an agriculturally important crop species originally domesticated in the Andes of central South America. One of its most important phenotypic traits is seed colour. Seed colour variation is determined by contrasting abundance of betalains, a class of strong antioxidant and free radicals scavenging colour pigments only found in plants of the order Caryophyllales. However, the genetic basis for these pigments in seeds remains to be identified. Here we demonstrate the application of machine learning (extreme gradient boosting) to identify genetic variants predictive of seed colour. We show that extreme gradient boosting outperforms the classical genome-wide association approach. We provide re-sequencing and phenotypic data for 156 South American quinoa accessions and identify candidate genes potentially controlling betalain content in quinoa seeds. Genes identified include novel cytochrome P450 genes and known members of the betalain synthesis pathway, as well as genes annotated as being involved in seed development. Our work showcases the power of modern machine learning methods to extract biologically meaningful information from large sequencing data sets.
@article{sandell_genomic_2024,
	title = {Genomic basis of seed colour in quinoa inferred from variant patterns using extreme gradient boosting},
	volume = {22},
	copyright = {© 2023 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley \& Sons Ltd.},
	issn = {1467-7652},
	url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/pbi.14267},
	doi = {10.1111/pbi.14267},
	abstract = {Quinoa is an agriculturally important crop species originally domesticated in the Andes of central South America. One of its most important phenotypic traits is seed colour. Seed colour variation is determined by contrasting abundance of betalains, a class of strong antioxidant and free radicals scavenging colour pigments only found in plants of the order Caryophyllales. However, the genetic basis for these pigments in seeds remains to be identified. Here we demonstrate the application of machine learning (extreme gradient boosting) to identify genetic variants predictive of seed colour. We show that extreme gradient boosting outperforms the classical genome-wide association approach. We provide re-sequencing and phenotypic data for 156 South American quinoa accessions and identify candidate genes potentially controlling betalain content in quinoa seeds. Genes identified include novel cytochrome P450 genes and known members of the betalain synthesis pathway, as well as genes annotated as being involved in seed development. Our work showcases the power of modern machine learning methods to extract biologically meaningful information from large sequencing data sets.},
	language = {en},
	number = {5},
	urldate = {2024-04-19},
	journal = {Plant Biotechnology Journal},
	author = {Sandell, Felix L. and Holzweber, Thomas and Street, Nathaniel R. and Dohm, Juliane C. and Himmelbauer, Heinz},
	year = {2024},
	note = {\_eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/pbi.14267},
	keywords = {betalain synthesis pathway, genome sequencing, genotype-phenotype relationships, machine learning, quinoa, seed colour},
	pages = {1312--1324},
}
