Single-sequence protein structure prediction using supervised transformer protein language models. Wang, W., Peng, Z., & Yang, J. Technical Report January, 2022. Company: Cold Spring Harbor Laboratory Distributor: Cold Spring Harbor Laboratory Label: Cold Spring Harbor Laboratory Section: New Results Type: article
Single-sequence protein structure prediction using supervised transformer protein language models [link]Paper  doi  abstract   bibtex   
It remains challenging for single-sequence protein structure prediction with AlphaFold2 and other deep learning methods. In this work, we introduce trRosettaX-Single, a novel algorithm for singlesequence protein structure prediction. It is built on sequence embedding from s-ESM-1b, a supervised transformer protein language model optimized from the pre-trained model ESM-1b. The sequence embedding is fed into a multi-scale network with knowledge distillation to predict inter-residue 2D geometry, including distance and orientations. The predicted 2D geometry is then used to reconstruct 3D structure models based on energy minimization. Benchmark tests show that trRosettaX-Single outperforms AlphaFold2 and RoseTTAFold on natural proteins. For instance, with single-sequence input, trRosettaX-Single generates structure models with an average TM-score \textasciitilde0.5 on 77 CASP14 domains, significantly higher than AlphaFold2 (0.35) and RoseTTAFold (0.34). Further test on 101 human-designed proteins indicates that trRosettaX-Single works very well, with accuracy (average TM-score 0.77) approaching AlphaFold2 and higher than RoseTTAFold, but using much less computing resource. On 2000 designed proteins from network hallucination, trRosettaX-Single generates structure models highly consistent to the hallucinated ones. These data suggest that trRosettaX-Single may find immediate applications in de novo protein design and related studies. trRosettaX-Single is available through the trRosetta server at: http://yanglab.nankai.edu.cn/trRosetta/.
@techreport{wang_single-sequence_2022,
	title = {Single-sequence protein structure prediction using supervised transformer protein language models},
	copyright = {© 2022, Posted by Cold Spring Harbor Laboratory. The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.},
	url = {https://www.biorxiv.org/content/10.1101/2022.01.15.476476v1},
	abstract = {It remains challenging for single-sequence protein structure prediction with AlphaFold2 and other deep learning methods. In this work, we introduce trRosettaX-Single, a novel algorithm for singlesequence protein structure prediction. It is built on sequence embedding from s-ESM-1b, a supervised transformer protein language model optimized from the pre-trained model ESM-1b. The sequence embedding is fed into a multi-scale network with knowledge distillation to predict inter-residue 2D geometry, including distance and orientations. The predicted 2D geometry is then used to reconstruct 3D structure models based on energy minimization. Benchmark tests show that trRosettaX-Single outperforms AlphaFold2 and RoseTTAFold on natural proteins. For instance, with single-sequence input, trRosettaX-Single generates structure models with an average TM-score {\textasciitilde}0.5 on 77 CASP14 domains, significantly higher than AlphaFold2 (0.35) and RoseTTAFold (0.34). Further test on 101 human-designed proteins indicates that trRosettaX-Single works very well, with accuracy (average TM-score 0.77) approaching AlphaFold2 and higher than RoseTTAFold, but using much less computing resource. On 2000 designed proteins from network hallucination, trRosettaX-Single generates structure models highly consistent to the hallucinated ones. These data suggest that trRosettaX-Single may find immediate applications in de novo protein design and related studies. trRosettaX-Single is available through the trRosetta server at: http://yanglab.nankai.edu.cn/trRosetta/.},
	language = {en},
	urldate = {2022-01-25},
	author = {Wang, Wenkai and Peng, Zhenling and Yang, Jianyi},
	month = jan,
	year = {2022},
	doi = {10.1101/2022.01.15.476476},
	note = {Company: Cold Spring Harbor Laboratory
Distributor: Cold Spring Harbor Laboratory
Label: Cold Spring Harbor Laboratory
Section: New Results
Type: article},
	pages = {2022.01.15.476476},
}

Downloads: 0