Accurate prediction of protein structural class. Xia, X., Ge, M., Wang, Z., & Pan, X. PLoS ONE, 7(6):e37653, 2012.
Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and then feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% when tested on the widely used non-homologous benchmark dataset 25PDB. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain to a 4-dimensional (4D) structural feature vector, which could then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all existing sequence-based methods, with an overall accuracy of 83.1%, which is even higher than the results of methods based on predicted secondary structure.
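The mapping described in the abstract, from a 440D sequence feature vector to a 4D vector used to pick one of four classes, can be illustrated with a minimal sketch. This is not the authors' pipeline: the data below are random synthetic stand-ins rather than PSSM-derived features, and the one-hot regression targets, the intercept column, the argmax decision rule, and the names `fit_mlr` and `predict` are all assumptions made for illustration.

```python
import numpy as np

def fit_mlr(X, labels, n_classes=4):
    """Least-squares fit of W so that [X, 1] @ W approximates one-hot class targets."""
    Y = np.eye(n_classes)[labels]                    # (n, 4) one-hot targets (assumed encoding)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # append an intercept column
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)       # (d + 1, 4) regression coefficients
    return W

def predict(W, X):
    """Map each feature vector to a 4D score vector and take the largest component."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)                 # assumed decision rule

# Synthetic stand-in data: 4 classes, 440 features per sample (not real PSSM features).
rng = np.random.default_rng(0)
n, d = 2000, 440
labels = rng.integers(0, 4, size=n)
centers = rng.normal(size=(4, d))
X = centers[labels] + rng.normal(scale=1.0, size=(n, d))

# Simple hold-out split; the paper reports 10-fold cross-validation and jackknife tests.
W = fit_mlr(X[:1500], labels[:1500])
pred = predict(W, X[1500:])
accuracy = float(np.mean(pred == labels[1500:]))
```

The accuracy obtained here says nothing about the reported 83.1%; it only shows that the 440D-to-4D linear map plus a per-sample maximum is enough to form a four-class predictor.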
@article{xia_accurate_2012,
title = {Accurate prediction of protein structural class},
volume = {7},
doi = {10.1371/journal.pone.0037653},
abstract = {Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and then feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1\% when tested on the widely used non-homologous benchmark dataset 25PDB. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain to a 4-dimensional (4D) structural feature vector, which could then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all existing sequence-based methods, with an overall accuracy of 83.1\%, which is even higher than the results of methods based on predicted secondary structure.},
language = {eng},
number = {6},
journal = {PLoS ONE},
author = {Xia, Xia-Yu and Ge, Meng and Wang, Zhi-Xin and Pan, Xian-Ming},
year = {2012},
pmid = {22723837},
pages = {e37653},
}