{"_id":"GJRo79X65S98gmxEP","bibbaseid":"airola-pyysalo-bjrne-pahikkala-ginter-salakoski-allpathsgraphkernelforproteinproteininteractionextractionwithevaluationofcrosscorpuslearning-2008","authorIDs":[],"author_short":["Airola, A.","Pyysalo, S.","Björne, J.","Pahikkala, T.","Ginter, F.","Salakoski, T."],"bibdata":{"title":"All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning","type":"article","year":"2008","keywords":"algorithms,artificial intelligence,computational biology,computational biology methods,databases topic,natural language processing,protein interaction mapping,protein interaction mapping methods","pages":"S2","volume":"9","websites":"http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2586751&tool=pmcentrez&rendertype=abstract","publisher":"BioMed Central","institution":"Turku Centre for Computer Science (TUCS) and the Department of IT, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland. antti.airola@utu.fi","id":"a8093e2c-dd5e-3195-a260-a59dc37f1ba9","created":"2012-04-01T16:32:49.000Z","file_attached":"true","profile_id":"5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6","group_id":"066b42c8-f712-3fc3-abb2-225c158d2704","last_modified":"2017-03-14T14:36:19.698Z","read":false,"starred":false,"authored":false,"confirmed":"true","hidden":false,"citation_key":"Airola2008","private_publication":false,"abstract":"Background: Automated extraction of protein-protein interactions (PPI) is an important and widely studied task in biomedical text mining. We propose a graph kernel based approach for this task. In contrast to earlier approaches to PPI extraction, the introduced all-paths graph kernel has the capability to make use of full, general dependency graphs representing the sentence structure. Results: We evaluate the proposed method on five publicly available PPI corpora, providing the most comprehensive evaluation done for a machine learning based PPI-extraction system. We additionally perform a detailed evaluation of the effects of training and testing on different resources, providing insight into the challenges involved in applying a system beyond the data it was trained on. Our method is shown to achieve state-of-the-art performance with respect to comparable evaluations, with 56.4 F-score and 84.8 AUC on the AImed corpus. Conclusion: We show that the graph kernel approach performs on state-of-the-art level in PPI extraction, and note the possible extension to the task of extracting complex interactions. Cross-corpus results provide further insight into how the learning generalizes beyond individual corpora. Further, we identify several pitfalls that can make evaluations of PPI-extraction systems incomparable, or even invalid. These include incorrect cross-validation strategies and problems related to comparing F-score results achieved on different evaluation resources. Recommendations for avoiding these pitfalls are provided.","bibtype":"article","author":"Airola, Antti and Pyysalo, Sampo and Björne, Jari and Pahikkala, Tapio and Ginter, Filip and Salakoski, Tapio","journal":"BMC Bioinformatics","number":"Suppl 11","bibtex":"@article{\n title = {All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning},\n type = {article},\n year = {2008},\n keywords = {algorithms,artificial intelligence,computational biology,computational biology methods,databases topic,natural language processing,protein interaction mapping,protein interaction mapping methods},\n pages = {S2},\n volume = {9},\n websites = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2586751&tool=pmcentrez&rendertype=abstract},\n publisher = {BioMed Central},\n institution = {Turku Centre for Computer Science (TUCS) and the Department of IT, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland. antti.airola@utu.fi},\n id = {a8093e2c-dd5e-3195-a260-a59dc37f1ba9},\n created = {2012-04-01T16:32:49.000Z},\n file_attached = {true},\n profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},\n group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},\n last_modified = {2017-03-14T14:36:19.698Z},\n read = {false},\n starred = {false},\n authored = {false},\n confirmed = {true},\n hidden = {false},\n citation_key = {Airola2008},\n private_publication = {false},\n abstract = {Background: Automated extraction of protein-protein interactions (PPI) is an important and widely studied task in biomedical text mining. We propose a graph kernel based approach for this task. In contrast to earlier approaches to PPI extraction, the introduced all-paths graph kernel has the capability to make use of full, general dependency graphs representing the sentence structure. Results: We evaluate the proposed method on five publicly available PPI corpora, providing the most comprehensive evaluation done for a machine learning based PPI-extraction system. We additionally perform a detailed evaluation of the effects of training and testing on different resources, providing insight into the challenges involved in applying a system beyond the data it was trained on. Our method is shown to achieve state-of-the-art performance with respect to comparable evaluations, with 56.4 F-score and 84.8 AUC on the AImed corpus. Conclusion: We show that the graph kernel approach performs on state-of-the-art level in PPI extraction, and note the possible extension to the task of extracting complex interactions. Cross-corpus results provide further insight into how the learning generalizes beyond individual corpora. Further, we identify several pitfalls that can make evaluations of PPI-extraction systems incomparable, or even invalid. These include incorrect cross-validation strategies and problems related to comparing F-score results achieved on different evaluation resources. Recommendations for avoiding these pitfalls are provided.},\n bibtype = {article},\n author = {Airola, Antti and Pyysalo, Sampo and Björne, Jari and Pahikkala, Tapio and Ginter, Filip and Salakoski, Tapio},\n journal = {BMC Bioinformatics},\n number = {Suppl 11}\n}","author_short":["Airola, A.","Pyysalo, S.","Björne, J.","Pahikkala, T.","Ginter, F.","Salakoski, T."],"urls":{"Paper":"https://bibbase.org/service/mendeley/bfdabac2-d7f2-3c5b-aa7a-06431c0ae35e/file/54f8f3f4-a930-409f-0593-6f8b030817f0/2008-All-paths_graph_kernel_for_protein-protein_interaction_extraction_with_evaluation_of_cross-corpus_learning.pdf.pdf","Website":"http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2586751&tool=pmcentrez&rendertype=abstract"},"bibbaseid":"airola-pyysalo-bjrne-pahikkala-ginter-salakoski-allpathsgraphkernelforproteinproteininteractionextractionwithevaluationofcrosscorpuslearning-2008","role":"author","keyword":["algorithms","artificial intelligence","computational biology","computational biology methods","databases topic","natural language processing","protein interaction mapping","protein interaction mapping methods"],"downloads":0,"html":""},"bibtype":"article","creationDate":"2020-02-06T23:48:12.126Z","downloads":0,"keywords":["algorithms","artificial intelligence","computational biology","computational biology methods","databases topic","natural language processing","protein interaction mapping","protein interaction mapping methods"],"search_terms":["paths","graph","kernel","protein","protein","interaction","extraction","evaluation","cross","corpus","learning","airola","pyysalo","björne","pahikkala","ginter","salakoski"],"title":"All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning","year":2008}