BioInfer: a corpus for information extraction in the biomedical domain. Pyysalo, S., Ginter, F., Heimonen, J., Björne, J., Boberg, J., Järvinen, J., & Salakoski, T. BMC Bioinformatics, 8(1):50, BioMed Central, 2007.
Background: Lately, there has been a great interest in the application of information extraction methods to the biomedical domain, in particular, to the extraction of relationships of genes, proteins, and RNA from scientific publications. The development and evaluation of such methods requires annotated domain corpora. Results: We present BioInfer (Bio Information Extraction Resource), a new public resource providing an annotated corpus of biomedical English. We describe an annotation scheme capturing named entities and their relationships along with a dependency analysis of sentence syntax. We further present ontologies defining the types of entities and relationships annotated in the corpus. Currently, the corpus contains 1100 sentences from abstracts of biomedical research articles annotated for relationships, named entities, as well as syntactic dependencies. Supporting software is provided with the corpus. The corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of the relationship annotation. Conclusion: We introduce a corpus targeted at protein, gene, and RNA relationships which serves as a resource for the development of information extraction systems and their components such as parsers and domain analyzers. The corpus will be maintained and further developed with a current version being available at .
@article{Pyysalo2007,
title = {BioInfer: a corpus for information extraction in the biomedical domain},
type = {article},
year = {2007},
keywords = {database management systems,databases,documentation,documentation methods,factual,genes,information storage retrieval,information storage retrieval methods,natural language processing,periodicals topic,proteins,proteins classification,rna,rna classification,terminology topic},
pages = {50},
volume = {8},
websites = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1808065&tool=pmcentrez&rendertype=abstract},
publisher = {BioMed Central},
institution = {Turku Centre for Computer Science (TUCS), University of Turku, Lemminkäisenkatu 14a, 20520 Turku, Finland. sampo.pyysalo@it.utu.fi},
id = {6ba597df-16a8-31f7-9731-d8035f1d1a41},
created = {2011-12-29T19:53:53.000Z},
file_attached = {true},
profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},
group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},
last_modified = {2017-03-14T14:36:19.698Z},
read = {false},
starred = {false},
authored = {false},
confirmed = {true},
hidden = {false},
citation_key = {Pyysalo2007},
private_publication = {false},
abstract = {Background: Lately, there has been a great interest in the application of information extraction methods to the biomedical domain, in particular, to the extraction of relationships of genes, proteins, and RNA from scientific publications. The development and evaluation of such methods requires annotated domain corpora. Results: We present BioInfer (Bio Information Extraction Resource), a new public resource providing an annotated corpus of biomedical English. We describe an annotation scheme capturing named entities and their relationships along with a dependency analysis of sentence syntax. We further present ontologies defining the types of entities and relationships annotated in the corpus. Currently, the corpus contains 1100 sentences from abstracts of biomedical research articles annotated for relationships, named entities, as well as syntactic dependencies. Supporting software is provided with the corpus. The corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of the relationship annotation. Conclusion: We introduce a corpus targeted at protein, gene, and RNA relationships which serves as a resource for the development of information extraction systems and their components such as parsers and domain analyzers. The corpus will be maintained and further developed with a current version being available at .},
bibtype = {article},
author = {Pyysalo, Sampo and Ginter, Filip and Heimonen, Juho and Björne, Jari and Boberg, Jorma and Järvinen, Jouni and Salakoski, Tapio},
journal = {BMC Bioinformatics},
number = {1}
}