Finna: A paragraph prioritization system for biocuration in the neurosciences

Finna: A paragraph prioritization system for biocuration in the neurosciences. Ambert, K., Cohen, A., Burns, G., Boudreau, E., & Sonmez, K. In AAAI Fall Symposium - Technical Report, volume FS-13-01, 2013.

The emphasis on multilevel modeling techniques in the neurosciences has led to an increased need for large-scale, computationally accessible databases containing neuroscientific data. Despite this, such databases are not being populated at a rate commensurate with the demand for them among neuroinformaticians. The reasons for this are common to scientific database curation in general, namely, limited resources. Much of neuroscience's long tradition of research has been documented in computationally inaccessible formats, such as PDF, making large-scale data extraction laborious and expensive. Here, we present a system for alleviating one bottleneck in the workflow for curating a typical knowledge base of neuroscience-related information. Finna is designed to rank-order the constituent paragraphs of a publication that is predicted to contain information relevant to a knowledge base, in terms of the probability that each paragraph documents relevant data. We achieved excellent performance with our classifier (AUC > 0.90) on our manually curated neuroscience document corpus. Our approach would allow curators to read only a median of 2 paragraphs per document in order to identify information relevant to a neuron-related knowledge base. To our knowledge, this is the first system of its kind, and it will be a useful baseline for developing similar resources for the neurosciences and for curation in general. Copyright © 2013, Association for the Advancement of Artificial Intelligence. All rights reserved.
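
The ranking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-paragraph scores, and the relevance labels below are all hypothetical, and the scores stand in for whatever probabilities a trained classifier would produce. The sketch only shows how paragraphs might be ordered by predicted relevance and how a "median paragraphs read" figure could be computed from such a ranking.

```python
from statistics import median


def rank_paragraphs(scores):
    """Return paragraph indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def paragraphs_read_until_hit(scores, labels):
    """Count the paragraphs read, in ranked order, up to and including
    the first relevant one (label 1). Returns None if none is relevant."""
    for n_read, idx in enumerate(rank_paragraphs(scores), start=1):
        if labels[idx] == 1:
            return n_read
    return None


# Hypothetical documents: (predicted relevance per paragraph, true labels).
# These numbers are invented for illustration only.
docs = [
    ([0.10, 0.85, 0.40, 0.05], [0, 1, 0, 0]),
    ([0.70, 0.20, 0.65, 0.30], [0, 0, 1, 0]),
    ([0.15, 0.25, 0.90, 0.60, 0.05], [0, 0, 0, 1, 0]),
]

reads = [paragraphs_read_until_hit(s, l) for s, l in docs]
median_reads = median(reads)
print(reads, median_reads)
```

With a ranking like this, a curator would start at the top-scored paragraph of each document and stop at the first one that contains usable data; the median over documents summarizes the typical reading effort.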

@inProceedings{ambert2013finna,
 title = {Finna: A paragraph prioritization system for biocuration in the neurosciences},
 type = {inProceedings},
 year = {2013},
 volume = {FS-13-01},
 abstract = {The emphasis on multilevel modeling techniques in the neurosciences has led to an increased need for large-scale, computationally accessible databases containing neuroscientific data. Despite this, such databases are not being populated at a rate commensurate with the demand for them among neuroinformaticians. The reasons for this are common to scientific database curation in general, namely, limited resources. Much of neuroscience's long tradition of research has been documented in computationally inaccessible formats, such as PDF, making large-scale data extraction laborious and expensive. Here, we present a system for alleviating one bottleneck in the workflow for curating a typical knowledge base of neuroscience-related information. Finna is designed to rank-order the constituent paragraphs of a publication that is predicted to contain information relevant to a knowledge base, in terms of the probability that each paragraph documents relevant data. We achieved excellent performance with our classifier (AUC > 0.90) on our manually curated neuroscience document corpus. Our approach would allow curators to read only a median of 2 paragraphs per document in order to identify information relevant to a neuron-related knowledge base. To our knowledge, this is the first system of its kind, and it will be a useful baseline for developing similar resources for the neurosciences and for curation in general. Copyright © 2013, Association for the Advancement of Artificial Intelligence. All rights reserved.},
 bibtype = {inProceedings},
 author = {Ambert, K.H. and Cohen, A.M. and Burns, G.A.P.C. and Boudreau, E. and Sonmez, K.},
 booktitle = {AAAI Fall Symposium - Technical Report}
}
