Data-driven classification of the certainty of scholarly assertions

Data-driven classification of the certainty of scholarly assertions. Prieto, M., Deus, H., de Waard, A., Schultes, E., García-Jiménez, B., & Wilkinson, M. D. PeerJ, 8:e8871, April 2020.


The grammatical structures scholars use to express their assertions are intended to convey various degrees of certainty or speculation. Prior studies have suggested a variety of categorization systems for scholarly certainty; however, these have not been objectively tested for their validity, particularly with respect to representing the interpretation by the reader, rather than the intention of the author. In this study, we use a series of questionnaires to determine how researchers classify various scholarly assertions, using three distinct certainty classification systems. We find that there are three distinct categories of certainty along a spectrum from high to low. We show that these categories can be detected in an automated manner, using a machine learning model, with a cross-validation accuracy of 89.2% relative to an author-annotated corpus, and 82.2% accuracy against a publicly-annotated corpus. This finding provides an opportunity for contextual metadata related to certainty to be captured as a part of text-mining pipelines, which currently miss these subtle linguistic cues. We provide an exemplar machine-accessible representation—a Nanopublication—where certainty category is embedded as metadata in a formal, ontology-based manner within text-mined scholarly assertions.

@article{Prieto2020,
 title = {Data-driven classification of the certainty of scholarly assertions},
 type = {article},
 year = {2020},
 pages = {e8871},
 volume = {8},
 websites = {https://peerj.com/articles/8871},
 month = {4},
 day = {21},
 id = {0b32f8c2-f0a8-3480-bb74-cde89e8d0992},
 created = {2020-05-12T12:13:44.995Z},
 file_attached = {false},
 profile_id = {17c87d5d-2470-32d7-b273-0734a1d9195f},
 last_modified = {2020-07-10T11:55:06.849Z},
 read = {false},
 starred = {false},
 authored = {true},
 confirmed = {true},
 hidden = {false},
 citation_key = {Prieto2020},
 private_publication = {false},
 abstract = {The grammatical structures scholars use to express their assertions are intended to convey various degrees of certainty or speculation. Prior studies have suggested a variety of categorization systems for scholarly certainty; however, these have not been objectively tested for their validity, particularly with respect to representing the interpretation by the reader, rather than the intention of the author. In this study, we use a series of questionnaires to determine how researchers classify various scholarly assertions, using three distinct certainty classification systems. We find that there are three distinct categories of certainty along a spectrum from high to low. We show that these categories can be detected in an automated manner, using a machine learning model, with a cross-validation accuracy of 89.2% relative to an author-annotated corpus, and 82.2% accuracy against a publicly-annotated corpus. This finding provides an opportunity for contextual metadata related to certainty to be captured as a part of text-mining pipelines, which currently miss these subtle linguistic cues. We provide an exemplar machine-accessible representation—a Nanopublication—where certainty category is embedded as metadata in a formal, ontology-based manner within text-mined scholarly assertions.},
 bibtype = {article},
 author = {Prieto, Mario and Deus, Helena and de Waard, Anita and Schultes, Erik and García-Jiménez, Beatriz and Wilkinson, Mark D.},
 doi = {10.7717/peerj.8871},
 journal = {PeerJ}
}

