Presenting a classifier to improve the identification of research journal publications in OpenAlex. Haupka, N. Scientometrics, January, 2026.
Abstract: This paper introduces a document type classifier with the purpose to optimise the distinction between research and non-research journal publications in OpenAlex. Based on open metadata, the classifier can identify non-research or editorial content within a set of classified articles and reviews (e.g. paratext, abstracts, editorials, letters), which for example is relevant for bibliometric studies, university rankings and academic procedures. In this respect, OpenAlex shows issues in classifying journal research contributions, tending to overestimate them compared to other databases, due to the promotion of non-research items to research items. The classifier presented in this study achieves an F1-score of 0.95, indicating a potential improvement in the data quality of bibliometric research in OpenAlex when applying the classifier on real data. In total, 4,589,967 out of 42,701,863 articles and reviews could be reclassified as non-research contributions by the classifier, representing a share of 10.75%.
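The reclassified share quoted in the abstract can be checked directly from the two counts it gives. The snippet below is only an arithmetic sketch: the variable names and the `share_percent` helper are illustrative assumptions and are not taken from the paper or its classifier.

```python
# Counts quoted in the abstract.
reclassified = 4_589_967   # items the classifier flags as non-research
total = 42_701_863         # articles and reviews examined


def share_percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Items that would remain classified as research contributions.
remaining = total - reclassified

print(share_percent(reclassified, total))  # 10.75
print(remaining)
```

The result matches the 10.75% stated in the abstract.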
@article{haupka_presenting_2026,
title = {Presenting a classifier to improve the identification of research journal publications in {OpenAlex}},
issn = {0138-9130, 1588-2861},
url = {https://link.springer.com/10.1007/s11192-025-05524-7},
doi = {10.1007/s11192-025-05524-7},
abstract = {This paper introduces a document type classifier with the purpose to optimise the distinction between research and non-research journal publications in OpenAlex. Based on open metadata, the classifier can identify non-research or editorial content within a set of classified articles and reviews (e.g. paratext, abstracts, editorials, letters), which for example is relevant for bibliometric studies, university rankings and academic procedures. In this respect, OpenAlex shows issues in classifying journal research contributions, tending to overestimate them compared to other databases, due to the promotion of non-research items to research items. The classifier presented in this study achieves an F1-score of 0.95, indicating a potential improvement in the data quality of bibliometric research in OpenAlex when applying the classifier on real data. In total, 4,589,967 out of 42,701,863 articles and reviews could be reclassified as non-research contributions by the classifier, representing a share of 10.75\%.},
language = {en},
urldate = {2026-01-21},
journal = {Scientometrics},
author = {Haupka, Nick},
month = jan,
year = {2026},
}