Completeness degree of publication metadata in eight free-access scholarly databases. Delgado-Quirós, L. & Ortega, J. L. Quantitative Science Studies, 5(1):31–49, 2024. QID: Q128433986 OpenAlex: W4391658149 CorpusID: 267580653
@article{delgado-quiros_completeness_2024,
	title = {Completeness degree of publication metadata in eight free-access scholarly databases},
	volume = {5},
	copyright = {CC BY 4.0},
	issn = {2641-3337},
	url = {https://doi.org/10.1162/qss_a_00286},
	doi = {10.1162/qss_a_00286},
	abstract = {The main objective of this study is to compare the amount of metadata and the completeness degree of research publications in new academic databases. Using a quantitative approach, we selected a random Crossref sample of more than 115,000 records, which was then searched in seven databases (Dimensions, Google Scholar, Microsoft Academic, OpenAlex, Scilit, Semantic Scholar, and The Lens). Seven characteristics were analyzed (abstract, access, bibliographic info, document type, publication date, language, and identifiers), to observe fields that describe this information, the completeness rate of these fields, and the agreement among databases. The results show that academic search engines (Google Scholar, Microsoft Academic, and Semantic Scholar) gather less information and have a low degree of completeness. Conversely, third-party databases (Dimensions, OpenAlex, Scilit, and The Lens) have more metadata quality and a higher completeness rate. We conclude that academic search engines lack the ability to retrieve reliable descriptive data by crawling the web, and the main problem of third-party databases is the loss of information derived from integrating different sources.},
	language = {en},
	number = {1},
	urldate = {2024-10-30},
	journal = {Quantitative Science Studies},
	author = {Delgado-Quirós, Lorena and Ortega, José Luis},
	year = {2024},
	note = {QID: Q128433986
OpenAlex: W4391658149
CorpusID: 267580653},
	keywords = {zfrancophone\_wikidata},
	pages = {31--49},
}
