Use of LLMs to Improve Affiliation Disambiguation in Alexandria3k

Use of LLMs to Improve Affiliation Disambiguation in Alexandria3k. Gupta, D. 2024.

The growth of academic publications, heterogeneity of datasets and the absence of a globally accepted organization identifier introduce the challenge of affiliation disambiguation in bibliographic databases. In this paper, we create a baseline using the currently implemented algorithm for author affiliation linkage in Alexandria3k by comparing it to the ground truth. We aim to explore the usage of LLMs (GPT-4) in the Alexandria3k environment to disambiguate author affiliations. The proposed approach extracts the research organization from textual affiliations provided by researchers through their published works and cross-references the organization across the Research Organization Registry. Our process shows promising results and a significant improvement on the existing algorithm in terms of matching rate and identification of multiple affiliations. We discuss the margin of error in LLM results, limitations of the ground truth, and suggest future research directions.

@article{gupta_use_2024,
	title = {Use of {LLMs} to {Improve} {Affiliation} {Disambiguation} in {Alexandria3k}},
	url = {https://repository.tudelft.nl/islandora/object/uuid%3Adde430e8-d0e0-4d63-9785-ed442e3574bd},
	abstract = {The growth of academic publications, heterogeneity of datasets and the absence of a globally accepted organization identifier introduce the challenge of affiliation disambiguation in bibliographic databases. In this paper, we create a baseline using the currently implemented algorithm for author affiliation linkage in Alexandria3k by comparing it to the ground truth. We aim to explore the usage of LLMs (GPT-4) in the Alexandria3k environment to disambiguate author affiliations. The proposed approach extracts the research organization from textual affiliations provided by researchers through their published works and cross-references the organization across the Research Organization Registry. Our process shows promising results and a significant improvement on the existing algorithm in terms of matching rate and identification of multiple affiliations. We discuss the margin of error in LLM results, limitations of the ground truth, and suggest future research directions.},
	language = {en},
	urldate = {2024-02-12},
	author = {Gupta, Dibyendu},
	year = {2024},
}

