Large Language Models integration in Digital Humanities

Large Language Models integration in Digital Humanities. Sullutrone, G. In 2024.

The exponential growth of data available to Digital Humanities (DH) has created a pressing need for tools capable of analyzing and extracting information from multilingual historical documents. This paper explores the research directions of my PhD project: providing DH scholars with effective, efficient, and explainable tools based on recent advancements in Large Language Models (LLMs). There are two main directions of investigation: Self-Improving LLMs applied to Text-to-SQL and Topic Modeling, with a focus on interacting with and augmenting existing DBMSs; and Knowledge Graph (KG) creation and integration to mitigate hallucination and improve transparency and reasoning in question-answering systems. At the heart of my research lies the Digital Maktaba (DM) project, which seeks to create a digital library to assist in the preservation and analysis of multicultural non-Latin heritage documents using, among others, cutting-edge techniques from Natural Language Processing (NLP) and Data Science. The DM objectives and ideals align with the ultimate goal of the PhD project: the creation of instruments capable of aiding human-data interaction and information extraction while keeping the user at the center of an ever-evolving system. These tools have the potential to revolutionize the way DH scholars interact with historical documents, leading to new insights and discoveries for the field at large.

@inproceedings{sullutrone_large_2024,
	title = {Large {Language} {Models} integration in {Digital} {Humanities}},
	url = {https://www.semanticscholar.org/paper/Large-Language-Models-integration-in-Digital-Sullutrone/8d127bc2f01fab76ea9fdad8706d7f58fd4b8308},
	abstract = {The exponential growth of data available to Digital Humanities (DH) has created a pressing need for tools capable of analyzing and extracting information from multilingual historical documents. This paper explores the research directions of my PhD project: providing DH scholars with effective, efficient, and explainable tools based on recent advancements in Large Language Models (LLMs). There are two main directions of investigation: Self-Improving LLMs applied to Text-to-SQL and Topic Modeling, with a focus on interacting with and augmenting existing DBMSs; and Knowledge Graph (KG) creation and integration to mitigate hallucination and improve transparency and reasoning in question-answering systems. At the heart of my research lies the Digital Maktaba (DM) project, which seeks to create a digital library to assist in the preservation and analysis of multicultural non-Latin heritage documents using, among others, cutting-edge techniques from Natural Language Processing (NLP) and Data Science. The DM objectives and ideals align with the ultimate goal of the PhD project: the creation of instruments capable of aiding human-data interaction and information extraction while keeping the user at the center of an ever-evolving system. These tools have the potential to revolutionize the way DH scholars interact with historical documents, leading to new insights and discoveries for the field at large.},
	urldate = {2025-01-15},
	author = {Sullutrone, Giovanni},
	year = {2024},
}
