Incorporating Knowledge Resources into Natural Language Processing Techniques to Advance Academic Research and Application Development. Han, K. May, 2023.
@article{han_incorporating_2023,
title = {Incorporating {Knowledge} {Resources} into {Natural} {Language} {Processing} {Techniques} to {Advance} {Academic} {Research} and {Application} {Development}},
url = {https://hdl.handle.net/2142/118097},
abstract = {The rapid advancement of natural language processing (NLP) and machine learning (ML) techniques, coupled with the accumulation of data and knowledge resources in recent decades, opens up numerous opportunities for social and scientific studies, as well as for developing applications used in daily life (e.g., chatbots and online search engines). However, challenges persist, such as insufficient annotated training data for building or fine-tuning NLP and ML models, noisy data with incomplete information for specific needs, and the adaptation of generic pre-trained models to domain-specific downstream tasks.
Leveraging knowledge resources, which I define as data or human resources that contain dense and typically structured knowledge within specific domains, holds promise for advancing NLP and ML techniques to facilitate social and scientific studies, as well as the design and development of everyday applications. In this dissertation, I investigate various knowledge resources that can be mined and incorporated into NLP techniques for these purposes. Specifically, this dissertation presents four studies: trimming the Wikipedia Category Tree for domain-specific tasks, disambiguating funder names and predicting funder characteristics for funding allocation studies based on community-curated resources, developing socially responsible chatbots for purchase decision-making based on online platform data, and categorizing domain-specific documents using a small amount of annotated data and an expert-in-the-loop approach.
These studies contribute 1) knowledge of how to use existing knowledge resources for specific domains or tasks, 2) novel frameworks for cleaning, mining, and utilizing these knowledge resources, and 3) models and systems that can be directly used for tasks such as funder name disambiguation and question answering.},
language = {en},
urldate = {2023-05-30},
author = {Han, Kanyao},
month = may,
year = {2023},
}