A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. Hedderich, M. A., Lange, L., Adel, H., Strötgen, J., & Klakow, D. arXiv:2010.12309 [cs], April, 2021.
Deep neural networks and huge language models are becoming omnipresent in natural language applications. As they are known for requiring large amounts of training data, there is a growing body of work to improve the performance in low-resource settings. Motivated by the recent fundamental changes towards neural models and the popular pre-train and fine-tune paradigm, we survey promising approaches for low-resource natural language processing. After a discussion about the different dimensions of data availability, we give a structured overview of methods that enable learning when training data is sparse. This includes mechanisms to create additional labeled data like data augmentation and distant supervision as well as transfer learning settings that reduce the need for target supervision. A goal of our survey is to explain how these methods differ in their requirements as understanding them is essential for choosing a technique suited for a specific low-resource setting. Further key aspects of this work are to highlight open issues and to outline promising directions for future research.
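As a concrete illustration of one family of methods the survey covers (creating additional labeled data through data augmentation), below is a minimal Python sketch of token-level augmentation via word dropout and random swaps. The function names, dropout rate, and example sentence are illustrative assumptions for this note, not taken from the paper.

import random

def word_dropout(tokens, p=0.1, rng=random):
    """Randomly remove tokens with probability p (keep at least one token)."""
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]

def random_swap(tokens, n_swaps=1, rng=random):
    """Swap n_swaps random pairs of token positions."""
    tokens = list(tokens)
    for _ in range(n_swaps):
        if len(tokens) < 2:
            break
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def augment(sentence, label, n_copies=4, rng=random):
    """Create extra (sentence, label) pairs that reuse the original label."""
    tokens = sentence.split()
    pairs = [(sentence, label)]
    for _ in range(n_copies):
        aug = random_swap(word_dropout(tokens, rng=rng), rng=rng)
        pairs.append((" ".join(aug), label))
    return pairs

if __name__ == "__main__":
    random.seed(0)
    # One annotated sentence yields several noisy copies with the same label,
    # which is the basic idea behind augmentation in low-resource settings.
    for text, y in augment("the service was friendly and fast", "positive"):
        print(y, "\t", text)

Such label-preserving perturbations are cheap but noisy; the survey contrasts them with distant supervision and transfer learning, which make different assumptions about what auxiliary resources are available.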
@article{hedderich2021,
	title = {A {Survey} on {Recent} {Approaches} for {Natural} {Language} {Processing} in {Low}-{Resource} {Scenarios}},
	url = {http://arxiv.org/abs/2010.12309},
	abstract = {Deep neural networks and huge language models are becoming omnipresent in natural language applications. As they are known for requiring large amounts of training data, there is a growing body of work to improve the performance in low-resource settings. Motivated by the recent fundamental changes towards neural models and the popular pre-train and fine-tune paradigm, we survey promising approaches for low-resource natural language processing. After a discussion about the different dimensions of data availability, we give a structured overview of methods that enable learning when training data is sparse. This includes mechanisms to create additional labeled data like data augmentation and distant supervision as well as transfer learning settings that reduce the need for target supervision. A goal of our survey is to explain how these methods differ in their requirements as understanding them is essential for choosing a technique suited for a specific low-resource setting. Further key aspects of this work are to highlight open issues and to outline promising directions for future research.},
	language = {en},
	urldate = {2021-08-12},
	journal = {arXiv:2010.12309 [cs]},
	author = {Hedderich, Michael A. and Lange, Lukas and Adel, Heike and Strötgen, Jannik and Klakow, Dietrich},
	month = apr,
	year = {2021},
	keywords = {Computer Science - Computation and Language, Computer Science - Machine Learning},
}
