In *Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP*, pages 115–124, Stroudsburg, PA, USA, 2018. Association for Computational Linguistics.

Paper doi abstract bibtex

Paper doi abstract bibtex

While long short-term memory (LSTM) neural net architectures are designed to capture sequence information, human language is generally composed of hierarchical structures. This raises the question as to whether LSTMs can learn hierarchical structures. We explore this question with a well-formed bracket prediction task using two types of brackets modeled by an LSTM. Demonstrating that such a system is learnable by an LSTM is the first step in demonstrating that the entire class of CFLs is also learnable. We observe that the model requires exponential memory in terms of the number of characters and embedded depth, where a sub-linear memory should suffice. Still, the model does more than memorize the training input. It learns how to distinguish between relevant and irrelevant information. On the other hand, we also observe that the model does not generalize well. We conclude that LSTMs do not learn the relevant underlying context-free rules, suggesting the good overall performance is attained rather by an efficient way of evaluating nuisance variables. LSTMs are a way to quickly reach good results for many natural language tasks, but to understand and generate natural language one has to investigate other concepts that can make more direct use of natural language's structural nature.

@inproceedings{Sennhauser2018, abstract = {While long short-term memory (LSTM) neural net architectures are designed to capture sequence information, human language is generally composed of hierarchical structures. This raises the question as to whether LSTMs can learn hierarchical structures. We explore this question with a well-formed bracket prediction task using two types of brackets modeled by an LSTM. Demonstrating that such a system is learnable by an LSTM is the first step in demonstrating that the entire class of CFLs is also learnable. We observe that the model requires exponential memory in terms of the number of characters and embedded depth, where a sub-linear memory should suffice. Still, the model does more than memorize the training input. It learns how to distinguish between relevant and irrelevant information. On the other hand, we also observe that the model does not generalize well. We conclude that LSTMs do not learn the relevant underlying context-free rules, suggesting the good overall performance is attained rather by an efficient way of evaluating nuisance variables. LSTMs are a way to quickly reach good results for many natural language tasks, but to understand and generate natural language one has to investigate other concepts that can make more direct use of natural language's structural nature.}, address = {Stroudsburg, PA, USA}, archivePrefix = {arXiv}, arxivId = {1811.02611}, author = {Sennhauser, Luzi and Berwick, Robert}, booktitle = {Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP}, doi = {10.18653/v1/W18-5414}, eprint = {1811.02611}, file = {:Users/shanest/Documents/Library/Sennhauser, Berwick/Proceedings of the 2018 EMNLP Workshop BlackboxNLP Analyzing and Interpreting Neural Networks for NLP/Sennhauser, Berwick - 2018 - Evaluating the Ability of LSTMs to Learn Context-Free Grammars.pdf:pdf}, keywords = {method: formal languages}, pages = {115--124}, publisher = {Association for Computational Linguistics}, title = {{Evaluating the Ability of LSTMs to Learn Context-Free Grammars}}, url = {http://aclweb.org/anthology/W18-5414}, year = {2018} }

Downloads: 0