What Does BERT Learn about the Structure of Language? Jawahar, G., Sagot, B., & Seddah, D. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3651–3657, Stroudsburg, PA, USA, 2019. Association for Computational Linguistics.
@inproceedings{Jawahar2019,
abstract = {BERT is a recent language representation model that has surprisingly performed well in diverse language understanding benchmarks. This result indicates the possibility that BERT networks capture structural information about language. In this work, we provide novel support for this claim by performing a series of experiments to unpack the elements of English language structure learned by BERT. We first show that BERT's phrasal representation captures phrase-level information in the lower layers. We also show that BERT's intermediate layers encode a rich hierarchy of linguistic information, with surface features at the bottom, syntactic features in the middle and semantic features at the top. BERT turns out to require deeper layers when long-distance dependency information is required, e.g. to track subject-verb agreement. Finally, we show that BERT representations capture linguistic information in a compositional way that mimics classical, tree-like structures.},
address = {Stroudsburg, PA, USA},
author = {Jawahar, Ganesh and Sagot, Beno{\^{i}}t and Seddah, Djam{\'{e}}},
booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
doi = {10.18653/v1/P19-1356},
keywords = {method: diagnostic classifier,phenomenon: compositionality,phenomenon: various},
pages = {3651--3657},
publisher = {Association for Computational Linguistics},
title = {{What Does BERT Learn about the Structure of Language?}},
url = {https://www.aclweb.org/anthology/P19-1356},
year = {2019}
}