Visualizing and Measuring the Geometry of BERT. Coenen, A., Reif, E., Yuan, A., Kim, B., Pearce, A., Viégas, F., & Wattenberg, M. Links: Paper, Abstract, BibTeX. Abstract: Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.
@article{coenenVisualizingMeasuringGeometry2019,
  title       = {Visualizing and Measuring the Geometry of {BERT}},
  author      = {Coenen, Andy and Reif, Emily and Yuan, Ann and Kim, Been and Pearce, Adam and Viégas, Fernanda and Wattenberg, Martin},
  date        = {2019-06-06},
  eprint      = {1906.02715},
  eprinttype  = {arXiv},
  eprintclass = {cs, stat},
  url         = {http://arxiv.org/abs/1906.02715},
  urldate     = {2019-06-21},
  abstract    = {Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.},
  keywords    = {Statistics - Machine Learning,Computer Science - Computation and Language,Computer Science - Machine Learning},
  file        = {/home/dimitri/Nextcloud/Zotero/storage/7D3K8L65/Coenen et al. - 2019 - Visualizing and Measuring the Geometry of BERT.pdf;/home/dimitri/Nextcloud/Zotero/storage/7WX24LBK/1906.html},
}
Downloads: 0
{"_id":"N7sWsTjawxp24pjW9","bibbaseid":"coenen-reif-yuan-kim-pearce-vigas-wattenberg-visualizingandmeasuringthegeometryofbert","authorIDs":[],"author_short":["Coenen, A.","Reif, E.","Yuan, A.","Kim, B.","Pearce, A.","Viégas, F.","Wattenberg, M."],"bibdata":{"bibtype":"article","type":"article","archiveprefix":"arXiv","eprinttype":"arxiv","eprint":"1906.02715","primaryclass":"cs, stat","title":"Visualizing and Measuring the Geometry of BERT","url":"http://arxiv.org/abs/1906.02715","abstract":"Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. 
We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.","urldate":"2019-06-21","date":"2019-06-06","keywords":"Statistics - Machine Learning,Computer Science - Computation and Language,Computer Science - Machine Learning","author":[{"propositions":[],"lastnames":["Coenen"],"firstnames":["Andy"],"suffixes":[]},{"propositions":[],"lastnames":["Reif"],"firstnames":["Emily"],"suffixes":[]},{"propositions":[],"lastnames":["Yuan"],"firstnames":["Ann"],"suffixes":[]},{"propositions":[],"lastnames":["Kim"],"firstnames":["Been"],"suffixes":[]},{"propositions":[],"lastnames":["Pearce"],"firstnames":["Adam"],"suffixes":[]},{"propositions":[],"lastnames":["Viégas"],"firstnames":["Fernanda"],"suffixes":[]},{"propositions":[],"lastnames":["Wattenberg"],"firstnames":["Martin"],"suffixes":[]}],"file":"/home/dimitri/Nextcloud/Zotero/storage/7D3K8L65/Coenen et al. - 2019 - Visualizing and Measuring the Geometry of BERT.pdf;/home/dimitri/Nextcloud/Zotero/storage/7WX24LBK/1906.html","bibtex":"@article{coenenVisualizingMeasuringGeometry2019,\n archivePrefix = {arXiv},\n eprinttype = {arxiv},\n eprint = {1906.02715},\n primaryClass = {cs, stat},\n title = {Visualizing and {{Measuring}} the {{Geometry}} of {{BERT}}},\n url = {http://arxiv.org/abs/1906.02715},\n abstract = {Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. 
We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.},\n urldate = {2019-06-21},\n date = {2019-06-06},\n keywords = {Statistics - Machine Learning,Computer Science - Computation and Language,Computer Science - Machine Learning},\n author = {Coenen, Andy and Reif, Emily and Yuan, Ann and Kim, Been and Pearce, Adam and Viégas, Fernanda and Wattenberg, Martin},\n file = {/home/dimitri/Nextcloud/Zotero/storage/7D3K8L65/Coenen et al. - 2019 - Visualizing and Measuring the Geometry of BERT.pdf;/home/dimitri/Nextcloud/Zotero/storage/7WX24LBK/1906.html}\n}\n\n","author_short":["Coenen, A.","Reif, E.","Yuan, A.","Kim, B.","Pearce, A.","Viégas, F.","Wattenberg, M."],"key":"coenenVisualizingMeasuringGeometry2019","id":"coenenVisualizingMeasuringGeometry2019","bibbaseid":"coenen-reif-yuan-kim-pearce-vigas-wattenberg-visualizingandmeasuringthegeometryofbert","role":"author","urls":{"Paper":"http://arxiv.org/abs/1906.02715"},"keyword":["Statistics - Machine Learning","Computer Science - Computation and Language","Computer Science - Machine Learning"],"downloads":0},"bibtype":"article","biburl":"https://raw.githubusercontent.com/dlozeve/newblog/master/bib/all.bib","creationDate":"2020-01-08T20:39:39.378Z","downloads":0,"keywords":["statistics - machine learning","computer science - computation and language","computer science - machine learning"],"search_terms":["visualizing","measuring","geometry","bert","coenen","reif","yuan","kim","pearce","viégas","wattenberg"],"title":"Visualizing and Measuring the Geometry of BERT","year":null,"dataSources":["3XqdvqRE7zuX4cm8m"]}