BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding

BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Devlin, J., Chang, M., Lee, K., & Toutanova, K.

Paper abstract bibtex

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.

@article{devlinBERTPretrainingDeep2018,
  archivePrefix = {arXiv},
  eprinttype = {arxiv},
  eprint = {1810.04805},
  primaryClass = {cs},
  title = {{{BERT}}: {{Pre}}-Training of {{Deep Bidirectional Transformers}} for {{Language Understanding}}},
  url = {http://arxiv.org/abs/1810.04805},
  shorttitle = {{{BERT}}},
  abstract = {We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4\% (7.6\% absolute improvement), MultiNLI accuracy to 86.7 (5.6\% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5\% absolute improvement), outperforming human performance by 2.0\%.},
  urldate = {2019-02-21},
  date = {2018-10-10},
  keywords = {Computer Science - Computation and Language},
  author = {Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  file = {/home/dimitri/Nextcloud/Zotero/storage/3M8VB4UN/Devlin et al. - 2018 - BERT Pre-training of Deep Bidirectional Transform.pdf;/home/dimitri/Nextcloud/Zotero/storage/3P9Z7WWU/1810.html}
}

Downloads: 0

{"_id":"qLyZ2AjdPsGBeku7a","bibbaseid":"devlin-chang-lee-toutanova-bertpretrainingofdeepbidirectionaltransformersforlanguageunderstanding","authorIDs":[],"author_short":["Devlin, J.","Chang, M.","Lee, K.","Toutanova, K."],"bibdata":{"bibtype":"article","type":"article","archiveprefix":"arXiv","eprinttype":"arxiv","eprint":"1810.04805","primaryclass":"cs","title":"BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding","url":"http://arxiv.org/abs/1810.04805","shorttitle":"BERT","abstract":"We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.","urldate":"2019-02-21","date":"2018-10-10","keywords":"Computer Science - Computation and Language","author":[{"propositions":[],"lastnames":["Devlin"],"firstnames":["Jacob"],"suffixes":[]},{"propositions":[],"lastnames":["Chang"],"firstnames":["Ming-Wei"],"suffixes":[]},{"propositions":[],"lastnames":["Lee"],"firstnames":["Kenton"],"suffixes":[]},{"propositions":[],"lastnames":["Toutanova"],"firstnames":["Kristina"],"suffixes":[]}],"file":"/home/dimitri/Nextcloud/Zotero/storage/3M8VB4UN/Devlin et al. - 2018 - BERT Pre-training of Deep Bidirectional Transform.pdf;/home/dimitri/Nextcloud/Zotero/storage/3P9Z7WWU/1810.html","bibtex":"@article{devlinBERTPretrainingDeep2018,\n archivePrefix = {arXiv},\n eprinttype = {arxiv},\n eprint = {1810.04805},\n primaryClass = {cs},\n title = {{{BERT}}: {{Pre}}-Training of {{Deep Bidirectional Transformers}} for {{Language Understanding}}},\n url = {http://arxiv.org/abs/1810.04805},\n shorttitle = {{{BERT}}},\n abstract = {We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4\\% (7.6\\% absolute improvement), MultiNLI accuracy to 86.7 (5.6\\% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5\\% absolute improvement), outperforming human performance by 2.0\\%.},\n urldate = {2019-02-21},\n date = {2018-10-10},\n keywords = {Computer Science - Computation and Language},\n author = {Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},\n file = {/home/dimitri/Nextcloud/Zotero/storage/3M8VB4UN/Devlin et al. - 2018 - BERT Pre-training of Deep Bidirectional Transform.pdf;/home/dimitri/Nextcloud/Zotero/storage/3P9Z7WWU/1810.html}\n}\n\n","author_short":["Devlin, J.","Chang, M.","Lee, K.","Toutanova, K."],"key":"devlinBERTPretrainingDeep2018","id":"devlinBERTPretrainingDeep2018","bibbaseid":"devlin-chang-lee-toutanova-bertpretrainingofdeepbidirectionaltransformersforlanguageunderstanding","role":"author","urls":{"Paper":"http://arxiv.org/abs/1810.04805"},"keyword":["Computer Science - Computation and Language"],"downloads":0},"bibtype":"article","biburl":"https://raw.githubusercontent.com/dlozeve/newblog/master/bib/all.bib","creationDate":"2020-01-08T20:39:39.259Z","downloads":0,"keywords":["computer science - computation and language"],"search_terms":["bert","pre","training","deep","bidirectional","transformers","language","understanding","devlin","chang","lee","toutanova"],"title":"BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding","year":null,"dataSources":["3XqdvqRE7zuX4cm8m"]}