BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Yannic Kilcher. January 2019.

https://arxiv.org/abs/1810.04805

Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.

Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

@misc{yannic_kilcher_bert_2019,
	title = {{BERT}: {Pre}-training of {Deep} {Bidirectional} {Transformers} for {Language} {Understanding}},
	shorttitle = {{BERT}},
	url = {https://www.youtube.com/watch?v=-9evrZnBorM},
	abstract = {https://arxiv.org/abs/1810.04805

Abstract:
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. 
BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4\% (7.6\% absolute improvement), MultiNLI accuracy to 86.7 (5.6\% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5\% absolute improvement), outperforming human performance by 2.0\%.

Authors:
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova},
	language = {en},
	urldate = {2023-07-28},
	author = {{Yannic Kilcher}},
	month = jan,
	year = {2019},
	keywords = {\#NLP, \#Transformer, \#Youtube, /unread},
}
