BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Yannic Kilcher January, 2019.
https://arxiv.org/abs/1810.04805

Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.

Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
@misc{yannic_kilcher_bert_2019,
	title = {{BERT}: {Pre}-training of {Deep} {Bidirectional} {Transformers} for {Language} {Understanding}},
	shorttitle = {{BERT}},
	url = {https://www.youtube.com/watch?v=-9evrZnBorM},
	abstract = {https://arxiv.org/abs/1810.04805

Abstract:
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. 
BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4\% (7.6\% absolute improvement), MultiNLI accuracy to 86.7 (5.6\% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5\% absolute improvement), outperforming human performance by 2.0\%.

Authors:
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova},
	language = {en},
	urldate = {2023-07-28},
	author = {{Yannic Kilcher}},
	month = jan,
	year = {2019},
	keywords = {\#NLP, \#Transformer, \#Youtube, /unread},
}
