Attention Is All You Need. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. December, 2017. Keywords: Computer Science - Computation and Language; Computer Science - Machine Learning
Paper: http://arxiv.org/abs/1706.03762 · doi: 10.48550/arXiv.1706.03762
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
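Note: the architecture the abstract describes is built around scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, applied in parallel over multiple heads. The following minimal NumPy sketch shows that single operation for quick reference; the function name, array shapes, and the optional mask argument are illustrative assumptions, not code from the paper.

import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    # Q: (len_q, d_k), K: (len_k, d_k), V: (len_k, d_v); shapes are assumptions for this sketch.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (len_q, len_k) similarities, scaled by sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)   # masked positions get ~0 weight after softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the key positions
    return weights @ V                             # (len_q, d_v) weighted sum of values

# Toy usage: 3 query positions attending over 4 key/value positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)

In the full model this operation is wrapped in multi-head attention with learned projections for Q, K, and V, combined with residual connections, layer normalization, and position-wise feed-forward layers, as described in the paper.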
@misc{vaswani2017,
title = {Attention {Is} {All} {You} {Need}},
	shorttitle = {Attention Is All You Need},
url = {http://arxiv.org/abs/1706.03762},
doi = {10.48550/arXiv.1706.03762},
	abstract = {The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.},
language = {en},
urldate = {2023-02-02},
publisher = {arXiv},
author = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser, Lukasz and Polosukhin, Illia},
month = dec,
year = {2017},
	keywords = {Computer Science - Computation and Language, Computer Science - Machine Learning},
}
{"_id":"rcB74Z4BpPjr9gAxd","bibbaseid":"vaswani-shazeer-parmar-uszkoreit-jones-gomez-kaiser-polosukhin-attentionisallyouneed-2017","downloads":0,"creationDate":"2018-01-22T16:01:14.158Z","title":"Attention Is All You Need","author_short":["Vaswani, A.","Shazeer, N.","Parmar, N.","Uszkoreit, J.","Jones, L.","Gomez, A. N.","Kaiser, L.","Polosukhin, I."],"year":2017,"bibtype":"misc","biburl":"https://api.zotero.org/groups/2386895/collections/7PPRTB2H/items?format=bibtex&limit=100","bibdata":{"bibtype":"misc","type":"misc","title":"Attention Is All You Need","shorttitle":"关注就是一切","url":"http://arxiv.org/abs/1706.03762","doi":"10.48550/arXiv.1706.03762","abstract":"The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data. 【摘要翻译】主流的序列转换模型是基于复杂的递归或卷积神经网络的编码器-解码器配置。性能最好的模型还通过注意机制连接编码器和解码器。我们提出了一种新的简单网络架构–\"转换器\"(Transformer),它完全基于注意力机制,无需递归和卷积。在两项机器翻译任务上的实验表明,这些模型的质量更优,同时可并行化程度更高,所需的训练时间也大大减少。我们的模型在 WMT 2014 英语到德语的翻译任务中达到了 28.4 BLEU,比现有的最佳结果(包括集合)提高了 2 BLEU 以上。在 WMT 2014 英法翻译任务中,我们的模型在 8 个 GPU 上进行了 3.5 天的训练后,单模型 BLEU 得分为 41.8,达到了新的一流水平,这只是文献中最佳模型训练成本的一小部分。我们将 Transformer 成功应用于英语选区解析,并同时使用大量和有限的训练数据,从而证明 Transformer 可以很好地推广到其他任务中。","language":"en","urldate":"2023-02-02","publisher":"arXiv","author":[{"propositions":[],"lastnames":["Vaswani"],"firstnames":["Ashish"],"suffixes":[]},{"propositions":[],"lastnames":["Shazeer"],"firstnames":["Noam"],"suffixes":[]},{"propositions":[],"lastnames":["Parmar"],"firstnames":["Niki"],"suffixes":[]},{"propositions":[],"lastnames":["Uszkoreit"],"firstnames":["Jakob"],"suffixes":[]},{"propositions":[],"lastnames":["Jones"],"firstnames":["Llion"],"suffixes":[]},{"propositions":[],"lastnames":["Gomez"],"firstnames":["Aidan","N."],"suffixes":[]},{"propositions":[],"lastnames":["Kaiser"],"firstnames":["Lukasz"],"suffixes":[]},{"propositions":[],"lastnames":["Polosukhin"],"firstnames":["Illia"],"suffixes":[]}],"month":"December","year":"2017","note":"🏷️ /unread、Computer Science - Computation and Language、Computer Science - Machine Learning","keywords":"/unread, Computer Science - Computation and Language, Computer Science - Machine Learning","bibtex":"@misc{vaswani2017,\n\ttitle = {Attention {Is} {All} {You} {Need}},\n\tshorttitle = {关注就是一切},\n\turl = {http://arxiv.org/abs/1706.03762},\n\tdoi = {10.48550/arXiv.1706.03762},\n\tabstract = {The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. 
The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.\n\n【摘要翻译】主流的序列转换模型是基于复杂的递归或卷积神经网络的编码器-解码器配置。性能最好的模型还通过注意机制连接编码器和解码器。我们提出了一种新的简单网络架构--\"转换器\"(Transformer),它完全基于注意力机制,无需递归和卷积。在两项机器翻译任务上的实验表明,这些模型的质量更优,同时可并行化程度更高,所需的训练时间也大大减少。我们的模型在 WMT 2014 英语到德语的翻译任务中达到了 28.4 BLEU,比现有的最佳结果(包括集合)提高了 2 BLEU 以上。在 WMT 2014 英法翻译任务中,我们的模型在 8 个 GPU 上进行了 3.5 天的训练后,单模型 BLEU 得分为 41.8,达到了新的一流水平,这只是文献中最佳模型训练成本的一小部分。我们将 Transformer 成功应用于英语选区解析,并同时使用大量和有限的训练数据,从而证明 Transformer 可以很好地推广到其他任务中。},\n\tlanguage = {en},\n\turldate = {2023-02-02},\n\tpublisher = {arXiv},\n\tauthor = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser, Lukasz and Polosukhin, Illia},\n\tmonth = dec,\n\tyear = {2017},\n\tnote = {🏷️ /unread、Computer Science - Computation and Language、Computer Science - Machine Learning},\n\tkeywords = {/unread, Computer Science - Computation and Language, Computer Science - Machine Learning},\n}\n\n","author_short":["Vaswani, A.","Shazeer, N.","Parmar, N.","Uszkoreit, J.","Jones, L.","Gomez, A. N.","Kaiser, L.","Polosukhin, I."],"key":"vaswani2017","id":"vaswani2017","bibbaseid":"vaswani-shazeer-parmar-uszkoreit-jones-gomez-kaiser-polosukhin-attentionisallyouneed-2017","role":"author","urls":{"Paper":"http://arxiv.org/abs/1706.03762"},"keyword":["/unread","Computer Science - Computation and Language","Computer Science - Machine Learning"],"metadata":{"authorlinks":{}},"downloads":0},"search_terms":["attention","need","vaswani","shazeer","parmar","uszkoreit","jones","gomez","kaiser","polosukhin"],"keywords":["/unread","computer science - computation and language","computer science - machine learning"],"authorIDs":[],"dataSources":["3cSHcH82NdYLcnKP2","QGwcHf7xnb5mCCQi7","ya2CyA73rpZseyrZ8","SYqcwHeTFu9kg8TXh","ovn29uG6Mbp3JWCRR","pzyFFGWvxG2bs63zP","MDN92uztrK9rBiHJj","6yqkfgG5XRop3rtQ6","gBvKD3NdQwvPCaD5C","nTukRxsNNmbKgjGgn","aXmRAq63YsH7a3ufx","mt9b5ir7GnHzHywGs","CmHEoydhafhbkXXt5","SzwP89Pgwvcu5euzJ","N4kJAiLiJ7kxfNsoh","5qXSH7BrePnXHtcrf","Wsv2bQ4jPuc7qme8R","2252seNhipfTmjEBQ","TJKHZ3TN7ogruuAa6","h7kKWXpJh2iaX92T5","nZHrFJKyxKKDaWYM8","akApyGuSiBYkDccny","igXWS7EdKxb8weRwm","taWdMrienBzqHC2tC","u8q5uny4m5jJL9RcX"]}