Neural Machine Translation by Jointly Learning to Align and Translate

Neural Machine Translation by Jointly Learning to Align and Translate. Bahdanau, D., Cho, K., & Bengio, Y.

Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
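
The (soft-)search described in the abstract can be illustrated with a minimal sketch. It assumes an additive scoring function of the form v^T tanh(W s + U h), a softmax over the scores, and a weighted sum of per-word source vectors. The abstract itself gives no formulas, so the function names (`additive_score`, `soft_align`), the toy weights `W`, `U`, `v`, and the two-dimensional vectors are all hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def additive_score(state, annotation, W, U, v):
    """Score one source position: v^T tanh(W state + U annotation)."""
    hidden = [
        math.tanh(
            sum(w * s for w, s in zip(W_row, state))
            + sum(u * a for u, a in zip(U_row, annotation))
        )
        for W_row, U_row in zip(W, U)
    ]
    return sum(vi * hi for vi, hi in zip(v, hidden))

def soft_align(state, annotations, W, U, v):
    """Return (alignment weights, context vector) for one target step."""
    scores = [additive_score(state, h, W, U, v) for h in annotations]
    weights = softmax(scores)
    dim = len(annotations[0])
    # The context is a weighted average of all source vectors, so no
    # hard segment of the source sentence is ever selected.
    context = [
        sum(w * h[k] for w, h in zip(weights, annotations))
        for k in range(dim)
    ]
    return weights, context

# Toy inputs (assumed values, not taken from the paper).
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per source word
state = [0.5, -0.5]                                  # decoder state
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]

weights, context = soft_align(state, annotations, W, U, v)
print(weights, context)
```

Because the decoder recomputes the weights at every target word, the context vector changes from step to step instead of being one fixed-length summary of the whole source sentence. In a trained model the weights `W`, `U`, and `v` would be learned jointly with the rest of the network rather than set by hand.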

@article{bahdanauNeuralMachineTranslation2014,
  archivePrefix = {arXiv},
  eprinttype = {arxiv},
  eprint = {1409.0473},
  primaryClass = {cs, stat},
  title = {Neural {{Machine Translation}} by {{Jointly Learning}} to {{Align}} and {{Translate}}},
  url = {http://arxiv.org/abs/1409.0473},
  abstract = {Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consists of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.},
  urldate = {2018-11-03},
  date = {2014-09-01},
  keywords = {Statistics - Machine Learning,Computer Science - Computation and Language,Computer Science - Machine Learning,Computer Science - Neural and Evolutionary Computing},
  author = {Bahdanau, Dzmitry and Cho, Kyunghyun and Bengio, Yoshua},
}

