Neural Machine Translation by Jointly Learning to Align and Translate. Bahdanau, D., Cho, K., & Bengio, Y.
Neural Machine Translation by Jointly Learning to Align and Translate [link]Paper  abstract   bibtex   
Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consists of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
@article{bahdanauNeuralMachineTranslation2014,
  archivePrefix = {arXiv},
  eprinttype = {arxiv},
  eprint = {1409.0473},
  primaryClass = {cs, stat},
  title = {Neural {{Machine Translation}} by {{Jointly Learning}} to {{Align}} and {{Translate}}},
  url = {http://arxiv.org/abs/1409.0473},
  abstract = {Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consists of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.},
  urldate = {2018-11-03},
  date = {2014-09-01},
  keywords = {Statistics - Machine Learning,Computer Science - Computation and Language,Computer Science - Machine Learning,Computer Science - Neural and Evolutionary Computing},
  author = {Bahdanau, Dzmitry and Cho, Kyunghyun and Bengio, Yoshua},
  file = {/home/dimitri/Nextcloud/Zotero/storage/EDT3ACT8/Bahdanau et al. - 2014 - Neural Machine Translation by Jointly Learning to .pdf;/home/dimitri/Nextcloud/Zotero/storage/APNE596P/1409.html}
}

Downloads: 0