On Using Very Large Target Vocabulary for Neural Machine Translation. Jean, S., Cho, K., Memisevic, R., & Bengio, Y. 2014.
Abstract: Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limitation in handling a larger vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of target words. In this paper, we propose a method that allows us to use a very large target vocabulary without increasing training complexity, based on importance sampling. We show that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models. Furthermore, when we use the ensemble of a few models with very large target vocabularies, we achieve the state-of-the-art translation performance (measured by BLEU) on the English→German translation and almost as high performance as state-of-the-art English→French translation system.
@article{Jean_On_2014,
  title={On Using Very Large Target Vocabulary for Neural Machine Translation},
  author={Jean, S{\'e}bastien and Cho, Kyunghyun and Memisevic, Roland and Bengio, Yoshua},
  journal={arXiv preprint arXiv:1412.2007},
  year={2014},
  abstract={Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limitation in handling a larger vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of target words. In this paper, we propose a method that allows us to use a very large target vocabulary without increasing training complexity, based on importance sampling. We show that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the {LSTM}-based neural machine translation models. Furthermore, when we use the ensemble of a few models with very large target vocabularies, we achieve the state-of-the-art translation performance (measured by BLEU) on the English{\textgreater}German translation and almost as high performance as state-of-the-art English{\textgreater}French translation system.}
}
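
The training trick described in the abstract, approximating the expensive softmax normalization by importance sampling over a small candidate subset of the target vocabulary, can be sketched in a few lines. The NumPy sketch below is an illustration under simplifying assumptions, not the paper's implementation: the vocabulary size, sample size, hidden dimension, and the uniform proposal distribution are all hypothetical, whereas the paper partitions the training corpus so that each partition reuses one small candidate set.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper's target vocabularies are far larger.
VOCAB_SIZE = 50_000   # full target vocabulary |V|
SAMPLE_SIZE = 256     # sampled candidate set |V'|
HIDDEN = 128          # decoder hidden-state size

# Stand-ins for a trained decoder state and output layer.
h = rng.standard_normal(HIDDEN)
W = 0.01 * rng.standard_normal((VOCAB_SIZE, HIDDEN))
b = np.zeros(VOCAB_SIZE)

def sampled_softmax_nll(target):
    """Approximate -log softmax probability of `target`, normalizing
    over a small sampled subset of the vocabulary instead of all |V|
    words, so the output-layer cost per example is O(|V'|), not O(|V|).

    Uses a uniform proposal Q(w) = 1/|V|; the importance weights
    exp(energy(w) - log Q(w)) then differ from exp(energy(w)) only by
    a constant factor, which cancels in the normalized estimate.
    """
    negatives = rng.choice(VOCAB_SIZE, size=SAMPLE_SIZE, replace=False)
    negatives = negatives[negatives != target]          # avoid double-counting
    candidates = np.concatenate(([target], negatives))  # target at index 0
    # Energies (unnormalized log-probabilities) for the candidates only.
    energies = W[candidates] @ h + b[candidates]
    # Partition function estimated over the candidate set alone.
    log_z = np.logaddexp.reduce(energies)
    return -(energies[0] - log_z)

print(f"approximate NLL of word 42: {sampled_softmax_nll(42):.3f}")

At decoding time, the paper likewise normalizes over a small candidate list only (the most frequent target words plus likely translations of the words in the source sentence), which is what makes decoding with the very large target vocabulary efficient.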
{"_id":"cLGQYAjX5852fJMSh","bibbaseid":"jean-cho-memisevic-bengio-onusingverylargetargetvocabularyforneuralmachinetranslation-2014","downloads":0,"creationDate":"2017-05-10T16:03:41.197Z","title":"On Using Very Large Target Vocabulary for Neural Machine Translation","author_short":["Jean, S.","Cho, K.","Memisevic, R.","Bengio, Y."],"year":2014,"bibtype":"article","biburl":"http://tilke.github.io/bib/deep-learning.bib","bibdata":{"bibtype":"article","type":"article","year":"2014","abstract":"Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limitation in handling a larger vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of target words. In this paper, we propose a method that allows us to use a very large target vocabulary without increasing training complexity, based on importance sampling. We show that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models. Furthermore, when we use the ensemble of a few models with very large target vocabularies, we achieve the state-of-the-art translation performance (measured by BLEU) on the English-\\\\textgreater\\German translation and almost as high performance as state-of-the-art English-\\\\textgreater\\French translation system.","title":"On Using Very Large Target Vocabulary for Neural Machine Translation","author":[{"propositions":[],"lastnames":["Jean"],"firstnames":["Sébastien"],"suffixes":[]},{"propositions":[],"lastnames":["Cho"],"firstnames":["Kyunghyun"],"suffixes":[]},{"propositions":[],"lastnames":["Memisevic"],"firstnames":["Roland"],"suffixes":[]},{"propositions":[],"lastnames":["Bengio"],"firstnames":["Yoshua"],"suffixes":[]}],"bibtex":"@article{Jean_On_2014,\n year={2014},\n abstract={Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limitation in handling a larger vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of target words. In this paper, we propose a method that allows us to use a very large target vocabulary without increasing training complexity, based on importance sampling. We show that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the {LSTM-based} neural machine translation models. 
Furthermore, when we use the ensemble of a few models with very large target vocabularies, we achieve the state-of-the-art translation performance (measured by {BLEU)} on the {English-\\{\\textgreater\\}German} translation and almost as high performance as state-of-the-art {English-\\{\\textgreater\\}French} translation system.},\n title={On Using Very Large Target Vocabulary for Neural Machine Translation},\n author={Jean, S{\\'e}bastien and Cho, Kyunghyun and Memisevic, Roland and Bengio, Yoshua}\n}\n","author_short":["Jean, S.","Cho, K.","Memisevic, R.","Bengio, Y."],"key":"Jean_On_2014","id":"Jean_On_2014","bibbaseid":"jean-cho-memisevic-bengio-onusingverylargetargetvocabularyforneuralmachinetranslation-2014","role":"author","urls":{},"downloads":0},"search_terms":["using","very","large","target","vocabulary","neural","machine","translation","jean","cho","memisevic","bengio"],"keywords":[],"authorIDs":[],"dataSources":["NScDFpTBCtWwAQ6eq"]}