AN ACTOR-CRITIC ALGORITHM FOR SEQUENCE PREDICTION

AN ACTOR-CRITIC ALGORITHM FOR SEQUENCE PREDICTION. Bahdanau, D., Brakel, P., Xu, K., Goyal, A., Lowe, R., Pineau, J., Courville, A., & Bengio, Y. 2017.

We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved performance on both a synthetic task, and for German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling.
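The abstract describes the training loop only at a high level. The following is a minimal, hedged sketch of three generic actor-critic ingredients it alludes to, not the authors' implementation. Assumptions, all hypothetical: `task_score` stands in for BLEU (fraction of reference positions matched); `oracle_critic` is a placeholder for the learned critic network that simply reads the ground-truth output, mirroring the abstract's point that the critic is conditioned on it; the per-token reward r_t = R(y_1..t) - R(y_1..t-1) and the plain temporal-difference target for training the critic are common choices assumed here. The actor update uses the critic's value for every candidate token, i.e. the gradient of sum_a p(a) Q(a) with respect to the actor's logits.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: the actor's distribution over the next token."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def task_score(prefix: Sequence[int], target: Sequence[int]) -> float:
    """Hypothetical stand-in for BLEU: fraction of reference positions matched so far."""
    matches = sum(1 for p, t in zip(prefix, target) if p == t)
    return matches / len(target)


def step_rewards(pred: Sequence[int], target: Sequence[int]) -> List[float]:
    """Assumed per-token reward r_t = R(y_1..t) - R(y_1..t-1); the rewards sum to R(y)."""
    rewards, prev = [], 0.0
    for t in range(1, len(pred) + 1):
        cur = task_score(pred[:t], target)
        rewards.append(cur - prev)
        prev = cur
    return rewards


def oracle_critic(prefix: Sequence[int], target: Sequence[int], vocab_size: int) -> List[float]:
    """Hypothetical placeholder for the learned critic: scores every candidate next
    token by its immediate score gain, using the ground-truth target as extra input."""
    base = task_score(prefix, target)
    return [task_score(list(prefix) + [a], target) - base for a in range(vocab_size)]


def actor_logit_grad(probs: Sequence[float], q_values: Sequence[float]) -> List[float]:
    """Gradient of sum_a p(a) Q(a) w.r.t. the logits z, where p = softmax(z):
    dE/dz_a = p(a) * (Q(a) - sum_b p(b) Q(b))."""
    expected = sum(p * q for p, q in zip(probs, q_values))
    return [p * (q - expected) for p, q in zip(probs, q_values)]


def td_targets(rewards: Sequence[float],
               probs: Sequence[Sequence[float]],
               q_values: Sequence[Sequence[float]]) -> List[float]:
    """Assumed critic regression targets q_t = r_t + sum_a p_{t+1}(a) Q_{t+1}(a).
    probs[t] and q_values[t] are the actor and critic outputs at step t;
    there is no bootstrap term after the last step."""
    targets = []
    for t, r in enumerate(rewards):
        boot = 0.0
        if t + 1 < len(rewards):
            boot = sum(p * q for p, q in zip(probs[t + 1], q_values[t + 1]))
        targets.append(r + boot)
    return targets


if __name__ == "__main__":
    target = [1, 2, 3]
    logits = [0.0, 0.0, 0.0, 0.0]  # uniform actor over a 4-token vocabulary
    probs = softmax(logits)
    q = oracle_critic([], target, vocab_size=4)
    grad = actor_logit_grad(probs, q)
    logits = [z + 1.0 * g for z, g in zip(logits, grad)]  # one gradient-ascent step
    # Token 1 matches the reference, so its probability should rise after the step.
    print(softmax(logits)[1] > probs[1])
    print(step_rewards([1, 0, 3], target))
```

Because the critic scores every token in the vocabulary rather than only the sampled one, the actor gradient here is an expectation over the actor's own distribution; a real system would replace `oracle_critic` with a trained network fitted to targets like those from `td_targets`.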

@article{brakel_actor-critic_2017,
	title = {{AN} {ACTOR}-{CRITIC} {ALGORITHM} {FOR} {SEQUENCE} {PREDICTION}},
	abstract = {We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved performance on both a synthetic task, and for German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling.},
	language = {en},
	author = {Bahdanau, Dzmitry and Brakel, Philemon and Xu, Kelvin and Goyal, Anirudh and Lowe, Ryan and Pineau, Joelle and Courville, Aaron and Bengio, Yoshua},
	year = {2017},
	pages = {17}
}
