An Actor-Critic Algorithm for Sequence Prediction. Bahdanau, D., Brakel, P., Xu, K., Goyal, A., Lowe, R., Pineau, J., Courville, A., & Bengio, Y. 2017.
We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved performance on both a synthetic task, and for German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling.
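The abstract describes an actor that samples tokens, a critic that assigns a value to each candidate output token, and training against a task score computed from the ground-truth output. The Python sketch below illustrates those training signals on a toy example. It is not the authors' implementation. The following are illustrative assumptions: the toy score that stands in for BLEU, the per-token reward defined as the change in score after each token, the TD-style critic target, and the hard-coded critic outputs. The function names (`toy_score`, `incremental_rewards`, `critic_targets`, `actor_logit_grads`) are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one step's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_score(prefix: Sequence[int], reference: Sequence[int]) -> float:
    """Hypothetical stand-in for BLEU: number of positions that match the reference."""
    return float(sum(1 for y, r in zip(prefix, reference) if y == r))


def incremental_rewards(sample: Sequence[int], reference: Sequence[int]) -> List[float]:
    """Assumed per-token reward: r_t = score(y_1..y_t) - score(y_1..y_{t-1})."""
    rewards, prev = [], 0.0
    for t in range(1, len(sample) + 1):
        cur = toy_score(sample[:t], reference)
        rewards.append(cur - prev)
        prev = cur
    return rewards


def critic_targets(rewards: Sequence[float], probs: Matrix, q_values: Matrix) -> List[float]:
    """Assumed TD-style target for the sampled token at step t:
    r_t + sum_a p_{t+1}(a) * Q_{t+1}(a), with no bootstrap after the last step."""
    targets = []
    for t, r in enumerate(rewards):
        bootstrap = 0.0
        if t + 1 < len(rewards):
            bootstrap = sum(p * q for p, q in zip(probs[t + 1], q_values[t + 1]))
        targets.append(r + bootstrap)
    return targets


def actor_logit_grads(probs: Matrix, q_values: Matrix) -> Matrix:
    """Gradient of sum_t sum_a p_t(a) * Q_t(a) w.r.t. each step's logits.

    For a softmax, d/dz_b sum_a p(a) Q(a) = p(b) * (Q(b) - sum_a p(a) Q(a)).
    The critic's Q values are treated as constants here."""
    grads = []
    for p_t, q_t in zip(probs, q_values):
        expected = sum(p * q for p, q in zip(p_t, q_t))
        grads.append([p * (q - expected) for p, q in zip(p_t, q_t)])
    return grads


# Toy usage: vocabulary {0, 1, 2}, one sampled output of length 3.
reference = [1, 2, 0]   # ground-truth output, available during training
sample = [1, 0, 0]      # tokens drawn from the actor
logits = [[0.0, 1.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 2.0]]
probs = [softmax(z) for z in logits]
# Hypothetical critic outputs Q_t(a); a real critic would also read `reference`.
q_values = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

rewards = incremental_rewards(sample, reference)
targets = critic_targets(rewards, probs, q_values)
grads = actor_logit_grads(probs, q_values)
print(rewards)  # [1.0, 0.0, 1.0]
```

The critic assigns a value to every candidate token, not only the sampled one. This lets the sketch's actor gradient sum over the whole vocabulary at each step. Tokens whose critic value exceeds the policy's expected value gain probability, and the others lose it.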
@article{brakel_actor-critic_2017,
	title = {An Actor-Critic Algorithm for Sequence Prediction},
	abstract = {We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved performance on both a synthetic task, and for German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling.},
	language = {en},
	author = {Bahdanau, Dzmitry and Brakel, Philemon and Xu, Kelvin and Goyal, Anirudh and Lowe, Ryan and Pineau, Joelle and Courville, Aaron and Bengio, Yoshua},
	year = {2017},
	pagetotal = {17}
}