Inverse Reinforcement Learning. Fischer, A. 2013. 00003
Recent research on imitation learning has shown that Markov Decision Processes (MDPs) are a powerful way to characterize this problem. Inverse reinforcement learning tries to explain observed behavior by recovering a reward function (or, respectively, a cost function) through solving a Markov Decision Problem. This paper presents three different approaches to finding an optimal policy that mimics observed behavior. Their differences and issues are pointed out and compared on some applications. The first approach handles three cases: the policy and states are finite and known; the state space is continuous; and the policy is known only through a finite set of observed trajectories. The second approach, LEARCH, extends Maximum Margin Planning and is simpler to implement than many other approaches while satisfying constraints on the cost function in a more natural way. The last approach is based on the principle of maximum entropy and reduces learning to the problem of recovering a utility function that closely mimics demonstrated behavior.
@article{fischer_inverse_2013,
	title = {Inverse {Reinforcement} {Learning}},
	url = {https://core.ac.uk/display/23628604},
	abstract = {Recent research on imitation learning has shown that Markov Decision Processes (MDPs) are a powerful way to characterize this problem. Inverse reinforcement learning tries to explain observed behavior by recovering a reward function (or, respectively, a cost function) through solving a Markov Decision Problem. This paper presents three different approaches to finding an optimal policy that mimics observed behavior. Their differences and issues are pointed out and compared on some applications. The first approach handles three cases: the policy and states are finite and known; the state space is continuous; and the policy is known only through a finite set of observed trajectories. The second approach, LEARCH, extends Maximum Margin Planning and is simpler to implement than many other approaches while satisfying constraints on the cost function in a more natural way. The last approach is based on the principle of maximum entropy and reduces learning to the problem of recovering a utility function that closely mimics demonstrated behavior.},
	urldate = {2017-01-24},
	author = {Fischer, Arthur},
	year = {2013},
	note = {00003},
	keywords = {MachineLearning}
}
