Dynamic Concept Model Learns Optimal Policies

Dynamic Concept Model Learns Optimal Policies. Szepesvári, C. In Proc. of IEEE WCCI ICNN'94, volume 3, pages 1738–1742, Orlando, Florida, 1994. IEEE Inc..

Paper abstract bibtex

Reinforcement learning is a flourishing field of neural methods. It has a firm theoretical basis and has been proven powerful in many applications. A brain model based alternative to RL has been introduced in the literature: It integrates artificial neural networks (ANN) and knowledge based (KB) systems into one unit or agent for goal oriented problem solving. The agent may possess inherited and learnt ANN and KB subsystems. The agent has and develops ANN cues to the environment for dimensionality reduction in order to ease the problem of combinatorial explosion. A dynamic concept model was forwarded that builds cue-models of the phenomena in the world, designs action sets (concepts) and make them compete in a neural stage to come to a decision. The competition was implemented in the form of activation spreading (AS) and a winner-take-all mechanism. The efficiency of the algorithm has been demonstrated for several examples, however, the optimality of the algorithm have not yet been proven in general. Here, a restriction to Markov decision problems (MDP) shall be treated making possible to show the equivalence of a special AS and RL. The equivalence in this special case means, that DCM has all the advantages of RL, moreover it keeps track of more distinctions allowing faster convergence and generalization.

@inproceedings{Szepesvari1994c,
	abstract = {Reinforcement learning is a flourishing field of neural methods. It has a firm theoretical basis and has been proven powerful in many applications. A brain model based alternative to RL has been introduced in the literature: It integrates artificial neural networks (ANN) and knowledge based (KB) systems into one unit or agent for goal oriented problem solving. The agent may possess inherited and learnt ANN and KB subsystems. The agent has and develops ANN cues to the environment for dimensionality reduction in order to ease the problem of combinatorial explosion. A dynamic concept model was forwarded that builds cue-models of the phenomena in the world, designs action sets (concepts) and make them compete in a neural stage to come to a decision. The competition was implemented in the form of activation spreading (AS) and a winner-take-all mechanism. The efficiency of the algorithm has been demonstrated for several examples, however, the optimality of the algorithm have not yet been proven in general. Here, a restriction to Markov decision problems (MDP) shall be treated making possible to show the equivalence of a special AS and RL. The equivalence in this special case means, that DCM has all the advantages of RL, moreover it keeps track of more distinctions allowing faster convergence and generalization.},
	address = {Orlando, Florida},
	author = {Szepesv{\'a}ri, Cs.},
	booktitle = {Proc. of IEEE WCCI ICNN'94},
	keywords = {reinforcement learning, theory},
	pages = {1738--1742},
	publisher = {IEEE Inc.},
	title = {Dynamic Concept Model Learns Optimal Policies},
	url_paper = {szepes.dcmopt.ps.pdf},
	volume = {3},
	year = {1994}}

Downloads: 0