Reinforcement Learning, Spike Time Dependent Plasticity and the BCM Rule. Baras, D. & Meir, R. 2006.
Learning agents, whether natural or artificial, must update their internal parameters in order to improve their behavior over time. In reinforcement learning, this plasticity is influenced by an environmental signal, termed a reward, which directs the changes in appropriate directions. We apply a recently introduced policy learning algorithm from Machine Learning to networks of spiking neurons, and derive a spike time dependent plasticity rule which ensures convergence to a local optimum of the expected average reward. The approach is applicable to a broad class of neuronal models, including the Hodgkin-Huxley model. We demonstrate the effectiveness of the derived rule in several toy problems. Finally, through statistical analysis we show that the synaptic plasticity rule established is closely related to the widely used BCM rule, for which good biological evidence exists.
@book{baras_reinforcement_2006,
	title = {Reinforcement {Learning}, {Spike} {Time} {Dependent} {Plasticity} and the {BCM} {Rule}},
	abstract = {Learning agents, whether natural or artificial, must update their internal parameters in order to improve their behavior over time. In reinforcement learning, this plasticity is influenced by an environmental signal, termed a reward, which directs the changes in appropriate directions. We apply a recently introduced policy learning algorithm from Machine Learning to networks of spiking neurons, and derive a spike time dependent plasticity rule which ensures convergence to a local optimum of the expected average reward. The approach is applicable to a broad class of neuronal models, including the Hodgkin-Huxley model. We demonstrate the effectiveness of the derived rule in several toy problems. Finally, through statistical analysis we show that the synaptic plasticity rule established is closely related to the widely used BCM rule, for which good biological evidence exists.},
	author = {Baras, Dorit and Meir, Ron},
	year = {2006}
}
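For orientation, the two plasticity forms the abstract relates can be written in their standard textbook shapes. The following is a generic sketch, not the paper's derivation; the learning rate \eta, sliding threshold \theta_M, and reward baseline b are conventional symbols, not necessarily the authors' notation.

\[ \text{BCM:} \quad \frac{dw_i}{dt} = \eta \, x_i \, y \, (y - \theta_M), \qquad \theta_M \propto \mathbb{E}[y^2] \]

\[ \text{Policy gradient:} \quad \Delta w_i \propto \eta \, (r - b) \, \frac{\partial}{\partial w_i} \log \pi_w(\text{output} \mid \text{input}) \]

The abstract's claim is that a spike-based update derived from the second, reward-driven form turns out, under statistical analysis, to be closely related to the first.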
