An Actor/Critic Algorithm that is Equivalent to Q-Learning

An Actor/Critic Algorithm that is Equivalent to Q-Learning. Crites, R. H & Barto, A. G

We prove the convergence of an actor/critic algorithm that is equivalent to Q-learning by construction. Its equivalence is achieved by encoding Q-values within the policy and value function of the actor and critic. The resultant actor/critic algorithm is novel in two ways: it updates the critic only when the most probable action is executed from any given state, and it rewards the actor using criteria that depend on the relative probability of the action that was executed.
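
The abstract does not spell out how Q-values are encoded, so the following is only a minimal sketch under an assumed encoding, not the authors' algorithm. It assumes a tabular setting in which the critic stores V(s) = max_a Q(s, a) and the actor stores preferences p(s, a) = Q(s, a) - V(s); the function names, the step size `alpha`, and the discount `gamma` are hypothetical. The sketch re-encodes the whole row after each backup, so it does not reproduce the paper's update rules (for example, updating the critic only after the most probable action); it only checks that an actor/critic pair can carry the same information as a Q-table.

```python
import numpy as np

def decode_q(V, p, s):
    """Recover Q(s, .) from the critic value V[s] and actor preferences p[s].
    Assumed encoding (not from the paper): Q(s, a) = V(s) + p(s, a) - max_b p(s, b)."""
    return V[s] + p[s] - p[s].max()

def encoded_q_update(V, p, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one Q-learning backup to (s, a), then re-encode the row into V and p."""
    q = decode_q(V, p, s)
    # Under the assumed encoding, V[s_next] equals max_b Q(s_next, b).
    q[a] += alpha * (r + gamma * V[s_next] - q[a])
    V[s] = q.max()      # critic keeps the value of the greedy action
    p[s] = q - V[s]     # actor keeps preferences relative to the greedy action

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Reference tabular Q-learning backup."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# Feed both learners the same random transitions and compare the results.
rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
Q = np.zeros((n_states, n_actions))
V = np.zeros(n_states)
p = np.zeros((n_states, n_actions))
for _ in range(200):
    s, s_next = rng.integers(n_states, size=2)
    a = rng.integers(n_actions)
    r = rng.normal()
    q_learning_update(Q, s, a, r, s_next)
    encoded_q_update(V, p, s, a, r, s_next)

decoded = np.stack([decode_q(V, p, s) for s in range(n_states)])
print(np.allclose(decoded, Q))
```

In this sketch the greedy action of the decoded Q-values is always the action with the largest actor preference, which is one way to read "most probable action" in the abstract; the paper itself should be consulted for the actual actor and critic update rules.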

@article{crites_actor/critic_nodate,
	title = {An {Actor}/{Critic} {Algorithm} that is {Equivalent} to {Q}-{Learning}},
	abstract = {We prove the convergence of an actor/critic algorithm that is equivalent to Q-learning by construction. Its equivalence is achieved by encoding Q-values within the policy and value function of the actor and critic. The resultant actor/critic algorithm is novel in two ways: it updates the critic only when the most probable action is executed from any given state, and it rewards the actor using criteria that depend on the relative probability of the action that was executed.},
	language = {en},
	author = {Crites, Robert H and Barto, Andrew G},
	pages = {10}
}
