Multi-Advisor Reinforcement Learning. Laroche, R., Fatemi, M., Romoff, J., & van Seijen, H. arXiv:1704.00756 [cs, stat], April 2017.
Abstract: We consider tackling a single-agent RL problem by distributing it to n learners. These learners, called advisors, endeavour to solve the problem from a different focus. Their advice, taking the form of action values, is then communicated to an aggregator, which is in control of the system. We show that the local planning method for the advisors is critical and that none of the ones found in the literature is flawless: the egocentric planning overestimates values of states where the other advisors disagree, and the agnostic planning is inefficient around danger zones. We introduce a novel approach called empathic and discuss its theoretical aspects. We empirically examine and validate our theoretical findings on a fruit collection task.
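As a rough illustration of the setup the abstract describes, the sketch below shows n advisors each reporting action values for the current state and an aggregator combining them to choose the action. This is a minimal sketch under the assumption of a simple sum aggregator; the Advisor class, the aggregate function, and the random placeholder values are illustrative only and are not taken from the paper.

import numpy as np

# Illustrative sketch (not the paper's code): each advisor reports action
# values for the current state, and an aggregator combines the advice to
# pick the action it executes in the environment.

class Advisor:
    """A learner that evaluates actions from its own focus (assumed placeholder)."""
    def __init__(self, n_actions, rng):
        self.n_actions = n_actions
        self.rng = rng

    def action_values(self, state):
        # Placeholder: a real advisor would return its learned Q-values for `state`.
        return self.rng.normal(size=self.n_actions)

def aggregate(advisors, state):
    """Sum the advisors' action values and act greedily on the total (assumed aggregator)."""
    total = np.zeros(advisors[0].n_actions)
    for advisor in advisors:
        total += advisor.action_values(state)
    return int(np.argmax(total))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    advisors = [Advisor(n_actions=4, rng=rng) for _ in range(3)]
    print("aggregator picks action", aggregate(advisors, state=None))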
@article{laroche_multi-advisor_2017,
title = {Multi-{Advisor} {Reinforcement} {Learning}},
url = {http://arxiv.org/abs/1704.00756},
abstract = {We consider tackling a single-agent RL problem by distributing it to n learners. These learners, called advisors, endeavour to solve the problem from a different focus. Their advice, taking the form of action values, is then communicated to an aggregator, which is in control of the system. We show that the local planning method for the advisors is critical and that none of the ones found in the literature is flawless: the egocentric planning overestimates values of states where the other advisors disagree, and the agnostic planning is inefficient around danger zones. We introduce a novel approach called empathic and discuss its theoretical aspects. We empirically examine and validate our theoretical findings on a fruit collection task.},
language = {en},
urldate = {2019-06-18},
journal = {arXiv:1704.00756 [cs, stat]},
author = {Laroche, Romain and Fatemi, Mehdi and Romoff, Joshua and van Seijen, Harm},
month = apr,
year = {2017},
note = {arXiv: 1704.00756},
keywords = {Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Statistics - Machine Learning}
}