Human-level control through deep reinforcement learning

Human-level control through deep reinforcement learning. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. Nature, 518(7540):529–533, Nature Publishing Group, February 2015. arXiv: 1604.03986 ISSN: 1476-4687 (Electronic) 0028-0836 (Linking)


Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teacher's advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.

@article{mnih_human-level_2015,
	title = {Human-level control through deep reinforcement learning},
	volume = {518},
	issn = {0028-0836},
	url = {http://dx.doi.org/10.1038/nature14236},
	doi = {10.1038/nature14236},
	abstract = {Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teacher's advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.},
	number = {7540},
	journal = {Nature},
	publisher = {Nature Publishing Group},
	author = {Mnih, Volodymyr and Kavukcuoglu, Koray and Silver, David and Rusu, Andrei A. and Veness, Joel and Bellemare, Marc G. and Graves, Alex and Riedmiller, Martin and Fidjeland, Andreas K. and Ostrovski, Georg and Petersen, Stig and Beattie, Charles and Sadik, Amir and Antonoglou, Ioannis and King, Helen and Kumaran, Dharshan and Wierstra, Daan and Legg, Shane and Hassabis, Demis},
	month = feb,
	year = {2015},
	note = {arXiv: 1604.03986
ISSN: 1476-4687 (Electronic) 0028-0836 (Linking)},
	keywords = {★},
	pages = {529--533},
}
