Fitted Q-iteration in Continuous Action-space MDPs. Antos, A., Munos, R., & Szepesvári, C. In *Advances in Neural Information Processing Systems*, pages 9–16, 2007.

Paper abstract bibtex 3 downloads

Paper abstract bibtex 3 downloads

We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant of fitted Q-iteration, where the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies by maximizing the average action values. We provide a rigorous analysis of this algorithm, proving what we believe is the first finite-time bound for value-function based algorithms for continuous state and action problems. Note: In retrospect, it would have been better to call this algorithm an actor-critic algorithm. The algorithm that we considers updates a policy and a value function (action-value function in this case).

@inproceedings{antos2007, abstract = {We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant of fitted Q-iteration, where the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies by maximizing the average action values. We provide a rigorous analysis of this algorithm, proving what we believe is the first finite-time bound for value-function based algorithms for continuous state and action problems. Note: In retrospect, it would have been better to call this algorithm an actor-critic algorithm. The algorithm that we considers updates a policy and a value function (action-value function in this case).}, author = {Antos, A. and Munos, R. and Szepesv{\'a}ri, Cs.}, booktitle = {Advances in Neural Information Processing Systems}, keywords = {batch learning, reinforcement learning, function approximation, performance bounds, actor-critic methods, nonparametrics}, pages = {9--16}, title = {Fitted Q-iteration in Continuous Action-space MDPs}, url_paper = {rlca.pdf}, year = {2007}}

Downloads: 3

{"_id":"iRGWjkePzDuvTtb9Z","bibbaseid":"antos-munos-szepesvri-fittedqiterationincontinuousactionspacemdps-2007","authorIDs":["279PY77kXFE8vWA2Z","2D7qHXzoqBDrsXraN","2NWXtkdHPuiv98fKs","2jdSf6tbdZhqmZoS8","2nxx7ACaruh7iK8f8","2vSJk5XEm3rEYwurK","32JiATPMQE7FWriRt","3AfeijGNsN8mpcE48","3RfzECoweoi7whJcn","3rSFxsAnwMbMpG3S8","3s2MzsDXExmQeBwnh","3zwS8Ssco5SQeaKSF","4KfzFX9PPuzMHGECG","4LGCCsttcqqsBhcgC","4QCWeGJDcuieMasAe","4RdeTkj45uydsJWtf","4Tjqo47EWWsMKkTsz","4rnd6s56kwkYuN4vj","4wv8N73WsiRxpbSDN","596hfkzoGyduaHJsx","5cknfg97BteFEuPYW","5qN54o7kSJx8EXEJB","5wBypKw2vGJjKGJYf","62cdm64LSj2QQKxoe","6P7F7YD5iq9GJoKFF","6WHczfunmjvmK7yt9","6ZE3ATLtdNK2XKNyM","77cGtbpgmo5BLLoQT","793mKnZ6ZfT8NpTSc","7C3Eof9dLjSREQX5o","7LsSZFZGRDEpuBKaT","7RY3ZwaAknsSH2k2J","7WPP3MTRBcmS7isfe","7cw6ZnDSCerwjQ7b7","8LDdAWvCHhd43b59E","8RGDLABf9pK7RnAAP","8WEMJhNeam2JBZzba","8aW4FL2jpj55Fp8Et","8si9ev8RXQ2AMBTYY","99T5SjY7hztGpFBvH","9ptfi8y4NAbFtcFyE","9vAiiJbZE2nqJpSye","A2yHTTtEd7BHAWKxd","AanXwy53QfAQ7H5or","Au9aNaigywe27GRXt","AvtqCYzeczmq6r4Pm","AwDAqABZLH9q7ivbA","BD4Hdv5n3oAaG8Myo","BQb3bQzbnacmQ2Nfn","BWKJPwgwu4YJEiCSF","BaSkT6CoFaikED65c","BeKZBYePker73NYaW","BfkSoEcfxRLvmvZjx","BnDo6icizXoM3ZM6w","Bp6mj7TJsYJZ5ysau","BzqzD3kyiWmGXxSX2","CEF7BzjRG82xSkYnM","CFXdJbo2kH2DxnZ3r","CFyyfMtja9RChADkb","CKmTatuBRfdQ7oRqt","CNNkdvJNYs6mrvzjX","CkjJdYuszRwZC3aby","CuaCYHTopgvGbd8zk","Db6F6oAwmtxrRZ4Xk","EcNmDLmzNx5qJWrTJ","EhgRapTR9gpwd8GqC","EnjbnN5QxJfycEFot","EpjmAxFHcYnPRmpAy","EtiCWD2idtepY8u2v","EzApurSSk4FMQoFfA","F2vs4LRcswWXavxfy","F4KfJnr3Ss7HoDrQ7","F92MDPBq2JbNDgguQ","FKTud9JfBmdmxLsFq","FaD78bpAgKLAq4DE2","FowtiKtSrw63oc8nB","G25PrkxMGXRRMcCc4","G4qsQq4vTc8tzFdDd","G7wmMrhBFwdNWqG6x","GRBsJjKZ5KrhbhuyZ","GRFbpc8LyWJTvyNyN","Ge5Rxopmc3SuMrwAH","GeF4m3ShgXFLrefmv","GpEM5uuobmY3kpHTW","GyG5MtNEJTriLvg9L","GzttjDqkZ7AofCaBC","HQexAEhMqWng3FYTx","HZkj8nWinkG8RGyYR","HeA8EiNy9csjTPfZ7","JEbvqTxk6JESPHnx2","JF54pphczR7WdX7Km","JSjWRZJmttSrtr3aY","JYhYxghGatqr4mF3H","JcKRLmxxJkMLsTpk3","JdCvvY7vmDS37xtBu","JpQfHtdoWMG9CKnBY","Jy8Spsne93gHavwin","JynWnHF987JKr4x96","K35jR5H6x5n7zt68g","KDMX7rrdf6AsAYDyL","KFpw9rYFeSRdATA4e","KKEqgve2E9zpJ3QhS","KRMezpN6EiTiG5FLp","KRpsFoiZnaCs9spJb","KXB8ePWrk8bidEQZ4","KaaDW3CcB7w9jsdXT","KergaMvq5ySYJJ3ja","KrANNKoB6LGvD4RtL","KtXQLnfkd4hAMmHbX","L6rzkWjjNHg7jfnds","L79tQyaj5QPQQWbhg","LQhYttJrwZWd7DrQR","Ld9R4QktQCeKY5Evy","Ldfz29PMdHXZiWNvj","LmWjNvpm7mp5CcgLC","Lw9RgtJqB5qbTcS2C","MC2LMEKQ3EATsdpDY","MCevASJPSmNMGNJxx","MXwP9eMvQqu9NnP5X","MYwHnbXmgZ6kDo3rw","McyeBcvo4PZTAv7M7","MjeSNYzCEoHDzGxrc","MqACNns2ePKNzkjNz","MwHsLe6xMSqRXNS2a","MzMxPK3tnEdaXYXHx","N6qcDirNbTE4YgX8x","Nqdz2pg2eacb6buWK","PJhFKRrz3EvjiKPzR","PSAjh25akFifX2ZfN","Pdp4C2gTBZ7xvHrDK","Por6oS262gdqYa8rr","Pt2EC4NdJaRQQjqMJ","Px8xSNb3LrPQap6Kk","Q6itd4jKLZFdSnTf3","Qhia4guHX4jyzHbQs","QnCMenKnAvFhJGnE5","R2QWF4bMkcqfXtkFy","R4cZsfzoubPJYRrnK","Ro8w9jcjvoj73u7Xr","RqacNJpuaHYLKpaHC","SKxfQQ82vSnyWbaqv","SrwM2vqXtCXv3w6r8","Ssv23SugpQfsQSMX9","SuSf6qvpGmcea5sXg","TBZagbGJSdbNJugmB","TFtNr7Gkec5KGNDtp","TStcihizC52x2ioxc","TceAvDjPMyuHHwxSH","TezhSGA54uQKB2pHs","ThzK4FfJdffq6AX8E","WB9mhhwJaBkCpShBe","WKu3rLNXRRkS5BqtF","WRZmzH63799tejdn7","WXk4jWburgmauBt7C","Wj5JAf7LZRbQrwNBv","WwyKmshbvxgLDETsx","XKguNtDfpi65mQGoP","Xfkk7uQL8EdfTKvQr","XufMgvdbedgEr9AJF","Y5cQfZ2EcGWeKXnPH","YAFji5o58gK7EpkAo","YFwZa2FjPNTszexdz","YSq38LDamJaPixav2","Yb6DPPQCXKHeYbEHN","Ye34T2brz67qpGryv","YiYpFyS8woCkdhoiR","Ykhm4ZJBtfXGifdJT","ZSrq2ruek7WCQpLjZ","ZnEgj7goyoQcrbbkt","ZrEwoLRgsYFRvmBM3","ZuZsatkxppZCHnGih","ZvJuHZznMf9AkJaQz","ZxvYv4Qz5HX2uJuNy","ZyWT99n8MudD2L4wC","ZyYJdxvrLWa3xzfLN","aAM8WYXNivryxK788","aGRfxR5GrwH86e8Yq","aGnaJvGkCaHLPE98T","aKGfFMFF2iP9iBpat","abeZr8physSQM35kQ","ah5nYxu78LHrnohN5","aic3PMJ7pGSALfoQz","ajyTaR56CwAGAwPMK","aod4LHA2acYGGgTq5","ayeonfMB7wX57i6rt","b5rT7DfYesos7Jvxq","bBJKWChRwaW6xbuJ3","bKqf8jmHoiFQ5igMd","bQ6gtMSx58BMJDaGm","bdFvo4PbiGpDXspAj","bmJDR5g2aTpKLryzH","bojupz7LkNeit65N4","bqjCh2KJa9oEXi4W3","bszwp5Z2Qn55mH3DK","busywH6PS39Nm9ZaD","c48vfb55fwPmubFJK","cZeQ9KBRjqhhAddd6","d7HEncsQi9FQ9ZnEf","d9bTcaWoACdq8gcPd","dPLx5jQPTZ38sge6e","dSbpzoDrx9Ej3umd6","dXujMQMamxikyQW5L","daaG2KorDDHmmfE8n","dgFExufhAa2X4ZfDu","dmhDotkJiRPyc4BKc","dnFmayFL7iiJoAKYY","drqQFrFQPKok8anXq","e6FLJXcbsWN389Nac","e7mt5b5K2A7x8tMC4","eCyFauMpXb7HBGEL9","egbKX8T3XeC6CB5Ss","euwQteZ8dvXDgnTeJ","f8CMQr2dxc4mToggD","fCcZBpWoomHwsZhMc","fi9Njbc9PNb6H7mKk","fjJ4rCAY73hrX8FfN","fmmS5YfMBPyiFxMKo","fr7jxa5gmZt5cQgsn","ft2THiTnasQSzHCuz","g53T79nBo3zPS3e3A","g6H7GCHhqB2hxT9tD","g9rrAXy9cpJREqZpG","gCiBwcvBRXuhN6THT","gLRPiWaYJbXc65Xdj","gN9iFKFYKg5hpXH2g","gk4TWPADdMJkG2d55","grDAS7p9N526PE9KT","guQSSkELw6cHcDJ77","hHcy6gGW5T4BmCyGr","hPu7R32gMnDZT8DTM","hapRYywPMG7eTYKdn","hkA3NWgTN3S2hAS9j","hmSszMbDpTHeKRbQN","iTeWB9tMBChksmfRi","iZY3zMXJqtwthqGLf","iehHt2fvDipiirmda","iudtvEJDPRb9kzqwQ","j2FCZFfugW6bzo8sy","j2dgsqZB6s4nwmkv4","jHmuvbkGEx9c2x9qT","jT9EgmjXvsKC8mchN","jfmhkP9tBb5CPF34G","jmCr7AdWrnXoni54Z","jqRm9piESHxML2fDN","jtukZcBvTqeLPEGbG","jvGfvSgSdoi8cCLwG","jzYGL4nHWtXMxLrS2","kKM8sCmqcQ4MF7EyJ","kPKrqFYFpRudpxBGa","kbh4okhBn289JWXSg","kcPLzkQsPqppFc2ta","kvrLo4T5ozkrmr7o3","m6xaEFjGNDJwwTT9g","mRfFf8jcPqEGCcnz6","maXZk4pSWP5ug6itZ","mb33guDtKCRK8YwtX","mqkuyPCpPYeHGRxf2","mumW39T4dMxwJznAX","mzy7ANDQid6NEgT3J","nALRNQrKriwFcyYrP","nysjrsfbKn2mM2kpP","o7eSSyiMrY5sM7Riu","oFkdkTATAEezGtx6z","oaoYKJTLnSSepkCma","owMdmtr7BNr2EGZCH","oz3yZxGqX3YNuYW6p","pGGEajvcnrFZAbtnw","pGL2sg87CePPyBe35","pQkRCin3QCgFfdeZS","pSqSMQXbPwMiceJco","pSsH62z9peDqdQLEp","pcabD8b3a5jgJiCQj","pedoPB9whhPpHP5QN","ppjYhLyLA96Tc3xFv","ptZYRKkQnt6z4opZj","qAzJb4HQRWNFTboh6","qNNzt66GAqQX4q6nj","qQZqRDvYFwWjheqeN","qZG9eGoTDZQerwFFk","qsRk8h3hY5oki6HL8","quxNuFg9Bfgi7tZbL","qyoCz465tcgbF6k84","rLnbnm3N6z6ao7Sgs","rYns6NBaim47CHzB6","rj6N6SCnT439kMHMX","rtq4C4Aa9pjX7cqEA","rweBTLYEuDTMtfrqW","rxFwHBLm65EJmrsff","sAFSd9zJX4zqvgPgP","sBB47EEbFQEeoHYdY","sD2rsN2StRCu4hnEd","sK7bSiuBJzd2hmSb8","sRM2feKwvLjKGE4T5","skqaFycHXghbGjSyr","tEF27Asuzks6LuhDc","tG2QP37XTyXSFqoW5","tN2Xi8dE49T38Me7D","tPEcG6gpERvBMQHXC","tQTMaq52LbDdf2Dkg","tcPCYiCfNx26iQvrG","tepS4j4xyQcYE9w6A","tgkR7vLLWSxHuGWqE","tiNxi5HSdGPY64uTz","trxfmmqMmZfL2ZiQr","u8YWp79iEPkjZWt8B","uFkYZHPYmxzc49L7C","ukkTSQtdhzcxkb7A5","uvf6di2o6HtEqHnEE","vDJqoDcmigxiiLyM3","vEGDZadANdDu7HE4S","vbqzFJHffryGh48j3","ve7TL8xsaiyQYLCoA","vquWK752x4mufxqTh","wEgGifShAzFD5dtAN","wTZjspTgAAWDEffNe","wTfyrvg6vBYFtGNMB","wcQRoMHf267L4NqXE","wgycKNgr5p4RdvfvE","wxDwP8oB96zqRykGW","xDPQQz5qucFkLvQGh","xEdRfj9HvMiXdm446","xEkabBjTQjdvXWXbX","xWzq2PYnQyGs2xgWM","xbt8L6ShwbqvD2BXi","xhFRvr7t74c9RNzfq","xiwi83Yr2jYiCrcZA","xjnXBabC3ENSKmNee","xms6HmdTpJjS7kqXM","xyst9ZfRqvy2Qhf39","y9pNX95MQpgkRNzZ6","yBqs6bCrGdSn3qzts","yCniYJnerQbuGqTfF","yZwfLRC8ZjJPtHRi2","yihFcCL955KFLR4h2","yj2AyF56MJsfhM23o","z28BNvgZk68asQdcK","z3Gjh8c2ESrGnGcxb","z6JBip23F3AdZdfwq","zHjwxCJzKeEdrLPZs","zSC285QSD92qh2Faz","zZ7MtfMqKYD35u4hF","za7fMuwSiCWJ39cYt","zp3vGBRAgXABpLXZr"],"author_short":["Antos, A.","Munos, R.","Szepesvári, C."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","abstract":"We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant of fitted Q-iteration, where the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies by maximizing the average action values. We provide a rigorous analysis of this algorithm, proving what we believe is the first finite-time bound for value-function based algorithms for continuous state and action problems. Note: In retrospect, it would have been better to call this algorithm an actor-critic algorithm. The algorithm that we considers updates a policy and a value function (action-value function in this case).","author":[{"propositions":[],"lastnames":["Antos"],"firstnames":["A."],"suffixes":[]},{"propositions":[],"lastnames":["Munos"],"firstnames":["R."],"suffixes":[]},{"propositions":[],"lastnames":["Szepesvári"],"firstnames":["Cs."],"suffixes":[]}],"booktitle":"Advances in Neural Information Processing Systems","keywords":"batch learning, reinforcement learning, function approximation, performance bounds, actor-critic methods, nonparametrics","pages":"9–16","title":"Fitted Q-iteration in Continuous Action-space MDPs","url_paper":"rlca.pdf","year":"2007","bibtex":"@inproceedings{antos2007,\n\tabstract = {We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant of fitted Q-iteration, where the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies by maximizing the average action values. We provide a rigorous analysis of this algorithm, proving what we believe is the first finite-time bound for value-function based algorithms for continuous state and action problems.\n\n\t\t Note: In retrospect, it would have been better to call this algorithm an actor-critic algorithm. The algorithm that we considers updates a policy and a value function (action-value function in this case).},\n\tauthor = {Antos, A. and Munos, R. and Szepesv{\\'a}ri, Cs.},\n\tbooktitle = {Advances in Neural Information Processing Systems},\n\tkeywords = {batch learning, reinforcement learning, function approximation, performance bounds, actor-critic methods, nonparametrics},\n\tpages = {9--16},\n\ttitle = {Fitted Q-iteration in Continuous Action-space MDPs},\n\turl_paper = {rlca.pdf},\n\tyear = {2007}}\n\n","author_short":["Antos, A.","Munos, R.","Szepesvári, C."],"key":"antos2007","id":"antos2007","bibbaseid":"antos-munos-szepesvri-fittedqiterationincontinuousactionspacemdps-2007","role":"author","urls":{" paper":"https://www.ualberta.ca/~szepesva/papers/rlca.pdf"},"keyword":["batch learning","reinforcement learning","function approximation","performance bounds","actor-critic methods","nonparametrics"],"metadata":{"authorlinks":{"szepesvári, c":"https://sites.ualberta.ca/~szepesva/pubs.html"}},"downloads":3},"bibtype":"inproceedings","biburl":"https://www.ualberta.ca/~szepesva/papers/p2.bib","creationDate":"2020-03-08T20:45:59.873Z","downloads":3,"keywords":["batch learning","reinforcement learning","function approximation","performance bounds","actor-critic methods","nonparametrics"],"search_terms":["fitted","iteration","continuous","action","space","mdps","antos","munos","szepesvári"],"title":"Fitted Q-iteration in Continuous Action-space MDPs","year":2007,"dataSources":["dYMomj4Jofy8t4qmm","Ciq2jeFvPFYBCoxwJ","v2PxY4iCzrNyY9fhF","cd5AYQRw3RHjTgoQc"]}