Exploration by Optimisation in Partial Monitoring

Exploration by Optimisation in Partial Monitoring. Lattimore, T. & Szepesvári, C. In COLT, 06, 2020.

We provide a novel algorithm for adversarial $k$-action $d$-outcome partial monitoring that is adaptive, intuitive and efficient. The highlight is that for the non-degenerate locally observable games, the $n$-round minimax regret is bounded by $6m k^{3/2} \sqrt{n łog(k)}$, where $m$ is the number of signals. This matches the best known information-theoretic upper bound derived via Bayesian minimax duality. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games. High probability bounds and simple experiments are also provided.

@inproceedings{LSz20,
	abstract = {We provide a novel algorithm for adversarial $k$-action $d$-outcome partial monitoring that is adaptive, intuitive and efficient.
The highlight is that for the non-degenerate locally observable games, the $n$-round minimax regret is bounded by
$6m k^{3/2} \sqrt{n \log(k)}$, where $m$ is the number of signals.
This matches the best known information-theoretic upper bound derived via Bayesian minimax duality.
The same algorithm also achieves near-optimal regret for full information, bandit and
globally observable games. High probability bounds and simple experiments are also provided.
},
	acceptrate = {120 out of 359=33\%},
	author = {Lattimore, Tor and Szepesv\'ari, Csaba},
	booktitle = {COLT},
	month = {06},
	title = {Exploration by Optimisation in Partial Monitoring},
	url_paper = {COLT2020_explbyopt.pdf},
	year = {2020},
	keywords = {exploration},		
	}

Downloads: 49

{"_id":"qM3NasWYTwbE4abbW","bibbaseid":"lattimore-szepesvri-explorationbyoptimisationinpartialmonitoring-2020","authorIDs":["279PY77kXFE8vWA2Z","3RfzECoweoi7whJcn","4QCWeGJDcuieMasAe","4Tjqo47EWWsMKkTsz","4rnd6s56kwkYuN4vj","596hfkzoGyduaHJsx","6ZE3ATLtdNK2XKNyM","99T5SjY7hztGpFBvH","9ptfi8y4NAbFtcFyE","A2yHTTtEd7BHAWKxd","BnDo6icizXoM3ZM6w","CEF7BzjRG82xSkYnM","CNNkdvJNYs6mrvzjX","CuaCYHTopgvGbd8zk","F2vs4LRcswWXavxfy","FaD78bpAgKLAq4DE2","G25PrkxMGXRRMcCc4","Ge5Rxopmc3SuMrwAH","GpEM5uuobmY3kpHTW","JYhYxghGatqr4mF3H","JdCvvY7vmDS37xtBu","KDMX7rrdf6AsAYDyL","KFpw9rYFeSRdATA4e","KRpsFoiZnaCs9spJb","KaaDW3CcB7w9jsdXT","KergaMvq5ySYJJ3ja","L79tQyaj5QPQQWbhg","MYwHnbXmgZ6kDo3rw","MwHsLe6xMSqRXNS2a","Px8xSNb3LrPQap6Kk","Q6itd4jKLZFdSnTf3","R2QWF4bMkcqfXtkFy","R4cZsfzoubPJYRrnK","Ro8w9jcjvoj73u7Xr","TFtNr7Gkec5KGNDtp","XKguNtDfpi65mQGoP","Xfkk7uQL8EdfTKvQr","ZuZsatkxppZCHnGih","ZxvYv4Qz5HX2uJuNy","abeZr8physSQM35kQ","aod4LHA2acYGGgTq5","dPLx5jQPTZ38sge6e","daaG2KorDDHmmfE8n","e6FLJXcbsWN389Nac","euwQteZ8dvXDgnTeJ","fCcZBpWoomHwsZhMc","fjJ4rCAY73hrX8FfN","jT9EgmjXvsKC8mchN","jqRm9piESHxML2fDN","jzYGL4nHWtXMxLrS2","o7eSSyiMrY5sM7Riu","qZG9eGoTDZQerwFFk","rLnbnm3N6z6ao7Sgs","tPEcG6gpERvBMQHXC","tcPCYiCfNx26iQvrG","tepS4j4xyQcYE9w6A","u8YWp79iEPkjZWt8B","vEGDZadANdDu7HE4S","xEkabBjTQjdvXWXbX","xyst9ZfRqvy2Qhf39","z3Gjh8c2ESrGnGcxb"],"author_short":["Lattimore, T.","Szepesvári, C."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","abstract":"We provide a novel algorithm for adversarial $k$-action $d$-outcome partial monitoring that is adaptive, intuitive and efficient. The highlight is that for the non-degenerate locally observable games, the $n$-round minimax regret is bounded by $6m k^{3/2} \\sqrt{n łog(k)}$, where $m$ is the number of signals. This matches the best known information-theoretic upper bound derived via Bayesian minimax duality. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games. High probability bounds and simple experiments are also provided. ","acceptrate":"120 out of 359=33%","author":[{"propositions":[],"lastnames":["Lattimore"],"firstnames":["Tor"],"suffixes":[]},{"propositions":[],"lastnames":["Szepesvári"],"firstnames":["Csaba"],"suffixes":[]}],"booktitle":"COLT","month":"06","title":"Exploration by Optimisation in Partial Monitoring","url_paper":"COLT2020_explbyopt.pdf","year":"2020","keywords":"exploration","bibtex":"@inproceedings{LSz20,\n\tabstract = {We provide a novel algorithm for adversarial $k$-action $d$-outcome partial monitoring that is adaptive, intuitive and efficient.\nThe highlight is that for the non-degenerate locally observable games, the $n$-round minimax regret is bounded by\n$6m k^{3/2} \\sqrt{n \\log(k)}$, where $m$ is the number of signals.\nThis matches the best known information-theoretic upper bound derived via Bayesian minimax duality.\nThe same algorithm also achieves near-optimal regret for full information, bandit and\nglobally observable games. High probability bounds and simple experiments are also provided.\n},\n\tacceptrate = {120 out of 359=33\\%},\n\tauthor = {Lattimore, Tor and Szepesv\\'ari, Csaba},\n\tbooktitle = {COLT},\n\tmonth = {06},\n\ttitle = {Exploration by Optimisation in Partial Monitoring},\n\turl_paper = {COLT2020_explbyopt.pdf},\n\tyear = {2020},\n\tkeywords = {exploration},\t\t\n\t}\n\n","author_short":["Lattimore, T.","Szepesvári, C."],"key":"LSz20","id":"LSz20","bibbaseid":"lattimore-szepesvri-explorationbyoptimisationinpartialmonitoring-2020","role":"author","urls":{" paper":"https://www.ualberta.ca/~szepesva/papers/COLT2020_explbyopt.pdf"},"keyword":["exploration"],"metadata":{"authorlinks":{"szepesvári, c":"https://sites.ualberta.ca/~szepesva/pubs.html"}},"downloads":49},"bibtype":"inproceedings","biburl":"https://www.ualberta.ca/~szepesva/papers/p2.bib","creationDate":"2020-07-06T22:49:00.034Z","downloads":49,"keywords":["exploration"],"search_terms":["exploration","optimisation","partial","monitoring","lattimore","szepesvári"],"title":"Exploration by Optimisation in Partial Monitoring","year":2020,"dataSources":["dYMomj4Jofy8t4qmm","Ciq2jeFvPFYBCoxwJ","v2PxY4iCzrNyY9fhF","cd5AYQRw3RHjTgoQc"]}