Exploration by Optimisation in Partial Monitoring. Lattimore, T. & Szepesvári, C. In COLT, 06, 2020.
Exploration by Optimisation in Partial Monitoring [pdf]Paper  abstract   bibtex   49 downloads  
We provide a novel algorithm for adversarial $k$-action $d$-outcome partial monitoring that is adaptive, intuitive and efficient. The highlight is that for the non-degenerate locally observable games, the $n$-round minimax regret is bounded by $6m k^{3/2} \sqrt{n łog(k)}$, where $m$ is the number of signals. This matches the best known information-theoretic upper bound derived via Bayesian minimax duality. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games. High probability bounds and simple experiments are also provided.

Downloads: 49