A randomized mirror descent algorithm for large scale multiple kernel learning

A randomized mirror descent algorithm for large scale multiple kernel learning. Afkanpour, A., György, A., Bowling, M. H., & Szepesvári, C. In ICML, pages 374–382, 06, 2013.

Paper abstract bibtex 2 downloads

We consider the problem of simultaneously learning to linearly combine a very large number of kernels and learn a good predictor based on the learnt kernel. When the number of kernels d to be combined is very large, multiple kernel learning methods whose computational cost scales linearly in d are intractable. We propose a randomized version of the mirror descent algorithm to overcome this issue, under the objective of minimizing the group p-norm penalized empirical risk. The key to achieve the required exponential speed-up is the computationally efficient construction of low-variance estimates of the gradient. We propose importance sampling based estimates, and find that the ideal distribution samples a coordinate with a probability proportional to the magnitude of the corresponding gradient. We show the surprising result that in the case of learning the coefficients of a polynomial kernel, the combinatorial structure of the base kernels to be combined allows the implementation of sampling from this distribution to run in $O(løg(d))$ time, making the total computational cost of the method to achieve an epsilon-optimal solution to be $O(łog(d)/ε^2)$, thereby allowing our method to operate for very large values of d. Experiments with simulated and real data confirm that the new algorithm is computationally more efficient than its state-of-the-art alternatives.

@inproceedings{AfGyBoSze13,
	abstract = {We consider the problem of simultaneously learning to linearly combine a very large number of kernels and learn a good predictor based on the learnt kernel.
When the number of kernels d to be combined is very large, multiple kernel learning methods whose computational cost scales linearly in d are intractable.
We propose a randomized version of the mirror descent algorithm to overcome this issue, under the objective of minimizing the group p-norm penalized empirical risk. The key to achieve the required exponential speed-up  is the computationally efficient construction of low-variance estimates of the gradient.
We propose importance sampling based estimates, and find that the ideal distribution samples a coordinate with a probability proportional to the magnitude of the corresponding gradient. We show the surprising result that in the case of learning the coefficients of a polynomial kernel, the combinatorial structure of the base kernels to be combined allows the implementation of sampling from this distribution to run in $O(l\og(d))$ time, making the total computational cost of the method to achieve an epsilon-optimal solution to be $O(\log(d)/\epsilon^2)$, thereby allowing our method to operate for very large values of d. Experiments with simulated and real data confirm that the new algorithm is computationally more efficient than its state-of-the-art alternatives.},
	acceptrate = {283 out of 1204=24\%},
	author = {Afkanpour, A. and Gy{\"o}rgy, A. and Bowling, M. H. and Szepesv{\'a}ri, Cs.},
	booktitle = {ICML},
	keywords = {theory, RKHS, kernels, optimization, multikernel learning, stochastic gradient methods, coordinate descent, mirror descent},
	month = {06},
	pages = {374--382},
	title = {A randomized mirror descent algorithm for large scale multiple kernel learning},
	url_paper = {mkl_icml2013.pdf},
	year = {2013}}

Downloads: 2