Relating Data Compression and Learnability. Littlestone, N. & Warmuth, M. K.

Abstract: We explore the learnability of two-valued functions from samples using the paradigm of Data Compression. A first algorithm (compression) chooses a small subset of the sample, which is called the kernel. A second algorithm predicts future values of the function from the kernel, i.e., the algorithm acts as a hypothesis for the function to be learned. The second algorithm must be able to reconstruct the correct function values when given a point of the original sample. We demonstrate that the existence of a suitable data compression scheme is sufficient to ensure learnability. We express the probability that the hypothesis predicts the function correctly on a random sample point as a function of the sample and kernel sizes. No assumptions are made on the probability distributions according to which the sample points are generated.

This approach provides an alternative to that of [BEHW86], which uses the Vapnik-Chervonenkis dimension to classify learnable geometric concepts. Our bounds are derived directly from the kernel size of the algorithms rather than from the Vapnik-Chervonenkis dimension of the hypothesis class. The proofs are simpler, and the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.

@report{littlestoneRelatingDataCompression1986,
title = {Relating Data Compression and Learnability},
author = {Littlestone, Nick and Warmuth, Manfred K.},
date = {1986-06},
institution = {{Department of Computer and Information Science, University of California}},
location = {{Santa Cruz, California, United States}},
url = {https://users.soe.ucsc.edu/~manfred/pubs/T1.pdf},
abstract = {We explore the learnability of two-valued functions from samples using the paradigm of Data Compression. A first algorithm (compression) chooses a small subset of the sample, which is called the kernel. A second algorithm predicts future values of the function from the kernel, i.e., the algorithm acts as a hypothesis for the function to be learned. The second algorithm must be able to reconstruct the correct function values when given a point of the original sample. We demonstrate that the existence of a suitable data compression scheme is sufficient to ensure learnability. We express the probability that the hypothesis predicts the function correctly on a random sample point as a function of the sample and kernel sizes. No assumptions are made on the probability distributions according to which the sample points are generated.

This approach provides an alternative to that of [BEHW86], which uses the Vapnik-Chervonenkis dimension to classify learnable geometric concepts. Our bounds are derived directly from the kernel size of the algorithms rather than from the Vapnik-Chervonenkis dimension of the hypothesis class. The proofs are simpler, and the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.},
keywords = {*imported-from-citeulike-INRMM,~INRMM-MiD:c-14678589,artificial-intelligence,limiting-factor,machine-learning,mathematics,modelling-uncertainty,software-uncertainty,unexpected-effect}
}
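The abstract's compression/reconstruction paradigm can be illustrated with a minimal sketch (not code from the paper; the threshold-function class and the function names `compress` and `reconstruct` are illustrative assumptions). For thresholds on the real line, f_t(x) = 1 iff x >= t, a kernel of size one suffices: keep the smallest positive example, and the kernel itself acts as the hypothesis.

```python
def compress(sample):
    """Compression algorithm: keep only the smallest positively labeled point
    as the kernel (None encodes an empty kernel for an all-negative sample)."""
    positives = [x for x, label in sample if label == 1]
    return min(positives) if positives else None

def reconstruct(kernel, x):
    """Reconstruction algorithm: predict from the kernel alone.
    The kernel serves as the hypothesis for the function to be learned."""
    if kernel is None:
        return 0                      # no positives seen: predict negative
    return 1 if x >= kernel else 0

# The scheme reconstructs every label of the original sample exactly,
# as the paper's definition of a compression scheme requires.
sample = [(-2.0, 0), (0.5, 0), (1.0, 1), (3.7, 1)]
kernel = compress(sample)             # kernel is the single point 1.0
assert all(reconstruct(kernel, x) == y for x, y in sample)
```

Because every sample compresses to a single point, the paper's bounds would apply with kernel size one, independently of the distribution generating the sample points.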


{"_id":"NFE2Prfx6WNpWT6Gj","bibbaseid":"littlestone-warmuth-relatingdatacompressionandlearnability","authorIDs":[],"author_short":["Littlestone, N.","Warmuth, M. K."],"bibdata":{"bibtype":"report","type":"report","title":"Relating Data Compression and Learnability","author":[{"propositions":[],"lastnames":["Littlestone"],"firstnames":["Nick"],"suffixes":[]},{"propositions":[],"lastnames":["Warmuth"],"firstnames":["Manfred","K."],"suffixes":[]}],"date":"1986-06","institution":"Department of Computer and Information Science, University of California","location":"Santa Cruz, California, United States","url":"https://users.soe.ucsc.edu/̃manfred/pubs/T1.pdf","abstract":"We explore the learnability of two-valued functions from samples using the paradigm of Data Compression. A rst algorithm (compression) choses a small subset of the sample which is called the kernel. A second algorithm predicts future values of the function from the kernel, i.e. the algorithm acts as an hypothesis for the function to be learned. The second algorithm must be able to reconstruct the correct function values when given a point of the original sample. We demonstrate that the existence of a suitable data compression scheme is sucient to ensure learnability. We express the probability that the hypothesis predicts the function correctly on a random sample point as a function of the sample and kernel sizes. No assumptions are made on the probability distributions according to which the sample points are generated. [] This approach provides an alternative to that of [BEHW86], which uses the Vapnik-Chervonenkis dimension to classify learnable geometric concepts. Our bounds are derived directly from the kernel size of the algorithms rather than from the Vapnik-Chervonenkis dimension of the hypothesis class. 
The proofs are simpler and the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.","keywords":"*imported-from-citeulike-INRMM,~INRMM-MiD:c-14678589,artificial-intelligence,limiting-factor,machine-learning,mathematics,modelling-uncertainty,software-uncertainty,unexpected-effect","bibtex":"@report{littlestoneRelatingDataCompression1986,\n title = {Relating Data Compression and Learnability},\n author = {Littlestone, Nick and Warmuth, Manfred K.},\n date = {1986-06},\n institution = {{Department of Computer and Information Science, University of California}},\n location = {{Santa Cruz, California, United States}},\n url = {https://users.soe.ucsc.edu/̃manfred/pubs/T1.pdf},\n abstract = {We explore the learnability of two-valued functions from samples using the paradigm of Data Compression. A rst algorithm (compression) choses a small subset of the sample which is called the kernel. A second algorithm predicts future values of the function from the kernel, i.e. the algorithm acts as an hypothesis for the function to be learned. The second algorithm must be able to reconstruct the correct function values when given a point of the original sample. We demonstrate that the existence of a suitable data compression scheme is sucient to ensure learnability. We express the probability that the hypothesis predicts the function correctly on a random sample point as a function of the sample and kernel sizes. No assumptions are made on the probability distributions according to which the sample points are generated. \n\n[] This approach provides an alternative to that of [BEHW86], which uses the Vapnik-Chervonenkis dimension to classify learnable geometric concepts. Our bounds are derived directly from the kernel size of the algorithms rather than from the Vapnik-Chervonenkis dimension of the hypothesis class. 
The proofs are simpler and the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.},\n keywords = {*imported-from-citeulike-INRMM,~INRMM-MiD:c-14678589,artificial-intelligence,limiting-factor,machine-learning,mathematics,modelling-uncertainty,software-uncertainty,unexpected-effect}\n}\n\n","author_short":["Littlestone, N.","Warmuth, M. K."],"key":"littlestoneRelatingDataCompression1986","id":"littlestoneRelatingDataCompression1986","bibbaseid":"littlestone-warmuth-relatingdatacompressionandlearnability","role":"author","urls":{"Paper":"https://users.soe.ucsc.edu/̃manfred/pubs/T1.pdf"},"keyword":["*imported-from-citeulike-INRMM","~INRMM-MiD:c-14678589","artificial-intelligence","limiting-factor","machine-learning","mathematics","modelling-uncertainty","software-uncertainty","unexpected-effect"],"downloads":0},"bibtype":"report","biburl":"https://tmpfiles.org/dl/58794/INRMM.bib","creationDate":"2020-07-02T22:41:12.738Z","downloads":0,"keywords":["*imported-from-citeulike-inrmm","~inrmm-mid:c-14678589","artificial-intelligence","limiting-factor","machine-learning","mathematics","modelling-uncertainty","software-uncertainty","unexpected-effect"],"search_terms":["relating","data","compression","learnability","littlestone","warmuth"],"title":"Relating Data Compression and Learnability","year":null,"dataSources":["DXuKbcZTirdigFKPF"]}