Efficient and Stable Multi-Dimensional Kolmogorov-Smirnov Distance. Jacobs, P. M., Namjoo, F., & Phillips, J. M. April 2025. arXiv:2504.11299 [stat]
Abstract: We revisit extending the Kolmogorov-Smirnov distance between probability distributions to the multidimensional setting, and make new arguments about the proper way to approach this generalization. Our proposed formulation maximizes the difference over orthogonal dominating rectangular ranges (d-sided rectangles in R^d), and is an integral probability metric. We also prove that the distance between a distribution and a sample from the distribution converges to 0 as the sample size grows, and bound this rate. Moreover, we show that one can, up to this same approximation error, compute the distance efficiently in 4 or fewer dimensions; specifically, the runtime is near-linear in the size of the sample needed for that error. With this, we derive a δ-precision two-sample hypothesis test using this distance. Finally, we show these metric and approximation properties do not hold for other popular variants.
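
To make the formulation above concrete, here is a minimal brute-force sketch in Python/NumPy of the empirical distance over dominating rectangles, as we read it from the abstract. It is not the authors' near-linear algorithm; the function name dominating_ks_distance is ours, and evaluating the discrepancy only at corners drawn from the pooled sample is a common heuristic rather than an exact supremum.

import numpy as np

def dominating_ks_distance(X, Y):
    # Empirical KS-style distance over dominating rectangular ranges:
    # for a corner q, the range is {x in R^d : x_i <= q_i for all i}.
    # Compare the fraction of each sample falling in the range and take
    # the maximum discrepancy over candidate corners. Candidates are
    # restricted to the pooled sample points (a heuristic; the exact
    # supremum may require the full coordinate grid).
    corners = np.vstack([X, Y])
    best = 0.0
    for q in corners:
        fx = np.mean(np.all(X <= q, axis=1))  # empirical mass of X in the range
        fy = np.mean(np.all(Y <= q, axis=1))  # empirical mass of Y in the range
        best = max(best, abs(fx - fy))
    return best

# Toy usage: two shifted Gaussian samples in R^2.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))
Y = rng.normal(0.3, 1.0, size=(500, 2))
print(dominating_ks_distance(X, Y))

This brute force costs O((n+m)^2 d) time; the point of the paper is that, in 4 or fewer dimensions, the distance can instead be computed up to the stated approximation error in time near-linear in the required sample size.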
@misc{jacobs_efficient_2025,
	title = {Efficient and {Stable} {Multi}-{Dimensional} {Kolmogorov}-{Smirnov} {Distance}},
	url = {http://arxiv.org/abs/2504.11299},
	doi = {10.48550/arXiv.2504.11299},
	abstract = {We revisit extending the Kolmogorov-Smirnov distance between probability distributions to the multidimensional setting, and make new arguments about the proper way to approach this generalization. Our proposed formulation maximizes the difference over orthogonal dominating rectangular ranges (d-sided rectangles in $R^d$), and is an integral probability metric. We also prove that the distance between a distribution and a sample from the distribution converges to 0 as the sample size grows, and bound this rate. Moreover we show that one can, up to this same approximation error, compute the distance efficiently in 4 or fewer dimensions; specifically the runtime is near-linear in the size of the sample needed for that error. With this, we derive a δ-precision two-sample hypothesis test using this distance. Finally, we show these metric and approximation properties do not hold for other popular variants.},
	language = {en},
	urldate = {2025-05-28},
	publisher = {arXiv},
	author = {Jacobs, Peter Matthew and Namjoo, Foad and Phillips, Jeff M.},
	month = apr,
	year = {2025},
	note = {arXiv:2504.11299 [stat]},
	keywords = {Computer Science - Computational Geometry, Computer Science - Machine Learning, Statistics - Computation},
}
