Efficient multivariate entropy estimation via $k$-nearest neighbour distances

Efficient multivariate entropy estimation via $k$-nearest neighbour distances. Berrett, T. B., Samworth, R. J., & Yuan, M. The Annals of Statistics, 47(1):288–318, February, 2019. Publisher: Institute of Mathematical Statistics


Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko [Probl. Inform. Transm. 23 (1987), 95–101], based on the $k$-nearest neighbour distances of a sample of $n$ independent and identically distributed random vectors in $\mathbb{R}^{d}$. A careful choice of weights enables us to obtain an efficient estimator in arbitrary dimensions, given sufficient smoothness, while the original unweighted estimator is typically only efficient when $d \leq 3$. In addition to the new estimator proposed and theoretical understanding provided, our results facilitate the construction of asymptotically valid confidence intervals for the entropy of asymptotically minimal width.
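To make the objects in the abstract concrete, here is a minimal Python sketch of a Kozachenko–Leonenko-type estimator and of a weighted average of such estimators over the first $k$ nearest-neighbour distances. It uses the commonly stated form $\frac{1}{n}\sum_i \log\bigl(\rho_{(j),i}^d V_d (n-1) / e^{\psi(j)}\bigr)$, where $\rho_{(j),i}$ is the distance from the $i$-th point to its $j$-th nearest neighbour and $V_d$ is the volume of the unit ball. The function names, the brute-force neighbour search, and the example weights are illustrative assumptions, not taken from the paper; in particular, the weights below are arbitrary and are not the carefully chosen weights the abstract refers to.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(k):
    """Digamma at a positive integer k: psi(k) = -gamma + sum_{j=1}^{k-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def log_unit_ball_volume(d):
    """log V_d, where V_d = pi^(d/2) / Gamma(1 + d/2) is the unit Euclidean ball volume."""
    return 0.5 * d * math.log(math.pi) - math.lgamma(1.0 + 0.5 * d)


def weighted_kl_entropy(points, weights):
    """Weighted average of j-nearest-neighbour entropy estimates, j = 1..len(weights).

    points  : sequence of equal-length tuples (assumed distinct, so distances are > 0)
    weights : weights[j-1] is the weight on the j-th neighbour; must sum to 1
              (hypothetical user-supplied weights, not the paper's choice)
    """
    n, d = len(points), len(points[0])
    k = len(weights)
    if not 1 <= k < n:
        raise ValueError("need 1 <= len(weights) < number of points")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    log_vd = log_unit_ball_volume(d)
    total = 0.0
    for i, x in enumerate(points):
        # Brute-force O(n) scan per point; a spatial index would be used for large n.
        dists = sorted(math.dist(x, y) for m, y in enumerate(points) if m != i)
        for j, w in enumerate(weights, start=1):
            if w != 0.0:
                log_xi = (d * math.log(dists[j - 1]) + log_vd
                          + math.log(n - 1) - digamma_int(j))
                total += w * log_xi
    return total / n


def kl_entropy(points, k=1):
    """Unweighted estimator using only the k-th nearest-neighbour distance."""
    return weighted_kl_entropy(points, [0.0] * (k - 1) + [1.0])


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [(rng.gauss(0.0, 1.0),) for _ in range(500)]
    # True entropy of a standard normal is 0.5 * log(2 * pi * e).
    print("k = 3 estimate:      ", kl_entropy(sample, k=3))
    print("equal weights, k = 3:", weighted_kl_entropy(sample, [1 / 3, 1 / 3, 1 / 3]))
    print("true value:          ", 0.5 * math.log(2 * math.pi * math.e))
```

The sketch relies only on the standard library so it can be run as is. Multiplying every point by a constant $a > 0$ shifts the estimate by exactly $d \log a$, which matches how differential entropy behaves under scaling and gives a quick sanity check.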

@article{berrett_efficient_2019,
	title = {Efficient multivariate entropy estimation via $k$-nearest neighbour distances},
	volume = {47},
	issn = {0090-5364, 2168-8966},
	url = {https://projecteuclid.org/journals/annals-of-statistics/volume-47/issue-1/Efficient-multivariate-entropy-estimation-via-k-nearest-neighbour-distances/10.1214/18-AOS1688.full},
	doi = {10.1214/18-AOS1688},
	abstract = {Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko [Probl. Inform. Transm. 23 (1987), 95–101], based on the $k$-nearest neighbour distances of a sample of $n$ independent and identically distributed random vectors in $\mathbb{R}^{d}$. A careful choice of weights enables us to obtain an efficient estimator in arbitrary dimensions, given sufficient smoothness, while the original unweighted estimator is typically only efficient when $d \leq 3$. In addition to the new estimator proposed and theoretical understanding provided, our results facilitate the construction of asymptotically valid confidence intervals for the entropy of asymptotically minimal width.},
	number = {1},
	urldate = {2023-03-24},
	journal = {The Annals of Statistics},
	author = {Berrett, Thomas B. and Samworth, Richard J. and Yuan, Ming},
	month = feb,
	year = {2019},
	note = {Publisher: Institute of Mathematical Statistics},
	keywords = {62G20, 62G05, efficiency, Entropy estimation, Kozachenko–Leonenko estimator, weighted nearest neighbours},
	pages = {288--318},
}

