RACE: Sub-Linear Memory Sketches for Approximate Near-Neighbor Search on Streaming Data

RACE: Sub-Linear Memory Sketches for Approximate Near-Neighbor Search on Streaming Data. Coleman, B., Shrivastava, A., & Baraniuk, R. G. arXiv:1902.06687 [cs, eess, stat], February 2019.


We demonstrate the first possibility of a sub-linear memory sketch for solving the approximate near-neighbor search problem. In particular, we develop an online sketching algorithm that can compress $N$ vectors into a tiny sketch consisting of small arrays of counters whose size scales as $O(N^{b}\log^2{N})$, where $b < 1$ depending on the stability of the near-neighbor search. This sketch is sufficient to identify the top-$v$ near-neighbors with high probability. To the best of our knowledge, this is the first near-neighbor search algorithm that breaks the linear memory ($O(N)$) barrier. We achieve sub-linear memory by combining advances in locality sensitive hashing (LSH) based estimation, especially the recently-published ACE algorithm, with compressed sensing and heavy hitter techniques. We provide strong theoretical guarantees; in particular, our analysis sheds new light on the memory-accuracy tradeoff in the near-neighbor search setting and the role of sparsity in compressed sensing, which could be of independent interest. We rigorously evaluate our framework, which we call RACE (Repeated ACE) data structures on a friend recommendation task on the Google plus graph with more than 100,000 high-dimensional vectors. RACE provides compression that is orders of magnitude better than the random projection based alternative, which is unsurprising given the theoretical advantage. We anticipate that RACE will enable both new theoretical perspectives on near-neighbor search and new methodologies for applications like high-speed data mining, internet-of-things (IoT), and beyond.
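The abstract describes the sketch as small arrays of counters driven by LSH-based estimation. The following is a minimal illustration of that one ingredient only, not the authors' implementation: it assumes signed random projections as the LSH family, and the class name `CounterArraySketch` and its parameters (`num_arrays`, `bits`, `seed`) are hypothetical. The compressed sensing and heavy hitter steps that RACE uses to identify the actual near-neighbors are not shown.

```python
import random


class CounterArraySketch:
    """Hypothetical sketch: several arrays of counters, each indexed by an LSH code."""

    def __init__(self, dim, num_arrays=50, bits=4, seed=0):
        rng = random.Random(seed)
        # Assumption: one set of `bits` random hyperplanes per array
        # (signed random projections, an LSH family for angular similarity).
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
            for _ in range(num_arrays)
        ]
        # Each array holds 2**bits counters, regardless of how many vectors arrive.
        self.counts = [[0] * (2 ** bits) for _ in range(num_arrays)]
        self.n = 0

    @staticmethod
    def _index(planes, x):
        # Concatenate the sign bits of the projections into one bucket index.
        idx = 0
        for p in planes:
            dot = sum(pi * xi for pi, xi in zip(p, x))
            idx = (idx << 1) | (1 if dot >= 0 else 0)
        return idx

    def add(self, x):
        # Streaming update: one counter increment per array; x is then discarded.
        for planes, counts in zip(self.planes, self.counts):
            counts[self._index(planes, x)] += 1
        self.n += 1

    def score(self, q):
        # Average count in the buckets q hashes to: larger when many stored
        # vectors are similar to q, smaller when few are.
        hits = [c[self._index(p, q)] for p, c in zip(self.planes, self.counts)]
        return sum(hits) / len(hits)


# Usage: stream 100 noisy copies of one direction, then compare two queries.
data_rng = random.Random(1)
dim = 8
sketch = CounterArraySketch(dim)
for _ in range(100):
    v = [1.0 + data_rng.gauss(0.0, 0.05)] + [data_rng.gauss(0.0, 0.05) for _ in range(dim - 1)]
    sketch.add(v)

near = [1.0] + [0.0] * (dim - 1)   # same direction as the stored vectors
far = [-1.0] + [0.0] * (dim - 1)   # opposite direction
near_score, far_score = sketch.score(near), sketch.score(far)
```

In this toy setting the memory used is `num_arrays * 2**bits` counters and does not grow with the number of streamed vectors; the query aligned with the data should receive a much higher score than the opposite one.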

@article{coleman_race:_2019,
	title = {{RACE}: {Sub}-{Linear} {Memory} {Sketches} for {Approximate} {Near}-{Neighbor} {Search} on {Streaming} {Data}},
	shorttitle = {{RACE}},
	url = {http://arxiv.org/abs/1902.06687},
	abstract = {We demonstrate the first possibility of a sub-linear memory sketch for solving the approximate near-neighbor search problem. In particular, we develop an online sketching algorithm that can compress $N$ vectors into a tiny sketch consisting of small arrays of counters whose size scales as $O(N^{b}\log^2{N})$, where $b < 1$ depending on the stability of the near-neighbor search. This sketch is sufficient to identify the top-$v$ near-neighbors with high probability. To the best of our knowledge, this is the first near-neighbor search algorithm that breaks the linear memory ($O(N)$) barrier. We achieve sub-linear memory by combining advances in locality sensitive hashing (LSH) based estimation, especially the recently-published ACE algorithm, with compressed sensing and heavy hitter techniques. We provide strong theoretical guarantees; in particular, our analysis sheds new light on the memory-accuracy tradeoff in the near-neighbor search setting and the role of sparsity in compressed sensing, which could be of independent interest. We rigorously evaluate our framework, which we call RACE (Repeated ACE) data structures on a friend recommendation task on the Google plus graph with more than 100,000 high-dimensional vectors. RACE provides compression that is orders of magnitude better than the random projection based alternative, which is unsurprising given the theoretical advantage. We anticipate that RACE will enable both new theoretical perspectives on near-neighbor search and new methodologies for applications like high-speed data mining, internet-of-things (IoT), and beyond.},
	urldate = {2019-03-23},
	journal = {arXiv:1902.06687 [cs, eess, stat]},
	author = {Coleman, Benjamin and Shrivastava, Anshumali and Baraniuk, Richard G.},
	month = feb,
	year = {2019},
	note = {arXiv: 1902.06687},
}
