Succinct Nearest Neighbor Search

Eric Sadit, Edgar Chávez, and Gonzalo Navarro

In this paper we present a novel technique for nearest neighbor searching dubbed neighborhood approximation. The central idea is to divide the database in compact regions represented by a single object, called the reference. To search for nearest neighbors a set of candidate references is first obtained and later enriched with the database objects associated to references.

The above approach can be implemented as an inverted index, which in turn can be represented in a succinct way, just a few bits per object. As a consequence it is possible to store the index in main memory, even for relatively large databases.

The speed/compression/recall tradeoff is excellent. To obtain 92% recall in 30-nearest neighbors searches the index review less than 0.006 of the database in time ranging from 0.35 to 2.67 seconds using from 93 to 24 Mbytes for a ten million objects database. The tradeoff comes from using different compression techniques. The uncompressed index uses 0.17 seconds time using 267 Mbytes of space.

A quality measure complementary to the recall is the ratio between the covering radius of the actual nearest neighbors and the near neighbors reported by the algorithm. Using this measure our results are within a small constant compared to exact results.