###
Succinct Nearest Neighbor Search

####
Eric Sadit Tellez, Edgar Chavez, and Gonzalo Navarro.

An efficient and universal similarity search solution is a holy grail for
multimedia information retrieval. Most similarity indexes work by
mapping the original multimedia objects into simpler representations, which
are then searched by proximity using a suitable distance function.
In this paper we introduce a novel representation of the objects as sets of
integers, with a distance that is computed using set operations. This allows
us to use compressed inverted indexes, which have become a mature technology
that excels in this type of search. Such indexes allow processing queries in
main memory even for very large databases, so that the disk is accessed only
to verify a few candidates and present the results.

We show that our technique achieves very good speed/compression/recall
tradeoffs. As an example, to reach 92% recall in 30-nearest neighbor
searches,
an index using 1 to 2.5 bytes per object inspects less than 0.6% of the
actual objects.
Furthermore, the ratio between the distances to the actual nearest neighbors
and those reported by our algorithm is very close to 1.