In this paper we introduce a novel representation of the objects as sets of integers, with a distance that is computed using set operations. This allows us to use compressed inverted indexes, which have become a mature technology that excels in this type of search. Such indexes allow processing queries in main memory even for very large databases, so that the disk is accessed only to verify a few candidates and present the results.
We show that our technique achieves very good speed/compression/recall tradeoffs. As an example, to reach 92% recall in 30-nearest neighbor searches, an index using 1 to 2.5 bytes per object inspects less than 0.6% of the actual objects. Furthermore, the ratio between the distances to the actual nearest neighbors and those reported by our algorithm is very close to 1.