List of Clustered Permutations for Proximity Searching

Karina Figueroa and Rodrigo Paredes

The permutation based algorithm has been proved unbeatable in high dimensional spaces, requiring O(|P|) distance evaluations when solving similarity queries (where P is the set of permutants); but needs n evaluations of the permutant distance to compute the order to review the metric dataset, requires O(n|P|) space, and does not take much benefit from low dimensionality. There have been several proposals to avoid the n computations of the permutant distance, however all of them lose precision. Inspired in the list of cluster, in this paper we group the permutations and establish a criterion to discard whole clusters according the permutation of their centers. As a consequence of our proposal, we now reduce not only the space of the index and the number of distance evaluations but also the cpu time required when comparing the permutations themselves. Also, we can use the permutations in low dimensions.