A Hybrid Indexing Method for Approximate String Matching

Gonzalo Navarro and Ricardo Baeza-Yates

We present a new indexing method for the approximate string matching problem. The method is based on a suffix array combined with a partitioning of the pattern. We analyze the resulting algorithm and show that the average retrieval time is O(n^lambda *log n), for some lambda>0 that depends on the error fraction tolerated alpha and the alphabet size sigma. It is shown that lambda<1 for approximately alpha < 1-e/sqrt(sigma), where e=2.718.... The space required is four times the text size, which is quite moderate for this problem. We experimentally show that this index can outperform by far all the existing alternatives for indexed approximate searching. These are also the first experiments that compare the different existing schemes.