Indexed Hierarchical Approximate String Matching

Luís Russo, Gonzalo Navarro, and Arlindo Oliveira

We present a new search procedure for approximate string matching (ASM) over suffix trees. We show that hierarchical verification, which is a well established technique for on-line searching, can also be used with an indexed approach. We analyze the resulting algorithm and compare it with related approaches, showing that our method achieves the same performance as the hybrid indexes, which are the best in practice for this problem. We make heavy use of a class of compressed indexes that supports bi-directionality, meaning that the search for a pattern can be updated by adding a letter at the right or at the left. Compressed indexes are also space parsimonious since they can be represented, essentially, in the same space as the compressed text. The resulting algorithm is of great interest for several applications, in domains like computational biology and information retrieval.