Increased Bit-Parallelism for Approximate String Matching.

Heikki Hyyrö, Kimmo Fredriksson and Gonzalo Navarro.

Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length m in a text of length n in time O(ceil(m/w) n), where w is the number of bits in the computer word. Although this is asymptotically the optimal speedup over the basic O(mn) time algorithm, it wastes bit-parallelism's power in the common case where m is much smaller than w, since w-m bits in the computer words get unused.

In this paper we explore different ways to increase the bit-parallelism when the search pattern is short. First, we show how multiple patterns can be packed in a single computer word so as to search for multiple patterns simultaneously. Instead of paying O(rn) time to search for r patterns of length m, we obtain O(ceil(r/floor(w/m)) n) time. Second, we show how the mechanism permits boosting the search for a single pattern of length m < w, which can be searched for in time O(n / floor(w/m)) instead of O(n). Finally, we show how to extend these algorithms so that the time bounds essentially depend on k instead of m, where k is the maximum number of differences permitted.

Our experimental results show that that the algorithms work well in practice, and are the fastest alternatives for wide range of search parameters.