Increased Bit-Parallelism for Approximate and Multiple String Matching

Heikki Hyyrö, Kimmo Fredriksson, and Gonzalo Navarro

Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length m in a text of length n in time O(ceil(m/w) n), where w is the number of bits in the computer word. Although this is asymptotically the optimal bit-parallel speedup over the basic O(mn) time algorithm, it wastes bit-parallelism's power in the common case where m is much smaller than w, since w-m bits in the computer words get unused.

In this paper we explore different ways to increase the bit-parallelism when the search pattern is short. First, we show how multiple patterns can be packed into a single computer word so as to search for all them simultaneously. Instead of spending O(rn) time to search for r patterns of length m<=w/2, we need O(ceil(rm/w) n) time. Second, we show how the mechanism permits boosting the search for a single pattern of length m<=w/2, which can be searched for in O(ceil(n/floor(w/m))) bit-parallel steps instead of O(n). Third, we show how to extend these algorithms so that the time bounds essentially depend on k instead of m, where k is the maximum number of differences permitted. Finally, we show how the ideas can be applied to other problems such as multiple exact string matching and one-against-all computation of edit distance and longest common subsequences.

Our experimental results show that the new algorithms work well in practice, obtaining significant speedups over the best existing alternatives especially on short patterns and moderate number of differences allowed. This work fills an important gap in the field, where little work has focused on very short patterns.