A Bit-parallel approach to Suffix Automata: Fast Extended String Matching.

Gonzalo Navarro and Mathieu Raffinot

We present a new algorithm for string matching. The algorithm, called BNDM, is the bit-parallel simulation of a known (but recent) algorithm called BDM. BDM skips characters using a "suffix automaton" which is made deterministic in the preprocessing. BNDM, instead, simulates the nondeterministic version using bit-parallelism. This algorithm is 20%-25% faster than BDM, 2-3 times faster than other bit-parallel algorithms, and 10%-40% faster than all the Boyer-Moore family. This makes it the fastest algorithm in all cases except for very short or very long patterns (e.g. on English text it is the fastest between 5 and 110 characters). Moreover, the algorithm is very simple, allowing to easily implement other variants of BDM which are extremely complex in their original formulation. We show that, as other bit-parallel algorithms, BNDM can be extended to handle classes of characters in the pattern and in the text, multiple patterns and to allow errors in the pattern or in the text, combining simplicity, efficiency and flexibility. We also generalize the suffix automaton definition to handle classes of characters. To the best of our knowledge, this extension has not been studied before.