Indexing Text using the Ziv-Lempel Trie.

Gonzalo Navarro

Let a text of u characters over an alphabet of size s be compressible to n symbols by the LZ78 or LZW algorithm. We show how to build a data structure, called the LZ-index, based on the Ziv-Lempel trie that takes 4n log_2(n) (1+o(1)) bits of space (that is, 4 times the entropy of the text) and reports the R occurrences of a pattern of length m in worst case time O(m^3 log(s) + (m+R)log(n)). We present a practical implementation of the LZ-index, which is faster than current alternatives when we take into consideration the time to report the positions or text contexts of the occurrences found.