Reducing the Space Requirement of LZ-index

Diego Arroyuelo, Gonzalo Navarro, and Kunihiko Sadakane

The LZ-index is a compressed full-text self-index able to represent a text T[1..u], over an alphabet of size s and with k-th order empirical entropy Hk(T), using 4uHk(T) + o(u log s) bits for any k = o(log_s u). It can report all the occ occurrences of a pattern P[1..m] in T in O(m^3 log s + (m+occ) log u) worst case time. This is the only existing data structure of size O(uHk(T)) able of spending O(log u) time per occurrence reported. Its main drawback is the factor 4 in its space complexity, which makes it larger than other state-of-the-art alternatives. In this paper we present two different approaches to reduce the space requirement of LZ-index. In both cases we achieve (2+e) u H_k(T) + o(u log s) bits of space, for any constant e > 0, and we simultaneously improve the search time to O(m^2 log m + (m+occ) log u). Both indexes support displaying any subtext of length l in optimal O(l / log_s u) time. In addition, we show how the space can be squeezed to (1+e) u H_k(T) + o(u log s) to obtain a structure with O(m^2) average search time for m >= 2 log_s u.