###
Stronger Lempel-Ziv Based Compressed Text Indexing

####
Diego Arroyuelo, Gonzalo Navarro, and Kunihiko Sadakane

Given a text *T[1..u]* over an alphabet of size *s*,
the *full-text search* problem consists in finding
the *occ* occurrences of a given pattern *P[1..m]* in *T*.
In *indexed* text searching we build an index on *T* to
improve the search time, yet increasing the space requirement.
The current trend in indexed text searching is that of
*compressed full-text self-indices*, which replace
the text with a more space-efficient representation of it,
at the same time providing indexed access to the text.
Thus, we can provide efficient access within compressed space.
The Lempel-Ziv index (LZ-index) of Navarro
is a compressed full-text self-index
able to represent *T* using *4 u Hk(T) + o(u log s)* bits of space,
where *Hk(T)* denotes the *k*-th order empirical entropy of
*T*, for any
*k = o(log_s u)*.
This space is about four times the compressed text size.
The index can locate all the *occ* occurrences
of a pattern *P* in *T* in
*O(m^3 log s + (m+occ) log u)* worst-case time.
Although this index has proven very competitive in practice,
the *O(m^3 log s)* term can be excessive for long patterns.
Also, the factor *4*
in its space complexity makes it larger than other
state-of-the-art alternatives.

In this paper we present stronger Lempel-Ziv based indices (LZ-indices),
improving the overall performance
of the original LZ-index.
We achieve indices requiring *(2+e) u Hk(T) + o(u log s)*
bits of space, for any constant *e > 0*,
which makes them the smallest existing LZ-indices.
We simultaneously improve the search time to *O(m^2 + (m+occ) log u)*,
which makes our indices very competitive with state-of-the-art alternatives.
Our indices support displaying any text substring of length *l*
in optimal *O(l/log_s u)* time.
In addition, we show how the space can be squeezed to
*(1+e) u Hk(T) + o(u log s)* to obtain a structure with
*O(m^2)* average search time for *m >= 2 log_s u*.
Alternatively, the search time of LZ-indices can be improved to
*O((m+occ) log u)*
with *(3+e) u Hk(T) + o(u log s)* bits of space,
which is much less than the space needed by other Lempel-Ziv-based
indices achieving the same search time.
Overall our indices stand out as a very attractive alternative
for space-efficient indexed text searching.