A Compressed Text Index on Secondary Memory

Rodrigo González and Gonzalo Navarro.

We introduce a practical disk-based compressed text index that, when the text is compressible, takes much less space than the suffix array. It provides good I/O times for searching, and these times improve further when the text is compressible. In this respect our index is unique, as most compressed indexes are slower than their classical counterparts on secondary memory. We analyze our index and show experimentally that it is extremely competitive on compressible texts. As a side contribution, we introduce a simple encoding of sequences that achieves high-order compression and provides constant-time random access in both main and secondary memory.