Rank and Select Revisited and Extended

Veli Mäkinen and Gonzalo Navarro

The intricate connection between the Burrows-Wheeler transform (BWT) and the so-called rank and select dictionaries over symbol sequences is the basis of most successful approaches to compressed text indexing. Rank of a symbol at a given position equals the number of times the symbol appears in the corresponding prefix of the sequence. Select is the inverse, retrieving the positions of the symbol occurrences. Experience has shown that improvements to rank/select algorithms, in combination with the BWT, turn into improved compressed text indexes. This paper is devoted to alternative implementations and extensions of rank and select dictionaries. First, we show that one can use gap encoding techniques to obtain constant time rank and select queries in essentially the same space as what is achieved by the best current direct solution. Second, we extend symbol rank and select to substring rank and select, giving several space/time tradeoffs for the problem. An application of these queries is in position-restricted substring searching, where one can specify the range in the text where the search is restricted to, and only occurrences residing in that range are to be reported. In addition, arbitrary occurrences are reported in text position order. Several byproducts of our results display connections with searchable partial sums, Chazelle's two-dimensional data structures, and Grossi et al.'s wavelet trees.