A Prototype for Querying over LZCS Transformed Documents

Joaquín Adiego, Gonzalo Navarro, and Pablo de la Fuente

We present novel query algorithms that efficiently support several popular XPath operations over LZCS-transformed documents. The LZCS transformation losslessly compresses redundant XML collections. The main idea of LZCS, inspired by Lempel-Ziv compression, is to replace whole substructures with references to previous occurrences of them, and our algorithms try to reuse the work already done on those repeated substructures. The algorithms are implemented in a prototype called lzcs-grep. The main advantage of lzcs-grep is that it processes documents directly in transformed form, achieving very fast response times together with low memory requirements. Our experimental results show that lzcs-grep is competitive with other XPath processors even over untransformed documents, and outperforms them by a wide margin when it can operate over their LZCS-transformed versions.
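The abstract describes LZCS only at a high level, so the following Python sketch is a toy illustration of its two ideas under simplifying assumptions, not the actual LZCS encoding or the lzcs-grep algorithms. It models a document as a tree of tags (no text or attributes), replaces every subtree that repeats an earlier identical subtree with a back-reference to that first occurrence, and then answers a `//tag` count query while computing each explicitly stored subtree only once, so a back-reference reuses the earlier result. All names (`Node`, `Ref`, `CNode`, `shape`, `transform`, `count_tag`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Node:
    """A plain (untransformed) element: a tag plus child elements."""
    tag: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class Ref:
    """Hypothetical back-reference to an earlier, identical subtree."""
    target: int


@dataclass
class CNode:
    """An element stored explicitly in the transformed tree."""
    ident: int
    tag: str
    children: List[Union["CNode", Ref]]


def shape(node: Node) -> str:
    """Canonical string of a subtree, so identical subtrees compare equal."""
    return node.tag + "(" + ",".join(shape(c) for c in node.children) + ")"


def transform(root: Node) -> Dict[int, CNode]:
    """Replace each repeated subtree with a Ref to its first occurrence.

    Returns a table from ident to stored node; ident 0 is the root.
    """
    seen: Dict[str, int] = {}
    table: Dict[int, CNode] = {}

    def visit(node: Node) -> Union[CNode, Ref]:
        key = shape(node)
        if key in seen:  # identical subtree already stored earlier
            return Ref(seen[key])
        ident = len(table)
        cnode = CNode(ident, node.tag, [])
        table[ident] = cnode
        cnode.children = [visit(c) for c in node.children]
        seen[key] = ident  # register only once the subtree is complete
        return cnode

    visit(root)
    return table


def count_tag(table: Dict[int, CNode], tag: str) -> int:
    """Count elements named `tag` (like //tag), reusing per-subtree results."""
    memo: Dict[int, int] = {}

    def count(item: Union[CNode, Ref]) -> int:
        ident = item.target if isinstance(item, Ref) else item.ident
        if ident not in memo:  # a Ref to a finished subtree hits the memo
            cnode = table[ident]
            memo[ident] = int(cnode.tag == tag) + sum(count(c) for c in cnode.children)
        return memo[ident]

    return count(table[0])


doc = Node("lib", [
    Node("book", [Node("title"), Node("author")]),
    Node("book", [Node("title"), Node("author")]),
    Node("book", [Node("title")]),
])
table = transform(doc)
print(len(table))                 # 5 stored elements instead of 9
print(count_tag(table, "title"))  # 3
```

Here the second `book` becomes a single reference to the first one, and the query counts its `title` by looking up the memoized result instead of walking it again. A real system would use a far more compact encoding than Python objects and canonical strings; the sketch only shows why repeated substructures let both storage and query work be shared.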