Software

Re-Pair compression and decompression (2010)

It is difficult to find an efficient (linear-time) and easy-to-use implementation of compressor RePair [Larson and Moffat 2000]. I have implemented such a compressor and decompressor (original and balanced variants). It outputs the dictionary as a sequence of pairs, so you can play with better representations of it.

The code is here.

Re-Pair based Web Graph Compression (2010)

The WebGraph ReCoded site contains free code for compressing Web graphs, performance results, and testbed data (mostly pointers to the wider WebGraph project). The related articles are pointed from the site itself, and their preprints from my pesonal publications page.

This project derives from the MSc thesis of my student Francisco Claude.

Libcds: Library of Compact Data Structures (2009)

It is a Google Code project containing implementations of several compact data structures for manipulating bitmaps, sequences, trees, suffix trees, etc.

It contains code contributed by my (ex)students Diego Arroyuelo, Rodrigo Cánovas, Francisco Claude, Rodrigo González, and myself.

This code is being moved to ReCoded page as well.

Pizza&Chili Site: Compressed Full-Text Indexing Corpus and Testbed (2005)

The Pizza&Chili Site contains free compressed full-text index implementations and testing test collections. It is a joint effort with Paolo Ferragina, University of Pisa, and our students Rodrigo González and Rossano Venturini.

Get the technical report where the current compressed indexes are surveyed, some of which are implemented in the site.

LZgrep: A Direct Compressed Text Search Tool (2003 and 2005)

This is the code corresponding to the paper "LZgrep: A Boyer-Moore String Matching Tool for Ziv-Lempel Compressed Text", by G. Navarro and J. Tarhio. It contains improvements by Carlos Avendano Pérez.

Get the paper that explains all the aspects of the software and appeared in Software Practice & Experience.

Get the code (version 0.99), which is a gzipped tar with the copyright notice in a COPYRIGHT file. Build it with "make" and run "lzgrep --help" to start.
It is source code that has been tested on Solaris and Linux. Please see copyright details and (absence of) warranties in the COPYRIGHT file.
Note: option D-WM not yet implemented (works using D-SBOM). All the rest should work.

LZ-index: A Text Index based on the Ziv-Lempel Trie (2003)

This is the code corresponding to the paper "Indexing Text with the Ziv Lempel Trie" and "Implementing the LZ-index: Theory and Practice", by G. Navarro.

Get either the paper already published in JDA or the full technical report, which explains all the aspects of the implementation and parts thereof have been submitted to ACM Journal of Experimental Algorithmics.

Get version 1.1 of the LZ-index, which is the current one. It differs from the original in a new rank implementation obtained with Rodrigo González, Szymon Grabowski and Veli Mäkinen, which made the index noticeably faster.

You can also get version 1.0, the original code tested in the JDA paper.

In the paper submitted to ACM JEA, a new version of the index is proposed that takes 80% of the space of the original index and is up to 3 times slower for short queries but almost similar on longer queries. Get the code of that version. It uses the improved rank function of version 1.1.

All codes come as gzipped tars with the necessary explanations in a README file and the copyright in a COPYRIGHT file.
They are source code that have been tested on Solaris and Linux.

The LZindex is compared against my particular implementation of the FMindex in the paper. Get that implementation if you want to play with it. It is a gzipped tar with the necessary explanations in a README file and the copyright in a COPYRIGHT file. It contains source code that has been tested on Solaris and Linux.
NOTE: This implementation follows the idea of Ferragina and Manzini's paper, however, it has been optimized to use much more space, so I could make a fair comparison against the LZindex, which is larger. You might instead be interested in their implementation.

You can get Sadakane's implementation of his Compressed Suffix Array I compared to in the paper.

Finally, the PizzaChili site contains newer versions of the LZ-index by Diego Arroyuelo, which require less space.

IXPN: Indexed XPath via Proximal Nodes (2002)

This is a demo of an index based on the Proximal Nodes model, which solves XPath queries. It is the undergraduate thesis of Manuel Ortega, a student of mine. The demo uses the Shakespeare collection, which is widely available.

This is the official IXPN page. Specific questions and comments to Manuel Ortega.

JTV: Java Turing Visual (2002)

This is a graphical interface to simulate Turing Machines, using the abbreviated notation of Lewis and Papadimitriou. It is the undergraduate thesis of Marco Mora, a student of mine. It can work on English too.

This is the official JTV page. Specific questions and comments to Marco Mora.

NR-grep: Fast and Flexible Text Searching (2000)

This is the code corresponding to the paper "NR-grep: A Fast and Flexible Pattern Matching Tool", by G. Navarro.

Get the paper that explains all the aspects of the software and has appeared in Software Practice & Experience.

Get the code (version 1.1.2), which is a gzipped tar with the necessary explanations in a README file.
It is source code distributed under a GNU License. The code has been tested on Solaris and Linux. No guarantees, though.

Want older versions? nrgrep version 1.0 nrgrep version 1.1. nrgrep version 1.1.1.

Online Approximate String Matching (1996)

This is the code corresponding to the paper "Improving an Algorithm for Approximate Pattern Matching", by R. Baeza-Yates and G. Navarro.

It is a gzipped tar with the necessary explanations in a README file inside. It is compiled code for SunOS, no source code.

This code can be freely used and distributed for teaching and research purposes, provided it is not modified and this notice is kept attached. No part of this code can be used for direct or indirect commercial advantage without written permission of the authors.

Here is the code.