Modeling Text Databases

Ricardo Baeza-Yates and Gonzalo Navarro

We present a unified view of models for text databases, proving new relations between empirical and theoretical models. A particular case that we cover is the Web. We also introduce a simple model for random queries and the size of their answers, and give experimental results that support it. As an example of the importance of text modeling, we analyze the time and space overhead of inverted files for the Web.