FIX to Analyzing Metric Spaces Indexes: What For?

Level: Medium

There is an error in Section V. It says that the height of the tree will be log_{1/p} n, whereas it has to be always log_k n, even if the balls overlap, because the elements cannot go to more than one and they distribute uniformly. Therefore the terms associated to the sum from log_k n + 1 to h^* are not valid.

Still, note that the analysis as written is correct if you consider the case of clusters of different probability mass, p_1, p_2, ..., p_k (which add up > 1). In this case the masses at depth h are not all p^h, but all the product of h choices between the p_i's. Assuming again F(x)=x to gain intuition, the sum turns out to be (p_1+p_2+...+p_k)^h + rk^h, which is actually a natural extension of the uniform case, where the sum of the p_i's is pk > 1.

The difference, however, is that now the tree height is log_{1/pmax} n >= log_k n, and past this latter depth one can assume there are n clusters. Here the extra terms that were wrong appear, demonstrating that it is just a bad idea to have clusters of different size, and supporting the idea of balancing just as claimed in the paper.