HyperLogLog intersection accuracy

Now that Aerospike has announced a new HyperLogLog feature in its latest release, I wonder how accurate it is to intersect a data structure with a very large cardinality and another with a very small one.

For example, suppose one HLL represents all women in the US and a second one represents all US astronauts, and we need to determine how many US women are astronauts.

This is covered here: https://www.aerospike.com/docs/guide/hyperloglog.html#error-bounds-by-hyperloglog-operation

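You are describing a case where HLL intersection accuracy is poor. With plain HLLs, the intersection error scales with the size of the union rather than with the size of the intersection, so when one set is huge and the other is tiny, the error can easily exceed the true intersection. You can place relative error bounds on an HLL if you supply n_minhash_bits. When you do, the HLL uses HyperMinHash algorithms to calculate similarity and intersection.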
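To put rough numbers on the example from the question, here is a minimal Python sketch. It assumes the textbook HyperLogLog relative standard error of about 1.04/sqrt(2^n_index_bits) (Aerospike's documented bounds may use a different constant), picks n_index_bits = 14, and uses hypothetical set sizes (160 million women, 400 astronauts) that are illustrative, not real figures. It only compares the large set's own 1-sigma absolute error with the size of the small set; it is a scale argument, not a model of any particular intersection estimator.

```python
import math


def hll_relative_std_error(n_index_bits: int) -> float:
    # Textbook HyperLogLog relative standard error: about 1.04 / sqrt(m), m = 2**n_index_bits.
    # Assumption: Aerospike's documented bounds may use a different constant.
    return 1.04 / math.sqrt(2 ** n_index_bits)


def hll_absolute_std_error(cardinality: int, n_index_bits: int) -> float:
    # The error is relative, so in absolute terms it grows with the set being estimated.
    return hll_relative_std_error(n_index_bits) * cardinality


# Hypothetical sizes, for illustration only (not real statistics).
women = 160_000_000
astronauts = 400

noise = hll_absolute_std_error(women, n_index_bits=14)
print(f"1-sigma error on the large set: ~{noise:,.0f}")
print(f"Entire small set: {astronauts}")
```

Under these assumptions the large set's estimate alone is uncertain by roughly 1.3 million elements, more than three thousand times the size of the entire small set. This is why a similarity-based estimator such as HyperMinHash is needed to get a meaningful answer.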

I just noticed a mistake on that page concerning how to choose n_index_bits when using MinHash estimation.

The doc says to choose n_index_bits to be at least e^-2, where e is the desired relative error. It should have said log(e^-2). The logs here are base 2, i.e. log2(e^-2), and we should clarify that as well.
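As a quick check of the corrected bound, here is a minimal Python sketch that computes the smallest integer n_index_bits satisfying n_index_bits >= log2(e^-2) for a target relative error e. The function name min_n_index_bits is hypothetical (it is not part of any Aerospike client), and it does not check which n_index_bits values the server actually accepts. It also shows why the uncorrected reading cannot be right: for e = 1%, e^-2 alone is 10,000, far more index bits than any HLL would use.

```python
import math


def min_n_index_bits(rel_error: float) -> int:
    """Smallest integer n_index_bits with n_index_bits >= log2(rel_error ** -2).

    Hypothetical helper for illustration; it does not validate the result
    against the range of n_index_bits that Aerospike accepts.
    """
    if not 0 < rel_error < 1:
        raise ValueError("rel_error must be strictly between 0 and 1")
    # log2(e^-2) == -2 * log2(e); round up to the next whole bit.
    return math.ceil(math.log2(rel_error ** -2))


for e in (0.1, 0.05, 0.01):
    print(f"relative error {e:.0%}: n_index_bits >= {min_n_index_bits(e)}")
```

For example, min_n_index_bits(0.01) returns 14, since log2(10,000) is about 13.29.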

© 2015 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.