Drawback with fine-grained namespaces


What’s the drawback, or overhead, with putting each set in its own namespace?

The reason for this is to be able to store more than 2^32 records (per node), and I’m wondering if there’s any drawback of proactively putting each set in its own namespace from day 1.


You have to consider how you are storing your records - is it data-in-memory? single-bin integer data with data-in-index, hybrid - index in RAM, record data on SSD…etc.

Based on that, compute size of RAM and SSD space required for each node within the limitation guidelines for each node. Depending on your record size, compute max SSD space needed to go with that many number of records. Max 2TB per SSD drive storage can be addressed, max 64 SSD drives can be attached per node.

So before you scale out on the namespace, do some basic sizing calculations on the size of node you will need. 4 billion records per namespace per node need 256GB RAM to store the primary index per namespace per node.

Review capacity planning and limitations links.



BTW, good idea to plan the number of namespaces upfront because it is not possible to add or delete namespaces easily later. Namespace planning should also consider your application use case and data access patterns. There are other very important considerations such as do you need to implement single-bin with data-ini-index for counters? Those are more important considerations than just number of records per namespace per node because you can always grow your cluster horizontally to hold more records in a namespace.