Storage space difference between bins and maps

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

#Synopsis Storing data in separate bins takes a different amount of storage space than storing the same data as a map in a single bin. For example, with each field in its own bin:

var doc = {
id: '12321412421421',
d: [60,40,13,14,17,17,20,17,560],
do: {'2': 60,'3': 40},
di: [ 3 ]
};

used-bytes-disk = 512

compared with the same fields stored as a map in a single bin:

var doc = {
info: {
id: '12321412421421',
d: [60,40,13,14,17,17,20,17,560],
do: {'2': 60,'3': 40},
di: [ 3 ]
}
};

used-bytes-disk = 256

#Solution: Aerospike uses MessagePack as the packing format for lists and maps. The best way to estimate the overhead of a list or map is to use http://msgpack.org/ to calculate the specific size for your list or map data. This is the equivalent “BLOB” size.

To be clear, bins of type Map and List are stored as a MessagePack map and array, respectively. Any data stored within a Map or List is stored as MessagePack types. MessagePack is a far more compact means of representing data than what Aerospike uses for bins. Please refer to the MessagePack spec [1] for details, but I will highlight a few points.

Integers

  1. An Integer Bin is a 64-bit data type. In storage it is the bin type (1 byte), and the 64-bit integer (8 bytes). Storage: 9 bytes.
  2. An Integer in MessagePack format varies in size, depending on the magnitude of the integer. Storage: 1, 2, 3, 5, or 9 bytes.
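
To make the integer comparison concrete, here is a minimal sketch that returns the packed size of an integer according to the format families in the MessagePack spec. The function name `msgpackIntSize` is hypothetical, and the sketch assumes the encoder always picks the smallest format that fits, which the spec allows but does not require.

```javascript
// Sketch only: size in bytes of an integer packed as MessagePack,
// assuming the encoder chooses the smallest format that fits the value.
function msgpackIntSize(n) {
  if (!Number.isSafeInteger(n)) {
    throw new RangeError('expected a safe integer');
  }
  if (n >= 0) {
    if (n <= 0x7f) return 1;        // positive fixint
    if (n <= 0xff) return 2;        // uint 8
    if (n <= 0xffff) return 3;      // uint 16
    if (n <= 0xffffffff) return 5;  // uint 32
    return 9;                       // uint 64
  }
  if (n >= -32) return 1;           // negative fixint
  if (n >= -0x80) return 2;         // int 8
  if (n >= -0x8000) return 3;       // int 16
  if (n >= -0x80000000) return 5;   // int 32
  return 9;                         // int 64
}

// Figure quoted in this article for an Integer bin: 1 type byte + 8 data bytes.
const INTEGER_BIN_SIZE = 9;

for (const n of [3, 60, 560, 70000]) {
  console.log(`${n}: MessagePack ${msgpackIntSize(n)} bytes, Integer bin ${INTEGER_BIN_SIZE} bytes`);
}
```

Most values in the example record (60, 40, 13, and so on) fit in a single byte when packed, and only 560 needs 3 bytes.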

Strings and Byte Arrays

  1. Also referred to as strings
  2. A String Bin contains the bin type (1 byte), the size of the string as a 64-bit integer (8 bytes), followed by the actual data (n bytes). Storage: 9 + n bytes.
  3. A String in MessagePack uses only the bytes necessary to represent the size (1, 2, 3, or 5 bytes), much like an integer in MessagePack, followed by the data. Storage: (1, 2, 3, or 5) + n bytes.
  4. It is a 32-bit (4 byte) integer.
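
The string figures can be sketched the same way. The helper names `msgpackStrSize` and `stringBinSize` are hypothetical; the first follows the str format family in the MessagePack spec (assuming the smallest header that fits), and the second simply applies the 9 + n figure quoted above. Aerospike's internal encoding may add details not covered here.

```javascript
// Sketch only: byte length of a string once encoded as UTF-8.
function utf8Length(s) {
  return new TextEncoder().encode(s).length;
}

// Size of a string packed as MessagePack: header (1, 2, 3, or 5 bytes) + data.
function msgpackStrSize(s) {
  const n = utf8Length(s);
  let header;
  if (n <= 31) header = 1;           // fixstr
  else if (n <= 0xff) header = 2;    // str 8
  else if (n <= 0xffff) header = 3;  // str 16
  else header = 5;                   // str 32
  return header + n;
}

// String bin size using the figure quoted in this article: 9 + n bytes.
function stringBinSize(s) {
  return 9 + utf8Length(s);
}

const id = '12321412421421';
console.log(msgpackStrSize(id)); // 15
console.log(stringBinSize(id));  // 23
```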

Map and List

  1. A Map or List Bin contains the bin type (1 byte) followed by the MessagePack representation of the Map or List, which is described in the next item. So a bin containing a Map or List always takes 1 byte more than the same Map or List nested inside another Map or List. Storage: 1 + (1, 3, or 5) + n bytes.
  2. A Map or List in MessagePack contains 1, 3, or 5 bytes of type and size information, followed by each element, each of which is a MessagePack value. For Maps, each element is a key and value pair. Storage: (1, 3, or 5) + n bytes.
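
Putting the pieces together, the sketch below estimates the MessagePack size of the `info` map from the synopsis. The function names are hypothetical, only integers, strings, arrays, and plain objects are handled, and the encoder is assumed to pick the smallest format for each value. It follows the MessagePack spec rather than Aerospike's source, so treat the result as an estimate.

```javascript
// Sketch only: estimate the MessagePack size of nested integers, strings,
// arrays (Lists), and plain objects (Maps with string keys).
function intSize(n) {
  if (n >= 0) {
    if (n <= 0x7f) return 1;
    if (n <= 0xff) return 2;
    if (n <= 0xffff) return 3;
    if (n <= 0xffffffff) return 5;
    return 9;
  }
  if (n >= -32) return 1;
  if (n >= -0x80) return 2;
  if (n >= -0x8000) return 3;
  if (n >= -0x80000000) return 5;
  return 9;
}

function strSize(s) {
  const n = new TextEncoder().encode(s).length;
  const header = n <= 31 ? 1 : n <= 0xff ? 2 : n <= 0xffff ? 3 : 5;
  return header + n;
}

// Type and size information for a map or array: 1, 3, or 5 bytes.
function containerHeaderSize(count) {
  return count <= 15 ? 1 : count <= 0xffff ? 3 : 5;
}

function msgpackSize(value) {
  if (typeof value === 'number' && Number.isSafeInteger(value)) return intSize(value);
  if (typeof value === 'string') return strSize(value);
  if (Array.isArray(value)) {
    return value.reduce((sum, v) => sum + msgpackSize(v), containerHeaderSize(value.length));
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value);
    return entries.reduce(
      (sum, [k, v]) => sum + strSize(k) + msgpackSize(v),
      containerHeaderSize(entries.length)
    );
  }
  throw new TypeError('unsupported value type in this sketch');
}

const info = {
  id: '12321412421421',
  d: [60, 40, 13, 14, 17, 17, 20, 17, 560],
  do: { '2': 60, '3': 40 },
  di: [3]
};

const packed = msgpackSize(info);
console.log(packed);     // 48: the packed map itself
console.log(1 + packed); // 49: plus the 1-byte bin type described above
```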

Now if you take all this into account, you will find that in many cases storing a single Map containing all fields is actually more space efficient. Unfortunately, the database works more efficiently with bins, as it knows how to work with top-level bins and not with entries inside a Map bin.

Note: When using bins, there is at least 28 bytes of overhead per bin (see Capacity Planning). Storage is also allocated in chunks of 128 bytes, which is why both used-bytes-disk values above are multiples of 128.
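
As a rough illustration of the two figures in this note, the sketch below computes the minimum per-bin overhead for each layout and rounds a size up to the next 128-byte chunk. The names are hypothetical, and rounding up is an assumption about what working in chunks means here. The sketch does not reproduce used-bytes-disk, because the record header and other overheads are not given in this article.

```javascript
// Sketch only: figures quoted in the note above.
const CHUNK_SIZE = 128;      // storage chunk size in bytes
const MIN_BIN_OVERHEAD = 28; // minimum overhead per bin in bytes

// Assumption: a size is rounded up to the next multiple of the chunk size.
function roundUpToChunk(bytes, chunk = CHUNK_SIZE) {
  return Math.ceil(bytes / chunk) * chunk;
}

// Minimum overhead attributable to the number of top-level bins.
function minBinOverhead(binCount) {
  return binCount * MIN_BIN_OVERHEAD;
}

console.log(minBinOverhead(4));   // 112: four bins (id, d, do, di)
console.log(minBinOverhead(1));   // 28: one bin (info)
console.log(roundUpToChunk(300)); // 384
```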

It is worthwhile to point out the other side of this data-model trade-off: with separate bins, you can operate on selected bins as desired, create secondary indexes (sindex) as appropriate, and so on.

Reference:

[1] MessagePack specification, msgpack/spec.md at master, msgpack/msgpack on GitHub: https://github.com/msgpack/msgpack/blob/master/spec.md
