Reducing memory footprint


#1

Hi,

I apologize if this isn’t the correct place to discuss the Aerospike Server source code, but I couldn’t find another place for discussion, GitHub or otherwise.

I have a use case where the KV pairs have a pretty small value field (~100 bytes), so 64 bytes of index overhead per key means roughly a 1:2 ratio between RAM and SSD, which is less than ideal.

With this in mind, I started going through the source code to figure out if the index memory could be reduced. I looked through the as_index_s structure to see where memory could be saved. Here is my assessment:

  1. Dropping the flex_bits_1, flex_bits_2 and dim fields should be pretty harmless if there is no need for in-memory storage and only SSDs are being used. This saves 10 bytes.
  2. generation could perhaps be made smaller without a significant increase in the chance of accessing stale data.
  3. In my use-case, the keys are unique 64-bit numbers, so the key could be smaller as well, though this would be a more involved change because many parts of code assume a 20-byte key. The LDT code, in particular, munges data in the key digest. It should be easier to have a smaller key if LDT support is not needed.
  4. I don’t understand the comment around the color and migrate_mark fields. Would it be incorrect to pack these bits in one byte?

I’d like to solicit your (read: the developers) thoughts on this.

Thanks, Akshat


#2

Thanks for chatting with me on IRC. I’ll see if we can get a good answer for you. From what I understand, “key stacking” might not be a good solution, because you said “Our keys are 64-bit random numbers but highly sparse, so it’s hard to pack multiple keys into one key.”

I’ll look forward to more details in your email. Meanwhile, I’ll see what answers we can muster for your use case.


#3

Akshat,

Adding flavor to the problem statement.

Of course, there are a few bits in the index that could be trimmed away, but the entry is 64 bytes not only because it needs that much space but also because cache lines are 64 bytes. If you reduce it, it should go down to 32 bytes; otherwise index entries would straddle each other’s cache lines, which would cause a massive performance hit during RB-tree lookups.

Moreover, because Aerospike is schemaless, it places no restriction of any kind on the key type. There is no sort order defined on the primary key; it is simply hashed. A change that forces keys to always be integers goes against this.

– R


#4

Hi Raj,

Thanks for your prompt reply. I’m certainly not suggesting this as a general solution, but I was just wondering if it would make sense to squeeze the index for our use case. Like I told Peter, our keys are quite sparse in a 64-bit key space, so key stacking is likely not to work well, that is, it would be hard to ensure that multiple keys can be packed evenly within one key in Aerospike.

I get your point about fitting each index entry in a cache line – it’ll perhaps be difficult to reduce the index entry down to 32 bytes, even after doing all the things that I enumerated in my original post.

-Akshat