Swapping memory to disk with large datasets


#1

Hi guys. I have a case where I need to access my data (20+ billion records) through 2 secondary indexes (user-id and global-id). It won’t be accessed very often and it doesn’t need to be very fast. My problem is that I won’t have enough memory to store these indexes, as each index entry needs 64 B.
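
For a sense of scale, here is a rough back-of-the-envelope estimate that takes the figures in this post at face value. It assumes one 64-byte entry per record per index and ignores replication and any other overhead, so treat it as a lower bound rather than a sizing guide. The names are made up for the illustration.

```python
# Back-of-the-envelope index sizing using only the figures quoted in the post.
# Assumption: 64 bytes per index entry, one entry per record per index.
RECORDS = 20_000_000_000   # "20+ billion", taken as a lower bound
BYTES_PER_ENTRY = 64       # figure quoted in the post
NUM_INDEXES = 2            # user-id and global-id


def index_bytes(records: int, bytes_per_entry: int, indexes: int) -> int:
    """Total bytes needed to hold `indexes` indexes over `records` records."""
    return records * bytes_per_entry * indexes


total = index_bytes(RECORDS, BYTES_PER_ENTRY, NUM_INDEXES)
print(f"{total / 1e12:.2f} TB for both indexes")  # prints "2.56 TB for both indexes"
```

That is about 1.28 TB per index before any replication, which is why keeping everything in RAM is not an option here.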

As I don’t need it to be very fast, I was thinking about configuring my memory to swap to disk so that I have all the memory I need to store my indexes.

Do you guys think it’s a good idea?


#2

If you mean you’re opting to run Aerospike on SSDs rather than data-in-memory, then yeah, it works great if you have good disks. A couple of other thoughts:

  • You could model your data and write reference sets so that you do not need indexes
  • If they are not accessed frequently and do not need to be quick, you could opt to use scans/predicate scans instead of queries

The best way to find out what fits you, of course, would be to test your use cases against the different setups and compare! :slight_smile:


#3

Hi @Albot, what do you mean when you say “write reference sets so that you do not need indexes”? I searched for it but had no success. Do you know of any docs that cover this?

Thanks for your support.


#4

Say you’re writing a table to store stock symbol information. Let’s say you have a record like this: PK,SymbolName,SymbolType,Price

Let’s assume the PK is some symbolID we can use for our primary use case. Getting the data is easy: you just form a PK and run client.get. In some use cases, though, you would need the ability to look up by SymbolName or SymbolType. Depending on the cardinality and use case, you may choose a variety of different methods. The easiest is, of course, a secondary index.

Another method would be to write a reference record. For example, setSymbols may contain records like this: PK,SymbolName,SymbolType,Price. Let’s assume we are looking at one record with the values “1234,GOOG,F,123”.

Now, if we need a way to look this record up not just by the PK (1234) but also by the SymbolName (GOOG), we can write a reference set, setSymbolReference: PK,SymbolID. Continuing this example, the reference record we need in order to look up GOOG would have the values “GOOG, 1234”.

So step 1 is to look up the reference key, which points to the real primary key, and step 2 is to get the actual data.

  1. get(GOOG) => 1234
  2. get(1234) => data!

The pitfall is that you need to update 2 records whenever 1 update comes through.
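
To make the pattern concrete, here is a minimal sketch of the two-step lookup and the double write. Plain Python dicts stand in for the two sets, so no Aerospike client calls appear; `put_symbol` and `get_by_name` are hypothetical helper names for this illustration only, and a real application would replace the dict operations with client puts and gets.

```python
# Plain dicts stand in for the two sets; no Aerospike client is used here.
symbols = {}           # stands in for setSymbols: symbol ID -> full record
symbol_reference = {}  # stands in for setSymbolReference: symbol name -> symbol ID


def put_symbol(symbol_id, name, symbol_type, price):
    """Write the data record and its reference record (two writes per update)."""
    symbols[symbol_id] = {
        "SymbolName": name,
        "SymbolType": symbol_type,
        "Price": price,
    }
    symbol_reference[name] = symbol_id


def get_by_name(name):
    """Two-step lookup: reference key -> primary key -> data."""
    symbol_id = symbol_reference.get(name)  # step 1: get(GOOG) => 1234
    if symbol_id is None:
        return None                         # no reference record for this name
    return symbols.get(symbol_id)           # step 2: get(1234) => data


put_symbol(1234, "GOOG", "F", 123)
print(get_by_name("GOOG"))
# prints {'SymbolName': 'GOOG', 'SymbolType': 'F', 'Price': 123}
```

Note that the sketch does not handle a symbol being renamed (the old reference record would have to be removed), and against a real cluster the two writes are separate operations, so the application may have to deal with one succeeding and the other failing.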