Swapping memory to disk with large datasets

Juliano_Galhiego_Vie · September 24, 2017, 11:41am

Hi guys. I have a case where I need to access my data (20+ billion records) through 2 secondary indexes (user-id and global-id). It won’t be accessed very often and it doesn’t need to be very fast. My problem is that I won’t have enough memory to store these indexes, as each index entry needs 64 bytes.
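As a rough back-of-the-envelope check of the memory concern, the sketch below multiplies out the numbers from the post. It assumes that "64B" means 64 bytes per index entry and that every one of the 20 billion records appears once in each of the two indexes. Neither assumption is confirmed in the thread, and the function name is made up for illustration.

```python
# Assumptions (not confirmed by the post): 20 billion records,
# 64 bytes per index entry, one entry per record in each of 2 indexes.
RECORDS = 20_000_000_000
BYTES_PER_ENTRY = 64
INDEXES = 2


def index_memory_bytes(records, bytes_per_entry, indexes):
    """Estimate total index memory as records * entry size * index count."""
    return records * bytes_per_entry * indexes


total = index_memory_bytes(RECORDS, BYTES_PER_ENTRY, INDEXES)
print(f"Estimated index memory: {total / 1e12:.2f} TB")  # 2.56 TB (decimal)
```

Under these assumptions the two indexes together need a few terabytes of RAM across the cluster, which is why keeping them fully in memory is the sticking point.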

As I don’t need it to be very fast, I was thinking about configuring my memory to swap to disk so I can have all the memory I need to store my indexes.

Do you guys think it’s a good idea?

Albot · September 24, 2017, 4:41pm

If you mean you’re opting to run Aerospike on SSDs rather than data-in-memory, then yeah, it works great if you have good disks. A couple of other thoughts:

You could model your data and write reference sets so that you do not need indexes
If they are not accessed frequently and do not need to be quick, you could opt to use scans/predicate scans instead of queries

The best way to know what fits you best of course would be to test your use cases against the different setups and compare!

Juliano_Galhiego_Vie · October 7, 2017, 5:11pm

Hi @Albot. What do you mean when you say “write reference sets so that you do not need indexes”? I searched for it but had no success. Do you know of any docs that cover this?

Thanks for your support.

Albot · October 9, 2017, 4:02pm

Say you’re writing a table to store stock symbol information. Let’s say you have a record like this: PK,SymbolName,SymbolType,Price

Let’s assume the PK is some symbolID we can use for our primary use case. Getting the data is easy: you just form a PK and run client.get. In some use cases, though, you would need the ability to look up by SymbolName or SymbolType. Depending on the cardinality and use case, you may choose from a variety of different methods. The easiest is of course a secondary index.

Another method to do this would be to write a reference record. For example, setSymbols may contain records like this: PK,SymbolName,SymbolType,Price. Let’s assume we are looking at 1 record with the values “1234,GOOG,F,123”.

Now if we need a way to look this record up not just by the PK (1234) but also by the SymbolName (GOOG), we can write a reference set. setSymbolReference: PK,SymbolID. Continuing this example, the reference record we need in order to look up GOOG would have the values “GOOG,1234”.

So step 1 is to look up the reference key, which points to the real primary key, and step 2 is to get the actual data.

get(GOOG) => 1234
get(1234) => data!

The pitfall is that you need to update 2 records whenever 1 update comes through.
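The two-step lookup and the double write can be sketched roughly as follows. This is only an illustration: plain Python dicts stand in for the two sets, so nothing here calls Aerospike. The function names put_symbol and get_by_name are hypothetical; in a real application each dict access would be a client put or get against the corresponding set.

```python
# In-memory stand-ins for the two sets (hypothetical; a real deployment
# would use the Aerospike client's put/get against each set instead).
set_symbols = {}           # PK = symbol ID   -> full data record
set_symbol_reference = {}  # PK = symbol name -> symbol ID


def put_symbol(symbol_id, name, symbol_type, price):
    """Write the data record and its reference record together."""
    old = set_symbols.get(symbol_id)
    # If the name changed, drop the stale reference so it cannot dangle.
    if old is not None and old["SymbolName"] != name:
        set_symbol_reference.pop(old["SymbolName"], None)
    set_symbols[symbol_id] = {
        "SymbolName": name,
        "SymbolType": symbol_type,
        "Price": price,
    }
    set_symbol_reference[name] = symbol_id


def get_by_name(name):
    """Step 1: reference key -> real PK. Step 2: real PK -> data."""
    symbol_id = set_symbol_reference.get(name)
    if symbol_id is None:
        return None
    return set_symbols.get(symbol_id)


put_symbol(1234, "GOOG", "F", 123)
record = get_by_name("GOOG")  # follows GOOG -> 1234 -> data
```

Note that the two writes in put_symbol are not atomic in this sketch, so a real application would have to decide what happens if one of them fails. This approach also only works directly for a unique attribute such as SymbolName; a low-cardinality attribute like SymbolType would need the reference record to hold a list of IDs.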
