Hi guys.
I have a use case where I need to access my data (20+ billion records) through two secondary indexes (user-id and global-id). It won't be accessed very often and doesn't need to be very fast.
My problem is that I won't have enough memory to store these indexes, as each index entry takes 64 B.
Since I don't need it to be very fast, I was thinking about configuring my memory to swap to disk so that I have all the memory I need for the indexes.
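To put the memory problem in numbers, here is a rough sizing sketch. It assumes 64 bytes per entry per index (the figure quoted above), exactly one entry per record in each index, 20 billion records as a lower bound, and no other overhead. The constant names are illustrative.

```python
# Back-of-the-envelope index sizing.
# Assumptions: 64 bytes per entry per index, one entry per record per index,
# no other overhead. These are illustrative, not measured figures.
RECORDS = 20_000_000_000      # "20+ billion" records (lower bound)
BYTES_PER_ENTRY = 64          # per-entry size quoted in the question
NUM_INDEXES = 2               # user-id and global-id

per_index_bytes = RECORDS * BYTES_PER_ENTRY
total_bytes = per_index_bytes * NUM_INDEXES

print(f"per index: {per_index_bytes / 10**12:.2f} TB")  # per index: 1.28 TB
print(f"total:     {total_bytes / 10**12:.2f} TB")      # total:     2.56 TB
```

Even under these optimistic assumptions, the two indexes alone come to roughly 2.5 TB of memory.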
If you mean you're opting to run Aerospike with data on SSDs rather than data in memory, then yes, that works great if you have good disks. A couple of other thoughts:
You could model your data and write reference sets so that you don't need indexes.
If the data isn't accessed frequently and doesn't need to be fast, you could opt to use scans/predicate scans instead of queries.
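A minimal sketch of what a scan with a predicate does, written in plain Python over a hypothetical in-memory list rather than with any Aerospike client API: every record is visited and only the matches are kept. No index memory is needed, but the cost grows with the size of the whole set rather than with the number of matches. The function name scan_with_predicate and the sample data are illustrative assumptions.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]

def scan_with_predicate(records: Iterable[Record],
                        predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Visit every record and yield only those matching the predicate.

    No index is consulted, so the work is proportional to the total
    number of records scanned, not to the number of matches.
    """
    for rec in records:
        if predicate(rec):
            yield rec

# Hypothetical sample data standing in for a set of user records.
users: List[Record] = [
    {"pk": 1, "user_id": "u-17", "global_id": "g-900"},
    {"pk": 2, "user_id": "u-42", "global_id": "g-901"},
    {"pk": 3, "user_id": "u-17", "global_id": "g-902"},
]

matches = list(scan_with_predicate(users, lambda r: r["user_id"] == "u-17"))
print([r["pk"] for r in matches])  # [1, 3]
```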
Of course, the best way to know what fits you best is to test your use cases against the different setups and compare!
Hi @Albot
What do you mean when you say “write reference sets so that you do not need indexes”?
I did a little searching but had no success. Do you know of any docs on this?
Say you’re writing a table to store stock symbol information.
Let’s say you have a record like this:
PK,SymbolName,SymbolType,Price
Let's assume the PK is some symbolID we can use for our primary use case. Getting the data is easy: you just form the PK and run client.get. In some use cases, though, you need the ability to look up by SymbolName or SymbolType. Depending on the cardinality and the use case, you may choose from a variety of methods. The easiest is, of course, a secondary index.
Another method to do this would be to write a reference record.
For example, setSymbols may contain records like this: PK,SymbolName,SymbolType,Price. Let's look at one record, with the values “1234,GOOG,F,123”.
Now, if we need a way to look this record up not just by the PK (1234) but also by the SymbolName (GOOG), we can write a reference set.
setSymbolReference: PK,SymbolID. Continuing this example, the reference record we need to look up GOOG would have the values “GOOG,1234”.
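A minimal sketch of the write path, assuming a hypothetical in-memory dict keyed by (set name, key) in place of a real Aerospike client; put and write_symbol are illustrative helpers, not library functions. Each symbol is stored twice: once in setSymbols under its PK, and once in setSymbolReference under its SymbolName, pointing back to that PK.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for a key-value store: (set name, key) -> bins.
Store = Dict[Tuple[str, str], Dict[str, object]]

def put(store: Store, set_name: str, key: str, bins: Dict[str, object]) -> None:
    # Store a copy so later changes to the caller's dict don't leak in.
    store[(set_name, key)] = dict(bins)

def write_symbol(store: Store, symbol_id: str, name: str,
                 symbol_type: str, price: float) -> None:
    # Write 1: the main record, keyed by the primary key (SymbolID).
    put(store, "setSymbols", symbol_id,
        {"SymbolName": name, "SymbolType": symbol_type, "Price": price})
    # Write 2: the reference record, keyed by SymbolName, pointing back to the PK.
    put(store, "setSymbolReference", name, {"SymbolID": symbol_id})

store: Store = {}
write_symbol(store, "1234", "GOOG", "F", 123)
print(store[("setSymbolReference", "GOOG")])  # {'SymbolID': '1234'}
```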
So step 1 is to look up the reference key, which points to the real primary key; step 2 is to get the actual data.
get(GOOG) => 1234
get(1234) => data!
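The two-step read can be sketched the same way, again against a hypothetical in-memory dict rather than the real client; get and get_by_symbol_name are illustrative names. Each lookup by name costs two reads instead of one, in exchange for not keeping a secondary index.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for a key-value store: (set name, key) -> bins.
Store = Dict[Tuple[str, str], Dict[str, object]]

def get(store: Store, set_name: str, key: str) -> Optional[Dict[str, object]]:
    return store.get((set_name, key))

def get_by_symbol_name(store: Store, name: str) -> Optional[Dict[str, object]]:
    # Step 1: read the reference record to find the real primary key.
    ref = get(store, "setSymbolReference", name)
    if ref is None:
        return None
    # Step 2: read the actual data using that primary key.
    return get(store, "setSymbols", str(ref["SymbolID"]))

# Hypothetical store contents matching the GOOG example.
store: Store = {
    ("setSymbols", "1234"): {"SymbolName": "GOOG", "SymbolType": "F", "Price": 123},
    ("setSymbolReference", "GOOG"): {"SymbolID": "1234"},
}

print(get_by_symbol_name(store, "GOOG"))
# {'SymbolName': 'GOOG', 'SymbolType': 'F', 'Price': 123}
```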
The pitfall is that you need to update 2 records whenever 1 update comes through.
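To illustrate the pitfall, here is a sketch of renaming a symbol in the same kind of hypothetical dict-based store (rename_symbol is an illustrative helper). Because the main record and the reference record are separate, a change to the looked-up field means updating the main record, writing the new reference, and deleting the stale one. Nothing in this sketch makes those writes atomic, so a failure between them would leave the two sets out of sync.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for a key-value store: (set name, key) -> bins.
Store = Dict[Tuple[str, str], Dict[str, object]]

def rename_symbol(store: Store, symbol_id: str, new_name: str) -> None:
    main = store[("setSymbols", symbol_id)]
    old_name = str(main["SymbolName"])
    # Write 1: update the main record in place.
    main["SymbolName"] = new_name
    # Write 2: point a reference record at the PK under the new name...
    store[("setSymbolReference", new_name)] = {"SymbolID": symbol_id}
    # ...and remove the stale reference so the old name no longer resolves.
    if old_name != new_name:
        store.pop(("setSymbolReference", old_name), None)

store: Store = {
    ("setSymbols", "1234"): {"SymbolName": "GOOG", "SymbolType": "F", "Price": 123},
    ("setSymbolReference", "GOOG"): {"SymbolID": "1234"},
}
rename_symbol(store, "1234", "GOOGL")
print(sorted(k for k in store if k[0] == "setSymbolReference"))
# [('setSymbolReference', 'GOOGL')]
```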