We are new to Aerospike, and so far our experience has been very positive! We are going to use Aerospike to update about 50M member profiles as users visit the site, and we then want to sync that data to another data store for additional processing/reporting. On any one day we might update 1M records, potentially updating each one several times throughout a user's visit. We would like to keep a queue/list of the profiles that need to be synced and, at some interval, query the updated records from Aerospike and copy them to the other store.
The first of the two ways I could think of is to add a bin to each record called "was_updated" (or similar) and then, when I want to run the sync, query for the members where was_updated is set. Read those records (I assume via a scan or query), save them to the new store, then clear the was_updated bin for those records so I won't process them again.
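A minimal sketch of that flag-and-query flow, using a plain dict of dicts as a stand-in for Aerospike records (`sync_to_store` and `run_sync` are hypothetical names, not Aerospike APIs):

```python
# Sketch of the "was_updated" flag approach. A dict of dicts stands in
# for Aerospike records (one inner dict per record's bins).

records = {
    "member:1": {"name": "alice", "was_updated": 1},
    "member:2": {"name": "bob",   "was_updated": 0},
    "member:3": {"name": "carol", "was_updated": 1},
}

synced = []

def sync_to_store(key, bins):
    # Placeholder for the real copy step into the secondary store.
    synced.append(key)

def run_sync(records):
    # Equivalent of querying for records where was_updated == 1.
    dirty = [k for k, bins in records.items() if bins["was_updated"] == 1]
    for key in dirty:
        sync_to_store(key, records[key])
        # Clear the flag so the record isn't processed again.
        records[key]["was_updated"] = 0
    return dirty
```

One caveat with this pattern: a record that is updated again between the read and the flag reset could be silently skipped, so in practice a generation check or a last-updated timestamp bin may be safer than a plain boolean.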
The second possibility is to maintain a Large Data Type (a large list) holding the keys of the records that need to be updated, then scan through that list and remove entries after processing.
Basically, what I'm not sure of is: 1) can an LDT handle a large number of small entries (e.g., 1M) well, 2) will querying on an indexed bin be performant at this scale, and 3) which approach would perform better overall?
We also need a solution that will work at 10-100x growth: the site is growing, and if this works well, we will start using it for other data as well (content and logging).