I am doing a PoC on Aerospike for my use case. As I understand from the documentation, Aerospike can work with strong consistency in the Enterprise Edition, which I am willing to opt for.
Now the thing is, in my use case, a user request can result in updating close to 30K keys in the Aerospike cluster, and since Aerospike can update only one record per transaction, I am wondering how I can update 30K keys in a transaction.
For a single user transaction to touch 30k records, I don't know if that's a sustainable model. There is also no way to lock or atomically operate on more than one record at a time. Maybe you can explain your use case and the current data model, and we can suggest an alternate approach?
If you must update 30k records, you would need 30k put calls.
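For example, a minimal sketch with the Java client, assuming a namespace "hr" and a set "salaries" (both names made up) with one record per employee:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;

import java.util.Map;

public class SalaryBulkUpdate {
    public static void updateAll(AerospikeClient client, Map<String, Long> newSalaries) {
        for (Map.Entry<String, Long> e : newSalaries.entrySet()) {
            // Each put is its own single-record transaction; there is no
            // atomicity across the 30k calls.
            Key key = new Key("hr", "salaries", e.getKey());
            client.put(null, key, new Bin("salary", e.getValue()));
        }
    }
}
```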
Suppose a company has 30000 employees. Now suppose HR wants to increment everyone's salary by 10%: they will just enter one update in a system saying "bump up everyone's salary by 10%", and the system will calculate the new salary for all 30000 employees and store it in Aerospike. So, as you can see, with one update from a user, the system is going to update 30000 records in Aerospike.
Right now we are using MSSQL, where we store this salary component as a JSON map of {empId → Salary}. When someone updates the salaries, we read and parse the JSON in Java, update the values, and then store the JSON back in MSSQL.
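To illustrate, the current flow is roughly this (Jackson used here for the JSON handling; the JDBC read/write around it is omitted):

```java
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.Map;

public class JsonSalaryBump {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Read the JSON blob, bump every salary by the given percentage,
    // and return the JSON to be written back in a single UPDATE.
    public static String bumpAll(String json, double percent) throws Exception {
        Map<String, Double> salaries =
                MAPPER.readValue(json, new TypeReference<Map<String, Double>>() {});
        salaries.replaceAll((empId, salary) -> salary * (1 + percent / 100.0));
        return MAPPER.writeValueAsString(salaries);
    }
}
```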
Ideally we would want 30000 records, keyed by empId with a bin holding the salary, so that it is easier to make ad hoc queries and also easy to do other operations like removing employees, aggregating some other data, etc.
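With that model, the ad hoc queries would look something like the sketch below, assuming a secondary index exists on the "salary" bin (namespace, set, and index are all hypothetical):

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.query.Filter;
import com.aerospike.client.query.RecordSet;
import com.aerospike.client.query.Statement;

public class SalaryQueries {
    // Find all employees earning between lo and hi.
    public static void printRange(AerospikeClient client, long lo, long hi) {
        Statement stmt = new Statement();
        stmt.setNamespace("hr");
        stmt.setSetName("salaries");
        stmt.setFilter(Filter.range("salary", lo, hi));
        try (RecordSet rs = client.query(null, stmt)) {
            while (rs.next()) {
                // userKey is only populated if writes were done with sendKey=true
                System.out.println(rs.getKey().userKey + " -> "
                        + rs.getRecord().getLong("salary"));
            }
        }
    }
}
```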
I hope I am clear on the use case and how we have modeled it currently.
PS: We need a system that can support strong consistency, since this is very critical data and we cannot afford to lose records.
Well, you can't really do it atomically across 30k records. You could use a UDF to crawl through, store an "update procedure" hash (to verify whether it has been applied or not), and then perform the action on the record. You'd just send the procedure to the cluster and it would execute it for you. I'm not really sure that's the right solution for this, though, so I'll let someone else chime in.
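Driving that from the Java client would look roughly like this, assuming a registered Lua module "salary_bump" whose "apply" function checks a stored run-id bin before touching the record (module, function, and argument names are all hypothetical):

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Value;
import com.aerospike.client.query.Statement;
import com.aerospike.client.task.ExecuteTask;

public class UdfRunner {
    public static void run(AerospikeClient client, String runId, double percent) {
        Statement stmt = new Statement();
        stmt.setNamespace("hr");
        stmt.setSetName("salaries");
        // Ships the procedure to the cluster; each node applies it to its
        // own records, skipping any record already tagged with this runId.
        ExecuteTask task = client.execute(null, stmt, "salary_bump", "apply",
                Value.get(runId), Value.get(percent));
        task.waitTillComplete();
    }
}
```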
Aerospike is fast and scalable because it can treat each row independently.
That's why there are no master coordinating nodes, no master rollback log, or anything like that which impedes scalability.
The tradeoff you get for multi-million transactions per second per server, even for writes, and linear scaling, is no multi-record updates.
The database theory in this area has a lot of twists and turns, and is beyond a simple knowledge base article.
And no, UDFs don't solve the problem. It's a distributed transaction commit problem, where you have to deal with interrupted transactions, network outages, cluster splits, reads during writes, isolation tradeoffs, and other cases. Not simple.
With SC mode in EE, you will have to use a data model where the update state is built into the record. Conceptually, you keep a current timestamp in a bin, the current salary, a new timestamp, and the new salary, e.g. JacksID 20190701 100000 20190801 110000. Then you run a scan UDF with the failOnClusterChange option and make sure the scan completes successfully; re-run till done. The UDF must be idempotent. Finally, run a verification pass to confirm all salaries did get updated; you will need to keep track of the exact count of total employees.
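Here is a client-side illustration of the idempotent check the UDF would perform server-side, using the curTs/curSalary/newTs/newSalary layout above (bin, namespace, and set names are assumptions). Re-running with the same runTs is a no-op:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.Record;

public class IdempotentBump {
    public static void apply(AerospikeClient client, String empId,
                             long runTs, double percent) {
        Key key = new Key("hr", "salaries", empId);
        Record rec = client.get(null, key);
        if (rec == null || rec.getLong("newTs") == runTs) {
            return; // missing record, or already updated in this run
        }
        long current = rec.getLong("newSalary"); // latest known salary
        // In real use you would also set a generation check on the
        // WritePolicy to guard against races between the get and the put.
        client.put(null, key,
                new Bin("curTs", rec.getLong("newTs")),
                new Bin("curSalary", current),
                new Bin("newTs", runTs),
                new Bin("newSalary", Math.round(current * (1 + percent / 100.0))));
    }
}
```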
Please excuse my ignorance; I am intrigued by your use case here.
What advantages are you going to gain by making Aerospike behave like an RDBMS? Since you have MSSQL, I gather it can do the job pretty well. Which one is the golden source here, MSSQL or Aerospike?
This is a simple use case for a fan-out/fan-in approach. Instead of updating all 30k records, why can't you compute the salary of each of the 30k users on demand (whenever a salary is fetched) by storing a common minus/plus event somewhere?
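Something like the following sketch, where one write to a single global "adjustment" record effectively updates everyone, and the adjustment is applied lazily at read time (the record layout and key names are hypothetical):

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;

public class LazySalary {
    // One write here updates everyone, e.g. put {"percent": 10} at this key.
    private static final Key ADJUSTMENT = new Key("hr", "events", "salary_adjustment");

    public static double fetchSalary(AerospikeClient client, String empId) {
        Record emp = client.get(null, new Key("hr", "salaries", empId));
        if (emp == null) {
            throw new IllegalArgumentException("unknown employee: " + empId);
        }
        Record adj = client.get(null, ADJUSTMENT);
        double base = emp.getLong("salary");
        // No pending adjustment means the stored salary is authoritative.
        double percent = (adj != null) ? adj.getDouble("percent") : 0.0;
        return base * (1 + percent / 100.0);
    }
}
```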