Query: Aerospike cache implementation frequent update usecase

Hi community members, we are exploring Aerospike as a cache-based solution for keeping ephemeral data (e.g., session data) that also requires frequent updates to the initially created record (roughly ~100 to 1,000 updates/sec). Wanted to check if we can leverage Aerospike's cache model for this use case?

I think your concern is around key contention. An in-memory Aerospike deployment should work for this, but it depends on how large the record is. At 100-1,000/s I wouldn't be too worried unless the record is >100KiB in size. A parameter to know about is transaction-pending-limit. To monitor from the server side, you can watch the fail_key_busy counter metric, and you can also enable hot-key logging to get key-busy logs.
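For reference, those server-side knobs can be inspected and adjusted at runtime with asinfo. This is a sketch from memory; verify the exact config context and log context names against your server version's docs:

```shell
# Show the current hot-key queue depth limit (service context; default is 20)
asinfo -v "get-config:context=service" -l | grep transaction-pending-limit

# Raise it dynamically (0 disables the limit entirely)
asinfo -v "set-config:context=service;transaction-pending-limit=40"

# Watch for key-busy failures on a namespace (here, namespace "test")
asinfo -v "namespace/test" -l | grep fail_key_busy

# Enable detailed rw logging to see which keys are hot (verbose!)
asinfo -v "log-set:id=0;rw-client=detail"
```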

The QPS limit against any specific record depends heavily on how large the record is. The record size dictates how long the transaction, and thus the record lock, is held, which limits concurrency against that record. Combine this with varying CPU and memory speeds and you get the idea. You should experiment to see where the limit is for your hardware and object spec. If the data were stored in the hybrid model, with data on NVMe, I don't think I would recommend this due to the slower IOs.

Hi @Albot, Thanks for your quick reply.

Our solution is purely cache-based (i.e., DRAM storage) for this ephemeral data (user session data), and the max record size would be around ~400 to 500 KB. Since this is session data (unique for each user and session), there are no concurrent updates triggered on the same record.

Let us know your feedback on whether the above can be achieved using Aerospike (cache impl.)?

That sounds contentious! You would need to test it. There are techniques to get around this, though. Can you describe your data model? You'll probably need to rearrange the data, or think about ways to split it up and distribute the load across more keys.
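One common way to split a hot key is sharding: derive a small suffix from the field being updated so writes spread across N records, then fan out on read. A minimal plain-Python sketch; the shard count and key naming here are illustrative, not an Aerospike API:

```python
import hashlib

N_SHARDS = 8  # more shards = less write contention, but reads must fan out

def shard_key(session_id: str, field: str) -> str:
    """Route a given field of a session to one of N_SHARDS sub-keys."""
    h = int(hashlib.md5(field.encode()).hexdigest(), 16)
    return f"{session_id}:{h % N_SHARDS}"

# Different fields of the same session usually land on different records,
# so concurrent updates no longer contend on a single key.
k1 = shard_key("sess-42", "cart")
k2 = shard_key("sess-42", "last_seen")
print(k1, k2)
```

The trade-off is read-side complexity: fetching the whole session now means a batch read across all shards instead of a single get.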

For fun, I wrote a Rust program (ok, mostly an LLM wrote it) to spin up a bunch of threads and record the put time/result against the same key. Aerospike does much better than I expected, but basically my entire gaming rig is dedicated to running this workload. The workload is totally bottlenecked on the Aerospike server, which looks to be maxing out my CPU. No network or replication involved :slight_smile: I'm not sure what this looks like with more than one contentious key in the system, but I wanted to know the max for one (all in-memory, random int arrays). I assume splitting between two contentious keys wouldn't impact performance much, though the effect won't be zero.
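The shape of that test is roughly the following (a plain-Python sketch where a dict behind a lock stands in for the server-side record lock; the real program used the Rust Aerospike client against a live server):

```python
import threading
import time

store = {}
record_lock = threading.Lock()  # one lock ~ one record's write lock
latencies = []
lat_lock = threading.Lock()

def writer(n_puts: int, blob_size: int) -> None:
    """Hammer the same key and record each put's wall-clock latency."""
    for _ in range(n_puts):
        start = time.perf_counter()
        with record_lock:  # every thread serializes here, like a hot key
            store["hot-key"] = bytes(blob_size)
        with lat_lock:
            latencies.append(time.perf_counter() - start)

threads = [threading.Thread(target=writer, args=(100, 1024)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{len(latencies)} puts, worst latency {max(latencies) * 1e6:.0f}us")
```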

This is charted out in jupyter/pandas

This chart was generated with transaction-pending-limit=1. I also changed it to 100,000 to see whether the server crashes and how the chart looks; the result is about what you'd expect. Do not set this to 100,000 in prod, it will probably OOM-kill your box. Client throughput is about the same, maybe slightly more overall QPS, but the latency is terrible: transactions get queued up against the key and wait in line. It's probably better to implement exponential backoff in the application instead of queueing inside the server like this, but it's a neat datapoint anyway.
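Application-side exponential backoff on key-busy errors might look like this sketch; `KeyBusyError` and `do_put` are stand-ins for your client library's error type and write call, not real Aerospike client names:

```python
import random
import time

class KeyBusyError(Exception):
    """Stand-in for the client's key-busy / pending-limit-exceeded error."""

def put_with_backoff(do_put, max_retries: int = 6, base_delay: float = 0.001):
    """Retry a write with exponential backoff plus jitter on key-busy."""
    for attempt in range(max_retries):
        try:
            return do_put()
        except KeyBusyError:
            # Sleep 1ms, 2ms, 4ms, ... with up to 50% random jitter so
            # retrying clients don't all hammer the key in lockstep.
            delay = base_delay * (2 ** attempt)
            time.sleep(delay * (1 + random.random() * 0.5))
    raise KeyBusyError("gave up after retries")

# Usage: a simulated write that is key-busy twice, then succeeds.
calls = {"n": 0}
def flaky_put():
    calls["n"] += 1
    if calls["n"] < 3:
        raise KeyBusyError
    return "ok"

print(put_with_backoff(flaky_put))  # → ok
```

This pushes the queueing (and the memory it consumes) out of the server and into each client, which degrades much more gracefully under contention.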

transaction-pending-limit=100,000 (UNSAFE) Chart

This time around I made the blob size dependent on elapsed time to fix the chart skew. Previously it was just blob_size = count (n records written), so the chart was being stretched at the tail end. This should be more representative of the actual distribution.

I verified these numbers using asadm at runtime, comparing its latencies output against my measured QPS/latency.

Since the graph doesn’t have much resolution, here’s the ~1MiB single-key workload with transaction-pending-limit=100,000:


Admin+> sh l
~~~~~~~~~~~~Latency  (2024-12-14 20:32:04 UTC)~~~~~~~~~~~
Namespace|Histogram|       Node|ops/sec| >1ms| >8ms|>64ms
test     |read     |mydc-1:3100|    0.0|  0.0|  0.0|  0.0
         |         |           |    0.0|  0.0|  0.0|  0.0
test     |write    |mydc-1:3100|  679.9|100.0|100.0|91.44
         |         |           |  679.9|100.0|100.0|91.44
Number of rows: 2

And specifically for your question:

Looks like the intersection is close to 400KiB at the 1,000/s limit? Maybe? So you’re right in the ballpark!