Hi,
We are trying to use Aerospike for the following use-case, and are looking for feedback/recommendations on how we are thinking of modelling the data.
In our use-case, we have a large number of “entities” (initially several hundred thousand, but potentially millions). Each entity has an arbitrary number of “data-contexts” associated with it, and each data-context is essentially a binary blob (potentially quite large, perhaps 50-100 KB each).
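To make the shape of the data concrete, the logical structure is roughly the following (names are placeholders, not our real schema):

```java
import java.util.Map;

// Rough logical shape of the data (placeholder names): each entity carries
// an arbitrary number of data-contexts, each an opaque 50-100 KB blob.
public class Entity {
    public String entityId;
    public Map<String, byte[]> dataContexts; // contextId -> blob
}
```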
The basic process is that an entity and all of its data-contexts are retrieved from Aerospike, processed by the application, and the entity and any modified data-contexts are stored back.
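To illustrate the cycle, here is one way to express that read-process-write-back loop with the plain Java client and an optimistic generation check (this is not our UDF approach, just a sketch; namespace, set, and bin names are placeholders):

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.GenerationPolicy;
import com.aerospike.client.policy.WritePolicy;

public class EntityReadModifyWrite {
    // Placeholder namespace/set/bin names for illustration only.
    private static final String NAMESPACE = "test";
    private static final String SET = "entities";

    public static void processEntity(AerospikeClient client, String entityId) {
        Key key = new Key(NAMESPACE, SET, entityId);

        // 1. Read the entity record (all bins, including data-context blobs).
        Record record = client.get(null, key);
        if (record == null) {
            return;
        }

        // 2. Application processing happens here, producing the modified
        //    data-contexts to write back (placeholder bin "ctx_1").
        byte[] modifiedContext = (byte[]) record.getValue("ctx_1");

        // 3. Write back only if no one else modified the record in the
        //    meantime, using the generation observed in step 1. A concurrent
        //    update causes the put to fail with a generation error.
        WritePolicy wp = new WritePolicy();
        wp.generationPolicy = GenerationPolicy.EXPECT_GEN_EQUAL;
        wp.generation = record.generation;
        client.put(wp, key, new Bin("ctx_1", modifiedContext));
    }
}
```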
We expect a very high rate of queries/updates across the data set, but a small number of concurrent accesses to a single entity.
What would be the most efficient way to achieve this? We have experimented with passing the data-contexts and the record's generation count as parameters to a record UDF. The UDF checks the record's generation count to prevent concurrent modification, and uses an LDT bin to store chunks of each data-context in a map. We have run into an issue with this approach where the server's Lua cache eventually consumes all available memory and crashes the node, possibly because of the amount of data we are passing.
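For reference, a simplified sketch of roughly how this kind of UDF call looks from the Java client (the module name, function name, and chunk size are placeholders, not our actual code):

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Value;

import java.util.ArrayList;
import java.util.List;

public class UdfUpdateSketch {
    // Placeholder module/function names for the experimental record UDF.
    private static final String UDF_MODULE = "entity_update";
    private static final String UDF_FUNCTION = "store_context";

    public static void storeContext(AerospikeClient client, String entityId,
                                    String contextId, byte[] blob,
                                    int expectedGeneration) {
        Key key = new Key("test", "entities", entityId);

        // Split the 50-100 KB blob into smaller chunks so the UDF can store
        // them as entries of the LDT map bin (chunk size is arbitrary here).
        int chunkSize = 8 * 1024;
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < blob.length; off += chunkSize) {
            int len = Math.min(chunkSize, blob.length - off);
            byte[] chunk = new byte[len];
            System.arraycopy(blob, off, chunk, 0, len);
            chunks.add(chunk);
        }

        // The UDF is expected to compare expectedGeneration against the
        // record's generation before applying the update.
        client.execute(null, key, UDF_MODULE, UDF_FUNCTION,
                Value.get(contextId),
                Value.get(expectedGeneration),
                Value.get(chunks));
    }
}
```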
Some additional questions that have been raised are:
- Are LDT bins suitable for frequent reads and updates?
- Are there any performance guidelines or benchmarks available when passing large amounts of data to and from Aerospike?