Modelling large data structures


#1

Hi,

We are trying to use Aerospike with the following use-case, and are looking for feedback/recommendations on how we are thinking of modelling the data.

In our use-case, we have a large number of “entities” (initially several hundred thousand, but potentially millions). Each entity has an arbitrary number of “data-contexts” associated with it, and each data-context is essentially a binary blob (potentially quite large, perhaps 50-100 KB each).

The basic process is that an entity and all of its data-contexts are retrieved from Aerospike, processed by the application, and the entity and any modified data-contexts are stored back.

We expect a very high rate of queries/updates across the data set, but a small number of concurrent accesses to a single entity.

What would be the most efficient way to achieve this? We have experimented with passing the data-contexts and the record generation count as parameters to a record UDF. The UDF checks the record’s generation count to prevent concurrent modification, and uses an LDT bin to store chunks of each data-context in a map. We have run into an issue with this approach where the server’s Lua cache eventually uses all available memory and crashes the node, possibly because of the amount of data we are passing?
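To make the chunking part of this concrete, here is a minimal sketch (plain Python, no Aerospike API involved) of splitting one data-context blob into fixed-size pieces keyed by chunk index, as the UDF would store them in a map. The names `CHUNK_SIZE`, `chunk_blob`, and `reassemble` are illustrative, not part of any client or UDF API:

```python
CHUNK_SIZE = 8 * 1024  # 8 KB per chunk; tune to your payloads

def chunk_blob(blob: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Split a blob into a {chunk_index: bytes} map."""
    return {
        i // chunk_size: blob[i:i + chunk_size]
        for i in range(0, len(blob), chunk_size)
    }

def reassemble(chunks: dict) -> bytes:
    """Rebuild the original blob from its chunk map."""
    return b"".join(chunks[i] for i in sorted(chunks))

blob = b"x" * (50 * 1024)   # a 50 KB data-context
chunks = chunk_blob(blob)
assert reassemble(chunks) == blob
assert len(chunks) == 7     # ceil(50 KB / 8 KB)
```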

Some additional questions that have been raised are:

  • Are LDT bins suitable for frequent reads and updates?
  • Are there any performance guidelines or benchmarks available when passing large amounts of data to and from Aerospike?

#2

LDTs are not really recommended, and you don’t need them for data of that size.

Store a record for each entity, with any general information you need, and use a list or map bin that stores the primary keys of all the associated data-contexts. Then store the data-contexts as individual records, perhaps in another set for organization.

You can do one operation to look up the entity record and set a value on a bin as a lock, then read all the primary keys for the data-contexts and fetch them efficiently with a batch get. Do your processing, store them all again, and remove or unset the lock bin value on the entity record.
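A rough sketch of that flow, using an in-memory dict in place of the namespace so it is self-contained. The comments note which Aerospike client call each step corresponds to; all key/bin names (`lock`, `ctx_keys`, etc.) are illustrative, and in practice you would set the lock with a generation-check write policy so the check-and-set is atomic:

```python
# A dict stands in for the namespace: key -> record (a dict of bins).
store = {
    "entity:42": {"lock": 0, "ctx_keys": ["ctx:42:0", "ctx:42:1"]},
    "ctx:42:0": {"data": b"blob-0"},
    "ctx:42:1": {"data": b"blob-1"},
}

def process_entity(entity_key: str) -> bool:
    entity = store[entity_key]          # client.get(key)
    if entity["lock"]:                  # someone else holds the lock
        return False
    entity["lock"] = 1                  # lock bin set; use a generation-
                                        # check write policy for atomicity
    try:
        # fetch all data-contexts in one round trip (batch get)
        contexts = {k: store[k] for k in entity["ctx_keys"]}
        for k, ctx in contexts.items():
            ctx["data"] = ctx["data"].upper()  # "processing" placeholder
            store[k] = ctx              # client.put(k, ctx)
        return True
    finally:
        entity["lock"] = 0              # unset the lock bin
```

The point of the lock bin is only to guard against the small number of concurrent accesses to a single entity mentioned in the question; it is not a general distributed lock.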

You can also create a secondary index and store the entity’s primary key as a bin value on all of its data-contexts, so you can retrieve them through a query. This will scale better if you have thousands (or more) of data-contexts per entity, since records are limited in size (by default, the write block size).
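The secondary-index alternative looks roughly like this: every data-context record carries an `entity_id` bin, and fetching becomes an equality query on that bin (in the real Python client, a `client.query(...)` with `aerospike.predicates.equals("entity_id", ...)`). Here a plain list plus a filter stands in for the index; all record and bin names are made up for illustration:

```python
# Each data-context record stores the primary key of its owning entity.
records = [
    {"pk": "ctx:42:0", "entity_id": 42, "data": b"blob-0"},
    {"pk": "ctx:42:1", "entity_id": 42, "data": b"blob-1"},
    {"pk": "ctx:99:0", "entity_id": 99, "data": b"other"},
]

def contexts_for(entity_id: int) -> list:
    """Stand-in for a secondary-index equality query on entity_id."""
    return [r for r in records if r["entity_id"] == entity_id]

assert len(contexts_for(42)) == 2
```

The trade-off versus the list-of-keys approach is that the entity record no longer has to hold (and grow) a key list, so the number of data-contexts per entity is no longer bounded by the entity record's size.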