The best way to handle JSON object in Aerospike

json
list
secondary
query
map

#1

I was wondering the best way to save & retrieve a JSON object.

In the previous topic at Storing and retrieving JSON Data there was discussion about using secondary indices.

QUESTION: What is the best way, performance & memory-wise, to deal with JSON objects. Should we use a string for the JSON object or a JSON doc into a map that is a Complex Data Type?

QUESTION: Does Aerospike support data replication across data centers for JSON docs?


#2

Answer 1

  1. Store as serialized string in a single bin - easy to store, can’t query by anything other than key
  2. Convert JSON doc into series of bins with nested maps in the Aerospike record - complicated saving logic that you have to maintain, can use secondary indexes to query
  3. Hybrid - Store as serialized string in single bin and store any important values as separate bins with secondary indexes to get query ability over certain fields while still keeping overall logic simple.

Performance should be similar with each method, since you’ll be storing the entire record either way. The serialization logic would probably be the slowest part of this.

Aerospike at present can only do range/equality on numeric values and equality only on string values. Also queries can only use a single secondary index. If you need more complicated filtering then you can use Stream UDFs.

Remember to check how big your documents are. Aerospike running on SSDs will have a 128kb default write block size but this can be increased (with side effects on performance and storage space). Records have some overhead and each bin will also have overhead. All indexes are also stored in memory.

Answer 2

I think you might be referring to Aerospike “record” which is the official naming for each main item in Aerospike. In that case, yes records can be replicated in your cluster as well as across data centers (called XDR).


#3

Thank you for all this great info manigandham, this is much appreciated!


#4

Another way is to store as a List or Map. Except for the simplest of JSON, a JSON is either a List (Array) containing Lists and Maps, or a Map (Object) containing Lists and Maps. I routinely store JSON in this way, and it all goes into a single Bin.

If you are a Java programmer and use JSON Simple from Google or a C# programmer and use System.Json, the JSON Array in both implementations implement the List interface and are easily and directly stored as a List Bin. The JSON Object implements the Map interface in Java and the iDictionary (i think) interface in C# and are easily stored in a Map Bin.

The storage part is easy, but as Manigandham has mentioned, the real meat of your question is how you want to retrieve the data in the document. In the NoSQL world, the key to the data model is how you plan to retrieve the data, and there are always different ways to slice and dice the problem.

You could store your JSON as a complex data type (Compound Lists and Maps) and denormalize some attributes of data to individual bins for secondary indexed queries.

Hope this helps.


#5

Thank you very much. We did some prototyping along those lines. It looks like things work.

One more question… How does one calculate the data storage for a single record with Compound Lists and Maps. We know how to calculate for primitive types, but not these others. We did not find documentation on that.


#6

Hi rcritt,

Here is a technique to estimate the size of a value in bytes.

Create your list or map in java (JSON simple). Create an Aerospike Value from it, then call estimateSize() on the value.Here is an example:

List<String> places =  Arrays.asList("Buenos Aires", "CĂłrdoba", "La Plata");
Map<String, Object> map = new HashMap<String, Object>(); 
map.put("cal", "this is a cat");
map.put("size", 1234L);
map.put("places", places);
Value mapValue = Value.get(map);
System.out.println("Size in bytes: " + mapValue.estimateSize());

I hope this helps.

Peter


#7

one more bin-per-field con

  • Aerospike field name is limited to 14 characters, which is actually a very tiny amount.

and pro:

  • writing UDF is a LOT easier and they perform better when they don’t need to parse the JSON, but can work with lists and maps directly