Aerospike using double memory

Thanks for the info/links!

Our use case is an IoT solution for buildings that captures data from a large number of sensors per building. Incoming data is read from MQTT, stored in Aerospike, and periodically aggregated by a batch process. We provide a web portal that offers analytics and insights into the sensor data at different resolutions (live, five minutes, hourly, daily, weekly, monthly). The portal uses a custom query engine that was designed to be fast, flexible and scalable. The system has been in development since around 2015.

Namespaces

Our namespaces are live, five minutes, hourly, daily, weekly, and monthly.

Sets

We then use sets as logical groupings for buildings. You stated that the limit is 1023 sets per namespace, so in theory we can only store data for 1023 buildings. This isn’t an issue yet, but it could become one down the track. Can you give an example of a typical use case for sets?

Bins

  • start : Timestamp in epoch. This is indexed and hence queries can only run against this bin.
  • end : Timestamp in epoch.
  • zone : Timezone.
  • type : The type of sensor data we are storing.
  • data : Binary data which contains the sensor data as written by an ingestion service.

Indexes

There is one index, idx_<building_id>_start, on the start bin of every set (building) in each namespace (data type), so querying can only be done by start time. We add these secondary indexes dynamically when the ingestion service writes data to Aerospike using the Java client.
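To make the dynamic index creation concrete, here is a minimal sketch of how the ingestion service could track which indexes it has already requested. It assumes the set name is the building id. IndexCreator and StartIndexRegistry are hypothetical names used only for illustration; in the real service the IndexCreator would presumably wrap the Java client's createIndex call.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the real client call that creates a secondary index.
interface IndexCreator {
    void create(String namespace, String setName, String indexName, String binName);
}

// Illustrative helper (not part of any library): requests one index per namespace/set pair.
final class StartIndexRegistry {
    // Namespace/set pairs for which this process has already requested an index.
    private final Set<String> requested = ConcurrentHashMap.newKeySet();
    private final IndexCreator creator;

    StartIndexRegistry(IndexCreator creator) {
        this.creator = creator;
    }

    // Builds the index name following the idx_<building_id>_start pattern.
    static String indexName(String buildingId) {
        return "idx_" + buildingId + "_start";
    }

    // Requests the index the first time a namespace/set pair is seen.
    // Returns true if a request was issued, false if it was already requested.
    // Assumption: the set name is the building id.
    boolean ensureIndex(String namespace, String buildingId) {
        if (!requested.add(namespace + "/" + buildingId)) {
            return false;
        }
        creator.create(namespace, buildingId, indexName(buildingId), "start");
        return true;
    }
}
```

With this shape, a write path would call ensureIndex(namespace, buildingId) before storing a record, and only the first call per building and namespace would reach the server.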

User Defined Functions (UDF)

Queries run against the index on start, but the other parameters are still used to filter the data that the queries return.
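The query-then-filter pattern described above can be modelled in memory as follows. This is only a sketch of the access pattern, not our actual query engine and not Aerospike client code: a sorted map stands in for the index on start, and the type bin is used as the example post-filter. Reading and StartIndexedSet are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory view of one record; the data bin is omitted for brevity.
final class Reading {
    final long start;
    final long end;
    final String zone;
    final String type;

    Reading(long start, long end, String zone, String type) {
        this.start = start;
        this.end = end;
        this.zone = zone;
        this.type = type;
    }
}

// Illustrative model of one set (building) with a single index on start.
final class StartIndexedSet {
    // Sorted map standing in for the index on the start bin.
    private final NavigableMap<Long, List<Reading>> byStart = new TreeMap<>();

    void put(Reading r) {
        byStart.computeIfAbsent(r.start, k -> new ArrayList<>()).add(r);
    }

    // Step 1: range lookup on start (both bounds inclusive in this sketch).
    // Step 2: filter the candidates on a non-indexed field, here the sensor type.
    List<Reading> query(long fromInclusive, long toInclusive, String type) {
        List<Reading> out = new ArrayList<>();
        for (List<Reading> bucket : byStart.subMap(fromInclusive, true, toInclusive, true).values()) {
            for (Reading r : bucket) {
                if (r.type.equals(type)) {
                    out.add(r);
                }
            }
        }
        return out;
    }
}
```

The point of the sketch is that only the start range narrows the candidate set through the index. Every other condition is evaluated per record afterwards, so its cost grows with the number of records in the time range.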

Have we modelled our data correctly? What would the performance impact be if we removed the secondary indexes?