Complex queries

I’m new to Aerospike and I’m currently working on it using the C client library. I have the following example data set:

column1 column2 column3 column4 column5 column6 column7 column8 column9

1 SOU VAW 1 1562528.12341 (1,2,4,5,1,3,4,6,2,5 … etc) q+z text - could be very long

2 SOU VAW 1 1562528.3457 (1,4,2,5,1,3,4,6,2,5 … etc) q+z text - could be very long

3 SOU VAW 1 1562528.3588 (1,2,4,5,1,3,4,6,2,5 … etc) q+z text - could be very long

4 VID VAW 1 1562528.12341 (1,5,4,5,1,3,4,6,2,5 … etc) q+z text - could be very long

5 VID VAW 1 1562528.688 (1,8,4,5,1,3,4,6,2,5 … etc) q+z text - could be very long

6 VID VAW 1 1562528.906 (1,7,4,5,1,3,4,6,2,5 … etc) q+z text - could be very long

I am supposed to insert the data into Aerospike and to formulate queries to achieve the following:

  1. select all column3 and colunm7 unique combinations with maximum column5 for each
  2. select all column2, column3, column7 and column8 unique combinations
  3. select all column2, column3, column7 and column8 unique combinations for where column2, column3 are specified
  4. assign column9 for where column2, column3 and column5 are specified specifications
  5. select column5, column9 and columm6 for where column2 and column3 are specified
  6. select column5, column9 and column6 for where column2 and column3 and column5 are specified
  7. select column5, column9 and column6 for where coulmn2 and column3 are specified and where column5 values are within a specified range
  8. select column6, column9 and column8 for where column2 and column3 and column5 is max for each combination of column2 and column3
  9. delete all data
  10. delete everything that is not the latest unique column2 and column3 and column5 combination

I have realised that I cannot achieve what is needed using C only and I need to use UDFs. I have gone through record and stream UDFs and I know how to setup and invoke them. However, I don’t know how to write UDFs for the queries such as I have specified. Please guide a friend!

Thanks in advance.

@kporter this looks like his original, more detailed, question.

It can be done, but this isn’t really an area that Aerospike excels. If you need these queries to be high performance and this is one of the primary functions of this deployment than you will likely have better luck in another DB which specializes in analytic use cases.

The ‘unique’ aspect of the query isn’t possible on the server side, the client would need to filter out duplicates. A stream UDF could dedup on an individual node but across nodes, the client would need to dedup.

Similarly, a UDF on a scan can be used to can be used to filter find max etc.

Scan UDF will need to scan the entire namespace which can be slow. You could create a secondary index on colum3 (since it seems to be in all of your filters) and use a stream UDF on that secondary index to match other criteria.