Does queryAggregate() return a single result or multiple results in the ResultSet?

I’m looking for help understanding queryAggregate() behaviour in a clustered environment when using an aggregation UDF. It’s not clear to me what queryAggregate() returns when multiple nodes are present.

The docs state the following:

The query executor puts results on a queue in separate threads. The calling thread concurrently pops results off the queue through the ResultSet iterator. The aggregation function is called on both server and client (final reduce). Therefore, the Lua script file must also reside on both server and client.

Is the calling thread in this instance my application thread? Is this iteration and final reduction performed by my application thread inside the client before control is returned to the application, or do I have to manually iterate the returned ResultSet? It’s a little confusing for the docs to state that the final reduction is executed on the client side through the Lua UDF functions and yet return a ResultSet, which is something an app iterates.

Using docker-compose with the CE version I can only run a single node locally to test, so I am wondering if there is any way to determine this behaviour without spinning up a local cluster somehow.

Is the calling thread in this instance my application thread?

Yes, the calling thread is your application thread.

Is this iteration and final reduction being performed by my application thread inside the client before control is returned to the application, or do I have to manually iterate the ResultSet which is returned?

You have to manually iterate the ResultSet which is returned.
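The queue hand-off described in the quoted docs can be pictured with a small self-contained Java sketch. It does not use the Aerospike client at all: `ResultQueueSketch`, `drain`, and the `END` sentinel are hypothetical names, and each inner list merely stands in for whatever one node sends back. It only illustrates that background threads fill a queue while the calling (application) thread is the one that pops results off it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ResultQueueSketch {
    // Sentinel meaning "one producer has finished" (hypothetical, for this sketch only).
    private static final Object END = new Object();

    // Each inner list stands in for the results one node returns.
    public static List<Long> drain(List<List<Long>> perNodeResults) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

        // One producer thread per simulated node puts results on the queue.
        for (List<Long> nodeResults : perNodeResults) {
            new Thread(() -> {
                for (Long r : nodeResults) {
                    queue.add(r);
                }
                queue.add(END);
            }).start();
        }

        // The calling thread pops results until every producer has signalled END.
        List<Long> seen = new ArrayList<>();
        int finished = 0;
        try {
            while (finished < perNodeResults.size()) {
                Object item = queue.take();
                if (item == END) {
                    finished++;
                } else {
                    seen.add((Long) item);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<List<Long>> nodes = Arrays.asList(Arrays.asList(10L), Arrays.asList(20L, 30L));
        // Three results in total; arrival order may vary between runs.
        System.out.println(drain(nodes).size());
    }
}
```

In this model the loop inside `drain` plays the role of the application iterating the returned ResultSet: nothing is consumed until the caller asks for it.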

You can test aggregation queries with a single node. In this case, the ResultSet contains the same results that are returned from that single node and no reduce is performed.

Look at QuerySum.java and QueryAverage.java for aggregation query example code.

Thank you for your reply.

I see in QueryAverage that it returns a single result, which is retrieved by calling resultSet.next() once, whereas QuerySum iterates the ResultSet. There is also a difference in the Lua UDF definitions. QueryAverage uses map+aggregation+reduce steps, whereas QuerySum uses only map+reduce steps. My own function uses only filter->map->reduce, with no aggregation step.

I will test to see which of these examples match my own use case.

I missed this in my previous reading of the docs:

One main characteristic of reduce function is that it executes both on the server nodes as well as the client side (in application instance). Each node first runs the data stream through the functions defined in the stream definition. The end result of this is sent to the application instance. The application gets results from all the nodes in the cluster. The client layer in it does the final reduce using the reduce function specified in the stream. So, the reduce function should be able to aggregate the intermediate aggregated values (coming from the cluster nodes). If there is no reduce function, the client layer simply passes all the data coming from the nodes to the application.

So I think that as long as I provide a reduce() function in my UDF, it should run first on each individual node and then on the client side, thereby returning a single result to my application?

That is correct.
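The requirement quoted above, that reduce must be able to combine intermediate values coming from the nodes, can be sketched in plain Java without a cluster. This is only an illustration: `ReduceMergeSketch`, `Partial`, and `reduceAll` are hypothetical names, and the sum-and-count pair is an assumed intermediate shape, not something taken from any particular UDF. Because the reduce merges two partials into a value of the same shape, the final answer does not depend on how many nodes contributed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReduceMergeSketch {
    // Intermediate value: a running sum and count (assumed shape, for illustration).
    public static final class Partial {
        public final long sum;
        public final long count;

        public Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // The reduce step: merges two intermediate values into one of the same shape.
    public static Partial reduce(Partial a, Partial b) {
        return new Partial(a.sum + b.sum, a.count + b.count);
    }

    // Applies reduce pairwise over a non-empty list, as a final reduce would.
    public static Partial reduceAll(List<Partial> partials) {
        Partial acc = partials.get(0);
        for (int i = 1; i < partials.size(); i++) {
            acc = reduce(acc, partials.get(i));
        }
        return acc;
    }

    public static void main(String[] args) {
        // Three simulated nodes, each having already reduced its own data.
        List<Partial> fromNodes = Arrays.asList(
                new Partial(10, 2), new Partial(50, 3), new Partial(6, 1));
        Partial clustered = reduceAll(fromNodes);

        // One simulated node holding all the same data.
        Partial single = reduceAll(Collections.singletonList(new Partial(66, 6)));

        System.out.println(clustered.sum + "/" + clustered.count); // prints 66/6
        System.out.println(single.sum + "/" + single.count);       // prints 66/6
    }
}
```

With a single simulated node there is nothing to merge and the node's value passes through unchanged, which is why a one-node test cannot by itself show whether a reduce function merges partials correctly.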

After making some modifications I’d like to confirm my understanding of the flow of operations.

  1. The filter function is applied.
  2. The map function is applied.
  3. The aggregation function is applied. These first 3 ops all run inside each partition.
  4. Then a server-side reduce function reduces all the aggregate results from all partitions on that node to a single result for that node.
  5. Finally the same reduce function is called once more client-side to produce a single result for the client.

Is this correct?

The first 3 ops apply to the entire node and are not restricted to a specific partition within that node. Aggregation queries use the older query protocol which does not query by partition. Otherwise, your statements are correct.
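The order of operations discussed here can be simulated in plain Java, with steps 1 to 3 running over a whole node rather than per partition. This is a rough model, not Aerospike code: `AggregationFlowSketch` and its methods are hypothetical, the filter predicate and map transform are made up, and each inner list simply stands in for the records held by one node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AggregationFlowSketch {
    // Steps 1-3, run over all records held by one simulated node.
    public static long filterMapAggregate(List<Integer> nodeRecords) {
        long acc = 0;                      // aggregate accumulator
        for (int value : nodeRecords) {
            if (value < 0) {
                continue;                  // 1. filter (made-up predicate)
            }
            long mapped = value * 2L;      // 2. map (made-up transform)
            acc += mapped;                 // 3. aggregate
        }
        return acc;
    }

    // The single reduce function shared by steps 4 and 5.
    public static long reduce(long a, long b) {
        return a + b;
    }

    // Applies reduce pairwise over a non-empty list.
    public static long reduceAll(List<Long> values) {
        long acc = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            acc = reduce(acc, values.get(i));
        }
        return acc;
    }

    public static long query(List<List<Integer>> cluster) {
        List<Long> perNode = new ArrayList<>();
        for (List<Integer> nodeRecords : cluster) {
            List<Long> aggregated =
                    Collections.singletonList(Long.valueOf(filterMapAggregate(nodeRecords)));
            perNode.add(reduceAll(aggregated));   // 4. reduce on the node
        }
        return reduceAll(perNode);                // 5. final reduce on the client
    }

    public static void main(String[] args) {
        List<List<Integer>> cluster = Arrays.asList(
                Arrays.asList(1, -4, 3),
                Arrays.asList(5, 2),
                Arrays.asList(-1, 10));
        System.out.println(query(cluster)); // prints 42
    }
}
```

In this sketch each node's aggregate yields one value, so the node-level reduce has nothing to combine; how many intermediate values a real node produces before its reduce is an internal detail this model does not try to capture.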