How to implement "count" with "group by" and multiple filters efficiently

Hi! We have been using Aerospike for a long time, and now we are stuck implementing a complex count query.

What we have:

  1. 260,000,000 records (350 GB on SSD) with these bins: owner_id, end_date, message_type, is_read, message_params.
  2. 7,000,000 unique owner_id values, with a secondary index on owner_id.
  3. 3 nodes.

What we need:

  1. A few times per day we insert one "message" per owner_id and need to calculate counts like: SELECT COUNT(1), message_type FROM table WHERE owner_id = 'some-owner-id' AND end_date > NOW() AND is_read = 0 GROUP BY message_type

Right now, for each owner_id we query all items with end_date > NOW() and do the rest "in code". We run this across 200 threads, which leads to very high IO wait. In effect we read nearly all of the data.
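For reference, the client-side aggregation described above looks roughly like this (a simplified sketch; the message dicts stand in for records already fetched from Aerospike, and the field names match the bins listed earlier):

```python
from collections import Counter

def count_by_type(messages, now):
    """Client-side GROUP BY: filter unread, unexpired messages,
    then tally occurrences per message_type."""
    counts = Counter()
    for msg in messages:
        if msg["end_date"] > now and msg["is_read"] == 0:
            counts[msg["message_type"]] += 1
    return dict(counts)
```

The cost here is not the tallying itself but that every record for the owner must be read from disk just to be counted.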

So the question is: how can we do these count operations without a huge number of IO reads? Maybe we can configure indexes in a way that lets the count happen in memory, e.g. an index on owner_id + end_date + is_read + message_type.

We're out of ideas… any help is appreciated.

Creating separate counter records, whose primary key is the combination of interest, and keeping the counts up to date in parallel with your writes is certainly one way to do it, if you don't want to scan all the records and aggregate the counts later, either on the client or via a stream UDF.
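A minimal sketch of that counter-record idea, assuming you can hook the events where a message is created and where it is marked read: keep one counter per (owner_id, message_type) and adjust it atomically on each event. A plain dict stands in for the namespace here; with the real Aerospike client you would store one record per composite key and use an atomic increment operation on a counter bin instead. Expiry (end_date) is not handled in this sketch; you would need to decrement on expiry, or bucket the counters by expiry date.

```python
class CounterStore:
    """Simulated per-(owner_id, message_type) unread counters.
    In Aerospike this would be one record per composite key,
    updated atomically as messages are written and read."""

    def __init__(self):
        self.records = {}  # composite key -> unread count

    def _key(self, owner_id, message_type):
        # Composite primary key; in Aerospike this string would be
        # the user key of the counter record.
        return f"{owner_id}:{message_type}"

    def on_new_message(self, owner_id, message_type):
        k = self._key(owner_id, message_type)
        self.records[k] = self.records.get(k, 0) + 1

    def on_message_read(self, owner_id, message_type):
        k = self._key(owner_id, message_type)
        self.records[k] = self.records.get(k, 0) - 1

    def counts_for(self, owner_id):
        """Answers the GROUP BY query with key-value reads only."""
        prefix = f"{owner_id}:"
        return {k.split(":", 1)[1]: v
                for k, v in self.records.items()
                if k.startswith(prefix) and v > 0}
```

The trade-off is a small amount of extra work on every write in exchange for turning the count query into a handful of cheap primary-key reads instead of a scan.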