Aggregations slowing down as the number of concurrent requests increases

I’ve been running aggregations that return quite a lot of data (around 2 MB per operation); the reason for aggregating in the first place is to send back only the relevant information.

When run individually, the response time stays well within an acceptable range. But as the number of concurrent aggregations increases, all of them slow down, even though the server's CPU utilization stays minimal.

The target namespace in this case has “data in memory” set to true. Is there any optimization or tuning that can be done to increase overall throughput? Any insight into how parallel operations are handled (queued or otherwise) would definitely help. Network bandwidth shouldn't be an issue here, since the r3.xlarge EC2 instance being used has ample bandwidth. Interestingly, adding parallel operations adds overhead to all of them, and the increase in response times is not uniform across requests.
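To illustrate the queueing behavior I suspect, here is a minimal Python sketch (not Aerospike code; the service time and pool size are made-up numbers) showing how a fixed-size worker pool inflates the latency of every job once concurrency exceeds the pool size, even though each job's actual service time is unchanged and the CPU stays mostly idle:

```python
import time
from concurrent.futures import ThreadPoolExecutor

SERVICE_TIME = 0.05  # hypothetical per-aggregation service time, seconds


def aggregation(_):
    # Stand-in for one aggregation. Sleeping models work that does not
    # saturate the CPU (I/O waits, a single execution thread, etc.).
    time.sleep(SERVICE_TIME)


def mean_latency(n_jobs, pool_size):
    """Submit n_jobs at once to a pool of pool_size workers and return
    the mean submit-to-completion latency per job, in seconds."""
    latencies = []
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        t0 = time.perf_counter()
        futures = [pool.submit(aggregation, i) for i in range(n_jobs)]
        for f in futures:
            f.result()  # wait for this job to finish
            latencies.append(time.perf_counter() - t0)
    return sum(latencies) / len(latencies)
```

With `pool_size=2`, submitting 8 jobs makes the mean latency roughly 2.5x that of submitting 2 jobs, with no change in per-job service time. If the server executes aggregations on a fixed pool of query threads, this would explain uniform slowdown at low CPU.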

Also, is there any kind of lock on records being accessed during an aggregation, such that multiple aggregations touching a common set of records might slow each other down? Could the client (the C# client in this case) be a bottleneck, unable to process parallel aggregations? Are there specific areas I should monitor to pinpoint where the bottleneck is?
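For what it's worth, this is the kind of probe I've been using to check whether the client side scales: a generic harness (the `run_one` callable is a placeholder for the real aggregation call) that measures overall throughput at a given concurrency level. If throughput plateaus as concurrency rises while the server stays idle, that points at the client or the connection pool rather than the server:

```python
import time
import threading


def throughput(run_one, concurrency, ops_per_thread=5):
    """Run `run_one` from `concurrency` threads and return overall
    operations per second. `run_one` is a placeholder for one full
    request/response cycle (e.g. one aggregation call)."""
    def worker():
        for _ in range(ops_per_thread):
            run_one()

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - t0
    return (concurrency * ops_per_thread) / elapsed
```

Comparing `throughput(op, 1)`, `throughput(op, 4)`, `throughput(op, 8)` (and, separately, splitting the same total concurrency across two client processes) should show whether the slowdown is in the client, since a client-side bottleneck disappears when the load is split across processes but a server-side one does not.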