Low performance in Aerospike for batch requests from parallel threads

I have an application that performs a lot of batch gets. Here is the code I use:

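    public byte[][] getBatch(byte[][] keys) {
        // Wrap each raw key in an Aerospike Key for the configured namespace and set
        Key[] aeroKeys = new Key[keys.length];
        for (int i = 0; i < keys.length; i++) {
            aeroKeys[i] = new Key(NAMESPACE, setName, keys[i]);
        }
        // One batch read; records[i] is null when aeroKeys[i] does not exist
        Record[] records = aerospike.get(batchPolicy, aeroKeys);
        byte[][] response = new byte[keys.length][];

        // Copy the bin value of every found record, keeping positions aligned with keys
        for (int i = 0; i < keys.length; i++) {
            if (records[i] != null) {
                response[i] = (byte[]) records[i].getValue(DEFAULT_BIN_NAME);
            }
        }
        return response;
    }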

This code works correctly and fast when only a single request is in flight. But when I run multiple parallel threads doing batch gets, it becomes extremely slow. Monitoring shows little CPU or I/O usage, so I suspect something is waiting, but I don't know what.
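One way to narrow this down is to measure mean per-call latency at increasing thread counts and, while the load is running, snapshot what the JVM's threads are doing through the standard `ThreadMXBean`. The sketch below is only a diagnostic harness under assumptions: `BatchLoadProbe`, `meanLatencyMs` and `threadStates` are hypothetical names, and `batchFn` is a stub (a 2 ms sleep) standing in for `getBatch`; swap in the real method to probe the actual client. A thread blocked in a socket read is normally reported as `RUNNABLE`, so a large number of `WAITING` or `BLOCKED` threads would suggest contention on a lock, pool or queue inside the process rather than slowness on the server.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class BatchLoadProbe {

    // Runs callsPerThread invocations of batchFn on each of `threads` workers and
    // returns the mean per-call latency in milliseconds. onRunning is invoked once
    // after all workers are submitted, i.e. while the load is usually in flight.
    static double meanLatencyMs(Function<byte[][], byte[][]> batchFn, byte[][] keys,
                                int threads, int callsPerThread, Runnable onRunning) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    long totalNanos = 0;
                    for (int i = 0; i < callsPerThread; i++) {
                        long start = System.nanoTime();
                        batchFn.apply(keys);
                        totalNanos += System.nanoTime() - start;
                    }
                    return totalNanos;
                }));
            }
            onRunning.run();
            long sumNanos = 0;
            for (Future<Long> f : futures) {
                sumNanos += f.get();
            }
            return sumNanos / 1e6 / ((double) threads * callsPerThread);
        } finally {
            pool.shutdown();
        }
    }

    // Counts live JVM threads per Thread.State. If printWaiters is true, also prints
    // the top stack frame of every non-RUNNABLE thread as a hint of what it waits on.
    static Map<Thread.State, Integer> threadStates(boolean printWaiters) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info == null) {
                continue;
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
            StackTraceElement[] stack = info.getStackTrace();
            if (printWaiters && info.getThreadState() != Thread.State.RUNNABLE && stack.length > 0) {
                System.out.println("  " + info.getThreadName() + " " + info.getThreadState() + " at " + stack[0]);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        byte[][] keys = new byte[100][];
        // HYPOTHETICAL stand-in for getBatch: sleeps 2 ms to mimic a round trip.
        // Replace with a call to the real getBatch to probe the actual client.
        Function<byte[][], byte[][]> batchFn = k -> {
            try {
                Thread.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return new byte[k.length][];
        };
        for (int threads : new int[] {1, 4, 16, 64}) {
            // Pass true to threadStates to also print where non-RUNNABLE threads are parked
            double ms = meanLatencyMs(batchFn, keys, threads, 20,
                    () -> System.out.println("states: " + threadStates(false)));
            System.out.printf("threads=%d mean latency=%.2f ms%n", threads, ms);
        }
    }
}
```

With the stub, mean latency should stay close to 2 ms at every thread count. If the real `getBatch` instead gets slower as threads are added while CPU stays idle, a full thread dump taken during the run (for example with `jstack <pid>`) shows the complete stack of each waiting thread.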

I have tried many different configurations, and this is the config I have now (16-core server):

    service-threads 16
    transaction-queues 16
    transaction-threads-per-queue 16
    batch-index-threads 16
    batch-max-buffers-per-queue 1000
    proto-fd-max 15000
    batch-max-requests 2000000

Any idea what's going on?