High latency and CPU not being fully utilised

Hi Team, I am trying out Aerospike for our production use case, and when I hit the server the CPU usage does not go above 5%; because of this, latency is very high.

Aerospike version: 7.0.0.10
Cluster: 3-node cluster (each node 2 cores / 8 GB RAM)
Client: aerospike-client-jdk21:8.1.1

Client

@Configuration
public class Config {

    private static final Host[] hosts = new Host[] {
            new Host("host1", 3000),
            new Host("host2", 3000),
            new Host("host3", 3000)
    };

    @Bean
    public AerospikeClient aerospikeClient() {
        ClientPolicy clientPolicy = new ClientPolicy();
        clientPolicy.connPoolsPerNode = 10; // tried increasing this; no change
        clientPolicy.maxConnsPerNode = 300; // same here
        return new AerospikeClient(clientPolicy, hosts);
    }
}
@RestController
@RequestMapping("/aerospike")
@Slf4j
public class Controller {

    @Autowired
    private AerospikeClient aerospikeClient;

    private List<String> list = new ArrayList<>();
    private List<Long> readLatencies = new ArrayList<>();

    @PostMapping
    public void write(@RequestBody Payload payload) {
        String primaryKey = String.valueOf(UUID.randomUUID());
        try {
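            // Copy the client-wide default instead of mutating the shared instance on every request.
            WritePolicy writePolicy = new WritePolicy(aerospikeClient.getWritePolicyDefault());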
            writePolicy.socketTimeout = 30000;
            writePolicy.totalTimeout = 30000;
            Key key = new Key("perf", "", primaryKey);
            Bin bin = new Bin("data", payload.getPayload());
            aerospikeClient.put(writePolicy, key, bin);
        } catch (Exception e) {
            log.error("Error while writing to aerospike: {}", primaryKey, e);
        }
    }

    @GetMapping
    public String read(@RequestParam String primaryKey) {
        String data = null;
        try {
            Key key = new Key("perf", "", primaryKey);
            Policy policy = new Policy(aerospikeClient.getReadPolicyDefault());
            policy.socketTimeout = 30000;
            policy.totalTimeout = 30000;
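            // get() returns null when the key does not exist; 'var' avoids the clash with java.lang.Record.
            var record = aerospikeClient.get(policy, key);
            Object value = (record != null) ? record.getValue("data") : null;
            data = (value != null) ? value.toString() : null;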
        } catch (Exception e) {
            log.error("Error while reading from aerospike: {}", primaryKey, e);
        }
        return data;
    }
}

I tried increasing the load to check whether the client configuration was the issue, but the result is the same. It looks like I need to change the configuration on both the client and the server.

Currently the client is sending ~500 to ~800 TPS.

Server configs

Also, in the server histogram everything is under 1 ms, but in JMeter the 95th percentile is 120 ms at ~500 TPS.

The histogram latency is measured at the server.

In terms of this analogy, it looks like in your case 1 ms is the “service time” and 120 ms is the “response time”. So the Aerospike server is doing its part; you may want to investigate the rest of the path, getting to Aerospike and back.
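To put the numbers from this thread side by side, here is a rough Little's Law calculation (average requests in flight = throughput × average time per request). It is only a sketch under two assumptions the thread does not confirm: the 120 ms JMeter p95 is treated as if it were the mean, and the ~1 ms histogram figure is taken as the per-request time spent inside the server.

```java
class LittlesLaw {
    // Little's Law: average number of requests in flight = arrival rate * average time per request.
    static double inFlight(double requestsPerSecond, double secondsPerRequest) {
        return requestsPerSecond * secondsPerRequest;
    }

    public static void main(String[] args) {
        double tps = 500.0;            // client load reported in the thread
        double serverSeconds = 0.001;  // ~1 ms from the server histogram (assumed per-request service time)
        double clientSeconds = 0.120;  // 120 ms JMeter p95, used here as a rough stand-in for the mean
        System.out.printf("in flight end to end: %.1f%n", inFlight(tps, clientSeconds));
        System.out.printf("in service on the server: %.1f%n", inFlight(tps, serverSeconds));
    }
}
```

Under those assumptions about 60 requests are in flight end to end, but only about 0.5 are being serviced by the cluster at any instant, spread over 3 nodes with 2 cores each. That would be consistent with low server CPU: most of each request's time is spent outside the server.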

Yes, got it. I also thought the delay might be in my network calls or something else I need to check. But why am I not seeing high CPU usage even though `service-threads` is set to 10 on each node? CPU usage currently does not go above 10%; I expected it to reach 100% because of that setting, since all 3 servers are 2-core machines. Is this related to the client or the server configuration?

Typically folks worry when CPU usage is high. That can happen if you are using things like Transport Layer Security (TLS), i.e. encryption on client-to-server connections, TLS on fabric, TLS on heartbeat, encryption-at-rest on device data, or any similar computationally intensive work. You might want to search this forum for discussions related to “high cpu”.

The requests are not using TLS. Even if they were, my question would still be why the server is not using all of its threads to read/write records. If CPU usage stays this low, we cannot reach the required TPS, and requests will be submitted to the server and sit waiting because it uses so few resources to perform the operations.

PS: I have been stuck on this for the last 2 days and have searched almost everywhere without finding anything that helps.

Can you check whether the configs on the client and server side are correct? Correct me if I am wrong about anything.

I suspect your issue may be how you have written your application. You may be thinking (again, I am guessing) that your read requests are pipelined on the socket, that the responses are pipelined as well, and that server-side parallelization would therefore improve throughput. In Aerospike, a single read transaction uses a dedicated socket, which is blocked until the response is received back at the client.

So if you are doing individual get() calls in a loop in a single-threaded application, you are sending them to the server sequentially, waiting for each response before sending the next one.
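The sequential pattern can be simulated without a cluster. In this sketch `blockingGet` is a hypothetical stand-in for a blocking `aerospikeClient.get(policy, key)` call that only sleeps for one simulated round trip; the 10 ms round trip is an arbitrary assumption, not a measurement from this thread.

```java
import java.util.concurrent.TimeUnit;

class SequentialReads {
    // Hypothetical stand-in for a blocking aerospikeClient.get(): it only sleeps for one simulated round trip.
    static String blockingGet(String key, long roundTripMillis) {
        try {
            Thread.sleep(roundTripMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return "value-" + key;
    }

    // Issues reads one after another on the calling thread; each call must return before the next is sent.
    static long runSequential(int requests, long roundTripMillis) {
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            blockingGet("key" + i, roundTripMillis);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        int requests = 50;
        long elapsedMillis = runSequential(requests, 10);
        // Throughput is capped near 1000 / roundTripMillis per second, however idle the server is.
        System.out.printf("%d sequential reads took %d ms (~%.0f TPS)%n",
                requests, elapsedMillis, requests * 1000.0 / elapsedMillis);
    }
}
```

A single thread issuing calls this way cannot exceed roughly 1000 / round-trip-ms requests per second, and the server sits idle while each response is in transit.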

To achieve higher throughput, you have to multi-thread your application or, if your application is amenable to batch reads, use those; batch reads are parallelized by the client library, and the batch response is pipelined back to the client library.
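A minimal sketch of the multi-threaded alternative using only `java.util.concurrent`: the same kind of hypothetical `blockingGet` stand-in (a sleep, not a real Aerospike call) is submitted to a fixed thread pool so that up to `threads` requests are outstanding at once. With the real client, a single `AerospikeClient` instance is thread-safe and can be shared by all worker threads; the pool size of 10 is an arbitrary assumption to tune against the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ConcurrentReads {
    // Hypothetical stand-in for a blocking aerospikeClient.get(): sleeps for one simulated round trip.
    static String blockingGet(String key, long roundTripMillis) {
        try {
            Thread.sleep(roundTripMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-" + key;
    }

    // Runs the blocking reads on a fixed pool so up to `threads` requests are in flight at once.
    static List<String> readAll(int requests, int threads, long roundTripMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                String key = "key" + i; // effectively final copy captured by the lambda
                futures.add(pool.submit(() -> blockingGet(key, roundTripMillis)));
            }
            List<String> results = new ArrayList<>(requests);
            for (Future<String> future : futures) {
                results.add(future.get()); // collects results in request order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> results = readAll(50, 10, 10);
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        // 50 reads of 10 ms on 10 threads: about 5 rounds of 10 ms instead of ~500 ms sequentially.
        System.out.printf("%d reads on 10 threads took %d ms%n", results.size(), elapsedMillis);
    }
}
```

For the batch route, the Java client offers `aerospikeClient.get(BatchPolicy, Key[])`, which returns a `Record[]` in the same order as the keys; it is not shown here because it needs a live cluster. In the Spring setup from the question, request threads come from the servlet container, so concurrency there is likely driven by the number of concurrent JMeter threads rather than by a pool inside the controller.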