Intermittent high latency

I have a 6-node cluster. Each node is a bare-metal machine with 24 cores, 256 GB RAM, and a 10 Gbps network, running CentOS 7.4.1708 (Core) (kernel 4.14.0-1.el7.elrepo.x86_64).

Aerospike config:

service {
        user root
        group root
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        pidfile /var/run/aerospike/
        #service-threads 24
        #transaction-queues 24
        transaction-threads-per-queue 4
        proto-fd-max 30000
        transaction-pending-limit 0
        auto-pin cpu
}

logging {
        # Log file must be an absolute path.
        file /var/log/aerospike/aerospike.log {
                context any info
        }
}

network {
        service {
                address int4
                port 3000
                access-address int4
        }

        heartbeat {
                #mode multicast
                #address #
                #port 9918

                mode mesh
                port 3002

                mesh-seed-address-port ....
                mesh-seed-address-port ....

                interval 150
                timeout 10
        }

        fabric {
                address int4
                port 3001
        }

        info {
                port 3003
        }
}

namespace test {
        replication-factor 2
        memory-size 1G
        default-ttl 30d # 30 days, use 0 to never expire/evict.

        storage-engine memory
}

#production namespace
namespace Production {
        replication-factor 2
        memory-size 242G
        default-ttl 0 # 0 = never expire/evict.
        high-water-disk-pct 50 # How full may the disk become before the
                               # server begins eviction (expiring records
                               # early)
        high-water-memory-pct 85 # How full may the memory become before the
                                 # server begins eviction (expiring records
                                 # early)
        stop-writes-pct 90  # How full may the memory become before
                            # we disallow new writes
        partition-tree-sprigs 4096
        partition-tree-locks 256
        # storage-engine memory
        storage-engine device {
                #device /dev/sdb1
                #data-in-memory false

                file /opt/aerospike/data/
                filesize 1000G # 8 times of RAM
                data-in-memory true

                #write-block-size 128K   # adjust block size to make it efficient for SSDs.
                # largest size of any object
        }
}

Total load (in TPS):

Read: 120-250K

Batch read: 300-400K

Write: 4-10K (peak 80K)

UDF: 500-1000

Queries: 60-80

Cluster Info:

Aerospike server version:

Master Object count: ~ 2B

RAM usage per node: ~ 60%
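
For context on the memory figure, here is a rough back-of-the-envelope sketch of how much of each node's RAM the primary index alone would take. It assumes the standard 64 bytes of primary index per record and an even distribution of records across the 6 nodes; the function name is hypothetical, and sprig overhead and the in-memory data itself are not included.

```python
# Rough primary-index sizing sketch (assumes even distribution across nodes).
INDEX_BYTES_PER_RECORD = 64  # Aerospike primary index entry size


def index_bytes_per_node(master_objects, replication_factor, nodes):
    """Estimate primary index bytes held by one node."""
    records_per_node = master_objects * replication_factor / nodes
    return records_per_node * INDEX_BYTES_PER_RECORD


# Figures from this post: ~2B master objects, replication-factor 2, 6 nodes.
gib = index_bytes_per_node(2_000_000_000, 2, 6) / 2**30
print(f"{gib:.1f} GiB")  # prints "39.7 GiB"
```

Under these assumptions the index is only a fraction of the 242G memory-size, so most of the ~60% usage would be the in-memory data copy.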

There are multiple clients (mostly the latest Go client) that read and write. I have noticed that latency sometimes goes high (seen both in the asloglatency tool and in client-side stats) and, after a few hours, comes back down to normal without any change on our side. I checked TPS during those periods, but the latency appears to be independent of it. I couldn't find anything in the logs.

How should I find the root cause? This happens a couple of times every week with no time or load pattern (at least I have not observed one so far). Please suggest.
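
One way to pin down exactly when the spikes start and end, and to compare them against request volume, is to bucket client-side latency samples into fixed windows. The following is a minimal sketch, not tied to any Aerospike API: it assumes you can export `(unix_timestamp, latency_ms)` pairs from the client-side stats, and the function names, window size, and threshold are all hypothetical.

```python
from collections import defaultdict
from math import ceil


def windowed_percentile(samples, window_s=60, pct=99.0):
    """Group (unix_ts, latency_ms) samples into fixed windows.

    Returns {window_start: (request_count, percentile_latency_ms)}.
    """
    buckets = defaultdict(list)
    for ts, latency_ms in samples:
        buckets[int(ts // window_s) * window_s].append(latency_ms)

    stats = {}
    for start, values in sorted(buckets.items()):
        values.sort()
        # Nearest-rank percentile: smallest value with at least
        # pct% of the window's samples at or below it.
        rank = max(1, ceil(pct / 100.0 * len(values)))
        stats[start] = (len(values), values[rank - 1])
    return stats


def spike_windows(stats, threshold_ms):
    """Return the start times of windows whose percentile exceeds the threshold."""
    return [start for start, (_, p) in stats.items() if p > threshold_ms]


# Made-up sample data: a quiet minute followed by a minute with one slow request.
samples = [(0, 1), (10, 1), (20, 1), (30, 2), (60, 1), (70, 50)]
stats = windowed_percentile(samples)
print(spike_windows(stats, threshold_ms=10))
```

The request count per window doubles as a TPS proxy, so the flagged windows can be lined up against load and against the server log timestamps for the same periods.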

The microbenchmarks should help narrow it down.
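
As a sketch of what turning the microbenchmarks on could look like, the snippet below only builds the `asinfo` commands as strings so they can be reviewed before running them on each node. It assumes a server version that supports the per-namespace `enable-benchmarks-*` dynamic settings (3.9 or later; the post does not state the version), and the helper name is hypothetical.

```python
def benchmark_commands(namespace, enable=True):
    """Build asinfo commands that toggle per-namespace benchmark histograms."""
    value = "true" if enable else "false"
    # Namespace-level benchmark switches (assumed available: server 3.9+).
    settings = [
        "enable-benchmarks-read",
        "enable-benchmarks-write",
        "enable-benchmarks-batch-sub",
        "enable-benchmarks-udf",
        "enable-benchmarks-storage",
    ]
    return [
        f'asinfo -v "set-config:context=namespace;id={namespace};{name}={value}"'
        for name in settings
    ]


for cmd in benchmark_commands("Production"):
    print(cmd)
```

Once enabled, the extra histograms appear in aerospike.log and break each transaction type into stages, which shows where the time goes during a spike. Calling `benchmark_commands("Production", enable=False)` gives the commands to switch them off again afterwards.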