Poor performance after migrating environment from gcloud

We have a client on GCP that we need to migrate to a new datacenter. They had a cluster of six nodes, each with 4 CPUs and 4 GB RAM. The database is approximately 260 GB, and a Java application running on ten servers queries it to compare music fingerprints. In that environment, queries took about 300 ms. We migrated to bare-metal servers: four nodes, each with 12 cores and 64 GB RAM, with the same OS version and the same Aerospike version (4.5.1.5). In the new environment, the same queries take about 20 to 30 seconds, and the processors go to 100% utilization. We have tried many different settings and have not come close to the performance of the old environment, so any kind of help is welcome.

Hardware: 4 servers, each with an AMD Ryzen 5 3600, 64 GB RAM, 2 NVMe drives in RAID 0, Ubuntu 18.04

Config:

# Aerospike database configuration file.
service {
        user root
        group root
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        pidfile /var/run/aerospike/asd.pid
        service-threads 32
        transaction-queues 32
        transaction-threads-per-queue 32
        batch-index-threads 256
        batch-max-buffers-per-queue 128
        batch-max-unused-buffers 128
        proto-fd-max 9000
#        auto-pin cpu
#        proto-fd-max 4096
#        proto-fd-idle-ms 10000
        batch-max-requests 50000
}

logging {
	# Log file must be an absolute path.
	file /var/log/aerospike/aerospike.log {
		context any info
	}
}

network {
	service {
		address any
		port 3000
	}

	heartbeat {
		mode mesh
		mesh-seed-address 4 servers addresses
#
		port 3002
#
		interval 150
		timeout 30
	}

	fabric {
		port 3001
	}

	info {
		port 3003
	}
}

namespace productiont {
	replication-factor 2
	memory-size 40G
	default-ttl 30 # 30 days, use 0 to never expire/evict.
	write-commit-level-override master
	high-water-disk-pct 50 # How full may the disk become before the
	                       # server begins eviction (expiring records
	                       # early)
	high-water-memory-pct 85 # How full may the memory become before the
	                         # server begins eviction (expiring records
	                         # early)
	stop-writes-pct 90 # How full may the memory become before
	                   # we disallow new writes
	storage-engine memory {
	storage-engine device
		file /data/aerospike.dat
		#device /dev/sdb
		write-block-size 8M
#		max-write-cache 256
		filesize 580G
	}
}

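I am not sure the database would even start with the storage-engine section in the config above: it opens a "storage-engine memory {" block and then declares "storage-engine device" inside it.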

Comparing the log files from the two systems may give some clues. I am not sure whether something about those AMD Ryzen 5 CPUs is causing the issue or whether it is something else in the configuration itself.
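
On the storage-engine point: as far as I know, a namespace takes a single storage-engine declaration, so a file-backed stanza would look roughly like the sketch below. It only reuses the values from the posted config and assumes the intent is to keep the data in the file at /data/aerospike.dat rather than in memory. It is not a tuned recommendation.

```
namespace productiont {
	replication-factor 2
	memory-size 40G
	# ... other namespace settings unchanged ...

	# Exactly one storage-engine declaration, with its own braces.
	storage-engine device {
		file /data/aerospike.dat   # path taken from the posted config
		filesize 580G              # value taken from the posted config
		write-block-size 8M        # value taken from the posted config
	}
}
```

A pure in-memory namespace (storage-engine memory, with no block) does not look like an option here if 260 GB is the unique data size: with replication-factor 2 that is roughly 520 GB across the cluster, against 4 x 40 GB = 160 GB of configured memory-size.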

Hi meher

I have already removed the "storage-engine memory" line. While investigating, I found something odd: when I start the old Java application, network bandwidth usage goes to the limit.

It is hard to comment much more with this limited data set. If you have an Enterprise License, you can contact Aerospike Support. Otherwise, check your server logs or monitoring dashboards to see if there is some observable pattern. It could be a small performance regression causing clients to overreact, try to compensate, and make things worse.