To start, if you have that many processes (5,000 to 10,000), they should definitely be using shared-memory cluster tending (aerospike.shm.use = 1) as well as persistent connections. If you do not use shared memory for cluster tending, you'll have hundreds of processes that each run their own tending thread, checking the state of the cluster every second. That is potentially 1000 processes that, just for tending, have to be scheduled across your 24 cores every second, on top of any actual KV operations.
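As a minimal sketch, the setting mentioned above goes in php.ini (or in the separate .ini file that loads the extension, if your install uses one). Only this one directive comes from the discussion here; the file location is an assumption about your setup.

```ini
; Enable shared-memory cluster tending so the FPM workers on a host
; share one view of the cluster instead of each polling it separately.
aerospike.shm.use = 1
```

PHP-FPM needs a restart (or reload) before the workers pick up the change.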
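Further, understand that with persistent connections each FPM process holds one Aerospike client, and that client costs at least 2 * N connections, where N is the number of nodes in the cluster. For example, in a hypothetical 10-node cluster, 5,000 processes would hold at least 100,000 connections between them.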
The idea is to pay the overhead of initializing the client (connecting to the nodes, learning the partition table, etc.) once and then reuse that client for many requests. How many requests is each FPM process set to serve before it terminates (pm.max_requests)? The (very) old way to configure this was to run tens or hundreds of concurrent processes that each accept only a few hundred requests. That was tuned for the era of PHP < 5.3.3, when memory leaks forced you to kill the process before it bloated. But PHP has been stable on memory for a long while, so the correct configuration is to run far fewer processes and let each of them handle a much higher number of requests. This removes a lot of overhead and uses the PHP client the way it was designed to be used.
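To make that concrete, here is a rough sketch of what such an FPM pool file could look like. The directive names (pm, pm.max_children, pm.max_requests) are standard PHP-FPM settings, but the numbers are purely illustrative assumptions, not recommendations for your hardware or traffic.

```ini
; PHP-FPM pool file (for example www.conf); all values are illustrative.
pm = static
; Far fewer workers than thousands; size this to your cores and memory.
pm.max_children = 64
; Let each worker serve many requests before it is respawned, so the
; Aerospike client it holds is reused. 0 would mean never respawn.
pm.max_requests = 100000
```

With pm = static the worker count stays fixed, so the number of client connections to the cluster stays predictable as well.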
Let me know how you have it configured now, please.