We have an issue when using the PHP client.
In our setup, we have 5 Aerospike servers (version 3.5.15) and 10 PHP servers. Each PHP server runs 500~1000 php-fpm processes.
After deploying the PHP client on the PHP servers, the Aerospike servers show very high CPU usage (above 1200% on a 24-core machine), even though there are only a few active write requests.
We have enabled the persistent connection flag on the PHP client.
The number of client connections from PHP to the servers seems very high (over 700 on each server). However, most of the connections are not under heavy load. I think the Aerospike server should handle idle connections efficiently.
With perf top -p $AS_PID I can see that 10%~15% of the CPU time is spent in the get_random_replica function. I saw that this function has been removed since 3.6.0, but I am not sure whether it is the whole cause of the high CPU usage.
What could cause such high CPU usage, and how can we fix it?
Thanks.
We have run a stress test. With aerospike.shm.use = 1 enabled, there was only a slight improvement.
The test ran with 300+ php-fpm processes on a single machine.
When the number of client connections on the server went over 700, CPU usage reached 1000%+ on a 24-core machine.
I also tried server version 3.6.4, and it has the same issue.
Again, understand that it only trims 1 connection thread per-process when you use shared-memory cluster tending. The issue is still that you have 2*N connections per-process where N is the number of nodes in the Aerospike cluster. Multiply that by 300-700 and you get the idea.
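To make the multiplication concrete, here is a rough back-of-the-envelope sketch. It assumes exactly 2 connections per node per process (the figure quoted above) and the 5-node cluster from the original post; the function names are made up for illustration and are not part of any Aerospike API.

```python
# Rough estimate of client connections created by PHP-FPM workers.
# Assumption: each FPM process holds its own client with 2 connections
# per cluster node, as described in the reply above.

def connections_per_process(nodes, per_node=2):
    """Connections one FPM process opens across the whole cluster (2*N)."""
    return per_node * nodes


def connections_from_host(processes, nodes, per_node=2):
    """Total connections opened by all FPM processes on one PHP host."""
    return processes * connections_per_process(nodes, per_node)


def connections_seen_by_each_node(processes, per_node=2):
    """Connections a single Aerospike node receives from one PHP host."""
    return processes * per_node


if __name__ == "__main__":
    nodes = 5  # cluster size from the original post
    for processes in (300, 700):
        total = connections_from_host(processes, nodes)
        per_node = connections_seen_by_each_node(processes)
        print(f"{processes} processes: {total} connections from the host, "
              f"{per_node} arriving at each node")
```

With these assumptions, 300 FPM processes on one host already account for about 600 connections per Aerospike node, which is in the same range as the 700+ connections reported during the stress test.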
Drop the number of concurrent FPM processes, keep the max_requests at a high number, and see what impact that has.
Reducing the number of FPM processes is not possible currently. Maybe we need some kind of connection proxy between PHP and the server. I also think it may be possible for the Aerospike server to handle a large number of idle connections more efficiently.
Wait, you can’t configure your own PHP server? The Aerospike server already handles a very large number of concurrent connections. Your problem is on the PHP app nodes, not the Aerospike server nodes. It’s the amount of CPU being used by way too many concurrent FPM processes, each of them holding an Aerospike client under the covers.
In an environment like Java and C# all the operations happen from a single, high-performance Aerospike client. Since PHP is set up to run as shared-nothing, those processes cannot share a client, and must all hold their own.
Therefore, you need to reduce the number of concurrent FPM processes and increase the number of requests each of them handles. That means less overhead spent on initializing processes, and fewer clients on a single node. It’s not that complicated to test.
Anyway, at some point you can show your FPM config, and your Aerospike client (ini) config.
Understand that having hundreds to thousands of clients on a machine with 24 cores is going to be problematic. The Aerospike client is a heavy client: it keeps track of the state of the cluster, and it connects to each of the nodes directly, with at least two connections per node. It’s all in the manual, but I can point out the links if you can’t find it.
What is your server like? What hardware is it on, what does the configuration file look like?
I turned on debug logging for demarshal, and I am seeing lots of "Sending Info request via fast path." messages. Maybe the PHP client could reduce the info request frequency when idle?