CPU loading

Hi, everyone. We are using the C# client, version 3.9.12. Everything is hosted in Kubernetes (both the Aerospike cluster and the services that work with it). When a node goes down, we have the following problems:

  1. A big CPU increase on all services that work with Aerospike
  2. For some time, requests continue to be sent to the node that is down. We know that this node proxies these requests to other nodes, but this is difficult to control in situations where we want to do, for example, maintenance.

Could somebody help us solve these problems?

Can you describe the node ‘going down’? Is this because of some issue, or planned maintenance? What are your heartbeat settings (server) and tend interval (client) set to?
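For reference, this is roughly what the relevant server-side heartbeat settings look like in aerospike.conf. The values shown are the commonly documented defaults, not your actual settings - verify against the docs for your server version:

```
heartbeat {
    mode mesh       # mesh is the typical mode for Kubernetes deployments
    interval 150    # ms between heartbeats (default)
    timeout 10      # missed heartbeats before a node is considered gone (default)
}
```

With these defaults, the detection window is roughly interval × timeout ≈ 1.5 s before the remaining nodes agree the node has left. On the client side, the tend interval (how often the client refreshes the partition map) defaults to 1000 ms in the C# client’s ClientPolicy.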

Hi. It happened during planned maintenance. We used the defaults for heartbeat and tend interval, without any overrides. Could you advise a better configuration for this, or something else?

Is the replication factor 2 or greater for all namespaces? Can you quantify the ‘big CPU increase’? For our use case we use Enterprise Edition, which allows quiesce. With quiesce you can gracefully remove a node without causing errors to clients. Without quiesce, I think some period of errors is normal until the heartbeat detects the node is gone and the tend picks up the new partition map - but the high CPU usage is interesting. Have you profiled it? Is it high as in a ‘50% increase on a 1 vCPU container’, or high as in ‘64 saturated cores’? I’m curious whether the high CPU load is driven by application design rather than Aerospike.
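For what it’s worth, on Enterprise Edition a graceful removal with quiesce typically looks something like the following sketch (check the docs for your server version before relying on it):

```
# On the node being taken down: stop taking master reads/writes
asinfo -v 'quiesce:'

# On any node: trigger a recluster so the partition map is rebalanced
# and clients stop sending traffic to the quiesced node
asinfo -v 'recluster:'

# Once migrations are complete, stop the quiesced node safely
# (e.g. stop the aerospike service, or delete the pod in Kubernetes)
```

Done in this order, clients move off the node before it actually stops, so there is no window of failed transactions for them to retry against.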

Hello. The replication factor is 2 for all namespaces, and we remove the node gracefully. The result is the following: CPU usage climbs to 100% (from an initial 30 - 40%) and stays there until the service is restarted. We also see that for 15 minutes or more, requests continue to be sent to this node, which proxies them.

Hard to guess much more from these symptoms without logs. But as @Albot mentioned, for smooth maintenance one should make use of the quiescence feature; otherwise, depending on the network heartbeat settings, it can take some time for a node to be recognized as having left the cluster - long enough for clients to try to compensate for higher latencies / failed transactions, which would cause a surge of connections (hence the CPU).
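On the client side, one knob worth checking is how aggressively the client retries during the window before the partition map updates. A minimal sketch for the 3.x C# client - field names such as `tendInterval`, `maxRetries`, and `sleepBetweenRetries` should be verified against your exact client version, and the host name here is a placeholder:

```
using Aerospike.Client;

ClientPolicy clientPolicy = new ClientPolicy();
clientPolicy.tendInterval = 1000;  // ms; how often the client refreshes the partition map

// "aerospike.default.svc" is a hypothetical Kubernetes service name
AerospikeClient client = new AerospikeClient(clientPolicy, "aerospike.default.svc", 3000);

// Bound retries so a departing node does not trigger a retry/connection storm
Policy readPolicy = new Policy();
readPolicy.maxRetries = 2;             // limit retransmits per transaction
readPolicy.sleepBetweenRetries = 100;  // ms back-off between retries
```

Unbounded or tight retry loops against a node that is going away are a common source of exactly this kind of connection churn and CPU surge.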


A CPU increase can have different root causes, depending on the configuration… it could be caused by connection churn due to clients having to compensate and retry when failing against the node that is going down. It could also be driven by migrations starting. The logs would have details that can help narrow it down.