Aerospike Community Forum

Network bottleneck on GCP

Operations Tuning

Foo34, September 5, 2017, 4:06am

Hello,

I have an Aerospike cluster on GCP:

21 nodes, 16 CPUs / 64 GB each
In-memory only
Between 4 and 5 MTps
Latest version 3.14.1.2-1, kernel 4.4.0-93-generic

It seems that the network is now the bottleneck.

Each node uses 500 Mbit/s of bandwidth.
Sometimes I see this error in kern.log:

net eth0: Unexpected TXQ (2) queue failure: -28

In aerospike.log, I can see errors such as:

could not create heartbeat connection to node xxx

Tuning txqueuelen / net.core.rmem_max / net.ipv4.tcp_rmem / net.ipv4.tcp_congestion_control does not change anything.
The only workaround I have found is to run ethtool -L eth0 combined 8, or 4 (the default value is 16). It seems to help a lot, but I’m not sure why.
Sometimes kicking a ‘badly performing’ server out of the cluster seems to help. Maybe some GCP nodes have less available bandwidth.
Adding or removing nodes (19 or 23 instead of 21) does not change performance much (with more nodes the global load is lower, but the latency issues are still there).
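
For scale, here is a rough back-of-envelope calculation from the figures above. It is only a sketch under assumptions: "MTps" is taken to mean million transactions per second across the whole cluster, load is assumed to be spread evenly over the 21 nodes, and the 500 Mbit/s is treated as a single aggregate figure per node (replication traffic and the read/write split are ignored). The function names are made up for illustration.

```python
# Back-of-envelope figures derived from the numbers quoted in the post.
NODES = 21
CLUSTER_TPS_LOW = 4_000_000         # assumes "MTps" = million transactions/second
CLUSTER_TPS_HIGH = 5_000_000
NODE_BANDWIDTH_BITS = 500_000_000   # 500 Mbit/s per node, direction not specified


def per_node_tps(cluster_tps: int, nodes: int) -> float:
    """Transactions per second each node handles if load is spread evenly."""
    return cluster_tps / nodes


def bytes_per_transaction(bandwidth_bits: int, node_tps: float) -> float:
    """Average bytes on the wire per transaction at the given bandwidth."""
    return (bandwidth_bits / 8) / node_tps


for cluster_tps in (CLUSTER_TPS_LOW, CLUSTER_TPS_HIGH):
    tps = per_node_tps(cluster_tps, NODES)
    size = bytes_per_transaction(NODE_BANDWIDTH_BITS, tps)
    print(f"{cluster_tps / 1e6:.0f} MTps: {tps:,.0f} TPS/node, about {size:.0f} bytes/transaction")
```

Under these assumptions each node handles roughly 190,000 to 240,000 transactions per second, at around 260 to 330 bytes per transaction, so the packet rate per node may be worth watching alongside the raw bandwidth.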

Here are some questions:

Do you think that 21 small servers is too many for Aerospike? I prefer to have small servers to avoid transaction bottlenecks.
Do you have any idea what I can tune in the network config? Or what I can check / monitor?
Do you think using two network cards (one for clients, one for the cluster) could help? It seems to be possible on GCP, but not easy, and I do not think they would be mapped to different physical network cards.
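
On the monitoring side, one thing that could be compared across nodes is the per-interface counters in /proc/net/dev (bytes, packets, errors, drops), sampled as rates. Below is a minimal sketch, assuming a Linux node with the standard /proc/net/dev layout and an interface named eth0. The helper names (parse_net_dev, rates, sample) are hypothetical and not part of any Aerospike tooling.

```python
import time

# Column names of /proc/net/dev on Linux: 8 receive fields, then 8 transmit fields.
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]


def parse_net_dev(text: str) -> dict:
    """Parse /proc/net/dev contents into {iface: {"rx_bytes": ..., "tx_drop": ...}}."""
    stats = {}
    for line in text.splitlines()[2:]:      # skip the two header lines
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        numbers = [int(v) for v in values.split()]
        counters = {f"rx_{k}": v for k, v in zip(RX_FIELDS, numbers[:8])}
        counters.update({f"tx_{k}": v for k, v in zip(TX_FIELDS, numbers[8:16])})
        stats[name.strip()] = counters
    return stats


def rates(before: dict, after: dict, seconds: float) -> dict:
    """Per-second deltas between two counter snapshots of one interface."""
    return {key: (after[key] - before[key]) / seconds for key in before}


def sample(iface: str = "eth0", interval: float = 1.0) -> dict:
    """Read /proc/net/dev twice, `interval` seconds apart, and return rates."""
    with open("/proc/net/dev") as f:
        first = parse_net_dev(f.read())[iface]
    time.sleep(interval)
    with open("/proc/net/dev") as f:
        second = parse_net_dev(f.read())[iface]
    return rates(first, second, interval)
```

Calling sample("eth0") on each node and comparing rx_packets, tx_packets, tx_drop and tx_errs between the nodes that behave well and the ones that get kicked out might show whether the slow nodes are dropping packets or simply carrying more traffic.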

Thx

