Client connection takes a long time in cluster mode

Hi All,

We’ve been using Aerospike for a while and use the Python client to connect to and query it. Until now the database ran on a single node. We’ve been trying to set up a cluster on AWS using mesh heartbeat mode. We changed the configuration on both instances so that each includes the other instance's details (private IP) to connect to. After restarting the server on the original instance and starting it on the new instance, I could see the data being distributed between the two, and a few minutes later both nodes were in sync in terms of the number of objects present (as master and as replica), according to the show sets command.

So far, so good. But the client connecting to the cluster, which has both instances configured in its hosts list, now takes on the order of seconds to open and to close the connection, whereas previously it took milliseconds.

Can I get some pointers on what I might be doing wrong and how to fix it? Please let me know if any additional details are required.

Thanks, Karthik

What do your “before” and “after” topologies look like?

e.g.: What kind of machine is the Aerospike server on: physical, VM, or AWS? Is the client on a separate machine or the same one? What is the network connection between the client and the cluster? How many network interfaces (NICs) does each node have? Please share some details.

We’ve been running the Aerospike servers on AWS m4.large instances inside a VPC. The cluster nodes belong to the same subnet in the VPC (though they are not part of any placement group). In addition to its private IP, each node also has an Elastic IP attached.

The client runs on another machine and connects to the cluster using the IP addresses of the cluster nodes.

Thanks, Karthik

One NIC interface per node.

How are you measuring the time to open a connection from your client machine to the server? When you were working with a single node, were the client machine and its link to the single-node server identical to what you have now? Most likely the TCP/IP link between your client machine and the Aerospike cluster is slow. I would also suggest reading through this page: Aerospike on Amazon EC2 - Recommendations | Aerospike Documentation
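One way to check whether the TCP/IP link itself is slow is to time a raw TCP connection to each node, independent of any Aerospike client. Below is a minimal sketch using only Python's standard library; it assumes the nodes listen on the default service port 3000 (adjust if your aerospike.conf uses a different port), and the function names are hypothetical helpers, not part of any Aerospike API.

```python
import socket
import time


def tcp_connect_time(host: str, port: int = 3000, timeout: float = 5.0) -> float:
    """Return the seconds taken to open a raw TCP connection to host:port.

    Only the network handshake is measured (no Aerospike protocol work),
    which helps separate link latency from client-library overhead.
    """
    start = time.perf_counter()
    # create_connection resolves the address and completes the TCP handshake.
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    # Leaving the with-block closes the socket again.
    return elapsed


def probe_hosts(hosts, timeout: float = 5.0) -> dict:
    """Probe each (host, port) pair; map it to seconds or to an error string."""
    results = {}
    for host, port in hosts:
        try:
            results[(host, port)] = tcp_connect_time(host, port, timeout)
        except OSError as exc:
            # Refused or timed-out connects land here, e.g. a closed port
            # or a security group that blocks the client.
            results[(host, port)] = f"error: {exc}"
    return results
```

For example, call probe_hosts([("10.0.0.11", 3000), ("10.0.0.12", 3000)]) with your own nodes' private IPs (the addresses here are placeholders). If the raw connects complete in milliseconds while the client still takes seconds, the delay is more likely in the client than in the network.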

I also did some quick, basic testing of your problem with a one-node and a two-node cluster on AWS; the client machine is a VM on my desktop, going over Wi-Fi to AWS. The test code is at: https://github.com/pygupta/aerospike-discuss/tree/master/topic3717 With the Python client and a one-node cluster, I get:

INFO: Connection to Aerospike opened in 0.667096138 seconds.
INFO: Connection to Aerospike cluster closed in 0.0881290435791 seconds

With a two-node AWS cluster, I get:

INFO: Connection to Aerospike opened in 2.93013811111 seconds.
INFO: Connection to Aerospike cluster closed in 2.19174313545 seconds
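Since a single run can vary (compare the repeated two-node C client runs further down), it may help to time the connect and close steps several times and compare summary statistics rather than one sample. A minimal, library-agnostic sketch follows; time_call is a hypothetical helper, and the callable you pass in is whatever you want to measure.

```python
import statistics
import time
from typing import Callable


def time_call(fn: Callable[[], object], runs: int = 5) -> dict:
    """Call fn() `runs` times and summarize wall-clock durations in seconds."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    # min/median/max show both the typical cost and the outliers.
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

With the Aerospike Python client, fn could be something like lambda: aerospike.client(config).connect().close(), where config holds your 'hosts' list; that combines open and close, so time them in separate callables if you want the two numbers reported above.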

My logic is similar to what you wrote. The client is an AWS Lambda function in the same region as the server nodes, so the TCP link is essentially the same as before. I’m not sure why the time has increased.

Data using the C client on my desktop (Ubuntu 16.04 VM), one-node cluster on AWS:

Successfully Connected to Aerospike in 0.001570 seconds!
Closed connection to Aerospike in 0.000329 seconds!

Two-node cluster on AWS:

Successfully Connected to Aerospike in 0.002793 seconds!
Closed connection to Aerospike in 0.000811 seconds!

Repeat, two-node cluster:

Successfully Connected to Aerospike in 0.002246 seconds!
Closed connection to Aerospike in 0.000524 seconds!

The code is in the same location as before, in the topic3717 subdirectory, file example.c (Makefile included). Running make produces the executable in the target subdirectory; run it as target/example. Add your own AWS node IP address to test.

Some extra time is needed when connecting to a multi-node cluster, because the client connects to the first node, gets the cluster topology, and then opens direct connections to all nodes. If connection time is important to you, perhaps you can switch to the C client. The delay we are seeing may be specific to the Python client.
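The discovery sequence described above can be turned into a rough back-of-the-envelope model. This is a hypothetical simplification, not how any Aerospike client is actually implemented: it assumes the seed step costs two round trips (TCP handshake plus one topology request) and that each node in the topology then costs a fixed number of round trips, opened either one after another or all at once.

```python
def estimated_connect_time(seed_rtt: float, node_rtts: list,
                           per_node_round_trips: int = 2,
                           parallel: bool = False) -> float:
    """Rough model of cluster connect time in seconds (hypothetical).

    seed_rtt: round-trip time to the seed (first) node.
    node_rtts: round-trip time to every node in the returned topology.
    """
    # Step 1: connect to the seed node and fetch the cluster topology.
    seed_cost = 2 * seed_rtt
    # Step 2: open a direct connection to every node in the topology.
    per_node = [per_node_round_trips * rtt for rtt in node_rtts]
    if not per_node:
        return seed_cost
    # Sequential connects add up; parallel ones cost only the slowest node.
    return seed_cost + (max(per_node) if parallel else sum(per_node))
```

Under these assumptions, and with equal round-trip times, the sequential two-node estimate is always larger than the one-node estimate, which matches the direction (though not the size) of the slowdown reported in this thread.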

Interesting, thanks for the insights.