Reliable way to know when Aerospike server is ready to accept DB operations


#1

Hello all,

Is there an existing way to confirm that an Aerospike server is actually ready to accept incoming client connections for PUT/SCAN operations?

Merely checking the aerospike service status doesn’t guarantee that the DB is actually ready for operations. We could put sleep timers before attempting a PUT from the client, but that is not very graceful.

How are you dealing with a situation like this? Let me know if you need more detail from me.


#2

Normally a client is given a list of hosts to connect to; if the first host is not reachable, it tries the second, and so on until it is able to connect. Once the client is connected to the cluster, it should auto-discover any new nodes.

If this is a question of bringing a 1-node cluster up or down, or of performing an operation specifically on one node, waiting for port 3000 to be available or checking the output of “status” in asinfo might be sufficient, but waiting for cluster size > 1 would be even better. What are you trying to do?
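
As a rough illustration of the "wait for port 3000" idea, here is a minimal Python sketch that polls the port instead of sleeping for a fixed time. The function name `wait_for_port` and its parameters are made up for this example. As discussed in this thread, an open port alone may not mean the node is ready for transactions, so treat this as a first gate only.

```python
import socket
import time


def wait_for_port(host, port=3000, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True if the port accepted a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if the port is closed
            # or the host cannot be reached.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller would do something like `wait_for_port("db.example.internal")` (hypothetical hostname) before opening the client connection, and fail fast if it returns False.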


#3

Hi,

Thanks for the reply. I am running an OpenStack stack in which all the instances depend on at least one DB node being available and ready to accept and perform operations. Also, I am using FQDN-based resolution, which is resolved by a DNS server local to the stack itself.

Everything works fine, but sometimes, due to delays in starting the DB process, I see a few writes fail. I need a mechanism reliable enough that I can be 100% sure the DB is ready before attempting to write to or read from it.

As for the cluster, FQDN-based resolution is good enough in my case to form a cluster; my issue is failures in DB operations caused by the delay in DB readiness after a service restart.

Regards


#4

Hi Amit, which client and version are you using?

The Java client, for example, had a fix in version 4.1.0; before that version it may have sent transactions to a node before the node was ready to accept them.

  • Do not allow nodes into cluster until node partition maps are fully initialized (partition-generation != -1).
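
The same condition (partition-generation != -1) could in principle be checked from a script before sending any writes. The sketch below is only an assumption of how that might look: `fetch_info` is a hypothetical callable that sends an info command to one node and returns the raw response text (for example by wrapping asinfo or a client info call), and the response format handled here, a bare value or a tab-separated name/value pair, is also an assumption.

```python
import time


def partition_map_ready(info_response):
    """Return True if the partition-generation value is anything other than -1."""
    # Accept either a bare value ("5") or a "name<TAB>value" pair (assumed formats).
    value = info_response.strip().split("\t")[-1]
    try:
        return int(value) != -1
    except ValueError:
        # Unparseable response: treat the node as not ready.
        return False


def wait_until_ready(fetch_info, timeout=60.0, interval=1.0):
    """Poll fetch_info("partition-generation") until the node looks ready.

    fetch_info is a hypothetical callable supplied by the caller.
    Returns True when ready, False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if partition_map_ready(fetch_info("partition-generation")):
                return True
        except Exception:
            # The node may not be answering info requests yet; keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```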

There may be other situations, though, where a node accepts transactions when it is not ready, or where the client wrongly believes it is ready. If you have logs (client/server) showing the client errors you get in such a situation and what state the servers are in, they may be useful.


#5

Hi Meher,

I am using Python client 1.0.46 (which is very old but fine for what I intend to do). Does the Python client have any handling like the one you mentioned for the Java client?

I first check whether a connection is set up between the DB and the client. After that I attempt to write to the DB, but that fails with the basic error pasted below:

(11L, 'AEROSPIKE_ERR_CLUSTER', 'src/main/client/put.c', 106)
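
Until the root cause is known, one stopgap is to retry the failing write with a growing delay rather than a single fixed sleep. The helper below is a generic sketch, not part of the Aerospike client: `retry` and its parameters are invented for this example, and it retries on any exception by default because the exact exception type raised by client 1.0.46 is not confirmed in this thread.

```python
import time


def retry(operation, attempts=5, initial_delay=0.5, backoff=2.0,
          retry_on=(Exception,)):
    """Call operation() until it succeeds or attempts are exhausted.

    Sleeps initial_delay after the first failure and multiplies the delay
    by backoff after each further failure. Re-raises the last error.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= backoff
```

With an already-connected client this might be used as `retry(lambda: client.put(key, bins))`, where `client`, `key`, and `bins` are the caller's own objects.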


#6

I see. Let me check if there is anything known about this client… did you try with the latest Python client by any chance?


#8

I would suggest trying with the latest client version. If you still run into this issue, we would want to look at the server-side log files. Having said this, scans would fail during migrations (but there is a policy that would allow them to still go through, albeit with potentially missing or duplicate records; again, only during migrations).