Batch Writes Failing with NO_AVAILABLE_CONNECTIONS_TO_NODE

Hi All,

We are using Aerospike in production for one of our use cases. We perform both read and write transactions against the DB. Currently, in one of our flows, we update ~1.5K records with individual DB calls, one after another. We recently changed this to use the Batch Write feature available in server version 6.0. After switching to batch writes, we are seeing NO_AVAILABLE_CONNECTIONS_TO_NODE errors while performing the batch writes.

  • Connection Pool Size: 100 [default]
  • LimitConnectionsToQueueSize = true [default]
  • Socket and Total Timeout = 20 seconds
  • Concurrent Node in Batch Write Policy = 1 [One node at a time]

We tried a few changes:

  1. Changing LimitConnectionsToQueueSize to false
  2. Reducing the batch size from 1.5K to 500 records
  3. Changing Concurrent Node in Batch Write Policy from its initial setting of all nodes to 1 [One node at a time]
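
For readers following along, the batch-size reduction in item 2 can be sketched as below. This is only an illustration under assumptions: the thread does not say which client SDK is in use, so Go is a guess, the `chunk` helper and the `key-N` names are hypothetical, and no Aerospike calls are made. It only shows splitting 1.5K keys into groups of 500 that would then be submitted one batch at a time.

```go
package main

import "fmt"

// chunk splits items into consecutive slices of at most size elements.
// Hypothetical helper for illustration; not part of any Aerospike client.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	// Placeholder keys standing in for the ~1.5K records from the post.
	keys := make([]string, 1500)
	for i := range keys {
		keys[i] = fmt.Sprintf("key-%d", i)
	}

	for i, batch := range chunk(keys, 500) {
		// A real implementation would issue one batch write per chunk here,
		// waiting for it to finish before sending the next one.
		fmt.Printf("batch %d: %d keys (%s .. %s)\n",
			i, len(batch), batch[0], batch[len(batch)-1])
	}
}
```

Submitting the chunks sequentially keeps only one batch in flight at a time, which is the property the change in item 2 relies on.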

But we are still facing the same issue. Surprisingly, when we move back to our older implementation of 1.5K separate calls, we don’t see any call failing with NO_AVAILABLE_CONNECTIONS_TO_NODE.

If batch write uses fewer connections than individual calls, it should not fail with NO_AVAILABLE_CONNECTIONS_TO_NODE when the individual-call approach does not.

Please help us understand the problem here. Thanks in advance.

A few questions to narrow this down:

  • What are the versions of the client and server involved? Which language SDK?
  • Are there any server-side errors accompanying the issue?
  • Do you have client-side logging of the Aerospike driver enabled (see Logging | Developer)?
  • Are any server-side metrics, such as batch_index_delay or batch_sub_write_error, increasing during the issue?
  • Does the server report any issues, i.e. non-info messages in the logs?
  • Are there server-side bottlenecks such as CPU, memory, or network?
  • Why are you using socket timeout=20s instead of using keepalive with socket timeout=0?

I’ve noticed that I sometimes have issues when my applications start up and instantly try sending large batches. They get some relief from setting minConnsPerNode=100, or something else higher than the default of 1. This gets the connection pool warm (alternatively, use the WarmUp method).