I’m seeing some weird Java client behavior where it stops fetching results based on a secondary index. I have a secondary index on a field called node_id (string). My Java client queries by node_id and retrieves the records. The code runs on a Jetty server.
Here’s the scenario. I deploy the code, start the server, and everything works. But after a few days of running, the query stops returning results for some node ids. I’m able to query by node_id using AQL. Moreover, the failure happens for random node ids; the rest work without any issue. The only remediation is to restart the server, after which the problem goes away, only to come back after a few days. The same client works fine for queries by primary key; the issue is limited to the secondary index. Here’s the code snippet:
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 8
    transaction-threads-per-queue 8
    proto-fd-max 15000
}
logging {
    # Log file must be an absolute path.
    file /mnt/ebs2/aerospike/log/aerospike.log {
        context any info
    }
}
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.
        address xx.xx.x.xx6
        # List one or more other nodes, one ip-address & port per line:
        mesh-seed-address-port xx.xx.xx.xx9 3002
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace caas {
    replication-factor 2
    memory-size 10G
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    #storage-engine memory
    # To use file storage backing, comment out the line above and use the
    # following lines instead.
    storage-engine device {
        file /mnt/ebs2/aerospike/data/caas.dat
        filesize 200G
        write-block-size 1M
        # data-in-memory true # Store data in memory in addition to file.
    # }
}
I have two Aerospike nodes, both running version 3.6.2-el6.
I’m totally confused about what could be going wrong here; any pointers would be appreciated.
Three client instances are being created when one will do. Each client spawns a cluster tend thread which periodically polls the server nodes for cluster status. It’s extra overhead to create more than one client per cluster.
Since you are using both async and sync commands, I recommend creating a single AsyncClient instance. AsyncClient can perform both async and sync commands because it inherits from AerospikeClient.
In any case, the code you provided should work (if client.query() is called before the try/finally block). Are you getting an exception (result code?) or just no returned records?
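To illustrate why the ordering of the query call and the try/finally block can matter, here is a minimal plain-Java sketch. It does not use the Aerospike client at all: `FakeRecordSet`, `query()` and the other names are hypothetical stand-ins invented for this example, and it is not a diagnosis of the problem reported above. It only shows that when the query is issued inside the `try` and fails, an unconditional `close()` in `finally` hits a null reference, and the resulting `NullPointerException` hides the original error.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a query result set; NOT the Aerospike RecordSet class.
final class FakeRecordSet implements AutoCloseable {
    private final Iterator<String> rows;

    FakeRecordSet(List<String> rows) { this.rows = rows.iterator(); }

    // Advances to the next row; returns false when the rows are exhausted.
    boolean next() {
        if (!rows.hasNext()) return false;
        rows.next();
        return true;
    }

    @Override
    public void close() { /* a real result set would release resources here */ }
}

final class QueryCleanupSketch {
    // Hypothetical query call: throws when asked to fail, otherwise yields two rows.
    static FakeRecordSet query(boolean fail) {
        if (fail) throw new IllegalStateException("query failed");
        return new FakeRecordSet(Arrays.asList("rec-1", "rec-2"));
    }

    static int count(FakeRecordSet rs) {
        int n = 0;
        while (rs.next()) n++;
        return n;
    }

    // Query issued BEFORE the try: if it throws, there is nothing to clean up yet,
    // and inside the finally block rs is guaranteed to be non-null.
    static int queryBeforeTry(boolean fail) {
        FakeRecordSet rs = query(fail);
        try {
            return count(rs);
        } finally {
            rs.close();
        }
    }

    // Query issued INSIDE the try: if it throws, rs is still null in finally,
    // so close() raises a NullPointerException that replaces the original exception.
    static int queryInsideTry(boolean fail) {
        FakeRecordSet rs = null;
        try {
            rs = query(fail);
            return count(rs);
        } finally {
            rs.close();
        }
    }

    // Runs a call and reports either the record count or the exception type seen by the caller.
    static String outcome(Supplier<Integer> call) {
        try {
            return "read " + call.get() + " records";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(() -> queryBeforeTry(false))); // read 2 records
        System.out.println(outcome(() -> queryBeforeTry(true)));  // IllegalStateException
        System.out.println(outcome(() -> queryInsideTry(true)));  // NullPointerException
    }
}
```

With the query in front of the `try`, the caller sees the real failure; with it inside, the caller only sees the `NullPointerException` from the cleanup code, which makes the underlying error much harder to track down.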
I created a separate connection for reads, writes and async operations because I was struggling to apply the appropriate policy for each type. Are you saying I can create one instance of AsyncClient and reuse it for both reads and writes?
Meanwhile, I’ve updated the code to move the query execution before the try block and to handle the result set as per your suggestion. I’ll keep an eye on the issue and see if it resurfaces.