We are using version 2.1.1 of the Aerospike Ruby client in production and querying for some records with the batch_get() API. We suspect that this method is faulty, because it often returns a nil response for keys that do exist in the cluster. Its definition contains a #TODO to wait until migration is over. We suspect that while migration is in progress on the cluster, some keys move to new partitions, and the Ruby client still queries the old partitions and fails to get the record. This inconsistency is causing lots of errors in our system.
Could this missing #TODO implementation be the cause of this inconsistency? If so, has it already been fixed and released in a newer version? If not, will it be taken up and fixed soon?
Should we avoid using the batch_get() API until then? Or could there be some other problem with the cluster, with the batch_get() API working as intended?
Could someone please help us out as soon as possible? We can’t afford to have data inconsistency on our platform.
Thanks.
That’s correct. The Ruby client still uses the older Batch Direct protocol rather than the newer Batch Index protocol. The difference between these two batch protocols is explained in the Aerospike documentation; a discussion of the impact of migrations on batch requests using the Batch Direct protocol can be found in this forum thread.
The only possible work-around for the Ruby client is to use single-key requests instead of batch requests. Adding a feature to the client to wait for the migration to finish (as suggested by that TODO) is not a viable solution, as migrations can potentially take a long time. The better solution would be to implement the Batch Index protocol.
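As a rough illustration of the single-key work-around, here is a minimal sketch. The helper name get_each is hypothetical, and it assumes only that the client object responds to get(key) and returns a record, or nil when the key is not found, as the Ruby client's single-key get does.

```ruby
# Hypothetical helper: replaces one batch_get call with one
# single-key read per key, preserving the order of the keys.
# Assumption: `client.get(key)` returns a record, or nil when
# the key does not exist.
def get_each(client, keys)
  keys.map { |key| client.get(key) }
end

# Possible usage with a connected client and an array of keys:
#   records = get_each(client, keys)
```

Note that this costs one round trip per key, so it will be slower than a batch request for large key lists.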
I am also using the Ruby client. Sometimes the partition map is not fetched correctly, which leaves the client with a wrong partition map, and batch_get then starts failing.