We recently upgraded our Aerospike cluster from Community Edition version 5 to Enterprise Edition version 6.1.0 for batch write support. But while using it, we encountered a segmentation fault on the Aerospike server.
On investigation, we found that when we delete a record using new BatchWrite(new Key(namespace, set, key), Operation.array(Operation.delete())), we get a segmentation fault, but not when we delete it using new BatchDelete(new Key(namespace, set, key)).
Can someone please explain the difference between these two calls, and why we get this error when deleting a record via BatchWrite?
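For reference, here is a minimal sketch of the two paths we compared (standard Aerospike Java client; the host, namespace, set, and key values are placeholders):

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.BatchDelete;
    import com.aerospike.client.BatchRecord;
    import com.aerospike.client.BatchWrite;
    import com.aerospike.client.Key;
    import com.aerospike.client.Operation;

    import java.util.Arrays;

    public class BatchDeleteRepro {
        public static void main(String[] args) {
            AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
            Key key = new Key("test", "demo", "some-key");

            // Path 1: delete the record via a delete *operation* inside a
            // batch write -- this is the call that crashes the server.
            BatchRecord viaWrite = new BatchWrite(key, Operation.array(Operation.delete()));
            client.operate(null, Arrays.asList(viaWrite));

            // Path 2: delete the record via a dedicated batch-delete
            // record -- this works fine.
            BatchRecord viaDelete = new BatchDelete(key);
            client.operate(null, Arrays.asList(viaDelete));

            client.close();
        }
    }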
We did try to quickly reproduce this and could not; we will add details once the case is opened. We are going to ask for your help finding a minimal reproducible scenario.
Thanks for the code to reproduce. It took a while, but we eventually reproduced it. (Note - if you configure “debug-allocations true”, then it happens every time.)
It is a double free. It has nothing to do with batch writes in particular – any write with the delete-all-bins operation will double-free the bin data. It is also only an issue if you have configured “single-bin true” and “data-in-memory true”, which of course you have. (Note – in 6.4, two releases from now, the single-bin configuration will no longer be supported, and you should consider moving away from it.)
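For anyone else checking whether they are exposed: a namespace stanza along these lines is the configuration that triggers it, with the debug-allocations setting mentioned above making it reproduce deterministically. (Illustrative sketch only – the namespace name, device path, and sizes are placeholders.)

    service {
        debug-allocations true    # makes the double free fire every time
    }

    namespace test {
        replication-factor 2
        memory-size 4G
        single-bin true           # deprecated as of 6.4; combined with...
        storage-engine device {
            device /dev/sdb
            data-in-memory true   # ...in-memory bin data, this hits the bug
        }
    }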
We will fix this today, and it will be available in a hotfix next week.
Separately, I am a bit worried by the log lines shown in your first report:
Feb 27 2023 07:49:35 GMT: WARNING (flat): (flat.c:400) extra rblocks follow flat bin
Feb 27 2023 07:49:35 GMT: WARNING (record): (record.c:493) {replicator} record replace: failed unpickle bin 3309a6d6d12e49995f0184c1b674498c44d0bf18
The bug we are fixing does not explain these, as far as we can tell. One thing that might explain them is that you are running a cluster in which this node is “single-bin” but some other node is not.
Anyhow, we will definitely fix the crash, whether or not we can explain the above warnings.
Thanks for responding quickly. One thing I still don’t understand is why deleting records with the BatchDelete function doesn’t cause this issue. Isn’t BatchDelete also a delete-all-bins operation?
BatchDelete is not the same thing. It generates a record-delete transaction, which goes down a different code path than a record-delete operation, which can run within an atomic set of operations. If all you want to do is delete a record, BatchDelete is fine. If you want to combine deleting the record (atomically) with other operations – for example, read it and then delete it – then you would use the delete-record operation alongside the other operations. You can of course use it as an operation by itself, and it will do the same thing as BatchDelete, but via the different code path that had the bug.
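For example, a minimal sketch of the atomic read-then-delete case (placeholder namespace, set, and key; standard Java client):

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Key;
    import com.aerospike.client.Operation;
    import com.aerospike.client.Record;

    public class ReadThenDelete {
        public static void main(String[] args) {
            AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
            Key key = new Key("test", "demo", "some-key");

            // Read the record's bins and delete the record in a single
            // atomic multi-operation transaction. Operation.delete() here
            // is the same record-delete *operation* used inside BatchWrite.
            Record before = client.operate(null, key,
                Operation.get(), Operation.delete());

            System.out.println("Deleted record previously held: " + before);
            client.close();
        }
    }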