We have a use case where we want to replicate Aerospike records to our data warehouse. With the release of queries based on LUT (last-update-time), we can get all the records that have been modified since time X and merge them into our data warehouse. One issue we found is that a query based on LUT does not return deletions. As a result, records that were deleted in Aerospike still linger in our data warehouse. Is there any approach to get a stream of the deleted records?
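For reference, the LUT-based pull we are describing looks roughly like the sketch below (Aerospike Python client); the namespace/set names and the `since_ns` cutoff are placeholders.

```python
import aerospike
from aerospike_helpers import expressions as exp

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()

since_ns = 1_700_000_000 * 10**9            # cutoff, nanoseconds since epoch (placeholder)
query = client.query("test", "demo")
# Expression filter: keep only records whose last-update-time is >= the cutoff.
policy = {"expressions": exp.GE(exp.LastUpdateTime(), since_ns).compile()}

# Only records updated at or after the cutoff come back; deletes never do.
for _, meta, bins in query.results(policy):
    pass  # merge `bins` into the data warehouse here
```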
We also have a use case where we need to take some action based on a record delta. For example, if we have a record:
{"planets": ["mercury", "venus", "earth", "mars"]}
and it then gets modified to
{"planets": ["mercury", "venus", "earth"]}
We want to be able to know that "mars" was deleted, so we can perform some additional action. Is there any approach to this?
Assuming you are a CE (Community Edition) user and don’t have access to EE features, you have to handle shipping deletes yourself; it inevitably involves the client taking over that responsibility. When you delete a record in CE, its primary index entry in RAM is removed, so a query with a predicate filter on LUT will not find it. In Enterprise Edition, you can opt to store a tombstone (durable delete) or use the Cross-Datacenter Replication (XDR) feature.
So, 1) as @Albot suggested, do a simultaneous delete in the “data-warehouse” cluster from the client app, or 2) depending on your delete frequency, when you run the database query, write-block-size, etc., set up a record (or a few records) on the transaction cluster that stores the digests of deleted records in a bin as a list, i.e. key: deleteme, bin: [digest1, digest2, …], where each digest is 20 bytes. Every time you delete a record on the transaction cluster, append its digest to the deleteme list record. Then, when you do the data-warehouse update, run the delete for each digest and clear the deleteme list. Or, consider EE.
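Here is a rough sketch of option 2 with the Aerospike Python client. The namespace, set, key and bin names (“deleteme”, “digests”) are illustrative assumptions, not anything built into Aerospike.

```python
import aerospike
from aerospike_helpers.operations import list_operations

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()

NS, SET = "test", "demo"
DELETE_LOG_KEY = (NS, SET, "deleteme")   # dedicated record holding the digest list

def delete_and_log(user_key):
    """Delete a record on the transaction cluster and log its 20-byte digest."""
    digest = aerospike.calc_digest(NS, SET, user_key)
    client.remove((NS, SET, user_key))
    # Append the digest to the deleteme list (the log record is created if absent).
    client.operate(DELETE_LOG_KEY,
                   [list_operations.list_append("digests", bytes(digest))])

def apply_deletes_to_warehouse():
    """At warehouse-update time: apply each logged delete downstream, then clear the list."""
    _, _, bins = client.get(DELETE_LOG_KEY)
    for digest in bins.get("digests", []):
        pass  # issue the corresponding delete in the data warehouse here
    client.operate(DELETE_LOG_KEY, [list_operations.list_clear("digests")])
```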
Second question - it looks like one client modifies a record and you want to be notified so another client can take some additional action. There is no transaction log in Aerospike, nor can it be made to emit signals on updates. So again, you may have to create your own “log of modified digests” in a few dedicated records and have the other client poll them, similar to the delete solution above. The client has to do the double work. And since you are doing a multi-record update for every transaction - the map you want to modify plus logging the update - there is always the chance that the first write succeeds and the second fails. Here, though, retrying the second write on failure will not adversely impact the method.
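The same pattern can be sketched for the modified-digest log; again the record and bin names are assumptions, and the retry on the second write is the only part that needs care.

```python
import aerospike
from aerospike import exception as ex
from aerospike_helpers.operations import list_operations

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()

NS, SET = "test", "demo"
MOD_LOG_KEY = (NS, SET, "modified_log")   # dedicated "log of modified digests" record

def update_and_log(user_key, bins, max_retries=3):
    """Write the real update first, then append the record's digest to the log.

    The log append is retried on failure; a duplicate append only means the
    other client revisits a record it has already seen, so retries are safe.
    """
    client.put((NS, SET, user_key), bins)                 # first write: the actual update
    digest = aerospike.calc_digest(NS, SET, user_key)
    for attempt in range(max_retries):                    # second write: log the digest
        try:
            client.operate(MOD_LOG_KEY,
                           [list_operations.list_append("digests", bytes(digest))])
            return
        except ex.AerospikeError:
            if attempt == max_retries - 1:
                raise
```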
What are you trying to query? You can get the number of tombstones, but the tombstones themselves are no longer accessible from the client - they behave as if they aren’t there.
A tombstone of a record has all its bins set to null - i.e. deleted. So there is nothing left to query off the record except perhaps its digest. The current Record UDF API doesn’t support filtering based on whether a record has turned into a tombstone. Hypothetically, if such a feature were added, all you could really get back is the last-update-time and digest of all the tombstoned records in a scan-UDF operation. How would that help you?