I am using a record with multiple bins, and I want to understand the performance implications of updating one of the bins in that record. Will the whole record be read into memory and then the particular bin updated, or will just that bin be read into memory, modified, and written back to disk? We are using the index-in-memory, data-on-disk setup.
This question came to mind because only the key digest is kept in memory, not a (key, bin) digest. I hope I have not got the concept wrong. If that is the case, how does Aerospike get to the bin other than by loading the whole record into memory? If it does load the whole record, there will be significant read and write amplification when updating just one bin.
Could you please clarify? This is crucial to my data modelling.
You are mostly correct. The default write policy is ‘update’, which merges the new bins with the existing bins. The exception is the single-bin configuration, in which case all writes are replaces. You can find more information here: FAQ - What is difference between update and replace?
Notice that even when updating a record that has only a single bin, by default (when not using the single-bin configuration) the write will read the existing record to merge the incoming bin with the existing bin, even if they are the same bin.
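To make the difference concrete, here is a toy Python model. It is not Aerospike's implementation or client API; `ToyDevice` and `write` are hypothetical names, and the model simply assumes, as described above, that a record is stored and rewritten on the device as one unit. It counts device reads and bytes written for the two policies.

```python
# Toy model (NOT Aerospike code): each record is stored on the "device" as a
# single unit, so every write rewrites the whole record.
from dataclasses import dataclass, field


@dataclass
class ToyDevice:
    records: dict = field(default_factory=dict)  # digest -> {bin_name: value}
    reads: int = 0
    bytes_written: int = 0

    def read_record(self, digest):
        self.reads += 1
        return dict(self.records.get(digest, {}))

    def write_record(self, digest, bins):
        # The whole record is written, no matter how many bins changed.
        self.bytes_written += sum(len(k) + len(v) for k, v in bins.items())
        self.records[digest] = dict(bins)


def write(device, digest, new_bins, policy="update"):
    if policy == "update":
        # Merge: existing bins must be read so they survive the rewrite.
        merged = device.read_record(digest)
        merged.update(new_bins)
    elif policy == "replace":
        # Replace: old bins are discarded, so no read is needed.
        merged = dict(new_bins)
    else:
        raise ValueError(f"unknown policy: {policy}")
    device.write_record(digest, merged)


if __name__ == "__main__":
    dev = ToyDevice()
    write(dev, "k1", {"a": "x" * 100, "b": "y" * 100}, policy="replace")
    write(dev, "k1", {"b": "z"}, policy="update")
    print(dev.reads)          # 1: the update had to read the record first
    print(dev.bytes_written)  # 305: bin "a" was rewritten although only "b" changed
```

In this model, changing only the small bin "b" under update still costs one full read and a rewrite of the large bin "a", which is the amplification the question is about.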
@kporter Thanks for your response.
Since the default write policy is update, all the bins in the record will be read into memory, the merge will be performed, and the whole record will be written back to disk (SSDs in our case). So there is read and write amplification if all I am interested in is updating a single key in one of the bins that happens to be a map. And yes, the index will be updated to point to the new location of the record.
With the replace policy, the existing bins are not read, and the record is simply written to disk. Fundamentally there is no amplification here if all I want is to write a new value to the record. So in a key-value setup, it makes sense that replace is the default.
So it would be a poor design to have a record with multiple bins, each holding a reasonably large hash map; that would be overkill. It would be better to have a separate record for each hash map, so that the amplification is scoped to just that hash map. Right?
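A back-of-the-envelope sketch of the trade-off being asked about, assuming that every change to one bin reads and rewrites the whole record (as discussed above) and ignoring per-record overhead, write-block packing, and defragmentation. The function name and the map sizes are hypothetical.

```python
def io_bytes_for_one_map_update(map_sizes, touched, layout):
    """Approximate device bytes (read + write) to change one entry in one map.

    map_sizes: size in bytes of each map bin.
    touched:   index of the map being modified.
    layout:    "one_record" (all maps as bins of one record) or
               "record_per_map" (each map in its own record).
    """
    if layout == "one_record":
        # The whole multi-bin record is read and rewritten.
        record_size = sum(map_sizes)
    elif layout == "record_per_map":
        # Only the record holding the touched map is read and rewritten.
        record_size = map_sizes[touched]
    else:
        raise ValueError(f"unknown layout: {layout}")
    # One full-record read plus one full-record write.
    return 2 * record_size


sizes = [8 * 1024] * 8  # eight hypothetical 8 KiB maps
print(io_bytes_for_one_map_update(sizes, 3, "one_record"))      # 131072
print(io_bytes_for_one_map_update(sizes, 3, "record_per_map"))  # 16384
```

Under these assumptions the split layout moves one eighth of the bytes per update. The price is that maps in different records can no longer be updated together in one atomic step.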
If the bin is a map, you will still need to read the record to get the current map before you can update it.
A record in Aerospike is an atomic unit. If you need to update multiple bins in one atomic step (so that no reader can observe the record with the transaction only partially applied), then you will need to keep them in a single multi-bin record.
In the multi-bin case, the read amplification can be reduced by either increasing the post-write-queue or enabling read-page-cache. More information can be found here: Buffering and Caching in Aerospike
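As a rough illustration of why a larger post-write-queue helps the read-modify-write cycle: the toy Python model below keeps the most recently written blocks in memory, so reading a record that was written recently avoids a device read. This is a simplification, not Aerospike's implementation: it assumes each write lands in its own block, the class and method names are hypothetical, and it does not model read-page-cache, write-block packing, or defragmentation.

```python
from collections import deque


class ToyPostWriteQueue:
    """Hypothetical model: keeps the last `capacity` written blocks in RAM."""

    def __init__(self, capacity):
        self.recent = deque(maxlen=capacity)  # oldest block ids drop off the left
        self.location = {}  # digest -> id of the block holding its latest copy
        self.next_block = 0
        self.device_reads = 0
        self.cache_hits = 0

    def write(self, digest):
        # Simplification: every write goes to a brand-new block.
        block = self.next_block
        self.next_block += 1
        self.location[digest] = block
        self.recent.append(block)

    def read(self, digest):
        if self.location[digest] in self.recent:
            self.cache_hits += 1  # served from memory, no device I/O
        else:
            self.device_reads += 1  # block already evicted: read from device


pwq = ToyPostWriteQueue(capacity=2)
for digest in ("a", "b", "c"):
    pwq.write(digest)
pwq.read("c")
pwq.read("b")
pwq.read("a")  # "a" was written first, so its block has been evicted
print(pwq.cache_hits, pwq.device_reads)  # 2 1
```

The larger the queue relative to the rate of writes, the more likely the read half of an update finds the record's current block still in memory.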