Hello, we’ve been told that bin operations require a full record read from the device, which I suppose makes sense from a device utilization standpoint, but it looks like small partial updates on a big record are slowed down because the server reads the whole record from the disk and then writes it back. Meaning that technically “partial” updates are only partial in the client ↔ server communication, not in the server ↔ device communication.
That is correct. A partial bin update saves you from moving the entire record data over the network from client to server. On the server, the entire record must be read, the affected bin updated, and the record then written back to a different location on the device.
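To make that concrete, here is a rough Java client sketch (untested; the namespace, set and bin names are just placeholders). The operate() call ships only the increment over the network, but the server still reads and rewrites the full record on the device:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.Operation;
import com.aerospike.client.Record;

public class PartialUpdateSketch {
    public static void main(String[] args) {
        // Connect to a local node (host/port are placeholders).
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        Key key = new Key("test", "demo", "user-1");

        // Only this increment (plus the read-back) travels over the wire,
        // not the record's other bins. On the server the full record is
        // still read, modified and rewritten to a new location on device.
        Record result = client.operate(null, key,
                Operation.add(new Bin("visits", 1)),
                Operation.get("visits"));

        System.out.println("visits = " + result.getLong("visits"));
        client.close();
    }
}
```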
You cannot flip bits in place on an SSD, and Aerospike does not store a record as “bin fragments” connected by pointers - that would be inefficient. The entire record is stored contiguously and is therefore limited to 8MB max, including overheads - the largest possible record size with the now fixed 8MB write-block size.
(The record is read from the device unless its last update is still present in one of the caches - the current write buffer, the max-write-cache or the post-write-cache.)
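As a side note on the 8MB limit above, a write that would push a record past it is rejected by the server and shows up on the client as RECORD_TOO_BIG. Roughly like this in Java (untested sketch, names are placeholders):

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.AerospikeException;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.ResultCode;

public class RecordTooBigSketch {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        Key key = new Key("test", "demo", "big-record");
        try {
            // A write that would make the stored record (data + overheads)
            // exceed the 8MB write-block size is rejected by the server.
            client.put(null, key, new Bin("blob", new byte[9 * 1024 * 1024]));
        } catch (AerospikeException ae) {
            if (ae.getResultCode() == ResultCode.RECORD_TOO_BIG) {
                System.out.println("record would exceed the write-block size");
            } else {
                throw ae;
            }
        } finally {
            client.close();
        }
    }
}
```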
The in-memory layout of record data is now the same as the on-device layout, and partial updates are treated the same way. This is inherently necessary now that Aerospike supports transactions (version 8 onwards) and must be able to roll back to the previous version of a record should the client decide to abort the commit.
Way back in old versions of Aerospike, it was possible to update bin data in place while the record was still sitting in the write buffer, provided the new value was the same size in bytes and the update arrived within the 1000 ms flush-max-ms window … a very rare situation - but that too was eliminated starting with version 7, or perhaps even earlier.
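For reference, the knobs mentioned above live in the namespace’s storage-engine stanza of aerospike.conf; something along these lines (values are illustrative only, and parameter names differ slightly between server versions - older servers called the post-write cache post-write-queue):

```
namespace test {
    replication-factor 2

    storage-engine device {
        device /dev/nvme0n1

        # a partially filled write buffer is flushed after at most this long
        flush-max-ms 1000

        # cap on buffered writes waiting to be flushed; exceeding it
        # surfaces as device-overload errors on writes
        max-write-cache 64M

        # recently flushed write blocks kept in memory so hot re-reads
        # can skip a device I/O (value illustrative)
        post-write-cache 256M
    }
}
```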
.. I’ve been looking into similar behavior and was wondering if it was just me.
From what I understand, even simple bin updates like increment or append will mark the record as modified, so depending on your storage configuration (especially with data-in-memory disabled), that can trigger a full write back to the device. I noticed this more clearly when tracking IOPS under load… even light updates can stack up fast.
Might be worth checking your namespace config to see how your write policies and storage-engine settings are tuned. Curious to hear if anyone’s found a clever way to optimize for smaller bin ops without triggering full rewrites every time.
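For what it’s worth, the only mitigation I’ve tried so far is grouping several bin changes into a single operate() call, so the record is read and rewritten once per call rather than once per bin change. Rough Java sketch (untested, names are placeholders):

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.Operation;

public class BatchedBinOpsSketch {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        Key key = new Key("test", "demo", "session-42");

        // Three bin changes applied atomically in one operate() call:
        // the record is read and rewritten on the device once,
        // instead of three separate read-modify-write cycles.
        client.operate(null, key,
                Operation.add(new Bin("hits", 1)),
                Operation.append(new Bin("log", "|GET /home")),
                Operation.put(new Bin("last-seen", System.currentTimeMillis())));

        client.close();
    }
}
```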