How does it work?
When you (the client) ask to update a record (here, appending to an existing record), the client sends the server the key object, i.e. [ digest (hash of your key + set name) + namespace ], together with the value you want to append. What the append means depends on the bin type you are using; let's assume it's a list bin.
At the server: For the specified namespace, the server
1 - looks up the Primary Index entry for this record in RAM using the digest (a red-black tree search).
2 - from the primary index entry, it finds which device your record is stored on, at what offset, and how long it is (in bytes). It then reads the record into memory. (SSD --> READ OP)
3 - It then does whatever append operation you asked for on the record in memory.
4 - It then writes the record to the current write-block in RAM - the 1MB block that is being filled with new or updated records. It also changes the Primary Index entry of the record you are modifying to point to the data in this different write-block.
5 - This write-block is flushed to the device asynchronously once it is full; while it is only partially filled, it is also flushed every 1 second until it fills up. (SSD - WRITE OP) The primary index now points to this new version of the record.
Note: We don’t update a record “in-situ” in the previous write-block where the old version of the record was.
5a - The write-block is placed in a write-q - normally the q will be at zero depth - it's just a buffer that lets the device write thread deal with burst loads. Typically the block goes to the write-q --> device --> and then to another queue called the post-write-queue (depth 256 - FIFO), which allows recently updated records to be read from RAM, e.g., by XDR - the cross data center replication feature.
6 - The device defrag thread will eventually recover the space on the device that held the old version of the record, which nothing points to any more. This concept of defragging is really the server's point of view, in terms of the write-blocks it's allocating, which can be seen as "software" data structures. It is not what the SSD controller is doing at the SSD level. The SSD controller has its own world to deal with in terms of the blocks and pages that store the write-block data. From the SSD controller's point of view, it's storing these "write-block" data structures for the server; what it does underneath, only the controller knows. The server code manages its own view of these data structures.
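
To make the steps above concrete, here is a small toy model in Python. It is only a sketch of the idea and not the server's actual code: the class ToyStore, its methods and the live_bytes bookkeeping are names made up for illustration, a plain dict stands in for the red-black tree, another dict stands in for the SSD, and the defrag here only frees blocks that are completely unreferenced (the real defrag is more involved).

```python
from collections import deque
from dataclasses import dataclass

WRITE_BLOCK_SIZE = 1024 * 1024  # 1MB write-block, as described above


@dataclass
class IndexEntry:
    block_id: int  # which write-block holds the current version
    offset: int    # byte offset of the record inside that block
    length: int    # record length in bytes


class ToyStore:
    """Hypothetical model of the copy-on-write update path."""

    def __init__(self, block_size=WRITE_BLOCK_SIZE, post_write_depth=256):
        self.block_size = block_size
        self.index = {}        # digest -> IndexEntry (stand-in for the primary index)
        self.device = {}       # block_id -> bytes (stand-in for flushed blocks on SSD)
        self.live_bytes = {}   # block_id -> bytes still referenced by the index
        self.post_write_q = deque(maxlen=post_write_depth)  # FIFO of flushed block ids
        self.current_id = 0
        self.current = bytearray()  # write-block currently being filled in RAM

    def read(self, digest):
        # Steps 1-2: index lookup, then fetch from the RAM block or the "device".
        e = self.index[digest]
        block = self.current if e.block_id == self.current_id else self.device[e.block_id]
        return bytes(block[e.offset:e.offset + e.length])

    def flush(self):
        # Step 5: the block goes to the device and a fresh block is started.
        self.device[self.current_id] = bytes(self.current)
        self.post_write_q.append(self.current_id)
        self.current_id += 1
        self.current = bytearray()

    def put(self, digest, data):
        # Step 4: always write to the current block, never in place.
        if len(data) > self.block_size:
            raise ValueError("record larger than a write-block")
        if len(self.current) + len(data) > self.block_size:
            self.flush()
        old = self.index.get(digest)
        if old is not None:
            self.live_bytes[old.block_id] -= old.length  # old copy is now garbage
        self.index[digest] = IndexEntry(self.current_id, len(self.current), len(data))
        self.live_bytes[self.current_id] = self.live_bytes.get(self.current_id, 0) + len(data)
        self.current += data

    def append(self, digest, extra):
        record = self.read(digest)         # steps 1-2
        self.put(digest, record + extra)   # steps 3-4

    def defrag(self):
        # Step 6 (simplified): free flushed blocks that nothing points to.
        freed = [b for b in self.device if self.live_bytes.get(b, 0) == 0]
        for b in freed:
            del self.device[b]
            del self.live_bytes[b]
        return freed


store = ToyStore(block_size=12)        # tiny block so a flush happens quickly
store.put("digest-1", b"hello")
store.append("digest-1", b" world")    # new version lands in a new write-block
print(store.read("digest-1"), store.index["digest-1"], store.defrag())
```

The point to notice is that append never touches the old copy: the index entry is simply repointed, and the old write-block only becomes reclaimable later.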