The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.
Details
The `write-block-size` configuration defines the size, in bytes, of each I/O block that is written to the disk. You can increase or decrease this depending on your record size. The default value is 1MB, and the configured value of this parameter must be a power of 2, so the different options are: 128K, 256K, 512K, 1M (and 2M, 4M or 8M as of version 4.2). To identify the optimal setting, we would recommend running a benchmark tool (ACT) or contacting Aerospike Support for guidance (Enterprise Licensees).
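As a minimal sketch, the parameter sits inside a namespace's `storage-engine device` stanza; the namespace name and device path below are placeholders, not values from this article:

```
namespace test {
    replication-factor 2

    storage-engine device {
        device /dev/sdb           # placeholder device path
        write-block-size 128K     # must be a power of 2: 128K, 256K, 512K, 1M ...
    }
}
```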
What are the recommended values for `write-block-size`?
Empirical testing results have shown that, in general, a `write-block-size` of 128K for flash devices and 1MB for hard disk drives (the default value) provides the best performance, but this can vary based on the device brand, size, and workload.
Are read transactions impacted by the `write-block-size`?
Read transactions are not directly impacted by the `write-block-size`. Records are stored in 16-byte increments (RBLOCK) as of version 4.2, and 128-byte increments prior to that. The I/O size on disk depends on the disk itself, and Aerospike will detect the smallest possible read I/O size. Having said that, the `write-block-size` will have a general impact on performance. This will depend on the nature of the workload, and it is good practice to benchmark with different values.
Can I set the `write-block-size` lower than 128KB?
A `write-block-size` of 128K is typical for flash storage, but the optimal value depends on the flash device used. A smaller `write-block-size` causes more hits to the SSD, which in turn creates more I/O operations and increases defragmentation activity.
I have records larger than 1MB - what are my options?
For server versions prior to 4.2, the only option is to split the records and handle the merge on the client side. For server versions 4.2 and above, it is possible to increase the `write-block-size`. This could adversely impact the overall performance of the system, though: the defragmentation of larger blocks involves longer large-block reads, where the entire block is read, injecting latency into other operations. A benchmark tool such as ACT should be used to quantify the impact of larger blocks on latency.
With the default 64 MB of `max-write-cache` and an 8 MB write block size, you could hit "queue too deep" faster:
WARNING (drv_ssd): (drv_ssd.c:3691) {test} write fail: queue too deep: exceeds max 8
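The warning follows from simple arithmetic: 64 MB of cache divided by 8 MB blocks allows only 8 blocks in flight. If larger blocks are configured, the write cache may need to grow to keep a comparable queue depth. A hypothetical fragment; the device path and the 256M figure are illustrative values, not recommendations:

```
storage-engine device {
    device /dev/sdb              # placeholder device path
    write-block-size 8M          # larger blocks, server 4.2 and above
    max-write-cache 256M         # example: 256M / 8M = 32 blocks in flight
}
```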
How do I change the `write-block-size` configuration?
To update the `write-block-size` setting:
- Open `/etc/aerospike/aerospike.conf`.
- Configure the namespace's `write-block-size` to the new desired size. Note that this configuration is placed inside the `storage-engine device` stanza. This is not applicable for `storage-engine memory`.
- Restart the server: `/etc/init.d/aerospike restart`.
- Continue with the other nodes in the cluster.
Can I change the `write-block-size` configuration in a rolling manner on a cluster?
The `write-block-size` configuration is static but can be changed in a rolling manner across all nodes in a cluster. Here are a few points to be attentive to, though:
- When increasing the `write-block-size`, records of the new "increased" size should not be written until all nodes in the cluster have been re-configured with the increased `write-block-size`.
- If decreasing the `write-block-size`, it is necessary to first delete all the records that are bigger than the new, lower `write-block-size` and also zeroize the disks, which requires waiting for migrations to complete between each node.
- If running the older cluster protocol (versions 3.13 and older), it may also be necessary to wait for migrations to complete between each node (depending on the nature of the write workload).
Can I increase and decrease the value with only rolling service-restarts on a cluster?
Once the configuration is increased, it cannot be decreased without zeroizing the disks.
What happens when I write records bigger than the configured `write-block-size`?
This configuration caps the maximum size of a record that can be written on the cluster. Any write of a record bigger than the `write-block-size` will fail and return an error to the client.
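As a rough illustration of the limit (this ignores per-record storage overhead, so it is only approximate, and the helper name is ours, not an Aerospike tool), a record must fit within a single write block:

```shell
# fits_block: succeeds when a record of record_bytes fits in one write block.
# Approximation only: the server also accounts for per-record overhead.
fits_block() {
  local record_bytes=$1 block_bytes=$2
  [ "$record_bytes" -le "$block_bytes" ]
}

# The 9437246-byte write from the log example below vs. a 1M write-block-size:
if fits_block 9437246 $((1024 * 1024)); then
  echo "accepted"
else
  echo "record too big"
fi
```

With the default 1M block, the ~9 MB write from the server log example is rejected; with an 8M block (server 4.2 and above) the same write would fit.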
Server side logs / stats
- Versions prior to 3.16:
Jan 22 2017 16:39:55 GMT: WARNING (rw): (thr_rw.c:write_local_ssd:4658) {namespace1} write_local: failed as_storage_record_write() <Digest>:0x88bb698a04a26517e5528hje57ed188e12ab29f4f
Jan 22 2017 16:39:59 GMT: WARNING (drv_ssd): (drv_ssd.c:1568) write: rejecting 1765a2048a69bb88 write size: 131328
- Version 3.16 and above:
Aug 09 2019 00:06:21 GMT: DETAIL (drv_ssd): (drv_ssd.c:1516) write: size 9437246 - rejecting <Digest>:0xd751c6d7eea87c82b3d6332467e8bc9a3c630e13
Aug 09 2019 00:06:21 GMT: WARNING (rw): (write.c:1265) {bar} write_master: failed as_storage_record_write() <Digest>:0xd751c6d7eea87c82b3d6332467e8bc9a3c630e13
Aug 09 2019 00:06:21 GMT: DETAIL (rw): (write.c:822) {bar} write_master: record too big <Digest>:0xd751c6d7eea87c82b3d6332467e8bc9a3c630e13
Aug 09 2019 00:06:22 GMT: INFO (info): (ticker.c:884) {bar} special-errors: key-busy 0 record-too-big 217
- The DETAIL lines will only appear if the appropriate log contexts (`rw` and `drv_ssd`) are set to `detail`.
- Unlike the stat, the special-errors log ticker and the `rw` context line, the `drv_ssd` context line occurs on all oversized attempts, including replica writes, immigrations, and applying duplicate resolution winners.
- The `fail_record_too_big` statistic will be incremented on each occurrence.
Error seen on the client
AS_PROTO_RESULT_FAIL_RECORD_TOO_BIG - Error code 13
Can I determine what set is being written to, when these server log messages come up?
Refer to the knowledge base article, How to return the set name of a record using its digest.
Important considerations
A few considerations on the `write-block-size` parameter:
- This configuration is on a per-namespace basis and is only configurable if the storage-engine is `device` for the namespace.
- The value of this parameter must be a power of 2, so your options are: 128K, 256K, 512K, 1M, etc.
- Performance characteristics of your cluster may change, so careful monitoring is necessary.
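The power-of-2 requirement can be checked with a quick shell sketch (the helper name is ours, not an Aerospike tool):

```shell
# is_pow2: succeeds when the given byte count is a positive power of two.
# A power of two has exactly one bit set, so n & (n - 1) is zero.
is_pow2() {
  local n=$1
  [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

is_pow2 $((128 * 1024)) && echo "128K is valid"      # 131072 bytes
is_pow2 $((300 * 1024)) || echo "300K is not valid"  # 307200 bytes, rejected
```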
References
Link to the configuration reference:
To identify the optimal configuration for your setup, we recommend testing your SSDs with our certification tool (ACT).
Server log reference for "write_master: failed as_storage_record_write".
Server log reference for "write_master: record too big".
Keywords
WRITE-BLOCK-SIZE WRITE BLOCK SIZE
Timestamp
August 2019