FAQ - Write Block Size

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

Details

The configuration write-block-size defines the size in bytes of each I/O block that is written to the disk. You can increase or decrease this depending on your record size. The default value is 1MB, and the configured value of this parameter must be a power of 2, so the different options are: 128K, 256K, 512K, or 1M (and 2M, 4M, or 8M as of version 4.2). To identify the optimal setting, we recommend running a benchmark tool such as ACT, or contacting Aerospike Support for guidance (Enterprise Licensees).
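For reference, the parameter is set per namespace, inside the storage-engine device stanza. A minimal sketch (the namespace name, device path, and chosen value below are placeholders, not recommendations):

```
namespace test {
    replication-factor 2
    memory-size 4G

    storage-engine device {
        device /dev/sdb          # placeholder raw device
        write-block-size 128K    # must be a power of 2
    }
}
```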

What are the recommended values for write-block-size?

Empirical testing results have shown that, in general, a write-block-size of 128K for Flash devices and 1MB (the default value) for Hard Disk Drives provides the best performance, but this can vary based on the device brand, size, and workload.

Are read transactions impacted by the write-block-size?

Read transactions are not directly impacted by the write-block-size. Records are stored in 16-byte increments (RBLOCK) as of version 4.2, and in 128-byte increments prior to that. The I/O size on disk depends on the disk itself, and Aerospike will detect the smallest possible read I/O size. Having said that, the write-block-size will have a general impact on performance, depending on the nature of the workload, so it is good practice to benchmark with different values.
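As a sketch of the RBLOCK granularity mentioned above, a record's on-device footprint rounds up to the next 16-byte boundary on version 4.2 and above (128 bytes prior); the record size used here is purely illustrative:

```shell
RBLOCK=16      # bytes per RBLOCK as of version 4.2 (128 prior)
RECORD=1000    # illustrative record size in bytes
# Round the record size up to the next RBLOCK boundary
echo $(( (RECORD + RBLOCK - 1) / RBLOCK * RBLOCK ))   # → 1008
```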

Can I set the write-block-size lower than 128KB?

A write-block-size of 128K is typical for Flash storage, but the optimal value depends on the Flash device used. Having a smaller write-block-size causes more hits to the SSD, which in turn creates more I/O operations and increases the defragmentation process.

I have records larger than 1MB - what are my options?

For server versions prior to 4.2, the only option is to consider splitting the records and handling the merge on the client side. For server versions 4.2 and above, it is possible to increase the write-block-size. This could adversely impact the overall performance of the system, though. The defragmentation of larger blocks involves longer large-block reads, where the entire block is read, injecting latency into other operations. A benchmark tool, such as ACT, should be used to quantify the impact of larger blocks on latency.

With the default 64 MB max-write-cache and an 8 MB write-block-size, you could hit ‘queue too deep’ faster:

 WARNING (drv_ssd): (drv_ssd.c:3691) {test} write fail: queue too deep: exceeds max 8
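The ‘max 8’ in the warning above is consistent with the ratio of max-write-cache to write-block-size, i.e. the number of write buffers the cache can hold:

```shell
MAX_WRITE_CACHE=$(( 64 * 1024 * 1024 ))   # default max-write-cache: 64 MB
WRITE_BLOCK_SIZE=$(( 8 * 1024 * 1024 ))   # write-block-size: 8 MB
echo $(( MAX_WRITE_CACHE / WRITE_BLOCK_SIZE ))   # → 8 buffers
```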

How do I change the write-block-size configuration?

To update the write-block-size setting:

  1. Open /etc/aerospike/aerospike.conf.

  2. Configure the namespace's write-block-size to the new desired size. Note that this configuration is placed inside the storage-engine device stanza. It is not applicable for storage-engine memory.

  3. Restart the server: /etc/init.d/aerospike restart.

  4. Continue with other nodes in the cluster.
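The steps above can be sketched as follows (assuming a local node and default tooling; the namespace name `test` is a placeholder):

```
# 1. Edit the config (write-block-size goes inside storage-engine device)
sudo vi /etc/aerospike/aerospike.conf

# 2. Restart the node
sudo /etc/init.d/aerospike restart

# 3. Verify the running value before moving on to the next node
asinfo -v "get-config:context=namespace;id=test" | tr ';' '\n' | grep write-block-size
```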

Can I change the write-block-size configuration in a rolling manner on a cluster?

The write-block-size configuration is static but can be changed in a rolling manner across all nodes in a cluster. Here are a few points to be attentive to, though:

  • When increasing the write-block-size, records with the new ‘increased’ size should not be written until all nodes in the cluster have been re-configured with the increased write-block-size.
  • If decreasing the write-block-size, it is necessary to first delete all records that are bigger than the new, lower write-block-size and also zeroize the disks, which requires waiting for migrations to complete between each node.
  • If running the older cluster protocol (versions 3.13 and older), it may also be necessary to wait for migrations to complete between each node (depending on the nature of the write workload).

Can I increase and decrease the value with only rolling service-restarts on a cluster?

Once the configuration is increased, it cannot be decreased without zeroizing the disks.
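Zeroizing (wiping) a device is destructive and should only be done on a stopped node whose data is safely replicated elsewhere in the cluster. As a sketch (the device path is a placeholder):

```
# Stop the node first; the next command destroys all data on the device!
sudo /etc/init.d/aerospike stop
sudo blkdiscard /dev/sdb
# Alternatively, overwrite at least the device header with zeroes:
# sudo dd if=/dev/zero of=/dev/sdb bs=1M count=8
```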

What happens when I write records bigger than the configured write-block-size?

This configuration sets the upper limit on the size of a record that can be written to the cluster. Any record larger than the write-block-size will trigger an error to the client and a write failure.

Server side logs / stats

  • Versions prior to 3.16:
Jan 22 2017 16:39:55 GMT: WARNING (rw): (thr_rw.c:write_local_ssd:4658) {namespace1} write_local: failed as_storage_record_write() <Digest>:0x88bb698a04a26517e5528hje57ed188e12ab29f4f
Jan 22 2017 16:39:59 GMT: WARNING (drv_ssd): (drv_ssd.c::1568) write: rejecting 1765a2048a69bb88 write size: 131328
  • Version 3.16 and above:
Aug 09 2019 00:06:21 GMT: DETAIL (drv_ssd): (drv_ssd.c:1516) write: size 9437246 - rejecting <Digest>:0xd751c6d7eea87c82b3d6332467e8bc9a3c630e13
Aug 09 2019 00:06:21 GMT: WARNING (rw): (write.c:1265) {bar} write_master: failed as_storage_record_write() <Digest>:0xd751c6d7eea87c82b3d6332467e8bc9a3c630e13
Aug 09 2019 00:06:21 GMT: DETAIL (rw): (write.c:822) {bar} write_master: record too big <Digest>:0xd751c6d7eea87c82b3d6332467e8bc9a3c630e13
Aug 09 2019 00:06:22 GMT: INFO (info): (ticker.c:884) {bar} special-errors: key-busy 0 record-too-big 217
  • The DETAIL lines will only appear if the appropriate log contexts (rw and drv_ssd) are set to detail.

  • Unlike the statistic, the special-errors log ticker, and the ‘rw’ context lines, the ‘drv_ssd’ context line occurs on all oversized attempts, including replica writes, immigrations, and applying duplicate resolution winners.

  • The fail_record_too_big statistic will be incremented on each occurrence.

Error seen on the client

AS_PROTO_RESULT_FAIL_RECORD_TOO_BIG - Error code 13

Can I determine what set is being written to, when these server log messages come up?

Refer to the knowledge base article, How to return the set name of a record using its digest.

Important considerations

A few considerations on the write-block-size parameter:

  1. This configuration is on a per-namespace basis and is only configurable if the storage-engine is device for the namespace.
  2. The value of this parameter must be a power of 2, so your options to decrease are: 128K, 256K, 512K, 1M, etc.
  3. Performance characteristics of your cluster may change, so careful monitoring is necessary.

References

Link to the configuration reference:

To identify the optimal configuration for your setup, we recommend testing your SSDs with our certification tool (ACT).

Server log reference for “write_master: failed as_storage_record_write”.

Server log reference for “write_master: record too big”.

Keywords

WRITE-BLOCK-SIZE WRITE BLOCK SIZE

Timestamp

August 2019

Hi,

I tried putting write-block-size 1M to my config file like this:

namespace test {
    write-block-size 1M
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.

    storage-engine memory
}

But aerospike failed to recognize this config:

May 28 2015 17:51:47 GMT: CRITICAL (config): (cfg.c:1189) line 49 :: unknown config parameter name 'write-block-size'
May 28 2015 17:51:47 GMT: WARNING (as): (signal.c::134) SIGINT received, shutting down
May 28 2015 17:51:47 GMT: WARNING (as): (signal.c::137) startup was not complete, exiting immediately

I also tried putting it inside the storage-engine section as in Recipe for an SSD storage engine, but it still doesn’t work:

namespace test {
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.

    storage-engine memory {
        write-block-size 1M
    }
}

Could you please help me on this? I don’t know what I’ve done wrong here.

Ok, looks like write-block-size is only applicable to the disk storage engine, as I’ve read in the configuration reference.

That is correct, write-block-size only applies to storage-engine device.
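For completeness, a sketch of where the setting belongs when using device storage (the device path is a placeholder; the other settings are kept from the original example):

```
namespace test {
    replication-factor 2
    memory-size 4G
    default-ttl 30d

    storage-engine device {
        device /dev/sdb          # placeholder raw device
        write-block-size 1M
    }
}
```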

Hello @kporter, how can I set a record size limit for data stored in RAM?

RAM has no record size limit. The record size limit only applies if your namespace is configured with a persistent storage device. @Jacky

Thanks so much.

What’s the difference between filesize and write-block-size? Both of them seem kind of similar to me.

One of the possible storage engine options for a namespace is in-memory with persistence. The persistence layer can be defined using one or more raw devices, in which case the write-block-size applies. Alternatively, the persistence layer can be defined using one or more files on the filesystem, in which case the filesize configuration applies.

The write-block-size applies to both, normally it is left to 1MiB when using file.
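A sketch of a file-backed persistence layer where both settings appear (the file path and sizes are placeholders):

```
namespace test {
    replication-factor 2
    memory-size 4G

    storage-engine device {
        file /opt/aerospike/data/test.dat   # placeholder file path
        filesize 16G                        # maximum size of the file
        write-block-size 1M                 # typically left at 1MiB for file storage
        data-in-memory true
    }
}
```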

RAM has a 128MB record size limit due to other gating factors. It is recommended not to write records greater than 8MB even in RAM, should you ever want to transition the namespace from in-memory only to device storage. [Updating because this KB is referred to in the Server Log Reference.]
