Write Latency Increase After Upgrade to Aerospike 4.4 or higher


Problem Description

A cluster running an Aerospike version prior to 4.4 is upgraded to 4.4 or higher. After the upgrade, write latency, as measured by the server histograms, appears to have increased drastically. Performance from the application's perspective does not appear to have changed, though the logs may report an increase in server transaction timeouts.

Solution

Write latency has not increased in comparison to prior versions; rather, the way in which latency is reported has changed to make it more accurate and more useful. When a large write is sent to the Aerospike server, the write may be split across multiple packets. In earlier versions, latency was calculated from the time the last packet was read. From Aerospike 4.4 onwards, latency is calculated from the time the first packet arrived, so the reported figure includes the time spent receiving the whole request.
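The difference between the two measurements can be illustrated with a minimal sketch. This is not Aerospike code: the `WriteTimeline` class, its field names, the two functions, and the timestamps are hypothetical, chosen only to show how moving the start of the clock changes the reported number for the same write.

```python
from dataclasses import dataclass


@dataclass
class WriteTimeline:
    """Hypothetical timestamps (in ms) for one multi-packet write."""
    first_packet_ms: float  # first packet of the request arrives
    last_packet_ms: float   # last packet of the request has been read
    completed_ms: float     # server finishes processing the write


def latency_pre_4_4(t: WriteTimeline) -> float:
    # Assumed model of the older behaviour: the clock starts
    # once the last packet has been read.
    return t.completed_ms - t.last_packet_ms


def latency_4_4(t: WriteTimeline) -> float:
    # Assumed model of the 4.4+ behaviour: the clock starts
    # when the first packet arrives.
    return t.completed_ms - t.first_packet_ms


# Illustrative numbers only: the request takes 6 ms to arrive in full
# and the write itself takes a further 2 ms.
t = WriteTimeline(first_packet_ms=0.0, last_packet_ms=6.0, completed_ms=8.0)

print(latency_pre_4_4(t))  # 2.0
print(latency_4_4(t))      # 8.0
```

In this sketch the write takes the same wall-clock time in both cases; only the starting point of the measurement differs, which is why the application sees no change while the histograms do.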

Notes

  • Because the time taken to demarshal network packets is now accounted for more accurately, the server may time out transactions. If the desired behaviour is for such transactions to succeed, a short-term resolution is to increase the transaction timeout, either on the server or on a per-transaction basis from the client.

  • Using histograms to measure latency is covered in detail here

  • Histograms affected:

    • batch-index
    • {ns}-batch-sub-start
    • {ns}-write
    • {ns}-write-start
    • svc-demarshal

Keywords

UPGRADE WRITE LATENCY HISTOGRAM

Timestamp

01/16/18