Understanding Client Write Errors


Abstract

One of the important metrics to monitor is the number of errors returned during client writes (client_write_error). Several different conditions can cause these errors, so it is important to watch for warnings in the server logs and to catch the right exceptions on the client in order to handle them gracefully.

Monitoring Client Write Errors

To monitor these errors, use your preferred monitoring plug-in, or use the asadm tool as follows to check on a per-namespace basis:

asadm -e 'watch show stat like client_write_error'
[ 2017-07-18 14:22:46 'show stat like client_write_error' sleep: 2.0s iteration: 1 ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~nsSSD Namespace Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE              :   192.168.100.242:3000   v24-vm1.localnet:3000   v24-vm7.localnet:3000   
client_write_error:   200                    2532                    864

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE              :   192.168.100.242:3000   v24-vm1.localnet:3000   v24-vm7.localnet:3000   
client_write_error:   0                      0                       0                       

This is a cumulative statistic, so compare the counts over time to see whether errors are still occurring and at what rate. Note that it does not include transactions that timed out on the server.
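Because the counter is cumulative, the useful signal is the change between two samples rather than the absolute value. The following is a minimal Python sketch of that calculation, assuming the two samples have already been collected as dictionaries keyed by node. The helper name error_rates and the values in the second sample are hypothetical, and treating a decreasing counter as a node restart is an assumption.

```python
def error_rates(previous, current, interval_seconds):
    """Return the per-node increase in client_write_error per second.

    Assumption: a counter that went down means the node restarted and
    its statistics started again from zero, so the current value is
    used as the increase for that node.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for node, count in current.items():
        before = previous.get(node, 0)
        increase = count - before if count >= before else count
        rates[node] = increase / interval_seconds
    return rates


# First sample uses two nodes from the asadm output above;
# the second sample is made up for illustration.
first = {"192.168.100.242:3000": 200, "v24-vm1.localnet:3000": 2532}
second = {"192.168.100.242:3000": 200, "v24-vm1.localnet:3000": 2592}
print(error_rates(first, second, 60))
```

A node whose rate stays at zero across several intervals has only historical errors; a sustained non-zero rate is the case worth investigating.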

List of causes

If the above metric is increasing, one or more of the following may be the cause.

Some causes are accompanied by a change in an error-specific statistic:

  • If the record fails the generation check: fail_generation
  • If the key being written is a hot key: fail_key_busy
  • If the record size is bigger than write-block-size: fail_record_too_big
  • If the write comes from XDR when the allow-xdr-writes configuration is false, or from a non-XDR client when the allow-non-xdr-writes configuration is false: fail_xdr_forbidden
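One way to narrow down the cause is to check which of these statistics rose over the same interval as client_write_error. The sketch below assumes two snapshots of a namespace's statistics are available as dictionaries; the function name likely_causes is hypothetical and the descriptions simply paraphrase the list above.

```python
# Statistic names are the ones listed above; descriptions paraphrase that list.
CAUSE_BY_STAT = {
    "fail_generation": "generation check failed",
    "fail_key_busy": "hot key",
    "fail_record_too_big": "record larger than write-block-size",
    "fail_xdr_forbidden": "write rejected by allow-xdr-writes / allow-non-xdr-writes",
}


def likely_causes(previous, current):
    """Return (stat, description, increase) for each listed stat that rose."""
    found = []
    for stat, description in CAUSE_BY_STAT.items():
        # Values may arrive as strings from a stats dump, so convert first.
        increase = int(current.get(stat, 0)) - int(previous.get(stat, 0))
        if increase > 0:
            found.append((stat, description, increase))
    return found
```

If client_write_error rises while none of these counters move, the cause is more likely one of the conditions that only shows up in the server log or in the client's return code.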

Some causes are accompanied by server log warnings:

  • If the drives are full.
  • If an invalid TTL is specified.
  • If the storage is overloaded (“queue too deep” warnings appear in the log).
  • Policy-related problems with the incoming transaction, or problems writing to storage, which trigger log warnings prefaced by " {namespace} write_master:" along with explanatory text about the exact problem.
  • Attempting to create a record in a set whose name is too long.
  • Attempting to create a record in a set, but the message has a missing or mismatched set name.
  • Attempting a durable delete with the Community Edition server.
  • If there are issues accessing a stored key.
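To pull the relevant warnings out of a server log, a simple text filter on the two markers mentioned above is often enough. This is a rough sketch, not a log parser: the function name write_warnings is hypothetical, it assumes the log is available as an iterable of lines, and it relies only on the "{namespace} write_master:" prefix and the "queue too deep" phrase.

```python
def write_warnings(log_lines, namespace):
    """Yield log lines that look like write-path warnings.

    Matches the "{namespace} write_master:" prefix for the given
    namespace, plus any "queue too deep" line (which is not filtered
    by namespace here).
    """
    markers = ("{%s} write_master:" % namespace, "queue too deep")
    for line in log_lines:
        if any(marker in line for marker in markers):
            yield line.rstrip("\n")
```

It can be applied to an open file object, for example `for line in write_warnings(open(log_path), "nsSSD"): print(line)`, where log_path is wherever your server writes its log.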

Some other causes are not accompanied by server log warnings:

  • If the namespace is under stop-writes.
  • During an update or replace operation where the record is either not found, or found but expired or truncated.
  • If create-only is specified, but record already exists.
  • When creating a record in a set, but stop writes are in effect on the set.
  • If creating a record that would be immediately truncated.
  • Attempting to write in a set currently being deleted (for server versions earlier than 3.14).
  • If a touch operation is performed on a record that doesn’t exist.

In all of these cases, a corresponding return code is sent back to the client.

Example: the list of result codes for the Java client can be found in ResultCode.java on GitHub.
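On the client side, the return code can drive how the application reacts. The sketch below is written in Python for brevity; the numeric values mirror constants in the Java client's ResultCode.java, while the grouping into retryable and non-retryable codes and the function name classify are assumptions to adapt to your own application.

```python
# Numeric values mirror constants in the Java client's ResultCode.java.
KEY_NOT_FOUND_ERROR = 2   # update/replace/touch on a missing record
GENERATION_ERROR = 3      # generation check failed
KEY_EXISTS_ERROR = 5      # create-only, but the record already exists
RECORD_TOO_BIG = 13       # record larger than write-block-size
KEY_BUSY = 14             # hot key
DEVICE_OVERLOAD = 18      # storage overloaded ("queue too deep")

# Assumption: transient conditions where a delayed retry may succeed.
RETRYABLE = {KEY_BUSY, DEVICE_OVERLOAD}
# Assumption: resending the identical request is unlikely to help;
# the application has to change the request or handle the failure.
NOT_RETRYABLE = {
    KEY_NOT_FOUND_ERROR,
    GENERATION_ERROR,
    KEY_EXISTS_ERROR,
    RECORD_TOO_BIG,
}


def classify(result_code):
    """Map a write result code to a coarse handling strategy."""
    if result_code in RETRYABLE:
        return "retry-with-backoff"
    if result_code in NOT_RETRYABLE:
        return "fail-fast"
    return "unknown"
```

In the Java client the code would typically come from catching AerospikeException and calling getResultCode() on it. Codes that fall into "unknown" here are best logged with their numeric value so they can be looked up in ResultCode.java.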

Keywords

write error client_write_error

Timestamp

07/18/2017