Decompression fails when shipping to older XDR destinations

Problem Description

When two clusters are connected by XDR, with a source cluster running Aerospike 4.8.0.1 or later and a destination cluster running Aerospike 4.3.1.x or earlier, the following warning can appear in aerospike.log:

Aug 10 2020 11:46:54 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)

This can also lead to a node crash.

Explanation

Aerospike 4.8.0.1 introduced support for client/server compression as part of AER-6136. When Aerospike sends a compressed message, it includes a field that gives the uncompressed size of the record. In Aerospike 4.8.0.1 and later, this field is in network byte order (big endian).

Older Aerospike servers expect this record size field to be in little endian.

As a result, when a newer source cluster ships compressed records to an older destination cluster, the destination misreads the uncompressed record size as an implausibly large value and rejects the record.
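The byte-order mismatch can be illustrated with a minimal sketch. It is not taken from the server code: it assumes the size field is an 8-byte unsigned integer, and it uses the bytes `00 00 00 00 00 00 02 66`, which appear in the sample hex dump in this article and are assumed here to be that field.

```python
import struct

# Assumption: the uncompressed-size field is an 8-byte unsigned integer.
# These bytes are assumed to be that field, as written by a newer sender.
size_field = bytes.fromhex("0000000000000266")

# Newer servers read the field in network byte order (big endian).
as_big_endian = struct.unpack(">Q", size_field)[0]

# Older servers expect little endian, so they read the same bytes reversed.
as_little_endian = struct.unpack("<Q", size_field)[0]

print(as_big_endian)          # 614
print(hex(as_little_endian))  # 0x6602000000000000
```

Under these assumptions a 614-byte record is read as a size of several exabytes, which is why the older destination treats the message as invalid.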

Destination nodes can also crash. The function that logs a hex dump of the problematic message (cf_warning_binary) can overflow the 64 KB buffer allocated for that purpose.

If this buffer overflows, the server goes down with a SIGSEGV. When this issue occurs, the log shows messages such as the following:

Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)
Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:720) compressed proto_p<HexSpaced>:02 04 cf 01 00 00 00 00 00 00 00 00 00 00 02 66 78 5e 55 91 3f 68 <...> 08 3a ca 9a 9e bc d9 48 fd 01 86 bc aa 6f 
Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)
Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)
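To gauge how often a destination node is hitting this warning, the log can be scanned for the message text. The following is a minimal sketch, not an Aerospike tool: the function name is hypothetical, the pattern is copied from the warning shown above, and the sample lines are shortened copies of the log excerpt in this article.

```python
import re

# Pattern copied from the warning text shown in this article.
FAILURE = re.compile(r"as_proto decompression failed! \(rv (-?\d+)\)")


def count_decompression_failures(lines):
    """Return a dict mapping each rv code to the number of times it was logged."""
    counts = {}
    for line in lines:
        match = FAILURE.search(line)
        if match:
            rv = int(match.group(1))
            counts[rv] = counts.get(rv, 0) + 1
    return counts


# Shortened copies of the log lines above (the hex dump is truncated here).
sample = [
    "Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)",
    "Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:720) compressed proto_p<HexSpaced>:02 04 cf 01",
    "Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)",
]

print(count_decompression_failures(sample))  # {-3: 2}
```

For a real log, an open file handle can be passed in place of `sample`; the location of aerospike.log depends on the logging configuration of the node.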

Solution

This only happens when shipping to an older XDR destination running a server version that is no longer supported. The most robust solution is to ensure that all XDR clusters in the topology run a currently supported server version.

A short-term solution is to switch off XDR compression. In versions prior to Aerospike 5.0, this is done by setting xdr-compression-threshold to 0, using the following command:

asinfo -v 'set-config:context=xdr;xdr-compression-threshold=0'

If the source is Aerospike 5.x or later, the parameter to modify is enable-compression, which is set per DC and namespace:

asinfo -v "set-config:context=xdr;dc=DC1;namespace=namespaceName;enable-compression=false"
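When the workaround has to be applied across sources on different versions, the choice between the two commands can be scripted. The sketch below only builds the command strings shown above. The function name is hypothetical, and `DC1` and `namespaceName` are the placeholders used in this article; replace them with the real DC and namespace names.

```python
def disable_xdr_compression_command(source_major_version,
                                    dc="DC1",
                                    namespace="namespaceName"):
    """Return the asinfo command from this article for the given source version."""
    if source_major_version < 5:
        # Before Aerospike 5.0, compression is switched off with a single XDR-wide threshold.
        return "asinfo -v 'set-config:context=xdr;xdr-compression-threshold=0'"
    # From Aerospike 5.x, compression is switched off per DC and namespace.
    return (
        'asinfo -v "set-config:context=xdr;'
        f'dc={dc};namespace={namespace};enable-compression=false"'
    )


print(disable_xdr_compression_command(4))
print(disable_xdr_compression_command(5, dc="DC1", namespace="namespaceName"))
```

The function returns the command as text rather than running it, so the output can be reviewed before it is executed on a source node.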

Notes

  • Shipping from the older version to the newer version does not fail, because the newer server version can correctly parse incoming messages even when the size field has the incorrect endianness.
  • This issue is specific to XDR rather than other functions, because even older XDR versions can compress data as it is sent.

Applies To

Server 4.3.1 or later

Keywords

XDR COMPRESSION NODE CRASH DECOMPRESSION FAILED

Timestamp

October 2021
