When two clusters are connected by XDR, with a source cluster running Aerospike 188.8.131.52 and a destination running Aerospike 4.3.1.x or earlier, the following warning can be seen in the destination logs:
Aug 10 2020 11:46:54 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)
This may also lead to a node crash on the destination.
Aerospike 184.108.40.206 introduced support for client/server compression as part of AER-6136. When Aerospike sends compressed messages it includes a field which describes the uncompressed size of the record. In Aerospike versions 220.127.116.11 and later, this field is in correct network byte order (big endian).
Older Aerospike servers expect this record size field to be in little endian.
This means that when a newer source cluster ships compressed records to an older destination cluster, the size of the uncompressed record is interpreted incorrectly as unfeasibly large. This causes the rejection of the record at the destination.
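The byte-order mismatch can be illustrated with a minimal sketch. It assumes, purely for illustration, that the uncompressed-size field is an 8-byte unsigned integer; the function name misread_size is hypothetical and is not part of the Aerospike code.

```python
import struct


def misread_size(actual_size: int) -> int:
    """Return the value an older server would see for a given size.

    Assumption: the size field is an 8-byte unsigned integer. The newer
    source writes it in network byte order (big endian), and the older
    destination decodes the same bytes as little endian.
    """
    wire_bytes = struct.pack(">Q", actual_size)    # big endian, as sent
    (misread,) = struct.unpack("<Q", wire_bytes)   # little endian, as parsed
    return misread


if __name__ == "__main__":
    # A 1 KiB record is seen as 2**50 bytes by the older destination.
    print(misread_size(1024))
```

Under these assumptions a 1024-byte record is interpreted as 2**50 bytes, which is why the destination treats the message as invalid.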
Destination nodes can crash when the function that tries to log a hex dump of the problematic message (cf_warning_binary) overflows the 64 KB buffer allocated for this purpose. If this buffer overflows, the server goes down with a SIGSEGV. When this issue is happening, the log will show messages such as the following:
Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)
Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:720) compressed proto_p<HexSpaced>:02 04 cf 01 00 00 00 00 00 00 00 00 00 00 02 66 78 5e 55 91 3f 68 <...> 08 3a ca 9a 9e bc d9 48 fd 01 86 bc aa 6f
Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)
Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)
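To check whether a destination node is affected, its log can be scanned for the warning text. The following is a minimal sketch that assumes log lines follow the format shown above, with the timestamp preceding " GMT:". The function name failure_timestamps is hypothetical.

```python
MARKER = "as_proto decompression failed"


def failure_timestamps(log_lines):
    """Return the timestamp of every decompression-failure warning.

    Assumption: each line starts with a timestamp followed by " GMT:",
    as in the sample log lines. Lines without that separator yield an
    empty timestamp string rather than being dropped.
    """
    stamps = []
    for line in log_lines:
        if MARKER in line:
            stamp, sep, _rest = line.partition(" GMT:")
            stamps.append(stamp.strip() if sep else "")
    return stamps


if __name__ == "__main__":
    sample = [
        "Aug 10 2020 11:46:54 GMT: WARNING (demarshal): "
        "(thr_demarshal.c:719) as_proto decompression failed! (rv -3)",
    ]
    print(failure_timestamps(sample))
```

In practice the lines would be read from the server log file on each destination node; the path to that file depends on the logging configuration.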
This only happens when shipping to an older XDR destination running a server version that is no longer supported. The most robust solution is to ensure that all XDR clusters in the topology run a currently supported version.
A short-term solution is to switch off XDR compression. In versions prior to Aerospike 5.0 this is done by setting xdr-compression-threshold to 0. The following command switches off compression:
asinfo -v 'set-config:context=xdr;xdr-compression-threshold=0'
If the source is Aerospike 5.x or higher, the parameter to modify is enable-compression, which is set for each DC and namespace:
asinfo -v "set-config:context=xdr;dc=DC1;namespace=namespaceName;enable-compression=false"
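Because the 5.x setting applies per DC and namespace, the command has to be repeated for each namespace being shipped. The following is a minimal sketch that only builds the command strings in the form shown above; the function name and the example DC and namespace names are hypothetical.

```python
def disable_compression_commands(dc, namespaces):
    """Build one asinfo command per namespace for the given DC.

    The command format is copied from the example above; this function
    only returns strings and does not execute anything.
    """
    template = (
        'asinfo -v "set-config:context=xdr;dc={dc};'
        'namespace={ns};enable-compression=false"'
    )
    return [template.format(dc=dc, ns=ns) for ns in namespaces]


if __name__ == "__main__":
    # "DC1", "test" and "bar" are placeholder names.
    for command in disable_compression_commands("DC1", ["test", "bar"]):
        print(command)
```

Each generated command would then be run on every node of the source cluster.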
- Shipping from the older version to the newer version will not fail, as the newer server version can correctly parse the incoming messages even when the size field has the incorrect endianness.
- This issue is specific to XDR, as opposed to other functions, because even old XDR versions can compress data as it is sent.
Server 4.3.1 or later
XDR COMPRESSION NODE CRASH DECOMPRESSION FAILED