How to set read.consistency_level in the Python Aerospike client?

Hi, is there a way to set “read.consistency_level=all” in the Python client, either on the connection or per transaction?

Is there another setting to force “read.consistency_level=all” only while the cluster is replicating/synchronizing partitions?

– Kind Regards Marek Grzybowski

We are working on exposing those policies of the underlying C client in one of the next releases (depending on which branches get into master first). It’s close.

Ronen

Excellent, this feature seems to be very important. The current default Aerospike settings (write.commit_level=all, read.consistency_level=one) mean that when an SSD storage node comes back from downtime, it will return old values from SSD for all master records belonging to that node. If you have counters in Aerospike and you increment them frequently, every Aerospike node restart means data loss. AFAIK the only way to avoid this behavior is to set read.consistency_level=all globally, or to wipe the SSD disks every time a node goes down. Or maybe there is another way?

PS: We wrote some Nagios plugins to test our Aerospike cluster: https://github.com/RTBHOUSE/check_aerospike_put_get (gets and puts data to Aerospike to assess potential data loss after node failures) and https://github.com/RTBHOUSE/check_naglio_aerospike (parses output from Aerospike tools).


Hi Marek,

Thank you for your follow-up post. We are looking into your issue and will get back to you soon. Thank you for your patience.

Regards,

Maud

Hi Marek,

If a node is recovering from being partitioned, an automatic process will resolve write duplicates by assuming the copy with the most recent timestamp is the canonical one, unless you have set the config parameter write-duplicate-resolution-disable to true. If the client sends a write to the wrong node during cluster reconfiguration, because it is not yet in sync about the new owner of the record, the cluster will proxy the write to the correct node.

Both situations are rare and handled by the cluster.

See more here: http://www.aerospike.com/docs/architecture/data-distribution.html and http://www.aerospike.com/docs/reference/configuration/#write-duplicate-resolution-disable

Hi Ronen, here are the steps to reproduce my tests on a clean three-node cluster (I currently do not have spare SSD servers, so I used containers and a file backing store instead; the result is the same):

ii  aerospike-server-community            3.4.0-1                             The Aerospike distributed datastore allows fully scalable and reliable data storage with elastic server properties.
ii  aerospike-tools                       3.4.0                               Aerospike server tools.

aerospike.conf:

namespace test {
        replication-factor 2
        memory-size 2G
        default-ttl 30d # 30 days, use 0 to never expire/evict.

        storage-engine device {
                file /opt/aerospike/data/test.dat
                filesize 16G
                data-in-memory true # Store data in memory in addition to file.
        }
}
  • start a clean cluster with a namespace “test” that has persistent storage (file or SSD)

  • check the config (to make sure “write-duplicate-resolution-disable” is not enabled):

    asinfo -v ‘get-config:’ requested value get-config: value is transaction-queues=4;transaction-threads-per-queue=4;transaction-duplicate-threads=0;transaction-pending-limit=20;migrate-threads=1;migrate-xmit-priority=40;migrate-xmit-sleep=500;migrate-read-priority=10;migrate-read-sleep=500;migrate-xmit-hwm=10;migrate-xmit-lwm=5;migrate-max-num-incoming=256;migrate-rx-lifetime-ms=60000;proto-fd-max=15000;proto-fd-idle-ms=60000;transaction-retry-ms=1000;transaction-max-ms=1000;transaction-repeatable-read=false;dump-message-above-size=134217728;ticker-interval=10;microbenchmarks=false;storage-benchmarks=false;scan-priority=200;scan-sleep=1;batch-threads=4;batch-max-requests=5000;batch-priority=200;nsup-delete-sleep=0;nsup-period=120;nsup-startup-evict=true;paxos-retransmit-period=5;paxos-single-replica-limit=1;paxos-max-cluster-size=32;paxos-protocol=v3;paxos-recovery-policy=manual;write-duplicate-resolution-disable=false;respond-client-on-master-completion=false;replication-fire-and-forget=false;info-threads=16;allow-inline-transactions=true;use-queue-per-device=false;snub-nodes=false;fb-health-msg-per-burst=0;fb-health-msg-timeout=200;fb-health-good-pct=50;fb-health-bad-pct=0;auto-dun=false;auto-undun=false;prole-extra-ttl=0;max-msgs-per-type=-1;pidfile=/var/run/aerospike/asd.pid;memory-accounting=false;udf-runtime-gmax-memory=18446744073709551615;udf-runtime-max-memory=18446744073709551615;sindex-populator-scan-priority=3;sindex-data-max-memory=18446744073709551615;query-threads=6;query-worker-threads=15;query-priority=10;query-in-transaction-thread=0;query-req-in-query-thread=0;query-req-max-inflight=100;query-bufpool-size=256;query-batch-size=100;query-sleep=1;query-job-tracking=false;query-short-q-max-size=500;query-long-q-max-size=500;query-rec-count-bound=4294967295;query-threshold=10;query-untracked-time=1000000;service-address=0.0.0.0;service-port=3000;mesh-address=172.16.9.37;mesh-port=3002;reuse-address=true;fabric-port=3001;network-info-port=3003;enable-fastpath=true;heartbeat-mode=mesh;heartbeat-protocol=v2;heartbeat-address=172.16.9.37;heartbeat-port=3002;heartbeat-interval=150;heartbeat-timeout=10;enable-security=false;privilege-refresh-period=300;report-authentication-sinks=0;report-sys-admin-sinks=0;report-user-admin-sinks=0;report-violation-sinks=0;syslog-local=-1;xdr-delete-shipping-enabled=true;xdr-nsup-deletes-enabled=false;enable-xdr=false;stop-writes-noxdr=false;reads-hist-track-back=1800;reads-hist-track-slice=10;reads-hist-track-thresholds=1,8,64;writes_master-hist-track-back=1800;writes_master-hist-track-slice=10;writes_master-hist-track-thresholds=1,8,64;proxy-hist-track-back=1800;proxy-hist-track-slice=10;proxy-hist-track-thresholds=1,8,64;writes_reply-hist-track-back=1800;writes_reply-hist-track-slice=10;writes_reply-hist-track-thresholds=1,8,64;udf-hist-track-back=1800;udf-hist-track-slice=10;udf-hist-track-thresholds=1,8,64;query-hist-track-back=1800;query-hist-track-slice=10;query-hist-track-thresholds=1,8,64;query_rec_count-hist-track-back=1800;query_rec_count-hist-track-slice=10;query_rec_count-hist-track-thresholds=1,8,64

    asinfo -v ‘get-config:context=namespace;id=test’ requested value get-config:context=namespace;id=test value is ;memory-size=2147483648;high-water-disk-pct=50;high-water-memory-pct=60;evict-tenths-pct=5;stop-writes-pct=90;cold-start-evict-ttl=4294967295;repl-factor=2;default-ttl=2592000;max-ttl=0;conflict-resolution-policy=generation;allow_versions=false;single-bin=false;ldt-enabled=false;enable-xdr=falsesets-enable-xdr=trueforward-xdr-writes=false;disallow-null-setname=false;total-bytes-memory=2147483648;read-consistency-level-override=off;write-commit-level-override=off;total-bytes-disk=17179869184;defrag-lwm-pct=50;defrag-queue-min=0;defrag-sleep=1000;defrag-startup-minimum=10;flush-max-ms=1000;fsync-max-sec=0;write-smoothing-period=0;max-write-cache=67108864;min-avail-pct=5;post-write-queue=0;data-in-memory=true;file=/opt/aerospike/data/test.dat;filesize=17179869184;writethreads=1;writecache=67108864;obj-size-hist-max=100

  • put several key-value records into the cluster:

    git clone https://github.com/RTBHOUSE/check_aerospike_put_get
    ./check_aerospike_put_get/check_aerospike_put_get.py -i

  • increment the values (get, increment, put; a rough sketch of this step is included at the end of this post)

    ./check_aerospike_put_get/check_aerospike_put_get.py

  • take down a single cluster node (you can wait until the cluster finishes migrating all partitions, but it does not matter)

  • increment the values in a loop:

    for x in {1..30} ; do ./check_aerospike_put_get/check_aerospike_put_get.py ; done

  • start the node that was down while you were incrementing the values, then check the values:

    aql -c 'select * from test'
    +------------+
    | nagios-bin |
    +------------+
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 88         |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 88         |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    | 103        |
    +------------+
    30 rows in set (0.062 secs)

In my test, two of the values differ from the others.

There is a time window during which the Python client reads old values from the node that is coming back up. The more data is in the partitions, the longer the window during which the client reads old records. Writes/puts always go to the right place, and the record generation is the same for all values.
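For reference, the increment step the plugin performs (get, increment, put) looks roughly like this. This is only a minimal sketch: the host address, set and key names are placeholders, and the real check_aerospike_put_get.py may differ in detail. It shows why a stale read turns into a lost update: the put() simply overwrites whatever newer count the surviving replica held.

    import aerospike

    # Connect to the cluster (host/port are placeholders).
    client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

    # Hypothetical namespace/set/key, just for illustration.
    key = ('test', 'demo', 'nagios-key-1')

    # Read-modify-write of a counter. With the default read policy
    # (consistency level ONE), a node that just came back up can serve a
    # stale value here, and the following put() then overwrites the newer
    # count held by the other replica.
    (_, _, bins) = client.get(key)  # assumes the record was created earlier (-i step)
    counter = (bins or {}).get('nagios-bin', 0) + 1
    client.put(key, {'nagios-bin': counter})

    client.close()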

I’ll look to recreate the problem using your code. Thanks.

I will get back to reproducing your problem. Sorry about the delay.

First, though, I wanted to point out that the read consistency level and read replica policies can be controlled by passing a policy with the get() method, as seen in test/test_get.py.

  • The policy field ‘consistency’ can take on the values aerospike.POLICY_CONSISTENCY_ONE (default) or aerospike.POLICY_CONSISTENCY_ALL.
  • The ‘replica’ field can take on the values aerospike.POLICY_REPLICA_MASTER (default) or aerospike.POLICY_REPLICA_ANY.
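For example, a minimal sketch of a get() carrying these read policies (the host and key are placeholders, and the policy field names follow the description above, so they may differ between client versions):

    import aerospike

    client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

    key = ('test', 'demo', 'nagios-key-1')  # hypothetical key

    # Read at consistency level ALL instead of the default ONE,
    # and direct the read to the master partition.
    read_policy = {
        'timeout': 1000,
        'consistency': aerospike.POLICY_CONSISTENCY_ALL,
        'replica': aerospike.POLICY_REPLICA_MASTER,
    }
    (_, meta, bins) = client.get(key, read_policy)

    client.close()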

The write commit level policy can be controlled by passing a policy containing it to the put() method, as seen in test/test_put.py.

  • The policy field ‘commit_level’ can take on the values aerospike.POLICY_COMMIT_LEVEL_ALL (default) or aerospike.POLICY_COMMIT_LEVEL_MASTER.
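Similarly, a minimal sketch of a put() carrying the commit level policy (again, the key and bin are placeholders, and the field name follows the description above):

    import aerospike

    client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

    # Require the write to be applied on the master and all replicas before
    # the client is acknowledged (the stated default); use
    # aerospike.POLICY_COMMIT_LEVEL_MASTER to return after the master write only.
    write_policy = {'commit_level': aerospike.POLICY_COMMIT_LEVEL_ALL}
    client.put(('test', 'demo', 'nagios-key-1'), {'nagios-bin': 104}, policy=write_policy)

    client.close()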