High query latency

In the Aerospike set we have four bins (userId, adId, timestamp, eventype), and the primary key is userId:timestamp. A secondary index is created on userId to fetch all the records for a particular user, and the resulting records are passed to a stream UDF. On the client side, up to 500 QPS the Aerospike query latency is fine, but as soon as we increase the QPS above 500, the query latency shoots up (to around ~10 ms).
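For illustration, this is roughly what that query path looks like with the Aerospike Java client. It is only a minimal sketch: the namespace (test), set (events), Lua module (event_udf), function (filter_events), and UDF argument are placeholders, not our actual names.

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Value;
import com.aerospike.client.query.Filter;
import com.aerospike.client.query.ResultSet;
import com.aerospike.client.query.Statement;

public class UserEventsAggregate {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        try {
            // Secondary-index query on the userId bin.
            Statement stmt = new Statement();
            stmt.setNamespace("test");          // placeholder namespace
            stmt.setSetName("events");          // placeholder set name
            stmt.setFilter(Filter.equal("userId", "user-123"));

            // Apply the stream UDF to the query results on the server.
            // Module, function, and argument stand in for the attached Lua file.
            ResultSet rs = client.queryAggregate(null, stmt, "event_udf", "filter_events",
                    Value.get("click"));
            try {
                while (rs.next()) {
                    System.out.println(rs.getObject());
                }
            } finally {
                rs.close();
            }
        } finally {
            client.close();
        }
    }
}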

Below is the message that we see on our client side:

Attached below is the Lua file:

Hi Aditi. You're not providing much information about your config, the number of nodes, where the servers are deployed (a virtualized environment such as AWS or Vagrant, or physical servers), or really much else. All of those things will affect your performance and ability to scale. Please do share more of those details.

There are two things to note. Since your query involves a stream UDF, it will be significantly slower than a regular query. Also, there is a limit on how many Lua instances you can have on each node, which means that as you create more concurrent queries that use this stream UDF module, there is a relatively big overhead in creating and destroying Lua instances.

Hello @rbotzer. There are 2 nodes, and the server is hosted in AWS on a t2.large instance type.

Given below are the aerospike config parameters:

transaction-queues=8;transaction-threads-per-queue=8;transaction-duplicate-threads=0;transaction-pending-limit=20;migrate-threads=1;migrate-xmit-priority=40;migrate-xmit-sleep=500;migrate-read-priority=10;migrate-read-sleep=500;migrate-xmit-hwm=10;migrate-xmit-lwm=5;migrate-max-num-incoming=256;migrate-rx-lifetime-ms=60000;proto-fd-max=15000;proto-fd-idle-ms=60000;proto-slow-netio-sleep-ms=1;transaction-retry-ms=1000;transaction-max-ms=1000;transaction-repeatable-read=false;dump-message-above-size=134217728;ticker-interval=10;microbenchmarks=false;storage-benchmarks=false;ldt-benchmarks=false;scan-max-active=100;scan-max-done=100;scan-max-udf-transactions=32;scan-threads=4;batch-index-threads=4;batch-threads=4;batch-max-requests=5000;batch-max-buffers-per-queue=255;batch-max-unused-buffers=256;batch-priority=200;nsup-delete-sleep=100;nsup-period=120;nsup-startup-evict=true;paxos-retransmit-period=5;paxos-single-replica-limit=1;paxos-max-cluster-size=32;paxos-protocol=v3;paxos-recovery-policy=manual;write-duplicate-resolution-disable=false;respond-client-on-master-completion=false;replication-fire-and-forget=false;info-threads=16;allow-inline-transactions=true;use-queue-per-device=false;snub-nodes=false;fb-health-msg-per-burst=0;fb-health-msg-timeout=200;fb-health-good-pct=50;fb-health-bad-pct=0;auto-dun=false;auto-undun=false;prole-extra-ttl=0;max-msgs-per-type=-1;service-threads=40;fabric-workers=16;pidfile=/var/run/aerospike/asd.pid;memory-accounting=false;udf-runtime-gmax-memory=18446744073709551615;udf-runtime-max-memory=18446744073709551615;sindex-builder-threads=4;sindex-data-max-memory=18446744073709551615;query-threads=6;query-worker-threads=15;query-priority=10;query-in-transaction-thread=0;query-req-in-query-thread=0;query-req-max-inflight=100;query-bufpool-size=256;query-batch-size=100;query-priority-sleep-us=1;query-short-q-max-size=500;query-long-q-max-size=500;query-rec-count-bound=18446744073709551615;query-threshold=10;query-untracked-time-ms=1000;pre-reserve-qnodes=false;service-address=0.0.0.0;service-port=3000;mesh-address=10.0.1.80;mesh-port=3002;reuse-address=true;fabric-port=3001;fabric-keepalive-enabled=true;fabric-keepalive-time=1;fabric-keepalive-intvl=1;fabric-keepalive-probes=10;network-info-port=3003;enable-fastpath=true;heartbeat-mode=mesh;heartbeat-protocol=v2;heartbeat-address=10.0.1.80;heartbeat-port=3002;heartbeat-interval=150;heartbeat-timeout=10;enable-security=false;privilege-refresh-period=300;report-authentication-sinks=0;report-data-op-sinks=0;report-sys-admin-sinks=0;report-user-admin-sinks=0;report-violation-sinks=0;syslog-local=-1;enable-xdr=false;xdr-namedpipe-path=NULL;forward-xdr-writes=false;xdr-delete-shipping-enabled=true;xdr-nsup-deletes-enabled=false;stop-writes-noxdr=false;reads-hist-track-back=1800;reads-hist-track-slice=10;reads-hist-track-thresholds=1,8,64;writes_master-hist-track-back=1800;writes_master-hist-track-slice=10;writes_master-hist-track-thresholds=1,8,64;proxy-hist-track-back=1800;proxy-hist-track-slice=10;proxy-hist-track-thresholds=1,8,64;udf-hist-track-back=1800;udf-hist-track-slice=10;udf-hist-track-thresholds=1,8,64;query-hist-track-back=1800;query-hist-track-slice=10;query-hist-track-thresholds=1,8,64;query_rec_count-hist-track-back=1800;query_rec_count-hist-track-slice=10;query_rec_count-hist-track-thresholds=1,8,64

We even changed the following config parameters, but that further increased the latency:

query-batch-size=1000
query-short-q-max-size=100000
query-long-q-max-size=100000
query-threads=28
query-worker-threads=400
query-req-max-inflight=1000

Hi again. You’re mainly bumping against the limits of that specific instance. We do not recommend using the t2 family. Please review the recommendations section of our Amazon deployment guide. You should consider instances in the m3, c3, c4, r3, or i2 instance families.

A quick note on your configuration:

  • transaction-queues should be tuned to the number of cores. You have it set to 8, and a t2.large has 2 cores.
  • transaction-threads-per-queue is set too high for this instance type.
  • query-threads and query-worker-threads are both set too high for this instance (see the sketch after this list).
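For example, on a 2-vCPU instance a starting point closer to the server defaults, and to the query values already in your original dump, might look like the following (an illustrative sketch, not a tested recommendation):

transaction-queues=2
transaction-threads-per-queue=4
query-threads=6
query-worker-threads=15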

My point regarding stream UDFs still holds, and we can get into more detail on those, but you likely should first move to a more appropriate instance type for your workload.

Thanks for the reply… There was a little confusion; I framed the answer incorrectly. The Aerospike server is hosted in AWS on an r3.large instance type, and the Aerospike populator job (writing the records to Aerospike) is hosted in AWS on a t2.large instance type.

I guess the limited number of Lua instances is a problem for us, because below 500 QPS the query latency is low (mean latency is in microseconds), but as soon as the QPS increases above 500, the Aerospike mean latency shoots up to ~10 ms.

Should we stop using the stream UDF and move the query processing to the client side?

Well, the r3.large still has only 2 vCPUs, so my warning about the tuning still applies. If you look through the configuration reference you'll see what it suggests you tune based on your number of cores.

Either way, the overhead has to do with not having enough Lua instances to handle your requests, plus the cost of instantiating new ones as your traffic increases. UDFs are not a great fit for paths where you need low latency. They work very well for things that can run in the background (maintenance, reporting), and basically anywhere you don't have a large number of concurrent requests hitting them. For low-latency paths you would want to use the native key-value, scan, and query operations that don't involve a UDF; the scalability and speed would be significantly higher.
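As a sketch of the UDF-free alternative with the Java client (same placeholder namespace/set names as above, and assuming the stream UDF was doing a simple event-type filter), the filtering simply moves to the client:

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Record;
import com.aerospike.client.query.Filter;
import com.aerospike.client.query.RecordSet;
import com.aerospike.client.query.Statement;

public class UserEventsNative {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        try {
            // Plain secondary-index query on userId -- no Lua involved.
            Statement stmt = new Statement();
            stmt.setNamespace("test");           // placeholder namespace
            stmt.setSetName("events");           // placeholder set name
            stmt.setBinNames("userId", "adId", "timestamp", "eventype");
            stmt.setFilter(Filter.equal("userId", "user-123"));

            RecordSet rs = client.query(null, stmt);
            try {
                while (rs.next()) {
                    Record rec = rs.getRecord();
                    // The filtering the stream UDF used to do now happens here.
                    if ("click".equals(rec.getString("eventype"))) {
                        System.out.println(rec.bins);
                    }
                }
            } finally {
                rs.close();
            }
        } finally {
            client.close();
        }
    }
}

Nothing in this path touches Lua, so the overhead described above (creating and destroying Lua instances under concurrency) goes away; the filtering cost moves into the application instead.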