Aerospike batch requests performance tuning

Hello,

The batch read use case is the most important one for us while using Aerospike.

We have four sets from which we query data. Two are stored on device (pcs_device namespace) and two are stored in memory (pcs_memory namespace). Each set is queried with 3000 keys.

Currently, our DB queries for 3000 records take around 45 ms.

My questions are:

  • Is the database configuration set up properly?
  • Are the C# Aerospike client and read/write policies set up properly?
  • Is there something we are missing that could improve the performance of batch read operations?

Our setup:

We have 3 DB machines, each with:

  • Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-1021-aws x86_64)
  • 16-core 2nd-generation AMD EPYC 7002 @ 3.3 GHz
  • 32 GB RAM
  • 2 x 300 GB nvme

DB configuration file:

service {
  proto-fd-max 80000
}

logging {
  file /var/log/aerospike/aerospike.log {
    context any info
  }
}

network {
  service {
    address any
    port 3000
  }

  heartbeat {
    mode mesh
    mesh-seed-address-port ip_of_first_machine 3002
    mesh-seed-address-port ip_of_second_machine 3002
    mesh-seed-address-port ip_of_third_machine 3002
    port 3002
    interval 150
    timeout 10
  }

  fabric {
    port 3001
  }

  info {
    port 3003
  }
}

namespace pcs_device {
  replication-factor 2
  memory-size 18G
  default-ttl 0
  high-water-memory-pct 0
  high-water-disk-pct 0
  stop-writes-pct 90
  partition-tree-sprigs 16K
  index-stage-size 1G

  storage-engine device {
    device /dev/nvme1n1p1
    device /dev/nvme1n1p2
    device /dev/nvme1n1p3
    device /dev/nvme1n1p4
    device /dev/nvme2n1p1
    device /dev/nvme2n1p2
    device /dev/nvme2n1p3
    device /dev/nvme2n1p4

    write-block-size 1M
    defrag-lwm-pct 70
    data-in-memory false
  }
}

namespace pcs_memory {
  memory-size 10G

  storage-engine memory
}

C# client policy:

var clientPolicy = new AsyncClientPolicy
{
    asyncMaxCommandAction = MaxCommandAction.BLOCK,
    asyncMaxCommands = 1024,
    asyncMaxConnsPerNode = -1,
    maxConnsPerNode = 300,
    connPoolsPerNode = 10,
    asyncMaxCommandsInQueue = 1000,
};

C# client read policy setup:

var readPolicy = BatchPolicy.ReadDefault();
readPolicy.maxConcurrentThreads = 0;
readPolicy.maxRetries = 0;
readPolicy.allowInline = false;
readPolicy.SetTimeout(5000);
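For reference, a minimal sketch of how such policies plug into an AsyncClient (the host name and the way the client is constructed here are placeholders, not code from our application):

```csharp
// Sketch: wiring the policies above into an async client.
// "aerospike-host" is a placeholder for a seed node address.
using Aerospike.Client;

var client = new AsyncClient(clientPolicy, "aerospike-host", 3000);

// Make the batch read policy the default for batch calls:
client.batchPolicyDefault = readPolicy;
```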

Are you allocating huge buffers while pulling your batches? Check your stats. You have 3 servers with 2x300GB nvme, and your write-block-size is 1 MiB - that’s nearly 25 Gbps if we assume 3000 x 1 MiB/s. I would really suspect that you’re reaching an IO bottleneck at the SSD side (Aerospike) or network side (Aerospike or client). Are you seeing aqu-sz rise when hitting your cluster? Are you capped on anything? Can you tell us anything about your hardware or troubleshooting you’ve done? Have you reviewed FAQ - batch-index tuning parameters?
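To make that back-of-envelope explicit (figures come from the thread above; this is a worst-case bound, not a measurement):

```python
# Worst case: each of the 3000 batched records causes a full 1 MiB
# write-block read that crosses the wire, once per second.
records_per_batch = 3000
block_bytes = 1 * 1024 * 1024  # write-block-size = 1 MiB

bytes_per_sec = records_per_batch * block_bytes
gbps = bytes_per_sec * 8 / 1e9  # decimal gigabits per second

print(f"~{gbps:.1f} Gbps")  # ~25.2 Gbps
```

In practice records are smaller than a full write block, so real traffic should be well below this ceiling, but it shows why per-record size matters for batch reads.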


We have adjusted write-block-size to 1 MiB because some records do not fit into a 512 KiB block when writing data into the Aerospike DB.

These are the stats from one of our DB servers while the application is reading data from the DB:

iostat -x

Linux 5.15.0-1021-aws 	10/27/22 	_x86_64_	(16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.51    0.00    2.02    0.17    0.01   97.30

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1        268.15   1485.26     0.00   0.00    0.20     5.54    8.64   1037.79     0.00   0.00    0.76   120.11    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.06   2.23
nvme2n1        267.40   1495.58     0.00   0.00    0.20     5.59    8.72   1047.48     0.00   0.00    0.73   120.17    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.06   2.23

Regarding batch-index tuning

We haven’t noticed anything problematic in these statistics or logs, so we have left the default values for these configurations.

asadm -e "show stat like batch_index"

Node                               |first_db_machine:3000                                          |second_db_machine:3000                                         |third_db_machine:3000                                   
batch_index_complete               |963238                                                         |963262                                                         |963231                                                         
batch_index_created_buffers        |14658                                                          |13239                                                          |12292                                                          
batch_index_delay                  |2006621                                                        |2139021                                                        |3851369                                                        
batch_index_destroyed_buffers      |14583                                                          |13166                                                          |12215                                                          
batch_index_error                  |0                                                              |0                                                              |0                                                              
batch_index_huge_buffers           |14583                                                          |13166                                                          |12215                                                          
batch_index_initiate               |963238                                                         |963262                                                         |963231                                                         
batch_index_proto_compression_ratio|1.0                                                            |1.0                                                            |1.0                                                            
batch_index_proto_uncompressed_pct |0.0                                                            |0.0                                                            |0.0                                                            
batch_index_queue                  |0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0|0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0|0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0
batch_index_timeout                |0                                                              |0                                                              |0                                                              
batch_index_unused_buffers         |75                                                             |73                                                             |77

https://man7.org/linux/man-pages/man1/iostat.1.html

The first report generated by the iostat command provides statistics concerning the time since the system was booted, unless the -y option is used.

This means your iostat -x output doesn’t really show us much: it’s an average since boot time, and will completely hide any kind of spikes. sar -d -p might be helpful for looking back historically if you have sysstat collection enabled (and persistent enough issues, or a fast enough collection frequency to catch them at the otherwise 10-minute resolution). Running iostat -zxmty 1 and observing during peak loads would be much more helpful.

Have you done any troubleshooting? What about the network, on both the Aerospike cluster side and the client side? Many times I have had a dev escalate a ‘batch latency’ issue when it turned out they asked for more than their machine could handle - batch is dangerous that way. You need to compare your tested/promised speeds against what you’re seeing on the NIC of both the cluster nodes and the clients. Let me know if you need some pointers on doing that.

Have you looked at anything else? What about CPU on both client and server? Does the latency scale linearly with request size - do smaller batches finish faster? Does the latency ebb and flow with traffic? Have you tried tuning any of the Aerospike server settings, like batch index threads (assuming there is headroom)?

What kind of machines are these? Are they very small sliced instances in AWS? Those seem like some awfully small drives…

To answer your question more directly: I don’t think any of your settings are wrong - in fact, what you’ve done is definitely good - but use cases have to be tuned and bottlenecks have to be reviewed. So at this point it’s probably good to understand what makes the latency scale and whether you can find any easy bottlenecks. If you are sure your clients and servers have the headroom, it makes sense to try tuning up Aerospike settings like threads and buffers - maybe even look at adjusting kernel tunables like min-free-kb, among others.
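As a sketch of what that inspection and tuning could look like (parameter names and values here are illustrative and vary by server version - verify against the docs for your release before applying anything):

```shell
# Inspect the current batch-related settings on a node:
asinfo -v 'get-config:context=service' -l | grep batch

# Example of a dynamic change (value is illustrative, not a recommendation):
asinfo -v 'set-config:context=service;batch-max-unused-buffers=512'

# Kernel side: raise the min free memory watermark (value in KiB, illustrative):
sudo sysctl -w vm.min_free_kbytes=1153434
```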


This is the result of iostat -zxmty 1 during peak loads:

11/04/22 12:44:31
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.91    0.00    6.29    2.35    0.00   89.45

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       9104.00     15.53     0.00   0.00    0.13     1.75    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.22  22.80
nvme2n1       9080.00     15.43     0.00   0.00    0.13     1.74    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.21  22.80


11/04/22 12:44:32
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.03    0.00    5.72    2.35    0.00   89.89

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       8660.00     15.87     0.00   0.00    0.13     1.88    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.16  23.20
nvme2n1       8722.00     15.52     0.00   0.00    0.13     1.82    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.16  23.60


11/04/22 12:44:33
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.70    0.00    5.64    2.19    0.00   89.47

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       7791.00     14.45     0.00   0.00    0.13     1.90    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.02  21.60
nvme2n1       8036.00     14.13     0.00   0.00    0.13     1.80    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.06  22.00


11/04/22 12:44:34
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.04    0.00    6.84    2.66    0.00   87.45

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       9787.00     17.49     0.00   0.00    0.13     1.83    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.29  26.80
nvme2n1       9670.00     16.76     0.00   0.00    0.13     1.77    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.26  26.80


11/04/22 12:44:35
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.15    0.00    5.37    1.96    0.00   90.53

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       7741.00     12.65     0.00   0.00    0.13     1.67    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.00  19.20
nvme2n1       7697.00     13.27     0.00   0.00    0.13     1.77    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.01  19.20


11/04/22 12:44:36
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.33    0.00    3.79    1.58    0.00   93.30

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       6401.00     10.05     0.00   0.00    0.13     1.61    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.84  13.60
nvme2n1       6478.00     11.14     0.00   0.00    0.13     1.76    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.86  13.60


11/04/22 12:44:37
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.45    0.00    3.59    1.32    0.00   93.64

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       5296.00      9.92     0.00   0.00    0.13     1.92    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.71  12.00
nvme2n1       5288.00     10.08     0.00   0.00    0.14     1.95    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.71  12.00


11/04/22 12:44:38
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.50    0.00    7.86    2.33    0.00   86.31

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       9064.00     15.85     0.00   0.00    0.13     1.79    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.21  22.80
nvme2n1       9111.00     15.31     0.00   0.00    0.13     1.72    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.19  23.20


11/04/22 12:44:39
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.32    0.00    6.45    2.38    0.00   87.85

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       9062.00     16.23     0.00   0.00    0.13     1.83    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.21  21.20
nvme2n1       8989.00     16.92     0.00   0.00    0.13     1.93    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.19  21.20


11/04/22 12:44:40
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.97    0.00    3.81    1.78    0.00   92.44

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       6632.00     12.02     0.00   0.00    0.13     1.86    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.88  16.80
nvme2n1       6518.00     11.79     0.00   0.00    0.13     1.85    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.86  16.80


11/04/22 12:44:41
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.01    0.00    5.25    2.66    0.00   90.09

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       9590.00     17.86     0.00   0.00    0.13     1.91    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.27  22.80
nvme2n1       9610.00     16.58     0.00   0.00    0.13     1.77    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.25  22.80


11/04/22 12:44:42
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.90    0.00    2.38    1.42    0.00   95.30

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       5623.00      9.47     0.00   0.00    0.13     1.72    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.74  13.60
nvme2n1       5613.00     10.58     0.00   0.00    0.13     1.93    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.74  14.40


11/04/22 12:44:43
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.50    0.00    4.00    1.31    0.06   93.12

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       5273.00      9.80     0.00   0.00    0.13     1.90    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.71  10.80
nvme2n1       5332.00      9.57     0.00   0.00    0.14     1.84    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.72  10.80

The AWS instances used for the DB machines are of type c5ad.4xlarge.

Current loads we generate (taken from one of the DB machines):

  • Network traffic on ens5: 231.63 Mbps
  • CPU utilization: CPU interrupt time: 0%; CPU iowait time: 1.9772%; CPU system time: 4.0106%; CPU user time: 2.1477%; CPU steal time: 0.005287%
  • Disk IOPS on nvme1n1: 7.55K; disk IOPS on nvme2n1: 7.54K
  • Memory usage: 13.65 GB (of 19.39 GB available)

We have also noticed one very interesting thing:

Queries for keys that are not present (NOT_FOUND) take almost the same time to return a result as queries for keys that are present.

We have tested this case with raw database calls (no application logic) in batches (the same way a normal call to the DB would be executed).

Response time of a DB query with keys present: 68 ms

Response time of a DB query with keys not present: 65 ms


None of that sounds like an Aerospike server bottleneck. Have you looked at network/CPU/etc. from the client side too? Getting the same performance with missing keys really makes it seem like this is just client-side performance - mem/CPU/thread pressure. How are you measuring it from the client side? Could you share a snippet?


This is a code snippet showing how we make batch requests:

    // Requires: using Aerospike.Client; using System.Runtime.InteropServices;
    private const char KeySeparator = '-';
    private const string SetName = "dataset";
    private const string BinName = "m"; // bin name, unused in this snippet
    
    public Task<Record?[]> GetManyRaw(List<KeyPartGroup> keyPartGroups, CancellationToken ct)
    {
        var keys = GetKeys(keyPartGroups).ToArray();

        return GetRecords(keys, ct);
    }

    private Task<Record?[]> GetRecords(Key[] keysBatch, CancellationToken ct)
    {
        return _aerospikeClient.Get(_readPolicy, ct, keysBatch);
    }

    private List<Key> GetKeys(List<KeyPartGroup> keyPartGroups)
    {
        var keys = new List<Key>(keyPartGroups.Count);
        foreach (var group in CollectionsMarshal.AsSpan(keyPartGroups))
        {
            keys.Add(new Key(_setting.InMemoryNamespace, SetName, GenerateKey(group.firstPart, group.secondPart)));
        }

        return keys;
    }

    private static string GenerateKey(string firstKeyPart, string secondKeyPart)
    {
        return $"{firstKeyPart}{KeySeparator}{secondKeyPart}";
    }

Can you illustrate how it’s being measured?


Sorry for the slow response.

We are measuring client-side performance with the help of the Prometheus DotNetRuntimeStatsBuilder, configured to collect GcStats and ThreadPoolStats. Neither of those resources is the problem.

Right, but where you start and stop your timers could be important - e.g., whether it’s time spent building the batch request/array of keys vs. the actual get.
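As a sketch of what that separation could look like (names reuse the snippet above; this is illustrative, not the app’s actual measurement code):

```csharp
// Illustrative: time key construction and the batch Get independently,
// so client-side build cost isn't attributed to the database.
using System.Diagnostics;

var sw = Stopwatch.StartNew();
var keys = GetKeys(keyPartGroups).ToArray();   // client-side work only
var buildMs = sw.ElapsedMilliseconds;

sw.Restart();
var records = await _aerospikeClient.Get(_readPolicy, ct, keys);  // network + server
var getMs = sw.ElapsedMilliseconds;

Console.WriteLine($"key build: {buildMs} ms, batch get: {getMs} ms");
```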


Correct. We have been tracking request metrics correctly (only the get requests themselves).

We found that the client machine, together with the application, was not able to handle such network loads. The Aerospike DB is handling all of the load really well.

Thank you for your help! We really appreciate it!