Bulk/Batch Updates


#1

Hello guys,

I suppose similar feature requests have already been reported a few times, and I’d like to discuss/request the subject one more time.

My use case is pretty standard - there are about 4 billion records (user profiles), and it’s necessary to upload daily updates to Aerospike.

Given a cluster of 4 nodes, each with 2 SSDs (4 KB writes, ~18,000 IOPS), and replication-factor 2, it seems that the SSDs will become a bottleneck: theoretically, uploading/updating 4B records should take more than 15.4 hours (4B records / (8 SSDs × 18,000 IOPS / replication factor of 2) / 3600 seconds per hour).
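For reference, the arithmetic behind that estimate:

```python
# Back-of-the-envelope upload time, using the figures above.
records = 4_000_000_000        # daily records to upload
ssds = 4 * 2                   # 4 nodes x 2 SSDs each
iops_per_ssd = 18_000          # 4 KB random-write IOPS per SSD
replication_factor = 2         # every record is written twice

cluster_write_tps = ssds * iops_per_ssd / replication_factor  # 72,000 writes/s
hours = records / cluster_write_tps / 3600
print(f"{hours:.1f} hours")    # ~15.4 hours
```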

So, it would be really, really great if Aerospike supported bulk updates to speed up inserting and updating data within Aerospike.

It seems that there are multiple ways of optimizing such a use case, for example:

  1. Optimizing network I/O

    • pre-calculate the key digest for every record in the batch,
    • determine which node is responsible for each digest,
    • group the records in the batch by the node (determined in the previous step) they belong to,
    • send each group to its node in one network hop
  2. Optimizing disk I/O

    • write multiple records to disk sequentially in bulk
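
The network-side idea (the four steps above) can be sketched roughly as follows. This is a toy illustration, not real client code: the actual client computes a RIPEMD-160 digest and consults the cluster partition map, while SHA-1 and digest-mod-N stand in for those here, and the record/node names are hypothetical:

```python
import hashlib
from collections import defaultdict

N_PARTITIONS = 4096  # Aerospike's fixed partition count

def key_digest(key: str) -> bytes:
    return hashlib.sha1(key.encode()).digest()  # stand-in for RIPEMD-160

def partition_id(digest: bytes) -> int:
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS

def group_by_node(records, nodes):
    """Group (key, value) pairs by the node that owns each key's partition."""
    groups = defaultdict(list)
    for key, value in records:
        pid = partition_id(key_digest(key))
        owner = nodes[pid % len(nodes)]  # stand-in for the partition map lookup
        groups[owner].append((key, value))
    return groups

records = [(f"user:{i}", {"profile": i}) for i in range(1000)]
batches = group_by_node(records, ["node-A", "node-B", "node-C", "node-D"])
# each batch could now be sent to its node in a single network hop
```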

What do you think?


#2

In the end you’re always going to be limited by the aggregate write capacity of the SSDs in your cluster. You didn’t mention anything about your network cards, so that’s another physical capacity barrier your nodes may have.

To be clear, Aerospike already batches your records into larger blocks, sized by your write-block-size, and stores them on your SSD block device. Your records are not persisted one at a time.
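As a rough illustration of what that block-level batching means (the 512-byte record size here is an assumption purely for the example):

```python
# Rough count of records flushed per device write, assuming 512-byte records
# (an illustrative figure, including per-record storage overhead) and the
# default 128 KiB write-block-size.
write_block_size = 128 * 1024
record_size = 512  # assumed for illustration
print(write_block_size // record_size)  # 256 records per block write
```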

It is unlikely that the Aerospike client is going to be the thing slowing you down. During your bulk update period you can do the following:

  • Modify the write-commit level policy from all to master.
  • Use the async writes of the Java client (see JavaDoc) or the C client’s regular or pipeline async writes to save on network latency.
  • Again, you will most likely be bottlenecked by the max write capacity of your SSDs, not the client. If you start seeing ‘queue too deep’ warnings on your cluster nodes, and later device overload (error code 18) errors, you will probably want to dynamically increase the max-write-cache of your namespace for the update, and later reduce it to its original size.
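For example, max-write-cache can be raised dynamically with asinfo and restored after the load finishes; the namespace name and 256M value below are illustrative:

```shell
# Raise the write cache for the duration of the bulk load (namespace name
# and size are examples; 64M is the default to restore afterwards).
asinfo -v 'set-config:context=namespace;id=myns;max-write-cache=256M'
# ... run the bulk update ...
asinfo -v 'set-config:context=namespace;id=myns;max-write-cache=64M'
```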

#3

Hi @rbotzer,

Thanks a lot for the tips. I’m still using an old Aerospike client and haven’t had a chance to test the new event loop abstraction yet (which seems very promising).

Regarding the NICs - they are 1 Gbps, and the average record size is about 256-512 bytes, so the SSDs are expected to be the bottleneck, as I already mentioned.

BTW, is it correct, that in case of

  • Aerospike’s in-memory namespace
  • write-commit equal to master
  • async client with single-threaded epoll event loop

can we expect the Aerospike client to fully utilize one CPU core?


#4

Earlier you stated that you’re using SSDs, so I’m a bit unsure what you are asking.

  • Is data also being loaded into a second in-memory namespace?
  • Is the data being loaded into an in-memory namespace that has persistence onto SSD?
  • Or are you considering switching your namespace from storage on SSD to data-in-memory?

Please clarify which is the direction you’re going.

If the bottleneck is your network or your SSDs, you shouldn’t expect all the CPU cores to be busy. Aerospike will run on all the cores, but your NIC may be saturated, leaving no new work for some cores. Do you have a multiqueue network card? When you look at /proc/interrupts, how many TxRx entries are there?

I wrote an answer on Stackoverflow that goes over the flow of incoming requests. Take a look there, it may clear up things.

Which version of Aerospike you’re running will make a difference for in-memory namespaces. You should be on the latest version (3.15) if you can, but if not, upgrade to something >= 3.12. See the blog post articles on What’s New in Aerospike 3.11 and Aerospike 3.12. There are similar posts for releases 3.13, 3.14, and 3.15.


#5

Hi @rbotzer

Sorry for not being clear enough.

I’m looking for a way to better utilize the Aerospike server/client and trying to find the bottleneck in uploading data into Aerospike by means of Spark and the Aerospike Hadoop Connector.

Given:

  • Aerospike Server 3.9.1.1
  • Aerospike Client 3.3.0 [Java] (which does not support event loop abstraction)
  • 4 nodes in the cluster
  • 2 SSDs (Samsung PM863 MZ7LM480HCHP) on each node
    • random read, 4kb, 99K IOPS
    • random write, 4kb, 17K IOPS
  • Replication factor: 2
  • 1gbps NICs with single tx- and rx- queue, with Receive Packet Steering (RPS) and Receive Flow Steering (RFS) configured

Use case:

  • Upload about 4 billion records (user profiles) into Aerospike in the most efficient way
  • The upload takes place on a daily basis
  • Reads: 10%
    Writes: 90%

Details:

  • There is a Spark Job which uses Aerospike Hadoop Connector to upload data from Hadoop cluster to Aerospike
  • Write-commit level policy is all (and I’m going to experiment with master)
  • Record-exists action policy is update (and I’m going to experiment with replace)
  • Estimated throughput capacity of the cluster
    8 (ssd) * 17K (iops) / 2 (replication) = 68K (tps)
    
  • asloglatency statistics
    interval,s avg,ops/s max,ops/s
    10 6199.0 28158.3
    60 6199.0 27826.5
    600 6195.0 26977.3
    … and these max values are achieved only when increasing the number of Aerospike clients
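As a sanity check, comparing the measured averages against the estimated capacity (treating the asloglatency figures as cluster-wide ops/s):

```python
# Measured vs. theoretical write throughput, using the numbers above.
theoretical_tps = 8 * 17_000 / 2   # 68,000 writes/s estimated cluster capacity
observed_tps = 6_199               # avg ops/s from asloglatency

utilization = observed_tps / theoretical_tps
print(f"cluster utilization: {utilization:.0%}")   # ~9%

# If throughput scaled linearly with client count (it only partially does),
# roughly this many clients would be needed to saturate the cluster:
print(round(theoretical_tps / observed_tps))       # ~11
```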

Expectations:

  • Aerospike cluster is fully utilized while Spark job is running on the idle Hadoop Cluster and is uploading data into Aerospike
  • asloglatency shows at least ~60K tps (60K is not a really big deal, and I believe it should be possible to achieve with a single-threaded Aerospike client)

Issues:

  • Throughput only increases with the number of Aerospike clients (in my case Spark executors), which is pretty expensive because it consumes more Hadoop cluster resources than needed

So, what I’d like is to utilize the Aerospike cluster as much as possible with minimal Hadoop cluster resources, and the plan is the following:

  • use a newer Aerospike client with the Netty epoll event loop
  • use the master write-commit policy
  • use the replace record-exists action policy

#6

Hi @rbotzer

Finally we were able to update to the new async client and fully utilize our Aerospike cluster, but we noticed the following pretty strange behavior (with a 0% read, 100% write load):

1). CPU and Disk utilization with async client, write-commit=all, record-exists=update

Seeing a lot of reads (more than writes) when the load is 100% writes is a bit confusing, but it can be explained by record-exists=update.

2). CPU and Disk utilization with async client, write-commit=master, record-exists=replace

Although reads decreased from 101 MB/s to 86 MB/s, writes increased from 40 MB/s to 50 MB/s, and transfers/sec increased a little, the existence of that many reads (when we’re doing writes only) is really confusing.

Could you please help me understand what is really happening, and why there are so many reads even though we are doing writes only?


#7

There’s no surprise that you have reads in case (1) when the WritePolicy.recordExistsAction is UPDATE. You’re telling it to read the record, merge in your new data, then write it back to storage.

Case (2), however, is indeed confusing, so please post your configuration file. I’m guessing it’s one of the following:

  • You’re storing on the filesystem, not raw device storage. If that’s the case, your filesystem will be moving data all over the place as it tries to optimize storage.
  • You’re not using just Aerospike on those nodes. Earlier you mentioned Spark, Hadoop, and I’m not sure how you’re generating the load - is that being done on the same machine? Are those stats isolated to Aerospike?
  • If you’re still using Aerospike 3.9 it’s time you upgrade to version 4.1 by way of version 3.13. Regardless of what your issue is, that version is out of support and you should first move forward and check if the problem isn’t resolved already in a newer one.

#8
  • Could you share the defrag line from the logs? My guess is that defrag is writing about 25MB/s and reading ~50MB/s.
  • Also, the record is always read when it is in a set that has a sindex - have there been any secondary indices created?

#9

Guys, it seems there is a bug in our code and the policies are not applied; I will post an update in a while…

Regarding the questions, please find the answers below.

@rbotzer, here is the configuration of the namespace

namespace xyz {
  high-water-memory-pct 85
  high-water-disk-pct 85
  replication-factor 2
  memory-size 248G
  default-ttl 60d
  storage-engine device {
    device /dev/sda
    device /dev/sdb
    scheduler-mode noop
    write-block-size 128K
  }
}

Are those stats isolated to Aerospike?

Yep. Aerospike is on separate servers dedicated to Aerospike only; no other software is installed there.

… it’s time you upgrade to version 4.1

We are thinking of doing this, but we would like to understand the issue with all the reads first.

@kporter,

Could you share the defrag line from the logs? My guess is that defrag is writing about 25MB/s and reading ~50MB/s.

Here are some defrag log lines captured while writing to Aerospike:

May 19 2018 05:23:13 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 225583236992, contig-free 231628M (1853024 wblocks), swb-free 7, w-q 0 w-tot 648030499 (162.6/s), defrag-q 0 defrag-tot 649446233 (8.3/s) defrag-w-tot 317881536 (4.1/s)
May 19 2018 05:23:15 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 225568556288, contig-free 231539M (1852312 wblocks), swb-free 7, w-q 0 w-tot 649463501 (166.1/s), defrag-q 0 defrag-tot 650878709 (8.9/s) defrag-w-tot 319324330 (4.4/s)
...
May 19 2018 06:17:55 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 225911040768, contig-free 182893M (1463144 wblocks), swb-free 7, w-q 0 w-tot 650052027 (212.8/s), defrag-q 636303 defrag-tot 651714371 (430.5/s) defrag-w-tot 319416482 (72.8/s)
May 19 2018 06:18:13 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 225928077696, contig-free 182260M (1458085 wblocks), swb-free 7, w-q 0 w-tot 648618775 (207.0/s), defrag-q 649527 defrag-tot 650289097 (396.8/s) defrag-w-tot 317970596 (68.0/s)
May 19 2018 06:18:15 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 225912963200, contig-free 182816M (1462529 wblocks), swb-free 7, w-q 0 w-tot 650056254 (211.4/s), defrag-q 640393 defrag-tot 651722073 (385.1/s) defrag-w-tot 319417926 (72.2/s)
May 19 2018 06:18:33 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 225930037376, contig-free 182164M (1457312 wblocks), swb-free 7, w-q 0 w-tot 648622943 (208.4/s), defrag-q 653090 defrag-tot 650296055 (347.9/s) defrag-w-tot 317971942 (67.3/s)
May 19 2018 06:18:35 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 225914911360, contig-free 182731M (1461851 wblocks), swb-free 7, w-q 0 w-tot 650060498 (212.2/s), defrag-q 643746 defrag-tot 651728992 (346.0/s) defrag-w-tot 319419344 (70.9/s)
May 19 2018 06:18:53 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 225931986688, contig-free 182069M (1456554 wblocks), swb-free 7, w-q 0 w-tot 648627094 (207.6/s), defrag-q 655733 defrag-tot 650302090 (301.8/s) defrag-w-tot 317973279 (66.8/s)
May 19 2018 06:18:55 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 225916856704, contig-free 182648M (1461185 wblocks), swb-free 7, w-q 0 w-tot 650064710 (210.6/s), defrag-q 646145 defrag-tot 651734936 (297.2/s) defrag-w-tot 319420743 (69.9/s)
May 19 2018 06:19:13 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 225933954432, contig-free 181974M (1455798 wblocks), swb-free 7, w-q 0 w-tot 648631246 (207.6/s), defrag-q 657617 defrag-tot 650307371 (264.0/s) defrag-w-tot 317974607 (66.4/s)
May 19 2018 06:19:15 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 225918832384, contig-free 182566M (1460532 wblocks), swb-free 6, w-q 0 w-tot 650068931 (211.1/s), defrag-q 647870 defrag-tot 651740229 (264.6/s) defrag-w-tot 319422142 (69.9/s)
May 19 2018 06:19:33 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 225935916416, contig-free 181877M (1455022 wblocks), swb-free 7, w-q 0 w-tot 648635375 (206.4/s), defrag-q 658785 defrag-tot 650311892 (226.1/s) defrag-w-tot 317975912 (65.2/s)
May 19 2018 06:19:35 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 225920788608, contig-free 182484M (1459876 wblocks), swb-free 6, w-q 0 w-tot 650073140 (210.4/s), defrag-q 648773 defrag-tot 651744686 (222.9/s) defrag-w-tot 319423527 (69.2/s)
May 19 2018 06:19:53 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 225937862400, contig-free 181787M (1454296 wblocks), swb-free 7, w-q 0 w-tot 648639519 (207.2/s), defrag-q 659365 defrag-tot 650315890 (199.9/s) defrag-w-tot 317977232 (66.0/s)
May 19 2018 06:19:55 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 225922801664, contig-free 182404M (1459239 wblocks), swb-free 6, w-q 0 w-tot 650077347 (210.4/s), defrag-q 649116 defrag-tot 651748599 (195.6/s) defrag-w-tot 319424908 (69.1/s)
May 19 2018 06:20:13 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 225939878784, contig-free 181692M (1453539 wblocks), swb-free 7, w-q 0 w-tot 648643634 (205.8/s), defrag-q 659314 defrag-tot 650319196 (165.3/s) defrag-w-tot 317978520 (64.4/s)

Also, the record is always read when it is in a set that has a sindex - have there been any secondary indices created?

There are no secondary indices in use.


#10

So the defrag read rate ranged from ~21MiB/s (165.3 × 128KiB) to ~54MiB/s (430.5 × 128KiB).

Defrag writes were about 10MiB/s.

With default defrag settings and defrag equilibrium, you would expect the write total to be approximately equal to defrag-total.
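The conversion behind those figures, for reference (the wblock-per-second rates are taken from the defrag log lines above, with the namespace’s 128 KiB write-block-size):

```python
# Convert per-second wblock rates from the defrag log lines into bandwidth.
WBLOCK_KIB = 128  # write-block-size from the namespace config

def wblocks_to_mib_s(rate):
    return rate * WBLOCK_KIB / 1024

print(f"{wblocks_to_mib_s(165.3):.1f} MiB/s")  # defrag reads, low end (~20.7)
print(f"{wblocks_to_mib_s(430.5):.1f} MiB/s")  # defrag reads, high end (~53.8)
print(f"{wblocks_to_mib_s(72.8):.1f} MiB/s")   # defrag writes (~9.1)
```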


#11

Well, we were mistakenly overriding the already-set write policy, so here is the updated screenshot with both write-commit=master and record-exists=replace set.

It turns out that the NICs are still not configured properly, so all network interrupts are being processed by the first CPU core only (after configuring RSS/RPS/RFS I will post an update).

Regarding the defrag lines, here is an update:

May 20 2018 12:23:43 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226687737600, contig-free 194473M (1555791 wblocks), swb-free 12, w-q 0 w-tot 652659226 (320.4/s), defrag-q 1211 defrag-tot 653779125 (295.8/s) defrag-w-tot 320279609 (119.3/s)
May 20 2018 12:24:02 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226703420544, contig-free 194368M (1554945 wblocks), swb-free 16, w-q 0 w-tot 651190033 (309.3/s), defrag-q 6124 defrag-tot 652313811 (480.2/s) defrag-w-tot 318796463 (105.4/s)
May 20 2018 12:24:03 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226687737600, contig-free 194326M (1554611 wblocks), swb-free 12, w-q 0 w-tot 652665584 (317.9/s), defrag-q 5493 defrag-tot 653788585 (473.0/s) defrag-w-tot 320281868 (112.9/s)
May 20 2018 12:24:22 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226703420544, contig-free 194247M (1553981 wblocks), swb-free 16, w-q 0 w-tot 651195979 (297.3/s), defrag-q 9722 defrag-tot 652322392 (429.0/s) defrag-w-tot 318798323 (93.0/s)
May 20 2018 12:24:23 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226687737600, contig-free 194210M (1553685 wblocks), swb-free 12, w-q 0 w-tot 652671613 (301.5/s), defrag-q 9092 defrag-tot 653797287 (435.1/s) defrag-w-tot 320283809 (97.1/s)
May 20 2018 12:24:42 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226703420544, contig-free 194186M (1553490 wblocks), swb-free 16, w-q 0 w-tot 651201537 (277.9/s), defrag-q 10988 defrag-tot 652328725 (316.6/s) defrag-w-tot 318799896 (78.7/s)
May 20 2018 12:24:43 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226687737600, contig-free 194170M (1553361 wblocks), swb-free 12, w-q 0 w-tot 652677335 (286.1/s), defrag-q 10585 defrag-tot 653804177 (344.5/s) defrag-w-tot 320285544 (86.8/s)
May 20 2018 12:25:02 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226703420544, contig-free 194137M (1553103 wblocks), swb-free 16, w-q 0 w-tot 651206834 (264.9/s), defrag-q 13950 defrag-tot 652336597 (393.6/s) defrag-w-tot 318801102 (60.3/s)
May 20 2018 12:25:03 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226687737600, contig-free 194124M (1552993 wblocks), swb-free 12, w-q 0 w-tot 652682719 (269.2/s), defrag-q 13130 defrag-tot 653811738 (378.0/s) defrag-w-tot 320286835 (64.6/s)
May 20 2018 12:25:22 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226703420544, contig-free 194130M (1553044 wblocks), swb-free 16, w-q 0 w-tot 651211988 (257.7/s), defrag-q 11654 defrag-tot 652339396 (139.9/s) defrag-w-tot 318802177 (53.8/s)
May 20 2018 12:25:23 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226687737600, contig-free 194125M (1553000 wblocks), swb-free 12, w-q 0 w-tot 652687914 (259.8/s), defrag-q 10876 defrag-tot 653814687 (147.4/s) defrag-w-tot 320287971 (56.8/s)
May 20 2018 12:25:42 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226703420544, contig-free 194128M (1553029 wblocks), swb-free 15, w-q 0 w-tot 651217014 (251.3/s), defrag-q 11166 defrag-tot 652343919 (226.1/s) defrag-w-tot 318803143 (48.3/s)
May 20 2018 12:25:43 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226687737600, contig-free 194161M (1553290 wblocks), swb-free 12, w-q 0 w-tot 652693155 (262.0/s), defrag-q 10012 defrag-tot 653819353 (233.3/s) defrag-w-tot 320289135 (58.2/s)
May 20 2018 12:26:02 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226703420544, contig-free 194166M (1553332 wblocks), swb-free 16, w-q 0 w-tot 651221965 (247.6/s), defrag-q 10565 defrag-tot 652348572 (232.6/s) defrag-w-tot 318803949 (40.3/s)
May 20 2018 12:26:03 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226687737600, contig-free 194170M (1553366 wblocks), swb-free 12, w-q 0 w-tot 652698381 (261.3/s), defrag-q 9813 defrag-tot 653824457 (255.2/s) defrag-w-tot 320290211 (53.8/s)
May 20 2018 12:26:22 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226703420544, contig-free 194131M (1553055 wblocks), swb-free 16, w-q 0 w-tot 651227230 (263.2/s), defrag-q 13739 defrag-tot 652356734 (408.1/s) defrag-w-tot 318805117 (58.4/s)
May 20 2018 12:26:23 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226687737600, contig-free 194131M (1553055 wblocks), swb-free 12, w-q 0 w-tot 652703942 (278.0/s), defrag-q 13435 defrag-tot 653833329 (443.6/s) defrag-w-tot 320291685 (73.7/s)
May 20 2018 12:26:42 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226703420544, contig-free 194073M (1552586 wblocks), swb-free 16, w-q 0 w-tot 651232593 (268.1/s), defrag-q 19067 defrag-tot 652366956 (511.1/s) defrag-w-tot 318806432 (65.8/s)
May 20 2018 12:26:43 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226687737600, contig-free 194109M (1552879 wblocks), swb-free 12, w-q 0 w-tot 652709554 (280.6/s), defrag-q 18112 defrag-tot 653843442 (505.6/s) defrag-w-tot 320293224 (76.9/s)
May 20 2018 12:27:02 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226703420544, contig-free 194040M (1552321 wblocks), swb-free 15, w-q 0 w-tot 651237899 (265.3/s), defrag-q 20307 defrag-tot 652373237 (314.0/s) defrag-w-tot 318807652 (61.0/s)
May 20 2018 12:27:03 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226687737600, contig-free 194098M (1552791 wblocks), swb-free 12, w-q 0 w-tot 652715073 (276.0/s), defrag-q 18576 defrag-tot 653849337 (294.8/s) defrag-w-tot 320294676 (72.6/s)
May 20 2018 12:27:22 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226703420544, contig-free 194039M (1552313 wblocks), swb-free 16, w-q 0 w-tot 651243105 (260.3/s), defrag-q 18567 defrag-tot 652376695 (172.9/s) defrag-w-tot 318808823 (58.5/s)
May 20 2018 12:27:23 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226687737600, contig-free 194152M (1553216 wblocks), swb-free 12, w-q 0 w-tot 652720341 (263.4/s), defrag-q 16936 defrag-tot 653853390 (202.6/s) defrag-w-tot 320295883 (60.3/s)
...
May 20 2018 12:50:42 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226704395264, contig-free 190721M (1525771 wblocks), swb-free 15, w-q 0 w-tot 651599402 (260.5/s), defrag-q 70223 defrag-tot 652758106 (473.6/s) defrag-w-tot 318880591 (55.5/s)
May 20 2018 12:50:43 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226688778240, contig-free 191211M (1529693 wblocks), swb-free 12, w-q 0 w-tot 653086888 (269.4/s), defrag-q 65621 defrag-tot 654245099 (522.6/s) defrag-w-tot 320377927 (64.3/s)
May 20 2018 12:51:02 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226705865344, contig-free 190701M (1525609 wblocks), swb-free 16, w-q 0 w-tot 651604528 (256.3/s), defrag-q 79059 defrag-tot 652771906 (690.0/s) defrag-w-tot 318881633 (52.1/s)
May 20 2018 12:51:03 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226690207872, contig-free 191221M (1529768 wblocks), swb-free 12, w-q 0 w-tot 653092077 (259.5/s), defrag-q 74366 defrag-tot 654259108 (700.5/s) defrag-w-tot 320379061 (56.7/s)
May 20 2018 12:51:22 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226707184768, contig-free 190718M (1525747 wblocks), swb-free 16, w-q 0 w-tot 651609585 (252.9/s), defrag-q 85676 defrag-tot 652783718 (590.6/s) defrag-w-tot 318882598 (48.2/s)
May 20 2018 12:51:23 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226691583744, contig-free 191216M (1529734 wblocks), swb-free 11, w-q 0 w-tot 653097167 (254.5/s), defrag-q 81581 defrag-tot 654271379 (613.5/s) defrag-w-tot 320380040 (49.0/s)
May 20 2018 12:51:42 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226709136512, contig-free 190713M (1525708 wblocks), swb-free 16, w-q 0 w-tot 651614475 (244.5/s), defrag-q 97612 defrag-tot 652800505 (839.3/s) defrag-w-tot 318883397 (40.0/s)
May 20 2018 12:51:43 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226693521280, contig-free 191252M (1530018 wblocks), swb-free 12, w-q 0 w-tot 653102133 (248.3/s), defrag-q 93478 defrag-tot 654288526 (857.3/s) defrag-w-tot 320380926 (44.3/s)
May 20 2018 12:52:02 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226711364096, contig-free 190716M (1525728 wblocks), swb-free 16, w-q 0 w-tot 651619194 (235.9/s), defrag-q 111684 defrag-tot 652819316 (940.5/s) defrag-w-tot 318884092 (34.8/s)
May 20 2018 12:52:03 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226695839744, contig-free 191333M (1530664 wblocks), swb-free 12, w-q 0 w-tot 653107000 (243.4/s), defrag-q 106906 defrag-tot 654307467 (947.0/s) defrag-w-tot 320381764 (41.9/s)
May 20 2018 12:52:22 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226714539904, contig-free 190738M (1525910 wblocks), swb-free 16, w-q 0 w-tot 651623984 (239.5/s), defrag-q 133116 defrag-tot 652845720 (1320.2/s) defrag-w-tot 318884783 (34.5/s)
May 20 2018 12:52:23 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226699148928, contig-free 191416M (1531334 wblocks), swb-free 11, w-q 0 w-tot 653111891 (244.6/s), defrag-q 129243 defrag-tot 654335364 (1394.8/s) defrag-w-tot 320382547 (39.2/s)
May 20 2018 12:52:42 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226718679552, contig-free 190791M (1526328 wblocks), swb-free 16, w-q 0 w-tot 651628723 (236.9/s), defrag-q 162621 defrag-tot 652880382 (1733.1/s) defrag-w-tot 318885467 (34.2/s)
May 20 2018 12:52:43 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226703257472, contig-free 191512M (1532101 wblocks), swb-free 12, w-q 0 w-tot 653116709 (240.9/s), defrag-q 158137 defrag-tot 654369844 (1724.0/s) defrag-w-tot 320383310 (38.2/s)
May 20 2018 12:53:02 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226722215808, contig-free 190848M (1526788 wblocks), swb-free 15, w-q 0 w-tot 651633499 (238.8/s), defrag-q 188018 defrag-tot 652911015 (1531.7/s) defrag-w-tot 318886151 (34.2/s)
May 20 2018 12:53:03 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226706731008, contig-free 191605M (1532841 wblocks), swb-free 11, w-q 0 w-tot 653121516 (240.4/s), defrag-q 182593 defrag-tot 654399847 (1500.2/s) defrag-w-tot 320384034 (36.2/s)
May 20 2018 12:53:22 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226726055808, contig-free 190903M (1527226 wblocks), swb-free 16, w-q 0 w-tot 651638250 (237.6/s), defrag-q 215327 defrag-tot 652943513 (1624.9/s) defrag-w-tot 318886797 (32.3/s)
May 20 2018 12:53:23 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226710667136, contig-free 191693M (1533547 wblocks), swb-free 12, w-q 0 w-tot 653126326 (240.5/s), defrag-q 210001 defrag-tot 654432771 (1646.2/s) defrag-w-tot 320384752 (35.9/s)
May 20 2018 12:53:42 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226730277888, contig-free 190974M (1527796 wblocks), swb-free 15, w-q 1 w-tot 651642892 (232.1/s), defrag-q 244738 defrag-tot 652978136 (1731.2/s) defrag-w-tot 318887390 (29.6/s)
May 20 2018 12:53:43 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226714854912, contig-free 191789M (1534312 wblocks), swb-free 12, w-q 0 w-tot 653131066 (237.0/s), defrag-q 239193 defrag-tot 654467468 (1734.8/s) defrag-w-tot 320385452 (35.0/s)
May 20 2018 12:54:02 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226734090112, contig-free 191051M (1528411 wblocks), swb-free 16, w-q 0 w-tot 651647476 (229.2/s), defrag-q 269838 defrag-tot 653008435 (1514.9/s) defrag-w-tot 318887961 (28.5/s)
May 20 2018 12:54:03 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226718553216, contig-free 191847M (1534782 wblocks), swb-free 12, w-q 0 w-tot 653136056 (249.5/s), defrag-q 262775 defrag-tot 654496510 (1452.1/s) defrag-w-tot 320386392 (47.0/s)
May 20 2018 12:54:22 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226737952256, contig-free 191125M (1529005 wblocks), swb-free 16, w-q 0 w-tot 651652078 (230.1/s), defrag-q 294827 defrag-tot 653038620 (1509.2/s) defrag-w-tot 318888511 (27.5/s)
...
May 20 2018 12:55:42 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226753483520, contig-free 190761M (1526092 wblocks), swb-free 16, w-q 0 w-tot 651675175 (308.3/s), defrag-q 407056 defrag-tot 653171033 (2105.0/s) defrag-w-tot 318895415 (104.4/s)
May 20 2018 12:55:43 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226737945984, contig-free 191291M (1530332 wblocks), swb-free 11, w-q 0 w-tot 653167106 (307.0/s), defrag-q 401760 defrag-tot 654662095 (2156.2/s) defrag-w-tot 320397199 (103.1/s)
May 20 2018 12:56:02 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226757283968, contig-free 190617M (1524943 wblocks), swb-free 16, w-q 0 w-tot 651681361 (309.3/s), defrag-q 442004 defrag-tot 653211018 (1999.2/s) defrag-w-tot 318897538 (106.2/s)
May 20 2018 12:56:03 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226741912064, contig-free 191131M (1529054 wblocks), swb-free 12, w-q 0 w-tot 653173239 (306.6/s), defrag-q 435212 defrag-tot 654700402 (1915.3/s) defrag-w-tot 320399271 (103.6/s)
May 20 2018 12:56:22 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226763324672, contig-free 190478M (1523828 wblocks), swb-free 16, w-q 0 w-tot 651687457 (304.8/s), defrag-q 470512 defrag-tot 653244506 (1674.4/s) defrag-w-tot 318899593 (102.8/s)
May 20 2018 12:56:23 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226747844224, contig-free 190957M (1527660 wblocks), swb-free 12, w-q 0 w-tot 653179273 (301.7/s), defrag-q 463981 defrag-tot 654733811 (1670.4/s) defrag-w-tot 320401294 (101.2/s)
May 20 2018 12:56:42 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226767566208, contig-free 190325M (1522604 wblocks), swb-free 16, w-q 0 w-tot 651693480 (301.1/s), defrag-q 485995 defrag-tot 653264789 (1014.2/s) defrag-w-tot 318901637 (102.2/s)
May 20 2018 12:56:43 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226751871360, contig-free 190792M (1526337 wblocks), swb-free 11, w-q 0 w-tot 653185368 (304.8/s), defrag-q 478731 defrag-tot 654753333 (976.1/s) defrag-w-tot 320403383 (104.4/s)
May 20 2018 12:57:02 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226771361024, contig-free 190137M (1521098 wblocks), swb-free 16, w-q 0 w-tot 651699523 (302.1/s), defrag-q 497626 defrag-tot 653280956 (808.3/s) defrag-w-tot 318903601 (98.2/s)
May 20 2018 12:57:03 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226755923328, contig-free 190608M (1524869 wblocks), swb-free 12, w-q 0 w-tot 653191531 (308.1/s), defrag-q 491532 defrag-tot 654770829 (874.8/s) defrag-w-tot 320405462 (103.9/s)
May 20 2018 12:57:22 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226777985536, contig-free 189942M (1519536 wblocks), swb-free 16, w-q 0 w-tot 651705509 (299.3/s), defrag-q 517800 defrag-tot 653305555 (1229.9/s) defrag-w-tot 318905516 (95.8/s)
...
May 20 2018 13:02:23 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sda: used 226868299008, contig-free 188053M (1504429 wblocks), swb-free 12, w-q 0 w-tot 653287934 (266.9/s), defrag-q 517857 defrag-tot 654873117 (2.2/s) defrag-w-tot 320438147 (107.1/s)
May 20 2018 13:02:42 GMT: INFO (drv_ssd): (drv_ssd.c:2072) device /dev/sdb: used 226890663552, contig-free 187234M (1497874 wblocks), swb-free 16, w-q 0 w-tot 651798962 (252.4/s), defrag-q 526315 defrag-tot 653385861 (1.8/s) defrag-w-tot 318936129 (100.2/s)

It seems that defrag-total is usually bigger than write-total while inserting data into Aerospike.


#12

A few notes for now:

Use RPS when you have only one tx/rx queue in your NIC. See the best practices in our Amazon EC2 guide. Consider a multiqueue NIC.

Why is your high-water-memory-pct set to 85%? You have a 4 node cluster, and you don’t want to exceed 80% memory use if a node goes down, so: 80 × (4 − 1) / 4 = 60%.
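That sizing rule, spelled out:

```python
# Safe steady-state high-water mark so that losing one of the 4 nodes still
# keeps the surviving nodes at or below an 80% memory ceiling.
nodes = 4
post_failure_ceiling_pct = 80
safe_hwm_pct = post_failure_ceiling_pct * (nodes - 1) / nodes
print(safe_hwm_pct)  # 60.0
```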

You should not be setting your high-water-disk-pct higher than 50% unless you also raise your defrag-lwm-pct to the same value. As you do this and go above 50%, you’re going to incur a lot more defrag. See the related knowledge base articles.

Next, grep your log for cache-read-pct. You have plenty of DRAM defined for the namespace, and you should check whether dynamically raising the namespace post-write-queue from the default of 256 blocks per device toward the max of 2048 (4096 in versions >= 3.16) gives you a better cache hit rate. This is especially useful when your reads closely follow the writes, as is the case with updates.
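For a sense of the DRAM cost of raising it (using this thread’s 128K write-block-size and 2 devices per node):

```python
# DRAM used by the post-write-queue cache per node, for a 128 KiB
# write-block-size and 2 devices per node.
wblock_bytes = 128 * 1024
devices_per_node = 2

for blocks in (256, 2048, 4096):  # default, max (< 3.16), max (>= 3.16)
    mib = blocks * wblock_bytes * devices_per_node / 2**20
    print(f"post-write-queue {blocks}: {mib:.0f} MiB per node")
```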

Last thing for now: it doesn’t matter what random read IOPS Samsung publishes for the PM863. Their tests do not simulate a database workload. In Aerospike you’ll be combining random reads, random writes, and large-block reads for defrag, all at the same time. This is why Aerospike published the open source ACT tool. The PM863 has previously been rated as a 9x SSD (though you might be using a newer model that rates better). This means that with 1.5K objects and a 128K write block it was measured to do 18Ktps reads concurrent with 9Ktps writes while not exceeding 5% > 1ms, 1% > 8ms, 0.1% > 64ms for 24 hours.
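For context, ACT’s “1x” baseline workload is 2,000 reads/s plus 1,000 writes/s, so a 9x rating corresponds directly to the figures quoted:

```python
# ACT drive ratings scale a 1x baseline of 2,000 reads/s + 1,000 writes/s.
rating = 9
reads_tps = rating * 2_000   # 18,000 reads/s
writes_tps = rating * 1_000  # 9,000 writes/s
print(reads_tps, writes_tps)
```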

You can use ACT to measure your very specific workload, to your spec. Using that information you can size your cluster for your needs. Do not expect your SSD to hold up to more than what it was measured to do sustainably.


#13

@rbotzer, thanks a lot for the recommendations, we will definitely give them a try and will update this topic later on with the results.


#14

@rbotzer, here are some updates about our journey with Aerospike configuration.

Setting defrag-lwm-pct to the same value as high-water-disk-pct (both are now set to 85%) seems to have affected the stability of Aerospike.

Some instances started being killed by the out-of-memory killer:

Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166833] asd invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166836] asd cpuset=/ mems_allowed=0
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166840] CPU: 0 PID: 50855 Comm: asd Not tainted 4.4.0-47-generic #68-Ubuntu
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166841] Hardware name: ASUSTeK COMPUTER INC. Z10PA-U8 Series/Z10PA-U8 Series, BIOS 3202 05/06/2016
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166842]  0000000000000286 00000000f7a0a3b6 ffff880cccd23a30 ffffffff813f5aa3
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166843]  ffff880cccd23c28 ffff883f604eee00 ffff880cccd23aa0 ffffffff81209ebe
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166845]  ffff880cccd23a50 ffffffff8113ebfa ffff880cccd23ad0 ffffffff811a5d45
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166846] Call Trace:
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166852]  [<ffffffff813f5aa3>] dump_stack+0x63/0x90
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166855]  [<ffffffff81209ebe>] dump_header+0x5a/0x1c5
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166858]  [<ffffffff8113ebfa>] ? __delayacct_freepages_end+0x2a/0x30
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166860]  [<ffffffff811a5d45>] ? do_try_to_free_pages+0x325/0x440
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166863]  [<ffffffff81191ab2>] oom_kill_process+0x202/0x3c0
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166864]  [<ffffffff81191ed9>] out_of_memory+0x219/0x460
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166866]  [<ffffffff81197f20>] __alloc_pages_nodemask+0x9d0/0xb60
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166869]  [<ffffffff811e18ac>] alloc_pages_current+0x8c/0x110
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166871]  [<ffffffff8118e35b>] __page_cache_alloc+0xab/0xc0
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166872]  [<ffffffff8119058a>] filemap_fault+0x14a/0x3f0
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166875]  [<ffffffff812a11b6>] ext4_filemap_fault+0x36/0x50
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166878]  [<ffffffff811bcfd0>] __do_fault+0x50/0xe0
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166880]  [<ffffffff811c0adb>] handle_mm_fault+0xf8b/0x1820
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166882]  [<ffffffff8106b4e7>] __do_page_fault+0x197/0x400
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166883]  [<ffffffff8106b772>] do_page_fault+0x22/0x30
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166887]  [<ffffffff81837178>] page_fault+0x28/0x30
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166903] Mem-Info:
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166908] active_anon:65175389 inactive_anon:230712 isolated_anon:0
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166908]  active_file:95 inactive_file:0 isolated_file:0
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166908]  unevictable:0 dirty:5 writeback:0 unstable:0
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166908]  slab_reclaimable:12655 slab_unreclaimable:12828
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166908]  mapped:10979 shmem:665638 pagetables:127850 bounce:0
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166908]  free:276795 free_pcp:1755 free_cma:0
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166910] Node 0 DMA free:15904kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166913] lowmem_reserve[]: 0 1802 257800 257800 257800
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166915] Node 0 DMA32 free:1024216kB min:464kB low:580kB high:696kB active_anon:853216kB inactive_anon:560kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1964556kB managed:1883984kB mlocked:0kB dirty:0kB writeback:0kB mapped:16kB shmem:1000kB slab_reclaimable:160kB slab_unreclaimable:232kB kernel_stack:0kB pagetables:3464kB unstable:0kB bounce:0kB free_pcp:1224kB local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:16228 all_unreclaimable? yes
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166917] lowmem_reserve[]: 0 0 255997 255997 255997
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166919] Node 0 Normal free:67060kB min:65060kB low:81324kB high:97588kB active_anon:259848340kB inactive_anon:922288kB active_file:388kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:266338304kB managed:262141552kB mlocked:0kB dirty:20kB writeback:0kB mapped:43900kB shmem:2661552kB slab_reclaimable:50460kB slab_unreclaimable:51080kB kernel_stack:6496kB pagetables:507936kB unstable:0kB bounce:0kB free_pcp:5796kB local_pcp:604kB free_cma:0kB writeback_tmp:0kB pages_scanned:8636 all_unreclaimable? yes
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166922] lowmem_reserve[]: 0 0 0 0 0
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166923] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166929] Node 0 DMA32: 405*4kB (UME) 431*8kB (UME) 221*16kB (UME) 72*32kB (UME) 75*64kB (UME) 27*128kB (UME) 16*256kB (UM) 15*512kB
 (M) 6*1024kB (UME) 2*2048kB (UE) 240*4096kB (M) = 1024220kB
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166934] Node 0 Normal: 808*4kB (UME) 846*8kB (UEH) 463*16kB (UMEH) 307*32kB (UME) 273*64kB (UE) 173*128kB (UME) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 67104kB
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166940] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166941] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166942] 666173 total pagecache pages
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166943] 0 pages in swap cache
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166943] Swap cache stats: add 0, delete 0, find 0/0
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166944] Free swap  = 0kB
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166944] Total swap = 0kB
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166945] 67079712 pages RAM
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166946] 0 pages HighMem/MovableOnly
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166946] 1069352 pages reserved
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166947] 0 pages cma reserved
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166947] 0 pages hwpoisoned
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166948] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166952] [  375]     0   375    25742       48      17       3        0             0 lvmetad
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166953] [  383]     0   383    11397      503      24       4        0         -1000 systemd-udevd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166955] [  696]     0   696     3051     1236      11       3        0             0 haveged
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166956] [  698]     0   698    31411    20718      66       4        0             0 systemd-journal
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166957] [  778]   100   778    25081       54      19       3        0             0 systemd-timesyn
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166958] [  780]     0   780    23502       50      15       3        0         -1000 auditd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166959] [  784]     0   784    20072       33      10       3        0             0 audispd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166960] [  805]   108   805    11248      106      26       3        0          -900 dbus-daemon
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166961] [  817]     0   817    14858      173      32       3        0         -1000 sshd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166962] [  827]   105   827    64624      389      29       4        0             0 rsyslogd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166963] [  836]     0   836     6334      240      17       3        0             0 smartd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166964] [  839]     0   839     5096       76      15       3        0             0 systemd-logind
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166965] [  850]     0   850   194377      123      45       4        0             0 nscd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166966] [  869]     0   869     2614       19       9       3        0             0 agetty
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166967] [  902]     0   902     3344       38      11       3        0             0 mdadm
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166968] [  919]     0   919     6511       49      19       3        0             0 atd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166969] [  923]     0   923     7940       70      21       3        0             0 cron
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166970] [  947]     0   947   123264      529      70       3        0             0 automount
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166972] [  949]   109   949    16071     1075      33       3        0             0 snmpd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166973] [ 1149]   114  1149    23775      260      44       3        0             0 zabbix_agentd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166974] [ 1150]   114  1150    23775      259      44       3        0             0 zabbix_agentd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166975] [ 1151]   114  1151    23775      309      46       3        0             0 zabbix_agentd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166976] [ 1152]   114  1152    23775      309      46       3        0             0 zabbix_agentd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166977] [ 1153]   114  1153    23775      309      46       3        0             0 zabbix_agentd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166978] [ 1154]   114  1154    23775      291      46       3        0             0 zabbix_agentd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166979] [49599]   111 49599   114642      463      56       4        0         -1000 nslcd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166980] [47164]     0 47164 65391352 64731910  127036     253        0             0 asd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166982] Out of memory: Kill process 47164 (asd) score 953 or sacrifice child
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.167198] Killed process 47164 (asd) total-vm:261565408kB, anon-rss:258927788kB, file-rss:0kB
Jun 21 03:12:31 dmc-prod-as4 kernel: [28989668.053871] ata2.00: Enabling discard_zeroes_data
Jun 21 03:12:31 dmc-prod-as4 kernel: [28989668.431149] ata1.00: Enabling discard_zeroes_data
Jun 21 03:12:31 dmc-prod-as4 kernel: [28989668.433295] ata1.00: Enabling discard_zeroes_data

Another interesting thing is that after the parameters were updated, memory consumption grew significantly while loading data into Aerospike, and in the idle state as well.

So I’m wondering whether these two things are connected, i.e. setting defrag-lwm-pct to the same value as high-water-disk-pct, and the increased memory consumption.


#15

If you raised your post-write-queue, that’s probably the cause. You have to do a full capacity plan for your cluster, including the allocations given to namespaces for record overhead and secondary indexes, memory fragmentation, the post-write-queues, and other overhead, while also leaving room for the OS.
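As one piece of such a plan, the primary index alone is substantial at this scale. A back-of-envelope sketch using the figures from this thread (4B records, replication factor 2, 4 nodes) and Aerospike's 64-byte primary index entry per record:

```python
# Per-node primary index memory for the cluster described in the thread.
records = 4_000_000_000
replication_factor = 2
nodes = 4
index_bytes_per_record = 64     # Aerospike primary index entry size

per_node_gib = records * replication_factor * index_bytes_per_record / nodes / 2**30
print(f"~{per_node_gib:.0f} GiB per node for the primary index alone")
```

That is before any data-in-memory, secondary indexes, write buffers, or post-write-queue blocks are counted.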

@rbotzer suggested possibly raising post-write-queue, but there is a cost to that. Was that change made?

Additionally, setting your defrag-lwm-pct above 50% is not really recommended. You are making Aerospike spend a lot of cycles doing defragmentation work, which wastes IO, CPU, etc. In a pinch it works, but if you need to use more than 50% of your disk space, you should probably just get more hardware.


#16

@Albot, post-write-queue has remained at its default value of 256.

I’ve noticed that, according to the Aerospike logs, the available system memory and Aerospike’s memory usage were the following:

system ram,    free-pct: 34
aerospike ram, used-pct: 43.33

Just before the failure, the logs showed these numbers:

system ram,    free-pct: 2
aerospike ram, used-pct: 43.37

So I started to think that probably something else had consumed all the previously available memory, but then I discovered that the oom killer logged the asd process as consuming 64731910 pages of RAM (or ~247G):

Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166948] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166952] [  375]     0   375    25742       48      17       3        0             0 lvmetad
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166953] [  383]     0   383    11397      503      24       4        0         -1000 systemd-udevd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166955] [  696]     0   696     3051     1236      11       3        0             0 haveged
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166956] [  698]     0   698    31411    20718      66       4        0             0 systemd-journal
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166957] [  778]   100   778    25081       54      19       3        0             0 systemd-timesyn
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166958] [  780]     0   780    23502       50      15       3        0         -1000 auditd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166959] [  784]     0   784    20072       33      10       3        0             0 audispd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166960] [  805]   108   805    11248      106      26       3        0          -900 dbus-daemon
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166961] [  817]     0   817    14858      173      32       3        0         -1000 sshd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166962] [  827]   105   827    64624      389      29       4        0             0 rsyslogd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166963] [  836]     0   836     6334      240      17       3        0             0 smartd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166964] [  839]     0   839     5096       76      15       3        0             0 systemd-logind
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166965] [  850]     0   850   194377      123      45       4        0             0 nscd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166966] [  869]     0   869     2614       19       9       3        0             0 agetty
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166967] [  902]     0   902     3344       38      11       3        0             0 mdadm
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166968] [  919]     0   919     6511       49      19       3        0             0 atd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166969] [  923]     0   923     7940       70      21       3        0             0 cron
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166970] [  947]     0   947   123264      529      70       3        0             0 automount
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166972] [  949]   109   949    16071     1075      33       3        0             0 snmpd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166973] [ 1149]   114  1149    23775      260      44       3        0             0 zabbix_agentd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166974] [ 1150]   114  1150    23775      259      44       3        0             0 zabbix_agentd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166975] [ 1151]   114  1151    23775      309      46       3        0             0 zabbix_agentd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166976] [ 1152]   114  1152    23775      309      46       3        0             0 zabbix_agentd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166977] [ 1153]   114  1153    23775      309      46       3        0             0 zabbix_agentd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166978] [ 1154]   114  1154    23775      291      46       3        0             0 zabbix_agentd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166979] [49599]   111 49599   114642      463      56       4        0         -1000 nslcd
Jun 21 03:12:27 dmc-prod-as4 kernel: [28989664.166980] [47164]     0 47164 65391352 64731910  127036     253        0             0 asd

So now I’m wondering what the connection is between the used-pct visible in the logs and the resident set size of the asd process, as reported by top or in the oom-killer logs.
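For reference, the ~247G figure quoted above comes straight from the oom-killer's rss column, which is counted in 4 KiB pages:

```python
# Convert the oom-killer's rss figure (4 KiB pages) to GiB for asd.
rss_pages = 64_731_910
page_size = 4096                 # bytes per page on x86-64
gib = rss_pages * page_size / 2**30
print(f"~{gib:.0f} GiB")         # ~247 GiB
```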


#17

The used-pct is the used memory for that namespace and does not account for everything. If you continue to face this issue, I suggest you start a case with enterprise support, who can also help with sizing (https://www.aerospike.com/forms/contact-us/), or open a new thread, since this one seems to be running on. :slight_smile: I believe @rbotzer solved your original issue, yes?


#18

Sure, thanks a lot for the help!