Unused data on the disk - Error Code 8: Server memory error

Hi, We have a problem

  • Configuration is the same as I posted before in thread
  • Aerospike version now is 3.3.17
  • no anomalies in logs

Aerospike server responds to the client “java.lang.Throwable: com.aerospike.client.AerospikeException: Error Code 8: Server memory error”

Console tells us that we are using 63% of the disk.

Aerospike can not be written to disk because the disk is full. I set “min-avail-pct=0” to namespace, it helped for a while.

What the data filled in the entire disk? What must be done to release the disk of unnecessary data?

Namespace:

namespace ssd {
replication-factor                2
default-ttl                       0
memory-size                       118G
high-water-memory-pct             95
high-water-disk-pct               95
stop-writes-pct                   99
storage-engine device {
    scheduler-mode                noop
    device                        /dev/sda4
    device                        /dev/sdb4
    device                        /dev/sdc4
    device                        /dev/sdd4
    write-block-size              128k
}

Hi,

After looking at your configuration on the other thread, I can’t find any settings for your defrag-lwm-pct. Can I have you post the actual configuration from Aerospike server and the last 2000 logs?

For Aerospike log:

tail -2000 /var/log/aerospike/aerospike.log

It seems that Aerospike might not be evicting or its defrag is not keeping up. One more thing, I take it your client is handling the ttl? Since I see from your posting above your default-ttl is 0.

Jerry

You are running with extremely unsafe settings of 99% HWM! You should be running with 50-60% HWM. Also, Your avail pct is on 4% which will hit stop-writes true.

For practical purpose, avail pct is more critical than free disk pct. Also, how much is your memory consumption. Can you post a screenshot of memory as well from the AMC.

Yes default ttl 0, we use our job for cleanup old data.

I add 1 node to cluster. All 5 node on that moment have some configuration

allow-inline-transactions true auto-dun false auto-undun false batch-max-requests 5 000 batch-priority 200 batch-threads 4 dump-message-above-size 134 217 728 enable-fastpath true enable-security false enable-xdr false fabric-port 3 001 fb-health-bad-pct 0 fb-health-good-pct 50 fb-health-msg-per-burst 0 fb-health-msg-timeout 200 heartbeat-interval 1 000 heartbeat-mode mesh heartbeat-port 3 002 heartbeat-protocol v2 heartbeat-timeout 600 info-threads 16 max-msgs-per-type -1 memory-accounting false mesh-port 3 002 microbenchmarks false migrate-max-num-incoming 6 migrate-priority 15 migrate-read-priority 15 migrate-read-sleep 20 migrate-rx-lifetime-ms 60 000 migrate-threads 6 migrate-xmit-hwm 10 migrate-xmit-lwm 8 migrate-xmit-priority 15 migrate-xmit-sleep 20 network-info-port 3 003 nsup-period 120 nsup-queue-escape 1 000 nsup-queue-hwm 15 nsup-queue-lwm 10 nsup-startup-evict true paxos-max-cluster-size 32 paxos-protocol v3 paxos-recovery-policy manual paxos-retransmit-period 5 paxos-single-replica-limit 1 pidfile /var/run/aerospike/asd.pid privilege-refresh-period 300 prole-extra-ttl 0 proto-fd-idle-ms 60 000 proto-fd-max 30 000 proxy-hist-track-back 1 800 proxy-hist-track-slice 10 proxy-hist-track-thresholds 1,8,64 query-batch-size 100 query-bufpool-size 256 query-hist-track-back 1 800 query-hist-track-slice 10 query-hist-track-thresholds 1,8,64 query-in-transaction-thread 0 query-job-tracking false query-long-q-max-size 500 query-priority 10 query-rec-count-bound 4 294 967 295 query-req-in-query-thread 0 query-req-max-inflight 100 query-short-q-max-size 500 query-sleep 1 query-threads 6 query-threshold 10 query-worker-threads 15 query_rec_count-hist-track-back 1 800 query_rec_count-hist-track-slice 10 query_rec_count-hist-track-thresholds 1,8,64 reads-hist-track-back 1 800 reads-hist-track-slice 10 reads-hist-track-thresholds 1,8,64 replication-fire-and-forget false report-authentication-sinks 0 report-sys-admin-sinks 0 report-user-admin-sinks 0 report-violation-sinks 0 respond-client-on-master-completion true reuse-address true scan-priority 200 scan-sleep 1 service-address 0.0.0.0 service-port 3 000 sindex-data-max-memory 18 446 744 073 709 551 616 sindex-populator-scan-priority 3 snub-nodes false stop-writes-noxdr false storage-benchmarks false syslog-local -1 ticker-interval 10 transaction-duplicate-threads 0 transaction-max-ms 1 000 transaction-pending-limit 30 transaction-queues 12 transaction-repeatable-read false transaction-retry-ms 1 000 transaction-threads-per-queue 3 udf-hist-track-back 1 800 udf-hist-track-slice 10 udf-hist-track-thresholds 1,8,64 udf-runtime-gmax-memory 18 446 744 073 709 551 616 udf-runtime-max-memory 18 446 744 073 709 551 616 use-queue-per-device false write-duplicate-resolution-disable false writes_master-hist-track-back 1 800 writes_master-hist-track-slice 10 writes_master-hist-track-thresholds 1,8,64 writes_reply-hist-track-back 1 800 writes_reply-hist-track-slice 10 writes_reply-hist-track-thresholds 1,8,64 xdr-delete-shipping-enabled true xdr-nsup-deletes-enabled false

I cant post full log to discuss. It limit 32000.

Logs repeated like this

Sep 17 2014 11:03:04 GMT: INFO (drv_ssd): (drv_ssd.c::2392) device /dev/sdc4: used 111389286016, contig-free 43048M (344391 wblocks), swb-free 1, w-q 0 w-tot 1694467 (2.8/s), defrag-q 0 defrag-tot 473966 (0.0/s) Sep 17 2014 11:03:04 GMT: INFO (drv_ssd): (drv_ssd.c::2392) device /dev/sdb4: used 111398413824, contig-free 43212M (345703 wblocks), swb-free 1, w-q 0 w-tot 1692087 (2.8/s), defrag-q 0 defrag-tot 472891 (0.0/s) Sep 17 2014 11:03:04 GMT: INFO (drv_ssd): (drv_ssd.c::2392) device /dev/sda4: used 111385923840, contig-free 43077M (344622 wblocks), swb-free 1, w-q 0 w-tot 1696200 (2.8/s), defrag-q 0 defrag-tot 475950 (0.0/s) Sep 17 2014 11:03:04 GMT: INFO (drv_ssd): (drv_ssd.c::2392) device /dev/sdd4: used 111406512128, contig-free 43132M (345056 wblocks), swb-free 2, w-q 0 w-tot 1691485 (2.9/s), defrag-q 0 defrag-tot 471627 (0.0/s) Sep 17 2014 11:03:05 GMT: INFO (info): (thr_info.c::4440) system memory: free 61535320kb ( 46 percent free ) Sep 17 2014 11:03:05 GMT: INFO (info): (thr_info.c::4447) migrates in progress ( 0 , 0 ) ::: ClusterSize 5 ::: objects 1086199654 Sep 17 2014 11:03:05 GMT: INFO (info): (thr_info.c::4455) rec refs 1086227253 ::: rec locks 1 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 1 Sep 17 2014 11:03:05 GMT: INFO (info): (thr_info.c::4461) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 294229959 :: master 0 Sep 17 2014 11:03:05 GMT: INFO (info): (thr_info.c::4471) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (504, 1166401, 1165897) : hb (4, 4, 0) : fab (92, 92, 0) Sep 17 2014 11:03:05 GMT: INFO (info): (thr_info.c::4473) heartbeat_received: self 0 : foreign 708520 Sep 17 2014 11:03:05 GMT: INFO (info): (thr_info.c::4474) heartbeat_stats: bt 0 bf 3 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 Sep 17 2014 11:03:05 GMT: INFO (info): (thr_info.c::4487) tree_counts: nsup 1 scan 0 batch 0 dup 0 wprocess 0 migrx 1 migtx 0 ssdr 0 ssdw 0 rw 0 Sep 17 2014 11:03:05 GMT: INFO (info): (thr_info.c::4518) namespace ssd: disk inuse: 445579900544 memory inuse: 69516777856 (bytes) sindex memory inuse: 0 (bytes) avail pct 22 cache-read pct 36.23 Sep 17 2014 11:03:05 GMT: INFO (info): (thr_info.c::4528) partitions: actual 815 sync 843 desync 0 zombie 0 wait 0 absent 2438 Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::136) histogram dump: reads (525951948 total) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::153) (00: 0509762107) (01: 0006818381) (02: 0005456947) (03: 0002308521) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::153) (04: 0000772551) (05: 0000276130) (06: 0000126077) (07: 0000170670) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::153) (08: 0000217341) (09: 0000028931) (10: 0000013626) (11: 0000000637) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::162) (12: 0000000029) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::136) histogram dump: writes_master (168098851 total) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::153) (00: 0071126321) (01: 0014466098) (02: 0032577534) (03: 0029431645) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::153) (04: 0013158579) (05: 0002909770) (06: 0000917808) (07: 0001153452) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::153) (08: 0001916043) (09: 0000218629) (10: 0000131842) (11: 0000091020) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::162) (12: 0000000110) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::136) histogram dump: proxy (91103 total) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::153) (00: 0000016671) (01: 0000020546) (02: 0000021046) (03: 0000015468) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::153) (04: 0000007003) (05: 0000002359) (06: 0000001413) (07: 0000001511) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::153) (08: 0000001970) (09: 0000001824) (10: 0000001253) (11: 0000000039) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::136) histogram dump: writes_reply (209028290 total) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::153) (00: 0112194337) (01: 0014482123) (02: 0032595772) (03: 0029427767) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::153) (04: 0013157075) (05: 0002910831) (06: 0000918350) (07: 0001137841) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::153) (08: 0001882249) (09: 0000218675) (10: 0000102000) (11: 0000001270) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::136) histogram dump: udf (0 total) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::136) histogram dump: query (0 total) Sep 17 2014 11:03:05 GMT: INFO (info): (hist.c::136) histogram dump: query_rec_count (0 total) Sep 17 2014 11:03:15 GMT: INFO (info): (thr_info.c::4440) system memory: free 61536752kb ( 46 percent free ) Sep 17 2014 11:03:15 GMT: INFO (info): (thr_info.c::4447) migrates in progress ( 0 , 0 ) ::: ClusterSize 5 ::: objects 1086198922 Sep 17 2014 11:03:15 GMT: INFO (info): (thr_info.c::4455) rec refs 1086657017 ::: rec locks 3 ::: trees 0 ::: wr reqs 1 ::: mig tx 0 ::: mig rx 1 Sep 17 2014 11:03:15 GMT: INFO (info): (thr_info.c::4461) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 294229959 :: master 0 Sep 17 2014 11:03:15 GMT: INFO (info): (thr_info.c::4471) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (503, 1166426, 1165923) : hb (4, 4, 0) : fab (92, 92, 0) Sep 17 2014 11:03:15 GMT: INFO (info): (thr_info.c::4473) heartbeat_received: self 0 : foreign 708560 Sep 17 2014 11:03:15 GMT: INFO (info): (thr_info.c::4474) heartbeat_stats: bt 0 bf 3 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 Sep 17 2014 11:03:15 GMT: INFO (info): (thr_info.c::4487) tree_counts: nsup 1 scan 0 batch 0 dup 0 wprocess 0 migrx 1 migtx 0 ssdr 0 ssdw 0 rw 3 Sep 17 2014 11:03:15 GMT: INFO (info): (thr_info.c::4518) namespace ssd: disk inuse: 445578546432 memory inuse: 69516731008 (bytes) sindex memory inuse: 0 (bytes) avail pct 22 cache-read pct 36.26 Sep 17 2014 11:03:15 GMT: INFO (info): (thr_info.c::4528) partitions: actual 815 sync 843 desync 0 zombie 0 wait 0 absent 2438 Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::136) histogram dump: reads (525976977 total) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::153) (00: 0509787102) (01: 0006818393) (02: 0005456962) (03: 0002308523) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::153) (04: 0000772553) (05: 0000276133) (06: 0000126077) (07: 0000170670) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::153) (08: 0000217341) (09: 0000028931) (10: 0000013626) (11: 0000000637) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::162) (12: 0000000029) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::136) histogram dump: writes_master (168098851 total) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::153) (00: 0071126321) (01: 0014466098) (02: 0032577534) (03: 0029431645) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::153) (04: 0013158579) (05: 0002909770) (06: 0000917808) (07: 0001153452) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::153) (08: 0001916043) (09: 0000218629) (10: 0000131842) (11: 0000091020) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::162) (12: 0000000110) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::136) histogram dump: proxy (91103 total) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::153) (00: 0000016671) (01: 0000020546) (02: 0000021046) (03: 0000015468) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::153) (04: 0000007003) (05: 0000002359) (06: 0000001413) (07: 0000001511) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::153) (08: 0000001970) (09: 0000001824) (10: 0000001253) (11: 0000000039) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::136) histogram dump: writes_reply (209036032 total) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::153) (00: 0112202073) (01: 0014482126) (02: 0032595772) (03: 0029427768) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::153) (04: 0013157075) (05: 0002910833) (06: 0000918350) (07: 0001137841) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::153) (08: 0001882249) (09: 0000218675) (10: 0000102000) (11: 0000001270) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::136) histogram dump: udf (0 total) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::136) histogram dump: query (0 total) Sep 17 2014 11:03:15 GMT: INFO (info): (hist.c::136) histogram dump: query_rec_count (0 total) Sep 17 2014 11:03:24 GMT: INFO (drv_ssd): (drv_ssd.c::2392) device /dev/sdc4: used 111388631680, contig-free 43041M (344335 wblocks), swb-free 1, w-q 0 w-tot 1694523 (2.8/s), defrag-q 0 defrag-tot 473966 (0.0/s) Sep 17 2014 11:03:24 GMT: INFO (drv_ssd): (drv_ssd.c::2392) device /dev/sdb4: used 111397711488, contig-free 43206M (345651 wblocks), swb-free 1, w-q 0 w-tot 1692139 (2.6/s), defrag-q 0 defrag-tot 472891 (0.0/s) Sep 17 2014 11:03:24 GMT: INFO (drv_ssd): (drv_ssd.c::2392) device /dev/sda4: used 111385263872, contig-free 43070M (344565 wblocks), swb-free 1, w-q 0 w-tot 1696257 (2.8/s), defrag-q 0 defrag-tot 475950 (0.0/s) Sep 17 2014 11:03:24 GMT: INFO (drv_ssd): (drv_ssd.c::2392) device /dev/sdd4: used 111406003584, contig-free 43124M (344998 wblocks), swb-free 2, w-q 0 w-tot 1691543 (2.9/s), defrag-q 0 defrag-tot 471627 (0.0/s) Sep 17 2014 11:03:25 GMT: INFO (info): (thr_info.c::4440) system memory: free 61542420kb ( 46 percent free ) Sep 17 2014 11:03:25 GMT: INFO (info): (thr_info.c::4447) migrates in progress ( 0 , 0 ) ::: ClusterSize 5 ::: objects 1086198396 Sep 17 2014 11:03:25 GMT: INFO (info): (thr_info.c::4455) rec refs 1086431406 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 1 Sep 17 2014 11:03:25 GMT: INFO (info): (thr_info.c::4461) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 294229959 :: master 0 Sep 17 2014 11:03:25 GMT: INFO (info): (thr_info.c::4471) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (505, 1166447, 1165942) : hb (4, 4, 0) : fab (92, 92, 0) Sep 17 2014 11:03:25 GMT: INFO (info): (thr_info.c::4473) heartbeat_received: self 0 : foreign 708600 Sep 17 2014 11:03:25 GMT: INFO (info): (thr_info.c::4474) heartbeat_stats: bt 0 bf 3 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 Sep 17 2014 11:03:25 GMT: INFO (info): (thr_info.c::4487) tree_counts: nsup 1 scan 0 batch 0 dup 0 wprocess 0 migrx 1 migtx 0 ssdr 0 ssdw 0 rw 2 Sep 17 2014 11:03:25 GMT: INFO (info): (thr_info.c::4518) namespace ssd: disk inuse: 445577440512 memory inuse: 69516697344 (bytes) sindex memory inuse: 0 (bytes) avail pct 21 cache-read pct 36.18 Sep 17 2014 11:03:25 GMT: INFO (info): (thr_info.c::4528) partitions: actual 815 sync 843 desync 0 zombie 0 wait 0 absent 2438 Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::136) histogram dump: reads (526002414 total) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::153) (00: 0509812503) (01: 0006818408) (02: 0005456974) (03: 0002308526) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::153) (04: 0000772555) (05: 0000276137) (06: 0000126077) (07: 0000170670) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::153) (08: 0000217341) (09: 0000028931) (10: 0000013626) (11: 0000000637) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::162) (12: 0000000029) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::136) histogram dump: writes_master (168098851 total) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::153) (00: 0071126321) (01: 0014466098) (02: 0032577534) (03: 0029431645) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::153) (04: 0013158579) (05: 0002909770) (06: 0000917808) (07: 0001153452) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::153) (08: 0001916043) (09: 0000218629) (10: 0000131842) (11: 0000091020) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::162) (12: 0000000110) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::136) histogram dump: proxy (91103 total) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::153) (00: 0000016671) (01: 0000020546) (02: 0000021046) (03: 0000015468) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::153) (04: 0000007003) (05: 0000002359) (06: 0000001413) (07: 0000001511) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::153) (08: 0000001970) (09: 0000001824) (10: 0000001253) (11: 0000000039) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::136) histogram dump: writes_reply (209043944 total) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::153) (00: 0112209985) (01: 0014482126) (02: 0032595772) (03: 0029427768) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::153) (04: 0013157075) (05: 0002910833) (06: 0000918350) (07: 0001137841) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::153) (08: 0001882249) (09: 0000218675) (10: 0000102000) (11: 0000001270) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::136) histogram dump: udf (0 total) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::136) histogram dump: query (0 total) Sep 17 2014 11:03:25 GMT: INFO (info): (hist.c::136) histogram dump: query_rec_count (0 total) Sep 17 2014 11:03:35 GMT: INFO (info): (thr_info.c::4440) system memory: free 61536036kb ( 46 percent free ) Sep 17 2014 11:03:35 GMT: INFO (info): (thr_info.c::4447) migrates in progress ( 0 , 0 ) ::: ClusterSize 5 ::: objects 1086198104 Sep 17 2014 11:03:35 GMT: INFO (info): (thr_info.c::4455) rec refs 1086643151 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 1 Sep 17 2014 11:03:35 GMT: INFO (info): (thr_info.c::4461) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 294229959 :: master 0 Sep 17 2014 11:03:35 GMT: INFO (info): (thr_info.c::4471) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (502, 1166465, 1165963) : hb (4, 4, 0) : fab (92, 92, 0) Sep 17 2014 11:03:35 GMT: INFO (info): (thr_info.c::4473) heartbeat_received: self 0 : foreign 708640 Sep 17 2014 11:03:35 GMT: INFO (info): (thr_info.c::4474) heartbeat_stats: bt 0 bf 3 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 Sep 17 2014 11:03:35 GMT: INFO (info): (thr_info.c::4487) tree_counts: nsup 1 scan 0 batch 0 dup 0 wprocess 0 migrx 1 migtx 0 ssdr 0 ssdw 0 rw 0 Sep 17 2014 11:03:35 GMT: INFO (info): (thr_info.c::4518) namespace ssd: disk inuse: 445576680576 memory inuse: 69516678656 (bytes) sindex memory inuse: 0 (bytes) avail pct 21 cache-read pct 35.86 Sep 17 2014 11:03:35 GMT: INFO (info): (thr_info.c::4528) partitions: actual 815 sync 843 desync 0 zombie 0 wait 0 absent 2438 Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::136) histogram dump: reads (526027543 total) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::153) (00: 0509837605) (01: 0006818422) (02: 0005456982) (03: 0002308527) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::153) (04: 0000772556) (05: 0000276140) (06: 0000126077) (07: 0000170670) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::153) (08: 0000217341) (09: 0000028931) (10: 0000013626) (11: 0000000637) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::162) (12: 0000000029) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::136) histogram dump: writes_master (168098851 total) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::153) (00: 0071126321) (01: 0014466098) (02: 0032577534) (03: 0029431645) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::153) (04: 0013158579) (05: 0002909770) (06: 0000917808) (07: 0001153452) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::153) (08: 0001916043) (09: 0000218629) (10: 0000131842) (11: 0000091020) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::162) (12: 0000000110) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::136) histogram dump: proxy (91103 total) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::153) (00: 0000016671) (01: 0000020546) (02: 0000021046) (03: 0000015468) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::153) (04: 0000007003) (05: 0000002359) (06: 0000001413) (07: 0000001511) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::153) (08: 0000001970) (09: 0000001824) (10: 0000001253) (11: 0000000039) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::136) histogram dump: writes_reply (209051652 total) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::153) (00: 0112217692) (01: 0014482126) (02: 0032595772) (03: 0029427768) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::153) (04: 0013157075) (05: 0002910834) (06: 0000918350) (07: 0001137841) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::153) (08: 0001882249) (09: 0000218675) (10: 0000102000) (11: 0000001270) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::136) histogram dump: udf (0 total) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::136) histogram dump: query (0 total) Sep 17 2014 11:03:35 GMT: INFO (info): (hist.c::136) histogram dump: query_rec_count (0 total) Sep 17 2014 11:03:44 GMT: INFO (drv_ssd): (drv_ssd.c::2392) device /dev/sdc4: used 111388163968, contig-free 43035M (344280 wblocks), swb-free 1, w-q 0 w-tot 1694578 (2.8/s), defrag-q 0 defrag-tot 473966 (0.0/s) Sep 17 2014 11:03:44 GMT: INFO (drv_ssd): (drv_ssd.c::2392) device /dev/sdb4: used 111397290880, contig-free 43199M (345597 wblocks), swb-free 1, w-q 0 w-tot 1692193 (2.7/s), defrag-q 0 defrag-tot 472891 (0.0/s) Sep 17 2014 11:03:44 GMT: INFO (drv_ssd): (drv_ssd.c::2392) device /dev/sdd4: used 111405487232, contig-free 43117M (344942 wblocks), swb-free 2, w-q 0 w-tot 1691599 (2.8/s), defrag-q 0 defrag-tot 471627 (0.0/s) Sep 17 2014 11:03:44 GMT: INFO (drv_ssd): (drv_ssd.c::2392) device /dev/sda4: used 111384907904, contig-free 43063M (344508 wblocks), swb-free 1, w-q 0 w-tot 1696314 (2.8/s), defrag-q 0 defrag-tot 475950 (0.0/s) Sep 17 2014 11:03:45 GMT: INFO (info): (thr_info.c::4440) system memory: free 61536076kb ( 46 percent free ) Sep 17 2014 11:03:45 GMT: INFO (info): (thr_info.c::4447) migrates in progress ( 0 , 0 ) ::: ClusterSize 5 ::: objects 1086197648 Sep 17 2014 11:03:45 GMT: INFO (info): (thr_info.c::4455) rec refs 1086782819 ::: rec locks 1 ::: trees 0 ::: wr reqs 1 ::: mig tx 0 ::: mig rx 1 Sep 17 2014 11:03:45 GMT: INFO (info): (thr_info.c::4461) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 294229959 :: master 0 Sep 17 2014 11:03:45 GMT: INFO (info): (thr_info.c::4471) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (502, 1166484, 1165982) : hb (4, 4, 0) : fab (92, 92, 0) Sep 17 2014 11:03:45 GMT: INFO (info): (thr_info.c::4473) heartbeat_received: self 0 : foreign 708680 Sep 17 2014 11:03:45 GMT: INFO (info): (thr_info.c::4474) heartbeat_stats: bt 0 bf 3 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 Sep 17 2014 11:03:45 GMT: INFO (info): (thr_info.c::4487) tree_counts: nsup 1 scan 0 batch 0 dup 0 wprocess 0 migrx 1 migtx 0 ssdr 0 ssdw 0 rw 0 Sep 17 2014 11:03:45 GMT: INFO (info): (thr_info.c::4518) namespace ssd: disk inuse: 445575711488 memory inuse: 69516649472 (bytes) sindex memory inuse: 0 (bytes) avail pct 21 cache-read pct 36.35 Sep 17 2014 11:03:45 GMT: INFO (info): (thr_info.c::4528) partitions: actual 815 sync 843 desync 0 zombie 0 wait 0 absent 2438 Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::136) histogram dump: reads (526052763 total) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::153) (00: 0509862803) (01: 0006818428) (02: 0005456993) (03: 0002308527) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::153) (04: 0000772557) (05: 0000276143) (06: 0000126078) (07: 0000170670) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::153) (08: 0000217341) (09: 0000028931) (10: 0000013626) (11: 0000000637) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::162) (12: 0000000029) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::136) histogram dump: writes_master (168098851 total) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::153) (00: 0071126321) (01: 0014466098) (02: 0032577534) (03: 0029431645) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::153) (04: 0013158579) (05: 0002909770) (06: 0000917808) (07: 0001153452) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::153) (08: 0001916043) (09: 0000218629) (10: 0000131842) (11: 0000091020) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::162) (12: 0000000110) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::136) histogram dump: proxy (91103 total) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::153) (00: 0000016671) (01: 0000020546) (02: 0000021046) (03: 0000015468) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::153) (04: 0000007003) (05: 0000002359) (06: 0000001413) (07: 0000001511) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::153) (08: 0000001970) (09: 0000001824) (10: 0000001253) (11: 0000000039) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::136) histogram dump: writes_reply (209059398 total) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::153) (00: 0112225435) (01: 0014482126) (02: 0032595774) (03: 0029427769) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::153) (04: 0013157075) (05: 0002910834) (06: 0000918350) (07: 0001137841) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::153) (08: 0001882249) (09: 0000218675) (10: 0000102000) (11: 0000001270) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::136) histogram dump: udf (0 total) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::136) histogram dump: query (0 total) Sep 17 2014 11:03:45 GMT: INFO (info): (hist.c::136) histogram dump: query_rec_count (0 total) Sep 17 2014 11:03:55 GMT: INFO (info): (thr_info.c::4440) system memory: free 61536000kb ( 46 percent free ) Sep 17 2014 11:03:55 GMT: INFO (info): (thr_info.c::4447) migrates in progress ( 0 , 0 ) ::: ClusterSize 5 ::: objects 1086197065 Sep 17 2014 11:03:55 GMT: INFO (info): (thr_info.c::4455) rec refs 1086836548 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 1 Sep 17 2014 11:03:55 GMT: INFO (info): (thr_info.c::4461) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 294229959 :: master 0 Sep 17 2014 11:03:55 GMT: INFO (info): (thr_info.c::4471) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (504, 1166505, 1166001) : hb (4, 4, 0) : fab (92, 92, 0) Sep 17 2014 11:03:55 GMT: INFO (info): (thr_info.c::4473) heartbeat_received: self 0 : foreign 708720 Sep 17 2014 11:03:55 GMT: INFO (info): (thr_info.c::4474) heartbeat_stats: bt 0 bf 3 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 Sep 17 2014 11:03:55 GMT: INFO (info): (thr_info.c::4487) tree_counts: nsup 1 scan 0 batch 0 dup 0 wprocess 0 migrx 1 migtx 0 ssdr 0 ssdw 0 rw 0 Sep 17 2014 11:03:55 GMT: INFO (info): (thr_info.c::4518) namespace ssd: disk inuse: 445574517248 memory inuse: 69516612160 (bytes) sindex memory inuse: 0 (bytes) avail pct 21 cache-read pct 36.07 Sep 17 2014 11:03:55 GMT: INFO (info): (thr_info.c::4528) partitions: actual 815 sync 843 desync 0 zombie 0 wait 0 absent 2438 Sep 17 2014 11:03:55 GMT: INFO (info): (hist.c::136) histogram dump: reads (526077829 total)

On that moment:

How do I clean the disks of excess garbage? Garbage occupies almost 20% of the cluster.

Stats:

batch_errors 0 batch_initiate 0 batch_queue 0 batch_timeout 0 batch_tree_count 0 client_connections 476 cluster_integrity true cluster_key 3CC26E756378F8F9 cluster_size 5 data-used-bytes-memory 0 err_duplicate_proxy_request 349 err_out_of_space 0 err_replica_non_null_node 0 err_replica_null_node 0 err_rw_cant_put_unique 0 err_rw_pending_limit 1 err_rw_request_not_found 2 760 err_storage_queue_full 0 err_sync_copy_null_master 0 err_sync_copy_null_node 458 635 612 err_tsvc_requests 10 227 557 err_write_fail_bin_exists 0 err_write_fail_generation 15 211 512 err_write_fail_generation_xdr 0 err_write_fail_incompatible_type 0 err_write_fail_key_exists 0 err_write_fail_key_mismatch 0 err_write_fail_not_found 469 err_write_fail_noxdr 0 err_write_fail_parameter 0 err_write_fail_prole_delete 466 072 err_write_fail_prole_generation 0 err_write_fail_prole_unknown 0 err_write_fail_unknown 0 fabric_msgs_rcvd 5 132 753 504 fabric_msgs_sent 5 132 744 578 free-pct-disk 54 free-pct-memory 49 heartbeat_received_foreign 1 266 336 heartbeat_received_self 0 index-used-bytes-memory 67 702 617 728 info_queue 0 migrate_msgs_recv 3 184 744 011 migrate_msgs_sent 3 184 152 708 migrate_num_incoming_accepted 7 139 migrate_num_incoming_refused 69 170 migrate_progress_recv 0 migrate_progress_send 0 migrate_rx_objs 0 migrate_tx_objs 0 objects 1 057 853 402 ongoing_write_reqs 0 partition_absent 2 486 partition_actual 820 partition_desync 0 partition_object_count 1 057 853 574 partition_ref_count 4 104 partition_replica 790 paxos_principal BB970D95F3ACAB8 proxy_action 302 775 562 proxy_initiate 49 654 proxy_retry 422 proxy_retry_new_dest 0 proxy_retry_q_full 0 proxy_retry_same_dest 0 proxy_unproxy 0 query_abort 0 query_agg 0 query_agg_abort 0 query_agg_avg_rec_count 0 query_agg_err 0 query_agg_success 0 query_avg_rec_count 0 query_fail 1 query_long_queue_full 0 query_long_running 0 query_lookup_abort 0 query_lookup_avg_rec_count 0 query_lookup_err 0 query_lookup_success 0 query_lookups 0 query_reqs 1 query_short_queue_full 0 query_short_running 0 query_success 0 query_tracked 0 queue 0 read_dup_prole 1 089 009 400 reaped_fds 1 106 record_locks 0 record_refs 1 058 410 341 rw_err_ack_badnode 0 rw_err_ack_internal 0 rw_err_ack_nomatch 431 413 rw_err_dup_cluster_key 173 246 373 rw_err_dup_internal 0 rw_err_dup_send 0 rw_err_write_cluster_key 0 rw_err_write_internal 0 rw_err_write_send 0 sindex-used-bytes-memory 0 sindex_gc_activity_dur 0 sindex_gc_garbage_cleaned 0 sindex_gc_garbage_found 0 sindex_gc_inactivity_dur 0 sindex_gc_list_creation_time 0 sindex_gc_list_deletion_time 0 sindex_gc_locktimedout 0 sindex_gc_objects_validated 0 sindex_ucgarbage_found 0 stat_cluster_key_err_ack_dup_trans_reenqueue 152 331 stat_cluster_key_err_ack_rw_trans_reenqueue 28 stat_cluster_key_partition_transaction_queue_count 0 stat_cluster_key_prole_retry 0 stat_cluster_key_regular_processed 0 stat_cluster_key_trans_to_proxy_retry 7 697 216 stat_cluster_key_transaction_reenqueue 0 stat_delete_success 33 571 853 stat_deleted_set_objects 0 stat_duplicate_operation 293 334 530 stat_evicted_objects 0 stat_evicted_objects_time 0 stat_evicted_set_objects 0 stat_expired_objects 20 196 681 stat_nsup_deletes_not_shipped 12 244 605 stat_proxy_errs 257 stat_proxy_reqs 49 654 stat_proxy_reqs_xdr 0 stat_proxy_success 43 762 stat_read_errs_notfound 57 488 184 stat_read_errs_other 0 stat_read_reqs 826 066 280 stat_read_reqs_xdr 0 stat_read_success 768 385 811 stat_rw_timeout 182 680 stat_slow_trans_queue_batch_pop 18 627 stat_slow_trans_queue_pop 24 536 175 stat_slow_trans_queue_push 24 536 175 stat_write_errs 11 278 533 stat_write_errs_notfound 9 stat_write_errs_other 11 278 524 stat_write_reqs 411 162 084 stat_write_reqs_xdr 0 stat_write_success 399 740 883 stat_xdr_pipe_miss 0 stat_xdr_pipe_writes 0 stat_zero_bin_records 0 storage_defrag_corrupt_record 0 system_free_mem_pct 37 system_swapping false total-bytes-disk 947 268 878 336 total-bytes-memory 133 143 986 176 transactions 996 711 321 tree_count 0 tscan_aborted 0 tscan_initiate 2 tscan_pending 0 tscan_succeeded 2 udf_delete_err_others 0 udf_delete_reqs 0 udf_delete_success 0 udf_lua_errs 0 udf_query_rec_reqs 0 udf_read_errs_other 0 udf_read_reqs 0 udf_read_success 0 udf_replica_writes 0 udf_scan_rec_reqs 0 udf_write_err_others 0 udf_write_reqs 0 udf_write_success 0 uptime 260 653 used-bytes-disk 433 669 163 136 used-bytes-memory 67 702 617 728 waiting_transactions 0 write_master 676 693 097 write_prole 675 884 666

Hi,

Can I know what your defrag-lwm-pct is set to? You can grab that info out of the aerospike.conf. Or if not you can also run the following command : asmonitor -e “asinfo -v “namespace/ssd””

We need to know what your defrag-lwm-pct is set to. Since your defrag is not keeping up with your writes. Also that 20% is un-defraged space, it is not garbage space.

Jerry

Hi Previously set in “50” We had structural changes, and the entire cluster was restarted. Now is “99”

~ # asmonitor -e “asinfo -v “namespace/ssd””

Enter help for commands

3 hosts in cluster: host1:3000,host2:3000,host3:3000 -v namespace/ssd host1:3000 returned : type=device objects=1619078594 master-objects=890416207 prole-objects=644094879 expired-objects=1828941 evicted-objects=0 set-deleted-objects=0 set-evicted-objects=0 used-bytes-memory=103621030016 data-used-bytes-memory=0 index-used-bytes-memory=103621030016 sindex-used-bytes-memory=0 free-pct-memory=18 max-void-time=151831847 non-expirable-objects=886198727 current-time=149239850 stop-writes=false hwm-breached=false available-bin-names=32747 ldt_reads=0 ldt_read_success=0 ldt_deletes=0 ldt_delete_success=0 ldt_writes=0 ldt_write_success=0 ldt_updates=0 ldt_errors=0 used-bytes-disk=647123851904 free-pct-disk=21 available_pct=18 cache-read-pct=10 sets-enable-xdr=true memory-size=126701535232 high-water-disk-pct=99 high-water-memory-pct=99 evict-tenths-pct=5 stop-writes-pct=99 cold-start-evict-ttl=4294967295 repl-factor=2 default-ttl=0 max-ttl=0 conflict-resolution-policy=generation allow_versions=false single-bin=false enable-xdr=false disallow-null-setname=false total-bytes-memory=126701535232 total-bytes-disk=820623966208 defrag-lwm-pct=99 defrag-sleep=1000 defrag-startup-minimum=10 write-smoothing-period=0 max-write-cache=67108864 min-avail-pct=1 post-write-queue=256 data-in-memory=false dev=/dev/sda4 dev=/dev/sdb4 dev=/dev/sdc4 dev=/dev/sdd4 filesize=17179869184 writethreads=1 writecache=67108864 obj-size-hist-max=100

host2:3000 returned : type=device objects=1780367579 master-objects=895351945 prole-objects=875250666 expired-objects=2300464 evicted-objects=0 set-deleted-objects=0 set-evicted-objects=0 used-bytes-memory=113943525056 data-used-bytes-memory=0 index-used-bytes-memory=113943525056 sindex-used-bytes-memory=0 free-pct-memory=10 max-void-time=151831847 non-expirable-objects=891134072 current-time=149239850 stop-writes=false hwm-breached=false available-bin-names=32746 ldt_reads=0 ldt_read_success=0 ldt_deletes=0 ldt_delete_success=0 ldt_writes=0 ldt_write_success=0 ldt_updates=0 ldt_errors=0 used-bytes-disk=710912081920 free-pct-disk=13 available_pct=10 cache-read-pct=2 sets-enable-xdr=true memory-size=126701535232 high-water-disk-pct=99 high-water-memory-pct=99 evict-tenths-pct=5 stop-writes-pct=99 cold-start-evict-ttl=4294967295 repl-factor=2 default-ttl=0 max-ttl=0 conflict-resolution-policy=generation allow_versions=false single-bin=false enable-xdr=false disallow-null-setname=false total-bytes-memory=126701535232 total-bytes-disk=820623966208 defrag-lwm-pct=99 defrag-sleep=1000 defrag-startup-minimum=10 write-smoothing-period=0 max-write-cache=67108864 min-avail-pct=1 post-write-queue=256 data-in-memory=false dev=/dev/sda4 dev=/dev/sdb4 dev=/dev/sdc4 dev=/dev/sdd4 filesize=17179869184 writethreads=1 writecache=67108864 obj-size-hist-max=100

host3:3000 returned : type=device objects=1374769499 master-objects=915345196 prole-objects=212538455 expired-objects=1887511 evicted-objects=0 set-deleted-objects=0 set-evicted-objects=0 used-bytes-memory=87985247936 data-used-bytes-memory=0 index-used-bytes-memory=87985247936 sindex-used-bytes-memory=0 free-pct-memory=30 max-void-time=151831834 non-expirable-objects=910993878 current-time=149239850 stop-writes=false hwm-breached=false available-bin-names=32747 ldt_reads=0 ldt_read_success=0 ldt_deletes=0 ldt_delete_success=0 ldt_writes=0 ldt_write_success=0 ldt_updates=0 ldt_errors=0 used-bytes-disk=550839482880 free-pct-disk=32 available_pct=28 cache-read-pct=3 sets-enable-xdr=true memory-size=126701535232 high-water-disk-pct=99 high-water-memory-pct=99 evict-tenths-pct=5 stop-writes-pct=99 cold-start-evict-ttl=4294967295 repl-factor=2 default-ttl=0 max-ttl=0 conflict-resolution-policy=generation allow_versions=false single-bin=false enable-xdr=false disallow-null-setname=false total-bytes-memory=126701535232 total-bytes-disk=820623966208 defrag-lwm-pct=99 defrag-sleep=1000 defrag-startup-minimum=10 write-smoothing-period=0 max-write-cache=67108864 min-avail-pct=1 post-write-queue=256 data-in-memory=false dev=/dev/sda4 dev=/dev/sdb4 dev=/dev/sdc4 dev=/dev/sdd4 filesize=17179869184 writethreads=1 writecache=67108864 obj-size-hist-max=100

Now we wait for the end of migration Thanks

I would recommend moving your defrag-lwm-pct back to 50. By putting defrag-lwm-pct to that high of a percentage (99) means more blocks to be defragmented, and more dense data on the disk which may lead to latency.

Also is there a reason why your stop writes, high-water-disk-pct and high-water-memory-pct is at 99 pct? Since I believe we recommend the HWM to be around 50-60% & stop-writes to be around 90%.

Jerry

A post was split to a new topic: Memory errors