Server full error when not full

Hello,

One of the nodes in the cluster started to return a "Server full" exception; the relevant console/log output is included below.

The bad thing is that it has happened multiple times on different nodes. We worked around it by deleting the data file on the affected node and letting it re-replicate from the other nodes, but the problem seems systematic.

What seems strange is that the node reports used 18%, available 0%, so the two do not sum to 100%. They do not sum to 100% on the other nodes either, but there the gap is only around 10%. Otherwise, all of the node's statistics look comparable to the other nodes.

I am attaching part of the log file and part of the gdb output. I also have a backup of the “full” data file (which seems almost empty), as well as the output of the asmonitor collectinfo tool.

Any ideas on how to investigate the problem?

Jun 09 2016 10:17:25 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /www/aerospike/data/er.dat: used 1013766400, contig-free 15169M (15169 wblocks), swb-free 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s) defrag-w-tot 0 (0.0/s)
Jun 09 2016 10:17:37 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /www/aerospike/data/abox.dat: used 0, contig-free 1021M (1021 wblocks), swb-free 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 1 (0.0/s) defrag-w-tot 0 (0.0/s)
Jun 09 2016 10:17:45 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /www/aerospike/data/er.dat: used 1013766400, contig-free 15169M (15169 wblocks), swb-free 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s) defrag-w-tot 0

Jun 09 2016 07:45:43 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /opt/aerospike/data/aero_scan.dat: used 22505230592, contig-free 768M (768 wblocks), swb-free 15, w-q 0 w-tot 246486275 (219.3/s), defrag-q 95585 defrag-tot 246493114 (220.3/s) defrag-w-tot 103630785 (3.5/s)

Jun 09 2016 07:45:43 GMT: WARNING (rw): (thr_rw.c::2453) {scan}: write_local_pickled: drives full
Jun 09 2016 07:45:43 GMT: WARNING (rw): (thr_rw.c::2453) {scan}: write_local_pickled: drives full
Jun 09 2016 07:45:43 GMT: WARNING (rw): (thr_rw.c::3418) {scan}: write_local: drives full
Jun 09 2016 07:45:43 GMT: WARNING (rw): (thr_rw.c::2453) {scan}: write_local_pickled: drives full
Jun 09 2016 07:45:43 GMT: WARNING (rw): (thr_rw.c::3418) {scan}: write_local: drives full
Jun 09 2016 07:45:43 GMT: WARNING (rw): (thr_rw.c::3418) {scan}: write_local: drives full
Jun 09 2016 07:45:43 GMT: WARNING (rw): (thr_rw.c::2453) {scan}: write_local_pickled: drives full
Jun 09 2016 07:45:43 GMT: INFO (rw): (thr_rw.c::2861) [NOTICE] writing pickled failed(-1):<Digest>:0x0d6f6a70b1a4eead46240641749ccbb0f3b4e30c
Jun 09 2016 07:45:43 GMT: INFO (rw): (thr_rw.c::2861) [NOTICE] writing pickled failed(-1):<Digest>:0x89ad25997e10640b27ee4bc2d36c1a35497cd655
Jun 09 2016 07:45:43 GMT: INFO (rw): (thr_rw.c::2861) [NOTICE] writing pickled failed(-1):<Digest>:0x5af448122da58488bd4ac1b07371346bb7f15afd

Defrag appears to be unable to keep up with your write load on the underlying storage devices.

You are writing at 219.3 wblocks per second (see write-block-size in the config), while defrag is processing 220.3 wblocks per second and, after compacting them, writing the surviving records back at 3.5 wblocks per second. The write rate is effectively outpacing the rate at which defrag hands free wblocks back — note the defrag-q backlog of 95,585 wblocks and only 768 contiguous free wblocks left — resulting in the runaway situation here. This also explains the odd percentages: "available" measures contiguous free wblocks, so it can reach 0% even while raw used space is only 18%.
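If it helps to confirm this on the affected node, something like the following should show the contiguous-free percentage directly (a sketch — it assumes the namespace is named scan, matching the data-file name in your log, and a 3.x server where the statistic is called available_pct):

asinfo -v 'namespace/scan' | tr ';' '\n' | grep available_pct

When available_pct falls below min-avail-pct (5% by default), the node stops accepting writes, which is what the client should be seeing as the "Server full" error even though used-bytes is low.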

You can increase the defrag rate by reducing defrag-sleep, which by default is 1 ms (1000 µs) of sleep per wblock read.
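For reference, a hedged sketch of where that knob lives — the namespace name scan and the value 500 are only examples, not a recommendation:

# statically, in aerospike.conf, inside the namespace's storage-engine block
namespace scan {
    ...
    storage-engine device {
        file /opt/aerospike/data/aero_scan.dat
        ...
        defrag-sleep 500   # microseconds to sleep per defragged wblock (default 1000 = 1 ms)
    }
}

# or dynamically, without a restart
asinfo -v 'set-config:context=namespace;id=scan;defrag-sleep=500'

Lowering defrag-sleep lets the defrag thread read wblocks more often, at the cost of some extra device I/O; keep an eye on the defrag-q and contig-free figures in the drv_ssd log lines after changing it to confirm defrag is catching up.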

Thanks a lot, reducing defrag-sleep helped.