Hi, I have 3 aerospike nodes installed on linux OS.
I am working with a Java client from a linux machine.
Aerospike Version: Enterprise Edition build 3.5.15
In a certain point of time, one of the aerospike nodes stops responding to scans.
This is a grep of “scan job received” and “SCAN JOB DONE” in the aerospike.log:
Jan 20 2016 13:26:05 GMT: INFO (scan): (thr_tscan.c::387) SCAN JOB DONE [id =23597241221030: ns= regional set=subscribers scanned=6539 expired=0 set_diff=0 elapsed=15 (ms)]
Jan 20 2016 13:26:05 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:05 GMT: INFO (scan): (thr_tscan.c::387) SCAN JOB DONE [id =23597268704158: ns= regional set=subscribers scanned=6539 expired=0 set_diff=0 elapsed=15 (ms)]
Jan 20 2016 13:26:05 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:05 GMT: INFO (scan): (thr_tscan.c::387) SCAN JOB DONE [id =23597295959600: ns= regional set=subscribers scanned=6539 expired=0 set_diff=0 elapsed=16 (ms)]
Jan 20 2016 13:26:05 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:05 GMT: INFO (scan): (thr_tscan.c::387) SCAN JOB DONE [id =23597323034699: ns= regional set=subscribers scanned=6539 expired=0 set_diff=0 elapsed=16 (ms)]
Jan 20 2016 13:26:05 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:06 GMT: INFO (scan): (thr_tscan.c::387) SCAN JOB DONE [id =23597350354171: ns= regional set=subscribers scanned=6539 expired=0 set_diff=0 elapsed=15 (ms)]
Jan 20 2016 13:26:06 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:06 GMT: INFO (scan): (thr_tscan.c::387) SCAN JOB DONE [id =23597376914686: ns= regional set=subscribers scanned=6539 expired=0 set_diff=0 elapsed=15 (ms)]
Jan 20 2016 13:26:09 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:09 GMT: INFO (scan): (thr_tscan.c::387) SCAN JOB DONE [id =23604106355837: ns= global set=status scanned=9 expired=0 set_diff=126 elapsed=13 (ms)]
Jan 20 2016 13:26:14 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:14 GMT: INFO (scan): (thr_tscan.c::387) SCAN JOB DONE [id =23609107672641: ns= global set=status scanned=9 expired=0 set_diff=126 elapsed=13 (ms)]
Jan 20 2016 13:26:17 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:18 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:19 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:37 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:55 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:27:06 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:27:13 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:27:23 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:27:31 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:27:40 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:27:49 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:27:55 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:27:59 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:28:07 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:28:12 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:28:16 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:28:26 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:28:29 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:28:33 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:28:44 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:28:44 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:28:48 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:29:01 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:29:02 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:29:05 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:29:18 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:29:20 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:29:22 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:29:33 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:29:34 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:29:37 GMT: INFO (scan): (thr_tscan.c::871) scan job received
This is the output of the last succesfull scan + 2 following failed scans, and the log lines around it:
Jan 20 2016 13:26:14 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:14 GMT: INFO (scan): (thr_tscan.c::922) scan_option 0x8 0x64
Jan 20 2016 13:26:14 GMT: INFO (scan): (thr_tscan.c::979) NO bins specified select all
Jan 20 2016 13:26:14 GMT: INFO (scan): (thr_tscan.c::1013) scan option: Fail if cluster change True
Jan 20 2016 13:26:14 GMT: INFO (scan): (thr_tscan.c::1014) scan option: Background Job False
Jan 20 2016 13:26:14 GMT: INFO (scan): (thr_tscan.c::1015) scan option: priority is 0 n_threads 1 job_type 1
Jan 20 2016 13:26:14 GMT: INFO (scan): (thr_tscan.c::1016) scan option: scan_pct is 100
Jan 20 2016 13:26:14 GMT: INFO (scan): (thr_tscan.c::387) SCAN JOB DONE [id =23609107672641: ns= global set=status scanned=9 expired=0 set_diff=126 elapsed=13 (ms)]
Jan 20 2016 13:26:14 GMT: INFO (scan): (thr_tscan.c::1459) Scan Job 23609107672641: send final message: fd 99 result 0
Jan 20 2016 13:26:17 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:17 GMT: INFO (scan): (thr_tscan.c::922) scan_option 0x0 0x64
Jan 20 2016 13:26:17 GMT: INFO (scan): (thr_tscan.c::1013) scan option: Fail if cluster change False
Jan 20 2016 13:26:17 GMT: INFO (scan): (thr_tscan.c::1014) scan option: Background Job False
Jan 20 2016 13:26:17 GMT: INFO (scan): (thr_tscan.c::1015) scan option: priority is 0 n_threads 1 job_type 1
Jan 20 2016 13:26:17 GMT: INFO (scan): (thr_tscan.c::1016) scan option: scan_pct is 100
Jan 20 2016 13:26:18 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:18 GMT: INFO (scan): (thr_tscan.c::922) scan_option 0x0 0x64
Jan 20 2016 13:26:18 GMT: INFO (scan): (thr_tscan.c::1013) scan option: Fail if cluster change False
Jan 20 2016 13:26:18 GMT: INFO (scan): (thr_tscan.c::1014) scan option: Background Job False
Jan 20 2016 13:26:18 GMT: INFO (scan): (thr_tscan.c::1015) scan option: priority is 0 n_threads 1 job_type 1
Jan 20 2016 13:26:18 GMT: INFO (scan): (thr_tscan.c::1016) scan option: scan_pct is 100
Jan 20 2016 13:26:19 GMT: INFO (scan): (thr_tscan.c::871) scan job received
Jan 20 2016 13:26:19 GMT: INFO (scan): (thr_tscan.c::922) scan_option 0x8 0x64
Jan 20 2016 13:26:19 GMT: INFO (scan): (thr_tscan.c::979) NO bins specified select all
Jan 20 2016 13:26:19 GMT: INFO (scan): (thr_tscan.c::1013) scan option: Fail if cluster change True
Jan 20 2016 13:26:19 GMT: INFO (scan): (thr_tscan.c::1014) scan option: Background Job False
Jan 20 2016 13:26:19 GMT: INFO (scan): (thr_tscan.c::1015) scan option: priority is 0 n_threads 1 job_type 1
Jan 20 2016 13:26:19 GMT: INFO (scan): (thr_tscan.c::1016) scan option: scan_pct is 100
Jan 20 2016 13:26:20 GMT: INFO (info): (thr_info.c::4796) system memory: free 10559952kb ( 64 percent free )
Jan 20 2016 13:26:20 GMT: INFO (info): (thr_info.c::4804) migrates in progress ( 0 , 0 ) ::: ClusterSize 3 ::: objects 14155 ::: sub_objects 0
Jan 20 2016 13:26:20 GMT: INFO (info): (thr_info.c::4812) rec refs 14155 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Jan 20 2016 13:26:20 GMT: INFO (info): (thr_info.c::4818) replica errs :: null 2168281804 non-null 0 ::: sync copy errs :: node 0 :: master 0
Jan 20 2016 13:26:20 GMT: INFO (info): (thr_info.c::4828) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (30, 461, 431) : hb (3, 8, 5) : fab (42, 77, 35)
Jan 20 2016 13:26:20 GMT: INFO (info): (thr_info.c::4830) heartbeat_received: self 0 : foreign 679844
Jan 20 2016 13:26:20 GMT: INFO (info): (thr_info.c::4831) heartbeat_stats: bt 0 bf 452716 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 5 efd 4 efa 1 um 0 mcf 0 rc 5
Jan 20 2016 13:26:20 GMT: INFO (info): (thr_info.c::4844) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Jan 20 2016 13:26:20 GMT: INFO (info): (thr_info.c::4860) namespace global: disk inuse: 235648 memory inuse: 145538 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
Jan 20 2016 13:26:20 GMT: INFO (info): (thr_info.c::4860) namespace regional: disk inuse: 14065792 memory inuse: 10161116 (bytes) sindex memory inuse: 538232 (bytes) avail pct 99
Jan 20 2016 13:26:20 GMT: INFO (info): (thr_info.c::4860) namespace local: disk inuse: 512 memory inuse: 222 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
Jan 20 2016 13:26:20 GMT: INFO (info): (thr_info.c::4905) partitions: actual 4053 sync 6848 desync 0 zombie 0 wait 0 absent 1387
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::137) histogram dump: reads (9392607 total) msec
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::154) (00: 0009392428) (01: 0000000138) (02: 0000000031) (03: 0000000008)
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::163) (04: 0000000001) (06: 0000000001)
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (2607768 total) msec
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::154) (00: 0002602724) (01: 0000003383) (02: 0000000738) (03: 0000000557)
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::154) (04: 0000000292) (05: 0000000060) (06: 0000000002) (07: 0000000001)
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::163) (08: 0000000005) (09: 0000000006)
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (4203086 total) msec
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::154) (00: 0004196092) (01: 0000004486) (02: 0000000976) (03: 0000000766)
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::154) (04: 0000000554) (05: 0000000194) (06: 0000000006) (07: 0000000001)
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::163) (08: 0000000005) (09: 0000000006)
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::137) histogram dump: query (2304 total) msec
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::163) (00: 0000002304)
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (2304 total) count
Jan 20 2016 13:26:20 GMT: INFO (info): (hist.c::163) (00: 0000000480) (01: 0000001056) (02: 0000000768)
Jan 20 2016 13:26:22 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /var/log/contextream/xl/aerospike_data/global.dat: used 235648, contig-free 498M (3988 wblocks), swb-free 1, n-w 0, w-q 0 w-tot 188 (0.0/s), defrag-q 0 defrag-tot 182 (0.0/s)
Jan 20 2016 13:26:22 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /var/log/contextream/xl/aerospike_data/regional.dat: used 14065792, contig-free 8166M (65329 wblocks), swb-free 1, n-w 0, w-q 0 w-tot 94629 (0.9/s), defrag-q 0 defrag-tot 94431 (0.9/s)
Jan 20 2016 13:26:22 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /var/log/contextream/xl/aerospike_data/local.dat: used 512, contig-free 498M (3991 wblocks), swb-free 1, n-w 0, w-q
This is the exception in the client log:
Failed database scan- 'Error Code 11: Server not available' (17235972)
After restarting the aerospike node, it started working well again.