Dear Aerospike community&support,
Executing “asadm -h 172.18.106.148” , following output is read. It looks as cluster is broken.
From your documentation on aerospike version older than 3.10, its clearly stated that in case of multiple NIC aerospike nodes, with address any in network/service section, its recommended to set access-address parameter in order to publish only one IP for clients to connect. Our client currently does not have the access-address configured, but it does have address any value setting in the aerospike.conf.
Based on above, is this the cause of heaving such confusing output of the “asadm -h 172.18.106.148”?
This is/are the network configuration section from aerospike.conf:
network {
service {
#address 172.18.106.148 <= this entry was previously used / incorrect listening IP
address any
port 3000
#access-address 172.18.106.148 <= this entry was previously used / incorrect published IP
}
heartbeat {
mode mesh
port 3002
address 172.18.106.148
##<==============================================>
##< Cluster nodes go here, one entry per line. You do not need
##< to exclude the "current" host you're on, i.e. the whole set
##< of cluster nodes may be listed here.
mesh-seed-address-port 172.18.106.147 3002
mesh-seed-address-port 172.18.106.148 3002
mesh-seed-address-port 172.18.106.149 3002
interval 150
timeout 10
}
fabric {
port 3001
}
info {
port 3003
}
}
In short: for all three nodes:
- node 1 : service / access-addres should be 172.19.107.237 ; heartbeat / address is 172.18.106.147
- node2 (principal node): service / access-addres should be 172.19.107.238 ; heartbeat / address 172.18.106.148
- node 3 : service / access-addres should be 172.19.107.239 ; heartbeat / address is 172.18.106.149
Is it only recommended, or required to set the service / access-address in case of service / address any ?
Second issue, observed from the logs, for same cluster is replica err:
May 18 2017 07:52:33 GMT: INFO (info): (thr_info.c::4818) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 42949672960 :: master 0
May 18 2017 07:52:13 GMT: INFO (info): (hist.c::137) histogram dump: proxy (19758 total) msec
May 18 2017 07:52:13 GMT: INFO (info): (hist.c::154) (00: 0000017611) (01: 0000001053) (02: 0000001006) (03: 0000000080)
May 18 2017 07:52:13 GMT: INFO (info): (hist.c::163) (04: 0000000003) (08: 0000000005)
May 18 2017 07:52:13 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (153842 total) msec
May 18 2017 07:52:13 GMT: INFO (info): (hist.c::154) (00: 0000145959) (01: 0000003811) (02: 0000003950) (03: 0000000120)
May 18 2017 07:52:13 GMT: INFO (info): (hist.c::163) (04: 0000000001) (05: 0000000001)
May 18 2017 07:52:13 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
May 18 2017 07:52:13 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
May 18 2017 07:52:13 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
May 18 2017 07:52:19 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /opt/aerospike/data/avn-psm.dat: used 2556646400, contig-free 38434M (38434 wblocks), swb-free 1, n-w 0, w-q 0 w-tot 328 (0.0/s), defrag-q 0 defrag-tot 329 (0.0/s)
May 18 2017 07:52:23 GMT: INFO (info): (thr_info.c::4796) system memory: free 5468760kb ( 67 percent free )
May 18 2017 07:52:23 GMT: INFO (info): (thr_info.c::4804) migrates in progress ( 0 , 0 ) ::: ClusterSize 3 ::: objects 3178908 ::: sub_objects 0
May 18 2017 07:52:23 GMT: INFO (info): (thr_info.c::4812) rec refs 3178908 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
May 18 2017 07:52:23 GMT: INFO (info): (thr_info.c::4818) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 42949672960 :: master 0
May 18 2017 07:52:23 GMT: INFO (info): (thr_info.c::4828) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 777 ::: dq 0 : fds - proto (7206, 3800553, 3793347) : hb (4, 4, 0) : fab (44, 44, 0)
May 18 2017 07:52:23 GMT: INFO (info): (thr_info.c::4830) heartbeat_received: self 0 : foreign 2232724
May 18 2017 07:52:23 GMT: INFO (info): (thr_info.c::4831) heartbeat_stats: bt 0 bf 1900191 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
May 18 2017 07:52:23 GMT: INFO (info): (thr_info.c::4844) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
May 18 2017 07:52:23 GMT: INFO (info): (thr_info.c::4860) namespace avn-psm: disk inuse: 2556647424 memory inuse: 2033906720 (bytes) sindex memory inuse: 63025440 (bytes) avail pct 93
May 18 2017 07:52:23 GMT: INFO (info): (thr_info.c::4869) namespace avn-psm: ldt_gc: cnt 0 io 0 gc 0 (0, 0, 0)
May 18 2017 07:52:23 GMT: INFO (info): (thr_info.c::4905) partitions: actual 1365 sync 1394 desync 0 zombie 0 wait 0 absent 1337
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::137) histogram dump: reads (76012 total) msec
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::154) (00: 0000075988) (01: 0000000008) (02: 0000000010) (03: 0000000004)
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::163) (04: 0000000002)
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (109043 total) msec
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::154) (00: 0000102898) (01: 0000002994) (02: 0000003054) (03: 0000000095)
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::163) (04: 0000000001) (05: 0000000001)
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::137) histogram dump: proxy (19759 total) msec
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::154) (00: 0000017612) (01: 0000001053) (02: 0000001006) (03: 0000000080)
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::163) (04: 0000000003) (08: 0000000005)
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (153849 total) msec
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::154) (00: 0000145965) (01: 0000003812) (02: 0000003950) (03: 0000000120)
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::163) (04: 0000000001) (05: 0000000001)
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
May 18 2017 07:52:23 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
May 18 2017 07:52:26 GMT: INFO (nsup): (thr_nsup.c::1319) {avn-psm} nsup start
May 18 2017 07:52:27 GMT: INFO (nsup): (thr_nsup.c::1237) {avn-psm} Records: 1574159, 1574159 0-vt, 0(0) expired, 0(0) evicted, 0(0) set deletes, 0(0) set evicted. Evict ttls: 0,0,0.000. Waits: 0,0,0. Total time: 1054 ms
May 18 2017 07:52:33 GMT: INFO (info): (thr_info.c::4796) system memory: free 5468756kb ( 67 percent free )
May 18 2017 07:52:33 GMT: INFO (info): (thr_info.c::4804) migrates in progress ( 0 , 0 ) ::: ClusterSize 3 ::: objects 3178912 ::: sub_objects 0
May 18 2017 07:52:33 GMT: INFO (info): (thr_info.c::4812) rec refs 3178912 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
May 18 2017 07:52:33 GMT: INFO (info): (thr_info.c::4818) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 42949672960 :: master 0
May 18 2017 07:52:33 GMT: INFO (info): (thr_info.c::4828) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 258 ::: dq 0 : fds - proto (7249, 3800672, 3793423) : hb (4, 4, 0) : fab (44, 44, 0)
May 18 2017 07:52:33 GMT: INFO (info): (thr_info.c::4830) heartbeat_received: self 0 : foreign 2232988
May 18 2017 07:52:33 GMT: INFO (info): (thr_info.c::4831) heartbeat_stats: bt 0 bf 1900417 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
May 18 2017 07:52:33 GMT: INFO (info): (thr_info.c::4844) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
May 18 2017 07:52:33 GMT: INFO (info): (thr_info.c::4860) namespace avn-psm: disk inuse: 2556649344 memory inuse: 2033908034 (bytes) sindex memory inuse: 63025440 (bytes) avail pct 93
May 18 2017 07:52:33 GMT: INFO (info): (thr_info.c::4869) namespace avn-psm: ldt_gc: cnt 0 io 0 gc 0 (0, 0, 0)
May 18 2017 07:52:33 GMT: INFO (info): (thr_info.c::4905) partitions: actual 1365 sync 1394 desync 0 zombie 0 wait 0 absent 1337
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::137) histogram dump: reads (76024 total) msec
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::154) (00: 0000076000) (01: 0000000008) (02: 0000000010) (03: 0000000004)
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::163) (04: 0000000002)
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (109051 total) msec
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::154) (00: 0000102906) (01: 0000002994) (02: 0000003054) (03: 0000000095)
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::163) (04: 0000000001) (05: 0000000001)
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::137) histogram dump: proxy (19761 total) msec
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::154) (00: 0000017614) (01: 0000001053) (02: 0000001006) (03: 0000000080)
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::163) (04: 0000000003) (08: 0000000005)
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (153861 total) msec
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::154) (00: 0000145977) (01: 0000003812) (02: 0000003950) (03: 0000000120)
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::163) (04: 0000000001) (05: 0000000001)
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
May 18 2017 07:52:33 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
From the above I see the Cluster Size is 3 / there is sufficient memory. All communication ports are open. We have already once tried rebooting principal node and it did not help. As you can see the load avergae is quite high, also since reboot: load average: 6.90, 6.72, 6.69
Please advise how to proceed? Am I even reading the logs correctly? Help is appreciated.
Marko Vrgotic ActiveVideo