Data inconsistency after failed node back

oldforum · August 18, 2014, 5:49pm

by maxulan » Tue Aug 05, 2014 11:32 am Consistency test:

Given:

3 servers ring at EC2 in mesh mode. Replication factor: 2 Written 100 records into server1

Before nodes failing scenario:

server1 and server2 show the same:

Monitor> info SETS server1/test:consistency_test Objects: 100 server2/test:consistency_test Objects: 100

After stopping all nodes except of server2 and starting fisrt server3 and then server1:

server2:

shows 102 records (2 records are duplicated) Monitor> info SETS server1/test:consistency_test Objects: 12 server2/test:consistency_test Objects: 99 server3/test:consistency_test Objects: 1

node server1:

shows 101 record (1 record is missed, 2 records are duplicated ) Monitor> info SETS server1/test:consistency_test Objects: 12 server2/test:consistency_test Objects: 99

node server3:

shows 101 records (2 records are duplicated) Monitor> info SETS server2/test:consistency_test Objects: 99 server1/test:consistency_test Objects: 12 server3/test:consistency_test Objects: 1

Right now cluster is slowly increasing number of object at server1. In the log I can see:

Aug 05 2014 04:34:17 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 04:34:17 GMT: INFO (paxos): (paxos.c::2207) CLUSTER INTEGRITY FAULT. [Phase 1 of 2] To fix, issue this command across all nodes: dun:nodes=bb9c6849be34606

Please let me know if it’s a bug or there is some workaround.

Thanks, Max

oldforum · August 18, 2014, 5:49pm

by devops02 » Tue Aug 05, 2014 5:06 pm

Hi Max,

Can I have you post your configuration from your node here? Also if you can, can you give me the first 1000 lines of logs?

Jerry

oldforum · August 18, 2014, 5:50pm

by maxulan » Tue Aug 05, 2014 5:32 pm

Please find configuration below

paxos-recovery-policy is manual (default) and mode is mesh if it’s important.

====10.10.0.102:3000====
allow-inline-transactions true
auto-dun false
auto-undun false
batch-max-requests 5,000
batch-priority 200
batch-threads 4
defrag-queue-escape 10
defrag-queue-hwm 500
defrag-queue-lwm 1
defrag-queue-priority 1
dump-message-above-size 134,217,728
enable-fastpath true
enable-xdr false
fabric-port 3,001
fb-health-bad-pct 0
fb-health-good-pct 50
fb-health-msg-per-burst 0
fb-health-msg-timeout 200
heartbeat-address 10.10.0.102
heartbeat-interval 150
heartbeat-mode mesh
heartbeat-port 3,002
heartbeat-protocol v2
heartbeat-timeout 50
info-threads 16
max-msgs-per-type -1
memory-accounting false
mesh-address 10.10.0.107
mesh-port 3,002
microbenchmarks false
migrate-max-num-incoming 256
migrate-priority 40
migrate-read-priority 10
migrate-read-sleep 500
migrate-rx-lifetime-ms 60,000
migrate-threads 1
migrate-xmit-hwm 10
migrate-xmit-lwm 5
migrate-xmit-priority 40
migrate-xmit-sleep 500
network-info-port 3,003
nsup-auto-hwm-pct 15
nsup-period 120
nsup-queue-escape 10
nsup-queue-hwm 500
nsup-queue-lwm 1
nsup-startup-evict true
paxos-max-cluster-size 32
paxos-protocol v3
paxos-recovery-policy manual
paxos-retransmit-period 5
paxos-single-replica-limit 1
pidfile /var/run/aerospike/asd.pid
prole-extra-ttl 0
proto-fd-idle-ms 60,000
proto-fd-max 15,000
proxy-hist-track-back 1,800
proxy-hist-track-slice 10
proxy-hist-track-thresholds 1,8,64
query-batch-size 100
query-bufpool-size 256
query-hist-track-back 1,800
query-hist-track-slice 10
query-hist-track-thresholds 1,8,64
query-in-transaction-thread 0
query-job-tracking false
query-long-q-max-size 500
query-priority 10
query-rec-count-bound 4,294,967,295
query-req-in-query-thread 0
query-req-max-inflight 100
query-short-q-max-size 500
query-sleep 1
query-threads 6
query-threshold 10
query-worker-threads 15
query_rec_count-hist-track-back 1,800
query_rec_count-hist-track-slice 10
query_rec_count-hist-track-thresholds 1,8,64
reads-hist-track-back 1,800
reads-hist-track-slice 10
reads-hist-track-thresholds 1,8,64
replication-fire-and-forget false
respond-client-on-master-completion false
reuse-address true
scan-priority 200
scan-sleep 1
service-address 0.0.0.0
service-port 3,000
sindex-data-max-memory 18,446,744,073,709,551,616
sindex-populator-scan-priority 3
snub-nodes false
stop-writes-noxdr false
storage-benchmarks false
ticker-interval 10
transaction-duplicate-threads 0
transaction-max-ms 1,000
transaction-pending-limit 20
transaction-queues 4
transaction-repeatable-read false
transaction-retry-ms 1,000
transaction-threads-per-queue 4
udf-hist-track-back 1,800
udf-hist-track-slice 10
udf-hist-track-thresholds 1,8,64
udf-runtime-gmax-memory 18,446,744,073,709,551,616
udf-runtime-max-memory 18,446,744,073,709,551,616
use-queue-per-device false
write-duplicate-resolution-disable false
writes_master-hist-track-back 1,800
writes_master-hist-track-slice 10
writes_master-hist-track-thresholds 1,8,64
writes_reply-hist-track-back 1,800
writes_reply-hist-track-slice 10
writes_reply-hist-track-thresholds 1,8,64
xdr-delete-shipping-enabled true
xdr-nsup-deletes-enabled false
====10.10.0.107:3000====
access-address 10.10.0.107
allow-inline-transactions true
auto-dun false
auto-undun false
batch-max-requests 5,000
batch-priority 200
batch-threads 4
defrag-queue-escape 10
defrag-queue-hwm 500
defrag-queue-lwm 1
defrag-queue-priority 1
dump-message-above-size 134,217,728
enable-fastpath true
enable-xdr false
fabric-port 3,001
fb-health-bad-pct 0
fb-health-good-pct 50
fb-health-msg-per-burst 0
fb-health-msg-timeout 200
heartbeat-address 10.10.0.107
heartbeat-interval 15
heartbeat-mode mesh
heartbeat-port 3,002
heartbeat-protocol v2
heartbeat-timeout 50
info-threads 16
max-msgs-per-type -1
memory-accounting false
mesh-address 10.10.0.245
mesh-port 3,002
microbenchmarks false
migrate-max-num-incoming 256
migrate-priority 40
migrate-read-priority 10
migrate-read-sleep 500
migrate-rx-lifetime-ms 60,000
migrate-threads 1
migrate-xmit-hwm 10
migrate-xmit-lwm 5
migrate-xmit-priority 40
migrate-xmit-sleep 500
network-info-port 3,003
nsup-auto-hwm-pct 15
nsup-period 120
nsup-queue-escape 10
nsup-queue-hwm 500
nsup-queue-lwm 1
nsup-startup-evict true
paxos-max-cluster-size 32
paxos-protocol v3
paxos-recovery-policy manual
paxos-retransmit-period 5
paxos-single-replica-limit 1
pidfile /var/run/aerospike/asd.pid
prole-extra-ttl 0
proto-fd-idle-ms 60,000
proto-fd-max 15,000
proxy-hist-track-back 1,800
proxy-hist-track-slice 10
proxy-hist-track-thresholds 1,8,64
query-batch-size 100
query-bufpool-size 256
query-hist-track-back 1,800
query-hist-track-slice 10
query-hist-track-thresholds 1,8,64
query-in-transaction-thread 0
query-job-tracking false
query-long-q-max-size 500
query-priority 10
query-rec-count-bound 4,294,967,295
query-req-in-query-thread 0
query-req-max-inflight 100
query-short-q-max-size 500
query-sleep 1
query-threads 6
query-threshold 10
query-worker-threads 15
query_rec_count-hist-track-back 1,800
query_rec_count-hist-track-slice 10
query_rec_count-hist-track-thresholds 1,8,64
reads-hist-track-back 1,800
reads-hist-track-slice 10
reads-hist-track-thresholds 1,8,64
replication-fire-and-forget false
respond-client-on-master-completion false
reuse-address true
scan-priority 200
scan-sleep 1
service-address 0.0.0.0
service-port 3,000
sindex-data-max-memory 18,446,744,073,709,551,616
sindex-populator-scan-priority 3
snub-nodes false
stop-writes-noxdr false
storage-benchmarks false
ticker-interval 10
transaction-duplicate-threads 0
transaction-max-ms 1,000
transaction-pending-limit 20
transaction-queues 4
transaction-repeatable-read false
transaction-retry-ms 1,000
transaction-threads-per-queue 4
udf-hist-track-back 1,800
udf-hist-track-slice 10
udf-hist-track-thresholds 1,8,64
udf-runtime-gmax-memory 18,446,744,073,709,551,616
udf-runtime-max-memory 18,446,744,073,709,551,616
use-queue-per-device false
write-duplicate-resolution-disable false
writes_master-hist-track-back 1,800
writes_master-hist-track-slice 10
writes_master-hist-track-thresholds 1,8,64
writes_reply-hist-track-back 1,800
writes_reply-hist-track-slice 10
writes_reply-hist-track-thresholds 1,8,64
xdr-delete-shipping-enabled true
xdr-nsup-deletes-enabled false
====10.10.0.245:3000====
access-address 10.10.0.245
allow-inline-transactions true
auto-dun false
auto-undun false
batch-max-requests 5,000
batch-priority 200
batch-threads 4
defrag-queue-escape 10
defrag-queue-hwm 500
defrag-queue-lwm 1
defrag-queue-priority 1
dump-message-above-size 134,217,728
enable-fastpath true
enable-xdr false
fabric-port 3,001
fb-health-bad-pct 0
fb-health-good-pct 50
fb-health-msg-per-burst 0
fb-health-msg-timeout 200
heartbeat-address 10.10.0.245
heartbeat-interval 150
heartbeat-mode mesh
heartbeat-port 3,002
heartbeat-protocol v2
heartbeat-timeout 50
info-threads 16
max-msgs-per-type -1
memory-accounting false
mesh-address 10.10.0.102
mesh-port 3,002
microbenchmarks false
migrate-max-num-incoming 256
migrate-priority 40
migrate-read-priority 10
migrate-read-sleep 500
migrate-rx-lifetime-ms 60,000
migrate-threads 1
migrate-xmit-hwm 10
migrate-xmit-lwm 5
migrate-xmit-priority 40
migrate-xmit-sleep 500
network-info-port 3,003
nsup-auto-hwm-pct 15
nsup-period 120
nsup-queue-escape 10
nsup-queue-hwm 500
nsup-queue-lwm 1
nsup-startup-evict true
paxos-max-cluster-size 32
paxos-protocol v3
paxos-recovery-policy manual
paxos-retransmit-period 5
paxos-single-replica-limit 1
pidfile /var/run/aerospike/asd.pid
prole-extra-ttl 0
proto-fd-idle-ms 60,000
proto-fd-max 15,000
proxy-hist-track-back 1,800
proxy-hist-track-slice 10
proxy-hist-track-thresholds 1,8,64
query-batch-size 100
query-bufpool-size 256
query-hist-track-back 1,800
query-hist-track-slice 10
query-hist-track-thresholds 1,8,64
query-in-transaction-thread 0
query-job-tracking false
query-long-q-max-size 500
query-priority 10
query-rec-count-bound 4,294,967,295
query-req-in-query-thread 0
query-req-max-inflight 100
query-short-q-max-size 500
query-sleep 1
query-threads 6
query-threshold 10
query-worker-threads 15
query_rec_count-hist-track-back 1,800
query_rec_count-hist-track-slice 10
query_rec_count-hist-track-thresholds 1,8,64
reads-hist-track-back 1,800
reads-hist-track-slice 10
reads-hist-track-thresholds 1,8,64
replication-fire-and-forget false
respond-client-on-master-completion false
reuse-address true
scan-priority 200
scan-sleep 1
service-address 0.0.0.0
service-port 3,000
sindex-data-max-memory 18,446,744,073,709,551,616
sindex-populator-scan-priority 3
snub-nodes false
stop-writes-noxdr false
storage-benchmarks false
ticker-interval 10
transaction-duplicate-threads 0
transaction-max-ms 1,000
transaction-pending-limit 20
transaction-queues 4
transaction-repeatable-read false
transaction-retry-ms 1,000
transaction-threads-per-queue 4
udf-hist-track-back 1,800
udf-hist-track-slice 10
udf-hist-track-thresholds 1,8,64
udf-runtime-gmax-memory 18,446,744,073,709,551,616
udf-runtime-max-memory 18,446,744,073,709,551,616
use-queue-per-device false
write-duplicate-resolution-disable false
writes_master-hist-track-back 1,800
writes_master-hist-track-slice 10
writes_master-hist-track-thresholds 1,8,64
writes_reply-hist-track-back 1,800
writes_reply-hist-track-slice 10
writes_reply-hist-track-thresholds 1,8,64
xdr-delete-shipping-enabled true
xdr-nsup-deletes-enabled false

oldforum · August 18, 2014, 5:51pm

by maxulan » Wed Aug 06, 2014 9:00 am

Please find log example attached.

Aug 05 2014 19:01:15 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:01:15 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:01:15 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:01:15 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:01:15 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:01:15 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:01:19 GMT: INFO (hb): (hb.c::1713) HB node bb9ee492d91bb06 in different cluster - succession lists don't match
Aug 05 2014 19:01:20 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:01:20 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:01:20 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:01:20 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:01:20 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:01:20 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:01:23 GMT: INFO (info): (thr_info.c::4228) system memory: free 3705220kb ( 96 percent free ) 
Aug 05 2014 19:01:23 GMT: INFO (info): (thr_info.c::4235) migrates in progress ( 0 , 0 ) ::: ClusterSize 2 ::: objects 47309
Aug 05 2014 19:01:23 GMT: INFO (info): (thr_info.c::4243) rec refs 47309 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 1204
Aug 05 2014 19:01:23 GMT: INFO (info): (thr_info.c::4249) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Aug 05 2014 19:01:23 GMT: INFO (info): (thr_info.c::4257) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 18995, 18995) : hb 74 : fab 46
Aug 05 2014 19:01:23 GMT: INFO (info): (thr_info.c::4259) heartbeat_received: self 0 : foreign 184713092
Aug 05 2014 19:01:23 GMT: INFO (info): (thr_info.c::4272) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 1204 migtx 0 ssdr 0 ssdw 0 rw 0
Aug 05 2014 19:01:23 GMT: INFO (info): (thr_info.c::4288) namespace test: disk inuse: 0 memory inuse: 5894440 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Aug 05 2014 19:01:23 GMT: INFO (info): (thr_info.c::4313) partitions: actual 1999 sync 1604 desync 493 zombie 0 wait 0 absent 0
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::69) histogram dump: reads (28 total)
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::92) (00: 0000000028) 
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::69) histogram dump: writes_master (45166 total)
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::84) (00: 0000042733) (01: 0000001100) (02: 0000000370) (03: 0000000671) 
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::84) (04: 0000000223) (05: 0000000049) (06: 0000000003) (09: 0000000004) 
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::92) (10: 0000000013) 
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::69) histogram dump: proxy (730 total)
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::84) (00: 0000000679) (01: 0000000019) (02: 0000000006) (03: 0000000015) 
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::84) (04: 0000000004) (05: 0000000004) (06: 0000000001) (07: 0000000001) 
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::92) (08: 0000000001) 
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::69) histogram dump: writes_reply (34733 total)
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::84) (00: 0000034280) (01: 0000000225) (02: 0000000115) (03: 0000000088) 
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::92) (04: 0000000024) (05: 0000000001) 
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::69) histogram dump: udf (0 total)
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::69) histogram dump: query (0 total)
Aug 05 2014 19:01:23 GMT: INFO (info): (hist.c::69) histogram dump: query_rec_count (0 total)
Aug 05 2014 19:01:25 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:01:25 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:01:25 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:01:25 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:01:25 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:01:25 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:01:30 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:01:30 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:01:30 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:01:30 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:01:30 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:01:30 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:01:31 GMT: INFO (hb): (hb.c::1713) HB node bb92454f68be506 in same cluster - succession lists don't match
Aug 05 2014 19:01:31 GMT: INFO (hb): (hb.c::1713) HB node bb9c6849be34606 in different cluster - succession lists don't match
Aug 05 2014 19:01:33 GMT: INFO (partition): (partition.c::385) DISALLOW MIGRATIONS
Aug 05 2014 19:01:33 GMT: INFO (paxos): (paxos.c::2668) SUCCESSION [19.0]*: bb92454f68be506 bb9000a04053506 
Aug 05 2014 19:01:33 GMT: INFO (paxos): (paxos.c::2041) Sent partition sync request to node bb92454f68be506
Aug 05 2014 19:01:33 GMT: INFO (paxos): (paxos.c::2437) Ignoring self(bb9000a04053506) add from Principal bb92454f68be506
Aug 05 2014 19:01:33 GMT: INFO (paxos): (paxos.c::2437) Ignoring self(bb9000a04053506) add from Principal bb92454f68be506
Aug 05 2014 19:01:33 GMT: INFO (paxos): (paxos.c::2437) Ignoring self(bb9000a04053506) add from Principal bb92454f68be506
Aug 05 2014 19:01:33 GMT: INFO (paxos): (paxos.c::2756) received partition sync message from bb92454f68be506
Aug 05 2014 19:01:33 GMT: INFO (config): (cluster_config.c::421) Rack Aware is disabled.
Aug 05 2014 19:01:33 GMT: INFO (partition): (cluster_config.c::382) Rack Aware is disabled.
Aug 05 2014 19:01:33 GMT: INFO (partition): (partition.c::2834) CLUSTER SIZE = 2
Aug 05 2014 19:01:33 GMT: INFO (partition): (partition.c::2873) Global state is well formed
Aug 05 2014 19:01:33 GMT: INFO (paxos): (partition.c::2527) setting replication factors: cluster size 2, paxos single replica limit 1
Aug 05 2014 19:01:33 GMT: INFO (paxos): (partition.c::2534) {test} replication factor is 2
Aug 05 2014 19:01:33 GMT: INFO (paxos): (partition.c::3783) global partition state: total 4096 lost 0 unique 493 duplicate 3603
Aug 05 2014 19:01:33 GMT: INFO (paxos): (partition.c::3784) partition state after fixing lost partitions (master): total 4096 lost 0 unique 493 duplicate 3603
Aug 05 2014 19:01:33 GMT: INFO (paxos): (partition.c::3785) 0 new partition version tree paths generated
Aug 05 2014 19:01:33 GMT: INFO (partition): (partition.c::364) ALLOW MIGRATIONS
Aug 05 2014 19:01:33 GMT: INFO (info): (thr_info.c::4228) system memory: free 3705584kb ( 96 percent free ) 
Aug 05 2014 19:01:33 GMT: INFO (info): (thr_info.c::4235) migrates in progress ( 1604 , 0 ) ::: ClusterSize 2 ::: objects 47309
Aug 05 2014 19:01:33 GMT: INFO (info): (thr_info.c::4243) rec refs 47309 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 1604 ::: mig rx 38
Aug 05 2014 19:01:33 GMT: INFO (info): (thr_info.c::4249) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Aug 05 2014 19:01:33 GMT: INFO (info): (thr_info.c::4257) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 19005, 19005) : hb 74 : fab 46
Aug 05 2014 19:01:33 GMT: INFO (info): (thr_info.c::4259) heartbeat_received: self 0 : foreign 184747673
Aug 05 2014 19:01:33 GMT: INFO (info): (thr_info.c::4272) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 38 migtx 1604 ssdr 0 ssdw 0 rw 0
Aug 05 2014 19:01:33 GMT: INFO (info): (thr_info.c::4288) namespace test: disk inuse: 0 memory inuse: 5894440 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Aug 05 2014 19:01:33 GMT: INFO (info): (thr_info.c::4313) partitions: actual 1999 sync 1604 desync 493 zombie 0 wait 0 absent 0
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::69) histogram dump: reads (28 total)
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::92) (00: 0000000028) 
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::69) histogram dump: writes_master (45166 total)
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::84) (00: 0000042733) (01: 0000001100) (02: 0000000370) (03: 0000000671) 
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::84) (04: 0000000223) (05: 0000000049) (06: 0000000003) (09: 0000000004) 
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::92) (10: 0000000013) 
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::69) histogram dump: proxy (730 total)
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::84) (00: 0000000679) (01: 0000000019) (02: 0000000006) (03: 0000000015) 
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::84) (04: 0000000004) (05: 0000000004) (06: 0000000001) (07: 0000000001) 
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::92) (08: 0000000001) 
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::69) histogram dump: writes_reply (34733 total)
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::84) (00: 0000034280) (01: 0000000225) (02: 0000000115) (03: 0000000088) 
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::92) (04: 0000000024) (05: 0000000001) 
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::69) histogram dump: udf (0 total)
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::69) histogram dump: query (0 total)
Aug 05 2014 19:01:33 GMT: INFO (info): (hist.c::69) histogram dump: query_rec_count (0 total)
Aug 05 2014 19:01:35 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506

oldforum · August 18, 2014, 5:52pm

Aug 05 2014 19:01:35 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:01:35 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:01:35 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:01:35 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:01:38 GMT: INFO (paxos): (paxos.c::2443) Ignoring self(bb9000a04053506) remove from Principal bb92454f68be506
Aug 05 2014 19:01:38 GMT: INFO (paxos): (paxos.c::2443) Ignoring self(bb9000a04053506) remove from Principal bb92454f68be506
Aug 05 2014 19:01:38 GMT: INFO (paxos): (paxos.c::2443) Ignoring self(bb9000a04053506) remove from Principal bb92454f68be506
Aug 05 2014 19:01:40 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:01:40 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:01:40 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:01:40 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:01:40 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:01:40 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:01:43 GMT: INFO (info): (thr_info.c::4228) system memory: free 3706064kb ( 96 percent free )
Aug 05 2014 19:01:43 GMT: INFO (info): (thr_info.c::4235) migrates in progress ( 0 , 0 ) ::: ClusterSize 2 ::: objects 47317
Aug 05 2014 19:01:43 GMT: INFO (info): (thr_info.c::4243) rec refs 47317 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 1197
Aug 05 2014 19:01:43 GMT: INFO (info): (thr_info.c::4249) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Aug 05 2014 19:01:43 GMT: INFO (info): (thr_info.c::4257) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 19015, 19015) : hb 74 : fab 46
Aug 05 2014 19:01:43 GMT: INFO (info): (thr_info.c::4259) heartbeat_received: self 0 : foreign 184782431
Aug 05 2014 19:01:43 GMT: INFO (info): (thr_info.c::4272) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 1197 migtx 0 ssdr 0 ssdw 0 rw 0
Aug 05 2014 19:01:43 GMT: INFO (info): (thr_info.c::4288) namespace test: disk inuse: 0 memory inuse: 5895340 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Aug 05 2014 19:01:43 GMT: INFO (info): (thr_info.c::4313) partitions: actual 1999 sync 1605 desync 492 zombie 0 wait 0 absent 0
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::69) histogram dump: reads (28 total)
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::92) (00: 0000000028) 
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::69) histogram dump: writes_master (45166 total)
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::84) (00: 0000042733) (01: 0000001100) (02: 0000000370) (03: 0000000671) 
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::84) (04: 0000000223) (05: 0000000049) (06: 0000000003) (09: 0000000004) 
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::92) (10: 0000000013) 
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::69) histogram dump: proxy (730 total)
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::84) (00: 0000000679) (01: 0000000019) (02: 0000000006) (03: 0000000015) 
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::84) (04: 0000000004) (05: 0000000004) (06: 0000000001) (07: 0000000001) 
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::92) (08: 0000000001) 
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::69) histogram dump: writes_reply (34733 total)
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::84) (00: 0000034280) (01: 0000000225) (02: 0000000115) (03: 0000000088) 
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::92) (04: 0000000024) (05: 0000000001) 
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::69) histogram dump: udf (0 total)
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::69) histogram dump: query (0 total)
Aug 05 2014 19:01:43 GMT: INFO (info): (hist.c::69) histogram dump: query_rec_count (0 total)
Aug 05 2014 19:01:45 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:01:45 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:01:45 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:01:45 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:01:45 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:01:45 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:01:49 GMT: INFO (hb): (hb.c::1713) HB node bb9ee492d91bb06 in different cluster - succession lists don't match
Aug 05 2014 19:01:50 GMT: INFO (nsup): (thr_nsup.c::1233) {test} nsup start
Aug 05 2014 19:01:50 GMT: INFO (nsup): (thr_nsup.c::1151) {test} Records: 27239, 0 0-vt, 0(0) expired, 0(0) evicted, 0(0) set deletes, 0(0) set evicted. Evict ttls: 0,0,0.000. Waits: 0,0,0. Total time: 19 ms
Aug 05 2014 19:01:50 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:01:50 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:01:50 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:01:50 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:01:50 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:01:50 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:01:53 GMT: INFO (info): (thr_info.c::4228) system memory: free 3706064kb ( 96 percent free )
Aug 05 2014 19:01:53 GMT: INFO (info): (thr_info.c::4235) migrates in progress ( 0 , 0 ) ::: ClusterSize 2 ::: objects 47317
Aug 05 2014 19:01:53 GMT: INFO (info): (thr_info.c::4243) rec refs 47317 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 1197
Aug 05 2014 19:01:53 GMT: INFO (info): (thr_info.c::4249) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Aug 05 2014 19:01:53 GMT: INFO (info): (thr_info.c::4257) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 19022, 19022) : hb 74 : fab 46
Aug 05 2014 19:01:53 GMT: INFO (info): (thr_info.c::4259) heartbeat_received: self 0 : foreign 184816855
Aug 05 2014 19:01:53 GMT: INFO (info): (thr_info.c::4272) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 1197 migtx 0 ssdr 0 ssdw 0 rw 0
Aug 05 2014 19:01:53 GMT: INFO (info): (thr_info.c::4288) namespace test: disk inuse: 0 memory inuse: 5895340 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Aug 05 2014 19:01:53 GMT: INFO (info): (thr_info.c::4313) partitions: actual 1999 sync 1605 desync 492 zombie 0 wait 0 absent 0
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::69) histogram dump: reads (28 total)
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::92) (00: 0000000028) 
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::69) histogram dump: writes_master (45166 total)
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::84) (00: 0000042733) (01: 0000001100) (02: 0000000370) (03: 0000000671) 
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::84) (04: 0000000223) (05: 0000000049) (06: 0000000003) (09: 0000000004) 
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::92) (10: 0000000013) 
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::69) histogram dump: proxy (730 total)
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::84) (00: 0000000679) (01: 0000000019) (02: 0000000006) (03: 0000000015) 
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::84) (04: 0000000004) (05: 0000000004) (06: 0000000001) (07: 0000000001) 
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::92) (08: 0000000001) 
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::69) histogram dump: writes_reply (34733 total)
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::84) (00: 0000034280) (01: 0000000225) (02: 0000000115) (03: 0000000088) 
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::92) (04: 0000000024) (05: 0000000001) 
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::69) histogram dump: udf (0 total)
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::69) histogram dump: query (0 total)
Aug 05 2014 19:01:53 GMT: INFO (info): (hist.c::69) histogram dump: query_rec_count (0 total)
Aug 05 2014 19:01:55 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:01:55 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:01:55 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:01:55 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:01:55 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:01:55 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:02:00 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:02:00 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:02:00 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:02:00 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:02:00 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:02:00 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:02:01 GMT: INFO (hb): (hb.c::1713) HB node bb92454f68be506 in same cluster - succession lists don't match
Aug 05 2014 19:02:02 GMT: INFO (hb): (hb.c::1713) HB node bb9c6849be34606 in different cluster - succession lists don't match
Aug 05 2014 19:02:03 GMT: INFO (paxos): (paxos.c::2437) Ignoring self(bb9000a04053506) add from Principal bb92454f68be506
Aug 05 2014 19:02:03 GMT: INFO (partition): (partition.c::385) DISALLOW MIGRATIONS
Aug 05 2014 19:02:03 GMT: INFO (paxos): (paxos.c::2668) SUCCESSION [21.0]*: bb92454f68be506 bb9000a04053506 
Aug 05 2014 19:02:03 GMT: INFO (paxos): (paxos.c::2041) Sent partition sync request to node bb92454f68be506
Aug 05 2014 19:02:03 GMT: INFO (paxos): (paxos.c::2437) Ignoring self(bb9000a04053506) add from Principal bb92454f68be506
Aug 05 2014 19:02:03 GMT: INFO (paxos): (paxos.c::2437) Ignoring self(bb9000a04053506) add from Principal bb92454f68be506
Aug 05 2014 19:02:03 GMT: INFO (paxos): (paxos.c::2756) received partition sync message from bb92454f68be506
Aug 05 2014 19:02:03 GMT: INFO (config): (cluster_config.c::421) Rack Aware is disabled.
Aug 05 2014 19:02:03 GMT: INFO (partition): (cluster_config.c::382) Rack Aware is disabled.
Aug 05 2014 19:02:03 GMT: INFO (partition): (partition.c::2834) CLUSTER SIZE = 2
Aug 05 2014 19:02:03 GMT: INFO (partition): (partition.c::2873) Global state is well formed
Aug 05 2014 19:02:03 GMT: INFO (paxos): (partition.c::2527) setting replication factors: cluster size 2, paxos single replica limit 1
Aug 05 2014 19:02:03 GMT: INFO (paxos): (partition.c::2534) {test} replication factor is 2
Aug 05 2014 19:02:03 GMT: INFO (paxos): (partition.c::3783) global partition state: total 4096 lost 0 unique 492 duplicate 3604
Aug 05 2014 19:02:03 GMT: INFO (paxos): (partition.c::3784) partition state after fixing lost partitions (master): total 4096 lost 0 unique 492 duplicate 3604
Aug 05 2014 19:02:03 GMT: INFO (paxos): (partition.c::3785) 0 new partition version tree paths generated
Aug 05 2014 19:02:03 GMT: INFO (partition): (partition.c::364) ALLOW MIGRATIONS
Aug 05 2014 19:02:03 GMT: INFO (info): (thr_info.c::4228) system memory: free 3705544kb ( 96 percent free )
Aug 05 2014 19:02:03 GMT: INFO (info): (thr_info.c::4235) migrates in progress ( 1605 , 0 ) ::: ClusterSize 2 ::: objects 47317
Aug 05 2014 19:02:03 GMT: INFO (info): (thr_info.c::4243) rec refs 47317 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 1605 ::: mig rx 38
Aug 05 2014 19:02:03 GMT: INFO (info): (thr_info.c::4249) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Aug 05 2014 19:02:03 GMT: INFO (info): (thr_info.c::4257) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 19026, 19026) : hb 74 : fab 46
Aug 05 2014 19:02:03 GMT: INFO (info): (thr_info.c::4259) heartbeat_received: self 0 : foreign 184851487
Aug 05 2014 19:02:03 GMT: INFO (info): (thr_info.c::4272) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 38 migtx 1605 ssdr 0 ssdw 0 rw 0
Aug 05 2014 19:02:03 GMT: INFO (info): (thr_info.c::4288) namespace test: disk inuse: 0 memory inuse: 5895340 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Aug 05 2014 19:02:03 GMT: INFO (info): (thr_info.c::4313) partitions: actual 1999 sync 1605 desync 492 zombie 0 wait 0 absent 0
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::69) histogram dump: reads (28 total)
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::92) (00: 0000000028) 
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::69) histogram dump: writes_master (45166 total)
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::84) (00: 0000042733) (01: 0000001100) (02: 0000000370) (03: 0000000671) 
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::84) (04: 0000000223) (05: 0000000049) (06: 0000000003) (09: 0000000004) 
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::92) (10: 0000000013) 
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::69) histogram dump: proxy (730 total)
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::84) (00: 0000000679) (01: 0000000019) (02: 0000000006) (03: 0000000015) 
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::84) (04: 0000000004) (05: 0000000004) (06: 0000000001) (07: 0000000001) 
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::92) (08: 0000000001) 
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::69) histogram dump: writes_reply (34733 total)
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::84) (00: 0000034280) (01: 0000000225) (02: 0000000115) (03: 0000000088) 
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::92) (04: 0000000024) (05: 0000000001) 
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::69) histogram dump: udf (0 total)
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::69) histogram dump: query (0 total)
Aug 05 2014 19:02:03 GMT: INFO (info): (hist.c::69) histogram dump: query_rec_count (0 total)
Aug 05 2014 19:02:05 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:02:05 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:02:05 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:02:05 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:02:05 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:02:08 GMT: INFO (paxos): (paxos.c::2443) Ignoring self(bb9000a04053506) remove from Principal bb92454f68be506
Aug 05 2014 19:02:08 GMT: INFO (paxos): (paxos.c::2443) Ignoring self(bb9000a04053506) remove from Principal bb92454f68be506
Aug 05 2014 19:02:08 GMT: INFO (paxos): (paxos.c::2443) Ignoring self(bb9000a04053506) remove from Principal bb92454f68be506
Aug 05 2014 19:02:10 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:02:10 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:02:10 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:02:10 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:02:10 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:02:10 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:02:13 GMT: INFO (info): (thr_info.c::4228) system memory: free 3704716kb ( 96 percent free )
Aug 05 2014 19:02:13 GMT: INFO (info): (thr_info.c::4235) migrates in progress ( 0 , 0 ) ::: ClusterSize 2 ::: objects 47317
Aug 05 2014 19:02:13 GMT: INFO (info): (thr_info.c::4243) rec refs 47317 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 1180
Aug 05 2014 19:02:13 GMT: INFO (info): (thr_info.c::4249) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Aug 05 2014 19:02:13 GMT: INFO (info): (thr_info.c::4257) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (3, 19036, 19033) : hb 74 : fab 46
Aug 05 2014 19:02:13 GMT: INFO (info): (thr_info.c::4259) heartbeat_received: self 0 : foreign 184886123
Aug 05 2014 19:02:13 GMT: INFO (info): (thr_info.c::4272) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 1180 migtx 0 ssdr 0 ssdw 0 rw 0
Aug 05 2014 19:02:13 GMT: INFO (info): (thr_info.c::4288) namespace test: disk inuse: 0 memory inuse: 5895340 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Aug 05 2014 19:02:13 GMT: INFO (info): (thr_info.c::4313) partitions: actual 1999 sync 1605 desync 492 zombie 0 wait 0 absent 0
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::69) histogram dump: reads (28 total)
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::92) (00: 0000000028) 
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::69) histogram dump: writes_master (45166 total)
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::84) (00: 0000042733) (01: 0000001100) (02: 0000000370) (03: 0000000671) 
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::84) (04: 0000000223) (05: 0000000049) (06: 0000000003) (09: 0000000004) 
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::92) (10: 0000000013) 
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::69) histogram dump: proxy (730 total)
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::84) (00: 0000000679) (01: 0000000019) (02: 0000000006) (03: 0000000015) 
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::84) (04: 0000000004) (05: 0000000004) (06: 0000000001) (07: 0000000001) 
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::92) (08: 0000000001) 
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::69) histogram dump: writes_reply (34733 total)
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::84) (00: 0000034280) (01: 0000000225) (02: 0000000115) (03: 0000000088) 
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::92) (04: 0000000024) (05: 0000000001) 
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::69) histogram dump: udf (0 total)
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::69) histogram dump: query (0 total)
Aug 05 2014 19:02:13 GMT: INFO (info): (hist.c::69) histogram dump: query_rec_count (0 total)
Aug 05 2014 19:02:15 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:02:15 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:02:15 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:02:15 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:02:15 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:02:15 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:02:19 GMT: INFO (hb): (hb.c::1713) HB node bb9ee492d91bb06 in different cluster - succession lists don't match
Aug 05 2014 19:02:20 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:02:20 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:02:20 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:02:20 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:02:20 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:02:20 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:02:23 GMT: INFO (info): (thr_info.c::4228) system memory: free 3704716kb ( 96 percent free )
Aug 05 2014 19:02:23 GMT: INFO (info): (thr_info.c::4235) migrates in progress ( 0 , 0 ) ::: ClusterSize 2 ::: objects 47317
Aug 05 2014 19:02:23 GMT: INFO (info): (thr_info.c::4243) rec refs 47317 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 1180
Aug 05 2014 19:02:23 GMT: INFO (info): (thr_info.c::4249) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Aug 05 2014 19:02:23 GMT: INFO (info): (thr_info.c::4257) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (3, 19046, 19043) : hb 74 : fab 46
Aug 05 2014 19:02:23 GMT: INFO (info): (thr_info.c::4259) heartbeat_received: self 0 : foreign 184920360
Aug 05 2014 19:02:23 GMT: INFO (info): (thr_info.c::4272) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 1180 migtx 0 ssdr 0 ssdw 0 rw 0
Aug 05 2014 19:02:23 GMT: INFO (info): (thr_info.c::4288) namespace test: disk inuse: 0 memory inuse: 5895340 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Aug 05 2014 19:02:23 GMT: INFO (info): (thr_info.c::4313) partitions: actual 1999 sync 1605 desync 492 zombie 0 wait 0 absent 0
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::69) histogram dump: reads (28 total)
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::92) (00: 0000000028) 
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::69) histogram dump: writes_master (45166 total)
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::84) (00: 0000042733) (01: 0000001100) (02: 0000000370) (03: 0000000671) 
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::84) (04: 0000000223) (05: 0000000049) (06: 0000000003) (09: 0000000004) 
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::92) (10: 0000000013) 
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::69) histogram dump: proxy (730 total)
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::84) (00: 0000000679) (01: 0000000019) (02: 0000000006) (03: 0000000015) 
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::84) (04: 0000000004) (05: 0000000004) (06: 0000000001) (07: 0000000001) 
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::92) (08: 0000000001) 
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::69) histogram dump: writes_reply (34733 total)

oldforum · August 18, 2014, 5:52pm

Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::84) (00: 0000034280) (01: 0000000225) (02: 0000000115) (03: 0000000088) 
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::92) (04: 0000000024) (05: 0000000001) 
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::69) histogram dump: udf (0 total)
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::69) histogram dump: query (0 total)
Aug 05 2014 19:02:23 GMT: INFO (info): (hist.c::69) histogram dump: query_rec_count (0 total)
Aug 05 2014 19:02:25 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:02:25 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:02:25 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:02:25 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:02:25 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:02:25 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:02:30 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:02:30 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:02:30 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:02:30 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:02:30 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:02:30 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:02:31 GMT: INFO (hb): (hb.c::1713) HB node bb92454f68be506 in same cluster - succession lists don't match
Aug 05 2014 19:02:32 GMT: INFO (hb): (hb.c::1713) HB node bb9c6849be34606 in different cluster - succession lists don't match
Aug 05 2014 19:02:33 GMT: INFO (paxos): (paxos.c::2437) Ignoring self(bb9000a04053506) add from Principal bb92454f68be506
Aug 05 2014 19:02:33 GMT: INFO (partition): (partition.c::385) DISALLOW MIGRATIONS
Aug 05 2014 19:02:33 GMT: INFO (paxos): (paxos.c::2668) SUCCESSION [23.0]*: bb92454f68be506 bb9000a04053506 
Aug 05 2014 19:02:33 GMT: INFO (paxos): (paxos.c::2041) Sent partition sync request to node bb92454f68be506
Aug 05 2014 19:02:33 GMT: INFO (paxos): (paxos.c::2437) Ignoring self(bb9000a04053506) add from Principal bb92454f68be506
Aug 05 2014 19:02:33 GMT: INFO (paxos): (paxos.c::2437) Ignoring self(bb9000a04053506) add from Principal bb92454f68be506
Aug 05 2014 19:02:33 GMT: INFO (paxos): (paxos.c::2756) received partition sync message from bb92454f68be506
Aug 05 2014 19:02:33 GMT: INFO (config): (cluster_config.c::421) Rack Aware is disabled.
Aug 05 2014 19:02:33 GMT: INFO (partition): (cluster_config.c::382) Rack Aware is disabled.
Aug 05 2014 19:02:33 GMT: INFO (partition): (partition.c::2834) CLUSTER SIZE = 2
Aug 05 2014 19:02:33 GMT: INFO (partition): (partition.c::2873) Global state is well formed
Aug 05 2014 19:02:33 GMT: INFO (paxos): (partition.c::2527) setting replication factors: cluster size 2, paxos single replica limit 1
Aug 05 2014 19:02:33 GMT: INFO (paxos): (partition.c::2534) {test} replication factor is 2
Aug 05 2014 19:02:33 GMT: INFO (paxos): (partition.c::3783) global partition state: total 4096 lost 0 unique 492 duplicate 3604
Aug 05 2014 19:02:33 GMT: INFO (paxos): (partition.c::3784) partition state after fixing lost partitions (master): total 4096 lost 0 unique 492 duplicate 3604
Aug 05 2014 19:02:33 GMT: INFO (paxos): (partition.c::3785) 0 new partition version tree paths generated
Aug 05 2014 19:02:33 GMT: INFO (partition): (partition.c::364) ALLOW MIGRATIONS
Aug 05 2014 19:02:33 GMT: INFO (info): (thr_info.c::4228) system memory: free 3705912kb ( 96 percent free )
Aug 05 2014 19:02:33 GMT: INFO (info): (thr_info.c::4235) migrates in progress ( 1605 , 0 ) ::: ClusterSize 2 ::: objects 47317
Aug 05 2014 19:02:33 GMT: INFO (info): (thr_info.c::4243) rec refs 47317 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 1605 ::: mig rx 38
Aug 05 2014 19:02:33 GMT: INFO (info): (thr_info.c::4249) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Aug 05 2014 19:02:33 GMT: INFO (info): (thr_info.c::4257) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 19053, 19053) : hb 74 : fab 46
Aug 05 2014 19:02:33 GMT: INFO (info): (thr_info.c::4259) heartbeat_received: self 0 : foreign 184954791
Aug 05 2014 19:02:33 GMT: INFO (info): (thr_info.c::4272) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 38 migtx 1605 ssdr 0 ssdw 0 rw 0
Aug 05 2014 19:02:33 GMT: INFO (info): (thr_info.c::4288) namespace test: disk inuse: 0 memory inuse: 5895340 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Aug 05 2014 19:02:33 GMT: INFO (info): (thr_info.c::4313) partitions: actual 1999 sync 1605 desync 492 zombie 0 wait 0 absent 0
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::69) histogram dump: reads (28 total)
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::92) (00: 0000000028) 
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::69) histogram dump: writes_master (45166 total)
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::84) (00: 0000042733) (01: 0000001100) (02: 0000000370) (03: 0000000671) 
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::84) (04: 0000000223) (05: 0000000049) (06: 0000000003) (09: 0000000004) 
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::92) (10: 0000000013) 
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::69) histogram dump: proxy (730 total)
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::84) (00: 0000000679) (01: 0000000019) (02: 0000000006) (03: 0000000015) 
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::84) (04: 0000000004) (05: 0000000004) (06: 0000000001) (07: 0000000001) 
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::92) (08: 0000000001) 
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::69) histogram dump: writes_reply (34733 total)
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::84) (00: 0000034280) (01: 0000000225) (02: 0000000115) (03: 0000000088) 
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::92) (04: 0000000024) (05: 0000000001) 
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::69) histogram dump: udf (0 total)
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::69) histogram dump: query (0 total)
Aug 05 2014 19:02:33 GMT: INFO (info): (hist.c::69) histogram dump: query_rec_count (0 total)
Aug 05 2014 19:02:35 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:02:35 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:02:35 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:02:35 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:02:35 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:02:38 GMT: INFO (paxos): (paxos.c::2443) Ignoring self(bb9000a04053506) remove from Principal bb92454f68be506
Aug 05 2014 19:02:38 GMT: INFO (paxos): (paxos.c::2443) Ignoring self(bb9000a04053506) remove from Principal bb92454f68be506
Aug 05 2014 19:02:38 GMT: INFO (paxos): (paxos.c::2443) Ignoring self(bb9000a04053506) remove from Principal bb92454f68be506
Aug 05 2014 19:02:39 GMT: INFO (hb): (hb.c::1428) remote close: fd 85 event 2019
Aug 05 2014 19:02:39 GMT: INFO (hb): (hb.c::1428) remote close: fd 86 event 2019
Aug 05 2014 19:02:39 GMT: INFO (hb): (hb.c::1428) remote close: fd 87 event 2019
Aug 05 2014 19:02:39 GMT: INFO (hb): (hb.c::1428) remote close: fd 88 event 2019
Aug 05 2014 19:02:39 GMT: INFO (hb): (hb.c::1428) remote close: fd 89 event 2019
Aug 05 2014 19:02:39 GMT: INFO (hb): (hb.c::1428) remote close: fd 90 event 2019
Aug 05 2014 19:02:39 GMT: INFO (hb): (hb.c::1428) remote close: fd 91 event 2019
Aug 05 2014 19:02:40 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:02:40 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:02:40 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb92454f68be506 and self bb9000a04053506
Aug 05 2014 19:02:40 GMT: WARNING (paxos): (paxos.c::2269) CLUSTER INTEGRITY FAULT DETECTED! I am not principal.
Aug 05 2014 19:02:40 GMT: WARNING (paxos): (paxos.c::2271) AUTOMATIC CLUSTER INTEGRITY FAULT RECOVERY [Phase 1 of 2]: dunning principal: bb92454f68be506.
Aug 05 2014 19:02:40 GMT: INFO (hb): (hb.c::499) dunning nodes: bb92454f68be506
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1775) removing node on heartbeat failure: bb92454f68be506
Aug 05 2014 19:02:42 GMT: INFO (fabric): (fabric.c::1782) fabric: node bb92454f68be506 departed
Aug 05 2014 19:02:42 GMT: INFO (fabric): (fabric.c::1692) fabric disconnecting node: bb92454f68be506
Aug 05 2014 19:02:42 GMT: WARNING (paxos): (paxos.c::1929) quorum visibility lost! Continuing anyway ...
Aug 05 2014 19:02:42 GMT: WARNING (paxos): (paxos.c::1786) quorum visibility lost! Continuing anyway ...
Aug 05 2014 19:02:42 GMT: INFO (paxos): (paxos.c::1798) as_paxos_spark establishing transaction [24.0] 1 first op 4 bb92454f68be506
Aug 05 2014 19:02:42 GMT: INFO (partition): (partition.c::385) DISALLOW MIGRATIONS
Aug 05 2014 19:02:42 GMT: INFO (paxos): (paxos.c::1406) SUCCESSION [24.0]: bb9000a04053506 
Aug 05 2014 19:02:42 GMT: INFO (paxos): (paxos.c::2598) SINGLE NODE CLUSTER!!!
Aug 05 2014 19:02:42 GMT: INFO (config): (cluster_config.c::421) Rack Aware is disabled.
Aug 05 2014 19:02:42 GMT: INFO (partition): (cluster_config.c::382) Rack Aware is disabled.
Aug 05 2014 19:02:42 GMT: INFO (partition): (partition.c::2834) CLUSTER SIZE = 1
Aug 05 2014 19:02:42 GMT: INFO (partition): (partition.c::2873) Global state is well formed
Aug 05 2014 19:02:42 GMT: INFO (paxos): (partition.c::2527) setting replication factors: cluster size 1, paxos single replica limit 1
Aug 05 2014 19:02:42 GMT: INFO (paxos): (partition.c::2534) {test} replication factor is 1
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 86 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 87 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 88 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 89 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 90 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 91 event 2019
Aug 05 2014 19:02:42 GMT: INFO (paxos): (partition.c::3783) global partition state: total 4096 lost 492 unique 3604 duplicate 0
Aug 05 2014 19:02:42 GMT: INFO (paxos): (partition.c::3784) partition state after fixing lost partitions (master): total 4096 lost 0 unique 4096 duplicate 0
Aug 05 2014 19:02:42 GMT: INFO (paxos): (partition.c::3785) 3604 new partition version tree paths generated
Aug 05 2014 19:02:42 GMT: INFO (partition): (partition.c::364) ALLOW MIGRATIONS
Aug 05 2014 19:02:42 GMT: WARNING (smd): (system_metadata.c::1741) asmg(): No module name found!
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 86 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 87 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 88 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 89 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 90 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 91 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 86 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 87 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 88 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 89 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 90 event 2019
Aug 05 2014 19:02:42 GMT: INFO (hb): (hb.c::1428) remote close: fd 91 event 2019
Aug 05 2014 19:02:43 GMT: INFO (info): (thr_info.c::4228) system memory: free 3710220kb ( 96 percent free )
Aug 05 2014 19:02:43 GMT: INFO (info): (thr_info.c::4235) migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 47317
Aug 05 2014 19:02:43 GMT: INFO (info): (thr_info.c::4243) rec refs 47317 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 38
Aug 05 2014 19:02:43 GMT: INFO (info): (thr_info.c::4249) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Aug 05 2014 19:02:43 GMT: INFO (info): (thr_info.c::4257) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 19063, 19063) : hb 79 : fab 34
Aug 05 2014 19:02:43 GMT: INFO (info): (thr_info.c::4259) heartbeat_received: self 0 : foreign 184985511
Aug 05 2014 19:02:43 GMT: INFO (info): (thr_info.c::4272) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 38 migtx 0 ssdr 0 ssdw 0 rw 0
Aug 05 2014 19:02:43 GMT: INFO (info): (thr_info.c::4288) namespace test: disk inuse: 0 memory inuse: 5895340 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Aug 05 2014 19:02:43 GMT: INFO (info): (thr_info.c::4313) partitions: actual 4096 sync 0 desync 0 zombie 0 wait 0 absent 0
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::69) histogram dump: reads (28 total)
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::92) (00: 0000000028) 
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::69) histogram dump: writes_master (45166 total)
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::84) (00: 0000042733) (01: 0000001100) (02: 0000000370) (03: 0000000671) 
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::84) (04: 0000000223) (05: 0000000049) (06: 0000000003) (09: 0000000004) 
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::92) (10: 0000000013) 
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::69) histogram dump: proxy (730 total)
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::84) (00: 0000000679) (01: 0000000019) (02: 0000000006) (03: 0000000015) 
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::84) (04: 0000000004) (05: 0000000004) (06: 0000000001) (07: 0000000001) 
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::92) (08: 0000000001) 
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::69) histogram dump: writes_reply (34733 total)
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::84) (00: 0000034280) (01: 0000000225) (02: 0000000115) (03: 0000000088) 
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::92) (04: 0000000024) (05: 0000000001) 
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::69) histogram dump: udf (0 total)
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::69) histogram dump: query (0 total)
Aug 05 2014 19:02:43 GMT: INFO (info): (hist.c::69) histogram dump: query_rec_count (0 total)
Aug 05 2014 19:02:44 GMT: WARNING (cf:socket): (socket.c::360) Error in delayed connect(): timed out
Aug 05 2014 19:02:44 GMT: INFO (hb): (hb.c::1303) couldn't connect to remote heartbeat service: at 10.10.0.245:3002 Connection timed out
Aug 05 2014 19:02:45 GMT: WARNING (cf:socket): (socket.c::360) Error in delayed connect(): timed out
Aug 05 2014 19:02:45 GMT: INFO (hb): (hb.c::1303) couldn't connect to remote heartbeat service: at 10.10.0.245:3002 Connection timed out
Aug 05 2014 19:02:45 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9c6849be34606 and self bb9000a04053506
Aug 05 2014 19:02:45 GMT: INFO (paxos): (paxos.c::2160) Cluster Integrity Check: Detected succession list discrepancy between node bb9ee492d91bb06 and self bb9000a04053506
Aug 05 2014 19:02:45 GMT: WARNING (paxos): (paxos.c::2278) I am not principal and waiting to undun principal: Count 1 of 5.
Aug 05 2014 19:02:46 GMT: INFO (partition): (partition.c::385) DISALLOW MIGRATIONS
Aug 05 2014 19:02:46 GMT: INFO (paxos): (paxos.c::2668) SUCCESSION [1.0]*: bb9c6849be34606 bb9000a04053506 
Aug 05 2014 19:02:46 GMT: INFO (paxos): (paxos.c::2041) Sent partition sync request to node bb9c6849be34606
Aug 05 2014 19:02:46 GMT: WARNING (cf:socket): (socket.c::360) Error in delayed connect(): timed out
Aug 05 2014 19:02:46 GMT: INFO (hb): (hb.c::1303) couldn't connect to remote heartbeat service: at 10.10.0.245:3002 Connection timed out
Aug 05 2014 19:02:46 GMT: INFO (partition): (partition.c::385) DISALLOW MIGRATIONS
Aug 05 2014 19:02:46 GMT: INFO (paxos): (paxos.c::2668) SUCCESSION [1.0]*: bb9c6849be34606 bb9000a04053506 
Aug 05 2014 19:02:46 GMT: INFO (paxos): (paxos.c::2041) Sent partition sync request to node bb9c6849be34606
Aug 05 2014 19:02:47 GMT: WARNING (cf:socket): (socket.c::360) Error in delayed connect(): timed out
Aug 05 2014 19:02:47 GMT: INFO (hb): (hb.c::1303) couldn't connect to remote heartbeat service: at 10.10.0.245:3002 Connection timed out
Aug 05 2014 19:02:47 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364437129 last 364434107
Aug 05 2014 19:02:47 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 361 node bb9ee492d91bb06
Aug 05 2014 19:02:47 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364437280 last 364434107
Aug 05 2014 19:02:47 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 481 node bb9ee492d91bb06
Aug 05 2014 19:02:47 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364437430 last 364434107
Aug 05 2014 19:02:47 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 642 node bb9ee492d91bb06
Aug 05 2014 19:02:47 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364437580 last 364434107
Aug 05 2014 19:02:47 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 802 node bb9ee492d91bb06
Aug 05 2014 19:02:47 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364437730 last 364434107
Aug 05 2014 19:02:47 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 962 node bb9ee492d91bb06
Aug 05 2014 19:02:48 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364437880 last 364434107
Aug 05 2014 19:02:48 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 1089 node bb9ee492d91bb06
Aug 05 2014 19:02:48 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364438030 last 364434107
Aug 05 2014 19:02:48 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 1249 node bb9ee492d91bb06
Aug 05 2014 19:02:48 GMT: WARNING (cf:socket): (socket.c::360) Error in delayed connect(): timed out
Aug 05 2014 19:02:48 GMT: INFO (hb): (hb.c::1303) couldn't connect to remote heartbeat service: at 10.10.0.245:3002 Connection timed out
Aug 05 2014 19:02:48 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364438181 last 364434107
Aug 05 2014 19:02:48 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 1410 node bb9ee492d91bb06
Aug 05 2014 19:02:48 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364438331 last 364434107
Aug 05 2014 19:02:48 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 1531 node bb9ee492d91bb06
Aug 05 2014 19:02:48 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364438481 last 364434107
Aug 05 2014 19:02:48 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 1692 node bb9ee492d91bb06
Aug 05 2014 19:02:48 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364438631 last 364434107
Aug 05 2014 19:02:48 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 1852 node bb9ee492d91bb06
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364438781 last 364434107
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 2013 node bb9ee492d91bb06
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364438931 last 364434107
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 2133 node bb9ee492d91bb06
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1713) HB node bb9ee492d91bb06 in different cluster - succession lists don't match
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364439082 last 364436009
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 2655 node bb9c6849be34606
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364439082 last 364434107
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 2294 node bb9ee492d91bb06
Aug 05 2014 19:02:49 GMT: WARNING (cf:socket): (socket.c::360) Error in delayed connect(): timed out
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1303) couldn't connect to remote heartbeat service: at 10.10.0.245:3002 Connection timed out
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364439232 last 364436009
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 2815 node bb9c6849be34606
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364439232 last 364434107
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 2454 node bb9ee492d91bb06
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364439382 last 364436009
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 2975 node bb9c6849be34606
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364439382 last 364434107
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 2614 node bb9ee492d91bb06
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364439532 last 364436009
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1591) hb expires but fabric says DEAD: node bb9c6849be34606
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1578) hb considers expiring: now 364439532 last 364434107
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1601) hb expires but fabric says ALIVE: lasttime 2735 node bb9ee492d91bb06
Aug 05 2014 19:02:49 GMT: INFO (hb): (hb.c::1775) removing node on heartbeat failure: bb9c6849be34606

wowuwowuwo · November 13, 2014, 2:52am

I have seen the same problem twice since my cluster began running. I restarted the node with error log “CLUSTER INTEGRITY FAULT”. I want to know why this happen.

kporter · November 14, 2014, 8:33am

Could you provide a copy of your aerospike.conf file?

Multicast Heartbeats

If you are configured to use multicast heartbeats, cluster faults are often caused by a switch being configured with storm-control or IGMP snooping. We recommend disabling storm-control.

For IGMP snooping either turn off IGMP snooping or enable the querier (a.k.a multicast routing).

You can test multicast within the cluster by using or Validate multicast with MTools guide.

Mesh Heartbeats

If you are configured to use mesh heartbeats, there have been many improvements in the mesh protocol since 3.3.17, so if you are running a prior version and experiencing these messages we recommend upgrading.

Topic		Replies	Views
Inconsistent reads during a node's start up Operations	2	1224	June 17, 2016
Data inconsistencies in reviving a dead node Installation	5	965	September 20, 2022
Cluster integrity fault Operations	1	2123	January 24, 2016
Inconsistent data after simulated failure Operations	5	1789	December 4, 2024
Removal of paxos-single-replica-limit and fail_on_cluster_change Feature Discussion	3	642	July 18, 2024