Unable to create the cluster using mesh


#1

Hi,

I am running Aerospike on a single machine (holding 7 million objects). I now want to start this machine in mesh mode and add a second machine to the mesh cluster. I have tried this but am unable to form the cluster.

Environment: AWS
Version: 3.6
Config:

service {
	user root
	group root
	paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
	pidfile /var/run/aerospike/asd.pid
	service-threads 24
	transaction-queues 24
	transaction-threads-per-queue 4
	proto-fd-max 15000
}

logging {
	# Log file must be an absolute path.
	file /var/log/aerospike/aerospike.log {
		context any info
	}
}

network {
	service {
		address any
		port 3000
	}

	heartbeat {
		mode mesh
		address 10.20.30.150
		port 3002
		mesh-seed-address-port 10.20.30.151 3002
		interval 250
		timeout 25
	}

	fabric {
		port 3001
	}

	info {
		port 3003
	}
}

namespace test {
	replication-factor 2
	memory-size 8G
	default-ttl 0 # 0 means records never expire/evict.
	storage-engine device {
		device /dev/xvdb1
		write-block-size 128K
	}
}


namespace test1 {
	replication-factor 2
	ldt-enabled true
	memory-size 10G
	default-ttl 3600 # 1 hour, use 0 to never expire/evict.
	storage-engine memory
}

The config on node2 is the same as above except for mesh-seed-address-port.

Error:

 Oct 24 2016 13:36:20 GMT: INFO (paxos): (paxos.c::3214) ... other node(s) detected - node will operate in a multi-node cluster
 Oct 24 2016 13:36:20 GMT: INFO (paxos): (paxos.c::3183) paxos supervisor thread started
 Oct 24 2016 13:36:20 GMT: INFO (partition): (partition.c::392) DISALLOW MIGRATIONS
 Oct 24 2016 13:36:20 GMT: INFO (nsup): (thr_nsup.c::1144) namespace supervisor started
 Oct 24 2016 13:36:20 GMT: INFO (ldt): (thr_nsup.c::1107) LDT supervisor started
 Oct 24 2016 13:36:20 GMT: INFO (paxos): (paxos.c::2863) SUCCESSION [1.0]@bb969791de89a02*: bb969791de89a02 bb95b6df0d57f02
 Oct 24 2016 13:36:20 GMT: INFO (paxos): (paxos.c::2874) node bb969791de89a02 is now principal pro tempore
 Oct 24 2016 13:36:20 GMT: INFO (demarshal): (thr_demarshal.c::260) Saved original JEMalloc arena #10 for thr_demarshal()
 Oct 24 2016 13:36:20 GMT: INFO (paxos): (paxos.c::2227) Sent partition sync request to node bb969791de89a02
 Oct 24 2016 13:36:20 GMT: INFO (demarshal): (thr_demarshal.c::288) Service started: socket 3000
 Oct 24 2016 13:36:20 GMT: WARNING (tsvc): (thr_tsvc.c::382) rejecting client transaction - initial partition balance unresolved
 Oct 24 2016 13:36:20 GMT: WARNING (tsvc): (thr_tsvc.c::382) rejecting client transaction - initial partition balance unresolved
 Oct 24 2016 13:36:20 GMT: WARNING (tsvc): (thr_tsvc.c::382) rejecting client transaction - initial partition balance unresolved

#2

The heartbeat layer has heard from another node, so the node which is starting knows it will be joining a multi-node cluster.

The cluster the node is joining appears to be a two-node cluster.

The nodes have begun exchanging partition version information.
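
To make the reading of the SUCCESSION line concrete, here is a minimal Python sketch that pulls the node IDs out of it. The line format is inferred only from the single log line posted above, and the helper name parse_succession is made up for illustration. Treating the ID before the `*` as the principal is an assumption based on the "is now principal pro tempore" line that follows it.

```python
import re

# Pattern inferred from the one SUCCESSION line in the posted log;
# other server versions may format this line differently.
SUCCESSION_RE = re.compile(
    r"SUCCESSION \[[^\]]*\]@(?P<principal>[0-9a-f]+)\*?: (?P<nodes>[0-9a-f ]+)$"
)


def parse_succession(line):
    """Return (principal_id, [node_ids]) or None if the line does not match."""
    match = SUCCESSION_RE.search(line.strip())
    if match is None:
        return None
    return match.group("principal"), match.group("nodes").split()


line = (
    "Oct 24 2016 13:36:20 GMT: INFO (paxos): (paxos.c::2863) "
    "SUCCESSION [1.0]@bb969791de89a02*: bb969791de89a02 bb95b6df0d57f02"
)
principal, nodes = parse_succession(line)
print(principal, len(nodes))  # prints: bb969791de89a02 2
```

Two node IDs in the list is what suggests a two-node cluster here.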

When a node restarts, there is a brief period during which clients can reach it before the initial rebalance has completed, so the node does not yet know which partitions it will own or which nodes to replicate to. The "rejecting client transaction" warnings in your log are from this window.

If this issue doesn’t clear itself within 30 seconds then there may be other problems, in which case could you share the configuration of the other node?
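
One rough way to check the 30-second window is to measure how long the warning keeps appearing in the log. The Python sketch below assumes the timestamp layout seen in the posted log ("Oct 24 2016 13:36:20 GMT:") and an English locale for month names. The function name unresolved_seconds is hypothetical, not part of any Aerospike tooling.

```python
from datetime import datetime

WARNING_TEXT = "rejecting client transaction - initial partition balance unresolved"
# Timestamp layout assumed from the posted log lines.
TS_FORMAT = "%b %d %Y %H:%M:%S"


def unresolved_seconds(log_lines):
    """Seconds between the first and last matching warning, or None if absent."""
    stamps = []
    for line in log_lines:
        if WARNING_TEXT in line:
            # Everything before " GMT:" is the timestamp.
            ts_text = line.strip().split(" GMT:", 1)[0]
            stamps.append(datetime.strptime(ts_text, TS_FORMAT))
    if not stamps:
        return None
    return (max(stamps) - min(stamps)).total_seconds()


sample = [
    " Oct 24 2016 13:36:20 GMT: INFO (demarshal): (thr_demarshal.c::288) Service started: socket 3000",
    " Oct 24 2016 13:36:20 GMT: WARNING (tsvc): (thr_tsvc.c::382) " + WARNING_TEXT,
    " Oct 24 2016 13:36:20 GMT: WARNING (tsvc): (thr_tsvc.c::382) " + WARNING_TEXT,
]
print(unresolved_seconds(sample))  # prints: 0.0
```

Run against the full /var/log/aerospike/aerospike.log, a result well above 30 would suggest the rebalance is stuck rather than just slow to start.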