Mannoj
March 8, 2018, 8:26pm
1
I am on 3.9.0.2.
I added 5 new empty nodes to an existing 5-node cluster with data, but they are not joining the cluster.
asadm shows a Cluster Visibility error (Please check services list):
Mar 08 2018 20:15:13 GMT: INFO (info): (ticker.c:415) {seooi} device-usage: used-bytes 0 avail-pct 99
Mar 08 2018 20:15:13 GMT: INFO (partition): (partition.c:235) DISALLOW MIGRATIONS
Mar 08 2018 20:15:13 GMT: INFO (paxos): (paxos.c:147) cluster_key set to 0x799ae23ce8966716
Mar 08 2018 20:15:13 GMT: INFO (paxos): (paxos.c:3201) SUCCESSION [1520540107]@bb998c054005452*: bb998c054005452 bb997c054005452 bb996c054005452 bb995c054005452 bb994c054005452 bb94bc354005452 bb94ac354005452 bb949c354005452 bb948c354005452 bb946c354005452
Mar 08 2018 20:15:13 GMT: INFO (paxos): (paxos.c:3212) node bb998c054005452 is still principal pro tempore
Mar 08 2018 20:15:13 GMT: INFO (paxos): (paxos.c:2328) Sent partition sync request to node bb998c054005452
I can see traffic going back and forth on ports 3001, 3002 and 3003 between both sets of nodes.
At the very least I would like to stop adding nodes and let the old nodes keep serving. I'm worried about the details below, which appear on the new nodes only: pending migrates are stuck like this. I'm not sure whether I can safely shut down the new nodes. I need help, as the apps are idle right now.
Node      Avail%   Evictions   Master    Replica   Repl     Stop     Pending
.         .        .           Objects   Objects   Factor   Writes   Migrates
.         .        .           .         .         .        .        (tx%,rx%)
:3000     99       0.000       0.000     0.000     2        false    (0,0)
:3000     99       0.000       0.000     0.000     2        false    (0,0)
:3000     99       0.000       0.000     0.000     2        false    (100,100)
:3000     99       0.000       0.000     0.000     2        false    (100,100)
:3000     99       0.000       0.000     0.000     2        false    (0,0)
                   0.000       0.000     0.000                       (31,52)
Could you show the rest of the output from asadm -e info?
Could you also run:
asadm -e 'show stat like migrate'
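If asadm is unavailable, the same counters can be pulled per node: `asinfo -v 'statistics'` returns one long semicolon-separated key=value string. A minimal parsing sketch; the sample string below is hand-built from values seen later in this thread rather than a live response, and a real run would point `asinfo -h` at one of the nodes:

```shell
# Sample of the semicolon-separated format returned by: asinfo -v 'statistics'
# (values copied from this thread; a live run would query a node with -h).
stats='cluster_size=3;migrate_allowed=false;migrate_partitions_remaining=2720;migrate_progress_send=2720'

# Split on ';' and keep only the migration counters.
echo "$stats" | tr ';' '\n' | grep '^migrate'
```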
Mannoj
March 8, 2018, 9:11pm
3
Admin> show stat like migrate
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : survey001.com:3000 survey002.com:3000 survey003.com:3000 survey004.com:3000 survey005.com:3000
migrate_allowed : false false false false false
migrate_partitions_remaining: 0 0 2720 2709 0
migrate_progress_recv : 0 0 2720 2709 0
migrate_progress_send : 0 0 2720 2709 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : survey001.com:3000 survey002.com:3000 survey003.com:3000 survey004.com:3000 survey005.com:3000
migrate-order : 5 5 5 5 5
migrate-sleep : 1 1 1 1 1
migrate_record_receives : 0 0 0 0 0
migrate_record_retransmits : 0 0 0 0 0
migrate_records_skipped : 0 0 0 0 0
migrate_records_transmitted : 0 0 0 0 0
migrate_rx_instances : 0 0 0 0 0
migrate_rx_partitions_active : 0 0 0 0 0
migrate_rx_partitions_initial : 825 859 1904 1899 1956
migrate_rx_partitions_remaining: 0 0 1904 1899 0
migrate_tx_instances : 0 0 0 0 0
migrate_tx_partitions_active : 0 0 0 0 0
migrate_tx_partitions_imbalance: 0 0 0 0 0
migrate_tx_partitions_initial : 1392 1412 816 810 836
migrate_tx_partitions_remaining: 0 0 816 810 0
Admin>
Here is the Network Information section of asadm -e info (columns: Node, Node Id, Ip, Build, Cluster Size, Cluster Key, Integrity, Principal, Client Conns, Uptime):
survey001.com:3000 BB946C354005452 11.14.11.1:3000 C-3.9.0.2 3 E5AA55F4E5F82B35 True BB998C054005452 16 01:30:32
survey002.com:3000 BB948C354005452 11.14.11.2:3000 C-3.9.0.2 3 E5AA55F4E5F82B35 True BB998C054005452 13 01:30:32
survey003.com:3000 BB949C354005452 11.14.11.3:3000 C-3.9.0.2 3 E5AA55F4E5F82B35 True BB998C054005452 16 01:30:32
survey004.com:3000 BB94AC354005452 11.14.11.4:3000 C-3.9.0.2 3 E5AA55F4E5F82B35 True BB998C054005452 16 01:30:32
survey005.com:3000 *BB94BC354005452 11.14.11.5:3000 C-3.9.0.2 3 E5AA55F4E5F82B35 True BB998C054005452 17 01:30:32
Number of rows: 5
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace Node Avail% Evictions Master Replica Repl Stop Pending Disk Disk HWM Mem Mem HWM Stop
. . . . Objects Objects Factor Writes Migrates Used Used% Disk% Used Used% Mem% Writes%
. . . . . . . . (tx%,rx%) . . . . . . .
seooi 000000001.com:3000 99 0.000 0.000 0.000 2 false (0,0) 0.000 B 0 50 0.000 B 0 60 90
seooi 000000002.com:3000 99 0.000 0.000 0.000 2 false (0,0) 0.000 B 0 50 0.000 B 0 60 90
seooi 000000003.com:3000 99 0.000 0.000 0.000 2 false (100,100) 0.000 B 0 50 0.000 B 0 60 90
seooi 000000004.com:3000 99 0.000 0.000 0.000 2 false (100,100) 0.000 B 0 50 0.000 B 0 60 90
seooi 000000005.com:3000 99 0.000 0.000 0.000 2 false (0,0) 0.000 B 0 50 0.000 B 0 60 90
seooi 0.000 0.000 0.000 (31,52) 0.000 B 0.000 B
Number of rows: 6
lucien
March 8, 2018, 9:17pm
4
Are your namespaces in the same position in the aerospike.conf on the newer nodes?
Is this a mesh or multicast configured cluster?
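For context, in a mesh-configured cluster every node dials the peers listed in its heartbeat stanza, which is exactly where old and new nodes most often diverge. A minimal sketch of the relevant aerospike.conf fragment (addresses and timings illustrative only):

```
network {
    heartbeat {
        mode mesh
        port 3002
        # Every node, old and new, should list seed peers it can reach.
        mesh-seed-address-port 11.14.11.1 3002
        mesh-seed-address-port 11.14.11.5 3002
        interval 150
        timeout 10
    }
}
```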
Mannoj
March 8, 2018, 9:21pm
5
The older nodes' config doesn't list the new nodes' IPs, while the new nodes list both the old and the new IPs. Other than that, everything is the same. Also, the older nodes have another namespace that isn't required on the new nodes, so I left that namespace block out of the new nodes' config.
lucien
March 8, 2018, 9:24pm
6
That is the issue! On the older versions, the namespace count and their position in the config have to match on every node!
Mannoj
March 8, 2018, 9:25pm
7
You mean I should have all namespaces on the new nodes too?
Mannoj
March 8, 2018, 9:27pm
8
If that is the case, can I shut down now and redo it, adding the left-out namespace as well? Is the new cluster in a safe state? I will take a backup of the old nodes, then rm -f the .dat file on the new nodes, add entries for both namespaces, and start the new nodes. Is this approach good?
lucien
March 8, 2018, 9:27pm
9
Correct! Namespace position and namespace count have to be the same.
In the version you have, adding a namespace requires a cluster shutdown. This has changed in newer versions.
I think the steps could be:
- Take a backup.
- Shut down the cluster.
- Fix the config so the namespaces match.
- Restart the cluster.
I don’t believe you need to rm any storage files.
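To illustrate the "fix config" step: on 3.9.x every node's aerospike.conf would need to declare the same namespaces in the same order, so the new nodes get both blocks even if one holds no data yet. A sketch only; the sizes are illustrative and `other_ns` is a placeholder for the namespace currently missing on the new nodes:

```
# Same blocks, same order, on EVERY node (old and new).
namespace seooi {
    replication-factor 2
    memory-size 4G
    storage-engine device {
        file /opt/aerospike/data/seooi.dat
        filesize 16G
    }
}

namespace other_ns {
    replication-factor 2
    memory-size 2G
    storage-engine memory
}
```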
Mannoj
March 8, 2018, 9:32pm
10
- Take a backup (from the CURRENT nodes that hold data).
- Shut down the cluster (the NEW, empty nodes).
- Fix the config so the namespaces match (on the NEW, empty nodes).
- Restart the cluster (the NEW, empty nodes).
lucien
March 8, 2018, 9:34pm
11
Correct! That should work! A backup is not strictly needed, but it's always good practice!
Mannoj
March 8, 2018, 9:44pm
12
While running the backup I got:
2018-03-08 21:38:40 GMT [ERR] [15949] Error while running node scan for BB998C054005452 - code 7: AEROSPIKE_ERR_CLUSTER_CHANGE at src/main/aerospike/aerospike_scan.c:190
lucien
March 8, 2018, 9:52pm
13
Ah yes, you are doing a backup while the cluster is changing (migrations are running).
asbackup does have a flag that allows for this.
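For reference, the flag in question is `--no-cluster-change`, which tells asbackup not to abort the scan when the cluster key changes mid-backup (at the cost of a possibly inconsistent snapshot). A sketch of the invocation, using the host and namespace from this thread and a hypothetical backup directory; the block only assembles and prints the command, since actually running it needs a live cluster:

```shell
# Assemble the asbackup command (survey001.com and seooi are from the thread;
# /backup/seooi is a hypothetical output directory).
cmd="asbackup -h survey001.com -p 3000 -n seooi -d /backup/seooi \
  --no-cluster-change"
echo "$cmd"
```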
Mannoj
March 8, 2018, 10:03pm
14
I re-ran it with the --no-cluster-change argument, and it's progressing now. Do you think the apps' GETs are not being served at this moment?
Mannoj
March 8, 2018, 10:26pm
15
The steps above are done and migrations are kicking in. Happy to see that now. Thanks all, you guys are really helpful when needed.
Mannoj
March 8, 2018, 10:46pm
16
I see that from 3.13.3 onwards, having a namespace on only certain nodes is supported.