Stuck adding nodes


#1

I'm on Aerospike 3.9.0.2.

I added 5 new empty nodes to an existing 5-node cluster with data. The new nodes are not joining the cluster.

asadm -> Cluster Visibility error (Please check services list):
Mar 08 2018 20:15:13 GMT: INFO (info): (ticker.c:415) {seooi} device-usage: used-bytes 0 avail-pct 99
Mar 08 2018 20:15:13 GMT: INFO (partition): (partition.c:235) DISALLOW MIGRATIONS
Mar 08 2018 20:15:13 GMT: INFO (paxos): (paxos.c:147) cluster_key set to 0x799ae23ce8966716
Mar 08 2018 20:15:13 GMT: INFO (paxos): (paxos.c:3201) SUCCESSION [1520540107]@bb998c054005452*: bb998c054005452 bb997c054005452 bb996c054005452 bb995c054005452 bb994c054005452 bb94bc354005452 bb94ac354005452 bb949c354005452 bb948c354005452 bb946c354005452 
Mar 08 2018 20:15:13 GMT: INFO (paxos): (paxos.c:3212) node bb998c054005452 is still principal pro tempore
Mar 08 2018 20:15:13 GMT: INFO (paxos): (paxos.c:2328) Sent partition sync request to node bb998c054005452

I can see the ports pinging back and forth on 3001, 3002 and 3003 between both sets of nodes.

For now I'd at least like to halt adding the nodes and leave the old nodes running. I'm worried by the details below, which come from the new nodes only: pending migrates is stuck like this. Not sure if I can shut down the new nodes. Need help, as the apps are quiet right now.

 Node   Avail%   Evictions    Master   Replica     Repl     Stop     Pending 
    .        .           .   Objects   Objects   Factor   Writes    Migrates 
    .        .           .         .         .        .        .   (tx%,rx%) 
:3000   99         0.000     0.000     0.000     2        false    (0,0)     
:3000   99         0.000     0.000     0.000     2        false    (0,0)     
:3000   99         0.000     0.000     0.000     2        false    (100,100) 
:3000   99         0.000     0.000     0.000     2        false    (100,100) 
:3000   99         0.000     0.000     0.000     2        false    (0,0)     
                   0.000     0.000     0.000                       (31,52)
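The to-and-fro port check can be scripted. A minimal sketch, assuming a hypothetical new-node hostname (`survey006.com`); 3001, 3002 and 3003 are the default fabric, mesh-heartbeat and info ports:

```shell
#!/usr/bin/env bash
# Probe a node's fabric/heartbeat/info ports using bash's /dev/tcp,
# so no nc/telnet dependency is needed. Hostname is hypothetical.
check_port() {
  local host=$1 port=$2
  if timeout 2 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
    echo "$host:$port reachable"
  else
    echo "$host:$port NOT reachable"
  fi
}

# e.g. run from an old node against one of the new nodes:
for port in 3001 3002 3003; do
  check_port survey006.com "$port"
done
```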

#2

Could you show the rest of the output from asadm -e info?

Could you also run:

asadm -e 'show stat like migrate'

#3
Admin> show stat like migrate
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                            :   survey001.com:3000   survey002.com:3000   survey003.com:3000   survey004.com:3000   survey005.com:3000
migrate_allowed                 :   false                false                false                false                false
migrate_partitions_remaining    :   0                    0                    2720                 2709                 0
migrate_progress_recv           :   0                    0                    2720                 2709                 0
migrate_progress_send           :   0                    0                    2720                 2709                 0

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                            :   survey001.com:3000   survey002.com:3000   survey003.com:3000   survey004.com:3000   survey005.com:3000
migrate-order                   :   5                    5                    5                    5                    5
migrate-sleep                   :   1                    1                    1                    1                    1
migrate_record_receives         :   0                    0                    0                    0                    0
migrate_record_retransmits      :   0                    0                    0                    0                    0
migrate_records_skipped         :   0                    0                    0                    0                    0
migrate_records_transmitted     :   0                    0                    0                    0                    0
migrate_rx_instances            :   0                    0                    0                    0                    0
migrate_rx_partitions_active    :   0                    0                    0                    0                    0
migrate_rx_partitions_initial   :   825                  859                  1904                 1899                 1956
migrate_rx_partitions_remaining :   0                    0                    1904                 1899                 0
migrate_tx_instances            :   0                    0                    0                    0                    0
migrate_tx_partitions_active    :   0                    0                    0                    0                    0
migrate_tx_partitions_imbalance :   0                    0                    0                    0                    0
migrate_tx_partitions_initial   :   1392                 1412                 816                  810                  836
migrate_tx_partitions_remaining :   0                    0                    816                  810                  0
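Reading the table: migrations are stuck on survey003 and survey004 (1904 and 1899 rx partitions remaining) while migrate_allowed is false on every node. If you want to watch these counters outside asadm, `asinfo -v statistics` returns one long semicolon-separated key=value string that is easy to filter. A sketch; the sample string below is hand-written to match the figures above:

```shell
#!/usr/bin/env bash
# Filter migration counters out of an 'asinfo -v statistics' response
# (a single semicolon-separated key=value string).
migrate_stats() {
  tr ';' '\n' | grep -E '^migrate_(allowed|partitions_remaining)='
}

# Live usage would be:  asinfo -h survey003.com -v statistics | migrate_stats
# Sample response (values copied from the table above):
sample='cluster_size=10;migrate_allowed=false;migrate_partitions_remaining=2720'
echo "$sample" | migrate_stats
```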

Admin> 

survey001.com:3000   BB946C354005452    11.14.11.1:3000   C-3.9.0.2         3   E5AA55F4E5F82B35   True        BB998C054005452       16   01:30:32   
survey002.com:3000   BB948C354005452    11.14.11.2:3000   C-3.9.0.2         3   E5AA55F4E5F82B35   True        BB998C054005452       13   01:30:32   
survey003.com:3000   BB949C354005452    11.14.11.3:3000   C-3.9.0.2         3   E5AA55F4E5F82B35   True        BB998C054005452       16   01:30:32   
survey004.com:3000   BB94AC354005452    11.14.11.4:3000   C-3.9.0.2         3   E5AA55F4E5F82B35   True        BB998C054005452       16   01:30:32   
survey005.com:3000   *BB94BC354005452   11.14.11.5:3000   C-3.9.0.2         3   E5AA55F4E5F82B35   True        BB998C054005452       17   01:30:32   
Number of rows: 5

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Namespace                                                       Node   Avail%   Evictions    Master   Replica     Repl     Stop     Pending       Disk    Disk     HWM        Mem     Mem    HWM      Stop   
          .                                                          .        .           .   Objects   Objects   Factor   Writes    Migrates       Used   Used%   Disk%       Used   Used%   Mem%   Writes%   
          .                                                          .        .           .         .         .        .        .   (tx%,rx%)          .       .       .          .       .      .         .   
seooi   000000001.com:3000   99         0.000     0.000     0.000     2        false    (0,0)       0.000 B    0       50      0.000 B    0       60     90        
seooi   000000002.com:3000   99         0.000     0.000     0.000     2        false    (0,0)       0.000 B    0       50      0.000 B    0       60     90        
seooi   000000003.com:3000   99         0.000     0.000     0.000     2        false    (100,100)   0.000 B    0       50      0.000 B    0       60     90        
seooi   000000004.com:3000   99         0.000     0.000     0.000     2        false    (100,100)   0.000 B    0       50      0.000 B    0       60     90        
seooi   000000005.com:3000   99         0.000     0.000     0.000     2        false    (0,0)       0.000 B    0       50      0.000 B    0       60     90        
seooi                                                                         0.000     0.000     0.000                       (31,52)     0.000 B                    0.000 B                             
Number of rows: 6

#4

Are your namespaces in the same position in the aerospike.conf on the newer nodes?

Is this a mesh or multicast configured cluster?
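For reference, a mesh heartbeat stanza in aerospike.conf looks roughly like the sketch below; on every node, old and new, the seed list should cover both sets. The .6 address is a hypothetical new-node IP, and the interval/timeout values are just the common defaults:

```
heartbeat {
    mode mesh
    port 3002

    # seeds should include nodes from BOTH the old and the new set
    mesh-seed-address-port 11.14.11.1 3002
    mesh-seed-address-port 11.14.11.6 3002   # hypothetical new node

    interval 150
    timeout 10
}
```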


#5

The older nodes don't have the new nodes' IPs in their config, while the new nodes have both the old and the new IPs. Other than that, everything is the same. Also, the older nodes have another namespace that isn't required on the new nodes, so I left that namespace block out of the new nodes' config.


#6

That is the issue! On the older versions, the namespace count and position have to match across nodes!
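Concretely, every node's aerospike.conf should declare the same namespaces in the same order. A hedged sketch: `seooi` is from this thread, but the second namespace's name and all sizes/paths are made up:

```
# Identical on the old AND the new nodes -- same namespaces, same order.
namespace seooi {
    replication-factor 2
    memory-size 4G
    storage-engine device {
        file /opt/aerospike/data/seooi.dat
        filesize 16G
    }
}

namespace other_ns {            # hypothetical name for the second namespace
    replication-factor 2
    memory-size 1G
    storage-engine memory
}
```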


#7

You mean I should have all namespaces on the new nodes too?


#8

If that is the case, can I shut down now and redo it, adding the left-out namespace as well? Is the new cluster in a safe state? I will take a backup of the old one, then rm -f the .dat file on the new nodes, add entries for both namespaces, and then start the new nodes. Is this approach good?


#9

Correct! Namespace position and namespace count have to be the same. In the version you have, adding a namespace requires a cluster shutdown. This has changed in newer versions.

I think the steps could be:

- Take a backup.
- Shut down the cluster.
- Fix the config to be similar in terms of namespaces.
- Restart the cluster.

I don’t believe you need to rm any storage files.
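Those steps could be sketched as a dry-run script; it only echoes the commands instead of executing them, and every hostname and path below is an assumption:

```shell
#!/usr/bin/env bash
# Dry run: prints each step rather than running it. Swap 'run' for direct
# execution once the plan looks right. Hostnames and paths are made up.
run() { echo "+ $*"; }

new_nodes="survey006.com survey007.com"   # the empty, data-free nodes

# 1. Backup from a node that holds data
run asbackup --host survey001.com --namespace seooi --directory /backup/seooi

# 2. Stop the new nodes
for node in $new_nodes; do
  run ssh "$node" systemctl stop aerospike
done

# 3. Edit /etc/aerospike/aerospike.conf on each new node so the namespace
#    blocks match the old nodes (same namespaces, same order), then...

# 4. ...restart the new nodes
for node in $new_nodes; do
  run ssh "$node" systemctl start aerospike
done
```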


#10

- Take a backup. (from the current nodes, which hold the data)
- Shut down the cluster. (the new, data-free nodes only)
- Fix the config to be similar in terms of namespaces. (on the new nodes)
- Restart the cluster. (the new nodes)


#11

Correct! That should work. A backup is not strictly needed, but it's always good practice!


#12

While running the backup I get:

2018-03-08 21:38:40 GMT [ERR] [15949] Error while running node scan for BB998C054005452 - code 7: AEROSPIKE_ERR_CLUSTER_CHANGE at src/main/aerospike/aerospike_scan.c:190


#13

Ah yes, you are running the backup during a cluster change.

asbackup does have a flag that allows for this.


#14

I removed the --no-cluster-change argument. It's going now. Do you think the apps' GETs are not being served at this moment?


#15

The steps above are done and migrations are kicking in. Happy now. :wink: Thanks all, you guys are really helpful when needed.


#16

I see that from 3.13.3 onwards, a namespace can be configured on only some of the nodes.