Migration of Aerospike cluster without downtime


#1

Hello,

I have gone through the post "How to Migrate Aerospike from one Cluster to Another", but we don’t want any downtime because we are serving live traffic. We currently have 5 machines in our cluster, with the namespaces below.

namespace lgp {
        replication-factor 2
        memory-size 10G
        default-ttl 0 # 0 means never expire/evict.
        high-water-memory-pct 90 # How full may the memory become before the server begins eviction (expiring records early)
        high-water-disk-pct 80 # How full may the disk become before the server begins eviction (expiring records early)
        stop-writes-pct 90  # How full may the memory  become before  we disallow new writes
        storage-engine device {
                file /opt/aerospike/data/lgp/object.dat
                filesize 150G
        }
}


namespace lgp_cache {
        replication-factor 1
        memory-size 15G
        default-ttl 2d # 2 days, use 0 to never expire/evict.
        high-water-memory-pct 60
        high-water-disk-pct 80
        stop-writes-pct 90  # How full may the memory  become before  we disallow new writes
        storage-engine memory
}

Old machine configuration: RAM: 32GB, CPU: 16, HDD: 250GB

New machine configuration: RAM: 24GB, CPU: 12, HDD: 1TB

Is there any way to add new machines to the live cluster with a different namespace configuration (as data is growing rapidly, we want to allocate a 750GB filesize for the lgp namespace) and then remove the old machines, the way Cassandra provides a decommission command?


#2

Sure you can. But the nodes will be utilized evenly (space-wise, not percentage-wise). Your smaller systems may fill faster, so you will need to keep an eye on things. You probably want to add the new, empty nodes in quick succession to avoid them filling, waiting only 10s or so between each node add. Once you have the new machines up, simply do a rolling shutdown of the old nodes, allowing migrations to complete between each shutdown.
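To make the "space-wise, not percentage-wise" point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes data spreads evenly by bytes across all nodes. The 150G and 750G file sizes come from this thread; the 450 GB total and the mix of 5 old plus 5 new nodes are illustrative assumptions only.

```python
def disk_used_pct(total_data_gb, node_filesizes_gb):
    """Per-node disk usage (%) assuming data spreads evenly by bytes, not by percentage."""
    per_node_gb = total_data_gb / len(node_filesizes_gb)
    return [round(100.0 * per_node_gb / size, 1) for size in node_filesizes_gb]


# Illustrative assumption: ~450 GB on disk (replicas included),
# 5 old nodes with 150G files plus 5 new nodes with 750G files.
old_nodes = [150] * 5
new_nodes = [750] * 5

print(disk_used_pct(450, old_nodes))              # old cluster only
print(disk_used_pct(450, old_nodes + new_nodes))  # mixed cluster
```

In the mixed case every node holds the same number of gigabytes, so the 150G nodes sit at a much higher percentage than the 750G nodes and will hit high-water-disk-pct first.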


#4

Hello Albot,

I am now trying to add a new machine (10.84.245.153) to the existing cluster. Below is the configuration file of the new node.

service {
	user root
	group root
	paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
	pidfile /var/run/aerospike/asd.pid
	service-threads 4
	transaction-queues 4
	transaction-threads-per-queue 4
	proto-fd-max 15000
	log-local-time true
}

logging {
	# Log file must be an absolute path.
	file /var/log/aerospike/aerospike.log {
		context any debug
	}
}

network {
	service {
		address any
		port 3000
		access-address 10.84.245.153
	}

	heartbeat {
		mode mesh 
		port 3002
		# To use unicast-mesh heartbeats, remove the 3 lines above, and see
		# aerospike_mesh.conf for alternative.
		
		mesh-seed-address-port 172.20.21.185 3002
		mesh-seed-address-port 172.20.21.186 3002
		mesh-seed-address-port 172.20.21.187 3002
		mesh-seed-address-port 172.20.21.188 3002
		mesh-seed-address-port 172.20.21.192 3002

	
		interval 150
		timeout 10
	}

	fabric {
		port 3001
	}

	info {
		port 3003
	}
}

namespace test {
	replication-factor 2
	memory-size 4G
	default-ttl 30d # 30 days, use 0 to never expire/evict.

	storage-engine memory
}

namespace bar {
	replication-factor 2
	memory-size 4G
	default-ttl 30d # 30 days, use 0 to never expire/evict.

	storage-engine memory

	# To use file storage backing, comment out the line above and use the
	# following lines instead.
#	storage-engine device {
#		file /opt/aerospike/data/bar.dat
#		filesize 16G
#		data-in-memory true # Store data in memory in addition to file.
#	}
}
namespace lgp {
        replication-factor 2
        memory-size 10G
        default-ttl 0 # 0 means never expire/evict.
        high-water-memory-pct 90 # How full may the memory become before the server begins eviction (expiring records early)
        high-water-disk-pct 80 # How full may the disk become before the server begins eviction (expiring records early)
        stop-writes-pct 90  # How full may the memory  become before  we disallow new writes
#       storage-engine memory

        # To use file storage backing, comment out the line above and use the
        # following lines instead.
        storage-engine device {
                file /myntra/aerospike/data/lgp/object.dat
                filesize 750G
#               data-in-memory true # Store data in memory in addition to file.
        }
}


namespace lgp_cache {
        replication-factor 1
        memory-size 10G
        default-ttl 2d # 2 days, use 0 to never expire/evict.
        high-water-memory-pct 60 # How full may the memory become before the server begins eviction (expiring records early)
#       high-water-disk-pct 80 # How full may the disk become before the server begins eviction (expiring records early)
        stop-writes-pct 90  # How full may the memory  become before  we disallow new writes
        storage-engine memory
}

The configuration file from one of the nodes in the existing cluster is given below.

service {
	user root
	group root
	paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
	pidfile /var/run/aerospike/asd.pid
	service-threads 4
	transaction-queues 4
	transaction-threads-per-queue 4
	proto-fd-max 15000
	log-local-time true
}

logging {
	# Log file must be an absolute path.
	file /var/log/aerospike/aerospike.log {
		context any info
	}
}

network {
	service {
		address any
		port 3000
		access-address 172.20.21.185
	}

	heartbeat {
		mode mesh 
		port 3002
		# To use unicast-mesh heartbeats, remove the 3 lines above, and see
		# aerospike_mesh.conf for alternative.
		
		mesh-seed-address-port 172.20.21.186 3002
		mesh-seed-address-port 172.20.21.187 3002
		mesh-seed-address-port 172.20.21.188 3002
		mesh-seed-address-port 172.20.21.192 3002

	
		interval 150
		timeout 10
	}

	fabric {
		port 3001
	}

	info {
		port 3003
	}
}

namespace test {
	replication-factor 2
	memory-size 4G
	default-ttl 30d # 30 days, use 0 to never expire/evict.

	storage-engine memory
}

namespace bar {
	replication-factor 2
	memory-size 4G
	default-ttl 30d # 30 days, use 0 to never expire/evict.

	storage-engine memory

	# To use file storage backing, comment out the line above and use the
	# following lines instead.
#	storage-engine device {
#		file /opt/aerospike/data/bar.dat
#		filesize 16G
#		data-in-memory true # Store data in memory in addition to file.
#	}
}
namespace lgp {
        replication-factor 2
        memory-size 10G
        default-ttl 0 # 0 means never expire/evict.
        high-water-memory-pct 90 # How full may the memory become before the server begins eviction (expiring records early)
        high-water-disk-pct 80 # How full may the disk become before the server begins eviction (expiring records early)
        stop-writes-pct 90  # How full may the memory  become before  we disallow new writes
#       storage-engine memory

        # To use file storage backing, comment out the line above and use the
        # following lines instead.
        storage-engine device {
                file /opt/aerospike/data/lgp/object.dat
                filesize 150G
#               data-in-memory true # Store data in memory in addition to file.
        }
}


namespace lgp_cache {
        replication-factor 1
        memory-size 15G
        default-ttl 2d # 2 days, use 0 to never expire/evict.
        high-water-memory-pct 60 # How full may the memory become before the server begins eviction (expiring records early)
#       high-water-disk-pct 80 # How full may the disk become before the server begins eviction (expiring records early)
        stop-writes-pct 90  # How full may the memory  become before  we disallow new writes
        storage-engine memory
}

The only intended differences between these 2 files are the filesize (for the lgp namespace) and the memory-size (for lgp_cache), but the new machine is not getting added to the existing cluster.

Output from the existing cluster:

Admin> info network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
              Node               Node                   Ip     Build   Cluster            Cluster     Cluster         Principal   Client       Uptime
                 .                 Id                    .         .      Size                Key   Integrity                 .    Conns            .
172.20.21.185:3000   BB92E0FBD565000    172.20.21.185:3000   C-3.9.1         5   48CC27E9791254A3   True        BB9EF48BD565000       32   3481:48:04
172.20.21.186:3000   BB99214BD565000    172.20.21.186:3000   C-3.9.1         5   48CC27E9791254A3   True        BB9EF48BD565000       33   3481:45:12
172.20.21.187:3000   *BB9EF48BD565000   172.20.21.187:3000   C-3.9.1         5   48CC27E9791254A3   True        BB9EF48BD565000       30   3481:43:31
172.20.21.188:3000   BB94179BD565000    172.20.21.188:3000   C-3.9.1         5   48CC27E9791254A3   True        BB9EF48BD565000       33   3481:12:53
172.20.21.192:3000   BB9245FBD565000    172.20.21.192:3000   C-3.9.1         5   48CC27E9791254A3   True        BB9EF48BD565000       31   3500:25:01
Number of rows: 5

Output from the new machine:

Admin> info network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    Node               Node                   Ip     Build   Cluster            Cluster     Cluster         Principal   Client     Uptime
                       .                 Id                    .         .      Size                Key   Integrity                 .    Conns          .
myntra-none-6046925:3000   *BB999F5540A0102   10.84.245.153:3000   C-3.9.1         1   CC413304FC829C72   True        BB9EF48BD565000        7   00:23:22
Number of rows: 1

I am not able to figure out why this new machine cannot join the existing cluster. Basic checks (such as machine accessibility and open ports) have been done.

Can you please help me debug this issue? Any lead would be helpful.


#5

Does telnet to port 3002 work from the new node to the old node and vice versa? Can you post your ifconfig output?
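If telnet is not handy, the same reachability check can be scripted. The following is a minimal sketch using only the Python standard library; the port numbers (3000 service, 3001 fabric, 3002 heartbeat, 3003 info) are the ones from the configuration posted in this thread, and the function names are made up for illustration.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports=(3000, 3001, 3002, 3003)):
    """Map each port (service, fabric, heartbeat, info in this thread's config) to reachability."""
    return {port: port_open(host, port) for port in ports}


# Example (run from the new node, then repeat from an old node):
# print(check_ports("172.20.21.185"))
```

Note that a successful connect only proves that the address you dialed is reachable. It says nothing about which address the node advertises to its peers, so on multi-NIC hosts it is worth checking every interface address as well.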


#6

The new node discovered the principal so heartbeat and fabric must be working (though maybe a one way fault?). I think we will need some logs to understand what is happening here. Any warnings from the logs?


#7

@kporter @Albot Sorry for the delay. Below are the logs from new node.

Jul 11 2017 14:44:15 GMT+0530: DEBUG (fabric): (fabric.c:1263) asking about node bb9245fbd565000 good read 33346 good write 33346
Jul 11 2017 14:44:15 GMT+0530: DEBUG (fabric): (fabric.c:1263) asking about node bb94179bd565000 good read 33346 good write 33346
Jul 11 2017 14:44:15 GMT+0530: DEBUG (paxos): (paxos.c:2040) PAXOS message with ID 9 received from node bb9ef48bd565000
Jul 11 2017 14:44:15 GMT+0530: DEBUG (paxos): (paxos.c:2743) unwrapped | received paxos message from node bb9ef48bd565000 command SYNC (9)
Jul 11 2017 14:44:15 GMT+0530: DEBUG (paxos): (paxos.c:3076) received sync message from bb9ef48bd565000
Jul 11 2017 14:44:15 GMT+0530: DEBUG (paxos): (paxos.c:428) SYNC getting cluster key 54a8295ed4c6c4d2
**_Jul 11 2017 14:44:15 GMT+0530: INFO (partition): (partition.c:235) DISALLOW MIGRATIONS_**
Jul 11 2017 14:44:15 GMT+0530: INFO (paxos): (paxos.c:147) cluster_key set to 0x54a8295ed4c6c4d2
Jul 11 2017 14:44:15 GMT+0530: DEBUG (paxos): (paxos.c:442) setting succession[0] = bb9ef48bd565000 to alive 

When I checked the machines in the existing cluster, I found that the fabric process is bound to a different NIC.

tcp        0      0 10.40.0.151:3001            10.40.0.158:3064            ESTABLISHED 0          317516582  -
tcp        0      0 10.40.0.151:3001            10.40.0.153:47604           ESTABLISHED 0          156429404  -

ifconfig output of a machine from existing cluster

eth0      Link encap:Ethernet  HWaddr 00:50:56:BD:0F:2E
          inet addr:10.40.0.151  Bcast:10.40.3.255  Mask:255.255.252.0
          inet6 addr: fe80::250:56ff:febd:f2e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9043214405 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7371972367 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2543220222540 (2.3 TiB)  TX bytes:2551073592952 (2.3 TiB)

eth1      Link encap:Ethernet  HWaddr 00:50:56:BD:1E:78
          inet addr:172.20.21.185  Bcast:172.20.23.255  Mask:255.255.248.0
          inet6 addr: fe80::250:56ff:febd:1e78/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6609006486 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3849719449 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1966068229402 (1.7 TiB)  TX bytes:1002592992809 (933.7 GiB)

eth2      Link encap:Ethernet  HWaddr 00:50:56:BD:0C:3B
          inet addr:10.20.0.150  Bcast:10.20.3.255  Mask:255.255.252.0
          inet6 addr: fe80::250:56ff:febd:c3b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1521778656 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:92570021828 (86.2 GiB)  TX bytes:760 (760.0 b)

eth3      Link encap:Ethernet  HWaddr 00:50:56:BD:78:25
          inet addr:10.30.0.159  Bcast:10.30.3.255  Mask:255.255.252.0
          inet6 addr: fe80::250:56ff:febd:7825/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1521768774 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:92567914320 (86.2 GiB)  TX bytes:718 (718.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:702660367 errors:0 dropped:0 overruns:0 frame:0
          TX packets:702660367 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:350155915456 (326.1 GiB)  TX bytes:350155915456 (326.1 GiB)

The fabric address (10.40.0.151) is not accessible from the new machine:

[root@fcp-lgpaerospike3 aerospike]# telnet 10.40.0.151 3001
Trying 10.40.0.151...

I now need to change the config of the existing nodes to bind the fabric process to eth1 (172.20.21.185). My only concern is that we recently deleted a lot of entries from the cluster, and I read somewhere that those entries might reappear after a node restart (“Expired/Deleted data reappears after server is restarted”).

So my question is: is there any way to confirm that all the deleted entries have been flushed to disk and won’t reappear if I restart?


#8

That log snippet isn’t very useful. Also, I’m not sure why you are logging at debug level; certain modules can be very noisy (you should keep the log level at info).

Short answer is you cannot.

Deletes are never synced to disk. The wblock they are in becomes a bit more free, and eventually defrag will pack the remaining (undeleted) contents into a new wblock. At that point the wblock with the deleted data becomes eligible for new writes (note that the data will still return on a cold start at this point). Only when the wblock is eventually overwritten with new data is the data truly deleted. Also note that, because of the defrag process and because records get updated, a particular deleted record may have multiple versions on disk, all of which need to be overwritten.

Durable deletes (deletes that persist a tombstone to disk) are offered in Aerospike Enterprise. If you really need them, this would be your best option.
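The non-durable delete behaviour described above can be illustrated with a toy model. This is only a sketch of the idea (an in-memory index over blocks that stay on disk until overwritten); it is not how Aerospike is implemented, it ignores generations, timestamps and defrag, and every name in it is hypothetical.

```python
class ToyDevice:
    """Toy model only: an in-memory index plus append-only 'disk' blocks."""

    def __init__(self):
        self.index = {}    # key -> (block_no, value), lives in RAM
        self.blocks = []   # each block is a list of (key, value) pairs on disk

    def write(self, key, value):
        self.blocks.append([(key, value)])
        self.index[key] = (len(self.blocks) - 1, value)

    def delete(self, key):
        # Non-durable delete: only the in-memory index entry is dropped.
        self.index.pop(key, None)

    def overwrite_block(self, block_no, key, value):
        # Reusing a freed block is what finally destroys the old copy.
        self.blocks[block_no] = [(key, value)]
        self.index[key] = (block_no, value)

    def cold_start(self):
        # Rebuild the index by scanning every block still on disk.
        self.index = {}
        for block_no, block in enumerate(self.blocks):
            for key, value in block:
                self.index[key] = (block_no, value)


def demo():
    dev = ToyDevice()
    dev.write("a", 1)
    dev.delete("a")
    gone_before_restart = "a" not in dev.index
    dev.cold_start()
    back_after_restart = "a" in dev.index
    dev.delete("a")
    dev.overwrite_block(0, "b", 2)   # block 0 is reused for new data
    dev.cold_start()
    gone_after_overwrite = "a" not in dev.index
    return gone_before_restart, back_after_restart, gone_after_overwrite


print(demo())
```

In the toy model the deleted key comes back after the first cold start and stays gone only once its block has been overwritten, which mirrors why there is no way to "confirm" a plain delete has reached disk.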


#9

Or just write 0’s on your disk before starting back up :)


#10

@kporter We are considering backup and restore for the cluster migration, as described in the linked post How to Migrate Aerospike from one Cluster to Another.

We currently have around 500GB of data in our cluster, so a backup and restore might take a long time; we are therefore considering incremental backup. I have the following questions about it:

  1. We are using Aerospike server version 3.9, which doesn’t support incremental backup. If I install aerospike-tools version 3.12 and leave the server on version 3.9, will incremental backup work?
  2. How does incremental backup work? Does it scan the whole database and then filter the data by date range, or does it use some kind of indexing to find the records that match the condition?

#11

No, incremental backup requires the “predicate filtering” feature, which was introduced in Aerospike 3.12.0.

Each primary index entry contains the last update time for that record. The incremental backup scans for anything that has been updated since a specified time.
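As a rough illustration of the answer to the second question, the sketch below models "scan everything, keep what changed after a cutoff" in plain Python. The list of `(key, last_update_time)` pairs is a made-up stand-in for the per-record metadata mentioned above; it is not the real index layout or the asbackup implementation.

```python
from datetime import datetime, timezone


def modified_since(index_entries, cutoff):
    """Yield keys whose last-update time is after cutoff.

    index_entries: iterable of (key, last_update_time) pairs, a stand-in
    for the per-record metadata described in the reply above.
    """
    for key, last_update_time in index_entries:
        if last_update_time > cutoff:
            yield key


entries = [
    ("user:1", datetime(2017, 7, 10, tzinfo=timezone.utc)),
    ("user:2", datetime(2017, 7, 12, tzinfo=timezone.utc)),
    ("user:3", datetime(2017, 7, 13, tzinfo=timezone.utc)),
]
cutoff = datetime(2017, 7, 11, tzinfo=timezone.utc)
print(list(modified_since(entries, cutoff)))  # ['user:2', 'user:3']
```

Every entry is still visited; the filter only reduces what gets written to the backup, not what gets scanned.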


#12

@kporter Thanks for the quick response. Just one more question: is there any way (some configurable parameters) to increase the speed of the backup? Below are the logs from the currently running backup:

2017-07-12 18:15:35 GMT [INF] [21186] 37% complete (~4137 KiB/s, ~3785 rec/s, ~1119 B/rec)
2017-07-12 18:15:35 GMT [INF] [21186] ~8h47m52s remaining

#13

You can configure the priority of the backup scan:

http://www.aerospike.com/docs/tools/backup/asbackup.html#other-options


#14

I have found the best way is to run a backup process on each server in the cluster. Simply log in to each node and run asbackup with --node-list hostname:3000
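A small helper can generate the per-node command lines so they stay consistent. This is a sketch that only builds strings and runs nothing; the node addresses and namespace are taken from this thread, the `/backup/lgp` directory is a hypothetical path, and the `--namespace` and `--directory` flags should be checked against `asbackup --help` for your tools version.

```python
import shlex


def asbackup_commands(nodes, namespace, base_dir, port=3000):
    """Build one asbackup command line per node; nothing is executed here."""
    commands = []
    for node in nodes:
        argv = [
            "asbackup",
            "--node-list", f"{node}:{port}",
            "--namespace", namespace,
            "--directory", f"{base_dir}/{node}",  # one output directory per node
        ]
        commands.append(" ".join(shlex.quote(arg) for arg in argv))
    return commands


nodes = ["172.20.21.185", "172.20.21.186"]
for cmd in asbackup_commands(nodes, "lgp", "/backup/lgp"):
    print(cmd)
```

Writing each node's output to its own directory keeps the parallel backups from colliding and makes it easy to see which node is lagging.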


#15

@kporter @Albot asbackup took around 12 hours to dump all the cluster data. As we are still on 3.9, incremental backup is not possible, and a full backup takes too long, so this rules out migrating the cluster with this method.

The better option now is to add the new machines to the running cluster and then remove the older machines, which won’t require downtime. As I mentioned before, fabric and info are bound to a different IP that is not accessible from the new machine, so I will change the config in the existing cluster to bind them to an IP that is accessible from the new machine.

Below is the ifconfig output from a node in the cluster:

[ankit.tyagi@nmc-aerospike1 ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:50:56:BD:0F:2E
          inet addr:10.40.0.151  Bcast:10.40.3.255  Mask:255.255.252.0
          inet6 addr: fe80::250:56ff:febd:f2e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9390537920 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7672971096 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2639175195393 (2.4 TiB)  TX bytes:2673101386997 (2.4 TiB)

eth1      Link encap:Ethernet  HWaddr 00:50:56:BD:1E:78
          inet addr:172.20.21.185  Bcast:172.20.23.255  Mask:255.255.248.0
          inet6 addr: fe80::250:56ff:febd:1e78/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6763824691 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3933872540 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2011132872705 (1.8 TiB)  TX bytes:1064897072177 (991.7 GiB)

eth2      Link encap:Ethernet  HWaddr 00:50:56:BD:0C:3B
          inet addr:10.20.0.150  Bcast:10.20.3.255  Mask:255.255.252.0
          inet6 addr: fe80::250:56ff:febd:c3b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1562795510 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:95037134700 (88.5 GiB)  TX bytes:760 (760.0 b)

eth3      Link encap:Ethernet  HWaddr 00:50:56:BD:78:25
          inet addr:10.30.0.159  Bcast:10.30.3.255  Mask:255.255.252.0
          inet6 addr: fe80::250:56ff:febd:7825/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1562785551 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:95035014742 (88.5 GiB)  TX bytes:718 (718.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:702697351 errors:0 dropped:0 overruns:0 frame:0
          TX packets:702697351 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:350196411778 (326.1 GiB)  TX bytes:350196411778 (326.1 GiB)

Currently fabric is bound to IP 10.40.0.151, which is not accessible from the new machines. I will change the fabric address to 172.20.21.185 and restart the cluster. My question is:

Will this config change force my cluster to start migrations?


#16

Yes, migrations should start any time a node leaves and rejoins a cluster.


#17

@kporter @Albot With the help of the devops team, all the ports are now accessible.

New Node Ip: 10.84.245.153
Node from Existing cluster : 172.20.21.185 

I am able to access 3002 (mesh), 3001 (fabric), and 3003 (info) from the new node to the old node and vice versa. Telnet output from the new node to the old node:

[ankit.tyagi@fcp-lgpaerospike3 aerospike]$ telnet 172.20.21.185  3001
Trying 172.20.21.185...
Connected to 172.20.21.185.
Escape character is '^]'.
^]

telnet> Connection closed.
[ankit.tyagi@fcp-lgpaerospike3 aerospike]$ telnet 172.20.21.185 3003
Trying 172.20.21.185...
Connected to 172.20.21.185.
Escape character is '^]'.
^]

telnet> quit
Connection closed.
[ankit.tyagi@fcp-lgpaerospike3 aerospike]$ telnet 172.20.21.185 3002
Trying 172.20.21.185...
Connected to 172.20.21.185.
Escape character is '^]'

Telnet output from old node to new node.

[ankit.tyagi@nmc-aerospike1 aerospike]$ telnet 10.84.245.153 3001
Trying 10.84.245.153...
Connected to 10.84.245.153.
Escape character is '^]'.
^]

telnet> quit
Connection closed.
[ankit.tyagi@nmc-aerospike1 aerospike]$ telnet 10.84.245.153 3003
Trying 10.84.245.153...
Connected to 10.84.245.153.
Escape character is '^]'.
^]

telnet> quit
Connection closed.
[ankit.tyagi@nmc-aerospike1 aerospike]$ telnet 10.84.245.153 3002
Trying 10.84.245.153...
Connected to 10.84.245.153.
Escape character is '^]'

Below is log snippet from new node.

Jul 17 2017 11:53:20 GMT+0530: INFO (paxos): (paxos.c:147) cluster_key set to 0x21e89130f7bda5b3
Jul 17 2017 11:53:20 GMT+0530: INFO (paxos): (paxos.c:3123) SUCCESSION [1500272598]@bb9ef48bd565000*: bb9ef48bd565000 bb999f5540a0102 bb99214bd565000 bb94179bd565000 bb92e0fbd565000 bb9245fbd565000
Jul 17 2017 11:53:20 GMT+0530: INFO (paxos): (paxos.c:3134) node bb9ef48bd565000 is still principal pro tempore
Jul 17 2017 11:53:20 GMT+0530: INFO (paxos): (paxos.c:2352) Sent partition sync request to node bb9ef48bd565000
Jul 17 2017 11:53:22 GMT+0530: INFO (paxos): (paxos.c:2657) as_paxos_retransmit_check: node bb999f5540a0102 retransmitting partition sync request to principal bb9ef48bd565000 ...
Jul 17 2017 11:53:22 GMT+0530: INFO (paxos): (paxos.c:2352) Sent partition sync request to node bb9ef48bd565000
Jul 17 2017 11:53:22 GMT+0530: INFO (partition): (partition.c:235) DISALLOW MIGRATIONS
Jul 17 2017 11:53:22 GMT+0530: INFO (paxos): (paxos.c:147) cluster_key set to 0x21e89130f7bda5b3
Jul 17 2017 11:53:22 GMT+0530: INFO (paxos): (paxos.c:3123) SUCCESSION [1500272598]@bb9ef48bd565000*: bb9ef48bd565000 bb999f5540a0102 bb99214bd565000 bb94179bd565000 bb92e0fbd565000 bb9245fbd565000
Jul 17 2017 11:53:22 GMT+0530: INFO (paxos): (paxos.c:3134) node bb9ef48bd565000 is still principal pro tempore
Jul 17 2017 11:53:22 GMT+0530: INFO (paxos): (paxos.c:2352) Sent partition sync request to node bb9ef48bd565000
Jul 17 2017 11:53:24 GMT+0530: INFO (paxos): (paxos.c:2657) as_paxos_retransmit_check: node bb999f5540a0102 retransmitting partition sync request to principal bb9ef48bd565000 ...
Jul 17 2017 11:53:24 GMT+0530: INFO (paxos): (paxos.c:2352) Sent partition sync request to node bb9ef48bd565000
Jul 17 2017 11:53:24 GMT+0530: INFO (paxos): (paxos.c:2816) Self(bb999f5540a0102) add from Principal bb9ef48bd565000
Jul 17 2017 11:53:24 GMT+0530: INFO (paxos): (paxos.c:2851) {1500272598} sending prepare_ack to bb9ef48bd565000
Jul 17 2017 11:53:24 GMT+0530: INFO (paxos): (paxos.c:2816) Self(bb999f5540a0102) add from Principal bb9ef48bd565000
Jul 17 2017 11:53:24 GMT+0530: INFO (paxos): (paxos.c:2816) Self(bb999f5540a0102) add from Principal bb9ef48bd565000
Jul 17 2017 11:53:24 GMT+0530: INFO (partition): (partition.c:235) DISALLOW MIGRATIONS
Jul 17 2017 11:53:24 GMT+0530: INFO (paxos): (paxos.c:147) cluster_key set to 0xb40ebd56109a1099

I am not able to figure out the reason. Can you please help me work out what is going wrong? Let me know if any further information is required from my side.


#18

@kporter @Albot I saw a few more log lines on the new node.

Jul 17 2017 12:51:44 GMT+0530: INFO (batch): (thr_batch.c:343) Initialize batch-threads to 4
Jul 17 2017 12:51:44 GMT+0530: INFO (drv_ssd): (drv_ssd.c:4142) {lgp} floor set at 41 wblocks per device
Jul 17 2017 12:51:44 GMT+0530: INFO (hb): (hb.c:7112) Initializing mesh heartbeat socket : 10.84.245.153:3002
Jul 17 2017 12:51:44 GMT+0530: INFO (hb): (hb.c:7150) MTU of the network is 1500.
Jul 17 2017 12:51:44 GMT+0530: INFO (paxos): (paxos.c:3784) listening for other nodes (max 3000 milliseconds) ...
Jul 17 2017 12:51:44 GMT+0530: INFO (hb): (hb.c:6365) Updating mesh seed endpoint address from 172.20.21.185:3002 to 10.40.0.151:3002  

I am not sure why the mesh seed address is getting updated. The new IP (10.40.0.151) is not accessible from the new node.

Ifconfig of existing node in cluster

[ankit.tyagi@nmc-aerospike1 aerospike]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:50:56:BD:0F:2E
          inet addr:10.40.0.151  Bcast:10.40.3.255  Mask:255.255.252.0
          inet6 addr: fe80::250:56ff:febd:f2e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9821628700 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8054569023 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2782663547908 (2.5 TiB)  TX bytes:2796738168215 (2.5 TiB)

eth1      Link encap:Ethernet  HWaddr 00:50:56:BD:1E:78
          inet addr:172.20.21.185  Bcast:172.20.23.255  Mask:255.255.248.0
          inet6 addr: fe80::250:56ff:febd:1e78/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6915882799 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4026897221 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2057804118567 (1.8 TiB)  TX bytes:1085736927172 (1011.1 GiB)

eth2      Link encap:Ethernet  HWaddr 00:50:56:BD:0C:3B
          inet addr:10.20.0.150  Bcast:10.20.3.255  Mask:255.255.252.0
          inet6 addr: fe80::250:56ff:febd:c3b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1601718639 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:97378318518 (90.6 GiB)  TX bytes:760 (760.0 b)

eth3      Link encap:Ethernet  HWaddr 00:50:56:BD:78:25
          inet addr:10.30.0.159  Bcast:10.30.3.255  Mask:255.255.252.0
          inet6 addr: fe80::250:56ff:febd:7825/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1601708757 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:97376195048 (90.6 GiB)  TX bytes:718 (718.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:702729447 errors:0 dropped:0 overruns:0 frame:0
          TX packets:702729447 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:350207186462 (326.1 GiB)  TX bytes:350207186462 (326.1 GiB)

#19

Try changing “address any” to “address 172.20.21.185”


#20

@Albot @kporter Yes, I have already done that. I have a question: is it safe to add a new node or restart any node in the cluster while migration is in progress?

One more odd thing I am seeing is that objects are not getting replicated.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace                 Node   Avail%   Evictions      Master    Replica     Repl     Stop     Pending         Disk    Disk     HWM          Mem     Mem    HWM      Stop
        .                    .        .           .     Objects    Objects   Factor   Writes    Migrates         Used   Used%   Disk%         Used   Used%   Mem%   Writes%
        .                    .        .           .           .          .        .        .   (tx%,rx%)            .       .       .            .       .      .         .
lgp         172.20.21.185:3000   38         0.000      41.863 M   17.765 M   2        false    (78,91)      90.874 GB   61      80        4.911 GB   50      90     90
lgp         172.20.21.186:3000   35         0.000      44.108 M    0.000     2        false    (90,62)      94.137 GB   63      80        5.091 GB   51      90     90
lgp         172.20.21.187:3000   41         0.000      36.678 M   26.525 M   2        false    (68,88)      85.592 GB   58      80        4.505 GB   46      90     90
lgp         172.20.21.188:3000   42         0.000      37.510 M   27.026 M   2        false    (65,89)      84.191 GB   57      80        4.429 GB   45      90     90
lgp         172.20.21.192:3000   40         0.000      37.521 M   26.586 M   2        false    (71,90)      87.468 GB   59      80        4.603 GB   47      90     90
lgp                                         0.000     197.680 M   97.902 M                     (79,79)     442.263 GB                    23.539 GB

I am using replication factor 2, so the number of master objects and replica objects should be the same, but I am seeing a large difference between them. If I restart other nodes at this point, I might lose some data. How can I speed up this process?


#21

This isn’t completely safe prior to Aerospike 3.13.0.1 with paxos-protocol set to v5. If a node is removed while migrations are ongoing, writes could result in lost updates to existing records and reads may return an older copy.

With paxos-protocol v5 we not only improved how cluster discovery/formation works, but also improved the migration and partition versioning algorithms to prevent these replication deficit issues.
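To put a number on the replication shortfall shown in the namespace table earlier in the thread, here is a small arithmetic sketch. The master and replica object counts (in millions) are copied from that table; the function name is made up, and the "replica objects should equal master objects times (replication factor minus 1)" rule is the steady-state expectation stated in the question, not an exact invariant while migrations are running.

```python
def replica_deficit(nodes, replication_factor=2):
    """Objects (in millions) still missing a replica copy, cluster-wide."""
    master = sum(n["master_m"] for n in nodes)
    replica = sum(n["replica_m"] for n in nodes)
    expected_replica = master * (replication_factor - 1)
    return round(expected_replica - replica, 3)


# Master / replica object counts (millions) for namespace lgp, from the table above.
lgp_nodes = [
    {"node": "172.20.21.185", "master_m": 41.863, "replica_m": 17.765},
    {"node": "172.20.21.186", "master_m": 44.108, "replica_m": 0.000},
    {"node": "172.20.21.187", "master_m": 36.678, "replica_m": 26.525},
    {"node": "172.20.21.188", "master_m": 37.510, "replica_m": 27.026},
    {"node": "172.20.21.192", "master_m": 37.521, "replica_m": 26.586},
]

print(replica_deficit(lgp_nodes))
```

With these figures roughly 99.8 million objects have no replica yet, which is why taking another node down before migrations finish is risky on this version.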