The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.
How to use quiesce to expand a cluster vertically
Context
A key feature of Aerospike is the ability to add capacity to a cluster by expanding horizontally, without performance penalty. At a certain point, however, it may be preferable to replace a cluster of numerous smaller nodes with fewer, larger-capacity nodes. This could be to reduce data centre footprint or simply to refresh with newer hardware. Swapping out nodes one by one is problematic, as the last of the smaller nodes may not have sufficient capacity to take their share of records once the cluster contains fewer, larger nodes. For that reason, it is best to swap over to the larger nodes in one step. This article details how this can be done using quiesce, maintaining replication factor throughout the operation and avoiding interruption to the client workload.
Method
Initial State
The initial state is a 6 node cluster with some 15 million records in the bar namespace. Replication factor is 2 and the nodes are at around 40% disk usage. The cluster is stable and not migrating.
Admin> summary -l
Cluster
=======
1. Server Version : E-4.5.1.5
2. OS Version : Ubuntu 18.04.1 LTS (4.9.125-linuxkit)
3. Cluster Size : 6
4. Devices : Total 6, per-node 1
5. Memory : Total 48.000 GB, 5.67% used (2.720 GB), 94.33% available (45.280 GB)
6. Disk : Total 6.000 GB, 37.45% used (2.247 GB), 61.00% available contiguous space (3.660 GB)
7. Usage (Unique Data): 0.000 B in-memory, 547.218 MB on-disk
8. Active Namespaces : 1 of 2
9. Features : KVS, Scan
Namespaces
==========
test
====
1. Devices : Total 0, per-node 0
2. Memory : Total 24.000 GB, 0.00% used (0.000 B), 100.00% available (24.000 GB)
3. Replication Factor : 2
4. Rack-aware : False
5. Master Objects : 0.000
6. Usage (Unique Data): 0.000 B in-memory, 0.000 B on-disk
bar
===
1. Devices : Total 6, per-node 1
2. Memory : Total 24.000 GB, 11.33% used (2.720 GB), 88.67% available (21.280 GB)
3. Disk : Total 6.000 GB, 37.45% used (2.247 GB), 61.00% available contiguous space (3.660 GB)
4. Replication Factor : 2
5. Rack-aware : False
6. Master Objects : 15.100 M
7. Usage (Unique Data): 0.000 B in-memory, 547.218 MB on-disk
Admin> info
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2019-03-05 17:07:11 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node Node Ip Build Cluster Migrations Cluster Cluster Principal Client Uptime
. Id . . Size . Key Integrity . Conns .
0534b209248c:3000 BB9030011AC4202 172.17.0.3:3000 E-4.5.1.5 6 0.000 D8B2B3140412 True BB9080011AC4202 2 02:47:17
172.17.0.4:3000 BB9040011AC4202 172.17.0.4:3000 E-4.5.1.5 6 0.000 D8B2B3140412 True BB9080011AC4202 3 02:47:18
172.17.0.5:3000 BB9050011AC4202 172.17.0.5:3000 E-4.5.1.5 6 0.000 D8B2B3140412 True BB9080011AC4202 3 02:47:18
172.17.0.6:3000 BB9060011AC4202 172.17.0.6:3000 E-4.5.1.5 6 0.000 D8B2B3140412 True BB9080011AC4202 2 02:47:18
172.17.0.7:3000 BB9070011AC4202 172.17.0.7:3000 E-4.5.1.5 6 0.000 D8B2B3140412 True BB9080011AC4202 5 02:47:18
172.17.0.8:3000 *BB9080011AC4202 172.17.0.8:3000 E-4.5.1.5 6 0.000 D8B2B3140412 True BB9080011AC4202 2 02:47:18
Number of rows: 6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Usage Information (2019-03-05 17:07:11 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace Node Total Expirations,Evictions Stop Disk Disk HWM Avail% Mem Mem HWM Stop
. . Records . Writes Used Used% Disk% . Used Used% Mem% Writes%
bar 0534b209248c:3000 4.963 M (0.000, 0.000) false 378.614 MB 37 50 62 446.105 MB 11 60 90
bar 172.17.0.4:3000 5.154 M (0.000, 0.000) false 393.222 MB 39 50 60 463.314 MB 12 60 90
bar 172.17.0.5:3000 4.855 M (0.000, 0.000) false 370.378 MB 37 50 62 436.401 MB 11 60 90
bar 172.17.0.6:3000 4.927 M (0.000, 0.000) false 375.895 MB 37 50 62 442.902 MB 11 60 90
bar 172.17.0.7:3000 4.865 M (0.000, 0.000) false 371.138 MB 37 50 62 437.294 MB 11 60 90
bar 172.17.0.8:3000 5.237 M (0.000, 0.000) false 399.540 MB 40 50 60 470.762 MB 12 60 90
bar 30.000 M (0.000, 0.000) 2.235 GB 2.634 GB
test 0534b209248c:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.4:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.5:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.6:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.7:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.8:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 0.000 (0.000, 0.000) 0.000 B 0.000 B
Number of rows: 14
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Object Information (2019-03-05 17:07:11 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace Node Total Repl Objects Tombstones Pending Rack
. . Records Factor (Master,Prole,Non-Replica) (Master,Prole,Non-Replica) Migrates ID
. . . . . . (tx,rx) .
bar 0534b209248c:3000 4.963 M 2 (2.444 M, 2.519 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.4:3000 5.154 M 2 (2.620 M, 2.534 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.5:3000 4.855 M 2 (2.464 M, 2.391 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.6:3000 4.927 M 2 (2.421 M, 2.506 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.7:3000 4.865 M 2 (2.370 M, 2.494 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.8:3000 5.237 M 2 (2.681 M, 2.556 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 30.000 M (15.000 M, 15.000 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000)
test 0534b209248c:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.4:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.5:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.6:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.7:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.8:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 0.000 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000)
Number of rows: 14
Admin>
The plan is to replace these 6 nodes with 3 nodes having larger disks. This should take place without reducing replication factor and without affecting the workload. Here, the Aerospike Java benchmark tool is used to provide a workload of 50% read, 50% update across 100,000 keys.
root@dafbcaffcac9:~/java/aerospike-client-java-4.3.1/benchmarks# ./run_benchmarks -h 172.17.0.3 -p 3000 -n bar -k 100000 -w RU,50 -S 1 -z 10
Benchmark: 172.17.0.3 3000, namespace: bar, set: testset, threads: 10, workload: READ_UPDATE
read: 50% (all bins: 100%, single bin: 0%), write: 50% (all bins: 100%, single bin: 0%)
keys: 100000, start key: 1, transactions: 0, bins: 1, random values: false, throughput: unlimited
Add new nodes
Here the 3 new nodes have been added to the cluster and migrations are ongoing. In this test, the new nodes have the same specification as the old ones apart from having larger disks. This is reflected below in the excerpts from asadm:
Admin> summary -l
Cluster (Migrations in Progress)
=================================
1. Server Version : E-4.5.1.5
2. OS Version : Ubuntu 18.04.1 LTS (4.9.125-linuxkit)
3. Cluster Size : 9
4. Devices : Total 9, per-node 1
5. Memory : Total 72.000 GB, 3.94% used (2.840 GB), 96.06% available (69.160 GB)
6. Disk : Total 15.000 GB, 15.02% used (2.252 GB), 83.80% available contiguous space (12.570 GB)
7. Usage (Unique Data): 0.000 B in-memory, 982.011 MB on-disk
8. Active Namespaces : 1 of 2
9. Features : KVS, Scan
Namespaces
==========
bar
===
1. Devices : Total 9, per-node 1
2. Memory : Total 36.000 GB, 7.89% used (2.840 GB), 92.11% available (33.160 GB)
3. Disk : Total 15.000 GB, 15.02% used (2.252 GB), 83.80% available contiguous space (12.570 GB)
4. Replication Factor : 2
5. Rack-aware : False
6. Master Objects : 15.100 M
7. Usage (Unique Data): 0.000 B in-memory, 982.011 MB on-disk
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Usage Information (2019-03-06 15:22:21 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace Node Total Expirations,Evictions Stop Disk Disk HWM Avail% Mem Mem HWM Stop
. . Records . Writes Used Used% Disk% . Used Used% Mem% Writes%
bar 0534b209248c:3000 4.996 M (0.000, 0.000) false 380.648 MB 38 50 61 448.583 MB 11 60 90
bar 172.17.0.10:3000 39.786 K (0.000, 0.000) false 2.966 MB 1 50 99 3.506 MB 1 60 90
bar 172.17.0.11:3000 40.584 K (0.000, 0.000) false 3.031 MB 1 50 99 3.583 MB 1 60 90
bar 172.17.0.12:3000 25.902 K (0.000, 0.000) false 1.908 MB 1 50 99 2.259 MB 1 60 90
bar 172.17.0.4:3000 5.189 M (0.000, 0.000) false 395.327 MB 39 50 60 465.880 MB 12 60 90
bar 172.17.0.5:3000 4.887 M (0.000, 0.000) false 372.362 MB 37 50 62 438.819 MB 11 60 90
bar 172.17.0.6:3000 4.960 M (0.000, 0.000) false 377.889 MB 37 50 62 445.332 MB 11 60 90
bar 172.17.0.7:3000 4.897 M (0.000, 0.000) false 373.103 MB 37 50 62 439.689 MB 11 60 90
bar 172.17.0.8:3000 5.272 M (0.000, 0.000) false 401.666 MB 40 50 59 473.353 MB 12 60 90
bar 30.306 M (0.000, 0.000) 2.255 GB 2.657 GB
test 0534b209248c:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.10:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.11:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.12:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.4:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.5:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.6:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.7:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.8:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 0.000 (0.000, 0.000) 0.000 B 0.000 B
Number of rows: 20
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Object Information (2019-03-06 15:22:21 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace Node Total Repl Objects Tombstones Pending Rack
. . Records Factor (Master,Prole,Non-Replica) (Master,Prole,Non-Replica) Migrates ID
. . . . . . (tx,rx) .
bar 0534b209248c:3000 4.996 M 2 (2.450 M, 1.076 M, 1.470 M) (0.000, 0.000, 0.000) (615.000, 493.000) 0
bar 172.17.0.10:3000 39.766 K 2 (35.476 K, 4.290 K, 0.000) (0.000, 0.000, 0.000) (949.000, 964.000) 0
bar 172.17.0.11:3000 40.314 K 2 (35.605 K, 4.709 K, 0.000) (0.000, 0.000, 0.000) (1.143 K, 953.000) 0
bar 172.17.0.12:3000 26.089 K 2 (21.250 K, 4.839 K, 0.000) (0.000, 0.000, 0.000) (771.000, 939.000) 0
bar 172.17.0.4:3000 5.189 M 2 (2.620 M, 1.153 M, 1.416 M) (0.000, 0.000, 0.000) (632.000, 741.000) 0
bar 172.17.0.5:3000 4.887 M 2 (2.462 M, 1.047 M, 1.378 M) (0.000, 0.000, 0.000) (583.000, 665.000) 0
bar 172.17.0.6:3000 4.960 M 2 (2.419 M, 1.050 M, 1.490 M) (0.000, 0.000, 0.000) (593.000, 648.000) 0
bar 172.17.0.7:3000 4.897 M 2 (2.368 M, 1.089 M, 1.440 M) (0.000, 0.000, 0.000) (606.000, 553.000) 0
bar 172.17.0.8:3000 5.272 M 2 (2.688 M, 1.038 M, 1.546 M) (0.000, 0.000, 0.000) (649.000, 597.000) 0
bar 30.306 M (15.100 M, 6.466 M, 8.740 M) (0.000, 0.000, 0.000) (6.541 K, 6.553 K)
test 0534b209248c:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (392.000, 0.000) 0
test 172.17.0.10:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (115.000, 910.000) 0
test 172.17.0.11:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (119.000, 897.000) 0
test 172.17.0.12:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (113.000, 905.000) 0
test 172.17.0.4:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (409.000, 0.000) 0
test 172.17.0.5:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (376.000, 0.000) 0
test 172.17.0.6:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (397.000, 0.000) 0
test 172.17.0.7:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (376.000, 0.000) 0
test 172.17.0.8:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (415.000, 0.000) 0
test 0.000 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (2.712 K, 2.712 K)
Number of rows: 20
Admin>
As expected, there is no impact on the client workload (note the 3 new nodes discovered by the client):
2019-03-06 15:22:04.678 write(tps=3985 timeouts=0 errors=0) read(tps=4003 timeouts=0 errors=0) total(tps=7988 timeouts=0 errors=0)
2019-03-06 15:22:05.679 write(tps=1800 timeouts=0 errors=0) read(tps=1757 timeouts=0 errors=0) total(tps=3557 timeouts=0 errors=0)
2019-03-06 15:22:06.680 write(tps=1125 timeouts=0 errors=0) read(tps=1145 timeouts=0 errors=0) total(tps=2270 timeouts=0 errors=0)
2019-03-06 15:22:07.686 write(tps=1305 timeouts=0 errors=0) read(tps=1297 timeouts=0 errors=0) total(tps=2602 timeouts=0 errors=0)
2019-03-06 15:22:07.727 INFO Thread tend Add node BB90C0011AC4202 172.17.0.12 3000
2019-03-06 15:22:07.730 INFO Thread tend Add node BB90B0011AC4202 172.17.0.11 3000
2019-03-06 15:22:07.733 INFO Thread tend Add node BB90A0011AC4202 172.17.0.10 3000
2019-03-06 15:22:08.686 write(tps=1070 timeouts=2 errors=0) read(tps=1084 timeouts=0 errors=0) total(tps=2154 timeouts=2 errors=0)
2019-03-06 15:22:09.687 write(tps=1332 timeouts=0 errors=0) read(tps=1371 timeouts=0 errors=0) total(tps=2703 timeouts=0 errors=0)
2019-03-06 15:22:10.689 write(tps=1204 timeouts=0 errors=0) read(tps=1242 timeouts=0 errors=0) total(tps=2446 timeouts=0 errors=0)
2019-03-06 15:22:11.690 write(tps=1672 timeouts=0 errors=0) read(tps=1674 timeouts=0 errors=0) total(tps=3346 timeouts=0 errors=0)
2019-03-06 15:22:12.690 write(tps=1380 timeouts=0 errors=0) read(tps=1429 timeouts=0 errors=0) total(tps=2809 timeouts=0 errors=0)
2019-03-06 15:22:13.691 write(tps=1318 timeouts=0 errors=0) read(tps=1274 timeouts=0 errors=0) total(tps=2592 timeouts=0 errors=0)
2019-03-06 15:22:14.691 write(tps=1169 timeouts=0 errors=0) read(tps=1228 timeouts=0 errors=0) total(tps=2397 timeouts=0 errors=0)
Quiesce original 6 nodes
Once the new nodes are in the cluster, the original 6 nodes can be quiesced. It is not necessary to wait for migrations to complete before quiescing, as the quiesced nodes will not give up master ownership of their partitions until those partitions have migrated out. The quiesce command is executed on each node and the status is checked via the pending_quiesce statistic.
Admin> asinfo -v 'quiesce:' with 172.17.0.3
0534b209248c:3000 (172.17.0.3) returned:
ok
Admin> asinfo -v 'quiesce:' with 172.17.0.4
172.17.0.4:3000 (172.17.0.4) returned:
ok
Admin> asinfo -v 'quiesce:' with 172.17.0.5
172.17.0.5:3000 (172.17.0.5) returned:
ok
Admin> asinfo -v 'quiesce:' with 172.17.0.6
172.17.0.6:3000 (172.17.0.6) returned:
ok
Admin> asinfo -v 'quiesce:' with 172.17.0.7
172.17.0.7:3000 (172.17.0.7) returned:
ok
Admin> asinfo -v 'quiesce:' with 172.17.0.8
172.17.0.8:3000 (172.17.0.8) returned:
ok
Admin> show statistics like pending_quiesce
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~bar Namespace Statistics (2019-03-06 16:28:22 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : 0534b209248c:3000 172.17.0.10:3000 172.17.0.11:3000 172.17.0.12:3000 172.17.0.4:3000 172.17.0.5:3000 172.17.0.6:3000 172.17.0.7:3000 172.17.0.8:3000
pending_quiesce: true false false false true true true true true
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2019-03-06 16:28:22 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : 0534b209248c:3000 172.17.0.10:3000 172.17.0.11:3000 172.17.0.12:3000 172.17.0.4:3000 172.17.0.5:3000 172.17.0.6:3000 172.17.0.7:3000 172.17.0.8:3000
pending_quiesce: true false false false true true true true true
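For larger clusters, issuing the quiesce command node by node inside asadm is tedious. The same calls can be scripted from a shell using asadm's -e (execute) flag. This is a sketch only, assuming asadm is installed and the listed IPs (the six original nodes in this example) are reachable:

```shell
#!/bin/sh
# Sketch: quiesce each of the original nodes in a loop via asadm's
# -e (execute one command) mode. The IPs are the six original nodes
# from this example; substitute your own.
# RUN defaults to a dry run that only prints the commands;
# set RUN= (empty) to actually execute them.
OLD_NODES="172.17.0.3 172.17.0.4 172.17.0.5 172.17.0.6 172.17.0.7 172.17.0.8"
RUN=${RUN-echo}

for ip in $OLD_NODES; do
    $RUN asadm -h "$ip" -e "asinfo -v 'quiesce:' with $ip"
done
```

With RUN left at its default of echo, the loop prints the commands for review before anything is run against the cluster.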
At this point nothing has happened, as the quiesce command does not take effect until a recluster command is issued, as follows:
Admin> asinfo -v 'recluster:'
172.17.0.8:3000 (172.17.0.8) returned:
ignored-by-non-principal
172.17.0.11:3000 (172.17.0.11) returned:
ignored-by-non-principal
0534b209248c:3000 (172.17.0.3) returned:
ignored-by-non-principal
172.17.0.4:3000 (172.17.0.4) returned:
ignored-by-non-principal
172.17.0.7:3000 (172.17.0.7) returned:
ignored-by-non-principal
172.17.0.6:3000 (172.17.0.6) returned:
ignored-by-non-principal
172.17.0.12:3000 (172.17.0.12) returned:
ok
172.17.0.5:3000 (172.17.0.5) returned:
ignored-by-non-principal
172.17.0.10:3000 (172.17.0.10) returned:
ignored-by-non-principal
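The recluster step can likewise be issued from a shell rather than from within the asadm prompt. A sketch, assuming asadm is installed and SEED (here one of the new nodes) is any reachable cluster node:

```shell
#!/bin/sh
# Sketch: issue recluster non-interactively. Only the principal node acts
# on the command; the "ignored-by-non-principal" replies from the other
# nodes are expected. RUN defaults to a dry run that prints the command;
# set RUN= (empty) to execute it.
SEED=${SEED:-172.17.0.10}
RUN=${RUN-echo}

$RUN asadm -h "$SEED" -e "asinfo -v 'recluster:'"
```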
Monitor migrations
The quiesced nodes will continue to take traffic as long as they remain master for a given partition. A node gives up the master role for a partition once that partition has fully migrated out to the new master node. In the interim between the master role changing and the clients receiving a new partition map, the quiesced nodes proxy transactions. The client workload continues uninterrupted.
Admin> info
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Object Information (2019-03-06 16:30:46 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace Node Total Repl Objects Tombstones Pending Rack
. . Records Factor (Master,Prole,Non-Replica) (Master,Prole,Non-Replica) Migrates ID
. . . . . . (tx,rx) .
bar 0534b209248c:3000 3.301 M 2 (1.013 M, 0.000, 2.288 M) (0.000, 0.000, 0.000) (275.000, 0.000) 0
bar 172.17.0.10:3000 3.384 M 2 (2.959 M, 424.843 K, 0.000) (0.000, 0.000, 0.000) (1.239 K, 1.789 K) 0
bar 172.17.0.11:3000 3.324 M 2 (2.902 M, 421.437 K, 0.000) (0.000, 0.000, 0.000) (1.237 K, 1.797 K) 0
bar 172.17.0.12:3000 3.346 M 2 (2.893 M, 453.114 K, 0.000) (0.000, 0.000, 0.000) (1.273 K, 1.884 K) 0
bar 172.17.0.4:3000 3.558 M 2 (1.122 M, 0.000, 2.436 M) (0.000, 0.000, 0.000) (304.000, 0.000) 0
bar 172.17.0.5:3000 3.321 M 2 (1.085 M, 0.000, 2.236 M) (0.000, 0.000, 0.000) (294.000, 0.000) 0
bar 172.17.0.6:3000 3.271 M 2 (973.272 K, 0.000, 2.297 M) (0.000, 0.000, 0.000) (264.000, 0.000) 0
bar 172.17.0.7:3000 3.228 M 2 (991.724 K, 0.000, 2.236 M) (0.000, 0.000, 0.000) (269.000, 0.000) 0
bar 172.17.0.8:3000 3.524 M 2 (1.164 M, 0.000, 2.360 M) (0.000, 0.000, 0.000) (316.000, 0.000) 0
bar 30.256 M (15.104 M, 1.299 M, 13.853 M) (0.000, 0.000, 0.000) (5.471 K, 5.470 K)
test 0534b209248c:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (21.000, 0.000) 0
test 172.17.0.10:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (1.155 K, 1.177 K) 0
test 172.17.0.11:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (1.157 K, 1.073 K) 0
test 172.17.0.12:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (1.079 K, 1.309 K) 0
test 172.17.0.4:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (37.000, 0.000) 0
test 172.17.0.5:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (28.000, 0.000) 0
test 172.17.0.6:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (6.000, 0.000) 0
test 172.17.0.7:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (34.000, 0.000) 0
test 172.17.0.8:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (24.000, 0.000) 0
test 0.000 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (3.541 K, 3.559 K)
Number of rows: 20
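Rather than re-running info by hand, migration completion can be detected by polling the migrate_partitions_remaining namespace statistic and waiting for the cluster-wide sum to reach zero. A rough sketch follows; the statistics command is injectable so the parsing can be exercised without a live cluster, and by default it assumes asadm is installed and the seed node is reachable:

```shell
#!/bin/sh
# Sketch: sum migrate_partitions_remaining across all nodes.
# STATS_CMD is injectable for testing; by default it queries a seed node
# (172.17.0.10 is one of the new nodes in this example) with asadm.
SEED=${SEED:-172.17.0.10}
STATS_CMD=${STATS_CMD:-"asadm -h $SEED -e \"show statistics like migrate_partitions_remaining\""}

remaining() {
    # Keep only the statistic's row(s), pull out every number on them,
    # and print the total (0 if nothing matched).
    eval "$STATS_CMD" \
        | grep migrate_partitions_remaining \
        | grep -o '[0-9][0-9]*' \
        | awk '{ s += $1 } END { print s + 0 }'
}

# Example: block until migrations complete (uncomment to use).
# while [ "$(remaining)" -gt 0 ]; do sleep 30; done; echo "migrations complete"
```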
Shutdown quiesced nodes
Once migrations are finished, the old nodes can be shut down.
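Shutting down the quiesced nodes can also be scripted. The following is a sketch over SSH; the systemd service name (aerospike), passwordless root SSH to the nodes, and the IP list are all assumptions about the environment:

```shell
#!/bin/sh
# Sketch: stop the Aerospike service on each quiesced node over SSH.
# Assumes systemd-managed installs and passwordless SSH as root.
# RUN defaults to a dry run that prints each command; set RUN= (empty)
# to actually connect and stop the services.
OLD_NODES="172.17.0.3 172.17.0.4 172.17.0.5 172.17.0.6 172.17.0.7 172.17.0.8"
RUN=${RUN-echo}

for ip in $OLD_NODES; do
    $RUN ssh "root@$ip" systemctl stop aerospike
done
```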
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Object Information (2019-03-06 17:43:55 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace Node Total Repl Objects Tombstones Pending Rack
. . Records Factor (Master,Prole,Non-Replica) (Master,Prole,Non-Replica) Migrates ID
. . . . . . (tx,rx) .
bar 0534b209248c:3000 3.301 M 2 (0.000, 0.000, 3.301 M) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.10:3000 9.974 M 2 (4.991 M, 4.982 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.11:3000 9.940 M 2 (4.998 M, 4.941 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.12:3000 10.287 M 2 (5.111 M, 5.176 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.4:3000 3.558 M 2 (0.000, 0.000, 3.558 M) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.5:3000 3.321 M 2 (0.000, 0.000, 3.321 M) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.6:3000 3.271 M 2 (0.000, 0.000, 3.271 M) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.7:3000 3.228 M 2 (0.000, 0.000, 3.228 M) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.8:3000 3.524 M 2 (0.000, 0.000, 3.524 M) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 50.403 M (15.100 M, 15.100 M, 20.203 M) (0.000, 0.000, 0.000) (0.000, 0.000)
test 0534b209248c:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.10:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.11:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.12:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.4:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.5:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.6:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.7:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.8:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 0.000 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000)
Number of rows: 20
Admin>
The client output looks as follows:
2019-03-06 17:44:57.780 write(tps=6414 timeouts=0 errors=0) read(tps=6355 timeouts=0 errors=0) total(tps=12769 timeouts=0 errors=0)
2019-03-06 17:44:58.782 write(tps=6411 timeouts=0 errors=0) read(tps=6100 timeouts=0 errors=0) total(tps=12511 timeouts=0 errors=0)
2019-03-06 17:44:59.786 write(tps=6196 timeouts=0 errors=0) read(tps=6259 timeouts=0 errors=0) total(tps=12455 timeouts=0 errors=0)
2019-03-06 17:45:00.787 write(tps=5977 timeouts=0 errors=0) read(tps=6094 timeouts=0 errors=0) total(tps=12071 timeouts=0 errors=0)
2019-03-06 17:45:01.789 write(tps=6427 timeouts=0 errors=0) read(tps=6378 timeouts=0 errors=0) total(tps=12805 timeouts=0 errors=0)
2019-03-06 17:45:02.792 write(tps=5991 timeouts=0 errors=0) read(tps=6112 timeouts=0 errors=0) total(tps=12103 timeouts=0 errors=0)
2019-03-06 17:45:03.792 write(tps=6553 timeouts=0 errors=0) read(tps=6643 timeouts=0 errors=0) total(tps=13196 timeouts=0 errors=0)
2019-03-06 17:45:04.795 write(tps=6741 timeouts=0 errors=0) read(tps=6521 timeouts=0 errors=0) total(tps=13262 timeouts=0 errors=0)
2019-03-06 17:45:05.798 write(tps=6382 timeouts=0 errors=0) read(tps=6420 timeouts=0 errors=0) total(tps=12802 timeouts=0 errors=0)
2019-03-06 17:45:06.799 write(tps=4873 timeouts=0 errors=0) read(tps=4824 timeouts=0 errors=0) total(tps=9697 timeouts=0 errors=0)
2019-03-06 17:45:07.801 write(tps=3442 timeouts=0 errors=0) read(tps=3621 timeouts=0 errors=0) total(tps=7063 timeouts=0 errors=0)
2019-03-06 17:45:08.806 write(tps=2160 timeouts=0 errors=0) read(tps=2185 timeouts=0 errors=0) total(tps=4345 timeouts=0 errors=0)
2019-03-06 17:45:09.775 write(tps=1960 timeouts=0 errors=0) read(tps=1942 timeouts=0 errors=0) total(tps=3902 timeouts=0 errors=0)
2019-03-06 17:45:10.714 WARN Thread tend Node BB9030011AC4202 172.17.0.3 3000 refresh failed: Error -1: java.net.SocketTimeoutException: Read timed out
2019-03-06 17:45:10.776 write(tps=1644 timeouts=0 errors=0) read(tps=1692 timeouts=0 errors=0) total(tps=3336 timeouts=0 errors=0)
2019-03-06 17:45:11.777 write(tps=435 timeouts=0 errors=0) read(tps=417 timeouts=0 errors=0) total(tps=852 timeouts=0 errors=0)
2019-03-06 17:45:11.839 WARN Thread tend Node BB9040011AC4202 172.17.0.4 3000 refresh failed: Error -1: java.net.SocketTimeoutException: Read timed out
2019-03-06 17:45:12.779 write(tps=209 timeouts=0 errors=0) read(tps=211 timeouts=0 errors=0) total(tps=420 timeouts=0 errors=0)
2019-03-06 17:45:12.794 WARN Thread tend Node BB9050011AC4202 172.17.0.5 3000 refresh failed: com.aerospike.client.AerospikeException: Error -1: java.net.SocketException: Connection reset
at com.aerospike.client.Info.sendCommand(Info.java:580)
at com.aerospike.client.Info.<init>(Info.java:123)
at com.aerospike.client.Info.request(Info.java:520)
at com.aerospike.client.cluster.Node.refresh(Node.java:184)
at com.aerospike.client.cluster.Cluster.tend(Cluster.java:444)
at com.aerospike.client.cluster.Cluster.run(Cluster.java:406)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.aerospike.client.cluster.Connection.readFully(Connection.java:248)
at com.aerospike.client.Info.sendCommand(Info.java:571)
... 6 more
2019-03-06 17:45:13.048 WARN Thread tend Node BB9060011AC4202 172.17.0.6 3000 refresh failed: com.aerospike.client.AerospikeException: Error -1: java.net.SocketException: Connection reset
at com.aerospike.client.Info.sendCommand(Info.java:580)
at com.aerospike.client.Info.<init>(Info.java:85)
at com.aerospike.client.cluster.PeerParser.<init>(PeerParser.java:43)
at com.aerospike.client.cluster.Node.refreshPeers(Node.java:390)
at com.aerospike.client.cluster.Cluster.tend(Cluster.java:453)
at com.aerospike.client.cluster.Cluster.run(Cluster.java:406)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.aerospike.client.cluster.Connection.readFully(Connection.java:248)
at com.aerospike.client.Info.sendCommand(Info.java:571)
... 6 more
2019-03-06 17:45:13.659 WARN Thread tend Node BB9070011AC4202 172.17.0.7 3000 refresh failed: com.aerospike.client.AerospikeException: Error -1: java.net.SocketException: Connection reset
at com.aerospike.client.Info.sendCommand(Info.java:580)
at com.aerospike.client.Info.<init>(Info.java:85)
at com.aerospike.client.cluster.PeerParser.<init>(PeerParser.java:43)
at com.aerospike.client.cluster.Node.refreshPeers(Node.java:390)
at com.aerospike.client.cluster.Cluster.tend(Cluster.java:453)
at com.aerospike.client.cluster.Cluster.run(Cluster.java:406)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.aerospike.client.cluster.Connection.readFully(Connection.java:248)
at com.aerospike.client.Info.sendCommand(Info.java:571)
... 6 more
2019-03-06 17:45:13.783 write(tps=2303 timeouts=0 errors=0) read(tps=2344 timeouts=0 errors=0) total(tps=4647 timeouts=0 errors=0)
2019-03-06 17:45:14.021 WARN Thread tend Node BB9080011AC4202 172.17.0.8 3000 refresh failed: com.aerospike.client.AerospikeException: Error -1: java.net.SocketException: Connection reset
at com.aerospike.client.Info.sendCommand(Info.java:580)
at com.aerospike.client.Info.<init>(Info.java:85)
at com.aerospike.client.cluster.PeerParser.<init>(PeerParser.java:43)
at com.aerospike.client.cluster.Node.refreshPeers(Node.java:390)
at com.aerospike.client.cluster.Cluster.tend(Cluster.java:453)
at com.aerospike.client.cluster.Cluster.run(Cluster.java:406)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.aerospike.client.cluster.Connection.readFully(Connection.java:248)
at com.aerospike.client.Info.sendCommand(Info.java:571)
... 6 more
2019-03-06 17:45:14.783 write(tps=5749 timeouts=0 errors=0) read(tps=5695 timeouts=0 errors=0) total(tps=11444 timeouts=0 errors=0)
2019-03-06 17:45:15.029 WARN Thread tend Node BB9030011AC4202 172.17.0.3 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:15.029 WARN Thread tend Node BB9060011AC4202 172.17.0.6 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:15.030 WARN Thread tend Node BB9070011AC4202 172.17.0.7 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:15.030 WARN Thread tend Node BB9040011AC4202 172.17.0.4 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:15.031 WARN Thread tend Node BB9080011AC4202 172.17.0.8 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:15.031 WARN Thread tend Node BB9050011AC4202 172.17.0.5 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:15.784 write(tps=6578 timeouts=0 errors=0) read(tps=6546 timeouts=0 errors=0) total(tps=13124 timeouts=0 errors=0)
2019-03-06 17:45:16.036 WARN Thread tend Node BB9030011AC4202 172.17.0.3 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:16.036 WARN Thread tend Node BB9060011AC4202 172.17.0.6 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:16.038 WARN Thread tend Node BB9070011AC4202 172.17.0.7 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:16.038 WARN Thread tend Node BB9040011AC4202 172.17.0.4 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:16.039 WARN Thread tend Node BB9080011AC4202 172.17.0.8 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:16.039 WARN Thread tend Node BB9050011AC4202 172.17.0.5 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:16.046 INFO Thread tend Remove node BB9030011AC4202 172.17.0.3 3000
2019-03-06 17:45:16.046 INFO Thread tend Remove node BB9060011AC4202 172.17.0.6 3000
2019-03-06 17:45:16.046 INFO Thread tend Remove node BB9070011AC4202 172.17.0.7 3000
2019-03-06 17:45:16.046 INFO Thread tend Remove node BB9040011AC4202 172.17.0.4 3000
2019-03-06 17:45:16.046 INFO Thread tend Remove node BB9080011AC4202 172.17.0.8 3000
2019-03-06 17:45:16.046 INFO Thread tend Remove node BB9050011AC4202 172.17.0.5 3000
2019-03-06 17:45:16.785 write(tps=6694 timeouts=0 errors=0) read(tps=6496 timeouts=0 errors=0) total(tps=13190 timeouts=0 errors=0)
2019-03-06 17:45:17.786 write(tps=6801 timeouts=0 errors=0) read(tps=6707 timeouts=0 errors=0) total(tps=13508 timeouts=0 errors=0)
2019-03-06 17:45:18.787 write(tps=6734 timeouts=0 errors=0) read(tps=6647 timeouts=0 errors=0) total(tps=13381 timeouts=0 errors=0)
2019-03-06 17:45:19.787 write(tps=6184 timeouts=0 errors=0) read(tps=6252 timeouts=0 errors=0) total(tps=12436 timeouts=0 errors=0)
2019-03-06 17:45:20.788 write(tps=5742 timeouts=0 errors=0) read(tps=5712 timeouts=0 errors=0) total(tps=11454 timeouts=0 errors=0)
Initially, the output above may cause concern, but it is all to be expected. When a client builds a partition map it does so by tending all nodes in the cluster; each node reports back which partitions it owns. As seen above, even when quiesced, a node retains the master role until it has migrated out all of its partitions. For this reason, clients continue to tend a quiesced node until it shuts down, which also allows for the possibility of the node being unquiesced. This behaviour is observed here:
2019-03-06 17:45:10.714 WARN Thread tend Node BB9030011AC4202 172.17.0.3 3000 refresh failed: Error -1: java.net.SocketTimeoutException: Read timed out
2019-03-06 17:45:10.776 write(tps=1644 timeouts=0 errors=0) read(tps=1692 timeouts=0 errors=0) total(tps=3336 timeouts=0 errors=0)
2019-03-06 17:45:11.777 write(tps=435 timeouts=0 errors=0) read(tps=417 timeouts=0 errors=0) total(tps=852 timeouts=0 errors=0)
2019-03-06 17:45:11.839 WARN Thread tend Node BB9040011AC4202 172.17.0.4 3000 refresh failed: Error -1: java.net.SocketTimeoutException: Read timed out
While the tend requests to the shut-down nodes time out, the throughput messages show that the workload continues without issue.
Next, the connections used for tending are reset:
2019-03-06 17:45:13.783 write(tps=2303 timeouts=0 errors=0) read(tps=2344 timeouts=0 errors=0) total(tps=4647 timeouts=0 errors=0)
2019-03-06 17:45:14.021 WARN Thread tend Node BB9080011AC4202 172.17.0.8 3000 refresh failed: com.aerospike.client.AerospikeException: Error -1: java.net.SocketException: Connection reset
at com.aerospike.client.Info.sendCommand(Info.java:580)
at com.aerospike.client.Info.<init>(Info.java:85)
at com.aerospike.client.cluster.PeerParser.<init>(PeerParser.java:43)
at com.aerospike.client.cluster.Node.refreshPeers(Node.java:390)
at com.aerospike.client.cluster.Cluster.tend(Cluster.java:453)
at com.aerospike.client.cluster.Cluster.run(Cluster.java:406)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.aerospike.client.cluster.Connection.readFully(Connection.java:248)
at com.aerospike.client.Info.sendCommand(Info.java:571)
... 6 more
Further connections are refused, with messages showing the workload continuing:
2019-03-06 17:45:14.783 write(tps=5749 timeouts=0 errors=0) read(tps=5695 timeouts=0 errors=0) total(tps=11444 timeouts=0 errors=0)
2019-03-06 17:45:15.029 WARN Thread tend Node BB9030011AC4202 172.17.0.3 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:15.029 WARN Thread tend Node BB9060011AC4202 172.17.0.6 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:15.030 WARN Thread tend Node BB9070011AC4202 172.17.0.7 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:15.030 WARN Thread tend Node BB9040011AC4202 172.17.0.4 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:15.031 WARN Thread tend Node BB9080011AC4202 172.17.0.8 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:15.031 WARN Thread tend Node BB9050011AC4202 172.17.0.5 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2019-03-06 17:45:15.784 write(tps=6578 timeouts=0 errors=0) read(tps=6546 timeouts=0 errors=0) total(tps=13124 timeouts=0 errors=0)
The client removes the nodes from the partition map:
2019-03-06 17:45:16.046 INFO Thread tend Remove node BB9030011AC4202 172.17.0.3 3000
2019-03-06 17:45:16.046 INFO Thread tend Remove node BB9060011AC4202 172.17.0.6 3000
2019-03-06 17:45:16.046 INFO Thread tend Remove node BB9070011AC4202 172.17.0.7 3000
2019-03-06 17:45:16.046 INFO Thread tend Remove node BB9040011AC4202 172.17.0.4 3000
2019-03-06 17:45:16.046 INFO Thread tend Remove node BB9080011AC4202 172.17.0.8 3000
2019-03-06 17:45:16.046 INFO Thread tend Remove node BB9050011AC4202 172.17.0.5 3000
2019-03-06 17:45:16.785 write(tps=6694 timeouts=0 errors=0) read(tps=6496 timeouts=0 errors=0) total(tps=13190 timeouts=0 errors=0)
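The client behaviour walked through above can be sketched in a few lines. The following is a simplified Python model of the tend loop, not the actual Java client code; the node names and partition IDs are invented for illustration:

```python
# Simplified model of the client tend loop: the partition map is rebuilt from
# whatever each reachable node reports, so a node drops out of the map only
# once it stops responding and no longer claims any partitions.

class Node:
    def __init__(self, name, partitions):
        self.name = name
        self.partitions = set(partitions)  # partitions this node claims as master
        self.alive = True

def tend(nodes):
    """Rebuild the partition map from the nodes that answer the tend request."""
    partition_map = {}
    for node in nodes:
        if not node.alive:
            continue  # refresh failed: the node is removed from the map
        for pid in node.partitions:
            partition_map[pid] = node.name
    return partition_map

old = Node("old-node", {0, 1})  # quiesced node, hypothetical name
new = Node("new-node", set())   # remaining node, hypothetical name

# Immediately after quiescing, nothing has migrated yet: the old node still
# reports itself as master, so clients keep tending and routing to it.
assert tend([old, new]) == {0: "old-node", 1: "old-node"}

# Migrations hand the partitions over, then the old node shuts down; the
# next tend removes it and the map points only at the remaining node.
new.partitions |= old.partitions
old.partitions.clear()
old.alive = False
assert tend([old, new]) == {0: "new-node", 1: "new-node"}
```

This is why the Remove node messages appear only after the shutdown: until then, the quiesced nodes were still answering tend requests.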
Observe the final cluster state
The cluster is now shown as a stable 3 node cluster with a significantly lower disk usage percentage due to the vertical expansion. Throughout the expansion a steady workload ran without disruption, and the replication factor was maintained at all times.
Namespaces
==========
test
====
1. Devices : Total 0, per-node 0
2. Memory : Total 12.000 GB, 0.00% used (0.000 B), 100.00% available (12.000 GB)
3. Replication Factor : 2
4. Rack-aware : False
5. Master Objects : 0.000
6. Usage (Unique Data): 0.000 B in-memory, 0.000 B on-disk
bar
===
1. Devices : Total 3, per-node 1
2. Memory : Total 12.000 GB, 22.33% used (2.680 GB), 77.67% available (9.320 GB)
3. Disk : Total 9.000 GB, 24.97% used (2.247 GB), 68.00% available contiguous space (6.120 GB)
4. Replication Factor : 2
5. Rack-aware : False
6. Master Objects : 15.100 M
7. Usage (Unique Data): 0.000 B in-memory, 547.218 MB on-disk
Admin> info
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2019-03-08 16:15:03 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node Node Ip Build Cluster Migrations Cluster Cluster Principal Client Uptime
. Id . . Size . Key Integrity . Conns .
172.17.0.10:3000 BB90A0011AC4202 172.17.0.10:3000 E-4.5.1.5 3 0.000 E9316F7FBC5B True BB90C0011AC4202 2 24:51:08
172.17.0.12:3000 *BB90C0011AC4202 172.17.0.12:3000 E-4.5.1.5 3 0.000 E9316F7FBC5B True BB90C0011AC4202 3 24:51:08
2de7168d260b:3000 BB90B0011AC4202 172.17.0.11:3000 E-4.5.1.5 3 0.000 E9316F7FBC5B True BB90C0011AC4202 2 24:51:08
Number of rows: 3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Usage Information (2019-03-08 16:15:03 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace Node Total Expirations,Evictions Stop Disk Disk HWM Avail% Mem Mem HWM Stop
. . Records . Writes Used Used% Disk% . Used Used% Mem% Writes%
bar 172.17.0.10:3000 9.974 M (0.000, 0.000) false 759.900 MB 25 50 68 895.519 MB 22 60 90
bar 172.17.0.12:3000 10.287 M (0.000, 0.000) false 783.761 MB 26 50 67 923.640 MB 23 60 90
bar 2de7168d260b:3000 9.940 M (0.000, 0.000) false 757.334 MB 25 50 69 892.497 MB 22 60 90
bar 30.200 M (0.000, 0.000) 2.247 GB 2.648 GB
test 172.17.0.10:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 172.17.0.12:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 2de7168d260b:3000 0.000 (0.000, 0.000) false N/E N/E 50 N/E 0.000 B 0 60 90
test 0.000 (0.000, 0.000) 0.000 B 0.000 B
Number of rows: 8
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Object Information (2019-03-08 16:15:03 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace Node Total Repl Objects Tombstones Pending Rack
. . Records Factor (Master,Prole,Non-Replica) (Master,Prole,Non-Replica) Migrates ID
. . . . . . (tx,rx) .
bar 172.17.0.10:3000 9.974 M 2 (4.991 M, 4.982 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 172.17.0.12:3000 10.287 M 2 (5.111 M, 5.176 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 2de7168d260b:3000 9.940 M 2 (4.998 M, 4.941 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
bar 30.200 M (15.100 M, 15.100 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000)
test 172.17.0.10:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 172.17.0.12:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 2de7168d260b:3000 0.000 2 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
test 0.000 (0.000, 0.000, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000)
Number of rows: 8
Admin>
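The drop in disk usage percentage is simple arithmetic: the same amount of unique data now sits on a larger total capacity. A quick check using the figures from the summaries above:

```python
# The same 2.247 GB of data occupies less of the new cluster's larger capacity.
data_gb = 2.247
before_total_gb = 6.0  # 6 nodes, 1 GB of device each
after_total_gb = 9.0   # 3 nodes, 3 GB of device each

before_pct = data_gb / before_total_gb * 100
after_pct = data_gb / after_total_gb * 100

assert abs(before_pct - 37.45) < 0.01  # matches the initial summary
assert abs(after_pct - 24.97) < 0.01   # matches the final summary
```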
Notes
- In versions prior to Aerospike 4.3.1.3 the quiesce command is not present. In such a scenario, a potential approach to expanding a cluster vertically is to use rack-aware to put all of the newer nodes into a single rack, which means that rack contains a single copy of every partition. The smaller nodes can then be shut down, and the larger nodes can have their rack-id set to different values to trigger migration and restore the replication factor. This means that for the duration of the exercise the replication factor is reduced to 1.
- Another area where this method could be employed to make an application-transparent change is when changing instance types or re-stacking cloud-based Aerospike cluster nodes.
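The rack-aware fallback can be illustrated with a toy placement model. This is not Aerospike's actual partition assignment algorithm, and the rack and node names are invented; it only shows why, at replication factor 2, a single rack of new nodes ends up holding one complete copy of every partition:

```python
import hashlib

# Toy model: with replication factor 2 and two racks, one replica of each
# partition is placed on each rack, so the "new" rack holds a full copy.
racks = {"old": ["A", "B", "C"], "new": ["D", "E", "F"]}

def place(pid):
    """Place one replica per rack, spreading partitions across a rack's nodes."""
    h = int(hashlib.sha1(str(pid).encode()).hexdigest(), 16)
    return [nodes[h % len(nodes)] for nodes in racks.values()]

# An Aerospike namespace has 4096 partitions.
assignment = {pid: place(pid) for pid in range(4096)}

# Every partition has a replica on the new rack, so shutting down the old
# rack leaves exactly one intact copy (replication factor temporarily 1).
assert all(any(n in racks["new"] for n in owners)
           for owners in assignment.values())
```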
Keywords
EXPAND VERTICAL NODES QUIESCE CHANGE INSTANCE RACK_ID RACK AWARE INSTANCE TYPE AEROSPIKE
Timestamp
3/8/19