mmao
March 2, 2015, 9:10pm
1
I’m running 2 OpenNebula VMs with CentOS.
I’ve installed CE 3.5.3 on both nodes, and AMC and the benchmarks on the 1st node.
edit: Is the CE restricted to a single node? Is it something that basic?
I can add the 2nd node in AMC no problem. After adding, I can run_benchmarks and get confirmation that both nodes are reachable:
2015-03-02 20:14:16.410 INFO Thread 1 Add node BB98405080A0002 127.0.0.1:3000
2015-03-02 20:14:16.425 INFO Thread 1 Add node BB98705080A0002 10.8.5.135:3000
2015-03-02 20:14:16.500 write(tps=16 timeouts=0 errors=0) read(tps=85 timeouts=0 errors=0) total(tps=101 timeouts=0 err
But soon after, the 2nd node doesn’t get added any more:
2015-03-02 21:08:06.728 INFO Thread 1 Add node BB98405080A0002 127.0.0.1:3000
2015-03-02 21:08:06.786 write(tps=29 timeouts=0 errors=0) read(tps=123 timeouts=0 errors=0) total(tps=152 timeouts=0 errors=0)
AMC shows both nodes as up, with green Cluster Visibility, for some time after that, but then they both show as red. Even while run_benchmarks still works, both nodes stay red.
If I issue a service aerospike restart on the 2nd node, it rejoins, even in the middle of a run_benchmarks:
2015-03-02 22:06:30.988 write(tps=13364 timeouts=0 errors=0) read(tps=13442 timeouts=0 errors=0) total(tps=26806 timeouts=0 errors=0)
2015-03-02 22:06:31.914 INFO Thread 8 Add node BB98705080A0002 10.8.5.135:3000
2015-03-02 22:06:31.988 write(tps=5211 timeouts=0 errors=0) read(tps=5275 timeouts=0 errors=0) total(tps=10486 timeouts=0 errors=0)
And then the 2 nodes both go green in AMC.
How can I get more detail about why the 2nd node is dropping out? What errors should I normally expect from AMC about that event?
mmao
March 2, 2015, 9:37pm
2
One more piece of evidence. I kept a run_benchmarks running for a long time to see if it would register a dropped/disconnected node. It didn’t, and AMC kept graphing throughput from both nodes throughout.
I canceled the benchmark and relaunched it. This time, it didn’t add the 2nd node, and AMC’s graph of the 2nd node’s throughput dropped to zero.
2015-03-02 22:33:34.595 write(tps=3424 timeouts=0 errors=0) read(tps=3391 timeouts=0 errors=0) total(tps=6815 timeouts=0 errors=0)
2015-03-02 22:33:35.595 write(tps=3355 timeouts=0 errors=0) read(tps=3401 timeouts=0 errors=0) total(tps=6756 timeouts=0 errors=0)
2015-03-02 22:33:36.595 write(tps=3338 timeouts=0 errors=0) read(tps=3387 timeouts=0 errors=0) total(tps=6725 timeouts=0 errors=0)
2015-03-02 22:33:37.596 write(tps=3367 timeouts=0 errors=0) read(tps=3414 timeouts=0 errors=0) total(tps=6781 timeouts=0 errors=0)
2015-03-02 22:33:38.596 write(tps=3237 timeouts=0 errors=0) read(tps=3410 timeouts=0 errors=0) total(tps=6647 timeouts=0 errors=0)
2015-03-02 22:33:39.597 write(tps=3334 timeouts=0 errors=0) read(tps=3337 timeouts=0 errors=0) total(tps=6671 timeouts=0 errors=0)
[root@mmao-aerospike_ce benchmarks]# ./run_benchmarks
Benchmark: 127.0.0.1:3000, namespace: test, set: testset, threads: 16, workload: READ_UPDATE
read: 50% (all bins: 100%, single bin: 0%), write: 50% (all bins: 100%, single bin: 0%)
keys: 100000, start key: 0, transactions: 0, bins: 1, random values: false, throughput: unlimited
read policy: timeout: 0, maxRetries: 1, sleepBetweenRetries: 500, consistencyLevel: CONSISTENCY_ONE, reportNotFound: false
write policy: timeout: 0, maxRetries: 1, sleepBetweenRetries: 500, commitLevel: COMMIT_ALL
bin[0]: integer
debug: false
2015-03-02 22:33:43.885 INFO Thread 1 Add node BB98405080A0002 127.0.0.1:3000
2015-03-02 22:33:44.111 write(tps=902 timeouts=0 errors=0) read(tps=979 timeouts=0 errors=0) total(tps=1881 timeouts=0 errors=0)
2015-03-02 22:33:45.112 write(tps=8463 timeouts=0 errors=0) read(tps=8432 timeouts=0 errors=0) total(tps=16895 timeouts=0 errors=0)
2015-03-02 22:33:46.112 write(tps=9539 timeouts=0 errors=0) read(tps=9579 timeouts=0 errors=0) total(tps=19118 timeouts=0 errors=0)
I only noticed then that both nodes were marked red. I hadn’t checked during the initial long run while both nodes were working.
meher
March 5, 2015, 2:50am
3
Thanks for reaching out. The CE does support multiple nodes per cluster; there is no such limitation.
Did you configure your nodes to form a cluster across those VMs?
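In case it helps, nodes only discover each other if the heartbeat section is set up for your network. Multicast often doesn’t work between cloud/virtualized VMs, so mesh mode is the usual choice there. A minimal sketch of the heartbeat stanza in /etc/aerospike/aerospike.conf might look like this (the seed address below is the 2nd node’s IP from your logs; port and timing values are examples):

```
heartbeat {
    mode mesh                               # explicit peer list instead of multicast
    port 3002                               # heartbeat port, must be open between VMs
    mesh-seed-address-port 10.8.5.135 3002  # point at the other node(s)
    interval 150                            # ms between heartbeats
    timeout 10                              # missed heartbeats before a node is dropped
}
```

Each node should list the other node(s) as mesh seeds, and the heartbeat port must be reachable in both directions.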
Could you share the config files for your nodes and the last lines of each node’s log file (under /var/log/aerospike/aerospike.log)?
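In the meantime, assuming the aerospike-tools package is installed, you can ask each node directly what it sees; this is a sketch of the kind of checks I’d run on both VMs (the grep patterns are just suggestions):

```shell
# Which other cluster members does this node currently know about?
asinfo -v 'services'

# Cluster size as seen by this node (-l prints one statistic per line)
asinfo -v 'statistics' -l | grep cluster_size

# Recent heartbeat-related log lines often explain why a node was dropped
grep -i heartbeat /var/log/aerospike/aerospike.log | tail -n 20
```

If the two nodes report different cluster sizes, or 'services' comes back empty on one of them, that points at the heartbeat configuration or the network between the VMs.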
Thanks,
–meher