Hi,
We have a 10-node cluster running in production. One of the 10 nodes had performance problems and was shut down. When it was added back to the cluster, we noticed that once migrations started, the asd process on other nodes was terminated with the following error:
Oct 07 2015 19:33:17 GMT: WARNING (as): (signal.c::161) SIGSEGV received, aborting Aerospike Community Edition build 3.6.1 os el6-------
Before the crash, we see the following message:
Oct 07 2015 22:25:32 GMT: INFO (paxos): (paxos.c::2410) CLUSTER INTEGRITY FAULT. [Phase 1 of 2] To fix, issue this command across all nodes:dun:nodes=bb9f9dd5f290c00,bb9c14a8b565000,bb996068b565000,bb9786f8b565000,bb9607a3f290c00,bb917e213290c00,bb91193a1290c00,bb90c18b1290c00,bb9061f8b565000
This is really concerning. We would like to understand how we can avoid this situation.
At the very least, the problematic node should not force the Aerospike processes on other nodes to go down.
We just released Aerospike Server Community Edition v3.6.2, which fixes a regression found in 3.6.0 and 3.6.1 where the server would crash during migrations while processing batch requests.
Will you please upgrade to 3.6.2 and let us know whether you are still experiencing this issue?
Hi,
I just upgraded my test cluster to the latest 3.6.2 (enterprise), since we saw the same issue in our lab with 3.6.1.
After the upgrade, the service starts, but when I run asadm it fails on each node like this:
[root@host-192-168-1-15 bin]# ./asadm
Traceback (most recent call last):
File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
exec code in run_globals
File "./asadm/__main__.py", line 15, in <module>
File "./asadm/asadm.py", line 26, in <module>
File "./asadm/lib/controller.py", line 15, in <module>
File "./asadm/lib/controllerlib.py", line 16, in <module>
File "./asadm/lib/cluster.py", line 17, in <module>
File "./asadm/lib/node.py", line 463
result = {node.node_id for node in c.nodes.values()}
^
SyntaxError: invalid syntax
All other tools seem to run fine (aql, asmonitor, ascli, asinfo).
Thanks for your response.
-wilson
I used the same tools package that is bundled in the enterprise tarball; it matches version 3.6.2.1.
Just for fun, I downloaded JUST that package again (aerospike-tools-3.6.2.1-el6.tgz) and installed it after “removing” the previous one. The same errors come up. This is happening on RHEL 6 (2.6.32-431.el6.x86_64).
Sorry for the confusion. Originally the tarball had the 3.6.2 package, so you could have been running that. In any case, it appears that the patch submitted to resolve this issue didn’t fully remedy the problem.
Correct, a set comprehension was added, and that syntax isn’t supported by Python 2.6 (it only arrived in Python 2.7).
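For anyone curious what the failing line in the traceback amounts to, here is a rough sketch of a form that Python 2.6 can parse. This is not the actual asadm patch. The Node class, the host:port keys, and the node_ids helper are made up for illustration; only the node IDs are taken from the log line earlier in this thread. The idea is just to pass a generator expression to set() instead of using the {... for ...} syntax.

```python
class Node(object):
    """Hypothetical stand-in for an asadm node object; only node_id matters here."""

    def __init__(self, node_id):
        self.node_id = node_id


# Hypothetical mapping of host:port to node objects (addresses are invented).
nodes = {
    "192.168.1.15:3000": Node("bb9f9dd5f290c00"),
    "192.168.1.16:3000": Node("bb9c14a8b565000"),
    "192.168.1.17:3000": Node("bb9c14a8b565000"),  # duplicate ID collapses in a set
}


def node_ids(nodes):
    # Python 2.7+ only:  {node.node_id for node in nodes.values()}
    # A generator expression passed to set() builds the same set and
    # also parses on Python 2.6.
    return set(node.node_id for node in nodes.values())


print(sorted(node_ids(nodes)))
```

Both forms produce the same set, so swapping one for the other should not change behaviour, only which interpreter versions can load the module.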
Great, I just saw an email from the build system chatting about 3.6.2.2, so I suspect someone will be updating the packages again soon with the new interim tools release.