Aerospike crash after re-joining a disabled node (3.6.1)

Hi, we have a 10-node cluster running in production. One of the 10 nodes had performance problems and was shut down. When it was added back to the cluster, we noticed that once migrations started, the ASD process on other nodes was terminated with the following error:

Oct 07 2015 19:33:17 GMT: WARNING (as): (signal.c::161) SIGSEGV received, aborting Aerospike Community Edition build 3.6.1 os el6-------

Before the crash, we see the following message:


Oct 07 2015 22:25:32 GMT: INFO (paxos): (paxos.c::2410) CLUSTER INTEGRITY FAULT. [Phase 1 of 2] To fix, issue this command across all nodes:dun:nodes=bb9f9dd5f290c00,bb9c14a8b565000,bb996068b565000,bb9786f8b565000,bb9607a3f290c00,bb917e213290c00,bb91193a1290c00,bb90c18b1290c00,bb9061f8b565000

This is really concerning. We would like to understand how we can avoid this situation.

At the very least, the problematic node should not force the Aerospike processes on other nodes to go down.

Help appreciated!

Thanks, Manish

@trivmanish,

We just released Aerospike Server Community Edition v3.6.2, which fixes a regression found in 3.6.0 and 3.6.1 where the server would crash during migrations while processing batch requests.

Will you please upgrade to 3.6.2 and let us know whether you are still experiencing this issue?

Hi, I just upgraded my test cluster to the latest 3.6.2 (enterprise), since we saw the same issue in our lab with 3.6.1. After the upgrade, the service starts, but when I try asadm it errors on each node like this:

[root@host-192-168-1-15 bin]# ./asadm
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "./asadm/__main__.py", line 15, in <module>
  File "./asadm/asadm.py", line 26, in <module>
  File "./asadm/lib/controller.py", line 15, in <module>
  File "./asadm/lib/controllerlib.py", line 16, in <module>
  File "./asadm/lib/cluster.py", line 17, in <module>
  File "./asadm/lib/node.py", line 463
    result = {node.node_id for node in  c.nodes.values()}
                             ^
SyntaxError: invalid syntax

All other tools seem to run fine (aql, asmonitor, ascli, asinfo). Thanks for your response. -wilson

A change that is incompatible with Python 2.6 was introduced in asadm. We have released a tools package that fixes this incompatibility.

See AER-3587 in the Aerospike Tools release notes.

I obviously used that same tools package, bundled in the enterprise tarball; it matches the 3.6.2.1 version. Just for fun, I downloaded JUST that package again (aerospike-tools-3.6.2.1-el6.tgz) and installed it after “removing” the previous one. The same errors come up. This is happening on RHEL6 (2.6.32-431.el6.x86_64).

Sorry for the confusion. Originally the tarball had the 3.6.2 package, so you could have been running that. In any case, it appears that the patch submitted to resolve this issue didn’t fully remedy the problem.

It seems our build system used the wrong asadm tag. This will be addressed in the next release; I am not sure whether there will be another interim release.

In the meantime, you may be able to download the 0.0.13 release of asadm from the aerospike/aerospike-admin releases page on GitHub.

Running make; make install, as described in the aerospike/aerospike-admin repository on GitHub, will install asadm to the same directories the packages normally install to.

Here is an interesting fact: I installed the tool on a different server running RHEL7 (3.10.0-229.el7.x86_64), and it works fine there (Python 2.7.5).

Thanks! I downloaded the source as suggested and installed it per the instructions on my RHEL6 server. asadm now works there.

Correct, there was a set comprehension added which isn’t supported by python 2.6.
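For anyone hitting the same SyntaxError in their own scripts, here is a minimal sketch of the incompatibility and a 2.6-friendly rewrite. The `Node` class, the `nodes` mapping, and the `unique_node_ids` helper are hypothetical stand-ins for illustration only; this is not the actual asadm code or the actual patch.

```python
class Node(object):
    """Hypothetical stand-in for a cluster node; only node_id matters here."""

    def __init__(self, node_id):
        self.node_id = node_id


# Hypothetical mapping of address -> node, with one duplicated node ID.
nodes = {
    "192.168.1.15:3000": Node("bb9f9dd5f290c00"),
    "192.168.1.16:3000": Node("bb9c14a8b565000"),
    "192.168.1.17:3000": Node("bb9f9dd5f290c00"),
}


def unique_node_ids(nodes):
    # Set comprehension syntax was added in Python 2.7, so this line is a
    # SyntaxError on Python 2.6:
    #     return {node.node_id for node in nodes.values()}
    #
    # Passing a generator expression to set() builds the same set and
    # parses on Python 2.6 as well as later versions.
    return set(node.node_id for node in nodes.values())


ids = unique_node_ids(nodes)
print(sorted(ids))
```

Because the failure is a syntax error, it is raised when the module is compiled at import time, which is why asadm died at startup rather than when that line was actually executed.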

Great, I just saw an email from the build system chatting about 3.6.2.2, so I suspect someone will be updating the packages again soon with the new interim tools release.