Aerospike Crash



Today one of our cluster members crashed. We're running two nodes, and one of them went offline.

We're running the latest version (Enterprise).

I have attached all the log files and configs we have. Please investigate.


We will investigate the logs and the stack traces. From a quick look, it seems the cluster was reporting cluster integrity faults about 4 minutes prior to the crash, and the node that crashed was running as a one-node cluster. Was there anything unexpected triggered on the other node?
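One way to confirm that timeline on your side is to filter the server log for integrity-fault lines. A minimal sketch - the sample lines below are illustrative, not actual output from this cluster; in practice you would pipe the real server log (path varies by install) into the grep:

```shell
#!/bin/sh
# Count integrity-fault lines in a log stream (case-insensitive).
# The two printf lines stand in for real server log output.
printf '%s\n' \
  "Mar 01 12:00:01 WARNING (paxos): CLUSTER INTEGRITY FAULT detected" \
  "Mar 01 12:00:02 INFO (info): routine tick" |
grep -ci "cluster integrity fault"
```

Running this prints `1`, the number of matching lines; correlating their timestamps with the crash time shows how long the cluster was split before the node went down.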

Starting with a recent release, we have made improvements in the Paxos algorithm implementation, and we recommend setting the configuration paxos-recovery-policy to 'auto-reset-master' if the cluster is sensitive to network blips.
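A minimal aerospike.conf sketch of that setting, assuming the 3.x service-context syntax - please check the configuration reference for your exact version:

```
service {
    # Automatically re-establish the Paxos master after transient
    # network splits (recommended for clusters sensitive to blips)
    paxos-recovery-policy auto-reset-master
}
```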

To investigate further, could you please share information on the features you are currently using - UDFs, scans, batch operations - and whether anything changed close to when the node crashed?

Are you currently deployed on AWS/similar or bare-metal?


We identified a fix for the SegV you observed; it missed the release you were using but has gone out in release 3.7.1. Please give it a spin and let us know if you run into issues.

[AER-4487], [AER-4690] - (Clustering/Migration) Race condition causing incorrect heartbeat fd saved and later not removable.


Can you indicate what type of access pattern you have? Purely puts/gets? Batches? Scans? Secondary indexes?


The other node unexpectedly started swapping memory (not caused by Aerospike).
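For anyone hitting the same symptom: sustained swapping can stall the server process long enough to trip heartbeat timeouts and split the cluster. A quick Linux-only check of swap usage (a sketch, assuming /proc/meminfo is available):

```shell
#!/bin/sh
# Report swap usage on a Linux node (values in /proc/meminfo are in kB).
swap_total=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
echo "swap used: $((swap_total - swap_free)) kB of ${swap_total} kB"
```

If used swap is nonzero and growing while the database is under load, the node is likely to stall; lowering vm.swappiness or adding memory headroom is the usual remedy.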

Bare metal.

~70% reads, 30% writes. Only a few batch gets and scans per hour. No secondary indexes.

Will do!