Aerospike hangs when an s-index is dropped


#1

Hi,

We have

  • a 4-node Aerospike cluster with a replication factor of 2
  • a set with 670 million records within the cluster
  • 7 bins (4 strings, 3 integers) in every record of the set
  • and a string s-index on one of the string bins (the bin’s data is always a 36-character string, one of 350 million unique values distributed uniformly across those 670 million records).

After dropping the s-index, the entire cluster stopped responding: clients couldn’t connect to the Aerospike nodes, and asmonitor kept printing timeout errors.

This lasted for about 10 minutes. During that period we could see the s-index RAM usage slowly going down until it reached 0.

After that the cluster went back to normal. The state of the cluster remained unchanged, and no migrations started.

Why is this happening?


#2

Which version of Aerospike are you running? Can you post your config and any critical messages from your logs from the time of the issue?

–Lucien


#3

Configuration is the same as I posted before in thread

The Aerospike version is now 3.3.12.

There are no anomalies in the logs.


#4

Our engineers have been able to identify the issue. We are doing a lot of frees while holding the global sindex lock, which can take a while to finish. During this time writes will time out, and info calls like sindex-list (which takes the global sindex lock) will time out too. We’re working on addressing this in the next release.
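
To illustrate the kind of contention described above, here is a minimal Python sketch. It is not Aerospike code and does not claim to show the actual fix: `IndexRegistry`, `drop_blocking`, `drop_deferred` and the `free_entry` callback are hypothetical names made up for this example. It only contrasts releasing entries while a global lock is held with detaching the index under the lock and releasing its entries afterwards.

```python
import threading


class IndexRegistry:
    """Toy stand-in for a set of indexes guarded by one global lock.

    Hypothetical illustration only; not how Aerospike is implemented.
    """

    def __init__(self):
        self.lock = threading.Lock()
        self.indexes = {}

    def create(self, name, entries):
        with self.lock:
            self.indexes[name] = list(entries)

    def list_names(self):
        # Analogous to an info call that needs the global lock.
        with self.lock:
            return sorted(self.indexes)

    def drop_blocking(self, name, free_entry):
        # Every entry is released while the lock is held, so any other
        # caller that needs the lock waits for the whole loop to finish.
        with self.lock:
            entries = self.indexes.pop(name)
            for entry in entries:
                free_entry(entry)

    def drop_deferred(self, name, free_entry):
        # Only the cheap detach happens under the lock; the long loop
        # runs after the lock has been released.
        with self.lock:
            entries = self.indexes.pop(name)
        for entry in entries:
            free_entry(entry)


if __name__ == "__main__":
    registry = IndexRegistry()

    registry.create("idx_a", range(3))
    seen = []
    registry.drop_blocking("idx_a", lambda e: seen.append(registry.lock.locked()))
    print("blocking drop, lock held during frees:", all(seen))

    registry.create("idx_b", range(3))
    seen = []
    registry.drop_deferred("idx_b", lambda e: seen.append(registry.lock.locked()))
    print("deferred drop, lock held during frees:", any(seen))
```

With hundreds of millions of entries, the loop in `drop_blocking` is the part that would keep other lock users waiting; the deferred variant keeps the critical section to a single dictionary operation.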

Thanks again for bringing this to our attention.

best, Lucien