Index disappeared without a trace


#1

I ran into an odd case where an index just disappeared. The server version is “Aerospike Community Edition build 3.3.21”.

My first guess was that I had carelessly dropped the index. Index creation and deletion requests are logged with lines like the following:

Nov 22 2014 08:01:42 GMT: INFO (info): (thr_info.c::6110) Index creation request received for NAMESPACE:INDEX via SMD
Nov 22 2014 09:09:54 GMT: INFO (info): (thr_info.c::6188) Index deletion request received for NAMESPACE:INDEX via SMD

I checked every archived log file one by one. There is only one index creation, 2 weeks ago, and no index deletion.

The system has only 1 index, and it was not used every day, so all I can tell is that it existed a few days ago and then suddenly disappeared. The most likely relevant cause is that I restarted the Aerospike instance 4 times yesterday to test a monitoring system. For the last restart, the log indicates that it didn’t scan every record to build the index.

Any idea why an index would be dropped without a single log line? Or is there any case where a server restart would cause an index to be dropped?
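For anyone who wants to repeat this check, here is a minimal Python sketch that counts index creation and deletion requests across log files. The regular expression is derived only from the two sample lines above; the helper names and the idea that archives may be gzip-compressed are assumptions, not something Aerospike ships.

```python
import gzip
import re
from collections import Counter

# Pattern taken from the two sample log lines quoted in this post.
REQUEST_RE = re.compile(
    r"Index (creation|deletion) request received for (\S+) via SMD"
)


def count_index_requests(lines):
    """Count requests per (action, "NAMESPACE:INDEX") pair."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def read_log_lines(paths):
    """Yield lines from plain or gzip-compressed log files.

    The paths are whatever your log rotation produces (an assumption here).
    """
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as handle:
            yield from handle
```

Calling `count_index_requests(read_log_lines(paths))` with the list of archived log files gives one counter entry per action and index, so a creation with no matching deletion shows up directly.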


#2

Hi mingfai,

There are two possible reasons for the index disappearing.

  1. The persisted sindex metadata file was deleted before the restart. The default location of this file is /opt/aerospike/smd/
  2. The SMD layer of Aerospike may have deleted the persisted metadata on startup. This happens when a node's metadata is not in sync with the majority consensus among the nodes.

To find the exact reason, can you send us the output of this command on your machine: sudo grep -iE "SMD|WARNING" /var/log/aerospike/aerospike.log


#3

Hi pratyy,

I just encountered the same issue. I’m using Community Edition build 3.3.21 via Vagrant for my development. I set up the indexes a few days ago via the aql console, and today when I started up the Vagrant box the indexes were no longer there. I had restarted the Vagrant box many times before and the indexes were always there, but this time they pulled a Houdini.

  • executing sudo grep -iE "SMD|WARNING" /var/log/aerospike/aerospike.log:

    Dec 19 2014 01:13:44 GMT: WARNING (as): (signal.c::50) SIGTERM received, shutting down
    Dec 19 2014 18:45:19 GMT: WARNING (as): (signal.c::50) SIGTERM received, shutting down
    Dec 19 2014 18:56:19 GMT: WARNING (as): (signal.c::50) SIGTERM received, shutting down

  • /opt/aerospike/smd: the sindex_module.smd file is there but has no entries; it contains only []
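A small Python sketch for checking the state of the persisted metadata file. It assumes only what is visible in this thread: that the file is JSON and that an index-less file contains `[]`. The function name and the four state labels are made up for illustration.

```python
import json
import os


def smd_file_state(path):
    """Classify an SMD file as missing, unparseable, empty, or has entries."""
    if not os.path.exists(path):
        return "missing"
    with open(path, "r", encoding="utf-8") as handle:
        text = handle.read()
    try:
        data = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return "unparseable"
    # An empty JSON container such as [] means no metadata is persisted.
    return "has entries" if data else "empty"
```

For the situation described here, `smd_file_state("/opt/aerospike/smd/sindex_module.smd")` would report `"empty"`, since the file holds only `[]`.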

Thanks in advance. Regards,


#4

Hi joe99,

Disappearance of indexes is very unlikely. To understand the situation better, can you answer the following questions:

  1. Did you add more nodes to the cluster after you restarted the node? That is, did the cluster size change between before and after the reboot?
  2. Did it happen again?
  3. If you still have the log from the time it disappeared, can you run this command on it as well: sudo grep "Invalid state at cleanup" /var/log/aerospike/aerospike.log

Thanks


#5

Hi pratyyy,

Apologies for the late reply, I didn’t have time to look at it until today. Answers to your questions:

  1. No nodes were added. The cluster size was the same before and after the reboot.
  2. I restarted today and the indexes are still there.
  3. I ran the command on the log file for the date in question and it came up empty.

Hope that helps.

Regards,


#6

Hi,

Thanks for the reply. This is very unusual. We are trying to reproduce this and will update you if we find anything.

Thanks


#7

Hi joe99,

We have tried to reproduce this issue, but are unable to.

  1. Are you still experiencing the issue?

  2. Is it possible that you accidentally deleted the smd files?

Regards,

Maud


#8

Hi Maud,

My previous setup in December was a single-node setup, using the file system instead of a whole device for storage. These are the warnings from its log:

/var/log/aerospike/aerospike.log:Dec 19 2014 20:29:29 GMT: WARNING (nsup): (thr_nsup.c::1084) {MY_NAMESPACE} can't evict - no records eligible
/var/log/aerospike/aerospike.log:Dec 19 2014 20:31:29 GMT: WARNING (nsup): (thr_nsup.c::1084) {MY_NAMESPACE} can't evict - no records eligible
/var/log/aerospike/aerospike.log:Dec 19 2014 20:33:12 GMT: WARNING (as): (signal.c::50) SIGTERM received, shutting down
/var/log/aerospike/aerospike.log:Dec 19 2014 20:33:13 GMT: WARNING (smd): (system_metadata.c::1453) failed to load System Metadata for module "sindex_module" from file "/opt/aerospike/smd/sindex_module.smd" with JSON error: '[' or '{' expected near end of file ; source: <stream> ; line: 1 ; column: 0 ; position: 0
/var/log/aerospike/aerospike.log:Dec 19 2014 20:33:13 GMT: WARNING (smd): (system_metadata.c::1515) failed to read persisted System Metadata for module "sindex_module"
/var/log/aerospike/aerospike.log:Dec 19 2014 20:33:13 GMT: WARNING (smd): (system_metadata.c::1721) failed to restore persisted System Metadata for module "sindex_module"
/var/log/aerospike/aerospike.log:Dec 19 2014 20:38:30 GMT: WARNING (nsup): (thr_nsup.c::283) {MY_NAMESPACE} cold-start can't evict - no records eligible
/var/log/aerospike/aerospike.log:Dec 19 2014 20:38:30 GMT: WARNING (nsup): (thr_nsup.c::410) {MY_NAMESPACE} could not evict any records
/var/log/aerospike/aerospike.log:Dec 19 2014 20:38:30 GMT: WARNING (drv_ssd): (drv_ssd.c::2859) device /ssd/aerospike/MY_NAMESPACE.db: record-add halting read
/var/log/aerospike/aerospike.log:Dec 19 2014 20:38:30 GMT: WARNING (drv_ssd): (drv_ssd.c::3341) disk restore: hit high water limit before disk entirely loaded.
/var/log/aerospike/aerospike.log:Dec 19 2014 20:40:33 GMT: WARNING (nsup): (thr_nsup.c::1084) {NAMESPACE} can't evict - no records eligible

I think there was a problem at that time, and that server was restarted quite often.

Since then I have migrated to proper production hardware with 3 nodes, upgraded to the latest version, 3.5.2, and rarely do a full restart of the servers. I haven’t experienced the problem again, but please don’t take that as confirmation that any bug has been fixed.


#9

Hi mingfai,

I appreciate you giving us information on your setup. Please do let us know if you experience any more issues, and we will do our best to assist you promptly.

Regards,

Maud


#10

There are 2 points I would like to mention regarding the warnings:

1- Can’t evict

/var/log/aerospike/aerospike.log:Dec 19 2014 20:38:30 GMT: WARNING (nsup): (thr_nsup.c::283) {MY_NAMESPACE} cold-start can't evict - no records eligible
/var/log/aerospike/aerospike.log:Dec 19 2014 20:40:33 GMT: WARNING (nsup): (thr_nsup.c::1084) {NAMESPACE} can't evict - no records eligible

Those warnings mean that your node breached the high-water-memory-pct or the high-water-disk-pct and was trying to evict data. The "cold-start can't evict" message suggests this was also happening at startup. You can read about evictions in the Aerospike documentation. It didn’t find any records eligible for eviction, likely because your records are set to never expire. That is fine, of course, but it means you have to plan your memory and disk storage carefully to make sure you do not run out.

This should not have any impact on the index disappearing, though.

2- failed to load System Metadata for module “sindex_module”

/var/log/aerospike/aerospike.log:Dec 19 2014 20:33:13 GMT: WARNING (smd): (system_metadata.c::1453) failed to load System Metadata for module "sindex_module" from file "/opt/aerospike/smd/sindex_module.smd" with JSON error: '[' or '{' expected near end of file ; source: <stream> ; line: 1 ; column: 0 ; position: 0
/var/log/aerospike/aerospike.log:Dec 19 2014 20:33:13 GMT: WARNING (smd): (system_metadata.c::1515) failed to read persisted System Metadata for module "sindex_module"
/var/log/aerospike/aerospike.log:Dec 19 2014 20:33:13 GMT: WARNING (smd): (system_metadata.c::1721) failed to restore persisted System Metadata for module "sindex_module"

This likely explains the disappearance of the index. It looks like the sindex_module.smd file was corrupted and couldn’t be loaded (this could be a bug, of course). We will follow up to check the cases in which this could occur.
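To spot this condition in other logs, the following Python sketch collects SMD load failures per module. The pattern is inferred from the three warning lines quoted in this post only; the function name is hypothetical, and other server versions may word these messages differently.

```python
import re
from collections import defaultdict

# Inferred from the quoted "(smd)" warnings; not an official log format.
SMD_FAIL_RE = re.compile(
    r'(\w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}) GMT: WARNING \(smd\): .*?'
    r'failed to (load|read|restore)\b.*? for module "([^"]+)"'
)


def smd_load_failures(lines):
    """Map module name -> list of (timestamp, failed stage) tuples."""
    failures = defaultdict(list)
    for line in lines:
        match = SMD_FAIL_RE.search(line)
        if match:
            timestamp, stage, module = match.groups()
            failures[module].append((timestamp, stage))
    return dict(failures)
```

A module that shows all three stages (load, read, restore) failing at the same timestamp matches the pattern seen here, where the persisted index definitions could not be restored after a restart.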

Thanks for the details.