Problems with Secondary Indexes and Garbage Collection

There is a memory usage bug that affects secondary indexes in namespaces that use a storage device.

If, between garbage collection passes, you delete one record and immediately insert a different record, the secondary index digest for the first record remains in memory and is not cleaned by garbage collection. Digests that ‘evade’ garbage collection in this way are cleaned up only when the secondary indexes are rebuilt, either by rebuilding them manually or by restarting the node.

For example, we have a bin for Country, and we have defined a secondary index for Country:

  PI   Country   Name  
 ---- --------- ------ 
   1   Sweden    AAA   
   2   Norway    BBB   
   3   India     CCC   
   4   USA       DDD   
   5   Sweden    EEE   
   6   Norway    FFF   
   7   USA       GGG   
   8   India     HHH   
   9   India     III   
  10   Sweden    JJJ   
  11   Norway    LLL   
  12   India     MMM

The secondary index consists of lists of digests, D(n), each of which points to record n in the primary index (PI):

  Index:           Digests:          
 --------- ------------------------- 
  Sweden:   D(1), D(5), D(10)        
  Norway:   D(2), D(6), D(11)        
  USA:      D(4), D(7)               
  India:    D(3), D(8), D(9), D(12)
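
As a rough illustration of this structure, the sketch below builds the same mapping from the sample records. It is a toy model only: `digest` and `build_sindex` are hypothetical helper names, and the `D(n)` strings simply stand in for real record digests, which are not shown in this article.

```python
from collections import defaultdict

# Sample records from the table above: PI -> (Country, Name).
records = {
    1: ("Sweden", "AAA"), 2: ("Norway", "BBB"), 3: ("India", "CCC"),
    4: ("USA", "DDD"), 5: ("Sweden", "EEE"), 6: ("Norway", "FFF"),
    7: ("USA", "GGG"), 8: ("India", "HHH"), 9: ("India", "III"),
    10: ("Sweden", "JJJ"), 11: ("Norway", "LLL"), 12: ("India", "MMM"),
}


def digest(pi):
    """Hypothetical stand-in for a record digest, written D(n) as in the tables."""
    return f"D({pi})"


def build_sindex(records):
    """Group digests by the value of the indexed bin (Country)."""
    sindex = defaultdict(list)
    for pi in sorted(records):
        country, _name = records[pi]
        sindex[country].append(digest(pi))
    return dict(sindex)


sindex = build_sindex(records)
for country, digests in sindex.items():
    print(f"{country}: {', '.join(digests)}")
```

Each indexed value maps to a list of digests, and each digest is resolved through the primary index when a query runs.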

The following steps illustrate the bug:

  1. Garbage collection cleans the secondary indexes.
  2. You insert the record with PI 4, with the value ‘USA’ in the Country bin. The digest D(4) is listed under ‘USA’ in the secondary index for Country.
  3. You delete the record with PI 4, which removes the value ‘USA’ from the Country bin.
  4. You immediately insert the record with PI 4 again, this time with the value ‘India’ in the Country bin. The digest D(4) is added under ‘India’ in the secondary index for Country.

The new value for PI 4 is ‘India’:

  PI   Country   Name  
 ---- --------- ------ 
   1   Sweden    AAA   
   2   Norway    BBB   
   3   India     CCC   
   4   India     DDD   
   5   Sweden    EEE   
   6   Norway    FFF   
   7   USA       GGG   
   8   India     HHH   
   9   India     III   
  10   Sweden    JJJ   
  11   Norway    LLL   
  12   India     MMM

The record on disk is properly deleted and recreated, but the secondary index for Country is not:

  Index:           Digests:                     
 --------- ------------------------------- 
  Sweden:   D(1), D(5), D(10)              
  Norway:   D(2), D(6), D(11)              
  USA:      D(4), D(7)                     
  India:    D(3), D(8), D(9), D(12), D(4)

  5. Garbage collection cleans the secondary indexes again, but it leaves the stale digest D(4) under ‘USA’. That digest stays in memory until the secondary index is rebuilt.

When a query reaches the secondary index for Country on this node, it reads the digests from the secondary index, looks up each digest in the primary index, and returns the data. When the query looks up D(4) in the primary index, it gets the current value, ‘India’, so the query still returns correct results.

The stale digest under ‘USA’ still points to PI 4, however, and garbage collection never removes it. It occupies memory until the secondary index is rebuilt.
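
The sequence can be reproduced in a small toy model. This is a minimal sketch under stated assumptions, not the server's implementation: the function names are hypothetical, only three of the sample records are used, and the garbage collection rule (drop a secondary index entry only when its digest is missing from the primary index) is assumed for illustration because it produces the behavior described above.

```python
# Toy model only: names and the GC rule are assumptions, not server internals.
primary = {"D(3)": "India", "D(4)": "USA", "D(7)": "USA"}  # digest -> Country
sindex = {"India": ["D(3)"], "USA": ["D(4)", "D(7)"]}      # Country -> digests


def delete(digest):
    # Removes the record; its secondary index entry is left for GC to clean.
    primary.pop(digest)


def insert(digest, country):
    primary[digest] = country
    entries = sindex.setdefault(country, [])
    if digest not in entries:
        entries.append(digest)


def garbage_collect():
    # Assumed rule: drop only entries whose digest is gone from the primary index.
    for country in sindex:
        sindex[country] = [d for d in sindex[country] if d in primary]


def query(country):
    # Each digest is checked against the primary index before data is returned.
    return [d for d in sindex.get(country, []) if primary.get(d) == country]


def rebuild():
    # Rebuilding from the primary index discards every stale entry.
    sindex.clear()
    for digest, country in primary.items():
        sindex.setdefault(country, []).append(digest)


garbage_collect()        # step 1
delete("D(4)")           # step 3
insert("D(4)", "India")  # step 4, before the next GC pass
garbage_collect()        # step 5

stale_after_gc = list(sindex["USA"])  # still contains D(4)
usa_results = query("USA")            # D(4) is filtered out at query time
india_results = query("India")

rebuild()
usa_after_rebuild = list(sindex["USA"])  # stale D(4) is gone
```

In this model the stale entry survives garbage collection because D(4) exists in the primary index again by the time the pass runs, while query results stay correct because every digest is validated against the primary index. Only the rebuild frees the stale entry.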

Reference: AER-1126