Query by range does not return all data

garbage
index
secondary

#1

Hi all,

We have a problem with the requests to the numeric index by range. We get a data that does not correspond to the expected result.

The dump of data you can download from http://dropmefiles.com/cVMYK. Set has one index:

CREATE INDEX rqdtsi ON test.Test (rqdts) NUMERIC

We used 3.3.19 version of server and 3.0.6 C# / .NET client.

We are trying to quering on index by Filster.Range by range from 1411372200 to 1411372799 and get 13341 records instead of 13717.

var statement = new Statement();
statement.SetNamespace("test");
statement.SetSetName("Test");
statement.SetIndexName("rqdtsi");
statement.SetBinNames("rqdts");
statement.SetFilters(Filter.Range("rqdts", 1411372200, 1411372799)));

var digests = new List<byte[]>();

RecordSet rs = _client.Query(null, statement);
while (rs.Next())
{
    if (rs.Record != null)
    {
        digests.Add(rs.Key.digest);
    }
}
rs.Close();

If make a request by Filter.Equal in loop from 1411372200 to 1411372799 or to scan recods, we get the correct data.

What could be the problem?


#2

Thanks for reporting this bug on the range filter. We were able to reproduce the issue with the data you provided. Our engineers are working on a fix.

best, Lucien


#3

@lucien, is this problem now fixed? We are observing a similar (?) issue where we are getting duplicate records:

aql> select key from test.context where flag between 0 and 1
+-------------------+
| key               |
+-------------------+
| "Device000000033" |
| "Device000000030" |
| "Device000000031" |
| "Device000000031" |
| "Device000000030" |
| "Device000000034" |
| "Device000000033" |
+-------------------+

This doesn’t happen if we use asbackup to backup the database and do a clean import on e.g. a different machine with a clean install (and a new CREATE INDEX call). This leads us to believe that somehow the index got corrupted, but have no errors or indications otherwise. The only time this issue manifests itself is if we use BETWEEN.


#4

Hi,

The bug you have previously reported have been fixed and you can find it in the latest build. Thanks again for finding this.

To solve the current problem, we will like you to share some details with us ?

  1. Size of the cluster.
  2. Were the migrations going on when you ran the secondary index query ?
  3. Can you share the output of following commands to us -
  • aql> show indexes
  • aql> stat index test sindex name
  • aql> stat namespace test
  • sudo grep -i warn /var/log/aerospike/aerospike.log

#5

Hi Pratyyy,

this is on a cluster of 1, so no migrations are ongoing. Regarding your questions:

  1. show indexes (trimmed to remove sensitive information)
aql> show indexes
+----------------+-----------------+---------------------------------------------+----------+-------+-------------------------------------------------+------------+--------------+
| ns             | bins            | set                                         | num_bins | state | indexname                                       | sync_state | type         |
+----------------+-----------------+---------------------------------------------+----------+-------+-------------------------------------------------+------------+--------------+
| "user_profile" | "last_activity" | "west"                                      | 1        | "RW"  | "ix3"                                           | "synced"   | "INT SIGNED" |
| "user_profile" | "location"      | "west"                                      | 1        | "RW"  | "ix2"                                           | "synced"   | "TEXT"       |
| "test"         | "in_use"        | "test_context|Test device"                  | 1        | "RW"  | "test_context|Test device_idx"                  | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "AerospikeIntegrationTestPool67DRjQGl"      | 1        | "RW"  | "AerospikeIntegrationTestPool67DRjQGl_idx"      | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "AerospikeIntegrationTestPoolS069L8BZ"      | 1        | "RW"  | "AerospikeIntegrationTestPoolS069L8BZ_idx"      | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "AerospikeIntegrationTestPoolkSV9NUwS"      | 1        | "RW"  | "AerospikeIntegrationTestPoolkSV9NUwS_idx"      | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "AerospikeIntegrationTestPoolvpBMxFtf"      | 1        | "RW"  | "AerospikeIntegrationTestPoolvpBMxFtf_idx"      | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "test_contextTest device"                   | 1        | "RW"  | "test_contextTest device_idx"                   | "synced"   | "INT SIGNED" |
| "test"         | "owner_id"      | "key_meta"                                  | 1        | "RW"  | "owner_id_idx"                                  | "synced"   | "TEXT"       |
| "test"         | "in_use"        | "test_contextStandardDevelopmentHTTPClient" | 1        | "RW"  | "test_contextStandardDevelopmentHTTPClient_idx" | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "test_contextUnknownDeviceClass"            | 1        | "RW"  | "test_contextUnknownDeviceClass_idx"            | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "AerospikeIntegrationTestPoolj7uQExaB"      | 1        | "RW"  | "AerospikeIntegrationTestPoolj7uQExaB_idx"      | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "AerospikeIntegrationTestPoolDdwOyDaH"      | 1        | "RW"  | "AerospikeIntegrationTestPoolDdwOyDaH_idx"      | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "AerospikeIntegrationTestPoolYn2taJyl"      | 1        | "RW"  | "AerospikeIntegrationTestPoolYn2taJyl_idx"      | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "test_pool1"                                | 1        | "RW"  | "test_pool1_idx"                                | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "AerospikeIntegrationTestPoolvGxjrrHR"      | 1        | "RW"  | "AerospikeIntegrationTestPoolvGxjrrHR_idx"      | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "test-pool"                                 | 1        | "RW"  | "test-pool_idx"                                 | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "yyy"                                       | 1        | "RW"  | "yyy_idx"                                       | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "AerospikeIntegrationTestPoolxuoLh0e1"      | 1        | "RW"  | "AerospikeIntegrationTestPoolxuoLh0e1_idx"      | "synced"   | "INT SIGNED" |
| "test"         | "in_use"        | "xxx"                                       | 1        | "RW"  | "xxx_idx"                                       | "synced"   | "INT SIGNED" |
+----------------+-----------------+---------------------------------------------+----------+-------+-------------------------------------------------+------------+--------------+
20 rows in set (0.001 secs)
OK	
  1. Index stats
aql> stat index test test_contextStandardDevelopmentHTTPClient_idx
+--------------------------------+---------+
| name                           | value   |
+--------------------------------+---------+
| "keys"                         | 2       |
| "objects"                      | 7       |
| "ibtr_memory_used"             | 18432   |
| "nbtr_memory_used"             | 342     |
| "si_accounted_memory"          | 18774   |
| "load_pct"                     | 100     |
| "loadtime"                     | 0       |
| "stat_write_reqs"              | 53      |
| "stat_write_success"           | 53      |
| "stat_write_errs"              | 0       |
| "stat_delete_reqs"             | 14      |
| "stat_delete_success"          | 14      |
| "stat_delete_errs"             | 0       |
| "stat_gc_recs"                 | 1       |
| "stat_gc_time"                 | 0       |
| "query_reqs"                   | 3       |
| "query_avg_rec_count"          | 7       |
| "query_avg_record_size"        | 222     |
| "query_agg"                    | 0       |
| "query_agg_avg_rec_count"      | 0       |
| "query_agg_avg_record_size"    | 0       |
| "query_lookups"                | 3       |
| "query_lookup_avg_rec_count"   | 7       |
| "query_lookup_avg_record_size" | 222     |
| "gc-period"                    | 1000    |
| "gc-max-units"                 | 1000    |
| "data-max-memory"              | -1      |
| "tracing"                      | 0       |
| "histogram"                    | "false" |
| "ignore-not-sync"              | "true"  |
+--------------------------------+---------+
30 rows in set (0.002 secs)
OK
  1. Namespace stats
+-----------------------------------+-----------------------------------+
| name                              | value                             |
+-----------------------------------+-----------------------------------+
| "type"                            | "device"                          |
| "objects"                         | 9499                              |
| "sub-objects"                     | 216                               |
| "master-objects"                  | 9499                              |
| "master-sub-objects"              | 216                               |
| "prole-objects"                   | 0                                 |
| "prole-sub-objects"               | 0                                 |
| "expired-objects"                 | 0                                 |
| "evicted-objects"                 | 0                                 |
| "set-deleted-objects"             | 0                                 |
| "set-evicted-objects"             | 0                                 |
| "nsup-cycle-duration"             | 0                                 |
| "nsup-cycle-sleep-pct"            | 0                                 |
| "used-bytes-memory"               | 35336237                          |
| "data-used-bytes-memory"          | 34085545                          |
| "index-used-bytes-memory"         | 607936                            |
| "sindex-used-bytes-memory"        | 642756                            |
| "free-pct-memory"                 | 99                                |
| "max-void-time"                   | 0                                 |
| "non-expirable-objects"           | 9499                              |
| "current-time"                    | 164880999                         |
| "stop-writes"                     | "false"                           |
| "hwm-breached"                    | "false"                           |
| "available-bin-names"             | 32746                             |
| "ldt-reads"                       | 0                                 |
| "ldt-read-success"                | 0                                 |
| "ldt-deletes"                     | 0                                 |
| "ldt-delete-success"              | 0                                 |
| "ldt-writes"                      | 0                                 |
| "ldt-write-success"               | 0                                 |
| "ldt-updates"                     | 0                                 |
| "ldt-gc-io"                       | 4472510                           |
| "ldt-gc-cnt"                      | 13                                |
| "ldt-randomizer-retry"            | 0                                 |
| "ldt-errors"                      | 0                                 |
| "ldt-err-toprec-notfound"         | 0                                 |
| "ldt-err-item-notfound"           | 0                                 |
| "ldt-err-internal"                | 0                                 |
| "ldt-err-unique-key-violation"    | 0                                 |
| "ldt-err-insert-fail"             | 0                                 |
| "ldt-err-delete-fail"             | 0                                 |
| "ldt-err-search-fail"             | 0                                 |
| "ldt-err-version-mismatch"        | 0                                 |
| "ldt-err-capacity-exceeded"       | 0                                 |
| "ldt-err-param"                   | 0                                 |
| "ldt-err-op-bintype-mismatch"     | 0                                 |
| "ldt-err-too-many-open-subrec"    | 0                                 |
| "ldt-err-subrec-not-found"        | 0                                 |
| "ldt-err-bin-does-not-exist"      | 0                                 |
| "ldt-err-bin-exits"               | 0                                 |
| "ldt-err-bin-damaged"             | 0                                 |
| "ldt-err-toprec-internal"         | 0                                 |
| "ldt-err-subrec-internal"         | 0                                 |
| "ldt-err-transform-internal"      | 0                                 |
| "ldt-err-unknown"                 | 0                                 |
| "used-bytes-disk"                 | 37015808                          |
| "free-pct-disk"                   | 99                                |
| "available_pct"                   | 99                                |
| "memory-size"                     | 4294967296                        |
| "high-water-disk-pct"             | 50                                |
| "high-water-memory-pct"           | 60                                |
| "evict-tenths-pct"                | 5                                 |
| "stop-writes-pct"                 | 90                                |
| "cold-start-evict-ttl"            | 4294967295                        |
| "repl-factor"                     | 1                                 |
| "default-ttl"                     | 0                                 |
| "max-ttl"                         | 0                                 |
| "conflict-resolution-policy"      | "generation"                      |
| "allow_versions"                  | "false"                           |
| "single-bin"                      | "false"                           |
| "ldt-enabled"                     | "true"                            |
| "enable-xdr"                      | "false"                           |
| "sets-enable-xdr"                 | "true"                            |
| "ns-forward-xdr-writes"           | "false"                           |
| "disallow-null-setname"           | "false"                           |
| "total-bytes-memory"              | 4294967296                        |
| "read-consistency-level-override" | "off"                             |
| "write-commit-level-override"     | "off"                             |
| "total-bytes-disk"                | 17179869184                       |
| "defrag-lwm-pct"                  | 50                                |
| "defrag-queue-min"                | 0                                 |
| "defrag-sleep"                    | 1000                              |
| "defrag-startup-minimum"          | 10                                |
| "flush-max-ms"                    | 1000                              |
| "fsync-max-sec"                   | 0                                 |
| "write-smoothing-period"          | 0                                 |
| "max-write-cache"                 | 67108864                          |
| "min-avail-pct"                   | 5                                 |
| "post-write-queue"                | 0                                 |
| "data-in-memory"                  | "true"                            |
| "file"                            | "/opt/aerospike/data/test.dat"    |
| "filesize"                        | 17179869184                       |
| "writethreads"                    | 1                                 |
| "writecache"                      | 67108864                          |
| "obj-size-hist-max"               | 100                               |
+-----------------------------------+-----------------------------------+
95 rows in set (0.002 secs)
OK
  1. Warnings in log file

There are no warnings in the log file.

Some additional information:

aql> show sets

+-----------+----------------+----------------------+-----------+---------------------------------------------+------------+---------------------+
| n_objects | set-enable-xdr | set-stop-write-count | ns_name   | set_name                                    | set-delete | set-evict-hwm-count |
+-----------+----------------+----------------------+-----------+---------------------------------------------+------------+---------------------+

| 5         | "use-default"  | 0                    | "test"    | "test_contextStandardDevelopmentHTTPClient" | "false"    | 0                   |
+-----------+----------------+----------------------+-----------+---------------------------------------------+------------+---------------------+

Note: the set has 5 objects, but the index is reported to have 7


#6

Hi bbavn,

Seems like sindex is corrupted indeed. We are trying to replicate this and would appreciate if you can share some more information with us.

  1. Do you get duplicate records in equality queries as well ?
  2. How are you inserting records into aerospike ? through udf or using simple client APIs like aerospike_put.
  3. Can you recall what kind of operations (write/updates) have you done on the set - test_contextStandardDevelopmentHTTPClient

Thanks


#7

Hi Pratyyy,

regarding your questions:

  1. No, only when using BETWEEN.

  2. We have inserted the keys through aerospike_put (by using the C client library)

  3. The operations we typically do (apart from the initial bulk insert of the records using aerospike_key_put) are:

  4. we run a query that selects on as_integer_equals(0) for the in_use field (w/ corresponding index). The callback for this query selects the string value in the key bin and returns it to the caller through the user data parameter and returns false to stop the query

  5. after the foreach-query returns, we do an aerospike_key_get using the returned key to select that record.

  6. next, we re-write that record with the in_use field set to (integer) 1 and with a AS_POLICY_GEN_EQ to ensure the record wasn’t modified in between

  7. if the latter operation fails, we retry the whole procedure (up to a certain defined maximum)

Additionally, we have an operation to empty a set (having an API call for this would be highly appreciated, by the way!) that works by iterating over all the records in a set and, in its callback, will re-write that record with a TTL of 1 (we had a tombstoning bug with this previously) and subsequently calls aerospike_key_remove. We have been using this code for quite a while and have verified that it indeed removes the records. It might be that this operation has been called once (or more) on the mentioned set.


#8

@pratyyy - note: dropping the index & recreating it resolves the issue. Question is of course how this issue can occur in the first place and what can be done to avoid it (preferably in the server). In the meantime, are there any means to validate correctness of an index, or let them be rebuilt automatically?


#9

Hi bbvan,

We have identified the issue. You have some un-cleanable garbage in secondary index.

This generally happens when a user deletes and re-insert a record in on-disk namespace within a short interval. And I can see that your namespace’s type is device.

So, can you confirm that when you say that you re-insert a record that means “delete + insert” operation ?

To make deletes efficient on on-disk namespace, we do not delete sindex entry synchronously. There is a separate background thread which clean such entries. But if you are inserting the record with updated value, before this gc-thread cleans the data, this garbage cannot be cleaned unless this record is deleted again.

You will not see such results in equality queries. But there is a possibility that you may see this records in range queries. We have identified this possibility now. And will get fixed in future releases.

To avoid such situation, one simple way is to not delete the record to re-insert it. You can directly re-insert the record with aerospike_put() API. It will automatically update the previous value. And there is no possibility of generation of garbage in secondary index as well.

Best way to come out of this situation is to recreate the index which will work well since you have very less amount of data in secondary indexes.

Please reply if you have some further queries.

Thanks


#10

Hi @pratyyy,

thanks for getting back. To clarify, we have two operations that we have identified that might be contributing to the issue we’re observing:

  • marking as in-use - we use the procedure as described under point 3 in my post above (Query by range does not return all data). Note that this operation never deletes any records; it only does a put
  • deleting records - since we ran in some tombstoning problems (records reappearing after having been removed), we “fixed” this by first issueing a write with a 1 second TTL, immediately followed by an aerospike_key_remove. This seems to have removed the records reappearing, but if I read you correctly, may have caused the index corruption?

Note: these two operations are distinct and do not happen in the same flow; they might happen at the same time (although the record deletion path is not very common in our system).

That said, I’m not quite sure how to move forward with your advice. Is there any trick we can pull using the C client library?


#11

Hi @pratyyy, any update on this?


#12

Hi bbvan,

Apologies for the late reply. As I said earlier we have found the issue regarding wrong results in secondary index query. It will be fixed in future release. Thanks again for bringing this to our notice.

Till then you can apply some tricks to avoid getting into such situation.

  • Do not keep the window between deleting and inserting a same primary key very small.

For example -

User deleted a record with key- ‘Car’’ bin - ‘Ford’

User inserted a record with key - ‘Car’ bin - ‘Toyota’

I understand if it is hard to do that. This is very application specific.

  • You can increase the gc speed of the secondary index.

asinfo -v “set-config:context=namespace;id=<namespace_name>;indexname=<index_name>;gc-period=0”

asinfo -v “set-config:context=namespace;id=<namespace_name>;indexname=<index_name>;gc-max-units=10000”

Do this for all sindexes in the namespace.

This will speed up the rate at which sindex-gc cleans stale entries.

Can you try this and see if it helps ?

Thanks


#13

Hi bbvan,

Have you tried this ? Did it solved your issue ?

An update regarding the bug - Unfortunately you will not be able to get the required fix in the next release. But if you are blocked on this, we can provide you a dev build with the fix.

Please let us know. Thanks


#14

Hi @pratyyy, thanks for circling back. We haven’t been able to try the suggestion yet as we’ve been focusing on a few other points so there is no urgency at this moment for a dev build (thanks for the offer, though!).

I’ll update this ticket as soon as we have more results.