Query optimization for multiple filter

index
secondary
query

#1

As Aerospike only support one secondary index per query, but can support filter with multiple predicate. I want to select best qualifier as the secondary index(statement’s Filter) and the rest as filter(statement’s PredExp). I see there are some index statistics in the sindex command: SINDEX, but there is no detail information for the query result:

asinfo -v 'sindex/phobos_sindex/str_100_idx'
keys=58743268;entries=58743268;ibtr_memory_used=2847777760;nbtr_memory_used=1821041308;si_accounted_memory=4668819068; \
load_pct=100;loadtime=0;stat_write_reqs=462802373;stat_write_success=462802373;stat_write_errs=0;stat_delete_reqs=306554517; \
stat_delete_success=306554517;stat_delete_errs=0;stat_gc_recs=12489653;stat_gc_time=36475718;query_reqs=21281009; \
query_avg_rec_count=0;query_avg_record_size=0;query_agg=0;query_agg_avg_rec_count=0;query_agg_avg_record_size=0; \
query_lookups=21281009;query_lookup_avg_rec_count=0;query_lookup_avg_record_size=0;gc-period=1000;gc-max-units=1000; \
data-max-memory=18446744073709551615;tracing=0;histogram=false;ignore-not-sync=true

Can anyone tell me how to find the detail? Many thanks.


To be more specific, if I want to simple choose the high selectivity secondary index (the one that finds the smallest number of rows), what index statistics can I use?


#2

Here is what I found: keys is the unique values in the bin you have indexed, entries is the total records that were (eligible and were) indexed. I ran this test file on AQL: ( I had 3 unique integer values for age, index was numeric, two records had age as string. 12 total records, only 10 were indexed.)

INSERT INTO test.testset (PK, name, age) VALUES ('key1', 'Jack', 26)
INSERT INTO test.testset (PK, name, age) VALUES ('key2', 'Jill', 20)
INSERT INTO test.testset (PK, name, age) VALUES ('key3', 'James', 20)
INSERT INTO test.testset (PK, name, age) VALUES ('key4', 'Jim', 26)
INSERT INTO test.testset (PK, name, age) VALUES ('key5', 'Julia', 26)
INSERT INTO test.testset (PK, name, age) VALUES ('key6', 'Sally', 20)
INSERT INTO test.testset (PK, name, age) VALUES ('key7', 'Sean', 20)
INSERT INTO test.testset (PK, name, age) VALUES ('key8', 'Sam', 12)
INSERT INTO test.testset (PK, name, age) VALUES ('key9', 'Susan', 12)
INSERT INTO test.testset (PK, name, age) VALUES ('key0', 'Sandra', 20)
INSERT INTO test.testset (PK, name, age) VALUES ('key10', 'Jack', 'ab')
INSERT INTO test.testset (PK, name, age) VALUES ('key11', 'Casey', 'cd')

create index idtest on test.testset (age) numeric

Then run asinfo:

asinfo -v 'sindex/test/idtest'
keys=3;entries=10;ibtr_memory_used=18432;nbtr_memory_used=313;si_accounted_memory=18745;load_pct=100;loadtime=2;write_success=10;write_error=0;delete_success=0;delete_error=0;stat_gc_recs=0;stat_gc_time=0;query_reqs=0;query_avg_rec_count=0;query_avg_record_size=0;query_agg=0;query_agg_avg_rec_count=0;query_agg_avg_record_size=0;query_lookups=0;query_lookup_avg_rec_count=0;query_lookup_avg_record_size=0;gc-period=1000;gc-max-units=1000;data-max-memory=ULONG_MAX;histogram=false;ignore-not-sync=true

#3

Thanks very much, @pgupta, it really help!

keys and entries can evaluate the cardinality of this index.

I also do some test, and found that the query_reqs=4;query_avg_rec_count=1 and query_lookups=4;query_lookup_avg_rec_count=1 information may be somehow useful, they represent the query number for this index and the average result count.

It seems that the index statistics information is not very straightforward for query optimization, but can be useful if use correctly, we will test this later.