Upgrade 4.9 to 6.3

Hello

We are trying to upgrade our cluster to a new version.

  • 5 nodes
  • SI indexes
  • cpp client

After upgrading one node, about 90-95% of SI queries time out, and about 5% of PK queries do. Also, the upgraded node has about 3x more client connections.

All migrations and SI builds are completed.

How can we fix this, and why does it happen?

With best regards, Oleg

P.S. We also tested 6.4, and SI queries don’t work there in a mixed-version cluster.

Are your SI queries returning very few or no records at all?

As mentioned in the 6.0 release notes: “If you use ‘equality’ secondary index queries that return a small number of records, you may see a latency increase in server releases 6.0-6.2.”

Specifically, in 6.3:

  • AER-6588 (QUERY) Added namespace context configuration item inline-short-queries to run short queries directly in service threads.

And then in 6.4:

  • AER-6631 (QUERY) Share partition reservations among concurrent short queries, to improve performance.

Make sure you mark the queries as short queries in the client query policy and turn on inline-short-queries in the namespace configuration on the server. Not sure why 6.4 would not work similarly to 6.3 in terms of multiversion cluster… I would recommend trying with a 6.4 cluster, though, where all nodes are upgraded, since queries have to hit all the nodes (unless specific partitions are targeted from the client application).

Hello.

My SI query returns 1 small record.

I changed all queries to short queries and added the “inline-short-queries” parameter. I also tried 6.4, but performance is still terrible.

I created a new set as a lookup table and changed the SI query to a PI query via that set. After that, performance is back to what it was on 4.9.
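For readers who want to picture the lookup-table workaround described above, here is a minimal sketch. It uses plain Python dictionaries as stand-ins for the two sets, not the Aerospike client API, and the names (`users`, `users_by_email`, `put_user`, `get_user_by_email`) are hypothetical, since the actual schema is not given in this thread. The idea is that the value that used to be looked up through the secondary index becomes the primary key of a second set, so the query turns into two primary-key reads.

```python
# In-memory stand-ins for two sets; a real application would use client put/get calls.
users = {}           # main set: primary key -> record
users_by_email = {}  # lookup set: formerly indexed value -> primary key of the main record


def put_user(user_id, email, name):
    """Write the main record and its lookup entry (two separate writes)."""
    users[user_id] = {"email": email, "name": name}
    users_by_email[email] = user_id


def get_user_by_email(email):
    """Replace the SI equality query with two primary-key reads."""
    user_id = users_by_email.get(email)  # read 1: lookup set
    if user_id is None:
        return None
    return users.get(user_id)            # read 2: main set
```

This only works as written when the looked-up value maps to a single record. The two writes are also not atomic, so an application has to handle a lookup entry that points to a missing or changed record.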

The core design of PI queries didn’t really change between 4.9 and 6.4, so I would expect their performance to be the same (if not better on 6.4 due to other general improvements).

But for SI queries, as of version 6.0, in order to make sure all partitions are queried during cluster changes, the Secondary Indexes are built on a ‘per partition’ basis… therefore, for smaller clusters and queries that return very few records, the time spent checking each partition becomes much more significant and makes up the biggest portion of the query compared to the time spent reading records.
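To make that reasoning concrete, here is a toy cost model. It is only a sketch: the per-partition and per-record costs are made-up numbers, not measurements, and the function names are hypothetical. The only fact it relies on is that an Aerospike namespace is split into 4096 partitions, so each node in a 5-node cluster owns roughly 819 master partitions.

```python
PARTITIONS = 4096  # fixed number of partitions per Aerospike namespace


def query_cost_us(nodes, records_returned, per_partition_us=1.0, per_record_us=10.0):
    """Return (partition overhead, record work) in microseconds for one node.

    The default costs are illustrative assumptions, not benchmark results.
    """
    partitions_per_node = PARTITIONS / nodes
    partition_overhead = partitions_per_node * per_partition_us
    record_work = records_returned * per_record_us
    return partition_overhead, record_work


def overhead_share(nodes, records_returned, **costs):
    """Fraction of the modeled query time spent on per-partition checks."""
    overhead, work = query_cost_us(nodes, records_returned, **costs)
    return overhead / (overhead + work)


if __name__ == "__main__":
    for records in (1, 100, 10_000):
        print(records, round(overhead_share(5, records), 3))
```

Under these assumed costs, a query returning one record spends nearly all of its modeled time on partition checks, while a query returning thousands of records spends almost none of it there.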

I am curious, though, did changing to short queries and setting inline-short-queries to true help? Did 6.4 show any improvement on top of that? Would you be able to quantify the performance for 6.0, 6.3, 6.3 with short queries and inlining, and finally 6.4 (with short queries and inlining)?

I believe there will be further improvements or a new secondary index model in a future version for such queries returning very few or no results.

“I am curious, though, did changing to short queries and setting inline-short-queries to true help?”

No, it did not resolve the problem. It improved performance a bit. In my case, I didn’t see any better performance on 6.4 than on 6.3.

I also set query.replica=AS_POLICY_REPLICA_ANY on the client, and that improved performance.

In perf top

  27.44%  asd                 [.] query_reduce
   6.75%  asd                 [.] gc_collect_cb
   5.29%  asd                 [.] as_index_sprig_traverse

in asadm

“I believe there will be further improvements or a new secondary index model in a future version for such queries returning very few or no results.”

That would be very good. Next year we plan to migrate to 7.0.

Hum… AS_POLICY_REPLICA_ANY shouldn’t make a difference for queries, since queries have to hit all the nodes in the cluster… It could indirectly help if you had hotkeys on your read/write transactions, as those would then be distributed differently (the first node is doing far fewer batch index transactions as well, and those are much slower… that same node has much slower read/write transactions too).

But there is something else that seems wrong in your setup, since only one node appears to be doing SI queries… Is this because you are running mixed versions and the older versions do not have the SI short queries stat, or is the sindex not defined on those other nodes?