Max sindex-builder-threads value

secondary
index

#1

Hi. I have a cluster that had some node restarts, and it has been rebalancing and reindexing for a few hours now. There’s only one server left reindexing, and until it completes I can’t perform queries.

I’ve already changed the sindex-builder-threads value to 16 (on an 8 vCPU server), but it hasn’t had any noticeable impact. What is the maximum safe value for that variable? And are there any other recommendations to speed up the process?
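For context, this knob lives in the service context, and on 3.x it can also be changed at runtime. A sketch of both forms follows (the value 16 is just the one from this thread; check the configuration reference for your version for the allowed range):

```
# aerospike.conf -- static form (service context)
service {
    sindex-builder-threads 16
}

# dynamic form at runtime, via asinfo:
#   asinfo -v 'set-config:context=service;sindex-builder-threads=16'
```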

Thanks


#2

If you allow migrations to repopulate the node instead of reading from disk, you can perform queries while migrations are going on, with best-effort results… For your particular problem, though, the bottleneck is most likely the disk. What does ‘iostat -xky 1 10’ look like, and what is your load average (‘sar -q’)?


#3

Clarification: I’m running Aerospike CE 3.14.1.4 on Docker, so I ran the commands on the host:

iostat -xky 1 10 (3 iterations):

Linux 4.4.0-121-generic (backend-aerospike-03) 	05/08/2018 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.20    0.00    9.60   76.52    0.00    6.69

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00    32.00    1.00   40.00     8.00   288.00    14.44     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00 6757.00    0.00 36796.00     0.00    10.89    21.53    3.23    3.23    0.00   0.15 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.65    0.00    5.24   78.90    0.13    9.08

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00   257.00 5771.00    4.00 31164.00  1044.00    11.15    21.17    3.57    3.57    0.00   0.17 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.89    0.00    2.94   79.51    0.00   11.65

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00 5296.00    1.00 28316.00     4.00    10.69    22.31    4.32    4.26  340.00   0.19 100.00
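To make the numbers above easier to interpret, here is a small sketch that pulls the relevant columns (r/s, await, %util) out of a device line and flags saturation. The column positions are assumed from the sysstat format shown above; the 95% threshold is an illustrative choice, not an official cutoff:

```python
def parse_device_line(line):
    """Return (device, r_per_s, await_ms, util_pct) from one `iostat -x` device row.

    Assumes the sysstat column layout shown above:
    Device rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    """
    fields = line.split()
    return fields[0], float(fields[3]), float(fields[9]), float(fields[13])


def is_saturated(util_pct, threshold=95.0):
    """A sustained %util near 100 on a non-parallel device suggests the disk is the bottleneck."""
    return util_pct >= threshold


# The first `sda` row from the output above:
sample = ("sda 0.00 0.00 6757.00 0.00 36796.00 0.00 "
          "10.89 21.53 3.23 3.23 0.00 0.15 100.00")
device, reads_per_s, avg_await_ms, util = parse_device_line(sample)
# util is 100.0 here, so is_saturated(util) is True: the disk is pinned at
# full utilization doing ~6,757 reads/s, which matches the diagnosis below.
```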

Unfortunately, I’m not able to run ‘sar -q’ because I just installed sysstat and it seems to need a system restart.

Edit: what do you mean by allowing migrations to happen instead of reading from disk?


#4

So you are showing a high %iowait and %util is 100% on disk sda… I’m assuming this is the disk Aerospike reads from? This is happening because you performed a cold start and either do not have your namespace set to cold-start-empty or did not blank the disk out. In some cases that’s desired… but if you have replicated data and just lost one node, or are doing rolling restarts, you could technically empty the drive out and let the other Aerospike nodes re-replicate the data to the node through migrations. That’s what I mean. In some cases I’ve actually seen replication happen faster than secondary index building, but we had to test that to decide what was best for us… we also don’t have a use case to read from disk, since we can’t tolerate zombie records and don’t use tombstones. – Long story short: for the problem you’re showing and the output of iostat, I think your disk I/O is maxed out. Not much can be done there.
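For reference, cold-start-empty is a storage-engine setting. A hedged sketch of the namespace stanza follows (names, paths, and sizes here are placeholders, not taken from this cluster):

```
namespace ns1 {
    replication-factor 2
    storage-engine device {
        file /opt/aerospike/data/ns1.dat   # placeholder path
        filesize 200G
        # Ignore existing disk contents on a cold start: the node rejoins empty
        # and repopulates via migrations from replicas instead of reading the disk.
        cold-start-empty true
    }
}
```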


#5

I originally had 3 nodes with 3 disks (one disk per node). Then the cluster went down, so I emptied out one disk in order to get the DB back up fast, knowing that, with a replication factor of 2, I’d have no data loss as long as I kept the other 2 disks. Then I added a 4th node for overall capacity, and now 1 of the 2 remaining original disks has already finished building its index.

So basically, I’m left with this 3rd disk’s indexes being rebuilt. But I guess that, to be able to zero it out with no data loss, I’d have to wait until migrations between the 4 nodes are done, right?

This is the info:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace        Node   Avail%   Evictions                 Master                Replica     Repl     Stop                Pending         Disk    Disk     HWM         Mem     Mem    HWM      Stop   
        .                                    (Objects,Tombstones)   (Objects,Tombstones)   Factor   Writes               Migrates         Used   Used%   Disk%        Used   Used%   Mem%   Writes%   
        .                                                                                                .        .           .                      .                      .        .        .                (tx,rx)            .       .       .           .       .      .         .   
ns1        node1:3000   53         0.000     (43.088 M, 0.000)      (18.100 M, 0.000)      2        false    (2.460 K, 1.669 K)      80.627 GB   41      50      10.302 GB   65      60     90        
ns1        node2:3000   56         0.000     (19.014 M, 0.000)      (32.490 M, 0.000)      2        false    (2.933 K, 1.569 K)      81.017 GB   41      50       7.580 GB   48      60     90        
ns1        node3:3000   90         0.000     (9.887 M, 0.000)       (8.800 M, 0.000)       2        false    (917.000,  2.103 K)     19.721 GB   10      50       2.543 GB   16      60     90        
ns1        node4:3000   84         0.000     (22.434 M, 0.000)      (9.125 M, 0.000)       2        false    (790.000,  1.759 K)     30.366 GB   16      50       3.909 GB   25      60     90        
ns1                                0.000     (94.424 M, 0.000)      (68.515 M, 0.000)                        (7.100 K, 7.100 K)     211.731 GB                   24.333 GB                            
Number of rows: 5

#6

Right. There’s nothing else that can be done in that case… unless you want to upgrade those disks :wink:


#7

Now that you mention it, I’m running on Docker Swarm, so Aerospike can’t access devices directly and I have to use the ‘file’ storage engine. Would performance be better with the ‘device’ storage engine, even though it’s backed by the same kind of disk?


#8

I’m not really sure how that would work in a Docker container. Try spinning up a test instance and running some benchmarks to find out?


#9

Yep, I guess that’s what I’m going to do. My question, though, was whether ‘device’ is more performant than ‘file’ on the same disk.

Thanks anyway, you’ve been helpful!


#10

In theory, yes. Giving Aerospike direct access to the drives bypasses the filesystem and the kernel’s page cache.
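As an illustration, the two storage configurations being compared look roughly like this (paths are placeholders; a namespace uses one form or the other):

```
# 'file' form: I/O goes through the filesystem and the kernel page cache
storage-engine device {
    file /opt/aerospike/data/ns1.dat
    filesize 200G
}

# 'device' (raw device) form: Aerospike does direct I/O to the block device,
# bypassing the filesystem and page cache
storage-engine device {
    device /dev/sdb
}
```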