Asbackup - Why is there a performance difference when backing up different namespaces


Problem description

When running asbackup against multiple namespaces, one namespace can back up much faster than the others. The slower backups may even use less network bandwidth.

Explanation

When running a backup, record size must be taken into account. The backup speeds up until it hits its maximum network or disk speed, and the number of records per second at that limit varies with record size. Note that when the network is the limit, the throughput in bytes per second will be similar across namespaces. When the disk is the limit, it will not.

Example:

Namespace 1: average 200 bytes / record

Namespace 2: average 4200 bytes / record

According to the asbackup output, namespace 1 backs up 2.1 million records per second.

According to the asbackup output, namespace 2 backs up 0.6 million records per second.

Taking record sizes into account, that works out to:

Namespace 1: 2.1 million records/s * 200 bytes = 420 MB/s

Namespace 2: 0.6 million records/s * 4200 bytes = 2520 MB/s

You will therefore notice on the network that namespace 2 moves considerably more data per second, even though namespace 1, with its smaller records, backs up more records per second.
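The arithmetic above can be checked with a short Python sketch. The function name and the choice of MB (1,000,000 bytes) as the unit are assumptions made for illustration; the record rates and sizes are the ones from the example.

```python
def throughput_mb_per_s(records_per_second: float, avg_record_bytes: float) -> float:
    """Return data throughput in MB/s, taking 1 MB as 1,000,000 bytes."""
    return records_per_second * avg_record_bytes / 1_000_000


# Figures taken from the example: (records per second, average bytes per record).
namespaces = {
    "Namespace 1": (2_100_000, 200),
    "Namespace 2": (600_000, 4200),
}

for name, (rate, size) in namespaces.items():
    print(f"{name}: {throughput_mb_per_s(rate, size):.0f} MB/s")
```

Namespace 1 has the higher record rate, but namespace 2 carries six times as much data per second.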

This is directly connected to the number of IOPS a disk can sustain at different read sizes. In our example case:

  • 2.1 million IOPS with reads sized 200 bytes each
  • 0.6 million IOPS with reads sized 4200 bytes each

So, larger records reduce the read IOPS, but their larger size more than makes up for the reduction. That is why larger records are backed up faster in terms of bytes per second. In terms of records per second, smaller records are backed up faster, but they are so much smaller that their data throughput is still lower than that of larger records.

On SSDs, each read is rounded up to the sector size of 4 KiB. As such, reading smaller records is less efficient: a 200-byte record still costs a full sector read, most of which is discarded.
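The sector effect can be illustrated with a rough Python sketch. It is a simplified model, not a description of how Aerospike lays out records on the device: it assumes every record starts on a sector boundary and that each read fetches whole 4 KiB sectors. The function name is hypothetical.

```python
import math

SECTOR_BYTES = 4096  # 4 KiB sector size, as stated in the text


def read_efficiency(record_bytes: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Fraction of the bytes read from the device that belong to the record.

    Simplifying assumption: the record starts on a sector boundary, so the
    device reads ceil(record_bytes / sector_bytes) whole sectors.
    """
    sectors = math.ceil(record_bytes / sector_bytes)
    return record_bytes / (sectors * sector_bytes)


for size in (200, 4200):
    print(f"{size}-byte record: {read_efficiency(size):.1%} of the data read is useful")
```

Under these assumptions a 200-byte record uses only a small fraction of the sector it is read from, while a 4200-byte record uses roughly half of the two sectors it spans.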

Notes

To confirm that the disks are in fact reaching 100% utilization, and are therefore the bottleneck, use iostat. If the bottleneck is not the disks on the Aerospike server side, it will be elsewhere, for example the network or the disk speed on the backup server.
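As a minimal sketch of that check, the Python function below scans captured iostat extended output for devices whose %util column is at or above a threshold. The function name and the 95% default threshold are assumptions for illustration. The parser looks columns up by header name, since the exact column set differs between sysstat versions.

```python
from typing import Dict


def saturated_devices(iostat_text: str, threshold: float = 95.0) -> Dict[str, float]:
    """Return {device: %util} for devices at or above threshold.

    Only the last device report in the text is kept, because the first
    report from iostat shows averages since boot.
    """
    result: Dict[str, float] = {}
    columns = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current device table
            continue
        if fields[0].startswith("Device"):
            columns = fields  # header row: remember the column names
            result = {}       # start over so only the latest report is kept
            continue
        if columns and "%util" in columns and len(fields) == len(columns):
            util = float(fields[columns.index("%util")].replace(",", "."))
            if util >= threshold:
                result[fields[0]] = util
    return result
```

To use it, capture the output of a command such as `iostat -x 1 2` on the Aerospike node while the backup is running and pass the text to `saturated_devices`. An empty result suggests the bottleneck is somewhere other than the server disks.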

There isn’t much that can be done to speed up the backup, other than sizing. If the disks are already at 100% utilization, they cannot be pushed any further. To speed things up, either add more nodes or size differently so that each node has more disks to spread the read load across. Alternatively, if you have enough RAM and the use case permits it, consider using data-in-memory.

Keywords

ASBACKUP SLOW DIFFERENCE SPEED NAMESPACE

Timestamp

10/01/2018