Elapsed time to build secondary index in cold restart vs elapsed time to write its data

I am experiencing very different times in writing data into a set (with a secondary index already defined) and the time it takes to build it after a cold restart. I am running aerospike on on a single M3xlarge EC2 instance, SSD storage with 2000 provisioned iops. Aerospike version is Community Edition 3.9.1.1.

Inserting 35 million records and indexing on a String bin which exists in all of the records takes about 45 minutes (this could maybe be improved, but not the focus of my question), all together 32GB of data. Right after that I perform a cold restart of the server. The primary index takes about 10 minutes to build, and the secondary index takes 5 hours. iowait is around 80 at that stage. Doubling the amount of provisioned iops has a drastic improvement on the the secondary index building (but not on the primary index building time).

My question is - why are the great differences between writing performance and secondary index build time upon restart, for same size of data?

my config:

    # Aerospike database configuration file.

service {
user root
group root
paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
pidfile /var/run/aerospike/asd.pid
service-threads 4
transaction-queues 4
transaction-threads-per-queue 4
proto-fd-max 15000
}

logging {
# Log file must be an absolute path.
file /var/log/aerospike/aerospike.log {
context any info
}
}

network {
service {
address any
port 3000
}

heartbeat {
mode multicast
address 239.1.99.222
port 9918

# To use unicast-mesh heartbeats, remove the 3 lines above, and see
# aerospike_mesh.conf for alternative.

interval 150
timeout 10
}

fabric {
port 3001
}

info {
port 3003
}
}

namespace prod {
        replication-factor 2
        memory-size 8G
        default-ttl 0d # 30 days, use 0 to never expire/evict.

        storage-engine  device {     # Configure the storage-engine to use persistence
        device /dev/xvda2    # raw device. Maximum size is 2 TiB
        # device /dev/<device>  # (optional) another raw device.
        write-block-size 128K   # adjust block size to make it efficient for SSDs.
        }
}

comparing a real-time insert/index isn’t quite the apple/apple comparison. The more appropriate comparison is insert all the data, then create the sindex.

How do you conclude that primary index building was only 10 minutes and secondary index building was 5 hours?

Hi @wchu, thanks for your reply. I am comparing these two because writing a new data sets hits the disk - to persist the data to the ssd device. On top of that there is event a minor penalty of updating the secondary index (in memory). Yes, if I am writing the data first only later adding the index (as you are suggesting) , I am experiencing the same high io and long time to read the data and creating the index. What I am asking why is there in Aerospike such a big difference in terms of elapsed time and io usage between writing the data (persisting to SSD) and reading it when creating the secondary index.

I conclude the elapsed times of primary index and secondary by looking at the log messages. They were explained here.

Tested building of SIs on coldstart. 100,000 records stored on HDD on file. Single node cluster, ver 3.10.1.2 Case 1: Index was defined, Aerospike coldstarted. (SI is built as data is read from HDD)

  1. Sindex-ticker done: ns=test si= si-mem-used=3283716 elapsed=404 ms 2 - repeat test ) Sindex-ticker done: ns=test si= si-mem-used=3292580 elapsed=570 ms 3 - repeat test) Sindex-ticker done: ns=test si= si-mem-used=3292556 elapsed=397 ms 4 - repeat) Sindex-ticker done: ns=test si= si-mem-used=3286100 elapsed=403 ms

Case 2: Index dropped, (no SI present). Aerospike coldstarted, then SI defined. (SI is built after data has been loaded from HDD)

  1. Sindex-ticker done: ns=test si=regIndex si-mem-used=3284756 elapsed=400 ms 2 - repeat) Sindex-ticker done: ns=test si=regIndex si-mem-used=3303292 elapsed=401 ms 3 - repeat) Sindex-ticker done: ns=test si=regIndex si-mem-used=3297812 elapsed=387 ms 4 - repeat) Sindex-ticker done: ns=test si=regIndex si-mem-used=3299092 elapsed=401 ms

I don’t see any appreciable difference in time in building the SI between the two scenarios.