Hi. I have an Aerospike cluster of 3 nodes, with 4 namespaces. One of them, let's call it namespace n1, contains the vast majority of our data, with replication-factor 2 and around 450M objects in a single set (s1). We don't have any secondary indexes on that set. I'm attaching an anonymized aerospike.conf example.
# Aerospike database configuration file.

service {
    user root
    group root
    paxos-single-replica-limit 2 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 8
    transaction-queues 8
    transaction-threads-per-queue 8
    proto-fd-max 15000
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address any
        access-address 10.0.0.210
        port 3000
    }

    heartbeat {
        mode mesh
        address 10.0.0.210
        port 3002 # Heartbeat port for this node.

        # List one or more other nodes, one ip-address & port per line:
        #asd1
        mesh-seed-address-port 10.0.0.167 3002

        # mode multicast
        # address 10.0.0.4
        # port 9918

        # To use unicast-mesh heartbeats, remove the 3 lines above, and see
        # aerospike_mesh.conf for alternative.

        interval 250
        timeout 10
    }

    fabric {
        port 3001
    }

    info {
        port 3003
    }
}

namespace n1 {
    replication-factor 2
    memory-size 29G
    default-ttl 60d # 60 days, use 0 to never expire/evict.
    high-water-memory-pct 70
    high-water-disk-pct 50
    stop-writes-pct 90

    storage-engine device {
        device /dev/xvdd
        device /dev/xvde
        scheduler-mode noop
        write-block-size 128K
    }
}

namespace n2 {
    replication-factor 2
    memory-size 29G
    default-ttl 365d # 365 days, use 0 to never expire/evict.
    conflict-resolution-policy last-update-time

    storage-engine device {
        file /data/n2.dat
        filesize 25G
        data-in-memory true # Store data in memory in addition to file.
    }
}

namespace n3 {
    replication-factor 2
    memory-size 29G
    default-ttl 7d # 7 days, use 0 to never expire/evict.

    storage-engine device {
        file /data/n3.dat
        filesize 25G
        data-in-memory true # Store data in memory in addition to file.
    }
}

namespace n4 {
    replication-factor 1
    memory-size 4G
    default-ttl 5d # 5 days, use 0 to never expire/evict.
    storage-engine memory
}
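For reference, the ~450M object count for s1 comes from the per-set statistics on the nodes that are still up, i.e. something along the lines of:

asinfo -v "sets/n1"

(I'm quoting the command from memory, so treat it as a rough pointer; I've left the actual output out of this post.)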
I am performing an upgrade from version 3.8 to 3.13, so I need to do a cold restart of each node. I've started with one node, and it is taking around 28 hours to start up again. A sample of aerospike.log is included below (only showing the progress lines):
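For reproducibility, these were pulled with a simple filter on the log file configured above, something like:

grep 'Sindex-ticker' /var/log/aerospike/aerospike.log | tail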
Jul 25 2017 16:50:27 GMT: INFO (sindex): (secondary_index.c:4812) Sindex-ticker: ns=n1 si=<all> obj-scanned=232000000 si-mem-used=18432 progress= 78% est-time=19683781 ms
Jul 25 2017 16:53:14 GMT: INFO (sindex): (secondary_index.c:4812) Sindex-ticker: ns=n1 si=<all> obj-scanned=232500000 si-mem-used=18432 progress= 79% est-time=19526861 ms
Jul 25 2017 16:56:01 GMT: INFO (sindex): (secondary_index.c:4812) Sindex-ticker: ns=n1 si=<all> obj-scanned=233000000 si-mem-used=18432 progress= 79% est-time=19369998 ms
Jul 25 2017 16:58:47 GMT: INFO (sindex): (secondary_index.c:4812) Sindex-ticker: ns=n1 si=<all> obj-scanned=233500000 si-mem-used=18432 progress= 79% est-time=19212949 ms
Jul 25 2017 17:01:34 GMT: INFO (sindex): (secondary_index.c:4812) Sindex-ticker: ns=n1 si=<all> obj-scanned=234000000 si-mem-used=18432 progress= 79% est-time=19056111 ms
Jul 25 2017 17:04:21 GMT: INFO (sindex): (secondary_index.c:4812) Sindex-ticker: ns=n1 si=<all> obj-scanned=234500000 si-mem-used=18432 progress= 79% est-time=18899047 ms
Jul 25 2017 17:07:08 GMT: INFO (sindex): (secondary_index.c:4812) Sindex-ticker: ns=n1 si=<all> obj-scanned=235000000 si-mem-used=18432 progress= 79% est-time=18742002 ms
Jul 25 2017 17:09:55 GMT: INFO (sindex): (secondary_index.c:4812) Sindex-ticker: ns=n1 si=<all> obj-scanned=235500000 si-mem-used=18432 progress= 80% est-time=18584979 ms
Jul 25 2017 17:12:42 GMT: INFO (sindex): (secondary_index.c:4812) Sindex-ticker: ns=n1 si=<all> obj-scanned=236000000 si-mem-used=18432 progress= 80% est-time=18427828 ms
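Doing some rough math on those lines: the scan advances about 500,000 objects every ~2m47s (~167 s), i.e. roughly 3,000 objects/s. At 80% progress with 236M objects scanned, the total for this node should be about 236M / 0.80 ≈ 295M objects, which matches what I'd expect (with replication-factor 2 on 3 nodes, each node holds roughly two thirds of the 450M master objects, ~300M). Scanning ~295M objects at ~3,000 objects/s works out to ~98,000 s ≈ 27 hours, which lines up with the ~28 hours I'm observing, and with the est-time of ~18.4M ms ≈ 5.1 hours remaining at that point in the log.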
The weird thing is that the big set (s1) does not have any secondary index on it, yet the log shows the secondary-index scan progressing very slowly. In the same namespace we have another set (s2) with one secondary index, but that set currently has no records, so I believe it shouldn't matter unless it forces an unnecessary scan of s1 at boot. Could that be the case? If not, what do you think could be happening? Is it normal for a cold restart to take this long with 450M records in a set and no secondary index? Thanks.
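P.S. For reference, this is how I've been double-checking that the only secondary index in n1 is the one on the (empty) s2 set and that s1 has none; quoting the commands from memory, output omitted:

asinfo -v "sindex"

and, from inside aql:

SHOW INDEXES;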