I have questions pertaining to profiling and tuning Aerospike.
Right now I have the following data model to emulate an inverted index for information retrieval:
Cluster ID (key, Integer) | Document IDs (map of String : Double)
1 | abc : 1.0, aec : 12.4, yufss : 14.09
2 | efd : 22.9, erf : 13.6, abc : 87.9
...
The total number of rows is fixed at 1048576.
Each row can potentially grow quite large, to around 10k entries.
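To make the layout concrete, here is a minimal in-memory sketch of the same structure in Python. It is not Aerospike client code: the names `inverted_index` and `lookup` are made up for illustration, and the values are just the sample rows above.

```python
# Hypothetical in-memory stand-in for the set: cluster ID -> {doc ID: score}.
inverted_index = {
    1: {"abc": 1.0, "aec": 12.4, "yufss": 14.09},
    2: {"efd": 22.9, "erf": 13.6, "abc": 87.9},
}

NUM_ROWS = 1048576  # fixed number of cluster IDs (2**20)


def lookup(index, cluster_ids):
    """Emulate a batch read: return the map stored under each requested key.

    Keys that are missing come back as an empty map.
    """
    return {cid: index.get(cid, {}) for cid in cluster_ids}


rows = lookup(inverted_index, [1, 2, 3])
print(rows[1]["abc"])  # score of document "abc" in cluster 1
print(len(rows[3]))    # cluster 3 is not in the sample data
```

In the real set each value is an Aerospike map bin rather than a Python dict, but the shape of a batch result is the same: one map of doc ID to score per requested cluster ID.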
We are currently suffering from poor database performance when performing batch reads against this set. Typically we batch read around 1000 rows and then do further processing on the doc IDs contained in those rows.
A batch read of this size takes up to 2000 ms to complete; we want to keep the latency consistently below 300 ms. In addition, the read latency has very large jitter, which is not acceptable for our use case.
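For a sense of scale, here is a rough worst-case estimate of how much data a single batch read could move. The numbers are assumptions, not measurements: every row is taken to be at the 10k-entry maximum, doc IDs are assumed to average 5 bytes, and any per-entry encoding overhead is ignored.

```python
# Back-of-envelope worst case for one batch read.
# ASSUMPTIONS (not measured): every row holds 10k entries, doc IDs
# average 5 bytes, scores are 8-byte doubles, encoding overhead ignored.
ROWS_PER_BATCH = 1000
ENTRIES_PER_ROW = 10_000
AVG_DOC_ID_BYTES = 5      # assumed average doc ID length
SCORE_BYTES = 8           # size of a double
TARGET_LATENCY_S = 0.300  # the 300 ms goal


def batch_payload_bytes(rows, entries, id_bytes, score_bytes):
    """Total bytes of map data returned by one batch read."""
    return rows * entries * (id_bytes + score_bytes)


def required_throughput_bytes_per_s(payload_bytes, latency_s):
    """Sustained throughput needed to move the payload within latency_s."""
    return payload_bytes / latency_s


payload = batch_payload_bytes(
    ROWS_PER_BATCH, ENTRIES_PER_ROW, AVG_DOC_ID_BYTES, SCORE_BYTES
)
throughput = required_throughput_bytes_per_s(payload, TARGET_LATENCY_S)

print(f"payload: {payload / 1e6:.0f} MB")
print(f"needed:  {throughput * 8 / 1e9:.2f} Gbit/s to finish in 300 ms")
```

Under these assumptions the worst case is on the order of a hundred megabytes per batch, so network transfer and client-side deserialization may matter as much as server-side read time. Our real rows are usually smaller than the maximum, so the typical figure should be lower.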
Configuration I have so far:
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 16
    transaction-queues 16
    transaction-threads-per-queue 3
    proto-fd-max 15000
    batch-threads 12
    batch-max-requests 10000
}
namespace iq {
    ldt-enabled true
    replication-factor 2
    memory-size 80G
    default-ttl 30d # 30 days, use 0 to never expire/evict.
    storage-engine device {
        device /dev/xvdb
        write-block-size 1024K
        data-in-memory true
    }
}
This is a 3-node cluster with replication factor 2; each node is a 16-CPU machine with 120 GB of RAM. The cluster is configured with thread affinity using taskset, and with IRQ configs to spread the NIC load.
Questions:
- Is it achievable with Aerospike to keep the batch read latency below 300 ms, and if so, what steps are needed?
- How does a batch read work internally in Aerospike, and what considerations affect its performance?
- How can I monitor a breakdown of batch read latency into, say, the time spent processing the read and the time spent transferring data to the client?