Hi, I am a bit confused about the way transaction-threads, transaction-queues, and service-threads are configured.
Please correct my understanding of the points below and suggest some reading.
In-memory namespace: since we might be adding extra lag if we have the transaction-queues and transaction-threads, it's a fair idea to remove the t-queues and t-threads. In this case it makes sense that the number of service-threads is recommended to be equal to the number of cores on the node.
But for a non-in-memory namespace: since we have the t-queues and t-threads, and it is the t-threads that interact with the drives, why is the t-thread count not preferred to be equal to the number of cores, with these grouped as the default 4 per t-queue? I.e.,
let's say we have a 32-core node and a non-in-memory namespace. Why is it not that we have:
32 t-threads: one per core
those 32 grouped as 4 per t-queue = 8 t-queues, and
8 service-threads
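If I translated that into the service stanza, I imagine it would look something like this (just my guess at the parameter names from the config reference):

```
service {
    service-threads 8                 # 8 service threads
    transaction-queues 8              # 8 transaction queues
    transaction-threads-per-queue 4   # 4 per queue -> 32 transaction threads, one per core
}
```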
After I understand this, I would also like to get into how the t-thread count depends on the workload.
In the case of in-memory namespaces, transaction queues aren't used by default. The service thread will do the entire transaction inline, from picking the data off the NIC, to accessing the DRAM storage, to returning the result. This behavior is controlled by the allow-inline-transactions config parameter.
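As a rough sketch (assuming a server version where these parameters sit in the service context, so treat the exact placement as an assumption), an in-memory namespace relying on inline transactions looks something like:

```
service {
    allow-inline-transactions true   # default; service threads process in-memory transactions inline
    service-threads 32               # commonly sized to the number of cores
}

namespace test {
    memory-size 16G
    storage-engine memory            # data lives in DRAM, so no device I/O in the transaction path
}
```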
When the namespace stores its data on SSD, the service threads place work on the transaction queues, and multiple cores get involved, with transaction threads processing work from those queues. The recommended number of threads per transaction queue is based on performance testing, with 3 working better for smaller objects, versus the default 4.
It really depends on the hardware you have, the generation of the chipset, and your workload, and it's something you'll have to benchmark if you want to modify it with confidence. For example, newer chipsets working with fast NVMe drives benefit from having more transaction threads per queue.
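For an SSD namespace, these are the knobs you'd adjust. As an illustrative sketch only (values for a 32-core box, not a recommendation; benchmark before settling on anything):

```
service {
    service-threads 32                # often sized to the number of cores
    transaction-queues 32             # commonly also sized to the number of cores for SSD namespaces
    transaction-threads-per-queue 4   # default 4; 3 has tested better for smaller objects
}
```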
I think I still have a few questions, but before that:
Could you please point me to some benchmarks you might have noted? I want to see how the combination of service threads, transaction queues, and transaction threads is tuned.
You can use the benchmarking tool that comes with the Java client to load a data set, then run the same benchmark against it (a specific read/write workload) time after time, while adjusting your config. That will allow you to compare the performance (average TPS) of the different configurations.
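For example (flag names from the Java client's benchmarks tool as I recall them; check the README of the version you're using), you could load the data once and then repeat the same read/update workload after each config change:

```
# Load 1,000,000 records with ~100-byte string values into namespace "test".
./run_benchmarks -h 127.0.0.1 -p 3000 -n test -k 1000000 -o S:100 -w I

# Run an 80% read / 20% update workload with 64 client threads; repeat after each
# server config change and compare the reported average TPS.
./run_benchmarks -h 127.0.0.1 -p 3000 -n test -k 1000000 -o S:100 -w RU,80 -z 64
```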
By the way, this is the type of thing covered in Aerospike’s admin training.