Why is there no AIO, just plain old read+write?

Hi, I’ve been looking at https://github.com/aerospike/aerospike-server/blob/180ed47a5fffc54b3e45faccb33c908bc189db2e/as/src/storage/drv_ssd.c and I wonder why you don’t use io_submit and io_getevents. They usually give a big performance boost (see HornetQ). Is it easier to manage memory buffers when reads/writes are synchronous? Have you evaluated AIO? Regards, Jakub
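P.S. For reference, a minimal sketch of the io_submit/io_getevents pattern I have in mind (placeholder file name, error handling mostly omitted; link with -laio):

```c
/* Minimal libaio sketch. Build with: gcc aio_read.c -laio */
#define _GNU_SOURCE          /* for O_DIRECT */
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    io_context_t ctx = 0;
    if (io_setup(128, &ctx) < 0) {         /* room for 128 in-flight I/Os */
        perror("io_setup");
        return 1;
    }

    int fd = open("/tmp/testfile", O_RDONLY | O_DIRECT);
    void *buf;
    posix_memalign(&buf, 4096, 4096);      /* O_DIRECT wants aligned buffers */

    struct iocb cb;
    struct iocb *cbs[1] = { &cb };
    io_prep_pread(&cb, fd, buf, 4096, 0);  /* 4 KiB read at offset 0 */
    io_submit(ctx, 1, cbs);                /* queue it; does not block */

    struct io_event ev;
    io_getevents(ctx, 1, 1, &ev, NULL);    /* reap the completion */
    printf("read returned %ld bytes\n", (long)ev.res);

    io_destroy(ctx);
    return 0;
}
```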

We did prototype AIO and saw a good boost. AIO is on our product roadmap.

However, it’s a bit hard to integrate into the current architecture, which is more synchronous in nature. It would need some significant changes, which means higher risk to the core; that is the only reason we are not going full force on this. Today we have multiple transaction threads (typically 64) which exploit the I/O parallelism, but AIO can achieve the same with far fewer threads. That would avoid a lot of thread context switches, which are one of the big performance killers.
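To make the contrast concrete, here is an illustrative sketch (not our actual code; names and constants are made up): one thread can keep in flight the same 64 reads that today occupy 64 blocked transaction threads.

```c
/* Illustrative contrast: with synchronous I/O, 64 transaction threads
 * each block in pread(); with AIO, one thread keeps the same 64 reads
 * in flight via a single io_submit(). */
#include <libaio.h>

#define IN_FLIGHT 64

void submit_batch(io_context_t ctx, int fd, void *bufs[], long long offs[]) {
    struct iocb cbs[IN_FLIGHT];
    struct iocb *ptrs[IN_FLIGHT];
    for (int i = 0; i < IN_FLIGHT; i++) {
        io_prep_pread(&cbs[i], fd, bufs[i], 4096, offs[i]);
        ptrs[i] = &cbs[i];
    }
    io_submit(ctx, IN_FLIGHT, ptrs);   /* one syscall, 64 I/Os queued */
    /* the kernel copies the iocbs at submit time; completions are
     * reaped later with io_getevents() before buffers are reused */
}
```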

Thanks for your suggestions/comments on the core of our architecture. Please share more ideas. We are always open to learning and improving.

Jakub, I led quite a bit of the investigation into adding AIO to the read path. As sunil says, it’s a positive improvement, and we would love to discuss adding proper AIO to our system. The largest question was how to balance AIO with network input; we tried a number of coding styles and believe we have the best blend. If you are interested in helping us recode the read path based on AIO, we would enjoy talking to you. (The write path has so few I/Os that AIO is not as big a win there.)

Each thread must call read/write, and under heavy load the system as a whole must have enough capacity to fit all these requests into the I/O scheduler’s structures in a more parallel than serial manner: in the end, push as much as possible into the NCQ while balancing latency and throughput (SSDs have independent memory channels and perform reads and writes concurrently). After io_submit we go almost directly to the device. I think AIO should be a big win for both reads and writes.
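For example, the reaping side can trade latency against throughput directly (an illustrative sketch; constants are made up):

```c
/* Illustrative reap loop: min_nr = 1 returns as soon as anything
 * completes (latency), nr = IN_FLIGHT drains everything that is
 * ready in one call (throughput), bounded by a short deadline. */
#include <libaio.h>
#include <time.h>

#define IN_FLIGHT 64

int reap_completions(io_context_t ctx) {
    struct io_event events[IN_FLIGHT];
    struct timespec deadline = { .tv_sec = 0, .tv_nsec = 1000000 };  /* 1 ms */

    int n = io_getevents(ctx, 1, IN_FLIGHT, events, &deadline);
    for (int i = 0; i < n; i++) {
        /* events[i].data is the cookie set on the iocb;
         * events[i].res is the byte count or a negative errno */
    }
    return n;
}
```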

Can you go into detail? Aggregating operations from clients, submitting them as a whole, and returning to the clients with callbacks, all while meeting the given timeout/latency deadlines?

Jakub, Linux’s event model has no mechanism for an efficient combined event system. We investigated using eventfd, but the common io_completion → signal → epoll mechanism is terribly slow. We looked at the underlying code used by libev and nginx, but those mechanisms are slower than a single-threaded approach. Without a unified event model, you either allocate some threads for the network and some for storage and try to continually rebalance them, or you come up with a cleverer idea (which we have).
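For concreteness, the eventfd wiring looks roughly like this (an illustrative sketch, error handling omitted); this is the path that measured so poorly:

```c
/* Sketch of the io_completion → eventfd → epoll wiring; the
 * completion bumps the eventfd, which epoll then reports alongside
 * the network sockets. */
#include <libaio.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>

void register_aio_with_epoll(io_context_t ctx, int epfd, int fd,
                             void *buf, long long off) {
    int efd = eventfd(0, EFD_NONBLOCK);

    struct iocb cb;
    struct iocb *p = &cb;
    io_prep_pread(&cb, fd, buf, 4096, off);
    io_set_eventfd(&cb, efd);   /* completion will signal this eventfd */
    io_submit(ctx, 1, &p);

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);
    /* event loop: when efd fires, read() its counter, then call
     * io_getevents() to reap the finished I/Os */
}
```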

Our measurements do show that AIO would be an improvement, but the benefit isn’t as large as you might think. Other changes yield greater performance improvements, like Redis’ trick of pipelining (and similarly batched writes; AS only does batch reads right now) and how we write to the network. If you are interested in adding AIO support, please coordinate your pull request with our core engineering group; they may be able to give architectural guidance on how to add it.
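As a generic illustration of the network-write side (a sketch, not our actual code), batching amounts to coalescing queued responses into one syscall:

```c
/* Generic sketch of response batching: coalesce queued responses
 * into a single writev() instead of one write() per response.
 * Types and names are illustrative only. */
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_BATCH 32

struct response { void *data; size_t len; };

ssize_t flush_responses(int sock, struct response *rs, int n) {
    struct iovec iov[MAX_BATCH];
    if (n > MAX_BATCH)
        n = MAX_BATCH;
    for (int i = 0; i < n; i++) {
        iov[i].iov_base = rs[i].data;
        iov[i].iov_len  = rs[i].len;
    }
    return writev(sock, iov, n);   /* one syscall per batch */
}
```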

On a related note, when we started the code base we considered using a different OS such as BSD because of kevent, which would have solved this problem, but the popularity of Linux made it the only suitable choice.

Hi, I have added AIO for writing: https://drive.google.com/file/d/0B-f1Z0bEJlvQU3hKaVc0dGY4YkE/view?usp=sharing https://drive.google.com/file/d/0B-f1Z0bEJlvQVUZIV2tUTjRQd1E/view?usp=sharing My dev env is Fedora 21 inside VirtualBox. I see 10% better TPS. That doesn’t mean much yet, since I don’t have access to production servers and cannot do real-life tests.

Bug: the io_destroy call was missing; fixed here: https://drive.google.com/open?id=0B-f1Z0bEJlvQbUtWbmtSMGsyb1U&authuser=0
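For anyone hitting the same issue, the required pairing is simply this (a sketch of the pattern, not the actual patch):

```c
/* Every io_setup() needs a matching io_destroy(); leaked contexts
 * count against the system-wide fs.aio-max-nr limit. */
#include <libaio.h>

int with_aio_context(int (*work)(io_context_t)) {
    io_context_t ctx = 0;
    if (io_setup(128, &ctx) < 0)
        return -1;
    int rc = work(ctx);
    io_destroy(ctx);   /* the previously missing cleanup */
    return rc;
}
```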