Using the filesystem for better caching via VFS

TL;DR: We found the filesystem to perform really badly when the system is running low on memory.

Details… We actually experimented for the same reason - to see whether we could get a free read cache. We tested this in the past, comparing raw SSD against the ext3 and ext4 filesystems. While ext3 was a bit poorer than raw SSD, ext4 was better than raw SSD (a bit inconsistent, but better). But the fun ends there!

While the short-term benchmarks showed everything in a good light, there were some really bad nightmares when our customers used filesystems in production. Since then we have been recommending against the filesystem approach.

Without going into too much detail, the fundamental issue is that the OS/filesystem is not very good at handling dirty memory pages when it is already low on memory. It gets really desperate/aggressive about reclaiming memory (memory compaction), and some of that work happens under kernel locks. So, while the OS is busy doing those things, it blocks the process. Our Aerospike latency characteristics, which are very nice otherwise, took a real beating. We also observed an impact on network efficiency; we think this is because even network communication needs memory buffers per connection, and those allocations are slowed down by the same issue. There were a few instances where the Linux OOM killer ended up killing the Aerospike process, since it was one of the biggest consumers of memory. Obviously, it cannot kill the filesystem, so it killed Aerospike.
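If you want to check whether you are hitting the same thing, here is a minimal sketch (assuming a Linux host; the exact counter names in /proc/vmstat vary between kernel versions, so adjust the list for your system) that watches the compaction/reclaim stall counters while the node is under load:

```python
#!/usr/bin/env python3
"""Rough sketch: watch /proc/vmstat for signs of direct reclaim/compaction
stalls while a latency-sensitive process is running.

Assumption: the counter names below (compact_stall, compact_fail,
allocstall*) exist on this kernel; names differ across kernel versions.
"""
import time

WATCH = ("compact_stall", "compact_fail", "compact_success", "allocstall")

def read_vmstat():
    counters = {}
    with open("/proc/vmstat") as f:
        for line in f:
            name, value = line.split()
            # match exact names and prefixed variants like allocstall_normal
            if any(name == w or name.startswith(w + "_") for w in WATCH):
                counters[name] = int(value)
    return counters

def main(interval=10):
    prev = read_vmstat()
    while True:
        time.sleep(interval)
        cur = read_vmstat()
        deltas = {k: cur[k] - prev.get(k, 0) for k in cur}
        # steadily growing compact_stall/allocstall deltas mean the kernel is
        # doing synchronous compaction/reclaim, i.e. foreground stalls
        print(time.strftime("%H:%M:%S"),
              " ".join(f"{k}=+{v}" for k, v in sorted(deltas.items()) if v))
        prev = cur

if __name__ == "__main__":
    main()
```

If those stall counters climb at the same time your application latencies spike, you are most likely seeing the reclaim/compaction behaviour described above.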

We could clearly see some kernel memory compaction functions (like isolate_freepages_block) taking too much CPU time when we profiled a process in distress. See the discussions of this function going into high CPU consumption - LKML: "Jim Schutt": excessive CPU utilization by isolate_freepages? ; http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-November/044895.html. We also tried using /proc/sys/vm/drop_caches, running it almost every 5 minutes. It provided some cushion, but it did not really help.
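For reference, the periodic drop_caches workaround amounted to something like the sketch below (it needs root; the 5-minute interval is just what we ran, not a recommendation):

```python
#!/usr/bin/env python3
"""Rough sketch of a periodic page-cache drop (must run as root).

Writing "1" to /proc/sys/vm/drop_caches frees clean page cache; "3" also
drops dentries and inodes. Only *clean* pages are released, so we sync
first. As noted above, this gave some cushion but did not fix the stalls.
"""
import os
import time

INTERVAL_SECONDS = 300  # roughly every 5 minutes

def drop_caches(level="1"):
    os.sync()  # flush dirty pages so more of the cache is droppable
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write(level)

if __name__ == "__main__":
    while True:
        drop_caches()
        time.sleep(INTERVAL_SECONDS)
```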
