2015-03-04 08:26:52 UTC
Is there anything in the pipeline to add the ability to write the librbd
cache to SSD so that it can safely ignore sync requests? I saw a thread
from a few years back where Sage was discussing something similar, but I
can't find anything more recent on it.
I've been running lots of tests on our new cluster; buffered/parallel
performance is amazing (40K read / 10K write IOPS) and I'm very impressed.
Sync writes, however, are quite disappointing.
Running fio with a 128k block size and iodepth=1 normally gives me only
about 300 IOPS, or ~30MB/s. I'm seeing 2-3ms latency writing to SSD OSDs,
and from what I hear that's about normal, so I don't think I have a Ceph
config problem. For applications which do a lot of syncs, like ESXi over
iSCSI or SQL databases, this has a major performance impact.
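For reference, the test above can be reproduced with a fio job along these lines (a sketch only; the device path, job name, and runtime are placeholders, not from my actual setup):

```ini
; Hypothetical fio job reproducing the sync-write test described above.
[sync-write-test]
filename=/dev/rbd0   ; placeholder: path to the mapped RBD device
rw=write
bs=128k              ; 128k block size, as in the numbers quoted above
iodepth=1            ; queue depth 1, so each write waits for the previous
sync=1               ; O_SYNC writes: every IO must reach stable storage
direct=1             ; bypass the page cache
ioengine=libaio
runtime=60
time_based=1
```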
Traditional storage arrays work around this problem with a battery-backed
cache whose latency is 10-100 times lower than what you can currently
achieve with Ceph and an SSD. Whilst librbd does have a writeback cache,
from what I understand it will not cache syncs, and so in my use case it
effectively acts like a write-through cache.
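For context, the librbd cache I'm referring to is the in-RAM one configured via ceph.conf, along these lines (sizes are illustrative, not my actual values):

```ini
# Illustrative [client] settings for the librbd writeback cache.
# The cache lives in RAM, so flush/sync requests must still be honoured
# before they can be acknowledged -- which is exactly the limitation above.
[client]
rbd cache = true
rbd cache size = 33554432                  # 32MB cache (illustrative)
rbd cache max dirty = 25165824             # 24MB dirty limit before writeback
rbd cache writethrough until flush = true  # safe default for old clients
```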
To illustrate the difference a proper writeback cache can make, I put a 1GB
flashcache (512MB dirty threshold) in front of my RBD and tweaked the flush
parameters to flush dirty blocks at a large queue depth. The same fio test
(128k, iodepth=1) now runs at 120MB/s and is limited by the performance of
the SSD used by flashcache, since everything is stored as 4k blocks on the
SSD. In fact, because everything is stored as 4k blocks, pretty much all IO
sizes are accelerated to the max speed of the SSD. Looking at iostat I can
see all the IOs being coalesced into nice large 512KB IOs at a high queue
depth, which Ceph easily swallows.
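Roughly, the flashcache setup looked like the sketch below. Device names and the exact sysctl values are placeholders (the dev.flashcache.* sysctl component is derived from the SSD+disk device pair, per the flashcache docs), so treat this as an outline rather than my exact commands:

```shell
# Create a 1GB writeback ("back") cache on the SSD partition in front of
# the mapped RBD device, using 4k cache blocks.
flashcache_create -p back -s 1g -b 4k rbd_cache /dev/sdb1 /dev/rbd0

# Tune writeback so dirty blocks are flushed aggressively at a deep queue
# depth, keeping roughly half the cache (the 512MB threshold) dirty at most.
sysctl -w dev.flashcache.sdb1+rbd0.dirty_thresh_pct=50
sysctl -w dev.flashcache.sdb1+rbd0.max_clean_ios_set=128
sysctl -w dev.flashcache.sdb1+rbd0.max_clean_ios_total=512
```

The resulting device (/dev/mapper/rbd_cache in this sketch) is what fio and iostat were pointed at.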
If librbd could support writing its cache out to SSD, it would hopefully
achieve the same level of performance, and having it integrated would be
much cleaner than layering flashcache on top.