Discussion:
[ceph-users] High average apply latency Firefly
Klimenko, Roman
2018-12-04 10:20:05 UTC
Permalink
Hi everyone!

On the old prod cluster

- baremetal, 5 nodes (24 cpu, 256G RAM)

- ceph 0.80.9 filestore

- 105 osd, size 114TB (each osd 1.1TB, SAS Seagate ST1200MM0018), raw used 60%

- 15 journals (each journal 0.4TB, Toshiba PX04SMB040)

- net 20Gbps

- 5 pools, size 2, min_size 1



We recently discovered a fairly high average apply latency, around 20 ms.

Using ceph osd perf, I can see that the apply latency of some OSDs sometimes reaches 300-400 ms. How can I tune Ceph to reduce this latency?
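
For reference, here is roughly how I look at it (a minimal sketch; it assumes
the Firefly plain-text output of "ceph osd perf", i.e. the columns osd,
fs_commit_latency(ms), fs_apply_latency(ms)):

  # Sort OSDs by the apply-latency column (3rd field) and show the worst ones;
  # the column order is taken from Firefly's plain output and may differ elsewhere.
  ceph osd perf | sort -rnk3 | head -n 10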


ceph.conf:

https://pastebin.com/raw/MeF00YLG?
Janne Johansson
2018-12-04 10:24:20 UTC
Permalink
Post by Klimenko, Roman
Hi everyone!
On the old prod cluster
- baremetal, 5 nodes (24 cpu, 256G RAM)
- ceph 0.80.9 filestore
- 105 osd, size 114TB (each osd 1.1TB, SAS Seagate ST1200MM0018), raw used 60%
- 15 journals (each journal 0.4TB, Toshiba PX04SMB040)
- net 20Gbps
- 5 pools, size 2, min_size 1
We recently discovered a fairly high average apply latency, around 20 ms.
Using ceph osd perf, I can see that the apply latency of some OSDs sometimes reaches 300-400 ms. How can I tune Ceph to reduce this latency?
I would start by running "iostat" on all OSD hosts and checking whether one
or more drives show a very high utilization percentage.
Having one or a few drives that are much slower than the rest (in many
cases this shows up as those drives taking longer to finish IO, and hence
a higher %util than the other OSD drives) will hurt the speed of the
whole cluster.
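
As a concrete illustration (a sketch; it assumes the sysstat version of
iostat is installed on the OSD hosts):

  # Extended per-device statistics every 2 seconds; watch the %util column
  # for drives that sit near 100% while the others stay low.
  iostat -x 2

  # Or grab one timed sample per host to compare them side by side
  # (the output path is just an example):
  iostat -x 5 2 > /tmp/iostat-$(hostname).txt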
If you find one or a few drives that are extra slow, lower their CRUSH
weight so data moves off them to other, healthy drives.
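
For example (osd.42 and the target weight are placeholders; pick a value
somewhat below the weight shown by "ceph osd tree"):

  # Show the current CRUSH weights
  ceph osd tree

  # Lower the weight of the slow OSD so PGs migrate to healthier drives
  ceph osd crush reweight osd.42 0.8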
--
May the most significant bit of your life be positive.