Discussion:
[ceph-users] High average apply latency Firefly
Klimenko, Roman
2018-12-04 10:20:05 UTC
Permalink
Hi everyone!

On the old prod cluster

- baremetal, 5 nodes (24 cpu, 256G RAM)

- ceph 0.80.9 filestore

- 105 osd, size 114TB (each osd 1.1TB, SAS Seagate ST1200MM0018), raw used 60%

- 15 journals (each journal 0.4TB, Toshiba PX04SMB040)

- net 20Gbps

- 5 pools, size 2, min_size 1



We recently discovered a fairly high average apply latency, around 20 ms.

Using ceph osd perf, I can see that the apply latency of some OSDs sometimes reaches 300-400 ms. How can I tune Ceph to reduce this latency?
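
For reference, here is roughly how I look at it (a minimal sketch; it assumes
the Firefly plain-text output of "ceph osd perf", i.e. the columns osd,
fs_commit_latency(ms), fs_apply_latency(ms)):

  # Sort OSDs by the apply-latency column (3rd field) and show the worst ones;
  # the column order is taken from Firefly's plain output and may differ elsewhere.
  ceph osd perf | sort -rnk3 | head -n 10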


ceph.conf:

https://pastebin.com/raw/MeF00YLG?
Janne Johansson
2018-12-04 10:24:20 UTC
Permalink
Post by Klimenko, Roman
Hi everyone!
On the old prod cluster
- baremetal, 5 nodes (24 cpu, 256G RAM)
- ceph 0.80.9 filestore
- 105 osd, size 114TB (each osd 1.1TB, SAS Seagate ST1200MM0018), raw used 60%
- 15 journals (each journal 0.4TB, Toshiba PX04SMB040)
- net 20Gbps
- 5 pools, size 2, min_size 1
We recently discovered a fairly high average apply latency, around 20 ms.
Using ceph osd perf, I can see that the apply latency of some OSDs sometimes reaches 300-400 ms. How can I tune Ceph to reduce this latency?
I would start by running "iostat" on all OSD hosts and checking whether one
or more drives show a very high utilization percentage.
Having one or a few drives that are much slower than the rest (in many
cases this shows up as those drives taking longer to finish IO, and hence
a higher %util than the other OSD drives) will hurt the speed of the
whole cluster.
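
As a concrete illustration (a sketch; it assumes the sysstat version of
iostat is installed on the OSD hosts):

  # Extended per-device statistics every 2 seconds; watch the %util column
  # for drives that sit near 100% while the others stay low.
  iostat -x 2

  # Or grab one timed sample per host to compare them side by side
  # (the output path is just an example):
  iostat -x 5 2 > /tmp/iostat-$(hostname).txt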
If you find one or a few drives that are extra slow, lower their CRUSH
weight so data moves off them to other, healthy drives.
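
For example (osd.42 and the target weight are placeholders; pick a value
somewhat below the weight shown by "ceph osd tree"):

  # Show the current CRUSH weights
  ceph osd tree

  # Lower the weight of the slow OSD so PGs migrate to healthier drives
  ceph osd crush reweight osd.42 0.8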
--
May the most significant bit of your life be positive.