[ceph-users] yet another deep-scrub performance topic

Vladimir Prokofev

2018-12-10 11:05:52 UTC

Hello list.

Deep scrub totally kills cluster performance.
First of all, a deep-scrub of a single PG takes more than six minutes to complete:
2018-12-09 01:39:53.857994 7f2d32fde700 0 log_channel(cluster) log [DBG] : 4.75 deep-scrub starts
2018-12-09 01:46:30.703473 7f2d32fde700 0 log_channel(cluster) log [DBG] : 4.75 deep-scrub ok
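
For reference, the exact duration can be worked out from the two timestamps
above. This is only a minimal Python sketch, assuming each log entry is
rejoined onto a single line; the helper names are made up for illustration
and are not part of Ceph:

```python
from datetime import datetime

# The two log entries quoted above, each rejoined onto one line.
LOG_LINES = [
    "2018-12-09 01:39:53.857994 7f2d32fde700 0 log_channel(cluster) log [DBG] : 4.75 deep-scrub starts",
    "2018-12-09 01:46:30.703473 7f2d32fde700 0 log_channel(cluster) log [DBG] : 4.75 deep-scrub ok",
]


def parse_timestamp(line):
    """Return the leading 'YYYY-MM-DD HH:MM:SS.ffffff' stamp as a datetime."""
    # The first two whitespace-separated fields are the date and the time.
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


def scrub_duration_seconds(start_line, end_line):
    """Seconds elapsed between the 'starts' entry and the 'ok' entry."""
    delta = parse_timestamp(end_line) - parse_timestamp(start_line)
    return delta.total_seconds()


if __name__ == "__main__":
    seconds = scrub_duration_seconds(*LOG_LINES)
    print(f"PG 4.75 deep-scrub took {seconds:.1f} s")
```

For the entries above this comes to about 396.8 seconds, i.e. roughly
6 minutes 37 seconds for this one PG.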

Second, while it runs, it consumes 100% of the OSD's time[1]. This is on an
ordinary 7200 RPM spinner.
While this is happening, VMs cannot access their disks, which leads to
service interruptions.

I have disabled scrub and deep-scrub operations for now, and have two main
questions:
- Can I suppress the 'health warning' status caused by the noscrub and
nodeep-scrub flags? I thought there was a way to do this, but I can't find
it. I want the cluster to report itself as healthy, so that if any new
'slow requests' or anything else pops up, the status changes to
'health warning' again.
- Is there a way to limit the impact of deep-scrub on disk performance, or
do I just have to go and buy SSDs?

[1] https://imgur.com/a/TKH3uda