Discussion:
[ceph-users] Increase recovery / backfilling speed (with many small objects)
Stefan Kooman
2018-01-05 20:13:01 UTC
Hi,

I know I'm not the only one with this question, as I have seen similar questions on this list:
How to speed up recovery / backfilling?

Current status:

pgs: 155325434/800312109 objects degraded (19.408%)
1395 active+clean
440 active+undersized+degraded+remapped+backfill_wait
21 active+undersized+degraded+remapped+backfilling

io:
client: 180 kB/s rd, 5776 kB/s wr, 273 op/s rd, 440 op/s wr
recovery: 2990 kB/s, 109 keys/s, 114 objects/s

What did we do? Shut down one DC, fill the cluster with loads of objects, then
turn the DC back on (size = 3, min_size = 2). To test exactly this: recovery.

I have been going through all the recovery options (including legacy ones), but
I cannot get the recovery speed to increase:

osd_recovery_op_priority 63
osd_client_op_priority 3

^^ yup, reversed those, to no avail

osd_recovery_max_active 10

^^ This helped for a short period of time, and then it went back to
"slow" mode

osd_recovery_max_omap_entries_per_chunk 0
osd_recovery_max_chunk 67108864

Haven't seen any change in recovery speed.

osd_recovery_sleep_ssd 0.000000
^^ default for SSD
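
For completeness, this is roughly how I have been checking and injecting these
settings at runtime (osd.0 is just an example id; the daemon command has to run
on the host where that OSD lives):

  # show the recovery related settings an OSD is actually running with
  ceph daemon osd.0 config show | grep -E 'osd_recovery|osd_max_backfills'

  # inject new values into all running OSDs without restarting them
  ceph tell osd.* injectargs '--osd-recovery-max-active 10 --osd-recovery-op-priority 63'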

The whole cluster is idle, the OSDs have very low load. What can be the
reason for the slow recovery? Something is holding it back, but I cannot
think of what.

Ceph Luminous 12.2.2 (bluestore on lvm, all SSD)

Thanks,

Stefan
--
| BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / ***@bit.nl
Chris Sarginson
2018-01-05 20:49:43 UTC
You probably want to consider increasing osd max backfills

You should be able to inject this online

http://docs.ceph.com/docs/luminous/rados/configuration/osd-config-ref/

You might want to drop your osd_recovery_max_active setting back down to
around 2 or 3, although with it being SSD your performance will probably be
fine.
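
Something along these lines should do it (untested here, pick a value that suits
your hardware):

  # allow more concurrent backfills per OSD, injected into all running OSDs
  ceph tell osd.* injectargs '--osd-max-backfills 4'

If it helps, you can make it permanent by adding "osd max backfills = 4" to the
[osd] section of ceph.conf.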
Post by Stefan Kooman
Hi,
I know I'm not the only one with this question, as I have seen similar
questions on this list:
How to speed up recovery / backfilling?
pgs: 155325434/800312109 objects degraded (19.408%)
1395 active+clean
440 active+undersized+degraded+remapped+backfill_wait
21 active+undersized+degraded+remapped+backfilling
client: 180 kB/s rd, 5776 kB/s wr, 273 op/s rd, 440 op/s wr
recovery: 2990 kB/s, 109 keys/s, 114 objects/s
What did we do? Shut down one DC, fill the cluster with loads of objects, then
turn the DC back on (size = 3, min_size = 2). To test exactly this: recovery.
I have been going through all the recovery options (including legacy ones), but
I cannot get the recovery speed to increase:
osd_recovery_op_priority 63
osd_client_op_priority 3
^^ yup, reversed those, to no avail
osd_recovery_max_active 10
^^ This helped for a short period of time, and then it went back to
"slow" mode
osd_recovery_max_omap_entries_per_chunk 0
osd_recovery_max_chunk 67108864
Haven't seen any change in recovery speed.
osd_recovery_sleep_ssd 0.000000
^^ default for SSD
The whole cluster is idle, the OSDs have very low load. What can be the
reason for the slow recovery? Something is holding it back but I cannot
think of what.
Ceph Luminous 12.2.2 (bluestore on lvm, all SSD)
Thanks,
Stefan
--
| BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688
Stefan Kooman
2018-01-08 10:34:22 UTC
Post by Chris Sarginson
You probably want to consider increasing osd max backfills
You should be able to inject this online
http://docs.ceph.com/docs/luminous/rados/configuration/osd-config-ref/
You might want to drop your osd recovery max active settings back down to
around 2 or 3, although with it being SSD your performance will probably be
fine.
Thanks. I forgot to mention that I already increased that setting to "10"
(and eventually 50). It increases the speed a little bit: from 150
objects/s to ~400 objects/s. It would still take days for the cluster
to recover.
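
For what it's worth, I verified that injected values actually land on the OSDs
via the admin socket (osd.0 just as an example):

  # confirm the value an individual OSD is currently using
  ceph daemon osd.0 config get osd_max_backfills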

Gr. Stefan
--
| BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / ***@bit.nl
Mark Schouten
2018-01-08 10:55:48 UTC
Post by Stefan Kooman
Thanks. I forgot to mention that I already increased that setting to "10"
(and eventually 50). It increases the speed a little bit: from 150
objects/s to ~400 objects/s. It would still take days for the cluster
to recover.
There was some discussion a week or so ago about the tweaks you did to
detect failures ultra fast. One of the responses was that the cluster would
get pretty busy because of that. Have you tried reverting those tweaks?

Mark
--
Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
Mark Schouten | Tuxis Internet Engineering
KvK: 61527076 | http://www.tuxis.nl/
T: 0318 200208 | ***@tuxis.nl
Stefan Kooman
2018-02-28 14:52:58 UTC
Hi,

We recently learned on this list about "rotational_journal = 1" for
some (all?) NVMe / SSD setups. We also hit this issue (see below). It
would eventually take a week to recover ... This was all "scratch data",
so it didn't matter anyway. We recently had to do some recovery /
backfilling on our OSD nodes. Only large objects were stored now (rbd
chunks), so recovery speed was already much better. Still, we had to crank
osd_max_backfills to 6 and osd_recovery_max_active to 3 to get some more
recovery performance. TL;DR: we set osd_recovery_sleep_hdd to 0 as well
as osd_recovery_sleep_hybrid to 0 and had another node recover. Already
with default recovery settings performance was much better. With
recovery / backfills set to 3, recovery went really fast. See [1] for
a "before / after" impression. Max throughput was around 1800 MB/s,
with each OSD doing some 5K writes. For sure this was not the limit; we
would hit the maximum NIC bandwidth pretty soon though.
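
In case anyone wants to reproduce this, roughly what we did looks like the
following (osd ids and values are examples, check the rotational flags for your
own setup first):

  # see what the OSD thinks about its devices; a journal reported as rotational
  # on SSD/NVMe is what makes the hdd/hybrid recovery sleep kick in
  ceph osd metadata 0 | grep -i rotational

  # disable the recovery sleep for the hdd and hybrid cases at runtime
  ceph tell osd.* injectargs '--osd-recovery-sleep-hdd 0 --osd-recovery-sleep-hybrid 0'

  # allow more parallel backfill / recovery work per OSD
  ceph tell osd.* injectargs '--osd-max-backfills 6 --osd-recovery-max-active 3'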

ceph++

Gr. Stefan

[1]: https://owncloud.kooman.org/s/mvbMCVLFbWjAyOn#pdfviewer
Post by Stefan Kooman
Hi,
How to speed up recovery / backfilling?
pgs: 155325434/800312109 objects degraded (19.408%)
1395 active+clean
440 active+undersized+degraded+remapped+backfill_wait
21 active+undersized+degraded+remapped+backfilling
client: 180 kB/s rd, 5776 kB/s wr, 273 op/s rd, 440 op/s wr
recovery: 2990 kB/s, 109 keys/s, 114 objects/s
What did we do? Shut down one DC, fill the cluster with loads of objects, then
turn the DC back on (size = 3, min_size = 2). To test exactly this: recovery.
I have been going through all the recovery options (including legacy ones), but
I cannot get the recovery speed to increase:
osd_recovery_op_priority 63
osd_client_op_priority 3
^^ yup, reversed those, to no avail
osd_recovery_max_active 10
^^ This helped for a short period of time, and then it went back to
"slow" mode
osd_recovery_max_omap_entries_per_chunk 0
osd_recovery_max_chunk 67108864
Haven't seen any change in recovery speed.
osd_recovery_sleep_ssd 0.000000
^^ default for SSD
Didn't think about the hdd / hybrid sleep settings back then, as we have all SSD.

Gr. Stefan
--
| BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / ***@bit.nl