Discussion:
[ceph-users] non-effective new deep scrub interval
David DELON
2016-09-08 07:48:46 UTC
Permalink
Hello,

I'm using Ceph Jewel.
I would like to schedule the deep scrub operations on my own.
First of all, I tried to change the interval value to 30 days:
in each /etc/ceph/ceph.conf, I added:

[osd]
#30*24*3600
osd deep scrub interval = 2592000
I have restarted all the OSD daemons.
The new value has been taken into account, as shown for each OSD:

ceph --admin-daemon /var/run/ceph/ceph-osd.X.asok config show | grep deep_scrub_interval
"osd_deep_scrub_interval": "2.592e+06",


I checked the last_deep_scrub value for each PG with
ceph pg dump
and each PG has been deep scrubbed within the last 7 days (the default behavior).
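For reference, a rough way to see how the deep scrub dates are distributed (an untested sketch; the DEEP_SCRUB_STAMP column position varies between releases, so check the header line of "ceph pg dump" first) is to count PGs per deep scrub date:

ceph pg dump 2>/dev/null | awk '/^[0-9]+\.[0-9a-f]+/ {print $(NF-1)}' | sort | uniq -c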

I made the changes 2 days ago, and the cluster keeps on deep scrubbing anyway.
Am I missing something?

Thanks.
Christian Balzer
2016-09-08 11:30:23 UTC
Permalink
Hello,
Post by David DELON
Hello,
I'm using Ceph Jewel.
I would like to schedule the deep scrub operations on my own.
Welcome to the club, alas the ride isn't for the faint of heart.

You will want to (re-)search the ML archive (google) and in particular the
recent "Spreading deep-scrubbing load" thread.
Post by David DELON
[osd]
#30*24*3600
osd deep scrub interval = 2592000
I have restarted all the OSD daemons.
This could have been avoided by an "inject" for all OSDs.
Restarting (busy) OSDs isn't particularly nice for a cluster.
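For example, something along these lines pushes the value to all running OSDs without restarts (a sketch, not verified on your setup):

ceph tell osd.* injectargs '--osd_deep_scrub_interval 2592000'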
Post by David DELON
ceph --admin-daemon /var/run/ceph/ceph-osd.X.asok config show | grep deep_scrub_interval
"osd_deep_scrub_interval": "2.592e+06",
I checked the last_deep_scrub value for each PG with
ceph pg dump
and each PG has been deep scrubbed within the last 7 days (the default behavior).
See the above thread.
Post by David DELON
I made the changes 2 days ago, and the cluster keeps on deep scrubbing anyway.
Am I missing something?
At least 2 things, maybe more.

Unless you change "osd_scrub_max_interval" as well, that one will enforce scrubs regardless, by default after a week.

And with Jewel you get the well-meaning, but on-by-default and ill-documented "osd_scrub_interval_randomize_ratio", which will happily spread things out, and not when you want them to happen.
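So to actually get a fixed 30-day schedule, something like this in ceph.conf should be closer (a sketch; adjust to your needs):

[osd]
# 30 days
osd deep scrub interval = 2592000
# reportedly must be raised as well, or scrubs (and with them
# deep scrubs) still get forced after the default week
osd scrub max interval = 2592000
# disable the Jewel randomization for deterministic timing
osd scrub interval randomize ratio = 0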

Again, read the above thread.

Also, your cluster _should_ be able to endure deep scrubs even when busy, otherwise you're looking at trouble when you lose an OSD and face the resulting rebalancing as well.

Setting these to something sensible:
"osd_scrub_begin_hour": "0",
"osd_scrub_end_hour": "6",

and especially this:
"osd_scrub_sleep": "0.1",

will minimize the impact of scrubs as well.
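All of these can be injected at runtime too, e.g. (sketch):

ceph tell osd.* injectargs '--osd_scrub_begin_hour 0 --osd_scrub_end_hour 6 --osd_scrub_sleep 0.1'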

Christian
--
Christian Balzer Network/Systems Engineer
***@gol.com Global OnLine Japan/Rakuten Communications
http://www.gol.com/
David DELON
2016-09-08 15:09:27 UTC
Permalink
First, thanks for your answer, Christian.
Post by Christian Balzer
Hello,
Post by David DELON
Hello,
I'm using Ceph Jewel.
I would like to schedule the deep scrub operations on my own.
Welcome to the club, alas the ride isn't for the faint of heart.
You will want to (re-)search the ML archive (google) and in particular the
recent "Spreading deep-scrubbing load" thread.
That is not exactly what I would like to do; that's why I posted.
I wanted to trigger the deep scrubbing myself on Sundays with a cron script...
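Something like this hypothetical weekly job (run from cron early on Sunday, say "0 1 * * 0") is what I have in mind; it just asks every PG to deep scrub and lets the OSDs queue the work subject to osd_max_scrubs:

#!/bin/sh
# untested sketch: request a deep scrub of every PG
for pg in $(ceph pg dump 2>/dev/null | awk '/^[0-9]+\.[0-9a-f]+/ {print $1}'); do
    ceph pg deep-scrub "$pg"
done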
Post by Christian Balzer
Post by David DELON
[osd]
#30*24*3600
osd deep scrub interval = 2592000
I have restarted all the OSD daemons.
This could have been avoided by an "inject" for all OSDs.
Restarting (busy) OSDs isn't particularly nice for a cluster.
I first injected the new value. But as that did not do the trick after some hours, and the "injectargs" command returned
"(unchangeable)",
I thought OSD restarts were needed...
Post by Christian Balzer
Post by David DELON
ceph --admin-daemon /var/run/ceph/ceph-osd.X.asok config show | grep deep_scrub_interval
"osd_deep_scrub_interval": "2.592e+06",
I checked the last_deep_scrub value for each PG with
ceph pg dump
and each PG has been deep scrubbed within the last 7 days (the default behavior).
See the above thread.
Post by David DELON
I made the changes 2 days ago, and the cluster keeps on deep scrubbing anyway.
Am I missing something?
At least 2 things, maybe more.
Unless you change "osd_scrub_max_interval" as well, that one will enforce scrubs regardless, by default after a week.
Increasing osd_scrub_max_interval and osd_scrub_min_interval does not solve it.
Post by Christian Balzer
And with Jewel you get the well-meaning, but on-by-default and ill-documented "osd_scrub_interval_randomize_ratio", which will happily spread things out, and not when you want them to happen.
Again, read the above thread.
Also, your cluster _should_ be able to endure deep scrubs even when busy, otherwise you're looking at trouble when you lose an OSD and face the resulting rebalancing as well.
"osd_scrub_begin_hour": "0",
"osd_scrub_end_hour": "6",
"osd_scrub_sleep": "0.1",
OK, I will consider this solution.
Post by Christian Balzer
will minimize the impact of scrubs as well.
Christian
--
Christian Balzer Network/Systems Engineer
http://www.gol.com/
Christian Balzer
2016-09-09 01:42:43 UTC
Permalink
Hello,
Post by David DELON
First, thanks for your answer, Christian.
It's nothing.
Post by David DELON
Post by Christian Balzer
Hello,
Post by David DELON
Hello,
I'm using Ceph Jewel.
I would like to schedule the deep scrub operations on my own.
Welcome to the club, alas the ride isn't for the faint of heart.
You will want to (re-)search the ML archive (google) and in particular the
recent "Spreading deep-scrubbing load" thread.
That is not exactly what I would like to do; that's why I posted.
I wanted to trigger the deep scrubbing myself on Sundays with a cron script...
If you look at that thread (and others) that's what I do, too.
And ideally, not even needing a cron script after the first time,
provided your scrubs can fit into the time frame permitted.
Post by David DELON
Post by Christian Balzer
Post by David DELON
[osd]
#30*24*3600
osd deep scrub interval = 2592000
I have restarted all the OSD daemons.
This could have been avoided by an "inject" for all OSDs.
Restarting (busy) OSDs isn't particularly nice for a cluster.
I first injected the new value. But as that did not do the trick after some hours, and the "injectargs" command returned
"(unchangeable)",
I thought OSD restarts were needed...
I keep forgetting about that, annoying.
Post by David DELON
Post by Christian Balzer
Post by David DELON
ceph --admin-daemon /var/run/ceph/ceph-osd.X.asok config show | grep deep_scrub_interval
"osd_deep_scrub_interval": "2.592e+06",
I checked the last_deep_scrub value for each PG with
ceph pg dump
and each PG has been deep scrubbed within the last 7 days (the default behavior).
See the above thread.
Post by David DELON
I made the changes 2 days ago, and the cluster keeps on deep scrubbing anyway.
Am I missing something?
At least 2 things, maybe more.
Unless you change "osd_scrub_max_interval" as well, that one will enforce scrubs regardless, by default after a week.
Increasing osd_scrub_max_interval and osd_scrub_min_interval does not solve it.
osd_scrub_min_interval has no impact on deep scrubs,
osd_scrub_max_interval interestingly and unexpectedly does.
Post by David DELON
Post by Christian Balzer
And with Jewel you get the well-meaning, but on-by-default and ill-documented "osd_scrub_interval_randomize_ratio", which will happily spread things out, and not when you want them to happen.
If you set osd_scrub_interval_randomize_ratio to 0, scrubs should become fixed-interval and deterministic again.
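E.g. (a hypothetical one-liner, assuming the option is runtime-changeable on your build):

ceph tell osd.* injectargs '--osd_scrub_interval_randomize_ratio 0'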

Christian
Post by David DELON
Post by Christian Balzer
Again, read the above thread.
Also, your cluster _should_ be able to endure deep scrubs even when busy, otherwise you're looking at trouble when you lose an OSD and face the resulting rebalancing as well.
"osd_scrub_begin_hour": "0",
"osd_scrub_end_hour": "6",
"osd_scrub_sleep": "0.1",
OK, I will consider this solution.
Post by Christian Balzer
will minimize the impact of scrubs as well.
Christian
--
Christian Balzer Network/Systems Engineer
http://www.gol.com/
--
Christian Balzer Network/Systems Engineer
***@gol.com Global OnLine Japan/Rakuten Communications
http://www.gol.com/
David DELON
2016-09-09 08:03:59 UTC
Permalink
Hi,
this works for me:

ceph tell osd.* injectargs '--osd_scrub_end_hour 7'
ceph tell osd.* injectargs '--osd_scrub_load_threshold 0.1'
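To double-check the injected values on a given OSD (same admin socket query as earlier in the thread):

ceph --admin-daemon /var/run/ceph/ceph-osd.X.asok config show | grep -E 'osd_scrub_(end_hour|load_threshold)'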

About the "(unchangeable)" warning, it seems to be a bug, according to:
http://tracker.ceph.com/issues/16054

Have a nice day.
D.
Post by Christian Balzer
Hello,
Post by David DELON
First, thanks for your answer, Christian.
It's nothing.
Post by David DELON
Post by Christian Balzer
Hello,
Post by David DELON
Hello,
I'm using Ceph Jewel.
I would like to schedule the deep scrub operations on my own.
Welcome to the club, alas the ride isn't for the faint of heart.
You will want to (re-)search the ML archive (google) and in particular the
recent "Spreading deep-scrubbing load" thread.
That is not exactly what I would like to do; that's why I posted.
I wanted to trigger the deep scrubbing myself on Sundays with a cron script...
If you look at that thread (and others) that's what I do, too.
And ideally, not even needing a cron script after the first time,
provided your scrubs can fit into the time frame permitted.
Post by David DELON
Post by Christian Balzer
Post by David DELON
[osd]
#30*24*3600
osd deep scrub interval = 2592000
I have restarted all the OSD daemons.
This could have been avoided by an "inject" for all OSDs.
Restarting (busy) OSDs isn't particularly nice for a cluster.
I first injected the new value. But as that did not do the trick after some hours, and the "injectargs" command returned
"(unchangeable)",
I thought OSD restarts were needed...
I keep forgetting about that, annoying.
Post by David DELON
Post by Christian Balzer
Post by David DELON
ceph --admin-daemon /var/run/ceph/ceph-osd.X.asok config show | grep deep_scrub_interval
"osd_deep_scrub_interval": "2.592e+06",
I checked the last_deep_scrub value for each PG with
ceph pg dump
and each PG has been deep scrubbed within the last 7 days (the default behavior).
See the above thread.
Post by David DELON
I made the changes 2 days ago, and the cluster keeps on deep scrubbing anyway.
Am I missing something?
At least 2 things, maybe more.
Unless you change "osd_scrub_max_interval" as well, that one will enforce scrubs regardless, by default after a week.
Increasing osd_scrub_max_interval and osd_scrub_min_interval does not solve it.
osd_scrub_min_interval has no impact on deep scrubs,
osd_scrub_max_interval interestingly and unexpectedly does.
Post by David DELON
Post by Christian Balzer
And with Jewel you get the well-meaning, but on-by-default and ill-documented "osd_scrub_interval_randomize_ratio", which will happily spread things out, and not when you want them to happen.
If you set osd_scrub_interval_randomize_ratio to 0, scrubs should become fixed-interval and deterministic again.
Christian
Post by David DELON
Post by Christian Balzer
Again, read the above thread.
Also, your cluster _should_ be able to endure deep scrubs even when busy, otherwise you're looking at trouble when you lose an OSD and face the resulting rebalancing as well.
"osd_scrub_begin_hour": "0",
"osd_scrub_end_hour": "6",
"osd_scrub_sleep": "0.1",
OK, I will consider this solution.
Post by Christian Balzer
will minimize the impact of scrubs as well.
Christian
--
Christian Balzer Network/Systems Engineer
http://www.gol.com/
--
Christian Balzer Network/Systems Engineer
http://www.gol.com/