[ceph-users] Decommissioning cluster - rebalance questions
s***@turka.nl
2018-12-03 15:41:36 UTC
Hi,

Currently I am decommissioning an old cluster.

For example, I want to remove OSD server X with all of its OSDs.

I am following these steps for each OSD of server X:
- ceph osd out <osd>
- Wait for rebalance (active+clean)
- On the OSD host: service ceph stop osd.<osd>

Once the steps above are done, the following steps complete the removal
(a scripted version of the whole sequence is below):
- ceph osd crush remove osd.<osd>
- ceph auth del osd.<osd>
- ceph osd rm <osd>
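
Scripted, the whole per-OSD sequence looks roughly like this (just a
sketch; 'ceph osd ls-tree' needs a reasonably recent release, and the
health check is simplistic):

    # drain and remove every OSD on one host, one OSD at a time
    for id in $(ceph osd ls-tree <hostname>); do
        ceph osd out $id
        # wait for rebalance (all PGs active+clean again)
        until ceph health | grep -q HEALTH_OK; do sleep 60; done
        service ceph stop osd.$id      # run on the OSD host
        ceph osd crush remove osd.$id
        ceph auth del osd.$id
        ceph osd rm $id
    done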


What I don't get is: when I perform 'ceph osd out <osd>' the cluster
rebalances, but when I then perform 'ceph osd crush remove osd.<osd>' it
starts to rebalance again. Why does this happen? The cluster should
already be balanced after the OSD has been marked out. I didn't expect
another rebalance from removing the OSD from the CRUSH map.

Thanks!

Sinan Polat
Paul Emmerich
2018-12-03 16:35:27 UTC
Permalink
There's unfortunately a difference between an OSD with weight 0 and
removing that item (OSD) from its CRUSH bucket :(

If you are going to remove the whole cluster completely anyway: either
keep the OSDs as down+out in the CRUSH map, i.e., just skip the last
steps. Or just purge each OSD without setting it out first.
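
On Luminous or newer, purging is a single command per OSD (sketch):

    # removes the OSD from the CRUSH map, deletes its auth key and
    # removes it from the OSD map in one step
    ceph osd purge <osd> --yes-i-really-mean-it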

Paul
--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
Marco Gaiarin
2018-12-04 08:53:29 UTC
Hello, ***@turka.nl!
In that message, you wrote...
Post by s***@turka.nl
What I don't get is, when I perform 'ceph osd out <osd>' the cluster is
rebalancing, but when I perform 'ceph osd crush remove osd.<osd>' it again
starts to rebalance. Why does this happen?
I've recently hit the same 'strangeness'. Note that I'm not a Ceph
developer or a 'power' (or 'old') user.

It seems to me that there are two kinds of 'rebalance': one for safety,
one for optimization.


If you mark an OSD 'out', Ceph rebalances the data for safety. But you
haven't touched the CRUSH map, so data is still placed according to the
'old' CRUSH map.
So if you then remove that OSD (or change the CRUSH map in any other
way), a rebalance for 'optimization' starts.
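
You can see both weights side by side with:

    ceph osd tree

The WEIGHT column is the CRUSH weight; the REWEIGHT column is the 0..1
override that 'ceph osd out' drops to 0. Since 'out' leaves the CRUSH
weight (and the host bucket's total weight) untouched, removing the
item later still changes the map and moves data.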

In the same way, you can slowly take an OSD 'out' with:

    ceph osd reweight <ID> X

(with 0 <= X <= 1), but you still don't touch the CRUSH map.

You can also 'slowly remove' an OSD with:

    ceph osd crush reweight osd.<ID> X

(with 0 <= X <= <disk_size_in_TB>); in this way you can lower the OSD's
weight in the CRUSH map down to 0, and then you can safely remove it.
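
For example, to drain osd.<ID> in a few steps (untested sketch; choose
the step sizes and polling interval to taste):

    # assume osd.<ID> currently has a CRUSH weight of ~2.0
    for w in 1.5 1.0 0.5 0; do
        ceph osd crush reweight osd.<ID> $w
        # wait until all PGs are active+clean before lowering further
        until ceph health | grep -q HEALTH_OK; do sleep 60; done
    done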


I hope I haven't said too much blasphemy... ;-)
--
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/
Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797

Donate your 5 PER MILLE to LA NOSTRA FAMIGLIA!
http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(tax code 00307430132, category ONLUS or RICERCA SANITARIA)
Jarek
2018-12-04 09:15:01 UTC
On Mon, 03 Dec 2018 16:41:36 +0100
Post by s***@turka.nl
What I don't get is, when I perform 'ceph osd out <osd>' the cluster
is rebalancing, but when I perform 'ceph osd crush remove osd.<osd>'
it again starts to rebalance. Why does this happen?
'ceph osd out' doesn't change the host's weight in the CRUSH map;
'ceph osd crush remove' does.
Instead of 'ceph osd out', use 'ceph osd crush reweight'.
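
I.e., something like this per OSD (sketch; this way all data movement
happens in the first step, and the later 'crush remove' deletes an item
that already has weight 0, so it shouldn't trigger a second rebalance):

    ceph osd crush reweight osd.<osd> 0
    # wait until all PGs are active+clean
    ceph osd out <osd>
    service ceph stop osd.<osd>        # on the OSD host
    ceph osd crush remove osd.<osd>
    ceph auth del osd.<osd>
    ceph osd rm <osd>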
--
Regards
Jarosław Mociak - Nettelekom GK Sp. z o.o.