Robert Sander
2018-11-21 09:22:50 UTC
Hi,
I was wondering whether it is a good idea to simply move the disk of an OSD to
another node.
The prerequisite is that the FileStore journal, or the BlueStore RocksDB
and WAL respectively, are located on the same device.
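A quick way to verify that nothing points to a second device (a sketch;
<id> is a placeholder and paths assume the default /var/lib/ceph layout):

  # BlueStore: block.db / block.wal symlinks, FileStore: journal symlink --
  # if any of these point to another device, the disk cannot be moved alone.
  ls -l /var/lib/ceph/osd/ceph-<id>/block.db \
        /var/lib/ceph/osd/ceph-<id>/block.wal \
        /var/lib/ceph/osd/ceph-<id>/journal 2>/dev/null
  # for ceph-volume deployments this also lists which devices back each OSD
  ceph-volume lvm list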
I have tested this move on a virtual Ceph cluster and it seems to work:
set noout, stopped the OSD process, unmounted everything and removed the
(virtual) disk from the original node. Attached the disk to the new node,
and as soon as the disk was recognized an OSD process started and some
rebalancing happened; after that the cluster was healthy. The rough
command sequence is sketched below.
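For reference, the steps roughly correspond to the following (a sketch;
osd.8 and the mount path are illustrative, adjust to your environment):

  ceph osd set noout                  # keep the OSD from being marked out while it is down
  systemctl stop ceph-osd@8           # stop the daemon on the old node
  umount /var/lib/ceph/osd/ceph-8     # FileStore data mount; BlueStore's tmpfs goes away with the daemon
  # physically move the disk to the new node, then either let udev/ceph-disk
  # activate it automatically or trigger activation by hand:
  ceph-volume lvm activate --all
  ceph osd unset noout                # once the cluster has settled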
But now, every time something happens, the numbers are odd.
E.g. when I add a new OSD:
2018-11-21 10:00:00.000301 mon.ceph04 mon.0 192.168.101.92:6789/0 10374 : cluster [INF] overall HEALTH_OK
2018-11-21 10:08:43.950612 mon.ceph04 mon.0 192.168.101.92:6789/0 10427 : cluster [INF] osd.8 192.168.101.156:6805/2361542 boot
2018-11-21 10:08:44.946176 mon.ceph04 mon.0 192.168.101.92:6789/0 10429 : cluster [WRN] Health check failed: 2/1716 objects misplaced (0.117%) (OBJECT_MISPLACED)
2018-11-21 10:08:44.946211 mon.ceph04 mon.0 192.168.101.92:6789/0 10430 : cluster [WRN] Health check failed: Reduced data availability: 11 pgs inactive, 37 pgs peering (PG_AVAILABILITY)
2018-11-21 10:08:44.946242 mon.ceph04 mon.0 192.168.101.92:6789/0 10431 : cluster [WRN] Health check failed: Degraded data redundancy: 230/1716 objects degraded (13.403%), 1 pg degraded (PG_DEGRADED)
2018-11-21 10:08:50.883625 mon.ceph04 mon.0 192.168.101.92:6789/0 10433 : cluster [WRN] Health check update: 40/1716 objects misplaced (2.331%) (OBJECT_MISPLACED)
2018-11-21 10:08:50.883684 mon.ceph04 mon.0 192.168.101.92:6789/0 10434 : cluster [WRN] Health check update: Degraded data redundancy: 7204/1716 objects degraded (419.814%), 83 pgs degraded (PG_DEGRADED)
2018-11-21 10:08:50.883719 mon.ceph04 mon.0 192.168.101.92:6789/0 10435 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 12 pgs inactive, 22 pgs peering)
2018-11-21 10:08:43.112896 osd.8 osd.8 192.168.101.156:6805/2361542 1 : cluster [WRN] failed to encode map e315 with expected crc
2018-11-21 10:08:57.390534 mon.ceph04 mon.0 192.168.101.92:6789/0 10436 : cluster [WRN] Health check update: Degraded data redundancy: 7001/1716 objects degraded (407.984%), 79 pgs degraded (PG_DEGRADED)
2018-11-21 10:09:00.891305 mon.ceph04 mon.0 192.168.101.92:6789/0 10437 : cluster [WRN] Health check update: 56/1716 objects misplaced (3.263%) (OBJECT_MISPLACED)
2018-11-21 10:09:02.391144 mon.ceph04 mon.0 192.168.101.92:6789/0 10438 : cluster [WRN] Health check update: Degraded data redundancy: 6413/1716 objects degraded (373.718%), 77 pgs degraded (PG_DEGRADED)
2018-11-21 10:09:06.897229 mon.ceph04 mon.0 192.168.101.92:6789/0 10441 : cluster [WRN] Health check update: 55/1716 objects misplaced (3.205%) (OBJECT_MISPLACED)
2018-11-21 10:09:07.391932 mon.ceph04 mon.0 192.168.101.92:6789/0 10442 : cluster [WRN] Health check update: Degraded data redundancy: 5533/1716 objects degraded (322.436%), 71 pgs degraded (PG_DEGRADED)
2018-11-21 10:09:12.392621 mon.ceph04 mon.0 192.168.101.92:6789/0 10443 : cluster [WRN] Health check update: Degraded data redundancy: 5499/1716 objects degraded (320.455%), 69 pgs degraded (PG_DEGRADED)
until finally:
2018-11-21 10:11:07.407294 mon.ceph04 mon.0 192.168.101.92:6789/0 10495 : cluster [WRN] Health check update: 17/1716 objects misplaced (0.991%) (OBJECT_MISPLACED)
2018-11-21 10:11:07.507613 mon.ceph04 mon.0 192.168.101.92:6789/0 10496 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 1/1716 objects degraded (0.058%), 1 pg degraded)
2018-11-21 10:11:12.407743 mon.ceph04 mon.0 192.168.101.92:6789/0 10497 : cluster [WRN] Health check update: 13/1716 objects misplaced (0.758%) (OBJECT_MISPLACED)
2018-11-21 10:11:17.408178 mon.ceph04 mon.0 192.168.101.92:6789/0 10500 : cluster [WRN] Health check update: 10/1716 objects misplaced (0.583%) (OBJECT_MISPLACED)
2018-11-21 10:11:25.406556 mon.ceph04 mon.0 192.168.101.92:6789/0 10501 : cluster [WRN] Health check update: 4/1716 objects misplaced (0.233%) (OBJECT_MISPLACED)
2018-11-21 10:11:31.016869 mon.ceph04 mon.0 192.168.101.92:6789/0 10502 : cluster [INF] Health check cleared: OBJECT_MISPLACED (was: 1/1716 objects misplaced (0.058%))
2018-11-21 10:11:31.016936 mon.ceph04 mon.0 192.168.101.92:6789/0 10503 : cluster [INF] Cluster is now healthy
Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin