Wido den Hollander
2018-11-14 14:38:36 UTC
Hi,
I'm in the middle of expanding a Ceph cluster and while having 'ceph -s'
open I suddenly saw a bunch of Placement Groups go undersized.
My first hint was that one or more OSDs have failed, but none did.
So I checked and I saw these Placement Groups undersized:
PG       STATE                                               UP                UP_PRIMARY  ACTING        ACTING_PRIMARY
11.3b54  active+undersized+degraded+remapped+backfill_wait   [1795,639,1422]   1795        [1795,639]    1795
11.362f  active+undersized+degraded+remapped+backfill_wait   [1431,1134,2217]  1431        [1134,1468]   1134
11.3e31  active+undersized+degraded+remapped+backfill_wait   [1451,1391,1906]  1451        [1906,2053]   1906
11.50c   active+undersized+degraded+remapped+backfill_wait   [1867,1455,1348]  1867        [1867,2036]   1867
11.421e  active+undersized+degraded+remapped+backfilling     [280,117,1421]    280         [280,117]     280
11.700   active+undersized+degraded+remapped+backfill_wait   [2212,1422,2087]  2212        [2055,2087]   2055
11.735   active+undersized+degraded+remapped+backfilling     [772,1832,1433]   772         [772,1832]    772
11.d5a   active+undersized+degraded+remapped+backfill_wait   [423,1709,1441]   423         [423,1709]    423
11.a95   active+undersized+degraded+remapped+backfill_wait   [1433,1180,978]   1433        [978,1180]    978
11.a67   active+undersized+degraded+remapped+backfill_wait   [1154,1463,2151]  1154        [1154,2151]   1154
11.10ca  active+undersized+degraded+remapped+backfill_wait   [2012,486,1457]   2012        [2012,486]    2012
11.2439  active+undersized+degraded+remapped+backfill_wait   [910,1457,1193]   910         [910,1193]    910
11.2f7e  active+undersized+degraded+remapped+backfill_wait   [1423,1356,2098]  1423        [1356,2098]   1356
After searching I found that OSDs 1421, 1422, 1423, 1431, 1433, 1441,
1451, 1455, 1457 and 1463 are all running on the same newly added host.
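For reference, this is roughly how I mapped each OSD back to its host
(the 'crush_location' field name is from memory, so treat it as a
sketch rather than the exact output layout):

$ ceph osd find 1422 | jq '.crush_location'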
I checked:
- The host did not reboot
- The OSDs did not restart
The OSDs have been up_thru since map 646724, which is from 11:05 this
morning (4.5 hours ago), which is about the time they were added.
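In case anyone wants to check the same thing, I looked at up_thru per
OSD with something along these lines (assuming the usual 'ceph osd
dump' per-OSD line format with up_from/up_thru in it):

$ ceph osd dump | grep '^osd\.1422 '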
So these PGs are currently running on *2* replicas while they should be
running on *3*.
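For completeness, the pool's size/min_size (pool 11, going by the PG
IDs) can be double-checked with something like:

$ ceph osd pool ls detail | grep '^pool 11 '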
We just added 8 nodes with 24 disks each to the cluster, but none of the
existing OSDs were touched.
When looking at PG 11.3b54 I see that 1422 is a backfill target:
$ ceph pg 11.3b54 query | jq '.recovery_state'
The 'enter time' for this is about 30 minutes ago, which is about when
this happened.
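In case it helps others dig in, the per-state enter times can be pulled
out with a filter along these lines (field names from memory, so this
is just a sketch):

$ ceph pg 11.3b54 query | jq '.recovery_state[] | {name, enter_time}'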
'might_have_unfound' points at OSD 1982, which is in the same rack as
1422 (CRUSH replicates over racks), but that OSD is also online.
Its up_thru = 647122, which is also from about 30 minutes ago. That
ceph-osd process has, however, been running since September and seems
to be functioning fine.
This confuses me, as during such an expansion a PG would normally keep
mapping to size+1 OSDs until the backfill finishes, instead of dropping
to 2 replicas.
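For anyone wanting to spot these quickly, this is roughly how I listed
PGs whose acting set has fewer than 3 OSDs (assuming the default
pgs_brief column order of pg, state, up, up_primary, acting,
acting_primary):

$ ceph pg dump pgs_brief 2>/dev/null | \
    awk '$1 ~ /^[0-9]+\./ { if (split($5, a, ",") < 3) print $1, $2, $5 }'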
The cluster is running Luminous 12.2.8 on CentOS 7.5.
Any ideas on what this could be?
Wido