Thomas Klute
2018-11-19 11:49:58 UTC
Hi,
we have a production cluster (3 nodes) that is stuck unclean after we had to
replace one OSD.
The cluster recovered fine except for a few PGs that have now been stuck
unclean for about 2-3 days:
[***@ceph1 ~]# ceph health detail
HEALTH_WARN 7 pgs stuck unclean; recovery 8/8565617 objects degraded (0.000%); recovery 38790/8565617 objects misplaced (0.453%)
pg 3.19 is stuck unclean for 324141.349243, current state active+remapped, last acting [8,1,12]
pg 3.17f is stuck unclean for 324093.413743, current state active+remapped, last acting [7,10,14]
pg 3.15e is stuck unclean for 324072.637573, current state active+remapped, last acting [9,11,12]
pg 3.1cc is stuck unclean for 324141.437666, current state active+remapped, last acting [6,4,9]
pg 3.47 is stuck unclean for 324014.795713, current state active+remapped, last acting [4,7,14]
pg 3.1d6 is stuck unclean for 324019.903078, current state active+remapped, last acting [8,0,4]
pg 3.83 is stuck unclean for 324024.970570, current state active+remapped, last acting [5,11,13]
recovery 8/8565617 objects degraded (0.000%)
recovery 38790/8565617 objects misplaced (0.453%)
Grepping the pg dump for the remapped PGs shows:
[***@ceph1 ~]# fgrep remapp /tmp/pgdump.txt
3.83   5423  0  0  5423  0  22046870528  3065  3065  active+remapped  2018-11-16 04:08:22.365825  85711'8469810   85711:8067280   [5,11]  5  [5,11,13]  5  83827'8450839   2018-11-14 14:01:20.330322  81079'8422114   2018-11-11 05:10:57.628147
3.47   5487  0  0  5487  0  22364503552  3010  3010  active+remapped  2018-11-15 18:24:24.047889  85711'9511787   85711:9975900   [4,7]   4  [4,7,14]   4  84165'9471676   2018-11-14 23:46:23.149867  80988'9434392   2018-11-11 02:00:23.427834
3.1d6  5567  0  2  5567  0  22652505618  3093  3093  active+remapped  2018-11-16 23:26:06.136037  85711'6730858   85711:6042914   [8,0]   8  [8,0,4]    8  83682'6673939   2018-11-14 09:15:37.810103  80664'6608489   2018-11-09 09:21:00.431783
3.1cc  5656  0  0  5656  0  22988533760  3088  3088  active+remapped  2018-11-17 09:18:42.263108  85711'9795820   85711:8040672   [6,4]   6  [6,4,9]    6  80670'9756755   2018-11-10 13:07:35.097811  80664'9742234   2018-11-09 04:33:10.497507
3.15e  5564  0  6  5564  0  22675107328  3007  3007  active+remapped  2018-11-17 02:47:44.282884  85711'9000186   85711:8021053   [9,11]  9  [9,11,12]  9  83502'8957026   2018-11-14 03:31:18.592781  80664'8920925   2018-11-09 22:15:54.478402
3.17f  5601  0  0  5601  0  22861908480  3077  3077  active+remapped  2018-11-17 01:16:34.016231  85711'31880220  85711:30659045  [7,10]  7  [7,10,14]  7  83668'31705772  2018-11-14 08:35:10.952368  80664'31649045  2018-11-09 04:40:28.644421
3.19   5492  0  0  5492  0  22460691985  3016  3016  active+remapped  2018-11-15 18:54:32.268758  85711'16782496  85711:15483621  [8,1]   8  [8,1,12]   8  84542'16774356  2018-11-15 09:40:41.713627  82163'16760520  2018-11-12 13:13:29.764191
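If I read the dump correctly, each of these PGs has only two OSDs in its up set (e.g. [5,11]) while the acting set still has three (e.g. [5,11,13]), so CRUSH does not seem to find a third OSD for the new mapping. In case the peering/recovery detail of one of them is useful, I can run something like the following (the output file name is just an example):

ceph pg 3.19 query > /tmp/pg_3.19_query.json   # peering/recovery state of one stuck PG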
We are running Jewel (10.2.11) on CentOS 7:
rpm -qa |grep ceph
ceph-radosgw-10.2.11-0.el7.x86_64
libcephfs1-10.2.11-0.el7.x86_64
ceph-mds-10.2.11-0.el7.x86_64
ceph-release-1-1.el7.noarch
ceph-common-10.2.11-0.el7.x86_64
ceph-selinux-10.2.11-0.el7.x86_64
python-cephfs-10.2.11-0.el7.x86_64
ceph-base-10.2.11-0.el7.x86_64
ceph-osd-10.2.11-0.el7.x86_64
ceph-mon-10.2.11-0.el7.x86_64
ceph-deploy-1.5.39-0.noarch
ceph-10.2.11-0.el7.x86_64
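I can also post the output of the following standard commands if that helps (nothing cluster-specific assumed here):

ceph osd tree                  # OSD/host layout and CRUSH weights
ceph osd df                    # per-OSD utilization and reweight values
ceph osd crush show-tunables   # current CRUSH tunables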
Could someone please advise on how to proceed?
Thanks and kind regards,
Thomas