Chris Martin
2018-11-21 00:49:23 UTC
I am also having this problem. Zheng (or anyone else), any idea how to
perform this downgrade on a node that is also a monitor and an OSD
node?
dpkg complains of a dependency conflict when I try to install
ceph-mds_13.2.1-1xenial_amd64.deb:
```
dpkg: dependency problems prevent configuration of ceph-mds:
ceph-mds depends on ceph-base (= 13.2.1-1xenial); however:
Version of ceph-base on system is 13.2.2-1xenial.
```
I don't think I want to downgrade ceph-base to 13.2.1.
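The two routes I can see (untested sketch; version strings copied from the dpkg output above) would be either letting apt downgrade ceph-base along with ceph-mds, or forcing only ceph-mds back with dpkg, and neither seems great on a node that is also running a mon and OSDs:
```
# 1) Downgrade ceph-base together with ceph-mds (also touches the mon/osd
#    packages on this node, since they depend on ceph-base as well):
apt-get install --allow-downgrades ceph-base=13.2.1-1xenial ceph-mds=13.2.1-1xenial

# 2) Force only ceph-mds back to 13.2.1, ignoring the strict ceph-base pin
#    (dpkg downgrades the package but leaves the version mismatch in place):
dpkg --force-depends -i ceph-mds_13.2.1-1xenial_amd64.deb
```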
Thank you,
Chris Martin
Sorry, this is caused by a wrong backport. Downgrading the mds to 13.2.1 and
marking the mds repaired can resolve this.
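In outline (a rough sketch; this assumes a single filesystem and that rank 0 is the damaged rank, adjust to your setup):
```
# stop the 13.2.2 MDS daemons on the MDS hosts
systemctl stop ceph-mds.target
# ...install the 13.2.1 ceph-mds package on those hosts...
# clear the "damaged" flag on rank 0 (use "<fsname>:0" if you have several filesystems)
ceph mds repaired 0
# start the downgraded MDS so it can take over the rank again
systemctl start ceph-mds.target
```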
Yan, Zheng
I discovered http://tracker.ceph.com/issues/24236 and https://github.com/ceph/ceph/pull/22146.
Make sure that it is not relevant in your case before proceeding to operations that modify on-disk data.
I ended up rescanning the entire fs using the alternate metadata pool approach described in http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/
The first stage (scan_extents) completed in 84 hours (120M objects in the data pool on 8 HDD OSDs across 4 hosts). The second stage (scan_inodes) was interrupted by an OSD failure, so I have no timing stats, but it seemed to be running 2-3 times faster than the extents scan.
As to the root cause: in my case I recall that during the upgrade I had forgotten to restart 3 OSDs, one of which held metadata pool contents, before restarting the MDS daemons. That seems to have contributed to the MDS journal corruption, because when I finally restarted those OSDs the MDS was able to start up but soon failed, throwing lots of 'loaded dup inode' errors.
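For reference, the two scan stages map to roughly the following commands (a condensed sketch of the procedure in the doc linked above; "recovery" is the alternate metadata pool and "cephfs"/"cephfs_data" stand in for the original filesystem and data pool names; see the doc for the full surrounding steps such as creating the recovery pool/fs and resetting the session/snap/inode tables):
```
# stage 1: scan data-pool objects to recover file extents/sizes (~84 hours here)
cephfs-data-scan scan_extents --alternate-pool recovery --filesystem cephfs cephfs_data
# stage 2: rebuild inode metadata into the alternate metadata pool (ran noticeably faster)
cephfs-data-scan scan_inodes --alternate-pool recovery --filesystem cephfs \
    --force-corrupt --force-init cephfs_data
```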
Same problem...
# cephfs-journal-tool --journal=purge_queue journal inspect
2018-10-05 18:37:10.704 7f01f60a9bc0 -1 Missing object 500.0000016c
Overall journal integrity: DAMAGED
0x16c
0x5b000000-ffffffffffffffff
Just after upgrade to 13.2.2
Did you manage to fix it?
Hello,
I followed the standard upgrade procedure to upgrade from 13.2.1 to 13.2.2.
After the upgrade the MDS cluster is down; mds rank 0 and the purge_queue journal are marked damaged. Resetting the purge_queue does not seem to help, as the journal still appears to be damaged afterwards.
Can anybody help?
-789> 2018-09-26 18:42:32.527 7f70f78b1700 1 mds.mds2 Updating MDS map to version 586 from mon.2
-788> 2018-09-26 18:42:32.527 7f70f78b1700 1 mds.0.583 handle_mds_map i am now mds.0.583
-787> 2018-09-26 18:42:32.527 7f70f78b1700 1 mds.0.583 handle_mds_map state change up:rejoin --> up:active
-786> 2018-09-26 18:42:32.527 7f70f78b1700 1 mds.0.583 recovery_done -- successful recovery!
<skip>
-38> 2018-09-26 18:42:32.707 7f70f28a7700 -1 mds.0.purge_queue _consume: Decode error at read_pos=0x322ec6636
-37> 2018-09-26 18:42:32.707 7f70f28a7700 5 mds.beacon.mds2 set_want_state: up:active -> down:damaged
-36> 2018-09-26 18:42:32.707 7f70f28a7700 5 mds.beacon.mds2 _send down:damaged seq 137
-35> 2018-09-26 18:42:32.707 7f70f28a7700 10 monclient: _send_mon_message to mon.ceph3 at mon:6789/0
-34> 2018-09-26 18:42:32.707 7f70f28a7700 1 -- mds:6800/e4cc09cf --> mon:6789/0 -- mdsbeacon(14c72/mds2 down:damaged seq 137 v24a) v7 -- 0x563b321ad480 con 0
<skip>
-3> 2018-09-26 18:42:32.743 7f70f98b5700 5 -- mds:6800/3838577103 >> mon:6789/0 conn(0x563b3213e000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8 cs=1 l=1). rx mon.2 seq 29 0x563b321ab880 mdsbeacon(85106/mds2 down:damaged seq 311 v587) v7
-2> 2018-09-26 18:42:32.743 7f70f98b5700 1 -- mds:6800/3838577103 <== mon.2 mon:6789/0 29 ==== mdsbeacon(85106/mds2 down:damaged seq 311 v587) v7 ==== 129+0+0 (3296573291 0 0) 0x563b321ab880 con 0x563b3213e000
-1> 2018-09-26 18:42:32.743 7f70f98b5700 5 mds.beacon.mds2 handle_mds_beacon down:damaged seq 311 rtt 0.038261
0> 2018-09-26 18:42:32.743 7f70f28a7700 1 mds.mds2 respawn!
# cephfs-journal-tool --journal=purge_queue journal inspect
Overall journal integrity: DAMAGED
0x322ec65d9-ffffffffffffffff
# cephfs-journal-tool --journal=purge_queue journal reset
old journal was 13470819801~8463
new journal start will be 13472104448 (1276184 bytes past old end)
writing journal head
done
# cephfs-journal-tool --journal=purge_queue journal inspect
2018-09-26 19:00:52.848 7f3f9fa50bc0 -1 Missing object 500.00000c8c
Overall journal integrity: DAMAGED
0xc8c
0x323000000-ffffffffffffffff
_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com