Yan, Zheng
2018-11-26 02:08:05 UTC
Did you do any special operation (such as resetting the journal) before this happened?
Hi,
According to my understanding, the parent directory's CDir fnode
version should be incremented whenever a file or directory is created
under it. But I have hit a strange problem in my environment: the
parent's fnode version is far lower than a subdirectory's inode
version, which leads to an assert failure, and the MDS can no longer run.
Does anyone know how I can recover my environment?
Any help would be greatly appreciated!
ceph version: jewel 10.2.2
mds mode: single mds
mds.log
2018-11-07 17:44:49.892205 7f4015824700 15 mds.0.journal try_to_expire
committing [dir 1 / [2,head] auth v=94977 cv=0/0 dir_auth=0
state=1610612738|complete f(v0 m2018-10-17 16:39:14.350401 3=0+3)
n(v7782 rc2018-11-07 17:44:20.552395 b3272234942693
171405=170108+1297)/n(v7782 rc2018-10-17 16:39:14.350401
b3272234942693 171394=170108+1286) hs=200+2,ss=0+0 dirty=2 | child=1
subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x7f402a7aa000]
2018-11-07 17:44:49.892249 7f4015824700 10 mds.0.cache.dir(1) commit
want 0 on [dir 1 / [2,head] auth v=94977 cv=0/0 dir_auth=0
state=1610612738|complete f(v0 m2018-10-17 16:39:14.350401 3=0+3)
n(v7782 rc2018-11-07 17:44:20.552395 b3272234942693
171405=170108+1297)/n(v7782 rc2018-10-17 16:39:14.350401
b3272234942693 171394=170108+1286) hs=200+2,ss=0+0 dirty=2 | child=1
subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x7f402a7aa000]
2018-11-07 17:44:49.892280 7f4015824700 10 mds.0.cache.dir(1) auth_pin
by 0x7f402a7aa000 on [dir 1 / [2,head] auth v=94977 cv=0/0 dir_auth=0
ap=1+0+0 state=1610612738|complete f(v0 m2018-10-17 16:39:14.350401
3=0+3) n(v7782 rc2018-11-07 17:44:20.552395 b3272234942693
171405=170108+1297)/n(v7782 rc2018-10-17 16:39:14.350401
b3272234942693 171394=170108+1286) hs=200+2,ss=0+0 dirty=2 | child=1
subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=1 0x7f402a7aa000]
count now 1 + 0
2018-11-07 17:44:49.892312 7f4015824700 10 mds.0.cache.dir(1) _commit
want 94977 on [dir 1 / [2,head] auth v=94977 cv=0/0 dir_auth=0
ap=1+0+0 state=1610612738|complete f(v0 m2018-10-17 16:39:14.350401
3=0+3) n(v7782 rc2018-11-07 17:44:20.552395 b3272234942693
171405=170108+1297)/n(v7782 rc2018-10-17 16:39:14.350401
b3272234942693 171394=170108+1286) hs=200+2,ss=0+0 dirty=2 | child=1
subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=1 0x7f402a7aa000]
2018-11-07 17:44:49.892349 7f4015824700 10 mds.0.cache.dir(1) marking committing
2018-11-07 17:44:49.892355 7f4015824700 10 mds.0.cache.dir(1) _omap_commit
2018-11-07 17:44:49.892610 7f4015824700 10 mds.0.cache.dir(1) set
bioflow [dentry #1/bioflow [2,head] auth (dversion lock) v=94976
inode=0x7f402a798c40 | lock=0 inodepin=1 dirty=1 authpin=0
0x7f402a7c4000]
2018-11-07 17:44:49.892632 7f4015824700 14 mds.0.cache.dir(1) pos 8 dn
'bioflow' inode [inode 10000000001 [...2,head] /bioflow/ auth v94976
f(v0 m2018-10-17 14:48:45.785088 4=0+4) n(v6615 rc2018-11-07
17:44:20.552395 b3212972601995 169081=168070+1011) (inest lock)
dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=1 dirtyrstat=0
dirty=1 authpin=0 0x7f402a798c40 mode: 16877]
2018-11-07 17:44:49.892924 7f4015824700 10 mds.0.cache.dir(1) rm test
[dentry #1/test [2,head] auth NULL (dversion lock) v=94948 inode=0 |
inodepin=0 dirty=1 0x7f402b156bc0]
2018-11-07 17:44:49.893070 7f4015824700 1 -- 192.168.1.7:6807/26144
--> 192.168.1.6:6801/31980 -- osd_op(mds.0.63911:110775 8.6b2cdaff
1.00000000 [tmap2omap 1~0,omap-set-header 0~258,omap-set-vals
0~458,omap-rm-keys 0~17] snapc 0=[]
ondisk+write+known_if_redirected+full_force e6778) v7 -- ?+0
0x7f40304ef8c0 con 0x7f402d350600
2018-11-07 17:44:49.893134 7f4015824700 15 mds.0.journal try_to_expire
committing [dir 10000000001 /bioflow/ [2,head] auth v=94464 cv=0/0
state=1610612736 f(v0 m2018-10-17 14:48:45.785088 4=0+4) n(v6615
rc2018-11-07 17:44:20.552395 b3212972601995 169080=168070+1010)
hs=4+0,ss=0+0 dirty=2 | child=1 dirty=1 authpin=0 0x7f402a7aa600]
// expire next log segment
18C_IO_Dir_Committed
2018-11-07 17:44:50.045917 7f4013820700 10 mds.0.cache.dir(1)
_committed v 94977 on [dir 1 / [2,head] auth v=94977 cv=94977/0
dir_auth=0 ap=1+0+1 state=1610612802|complete f(v0 m2018-10-17
16:39:14.350401 3=0+3) n(v7782 rc2018-11-07 17:44:20.552395
b3272234942693 171405=170108+1297)/n(v7782 rc2018-10-17
16:39:14.350401 b3272234942693 171394=170108+1286) hs=200+2,ss=0+0
dirty=2 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=1
0x7f402a7aa000]
2018-11-07 17:44:50.045936 7f4013820700 10 mds.0.cache.dir(1)
mark_clean [dir 1 / [2,head] auth v=94977 cv=94977/94977 dir_auth=0
ap=1+0+1 state=1610612738|complete f(v0 m2018-10-17 16:39:14.350401
3=0+3) n(v7782 rc2018-11-07 17:44:20.552395 b3272234942693
171405=170108+1297)/n(v7782 rc2018-10-17 16:39:14.350401
b3272234942693 171394=170108+1286) hs=200+2,ss=0+0 dirty=2 | child=1
subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=1 0x7f402a7aa000]
version 94977
2018-11-07 17:44:50.045955 7f4013820700 15 mds.0.cache.dir(1) dir
94977 < inode 1755186 still dirty [inode 10000003557 [...2,head]
/tree112/ auth v1755186 f(v0 m2018-09-11 17:56:17.483220 1=1+0) n(v0
rc2018-09-11 17:56:17.483220 2=1+1) (inest lock) (iversion lock)
16877]
2018-11-07 17:44:50.047480 7f4013820700 -1 mds/CDir.cc: In function
'void CDir::_committed(int, version_t)' thread 7f4013820700 time
2018-11-07 17:44:50.045990
mds/CDir.cc: 2292: FAILED assert(in->is_dirty() || in->last < ((__u64)(-2)))
ceph version 10.2.2-Summit3.0-beta2-43-g9c6f44d
(9c6f44d72d16134cc2504f76d6ee48517d046efd)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x85) [0x7f401e56d0d5]
2: (CDir::_committed(int, unsigned long)+0xec2) [0x7f401e34baf2]
3: (MDSIOContextBase::complete(int)+0xa4) [0x7f401e3bf754]
4: (Finisher::finisher_thread_entry()+0x216) [0x7f401e49c9e6]
5: (()+0x7dc5) [0x7f401d340dc5]
6: (clone()+0x6d) [0x7f401be0d28d]
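For what it's worth, my paraphrase of the check that fires: (__u64)(-2)
appears to be CEPH_NOSNAP, and CDir::_committed seems to walk the
directory's primary children after a commit, cleaning anything the
commit covered and asserting that anything newer is still dirty (or a
snapshotted inode). A self-contained sketch with stub types (not the
exact source):

#include <cassert>
#include <cstdint>
#include <vector>

using version_t = uint64_t;
const uint64_t NOSNAP = (uint64_t)(-2);   // CEPH_NOSNAP, if I read it right

struct InodeStub {
    version_t version;   // version at which the inode was last dirtied
    uint64_t  last;      // NOSNAP for a head (non-snapshot) inode
    bool      dirty;
};

// For each primary child after the dirfrag commit finishes:
void committed_check(version_t committed_version, std::vector<InodeStub>& children)
{
    for (auto& in : children) {
        if (committed_version >= in.version) {
            in.dirty = false;             // commit covered it: mark clean
        } else {
            // Child changed after the committed version, so it must still
            // be dirty (or be a snapshotted inode). My log hits this branch
            // with dir v94977 < inode v1755186 and the inode is not dirty.
            assert(in.dirty || in.last < NOSNAP);
        }
    }
}

If that reading is right, the assert fires because inode 10000003557
(/tree112/, v1755186) has a version far above the committed dir version
94977 yet is apparently no longer dirty, matching the "dir 94977 <
inode 1755186 still dirty" line just before the crash.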