Discussion:
avoid 3-mds fs laggy on 1 rejoin?
Dzianis Kahanovich
2015-10-06 10:43:41 UTC
Short version: how can I reliably avoid (if possible) fs freezes while 1 of 3 MDSs is in rejoin?

ceph version 0.94.3-242-g79385a8 (79385a85beea9bccd82c99b6bda653f0224c4fcd)

I am moving 2 VM clients from ocfs2 (which started to deadlock the VMs on
snapshot) to cephfs (at least I can back it up). Maybe I just didn't notice it
before, or maybe there is a cephfs pressure problem, but while 1 of the 3 MDSs
is rejoining (slowly!) the whole MDS cluster is stuck (the good news: all
clients stay alive afterwards). How can I make the MDS cluster reliable across
at least 1 restart?

My current mds config:

[mds]
mds recall state timeout = 120
mds bal mode = 1
mds standby replay = true
mds cache size = 500000
mds mem max = 2097152
mds op history size = 50
# vs. laggy beacon
mds decay halflife = 9
mds beacon interval = 8
mds beacon grace = 30

[mds.a]
host = megaserver1
[mds.b]
host = megaserver3
[mds.c]
host = megaserver4

(I have tried reverting all the non-default settings; as far as I can tell, no
results - fixme.)
Or maybe I need to take special care when stopping an mds (currently - SIGKILL).
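(For reference, a rough way to double-check which of these settings are
actually in effect on a running daemon - an untested sketch; the admin socket
path and daemon name below are only guesses for a default install:

# run on the node hosting mds.a
ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok config show | grep -E 'mds_(beacon|cache_size|decay|recall|standby)'
)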
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
John Spray
2015-10-06 11:02:01 UTC
On Tue, Oct 6, 2015 at 11:43 AM, Dzianis Kahanovich
Post by Dzianis Kahanovich
Short version: how can I reliably avoid (if possible) fs freezes while 1 of 3 MDSs is in rejoin?
ceph version 0.94.3-242-g79385a8 (79385a85beea9bccd82c99b6bda653f0224c4fcd)
I am moving 2 VM clients from ocfs2 (which started to deadlock the VMs on
snapshot) to cephfs (at least I can back it up). Maybe I just didn't notice it
before, or maybe there is a cephfs pressure problem, but while 1 of the 3 MDSs
is rejoining (slowly!) the whole MDS cluster is stuck (the good news: all
clients stay alive afterwards). How can I make the MDS cluster reliable across
at least 1 restart?
It's not exactly clear to me how you've got this set up. What's the
output of "ceph status"?

John
Dzianis Kahanovich
2015-10-06 11:07:06 UTC
Post by John Spray
It's not exactly clear to me how you've got this set up. What's the
output of "ceph status"?
cluster 4fc73849-f913-4689-b6a6-efcefccae8d1
health HEALTH_OK
monmap e1: 3 mons at
{a=10.227.227.101:6789/0,b=10.227.227.103:6789/0,c=10.227.227.104:6789/0}
election epoch 28556, quorum 0,1,2 a,b,c
mdsmap e7136: 1/1/1 up {0=c=up:active}, 1 up:standby-replay, 1 up:standby
osdmap e158986: 15 osds: 15 up, 15 in
pgmap v60013179: 6032 pgs, 8 pools, 6528 GB data, 2827 kobjects
16257 GB used, 6005 GB / 22263 GB avail
6032 active+clean
client io 3211 kB/s rd, 1969 kB/s wr, 176 op/s

PS I know - too many PGs; hence "mon pg warn max per osd = 1400"...
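(Rough arithmetic, assuming mostly 3x replicated pools: 6032 PGs * 3 / 15 OSDs
= ~1200 PG copies per OSD, far above the default warning threshold - which is
why that option is raised.)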
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
Dzianis Kahanovich
2015-10-06 11:57:04 UTC
PS This is a standard 3-node cluster (MON+MDS+OSDs - the initial 3x setup) plus
1 later OSD-only node. Nothing special. OSDs are balanced to near-equal size
per host.
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
John Spray
2015-10-06 12:01:20 UTC
On Tue, Oct 6, 2015 at 12:07 PM, Dzianis Kahanovich
OK, thanks. So the symptom is that when you have an MDS failure, the
standby-replay guy is coming up, but he is spending too long in
'rejoin' state, right? How long, exactly?

You could try setting a higher debug level (e.g. debug mds = 10) on
your MDS before it takes over, so that the log output can give us an
idea of what the daemon is doing while it's stuck in rejoin.
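(For example - a sketch only, substitute your own daemon id: either set
"debug mds = 10" under [mds] in ceph.conf before the restart, or inject it into
the running daemon:

ceph tell mds.b injectargs '--debug-mds 10'

and then keep the matching /var/log/ceph/ceph-mds.b.log covering the failover
window.)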

John
Dzianis Kahanovich
2015-10-06 12:22:06 UTC
Post by John Spray
OK, thanks. So the symptom is that when you have an MDS failure, the
standby-replay guy is coming up, but he is spending too long in
'rejoin' state, right? How long, exactly?
Even now, with "mds standby replay = true" removed:
e7151: 1/1/1 up {0=b=up:active}, 2 up:standby
The cluster gets stuck on KILL of the active mds.b. How do I correctly stop an
mds so that I get behaviour like with the MONs - leader->down / peon->leader?

Usually I upgrade ceph with a simple "/etc/init.d/ceph restart" (KILLs +
starts). Surprise for me - only the MDS needs special actions.
Post by John Spray
You could try setting a higher debug level (e.g. debug mds = 10) on
your MDS before it takes over, so that the log output can give us an
idea of what the daemon is doing while it's stuck in rejoin.
OK, debugging later - this is production ;). But even a 5 min. rejoin should
not be a big problem if the other 2 nodes are up. Even degraded OSDs stay
accessible (given balanced size/min_size & CRUSH).
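(By "balanced" I mean something like this - just a sketch, the pool name is an
example only:

ceph osd pool set cephfs_data size 3
ceph osd pool set cephfs_data min_size 2

so PGs stay active, even if degraded, with one host down.)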

About the rejoin time - it's simple:
# grep rejoin ceph-mds.b.log
2015-10-06 03:16:28.102415 7f91d0852700 1 mds.0.634 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 03:16:28.102417 7f91d0852700 1 mds.0.634 rejoin_start
2015-10-06 03:16:33.028668 7f91d0852700 1 mds.0.634 rejoin_joint_start
2015-10-06 03:18:05.266657 7f91cc749700 1 mds.0.634 rejoin_done
2015-10-06 03:18:06.070233 7f91d0852700 1 mds.0.634 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 03:31:22.860780 7f8ab6643700 1 mds.0.636 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 03:31:22.860783 7f8ab6643700 1 mds.0.636 rejoin_start
2015-10-06 03:31:32.771089 7f8ab6643700 1 mds.0.636 rejoin_joint_start
2015-10-06 03:32:48.644880 7f8ab2439700 1 mds.0.636 rejoin_done
2015-10-06 03:32:49.412352 7f8ab6643700 1 mds.0.636 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 03:57:03.625397 7f981c944700 1 mds.0.639 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 03:57:03.625400 7f981c944700 1 mds.0.639 rejoin_start
2015-10-06 03:57:14.561840 7f981c944700 1 mds.0.639 rejoin_joint_start
2015-10-06 03:58:26.875557 7f981883b700 1 mds.0.639 rejoin_done
2015-10-06 03:58:28.159967 7f981c944700 1 mds.0.639 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 04:12:49.984929 7f1afa6d7700 1 mds.0.642 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 04:12:49.984932 7f1afa6d7700 1 mds.0.642 rejoin_start
2015-10-06 04:13:01.391428 7f1afa6d7700 1 mds.0.642 rejoin_joint_start
2015-10-06 04:14:38.680632 7f1af65ce700 1 mds.0.642 rejoin_done
2015-10-06 04:14:39.802623 7f1afa6d7700 1 mds.0.642 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 04:23:55.942713 7f028b2a9700 1 mds.0.645 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 04:23:55.942716 7f028b2a9700 1 mds.0.645 rejoin_start
2015-10-06 04:24:06.260830 7f028b2a9700 1 mds.0.645 rejoin_joint_start
2015-10-06 04:24:19.627641 7f028699f700 1 mds.0.645 suicide. wanted down:dne,
now up:rejoin
2015-10-06 04:35:53.910743 7f32cee3d700 1 mds.0.648 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 04:35:53.910746 7f32cee3d700 1 mds.0.648 rejoin_start
2015-10-06 04:36:03.541504 7f32cee3d700 1 mds.0.648 rejoin_joint_start
2015-10-06 04:37:14.470805 7f32cad34700 1 mds.0.648 rejoin_done
2015-10-06 04:37:15.390864 7f32cee3d700 1 mds.0.648 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 04:40:46.878251 7f6600df0700 1 mds.0.651 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 04:40:46.878254 7f6600df0700 1 mds.0.651 rejoin_start
2015-10-06 04:40:57.984821 7f6600df0700 1 mds.0.651 rejoin_joint_start
2015-10-06 04:43:23.230549 7f65fcce7700 1 mds.0.651 rejoin_done
2015-10-06 04:43:23.841793 7f6600df0700 1 mds.0.651 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 04:50:56.961706 7fb5871a5700 1 mds.0.655 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 04:50:56.961709 7fb5871a5700 1 mds.0.655 rejoin_start
2015-10-06 04:51:06.743421 7fb5871a5700 1 mds.0.655 rejoin_joint_start
2015-10-06 04:51:09.134144 7fb58289b700 1 mds.0.655 suicide. wanted down:dne,
now up:rejoin
2015-10-06 04:56:27.819070 7f64123e5700 1 mds.0.657 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 04:56:27.819072 7f64123e5700 1 mds.0.657 rejoin_start
2015-10-06 04:56:27.839223 7f64123e5700 1 mds.0.657 rejoin_joint_start
2015-10-06 04:56:30.375895 7f640e2dc700 1 mds.0.657 rejoin_done
2015-10-06 04:56:31.858593 7f64123e5700 1 mds.0.657 handle_mds_map state change
up:rejoin --> up:clientreplay
2015-10-06 05:06:11.023545 7feef429a700 1 mds.0.660 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 05:06:11.023548 7feef429a700 1 mds.0.660 rejoin_start
2015-10-06 05:06:11.433153 7feef429a700 1 mds.0.660 rejoin_joint_start
2015-10-06 05:06:46.113313 7feef1a95700 1 mds.0.660 rejoin_done
2015-10-06 05:06:47.515843 7feef429a700 1 mds.0.660 handle_mds_map state change
up:rejoin --> up:clientreplay
2015-10-06 09:42:59.932714 7fccadb81700 1 mds.0.664 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 09:42:59.932717 7fccadb81700 1 mds.0.664 rejoin_start
2015-10-06 09:43:00.497196 7fccadb81700 1 mds.0.664 rejoin_joint_start
2015-10-06 09:43:57.889918 7fcca9a78700 1 mds.0.664 rejoin_done
2015-10-06 09:43:58.490246 7fccadb81700 1 mds.0.664 handle_mds_map state change
up:rejoin --> up:clientreplay
2015-10-06 10:42:24.162929 7f7ddf175700 1 mds.0.666 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 10:42:24.162931 7f7ddf175700 1 mds.0.666 rejoin_start
2015-10-06 10:42:38.235885 7f7ddf175700 1 mds.0.666 rejoin_joint_start
2015-10-06 10:47:30.636129 7f7ddb06c700 1 mds.0.666 rejoin_done
2015-10-06 10:47:32.037131 7f7ddf175700 1 mds.0.666 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 12:10:32.496677 7f94435f4700 1 mds.0.670 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 12:10:32.496681 7f94435f4700 1 mds.0.670 rejoin_start
2015-10-06 12:10:45.968556 7f94435f4700 1 mds.0.670 rejoin_joint_start
2015-10-06 12:14:10.590516 7f943ecea700 1 mds.0.670 suicide. wanted down:dne,
now up:rejoin
2015-10-06 12:57:19.796554 7fbe9955b700 1 mds.0.676 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 12:57:19.796557 7fbe9955b700 1 mds.0.676 rejoin_start
2015-10-06 12:57:31.080582 7fbe9955b700 1 mds.0.676 rejoin_joint_start
2015-10-06 12:59:39.291300 7fbe95452700 1 mds.0.676 rejoin_done
2015-10-06 12:59:40.162822 7fbe9955b700 1 mds.0.676 handle_mds_map state change
up:rejoin --> up:active
2015-10-06 14:41:48.552281 7f8bc218d700 1 mds.0.681 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 14:41:48.552284 7f8bc218d700 1 mds.0.681 rejoin_start
2015-10-06 14:41:49.242241 7f8bc218d700 1 mds.0.681 rejoin_joint_start
2015-10-06 14:42:32.421263 7f8bbe084700 1 mds.0.681 rejoin_done
2015-10-06 14:42:33.341350 7f8bc218d700 1 mds.0.681 handle_mds_map state change
up:rejoin --> up:active
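(To put numbers on it - an untested awk sketch over the same log format as
above:

awk '/rejoin_start/ { split($2,t,":"); s = t[1]*3600 + t[2]*60 + t[3] }
     /rejoin_done/  { split($2,t,":"); printf "%s rejoin %.0f s\n", $5, t[1]*3600 + t[2]*60 + t[3] - s }' ceph-mds.b.log

e.g. mds.0.666 above went 10:42:24 -> 10:47:30, about 5 minutes in rejoin
alone.)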
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
John Spray
2015-10-06 12:34:20 UTC
On Tue, Oct 6, 2015 at 1:22 PM, Dzianis Kahanovich
Post by Dzianis Kahanovich
e7151: 1/1/1 up {0=b=up:active}, 2 up:standby
The cluster gets stuck on KILL of the active mds.b. How do I correctly stop an
mds so that I get behaviour like with the MONs - leader->down / peon->leader?
It's not clear to me why you're saying it's stuck. Is it stuck, or is it slow?

From that log it looks like you're restarting mds.b many times in one
day, that's kind of unusual. Are you really doing all those restarts
by hand, or is something else going wrong?

The expected behaviour is that when mds.b restarts, the mon notices
that the old mds.b instance is dead, and hands the role to the
standby-replay mds.
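(To observe the handover, something like "ceph -w", or repeating
"ceph mds stat" during the restart, should show the replacement daemon stepping
through replay -> reconnect -> rejoin -> active in the mdsmap.)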
Post by Dzianis Kahanovich
Usually I upgrade ceph with a simple "/etc/init.d/ceph restart" (KILLs +
starts). Surprise for me - only the MDS needs special actions.
What special actions are you having to perform? It looks like your
cluster is coming back online eventually?

John
Dzianis Kahanovich
2015-10-06 13:21:42 UTC
Post by John Spray
It's not clear to me why you're saying it's stuck. Is it stuck, or is it slow?
It is totally asleep (stuck) until HEALTH_OK (until the rejoin completes). Not
slow. "mds cluster degraded".
Post by John Spray
What special actions are you having to perform? It looks like your
cluster is coming back online eventually?
I have not tested it yet; something like:
ceph mds stop <who>
ceph mds deactivate <who>
ceph mds tell <who> <args> [<args>...]
- before the KILL

- i.e. something that tells the mds to give up its "active" status and hand it
to another daemon as part of a normal shutdown. I am also looking at
"mds shutdown check = <int>" (?).
Or fix the mds to do this itself on KILL, if nothing like that exists.
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
John Spray
2015-10-06 14:00:26 UTC
On Tue, Oct 6, 2015 at 2:21 PM, Dzianis Kahanovich
Post by Dzianis Kahanovich
It is totally asleep (stuck) until HEALTH_OK (until the rejoin completes). Not
slow. "mds cluster degraded".
Okay, so if I understand you correctly, "it" means the client IO. The
MDS cluster isn't stuck, but client metadata operations are blocked
while the MDS cluster is degraded. That is expected behaviour.

The idea is that MDS failover should be rare and quick enough that the
interruption to client IO isn't a problem, so the interesting part is
finding out why the failover isn't happening quickly enough.

Next time you go through this process, turn up the MDS debug logs (if
10 is too verbose for your system, maybe just set to 7 or so), and
also capture the relevant section of the cluster log (i.e. the
ceph.log) so that we can see how ranks are being assigned during the
failover event. That would give us enough information to know why
this is taking longer than it should.
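(The cluster log is normally /var/log/ceph/ceph.log on the mon hosts - assuming
default log paths - so something like

grep -i mds /var/log/ceph/ceph.log

around the failover window, plus the MDS daemon log, should be enough.)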
Post by Dzianis Kahanovich
Post by John Spray
What special actions are you having to perform? It looks like your
cluster is coming back online eventually?
ceph mds stop <who>
ceph mds deactivate <who>
ceph mds tell <who> <args> [<args>...]
- before the KILL
- i.e. something that tells the mds to give up its "active" status and hand it
to another daemon as part of a normal shutdown. I am also looking at
"mds shutdown check = <int>" (?).
Or fix the mds to do this itself on KILL, if nothing like that exists.
I see that you've listed some commands, but I'm not sure I understand
what action you're actually taking here? If you're looking for the
command that notifies ceph that an MDS daemon is gone for good and
another daemon should take over, it's "ceph mds fail <rank>".
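(For example, if the active daemon holds rank 0 - as your "mds.0.*" log lines
suggest - that would be:

ceph mds fail 0

after which a standby should be assigned to the rank.)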

John
Dzianis Kahanovich
2015-10-06 14:53:04 UTC
Post by John Spray
Okay, so if I understand you correctly, "it" means the client IO. The
MDS cluster isn't stuck, but client metadata operations are blocked
while the MDS cluster is degraded. That is expected behaviour.
Yes, "it" = IO ;). Simply: "ls /mnt/ceph" waits until the "HEALTH_WARN: mds
cluster degraded" state ends ("ceph health detail" says the node is in
"rejoin"; same in the log).
Post by John Spray
I see that you've listed some commands, but I'm not sure I understand
what action you're actually taking here? If you're looking for the
command that notifies ceph that an MDS daemon is gone for good and
another daemon should take over, it's "ceph mds fail <rank>".
I want to tell the mds to delegate its "active" state to somebody else as part
of a normal shutdown (deactivation) process, if possible. Sometimes a similar
"ceph osd down" before the kill helps avoid misbehaviour (on a busy node/mon
the osd can be taken out of service immediately, without waiting for the
timeout). As I understand it, "standby-replay" should keep another mds ready
for fast activation, but in theory the same could happen (plus a wait) if mds
replication only started at shutdown.
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
Dzianis Kahanovich
2015-10-07 15:46:54 UTC
John Spray writes:

[...]

Here is part of the log from the restarted mds at debug 7 (without
standby-replay, but IMHO it doesn't matter):

(PS How [un]safe are multiple active MDSs in the current hammer? For now I am
trying it, temporarily running with "set_max_mds 3", but shutting down 1 mds is
still too laggy for the affected clients. And the 4.2.3 kernel driver is still
buggy and at least needs heavy tuning to work properly under a real busy
load...)

2015-10-06 23:38:34.420499 7f256905a780 0 ceph version 0.94.3-242-g79385a8
(79385a85beea9bccd82c99b6bda653f0224c4fcd), process ceph-mds, pid 17337
2015-10-06 23:38:34.590909 7f256905a780 -1 mds.-1.0 log_to_monitors {default=true}
2015-10-06 23:38:34.608101 7f2561355700 7 mds.-1.server handle_osd_map: full =
0 epoch = 159091
2015-10-06 23:38:34.608457 7f2561355700 5 mds.-1.0 handle_mds_map epoch 7241
from mon.2
2015-10-06 23:38:34.608558 7f2561355700 7 mds.-1.server handle_osd_map: full =
0 epoch = 159091
2015-10-06 23:38:34.608614 7f2561355700 5 mds.-1.-1 handle_mds_map epoch 7241
from mon.2
2015-10-06 23:38:34.608618 7f2561355700 5 mds.-1.-1 old map epoch 7241 <=
7241, discarding
2015-10-06 23:38:35.047873 7f2561355700 7 mds.-1.server handle_osd_map: full =
0 epoch = 159092
2015-10-06 23:38:35.390461 7f2561355700 5 mds.-1.-1 handle_mds_map epoch 7242
from mon.2
2015-10-06 23:38:35.390529 7f2561355700 1 mds.-1.0 handle_mds_map standby
2015-10-06 23:38:35.607255 7f2561355700 5 mds.-1.0 handle_mds_map epoch 7243
from mon.2
2015-10-06 23:38:35.607292 7f2561355700 1 mds.0.688 handle_mds_map i am now
mds.0.688
2015-10-06 23:38:35.607310 7f2561355700 1 mds.0.688 handle_mds_map state change
up:standby --> up:replay
2015-10-06 23:38:35.607313 7f2561355700 1 mds.0.688 replay_start
2015-10-06 23:38:35.607316 7f2561355700 7 mds.0.cache set_recovery_set
2015-10-06 23:38:35.607318 7f2561355700 1 mds.0.688 recovery set is
2015-10-06 23:38:35.607321 7f2561355700 2 mds.0.688 boot_start 0: opening inotable
2015-10-06 23:38:35.648096 7f2561355700 2 mds.0.688 boot_start 0: opening
sessionmap
2015-10-06 23:38:35.648227 7f2561355700 2 mds.0.688 boot_start 0: opening mds log
2015-10-06 23:38:35.648230 7f2561355700 5 mds.0.log open discovering log bounds
2015-10-06 23:38:35.648281 7f2561355700 2 mds.0.688 boot_start 0: opening snap
table
2015-10-06 23:38:35.695211 7f255bf47700 4 mds.0.log Waiting for journal 300 to
recover...
2015-10-06 23:38:35.699398 7f255bf47700 4 mds.0.log Journal 300 recovered.
2015-10-06 23:38:35.699408 7f255bf47700 4 mds.0.log Recovered journal 300 in
format 1
2015-10-06 23:38:35.699413 7f255bf47700 2 mds.0.688 boot_start 1:
loading/discovering base inodes
2015-10-06 23:38:35.699501 7f255bf47700 0 mds.0.cache creating system inode
with ino:100
2015-10-06 23:38:35.716269 7f255bf47700 0 mds.0.cache creating system inode
with ino:1
2015-10-06 23:38:35.851761 7f255eb50700 2 mds.0.688 boot_start 2: replaying mds log
2015-10-06 23:38:36.272808 7f255bf47700 7 mds.0.cache adjust_subtree_auth -1,-2
-> -2,-2 on [dir 1 / [2,head] auth v=0 cv=0/0 state=1073741824 f() n()
hs=0+0,ss=0+0 0x5cde000]
2015-10-06 23:38:36.272837 7f255bf47700 7 mds.0.cache current root is [dir 1 /
[2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 | subtree=1
0x5cde000]
2015-10-06 23:38:36.272849 7f255bf47700 7 mds.0.cache adjust_subtree_auth -1,-2
-> -2,-2 on [dir 100 ~mds0/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n()
hs=0+0,ss=0+0 0x5cde3c0]
2015-10-06 23:38:36.272855 7f255bf47700 7 mds.0.cache current root is [dir 100
~mds0/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 |
subtree=1 0x5cde3c0]
2015-10-06 23:38:36.282049 7f255bf47700 7 mds.0.cache
adjust_bounded_subtree_auth -2,-2 -> 0,-2 on [dir 1 / [2,head] auth v=24837995
cv=0/0 dir_auth=-2 state=1073741824 f(v175 m2015-10-05 14:34:45.120292 73=45+28)
n(v2043504 rc2015-10-06 23:16:58.837591 b4465915097499
1125727=964508+161219)/n(v2043504 rc2015-10-06 23:16:46.667354 b4465915097417
1125726=964507+161219) hs=0+0,ss=0+0 | subtree=1 0x5cde000] bound_dfs []
2015-10-06 23:38:36.282086 7f255bf47700 7 mds.0.cache
adjust_bounded_subtree_auth -2,-2 -> 0,-2 on [dir 1 / [2,head] auth v=24837995
cv=0/0 dir_auth=-2 state=1073741824 f(v175 m2015-10-05 14:34:45.120292 73=45+28)
n(v2043504 rc2015-10-06 23:16:58.837591 b4465915097499
1125727=964508+161219)/n(v2043504 rc2015-10-06 23:16:46.667354 b4465915097417
1125726=964507+161219) hs=0+0,ss=0+0 | subtree=1 0x5cde000] bounds
2015-10-06 23:38:36.282100 7f255bf47700 7 mds.0.cache current root is [dir 1 /
[2,head] auth v=24837995 cv=0/0 dir_auth=-2 state=1073741824 f(v175 m2015-10-05
14:34:45.120292 73=45+28) n(v2043504 rc2015-10-06 23:16:58.837591 b4465915097499
1125727=964508+161219)/n(v2043504 rc2015-10-06 23:16:46.667354 b4465915097417
1125726=964507+161219) hs=0+0,ss=0+0 | subtree=1 0x5cde000]
2015-10-06 23:38:36.282147 7f255bf47700 7 mds.0.cache
adjust_bounded_subtree_auth -2,-2 -> 0,-2 on [dir 100 ~mds0/ [2,head] auth
v=28417783 cv=0/0 dir_auth=-2 state=1073741824 f(v0 11=1+10) n(v821531
rc2015-10-06 23:16:58.847591 b2144931 594=584+10)/n(v821531 rc2015-10-06
23:14:18.064458 b2136353 593=583+10) hs=0+0,ss=0+0 | subtree=1 0x5cde3c0]
bound_dfs []
2015-10-06 23:38:36.282161 7f255bf47700 7 mds.0.cache
adjust_bounded_subtree_auth -2,-2 -> 0,-2 on [dir 100 ~mds0/ [2,head] auth
v=28417783 cv=0/0 dir_auth=-2 state=1073741824 f(v0 11=1+10) n(v821531
rc2015-10-06 23:16:58.847591 b2144931 594=584+10)/n(v821531 rc2015-10-06
23:14:18.064458 b2136353 593=583+10) hs=0+0,ss=0+0 | subtree=1 0x5cde3c0] bounds
2015-10-06 23:38:36.282171 7f255bf47700 7 mds.0.cache current root is [dir 100
~mds0/ [2,head] auth v=28417783 cv=0/0 dir_auth=-2 state=1073741824 f(v0
11=1+10) n(v821531 rc2015-10-06 23:16:58.847591 b2144931 594=584+10)/n(v821531
rc2015-10-06 23:14:18.064458 b2136353 593=583+10) hs=0+0,ss=0+0 | subtree=1
0x5cde3c0]
2015-10-06 23:38:36.282180 7f255bf47700 7 mds.0.cache recalc_auth_bits (replay)
2015-10-06 23:38:36.282183 7f255bf47700 7 mds.0.cache show_cache
2015-10-06 23:38:36.282195 7f255bf47700 7 mds.0.cache unlinked [inode 100
[...2,head] ~mds0/ auth v3265584 snaprealm=0x38be880 f(v0 11=1+10) n(v817982
rc2015-10-06 08:04:17.159725 b1990579 489=476+13)/n(v0 11=1+10) (iversion lock)
| dirfrag=1 0x3ab8000]
2015-10-06 23:38:36.282210 7f255bf47700 7 mds.0.cache dirfrag [dir 100 ~mds0/
[2,head] auth v=28417783 cv=0/0 dir_auth=0 state=1073741824 f(v0 11=1+10)
n(v821531 rc2015-10-06 23:16:58.847591 b2144931 594=584+10)/n(v821531
rc2015-10-06 23:14:18.064458 b2136353 593=583+10) hs=0+0,ss=0+0 | subtree=1
0x5cde3c0]
2015-10-06 23:38:36.282226 7f255bf47700 7 mds.0.cache unlinked [inode 1
[...2,head] / auth v10215492 snaprealm=0x38beac0 f(v175 m2015-09-17
16:18:51.151569 69=41+28) n(v1901802 rc2015-09-18 03:16:52.300905 b5226530189072
205909=160799+45110)/n(v0 1=0+1) (iversion lock) | dirfrag=1 0x3ab89a0]
2015-10-06 23:38:36.282236 7f255bf47700 7 mds.0.cache dirfrag [dir 1 /
[2,head] auth v=24837995 cv=0/0 dir_auth=0 state=1073741824 f(v175 m2015-10-05
14:34:45.120292 73=45+28) n(v2043504 rc2015-10-06 23:16:58.837591 b4465915097499
1125727=964508+161219)/n(v2043504 rc2015-10-06 23:16:46.667354 b4465915097417
1125726=964507+161219) hs=0+0,ss=0+0 | subtree=1 0x5cde000]
2015-10-06 23:38:36.652925 7f255bf47700 0 log_channel(cluster) log [WRN] :
replayed op client.5330402:205687,205685 used ino 10000a6cc65 but session next
is 10000a666f7
2015-10-06 23:38:36.652992 7f255bf47700 0 log_channel(cluster) log [WRN] :
replayed op client.5330402:205691,205685 used ino 10000a6cc66 but session next
is 10000a666f7
2015-10-06 23:38:36.653067 7f255bf47700 0 log_channel(cluster) log [WRN] :
replayed op client.5330402:205693,205685 used ino 10000a6cc67 but session next
is 10000a666f7
2015-10-06 23:38:36.767359 7f255bf47700 1 mds.0.688 replay_done
2015-10-06 23:38:36.767372 7f255bf47700 1 mds.0.688 making mds journal writeable
2015-10-06 23:38:36.767824 7f255bf47700 2 mds.0.688 i am alone, moving to state
reconnect
2015-10-06 23:38:36.767827 7f255bf47700 3 mds.0.688 request_state up:reconnect
2015-10-06 23:38:38.153519 7f2559a30700 0 -- 10.227.227.104:6801/17337 >>
10.227.227.104:0/928408192 pipe(0x546f000 sd=29 :6801 s=0 pgs=0 cs=0 l=0
c=0x38e6880).accept peer addr is really 10.227.227.104:0/928408192 (socket is
10.227.227.104:44734/0)
2015-10-06 23:38:38.153764 7f255992f700 0 -- 10.227.227.104:6801/17337 >>
10.227.227.106:0/2753586590 pipe(0x5480000 sd=30 :6801 s=0 pgs=0 cs=0 l=0
c=0x38e6700).accept peer addr is really 10.227.227.106:0/2753586590 (socket is
10.227.227.106:40232/0)
2015-10-06 23:38:38.154183 7f2561355700 7 mds.0.server handle_client_reconnect
client.5293917
2015-10-06 23:38:38.154392 7f2561355700 7 mds.0.server handle_client_reconnect
client.5293920
2015-10-06 23:38:38.154676 7f2561355700 7 mds.0.server handle_client_reconnect
client.5297061
2015-10-06 23:38:38.154927 7f2561355700 7 mds.0.server handle_client_reconnect
client.5299230
2015-10-06 23:38:38.159228 7f2561355700 7 mds.0.server handle_client_reconnect
client.5285889
2015-10-06 23:38:38.160108 7f2559026700 0 -- 10.227.227.104:6801/17337 >>
10.227.227.103:0/4210455448 pipe(0x54a4000 sd=34 :6801 s=0 pgs=0 cs=0 l=0
c=0x38e6b80).accept peer addr is really 10.227.227.103:0/4210455448 (socket is
10.227.227.103:49749/0)
2015-10-06 23:38:38.160243 7f2558f25700 0 -- 10.227.227.104:6801/17337 >>
10.227.227.101:0/3087077443 pipe(0x54a9000 sd=35 :6801 s=0 pgs=0 cs=0 l=0
c=0x38e6d00).accept peer addr is really 10.227.227.101:0/3087077443 (socket is
10.227.227.101:44073/0)
2015-10-06 23:38:38.160340 7f2558d23700 0 -- 10.227.227.104:6801/17337 >>
10.227.227.102:0/1177330677 pipe(0x54bb000 sd=37 :6801 s=0 pgs=0 cs=0 l=0
c=0x38e7000).accept peer addr is really 10.227.227.102:0/1177330677 (socket is
10.227.227.102:35575/0)
2015-10-06 23:38:38.160723 7f2561355700 7 mds.0.server handle_client_reconnect
client.5294037
2015-10-06 23:38:38.160865 7f2561355700 7 mds.0.server handle_client_reconnect
client.5303329
2015-10-06 23:38:38.160932 7f2561355700 7 mds.0.server handle_client_reconnect
client.3093290
2015-10-06 23:38:38.161235 7f2561355700 7 mds.0.server handle_client_reconnect
client.5300956
2015-10-06 23:38:38.161808 7f2561355700 7 mds.0.server handle_client_reconnect
client.5306077
2015-10-06 23:38:38.162426 7f2561355700 7 mds.0.server handle_client_reconnect
client.3102997
2015-10-06 23:38:38.300492 7f2561355700 5 mds.0.688 handle_mds_map epoch 7244
from mon.2
2015-10-06 23:38:38.300564 7f2561355700 1 mds.0.688 handle_mds_map i am now
mds.0.688
2015-10-06 23:38:38.300571 7f2561355700 1 mds.0.688 handle_mds_map state change
up:replay --> up:reconnect
2015-10-06 23:38:38.300574 7f2561355700 1 mds.0.688 reconnect_start
2015-10-06 23:38:38.300576 7f2561355700 1 mds.0.688 reopen_log
2015-10-06 23:38:38.330199 7f2561355700 1 mds.0.server reconnect_clients -- 16
sessions
2015-10-06 23:38:38.330219 7f2561355700 7 mds.0.server handle_client_reconnect
client.5293917
2015-10-06 23:38:38.330262 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5293917 10.227.227.104:0/928408192 after 0.000023
2015-10-06 23:38:38.330330 7f2561355700 7 mds.0.server handle_client_reconnect
client.5293920
2015-10-06 23:38:38.330352 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5293920 10.227.227.106:0/2753586590 after 0.000137
2015-10-06 23:38:38.330362 7f2561355700 7 mds.0.server handle_client_reconnect
client.5297061
2015-10-06 23:38:38.330375 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5297061 10.227.227.104:0/2947571808 after 0.000165
2015-10-06 23:38:38.330388 7f2561355700 7 mds.0.server handle_client_reconnect
client.5299230
2015-10-06 23:38:38.330402 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5299230 10.227.227.104:0/4118952172 after 0.000191
2015-10-06 23:38:38.330445 7f2561355700 7 mds.0.server handle_client_reconnect
client.5285889
2015-10-06 23:38:38.330463 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5285889 10.227.227.101:0/20130 after 0.000251
2015-10-06 23:38:38.330474 7f2561355700 7 mds.0.server handle_client_reconnect
client.5294037
2015-10-06 23:38:38.330487 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5294037 10.227.227.103:0/4210455448 after 0.000277
2015-10-06 23:38:38.330497 7f2561355700 7 mds.0.server handle_client_reconnect
client.5303329
2015-10-06 23:38:38.330509 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5303329 10.227.227.101:0/3087077443 after 0.000300
2015-10-06 23:38:38.330527 7f2561355700 7 mds.0.server handle_client_reconnect
client.3093290
2015-10-06 23:38:38.330541 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.3093290 10.227.227.102:0/1177330677 after 0.000330
2015-10-06 23:38:38.330561 7f2561355700 7 mds.0.server handle_client_reconnect
client.5300956
2015-10-06 23:38:38.330573 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5300956 10.227.227.106:0/109306 after 0.000363
2015-10-06 23:38:38.330581 7f2561355700 7 mds.0.server handle_client_reconnect
client.5306077
2015-10-06 23:38:38.330593 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5306077 10.227.227.104:0/2137 after 0.000383
2015-10-06 23:38:38.330601 7f2561355700 7 mds.0.server handle_client_reconnect
client.3102997
2015-10-06 23:38:38.330614 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.3102997 10.227.227.102:0/8221 after 0.000403
2015-10-06 23:38:38.330650 7f2561355700 5 mds.0.bal rebalance done
2015-10-06 23:38:38.330951 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.330967 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.331075 7f2561355700 7 mds.0.server handle_client_reconnect
client.5330090
2015-10-06 23:38:38.331090 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5330090 10.227.227.104:0/3762438977 after 0.000878
2015-10-06 23:38:38.331138 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.331148 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.331254 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.331454 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.331500 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.331602 7f2561355700 7 mds.0.server handle_client_reconnect
client.5325905
2015-10-06 23:38:38.331616 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5325905 10.227.227.103:0/17787 after 0.001405
2015-10-06 23:38:38.332393 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.361193 7f2561355700 3 mds.0.server queuing replayed op
2015-10-06 23:38:38.361286 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.361360 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.390186 7f2557e14700 0 -- 10.227.227.104:6801/17337 >>
10.227.227.200:0/2048866910 pipe(0x3b87000 sd=43 :6801 s=0 pgs=0 cs=0 l=0
c=0x38e7900).accept peer addr is really 10.227.227.200:0/2048866910 (socket is
10.227.227.200:42145/0)
2015-10-06 23:38:38.390688 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.390751 7f2561355700 7 mds.0.server handle_client_reconnect
client.5325809
2015-10-06 23:38:38.390775 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5325809 10.227.227.200:0/2048866910 after 0.060554
2015-10-06 23:38:38.396656 7f2561355700 7 mds.0.server handle_client_reconnect
client.5330402
2015-10-06 23:38:38.396672 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5330402 10.227.227.204:0/3522816523 after 0.066459
2015-10-06 23:38:38.629573 7f2561355700 3 mds.0.server queuing replayed op
2015-10-06 23:38:38.629608 7f2561355700 3 mds.0.server queuing replayed op
2015-10-06 23:38:38.629622 7f2561355700 3 mds.0.server queuing replayed op
2015-10-06 23:38:38.629639 7f2561355700 3 mds.0.server queuing replayed op
2015-10-06 23:38:38.629661 7f2561355700 3 mds.0.server queuing replayed op
2015-10-06 23:38:38.629679 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.629698 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.629753 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.629779 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.629792 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.629845 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.629887 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.629943 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:38.629962 7f2561355700 7 mds.0.locker handle_client_caps on
10000a68438 follows 1 op flush
2015-10-06 23:38:38.629976 7f2561355700 7 mds.0.locker handle_client_caps on
10000a6843a follows 1 op flush
2015-10-06 23:38:38.630088 7f2561355700 7 mds.0.locker handle_client_caps on
10000a6843c follows 1 op flush
2015-10-06 23:38:38.630104 7f2561355700 7 mds.0.locker handle_client_caps on
10000a68439 follows 1 op flush
2015-10-06 23:38:38.630119 7f2561355700 7 mds.0.locker handle_client_caps on
10000a6843b follows 1 op flush
2015-10-06 23:38:38.858273 7f2561355700 7 mds.0.locker handle_client_caps on
10000a348a2 follows 0 op update
2015-10-06 23:38:38.858329 7f2561355700 7 mds.0.locker handle_client_caps on
10000a348a2 follows 0 op update
2015-10-06 23:38:38.931186 7f2561355700 7 mds.0.locker handle_client_caps on
10000a35092 follows 0 op update
2015-10-06 23:38:38.958907 7f2561355700 7 mds.0.locker handle_client_caps on
10000a35092 follows 0 op update
2015-10-06 23:38:39.111386 7f2561355700 7 mds.0.locker handle_client_caps on
10000a350ed follows 0 op update
2015-10-06 23:38:39.111791 7f2561355700 7 mds.0.locker handle_client_caps on
10000a350ed follows 0 op update
2015-10-06 23:38:39.157380 7f2561355700 7 mds.0.locker handle_client_caps on
10000a348a2 follows 0 op update
2015-10-06 23:38:39.291335 7f2561355700 7 mds.0.server handle_client_reconnect
client.5298246
2015-10-06 23:38:39.291369 7f2561355700 0 log_channel(cluster) log [DBG] :
reconnect by client.5298246 10.227.227.205:0/3532402607 after 0.961140
2015-10-06 23:38:39.725442 7f2561355700 7 mds.0.server reconnect_gather_finish.
failed on 0 clients
2015-10-06 23:38:39.725454 7f2561355700 1 mds.0.688 reconnect_done
2015-10-06 23:38:39.725455 7f2561355700 3 mds.0.688 request_state up:rejoin
2015-10-06 23:38:39.860274 7f2561355700 7 mds.0.locker handle_client_caps on
10000a68440 follows 1 op update
2015-10-06 23:38:39.860333 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:39.860343 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:39.860364 7f2561355700 7 mds.0.locker handle_client_caps on
10000a68442 follows 1 op update
2015-10-06 23:38:39.860381 7f2561355700 7 mds.0.locker handle_client_caps on
10000a68441 follows 1 op update
2015-10-06 23:38:39.914539 7f2561355700 7 mds.0.locker handle_client_caps on
10000a474e9 follows 0 op update
2015-10-06 23:38:40.327674 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:40.726731 7f2561355700 5 mds.0.688 handle_mds_map epoch 7245
from mon.2
2015-10-06 23:38:40.726778 7f2561355700 1 mds.0.688 handle_mds_map i am now
mds.0.688
2015-10-06 23:38:40.726780 7f2561355700 1 mds.0.688 handle_mds_map state change
up:reconnect --> up:rejoin
2015-10-06 23:38:40.726783 7f2561355700 1 mds.0.688 rejoin_start
2015-10-06 23:38:52.802698 7f2561355700 1 mds.0.688 rejoin_joint_start
2015-10-06 23:38:52.802702 7f2561355700 7 mds.0.cache rejoin_send_rejoins still
processing imported caps, delaying
2015-10-06 23:38:52.802716 7f2561355700 5 mds.0.bal rebalance done
2015-10-06 23:38:52.803065 7f2561355700 7 mds.0.locker handle_client_caps on
100009b0da6 follows 0 op update
2015-10-06 23:38:52.803091 7f2561355700 7 mds.0.locker handle_client_caps on
10000a68444 follows 1 op update
2015-10-06 23:38:52.803511 7f255eb50700 7 mds.0.cache.dir(1000002e7cf) already
fetching; waiting
2015-10-06 23:38:52.803614 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:38:52.803636 7f255eb50700 7 mds.0.cache.dir(1000002e7cf) already
fetching; waiting
2015-10-06 23:38:52.803718 7f255eb50700 7 mds.0.cache.dir(1000002e7cf) already
fetching; waiting
2015-10-06 23:38:52.803796 7f255eb50700 7 mds.0.cache.dir(1000002e7cf) already
fetching; waiting
2015-10-06 23:38:52.803842 7f255eb50700 7 mds.0.cache.dir(1000002e7cf) already
fetching; waiting
2015-10-06 23:38:52.804182 7f255eb50700 7 mds.0.cache.dir(1000002e7cf) already
fetching; waiting
2015-10-06 23:38:52.804193 7f255eb50700 7 mds.0.cache.dir(1000002e7cf) already
fetching; waiting
[...]
2015-10-06 23:43:40.232678 7f255d24c700 7 mds.0.688 mds has 3 queued contexts
2015-10-06 23:43:40.254516 7f255eb50700 7 mds.0.cache.dir(100009a7cdd) already
fetching; waiting
2015-10-06 23:43:40.282586 7f255eb50700 7 mds.0.cache.dir(100009e3a8c) already
fetching; waiting
2015-10-06 23:43:40.327519 7f2561355700 3 mds.0.server not active yet, waiting
2015-10-06 23:43:40.344125 7f255eb50700 7 mds.0.cache.dir(100009aca65) already
fetching; waiting
2015-10-06 23:43:40.382397 7f255eb50700 7 mds.0.cache.dir(100009a8068) already
fetching; waiting
2015-10-06 23:43:40.406319 7f255eb50700 7 mds.0.cache.dir(100009e2ba3) already
fetching; waiting
2015-10-06 23:43:40.439557 7f255eb50700 7 mds.0.cache.dir(100009a84ee) already
fetching; waiting
2015-10-06 23:43:40.481450 7f255eb50700 7 mds.0.cache.dir(100009c5531) already
fetching; waiting
2015-10-06 23:43:40.522600 7f255eb50700 7 mds.0.cache.dir(100009a4a3e) already
fetching; waiting
2015-10-06 23:43:40.537671 7f255eb50700 7 mds.0.cache.dir(100009acc63) already
fetching; waiting
2015-10-06 23:43:40.539612 7f255eb50700 7 mds.0.cache.dir(100009dc614) already
fetching; waiting
2015-10-06 23:43:40.610772 7f255eb50700 7 mds.0.cache.dir(100009dc614) already
fetching; waiting
2015-10-06 23:43:40.640233 7f255eb50700 7 mds.0.cache.dir(100009f2867) already
fetching; waiting
2015-10-06 23:43:40.647848 7f255eb50700 7 mds.0.cache.dir(100009d0739) already
fetching; waiting
2015-10-06 23:43:40.663730 7f255eb50700 7 mds.0.cache.dir(100009a8776) already
fetching; waiting
2015-10-06 23:43:40.723566 7f255eb50700 7 mds.0.cache.dir(100009de1cd) already
fetching; waiting
2015-10-06 23:43:40.747002 7f255eb50700 7 mds.0.cache.dir(100009c8188) already
fetching; waiting
2015-10-06 23:43:40.780093 7f255eb50700 7 mds.0.cache.dir(10000a404d7) already
fetching; waiting
2015-10-06 23:43:40.823139 7f255eb50700 7 mds.0.cache.dir(100009a90de) already
fetching; waiting
2015-10-06 23:43:40.835196 7f255eb50700 7 mds.0.cache.dir(100009c5531) already
fetching; waiting
2015-10-06 23:43:40.841444 7f255eb50700 7 mds.0.cache.dir(100009dc19c) already
fetching; waiting
2015-10-06 23:43:40.875112 7f255eb50700 7 mds.0.cache.dir(100009e3a8c) already
fetching; waiting
2015-10-06 23:43:40.890399 7f255eb50700 7 mds.0.cache.dir(100009e3a8c) already
fetching; waiting
2015-10-06 23:43:40.929537 7f255eb50700 7 mds.0.cache.dir(100009a66a4) already
fetching; waiting
2015-10-06 23:43:40.936432 7f255eb50700 7 mds.0.cache.dir(100009c8188) already
fetching; waiting
2015-10-06 23:43:40.975802 7f255ca4b700 -1 mds.0.688 *** got signal Terminated ***
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
Yan, Zheng
2015-10-09 03:14:56 UTC
Permalink
On Wed, Oct 7, 2015 at 11:46 PM, Dzianis Kahanovich
Post by Dzianis Kahanovich
[...]
There is part of the log for the restarted mds at debug 7 (without standby-replay, ...)
(PS How [un]safe are multiple active mds in current hammer? Right now I am trying it and
temporarily working with "set_max_mds 3", but a shutdown of 1 mds is still too laggy for
the related clients. And the 4.2.3 kernel driver is buggy and at least needs heavy tuning
to work normally under a real busy load...)
2015-10-06 23:38:34.420499 7f256905a780 0 ceph version 0.94.3-242-g79385a8
(79385a85beea9bccd82c99b6bda653f0224c4fcd), process ceph-mds, pid 17337
2015-10-06 23:38:34.590909 7f256905a780 -1 mds.-1.0 log_to_monitors {default=true}
full = 0 epoch = 159091
2015-10-06 23:38:34.608457 7f2561355700 5 mds.-1.0 handle_mds_map epoch
7241 from mon.2
full = 0 epoch = 159091
2015-10-06 23:38:34.608614 7f2561355700 5 mds.-1.-1 handle_mds_map epoch
7241 from mon.2
2015-10-06 23:38:34.608618 7f2561355700 5 mds.-1.-1 old map epoch 7241 <=
7241, discarding
full = 0 epoch = 159092
2015-10-06 23:38:35.390461 7f2561355700 5 mds.-1.-1 handle_mds_map epoch
7242 from mon.2
2015-10-06 23:38:35.390529 7f2561355700 1 mds.-1.0 handle_mds_map standby
2015-10-06 23:38:35.607255 7f2561355700 5 mds.-1.0 handle_mds_map epoch
7243 from mon.2
2015-10-06 23:38:35.607292 7f2561355700 1 mds.0.688 handle_mds_map i am now
mds.0.688
2015-10-06 23:38:35.607310 7f2561355700 1 mds.0.688 handle_mds_map state
change up:standby --> up:replay
2015-10-06 23:38:35.607313 7f2561355700 1 mds.0.688 replay_start
2015-10-06 23:38:35.607316 7f2561355700 7 mds.0.cache set_recovery_set
2015-10-06 23:38:35.607318 7f2561355700 1 mds.0.688 recovery set is
2015-10-06 23:38:35.607321 7f2561355700 2 mds.0.688 boot_start 0: opening inotable
2015-10-06 23:38:35.648096 7f2561355700 2 mds.0.688 boot_start 0: opening
sessionmap
2015-10-06 23:38:35.648227 7f2561355700 2 mds.0.688 boot_start 0: opening mds log
2015-10-06 23:38:35.648230 7f2561355700 5 mds.0.log open discovering log bounds
2015-10-06 23:38:35.648281 7f2561355700 2 mds.0.688 boot_start 0: opening
snap table
2015-10-06 23:38:35.695211 7f255bf47700 4 mds.0.log Waiting for journal 300
to recover...
2015-10-06 23:38:35.699398 7f255bf47700 4 mds.0.log Journal 300 recovered.
2015-10-06 23:38:35.699408 7f255bf47700 4 mds.0.log Recovered journal 300
in format 1
loading/discovering base inodes
2015-10-06 23:38:35.699501 7f255bf47700 0 mds.0.cache creating system inode
with ino:100
2015-10-06 23:38:35.716269 7f255bf47700 0 mds.0.cache creating system inode
with ino:1
2015-10-06 23:38:35.851761 7f255eb50700 2 mds.0.688 boot_start 2: replaying mds log
2015-10-06 23:38:36.272808 7f255bf47700 7 mds.0.cache adjust_subtree_auth
-1,-2 -> -2,-2 on [dir 1 / [2,head] auth v=0 cv=0/0 state=1073741824 f() n()
hs=0+0,ss=0+0 0x5cde000]
2015-10-06 23:38:36.272837 7f255bf47700 7 mds.0.cache current root is [dir
1 / [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 |
subtree=1 0x5cde000]
2015-10-06 23:38:36.272849 7f255bf47700 7 mds.0.cache adjust_subtree_auth
-1,-2 -> -2,-2 on [dir 100 ~mds0/ [2,head] auth v=0 cv=0/0 state=1073741824
f() n() hs=0+0,ss=0+0 0x5cde3c0]
2015-10-06 23:38:36.272855 7f255bf47700 7 mds.0.cache current root is [dir
100 ~mds0/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 |
subtree=1 0x5cde3c0]
2015-10-06 23:38:36.282049 7f255bf47700 7 mds.0.cache
adjust_bounded_subtree_auth -2,-2 -> 0,-2 on [dir 1 / [2,head] auth
v=24837995 cv=0/0 dir_auth=-2 state=1073741824 f(v175 m2015-10-05
14:34:45.120292 73=45+28) n(v2043504 rc2015-10-06 23:16:58.837591
b4465915097499 1125727=964508+161219)/n(v2043504 rc2015-10-06
23:16:46.667354 b4465915097417 1125726=964507+161219) hs=0+0,ss=0+0 |
subtree=1 0x5cde000] bound_dfs []
2015-10-06 23:38:36.282086 7f255bf47700 7 mds.0.cache
adjust_bounded_subtree_auth -2,-2 -> 0,-2 on [dir 1 / [2,head] auth
v=24837995 cv=0/0 dir_auth=-2 state=1073741824 f(v175 m2015-10-05
14:34:45.120292 73=45+28) n(v2043504 rc2015-10-06 23:16:58.837591
b4465915097499 1125727=964508+161219)/n(v2043504 rc2015-10-06
23:16:46.667354 b4465915097417 1125726=964507+161219) hs=0+0,ss=0+0 |
subtree=1 0x5cde000] bounds
2015-10-06 23:38:36.282100 7f255bf47700 7 mds.0.cache current root is [dir
1 / [2,head] auth v=24837995 cv=0/0 dir_auth=-2 state=1073741824 f(v175
m2015-10-05 14:34:45.120292 73=45+28) n(v2043504 rc2015-10-06
23:16:58.837591 b4465915097499 1125727=964508+161219)/n(v2043504
rc2015-10-06 23:16:46.667354 b4465915097417 1125726=964507+161219)
hs=0+0,ss=0+0 | subtree=1 0x5cde000]
2015-10-06 23:38:36.282147 7f255bf47700 7 mds.0.cache
adjust_bounded_subtree_auth -2,-2 -> 0,-2 on [dir 100 ~mds0/ [2,head] auth
v=28417783 cv=0/0 dir_auth=-2 state=1073741824 f(v0 11=1+10) n(v821531
rc2015-10-06 23:16:58.847591 b2144931 594=584+10)/n(v821531 rc2015-10-06
23:14:18.064458 b2136353 593=583+10) hs=0+0,ss=0+0 | subtree=1 0x5cde3c0]
bound_dfs []
2015-10-06 23:38:36.282161 7f255bf47700 7 mds.0.cache
adjust_bounded_subtree_auth -2,-2 -> 0,-2 on [dir 100 ~mds0/ [2,head] auth
v=28417783 cv=0/0 dir_auth=-2 state=1073741824 f(v0 11=1+10) n(v821531
rc2015-10-06 23:16:58.847591 b2144931 594=584+10)/n(v821531 rc2015-10-06
23:14:18.064458 b2136353 593=583+10) hs=0+0,ss=0+0 | subtree=1 0x5cde3c0] bounds
2015-10-06 23:38:36.282171 7f255bf47700 7 mds.0.cache current root is [dir
100 ~mds0/ [2,head] auth v=28417783 cv=0/0 dir_auth=-2 state=1073741824 f(v0
11=1+10) n(v821531 rc2015-10-06 23:16:58.847591 b2144931
594=584+10)/n(v821531 rc2015-10-06 23:14:18.064458 b2136353 593=583+10)
hs=0+0,ss=0+0 | subtree=1 0x5cde3c0]
2015-10-06 23:38:36.282180 7f255bf47700 7 mds.0.cache recalc_auth_bits (replay)
2015-10-06 23:38:36.282183 7f255bf47700 7 mds.0.cache show_cache
2015-10-06 23:38:36.282195 7f255bf47700 7 mds.0.cache unlinked [inode 100
[...2,head] ~mds0/ auth v3265584 snaprealm=0x38be880 f(v0 11=1+10) n(v817982
rc2015-10-06 08:04:17.159725 b1990579 489=476+13)/n(v0 11=1+10) (iversion
lock) | dirfrag=1 0x3ab8000]
2015-10-06 23:38:36.282210 7f255bf47700 7 mds.0.cache dirfrag [dir 100
~mds0/ [2,head] auth v=28417783 cv=0/0 dir_auth=0 state=1073741824 f(v0
11=1+10) n(v821531 rc2015-10-06 23:16:58.847591 b2144931
594=584+10)/n(v821531 rc2015-10-06 23:14:18.064458 b2136353 593=583+10)
hs=0+0,ss=0+0 | subtree=1 0x5cde3c0]
2015-10-06 23:38:36.282226 7f255bf47700 7 mds.0.cache unlinked [inode 1
[...2,head] / auth v10215492 snaprealm=0x38beac0 f(v175 m2015-09-17
16:18:51.151569 69=41+28) n(v1901802 rc2015-09-18 03:16:52.300905
b5226530189072 205909=160799+45110)/n(v0 1=0+1) (iversion lock) | dirfrag=1
0x3ab89a0]
2015-10-06 23:38:36.282236 7f255bf47700 7 mds.0.cache dirfrag [dir 1 /
[2,head] auth v=24837995 cv=0/0 dir_auth=0 state=1073741824 f(v175
m2015-10-05 14:34:45.120292 73=45+28) n(v2043504 rc2015-10-06
23:16:58.837591 b4465915097499 1125727=964508+161219)/n(v2043504
rc2015-10-06 23:16:46.667354 b4465915097417 1125726=964507+161219)
hs=0+0,ss=0+0 | subtree=1 0x5cde000]
replayed op client.5330402:205687,205685 used ino 10000a6cc65 but session
next is 10000a666f7
replayed op client.5330402:205691,205685 used ino 10000a6cc66 but session
next is 10000a666f7
replayed op client.5330402:205693,205685 used ino 10000a6cc67 but session
next is 10000a666f7
2015-10-06 23:38:36.767359 7f255bf47700 1 mds.0.688 replay_done
2015-10-06 23:38:36.767372 7f255bf47700 1 mds.0.688 making mds journal writeable
2015-10-06 23:38:36.767824 7f255bf47700 2 mds.0.688 i am alone, moving to
state reconnect
2015-10-06 23:38:36.767827 7f255bf47700 3 mds.0.688 request_state up:reconnect
2015-10-06 23:38:38.153519 7f2559a30700 0 -- 10.227.227.104:6801/17337 >>
10.227.227.104:0/928408192 pipe(0x546f000 sd=29 :6801 s=0 pgs=0 cs=0 l=0
c=0x38e6880).accept peer addr is really 10.227.227.104:0/928408192 (socket
is 10.227.227.104:44734/0)
2015-10-06 23:38:38.153764 7f255992f700 0 -- 10.227.227.104:6801/17337 >>
10.227.227.106:0/2753586590 pipe(0x5480000 sd=30 :6801 s=0 pgs=0 cs=0 l=0
c=0x38e6700).accept peer addr is really 10.227.227.106:0/2753586590 (socket
is 10.227.227.106:40232/0)
2015-10-06 23:38:38.154183 7f2561355700 7 mds.0.server
handle_client_reconnect client.5293917
2015-10-06 23:38:38.154392 7f2561355700 7 mds.0.server
handle_client_reconnect client.5293920
2015-10-06 23:38:38.154676 7f2561355700 7 mds.0.server
handle_client_reconnect client.5297061
2015-10-06 23:38:38.154927 7f2561355700 7 mds.0.server
handle_client_reconnect client.5299230
2015-10-06 23:38:38.159228 7f2561355700 7 mds.0.server
handle_client_reconnect client.5285889
2015-10-06 23:38:38.160108 7f2559026700 0 -- 10.227.227.104:6801/17337 >>
10.227.227.103:0/4210455448 pipe(0x54a4000 sd=34 :6801 s=0 pgs=0 cs=0 l=0
c=0x38e6b80).accept peer addr is really 10.227.227.103:0/4210455448 (socket
is 10.227.227.103:49749/0)
2015-10-06 23:38:38.160243 7f2558f25700 0 -- 10.227.227.104:6801/17337 >>
10.227.227.101:0/3087077443 pipe(0x54a9000 sd=35 :6801 s=0 pgs=0 cs=0 l=0
c=0x38e6d00).accept peer addr is really 10.227.227.101:0/3087077443 (socket
is 10.227.227.101:44073/0)
2015-10-06 23:38:38.160340 7f2558d23700 0 -- 10.227.227.104:6801/17337 >>
10.227.227.102:0/1177330677 pipe(0x54bb000 sd=37 :6801 s=0 pgs=0 cs=0 l=0
c=0x38e7000).accept peer addr is really 10.227.227.102:0/1177330677 (socket
is 10.227.227.102:35575/0)
2015-10-06 23:38:38.160723 7f2561355700 7 mds.0.server
handle_client_reconnect client.5294037
2015-10-06 23:38:38.160865 7f2561355700 7 mds.0.server
handle_client_reconnect client.5303329
2015-10-06 23:38:38.160932 7f2561355700 7 mds.0.server
handle_client_reconnect client.3093290
2015-10-06 23:38:38.161235 7f2561355700 7 mds.0.server
handle_client_reconnect client.5300956
2015-10-06 23:38:38.161808 7f2561355700 7 mds.0.server
handle_client_reconnect client.5306077
2015-10-06 23:38:38.162426 7f2561355700 7 mds.0.server
handle_client_reconnect client.3102997
2015-10-06 23:38:38.300492 7f2561355700 5 mds.0.688 handle_mds_map epoch
7244 from mon.2
2015-10-06 23:38:38.300564 7f2561355700 1 mds.0.688 handle_mds_map i am now
mds.0.688
2015-10-06 23:38:38.300571 7f2561355700 1 mds.0.688 handle_mds_map state
change up:replay --> up:reconnect
2015-10-06 23:38:38.300574 7f2561355700 1 mds.0.688 reconnect_start
2015-10-06 23:38:38.300576 7f2561355700 1 mds.0.688 reopen_log
2015-10-06 23:38:38.330199 7f2561355700 1 mds.0.server reconnect_clients --
16 sessions
2015-10-06 23:38:38.330219 7f2561355700 7 mds.0.server
handle_client_reconnect client.5293917
reconnect by client.5293917 10.227.227.104:0/928408192 after 0.000023
2015-10-06 23:38:38.330330 7f2561355700 7 mds.0.server
handle_client_reconnect client.5293920
[...]
It seems you have 16 mounts. Are you using the kernel client or the fuse
client, and which versions are they?

Yan, Zheng
Post by Dzianis Kahanovich
[...]
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Dzianis Kahanovich
2015-10-09 10:49:36 UTC
Permalink
Post by Yan, Zheng
It seems you have 16 mounts. Are you using kernel client or fuse
client, which version are they?
1) On each of the 4 ceph nodes:
4.1.8 kernel mount to /mnt/ceph0 (used only for the samba/ctdb lockfile);
fuse mount to /mnt/ceph1 (some use);
samba cluster (ctdb) with vfs_ceph;
2) On 2 additional out-of-cluster (service) nodes:
4.1.8 (now 4.2.3) kernel mount;
4.1.0, both mounts;
3) On 2 VMs:
kernel mounts (most active: web & mail);
kernel 4.2.3;

fuse mounts - the same version as ceph;
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
Dzianis Kahanovich
2015-10-09 11:01:38 UTC
Permalink
PS: https://bugzilla.kernel.org/show_bug.cgi?id=105671
There was a (now solved) deadlock problem with CLONE_FS -> CLONE_VFORK. I have not
restarted the mds since that fix, so I don't know how it is related, but it looks very likely.

In short: process A (daemon) CLONE_FS -> B (workers) CLONE_VFORK -> C (clients);
it looks like a freeze in CLONE_VFORK in one (B) freezes (A) and the other (B) workers
(sometimes on PREEMPT, always on PREEMPT_NONE).

I will restart the mds tonight and will watch the restart time.
Post by Dzianis Kahanovich
Post by Yan, Zheng
It seems you have 16 mounts. Are you using kernel client or fuse
client, which version are they?
4.1.8 kernel mount to /mnt/ceph0 (used only for samba/ctdb lockfile);
fuse mount to /mnt/ceph1 (some used);
samba cluster (ctdb) with vfs_ceph;
4.1.8 (now 4.2.3) kernel mount;
4.1.0 both mounts;
kernel mounts (most active: web & mail);
4.2.3;
fuse mounts - same version with ceph;
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
Yan, Zheng
2015-10-09 11:10:07 UTC
Permalink
On Fri, Oct 9, 2015 at 6:49 PM, Dzianis Kahanovich
Post by Dzianis Kahanovich
Post by Yan, Zheng
It seems you have 16 mounts. Are you using kernel client or fuse
client, which version are they?
4.1.8 kernel mount to /mnt/ceph0 (used only for samba/ctdb lockfile);
fuse mount to /mnt/ceph1 (some used);
samba cluster (ctdb) with vfs_ceph;
4.1.8 (now 4.2.3) kernel mount;
4.1.0 both mounts;
kernel mounts (most active: web & mail);
4.2.3;
fuse mounts - same version with ceph;
please run "ceph daemon mds.x session ls" to find which client has
largest number of caps. mds.x is ID of active mds.
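For reference, a minimal sketch of doing that from the admin socket; mds.c is just
this cluster's active daemon, and the jq filter plus the field names (id, inst,
num_caps) are assumptions about what "session ls" prints on this release, so adjust
to the actual output:

  # run on the node hosting the active MDS (assumed here to be mds.c)
  ceph daemon mds.c session ls > /tmp/sessions.json
  # sort sessions by cap count, highest first (needs jq installed)
  jq -r 'sort_by(.num_caps) | reverse | .[] | "\(.id) \(.inst) num_caps=\(.num_caps)"' \
      /tmp/sessions.json | head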

Yan, Zheng
Post by Dzianis Kahanovich
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
Dzianis Kahanovich
2015-10-09 11:26:09 UTC
Permalink
Post by Yan, Zheng
Post by Dzianis Kahanovich
Post by Yan, Zheng
It seems you have 16 mounts. Are you using kernel client or fuse
client, which version are they?
4.1.8 kernel mount to /mnt/ceph0 (used only for samba/ctdb lockfile);
fuse mount to /mnt/ceph1 (some used);
samba cluster (ctdb) with vfs_ceph;
4.1.8 (now 4.2.3) kernel mount;
4.1.0 both mounts;
kernel mounts (most active: web & mail);
4.2.3;
fuse mounts - same version with ceph;
please run "ceph daemon mds.x session ls" to find which client has
largest number of caps. mds.x is ID of active mds.
1) That command is not valid any more ;) - it is "cephfs-table-tool x show session" now.

2) I have 3 active mds now. I tried it, it works, and I am keeping it. Restart is still problematic.

3) Yes, the most caps are on the master VM (4.2.3 kernel mount; that is the web+mail+heartbeat
cluster of 2 VMs), on the apache root. That is where the (now gone) CLONE_FS -> CLONE_VFORK
deadlocks described above were seen. But 4.2.3 was installed just before these tests; before
that it was 4.1.8, with similar effects (though the log is from 4.2.3 on the VM clients).

I will wait for tonight's MDS restart.
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
Dzianis Kahanovich
2015-10-09 15:37:47 UTC
Permalink
Post by Dzianis Kahanovich
Waith this night for MDSs restart.
Now (3 active mds, balanced; nodes upgraded from git & restarted - first the mon,
then mds+osd, without debug) the time is reduced, but clients still wait for the
rejoin of the related mds to complete. Maybe a rebalance with maximum "busy" weight
is needed on shutdown?

2015-10-09 18:08:29.828889 7f7d67cc3700 1 mds.1.4 rejoin_joint_start
2015-10-09 18:09:50.782661 7f7d63bba700 0 log_channel(cluster) log [WRN] :
open_snap_parents has:
2015-10-09 18:09:50.782668 7f7d63bba700 0 log_channel(cluster) log [WRN] :
unconnected snaprealm 100
2015-10-09 18:09:50.782672 7f7d63bba700 0 log_channel(cluster) log [WRN] :
client.5302119 snapid 1
2015-10-09 18:09:50.782675 7f7d63bba700 0 log_channel(cluster) log [WRN] :
client.5311408 snapid 1
2015-10-09 18:09:50.782677 7f7d63bba700 0 log_channel(cluster) log [WRN] :
unconnected snaprealm 102
2015-10-09 18:09:50.782680 7f7d63bba700 0 log_channel(cluster) log [WRN] :
client.5302119 snapid 1
2015-10-09 18:09:50.827008 7f7d63bba700 1 mds.1.4 rejoin_done

2015-10-09 18:21:48.345565 7f6471277700 1 mds.2.5 rejoin_start
2015-10-09 18:21:51.443000 7f6471277700 1 mds.2.5 rejoin_joint_start
2015-10-09 18:23:13.381992 7f646d16e700 0 log_channel(cluster) log [WRN] :
open_snap_parents has:
2015-10-09 18:23:13.382002 7f646d16e700 0 log_channel(cluster) log [WRN] :
unconnected snaprealm 100
2015-10-09 18:23:13.382012 7f646d16e700 0 log_channel(cluster) log [WRN] :
client.5302119 snapid 1
2015-10-09 18:23:13.382015 7f646d16e700 0 log_channel(cluster) log [WRN] :
client.5311408 snapid 1
2015-10-09 18:23:13.474911 7f646d16e700 1 mds.2.5 rejoin_done

2015-10-09 18:28:17.767139 7f0c36252700 1 mds.0.710 rejoin_joint_start
2015-10-09 18:28:35.062585 7f0c36252700 0 log_channel(cluster) log [WRN] :
open_snap_parents has:
2015-10-09 18:28:35.062592 7f0c36252700 0 log_channel(cluster) log [WRN] :
unconnected snaprealm 102
2015-10-09 18:28:35.062597 7f0c36252700 0 log_channel(cluster) log [WRN] :
client.5302119 snapid 1
2015-10-09 18:28:35.079123 7f0c36252700 1 mds.0.710 rejoin_done
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
Yan, Zheng
2015-10-10 02:25:48 UTC
Permalink
On Fri, Oct 9, 2015 at 7:26 PM, Dzianis Kahanovich
Post by Dzianis Kahanovich
Post by Yan, Zheng
Post by Dzianis Kahanovich
Post by Yan, Zheng
It seems you have 16 mounts. Are you using kernel client or fuse
client, which version are they?
4.1.8 kernel mount to /mnt/ceph0 (used only for samba/ctdb lockfile);
fuse mount to /mnt/ceph1 (some used);
samba cluster (ctdb) with vfs_ceph;
4.1.8 (now 4.2.3) kernel mount;
4.1.0 both mounts;
kernel mounts (most active: web & mail);
4.2.3;
fuse mounts - same version with ceph;
please run "ceph daemon mds.x session ls" to find which client has
largest number of caps. mds.x is ID of active mds.
1) This command is not valid more ;) - "cephfs-table-tool x show session" now.
You need to run this command on the machine that runs the active MDS. It
shows the in-memory state of the sessions; cephfs-table-tool only shows
the on-disk state of the session table.
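As a short illustration of the difference (mds.c and rank 0 are just this cluster's
values, not anything special):

  # live, in-memory session state, via the admin socket of the running daemon;
  # must be run on the machine hosting the active MDS
  ceph daemon mds.c session ls
  # persisted, on-disk session table of rank 0, which is what cephfs-table-tool reads
  cephfs-table-tool 0 show session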
Post by Dzianis Kahanovich
2) I have 3 active mds now. I try, it works, keep it. Restart still problematic.
multiple active MDS is not ready for production.
Post by Dzianis Kahanovich
3) Yes, more caps on master VM (4.2.3 kernel mount, there are
web+mail+heartbeat cluster 2xVMs) on apache root. In this place was (no more
now) described CLONE_FS -> CLONE_VFORK deadlocks. But 4.2.3 installed just
before tests, was 4.1.8 with similar effects (but log from 4.2.3 on VM
clients).
I suspect your problem is that some mount holds too many open files. During
MDS failover, the MDS needs to open these files again, which takes a long
time.
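One hedged way to see how many caps/files a kernel mount is holding is the client-side
debugfs view; the path layout below is an assumption from memory and may differ between
the 4.1.x and 4.2.x kernels in use here:

  # on a machine with a kernel cephfs mount; make sure debugfs is mounted
  mount -t debugfs none /sys/kernel/debug 2>/dev/null || true
  for d in /sys/kernel/debug/ceph/*; do
      echo "== $d"
      cat "$d/caps"    # cap counters held by this kernel client mount
  done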

Yan, Zheng
Post by Dzianis Kahanovich
Waith this night for MDSs restart.
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
Dzianis Kahanovich
2015-10-14 11:55:27 UTC
Permalink
Post by Yan, Zheng
Post by Dzianis Kahanovich
2) I have 3 active mds now. I try, it works, keep it. Restart still problematic.
multiple active MDS is not ready for production.
OK, so if I am running 3 active now (and it looks good) - is it better to go back to 1?
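For reference, a sketch of how shrinking back to a single active mds was usually done
with hammer-era tooling; the command names and the assumption that ranks 1 and 2 are
the extra active ones come from memory of the docs of that time, not from this thread,
so treat it as a sketch rather than a recipe:

  ceph mds set_max_mds 1
  ceph mds stop 2      # deactivate the extra ranks one at a time, highest first
  ceph mds stop 1
  ceph mds stat        # wait until only rank 0 remains active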
Post by Yan, Zheng
Post by Dzianis Kahanovich
3) Yes, more caps on master VM (4.2.3 kernel mount, there are
web+mail+heartbeat cluster 2xVMs) on apache root. In this place was (no more
now) described CLONE_FS -> CLONE_VFORK deadlocks. But 4.2.3 installed just
before tests, was 4.1.8 with similar effects (but log from 4.2.3 on VM
clients).
I suspect your problem is due to some mount have too many open files.
during mds failover, mds needs to open these files, which take a long
time.
Can some kind of cache improve behaviour?
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
Dzianis Kahanovich
2015-10-06 13:35:31 UTC
Permalink
Sorry, skipped some...
Post by John Spray
On Tue, Oct 6, 2015 at 1:22 PM, Dzianis Kahanovich
Post by Dzianis Kahanovich
e7151: 1/1/1 up {0=b=up:active}, 2 up:standby
Cluster stuck on KILL active mds.b. How to correctly stop mds to get
behaviour like on MONs - leader->down/peon->leader?
It's not clear to me why you're saying it's stuck. Is it stuck, or is it slow?
Post by Dzianis Kahanovich
From that log it looks like you're restarting mds.b many times in one
day, that's kind of unusual. Are you really doing all those restarts
by hand, or is something else going wrong?
The expected behaviour is that when mds.b restarts, the mon notices
that the old mds.b instance is dead, and hands the role to the
standby-replay mds.
Yes, +1. I wanted to say the same. But right now it does not work [for me].
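For what it's worth, a quick way to watch what the monitors think each mds daemon is
doing around such a restart (what exactly gets printed per standby varies by release):

  ceph mds stat   # one-line mdsmap summary, like the "1/1/1 up {0=b=up:active}, 2 up:standby" quoted above
  ceph mds dump   # full mdsmap; the per-daemon lines show the standbys and their states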
Post by John Spray
Post by Dzianis Kahanovich
Usually I upgrade ceph by simple "/etc/init.d/ceph restart" (KILLs +
starts). Surprise for me - only MDS need special actions.
What special actions are you having to perform? It looks like your
cluster is coming back online eventually?
John
Post by Dzianis Kahanovich
Post by John Spray
You could try setting a higher debug level (e.g. debug mds = 10) on
your MDS before it takes over, so that the log output can give us an
idea of what the daemon is doing while it's stuck in rejoin.
OK, debug later - this is production ;). But even 5 min. rejoin must be not
too problem if other 2 nodes up. Even degraded OSDs still accessible (if
balanced size/min_size & CRUSH).
# grep rejoin ceph-mds.b.log
2015-10-06 03:16:28.102415 7f91d0852700 1 mds.0.634 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 03:16:28.102417 7f91d0852700 1 mds.0.634 rejoin_start
2015-10-06 03:16:33.028668 7f91d0852700 1 mds.0.634 rejoin_joint_start
2015-10-06 03:18:05.266657 7f91cc749700 1 mds.0.634 rejoin_done
2015-10-06 03:18:06.070233 7f91d0852700 1 mds.0.634 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 03:31:22.860780 7f8ab6643700 1 mds.0.636 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 03:31:22.860783 7f8ab6643700 1 mds.0.636 rejoin_start
2015-10-06 03:31:32.771089 7f8ab6643700 1 mds.0.636 rejoin_joint_start
2015-10-06 03:32:48.644880 7f8ab2439700 1 mds.0.636 rejoin_done
2015-10-06 03:32:49.412352 7f8ab6643700 1 mds.0.636 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 03:57:03.625397 7f981c944700 1 mds.0.639 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 03:57:03.625400 7f981c944700 1 mds.0.639 rejoin_start
2015-10-06 03:57:14.561840 7f981c944700 1 mds.0.639 rejoin_joint_start
2015-10-06 03:58:26.875557 7f981883b700 1 mds.0.639 rejoin_done
2015-10-06 03:58:28.159967 7f981c944700 1 mds.0.639 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 04:12:49.984929 7f1afa6d7700 1 mds.0.642 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 04:12:49.984932 7f1afa6d7700 1 mds.0.642 rejoin_start
2015-10-06 04:13:01.391428 7f1afa6d7700 1 mds.0.642 rejoin_joint_start
2015-10-06 04:14:38.680632 7f1af65ce700 1 mds.0.642 rejoin_done
2015-10-06 04:14:39.802623 7f1afa6d7700 1 mds.0.642 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 04:23:55.942713 7f028b2a9700 1 mds.0.645 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 04:23:55.942716 7f028b2a9700 1 mds.0.645 rejoin_start
2015-10-06 04:24:06.260830 7f028b2a9700 1 mds.0.645 rejoin_joint_start
2015-10-06 04:24:19.627641 7f028699f700 1 mds.0.645 suicide. wanted
down:dne, now up:rejoin
2015-10-06 04:35:53.910743 7f32cee3d700 1 mds.0.648 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 04:35:53.910746 7f32cee3d700 1 mds.0.648 rejoin_start
2015-10-06 04:36:03.541504 7f32cee3d700 1 mds.0.648 rejoin_joint_start
2015-10-06 04:37:14.470805 7f32cad34700 1 mds.0.648 rejoin_done
2015-10-06 04:37:15.390864 7f32cee3d700 1 mds.0.648 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 04:40:46.878251 7f6600df0700 1 mds.0.651 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 04:40:46.878254 7f6600df0700 1 mds.0.651 rejoin_start
2015-10-06 04:40:57.984821 7f6600df0700 1 mds.0.651 rejoin_joint_start
2015-10-06 04:43:23.230549 7f65fcce7700 1 mds.0.651 rejoin_done
2015-10-06 04:43:23.841793 7f6600df0700 1 mds.0.651 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 04:50:56.961706 7fb5871a5700 1 mds.0.655 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 04:50:56.961709 7fb5871a5700 1 mds.0.655 rejoin_start
2015-10-06 04:51:06.743421 7fb5871a5700 1 mds.0.655 rejoin_joint_start
2015-10-06 04:51:09.134144 7fb58289b700 1 mds.0.655 suicide. wanted
down:dne, now up:rejoin
2015-10-06 04:56:27.819070 7f64123e5700 1 mds.0.657 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 04:56:27.819072 7f64123e5700 1 mds.0.657 rejoin_start
2015-10-06 04:56:27.839223 7f64123e5700 1 mds.0.657 rejoin_joint_start
2015-10-06 04:56:30.375895 7f640e2dc700 1 mds.0.657 rejoin_done
2015-10-06 04:56:31.858593 7f64123e5700 1 mds.0.657 handle_mds_map state
change up:rejoin --> up:clientreplay
2015-10-06 05:06:11.023545 7feef429a700 1 mds.0.660 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 05:06:11.023548 7feef429a700 1 mds.0.660 rejoin_start
2015-10-06 05:06:11.433153 7feef429a700 1 mds.0.660 rejoin_joint_start
2015-10-06 05:06:46.113313 7feef1a95700 1 mds.0.660 rejoin_done
2015-10-06 05:06:47.515843 7feef429a700 1 mds.0.660 handle_mds_map state
change up:rejoin --> up:clientreplay
2015-10-06 09:42:59.932714 7fccadb81700 1 mds.0.664 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 09:42:59.932717 7fccadb81700 1 mds.0.664 rejoin_start
2015-10-06 09:43:00.497196 7fccadb81700 1 mds.0.664 rejoin_joint_start
2015-10-06 09:43:57.889918 7fcca9a78700 1 mds.0.664 rejoin_done
2015-10-06 09:43:58.490246 7fccadb81700 1 mds.0.664 handle_mds_map state
change up:rejoin --> up:clientreplay
2015-10-06 10:42:24.162929 7f7ddf175700 1 mds.0.666 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 10:42:24.162931 7f7ddf175700 1 mds.0.666 rejoin_start
2015-10-06 10:42:38.235885 7f7ddf175700 1 mds.0.666 rejoin_joint_start
2015-10-06 10:47:30.636129 7f7ddb06c700 1 mds.0.666 rejoin_done
2015-10-06 10:47:32.037131 7f7ddf175700 1 mds.0.666 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 12:10:32.496677 7f94435f4700 1 mds.0.670 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 12:10:32.496681 7f94435f4700 1 mds.0.670 rejoin_start
2015-10-06 12:10:45.968556 7f94435f4700 1 mds.0.670 rejoin_joint_start
2015-10-06 12:14:10.590516 7f943ecea700 1 mds.0.670 suicide. wanted
down:dne, now up:rejoin
2015-10-06 12:57:19.796554 7fbe9955b700 1 mds.0.676 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 12:57:19.796557 7fbe9955b700 1 mds.0.676 rejoin_start
2015-10-06 12:57:31.080582 7fbe9955b700 1 mds.0.676 rejoin_joint_start
2015-10-06 12:59:39.291300 7fbe95452700 1 mds.0.676 rejoin_done
2015-10-06 12:59:40.162822 7fbe9955b700 1 mds.0.676 handle_mds_map state
change up:rejoin --> up:active
2015-10-06 14:41:48.552281 7f8bc218d700 1 mds.0.681 handle_mds_map state
change up:reconnect --> up:rejoin
2015-10-06 14:41:48.552284 7f8bc218d700 1 mds.0.681 rejoin_start
2015-10-06 14:41:49.242241 7f8bc218d700 1 mds.0.681 rejoin_joint_start
2015-10-06 14:42:32.421263 7f8bbe084700 1 mds.0.681 rejoin_done
2015-10-06 14:42:33.341350 7f8bc218d700 1 mds.0.681 handle_mds_map state
change up:rejoin --> up:active
Post by John Spray
John
Post by Dzianis Kahanovich
PS I know - PGs too much, "mon pg warn max per osd = 1400"...
Post by Dzianis Kahanovich
John
Post by Dzianis Kahanovich
[mds]
mds recall state timeout = 120
mds bal mode = 1
mds standby replay = true
mds cache size = 500000
mds mem max = 2097152
mds op history size = 50
# vs. laggy beacon
mds decay halflife = 9
mds beacon interval = 8
mds beacon grace = 30
[mds.a]
host = megaserver1
[mds.b]
host = megaserver3
[mds.c]
host = megaserver4
(I trying to unswitch all non-defaults, IMHO no results - fixme)
Or may be I need special care on mds stop (now - SIGKILL).
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/