Discussion:
[ceph-users] Effects of restoring a cluster's mon from an older backup
Hector Martin
2018-11-08 11:40:49 UTC
I'm experimenting with single-host Ceph use cases, where HA is not
important but data durability is.

How does a Ceph cluster react to its (sole) mon being rolled back to an
earlier state? The idea here is that the mon storage may not be
redundant but would be (atomically, e.g. lvm snapshot and dump) backed
up, say, daily. If the cluster goes down and then is brought back up
with a mon backup that is several days to hours old, while the OSDs are
up to date, what are the potential consequences?
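For concreteness, the kind of daily backup I have in mind is roughly the
sketch below (untested; the volume group, LV name, mount point and archive
path are all placeholders for wherever /var/lib/ceph/mon actually lives):

#!/usr/bin/env python3
"""Sketch: point-in-time backup of a mon store via a short-lived LVM snapshot."""
import datetime
import subprocess

VG, LV = "vg0", "mon"                # hypothetical VG/LV holding the mon store
SNAP = f"{LV}-backup"                # temporary snapshot name
MNT = "/mnt/mon-snap"                # scratch mount point
OUT = f"/backup/mon-{datetime.date.today()}.tar.gz"

def run(*cmd):
    subprocess.run(cmd, check=True)

# 1. Freeze an atomic view of the mon store as an LVM snapshot.
run("lvcreate", "-s", "-L", "1G", "-n", SNAP, f"{VG}/{LV}")
try:
    run("mkdir", "-p", MNT)
    run("mount", "-o", "ro", f"/dev/{VG}/{SNAP}", MNT)
    try:
        # 2. Dump the snapshot contents to the backup target.
        run("tar", "czf", OUT, "-C", MNT, ".")
    finally:
        run("umount", MNT)
finally:
    # 3. Drop the snapshot so its COW space doesn't fill up.
    run("lvremove", "-f", f"/dev/{VG}/{SNAP}")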

Of course I expect maintenance operations to be affected (obviously any
OSDs added/removed would likely get confused). But what about regular
operation? Things like snapshots and snapshot ranges. Is this likely to
cause data loss, or would the OSDs and clients largely not be affected
as long as the cluster config has not changed?

There's a documented procedure for rebuilding the mon store from OSD data:

http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
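As far as I can tell, on a single host (with all OSDs stopped) that
procedure boils down to roughly the following untested sketch, where the
scratch store path and keyring location are placeholders:

#!/usr/bin/env python3
"""Sketch: rebuild a mon store from the copies of the maps held by local OSDs."""
import glob
import subprocess

MON_STORE = "/root/mon-store"                     # scratch dir for the rebuilt store
KEYRING = "/etc/ceph/ceph.client.admin.keyring"   # needs admin + mon caps

subprocess.run(["mkdir", "-p", MON_STORE], check=True)

# 1. Pull the cluster map out of every (stopped) OSD on this host.
for osd_path in sorted(glob.glob("/var/lib/ceph/osd/ceph-*")):
    subprocess.run([
        "ceph-objectstore-tool",
        "--data-path", osd_path,
        "--op", "update-mon-db",
        "--mon-store-path", MON_STORE,
    ], check=True)

# 2. Rebuild a fresh monitor store from the collected maps.
subprocess.run([
    "ceph-monstore-tool", MON_STORE, "rebuild",
    "--", "--keyring", KEYRING,
], check=True)

# 3. Per the linked docs, the rebuilt store.db then replaces the one under
#    the mon's data directory (after moving the old one aside).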

Would this be preferable to just restoring the mon from a backup? What
about the MDS map?
--
Hector Martin (***@marcansoft.com)
Public Key: https://mrcn.st/pub
Gregory Farnum
2018-11-09 21:35:51 UTC
Post by Hector Martin
I'm experimenting with single-host Ceph use cases, where HA is not
important but data durability is.
How does a Ceph cluster react to its (sole) mon being rolled back to an
earlier state? The idea here is that the mon storage may not be
redundant but would be (atomically, e.g. lvm snapshot and dump) backed
up, say, daily. If the cluster goes down and then is brought back up
with a mon backup that is several days to hours old, while the OSDs are
up to date, what are the potential consequences?
Of course I expect maintenance operations to be affected (obviously any
OSDs added/removed would likely get confused). But what about regular
operation? Things like snapshots and snapshot ranges. Is this likely to
cause data loss, or would the OSDs and clients largely not be affected
as long as the cluster config has not changed?
http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
Would this be preferable to just restoring the mon from a backup?
Yes, do that; don't try to back up your monitor. If you restore a monitor
from backup then the monitor — your authoritative data source — will warp
back in time on what the OSD peering intervals look like, which snapshots
have been deleted and created, etc. It would be a huge disaster and
probably every running daemon or client would have to pause IO until the
monitor generated enough map epochs to "catch up" — and then the rest of
the cluster would start applying those changes and nothing would work right.
Post by Hector Martin
What
about the MDS map?
Unlike the OSDMap, the MDSMap doesn't really keep track of any persistent
data so it's much safer to rebuild or reset from scratch.
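(Roughly speaking, the "reset" case amounts to something like the sketch
below; the filesystem name is a placeholder, and the CephFS disaster
recovery docs cover the caveats before running it for real.)

#!/usr/bin/env python3
"""Sketch: reset an MDS map against the existing metadata/data pools."""
import subprocess

FS_NAME = "cephfs"  # hypothetical filesystem name

# Throws away the current MDS map state and starts over against the existing
# pools; the on-disk CephFS metadata itself is left untouched.
subprocess.run(
    ["ceph", "fs", "reset", FS_NAME, "--yes-i-really-mean-it"],
    check=True,
)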
-Greg
Post by Hector Martin
--
Public Key: https://mrcn.st/pub
Hector Martin
2018-11-12 09:16:11 UTC
Post by Gregory Farnum
Yes, do that, don't try and back up your monitor. If you restore a
monitor from backup then the monitor — your authoritative data source —
will warp back in time on what the OSD peering intervals look like,
which snapshots have been deleted and created, etc. It would be a huge
disaster and probably every running daemon or client would have to pause
IO until the monitor generated enough map epochs to "catch up" — and
then the rest of the cluster would start applying those changes and
nothing would work right.
Thanks, I suspected this might be the case. Is there any reasonably safe
"backwards warp" time window (that would permit asynchronous replication
of mon storage to be good enough for disaster recovery), e.g. on the
order of seconds? I assume synchronous replication is fine (e.g. RAID or
DRBD configured correctly) since that's largely equivalent to local
storage. I'll probably go with something like that for mon durability.
Post by Gregory Farnum
Unlike the OSDMap, the MDSMap doesn't really keep track of any
persistent data so it's much safer to rebuild or reset from scratch.
-Greg
Good to know. I'll see if I can do some DR tests when I set this up, to
prove to myself that it all works out :-)
--
Hector Martin (***@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
Gregory Farnum
2018-11-15 05:36:34 UTC
Post by Hector Martin
Post by Gregory Farnum
Yes, do that, don't try and back up your monitor. If you restore a
monitor from backup then the monitor — your authoritative data source —
will warp back in time on what the OSD peering intervals look like,
which snapshots have been deleted and created, etc. It would be a huge
disaster and probably every running daemon or client would have to pause
IO until the monitor generated enough map epochs to "catch up" — and
then the rest of the cluster would start applying those changes and
nothing would work right.
Thanks, I suspected this might be the case. Is there any reasonably safe
"backwards warp" time window (that would permit asynchronous replication
of mon storage to be good enough for disaster recovery), e.g. on the
order of seconds? I assume synchronous replication is fine (e.g. RAID or
DRBD configured correctly) since that's largely equivalent to local
storage. I'll probably go with something like that for mon durability.
Unfortunately there really isn't. Any situation in which a monitor goes
back in time opens up the possibility (even likelihood!) that updates which
directly impact data services can disappear and cause issues. Synchronous
replication is fine, although I'm not sure there's much advantage to doing
that over simply running another monitor in that disk location.
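Bootstrapping that extra monitor is just the usual manual add-a-mon dance;
roughly something like this untested sketch, where the mon ID, data path
and temp files are placeholders:

#!/usr/bin/env python3
"""Sketch: initialize an additional monitor whose store lives on another disk."""
import subprocess

MON_ID = "b"                          # hypothetical ID for the extra mon
MON_DATA = "/otherdisk/ceph/mon-b"    # store location on the second disk

def run(*cmd):
    subprocess.run(cmd, check=True)

# 1. Fetch the mon keyring and the current monmap from the running cluster.
run("ceph", "auth", "get", "mon.", "-o", "/tmp/mon-keyring")
run("ceph", "mon", "getmap", "-o", "/tmp/monmap")

# 2. Initialize the new monitor's store on the target disk.
run("mkdir", "-p", MON_DATA)
run("ceph-mon", "-i", MON_ID, "--mkfs",
    "--mon-data", MON_DATA,
    "--monmap", "/tmp/monmap",
    "--keyring", "/tmp/mon-keyring")

# 3. Then start the new mon (manually or via systemd) so it joins the
#    existing monitor(s).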
-Greg
Post by Hector Martin
Post by Gregory Farnum
Unlike the OSDMap, the MDSMap doesn't really keep track of any
persistent data so it's much safer to rebuild or reset from scratch.
-Greg
Good to know. I'll see if I can do some DR tests when I set this up, to
prove to myself that it all works out :-)
--
Public Key: https://marcan.st/marcan.asc