Hector Martin
2018-11-23 15:54:42 UTC
Background: I'm running single-node Ceph with CephFS as an experimental
replacement for "traditional" filesystems. In this case I have 11 OSDs,
1 mon, and 1 MDS.
I just had an unclean shutdown (kernel panic) while a large (>1TB) file
was being copied to CephFS (via rsync). Upon bringing the system back
up, I noticed that the (incomplete) file has about 320MB worth of zeroes
at the end.
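(As a side note, here is a minimal sketch of how one can measure a zero
tail like this, scanning backwards from EOF; the path at the bottom is
just a placeholder for the actual rsync target:)

    import os

    CHUNK = 1 << 20  # scan backwards from EOF, 1 MiB at a time

    def trailing_zero_bytes(path):
        """Return how many bytes at the end of `path` are NUL (0x00)."""
        zeros = 0
        with open(path, "rb") as f:
            pos = f.seek(0, os.SEEK_END)
            while pos > 0:
                step = min(CHUNK, pos)
                pos -= step
                f.seek(pos)
                chunk = f.read(step)
                stripped = chunk.rstrip(b"\x00")
                zeros += step - len(stripped)
                if stripped:   # hit the first non-zero byte, stop
                    break
        return zeros

    # Placeholder path; the real file was a >1TB rsync target on CephFS.
    print(trailing_zero_bytes("/mnt/cephfs/bigfile"))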
This is the kind of behavior I would expect from traditional local
filesystems, where file metadata is updated to reflect the new size of
a growing file before the disk extents are actually allocated and
filled with data, so an unclean shutdown leaves files with tails of
zeroes. I'm surprised to see it with Ceph, though: I expected the OSD
side of things to be atomic, with all the BlueStore goodness,
checksums, etc., and I figured CephFS would build on those primitives
in a way that makes this kind of inconsistency impossible.
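(To spell out the local-filesystem behavior I mean: if the file's size
is extended before the corresponding data blocks land on disk, readers
simply see zeroes in the gap. A minimal illustration of the
user-visible effect, using os.truncate() to stand in for a size update
that outlives the lost data:)

    import os, tempfile

    # Simulate "size was updated, data never made it to disk":
    # growing the file with truncate() bumps its size, but the new
    # region has no data blocks behind it, so reads return zeroes --
    # the same thing I'm seeing at the tail of the interrupted copy.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "r+b") as f:
        f.write(b"data that did reach disk")
        f.flush()
        os.truncate(path, 320 * 1024 * 1024)  # size claims ~320MB more

    with open(path, "rb") as f:
        f.seek(-16, os.SEEK_END)
        print(f.read())   # ...but the tail reads back as b'\x00' * 16

    os.unlink(path)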
Is this expected behavior? It's not a huge dealbreaker, but I'd like to
understand how this kind of situation happens in CephFS, and how it
could affect a proper cluster, if at all: can this happen if e.g. a
client, an MDS, or an OSD dies uncleanly, or only if several things go
down at once?
--
Hector Martin (***@marcansoft.com)
Public Key: https://mrcn.st/pub