Discussion:
[ceph-users] rsync kernel client cephfs mkstemp no space left on device
Hauke Homburg
2016-10-07 13:15:23 UTC
Hello,

I have a Ceph cluster with 5 servers and 40 OSDs. The cluster currently
has 85GB of free space, and the rsync directory holds a lot of pictures
with a data volume of 40GB.

The servers run CentOS 7 with the latest stable Ceph. The client is
Debian 8 with a 4.x kernel, and the cluster is mounted via cephfs
(kernel client).

When I sync the directory I often see the rsync error "mkstemp ... No
space left on device (28)". At that point I can still touch a file in
another directory on the cluster. The directory contains about 630,000
files. Is that too many files?

greetings

Hauke
--
www.w3-creative.de

www.westchat.de
Gregory Farnum
2016-10-07 15:37:04 UTC
Post by Hauke Homburg
Hello,
I have a Ceph cluster with 5 servers and 40 OSDs. The cluster currently
has 85GB of free space, and the rsync directory holds a lot of pictures
with a data volume of 40GB.
The servers run CentOS 7 with the latest stable Ceph. The client is
Debian 8 with a 4.x kernel, and the cluster is mounted via cephfs
(kernel client).
When I sync the directory I often see the rsync error "mkstemp ... No
space left on device (28)". At that point I can still touch a file in
another directory on the cluster. The directory contains about 630,000
files. Is that too many files?
Yes, in recent releases CephFS limits you to 100k dentries in a single
directory fragment. This *includes* the "stray" directories that files
get moved into when you unlink them, and is intended to prevent issues
with very large folders. It will stop being a problem once we enable
automatic fragmenting (soon, hopefully).
You can change that by changing the "mds bal fragment size max"
config, but you're probably better off by figuring out if you've got
an over-large directory or if you're deleting files faster than the
cluster can keep up. There was a thread about this very recently and
John included some details about tuning if you check the archives. :)
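
If you do decide to raise it, something along these lines on the MDS
side should do it (the value here is only an example, not a
recommendation):

# ceph.conf on the MDS hosts:
[mds]
    mds bal fragment size max = 200000

# or injected at runtime, if you prefer:
ceph tell mds.<id> injectargs '--mds_bal_fragment_size_max 200000'
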
-Greg
Hauke Homburg
2016-10-10 08:05:29 UTC
Post by Gregory Farnum
[...]
Yes, in recent releases CephFS limits you to 100k dentries in a single
directory fragment. This *includes* the "stray" directories that files
get moved into when you unlink them, and is intended to prevent issues
with very large folders. It will stop being a problem once we enable
automatic fragmenting (soon, hopefully).
You can change that by changing the "mds bal fragment size max"
config, but you're probably better off by figuring out if you've got
an over-large directory or if you're deleting files faster than the
cluster can keep up. There was a thread about this very recently and
John included some details about tuning if you check the archives. :)
-Greg
Hello,

Thanks for the answer.
I have enabled the "mds bal frag = true" option on the cluster.

Today I read that I have to enable this option on the client, too. With
a FUSE mount I could do it with the ceph binary, but I use the kernel
module. How can I do it there?

Regards

Hauke
--
www.w3-creative.de

www.westchat.de
John Spray
2016-10-10 09:33:48 UTC
Post by Hauke Homburg
[...]
Hello,
Thanks for the answer.
I have enabled the "mds bal frag = true" option on the cluster.
Today I read that I have to enable this option on the client, too. With
a FUSE mount I could do it with the ceph binary, but I use the kernel
module. How can I do it there?
mds_bal_frag is only a server side thing. You do also need to do the
"ceph fs set <name> allow_dirfrags true", which you can run from any
client with an admin key (but again, this is a server side thing, not
a client setting).

Note that the reason directory fragmentation is not enabled by default
is that it wasn't thoroughly tested ahead of Jewel, so there's a
reason it requires a --yes-i-really-mean-it.
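
To spell it out with a placeholder filesystem name, the whole thing is
roughly this, all of it on the cluster side, nothing on the client:

# ceph.conf on the MDS hosts, then restart the MDS:
[mds]
    mds bal frag = true

# once, from any node that has an admin key:
ceph fs set <name> allow_dirfrags true --yes-i-really-mean-it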

John
Hauke Homburg
2016-10-12 07:42:40 UTC
Post by John Spray
[...]
mds_bal_frag is only a server side thing. You do also need to do the
"ceph fs set <name> allow_dirfrags true", which you can run from any
client with an admin key (but again, this is a server side thing, not
a client setting).
Note that the reason directory fragmentation is not enabled by default
is that it wasn't thoroughly tested ahead of Jewel, so there's a
reason it requires a --yes-i-really-mean-it.
John
Hello,

Yesterday I found the correct command line to enable allow_dirfrags:

ceph mds set allow_dirfrags true --yes-i-really-mean-it

I ran it on a client with Debian 8, kernel 4 and the Ceph 10.2.3 client
(ceph-fs-common and ceph-common) installed.

For testing, I am currently rsyncing a few TB into the Ceph cluster,
including directories with more than 100K entries.

Do you know the roadmap for directory fragmentation becoming stable?
ceph.com is currently offline.

Regards

Hauke
--
www.w3-creative.de

www.westchat.de
Hauke Homburg
2016-12-07 07:32:10 UTC
Post by Hauke Homburg
[...]
Hello,

After some discussion in our team we have deleted the CephFS and
switched to RBD with ext4.

The setup we now want to realize:

1 Ceph cluster, Jewel 10.2.3
5 servers with the Ceph 10.2.3 client (rbd) installed. We pass all 5
mons of our cluster to every rbd map call, to have failover.
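
The map call itself looks roughly like this (monitor addresses and
image name are placeholders):

rbd map rbd/backup01 --id admin \
    -m mon1:6789,mon2:6789,mon3:6789,mon4:6789,mon5:6789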

Currently we have the problem that we can store data in the cluster
with rsync, but when rsync deletes files, ext4 gets filesystem errors.

My understanding of Ceph with RBD was that I can use it as a cluster
filesystem like OCFS2, so I don't understand why I get filesystem errors.

I read in some postings here that Ceph needs filesystem locking like
DLM. Is this true, even in the current version (Jewel)? Doesn't libceph
do this locking?

Thanks for your help

Hauke
--
www.w3-creative.de

www.westchat.de
Mike Miller
2016-12-11 16:38:14 UTC
Hi,

you have given up too early. rsync is not a nice workload for cephfs;
in particular, with most Linux kernel clients cephfs will end up
caching all inodes/dentries, and the result is that the MDS servers
crash due to memory limitations. rsync basically scans all
inodes/dentries, so it is the perfect application to gobble up all
inode caps.

We run a cronjob script flush_cache every few (2-5) minutes:

#!/bin/bash
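# "2" drops reclaimable slab objects (dentries and inodes) only, not the page cache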
echo 2 > /proc/sys/vm/drop_caches

on all machines that mount cephfs. There is no performance drop in the
client machines, but happily, the mds congestion is solved by this.
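
The cron entry itself is nothing special, roughly like this (the path
is just where we happen to keep the script):

# /etc/cron.d/flush_cache
*/3 * * * * root /usr/local/sbin/flush_cache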

We also went the rbd way before this, but for large rbd images we much
prefer cephfs instead.

Regards,

Mike
John Spray
2016-12-12 00:40:28 UTC
Post by Mike Miller
Hi,
you have given up too early. rsync is not a nice workload for cephfs;
in particular, with most Linux kernel clients cephfs will end up
caching all inodes/dentries, and the result is that the MDS servers
crash due to memory limitations. rsync basically scans all
inodes/dentries, so it is the perfect application to gobble up all
inode caps.
While historically there have been client bugs that prevented the MDS
from enforcing cache size limits, this is not expected behaviour --
manually calling drop_caches is most definitely a workaround and not
something that I would recommend unless you're stuck with a
known-buggy client version for some reason.

Just felt the need to point that out in case people started picking
this up as a best practice!

Cheers,
John
Post by Mike Miller
#!/bin/bash
echo 2 > /proc/sys/vm/drop_caches
on all machines that mount cephfs. There is no performance drop in the
client machines, but happily, the mds congestion is solved by this.
We also went the rbd way before this, but for large rbd images we much
prefer cephfs instead.
Regards,
Mike
Mike Miller
2016-12-12 08:08:50 UTC
John,

thanks for emphasizing this. Before this workaround we tried many
different kernel versions, including 4.5.x, all with the same result.
The problem might be particular to our environment, as most of the
client machines (compute servers) have large RAM and therefore plenty
of cache space for inodes/dentries.

Cheers,

Mike
