Dylan McCulloch
2018-10-08 06:57:18 UTC
Hi all,
We have identified some unexpected blocking behaviour by the ceph-fs kernel client.
When performing 'rm' on large files (100+GB), there appears to be a significant delay of 10 seconds or more before a 'stat' operation can be performed on the same directory on the filesystem.
Looking at the kernel client's mds inflight-ops, we observe that there are pending UNLINK operations corresponding to the deleted files.
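One way to watch for these pending requests from the client side is through debugfs. This is a minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and that the kernel client lists its in-flight MDS requests in a per-mount "mdsc" file there. The function name and the optional directory argument are hypothetical, added only for illustration.

```shell
# Count in-flight MDS requests whose line mentions "unlink".
# $1: debugfs directory of the ceph kernel client
#     (assumed default: /sys/kernel/debug/ceph).
count_pending_unlinks() {
    dir=${1:-/sys/kernel/debug/ceph}
    # Each mount is assumed to have its own subdirectory holding an "mdsc" file.
    cat "$dir"/*/mdsc 2>/dev/null | grep -ci unlink
}
```

Run as root immediately after the rm loop. A non-zero count while 'ls' is still hanging would be consistent with the observation above.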
We have noted some correlation between files being in the client page cache and the blocking behaviour. For example, if the cache is dropped or the filesystem is remounted, the blocking does not occur.
Test scenario below:
/mnt/cephfs_mountpoint type ceph (rw,relatime,name=ceph_filesystem,secret=<hidden>,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)
Test1:
1) unmount & remount
2) Add 10 x 100GB files to a directory:
for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done
3) Delete all files in directory:
for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
4) Immediately perform ls on directory:
time ls /mnt/cephfs_mountpoint/test1
Result: delay ~16 seconds
real 0m16.818s
user 0m0.000s
sys 0m0.002s
Test2:
1) unmount & remount
2) Add 10 x 100GB files to a directory
for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done
3) Either a) unmount & remount; or b) drop caches
echo 3 >/proc/sys/vm/drop_caches
4) Delete files in directory:
for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
5) Immediately perform ls on directory:
time ls /mnt/cephfs_mountpoint/test1
Result: no delay
real 0m0.010s
user 0m0.000s
sys 0m0.001s
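For anyone wanting to repeat the two tests with smaller files or a different directory, the steps can be wrapped in one function. This is only a sketch: the function name, its arguments and the "ls_ms=" output format are hypothetical, it relies on GNU date for nanosecond timestamps, and the optional cache drop needs root. The unmount & remount step is not included.

```shell
# Usage: repro_unlink_delay DIR [COUNT] [SIZE_MB] [drop]
# Writes COUNT files of SIZE_MB MiB into DIR, optionally drops the page
# cache, deletes the files, then reports how long the next 'ls' takes.
repro_unlink_delay() {
    local dir=$1 count=${2:-10} size_mb=${3:-102400} drop=${4:-no}
    local i start end

    # Create the files (same dd invocation as in the tests above).
    for i in $(seq 1 "$count"); do
        dd if=/dev/zero of="$dir/file$i.txt" bs=1048576 count="$size_mb" 2>/dev/null
    done

    # Test2 variant: drop the page cache before deleting (requires root).
    if [ "$drop" = "drop" ]; then
        sync
        echo 3 > /proc/sys/vm/drop_caches
    fi

    # Delete the files.
    for i in $(seq 1 "$count"); do
        rm -f "$dir/file$i.txt"
    done

    # Time the first directory listing after the deletes, in milliseconds.
    start=$(date +%s%N)
    ls "$dir" > /dev/null
    end=$(date +%s%N)
    echo "ls_ms=$(( (end - start) / 1000000 ))"
}
```

For example, "repro_unlink_delay /mnt/cephfs_mountpoint" would correspond to Test1, and "repro_unlink_delay /mnt/cephfs_mountpoint 10 102400 drop" to Test2 option b).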
Our understanding of the ceph-fs file deletion mechanism is that no blocking should be observed on the client: http://docs.ceph.com/docs/mimic/dev/delayed-delete/
It appears that if files are cached on the client, either because they were created or accessed recently, the kernel client blocks for reasons we have not identified.
Is this a known issue, and are there any ways to mitigate this behaviour?
Our production system relies on our clients' processes having concurrent access to the file system, and access contention must be avoided.
An old mailing list post that discusses changes to the client's page cache behaviour may be relevant:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005692.html
Client System:
OS: RHEL7
Kernel: 4.15.15-1
Cluster: Ceph: Luminous 12.2.8
Thanks,
Dylan