[ceph-users] kernel:rbd:rbd0: encountered watch error: -10
x***@iluvatar.ai
2018-11-10 06:35:18 UTC
Hi!

I have run into a confusing case:

When I write to cephfs and rbd at the same time, after a while the process using rbd hangs and I find this in the kernel log:

kernel:rbd:rbd0: encountered watch error: -10

I was able to reproduce it with the following steps:

- run 2 dd processes that write to cephfs
- write files to rbd at the same time
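
A rough sketch of how these reproduction steps could be scripted. Everything specific here is an assumption made for illustration: the mount points /mnt/cephfs and /mnt/rbd, the file names, the 10 GiB size, and the use of dd for the rbd writer (the post only says "file write").

```python
import subprocess


def build_dd_command(target, size_mb):
    """Return an argv list for a dd that writes size_mb MiB of zeros to target."""
    return ["dd", "if=/dev/zero", f"of={target}", "bs=1M", f"count={size_mb}"]


def build_workload(cephfs_dir="/mnt/cephfs", rbd_dir="/mnt/rbd", size_mb=10240):
    """Two writers on cephfs plus one writer on a filesystem backed by rbd0.

    All paths and sizes are placeholders, not values from the original setup.
    """
    return [
        build_dd_command(f"{cephfs_dir}/dd-test-1", size_mb),
        build_dd_command(f"{cephfs_dir}/dd-test-2", size_mb),
        build_dd_command(f"{rbd_dir}/rbd-test", size_mb),
    ]


def run_workload(commands):
    """Start all writers concurrently and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

Calling run_workload(build_workload()) starts the three writers together, so the cephfs and rbd writes overlap as in the description.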

I find that many CPUs are in iowait, and many kernel processes are in D state.
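
For anyone who wants to check the same thing, here is a minimal sketch that lists processes in D state (uninterruptible sleep) by reading /proc/<pid>/stat. The function names are made up for this example. It scans processes only, not individual threads.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the contents of a /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

Running d_state_processes() a few times during the workload shows whether the blocked tasks are mostly kswapd and writeback threads or the writers themselves.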

My guesses:

- The processes in D state are mainly kswapd and the writeback threads that flush dirty pages.
When the IO queue of the rbd disk is very long, any process doing IO on the rbd disk
has to queue and wait a long time in D state. Once a process has been blocked for more than 120s, the kernel automatically prints its call stack.

- rbd hangs because the rbd client uses watch-notify to communicate, and high iowait pressure may interfere with it.

- cephfs and rbd share network bandwidth. We use 40GB IB for ceph, so the network is much faster than the disks.
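
One way to test the first guess is to watch how much dirty and in-flight writeback memory builds up while both workloads run. The sketch below reads the Dirty and Writeback fields from /proc/meminfo. The function names are invented for this example, and the numbers alone do not prove the cause of the hang.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def writeback_pressure(meminfo_path="/proc/meminfo"):
    """Return (Dirty, Writeback) as reported by the kernel, or None if a field is absent."""
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    return info.get("Dirty"), info.get("Writeback")
```

If Dirty keeps growing while Writeback stays high, the page cache is filling faster than the disks can drain it, which would be consistent with the queueing described above.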

The only workaround I can think of is to flush the page cache periodically with crond, but that may degrade performance.
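
A minimal sketch of what such a periodic flush might look like, assuming "refresh" means writing dirty pages back and then dropping clean page cache through /proc/sys/vm/drop_caches. That reading is an assumption, and the function name is made up. It needs root when pointed at the real sysctl file, and the cron schedule is left out.

```python
import os


def flush_page_cache(drop_caches_path="/proc/sys/vm/drop_caches", mode="1"):
    """Write dirty pages back, then ask the kernel to drop clean page cache.

    mode "1" drops page cache only; "3" also drops dentries and inodes.
    """
    # drop_caches discards only clean pages, so sync first.
    os.sync()
    with open(drop_caches_path, "w") as f:
        f.write(mode + "\n")
```

Dropping the cache forces later reads back to the disks, which is the performance cost mentioned above.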

Could someone help me?

Why does rbd hang, and how can I fix it?

I really want to use cephfs and rbd at the same time, but this issue is a serious problem for a production environment.

Thanks
