Discussion:
[ceph-users] RBD-mirror high cpu usage?
Magnus Grönlund
2018-11-15 14:24:56 UTC
Hi,

I’m trying to set up one-way rbd-mirroring for a Ceph cluster used by an
OpenStack cloud, but the rbd-mirror daemon is unable to “catch up” with the
changes. It appears to me that the bottleneck is neither the Ceph clusters
nor the network, but that the server running the rbd-mirror process is
running out of CPU.

Is a high CPU load to be expected, or is it a symptom of something else?
In other words, what can I check or do to get the mirroring working? 😊

# rbd mirror pool status nova
health: WARNING
images: 596 total
    572 starting_replay
    24 replaying
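
For reference, the per-image replay state can be drilled into with something
like the following (the image name is just a placeholder):

# rbd mirror pool status --verbose nova
# rbd mirror image status nova/<image-name>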

top - 13:31:36 up 79 days, 5:31, 1 user, load average: 32.27, 26.82, 25.33
Tasks: 360 total, 17 running, 182 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.9 us, 70.0 sy, 0.0 ni, 18.5 id, 0.0 wa, 0.0 hi, 2.7 si, 0.0 st
KiB Mem : 13205185+total, 12862490+free, 579508 used, 2847444 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 12948856+avail Mem

    PID USER  PR  NI    VIRT    RES   SHR S  %CPU %MEM      TIME+ COMMAND
2336553 ceph  20   0   17.1g 178160 20344 S 417.2  0.1   21:50.61 rbd-mirror
2312698 root  20   0       0      0     0 I  70.2  0.0   70:11.51 kworker/12:2
2312851 root  20   0       0      0     0 R  69.2  0.0   62:29.69 kworker/24:1
2324627 root  20   0       0      0     0 I  68.4  0.0   40:36.77 kworker/14:1
2235817 root  20   0       0      0     0 I  68.0  0.0  469:14.08 kworker/8:0
2241720 root  20   0       0      0     0 R  67.3  0.0  437:46.51 kworker/9:1
2306648 root  20   0       0      0     0 R  66.9  0.0  109:27.44 kworker/25:0
2324625 root  20   0       0      0     0 R  66.9  0.0   40:37.53 kworker/13:1
2336318 root  20   0       0      0     0 R  66.7  0.0   14:51.96 kworker/27:3
2324643 root  20   0       0      0     0 I  66.5  0.0   36:21.46 kworker/15:2
2294989 root  20   0       0      0     0 I  66.3  0.0  134:09.89 kworker/11:1
2324626 root  20   0       0      0     0 I  66.3  0.0   39:44.14 kworker/28:2
2324019 root  20   0       0      0     0 I  65.3  0.0   44:51.80 kworker/26:1
2235814 root  20   0       0      0     0 R  65.1  0.0  459:14.70 kworker/29:2
2294174 root  20   0       0      0     0 I  64.5  0.0  220:58.50 kworker/30:1
2324355 root  20   0       0      0     0 R  63.3  0.0   45:04.29 kworker/10:1
2263800 root  20   0       0      0     0 R  62.9  0.0  353:38.48 kworker/31:1
2270765 root  20   0       0      0     0 R  60.2  0.0  294:46.34 kworker/0:0
2294798 root  20   0       0      0     0 R  59.8  0.0  148:48.23 kworker/1:2
2307128 root  20   0       0      0     0 R  59.8  0.0   86:15.45 kworker/6:2
2307129 root  20   0       0      0     0 I  59.6  0.0   85:29.66 kworker/5:0
2294826 root  20   0       0      0     0 R  58.2  0.0  138:53.56 kworker/7:3
2294575 root  20   0       0      0     0 I  57.8  0.0  155:03.74 kworker/2:3
2294310 root  20   0       0      0     0 I  57.2  0.0  176:10.92 kworker/4:2
2295000 root  20   0       0      0     0 I  57.2  0.0  132:47.28 kworker/3:2
2307060 root  20   0       0      0     0 I  56.6  0.0   87:46.59 kworker/23:2
2294931 root  20   0       0      0     0 I  56.4  0.0  133:31.47 kworker/17:2
2318659 root  20   0       0      0     0 I  56.2  0.0   55:01.78 kworker/16:2
2336304 root  20   0       0      0     0 I  56.0  0.0   11:45.92 kworker/21:2
2306947 root  20   0       0      0     0 R  55.6  0.0   90:45.31 kworker/22:2
2270628 root  20   0       0      0     0 I  53.8  0.0  273:43.31 kworker/19:3
2294797 root  20   0       0      0     0 R  52.3  0.0  141:13.67 kworker/18:0
2330537 root  20   0       0      0     0 R  52.3  0.0   25:33.25 kworker/20:2
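
Since nearly all of the CPU time is system time spent in kworker threads, a
rough way to see what those threads are actually doing is to sample the
kernel with perf and/or turn on workqueue tracing. A minimal sketch,
assuming perf is installed and debugfs is mounted at /sys/kernel/debug:

# perf top -g
# echo 1 > /sys/kernel/debug/tracing/events/workqueue/workqueue_execute_start/enable
# cat /sys/kernel/debug/tracing/trace_pipe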

The main cluster has 12 nodes with 120 OSDs and the backup cluster has 6
nodes with 60 OSDs (but roughly the same amount of storage). The rbd-mirror
daemon runs on a separate server with 2x E5-2650 v2 CPUs and 128 GB of memory.

Best regards
/Magnus
Magnus Grönlund
2018-11-21 08:22:23 UTC
Hi,

Answering my own question: the high load was related to the cpufreq kernel
module. After unloading the cpufreq module, the CPU load dropped instantly
and the mirroring started to work.
Obviously there is a bug somewhere, but for the moment I’m just happy it
works.
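
In case anyone else runs into this: a rough sketch of how to check the
active frequency-scaling driver/governor via sysfs and switch to the
performance governor instead of (or before) unloading anything. The module
name in the last line is platform-dependent (e.g. acpi_cpufreq); intel_pstate
is built into the kernel and cannot be removed this way.

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# for c in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > "$c"; done
# modprobe -r acpi_cpufreq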

/Magnus
Post by Magnus Grönlund
Hi,
I’m trying to set up one-way rbd-mirroring for a Ceph cluster used by an
OpenStack cloud, but the rbd-mirror daemon is unable to “catch up” with the
changes. It appears to me that the bottleneck is neither the Ceph clusters
nor the network, but that the server running the rbd-mirror process is
running out of CPU.
Is a high CPU load to be expected, or is it a symptom of something else?
In other words, what can I check or do to get the mirroring working? 😊