Cody
2018-11-27 00:06:34 UTC
Hello,
I have a Ceph cluster deployed together with OpenStack using TripleO.
While the Ceph cluster shows a healthy status, its performance is
painfully slow. After ruling out network issues, I have zeroed in on
the Ceph cluster itself, but have no experience in further debugging
and tuning.
The Ceph OSD part of the cluster uses 3 identical servers with the
following specifications:
CPU: 2 x E5-2603 @1.8GHz
RAM: 16GB
Network: 1G port shared for Ceph public and cluster traffic
Journaling device: 1 x 120GB SSD (SATA3, consumer grade)
OSD device: 2 x 2TB 7200rpm spindle (SATA3, consumer grade)
This is not beefy in any way, but I am only running a PoC with
minimal utilization.
Ceph-mon and ceph-mgr daemons are hosted on the OpenStack Controller
nodes. Ceph-ansible version 3.1 is used, with FileStore in the
non-collocated scenario (1 SSD for every 2 OSDs). Connection speed
among the Controller, Compute, and OSD nodes reaches ~900Mbps, as
tested with iperf.
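For completeness, the ~900Mbps figure came from plain iperf runs
between pairs of nodes, roughly like the following (the host name is
just a placeholder):

# iperf -s                        (on one node)
# iperf -c osd-node-1 -t 30       (on another node)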
I followed the Red Hat Ceph 3 benchmarking procedure [1] and received
the following results (the rados bench commands I ran are listed
after the results):
Write Test:
Total time run: 80.313004
Total writes made: 17
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 0.846687
Stddev Bandwidth: 0.320051
Max bandwidth (MB/sec): 2
Min bandwidth (MB/sec): 0
Average IOPS: 0
Stddev IOPS: 0
Max IOPS: 0
Min IOPS: 0
Average Latency(s): 66.6582
Stddev Latency(s): 15.5529
Max latency(s): 80.3122
Min latency(s): 29.7059
Sequential Read Test:
Total time run: 25.951049
Total reads made: 17
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 2.62032
Average IOPS: 0
Stddev IOPS: 0
Max IOPS: 1
Min IOPS: 0
Average Latency(s): 24.4129
Max latency(s): 25.9492
Min latency(s): 0.117732
Random Read Test:
Total time run: 66.355433
Total reads made: 46
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 2.77295
Average IOPS: 0
Stddev IOPS: 3
Max IOPS: 27
Min IOPS: 0
Average Latency(s): 21.4531
Max latency(s): 66.1885
Min latency(s): 0.0395266
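For anyone who wants to reproduce, the tests above follow the rados
bench steps from [1], roughly along these lines (pool name as in the
guide):

# rados bench -p testbench 10 write --no-cleanup
# rados bench -p testbench 10 seq
# rados bench -p testbench 10 rand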
Clearly, the results are pathetic...
As I moved on to test block devices, I got the following error message:
# rbd map image01 --pool testbench --name client.admin
rbd: failed to add secret 'client.admin' to kernel
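In case it helps with suggestions, this is roughly what I plan to
check next (default keyring path assumed; I am not sure whether
passing the keyring explicitly is the right fix):

# ceph auth get-key client.admin       (verify the key exists)
# dmesg | tail                         (kernel messages after the failed map)
# rbd map image01 --pool testbench --name client.admin \
    --keyring /etc/ceph/ceph.client.admin.keyring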
Any suggestions on the above error and/or debugging would be greatly
appreciated!
Thank you very much to all.
Cody
[1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/administration_guide/#benchmarking_performance