Discussion:
[ceph-users] SLOW SSD's after moving to Bluestore
Tyler Bishop
2018-12-11 00:09:53 UTC
Hi,

I have an SSD-only cluster that I recently converted from filestore to
bluestore, and performance has totally tanked. It was fairly decent before,
with only a little more latency than expected. Since converting to bluestore
the latency is extremely high, SECONDS. I am trying to determine whether it
is an issue with the SSDs themselves or Bluestore treating them differently
than filestore... potential garbage collection? 24+ hrs ???

I am now seeing constant 100% IO utilization on ALL of the devices and
performance is terrible!

IOSTAT

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.37    0.00    0.34   18.59    0.00   79.70

Device:   rrqm/s  wrqm/s    r/s     w/s  rkB/s     wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm  %util
sda         0.00    0.00   0.00    9.50   0.00     64.00    13.47     0.01     1.16    0.00     1.16   1.11   1.05
sdb         0.00   96.50   4.50   46.50  34.00  11776.00   463.14   132.68  1174.84  782.67  1212.80  19.61 100.00
dm-0        0.00    0.00   5.50  128.00  44.00   8162.00   122.94   507.84  1704.93  674.09  1749.23   7.49 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.85    0.00    0.30   23.37    0.00   75.48

Device:   rrqm/s  wrqm/s    r/s     w/s  rkB/s     wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm  %util
sda         0.00    0.00   0.00    3.00   0.00     17.00    11.33     0.01     2.17    0.00     2.17   2.17   0.65
sdb         0.00   24.50   9.50   40.50  74.00  10000.00   402.96    83.44  2048.67 1086.11  2274.46  20.00 100.00
dm-0        0.00    0.00  10.00   33.50  78.00   2120.00   101.06   287.63  8590.47 1530.40 10697.96  22.99 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.81    0.00    0.30   11.40    0.00   87.48

Device:   rrqm/s  wrqm/s    r/s     w/s  rkB/s     wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm  %util
sda         0.00    0.00   0.00    6.00   0.00     40.25    13.42     0.01     1.33    0.00     1.33   1.25   0.75
sdb         0.00  314.50  15.50   72.00 122.00  17264.00   397.39    61.21  1013.30  740.00  1072.13  11.41  99.85
dm-0        0.00    0.00  10.00  427.00  78.00  27728.00   127.26   224.12   712.01 1147.00   701.82   2.28  99.85

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.22    0.00    0.29    4.01    0.00   94.47

Device:   rrqm/s  wrqm/s    r/s     w/s  rkB/s     wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm  %util
sda         0.00    0.00   0.00    3.50   0.00     17.00     9.71     0.00     1.29    0.00     1.29   1.14   0.40
sdb         0.00    0.00   1.00   39.50   8.00  10112.00   499.75    78.19  1711.83 1294.50  1722.39  24.69 100.00
Mark Nelson
2018-12-11 00:43:31 UTC
Hi Tyler,

I think we had a user a while back who reported background deletion work
going on after upgrading their OSDs from filestore to bluestore, due to PGs
having been moved around.  Is it possible that your cluster is doing a bunch
of work (deletion or otherwise) beyond the regular client load?  I don't
remember how to check for this off the top of my head, but it might be
something to investigate.  If that's what it is, we just recently added the
ability to throttle background deletes:

https://github.com/ceph/ceph/pull/24749
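
A minimal sketch of how that throttle might be applied at runtime, assuming
it ends up exposed as an osd_delete_sleep style option (check the PR for the
exact option name in your release):

# assumed option name; verify it exists in your Ceph release first
ceph tell osd.* injectargs '--osd_delete_sleep 1'
# or, on releases with the centralized config store:
ceph config set osd osd_delete_sleep 1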


If the logs/admin socket don't tell you anything, you could also try
using our wallclock profiler to see what the OSD is spending its time
doing:

https://github.com/markhpc/gdbpmp/


./gdbpmp -t 1000 -p`pidof ceph-osd` -o foo.gdbpmp

./gdbpmp -i foo.gdbpmp -t 1
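
If the host runs more than one ceph-osd, a hypothetical variant like this
pins the profiler to a specific OSD id (it assumes the usual ceph-osd@<id>
systemd unit naming):

# OSD_ID is a placeholder; pick the OSD you want to profile
OSD_ID=12
PID=$(systemctl show -p MainPID --value ceph-osd@${OSD_ID})
./gdbpmp -t 1000 -p "$PID" -o osd.${OSD_ID}.gdbpmp
./gdbpmp -i osd.${OSD_ID}.gdbpmp -t 1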


Mark
Tyler Bishop
2018-12-11 01:43:40 UTC
I don't think that's my issue here, because I don't see any IO to justify the
latency. Unless the IO is minimal and it's Ceph issuing a bunch of discards
to the SSD, causing it to slow down while doing that.

The log isn't showing anything useful, and I have most debugging disabled.
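
One thing I can still check without re-enabling debug logging is the admin
socket; osd.0 below is just a placeholder id, and the grep is only a rough
filter for the latency counters:

ceph daemon osd.0 dump_historic_ops           # recent slow ops with per-step timings
ceph daemon osd.0 perf dump | grep -i lat     # rough filter for bluestore/kv latency counters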
Christian Balzer
2018-12-11 01:57:19 UTC
Hello,
Post by Tyler Bishop
I don't think that's my issue here, because I don't see any IO to justify the
latency. Unless the IO is minimal and it's Ceph issuing a bunch of discards
to the SSD, causing it to slow down while doing that.
What does atop have to say?

Discards/trims are usually visible in it; this is during an fstrim of a
RAID1 /:
---
DSK | sdb | busy 81% | read 0 | write 8587 | MBw/s 2323.4 | avio 0.47 ms |
DSK | sda | busy 70% | read 2 | write 8587 | MBw/s 2323.4 | avio 0.41 ms |
---

The numbers tend to be a lot higher than what the actual interface is
capable of; clearly the SSD is reporting its internal activity.

In any case, it should give good insight into what is going on activity-wise.
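
As another data point, newer iostat (sysstat 11.7 or later, if memory serves)
breaks discards out into their own d/s, dkB/s and d_await columns, which
would show directly whether trims are being issued to the OSD device:

# discard columns only show up if the installed sysstat supports them
iostat -dx /dev/sdb 2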
Also for posterity and curiosity, what kind of SSDs?

Christian
--
Christian Balzer Network/Systems Engineer
***@gol.com Rakuten Communications
Tyler Bishop
2018-12-11 01:58:07 UTC
Older Crucial/Micron M500/M600
_____________________________________________

*Tyler Bishop*
EST 2007


O: 513-299-7108 x1000
M: 513-646-5809
http://BeyondHosting.net


Tyler Bishop
2018-12-11 02:01:31 UTC
LVM | dm-0 | busy 101% | read 137 | write 1761 | KiB/r 4 | KiB/w 30 | MBr/s 0.1 | MBw/s 5.3 | avq 185.42 | avio 5.31 ms |
DSK | sdb  | busy 100% | read 127 | write 1208 | KiB/r 4 | KiB/w 32 | MBr/s 0.1 | MBw/s 3.9 | avq  58.39 | avio 7.49 ms |
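
Given dm-0 is pegged while only a few MB/s hit sdb, it may also be worth
confirming whether bluestore is even issuing discards here; as far as I know
that is governed by bdev_enable_discard and is off by default, but checking
is cheap (osd.0 is a placeholder id):

ceph daemon osd.0 config show | grep -i discard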
_____________________________________________

Tyler Bishop
EST 2007


O: 513-299-7108 x1000
M: 513-646-5809
http://BeyondHosting.net

