Brendan Moloney
2018-11-13 02:19:57 UTC
Hi,
I have been reading up on this a bit, and found one particularly useful mailing list thread [1].
The fact that there is such a large jump when your DB fits into 3 levels (30GB) vs 4 levels (300GB) makes it hard to choose SSDs of an appropriate size. My workload is all RBD, so objects should be large, but I am also looking at purchasing rather large HDDs (12TB). It seems wasteful to spec out 300GB per OSD, but I am worried that I will barely cross the 30GB threshold when the disks get close to full.
It would be nice if we could either enable "dynamic level sizing" (done here [2] for monitors, but not bluestore?), or allow changing the "max_bytes_for_level_base" to something that better suits our use case. For example, if it were set to 25% of the default (75MB for L0 and L1, 750MB L2, 7.5GB L3, 75GB L4) then I could allocate ~85GB per OSD and feel confident there wouldn't be any spillover onto the slow HDDs. A rough sketch of what I mean is below. I am far from an expert on RocksDB, so I might be overlooking something important here.
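For illustration, I imagine it would look something like this in ceph.conf via the bluestore_rocksdb_options string (untested sketch; the exact numbers are just the 25% example above, and I believe this string replaces Ceph's default RocksDB tuning rather than merging with it, so the other default options would need to be carried over as well):

    [osd]
    # Untested sketch: shrink each RocksDB level to 25% of the usual size.
    # max_bytes_for_level_base and max_bytes_for_level_multiplier are standard
    # RocksDB options; 78643200 bytes = 75MB for L1, multiplied by 10 per level.
    bluestore_rocksdb_options = max_bytes_for_level_base=78643200,max_bytes_for_level_multiplier=10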
[1] https://ceph-users.ceph.narkive.com/tGcDsnAB/slow-used-bytes-slowdb-being-used-despite-lots-of-space-free-in-blockdb-on-ssd
[2] https://tracker.ceph.com/issues/24361
Thanks,
Brendan