Discussion:
Bluestore with so many small files
Behnam Loghmani
2018-02-12 13:06:43 UTC
Hi there,

I am using ceph Luminous 12.2.2 with:

3 osds (each osd is 100G) - no WAL/DB separation.
3 mons
1 rgw
cluster size 3

I stored lots of very small thumbnails on Ceph via radosgw.

The actual size of the files is about 32G, but they fill 70G on each OSD.

What's the reason for this high disk usage?
Should I change "bluestore_min_alloc_size_hdd"? And if I change it and set
it to a smaller size, does it impact performance?

What is the best practice for storing small files on BlueStore?

Best regards,
Behnam Loghmani
David Turner
2018-02-12 13:36:11 UTC
Some of your overhead is the WAL and RocksDB that are on the OSDs. The WAL
is pretty static in size, but RocksDB grows with the number of objects you
have. You also have copies of the osdmap on each OSD. There's just overhead
that adds up. The biggest contributor is going to be RocksDB, given how many
objects you have.
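
If you want a rough picture of where the space is going, something like
this should help (just a sketch; exact columns and pool names vary by
release and setup):

    # raw usage per OSD
    ceph osd df
    # logical usage per pool, including the RGW index/metadata pools
    ceph df detail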
Post by Behnam Loghmani
Hi there,
3 osds (each osd is 100G) - no WAL/DB separation.
3 mons
1 rgw
cluster size 3
I stored lots of very small thumbnails on Ceph via radosgw.
The actual size of the files is about 32G, but they fill 70G on each OSD.
What's the reason for this high disk usage?
Should I change "bluestore_min_alloc_size_hdd"? And if I change it and set
it to a smaller size, does it impact performance?
What is the best practice for storing small files on BlueStore?
Best regards,
Behnam Loghmani
Behnam Loghmani
2018-02-12 14:16:31 UTC
So you mean that RocksDB and the osdmap fill about 40G of disk for only 800k
files?
That doesn't seem reasonable to me; it's too high.
Post by David Turner
Some of your overhead is the WAL and RocksDB that are on the OSDs. The WAL
is pretty static in size, but RocksDB grows with the number of objects you
have. You also have copies of the osdmap on each OSD. There's just overhead
that adds up. The biggest contributor is going to be RocksDB, given how many
objects you have.
Post by Behnam Loghmani
Hi there,
3 osds (each osd is 100G) - no WAL/DB separation.
3 mons
1 rgw
cluster size 3
I stored lots of very small thumbnails on Ceph via radosgw.
The actual size of the files is about 32G, but they fill 70G on each OSD.
What's the reason for this high disk usage?
Should I change "bluestore_min_alloc_size_hdd"? And if I change it and
set it to a smaller size, does it impact performance?
What is the best practice for storing small files on BlueStore?
Best regards,
Behnam Loghmani
Wido den Hollander
2018-02-12 15:35:03 UTC
Post by Behnam Loghmani
So you mean that RocksDB and the osdmap fill about 40G of disk for only 800k
files?
That doesn't seem reasonable to me; it's too high.
Could you check the output of the OSDs using a 'perf dump' on their
admin socket?

The 'bluestore' and 'bluefs' sections should tell you:

- db_used_bytes
- onodes

Using those values you can figure out how much data the DB is using and
how many objects you have in the OSD.
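
For example, assuming osd.0 runs locally and jq is available (the counter
names below are from memory, so double-check them against your own dump):

    ceph daemon osd.0 perf dump | jq '.bluefs.db_used_bytes, .bluestore.bluestore_onodes'

Dividing db_used_bytes by the onode count gives a rough per-object DB
overhead.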

Wido
Post by Behnam Loghmani
Some of your overhead is the WAL and RocksDB that are on the OSDs.
The WAL is pretty static in size, but RocksDB grows with the number
of objects you have. You also have copies of the osdmap on each OSD.
There's just overhead that adds up. The biggest contributor is going
to be RocksDB, given how many objects you have.
On Mon, Feb 12, 2018, 8:06 AM Behnam Loghmani
Hi there,
3 osds (each osd is 100G) - no WAL/DB separation.
3 mons
1 rgw
cluster size 3
I stored lots of very small thumbnails on Ceph via radosgw.
The actual size of the files is about 32G, but they fill 70G on each OSD.
What's the reason for this high disk usage?
Should I change "bluestore_min_alloc_size_hdd"? And if I change
it and set it to a smaller size, does it impact performance?
What is the best practice for storing small files on BlueStore?
Best regards,
Behnam Loghmani
Igor Fedotov
2018-02-13 09:42:51 UTC
Hi Behnam,
Post by Behnam Loghmani
Hi there,
3 osds (each osd is 100G) - no WAL/DB separation.
3 mons
1 rgw
cluster size 3
I stored lots of very small thumbnails on Ceph via radosgw.
The actual size of the files is about 32G, but they fill 70G on each OSD.
What's the reason for this high disk usage?
Most probably the major reason is BlueStore allocation granularity. E.g.
an object of 1K bytes needs 64K of disk space if the default
bluestore_min_alloc_size_hdd (=64K) is applied.
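As a quick back-of-the-envelope check, assuming the ~800k files mentioned
earlier map roughly one-to-one to RADOS objects and each object is smaller
than the allocation unit:

    800,000 objects * 64 KiB min_alloc = ~49 GiB allocated per OSD copy

which is already well above the ~32G of logical data, before any DB/WAL
overhead is counted.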
An additional inconsistency in space reporting might also appear since
BlueStore adds up the DB volume space when accounting for total store space,
while free space is taken from the block device only. As a result, the
reported "Used" space always contains that total DB space part (i.e.
Used = Total(Block+DB) - Free(Block)). That correlates with other
comments in this thread about RocksDB space usage.
There is a pending PR to fix that:
https://github.com/ceph/ceph/pull/19454/commits/144fb9663778f833782bdcb16acd707c3ed62a86
You may look for "Bluestore: inaccurate disk usage statistics problem"
in this mailing list for previous discussion as well.
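If you want to see both effects on a live OSD, the BlueStore perf counters
report logical vs. allocated bytes as well as DB usage (again, counter
names from memory, please verify against your own 'perf dump' output):

    ceph daemon osd.0 perf dump | jq '.bluestore.bluestore_stored, .bluestore.bluestore_allocated, .bluefs.db_used_bytes'

A large gap between 'stored' and 'allocated' points at min_alloc_size
padding rather than DB overhead.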
Post by Behnam Loghmani
Should I change "bluestore_min_alloc_size_hdd"? And if I change it and
set it to a smaller size, does it impact performance?
Unfortunately I haven't benchmarked "small writes over hdd" cases much,
hence I don't have an exact answer here. Indeed, the 'min_alloc_size'
family of parameters might impact performance quite significantly.
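If you do decide to experiment, the setting would go into ceph.conf along
these lines (note: as far as I know, min_alloc_size is fixed at OSD creation
time, so this only affects OSDs that are re-provisioned afterwards):

    [osd]
    # 4 KiB instead of the 64 KiB HDD default; just an illustrative value
    bluestore_min_alloc_size_hdd = 4096

and then recreate the OSDs one by one and let the cluster recover/backfill.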
Post by Behnam Loghmani
What is the best practice for storing small files on BlueStore?
Best regards,
Behnam Loghmani