[ceph-users] Is Ceph the right tool for storing lots of small files?
Christian Wimmer
2018-07-17 06:41:19 UTC
Hi all,

I am trying to use Ceph with RGW to store lots (>300M) of small files (80%
2-15kB, 20% up to 500kB).
After some testing, I wonder if Ceph is the right tool for that.

Does anyone here have experience with this use case?

Things I came across:
- EC pools: default stripe-width is 4kB. Does it make sense to lower the
stripe width for small objects or is EC a bad idea for this use case?
- Bluestore: the bluestore min alloc size is 64kB by default. Would it be
better to lower it to, say, 2kB, or am I better off with Filestore (probably
not if I want to store a huge amount of small files)?
- Bluestore / RocksDB: RocksDB seems to consume a lot of disk space when
storing lots of files.
For example: I have OSDs with about 500k onodes (which should translate
to 500k stored objects, right?) and the DB size is about 30GB. That's about
63kB per onode - which is a lot, considering the original object is about
5kB.
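
To make the arithmetic behind these three points concrete, here is a rough sketch in Python. It is a simplified model, not how Ceph actually accounts for space: it assumes every object (or EC shard) is rounded up to a multiple of the min alloc size, that an EC object is padded to whole stripes of k * stripe_unit bytes, and that "30GB" means 30 GiB. The function names, the k=4/m=2 profile and the 3x replica count are illustrative assumptions, not values from my cluster.

```python
KIB = 1024
GIB = 1024 ** 3


def round_up(n: int, unit: int) -> int:
    """Round n up to the next multiple of unit (integer arithmetic)."""
    return -(-n // unit) * unit


def db_bytes_per_onode(db_bytes: int, onodes: int) -> float:
    """Average RocksDB footprint per onode."""
    return db_bytes / onodes


def replicated_raw_bytes(obj_bytes: int, min_alloc: int, replicas: int = 3) -> int:
    """Raw bytes used by one object in a replicated pool (assumed model)."""
    return replicas * round_up(obj_bytes, min_alloc)


def ec_raw_bytes(obj_bytes: int, min_alloc: int, k: int = 4, m: int = 2,
                 stripe_unit: int = 4 * KIB) -> int:
    """Raw bytes used by one object in a k+m EC pool (assumed model).

    The object is padded to whole stripes; each of the k+m shards holds one
    stripe_unit per stripe and is then rounded up to min_alloc.
    """
    stripes = -(-obj_bytes // (k * stripe_unit))
    shard_bytes = stripes * stripe_unit
    return (k + m) * round_up(shard_bytes, min_alloc)


if __name__ == "__main__":
    per_onode = db_bytes_per_onode(30 * GIB, 500_000)
    print(f"DB per onode: {per_onode / KIB:.1f} KiB")

    obj = 5 * KIB
    for alloc in (64 * KIB, 4 * KIB, 2 * KIB):
        rep = replicated_raw_bytes(obj, alloc)
        ec = ec_raw_bytes(obj, alloc)
        print(f"min_alloc={alloc // KIB:>2} KiB  "
              f"3x replica={rep // KIB} KiB  EC 4+2={ec // KIB} KiB")
```

Under these assumptions a 5kB object costs 192 KiB raw with 3x replication and 384 KiB with EC 4+2 at a 64kB min alloc size, so in this model EC is the more expensive option until the allocation unit gets small.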

Thanks,
Christian
Gregory Farnum
2018-07-17 16:45:27 UTC
Those numbers seem a little large to me (although with erasure coding they
could make sense due to the "object info" replication across shards), but
in general I would not expect Ceph or RGW to be a good fit for files which
tend to be that small from a data storage efficiency standpoint.

That said, you've got about 4TB of data there. Are you sure some large SSDs
in a RAID1(+0) or something wouldn't fulfill your needs? ;) If you're more
concerned about scaling out the IO than the ratio of data stored to data
used, Ceph may still be a good choice. *shrug*
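
For what it's worth, the "ratio of data stored to data used" can be sketched roughly like this in Python. This is only back-of-the-envelope: it takes the ~4TB and 300M figures from this thread, assumes decimal terabytes, and assumes a replicated pool where each replica is rounded up to the allocation unit. The function names and the 3x replica count are made up for illustration.

```python
def round_up(n: int, unit: int) -> int:
    """Round n up to the next multiple of unit."""
    return -(-n // unit) * unit


def implied_mean_object_bytes(total_bytes: float, object_count: float) -> float:
    """Mean object size implied by a total data size and an object count."""
    return total_bytes / object_count


def storage_efficiency(obj_bytes: int, min_alloc: int, replicas: int = 3) -> float:
    """Logical bytes divided by raw bytes consumed (assumed simple model)."""
    return obj_bytes / (replicas * round_up(obj_bytes, min_alloc))


if __name__ == "__main__":
    mean = round(implied_mean_object_bytes(4e12, 300e6))
    print(f"implied mean object size: {mean} bytes")
    for alloc_kib in (64, 16, 4):
        eff = storage_efficiency(mean, alloc_kib * 1024)
        print(f"min_alloc={alloc_kib:>2} KiB -> efficiency {eff:.1%}")
```

With a 64kB allocation unit and 3x replication this model gives an efficiency of well under 10% for a ~13kB mean object, which is the kind of ratio I mean above.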
-Greg