Jan Kasprzak
2018-12-05 08:28:10 UTC
Hello, CEPH users,
having upgraded my CEPH cluster to Luminous, I plan to add new OSD hosts,
and I am looking for setup recommendations.
Intended usage:
- small-ish pool (tens of TB) for RBD volumes used by QEMU
- large pool for object-based cold (or not-so-hot :-) data,
write-once read-many access pattern, average object size
10s or 100s of MBs, probably custom programmed on top of
libradosstriper.
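(Before writing anything custom, I would probably prototype the access
pattern with the rados CLI, which as far as I know can exercise
libradosstriper through its --striper switch; the pool and object names
below are placeholders:

    # write a large file as a striped object, then read it back
    rados --striper -p coldpool put dump-2018-12-05.tar /srv/dump-2018-12-05.tar
    rados --striper -p coldpool get dump-2018-12-05.tar /tmp/restore.tar
    # removal has to go through the striper interface as well
    rados --striper -p coldpool rm dump-2018-12-05.tar
)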
Hardware:
The new OSD hosts have ~30 HDDs of 12 TB each and two 960 GB SSDs.
There is a small RAID-1 root and RAID-1 swap volume spanning both SSDs,
leaving about 900 GB free on each SSD.
The OSD hosts have two CPU sockets (32 cores including SMT), 128 GB RAM.
My questions:
- Filestore or Bluestore? -> probably the latter, but I am also considering
using the OSD hosts for QEMU-based VMs which are not performance
critical; in that case, would Filestore be the better choice, since the
kernel could then balance memory usage between the ceph-osd and qemu
processes through the page cache? Am I right? (A cache-size sketch for
the Bluestore case is below.)
- block.db on SSDs? The docs recommend about 4 % of the data size
for block.db, but my SSDs are only 0.6 % of total storage size.
- Or would it be better to leave SSD caching to the OS and use LVMcache
or something similar?
- LVM or simple volumes? I find it a bit strange and bloated to create
32 VGs, one per HDD or SSD, with the 30 HDD VGs each holding just a
single LV. Could I instead use /dev/disk/by-id/wwn-0x5000.... symlinks
for stable device names on the HDDs, and keep only two VGs, one per SSD?
(See the layout sketch below.)
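To make the LVM question concrete, the layout I have in mind is roughly
the sketch below; device names, VG/LV names and sizes are illustrative
only, and I have not yet verified the exact ceph-volume invocation
against Luminous:

    # one VG per SSD (two in total), carved into block.db LVs;
    # with 15 HDD OSDs per SSD, ~900 GB free gives roughly 55-60 GB per
    # block.db, i.e. about 0.5 % of a 12 TB HDD, well below the ~4 % from the docs
    pvcreate /dev/disk/by-id/wwn-0xSSD0-part4            # placeholder name
    vgcreate ceph-db-0 /dev/disk/by-id/wwn-0xSSD0-part4
    for i in $(seq 0 14); do
        lvcreate -L 55G -n db-$i ceph-db-0               # 15 x 55 GiB fits into ~900 GB
    done

    # each HDD handed to ceph-volume by its stable by-id symlink;
    # ceph-volume creates the single-LV VG on the HDD by itself
    ceph-volume lvm create --bluestore \
        --data /dev/disk/by-id/wwn-0xHDD0 \
        --block.db ceph-db-0/db-0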
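If Bluestore wins, I assume the memory concern could at least be bounded
by capping the per-OSD cache in ceph.conf, along these lines (the 512 MiB
value is only an example, and I have not double-checked that
bluestore_cache_size_hdd is the right knob on Luminous):

    [osd]
    # example value only: cap Bluestore's cache at 512 MiB per HDD-backed OSD,
    # so ~30 OSDs should leave most of the 128 GB RAM to the qemu guests
    bluestore_cache_size_hdd = 536870912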
Thanks for any recommendations.
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
This is the world we live in: the way to deal with computers is to google
the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
This is the world we live in: the way to deal with computers is to google
the symptoms, and hope that you don't have to watch a video. --P. Zaitcev