Discussion:
max number of pools per cluster
Aleksei Gutikov
2018-02-08 14:36:46 UTC
Hi all.

We use RBDs as storage of data for applications.
If the application itself can do replication (for example Cassandra),
we want to benefit (HA) from replication at the application level.
But we can't if all RBDs are in the same pool:
if all RBDs are in the same pool, then all of them are tied to one set of PGs,
and if for any reason even a single PG is damaged, for example stuck inactive,
then all RBDs will be affected.

The first thing that comes to mind is to create a separate pool for every RBD.
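
For what it's worth, a minimal sketch of that idea with the rados/rbd Python
bindings (pool name, image name and size are made-up placeholders; the new
pool gets its pg_num from osd_pool_default_pg_num unless you change it
afterwards):

import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    pool_name = "rbd-app01"              # placeholder: one pool per RBD image
    if not cluster.pool_exists(pool_name):
        cluster.create_pool(pool_name)   # pg_num = osd_pool_default_pg_num
    ioctx = cluster.open_ioctx(pool_name)
    try:
        ioctx.application_enable("rbd")  # Luminous and later
        rbd.RBD().create(ioctx, "data", 100 * 1024 ** 3)  # 100 GiB image
    finally:
        ioctx.close()
finally:
    cluster.shutdown()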

I'm aware of the limit on the number of PGs per OSD, and that
osd_pool_default_pg_num should be kept reasonable.
So the max number of pools == osds_num * pgs_per_osd / min_pool_pgs.
For example, 1000 OSDs * 300 PGs per OSD / 32 PGs per pool = 9375 pools.
With 1T OSDs the average RBD size would be about 100G (which looks sane).
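
A back-of-the-envelope check of that estimate, using the same example numbers
(a sketch only; note that if pgs_per_osd counts PG replicas rather than just
primaries, the result should also be divided by the pool's replication size):

osds_num = 1000        # OSDs in the cluster
pgs_per_osd = 300      # PG budget per OSD
min_pool_pgs = 32      # smallest pg_num we are willing to give a pool

max_pools = osds_num * pgs_per_osd // min_pool_pgs
print(max_pools)                     # 9375

osd_size_gb = 1024                   # 1T OSDs
avg_rbd_size_gb = osds_num * osd_size_gb / max_pools   # ignores replication overhead
print(round(avg_rbd_size_gb))        # ~109, i.e. roughly the 100G above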

So my question is: is there any theoretical limit on the number of pools per cluster?
And, if so, what does it depend on?

Thanks.
--
Best regards,
Aleksei Gutikov
Software Engineer | synesis.ru | Minsk. BY
Jamie Fargen
2018-02-08 15:32:28 UTC
Aleksei-

This won't be a Ceph answer. On most virtualization platforms you will have a
type of disk called ephemeral: it is usually storage composed of disks on the
hypervisor, possibly RAID with parity, and usually not backed up. You may want
to consider running your Cassandra instances on that ephemeral storage; this
would avoid doubling up on redundancy at both the application and the storage
level for the Cassandra service. Then keep backups of your Cassandra db on the
Ceph storage. There are some benefits and drawbacks; the main benefit will
probably be a latency decrease. You will need to evaluate the hypervisors you
are running on, the disk layout, etc.
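
If it helps, a minimal sketch of the backup half of that idea, assuming
Cassandra's default data directory and an RBD-backed filesystem mounted at
/mnt/ceph-backups (the mount point, keyspace name and snapshot tag are made-up
placeholders):

import datetime
import pathlib
import shutil
import subprocess

KEYSPACE = "my_keyspace"                         # placeholder
DATA_DIR = pathlib.Path("/var/lib/cassandra/data")
BACKUP_ROOT = pathlib.Path("/mnt/ceph-backups")  # RBD-backed mount

# Take a named snapshot of the keyspace on this node.
tag = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%S")
subprocess.run(["nodetool", "snapshot", "-t", tag, KEYSPACE], check=True)

# Each table directory now has a snapshots/<tag> subdirectory; copy them out.
for snap in DATA_DIR.glob("{}/*/snapshots/{}".format(KEYSPACE, tag)):
    table_dir = snap.parent.parent.name          # <table>-<uuid>
    shutil.copytree(str(snap), str(BACKUP_ROOT / tag / table_dir))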

-Jamie
--
Jamie Fargen
Consultant
***@redhat.com
813-817-4430
Konstantin Shalygin
2018-02-12 04:03:57 UTC
Reply
Permalink
Raw Message
Post by Aleksei Gutikov
And if for any reason even a single PG is damaged, for example stuck inactive,
then all RBDs will be affected.
The first thing that comes to mind is to create a separate pool for every RBD.
I think this is insane.
It is better to think about how Ceph places data with CRUSH. Plan your failure
domains and perform full-stack monitoring (hosts, power, network, ...).
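
To make the failure-domain part concrete, a sketch of a replicated CRUSH rule
that keeps each PG's copies on distinct hosts (rule and pool names are made up;
these are the standard Luminous-era commands, just driven from Python here to
keep one example language):

import subprocess

def ceph(*args):
    # Thin wrapper around the ceph CLI for readability.
    subprocess.run(["ceph"] + list(args), check=True)

# Replicated rule whose failure domain is the host bucket.
ceph("osd", "crush", "rule", "create-replicated",
     "replicated_by_host", "default", "host")

# Point a pool at it, so losing one host never loses every copy of a PG.
ceph("osd", "pool", "set", "rbd-app01", "crush_rule", "replicated_by_host")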

k
Chris Apsey
2018-02-12 04:18:58 UTC
All,

We recently doubled the number of OSDs in our cluster, and towards the end
of the rebalancing I noticed that recovery IO fell to nothing and that the
ceph mons eventually looked like this when I ran ceph -s:

  cluster:
    id:     6a65c3d0-b84e-4c89-bbf7-a38a1966d780
    health: HEALTH_WARN
            34922/4329975 objects misplaced (0.807%)
            Reduced data availability: 542 pgs inactive, 49 pgs peering, 13502 pgs stale
            Degraded data redundancy: 248778/4329975 objects degraded (5.745%), 7319 pgs unclean, 2224 pgs degraded, 1817 pgs undersized

  services:
    mon: 3 daemons, quorum cephmon-0,cephmon-1,cephmon-2
    mgr: cephmon-0(active), standbys: cephmon-1, cephmon-2
    osd: 376 osds: 376 up, 376 in

  data:
    pools:   9 pools, 13952 pgs
    objects: 1409k objects, 5992 GB
    usage:   31528 GB used, 1673 TB / 1704 TB avail
    pgs:     3.225% pgs unknown
             0.659% pgs not active
             248778/4329975 objects degraded (5.745%)
             34922/4329975 objects misplaced (0.807%)
             6141 stale+active+clean
             4537 stale+active+remapped+backfilling
             1575 stale+active+undersized+degraded
             489  stale+active+clean+remapped
             450  unknown
             396  stale+active+recovery_wait+degraded
             216  stale+active+undersized+degraded+remapped+backfilling
             40   stale+peering
             30   stale+activating
             24   stale+active+undersized+remapped
             22   stale+active+recovering+degraded
             13   stale+activating+degraded
             9    stale+remapped+peering
             4    stale+active+remapped+backfill_wait
             3    stale+active+clean+scrubbing+deep
             2    stale+active+undersized+degraded+remapped+backfill_wait
             1    stale+active+remapped

The problem is, everything works fine. If I run ceph health detail and do a
pg query against one of the 'degraded' placement groups, it reports back as
active+clean. All clients in the cluster can write and read at normal speeds,
but no IO information is ever reported in ceph -s.

From what I can see, everything in the cluster is working properly except
the actual reporting of the cluster status. Has anyone seen this before, or
know how to sync the mons up with what the OSDs are actually reporting? I see
no connectivity errors in the logs of the mons or the osds.
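
For reference, this is the kind of cross-check I'm doing, as a sketch (the PG
id is a placeholder for one of the PGs that ceph -s lists as stale/degraded):

import json
import subprocess

def ceph_json(*args):
    # Ask the cluster directly and parse the JSON output.
    out = subprocess.check_output(["ceph"] + list(args) + ["--format", "json"])
    return json.loads(out.decode("utf-8"))

pgid = "1.2f"   # placeholder: a PG reported as stale/degraded by ceph -s
query = ceph_json("pg", pgid, "query")
print(pgid, "actual state:", query.get("state"))   # comes back active+clean here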

Thanks,

---
v/r

Chris Apsey
***@bitskrieg.net
https://www.bitskrieg.net
Gregory Farnum
2018-02-12 16:51:11 UTC
It sounds like the manager has gone stale somehow. You can probably fix it
by restarting it, though if you have logs it would be good to file a bug
report at tracker.ceph.com.
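
A minimal sketch of what that could look like (the mgr id below is the active
one from the ceph -s output above; adjust to your deployment):

import subprocess

ACTIVE_MGR = "cephmon-0"   # active mgr from the ceph -s output above

# Fail the active mgr so a standby takes over and republishes PG stats ...
subprocess.run(["ceph", "mgr", "fail", ACTIVE_MGR], check=True)

# ... or restart the daemon on its host instead:
# subprocess.run(["systemctl", "restart", "ceph-mgr@" + ACTIVE_MGR], check=True)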
-Greg