Discussion:
[ceph-users] Mimic - EC and crush rules - clarification
Steven Vacaroaia
2018-11-01 17:34:20 UTC
Hi,

I am trying to create an EC pool on my SSD-based OSDs
and would appreciate it if someone could clarify / provide advice on the following:

- best K+M combination for 4 hosts with one OSD per host
My understanding is that K+M < the number of OSDs, but using K=2, M=1 does not
provide any redundancy
(as soon as 1 OSD is down, you cannot write to the pool).
Am I right?

- assigning the crush_rule as per the documentation does not seem to work
If I provide all the CRUSH rule details when I create the EC profile, the
PGs are placed on SSD OSDs AND a crush rule is automatically created.
Is that the right/new way of doing it?
EXAMPLE
ceph osd erasure-code-profile set erasureISA crush-failure-domain=osd k=3
m=2 crush-root=ssds plugin=isa technique=cauchy crush-device-class=ssd


[***@osd01 ~]# ceph osd crush rule ls
replicated_rule
erasure-code
ssdrule
[***@osd01 ~]# ceph osd crush rule dump ssdrule
{
    "rule_id": 2,
    "rule_name": "ssdrule",
    "ruleset": 2,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -4,
            "item_name": "ssds"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}

[***@osd01 ~]# ceph osd pool set test crush_rule 2
Error ENOENT: crush rule 2 does not exist
David Turner
2018-11-01 18:10:16 UTC
Yes, when you create an EC profile, a CRUSH rule specific to that profile is
automatically created. You are also correct that 2+1 doesn't really have any
resiliency built in. 2+2 would allow 1 node to go down while still keeping
your data accessible. It will use 2x data to raw as opposed to the 1.5x of
2+1, but it gives you resiliency. The 3+2 in your example command is not
possible with your setup. May I ask why you want EC on such a small OSD
count? I'm guessing it is to use less of your SSD capacity, but with such a
small cluster I would just suggest going with replication. Once you have a
larger node/OSD count you can start seeing whether EC is right for your use
case, but if this is production data... I wouldn't risk it.
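
If you do still want to try EC here, a 2+2 profile with host as the failure
domain would look roughly like this (the profile/pool names and PG count are
placeholders I made up, adjust them to your setup):

ceph osd erasure-code-profile set ec22ssd k=2 m=2 crush-failure-domain=host crush-device-class=ssd
ceph osd erasure-code-profile get ec22ssd
ceph osd pool create ecpool-test 64 64 erasure ec22ssd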

When setting the crush rule, the command wants the rule name, ssdrule, not the numeric ID 2.
Wladimir Mutel
2018-11-01 21:06:04 UTC
Post by David Turner
Yes, when you create an EC profile, a CRUSH rule specific to that profile is
automatically created. You are also correct that 2+1 doesn't really have any
resiliency built in. 2+2 would allow 1 node to go down while still keeping
your data accessible. It will use 2x data to raw as opposed to the 1.5x of
2+1, but it gives you resiliency.
Isn't EC 2+2 the same as 2x replication (i.e. RAID1)?
Isn't the benefit and intention of EC to allow equivalent replication
factors to be chosen between >1 and <2?
That's why it is recommended to have m<k in the EC parameters: when m==k,
the overhead is equivalent to 2x replication, with m==2k to 3x replication,
and so on.
Correspondingly, with m==1 you have reliability equivalent to RAID5, with
m==2 that of RAID6, and you only start to get more "interesting" reliability
factors when you can allow m>2 and k>m. Overall, your reliability in Ceph
comes down to the cluster's rebuild/performance-degradation time after up to
m OSD failures, provided that no more than m OSDs (or larger failure
domains) have failed at once.
Sure, EC is beneficial only when you have enough failure domains (i.e.
hosts). My criterion is that you should have more hosts than you have
individual OSDs within a single host, i.e. at least 8 (and preferably more
than 8) hosts when you have 8 OSDs per host.
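
To put rough numbers on that trade-off (my own illustration; overhead is
(k+m)/k, and up to m failure domains can be lost before data loss; these
profiles are examples, not recommendations for this particular cluster):

k=2, m=1 -> 1.5x raw, tolerates 1 failure (RAID5-like)
k=2, m=2 -> 2.0x raw, tolerates 2 failures (same raw usage as 2x replication)
k=4, m=2 -> 1.5x raw, tolerates 2 failures (RAID6-like, needs at least 6 failure domains)
k=8, m=3 -> ~1.4x raw, tolerates 3 failures (needs at least 11 failure domains)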
David Turner
2018-11-16 20:27:37 UTC
The difference between 2+2 and 2x replication isn't in the amount of space
being used or saved, but in the number of OSDs you can safely lose without
any data loss or outages. 2x replication is generally considered very unsafe
for data integrity, but 2+2 is as resilient as 3x replication while only
using as much space as 2x replication.
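
If you want to double-check what a given pool ended up with, something like
this should show it (the pool name is a placeholder, and min_size defaults
can be overridden, so treat the values as indicative):

ceph osd pool get mypool size                  # replica count, or k+m for an EC pool
ceph osd pool get mypool min_size              # copies/shards needed for I/O to continue
ceph osd pool get mypool erasure_code_profile  # EC pools only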