Discussion: [ceph-users] Crush Map and SSD Pools
Lindsay Mathieson
2014-12-30 14:13:28 UTC
I looked at the section for setting up different pools on different OSDs (e.g. an SSD pool):

http://ceph.com/docs/master/rados/operations/crush-map/#placing-different-pools-on-different-osds

And it seems to assume that the SSDs and platters all live on separate hosts.

That's not the case at all for my setup, and I imagine not for most people: I
have SSDs mixed with platters on the same hosts.

In that case, should the root buckets reference buckets that aren't based on
hosts? E.g. something like this:


# devices
# Platters
device 0 osd.0
device 1 osd.1

# SSDs
device 2 osd.2
device 3 osd.3

host vnb {
    id -2    # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0    # rjenkins1
    item osd.0 weight 1.000
    item osd.2 weight 1.000
}

host vng {
    id -3    # do not change unnecessarily
    # weight 1.000
    alg straw
    hash 0    # rjenkins1
    item osd.1 weight 1.000
    item osd.3 weight 1.000
}

row disk-platter {
    alg straw
    hash 0    # rjenkins1
    item osd.0 weight 1.000
    item osd.1 weight 1.000
}

row disk-ssd {
    alg straw
    hash 0    # rjenkins1
    item osd.2 weight 1.000
    item osd.3 weight 1.000
}

root default {
    id -1    # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0    # rjenkins1
    item disk-platter weight 2.000
}

root ssd {
    id -4
    alg straw
    hash 0
    item disk-ssd weight 2.000
}

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

rule ssd {
    ruleset 1
    type replicated
    min_size 0
    max_size 4
    step take ssd
    step chooseleaf firstn 0 type host
    step emit
}
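
(For reference, this is the round trip I've been using to edit and test the
map; the file names are just what I use locally:)

ceph osd getcrushmap -o crushmap.bin          # dump the current map
crushtool -d crushmap.bin -o crushmap.txt     # decompile, then edit as above
crushtool -c crushmap.txt -o crushmap.new     # recompile
crushtool -i crushmap.new --test --rule 1 --num-rep 2 --show-mappings   # dry-run the ssd rule
ceph osd setcrushmap -i crushmap.new          # inject into the cluster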
--
Lindsay
Erik Logtenberg
2014-12-30 15:18:07 UTC
Hi Lindsay,

Actually, you just set up two entries for each host in your crush map: one
for the HDDs and one for the SSDs. My OSDs look like this:

# id    weight  type name               up/down  reweight
-6      1.8     root ssd
-7      0.45        host ceph-01-ssd
0       0.45            osd.0           up       1
-8      0.45        host ceph-02-ssd
3       0.45            osd.3           up       1
-9      0.45        host ceph-03-ssd
8       0.45            osd.8           up       1
-10     0.45        host ceph-04-ssd
11      0.45            osd.11          up       1
-1      29.12   root default
-2      7.28        host ceph-01
1       3.64            osd.1           up       1
2       3.64            osd.2           up       1
-3      7.28        host ceph-02
5       3.64            osd.5           up       1
4       3.64            osd.4           up       1
-4      7.28        host ceph-03
6       3.64            osd.6           up       1
7       3.64            osd.7           up       1
-5      7.28        host ceph-04
10      3.64            osd.10          up       1
9       3.64            osd.9           up       1

As you can see, I have four hosts: ceph-01 ... ceph-04, but eight host
entries. This works great.
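
In crush map terms, the ssd side of that tree decompiles to roughly the
following (typed from memory as a sketch, not a copy of my actual map; only
ceph-01-ssd is written out):

host ceph-01-ssd {
    id -7
    alg straw
    hash 0    # rjenkins1
    item osd.0 weight 0.45
}

root ssd {
    id -6
    alg straw
    hash 0    # rjenkins1
    item ceph-01-ssd weight 0.45
    item ceph-02-ssd weight 0.45
    item ceph-03-ssd weight 0.45
    item ceph-04-ssd weight 0.45
}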

Regards,

Erik.
Post by Lindsay Mathieson
And it seems to assume that the SSDs and platters all live on separate hosts.
That's not the case at all for my setup, and I imagine not for most people: I
have SSDs mixed with platters on the same hosts.
In that case, should the root buckets reference buckets that aren't based on
hosts?
[snip]
Lindsay Mathieson
2014-12-30 20:43:04 UTC
Post by Erik Logtenberg
As you can see, I have four hosts: ceph-01 ... ceph-04, but eight host
entries. This works great.
So you have:
- host ceph-01
- host ceph-01-ssd

Don't the host names have to match the real host names?
--
Lindsay
Erik Logtenberg
2014-12-30 21:38:14 UTC
No, bucket names in the crush map are completely arbitrary. In fact, crush
doesn't really know what a "host" is. It is just a bucket, like "rack"
or "datacenter". They could be called "cat" and "mouse" just as well.

The only reason to use host names is for human readability.

You can then use crush rules to make sure that, for instance, two copies
of an object don't end up on the same "host", in the same "rack", or in
whatever other bucket type you like. This way you can define your failure
domains to match your physical layout.
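
For example, a rule that spreads copies across racks instead of hosts only
differs in the chooseleaf type; the names here are made up, but the shape is
the same as the rules you already have:

rule replicated_rack {
    ruleset 2
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type rack
    step emit
}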

Kind regards,

Erik.
Post by Lindsay Mathieson
Post by Erik Logtenberg
As you can see, I have four hosts: ceph-01 ... ceph-04, but eight
host entries. This works great.
So you have:
- host ceph-01
- host ceph-01-ssd
Don't the host names have to match the real host names?
Lindsay Mathieson
2014-12-30 22:11:14 UTC
Post by Erik Logtenberg
No, bucket names in crush map are completely arbitrary. In fact, crush
doesn't really know what a "host" is. It is just a bucket, like "rack"
or "datacenter". But they could be called "cat" and "mouse" just as well.
Hmmm, I tried that earlier and ran into problems with starting/stopping the
osd - but maybe I screwed something else up. Will give it another go.
--
Lindsay
Erik Logtenberg
2014-12-30 22:25:40 UTC
If you want to be able to start your OSDs with the /etc/init.d/ceph init
script, then you'd better make sure that /etc/ceph/ceph.conf links the
OSDs to the actual hostname :)

Check out this snippet from my ceph.conf:


[osd.0]
host = ceph-01
osd crush location = "host=ceph-01-ssd root=ssd"

[osd.1]
host = ceph-01

[osd.2]
host = ceph-01


You see all OSDs are linked to the right hostname, but the SSD OSD is
then explicitly told to go into the right crush location too.
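
If you don't want to wait for a restart, you can also move an OSD into place
by hand and then point a pool at the ssd rule; roughly like this (the OSD
number and weight match my setup above, the pool name and rule number are
just examples, adjust to yours):

ceph osd crush create-or-move osd.0 0.45 host=ceph-01-ssd root=ssd
ceph osd tree                                  # check it landed under root ssd
ceph osd pool set ssd-pool crush_ruleset 1     # point a pool at the ssd rule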

Kind regards,

Erik.
Post by Lindsay Mathieson
Post by Erik Logtenberg
No, bucket names in crush map are completely arbitrary. In fact,
crush doesn't really know what a "host" is. It is just a bucket,
like "rack" or "datacenter". But they could be called "cat" and
"mouse" just as well.
Hmmm, I tried that earlier and ran into problems with
starting/stopping the osd - but maybe I screwed something else up.
Will give it another go.
Lindsay Mathieson
2014-12-30 23:18:06 UTC
Post by Erik Logtenberg
If you want to be able to start your OSDs with the /etc/init.d/ceph init
script, then you'd better make sure that /etc/ceph/ceph.conf links the
OSDs to the actual hostname
I tried again and it was OK for a short while, then *something* moved the SSD
OSDs from "<host>-ssd" to "<host>". Fortunately I had them weighted at 0.

I suspect it was the cluster manager I'm using (Proxmox), which adds a simple
GUI layer over Ceph; it probably doesn't handle this use case yet.

I'll take it to the Proxmox list.
--
Lindsay
Lindsay Mathieson
2014-12-31 12:26:02 UTC
I believe that the upstart scripts will do this by default, they call out to
a bash script (I can't remember precisely what that is off the top of my
head) which then returns the crush rule, which will default to host=X osd=X
unless it's overridden somewhere (ceph.conf).
If memory serves there's the ability to provide your own script to call out
to in order to provide the crush rule.
Good to know, thanks.
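
Side note for the archives: if I'm reading the docs right, the automatic
relocation can also be switched off entirely in ceph.conf, though I haven't
tried this myself:

[osd]
osd crush update on start = false
# with this set, starting an OSD no longer updates its crush location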
--
Lindsay
Robert LeBlanc
2015-01-05 22:28:40 UTC
It took me a while to figure out the callout script since it wasn't
documented anywhere obvious. This is what I wrote down; it may be helpful to
you or others:

1. Add the hook script to the ceph.conf file of each OSD:

   osd crush location hook = /path/to/script

2. Install the script at the defined location; it must accept the following
   arguments (where the cluster name is typically 'ceph', the id is the daemon
   identifier (the OSD number), and the daemon type is typically 'osd'); a
   sketch of such a script follows below:

   $ ceph-crush-location --cluster CLUSTER --id ID --type TYPE

3. The script needs to output, on a single line, the key/value pairs of the
   location, such as:

   host=osdhost rack=rack5 row=row8 section=sec2 datacenter=provo
   region=na-west root=default
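
For what it's worth, a bare-bones hook following that contract might look
something like the script below. The SSD detection (a local list file at
/etc/ceph/ssd-osds) is purely illustrative, not anything official; adapt it
to however you can tell your SSD OSDs apart.

#!/bin/bash
# Hypothetical crush location hook. Ceph calls it as:
#   script --cluster <name> --id <osd-id> --type osd
# and expects the location key/value pairs on a single line of stdout.

while [ $# -gt 0 ]; do
    case "$1" in
        --cluster) CLUSTER="$2"; shift ;;
        --id)      ID="$2"; shift ;;
        --type)    TYPE="$2"; shift ;;
    esac
    shift
done

HOST=$(hostname -s)

# Illustrative policy: OSD ids listed in /etc/ceph/ssd-osds go under root ssd,
# into a per-host "<hostname>-ssd" bucket; everything else stays under default.
if grep -qx "$ID" /etc/ceph/ssd-osds 2>/dev/null; then
    echo "host=${HOST}-ssd root=ssd"
else
    echo "host=${HOST} root=default"
fi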


On Wed, Dec 31, 2014 at 5:26 AM, Lindsay Mathieson <
I believe that the upstart scripts will do this by default, they call out to
a bash script (I can't remember precisely what that is off the top of my
head) which then returns the crush rule, which will default to host=X osd=X
unless it's overridden somewhere (ceph.conf).
If memory serves there's the ability to provide your own script to call out
to in order to provide the crush rule.
Good to know, thanks.
--
Lindsay