Discussion:
[ceph-users] Help with crushmap
Vasiliy Tolstov
2018-12-02 16:43:50 UTC
Hi, I need help with a crushmap.
I have:
3 regions - r1, r2, r3
5 DCs - dc1, dc2, dc3, dc4, dc5
dc1, dc2, dc3 are in r1
dc4 is in r2
dc5 is in r3

Each DC has 3 nodes with 2 disks.
I need three rules:
rule1: 2 copies on two nodes in each DC - 10 copies total, failure domain dc
rule2: 2 copies on two nodes in each region - 6 copies total, failure domain region
rule3: 2 copies on two nodes in dc1, failure domain node

What would the crushmap look like in this case for the replicated type?
Thanks.
Paul Emmerich
2018-12-02 17:38:12 UTC
10 copies for a replicated setup seems... excessive.

The rules are quite simple. For example, rule 1 could be:

take default
choose firstn 5 type datacenter # picks 5 datacenters
chooseleaf firstn 2 type host # 2 different hosts in each datacenter
emit

rule 2 is the same but with type region and firstn 3, and for rule 3 you can
just start directly in the selected DC (take dc1).
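
Spelled out in crushmap syntax, the three rules could look roughly like the
sketch below (rule names and ids are placeholders, and the bucket names
"default" and "dc1" have to match the buckets in your actual hierarchy):

rule rule1_dc {
        id 1
        type replicated
        min_size 1
        max_size 10
        # pick 5 datacenters, then 2 different hosts in each -> 10 copies
        step take default
        step choose firstn 5 type datacenter
        step chooseleaf firstn 2 type host
        step emit
}

rule rule2_region {
        id 2
        type replicated
        min_size 1
        max_size 10
        # pick 3 regions, then 2 different hosts in each -> 6 copies
        step take default
        step choose firstn 3 type region
        step chooseleaf firstn 2 type host
        step emit
}

rule rule3_dc1 {
        id 3
        type replicated
        min_size 1
        max_size 10
        # start directly at dc1, then 2 different hosts -> 2 copies
        step take dc1
        step chooseleaf firstn 2 type host
        step emit
}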
--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
Post by Vasiliy Tolstov
Hi, I need help with a crushmap.
I have:
3 regions - r1, r2, r3
5 DCs - dc1, dc2, dc3, dc4, dc5
dc1, dc2, dc3 are in r1
dc4 is in r2
dc5 is in r3
Each DC has 3 nodes with 2 disks.
I need three rules:
rule1: 2 copies on two nodes in each DC - 10 copies total, failure domain dc
rule2: 2 copies on two nodes in each region - 6 copies total, failure domain region
rule3: 2 copies on two nodes in dc1, failure domain node
What would the crushmap look like in this case for the replicated type?
Thanks.
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Vasiliy Tolstov
2018-12-02 17:52:15 UTC
Post by Paul Emmerich
10 copies for a replicated setup seems... excessive.
I'm trying to create a golang package for a simple key-value store that uses
the ceph crushmap to distribute data.
Each namespace has a ceph crushmap rule attached to it.
Florian Engelmann
2018-12-03 10:02:28 UTC
Hello,

we have been fighting an HDD spin-down problem on our production Ceph
cluster for two weeks now. The problem is not Ceph-related, but I guess this
topic is interesting to the list and, to be honest, I hope to find a
solution here.

We use 6 OSD nodes like this:
OS: SUSE 12 SP3
Ceph: SES 5.5 (12.2.8)
Server: Supermicro 6048R-E1CR36L
Controller: LSI 3008 (LSI3008-IT)
Disks: 12x Seagate ST8000NM0055-1RM112 8TB (SN05 firmware, some still SN02
and SN04)
NVMe: 1x Intel DC P3700 800GB (used for an 80GB RocksDB and a 2GB WAL per
OSD; only 7 disks are online right now, and up to 9 disks will have their
RocksDB/WAL on this one NVMe SSD)


Problem:
This Ceph cluster is used for object storage (RadosGW) only, mostly for
backups to S3. There is not that much activity - mostly at night time. We do
not want any HDD to spin down, but they do.
We tried to disable the spin-down timers using sdparm and also with the
Seagate tool SeaChest, but "something" re-enables them:


Disable standby on all HDDs:

for i in sd{c..n}; do
    /root/SeaChestUtilities/Linux/Lin64/SeaChest_PowerControl_191_1183_64 \
        -d /dev/$i --onlySeagate --changePower --disableMode --powerMode standby
done


Monitor standby timer status:

while true; do
    for i in sd{c..n}; do
        echo "$(date) $i $(/root/SeaChestUtilities/Linux/Lin64/SeaChest_PowerControl_191_1183_64 -d /dev/$i --onlySeagate --showEPCSettings -v0 | grep Stand)"
    done
    sleep 1
done

This will show:
Mon Dec 3 10:42:54 CET 2018 sdc Standby Z 0 9000 65535 120 Y Y
Mon Dec 3 10:42:54 CET 2018 sdd Standby Z 0 9000 65535 120 Y Y
Mon Dec 3 10:42:54 CET 2018 sde Standby Z 0 9000 65535 120 Y Y
Mon Dec 3 10:42:54 CET 2018 sdf Standby Z 0 9000 65535 120 Y Y
Mon Dec 3 10:42:54 CET 2018 sdg Standby Z 0 9000 65535 120 Y Y
Mon Dec 3 10:42:54 CET 2018 sdh Standby Z 0 9000 65535 120 Y Y
Mon Dec 3 10:42:54 CET 2018 sdi Standby Z 0 9000 65535 120 Y Y
Mon Dec 3 10:42:55 CET 2018 sdj Standby Z 0 9000 65535 120 Y Y
Mon Dec 3 10:42:55 CET 2018 sdk Standby Z 0 9000 65535 120 Y Y
Mon Dec 3 10:42:55 CET 2018 sdl Standby Z 0 9000 65535 120 Y Y
Mon Dec 3 10:42:55 CET 2018 sdm Standby Z 0 9000 65535 120 Y Y
Mon Dec 3 10:42:55 CET 2018 sdn Standby Z 0 9000 65535 120 Y Y


So everything is fine right now: the standby timer is 0 and disabled (no *
shown), while the default value is 9000 and the saved timer is FFFF (we
saved that value so the disks keep a huge timeout across reboots). But after
an unknown amount of time (in this case ~7 minutes) things start to get
weird:

Mon Dec 3 10:47:52 CET 2018 sdc Standby Z *3500 9000 65535 120 Y Y
[...]
Mon Dec 3 10:48:07 CET 2018 sdc Standby Z *3500 9000 65535 120 Y Y
Mon Dec 3 10:48:09 CET 2018 sdc Standby Z *3500 9000 65535 120 Y Y
Mon Dec 3 10:48:12 CET 2018 sdc Standby Z *4500 9000 65535 120 Y Y
Mon Dec 3 10:48:14 CET 2018 sdc Standby Z *4500 9000 65535 120 Y Y
Mon Dec 3 10:48:16 CET 2018 sdc Standby Z *4500 9000 65535 120 Y Y
Mon Dec 3 10:48:19 CET 2018 sdc Standby Z *4500 9000 65535 120 Y Y
Mon Dec 3 10:48:21 CET 2018 sdc Standby Z *4500 9000 65535 120 Y Y
Mon Dec 3 10:48:23 CET 2018 sdc Standby Z *5500 9000 65535 120 Y Y
Mon Dec 3 10:48:26 CET 2018 sdc Standby Z *5500 9000 65535 120 Y Y
Mon Dec 3 10:48:28 CET 2018 sdc Standby Z *5500 9000 65535 120 Y Y
Mon Dec 3 10:48:30 CET 2018 sdc Standby Z *5500 9000 65535 120 Y Y
Mon Dec 3 10:48:32 CET 2018 sdc Standby Z *5500 9000 65535 120 Y Y
Mon Dec 3 10:48:35 CET 2018 sdc Standby Z *5500 9000 65535 120 Y Y
Mon Dec 3 10:48:37 CET 2018 sdc Standby Z *5500 9000 65535 120 Y Y
Mon Dec 3 10:48:40 CET 2018 sdc Standby Z *5500 9000 65535 120 Y Y
Mon Dec 3 10:48:42 CET 2018 sdc Standby Z *6500 9000 65535 120 Y Y
Mon Dec 3 10:48:44 CET 2018 sdc Standby Z *6500 9000 65535 120 Y Y
Mon Dec 3 10:48:47 CET 2018 sdc Standby Z *6500 9000 65535 120 Y Y
Mon Dec 3 10:48:49 CET 2018 sdc Standby Z *6500 9000 65535 120 Y Y
Mon Dec 3 10:48:52 CET 2018 sdc Standby Z *7500 9000 65535 120 Y Y
Mon Dec 3 10:48:52 CET 2018 sde Standby Z *65535 9000 65535 120 Y Y
Mon Dec 3 10:48:54 CET 2018 sdc Standby Z *7500 9000 65535 120 Y Y
Mon Dec 3 10:48:55 CET 2018 sde Standby Z *65535 9000 65535 120 Y Y
Mon Dec 3 10:48:57 CET 2018 sdc Standby Z *7500 9000 65535 120 Y Y
Mon Dec 3 10:48:57 CET 2018 sde Standby Z *65535 9000 65535 120 Y Y
Mon Dec 3 10:48:59 CET 2018 sdc Standby Z *7500 9000 65535 120 Y Y
Mon Dec 3 10:49:00 CET 2018 sde Standby Z *65535 9000 65535 120 Y Y
Mon Dec 3 10:49:02 CET 2018 sdc Standby Z *8500 9000 65535 120 Y Y
Mon Dec 3 10:49:02 CET 2018 sde Standby Z *11500 9000 65535 120 Y Y
Mon Dec 3 10:49:04 CET 2018 sdc Standby Z *8500 9000 65535 120 Y Y
Mon Dec 3 10:49:05 CET 2018 sde Standby Z *11500 9000 65535 120 Y Y
Mon Dec 3 10:49:07 CET 2018 sdc Standby Z *8500 9000 65535 120 Y Y
Mon Dec 3 10:49:07 CET 2018 sde Standby Z *11500 9000 65535 120 Y Y


So "something" starts to re-enable those standby timers with strange
numbers. After those timers going up and down and disabled/enabled a
certain (unknown) amount of time they get "stable" at a value of 3000
and stay enabled (*):

Mon Dec 3 10:50:43 CET 2018 sde Standby Z *3000 9000 65535 120 Y Y
Mon Dec 3 10:50:45 CET 2018 sde Standby Z *3000 9000 65535 120 Y Y


3000 x 100 ms = 300 s = 5 minutes. This is exactly what we measured when we
started to analyse the issue: the disks powered off (spun down) after 5
minutes.

We tried to add:
options mpt3sas allow_drive_spindown=0

but it did not help at all...
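
For anyone wanting to reproduce this: setting such a module option usually
looks like the steps below. The modprobe.d file name is arbitrary, and the
dracut/initrd rebuild and the sysfs check are our assumptions for SLES 12:

echo "options mpt3sas allow_drive_spindown=0" > /etc/modprobe.d/99-mpt3sas.conf
# the parameter is only read when the module loads, so rebuild the initrd
# and reboot
dracut -f
# after the reboot, verify the running value
cat /sys/module/mpt3sas/parameters/allow_drive_spindown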

The only workaround right now is a cronjob that runs every 3 minutes and
disables standby on all disks.
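
For reference, that workaround boils down to a crontab entry plus a small
script along these lines (the cron file name and script path are made up;
the SeaChest path and device range are the ones from the loop above):

# /etc/cron.d/disable-hdd-standby
*/3 * * * * root /usr/local/sbin/disable-hdd-standby.sh

# /usr/local/sbin/disable-hdd-standby.sh
#!/bin/bash
# Re-assert "standby disabled" on all data disks; brace expansion needs bash.
for i in sd{c..n}; do
    /root/SeaChestUtilities/Linux/Lin64/SeaChest_PowerControl_191_1183_64 \
        -d /dev/$i --onlySeagate --changePower --disableMode \
        --powerMode standby >/dev/null 2>&1
done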


Does anyone have a proper solution?

All the best,
Flo
