Discussion:
[ceph-users] which kernel support object-map, fast-diff
x***@sky-data.cn
2018-05-15 08:50:58 UTC
Hi, all!

I use CentOS 7.4 and want to use Ceph RBD.

I found that the object-map and fast-diff features do not work.

rbd image 'app':
size 500 GB in 128000 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.10a2643c9869
format: 2
features: layering, exclusive-lock, object-map, fast-diff <===
flags: object map invalid, fast diff invalid

Ceph version is 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
Kernel: 3.10.0-693.el7.x86_64

So which kernel versions support those features?

I could not find the answer in the Ceph docs.
Konstantin Shalygin
2018-05-15 08:57:00 UTC
Post by x***@sky-data.cn
So which kernel versions support those features?
No kernel supports these features yet.



k
x***@sky-data.cn
2018-05-15 09:06:43 UTC
Could you give a list of which features are supported and which are not?

----- Original Message -----
From: "Konstantin Shalygin" <***@k0ste.ru>
To: "ceph-users" <ceph-***@lists.ceph.com>
Cc: "xiang dai" <***@sky-data.cn>
Sent: Tuesday, May 15, 2018 4:57:00 PM
Subject: Re: [ceph-users] which kernel support object-map, fast-diff
Post by x***@sky-data.cn
So which kernel versions support those features?
No kernel supports these features yet.



k
--
Dai Xiang
Nanjing Sky-Data Information Technology Co., Ltd.
Tel: +86 1 3382776490
Website: www.sky-data.cn
Use Sky-Data's SkyDiscovery intelligent computing platform for free
Paul Emmerich
2018-05-15 20:48:55 UTC
The following RBD features are supported since these kernel versions:

Kernel 3.8: RBD_FEATURE_LAYERING
https://github.com/ceph/ceph-client/commit/d889140c4a1c5edb6a7bd90392b9d878bfaccfb6
Kernel 3.10: RBD_FEATURE_STRIPINGV2
https://github.com/ceph/ceph-client/commit/5cbf6f12c48121199cc214c93dea98cce719343b
Kernel 4.9: RBD_FEATURE_EXCLUSIVE_LOCK
https://github.com/ceph/ceph-client/commit/ed95b21a4b0a71ef89306cdeb427d53cc9cb343f
Kernel 4.11: RBD_FEATURE_DATA_POOL
https://github.com/ceph/ceph-client/commit/7e97332ea9caad3b7c6d86bc3b982e17eda2f736

Try using rbd-nbd if you need other features.
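
A hedged workaround, if you want to stay on krbd (example only - adjust the
image/pool name to your setup; note that fast-diff depends on object-map):

rbd feature disable app fast-diff
rbd feature disable app object-map
rbd create newimage --size 500G --image-feature layering

After that the image should map with the stock CentOS 7 kernel client; on
very old kernels you may have to drop exclusive-lock as well.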

Paul
Post by x***@sky-data.cn
Could you give a list of which features are supported and which are not?
----- Original Message -----
Sent: Tuesday, May 15, 2018 4:57:00 PM
Subject: Re: [ceph-users] which kernel support object-map, fast-diff
Post by x***@sky-data.cn
So which kernel versions support those features?
No kernel supports these features yet.
k
--
Dai Xiang
Nanjing Sky-Data Information Technology Co., Ltd.
Tel: +86 1 3382776490
Website: www.sky-data.cn
Use Sky-Data's SkyDiscovery intelligent computing platform for free
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
Florian Engelmann
2018-11-20 14:29:53 UTC
Hi,

today we migrated all of our RocksDB and WAL devices to new ones. The
new ones are much bigger (500 MB for WAL/DB -> 60 GB DB and 2 GB WAL) and
LVM based.

We migrated like this:

export OSD=x

systemctl stop ceph-osd@$OSD

lvcreate -n db-osd$OSD -L60g data || exit 1
lvcreate -n wal-osd$OSD -L2g data || exit 1

dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal of=/dev/data/wal-osd$OSD bs=1M || exit 1
dd if=/var/lib/ceph/osd/ceph-$OSD/block.db of=/dev/data/db-osd$OSD bs=1M || exit 1

rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
ln -vs /dev/data/db-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
ln -vs /dev/data/wal-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1

chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1

ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$OSD/ || exit 1

systemctl start ceph-osd@$OSD


Everything went fine, but it looks like the DB and WAL sizes are still the
old ones:

ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}


Even though the new block devices are recognized correctly:

2018-11-20 11:40:34.653524 7f70219b8d00 1 bdev(0x5647ea9ce200
/var/lib/ceph/osd/ceph-0/block.db) open size 64424509440 (0xf00000000,
60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00 1 bluefs add_block_device bdev
1 path /var/lib/ceph/osd/ceph-0/block.db size 60GiB


2018-11-20 11:40:34.662385 7f70219b8d00 1 bdev(0x5647ea9ce600
/var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648 (0x80000000,
2GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00 1 bluefs add_block_device bdev
0 path /var/lib/ceph/osd/ceph-0/block.wal size 2GiB


Are we missing some command to "notify" rocksdb about the new device size?

All the best,
Florian
Igor Fedotov
2018-11-20 15:17:08 UTC
Hi Florian,

what's your Ceph version?

Can you also check the output for

ceph-bluestore-tool show-label -p <path to osd>


It should report 'size' labels for every volume, please check they
contain new values.


Thanks,

Igor
Post by Florian Engelmann
Hi,
today we migrated all of our rocksdb and wal devices to new once. The
new once are much bigger (500MB for wal/db -> 60GB db and 2G WAL) and
LVM based.
    export OSD=x
    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal
of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db of=/dev/data/db-osd$OSD
bs=1M  || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ln -vs /dev/data/db-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.db
|| exit 1
    ln -vs /dev/data/wal-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.wal
|| exit 1
    chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
    chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ceph-bluestore-tool bluefs-bdev-expand --path
/var/lib/ceph/osd/ceph-$OSD/ || exit 1
Everything went fine but it looks like the db and wal size is still
ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}
2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200
/var/lib/ceph/osd/ceph-0/block.db) open size 64424509440 (0xf00000000,
60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs add_block_device
bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 60GiB
2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600
/var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648 (0x80000000,
2GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs add_block_device
bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2GiB
Are we missing some command to "notify" rocksdb about the new device size?
All the best,
Florian
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Florian Engelmann
2018-11-20 15:42:00 UTC
Hi Igor,
Post by Igor Fedotov
what's your Ceph version?
12.2.8 (SES 5.5 - patched to the latest version)
Post by Igor Fedotov
Can you also check the output for
ceph-bluestore-tool show-label -p <path to osd>
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
"/var/lib/ceph/osd/ceph-0//block": {
"osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
"size": 8001457295360,
"btime": "2018-06-29 23:43:12.088842",
"description": "main",
"bluefs": "1",
"ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
"kv_backend": "rocksdb",
"magic": "ceph osd volume v026",
"mkfs_done": "yes",
"ready": "ready",
"whoami": "0"
},
"/var/lib/ceph/osd/ceph-0//block.wal": {
"osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
"size": 524288000,
"btime": "2018-06-29 23:43:12.098690",
"description": "bluefs wal"
},
"/var/lib/ceph/osd/ceph-0//block.db": {
"osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
"size": 524288000,
"btime": "2018-06-29 23:43:12.098023",
"description": "bluefs db"
}
}
Post by Igor Fedotov
It should report 'size' labels for every volume, please check they
contain new values.
That's exactly the problem: neither "ceph-bluestore-tool show-label" nor
"ceph daemon osd.0 perf dump|jq '.bluefs'" recognizes the new sizes.
But we are 100% sure the new devices are in use, as we already deleted the
old ones...

We tried to delete the "size" key in order to add one with the new value, but:

ceph-bluestore-tool rm-label-key --dev /var/lib/ceph/osd/ceph-0/block.db -k size
key 'size' not present

even though:

ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
{
"/var/lib/ceph/osd/ceph-0/block.db": {
"osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
"size": 524288000,
"btime": "2018-06-29 23:43:12.098023",
"description": "bluefs db"
}
}

So it looks like the key "size" is "read-only"?
Post by Igor Fedotov
Thanks,
Igor
Post by Florian Engelmann
Hi,
today we migrated all of our rocksdb and wal devices to new once. The
new once are much bigger (500MB for wal/db -> 60GB db and 2G WAL) and
LVM based.
    export OSD=x
    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal
of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db of=/dev/data/db-osd$OSD
bs=1M  || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ln -vs /dev/data/db-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.db
|| exit 1
    ln -vs /dev/data/wal-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.wal
|| exit 1
    chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
    chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ceph-bluestore-tool bluefs-bdev-expand --path
/var/lib/ceph/osd/ceph-$OSD/ || exit 1
Everything went fine but it looks like the db and wal size is still
ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}
2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200
/var/lib/ceph/osd/ceph-0/block.db) open size 64424509440 (0xf00000000,
60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs add_block_device
bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 60GiB
2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600
/var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648 (0x80000000,
2GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs add_block_device
bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2GiB
Are we missing some command to "notify" rocksdb about the new device size?
All the best,
Florian
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
EveryWare AG
Florian Engelmann
Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: mailto:***@everyware.ch
web: http://www.everyware.ch
Igor Fedotov
2018-11-20 15:59:11 UTC
Post by Florian Engelmann
Hi Igor,
Post by Igor Fedotov
what's your Ceph version?
12.2.8 (SES 5.5 - patched to the latest version)
Post by Igor Fedotov
Can you also check the output for
ceph-bluestore-tool show-label -p <path to osd>
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 8001457295360,
        "btime": "2018-06-29 23:43:12.088842",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "ready": "ready",
        "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098690",
        "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
Post by Igor Fedotov
It should report 'size' labels for every volume, please check they
contain new values.
That's exactly the problem, whether "ceph-bluestore-tool show-label"
nor "ceph daemon osd.0 perf dump|jq '.bluefs'" did recognize the new
sizes. But we are 100% sure the new devices are used as we already
deleted the old once...
ceph-bluestore-tool rm-label-key --dev
/var/lib/ceph/osd/ceph-0/block.db -k size
key 'size' not present
ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
{
    "/var/lib/ceph/osd/ceph-0/block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
So it looks like the key "size" is "read-only"?
There was a bug in updating specific keys, see
https://github.com/ceph/ceph/pull/24352

This PR also eliminates the need to set sizes manually on bdev-expand.

I thought it had been backported to Luminous, but it looks like it wasn't.
Will submit a PR shortly.


Thanks,
Igor
Post by Florian Engelmann
Post by Igor Fedotov
Thanks,
Igor
Post by Florian Engelmann
Hi,
today we migrated all of our rocksdb and wal devices to new once.
The new once are much bigger (500MB for wal/db -> 60GB db and 2G
WAL) and LVM based.
    export OSD=x
    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal
of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db
of=/dev/data/db-osd$OSD bs=1M  || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ln -vs /dev/data/db-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.db
|| exit 1
    ln -vs /dev/data/wal-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
    chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ceph-bluestore-tool bluefs-bdev-expand --path
/var/lib/ceph/osd/ceph-$OSD/ || exit 1
Everything went fine but it looks like the db and wal size is still
ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}
2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200
/var/lib/ceph/osd/ceph-0/block.db) open size 64424509440
(0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs add_block_device
bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 60GiB
2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600
/var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648
(0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs add_block_device
bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2GiB
Are we missing some command to "notify" rocksdb about the new device size?
All the best,
Florian
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Florian Engelmann
2018-11-20 16:05:55 UTC
Post by Igor Fedotov
Post by Florian Engelmann
Hi Igor,
Post by Igor Fedotov
what's your Ceph version?
12.2.8 (SES 5.5 - patched to the latest version)
Post by Igor Fedotov
Can you also check the output for
ceph-bluestore-tool show-label -p <path to osd>
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 8001457295360,
        "btime": "2018-06-29 23:43:12.088842",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "ready": "ready",
        "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098690",
        "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
Post by Igor Fedotov
It should report 'size' labels for every volume, please check they
contain new values.
That's exactly the problem, whether "ceph-bluestore-tool show-label"
nor "ceph daemon osd.0 perf dump|jq '.bluefs'" did recognize the new
sizes. But we are 100% sure the new devices are used as we already
deleted the old once...
ceph-bluestore-tool rm-label-key --dev
/var/lib/ceph/osd/ceph-0/block.db -k size
key 'size' not present
ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
{
    "/var/lib/ceph/osd/ceph-0/block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
So it looks like the key "size" is "read-only"?
There was a bug in updating specific keys, see
https://github.com/ceph/ceph/pull/24352
This PR also eliminates the need to set sizes manually on bdev-expand.
I thought it had been backported to Luminous but it looks like it doesn't.
Will submit a PR shortly.
Thank you so much Igor! So we have to decide how to proceed. Maybe you
could help us here as well.

Option A: Wait for this fix to be available. -> could last weeks or even
months

Option B: Recreate OSDs "one-by-one". -> will take a very long time as well

Option C: Is there some "low-level" command allowing us to fix those sizes?
Post by Igor Fedotov
Post by Florian Engelmann
Post by Igor Fedotov
Thanks,
Igor
Post by Florian Engelmann
Hi,
today we migrated all of our rocksdb and wal devices to new once.
The new once are much bigger (500MB for wal/db -> 60GB db and 2G
WAL) and LVM based.
    export OSD=x
    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal
of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db
of=/dev/data/db-osd$OSD bs=1M  || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ln -vs /dev/data/db-osd$OSD /var/lib/ceph/osd/ceph-$OSD/block.db
|| exit 1
    ln -vs /dev/data/wal-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
    chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ceph-bluestore-tool bluefs-bdev-expand --path
/var/lib/ceph/osd/ceph-$OSD/ || exit 1
Everything went fine but it looks like the db and wal size is still
ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}
2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200
/var/lib/ceph/osd/ceph-0/block.db) open size 64424509440
(0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs add_block_device
bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 60GiB
2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600
/var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648
(0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs add_block_device
bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2GiB
Are we missing some command to "notify" rocksdb about the new device size?
All the best,
Florian
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Igor Fedotov
2018-11-20 17:13:47 UTC
Post by Florian Engelmann
Post by Igor Fedotov
Post by Florian Engelmann
Hi Igor,
Post by Igor Fedotov
what's your Ceph version?
12.2.8 (SES 5.5 - patched to the latest version)
Post by Igor Fedotov
Can you also check the output for
ceph-bluestore-tool show-label -p <path to osd>
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 8001457295360,
        "btime": "2018-06-29 23:43:12.088842",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "ready": "ready",
        "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098690",
        "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
Post by Igor Fedotov
It should report 'size' labels for every volume, please check they
contain new values.
That's exactly the problem, whether "ceph-bluestore-tool show-label"
nor "ceph daemon osd.0 perf dump|jq '.bluefs'" did recognize the new
sizes. But we are 100% sure the new devices are used as we already
deleted the old once...
ceph-bluestore-tool rm-label-key --dev
/var/lib/ceph/osd/ceph-0/block.db -k size
key 'size' not present
ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
{
    "/var/lib/ceph/osd/ceph-0/block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
So it looks like the key "size" is "read-only"?
There was a bug in updating specific keys, see
https://github.com/ceph/ceph/pull/24352
This PR also eliminates the need to set sizes manually on bdev-expand.
I thought it had been backported to Luminous but it looks like it doesn't.
Will submit a PR shortly.
Thank you so much Igor! So we have to decide how to proceed. Maybe you
could help us here as well.
Option A: Wait for this fix to be available. -> could last weeks or
even months
If you can build a custom version of ceph-bluestore-tool then this is a
short path. I'll submit a patch today or tomorrow which you would need to
integrate into your private build.
Then you only need to upgrade the tool and apply the new sizes.
Post by Florian Engelmann
Option B: Recreate OSDs "one-by-one". -> will take a very long time as well
No need for that IMO.
Post by Florian Engelmann
Option C: There is some "lowlevel" commad allowing us to fix those sizes?
Well, a hex editor might help here as well. What you need is just to update
the 64-bit size value in the block.db and block.wal files. In my lab I can
find it at offset 0x52. Most probably this is a fixed location, but it's
better to check beforehand - the existing value should correspond to the one
reported with show-label. Or I can do that for you - please send me the
first 4K chunks along with the corresponding label report.
Then update with the new values - the field has to contain exactly the same
size as your new partition.
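
For reference, the same edit could be scripted with dd instead of a hex
editor - untested sketch, assuming the size label really is the 64-bit
little-endian field at offset 0x52 (verify against show-label first), the
OSD is stopped, and you keep a backup of the label block:

OSD=0
DEV=/var/lib/ceph/osd/ceph-$OSD/block.db
NEWSIZE=64424509440   # must match the new partition size exactly (bytes)
dd if=$DEV of=/root/osd$OSD-db-label.bak bs=4096 count=1            # back up the label block first
dd if=$DEV bs=1 skip=$((0x52)) count=8 2>/dev/null | od -An -tu8    # current value, should match show-label
LE=""; for i in 0 1 2 3 4 5 6 7; do LE+=$(printf '\\x%02x' $(( (NEWSIZE >> (8*i)) & 0xff ))); done
printf "$LE" | dd of=$DEV bs=1 seek=$((0x52)) count=8 conv=notrunc  # write the new little-endian size

Repeat for block.wal with its new size, then re-check with
ceph-bluestore-tool show-label before starting the OSD.
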
Post by Florian Engelmann
Post by Igor Fedotov
Post by Florian Engelmann
Post by Igor Fedotov
Thanks,
Igor
Post by Florian Engelmann
Hi,
today we migrated all of our rocksdb and wal devices to new once.
The new once are much bigger (500MB for wal/db -> 60GB db and 2G
WAL) and LVM based.
    export OSD=x
    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal
of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db
of=/dev/data/db-osd$OSD bs=1M  || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ln -vs /dev/data/db-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    ln -vs /dev/data/wal-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
    chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db ||
exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal ||
exit 1
    ceph-bluestore-tool bluefs-bdev-expand --path
/var/lib/ceph/osd/ceph-$OSD/ || exit 1
Everything went fine but it looks like the db and wal size is
ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}
2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200
/var/lib/ceph/osd/ceph-0/block.db) open size 64424509440
(0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs add_block_device
bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 60GiB
2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600
/var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648
(0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs add_block_device
bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 2GiB
Are we missing some command to "notify" rocksdb about the new device size?
All the best,
Florian
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Igor Fedotov
2018-11-20 17:54:42 UTC
FYI: https://github.com/ceph/ceph/pull/25187
Post by Igor Fedotov
Post by Florian Engelmann
Post by Igor Fedotov
Post by Florian Engelmann
Hi Igor,
Post by Igor Fedotov
what's your Ceph version?
12.2.8 (SES 5.5 - patched to the latest version)
Post by Igor Fedotov
Can you also check the output for
ceph-bluestore-tool show-label -p <path to osd>
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 8001457295360,
        "btime": "2018-06-29 23:43:12.088842",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "ready": "ready",
        "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098690",
        "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
Post by Igor Fedotov
It should report 'size' labels for every volume, please check they
contain new values.
That's exactly the problem, whether "ceph-bluestore-tool
show-label" nor "ceph daemon osd.0 perf dump|jq '.bluefs'" did
recognize the new sizes. But we are 100% sure the new devices are
used as we already deleted the old once...
ceph-bluestore-tool rm-label-key --dev
/var/lib/ceph/osd/ceph-0/block.db -k size
key 'size' not present
ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
{
    "/var/lib/ceph/osd/ceph-0/block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
So it looks like the key "size" is "read-only"?
There was a bug in updating specific keys, see
https://github.com/ceph/ceph/pull/24352
This PR also eliminates the need to set sizes manually on bdev-expand.
I thought it had been backported to Luminous but it looks like it doesn't.
Will submit a PR shortly.
Thank you so much Igor! So we have to decide how to proceed. Maybe
you could help us here as well.
Option A: Wait for this fix to be available. -> could last weeks or
even months
if you can build a custom version of ceph_bluestore_tool then this is
a short path. I'll submit a patch today or tomorrow which you need to
integrate into your private build.
Then you need to upgrade just the tool and apply new sizes.
Post by Florian Engelmann
Option B: Recreate OSDs "one-by-one". -> will take a very long time as well
No need for that IMO.
Post by Florian Engelmann
Option C: There is some "lowlevel" commad allowing us to fix those sizes?
Well hex editor might help here as well. What you need is just to
update 64bit size value in block.db and block.wal files. In my lab I
can find it at offset 0x52. Most probably this is the fixed location
but it's better to check beforehand - existing value should contain
value corresponding to the one reported with show-label. Or I can do
that for you - please send the  first 4K chunks to me along with
corresponding label report.
Then update with new values - the field has to contain exactly the
same size as your new partition.
Post by Florian Engelmann
Post by Igor Fedotov
Post by Florian Engelmann
Post by Igor Fedotov
Thanks,
Igor
Post by Florian Engelmann
Hi,
today we migrated all of our rocksdb and wal devices to new once.
The new once are much bigger (500MB for wal/db -> 60GB db and 2G
WAL) and LVM based.
    export OSD=x
    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal
of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db
of=/dev/data/db-osd$OSD bs=1M  || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ln -vs /dev/data/db-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    ln -vs /dev/data/wal-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
    chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db ||
exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal ||
exit 1
    ceph-bluestore-tool bluefs-bdev-expand --path
/var/lib/ceph/osd/ceph-$OSD/ || exit 1
Everything went fine but it looks like the db and wal size is
ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}
2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200
/var/lib/ceph/osd/ceph-0/block.db) open size 64424509440
(0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs
add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db
size 60GiB
2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600
/var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648
(0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs
add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal
size 2GiB
Are we missing some command to "notify" rocksdb about the new device size?
All the best,
Florian
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Florian Engelmann
2018-11-21 08:11:17 UTC
Great support Igor!!!! Both thumbs up! We will try to build the tool
today and expand those bluefs devices once again.
Post by Igor Fedotov
FYI: https://github.com/ceph/ceph/pull/25187
Post by Igor Fedotov
Post by Florian Engelmann
Post by Igor Fedotov
Post by Florian Engelmann
Hi Igor,
Post by Igor Fedotov
what's your Ceph version?
12.2.8 (SES 5.5 - patched to the latest version)
Post by Igor Fedotov
Can you also check the output for
ceph-bluestore-tool show-label -p <path to osd>
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 8001457295360,
        "btime": "2018-06-29 23:43:12.088842",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "ready": "ready",
        "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098690",
        "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
Post by Igor Fedotov
It should report 'size' labels for every volume, please check they
contain new values.
That's exactly the problem, whether "ceph-bluestore-tool
show-label" nor "ceph daemon osd.0 perf dump|jq '.bluefs'" did
recognize the new sizes. But we are 100% sure the new devices are
used as we already deleted the old once...
ceph-bluestore-tool rm-label-key --dev
/var/lib/ceph/osd/ceph-0/block.db -k size
key 'size' not present
ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
{
    "/var/lib/ceph/osd/ceph-0/block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
So it looks like the key "size" is "read-only"?
There was a bug in updating specific keys, see
https://github.com/ceph/ceph/pull/24352
This PR also eliminates the need to set sizes manually on bdev-expand.
I thought it had been backported to Luminous but it looks like it doesn't.
Will submit a PR shortly.
Thank you so much Igor! So we have to decide how to proceed. Maybe
you could help us here as well.
Option A: Wait for this fix to be available. -> could last weeks or
even months
if you can build a custom version of ceph_bluestore_tool then this is
a short path. I'll submit a patch today or tomorrow which you need to
integrate into your private build.
Then you need to upgrade just the tool and apply new sizes.
Post by Florian Engelmann
Option B: Recreate OSDs "one-by-one". -> will take a very long time as well
No need for that IMO.
Post by Florian Engelmann
Option C: There is some "lowlevel" commad allowing us to fix those sizes?
Well hex editor might help here as well. What you need is just to
update 64bit size value in block.db and block.wal files. In my lab I
can find it at offset 0x52. Most probably this is the fixed location
but it's better to check beforehand - existing value should contain
value corresponding to the one reported with show-label. Or I can do
that for you - please send the  first 4K chunks to me along with
corresponding label report.
Then update with new values - the field has to contain exactly the
same size as your new partition.
Post by Florian Engelmann
Post by Igor Fedotov
Post by Florian Engelmann
Post by Igor Fedotov
Thanks,
Igor
Post by Florian Engelmann
Hi,
today we migrated all of our rocksdb and wal devices to new once.
The new once are much bigger (500MB for wal/db -> 60GB db and 2G
WAL) and LVM based.
    export OSD=x
    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal
of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db
of=/dev/data/db-osd$OSD bs=1M  || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ln -vs /dev/data/db-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    ln -vs /dev/data/wal-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
    chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal ||
exit 1
    ceph-bluestore-tool bluefs-bdev-expand --path
/var/lib/ceph/osd/ceph-$OSD/ || exit 1
Everything went fine but it looks like the db and wal size is
ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}
2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200
/var/lib/ceph/osd/ceph-0/block.db) open size 64424509440
(0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs
add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db
size 60GiB
2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600
/var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648
(0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs
add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal
size 2GiB
Are we missing some command to "notify" rocksdb about the new device size?
All the best,
Florian
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
EveryWare AG
Florian Engelmann
Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: mailto:***@everyware.ch
web: http://www.everyware.ch
Igor Fedotov
2018-11-21 08:34:47 UTC
Actually (given that your devices are already expanded) you don't need to
expand them again - you can just update the size labels with my new PR.

For new migrations you can use the updated bluefs-bdev-expand command,
which sets the size label automatically.
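
With the patched tool that could look something like this (illustrative
only - the values are the new device sizes in bytes, as reported in the
bdev open messages above):

ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.db -k size -v 64424509440
ceph-bluestore-tool set-label-key --dev /var/lib/ceph/osd/ceph-0/block.wal -k size -v 2147483648
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/    # verify the labels now match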


Thanks,
Igor
Post by Florian Engelmann
Great support Igor!!!! Both thumbs up! We will try to build the tool
today and expand those bluefs devices once again.
Post by Igor Fedotov
FYI: https://github.com/ceph/ceph/pull/25187
Post by Igor Fedotov
Post by Florian Engelmann
Post by Igor Fedotov
Post by Florian Engelmann
Hi Igor,
Post by Igor Fedotov
what's your Ceph version?
12.2.8 (SES 5.5 - patched to the latest version)
Post by Igor Fedotov
Can you also check the output for
ceph-bluestore-tool show-label -p <path to osd>
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 8001457295360,
        "btime": "2018-06-29 23:43:12.088842",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "ready": "ready",
        "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098690",
        "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
Post by Igor Fedotov
It should report 'size' labels for every volume, please check
they contain new values.
That's exactly the problem, whether "ceph-bluestore-tool
show-label" nor "ceph daemon osd.0 perf dump|jq '.bluefs'" did
recognize the new sizes. But we are 100% sure the new devices are
used as we already deleted the old once...
ceph-bluestore-tool rm-label-key --dev
/var/lib/ceph/osd/ceph-0/block.db -k size
key 'size' not present
ceph-bluestore-tool show-label --dev
/var/lib/ceph/osd/ceph-0/block.db
{
    "/var/lib/ceph/osd/ceph-0/block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
So it looks like the key "size" is "read-only"?
There was a bug in updating specific keys, see
https://github.com/ceph/ceph/pull/24352
This PR also eliminates the need to set sizes manually on
bdev-expand.
I thought it had been backported to Luminous but it looks like it doesn't.
Will submit a PR shortly.
Thank you so much Igor! So we have to decide how to proceed. Maybe
you could help us here as well.
Option A: Wait for this fix to be available. -> could last weeks or
even months
if you can build a custom version of ceph_bluestore_tool then this
is a short path. I'll submit a patch today or tomorrow which you
need to integrate into your private build.
Then you need to upgrade just the tool and apply new sizes.
Post by Florian Engelmann
Option B: Recreate OSDs "one-by-one". -> will take a very long time as well
No need for that IMO.
Post by Florian Engelmann
Option C: There is some "lowlevel" commad allowing us to fix those sizes?
Well hex editor might help here as well. What you need is just to
update 64bit size value in block.db and block.wal files. In my lab I
can find it at offset 0x52. Most probably this is the fixed location
but it's better to check beforehand - existing value should contain
value corresponding to the one reported with show-label. Or I can do
that for you - please send the first 4K chunks to me along with
corresponding label report.
Then update with new values - the field has to contain exactly the
same size as your new partition.
Post by Florian Engelmann
Post by Igor Fedotov
Post by Florian Engelmann
Post by Igor Fedotov
Thanks,
Igor
Post by Florian Engelmann
Hi,
today we migrated all of our rocksdb and wal devices to new
once. The new once are much bigger (500MB for wal/db -> 60GB db
and 2G WAL) and LVM based.
    export OSD=x
    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal
of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db
of=/dev/data/db-osd$OSD bs=1M  || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ln -vs /dev/data/db-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    ln -vs /dev/data/wal-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
    chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db ||
exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal
|| exit 1
    ceph-bluestore-tool bluefs-bdev-expand --path
/var/lib/ceph/osd/ceph-$OSD/ || exit 1
Everything went fine but it looks like the db and wal size is
ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}
2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200
/var/lib/ceph/osd/ceph-0/block.db) open size 64424509440
(0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs
add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db
size 60GiB
2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600
/var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648
(0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs
add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal
size 2GiB
Are we missing some command to "notify" rocksdb about the new device size?
All the best,
Florian
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Florian Engelmann
2018-11-21 16:01:33 UTC
Hi Igor,

Sad to say, but I failed to build the tool. I tried to build the whole
project as documented here:

http://docs.ceph.com/docs/mimic/install/build-ceph/

But as my workstation is running Ubuntu, the binary fails on SLES:

./ceph-bluestore-tool --help
./ceph-bluestore-tool: symbol lookup error: ./ceph-bluestore-tool:
undefined symbol: _ZNK7leveldb6Status8ToStringB5cxx11Ev

I copied all the libraries to ~/lib and exported LD_LIBRARY_PATH, but it
did not solve the problem.

Is there any simple method to build just the bluestore-tool, standalone
and static?

All the best,
Florian
Actually  (given that your devices are already expanded) you don't need
to expand them once again - one can just update size labels with my new PR.
For new migrations you can use updated bluefs expand command which sets
size label automatically though.
Thanks,
Igor
Post by Florian Engelmann
Great support Igor!!!! Both thumbs up! We will try to build the tool
today and expand those bluefs devices once again.
Post by Igor Fedotov
FYI: https://github.com/ceph/ceph/pull/25187
Post by Igor Fedotov
Post by Florian Engelmann
Post by Igor Fedotov
Post by Florian Engelmann
Hi Igor,
Post by Igor Fedotov
what's your Ceph version?
12.2.8 (SES 5.5 - patched to the latest version)
Post by Igor Fedotov
Can you also check the output for
ceph-bluestore-tool show-label -p <path to osd>
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 8001457295360,
        "btime": "2018-06-29 23:43:12.088842",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "ready": "ready",
        "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098690",
        "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
Post by Igor Fedotov
It should report 'size' labels for every volume, please check
they contain new values.
That's exactly the problem, whether "ceph-bluestore-tool
show-label" nor "ceph daemon osd.0 perf dump|jq '.bluefs'" did
recognize the new sizes. But we are 100% sure the new devices are
used as we already deleted the old once...
ceph-bluestore-tool rm-label-key --dev
/var/lib/ceph/osd/ceph-0/block.db -k size
key 'size' not present
ceph-bluestore-tool show-label --dev
/var/lib/ceph/osd/ceph-0/block.db
{
    "/var/lib/ceph/osd/ceph-0/block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
So it looks like the key "size" is "read-only"?
There was a bug in updating specific keys, see
https://github.com/ceph/ceph/pull/24352
This PR also eliminates the need to set sizes manually on
bdev-expand.
I thought it had been backported to Luminous but it looks like it doesn't.
Will submit a PR shortly.
Thank you so much Igor! So we have to decide how to proceed. Maybe
you could help us here as well.
Option A: Wait for this fix to be available. -> could last weeks or
even months
if you can build a custom version of ceph_bluestore_tool then this
is a short path. I'll submit a patch today or tomorrow which you
need to integrate into your private build.
Then you need to upgrade just the tool and apply new sizes.
Post by Florian Engelmann
Option B: Recreate OSDs "one-by-one". -> will take a very long time as well
No need for that IMO.
Post by Florian Engelmann
Option C: There is some "lowlevel" commad allowing us to fix those sizes?
Well hex editor might help here as well. What you need is just to
update 64bit size value in block.db and block.wal files. In my lab I
can find it at offset 0x52. Most probably this is the fixed location
but it's better to check beforehand - existing value should contain
value corresponding to the one reported with show-label. Or I can do
that for you - please send the first 4K chunks to me along with
corresponding label report.
Then update with new values - the field has to contain exactly the
same size as your new partition.
Post by Florian Engelmann
Post by Igor Fedotov
Post by Florian Engelmann
Post by Igor Fedotov
Thanks,
Igor
Post by Florian Engelmann
Hi,
today we migrated all of our rocksdb and wal devices to new
once. The new once are much bigger (500MB for wal/db -> 60GB db
and 2G WAL) and LVM based.
    export OSD=x
    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal
of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db
of=/dev/data/db-osd$OSD bs=1M  || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ln -vs /dev/data/db-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    ln -vs /dev/data/wal-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
    chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db ||
exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal
|| exit 1
    ceph-bluestore-tool bluefs-bdev-expand --path
/var/lib/ceph/osd/ceph-$OSD/ || exit 1
Everything went fine but it looks like the db and wal size is
ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}
2018-11-20 11:40:34.653524 7f70219b8d00  1 bdev(0x5647ea9ce200
/var/lib/ceph/osd/ceph-0/block.db) open size 64424509440
(0xf00000000, 60GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs
add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db
size 60GiB
2018-11-20 11:40:34.662385 7f70219b8d00  1 bdev(0x5647ea9ce600
/var/lib/ceph/osd/ceph-0/block.wal) open size 2147483648
(0x80000000, 2GiB) block_size 4096 (4KiB) non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs
add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal
size 2GiB
Are we missing some command to "notify" rocksdb about the new device size?
All the best,
Florian
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
EveryWare AG
Florian Engelmann
Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: mailto:***@everyware.ch
web: http://www.everyware.ch
Igor Fedotov
2018-11-22 08:38:52 UTC
Permalink
Hi Florian,
Post by Florian Engelmann
Hi Igor,
sad to say but I failed building the tool. I tried to build the whole
http://docs.ceph.com/docs/mimic/install/build-ceph/
./ceph-bluestore-tool --help
undefined symbol: _ZNK7leveldb6Status8ToStringB5cxx11Ev
I did copy all libraries to ~/lib and exported LD_LIBRARY_PATH but it
did not solve the problem.
Is there any simple method to just build the bluestore-tool standalone
and static?
Unfortunately I don't know of such a method.

Maybe try hex editing instead?
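(For anyone who still wants the build route: a rough sketch of building
only the tool target from a Ceph source tree, assuming the standard
cmake workflow - the target name and the build/lib layout are
assumptions here and this is untested on SLES 12SP3:)

    git clone -b luminous https://github.com/ceph/ceph.git && cd ceph
    ./install-deps.sh && ./do_cmake.sh
    cd build && make -j$(nproc) ceph-bluestore-tool
    # run it against the libraries from the same build tree to avoid ABI/symbol mismatches
    LD_LIBRARY_PATH=$PWD/lib ./bin/ceph-bluestore-tool --help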
Post by Florian Engelmann
All the best,
Florian
Post by Igor Fedotov
Actually (given that your devices are already expanded) you don't
need to expand them once again - one can just update the size labels
with my new PR.
For new migrations you can use the updated bluefs-bdev-expand command,
which sets the size label automatically.
Thanks,
Igor
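(A sketch of that "new migrations" path, assuming a tool build that
already contains the fix - the LV names follow the scheme used earlier
in this thread:)

    systemctl stop ceph-osd@$OSD
    lvextend -L +10g /dev/data/db-osd$OSD    # example: grow the DB LV by 10 GiB
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-$OSD/
    ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-$OSD/   # the size label should now follow automatically
    systemctl start ceph-osd@$OSD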
Post by Florian Engelmann
Great support Igor!!!! Both thumbs up! We will try to build the tool
today and expand those bluefs devices once again.
Post by Igor Fedotov
FYI: https://github.com/ceph/ceph/pull/25187
Post by Igor Fedotov
Post by Florian Engelmann
Post by Igor Fedotov
Post by Florian Engelmann
Hi Igor,
Post by Igor Fedotov
what's your Ceph version?
12.2.8 (SES 5.5 - patched to the latest version)
Post by Igor Fedotov
Can you also check the output for
ceph-bluestore-tool show-label -p <path to osd>
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
infering bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-0//block": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 8001457295360,
        "btime": "2018-06-29 23:43:12.088842",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "a2222146-6561-307e-b032-c5cee2ee520c",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "ready": "ready",
        "whoami": "0"
    },
    "/var/lib/ceph/osd/ceph-0//block.wal": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098690",
        "description": "bluefs wal"
    },
    "/var/lib/ceph/osd/ceph-0//block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
Post by Igor Fedotov
It should report 'size' labels for every volume, please check
they contain new values.
That's exactly the problem: neither "ceph-bluestore-tool
show-label" nor "ceph daemon osd.0 perf dump|jq '.bluefs'"
recognizes the new sizes. But we are 100% sure the new devices
are used, as we have already deleted the old ones...
ceph-bluestore-tool rm-label-key --dev
/var/lib/ceph/osd/ceph-0/block.db -k size
key 'size' not present
ceph-bluestore-tool show-label --dev
/var/lib/ceph/osd/ceph-0/block.db
{
    "/var/lib/ceph/osd/ceph-0/block.db": {
        "osd_uuid": "1e5b3908-20b1-41e4-b6eb-f5636d20450b",
        "size": 524288000,
        "btime": "2018-06-29 23:43:12.098023",
        "description": "bluefs db"
    }
}
So it looks like the key "size" is "read-only"?
There was a bug in updating specific keys, see
https://github.com/ceph/ceph/pull/24352
This PR also eliminates the need to set sizes manually on bdev-expand.
I thought it had been backported to Luminous, but it looks like it wasn't.
Will submit a PR shortly.
Thank you so much Igor! So we have to decide how to proceed.
Maybe you could help us here as well.
Option A: Wait for this fix to be available. -> could take weeks
or even months
If you can build a custom version of ceph-bluestore-tool then this
is a short path. I'll submit a patch today or tomorrow which you
would need to integrate into your private build.
Then you only need to upgrade the tool and apply the new sizes.
Post by Florian Engelmann
Option B: Recreate OSDs "one-by-one". -> will take a very long time as well
No need for that IMO.
Post by Florian Engelmann
Option C: Is there some "low-level" command allowing us to fix those sizes?
Well, a hex editor might help here as well. What you need is just to
update the 64-bit size value in the block.db and block.wal files. In my
lab I can find it at offset 0x52. Most probably this is the fixed
location, but it's better to check beforehand - the existing value
should match the one reported by
show-label. Or I can do that for you - please send the first 4K
chunks to me along with the corresponding label report.
Then update it with the new values - the field has to contain exactly
the same size as your new partition.
Post by Florian Engelmann
Post by Igor Fedotov
Post by Florian Engelmann
Post by Igor Fedotov
Thanks,
Igor
Post by Florian Engelmann
Hi,
today we migrated all of our rocksdb and wal devices to new
ones. The new ones are much bigger (500MB for wal/db -> 60GB
db and 2G WAL) and LVM based.
    export OSD=x
    lvcreate -n db-osd$OSD -L60g data || exit 1
    lvcreate -n wal-osd$OSD -L2g data || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.wal
of=/dev/data/wal-osd$OSD bs=1M || exit 1
    dd if=/var/lib/ceph/osd/ceph-$OSD/block.db
of=/dev/data/db-osd$OSD bs=1M  || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    rm -v /var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    ln -vs /dev/data/db-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.db || exit 1
    ln -vs /dev/data/wal-osd$OSD
/var/lib/ceph/osd/ceph-$OSD/block.wal || exit 1
    chown -c ceph:ceph $(realpath /dev/data/db-osd$OSD) || exit 1
    chown -c ceph:ceph $(realpath /dev/data/wal-osd$OSD) || exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.db
|| exit 1
    chown -ch ceph:ceph /var/lib/ceph/osd/ceph-$OSD/block.wal
|| exit 1
    ceph-bluestore-tool bluefs-bdev-expand --path
/var/lib/ceph/osd/ceph-$OSD/ || exit 1
Everything went fine, but it looks like the old db and wal sizes are still being reported:
ceph daemon osd.0 perf dump|jq '.bluefs'
{
  "gift_bytes": 0,
  "reclaim_bytes": 0,
  "db_total_bytes": 524279808,
  "db_used_bytes": 330301440,
  "wal_total_bytes": 524283904,
  "wal_used_bytes": 69206016,
  "slow_total_bytes": 320058949632,
  "slow_used_bytes": 13606322176,
  "num_files": 220,
  "log_bytes": 44204032,
  "log_compactions": 0,
  "logged_bytes": 31145984,
  "files_written_wal": 1,
  "files_written_sst": 1,
  "bytes_written_wal": 37753489,
  "bytes_written_sst": 238992
}
2018-11-20 11:40:34.653524 7f70219b8d00  1
bdev(0x5647ea9ce200 /var/lib/ceph/osd/ceph-0/block.db) open
size 64424509440 (0xf00000000, 60GiB) block_size 4096 (4KiB)
non-rotational
2018-11-20 11:40:34.653532 7f70219b8d00  1 bluefs
add_block_device bdev 1 path
/var/lib/ceph/osd/ceph-0/block.db size 60GiB
2018-11-20 11:40:34.662385 7f70219b8d00  1
bdev(0x5647ea9ce600 /var/lib/ceph/osd/ceph-0/block.wal) open
size 2147483648 (0x80000000, 2GiB) block_size 4096 (4KiB)
non-rotational
2018-11-20 11:40:34.662406 7f70219b8d00  1 bluefs
add_block_device bdev 0 path
/var/lib/ceph/osd/ceph-0/block.wal size 2GiB
Are we missing some command to "notify" rocksdb about the new
device size?
All the best,
Florian
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Francois Scheurer
2018-11-23 11:59:13 UTC
Permalink
Dear Igor


Thank you for your help!
I am working with Florian.
We have built the ceph-bluestore-tool with your patch on SLES 12SP3.

We will post back the results  ASAP.
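(For readers following along, a small verification sketch once the
patched tool has been applied - the paths and the osd.0 id are the
ones used earlier in this thread:)

    ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/      # "size" should now match the new LVs
    ceph daemon osd.0 perf dump | jq '.bluefs.db_total_bytes, .bluefs.wal_total_bytes'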


Best Regards
Francois Scheurer
-------- Forwarded Message --------
Subject: Re: [ceph-users] RocksDB and WAL migration to new block device
Date: Wed, 21 Nov 2018 11:34:47 +0300
Actually (given that your devices are already expanded) you don't
need to expand them once again - one can just update the size labels
with my new PR.
For new migrations you can use the updated bluefs-bdev-expand command,
which sets the size label automatically.
Thanks,
Igor
--
EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: ***@everyware.ch
web: http://www.everyware.ch