Discussion:
[ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?
Jens-U. Mozdzen
2018-01-10 13:57:47 UTC
Dear *,

has anybody been successful in migrating Filestore OSDs to Bluestore
OSDs while keeping the OSD number? There have been a number of messages on
the list reporting problems, and my experience is the same. (Removing
the existing OSD and creating a new one does work for me.)

I'm working on a Ceph 12.2.2 cluster and tried to follow
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd - this basically
says:

1. destroy old OSD
2. zap the disk
3. prepare the new OSD
4. activate the new OSD
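(In command form, that documented sequence would look roughly like the
following - a sketch only, using the placeholder OSD ID 999 and device
/dev/sdzz from below; the exact ceph-volume flags depend on the 12.2.x
version:

--- cut here ---
# 1. destroy old OSD
osd-node # ceph osd destroy 999 --yes-i-really-mean-it
# 2. zap the disk
osd-node # ceph-volume lvm zap /dev/sdzz
# 3. prepare the new OSD (Bluestore)
osd-node # ceph-volume lvm prepare --bluestore --osd-id 999 --data /dev/sdzz
# 4. activate the new OSD (<osd-fsid> = fsid of the freshly prepared OSD)
osd-node # ceph-volume lvm activate 999 <osd-fsid>
--- cut here ---
)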

I never got step 4 to complete. The closest I got was by doing the
following steps (assuming OSD ID "999" on /dev/sdzz):

1. Stop the old OSD via systemd (osd-node # systemctl stop
ceph-osd@999.service)

2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)

3a. if the old OSD was Bluestore with LVM, manually clean up the old
OSD's volume group

3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)

4. destroy the old OSD (osd-node # ceph osd destroy 999
--yes-i-really-mean-it)

5. create a new OSD entry (osd-node # ceph osd new $(cat
/var/lib/ceph/osd/ceph-999/fsid) 999)

6. add the OSD secret to Ceph authentication (osd-node # ceph auth add
osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd'
-i /var/lib/ceph/osd/ceph-999/keyring)

7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore
--osd-id 999 --data /dev/sdzz)
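(For easier comparison with the documented procedure, here is the same
sequence in one place - a sketch, not a tested recipe; it assumes the fsid
and keyring files under /var/lib/ceph/osd/ceph-999 are still readable at
the point where "ceph osd new" and "ceph auth add" are run:

--- cut here ---
osd-node # systemctl stop ceph-osd@999.service
osd-node # umount /var/lib/ceph/osd/ceph-999
osd-node # ceph-volume lvm zap /dev/sdzz
osd-node # ceph osd destroy 999 --yes-i-really-mean-it
osd-node # ceph osd new $(cat /var/lib/ceph/osd/ceph-999/fsid) 999
osd-node # ceph auth add osd.999 mgr 'allow profile osd' osd 'allow *' \
             mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-999/keyring
osd-node # ceph-volume lvm prepare --bluestore --osd-id 999 --data /dev/sdzz
osd-node # systemctl start ceph-osd@999.service
--- cut here ---
)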

but ceph-osd keeps complaining "osdmap says I am destroyed, exiting"
on "osd-node # systemctl start ceph-osd@999.service".

At first I felt I was hitting http://tracker.ceph.com/issues/21023
(BlueStore-OSDs marked as destroyed in OSD-map after v12.1.1 to
v12.1.4 upgrade). But I was already using the "ceph osd new" command,
which didn't help.

Some hours of sleep later I matched the issued commands to the osdmap
changes and the ceph-osd log messages, which revealed something strange:

- after issuing "ceph osd destroy", the osdmap lists the OSD as
"autoout,destroyed,exists" (no surprise here)
- once I issued "ceph osd new", the osdmap lists the OSD as "autoout,exists,new"
- starting ceph-osd after "ceph osd new" still reports "osdmap says I am
destroyed, exiting"
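
(For reference, how the per-epoch state of the OSD can be checked - a
sketch, with 110587 being the epoch from the logs below:

--- cut here ---
# current osdmap epoch and modification time
osd-node # ceph osd dump | head -n 4
# flags of osd.999 in the current map ...
osd-node # ceph osd dump | grep '^osd.999 '
# ... and in a specific older epoch, e.g. the one ceph-osd complains about
osd-node # ceph osd dump 110587 | grep '^osd.999 '
--- cut here ---
)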

I can see in the ceph-osd log that it is referring to an *old* osdmap
epoch, roughly 45 minutes old by that time.

This got me curious and I dug through the OSD log file, checking the
epoch numbers during start-up:

I took some detours, so there are more than two failed starts in the OSD
log file ;) :

--- cut here ---
# first of multiple attempts, before "ceph auth add ..."
# no actual epoch referenced, as login failed due to missing auth
2018-01-10 00:00:02.173983 7f5cf1c89d00 0 osd.999 0 crush map has
features 288232575208783872, adjusting msgr requires for clients
2018-01-10 00:00:02.173990 7f5cf1c89d00 0 osd.999 0 crush map has
features 288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:00:02.173994 7f5cf1c89d00 0 osd.999 0 crush map has
features 288232575208783872, adjusting msgr requires for osds
2018-01-10 00:00:02.174046 7f5cf1c89d00 0 osd.999 0 load_pgs
2018-01-10 00:00:02.174051 7f5cf1c89d00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:00:02.174055 7f5cf1c89d00 0 osd.999 0 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init
authentication failed: (1) Operation not permitted

# after "ceph auth ..."
# note the different epochs below? BTW, 110587 is the current epoch at
that time and osd.999 is marked destroyed there
# 109892: much too old to offer any details
# 110587: modified 2018-01-09 23:43:13.202381

2018-01-10 00:08:00.945507 7fc55905bd00 0 osd.999 0 crush map has
features 288232575208783872, adjusting msgr requires for clients
2018-01-10 00:08:00.945514 7fc55905bd00 0 osd.999 0 crush map has
features 288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:08:00.945521 7fc55905bd00 0 osd.999 0 crush map has
features 288232575208783872, adjusting msgr requires for osds
2018-01-10 00:08:00.945588 7fc55905bd00 0 osd.999 0 load_pgs
2018-01-10 00:08:00.945594 7fc55905bd00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:08:00.945599 7fc55905bd00 0 osd.999 0 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:08:00.946544 7fc55905bd00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:08:00.951720 7fc55905bd00 0 osd.999 0 done with init,
starting boot process
2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for
initial osdmap
2018-01-10 00:08:00.970644 7fc546614700 0 osd.999 109892 crush map
has features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:08:00.970653 7fc546614700 0 osd.999 109892 crush map
has features 288232610642264064 was 288232575208792577, adjusting msgr
requires for mons
2018-01-10 00:08:00.970660 7fc546614700 0 osd.999 109892 crush map
has features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:08:01.349602 7fc546614700 -1 osd.999 110587 osdmap says
I am destroyed, exiting

# another try
# it is now using epoch 110587 for everything. But that one is off by
one at that time already:
# 110587: modified 2018-01-09 23:43:13.202381
# 110588: modified 2018-01-10 00:12:55.271913

# but both 110587 and 110588 have osd.999 as "destroyed", so never mind.
2018-01-10 00:13:04.332026 7f408d5a4d00 0 osd.999 110587 crush map
has features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:13:04.332037 7f408d5a4d00 0 osd.999 110587 crush map
has features 288232610642264064 was 8705, adjusting msgr requires for
mons
2018-01-10 00:13:04.332043 7f408d5a4d00 0 osd.999 110587 crush map
has features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:13:04.332092 7f408d5a4d00 0 osd.999 110587 load_pgs
2018-01-10 00:13:04.332096 7f408d5a4d00 0 osd.999 110587 load_pgs
opened 0 pgs
2018-01-10 00:13:04.332100 7f408d5a4d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:13:04.332990 7f408d5a4d00 -1 osd.999 110587
log_to_monitors {default=true}
2018-01-10 00:13:06.026628 7f408d5a4d00 0 osd.999 110587 done with
init, starting boot process
2018-01-10 00:13:06.027627 7f4075352700 -1 osd.999 110587 osdmap says
I am destroyed, exiting

# the attempt after using "ceph osd new", which created epoch 110591
as the first with osd.999 as autoout,exists,new
# But ceph-osd still uses 110587.
# 110587: modified 2018-01-09 23:43:13.202381
# 110591: modified 2018-01-10 00:30:44.850078

2018-01-10 00:31:15.453871 7f1c57c58d00 0 osd.999 110587 crush map
has features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:31:15.453882 7f1c57c58d00 0 osd.999 110587 crush map
has features 288232610642264064 was 8705, adjusting msgr requires for
mons
2018-01-10 00:31:15.453887 7f1c57c58d00 0 osd.999 110587 crush map
has features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:31:15.453940 7f1c57c58d00 0 osd.999 110587 load_pgs
2018-01-10 00:31:15.453945 7f1c57c58d00 0 osd.999 110587 load_pgs
opened 0 pgs
2018-01-10 00:31:15.453952 7f1c57c58d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:31:15.454862 7f1c57c58d00 -1 osd.999 110587
log_to_monitors {default=true}
2018-01-10 00:31:15.520533 7f1c57c58d00 0 osd.999 110587 done with
init, starting boot process
2018-01-10 00:31:15.521278 7f1c40207700 -1 osd.999 110587 osdmap says
I am destroyed, exiting
--- cut here ---


So why is ceph-osd referring to an old osdmap, while newer ones have
been available for some time already?

And am I right to believe that *if* ceph-osd had checked the then-current
osdmap, it would have started successfully (after the "ceph osd new"
that's not mentioned in the docs)?

Is the documented procedure (from the "master" HTML docs) correct, or
should the "ceph auth" and "ceph osd new" steps be added?

Regards,
Jens
Alfredo Deza
2018-01-10 14:14:30 UTC
Post by Jens-U. Mozdzen
Dear *,
has anybody been successful migrating Filestore OSDs to Bluestore OSDs,
keeping the OSD number? There have been a number of messages on the list,
reporting problems, and my experience is the same. (Removing the existing
OSD and creating a new one does work for me.)
I'm working on an Ceph 12.2.2 cluster and tried following
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
- this basically says
1. destroy old OSD
2. zap the disk
3. prepare the new OSD
4. activate the new OSD
I never got step 4 to complete. The closest I got was by doing the following
1. Stop the old OSD via systemd (osd-node # systemctl stop
2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)
3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's
volume group
3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)
4. destroy the old OSD (osd-node # ceph osd destroy 999
--yes-i-really-mean-it)
5. create a new OSD entry (osd-node # ceph osd new $(cat
/var/lib/ceph/osd/ceph-999/fsid) 999)
Steps 5 and 6 are problematic if you are going to be using ceph-volume
later on, which takes care of doing this for you.
Post by Jens-U. Mozdzen
6. add the OSD secret to Ceph authentication (osd-node # ceph auth add
osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i
/var/lib/ceph/osd/ceph-999/keyring)
7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore
--osd-id 999 --data /dev/sdzz)
mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-999/keyring)
You are going to hit a bug in ceph-volume that prevents you from
specifying the osd id directly if that ID has been destroyed.

See http://tracker.ceph.com/issues/22642

In order for this to work, you would need to make sure that the ID has
really been destroyed and avoid passing --osd-id to ceph-volume. The
caveat is that you will get whatever ID is available next in the cluster.
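
In other words, something along these lines (a sketch; the new OSD simply
gets whatever ID the cluster assigns next):

    ceph osd destroy 999 --yes-i-really-mean-it
    ceph-volume lvm zap /dev/sdzz
    ceph-volume lvm create --bluestore --data /dev/sdzz   # note: no --osd-id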
Post by Jens-U. Mozdzen
but ceph-osd keeps complaining "osdmap says I am destroyed, exiting" on
At first I felt I was hitting http://tracker.ceph.com/issues/21023
(BlueStore-OSDs marked as destroyed in OSD-map after v12.1.1 to v12.1.4
upgrade). But I was already using the "ceph osd new" command, which didn't
help.
Some hours of sleep later I matched the issued commands to the osdmap
- from issuing "ceph osd destroy", osdmap lists the OSD as
"autoout,destroyed,exists" (no surprise here)
- once I issued "ceph osd new", osdmap lists the OSD as "autoout,exists,new"
- starting ceph-osd after "ceph osd new" reports "osdmap says I am
destroyed, exiting"
I can see in the ceph-osd log that it is relating to an *old* osdmap epoch,
roughly 45 minutes old by then?
This got me curious and I dug through the OSD log file, checking the epoch
I took some detours, so there's more than two failed starts in the OSD log
--- cut here ---
# first of multiple attempts, before "ceph auth add ..."
# no actual epoch referenced, as login failed due to missing auth
2018-01-10 00:00:02.173983 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:00:02.173990 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:00:02.173994 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:00:02.174046 7f5cf1c89d00 0 osd.999 0 load_pgs
2018-01-10 00:00:02.174051 7f5cf1c89d00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:00:02.174055 7f5cf1c89d00 0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init authentication
failed: (1) Operation not permitted
# after "ceph auth ..."
# note the different epochs below? BTW, 110587 is the current epoch at that
time and osd.999 is marked destroyed there
# 109892: much too old to offer any details
# 110587: modified 2018-01-09 23:43:13.202381
2018-01-10 00:08:00.945507 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:08:00.945514 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:08:00.945521 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:08:00.945588 7fc55905bd00 0 osd.999 0 load_pgs
2018-01-10 00:08:00.945594 7fc55905bd00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:08:00.945599 7fc55905bd00 0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:08:00.946544 7fc55905bd00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:08:00.951720 7fc55905bd00 0 osd.999 0 done with init,
starting boot process
2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for initial
osdmap
2018-01-10 00:08:00.970644 7fc546614700 0 osd.999 109892 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:08:00.970653 7fc546614700 0 osd.999 109892 crush map has
features 288232610642264064 was 288232575208792577, adjusting msgr requires
for mons
2018-01-10 00:08:00.970660 7fc546614700 0 osd.999 109892 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:08:01.349602 7fc546614700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
# another try
# it is now using epoch 110587 for everything. But that one is off by one at
# 110587: modified 2018-01-09 23:43:13.202381
# 110588: modified 2018-01-10 00:12:55.271913
# but both 110587 and 110588 have osd.999 as "destroyed", so never mind.
2018-01-10 00:13:04.332026 7f408d5a4d00 0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:13:04.332037 7f408d5a4d00 0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:13:04.332043 7f408d5a4d00 0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:13:04.332092 7f408d5a4d00 0 osd.999 110587 load_pgs
2018-01-10 00:13:04.332096 7f408d5a4d00 0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:13:04.332100 7f408d5a4d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:13:04.332990 7f408d5a4d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:13:06.026628 7f408d5a4d00 0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:13:06.027627 7f4075352700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
# the attempt after using "ceph osd new", which created epoch 110591 as the
first with osd.999 as autoout,exists,new
# But ceph-osd still uses 110587.
# 110587: modified 2018-01-09 23:43:13.202381
# 110591: modified 2018-01-10 00:30:44.850078
2018-01-10 00:31:15.453871 7f1c57c58d00 0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:31:15.453882 7f1c57c58d00 0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:31:15.453887 7f1c57c58d00 0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:31:15.453940 7f1c57c58d00 0 osd.999 110587 load_pgs
2018-01-10 00:31:15.453945 7f1c57c58d00 0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:31:15.453952 7f1c57c58d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:31:15.454862 7f1c57c58d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:31:15.520533 7f1c57c58d00 0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:31:15.521278 7f1c40207700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
--- cut here ---
So why is ceph-osd referring to an old osdmap, while newer ones are
available for some time already?
And am I right to believe that *if* ceph-osd had checked the then current
osdmap, it would have started successfully (once I did the "ceph osd new"
that's not mentioned in the docs)?
Is the documented procedure (from the "master" HTML docs) correct, or should
the "ceph auth" and "ceph osd new" steps get added?
Regards,
Jens
Jens-U. Mozdzen
2018-01-10 14:29:16 UTC
Hi Alfredo,
Post by Alfredo Deza
Post by Jens-U. Mozdzen
Dear *,
has anybody been successful migrating Filestore OSDs to Bluestore OSDs,
keeping the OSD number? There have been a number of messages on the list,
reporting problems, and my experience is the same. (Removing the existing
OSD and creating a new one does work for me.)
I'm working on an Ceph 12.2.2 cluster and tried following
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
- this basically says
1. destroy old OSD
2. zap the disk
3. prepare the new OSD
4. activate the new OSD
I never got step 4 to complete. The closest I got was by doing the following
1. Stop the old OSD via systemd (osd-node # systemctl stop
2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)
3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's
volume group
3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)
4. destroy the old OSD (osd-node # ceph osd destroy 999
--yes-i-really-mean-it)
5. create a new OSD entry (osd-node # ceph osd new $(cat
/var/lib/ceph/osd/ceph-999/fsid) 999)
Step 5 and 6 are problematic if you are going to be trying ceph-volume
later on, which takes care of doing this for you.
Post by Jens-U. Mozdzen
6. add the OSD secret to Ceph authentication (osd-node # ceph auth add
osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i
/var/lib/ceph/osd/ceph-999/keyring)
At first I tried to follow the documented steps (without my steps 5
and 6), which did not work for me. The documented approach failed with
"init authentication failed: (1) Operation not permitted", because
ceph-volume did not actually add the auth entry for me.

But even after manually adding the authentication, the "ceph-volume"
approach failed, as the OSD was still marked "destroyed" in the osdmap
epoch as used by ceph-osd (see the commented messages from
ceph-osd.999.log below).
Post by Alfredo Deza
Post by Jens-U. Mozdzen
7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore
--osd-id 999 --data /dev/sdzz)
You are going to hit a bug in ceph-volume that is preventing you from
specifying the osd id directly if the ID has been destroyed.
See http://tracker.ceph.com/issues/22642
If I read that bug description correctly, you're confirming why I
needed step #6 above (manually adding the OSD auth entry). But even if
ceph-volume had added it, the ceph-osd.log entries suggest that
starting the OSD would still have failed, because it was using the
wrong osdmap epoch.

To me it seems like I'm hitting a bug outside of ceph-volume - unless
it's ceph-volume that somehow determines which osdmap epoch is used by
ceph-osd.
Post by Alfredo Deza
In order for this to work, you would need to make sure that the ID has
really been destroyed and avoid passing --osd-id in ceph-volume. The
caveat
being that you will get whatever ID is available next in the cluster.
Yes, that's the work-around I then used - purge the old OSD and create
a new one.
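
Concretely, a purge-and-recreate sequence looks roughly like this (a
sketch; on Luminous "ceph osd purge" removes the CRUSH entry, the auth key
and the OSD entry in one step):

osd-node # ceph osd purge 999 --yes-i-really-mean-it
osd-node # ceph-volume lvm zap /dev/sdzz
osd-node # ceph-volume lvm create --bluestore --data /dev/sdzz   # gets the next free ID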

Thanks & regards,
Jens
Post by Alfredo Deza
Post by Jens-U. Mozdzen
[...]
--- cut here ---
# first of multiple attempts, before "ceph auth add ..."
# no actual epoch referenced, as login failed due to missing auth
2018-01-10 00:00:02.173983 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:00:02.173990 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:00:02.173994 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:00:02.174046 7f5cf1c89d00 0 osd.999 0 load_pgs
2018-01-10 00:00:02.174051 7f5cf1c89d00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:00:02.174055 7f5cf1c89d00 0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init authentication
failed: (1) Operation not permitted
# after "ceph auth ..."
# note the different epochs below? BTW, 110587 is the current epoch at that
time and osd.999 is marked destroyed there
# 109892: much too old to offer any details
# 110587: modified 2018-01-09 23:43:13.202381
2018-01-10 00:08:00.945507 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:08:00.945514 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:08:00.945521 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:08:00.945588 7fc55905bd00 0 osd.999 0 load_pgs
2018-01-10 00:08:00.945594 7fc55905bd00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:08:00.945599 7fc55905bd00 0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:08:00.946544 7fc55905bd00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:08:00.951720 7fc55905bd00 0 osd.999 0 done with init,
starting boot process
2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for initial
osdmap
2018-01-10 00:08:00.970644 7fc546614700 0 osd.999 109892 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:08:00.970653 7fc546614700 0 osd.999 109892 crush map has
features 288232610642264064 was 288232575208792577, adjusting msgr requires
for mons
2018-01-10 00:08:00.970660 7fc546614700 0 osd.999 109892 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:08:01.349602 7fc546614700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
# another try
# it is now using epoch 110587 for everything. But that one is off by one at
# 110587: modified 2018-01-09 23:43:13.202381
# 110588: modified 2018-01-10 00:12:55.271913
# but both 110587 and 110588 have osd.999 as "destroyed", so never mind.
2018-01-10 00:13:04.332026 7f408d5a4d00 0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:13:04.332037 7f408d5a4d00 0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:13:04.332043 7f408d5a4d00 0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:13:04.332092 7f408d5a4d00 0 osd.999 110587 load_pgs
2018-01-10 00:13:04.332096 7f408d5a4d00 0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:13:04.332100 7f408d5a4d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:13:04.332990 7f408d5a4d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:13:06.026628 7f408d5a4d00 0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:13:06.027627 7f4075352700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
# the attempt after using "ceph osd new", which created epoch 110591 as the
first with osd.999 as autoout,exists,new
# But ceph-osd still uses 110587.
# 110587: modified 2018-01-09 23:43:13.202381
# 110591: modified 2018-01-10 00:30:44.850078
2018-01-10 00:31:15.453871 7f1c57c58d00 0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:31:15.453882 7f1c57c58d00 0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:31:15.453887 7f1c57c58d00 0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:31:15.453940 7f1c57c58d00 0 osd.999 110587 load_pgs
2018-01-10 00:31:15.453945 7f1c57c58d00 0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:31:15.453952 7f1c57c58d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:31:15.454862 7f1c57c58d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:31:15.520533 7f1c57c58d00 0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:31:15.521278 7f1c40207700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
--- cut here ---
[...]
Reed Dier
2018-01-11 18:22:48 UTC
I am finally in the process of migrating my OSDs to bluestore and thought I would give you some input on how I am approaching it.
Some of the saga you can find in another ML thread here: https://www.spinics.net/lists/ceph-users/msg41802.html

For my first OSD I was cautious: I outed the OSD without downing it, allowing it to move its data off.
Some background on my cluster: for this OSD, it is an 8TB spinner, with an NVMe partition previously used for the filestore journal, intended to be used for block.db in bluestore.

Then I downed it, flushed the journal, destroyed it, zapped it with ceph-volume, set the norecover and norebalance flags, did ceph osd crush remove osd.$ID, ceph auth del osd.$ID, and ceph osd rm osd.$ID, and used ceph-volume locally to create the new LVM target. Then I unset the norecover and norebalance flags and it backfilled like normal.
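
At the command level that cautious first pass comes down to roughly the following (a sketch; $ID, the data device and the NVMe block.db partition are placeholders):

ceph osd out $ID                             # drain the data off while the OSD is still up
systemctl stop ceph-osd@$ID                  # then take it down
ceph-osd -i $ID --flush-journal              # flush the filestore journal
ceph osd destroy $ID --yes-i-really-mean-it
ceph-volume lvm zap /dev/sdX
ceph osd set norecover
ceph osd set norebalance
ceph osd crush remove osd.$ID
ceph auth del osd.$ID
ceph osd rm osd.$ID
ceph-volume lvm create --bluestore --data /dev/sdX --block.db /dev/nvme0n1pX
ceph osd unset norecover
ceph osd unset norebalance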

I initially ran into issues with specifying --osd-id causing my OSDs to fail to start, but after removing that option I was able to get it to fill in the gap of the OSD I had just removed.

I’m now doing quicker, more destructive migrations in an attempt to reduce data movement.
This way I don’t read from the OSD I’m replacing, write to other OSDs temporarily, read back from the temporary OSDs, and write back to the ‘new’ OSD.
I’m just reading from the replicas and writing to the ‘new’ OSD.

So I’m setting the norecover and norebalance flags, downing the OSD (but not out, it stays in; I also have the noout flag set), destroying/zapping, recreating using ceph-volume, and unsetting the flags, and it starts backfilling.
For 8TB disks, and with 23 other 8TB disks in the pool, it takes a long time to offload one and then backfill it back from them. I trust my disks enough to backfill from the other disks, and it’s going well. Also seeing very good write performance backfilling compared to previous drive replacements in filestore, so that’s very promising.
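
Roughly, per disk, the destructive variant is (a sketch with placeholder device names):

ceph osd set noout
ceph osd set norecover
ceph osd set norebalance
systemctl stop ceph-osd@$ID                  # down, but it stays "in"
ceph osd destroy $ID --yes-i-really-mean-it
ceph-volume lvm zap /dev/sdX
ceph-volume lvm create --bluestore --data /dev/sdX --block.db /dev/nvme0n1pX
ceph osd unset norebalance
ceph osd unset norecover                     # backfill then starts from the replicas
# (noout can stay set while the remaining disks are converted)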

Reed
Post by Jens-U. Mozdzen
Hi Alfredo,
Post by Alfredo Deza
Post by Jens-U. Mozdzen
Dear *,
has anybody been successful migrating Filestore OSDs to Bluestore OSDs,
keeping the OSD number? There have been a number of messages on the list,
reporting problems, and my experience is the same. (Removing the existing
OSD and creating a new one does work for me.)
I'm working on an Ceph 12.2.2 cluster and tried following
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
- this basically says
1. destroy old OSD
2. zap the disk
3. prepare the new OSD
4. activate the new OSD
I never got step 4 to complete. The closest I got was by doing the following
1. Stop the old OSD via systemd (osd-node # systemctl stop
2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)
3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's
volume group
3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)
4. destroy the old OSD (osd-node # ceph osd destroy 999
--yes-i-really-mean-it)
5. create a new OSD entry (osd-node # ceph osd new $(cat
/var/lib/ceph/osd/ceph-999/fsid) 999)
Step 5 and 6 are problematic if you are going to be trying ceph-volume
later on, which takes care of doing this for you.
Post by Jens-U. Mozdzen
6. add the OSD secret to Ceph authentication (osd-node # ceph auth add
osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i
/var/lib/ceph/osd/ceph-999/keyring)
I at first tried to follow the documented steps (without my steps 5 and 6), which did not work for me. The documented approach failed with "init authentication >> failed: (1) Operation not permitted", because actually ceph-volume did not add the auth entry for me.
But even after manually adding the authentication, the "ceph-volume" approach failed, as the OSD was still marked "destroyed" in the osdmap epoch as used by ceph-osd (see the commented messages from ceph-osd.999.log below).
Post by Alfredo Deza
Post by Jens-U. Mozdzen
7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore
--osd-id 999 --data /dev/sdzz)
You are going to hit a bug in ceph-volume that is preventing you from
specifying the osd id directly if the ID has been destroyed.
See http://tracker.ceph.com/issues/22642
If I read that bug description correctly, you're confirming why I needed step #6 above (manually adding the OSD auth entry. But even if ceph-volume had added it, the ceph-osd.log entries suggest that starting the OSD would still have failed, because of accessing the wrong osdmap epoch.
To me it seems like I'm hitting a bug outside of ceph-volume - unless it's ceph-volume that somehow determines which osdmap epoch is used by ceph-osd.
Post by Alfredo Deza
In order for this to work, you would need to make sure that the ID has
really been destroyed and avoid passing --osd-id in ceph-volume. The
caveat
being that you will get whatever ID is available next in the cluster.
Yes, that's the work-around I then used - purge the old OSD and create a new one.
Thanks & regards,
Jens
Post by Alfredo Deza
Post by Jens-U. Mozdzen
[...]
--- cut here ---
# first of multiple attempts, before "ceph auth add ..."
# no actual epoch referenced, as login failed due to missing auth
2018-01-10 00:00:02.173983 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:00:02.173990 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:00:02.173994 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:00:02.174046 7f5cf1c89d00 0 osd.999 0 load_pgs
2018-01-10 00:00:02.174051 7f5cf1c89d00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:00:02.174055 7f5cf1c89d00 0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init authentication
failed: (1) Operation not permitted
# after "ceph auth ..."
# note the different epochs below? BTW, 110587 is the current epoch at that
time and osd.999 is marked destroyed there
# 109892: much too old to offer any details
# 110587: modified 2018-01-09 23:43:13.202381
2018-01-10 00:08:00.945507 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:08:00.945514 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:08:00.945521 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:08:00.945588 7fc55905bd00 0 osd.999 0 load_pgs
2018-01-10 00:08:00.945594 7fc55905bd00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:08:00.945599 7fc55905bd00 0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:08:00.946544 7fc55905bd00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:08:00.951720 7fc55905bd00 0 osd.999 0 done with init,
starting boot process
2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for initial
osdmap
2018-01-10 00:08:00.970644 7fc546614700 0 osd.999 109892 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:08:00.970653 7fc546614700 0 osd.999 109892 crush map has
features 288232610642264064 was 288232575208792577, adjusting msgr requires
for mons
2018-01-10 00:08:00.970660 7fc546614700 0 osd.999 109892 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:08:01.349602 7fc546614700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
# another try
# it is now using epoch 110587 for everything. But that one is off by one at
# 110587: modified 2018-01-09 23:43:13.202381
# 110588: modified 2018-01-10 00:12:55.271913
# but both 110587 and 110588 have osd.999 as "destroyed", so never mind.
2018-01-10 00:13:04.332026 7f408d5a4d00 0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:13:04.332037 7f408d5a4d00 0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:13:04.332043 7f408d5a4d00 0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:13:04.332092 7f408d5a4d00 0 osd.999 110587 load_pgs
2018-01-10 00:13:04.332096 7f408d5a4d00 0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:13:04.332100 7f408d5a4d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:13:04.332990 7f408d5a4d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:13:06.026628 7f408d5a4d00 0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:13:06.027627 7f4075352700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
# the attempt after using "ceph osd new", which created epoch 110591 as the
first with osd.999 as autoout,exists,new
# But ceph-osd still uses 110587.
# 110587: modified 2018-01-09 23:43:13.202381
# 110591: modified 2018-01-10 00:30:44.850078
2018-01-10 00:31:15.453871 7f1c57c58d00 0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:31:15.453882 7f1c57c58d00 0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:31:15.453887 7f1c57c58d00 0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:31:15.453940 7f1c57c58d00 0 osd.999 110587 load_pgs
2018-01-10 00:31:15.453945 7f1c57c58d00 0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:31:15.453952 7f1c57c58d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:31:15.454862 7f1c57c58d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:31:15.520533 7f1c57c58d00 0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:31:15.521278 7f1c40207700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
--- cut here ---
[...]
Brady Deetz
2018-01-11 18:38:44 UTC
I hear you on time. I have 350 x 6TB drives to convert. I recently posted
about a disaster I created while automating my migration. Good luck.
Post by Reed Dier
I am in the process of migrating my OSDs to bluestore finally and thought
I would give you some input on how I am approaching it.
https://www.spinics.net/lists/ceph-users/msg41802.html
My first OSD I was cautious, and I outed the OSD without downing it,
allowing it to move data off.
Some background on my cluster, for this OSD, it is an 8TB spinner, with an
NVMe partition previously used for journaling in filestore, intending to be
used for block.db in bluestore.
Then I downed it, flushed the journal, destroyed it, zapped with
ceph-volume, set norecover and norebalance flags, did ceph osd crush remove
osd.$ID, ceph auth del osd.$ID, and ceph osd rm osd.$ID and used
ceph-volume locally to create the new LVM target. Then unset the norecover
and norebalance flags and it backfilled like normal.
I initially ran into issues with specifying --osd.id causing my osd’s to
fail to start, but removing that I was able to get it to fill in the gap of
the OSD I just removed.
I’m now doing quicker, more destructive migrations in an attempt to reduce
data movement.
This way I don’t read from OSD I’m replacing, write to other OSD
temporarily, read back from temp OSD, write back to ‘new’ OSD.
I’m just reading from replica and writing to ‘new’ OSD.
So I’m setting the norecover and norebalance flags, down the OSD (but not
out, it stays in, also have the noout flag set), destroy/zap, recreate
using ceph-volume, unset the flags, and it starts backfilling.
For 8TB disks, and with 23 other 8TB disks in the pool, it takes a *long* time
to offload it and then backfill back from them. I trust my disks enough to
backfill from the other disks, and its going well. Also seeing very good
write performance backfilling compared to previous drive replacements in
filestore, so thats very promising.
Reed
Hi Alfredo,
Dear *,
has anybody been successful migrating Filestore OSDs to Bluestore OSDs,
keeping the OSD number? There have been a number of messages on the list,
reporting problems, and my experience is the same. (Removing the existing
OSD and creating a new one does work for me.)
I'm working on an Ceph 12.2.2 cluster and tried following
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
- this basically says
1. destroy old OSD
2. zap the disk
3. prepare the new OSD
4. activate the new OSD
I never got step 4 to complete. The closest I got was by doing the following
1. Stop the old OSD via systemd (osd-node # systemctl stop
2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)
3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's
volume group
3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)
4. destroy the old OSD (osd-node # ceph osd destroy 999
--yes-i-really-mean-it)
5. create a new OSD entry (osd-node # ceph osd new $(cat
/var/lib/ceph/osd/ceph-999/fsid) 999)
Step 5 and 6 are problematic if you are going to be trying ceph-volume
later on, which takes care of doing this for you.
6. add the OSD secret to Ceph authentication (osd-node # ceph auth add
osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i
/var/lib/ceph/osd/ceph-999/keyring)
I at first tried to follow the documented steps (without my steps 5 and
6), which did not work for me. The documented approach failed with "init
authentication >> failed: (1) Operation not permitted", because actually
ceph-volume did not add the auth entry for me.
But even after manually adding the authentication, the "ceph-volume"
approach failed, as the OSD was still marked "destroyed" in the osdmap
epoch as used by ceph-osd (see the commented messages from ceph-osd.999.log
below).
7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore
--osd-id 999 --data /dev/sdzz)
You are going to hit a bug in ceph-volume that is preventing you from
specifying the osd id directly if the ID has been destroyed.
See http://tracker.ceph.com/issues/22642
If I read that bug description correctly, you're confirming why I needed
step #6 above (manually adding the OSD auth entry. But even if ceph-volume
had added it, the ceph-osd.log entries suggest that starting the OSD would
still have failed, because of accessing the wrong osdmap epoch.
To me it seems like I'm hitting a bug outside of ceph-volume - unless it's
ceph-volume that somehow determines which osdmap epoch is used by ceph-osd.
In order for this to work, you would need to make sure that the ID has
really been destroyed and avoid passing --osd-id in ceph-volume. The
caveat
being that you will get whatever ID is available next in the cluster.
Yes, that's the work-around I then used - purge the old OSD and create a new one.
Thanks & regards,
Jens
[...]
--- cut here ---
# first of multiple attempts, before "ceph auth add ..."
# no actual epoch referenced, as login failed due to missing auth
2018-01-10 00:00:02.173983 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:00:02.173990 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:00:02.173994 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:00:02.174046 7f5cf1c89d00 0 osd.999 0 load_pgs
2018-01-10 00:00:02.174051 7f5cf1c89d00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:00:02.174055 7f5cf1c89d00 0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init authentication
failed: (1) Operation not permitted
# after "ceph auth ..."
# note the different epochs below? BTW, 110587 is the current epoch at that
time and osd.999 is marked destroyed there
# 109892: much too old to offer any details
# 110587: modified 2018-01-09 23:43:13.202381
2018-01-10 00:08:00.945507 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:08:00.945514 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:08:00.945521 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:08:00.945588 7fc55905bd00 0 osd.999 0 load_pgs
2018-01-10 00:08:00.945594 7fc55905bd00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:08:00.945599 7fc55905bd00 0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:08:00.946544 7fc55905bd00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:08:00.951720 7fc55905bd00 0 osd.999 0 done with init,
starting boot process
2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for initial
osdmap
2018-01-10 00:08:00.970644 7fc546614700 0 osd.999 109892 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:08:00.970653 7fc546614700 0 osd.999 109892 crush map has
features 288232610642264064 was 288232575208792577, adjusting msgr requires
for mons
2018-01-10 00:08:00.970660 7fc546614700 0 osd.999 109892 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:08:01.349602 7fc546614700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
# another try
# it is now using epoch 110587 for everything. But that one is off by one at
# 110587: modified 2018-01-09 23:43:13.202381
# 110588: modified 2018-01-10 00:12:55.271913
# but both 110587 and 110588 have osd.999 as "destroyed", so never mind.
2018-01-10 00:13:04.332026 7f408d5a4d00 0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:13:04.332037 7f408d5a4d00 0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:13:04.332043 7f408d5a4d00 0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:13:04.332092 7f408d5a4d00 0 osd.999 110587 load_pgs
2018-01-10 00:13:04.332096 7f408d5a4d00 0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:13:04.332100 7f408d5a4d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:13:04.332990 7f408d5a4d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:13:06.026628 7f408d5a4d00 0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:13:06.027627 7f4075352700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
# the attempt after using "ceph osd new", which created epoch 110591 as the
first with osd.999 as autoout,exists,new
# But ceph-osd still uses 110587.
# 110587: modified 2018-01-09 23:43:13.202381
# 110591: modified 2018-01-10 00:30:44.850078
2018-01-10 00:31:15.453871 7f1c57c58d00 0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:31:15.453882 7f1c57c58d00 0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:31:15.453887 7f1c57c58d00 0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:31:15.453940 7f1c57c58d00 0 osd.999 110587 load_pgs
2018-01-10 00:31:15.453945 7f1c57c58d00 0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:31:15.453952 7f1c57c58d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:31:15.454862 7f1c57c58d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:31:15.520533 7f1c57c58d00 0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:31:15.521278 7f1c40207700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
--- cut here ---
[...]
Reed Dier
2018-01-11 20:24:52 UTC
Thank you for documenting your progress and peril on the ML.

Luckily I only have 24x 8TB HDDs and 50x 1.92TB SSDs to migrate over to bluestore.

8 nodes, 4 chassis (failure domain), 3 drives per node for the HDDs, so I’m able to do about 3 at a time (1 node) for rip/replace.

Definitely taking it slow and steady, and the SSDs will move quickly for backfills as well.
Seeing about 1TB/6hr on backfills, without much of a performance hit on the rest of everything. With about 5TB average utilization on each 8TB disk, that's just about 30 hours per host; times 8 hosts that will be about 10 days, so a couple of weeks is a safe amount of headway.
The write performance certainly seems better on bluestore than on filestore, so that likely helps as well.

I expect I can probably refill an SSD OSD in about an hour or two, and will likely stagger those out.
But with such a small number of OSDs currently, I’m taking the by-hand approach rather than scripting it, so as to avoid similar pitfalls.

Reed
I hear you on time. I have 350 x 6TB drives to convert. I recently posted about a disaster I created automating my migration. Good luck
I am in the process of migrating my OSDs to bluestore finally and thought I would give you some input on how I am approaching it.
Some of saga you can find in another ML thread here: https://www.spinics.net/lists/ceph-users/msg41802.html
My first OSD I was cautious, and I outed the OSD without downing it, allowing it to move data off.
Some background on my cluster, for this OSD, it is an 8TB spinner, with an NVMe partition previously used for journaling in filestore, intending to be used for block.db in bluestore.
Then I downed it, flushed the journal, destroyed it, zapped with ceph-volume, set norecover and norebalance flags, did ceph osd crush remove osd.$ID, ceph auth del osd.$ID, and ceph osd rm osd.$ID and used ceph-volume locally to create the new LVM target. Then unset the norecover and norebalance flags and it backfilled like normal.
I initially ran into issues with specifying --osd.id causing my osd’s to fail to start, but removing that I was able to get it to fill in the gap of the OSD I just removed.
I’m now doing quicker, more destructive migrations in an attempt to reduce data movement.
This way I don’t read from OSD I’m replacing, write to other OSD temporarily, read back from temp OSD, write back to ‘new’ OSD.
I’m just reading from replica and writing to ‘new’ OSD.
So I’m setting the norecover and norebalance flags, down the OSD (but not out, it stays in, also have the noout flag set), destroy/zap, recreate using ceph-volume, unset the flags, and it starts backfilling.
For 8TB disks, and with 23 other 8TB disks in the pool, it takes a long time to offload it and then backfill back from them. I trust my disks enough to backfill from the other disks, and its going well. Also seeing very good write performance backfilling compared to previous drive replacements in filestore, so thats very promising.
Reed
Post by Jens-U. Mozdzen
Hi Alfredo,
Post by Alfredo Deza
Post by Jens-U. Mozdzen
Dear *,
has anybody been successful migrating Filestore OSDs to Bluestore OSDs,
keeping the OSD number? There have been a number of messages on the list,
reporting problems, and my experience is the same. (Removing the existing
OSD and creating a new one does work for me.)
I'm working on an Ceph 12.2.2 cluster and tried following
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
- this basically says
1. destroy old OSD
2. zap the disk
3. prepare the new OSD
4. activate the new OSD
I never got step 4 to complete. The closest I got was by doing the following
1. Stop the old OSD via systemd (osd-node # systemctl stop
2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)
3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's
volume group
3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)
4. destroy the old OSD (osd-node # ceph osd destroy 999
--yes-i-really-mean-it)
5. create a new OSD entry (osd-node # ceph osd new $(cat
/var/lib/ceph/osd/ceph-999/fsid) 999)
Step 5 and 6 are problematic if you are going to be trying ceph-volume
later on, which takes care of doing this for you.
Post by Jens-U. Mozdzen
6. add the OSD secret to Ceph authentication (osd-node # ceph auth add
osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i
/var/lib/ceph/osd/ceph-999/keyring)
I at first tried to follow the documented steps (without my steps 5 and 6), which did not work for me. The documented approach failed with "init authentication >> failed: (1) Operation not permitted", because actually ceph-volume did not add the auth entry for me.
But even after manually adding the authentication, the "ceph-volume" approach failed, as the OSD was still marked "destroyed" in the osdmap epoch as used by ceph-osd (see the commented messages from ceph-osd.999.log below).
Post by Alfredo Deza
Post by Jens-U. Mozdzen
7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore
--osd-id 999 --data /dev/sdzz)
You are going to hit a bug in ceph-volume that is preventing you from
specifying the osd id directly if the ID has been destroyed.
See http://tracker.ceph.com/issues/22642
If I read that bug description correctly, you're confirming why I needed step #6 above (manually adding the OSD auth entry. But even if ceph-volume had added it, the ceph-osd.log entries suggest that starting the OSD would still have failed, because of accessing the wrong osdmap epoch.
To me it seems like I'm hitting a bug outside of ceph-volume - unless it's ceph-volume that somehow determines which osdmap epoch is used by ceph-osd.
Post by Alfredo Deza
In order for this to work, you would need to make sure that the ID has
really been destroyed and avoid passing --osd-id in ceph-volume. The
caveat
being that you will get whatever ID is available next in the cluster.
Yes, that's the work-around I then used - purge the old OSD and create a new one.
Thanks & regards,
Jens
Post by Alfredo Deza
Post by Jens-U. Mozdzen
[...]
--- cut here ---
# first of multiple attempts, before "ceph auth add ..."
# no actual epoch referenced, as login failed due to missing auth
2018-01-10 00:00:02.173983 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:00:02.173990 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:00:02.173994 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:00:02.174046 7f5cf1c89d00 0 osd.999 0 load_pgs
2018-01-10 00:00:02.174051 7f5cf1c89d00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:00:02.174055 7f5cf1c89d00 0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init authentication
failed: (1) Operation not permitted
# after "ceph auth ..."
# note the different epochs below? BTW, 110587 is the current epoch at that
time and osd.999 is marked destroyed there
# 109892: much too old to offer any details
# 110587: modified 2018-01-09 23:43:13.202381
2018-01-10 00:08:00.945507 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:08:00.945514 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:08:00.945521 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:08:00.945588 7fc55905bd00 0 osd.999 0 load_pgs
2018-01-10 00:08:00.945594 7fc55905bd00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:08:00.945599 7fc55905bd00 0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:08:00.946544 7fc55905bd00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:08:00.951720 7fc55905bd00 0 osd.999 0 done with init,
starting boot process
2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for initial
osdmap
2018-01-10 00:08:00.970644 7fc546614700 0 osd.999 109892 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:08:00.970653 7fc546614700 0 osd.999 109892 crush map has
features 288232610642264064 was 288232575208792577, adjusting msgr requires
for mons
2018-01-10 00:08:00.970660 7fc546614700 0 osd.999 109892 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:08:01.349602 7fc546614700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
# another try
# it is now using epoch 110587 for everything. But that one is off by one at
# 110587: modified 2018-01-09 23:43:13.202381
# 110588: modified 2018-01-10 00:12:55.271913
# but both 110587 and 110588 have osd.999 as "destroyed", so never mind.
2018-01-10 00:13:04.332026 7f408d5a4d00 0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:13:04.332037 7f408d5a4d00 0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:13:04.332043 7f408d5a4d00 0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:13:04.332092 7f408d5a4d00 0 osd.999 110587 load_pgs
2018-01-10 00:13:04.332096 7f408d5a4d00 0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:13:04.332100 7f408d5a4d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:13:04.332990 7f408d5a4d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:13:06.026628 7f408d5a4d00 0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:13:06.027627 7f4075352700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
# the attempt after using "ceph osd new", which created epoch 110591 as the
first with osd.999 as autoout,exists,new
# But ceph-osd still uses 110587.
# 110587: modified 2018-01-09 23:43:13.202381
# 110591: modified 2018-01-10 00:30:44.850078
2018-01-10 00:31:15.453871 7f1c57c58d00 0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:31:15.453882 7f1c57c58d00 0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:31:15.453887 7f1c57c58d00 0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:31:15.453940 7f1c57c58d00 0 osd.999 110587 load_pgs
2018-01-10 00:31:15.453945 7f1c57c58d00 0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:31:15.453952 7f1c57c58d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:31:15.454862 7f1c57c58d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:31:15.520533 7f1c57c58d00 0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:31:15.521278 7f1c40207700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
--- cut here ---
[...]
Dan Jakubiec
2018-01-17 19:14:53 UTC
Permalink
Also worth pointing out something a bit obvious: this kind of faster/destructive migration should only be attempted if all your pools are at least 3x replicated.

For example, if you had a 1x replicated pool you would lose data using this approach.
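A quick way to double-check that before starting is to look at each pool's replica count; a minimal sketch (the pool name "rbd" is only an example):

--- cut here ---
# replica count (size) and min_size are listed per pool
osd-node # ceph osd pool ls detail
# or query a single pool directly
osd-node # ceph osd pool get rbd size
--- cut here ---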

-- Dan
Post by Reed Dier
Thank you for documenting your progress and peril on the ML.
Luckily I only have 24x 8TB HDDs and 50x 1.92TB SSDs to migrate over to bluestore.
8 nodes, 4 chassis (failure domain), 3 drives per node for the HDDs, so I’m able to do about 3 at a time (1 node) for rip/replace.
Definitely taking it slow and steady, and the SSDs will move quickly for backfills as well.
Seeing about 1TB/6hr on backfills without much performance hit on the rest of everything; with about 5TB average utilization on each 8TB disk, that's roughly 30 hours per host, and 30 hours * 8 hosts comes to about 10 days, so a couple of weeks is a safe amount of headway.
The write performance certainly seems better on bluestore than filestore, so that likely helps as well.
I expect I can probably refill an SSD osd in about an hour or two, and will likely stagger those out.
But with such a small number of osd’s currently, I’m taking the by-hand approach rather than scripting it so as to avoid similar pitfalls.
Reed
I hear you on time. I have 350 x 6TB drives to convert. I recently posted about a disaster I created automating my migration. Good luck
I am in the process of migrating my OSDs to bluestore finally and thought I would give you some input on how I am approaching it.
Some of the saga you can find in another ML thread here: https://www.spinics.net/lists/ceph-users/msg41802.html
With my first OSD I was cautious: I outed the OSD without downing it, allowing it to move its data off.
Some background on my cluster: this OSD is an 8TB spinner, with an NVMe partition previously used for journaling in filestore and intended to be used for block.db in bluestore.
Then I downed it, flushed the journal, destroyed it, zapped it with ceph-volume, set the norecover and norebalance flags, did ceph osd crush remove osd.$ID, ceph auth del osd.$ID, and ceph osd rm osd.$ID, and used ceph-volume locally to create the new LVM target. Then I unset the norecover and norebalance flags and it backfilled like normal.
I initially ran into issues with specifying --osd-id, which caused my osd's to fail to start, but after removing that option I was able to get it to fill in the gap of the OSD I had just removed.
I’m now doing quicker, more destructive migrations in an attempt to reduce data movement.
This way I don't read from the OSD I'm replacing, write to other OSDs temporarily, read back from the temporary OSDs, and write back to the 'new' OSD.
I'm just reading from the replicas and writing to the 'new' OSD.
So I'm setting the norecover and norebalance flags, downing the OSD (but not outing it; it stays in, and I also have the noout flag set), destroying/zapping it, recreating it using ceph-volume, and unsetting the flags, at which point it starts backfilling (roughly the sequence sketched below).
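In command form that sequence looks roughly like the following; just a sketch, with the OSD id (999), data device (/dev/sdzz) and block.db partition (/dev/nvme0n1p1) as placeholders, and --osd-id left out as noted above, so the cluster may hand the replacement a different id:

--- cut here ---
osd-node # ceph osd set noout
osd-node # ceph osd set norecover
osd-node # ceph osd set norebalance
# stop the OSD daemon (unit name as used on systemd-based installs)
osd-node # systemctl stop ceph-osd@999.service
osd-node # ceph osd destroy 999 --yes-i-really-mean-it
osd-node # ceph-volume lvm zap /dev/sdzz
# recreate the OSD as bluestore, block.db on the NVMe partition
osd-node # ceph-volume lvm create --bluestore --data /dev/sdzz --block.db /dev/nvme0n1p1
osd-node # ceph osd unset norebalance
osd-node # ceph osd unset norecover
osd-node # ceph osd unset noout
--- cut here ---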
For 8TB disks, and with 23 other 8TB disks in the pool, it takes a long time to offload it and then backfill back from them. I trust my disks enough to backfill from the other disks, and it's going well. Also seeing very good write performance backfilling compared to previous drive replacements in filestore, so that's very promising.
Reed
Post by Jens-U. Mozdzen
Hi Alfredo,
Post by Alfredo Deza
Post by Jens-U. Mozdzen
Dear *,
has anybody been successful migrating Filestore OSDs to Bluestore OSDs,
keeping the OSD number? There have been a number of messages on the list,
reporting problems, and my experience is the same. (Removing the existing
OSD and creating a new one does work for me.)
I'm working on a Ceph 12.2.2 cluster and tried following
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
- this basically says
1. destroy old OSD
2. zap the disk
3. prepare the new OSD
4. activate the new OSD
I never got step 4 to complete. The closest I got was by doing the following
1. Stop the old OSD via systemd (osd-node # systemctl stop
2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)
3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's
volume group
3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)
4. destroy the old OSD (osd-node # ceph osd destroy 999
--yes-i-really-mean-it)
5. create a new OSD entry (osd-node # ceph osd new $(cat
/var/lib/ceph/osd/ceph-999/fsid) 999)
Steps 5 and 6 are problematic if you are going to be trying ceph-volume
later on, which takes care of doing this for you.
Post by Jens-U. Mozdzen
6. add the OSD secret to Ceph authentication (osd-node # ceph auth add
osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i
/var/lib/ceph/osd/ceph-999/keyring)
I at first tried to follow the documented steps (without my steps 5 and 6), which did not work for me. The documented approach failed with "init authentication failed: (1) Operation not permitted", because ceph-volume did not actually add the auth entry for me.
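One way to check that before retrying is to ask the monitors for the OSD's auth entry; a sketch, again assuming OSD id 999:

--- cut here ---
# prints the key and caps if present, errors out (ENOENT) if the entry is missing
osd-node # ceph auth get osd.999
--- cut here ---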
But even after manually adding the authentication, the "ceph-volume" approach failed, as the OSD was still marked "destroyed" in the osdmap epoch as used by ceph-osd (see the commented messages from ceph-osd.999.log below).
Post by Alfredo Deza
Post by Jens-U. Mozdzen
7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore
--osd-id 999 --data /dev/sdzz)
You are going to hit a bug in ceph-volume that is preventing you from
specifying the osd id directly if the ID has been destroyed.
See http://tracker.ceph.com/issues/22642
If I read that bug description correctly, you're confirming why I needed step #6 above (manually adding the OSD auth entry). But even if ceph-volume had added it, the ceph-osd.log entries suggest that starting the OSD would still have failed, because of accessing the wrong osdmap epoch.
To me it seems like I'm hitting a bug outside of ceph-volume - unless it's ceph-volume that somehow determines which osdmap epoch is used by ceph-osd.
Post by Alfredo Deza
In order for this to work, you would need to make sure that the ID has
really been destroyed and avoid passing --osd-id in ceph-volume. The
caveat
being that you will get whatever ID is available next in the cluster.
Yes, that's the work-around I then used - purge the old OSD and create a new one.
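In command form, that workaround is roughly the following (a sketch; /dev/sdzz stands in for the real device, and the new OSD gets whatever id is free next):

--- cut here ---
# purge removes the crush entry, the auth key and the osd id in one go
osd-node # ceph osd purge 999 --yes-i-really-mean-it
osd-node # ceph-volume lvm zap /dev/sdzz
# prepare + activate in one step, without --osd-id
osd-node # ceph-volume lvm create --bluestore --data /dev/sdzz
--- cut here ---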
Thanks & regards,
Jens
Post by Alfredo Deza
Post by Jens-U. Mozdzen
[...]
--- cut here ---
# first of multiple attempts, before "ceph auth add ..."
# no actual epoch referenced, as login failed due to missing auth
2018-01-10 00:00:02.173983 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:00:02.173990 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:00:02.173994 7f5cf1c89d00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:00:02.174046 7f5cf1c89d00 0 osd.999 0 load_pgs
2018-01-10 00:00:02.174051 7f5cf1c89d00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:00:02.174055 7f5cf1c89d00 0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init authentication
failed: (1) Operation not permitted
# after "ceph auth ..."
# note the different epochs below? BTW, 110587 is the current epoch at that
time and osd.999 is marked destroyed there
# 109892: much too old to offer any details
# 110587: modified 2018-01-09 23:43:13.202381
2018-01-10 00:08:00.945507 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:08:00.945514 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:08:00.945521 7fc55905bd00 0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:08:00.945588 7fc55905bd00 0 osd.999 0 load_pgs
2018-01-10 00:08:00.945594 7fc55905bd00 0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:08:00.945599 7fc55905bd00 0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:08:00.946544 7fc55905bd00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:08:00.951720 7fc55905bd00 0 osd.999 0 done with init,
starting boot process
2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for initial
osdmap
2018-01-10 00:08:00.970644 7fc546614700 0 osd.999 109892 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:08:00.970653 7fc546614700 0 osd.999 109892 crush map has
features 288232610642264064 was 288232575208792577, adjusting msgr requires
for mons
2018-01-10 00:08:00.970660 7fc546614700 0 osd.999 109892 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:08:01.349602 7fc546614700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
# another try
# it is now using epoch 110587 for everything. But that one is off by one at
# 110587: modified 2018-01-09 23:43:13.202381
# 110588: modified 2018-01-10 00:12:55.271913
# but both 110587 and 110588 have osd.999 as "destroyed", so never mind.
2018-01-10 00:13:04.332026 7f408d5a4d00 0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:13:04.332037 7f408d5a4d00 0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:13:04.332043 7f408d5a4d00 0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:13:04.332092 7f408d5a4d00 0 osd.999 110587 load_pgs
2018-01-10 00:13:04.332096 7f408d5a4d00 0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:13:04.332100 7f408d5a4d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:13:04.332990 7f408d5a4d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:13:06.026628 7f408d5a4d00 0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:13:06.027627 7f4075352700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
# the attempt after using "ceph osd new", which created epoch 110591 as the
first with osd.999 as autoout,exists,new
# But ceph-osd still uses 110587.
# 110587: modified 2018-01-09 23:43:13.202381
# 110591: modified 2018-01-10 00:30:44.850078
2018-01-10 00:31:15.453871 7f1c57c58d00 0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:31:15.453882 7f1c57c58d00 0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:31:15.453887 7f1c57c58d00 0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:31:15.453940 7f1c57c58d00 0 osd.999 110587 load_pgs
2018-01-10 00:31:15.453945 7f1c57c58d00 0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:31:15.453952 7f1c57c58d00 0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:31:15.454862 7f1c57c58d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:31:15.520533 7f1c57c58d00 0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:31:15.521278 7f1c40207700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
--- cut here ---
[...]
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com