Discussion:
[ceph-users] Upgrade to Infernalis: failed to pick suitable auth object
Kees Meijs
2018-08-17 14:53:15 UTC
Hi Cephers,

For the last few months (well... years, actually) we have been quite
happy running Hammer. Until now there was no immediate reason to upgrade.

However, with Luminous providing support for BlueStore, it seemed like a
good idea to start working towards an upgrade.

Taking baby steps, I wanted to upgrade from Hammer to Infernalis first,
since all file ownerships need to be changed because the daemons now run
as an unprivileged user (good stuff!) instead of root.

So far, I've upgraded all monitors from Hammer (0.94.10) to Infernalis
(9.2.1). All seemed well resulting in HEALTH_OK.

Then, I tried upgrading one OSD server using the following procedure:

1. Alter the APT sources to utilise Infernalis instead of Hammer.
2. Update and upgrade the packages.
3. Since I didn't want any rebalancing going on, run "ceph osd set noout".
4. Stop an OSD, chown ceph:ceph -R /var/lib/ceph/osd/ceph-X, start the
   OSD again, and so on for the next one (roughly as sketched below).
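
Per OSD that boiled down to something like this (just a sketch; OSD id 12
and the upstart-style commands are only an example, use whatever your
init system provides):

stop ceph-osd id=12
chown -R ceph:ceph /var/lib/ceph/osd/ceph-12
start ceph-osd id=12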

Maybe I acted too quickly (ehrm... didn't wait long enough), but at some
point it seemed not all ownership had been changed during the process.
Meanwhile we were still HEALTH_OK, so I didn't really worry and fixed the
left-overs using:

find /var/lib/ceph -not -user ceph -exec chown ceph:ceph '{}' ';'

It seemed to work well and two days passed without any issues. But then
the cluster reported:

     health HEALTH_ERR
            1 pgs inconsistent
            2 scrub errors
So far, I have figured out that both scrub errors apply to the same OSD:
osd.0. Its cluster log shows:
2018-08-17 15:25:36.810866 7fa3c9e09700  0 log_channel(cluster) log
[INF] : 3.72 deep-scrub starts
2018-08-17 15:25:37.221562 7fa3c7604700 -1 log_channel(cluster) log
[ERR] : 3.72 soid -5/00000072/temp_3.72_0_16187756_3476/head: failed
to pick suitable auth object
2018-08-17 15:25:37.221566 7fa3c7604700 -1 log_channel(cluster) log
[ERR] : 3.72 soid -5/00000072/temp_3.72_0_16195026_251/head: failed to
pick suitable auth object
2018-08-17 15:46:36.257994 7fa3c7604700 -1 log_channel(cluster) log
[ERR] : 3.72 deep-scrub 2 errors
The situation seems similar to http://tracker.ceph.com/issues/13862 but
so far I'm unable to repair the placement group.

Meanwhile I'm forcing a deep scrub of all placement groups that involve
osd.0, hoping PG 3.72 turns out to be the only one with errors.
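
Roughly like this (just a sketch; "ceph pg ls-by-osd" should be available
on Hammer and Infernalis, otherwise grepping "ceph pg dump" works too):

# deep-scrub every PG that has osd.0 in its acting set
ceph pg ls-by-osd 0 | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {print $1}' | \
    while read pg; do ceph pg deep-scrub "$pg"; done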

While waiting for deep scrubbing to finish, it seemed like a good idea to
ask for your help.

What's the best approach at this point?
ceph health detail
HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
pg 3.72 is active+clean+inconsistent, acting [0,33,39]
2 scrub errors
OSDs 33 and 39 are untouched (still running 0.94.10) and seem fine
without errors.

Thanks in advance for any comments or thoughts.

Regards and enjoy your weekend!
Kees
--
https://nefos.nl/contact

Nefos IT bv
Ambachtsweg 25 (industrienummer 4217)
5627 BZ Eindhoven
Nederland

KvK 66494931

/Available on Monday, Tuesday, Wednesday and Friday/
David Turner
2018-08-17 15:21:57 UTC
In your baby step upgrade you should avoid the 2 non-LTS releases of
Infernalis and Kraken. You should go from Hammer to Jewel to Luminous.

The general rule for the upgrade that changes your OSDs to be owned by
the ceph user was to not change the ownership as part of the upgrade
itself. There is a [1] config option that tells Ceph which user the
daemons run as, so that you can separate these two operations from each
other, simplifying each maintenance task. It makes each daemon run as
whatever user owns that daemon's data directory.

[1]
setuser match path = /var/lib/ceph/$type/$cluster-$id
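
For clarity, that line just goes into ceph.conf on every host, e.g. in
the [global] section; a minimal sketch:

[global]
    setuser match path = /var/lib/ceph/$type/$cluster-$id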
Kees Meijs
2018-08-18 08:52:53 UTC
Hi David,

Thank you for pointing out the option.

On http://docs.ceph.com/docs/infernalis/release-notes/ one can read:

  * Ceph daemons now run as user and group ceph by default. The ceph
    user has a static UID assigned by Fedora and Debian (also used by
    derivative distributions like RHEL/CentOS and Ubuntu). On SUSE the
    ceph user will currently get a dynamically assigned UID when the
    user is created.

    If your systems already have a ceph user, upgrading the package will
    cause problems. We suggest you first remove or rename the existing
    ‘ceph’ user before upgrading.

    When upgrading, administrators have two options:

    1. Add the following line to ceph.conf on all hosts:

         setuser match path = /var/lib/ceph/$type/$cluster-$id

       This will make the Ceph daemons run as root (i.e., not drop
       privileges and switch to user ceph) if the daemon’s data
       directory is still owned by root. Newly deployed daemons will
       be created with data owned by user ceph and will run with
       reduced privileges, but upgraded daemons will continue to run
       as root.

    2. Fix the data ownership during the upgrade. This is the
       preferred option, but is more work. The process for each host
       would be to:

       1. Upgrade the ceph package. This creates the ceph user and
          group. For example:

            ceph-deploy install --stable infernalis HOST

       2. Stop the daemon(s):

            service ceph stop    # fedora, centos, rhel, debian
            stop ceph-all        # ubuntu

       3. Fix the ownership:

            chown -R ceph:ceph /var/lib/ceph

       4. Restart the daemon(s):

            start ceph-all               # ubuntu
            systemctl start ceph.target  # debian, centos, fedora, rhel

Since it seemed more elegant to me, I chose the second option and
followed the steps.

To be continued... Overnight, some more placement groups seem to have
become inconsistent. I'll post my findings later on.

Regards,
Kees
David Turner
2018-08-18 15:11:29 UTC
The reason to separate the items is to make one change at a time so you
know what might have caused your problems. Good luck.
Kees Meijs
2018-08-18 15:43:39 UTC
Hi again,

After listing all placement groups the problematic OSD (osd.0) is part
of, I forced a deep scrub for all of them.

A few hours later (and after some other deep scrubbing as well) the
result is:

HEALTH_ERR 8 pgs inconsistent; 14 scrub errors
pg 3.6c is active+clean+inconsistent, acting [14,2,38]
pg 3.32 is active+clean+inconsistent, acting [0,11,33]
pg 3.13 is active+clean+inconsistent, acting [8,34,9]
pg 3.30 is active+clean+inconsistent, acting [14,35,26]
pg 3.31 is active+clean+inconsistent, acting [44,35,26]
pg 3.7d is active+clean+inconsistent, acting [46,37,35]
pg 3.70 is active+clean+inconsistent, acting [0,36,11]
pg 3.72 is active+clean+inconsistent, acting [0,33,39]
14 scrub errors
OSDs (in order) 0, 8, 14 and 46 all reside on the same server; obviously
the one upgraded to Infernalis.

It would make sense that I acted too quickly on one OSD (i.e. fixed the
ownership while it was maybe still running), perhaps two, but certainly
not on all of them.

Although it's very likely it wouldn't make a difference, I'll try a ceph
pg repair for each PG.
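
Something along these lines (a sketch; it just pulls the inconsistent PG
ids out of "ceph health detail"):

# ask for a repair of every PG currently flagged inconsistent
ceph health detail | awk '/^pg .*inconsistent/ {print $2}' | \
    while read pg; do ceph pg repair "$pg"; done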

To be continued again!

Regards,
Kees
David Turner
2018-08-18 15:51:25 UTC
You can't change the file ownership while the OSDs are still running. The
instructions you pasted earlier said to stop the OSD and then run the
chown. What do the logs of the primary OSDs say about the PGs that are
inconsistent?
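
Something like this should pull out the interesting lines (assuming the
default log location; osd.0 just as an example):

grep -E '\[ERR\]|FAILED assert' /var/log/ceph/ceph-osd.0.log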
Kees Meijs
2018-08-19 06:55:45 UTC
Good morning,

The repairs didn't actually fix anything; the OSD logs show:
2018-08-18 17:45:08.927387 7fa3cbe0d700  0 log_channel(cluster) log
[INF] : 3.32 repair starts
2018-08-18 17:45:12.350343 7fa3c9608700 -1 log_channel(cluster) log
[ERR] : 3.32 soid -5/00000032/temp_3.32_0_16187756_293/head: failed to
pick suitable auth object
2018-08-18 18:07:43.908310 7fa3c9608700 -1 log_channel(cluster) log
[ERR] : 3.32 repair 1 errors, 0 fixed
2018-08-18 18:27:48.141634 7fa3c8606700  0 log_channel(cluster) log
[INF] : 3.70 repair starts
2018-08-18 18:27:49.073504 7fa3c8606700 -1 log_channel(cluster) log
[ERR] : 3.70 soid -5/00000070/temp_3.70_0_16187756_4006/head: failed
to pick suitable auth object
2018-08-18 18:51:57.393099 7fa3cae0b700 -1 log_channel(cluster) log
[ERR] : 3.70 repair 1 errors, 0 fixed
2018-08-18 19:21:20.456610 7fa3c7604700  0 log_channel(cluster) log
[INF] : 3.72 repair starts
2018-08-18 19:21:21.303999 7fa3c9e09700 -1 log_channel(cluster) log
[ERR] : 3.72 soid -5/00000072/temp_3.72_0_16187756_3476/head: failed
to pick suitable auth object
2018-08-18 19:21:21.304051 7fa3c9e09700 -1 log_channel(cluster) log
[ERR] : 3.72 soid -5/00000072/temp_3.72_0_16187756_5344/head: failed
to pick suitable auth object
2018-08-18 19:21:21.304077 7fa3c9e09700 -1 log_channel(cluster) log
[ERR] : 3.72 soid -5/00000072/temp_3.72_0_16195026_251/head: failed to
pick suitable auth object
2018-08-18 19:48:00.016879 7fa3c9e09700 -1 log_channel(cluster) log
[ERR] : 3.72 repair 3 errors, 0 fixed
2018-08-18 17:45:08.807173 7f047f9a2700  0 log_channel(cluster) log
[INF] : 3.13 repair starts
2018-08-18 17:45:10.669835 7f04821a7700 -1 log_channel(cluster) log
[ERR] : 3.13 soid -5/00000013/temp_3.13_0_16175425_287/head: failed to
pick suitable auth object
2018-08-18 18:05:28.966015 7f04795c7700  0 -- 10.128.4.3:6816/5641 >>
10.128.4.4:6800/3454 pipe(0x564161026000 sd=59 :46182 s=2 pgs=11994
cs=31 l=0 c=0x56415b4fc2c0).fault with nothing to send, going to standby
2018-08-18 18:09:46.667875 7f047f9a2700 -1 log_channel(cluster) log
[ERR] : 3.13 repair 1 errors, 0 fixed
2018-08-18 17:45:00.099722 7f1e4f857700  0 log_channel(cluster) log
[INF] : 3.6c repair starts
2018-08-18 17:45:01.982007 7f1e4f857700 -1 log_channel(cluster) log
[ERR] : 3.6c soid -5/0000006c/temp_3.6c_0_16187760_5765/head: failed
to pick suitable auth object
2018-08-18 17:45:01.982042 7f1e4f857700 -1 log_channel(cluster) log
[ERR] : 3.6c soid -5/0000006c/temp_3.6c_0_16187760_796/head: failed to
pick suitable auth object
2018-08-18 18:07:33.490940 7f1e4f857700 -1 log_channel(cluster) log
[ERR] : 3.6c repair 2 errors, 0 fixed
2018-08-18 18:29:24.339018 7f1e4d052700  0 log_channel(cluster) log
[INF] : 3.30 repair starts
2018-08-18 18:29:25.689341 7f1e4f857700 -1 log_channel(cluster) log
[ERR] : 3.30 soid -5/00000030/temp_3.30_0_16187760_3742/head: failed
to pick suitable auth object
2018-08-18 18:29:25.689346 7f1e4f857700 -1 log_channel(cluster) log
[ERR] : 3.30 soid -5/00000030/temp_3.30_0_16187760_3948/head: failed
to pick suitable auth object
2018-08-18 18:54:59.123152 7f1e4f857700 -1 log_channel(cluster) log
[ERR] : 3.30 repair 2 errors, 0 fixed
2018-08-18 18:05:27.421858 7efc52942700  0 log_channel(cluster) log
[INF] : 3.7d repair starts
2018-08-18 18:05:29.511779 7efc5013d700 -1 log_channel(cluster) log
[ERR] : 3.7d soid -5/0000007d/temp_3.7d_0_16204674_4402/head: failed
to pick suitable auth object
2018-08-18 18:29:23.159691 7efc52942700 -1 log_channel(cluster) log
[ERR] : 3.7d repair 1 errors, 0 fixed
I'll investigate further.

Regards,
Kees
Kees Meijs
2018-08-20 09:51:51 UTC
Hi again,

Overnight some other PGs turned out to be inconsistent as well after
being deep scrubbed, all with the same error:

failed to pick suitable auth object

Since there's "temp" in the object names and we're running a 3-replica
cluster, I'm thinking of just reboiling the compromised OSDs.

Any thoughts on this; can I do this safely? The cluster currently reports:

12 active+clean+inconsistent
Nota bene: file ownership cannot be the real culprit here. As I mentioned
earlier in this thread, it might have been an issue for one or maybe two
OSDs, but definitely not for all of them.

Regards,
Kees
Kees Meijs
2018-08-20 09:55:01 UTC
Ehrm, that should of course be rebuilding. (I.e. removing the OSD,
reformat, re-add.)
Post by Kees Meijs
Since there's "temp" in the object names and we're running a 3-replica
cluster, I'm thinking of just reboiling the compromised OSDs.
David Turner
2018-08-20 10:04:41 UTC
My suggestion would be to remove the OSDs and let the cluster recover
from all of the other copies. I would then deploy the node back on Hammer
instead of Infernalis. Either that, or remove these OSDs, let the cluster
backfill, and then upgrade to Jewel, then Luminous, and maybe Mimic if
you're planning on making it to the newest LTS before adding the node
back in. That way you could add them back in as BlueStore (on either
Luminous or Mimic) if that's part of your plan.
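
Per OSD that would be roughly the following (a sketch, with osd.0 just as
an example; wait for backfill to finish and the cluster to go healthy
before the final removal steps):

ceph osd out 0               # cluster starts backfilling away from it
stop ceph-osd id=0           # or your init system's equivalent
# once backfill is done and the cluster is healthy again:
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0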
Kees Meijs
2018-08-20 10:12:11 UTC
Hi David,

Thanks for your advice. My end goal is BlueStore so to upgrade to Jewel
and then Luminous would be ideal.

Currently all monitors are (successfully) running Infernalis, one OSD
node is running Infernalis and all other OSD nodes are on Hammer.

I'll try freeing up one Infernalis OSD first and see what happens. If
that goes well, I'll just (for now) give up all the OSDs on the given
node, ending up with Hammer OSDs only and Infernalis monitors.

To be continued again!

Regards,
Kees
Kees Meijs
2018-08-20 11:14:14 UTC
Bad news: I've got a PG stuck in down+peering now.

Please advise.

K.
Kees Meijs
2018-08-20 11:23:58 UTC
The given PG is back online, phew... In the OSD logs I found:
2018-08-20 13:06:33.819569 7f8962b2f700 -1 osd/ReplicatedPG.cc: In
function 'void ReplicatedPG::scan_range(int, int,
PG::BackfillInterval*, ThreadPool::TPHandle&)' thread 7f8962b2f700
time 2018-08-20 13:06:33.709922
osd/ReplicatedPG.cc: 10115: FAILED assert(r >= 0)
Restarting the OSDs seems to work.

K.
Kees Meijs
2018-08-20 19:46:28 UTC
Hi again,

I'm starting to feel really unlucky here... The PG states currently look
like this:

                1387 active+clean
                  11 active+clean+inconsistent
                   7 active+recovery_wait+degraded
                   1 active+recovery_wait+undersized+degraded+remapped
                   1 active+undersized+degraded+remapped+wait_backfill
                   1 active+undersized+degraded+remapped+inconsistent+backfilling
To ensure nothing is in the way, I disabled both scrubbing and deep
scrubbing for the time being.
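
For the record, that is just:

ceph osd set noscrub
ceph osd set nodeep-scrub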

However, random OSDs (still on Hammer) keep crashing with the error
mentioned earlier (osd/ReplicatedPG.cc: 10115: FAILED assert(r >= 0)).

It felt like they started crashing when hitting the PG currently
backfilling, so I set the nobackfill flag.

For now the crashing seems to have stopped. However, the cluster is slow
at the moment when trying to access the given PG via KVM/QEMU (RBD).

Recap:

* All monitors run Infernalis.
* One OSD node runs Infernalis.
* All other OSD nodes run Hammer.
* One OSD on Infernalis is set to "out" and is stopped. This OSD
seemed to contain one inconsistent PG.
* Backfilling started.
* After hours and hours of backfilling, OSDs started to crash.

Other than restarting the "out" and stopped OSD for the time being
(haven't tried that yet) I'm quite lost.

Hopefully someone has some pointers for me.

Regards,
Kees
Kees Meijs
2018-08-21 01:45:33 UTC
Hi there,

A few hours ago I started the given OSD again and gave it weight
1.00000. Backfilling started and more PGs became active+clean.

After a while the same crashing behaviour started acting up again, so I
stopped the backfilling.

Running with the noout, nobackfill, norebalance, noscrub and
nodeep-scrub flags now, but at least the cluster seems stable (fingers
crossed...).

Possible plan of attack:

1. Stopping all Infernalis OSDs.
2. Remove Ceph Infernalis packages from OSD node.
3. Install Hammer packages.
4. Start the OSDs (or maybe the package installation does this already.)

Effectively this is an OSD downgrade. Is this supported or does Ceph
"upgrade" data structures on disk as well?

Recap: this would imply going from Infernalis back to Hammer.

Any thoughts are more than welcome (maybe a completely different
approach makes sense...) Meanwhile, I'll try to catch some sleep.

Thanks, thanks!

Best regards,
Kees
David Turner
2018-08-21 14:34:54 UTC
Ceph does not support downgrading OSDs. When you removed the single OSD,
it was probably trying to move data onto the other OSDs in the node with
Infernalis OSDs. I would recommend stopping every OSD in that node and
marking them out so the cluster will rebalance without them. Assuming your
cluster is able to get healthy after that, we'll see where things are.

Also, please stop opening so many email threads about this same issue. It
makes tracking this in the archives impossible.
Kees Meijs
2018-08-21 17:08:02 UTC
Hello David,

Thank you and I'm terribly sorry; I was unaware I was starting new threads.

Off the top of my head I'd say "yes, it'll fit", but obviously I'll make
sure first.

Regards,
Kees
Paul Emmerich
2018-08-21 17:13:08 UTC
I would continue with the upgrade of all OSDs in this scenario, as it is
the old ones that are crashing, not the new ones.
Maybe with all the flags set (pause, norecover, ...).
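
I.e., something like (just a sketch):

ceph osd set pause
ceph osd set norecover
ceph osd set nobackfill
ceph osd set norebalance
# and "ceph osd unset ..." again afterwards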


Paul
--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
David Turner
2018-08-21 17:48:32 UTC
The problem with the current OSDs was a poorly advised chown of the OSD
data store. From what I've pieced together, the chown was run against a
running OSD.
Kees Meijs
2018-09-10 14:43:16 UTC
Hi list,

A little update: meanwhile we added a new node consisting of Hammer OSDs
to ensure sufficient cluster capacity.

The upgraded node with Infernalis OSDs has been completely removed from
the CRUSH map and its OSDs have been removed as well (obviously we
didn't wipe the disks yet).

At the moment we're still running with the noout, nobackfill, noscrub
and nodeep-scrub flags set. Although only Hammer OSDs remain, we still
experience OSD crashes on backfilling, so we're unable to reach
HEALTH_OK.

Using debug level 20 we're (well, mostly my coworker Willem Jan is)
trying to figure out exactly why the crashes happen. Hopefully we'll get
to the bottom of it.
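
(Roughly like this, on the crashing OSDs; injectargs for a running
daemon, or the same settings in ceph.conf for the next restart. osd.12
is just an example.)

ceph tell osd.12 injectargs '--debug-osd 20 --debug-filestore 20'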

To be continued...

Regards,
Kees
Kees Meijs
2018-11-12 07:50:13 UTC
Hi list,

Between crashes we were able to let the cluster backfill as much as
possible (all monitors on Infernalis, all OSDs back on Hammer). On disk
we kept finding leftover temp objects like these:
8.0M -rw-r--r-- 1 root root 8.0M Aug 24 23:56
temp\u3.bd\u0\u16175417\u2718__head_000000BD__fffffffffffffffb
8.0M -rw-r--r-- 1 root root 8.0M Aug 28 05:51
temp\u3.bd\u0\u16175417\u3992__head_000000BD__fffffffffffffffb
8.0M -rw-r--r-- 1 root root 8.0M Aug 30 03:40
temp\u3.bd\u0\u16175417\u4521__head_000000BD__fffffffffffffffb
8.0M -rw-r--r-- 1 root root 8.0M Aug 31 03:46
temp\u3.bd\u0\u16175417\u4817__head_000000BD__fffffffffffffffb
8.0M -rw-r--r-- 1 root root 8.0M Sep  5 19:44
temp\u3.bd\u0\u16175417\u6252__head_000000BD__fffffffffffffffb
8.0M -rw-r--r-- 1 root root 8.0M Sep  6 14:44
temp\u3.bd\u0\u16175417\u6593__head_000000BD__fffffffffffffffb
8.0M -rw-r--r-- 1 root root 8.0M Sep  7 10:21
temp\u3.bd\u0\u16175417\u6870__head_000000BD__fffffffffffffffb
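
(For the record, something like this lists those leftovers; the
FileStore path is just the default layout.)

find /var/lib/ceph/osd/ceph-*/current -name 'temp*' -ls
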
Restarting the given OSD didn't seem necessary; backfilling started to
work and at some point enough replicas were available for each PG.

Finally deep scrubbing repaired the inconsistent PGs automagically and
we arrived at HEALTH_OK again!

Case closed: up to Jewel.

For everyone involved: a big, big and even bigger thank you for all
pointers and support!

Regards,
Kees