OSD Segfaults after Bluestore conversion
Kyle Hutson
2018-02-06 21:53:42 UTC
We had a 26-node production Ceph cluster which we upgraded to Luminous a
little over a month ago. I added a 27th node with Bluestore and didn't have
any issues, so I began converting the others, one at a time. The first two
went pretty smoothly, but the 3rd is doing something strange.

Initially, all the OSDs came up fine, but then some started to segfault.
Out of curiosity more than anything else, I rebooted the server to see if
it would get better or worse, and it pretty much stayed the same: 12 of
the 18 OSDs did not properly come up. Of those, 3 again segfaulted.

I picked one that didn't properly come up and copied the log to where
anybody can view it:
http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log

You can contrast that with one that is up:
http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log

(which is still showing segfaults in the logs, but seems to be recovering
from them OK?)

Any ideas?
Mike O'Connor
2018-02-08 09:02:51 UTC
Post by Kyle Hutson
Any ideas?
Ideas? Yes.

There is a bug that is hitting a small number of systems, and at this
time there is no solution. Details are in the tracker issue at
http://tracker.ceph.com/issues/22102.

Please submit more details of your problem on the ticket.

Mike
