Discussion:
[ceph-users] Monitor rename / recreate issue -- probing state
deeepdish
2015-12-10 04:00:45 UTC
Hello,

I encountered a strange issue when rebuilding monitors that reuse the same hostnames but have different IPs.

Steps to reproduce:

- Build a monitor using ceph-deploy mon create <hostname1>
- Remove the monitor following http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/ (remove monitor section). At that point I didn't realize there was a ceph-deploy mon destroy command.
- Build a new monitor on the same hardware using ceph-deploy mon create <hostname1a>, in order to rename the monitor and change its IP as described in the link above.
- The monitor ends up in the probing state. Querying it via the admin socket shows that no peers are available.

This happens only when reinstalling monitors. I even tried reinstalling the OS, but there seems to be a monmap stored somewhere that makes the previous monitor hostnames/IPs conflict with the new monitor's ability to peer.

On a working monitor:

# sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.b02s08.asok mon_status
{
    "name": "b02s08",
    "rank": 0,
    "state": "leader",
    "election_epoch": 2618,
    "quorum": [
        0,
        1,
        2
    ],
    "outside_quorum": [],
    "extra_probe_peers": [
        "10.20.10.14:6789\/0",
        "10.20.10.16:6789\/0"
    ],
    "sync_provider": [],
    "monmap": {
        "epoch": 12,
        "fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
        "modified": "2015-12-09 06:23:43.665100",
        "created": "0.000000",
        "mons": [
            {
                "rank": 0,
                "name": "b02s08",
                "addr": "10.20.1.8:6789\/0"
            },
            {
                "rank": 1,
                "name": "smon01",
                "addr": "10.20.10.251:6789\/0"
            },
            {
                "rank": 2,
                "name": "smon02",
                "addr": "10.20.10.252:6789\/0"
            }
        ]
    }
}


On a reinstalled (not working) monitor:

sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.smg01.asok mon_status
{
    "name": "smg01",
    "rank": 0,
    "state": "probing",
    "election_epoch": 0,
    "quorum": [],
    "outside_quorum": [
        "smg01"
    ],
    "extra_probe_peers": [
        "10.20.1.8:6789\/0",
        "10.20.10.14:6789\/0",
        "10.20.10.16:6789\/0",
        "10.20.10.18:6789\/0",
        "10.20.10.251:6789\/0",
        "10.20.10.252:6789\/0"
    ],
    "sync_provider": [],
    "monmap": {
        "epoch": 0,
        "fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
        "modified": "0.000000",
        "created": "0.000000",
        "mons": [
            {
                "rank": 0,
                "name": "smg01",
                "addr": "10.20.10.250:6789\/0"
            },
            {
                "rank": 1,
                "name": "b02vm14s",
                "addr": "0.0.0.0:0\/1"
            },
            {
                "rank": 2,
                "name": "b02vm16s",
                "addr": "0.0.0.0:0\/2"
            },
            {
                "rank": 3,
                "name": "b02s18s",
                "addr": "0.0.0.0:0\/3"
            },
            {
                "rank": 4,
                "name": "smon01s",
                "addr": "0.0.0.0:0\/4"
            },
            {
                "rank": 5,
                "name": "smon02s",
                "addr": "0.0.0.0:0\/5"
            },
            {
                "rank": 6,
                "name": "b02s08",
                "addr": "0.0.0.0:0\/6"
            }
        ]
    }
}
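To compare the two outputs above mechanically, a minimal Python sketch (assuming Python 3.7+ and the mon_status JSON layout shown in this thread) can pull out the fields that differ: monmap entries whose address is a 0.0.0.0 placeholder, and extra_probe_peers that match no address in the monmap. The helper names fetch_mon_status and diagnose are hypothetical; fetch_mon_status only wraps the admin-socket command already used above.

```python
import json
import subprocess


def fetch_mon_status(asok_path):
    """Run `ceph --admin-daemon <asok_path> mon_status` and parse the JSON."""
    out = subprocess.run(
        ["ceph", "--admin-daemon", asok_path, "mon_status"],
        check=True, capture_output=True, text=True,
    ).stdout
    # json.loads turns the escaped "\/" in addresses back into "/".
    return json.loads(out)


def diagnose(status):
    """Summarise a mon_status dict (hypothetical helper, not a Ceph API).

    placeholder_mons: monmap names whose address is 0.0.0.0, i.e. the
                      monitor knows the name but not where it lives
    unmatched_peers:  extra_probe_peers that are not the address of any
                      monmap entry
    """
    mons = status["monmap"]["mons"]
    placeholder = [m["name"] for m in mons if m["addr"].startswith("0.0.0.0:")]
    known_addrs = {m["addr"] for m in mons}
    unmatched = [p for p in status.get("extra_probe_peers", [])
                 if p not in known_addrs]
    return {
        "state": status["state"],
        "placeholder_mons": placeholder,
        "unmatched_peers": unmatched,
    }


# Example: diagnose(fetch_mon_status("/var/run/ceph/ceph-mon.smg01.asok"))
```

Applied to the reinstalled monitor's output, every monmap entry except smg01 has a placeholder address, and none of the six extra_probe_peers matches a monmap address. Note also that names such as smon01s and smon02s do not match smon01 and smon02 in the working quorum's monmap.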


How can I correct this?

Thanks.
Joao Eduardo Luis
2015-12-10 09:34:58 UTC
Post by deeepdish
Hello,
I encountered a strange issue when rebuilding monitors that reuse the same
hostnames but have different IPs.
- Build a monitor using ceph-deploy mon create <hostname1>
- Remove the monitor following
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/
(remove monitor section). At that point I didn't realize there was a
ceph-deploy mon destroy command.
- Build a new monitor on the same hardware using ceph-deploy mon create
<hostname1a>, in order to rename the monitor and change its IP as
described in the link above.
- The monitor ends up in the probing state. Querying it via the admin
socket shows that no peers are available.
This happens only when reinstalling monitors. I even tried reinstalling
the OS, but there seems to be a monmap stored somewhere that makes the
previous monitor hostnames/IPs conflict with the new monitor's ability
to peer.
sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.smg01.asok mon_status
{
    "name": "smg01",
    "rank": 0,
    "state": "probing",
    "election_epoch": 0,
    "quorum": [],
    "outside_quorum": [
        "smg01"
    ],
    "extra_probe_peers": [
        "10.20.1.8:6789\/0",
        "10.20.10.14:6789\/0",
        "10.20.10.16:6789\/0",
        "10.20.10.18:6789\/0",
        "10.20.10.251:6789\/0",
        "10.20.10.252:6789\/0"
    ],
[snip]
}
This appears to be consistent with a wrongly populated 'mon_host' and
'mon_initial_members' in your ceph.conf.

-Joao
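Following that suggestion, a rough Python sketch for checking ceph.conf: it compares mon_initial_members and mon_host against the monitors you expect (for example, the names and addresses from the working monitor's monmap plus the monitor being added). This is an illustration under stated assumptions, not Ceph's own validation logic: it assumes both options live in [global], that addresses are IPv4 with an optional port, and check_mon_config is a hypothetical name.

```python
import configparser
import re


def _option(section, name):
    # Ceph treats spaces and underscores in option names as equivalent,
    # so accept both "mon_host" and "mon host".
    for key in (name, name.replace("_", " ")):
        if key in section:
            return section[key]
    return ""


def check_mon_config(conf_text, expected_mons):
    """Report mon_initial_members / mon_host entries that match no expected monitor.

    expected_mons maps monitor name -> IPv4 address (assumed input format).
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    glob = parser["global"]  # assumption: settings are in [global]

    members = [m.strip() for m in _option(glob, "mon_initial_members").split(",")
               if m.strip()]
    # mon_host may be comma- or space-separated and may carry ports
    # ("10.20.10.251:6789"); keep only the IPv4 part.
    hosts = [h.split(":")[0]
             for h in re.split(r"[,\s]+", _option(glob, "mon_host")) if h]

    known_ips = set(expected_mons.values())
    return {
        "unknown_members": [m for m in members if m not in expected_mons],
        "unknown_hosts": [h for h in hosts if h not in known_ips],
    }
```

If mon_initial_members still lists names such as b02vm14s or smon01s while the quorum's monmap only knows b02s08, smon01 and smon02, those names show up under unknown_members; stale addresses in mon_host show up under unknown_hosts.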
