deeepdish
2015-12-10 04:00:45 UTC
Hello,
I encountered a strange issue when rebuilding monitors that reuse the same hostnames but have different IPs.
Steps to reproduce:
- Build a monitor using ceph-deploy mon create <hostname1>
- Remove the monitor following http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/ (the "remove monitor" procedure). I didn't realize there was a ceph-deploy mon destroy command at this point.
- Build a new monitor on the same hardware using ceph-deploy mon create <hostname1a> (reason: to rename the monitor / change its IP, as per the link above)
- The monitor ends up in probing mode. When connecting via the admin socket, I see that there are no peers available.
This behavior occurs only when reinstalling monitors. I even tried reinstalling the OS; however, there's a monmap embedded somewhere that causes the previous monitor hostnames / IPs to conflict with the new monitor's peering ability.
On a working monitor:
# sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.b02s08.asok mon_status
{
  "name": "b02s08",
  "rank": 0,
  "state": "leader",
  "election_epoch": 2618,
  "quorum": [
    0,
    1,
    2
  ],
  "outside_quorum": [],
  "extra_probe_peers": [
    "10.20.10.14:6789\/0",
    "10.20.10.16:6789\/0"
  ],
  "sync_provider": [],
  "monmap": {
    "epoch": 12,
    "fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
    "modified": "2015-12-09 06:23:43.665100",
    "created": "0.000000",
    "mons": [
      {
        "rank": 0,
        "name": "b02s08",
        "addr": "10.20.1.8:6789\/0"
      },
      {
        "rank": 1,
        "name": "smon01",
        "addr": "10.20.10.251:6789\/0"
      },
      {
        "rank": 2,
        "name": "smon02",
        "addr": "10.20.10.252:6789\/0"
      }
    ]
  }
}
[***@b02s08 ~]#
On a reinstalled (not working) monitor:
sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.smg01.asok mon_status
{
  "name": "smg01",
  "rank": 0,
  "state": "probing",
  "election_epoch": 0,
  "quorum": [],
  "outside_quorum": [
    "smg01"
  ],
  "extra_probe_peers": [
    "10.20.1.8:6789\/0",
    "10.20.10.14:6789\/0",
    "10.20.10.16:6789\/0",
    "10.20.10.18:6789\/0",
    "10.20.10.251:6789\/0",
    "10.20.10.252:6789\/0"
  ],
  "sync_provider": [],
  "monmap": {
    "epoch": 0,
    "fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
    "modified": "0.000000",
    "created": "0.000000",
    "mons": [
      {
        "rank": 0,
        "name": "smg01",
        "addr": "10.20.10.250:6789\/0"
      },
      {
        "rank": 1,
        "name": "b02vm14s",
        "addr": "0.0.0.0:0\/1"
      },
      {
        "rank": 2,
        "name": "b02vm16s",
        "addr": "0.0.0.0:0\/2"
      },
      {
        "rank": 3,
        "name": "b02s18s",
        "addr": "0.0.0.0:0\/3"
      },
      {
        "rank": 4,
        "name": "smon01s",
        "addr": "0.0.0.0:0\/4"
      },
      {
        "rank": 5,
        "name": "smon02s",
        "addr": "0.0.0.0:0\/5"
      },
      {
        "rank": 6,
        "name": "b02s08",
        "addr": "0.0.0.0:0\/6"
      }
    ]
  }
}
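The visible differences between the two outputs (monmap epoch 0, an empty quorum, and peers listed with a 0.0.0.0:0 address) can be picked out mechanically. Below is a minimal Python sketch of that comparison, not a diagnosis of the cause. The function name summarize_mon_status is hypothetical and not part of Ceph, and the sample JSON is a trimmed excerpt of the non-working monitor's output above (only three of its seven monmap entries are kept).

```python
import json


def summarize_mon_status(status):
    """Summarize the fields of a parsed mon_status dict that differ above.

    Hypothetical helper for illustration; it only reads keys that appear
    in the mon_status output shown in this post.
    """
    monmap = status["monmap"]
    # Peers whose address is the 0.0.0.0:0 placeholder seen in the broken map.
    placeholder_peers = [
        mon["name"]
        for mon in monmap["mons"]
        if mon["addr"].startswith("0.0.0.0:0")
    ]
    return {
        "name": status["name"],
        "state": status["state"],
        "monmap_epoch": monmap["epoch"],
        # "quorum" holds ranks, so check this monitor's own rank.
        "in_quorum": status["rank"] in status["quorum"],
        "placeholder_peers": placeholder_peers,
    }


# Trimmed excerpt of the non-working monitor's mon_status output.
# Raw string so the JSON "\/" escapes reach the parser unchanged.
BROKEN = r"""
{
  "name": "smg01",
  "rank": 0,
  "state": "probing",
  "quorum": [],
  "monmap": {
    "epoch": 0,
    "mons": [
      {"rank": 0, "name": "smg01", "addr": "10.20.10.250:6789\/0"},
      {"rank": 1, "name": "b02vm14s", "addr": "0.0.0.0:0\/1"},
      {"rank": 6, "name": "b02s08", "addr": "0.0.0.0:0\/6"}
    ]
  }
}
"""

summary = summarize_mon_status(json.loads(BROKEN))
print(json.dumps(summary, indent=2))
```

To run it against a live monitor, the output of the mon_status command shown above could be saved to a file and loaded with json.load instead of the embedded excerpt.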
How can I correct this?
Thanks.