Discussion:
[ceph-users] Ceph mgr Prometheus plugin: error when osd is down
Gökhan Kocak
2018-11-14 15:32:39 UTC
Hello everyone,

We encountered an error with the Prometheus plugin for Ceph mgr: one OSD was down and (therefore) had no device class:
```
sudo ceph osd tree
ID  CLASS WEIGHT    TYPE NAME          STATUS REWEIGHT PRI-AFF
 28   hdd   7.27539             osd.28     up  1.00000 1.00000
  6               0 osd.6                down        0 1.00000

```
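To spot OSDs in this state before Prometheus scrapes the mgr, the tree can be dumped as JSON with `ceph osd tree -f json` and scanned for OSD nodes that have no class. The sketch below assumes each OSD node carries `type` and `name`, plus `device_class` only when a class is assigned. The sample data and the helper `osds_without_class` are hypothetical and only mirror the two rows shown above.

```python
import json

# Hypothetical sample shaped like `ceph osd tree -f json` output.
# Assumption: OSD nodes carry "device_class" only when a class is set.
SAMPLE_TREE = """
{"nodes": [
  {"id": 28, "name": "osd.28", "type": "osd", "device_class": "hdd", "status": "up"},
  {"id": 6,  "name": "osd.6",  "type": "osd", "status": "down"}
]}
"""


def osds_without_class(tree_json):
    """Return the names of OSD nodes that have no (or an empty) device class."""
    tree = json.loads(tree_json)
    return [
        node["name"]
        for node in tree.get("nodes", [])
        if node.get("type") == "osd" and not node.get("device_class")
    ]


if __name__ == "__main__":
    print(osds_without_class(SAMPLE_TREE))  # ['osd.6']
```

On a live cluster the JSON string would come from the command's output instead of `SAMPLE_TREE`.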

When we tried to curl the metrics endpoint, the request failed with an HTTP 500 error because the OSD had no class (see "KeyError: 'class'" below).

Has anybody experienced the same?

Isn't this a bug in the Prometheus plugin? In my opinion, the plugin should not stop working just because an OSD is down.

```
~> curl -v 127.0.0.1:9283/metrics
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 9283 (#0)
> GET /metrics HTTP/1.1
> Host: 127.0.0.1:9283
> User-Agent: curl/7.47.0
> Accept: */*
< HTTP/1.1 500 Internal Server Error
< Date: Wed, 14 Nov 2018 13:59:59 GMT
< Content-Length: 1663
< Content-Type: text/html;charset=utf-8
< Server: CherryPy/3.5.0
<
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>
    <title>500 Internal Server Error</title>
    <style type="text/css">
    #powered_by {
        margin-top: 20px;
        border-top: 2px solid black;
        font-style: italic;
    }

    #traceback {
        color: red;
    }
    </style>
</head>
    <body>
        <h2>500 Internal Server Error</h2>
        <p>The server encountered an unexpected condition which prevented it from fulfilling the request.</p>
        <pre id="traceback">Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670, in respond
    response.body = self.handler()
  File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 217, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/lib/x86_64-linux-gnu/ceph/mgr/prometheus/module.py", line 414, in metrics
    metrics = global_instance().collect()
  File "/usr/lib/x86_64-linux-gnu/ceph/mgr/prometheus/module.py", line 351, in collect
    self.get_metadata_and_osd_status()
  File "/usr/lib/x86_64-linux-gnu/ceph/mgr/prometheus/module.py", line 310, in get_metadata_and_osd_status
    dev_class['class'],
KeyError: 'class'
</pre>
    <div id="powered_by">
      <span>
        Powered by <a href="http://www.cherrypy.org">CherryPy 3.5.0</a>
      </span>
    </div>
    </body>
</html>
* Connection #0 to host 127.0.0.1 left intact
```
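The last frame points at the `dev_class['class']` subscript in `get_metadata_and_osd_status`. A minimal reproduction, assuming the module looks each OSD up in a list of dicts where a class-less OSD simply lacks the `class` key, shows why a plain subscript raises `KeyError` and how a `dict.get` fallback would keep the scrape alive. The device list and the helpers `class_strict` / `class_tolerant` are hypothetical; this is neither the plugin's actual code nor the upstream fix.

```python
# Hypothetical device entries; osd.6 has no "class" key, like the down OSD above.
DEVICES = [
    {"id": 28, "name": "osd.28", "class": "hdd"},
    {"id": 6, "name": "osd.6"},
]


def class_strict(dev):
    # Mirrors the failing pattern: raises KeyError when "class" is missing.
    return dev["class"]


def class_tolerant(dev):
    # Defensive variant: returns an empty label instead of raising.
    return dev.get("class", "")


if __name__ == "__main__":
    for dev in DEVICES:
        try:
            print(dev["name"], class_strict(dev))
        except KeyError as exc:
            print(dev["name"], "KeyError:", exc)  # osd.6 KeyError: 'class'
    print([class_tolerant(d) for d in DEVICES])  # ['hdd', '']
```

An empty label keeps the metric line valid for the other OSDs; whether an empty string or a placeholder such as "unknown" is preferable is a design choice for the exporter.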

Kind regards,

Gökhan
John Spray
2018-11-14 20:03:53 UTC
On Wed, Nov 14, 2018 at 3:32 PM Gökhan Kocak
When we tried to curl the metrics, there was an error because the osd
had no class (see below "KeyError: 'class' ").
I suspect you're running an old release? This bug
(https://tracker.ceph.com/issues/23300) was fixed in 12.2.5.
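Because the reply names 12.2.5 as the fixing release, one way to check whether an installation predates it is to compare the reported version string against 12.2.5. `ceph version` prints the local version, and on Luminous `ceph versions` breaks versions down per daemon type, which matters here because the mgr daemon is the one serving the metrics. The parser below is a sketch assuming the usual `ceph version X.Y.Z (...)` format; the sample strings are illustrative, and treating any later release as fixed is an assumption.

```python
import re

FIXED_IN = (12, 2, 5)  # release named in the reply above


def parse_ceph_version(text):
    """Extract (major, minor, patch) from a 'ceph version X.Y.Z ...' string."""
    match = re.search(r"ceph version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    return tuple(int(part) for part in match.groups())


def has_class_fix(text):
    """True if the reported version is at or beyond FIXED_IN (tuple comparison)."""
    return parse_ceph_version(text) >= FIXED_IN


if __name__ == "__main__":
    # Illustrative strings; the text in parentheses stands in for the build hash.
    for line in ("ceph version 12.2.4 (placeholder) luminous (stable)",
                 "ceph version 12.2.5 (placeholder) luminous (stable)"):
        print(line.split(" (")[0], "->", has_class_fix(line))
```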

John
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Gökhan Kocak
2018-11-15 07:58:46 UTC
True, sorry and many thanks!

Gökhan