楼锴毅
2018-11-20 03:17:36 UTC
Hello,
Sorry to disturb you, but recently, while running Ceph 12.2.8, I found that the leader monitor keeps crashing in the safe_timer thread (thread_name:safe_timer).
Here is part of the log:
0> 2018-11-20 10:33:22.386543 7faf7d84f700 -1 *** Caught signal (Aborted) **
in thread 7faf7d84f700 thread_name:safe_timer
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
1: (()+0x93f2d1) [0x55ef7319c2d1]
2: (()+0xf5e0) [0x7faf83fb55e0]
3: (gsignal()+0x37) [0x7faf810ee1f7]
4: (abort()+0x148) [0x7faf810ef8e8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7faf819f4ac5]
6: (()+0x5ea36) [0x7faf819f2a36]
7: (()+0x5ea63) [0x7faf819f2a63]
8: (()+0x5ec83) [0x7faf819f2c83]
9: (std::__throw_out_of_range(char const*)+0x77) [0x7faf81a47a97]
10: (FSMap::get_info_gid(mds_gid_t) const+0xfc) [0x55ef72e1dc0c]
11: (MDSMonitor::tick()+0x427) [0x55ef72e107d7]
12: (Monitor::tick()+0x128) [0x55ef72c48908]
13: (C_MonContext::finish(int)+0x37) [0x55ef72c1a7d7]
14: (Context::complete(int)+0x9) [0x55ef72c585c9]
15: (SafeTimer::timer_thread()+0x104) [0x55ef72e8dbc4]
16: (SafeTimerThread::entry()+0xd) [0x55ef72e8f5ed]
17: (()+0x7e25) [0x7faf83fade25]
18: (clone()+0x6d) [0x7faf811b134d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
And here is my cluster's status:
cluster:
id: 8c9bc910-c7f1-4b98-8c61-e18ee786e983
health: HEALTH_OK
services:
mon: 2 daemons, quorum qbs-monitor-online010-hbaz1.qiyi.virtual,qbs-monitor-online009-hbaz1.qiyi.virtual
mgr: qbs-monitor-online009-hbaz1(active, starting)
osd: 164 osds: 164 up, 164 in
rgw: 3 daemons active
data:
pools: 26 pools, 4832 pgs
objects: 5.39k objects, 20.0GiB
usage: 243GiB used, 1.07PiB / 1.07PiB avail
pgs: 4832 active+clean
io:
client: 4.63KiB/s wr, 0op/s rd, 0op/s wr
What can I do to recover it? I am happy to provide more information about the problem if necessary.
Sincerely,
LouKaiyi