Olivier Bonvalet
2018-06-05 07:25:49 UTC
Hi,
I have a cluster in "stale" state: a lot of RBD volumes have been blocked for
~10 hours. In the status I see PGs in a stale or down state, but those PGs
don't seem to exist anymore:
root! stor00-sbg:~# ceph health detail | egrep '(stale|down)'
HEALTH_ERR noout,noscrub,nodeep-scrub flag(s) set; 1 nearfull osd(s); 16 pool(s) nearfull; 4645278/103969515 objects misplaced (4.468%); Reduced data availability: 643 pgs inactive, 12 pgs down, 2 pgs peering, 3 pgs stale; Degraded data redundancy: 2723173/103969515 objects degraded (2.619%), 387 pgs degraded, 297 pgs undersized; 229 slow requests are blocked > 32 sec; 4074 stuck requests are blocked > 4096 sec; too many PGs per OSD (202 > max 200); mons hyp01-sbg,hyp02-sbg,hyp03-sbg are using a lot of disk space
PG_AVAILABILITY Reduced data availability: 643 pgs inactive, 12 pgs down, 2 pgs peering, 3 pgs stale
pg 31.8b is down, acting [2147483647,16,36]
pg 31.8e is down, acting [2147483647,29,19]
pg 46.b8 is down, acting [2147483647,2147483647,13,17,47,28]
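(Side note: 2147483647 is 0x7fffffff, the CRUSH_ITEM_NONE placeholder, so it marks an acting-set slot that CRUSH could not fill with any OSD, rather than a real OSD id.)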
root! stor00-sbg:~# ceph pg 31.8b query
Error ENOENT: i don't have pgid 31.8b
root! stor00-sbg:~# ceph pg 31.8e query
Error ENOENT: i don't have pgid 31.8e
root! stor00-sbg:~# ceph pg 46.b8 query
Error ENOENT: i don't have pgid 46.b8
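The ENOENT from "ceph pg query" probably just means there is no primary OSD to forward the query to (slot 0 of the acting set is CRUSH_ITEM_NONE above), not that the PGs are really gone. For what it's worth, the mon-side view of those PGs can still be cross-checked without going through a primary; a minimal sketch with the stock ceph CLI (pgids taken from the health output above, output omitted here):
root! stor00-sbg:~# ceph pg map 31.8b            # where the mons currently map this PG (up/acting sets)
root! stor00-sbg:~# ceph pg dump_stuck stale     # list PGs stuck in stale
root! stor00-sbg:~# ceph pg dump_stuck inactive  # list PGs stuck inactive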
We just lost an HDD, and marked the corresponding OSD as "lost".
Any idea what I should do?
Thanks,
Olivier