Discussion:
[ceph-users] slow ops after cephfs snapshot removal
Kenneth Waegeman
2018-11-09 10:23:53 UTC
Permalink
Hi all,

On Mimic 13.2.1, we are seeing blocked ops on CephFS after removing some
snapshots:

[***@osd001 ~]# ceph -s
  cluster:
    id:     92bfcf0a-1d39-43b3-b60f-44f01b630e47
    health: HEALTH_WARN
            5 slow ops, oldest one blocked for 1162 sec, mon.mds03 has
slow ops

  services:
    mon: 3 daemons, quorum mds01,mds02,mds03
    mgr: mds02(active), standbys: mds03, mds01
    mds: ceph_fs-2/2/2 up  {0=mds03=up:active,1=mds01=up:active}, 1
up:standby
    osd: 544 osds: 544 up, 544 in

  io:
    client:   5.4 KiB/s wr, 0 op/s rd, 0 op/s wr

[***@osd001 ~]# ceph health detail
HEALTH_WARN 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has
slow ops
SLOW_OPS 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has slow ops

[***@osd001 ~]# ceph -v
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic
(stable)

Is this a known issue?

Cheers,

Kenneth
Gregory Farnum
2018-11-09 21:38:12 UTC
Permalink
On Fri, Nov 9, 2018 at 2:24 AM Kenneth Waegeman <***@ugent.be>
wrote:

> Is this a known issue?

It's not exactly a known issue, but from the output and story you've got
here it looks like the OSDs may be trimming the snapshot data so
aggressively that the MDS isn't getting replies back quickly enough. Or
maybe you have an overlarge CephFS directory which is taking a long time
to clean up somehow; you should dump the in-flight MDS ops and the MDS's
Objecter ops and see what specifically is taking so long.
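
Something like this against the admin socket should show both (using the
active MDS names from your status output; adjust to taste):

# ops the MDS itself is currently processing
ceph daemon mds.mds03 dump_ops_in_flight
# requests the MDS's Objecter has outstanding against the OSDs
ceph daemon mds.mds03 objecter_requests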
-Greg


Chris Taylor
2018-11-09 21:54:53 UTC
Permalink
> On Nov 9, 2018, at 1:38 PM, Gregory Farnum <***@redhat.com> wrote:
>
>> Is this a known issue?
>
> It's not exactly a known issue, but from the output and story you've got here it looks like the OSDs may be trimming the snapshot data so aggressively that the MDS isn't getting replies back quickly enough.
> -Greg

We had a similar issue on Ceph 10.2 with RBD images. It was fixed by slowing down snapshot removal, by adding this to ceph.conf:

[osd]
osd snap trim sleep = 0.6
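
If restarting the OSDs is inconvenient, the same throttle can usually be
injected at runtime as well (assuming the option name osd_snap_trim_sleep,
which is what the line above sets):

ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.6'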



Cranage, Steve
2018-11-11 00:15:18 UTC
Permalink
Can anyone tell me the secret to subscribing to ceph-devel? A colleague tried and failed many times, so I tried and got this:


Steve Cranage
>>>> --_000_SN4PR0701MB3792CB55C8AA7468ADE7FC4DB2C00SN4PR0701MB3792_
**** Command '--_000_sn4pr0701mb3792cb55c8aa7468ade7fc4db2c00sn4pr0701mb3792_' not recognized.
>>>> Content-Type: text/plain; charset="us-ascii"
**** Command 'content-type:' not recognized.
>>>> Content-Transfer-Encoding: quoted-printable
**** Command 'content-transfer-encoding:' not recognized.
>>>>
>>>> subscribe+ceph-devel
**** Command 'subscribe+ceph-devel' not recognized.

According to the server help, 'subscribe+ceph-devel' should be the correct syntax, but apparently not.

TIA!

Principal Architect, Co-Founder
DeepSpace Storage
719-930-6960
Brad Hubbard
2018-11-12 00:16:47 UTC
Permalink
What do you get if you send "help" (without quotes) to m***@vger.kernel.org?
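
For what it's worth, the '**** Command ... not recognized' lines in your
paste look like the mail went out as multipart/HTML, so the server parsed
the MIME headers as commands. Assuming vger's usual majordomo conventions
(please verify against the server help), a plain-text message like this
normally does it:

To: m***@vger.kernel.org
(plain-text body, no HTML, no signature)

subscribe ceph-devel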



--
Cheers,
Brad