Discussion:
[ceph-users] No fix for 0x6706be76 CRCs ?
Alfredo Daniel Rezinovsky
2018-09-18 19:10:21 UTC
Permalink
Changed all my hardware. Now I have plenty of free RAM, swap is never
needed, iowait is low, and still:

7fdbbb73e700 -1 bluestore(/var/lib/ceph/osd/ceph-6) _verify_csum bad
crc32c/0x1000 checksum at blob offset 0x1e000, got 0x6706be76, expected
0x85a3fefe, device location [0x25ac04be000~1000], logical extent
0x1e000~1000, object #2:fd955b81:::10000729cdb.00000006

It happens sometimes, in all my OSDs.

BlueStore OSDs with data on HDD and block.db on SSD.

After running pg repair, the PGs were always repaired.
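For context on the value in the subject line: 0x6706be76 is widely reported in Ceph discussions as the crc32c of an all-zero 4 KiB block, i.e. the read returned zeroed pages instead of the on-disk data. A minimal sketch below illustrates this; it is a plain textbook CRC-32C, not Ceph's implementation, and the chunk size is assumed from the "crc32c/0x1000" in the log line:

```python
# Minimal pure-Python CRC-32C (Castagnoli): reflected, polynomial
# 0x1EDC6F41 (reversed: 0x82F63B78), init and final XOR 0xFFFFFFFF.
# Slow but dependency-free; an illustration, not Ceph's implementation.

def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# Canonical CRC-32C check value for b"123456789".
assert crc32c(b"123456789") == 0xE3069283

# The checksummed chunk size in the log line is 0x1000 bytes
# ("crc32c/0x1000"). Depending on seed/final-XOR convention (Ceph chains
# raw CRCs with a seed of -1), one of these two values is the crc32c of
# an all-zero 4 KiB chunk.
zeros = b"\x00" * 0x1000
print(hex(crc32c(zeros)), hex(crc32c(zeros) ^ 0xFFFFFFFF))
```

If every bad read reports the same "got" value for a given chunk size, the data that was checksummed was almost certainly all zeros, which points at the read path rather than at on-disk corruption.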

Running Ceph 13.2.1-1bionic on Ubuntu.
--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo
Paul Emmerich
2018-09-18 19:19:18 UTC
Permalink
We built a work-around here: https://github.com/ceph/ceph/pull/23273
It hasn't been backported, but we'll ship 13.2.2 in our Debian
packages for the croit OS image.


Paul


_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
Alfredo Daniel Rezinovsky
2018-09-18 19:23:10 UTC
Permalink
MOMENT !!!

"Some kernels (4.9+) sometimes fail to return data when reading from a
block device under memory pressure."

I didn't know that was the problem. Can't I just downgrade the kernel?

Are there known working versions, or do they just need to be prior to 4.9?
Paul Emmerich
2018-09-18 19:27:23 UTC
Permalink
Yeah, it's very likely a kernel bug (one that nobody has managed to
reduce to a simpler test case, or even to reproduce reliably with
reasonable effort on a test system).

4.9 and earlier aren't affected as far as we can tell; we only
encountered this after upgrading. But I think Bionic ships with a
broken kernel.
Try raising the issue with the Ubuntu folks if you are using a
distribution kernel.


Paul
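For anyone auditing a fleet against the reports above, a small heuristic sketch can flag hosts on a kernel in the suspect range. The 4.9 cut-off is taken from this discussion, not from an authoritative list of fixed kernels, and the helper names are invented for illustration:

```python
# Hypothetical helper: flag kernels newer than the 4.9 series, the range
# this thread reports as possibly affected ("4.9 and earlier aren't
# affected"). The cut-off comes from the discussion, not an upstream fix list.
import platform
import re

def kernel_tuple(release: str) -> tuple:
    """Parse a release string like '4.15.0-34-generic' into (4, 15, 0)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(g or 0) for g in m.groups())

def possibly_affected(release: str) -> bool:
    """True for kernels newer than the 4.9 series."""
    return kernel_tuple(release)[:2] > (4, 9)

if __name__ == "__main__":
    rel = platform.release()
    print(rel, "possibly affected" if possibly_affected(rel) else "looks ok")
```

Note that the reports in this thread disagree on intermediate versions (4.13 was fine on Xenial, while 4.15/4.17 on Bionic were not), so treat the cut-off as a heuristic rather than a verdict.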

Alfredo Daniel Rezinovsky
2018-09-18 19:28:57 UTC
Permalink
I started seeing this after upgrading to Bionic. I had Xenial with LTS
kernels (4.13) without problems.

I will try switching to Ubuntu's 4.13 kernel and watch the logs.

Thanks
Alfredo Daniel Rezinovsky
2018-09-19 15:01:40 UTC
Permalink
Tried 4.17, with the same problem.

Just downgraded to 4.8. Let's see whether any more 0x67... errors appear.
Alfredo Daniel Rezinovsky
2018-09-21 14:47:41 UTC
Permalink
I have Ubuntu servers.

With ukuu I installed kernel 4.8.17-040817 (the last available kernel
before 4.9), and I haven't seen any 0x6706be76 CRCs since.

Nor any inconsistencies.