Discussion:
[ceph-users] TCP failed connection attempts
Dan Van Der Ster
2014-03-26 14:29:02 UTC
Hi all,
I recently noticed our OSD servers have a very large number of TCP failed connection attempts. This is typical (output from netstat -s):

50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts

I'm not a TCP expert at all, but this doesn't look good. Do others have similar numbers? Does anyone know if some ipv4 sysctl tuning can clear this up?
Cheers, Dan


-- Dan van der Ster || Data & Storage Services || CERN IT Department --
Ирек Фасихов
2014-03-26 14:48:47 UTC
Hi, Daniel.

I use the following settings:

net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_moderate_rcvbuf=0
net.ipv4.tcp_low_latency = 1
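
To compare the three settings above with what a host currently uses, something like the following sketch could help. It assumes the usual Linux mapping of a sysctl name to a file under /proc/sys (dots become path separators); the names SUGGESTED, sysctl_path, read_sysctl and compare are made up for this illustration, and a key that does not exist on a given kernel is simply reported as None.

```python
from pathlib import Path

# The three values suggested in this message (illustrative constant name).
SUGGESTED = {
    "net.ipv4.tcp_syncookies": "0",
    "net.ipv4.tcp_moderate_rcvbuf": "0",
    "net.ipv4.tcp_low_latency": "1",
}


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None


def compare(suggested, root="/proc/sys"):
    """Return {name: (current_value, suggested_value)} for each setting."""
    return {name: (read_sysctl(name, root), want) for name, want in suggested.items()}


if __name__ == "__main__":
    for name, (current, want) in compare(SUGGESTED).items():
        print(f"{name}: current={current} suggested={want}")
```

This only reads values; it does not change anything on the host.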

The "failed connection attempts" message can be ignored: it does not reflect
only server-side errors, but client-side ones as well. For example, a client
may have lost its connection to the server.
Dan Van Der Ster
2014-03-26 14:54:25 UTC
Thanks, I'll try that. (Our current settings are the exact opposite of your suggestion.)

I found an old thread discussing a new option, ms tcp rcvbuf, but I found that it is still not enabled by default in dumpling:

"ms_tcp_rcvbuf": "0",

Not sure if that's related.
Cheers, Dan


Sergey Malinin
2014-03-26 20:31:24 UTC
Post by Ирек Фасихов
Hi, Daniel.
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_moderate_rcvbuf=0
net.ipv4.tcp_low_latency = 1
Message "failed connection attempts", can be ignored, it is not just a
server error, but the client. For example: A client lost its
connection to the server.
It refers to connection *attempts*, which means that the loss of an
established connection does not affect the counter.
Post by Dan Van Der Ster
I recently noticed our OSD servers have a very large number of TCP failed
connection attempts. This is typical (output from netstat -s):
50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts
Given that you presumably have nothing besides the OSD daemons running on
the machine, I would say that this is an extraordinarily large number,
indicating that something is definitely going wrong.
Post by Dan Van Der Ster
I'm not a TCP expert at all, but this doesn't look good. Do
others have similar numbers? Does anyone know if some ipv4 sysctl
tuning can clear this up?
'netstat -s' just gives a verbose display of the counters in /proc/net/snmp,
for your information. These counters do not need to be reset or cleared in
any other way.
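
As a rough illustration of where those numbers come from, the sketch below parses text laid out like /proc/net/snmp (a header line of counter names followed by a line of values, both carrying the same protocol tag). The function name parse_snmp and the SAMPLE text are made up for this example; the three counters in SAMPLE are the ones quoted in this thread, and the remaining values are placeholders. To my understanding, netstat's "active connections openings", "passive connection openings" and "failed connection attempts" correspond to ActiveOpens, PassiveOpens and AttemptFails.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    tables = {}
    # Lines come in pairs: counter names first, then the matching values.
    for header, values in zip(rows[0::2], rows[1::2]):
        proto = header[0].rstrip(":")
        tables[proto] = dict(zip(header[1:], (int(v) for v in values[1:])))
    return tables


# Hypothetical sample: only the three thread counters are real figures.
SAMPLE = """\
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab
Tcp: 1 200 120000 -1 50329019 15218590 44167087 0 0
"""

tcp = parse_snmp(SAMPLE)["Tcp"]
for name in ("ActiveOpens", "PassiveOpens", "AttemptFails"):
    print(name, tcp[name])
```

On a live host the same function could be fed the contents of /proc/net/snmp instead of SAMPLE.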
Dan Van Der Ster
2014-03-27 07:52:55 UTC
On 26 Mar 2014 at 21:33:06, Sergey Malinin (hell at newmail.com) wrote:
This is typical (output from netstat -s):

50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts

Taking into account that presumably you don't have anything besides osd daemon running on the machine, I would say that this is extraordinarily large number indicating that something is definitely going wrong.

This was my thinking as well. I am seeing this on a test cluster with very few real clients, so most of the connect attempts should be replication from other OSDs.

The suggested sysctl changes didn't stop the failed conn attempts from increasing. I'm going to keep looking around...

Cheers, Dan

Sergey Malinin
2014-03-27 09:44:21 UTC
Post by Dan Van Der Ster
50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts
This was my thinking as well. I am seeing this on a test cluster with
very few real clients, so most of the connect attempts should be
replication from other OSDs.

This figure represents connections initiated locally, i.e. replication
*to* other OSDs.

Post by Dan Van Der Ster
The suggested sysctl changes didn't stop the failed conn attempts from
increasing. I'm going to keep looking around...

sysctl has nothing to do with that, since those are just counters. You
can debug failed connections by logging connection resets:
iptables -I INPUT -p tcp -m tcp --tcp-flags RST RST -j LOG
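
Once that rule is logging, the lines could be summarised per source host with a small script. The sketch below assumes the kernel's usual LOG line layout with SRC= and PROTO= fields; the function name count_rst_sources, the sample lines and the addresses in them are invented for illustration.

```python
import re
from collections import Counter

# Matches the source-address field of a netfilter LOG line.
SRC_RE = re.compile(r"\bSRC=(\S+)")


def count_rst_sources(log_lines):
    """Count logged TCP packets per source address."""
    counts = Counter()
    for line in log_lines:
        if "PROTO=TCP" not in line:
            continue  # skip anything that is not a logged TCP packet
        match = SRC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical log excerpt; real lines carry more fields (MAC, TTL, ID, ...).
SAMPLE_LOG = [
    "kernel: IN=eth0 OUT= SRC=10.0.0.5 DST=10.0.0.1 LEN=40 PROTO=TCP SPT=6800 DPT=51234 RST URGP=0",
    "kernel: IN=eth0 OUT= SRC=10.0.0.7 DST=10.0.0.1 LEN=40 PROTO=TCP SPT=6801 DPT=40112 RST URGP=0",
    "kernel: IN=eth0 OUT= SRC=10.0.0.5 DST=10.0.0.1 LEN=40 PROTO=TCP SPT=6800 DPT=51240 RST URGP=0",
    "kernel: some unrelated message",
]

for host, hits in count_rst_sources(SAMPLE_LOG).most_common():
    print(host, hits)
```

A host that dominates the tally is a natural first place to look.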


Dan Van Der Ster
2014-03-27 17:13:11 UTC
On 27 Mar 2014 at 10:44:35, Sergey Malinin (hell at newmail.com) wrote:
sysctl has nothing to do with that since those are just counters. You can debug failed connections by logging connection resets:
iptables -I INPUT -p tcp -m tcp --tcp-flags RST RST -j LOG

Thanks for that, you helped me identify a host that was looping through many failed connections to our OSD servers. (It was a half-disabled ceph dashboard tool; my fault, probably.)

But even after removing that host, I still see the failed connections counter increasing while no packets are logged by the rule above.
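
One way to quantify "still increasing" is to take two snapshots of the counters and look at the difference. The sketch below is only an illustration: the helper names counter_deltas and rate_per_second are made up, the "before" values are the ones from the start of this thread, and the "after" values and the 60-second interval are invented. As a hedged aside, RFC 1213 defines tcpAttemptFails as transitions to CLOSED from SYN-SENT or SYN-RCVD (plus SYN-RCVD back to LISTEN), so attempts that fail without an incoming RST, for example by timing out, may raise the counter without matching the INPUT rule.

```python
def counter_deltas(before, after, names=("ActiveOpens", "PassiveOpens", "AttemptFails")):
    """Return how much each named counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in names}


def rate_per_second(deltas, seconds):
    """Convert counter growth into an average per-second rate."""
    return {name: delta / seconds for name, delta in deltas.items()}


# "before" uses the figures from the thread; "after" is a made-up snapshot
# assumed to be taken 60 seconds later.
before = {"ActiveOpens": 50329019, "PassiveOpens": 15218590, "AttemptFails": 44167087}
after = {"ActiveOpens": 50329719, "PassiveOpens": 15218650, "AttemptFails": 44167687}

deltas = counter_deltas(before, after)
rates = rate_per_second(deltas, 60)
for name in deltas:
    print(f"{name}: +{deltas[name]} ({rates[name]:.2f}/s)")
```

Comparing the AttemptFails delta with the ActiveOpens and PassiveOpens deltas over the same interval gives a rough idea of which side of the connection setup is failing.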

So I'm still looking...

Cheers, Dan


Dan Van Der Ster
2014-03-27 17:13:11 UTC
Permalink
On 27 Mar 2014 at 10:44:35, Sergey Malinin (hell at newmail.com<mailto:hell at newmail.com>) wrote:
sysctl has nothing to do with that since those are just counters. You can debug failed connections by logging connection resets:
iptables -I INPUT -p tcp -m tcp --tcp-flags RST RST -j LOG

Thanks for that? you helped me identify a host that was looping through many failed connections to our OSD servers. (It was a half-disabled ceph dashboard tool? my fault, probably).

But even after removing that host, I still see the failed connections counter increasing while there are no packets logged from above.

So I?m still looking ?

Cheers, Dan


-- Dan van der Ster || Data & Storage Services || CERN IT Department --
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140327/1a5353a2/attachment-0002.htm>
Dan Van Der Ster
2014-03-27 17:13:11 UTC
Permalink
On 27 Mar 2014 at 10:44:35, Sergey Malinin (hell at newmail.com<mailto:hell at newmail.com>) wrote:
sysctl has nothing to do with that since those are just counters. You can debug failed connections by logging connection resets:
iptables -I INPUT -p tcp -m tcp --tcp-flags RST RST -j LOG

Thanks for that? you helped me identify a host that was looping through many failed connections to our OSD servers. (It was a half-disabled ceph dashboard tool? my fault, probably).

But even after removing that host, I still see the failed connections counter increasing while there are no packets logged from above.

So I?m still looking ?

Cheers, Dan


-- Dan van der Ster || Data & Storage Services || CERN IT Department --
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140327/1a5353a2/attachment-0003.htm>
Dan Van Der Ster
2014-03-27 17:13:11 UTC
Permalink
On 27 Mar 2014 at 10:44:35, Sergey Malinin (hell at newmail.com<mailto:hell at newmail.com>) wrote:
sysctl has nothing to do with that since those are just counters. You can debug failed connections by logging connection resets:
iptables -I INPUT -p tcp -m tcp --tcp-flags RST RST -j LOG

Thanks for that? you helped me identify a host that was looping through many failed connections to our OSD servers. (It was a half-disabled ceph dashboard tool? my fault, probably).

But even after removing that host, I still see the failed connections counter increasing while there are no packets logged from above.

So I?m still looking ?

Cheers, Dan


-- Dan van der Ster || Data & Storage Services || CERN IT Department --
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140327/1a5353a2/attachment-0004.htm>
Sergey Malinin
2014-03-27 09:44:21 UTC
Permalink
Post by Dan Van Der Ster
On 26 Mar 2014 at 21:33:06, Sergey Malinin (hell at newmail.com
Post by Dan Van Der Ster
50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts
Taking into account that presumably you don't have anything besides
osd daemon running on the machine, i would say that this is
extraordinarily large
number indicating that something is definitely going wrong.
This was my thinking as well. I am seeing this on a test cluster with
very few real clients, so most of the connect attempts should be
replication from other OSDs.
This figure represents connections initiated locally, i.e. replication
*to* other osds.
Post by Dan Van Der Ster
The suggested sysctl changes didn?t stop the failed conn attempts from
increasing. I?m going to keep looking around?
sysctl has nothing to do with that since those are just counters. You
can debug failed connections by logging connection resets:
iptables -I INPUT -p tcp -m tcp --tcp-flags RST RST -j LOG


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140327/6c499ee3/attachment-0002.htm>
Sergey Malinin
2014-03-27 09:44:21 UTC
Permalink
Post by Dan Van Der Ster
On 26 Mar 2014 at 21:33:06, Sergey Malinin (hell at newmail.com
Post by Dan Van Der Ster
50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts
Taking into account that presumably you don't have anything besides
osd daemon running on the machine, i would say that this is
extraordinarily large
number indicating that something is definitely going wrong.
This was my thinking as well. I am seeing this on a test cluster with
very few real clients, so most of the connect attempts should be
replication from other OSDs.
This figure represents connections initiated locally, i.e. replication
*to* other osds.
Post by Dan Van Der Ster
The suggested sysctl changes didn?t stop the failed conn attempts from
increasing. I?m going to keep looking around?
sysctl has nothing to do with that since those are just counters. You
can debug failed connections by logging connection resets:
iptables -I INPUT -p tcp -m tcp --tcp-flags RST RST -j LOG


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140327/6c499ee3/attachment-0003.htm>
Sergey Malinin
2014-03-27 09:44:21 UTC
Permalink
Post by Dan Van Der Ster
On 26 Mar 2014 at 21:33:06, Sergey Malinin (hell at newmail.com
Post by Dan Van Der Ster
50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts
Taking into account that presumably you don't have anything besides
osd daemon running on the machine, i would say that this is
extraordinarily large
number indicating that something is definitely going wrong.
This was my thinking as well. I am seeing this on a test cluster with
very few real clients, so most of the connect attempts should be
replication from other OSDs.
This figure represents connections initiated locally, i.e. replication
*to* other osds.
Post by Dan Van Der Ster
The suggested sysctl changes didn?t stop the failed conn attempts from
increasing. I?m going to keep looking around?
sysctl has nothing to do with that since those are just counters. You
can debug failed connections by logging connection resets:
iptables -I INPUT -p tcp -m tcp --tcp-flags RST RST -j LOG


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140327/6c499ee3/attachment-0004.htm>
Dan Van Der Ster
2014-03-27 07:52:55 UTC
Permalink
On 26 Mar 2014 at 21:33:06, Sergey Malinin (hell at newmail.com<mailto:hell at newmail.com>) wrote:
This is typical (output from netstat -s):

50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts

Taking into account that presumably you don't have anything besides osd daemon running on the machine, i would say that this is extraordinarily large
number indicating that something is definitely going wrong.

This was my thinking as well. I am seeing this on a test cluster with very few real clients, so most of the connect attempts should be replication from other OSDs.

The suggested sysctl changes didn?t stop the failed conn attempts from increasing. I?m going to keep looking around?

Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140327/ab04f6d2/attachment-0002.htm>
Dan Van Der Ster
2014-03-27 07:52:55 UTC
Permalink
On 26 Mar 2014 at 21:33:06, Sergey Malinin (hell at newmail.com<mailto:hell at newmail.com>) wrote:
This is typical (output from netstat -s):

50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts

Taking into account that presumably you don't have anything besides osd daemon running on the machine, i would say that this is extraordinarily large
number indicating that something is definitely going wrong.

This was my thinking as well. I am seeing this on a test cluster with very few real clients, so most of the connect attempts should be replication from other OSDs.

The suggested sysctl changes didn?t stop the failed conn attempts from increasing. I?m going to keep looking around?

Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140327/ab04f6d2/attachment-0003.htm>
Dan Van Der Ster
2014-03-27 07:52:55 UTC
Permalink
On 26 Mar 2014 at 21:33:06, Sergey Malinin (hell at newmail.com<mailto:hell at newmail.com>) wrote:
This is typical (output from netstat -s):

50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts

Taking into account that presumably you don't have anything besides osd daemon running on the machine, i would say that this is extraordinarily large
number indicating that something is definitely going wrong.

This was my thinking as well. I am seeing this on a test cluster with very few real clients, so most of the connect attempts should be replication from other OSDs.

The suggested sysctl changes didn?t stop the failed conn attempts from increasing. I?m going to keep looking around?

Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140327/ab04f6d2/attachment-0004.htm>
Dan Van Der Ster
2014-03-26 14:54:25 UTC
Permalink
Thanks, I?ll try that. (Our current settings are the exact opposite of your suggestion).

I found an old thread discussing a new option, ms tcp rcvbuf, but I found that it is still not enabled by default in dumpling:

"ms_tcp_rcvbuf": "0",

Not sure if that?s related.
Cheers, Dan


-- Dan van der Ster || Data & Storage Services || CERN IT Department --


On 26 Mar 2014 at 15:49:42, ???? ??????? (malmyzh at gmail.com<mailto:malmyzh at gmail.com>) wrote:

Hi, Daniel.

I use the following settings:

net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_moderate_rcvbuf=0
net.ipv4.tcp_low_latency = 1

Message "failed connection attempts", can be ignored, it is not just a server error, but the client. For example: A client lost its connection to the server.



2014-03-26 18:29 GMT+04:00 Dan Van Der Ster <daniel.vanderster at cern.ch<mailto:daniel.vanderster at cern.ch>>:
Hi all,
I recently noticed our OSD servers have a very large number of TCP failed connection attempts. This is typical (output from netstat -s):

50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts

I?m not a TCP expert at all ? but this doesn?t look good. Do others have similar numbers? Does anyone know if some ipv4 sysctl tuning can clear this up?
Cheers, Dan


-- Dan van der Ster || Data & Storage Services || CERN IT Department --

_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com<mailto:ceph-users at lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
? ?????????, ??????? ???? ???????????
???.: +79229045757
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140326/6332b38a/attachment-0002.htm>
Sergey Malinin
2014-03-26 20:31:24 UTC
Permalink
Post by Ирек Фасихов
Hi, Daniel.
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_moderate_rcvbuf=0
net.ipv4.tcp_low_latency = 1
Message "failed connection attempts", can be ignored, it is not just a
server error, but the client. For example: A client lost its
connection to the server.
It refers to connection *attemtps*, that means loss of established
connection will not affect the counter.
Post by Ирек Фасихов
2014-03-26 18:29 GMT+04:00 Dan Van Der Ster <daniel.vanderster at cern.ch
Hi all,
I recently noticed our OSD servers have a very large number of TCP
50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts
Taking into account that presumably you don't have anything besides osd
daemon running on the machine, i would say that this is extraordinarily
large
number indicating that something is definitely going wrong.
Post by Ирек Фасихов
I?m not a TCP expert at all ? but this doesn?t look good. Do
others have similar numbers? Does anyone know if some ipv4 sysctl
tuning can clear this up?
'netstat -s' provides verbose display of /proc/net/snmp counters for
your information. These counters don't need to be reset or cleared up in
some other way.
Post by Ирек Фасихов
Cheers, Dan
-- Dan van der Ster || Data & Storage Services || CERN IT
Department --
_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com <mailto:ceph-users at lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
? ?????????, ??????? ???? ???????????
???.: +79229045757
_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140326/b7c5404e/attachment-0001.htm>
Dan Van Der Ster
2014-03-26 14:54:25 UTC
Permalink
Thanks, I?ll try that. (Our current settings are the exact opposite of your suggestion).

I found an old thread discussing a new option, ms tcp rcvbuf, but I found that it is still not enabled by default in dumpling:

"ms_tcp_rcvbuf": "0",

Not sure if that?s related.
Cheers, Dan


-- Dan van der Ster || Data & Storage Services || CERN IT Department --


On 26 Mar 2014 at 15:49:42, ???? ??????? (malmyzh at gmail.com<mailto:malmyzh at gmail.com>) wrote:

Hi, Daniel.

I use the following settings:

net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_moderate_rcvbuf=0
net.ipv4.tcp_low_latency = 1

Message "failed connection attempts", can be ignored, it is not just a server error, but the client. For example: A client lost its connection to the server.



2014-03-26 18:29 GMT+04:00 Dan Van Der Ster <daniel.vanderster at cern.ch<mailto:daniel.vanderster at cern.ch>>:
Hi all,
I recently noticed our OSD servers have a very large number of TCP failed connection attempts. This is typical (output from netstat -s):

50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts

I?m not a TCP expert at all ? but this doesn?t look good. Do others have similar numbers? Does anyone know if some ipv4 sysctl tuning can clear this up?
Cheers, Dan


-- Dan van der Ster || Data & Storage Services || CERN IT Department --

_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com<mailto:ceph-users at lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
? ?????????, ??????? ???? ???????????
???.: +79229045757
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140326/6332b38a/attachment-0003.htm>
Sergey Malinin
2014-03-26 20:31:24 UTC
Permalink
Post by Ирек Фасихов
Hi, Daniel.
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_moderate_rcvbuf=0
net.ipv4.tcp_low_latency = 1
Message "failed connection attempts", can be ignored, it is not just a
server error, but the client. For example: A client lost its
connection to the server.
It refers to connection *attemtps*, that means loss of established
connection will not affect the counter.
Post by Ирек Фасихов
2014-03-26 18:29 GMT+04:00 Dan Van Der Ster <daniel.vanderster at cern.ch
Hi all,
I recently noticed our OSD servers have a very large number of TCP
50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts
Taking into account that presumably you don't have anything besides osd
daemon running on the machine, i would say that this is extraordinarily
large
number indicating that something is definitely going wrong.
Post by Ирек Фасихов
I?m not a TCP expert at all ? but this doesn?t look good. Do
others have similar numbers? Does anyone know if some ipv4 sysctl
tuning can clear this up?
'netstat -s' provides verbose display of /proc/net/snmp counters for
your information. These counters don't need to be reset or cleared up in
some other way.
Post by Ирек Фасихов
Cheers, Dan
-- Dan van der Ster || Data & Storage Services || CERN IT
Department --
_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com <mailto:ceph-users at lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
? ?????????, ??????? ???? ???????????
???.: +79229045757
_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140326/b7c5404e/attachment-0002.htm>
Dan Van Der Ster
2014-03-26 14:54:25 UTC
Permalink
Thanks, I?ll try that. (Our current settings are the exact opposite of your suggestion).

I found an old thread discussing a new option, ms tcp rcvbuf, but I found that it is still not enabled by default in dumpling:

"ms_tcp_rcvbuf": "0",

Not sure if that?s related.
Cheers, Dan


-- Dan van der Ster || Data & Storage Services || CERN IT Department --


On 26 Mar 2014 at 15:49:42, ???? ??????? (malmyzh at gmail.com<mailto:malmyzh at gmail.com>) wrote:

Hi, Daniel.

I use the following settings:

net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_moderate_rcvbuf=0
net.ipv4.tcp_low_latency = 1

Message "failed connection attempts", can be ignored, it is not just a server error, but the client. For example: A client lost its connection to the server.



2014-03-26 18:29 GMT+04:00 Dan Van Der Ster <daniel.vanderster at cern.ch<mailto:daniel.vanderster at cern.ch>>:
Hi all,
I recently noticed our OSD servers have a very large number of TCP failed connection attempts. This is typical (output from netstat -s):

50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts

I?m not a TCP expert at all ? but this doesn?t look good. Do others have similar numbers? Does anyone know if some ipv4 sysctl tuning can clear this up?
Cheers, Dan


-- Dan van der Ster || Data & Storage Services || CERN IT Department --

_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com<mailto:ceph-users at lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
? ?????????, ??????? ???? ???????????
???.: +79229045757
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140326/6332b38a/attachment-0004.htm>
Sergey Malinin
2014-03-26 20:31:24 UTC
Permalink
Post by Ирек Фасихов
Hi, Daniel.
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_moderate_rcvbuf=0
net.ipv4.tcp_low_latency = 1
Message "failed connection attempts", can be ignored, it is not just a
server error, but the client. For example: A client lost its
connection to the server.
It refers to connection *attemtps*, that means loss of established
connection will not affect the counter.
Post by Ирек Фасихов
2014-03-26 18:29 GMT+04:00 Dan Van Der Ster <daniel.vanderster at cern.ch
Hi all,
I recently noticed our OSD servers have a very large number of TCP
50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts
Taking into account that presumably you don't have anything besides osd
daemon running on the machine, i would say that this is extraordinarily
large
number indicating that something is definitely going wrong.
Post by Ирек Фасихов
I?m not a TCP expert at all ? but this doesn?t look good. Do
others have similar numbers? Does anyone know if some ipv4 sysctl
tuning can clear this up?
'netstat -s' provides verbose display of /proc/net/snmp counters for
your information. These counters don't need to be reset or cleared up in
some other way.
Post by Ирек Фасихов
Cheers, Dan
-- Dan van der Ster || Data & Storage Services || CERN IT
Department --
_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com <mailto:ceph-users at lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
? ?????????, ??????? ???? ???????????
???.: +79229045757
_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140326/b7c5404e/attachment-0003.htm>
Dan Van Der Ster
2014-03-26 14:29:02 UTC
Permalink
Hi all,
I recently noticed our OSD servers have a very large number of TCP failed connection attempts. This is typical (output from netstat -s):

50329019 active connections openings
15218590 passive connection openings
44167087 failed connection attempts

I?m not a TCP expert at all ? but this doesn?t look good. Do others have similar numbers? Does anyone know if some ipv4 sysctl tuning can clear this up?
Cheers, Dan


-- Dan van der Ster || Data & Storage Services || CERN IT Department --
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140326/4ecd4a28/attachment-0002.htm>