Discussion:
[ceph-users] Understanding the number of TCP connections between clients and OSDs
Rick Balsano
2015-10-26 20:32:06 UTC
We've run into issues with the number of open TCP connections from a single
client to the OSDs in our Ceph cluster.

We can (& have) increased the open file limit to work around this, but
we're looking to understand what determines the number of open connections
maintained between a client and a particular OSD. Our naive assumption was
1 open TCP connection per OSD or per port made available by the Ceph node.
There are many more than this, presumably to allow parallel connections,
because we see 1-4 connections from each client per open port on a Ceph
node.
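
For reference, a rough sketch of how one might count these from the client
side, assuming the default OSD port range (6800-7300) and the ss tool; adjust
for your environment:

    import subprocess
    from collections import Counter

    # Count established TCP connections from this host to anything in the
    # default Ceph OSD port range (6800-7300); adjust if you changed
    # ms_bind_port_min/max.
    out = subprocess.check_output(
        ["ss", "-tn", "state", "established"]).decode()

    per_peer = Counter()
    for line in out.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue
        peer = fields[3]                       # remote "addr:port"
        addr, _, port = peer.rpartition(":")
        if port.isdigit() and 6800 <= int(port) <= 7300:
            per_peer[peer] += 1

    print("total connections to OSD ports:", sum(per_peer.values()))
    for peer, n in per_peer.most_common(10):
        print(peer, n)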

Here is some background on our cluster:
* still running Firefly 0.80.8
* 414 OSDs, 35 nodes, one massive pool
* clients are KVM processes, accessing Ceph RBD images using virtio
* total number of open TCP connections from one client to all nodes between
500-1000

Is there any way to either know or cap the maximum number of connections we
should expect?

I can provide more info as required. I've done some searches and found
references to "huge number of TCP connections" but nothing concrete to tell
me how to predict how that scales.

Thanks,
Rick
--
*Rick Balsano*
Senior Software Engineer
Opower <http://www.opower.com>

O +1 571 384 1210
We're Hiring! See jobs here <http://www.opower.com/careers>.
Jan Schermer
2015-10-26 21:48:41 UTC
If we're talking about RBD clients (qemu), then the number also grows with the number of volumes attached to the client. With a single volume it was <1000, and it grows when there's heavy IO happening in the guest.
I had to bump the open file limit up to several thousand (8000, was it?) to accommodate a client with 10 volumes in our cluster. We just scaled the number of OSDs down, so hopefully I can get a graph of that.
But I just guesstimated what it could become, and that's not necessarily the theoretical limit. Very bad things happen when you hit that threshold. It could also depend on guest settings (like queue depth) and on how much the guest seeks across the drive (i.e. how many different PGs it hits), but knowing the upper bound is the most critical part.
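
A minimal sketch of checking (and, within the hard limit, raising) the nofile
limit from Python; in practice you'd set this via limits.conf or the
qemu/libvirt service configuration, and 8192 below is just an illustrative
value, not a recommendation:

    import resource

    # Each TCP connection to an OSD consumes a file descriptor on top of
    # qemu's normal usage, so the soft nofile limit has to stay well above
    # the worst-case connection count.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("nofile soft=%d hard=%d" % (soft, hard))

    wanted = 8192  # illustrative; size it to your volume count and OSD count
    if soft < wanted <= hard:
        # Raising the hard limit itself requires root (limits.conf, systemd
        # LimitNOFILE, or the libvirt unit), so only the soft limit is touched.
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))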

Jan

> On 26 Oct 2015, at 21:32, Rick Balsano <***@opower.com> wrote:
>
> We've run into issues with the number of open TCP connections from a single client to the OSDs in our Ceph cluster.
>
> We can (& have) increased the open file limit to work around this, but we're looking to understand what determines the number of open connections maintained between a client and a particular OSD. Our naive assumption was 1 open TCP connection per OSD or per port made available by the Ceph node. There are many more than this, presumably to allow parallel connections, because we see 1-4 connections from each client per open port on a Ceph node.
>
> Here is some background on our cluster:
> * still running Firefly 0.80.8
> * 414 OSDs, 35 nodes, one massive pool
> * clients are KVM processes, accessing Ceph RBD images using virtio
> * total number of open TCP connections from one client to all nodes between 500-1000
>
> Is there any way to either know or cap the maximum number of connections we should expect?
>
> I can provide more info as required. I've done some searches and found references to "huge number of TCP connections" but nothing concrete to tell me how to predict how that scales.
>
> Thanks,
> Rick
> --
> Rick Balsano
> Senior Software Engineer
> Opower <http://www.opower.com/>
>
> O +1 571 384 1210
> We're Hiring! See jobs here <http://www.opower.com/careers>.
> _______________________________________________
> ceph-users mailing list
> ceph-***@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
h***@gmail.com
2015-10-27 00:41:38 UTC
Hi,
I'm also concerned about this problem. My question is how many threads the qemu-system-x86 process will have.

From what I tested, it can be anywhere between 100 and 800, so it probably has a relationship with the
number of OSDs. But having many threads seems to hurt performance: in my tests, 4k randwrite dropped from
15k IOPS to 4k IOPS. That's really unacceptable!
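
A small sketch of how one can count those threads, assuming Linux and that the
process name in /proc/<pid>/comm starts with "qemu-system-x86":

    import os

    # Count threads per qemu process by listing its /proc/<pid>/task entries.
    def qemu_thread_counts(name="qemu-system-x86"):
        counts = {}
        for pid in filter(str.isdigit, os.listdir("/proc")):
            try:
                with open("/proc/%s/comm" % pid) as f:
                    comm = f.read().strip()
            except IOError:
                continue
            if comm.startswith(name):
                counts[int(pid)] = len(os.listdir("/proc/%s/task" % pid))
        return counts

    print(qemu_thread_counts())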

My environment:

1. nine OSD storage servers with two Intel DC 3500 SSDs each
2. hammer 0.94.3
3. QEMU emulator version 2.1.2 (Debian 1:2.1+dfsg-12+deb8u4~bpo70+1)

Thanks!


***@gmail.com

From: Jan Schermer
Date: 2015-10-27 05:48
To: Rick Balsano
CC: ceph-***@lists.ceph.com
Subject: Re: [ceph-users] Understanding the number of TCP connections between clients and OSDs
If we're talking about RBD clients (qemu), then the number also grows with the number of volumes attached to the client. With a single volume it was <1000, and it grows when there's heavy IO happening in the guest.
I had to bump the open file limit up to several thousand (8000, was it?) to accommodate a client with 10 volumes in our cluster. We just scaled the number of OSDs down, so hopefully I can get a graph of that.
But I just guesstimated what it could become, and that's not necessarily the theoretical limit. Very bad things happen when you hit that threshold. It could also depend on guest settings (like queue depth) and on how much the guest seeks across the drive (i.e. how many different PGs it hits), but knowing the upper bound is the most critical part.

Jan

On 26 Oct 2015, at 21:32, Rick Balsano <***@opower.com> wrote:

We've run into issues with the number of open TCP connections from a single client to the OSDs in our Ceph cluster.

We can (& have) increased the open file limit to work around this, but we're looking to understand what determines the number of open connections maintained between a client and a particular OSD. Our naive assumption was 1 open TCP connection per OSD or per port made available by the Ceph node. There are many more than this, presumably to allow parallel connections, because we see 1-4 connections from each client per open port on a Ceph node.

Here is some background on our cluster:
* still running Firefly 0.80.8
* 414 OSDs, 35 nodes, one massive pool
* clients are KVM processes, accessing Ceph RBD images using virtio
* total number of open TCP connections from one client to all nodes between 500-1000

Is there any way to either know or cap the maximum number of connections we should expect?

I can provide more info as required. I've done some searches and found references to "huge number of TCP connections" but nothing concrete to tell me how to predict how that scales.

Thanks,
Rick
--
Rick Balsano
Senior Software Engineer
Opower

O +1 571 384 1210
We're Hiring! See jobs here.
_______________________________________________
ceph-users mailing list
ceph-***@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Dan van der Ster
2015-10-27 09:05:15 UTC
On Mon, Oct 26, 2015 at 10:48 PM, Jan Schermer <***@schermer.cz> wrote:
> If we're talking about RBD clients (qemu) then the number also grows with
> number of volumes attached to the client.

I never thought about that, but it might explain a problem we have
where multiple attached volumes crash an HV. I had assumed that
multiple volumes would reuse the same rados client instance, and thus
reuse the same connections to the OSDs.

-- dan
Rick Balsano
2015-11-04 20:27:52 UTC
Just following up since this thread went silent after a few comments
showing similar concerns, but no explanation of the behavior. Can anyone
point to some code or documentation which explains how to estimate the
expected number of TCP connections a client would open based on read/write
volume, # of volumes, # of OSDs in the pool, etc?


On Tue, Oct 27, 2015 at 5:05 AM, Dan van der Ster <***@vanderster.com>
wrote:

> On Mon, Oct 26, 2015 at 10:48 PM, Jan Schermer <***@schermer.cz> wrote:
> > If we're talking about RBD clients (qemu) then the number also grows with
> > number of volumes attached to the client.
>
> I never thought about that but it might explain a problem we have
> where multiple attached volumes crashes an HV. I had assumed that
> multiple volumes would reuse the same rados client instance, and thus
> reuse the same connections to the OSDs.
>
> -- dan
>



--
*Rick Balsano*
Senior Software Engineer
Opower <http://www.opower.com>

O +1 571 384 1210
We're Hiring! See jobs here <http://www.opower.com/careers>.
Gregory Farnum
2015-11-04 23:19:12 UTC
On Wed, Nov 4, 2015 at 12:27 PM, Rick Balsano <***@opower.com> wrote:
> Just following up since this thread went silent after a few comments showing
> similar concerns, but no explanation of the behavior. Can anyone point to
> some code or documentation which explains how to estimate the expected
> number of TCP connections a client would open based on read/write volume, #
> of volumes, # of OSDs in the pool, etc?

Each RBD volume creates its own connections to the cluster. It will
clean up unused connections after enough idle time but has the
possibility of creating a connection to each OSD used in the pool
hosting the volume (i.e., all the OSDs in the cluster, unless you've
partitioned them in your CRUSH map).
-Greg
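
A back-of-the-envelope sketch of the worst case that follows from this; the
monitor-session count is an assumption, and idle-connection reaping keeps the
real numbers lower, but this is the bound to size file-descriptor limits
against:

    # Each volume gets its own client instance, and each instance may end up
    # with a connection to every OSD the volume's pool maps to, plus a few
    # monitor sessions (assumed 3 here).
    def worst_case_client_connections(volumes, osds_in_pool, mons=3):
        return volumes * (osds_in_pool + mons)

    # With Rick's 414 OSDs in a single pool:
    print(worst_case_client_connections(1, 414))    # ~417 for one volume
    print(worst_case_client_connections(10, 414))   # ~4170 for ten volumes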

>
>
> On Tue, Oct 27, 2015 at 5:05 AM, Dan van der Ster <***@vanderster.com>
> wrote:
>>
>> On Mon, Oct 26, 2015 at 10:48 PM, Jan Schermer <***@schermer.cz> wrote:
>> > If we're talking about RBD clients (qemu) then the number also grows
>> > with
>> > number of volumes attached to the client.
>>
>> I never thought about that but it might explain a problem we have
>> where multiple attached volumes crashes an HV. I had assumed that
>> multiple volumes would reuse the same rados client instance, and thus
>> reuse the same connections to the OSDs.
>>
>> -- dan
>
>
>
>
> --
> Rick Balsano
> Senior Software Engineer
> Opower
>
> O +1 571 384 1210
> We're Hiring! See jobs here.
>
> _______________________________________________
> ceph-users mailing list
> ceph-***@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
Somnath Roy
2015-11-04 23:36:29 UTC
Hope this will be helpful..



Total connections per OSD =
    (target PGs per OSD) * (# of pool replicas) * 3 + (2 * # of clients) + min_hb_peer

where:
* # of pool replicas = configurable, default is 3
* 3 = the number of data communication messengers (cluster, hb_backend, hb_frontend)
* min_hb_peer = default is 20, I guess

Total connections per node = (total connections per OSD) * (number of OSDs per node)
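
A small sketch plugging the formula in; the target-PGs-per-OSD value of 100 is
just an illustrative assumption from the usual sizing guidance:

    # The formula above, with illustrative defaults.
    def connections_per_osd(target_pgs_per_osd=100, replicas=3,
                            messengers=3, clients=0, min_hb_peer=20):
        return target_pgs_per_osd * replicas * messengers \
            + 2 * clients + min_hb_peer

    def connections_per_node(osds_per_node, **kwargs):
        return osds_per_node * connections_per_osd(**kwargs)

    print(connections_per_osd())        # 920 with the defaults above
    print(connections_per_node(12))     # ~11040 for ~12 OSDs per node (414/35)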

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-***@lists.ceph.com] On Behalf Of Rick Balsano
Sent: Wednesday, November 04, 2015 12:28 PM
To: ceph-***@lists.ceph.com
Subject: Re: [ceph-users] Understanding the number of TCP connections between clients and OSDs

Just following up since this thread went silent after a few comments showing similar concerns, but no explanation of the behavior. Can anyone point to some code or documentation which explains how to estimate the expected number of TCP connections a client would open based on read/write volume, # of volumes, # of OSDs in the pool, etc?


On Tue, Oct 27, 2015 at 5:05 AM, Dan van der Ster <***@vanderster.com> wrote:
On Mon, Oct 26, 2015 at 10:48 PM, Jan Schermer <***@schermer.cz> wrote:
> If we're talking about RBD clients (qemu) then the number also grows with
> number of volumes attached to the client.

I never thought about that but it might explain a problem we have
where multiple attached volumes crashes an HV. I had assumed that
multiple volumes would reuse the same rados client instance, and thus
reuse the same connections to the OSDs.

-- dan



--
Rick Balsano
Senior Software Engineer
Opower <http://www.opower.com>

O +1 571 384 1210
We're Hiring! See jobs here <http://www.opower.com/careers>.