[ceph-users] [rgw] increase the first chunk size

Ray Lv

2014-02-20 13:08:30 UTC

Hi,

Currently, the first chunk size of a radosgw object is 512 KB.

In our case, most of the data written to radosgw is about 4 MB per object. With the current first chunk size, each radosgw object is striped into two chunks (512 KB + 3.5 MB). We're using several large disks on each host, with 40 TB of capacity, so there will be about 10 million files on each host. If the first chunk size were increased to 4 MB, the number of files on each host would be reduced by 50%. That should benefit read performance, because the dcache and inode cache footprint in main memory shrinks (in other words, the cache hit ratio goes up).

The questions are:

* What's the rationale behind the current first chunk size?
* Are there any side effects if it is increased to 4 MB?

Thanks,
Ray
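
The file-count estimate above can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: 40 TB is taken as the per-host total, the host is filled entirely with 4 MiB objects, replication is ignored, and the data beyond the head is cut into pieces of at most 4 MiB (the `tail_stripe_size` parameter is hypothetical, chosen so that a 4 MB object gives the 512K + 3.5M layout described above). The function name is illustrative and is not a radosgw API.

```python
import math

KIB = 1024
MIB = 1024 * KIB

def rados_objects_per_rgw_object(object_size, head_size, tail_stripe_size=4 * MIB):
    """Count backing rados objects (files on the OSD) for one rgw object.

    Simplified model: the head holds the first `head_size` bytes and the
    remainder is cut into pieces of at most `tail_stripe_size` bytes.
    """
    if object_size <= head_size:
        return 1
    tail = object_size - head_size
    return 1 + math.ceil(tail / tail_stripe_size)

# Assumed workload: a 40 TB host filled with 4 MiB objects.
host_capacity = 40 * 10**12
object_size = 4 * MIB
rgw_objects = host_capacity // object_size

files_with_512k_head = rgw_objects * rados_objects_per_rgw_object(object_size, 512 * KIB)
files_with_4m_head = rgw_objects * rados_objects_per_rgw_object(object_size, 4 * MIB)

print(files_with_512k_head, files_with_4m_head)
```

Under these assumptions the second figure is exactly half the first, which matches the 50% reduction mentioned above.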

Yehuda Sadeh

2014-02-20 20:46:58 UTC

Apparently I missed including the mailing list in the response.

---------- Forwarded message ----------
From: Yehuda Sadeh <yehuda at inktank.com>
Date: Thu, Feb 20, 2014 at 9:04 AM
Subject: Re: [ceph-users] [rgw] increase the first chunk size
To: Ray Lv <raylv at yahoo-inc.com>

Post by Ray Lv
What's the rationale behind the current first chunk size?

The head size conforms to the read chunk size. The gateway reads in
512k chunks. This is the basic read unit, and when accessing an object
we access the head only once and read the entire data + all its
attributes in one compound rados operation. This is done to ensure
atomicity. We don't have any knowledge as to whether the operation is
doomed to fail later on (e.g., insufficient permissions), so it's
going to read it anyway.

Post by Ray Lv
Are there any side effects if it is increased to 4MB?

It would require reading the whole head in a single operation, which can
be a problem with regard to concurrency and adds latency to all operations
(it takes more time before data starts streaming back to the client).
Unauthorized requests would also use more resources.

Yehuda
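
The cost described above can be put in numbers with a toy model. It assumes, as stated in the reply, that the head is always read in full before the gateway knows whether the request will be rejected. The function and parameter names are made up for illustration and are not part of radosgw.

```python
KIB = 1024
MIB = 1024 * KIB

def head_bytes_read(object_size, head_size):
    """Bytes fetched by the single compound head read, for any request.

    The read happens before the permission check, so a rejected request
    pays the same cost as a successful one.
    """
    return min(object_size, head_size)

for head_size in (512 * KIB, 4 * MIB):
    fetched = head_bytes_read(4 * MIB, head_size)
    print(f"head={head_size // KIB} KiB -> {fetched // KIB} KiB read per rejected GET")
```

In this model a 4 MiB head makes every rejected GET on a 4 MiB object eight times as expensive as it is with a 512 KiB head.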

Ray Lv

2014-03-03 13:31:17 UTC

Hi Yehuda,

Thanks for your answers.

The situation behind these questions is that we found an application whose
objects are > 512 KB and < 4 MB (more precisely, probably < 1 MB). A GET
request for such an object usually requires 2 rados read ops. If we could
configure RGW_MAX_CHUNK_SIZE to a value between 512 KB and 1 MB, only 1 read
op would be needed. Total latency would then improve considerably, even
though first-byte latency would increase slightly.

Thanks,
Ray
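
As a rough check of the claim above, the number of read ops can be modelled as one op per chunk. This is a simplification: it assumes the gateway issues exactly one rados read per `chunk_size` bytes and nothing else. `read_ops` is a hypothetical helper, not a radosgw function, and the 800 KiB object size is just an example value in the stated range.

```python
KIB = 1024
MIB = 1024 * KIB

def read_ops(object_size, chunk_size):
    """Number of chunk-sized reads needed to fetch `object_size` bytes."""
    if object_size <= 0:
        return 1  # assumed: the head is still read once for its attributes
    return -(-object_size // chunk_size)  # ceiling division

object_size = 800 * KIB  # example size between 512 KB and 1 MB
print(read_ops(object_size, 512 * KIB))  # 2
print(read_ops(object_size, 1 * MIB))    # 1
```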


Yehuda Sadeh

2014-03-04 01:01:34 UTC

Created issue #7589, and pushed a wip-7589 that addresses the issue. I
just ran some basic tests, so it should be taken with that in mind.

Yehuda


Ray Lv

2014-03-04 14:55:16 UTC

Hi Yehuda,

That's great. Is it backward compatible with previous configuration
settings? That is, if we set rgw_max_chunk_size to 512 KB, put some
objects with sizes between 50 KB and 10 MB, and then set rgw_max_chunk_size
to 1 MB, can radosgw still read the previously put objects?

Thanks,
Ray


Yehuda Sadeh

2014-03-04 18:03:01 UTC

Increasing that shouldn't be problematic. The real issue is when
decreasing it. First, you'd be throwing object atomicity out the
window so with concurrent readers and writers to the same object you
might end up having a reader getting inconsistent data. And second, it
hasn't really been tested.

Yehuda


Yehuda Sadeh

2014-03-04 01:01:34 UTC

Created issue #7589, and pushed a wip-7589 that addresses the issue. I
just ran some basic tests, so it should be taken with that in mind.

Yehuda

Post by Ray Lv
Hi Yehuda,
Thanks for your answers.
The situation behind these questions is we found that an application data
load is > 512KB and <4MB (maybe < 1 MB more precisely). The GET request
usually requires 2 rados read ops. If there is a way we can configure the
RGW_MAX_CHUNK_SIZE to a number between 512KB and 1MB, only 1 read op will
be needed. So we can get total latency improved much even the first byte
latency increased very little.
Thanks,
Ray

Post by Yehuda Sadeh
Apparently I missed including the mailing list in the response.
---------- Forwarded message ----------
From: Yehuda Sadeh <yehuda at inktank.com>
Date: Thu, Feb 20, 2014 at 9:04 AM
Subject: Re: [ceph-users] [rgw] increase the first chunk size
To: Ray Lv <raylv at yahoo-inc.com>

Post by Ray Lv
Hi,
Currently, the first chunk size of a radosgw object is 512KB.
Here is a case that most of data workload gets to radosgw is ~ 4MB. With the
current first chunk size, each radosgw object is stripped to two chunks
(512K + 3.5M). And we're using several large disks on each host with 40TB
capacity. So there will be 10 millions of files on each host. If the first
chunk size is increased to 4MB, the number of files on each host will be
reduced by 50%. It will be benifitial to performance of read because of
reduced dcache and inode cache footprint in main memory (in other words,
increased cache hit ratio).
What's the rationale behind for the current first chunk size?

The head size conforms to the read chunk size. The gateway reads in
512k chunks. This is the basic read unit, and when accessing an object
we access the head only once and read the entire data + all its
attributes in one compound rados operation. This is done to ensure
atomicity. We don't have any knowledge as to whether the operation is
doomed to fail later on (e.g., insufficient permissions), so the
gateway is going to read it anyway.

Post by Ray Lv
Are there any side effects if it is increased to 4MB?

It will require reading the larger head in a single operation, which can
be a problem for concurrency and adds latency to all operations
(it will take more time to stream data back to the client). Unauthorized
requests will also use more resources.
Yehuda
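
The file-count argument in the quoted original post can be checked with a small calculation. This is only a sketch under a simplified model: the first `head_size` bytes go into the head object and the rest is cut into tail pieces. The function name and the 4MB `tail_stripe_size` default are assumptions made for illustration, not values taken from this thread or from the radosgw source.

```python
KIB = 1024
MIB = 1024 * KIB


def rados_objects_per_rgw_object(object_size, head_size, tail_stripe_size=4 * MIB):
    """Count the RADOS objects (files on the OSD) backing one radosgw object.

    Assumed model: the first head_size bytes live in the head object and
    the remainder is cut into pieces of tail_stripe_size bytes
    (tail_stripe_size is a hypothetical default, not confirmed here).
    """
    if object_size <= head_size:
        return 1
    tail_bytes = object_size - head_size
    tail_objects = -(-tail_bytes // tail_stripe_size)  # ceiling division
    return 1 + tail_objects


# A ~4MB object with the current 512KB head versus a 4MB head.
current = rados_objects_per_rgw_object(4 * MIB, head_size=512 * KIB)
proposed = rados_objects_per_rgw_object(4 * MIB, head_size=4 * MIB)
print(current, proposed)
```

Under these assumptions a 4MB object occupies two files with a 512KB head and one file with a 4MB head, which is where the 50% reduction in the post comes from.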

Ray Lv

2014-03-03 13:31:17 UTC

Hi Yehuda,

Thanks for your answers.

The background to these questions is that one application's objects are
larger than 512KB and smaller than 4MB (more precisely, mostly under 1MB),
so a GET request usually requires 2 rados read ops. If we could configure
RGW_MAX_CHUNK_SIZE to a value between 512KB and 1MB, only 1 read op would
be needed. Total latency would then improve considerably, while first-byte
latency would increase only slightly.
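
As a rough illustration of the read-op count, here is a minimal sketch. It assumes the gateway fetches an object in reads of at most one chunk each, as described earlier in the thread; `rados_read_ops` is a hypothetical helper, not a radosgw function.

```python
KIB = 1024
MIB = 1024 * KIB


def rados_read_ops(object_size, chunk_size):
    """Estimate the reads needed to fetch object_size bytes.

    Simplified model: every read returns at most chunk_size bytes, and
    even an empty object costs one read of the head for its attributes.
    """
    if object_size <= 0:
        return 1
    return -(-object_size // chunk_size)  # ceiling division


# Objects between 512KB and 1MB, with a 512KB chunk versus a 1MB chunk.
for size in (600 * KIB, 800 * KIB, 1000 * KIB):
    print(size // KIB, rados_read_ops(size, 512 * KIB), rados_read_ops(size, 1 * MIB))
```

In this model every object in that size range needs two reads with a 512KB chunk and one read with a 1MB chunk.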

Thanks,
Ray

Post by Yehuda Sadeh
Apparently I missed including the mailing list in the response.
---------- Forwarded message ----------
From: Yehuda Sadeh <yehuda at inktank.com>
Date: Thu, Feb 20, 2014 at 9:04 AM
Subject: Re: [ceph-users] [rgw] increase the first chunk size
To: Ray Lv <raylv at yahoo-inc.com>

Post by Ray Lv
Hi,
Currently, the first chunk size of a radosgw object is 512KB.
In our case, most of the data written to radosgw is ~4MB per object. With the
current first chunk size, each radosgw object is striped into two chunks
(512K + 3.5M). We're using several large disks on each host, with 40TB
capacity, so there will be 10 million files on each host. If the first
chunk size is increased to 4MB, the number of files on each host will be
reduced by 50%. That should benefit read performance because of the
reduced dcache and inode cache footprint in main memory (in other words,
an increased cache hit ratio).
What's the rationale behind the current first chunk size?

The head size conforms to the read chunk size. The gateway reads in
512k chunks. This is the basic read unit, and when accessing an object
we access the head only once and read the entire data + all its
attributes in one compound rados operation. This is done to ensure
atomicity. We don't have any knowledge as to whether the operation is
doomed to fail later on (e.g., insufficient permissions), so the
gateway is going to read it anyway.

Post by Ray Lv
Are there any side effects if it is increased to 4MB?

It will require reading the larger head in a single operation, which can
be a problem for concurrency and adds latency to all operations
(it will take more time to stream data back to the client). Unauthorized
requests will also use more resources.
Yehuda
