Discussion:
About Ceph SSD and HDD strategy
Martin Catudal
2013-10-07 01:23:15 UTC
Hi guys,
I have read the Ceph documentation more than twice. I'm now very
comfortable with every aspect of Ceph except the strategy for using
my SSDs and HDDs.

Here is my thinking.

As I see it, there are two approaches to using my fast SSDs (900 GB) for
primary storage and my large but slower HDDs (4 TB) for replicas.

FIRST APPROACH
1. Enable write caching and put the primary copies on my SSDs, letting
the replicas go to my 7200 RPM drives.
With the write cache enabled, my VDI user VMs should gain performance,
since the Ceph client will not have to wait for the replica write
acknowledgements from the slower HDDs.

SECOND APPROACH
2. Use a CRUSH hierarchy: the pool takes its primary copy from the SSDs
and sends the replicas to the HDDs under a second root named "platter",
as explained in the Ceph documentation:
rule ssd-primary {
        ruleset 4
        type replicated
        min_size 5
        max_size 10
        step take ssd
        step chooseleaf firstn 1 type host
        step emit
        step take platter
        step chooseleaf firstn -1 type host
        step emit
}
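
If I understand the docs correctly, the workflow to put a rule like that in
place would be roughly the following (the pool name "vms" is only an example,
and I have not tried this yet):

# Example only: decompile the CRUSH map, add the rule, recompile and inject
# it, then point a pool at ruleset 4.
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# ... add the ssd-primary rule above to crushmap.txt ...
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
ceph osd pool set vms crush_ruleset 4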

At this point, I can't figure out which approach would have the most
advantages.

Your point of view would definitely help me.

Sincerely,
Martin

--
Martin Catudal
IT Manager
Ressources Metanor Inc
Direct line: (819) 218-2708
Mike Lowe
2013-10-07 15:25:00 UTC
Based on my experience, I think you are grossly underestimating the expense and frequency of the flushes issued by your VMs. This will be especially bad if you aren't using the async flush from qemu >= 1.4.2, since the VM is suspended while qemu waits for the flush to finish. Until the caching-pool work is completed (I believe it is currently in development), I think your best course of action is either to use the SSDs as large caches with bcache or to use them as journal devices. I'm sure there are other, more informed opinions out there on the best use of SSDs in a Ceph cluster, and hopefully they will chime in.
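
For reference, a minimal sketch of the setup I mean (pool and image names are
made up): the guest's flushes go through the RBD writeback cache when the
drive is attached along these lines:

# Example only: attach an RBD image to a guest with writeback caching
qemu-system-x86_64 -m 2048 \
  -drive format=raw,file=rbd:rbd/vm-disk-1,cache=writeback,if=virtio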

Martin Catudal
2013-10-07 15:34:48 UTC
Thanks, Mike.
Kyle Bader also suggested that I use my large SSDs (900 GB) as cache
drives with "bcache" or "flashcache".
Since I already plan to use SSDs for my journals, I will certainly use
them as cache drives as well.

I will have to read the documentation on "bcache" and its integration
with Ceph.
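
If I understand the bcache docs correctly, a minimal setup would look roughly
like this (device names are placeholders and I have not tested any of it yet):

# Example only: /dev/sdb = SSD cache device, /dev/sdc = spinning backing device
make-bcache -C /dev/sdb -B /dev/sdc
echo /dev/sdb > /sys/fs/bcache/register
echo /dev/sdc > /sys/fs/bcache/register
echo writeback > /sys/block/bcache0/bcache/cache_mode
# the OSD filesystem would then be created on /dev/bcache0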

Martin

Martin Catudal
IT Manager
Ressources Metanor Inc
Direct line: (819) 218-2708

Jason Villalta
2013-10-07 15:39:55 UTC
I also would be interested in how bcache or flashcache would integrate.





--
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com
Jason Villalta
2013-10-07 15:43:56 UTC
I found this without much effort.
http://www.sebastien-han.fr/blog/2012/11/15/make-your-rbd-fly-with-flashcache/


Jason Villalta
2013-10-08 02:27:06 UTC
I tried putting Flashcache on my spindle OSDs using an Intel SSD and it
works great. This gives me both read and write SSD caching instead of just
write performance on the journal. It should also allow me to protect the
OSD journal by keeping it on the same drive as the OSD data, while still
getting the benefits of SSD caching for writes.
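
For anyone curious, the basic flashcache flow is roughly the following (device
names are examples, not my actual layout):

# Example only: /dev/sdb = SSD partition, /dev/sdc = spinning OSD disk
# -p back selects writeback mode; the cache device name comes first
flashcache_create -p back osd0cache /dev/sdb /dev/sdc
# the OSD filesystem then lives on the combined device
mkfs.xfs /dev/mapper/osd0cache
mount /dev/mapper/osd0cache /var/lib/ceph/osd/ceph-0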


Robert van Leeuwen
2013-10-08 07:09:47 UTC

A small note on Red Hat based distros + Flashcache + XFS:
There is a major issue (kernel panics) running XFS + Flashcache on a 6.4 kernel (anything higher than 2.6.32-279).
It should be fixed in kernel 2.6.32-387.el6, which, I assume, will be 6.5, which has only just entered beta.

For more info, take a look here:
https://github.com/facebook/flashcache/issues/113

Since I've hit this issue (thankfully in our dev environment) we are slightly less enthusiastic about running Flashcache :(
It also adds a layer of complexity, so I would rather just run the journals on SSD, at least on Red Hat.
I'm not sure about the performance difference between just journals and Flashcache, but I'd be happy to read any such comparison :)

Also, if you want to make use of the SSD trim func

P.S. My experience with Flashcache is on OpenStack Swift & Nova, not Ceph.


Kyle Bader
2013-10-09 21:52:48 UTC
Journal on SSD should effectively double your throughput, because data will
no longer be written to the same device twice to ensure transactional
integrity. Additionally, by placing the OSD journal on an SSD you should see
lower latency, since the disk head no longer has to seek back and forth
between the journal and data partitions. For large writes it's not as
critical to have a device that supports high IOPS or throughput, because
large writes are striped across many 4 MB RADOS objects, distributed
relatively evenly across the cluster. Small write operations will benefit
the most from an OSD data partition with a writeback cache like
btier/flashcache, because it can absorb an order of magnitude more IOPS and
let the slower spinning device catch up when there is less activity.
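
As a rough sketch, moving an existing OSD's journal onto an SSD partition
looks something like this (osd.0 and the partition label are only examples):

# Example only: stop the OSD, flush its journal, point it at the SSD
# partition, recreate the journal and start the OSD again
service ceph stop osd.0
ceph-osd -i 0 --flush-journal
cat >> /etc/ceph/ceph.conf <<'EOF'
[osd.0]
        osd journal = /dev/disk/by-partlabel/ceph-journal-0
EOF
ceph-osd -i 0 --mkjournal
service ceph start osd.0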




--

Kyle
Warren Wang
2013-10-09 22:45:56 UTC
While in theory this should be true, I'm not finding it to be the case for a typical enterprise LSI card with 24 drives attached. We tried a variety of ratios and went back to collocated journals on the spinning drives.

Eagerly awaiting the tiered performance changes to implement a faster tier via SSD.

--
Warren

Kyle Bader
2013-10-10 11:43:59 UTC
It's hard to comment on how your experience could be made better without
more information about your configuration and how you're testing: which
LSI controller model, the PCI-E bus speed, the number of expander cables,
the drive type, the number of SSDs, and whether the SSDs were connected to
the controller or directly to a SATA2/SATA3 port on the mainboard. You
mentioned using an SSD journal but nothing about a writeback cache; did
you try both? I'm also curious what kind of workload didn't get better
with an external journal - was this with rados bench?
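
If you have numbers handy, even a plain rados bench run would make them easy
to compare (the pool name is just an example):

# Example only: 60 seconds of 4 MB writes with 16 concurrent operations
rados -p testpool bench 60 write -t 16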

I'm really excited about tiering. It will disaggregate the SSDs and allow
more flexibility in cephstore chassis selection, because you no longer have
to maintain strict SSD:drive ratios - this seems like a much more elegant
and maintainable solution.




--

Kyle
Kyle Bader
2013-10-10 11:43:59 UTC
Permalink
It's hard to comment on how your experience could be made better without
more information about your configuration and how your testing. Anything
along the lines of what LSI controller model, PCI-E bus speed, number of
expander cables, drive type, number of SSDs and whether the SSDs were
connected to the controller or directly to SATA2/SATA3 port on the
mainboard. You mentioned using SSD journal but nothing about a writeback
cache, did you try both? I'm also curious about what kind of workload
didn't get better with an external journal, was this with rados-bench?

I'm really excited about tiering, it will disaggregate the SSDs and allow
more flexibility in cephstore chassis selection because you no longer have
to maintain strict SSD:drive ratios - this seems like a much more elegant
and maintainable solution.


On Wed, Oct 9, 2013 at 3:45 PM, Warren Wang <warren at wangspeed.com> wrote:

> While in theory this should be true, I'm not finding it to be the case for
> a typical enterprise LSI card with 24 drives attached. We tried a variety
> of ratios and went back to collocated journals on the spinning drives.
>
> Eagerly awaiting the tiered performance changes to implement a faster tier
> via SSD.
>
> --
> Warren
>
> On Oct 9, 2013, at 5:52 PM, Kyle Bader <kyle.bader at gmail.com> wrote:
>
> Journal on SSD should effectively double your throughput because data will
> not be written to the same device twice to ensure transactional integrity.
> Additionally, by placing the OSD journal on an SSD you should see less
> latency, the disk head no longer has to seek back and forth between the
> journal and data partitions. For large writes it's not as critical to
> have a device that supports high IOPs or throughput because large writes
> are striped across many 4MB rados objects, relatively evenly distributed
> across the cluster. Small write operations will benefit the most from an
> OSD data partition with a writeback cache like btier/flashcache because it
> can absorbs an order of magnitude more IOPs and allow a slower spinning
> device catch up when there is less activity.
>
>
> On Tue, Oct 8, 2013 at 12:09 AM, Robert van Leeuwen <
> Robert.vanLeeuwen at spilgames.com> wrote:
>
>> > I tried putting Flashcache on my spindle OSDs using an Intel SSL and
>> it works great.
>> > This is getting me read and write SSD caching instead of just write
>> performance on the journal.
>> > It should also allow me to protect the OSD journal on the same drive as
>> the OSD data and still get benefits of SSD caching for writes.
>>
>> Small note that on Red Hat based distro's + Flashcache + XFS:
>> There is a major issue (kernel panics) running xfs + flashcache on a 6.4
>> kernel. (anything higher then 2.6.32-279)
>> It should be fixed in kernel 2.6.32-387.el6 which, I assume, will be 6.5
>> which only just entered Beta.
>>
>> Fore more info, take a look here:
>> https://github.com/facebook/flashcache/issues/113
>>
>> Since I've hit this issue (thankfully in our dev environment) we are
>> slightly less enthusiastic about running flashcache :(
>> It also adds a layer of complexity so I would rather just run the
>> journals on SSD, at least on Redhat.
>> I'm not sure about the performance difference of just journals v.s.
>> Flashcache but I'd be happy to read any such comparison :)
>>
>> Also, if you want to make use of the SSD trim func
>>
>> P.S. My experience with Flashcache is on Openstack Swift & Nova not Ceph.
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
>
> Kyle
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


--

Kyle
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20131010/32d57497/attachment-0002.htm>
Kyle Bader
2013-10-10 11:43:59 UTC
Permalink
It's hard to comment on how your experience could be made better without
more information about your configuration and how your testing. Anything
along the lines of what LSI controller model, PCI-E bus speed, number of
expander cables, drive type, number of SSDs and whether the SSDs were
connected to the controller or directly to SATA2/SATA3 port on the
mainboard. You mentioned using SSD journal but nothing about a writeback
cache, did you try both? I'm also curious about what kind of workload
didn't get better with an external journal, was this with rados-bench?

I'm really excited about tiering, it will disaggregate the SSDs and allow
more flexibility in cephstore chassis selection because you no longer have
to maintain strict SSD:drive ratios - this seems like a much more elegant
and maintainable solution.


On Wed, Oct 9, 2013 at 3:45 PM, Warren Wang <warren at wangspeed.com> wrote:

> While in theory this should be true, I'm not finding it to be the case for
> a typical enterprise LSI card with 24 drives attached. We tried a variety
> of ratios and went back to collocated journals on the spinning drives.
>
> Eagerly awaiting the tiered performance changes to implement a faster tier
> via SSD.
>
> --
> Warren
>
> On Oct 9, 2013, at 5:52 PM, Kyle Bader <kyle.bader at gmail.com> wrote:
>
> Journal on SSD should effectively double your throughput because data will
> not be written to the same device twice to ensure transactional integrity.
> Additionally, by placing the OSD journal on an SSD you should see less
> latency, the disk head no longer has to seek back and forth between the
> journal and data partitions. For large writes it's not as critical to
> have a device that supports high IOPs or throughput because large writes
> are striped across many 4MB rados objects, relatively evenly distributed
> across the cluster. Small write operations will benefit the most from an
> OSD data partition with a writeback cache like btier/flashcache because it
> can absorbs an order of magnitude more IOPs and allow a slower spinning
> device catch up when there is less activity.
>
>
> On Tue, Oct 8, 2013 at 12:09 AM, Robert van Leeuwen <
> Robert.vanLeeuwen at spilgames.com> wrote:
>
>> > I tried putting Flashcache on my spindle OSDs using an Intel SSL and
>> it works great.
>> > This is getting me read and write SSD caching instead of just write
>> performance on the journal.
>> > It should also allow me to protect the OSD journal on the same drive as
>> the OSD data and still get benefits of SSD caching for writes.
>>
>> Small note that on Red Hat based distro's + Flashcache + XFS:
>> There is a major issue (kernel panics) running xfs + flashcache on a 6.4
>> kernel. (anything higher then 2.6.32-279)
>> It should be fixed in kernel 2.6.32-387.el6 which, I assume, will be 6.5
>> which only just entered Beta.
>>
>> Fore more info, take a look here:
>> https://github.com/facebook/flashcache/issues/113
>>
>> Since I've hit this issue (thankfully in our dev environment) we are
>> slightly less enthusiastic about running flashcache :(
>> It also adds a layer of complexity so I would rather just run the
>> journals on SSD, at least on Redhat.
>> I'm not sure about the performance difference of just journals v.s.
>> Flashcache but I'd be happy to read any such comparison :)
>>
>> Also, if you want to make use of the SSD trim func
>>
>> P.S. My experience with Flashcache is on Openstack Swift & Nova not Ceph.
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
>
> Kyle
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


--

Kyle
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20131010/32d57497/attachment-0003.htm>
Kyle Bader
2013-10-10 11:43:59 UTC
Permalink
It's hard to comment on how your experience could be made better without
more information about your configuration and how your testing. Anything
along the lines of what LSI controller model, PCI-E bus speed, number of
expander cables, drive type, number of SSDs and whether the SSDs were
connected to the controller or directly to SATA2/SATA3 port on the
mainboard. You mentioned using SSD journal but nothing about a writeback
cache, did you try both? I'm also curious about what kind of workload
didn't get better with an external journal, was this with rados-bench?

I'm really excited about tiering, it will disaggregate the SSDs and allow
more flexibility in cephstore chassis selection because you no longer have
to maintain strict SSD:drive ratios - this seems like a much more elegant
and maintainable solution.


On Wed, Oct 9, 2013 at 3:45 PM, Warren Wang <warren at wangspeed.com> wrote:

> While in theory this should be true, I'm not finding it to be the case for
> a typical enterprise LSI card with 24 drives attached. We tried a variety
> of ratios and went back to collocated journals on the spinning drives.
>
> Eagerly awaiting the tiered performance changes to implement a faster tier
> via SSD.
>
> --
> Warren
>
> On Oct 9, 2013, at 5:52 PM, Kyle Bader <kyle.bader at gmail.com> wrote:
>
> Journal on SSD should effectively double your throughput because data will
> not be written to the same device twice to ensure transactional integrity.
> Additionally, by placing the OSD journal on an SSD you should see less
> latency, the disk head no longer has to seek back and forth between the
> journal and data partitions. For large writes it's not as critical to
> have a device that supports high IOPs or throughput because large writes
> are striped across many 4MB rados objects, relatively evenly distributed
> across the cluster. Small write operations will benefit the most from an
> OSD data partition with a writeback cache like btier/flashcache because it
> can absorbs an order of magnitude more IOPs and allow a slower spinning
> device catch up when there is less activity.
>
>
> On Tue, Oct 8, 2013 at 12:09 AM, Robert van Leeuwen <
> Robert.vanLeeuwen at spilgames.com> wrote:
>
>> > I tried putting Flashcache on my spindle OSDs using an Intel SSL and
>> it works great.
>> > This is getting me read and write SSD caching instead of just write
>> performance on the journal.
>> > It should also allow me to protect the OSD journal on the same drive as
>> the OSD data and still get benefits of SSD caching for writes.
>>
>> Small note that on Red Hat based distro's + Flashcache + XFS:
>> There is a major issue (kernel panics) running xfs + flashcache on a 6.4
>> kernel. (anything higher then 2.6.32-279)
>> It should be fixed in kernel 2.6.32-387.el6 which, I assume, will be 6.5
>> which only just entered Beta.
>>
>> Fore more info, take a look here:
>> https://github.com/facebook/flashcache/issues/113
>>
>> Since I've hit this issue (thankfully in our dev environment) we are
>> slightly less enthusiastic about running flashcache :(
>> It also adds a layer of complexity so I would rather just run the
>> journals on SSD, at least on Redhat.
>> I'm not sure about the performance difference of just journals v.s.
>> Flashcache but I'd be happy to read any such comparison :)
>>
>> Also, if you want to make use of the SSD trim func
>>
>> P.S. My experience with Flashcache is on Openstack Swift & Nova not Ceph.
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
>
> Kyle
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


--

Kyle
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20131010/32d57497/attachment-0004.htm>
Warren Wang
2013-10-09 22:45:56 UTC
Permalink
While in theory this should be true, I'm not finding it to be the case for a typical enterprise LSI card with 24 drives attached. We tried a variety of ratios and went back to collocated journals on the spinning drives.

Eagerly awaiting the tiered performance changes to implement a faster tier via SSD.

--
Warren

On Oct 9, 2013, at 5:52 PM, Kyle Bader <kyle.bader at gmail.com> wrote:

> Journal on SSD should effectively double your throughput because data will not be written to the same device twice to ensure transactional integrity. Additionally, by placing the OSD journal on an SSD you should see less latency, the disk head no longer has to seek back and forth between the journal and data partitions. For large writes it's not as critical to have a device that supports high IOPs or throughput because large writes are striped across many 4MB rados objects, relatively evenly distributed across the cluster. Small write operations will benefit the most from an OSD data partition with a writeback cache like btier/flashcache because it can absorbs an order of magnitude more IOPs and allow a slower spinning device catch up when there is less activity.
>
>
> On Tue, Oct 8, 2013 at 12:09 AM, Robert van Leeuwen <Robert.vanLeeuwen at spilgames.com> wrote:
>> > I tried putting Flashcache on my spindle OSDs using an Intel SSL and it works great.
>> > This is getting me read and write SSD caching instead of just write performance on the journal.
>> > It should also allow me to protect the OSD journal on the same drive as the OSD data and still get benefits of SSD caching for writes.
>>
>> Small note that on Red Hat based distro's + Flashcache + XFS:
>> There is a major issue (kernel panics) running xfs + flashcache on a 6.4 kernel. (anything higher then 2.6.32-279)
>> It should be fixed in kernel 2.6.32-387.el6 which, I assume, will be 6.5 which only just entered Beta.
>>
>> Fore more info, take a look here:
>> https://github.com/facebook/flashcache/issues/113
>>
>> Since I've hit this issue (thankfully in our dev environment) we are slightly less enthusiastic about running flashcache :(
>> It also adds a layer of complexity so I would rather just run the journals on SSD, at least on Redhat.
>> I'm not sure about the performance difference of just journals v.s. Flashcache but I'd be happy to read any such comparison :)
>>
>> Also, if you want to make use of the SSD trim func
>>
>> P.S. My experience with Flashcache is on Openstack Swift & Nova not Ceph.
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
>
> Kyle
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20131009/d732ebe4/attachment-0002.htm>
Warren Wang
2013-10-09 22:45:56 UTC
Permalink
While in theory this should be true, I'm not finding it to be the case for a typical enterprise LSI card with 24 drives attached. We tried a variety of ratios and went back to collocated journals on the spinning drives.

Eagerly awaiting the tiered performance changes to implement a faster tier via SSD.

--
Warren

On Oct 9, 2013, at 5:52 PM, Kyle Bader <kyle.bader at gmail.com> wrote:

> Journal on SSD should effectively double your throughput because data will not be written to the same device twice to ensure transactional integrity. Additionally, by placing the OSD journal on an SSD you should see less latency, the disk head no longer has to seek back and forth between the journal and data partitions. For large writes it's not as critical to have a device that supports high IOPs or throughput because large writes are striped across many 4MB rados objects, relatively evenly distributed across the cluster. Small write operations will benefit the most from an OSD data partition with a writeback cache like btier/flashcache because it can absorbs an order of magnitude more IOPs and allow a slower spinning device catch up when there is less activity.
>
>
> On Tue, Oct 8, 2013 at 12:09 AM, Robert van Leeuwen <Robert.vanLeeuwen at spilgames.com> wrote:
>> > I tried putting Flashcache on my spindle OSDs using an Intel SSL and it works great.
>> > This is getting me read and write SSD caching instead of just write performance on the journal.
>> > It should also allow me to protect the OSD journal on the same drive as the OSD data and still get benefits of SSD caching for writes.
>>
>> Small note that on Red Hat based distro's + Flashcache + XFS:
>> There is a major issue (kernel panics) running xfs + flashcache on a 6.4 kernel. (anything higher then 2.6.32-279)
>> It should be fixed in kernel 2.6.32-387.el6 which, I assume, will be 6.5 which only just entered Beta.
>>
>> Fore more info, take a look here:
>> https://github.com/facebook/flashcache/issues/113
>>
>> Since I've hit this issue (thankfully in our dev environment) we are slightly less enthusiastic about running flashcache :(
>> It also adds a layer of complexity so I would rather just run the journals on SSD, at least on Redhat.
>> I'm not sure about the performance difference of just journals v.s. Flashcache but I'd be happy to read any such comparison :)
>>
>> Also, if you want to make use of the SSD trim func
>>
>> P.S. My experience with Flashcache is on Openstack Swift & Nova not Ceph.
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
>
> Kyle
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20131009/d732ebe4/attachment-0003.htm>
Warren Wang
2013-10-09 22:45:56 UTC
Permalink
While in theory this should be true, I'm not finding it to be the case for a typical enterprise LSI card with 24 drives attached. We tried a variety of ratios and went back to collocated journals on the spinning drives.

Eagerly awaiting the tiered performance changes to implement a faster tier via SSD.

--
Warren

On Oct 9, 2013, at 5:52 PM, Kyle Bader <kyle.bader at gmail.com> wrote:

> Journal on SSD should effectively double your throughput because data will not be written to the same device twice to ensure transactional integrity. Additionally, by placing the OSD journal on an SSD you should see less latency, the disk head no longer has to seek back and forth between the journal and data partitions. For large writes it's not as critical to have a device that supports high IOPs or throughput because large writes are striped across many 4MB rados objects, relatively evenly distributed across the cluster. Small write operations will benefit the most from an OSD data partition with a writeback cache like btier/flashcache because it can absorbs an order of magnitude more IOPs and allow a slower spinning device catch up when there is less activity.
>
>
> On Tue, Oct 8, 2013 at 12:09 AM, Robert van Leeuwen <Robert.vanLeeuwen at spilgames.com> wrote:
>> > I tried putting Flashcache on my spindle OSDs using an Intel SSL and it works great.
>> > This is getting me read and write SSD caching instead of just write performance on the journal.
>> > It should also allow me to protect the OSD journal on the same drive as the OSD data and still get benefits of SSD caching for writes.
>>
>> Small note that on Red Hat based distro's + Flashcache + XFS:
>> There is a major issue (kernel panics) running xfs + flashcache on a 6.4 kernel. (anything higher then 2.6.32-279)
>> It should be fixed in kernel 2.6.32-387.el6 which, I assume, will be 6.5 which only just entered Beta.
>>
>> Fore more info, take a look here:
>> https://github.com/facebook/flashcache/issues/113
>>
>> Since I've hit this issue (thankfully in our dev environment) we are slightly less enthusiastic about running flashcache :(
>> It also adds a layer of complexity so I would rather just run the journals on SSD, at least on Redhat.
>> I'm not sure about the performance difference of just journals v.s. Flashcache but I'd be happy to read any such comparison :)
>>
>> Also, if you want to make use of the SSD trim func
>>
>> P.S. My experience with Flashcache is on Openstack Swift & Nova not Ceph.
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
>
> Kyle
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20131009/d732ebe4/attachment-0004.htm>
Kyle Bader
2013-10-09 21:52:48 UTC
Permalink
Journal on SSD should effectively double your throughput because data will
not be written to the same device twice to ensure transactional integrity.
Additionally, by placing the OSD journal on an SSD you should see less
latency, the disk head no longer has to seek back and forth between the
journal and data partitions. For large writes it's not as critical to have
a device that supports high IOPs or throughput because large writes are
striped across many 4MB rados objects, relatively evenly distributed across
the cluster. Small write operations will benefit the most from an OSD data
partition with a writeback cache like btier/flashcache because it can
absorbs an order of magnitude more IOPs and allow a slower spinning device
catch up when there is less activity.


On Tue, Oct 8, 2013 at 12:09 AM, Robert van Leeuwen <
Robert.vanLeeuwen at spilgames.com> wrote:

> > I tried putting Flashcache on my spindle OSDs using an Intel SSL and
> it works great.
> > This is getting me read and write SSD caching instead of just write
> performance on the journal.
> > It should also allow me to protect the OSD journal on the same drive as
> the OSD data and still get benefits of SSD caching for writes.
>
> Small note that on Red Hat based distro's + Flashcache + XFS:
> There is a major issue (kernel panics) running xfs + flashcache on a 6.4
> kernel. (anything higher then 2.6.32-279)
> It should be fixed in kernel 2.6.32-387.el6 which, I assume, will be 6.5
> which only just entered Beta.
>
> Fore more info, take a look here:
> https://github.com/facebook/flashcache/issues/113
>
> Since I've hit this issue (thankfully in our dev environment) we are
> slightly less enthusiastic about running flashcache :(
> It also adds a layer of complexity so I would rather just run the journals
> on SSD, at least on Redhat.
> I'm not sure about the performance difference of just journals v.s.
> Flashcache but I'd be happy to read any such comparison :)
>
> Also, if you want to make use of the SSD trim func
>
> P.S. My experience with Flashcache is on Openstack Swift & Nova not Ceph.
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


--

Kyle
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20131009/8cc35732/attachment-0002.htm>
Robert van Leeuwen
2013-10-08 07:09:47 UTC
Permalink
> I tried putting Flashcache on my spindle OSDs using an Intel SSD and it works great.
> This is getting me read and write SSD caching instead of just write performance on the journal.
> It should also allow me to protect the OSD journal on the same drive as the OSD data and still get benefits of SSD caching for writes.

A small note on Red Hat-based distros + Flashcache + XFS:
There is a major issue (kernel panics) running xfs + flashcache on a 6.4 kernel (anything higher than 2.6.32-279).
It should be fixed in kernel 2.6.32-387.el6 which, I assume, will be 6.5, which has only just entered Beta.

For more info, take a look here:
https://github.com/facebook/flashcache/issues/113

Since I've hit this issue (thankfully in our dev environment) we are slightly less enthusiastic about running flashcache :(
It also adds a layer of complexity, so I would rather just run the journals on SSD, at least on Red Hat.
I'm not sure about the performance difference of just journals vs. Flashcache, but I'd be happy to read any such comparison :)

Also, if you want to make use of the SSD trim func

P.S. My experience with Flashcache is on Openstack Swift & Nova not Ceph.
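On the TRIM point (the sentence above is cut off in the archive), the usual options for an SSD-backed filesystem are the discard mount option or a periodic fstrim run. A generic sketch with placeholder device and mount point, not necessarily what was meant here:

    # continuous TRIM via the discard mount option (ext4/XFS)
    mount -o discard /dev/sda4 /var/lib/ceph/osd/ceph-0

    # or batched TRIM, e.g. from a cron job
    fstrim -v /var/lib/ceph/osd/ceph-0

Whether TRIM requests survive a caching layer such as flashcache is a separate question and worth checking against the flashcache documentation.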


Jason Villalta
2013-10-08 02:27:06 UTC
Permalink
I tried putting Flashcache on my spindle OSDs using an Intel SSD and it
works great. This gets me read and write SSD caching instead of just
write performance on the journal. It should also allow me to protect the
OSD journal on the same drive as the OSD data and still get the benefits of
SSD caching for writes.
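The rough shape of such a setup, as a sketch only (device names are placeholders, and the exact arguments should be double-checked against the flashcache documentation and the blog post linked just below), is to build a writeback cache device from the SSD and the spindle and put the OSD filesystem on it:

    # writeback cache: SSD partition in front of the spinning OSD disk
    flashcache_create -p back osd0_cache /dev/sda4 /dev/sdb

    # then create and mount the OSD filesystem on the cached device
    mkfs.xfs /dev/mapper/osd0_cache
    mount /dev/mapper/osd0_cache /var/lib/ceph/osd/ceph-0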


On Mon, Oct 7, 2013 at 11:43 AM, Jason Villalta <jason at rubixnet.com> wrote:

> I found this without much effort.
>
> http://www.sebastien-han.fr/blog/2012/11/15/make-your-rbd-fly-with-flashcache/
>
>
> On Mon, Oct 7, 2013 at 11:39 AM, Jason Villalta <jason at rubixnet.com>wrote:
>
>> I also would be interested in how bcache or flashcache would integrate.
>>
>>
>> On Mon, Oct 7, 2013 at 11:34 AM, Martin Catudal <mcatudal at metanor.ca>wrote:
>>
>>> Thanks, Mike,
>>> Kyle Bader also suggested that I use my large SSD (900 GB) as a cache
>>> drive using "bcache" or "flashcache".
>>> Since I have already planned to use SSD for my journal, I would certainly
>>> also use SSD as a cache drive in addition.
>>>
>>> I will have to read the documentation about "bcache" and its integration
>>> with Ceph.
>>>
>>> Martin
>>>
>>> Martin Catudal
>>> Responsable TIC
>>> Ressources Metanor Inc
>>> Ligne directe: (819) 218-2708
>>>
>>> On 2013-10-07 11:25, Mike Lowe wrote:
>>> > Based on my experience I think you are grossly underestimating the
>>> expense and frequency of flushes issued from your vm's. This will be
>>> especially bad if you aren't using the async flush from qemu >= 1.4.2 as
>>> the vm is suspended while qemu waits for the flush to finish. I think your
>>> best course of action until the caching pool work is completed (I think I
>>> remember correctly that this is currently in development) is to either use
>>> the ssd's as large caches with bcache or to use them for journal devices.
>>> I'm sure there are some other more informed opinions out there on the best
>>> use of ssd's in a ceph cluster and hopefully they will chime in.
>>> >
>>> > On Oct 6, 2013, at 9:23 PM, Martin Catudal <mcatudal at metanor.ca>
>>> wrote:
>>> >
>>> >> Hi Guys,
>>> >> I read all Ceph documentation more than twice. I'm now very
>>> >> comfortable with all the aspect of Ceph except for the strategy of
>>> using
>>> >> my SSD and HDD.
>>> >>
>>> >> Here is my reflexion
>>> >>
>>> >> I've two approach in my understanding about use fast SSD (900 GB) for
>>> my
>>> >> primary storage and huge but slower HDD (4 TB) for replicas.
>>> >>
>>> >> FIRST APPROACH
>>> >> 1. I can use PG with cache write enable as my primary storage that's
>>> >> goes on my SSD and let replicas goes on my 7200 RPM.
>>> >> With the cache write enable, I will gain performance for my VM
>>> >> user machine in VDI environment since Ceph client will not have to
>>> wait
>>> >> for the replicas write confirmation on the slower HDD.
>>> >>
>>> >> SECOND APPROACH
>>> >> 2. Use pools hierarchies and let's have one pool for the SSD as
>>> primary
>>> >> and lets the replicas goes to a second pool name platter for HDD
>>> >> replication.
>>> >> As explain in the Ceph documentation
>>> >> rule ssd-primary {
>>> >> ruleset 4
>>> >> type replicated
>>> >> min_size 5
>>> >> max_size 10
>>> >> step take ssd
>>> >> step chooseleaf firstn 1 type host
>>> >> step emit
>>> >> step take platter
>>> >> step chooseleaf firstn -1 type host
>>> >> step emit
>>> >> }
>>> >>
>>> >> At this point, I could not figure out what approach could have the
>>> most
>>> >> advantage.
>>> >>
>>> >> Your point of view would definitely help me.
>>> >>
>>> >> Sincerely,
>>> >> Martin
>>> >>
>>> >> --
>>> >> Martin Catudal
>>> >> Responsable TIC
>>> >> Ressources Metanor Inc
>>> >> Ligne directe: (819) 218-2708
>>> >> _______________________________________________
>>> >> ceph-users mailing list
>>> >> ceph-users at lists.ceph.com
>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users at lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>>
>> --
>> --
>> *Jason Villalta*
>> Co-founder
>> [image: Inline image 1]
>> 800.799.4407x1230 | www.RubixTechnology.com<http://www.rubixtechnology.com/>
>>
>
>
>
> --
> --
> *Jason Villalta*
> Co-founder
> [image: Inline image 1]
> 800.799.4407x1230 | www.RubixTechnology.com<http://www.rubixtechnology.com/>
>



--
--
*Jason Villalta*
Co-founder
[image: Inline image 1]
800.799.4407x1230 | www.RubixTechnology.com<http://www.rubixtechnology.com/>
Jason Villalta
2013-10-07 15:43:56 UTC
Permalink
I found this without much effort.
http://www.sebastien-han.fr/blog/2012/11/15/make-your-rbd-fly-with-flashcache/


On Mon, Oct 7, 2013 at 11:39 AM, Jason Villalta <jason at rubixnet.com> wrote:

> I also would be interested in how bcache or flashcache would integrate.
>
>
> On Mon, Oct 7, 2013 at 11:34 AM, Martin Catudal <mcatudal at metanor.ca>wrote:
>
>> Thanks, Mike,
>> Kyle Bader also suggested that I use my large SSD (900 GB) as a cache
>> drive using "bcache" or "flashcache".
>> Since I have already planned to use SSD for my journal, I would certainly
>> also use SSD as a cache drive in addition.
>>
>> I will have to read the documentation about "bcache" and its integration
>> with Ceph.
>>
>> Martin
>>
>> Martin Catudal
>> Responsable TIC
>> Ressources Metanor Inc
>> Ligne directe: (819) 218-2708
>>
>> On 2013-10-07 11:25, Mike Lowe wrote:
>> > Based on my experience I think you are grossly underestimating the
>> expense and frequency of flushes issued from your vm's. This will be
>> especially bad if you aren't using the async flush from qemu >= 1.4.2 as
>> the vm is suspended while qemu waits for the flush to finish. I think your
>> best course of action until the caching pool work is completed (I think I
>> remember correctly that this is currently in development) is to either use
>> the ssd's as large caches with bcache or to use them for journal devices.
>> I'm sure there are some other more informed opinions out there on the best
>> use of ssd's in a ceph cluster and hopefully they will chime in.
>> >
>> > On Oct 6, 2013, at 9:23 PM, Martin Catudal <mcatudal at metanor.ca> wrote:
>> >
>> >> Hi Guys,
>> >> I read all Ceph documentation more than twice. I'm now very
>> >> comfortable with all the aspect of Ceph except for the strategy of
>> using
>> >> my SSD and HDD.
>> >>
>> >> Here is my reflexion
>> >>
>> >> I've two approach in my understanding about use fast SSD (900 GB) for
>> my
>> >> primary storage and huge but slower HDD (4 TB) for replicas.
>> >>
>> >> FIRST APPROACH
>> >> 1. I can use PG with cache write enable as my primary storage that's
>> >> goes on my SSD and let replicas goes on my 7200 RPM.
>> >> With the cache write enable, I will gain performance for my VM
>> >> user machine in VDI environment since Ceph client will not have to wait
>> >> for the replicas write confirmation on the slower HDD.
>> >>
>> >> SECOND APPROACH
>> >> 2. Use pools hierarchies and let's have one pool for the SSD as primary
>> >> and lets the replicas goes to a second pool name platter for HDD
>> >> replication.
>> >> As explain in the Ceph documentation
>> >> rule ssd-primary {
>> >> ruleset 4
>> >> type replicated
>> >> min_size 5
>> >> max_size 10
>> >> step take ssd
>> >> step chooseleaf firstn 1 type host
>> >> step emit
>> >> step take platter
>> >> step chooseleaf firstn -1 type host
>> >> step emit
>> >> }
>> >>
>> >> At this point, I could not figure out what approach could have the most
>> >> advantage.
>> >>
>> >> Your point of view would definitely help me.
>> >>
>> >> Sincerely,
>> >> Martin
>> >>
>> >> --
>> >> Martin Catudal
>> >> Responsable TIC
>> >> Ressources Metanor Inc
>> >> Ligne directe: (819) 218-2708
>> >> _______________________________________________
>> >> ceph-users mailing list
>> >> ceph-users at lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> --
> *Jason Villalta*
> Co-founder
> [image: Inline image 1]
> 800.799.4407x1230 | www.RubixTechnology.com<http://www.rubixtechnology.com/>
>



--
--
*Jason Villalta*
Co-founder
[image: Inline image 1]
800.799.4407x1230 | www.RubixTechnology.com<http://www.rubixtechnology.com/>
Jason Villalta
2013-10-07 15:39:55 UTC
Permalink
I also would be interested in how bcache or flashcache would integrate.
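For what it's worth, the basic bcache flow looks roughly like this; a sketch with placeholder device names, assuming a kernel with the bcache module and the bcache-tools userspace:

    # format the backing spindle and the SSD cache and attach them in one go
    make-bcache -B /dev/sdb -C /dev/sda4

    # the combined device appears as /dev/bcache0; use it as the OSD disk
    mkfs.xfs /dev/bcache0
    mount /dev/bcache0 /var/lib/ceph/osd/ceph-0

    # optionally switch the cache from the default writethrough to writeback
    echo writeback > /sys/block/bcache0/bcache/cache_mode

Note that bcache was only merged into the mainline kernel in 3.10, so on the 2.6.32-based RHEL 6 kernels mentioned earlier in the thread it is not an option without running a much newer kernel.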


On Mon, Oct 7, 2013 at 11:34 AM, Martin Catudal <mcatudal at metanor.ca> wrote:

> Thanks, Mike,
> Kyle Bader also suggested that I use my large SSD (900 GB) as a cache
> drive using "bcache" or "flashcache".
> Since I have already planned to use SSD for my journal, I would certainly
> also use SSD as a cache drive in addition.
>
> I will have to read the documentation about "bcache" and its integration
> with Ceph.
>
> Martin
>
> Martin Catudal
> Responsable TIC
> Ressources Metanor Inc
> Ligne directe: (819) 218-2708
>
> On 2013-10-07 11:25, Mike Lowe wrote:
> > Based on my experience I think you are grossly underestimating the
> expense and frequency of flushes issued from your vm's. This will be
> especially bad if you aren't using the async flush from qemu >= 1.4.2 as
> the vm is suspended while qemu waits for the flush to finish. I think your
> best course of action until the caching pool work is completed (I think I
> remember correctly that this is currently in development) is to either use
> the ssd's as large caches with bcache or to use them for journal devices.
> I'm sure there are some other more informed opinions out there on the best
> use of ssd's in a ceph cluster and hopefully they will chime in.
> >
> > On Oct 6, 2013, at 9:23 PM, Martin Catudal <mcatudal at metanor.ca> wrote:
> >
> >> Hi Guys,
> >> I read all Ceph documentation more than twice. I'm now very
> >> comfortable with all the aspect of Ceph except for the strategy of using
> >> my SSD and HDD.
> >>
> >> Here is my reflexion
> >>
> >> I've two approach in my understanding about use fast SSD (900 GB) for my
> >> primary storage and huge but slower HDD (4 TB) for replicas.
> >>
> >> FIRST APPROACH
> >> 1. I can use PG with cache write enable as my primary storage that's
> >> goes on my SSD and let replicas goes on my 7200 RPM.
> >> With the cache write enable, I will gain performance for my VM
> >> user machine in VDI environment since Ceph client will not have to wait
> >> for the replicas write confirmation on the slower HDD.
> >>
> >> SECOND APPROACH
> >> 2. Use pools hierarchies and let's have one pool for the SSD as primary
> >> and lets the replicas goes to a second pool name platter for HDD
> >> replication.
> >> As explain in the Ceph documentation
> >> rule ssd-primary {
> >> ruleset 4
> >> type replicated
> >> min_size 5
> >> max_size 10
> >> step take ssd
> >> step chooseleaf firstn 1 type host
> >> step emit
> >> step take platter
> >> step chooseleaf firstn -1 type host
> >> step emit
> >> }
> >>
> >> At this point, I could not figure out what approach could have the most
> >> advantage.
> >>
> >> Your point of view would definitely help me.
> >>
> >> Sincerely,
> >> Martin
> >>
> >> --
> >> Martin Catudal
> >> Responsable TIC
> >> Ressources Metanor Inc
> >> Ligne directe: (819) 218-2708
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users at lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



--
--
*Jason Villalta*
Co-founder
[image: Inline image 1]
800.799.4407x1230 | www.RubixTechnology.com<http://www.rubixtechnology.com/>
Martin Catudal
2013-10-07 15:34:48 UTC
Permalink
Thanks, Mike,
Kyle Bader also suggested that I use my large SSD (900 GB) as a cache
drive using "bcache" or "flashcache".
Since I have already planned to use SSD for my journal, I would certainly
also use SSD as a cache drive in addition.

I will have to read the documentation about "bcache" and its integration
with Ceph.

Martin

Martin Catudal
Responsable TIC
Ressources Metanor Inc
Ligne directe: (819) 218-2708

On 2013-10-07 11:25, Mike Lowe wrote:
> Based on my experience I think you are grossly underestimating the expense and frequency of flushes issued from your vm's. This will be especially bad if you aren't using the async flush from qemu >= 1.4.2 as the vm is suspended while qemu waits for the flush to finish. I think your best course of action until the caching pool work is completed (I think I remember correctly that this is currently in development) is to either use the ssd's as large caches with bcache or to use them for journal devices. I'm sure there are some other more informed opinions out there on the best use of ssd's in a ceph cluster and hopefully they will chime in.
>
> On Oct 6, 2013, at 9:23 PM, Martin Catudal <mcatudal at metanor.ca> wrote:
>
>> Hi Guys,
>> I read all Ceph documentation more than twice. I'm now very
>> comfortable with all the aspect of Ceph except for the strategy of using
>> my SSD and HDD.
>>
>> Here is my reflexion
>>
>> I've two approach in my understanding about use fast SSD (900 GB) for my
>> primary storage and huge but slower HDD (4 TB) for replicas.
>>
>> FIRST APPROACH
>> 1. I can use PG with cache write enable as my primary storage that's
>> goes on my SSD and let replicas goes on my 7200 RPM.
>> With the cache write enable, I will gain performance for my VM
>> user machine in VDI environment since Ceph client will not have to wait
>> for the replicas write confirmation on the slower HDD.
>>
>> SECOND APPROACH
>> 2. Use pools hierarchies and let's have one pool for the SSD as primary
>> and lets the replicas goes to a second pool name platter for HDD
>> replication.
>> As explain in the Ceph documentation
>> rule ssd-primary {
>> ruleset 4
>> type replicated
>> min_size 5
>> max_size 10
>> step take ssd
>> step chooseleaf firstn 1 type host
>> step emit
>> step take platter
>> step chooseleaf firstn -1 type host
>> step emit
>> }
>>
>> At this point, I could not figure out what approach could have the most
>> advantage.
>>
>> Your point of view would definitely help me.
>>
>> Sincerely,
>> Martin
>>
>> --
>> Martin Catudal
>> Responsable TIC
>> Ressources Metanor Inc
>> Ligne directe: (819) 218-2708
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
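For completeness, a pool is pointed at the ssd-primary rule quoted above by its ruleset number. A sketch with a placeholder pool name and PG counts, using the command names of this era:

    # create a pool for the VM images and point it at ruleset 4 (ssd-primary)
    ceph osd pool create vdi 1024 1024
    ceph osd pool set vdi crush_ruleset 4

    # confirm which rule the pool uses
    ceph osd pool get vdi crush_ruleset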