[SGVLUG] ssd and linux

Christopher Smith cbsmith at gmail.com
Mon Jul 2 10:49:00 PDT 2012


On Sat, Jun 30, 2012 at 8:41 AM, Dan Kegel <dank at kegel.com> wrote:
> On Fri, Jun 29, 2012 at 10:59 PM, matti <mathew_2000 at yahoo.com> wrote:
>> So, watching the prices go down, and wondering...
>>
>> Anyone here have experience with linux and SSDs?
>
> My two year old notes say:
> 1) the only disks worth even thinking about are Intel's

That was true at one time. It is no longer the case.

> 2) and even so they're no good for heavy disk crunching (like
> repeated building of complex C++ projects like chrome;
> it starts off fast but then degrades.)

There are a lot of confusing aspects to SSD performance. People often
get the wrong idea that they are just a "faster hard drive", but they
aren't. When you look at how SSDs actually work, they aren't really
like a hard drive at all. They're more like a massive RAID of hundreds
of tiny & frail hard disks, each with a fairly generous read cache,
and then there are the issues with configuration.... Unfortunately
most OSes do a lousy job of tuning appropriately for them. With Linux,
though, there is always a way to get it right yourself. Here are the
quick and dirty rules:

1) TRIM support is needed in the filesystem or... forget it.
Performance degrades horribly without it (overall performance does
degrade as wear on the drive increases regardless, but with TRIM it is
much milder). If you simply can't do it, periodically back up your
filesystem, reformat the partition, and restore. (There's a quick
example of the TRIM setup after this list.)
2) Even more so than with regular drives, unallocated space is
critical for achieving good write throughput.
3) For most SSDs, you're going to want to tell the filesystem to work
with 4K blocks. If need be, pretend you are setting up the filesystem
on a RAID with 4K stripe sizes. Don't forget that your partition needs
to be properly aligned on 4K boundaries or it is all for naught! (See
the alignment example after this list.)
4) Elevator schedulers are not an SSD's best friend. Most people
recommend turning off IO scheduling entirely when dealing with SSDs.
I think an argument could be made for the anticipatory scheduler... if
it didn't have bugs and were tweaked for SSDs, but otherwise anything
other than "NONE" is really slowing your SSD down. I've used the CFQ
scheduler, but I did so knowing full well it would cost me
performance. I did it so I could properly do IO prioritization. (See
the scheduler example after this list.)
5) SSDs really kick butt at IOPS (hundreds of times better than
typical hard drives). This isn't the same as latency. Where you'll see
the huge win is if you can really create tons of IO requests in
parallel (which, for example, is not how the typical compiler or
enterprise database works... it often is how web servers and webapps
work).
6) Most SSDs do have pretty impressive read latency. You can expect a
good 10x improvement over your typical hard drive, even if the
filesystem has been beating on the cells for a long time. That sounds
awesome, except when you realize that these days people tend to have
most of their reads cached in RAM (and while SSDs have great read
latency, they can't hold a candle to RAM). This is why people with new
SSDs end up talking about boot times and application launch times:
those are the times when you are least able to get help from RAM.
7) Most SSDs do NOT have impressive write latency. On average they
tend to be a bit better than a hard drive, but depending on your
choices, they can actually be much, much worse (if your SSD's cost is
close to that of the hard drive... I'd lay good odds on it being
worse). This is confusing, as you see all these benchmarks showing
tens of thousands (if not hundreds of thousands) of IOPS... Guess
what? To achieve those numbers, not only do the writes need to be
perfectly aligned along 4K boundaries, but a good chunk of those IOs
have to be in flight at any given time during that second. As
compilers generate a lot of writes sequentially... well, compiles are
not going to get much faster with an SSD. You can try to "simulate"
parallel IO by breaking durability rules and letting IOs complete
before they are actually written to the SSD. With a hard drive, this
is generally a recipe for disaster: unless you find some way to really
amortize your IOs, you end up bottlenecked by the maximum IOPS of the
drive anyway. With SSDs, that's pretty much a non-issue. On the down
side, though, you are putting your data at risk. If I were trying to
optimize compiles with SSDs, I'd probably try to maximize the
parallelism of my builds, and I'd probably have my generated object
files, etc. written to a dedicated tmpfs-like filesystem/partition
(see the one-liner after this list). As long as you can sacrifice a
decent amount of RAM to let the IOs appear completed, you would see a
performance win. Only problem would be: what was the point of all
that?
8) Finally, certain firmwares/controllers had bugs in them which
caused periodic multi-second IO stalls. IIRC, it was primarily second
generation SandForce controllers (looking at you, OCZ Vertex 2! ;-).
For some reason it seemed to plague Linux particularly badly. Anyway,
a firmware upgrade would make it all go away (back up the drive before
you do it though!).
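
To make point 1 concrete, here's roughly what the TRIM side looks like
with ext4 (the device and mount point are just examples; adjust for
your own setup):

  # /etc/fstab: enable online TRIM with the discard mount option
  /dev/sda1  /  ext4  noatime,discard  0  1

  # ...or skip discard and run TRIM in batches (e.g. from cron):
  fstrim -v /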
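
For point 3, checking alignment and block size is quick (/dev/sda is
just a placeholder again):

  # partition start sectors should be divisible by 8
  # (8 x 512-byte sectors = 4K); recent fdisk/parted align to 1MiB
  fdisk -l -u /dev/sda

  # make the filesystem with an explicit 4K block size
  mkfs.ext4 -b 4096 /dev/sda1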
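
For point 4, the scheduler is set per device through sysfs; "noop" is
the closest thing to "NONE" on the kernels I've been running:

  # see which schedulers are available (the active one is bracketed)
  cat /sys/block/sda/queue/scheduler

  # switch the SSD to noop (as root); I keep cfq only when I want
  # ionice to actually do something
  echo noop > /sys/block/sda/queue/scheduler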
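
And the tmpfs trick from point 7 is basically a one-liner (the size
and path are whatever you can spare and wherever your build writes):

  # build scratch lives in RAM and never touches the SSD
  mount -t tmpfs -o size=4G tmpfs /path/to/build/output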

> If I got one for Linux, I would put just the OS on it,
> not /var and not /home.  But then I also don't buy
> cars with power windows.

I've been primarily using SSDs with Linux on my laptops, and the
results have been pretty stellar. We've also been using them at work
(my previous employer is now using them almost everywhere). They
really do help tremendously. These days I generally use ext4, tweaked
a fair bit to accommodate an SSD's needs. Btrfs supposedly does a much
better job of optimizing for SSD performance, but I don't yet feel
comfortable letting any data I value at all touch that thing with a
10' pole. Probably the most annoying thing I've found is that when I
*am* IO bound, I no longer have the not-so-subtle sound of drive arms
scrambling back and forth providing an out-of-band signal that "you
are I/O bound again". (I had one case where my system was running at
like 1/2 normal speed, particularly with context switches, and it took
me several minutes to realize that my working set size had completely
exceeded RAM and my system was swapping like a demon.)
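
These days I get that signal back by leaving something like this
running in a spare terminal whenever things feel slow (both tools are
old standbys):

  # watch swap-in/swap-out (si/so) and IO wait (wa) once a second
  vmstat 1

  # or per-device utilization, if sysstat is installed
  iostat -x 1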

-- 
Chris

