[SGVLUG] Cluster Filesystems

Chris Smith cbsmith at gmail.com
Sun Jan 8 12:08:42 PST 2006


On 1/8/06, Max Clark <max at clarksys.com> wrote:
> NetApp does support multiple-head and device redundancy via a cluster
> software option - this is just not economically feasible for many people.
> For our customers with the 6-figure storage budget we can sell them a
> wide selection of different options all with exceptional redundancy and
> failover options - I am looking for something in the sub-$10,000
> range.

Ah. Now I get it. It's a cost thing. Yeah, NetApp is not going to win
any points for cost effectiveness.

> A series of ATA/SATA disks in a 4U w/ a Raidcore or 3ware controller
> works well for mass storage requirements - however the recovery time due
> to disk failure is way too painful (a disk failure on a 4TB array w/
> 500GB drives will take most of the day to rebuild).

Use RAID-10. The whole recovery experience is much smoother than with
RAID-5, and it's not like you aren't going to have enough disk space.
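To put rough numbers on that, here's a back-of-the-envelope
comparison. The throughput figures are my own assumptions for
illustration, not benchmarks:

    # Back-of-the-envelope rebuild times. All throughput numbers are
    # assumed figures for illustration, not measurements.

    drive_gb = 500

    # RAID-10: rebuilding a failed drive is a straight mirror copy from
    # the surviving half of one mirror pair.
    raid10_rate_mb_s = 50   # assumed sustained copy rate
    raid10_hours = drive_gb * 1024 / raid10_rate_mb_s / 3600

    # RAID-5: every surviving drive must be read in full to recompute
    # parity, and the rebuild competes with live I/O; 15 MB/s effective
    # is an assumed figure for a loaded array.
    raid5_rate_mb_s = 15
    raid5_hours = drive_gb * 1024 / raid5_rate_mb_s / 3600

    print("RAID-10 rebuild: ~%.1f hours" % raid10_hours)   # ~2.8
    print("RAID-5 rebuild:  ~%.1f hours" % raid5_hours)    # ~9.5

That "most of the day" rebuild you're seeing is right in line with the
loaded-RAID-5 estimate.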

> I am looking to build a true COTS based system with each individual node
> having between 1 and 4 hard drives.

Note that with the model you are proposing, recovery time after a node
failure will actually be *worse* than the ghetto-RAID model's
disk-failure recovery time. The nodes are going to have to replicate
data back and forth, and the network isn't likely to have the
bandwidth that even SATA provides. Furthermore, while they're doing
this they are going to be very hard on whatever switch is sitting in
between them. This will be particularly painful if that switch is
managing other traffic.

Even for disk failure, your recovery will only be as good as with a
comparable ghetto-RAID setup.
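A quick sanity check on the bandwidth claim. The per-node capacity,
efficiency factor, and link rates below are all assumptions, purely
for illustration:

    # Why node-failure recovery over the network hurts: compare how long
    # it takes to re-replicate one node's data over various links. All
    # figures below are assumptions for illustration.

    node_gb = 1000      # assumed: 2 x 500 GB drives on the failed node
    efficiency = 0.7    # assumed protocol/contention overhead factor

    links_mb_s = {
        "100 Mbit Ethernet": 100 / 8.0,
        "Gigabit Ethernet": 1000 / 8.0,
        "SATA (1.5 Gbit)": 150.0,
    }

    for name, rate in links_mb_s.items():
        hours = node_gb * 1024 / (rate * efficiency) / 3600
        # ...and that traffic is hammering the switch the whole time.
        print("%-18s ~%4.1f h to re-replicate %d GB" % (name, hours, node_gb))

Even gigabit only roughly ties a single SATA channel, and anything
slower than that gets ugly fast.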

> I want the data to be replicated across the nodes in the storage cluster so that
> if one of the nodes fails the storage is still accessible. I know with CIFS/NFS
> this does not equate to automatic failover for the client - but the outage window
> to reconnect to a different node is acceptable as long as the storage is
> still accessible. While all of the storage nodes will be running Linux
> the clients need to be a mix of Linux, Windows, Sun, OS X hence the need
> for NFS/CIFS.

Ah, now I see the full picture. Actually, with that combination, I'd
go for pure NFS or AFS without a clustered filesystem behind it (I'm
trying to remember the state of AFS on OS X, but as I recall a client
exists).

Okay, now that I grok your goals, I'm tempted to suggest going with
NFS on top of ext3 on top of the network-block-device RAID-10 setup I
had mentioned before. Here's the rationale:

1) It's incredibly cheap. You aren't going to need to spend a dime on
software or fancy hardware. Ideally you shouldn't have more than 2 or
3 storage nodes, so even the switch won't need to be upgraded (heck,
if the servers have multiple network ports you could potentially just
use cross-over cables and bypass the switch entirely for
communications between the storage nodes).
2) It has fast recovery times in the event of node failure. Most
nodes are just providing raw storage, so even if one of them fails the
system keeps working. You can also use Linux-HA's heartbeat software
to have one of the dumb nodes instantly take over NFS server duty if
the server node fails (a toy sketch of that failover logic follows
this list). RAID rebuilds for RAID-10 are much, much faster than what
you've got right now.
3) It's incredibly simple. One can never overestimate the value of this. ;-)
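Here's the failover sketch promised in point 2. This is a toy
illustration of the heartbeat idea, not Linux-HA's actual
implementation (which speaks its own UDP protocol); the peer address
and the takeover_nfs_service() hook are hypothetical stand-ins:

    # Toy illustration of heartbeat-style failover; NOT the real Linux-HA
    # heartbeat daemon.
    import socket
    import time

    PRIMARY = ("192.168.0.10", 2049)  # hypothetical NFS server address
    TIMEOUT_S = 2
    MAX_MISSES = 3

    def primary_alive():
        # One probe: can we still open a TCP connection to the primary?
        try:
            s = socket.create_connection(PRIMARY, timeout=TIMEOUT_S)
            s.close()
            return True
        except socket.error:
            return False

    def takeover_nfs_service():
        # Hypothetical hook: claim the service IP, then start nfsd here.
        print("primary looks dead: taking over NFS duty")

    misses = 0
    while True:
        misses = 0 if primary_alive() else misses + 1
        if misses >= MAX_MISSES:   # several missed beats = declare it dead
            takeover_nfs_service()
            break
        time.sleep(TIMEOUT_S)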

So, what's the bad news? The bad news is that you need to buy at least
2x the raw storage that you'll actually have (more if you want
hot/cold spares). In this day of gigantic SATA drives that may be an
acceptable trade-off, and it's undoubtedly cheaper than a NetApp
solution.
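For concreteness (the per-drive price is a made-up placeholder, not a
real quote):

    # What the 2x raw-storage overhead actually costs. The per-drive
    # price is a hypothetical placeholder, not a real quote.

    usable_tb = 4
    drive_gb = 500
    price_per_drive = 300   # assumed, illustration only

    raw_tb = usable_tb * 2                  # RAID-10 mirrors everything
    drives = raw_tb * 1024 // drive_gb      # 16 drives before spares
    spares = 2                              # optional hot spares

    total = (drives + spares) * price_per_drive
    print("%d x %dGB drives: ~$%d for %dTB usable"
          % (drives + spares, drive_gb, total, usable_tb))

Even doubled up with spares, the disks land comfortably under your
$10,000 ceiling.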

--
Chris

