[ale] Giant storage system suggestions

Alex Carver agcarver+ale at acarver.net
Fri Jul 13 14:19:03 EDT 2012


On 7/12/2012 04:22, JD wrote:
 > On 07/11/2012 05:03 PM, Alex Carver wrote:
 >> I'm trying to design a storage system for some of my data in a way that
 >> will be useful to duplicate the design for a project at work.
 >
 > We have 2 requirements at this point.
 > a) Cheap
 > b) 10TB usable with room to grow
 >
 > A few more questions need to be asked to get started understanding the
 > requirements more completely.
 >
 > * What services do you require?
 > ** CIFS
 > ** NFS
 > ** iSCSI
 > ** AoE
 > ** rsync
 > ** others?

Actually, none of those on the work system. :)  The data is going to be 
accessible via HTTP for the most part (a custom website designed to 
catalog and sort the data).  Uploads of smaller files will be by direct 
HTTP POST, and larger files will go via scp/sftp.  The home system might 
use just NFS/CIFS, mounted and used like any normal storage volume.
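
For the home box I'm picturing nothing fancier than a plain NFS export, 
roughly along these lines (the /srv/archive path and the 192.168.1.0/24 
subnet are just placeholders):

  # /etc/exports on the server
  /srv/archive  192.168.1.0/24(rw,sync,no_subtree_check)

  # re-export on the server, then mount from a client
  server# exportfs -ra
  client# mount -t nfs server:/srv/archive /mnt/archive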

 > * How will you backup all this data automatically?
 > ** tape
 > ** duplicity
 > ** lvm
 > ** zsend

That was TBD but likely a combination of tape and Blu-ray.  No remote 
storage in either case: at home because that's how I feel, at work 
because of ITAR and other government regulations.
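
Nothing is decided yet, but the rough idea is plain tar straight to the 
tape drive and growisofs (dvd+rw-tools) for the Blu-ray copies, 
something like this (device names and paths are only placeholders):

  # rewind and dump one year's tree to tape
  mt -f /dev/st0 rewind
  tar -cvf /dev/st0 /srv/archive/2012

  # burn the same tree to a BD-R
  growisofs -Z /dev/sr0 -R -J /srv/archive/2012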

 > * Which file system(s) do you want/need?

The file system has to work with Linux/BSD because the OS on the server 
is going to be one of those.  Beyond that it won't matter as long as 
it's a scalable filesystem.
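
If it ends up as Linux md underneath, something like XFS would cover the 
"scalable" part, since the filesystem can be grown in place after the 
array grows (sketch only, /dev/md0 and the mount point are assumed):

  mkfs.xfs /dev/md0
  mount /dev/md0 /srv/archive
  # later, after the underlying array has been enlarged:
  xfs_growfs /srv/archive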

 > * Storage Performance?
 > ** 10/100 connection
 > ** GigE connection
 > ** 10GigE connection
 > ** multiple bonded 10GigE connections?

The connection to the server is going to be 100 Mb/s Ethernet (GigE may 
show up later depending on network upgrades).  The array is directly 
connected to the server, so that's all SATA/SAS (or at least it was in 
my first thoughts about it).

 > * Transport connections?
 > ** ethernet over copper

Ethernet over copper.

> * Are there any unusual distance requirements for access to the storage?

No distance requirements.

>
> * What is the largest partition size required?  This feeds into backups and
> future data migration options.  You WILL need to migrate the data in the future.

I was hoping for one monolithic volume.  It just needs to be a single 
giant pool of storage; organization is handled by the custom interface.

 > * How critical is the data?

The data is archival.  No one dies if it's lost, but the data should be 
around for many years.

 > * Budget?
 > ** SW - is commercial SW an option at all?
> ** HW - RAID cards fail occasionally, so you'll want an identical spare available.
> ** Support - things that might take me a week to figure out are solved in an
> hour by a professional in the business.

Software RAID is fine; it's not necessary for this system to use 
hardware RAID (my original concept wasn't hardware RAID either).  As 
close to standard software as possible is best (Linux/BSD, abstracted 
hardware such as /dev/hd[a-z][0-9] device names, udev, etc.) so that 
there's no dependence on the specific hardware.
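
A minimal sketch of what I mean, assuming Linux md (the by-id names are 
made up): build the array from the persistent /dev/disk/by-id paths so 
nothing depends on which controller or port a drive lands on, and let 
md assemble by UUID afterwards.

  mdadm --create /dev/md0 --level=6 --raid-devices=8 \
        /dev/disk/by-id/ata-EXAMPLEDISK{0..7}
  # record the array by UUID so assembly ignores device names
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf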

 > * RAID Options
 > ** RAID6
 > ** RAID10
 > ** RAIDz
 > ** RAIDz2

This was one of the questions I had asked.  The RAID level is flexible 
as long as I have some redundancy, but not to the point that I'm losing 
a significant amount of array space to the redundancy components.  If I 
can handle a two- or three-disk loss then I'm fine.  I will have cold 
spares for the drives.
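
In ZFS terms the two-disk-loss case maps to raidz2 (raidz3 if I decide 
I want to survive three); a pool might look like this, with disk names 
as placeholders and capacity only approximate:

  # 8 disks, any two can fail; usable space is roughly (N-2) disks,
  # so 8 x 2TB drives give on the order of 12TB
  zpool create archive raidz2 da0 da1 da2 da3 da4 da5 da6 da7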

 > * Any data replication requirements?

Nope, just the backups once in a while.


 > * Any HA data requirements?

No, no high availability requirements.  It's completely archival.  At 
work, live data that is in regular use is stored on personal machines 
with a backup sent to the server, and archived data resides on the 
server.  At home it's pretty much the same thing, with minor exceptions 
such as movies or music being pulled directly from the server.

> At this level, I'd expect hardware to be picky, so be certain that anything you
> piece together is listed as supported between the RAID, expansion, external
> array, protocols, physical connections and motherboard.

Right, I wanted to make a system that could gloss over the hardware 
specifics.  For example, software RAID shouldn't care what kind of SATA 
card is in place as long as the drives are accessible and still have 
the same device names.  I wanted to avoid hardware RAID only because it 
locks me into a specific card and vendor.  Other than that, everything 
was flexible.


> My initial brainstorm said "he needs ZFS", but only you can decide if that is
> possible.  Last time I checked, ZFS under Linux isn't a first-class supported
> solution.
>
> Sorry, no real answer from me, just more questions.  Good luck and please post
> more about your attempts and final solution.  Actually, the final solution would
> be a fantastic ALE presentation.



