Friday, February 22, 2013

Every Man's Tech - Part I - RAID

Ahoy,

Sometimes, okay a lot of times, I forget I live in a unique microcosm where we throw around terms like VDI, RAID, FC, iSCSI, CIFS, SATA, SAS, SSD, etc., and forget that most people look at us like we're from Mars when we start mumbling about this stuff.  I suppose it happens in whatever industry you get into, but the acronyms get out of control and soon you're using more acronyms than actual words.  I should have known something was up when my father, a brilliant man, read my blogs and said, "I've read the whole thing and have NO idea what you're talking about."  I suppose it's like I'm talking in code, so my Sister and Nephew gave me a great idea: write a blog for the layperson.  While discussing the idea with my wonderful wife, she suggested writing a whole series of blogs.  I think it's a great idea and hope you will too.  I dedicate this series to Matt, Michelle and Sandy.

I figured I'd start with RAID since storage really is built on it.  So what does RAID stand for?  Redundant Array of Inexpensive Disks.  Okay, but what does it MEAN?  So in the good ol' days hard drive sizes were small, I mean REALLY small.  Forget gigabytes, I'm talking megabytes.  So you had a bunch of 10meg hard drives, what did you do with them?  Well, some brilliant folks came up with a great idea: take all of these cheap, well sorta cheap, drives and get the OS to lump them all together into one big blob.

You have to remember that a new hard drive is like a newborn child.  It doesn't have an identity, nor does it know what its purpose is, until someone names, feeds, and nurtures it.  The same holds true for a hard drive.  When you put it into a computer, the computer adopts the drive, gives it a name and a file system type, and begins to use it to store data.

So what if we took a bunch of drives and put them into an enclosure, because the computer just doesn't have enough room to house all the drives, and called it a JBOD?  What's a JBOD?  Just a Bunch Of Disks.  Nope, I'm not kidding.  Attach that JBOD to the computer and with this brilliant code you take 10 individual 10meg hard drives and turn them into one 100meg hard drive!  How cool is that?  Imagine 10 individual Lego blocks.  Now line up those 10 blocks and you've got a concatenated storage pool.  The problem with concatenation is it's a bit slow because it has to fill up the first Lego block, then the next, etc.
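
For the folks who like to see it in code, here's a rough sketch of concatenation.  It's a toy model, not how any real storage system is written: the drives are just Python lists, and the names `concat_write` and `DRIVE_SIZE` are made up for illustration.

```python
# Toy model (all names are hypothetical): each "drive" is a list with a fixed
# number of slots, and the pool is simply a list of drives.
DRIVE_SIZE = 4  # slots per drive, chosen only to keep the example small


def concat_write(data, num_drives, drive_size=DRIVE_SIZE):
    """Fill drive 0 completely, then drive 1, and so on (concatenation)."""
    if len(data) > num_drives * drive_size:
        raise ValueError("data does not fit in the pool")
    drives = [[] for _ in range(num_drives)]
    for i, item in enumerate(data):
        # Integer division picks the drive: slots 0-3 land on drive 0, 4-7 on drive 1...
        drives[i // drive_size].append(item)
    return drives


pool = concat_write("NEILNE", num_drives=3)
print(pool)  # [['N', 'E', 'I', 'L'], ['N', 'E'], []]
```

Notice the third drive sits there empty until the first two are full, which is exactly the one-block-at-a-time behavior described above.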

Which brings me to our first RAID level, RAID 0.  Still have those Lego blocks?  Keep them lined up.  Now, imagine instead of writing to the first block until it's full, we write to ALL the blocks at once.  A bit confused?  Say I'm writing my name to the blocks.  With concatenation, I write NEIL to the first block.  First I write the N, then the E, next the I and finally the L.  Now instead of that way, I write N E I L across 4 of the blocks simultaneously.  A bit faster, isn't it?  That's what's called Striping and it's RAID 0.  Striping is very fast, but take away the Lego block that had the N on it.  Instead of NEIL, you now have EIL, which is no longer my name.  Now imagine your data is spread across multiple drives in the same way.  If you lost a drive, the surviving data wouldn't make any sense.
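
Here's the same idea as a small sketch.  Again this is a toy model with made-up names (`stripe_write`, `stripe_read`), and a lost drive is represented by `None`; a real array works on blocks of bytes, not single letters.

```python
def stripe_write(data, num_drives):
    """Deal the data out round-robin, one piece per drive (RAID 0 style)."""
    drives = [[] for _ in range(num_drives)]
    for i, item in enumerate(data):
        drives[i % num_drives].append(item)
    return drives


def stripe_read(drives, length):
    """Read the pieces back in order; a lost drive (None) yields '?'."""
    out = []
    for i in range(length):
        drive = drives[i % len(drives)]
        out.append("?" if drive is None else drive[i // len(drives)])
    return "".join(out)


drives = stripe_write("NEIL", num_drives=4)
print(drives)                  # [['N'], ['E'], ['I'], ['L']]

drives[0] = None               # pull out the block that held the N
print(stripe_read(drives, 4))  # ?EIL
```

There's nothing left anywhere in the array that can tell you the missing letter was an N, and that's the whole weakness of RAID 0.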

Okay, how can we protect against data loss?  Let's take just two blocks.  Instead of lining them up, place them one under the other.  Now write NEIL to the top block and, as you're writing to it, write the same thing to the bottom one.  This is called mirroring and is known as RAID 1.  The problem with this is you're limited to the size of a single drive, or you need to go back and mirror your concatenation, which we won't get into.  Plus you need to buy an extra drive JUST for mirrored data.  Nowadays RAID 1 is great for home or small business computers because the single drives in computers are so large that you just need to purchase one extra drive to mirror to and protect your data.  But what if you need more storage than that?  Or want the speed of the striping model?
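
A mirror is about the simplest thing you can sketch.  The function names below (`mirror_write`, `mirror_read`) are invented for this example, and a dead drive is once again just `None`.

```python
def mirror_write(data):
    """Write the same data to two drives (RAID 1 style)."""
    primary = list(data)
    mirror = list(data)  # a separate copy, not a second name for the same list
    return primary, mirror


def mirror_read(primary, mirror):
    """Read from whichever copy is still alive."""
    source = primary if primary is not None else mirror
    if source is None:
        raise IOError("both copies are gone")
    return "".join(source)


primary, mirror = mirror_write("NEIL")
primary = None                       # the top block dies
print(mirror_read(primary, mirror))  # NEIL
```

You paid for two drives and got the space of one, but losing either one costs you nothing.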

This leads us to the next RAID level, RAID 0 + 1.  Take those 10 Lego blocks and either get 10 more or divide them in half.  So you'll have either 2 rows of 10 or 2 rows of 5.  Now imagine you're striping my name across the first row of blocks: when you write the N to the first block, you also write it to the block below it, and the same holds true for the E, I, and L.  So what are you doing?  You're creating a mirror of your stripe, so in case you lose one of your blocks you have a mirror that's got everything on it and your data won't be lost.  This is great technology and many storage vendors use this as the preferred method since it's fast AND reliable.  The problem?  Go back to the beginning of the paragraph.  I had you either get 10 more blocks or split the 10 into 5.  Now back to hard drives.  If I want to maintain the 100megs, I need to buy 10 more drives.  If I split them in half, I lose half my capacity.  So RAID 0 + 1 is great, but it can get expensive.
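
The cost side of that trade-off is just arithmetic, so here it is spelled out with the 10meg drives from the example.  The function names are made up for this sketch, and it assumes every drive is the same size.

```python
import math


def striped_capacity_mb(num_drives, drive_mb):
    """RAID 0: every drive holds data."""
    return num_drives * drive_mb


def mirrored_stripe_capacity_mb(num_drives, drive_mb):
    """RAID 0 + 1: half the drives hold data, the other half hold the copy."""
    if num_drives % 2 != 0:
        raise ValueError("a mirrored stripe needs an even number of drives")
    return (num_drives // 2) * drive_mb


def drives_needed_for_mirrored_stripe(target_mb, drive_mb):
    """How many drives to buy to end up with target_mb of usable space."""
    return 2 * math.ceil(target_mb / drive_mb)


print(striped_capacity_mb(10, 10))                 # 100
print(mirrored_stripe_capacity_mb(10, 10))         # 50
print(drives_needed_for_mirrored_stripe(100, 10))  # 20
```

Same 10 drives, half the space, or twice the drives for the same space.  That's the bill for being fast AND safe.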

So what's next?  Let's talk about RAID 4.  So line up those 10 blocks again.  We're going to stripe our data again, but only to 9 of those blocks.  The 10th block doesn't hold data of its own; it holds a value calculated from whatever is written to the other 9, and that value is called parity.  So after you write my name, the parity block holds enough information to work out any one missing letter from the letters that are left.  Now say I lose the block with the E.  If this were RAID 0 my data would be lost even though most of it is still there.  With RAID 4, when I replace the drive, it combines the surviving letters with the parity, works out that an E is missing, and rebuilds it!  How cool is that?!  So it brings your data back from the dead, well sorta.  The problem with this?  That one little drive has a lot of work to do because every single write has to update its parity, so it becomes the bottleneck and begins to slow down your writes.  Plus that drive is no longer used to store data, it only holds parity.
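
If you're curious how a drive can "know" what's missing, parity is typically computed with XOR, a bit-level operation where combining the parity with all the surviving pieces gives back the lost one.  Here's a minimal sketch using the letters of my name; the function names are made up, and real arrays do this on whole blocks of bytes rather than single characters.

```python
from functools import reduce


def xor_parity(values):
    """XOR all the values together; this is what the parity drive stores."""
    return reduce(lambda a, b: a ^ b, values, 0)


def rebuild_missing(survivors, parity):
    """XOR the survivors with the parity to recover the one lost value."""
    return xor_parity(survivors + [parity])


data = [ord(c) for c in "NEIL"]  # one letter per data drive
parity = xor_parity(data)        # stored on the dedicated parity drive

survivors = [data[0], data[2], data[3]]  # the drive holding the E has died
print(chr(rebuild_missing(survivors, parity)))  # E
```

The trick only works when exactly one piece is missing.  Lose two drives at once and there isn't enough information left to rebuild either of them.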

What if we share the parity burden?  Take your 10 blocks.  Now instead of number 10 doing all of the parity heavy lifting, let's spread the work across all 10 blocks!  You've now got RAID 5.  We're still going to lose a total of one drive's worth of space to parity, but at least one drive isn't getting beat up anymore.  This is a great method to keep your data safe and not pay as much, since you're only losing one drive to parity.  The problem with RAID 5 is there still is the cost of calculating parity.  Writes aren't going to be as slow as RAID 4, but won't be as fast as RAID 0 + 1.
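
To picture what "spreading the work" looks like, here's a little sketch that prints which drive holds the parity for each row of data.  The rotation order used here is just one possible scheme picked for illustration (different implementations rotate differently), and `raid5_layout` is a made-up name.

```python
def raid5_layout(num_drives, num_stripes):
    """For each stripe, mark one drive 'P' (parity) and the rest 'D' (data).

    The parity position moves over by one drive on each stripe. This
    particular rotation is an assumption made for the example.
    """
    layout = []
    for stripe in range(num_stripes):
        parity_drive = (num_drives - 1 - stripe) % num_drives
        layout.append(["P" if d == parity_drive else "D" for d in range(num_drives)])
    return layout


for row in raid5_layout(num_drives=4, num_stripes=4):
    print(" ".join(row))
# D D D P
# D D P D
# D P D D
# P D D D
```

Every row still gives up exactly one slot to parity, so the space cost matches RAID 4, but no single drive gets stuck with all of it.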

What about all the levels in between?  They're still there but not really used anymore.  I hope you enjoyed my Every Man's Tech on RAID technology.  Remember this is the building block of modern storage, so if you have questions, feel free to ask away!  :-)  I'm looking for suggestions for this series, so please comment and let me know what you'd like me to discuss.  Because if you don't, I'll just start babbling away on what I want to talk about!

Until Next Time!
