A Little Closer to Center...
Musings about Life, Linux, and Latter-day Saints.
Wed - Apr 02, 2008 : 02:33 pm
amazed
Linux Software RAID On Trial
Here at work, the time is fast approaching when I will be required to put the server I've been working on for the past year into production.

And we all know that when the rubber meets the road, things get a little bit more serious.

I came to the conclusion that since I'm the only Linux person in this company, should the Linux servers ever decide to go *kaput*, there would be only one person to blame, and quite frankly, I never want to deal with that situation.

The servers we have are IBM cheapies, so I didn't have any hardware RAID controllers at my disposal.

I have heard really good things about Linux software RAID, so I decided to use the two available hard drive bays as mirrors in a RAID-1 configuration with the Linux software RAID driver (md) on Gentoo.

Setting up the RAID was easy.  I had that done about 10 months ago.

Well... I hadn't realized until now that no testing had been done on the RAID at all, so in the event of a hard drive failure, I wouldn't even know it had happened.

So, yesterday, I got the mdadm monitoring system in place by putting the following command in a cron job which runs every 15 minutes: mdadm --monitor --scan -1 (the -1 is the short form of --oneshot, so each run checks the arrays once, sends any alerts, and exits).
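
For reference, the crontab entry looks something like this (just a sketch of root's crontab; the /sbin/mdadm path may be different on your system):

*/15 * * * * /sbin/mdadm --monitor --scan -1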

I also made sure a valid email address was placed in the /etc/mdadm.conf file.
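
The relevant line in /etc/mdadm.conf is just the MAILADDR directive, something like this (the address is obviously a placeholder):

MAILADDR admin@example.com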

Once I saw that the monitoring system was running (by using the --test option), we decided to go to the server and pull a hard drive out while the system was running.
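
If you want to check yours the same way, the test run looks roughly like this; --test makes mdadm generate a TestMessage alert for every array it finds, so you can confirm the email actually shows up:

mdadm --monitor --scan -1 --test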

Before I pulled it out, I catted the /proc/mdstat node, and it gave me the "we're all good" sign.  Here's the output:

JDEV php5 # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
     
md2 : active raid1 sdb2[1] sda2[0]
      3911744 blocks [2/2] [UU]
     
md3 : active raid1 sdb3[1] sda3[0]
      240179712 blocks [2/2] [UU]
     
unused devices: <none>

We pulled the drive, and when I came back, I had two emails waiting for me telling me a hard drive had failed.

The server was still 100% functional and had no evidence of any stuttering or stalling whatsoever.

This is what the readout looked like when I catted /proc/mdstat:

JDEV php5 # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
     
md2 : active raid1 sdb2[2](F) sda2[0]
      3911744 blocks [2/1] [U_]
     
md3 : active raid1 sdb3[2](F) sda3[0]
      240179712 blocks [2/1] [U_]
     
unused devices: <none>

That tells me that the sdb device had failed.

There was something interesting, however, which I didn't anticipate. If you look at the first block above, the md1 array seemed to be doing just fine.

After a bit of pondering, I figured that since that array wasn't even mounted, nothing was reading from or writing to it, so md never hit an I/O error and had no reason to notice the removed drive.

As soon as I mounted /dev/md1, md noticed the missing drive and marked it as failed.

So, all three arrays now agreed that the /dev/sdb disk was having problems.
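
As a side note, you shouldn't have to mount anything just to force the issue; mdadm can mark a member as failed by hand, something like this (assuming the kernel still has a device node for the partition):

mdadm /dev/md1 --fail /dev/sdb1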

In order to get rid of the failed status, and get the system ready to accept a new hard drive, I issued the following commands:

JDEV dev # mdadm /dev/md3 -r detached
mdadm: hot removed 8:19

JDEV dev # mdadm /dev/md1 --remove detached
mdadm: hot removed 8:17

JDEV dev # mdadm /dev/md2 -r detached
mdadm: hot removed 8:18

-r and --remove are the same thing.

After issuing those commands, /proc/mdstat looked like this:

JDEV dev # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
     
md2 : active raid1 sda2[0]
      3911744 blocks [2/1] [U_]
     
md3 : active raid1 sda3[0]
      240179712 blocks [2/1] [U_]
     
unused devices: <none>

So, there are no mirrored partitions at all now.

After that, I put the removed hard drive back in (it wasn't really bad, so I just reused the same one).

As soon as I slid it back in, the device nodes showed up again in /dev, which told me the system had recognized the drive and hot-plugged it back in.

At this point, if it were a new, unformatted disk, I would have to partition it to match the existing drive's partitions.
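
One way to copy the partition table over from the surviving drive is sfdisk (double-check the device names before running anything like this, since getting them backwards would be painful):

sfdisk -d /dev/sda | sfdisk /dev/sdb

The -d option dumps the partition layout of /dev/sda, and piping that back into sfdisk writes the same layout onto /dev/sdb.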

Then, I issued the following commands to re-add the drive's partitions to their respective mirrors:

JDEV dev # mdadm /dev/md3 --add /dev/sdb3
mdadm: re-added /dev/sdb3
JDEV dev # mdadm /dev/md2 --add /dev/sdb2
mdadm: re-added /dev/sdb2
JDEV dev # mdadm /dev/md1 --add /dev/sdb1
mdadm: re-added /dev/sdb1

And all looked good!

After syncing for about 30 minutes, this is what the /proc/mdstat node looked like:

JDEV dev # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[2] sda1[0]
      104320 blocks [2/1] [U_]
          resync=DELAYED
     
md2 : active raid1 sdb2[2] sda2[0]
      3911744 blocks [2/1] [U_]
          resync=DELAYED
     
md3 : active raid1 sdb3[2] sda3[0]
      240179712 blocks [2/1] [U_]
      [=======>.............]  recovery = 35.5% (85384448/240179712) finish=44.4min speed=58045K/sec
     
unused devices: <none>
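
If you want to watch the rebuild crawl along, watch makes that painless:

watch -n 5 cat /proc/mdstat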

And that, my friend, is pretty darned slick.

Linux software RAID-1 can handle a failed disk, removal of the bad disk, addition of a new one, and resyncing of the mirror, all without rebooting the machine or any downtime at all.
Comment by anonymous on Apr. 03, 2008 @ 11:01 am
Finally.  I've been wondering about this for quite awhile.  Good to know.
Comment by romild0 on Apr. 17, 2008 @ 10:05 am
w0rd up.
Comment by anonymous on Jul. 23, 2009 @ 11:15 am
Linux RAID can also do this

http://www.purplefrog.com/~thoth/philosophy/raid.html

Although I would recommend that you find an easier way to manage your storage.