A Little Closer to Center...
Musings about Life, Linux, and Latter-day Saints.
Wed - Apr 02, 2008 : 02:33 pm
Linux Software RAID On Trial
Here at work, the time is fast approaching when I will have to put the server I've been working on for the past year into production.

And we all know that when the rubber meets the road, things get a little bit more serious.

I came to the conclusion that since I'm the only Linux person in this company, should the Linux servers ever decide to go *kaput*, there would be only one person to blame, and quite frankly, I never want to deal with that situation.

The servers we have are IBM cheapies, so I didn't have any hardware RAID controllers at my disposal.

I have heard really good things about Linux software RAID (md), so I decided to use the two available hard drive bays as a RAID-1 mirror on Gentoo.

Setting up the RAID was easy.  I had that done about 10 months ago.

Well... I hadn't realized until now that the RAID had never been tested and that nothing was watching it, so in the event of a hard drive failure, I wouldn't even know it had happened.

So, yesterday, I got mdadm monitoring in place by putting the following command in a cron job which runs every 15 minutes (-1 is the short form of --oneshot, so each run checks the arrays once and exits):

mdadm --monitor --scan -1

I also made sure a valid email address was placed in the /etc/mdadm.conf file.
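
As a rough sketch, the two pieces of that setup might look like the fragment below. The email address is a placeholder and the /sbin/mdadm path is an assumption; neither comes from the actual server.

```
# /etc/mdadm.conf: where alert mail is sent (placeholder address)
MAILADDR admin@example.com

# crontab entry: check every array once, every 15 minutes
*/15 * * * * /sbin/mdadm --monitor --scan -1
```

Adding --test to the same command by hand should send a test alert for each array, which is a quick way to check that mail delivery works.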

Once I had confirmed with the --test option that the monitoring worked, we decided to go to the server and pull a hard drive out while the system was running.

Before I pulled it out, I catted the /proc/mdstat node, and it gave me the "we're all good" sign.  Here's the output:

JDEV php5 # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
     
md2 : active raid1 sdb2[1] sda2[0]
      3911744 blocks [2/2] [UU]
     
md3 : active raid1 sdb3[1] sda3[0]
      240179712 blocks [2/2] [UU]
     
unused devices: <none>

We then pulled the drive, and when I got back to my desk, two emails were waiting to tell me that a hard drive had failed.

The server was still 100% functional and had no evidence of any stuttering or stalling whatsoever.

This is what the readout looked like when I catted /proc/mdstat:

JDEV php5 # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
     
md2 : active raid1 sdb2[2](F) sda2[0]
      3911744 blocks [2/1] [U_]
     
md3 : active raid1 sdb3[2](F) sda3[0]
      240179712 blocks [2/1] [U_]
     
unused devices: <none>

That told me that the sdb device had failed: the (F) flag marks the failed member, and [U_] shows that one of the two members is missing.
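
The same check can be scripted. This is only a sketch: degraded_arrays is a made-up helper name, and it assumes the status field (such as [U_]) is the last thing on each "blocks" line, as it is in the output above.

```shell
#!/bin/sh
# List md arrays whose status field contains "_" (a missing member).
# Reads /proc/mdstat-formatted text on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the current array
        /blocks/ && $NF ~ /_/ { print name }   # "[U_]" means a member is missing
    '
}

# Sample input copied from the readout above.
sample='Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md2 : active raid1 sdb2[2](F) sda2[0]
      3911744 blocks [2/1] [U_]

md3 : active raid1 sdb3[2](F) sda3[0]
      240179712 blocks [2/1] [U_]

unused devices: <none>'

printf '%s\n' "$sample" | degraded_arrays
```

With the sample above it prints md2 and md3, one per line. On a live system the input would be `degraded_arrays < /proc/mdstat`.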

There was something interesting, however, which I didn't anticipate.  If you look at the first array, md1 still seemed to be doing just fine.

After a bit of pondering, I figured that since md1 wasn't even mounted, nothing had tried to read or write it, so md had had no chance to notice the removed drive.

As soon as I mounted /dev/md1, the software recognized the missing drive and marked it as failed.

So, all three arrays told me the /dev/sdb disk was having problems.

In order to get rid of the failed status, and get the system ready to accept a new hard drive, I issued the following commands:

JDEV dev # mdadm /dev/md3 -r detached
mdadm: hot removed 8:19


JDEV dev # mdadm /dev/md1 --remove detached
mdadm: hot removed 8:17

JDEV dev # mdadm /dev/md2 -r detached
mdadm: hot removed 8:18

-r and --remove are the same thing.
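
The numbers in "hot removed 8:19" are the major and minor device numbers of the partition that was dropped. As a small sketch (sd_name is a made-up helper, and it only handles major 8, the first block of sd disks, where each disk gets 16 minor numbers), they can be turned back into device names like this:

```shell
#!/bin/sh
# Translate a "major:minor" pair for major 8 (sda through sdp)
# into a device name such as sdb3.
sd_name() {
    major=${1%%:*}
    minor=${1##*:}
    if [ "$major" -ne 8 ]; then
        echo "only major 8 is handled" >&2
        return 1
    fi
    disk=$((minor / 16))      # 16 minor numbers per disk
    part=$((minor % 16))      # 0 is the whole disk, 1-15 are partitions
    letter=$(printf 'abcdefghijklmnop' | cut -c "$((disk + 1))")
    if [ "$part" -eq 0 ]; then
        echo "sd$letter"
    else
        echo "sd$letter$part"
    fi
}

sd_name 8:19
sd_name 8:17
sd_name 8:18
```

This prints sdb3, sdb1 and sdb2, matching the three arrays the commands were run against.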

After issuing those commands, /proc/mdstat looked like this:

JDEV dev # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
     
md2 : active raid1 sda2[0]
      3911744 blocks [2/1] [U_]
     
md3 : active raid1 sda3[0]
      240179712 blocks [2/1] [U_]
     
unused devices: <none>

So now none of the partitions is mirrored; each array is running on sda alone.

After that, I put the removed hard drive back in (the hard drive wasn't really bad, so I put the same one back in).

As soon as I put it back in, its device nodes reappeared in /dev, which assured me that the system had recognized it and hot-plugged it back in.

At this point, if it were a new unformatted disk, I would have to partition it to match the existing drive's partitions.
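
I didn't need that step here, but one common way to do it is to dump the partition table of the good disk with sfdisk and feed it to the new one. The wrapper below is only a sketch: clone_partition_table and the RUN switch are made up for illustration, and it assumes the replacement disk is at least as large as the original. It prints the command instead of running it unless RUN=1 is set, because writing a partition table to the wrong disk destroys what is on it.

```shell
#!/bin/sh
# Copy the partition table from a healthy disk to its replacement.
# Dry run by default: the command is only printed unless RUN=1 is set.
clone_partition_table() {
    src=$1
    dst=$2
    if [ "${RUN:-0}" = 1 ]; then
        sfdisk -d "$src" | sfdisk "$dst"
    else
        echo "sfdisk -d $src | sfdisk $dst"
    fi
}

clone_partition_table /dev/sda /dev/sdb
```

Double-check which device is the source and which is the target before setting RUN=1.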

Then, I issued the following commands to re-add the drive's partitions to their respective mirrors:

JDEV dev # mdadm /dev/md3 --add /dev/sdb3
mdadm: re-added /dev/sdb3
JDEV dev # mdadm /dev/md2 --add /dev/sdb2
mdadm: re-added /dev/sdb2
JDEV dev # mdadm /dev/md1 --add /dev/sdb1
mdadm: re-added /dev/sdb1

And all looked good!

After syncing for about 30 minutes, this is what /proc/mdstat looked like:

JDEV dev # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[2] sda1[0]
      104320 blocks [2/1] [U_]
          resync=DELAYED
     
md2 : active raid1 sdb2[2] sda2[0]
      3911744 blocks [2/1] [U_]
          resync=DELAYED
     
md3 : active raid1 sdb3[2] sda3[0]
      240179712 blocks [2/1] [U_]
      [=======>.............]  recovery = 35.5% (85384448/240179712) finish=44.4min speed=58045K/sec
     
unused devices: <none>
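
The resync=DELAYED lines are expected: as far as I know, md rebuilds arrays that share the same physical disks one at a time, so md1 and md2 wait for md3 to finish. To keep an eye on the rebuild from a script, the progress can be pulled out of the same text. This is a sketch that assumes the line layout shown above; recovery_progress is a made-up helper name.

```shell
#!/bin/sh
# Print the rebuild state of each array that is syncing or waiting to sync.
# Reads /proc/mdstat-formatted text on stdin.
recovery_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /resync=DELAYED/ { print name, "delayed"; next }
        /(recovery|resync) *=/ {
            # the percentage is the only field ending in "%"
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i
        }
    '
}

# Sample input copied from the readout above.
sample='Personalities : [raid1]
md1 : active raid1 sdb1[2] sda1[0]
      104320 blocks [2/1] [U_]
          resync=DELAYED

md2 : active raid1 sdb2[2] sda2[0]
      3911744 blocks [2/1] [U_]
          resync=DELAYED

md3 : active raid1 sdb3[2] sda3[0]
      240179712 blocks [2/1] [U_]
      [=======>.............]  recovery = 35.5% (85384448/240179712) finish=44.4min speed=58045K/sec

unused devices: <none>'

printf '%s\n' "$sample" | recovery_progress
```

For the sample it reports md1 and md2 as delayed and md3 at 35.5%. If all you want is to block until the rebuild is done, `mdadm --wait /dev/md3` should do that without any parsing.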

And that, my friend, is pretty darned slick.

Linux software RAID-1 can handle a failed disk, removing the disk, adding a new one, and syncing the two, all without rebooting the machine, or having any downtime at all.
Comment by anonymous on Apr. 03, 2008 @ 11:01 am
Finally.  I've been wondering about this for quite awhile.  Good to know.
Comment by romild0 on Apr. 17, 2008 @ 10:05 am
w0rd up.
Comment by anonymous on Jul. 23, 2009 @ 11:15 am
Linux RAID can also do this

http://www.purplefrog.com/~thoth/philosophy/raid.html

Although I would recommend that you find an easier way to manage your storage.