Restoring a Broken Linux RAID Array

About 18 months ago I set up a Linux media server for my home. It was made from an old Dell desktop that my neighbor was (literally) discarding, and a pair of new, identical Seagate hard disks. Since I was going to be spending a lot of time copying my CDs to this server, I configured a RAID-1 array that mirrored the hard disks; that way, there would always be a current backup. The OS was Ubuntu Linux 6.06 Server, and it used software RAID.

One of the drives started failing last week. Happy though I was to have a handy backup, I was a bit daunted by the prospect of restoring a broken RAID array. You see, there are plenty of tutorials on how to set up software RAID, but not that many resources on what to do after a drive breaks. It actually turned out to be really easy, thanks in large part to guidance from my friends Yossie and Eric, so I decided to document the process here.

Determining if Your RAID Array is Broken

This part is really easy. Log in as root, and run:

cat /proc/mdstat

You should see something like this if there is a problem:

Personalities : [raid1]
md1 : active raid1 hda2[0]
1927680 blocks [2/1] [U_]

md0 : active raid1 hda1[0]
310640768 blocks [2/1] [U_]

unused devices: <none>

Note the underscore in the [U_]. A healthy RAID array will not have the underscore. Instead, it would say [UU]. Also note that hda is active (it’s listed in the md0/md1 lines), but hdb is conspicuously absent. So hdb is in trouble.
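
If you would rather not eyeball the output, the same check can be sketched as a small shell function. This is only a sketch: degraded_arrays is a hypothetical helper name (it is not part of mdadm), and it assumes the mdstat layout shown above, where each array's status field looks like [U_] or [UU].

```shell
# degraded_arrays: hypothetical helper, not part of mdadm.
# Reads mdstat-style text on stdin and prints the name of every array
# whose status field (e.g. [U_]) contains an underscore.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }              # remember the current array
        match($0, /\[[U_]+\]/) {                 # line carrying the status field
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0 && name != "") print name
        }
    '
}

# On the server itself you would run:
#   degraded_arrays < /proc/mdstat
```

No output means every array reported all of its members as up.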

Finding the Broken Drive

If you know which drive hdb is, then remove it, but it’s probably best to run a test first to be sure it actually has errors. For this, I found a very versatile free tool called Ultimate Boot CD. You download the ISO, burn it onto a CD, and boot your PC from it. It’s packed with diagnostic tools, such as hard disk testers, memory testers, etc. Run the appropriate one for your brand of hard disk, on each disk in the array. The Seagate tester I used lists the hard disks’ serial numbers, so there’s no confusion once you open the case to remove the bad drive.

Ensure You Can Boot From the Good Drive

Depending on which drive is bad, and how you originally configured them, you may not be able to boot properly from the good drive. Best to test that you can. Remove the bad drive, and try to boot. You may need to modify jumper settings and/or move the good drive to the master plug on the IDE cable. If you can boot, great. If not, the Grub loader may be missing from the good drive. To reinstall it, put the drives back where they were so that you can boot, log in as root, and run:

grub-install /dev/hda

(Assuming you need to install it on hda).

Retest that you can now boot off the remaining drive with the bad one removed.

Replace the Broken Drive

Obtain a matching drive, and install it in the case.

I only discovered (happily) after this happened that Seagate drives come with a 5-year warranty. If a drive goes bad, you can check if it’s covered using Seagate’s Warranty Checker. You don’t even need your original receipt to process the return, just the serial and model numbers. The whole process couldn’t be easier. I opted for their $20 premium service, where they send you a refurbished drive that matches yours immediately, via 2-day air shipping. The replacement drive comes with a box and a prepaid label to return the broken one to Seagate. The return shipping alone would have cost me $10, so that service seems like a bargain.

Rebuild the RAID Array

This is the bit I was daunted by, but it really turned out to be fairly simple. After the new drive is installed, boot up and log in as root.

Verify which drive is now part of the array. Run:

cat /proc/mdstat

You should see something like:

md1 : active raid1 hda2[1]
1927680 blocks [2/1] [_U]

md0 : active raid1 hda1[1]
310640768 blocks [2/1] [_U]

In this case, hda is the working drive.

Look at the current partition table of the working drive (hda in my case). Run:

fdisk /dev/hda
p (for print)
q (to exit)

Output should be something like:

Disk /dev/hda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1       38673   310640841   fd  Linux raid autodetect
/dev/hda2           38674       38913     1927800   fd  Linux raid autodetect

Now configure the partitions of the new drive (hdb) to match the working one (hda). Run:

fdisk /dev/hdb

  1. Enter n (for new partition).
  2. Enter p (for primary partition).
  3. Make it the first partition (i.e. 1).
  4. Start at cylinder 1 (the default).
  5. Use the End value for the last cylinder of partition 1 on hda (i.e. 38673).
  6. Change the new partition’s type to match the Id value of your first partition (i.e. “fd”). This is important so that the kernel’s RAID autodetection recognizes the partition as part of the RAID array.
    Enter t (for “change a partition’s system id”).
    Enter fd.
  7. That configures the first partition, hdb1. Repeat steps 1-6 for all the other partitions on hda. In my case, there was only one additional one. Remember to copy the Start and End cylinders from hda.
  8. When you’re finished, enter w (for “write table to disk and exit”).
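
As an alternative to retyping the table by hand, sfdisk (which ships alongside fdisk in util-linux) can dump a partition table as text and replay it onto another disk. Here is a minimal sketch, assuming sfdisk is installed and the new disk is at least as large as the old one. clone_partition_table is a hypothetical wrapper name. The command overwrites the target's partition table, so double-check the device names before running it.

```shell
# clone_partition_table: hypothetical wrapper around sfdisk.
# $1 = source disk (the working one), $2 = target disk (the new one).
# WARNING: this replaces the partition table on the target disk.
clone_partition_table() {
    src=$1
    dst=$2
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: clone_partition_table SOURCE TARGET (must differ)" >&2
        return 1
    fi
    # "sfdisk -d" dumps the table as text; the second sfdisk reads that text
    # on stdin and writes it to the target.
    sfdisk -d "$src" | sfdisk "$dst"
}

# In this article's setup, as root:
#   clone_partition_table /dev/hda /dev/hdb
```

Afterwards, running fdisk on hdb and printing the table (p) should show the same Start, End and Id values as hda.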

Now you’re ready to hot-add the new drive to your RAID array. You’ll need to run mdadm for each partition you need to restore (two in my case). Use the output of /proc/mdstat as a guide. e.g.

md1 : active raid1 hda2[1]
1927680 blocks [2/1] [_U]

md0 : active raid1 hda1[1]
310640768 blocks [2/1] [_U]

Here md1 currently maps to hda2 only. We need to add hdb2 to md1. md0 currently maps to hda1 only. We need to add hdb1 to md0. So in this case, you would run the following commands:

mdadm /dev/md0 -a /dev/hdb1
mdadm /dev/md1 -a /dev/hdb2

That’s it! The RAID array should be doing its magic.
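
If you have more than a couple of arrays, the mapping from /proc/mdstat to mdadm commands can be sketched as a script. This is only an illustration: suggest_hot_add is a hypothetical helper that prints the commands rather than running them. It assumes each degraded array lists exactly one surviving member, that device names end in the partition number (hda1, hda2), and that the new disk uses the same partition numbers.

```shell
# suggest_hot_add: hypothetical helper. Prints (does not run) one
# "mdadm ... -a ..." command per array that lists a single member.
# $1 = new disk name without /dev/, e.g. hdb. mdstat text on stdin.
suggest_hot_add() {
    awk -v disk="$1" '
        /^md[0-9]+ : active/ && NF == 5 {
            member = $5                       # e.g. hda2[1]
            sub(/\[[0-9]+\]$/, "", member)    # -> hda2
            part = member
            sub(/^.*[^0-9]/, "", part)        # keep trailing digits -> 2
            printf "mdadm /dev/%s -a /dev/%s%s\n", $1, disk, part
        }
    '
}

# On the server:
#   suggest_hot_add hdb < /proc/mdstat
```

Review the printed commands against your own /proc/mdstat before running any of them as root.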

Verify That It’s Updating

Examine the output from /proc/mdstat now, and you should see something like:

Personalities : [raid1]
md1 : active raid1 hdb2[2] hda2[1]
1927680 blocks [2/1] [_U]
resync=DELAYED

md0 : active raid1 hdb1[2] hda1[1]
310640768 blocks [2/1] [_U]
[>....................]  recovery =  1.0% (3196864/310640768) finish=158.8min speed=32257K/sec

unused devices: <none>

Since it’ll take several hours, you can log out of the shell, and let it synchronize the two disks.
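
If you want to be told when the rebuild is done instead of checking back by hand, a polling loop is one option. This is a rough sketch, assuming the mdstat output keeps showing a "recovery =" or "resync=" field for as long as any array is still rebuilding, as in the sample above. rebuild_in_progress is a hypothetical helper name.

```shell
# rebuild_in_progress: hypothetical helper. Reads mdstat text on stdin and
# succeeds (exit status 0) while any array reports recovery or resync.
rebuild_in_progress() {
    grep -Eq '(recovery|resync) *='
}

# On the server, poll once a minute until the rebuild has finished:
#   while rebuild_in_progress < /proc/mdstat; do sleep 60; done
#   echo "RAID rebuild finished"
```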

Install the Grub Loader on Your New Drive

Since you just added hdb, you should install the Grub loader, in case hda ever fails or is removed. Run:

grub-install /dev/hdb

… and you should be good to go.

Finding the OpenLaszlo Version of a Compiled SWF

When you need to do additional work on a previously-built OpenLaszlo application, you need to make sure that you use the same version of OpenLaszlo as was originally used to build it. If you use a different version of OpenLaszlo, you may run into incompatibilities in the existing application’s LZX source code. Unless the original application developer documented what version of OpenLaszlo they used, it won’t be immediately obvious.

I’m assuming that at this point, you have the source code, and a compiled SWF (possibly from the live deployment server). If you don’t have the application as a compiled SWF, then this approach won’t work.

First, download and extract Flasm. Flasm is an open-source SWF disassembler. It won’t translate a SWF into LZX code, but it will convert it into readable bytecode, and that is all you really need to find the OpenLaszlo canvas attributes. You run Flasm from the command line.

Second, download the compiled application SWF (the one whose OpenLaszlo version you want to determine) onto your development machine. If it’s on a production server, you can use a *NIX command-line tool like wget or curl. If you’re on Windows, install Cygwin, and you’ll have all the *NIX goodies you need. (The Windows Flasm executable works well in a Cygwin shell). e.g.

wget http://www.antunkarlovac.com/blog/wp-content/uploads/2008/09/selectionmanager.swf

That should download the SWF to your current directory.

Finally, run Flasm, with the disassemble argument, and search for the term lpsversion. Either save the output to a file, or pipe it through grep. There’s a bunch of output, but I find the last line of output tends to be the relevant one:


push 'canvas', '__LZproxied', 'true', 'bgcolor', 16777215, 'embedfonts', TRUE, 'fontname', 'Verdana,Vera,sans-serif', 'fontsize', 11, 'fontstyle', 'plain', 'height', 260, 'lpsbuild', '10323-openlaszlo-branches-4.1', 'lpsbuilddate', '2008-07-11T15:05:43-0700', 'lpsrelease', 'Production', 'lpsversion', '4.1.1', 'proxied', FALSE, 'runtime', 'swf8', 'width', 300, 14

Note these relevant terms in that output:

  • lpsbuild: The exact build number of OpenLaszlo that was used to compile this SWF.
  • lpsversion: The version number that was used to compile this SWF.
  • runtime: The version of SWF that this was compiled to.
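
To pull one of these values out without scanning the line by eye, the disassembly can be piped through a small filter. This is a sketch, assuming Flasm prints the attribute names and string values in straight single quotes and that flasm -d writes the disassembly to standard output. lps_attr is a hypothetical helper name, and it only handles quoted string values (not numbers such as height).

```shell
# lps_attr: hypothetical helper. Reads Flasm disassembly on stdin and prints
# the quoted string value that follows the given attribute name.
# If several lines match, only the last one is printed.
lps_attr() {
    sed -n "s/.*'$1', '\([^']*\)'.*/\1/p" | tail -n 1
}

# With the SWF downloaded earlier:
#   flasm -d selectionmanager.swf | lps_attr lpsversion
#   flasm -d selectionmanager.swf | lps_attr runtime
```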

There you have it. It’s also a good tip to put a comment in your main application file, to tell other developers (and yourself, six months from now) what version of OpenLaszlo you were using.

The lz Namespace

I wrote about migrating code to OpenLaszlo 4.1 a while back, but even after a few weeks of working with it, I’m still getting bitten by the new lz namespace. I’m just so used to doing things the old way, and this (pretty significant) change is not that well documented. Here’s a summary of the lz namespace, and how it affects LZX code.

Firstly, a general rule: These changes apply to referencing classes in JavaScript and JavaScript expressions in attribute values or constraints only. The way you write tags is unchanged.

For classes that have a corresponding tag, the new syntax is to use lz.classname. (Note: all lower case, for LFC classes). This applies to Laszlo Foundation Classes (e.g. view, text), LZX components (e.g. button, window) and all classes that you write. These now live in the new lz namespace:

// Old pre-4.1 syntax
var v = new LzView(canvas, {width: 30, height: 30, bgcolor: red});

// New 4.1 syntax
var v = new lz.view(canvas, {width: 30, height: 30, bgcolor: red});
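
When migrating an existing code base, a quick search can help locate old-style instantiations like the one above. This is a rough sketch: find_old_lfc_refs is a hypothetical helper, and its pattern only covers LzView and LzText as examples, so extend the alternation with whatever tag-backed classes your own code instantiates.

```shell
# find_old_lfc_refs: hypothetical helper. Prints line-numbered matches for
# "new LzView(" or "new LzText(" in the given files (or stdin if none).
# Tagless classes such as LzDelegate are deliberately not matched.
find_old_lfc_refs() {
    grep -nE 'new Lz(View|Text) *\(' "$@"
}

# From the application's source directory:
#   find_old_lfc_refs *.lzx
```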

For classes with no tag (e.g. LzDelegate, LzContextMenu), the syntax is unchanged, so you would use LzClassName. These continue to live in the global namespace:

// Old pre-4.1 syntax
var v = new LzDelegate(this, "doSomething", this, "onmouseover");

// New 4.1 syntax (unchanged)
var v = new LzDelegate(this, "doSomething", this, "onmouseover");

For services (e.g. LzKeys, LzTimer), there has been some refactoring. The class names are now called LzKeysService, LzTimerService, and they are accessible by the new syntax lz.Name (note there’s no “Service” suffix to the name).  They are in the lz namespace:

<!-- Old pre-4.1 syntax -->
<handler name="onkeydown" reference="LzKeys" args="keyCode">
    Debug.write("User pressed key: ", keyCode);
</handler>

<!-- New 4.1 syntax -->
<handler name="onkeydown" reference="lz.Keys" args="keyCode">
    Debug.write("User pressed key: ", keyCode);
</handler>

JavaScript classes (e.g. String, Math) are unchanged. They’re not technically part of the LZX global namespace.

// Old pre-4.1 syntax
var opposite = hypotenuse * Math.sin(angle);

// New 4.1 syntax (unchanged)
var opposite = hypotenuse * Math.sin(angle);

These four categories should cover all the possible scenarios you have to deal with as a developer. If you’re ever in doubt, or want to explore what’s in the namespaces, enter lz or global in the Debugger window and click Eval. Then click the blue link that the Debugger returns to serialize the object. You’ll be able to see all the class names and objects that are in each scope.