How to recover from a broken RAID array with MDADM



This article will guide you through determining whether an mdadm-based RAID array (in our case a RAID1 array) is broken and, if it is, how to rebuild it. This procedure has been tested on CentOS 5 and 6.


Determine the status of your RAID array

To view the status of your RAID arrays, enter the following as root. Note the two U's ([UU]) next to each array, indicating that both drives are available and the array is active:

# cat /proc/mdstat
Personalities : [raid1] 
read_ahead 1024 sectors
md2 : active raid1 hda3[1] hdb3[0]
      262016 blocks [2/2] [UU]
      
md1 : active raid1 hda2[1] hdb2[0]
      119684160 blocks [2/2] [UU]
      
md0 : active raid1 hda1[1] hdb1[0]
      102208 blocks [2/2] [UU]
      
unused devices:

 

In case a drive has failed, the output would look like this:

Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1  hda1[1]
      102208 blocks [2/1] [_U]

md2 : active raid1 hda3[1]
      262016 blocks [2/1] [_U]

md1 : active raid1 hda2[1]
      119684160 blocks [2/1] [_U]
unused devices: 

Note that the failed drive's partitions are no longer listed, and that an underscore appears beside each U (_U or U_). This shows that only one drive is active in these arrays: there is no longer a mirror.
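
If you manage several servers, a quick convenience check (not part of the recovery procedure itself) is to grep /proc/mdstat for the underscore pattern; any output means at least one array is running degraded:

# grep '\[.*_.*\]' /proc/mdstat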

 

Another command that will show the state of a RAID array is mdadm -D, where /dev/md0 is the device ID:

# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.00
  Creation Time : Thu Aug 21 12:22:43 2003
     Raid Level : raid1
     Array Size : 102208 (99.81 MiB 104.66 MB)
    Device Size : 102208 (99.81 MiB 104.66 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Oct 15 06:25:45 2004
          State : dirty, no-errors
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0


    Number   Major   Minor   RaidDevice State
       0       0        0        0      faulty removed
       1       3        1        1      active sync   /dev/hda1
           UUID : f9401842:995dc86c:b4102b57:f2996278

This output shows that there is presently only one drive in the array; the other slot is marked faulty removed. A fully working array would display a State of clean, with two active and working devices.
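
If you are not sure which md devices exist on a system, mdadm can also enumerate every array it finds; each array is printed as an ARRAY line with its device name and UUID:

# mdadm --detail --scan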

 

To get information about the status of the RAID array you can also use mdadm in --misc mode:

# mdadm --misc --detail /dev/md1
/dev/md1:
        Version : 00.90.00
  Creation Time : Tue Nov 7 22:01:16 2006
     Raid Level : raid1
     Array Size : 3068288 (2.93 GiB 3.14 GB)
    Device Size : 3068288 (2.93 GiB 3.14 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Nov 8 15:42:35 2006
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 4a9a446d:af55e24b:b311aa61:8dc74ed4
         Events : 0.12

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed

 

To view a list of all disks and partitions the kernel currently knows about, enter the following:

cat /proc/partitions
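
In some cases (for example intermittent errors rather than a completely dead disk) the failing partitions may still be listed as array members. If so, you may need to mark them as failed and remove them from their arrays before physically replacing the drive. A sketch, assuming /dev/sdb is the failing disk and md1/md2 use its first two partitions as in the example above:

# mdadm /dev/md1 --fail /dev/sdb1 --remove /dev/sdb1
# mdadm /dev/md2 --fail /dev/sdb2 --remove /dev/sdb2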

 

Recover from a broken RAID array

1. Install GRUB on remaining hard drive

Before removing the failed hard drive, it is imperative to double-check that GRUB is installed on the remaining drive. Unless the drive is hot-swappable, your system has to be rebooted for the replacement, and if GRUB is not present on the remaining drive, the system will fail to boot.

The procedure outlined below is valid for CentOS 5 and 6. CentOS 7 uses GRUB2, so the steps there are different.
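
On CentOS 7 the equivalent step would be re-installing GRUB2 to the MBR of the surviving drive, for example (assuming /dev/sdb is the remaining disk; adjust to your device):

# grub2-install /dev/sdb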

 

So, let's install grub on the MBR of the remaining hard drive.

Enter the Grub command line:

# grub

 

First, locate the grub setup files:

grub> find /grub/stage1

 

On a RAID 1 with two drives present you should expect to get:

(hd0,0)
(hd1,0)

 

Install grub on the MBR of the remaining hard drive if this hasn't already been done:

grub> device (hd0) /dev/sdX    (the remaining drive; use /dev/hdb for IDE drives)
grub> root (hd0,0)
grub> setup (hd0)
grub> quit

This should ensure your system boots properly after the failed hard drive has been replaced. It is highly recommended to perform these steps during the initial installation of your server; that way, if trouble comes knocking, you are well prepared.
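
As an alternative to the interactive grub shell, the grub-install wrapper can usually achieve the same result on CentOS 5 and 6 (again assuming the remaining drive is /dev/hdb; adjust to your device):

# grub-install /dev/hdb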


2. Recreate the partition structure of the failed drive

To get the mirrored drives working properly again, we need to run fdisk to see what partitions are on the working drive:

# fdisk /dev/hda

Command (m for help): p

Disk /dev/hda: 255 heads, 63 sectors, 14946 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        13    104391   fd  Linux raid autodetect
/dev/hda2            14     14913 119684250   fd  Linux raid autodetect
/dev/hda3         14914     14946    265072+  fd  Linux raid autodetect

We can copy the partition table from the working drive to the new drive with sfdisk. It may be necessary to add the --force flag.

 

If sda has been replaced:

sfdisk -d /dev/sdb | sfdisk /dev/sda

If sdb has been replaced:

sfdisk -d /dev/sda | sfdisk /dev/sdb
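
After the copy, it is worth confirming that the new drive now shows the same partition layout as the working one:

# fdisk -l /dev/sda
# fdisk -l /dev/sdb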


3. Rebuild the RAID arrays

Now that the partitions are configured on the newly installed hard drive, we can add them back to their arrays so the mirrors can rebuild. Please note that synchronizing the drives may take a long time to complete.

mdadm /dev/md1 --manage --add /dev/sda1 
mdadm /dev/md2 --manage --add /dev/sda2

 

The rebuilding progress can be viewed by entering:

# cat /proc/mdstat
Personalities : [raid1] 
read_ahead 1024 sectors
md0 : active raid1 hdb1[0] hda1[1]
      102208 blocks [2/2] [UU]
      
md2 : active raid1 hda3[1]
      262016 blocks [2/1] [_U]
      
md1 : active raid1 hdb2[2] hda2[1]
      119684160 blocks [2/1] [_U]
      [>....................]  recovery =  0.2% (250108/119684160) finish=198.8min speed=10004K/sec
unused devices:
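
If you prefer a continuously refreshing view of the rebuild, something like this works as well (press Ctrl+C to exit):

# watch -n 5 cat /proc/mdstat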

 

md0, a small array, has already completed rebuilding (UU), while md1 has only just begun. Once it finishes, mdadm -D will show:

#  mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.00
  Creation Time : Thu Aug 21 12:21:21 2003
     Raid Level : raid1
     Array Size : 119684160 (114.13 GiB 122.55 GB)
    Device Size : 119684160 (114.13 GiB 122.55 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Oct 15 13:19:11 2004
          State : dirty, no-errors
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0


    Number   Major   Minor   RaidDevice State
       0       3       66        0      active sync   /dev/hdb2
       1       3        2        1      active sync   /dev/hda2
           UUID : ede70f08:0fdf752d:b408d85a:ada8922b
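
Depending on your setup, you may also want /etc/mdadm.conf to reflect the current arrays. One common approach is to append freshly scanned ARRAY lines and then edit the file to remove any stale or duplicate entries:

# mdadm --detail --scan >> /etc/mdadm.conf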


4. Recreate Swap partition

Finally, make sure swap space is restored on the replaced drive.

View the currently active swap partitions:

# cat /proc/swaps
Filename                        Type            Size    Used    Priority
/dev/sdb3                       partition       522104  43984   -1

 

Recreate the swap signature on the replaced drive by issuing the commands below. Make sure to replace /dev/sda3 with the correct partition: it should be the counterpart, on the new drive, of the swap partition shown by cat /proc/swaps above:

mkswap /dev/sda3
swapon -a
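
After running mkswap, verify that both swap areas are active again. Note that mkswap writes a new UUID, so if /etc/fstab references the swap partition by UUID or LABEL rather than by device name, update it with the value mkswap printed (blkid will show it as well):

# cat /proc/swaps
# blkid /dev/sda3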
