OS Installation

How do I rebuild a software RAID 1 array after a disk failure?

* Behavior

  • One of my software RAID disks died and my system won't boot.

* Environment

  • Red Hat Enterprise Linux 5 or later.

* Resolution

  1. Boot the machine with only the "good disk" in the system.
  2. Look at /proc/mdstat and fdisk -l output to verify RAID status. Paste output into the ticket if possible.
  3. Add the new disk into the system and boot.
  4. Install GRUB on both disks so that, if a disk fails, the system still has one disk with a valid MBR:
    # grub-install /dev/sda
    # grub-install /dev/sdb
  5. Partition the new disk so that its layout exactly matches the surviving disk, using the same starting and ending cylinders for each partition:
    # fdisk /dev/sda
  6. Set the partition type ID of each partition to "fd" ("Linux raid autodetect"), as in the following example:
    Disk /dev/sda: 750.1 GB, 750156374016 bytes
    255 heads, 63 sectors/track, 91201 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *           1          38      305203+  fd  Linux raid autodetect
    /dev/sda2              39        1058     8193150   fd  Linux raid autodetect
    /dev/sda3            1059       91201   724073647+  fd  Linux raid autodetect
  7. Tell the kernel to detect the newly created partitions:
    # partprobe
  8. Rebuild each md device by adding the new disk's partitions back in with the mdadm command. Use the /proc/mdstat output to determine which partition belongs to which md device:
    # mdadm /dev/md0 --add /dev/sda1
    # mdadm /dev/md1 --add /dev/sda3
    # mdadm /dev/md2 --add /dev/sda2
  9. Monitor the progress of the rebuild with this command:
    # watch -n 2 cat /proc/mdstat
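
The /proc/mdstat checks in steps 2 and 8 can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name degraded_md_arrays is hypothetical, and it assumes the usual RAID 1 status format in which a missing member appears as an underscore, for example [U_] or [_U]. It only reads the file and prints the names of arrays that are missing a member; it does not modify any array.

```shell
# Hypothetical helper: list md arrays that are missing a member.
# Reads /proc/mdstat by default; pass a file path to check saved output instead.
degraded_md_arrays() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '
        # An array stanza starts with a line such as "md0 : active raid1 sdb1[1]".
        /^md[0-9]+ :/ { array = $1 }
        # The status field, e.g. [_U], contains "_" for each missing member.
        array != "" && /\[[U_]*_[U_]*\]/ { print array; array = "" }
    ' "$mdstat_file"
}
```

Running degraded_md_arrays with no arguments after step 3 should list the arrays that still need a partition added in step 8. Once the rebuild has finished, it should print nothing.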
