
Linux software raid, rebuilding broken raid 1

Published: 14-04-2014 | Author: Remy van Elst | Text only version of this article


Last week Nagios alerted me about a broken disk in one of my clients' testing servers. There is a best effort SLA on the thing, and there were spare drives of the same type and size in the datacenter. Lucky me. This particular datacenter is within biking distance, so I enjoyed a sunny ride there.

If you like this article, consider sponsoring me by trying out a Digital Ocean VPS. With this link you'll get $100 in credit for 60 days (referral link).

Simply put, I needed to replace the disk and rebuild the RAID 1 array. This server is a simple Ubuntu 12.04 LTS machine with two disks running in RAID 1, no spare. The client has a tight budget, and with a best effort SLA on a non-production machine, that is fine with me. Consultant tip: make sure you have those things signed.

The _ in the output of cat /proc/mdstat tells me the second disk (/dev/sdb) has failed:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1[0] sdb1[1](F)
      129596288 blocks [2/1] [U_]

U means up, _ means down [source]
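If you want more detail than /proc/mdstat provides, mdadm itself can report the state of the array and each member. This is standard mdadm usage, not a step from the original procedure:

mdadm --detail /dev/md0

The State line should read something like "clean, degraded", with the faulty or removed member listed at the bottom of the output.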

First we remove the disk from the RAID array:

mdadm --manage /dev/md0 --remove /dev/sdb1
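Note that mdadm will only remove a member that is already marked as failed or spare. If the kernel has not flagged /dev/sdb1 as faulty yet, mark it failed first and then run the remove again; failing a member like this is standard mdadm behaviour:

mdadm --manage /dev/md0 --fail /dev/sdb1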

Make sure the server can boot from a degraded RAID array:

grep BOOT_DEGRADED /etc/initramfs-tools/conf.d/mdadm
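Assuming the Ubuntu default configuration file, the grep should return this when booting from a degraded array is allowed:

BOOT_DEGRADED=true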

If it says true, continue on. If not, add or change it and rebuild the initramfs using the following command:

update-initramfs -u

(Thank you Karssen)

We can now safely shut down the server:

shutdown -h 10
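The 10 here schedules the halt with a delay in minutes, which gives logged-in users some warning. To halt immediately instead:

shutdown -h now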

Replacing the disk was a challenge in itself: this is a Supermicro 512L-260B chassis where the disks are not in a drive bay; rather, they are screwed in from the bottom. Therefore the whole server needs to be removed from the rack (no rails...) to replace a disk.

Normally I would replace a disk while the server is running, but this machine has no hot-swap bays, so that would be kind of an issue in a full rack.

After that, boot the server from the first disk (via the BIOS/UEFI). Make sure you boot to recovery mode. Select the root shell and mount the disk read/write:

mount -o remount,rw /dev/sda1
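Before touching any partition tables, it doesn't hurt to confirm which device is the old disk and which is the new, empty one. Listing both partition tables is an extra sanity check, not part of the original write-up:

fdisk -l /dev/sda
fdisk -l /dev/sdb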

Now copy the partition table to the new (in my case, empty) disk:

sfdisk -d /dev/sda | sfdisk /dev/sdb

This will erase data on the new disk.
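To verify that the copy succeeded, dump both partition tables again and compare; the layouts should be identical:

sfdisk -d /dev/sda
sfdisk -d /dev/sdb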

Add the disk to the RAID array and wait for the rebuilding to be complete:

mdadm --manage /dev/md0 --add /dev/sdb1

This is a nice progress command:

watch cat /proc/mdstat

It will take a while on large disks:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1[0] sdb1[1]
      129596288 blocks [2/1] [U_]
      [=>...................]  recovery = 2.6% (343392/129596288) finish=67min speed=98840K/sec

unused devices: <none>
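Once the rebuild finishes, /proc/mdstat should show both members up again; based on the array above, expect something like:

md0 : active raid1 sda1[0] sdb1[1]
      129596288 blocks [2/2] [UU]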
Tags: blog, disks, kernel, mdadm, raid, software-raid, ubuntu