
Linux software raid, rebuilding broken raid 1

Published: 14-04-2014 | Author: Remy van Elst


Last week Nagios alerted me about a broken disk in one of my clients' testing servers. There is a best-effort SLA on the machine, and there were spare drives of the same type and size in the datacenter. Lucky me. This particular datacenter is within biking distance, so I enjoyed a sunny ride there.

Simply put, I needed to replace the disk and rebuild the RAID 1 array. This server is a simple Ubuntu 12.04 LTS machine with two disks running in RAID 1, no spare. The client has a tight budget, and with a best-effort SLA on a non-production machine, that is fine with me. Consultant tip: make sure you have those things signed.

The _ in the output of cat /proc/mdstat tells me that the second disk (/dev/sdb) has failed:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1[0] sdb1[1](F)
      129596288 blocks [2/1] [U_]

U means up, _ means down.
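Spotting that underscore by eye works, but it can also be scripted, for instance in a Nagios check. A minimal sketch; the variable here is a sample standing in for the real contents of /proc/mdstat:

```shell
# Detect a degraded array by looking for a "_" inside the [UU]-style
# status field. The variable stands in for: cat /proc/mdstat
mdstat='md0 : active raid1 sda1[0] sdb1[1](F)
      129596288 blocks [2/1] [U_]'

if printf '%s\n' "$mdstat" | grep -q '\[U*_'; then
  echo "degraded"
else
  echo "healthy"
fi
```

In real use, replace the variable with the actual file contents; the pattern also catches a failed first disk ([_U]).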

First we remove the disk from the RAID array:

mdadm --manage /dev/md0 --remove /dev/sdb1
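If mdadm refuses the remove because the kernel still lists the partition as active, mark it as failed first. A hedged sketch; the DRYRUN wrapper is my own addition so the sequence can be printed and checked before running it for real:

```shell
# Sketch: mark the partition failed, then remove it from the array.
# With DRYRUN=1 the commands are only printed, not executed.
DRYRUN=${DRYRUN:-1}
run() { if [ "$DRYRUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

run mdadm --manage /dev/md0 --fail /dev/sdb1
run mdadm --manage /dev/md0 --remove /dev/sdb1
```

Set DRYRUN=0 (as root) to actually run the two mdadm commands.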

Make sure the server can boot from a degraded RAID array:

grep BOOT_DEGRADED /etc/initramfs-tools/conf.d/mdadm

If it says true, continue on. If not, set BOOT_DEGRADED=true and rebuild the initramfs using the following command:

update-initramfs -u

(Thank you Karssen)
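The edit itself can be done non-interactively. A sketch, assuming the file already contains a BOOT_DEGRADED line; it works on a scratch copy here, but in real use point CONF at /etc/initramfs-tools/conf.d/mdadm and run as root:

```shell
# Flip BOOT_DEGRADED to true in the mdadm initramfs conf file, then
# verify. CONF is a scratch copy here; in real use set
# CONF=/etc/initramfs-tools/conf.d/mdadm and run as root.
CONF=$(mktemp)
printf 'BOOT_DEGRADED=false\n' > "$CONF"   # sample contents

sed -i 's/^BOOT_DEGRADED=.*/BOOT_DEGRADED=true/' "$CONF"
grep '^BOOT_DEGRADED=' "$CONF"
# afterwards, rebuild the initramfs: update-initramfs -u
```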

We can now safely shut down the server:

shutdown -h 10

Replacing the disk was an issue in itself: it is a Supermicro 512L-260B chassis where the disks are not in a drive bay, rather they are screwed in from the bottom. Therefore the whole server needs to be removed from the rack (no rails...) when replacing a disk.

Normally I would replace them while the server is on, but this server has no hot-swappable disks, so that would be a problem in a full rack.

After that, boot the server from the first disk (via the BIOS/UEFI). Make sure you boot into recovery mode. Select the root shell and remount the root filesystem read/write:

mount -o remount,rw /dev/sda1

Now copy the partition table to the new (in my case, empty) disk:

sfdisk -d /dev/sda | sfdisk /dev/sdb

This will overwrite the partition table on the new disk, erasing any data on it.
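A quick sanity check after the copy: the two sfdisk dumps should describe the same layout, although the device names in them differ (sda1 vs sdb1). A sketch, with sample dump lines standing in for the real sfdisk -d output:

```shell
# Sample dump lines; in real use:
#   a=$(sfdisk -d /dev/sda)  b=$(sfdisk -d /dev/sdb)
a='/dev/sda1 : start=63, size=259192576, Id=fd, bootable'
b='/dev/sdb1 : start=63, size=259192576, Id=fd, bootable'

# Strip the device-name column before comparing, since it always differs.
layout_a=$(printf '%s\n' "$a" | cut -d: -f2-)
layout_b=$(printf '%s\n' "$b" | cut -d: -f2-)
[ "$layout_a" = "$layout_b" ] && echo "partition layouts match"
```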

Add the disk to the RAID array and wait for the rebuilding to be complete:

mdadm --manage /dev/md0 --add /dev/sdb1

This is a nice progress command:

watch cat /proc/mdstat
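For scripting, mdadm --wait /dev/md0 blocks until the resync is finished. The rebuild percentage can also be pulled out of the mdstat output; a sketch, with a sample recovery line standing in for the real file:

```shell
# Extract the rebuild percentage from an mdstat recovery line.
# The variable stands in for a line of: cat /proc/mdstat
line='      [=>...................]  recovery = 2.6% (343392/129596288) finish=67min speed=98840K/sec'

pct=$(printf '%s\n' "$line" | sed -n 's/.*recovery = *\([0-9.]*\)%.*/\1/p')
echo "rebuild at ${pct}%"
```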

It will take a while on large disks:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1[0] sdb1[1]
      129596288 blocks [2/1] [U_]
      [=>...................]  recovery = 2.6% (343392/129596288) finish=67min speed=98840K/sec

unused devices: <none> 
Tags: blog , disks , kernel , mdadm , raid , software-raid , ubuntu