RAID Administration

Introduction

Whilst several articles cover RAID installation to various degrees (see 1, 2, 3, 4, 5, 6, 7 for example), this article provides practical information on RAID administration, regardless of the RAID type or installation method used.

This article uses Linux software RAID, also known as md (multiple devices) after the kernel driver that implements it, which is managed with the mdadm command.

For the purposes of this example, we will create a RAID 1 array across /dev/sda and /dev/sdb using the setup-alpine script (more specifically the setup-disk script) and then add /dev/sdc to the array after installation. By default this adds it as a hot spare, which is used only if one of the other drives fails. Alternatively, the drive can be made an active member of the array immediately (as explained in the optional steps). The instructions in this article should work regardless of whether you are using RAID 1 or RAID 5 and whether you have set up your disks manually or with the setup script, unless stated otherwise.

In this example /dev/sda, /dev/sdb and /dev/sdc are all virtual 2GB disks on a VMware machine (it doesn't matter that it's a VM, the same process applies to a real machine with physical disks of larger sizes).

In our example, all disks are present at the time of installation; however, /dev/sdc could be added later. This has no impact on the procedure described other than having to physically add the disk.

Initial setup

Install with setup-alpine and pass the relevant disks to setup-disk (in our case sda sdb) and use installation method sys.

This should create the following disk setup (it will differ in your setup since values of course depend on drive size):

md0 composed of /dev/sda1 and /dev/sdb1 ~100MB mounted as /boot

md1 composed of /dev/sda2 and /dev/sdb2 ~512MB used as swap

md2 composed of /dev/sda3 and /dev/sdb3 ~1400MB mounted as /

As you can see, we have redundancy across the two drives /dev/sda and /dev/sdb.

Review

Run df -h and observe that the RAID arrays are mounted, not the disk partitions as usual.

To see information on the current RAID partitions use the query option:

 mdadm --query /dev/md0

or for more information use the detail option

 mdadm --detail /dev/md1
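For a quick overview of every array at once, the kernel also exposes the array state in /proc/mdstat. The following is a minimal sketch, not part of the original procedure: md_summary is a hypothetical helper name, and it assumes the usual /proc/mdstat layout of a header line per array followed by a status line containing a [configured/working] device count.

```shell
# Summarise md arrays from /proc/mdstat-style text on stdin.
# Prints one line per active array: name, RAID level, [configured/working].
# md_summary is a hypothetical helper name, not an mdadm feature.
md_summary() {
    awk '
        # Array header, e.g. "md0 : active raid1 sdb1[1] sda1[0]".
        # An idle array may read "active (auto-read-only) raid1 ...".
        $1 ~ /^md/ && $2 == ":" && $3 == "active" {
            name = $1
            level = ($4 ~ /^\(/) ? $5 : $4
            next
        }
        # Status line, e.g. "102336 blocks super 1.2 [2/2] [UU]".
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            print name, level, substr($0, RSTART, RLENGTH)
            name = ""
        }
    '
}
```

Typical use would be:

 md_summary < /proc/mdstat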

After the initial setup, if you haven't yet added the third drive (/dev/sdc), now is the time to power off and physically add it to the machine.

Add devices to the array

Now, let's add /dev/sdc to the RAID array.

Copy partition table

First, copy the partition table from an existing drive to the new drive. The dd command below copies the first 512-byte sector (the MBR), which holds the partition table. Be very careful with the dd command and ensure you are copying from/to the correct place!

Note: for GPT partitioning, which you might have used if you've set up your disks manually, this dd command will not work, since GPT stores its partition table beyond the first sector.

 dd if=/dev/sda of=/dev/sdc bs=512 count=1

Ensure this worked correctly by comparing the output of sfdisk for both drives; apart from the device names, the two dumps should be identical:

 sfdisk --dump /dev/sda
 sfdisk --dump /dev/sdc
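Comparing the two dumps by eye is error-prone, so the comparison can be scripted. This is a rough sketch under stated assumptions: normalize_dump and same_layout are hypothetical helper names, and the sed expression assumes sdX-style device names as used in this example (it would need adjusting for names such as nvme0n1p1).

```shell
# Strip device-specific names from "sfdisk --dump" output read on stdin,
# so that dumps of two different disks can be compared directly.
# Assumes sdX-style device names (/dev/sda1, /dev/sdc1, ...).
normalize_dump() {
    sed -e '/^device:/d' \
        -e 's|^/dev/sd[a-z]*\([0-9][0-9]*\) |part\1 |'
}

# Succeeds (exit status 0) when both disks report the same layout.
# An empty dump (for example when sfdisk fails) counts as a mismatch.
same_layout() {
    a=$(sfdisk --dump "$1" | normalize_dump)
    b=$(sfdisk --dump "$2" | normalize_dump)
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example:

 same_layout /dev/sda /dev/sdc && echo "partition tables match"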

Add devices

Now add the partitions of the new disk to the relevant RAID arrays. Be sure to add the correct partitions to the correct arrays!

 mdadm /dev/md0 -a /dev/sdc1
 mdadm /dev/md1 -a /dev/sdc2
 mdadm /dev/md2 -a /dev/sdc3

You should see something like mdadm: added /dev/sdc1 if the command is successful. The -a flag is for add.
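Because adding a partition to the wrong array is an easy mistake to make, it can help to print the commands for review before running them. The sketch below assumes the layout from this example only: three arrays, with partition N of the disk belonging to /dev/md(N-1). print_add_cmds is a hypothetical helper name.

```shell
# Print (do not run) the mdadm commands that add each partition of a new
# disk to its array. Assumes the layout from this example: three arrays,
# with partition N of the disk belonging to /dev/md(N-1).
print_add_cmds() {
    disk=$1    # bare disk name, e.g. sdc
    for n in 1 2 3; do
        echo "mdadm /dev/md$((n - 1)) -a /dev/${disk}${n}"
    done
}
```

Run print_add_cmds sdc to review the commands, then pipe the output to sh once you are satisfied that each partition maps to the right array:

 print_add_cmds sdc | sh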

Now see how the output of the query command has changed from earlier:

 mdadm --query /dev/md0
 mdadm --query /dev/md1
 mdadm --query /dev/md2

You should see we still have two devices in each array, plus now we have a spare. A spare is an inactive device that is a member of the array; it will only be used if one of the other devices fails. If this is good enough for you, you're done!
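The spare count can also be read from the more verbose --detail output, which is handy in scripts. A minimal sketch, assuming the "Label : value" layout that mdadm --detail prints for fields such as Raid Devices, Active Devices and Spare Devices; detail_field is a hypothetical helper name.

```shell
# Print the value of one field from "mdadm --detail" output read on stdin.
# $1 is the field label exactly as printed, e.g. "Spare Devices".
detail_field() {
    awk -F ' : ' -v key="$1" '
        { label = $1; sub(/^[ \t]+/, "", label) }   # drop the indentation
        label == key { print $2; exit }
    '
}
```

For example, to print the number of spares in md0:

 mdadm --detail /dev/md0 | detail_field "Spare Devices"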

Grow the array (optional)

Otherwise, you can take the optional step of promoting the 'spare' device so that it immediately becomes an active member of the array. Since we're using RAID 1 in our example, this effectively gives us a third copy of all data:

 mdadm --grow /dev/md0 -n 2

This should give you something like mdadm: /dev/md0: no change requested. This is because we already have -n 2 set (so we use 2 devices in the array). The --grow flag is used to increase (or decrease) the number of active devices in the array. Let's increase the value and bring the additional device into the array:

 mdadm --grow /dev/md0 -n 3

You should see something like raid_disks for /dev/md0 set to 3 if successful. -n 3 specifies that there should be three active devices in the array. There can still be additional spare devices if you add more and do not grow the array.

Review the output of mdadm --query /dev/md0 or mdadm --detail /dev/md0 again to confirm it worked. Don't worry if you see something about 'spare rebuilding' - this is normal and will be replaced with a state of 'active sync' once data copying is complete.
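If a script needs to wait until the copy has finished, one option is to poll /proc/mdstat for a progress line. This is a sketch under assumptions: rebuild_running and wait_for_rebuild are hypothetical helper names, and the pattern relies on the kernel reporting progress as "recovery = N%" or "resync = N%" (or DELAYED while another array on the same disks is rebuilding). A state of PENDING is deliberately ignored, since an idle array can stay in that state indefinitely.

```shell
# Succeeds (exit status 0) while /proc/mdstat-style text on stdin shows a
# rebuild that is running or queued ("recovery = 8.5%", "resync=DELAYED").
rebuild_running() {
    grep -Eq '(recovery|resync) *= *([0-9]|DELAYED)'
}

# Poll until no array is rebuilding. $1 is the file to read (defaults to
# /proc/mdstat) and $2 the polling interval in seconds (defaults to 5).
wait_for_rebuild() {
    while rebuild_running < "${1:-/proc/mdstat}"; do
        sleep "${2:-5}"
    done
}
```

mdadm also offers this for a single array through its --wait option (mdadm --wait /dev/md0), which is usually the simpler choice when only one array is involved.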

Be sure to bring the other devices (partitions) into their arrays as well by increasing the device count for the other arrays (otherwise they will remain as spares and not be immediately utilised):

 mdadm --grow /dev/md1 -n 3
 mdadm --grow /dev/md2 -n 3

Remove devices

To remove a failed device use the following; remember you will need to remove all the partitions of the failing drive (devices) from the relevant RAID arrays. In our example, we will mark the partitions of /dev/sdb as failed and remove them from the array:

 mdadm /dev/md0 -f /dev/sdb1 -r /dev/sdb1
 mdadm /dev/md1 -f /dev/sdb2 -r /dev/sdb2
 mdadm /dev/md2 -f /dev/sdb3 -r /dev/sdb3

Check the output of mdadm --detail /dev/md2 and see how the device is marked as 'removed'. The -f flag is used to mark a device as failed and -r is used to remove a device from the array.
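After removing a device it is worth confirming which arrays are now running with a missing member. The sketch below relies on the status map in /proc/mdstat, where each working device is shown as U and each missing one as _ (for example [U_U]); degraded_arrays is a hypothetical helper name.

```shell
# List arrays that are running with a missing member, based on
# /proc/mdstat-style text on stdin. A "_" in the status map (e.g. [U_U])
# marks a slot with no working device.
degraded_arrays() {
    awk '
        $1 ~ /^md/ && $2 == ":" { name = $1; next }
        name != "" && /\[[U_]*_[U_]*\]/ { print name; name = "" }
    '
}
```

Typical use would be:

 degraded_arrays < /proc/mdstat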

To add a removed device back in, ensure it's partitioned correctly (replace the drive if necessary and copy over the partition table from a known good drive) and then simply add it back in again:

 mdadm /dev/md2 -a /dev/sdb3

(repeat for other partitions as appropriate).

Change device count (optional)

To remove the device from the array entirely (for instance, if you are not going to add it back later), amend the device count again. This removes it from the list so that it no longer shows as 'removed', and we are back to two devices in the array:

 mdadm --grow /dev/md0 -n 2
 mdadm --grow /dev/md1 -n 2
 mdadm --grow /dev/md2 -n 2

If you do need to add disks back in again, you need to add them as spares (mdadm /dev/md0 -a /dev/sdb1 etc) and then change the device count if you wish to make the device active, as per the section on adding devices.

Zero the superblock

Zeroing the superblock is important if you intend to take a disk from one array and add it to another, for example on another machine. A stale superblock still identifies the disk as a member of its old array, so the old array may be assembled from it instead of the one you intended, and this can cause all manner of headaches. To remove the superblock from a disk so you can use it elsewhere, use the --zero-superblock option. To continue from our example above:

 mdadm --zero-superblock /dev/sdb1
 mdadm --zero-superblock /dev/sdb2
 mdadm --zero-superblock /dev/sdb3

Drive /dev/sdb can then be removed and added to another RAID array without causing issues.
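Before zeroing a superblock, it is sensible to confirm that the partition really has been removed from every array. The following sketch checks the member lists in /proc/mdstat; member_of is a hypothetical helper name, and it expects a bare partition name such as sdb1 rather than a /dev path.

```shell
# Print the md array (if any) that still lists the given partition as a
# member, based on /proc/mdstat-style text on stdin. $1 is a bare
# partition name such as sdb1.
member_of() {
    awk -v dev="$1" '
        $1 ~ /^md/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                member = $i
                sub(/\[.*/, "", member)   # "sdb1[1](F)" -> "sdb1"
                if (member == dev) { print $1; exit }
            }
        }
    '
}
```

If the following prints nothing, sdb1 is no longer part of any running array:

 member_of sdb1 < /proc/mdstat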

General recommendations

When using RAID arrays, best practice is to have one more disk than is required, added as a spare, so that a rebuild can start as soon as a device fails. Remember that RAID 1 needs at least 2 disks (it can run on one disk, known as degraded mode, but this is best avoided at all costs) and RAID 5 needs at least 3 disks. In short, if you are using RAID, have a spare device configured.

Disks cost money, but the data on those disks is often priceless!

It's a good idea to have a test environment to play around with RAID before implementing it in production. At the very least, set up a VirtualBox host, run an Alpine VM and experiment with that first.

Further information

man mdadm

RAID on Wikipedia

Linux RAID wiki at kernel.org