Raid Administration
Ginjachris (talk | contribs) |
Latest revision as of 09:07, 12 January 2024
Introduction
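Whilst there are articles that cover RAID installation to varying degrees (see ISCSI Raid and Clustered File Systems, Setting up a software RAID array, Linux iSCSI Target (TCM), Setting up a /var partition on software IDE raid1 and Setting up disks manually, for example), this article is designed to provide practical information on RAID administration, regardless of the RAID type used or the installation method.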
This article uses Linux software RAID, known as md (short for "multiple devices"), which is managed with the mdadm command.
For the purposes of this example, we will create a RAID 1 array across /dev/sda and /dev/sdb using the setup-alpine script (more specifically the setup-disk script) and then add /dev/sdc to the array after installation. This will add it as a hot spare, which will be used if one of the other drives fails. Alternatively, the drive can immediately be added to the RAID array as an active member (as explained in the optional steps).
The instructions in this article should work regardless of whether you are using RAID 1 or RAID 5 and whether you have set up your disks manually or with the setup script, unless stated otherwise.
In this example /dev/sda, /dev/sdb and /dev/sdc are all virtual 2GB disks on a VMware machine (it doesn't matter that it's a VM; the same process applies to a real machine with physical disks of larger sizes).
In our example, all disks are present at the time of installation; however, /dev/sdc could be added at a later time. That has no impact on the procedure described, other than having to physically add the disk.
Initial setup
Install with setup-alpine and pass the relevant disks to setup-disk (in our case sda sdb), using installation method sys.
This should create the following disk setup (it will differ in your setup since values of course depend on drive size):
md0 composed of /dev/sda1 and /dev/sdb1 ~100MB mounted as /boot
md1 composed of /dev/sda2 and /dev/sdb2 ~512MB used as swap
md2 composed of /dev/sda3 and /dev/sdb3 ~1400MB mounted as /
As you can see, we have redundancy across the two drives /dev/sda and /dev/sdb.
Review
Run df -h and observe that the RAID arrays are mounted, rather than the disk partitions as usual.
To see information on the current RAID partitions use the query option:
mdadm --query /dev/md0
or, for more information, use the detail option:
mdadm --detail /dev/md1
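For a quick overview of every array at once, the kernel also reports array state in /proc/mdstat. The following is a minimal sketch, assuming the usual mdstat layout in which each array's status line ends in a field such as [UU] (one U per working member, an underscore for a missing one). The function name degraded_arrays is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the names of arrays that have a missing member.
# Reads mdstat-formatted text on stdin, so it can be tried on a saved copy.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                    # remember the current array
        $NF ~ /^\[[U_]*_[U_]*\]$/ && name != "" {      # status such as [U_] or [_U]
            print name
        }
    '
}

# Usage on a live system:
#   degraded_arrays < /proc/mdstat
```

No output means every array listed has all of its members in place.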
After the initial setup, if you haven't yet added the third drive (/dev/sdc), now is the time to power off and physically add it to the machine.
Add devices to the array
Now, let's add /dev/sdc to the RAID array.
Copy partition table
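First, copy the partition table from an existing drive to the new drive. The command below copies only the first 512-byte sector, which holds the MBR and its partition table, so it applies to MBR-partitioned disks such as those in this example. Be very careful with the dd command and ensure you are copying from and to the correct devices!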
dd if=/dev/sda of=/dev/sdc bs=512 count=1
Ensure this worked correctly by comparing the output of sfdisk for both drives; apart from the device names, they should be identical:
sfdisk --dump /dev/sda
sfdisk --dump /dev/sdc
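Because each dump names its own device, comparing the two by eye can be error-prone. The sketch below strips the device names so the two layouts can be compared mechanically. It assumes sdX-style device names and a dump format in which partition lines start with the device path. The helper name layout_only is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reduce an sfdisk dump (read on stdin) to its layout,
# dropping comment lines, the "device:" header and the /dev/sdX prefix.
layout_only() {
    sed -e '/^device:/d' -e '/^#/d' -e 's|^/dev/[a-z]*||'
}

# Usage on a live system:
#   sfdisk --dump /dev/sda | layout_only > /tmp/sda.layout
#   sfdisk --dump /dev/sdc | layout_only > /tmp/sdc.layout
#   cmp /tmp/sda.layout /tmp/sdc.layout && echo "layouts match"
```

If cmp reports a difference, do not add the new partitions to the arrays until the partition table has been copied correctly.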
Add devices
Now add the partitions of the new disk to the relevant RAID arrays. Be sure to add the correct partitions to the correct arrays!
mdadm /dev/md0 -a /dev/sdc1
mdadm /dev/md1 -a /dev/sdc2
mdadm /dev/md2 -a /dev/sdc3
You should see something like mdadm: added /dev/sdc1 if the command is successful. The -a flag is for add.
Now see how the output of the query command has changed from earlier:
mdadm --query /dev/md0
mdadm --query /dev/md1
mdadm --query /dev/md2
You should see we still have two devices in each array, plus now we have a spare. A spare is an inactive device that is a member of the array; it will only be used if one of the other devices fails. If this is good enough for you, you're done!
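The add commands in this section follow a pattern: in this article's layout, partition N of a disk belongs to array md(N-1). As a rough sketch, they can be generated in a loop and reviewed before being run. The function name print_add_commands is made up for this example, and the mapping only holds for the three-partition layout described above.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the mdadm commands that add
# partitions 1-3 of a disk to md0-md2, matching this article's layout.
print_add_commands() {
    disk=$1                       # e.g. sdc
    for part in 1 2 3; do
        array=$((part - 1))       # sdc1 -> md0, sdc2 -> md1, sdc3 -> md2
        echo "mdadm /dev/md${array} -a /dev/${disk}${part}"
    done
}

# Usage:
#   print_add_commands sdc          # review the commands first
#   print_add_commands sdc | sh     # then run them as root
```

Printing the commands first makes it harder to add a partition to the wrong array by accident.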
Grow the array (optional)
Otherwise you can take the optional step to add the 'spare' device so it immediately becomes an active part of the array. Since we're using RAID 1 in our example, this effectively gives us another mirrored copy of all data:
mdadm --grow /dev/md0 -n 2
This should give you something like mdadm: /dev/md0: no change requested. This is because -n 2 is already set (so we use 2 devices in the array). The --grow flag is used to grow the array and increase (or decrease) the number of devices in the array. Let's increase the value and bring the additional device into the array:
mdadm --grow /dev/md0 -n 3
You should see something like raid_disks for /dev/md0 set to 3 if successful. -n 3 specifies that there should be three active devices in the array. There can still be additional spare devices if you add more and do not grow the array.
Review the output of mdadm --query /dev/md0 or mdadm --detail /dev/md0 again to confirm it worked. Don't worry if you see something about 'spare rebuilding' - this is normal and will be replaced with a state of 'active sync' once data copying is complete.
Be sure to bring in the other devices (partitions) as well by increasing the device count for the other arrays (otherwise they will remain as spares and not be immediately utilised):
mdadm --grow /dev/md1 -n 3
mdadm --grow /dev/md2 -n 3
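After growing, each array copies data onto its new member, and it can be useful to wait for that to finish before doing anything else to the disks. The sketch below checks /proc/mdstat for a progress line. It assumes the kernel reports activity with the words recovery, resync or reshape followed by an equals sign. The function name rebuild_in_progress is hypothetical. mdadm also has a --wait option that blocks until such activity finishes on a given array, which may be simpler if your version provides it.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the mdstat text on stdin shows
# a recovery, resync or reshape that is running or queued.
rebuild_in_progress() {
    grep -Eq '(recovery|resync|reshape) *='
}

# Usage on a live system, polling every 10 seconds:
#   while rebuild_in_progress < /proc/mdstat; do sleep 10; done
#   echo "all arrays are in sync"
```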
Remove devices
To remove a failed device use the following; remember you will need to remove all the partitions of the failing drive (devices) from the relevant RAID arrays. In our example, we will mark the partitions of /dev/sdb as failed and remove them from the array:
mdadm /dev/md0 -f /dev/sdb1 -r /dev/sdb1
mdadm /dev/md1 -f /dev/sdb2 -r /dev/sdb2
mdadm /dev/md2 -f /dev/sdb3 -r /dev/sdb3
Check the output of mdadm --detail /dev/md2 and see how the device is marked as 'removed'.
The -f flag is used to mark a device as failed and -r is used to remove a device from the array.
To add a removed device back in, ensure it's partitioned correctly (replace the drive if necessary and copy over the partition table from a known good drive) and then simply add it back in again:
mdadm /dev/md2 -a /dev/sdb3
(repeat for other partitions as appropriate).
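Before deliberately marking a device as failed, it is worth confirming that the array will still have at least one working member afterwards. The following sketch counts the working members of one array from /proc/mdstat. It assumes the status field format [UU], with one U per working member. The function name working_members is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print how many working members the named array has,
# based on mdstat-formatted text read from stdin.
working_members() {
    awk -v want="$1" '
        /^md[0-9]+ :/ { name = $1 }               # remember the current array
        name == want && $NF ~ /^\[[U_]+\]$/ {     # status field, e.g. [UU] or [U_]
            print gsub(/U/, "U", $NF)             # gsub returns the number of U characters
            exit
        }
    '
}

# Usage on a live system:
#   if [ "$(working_members md2 < /proc/mdstat)" -gt 1 ]; then
#       mdadm /dev/md2 -f /dev/sdb3 -r /dev/sdb3
#   fi
```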
Change device count (optional)
To entirely remove the device from the array (assuming you are not going to add it back later, for instance), amend the device count again. This removes it from the list so it no longer shows as 'removed', and we are back to two devices in the array:
mdadm --grow /dev/md0 -n 2
mdadm --grow /dev/md1 -n 2
mdadm --grow /dev/md2 -n 2
If you do need to add disks back in again, you need to add them as spares (mdadm /dev/md0 -a /dev/sdb1 etc) and then change the device count if you wish to make the device active, as per the section on adding devices.
Zero the superblock
Zeroing the superblock is important if you intend to take a disk from an array and add it to another array, for example on another machine. Zeroing the superblock will prevent the RAID array from becoming confused about which array it should be building; leaving the old superblock information on a disk means it will try to read this old superblock information and this can cause all manner of headaches. So, to remove the superblock from a disk so you can use it elsewhere, simply use the --zero-superblock option. To continue from our example above:
mdadm --zero-superblock /dev/sdb1
mdadm --zero-superblock /dev/sdb2
mdadm --zero-superblock /dev/sdb3
Drive /dev/sdb can then be removed and added to another RAID array without causing issues.
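Zeroing a superblock is only sensible once the partition has actually been removed from its array. As a cautious sketch, the check below looks for the partition in /proc/mdstat first. It assumes members are listed on the array line in the form sdb1[1]. The function name is_array_member is hypothetical. Running mdadm --examine on the partition beforehand is another way to see what superblock, if any, it still carries.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given partition name still
# appears as a member of any array in the mdstat text read from stdin.
is_array_member() {
    grep -q "^md.* $1\["
}

# Usage on a live system:
#   if is_array_member sdb1 < /proc/mdstat; then
#       echo "sdb1 is still part of an array, remove it first"
#   else
#       mdadm --zero-superblock /dev/sdb1
#   fi
```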
General recommendations
When making use of RAID arrays, best practice is to have one more disk than is required, added as a spare. This means a replacement device is already in place if a drive fails. Remember that for RAID 1 you cannot go below 2 disks (you can run on one disk, known as degraded mode, but this is best avoided at all costs) and with RAID 5 you cannot go below 3 disks. In short, if you are using RAID, have a spare device configured.
Disks cost money, but the data on those disks is often priceless!
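To check the recommendation about spares against a running system, the sketch below lists arrays that have no spare. It assumes spares are flagged with (S) after the member name in /proc/mdstat. The function name arrays_without_spare is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the names of arrays whose member list in the
# mdstat text on stdin contains no device flagged as a spare, i.e. (S).
arrays_without_spare() {
    awk '/^md[0-9]+ :/ && !/\(S\)/ { print $1 }'
}

# Usage on a live system:
#   arrays_without_spare < /proc/mdstat
```

Any array printed by this check would run degraded after a single drive failure until a replacement is added by hand.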
It's a good idea to have a test environment to play around with RAID before implementing it in a production environment. Worst case, set up a VirtualBox host, run an Alpine VM and play around with that before using a production system.