Migrating data: Difference between revisions
Dubiousjim (talk | contribs) (Created page, WIP) |
Dubiousjim (talk | contribs) (Added categories) |
Revision as of 13:42, 27 March 2012
This material is work-in-progress ... Do not follow instructions here until this notice is removed.
We compare several methods for copying/migrating large amounts of data. The aim is to preserve all file permissions and metadata, which will require root access on both source and target filesystems.
We include information about some options that aren't available on BusyBox's implementation of these tools, because you may be copying data to/from systems with other Unix tools installed, which do provide those options.
If your source data occupies an entire partition (that is, nothing else resides on that partition), and your target partition is the same size or larger, and you're willing to have it use the same filesystem as the source, then one option is to bit-copy the whole source partition:
- dd or partimage
-
Just notes from StackExchange
-----------------------------
Q: Does target partition also need to have same partition type (e.g., Linux 0x83) as the source?

If the partitions are the exact same size and type, you can use dd.

I've run into this scenario a few times: I need to replace a drive because I need more space, it's having problems, moving to a new box, etc. What's the best way to copy data from one partition size to another (presuming the target has enough space)? What about if it's a different file system (such as ReiserFS to ext4)? If it's just a new drive on an existing system, how do I ensure I don't need to reinstall to get everything working?

A more efficient variant of dd is partimage: it copies only the used section of the partition, which makes copying large, mostly empty partitions faster. Note the important caveat: Partimage does NOT support Ext4, which is the default on new Ubuntu installations.

When moving Linux installations between hard drives, I always boot from a Live CD and use dd to copy the entire partition. I recognize that this doesn't deal with changes in disk size (inevitably the new disk is bigger, which simplifies things). Now, of course, there's the issue of the partition not filling the new disk (RESIZING...)
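As a rough illustration of the bit-copy idea, the sketch below runs dd between two ordinary files that stand in for partitions. The file names are hypothetical; on a real system the if= and of= arguments would be the source and target block devices, and swapping them by mistake destroys the source.

```shell
# Hypothetical stand-ins for real partitions (e.g. a source and target device).
workdir=$(mktemp -d)
src_img="$workdir/source.img"
dst_img="$workdir/target.img"

# Create a 1 MiB "source partition" filled with random data.
dd if=/dev/urandom of="$src_img" bs=1024 count=1024 2>/dev/null

# Bit-copy the source onto the target.
dd if="$src_img" of="$dst_img" bs=65536 2>/dev/null

# cmp -s exits 0 only if the two files are byte-for-byte identical.
if cmp -s "$src_img" "$dst_img"; then
    dd_result=identical
else
    dd_result=different
fi
echo "$dd_result"
```

Comparing with cmp afterwards is a cheap sanity check, though on real partitions it doubles the amount of data read.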
- dump
- TODO
The other options we'll discuss suppose that the target filesystem is already formatted and mounted as /target.
If you're proposing to copy volatile data from a running system, you may want to copy from an LVM snapshot, or do the copy from single-user mode, or from a Live CD or other boot medium, so that your source data is no longer in use during the copy. If you're just proposing to copy from /home or such, no such precautions are necessary.
If you're copying a complete filesystem, you'll probably only want to copy the /sys and /proc mountpoints, but not their current contents. Remember to copy /boot and /dev and /tmp, the last two of which may require special attention. Pay attention to other mountpoints (such as /mnt, /media) and RAM-based filesystems (such as /run and /lib/rc/init.d).
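One way to get the /sys and /proc mountpoints without their contents is simply to create them as empty directories on the target. This is only a sketch: the function name make_mountpoints is made up for illustration, and the list of directories would need adjusting to your system.

```shell
# make_mountpoints TARGET: create empty mountpoint directories under TARGET.
# (Hypothetical helper; extend the list for other pseudo-filesystems.)
make_mountpoints() {
    target=$1
    for dir in proc sys; do
        mkdir -p "$target/$dir" || return 1
    done
}
```

For example, `make_mountpoints /target` would create /target/proc and /target/sys if they don't already exist.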
/dev is tricky nowadays on some systems since it's hidden by udev. Here's a possible solution:
mkdir /tmp/dev
mount --move /dev /tmp/dev
(copy /dev to /mnt/sdb5 using one of the methods elsewhere on this page)
mount --move /tmp/dev /dev
rmdir /tmp/dev
- cp
- Depending on what system you're working with, and what kind of data you're copying, you may be able to do a satisfactory local copy using just cp. But there are a number of limits here that prevent this from being a general solution. The Gnu implementation of cp provides the following options (among others):
  -a: same as -RP -p/--preserve=mode,ownership,timestamps --preserve=links,context,xattr
  -v: verbose
  -l: make hard links instead of copying
  -s: make symlinks instead of copying (source file names must be absolute)
  --sparse=always
  -u, --update: don't copy non-directories onto an existing destination unless its mtime is older
  -x, --one-file-system: skip subdirectories on different filesystems
The FreeBSD version lacks -s and --sparse and -u, and the second group of --preserve options: in particular, it always copies hard links as separate files. The Mac version shares the limitations of the FreeBSD version, and additionally lacks -x. Also, on the Mac version the -a shortcut isn't available; you must explicitly say -RPp. The BusyBox version provides all of the Gnu options except --sparse and -u and -x, and the second group of --preserve options; it silently ignores the -v option. So on many of these implementations, including BusyBox's, cp will break hard links.
Avoid the lower-case -r option. Its behavior varies between implementations, and may diverge from what you want for special files or symlinks.
- tar
-
Options recognized by BusyBox and other implementations:
  -c  create a new archive
        records existing perms and mtimes, symbolic owner/group (can specify --numeric-owner), hard links among archive elements
  -x  extract files from an archive
        uses current owner/group unless root, subtracts current umask unless root or -p, must extract first hard link
  -t  list the contents of an archive
  -f TARFILE: use '-' for stdin/stdout
        on some but not all implementations, defaults to stdin/stdout when not supplied
  -C DIR: change to directory DIR, when creating this is order-sensitive
  -v: verbose
  -z, --gzip, --gunzip --ungzip
  -j, --bzip2
  -Z, --compress, --uncompress
  -O  extract files to standard output
  -h  follow symlinks; archive and extract the files they point to (like -L in other programs)
  -m  don't extract file mtime, leave touched with present time
  -T FILE: get names to extract or create from FILE, can also include "-C/etc" lines
  --exclude=PATTERN: exclude specified glob PATTERNs
  -X FILE: exclude glob patterns listed in FILE
        Default globbing: matches after any /, not just at ^
                          case-sensitive
                          wildcards match /

  undocumented but allegedly honored by BusyBox
  ---------------------------------------------
  -o, --no-same-owner (BSD doesn't recognize a long option)
        extract as yourself (default for ordinary users)
  -p, --same-permissions (BSD doesn't recognize this long option)
        extract permissions verbatim, instead of subtracting current umask (usually default for root)
  --no-same-permissions
        subtract umask from extracted permissions (default for ordinary users)
  --numeric-owner
        create using numbers for user/groups
  -k  don't replace existing files when extracting
  --overwrite (not on BSD)
        more aggressively overwrites existing files and directory metadata when extracting
        default is to first remove existing files (otherwise, all their hardlinks would be modified) and symlinks
            if existing dir is nonempty, their metadata will be updated instead
        --overwrite instead *follows* existing symlinks, removes anything that impedes extraction except non-empty dirs
            (Gnu's --recursive-unlink removes even that, will replace with a file or symlink)

Gnu options (also Mac and FreeBSD unless noted) also include:
  --one-file-system
        when creating archive, stay on the volume(s) where the specified roots are located, and don't cross mountpoints
  --null -T
        reads null-terminated names, ignores "-C/etc" entries
  -l, --check-links (BSD for -c only)
        when creating or extracting, print a message if not all hard links are processed
  -S, --sparse (not on BSD)
        when creating archive, handle sparse files efficiently
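To make the three basic modes concrete, here is a small sketch that creates an archive, lists it, and extracts it again, using only -c, -t, -x, -f, -C and -p from the list above. The temporary directory and the file names inside it are invented for the example.

```shell
# Throwaway working area; all names below are hypothetical.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/out"
echo data > "$demo/src/file"

# -c creates the archive. -C changes into the source directory first,
# so member names are stored relative to it (./file, not a full path).
tar -cf "$demo/backup.tar" -C "$demo/src" .

# -t lists the archive's contents without extracting anything.
listing=$(tar -tf "$demo/backup.tar")
echo "$listing"

# -x extracts; -p keeps the recorded permissions instead of applying the umask.
tar -xpf "$demo/backup.tar" -C "$demo/out"
```

Listing with -t before extracting is a handy way to check what paths an archive will write to.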
Some Gnu tar formats:
- in future, will default to -H posix/pax, which is the POSIX.1-2001 format
- v.1.13 - current (v1.26 released Mar 2011, still current in Mar 2012): defaults to -H gnu, which is not very different from the oldgnu format.
- -H ustar is the portable POSIX.1-1988 format, some limits including 256 char filename lengths, 8GB file size
Example of local copy:
sudo tar -cf- [--one-file-system] /source | sudo tar -xp[v]f- -C /target
Note that tar strips the leading / from member names, so this places the data under /target/source. If /target itself should mirror /source, archive with -C /source . instead.
The optional flags are:
  --one-file-system: Stay on the volume where /source is located. This may or may not be the behavior you desire. Note that this feature isn't available on BusyBox tar in any case, though it is available in some other tar implementations.
  -v: verbose
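The local tar pipe can be wrapped in a small function, together with a check that hard links survived the copy. This is a sketch rather than a finished tool: copy_tree and same_inode are names invented here, sudo is left out so it can be tried on scratch directories (ownership is only preserved when run as root), and it uses -C SRC . so that the target mirrors the source directly.

```shell
# copy_tree SRC DST: copy the contents of SRC into DST with a tar pipe.
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" &&
    tar -cf - -C "$src" . | tar -xpf - -C "$dst"
}

# same_inode A B: succeed if A and B share an inode number, i.e. are
# hard links to the same file (assuming both are on one filesystem).
same_inode() {
    [ "$(ls -di "$1" | awk '{print $1}')" = "$(ls -di "$2" | awk '{print $1}')" ]
}
```

After `copy_tree /source /target`, running `same_inode` on two target files that were hard-linked in the source is a quick way to confirm that the links were carried over rather than duplicated.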
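Example of using in cpio-style:
cd /source
sudo find . [-xdev] -depth [-print0] | sudo tar -cf- [--null] -T- | sudo tar -xp[v]f- -C /target
With Gnu tar, --null must come before the -T it applies to. Because find also lists directories, which tar would then descend into again, Gnu tar also needs --no-recursion before -T- to avoid archiving files more than once.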
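Driving tar from find is mainly useful when you want find's tests to decide what gets copied. The sketch below skips every directory with a given name. It assumes a tar that accepts --no-recursion and -T - (Gnu tar does; BusyBox may not), it will mishandle filenames containing newlines, and copy_tree_excluding is a hypothetical name.

```shell
# copy_tree_excluding SRC DST NAME: copy SRC into DST, skipping any
# directory called NAME and everything beneath it.
copy_tree_excluding() {
    src=$1
    dst=$2
    skip=$3
    mkdir -p "$dst" &&
    ( cd "$src" &&
      # -prune stops find descending into matching directories;
      # everything else is printed, one path per line.
      find . -name "$skip" -type d -prune -o -print |
      # --no-recursion: archive exactly the listed paths, nothing more.
      tar -cf - --no-recursion -T - ) |
    tar -xpf - -C "$dst"
}
```

For instance, `copy_tree_excluding /source /target cache` would leave out every directory named cache.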
The next examples add the -z flag, to compress output using gzip. If you have a fast connection, you may omit this; alternatively, you may use -j for bzip2.
Example of copying to remote machine:
tar -czf- [--one-file-system] /source | ssh root@machine "tar -xpz[v]f- -C /target"
Example of copying from remote machine:
ssh root@machine "tar -czf- [--one-file-system] /source" | tar -xpz[v]f- -C /target
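The copy-to-remote pattern can be tried out locally before pointing it at a real machine. In this sketch the "remote" command prefix is a parameter: it defaults to sh -c, which just runs the receiving tar on the local machine, and would be replaced by something like ssh root@machine for a real transfer. The function name push_tree is invented, and the target path must not contain single quotes.

```shell
# push_tree SRC DST [RUNNER...]: stream SRC, gzip-compressed, into DST
# through RUNNER. RUNNER defaults to "sh -c" (a purely local dry run).
push_tree() {
    src=$1
    dst=$2
    shift 2
    [ $# -gt 0 ] || set -- sh -c
    tar -czf - -C "$src" . |
        "$@" "mkdir -p '$dst' && tar -xpzf - -C '$dst'"
}
```

A real invocation might look like `push_tree /source /target ssh root@machine`, where root@machine is the placeholder host used in the examples above.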
- cpio
-
cpio operates in one of these four modes:
  -o -H newc: receive list of files from stdin, create archive on stdout (or in -F ARCHIVE) (like tar's -cf- -T-)
      -H newc: (on Mac, -H sv4cpio) specifies to use SVR4 cpio format instead of old POSIX.1 octet format, this is required to create archives in BusyBox
  -i: receive archive from stdin (or -F ARCHIVE), extract specified patterns underneath curdir (like tar's -xf-)
  -it: (on BusyBox, simply -t) receive archive from stdin (or -F ARCHIVE), list contents to stdout (like tar's -tf-)
  -p /target: receive list of files from stdin, copy them underneath /target (like tar's -cf- -T- | tar -xf- -C /target)
- if you have includes/excludes, it's easier to use find than unfamiliar and non-portable options of Gnu tar
- cpio handles special files (block and char devices, fifos, etc); traditional tar ("v7" format) didn't, also didn't store symbolic owner/group
- new ASCII and CRC ASCII cpio formats: 1024 char filename lengths (though Gnu cpio can handle arbitrary lengths)
- tar can handle 32-bit inodes; cpio's binary and portable ASCII formats can't
- tar must extract first hard link from archive
- cpio allegedly doesn't copy files over 4G properly (implementation? version?)
cd /source; find . [-xdev] -depth [-print0] | cpio -pdm[vul] [-0] /target
The flags are:
  -xdev: stay on the volume where /source is located; this may or may not be the behavior you desire
  -print0 / -0: able to process filenames with embedded \ns, not available on BusyBox cpio or Mac
  -d: create leading directories, like mkdir -p
  -m: preserve file's original mtime
  -v: verbose
  -u: overwrite existing files, even if they're newer than files being copied
  -l: create hard links instead of copying. This may or may not be the behavior you desire. Note that this feature isn't available on BusyBox cpio in any case, though it is available in some other cpio implementations.
Other flags are:
  -oA: append to existing archive (not on BusyBox or BSD)
  -L: follow symlinks (not on BusyBox)
  --sparse: write files with blocks of zeros as sparse files (Gnu only)
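The pass-through copy shown above can be wrapped so that it works from any starting directory. This is a minimal sketch assuming a cpio that supports -p, -d and -m; the name cpio_copy is made up, and filenames containing newlines are not handled because -print0/-0 are left out for portability.

```shell
# cpio_copy SRC DST: copy the contents of SRC underneath DST with cpio -p.
cpio_copy() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # Resolve DST to an absolute path, because we cd into SRC before
    # cpio runs and a relative DST would then point somewhere else.
    dst=$(cd "$dst" && pwd) || return 1
    # -depth lists directory contents before the directory itself.
    ( cd "$src" && find . -depth | cpio -pdm "$dst" )
}
```

Usage would be `cpio_copy /source /target`; cpio reports the number of blocks copied on standard error.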
- rsync
-
rsync must be installed on both source and target machines. Example:
rsync -a[vzx] --delete [--numeric-ids] [-HAX] [--exclude=/proc --exclude=/dev --exclude=/sys] /source/ [root@machine:]/target
Note that the trailing / on /source/ is essential, if you want /target to end up a clone of /source.
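The effect of the trailing slash is easy to see on a scratch directory. The sketch below copies the same source twice, once with and once without the slash; the directory names are invented, and the rsync calls are skipped if rsync isn't installed.

```shell
# Throwaway source tree; all names below are hypothetical.
demo=$(mktemp -d)
mkdir "$demo/source"
echo data > "$demo/source/file"

if command -v rsync >/dev/null 2>&1; then
    # With the trailing slash: the *contents* of source land in clone.
    rsync -a "$demo/source/" "$demo/clone"
    # Without it: the directory itself is copied, giving nested/source/.
    rsync -a "$demo/source" "$demo/nested"
    ls "$demo/clone" "$demo/nested"
fi
```

So the first form gives clone/file, while the second gives nested/source/file.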
The flags are:
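  -v: verbose
  -z: compress data during transfer
  -x or --one-file-system: don't cross filesystem boundaries when recursing
  --numeric-ids: transfer numeric uid/gid values instead of mapping them by user and group name
  -H or --hard-links: preserve hard links
  -A or --acls: preserve ACLs
  -X or --xattrs: preserve extended attributes
Other flags are:
  -h or --human-readable: output numbers in a human-readable format
  --progress: show progress during the transfer, implies -v
  -8 or --8-bit-output: leave high-bit characters unescaped in the output
  --inplace: update destination files in place, instead of writing a new copy and then moving it into place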
rsync's advantages:
- like cp, can handle ACLs and xattrs; many tar and cpio implementations don't
- like cpio, can optionally work over the network
- can resume incomplete transfers, and can do incremental updates if the data is being copied multiple times
Limits:
- I had it stop in the middle of a large copy because of some funky filenames (perhaps they included \ns); cpio handled these fine.
- lvm
- "This is one reason I like LVM. Just add the new disk to the volume group, pvmove the logical volumes from the old to new disk, [wait a bit], remove the old disk from the volume group, and then from the system. If it's your boot disk you're replacing then you also need to update your boot loader." EXPAND
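The steps in that quote map onto the standard LVM commands pvcreate, vgextend, pvmove, vgreduce and pvremove. The sketch below only prints the commands by default, so the sequence can be reviewed before anything is touched. The helper names run and migrate_pv, the DRY_RUN variable, and the volume group and device names in the usage note are all made up for illustration; the boot loader update mentioned in the quote is not covered.

```shell
# run CMD...: print the command unless DRY_RUN=0, in which case execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# migrate_pv VG OLD NEW: move all data in volume group VG from physical
# volume OLD to NEW, then drop OLD from the group.
migrate_pv() {
    vg=$1 old=$2 new=$3
    run pvcreate "$new" &&          # initialise the new disk for LVM
    run vgextend "$vg" "$new" &&    # add it to the volume group
    run pvmove "$old" "$new" &&     # migrate the extents (this takes a while)
    run vgreduce "$vg" "$old" &&    # remove the old disk from the group
    run pvremove "$old"             # wipe its LVM label
}
```

Running `migrate_pv vg0 /dev/sdX1 /dev/sdY1` prints the five commands in order; setting DRY_RUN=0 (as root) would actually execute them.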