Migrating data: Difference between revisions

From Alpine Linux
(Added categories)
m (Removed wikilink to an empty/RfD article.)
 
(13 intermediate revisions by 4 users not shown)
{{Draft}}
We compare several methods for copying/migrating large amounts of data. The aim is to preserve all file permissions and metadata, which requires root access on both source and target filesystems.
We compare several methods for copying/migrating large amounts of data. The aim is to preserve all file permissions and metadata, which will require root access on both source and target filesystems.


We include information about some options that aren't available on BusyBox's implementation of these tools, because you may be copying data to/from systems with other Unix tools installed, which do provide those options.
We include information about some options that aren't available on the BusyBox implementation of these tools, because you may be copying data to/from systems with other Unix tools installed, which do provide those options.


{{Warning| None of the methods below will copy the bootmanager (ext/syslinux, grub, etc) to a new drive; you'll have to install it there explicitly.}}
{{Warning| None of the methods below will copy the bootmanager (ext/syslinux, grub, etc) to a new drive; you'll have to install it there explicitly.}}


== Bit-copying methods ==


If your source data occupies an entire partition (that is, nothing else resides on that partition), and your target partition is the same size or larger, and you're willing to have it use the same filesystem as the source, then one option is to bit-copy the whole source partition:
If your source data occupies an entire partition (that is, nothing else resides on that partition), and your target partition is the same size or larger, and you're willing to have it use the same filesystem as the source, then one option is to bit-copy the whole source partition:




<dl>
Boot from a LiveCD and make sure that neither /dev/<var>sourcepart</var> nor /dev/<var>targetpart</var> is mounted.
<dt>dd or partimage
<dd>
<pre>
Just notes from StackExchange
-----------------------------
Q: Does target partition also need to have same partition type (e.g., Linux 0x83) as the source?


If the partitions are the exact same size and type, you can use dd.
Example:
{{Cmd| dd bs{{=}}10M if{{=}}/dev/<var>sourcepart</var> of{{=}}/dev/<var>targetpart</var>}}


I've run into this scenario a few time: I need to replace a drive because I need more space, it's having problems, moving to a new box, etc.
This also copies unoccupied blocks on the <var>sourcepart</var> to the <var>targetpart</var>. A more efficient alternative is [https://www.partimage.org/Main_Page partimage], which copies only the used blocks of the <var>sourcepart</var>. One limitation: as of v0.6.9 (released July 2010, still current in Mar 2012), partimage doesn't support the ext4 format.
What's the best way to copy data from one partition size to another (presuming the target has enough space)?
What about if it's a different file system (such as ReiserFS to ext4)?
If it's just a new drive on an existing system, how do I ensure I don't need to reinstall to get everything working?


{{Todo| Generally the target partition will have same partition type (e.g., Linux 0x83) as the source; but is this strictly necessary?}}


The more optimal variant of dd is using partimage, it will copy only the used section of the partition making copying of large unused partitions more expedient.
Once you've finished, the filesystem on <var>targetpart</var> may occupy less than the total space available on <var>targetpart</var>. You may want to use [[Filesystems|a tool like resize2fs]] to grow the filesystem so that it fills the partition.
Note the important caveat:
    Partimage does NOT support Ext4 which is the default on new Ubuntu installations.


When moving Linux installations between hard drives, I always boot from a Live CD and use dd to copy the entire partition. I recognize that this doesn't deal with changes in disk size (inevitably the new disk is bigger, which simplifies things).
Now, of course, there's the issue of the partition not filling the new disk (RESIZING...)
</pre>


<dt>dump
== Higher-level methods ==
<dd>
TODO


The other options we'll discuss assume the target filesystem is already formatted and mounted as {{Path|/target}}.


</dl>
If you need to copy volatile data from a running system, you may want to copy from an LVM snapshot, or do the copy from single-user mode, or from a LiveCD or other boot medium, so that your source data is not in use during the copy. If you only want to copy from {{Path|/home}} or such, no such precautions are necessary.


If you're copying a complete filesystem, you'll probably only want to copy the {{Path|/sys}} and {{Path|/proc}} mountpoints, but not their current contents. Remember to copy {{Path|/boot}} and {{Path|/dev}} and {{Path|/tmp}}, the last two of which may require special attention. Pay attention to other mountpoints (such as {{Path|/mnt}}, {{Path|/media}}) and RAM-based filesystems (such as {{Path|/run}} and {{Path|/lib/rc/init.d}}).


{{Todo|The preceding comments assume you know what you are doing, and only serve as reminders. Would be a good idea to link to a fuller explanation.}}


The other options we'll discuss suppose that the target filesystem is already formatted and mounted as {{Path|/target}}.
{{Note| nowadays, /dev is tricky on some systems since it's hidden by udev. Here's a possible solution:
 
If you're proposing to copy volatile data from a running system, you may want to copy from a lvm snapshot, or do the copy from single-user mode, or from a Live CD or other boot medium so that your source data is no longer in use during the copy. If you're just proposing to copy from /home or such, no such precautions are necessary.
 
If you're copying a complete filesystem, you'll probably only want to copy the {{Path|/sys}} and {{Path|/proc}} mountpoints, but not their current contents. Remember to copy {{Path|/boot}} and {{Path|/dev}} and {{Path|/tmp}}, the last two of which may require special attention. Pay attention to other mountpoints (such as {{Path|/mnt}}, {{Path|/media}}) and RAM-based filesystems (such as {{Path|/run}} and {{Path|/lib/rc/init.d}}).
{{Todo|The preceding comments assume you know what you are doing, and only serve as reminders. Would be good to link to a fuller explanation.}}
 
{{Note|
/dev is tricky nowadays on some systems since it's hidden by udev. Here's a possible solution:
{{Cmd|mkdir /tmp/dev
{{Cmd|mkdir /tmp/dev
mount --move /dev /tmp/dev
mount --move /dev /tmp/dev




<dl>
=== cp ===
<dt>cp
Depending on which system you're working with, and what kind of data you're copying, you ''may'' be able to do a satisfactory ''local'' copy using just <code>cp</code>. However, there are a number of limits that prevent this from being a general solution.
<dd>Depending on what system you're working with, and what kind of data you're copying, you ''may'' be able to do a satisfactory ''local'' copy using just <code>cp</code>. But there are a number of limits here that prevent this from being a general solution.


The Gnu implementation of <code>cp</code> provides the following options (among others):
The Gnu implementation of <code>cp</code> provides the following options (among others):


* <code>-a</code>: same as -RP -p/--preserve=mode,ownership,timestamps --preserve=links,context,xattr
* <code>-a</code>: same as <code>-RP -p/--preserve=mode,ownership,timestamps,links,context,xattr</code>
* <code>-v</code>: verbose
* <code>-v</code>: verbose
* <code>-l</code>: make hard links instead of copying
* <code>-l</code>: make hard links instead of copying
* <code>-s</code>: make symlinks instead of copying (source file names must be absolute)
* <code>-s</code>: make symlinks instead of copying (source file names must be absolute)
* <code>--sparse=always</code>
* <code>--sparse=always</code>
* <code>-u</code>, <code>--update</code>: don't copy non-directories onto an existing destination unless its mtime is older
* <code>-u</code>, <code>--update</code>: don't copy files or symlinks over an existing file unless its mtime is older
* <code>-x</code>, <code>--one-file-system</code>: skip subdirectories on different filesystems
* <code>-x</code>, <code>--one-file-system</code>: skip subdirectories on different volumes
      
      
The FreeBSD implementation provides all of these except <code>-s</code> and <code>--sparse</code> and <code>-u</code>, and the second group of <code>--preserve</code> options: in particular, it always copies hard links as separate files. The Mac version shares the limitations of the FreeBSD version, and additionally lacks <code>-x</code>. Also, on the Mac version the <code>-a</code> shortcut isn't available, you must explicitly say <code>-RPp</code>.
The BusyBox version provides all of the GNU options except <code>--sparse</code>, <code>-u</code> and <code>-x</code>, and may lack the <code>context</code> and <code>xattr</code> options to <code>--preserve</code>. It will preserve hard links: if {{Path|/source/A}} and {{Path|/source/B}} are hard links, {{Path|/target/A}} and {{Path|/target/B}} will also be hard links with each other (though not with the originals). BusyBox's <code>cp</code> seems to silently ignore the <code>-v</code> option.


The BusyBox version provides all of the Gnu options except <code>--sparse</code> and <code>-u</code> and <code>-x</code>, and the second group of <code>--preserve</code> options; it silently ignores the <code>-v</code> option.
The FreeBSD implementation provides all of these except <code>-s</code>, <code>--sparse</code> and <code>-u</code>, and the <code>--preserve</code> options from <code>links</code> on: in particular, it always copies hard links as separate files. The Mac version shares the limitations of the FreeBSD version, and additionally lacks <code>-x</code>. Also, the <code>-a</code> shortcut isn't available on the Mac version; you must explicitly say <code>-RPp</code>.


So on many of these implementations, including BusyBox's, <code>cp</code> will break hard links.
{{Note| As just explained, some of these implementations will break hard links.}}


Avoid the lower-case <code>-r</code> option. Its behavior varies between implementations, and may diverge from what you want for special files or symlinks.
{{Warning| Avoid the lower-case <code>-r</code> option. Its behavior varies between implementations, and may diverge from what you want for special files or symlinks.}}






<dt>tar
=== tar ===
<dd>


Options recognized by BusyBox and other implementations:
Options recognized by BusyBox and other implementations:
<pre>
-c create a new archive
  records existing perms and mtimes, symbolic owner/group (can specify --numeric-owner), hard links among archive elements
-x extract files from an archive
  uses current owner/group unless root, subtracts current umask unless root or -p, must extract first hard link
-t list the contents of an archive


-f TARFILE: use '-' for stdin/stdout
* <code>-c</code>: create a new archive, records existing perms and mtimes, symbolic owner/group (can specify <code>--numeric-owner</code>), hard links among archive elements
  on some but not all implementations, defaults to stdin/stdout when not supplied
* <code>-x</code>: extract files from an archive, uses current owner/group unless root, subtracts current umask unless root or <code>-p</code>, when extracting hard links must include first
-C DIR: change to directory DIR, when creating this is order-sensitive
* <code>-t</code>: list the contents of an archive
-v: verbose
 
* <code>-f <var>archive</var></code>: use <code>-f -</code> for stdin/stdout
: {{Note| On some but not all implementations, the archive defaults to stdin/stdout when <code>-f</code> is not supplied. It's most portable to always declare this explicitly.}}
* <code>-O</code>: extract files to standard output
* <code>-C <var>dir</var></code>: change to directory <var>dir</var>, when creating this is order-sensitive
* <code>-v</code>: verbose


-z, --gzip, --gunzip --ungzip
* <code>-z</code>: compress archive with gzip
-j, --bzip2
* <code>-j</code>: compress archive with bzip2
-Z, --compress, --uncompress
* <code>-Z</code>: compress archive with compress


-O extract files to standard output
* <code>-h</code>: follow symlinks, archive and extract the files they point to (like <code>-L</code> in other programs)
-h follow symlinks; archive and extract the files they point to (like -L in other programs)
* <code>-m</code>: don't extract file's original mtime, leave touched with the extraction time
-m don't extract file mtime, leave touched with present time
* <code>-T <var>file</var></code>: get names to extract or create from <var>file</var>, which can also include "-C/etc" lines
-T FILE: get names to extract or create from FILE, can also include "-C/etc" lines
* <code>--exclude=<var>pattern</var></code>: exclude specified glob <var>pattern</var>s
--exclude=PATTERN: exclude specified glob PATTERNs
* <code>-X <var>file</var></code>: exclude glob patterns listed in <var>file</var>
-X FILE: exclude glob patterns listed in FILE
: {{Note| Default globbing rules: (i) matches after any /, not just at ^; (ii) case-sensitive; (iii) wildcards match /}}


Default globbing: matches after any /, not just at ^
The following common options are allegedly also honored by BusyBox (according to
                  case-sensitive
comments in its source) but aren't declared in its <code>--help</code>:
                  wildcards match /


undocumented but allegedly honored by BusyBox
* <code>-o</code>, <code>--no-same-owner</code>: (BSD doesn't recognize a long option) extract as yourself (default for ordinary users)
---------------------------------------------
-o, --no-same-owner (BSD doesn't recognize a long option)
      extract as yourself (default for ordinary users)


-p, --same-permissions (BSD doesn't recognize this long option)
* <code>-p</code>, <code>--same-permissions</code>: (BSD doesn't recognize this long option) extract permissions verbatim, instead of subtracting current umask (usually default for root)
      extract permissions verbatim, instead of subtracting current umask (usually default for root)


--no-same-permissions
* <code>--no-same-permissions</code>: subtract umask from extracted permissions (default for ordinary users)
      subtract umask from extracted permissions (default for ordinary users)


--numeric-owner
* <code>--numeric-owner</code>: create using numbers for user/groups
      create using numbers for user/groups


-k
* <code>-k</code>: don't replace existing files when extracting
      don't replace existing files when extracting


--overwrite (not on BSD)
<ul>
      more aggressively overwrites existing files and directory metadata when extracting
<li><code>--overwrite</code>: (not on BSD) more aggressively overwrite existing files and directory metadata when extracting<br />
       default is to first remove existing files (otherwise, all their hardlinks would be modified) and symlinks
       tar's default behavior is to first remove existing files (otherwise, all their hardlinks would be modified) and symlinks.
       if existing dir is nonempty, their metadata will be updated instead
       If an existing dir is nonempty, its existing contents won't be removed; tar will instead just update the dir's metadata.
       --overwrite instead *follows* existing symlinks, removes anything that impedes extraction except non-empty dirs
       <code>--overwrite</code> instead ''follows'' existing symlinks, and removes anything that impedes extraction except non-empty dirs.
       (Gnu's --recursive-unlink removes even that, will replace with a file or symlink)
       (Gnu's <code>--recursive-unlink</code> removes even those, may replace them with files or symlinks.)
</pre>
</ul>


GNU options (Mac and FreeBSD as well, unless noted) include:


Gnu options (also Mac and FreeBSD unless noted) also include:
* <code>--one-file-system</code>: when creating archive, stay on the volume(s) where the specified roots are located, and don't cross mountpoints
<pre>
--one-file-system
  when creating archive, stay on the volume(s) where the specified roots are located, and don't cross mountpoints


--null
* <code>--null</code>: <code>-T</code> reads null-terminated names, ignores "-C/etc" entries
  -T reads null-terminated names, ignores "-C/etc" entries


-l, --check-links (BSD for -c only)
* <code>-l</code>, <code>--check-links</code>: (BSD for <code>-c</code> only) when creating or extracting, print a message if not all hard links are processed
  when creating or extracting, print a message if not all hard links are processed


-S, --sparse (not on BSD)
* <code>-S</code>, <code>--sparse</code>: (not on BSD) when creating archive, handle sparse files efficiently
  when creating archive, handle sparse files efficiently
</pre>


Some Gnu tar formats:
 
* in future, will default to <code>-H posix/pax</code>, which is the POSIX.1-2001 format
Some GNU tar formats:
* v.1.13 - current (v1.26 released Mar 2011, still current in Mar 2012): defaults to <code>-H gnu</code>, which is not very different from the <code>oldgnu</code> format.
* in the future, it will default to <code>-H posix/pax</code>, which is the POSIX.1-2001 format
* <code>-H ustar</code> is the portable POSIX.1-1988 format, some limits including 256 char filename lengths, 8GB file size
* v.1.13 - current (v1.26 released Mar 2011, still current in Mar 2012): defaults to <code>-H gnu</code>, which is not very different from the <code>-H oldgnu</code> format.
* <code>-H ustar</code> is the portable POSIX.1-1988 format, some limits including max 256 char filenames, max 8G filesize; supports special files, but not sparse files. Also seems not to support xattrs and ACLs (other tar formats are meant to handle ACLs, but see [[#tar_acl|comments below]])




* <code>-v</code>: verbose
* <code>-v</code>: verbose


Example of using in cpio-style:
Example of using tar in cpio-style (for which, see below):
{{Cmd|cd /source
{{Cmd|cd /source
sudo find . [-xdev] -depth [-print0] {{!}} sudo tar -cf- -T- [--null] {{!}} sudo tar -xp[v]f- -C /target}}
sudo find . [-xdev] -depth [-print0] {{!}} sudo tar -cf- -T- [--null] {{!}} sudo tar -xp[v]f- -C /target}}


The next examples add the <code>-z</code> flag, to compress output using gzip. If you have a fast connection, you may omit this; alternatively, you may use <code>-j</code> for bzip2. Example of copying to remote machine:
The next examples add the <code>-z</code> flag, to compress output using gzip. If you have a fast connection, you may omit this; alternatively, you may use <code>-j</code> for bzip2. Example of copying to remote machine:
{{Cmd|tar -czf- [--one-file-system] /source {{!}} ssh root@machine "tar -xpz[v]f- -C /target"}}
{{Cmd|sudo tar -czf- [--one-file-system] /source {{!}} ssh root@machine "tar -xpz[v]f- -C /target"}}


Example of copying from remote machine:
Example of copying from remote machine:
{{Cmd|ssh root@machine "tar -czf- [--one-file-system] /source" {{!}} tar -xpz[v]f- -C /target}}
{{Cmd|ssh root@machine "tar -czf- [--one-file-system] /source" {{!}} sudo tar -xpz[v]f- -C /target}}
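
The pipe used in the examples above can be rehearsed locally before involving a remote machine. The following is only a minimal sketch under stated assumptions: the scratch directories <code>source</code> and <code>target</code> are hypothetical stand-ins for {{Path|/source}} and {{Path|/target}}, the <code>ssh</code> hop is left out, and the variable names are made up for illustration. It checks that a hard link and a file mode survive the round trip on whatever <code>tar</code> is installed.

```shell
#!/bin/sh
# Rehearse the "tar | tar" copy on a scratch tree (no ssh, no root needed).
set -eu

work=$(mktemp -d)                 # hypothetical scratch area
trap 'rm -rf "$work"' EXIT        # clean up when the script exits

mkdir "$work/source" "$work/target"
echo data > "$work/source/A"
ln "$work/source/A" "$work/source/B"   # A and B are hard links to one file
chmod 775 "$work/source/A"             # group-writable, so -p makes a difference

# Same pipeline shape as the remote examples, minus the ssh hop.
tar -czf- -C "$work/source" . | tar -xpzf- -C "$work/target"

# Did the two names stay linked to each other on the target side?
if [ "$work/target/A" -ef "$work/target/B" ]; then
    tar_links=preserved
else
    tar_links=broken
fi

# Compare the permission strings (first 10 characters of "ls -l").
src_mode=$(ls -l "$work/source/A" | cut -c1-10)
dst_mode=$(ls -l "$work/target/A" | cut -c1-10)
if [ "$src_mode" = "$dst_mode" ]; then
    tar_mode=kept
else
    tar_mode=changed
fi

echo "hard links: $tar_links, mode: $tar_mode"
```

If either check reports a problem on your system, that is a hint to try one of the other tools on this page before attempting the real migration.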


=== cpio ===


cpio operates in one of four modes:
* <code>-o -H newc</code>: create archive on stdout (or <code>-F <var>archive</var></code>), including filenames supplied from stdin (like <code>tar -cf- -T-</code>)
: <code>-H newc</code>: (on Mac, <code>-H sv4cpio</code> and/or <code>-c</code>) specifies to use SVR4 ASCII format, this is required to create archives in BusyBox
* <code>-i</code>: receive archive from stdin (or <code>-F <var>archive</var></code>), extract specified patterns underneath curdir (like <code>tar -xf-</code>)
* <code>-t</code>: (on some systems, may need <code>-it</code>) receive archive from stdin (or <code>-F <var>archive</var></code>), list contents to stdout (like <code>tar -tf-</code>)
* <code>-p /target</code>: receive list of files from stdin (or <code>-F <var>archive</var></code>), copy them underneath {{Path|/target}} (like <code>tar -cf- -T- | tar -xf- -C /target</code>)




<dt>cpio
tar vs cpio:
<dd>
* if you have includes/excludes, it's easier to use <code>find</code> than the unfamiliar and non-portable options of GNU tar
cpio operates in one of these four modes:
* cpio handles special files (block and char devices, fifos, etc) traditional tar (<code>-H v7</code> format) didn't, also didn't store symbolic owner/group
* <code>-o -H newc</code>: receive list of files from stdin (or -F ARCHIVE), create archive on stdout (like tar's -cf- -T-)
* when extracting hard links, tar must always include the first
*: <code>-H newc</code>: (on Mac, -H sv4cpio) specifies to use SVR4 cpio format instead of old POSIX.1 octet format, this is required to create archives in BusyBox
* tar and the <code>-H newc</code> cpio format handle 32-bit inodes; though some older cpio formats didn't
* <code>-i</code>: receive archive from stdin (or -F ARCHIVE), extract specified patterns underneath curdir (like tar's -xf-)
* the <code>-H newc</code> cpio format has max 1024 char filenames (though GNU cpio can handle arbitrary lengths) and max 4G filesize. The portable <code>-H ustar</code> tar format has max 256 char filenames and max 8G filesize limits
* <code>-it</code>: (on BusyBox, simply -t) receive archive from stdin (or -F ARCHIVE), list contents to stdout (like tar's -tf-)
* in some implementations, cpio wouldn't copy files over 2G properly. Don't have details about which implementations/versions are so limited
* <code>-p /target</code>: receive list of files from stdin (or -F ARCHIVE), copy them underneath /target (like tar's -cf- -T- | tar -xf- -C /target)
* <span id="tar_acl">some tar formats are supposed to handle acls, however [https://unix.stackexchange.com/q/391/4801 in practice this doesn't seem very reliable]. Not sure what the situation is with cpio. rsync is known to work well for such cases</span>
 
<!-- Some more info: https://www.suse.de/~aj/linux_lfs.html -->


tar vs cpio:
* if you have includes/excludes, it's easier to use find than unfamiliar and non-portable options of Gnu tar
* cpio handles special files (block and char devices, fifos, etc); traditional tar ("v7" format) didn't, also didn't store symbolic owner/group
* new ASCII and CRC ASCII cpio formats: 1024 char filename lengths (though Gnu cpio can handle arbitrary lengths)
* tar can handle 32-bit inodes; cpio's binary and portable ASCII formats can't
* tar must extract first hard link from archive
* cpio allegedly doesn't copy files over 4G properly (implementation? version?)


Example of local copy:
Example of local copy:
{{Cmd|cd /source; find . [-xdev] -depth [-print0] {{!}} cpio -pdm[vul] [-0] /target}}
{{Cmd|cd /source
sudo find . [-xdev] -depth [\! -path ./lost+found] [-print0] {{!}} sudo cpio -pdm[vul] [-0] /target}}
 


The flags are:
The flags are:
* <code>-xdev</code>: stay on the volume where {{Path|/source}} is located; this may or may not be the behavior you desire
* <code>-xdev</code>: stay on the volume where {{Path|/source}} is located; this may or may not be the behavior you desire
* <code>-print0</code>/<code>-0</code>: able to process filenames with embedded <code>\n</code>s, not available on BusyBox <code>cpio</code> or Mac
* <code>\! -path ./lost+found</code>: omit empty {{Path|./lost+found}}; <code>!</code> only needs to be escaped for some shells, and this sequence may be repeated
* <code>-print0</code> ... <code>-0</code>: permits processing filenames with embedded <code>\n</code>s, not available on BusyBox <code>cpio</code> or Mac
* <code>-d</code>: create leading directories, like <code>mkdir -p</code>
* <code>-d</code>: create leading directories, like <code>mkdir -p</code>
* <code>-m</code>: preserve file's original mtime
* <code>-m</code>: preserve file's original mtime


Other flags are:
Other flags are:
* <code>-oA</code>: append to existing archive (not on BusyBox or BSD)
* <code>-oA -F <var>archive</var></code>: append to existing archive (not on BusyBox or BSD)
* <code>-L</code>: follow symlinks (not on BusyBox)
* <code>-L</code>: follow symlinks (not on BusyBox)
* <code>--sparse</code>: write files with blocks of zeros as sparse files (Gnu only)
* <code>--sparse</code>: write files with blocks of zeros as sparse files (Gnu only)




Example of copying to remote machine:
{{Cmd|cd /source
sudo find . [-xdev] -depth [-print0] {{!}} sudo cpio -o -H newc [-0] {{!}} [gzip -3 {{!}}] ssh root@machine "cd /target; gunzip {{!}} cpio -idm[vu]"}}


=== rsync ===


<dt>rsync
<dd>
<code>rsync</code> must be installed on both source and target machines. Example:
<code>rsync</code> must be installed on both source and target machines. Example:


* <code>-v</code>: verbose
* <code>-v</code>: verbose
* <code>-z</code>: compress data during transfer
* <code>-z</code>: compress data during transfer
* <code>-x</code> or <code>--one-file-system</code>: EXPLAIN
* <code>-x</code> or <code>--one-file-system</code>: skip subdirectories on different volumes
* <code>--numeric-ids</code>: EXPLAIN
* <code>--numeric-ids</code>: transfer numeric user/group IDs unchanged, instead of mapping them by name (rsync's default is to map symbolic user/group names)
* <code>-H</code> or <code>--hard-links</code>: EXPLAIN
* <code>-H</code> or <code>--hard-links</code>: preserve hard links in copied data, this can be expensive, and won't break existing hard links on {{Path|/target}} unless rsync needed to write updated data to some of the linked files
* <code>-A</code> or <code>--acls</code>: EXPLAIN
* <code>-A</code> or <code>--acls</code>: preserve ACLs
* <code>-X</code> or <code>--xattrs</code>: EXPLAIN
* <code>-X</code> or <code>--xattrs</code>: preserve xattrs


* <code>-h</code> or <code>--human-readable</code>: EXPLAIN
* <code>-8</code> or <code>--8-bit-output</code>: don't escape chars in filename even if they're invalid in current locale
* <code>--progress</code>: EXPLAIN, implies <code>-v</code>
* <code>--partial</code>: keep partially-transferred files
* <code>-8</code> or <code>--8-bit-output</code>: EXPLAIN
<ul>
* <code>--inplace</code>: EXPLAIN
<li><code>--inplace</code>: overwrite existing files in place<br />
rsync's default behavior is to write new files and move them into place when complete. This breaks any hard links the existing file may have had and makes a copy-on-write filesystem see the file as entirely new. <code>--inplace</code> specifies an alternate behavior, of writing updated data directly into the existing file.
</ul>
* <code>-h</code> or <code>--human-readable</code>: report statistics in prettier form (use once for powers of 10, twice for powers of 2)
* <code>--progress</code>: show progress of each file transferred, implies <code>-v</code>
* <code>-n</code> or <code>--dry-run</code>: show what would be transferred, without making any changes


[https://rsync.samba.org/ftp/rsync/rsync.html rsync's full manpage]


rsync's advantages:
rsync's advantages:
* like cp, can handle ACLs and xattrs; many tar and cpio implementations don't
* like cp, can handle ACLs and xattrs. Many tar and cpio implementations don't
* like cpio, can optionally work over the network
* like cpio, can optionally work over a network
* can resume incomplete transfers, and can do incremental updates if the data is being copied multiple times
* can resume incomplete transfers, and can do incremental updates if the data is being copied multiple times




Limits:
Including/excluding from transfers:
* I had it stop in the middle of a large copy because of some funky filenames (perhaps they included <code>\n</code>s); cpio handled these fine.
* <code>--include=<var>pattern</var>[/]</code>
* <code>--exclude=<var>pattern</var>[/]</code>
Each of these may be repeated. The <var>pattern</var>s may contain <code>*</code>, <code>**</code>, <code>?</code>, <code>[a-z]</code>.
A trailing <code><var>dir</var>/***</code> matches both <var>dir</var> and its contents.
If <var>pattern</var> contains a <code>**</code> or (non-trailing) <code>/</code>, it matches against the full path from {{Path|/target}}, else it only matches against the final path element.


=== lvm ===


<dt>lvm
"This is one reason I like LVM. Just add the new disk to the volume group, pvmove the logical volumes from the old to new disk, [wait a bit], remove the old disk from the volume group, and then from the system. If it's your boot disk you're replacing then you also need to update your boot loader." <!-- https://superuser.com/a/11468/28187 -->
<dd>
"This is one reason I like LVM. Just add the new disk to the volume group, pvmove the logical volumes from the old to new disk, [wait a bit], remove the old disk from the volume group, and then from the system. If it's your boot disk you're replacing then you also need to update your boot loader." EXPAND


See [[Setting up Logical Volumes with LVM]].
=== Other tools ===
{{Cmd|cd /source
pax -pe -rw -v [-X] -YZ . /target}}


</dl>


[[Category:Installation]]
[[Category:Installation]]
[[Category:Storage]]
[[Category:Storage]]

Latest revision as of 16:00, 14 August 2023

We compare several methods for copying/migrating large amounts of data. The aim is to preserve all file permissions and metadata, which requires root access on both source and target filesystems.

We include information about some options that aren't available on the BusyBox implementation of these tools, because you may be copying data to/from systems with other Unix tools installed, which do provide those options.

Warning: None of the methods below will copy the boot loader (extlinux/syslinux, GRUB, etc.) to a new drive; you'll have to install it there explicitly.


Bit-copying methods

If your source data occupies an entire partition (that is, nothing else resides on that partition), and your target partition is the same size or larger, and you're willing to have it use the same filesystem as the source, then one option is to bit-copy the whole source partition:


Boot from a LiveCD and make sure that neither /dev/sourcepart nor /dev/targetpart is mounted.

Example:

dd bs=10M if=/dev/sourcepart of=/dev/targetpart

This also copies unoccupied blocks on the sourcepart to the targetpart. A more efficient alternative is partimage, which copies only the used blocks of the sourcepart. One limitation: as of v0.6.9 (released July 2010, still current in Mar 2012), partimage doesn't support the ext4 format.

Todo: Generally the target partition will have same partition type (e.g., Linux 0x83) as the source; but is this strictly necessary?


Once you've finished, the filesystem on targetpart may occupy less than the total space available on targetpart. You may want to use a tool like resize2fs to grow the filesystem so that it fills the partition.
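
Before pointing dd at real partitions, the copy can be rehearsed on ordinary files. The following is a minimal sketch, not part of the procedure above: source.img and target.img are hypothetical stand-ins for /dev/sourcepart and /dev/targetpart, and conv=notrunc is used only so that the larger image file keeps its size, the way a partition would.

```shell
#!/bin/sh
# Rehearse a dd bit-copy on image files instead of real partitions.
set -eu

work=$(mktemp -d)                 # hypothetical scratch area
trap 'rm -rf "$work"' EXIT        # clean up when the script exits

# A 64 KiB "source partition" with arbitrary content,
# and a larger, zero-filled 128 KiB "target partition".
dd if=/dev/urandom of="$work/source.img" bs=1024 count=64 2>/dev/null
dd if=/dev/zero of="$work/target.img" bs=1024 count=128 2>/dev/null

# The bit-copy itself; notrunc stops dd from shrinking the target file.
dd if="$work/source.img" of="$work/target.img" bs=4096 conv=notrunc 2>/dev/null

src_size=$(( $(wc -c < "$work/source.img") ))
tgt_size=$(( $(wc -c < "$work/target.img") ))

# The first src_size bytes of the target must now equal the source.
if head -c "$src_size" "$work/target.img" | cmp -s - "$work/source.img"; then
    copy_ok=yes
else
    copy_ok=no
fi

echo "copied $src_size bytes into a $tgt_size byte target, identical: $copy_ok"
```

The leftover space at the end of the larger target corresponds to the unused room that a filesystem resize would later claim.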


Higher-level methods

The other options we'll discuss assume the target filesystem is already formatted and mounted as /target.

If you need to copy volatile data from a running system, you may want to copy from an LVM snapshot, or do the copy from single-user mode, or from a LiveCD or other boot medium, so that your source data is not in use during the copy. If you only want to copy from /home or such, no such precautions are necessary.

If you're copying a complete filesystem, you'll probably only want to copy the /sys and /proc mountpoints, but not their current contents. Remember to copy /boot and /dev and /tmp, the last two of which may require special attention. Pay attention to other mountpoints (such as /mnt, /media) and RAM-based filesystems (such as /run and /lib/rc/init.d).

Todo: The preceding comments assume you know what you are doing, and only serve as reminders. Would be a good idea to link to a fuller explanation.


Note: nowadays, /dev is tricky on some systems since it's hidden by udev. Here's a possible solution:

mkdir /tmp/dev
mount --move /dev /tmp/dev
copy /dev to /mnt/sdb5 using one of the methods elsewhere on this page
mount --move /tmp/dev /dev
rmdir /tmp/dev


cp

Depending on which system you're working with, and what kind of data you're copying, you may be able to do a satisfactory local copy using just cp. However, there are a number of limits that prevent this from being a general solution.

The GNU implementation of cp provides the following options (among others):

  • -a: same as -RP -p/--preserve=mode,ownership,timestamps,links,context,xattr
  • -v: verbose
  • -l: make hard links instead of copying
  • -s: make symlinks instead of copying (source file names must be absolute)
  • --sparse=always
  • -u, --update: don't copy files or symlinks over an existing file unless its mtime is older
  • -x, --one-file-system: skip subdirectories on different volumes

The BusyBox version provides all of the GNU options except --sparse, -u and -x, and may lack the context and xattr options to --preserve. It will preserve hard links: if /source/A and /source/B are hard links, /target/A and /target/B will also be hard links with each other (though not with the originals). BusyBox's cp seems to silently ignore the -v option.

The FreeBSD implementation provides all of these except -s, --sparse and -u, and the --preserve options from links on; in particular, it always copies hard links as separate files. The Mac version shares the limitations of the FreeBSD version, and additionally lacks -x. Also, the -a shortcut isn't available on the Mac version; you must explicitly say -RPp.

Note: As just explained, some of these implementations will break hard links.
Warning: Avoid the lower-case -r option. Its behavior varies between implementations, and may diverge from what you want for special files or symlinks.
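Whether a given cp keeps hard links can be checked on a throwaway directory before trusting it with real data. This is a minimal sketch, assuming a cp that accepts -a (GNU or BusyBox); the directory and file names are hypothetical.

```shell
set -e
work=$(mktemp -d)
mkdir "$work/source"
echo "payload" > "$work/source/A"
ln "$work/source/A" "$work/source/B"   # A and B are now hard links
chmod 640 "$work/source/A"

# Archive-mode copy into a target directory that does not exist yet.
cp -a "$work/source" "$work/target"

# -ef is true when both names refer to the same file (same device and inode).
if [ "$work/target/A" -ef "$work/target/B" ]; then links_kept=yes; else links_kept=no; fi
if [ "$work/target/A" -ef "$work/source/A" ]; then shares_with_source=yes; else shares_with_source=no; fi

# First ten characters of ls -l are the file type and permission bits.
target_mode=$(ls -l "$work/target/A" | cut -c1-10)

echo "links kept: $links_kept, linked to source: $shares_with_source, mode: $target_mode"
rm -rf "$work"
```

On an implementation that breaks hard links, links_kept would come out as "no", and the copy would take twice the space for each linked pair.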



tar

Options recognized by BusyBox and other implementations:

  • -c: create a new archive, records existing perms and mtimes, symbolic owner/group (can specify --numeric-owner), hard links among archive elements
  • -x: extract files from an archive; uses current owner/group unless root; subtracts current umask unless root or -p; when extracting a hard link, the first-archived link to the same file must be extracted too
  • -t: list the contents of an archive
  • -f archive: use -f - for stdin/stdout
Note: On some but not all implementations, the archive defaults to stdin/stdout when -f is not supplied. It's most portable to always declare this explicitly.
  • -O: extract files to standard output
  • -C dir: change to directory dir, when creating this is order-sensitive
  • -v: verbose
  • -z: compress archive with gzip
  • -j: compress archive with bzip2
  • -Z: compress archive with compress
  • -h: follow symlinks, archive and extract the files they point to (like -L in other programs)
  • -m: don't extract file's original mtime, leave touched with the extraction time
  • -T file: get names to extract or create from file, which can also include "-C/etc" lines
  • --exclude=pattern: exclude specified glob patterns
  • -X file: exclude glob patterns listed in file
Note: Default globbing rules: (i) matches after any /, not just at ^; (ii) case-sensitive; (iii) wildcards match /

The following common options are allegedly also honored by BusyBox (according to comments in its source) but aren't declared in its --help:

  • -o, --no-same-owner: (BSD doesn't recognize a long option) extract as yourself (default for ordinary users)
  • -p, --same-permissions: (BSD doesn't recognize this long option) extract permissions verbatim, instead of subtracting current umask (usually default for root)
  • --no-same-permissions: subtract umask from extracted permissions (default for ordinary users)
  • --numeric-owner: create using numbers for user/groups
  • -k: don't replace existing files when extracting
  • --overwrite: (not on BSD) more aggressively overwrite existing files and directory metadata when extracting
    tar's default behavior is to first remove existing files (otherwise, all their hardlinks would be modified) and symlinks. If an existing dir is nonempty, its existing contents won't be removed; tar will instead just update the dir's metadata. --overwrite instead follows existing symlinks, and removes anything that impedes extraction except non-empty dirs. (GNU's --recursive-unlink removes even those, and may replace them with files or symlinks.)

GNU options (Mac and FreeBSD as well, unless noted) include:

  • --one-file-system: when creating archive, stay on the volume(s) where the specified roots are located, and don't cross mountpoints
  • --null: -T reads null-terminated names, ignores "-C/etc" entries
  • -l, --check-links: (BSD for -c only) when creating or extracting, print a message if not all hard links are processed
  • -S, --sparse: (not on BSD) when creating archive, handle sparse files efficiently


Some GNU tar formats:

  • in the future, it will default to -H posix/pax, which is the POSIX.1-2001 format
  • v.1.13 - current (v1.26 released Mar 2011, still current in Mar 2012): defaults to -H gnu, which is not very different from the -H oldgnu format.
  • -H ustar is the portable POSIX.1-1988 format, some limits including max 256 char filenames, max 8G filesize; supports special files, but not sparse files. Also seems not to support xattrs and ACLs (other tar formats are meant to handle ACLs, but see comments below)


Example of local copy:

sudo tar -cf- [--one-file-system] /source | sudo tar -xp[v]f- -C /target

The optional flags are:

  • --one-file-system: Stay on the volume where /source is located. This may or may not be the behavior you desire. Note that this feature isn't available on BusyBox tar in any case, though it is available in some other tar implementations.
  • -v: verbose
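The local tar pipeline can be rehearsed without root on a scratch directory. The sketch below is only an illustration: the paths are hypothetical, sudo is dropped, and it archives "." via -C rather than naming the source directory, so that the contents land directly under the target instead of in a subdirectory named after the source.

```shell
set -e
work=$(mktemp -d)
mkdir -p "$work/source/sub" "$work/target"
echo data > "$work/source/sub/file"
ln "$work/source/sub/file" "$work/source/sub/link"   # hard link
ln -s sub/file "$work/source/symlink"                # symbolic link
chmod 666 "$work/source/sub/file"                    # mode a umask would normally trim

# Create on stdout, extract from stdin; -p keeps permissions verbatim.
tar -cf - -C "$work/source" . | tar -xpf - -C "$work/target"

if [ "$work/target/sub/file" -ef "$work/target/sub/link" ]; then hardlink_kept=yes; else hardlink_kept=no; fi
if [ -L "$work/target/symlink" ]; then symlink_kept=yes; else symlink_kept=no; fi
file_mode=$(ls -l "$work/target/sub/file" | cut -c1-10)

echo "hard link: $hardlink_kept, symlink: $symlink_kept, mode: $file_mode"
rm -rf "$work"
```

Dropping -p from the extracting tar (as an ordinary user) would let the current umask trim the 666 mode, which is an easy way to see what the flag does.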

Example of using tar in cpio-style (for which, see below):

cd /source
sudo find . [-xdev] -depth [-print0] | sudo tar -cf- -T- [--null] | sudo tar -xp[v]f- -C /target
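One reason to drive tar from find is to let find do the excluding. The following sketch assumes a tar that understands --no-recursion (GNU tar and bsdtar do); that flag is not part of the command above and is added here because, without it, tar would descend into every directory find lists (including ".") and pull the excluded files back in. The keep and skip directories are hypothetical.

```shell
set -e
work=$(mktemp -d)
mkdir -p "$work/source/keep" "$work/source/skip" "$work/target"
echo wanted > "$work/source/keep/file"
echo unwanted > "$work/source/skip/file"

# find decides what is archived; tar takes the names from stdin (-T -)
# and, because of --no-recursion, archives exactly those names.
( cd "$work/source" &&
  find . -depth ! -path ./skip ! -path './skip/*' |
  tar -cf - --no-recursion -T - ) |
  tar -xpf - -C "$work/target"

if [ -f "$work/target/keep/file" ]; then kept=yes; else kept=no; fi
if [ -e "$work/target/skip" ]; then skipped=no; else skipped=yes; fi

echo "keep/file copied: $kept, skip/ omitted: $skipped"
rm -rf "$work"
```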

The next examples add the -z flag, to compress output using gzip. If you have a fast connection, you may omit this; alternatively, you may use -j for bzip2. Example of copying to remote machine:

sudo tar -czf- [--one-file-system] /source | ssh root@machine "tar -xpz[v]f- -C /target"

Example of copying from remote machine:

ssh root@machine "tar -czf- [--one-file-system] /source" | sudo tar -xpz[v]f- -C /target

cpio

cpio operates in one of four modes:

  • -o -H newc: create archive on stdout (or -F archive), including filenames supplied from stdin (like tar -cf- -T-)
-H newc: (on Mac, -H sv4cpio and/or -c) specifies to use SVR4 ASCII format, this is required to create archives in BusyBox
  • -i: receive archive from stdin (or -F archive), extract specified patterns underneath curdir (like tar -xf-)
  • -t: (on some systems, may need -it) receive archive from stdin (or -F archive), list contents to stdout (like tar -tf-)
  • -p /target: receive list of files from stdin (or -F archive), copy them underneath /target (like tar -cf- -T- | tar -xf- -C /target)


tar vs cpio:

  • if you have includes/excludes, it's easier to use find than the unfamiliar and non-portable options of GNU tar
  • cpio handles special files (block and char devices, fifos, etc); traditional tar (-H v7 format) didn't, and also didn't store symbolic owner/group
  • when extracting a hard link, tar must also extract the first-archived link to the same file
  • tar and the -H newc cpio format handle 32-bit inodes; though some older cpio formats didn't
  • the -H newc cpio format has max 1024 char filenames (though GNU cpio can handle arbitrary lengths) and max 4G filesize. The portable -H ustar tar format has max 256 char filenames and max 8G filesize limits
  • in some implementations, cpio wouldn't copy files over 2G properly. Don't have details about which implementations/versions are so limited
  • some tar formats are supposed to handle ACLs; however, in practice this doesn't seem very reliable. Not sure what the situation is with cpio. rsync is known to work well for such cases


Example of local copy:

cd /source
sudo find . [-xdev] -depth [\! -path ./lost+found] [-print0] | sudo cpio -pdm[vul] [-0] /target


The flags are:

  • -xdev: stay on the volume where /source is located; this may or may not be the behavior you desire
  • \! -path ./lost+found: omit empty ./lost+found; ! only needs to be escaped for some shells, and this sequence may be repeated
  • -print0 ... -0: permits processing filenames with embedded \ns, not available on BusyBox cpio or Mac
  • -d: create leading directories, like mkdir -p
  • -m: preserve file's original mtime
  • -v: verbose
  • -u: overwrite existing files, even if they're newer than files being copied
  • -l: create hard links instead of copying. This may or may not be the behavior you desire. Note that this feature isn't available on BusyBox cpio in any case, though it is available in some other cpio implementations.

Other flags are:

  • -oA -F archive: append to existing archive (not on BusyBox or BSD)
  • -L: follow symlinks (not on BusyBox)
  • --sparse: write files with blocks of zeros as sparse files (GNU only)


Example of copying to remote machine:

cd /source
sudo find . [-xdev] -depth [-print0] | sudo cpio -o -H newc [-0] | [gzip -3 |] ssh root@machine "cd /target; gunzip | cpio -idm[vu]"

rsync

rsync must be installed on both source and target machines. Example:

rsync -a[vzx] --delete [--numeric-ids] [-HAX] [--exclude=/proc --exclude=/dev --exclude=/sys] /source/ [root@machine:]/target

Note that the trailing / on /source/ is essential, if you want /target to end up a clone of /source.

The flags are:

  • -v: verbose
  • -z: compress data during transfer
  • -x or --one-file-system: skip subdirectories on different volumes
  • --numeric-ids: rsync's default is to use symbolic user/groupnames
  • -H or --hard-links: preserve hard links in copied data; this can be expensive. It won't break existing hard links on /target unless rsync needed to write updated data to some of the linked files
  • -A or --acls: preserve ACLs
  • -X or --xattrs: preserve xattrs
  • -8 or --8-bit-output: don't escape chars in filename even if they're invalid in current locale
  • --partial: keep partially-transferred files
  • --inplace: overwrite existing files in place
    rsync's default behavior is to write new files and move them into place when complete. This breaks any hard links the existing file may have had and makes a copy-on-write filesystem see the file as entirely new. --inplace specifies an alternate behavior, of writing updated data directly into the existing file.
  • -h or --human-readable: report statistics in prettier form (use once for powers of 10, twice for powers of 2)
  • --progress: show progress of each file transferred, implies -v
  • -n or --dry-run

rsync's full manpage

rsync's advantages:

  • like cp, can handle ACLs and xattrs. Many tar and cpio implementations don't
  • like cpio, can optionally work over a network
  • can resume incomplete transfers, and can do incremental updates if the data is being copied multiple times


Including/excluding from transfers:

  • --include=pattern[/]
  • --exclude=pattern[/]

Each of these may be repeated. The patterns may contain *, **, ?, [a-z]. A trailing dir/*** matches both dir and its contents. If pattern contains a ** or (non-trailing) /, it matches against the full path from /target, else it only matches against the final path element.

lvm

"This is one reason I like LVM. Just add the new disk to the volume group, pvmove the logical volumes from the old to new disk, [wait a bit], remove the old disk from the volume group, and then from the system. If it's your boot disk you're replacing then you also need to update your boot loader."

See Setting up Logical Volumes with LVM.


Other tools

cd /source
pax -pe -rw -v [-X] -YZ . /target