How to set up an Alpine Linux mirror
Introduction
This document describes how to set up an Alpine Linux mirror and make it available via HTTP and rsync.
We will:
- create the directory that holds the mirror
- set up a cron job to sync with the master mirror every hour
- set up lighttpd for HTTP access
- set up rsyncd so that other mirrors can rsync from you
Make sure that you have enough disk space.
Current (2023-07-06) disk usage in GB:
edge | v3.0 | v3.1 | v3.2 | v3.3 | v3.4 | v3.5 | v3.6 | v3.7 | v3.8 | v3.9 | v3.10 | v3.11 | v3.12 | v3.13 | v3.14 | v3.15 | v3.16 | v3.17 | v3.18 | total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
348 | 17 | 18 | 15 | 21 | 25 | 27 | 45 | 43 | 59 | 73 | 92 | 126 | 148 | 156 | 181 | 186 | 195 | 214 | 242 | 2222 |
Script used to calculate the size:
#!/bin/sh
total=0
dest="$(mktemp -d)"
for dir in edge v3.0 v3.1 v3.2 v3.3 v3.4 v3.5 v3.6 v3.7 v3.8 v3.9 v3.10 v3.11 v3.12 v3.13 v3.14 v3.15 v3.16 v3.17 v3.18; do
    old_total="$total"
    src="rsync://rsync.alpinelinux.org/alpine/$dir/"
    size=$(rsync -a -n --stats "$src" "$dest" | grep '^Total file size' | tr -d ',' | awk '{ print $4 }')
    total=$(( old_total + size ))
    echo "$dir: $size" | awk '{ print $1 sprintf("%.1f", $2/1073741824) }'
done
echo "total: $total" | awk '{ print $1 sprintf("%.1f", $2/1073741824) }'
rm -r "$dest"
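Before the first sync it can help to compare the table total against the free space on the target filesystem. The following is a minimal sketch; free_gib and has_space are hypothetical helper names, and the path in the commented example assumes the destination used later in this document.

```shell
#!/bin/sh
# Sketch only: free_gib and has_space are hypothetical helpers,
# not part of any Alpine tool.

# Print the free space, in whole GiB, of the filesystem holding directory $1.
free_gib() {
    # -P forces the POSIX one-line-per-filesystem format, -k reports
    # 1024-byte blocks; column 4 is the available space.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Succeed if the filesystem holding $1 has at least $2 GiB available.
has_space() {
    [ "$(free_gib "$1")" -ge "$2" ]
}

# Example (2222 is the total from the table; adjust to the branches you mirror):
# has_space /var/www/localhost/htdocs 2222 || echo "not enough disk space" >&2
```

The check only looks at free space at the moment it runs, so it is worth leaving headroom for the growth of edge and future releases.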
Setting up the cron job
Install rsync, which is used to sync from the master mirror:
apk add rsync
Save the following file as /etc/periodic/hourly/alpine-mirror:
#!/bin/sh
# make sure we never run 2 rsync at the same time
lockfile="/tmp/alpine-mirror.lock"
if [ -z "$flock" ] ; then
    exec env flock=1 flock -n "$lockfile" "$0" "$@"
fi
src=rsync://rsync.alpinelinux.org/alpine/
dest=/var/www/localhost/htdocs/alpine/
# uncomment this to exclude old v2.x branches
#exclude="--exclude v2.*"
mkdir -p "$dest"
/usr/bin/rsync \
    --archive \
    --update \
    --hard-links \
    --delete \
    --delete-after \
    --delay-updates \
    --timeout=600 \
    $exclude \
    "$src" "$dest"
Make it executable:
chmod +x /etc/periodic/hourly/alpine-mirror
The mirror will now sync every hour, provided crond is running.
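One way to confirm that the hourly job is actually keeping up is to look at how old the mirror's timestamp is. The sketch below assumes that the synced tree contains a last-updated file holding a Unix timestamp, as the upstream tree appears to; mirror_age is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: mirror_age is a hypothetical helper. It assumes the file
# given as $1 contains a single Unix timestamp.

# Print how many seconds old the timestamp in file $1 is.
# $2 optionally overrides the current time, which is handy for testing.
mirror_age() {
    stamp=$(cat "$1") || return 1
    now=${2:-$(date +%s)}
    echo $(( now - stamp ))
}

# Example: warn when the mirror is more than two hours behind.
# age=$(mirror_age /var/www/localhost/htdocs/alpine/last-updated) &&
#     [ "$age" -le 7200 ] || echo "mirror looks stale" >&2
```

The two-hour threshold is an arbitrary choice that allows one missed hourly run before warning.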
Setting up HTTP access via lighttpd
Install the lighttpd server:
apk add lighttpd
Enable dir listings by uncommenting the following line in /etc/lighttpd/lighttpd.conf:
dir-listing.activate = "enable"
Also set a Cache-Control header so that clients and caches revalidate cached content. First, uncomment mod_setenv in /etc/lighttpd/lighttpd.conf:
"mod_setenv",
Then add the following line to /etc/lighttpd/lighttpd.conf:
setenv.add-response-header += ( "Cache-Control" => "must-revalidate" )
Start lighttpd and make it start at boot:
rc-service lighttpd start
rc-update add lighttpd
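Once lighttpd is running, the response headers can be inspected to confirm that the Cache-Control setting took effect. This is a rough sketch: has_must_revalidate is a hypothetical helper, and the commented example assumes that curl is installed and that the mirror is reachable at http://localhost/alpine/.

```shell
#!/bin/sh
# Sketch only: has_must_revalidate is a hypothetical helper. It reads HTTP
# response headers on stdin and succeeds if a Cache-Control header
# contains must-revalidate.
has_must_revalidate() {
    # Header names are case-insensitive, hence -i.
    grep -i '^cache-control:' | grep -qi 'must-revalidate'
}

# Example (assumes curl is available and the mirror is served locally):
# curl -sI http://localhost/alpine/ | has_must_revalidate && echo "header is set"
```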
Alternatively, darkhttpd can be used instead of lighttpd. To do so, install it, start it, and enable it at boot:
apk add darkhttpd && rc-service darkhttpd start && rc-update add darkhttpd
By default, darkhttpd offers directory listings and serves data from /var/www/localhost/htdocs/.
See the main article on Darkhttpd for more configuration options.
Setting up rsyncd
Add the following lines to /etc/rsyncd.conf:
[alpine]
    path = /var/www/localhost/htdocs/alpine
    comment = My Alpine Linux Mirror
Optionally set a bandwidth limit in /etc/conf.d/rsyncd. In this example we limit transfers to 500 KB/s (approximately 4 Mbit/s):
RSYNC_OPTS="--bwlimit=500"
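To pick a --bwlimit value for a given link speed, it helps to convert between the two units. rsync interprets an unsuffixed --bwlimit value in units of 1024 bytes per second; the sketch below converts such a value to Mbit/s. kib_to_mbit is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: kib_to_mbit is a hypothetical helper. It converts a rate in
# KiB/s (1024-byte units) to Mbit/s (10^6 bits per second), rounded to
# one decimal place.
kib_to_mbit() {
    awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib * 1024 * 8 / 1000000 }'
}

kib_to_mbit 500    # prints 4.1
```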
Mirror statistics
Simple bandwidth statistics can be generated with vnstat:
apk add vnstat
Edit /etc/vnstat.conf and replace the interface name with the appropriate one.
Start vnstatd:
rc-service vnstatd start
Copy the following script to /etc/periodic/15min/stats and make sure your crond is running. Please note that if you indent the heredoc, it must be indented with tabs, not spaces, or the script will fail. A working copy can be found here: https://tpaste.us/RrMv
#!/bin/sh
output="/var/www/localhost/htdocs/.stats"
nic="eth0"
# write a static index page that embeds the generated graphs
generate_index() {
cat <<-EOF
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="cache-control" content="no-cache">
<meta http-equiv="refresh" content="3000">
<title>Alpine Linux mirror statistics</title>
</head>
<body>
<table border="0">
<tr><td><img src="summary.png" alt="summary"></td><td><img src="hours.png" alt="hours"></td></tr>
<tr><td rowspan="2"><img src="days.png" alt="days"></td><td><img src="top10.png" alt="top10"></td></tr>
<tr><td><img src="months.png" alt="months"></td></tr>
</table>
</body>
</html>
EOF
}
# create the index page once
if [ ! -f "$output"/index.html ]; then
    mkdir -p "$output"
    generate_index > "$output"/index.html
fi
# regenerate every graph on each run
for type in hours days months top10 summary hsummary vsummary; do
    vnstati --${type} -i "$nic" -o "$output/${type}.png"
done
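The tab requirement comes from the <<- form of the heredoc, which strips leading tab characters (and only tabs) from the body and from the terminator line. The short demonstration below is a sketch that generates a tiny script with printf, so the tabs are real regardless of how an editor treats whitespace.

```shell
#!/bin/sh
# Feed a three-line script to sh. printf turns \t into real tab characters,
# so both the heredoc body and its EOF terminator are tab-indented.
out=$(printf 'cat <<-EOF\n\thello\n\tEOF\n' | sh)
echo "$out"    # prints: hello
```

With spaces instead of tabs the indented terminator is not recognized, and the heredoc runs on to the end of the script.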
Update mirror from mqtt
If you want your mirror to stay closely up to date with the master mirror, you can subscribe to the Alpine Linux message server "msg.alpinelinux.org" and watch for upload messages. Install mqtt-exec, which executes a command whenever a message is sent on specific topics.
apk add mqtt-exec
mqtt-exec supports running multiple instances, so we need to set up an instance-specific configuration.
ln -s mqtt-exec /etc/init.d/mqtt-exec.sync-mirror
ln -s mqtt-exec /etc/conf.d/mqtt-exec.sync-mirror
Edit /etc/conf.d/mqtt-exec.sync-mirror:
mqtt_topics="rsync/rsync.alpinelinux.org/#"
exec_user="buildozer"
exec_command="/usr/local/bin/sync-mirror"
Copy the following file to /usr/local/bin/sync-mirror and make it executable (don't forget to update the variables).
#!/bin/sh
src="rsync://rsync.alpinelinux.org/alpine/"
dest="/var/www/localhost/htdocs/alpine/"
lock="/tmp/sync-mirror.lock"
topic="$1"
dir="$2"
# re-exec under flock so that only one sync runs at a time
[ -z "$flock" ] && exec env flock=1 flock "$lock" "$0" "$@"
# sync a single directory if one was given and its parent exists locally
if [ -n "$dir" ] && [ -d "$dest/${dir%/*}" ]; then
    logger "Syncing directory: $dir"
    src="${src}${dir%/}/"
    dest="${dest}${dir%/}/"
else
    logger "Syncing all directories"
fi
/usr/bin/rsync \
    --archive \
    --update \
    --verbose \
    --progress \
    --timeout=600 \
    --delay-updates \
    --delete-after \
    "$src" \
    "$dest"
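The directory check in sync-mirror relies on two parameter expansions that are easy to misread. The sketch below isolates that logic so it can be tried without touching the real mirror; resolve_target is a hypothetical helper, and the directory names in the example are assumptions about what a message might contain.

```shell
#!/bin/sh
# Sketch only: resolve_target is a hypothetical helper that mirrors the
# directory check in sync-mirror. Given the local mirror root ($1, with a
# trailing slash) and the directory from the message ($2), it prints the
# local path that would be synced.
resolve_target() {
    dest="$1"
    dir="$2"
    # ${dir%/*} removes the shortest trailing "/..." match, which is the
    # last path component when dir has no trailing slash.
    # ${dir%/} removes at most one trailing slash.
    if [ -n "$dir" ] && [ -d "$dest/${dir%/*}" ]; then
        echo "${dest}${dir%/}/"
    else
        echo "$dest"    # fall back to syncing the whole mirror
    fi
}

# Try it against a throwaway directory tree.
root=$(mktemp -d)
mkdir -p "$root/edge/main"
resolve_target "$root/" "edge/main/x86_64"    # parent exists: one directory
resolve_target "$root/" "v9.9/main/x86_64"    # parent missing: whole mirror
rm -r "$root"
```

Falling back to a full sync when the parent directory is missing means a new branch is picked up on its first upload message.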
Finally, start mqtt-exec so that it listens on msg.alpinelinux.org:
rc-service mqtt-exec.sync-mirror start
To make sure you do not miss any packages (in case something goes wrong with the MQTT subscription), you can periodically sync all directories by adding the script to cron:
ln -s /usr/local/bin/sync-mirror /etc/periodic/hourly/sync-mirror
Now watch your syslog; it should log a message each time directories in your local mirror are updated.
Partial mirror using nginx
For a private mirror it might make sense to sync only the newest versions of Alpine to save space. Clients running an older Alpine version that point to your mirror should still be able to install packages. We can achieve this by using nginx to serve the mirrored content and using regex location matching to redirect requests for unmirrored versions to a public mirror.
Let's assume you chose to mirror only Alpine versions from v3.13 onward. If a client asks your mirror for v3.10, it should be redirected to another mirror.
Your nginx config server block should look something like this:
server {
    listen 80;
    server_name alpine.mydomain.local;
    root /data/alpine; # point to where your Alpine mirror is located; make sure nginx is allowed to read it
    autoindex on; # enable directory listings
    # The following location block matches paths under v3.0 to v3.12
    # and redirects them to dl-cdn.alpinelinux.org.
    location ~* ^/v3\.([0-9]|1[0-2])(/|$) {
        return 302 http://dl-cdn.alpinelinux.org/alpine$request_uri;
    }
}
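Before reloading nginx, a version pattern can be tried against sample request paths from the shell. The sketch below uses grep -E with a pattern that is anchored at the start of the path and covers v3.0 to v3.12; it uses only syntax that POSIX extended regular expressions and nginx's regex engine have in common. is_redirected is a hypothetical helper, and the sample paths are assumptions.

```shell
#!/bin/sh
# Sketch only: is_redirected is a hypothetical helper. It succeeds if the
# request path $1 belongs to a branch from v3.0 to v3.12, i.e. a branch
# that is not mirrored locally.
is_redirected() {
    # Anchored at the start of the path; (/|$) stops v3.1 from also
    # matching v3.13 and later.
    printf '%s\n' "$1" | grep -Eq '^/v3\.([0-9]|1[0-2])(/|$)'
}

for path in /v3.0/main/x86_64/APKINDEX.tar.gz /v3.12/ /v3.13/main/ /edge/main/; do
    if is_redirected "$path"; then
        echo "$path -> redirect"
    else
        echo "$path -> local"
    fi
done
```

A pattern that ends in $ directly after the version number would only match a request for the bare version directory, not for the package files below it, which is why the check includes a full package index path.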
The corresponding sync script could look something like this:
#!/bin/sh
# make sure we never run 2 rsync at the same time
lockfile="/tmp/alpine-mirror.lock"
if [ -z "$flock" ] ; then
    exec env flock=1 flock -n "$lockfile" "$0" "$@"
fi
src=rsync://rsync.alpinelinux.org/alpine/
dest=/data/alpine/
exclude="--exclude v2.* --exclude v3.0 --exclude v3.1 --exclude v3.2 --exclude v3.3 --exclude v3.4 --exclude v3.5 --exclude v3.6 --exclude v3.7 --exclude v3.8 --exclude v3.9 --exclude v3.10 --exclude v3.11 --exclude v3.12"
mkdir -p "$dest"
/usr/bin/rsync -vvv \
    --archive \
    --update \
    --hard-links \
    --delete \
    --delete-after \
    --delete-excluded \
    --delay-updates \
    --timeout=600 \
    $exclude \
    "$src" "$dest"
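The exclude list grows by hand every time the cutoff moves, so it may be easier to generate it. The following is a sketch under the assumption that all excluded branches are v2.* plus a contiguous run of v3.N starting at v3.0; build_excludes is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: build_excludes is a hypothetical helper. It prints rsync
# --exclude options for v2.* and for every v3.N branch with N below $1,
# the minor number of the first branch to keep.
build_excludes() {
    first="$1"
    out="--exclude v2.*"
    n=0
    while [ "$n" -lt "$first" ]; do
        out="$out --exclude v3.$n"
        n=$(( n + 1 ))
    done
    printf '%s\n' "$out"
}

# Keep v3.13 and newer, matching the hand-written list in the sync script.
exclude=$(build_excludes 13)
echo "$exclude"
```

The result is meant to be used unquoted as $exclude, exactly like the hand-written variable, so that the shell splits it into separate rsync arguments.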