User talk:Jch
How to automate KVM creation
The goal is not only to have a working install but to have it at the after-setup-alpine stage without human intervention... This is the first stage of a work in progress...
I want to pass a block device and a name as parameters. The block device could be an image file, an LV, an NBD, a HDD, a RAID array, whatever.
Everything else should be fully automatic according to some config file (stating the http-proxy, the time server, the log server, ...).
Then I will just run the script, watch my dhcp logs to discover the new IP assigned (that's why the name is a parameter), then log in with ssh without a password to customize it further, but at a high level only (it will be a robot and not me, in fact).
I guess it would be something like emulating boot from a USB key with a specific overlay already on the key...
then run setup-disk with proper parameters on the command line to avoid the interactive process (like setup-alpine does)...
Methinks this could be done with a couple of scripts put in /etc/local.d/, the last.stop one deleting all of them to be clean at the next reboot.
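A minimal sketch of that idea (the script name and the rc-update step are assumptions, not a tested setup):
cat > /etc/local.d/last.stop <<'EOF'
#!/bin/sh
# remove every bootstrap script so the next boot starts clean
rm -f /etc/local.d/*.start /etc/local.d/*.stop
EOF
chmod +x /etc/local.d/last.stop
rc-update add local default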
Let's start easy ;)
How to prepare an img file to emulate a USB key
First, a working example done in the console (accessed through ssh).
We will build a script from it...
First, let's prepare some block device (here an image file, but it could be something else)
apk add qemu-img
qemu-img create -f raw usbkey.img 512M
apk del qemu-img
T="usbkey.img"
Next, let's install AL on this $T
apk add multipath-tools syslinux dosfstools
fdisk $T
kpartx -av $T
mkdosfs -F32 /dev/mapper/loop1p1
dd if=/usr/share/syslinux/mbr.bin of=/dev/mapper/loop1
syslinux /dev/mapper/loop1p1
mkdir key
mount -t vfat /dev/mapper/loop1p1 key
wget http://wiki.alpinelinux.org/cgi-bin/dl.cgi/v3.1/releases/x86_64/alpine-mini-3.1.1-x86_64.iso
mkdir cdrom
mount alpine-mini-3.1.1-x86_64.iso cdrom
cd cdrom
cp -a .alpine-release * ../key/
cd ..
umount key
umount cdrom
kpartx -d $T
apk del multipath-tools syslinux dosfstools
rm alpine-mini-3.1.1-x86_64.iso
This block device may now be used to boot some KVM, for instance like:
screen -d -m -S KVM-builder \
  qemu-system-x86_64 -name KVM-usb -enable-kvm -cpu qemu64 -curses \
  -device nec-usb-xhci -drive if=none,id=usbstick,file=$T -device usb-storage,drive=usbstick
This is working fine. The problem is that when adding a HDD to the lot, qemu tries to boot from the HDD and does not even try to boot from the USB key. Enabling the boot menu lets one access the emulated BIOS, which allows selecting the USB device to boot from interactively, but this breaks the goal of a fully automated boot :( The stanza is, for instance:
screen -d -m -S KVM-builder \
  qemu-system-x86_64 -name KVM-usb -enable-kvm -cpu qemu64 -curses \
  -device nec-usb-xhci -drive if=none,id=usbstick,file=$T -device usb-storage,drive=usbstick \
  -drive file=$T2 -boot menu=on
qemu-doc states that very clearly:
> -boot [order=drives][,once=drives][,menu=on|off][,splash=sp_name][,splash-time=sp_time][,reboot-timeout=rb_timeout][,strict=on|off]
> Specify boot order drives as a string of drive letters. Valid drive letters depend on the target architecture. The x86 PC uses: a, b (floppy 1 and 2), c (first hard disk), d (first CD-ROM), n-p (Etherboot from network adapter 1-4), hard disk boot is the default
Starting AL from network
As it does not seem possible to start qemu with a virtual USB key *and* a virtual HDD attached to the VM, let's try something different: start AL from the network and mount the HDD later on...
Usually this kind of setup needs
- a DHCP server to get an IP address and the location of the TFTP server
- a TFTP server to download the kernel and the root file system to boot from
- a NFS server or a HTTP one to get the overlay used to configure the machine
- a NFS server to share files with others
- a NBD server to get its own block devices as storage
- a machine where to prepare initramfs
First, let's check what is available in AL and what is not...
- dhcpcd-6.6.7-r0
- tftp-hpa-5.2-r1
- nfs-utils-1.3.1-r2
- darkhttpd-1.10-r1
- nbd-3-10-r0
PXE_boot
We are trying to do something as described in PXE_boot.
We did it on a separate machine for each service. It forces us to deeply understand all interactions between the processes.
In the current state we
umount /media/alpine
as the last step of the boot process and we are running with no ties.
dhcpd
192.168.1.1
With the dhcp package from the repo. Nothing special.
filename "pxelinux.0"; next-server 192.168.1.2;
and
# Disable RFC 2136 dynamic DNS updates.
ddns-update-style none;

# Define actions to take when leases are committed, released, or expired to
# accomplish dynamic DNS updates to djbdns. This does not use the RFC 2136
# update mechanism, because djbdns does not support it. However, it
# accomplishes the same thing.
# syntax "execute(cmd, arg, ...)"
### need to check if the two "on EVENT" must be nested or in sequence...
on commit {
  execute ("/usr/local/bin/dns-update-djb",
           "commit",
           lcase (option host-name),
           config-option domain-name,
           binary-to-ascii (10, 8, ".", leased-address));
  on release or expiry {
    execute ("/usr/local/bin/dns-update-djb",
             "release",
             binary-to-ascii (10, 8, ".", leased-address));
  }
}
with a custom /usr/local/bin/dns-update-djb script largely inspired by https://sites.google.com/site/dmoulding/dns-update-djb but adapted for a distant tinydns server and to the AL way.
tftp
192.168.1.2
tftp-hpa configured to serve some SYSLINUX files.
The config is in /etc/conf.d/in.tftpd
Then issue:
rc-update add in.tftpd
rc-service in.tftpd start
We serve from /var/tftpboot.
We had to temporarily install the syslinux apk to get pxelinux.0 and the other libs needed.
We prepared a "pxerd" initramfs file with virtio_net.ko, dhcp and nfs included, and made sure loop and squashfs are included.
pxelinux.cfg/default looks like
PROMPT 0
TIMEOUT 3
default alpine

LABEL alpine
  LINUX alpine/vmlinuz-grsec
  INITRD alpine/pxerd
  APPEND ip=dhcp alpine_dev=nfs:192.168.1.3:/srv/boot/alpine modloop=/boot/grsec.modloop.squashfs nomodeset quiet apkovl=http://192.168.1.4/localhost.apkovl.tar.gz
  #APPEND modloop=http:/192.168.1.4/grsec.modloop.squashfs
  #APPEND apkovl=http://192.168.1.4/localhost.apkovl.tar.gz # including the modloop hack
  #APPEND alpine_repo=http://repo-url
Modules are loaded
/ # lsmod
Module                  Size  Used by    Not tainted
nfsv3                  22784  1
nfs                   144376  2 nfsv3
lockd                  71917  2 nfsv3,nfs
sunrpc                225574  6 nfsv3,nfs,lockd
af_packet              28735  0
sr_mod                 13487  0
cdrom                  40424  1 sr_mod
pata_acpi               3326  0
ata_piix               25601  0
ata_generic             3554  0
libata                181955  3 pata_acpi,ata_piix,ata_generic
virtio_net             19684  0
scsi_mod              113710  2 sr_mod,libata
virtio_pci              6485  0
virtio                  4933  2 virtio_net,virtio_pci
virtio_ring             9161  2 virtio_net,virtio_pci
squashfs               25893  1
loop                   18243  2
Network is up
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:33:B0:C2:D2
          inet addr:192.168.1.108  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:322 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:20514 (20.0 KiB)  TX bytes:684 (684.0 B)
but the modloop does not load. This patch fixes the issue (hope to see it mainstream soon):
localhost:~# diff /etc/init.d/modloop modloop.new
--- /etc/init.d/modloop
+++ modloop.new
@@ -32,7 +32,7 @@
 	local search_dev="$1" fstab="$2"
 	local dev mnt fs mntopts chk
 	case "$search_dev" in
-		UUID=*|LABEL=*|/dev/*);;
+		UUID=*|LABEL=*|/dev/*|nfs);;
 		*) search_dev=/dev/$search_dev;;
 	esac
 	local search_real_dev=$(resolve_dev $search_dev)
@@ -49,6 +49,10 @@
 			fi
 		done
 	done
+	if [ "$fs" = "$search_dev" ]; then
+		echo "$mnt"
+		return
+	fi
 	done < $fstab 2>/dev/null
 }
References
http://www.syslinux.org/wiki/index.php/PXELINUX
nfs
192.168.1.3
see http://wiki.alpinelinux.org/wiki/User_talk:Jch#NFS_bug_study
It is now working with http://dev.alpinelinux.org/~clandmeter/rpcbind-0.2.3_rc2-r0.apk
We serve the content of a USB key (ISO) read-only as
/srv/boot/alpine *(ro,no_root_squash,no_subtree_check)
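After changing /etc/exports, the export table can be refreshed without a full service restart (exportfs ships with nfs-utils):
exportfs -ra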
http
192.168.1.4
The darkhttpd package from the repo, serving from /var/tftpboot/, provides the files needed to boot (kernel, rootfs, apkovl.tar.gz).
nbd
192.168.1.5
I really would like to have xnbd-server in AL. nbd-3.1.0 was just added to the edge/testing repo; I need to try it in a real situation...
For now, we have a qcow2 debian image added to the apkovl with lbu add; lbu ci.
This image is used to launch a first KVM with /dev/mdX as second drive.
In turn, inside the KVM, vdb is used to define an lvm2 volume.
The LVs are published with xnbd-server.
Later on, the same KVM will be able to connect to RBD device and re-publish it as NBD.
xnbd-server allows live migration of block devices while in use, and has a powerful proxy mode.
All other KVMs are running from filesystems accessed through NBD from such a SAN. Even other SANs.
As soon as those KVM-NBD are up, they may be used to launch others or to provide datastores.
We put that image on every USB key we use along with mdadm and OpenVSwitch (and collectd).
dns
192.168.1.6
tinydns from repo with split-dns config.
Building a complete infrastructure with AL
I'm doing it. It's for real! That's my daily job at present ^^
I'm building a full private cloud bootstrapped with only an AlpineLinux USB key for each physical machine. But the next ones will be able to boot from the network; not even USB keys will be needed. As a matter of fact, we used more than one physical USB key because we didn't start from scratch but did a live migration from Debian to Alpine for most of the services and machines...
If there is some feed-back, I may develop config files and so on ;)
As I started from scratch and OpenVSwitch was not yet available in Alpine at that time, it took me a while to build everything. But reproducing it would be a piece of cake!
We use qemu-kvm for KVM. But I guess one may use whatever Virtual Machine technology one likes.
This is the presentation of a use case, not a HOW TO. And it's still a work in progress...
Network
Firewall
We put a dedicated physical machine on each link between our LAN and other networks. It just runs iptables and some packet-accounting metrology.
Router
A physical machine connected to our LAN and other networks (through a firewall). A static routing table does the trick.
Switches
All physical machines run OpenVSwitch, reproducing virtually all the physical switches we have, plus some virtual-only ones.
VPN
All physical machines run openVPN as a client for each defined switch that has no physical interface on the machine. There is an openVPN server somewhere, running in a KVM connected to the needed switches.
Storage
SAN
On each physical machine, a couple of HDDs are mounted in raid1 with mdadm. This raid array is passed as a parameter to a KVM which in turn mounts it as a physical volume for LVM. The created LVs are published as NBDs with xnbd-server. For the time being, this KVM is running debian 7.8 as xnbd is not in Alpine (yet?).
The SAN also connects to the CEPH cluster as a client and publishes the reachable RBDs as NBDs with xnbd-server. For the time being, this KVM is running debian 7.8 as neither xnbd nor RBD are in Alpine (yet?).
NAS
Some KVMs mount NBDs as local drives and publish some directories as NFS shares.
We now have nfs and nbd in AL.
CEPH
KVMs with physical HDDs as parameters are used to build the OSDs and MONs needed to operate a CEPH cluster. One KVM is the "console" to drive it from a single point of presence (useful but not "needed"). For the time being, those KVMs are running debian 7.8 as CEPH and RBD are not in Alpine (yet?).
Low-level services
No service at all is running in the AL on bare metal. All are running in some KVM connected to the needed switches by means of the OpenVSwitches. The apkovl on the USB keys contains only the scripts to launch KVMs and one image file to launch the first SAN. Other KVMs are launched from LVs in the SAN.
dhcp
Exactly two KVMs, stored in different SANs, primary and secondary in failover mode, are running dhcpd from the repo.
We just have to configure it properly.
We still have to test whether dhcpd may run in an LXC instead of a KVM.
DNS
tinydns from repo with split-dns config.
Resolver
With dnscache from repo.
Those KVMs have manually assigned IP addresses in the LAN and know a gateway to the Internet.
They use themselves as resolver...
They know the directly (manually) assigned LAN IP address of the main DNS server of selected domains (for the split-dns configuration).
PXEboot
Kernel and initrd files in the tftp server.
A copy of the usb content in the nfs server.
apkovl files in the darkhttpd server.
Time server
The router (which has access to the internet) uses ntpd (or similar) from the repo, acting as a client to the WAN and a server to the LAN.
syslog
With syslog-ng from the repo, we receive the logs from all machines, be they physical or virtual.
It's the only place that needs logrotate from the repo.
HTTP proxy/cache
The web proxy/cache squid, from repo, uses a NBD as cache. It has a link to the internet to forward requests and one to the LAN.
Because of it, no machine, be it physical or virtual (they are all connected to the LAN), needs a published default gateway. And all machines are able to install/upgrade packages or to see the WWW as clients.
We point all AL boxes to this KVM with setup-proxy.
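A sketch of what that looks like on a client box (the proxy URL is an assumption; use the real squid KVM's LAN address and port):
setup-proxy http://192.168.1.7:3128
. /etc/profile.d/proxy.sh   # source the snippet setup-proxy wrote (path assumed) to pick up the proxy variables now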
Monitoring
Shinken from sources, in some LXC with barely more than the python package installed.
Metrology
Collectd (one LXC as server; all other machines, be they physical or virtual, as clients) with collectd-network from the repo.
A couple of lines in CGP config file is enough for now.
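For reference, a minimal sketch of the client side, assuming the collectd LXC listens on collectd's default network port (the 192.168.1.10 address is made up):
cat >> /etc/collectd/collectd.conf <<'EOF'
LoadPlugin network
<Plugin network>
  Server "192.168.1.10" "25826"
</Plugin>
EOF
rc-update add collectd
rc-service collectd restart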
Backups
with common tools: rsync, tar, nc, bzip2, openssh, cron
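As an illustration only (host and paths are made up), a nightly rsync job pushed through cron could be as simple as:
# append a 02:30 job to root's crontab and reload busybox crond
echo '30 2 * * * rsync -az --delete /var/lib/lxc/ backupnas:/backups/$(hostname)/lxc/' >> /etc/crontabs/root
rc-service crond restart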
LDAP
openldap with openldap-back-hdb, both from repo.
http://www.openldap.org/doc/admin24/backends.html states
> The hdb backend to slapd(8) is the recommended primary backend for a normal slapd database.
And
> Note: The hdb backend has superseded the bdb backend, and both will soon be deprecated in favor of the new mdb backend.
> The mdb backend to slapd(8) is the upcoming primary backend for a normal slapd database. It uses OpenLDAP's own Lightning Memory-Mapped Database (LMDB) library to store data and is intended to replace the Berkeley DB backends.
Unfortunately there is no openldap-back-mdb package in AL yet.
High-level services
in LXC AL whenever possible.
in LXC Debian as second choice
in KVM otherwise.
x2goserver
I did package nx-libs and x2goserver. I'm waiting for the packages to be included in edge/testing. They are already being used for single app access. Next step is full desktop but we are not sure if AL is the right choice for full desktop usage for our customers...
Unfortunately, x2goclient pops up "kex error : did not find one of algos diffie-hellman-group1-sha1 in list curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1 for kex algos"; we need to specify diffie-hellman-group1-sha1 in sshd_config. Luckily a fix exists and my business partner is looking for a way to enhance its security upstream.
ejabberd
With ejabberd from the edge/testing repo. I migrated the mnesia DB from an old debian squeeze by just copying the files and changing ownership in an LXC-AL. I just had to disable mod_pubsub to have it run properly. Authentication is done with openLDAP. I now plan to migrate a very, very old jabberd (11 years, I guess) running on a debian etch to it, if I find a way to keep users' passwords and rosters... I also would like to use it as a gateway to IRC to follow the #alpine, #alpine-devel and #x2go channels ;) Some other ejabberd features are interesting to my organisation and we will experiment more in depth, namely mod_sip, mod_stun, mod_proxy65...
redmine
in a brand new LXC with edge/main and edge/testing repos
mostly following Redmine page
I use a mariaDB server on another host, where I created the user and pushed the SQL dump from a running redmine 3.0.0 instance.
apk update
apk upgrade
reboot
setup-timezone
apk add redmine
apk add ruby-unicorn
cp /etc/unicorn/redmine.conf.rb.sample /etc/unicorn/redmine.conf.rb
vi /etc/conf.d/unicorn
vi /etc/redmine/database.yml
apk add sudo
apk add ruby-mysql2
apk add ruby-yard
apk add tzdata
cd /usr/share/webapps/redmine
sudo -u redmine rake generate_secret_token
sudo -u redmine RAILS_ENV=production rake db:migrate
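To actually serve Redmine, the unicorn service (its name inferred from the /etc/conf.d/unicorn file edited above) still has to be enabled and started:
rc-update add unicorn
rc-service unicorn start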
LDAP
SMTP in
Antispam
Antivirus
SMTP store
IMAP
SMTP relay
SMTP out
Webmail
webhosting
Front-end
Back-end static
Back-end dynamic
Master key
We want to be able to bootstrap the full infrastructure from only one usb key and one machine with physical access (to insert the usb key obviously).
This key will run AL stable, with only very few packages installed, but with some images on the storage.
almirror:~# du -shc /var/www/localhost/htdocs/alpine/????/*/x86_64
3.4G    /var/www/localhost/htdocs/alpine/edge/main/x86_64
1.3G    /var/www/localhost/htdocs/alpine/edge/releases/x86_64
2.3G    /var/www/localhost/htdocs/alpine/edge/testing/x86_64
3.2G    /var/www/localhost/htdocs/alpine/v3.1/main/x86_64
6.5G    /var/www/localhost/htdocs/alpine/v3.1/releases/x86_64
16.6G   total
A repo with stable and edge will be present on the 32GB USB stick.
Initial packages
dhcpd tftp syslinux nfs darkhttp openssh vim openvswitch mdadm qemu screen collectd collectd-network gptdisk irqbalance ssmtp mailx
Bootstrap PXEboot capacity
First, we set up the network. Remember, this is a bootstrap. We assume nothing.
It means we may take any decision we see fit.
Our primary machine is the only fixed point for now. Let's give it the number 1.
All machines will be connected to the LAN. We know nothing yet about other NICs.
First we must decide on the LAN_IP_RANGE. For instance, let it be 192.168.1.0/24.
We will use a complicated network setup, so let's start by installing openvswitch:
rc-service networking start
apk add openvswitch consul
rc-update add ovs-modules
rc-update add ovsdb-server
rc-update add ovs-vswitch
rc-service ovs-modules start
rc-service ovsdb-server start
rc-service ovs-vswitch start
ovs-vsctl add-br lan
ovs-vsctl add-br wan
ovs-vsctl add-br storage
ovs-vsctl add-br ipmi
ovs-vsctl add-br vpn
ovs-vsctl add-port lan eth0
vi /etc/network/interfaces #iface eth0 inet manual
                           #iface lan inet dhcp
rc-service networking restart
rc-service sshd restart
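The two vi comments above are roughly equivalent to an /etc/network/interfaces along these lines (a sketch; the exact stanzas depend on the box):
cat > /etc/network/interfaces <<'EOF'
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual

auto lan
iface lan inet dhcp
EOF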
No machine will offer a service from bare metal. LXC will be preferred, KVM otherwise.
if [ -b /dev/sda ]; then    # bare metal or first level KVM
  apk add qemu-system-x86_64 screen libusb
  modprobe kvm
  modprobe kvm-intel
  modprobe kvm-amd
  modprobe tun
elif [ -b /dev/vda ]; then  # second level KVM
  :                         # nothing here yet
fi
We instantiate a PXEboot server:
screen -m -d -S KVM-PXE-01 \
  qemu-system-x86_64 -enable-kvm -kernel /kernel -initrd /initrd \
  -append alpine_dev=...,apkovl=... \
  -net -net -drive file=/media/usb/custom/pxeboot.img,readonly
The immediate next steps happen inside this VM:
screen -r KVM-PXE-01
We need the storage space from the usb key to handle boot images and apkovl files (as vda1).
In KVM-PXE-01
setup-alpine --mode data
vi /etc/network/interfaces #iface eth0 inet static LAN_IP=1
reboot KVM-PXE-01
apk add dhcpd
rc-update dhcpd
vi /etc/dhcp/dhcpd.conf #filename "pxelinux.0"; #next-server ${LAN_IP};
apk add darkhttp
rc-update add darkhttpd
vi /etc/darkhttp ${LAN_IP}
rc-service darkhttpd start
apk add tftp-hla
rc-update add tftp
vi /etc/tftp ${LAN_IP}
rc-service tftp start
apk add nfs-utils
rc-update add nfs
vi /etc/exports /var/tftpboot/media ${LAN_IP_RANGE}
cp -pr /media/usb /var/tftpboot/media
rc-service nfs start
mkdir -p /var/tftpboot/alpine
cp /media/usb/boot/vmlinuz* /var/tftpboot/alpine/
cp /media/usb/boot/modloop* /var/tftpboot/alpine/
apk add mkinitfs
cd /etc/mkinitfs
vi features.d/network.modules
vi features.d/dhcp.files
vi features.d/dhcp.modules
vi features.d/nfs.modules
vi mkinitfs.conf # add network, dhcp, nfs and squashfs
mkinitfs -o /var/tftpboot/alpine/pxerd
apk del mkinitfs
apk add syslinux
cp /usr/share/syslinux/pxelinux.0 /var/tftpboot/
cp /usr/share/syslinux/ldlinux.c32 /var/tftpboot/
apk del syslinux
mkdir -p /var/tftpboot/pxelinux.cfg
vi /var/tftpboot/pxelinux.cfg/default
src=rsync://rsync.alpinelinux.org/alpine/
dest=/var/www/localhost/htdocs/alpine/
exclude="--exclude v2.[0-9] --exclude v3.0 --exclude edge-uclibc --exclude armhf --exclude x86/"
mkdir -p "$dest"
/usr/bin/rsync -prua \
  $exclude \
  --delete \
  --timeout=600 \
  --delay-updates \
  --delete-after \
  "$src" "$dest"
dest=/var/www/localhost/htdocs/apkovl/
mkdir -p "$dest"
apk add consul
rc-update add consul default
configure consul as server
rc-service consul start
apk add timeserver
rc-update add timeserver
rc-service timeserver start
apk add dnscache
vi /etc/dnscache.conf
rc-update add dnscache
rc-service dnscache start
apk add git
consul service add pxe
consul service add repo
consul service add dnscache
consul service add timeserver
# detect if running from key or from pxe
if $run_from_usb()
then # /media/usb
  mkdir -p /var/www/localhost/htdocs/apkovl
  cd /var/www/localhost/htdocs/apkovl
  git init
  # populate with default config for PXEboot client
  git add .
  git commit -m "apkovl:: initial commit"
else # /media/alpine
  ##
  consul join pxeserver
  rm -fr /var/www/localhost/htdocs/apkovl
  cd /var/www/localhost/htdocs/
  git clone apkovl
  cd /var/tftpboot/
  git clone pxelinux.cfg
fi
if consul leader = self
then
  rc-service dhcpd restart
else
  rc-service dhcpd stop
  dhclient lan
fi
check_consul_leader_is_dhcpd_server()
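The vi /var/tftpboot/pxelinux.cfg/default step above is meant to produce a file much like the one shown in the PXE_boot section; a sketch adapted to this bootstrap (every address and path here follows this page's conventions and is an assumption):
cat > /var/tftpboot/pxelinux.cfg/default <<'EOF'
PROMPT 0
TIMEOUT 3
default alpine

LABEL alpine
  LINUX alpine/vmlinuz-grsec
  INITRD alpine/pxerd
  APPEND ip=dhcp alpine_dev=nfs:192.168.1.1:/var/tftpboot/media modloop=/boot/grsec.modloop.squashfs quiet apkovl=http://192.168.1.1/apkovl/localhost.apkovl.tar.gz
EOF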
Bootstrap regular machines
setup-alpine --mode none
rc-service networking start
apk add openvswitch consul openssh lxc rsync screen git curl collectd collectd-network
rc-update add consul
rc-update add sshd
rc-update add collectd
rc-update add ovs-modules
rc-update add ovsdb-server
rc-update add ovs-vswitch
rc-service ovs-modules start
rc-service ovsdb-server start
rc-service ovs-vswitch start
ovs-vsctl add-br lan
ovs-vsctl add-br wan
ovs-vsctl add-br storage
ovs-vsctl add-br ipmi
ovs-vsctl add-br vpn
ovs-vsctl add-port lan eth0
vi /etc/network/interfaces #iface eth0 inet manual
                           #iface lan inet dhcp
rc-service networking restart
mkdir -p ~/.ssh
scp -r pxeserver:bootstrap/ssh/authorized_keys ~/.ssh/
chmod -R go-rwx ~/.ssh
lbu add ~/.ssh/authorized_keys
rc-service consul start
consul join pxeserver
consul service add ssh check
if /dev/sda
then # prepare SAN
  apk add qemu-system-x86_64 screen libusb
  modprobe kvm
  modprobe kvm-intel
  modprobe kvm-amd
  modprobe tun
  apk add mdadm
  rc-update add mdadm
  rc-update add mdadm-raid
  mdadm create raid1 /dev/md0 /dev/sda1 /dev/sdb1
  consul service add raid
  screen -m -d -S storage qemu -net -net -boot n -drive /dev/md0
  lbu package
  scp ~/${hostname}.apkovl.tar.gz pxeserver:/var/www/localhost/htdocs/apkovl/machine.apkovl.tar.gz
elif /dev/vda and SAN
  apk add lvm2 nbd gptfdisk netcat # later we hope for rbd (ceph) also
  pv create /dev/vda
  vg create /dev/vda storage
  consul service add storage
  # every SAN will have a copy of the needed files to start a new PXE server
  lv create -L 32g -n pxeserver storage
  # we copy it from the running consul leader (the active PXE server)
  screen -m -d -S REC-pxeserver "nc -l -p 12345 | dd BS=16M of=/dev/storage/pxeserver"
  ssh pxeserver screen -m -d -S SND-pxeserver "dd BS=16M if=/dev/vda | nc ${self} 12345"
  nbd-server publish storage/pxeserver ro
  lbu package
  scp ~/${hostname}.apkovl.tar.gz pxeserver:/var/www/localhost/htdocs/apkovl/san.apkovl.tar.gz
elif
  apk add xfsprogs btrfsprogs gptfdisk nfs-utils lxc
  consul service add lxc
  modprobe xfs nbd
  rc-update add nfs
  wget pxeserver:fichier_de_conf?name=${MAC}
  setup-alpine --mode data -f fichier_de_conf
  lbu package
  scp ~/${hostname}.apkovl.tar.gz pxeserver:/var/www/localhost/htdocs/apkovl/default.apkovl.tar.gz
fi
reboot
All we need now to boot other AL machines (be they physical or virtual) are some {MAC}.apkovl.tar.gz files served by darkhttpd. We badly need name resolution at this stage: a DNS and a resolver are needed. The DNS is to be updated dynamically by the dhcp server, with split-dns. The resolver knows the fixed IP address of the DNS and the default route, if known at this stage. Both may run in LXCs inside this KVM-infra (like the other previous services). The DNS will be djbdns and the resolver will be dnscache (both from the repo).
Then (for now) we need an image of a debian install with xnbd-server and lvm2 to build SANs.
Also, on bare-metal we need mdadm to assemble raid1 arrays.
A new SAN is therefore a MAC address (for a debian boot as SAN) and some BD as vda (raid1 from mdadm).
A new server is therefore a MAC address (for an AL boot), an apkovl file (named after the MAC) and some data NBDs from some SAN.
The apkovl will be downloaded at boot time with the PXE-provided address, before launching openvswitch! The IP address will then change because of the apparent MAC change when OVS becomes active.
We may use symlinks to MAC named config files to have a more human friendly view.
It is to be noted that after bootstrap, KVMs may move to other physical machines. As long as some KVM-infra is somehow connected to the LAN, everything stays alive! This precise image will be reproduced in every SAN built.
Deploy
After bootstrapping, we have a way to boot any AL KVM or bare-metal machine in about 10 seconds.
First we deploy KVM-SAN on bare-metal.
Next we deploy KVM-AL grouping (or not) some LXC (AL or debian).
Second we deploy low-level services: syslog-ng, fail2ban, openVPN, la_console, http-reverse-proxy (primary and secondary), http-proxy, smtp relay, secondary resolver, secondary dns, ldap (primary and secondary), NAS, mariaDB, backups, collectd, shinken, local AL repo, git
Third intermediary services: smtp in, smtp out, antivirus, antispam, smtp store, imap, pop3, http, php, sip, jabber
High level services: x2goserver, lamp, mail toaster, webdav, redmine, etc
For each of those services, we provide a template in the form of a {kvm-template}.apkovl.tar.gz.
After customisation, "lbu package" followed by sending the resulting apkovl.tar.gz to the central repository is all that is needed.
We follow a naming convention for MAC:
For bare metal, the first 3 bytes of the MAC are the manufacturer ID.
We symlink that to the baremetal.apkovl.tar.gz.
For KVM, we fix the MAC ourself.
The first 2 bytes (AA:BB) are fixed.
The third one (CC) is the level type of the KVM.
The fourth one (DD) is the specific type of the template.
The last 2 ones are incremental unique ID.
So we are able to define pxelinux.cfg/AA:BB:CC:DD symlinks to config files defining use of {kvm-template}.apkovl.tar.gz.
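For instance (the MAC prefix and template name are invented for illustration; PXELINUX looks configs up as 01- followed by the MAC with dashes):
cd /var/tftpboot/pxelinux.cfg
# a SAN-type KVM: CC encodes the level, DD the template, the last two bytes the unique ID
ln -s template-san 01-aa-bb-02-01-00-07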
As {kvm-template}.apkovl.tar.gz tend to be small, we can store a lot of those on the initial USB stick.
Depending on available space on the USB stick, we could offer {lxc-template}s that way from the USB stick to be downloaded from darkhttpd with wget to the right KVM. Or later on from any wanted NAS.
We add a couple of other OVS bridges (WAN, STORAGE) in every machine. Some are connected to NICs. Some are connected to VPNs. Netflow will be used in the future to manage the network (NaaS: network as a service). One of those OVS (WAN) allows connected machines to access the internet through a default route passing through a physical firewall. STORAGE is used for data replication between SANs and NASes.
We have the list of bare-metal machines.
Those may launch KVM in one command.
We have the list of SAN KVM.
Those may create and publish NBD in two commands.
Even on diskless machines those are present to offer an nbd-proxy in one command.
All those commands are grouped as one-liner scripts in some redundant NAS available from la_console.
Waiting for CEPH, we need a strategy for duplicating NBDs across SANs.
About NFS
NFS is now working with AL. Both as server and client with the nfs-utils package.
However, using NFS as a client in some LXC does not seem to work yet, as shown below
nfstest:~# mount -t nfs -o ro 192.168.1.149:/srv/boot/alpine /mnt
mount.nfs: Operation not permitted
mount: permission denied (are you root?)
nfstest:~# tail /var/log/messages
Apr  4 10:05:59 nfstest daemon.notice rpc.statd[431]: Version 1.3.1 starting
Apr  4 10:05:59 nfstest daemon.warn rpc.statd[431]: Flags: TI-RPC
Apr  4 10:05:59 nfstest daemon.warn rpc.statd[431]: Failed to read /var/lib/nfs/state: Address in use
Apr  4 10:05:59 nfstest daemon.notice rpc.statd[431]: Initializing NSM state
Apr  4 10:05:59 nfstest daemon.warn rpc.statd[431]: Failed to write NSM state number: Operation not permitted
Apr  4 10:05:59 nfstest daemon.warn rpc.statd[431]: Running as root.  chown /var/lib/nfs to choose different user
nfstest:~# ls -l /var/lib/nfs
total 12
-rw-r--r--    1 root     root             0 Nov 10 15:43 etab
-rw-r--r--    1 root     root             0 Nov 10 15:43 rmtab
drwx------    2 nobody   root          4096 Apr  4 10:05 sm
drwx------    2 nobody   root          4096 Apr  4 10:05 sm.bak
-rw-r--r--    1 root     root             4 Apr  4 10:05 state
-rw-r--r--    1 root     root             0 Nov 10 15:43 xtab
Message from ncopa: "dmesg should tell you that grsecurity tries to prevent you to do this.
grsecurity does not permit the syscall mount from within a chroot since that is a way to break out of a chroot. This affects lxc containers too.
I would recommend that you do the mouting from the lxc host in the container config with lxc.mount.entry or similar.
https://linuxcontainers.org/lxc/manpages/man5/lxc.container.conf.5.html#lbAR
If you still want disable mount protection in grsecurity then you can do that with: echo 0 > /proc/sys/kernel/grsecurity/chroot_deny_mount"
This is not working with
lxc.mount.entry=nfsserver:/srv/boot/alpine mnt nfs nosuid,intr 0 0
on the host machine with all nfs modules and helper software installed and loaded.
backend:~# lxc-start -n nfstest
lxc-start: conf.c: mount_entry: 2049 Invalid argument - failed to mount 'nfsserver:/srv/boot/alpine' on '/usr/lib/lxc/rootfs/mnt'
lxc-start: conf.c: lxc_setup: 4163 failed to setup the mount entries for 'nfstest'
lxc-start: start.c: do_start: 688 failed to setup the container
lxc-start: sync.c: __sync_wait: 51 invalid sequence number 1. expected 2
lxc-start: start.c: __lxc_start: 1080 failed to spawn 'nfstest'
Nor with
echo 0 > /proc/sys/kernel/grsecurity/chroot_deny_mount
on the host machine, with all nfs modules and helper software installed and loaded.
Finding a proper way to use NFS shares from AL LXCs is an important topic, in order to be able, for instance, to load-balance web servers sharing content uploaded by users.
Next step will be to have HA for the NFS server itself (with only AL machines).
About NBD
NBD is now in edge/testing thanks to clandmeter.
I cannot test it properly at the moment because all the machines are busy in prod and this package allows newstyle only. I'm waiting for my new lab machine...
We still miss xnbd for its proxy features allowing live migration.
Also, we are still looking for the right solution to back up an NBD as a whole (versus by its content) while in use. dd | nc is the way used nowadays.
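Spelled out, that dd | nc approach looks like the following (host names, LV name and port are placeholders following the SAN sketch above):
# on the machine that receives the backup
nc -l -p 12345 | bzip2 > pxeserver-lv.img.bz2
# on the SAN holding the logical volume
dd if=/dev/storage/pxeserver bs=16M | nc backuphost 12345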
New lab machine
Very soon, I will receive a brand new lab machine.
I plan to use lxc in qemu (KVM) in qemu (yes, twice!) to simulate a rack of servers running AL.
There will be 8 first level KVMs. A firewall, a router, storage nodes and compute nodes.
OpenVSwitch (OVS) will be used to simulate the networks (isp, internet, lan, storage, wan, ipmi).
The first level KVMs will receive block devices (BD) as logical volumes (LV) in LVM2 on top of a mdadm raid array composed of the physical hard disk drives.
They will assemble the received BDs with mdadm and pass the raw raid as a single BD to the second-level SAN KVMs. Those SANs will use LVM2 to publish LVs as NBDs on the OVS "lan".
Some second-level KVMs will mount NBDs to expose NFS shares.
Others will mount NBDs and NFS for real data access with containers (LXC) and expose services on the OVS "wan" or "lan".
The first second-level KVM to be launched will be a virtual laptop from a virtual USB stick. This particular machine will offer a PXEboot environment to the OVS "lan".
The storage and compute nodes will be launched with PXE on the OVS "lan" but will be able to run totally from RAM, with no strings attached to the boot devices (for instance the initial NFS share).
As soon as 1 SAN and 1 compute node are available, the PXEboot server will reproduce itself from the virtual laptop USB stick to the compute node, using the storage node to store the information about the setup, then live-migrate (keeping the status of running machines).
eth0 is always connected to the OVS "lan", except on the firewall (connected to OVS "internet" and "isp").
The router is connected to all OVS but "isp" and "storage".
The storage nodes are connected to OVS "storage".
The compute nodes are connected to OVS "wan".
The DHCP lease is offered with no time limit after absence check on OVS "lan".
As a matter of fact, the only difference between a first and a second level KVM is sda first and vda second.
All machines run a consul instance.
The PXEboot server is a fixed, known consul server guaranteed to be present (otherwise boot doesn't even exist!).
On the first N compute nodes launched, a consul server KVM will be started (configured to reach a quorum of N) to replace the standard consul client.
As the state of a running cluster is always kept in the PXEboot server, this capacity is present in all consul servers but active only on the actual consul leader.
We need to link or maintain the PXE configuration and bootstrap files (including the relevant apkovl) in the consul key/value datastore to benefit from its resilience.
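A possible sketch of that, using consul's HTTP key/value API (the key layout is an assumption):
# push the current PXE config into the key/value store
curl -X PUT --data-binary @/var/tftpboot/pxelinux.cfg/default \
     http://localhost:8500/v1/kv/pxe/pxelinux.cfg/default
# restore it on another consul server
curl -s http://localhost:8500/v1/kv/pxe/pxelinux.cfg/default?raw \
     > /var/tftpboot/pxelinux.cfg/default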
We need to hack lbu commit to push the resulting apkovl to all consul servers (as they are also stand-by copies of the consul leader).
Each consul election needs to enforce the consul leader as the active PXE server.
In the real rack, at this stage, we just switch machines on, connected to the right switches, after checking that they will boot through PXE on the first NIC (eth0).
In our simulator, we can manually start a KVM as a fake physical machine (sda) or have a script on the real physical lab machine driving the life cycle of those KVMs.
About consul
Nothing yet but big hopes ^^ I'm lurking on IRC about it ;)
Open questions
- What memory footprint is needed?
- What about dynamically adapting the quorum size?
- Are checks possible triggers?
consul watch -prefix type -name name /path/to/executable
consul event [options] -name name [payload]
- What is the best practice to store etc configurations?