User:Jch: Difference between revisions
Revision as of 06:53, 12 April 2015
Hi,
I'm a sysadmin in a small Linux shop.
I'm in the process of moving my machines to a new architecture. I will try to describe it here as it is now, based mostly on AlpineLinux ;)
I sadly use the word "mostly" as quite a few needed pieces are missing or broken for my use case.
A big thanks to the great folks at AL who made this possible!
The machines
We currently run about 10 boxes in a data center.
A firewall, a router, 3 storage boxes (with 8 hdd), 5 service boxes (2 of them will not be discussed here).
The router is connected to the firewall and to 3 different switches: LAN, WAN and IPMI. There is a fourth switch called STORAGE.
All machines (except the firewall), be they physical or virtual, are connected to LAN.
Storage boxes are connected to STORAGE.
Most physical boxes (except the old ones) are connected to IPMI.
Some machines are connected to WAN (if services are exposed to the Internet).
The fundamentals
The physical machines are running AlpineLinux from usb keys with openssh, mdadm, screen, OpenVSwitch, openVPN and qemu. Nothing more.
OVS replicates the physical switches (obviously except for IPMI). When no physical NIC is available to link to the physical network we use openVPN to connect the OVS to another connected box.
Some hdd are paired into raid1 arrays (mdadm) (2 such arrays for storage boxes, 1 for services boxes).
The remaining hdd on storage boxes will be discussed later on.
All machines, be they physical or virtual, are started by means of PXE boot.
Even PXEboot boxes ;)
(After an initial bootstrap of the infrastructure)
We have PXE boxes and other boxes. 
Only three different apkovl initial files are needed: the PXEboot one, the default one and the LXC-repo one.
We will use consul as soon as the package is available in edge/testing.
We will run as many dormant PXEboot servers as consul servers. 
The consul leader will also be the active PXEboot server.
All other machines will be consul agents.
When a machine is first bootstrapped, it registers itself implicitly with the dhcpd server and explicitly with the consul cluster. It finishes the bootstrap procedure by registering itself with the apkovl http server. Later reboots will use the specific IP address, the registered apkovl file and the key/value pairs (for configuration of services) available in the consul k/v data store.
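The difference between the first bootstrap and later reboots can be sketched as a small lookup. This is only an illustration under stated assumptions: the `BootRegistry` class, the file names, the hwaddr and the IP address are all hypothetical, and an in-memory dict stands in for what dhcpd, consul and the apkovl http server would remember.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical file name; the page only says that a "default" apkovl exists.
DEFAULT_APKOVL = "default.apkovl.tar.gz"


@dataclass
class BootEntry:
    ip: str      # address used for this boot ("dynamic" if none is registered)
    apkovl: str  # apkovl file served for this boot


@dataclass
class BootRegistry:
    """In-memory stand-in for dhcpd, consul and the apkovl http server."""
    entries: Dict[str, BootEntry] = field(default_factory=dict)

    def register(self, hwaddr: str, ip: str, apkovl: str) -> None:
        # Called once, at the end of the first bootstrap.
        self.entries[hwaddr.lower()] = BootEntry(ip=ip, apkovl=apkovl)

    def boot_parameters(self, hwaddr: str) -> BootEntry:
        # Known machine: reuse its registered address and apkovl.
        # Unknown machine: no fixed address yet, boot the default apkovl.
        known = self.entries.get(hwaddr.lower())
        if known is not None:
            return known
        return BootEntry(ip="dynamic", apkovl=DEFAULT_APKOVL)


registry = BootRegistry()
first = registry.boot_parameters("52:54:00:AA:BB:01")   # first bootstrap
registry.register("52:54:00:AA:BB:01", "10.0.0.21", "web-1.apkovl.tar.gz")
later = registry.boot_parameters("52:54:00:aa:bb:01")   # any later reboot
```

In the planned setup the three pieces of state would live in three different places; the single dict here only shows which information has to survive between boots.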
The PXEservers are booted using a specially crafted hwaddr for eth0 in order to use the right apkovl. All the services configuration is done "manually" in some /etc/local.d/scripts.start so as to have the eth0 IP address available. DHCPd is not started by default. It will only be activated on an explicit consul leader election event. A local repo is also passed to have all needed package files at hand.
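The rule "DHCPd runs only on the consul leader" can be sketched as a pure decision function. This is a sketch, not the actual mechanism: it assumes the leader is read from consul's `/v1/status/leader` endpoint, which returns an "ip:port" string (empty when no leader is elected). The function names and addresses are hypothetical, and stopping dhcpd when leadership is lost is my assumption.

```python
def leader_ip(status_leader: str) -> str:
    """Extract the IP from an "ip:port" string; "" means no leader."""
    host, _sep, _port = status_leader.rpartition(":")
    return host


def dhcpd_action(status_leader: str, local_ip: str, dhcpd_running: bool) -> str:
    """Return "start", "stop" or "noop" for the local dhcpd service."""
    is_leader = local_ip != "" and leader_ip(status_leader) == local_ip
    if is_leader and not dhcpd_running:
        return "start"   # this box just became the active PXEboot server
    if not is_leader and dhcpd_running:
        return "stop"    # leadership lost: go back to dormant
    return "noop"


# Hypothetical addresses, for illustration only.
became_leader = dhcpd_action("10.0.0.11:8300", "10.0.0.11", dhcpd_running=False)
lost_leadership = dhcpd_action("10.0.0.12:8300", "10.0.0.11", dhcpd_running=True)
no_leader_yet = dhcpd_action("", "10.0.0.11", dhcpd_running=False)
```

Keeping the decision separate from the command that actually starts or stops the service makes it easy to call from whatever hook ends up reacting to the election event.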
Our cluster will have a bootstrap redundancy level equal to the consul redundancy level. As for our CEPH cluster, we will use N=3 as the wanted redundancy level.
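What N=3 buys on the consul side can be worked out with a little arithmetic. This is a minimal sketch, assuming that the redundancy level N means N consul servers (and therefore N dormant PXEboot servers). Consul servers need a majority (quorum) to elect a leader, so the number of tolerated failures follows from N. The function names are mine.

```python
def quorum(servers: int) -> int:
    """Smallest majority of a cluster with the given number of servers."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can be lost while a leader can still be elected."""
    return servers - quorum(servers)


n = 3
needed = quorum(n)                 # 2 servers must stay reachable
spare = tolerated_failures(n)      # 1 server may fail
```

Under that assumption, with N=3 one consul server (and its PXEboot server) can be down and the rest can still elect a leader to serve PXE boots. A fourth server would raise the quorum to 3 without tolerating any more failures.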
The storage
"Regular" SAN
On the usb key, we have a qcow2 image file that we start with "qemu -file storage-0.img -file /dev/md0" (I skip details as they are not meaningful here).
Inside KVM-storage-0, we run LVM2 and xnbd-server to publish the LVs. For now, this KVM is debian based as xnbd is not available in AL.
The first LV is the /home of the VM (to host ISOs to install other machines). The second one is storage-1 on storage boxes (also debian).
We then start a similar KVM-storage-1 on the physical box with "qemu -file nbd:KVM-storage-0:port -file /dev/md1".
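The chaining of the two storage KVMs can be sketched by building the qemu command lines. This is an assumption-laden illustration: the "-file" in the commands above is shorthand, and I assume here that it maps to qemu's `-drive file=...,format=...` option, with `qemu-system-x86_64 -enable-kvm` as the binary and accelerator. The helper name is hypothetical, the raw format for the NBD device is a guess, and the NBD port stays a placeholder, as in the text.

```python
from typing import List, Tuple


def storage_kvm_argv(drives: List[Tuple[str, str]]) -> List[str]:
    """Build a qemu argv with one -drive per (path, format) pair."""
    argv = ["qemu-system-x86_64", "-enable-kvm"]
    for path, fmt in drives:
        argv += ["-drive", f"file={path},format={fmt}"]
    return argv


# KVM-storage-0: system image from the usb key, plus the first raid1 array.
storage_0 = storage_kvm_argv([
    ("storage-0.img", "qcow2"),
    ("/dev/md0", "raw"),
])

# KVM-storage-1: system disk is an LV published over NBD by KVM-storage-0,
# plus the second raid1 array. "port" is left as a placeholder.
storage_1 = storage_kvm_argv([
    ("nbd:KVM-storage-0:port", "raw"),
    ("/dev/md1", "raw"),
])
```

The point of the sketch is the dependency: KVM-storage-1 can only start once KVM-storage-0 is up and publishing its LVs.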
NAS
Inside the KVM-storage, we mount locally some LVs and export their contents with NFS.
With both NBD and NFS now in AL (testing), we are able to do this in an AL KVM.
CEPH cluster
This is still in alpha stage.
With remaining hdd on the storage boxes, we are in the process of deploying a CEPH cluster.
For this purpose we start the CEPH components (MON, OSD) with "qemu -file nbd:KVM-storage:port -file /dev/sdX -file /dev/sdY".
Those are also debian based as RBD is not available in AL.
The cluster is used as a backend with the KVM-storage acting as proxies between RBD and NBD. 
We plan to replace our SANs with CEPH only later on, once AL integrates RBD into the stable versions of the kernel and of qemu.
The services
The router offers DHCP to bootstrap all the installs.
The different KVMs described below run on the services boxes; which one is not relevant, as they are meant to move around whenever the need arises for maintenance, for load balancing, whatever...
Those KVMs are AL, running from a small NBD for the system and a large one for data.
First, there are several KVM-infra (for redundancy) running LXC for dns resolver/caches, squid, web front-end, openLDAP, ...
Second, there are several KVM-services running LXC for ... services ;) Like lighttpd, php-fpm, postfix, dovecot, mysql, etc. Those may share the data they serve through the use of the NASs (for instance to have redundant web servers).
Most of the LXC are AL based but some are debian based when we were not able to use AL. For instance, our LXC-sFTP servers are debian based to be able to properly use pam-ldap with sshd and with pam_mkhomedir, our internal LXC-redmine is also debian based, and so on...
to be continued...