User:Jch: Difference between revisions
Latest revision as of 07:08, 12 April 2015
Hi,
I'm a sysadmin in a small Linux shop.
I'm in the process of moving my machines to a new architecture. I will try to describe it here, as it is now based mostly on AlpineLinux ;)
I sadly use the word "mostly" as quite a few needed pieces are missing or broken for my use case.
A big thanks to the great folks at AL who made this possible!
The machines
We currently run about 10 boxes in a data center.
A firewall, a router, 3 storage boxes (with 8 hdd), 5 service boxes (2 of them will not be discussed here).
The router is connected to the firewall and to 3 different switches: LAN, WAN and IPMI. There is a fourth switch called STORAGE.
All machines (but the firewall), be they physical or virtual, are connected to LAN.
Storage boxes are connected to STORAGE.
Most physical boxes (but the old ones) are connected to IPMI.
Some machines are connected to WAN (if services are exposed to the Internet).
The fundamentals
The physical machines are running AlpineLinux from USB keys with openssh, mdadm, screen, Open vSwitch (OVS), OpenVPN and qemu. Nothing more.
OVS replicates the physical switches (obviously except for IPMI). When no physical NIC is available to link to the physical network, we use OpenVPN to connect the OVS to another connected box.
Some HDDs are paired into raid1 arrays with mdadm (2 such arrays for storage boxes, 1 for services boxes).
The remaining hdd on storage boxes will be discussed later on.
All machines, be they physical or virtual, are started by means of PXE boot.
Even PXEboot boxes ;)
(After an initial bootstrap of the infrastructure)
We have PXE boxes and other boxes. 
Only three different apkovl initial files are needed: the PXEboot one, the default one and the LXC-repo one.
We will use consul as soon as the package is available in edge/testing.
We will run as many dormant PXEboot servers as consul servers. 
The consul leader will also be the active PXEboot server.
All other machines will be consul agents.
When a machine is first bootstrapped, it registers itself implicitly with the dhcpd server and explicitly with the consul cluster. It finishes the bootstrap procedure by registering itself with the apkovl HTTP server. Later reboots will use the specific IP address, the registered apkovl file and the key/value pairs (for configuration of services) available in the consul k/v data store.
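To make that last step concrete, here is a minimal Python sketch of how a rebooting machine could read its settings back from the consul k/v store. It relies on consul's HTTP API (a recursive read of /v1/kv/<prefix>, which returns base64-encoded values). The key layout machines/<hwaddr>/... and the helper names are hypothetical, only there for illustration; they are not what is actually deployed.

```python
import base64
import json
import urllib.request

CONSUL_URL = "http://127.0.0.1:8500"  # assumption: local consul agent, default HTTP port


def kv_prefix(hwaddr):
    """Hypothetical key layout: one subtree per machine, keyed by eth0 hwaddr."""
    return "machines/" + hwaddr.lower() + "/"


def decode_kv(body, prefix):
    """Turn the JSON body of a recursive KV read into a plain dict."""
    settings = {}
    for entry in json.loads(body):
        value = entry.get("Value")
        # consul returns values base64-encoded, or null for an empty key
        text = base64.b64decode(value).decode("utf-8") if value else ""
        settings[entry["Key"][len(prefix):]] = text
    return settings


def fetch_settings(hwaddr, consul_url=CONSUL_URL):
    """Read every key under this machine's prefix (raises HTTPError if none exists)."""
    prefix = kv_prefix(hwaddr)
    url = consul_url + "/v1/kv/" + prefix + "?recurse"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return decode_kv(resp.read().decode("utf-8"), prefix)
```

A start script could call fetch_settings() with the eth0 hwaddr and write the resulting values into the configuration files of its services.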
The PXEservers are booted using a specially crafted hwaddr for eth0 in order to use the right apkovl. All the services configuration is done "manually" in some /etc/local.d/scripts.start so as to have the eth0 IP address available. DHCPd is not started by default. It will only be activated on an explicit consul leader election event. A local repo is also passed to have all needed package files at hand.
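The "only the leader runs DHCPd" rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual start script: it polls consul's /v1/status/leader endpoint (which returns the leader as an "ip:port" string) rather than reacting to an election event, it assumes IPv4 addresses, and the function names are made up.

```python
import json
import urllib.request

CONSUL_URL = "http://127.0.0.1:8500"  # assumption: local consul agent, default HTTP port


def fetch_leader(consul_url=CONSUL_URL):
    """Return the current leader as 'ip:port' (empty string when there is none)."""
    with urllib.request.urlopen(consul_url + "/v1/status/leader", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def should_run_dhcpd(leader_addr, my_ip):
    """True only when the leader reported by consul is this very machine."""
    if not leader_addr:
        return False  # no leader elected yet: stay dormant
    host = leader_addr.rsplit(":", 1)[0]  # strip the port (IPv4 assumed)
    return host == my_ip
```

A dormant PXEboot server would then start dhcpd when should_run_dhcpd(fetch_leader(), <its eth0 IP>) becomes true and stop it again when it becomes false.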
Our cluster will have a bootstrap redundancy level equal to the consul redundancy level. As for our CEPH cluster, we will use N=3 as the wanted redundancy level.
The storage
"Regular" SAN
If a machine is booted with 2 /dev/sd?, those are assembled as a raid1 array and the resulting /dev/md0 is passed as /dev/vda to some KVM-SAN.
Inside KVM-SAN, we run LVM2 and nbd-server to publish the LVs. 
The first LV is the repo to install other machines.
The SAN service is registered in consul, as are all published LVs.
Usually, NBDs are to be mounted as /var in a specific KVM. The KVM uid (hwaddr of eth0) will be used to identify related components (aka NBDs).
It will be possible to ask consul for the SAN hosting a specific KVM's NBD.
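As a rough idea of what such a lookup could look like, here is a small Python sketch that filters the answer of consul's catalog API (GET /v1/catalog/service/<name>) for the entries tagged with a given KVM uid. The service name "nbd" and the convention of tagging each published LV with the hwaddr of the KVM it belongs to are assumptions made for this example only.

```python
import json


def san_endpoints(catalog_json, kvm_uid):
    """Return (host, port) pairs of the SAN services tagged with kvm_uid.

    catalog_json is the JSON body returned by /v1/catalog/service/nbd,
    where "nbd" is a hypothetical service name.
    """
    endpoints = []
    for entry in json.loads(catalog_json):
        tags = entry.get("ServiceTags") or []  # consul may return null here
        if kvm_uid in tags:
            # ServiceAddress is empty when the service uses the node address
            host = entry.get("ServiceAddress") or entry.get("Address")
            endpoints.append((host, entry.get("ServicePort")))
    return endpoints
```

The resulting host and port are what a KVM would hand to its NBD client to attach its /var.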
NAS
A KVM mounts some LV NBDs locally and exports their contents over NFS.
CEPH cluster
This is still in alpha stage.
With remaining hdd on the storage boxes, we are in the process of deploying a CEPH cluster.
For this purpose we start the CEPH components (MON, OSD) with "qemu -file nbd:KVM-storage:port -file /dev/sdX -file /dev/sdY".
Those are also Debian based as RBD is not available in AL.
The cluster is used as a backend with the KVM-storage acting as proxies between RBD and NBD. 
We plan to replace our SANs with CEPH only later on, when AL integrates RBD into the stable versions of the kernel and qemu.
The services
The PXEserver offers DHCPd to bootstrap all the installs.
The different KVMs described below run on the services boxes; which one is not relevant, as they are meant to move around whenever the need arises: for maintenance, for load balancing, whatever...
Those KVMs are running AL.
First, there are several KVM-infra (for redundancy) running LXC for DNS resolvers/caches, squid, web front-end, openLDAP, ...
Second, there are several KVM-services running LXC for ... services ;) Like lighttpd, php-fpm, postfix, dovecot, mariaDB, etc.
Those may share the data they serve through the NASs (for instance to have redundant web servers).
Most of the LXC are AL based but some are Debian based where we were not able to use AL.
For instance, our LXC-sFTP servers are Debian based to be able to properly use pam-ldap with sshd and with pam_mkhomedir; our internal LXC-redmine is also Debian based, and so on...