User talk:Jch/New lab machine

From Alpine Linux

New lab machine

Very soon, I will receive a brand new lab machine.

I plan to use LXC in qemu (KVM) in qemu (yes, nested twice!) to simulate a rack of servers running AL.

There will be 8 first-level KVMs: a firewall, a router, storage nodes and compute nodes.

Open vSwitch (OVS) will be used to simulate the networks ("isp", "internet", "lan", "storage", "wan", "ipmi").
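A minimal sketch of creating those simulated networks as OVS bridges (bridge names are taken from the list above; the port name is an assumption):

```shell
# Create one OVS bridge per simulated network;
# --may-exist makes the script safe to re-run
for br in isp internet lan storage wan ipmi; do
    ovs-vsctl --may-exist add-br "$br"
done

# Attach a KVM tap interface to a bridge,
# e.g. the firewall's second NIC on "internet"
ovs-vsctl --may-exist add-port internet fw-eth1
```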

The first-level KVMs will receive block devices (BD) as logical volumes (LV) in LVM2, on top of an mdadm RAID array built from the physical hard disk drives.
They will assemble the received BDs with mdadm and pass the raw RAID as a single BD to the second-level SAN KVMs. Those SANs will use LVM2 to publish LVs as NBD exports on the OVS "lan".
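A hedged sketch of that storage stack; device names, RAID level, sizes and the export name are illustrative assumptions, not the actual layout:

```shell
# On the physical host: RAID array, then LVM on top of it
mdadm --create /dev/md0 --level=5 --raid-devices=4 \
    /dev/sda /dev/sdb /dev/sdc /dev/sdd
pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -L 100G -n kvm1 vg0    # one LV per first-level KVM block device

# Inside a second-level SAN KVM: publish an LV as an NBD export
cat > /etc/nbd-server/config <<'EOF'
[generic]
[datalv]
    exportname = /dev/vg0/datalv
EOF
nbd-server
```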
Some second-level KVMs will mount NBDs to expose NFS shares.
Others will mount NBDs and NFS shares for real data access by containers (LXC) and expose services on the OVS "wan" or "lan".
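For the NFS gateway KVMs, something along these lines (host name, export name and paths are assumptions):

```shell
# Attach the NBD export published by a SAN on the "lan"
modprobe nbd
nbd-client -N datalv san1.lan /dev/nbd0

# Mount it and re-export it over NFS
mount /dev/nbd0 /srv/export
echo '/srv/export *(rw,no_root_squash)' >> /etc/exports
exportfs -ra
```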

The first second-level KVM to be launched will be a virtual laptop booting from a virtual USB stick. This particular machine will offer a PXE boot environment on the OVS "lan".
The storage and compute nodes will be launched with PXE on the OVS "lan" but will be able to run entirely from RAM, with no strings attached to the boot devices (for instance the initial NFS share).
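A possible dnsmasq fragment for the PXE boot environment on the virtual laptop (interface, addresses and paths are illustrative):

```
# /etc/dnsmasq.d/pxe.conf on the PXE boot server
interface=eth0
dhcp-range=10.0.0.100,10.0.0.200,255.255.255.0
dhcp-boot=pxelinux.0
enable-tftp
tftp-root=/srv/tftp
```

To run entirely from RAM, the pxelinux `APPEND` line for the Alpine kernel would point `alpine_repo` and `apkovl` at HTTP URLs served by the PXE boot server, so nothing stays mounted from the boot device after startup.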

As soon as 1 SAN and 1 compute node are available, the PXEboot server will reproduce itself from the virtual laptop's USB stick onto the compute node, using the storage node to store the information about the setup; it will then live-migrate (keeping the status of running machines).
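The live migration between two QEMU instances could be sketched as follows (monitor socket path, destination host and port are assumptions):

```shell
# On the destination compute node: start an identical KVM
# that waits for the incoming machine state
qemu-system-x86_64 -m 1024 -enable-kvm -incoming tcp:0:4444 &

# On the source: order the running PXE server KVM to migrate,
# via its QEMU monitor socket
echo 'migrate -d tcp:compute1.lan:4444' | \
    socat - unix-connect:/run/qemu/pxeboot.monitor
```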

eth0 is always connected to the OVS "lan", except on the firewall (which is connected to the OVS "internet" and "isp").
The router is connected to all OVSes except "isp" and "storage".
The storage nodes are also connected to the OVS "storage".
The compute nodes are also connected to the OVS "wan".

The DHCP lease is offered with no time limit, after a check that the address is not already in use on the OVS "lan".
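With dnsmasq, for instance, the unlimited lease can be expressed as below; dnsmasq also pings a candidate address before offering it (unless `no-ping` is set), which matches the absence check. The address range is illustrative:

```
dhcp-range=10.0.0.100,10.0.0.200,infinite
```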

As a matter of fact, the only difference between a first-level and a second-level KVM is the disk device: sda for the first, vda for the second.

All machines run a consul instance.
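A sketch of that consul layout, assuming the PXEboot server sits at 10.0.0.2 (addresses, node names and paths are made up):

```shell
# On the PXEboot server: the fixed, always-present consul server
consul agent -server -bootstrap-expect=1 \
    -node=pxeboot -bind=10.0.0.2 -data-dir=/var/lib/consul &

# On every other machine: a plain consul client joining it
consul agent -node="$(hostname)" -retry-join=10.0.0.2 \
    -data-dir=/var/lib/consul &
```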
The PXEboot server is a fixed, well-known consul server guaranteed to be present (otherwise boot does not even happen!).
On the first N compute nodes launched, a consul server KVM will be started (configured to reach a quorum of N) to replace the standard consul client.
As the state of a running cluster is always kept on the PXEboot server, this capability is present in all consul servers but active only on the actual consul leader.
We need to link or maintain the PXE configuration and bootstrap files (including the relevant apkovl) in the consul key/value datastore to benefit from its resilience.
We need to hook lbu commit to push the resulting apkovl to all consul servers (as they are also stand-by copies of the consul leader).
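One way to do this is a hypothetical wrapper around lbu commit that pushes the fresh apkovl into the consul key/value store, from which every consul server can retrieve it (the KV key and the media path are assumptions):

```shell
#!/bin/sh
# lbu-commit-push: commit, then store the apkovl in the consul KV store
set -e
lbu commit
HOST="$(hostname)"
APKOVL="/media/usb/${HOST}.apkovl.tar.gz"       # assumed lbu media path
# '@file' tells consul to read the value from that file
consul kv put "pxe/apkovl/${HOST}" @"${APKOVL}"
```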
Each consul election needs to enforce the consul leader as the active PXE server.
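A sketch of enforcing that, run periodically (or from a consul watch) on every consul server: it compares the local node with the current leader and toggles the PXE services accordingly (the service name and the use of dnsmasq are assumptions):

```shell
#!/bin/sh
# Activate the PXE services only on the current consul leader
# /v1/status/leader returns the leader as "ip:8300"
LEADER="$(curl -s http://127.0.0.1:8500/v1/status/leader | tr -d '"')"
SELF="$(hostname -i | awk '{print $1}'):8300"
if [ "$LEADER" = "$SELF" ]; then
    rc-service dnsmasq start    # become the active PXE server
else
    rc-service dnsmasq stop     # remain a stand-by copy
fi
```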

In the real rack, at this stage, we just switch machines on, connected to the right switches, after checking that each will boot through PXE on its first NIC (eth0).
In our simulator, we can manually start a KVM as a fake physical machine (sda) or have a script on the real physical lab machine drive the life cycle of those KVMs.