User talk:Jch/Building a complete infrastructure with AL

This material is work-in-progress ...

Do not follow instructions here until this notice is removed.
(Last edited by Jch on 27 Apr 2015.)

Open Issues

  • ceph (for transparent redundant storage)
  • xnbd (for nbd-proxies and live-migration)
  • consul (for fleet orchestration)
  • envconsul (for centralized ENV variables)
  • consul-template (for automatic services configuration)
  • mount NFS share in LXC (for middleware load-balancing)
  • x2go server in full desktop mode (for product activation)

Solved Issues

  • PXE
  • bootstrap
  • SAN
  • NAS
  • diskless
  • raid
  • data
  • sys
  • manual introspection

State of the project

We already are able to

  1. network
    • lan
    • wan
    • storage
    • ipmi
    • vpn
  2. storage
    1. Network Block Device (NBD)
      • provision
      • destroy
  3. compute
    1. KVM
      • prepare (raid, PXE, SAN, data, sys, diskless)
      • start
      • stop
      • kill
      • status
      • destroy
    2. LXC
      • create
      • start
      • stop
      • kill
      • status
      • destroy
  4. orchestration
    • not in production yet, will be based on consul and envconsul
  5. monitoring
    • stats collection
    • health checks not in production yet, will be based on consul

Spinning up a new basic KVM takes about 10 seconds.

Building a complete infrastructure with AL

I'm doing it. It's for real! That's my daily job at present ^^

I'm building a full private cloud bootstrapped with only an AlpineLinux USB key for each physical machine. But the next ones will be able to boot from the network; not even USB keys will be needed. As a matter of fact, we used more than only one physical USB key because we didn't start from scratch but did a live migration from Debian to Alpine for most of the services and machines...

If there is some feedback, I may elaborate on config files and so on ;)

As I started from scratch and OpenVSwitch was not yet available in Alpine at that time, it took me a while to build everything.
But to reproduce it would be a piece of cake!
NFS was not available either.
And NBD had just entered testing.
Consul is the new addition to the big picture and it's amazing! We are waiting for it to be included in the distro (see Open Issues).

We use qemu-kvm for KVM. But I guess one may use whatever Virtual Machine technology one likes.
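
For illustration, a minimal sketch of such a KVM start, assuming an NBD export named vm1 on a SAN reachable as san.storage.lan and a tap interface plugged into a "lan" OpenVSwitch bridge (all names are made up; our real scripts wrap something like this):

ip tuntap add dev vm1.0 mode tap
ovs-vsctl add-port lan vm1.0
qemu-system-x86_64 -enable-kvm -m 512 -smp 1 \
  -drive file=nbd:san.storage.lan:10809:exportname=vm1,if=virtio \
  -netdev tap,id=net0,ifname=vm1.0,script=no,downscript=no \
  -device virtio-net-pci,netdev=net0 \
  -nographic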

This is the presentation of a use case, not a HOWTO. And it's still a work in progress...

Elements

Network

Firewall

We put a dedicated physical machine on each link between our LAN and other networks. It just runs iptables and some packet-accounting metrology.
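
A minimal sketch of what such a box runs, assuming eth0 faces the other network and eth1 faces the LAN (interface names and policy are illustrative):

iptables -P FORWARD DROP
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
# accounting only: rules without a target just count packets/bytes
iptables -A FORWARD -i eth0
iptables -A FORWARD -i eth1
iptables -nvL FORWARD      # read the counters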

Router

A physical machine connected to our LAN and other networks (through a firewall). A static routing table does the trick.
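
Something along these lines, with made-up addresses:

ip route add default via 10.0.0.254          # towards the firewall/WAN
ip route add 10.2.0.0/24 via 10.0.0.253      # towards another site over the VPN
ip route show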

Switches

All physical machines run OpenVSwitch, reproducing virtually all the physical switches we have, plus some virtual-only ones.
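
A sketch of how the switches are (re)created on a physical machine; the bridge and interface names are just examples:

ovs-vsctl add-br lan
ovs-vsctl add-br storage
ovs-vsctl add-br ipmi
ovs-vsctl add-port lan eth0        # attach the physical NIC of that segment
ovs-vsctl show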

VPN

All physical machines run OpenVPN clients, one per defined switch less the machine's physical interfaces. There is an OpenVPN server somewhere, running in a KVM connected to the needed switches.
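
For example, a tap client joining the "storage" switch could look like this (server name, port and certificate paths are placeholders):

# /etc/openvpn/storage.conf
client
dev tap-storage
dev-type tap
proto udp
remote vpn.example.com 1194
ca /etc/openvpn/ca.crt
cert /etc/openvpn/host.crt
key /etc/openvpn/host.key
# once the tunnel is up, plug it into the local switch:
#   ovs-vsctl add-port storage tap-storage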

Storage

SAN

On each physical machine, a couple of HDDs are assembled in RAID 1 with mdadm. This RAID array is passed as a parameter to a KVM, which in turn mounts it as a physical volume for LVM. The created LVs are published as NBDs with xnbd-server. For the time being, this KVM is running Debian 7.8 as xnbd is not in Alpine (yet?).
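
Roughly, for that first paragraph (device names, sizes and the exact xnbd-server flags are illustrative; check xnbd-server --help):

# on the physical machine:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
# inside the SAN KVM, which sees the array as a virtio disk:
pvcreate /dev/vda
vgcreate san /dev/vda
lvcreate -L 10G -n vm1 san
xnbd-server --target --lport 10809 /dev/san/vm1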

The SAN also connects to the CEPH cluster as a client and publishes the reached RBDs as NBDs with xnbd-server. For the time being, this KVM is running Debian 7.8 as neither xnbd nor RBD are in Alpine (yet?).

NAS

Some KVMs mount some NBDs as local drives and publish some directories as NFS shares.
We now have nfs and nbd in AL.
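
A sketch with made-up export and host names (nbd-client option order may vary with the nbd version):

nbd-client -N vm-data san.storage.lan /dev/nbd0
mkfs.ext4 /dev/nbd0                  # first time only
mount /dev/nbd0 /srv/share
echo "/srv/share 10.0.0.0/24(rw,no_subtree_check)" >> /etc/exports
rc-service nfs start
exportfs -ra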

CEPH

KVMs with physical HDDs as parameters are used for building the OSDs and MONs needed to operate a CEPH cluster. One KVM is the "console" to drive it from a single point of presence (useful but not "needed"). For the time being, those KVMs are running Debian 7.8 as CEPH and RBD are not in Alpine (yet?).

Low-level services

No service at all runs in the AL on bare metal. All of them run in some KVM connected to the needed switches by means of the OpenVSwitches.

consul

On every machine, consul from http://repos.mauras.ch/alpinelinux/x86_64/
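
Roughly (the addresses are examples; we actually template this):

# on the 3 (or 5) servers:
consul agent -server -bootstrap-expect 3 -data-dir /var/consul -bind 10.0.0.11
# on every other machine:
consul agent -data-dir /var/consul -bind 10.0.0.42 -join 10.0.0.11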

dnsmasq

On every machine, dnsmasq from repo, pointing to the local consul instance to resolve .consul names and to the regular resolvers otherwise.
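
The relevant part of the config is tiny (the LAN resolver address is an example):

# /etc/dnsmasq.conf
server=/consul/127.0.0.1#8600      # .consul names go to the local consul agent
server=10.0.0.53                   # everything else goes to the regular resolvers
no-resolv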

dhcp

dhcpd from repo.
We just have to configure it properly.

We still have to test whether dhcpd can run in an LXC instead of a KVM.
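
A minimal sketch of that config (all addresses are illustrative), pointing PXE clients to our TFTP server:

# /etc/dhcp/dhcpd.conf
subnet 10.0.0.0 netmask 255.255.255.0 {
  range 10.0.0.100 10.0.0.200;
  option routers 10.0.0.1;
  option domain-name-servers 10.0.0.53;
  next-server 10.0.0.2;            # the PXE/TFTP server
  filename "pxelinux.0";
}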

DNS

tinydns from repo with a split-DNS config as master DNS.

Two dnscache instances from repo as resolvers on the LAN.

Those KVMs do know a gateway to the Internet.
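
The tinydns data file is plain text; a few illustrative lines (names and addresses are made up):

# data -- "=" creates A+PTR, "+" creates A only
=pxe.lan.example.com:10.0.0.2:86400
=san1.storage.example.com:10.1.0.11:86400
+repo.lan.example.com:10.0.0.20:86400
# run tinydns-data in the directory holding this file to regenerate data.cdb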

PXEboot

Kernel and initrd files on a tftp server.
A copy of the USB content (or nothing) on an nfs server.
apkovl and modloop files on a darkhttpd server.
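
The pxelinux side boils down to something like this (kernel flavour, URLs and file names are ours to adapt, not prescriptions):

# pxelinux.cfg/default
DEFAULT alpine
LABEL alpine
  KERNEL vmlinuz-grsec
  INITRD initramfs-grsec
  APPEND ip=dhcp alpine_repo=http://repo.service.consul/alpine/edge/main modloop=http://pxe.lan/boot/modloop-grsec apkovl=http://pxe.lan/ovl/box1.apkovl.tar.gz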

Time server

The PXE server uses ntpd (or similar) from repo, acting as a client to the WAN and as a server to the LAN.
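
One option is the busybox ntpd that already ships with AL; a single line does it (the pool name is an example):

ntpd -l -p pool.ntp.org     # -p: upstream peer (client side), -l: listen for LAN clients (server side)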

syslog

With syslog-ng from repo, we receive the logs from all machines, be they physical or virtual.
It's the only place that needs logrotate from repo.
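
The receiving side is basically this (syslog-ng 3.x syntax; the log path is ours):

# /etc/syslog-ng/syslog-ng.conf (fragment)
source s_net { udp(ip(0.0.0.0) port(514)); };
destination d_hosts { file("/var/log/remote/${HOST}/messages" create_dirs(yes)); };
log { source(s_net); destination(d_hosts); };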

HTTP proxy/cache

The web proxy/cache squid, from repo, uses an NBD as cache. It has a link to the Internet to forward requests and one to the LAN.

Thanks to it, no machine connected to the LAN, be it physical or virtual, needs a published default gateway, and all machines are able to install/upgrade packages or to see the WWW as clients.

We used to point all AL boxes to this KVM with setup-proxy. As we now have a repo.service.consul on the LAN, we do not use setup-proxy anymore...
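
The interesting fragments of the config, assuming the NBD is mounted on /var/cache/squid and the LAN is 10.0.0.0/24 (both assumptions):

# /etc/squid/squid.conf (fragment)
http_port 3128
cache_dir ufs /var/cache/squid 20000 16 256
acl lan src 10.0.0.0/24
http_access allow lan
http_access deny all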

Monitoring

shinken from sources in some LXC with barely more than the python package installed.

smokeping from repo to monitor connectivity to remote locations.

Metrology

collectd with collectd-network from repo (one LXC as server, published as collectd.service.consul; all other machines, be they physical or virtual, as clients).
A couple of lines in the CGP config file are enough for now.
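
The network plugin config is a few lines on each side (25826 is the collectd default port; config paths are the usual ones):

# client side, /etc/collectd/collectd.conf (fragment)
LoadPlugin network
<Plugin network>
  Server "collectd.service.consul" "25826"
</Plugin>

# server side (the LXC behind collectd.service.consul)
LoadPlugin network
LoadPlugin rrdtool
<Plugin network>
  Listen "0.0.0.0" "25826"
</Plugin>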

Backups

With common tools: rsync, tar, netcat, bzip2, openssh, cron, dd.
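
Nothing fancy, just cron jobs along these lines (host and paths are examples):

# compressed snapshot pushed over ssh
tar -cjf - /srv | ssh backup.lan "cat > /backup/$(hostname)-$(date +%F).tar.bz2"
# or incremental sync
rsync -az --delete /srv/ backup.lan:/backup/$(hostname)/srv/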

LDAP

openldap with openldap-back-hdb, both from repo, published as ldap.service.consul.

Database

With mariaDB from repo, published as mariadb.service.consul.

High-level services

In an AL LXC whenever possible,
in a Debian LXC as second choice,
in a KVM otherwise.

x2goserver

I did package nx-libs and x2goserver. I'm waiting for the packages to be included in edge/testing. They are already being used for single-app access. The next step is full desktop, but we are not sure if AL is the right choice for full-desktop usage for our customers...

Unfortunately, x2goclient pops up "kex error : did not find one of algos diffie-hellman-group1-sha1 in list curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1 for kex algos"; we need to specify diffie-hellman-group1-sha1 in sshd_config. Luckily a fix exists and my business partner is looking for a way to enhance its security upstream.
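
The server-side workaround is to list the legacy algorithm explicitly in sshd_config (a trimmed example; keep whatever modern algorithms you need in front):

# /etc/ssh/sshd_config
KexAlgorithms curve25519-sha256@libssh.org,ecdh-sha2-nistp256,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
# then: rc-service sshd restart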

ejabberd

With ejabberd from the edge/testing repo. I migrated the mnesia DB from an old Debian Squeeze by just copying the files and changing their ownership in an LXC-AL. I just had to disable mod_pubsub to have it run properly. Authentication is done with openLDAP. I now plan to migrate a very, very old jabberd (11 years I guess) running on a Debian Etch to it, if I find a way to keep users' passwords and rosters... I also would like to use it as a gateway to IRC to follow the #alpine, #alpine-devel and #x2go channels ;) Some other ejabberd features are interesting to my organisation and we will experiment with them more in depth, namely mod_sip, mod_stun, mod_proxy65...

redmine

In a brand new LXC with edge/main and edge/testing repos,
mostly following the Redmine page.
I use a mariaDB server on another host where I created the user and pushed the SQL dump from a running Redmine 3.0.0 instance.

apk update
apk upgrade
reboot
setup-timezone
apk add redmine
apk add ruby-unicorn
# copy the sample unicorn config and adapt it, then point redmine at the database
cp /etc/unicorn/redmine.conf.rb.sample /etc/unicorn/redmine.conf.rb
vi /etc/conf.d/unicorn
vi /etc/redmine/database.yml
apk add sudo
apk add ruby-mysql2
apk add ruby-yard
apk add tzdata
# initialise the application as the redmine user
cd /usr/share/webapps/redmine
sudo -u redmine rake generate_secret_token
sudo -u redmine RAILS_ENV=production rake db:migrate
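
After that, assuming the ruby-unicorn package ships the OpenRC service suggested by /etc/conf.d/unicorn (an assumption on my side), it should just be:

rc-update add unicorn
rc-service unicorn start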

email

LDAP - openldap from repo

SMTP in - postfix-ldap from repo

Antispam - spamassassin from repo

Antivirus - clamav from repo

SMTP store - postfix-ldap from repo

mail NAS - nfs-utils from repo

IMAP - dovecot-ldap from repo

SMTP relay - emailrelay from source

SMTP out - postfix from repo

Webmail - squirrelmail from source

webhosting

Front-end - nginx

Back-end static - darkhttpd

Back-end dynamic - php-fpm

File server - nfs, sftp (based on ssh-ldap)

Architecture

After bootstrap, a running cluster (or cloud) will have 3 (or 5) KVM-PXE acting as consul servers.
The consul leader is the active PXE server; the others are hot spares kept in sync with the active one.
Those KVM-PXE are booted from PXE, running from RAM with no tie left to the initial machine.
Consul is in charge of maintaining an active PXE server, which is just a hook into the leader-election process.

If a machine detects it has physical drives, it assembles a RAID array and starts a KVM-SAN based on that array.
If a machine detects it has a virtual drive and is a SAN, it prepares an LVM2 volume and hooks the creation of LVs to their export as NBD publications.
If a machine detects it has a virtual drive and is not a SAN, it mounts the drive as /var (Alpine in data mode).
If a machine doesn't detect any drive, it starts diskless (Alpine in run-from-RAM mode).
At first run, each machine registers itself in both consul and the PXE server.
Then it reboots. At this stage the machine, be it physical or virtual, is uniquely identified both in the catalogue and in the PXE boot process.
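
The consul half of that registration is a single call to the local agent's HTTP API (the service name and port here are examples):

curl -s -X PUT -d '{"Name": "san", "Port": 10809}' \
  http://127.0.0.1:8500/v1/agent/service/register
# the machine is then resolvable as san.service.consul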

Among the first services built atop the now-available infrastructure are KVM-NAS, to share file systems and not only block devices.

Consul provides the framework for gathering the data needed to decide on the elasticity of the services exposed by the cluster.
HAProxy is ubiquitous to provide that elasticity for a service,
with Consul enabling the dynamic reconfiguration of the HAProxies.
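
A sketch of that dynamic reconfiguration with consul-template (the service name "web" and the paths are examples):

# haproxy.cfg.ctmpl (fragment)
backend web
    balance roundrobin{{range service "web"}}
    server {{.Node}} {{.Address}}:{{.Port}} check{{end}}

# rendered and reloaded by:
# consul-template -template "haproxy.cfg.ctmpl:/etc/haproxy/haproxy.cfg:rc-service haproxy reload"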