High Availability High Performance Web Cache

From Alpine Linux

Introduction

This document explains how to use HAProxy and ucarp to provide high performance and high-availability services. This document has been tested using Alpine Linux 2.2.3.

In this document we will use the Squid web cache as the example service. Squid typically uses only a single processor, even on a multi-processor machine. To get increased web-caching performance, it is better to scale the web cache out across multiple (cheap) physical boxes. Although web caching is used as the example service, this document applies to other services, such as mail, web acceleration, etc.

Network Diagram

In the end, we will have an architecture that looks like this:

[Figure: Squid-HAProxy-Ucarp.png]

The workstations all connect to the HAProxy instance at 192.168.1.10. This address is a virtual IP controlled by ucarp: HAProxy runs on only one of the web cache servers at any given time, but any of them can take over as the HAProxy instance.

HAProxy distributes the web traffic across all live web cache servers, which cache the resources from the Internet.

Benefits

Initial Services

The first step toward high availability is to have more than one server. Do the following on each of cache1-4:

apk add squid

Then give Squid a minimal configuration such as the following:

acl all src all
acl localhost src 127.0.0.1/32
acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC1918 possible internal network 

icp_port 3130
icp_access allow all

cache_peer 192.168.1.11 sibling 3128 3130
cache_peer 192.168.1.12 sibling 3128 3130
cache_peer 192.168.1.13 sibling 3128 3130
cache_peer 192.168.1.14 sibling 3128 3130

http_access allow localnet
http_access allow localhost
http_access deny all

http_port 3128

forwarded_for off

Warning: This is a minimal configuration for demonstration purposes only. You will likely need to configure more restrictive ACLs.

The icp and cache_peer entries allow all four Squid servers to share cached information. Once one server has retrieved an item, the others can fetch it from that server over the local network instead of retrieving it from the Internet themselves.
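
The cache_peer lines above list all four servers, so each server also names itself. As a rough sketch, assuming you would rather have each server list only the other three as siblings, a small helper can print the lines for a given server. The function name peer_lines is hypothetical; the addresses and ports are the ones used in this document.

```shell
#!/bin/sh
# Hypothetical helper: print cache_peer lines for every cache except this one.
# Ports 3128 (HTTP) and 3130 (ICP) are the ones used in this document.
CACHES="192.168.1.11 192.168.1.12 192.168.1.13 192.168.1.14"

peer_lines() {
    self=$1
    for ip in $CACHES; do
        # Skip our own address
        [ "$ip" = "$self" ] && continue
        echo "cache_peer $ip sibling 3128 3130"
    done
}
```

For example, running peer_lines 192.168.1.11 on cache1 prints sibling entries for 192.168.1.12, 192.168.1.13 and 192.168.1.14 only.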

rc-update add squid
/etc/init.d/squid start


At this point, you should be able to set your browser to use any of 192.168.1.1[1-4]:3128 as a proxy address and reach the Internet. Because this config file does not use any optimizations, browsing will be slower than normal. This is to be expected. Any optimization you make to the squid configuration on one server can be applied to all servers in the array. The purpose of this example is to show that the service is uniform across the array.
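
Instead of pointing a browser at each cache in turn, the check can be scripted. The following is a minimal sketch, assuming curl is installed (for example with apk add curl) and that the caches listen on port 3128 as set by http_port above. TEST_URL and the function names are placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch: check that each cache answers proxy requests on port 3128.
# TEST_URL is an assumed placeholder; use any site you can reach.
CACHES="192.168.1.11 192.168.1.12 192.168.1.13 192.168.1.14"
TEST_URL=${TEST_URL:-http://example.com/}

# Print the proxy address (host:port) of every cache, one per line.
proxy_addrs() {
    for ip in $CACHES; do
        echo "$ip:3128"
    done
}

# Fetch TEST_URL through one proxy. Succeeds only if curl gets a
# response with a non-error HTTP status within 10 seconds.
check_proxy() {
    curl -s -f -o /dev/null --max-time 10 -x "http://$1" "$TEST_URL"
}

# Report the state of every cache.
check_all() {
    for addr in $(proxy_addrs); do
        if check_proxy "$addr"; then
            echo "$addr ok"
        else
            echo "$addr FAILED"
        fi
    done
}
```

Calling check_all from a workstation on the 192.168.1.0 network prints one line per cache.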

Ucarp Virtual IP Manager

Ucarp runs on all the servers and makes sure that a virtual IP address is available. In the example diagram we use the virtual IP of 192.168.1.10.

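apk add ucarp
ln -s /etc/init.d/ucarp /etc/init.d/ucarp.eth0
cp /etc/conf.d/ucarp /etc/conf.d/ucarp.eth0

Edit /etc/conf.d/ucarp.eth0 and set the following, with REALIP set to the server's own address on eth0:

REALIP=
VHID=1
VIP=192.168.1.10
PASSWORD=SecretPassword

Script in /etc/ucarp/ that runs when the server becomes master:

#!/bin/sh

# Add the VIP address
ip addr add $2/24 dev $1

for a in 330 440 550; do beep -f $a -l 100; done

Script in /etc/ucarp/ that runs when the server stops being master:

#!/bin/sh

# Remove the VIP address
ip addr del $2/24 dev $1

for a in 550 440 330; do beep -f $a -l 100; done

Make the scripts executable, start ucarp, and save the changes:

chmod +x /etc/ucarp/*.sh
rc-update add ucarp.eth0
/etc/init.d/ucarp.eth0 start
lbu commit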

Once it is running on each server, unplug the network cable on each server in turn. After a couple of seconds, the tone should sound on the other boxes as they hold an election to select a new master. (Note: all boxes will briefly become master, and then all but one will quickly demote themselves.) You should be able to ping 192.168.1.10 no matter which server is elected master.
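
If the servers have no speaker, or you want to confirm the result of the election, you can ask each box whether it currently holds the virtual IP. A minimal sketch, assuming the interface is eth0 as in the rest of this document; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether this server currently holds the virtual IP.
VIP=${VIP:-192.168.1.10}

# Read "ip addr show" output on stdin and succeed if it lists the VIP.
# -F makes grep treat the dots in the address literally, and the
# trailing slash keeps 192.168.1.10 from matching 192.168.1.100.
has_vip() {
    grep -q -F "inet $VIP/"
}

# Check the live interface.
holds_vip() {
    ip addr show dev eth0 | has_vip
}
```

Running holds_vip && echo "this server is master" on each box should print the message on exactly one of them once the election has settled.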

HA Proxy Load Balancer

The HA Load Balancer:


apk add haproxy

Then put the following in /etc/haproxy.cfg:

global
  uid haproxy
  gid haproxy
  chroot /var/empty

defaults
  # 30 minutes of waiting for a web request is crazy, 
  # but some users do it, and then complain the proxy
  # broke the interwebs.
  timeout client 30m
  timeout server 30m 
  # If the server doesn't respond in 4 seconds, it's dead
  timeout connect 4s

listen http_proxy 192.168.1.10:8080
  mode tcp
  balance roundrobin
  server cache1 192.168.1.11:3128 check
  server cache2 192.168.1.12:3128 check
  server cache3 192.168.1.13:3128 check
  server cache4 192.168.1.14:3128 check
  

If your squid caches have public, routable IP addresses, you may wish to change the balance algorithm to source. Some web applications get confused when a client's IP address changes between requests. With balance source, haproxy still spreads clients across all web proxies, but once a client is assigned to a specific proxy, it continues to use that proxy.
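
As a sketch, this is the listen section from the configuration above with the algorithm switched to source. Only the balance line changes; everything else is copied unchanged.

```
listen http_proxy 192.168.1.10:8080
  mode tcp
  balance source
  server cache1 192.168.1.11:3128 check
  server cache2 192.168.1.12:3128 check
  server cache3 192.168.1.13:3128 check
  server cache4 192.168.1.14:3128 check
```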

adduser -s /bin/false -D -h /dev/null -H haproxy
/etc/init.d/haproxy start

Enabling the HA Service

After following the above instructions, you should have the following in place:


#!/bin/sh
 
 # Add the VIP address
 ip addr add $2/24 dev $1

 /etc/init.d/haproxy start
 
 for a in 330 440 550; do beep -f $a -l 100; done
#!/bin/sh

 /etc/init.d/haproxy stop

 # Remove the VIP address
 ip addr del $2/24 dev $1

 for a in 550 440 330; do beep -f $a -l 100; done
 lbu commit


Maintenance

The haproxy process load balances requests across all available web proxies. If a proxy crashes, haproxy automatically removes it from the pool, and redirects incoming requests to the remaining available proxies. Once the proxy is returned to service, haproxy automatically notices and starts sending requests to it again.


The ucarp process ensures that haproxy is running on the virtual IP (192.168.1.10) at all times. Clients do not need to be reconfigured, even if the machine haproxy is running on crashes. Another box takes on the virtual address and things "just work".


To remove a proxy from service, it is possible to just take it down. A more graceful way is to update haproxy.cfg and tell haproxy to use the new config. For instance, to delete 192.168.1.11 from the pool, comment out its server line and reload:

# server cache1 192.168.1.11:3128 check # 192.168
 /usr/bin/haproxy -D -p /var/run/haproxy.pid -f /etc/haproxy.cfg -sf \
  $( cat /var/run/haproxy.pid )

This causes the existing haproxy to finish connections, but not accept new ones. Eventually, the old haproxy process will die, leaving only the new process. Since web requests can take a long time, the old haproxy instance may linger for several minutes. Make sure the old process has terminated before taking down the web proxy server.
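
Waiting for the old process can be scripted rather than checked by hand. A minimal sketch, assuming you saved the old PID (for example with old=$(cat /var/run/haproxy.pid)) before running the reload command; the function name wait_for_exit and the default timeout are illustrative choices.

```shell
#!/bin/sh
# Sketch: wait until a process has exited, giving up after a timeout.
# Usage: wait_for_exit PID [TIMEOUT_SECONDS]
# Returns 0 once the process is gone, 1 if it is still alive at the timeout.
wait_for_exit() {
    _pid=$1
    _timeout=${2:-600}
    _waited=0
    # kill -0 sends no signal; it only tests whether the process exists
    while kill -0 "$_pid" 2>/dev/null; do
        [ "$_waited" -ge "$_timeout" ] && return 1
        sleep 1
        _waited=$((_waited + 1))
    done
    return 0
}
```

After the reload, wait_for_exit "$old" && echo "old haproxy has exited" tells you when it is safe to take the web proxy server down. Run it as root so that kill -0 is allowed to probe the haproxy process.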


Similarly, to add a web cache into the farm, use the above command to have haproxy start using the new config file.
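
Since a typo in the edited file would affect the new haproxy process, it can help to let haproxy validate the file before reloading. A rough sketch, assuming the paths used in the reload command above; -c is haproxy's check mode, which parses the configuration and exits without starting the proxy. The function name safe_reload is hypothetical.

```shell
#!/bin/sh
# Sketch: validate the config, then reload only if it parses cleanly.
# The default paths are the ones used elsewhere in this document.
HAPROXY=${HAPROXY:-/usr/bin/haproxy}
CFG=${CFG:-/etc/haproxy.cfg}
PIDFILE=${PIDFILE:-/var/run/haproxy.pid}

safe_reload() {
    # -c checks the config file and exits without starting the proxy
    if ! "$HAPROXY" -c -f "$CFG"; then
        echo "config check failed, not reloading" >&2
        return 1
    fi
    # Same reload as above: start a new process and ask the old one
    # (-sf) to finish its connections and exit
    "$HAPROXY" -D -p "$PIDFILE" -f "$CFG" -sf $(cat "$PIDFILE")
}
```

With this in place, adding or removing a cache becomes: edit the server lines, then call safe_reload.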

The "-sf" flag allows rolling maintenance of the web caches with no observable effect on the clients.
