HAProxy Load Balancer
HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications.
It is particularly suited for very high traffic web sites and powers quite a number of the world’s most visited ones.
Since its creation in 2000 HAProxy has become the de-facto standard open-source load balancer, now shipped with most mainstream Linux distributions, and is often deployed by default in cloud platforms.
HAProxy Compared to Varnish
You often encounter environments using both Varnish and HAProxy. Below is a quick overview of Varnish compared to HAProxy.
HAProxy is an open source reverse-proxy load balancer, while Varnish is an open source HTTP reverse-proxy cache, sometimes referred to as a web application accelerator. It is installed in front of any web server that speaks HTTP and is configured to cache the contents. Varnish is extremely fast, typically serving cached content 300 – 1000x faster than the backend, depending on the system architecture.
The two work well together in many high-traffic environments, with Varnish speeding up the website by caching and serving static objects itself, while HAProxy ensures smooth load balancing with smart persistence and DDoS mitigation.
While Varnish, as a reverse-proxy cache, supports essentially only round-robin and random balancing algorithms, HAProxy is specifically designed for load balancing and supports:
round robin (with weighting)
a static round robin approach
least connections
first server available
bucketing based on source
bucketing based on URI
bucketing based on a cookie
bucketing based on a URL parameter
bucketing based on an HTTP header
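As a hedged illustration, here is what a few of these algorithms look like in backend sections; the backend names, server names and addresses are hypothetical:

```
backend web_roundrobin
    balance roundrobin              # weighted round robin
    server web1 192.168.1.11:80 weight 2 check
    server web2 192.168.1.12:80 weight 1 check

backend web_leastconn
    balance leastconn               # fewest active connections wins
    server app1 192.168.1.21:80 check
    server app2 192.168.1.22:80 check

backend web_source
    balance source                  # bucket clients by source IP
    server web1 192.168.1.11:80 check
```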
Below are some common aspects of HAProxy and Varnish. Both offer:
high performance
basic load-balancing
reverse-proxy mode
advanced HTTP features
server health checking
IPv6 ready
tunnel mode available
Management socket (CLI)
no SSL offloading (historically; HAProxy has supported SSL/TLS natively since version 1.5)
client-side HTTP 1.1 with keepalive
Here are some features that HAProxy offers which Varnish does not:
advanced load-balancing
Operates at TCP level with any Layer 7 protocol
Proxy protocol for both client and server side
DoS and DDoS mitigation capacity
Web interface connection
Server and application protection using queue management
Advanced and custom logging with a powerful log analyzer tool (halog)
SNI content switching
multiple persistence
Named ACLs
Full HTTP 1.1 support on the server side, with keep-alive
Varnish is especially strong in these areas:
caching
TCP connection re-use
a powerful live traffic analyzer (varnishlog)
modular construction with wide variety of modules available
grace mode for delivery of stale content
saint mode to manage origin server errors
command-line stats tools such as varnishhist and varnishstat
Edge side includes (ESIs)
HTTP 1.1 used on server side
Uses the intuitive VCL configuration language
One drawback with HAProxy is that it can itself represent a single point of failure (SPOF). To avoid this, multiple HAProxy instances can be deployed.
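A common approach is to run two HAProxy nodes and float a virtual IP between them with a VRRP daemon such as keepalived. A minimal, illustrative keepalived.conf fragment follows; the interface name, router ID and addresses are assumptions:

```
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    virtual_ipaddress {
        10.0.8.200/24
    }
}
```

The backup node runs the same block with state BACKUP and a lower priority; clients always connect to the virtual IP, which fails over automatically.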
What HAProxy Is and Isn’t
This is the description from the official HAProxy Configuration Handbook:
What HAProxy is and isn't
-------------------------

HAProxy is:

- a TCP proxy: it can accept a TCP connection from a listening socket, connect to a server and attach these sockets together allowing traffic to flow in both directions; IPv4, IPv6 and even UNIX sockets are supported on either side, so this can provide an easy way to translate addresses between different families.

- an HTTP reverse-proxy (called a "gateway" in HTTP terminology): it presents itself as a server, receives HTTP requests over connections accepted on a listening TCP socket, and passes the requests from these connections to servers using different connections. It may use any combination of HTTP/1.x or HTTP/2 on any side and will even automatically detect the protocol spoken on each side when ALPN is used over TLS.

- an SSL terminator / initiator / offloader: SSL/TLS may be used on the connection coming from the client, on the connection going to the server, or even on both connections. A lot of settings can be applied per name (SNI), and may be updated at runtime without restarting. Such setups are extremely scalable and deployments involving tens to hundreds of thousands of certificates were reported.

- a TCP normalizer: since connections are locally terminated by the operating system, there is no relation between both sides, so abnormal traffic such as invalid packets, flag combinations, window advertisements, sequence numbers, incomplete connections (SYN floods), or so will not be passed to the other side. This protects fragile TCP stacks from protocol attacks, and also allows to optimize the connection parameters with the client without having to modify the servers' TCP stack settings.

- an HTTP normalizer: when configured to process HTTP traffic, only valid complete requests are passed. This protects against a lot of protocol-based attacks. Additionally, protocol deviations for which there is a tolerance in the specification are fixed so that they don't cause problems on the servers (e.g. multiple-line headers).

- an HTTP fixing tool: it can modify / fix / add / remove / rewrite the URL or any request or response header. This helps fixing interoperability issues in complex environments.

- a content-based switch: it can consider any element from the request to decide what server to pass the request or connection to. Thus it is possible to handle multiple protocols over a same port (e.g. HTTP, HTTPS, SSH).

- a server load balancer: it can load balance TCP connections and HTTP requests. In TCP mode, load balancing decisions are taken for the whole connection. In HTTP mode, decisions are taken per request.

- a traffic regulator: it can apply some rate limiting at various points, protect the servers against overloading, adjust traffic priorities based on the contents, and even pass such information to lower layers and outer network components by marking packets.

- a protection against DDoS and service abuse: it can maintain a wide number of statistics per IP address, URL, cookie, etc and detect when an abuse is happening, then take action (slow down the offenders, block them, send them to outdated contents, etc).

- an observation point for network troubleshooting: due to the precision of the information reported in logs, it is often used to narrow down some network-related issues.

- an HTTP compression offloader: it can compress responses which were not compressed by the server, thus reducing the page load time for clients with poor connectivity or using high-latency, mobile networks.

- a caching proxy: it may cache responses in RAM so that subsequent requests for the same object avoid the cost of another network transfer from the server as long as the object remains present and valid. It will however not store objects to any persistent storage. Please note that this caching feature is designed to be maintenance free and focuses solely on saving haproxy's precious resources and not on saving the server's resources. Caches designed to optimize servers require much more tuning and flexibility. If you instead need such an advanced cache, please use Varnish Cache, which integrates perfectly with haproxy, especially when SSL/TLS is needed on any side.

- a FastCGI gateway: FastCGI can be seen as a different representation of HTTP, and as such, HAProxy can directly load-balance a farm comprising any combination of FastCGI application servers without requiring to insert another level of gateway between them. This results in resource savings and a reduction of maintenance costs.

HAProxy is not:

- an explicit HTTP proxy, i.e. the proxy that browsers use to reach the internet. There are excellent open-source software dedicated for this task, such as Squid. However HAProxy can be installed in front of such a proxy to provide load balancing and high availability.

- a data scrubber: it will not modify the body of requests nor responses.

- a static web server: during startup, it isolates itself inside a chroot jail and drops its privileges, so that it will not perform any single file-system access once started. As such it cannot be turned into a static web server (dynamic servers are supported through FastCGI however). There are excellent open-source software for this such as Apache or Nginx, and HAProxy can be easily installed in front of them to provide load balancing, high availability and acceleration.

- a packet-based load balancer: it will not see IP packets nor UDP datagrams, will not perform NAT or even less DSR. These are tasks for lower layers. Some kernel-based components such as IPVS (Linux Virtual Server) already do this pretty well and complement perfectly with HAProxy.
How to Install HAProxy
[root@router2 ~]# yum install -y haproxy
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: ftp.antilo.de
 * centos-ceph-nautilus: ftp.plusline.net
 * centos-nfs-ganesha28: mirror.softaculous.com
 * epel: mirror.hostnet.nl
 * extras: mirror.checkdomain.de
 * updates: mirror.23media.com
Resolving Dependencies
--> Running transaction check
---> Package haproxy.x86_64 0:1.5.18-9.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved
==============================================================
 Package      Arch       Version          Repository    Size
==============================================================
Installing:
 haproxy      x86_64     1.5.18-9.el7     base          834 k

Transaction Summary
==============================================================
Install  1 Package

Total download size: 834 k
Installed size: 2.6 M
Downloading packages:
haproxy-1.5.18-9.el7.x86_64.rpm                 | 834 kB  00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : haproxy-1.5.18-9.el7.x86_64        1/1
  Verifying  : haproxy-1.5.18-9.el7.x86_64        1/1

Installed:
  haproxy.x86_64 0:1.5.18-9.el7

Complete!
[root@router2 ~]#

[root@router1 ~]# haproxy -v
HA-Proxy version 1.5.18 2016/05/10
Copyright 2000-2016 Willy Tarreau <willy@haproxy.org>
[root@router1 ~]#

[root@router1 ~]# yum install nc -y
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: ftp.antilo.de
 * centos-ceph-nautilus: ftp.plusline.net
 * centos-nfs-ganesha28: mirror.softaculous.com
 * epel: ftp.plusline.net
 * extras: mirror.checkdomain.de
 * updates: de.mirrors.clouvider.net
Resolving Dependencies
--> Running transaction check
---> Package nmap-ncat.x86_64 2:6.40-19.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved
===============================================================================================================
 Package        Arch        Version            Repository    Size
===============================================================================================================
Installing:
 nmap-ncat      x86_64      2:6.40-19.el7      base          206 k

Transaction Summary
===============================================================================================================
Install  1 Package

Total download size: 206 k
Installed size: 423 k
Downloading packages:
nmap-ncat-6.40-19.el7.x86_64.rpm                | 206 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : 2:nmap-ncat-6.40-19.el7.x86_64     1/1
  Verifying  : 2:nmap-ncat-6.40-19.el7.x86_64     1/1

Installed:
  nmap-ncat.x86_64 2:6.40-19.el7

Complete!
[root@router1 ~]#
Configuration of HAProxy in Lab Test Environment
For Lab testing purposes we will create two Ncat web servers and an API server:
These dummy servers simply print out a message (such as “This is Server ONE”) and keep running until they are stopped.
In a real-world environment you would be deploying real production web and app servers.
Enter the following code to create the dummy webservers:
while true ; do nc -l -p 10080 -c 'echo -e "HTTP/1.1 200 OK\n\n This is Server ONE"' ; done &

while true ; do nc -l -p 10081 -c 'echo -e "HTTP/1.1 200 OK\n\n This is Server TWO"' ; done &

while true ; do nc -l -p 10082 -c 'echo -e "HTTP/1.1 200 OK\nContent-Type: application/json\n\n { \"Message\" :\"Hello, World!\" }"' ; done &
[root@router1 ~]# while true ;
> do
> nc -l -p 10080 -c 'echo -e "HTTP/1.1 200 OK\n\n This is Server ONE"' ;
> done &
[1] 3374
[root@router1 ~]#
[root@router1 ~]# while true ;
> do
> nc -l -p 10081 -c 'echo -e "HTTP/1.1 200 OK\n\n This is Server TWO"' ;
> done &
[2] 3375
[root@router1 ~]#
[root@router1 ~]# while true ;
> do
> nc -l -p 10082 -c 'echo -e "HTTP/1.1 200 OK\nContent-Type: application/json\n\n { \"Message\" :\"Hello, World!\" }"' ;
> done &
[3] 3376
[root@router1 ~]#
Modify the HAProxy config file
HAProxy’s configuration file is /etc/haproxy/haproxy.cfg. This is where you define the load balancer.
Enter the following into the file:
global
    log 127.0.0.1 local2
    user haproxy
    group haproxy

defaults
    mode http
    log global
    option httplog

frontend main
    bind *:80
    default_backend web
    use_backend api if { path_beg -i /api/ }
    #-------------------------
    # SSL termination - HAProxy handles the encryption.
    # To use it, put your PEM file in /etc/haproxy/certs
    # then edit and uncomment the bind line (75)
    #-------------------------
    # bind *:443 ssl crt /etc/haproxy/certs/haproxy.pem ssl-min-ver TLSv1.2
    # redirect scheme https if !{ ssl_fc }

#-----------------------------
# Enable stats at http://test.local:8404/stats
#-----------------------------
frontend stats
    bind *:8404
    stats enable
    stats uri /stats

#-----------------------------
# round robin balancing between the various backends
#-----------------------------
backend web
    server web1 127.0.0.1:10080 check
    server web2 127.0.0.1:10081 check

#-----------------------------
# API backend for serving up API content
#-----------------------------
backend api
    server api1 127.0.0.1:10082 check
Restart and reload HAProxy
HAProxy is probably not running yet, so issue the command to start (or restart) it:
systemctl restart haproxy
The restart method is fine for lab testing, but in production environments you should use systemctl reload haproxy instead, to avoid interrupting service.
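Before restarting or reloading, it is good practice to validate the configuration file first; haproxy prints an error message and exits non-zero if the file is invalid:

```
haproxy -c -f /etc/haproxy/haproxy.cfg
```

The -c flag checks the configuration and exits without starting the proxy.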
[root@router1 haproxy]# systemctl restart haproxy
If there are no errors reported, then you have a running load balancer.
How to Test HAProxy
Check that the haproxy service is running:
[root@router1 haproxy]# systemctl status haproxy
● haproxy.service - HAProxy Load Balancer
   Loaded: loaded (/usr/lib/systemd/system/haproxy.service; enabled; vendor preset: disabled)
   Active: active (running) since Mo 2021-05-17 04:46:49 CEST; 1s ago
 Main PID: 3404 (haproxy-systemd)
   CGroup: /system.slice/haproxy.service
           ├─3404 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
           ├─3405 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
           └─3406 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds

Mai 17 04:46:49 router1 haproxy-systemd-wrapper[3404]: | with such a configuration. To fix this, please e...ing
Mai 17 04:46:49 router1 haproxy-systemd-wrapper[3404]: | timeouts are set to a non-zero value: 'client', ...r'.
Mai 17 04:46:49 router1 haproxy-systemd-wrapper[3404]: [WARNING] 136/044649 (3405) : config : missing tim...b'.
Mai 17 04:46:49 router1 haproxy-systemd-wrapper[3404]: | While not properly invalid, you will certainly e...ems
Mai 17 04:46:49 router1 haproxy-systemd-wrapper[3404]: | with such a configuration. To fix this, please e...ing
Mai 17 04:46:49 router1 haproxy-systemd-wrapper[3404]: | timeouts are set to a non-zero value: 'client', ...r'.
Mai 17 04:46:49 router1 haproxy-systemd-wrapper[3404]: [WARNING] 136/044649 (3405) : config : missing tim...i'.
Mai 17 04:46:49 router1 haproxy-systemd-wrapper[3404]: | While not properly invalid, you will certainly e...ems
Mai 17 04:46:49 router1 haproxy-systemd-wrapper[3404]: | with such a configuration. To fix this, please e...ing
Mai 17 04:46:49 router1 haproxy-systemd-wrapper[3404]: | timeouts are set to a non-zero value: 'client', ...r'.
Hint: Some lines were ellipsized, use -l to show in full.
[root@router1 haproxy]#
Then test the haproxy functionality by typing curl http://localhost/ on the command line.
If you see “This is Server ONE,” then it works.
Run curl a few times and watch it cycle through your backend server pool. See also what happens when you enter:
curl http://localhost/api/
[root@router1 haproxy]# curl http://localhost/
 This is Server ONE
[root@router1 haproxy]#
[root@router1 haproxy]# curl http://localhost/api/
 { "Message" :"Hello, World!" }
[root@router1 haproxy]#
Adding /api/ to the end of the URL will send the web traffic to the third server in the backend pool.
The haproxy load balancer is now configured.
Using HAProxy Stats
The configuration defined a frontend called stats that is listening on port 8404:
frontend stats
bind *:8404
stats uri /stats
stats enable
In a browser, connect to http://localhost:8404/stats
For example, to reach the stats page on the router1 VM from the web browser on my laptop, I can enter:
http://10.0.8.100:8404/stats
The HAProxy Config File in Detail
If you’re using the Community Edition, it’s located at /etc/haproxy/haproxy.cfg.
For the Enterprise Edition, it’s located at /etc/hapee-1.8/hapee-lb.cfg.
The structure of this file is as follows:
global
# global settings here
defaults
# defaults here
frontend
# a frontend that accepts requests from clients
backend
# servers that fulfill the requests
Important global settings:
maxconn
The maxconn setting limits the maximum number of connections that HAProxy will accept. This is to protect your load balancer from running out of memory.
You can select the best value for your environment by referring to the HAProxy sizing guide for memory requirements.
user / group
The user and group lines tell HAProxy to drop privileges after initialization.
The process has to start as root to listen on ports below 1024, and your SSL/TLS private keys should typically be readable only by root as well.
Without a user and group defined, HAProxy would keep running with root privileges, which is bad practice. HAProxy itself does not create the user and group, so you must create these users manually.
stats socket
The stats socket line enables the Runtime API, which lets you dynamically disable servers and health checks, change the load-balancing weights of servers, and more.
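To enable the Runtime API, add a stats socket line to the global section; you can then query the socket with a tool such as socat (the socket path shown here is the conventional one, and socat is assumed to be installed):

```
global
    stats socket /var/run/haproxy.sock mode 660 level admin
```

Example Runtime API commands:

```
echo "show stat" | socat stdio /var/run/haproxy.sock
echo "disable server web/web1" | socat stdio /var/run/haproxy.sock
echo "set weight web/web1 50" | socat stdio /var/run/haproxy.sock
```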
nbproc / nbthread
The nbproc and nbthread settings specify the number of processes and threads, respectively, that HAProxy launches at startup.
This can increase the efficiency of your load balancer. However, each process created by nbproc maintains its own stats, stick tables, health checks, etc.
Threads created with nbthread, on the other hand, share them.
You can use one or the other, or both. HAProxy also performs well with only one process and thread, unless you are doing a large number of TLS terminations, which benefit from multiple CPU cores.
ssl-default-bind-ciphers
The ssl-default-bind-ciphers setting enumerates the SSL and TLS ciphers that every bind directive will use by default.
It can be overridden with a more specific setting by adding the bind directive’s ciphers parameter.
It takes a list of cipher suites in order of preference. HAProxy will select the first one listed that the client also supports, unless the prefer-client-ciphers option is enabled.
ssl-default-bind-options
The ssl-default-bind-options setting configures SSL/TLS options such as ssl-min-ver to disable support for older protocols. For example, you might choose to accept only connections that use a TLS version of 1.2 or newer.
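For example, a sketch of global TLS hardening settings; the cipher list here is illustrative only, so choose one appropriate for your clients:

```
global
    ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384
```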
Defaults section:
Typical defaults are:
defaults
timeout connect 10s
timeout client 30s
timeout server 30s
log global
mode http
option httplog
maxconn 3000
timeout connect / timeout client / timeout server
The timeout connect sets the time HAProxy will wait for a TCP connection to a backend server to be created.
The “s” suffix means seconds. Without a suffix, the value is interpreted as milliseconds.
The timeout client setting measures inactivity during periods when the client is expected to be sending data. The timeout server setting measures inactivity during periods when the backend server is expected to be responding.
When a timeout expires, the connection is closed.
When operating HAProxy in TCP mode (set with mode tcp), timeout server should be set to the same value as timeout client, because HAProxy can’t tell which side is expected to be speaking, so both timeouts apply all the time.
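The time-unit suffixes can be illustrated with a small hypothetical helper (not part of HAProxy) that converts a timeout value to milliseconds the way HAProxy interprets it:

```python
# HAProxy timeout values accept us, ms, s, m, h and d suffixes;
# a bare number is interpreted as milliseconds.
UNITS_MS = {"us": 0.001, "ms": 1, "s": 1000,
            "m": 60_000, "h": 3_600_000, "d": 86_400_000}

def to_ms(value: str) -> float:
    # Check two-letter suffixes first so "ms"/"us" aren't misread as "s"
    for suffix in ("us", "ms", "s", "m", "h", "d"):
        if value.endswith(suffix):
            return float(value[:-len(suffix)]) * UNITS_MS[suffix]
    return float(value)  # no suffix: already milliseconds

print(to_ms("10s"))   # -> 10000.0  (e.g. timeout connect 10s)
print(to_ms("500"))   # -> 500.0   (bare values are milliseconds)
```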
log global
The log global setting is a way of telling each subsequent frontend to use the log setting defined in the global section.
mode
The mode setting defines whether HAProxy operates as a simple TCP proxy or if it’s able to inspect incoming traffic’s higher-level HTTP messages.
The alternative to specifying mode http is to use mode tcp, which operates at the faster, but less-aware, level.
If most of your frontend and backend sections are using the same mode, you can specify it in the defaults section to avoid repetition.
maxconn
The maxconn setting limits the number of connections each frontend will accept. By default this is set to 2000. If you want to allow more connections, you can increase it to your global maxconn.
Alternatively you may wish to use a number that gives each frontend a fairer share of the global connections.
option httplog
The option httplog setting, or more rarely option tcplog, tells HAProxy to use a more verbose log format when sending messages to Syslog. Usually option httplog is preferable over option tcplog in the defaults section because when HAProxy encounters a frontend using mode tcp, it will send a warning and downgrade it to option tcplog in any case.
If neither is specified, then the default connect log format is used.
Alternatively you can define a custom log format with the log-format setting, in which case option httplog and option tcplog are not required.
Frontend Settings
When you use HAProxy as a reverse proxy in front of your backend servers, the frontend section defines the IP addresses and ports clients can connect to.
You can add any number of HAProxy frontends for different purposes, with each frontend keyword followed by a label, e.g. www.mysite.com, to differentiate it from other frontends.
For example:
frontend www.mysite.com
bind 10.0.0.3:80
bind 10.0.0.3:443 ssl crt /etc/ssl/certs/mysite.pem
http-request redirect scheme https unless { ssl_fc }
use_backend api_servers if { path_beg /api/ }
default_backend web_servers
bind
The bind setting assigns a listener to a specific IP address and port. The IP can be omitted to bind to all IP addresses on the server and a port can be a single port, a range, or a comma-delimited list.
The ssl and crt arguments instruct HAProxy to handle SSL/TLS termination itself, rather than have termination occur on the backend web servers.
http-request redirect
An http-request redirect setting tells the client to try a different URL. In our example, clients that request the site over unencrypted HTTP are redirected to the HTTPS version of the site.
use_backend
The use_backend setting chooses a backend pool of servers to respond to incoming requests if a given condition is true.
It’s followed by an ACL statement, e.g.:
if { path_beg /api/ }
This allows HAProxy to select a specific backend based on some criterion, e.g. checking whether the path begins with /api/. These lines aren’t required by HAProxy, and many frontend sections have only a default_backend line and no selection rules.
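The selection logic can be sketched as a toy function; this is a hypothetical model, mirroring a case-insensitive path_beg -i rule, not HAProxy’s actual implementation:

```python
# Toy model of use_backend / default_backend selection:
# route to the api backend when the path begins with /api/
def choose_backend(path: str) -> str:
    if path.lower().startswith("/api/"):   # path_beg -i /api/
        return "api_servers"               # use_backend api_servers
    return "web_servers"                   # default_backend web_servers

print(choose_backend("/api/users"))    # -> api_servers
print(choose_backend("/index.html"))   # -> web_servers
```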
default_backend
The default_backend setting is found in practically every frontend and defines the name of the backend to send traffic to if a use_backend rule hasn’t already sent it elsewhere.
Note that if a request isn’t routed by a use_backend or default_backend directive, then HAProxy will return a 503 Service Unavailable error.
Backend Settings
The backend section defines the servers that will be load balanced and assigned to handle requests forwarded by the frontend section of HAProxy.
You define a label for each backend for reference and easy recognition in the config, such as web_servers.
For example:
backend web_servers
balance roundrobin
cookie SERVERUSED insert indirect nocache
option httpchk HEAD /
default-server check maxconn 20
server server1 10.0.1.3:80 cookie server1
server server2 10.0.1.4:80 cookie server2
balance
The balance setting controls how HAProxy will select the server to respond to the request unless a persistence method overrides that selection.
A persistence method might be to always send a particular client to the same server based on a cookie. You might need this if the process involves something like a shopping cart.
Common load balancing values include roundrobin, which simply picks the next server and then starts over at the top of the list again,
or leastconn, where HAProxy selects the server with the lowest number of active sessions.
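The difference between the two can be sketched in a few lines; this is an illustrative model, not HAProxy’s actual implementation, which also honors server weights:

```python
from itertools import cycle

servers = ["server1", "server2", "server3"]

# roundrobin: hand out servers in list order, wrapping around
rr = cycle(servers)
picks = [next(rr) for _ in range(4)]
print(picks)  # -> ['server1', 'server2', 'server3', 'server1']

# leastconn: pick the server with the fewest active sessions
active = {"server1": 12, "server2": 3, "server3": 7}
print(min(active, key=active.get))  # -> server2
```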
cookie
The cookie setting enables cookie-based persistence. This tells HAProxy to send a cookie named SERVERUSED to the client, and to associate it with the name of the server that issues the initial response.
This causes the client to continue communicating with that server for the duration of the session. The name of the server is set by using a cookie argument on the server line.
option httpchk
The option httpchk setting causes HAProxy to send Layer 7 (HTTP) health checks rather than Layer 4 (TCP) checks to the backend servers.
Servers that don’t respond are not served any more requests.
Whereas TCP checks succeed if they’re able to make a connection to the backend server’s IP and port, HTTP health checks expect to receive back a successful HTTP response.
Smarter health checks are instrumental in removing unresponsive servers, even if unresponsive means just getting a bad HTTP response like 500 Server Error.
By default, an HTTP health check makes a request to the root path, /, using the OPTIONS verb.
HAProxy will treat any check that gets a 2xx or 3xx response code to be successful. This can be customized by using an http-check line.
Using option httpchk isn’t restricted to backends that use mode http, so servers that communicate using HTTP can be checked regardless of proxying mode in use.
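The 2xx-or-3xx success rule described above can be captured in a one-line hypothetical helper:

```python
# HAProxy considers a health check successful when the response
# status code is in the 2xx or 3xx range.
def check_passed(status: int) -> bool:
    return 200 <= status <= 399

print(check_passed(200), check_passed(301), check_passed(500))
# -> True True False
```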
default-server
The default-server setting configures defaults for any server lines that follow, such as enabling health checks, max connections, etc in order to make the config easier to read and modify. Alternatively, you can specify these arguments on each server.
server
The server setting is the main part of the backend.
The first server argument is a name, followed by the IP address and port of the backend server.
You can specify a domain name instead of an IP address, in which case it will be resolved at startup or, if you add a resolvers argument, updated at runtime.
If the DNS entry contains an SRV record, then the port and weight will be filled in from it too.
If the port isn’t specified, then HAProxy uses the same port the client connected to, which is useful for randomly used ports such as for active-mode FTP.
Although we added option httpchk to set up HTTP-based health checking of our servers, each server must opt in to health checks by adding a check argument. This can be set on the server line or by using the default-server line.
Every server line should have a maxconn setting that limits the maximum number of concurrent requests that the server will be given.
Having a value here will avoid saturating your servers with traffic requests and will also provide a baseline for later fine-tuning.
To use HAProxy as an SSL/TLS terminator, set the following inside your frontend section:
bind :80
bind :443 ssl crt <path-to-combined-cert>
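HAProxy expects the certificate (plus any intermediate chain) and the private key concatenated into one PEM file. Here is a sketch of building that combined file; the file names are hypothetical, and dummy one-line stand-ins are used in place of real PEM data:

```shell
# Stand-ins for a real certificate and key (illustrative only)
echo "-----BEGIN CERTIFICATE-----" > mysite.crt
echo "-----BEGIN PRIVATE KEY-----" > mysite.key

# Concatenate cert first, then key, into the combined PEM HAProxy reads
cat mysite.crt mysite.key > haproxy.pem
cat haproxy.pem
```

In production you would concatenate your real certificate and key files and point the bind line's crt argument at the result.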