Load balancing distributes traffic across multiple servers to improve availability, performance, and scalability of applications.
Why Load Balancing?
Benefits
- High availability: Continue service if servers fail
- Horizontal scalability: Add capacity by adding servers
- Performance: Distribute load to prevent overload
- Maintenance: Take servers offline without downtime
- Geographic distribution: Serve users from nearest location
Use Cases
- Web applications with high traffic
- API endpoints requiring scalability
- Database read replicas
- Microservices architectures
- Content delivery networks
Load Balancing Layers
Layer 4 (Transport Layer)
Routes based on IP address and TCP/UDP port.
Characteristics
- Fast (minimal packet inspection)
- Protocol-agnostic
- Cannot make content-based decisions
- Good for TCP/UDP traffic
Example: HAProxy L4
frontend tcp_front
    bind *:3306
    mode tcp
    default_backend mysql_servers

backend mysql_servers
    mode tcp
    balance roundrobin
    server mysql1 10.0.1.10:3306 check
    server mysql2 10.0.1.11:3306 check
Use Cases
- Database connections
- SMTP, IMAP mail servers
- Game servers
- VoIP/SIP traffic
Layer 7 (Application Layer)
Routes based on HTTP headers, cookies, URL paths, etc.
Characteristics
- Content-aware routing
- SSL termination
- Session persistence
- Slower (more processing)
Example: Nginx L7
upstream backend {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}

server {
    listen 80;
    server_name example.com;

    location /api/ {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
Use Cases
- Web applications
- REST APIs
- Microservices
- Content-based routing
Load Balancing Algorithms
1. Round Robin
Distributes requests sequentially across servers.
How It Works
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A
Request 5 → Server B
...
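The rotation above can be sketched in a few lines of Python (server names are illustrative, not from any real deployment):

```python
from itertools import cycle

# Hypothetical backend names for illustration
servers = ["server-a", "server-b", "server-c"]
picker = cycle(servers)

def next_server():
    """Return the next backend in strict rotation."""
    return next(picker)

# Requests 1-5 land on A, B, C, A, B
assignments = [next_server() for _ in range(5)]
```

Real balancers also skip servers that fail health checks, which this sketch omits.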
Configuration Examples
HAProxy
backend web_servers
    balance roundrobin
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
    server web3 10.0.1.12:80 check
Nginx
upstream backend {
    # Round robin is default
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}
AWS ALB (Application Load Balancer)
# Round robin is default behavior
# No specific configuration needed
Pros
- Simple and predictable
- Equal distribution (if requests are similar)
- No state required
Cons
- Doesn’t account for server capacity
- Doesn’t consider current load
- Can overload slower servers
Best For
- Homogeneous server pool
- Similar request processing times
- Stateless applications
2. Weighted Round Robin
Round robin with server capacity weights.
How It Works
Server A (weight 3): Gets 3 requests
Server B (weight 2): Gets 2 requests
Server C (weight 1): Gets 1 request
Request distribution:
A, A, A, B, B, C, A, A, A, B, B, C, ...
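A naive way to realize this distribution is to expand each server into the cycle `weight` times; production balancers use a smoother interleaving, but the per-cycle counts are the same. A small Python sketch:

```python
def weighted_rotation(weights):
    """Expand {server: weight} into one repeating cycle.

    Naive expansion: each server appears `weight` times per cycle.
    (Real balancers interleave, e.g. A, B, A, C, A, B, to avoid bursts.)
    """
    order = []
    for server, weight in weights.items():
        order.extend([server] * weight)
    return order

rotation = weighted_rotation({"A": 3, "B": 2, "C": 1})
```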
Configuration Examples
HAProxy
backend web_servers
    balance roundrobin
    server web1 10.0.1.10:80 weight 3 check
    server web2 10.0.1.11:80 weight 2 check
    server web3 10.0.1.12:80 weight 1 check
Nginx
upstream backend {
    server 10.0.1.10:80 weight=3;
    server 10.0.1.11:80 weight=2;
    server 10.0.1.12:80 weight=1;
}
AWS Target Group
{
  "Targets": [
    {"Id": "i-1234567890abcdef0", "Weight": 300},
    {"Id": "i-abcdef1234567890a", "Weight": 200},
    {"Id": "i-567890abcdef12345", "Weight": 100}
  ]
}
Best For
- Mixed server capacities
- Gradual deployment (canary releases)
- Cost optimization (use cheaper servers less)
3. Least Connections
Sends requests to server with fewest active connections.
How It Works
Server A: 5 connections
Server B: 3 connections
Server C: 7 connections
Next request → Server B (least connections)
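The selection step reduces to a minimum over the connection table. A minimal Python sketch (counts are illustrative):

```python
# Current active-connection counts per backend (illustrative)
active = {"server-a": 5, "server-b": 3, "server-c": 7}

def pick_least_connections(connections):
    """Choose the backend with the fewest active connections."""
    return min(connections, key=connections.get)

choice = pick_least_connections(active)  # server-b
```

A real balancer would increment the chosen server's count when the connection opens and decrement it on close.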
Configuration Examples
HAProxy
backend web_servers
    balance leastconn
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
    server web3 10.0.1.12:80 check

Nginx

upstream backend {
    least_conn;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}
Pros
- Adapts to varying request durations
- Prevents server overload
- Better for long-lived connections
Cons
- Requires connection tracking
- More complex than round robin
- Connection counts may not reflect actual load (some connections do far more work than others)
Best For
- WebSocket connections
- Long-polling applications
- Database connection pools
- Varying request processing times
4. Weighted Least Connections
Combines least connections with server weights.
HAProxy
backend web_servers
    balance leastconn
    server web1 10.0.1.10:80 weight 2 check
    server web2 10.0.1.11:80 weight 1 check
Best For
- Mixed server capacities with long connections
- WebSocket servers of different sizes
5. IP Hash / Source IP
Routes based on client IP address.
How It Works
hash(client_ip) % num_servers = server_index
Client 192.0.2.1 → hash → Server A (always)
Client 192.0.2.50 → hash → Server C (always)
Client 198.51.100.10 → hash → Server B (always)
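The `hash(client_ip) % num_servers` formula above can be sketched directly in Python; `md5` here is just a stable, well-mixed hash, not a security choice:

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def pick_by_ip(client_ip, pool):
    """Map a client IP to a fixed backend: hash(ip) % pool size."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return pool[int(digest, 16) % len(pool)]

# The same client always lands on the same server
first = pick_by_ip("192.0.2.1", servers)
second = pick_by_ip("192.0.2.1", servers)
```

Note the downside discussed below: if `len(pool)` changes, almost every client remaps, which is what consistent hashing fixes.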
Configuration Examples
HAProxy
backend web_servers
    balance source
    hash-type consistent
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
    server web3 10.0.1.12:80 check

Nginx

upstream backend {
    ip_hash;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}
Pros
- Session persistence (same client → same server)
- No session sharing needed
- Simple implementation
Cons
- Uneven distribution if clients are behind NAT
- Doesn’t adapt to server load
- Removing servers disrupts many sessions
Best For
- Applications with server-side sessions
- When session sharing is impractical
- Many well-distributed client IPs (a small client population hashes unevenly)
6. Consistent Hashing
Improved hash-based routing with minimal disruption.
How It Works
- Servers placed on a hash ring
- Client hashed to ring position
- Routed to next server clockwise
- Adding/removing servers affects only ~1/N of clients
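The ring mechanics above can be shown with a minimal Python hash ring (virtual nodes smooth out the distribution; server names and vnode count are illustrative):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (position, server)
        for server in servers:
            for i in range(vnodes):
                pos = self._hash(f"{server}#{i}")
                self.ring.append((pos, server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get(self, key):
        """Walk clockwise from the key's position to the next server."""
        idx = bisect.bisect(self.ring, (self._hash(key),))
        if idx == len(self.ring):
            idx = 0  # wrap around the ring
        return self.ring[idx][1]

ring = HashRing(["cache-1", "cache-2", "cache-3"])
```

Removing a server only remaps the keys whose clockwise successor belonged to it; everything else stays put, which is the ~1/N disruption property.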
HAProxy
backend web_servers
    balance uri
    hash-type consistent
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
    server web3 10.0.1.12:80 check

Nginx (with hash directive)

upstream backend {
    hash $request_uri consistent;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}
Best For
- Caching layers (CDN, reverse proxy)
- Distributed storage systems
- Dynamic server pools
7. Least Response Time
Routes to server with lowest response time.
How It Works
Server A: avg 50ms response
Server B: avg 120ms response
Server C: avg 80ms response
Next request → Server A (fastest)
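One common way to track "fastest server" is an exponentially weighted moving average (EWMA) of observed response times, then pick the minimum. A hedged Python sketch (class name, `alpha`, and timings are illustrative):

```python
class ResponseTimeBalancer:
    """Track an EWMA of response times per backend and pick the fastest."""

    def __init__(self, servers, alpha=0.3):
        self.alpha = alpha
        self.avg_ms = {s: 0.0 for s in servers}

    def record(self, server, elapsed_ms):
        """Fold a new observation into the running average."""
        prev = self.avg_ms[server]
        self.avg_ms[server] = self.alpha * elapsed_ms + (1 - self.alpha) * prev

    def pick(self):
        """Route the next request to the lowest-average backend."""
        return min(self.avg_ms, key=self.avg_ms.get)

lb = ResponseTimeBalancer(["A", "B", "C"])
lb.record("A", 50)
lb.record("B", 120)
lb.record("C", 80)
```

Production implementations usually combine response time with outstanding-request counts to avoid herding onto one fast server.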
AWS ALB
# ALB's default algorithm is round robin; the alternative
# "least outstanding requests" algorithm (set on the target group)
# routes away from slow or busy targets
Nginx Plus (commercial)
upstream backend {
    least_time header;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}
Best For
- Geographically distributed servers
- Mixed performance servers
- Cloud environments with variable performance
8. Random
Randomly selects a server for each request.
Nginx
upstream backend {
    random;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}

With Two Choices (Power of Two Random Choices)

upstream backend {
    random two least_conn;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}
Best For
- Large server pools
- When simplicity is key
- Power of two random (good balance vs. complexity)
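The power-of-two-choices idea is simple enough to sketch: sample two backends at random, then send the request to whichever has fewer active connections. This avoids both the global scan of full least-connections and the worst-case pileups of pure random (connection counts below are illustrative):

```python
import random

def pick_two_choices(connections):
    """Power of two random choices: sample two backends,
    keep the one with fewer active connections."""
    a, b = random.sample(list(connections), 2)
    return a if connections[a] <= connections[b] else b

active = {"server-a": 12, "server-b": 3, "server-c": 8}
choice = pick_two_choices(active)
```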
9. URL Hash
Routes based on request URL.
HAProxy
backend web_servers
    balance uri
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check

Nginx

upstream backend {
    hash $request_uri;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}
Best For
- Cache optimization
- Content-based routing
- Segment-specific handling
10. Header-Based / Cookie-Based
Routes based on HTTP headers or cookies.
Nginx - Cookie
upstream backend {
    hash $cookie_jsessionid;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

HAProxy - Header

backend web_servers
    balance hdr(X-Session-ID)
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
Best For
- Session affinity
- A/B testing
- User segmentation
Session Persistence (Sticky Sessions)
Cookie-Based Persistence
HAProxy - Insert Cookie
backend web_servers
    cookie SERVERID insert indirect nocache
    server web1 10.0.1.10:80 cookie web1 check
    server web2 10.0.1.11:80 cookie web2 check
Nginx Plus - Sticky Cookie

upstream backend {
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    # The sticky directive is Nginx Plus (commercial) only
    sticky cookie srv_id expires=1h domain=.example.com path=/;
}
AWS ALB
{
  "Type": "app_cookie",
  "AppCookieName": "JSESSIONID",
  "Duration": 86400
}
Source IP Persistence
Nginx
upstream backend {
    ip_hash;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

HAProxy

backend web_servers
    stick-table type ip size 1m expire 30m
    stick on src
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
Trade-offs
Pros
- No session replication needed
- Simpler application design
- Better cache locality
Cons
- Uneven load distribution
- Harder to scale down
- Server failure affects sessions
Health Checks
Active Health Checks
Load balancer actively probes backends.
HAProxy
backend web_servers
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    server web1 10.0.1.10:80 check inter 5s fall 3 rise 2
    # Check every 5s, fail after 3 failures, recover after 2 successes
Nginx

upstream backend {
    # max_fails/fail_timeout reacts to failures in live traffic,
    # so open-source Nginx is effectively doing passive checking here;
    # dedicated active probes require Nginx Plus
    server 10.0.1.10:80 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:80 max_fails=3 fail_timeout=30s;
}
AWS Target Group
{
  "HealthCheckProtocol": "HTTP",
  "HealthCheckPath": "/health",
  "HealthCheckIntervalSeconds": 30,
  "HealthCheckTimeoutSeconds": 5,
  "HealthyThresholdCount": 2,
  "UnhealthyThresholdCount": 3
}
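The fall/rise threshold logic these checkers share (e.g. HAProxy's `fall 3 rise 2`) can be sketched in Python; the class and probe values here are illustrative, not any balancer's actual implementation:

```python
class HealthTracker:
    """Mark a backend down after `fall` consecutive failed probes
    and up again after `rise` consecutive successes."""

    def __init__(self, fall=3, rise=2):
        self.fall, self.rise = fall, rise
        self.healthy = True
        self.failures = 0
        self.successes = 0

    def observe(self, probe_ok):
        """Fold one probe result into the state; return current health."""
        if probe_ok:
            self.failures = 0
            self.successes += 1
            if not self.healthy and self.successes >= self.rise:
                self.healthy = True
        else:
            self.successes = 0
            self.failures += 1
            if self.healthy and self.failures >= self.fall:
                self.healthy = False
        return self.healthy
```

Requiring consecutive results in both directions keeps a single flaky probe from flapping the backend in and out of rotation.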
Passive Health Checks
Monitor actual traffic for failures.
Nginx Plus

upstream backend {
    zone backend 64k;
    server 10.0.1.10:80 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:80 max_fails=3 fail_timeout=30s;
}
# max_fails/fail_timeout above is the passive mechanism (driven by
# real traffic); Nginx Plus's health_check directive, placed in the
# proxying location block, adds active probes on top
Health Check Best Practices
Dedicated health endpoint
Dedicated health endpoint

from flask import Flask

app = Flask(__name__)

@app.route('/health')
def health_check():
    # Check database connection
    # Check external dependencies
    # Return 200 if healthy, 503 if not
    return {'status': 'healthy'}, 200

Check critical dependencies
- Database connectivity
- Required external APIs
- Disk space
- Memory availability
Fast checks (< 1 second)
- Don’t perform expensive operations
- Cache dependency checks
Meaningful responses
- 200: Healthy
- 503: Unhealthy (temporarily unavailable)
- Different codes for different issues
Advanced Patterns
Blue-Green Deployment
Nginx - Switch Traffic
# Initially all traffic to blue
upstream backend {
    server blue.example.com:80;
    # nginx rejects weight=0, so mark the idle pool as down
    server green.example.com:80 down;
}

# After validation, switch to green
# upstream backend {
#     server blue.example.com:80 down;
#     server green.example.com:80;
# }
Canary Deployment
HAProxy - 10% to Canary
backend web_servers
    balance roundrobin
    server stable1 10.0.1.10:80 weight 9 check
    server stable2 10.0.1.11:80 weight 9 check
    server canary 10.0.1.50:80 weight 2 check
    # 2/(9+9+2) = 10% to canary
Nginx - Percentage Split
split_clients $request_id $variant {
    10%  canary;
    *    stable;
}

server {
    location / {
        proxy_pass http://$variant;
    }
}

upstream stable {
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

upstream canary {
    server 10.0.1.50:80;
}
Geographic Load Balancing
AWS Route 53 - Geolocation
{
  "Name": "example.com",
  "Type": "A",
  "SetIdentifier": "US-East",
  "GeoLocation": {"ContinentCode": "NA"},
  "ResourceRecords": [{"Value": "192.0.2.1"}]
},
{
  "Name": "example.com",
  "Type": "A",
  "SetIdentifier": "EU-West",
  "GeoLocation": {"ContinentCode": "EU"},
  "ResourceRecords": [{"Value": "198.51.100.1"}]
}
Content-Based Routing
HAProxy - Path-Based
frontend http_front
    bind *:80
    acl is_api path_beg /api/
    acl is_static path_beg /static/
    use_backend api_servers if is_api
    use_backend cdn_servers if is_static
    default_backend web_servers

backend api_servers
    balance leastconn
    server api1 10.0.2.10:8080 check

backend cdn_servers
    balance roundrobin
    server cdn1 10.0.3.10:80 check

backend web_servers
    balance roundrobin
    server web1 10.0.1.10:80 check
Nginx - Location Blocks
server {
    location /api/ {
        proxy_pass http://api_backend;
    }
    location /static/ {
        proxy_pass http://cdn_backend;
    }
    location / {
        proxy_pass http://web_backend;
    }
}
Rate Limiting
Nginx
limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

server {
    location /api/ {
        limit_req zone=one burst=20 nodelay;
        proxy_pass http://backend;
    }
}
HAProxy
backend web_servers
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    http-request deny if { sc_http_req_rate(0) gt 100 }
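Under the hood, both configs implement a variant of token-bucket or sliding-window limiting. A minimal token-bucket sketch in Python (rate and burst values mirror the Nginx example; a real implementation would use `time.monotonic()` instead of an explicit `now`):

```python
class TokenBucket:
    """Token bucket: refill `rate` tokens/second up to `burst`;
    each request consumes one token or is rejected."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at burst
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=10, burst=20)  # ~10 req/s, burst of 20
```

The burst parameter absorbs short spikes; sustained traffic above the refill rate gets rejected, matching `burst=20` with `rate=10r/s` above.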
Cloud Load Balancers
AWS Elastic Load Balancer
Application Load Balancer (ALB) - Layer 7
- Path-based routing
- Host-based routing
- HTTP header routing
- WebSocket support
- Ideal for microservices
Network Load Balancer (NLB) - Layer 4
- Ultra-low latency
- Static IP addresses
- Millions of requests per second
- TCP/UDP/TLS traffic
Gateway Load Balancer (GWLB)
- Third-party appliances
- Transparent inspection
- Firewall, IDS/IPS integration
Azure Load Balancer
Standard Load Balancer
- Layer 4 (TCP/UDP)
- High availability
- Health probes
- Outbound connections
Application Gateway
- Layer 7 (HTTP/HTTPS)
- WAF integration
- SSL termination
- URL-based routing
Google Cloud Load Balancer
Global Load Balancer
- Anycast IP
- Cross-region failover
- HTTP(S) load balancing
Regional Load Balancer
- Internal load balancing
- TCP/UDP load balancing
Monitoring and Metrics
Key Metrics
Request Metrics
- Requests per second
- Request latency (p50, p95, p99)
- Error rate (4xx, 5xx)
- Active connections
Backend Metrics
- Backend response time
- Backend error rate
- Health check status
- Connection pool usage
Load Balancer Metrics
- CPU utilization
- Network throughput
- Active flows
- Dropped connections
Example Prometheus Metrics
# Request rate
rate(http_requests_total[5m])
# Error rate
rate(http_requests_total{status=~"5.."}[5m])
# Latency percentiles (buckets must be rated over a window)
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Healthy backends
haproxy_backend_up
Best Practices
1. Choose the Right Algorithm
- Stateless apps: Round robin or least connections
- Stateful apps: Sticky sessions (IP hash, cookie)
- Caching: Consistent hashing or URL hash
- Mixed capacity: Weighted algorithms
- Long connections: Least connections
2. Implement Proper Health Checks
- Active and passive checks
- Check critical dependencies
- Fast response times (< 1s)
- Appropriate intervals and thresholds
3. Plan for Failure
- Graceful degradation
- Circuit breakers
- Automatic failover
- Backup pools
4. Monitor Everything
- Request rates and latencies
- Error rates
- Backend health
- Load balancer health
5. Use Connection Pooling
- Reuse backend connections
- Configure appropriate pool sizes
- Monitor pool exhaustion
6. Enable Logging
- Access logs for troubleshooting
- Error logs for failures
- Structured logging for analysis
7. SSL/TLS Offloading
- Terminate SSL at load balancer
- Reduce backend CPU usage
- Centralize certificate management
8. Gradual Rollouts
- Use weighted routing for deployments
- Canary releases for new versions
- Quick rollback capability
9. Geographic Distribution
- Route users to nearest data center
- Reduce latency
- Improve user experience
10. Regular Testing
- Load testing under realistic conditions
- Failover testing
- Capacity planning
Troubleshooting
Uneven Load Distribution
Symptoms: Some servers overloaded, others idle
Causes
- Long-lived connections with round robin
- Sticky sessions with IP hash
- Varying request complexity
Solutions
- Use least connections algorithm
- Implement connection timeouts
- Use consistent hashing
Session Loss on Server Failure
Symptoms: Users logged out when server fails
Causes
- Server-side sessions without replication
- Sticky sessions to failed server
Solutions
- Implement session replication
- Use external session store (Redis, Memcached)
- Client-side sessions (JWT tokens)
High Latency
Symptoms: Slow response times
Causes
- Unhealthy backends not removed
- Too many connections to backends
- Load balancer overhead
Solutions
- Tune health check parameters
- Adjust connection pooling
- Use Layer 4 instead of Layer 7 if content routing not needed
Connection Timeouts
Symptoms: Connections dropped or timeout errors
Causes
- Aggressive timeout settings
- Slow backend processing
- Network issues
Solutions
- Increase timeout values
- Optimize backend performance
- Check network connectivity
Conclusion
Load balancing is essential for building scalable, highly available systems. Key takeaways:
- Choose the right layer (L4 vs L7) based on routing needs
- Select appropriate algorithm for your traffic patterns
- Implement comprehensive health checks to detect failures
- Monitor continuously to detect issues early
- Plan for failure with graceful degradation
- Test thoroughly under realistic conditions
- Document configuration and changes
The “best” load balancing method depends on your specific requirements. There’s no one-size-fits-all solution. Start simple (round robin), measure, and optimize based on observed behavior.