Load balancing distributes traffic across multiple servers to improve availability, performance, and scalability of applications.
Why Load Balancing?
Benefits
- High availability: Continue service if servers fail
- Horizontal scalability: Add capacity by adding servers
- Performance: Distribute load to prevent overload
- Maintenance: Take servers offline without downtime
- Geographic distribution: Serve users from nearest location
Use Cases
- Web applications with high traffic
- API endpoints requiring scalability
- Database read replicas
- Microservices architectures
- Content delivery networks
Load Balancing Layers
Layer 4 (Transport Layer)
Routes based on IP address and TCP/UDP port.
Characteristics
- Fast (minimal packet inspection)
- Protocol-agnostic
- Cannot make content-based decisions
- Good for TCP/UDP traffic
Example: HAProxy L4
frontend tcp_front
    bind *:3306
    mode tcp
    default_backend mysql_servers

backend mysql_servers
    mode tcp
    balance roundrobin
    server mysql1 10.0.1.10:3306 check
    server mysql2 10.0.1.11:3306 check
Use Cases
- Database connections
- SMTP, IMAP mail servers
- Game servers
- VoIP/SIP traffic
Layer 7 (Application Layer)
Routes based on HTTP headers, cookies, URL paths, etc.
Characteristics
- Content-aware routing
- SSL termination
- Session persistence
- Slower (more processing)
Example: Nginx L7
upstream backend {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}

server {
    listen 80;
    server_name example.com;

    location /api/ {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
Use Cases
- Web applications
- REST APIs
- Microservices
- Content-based routing
Load Balancing Algorithms
1. Round Robin
Distributes requests sequentially across servers.
How It Works
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A
Request 5 → Server B
...
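The rotation above can be sketched in a few lines of Python (server names are illustrative, not from any real deployment):

```python
from itertools import cycle

# Hypothetical backend names for illustration
servers = ["server-a", "server-b", "server-c"]
picker = cycle(servers)

def next_server():
    """Return the next backend in strict rotation."""
    return next(picker)

# Requests 1-5 land on A, B, C, A, B
assignments = [next_server() for _ in range(5)]
```

Real balancers also skip servers that fail health checks, which this sketch omits.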
Configuration Examples
HAProxy
backend web_servers
    balance roundrobin
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
    server web3 10.0.1.12:80 check
Nginx
upstream backend {
    # Round robin is default
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}
AWS ALB (Application Load Balancer)
# Round robin is default behavior
# No specific configuration needed
Pros
- Simple and predictable
- Equal distribution (if requests are similar)
- No state required
Cons
- Doesn’t account for server capacity
- Doesn’t consider current load
- Can overload slower servers
Best For
- Homogeneous server pool
- Similar request processing times
- Stateless applications
2. Weighted Round Robin
Round robin with server capacity weights.
How It Works
Server A (weight 3): Gets 3 requests
Server B (weight 2): Gets 2 requests
Server C (weight 1): Gets 1 request
Request distribution:
A, A, A, B, B, C, A, A, A, B, B, C, ...
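A naive way to realize this distribution is to expand each server into the cycle `weight` times; production balancers use a smoother interleaving, but the per-cycle counts are the same. A small Python sketch:

```python
def weighted_rotation(weights):
    """Expand {server: weight} into one repeating cycle.

    Naive expansion: each server appears `weight` times per cycle.
    (Real balancers interleave, e.g. A, B, A, C, A, B, to avoid bursts.)
    """
    order = []
    for server, weight in weights.items():
        order.extend([server] * weight)
    return order

rotation = weighted_rotation({"A": 3, "B": 2, "C": 1})
```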
Configuration Examples
HAProxy
backend web_servers
    balance roundrobin
    server web1 10.0.1.10:80 weight 3 check
    server web2 10.0.1.11:80 weight 2 check
    server web3 10.0.1.12:80 weight 1 check
Nginx
upstream backend {
    server 10.0.1.10:80 weight=3;
    server 10.0.1.11:80 weight=2;
    server 10.0.1.12:80 weight=1;
}
AWS Target Group
{
  "Targets": [
    {"Id": "i-1234567890abcdef0", "Weight": 300},
    {"Id": "i-abcdef1234567890a", "Weight": 200},
    {"Id": "i-567890abcdef12345", "Weight": 100}
  ]
}
Best For
- Mixed server capacities
- Gradual deployment (canary releases)
- Cost optimization (use cheaper servers less)
3. Least Connections
Sends requests to server with fewest active connections.
How It Works
Server A: 5 connections
Server B: 3 connections
Server C: 7 connections
Next request → Server B (least connections)
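The selection step reduces to a minimum over the connection table. A minimal Python sketch (counts are illustrative):

```python
# Current active-connection counts per backend (illustrative)
active = {"server-a": 5, "server-b": 3, "server-c": 7}

def pick_least_connections(connections):
    """Choose the backend with the fewest active connections."""
    return min(connections, key=connections.get)

choice = pick_least_connections(active)  # server-b
```

A real balancer would increment the chosen server's count when the connection opens and decrement it on close.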
Configuration Examples
HAProxy
backend web_servers
    balance leastconn
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
    server web3 10.0.1.12:80 check

Nginx

upstream backend {
    least_conn;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}
Pros
- Adapts to varying request durations
- Prevents server overload
- Better for long-lived connections
Cons
- Requires connection tracking
- More complex than round robin
- Connection counts may not reflect actual load (some connections do far more work than others)
Best For
- WebSocket connections
- Long-polling applications
- Database connection pools
- Varying request processing times
4. Weighted Least Connections
Combines least connections with server weights.
HAProxy
backend web_servers
    balance leastconn
    server web1 10.0.1.10:80 weight 2 check
    server web2 10.0.1.11:80 weight 1 check
Best For
- Mixed server capacities with long connections
- WebSocket servers of different sizes
5. IP Hash / Source IP
Routes based on client IP address.
How It Works
hash(client_ip) % num_servers = server_index
Client 192.0.2.1 → hash → Server A (always)
Client 192.0.2.50 → hash → Server C (always)
Client 198.51.100.10 → hash → Server B (always)
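The `hash(client_ip) % num_servers` formula above can be sketched directly in Python; `md5` here is just a stable, well-mixed hash, not a security choice:

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def pick_by_ip(client_ip, pool):
    """Map a client IP to a fixed backend: hash(ip) % pool size."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return pool[int(digest, 16) % len(pool)]

# The same client always lands on the same server
first = pick_by_ip("192.0.2.1", servers)
second = pick_by_ip("192.0.2.1", servers)
```

Note the downside discussed below: if `len(pool)` changes, almost every client remaps, which is what consistent hashing fixes.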
Configuration Examples
HAProxy
backend web_servers
    balance source
    hash-type consistent
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
    server web3 10.0.1.12:80 check

Nginx

upstream backend {
    ip_hash;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}
Pros
- Session persistence (same client → same server)
- No session sharing needed
- Simple implementation
Cons
- Uneven distribution if clients are behind NAT
- Doesn’t adapt to server load
- Removing servers disrupts many sessions
Best For
- Applications with server-side sessions
- When session sharing is impractical
- Many well-distributed client IPs (a small client population hashes unevenly)
6. Consistent Hashing
Improved hash-based routing with minimal disruption.
How It Works
- Servers placed on a hash ring
- Client hashed to ring position
- Routed to next server clockwise
- Adding/removing servers affects only ~1/N of clients
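The ring mechanics above can be shown with a minimal Python hash ring (virtual nodes smooth out the distribution; server names and vnode count are illustrative):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (position, server)
        for server in servers:
            for i in range(vnodes):
                pos = self._hash(f"{server}#{i}")
                self.ring.append((pos, server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get(self, key):
        """Walk clockwise from the key's position to the next server."""
        idx = bisect.bisect(self.ring, (self._hash(key),))
        if idx == len(self.ring):
            idx = 0  # wrap around the ring
        return self.ring[idx][1]

ring = HashRing(["cache-1", "cache-2", "cache-3"])
```

Removing a server only remaps the keys whose clockwise successor belonged to it; everything else stays put, which is the ~1/N disruption property.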
HAProxy
backend web_servers
    balance uri
    hash-type consistent
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
    server web3 10.0.1.12:80 check

Nginx (with hash directive)

upstream backend {
    hash $request_uri consistent;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}
Best For
- Caching layers (CDN, reverse proxy)
- Distributed storage systems
- Dynamic server pools
7. Least Response Time
Routes to server with lowest response time.
How It Works
Server A: avg 50ms response
Server B: avg 120ms response
Server C: avg 80ms response
Next request → Server A (fastest)
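One common way to track "fastest server" is an exponentially weighted moving average (EWMA) of observed response times, then pick the minimum. A hedged Python sketch (class name, `alpha`, and timings are illustrative):

```python
class ResponseTimeBalancer:
    """Track an EWMA of response times per backend and pick the fastest."""

    def __init__(self, servers, alpha=0.3):
        self.alpha = alpha
        self.avg_ms = {s: 0.0 for s in servers}

    def record(self, server, elapsed_ms):
        """Fold a new observation into the running average."""
        prev = self.avg_ms[server]
        self.avg_ms[server] = self.alpha * elapsed_ms + (1 - self.alpha) * prev

    def pick(self):
        """Route the next request to the lowest-average backend."""
        return min(self.avg_ms, key=self.avg_ms.get)

lb = ResponseTimeBalancer(["A", "B", "C"])
lb.record("A", 50)
lb.record("B", 120)
lb.record("C", 80)
```

Production implementations usually combine response time with outstanding-request counts to avoid herding onto one fast server.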
AWS ALB
# ALB's default algorithm is round robin; the alternative
# "least outstanding requests" algorithm (set on the target group)
# routes away from slow or busy targets
Nginx Plus (commercial)
upstream backend {
    least_time header;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}
Best For
- Geographically distributed servers
- Mixed performance servers
- Cloud environments with variable performance
8. Random
Randomly selects a server for each request.
Nginx
upstream backend {
    random;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}

With Two Choices (Power of Two Random Choices)

upstream backend {
    random two least_conn;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}
Best For
- Large server pools
- When simplicity is key
- Power of two random (good balance vs. complexity)
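The power-of-two-choices idea is simple enough to sketch: sample two backends at random, then send the request to whichever has fewer active connections. This avoids both the global scan of full least-connections and the worst-case pileups of pure random (connection counts below are illustrative):

```python
import random

def pick_two_choices(connections):
    """Power of two random choices: sample two backends,
    keep the one with fewer active connections."""
    a, b = random.sample(list(connections), 2)
    return a if connections[a] <= connections[b] else b

active = {"server-a": 12, "server-b": 3, "server-c": 8}
choice = pick_two_choices(active)
```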
9. URL Hash
Routes based on request URL.
HAProxy
backend web_servers
    balance uri
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check

Nginx

upstream backend {
    hash $request_uri;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}
Best For
- Cache optimization
- Content-based routing
- Segment-specific handling
10. Header-Based / Cookie-Based
Routes based on HTTP headers or cookies.
Nginx - Cookie
upstream backend {
    hash $cookie_jsessionid;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

HAProxy - Header

backend web_servers
    balance hdr(X-Session-ID)
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
Best For
- Session affinity
- A/B testing
- User segmentation
Session Persistence (Sticky Sessions)
Cookie-Based Persistence
HAProxy - Insert Cookie
backend web_servers
    cookie SERVERID insert indirect nocache
    server web1 10.0.1.10:80 cookie web1 check
    server web2 10.0.1.11:80 cookie web2 check
Nginx Plus - Sticky Cookie

upstream backend {
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    # The sticky directive is Nginx Plus (commercial) only
    sticky cookie srv_id expires=1h domain=.example.com path=/;
}
AWS ALB
{
  "Type": "app_cookie",
  "AppCookieName": "JSESSIONID",
  "Duration": 86400
}
Source IP Persistence
Nginx
upstream backend {
    ip_hash;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

HAProxy

backend web_servers
    stick-table type ip size 1m expire 30m
    stick on src
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
Trade-offs
Pros
- No session replication needed
- Simpler application design
- Better cache locality
Cons
- Uneven load distribution
- Harder to scale down
- Server failure affects sessions
Health Checks
Active Health Checks
Load balancer actively probes backends.
HAProxy
backend web_servers
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    server web1 10.0.1.10:80 check inter 5s fall 3 rise 2
    # Check every 5s, fail after 3 failures, recover after 2 successes
Nginx

upstream backend {
    # max_fails/fail_timeout reacts to failures in live traffic,
    # so open-source Nginx is effectively doing passive checking here;
    # dedicated active probes require Nginx Plus
    server 10.0.1.10:80 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:80 max_fails=3 fail_timeout=30s;
}
AWS Target Group
{
  "HealthCheckProtocol": "HTTP",
  "HealthCheckPath": "/health",
  "HealthCheckIntervalSeconds": 30,
  "HealthCheckTimeoutSeconds": 5,
  "HealthyThresholdCount": 2,
  "UnhealthyThresholdCount": 3
}
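The fall/rise threshold logic these checkers share (e.g. HAProxy's `fall 3 rise 2`) can be sketched in Python; the class and probe values here are illustrative, not any balancer's actual implementation:

```python
class HealthTracker:
    """Mark a backend down after `fall` consecutive failed probes
    and up again after `rise` consecutive successes."""

    def __init__(self, fall=3, rise=2):
        self.fall, self.rise = fall, rise
        self.healthy = True
        self.failures = 0
        self.successes = 0

    def observe(self, probe_ok):
        """Fold one probe result into the state; return current health."""
        if probe_ok:
            self.failures = 0
            self.successes += 1
            if not self.healthy and self.successes >= self.rise:
                self.healthy = True
        else:
            self.successes = 0
            self.failures += 1
            if self.healthy and self.failures >= self.fall:
                self.healthy = False
        return self.healthy
```

Requiring consecutive results in both directions keeps a single flaky probe from flapping the backend in and out of rotation.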
Passive Health Checks
Monitor actual traffic for failures.
Nginx Plus

upstream backend {
    zone backend 64k;
    server 10.0.1.10:80 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:80 max_fails=3 fail_timeout=30s;
}
# max_fails/fail_timeout above is the passive mechanism (driven by
# real traffic); Nginx Plus's health_check directive, placed in the
# proxying location block, adds active probes on top
Health Check Best Practices
Dedicated health endpoint
Dedicated health endpoint

from flask import Flask

app = Flask(__name__)

@app.route('/health')
def health_check():
    # Check database connection
    # Check external dependencies
    # Return 200 if healthy, 503 if not
    return {'status': 'healthy'}, 200

Check critical dependencies
- Database connectivity
- Required external APIs
- Disk space
- Memory availability
Fast checks (< 1 second)
- Don’t perform expensive operations
- Cache dependency checks
Meaningful responses
- 200: Healthy
- 503: Unhealthy (temporarily unavailable)
- Different codes for different issues
Advanced Patterns
Blue-Green Deployment
Nginx - Switch Traffic
# Initially all traffic to blue
upstream backend {
    server blue.example.com:80;
    # nginx rejects weight=0, so mark the idle pool as down
    server green.example.com:80 down;
}

# After validation, switch to green
# upstream backend {
#     server blue.example.com:80 down;
#     server green.example.com:80;
# }
Canary Deployment
HAProxy - 10% to Canary
backend web_servers
    balance roundrobin
    server stable1 10.0.1.10:80 weight 9 check
    server stable2 10.0.1.11:80 weight 9 check
    server canary 10.0.1.50:80 weight 2 check
    # 2/(9+9+2) = 10% to canary
Nginx - Percentage Split
split_clients $request_id $variant {
    10%  canary;
    *    stable;
}

server {
    location / {
        proxy_pass http://$variant;
    }
}

upstream stable {
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

upstream canary {
    server 10.0.1.50:80;
}
Geographic Load Balancing
AWS Route 53 - Geolocation
{
  "Name": "example.com",
  "Type": "A",
  "SetIdentifier": "US-East",
  "GeoLocation": {"ContinentCode": "NA"},
  "ResourceRecords": [{"Value": "192.0.2.1"}]
},
{
  "Name": "example.com",
  "Type": "A",
  "SetIdentifier": "EU-West",
  "GeoLocation": {"ContinentCode": "EU"},
  "ResourceRecords": [{"Value": "198.51.100.1"}]
}
Content-Based Routing
HAProxy - Path-Based
frontend http_front
    bind *:80
    acl is_api path_beg /api/
    acl is_static path_beg /static/
    use_backend api_servers if is_api
    use_backend cdn_servers if is_static
    default_backend web_servers

backend api_servers
    balance leastconn
    server api1 10.0.2.10:8080 check

backend cdn_servers
    balance roundrobin
    server cdn1 10.0.3.10:80 check

backend web_servers
    balance roundrobin
    server web1 10.0.1.10:80 check
Nginx - Location Blocks
server {
    location /api/ {
        proxy_pass http://api_backend;
    }
    location /static/ {
        proxy_pass http://cdn_backend;
    }
    location / {
        proxy_pass http://web_backend;
    }
}
Rate Limiting
Nginx
limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

server {
    location /api/ {
        limit_req zone=one burst=20 nodelay;
        proxy_pass http://backend;
    }
}
HAProxy
backend web_servers
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    http-request deny if { sc_http_req_rate(0) gt 100 }
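Under the hood, both configs implement a variant of token-bucket or sliding-window limiting. A minimal token-bucket sketch in Python (rate and burst values mirror the Nginx example; a real implementation would use `time.monotonic()` instead of an explicit `now`):

```python
class TokenBucket:
    """Token bucket: refill `rate` tokens/second up to `burst`;
    each request consumes one token or is rejected."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at burst
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=10, burst=20)  # ~10 req/s, burst of 20
```

The burst parameter absorbs short spikes; sustained traffic above the refill rate gets rejected, matching `burst=20` with `rate=10r/s` above.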
Cloud Load Balancers
AWS Elastic Load Balancer
Application Load Balancer (ALB) - Layer 7
- Path-based routing
- Host-based routing
- HTTP header routing
- WebSocket support
- Ideal for microservices
Network Load Balancer (NLB) - Layer 4
- Ultra-low latency
- Static IP addresses
- Millions of requests per second
- TCP/UDP/TLS traffic
Gateway Load Balancer (GWLB)
- Third-party appliances
- Transparent inspection
- Firewall, IDS/IPS integration
Azure Load Balancer
Standard Load Balancer
- Layer 4 (TCP/UDP)
- High availability
- Health probes
- Outbound connections
Application Gateway
- Layer 7 (HTTP/HTTPS)
- WAF integration
- SSL termination
- URL-based routing
Google Cloud Load Balancer
Global Load Balancer
- Anycast IP
- Cross-region failover
- HTTP(S) load balancing
Regional Load Balancer
- Internal load balancing
- TCP/UDP load balancing
Monitoring and Metrics
Key Metrics
Request Metrics
- Requests per second
- Request latency (p50, p95, p99)
- Error rate (4xx, 5xx)
- Active connections
Backend Metrics
- Backend response time
- Backend error rate
- Health check status
- Connection pool usage
Load Balancer Metrics
- CPU utilization
- Network throughput
- Active flows
- Dropped connections
Example Prometheus Metrics
# Request rate
rate(http_requests_total[5m])
# Error rate
rate(http_requests_total{status=~"5.."}[5m])
# Latency percentiles (buckets must be rated over a window)
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Healthy backends
haproxy_backend_up
Best Practices
1. Choose the Right Algorithm
- Stateless apps: Round robin or least connections
- Stateful apps: Sticky sessions (IP hash, cookie)
- Caching: Consistent hashing or URL hash
- Mixed capacity: Weighted algorithms
- Long connections: Least connections
2. Implement Proper Health Checks
- Active and passive checks
- Check critical dependencies
- Fast response times (< 1s)
- Appropriate intervals and thresholds
3. Plan for Failure
- Graceful degradation
- Circuit breakers
- Automatic failover
- Backup pools
4. Monitor Everything
- Request rates and latencies
- Error rates
- Backend health
- Load balancer health
5. Use Connection Pooling
- Reuse backend connections
- Configure appropriate pool sizes
- Monitor pool exhaustion
6. Enable Logging
- Access logs for troubleshooting
- Error logs for failures
- Structured logging for analysis
7. SSL/TLS Offloading
- Terminate SSL at load balancer
- Reduce backend CPU usage
- Centralize certificate management
8. Gradual Rollouts
- Use weighted routing for deployments
- Canary releases for new versions
- Quick rollback capability
9. Geographic Distribution
- Route users to nearest data center
- Reduce latency
- Improve user experience
10. Regular Testing
- Load testing under realistic conditions
- Failover testing
- Capacity planning
Troubleshooting
Uneven Load Distribution
Symptoms: Some servers overloaded, others idle
Causes
- Long-lived connections with round robin
- Sticky sessions with IP hash
- Varying request complexity
Solutions
- Use least connections algorithm
- Implement connection timeouts
- Use consistent hashing
Session Loss on Server Failure
Symptoms: Users logged out when server fails
Causes
- Server-side sessions without replication
- Sticky sessions to failed server
Solutions
- Implement session replication
- Use external session store (Redis, Memcached)
- Client-side sessions (JWT tokens)
High Latency
Symptoms: Slow response times
Causes
- Unhealthy backends not removed
- Too many connections to backends
- Load balancer overhead
Solutions
- Tune health check parameters
- Adjust connection pooling
- Use Layer 4 instead of Layer 7 if content routing not needed
Connection Timeouts
Symptoms: Connections dropped or timeout errors
Causes
- Aggressive timeout settings
- Slow backend processing
- Network issues
Solutions
- Increase timeout values
- Optimize backend performance
- Check network connectivity
Conclusion
Load balancing is essential for building scalable, highly available systems. Key takeaways:
- Choose the right layer (L4 vs L7) based on routing needs
- Select appropriate algorithm for your traffic patterns
- Implement comprehensive health checks to detect failures
- Monitor continuously to detect issues early
- Plan for failure with graceful degradation
- Test thoroughly under realistic conditions
- Document configuration and changes
The “best” load balancing method depends on your specific requirements. There’s no one-size-fits-all solution. Start simple (round robin), measure, and optimize based on observed behavior.