Load Balancing Methods

November 10, 2025

Comprehensive guide to load balancing algorithms, strategies, and implementation patterns

Load balancing distributes traffic across multiple servers to improve availability, performance, and scalability of applications.

Why Load Balancing?

Benefits

  • High availability: Continue service if servers fail
  • Horizontal scalability: Add capacity by adding servers
  • Performance: Distribute load to prevent overload
  • Maintenance: Take servers offline without downtime
  • Geographic distribution: Serve users from nearest location

Use Cases

  • Web applications with high traffic
  • API endpoints requiring scalability
  • Database read replicas
  • Microservices architectures
  • Content delivery networks

Load Balancing Layers

Layer 4 (Transport Layer)

Routes based on IP address and TCP/UDP port.

Characteristics

  • Fast (minimal packet inspection)
  • Protocol-agnostic
  • Cannot make content-based decisions
  • Good for TCP/UDP traffic

Example: HAProxy L4

frontend tcp_front
    bind *:3306
    mode tcp
    default_backend mysql_servers

backend mysql_servers
    mode tcp
    balance roundrobin
    server mysql1 10.0.1.10:3306 check
    server mysql2 10.0.1.11:3306 check

Use Cases

  • Database connections
  • SMTP, IMAP mail servers
  • Game servers
  • VoIP/SIP traffic

Layer 7 (Application Layer)

Routes based on HTTP headers, cookies, URL paths, etc.

Characteristics

  • Content-aware routing
  • SSL termination
  • Session persistence
  • Slower (more processing)

Example: Nginx L7

upstream backend {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}

server {
    listen 80;
    server_name example.com;

    location /api/ {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Use Cases

  • Web applications
  • REST APIs
  • Microservices
  • Content-based routing

Load Balancing Algorithms

1. Round Robin

Distributes requests sequentially across servers.

How It Works

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A
Request 5 → Server B
...
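The rotation can be sketched in a few lines of Python (illustrative only; the server names are placeholders, not tied to any particular load balancer):

```python
from itertools import cycle

class RoundRobin:
    """Cycle through servers in order, one per request."""
    def __init__(self, servers):
        self._it = cycle(servers)

    def pick(self):
        return next(self._it)

lb = RoundRobin(["A", "B", "C"])
picks = [lb.pick() for _ in range(5)]
# picks == ["A", "B", "C", "A", "B"]
```

Note there is no per-server state to track, which is why round robin is the default almost everywhere.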

Configuration Examples

HAProxy

backend web_servers
    balance roundrobin
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
    server web3 10.0.1.12:80 check

Nginx

upstream backend {
    # Round robin is default
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}

AWS ALB (Application Load Balancer)

# Round robin is default behavior
# No specific configuration needed

Pros

  • Simple and predictable
  • Equal distribution (if requests are similar)
  • No state required

Cons

  • Doesn’t account for server capacity
  • Doesn’t consider current load
  • Can overload slower servers

Best For

  • Homogeneous server pool
  • Similar request processing times
  • Stateless applications

2. Weighted Round Robin

Round robin with server capacity weights.

How It Works

Server A (weight 3): Gets 3 requests
Server B (weight 2): Gets 2 requests
Server C (weight 1): Gets 1 request

Request distribution (one simple expansion; many implementations
interleave the sequence instead of bunching it):
A, A, A, B, B, C, A, A, A, B, B, C, ...
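Nginx uses a "smooth" weighted round robin that spreads the heavier servers through the cycle rather than sending bursts to one server. A Python sketch of that algorithm (server names and weights mirror the example above):

```python
class SmoothWeightedRR:
    """Smooth weighted round robin (the nginx approach): each pick,
    every server gains its weight, the largest current value wins,
    and the winner is penalized by the total weight."""
    def __init__(self, weights):              # e.g. {"A": 3, "B": 2, "C": 1}
        self.weights = dict(weights)
        self.current = {s: 0 for s in weights}

    def pick(self):
        total = sum(self.weights.values())
        for s in self.current:
            self.current[s] += self.weights[s]
        best = max(self.current, key=self.current.get)
        self.current[best] -= total
        return best

lb = SmoothWeightedRR({"A": 3, "B": 2, "C": 1})
seq = [lb.pick() for _ in range(6)]
# One full cycle serves A three times, B twice, C once, interleaved:
# seq == ["A", "B", "A", "C", "B", "A"]
```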

Configuration Examples

HAProxy

backend web_servers
    balance roundrobin
    server web1 10.0.1.10:80 weight 3 check
    server web2 10.0.1.11:80 weight 2 check
    server web3 10.0.1.12:80 weight 1 check

Nginx

upstream backend {
    server 10.0.1.10:80 weight=3;
    server 10.0.1.11:80 weight=2;
    server 10.0.1.12:80 weight=1;
}

AWS Target Group

{
  "Targets": [
    {"Id": "i-1234567890abcdef0", "Weight": 300},
    {"Id": "i-abcdef1234567890a", "Weight": 200},
    {"Id": "i-567890abcdef12345", "Weight": 100}
  ]
}

Best For

  • Mixed server capacities
  • Gradual deployment (canary releases)
  • Cost optimization (use cheaper servers less)

3. Least Connections

Sends requests to server with fewest active connections.

How It Works

Server A: 5 connections
Server B: 3 connections
Server C: 7 connections

Next request → Server B (least connections)
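The selection is just a minimum over live connection counts; a minimal Python sketch (the counts mirror the example above):

```python
class LeastConnections:
    """Route each request to the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        # min() breaks ties by the insertion order of the server list
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1

lb = LeastConnections(["A", "B", "C"])
lb.active.update({"A": 5, "B": 3, "C": 7})   # simulate in-flight load
assert lb.acquire() == "B"                   # fewest connections wins
```

Unlike round robin, this requires the balancer to track every open connection, which is the source of the extra complexity noted below.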

Configuration Examples

HAProxy

backend web_servers
    balance leastconn
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
    server web3 10.0.1.12:80 check

Nginx

upstream backend {
    least_conn;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}

Pros

  • Adapts to varying request durations
  • Prevents server overload
  • Better for long-lived connections

Cons

  • Requires connection tracking
  • More complex than round robin
  • Connection counts only approximate load; per-connection work can still differ

Best For

  • WebSocket connections
  • Long-polling applications
  • Database connection pools
  • Varying request processing times

4. Weighted Least Connections

Combines least connections with server weights.

HAProxy

backend web_servers
    balance leastconn
    server web1 10.0.1.10:80 weight 2 check
    server web2 10.0.1.11:80 weight 1 check

Best For

  • Mixed server capacities with long connections
  • WebSocket servers of different sizes

5. IP Hash / Source IP

Routes based on client IP address.

How It Works

hash(client_ip) % num_servers = server_index

Client 192.0.2.1  → hash → Server A (always)
Client 192.0.2.50 → hash → Server C (always)
Client 198.51.100.10 → hash → Server B (always)
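A Python sketch of the modulo scheme above; md5 stands in for the balancer's hash function, since Python's built-in hash() is randomized per process for strings:

```python
import hashlib

def pick_server(client_ip, servers):
    """hash(client_ip) % num_servers, with a stable hash so the
    same client always maps to the same server."""
    digest = hashlib.md5(client_ip.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]

servers = ["A", "B", "C"]
# Same client, same server, every time:
assert pick_server("192.0.2.1", servers) == pick_server("192.0.2.1", servers)
```

The modulo is also the weakness: changing `len(servers)` remaps almost every client, which is what consistent hashing (below) fixes.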

Configuration Examples

HAProxy

backend web_servers
    balance source
    hash-type consistent
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
    server web3 10.0.1.12:80 check

Nginx

upstream backend {
    ip_hash;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}

Pros

  • Session persistence (same client → same server)
  • No session sharing needed
  • Simple implementation

Cons

  • Uneven distribution if clients are behind NAT
  • Doesn’t adapt to server load
  • Removing servers disrupts many sessions

Best For

  • Applications with server-side sessions
  • When session sharing is impractical
  • Small number of distinct clients

6. Consistent Hashing

Improved hash-based routing with minimal disruption.

How It Works

  • Servers placed on a hash ring
  • Client hashed to ring position
  • Routed to next server clockwise
  • Adding/removing servers affects only ~1/N of clients
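A minimal ring with virtual nodes, sketched in Python (md5 and the 100-replica count are illustrative choices, not a specific implementation):

```python
import bisect
import hashlib

def _hash(key):
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Consistent hash ring with virtual nodes: removing a server
    only remaps the keys that pointed at it."""
    def __init__(self, servers, replicas=100):
        self.ring = sorted((_hash(f"{s}#{i}"), s)
                           for s in servers for i in range(replicas))
        self.keys = [h for h, _ in self.ring]

    def pick(self, key):
        # First virtual node clockwise from the key's hash position.
        i = bisect.bisect(self.keys, _hash(key)) % len(self.keys)
        return self.ring[i][1]

ring = HashRing(["A", "B", "C"])
owners = {f"client-{n}": ring.pick(f"client-{n}") for n in range(1000)}
shrunk = HashRing(["A", "B"])                 # remove server C
# Keys that lived on A or B do not move:
assert all(shrunk.pick(k) == s for k, s in owners.items() if s != "C")
```

Virtual nodes (the `replicas` factor) keep the distribution even; with one point per server the ring segments would be badly skewed.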

HAProxy

backend web_servers
    balance uri
    hash-type consistent
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check
    server web3 10.0.1.12:80 check

Nginx (with hash directive)

upstream backend {
    hash $request_uri consistent;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}

Best For

  • Caching layers (CDN, reverse proxy)
  • Distributed storage systems
  • Dynamic server pools

7. Least Response Time

Routes to server with lowest response time.

How It Works

Server A: avg 50ms response
Server B: avg 120ms response
Server C: avg 80ms response

Next request → Server A (fastest)
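One common way to implement this is an exponentially weighted moving average of response times per server; a sketch in Python (the alpha value and timings are illustrative assumptions):

```python
class LeastResponseTime:
    """Track an EWMA of response times per server and route to the
    current fastest. Unsampled servers default to 0 and are tried first."""
    def __init__(self, servers, alpha=0.2):
        self.avg = {s: 0.0 for s in servers}
        self.alpha = alpha

    def record(self, server, millis):
        prev = self.avg[server]
        self.avg[server] = millis if prev == 0 else (
            self.alpha * millis + (1 - self.alpha) * prev)

    def pick(self):
        return min(self.avg, key=self.avg.get)

lb = LeastResponseTime(["A", "B", "C"])
for server, ms in [("A", 50), ("B", 120), ("C", 80)]:
    lb.record(server, ms)
assert lb.pick() == "A"   # lowest average response time wins
```

The EWMA smooths out one-off slow requests so the balancer does not thrash between servers on every sample.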

AWS ALB

# Default behavior considers response time
# No explicit configuration needed

Nginx Plus (commercial)

upstream backend {
    least_time header;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

Best For

  • Geographically distributed servers
  • Mixed performance servers
  • Cloud environments with variable performance

8. Random

Randomly selects a server for each request.

Nginx

upstream backend {
    random;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}

With Two Choices (Power of Two Random Choices)

upstream backend {
    random two least_conn;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}
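The "two choices" trick samples two servers at random and takes the less loaded of the pair, which avoids both global coordination and the worst-case pile-ups of pure random. A Python sketch (server names and counts are placeholders):

```python
import random

def pick_p2c(connections):
    """Power of two random choices: sample two distinct servers
    at random, route to the less loaded of the pair."""
    a, b = random.sample(list(connections), 2)
    return a if connections[a] <= connections[b] else b

connections = {"A": 5, "B": 3, "C": 7}
picks = {pick_p2c(connections) for _ in range(100)}
# "C" (most loaded) loses every duel it is sampled into,
# so it is never chosen here:
assert picks <= {"A", "B"}
```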

Best For

  • Large server pools
  • When simplicity is key
  • Power of two random (good balance vs. complexity)

9. URL Hash

Routes based on request URL.

HAProxy

backend web_servers
    balance uri
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check

Nginx

upstream backend {
    hash $request_uri;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

Best For

  • Cache optimization
  • Content-based routing
  • Segment-specific handling

10. Header/Cookie Hash

Routes based on HTTP headers or cookies.

Nginx - Cookie

upstream backend {
    hash $cookie_jsessionid;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

HAProxy - Header

backend web_servers
    balance hdr(X-Session-ID)
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check

Best For

  • Session affinity
  • A/B testing
  • User segmentation

Session Persistence (Sticky Sessions)

HAProxy - Insert Cookie

backend web_servers
    cookie SERVERID insert indirect nocache
    server web1 10.0.1.10:80 cookie web1 check
    server web2 10.0.1.11:80 cookie web2 check

Nginx Plus - Sticky Cookie

The sticky directive requires the commercial Nginx Plus; open-source Nginx can approximate stickiness with ip_hash or hash.

upstream backend {
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    sticky cookie srv_id expires=1h domain=.example.com path=/;
}

AWS ALB

{
  "Type": "app_cookie",
  "AppCookieName": "JSESSIONID",
  "Duration": 86400
}

Source IP Persistence

Nginx

upstream backend {
    ip_hash;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

HAProxy

backend web_servers
    stick-table type ip size 1m expire 30m
    stick on src
    server web1 10.0.1.10:80 check
    server web2 10.0.1.11:80 check

Trade-offs

Pros

  • No session replication needed
  • Simpler application design
  • Better cache locality

Cons

  • Uneven load distribution
  • Harder to scale down
  • Server failure affects sessions

Health Checks

Active Health Checks

Load balancer actively probes backends.

HAProxy

backend web_servers
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    server web1 10.0.1.10:80 check inter 5s fall 3 rise 2
    # Check every 5s, fail after 3 failures, recover after 2 successes
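The inter/fall/rise knobs implement a small hysteresis state machine; a Python sketch of that logic (thresholds mirror the HAProxy config above):

```python
class HealthState:
    """Fall/rise hysteresis for active health checks: a server is
    marked DOWN after `fall` consecutive failed probes and UP again
    after `rise` consecutive successful ones."""
    def __init__(self, fall=3, rise=2):
        self.fall, self.rise = fall, rise
        self.up = True
        self._streak = 0   # consecutive probes contradicting current state

    def probe(self, ok):
        if ok == self.up:
            self._streak = 0          # result agrees with current state
            return self.up
        self._streak += 1
        threshold = self.rise if not self.up else self.fall
        if self._streak >= threshold:
            self.up = not self.up
            self._streak = 0
        return self.up

s = HealthState(fall=3, rise=2)
results = [s.probe(ok) for ok in (False, False, False, True, True)]
# Stays up through two failures, down on the third,
# back up after two consecutive successes:
assert results == [True, True, False, False, True]
```

The hysteresis prevents a single dropped probe from flapping a healthy server in and out of the pool.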

Nginx Plus

Active probing requires the commercial health_check directive; open-source Nginx only supports passive checks via max_fails/fail_timeout.

upstream backend {
    zone backend 64k;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

server {
    location / {
        proxy_pass http://backend;
        health_check interval=5s fails=3 passes=2;
    }
}

AWS Target Group

{
  "HealthCheckProtocol": "HTTP",
  "HealthCheckPath": "/health",
  "HealthCheckIntervalSeconds": 30,
  "HealthCheckTimeoutSeconds": 5,
  "HealthyThresholdCount": 2,
  "UnhealthyThresholdCount": 3
}

Passive Health Checks

Monitor actual traffic for failures instead of sending dedicated probes.

Nginx

Open-source Nginx checks passively: after max_fails errors within fail_timeout, the server is skipped for fail_timeout, then retried with live traffic.

upstream backend {
    server 10.0.1.10:80 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:80 max_fails=3 fail_timeout=30s;
}

Health Check Best Practices

  1. Dedicated health endpoint

    from flask import Flask

    app = Flask(__name__)

    @app.route('/health')
    def health_check():
        # Verify critical dependencies here (database connection,
        # required external APIs); return 200 if healthy, 503 if not.
        return {'status': 'healthy'}, 200
    
  2. Check critical dependencies

    • Database connectivity
    • Required external APIs
    • Disk space
    • Memory availability
  3. Fast checks (< 1 second)

    • Don’t perform expensive operations
    • Cache dependency checks
  4. Meaningful responses

    • 200: Healthy
    • 503: Unhealthy (temporarily unavailable)
    • Different codes for different issues

Advanced Patterns

Blue-Green Deployment

Nginx - Switch Traffic

# Initially all traffic to blue
# (nginx upstream weights must be positive, so drain with "down")
upstream backend {
    server blue.example.com:80;
    server green.example.com:80 down;
}

# After validation, switch to green
# upstream backend {
#     server blue.example.com:80 down;
#     server green.example.com:80;
# }

Canary Deployment

HAProxy - 10% to Canary

backend web_servers
    balance roundrobin
    server stable1 10.0.1.10:80 weight 9 check
    server stable2 10.0.1.11:80 weight 9 check
    server canary  10.0.1.50:80 weight 2 check
    # 2/(9+9+2) = 10% to canary

Nginx - Percentage Split

split_clients $request_id $variant {
    10%     canary;
    *       stable;
}

server {
    location / {
        proxy_pass http://$variant;
    }
}

upstream stable {
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

upstream canary {
    server 10.0.1.50:80;
}
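split_clients hashes a key and maps fixed percentage bands to variants; the same deterministic split can be sketched in Python (md5 and the 10% figure are illustrative assumptions):

```python
import hashlib

def variant(request_id, canary_percent=10):
    """Deterministic traffic split, in the spirit of nginx's
    split_clients: hash the request id and route a fixed
    percentage of the hash space to the canary."""
    h = int.from_bytes(hashlib.md5(request_id.encode()).digest()[:4], "big")
    return "canary" if h % 100 < canary_percent else "stable"

counts = {"canary": 0, "stable": 0}
for n in range(10_000):
    counts[variant(f"req-{n}")] += 1
# Roughly 10% of requests land on the canary, and any given
# request id always maps to the same variant.
assert 700 < counts["canary"] < 1300
```

Hashing a stable key (user id, session cookie) instead of `$request_id` would pin each user to one variant across requests.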

Geographic Load Balancing

AWS Route 53 - Geolocation

{
  "Name": "example.com",
  "Type": "A",
  "SetIdentifier": "US-East",
  "GeoLocation": {"ContinentCode": "NA"},
  "ResourceRecords": [{"Value": "192.0.2.1"}]
},
{
  "Name": "example.com",
  "Type": "A",
  "SetIdentifier": "EU-West",
  "GeoLocation": {"ContinentCode": "EU"},
  "ResourceRecords": [{"Value": "198.51.100.1"}]
}

Content-Based Routing

HAProxy - Path-Based

frontend http_front
    bind *:80
    acl is_api path_beg /api/
    acl is_static path_beg /static/

    use_backend api_servers if is_api
    use_backend cdn_servers if is_static
    default_backend web_servers

backend api_servers
    balance leastconn
    server api1 10.0.2.10:8080 check

backend cdn_servers
    balance roundrobin
    server cdn1 10.0.3.10:80 check

backend web_servers
    balance roundrobin
    server web1 10.0.1.10:80 check

Nginx - Location Blocks

server {
    location /api/ {
        proxy_pass http://api_backend;
    }

    location /static/ {
        proxy_pass http://cdn_backend;
    }

    location / {
        proxy_pass http://web_backend;
    }
}

Rate Limiting

Nginx

limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

server {
    location /api/ {
        limit_req zone=one burst=20 nodelay;
        proxy_pass http://backend;
    }
}

HAProxy

backend web_servers
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    http-request deny if { sc_http_req_rate(0) gt 100 }
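Both configs are variants of the token-bucket model: a sustained rate plus a burst allowance. A Python sketch (the rate and burst mirror the Nginx config above; this is an illustration, not either balancer's actual implementation):

```python
import time

class TokenBucket:
    """Token bucket: sustained `rate` tokens/second, bursts up to `burst`."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = 0.0   # monotonic timestamp of the last refill

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

tb = TokenBucket(rate=10, burst=20)                 # 10 r/s, burst of 20
allowed = sum(tb.allow(now=0.0) for _ in range(25))
assert allowed == 20        # burst absorbed, then throttled
assert tb.allow(now=0.1)    # 0.1 s later, one token has refilled
```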

Cloud Load Balancers

AWS Elastic Load Balancer

Application Load Balancer (ALB) - Layer 7

  • Path-based routing
  • Host-based routing
  • HTTP header routing
  • WebSocket support
  • Ideal for microservices

Network Load Balancer (NLB) - Layer 4

  • Ultra-low latency
  • Static IP addresses
  • Millions of requests per second
  • TCP/UDP/TLS traffic

Gateway Load Balancer (GWLB)

  • Third-party appliances
  • Transparent inspection
  • Firewall, IDS/IPS integration

Azure Load Balancer

Standard Load Balancer

  • Layer 4 (TCP/UDP)
  • High availability
  • Health probes
  • Outbound connections

Application Gateway

  • Layer 7 (HTTP/HTTPS)
  • WAF integration
  • SSL termination
  • URL-based routing

Google Cloud Load Balancer

Global Load Balancer

  • Anycast IP
  • Cross-region failover
  • HTTP(S) load balancing

Regional Load Balancer

  • Internal load balancing
  • TCP/UDP load balancing

Monitoring and Metrics

Key Metrics

Request Metrics

  • Requests per second
  • Request latency (p50, p95, p99)
  • Error rate (4xx, 5xx)
  • Active connections

Backend Metrics

  • Backend response time
  • Backend error rate
  • Health check status
  • Connection pool usage

Load Balancer Metrics

  • CPU utilization
  • Network throughput
  • Active flows
  • Dropped connections

Example Prometheus Metrics

# Request rate
rate(http_requests_total[5m])

# Error rate
rate(http_requests_total{status=~"5.."}[5m])

# Latency percentiles
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

# Healthy backends
haproxy_backend_up

Best Practices

1. Choose the Right Algorithm

  • Stateless apps: Round robin or least connections
  • Stateful apps: Sticky sessions (IP hash, cookie)
  • Caching: Consistent hashing or URL hash
  • Mixed capacity: Weighted algorithms
  • Long connections: Least connections

2. Implement Proper Health Checks

  • Active and passive checks
  • Check critical dependencies
  • Fast response times (< 1s)
  • Appropriate intervals and thresholds

3. Plan for Failure

  • Graceful degradation
  • Circuit breakers
  • Automatic failover
  • Backup pools

4. Monitor Everything

  • Request rates and latencies
  • Error rates
  • Backend health
  • Load balancer health

5. Use Connection Pooling

  • Reuse backend connections
  • Configure appropriate pool sizes
  • Monitor pool exhaustion

6. Enable Logging

  • Access logs for troubleshooting
  • Error logs for failures
  • Structured logging for analysis

7. SSL/TLS Offloading

  • Terminate SSL at load balancer
  • Reduce backend CPU usage
  • Centralize certificate management

8. Gradual Rollouts

  • Use weighted routing for deployments
  • Canary releases for new versions
  • Quick rollback capability

9. Geographic Distribution

  • Route users to nearest data center
  • Reduce latency
  • Improve user experience

10. Regular Testing

  • Load testing under realistic conditions
  • Failover testing
  • Capacity planning

Troubleshooting

Uneven Load Distribution

Symptoms: Some servers overloaded, others idle

Causes

  • Long-lived connections with round robin
  • Sticky sessions with IP hash
  • Varying request complexity

Solutions

  • Use least connections algorithm
  • Implement connection timeouts
  • Use consistent hashing

Session Loss on Server Failure

Symptoms: Users logged out when server fails

Causes

  • Server-side sessions without replication
  • Sticky sessions to failed server

Solutions

  • Implement session replication
  • Use external session store (Redis, Memcached)
  • Client-side sessions (JWT tokens)

High Latency

Symptoms: Slow response times

Causes

  • Unhealthy backends not removed
  • Too many connections to backends
  • Load balancer overhead

Solutions

  • Tune health check parameters
  • Adjust connection pooling
  • Use Layer 4 instead of Layer 7 if content routing not needed

Connection Timeouts

Symptoms: Connections dropped or timeout errors

Causes

  • Aggressive timeout settings
  • Slow backend processing
  • Network issues

Solutions

  • Increase timeout values
  • Optimize backend performance
  • Check network connectivity

Conclusion

Load balancing is essential for building scalable, highly available systems. Key takeaways:

  1. Choose the right layer (L4 vs L7) based on routing needs
  2. Select appropriate algorithm for your traffic patterns
  3. Implement comprehensive health checks to detect failures
  4. Monitor continuously to detect issues early
  5. Plan for failure with graceful degradation
  6. Test thoroughly under realistic conditions
  7. Document configuration and changes

The “best” load balancing method depends on your specific requirements. There’s no one-size-fits-all solution. Start simple (round robin), measure, and optimize based on observed behavior.