
Load Balancing

Consider a service behind a load balancer, where we spawn new server instances as load increases. We initially started with N instances. As load increases, we decide to add one more instance, going from N to N+1. Assume consistent hashing was being used to distribute load. At this point, all new connections that are supposed to go to the new server will work fine, but what happens to the old, still-running connections that were set up before instance N+1 was spawned?

Fundamentally, because of consistent hashing, only a fraction of the old connections will map to this new instance, but those old connections had already set up their TCP connections with some other server on the consistent hashing ring back when there were N instances.

  1. Is there a handoff of TCP connections to this new server instance?
  2. Does the LB keep track of which server instance it assigned a connection to when it saw the first packet of that connection, and track that connection until it is closed?

Assume these connections were long polling connections.
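To make the setup concrete, here is a minimal Python sketch (hypothetical server/connection names, no virtual nodes) showing how many existing keys change owner when the ring grows from N to N+1 servers:

```python
import hashlib
from bisect import bisect

def h(value: str) -> int:
    """Map a string onto the hash ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def owner(ring, key: str) -> str:
    """Walk clockwise from the key's position to the next server point."""
    points = [p for p, _ in ring]
    idx = bisect(points, h(key)) % len(ring)
    return ring[idx][1]

servers_n = [f"server-{i}" for i in range(4)]        # N = 4 instances
servers_n1 = servers_n + ["server-4"]                # the (N+1)th instance

ring_n = sorted((h(s), s) for s in servers_n)
ring_n1 = sorted((h(s), s) for s in servers_n1)

connections = [f"conn-{i}" for i in range(10_000)]
moved = sum(owner(ring_n, c) != owner(ring_n1, c) for c in connections)
print(f"{moved / len(connections):.1%} of old connections now map to a different server")
# In expectation roughly 1/(N+1) of the keys move; with so few ring points
# (no virtual nodes) the actual fraction varies quite a bit.
```

The keys that "move" are exactly the connections in question: their hash now points at server-4, but their TCP state lives on the server that accepted them.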

For use cases where a long-running session is established between a client and an instance in a server cluster, you don't usually hand off ongoing sessions to new instances. However, you can (depending on the use case) have mechanisms for a different instance to take over in case one of them fails. For instance, a handoff in a game of chess or in a shopping-cart checkout is easier than, say, in an online game of PUBG.

Load balancers often use sticky sessions to redirect requests for existing sessions to the same server instance that was originally handling them (unless it has died, of course). Only new sessions are routed based on the hashing.
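Roughly, the sticky-session idea looks like the following toy sketch (an illustration, not any particular LB's implementation): the first request of a session is routed by hashing and the assignment is remembered, while adding a backend only affects sessions that haven't been seen yet.

```python
import hashlib

class StickyRouter:
    """Toy L7 router: the first request of a session is routed by hashing,
    and that assignment is remembered until the session (or backend) goes away."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.assignments = {}                  # session_id -> backend

    def route(self, session_id: str) -> str:
        backend = self.assignments.get(session_id)
        if backend is not None and backend in self.backends:
            return backend                     # sticky: reuse the original backend
        # New session (or its backend was removed): pick one by hashing, remember it.
        idx = int(hashlib.sha1(session_id.encode()).hexdigest(), 16) % len(self.backends)
        backend = self.backends[idx]
        self.assignments[session_id] = backend
        return backend

    def add_backend(self, backend: str) -> None:
        # Existing assignments stay pinned: only new sessions can land here.
        self.backends.append(backend)

    def end_session(self, session_id: str) -> None:
        self.assignments.pop(session_id, None)
```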

This still means that instances already processing at near peak capacity continue to get more work, while the new instance only gets part of the new workload. However, I guess the load balancer can keep enough buffer beyond its thresholds that this doesn't actually cause problems, or it can actively prefer instances with low workload until a fair distribution is reached. I am not exactly sure which strategies are considered the best approach to address this.

Once TCP sockets go into a listening/waiting state, there is only a certain number of them allowed, as defined by the operating system. Sometimes these don't clear up, so we have to explicitly reduce the wait time to a very low value. If a TCP port has already picked up a connection, we cannot transfer it.

TCP connections are tied to the machine; if a packet for one arrives at another machine, that machine will reply with a RST (reset). So generally app servers are kept stateless, and an L7 load balancer takes care of handling the TCP connection. One example where the TCP connection stays with the app server is WebSocket, but in that case you use an L4 load balancer, which maintains a mapping between TCP connections and the server machines handling them.
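As a rough illustration of that L4 mapping (a toy sketch, not how any particular LB is implemented): established flows are looked up by their TCP 4-tuple, and only brand-new connections go through backend selection, so adding a backend never re-routes a live connection.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FourTuple:
    client_ip: str
    client_port: int
    vip: str          # the load balancer's virtual IP
    vport: int

class L4ConnTable:
    """Toy L4 connection tracking: established flows are pinned to a backend
    by their TCP 4-tuple; only brand-new flows (SYN) get a backend picked."""

    def __init__(self, pick_backend):
        self.pick_backend = pick_backend       # strategy used only for new flows
        self.flows = {}                        # FourTuple -> backend address

    def forward(self, ft: FourTuple, is_syn: bool) -> str:
        if ft in self.flows:
            return self.flows[ft]              # in-flight connection: keep its backend
        if not is_syn:
            # No state for this flow (e.g. the entry was lost): whichever backend
            # eventually receives it would answer with a RST.
            raise ConnectionResetError("unknown flow")
        backend = self.pick_backend(ft)        # new connection: choose a backend now
        self.flows[ft] = backend
        return backend
```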

So if the L7 load balancer is the one terminating the connection... doesn't it become a bottleneck? After all, the entry point for all your traffic is the load balancer. I understand we can put an L4 load balancer in front of multiple L7 load balancers... but then we are back to square one 🤔: what happens when we increase the number of L7 load balancers by 1?

There is no consistent hashing involved; requests are handled by the next layer of web servers. The LB acts as an intermediary between client and server: it terminates the connection and forwards the request to a web server.

An LB is rarely a bottleneck in practice. More often your DB or some other part of the system becomes a bottleneck first.

What happens when this reconciliation of keys takes place while a new server is being added or some server fails/crashes? Or rather... what happens to the existing connections?

In the L7 LB, where termination happens at the LB itself, the LB will just route requests to known healthy hosts. Any existing connections to the bad instances are forcibly closed.

What do you mean by "reconciliation of keys"?

By reconciliation, what I meant was: when we decide to add a new server instance, by consistent hashing it is supposed to take ownership of some portion of the keys from the existing set. Extending this to the LB, the keys here are our existing ongoing connections, which were mapped to some server when we had N servers. After the addition of this (N+1)th server, some fraction of connections should ideally have been handed over to the new server. But because of the nature of TCP connections, we can't simply ask this new server to respond to existing ongoing connections.

As others have pointed out, something called sticky sessions seems like a possible solution. But then the LB seems to become a single point of failure or a scalability concern, as it needs to maintain the sticky sessions.

The other option, the LB (layer 7) terminating the TCP connection, looks great and seems to maintain a consistent view of connections. But then, I feel, isn't this again a single point of failure or a scalability concern?

LB and consistent hashing usually don't go together. So when you run Dynamo or something similar, the DB itself figures out whom to talk to and why, not some load balancer. This can be ZooKeeper, which is fairly common, or some form of quorum or consensus. An LB might front the entire DB cluster, but it won't be used to route requests to whichever DB instance contains a given key. Most LBs just do round robin or random allocation; there are some load-based ideas too.
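For what it's worth, those simpler strategies amount to very little code. A toy sketch (hypothetical names) of round robin, random, and a load-based pick:

```python
import itertools
import random

class Picker:
    """Toy backend-selection strategies an LB might use for new requests."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._rr = itertools.cycle(self.backends)
        self.in_flight = {b: 0 for b in self.backends}   # updated as requests start/finish

    def round_robin(self) -> str:
        return next(self._rr)

    def random_choice(self) -> str:
        return random.choice(self.backends)

    def least_connections(self) -> str:
        # "Load based": prefer the backend with the fewest in-flight requests.
        return min(self.backends, key=self.in_flight.__getitem__)
```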

Even with sticky sessions, we just allow existing sessions to exist, assign new sticky sessions to new instances as they come in. If an instance dies, that sticky session dies with it, and then the user reconnects and gets assigned another instance.

LBs are definitely a single point of failure, but they are unlikely to be a bottleneck. For some services you might have a backup LB waiting. Terminating at L7 is usually a fairly light operation, and most systems won't have an issue with it. Perhaps if you get to millions of requests per second you want an L3 (or L4) LB or something, but those are usually gateways to large systems such as an entire cloud provider. TLS termination is work, so when you do L7 LB you do extra work. Think of L3/L4 as pass-through, where termination happens elsewhere, so you're offloading work to something else (other LBs, other services) and can therefore scale up to huge numbers, tens of millions of requests per second.

An LB generally doesn't do much heavy computation; its sole job is to redirect requests. Looking up a session in a hash table is very fast. The LB's network interface is going to become a bottleneck well before its computing power does. And beyond that you look into strategies like hardware load balancers, geography-based DNS routing, and multiple IP addresses mapped to the same DNS name, thus distributing the load of your load balancer over different instances.

Usually there are routers, then L4 load balancers, then L7 load balancers. Routers send traffic to L4, which in turn sends traffic to L7. Even if an L4 LB goes down, state is generally distributed, so the other L4 LBs should be able to take the traffic without any disruption. If an L7 LB goes down, the L4 LBs send traffic to other L7 LBs, and this mapping is updated at the L4 layer.

I don't see how that addresses the problem. The first point of contact in the LB hierarchy is a unique host. If that goes down, a new first point of contact needs to be established (and before that, the failure also needs to be discovered). Intuitively, this means either the client has to be notified of a new entry point, or a standby instance assumes the same address. Both of these mean there is a window of disruption.

Consistent Hashing

There are two types of consistent hashing algorithms available, Ketama and Wheel. Both types are supported by libmemcached, and implementations are available for PHP and Java.
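As a rough illustration of the Ketama-style approach (a simplified sketch, not libmemcached's actual implementation), each server is hashed onto the ring many times as "virtual nodes" so the keyspace is split more evenly and adding a server moves only a small fraction of keys:

```python
import hashlib
from bisect import bisect

class KetamaLikeRing:
    """Simplified ketama-style ring: each server is hashed onto the ring many
    times ("virtual nodes") so keys spread evenly across servers."""

    def __init__(self, servers, points_per_server: int = 160):
        self.ring = []                              # sorted list of (point, server)
        for server in servers:
            for i in range(points_per_server):
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()
        self._points = [p for p, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        # Truncated MD5 as the ring position; real ketama slices MD5 differently.
        return int(hashlib.md5(value.encode()).hexdigest()[:8], 16)

    def lookup(self, key: str) -> str:
        idx = bisect(self._points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = KetamaLikeRing(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
print(ring.lookup("user:42"))   # the same key always maps to the same server
```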

Other Resources