A Tale of Persistent Connections

by ayazpashabb


In the world of microservices, every millisecond of latency counts, especially if you run a core microservice that serves high traffic with strict SLAs on response times. If your microservice depends on multiple other services to respond, it becomes important to optimise the flows wherever possible to speed up interactions with those services. While optimisations are good, they sometimes cause unintended side effects. In this article, we discuss one such optimisation, the problems it created and the solution we used.

Microservice Overview

We deploy all our microservices in Kubernetes and use the Kubernetes Service object to abstract the application running on a set of pods. Communication between microservices deployed in Kubernetes happens via the service layer, as shown below.

The Problem

We have a service written in NodeJS which makes API calls to a downstream service written in Java. To start optimising the NodeJS service, we looked at NewRelic slow trace metrics to identify any obvious bottlenecks.

Based on the observations, it looked like http.Agent.createConnection() was consuming a lot of time. Remember that NodeJS runs application code on a single thread, so any time-consuming activity on the main thread hurts. In this case, http.Agent.createConnection() was blocking the main thread, impacting both the response time and the overall throughput. Fixing this would give significant performance benefits.

The Fix

Based on our observations, it looked like we were creating a new HTTP connection to the downstream service for every incoming request, and every new HTTP connection executed http.Agent.createConnection() on the main thread. To avoid this, we decided to use persistent connections.

What is a Persistent Connection?

A persistent connection (HTTP persistent connection, also known as HTTP keep-alive) is a network communication channel that remains open for further HTTP requests and responses rather than closing after a single exchange. TCP keep-alive probes can additionally be sent to keep an idle connection from timing out. Reusing an open connection is faster for frequent data exchanges because it avoids a new TCP handshake for every request. Benefits include reduced network congestion, latency, and CPU and memory usage, thanks to the lower number of connections.

Enabling Persistent Connection in NodeJS

To enable persistent connections in NodeJS, one can add the code below.

const http = require('http');
http.globalAgent.keepAlive = true;
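
As an alternative to changing the global agent, keep-alive can also be enabled through the keepAlive option of the http.Agent constructor and passed to individual requests. The following is a minimal sketch of that approach, not the code we deployed: the host name, port, maxSockets value and the callDownstream helper are hypothetical and only illustrate where the agent is plugged in.

```javascript
const http = require('http');

// Dedicated agent that keeps sockets open for reuse across requests.
// maxSockets is an assumed cap on concurrent sockets per host.
const keepAliveAgent = new http.Agent({ keepAlive: true, maxSockets: 50 });

// Hypothetical helper: calls the downstream service through the shared agent,
// so consecutive calls can reuse an already established connection.
function callDownstream(path, callback) {
  const req = http.get(
    {
      host: 'downstream.example.internal', // hypothetical host name
      port: 8080,                          // hypothetical port
      path,
      agent: keepAliveAgent,
    },
    (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => callback(null, res.statusCode, body));
    }
  );
  req.on('error', (err) => callback(err));
}
```

Using a dedicated agent keeps the change local to calls that target the downstream service, instead of affecting every outgoing HTTP request in the process.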


After deploying our changes to use persistent connections, the slow traces due to http.Agent.createConnection() were no longer visible in the NewRelic slow trace metrics. The response time of the service improved noticeably, and so did the per-pod throughput.

The Side Effect

While the metrics of the NodeJS service improved considerably, we observed a degradation in the downstream Java service.

  • Infrastructure footprint increased a lot.
  • CPU utilization was not even across all pods. 
  • Traffic distribution across pods was uneven.
  • Despite autoscaling, some pods would still remain at high CPU utilization.

The image below illustrates the uneven traffic distribution problem along with uneven CPU and memory utilisation across the pods.

It was now clear that not every pod was being utilised to its full extent. As a result, a lot of pods were running unnecessarily, and response times differed from pod to pod.

The Cause of Side Effect

The uneven traffic distribution occurs because, in spite of autoscaling, the traffic is not being sent to the new pods. Let us understand why.

  • Autoscaling logic was based on CPU > 70%
  • New pods get added when average CPU utilisation goes beyond 70%
  • After the new pods are added, the average CPU utilisation drops below 70%
  • A few pods of the upstream NodeJS service still had persistent connections to old pods of the downstream Java service
  • The majority of the traffic was still sent to the old pods, despite the HPA (horizontal pod autoscaler) making new pods available

Ideally, the service layer of Kubernetes should have load balanced the traffic among all the pods of the Java service (both old and new), but that was not the case. Most of the traffic was still served by the old pods. To understand why this happens, let us dive into how a Kubernetes service works.

Dive into Kubernetes Service

In the Kubernetes world, we send requests to a Kubernetes service and not to individual pods. The service is supposed to distribute the traffic evenly among the pods, so the uneven traffic distribution problem shouldn’t occur in the first place.

Let us understand how the load balancing works in kubernetes services.

Unlike typical load balancers (e.g. HAProxy, NGinX), the Kubernetes service doesn’t have a process listening on an IP and a port. In fact, the service IP address is not assigned to any network interface in the cluster.

The Kubernetes service IP is allocated by the control plane and stored in etcd; no network interface is created with this IP address. The kube-proxy on each node watches the API server for services and their endpoints and, in iptables mode, creates IPtables rules on that node. These IPtables rules do the load balancing.

IPTables helps with NAT (Network Address Translation). The rules are created such that if a request comes to a service IP address, the destination is rewritten to one of the pods.

Below is an example of IPTables rules, where SERVICE_IP is the service IP and POD1_IP, POD2_IP and POD3_IP are the pod IPs.

iptables -A PREROUTING -t nat -p tcp -d SERVICE_IP --dport 80 -m statistic --mode random --probability 0.33 -j DNAT --to-destination POD1_IP

iptables -A PREROUTING -t nat -p tcp -d SERVICE_IP --dport 80 -m statistic --mode random --probability 0.5 -j DNAT --to-destination POD2_IP

iptables -A PREROUTING -t nat -p tcp -d SERVICE_IP --dport 80 -j DNAT --to-destination POD3_IP

In this example, IPTables uses the statistic module in random mode, which distributes new connections across the pods.
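
The probabilities 0.33 and 0.5 can look lopsided at first, but the rules are evaluated in order, so each rule only sees the connections that earlier rules did not match. The sketch below works through that arithmetic. It assumes rule i of n matches with probability 1 / (n - i), using an exact 1/3 rather than the rounded 0.33 from the example; effectiveShares is a hypothetical helper name, not part of any tool.

```javascript
// Computes the share of new connections each backend pod receives when
// rule i (0-based) of n matches with probability 1 / (n - i).
// The last rule has probability 1, i.e. it always matches what is left.
function effectiveShares(n) {
  const shares = [];
  let remaining = 1; // probability that no earlier rule has matched yet
  for (let i = 0; i < n; i++) {
    const ruleProbability = 1 / (n - i);
    shares.push(remaining * ruleProbability);
    remaining *= 1 - ruleProbability;
  }
  return shares;
}

// For three pods every share comes out at roughly one third.
console.log(effectiveShares(3));
```

With three pods, the first rule takes a third of the connections, the second takes half of the remaining two thirds, and the last takes the rest, so each pod ends up with about a third.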

Persistent Connections and IPTables

The IPtables rules are evaluated only when a connection is established. Once the connection exists, subsequent requests are sent over that same connection rather than a new one. With persistent connections, the rules are therefore evaluated only once, and all subsequent requests go to the same pod. Because no further load-balancing decision is made, the traffic distribution becomes uneven.

To summarise, if a connection is established from Pod-1 to Pod-5, the subsequent requests from Pod-1 are sent only to Pod-5. Hence, even when a scale-up happens and new pods are added, the traffic is still sent to the old pods instead of the new ones, as shown below.

However, the new pods (Pod-6, Pod-7 and Pod-8) do receive a small portion of the traffic, which comes from new TCP connections opened by Pod-1.
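
The effect can be reproduced with a small simulation. The sketch below is a simplified model under stated assumptions, not a measurement of our cluster: the pod counts, client counts and the simulate helper are hypothetical, every client holds exactly one connection opened before the scale-up, and the small share of new connections mentioned above is ignored.

```javascript
// Small deterministic pseudo-random generator so the sketch is repeatable.
function makeRandom(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Returns the number of requests served by each backend pod.
// Pods [0, oldPods) existed before the scale-up, the rest were added later.
function simulate({ oldPods, newPods, clients, requestsPerClient, persistent, seed = 1 }) {
  const random = makeRandom(seed);
  const served = new Array(oldPods + newPods).fill(0);
  // Connections opened before the scale-up could only land on old pods.
  const pinned = Array.from({ length: clients }, () => Math.floor(random() * oldPods));
  for (let c = 0; c < clients; c++) {
    for (let r = 0; r < requestsPerClient; r++) {
      // Persistent: reuse the pinned pod. Otherwise a new connection is
      // opened per request, so a pod is picked among all pods again.
      const pod = persistent ? pinned[c] : Math.floor(random() * (oldPods + newPods));
      served[pod] += 1;
    }
  }
  return served;
}

const base = { oldPods: 5, newPods: 3, clients: 4, requestsPerClient: 1000 };
console.log('persistent :', simulate({ ...base, persistent: true }));
console.log('per request:', simulate({ ...base, persistent: false }));
```

In the persistent case the last three entries stay at zero, because no load-balancing decision is made after the scale-up. In the per-request case the load spreads over all eight pods.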


We looked at a few possible solutions to the uneven traffic distribution problem.

Solution 1

A simple solution is to close the connection after every request. However, this is not ideal, as it takes us back to the original problem of invoking http.Agent.createConnection() for every request, as explained in the earlier section.

Solution 2

Periodically close the HTTP connection from the server by setting the response header Connection: close, so that the client is forced to establish a new HTTP connection. This increases the probability of traffic going to newer pods and eventually evens out the traffic distribution.

We can build application logic that closes the HTTP connection after x requests or y seconds to find an ideal balance.

We wanted to avoid custom logic for handling HTTP connections, so we explored further for a stable library or out-of-the-box solution that does the same.
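
To make the idea concrete, here is a minimal sketch of the "close after x requests" variant, written as a NodeJS server for illustration only (our downstream service is written in Java). The limit of 100 and the shouldCloseConnection helper are assumptions, not values or code we used.

```javascript
const http = require('http');

// Assumed limit: how many requests a single connection may serve.
const MAX_REQUESTS_PER_CONNECTION = 100;

// Request counters keyed by socket; entries vanish when a socket is collected.
const requestCounts = new WeakMap();

// Hypothetical helper: counts a request on the given socket and reports
// whether the connection has reached its limit.
function shouldCloseConnection(socket, limit = MAX_REQUESTS_PER_CONNECTION) {
  const count = (requestCounts.get(socket) || 0) + 1;
  requestCounts.set(socket, count);
  return count >= limit;
}

const server = http.createServer((req, res) => {
  if (shouldCloseConnection(req.socket)) {
    // Asks the client to stop reusing this connection; its next request
    // opens a new one, which goes through load balancing again.
    res.setHeader('Connection', 'close');
  }
  res.end('ok');
});

// server.listen(8080); // port is an assumption; left out so the sketch does not bind a port
```

A time-based variant would store the time at which the socket was first seen instead of a counter and compare it with the current time on each request.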

Solution 3

Use NGinX as a sidecar. NGinX is known for its high performance, stability, and low resource consumption. It can terminate all client connections and create separate, independent connections on both the upstream and downstream sides.

In nginx.conf, the parameter below can be tuned appropriately.
keepalive_requests – the maximum number of requests that can be served over a single keepalive connection. The default value of this parameter is 100 (1000 since NGinX 1.19.10).

With this setting, NGinX closes a connection after every x requests. This solution adds an extra hop of 2-3ms and requires extra resources for NGinX itself.

Solution 4 (final)

Use a language- or framework-specific library that provides a similar mechanism, i.e. closing connections after a certain number of requests or a certain duration. One example of such a library in NodeJS is Agent-KeepAlive (published on npm as agentkeepalive).

This library makes the client create new HTTP connections once a configured interval has passed. Below is a sample code snippet with configuration.

// agentkeepalive exports the HTTP agent directly and the HTTPS agent as a property
const Agent = require('agentkeepalive');
const { HttpsAgent } = Agent;

// instantiating new HTTP agent with appropriate config
const keepAliveHttpAgent = new Agent({
    keepAlive: ENABLE_AGENT_KEEP_ALIVE, // true/false
    socketActiveTTL: SOCKET_ACTIVE_TTL // maximum socket lifetime; numeric values are read as milliseconds
});

// instantiating new HTTPS agent with appropriate config
const keepAliveHttpsAgent = new HttpsAgent({
    keepAlive: ENABLE_AGENT_KEEP_ALIVE, // true/false
    socketActiveTTL: SOCKET_ACTIVE_TTL // maximum socket lifetime; numeric values are read as milliseconds
});

// using HTTP agent in the options for HTTP request
const options = {
    url: url,
    qs: params,
    method: "GET",
    agent: keepAliveHttpAgent
};

We have yet to explore options like running kube-proxy in IPVS mode, using the Ambassador API gateway with Envoy proxy, or some kind of service mesh (like Istio).


Persistent connections don’t scale out of the box in Kubernetes and require an extra layer to solve the problem. We can code it in the application or use a framework library to manage it.

It is recommended that both the server and the client implement this mechanism so that they don’t unintentionally impact each other.

