Lessons Learned from Building a Low-Latency System Based on AWS Lambda
March 17, 2020

Serverless is a modern architecture pattern that has gained a lot of traction in recent years. According to the AWS Lambda product page, “AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you.” The page continues: “You can use AWS Lambda to extend other AWS services with custom logic or create your own back-end services that operate at AWS scale, performance, and security.”

It is an engineer’s dream to operate at AWS scale and performance. And it looks like you don’t have to do anything special: AWS will do the magic for your application to scale. Other services like Amazon DocumentDB also promise great scalability.

You might wonder: what if I simply take the unlimited scalability of Lambda and combine it with an auto-scaling DocumentDB? 

Would I get the magic of “operate at AWS scale, performance, and security”? 

Simple answer: It depends.

We tried it and it failed for our case.

We paid a big price for this lesson, and we want to share it with you.

The goal

Our goal was to build a simple REST service that could handle traffic spikes of a couple of orders of magnitude, e.g. rapid growth of online users from 1,000 to 100,000, while keeping response time under one second.

A naïve solution

We were not trying to implement anything tricky. The stack we chose was pretty standard and can be found in many guides and tutorials: 

  • AWS API Gateway as HTTPS server routing traffic to our Lambda
  • Lambda with Node.js as the serverless backend
  • DocumentDB as the main storage. It supports the MongoDB API and is dynamically scalable at runtime.
  • Serverless framework for gluing everything together

We knew about the issues with AWS Lambda cold starts and we were ready to do the “warm-ups.” We expected even a small number of warmed-up Lambdas to be good enough, since we were using Node.js, which is great at handling a lot of HTTP requests.
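A warm-up is usually a scheduled ping that the handler answers immediately, so the instance stays alive without doing real work. Here is a minimal sketch of that idea, not our production code. The `WARMUP_SOURCE` marker and the `handleRequest` helper are hypothetical names chosen for illustration.

```javascript
// Hypothetical marker that the scheduled warm-up event is assumed to carry.
const WARMUP_SOURCE = 'warmup';

// Placeholder for the real business logic (hypothetical).
async function handleRequest(event) {
  return { statusCode: 200, body: JSON.stringify({ path: event.path || '/' }) };
}

// Lambda-style handler: answer warm-up pings right away and pass every
// other event on to the business logic. Export it as your runtime requires.
async function handler(event) {
  if (event && event.source === WARMUP_SOURCE) {
    return { statusCode: 200, body: 'warm' };
  }
  return handleRequest(event);
}
```

Keep in mind that one ping keeps only one instance warm. Keeping N instances warm takes N concurrent pings, so the cost of warming up grows with the concurrency you want to be ready for.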

Expectations meet reality

Our team was very excited about writing the backend with AWS Lambda, as complex things just became “a piece of cake.” We did believe that scaling, logging, and metrics would be solved by AWS out of the box. And adding a new service took just a few lines of code.

Once we had built the initial naïve version of our API, we decided to do a performance test. We were shocked by the results.

As you can see from the chart below, the latency kept growing as the number of online users increased. With every additional hundred concurrent users, our service responded slower and slower. And at some point, it became completely unavailable.

We relied on AWS Lambda auto-scaling and our code did not have any obvious errors; however, the results did not look promising. Taking a closer look, we made a few more strange observations:

  • Even at minimum load, we had responses with a latency of 10-15 seconds
  • We got a lot of 504 HTTP status code errors even at a small concurrency rate
  • The number of AWS Lambda instances was enormous: 800-1,000 instances at ~1,000 requests per second.

It turned out that our understanding of how things work was heavily biased by marketing claims and far from reality:

  • A single Lambda Node.js instance handles only one request at a time. While it is busy with any operation, even a non-blocking one, AWS does not reuse it for another concurrent request but spins up a new Lambda instance. In our case, this meant a new Lambda instance for every new concurrent request. Every new instance creates its own connection to DocumentDB, so at some point there are ~1,000 Lambdas holding ~1,000 concurrent database connections. This is not an efficient usage pattern for DocumentDB, and sooner or later it becomes overwhelmed by an unmanageable number of connections.
  • As soon as the number of concurrent requests exceeds the number of Lambda instances, the API Gateway responds with an HTTP 504 status code. 
  • A cold start usually takes 3-5 seconds, but when Lambda is deployed inside a VPC, it takes up to 10-15 seconds. And deploying inside a VPC is the only option when Lambda connects to storage services like DocumentDB or RDS.
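The instance count we observed is consistent with a back-of-the-envelope estimate. Because each instance serves one request at a time, the number of instances needed is roughly the request rate multiplied by the average request duration. Here is a minimal sketch of that estimate. The function name is hypothetical, and the 800 ms and 1,000 ms durations are illustrative assumptions chosen to match the 800-1,000 instances above, not measured values.

```javascript
// One request per instance at a time, so:
// concurrent instances ≈ requests per second × average duration in seconds.
// The duration is taken in milliseconds to keep the arithmetic in integers.
function estimateConcurrentInstances(requestsPerSecond, avgDurationMs) {
  return Math.ceil((requestsPerSecond * avgDurationMs) / 1000);
}

// Each instance opens its own database connection, so the number of
// connections tracks the number of instances.
for (const avgDurationMs of [800, 1000]) {
  const instances = estimateConcurrentInstances(1000, avgDurationMs);
  console.log(`${avgDurationMs} ms per request: ~${instances} instances, ~${instances} DB connections`);
}
```

The same arithmetic shows why slow responses make things worse: as latency grows under load, every request holds its instance longer, which pushes the instance and connection counts up even further.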

Final solution

Unfortunately, we did not have a magical solution to make everything work as expected. So we considered two possible ways to go from that point:

  1. To overcome the database being a scaling bottleneck, we switch to DynamoDB instead of DocumentDB. It provides a REST interface and does not require any persistent connection. We resolve cold start issues with a massive warm-up for Lambdas. Now it will be possible to spin up a new Lambda instance for each concurrent request. Even though the default limit for maximum concurrent Lambdas for every AWS account is 1,000, it can be increased on request. We successfully requested a limit upgrade to 10,000 without any extra fees. The only limits remaining are the number of IP addresses in every VPC and budget, but neither of those was a problem for us at that point.
  2. Simply migrate from Lambda to a Node.js application on Elastic Beanstalk for the most critical parts.

We could not predict what level of warm-up would be sufficient, and at some point warming up Lambdas would become more expensive than running static EC2 servers. So we went with option 2 (above) for simplicity and to stay on track with our delivery schedule.

It was hard to leave the realm of Lambdas, since now we had to take care of things like deployment, configuration management, and scaling.

As you can see from the diagram below, we placed an Elastic Beanstalk load balancer behind an API Gateway. This allowed us to check JWT authorization tokens on the gateway level and have multiple versions of the API in production.

It is worth mentioning that we are still using AWS Lambdas for the other services that do not have strict, low-latency requirements. 

Based on several performance tests, we fine-tuned the auto-scaling of Elastic Beanstalk with t2.micro instances to handle the load within the required SLA. It turns out that three instances can handle regular traffic, and we add more instances whenever traffic increases.

We created conservative scaling rules, so sometimes we spin up a few more instances than required, but the setup responds well to random spikes of traffic. To successfully handle a load of 1,000 requests per second, we spin up only 12 t2.micro instances (compared to 800-1,000 concurrent Lambdas).
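The sizing logic can be expressed as a simple calculation. The sketch below is not our actual scaling configuration. The function name, the per-instance throughput of 100 requests per second, and the 20% headroom are hypothetical values picked so that the result reproduces the 12 instances mentioned above. Only the minimum of three instances and the 1,000 requests per second come from our setup.

```javascript
// Instances needed for a target load, with extra headroom for sudden spikes.
// headroomPercent is an integer percentage to keep the arithmetic exact.
function instancesNeeded(requestsPerSecond, perInstanceRps, headroomPercent = 20, minInstances = 3) {
  const paddedLoad = (requestsPerSecond * (100 + headroomPercent)) / 100;
  const raw = Math.ceil(paddedLoad / perInstanceRps);
  // Never go below the baseline fleet that handles regular traffic.
  return Math.max(minInstances, raw);
}

console.log(instancesNeeded(1000, 100)); // peak load
console.log(instancesNeeded(100, 100));  // regular traffic falls back to the minimum
```

Conservative rules correspond to a larger headroom value: a few idle instances cost less than a missed latency target during a spike.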

Lessons learned

  1. AWS Lambda is not a good fit for low-latency systems with high load variability.
  2. Avoid using services that require a persistent connection from Lambda code. You need to open and close a connection properly, and in Lambda there is no good way to do that.
  3. Pay close attention to details when reading the documentation. For example, you may assume that AWS Lambda scales without limit and performs fast in all cases. However, there are always some limitations. Effectively, there is no SLA for AWS Lambda startup time for now.
  4. For low-latency systems, do performance testing as early as possible. It helped us reveal the performance and scalability problem before it became unmanageable.
  5. Follow news and updates from AWS, as they are constantly updating their services and some problems may already be solved. For example, in recent updates, they claimed that the problem with long Lambda cold starts inside a VPC is fixed.
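
On lesson 2: a commonly used mitigation is to create the database client once per Lambda instance, in module scope, and reuse it across warm invocations instead of connecting on every request. The sketch below illustrates the idea with a stub. `connectToDatabase` and the object it returns are hypothetical stand-ins for a real driver, not an actual DocumentDB or MongoDB API.

```javascript
// Hypothetical stand-in for a real database driver's connect call.
let connectionsOpened = 0;
async function connectToDatabase() {
  connectionsOpened += 1;
  return { id: connectionsOpened, query: async (q) => `result of ${q}` };
}

// Cached in module scope, so it survives between invocations on a warm
// instance. Caching the promise avoids opening two connections if two
// calls arrive before the first connect finishes.
let clientPromise = null;
function getClient() {
  if (!clientPromise) {
    clientPromise = connectToDatabase();
  }
  return clientPromise;
}

// Lambda-style handler that reuses the cached client.
async function handler(event) {
  const client = await getClient();
  const body = await client.query(event.query);
  return { statusCode: 200, body };
}
```

This only avoids reconnecting within a single instance. With one instance per concurrent request, the total number of connections still grows with concurrency, and there is still no reliable place to close the connection when the instance is reclaimed.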