Lessons Learned from Building a Low-Latency System Based on AWS Lambda

March 17, 2020

Serverless is a modern architecture pattern that has gotten a lot of traction in recent years. According to the AWS Lambda product page, “AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you.” The product page explains further: “You can use AWS Lambda to extend other AWS services with custom logic or create your own back-end services that operate at AWS scale, performance, and security.” 

It is an engineer’s dream to operate at AWS scale and performance. And it looks like you don’t have to do anything special: AWS will work its magic to make your application scale. Other services, like AWS DocumentDB, also promise great scalability.

You might wonder: what if I simply take the unlimited scalability of Lambda and combine it with an auto-scaling DocumentDB? 

Would I get the magic of “operate at AWS scale, performance, and security”? 

Simple answer: It depends.

We tried it and it failed for our case.

We paid a big price for this lesson, and we want to share it with you.

The goal

Our goal was to build a simple REST service that could handle traffic spikes spanning several orders of magnitude, e.g. rapid growth from 1,000 to 100,000 online users, while keeping response time under one second.

A naïve solution

We were not trying to implement anything tricky. The stack we chose was pretty standard and can be found in many guides and tutorials: 

  • AWS API Gateway as HTTPS server routing traffic to our Lambda
  • Lambda/node.js serverless backend
  • DocumentDB as the main storage. It supports the MongoDB API and can be scaled dynamically at runtime.
  • Serverless framework for gluing everything together

We knew about the AWS Lambda cold start issue and were ready to do the “warm-ups.” Even a small number of warmed-up Lambdas should have been enough. We were using node.js, which is great at handling many HTTP requests.
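A warm-up usually means invoking the function on a schedule (for example, via a CloudWatch Events rule) with a synthetic payload that the handler recognizes and returns from immediately. A minimal sketch; the `source: 'warmup'` marker is our own convention for illustration, not an AWS field:

```javascript
// Lambda handler that short-circuits on synthetic warm-up invocations:
// a scheduled ping keeps the instance warm without running business logic.
const handler = async (event) => {
  if (event && event.source === 'warmup') {
    return { warmup: true }; // keep-alive ping, no real work done
  }
  // ...normal request handling would go here...
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};

exports.handler = handler;
```

The scheduled rule simply needs to fire often enough that AWS does not reclaim the idle instances between pings.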

Expectations meet reality

Our team was very excited about writing the backend with AWS Lambda, as complex things had just become “a piece of cake.” We believed that scaling, logging, and metrics would be handled by AWS out of the box. And adding a new service took just a few lines of code.

Once we had built the initial naïve version of our API, we decided to run a performance test. We were shocked by the results.

As you can see from the chart below, latency grew steadily with the number of online users. With every hundred concurrent users, our service responded more and more slowly, and at some point it became completely unavailable.

We relied on AWS Lambda auto-scaling, and our code did not contain any obvious errors; however, reality was not looking promising. Taking a closer look, we made a couple more strange observations:

  • Even at minimum load, some responses had a latency of 10-15 seconds
  • We got many HTTP 504 errors even at low concurrency
  • The number of AWS Lambda instances was enormous: 800-1,000 instances at ~1,000 requests per second

It turned out that our understanding of how things work was heavily biased by marketing claims and far from reality:

  • A single Lambda node.js instance handles only one request at a time. While it is busy with any operation, even a non-blocking one, AWS does not route another concurrent request to it but spins up a new Lambda instance instead. In our case, this meant that a new instance was created for every new concurrent request. Every new instance opened its own connection to DocumentDB, and at some point there were ~1,000 Lambdas holding ~1,000 concurrent database connections. Obviously, this is not the most efficient access pattern, and sooner rather than later DocumentDB became overwhelmed by the unmanageable number of connections.
  • As soon as the number of concurrent requests exceeded the number of available Lambda instances, API Gateway responded with an HTTP 504 status code.
  • A cold start usually takes 3-5 seconds, but when a Lambda is deployed inside a VPC, it takes up to 10-15 seconds. Since connecting to storage services like DocumentDB or RDS requires the Lambda to run inside a VPC, this was our only option.
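One common mitigation, which reduces reconnect overhead but does not change the one-connection-per-instance math, is to cache the database client in module scope, outside the handler, so a warm instance reuses its connection across invocations. A sketch with a generic `connect` callback standing in for the real MongoDB driver call:

```javascript
// Module scope survives across invocations on a warm Lambda instance,
// so the client is opened once per instance instead of once per request.
// Note: each concurrent instance still holds its own connection.
let cachedClient = null;

async function getClient(connect) {
  if (!cachedClient) {
    cachedClient = await connect(); // cold start: open the connection once
  }
  return cachedClient; // warm invocation: reuse the cached connection
}
```

This pattern helps a warm instance, but under a traffic spike every newly spawned instance still pays for its own connection, which is exactly what overwhelmed DocumentDB in our case.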

Final solution

Unfortunately, we did not have a magical solution to make everything work as expected, so we considered two possible ways forward:

  1. To remove the database as a scaling bottleneck, switch from DocumentDB to DynamoDB, which provides a REST interface and does not require a persistent connection, and resolve cold starts with a massive warm-up of Lambdas. This would make it possible to spin up a new Lambda instance for each concurrent request. Even though the default limit on concurrent Lambdas per AWS account is 1,000, it can be increased on demand; we successfully requested an increase to 10,000 without any extra fees. The only remaining limits were the number of IP addresses in each VPC and the budget, and neither was a problem for us at that point.
  2. Simply migrate the most critical parts from Lambda to a node.js application on Elastic Beanstalk.

We could not predict what level of warm-up would be sufficient, and at some point warming up Lambdas would become more expensive than running static EC2 servers. So we went with option 2 above, for simplicity and to stay on track with our delivery schedule.

It was hard to leave the realm of Lambdas, since we now had to take care of things like deployment, configuration management, and scaling ourselves.

As you can see from the diagram below, we placed an Elastic Beanstalk load balancer behind an API Gateway. This allowed us to check JWT authorization tokens on the gateway level and have multiple versions of the API in production.

It is worth mentioning that we still use AWS Lambda for other services that do not have strict low-latency requirements.

Based on several performance tests, we fine-tuned the auto-scaling of Elastic Beanstalk with t2.micro instances to handle the load within the required SLA. It turned out that three instances could handle regular traffic, with more instances added whenever traffic increased.

We created conservative scaling rules, so sometimes we spin up a few more instances than strictly required, but the setup responds well to random traffic spikes. To handle a load of 1,000 requests per second, we need only 12 t2.micro instances (compared to 800-1,000 concurrent Lambdas).
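The gap between ~1,000 Lambda instances and 12 EC2 instances is roughly what Little's Law predicts: the average number of in-flight requests equals the arrival rate times the average latency, and the one-request-per-instance Lambda model needs one instance per in-flight request, while a node.js process multiplexes many. The latency and per-instance capacity figures below are illustrative assumptions, not measured values:

```javascript
// Little's Law: average in-flight requests = arrival rate * average latency.
function inFlightRequests(requestsPerSecond, avgLatencySeconds) {
  return requestsPerSecond * avgLatencySeconds;
}

// A Lambda instance serves one request at a time, so the instance count
// tracks concurrency directly: ~900 at 1,000 rps with ~0.9 s average latency,
// in line with the 800-1,000 instances we observed.
const lambdaInstances = inFlightRequests(1000, 0.9);

// A node.js server multiplexes requests on one event loop; assuming each
// t2.micro comfortably holds ~75 in-flight requests (illustrative figure):
const ec2Instances = Math.ceil(inFlightRequests(1000, 0.9) / 75);
```

The same arithmetic also explains why latency spikes were so punishing on Lambda: every extra second of average latency at 1,000 rps means roughly a thousand more instances, each with its own cold start and database connection.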

Lessons learned

  1. AWS Lambda is not a good fit for low-latency systems with high load variability.
  2. Avoid using services that require a persistent connection from Lambda code. Connections need to be opened and closed properly, and there is no good way to do that in Lambda.
  3. Pay close attention to detail when reading documentation. For example, you may assume AWS Lambda can scale without limit and perform quickly in all cases, but there are always limitations. Effectively, there is currently no SLA for AWS Lambda startup time.
  4. For low-latency systems, do performance testing as early as possible. It helped us reveal the performance and scalability problems before they became unmanageable.
  5. Follow news and updates from AWS, as they are constantly improving their services, and some problems may already be solved. For example, AWS recently announced that the long Lambda cold start inside a VPC has been fixed.

Get in touch

We hope you found this post useful. If you’re interested in learning more about the services we provide, feel free to email us, or find additional ways to reach us on our Contact page.