Serverless MCP on AWS: Lambda vs. Fargate for Agentic AI Workloads

Ran Isenberg
Jul 16
12 min read

Agentic AI depends on more than prompts—it requires secure, structured access to data and systems to operate effectively within context. Model Context Protocol (MCP) enables this by bridging AI agents with organizational systems in a scalable and governed manner.

In this post, you will learn how to build production-grade remote MCP servers on AWS using both serverless and container-based approaches, along with a GitHub MCP template repository that includes full IaC, tests, a CI/CD pipeline, and documentation.

The Case for MCP Servers
Choosing the Right AWS Architecture for Your MCP Server
The Problem with Lambda-Based MCP Servers
GitHub Template
Option 1: AWS Lambda with Web Adapter & MCP SDK
Option 2: "Native" AWS Lambda MCP Server
Option 3: AWS Fargate ECS MCP Server
Summary
The Future

The Case for MCP Servers

Model Context Protocol (MCP) enables AI agents to securely and effectively interact with your organization's data and systems, going beyond simple prompts to deliver meaningful, context-aware outcomes. Whether it's exposing read-only resources, interacting with tools, or generating prompts, MCP servers extend the capabilities of agentic AI and its context with your applications and data.

Think of it as Vibe Coding's superpower, enabling your agent to go even further.

If you're unfamiliar with MCP and want a deep dive into the subject, I highly recommend Anton's insightful post, "Building Serverless MCP Servers and What Does Peppa Pig Have To Do With It."

Choosing the Right AWS Architecture for Your MCP Server

Before building your MCP solution on AWS, it's crucial to identify your users' requirements:

Do you require continuous data streaming, or are short-lived sessions sufficient for your needs?
Will your users primarily be developers engaging in "vibe coding" or AI agents autonomously executing tasks?
Are you building for internal developers or for paying customers across the globe?

All these questions can point you to a different architecture choice.

Ultimately, your choice of MCP server architecture depends on your specific needs and constraints, including expected usage patterns, performance and scalability requirements, cost considerations, developer experience, observability, and security.

In this post, I will compare three Serverless architecture options and make your decision easier. First, let's take a look at the default candidate: Lambda functions.

The Problem with Lambda-Based MCP Servers

Running MCP servers on Lambda functions is a mixed bag at the moment.

There's no official Lambda MCP support from either AWS or Anthropic.

The protocol was initially designed to be streamable (using SSE) and long-running (stateless is now supported). And there has always been the assumption that it's a server, so we need to open a socket and wait for messages.

However, that's not the experience with Lambda; it's not a server but a function.

It is a short-lived (15 min max) function that provides a different DevEx, which is not server-like, unlike Fargate ECS and its containers.

So, we have three options for serverless HTTP-based MCP servers:

Mimic a web server with the Lambda web adapter and use an official MCP SDK.
Implement an MCP parser yourself on Lambda for a "native" experience.
Use Fargate ECS with an official MCP SDK - server all the way.

For this post, I will use Python and its MCP SDK - FastMCP.

By exploring the strengths and trade-offs of each approach, you'll be better able to select the right architecture for your MCP server.

Please note that the MCP protocol and SDKs are constantly evolving, so some data may not be up-to-date going forward.

GitHub Template

At the time of writing, options one and two are available in my GitHub template repository: AWS MCP Lambda Cookbook.

This repository provides a working, deployable, open-source-based, serverless MCP server blueprint featuring an AWS Lambda function and AWS CDK Python code, incorporating all best practices and a complete CI/CD pipeline.

Option 1: AWS Lambda with Web Adapter & MCP SDK

The AWS Lambda web adapter project enables you to run your server-based code on Lambda. Sort of a lift and shift move from ECS containers/EC2 servers to Lambda.

In this case, we will use it to spin up an MCP server using the FastMCP SDK and configure it for stateless HTTP. Our custom tools, resources, and logic are the last in the chain.

The web adapter framework handles the function entry on the '/mcp' API GW endpoint, translating the request from the API Gateway into the payload that FastMCP expects to receive on its open socket. It also manages all Lambda lifecycle events, such as initialization, event handling, and graceful termination of your FastMCP server.

In addition, streaming and long sessions are not supported; if you require them, head over to the Fargate option.

And if it wasn't obvious, we use a mono-Lambda/Lambda-lith - a single function for all MCP tools/resources/prompts, etc., as all traffic passes through the '/mcp' endpoint.

If you require HTTP paths other than '/mcp', you will need to add FastAPI into the loop and mount the '/mcp' endpoint with FastMCP, making the chain even longer:

Lambda web adapter -> FastAPI -> FastMCP.
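To make the "translating the request" step less abstract, here is a minimal sketch of the kind of mapping an adapter layer has to perform. It is not the Lambda web adapter's actual code, only a conceptual stand-in: the function names are hypothetical, it assumes the API Gateway REST proxy event shape, and it ignores multi-value headers, binary responses, and lifecycle handling.

```python
import base64
from urllib.parse import urlencode


def event_to_http_request(event: dict) -> dict:
    """Turn an API Gateway proxy event into the parts of a plain HTTP request."""
    body = event.get("body") or ""
    # API Gateway may deliver the body base64-encoded.
    raw = base64.b64decode(body) if event.get("isBase64Encoded") else body.encode("utf-8")
    query = urlencode(event.get("queryStringParameters") or {})
    path = event.get("path", "/")
    return {
        "method": event.get("httpMethod", "GET"),
        "target": f"{path}?{query}" if query else path,
        "headers": dict(event.get("headers") or {}),
        "body": raw,
    }


def http_response_to_lambda(status: int, headers: dict, body: bytes) -> dict:
    """Wrap the local server's HTTP response in the proxy-integration response shape."""
    return {
        "statusCode": status,
        "headers": headers,
        "body": body.decode("utf-8"),
        "isBase64Encoded": False,
    }
```

Every extra hop in the chain (adapter, FastAPI, FastMCP) repeats some variation of this parse-and-rewrap work, which is part of why the layered setup is harder to debug.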

Let's review the pros and cons of each category.

Performance

The Lambda web adapter extension introduces a significant cold start, especially for Python. Expect 1-3 seconds of cold starts. However, vibe coding as a whole isn't blazing fast - I find myself staring at the screen, waiting for even more than a minute at a time (depending on task complexity). So, what's a couple of seconds of cold start once in a while?

However, it depends on your use case and requirements.

The second and third options in this post have better performance.

Cost considerations

Lambda functions are cost-effective - you don't pay for them when there's no traffic. However, at large-scale, consistent traffic, Fargate might be the more cost-effective option. Use the AWS Pricing Calculator and determine your traffic and expected cost.
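As a rough illustration of the Lambda versus Fargate trade-off, the sketch below estimates a monthly bill for each and the request volume at which they break even. The default prices are illustrative placeholders that I am assuming for the example, not authoritative figures; they vary by region and change over time, and the model ignores API Gateway, ALB, NAT, data transfer, and free-tier effects. Treat the AWS Pricing Calculator as the source of truth.

```python
def lambda_monthly_cost(requests, avg_duration_s, memory_gb,
                        price_per_million_requests=0.20,    # assumed placeholder
                        price_per_gb_second=0.0000166667):  # assumed placeholder
    """Pay-per-use: request charge plus compute charge (GB-seconds)."""
    request_cost = requests / 1_000_000 * price_per_million_requests
    compute_cost = requests * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost


def fargate_monthly_cost(tasks, vcpu, memory_gb,
                         price_per_vcpu_hour=0.04048,  # assumed placeholder
                         price_per_gb_hour=0.004445,   # assumed placeholder
                         hours=730):
    """Always-on: every task is billed for every hour, regardless of traffic."""
    hourly = vcpu * price_per_vcpu_hour + memory_gb * price_per_gb_hour
    return tasks * hours * hourly


def break_even_requests(avg_duration_s, lambda_memory_gb, tasks, vcpu, fargate_memory_gb):
    """Monthly request count above which the Fargate estimate becomes cheaper."""
    per_request = lambda_monthly_cost(1, avg_duration_s, lambda_memory_gb)
    return fargate_monthly_cost(tasks, vcpu, fargate_memory_gb) / per_request
```

For example, `break_even_requests(0.5, 0.5, tasks=2, vcpu=0.5, fargate_memory_gb=1)` compares a 512 MB function averaging half a second per call against two small always-on tasks. The result only holds if the Fargate tasks can actually absorb that traffic, so pair it with a load test.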

Developer experience

You've got multiple layers at play - the web adapter, FastMCP, and your code. If you need to add more HTTP paths, you will also need to add FastAPI and mount FastMCP to it as another library that requires initialization and configuration during the cold start.

I added Lambda Web Adapter as a Lambda extension. I wrote a lengthy blog post called "A Critical Look at AWS Lambda Extensions: Pros, Cons, and Recommended Use Cases" on why I think extensions are problematic, but here I don't have many choices.

Getting it to work wasn't pretty either, as seen in the CDK code. You need to add a bash script (which felt very wrong!), environment variables, and all sorts of dark magic.

But putting all that aside, the handler below does look clean, even though it doesn't look very classic Lambda-like. There's no event dictionary, and you hope the SDKs play nice and call your code. But they do, eventually.

Debugging isn't easy. You have many loggers with different log levels and formats, making it challenging to understand where something fails and at which SDK.

And what about testing? I was able to create effective unit tests for the tools and end-to-end (E2E) tests that utilize the MCP client and connect to the deployed server. It's not bad, but not perfect, as it's harder to test the server locally within a CI/CD pipeline (I suppose it's possible, although I haven't done it yet).

However, on the bright side, once you set it up, it's the easiest way to get the latest MCP version support. This is quite critical as the protocol changes every 2 weeks at this stage.

handler with fastmcp — https://github.com/ran-isenberg/aws-lambda-mcp-cookbook/blob/main/service/handlers/mcp.py

Observability

You have many loggers with different log levels and formats, making it challenging to understand where something fails and at which SDK.

You can enable tracing on the function as a whole, but you won't get the Powertools for AWS Lambda tracing experience, which allows you to trace individual Python functions.

Custom CloudWatch metrics - you will need to write them yourself; you cannot use Powertools here, either.

Overall, you have logs and some amount of tracing, and you can send metrics, but it's not the best or smoothest experience.
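One way to soften the "many loggers, many formats" problem is to route everything through a single handler and formatter using only the standard library. The sketch below is an assumption-laden example rather than what the template does: the logger names (`uvicorn`, `mcp`) are my guess at which libraries are in play, and a library that reconfigures logging at startup can still override this.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render every record as one JSON line so all layers share a format."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,  # tells you which SDK/layer emitted the line
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level=logging.INFO,
                      library_loggers=("uvicorn", "uvicorn.error", "uvicorn.access", "mcp")):
    """Install one handler on the root logger and align library loggers with it."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
    for name in library_loggers:  # assumed names; adjust to what you actually load
        lib_logger = logging.getLogger(name)
        lib_logger.handlers = []     # drop the library's own handlers and formats
        lib_logger.propagate = True  # let records bubble up to the root handler
        lib_logger.setLevel(level)
```

Calling `configure_logging()` once during initialization, before the server starts, gives you one level and one format to search in CloudWatch, with the `logger` field pointing at the layer that failed.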

Security

FastMCP has native OAuth support (and OIDC coming soon) for MCP; you can also use Cognito/IAM authorizers (less preferred, as they are less native and harder for the user). However, it's not trivial, and most IDPs, Cognito included, don't support it properly yet. I will add Cognito support to my template - that's on my to-do list.

When to use

If your traffic is erratic and comes in bursts, and you want to reduce cost, Lambda is the way to go. However, if that is not the case, pay attention to the overall cost, as it may point you toward Fargate ECS.

If you are building MCP servers for an internal developer platform (platform engineering FTW!), where traffic is erratic and developers can wait another 2-3 seconds for a cold start, Lambda is a very cost-effective approach.

I provided more examples of how platform engineering goes hand in hand with agentic AI and MCP in my post, "Agentic AI & MCP for Platform Teams: Strategy and Real-World Patterns."

Template to use

Check out https://github.com/ran-isenberg/aws-lambda-mcp-cookbook

Option 2: "Native" AWS Lambda MCP Server

Flow of events: Agents call the MCP API GW endpoint ('/mcp'), a function is invoked, and the MCP parser is the entry point handler function. It parses the event, determines which tool, resource, or prompt is requested, passes the input to it, and returns a proper MCP response. Any raised exception is caught and returned as an MCP error.

The MCP Parser is responsible for the entire MCP lifecycle.
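The flow above can be sketched as a single handler that treats the request body as a JSON-RPC 2.0 message. This is a deliberately tiny illustration, not the parser from the repository: the tool registry, the `add` tool, the server name, and the pinned protocol version are assumptions for the example, and it skips batching, sessions, resources, prompts, and authentication.

```python
import json

PROTOCOL_VERSION = "2025-03-26"  # assumption: pin the MCP revision you target


def add(a: int, b: int) -> int:
    return a + b


# Hypothetical tool registry: name -> metadata plus the callable to run.
TOOLS = {
    "add": {
        "description": "Add two integers",
        "inputSchema": {
            "type": "object",
            "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
            "required": ["a", "b"],
        },
        "fn": add,
    }
}


class McpError(Exception):
    def __init__(self, code, message):
        super().__init__(message)
        self.code, self.message = code, message


def dispatch(method, params):
    if method == "initialize":
        return {"protocolVersion": PROTOCOL_VERSION, "capabilities": {"tools": {}},
                "serverInfo": {"name": "native-lambda-mcp", "version": "0.1.0"}}
    if method == "tools/list":
        return {"tools": [{"name": n, "description": t["description"],
                           "inputSchema": t["inputSchema"]} for n, t in TOOLS.items()]}
    if method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            raise McpError(-32602, f"Unknown tool: {params.get('name')}")
        try:
            result = tool["fn"](**params.get("arguments", {}))
        except Exception as exc:  # tool failures are reported inside the result
            return {"content": [{"type": "text", "text": str(exc)}], "isError": True}
        return {"content": [{"type": "text", "text": str(result)}], "isError": False}
    raise McpError(-32601, f"Method not found: {method}")


def _response(status, payload):
    body = json.dumps(payload) if payload is not None else ""
    return {"statusCode": status, "headers": {"Content-Type": "application/json"}, "body": body}


def _error(status, request_id, code, message):
    return _response(status, {"jsonrpc": "2.0", "id": request_id,
                              "error": {"code": code, "message": message}})


def lambda_handler(event, context):
    request_id = None
    try:
        message = json.loads(event.get("body") or "")
        if not isinstance(message, dict):
            raise McpError(-32600, "Invalid Request")
        if "id" not in message:  # notification: acknowledge, nothing to return
            return _response(202, None)
        request_id = message["id"]
        result = dispatch(message.get("method"), message.get("params") or {})
        return _response(200, {"jsonrpc": "2.0", "id": request_id, "result": result})
    except json.JSONDecodeError:
        return _error(400, None, -32700, "Parse error")
    except McpError as exc:
        return _error(200, request_id, exc.code, exc.message)
    except Exception:
        return _error(200, request_id, -32603, "Internal error")
```

Even at this size, the handler has to make protocol decisions, such as how to answer notifications and how to report tool failures. That is the maintenance burden you take on with this option.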

When I say native Lambda, I mean that we write the handler's entry function just as we do in a regular Lambda function. In addition, there are no extra libraries or web adapters. Simple, fast, and clean, as Lambda functions were intended to be.

However, streaming is not supported (at least not in Python); if you require it, consider the Fargate option. And if it wasn't obvious, we use a mono-Lambda/Lambda-lith - single function for all MCP tools/resources/prompts, etc., as all traffic passes on the 'mcp' endpoint.

The primary challenge is to implement the MCP parser and provide a similar DevEx to that of the MCP SDKs. I started with the AWS labs GitHub example. The AWS example helped me to understand the protocol, write tests, and see it in action. However, it doesn't cover the entire MCP spec or provide all the security validations that the MCP parser should have.

I refactored it and transformed it into an enhanced MCP parser, adding support for previously missing parts of the protocol; however, it's far from complete.

I modified the DevEx to my liking, improving observability by utilizing best-practice utilities such as Powertools for AWS Lambda and Pydantic models for enhanced error handling and secure parsing (FastMCP also uses Pydantic).

However, the main issue is that it's far from being fully compliant with the latest MCP version. Even worse, MCP changes every two weeks or so.

I decided to stop chasing the protocol and use FastMCP (the first option in this post).

However, there are cases where you'd want to choose this approach, especially when building security-oriented MCP gateway products where you might want to be "man in the middle" so you can alter HTTP requests in both directions.

Let's review the pros and cons of each category.

Performance

The MCP parser is lean and fast, with minimal cold starts; no extra frameworks are loaded.

Cold starts will affect you, but they are significantly better than with the Web Adapter solution, which adds a noticeable impact to the cold start time. Want to learn how to improve cold starts? Check out my blog post.

Bottom line: to be honest, vibe coding as a whole isn't blazing fast - I find myself staring at the screen, waiting for even minutes at a time. So, what's a 1-2 second cold start once in a while?

Cost considerations

Same as the first Lambda option. Lambda functions are cost-effective - you don't pay for them when there's no traffic. However, at higher scale traffic, Fargate might be the cheaper option. Use the AWS Pricing Calculator and determine your traffic and expected cost.

Developer experience

This is the best developer experience of all three options. With a native Lambda function, we can do whatever we want; we control all logs (format and level) from one logger configuration, traces, and the handler's code.

In addition, testing is simpler. We can have a "local" IDE testing experience.
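A "local" test in this style boils down to building a fake API Gateway event and calling the handler as a plain function. The sketch below shows the shape of such a test; `make_apigw_event` is a hypothetical helper, and the `lambda_handler` defined here is only a stand-in so the example runs on its own. In a real project you would import your actual handler instead.

```python
import json
from typing import Optional


def make_apigw_event(method: str, params: Optional[dict] = None, request_id: int = 1) -> dict:
    """Build a fake API Gateway proxy event carrying one JSON-RPC message."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params or {}}
    return {
        "httpMethod": "POST",
        "path": "/mcp",
        "headers": {"content-type": "application/json"},
        "body": json.dumps(message),
        "isBase64Encoded": False,
    }


def lambda_handler(event, context):
    """Hypothetical stand-in handler; replace with an import of the real one."""
    message = json.loads(event["body"])
    result = {"tools": [{"name": "echo"}]} if message["method"] == "tools/list" else {}
    return {"statusCode": 200,
            "body": json.dumps({"jsonrpc": "2.0", "id": message["id"], "result": result})}


def test_tools_list_returns_registered_tools():
    # No deployment, no network: the handler is invoked in-process.
    response = lambda_handler(make_apigw_event("tools/list"), None)
    body = json.loads(response["body"])
    assert response["statusCode"] == 200
    assert body["id"] == 1
    assert [tool["name"] for tool in body["result"]["tools"]] == ["echo"]
```

Because the test function uses bare asserts, it can be collected by pytest or stepped through with a debugger in the IDE.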

You can view sample tests in the GitHub repository presented below and learn about best practices for testing Lambda functions in my blog post, "Guide to Serverless & Lambda Testing - Part 2 - Testing Pyramid."

Native handler with mcp parser — https://github.com/ran-isenberg/aws-lambda-mcp-cookbook/tree/main/service/mcp_lambda_handler

However, the biggest issue is that you will need to chase the MCP versions and write the MCP parser yourself, which can be quite an effort!

Observability

You can leverage AWS X-Ray, CloudWatch logging, and custom CloudWatch metrics.

There's one log format, and tracing is supported across both MCP Parser and tools alike.

Security

You need to develop OAuth (and, later, OIDC) support for MCP yourself; alternatively, you can use Cognito/IAM authorizers. You are also required to implement all input validation and security protections yourself in the MCP parser.
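To give a feel for what "implement all input validation yourself" means, here is a minimal standard-library sketch that validates the params of a `tools/call` request before any tool runs. It is a stand-in for the Pydantic-based parsing mentioned earlier, not a copy of it: the allow-list, the size cap, and the `ValidationError` and `ToolCall` names are assumptions for the example.

```python
import json
from dataclasses import dataclass

MAX_ARGUMENTS_BYTES = 16_384  # assumption: an arbitrary cap on argument size

# Hypothetical allow-list: tool name -> expected argument names and types.
ALLOWED_TOOLS = {"add": {"a": int, "b": int}}


class ValidationError(ValueError):
    """Raised when a tools/call request fails validation."""


@dataclass(frozen=True)
class ToolCall:
    name: str
    arguments: dict


def validate_tool_call(params) -> ToolCall:
    if not isinstance(params, dict):
        raise ValidationError("params must be an object")
    name = params.get("name")
    if not isinstance(name, str) or name not in ALLOWED_TOOLS:
        raise ValidationError("unknown tool")
    arguments = params.get("arguments", {})
    if not isinstance(arguments, dict):
        raise ValidationError("arguments must be an object")
    if len(json.dumps(arguments).encode("utf-8")) > MAX_ARGUMENTS_BYTES:
        raise ValidationError("arguments too large")
    expected = ALLOWED_TOOLS[name]
    if set(arguments) != set(expected):
        raise ValidationError("unexpected or missing arguments")
    for key, expected_type in expected.items():
        value = arguments[key]
        # bool is a subclass of int in Python, so reject it explicitly for ints.
        if expected_type is int and isinstance(value, bool):
            raise ValidationError(f"{key} must be an integer")
        if not isinstance(value, expected_type):
            raise ValidationError(f"{key} has the wrong type")
    return ToolCall(name=name, arguments=arguments)
```

Rejecting unknown tools and unexpected arguments up front keeps untrusted agent input away from your business logic. A real parser would layer authentication and per-tool authorization on top of this.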

When to use

If you want the best DevEx/observability and total control of the code, this is the way to go. If you need to alter or monitor the requests going in and out, or if you build a security-oriented MCP server/gateway product, you should write the MCP parser and choose this approach.

Template to use

You can get started with my MCP Parser variation. Head over to https://github.com/ran-isenberg/aws-lambda-mcp-cookbook and to the parser itself at https://github.com/ran-isenberg/aws-lambda-mcp-cookbook/tree/main/service/mcp_lambda_handler

Option 3: AWS Fargate ECS MCP Server

As I've mentioned previously, MCP is intended to be served on a server, a long-running server supporting potentially long-running tasks. This is where Fargate ECS shines.

Fargate ECS allows us to configure a managed (serverless-y) ECS cluster.

AWS Fargate is a service that elevates containerization services, such as ECS or EKS, to the next level. It adds another layer of welcomed abstraction and ease of management.

In this design, we will utilize Fargate ECS to deploy our MCP server container image.

The image can contain your own custom MCP parser or an official framework, such as FastMCP.

If you want to learn more about Fargate ECS and see AWS CDK code for deploying a production-grade cluster, check out my post, "Build a Serverless Web Application on Fargate ECS with AWS CDK."

Let's review the pros and cons of each category.

Performance

No cold starts. Fargate ECS supports scaling and multiple combinations of memory size and CPU count. It provides the best performance among the three options, albeit at a higher cost. It also supports streaming, since there is no 15-minute runtime limit as there is with Lambda.
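To illustrate why streaming favors a long-lived server, the sketch below formats progress updates for a long-running tool as Server-Sent Events, followed by the final JSON-RPC result. It is an assumption-based example, not FastMCP's implementation: the function names, the `task-1` progress token, and the use of `event: message` frames are my choices for illustration, and the `time.sleep` call stands in for real work.

```python
import json
import time
from typing import Iterator


def sse_event(payload: dict, event: str = "message") -> str:
    """Serialize one JSON-RPC message as a Server-Sent Events frame."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def stream_long_running_tool(request_id: int, steps: int,
                             progress_token: str = "task-1",
                             delay_s: float = 0.0) -> Iterator[str]:
    """Yield progress notifications while working, then the final result."""
    for step in range(1, steps + 1):
        time.sleep(delay_s)  # stand-in for a slow unit of work
        yield sse_event({
            "jsonrpc": "2.0",
            "method": "notifications/progress",
            "params": {"progressToken": progress_token, "progress": step, "total": steps},
        })
    yield sse_event({
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"content": [{"type": "text", "text": "done"}], "isError": False},
    })
```

The connection has to stay open for as long as the generator keeps yielding. A container behind a load balancer handles that naturally, while the Lambda options described earlier do not support it.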

Cost considerations

Your MCP server is up 24/7. However, as traffic scale increases, Fargate ECS might cost less than its Lambda counterparts. Use the AWS Pricing Calculator and determine your traffic and expected cost.

Developer experience

Building Docker images and dealing with long deployment times is the name of the game. I'm not a fan. Testing isn't as straightforward as I'd like, but it can be done, both E2E and local testing, to some degree. Debugging Fargate when it does not work as expected isn't straightforward, and setting up a production-grade Fargate ECS cluster is also not simple.

Testing consists of unit tests for the tools and resources, as well as end-to-end (E2E) tests using the MCP client, similar to the Lambda FastMCP variation.

Observability

No limitations. However, from a logging perspective, you need to consider all the moving parts: VPC logs, ECS logs, FastMCP logs, and your application logs. They use different formats and are stored in various locations, and you need to monitor them all. The initial setup requires more effort, but it's doable.

Security

More moving parts and VPC resources mean complex security in addition to setting up MCP authentication methods.

There are no Cognito/IAM authorizers for your ALB. You need to configure FastMCP's OAuth support (and OIDC coming soon) for MCP.

If you choose to build your own MCP parser, you will need to implement all these security mechanisms yourself.

When to use

When you require the best performance, a 24/7 cost-effective solution, or you need MCP streaming for long-running tools and agent actions.

Template to use

Coming soon, but you can start by using the CDK code from "Build a Serverless Web Application on Fargate ECS with AWS CDK".

Summary

Category	Lambda + Web Adapter (FastMCP)	Native Lambda (Custom MCP Parser)	Fargate ECS
Performance	Cold starts (1–3s), no streaming, SDK overhead	Faster cold starts, no streaming	Always-on, best performance, supports streaming
Cost	Low cost on bursty traffic, pay-per-use	Low cost on bursty traffic, pay-per-use	24/7 cost, may be cheaper at scale
Dev Experience	Complex setup (Web Adapter + FastMCP), poor local testing	Clean Lambda DX, full control, easy to test locally	Docker complexity, long deploys, harder to debug
Observability	Fragmented logs, minimal tracing, no Powertools support	Unified logs, Powertools, X-Ray support	Full observability, but logs in multiple locations
Security	OAuth supported in SDK (OIDC coming), Cognito tricky	Full control, but must implement security manually	Flexible but complex due to ALB/VPC/OAuth setup
Streaming Support	No	No	Yes
When to Use	Quickest way to get MCP working with SDK	Best for devs wanting full control & observability	Best for 24/7, streaming, or performance-critical use
Python Template	MCP Lambda Cookbook (FastMCP)	MCP Lambda Cookbook	Coming soon (based on Fargate web app CDK template)

The Future

MCP is constantly evolving, with new security features being requested and added by the team. However, native AWS support is lacking. The best option at the moment is to use Fargate. The Lambda web adapter experience is not good enough from both performance and observability perspectives.

What will the future bring us? It is hard to tell. With MCP and A2A continually evolving and gaining popularity, I expect MCP to mature and gain adoption in the form of MCP gateways, registries, and additional MCP server adaptations, which will vary in their DevEx and programming language.

I assume AWS and other cloud providers will provide a better implementation (see the recent launch of Kafka native support).

But for now, I'd choose either Fargate ECS or Lambda with a web adapter and FastMCP, letting cost and traffic patterns become the deciding factors.
