AWS Lambda Cookbook - Part 5 - Input Validation Best Practices
Updated: Dec 13, 2022
What makes an AWS Lambda handler resilient, traceable, and easy to maintain? How do you write such code?
In this blog series, I’ll attempt to answer these questions by sharing my knowledge and AWS Lambda best practices, so you won’t make the mistakes I once did.
This blog series progressively introduces best practices and utilities by adding one utility at a time.
Part 1 focused on Logging.
Part 2 focused on Observability: monitoring and tracing.
Part 3 focused on Business Domain Observability.
Part 4 focused on Environment Variables.
Part 6 focused on Configuration and Feature Flags.
Part 7 focused on how to start your own Serverless service in two clicks.
Part 8 focused on AWS CDK Best Practices.
This blog focuses on input validation and parsing best practices.
I’ll provide a working, open-source AWS Lambda handler template Python project.
This handler embodies Serverless best practices and has all the bells and whistles for a proper production-ready handler.
During this blog series, I’ll cover logging, observability, input validation, feature flags, dynamic configuration, and how to use environment variables safely.
While the code examples are written in Python, the principles are valid for all programming languages supported by AWS Lambda functions.
You can find all examples at this GitHub repository, including CDK deployment code.
The Case For Input Validation
Developers tend to focus on implementing the AWS Lambda handler business logic and pay less attention to the validity of the 'event' input parameter.
Their algorithm is simple: extract the business logic payload from the input and process it. Easy.
However, this overly optimistic behavior can lead to crashes, undefined behaviors, bugs, and even security issues.
In this blog you will learn the importance of input validation in the cloud, the pitfalls it prevents, and how to overcome the inherent challenges and complexity you encounter when developing AWS Lambda functions.
You will learn how to process your event input in a safe and resilient manner so you can focus on what matters most: your business logic.
The Optimistic Approach
Let's examine the following optimistic AWS Lambda handler code:
The AWS Lambda handler 'my_handler' receives the 'event' parameter, a Python dictionary.
Line 6, which might seem innocent, is, in fact, quite dangerous and rests on several hidden assumptions.
It assumes that:
The 'event' dictionary argument has a 'Records' key with a list value.
The list has at least two items.
Each list item is a dictionary and the second list item holds a 'name' key.
'my_name' is a non-empty string that represents a valid name.
All values are safe and sanitized and do not expose the code to a security threat (XSS, input injections, etc., as discussed here).
The first three assumptions stem from missing syntactic validation: the structure of the input (JSON, list, etc.) is never checked.
If any of them is wrong, a 'KeyError', 'TypeError', or 'IndexError' exception is raised.
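As a minimal sketch of these failure modes, the handler below reads the name the way the assumptions above describe (the expression 'event['Records'][1]['name']' is inferred from that list, not copied from the original listing). The 'try_event' helper and the sample events are hypothetical and exist only to show which exception each broken assumption produces.

```python
from typing import Any, Dict


def my_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # Optimistic: assumes 'Records' exists, is a list with at least two items,
    # and that the second item is a dictionary holding a 'name' key.
    my_name = event['Records'][1]['name']
    return {'statusCode': 200, 'body': f'hello {my_name}'}


def try_event(event: Any) -> str:
    """Return the name of the exception the handler raises, or 'ok'."""
    try:
        my_handler(event, None)
        return 'ok'
    except (KeyError, IndexError, TypeError) as exc:
        return type(exc).__name__


results = {
    'missing Records key': try_event({}),
    'only one record': try_event({'Records': [{'name': 'a'}]}),
    'record is not a dict': try_event({'Records': [{}, 5]}),
    'name key missing': try_event({'Records': [{}, {}]}),
    # Syntactically fine, semantically wrong: an empty name passes silently.
    'empty name': try_event({'Records': [{}, {'name': ''}]}),
}
```

Note the last case: the handler does not crash, yet it happily processes an empty name. That is the semantic side of the problem.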
What happens if an exception is raised and slips through the cracks?
Well, the Lambda handler will crash ungracefully. In addition, if an API Gateway triggers the AWS Lambda function, an HTTP 5XX error code is returned to the caller, and the user experience suffers because you lose control over the error message.
What if you handle a batch of SQS records, and after processing two records successfully, the third record causes an unhandled exception? Well, that's a real shame since your entire batch (processed and unprocessed records alike) will be returned to the queue, ready to be processed all over again (and fail again), costing you money for the extra invocations.
The fourth and fifth assumptions are related to value constraints validation, a.k.a. semantic validation.
Is 'my_name' a valid name? Is it a non-empty value? Does it match an expected regex? Who knows; it's not checked in the example but assumed to be OK.
This hidden assumption can lead to undefined behaviors and hard-to-debug bugs or exceptions.
"Your code will sometimes fail, and that’s ok, as long as it fails in the 'right' way."
The AWS Events Schema Conundrum
When an AWS Service sends an event that triggers your AWS Lambda function, metadata information is added to the event, and the business logic payload is encapsulated.
Let's call this metadata information the 'envelope'. The envelope contains valuable information, interesting headers, and the most important data: the business logic payload that you wish to process.
That's where it gets tricky.
Each AWS service has its own envelope structure and may encapsulate the business logic payload differently. Some services save it as an encoded string, some as a dictionary.
It's all very different and not always well documented.
This layer of extra complexity must be addressed before validating the business payload input.
Our main goal here is to focus on the business logic; we don't want to worry about different AWS service schemas. We want the envelope type to be as transparent as possible.
Let's take a look at several AWS services' event structures.
EventBridge
The business logic payload is sent as a dictionary in the 'detail' field.
All other fields are considered as the envelope.
API Gateway (REST)
In API Gateway, the business logic payload is sent as a JSON encoded string in the 'body' field.
SQS
An SQS event is a list of records. The business logic payload is sent as a JSON encoded string in the 'body' field of every inner record.
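To make the differences concrete, here is a minimal sketch that pulls the payload out of each of the three envelopes by hand. The sample events are trimmed, hypothetical stand-ins (real events carry many more envelope fields), and the helper names are made up for illustration.

```python
import json
from typing import Any, Dict, List

# Trimmed, hypothetical sample events; real events carry many more fields.
EVENTBRIDGE_EVENT = {
    'source': 'my.service',
    'detail-type': 'OrderCreated',
    'detail': {'my_name': 'Dana', 'order_item_count': 2},
}
API_GATEWAY_EVENT = {
    'httpMethod': 'POST',
    'path': '/orders',
    'body': '{"my_name": "Dana", "order_item_count": 2}',
}
SQS_EVENT = {
    'Records': [
        {'messageId': '1', 'body': '{"my_name": "Dana", "order_item_count": 2}'},
        {'messageId': '2', 'body': '{"my_name": "Lee", "order_item_count": 5}'},
    ]
}


def payload_from_eventbridge(event: Dict[str, Any]) -> Dict[str, Any]:
    # EventBridge: the payload is already a dictionary under 'detail'.
    return event['detail']


def payload_from_api_gateway(event: Dict[str, Any]) -> Dict[str, Any]:
    # API Gateway: the payload is a JSON encoded string under 'body'.
    return json.loads(event['body'])


def payloads_from_sqs(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    # SQS: one JSON encoded 'body' string per record.
    return [json.loads(record['body']) for record in event['Records']]
```

Three services, three different extraction paths, and each helper is still as optimistic as the handler we started with: none of them checks that the envelope or the payload is well formed.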
The Bottom Line
The variation and inconsistency among AWS service schemas increase the schema validation effort. We need to understand which AWS service schema to expect, where to find the payload, and how to decode it before we can validate it.
"Input validation should be applied on both syntactical and Semantic level." - OWASP
We will validate the incoming event, extract the business payload, decode it, and validate it against a predefined schema. This schema verifies that all required parameters exist, that their types are as expected, and that all value constraints hold.
The schema will cover both syntactic and semantic validations.
All this will be achieved with a single line of code.
AWS Lambda Powertools Parser
We will use a Parsing utility.
I had the pleasure of writing and contributing the Parser utility to a fantastic project on GitHub called AWS Lambda Powertools. We have used this repository previously in the blog series (parts one to three) for logging, tracing, and metrics.
The parser utility will help you to achieve next-level validation.
The Parser's engine is the excellent Pydantic library introduced in part 4.
First, we need to define our business logic payload as a Pydantic schema.
Let's assume that 'my_handler' processes orders for customers, one customer at a time.
It expects a JSON document that contains the two parameters: 'my_name' and 'order_item_count'. Let's define both semantic and syntactic validations:
'my_name' - customer name, a non-empty string with up to 20 characters.
'order_item_count' - a positive integer representing the number of ordered items that 'my_name' placed.
And the matching Pydantic schema:
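A minimal sketch of what such a schema could look like, assuming Pydantic's 'Field' constraints are used to express the rules above; the exact definition in the template project may differ. The 'is_valid' helper is hypothetical and only there to exercise the schema.

```python
from pydantic import BaseModel, Field, ValidationError


class Input(BaseModel):
    # Customer name: a non-empty string with up to 20 characters.
    my_name: str = Field(..., min_length=1, max_length=20)
    # Number of ordered items: must be a positive integer.
    order_item_count: int = Field(..., gt=0)


def is_valid(payload: dict) -> bool:
    """Return True if the payload satisfies the 'Input' schema."""
    try:
        Input(**payload)
        return True
    except ValidationError:
        return False
```

Required fields and types cover the syntactic side; the length and range constraints cover the semantic side.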
Pydantic is a powerful parser library that allows defining custom validation in the form of 'validator' functions. Read more about it here.
Validation Magic Time
Let's add the Parser utility to 'my_handler' and use the 'Input' schema. We assume that the handler is triggered by an AWS API Gateway, meaning the envelope will contain AWS API Gateway metadata fields.
Let's take a look at the validation code:
In line 3, we import the 'parse' function and the 'ValidationError' exception.
In line 4, we import 'ApiGatewayEnvelope', which corresponds to the AWS service that triggers this handler, AWS API Gateway.
In line 7, we import the 'Input' schema we defined in the previous step. All handler schemas are placed in the 'service/handlers/schema' folder.
The magic happens in line 14. We tell the parser to extract and validate our 'Input' schema model (the business logic payload schema) from the 'event' dictionary.
We also tell it that the event it expects has an envelope structure that matches that of an AWS API Gateway. The parser will return a validated instance of type 'Input'.
In line 18, the handler can safely process the business logic payload. For example, we can access 'my_name' by writing 'input.my_name'.
In case of a validation error, a detailed exception is raised.
Pydantic's exceptions contain detailed information about why the validation failed, and what values or fields caused the failure.
In this example, it's best to log the exception and return an HTTP Bad Request status code (400).
What About Other AWS Services?
The Parser supports the most common AWS Lambda integration services, including SNS, SQS, Kinesis, S3, EventBridge, etc.
Read more about it here.
Putting It All Together
Let's add the Parser utility to the other utilities introduced in this blog series: the logger, tracer, metrics, and environment variables parser.
The handler will now look like this:
In line 34, we parse and validate the input.
In line 35, we log the valid request details. We don't log 'my_name' as it is considered personally identifiable information.
In line 40, we use the parsed data class to send the input to the inner business logic handler function.
Handling Validation Exceptions Best Practices
The best practice for handling validation exceptions is to log the exception and gracefully return a detailed error to the caller.
The Parser's validation exception will contain detailed information regarding the malformed fields in the above example.
In line 36, if the input is malformed, the exception is caught and logged.
For example, if the 'my_name' key is missing from the input event, the code will print the following error log from the Pydantic exception:
"error: 1 validation error for Input\nmy_name\n field required (type=value_error.missing)".
In line 38, an HTTP Bad Request status is sent back to the caller.
Using Envelope Metadata Parameters
This is an advanced use case.
When using the 'parse' function with the envelope parameter, you can't access the parsed envelope parameters. However, in some cases, these parameters hold data that you will find helpful.
You can use the Parser utility without specifying the envelope argument and parse solely according to the model argument.
You will be required to provide a FULL Pydantic schema containing your business model input schema AND the metadata parameters.
The easiest way to do this is to create a new class that extends an existing Parser model class (API gateway model, SQS model, etc.) and add your business logic payload schema.
See detailed examples here.
Validation? Not Just For Input!
You should perform schema validation on any dictionary object your AWS Lambda handler uses. It can be a boto3 response, a JSON configuration file, an HTTP response from a service, or any object you can map to a schema.
Better safe than sorry.
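For instance, a response from a downstream service can be validated with plain Pydantic, with no envelope involved. The sketch below assumes a made-up payment service; the 'PaymentResponse' schema, its fields, and the 'parse_payment_response' helper are all hypothetical.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, Field, ValidationError


# Hypothetical schema for a response returned by a downstream service.
class PaymentResponse(BaseModel):
    transaction_id: str = Field(..., min_length=1)
    amount_cents: int = Field(..., ge=0)
    approved: bool


def parse_payment_response(raw: Dict[str, Any]) -> Optional[PaymentResponse]:
    """Return a validated response, or None if the dictionary is malformed."""
    try:
        return PaymentResponse(**raw)
    except ValidationError:
        return None
```

Returning 'None' keeps the sketch short; in a real handler you would probably log the validation error before deciding how to react.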
This concludes the fifth part of the series.
Join me for the next part where I implement dynamic configuration and feature flags.