AWS Lambda Cookbook - Part 4 - Environment Variables Best Practices
Updated: Feb 18
What makes an AWS Lambda handler resilient, traceable, and easy to maintain? How do you write such code?
In this blog series, I’ll attempt to answer these questions by sharing my knowledge and AWS Lambda best practices, so you won’t make the mistakes I once did.
This blog series progressively introduces best practices and utilities by adding one utility at a time.
Part 1 focused on Logging.
Part 2 focused on Observability: monitoring and tracing.
Part 3 focused on Business Domain Observability.
Part 5 focused on Input Validation.
Part 6 focused on Configuration and Feature Flags.
Part 7 focused on how to start your own Serverless service in two clicks.
Part 8 focused on AWS CDK Best Practices.
This blog focuses on environment variables best practices.
I’ll provide a working, open-source AWS Lambda handler template Python project.
This handler embodies Serverless best practices and has all the bells and whistles for a proper production-ready handler.
During this blog series, I’ll cover logging, observability, input validation, feature flags, dynamic configuration, and how to use environment variables safely.
While the code examples are written in Python, the principles are valid for all programming languages supported by AWS Lambda functions.
You can find all examples at this GitHub repository, including CDK deployment code.
According to the official AWS Lambda documentation, “Environment variables are pairs of strings (key-value) stored in a function’s version-specific configuration.”
Environment variables are often viewed as an essential utility. They serve as static AWS Lambda function configuration. Their values are set during the Lambda deployment, and the only way to change them is to redeploy the Lambda function with updated values.
However, although they are an integral and fundamental part of any AWS Lambda function deployment, many engineers use them unsafely. This can cause nasty bugs or even crashes in production.
This blog will show you how to correctly parse, validate, and use environment variables in your Python AWS Lambda functions, both in the deployment code and in the function code.
Let’s start with the fundamental assumptions.
Python environment variables are stored in a dictionary in the ‘os’ module — os.environ. These variables, pairs of key-value strings, can be individually accessed by calling os.getenv(‘my_var_name’). If ‘my_var_name’ is not defined as an environment variable, this function will return a None object instead of a string.
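A short illustration of the behavior described above, using only the standard library. The variable names are hypothetical. Note the difference between `os.getenv`, which returns `None` (or a supplied default) for a missing key, and indexing `os.environ` directly, which raises `KeyError`. Every value is a string, so numbers must be converted by hand.

```python
import os

# Hypothetical variable names, used only for illustration.
os.environ["MY_VAR_NAME"] = "some-value"
os.environ.pop("NOT_DEFINED_VAR", None)  # make sure this one is absent

value = os.getenv("MY_VAR_NAME")  # always a str when the variable is set
missing = os.getenv("NOT_DEFINED_VAR")  # None, not a string
fallback = os.getenv("NOT_DEFINED_VAR", "default")  # the default is returned instead

# Values are strings only: numbers and booleans must be converted by hand.
os.environ["MAX_RETRIES"] = "3"
max_retries = int(os.environ["MAX_RETRIES"])

# Indexing os.environ directly raises KeyError instead of returning None.
try:
    os.environ["NOT_DEFINED_VAR"]
    key_error_raised = False
except KeyError:
    key_error_raised = True

print(value, missing, fallback, max_retries, key_error_raised)
# prints: some-value None default 3 True
```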
AWS Lambda function environment variables are defined in the deployment code of an infrastructure-as-code framework such as AWS CDK, Serverless, or Terraform.
Common Environment Variables Bad Practices
Sporadic & Unsafe os.getenv Usage
Many developers access os.getenv sporadically in the AWS Lambda function files.
However, they usually do not check that the values are valid. In addition, some environment variable values carry hidden assumptions. For example, a value might be expected to be a valid ARN or an HTTP REST endpoint URL. However, most developers don’t validate these values at runtime.
They assume everything is ok since their tests did not fail.
If their test coverage is genuinely excellent and covers every single os.getenv call, they might be safe. Otherwise, a nasty crash or bug might lurk in the code, waiting for a misconfiguration in the AWS Lambda deployment code.
In other cases, environment variables serve as custom configuration values for 3rd party libraries and dependencies. Failing to validate these configuration values before using them might cause unexpected behavior or bugs. Think of a logger, observability tracer, or database handler whose default configuration you wish to override.
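To make the failure mode concrete, here is a minimal sketch of sporadic, unvalidated access. The helper functions and the `MAX_RETRIES` variable are hypothetical. A missing variable crashes only when its code path runs, and a malformed ARN is accepted silently.

```python
import os


def get_max_retries() -> int:
    # Unsafe: if MAX_RETRIES is unset, os.getenv returns None and
    # int(None) raises TypeError, but only when this code path executes.
    return int(os.getenv("MAX_RETRIES"))


def get_role_arn() -> str:
    # Unsafe: any string is accepted, even one that is not an ARN.
    return os.getenv("ROLE_ARN", "")


# Simulate a misconfigured deployment: one variable missing, one malformed.
os.environ.pop("MAX_RETRIES", None)
os.environ["ROLE_ARN"] = "not-an-arn"

try:
    get_max_retries()
    crashed = False
except TypeError:
    crashed = True  # in production, this surfaces as a failed invocation

role_arn = get_role_arn()  # 'not-an-arn' flows on unnoticed
```

Neither problem is visible at deployment time, which is exactly what the practices below aim to fix.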
Deployment Code Clutter
When you define environment variables in infrastructure as code frameworks such as CDK/Serverless/Terraform, you usually start with a small dictionary of environment variables and their values. However, this dictionary will increase in size and become harder to maintain over time. It’s crucial to understand what variables are used and why.
Since the variables are used in numerous places in the function’s code or its 3rd party dependencies (as argued above), tracking which variables are used and which can be safely removed becomes challenging. In addition, with less-than-perfect test coverage, removing environment variables becomes a risky operation.
So people don’t do it, and variables are seldom removed.
Environment Variables Best Practices
We want to address both bad practices mentioned above.
First, we need to define an environment variables schema per handler. This schema informs us precisely what environment variables the AWS Lambda function uses across its code and dependencies. The schema may also define value restrictions.
Second, we will validate and parse the environment variables according to the predefined schema when the AWS Lambda function is triggered. In case of misconfiguration, a validation exception is raised with all the relevant details.
Third, we require a global getter function for validated environment variables that any file in the AWS Lambda function can call.
And lastly, the deployment code, i.e., CDK/Serverless code, will define and set only variables that are part of the schema.
Ok, let’s head over to the proposed solution.
Tools Of The Trade
We will define and parse the schema with Pydantic, a performance-oriented parsing and validation library. Read more about Pydantic in my first blog.
The environment variables initializer will pass the ‘os.environ’ dictionary as input to the Pydantic parser. Pydantic will raise a very detailed ‘ValidationError’ exception if one or more parameters fail validation.
In addition, we would like to access these variables in all AWS Lambda function files with the same ease of use that calling ‘os.getenv’ provides, BUT in a safe manner.
We will call a getter function that returns a global instance of the parsed configuration.
It’s important to note that ‘os.getenv’ still works and may be used by 3rd party dependencies.
Initialize and Parse Environment Variables
Let’s define a new Python decorator that will initialize the environment variables: ‘init_environment_variables’.
We will use the AWS Lambda Powertools middleware factory decorator, ‘lambda_handler_decorator’, to create a new AWS Lambda handler decorator.
You can read more about it here.
Let’s take a look at the code below.
In line 10, we define a global schema instance, ‘ENV_CONF.’
In line 13, we use the AWS Lambda Powertools middleware factory to turn ‘init_environment_variables’ into a decorator. The decorator accepts the three regular AWS Lambda handler decorator parameters (handler, event, and context) and one of its own: the ‘model’ parameter. This parameter is the schema class that we define.
In line 18, the magic happens. We unpack the ‘os.environ’ dictionary as keyword arguments into the constructor of the ‘model’ class, our Pydantic schema. Pydantic will raise a detailed ‘ValidationError’ exception if the environment variables dictionary fails validation.
Once the code gets to line 22, the global instance ‘ENV_CONF’ is parsed and validated, and the Lambda handler can be triggered safely.
In line 25, the global getter function, ‘get_environment_variables’, will return the ‘ENV_CONF’ global instance. It can be called anywhere in the AWS Lambda handler code, including inner functions.
We will put this code in the environment parser file in the utilities folder, since all handlers and functions use it. Each handler may define a different schema.
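The following is a minimal sketch of the initializer and getter described above, assuming Pydantic is installed. To stay dependency-light, it swaps Powertools’ ‘lambda_handler_decorator’ for a plain ‘functools.wraps’ decorator, so its line numbers do not match the ones referenced above. ‘ExampleEnvVars’ and ‘my_handler’ are hypothetical names used only for the usage example.

```python
import functools
import os
from typing import Any, Callable, Optional, Type

from pydantic import BaseModel, ValidationError

# Global instance of the parsed and validated environment variables.
ENV_CONF: Optional[BaseModel] = None


def init_environment_variables(model: Type[BaseModel]) -> Callable:
    """Validate os.environ against `model` before the handler runs.

    Plain functools stand-in for a decorator built with the Powertools
    lambda_handler_decorator middleware factory.
    """

    def decorator(handler: Callable) -> Callable:
        @functools.wraps(handler)
        def wrapper(event: dict, context: Any) -> Any:
            global ENV_CONF
            # Raises pydantic.ValidationError listing every invalid variable;
            # keys in os.environ that the schema does not declare are ignored.
            ENV_CONF = model(**os.environ)
            return handler(event, context)

        return wrapper

    return decorator


def get_environment_variables() -> BaseModel:
    """Return the validated configuration from any module or inner function."""
    if ENV_CONF is None:
        raise RuntimeError("init_environment_variables has not run yet")
    return ENV_CONF


# Usage with a hypothetical one-field schema.
class ExampleEnvVars(BaseModel):
    POWERTOOLS_SERVICE_NAME: str


@init_environment_variables(model=ExampleEnvVars)
def my_handler(event: dict, context: Any) -> dict:
    env_vars = get_environment_variables()
    return {"service": env_vars.POWERTOOLS_SERVICE_NAME}


os.environ["POWERTOOLS_SERVICE_NAME"] = "my-service"
result = my_handler({}, None)

# A missing variable fails fast, before the handler body runs.
os.environ.pop("POWERTOOLS_SERVICE_NAME")
try:
    my_handler({}, None)
    validation_failed = False
except ValidationError:
    validation_failed = True
```

Because the getter reads a module-level global, inner functions in other files can call it without threading the configuration through every function signature.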
It’s Schema Time!
Let’s define a Pydantic schema by the name of ‘MyHandlerEnvVars.’
If you recall, we implemented logger, tracer, and metrics utilities in the previous three parts of this blog series. These utilities can be configured with environment variables, for example, to change the service name and log level.
In addition, let’s assume that our handler, ‘my_handler,’ requires two additional variables: a role ARN that it assumes during its execution and an HTTP REST API endpoint URL that it uses.
We will define all four variables in the schema below.
All Pydantic schemas extend Pydantic’s ‘BaseModel’ class, which gives them dataclass-like attribute access combined with validation.
The schema defines four environment variables: ‘LOG_LEVEL,’ ‘POWERTOOLS_SERVICE_NAME,’ ‘ROLE_ARN,’ and ‘REST_API.’
In line 6, ‘MyHandlerEnvVars’ extends the default Pydantic ‘BaseModel’ class.
This schema makes sure that:
‘LOG_LEVEL’ is one of the strings in the Literal list.
‘ROLE_ARN’ exists and is between 20 and 2048 characters long, as defined here.
‘REST_API’ is a valid HTTP URL.
‘POWERTOOLS_SERVICE_NAME’ is a non-empty string.
The schema will reside in a new schemas folder under the handlers folder.
Putting It All Together
Let’s add the environment variables initializer and getter utilities to the logger, tracer, and metrics utilities already implemented in the previous blogs.
You can find all code examples at this GitHub repository.
In line 8, we import the handler environment variables schema.
In line 9, we import the two initialization and getter functions we placed in the utilities folder.
In line 18, we add the new decorator, ‘init_environment_variables,’ and set the ‘model’ argument to ‘MyHandlerEnvVars,’ the previously defined schema.
In line 25, we use the getter function to access the global parsed schema dataclass of our environment variables, and in line 26, we log all four variables of our schema.
This AWS CDK code defines the variables of the schema ‘MyHandlerEnvVars’ and sets their values. Look specifically at the ‘__add_post_lambda_integration’ function.
This concludes the fourth part of the series.
Join me for the next part on ranthebuilder.cloud, where I parse and validate AWS Lambda event inputs.