
Logging within AWS Lambda Functions (Python edition)

This post covers, in detail, the logging functionality within Lambda functions using the Python runtime. Skip to the TL;DR if you’d like to just understand what to do without getting into the nitty-gritty.

For the brave folks still reading this from top to bottom — let’s dive in.

Lambda Logging: 101

When building Lambda functions, it’s always a good idea to sprinkle logging statements throughout your code, particularly in areas where something important happens. This makes it easier to troubleshoot errors and issues, as your logs become more verbose the closer they get to the important events (the ones that might fail).

Logs aren’t just for troubleshooting; they also feed metrics, tracing, and more. But for this post, we’ll focus on logging alone, to give us visibility into specific Lambda function invocations and allow us to quickly triage issues in both test and production.

Logging is a great tool to quickly troubleshoot an issue, but log statements by themselves are useless unless they appear somewhere accessible. If this were an EC2 instance, you’d log to a file on the server and SSH in to read it, or have some 3rd-party agent (e.g. the CloudWatch Agent on that instance) siphon those logs out to a central location.

Lambda has no SSH capability, but it does have built-in CloudWatch Logs integration, which makes shipping logs to CloudWatch Logs a breeze.

CloudWatch Logs requires no configuration to set up. Logs reach it asynchronously from Lambda, with a 10-30s delay. From there, you can view your logs and query them using regular CloudWatch searches, as well as CloudWatch Insights. For more detailed analysis, you can ship them off to 3rd-party tools, like your own ELK stack.

But in this post, we’ll focus on what we can do with the readily available (and purely serverless) tools: CloudWatch Logs and CloudWatch Insights.

Note: CloudWatch Logs is a service that is distinct from CloudWatch. Do not confuse the two as being the same thing; it isn’t even referenced by the same service name in boto3 (logs vs. cloudwatch).

If you’ve ever used CloudWatch Logs, you’ll have noticed these things named LogGroups and LogStreams. A LogGroup is a single ‘partition’ within CloudWatch Logs that captures data, and a LogStream is a sub-partition within a LogGroup that actually stores the individual log messages. Within frameworks like the Serverless Framework, the default behavior is to create a single LogGroup per Lambda function.

Each LogGroup will have multiple LogStreams, with each LogStream corresponding to one Execution Context of a Lambda function: the more parallel executions you spin up, the more Execution Contexts get created, and the more LogStreams you have in that group.

Each LogStream is a linear stream of log messages, arranged in chronological order, with a clear delineation between invocations/requests, as each Execution Context is typically used for more than one invocation.

The overall relationship, then, is LogGroup → LogStream → individual log messages, and that’s exactly the hierarchy you’ll see when you open a LogGroup in the console.

If you’re troubleshooting that one issue that occurred for that one customer back in May, drilling down to the specific request, and the actual events leading up to the issue in question, is where you want to get to quickly. In this post, we’ll delve into how to do that.

Side Note:

Choice of framework is important here. The Serverless Framework does this by default without the user even seeing it, but tools like Terraform require you to work through the verbosity of creating the individual LogGroups and then assigning the right permissions (logs:CreateLogStream and logs:PutLogEvents) to the underlying functions. With Terraform modules this isn’t a monumental task, but it’s still undifferentiated heavy-lifting.

With that intro, let’s look at how we can configure our Logging within our Lambda Functions…

Logging in Python

To set up logging within a Lambda function, use the following code snippet:
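
A minimal version looks something like this (the handler name and log messages are just placeholders):

    import logging

    # Grab the root logger; the Lambda runtime has already attached a handler to it
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    def lambda_handler(event, context):
        logger.debug("This will NOT appear, since the level is set to INFO")
        logger.info("Received event")
        logger.warning("Something worth flagging happened")
        return {"status": "ok"}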

If you ran the code above, you would get the following logs in CloudWatch Logs (I’ve removed some superfluous data):
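
The output looks roughly like this (timestamps are illustrative; the request id is the one we’ll search for later):

    [INFO]     2021-05-06T10:15:42.123Z    b959bd9a-edd4-4e02-94e5-15379a31479c    Received event
    [WARNING]  2021-05-06T10:15:42.125Z    b959bd9a-edd4-4e02-94e5-15379a31479c    Something worth flagging happened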

Obviously the DEBUG message wasn’t displayed, because the log level is set to INFO, but the logs still look verbose, as they follow the standard Lambda log format: tab-delimited text made up of the following four fields.

  • Level
  • Time (ISO format, with milliseconds)
  • AWS_REQUEST_ID
  • Log Message

But reading flat lines of text isn’t ideal for analysis and consumption. Can you imagine parsing through 10,000 lines of this? Instead, let’s try to change the format of the logs to something better, like JSON.

Configuring Lambda Log Format

In order to overwrite the existing log format with something else (e.g. JSON), you’ll need to modify the root log handler that already exists in the Lambda function, as it is added at bootstrap time. You have no control over how this handler is set up, but fortunately you can change it post-creation.

To change the format of our logs, we’ll change the existing handler using the code below:
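
Something along these lines does it (a sketch; the JSON keys are arbitrary, and it assumes the runtime’s handler exposes aws_request_id to the formatter, which the default format above suggests it does):

    import logging

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    # The runtime attached a handler at bootstrap; swap its formatter rather than
    # adding a new handler (which would emit every log line twice).
    json_format = (
        '{"level": "%(levelname)s", "timestamp": "%(asctime)s", '
        '"aws_request_id": "%(aws_request_id)s", "message": "%(message)s"}'
    )
    for handler in logger.handlers:
        handler.setFormatter(logging.Formatter(json_format))

    def lambda_handler(event, context):
        logger.info("Received event")
        return {"status": "ok"}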

Now we’re in business, because our logs look a lot more pleasing to the eye:
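
Each line now comes out as a single JSON object, roughly like this:

    {"level": "INFO", "timestamp": "2021-05-06 10:15:42,123", "aws_request_id": "b959bd9a-edd4-4e02-94e5-15379a31479c", "message": "Received event"}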

This allows you to change the format from tab-delimited to JSON. Great!

But ultimately the message (the thing we actually logged) is still plaintext, without any ability to create fields or a hierarchy within it. What if you wanted more structure in your logs, i.e. to embed JSON objects into the log message? Let’s take a look…

Embedding Structured Data in the Log Message

This goes a little deeper than required, but the details of how to create a class that handles logging Python dictionaries within the log message can be found in my repo here. You’re not going to need it, as a much easier way will be revealed later.

But what the code allows us to do is output our data in nested JSON format: for example, here I create an additional field called data with two fields, invoke_arn and value.
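
The gist of it (a rough sketch, not the exact class from the repo) is a custom Formatter that serialises the record to JSON and nests any dictionary you log under a data key:

    import json
    import logging

    class JsonDictFormatter(logging.Formatter):
        """Render each record as JSON, nesting dict payloads under a 'data' key."""

        def format(self, record):
            entry = {
                "level": record.levelname,
                "timestamp": self.formatTime(record),
                "aws_request_id": getattr(record, "aws_request_id", None),
            }
            if isinstance(record.msg, dict):
                entry["data"] = record.msg  # e.g. {"invoke_arn": ..., "value": ...}
            else:
                entry["message"] = record.getMessage()
            return json.dumps(entry, default=str)

    # Attached to the existing handler as before, this lets you write:
    #   logger.info({"invoke_arn": context.invoked_function_arn, "value": 42})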

But surely there’s a better way to do this, perhaps a standard tool that does this heavy lifting, preferably a tool from AWS themselves.

Well you’re in luck — there is!

TL;DR: AWS Lambda Powertools

Forget all the initial stuff I told you about re-writing log formatters or introducing new classes to your code. To log out in JSON from within AWS Lambda, simply use AWS Lambda Powertools. The tool allows us to log out in structured JSON with just two lines of code (well, 3 if you count the decorator):
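
A sketch of what that looks like (the service name is just a placeholder):

    from aws_lambda_powertools import Logger

    logger = Logger(service="payments")  # service name is a placeholder

    @logger.inject_lambda_context        # adds the request id, cold start info, etc. to every log line
    def lambda_handler(event, context):
        logger.info("Collecting payment")
        return {"status": "ok"}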

Once you’re logging out in JSON, querying via CloudWatch Insights is a breeze. As an added bonus, the tool also allows us to set log settings via environment variables, namely the following (per the Powertools documentation):
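
  • LOG_LEVEL (sets the minimum level the logger emits, e.g. INFO or DEBUG)
  • POWERTOOLS_SERVICE_NAME (the service name stamped onto every log record)
  • POWERTOOLS_LOGGER_SAMPLE_RATE (emit DEBUG logs for a sampled percentage of invocations)
  • POWERTOOLS_LOGGER_LOG_EVENT (log the incoming event payload when set to true)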

This allows us to change the log level of our functions without redeploying them. It’s best practice to do this, as it lets you quickly flip a function over to logging DEBUG information for troubleshooting. It only works, though, if you’ve already added those DEBUG log lines throughout the relevant areas of your code.
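
For example, a sketch of flipping an existing function over to DEBUG via boto3 (the function and service names are placeholders; note that Environment replaces the function’s whole variable map, so merge in any existing values first):

    import boto3

    lambda_client = boto3.client("lambda")

    lambda_client.update_function_configuration(
        FunctionName="my-function",  # placeholder name
        Environment={
            "Variables": {
                "LOG_LEVEL": "DEBUG",
                "POWERTOOLS_SERVICE_NAME": "payments",
            }
        },
    )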

Fortunately, AWS Lambda Powertools is also available as a Lambda Layer from Klayers, so including it in your functions is as simple as pasting an ARN into the AWS Console.

But now that we’ve written these logs out, let’s look at how to read and parse them.

Reading Logs

To read logs, we can now search within a single LogStream, or across all LogStreams in a LogGroup. You can do this from within the AWS Console or your tool of choice, and because our logs are in structured JSON format, we can query them using powerful filter patterns.

For example, querying for {$.level = "WARNING"} gives us all the warning messages in a log group, or {$.level = "WARNING" || $.level = "ERROR"} gives us both warnings and errors.

But perhaps looking at a standalone error log output isn’t helpful; we’d like to see the events leading up to the error. Fortunately, each log message emitted via Powertools (with the decorator applied) is also populated with a “function_request_id” field.

The request id corresponds to the actual request that triggered the execution that led to the error in question. Picking up a request id from the logs, and then searching for {$.function_request_id = "b959bd9a-edd4-4e02-94e5-15379a31479c"}, will return all logs from that single execution. This narrows our focus to a single execution, allowing us to triage an error efficiently.

The major downside, though, is that uncaught exceptions tend to bypass the structured log output and print non-JSON lines in the tab-delimited manner we saw earlier. However, we can still search for them by simply searching for "[ERROR]". From there, because we have a timestamp and a LogStream, we can manually drill into the event. Less elegant, but still workable.

Before we move on to querying across multiple LogGroups: it’s always good practice to let the Lambda execution ‘die’, i.e. do not catch the exception and return, but allow the function to exit with an error. This allows us to use tools like Lambda Destinations and CloudWatch metrics to alert us on catastrophic issues, which would otherwise be hidden from view.
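
In practice, that just means logging the exception and re-raising it instead of swallowing it. A rough sketch (process is a placeholder for your actual business logic):

    from aws_lambda_powertools import Logger

    logger = Logger(service="payments")  # same logger as before

    def process(event):
        ...  # placeholder for your actual business logic

    def lambda_handler(event, context):
        try:
            return process(event)
        except Exception:
            # Log the full traceback, then re-raise so the invocation is marked as failed
            # and Lambda Destinations / CloudWatch error metrics can pick it up.
            logger.exception("Failed to process event")
            raise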

So far so good, but what if we wanted to query across multiple CloudWatch LogGroups, say when we need to triage an issue in an application but don’t know which function is causing it?

CloudWatch Insights

CloudWatch Insights gives us similar search functionality, but across multiple LogGroups. Think of it as an application-level log query tool.

By default, each CloudWatch log record has a @message and a @timestamp field. Modifying the format of the log output changes the format of the @message field (it still exists). But if the message is JSON, then CloudWatch automatically provides the ability to query fields within @message by specifying just the field name.

For example, we can query all error logs from across multiple LogGroups using the following syntax:
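
A query along these lines does the trick (field names assume the Powertools JSON output described earlier):

    fields @timestamp, function_request_id, message
    | filter level = "ERROR"
    | sort @timestamp desc
    | limit 50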

To the keen observer, this looks very similar to an SQL SELECT query, and indeed we get a nearly free log solution without the need for any servers, agents, or gigabyte-sized log files. The platform takes care of all that for us.

It’s not super quick or efficient, but it gets the job done, without servers or separate 3rd-party tools.

Conclusion

So, to conclude on how to log within Python functions in AWS Lambda:

  • Use AWS Lambda Powertools to log out in structured JSON format; this is a real no-brainer.
  • Be judicious with your INFO logging (and mindful of sensitive data), but also ensure you have DEBUG logging ready to go if you need it.
  • Ensure the logging level (INFO, DEBUG) is not hard-coded into the function, but set as an environment variable that can be changed when you need to.
  • Use CloudWatch Logs to query your logs, and CloudWatch Insights for wider searches across LogGroups.
  • Set the right retention period for your logs to reduce cost.
  • If you need more, consider 3rd-party tools, which add alerting/monitoring on top of standard log outputs.

Proper logging might seem like a waste of time — but trust me, when it saves you hours of agonizing troubleshooting, you’ll never go back to not logging ever.

But what most developers do is over-compensate by logging out everything, without any control, which makes analyzing logs a chore. The balance is ensuring we log out sufficient information (by setting the right log levels), and then having the tools to comb through that information (using CloudWatch Insights) so we can quickly triage issues and get the most from our logs.

Also note that, unlike traditional server-full environments where people logged onto EC2 instances to read logs, this setup logs out to a central location, which might not have the same level of security scrutiny. Ensure confidential and PII data aren’t logged out in plaintext, to avoid your logs leaking sensitive data.

But once you get the hang of it, the combination of CloudWatch Logs, CloudWatch Insights, and AWS Lambda Powertools allows us to do effective and efficient logging within our functions, with minimal overhead, in a completely serverless way.

