A Lambda function is like a little island, surrounded by network. Unlike Fargate containers or EC2 instances, it does not have EFS, EBS or some other fast storage attached. Everything that goes into a Lambda goes in via the network interface (and the network only).
![](https://www.keithrozario.com/wp-content/uploads/Screenshot-2020-04-18-at-11.37.04-AM.png)
And since Lambdas are ephemeral, everything going in and out of the function has to traverse that network ‘moat’. Because they have no long-term storage, everything of value must be exfiltrated out of the function’s execution context and onto something else (like S3).
This is easy for HTTP requests or messages via SQS/SNS, but when dealing with files, the common tactic is to store them in `/tmp` for reading or processing.
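For comparison, here is a minimal sketch of that conventional `/tmp` approach. It assumes the input lives in S3 and that `s3_client` is a boto3 S3 client; only its `download_file` method is used. The function name `line_count_via_tmp` and the line-counting step are hypothetical placeholders for whatever processing you actually need.

```python
import os


def line_count_via_tmp(s3_client, bucket: str, key: str, tmp_dir: str = "/tmp") -> int:
    """Download an S3 object into /tmp, then process it from local disk.

    Hypothetical example: `s3_client` is assumed to be a boto3 S3 client
    (only download_file(Bucket, Key, Filename) is called), and counting
    lines stands in for real processing.
    """
    # /tmp is the only writable local path inside a Lambda execution environment.
    local_path = os.path.join(tmp_dir, os.path.basename(key))
    try:
        s3_client.download_file(bucket, key, local_path)
        with open(local_path, "rb") as f:
            return sum(1 for _ in f)
    finally:
        # /tmp survives between warm invocations, so delete the file to free space.
        if os.path.exists(local_path):
            os.remove(local_path)
```

The file has to be fully written to disk before processing starts, and it counts against the `/tmp` quota until it is removed.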
![](https://www.keithrozario.com/wp-content/uploads/Screenshot-2020-04-18-at-11.39.49-AM.png)
But a lesser-known technique bypasses the need to store anything in the Lambda's `/tmp` directory. Instead, it uses Python’s built-in `tempfile` module to create temporary files in memory, which can then be read and processed in place, like so:
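A minimal sketch of the idea, assuming the file comes from S3 through a boto3 client's `download_fileobj` method. It uses `tempfile.SpooledTemporaryFile`, the part of the `tempfile` module that buffers data in memory. The 256 MB threshold and the `sha256_of_s3_object` name are illustrative assumptions, not taken from the original.

```python
import hashlib
import tempfile

# Hypothetical threshold: keep up to 256 MB in RAM before tempfile spills to disk.
IN_MEMORY_LIMIT = 256 * 1024 * 1024


def sha256_of_s3_object(s3_client, bucket: str, key: str,
                        max_size: int = IN_MEMORY_LIMIT) -> str:
    """Hash an S3 object without writing it to a named file under /tmp.

    `s3_client` is assumed to be a boto3 S3 client; only its
    download_fileobj(Bucket, Key, Fileobj) method is called.
    """
    # SpooledTemporaryFile buffers writes in memory until they exceed max_size;
    # only then does it roll over to a real file in the temp directory.
    with tempfile.SpooledTemporaryFile(max_size=max_size) as spool:
        s3_client.download_fileobj(bucket, key, spool)
        spool.seek(0)  # rewind before reading back what was just written
        digest = hashlib.sha256()
        for chunk in iter(lambda: spool.read(64 * 1024), b""):
            digest.update(chunk)
        return digest.hexdigest()
```

One caveat: once the data grows past `max_size`, `SpooledTemporaryFile` quietly rolls over to a file in the default temp directory, which on Lambda is `/tmp`. So `max_size` has to be large enough for your biggest file, and the file still has to fit in the function's memory.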
This bypasses the need for `/tmp` and the limits on that directory’s size (currently capped at 512MB). Since the file is loaded into memory, you get a larger capacity, though not by as much as you think: the file shares the function’s configured memory with the runtime and your own code.
It also adds some complexity, because I’m not entirely comfortable coding with `io.BytesIO` and `io.StringIO`, but generally speaking this makes your architecture neater at the expense of a couple of lines of ‘not-so-straightforward’ code.
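To give a feel for that ‘not-so-straightforward’ code, here is a hypothetical sketch that transforms a CSV entirely in memory. The awkward part is juggling bytes and text. S3 returns bytes, while Python’s `csv` module reads and writes text, so `io.BytesIO` is wrapped in `io.TextIOWrapper` on the way in, and the `io.StringIO` output is encoded back to bytes on the way out. The function name and the upper-casing transform are illustrative assumptions.

```python
import csv
import io


def uppercase_csv(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Transform a CSV held entirely in memory and return the new bytes.

    Hypothetical example: upper-cases every field. No file is ever written
    to /tmp; both input and output live in in-memory buffers.
    """
    # csv needs text, but S3 hands you bytes: wrap the bytes buffer in a text decoder.
    reader = csv.reader(io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline=""))

    # csv.writer also wants text, so collect the output in a StringIO...
    out = io.StringIO(newline="")
    writer = csv.writer(out)
    for row in reader:
        writer.writerow([field.upper() for field in row])

    # ...and encode it back to bytes, ready to upload.
    return out.getvalue().encode(encoding)
```

In a handler, `raw` could come from `s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()`, and the result could be written back with `s3_client.put_object(Bucket=bucket, Key=out_key, Body=...)`. Here `bucket`, `key` and `out_key` are placeholders.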