I’ve been really digging into Lambda Layers lately, and once you begin using layers you’ll wonder how you got by without them.
Layers allow you to package just about anything into Lambda, but in a modular way. Elements of your code that don’t change much can be packaged into layers, keeping your actual Lambda deployment for just the code that’s changing.
It’s akin to the Docker cache, where you keep the unchanging elements higher up in your Dockerfile, separate from the code that always changes. The difference, though, is that the Docker cache speeds up builds, while layers speed up Lambda deployments.
But layers aren’t magic, and they’re still bound by the AWS size limit: your entire function (including all its layers) needs to be no larger than 250MB (unzipped).
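To make the limit concrete, here is a minimal sketch for checking whether an unpacked package directory would fit. The function names and the idea of measuring a local build directory are my own assumptions for illustration, not part of any AWS tooling.

```python
import os

# 250MB unzipped limit for a function plus all of its layers.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_lambda(root, limit=LAMBDA_UNZIPPED_LIMIT_BYTES):
    """Return True if the unpacked contents of root are within the limit."""
    return directory_size_bytes(root) <= limit
```

Pointing `fits_in_lambda` at the directory you are about to zip gives a quick yes/no before you try to publish anything.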
That’s tough for something like spaCy, because its default installation size on AWS Linux is ~400MB (or 492MB based on my quick installation on lambci for python3.7). So, in order to get spaCy working on a Lambda, certain tweaks are going to be necessary.
Some have tried working around this problem by installing spaCy onto the Lambda container on cold-start, i.e. pulling the data into Lambda only at runtime, when you have access to the 512MB in /tmp. Cool solution, but it almost completely fills up /tmp, and makes a cold-start even slower.
A better solution would be to reduce the size of the spaCy installation so that it fits into a layer! Fortunately, after some googling, I found a GitHub issue that enables us to do exactly this.
It involves removing unnecessary language files, which spaCy lazy-loads. If you’re only interested in one language, you can simply remove the unnecessary language files in the
After manually removing all non-English (en) language files, I managed to reduce the size of the spaCy package to 110MB, which fits very nicely into a Lambda layer. In the end, my lang directory only had the following files:
As a bonus, I also included the English en_core_web_sm-2.1.0 model, to make the Lambda layer fully usable on its own.
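I removed the non-English language files by hand, but the same pruning could be scripted. The sketch below is only an illustration under my own assumptions: the helper name `prune_language_dirs` is hypothetical, it only deletes sub-directories (top-level files are left alone), and which directories your spaCy version actually needs is something you should verify before deleting anything.

```python
import shutil
from pathlib import Path


def prune_language_dirs(lang_dir, keep=("en",)):
    """Delete language sub-directories not listed in keep.

    Returns the sorted names of the directories that were removed.
    Plain files directly inside lang_dir are never touched.
    """
    removed = []
    for entry in sorted(Path(lang_dir).iterdir()):
        if not entry.is_dir():
            continue  # leave top-level modules in place
        if entry.name in keep or entry.name.startswith("__"):
            continue  # keep wanted languages and __pycache__
        shutil.rmtree(entry)
        removed.append(entry.name)
    return removed
```

Run it against a copy of the package first, then try importing spaCy and loading your model to confirm nothing essential went missing.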
Finally, I published it as a publicly available layer. One of the amazing things about layers is that once a layer is made, it can be shared across AWS for anyone to consume.
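For anyone curious what making a layer public might look like in code, here is a rough sketch using boto3's `add_layer_version_permission` call. The layer name, version number, and statement ID are placeholders I made up, and the call itself assumes boto3 is installed and AWS credentials are configured; this is not necessarily how I published mine.

```python
def public_layer_permission(layer_name, version_number, statement_id="public-access"):
    """Build the request parameters that grant every AWS account read access."""
    return {
        "LayerName": layer_name,
        "VersionNumber": version_number,
        "StatementId": statement_id,          # placeholder identifier
        "Action": "lambda:GetLayerVersion",   # lets others use the layer version
        "Principal": "*",                     # all AWS accounts
    }


def share_layer(layer_name, version_number):
    """Apply the public permission to one layer version (needs AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("lambda")
    return client.add_layer_version_permission(
        **public_layer_permission(layer_name, version_number)
    )
```

Permissions apply per layer version, so a newly published version would need the same grant again, e.g. `share_layer("my-spacy-layer", 2)` with a hypothetical layer name.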