Working with Large Dependencies (≥500MB) with AWS Lambda and EFS (Amazon Elastic File System)
Satyajit Ghana
DL Engineer

This story begins with a lot of hard work put into deploying ML models on AWS Lambda and exhausting my entire free S3 quota. I’ve built backends with AWS Lambda and the Serverless framework; it makes it really easy to build and deploy model backends. But . . .
What is the storage issue with Lambda?

All of my backends depend on PyTorch and Torchvision, and all deployments were fine until PyTorch 1.6.0 arrived and exhausted my Lambda storage. You get about 500MB of storage in Lambda’s /tmp directory, and serverless-python-requirements unpacks your dependencies into that same directory, and PyTorch 1.6.0 exceeded that. I had no other way to make it work than moving to Heroku, which solved all my problems!! Heroku has a compressed slug limit of 500MB and GBs of temporary storage. Of course that means all of the temp storage is released when the dyno spins down, but I had no state! All I needed was for my dependencies to work.
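For context, the requirements end up in /tmp because of the zip option of the serverless-python-requirements plugin, which packs the dependencies into a zip and extracts them at runtime. A minimal sketch of that setup, assuming that’s the plugin configuration in play:

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    zip: true

The handler then starts with the plugin’s unzip shim, so the packages land in /tmp before anything heavy is imported:

try:
    import unzip_requirements  # extracts the zipped requirements into /tmp
except ImportError:
    pass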
Another huge issue was that I couldn’t bundle OpenCV, PyTorch and SpaCy models in the same Lambda due to the 500MB dependencies limit; I had to make different Lambdas for different parts of the preprocessing and inferencing.
And on top of all these issues, I exhausted my S3 quota 😕, what a bummer. It turns out Serverless uploads the package to S3 and then deploys the Lambda, and it also keeps the last 5 deployments, so that’s about 1.2GB of space per Lambda in S3. The free tier gives me a max of 5GB, so of course it had to run out; in my experimentation phase I push at least 4–5 times to get a backend to work.
What’s the solution?

I searched a lot for a good solution, both to my storage exhaustion and to my deployment times. Deployment alone took a lot of time, since I always had to upload the 250MB slug to S3 and redeploy every single time I changed something in my backend.
The deployment time can be solved easily by using Lambda Layers, which is a wonderful concept! It lets you create different layers with different dependencies, and you can always update your backend function without needing to touch the layers; the layers are responsible for the dependencies. But . . . this DOES NOT solve my storage limit: the function and all of its layers combined still count against the same unzipped size limit, which means I can’t use the latest PyTorch 1.7.1. Bummer . . . 😖😕
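For reference, this is roughly what defining and attaching a layer looks like in Serverless; the layer name and path here are hypothetical:

layers:
  pytorch:
    path: pytorch-layer  # folder containing python/<site-packages> for the layer

functions:
  app:
    handler: handler.main
    layers:
      - { Ref: PytorchLambdaLayer }  # TitleCased layer name + "LambdaLayer"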
#
Then comes EFS (Amazon Elastic File System)

Finally, after discovering that EFS works with Lambda out of the box, I had to try it out. Was it true? Could I use PyTorch + OpenCV + SpaCy + my models all on a single Lambda?
yeah yeah read on . . .
I’ll be using the Serverless framework to deploy my backend, since it makes it really easy to set everything up; I don’t have to do much work configuring API Gateway and the rest.
More about EFS here: https://docs.aws.amazon.com/efs/latest/ug/how-it-works.html
EFS FileSystem
Let’s create an EFS instance!

Go to your AWS Console -> EFS -> Create File System
(remember the VPC being used here)
Now go ahead and create an “Access Point” for this EFS
The above just means that this Access Point has full access to the EFS
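If you prefer the CLI over the console, creating an equivalent access point looks something like this; the file system ID, UID/GID and root path are placeholders:

aws efs create-access-point \
  --file-system-id fs-12345678 \
  --posix-user Uid=1000,Gid=1000 \
  --root-directory 'Path=/lambda,CreationInfo={OwnerUid=1000,OwnerGid=1000,Permissions=777}'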
Now we have to somehow copy our dependencies into this newly created EFS
To make sure that our dependencies will definitely work with Lambda, we’ll use Lambda’s Docker image to build the dependencies first.
I built my Docker image, mounted a folder, and installed the packages into that folder.
Here’s my Dockerfile
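It boils down to something like this; a minimal sketch, assuming the lambci build image that mirrors the Python 3.8 Lambda runtime:

# build on Lambda's build image so compiled wheels match the Lambda runtime
FROM lambci/lambda:build-python3.8

# install all the dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# copy my code into the image
COPY . .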
The above basically installs all dependencies and copies my code into the image, nothing fancy.
I built the image with docker build -t tejas .
and ran it with the empty requirements folder mounted into the container
Then installed requirements into the requirements folder
The -t option tells pip to install the collected packages into the given target folder.
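Concretely, the two steps look something like this, with the image tag from above and my assumed folder layout:

docker run -it -v "$(pwd)/requirements":/requirements tejas bash

# then, inside the container:
pip install -r requirements.txt -t /requirements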
Installing the requirements in the requirements folder mounted into docker container
These are all the requirements copied to our host machine from the docker container
#
It’s EC2 Time

Now we need to spin up an EC2 instance and copy these requirements to it first, so go to your EC2 Dashboard, select a simple Ubuntu server and start it.
EC2 selection
But you won’t be able to mount your EFS yet . . . to do that, go to Security Groups and add an Inbound rule (NFS) to the SG associated with your EFS’s VPC, to allow EC2 to mount NFS over this VPC.
Allowing an NFS Inbound rule into the EFS’s default VPC’s Security Group
When choosing the SG here, select the SG of your newly created EC2 instance, which will probably be named launch-wizard-x.
Now let’s connect to our EC2 instance and mount this EFS. Go to your EC2 instance and click on CONNECT; you’ll get something like
Command to SSH into EC2
Now you can use your pem file and the above command to connect to EC2
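It’s the standard SSH invocation; the key file and hostname below are placeholders for whatever your console shows:

ssh -i "my-key.pem" ubuntu@ec2-12-34-56-78.us-east-1.compute.amazonaws.com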
To mount the EFS volume, go to your EFS instance, click on ATTACH, and you’ll get something like
Command to mount EFS
Copy the “Using the NFS Client” command,
Run this in your EC2 to install the NFS Utilities:
sudo apt install nfs-common
and make an efs directory
mkdir efs
Now you can mount your NFS !
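The copied command will look roughly like this, with your own file system ID and region in place of the placeholders:

sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-12345678.efs.us-east-1.amazonaws.com:/ efs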
If you do a quick df -h you’ll see that your EFS is now mounted
EFS mounted into EC2
I ran into timeout errors because my SGs weren’t right; the above steps shouldn’t cause any issues, but if you aren’t able to mount, look into: https://docs.aws.amazon.com/efs/latest/ug/troubleshooting-efs-mounting.html#mount-hangs-fails-timeout
Now we can copy over our requirements to this EFS; I’ll use scp for that.
Copying Requirements from local machine to EC2 via SCP
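From the local machine, something along these lines; the key, folder and hostname are placeholders:

scp -i "my-key.pem" -r ./requirements ubuntu@ec2-12-34-56-78.us-east-1.compute.amazonaws.com:~/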
Now simply copy over the requirements to the EFS
Requirements copied to EFS via EC2
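On the EC2 instance, with the EFS mounted at ~/efs as above, that’s just a copy; the layout is my own choice:

sudo cp -r ~/requirements ~/efs/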
Isn’t this just amazing!!!!! You get 5GB of EFS storage for free, and you can use multiple Lambdas with the same dependencies! And as soon as I update the EFS files, all the Lambdas pick up the new dependencies on the next invocation!
EFS Total Size
As you can see above, all the dependencies together total 854 MiB, and Lambda works with all of them! Damn right?
Here’s my requirements.txt
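It covers roughly the stack mentioned throughout this post; the exact pins below are my best guess rather than the original file:

torch==1.7.1
torchvision==0.8.2
opencv-python
spacy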
There are some drawbacks though
Since you have large dependencies and those files are read every time your Lambda cold-starts, it’ll cost you read operations, and EFS has its own way of billing you for them.
Also, if multiple Lambdas use the common EFS for dependencies and you decide to update it someday, you might break some of your Lambdas, since they might be holding on to specific dependency versions. Well, I don’t think it’s really an issue, I like to live on the edge 😂😂 bring in all the updated dependencies.
You’ll lose Internet access in Lambda, since you are in a VPC, but that can be solved easily: https://stackoverflow.com/questions/62240023/aws-lambda-function-cant-invoke-another-lambda-function-in-the-same-vpc/65541460#65541460
Deploying Serverless with EFS attached

Now I’ll deploy the serverless application 😁
Look at that cute little zip file 😵😚😍 of just 9.02 KB! Awww. Compared to the 200–240 MB zips being uploaded on every deployment, that’s quite an improvement, isn’t it?

My original plan was to use this Lambda to train a deep learning model, and I was successful!! (BTW this is one Lambda calling another Lambda to train the model, and both of them share the same EFS)
CloudLogs for Model Training
Here’s my serverless.yml
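The important bits are the VPC settings and the fileSystemConfig pointing at the EFS access point. A trimmed sketch, with every ID and ARN as a placeholder:

service: tejas-backend

provider:
  name: aws
  runtime: python3.8
  region: us-east-1
  vpc:
    securityGroupIds:
      - sg-12345678
    subnetIds:
      - subnet-12345678

functions:
  app:
    handler: handler.main
    events:
      - http: ANY /
    fileSystemConfig:
      localMountPath: /mnt/efs  # must live under /mnt/
      arn: arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-12345678

And for the function to actually import packages from EFS, the mount path has to be on sys.path before anything heavy is imported; a hypothetical handler top, assuming the layout used above:

import sys

# make the EFS requirements folder importable before torch & co. are loaded
sys.path.insert(0, "/mnt/efs/requirements")

import torch  # resolved from EFS instead of the deployment package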
This is an amazing blog post on how to use EFS with Lambda: https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aws-lambda-in-your-serverless-applications/
Some interesting links: