We’ve written in the past about how the Flowcase web application is written in Ruby on Rails. The web application isn’t the whole story, though. Surrounding it are many supporting services, and we are increasingly using Rust to write these services.
We’re also using more Lambdas in our architecture, and we want to use Rust in those as well. The landscape for Rust lambdas isn’t barren, but it’s not well-trodden either.
This post is going to cover how we write, build, and deploy our Rust lambdas. Our lambdas have the following qualities that we’re proud of and want to share with you:
- Fast, standardised build. All of our lambdas use the same Dockerfile to build, and make good use of Docker’s layer caching. Incremental builds in CI take under a minute.
- Run locally. If you’re working on a lambda, you don’t want to have to sit through a CI build to see if your changes work. All of our lambdas can run locally and in AWS using the same code.
- Private GitHub dependencies. There aren’t many options out there for private Cargo repositories, so we use private GitHub repositories for our internal libraries.
The code
The starting point for writing a Lambda in Rust is to use the official Rust lambda runtime. At the time of writing, the last release of this library is version 0.2, which doesn’t support async/await. Async/await support is present in master, though. Here’s how it looks in practice:
The problem with this is that you can’t run it locally. The #[lambda] attribute wraps your main function in another main function that calls in to the AWS lambda API.
To get around this, we write two main functions:
We’re making use of Rust’s “feature” flags to compile a different harness around the handle function depending on whether we want to run locally or in AWS.
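A minimal sketch of this split, using only the standard library, might look like the following. The body of handle and the placeholder where the Lambda runtime would be invoked are assumptions for illustration, not our production code. The only part that matters is the pair of cfg attributes keyed on the with-lambda feature.

```rust
use std::env;

/// The business logic, shared by both harnesses.
/// Hypothetical handler: it simply greets the name it was given.
fn handle(name: &str) -> Result<String, String> {
    if name.trim().is_empty() {
        return Err("name must not be empty".to_string());
    }
    Ok(format!("Hello, {}!", name.trim()))
}

// Compiled only when building with `--features with-lambda`.
#[cfg(feature = "with-lambda")]
fn main() {
    // Placeholder: the deployed build would hand `handle` to the
    // Lambda runtime here instead of panicking.
    unimplemented!("wire `handle` into the Lambda runtime");
}

// Compiled by default, so a plain `cargo run -- Alice` works locally.
#[cfg(not(feature = "with-lambda"))]
fn main() {
    let name = env::args().nth(1).unwrap_or_else(|| "world".to_string());
    match handle(&name) {
        Ok(greeting) => println!("{}", greeting),
        Err(err) => eprintln!("error: {}", err),
    }
}
```

Because exactly one of the two main functions survives conditional compilation, the binary always has a single entry point, and handle never needs to know which harness is calling it.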
Here’s the Cargo.toml file:
Two noteworthy things:
- We’re using a version of the aws-lambda-rust-runtime that hasn’t officially been released. This isn’t ideal, and we’re eagerly awaiting a 0.3 release.
- We have a features section, which is where we define the with-lambda feature we use in the Rust code shown above.
Running our lambda now gives us the following:
The Dockerfile
All of our lambdas build with the same Dockerfile. I’ll show it in all of its glory and then explain what’s going on bit by bit. Brace yourself.
First of all, shout out to Shane Utt whose blog post we used as a starting point for this.
The first line is a Docker directive that says we want to use some experimental Dockerfile syntax. The syntax in question is the --mount=type=ssh flag to the RUN commands, but we’ll talk about that later.
This next bit says we want to use the latest Rust image, and we’re passing in a build arg called “name.” This is how we’re able to share this Dockerfile between all of our lambdas without having to modify it.
Next we run an update on the image, and we install musl. If you’re not familiar, musl is a libc replacement that you can link to statically. This means the resulting binary won’t depend on the system’s libc, which makes it more portable. It’s not a strict requirement for running on AWS Lambda, but it’s good practice.
The next few lines set up a pseudo project, where the only things we’re going to compile are our dependencies and a dummy main.rs. The idea behind this is to use Docker’s layer caching to avoid having to compile our dependencies every build. This leads to significantly faster incremental builds in Docker.
Up until now, we’ve done exactly what Shane Utt did in his version of this. These three lines, though, are new. Because we use SSH to fetch private dependencies (more on this later), we would sometimes find that our builds would fail with the error “host key verification failed.” To get around that, we pull down GitHub’s host keys and check them against the fingerprints GitHub publishes.
Our first bit of experimental syntax! The --mount flag was introduced with the BuildKit engine for Docker, and it’s covered in depth in the BuildKit documentation. The type=ssh bit is us telling Docker that we want to use an SSH agent for this command. In the docker build invocation, which we’ll see later, we can tell Docker what keys to add to this SSH agent.
We do this because it was the only way we could find to depend on private GitHub repositories in our Cargo.toml file that worked both locally and in CI. It means we can do this in our Cargo.toml file:
And it Just Works™.
The rest of the RUN command is our first cargo build. It looks a lot scarier than it is. Most of it is us telling rustc to link against musl instead of the default libc. The only other interesting bit is the --features with-lambda. This matches up with the code we saw earlier to produce a binary that’s going to work properly when deployed in AWS.
Next up, we’re copying over our actual source code. The touch command is necessary for cargo to realise the source has changed: the dummy main.rs file we created earlier has a later timestamp than the real main.rs file, so without it cargo would treat the earlier build as up to date. This is different to the approach taken by Shane Utt, as we found that approach would often result in builds where the dummy main.rs file was the one that ended up in the final build.
Another addition is the cargo test invocation. Tests are good!
Lastly we create a new build stage and copy over the final executable. The new build stage is in order to keep the final image small. Ours tend to clock in at around 8MB.
The build script
Invoking Docker is done in a shell script which is also identical for all of our lambdas.
Let’s walk through it just like we did with the Dockerfile.
These lines are common to a lot of the bash scripts we have at Flowcase. They set up the following behaviours:
- The script will exit on the first unsuccessful command.
- The script will echo every command run.
- The script will execute as if it were run from the directory it lives in.
We find these to be useful defaults for writing most of our bash scripts.
These lines aren’t beautiful, but they get the job done. The name of the project is extracted from the Cargo.toml file and some variables are set based on it.
We use the presence of an environment variable called BUILD_ID to check if we’re running in CI or locally. We use this information to set some more variables, the first one being the tag to use for the Docker image we end up building. The other is used to tell Docker what SSH keys to forward into the Docker build. This is the other half of the --mount=type=ssh thing we saw in the Dockerfile. Locally we tell it to use our existing SSH agent, and in CI we use a specific key that has access to our private GitHub repositories.
We also make sure to use our private Docker registry in CI so when we push images, they’re then available to use by later builds as a cache. Locally, we don’t do this. We just use the local Docker daemon.
This is the bit that actually does the building. The Docker invocation is quite involved, so let’s break it down:
- DOCKER_BUILDKIT=1 is an environment variable we have to set to tell Docker to use the BuildKit engine. We need to do this for the SSH agent forwarding.
- docker build is what it says on the tin.
- $SSH subs in the SSH part of the command we crafted earlier.
- --cache-from $TAG tells the Docker build to use any layers it can from the tag we specified earlier. Without this, it would only search locally for layers.
- --build-arg "name=$NAME" passes in the name of the project we extracted from the Cargo.toml file.
- --build-arg "BUILDKIT_INLINE_CACHE=1" embeds cache metadata in the image we build. With BuildKit, --cache-from can only reuse layers from images that carry this metadata.
- -t $TAG sets the name of the image once built.
Phew. Scary but necessary to get all the good stuff.
The line after copies the resulting binary out of the image and into our current directory with the name “bootstrap.” The name matters, as it’s the file AWS Lambda looks for and executes when you’re using a custom runtime rather than one of the built-in ones. More on this later.
If we’re running in CI, push to our private registry.
Create a .zip file containing our bootstrap executable. This is the final packaged artifact we’ll be uploading to AWS Lambda in the next step.
The deployment
As mentioned in a previous post, we deploy all of our infrastructure using CloudFormation. Our lambdas are no exception.
The barebones CloudFormation template for one of our Rust lambdas contains 3 resources, and only 2 of them are strictly necessary. I’ll cover all 3 for completeness.
IAM Role
The first one is the most vanilla. It’s your bog-standard Lambda IAM role:
It’s very common for our lambdas to access other AWS resources, and when they do we’ll add permissions to this role. By default it just allows lambda.amazonaws.com to assume it, and it allows the lambda to create and write logs.
The lambda
The second and last necessary resource is the lambda itself.
There are a few things here you might want to modify. The FunctionName and Description, for example. Also if you anticipate needing different Timeout or MemorySize parameters you should, of course, tweak those.
You can see the reference to our lambda.zip in here. This is a local file path, which means this template will need to be packaged before use. Our CI server does this for us at build time, which doubles up nicely as a basic check on the validity of our templates.
You can also see that we pass in RUST_BACKTRACE=1. We found that in practice the usefulness of this outweighed the costs. In a bunch of our lambdas we also use the env_logger crate and set RUST_LOG=rust_lambda_template=info for logging. Note that you’ll need to change rust_lambda_template to the module name of your executable. This should be the same as the name field in your Cargo.toml, with dashes replaced with underscores.
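The dash-to-underscore rule is easy to get wrong by hand, so here is a small sketch of it. The function name rust_log_filter is hypothetical; it only illustrates how the value of RUST_LOG follows from the name field in Cargo.toml.

```rust
/// Builds a RUST_LOG filter for a crate, given the `name` field from Cargo.toml.
fn rust_log_filter(package_name: &str, level: &str) -> String {
    // Cargo package names may contain dashes, but the crate name that
    // appears in log targets uses underscores instead.
    let crate_name = package_name.replace('-', "_");
    format!("{}={}", crate_name, level)
}

fn main() {
    // Prints the filter for the template project's package name.
    println!("{}", rust_log_filter("rust-lambda-template", "info"));
}
```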
The monitoring
This last resource is the optional one. For all of our lambdas we like to have some basic monitoring in place. The one alert we think applies to all of our lambdas is one that tells us if the lambda has a higher-than-usual error rate. Our template defines something generic that can be tweaked as necessary.
It’s a lot, but all it’s saying is if the error rate of the lambda is higher than 5% for 5 minutes, an alert is sent to an SNS topic. We hook this up to PagerDuty, and we hook PagerDuty up to Slack and our phones.
There’s a lot of flexibility in CloudWatch alarms. For example, if you want to alert if 2 out of the last 5 data points are above the given threshold, you can specify a DatapointsToAlarm: 2 parameter. It’s worth losing a few hours in the documentation if alerting is something you’re planning to take seriously.
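To make the “M out of N” idea concrete, here is a simplified model of that evaluation. It is a sketch under stated assumptions, not CloudWatch’s implementation: the function name and its inputs are hypothetical, every period is assumed to have a data point, and missing-data handling is ignored entirely.

```rust
/// Returns true if at least `datapoints_to_alarm` of the most recent
/// `evaluation_periods` error rates are strictly above `threshold`.
fn should_alarm(
    error_rates: &[f64],
    threshold: f64,
    evaluation_periods: usize,
    datapoints_to_alarm: usize,
) -> bool {
    // Only the most recent `evaluation_periods` data points are considered.
    let start = error_rates.len().saturating_sub(evaluation_periods);
    let breaching = error_rates[start..]
        .iter()
        .filter(|&&rate| rate > threshold)
        .count();
    breaching >= datapoints_to_alarm
}

fn main() {
    // Error rates in percent, oldest first, one per period.
    let rates = [1.0, 7.5, 2.0, 6.0, 0.5];
    println!("2 of 5 above 5%: {}", should_alarm(&rates, 5.0, 5, 2));
    println!("3 of 5 above 5%: {}", should_alarm(&rates, 5.0, 5, 3));
}
```

In this model, two breaching points out of the last five are enough to alarm when DatapointsToAlarm is 2, even though the most recent point is healthy.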
A note on Rusoto
Because we’re frequently interacting with AWS services, we use the rusoto crates. They’re fantastic, but there are two things you should be aware of:
- Creating a client struct is expensive.
- There are no retries by default.
For the first one, we recommend using the lazy_static crate. Here’s an example:
This reuses the client struct as much as possible. While a Lambda execution environment stays warm, successive invocations go to the same process, so the client is also reused across invocations.
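The same build-once pattern can be sketched without any external crates. Note the substitutions: this uses the standard library’s OnceLock rather than lazy_static, and a hypothetical Client struct stands in for a rusoto client. The region string and the construction counter exist purely for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts constructions so we can see the client is only built once.
static CONSTRUCTIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for an expensive-to-build AWS client.
struct Client {
    region: String,
}

impl Client {
    fn new(region: &str) -> Self {
        CONSTRUCTIONS.fetch_add(1, Ordering::SeqCst);
        Client { region: region.to_string() }
    }
}

/// Returns the process-wide client, building it on first use only.
fn client() -> &'static Client {
    static CLIENT: OnceLock<Client> = OnceLock::new();
    CLIENT.get_or_init(|| Client::new("eu-west-1"))
}

fn main() {
    // Simulate several invocations handled by the same process.
    for _ in 0..3 {
        println!("using client for {}", client().region);
    }
    println!("constructed {} time(s)", CONSTRUCTIONS.load(Ordering::SeqCst));
}
```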
For the second one we’ve found the again crate works well.
By default this retries 5 times, with backoff. The retry policy is configurable if you need more control, and it’s covered in the crate’s documentation.
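If you’d like to see what retrying with backoff involves under the hood, here is a hand-rolled, blocking sketch using only the standard library. It is not the again crate’s API: retry_with_backoff and its parameters are hypothetical names, and a real policy would usually add jitter and a maximum delay.

```rust
use std::thread;
use std::time::Duration;

/// Calls `operation` until it succeeds or `max_retries` retries are used up,
/// doubling the delay after each failed attempt.
fn retry_with_backoff<T, E, F>(
    max_retries: u32,
    initial_delay: Duration,
    mut operation: F,
) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut delay = initial_delay;
    let mut retries = 0;
    loop {
        match operation() {
            Ok(value) => return Ok(value),
            // Out of retries: surface the last error to the caller.
            Err(err) if retries >= max_retries => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                delay *= 2; // exponential backoff
                retries += 1;
            }
        }
    }
}

fn main() {
    // Simulated flaky call: fails twice, then succeeds.
    let mut attempts = 0;
    let result: Result<&str, &str> = retry_with_backoff(5, Duration::from_millis(1), || {
        attempts += 1;
        if attempts < 3 { Err("throttled") } else { Ok("done") }
    });
    println!("{:?} after {} attempts", result, attempts);
}
```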
Closing Thoughts
That’s it. That’s how we do Rust lambdas at Flowcase, from start to finish. We’re hoping that this serves as a resource for other people wanting to start running Rust lambdas in production, and have been struggling to find somewhere that ties all of the pieces together.