Last year, AWS introduced Amazon Data Lifecycle Manager (DLM), a feature that snapshots your EBS volumes on a schedule. This was previously possible using a Lambda function or cron job, but if you can use the DLM, you should. Here are the full details.
EBS snapshots created with the DLM are indistinguishable from snapshots created manually or via the API. However, the DLM is superior to a homegrown snapshot solution for many use cases.
- It handles the basics of snapshot management in a well-documented way that corresponds to AWS best practices, with no code on your part.
- You can create multiple policies and manage which EBS volumes they are applied to with tags. Changing the retention policy of a volume is as easy as switching a tag.
- You can manage DLM policies with declarative infrastructure-as-code tools like Terraform, or with the AWS CLI.
- Expiration is built into the DLM and is communicated transparently. You state how many backups you want to keep and the system shows you how long it will keep each backup around. For example, if you select a twenty-four-hour frequency and 90 backups, the DLM tells you that it will keep backups for at most three months.
- Less code is better, especially when it is for something as crucial as backups.
- It’s free (you are charged based on the size of the snapshots, just as you would be for a normal snapshot).
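The retention arithmetic from the list above can be sketched in a few lines of Python. This is a minimal illustration, not the DLM's own code: the helper names `build_schedule` and `retention_window` and the schedule name are hypothetical, and the dictionary only mirrors the general shape of a schedule in a DLM policy document (an interval in hours plus a count of snapshots to retain).

```python
from datetime import timedelta


def build_schedule(name: str, interval_hours: int, retain_count: int) -> dict:
    """Build a schedule entry shaped like one in a DLM policy document.

    Hypothetical helper for illustration; check the DLM documentation for
    the interval values the service currently accepts.
    """
    if interval_hours <= 0 or retain_count <= 0:
        raise ValueError("interval_hours and retain_count must be positive")
    return {
        "Name": name,
        "CreateRule": {"Interval": interval_hours, "IntervalUnit": "HOURS"},
        "RetainRule": {"Count": retain_count},
    }


def retention_window(schedule: dict) -> timedelta:
    """Upper bound on how long a snapshot is kept under count-based retention."""
    interval = schedule["CreateRule"]["Interval"]
    count = schedule["RetainRule"]["Count"]
    return timedelta(hours=interval * count)


nightly = build_schedule("nightly", interval_hours=24, retain_count=90)
print(retention_window(nightly).days)  # 90
```

Ninety daily snapshots cover at most 90 days, which is the "three months" figure the DLM reports.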
It’s worth acknowledging that, like any other choice, there are downsides to using the DLM.
- The most frequent schedule available is once every two hours and the least frequent is once every twenty-four hours. There is a fixed set of frequencies, listed in the documentation. If you need snapshots more often than every two hours, the DLM won’t work for you.
- It’s not entirely deterministic. If you need to know exactly when an EBS snapshot will occur, use another method.
- If you have special business logic that triggers a snapshot (for example, just before a release), the DLM won’t help you out.
- There are account-level limits for the DLM.
One thing that tripped us up when we first implemented a policy was that you need a different tag for each policy, and policies cannot share tag values. If you have one policy that backs up every other hour for fourteen days, and another that backs up every twenty-four hours for one hundred days, you might have these tags on the relevant volumes:
SnapshotEveryOtherHour = true
SnapshotEveryNight = true
That will work. The tag below, while equivalent in meaning to a human being, won’t work:
SnapshotRules = everyOtherHour everyNight
Even if you set up two policies that match on the “SnapshotRules” tag, the tag will be ignored.
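One way to picture why the combined tag fails is to treat each policy's target as an exact key/value pair. The sketch below is a simplified model under that assumption, not the DLM's actual matching code. The function `matching_policies`, the policy names, and the target values `everyOtherHour` and `everyNight` are all hypothetical.

```python
from typing import Dict, List


def matching_policies(
    volume_tags: Dict[str, str],
    policies: Dict[str, List[Dict[str, str]]],
) -> List[str]:
    """Return the names of policies whose target tag equals a volume tag exactly."""
    return [
        name
        for name, target_tags in policies.items()
        if any(volume_tags.get(t["Key"]) == t["Value"] for t in target_tags)
    ]


# One distinct tag per policy: the approach that works.
separate_tags = {
    "every-other-hour": [{"Key": "SnapshotEveryOtherHour", "Value": "true"}],
    "every-night": [{"Key": "SnapshotEveryNight", "Value": "true"}],
}

# Both policies keyed on the same tag: the approach that does not work.
shared_tag = {
    "every-other-hour": [{"Key": "SnapshotRules", "Value": "everyOtherHour"}],
    "every-night": [{"Key": "SnapshotRules", "Value": "everyNight"}],
}

volume_a = {"SnapshotEveryOtherHour": "true", "SnapshotEveryNight": "true"}
volume_b = {"SnapshotRules": "everyOtherHour everyNight"}

print(matching_policies(volume_a, separate_tags))  # ['every-other-hour', 'every-night']
print(matching_policies(volume_b, shared_tag))     # []
```

In this model the value "everyOtherHour everyNight" is a single string that equals neither policy's target value, so the volume is picked up by neither policy.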
For generic infrastructure tasks like taking snapshots, specialized tools maintained by AWS are superior to Lambda functions or other solutions that I have to write and maintain myself. There are always exceptions, but the DLM is definitely worth evaluating to see if it fits into your backup strategy.