jmanteau

Mon coin de toile - A piece of Web

Cleaning up an AWS Glacier Vault: A Step-by-Step Guide

Posted: Jan 14, 2024

I recently took on a task that sounds simple but turned out to be fairly involved: migrating a vault full of archives from one AWS region to another. Beyond the migration itself, this meant deleting the old vault, which is less straightforward than it sounds. A vault can only be deleted once it is empty, and the AWS Management Console offers no direct way to delete a vault that still contains archives. I documented the process I followed in the hope that it helps others facing the same problem.

Step 1: Set Up Variables

First things first, you need to set some environment variables. This will streamline the subsequent commands. Replace YOUR_ACCOUNT_ID, YOUR_REGION, and YOUR_VAULT_NAME with your actual AWS account ID, the region of your vault, and the vault name:

export AWS_ACCOUNT_ID=YOUR_ACCOUNT_ID
export AWS_REGION=YOUR_REGION
export AWS_VAULT_NAME=YOUR_VAULT_NAME

Step 2: Retrieve Inventory

Next, we’ll create a job to gather all the necessary information about the vault’s contents. Run the following command:

aws glacier initiate-job --job-parameters '{"Type": "inventory-retrieval"}' --account-id $AWS_ACCOUNT_ID --region $AWS_REGION --vault-name $AWS_VAULT_NAME
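
Since the following steps need the job's ID, it can be convenient to capture it at the moment the job is created. The sketch below wraps the same command in a helper and uses the CLI's --query option to print only the ID. The function name start_inventory_job is hypothetical, and the jobId key is assumed to match what initiate-job prints on your CLI version.

```shell
# Start an inventory-retrieval job and print only its job ID.
# Assumes AWS_ACCOUNT_ID, AWS_REGION and AWS_VAULT_NAME are exported (Step 1).
start_inventory_job() {
    aws glacier initiate-job \
        --job-parameters '{"Type": "inventory-retrieval"}' \
        --account-id "$AWS_ACCOUNT_ID" \
        --region "$AWS_REGION" \
        --vault-name "$AWS_VAULT_NAME" \
        --query 'jobId' --output text
}
```

It could be used as `export AWS_JOB_ID=$(start_inventory_job)`, where AWS_JOB_ID is just an illustrative variable name.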

Keep in mind, this process can take a significant amount of time, especially for larger vaults (in my case it took 6 hours). To check the status of this job, use:

aws glacier list-jobs --account-id $AWS_ACCOUNT_ID --region $AWS_REGION --vault-name $AWS_VAULT_NAME

Once the job shows as complete (Completed is true and StatusCode is Succeeded in the list-jobs output), note down its JobId for the next step.
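
Because the job can run for hours, checking by hand gets tedious. Here is a minimal polling sketch built on aws glacier describe-job. The helper name wait_for_glacier_job and the default 15-minute interval are my own assumptions, so adjust them to taste.

```shell
# Poll a Glacier job until it succeeds or fails.
# Usage: wait_for_glacier_job JOB_ID [INTERVAL_SECONDS]
# Assumes AWS_ACCOUNT_ID, AWS_REGION and AWS_VAULT_NAME are exported (Step 1).
wait_for_glacier_job() {
    local job_id="$1"
    local interval="${2:-900}"   # assumed default: check every 15 minutes
    local status
    while true; do
        # StatusCode is one of InProgress, Succeeded or Failed.
        status=$(aws glacier describe-job \
            --account-id "$AWS_ACCOUNT_ID" \
            --region "$AWS_REGION" \
            --vault-name "$AWS_VAULT_NAME" \
            --job-id "$job_id" \
            --query 'StatusCode' --output text) || return 1
        case "$status" in
            Succeeded) echo "Job $job_id succeeded"; return 0 ;;
            Failed)    echo "Job $job_id failed" >&2; return 1 ;;
            *)         echo "Job $job_id is $status; sleeping ${interval}s"
                       sleep "$interval" ;;
        esac
    done
}
```

Calling `wait_for_glacier_job YOUR_JOB_ID` blocks until the inventory is ready, so it can be chained directly with the next step.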

Step 3: Get the Archive IDs

Now, let’s download the job output, which is the vault inventory, into a JSON file. It lists every archive in the vault along with its archive ID, and these IDs are essential for the deletion process. Execute the following command:

aws glacier get-job-output --account-id $AWS_ACCOUNT_ID --region $AWS_REGION --vault-name $AWS_VAULT_NAME --job-id YOUR_JOB_ID ./output.json
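
Before deleting anything, it is worth a quick look at what the inventory actually contains. The sketch below counts the archives and sums their sizes with jq. The helper name summarize_inventory is hypothetical, and it assumes the inventory has the usual ArchiveList array whose entries carry a Size field in bytes.

```shell
# Print the number of archives and their combined size from an inventory file.
# Usage: summarize_inventory [FILE]   (defaults to ./output.json)
summarize_inventory() {
    local file="${1:-./output.json}"
    jq -r '
        "archives: \(.ArchiveList | length)",
        "total bytes: \([.ArchiveList[].Size] | add // 0)"
    ' "$file"
}
```

If the archive count is zero, there is nothing to delete and Step 4 can be skipped.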

Step 4: Delete the Archives

Finally, it’s time to delete the archives. This process can be time-consuming, so here’s a script that uses xargs to run up to 8 deletion commands concurrently:

#!/bin/bash

file='./output.json'

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
    echo "Please set the following environment variables: "
    echo "AWS_ACCOUNT_ID"
    echo "AWS_REGION"
    echo "AWS_VAULT_NAME"
    exit 1
fi

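# Extract every archive ID from the inventory and delete up to 8 archives in parallel.
# xargs appends each ID after the placeholder "_" (which becomes $0), so the ID is $1.
# The --archive-id=... form keeps IDs that begin with "-" from being parsed as options.
jq -r '.ArchiveList[].ArchiveId' < "$file" | xargs -P8 -n1 bash -c "echo \"Deleting: \$1\"; aws glacier delete-archive --archive-id=\"\$1\" --vault-name \"${AWS_VAULT_NAME}\" --account-id \"${AWS_ACCOUNT_ID}\" --region \"${AWS_REGION}\"" _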

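After running this script, all archives in your vault will be deleted. You can then delete the vault itself through the AWS Management Console. Note that Glacier only allows this once its own inventory of the vault, which is refreshed roughly once a day, shows no archives and no writes since that inventory, so you may have to wait before the deletion is accepted.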
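
The vault can also be removed from the command line with aws glacier delete-vault. A minimal sketch, assuming the same environment variables as in Step 1 (delete_vault is a hypothetical helper name): if Glacier still considers the vault non-empty, the call fails, and retrying later is usually all it takes.

```shell
# Try to delete the (now empty) vault and report the outcome.
# Assumes AWS_ACCOUNT_ID, AWS_REGION and AWS_VAULT_NAME are exported (Step 1).
delete_vault() {
    if aws glacier delete-vault \
        --account-id "$AWS_ACCOUNT_ID" \
        --region "$AWS_REGION" \
        --vault-name "$AWS_VAULT_NAME"; then
        echo "Vault $AWS_VAULT_NAME deleted"
    else
        echo "Could not delete $AWS_VAULT_NAME yet; try again later" >&2
        return 1
    fi
}
```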