How to delete an Amazon S3 bucket with lots of files in it
:: Monday, July 14th, 2008 @ 6:28:51 pm
:: Tags: Computers
This may be common knowledge for those of you who live in S3 land, but it took me a while to figure out.
If you’re on Mac OS X or have access to a system with Ruby installed, it’s easy to delete a bucket and all of its keys.
- download s3sync (the package includes the s3cmd.rb script)
- configure s3sync’s environment (as described in the README)
- open a terminal window and run “s3cmd.rb deleteall bucketname”
- then, if you want to delete the bucket itself, run “s3cmd.rb deletebucket bucketname”
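Under the hood, the two s3cmd.rb steps above boil down to a simple loop: delete every key with its own request, then delete the now-empty bucket. Here is a minimal Python sketch of that idea. It runs against FakeStore, a hypothetical in-memory stand-in (not a real S3 client), and the helper names delete_all and delete_bucket are made up for illustration; the “bucket must be empty” check mirrors the S3 rule I describe further down.

```python
# FakeStore is a hypothetical in-memory stand-in for an S3-like store, NOT a real client.
class FakeStore:
    def __init__(self):
        self.buckets = {}  # bucket name -> set of keys

    def list_keys(self, bucket):
        return sorted(self.buckets[bucket])

    def delete_key(self, bucket, key):
        self.buckets[bucket].discard(key)

    def delete_bucket(self, bucket):
        # Mirrors the S3 rule: a bucket can only be deleted once it is empty.
        if self.buckets[bucket]:
            raise RuntimeError(f"bucket {bucket!r} is not empty")
        del self.buckets[bucket]


def delete_all(store, bucket):
    """Step 1 (like `s3cmd.rb deleteall`): remove keys one request at a time."""
    deleted = 0
    for key in store.list_keys(bucket):
        store.delete_key(bucket, key)
        deleted += 1
    return deleted


def delete_bucket(store, bucket):
    """Step 2 (like `s3cmd.rb deletebucket`): only succeeds after step 1."""
    store.delete_bucket(bucket)


if __name__ == "__main__":
    store = FakeStore()
    store.buckets["bucketname"] = {f"backup/{i:05d}" for i in range(2500)}
    print(delete_all(store, "bucketname"))  # 2500
    delete_bucket(store, "bucketname")
    print("bucketname" in store.buckets)  # False
```

The point of the sketch is the ordering: every key costs one request, and the bucket delete can only happen at the very end.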
On my MacBook Pro over an EVDO connection, it seems to be deleting about 4 keys per second, which works out to just over 4 minutes for 1,000 keys and about 7 hours for 100,000 keys. ECHENG.COM alone has around 68,000 files in it, so 45 days of backups comes to about 3 million files. By my calculations, it will take roughly 9 days of constant deleting to remove all of those keys, and that’s only for ECHENG.COM. Wetpixel is pretty big, too.
I’m screwed. :(
UPDATE: On my Mac Pro over DSL, it’s deleting 10-15 keys per second, which is much better.
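To sanity-check those numbers, here’s a back-of-the-envelope calculation in Python. It assumes a steady deletion rate with one key removed per request, and plugs in the rough figures from above: about 68,000 files per backup, 45 daily backups, 4 keys/s on EVDO, and 10-15 keys/s on DSL.

```python
KEYS_PER_BACKUP = 68_000  # approximate file count for ECHENG.COM (from the post)
DAYS_OF_BACKUPS = 45      # full daily backups sitting in the bucket (from the post)
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_to_delete(num_keys: int, keys_per_second: float) -> float:
    """Wall-clock time if keys are deleted one after another at a steady rate."""
    return num_keys / keys_per_second


total_keys = KEYS_PER_BACKUP * DAYS_OF_BACKUPS  # 3,060,000 keys

print(f"1,000 keys at 4/s: {seconds_to_delete(1_000, 4) / 60:.1f} minutes")  # 4.2
print(f"100,000 keys at 4/s: {seconds_to_delete(100_000, 4) / 3600:.1f} hours")  # 6.9
for label, rate in [("EVDO", 4), ("DSL (low)", 10), ("DSL (high)", 15)]:
    days = seconds_to_delete(total_keys, rate) / SECONDS_PER_DAY
    # Prints 8.9, 3.5, and 2.4 days respectively.
    print(f"{label}: {rate} keys/s -> {days:.1f} days for {total_keys:,} keys")
```

So the DSL connection cuts the ECHENG.COM cleanup from roughly 9 days to somewhere between 2.4 and 3.5 days, still assuming nothing else slows it down.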

Success! It’s slow, but it looks like it’s working.
THE HISTORY, IF YOU CARE
I asked my server guys to use S3 as a backup repository for wetpixel.com and echeng.com, and they filled it up with a month and a half of FULL daily backups instead of doing something incremental. My sites are relatively large, and this resulted in 260GB of data spanning hundreds of thousands of files (“keys” in S3). I have been paying $40-60/month for S3 usage, and I wasn’t too happy after I discovered that the most recent backup was 6 months old (luckily, I’m told that they have been backing me up to their own drives instead).
I tried to delete the useless buckets, but discovered that you must delete all of the keys in a bucket before you can delete the bucket itself, and keys have to be deleted one at a time. There have been requests in the S3 forums for a feature that allows bucket deletion without prior removal of files, but the S3 folks replied that it was a “low priority.”
I’m on the Mac, and normally use S3 Browser.app and Transmit.app to access my S3 buckets. I tried deleting a bucket using Transmit, but it failed repeatedly with an error after it deleted some number of keys (it consistently fails, but can take 15 or 20 minutes before the error). S3 Browser is an app that fails to inspire confidence — it loads some number of keys at a time in bursts (1000, I think) and appends them to a single, scrolling list. I didn’t want to find out what might happen if I let it pull down the entire list of files.
I started looking for other options, and eventually found s3sync and s3cmd. I then discovered that Ruby (along with Rails) comes installed by default on Leopard, which was convenient. And now, I’m watching a scrolling list of files as they are removed from my bucket, which is very satisfying.
You should fire up an EC2 instance and see how fast it is able to delete items. It wouldn’t surprise me if it goes 10x faster than doing it over your DSL connection.
[...] Eric Cheng and other blogs that turned up in a Google search pointed to s3sync as a suitable tool for removing a non-empty bucket. [...]
“S3 Browser is an app that fails to inspire confidence — it loads some number of keys at a time in bursts (1000, I think) and appends them to a single, scrolling list. I didn’t want to find out what might happen if I let it pull down the entire list of files.”
Well, to be honest, this is actually how Amazon handles a LIST request: they don’t send the whole file list at once, but instead send it in chunks of 1,000 keys per response. :(
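That paging is easy to handle in code. S3’s LIST request takes a marker (return keys that sort after this one) and a max-keys limit, and it flags whether the response was truncated. The Python sketch below keeps asking for the next page, starting after the last key it received, until the listing is no longer truncated. The fetch_page callback and make_fake_fetcher are hypothetical stand-ins for a real S3 client, included only so the example runs on its own.

```python
import bisect
from typing import Callable, Iterator, List, Optional, Tuple

PAGE_SIZE = 1000  # LIST responses come back in chunks of up to 1,000 keys

# Hypothetical page fetcher: (marker, max_keys) -> (keys, is_truncated)
PageFetcher = Callable[[Optional[str], int], Tuple[List[str], bool]]


def iter_all_keys(fetch_page: PageFetcher) -> Iterator[str]:
    """Yield every key, requesting one page at a time until the listing is complete."""
    marker: Optional[str] = None
    while True:
        keys, is_truncated = fetch_page(marker, PAGE_SIZE)
        yield from keys
        if not is_truncated or not keys:
            break
        marker = keys[-1]  # the next page starts after the last key we received


def make_fake_fetcher(all_keys: List[str]) -> PageFetcher:
    """In-memory stand-in for a LIST request; not a real S3 client."""
    ordered = sorted(all_keys)

    def fetch_page(marker: Optional[str], max_keys: int) -> Tuple[List[str], bool]:
        # Return keys in sorted order, starting strictly after `marker`.
        start = 0 if marker is None else bisect.bisect_right(ordered, marker)
        page = ordered[start:start + max_keys]
        return page, start + max_keys < len(ordered)

    return fetch_page


if __name__ == "__main__":
    fetcher = make_fake_fetcher([f"backup/{i:06d}" for i in range(2_500)])
    print(sum(1 for _ in iter_all_keys(fetcher)))  # 2500
```

Because the generator yields keys as each page arrives, a deletion loop can start working on the first 1,000 keys without holding the whole listing in memory.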
Yes, multi-object delete is probably the most requested feature. Hopefully Amazon will realize that they can’t keep ignoring their customers.
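For a sense of why multi-object delete matters: if one request could remove up to 1,000 keys (the same size as a LIST page), the roughly 3 million ECHENG.COM keys would need about 3,000 requests instead of 3 million. S3 did later add a Multi-Object Delete request with a 1,000-key cap, but this sketch doesn’t call it. delete_batch is a hypothetical callback, BATCH_LIMIT is an assumed cap, and the batching helper is the only real logic here.

```python
from typing import Callable, Iterable, Iterator, List

BATCH_LIMIT = 1000  # assumed per-request cap on keys in one batch delete


def chunked(keys: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group keys into lists of at most `size` items, preserving order."""
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover keys that didn't fill a whole batch
        yield batch


def delete_in_batches(
    keys: Iterable[str],
    delete_batch: Callable[[List[str]], None],  # hypothetical batch-delete call
    size: int = BATCH_LIMIT,
) -> int:
    """Send one delete request per batch; return how many requests were made."""
    requests = 0
    for batch in chunked(keys, size):
        delete_batch(batch)
        requests += 1
    return requests


if __name__ == "__main__":
    sizes: List[int] = []  # record batch sizes instead of talking to S3
    keys = (f"backup/{i:07d}" for i in range(3_060_000))
    print(delete_in_batches(keys, lambda batch: sizes.append(len(batch))))  # 3060
    print(sum(sizes))  # 3060000
```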