As an AWS Cloud Administrator, you will find that managing Amazon S3 storage becomes increasingly complex over time. Buckets tend to accumulate, and it is hard to tell which ones are genuinely needed and which are unused or forgotten. To help you avoid these pitfalls, we will walk you through proven methods to detect and clean up unused S3 buckets while maintaining security and compliance.

Why Should You Care About Unused S3 Buckets?

A recent news story explained how abandoned AWS S3 buckets could facilitate remote code execution and supply-chain compromises. Researchers identified roughly 150 Amazon S3 buckets that no longer existed but that applications and websites were still trying to pull software updates and other code from. It’s a wake-up call to get ahead of this problem in your own accounts.

Unused S3 buckets pose several significant risks:

  • **Security Vulnerabilities:** Forgotten buckets with misconfigured permissions or outdated security policies can become security liabilities
  • **Unnecessary Costs:** While individual storage costs might seem minimal, they compound across multiple buckets and storage classes
  • **Compliance Risks:** Unmonitored buckets might retain sensitive data beyond required retention periods
  • **Resource Management:** Excessive buckets complicate backup strategies and disaster recovery planning

Prerequisites

Before starting your cleanup effort:

1. Ensure you have appropriate IAM permissions to:

  • List and inspect S3 buckets
  • Access CloudTrail logs
  • Use AWS Storage Lens
  • Access Cost Explorer

2. Enable necessary services:

  • AWS CloudTrail (with S3 data events if budget permits)
  • S3 Server Access Logging (see the sketch after this list)
  • AWS Config (for automated monitoring)
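
S3 server access logging in particular is configured per bucket. Here is a minimal boto3 sketch; the bucket names and prefix are placeholders, and the target bucket must already allow the S3 logging service to write to it:

```python
import boto3

s3 = boto3.client('s3')

# Send access logs for 'my-app-bucket' to 'my-log-bucket' under a dedicated prefix.
# Both bucket names are placeholders; the target bucket needs a policy that
# allows logging.s3.amazonaws.com to write objects into it.
s3.put_bucket_logging(
    Bucket='my-app-bucket',
    BucketLoggingStatus={
        'LoggingEnabled': {
            'TargetBucket': 'my-log-bucket',
            'TargetPrefix': 'access-logs/my-app-bucket/'
        }
    }
)
```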

Detection Methods

1. AWS CloudTrail Analysis (Best for Activity Auditing)

CloudTrail provides comprehensive logging of S3 API activity. Here’s an Amazon Athena query you can run against your CloudTrail logs to surface recent object-level activity:

```sql
-- Assumes the standard Athena table for CloudTrail, where requestparameters is
-- stored as a JSON string; adjust the database and table names to your setup.
SELECT DISTINCT
    eventtime,
    eventname,
    json_extract_scalar(requestparameters, '$.bucketName') AS bucket_name,
    useridentity.principalid
FROM cloudtrail_logs_database.cloudtrail_logs
WHERE eventsource = 's3.amazonaws.com'
    AND from_iso8601_timestamp(eventtime) > current_timestamp - INTERVAL '3' MONTH
    AND eventname LIKE '%Object%'
ORDER BY eventtime DESC
```

This query helps you:

  • Distinguish between read and write operations
  • Identify who last accessed the bucket
  • Track access patterns over time

Note: object-level events such as GetObject and PutObject only appear in CloudTrail if S3 data events are enabled, and data events incur costs based on the volume of events recorded.
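
If you want to run this check on a schedule rather than in the console, the same query can be submitted through the Athena API. A minimal sketch; the database name and results location are assumptions to adjust for your environment:

```python
import time
import boto3

athena = boto3.client('athena')

QUERY = "SELECT ..."  # the Athena query shown above

# Database name and query-results location are placeholders.
execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={'Database': 'cloudtrail_logs_database'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-results-bucket/cloudtrail-audit/'}
)
query_id = execution['QueryExecutionId']

# Poll until the query finishes.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status['QueryExecution']['Status']['State']
    if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    time.sleep(2)

if state == 'SUCCEEDED':
    rows = athena.get_query_results(QueryExecutionId=query_id)['ResultSet']['Rows']
    for row in rows[1:]:  # the first row contains column headers
        print([col.get('VarCharValue') for col in row['Data']])
```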

2. S3 Storage Lens (Best for Organization-Wide Analysis)

AWS Storage Lens offers comprehensive storage analytics:

  • Access Storage Lens in the S3 Console
  • Enable Advanced Metrics (required for activity tracking)
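
If you prefer to codify this, the advanced (activity) metrics can also be enabled through the S3 Control API. A minimal sketch; the configuration ID is a placeholder, and the call replaces any existing configuration with the same ID:

```python
import boto3

account_id = boto3.client('sts').get_caller_identity()['Account']
s3control = boto3.client('s3control')

# Creates (or overwrites) a Storage Lens configuration with activity metrics
# enabled at both the account and bucket level. 'unused-bucket-audit' is a
# placeholder configuration ID.
s3control.put_storage_lens_configuration(
    ConfigId='unused-bucket-audit',
    AccountId=account_id,
    StorageLensConfiguration={
        'Id': 'unused-bucket-audit',
        'AccountLevel': {
            'ActivityMetrics': {'IsEnabled': True},
            'BucketLevel': {
                'ActivityMetrics': {'IsEnabled': True}
            }
        },
        'IsEnabled': True
    }
)
```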

Look for these key metrics:

  • Last accessed date
  • Request patterns
  • Storage class distribution
  • Cross-region activity

Pro tip: Create custom dashboards focusing on:

  • Buckets with zero requests
  • Objects not accessed in 90+ days
  • Storage class optimization opportunities

3. Automated Detection with Python

Here’s a Python script that handles common edge cases:

```python
import boto3
from datetime import datetime, timezone
from botocore.exceptions import ClientError


def check_bucket_usage(bucket_name):
    s3 = boto3.client('s3')
    try:
        # Check bucket versioning
        versioning = s3.get_bucket_versioning(Bucket=bucket_name)
        is_versioned = versioning.get('Status') == 'Enabled'

        # List objects with pagination
        paginator = s3.get_paginator('list_objects_v2')
        last_modified = None
        total_size = 0
        object_count = 0

        for page in paginator.paginate(Bucket=bucket_name):
            if 'Contents' in page:
                for obj in page['Contents']:
                    object_count += 1
                    total_size += obj['Size']
                    if last_modified is None or obj['LastModified'] > last_modified:
                        last_modified = obj['LastModified']

        # Check for replication
        try:
            s3.get_bucket_replication(Bucket=bucket_name)
            has_replication = True
        except ClientError:
            has_replication = False

        return {
            'name': bucket_name,
            'last_modified': last_modified,
            'object_count': object_count,
            'total_size_gb': total_size / (1024**3),
            'versioned': is_versioned,
            'has_replication': has_replication
        }
    except ClientError as e:
        print(f"Error processing bucket {bucket_name}: {str(e)}")
        return None


def main():
    s3 = boto3.client('s3')
    results = []
    try:
        buckets = s3.list_buckets()['Buckets']
        for bucket in buckets:
            result = check_bucket_usage(bucket['Name'])
            if result:
                results.append(result)

        # Sort by last modified date (empty buckets sort first)
        results.sort(key=lambda x: x['last_modified'] if x['last_modified'] else datetime.min.replace(tzinfo=timezone.utc))

        # Print results
        for r in results:
            print(f"\nBucket: {r['name']}")
            print(f"Last Modified: {r['last_modified'] or 'Empty'}")
            print(f"Object Count: {r['object_count']}")
            print(f"Total Size (GB): {r['total_size_gb']:.2f}")
            print(f"Versioning: {'Enabled' if r['versioned'] else 'Disabled'}")
            print(f"Replication: {'Configured' if r['has_replication'] else 'None'}")
    except ClientError as e:
        print(f"Error listing buckets: {str(e)}")


if __name__ == '__main__':
    main()
```

This script:

  • Handles pagination for large buckets
  • Checks versioning status
  • Identifies replication configurations
  • Includes error handling
  • Calculates total storage used
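
One caveat: the script’s LastModified values only reflect writes. To confirm that nothing is still reading from a bucket, you can pull S3 request metrics from CloudWatch. This sketch assumes request metrics are enabled on the bucket with a filter named EntireBucket (without that, S3 only publishes daily storage metrics):

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client('cloudwatch')

# Total requests against the bucket over the last 90 days, one datapoint per day.
# 'my-suspect-bucket' is a placeholder; the 'EntireBucket' metrics filter must exist.
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/S3',
    MetricName='AllRequests',
    Dimensions=[
        {'Name': 'BucketName', 'Value': 'my-suspect-bucket'},
        {'Name': 'FilterId', 'Value': 'EntireBucket'},
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(days=90),
    EndTime=datetime.now(timezone.utc),
    Period=86400,
    Statistics=['Sum'],
)
total_requests = sum(point['Sum'] for point in response['Datapoints'])
print(f"Requests in the last 90 days: {int(total_requests)}")
```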

Clean-Up Process

Assessment Phase

Before deleting any buckets:

1. Tag suspicious buckets:

```json
{
  "Status": "Review",
  "LastChecked": "2025-02-05",
  "Owner": "cloud-ops",
  "ProjectStatus": "Unknown"
}
```
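
Applying those tags with boto3 might look like the sketch below. Note that put_bucket_tagging replaces the bucket’s entire tag set, so in practice merge these with any tags returned by get_bucket_tagging first; the bucket name is a placeholder:

```python
import boto3

s3 = boto3.client('s3')

# Mark a bucket for review. put_bucket_tagging overwrites existing tags,
# so merge with the current tag set before running this against real buckets.
s3.put_bucket_tagging(
    Bucket='suspect-bucket-name',
    Tagging={
        'TagSet': [
            {'Key': 'Status', 'Value': 'Review'},
            {'Key': 'LastChecked', 'Value': '2025-02-05'},
            {'Key': 'Owner', 'Value': 'cloud-ops'},
            {'Key': 'ProjectStatus', 'Value': 'Unknown'},
        ]
    }
)
```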

2. Check each flagged bucket for the following (a boto3 sketch for these lookups follows the list):

  • Cross-account access in bucket policies
  • Replication configurations
  • Lifecycle policies
  • Object lock settings
  • Legal holds
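
A minimal sketch of those lookups; legal holds are set per object, so they have to be checked on individual objects with get_object_legal_hold. The bucket name is a placeholder:

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')
bucket = 'suspect-bucket-name'  # placeholder


def report(description, call):
    """Run a read-only S3 lookup and report whether the configuration exists."""
    try:
        call(Bucket=bucket)
        print(f"{description}: present, review before deleting")
    except ClientError as error:
        print(f"{description}: not found ({error.response['Error']['Code']})")


report('Bucket policy (check for cross-account access)', s3.get_bucket_policy)
report('Replication configuration', s3.get_bucket_replication)
report('Lifecycle configuration', s3.get_bucket_lifecycle_configuration)
report('Object Lock configuration', s3.get_object_lock_configuration)
```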

Documentation

Create a decommissioning document for each bucket:

1. Bucket metadata:

  • Creation date
  • Last access date
  • Size and object count
  • Storage class distribution

2. Dependencies:

  • Applications using the bucket
  • CloudFront distributions
  • Cross-account access

3. Cost impact (see the Cost Explorer sketch after this list):

  • Current storage costs
  • Data transfer costs
  • Potential savings
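
For the cost side, Cost Explorer can break S3 charges down by cost allocation tag. The sketch below assumes a tag key of bucket-name has been activated as a cost allocation tag; without that, you only get service-level totals:

```python
import boto3

ce = boto3.client('ce')

# Monthly S3 cost grouped by the 'bucket-name' cost allocation tag.
# The tag key and the date range are assumptions; adjust to your account.
response = ce.get_cost_and_usage(
    TimePeriod={'Start': '2025-01-01', 'End': '2025-02-01'},
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    Filter={'Dimensions': {'Key': 'SERVICE', 'Values': ['Amazon Simple Storage Service']}},
    GroupBy=[{'Type': 'TAG', 'Key': 'bucket-name'}],
)
for group in response['ResultsByTime'][0]['Groups']:
    cost = group['Metrics']['UnblendedCost']['Amount']
    print(group['Keys'], f"${float(cost):.2f}")
```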

Implementation

Follow this sequence for safe cleanup:

1. Enable MFA Delete for sensitive buckets

2. Create final backups if needed

3. Update bucket lifecycle policies:

```json
{
  "Rules": [
    {
      "Filter": {},
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 90
      }
    }
  ]
}
```
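
Applying that configuration with boto3 could look like this sketch (the bucket name is a placeholder and the rule ID is an arbitrary label):

```python
import boto3

s3 = boto3.client('s3')

# Transition objects to Glacier after 30 days and expire them after 90 days.
# 'bucket-being-decommissioned' is a placeholder bucket name.
s3.put_bucket_lifecycle_configuration(
    Bucket='bucket-being-decommissioned',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'archive-then-expire',
                'Filter': {},
                'Status': 'Enabled',
                'Transitions': [{'Days': 30, 'StorageClass': 'GLACIER'}],
                'Expiration': {'Days': 90},
            }
        ]
    }
)
```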

4. Remove bucket from any replication configurations

5. Update DNS records and application configurations

6. Execute deletion with verification
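
For the final step, here is a deletion sketch using the boto3 resource API. It empties the bucket, including old versions and delete markers if versioning was ever enabled, and then deletes it; bucket.delete() fails unless the bucket is truly empty, which acts as a last verification. The bucket name is a placeholder, and this is irreversible, so run it only after the assessment and documentation steps above:

```python
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-being-decommissioned')  # placeholder name

bucket.object_versions.delete()  # remove all object versions and delete markers
bucket.objects.all().delete()    # remove anything left in an unversioned bucket
bucket.delete()                  # fails unless the bucket is empty, a built-in final check
print(f"Deleted bucket: {bucket.name}")
```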

Best Practices Moving Forward

1. Implement Mandatory Tagging

  • Project/Application
  • Owner
  • Environment
  • Expiry date

2. Set Up Automated Monitoring

  • Create AWS Config rules (see the sketch at the end of this section)
  • Configure CloudWatch alerts
  • Schedule regular usage reports

3. Establish Governance

  • Require justification for new bucket creation
  • Regular access reviews
  • Automated lifecycle policies

4. Cost Optimization

  • Use S3 Storage Class Analysis to identify optimal storage classes
  • Set up cost allocation tags
  • Monitor data transfer patterns
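
As an example of the Config rule mentioned under automated monitoring, the managed REQUIRED_TAGS rule can flag S3 buckets that are missing an Owner tag. The rule name and tag key below are examples to adapt to your own tagging standard:

```python
import json

import boto3

config = boto3.client('config')

# Managed AWS Config rule that marks S3 buckets without an 'Owner' tag as
# noncompliant. Rule name and tag key are example values.
config.put_config_rule(
    ConfigRule={
        'ConfigRuleName': 's3-buckets-require-owner-tag',
        'Source': {'Owner': 'AWS', 'SourceIdentifier': 'REQUIRED_TAGS'},
        'InputParameters': json.dumps({'tag1Key': 'Owner'}),
        'Scope': {'ComplianceResourceTypes': ['AWS::S3::Bucket']},
    }
)
```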

Detect and Clean Up Unused Amazon S3 Buckets

Managing S3 buckets requires ongoing attention, but with these tools and processes, you can maintain a clean, secure, and cost-effective storage environment. Regular audits using a combination of CloudTrail, Storage Lens, and automated scripts will help keep your S3 infrastructure under control. Always verify before deleting any resources, and maintain clear documentation of your cleanup processes. Consider implementing preventive measures like mandatory tagging and automated monitoring to avoid future accumulation of unused buckets.
