As an AWS Cloud Administrator, you will find that managing Amazon S3 storage becomes increasingly complex over time. Buckets tend to accumulate, and it is hard to tell which ones are genuinely needed and which are unused or forgotten. To help you avoid these pitfalls, we will walk you through proven methods to detect and clean up unused S3 buckets while maintaining security and compliance.

Why Should You Care About Unused S3 Buckets?

A recent news story explained how abandoned AWS S3 buckets could facilitate remote code execution and supply-chain compromises. Researchers identified roughly 150 Amazon S3 buckets that no longer existed but that applications and websites were still trying to pull software updates and other code from. It’s a wake-up call to get ahead of this problem in your own accounts.

Unused S3 buckets pose several significant risks:

  • **Security Vulnerabilities:** Forgotten buckets with misconfigured permissions or outdated security policies can become security liabilities
  • **Unnecessary Costs:** While individual storage costs might seem minimal, they compound across multiple buckets and storage classes
  • **Compliance Risks:** Unmonitored buckets might retain sensitive data beyond required retention periods
  • **Resource Management:** Excessive buckets complicate backup strategies and disaster recovery planning

Prerequisites

Before starting your cleanup effort:

1. Ensure you have appropriate IAM permissions to:

  • List and inspect S3 buckets
  • Access CloudTrail logs
  • Use AWS Storage Lens
  • Access Cost Explorer

2. Enable necessary services:

  • AWS CloudTrail (with S3 data events if budget permits)
  • S3 Server Access Logging (see the sketch after this list)
  • AWS Config (for automated monitoring)
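
S3 server access logging in particular is configured per bucket. Here is a minimal boto3 sketch; the bucket names and prefix are placeholders, and the target bucket must already allow the S3 logging service to write to it:

```python
import boto3

s3 = boto3.client('s3')

# Send access logs for 'my-app-bucket' to 'my-log-bucket' under a dedicated prefix.
# Both bucket names are placeholders; the target bucket needs a policy that
# allows logging.s3.amazonaws.com to write objects into it.
s3.put_bucket_logging(
    Bucket='my-app-bucket',
    BucketLoggingStatus={
        'LoggingEnabled': {
            'TargetBucket': 'my-log-bucket',
            'TargetPrefix': 'access-logs/my-app-bucket/'
        }
    }
)
```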

Detection Methods

1. AWS CloudTrail Analysis (Best for Activity Auditing)

CloudTrail provides comprehensive logging of S3 API activity. Here’s an Amazon Athena query you can run against your CloudTrail logs to surface recent object-level activity:

```sql
-- Assumes the standard Athena table for CloudTrail, where requestparameters is
-- stored as a JSON string; adjust the database and table names to your setup.
SELECT DISTINCT
    eventtime,
    eventname,
    json_extract_scalar(requestparameters, '$.bucketName') AS bucket_name,
    useridentity.principalid
FROM cloudtrail_logs_database.cloudtrail_logs
WHERE eventsource = 's3.amazonaws.com'
    AND from_iso8601_timestamp(eventtime) > current_timestamp - INTERVAL '3' MONTH
    AND eventname LIKE '%Object%'
ORDER BY eventtime DESC
```

This query helps you:

  • Distinguish between read and write operations
  • Identify who last accessed the bucket
  • Track access patterns over time

Note: object-level events such as GetObject and PutObject only appear in CloudTrail if S3 data events are enabled, and data events incur costs based on the volume of events recorded.
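
If you want to run this check on a schedule rather than in the console, the same query can be submitted through the Athena API. A minimal sketch; the database name and results location are assumptions to adjust for your environment:

```python
import time
import boto3

athena = boto3.client('athena')

QUERY = "SELECT ..."  # the Athena query shown above

# Database name and query-results location are placeholders.
execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={'Database': 'cloudtrail_logs_database'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-results-bucket/cloudtrail-audit/'}
)
query_id = execution['QueryExecutionId']

# Poll until the query finishes.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status['QueryExecution']['Status']['State']
    if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    time.sleep(2)

if state == 'SUCCEEDED':
    rows = athena.get_query_results(QueryExecutionId=query_id)['ResultSet']['Rows']
    for row in rows[1:]:  # the first row contains column headers
        print([col.get('VarCharValue') for col in row['Data']])
```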

2. S3 Storage Lens (Best for Organization-Wide Analysis)

AWS Storage Lens offers comprehensive storage analytics:

  • Access Storage Lens in the S3 Console
  • Enable Advanced Metrics (required for activity tracking)
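
If you prefer to codify this, the advanced (activity) metrics can also be enabled through the S3 Control API. A minimal sketch; the configuration ID is a placeholder, and the call replaces any existing configuration with the same ID:

```python
import boto3

account_id = boto3.client('sts').get_caller_identity()['Account']
s3control = boto3.client('s3control')

# Creates (or overwrites) a Storage Lens configuration with activity metrics
# enabled at both the account and bucket level. 'unused-bucket-audit' is a
# placeholder configuration ID.
s3control.put_storage_lens_configuration(
    ConfigId='unused-bucket-audit',
    AccountId=account_id,
    StorageLensConfiguration={
        'Id': 'unused-bucket-audit',
        'AccountLevel': {
            'ActivityMetrics': {'IsEnabled': True},
            'BucketLevel': {
                'ActivityMetrics': {'IsEnabled': True}
            }
        },
        'IsEnabled': True
    }
)
```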

Look for these key metrics:

  • Last accessed date
  • Request patterns
  • Storage class distribution
  • Cross-region activity

Pro tip: Create custom dashboards focusing on:

  • Buckets with zero requests
  • Objects not accessed in 90+ days
  • Storage class optimization opportunities

3. Automated Detection with Python

Here’s a Python script that handles common edge cases:

```python
import boto3
from datetime import datetime, timezone
from botocore.exceptions import ClientError


def check_bucket_usage(bucket_name):
    s3 = boto3.client('s3')
    try:
        # Check bucket versioning
        versioning = s3.get_bucket_versioning(Bucket=bucket_name)
        is_versioned = versioning.get('Status') == 'Enabled'

        # List objects with pagination
        paginator = s3.get_paginator('list_objects_v2')
        last_modified = None
        total_size = 0
        object_count = 0

        for page in paginator.paginate(Bucket=bucket_name):
            if 'Contents' in page:
                for obj in page['Contents']:
                    object_count += 1
                    total_size += obj['Size']
                    if last_modified is None or obj['LastModified'] > last_modified:
                        last_modified = obj['LastModified']

        # Check for replication
        try:
            s3.get_bucket_replication(Bucket=bucket_name)
            has_replication = True
        except ClientError:
            has_replication = False

        return {
            'name': bucket_name,
            'last_modified': last_modified,
            'object_count': object_count,
            'total_size_gb': total_size / (1024**3),
            'versioned': is_versioned,
            'has_replication': has_replication
        }
    except ClientError as e:
        print(f"Error processing bucket {bucket_name}: {str(e)}")
        return None


def main():
    s3 = boto3.client('s3')
    results = []
    try:
        buckets = s3.list_buckets()['Buckets']
        for bucket in buckets:
            result = check_bucket_usage(bucket['Name'])
            if result:
                results.append(result)

        # Sort by last modified date (empty buckets sort first)
        results.sort(key=lambda x: x['last_modified'] if x['last_modified'] else datetime.min.replace(tzinfo=timezone.utc))

        # Print results
        for r in results:
            print(f"\nBucket: {r['name']}")
            print(f"Last Modified: {r['last_modified'] or 'Empty'}")
            print(f"Object Count: {r['object_count']}")
            print(f"Total Size (GB): {r['total_size_gb']:.2f}")
            print(f"Versioning: {'Enabled' if r['versioned'] else 'Disabled'}")
            print(f"Replication: {'Configured' if r['has_replication'] else 'None'}")
    except ClientError as e:
        print(f"Error listing buckets: {str(e)}")


if __name__ == '__main__':
    main()
```

This script:

  • Handles pagination for large buckets
  • Checks versioning status
  • Identifies replication configurations
  • Includes error handling
  • Calculates total storage used
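
One caveat: the script’s LastModified values only reflect writes. To confirm that nothing is still reading from a bucket, you can pull S3 request metrics from CloudWatch. This sketch assumes request metrics are enabled on the bucket with a filter named EntireBucket (without that, S3 only publishes daily storage metrics):

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client('cloudwatch')

# Total requests against the bucket over the last 90 days, one datapoint per day.
# 'my-suspect-bucket' is a placeholder; the 'EntireBucket' metrics filter must exist.
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/S3',
    MetricName='AllRequests',
    Dimensions=[
        {'Name': 'BucketName', 'Value': 'my-suspect-bucket'},
        {'Name': 'FilterId', 'Value': 'EntireBucket'},
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(days=90),
    EndTime=datetime.now(timezone.utc),
    Period=86400,
    Statistics=['Sum'],
)
total_requests = sum(point['Sum'] for point in response['Datapoints'])
print(f"Requests in the last 90 days: {int(total_requests)}")
```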

Clean-Up Process

Assessment Phase

Before deleting any buckets:

1. Tag suspicious buckets:

```json
{
  "Status": "Review",
  "LastChecked": "2025-02-05",
  "Owner": "cloud-ops",
  "ProjectStatus": "Unknown"
}
```
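
Applying those tags with boto3 might look like the sketch below. Note that put_bucket_tagging replaces the bucket’s entire tag set, so in practice merge these with any tags returned by get_bucket_tagging first; the bucket name is a placeholder:

```python
import boto3

s3 = boto3.client('s3')

# Mark a bucket for review. put_bucket_tagging overwrites existing tags,
# so merge with the current tag set before running this against real buckets.
s3.put_bucket_tagging(
    Bucket='suspect-bucket-name',
    Tagging={
        'TagSet': [
            {'Key': 'Status', 'Value': 'Review'},
            {'Key': 'LastChecked', 'Value': '2025-02-05'},
            {'Key': 'Owner', 'Value': 'cloud-ops'},
            {'Key': 'ProjectStatus', 'Value': 'Unknown'},
        ]
    }
)
```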

2. Check each flagged bucket for the following (a boto3 sketch for these lookups follows the list):

  • Cross-account access in bucket policies
  • Replication configurations
  • Lifecycle policies
  • Object lock settings
  • Legal holds
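
A minimal sketch of those lookups; legal holds are set per object, so they have to be checked on individual objects with get_object_legal_hold. The bucket name is a placeholder:

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')
bucket = 'suspect-bucket-name'  # placeholder


def report(description, call):
    """Run a read-only S3 lookup and report whether the configuration exists."""
    try:
        call(Bucket=bucket)
        print(f"{description}: present, review before deleting")
    except ClientError as error:
        print(f"{description}: not found ({error.response['Error']['Code']})")


report('Bucket policy (check for cross-account access)', s3.get_bucket_policy)
report('Replication configuration', s3.get_bucket_replication)
report('Lifecycle configuration', s3.get_bucket_lifecycle_configuration)
report('Object Lock configuration', s3.get_object_lock_configuration)
```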

Documentation

Create a decommissioning document for each bucket:

1. Bucket metadata:

  • Creation date
  • Last access date
  • Size and object count
  • Storage class distribution

2. Dependencies:

  • Applications using the bucket
  • CloudFront distributions
  • Cross-account access

3. Cost impact (see the Cost Explorer sketch after this list):

  • Current storage costs
  • Data transfer costs
  • Potential savings
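
For the cost side, Cost Explorer can break S3 charges down by cost allocation tag. The sketch below assumes a tag key of bucket-name has been activated as a cost allocation tag; without that, you only get service-level totals:

```python
import boto3

ce = boto3.client('ce')

# Monthly S3 cost grouped by the 'bucket-name' cost allocation tag.
# The tag key and the date range are assumptions; adjust to your account.
response = ce.get_cost_and_usage(
    TimePeriod={'Start': '2025-01-01', 'End': '2025-02-01'},
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    Filter={'Dimensions': {'Key': 'SERVICE', 'Values': ['Amazon Simple Storage Service']}},
    GroupBy=[{'Type': 'TAG', 'Key': 'bucket-name'}],
)
for group in response['ResultsByTime'][0]['Groups']:
    cost = group['Metrics']['UnblendedCost']['Amount']
    print(group['Keys'], f"${float(cost):.2f}")
```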

Implementation

Follow this sequence for safe cleanup:

1. Enable MFA Delete for sensitive buckets

2. Create final backups if needed

3. Update bucket lifecycle policies:

```json
{
  "Rules": [
    {
      "Filter": {},
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 90
      }
    }
  ]
}
```
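
Applying that configuration with boto3 could look like this sketch (the bucket name is a placeholder and the rule ID is an arbitrary label):

```python
import boto3

s3 = boto3.client('s3')

# Transition objects to Glacier after 30 days and expire them after 90 days.
# 'bucket-being-decommissioned' is a placeholder bucket name.
s3.put_bucket_lifecycle_configuration(
    Bucket='bucket-being-decommissioned',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'archive-then-expire',
                'Filter': {},
                'Status': 'Enabled',
                'Transitions': [{'Days': 30, 'StorageClass': 'GLACIER'}],
                'Expiration': {'Days': 90},
            }
        ]
    }
)
```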

4. Remove bucket from any replication configurations

5. Update DNS records and application configurations

6. Execute deletion with verification
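
For the final step, here is a deletion sketch using the boto3 resource API. It empties the bucket, including old versions and delete markers if versioning was ever enabled, and then deletes it; bucket.delete() fails unless the bucket is truly empty, which acts as a last verification. The bucket name is a placeholder, and this is irreversible, so run it only after the assessment and documentation steps above:

```python
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-being-decommissioned')  # placeholder name

bucket.object_versions.delete()  # remove all object versions and delete markers
bucket.objects.all().delete()    # remove anything left in an unversioned bucket
bucket.delete()                  # fails unless the bucket is empty, a built-in final check
print(f"Deleted bucket: {bucket.name}")
```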

Best Practices Moving Forward

1. Implement Mandatory Tagging

  • Project/Application
  • Owner
  • Environment
  • Expiry date

2. Set Up Automated Monitoring

  • Create AWS Config rules (see the sketch at the end of this section)
  • Configure CloudWatch alerts
  • Schedule regular usage reports

3. Establish Governance

  • Require justification for new bucket creation
  • Regular access reviews
  • Automated lifecycle policies

4. Cost Optimization

  • Use S3 Storage Class Analysis to identify optimal storage classes
  • Set up cost allocation tags
  • Monitor data transfer patterns
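
As an example of the Config rule mentioned under automated monitoring, the managed REQUIRED_TAGS rule can flag S3 buckets that are missing an Owner tag. The rule name and tag key below are examples to adapt to your own tagging standard:

```python
import json

import boto3

config = boto3.client('config')

# Managed AWS Config rule that marks S3 buckets without an 'Owner' tag as
# noncompliant. Rule name and tag key are example values.
config.put_config_rule(
    ConfigRule={
        'ConfigRuleName': 's3-buckets-require-owner-tag',
        'Source': {'Owner': 'AWS', 'SourceIdentifier': 'REQUIRED_TAGS'},
        'InputParameters': json.dumps({'tag1Key': 'Owner'}),
        'Scope': {'ComplianceResourceTypes': ['AWS::S3::Bucket']},
    }
)
```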

Detect and Clean Up Unused Amazon S3 Buckets

Managing S3 buckets requires ongoing attention, but with these tools and processes, you can maintain a clean, secure, and cost-effective storage environment. Regular audits using a combination of CloudTrail, Storage Lens, and automated scripts will help keep your S3 infrastructure under control. Always verify before deleting any resources, and maintain clear documentation of your cleanup processes. Consider implementing preventive measures like mandatory tagging and automated monitoring to avoid future accumulation of unused buckets.
