AWS CloudTrail vs CloudWatch vs Config: what to use and when

Short version: CloudTrail is your audit log of API activity, CloudWatch is telemetry + alerting (metrics/logs/events), and Config is resource inventory + compliance. You probably need all three, wired together sensibly.

Who does what

Goal	CloudTrail	CloudWatch	Config
Audit every API call (who/what/when/where)	✔️	—	—
Near-real-time alerts (errors, patterns, thresholds)	via CloudWatch Logs metric filters	✔️ (Metrics, Logs, Alarms, EventBridge)	—
App/system logs centralisation	—	✔️ CloudWatch Logs	—
Resource inventory & change history	—	—	✔️
Compliance checks & drift detection	—	—	✔️ (managed rules)
Forensics / retain for years	✔️ to S3 (+ Glacier via lifecycle)	Per log group retention setting	Snapshots & history in S3

Baseline that works (Org-wide)

Organization trail: one multi-region trail writing to a central S3 bucket in a log archive account. Enable log file validation; encrypt with a KMS key.
Data events (scoped): turn on for critical S3 buckets and Lambda functions (not blindly for all; costs and noise rise fast).
Send to CloudWatch Logs: attach the trail to a log group; create metric filters + alarms for key patterns (below).
Enable AWS Config in all regions: record all resources; deliver to S3 + SNS; turn on a small set of managed rules (below).
Wire alerts: CloudWatch Alarms → SNS → email/Chat/incident channel. Keep severity mapping simple.
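
The first two items above could be sketched with the AWS CLI roughly as follows. The trail name, bucket names, and KMS alias in the usage line are hypothetical placeholders. The sketch assumes it runs from the organization's management account (or a delegated administrator) and that the bucket policy and key policy already allow CloudTrail to write. Treat it as a starting point, not a complete setup.

```shell
# Sketch only: all names passed in are placeholders, not values from a real account.

# Create one multi-region organization trail with log file validation
# and KMS encryption, then start it.
# Usage: create_org_trail <trail-name> <central-bucket> <kms-key-id-or-alias>
create_org_trail() {
  aws cloudtrail create-trail \
    --name "$1" \
    --s3-bucket-name "$2" \
    --kms-key-id "$3" \
    --is-organization-trail \
    --is-multi-region-trail \
    --enable-log-file-validation
  # A newly created trail records nothing until logging is started.
  aws cloudtrail start-logging --name "$1"
}

# Turn on S3 data events for a single critical bucket.
# The trailing slash scopes the selector to all objects in that bucket.
# Note: put-event-selectors replaces the trail's existing selectors.
# Usage: scope_s3_data_events <trail-name> <bucket-name>
scope_s3_data_events() {
  aws cloudtrail put-event-selectors \
    --trail-name "$1" \
    --event-selectors "[{\"ReadWriteType\":\"All\",\"IncludeManagementEvents\":true,\"DataResources\":[{\"Type\":\"AWS::S3::Object\",\"Values\":[\"arn:aws:s3:::$2/\"]}]}]"
}
```

Example call, with made-up names: create_org_trail org-trail example-log-archive alias/example-cloudtrail, followed by scope_s3_data_events org-trail example-critical-bucket.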

Three CloudTrail → CloudWatch alarms to start with

# 1) Root account usage
Filter pattern:
{ ($.userIdentity.type = "Root") && ($.userIdentity.invokedBy NOT EXISTS) && ($.eventType != "AwsServiceEvent") }

# 2) Unauthorized API calls
Filter pattern:
{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }

# 3) Console logins without MFA
Filter pattern:
{ ($.eventName = "ConsoleLogin") && ($.additionalEventData.MFAUsed = "No") && ($.responseElements.ConsoleLogin = "Success") }

Create a metric filter for each, then an alarm that fires on >=1 occurrence in a 5-minute period and notifies your incident SNS topic.

Config rules (start small)

s3-bucket-public-read-prohibited / s3-bucket-public-write-prohibited
restricted-ssh (security groups shouldn’t allow 0.0.0.0/0 on 22)
cloud-trail-log-file-validation-enabled
iam-root-access-key-check (root must not have access keys)
mfa-enabled-for-iam-console-access
ec2-ebs-encryption-by-default (new EBS volumes are encrypted by default)

Route non-compliant findings to a ticket queue; avoid email storms.
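
A starter set like this could be deployed with put-config-rule, as in the rough sketch below. It assumes a Config recorder is already running in the region. The rule names are free-form labels; the uppercase source identifiers are the managed-rule identifiers as I understand them, so check them against the current AWS managed rules list before relying on them.

```shell
# Deploy one AWS managed Config rule.
# Usage: put_managed_rule <rule-name> <SOURCE_IDENTIFIER>
put_managed_rule() {
  aws configservice put-config-rule \
    --config-rule "{\"ConfigRuleName\":\"$1\",\"Source\":{\"Owner\":\"AWS\",\"SourceIdentifier\":\"$2\"}}"
}

# Rule name / managed-rule identifier pairs for the starter set.
deploy_starter_rules() {
  put_managed_rule s3-bucket-public-read-prohibited        S3_BUCKET_PUBLIC_READ_PROHIBITED
  put_managed_rule s3-bucket-public-write-prohibited       S3_BUCKET_PUBLIC_WRITE_PROHIBITED
  put_managed_rule restricted-ssh                          INCOMING_SSH_DISABLED
  put_managed_rule cloud-trail-log-file-validation-enabled CLOUD_TRAIL_LOG_FILE_VALIDATION_ENABLED
  put_managed_rule iam-root-access-key-check               IAM_ROOT_ACCESS_KEY_CHECK
  put_managed_rule mfa-enabled-for-iam-console-access      MFA_ENABLED_FOR_IAM_CONSOLE_ACCESS
  put_managed_rule ec2-ebs-encryption-by-default           EC2_EBS_ENCRYPTION_BY_DEFAULT
}
```

Running deploy_starter_rules once per region (or through whatever multi-account tooling you already use) keeps the set identical everywhere.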

Retention and costs (keep it boring)

S3 lifecycle: transition CloudTrail and Config objects to a cheaper storage class after 90 days; expire them only after the retention period your policy requires (365 days or more).
CloudWatch Logs: set explicit retention (e.g., 30 or 90 days). Don’t leave “Never expire”.
Data events: enable for high-value buckets/functions only; review quarterly.
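
As a minimal sketch of the first two points: one lifecycle rule on the log bucket plus an explicit retention on the log group. The GLACIER storage class and the 400-day expiry are illustrative assumptions standing in for "cheaper storage" and "365+ days"; the bucket and log group names are whatever you pass in.

```shell
# Print an S3 lifecycle configuration:
# transition after $1 days, expire after $2 days.
lifecycle_json() {
  cat <<EOF
{
  "Rules": [
    {
      "ID": "archive-then-expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [ { "Days": $1, "StorageClass": "GLACIER" } ],
      "Expiration": { "Days": $2 }
    }
  ]
}
EOF
}

# Apply the lifecycle rule to the bucket and set log group retention.
# 90 and 400 are example values; adjust to your retention policy.
# Usage: apply_retention <bucket> <log-group> <logs-retention-days>
apply_retention() {
  aws s3api put-bucket-lifecycle-configuration \
    --bucket "$1" \
    --lifecycle-configuration "$(lifecycle_json 90 400)"
  aws logs put-retention-policy \
    --log-group-name "$2" \
    --retention-in-days "$3"
}
```

For example: apply_retention example-log-archive org-cloudtrail 90. CloudWatch Logs only accepts certain retention values (30, 60, 90, 365, and so on), so pick one from the allowed set.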

Security notes

KMS and bucket policies: the key policy lets CloudTrail/Config encrypt what they write and grants decrypt to your security role only; the bucket policy lets the services write and blocks deletes. Add Object Lock if required.
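Least privilege: keep the delivery roles for CloudTrail/Config/Logs narrow. Roles use temporary credentials, so there is nothing to rotate; if any part of the pipeline still uses long-lived access keys, rotate them (better, replace them with roles).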
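
One way to block deletes is a Deny statement in the log bucket's policy. The helper below only prints such a statement; the Sid and the idea of exempting a single break-glass role are assumptions for illustration. Because put-bucket-policy overwrites the whole policy, merge the statement into the existing policy (which already holds the CloudTrail/Config write permissions) rather than applying it on its own.

```shell
# Print a bucket-policy statement that denies object deletion to everyone
# except one break-glass role.
# Usage: deny_delete_statement <bucket-name> <break-glass-role-arn>
deny_delete_statement() {
  cat <<EOF
{
  "Sid": "DenyLogDeletion",
  "Effect": "Deny",
  "Principal": "*",
  "Action": [ "s3:DeleteObject", "s3:DeleteObjectVersion" ],
  "Resource": "arn:aws:s3:::$1/*",
  "Condition": {
    "ArnNotEquals": { "aws:PrincipalArn": "$2" }
  }
}
EOF
}
```

For example: deny_delete_statement example-log-archive arn:aws:iam::123456789012:role/example-break-glass prints the statement for review before it goes into the policy.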

CLI snippets (illustrative)

# Create a log group for CloudTrail events
aws logs create-log-group --log-group-name org-cloudtrail

# Example metric filter: Unauthorized API calls
aws logs put-metric-filter \
  --log-group-name org-cloudtrail \
  --filter-name "unauthorized-api" \
  --filter-pattern '{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }' \
  --metric-transformations metricName=UnauthorizedApiCalls,metricNamespace=Security,metricValue=1

# Alarm on >=1 in 5 minutes (adjust ARN/topic)
aws cloudwatch put-metric-alarm \
  --alarm-name "Unauthorized API calls" \
  --metric-name UnauthorizedApiCalls \
  --namespace Security \
  --statistic Sum --period 300 --threshold 1 --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 --treat-missing-data notBreaching \
  --alarm-actions arn:aws:sns:eu-west-1:123456789012:incident-notify
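
The same filter-plus-alarm pair can be wrapped in a small helper and reused for the other two patterns listed earlier. This is a sketch: the metric names RootAccountUsage and ConsoleLoginWithoutMfa are made up for illustration, and the Security namespace simply mirrors the snippet above.

```shell
# Create a metric filter and a matching ">=1 in 5 minutes" alarm.
# Usage: filter_and_alarm <log-group> <metric-name> <filter-pattern> <sns-topic-arn>
filter_and_alarm() {
  aws logs put-metric-filter \
    --log-group-name "$1" \
    --filter-name "$2" \
    --filter-pattern "$3" \
    --metric-transformations "metricName=$2,metricNamespace=Security,metricValue=1"
  aws cloudwatch put-metric-alarm \
    --alarm-name "$2" \
    --metric-name "$2" \
    --namespace Security \
    --statistic Sum --period 300 --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --evaluation-periods 1 --treat-missing-data notBreaching \
    --alarm-actions "$4"
}

# Filter patterns copied from the alarm list earlier in this post.
ROOT_USAGE='{ ($.userIdentity.type = "Root") && ($.userIdentity.invokedBy NOT EXISTS) && ($.eventType != "AwsServiceEvent") }'
NO_MFA_LOGIN='{ ($.eventName = "ConsoleLogin") && ($.additionalEventData.MFAUsed = "No") && ($.responseElements.ConsoleLogin = "Success") }'

# Usage: create_remaining_alarms <log-group> <sns-topic-arn>
create_remaining_alarms() {
  filter_and_alarm "$1" RootAccountUsage "$ROOT_USAGE" "$2"
  filter_and_alarm "$1" ConsoleLoginWithoutMfa "$NO_MFA_LOGIN" "$2"
}
```

For example: create_remaining_alarms org-cloudtrail arn:aws:sns:eu-west-1:123456789012:incident-notify.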

What not to do

Don’t enable data events for every bucket and function “just in case”. Scope it.
Don’t keep CloudWatch Logs forever. Set retention on day one.
Don’t send every finding to email. Route to a queue/ticket system and batch notify.