AWS CloudTrail vs CloudWatch vs Config: what to use and when - kevwells.com

Short version: CloudTrail is your audit log of API activity, CloudWatch is telemetry + alerting (metrics/logs/events), and Config is resource inventory + compliance. You probably need all three, wired together sensibly.

Who does what

| Goal | CloudTrail | CloudWatch | Config |
|---|---|---|---|
| Audit every API call (who/what/when/where) | ✔️ | | |
| Real-time alerts (errors, patterns, thresholds) | via Logs → Metric Filters | ✔️ (Metrics, Logs, Alarms, EventBridge) | |
| App/system logs centralisation | | ✔️ CloudWatch Logs | |
| Resource inventory & change history | | | ✔️ |
| Compliance checks & drift detection | | | ✔️ (managed rules) |
| Forensics / retain for years | ✔️ to S3 (+ Glacier via lifecycle) | Logs retention as set | Snapshots & history in S3 |

Baseline that works (Org-wide)

  1. Organization trail: one multi-region trail writing to a central S3 bucket in a log archive account. Enable log file validation; encrypt with a KMS key.
  2. Data events (scoped): turn on for critical S3 buckets and Lambda functions (not blindly for all; costs and noise rise fast).
  3. Send to CloudWatch Logs: attach the trail to a log group; create metric filters + alarms for key patterns (below).
  4. Enable AWS Config in all regions: record all resources; deliver to S3 + SNS; turn on a small set of managed rules (below).
  5. Wire alerts: CloudWatch Alarms → SNS → email/chat/incident channel. Keep severity mapping simple.
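
Steps 1, 2 and 4 can be sketched with the AWS CLI roughly as follows. This is a minimal sketch, not a finished deployment: every name, ARN and account ID is a placeholder, the S3 bucket, KMS key and Config role are assumed to exist already with suitable policies, and the commands are assumed to run from the Organizations management account. Note that advanced event selectors replace whatever selectors the trail already has, so the management-events selector is included explicitly alongside the scoped data events.

```shell
# Placeholders (hypothetical values): replace with your own.
TRAIL_NAME="org-trail"
ARCHIVE_BUCKET="example-org-log-archive"
KMS_KEY_ARN="arn:aws:kms:eu-west-1:123456789012:key/EXAMPLE-KEY-ID"
CRITICAL_BUCKET="example-critical-bucket"
CONFIG_ROLE_ARN="arn:aws:iam::123456789012:role/example-config-role"
CONFIG_TOPIC_ARN="arn:aws:sns:eu-west-1:123456789012:config-changes"

# Step 1: one multi-region organization trail, validated and KMS-encrypted.
create_org_trail() {
  aws cloudtrail create-trail \
    --name "$TRAIL_NAME" \
    --s3-bucket-name "$ARCHIVE_BUCKET" \
    --is-multi-region-trail \
    --is-organization-trail \
    --enable-log-file-validation \
    --kms-key-id "$KMS_KEY_ARN" &&
  aws cloudtrail start-logging --name "$TRAIL_NAME"
}

# Step 2: keep management events, add object-level events for ONE bucket.
data_event_selectors() {
  cat <<EOF
[
  {
    "Name": "Management events",
    "FieldSelectors": [
      { "Field": "eventCategory", "Equals": ["Management"] }
    ]
  },
  {
    "Name": "Object-level events for one critical bucket",
    "FieldSelectors": [
      { "Field": "eventCategory", "Equals": ["Data"] },
      { "Field": "resources.type", "Equals": ["AWS::S3::Object"] },
      { "Field": "resources.ARN", "StartsWith": ["arn:aws:s3:::${CRITICAL_BUCKET}/"] }
    ]
  }
]
EOF
}

scope_data_events() {
  aws cloudtrail put-event-selectors \
    --trail-name "$TRAIL_NAME" \
    --advanced-event-selectors "$(data_event_selectors)"
}

# Step 4: recorder + delivery channel in one region ($1 = region name).
# Global resource type recording is left at its default here.
enable_config() {
  aws configservice put-configuration-recorder --region "$1" \
    --configuration-recorder "name=default,roleARN=$CONFIG_ROLE_ARN" \
    --recording-group allSupported=true &&
  aws configservice put-delivery-channel --region "$1" \
    --delivery-channel "name=default,s3BucketName=$ARCHIVE_BUCKET,snsTopicARN=$CONFIG_TOPIC_ARN" &&
  aws configservice start-configuration-recorder --region "$1" \
    --configuration-recorder-name default
}
```

Nothing runs until you call the functions, e.g. `create_org_trail && scope_data_events`, then `enable_config eu-west-1` once per region you use.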

Three CloudTrail → CloudWatch alarms to start with

# 1) Root account usage
Filter pattern:
{ ($.userIdentity.type = "Root") && ($.userIdentity.invokedBy NOT EXISTS) && ($.eventType != "AwsServiceEvent") }

# 2) Unauthorized API calls
Filter pattern:
{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }

# 3) Console logins without MFA
Filter pattern:
{ ($.eventName = "ConsoleLogin") && ($.additionalEventData.MFAUsed = "No") && ($.responseElements.ConsoleLogin = "Success") }

Create a metric filter for each, then an alarm that fires on >=1 occurrence in a 5-minute period and notifies your incident SNS topic.
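
Before creating a filter, it can help to dry-run the pattern against a sample event with `aws logs test-metric-filter`. The sketch below does that for the no-MFA pattern. The sample event is a trimmed, made-up record (real CloudTrail events carry many more fields), and the quoting of `--log-event-messages` may need adjusting for your shell or CLI version.

```shell
# Pattern 3 from above: console logins without MFA.
NO_MFA_PATTERN='{ ($.eventName = "ConsoleLogin") && ($.additionalEventData.MFAUsed = "No") && ($.responseElements.ConsoleLogin = "Success") }'

# Hypothetical, heavily trimmed ConsoleLogin event used only for the dry run.
sample_event() {
  printf '%s' '{"eventName":"ConsoleLogin","userIdentity":{"type":"IAMUser","userName":"example-user"},"additionalEventData":{"MFAUsed":"No"},"responseElements":{"ConsoleLogin":"Success"}}'
}

# Ask CloudWatch Logs whether the pattern matches the sample event.
test_pattern() {
  aws logs test-metric-filter \
    --filter-pattern "$NO_MFA_PATTERN" \
    --log-event-messages "$(sample_event)"
}
```

Call `test_pattern`; a non-empty `matches` list in the response means the pattern matched the sample.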

Config rules (start small)

  • s3-bucket-public-read-prohibited / s3-bucket-public-write-prohibited
  • restricted-ssh (security groups shouldn’t allow 0.0.0.0/0 on 22)
  • cloud-trail-log-file-validation-enabled
  • iam-root-access-key-check (root must not have access keys)
  • mfa-enabled-for-iam-console-access
  • ec2-ebs-encryption-by-default (new EBS volumes are encrypted by default)

Route non-compliant findings to a ticket queue; avoid email storms.
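
A starter set like the one above can be enabled with `aws configservice put-config-rule`, one call per rule. This is a sketch that assumes the Config recorder is already running in the target region. The source identifiers are the AWS managed rule identifiers as I understand them, so check them against the current managed rules list before relying on them.

```shell
# Build the --config-rule JSON for an AWS managed rule.
# $1 = rule name, $2 = managed rule source identifier
managed_rule_json() {
  printf '{"ConfigRuleName":"%s","Source":{"Owner":"AWS","SourceIdentifier":"%s"}}' "$1" "$2"
}

put_managed_rule() {
  # stdin is redirected so the CLI cannot consume the rule list below
  aws configservice put-config-rule \
    --config-rule "$(managed_rule_json "$1" "$2")" </dev/null
}

# Enable the starter set; stops at the first failure.
enable_starter_rules() {
  while read -r name identifier; do
    put_managed_rule "$name" "$identifier" || return 1
  done <<'EOF'
s3-bucket-public-read-prohibited S3_BUCKET_PUBLIC_READ_PROHIBITED
s3-bucket-public-write-prohibited S3_BUCKET_PUBLIC_WRITE_PROHIBITED
restricted-ssh INCOMING_SSH_DISABLED
cloud-trail-log-file-validation-enabled CLOUD_TRAIL_LOG_FILE_VALIDATION_ENABLED
iam-root-access-key-check IAM_ROOT_ACCESS_KEY_CHECK
mfa-enabled-for-iam-console-access MFA_ENABLED_FOR_IAM_CONSOLE_ACCESS
ec2-ebs-encryption-by-default EC2_EBS_ENCRYPTION_BY_DEFAULT
EOF
}
```

Run `enable_starter_rules` once per region, or deploy the same set org-wide with a conformance pack if you prefer.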

Retention and costs (keep it boring)

  • S3 lifecycle: move CloudTrail and Config objects to cheaper storage after 90 days; retain for 365+ days to suit your policy.
  • CloudWatch Logs: set explicit retention (e.g., 30 or 90 days). Don’t leave “Never expire”.
  • Data events: enable for high-value buckets/functions only; review quarterly.
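
The first two points map onto one S3 lifecycle configuration and one retention call per log group. A minimal sketch: the `AWSLogs/` prefix is the default delivery prefix for CloudTrail and Config, the 730-day expiry is an assumed policy value rather than a recommendation, and the bucket and log group names are placeholders.

```shell
# Lifecycle: Glacier after 90 days, delete after 730 days (assumed policy).
lifecycle_json() {
  cat <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-audit-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "AWSLogs/" },
      "Transitions": [ { "Days": 90, "StorageClass": "GLACIER" } ],
      "Expiration": { "Days": 730 }
    }
  ]
}
EOF
}

# $1 = bucket name. Replaces the bucket's existing lifecycle configuration.
apply_lifecycle() {
  aws s3api put-bucket-lifecycle-configuration \
    --bucket "$1" \
    --lifecycle-configuration "$(lifecycle_json)"
}

# $1 = log group name, $2 = retention in days (e.g. 30 or 90)
set_log_retention() {
  aws logs put-retention-policy \
    --log-group-name "$1" \
    --retention-in-days "$2"
}
```

For example: `apply_lifecycle example-org-log-archive` and `set_log_retention org-cloudtrail 90`.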

Security notes

  • KMS key policy: allow CloudTrail/Config to encrypt with the key; grant decrypt to your security role only. On the log bucket, block deletes with a bucket policy, plus Object Lock if required.
  • Least privilege: keep the delivery roles for CloudTrail/Config/Logs narrow; avoid long-lived access keys, and rotate them if you must use them.
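
As a sketch of the "block deletes" part: the function below prints a bucket policy containing a single deny statement, and a second function sets a default Object Lock retention. The bucket name and retention period are placeholders. `put-bucket-policy` replaces the whole policy, so in practice this statement has to be merged into the policy that already lets CloudTrail and Config write to the bucket. Object Lock also requires versioning on the bucket.

```shell
# Print a policy whose only statement denies object deletion for everyone.
# $1 = bucket name. Merge the statement into your real bucket policy.
deny_delete_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyLogDeletion",
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
      "Resource": "arn:aws:s3:::$1/*"
    }
  ]
}
EOF
}

# Optional: default Object Lock retention in governance mode.
# $1 = bucket name, $2 = retention in days
enable_object_lock() {
  aws s3api put-object-lock-configuration \
    --bucket "$1" \
    --object-lock-configuration "{\"ObjectLockEnabled\":\"Enabled\",\"Rule\":{\"DefaultRetention\":{\"Mode\":\"GOVERNANCE\",\"Days\":$2}}}"
}
```

`deny_delete_policy example-org-log-archive` prints the JSON for review; `enable_object_lock example-org-log-archive 365` applies a one-year default retention.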

CLI snippets (illustrative)

# Create a log group for CloudTrail events
aws logs create-log-group --log-group-name org-cloudtrail

# Example metric filter: Unauthorized API calls
aws logs put-metric-filter \
  --log-group-name org-cloudtrail \
  --filter-name "unauthorized-api" \
  --filter-pattern '{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }' \
  --metric-transformations metricName=UnauthorizedApiCalls,metricNamespace=Security,metricValue=1

# Alarm on >=1 in 5 minutes (adjust ARN/topic)
aws cloudwatch put-metric-alarm \
  --alarm-name "Unauthorized API calls" \
  --metric-name UnauthorizedApiCalls \
  --namespace Security \
  --statistic Sum --period 300 --threshold 1 --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 --treat-missing-data notBreaching \
  --alarm-actions arn:aws:sns:eu-west-1:123456789012:incident-notify
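
The same two calls cover the other two patterns. A small wrapper, sketched here with the log group and SNS topic from the snippet above (the filter and metric names are made up), keeps the three alarms consistent:

```shell
LOG_GROUP="org-cloudtrail"
TOPIC_ARN="arn:aws:sns:eu-west-1:123456789012:incident-notify"

# Create one metric filter and a matching >=1-in-5-minutes alarm.
# $1 = filter/alarm name, $2 = metric name, $3 = filter pattern
create_filter_and_alarm() {
  aws logs put-metric-filter \
    --log-group-name "$LOG_GROUP" \
    --filter-name "$1" \
    --filter-pattern "$3" \
    --metric-transformations "metricName=$2,metricNamespace=Security,metricValue=1" &&
  aws cloudwatch put-metric-alarm \
    --alarm-name "$1" \
    --metric-name "$2" \
    --namespace Security \
    --statistic Sum --period 300 --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --evaluation-periods 1 --treat-missing-data notBreaching \
    --alarm-actions "$TOPIC_ARN"
}

# Root account usage and console logins without MFA.
create_remaining_alarms() {
  create_filter_and_alarm "root-account-usage" RootAccountUsage \
    '{ ($.userIdentity.type = "Root") && ($.userIdentity.invokedBy NOT EXISTS) && ($.eventType != "AwsServiceEvent") }' &&
  create_filter_and_alarm "console-login-no-mfa" ConsoleLoginNoMfa \
    '{ ($.eventName = "ConsoleLogin") && ($.additionalEventData.MFAUsed = "No") && ($.responseElements.ConsoleLogin = "Success") }'
}
```

Call `create_remaining_alarms` once the trail is delivering to the log group.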

What not to do

  • Don’t enable data events for every bucket and function “just in case”. Scope it.
  • Don’t keep CloudWatch Logs forever. Set retention on day one.
  • Don’t send every finding to email. Route to a queue/ticket system and batch notify.
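
One way to route findings to a queue rather than email is an EventBridge rule that matches Config compliance changes and targets an SQS queue your ticketing integration reads from. A sketch, assuming the queue already exists and its access policy lets `events.amazonaws.com` send messages; the rule name and target ID are made up.

```shell
# Match only transitions to NON_COMPLIANT from AWS Config rules.
noncompliant_event_pattern() {
  cat <<'EOF'
{
  "source": ["aws.config"],
  "detail-type": ["Config Rules Compliance Change"],
  "detail": {
    "messageType": ["ComplianceChangeNotification"],
    "newEvaluationResult": { "complianceType": ["NON_COMPLIANT"] }
  }
}
EOF
}

# $1 = ARN of an existing SQS queue
route_findings_to_queue() {
  aws events put-rule \
    --name config-noncompliant-to-queue \
    --event-pattern "$(noncompliant_event_pattern)" &&
  aws events put-targets \
    --rule config-noncompliant-to-queue \
    --targets "Id=ticket-queue,Arn=$1"
}
```

Example: `route_findings_to_queue arn:aws:sqs:eu-west-1:123456789012:config-findings`.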

Security gaps in Linux and cloud systems risk downtime, data compromise, lost business — and compliance failures.

With 20+ years’ experience and active UK Security Check (SC) clearance, I harden Linux and cloud platforms for government, corporate, and academic sectors — ensuring secure, compliant, and resilient infrastructure.