Short version: CloudTrail is your audit log of API activity, CloudWatch is telemetry + alerting (metrics/logs/events), and Config is resource inventory + compliance. You probably need all three, wired together sensibly.
Who does what
Goal | CloudTrail | CloudWatch | Config |
---|---|---|---|
Audit every API call (who/what/when/where) | ✔️ | — | — |
Real-time alerts (errors, patterns, thresholds) | via Logs → Metric Filters | ✔️ (Metrics, Logs, Alarms, EventBridge) | — |
App/system logs centralisation | — | ✔️ CloudWatch Logs | — |
Resource inventory & change history | — | — | ✔️ |
Compliance checks & drift detection | — | — | ✔️ (managed rules) |
Forensics / retain for years | ✔️ to S3 (+ Glacier via lifecycle) | Logs retention as set | Snapshots & history in S3 |
Baseline that works (Org-wide)
- Organization trail: one multi-region trail writing to a central S3 bucket in a log archive account. Enable log file validation; encrypt with a KMS key.
- Data events (scoped): turn on for critical S3 buckets and Lambda functions (not blindly for all; costs and noise rise fast).
- Send to CloudWatch Logs: point the trail at a log group (CloudTrail needs an IAM role it can assume to write there); create metric filters + alarms for key patterns (below).
- Enable AWS Config in all regions: record all resources; deliver to S3 + SNS; turn on a small set of managed rules (below).
- Wire alerts: CloudWatch Alarms → SNS → email/Chat/incident channel. Keep severity mapping simple.
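As a rough sketch of the first two baseline items, the commands below create an organization trail and scope data events to a single bucket. The trail name, bucket names and KMS key ARN are hypothetical placeholders, and the script only prints the commands unless you set AWS=aws. It assumes you run it from the organization's management account (or a delegated administrator) and that the bucket policy already allows CloudTrail delivery.

```shell
# Dry run by default: commands are printed, not executed. Set AWS=aws to run them.
AWS="${AWS:-echo aws}"

# Hypothetical names; replace with your own.
TRAIL_NAME="org-trail"
LOG_BUCKET="example-org-cloudtrail-logs"   # bucket in the log archive account
KMS_KEY_ARN="arn:aws:kms:eu-west-1:123456789012:key/EXAMPLE-KEY-ID"
CRITICAL_BUCKET="example-critical-bucket"

# One multi-region organization trail with log file validation and KMS encryption.
$AWS cloudtrail create-trail \
  --name "$TRAIL_NAME" \
  --s3-bucket-name "$LOG_BUCKET" \
  --is-multi-region-trail \
  --is-organization-trail \
  --enable-log-file-validation \
  --kms-key-id "$KMS_KEY_ARN"

# A trail created through the CLI does not log until it is started.
$AWS cloudtrail start-logging --name "$TRAIL_NAME"

# Scoped data events: management events plus object-level events for one bucket only.
cat > event-selectors.json <<EOF
[
  {
    "ReadWriteType": "All",
    "IncludeManagementEvents": true,
    "DataResources": [
      { "Type": "AWS::S3::Object", "Values": ["arn:aws:s3:::${CRITICAL_BUCKET}/"] }
    ]
  }
]
EOF
$AWS cloudtrail put-event-selectors \
  --trail-name "$TRAIL_NAME" \
  --event-selectors file://event-selectors.json
```

Add further entries to DataResources (for example, specific Lambda function ARNs with type AWS::Lambda::Function) rather than widening the selector to everything.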
Three CloudTrail → CloudWatch alarms to start with
# 1) Root account usage
Filter pattern:
{ ($.userIdentity.type = "Root") && ($.userIdentity.invokedBy NOT EXISTS) && ($.eventType != "AwsServiceEvent") }
# 2) Unauthorized API calls
Filter pattern:
{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }
# 3) Console logins without MFA
Filter pattern:
{ ($.eventName = "ConsoleLogin") && ($.additionalEventData.MFAUsed = "No") && ($.responseElements.ConsoleLogin = "Success") }
Create a metric filter for each, then an alarm that fires on >= 1 occurrence in a roughly 5-minute period and notifies your incident SNS topic.
Config rules (start small)
- s3-bucket-public-read-prohibited / s3-bucket-public-write-prohibited
- restricted-ssh (security groups shouldn’t allow 0.0.0.0/0 on 22)
- cloud-trail-log-file-validation-enabled
- iam-root-access-key-check (root must not have access keys)
- mfa-enabled-for-iam-console-access
- ec2-ebs-encryption-by-default
Route non-compliant findings to a ticket queue; avoid email storms.
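One possible way to turn on this starter set is a small loop over rule names and their AWS managed rule identifiers. This is a sketch, not a definitive setup: it assumes a Config recorder is already running in the region, the helper name put_managed_rule is made up for illustration, and the commands are only printed unless you set AWS=aws.

```shell
# Dry run by default: commands are printed, not executed. Set AWS=aws to run them.
AWS="${AWS:-echo aws}"

# put_managed_rule <rule-name> <managed-rule-identifier>  (hypothetical helper)
put_managed_rule() {
  $AWS configservice put-config-rule \
    --config-rule "{\"ConfigRuleName\":\"$1\",\"Source\":{\"Owner\":\"AWS\",\"SourceIdentifier\":\"$2\"}}"
}

# Rule name and AWS managed rule identifier, one pair per line.
while read -r name identifier; do
  put_managed_rule "$name" "$identifier"
done <<'EOF'
s3-bucket-public-read-prohibited S3_BUCKET_PUBLIC_READ_PROHIBITED
s3-bucket-public-write-prohibited S3_BUCKET_PUBLIC_WRITE_PROHIBITED
restricted-ssh INCOMING_SSH_DISABLED
cloud-trail-log-file-validation-enabled CLOUD_TRAIL_LOG_FILE_VALIDATION_ENABLED
iam-root-access-key-check IAM_ROOT_ACCESS_KEY_CHECK
mfa-enabled-for-iam-console-access MFA_ENABLED_FOR_IAM_CONSOLE_ACCESS
ec2-ebs-encryption-by-default EC2_EBS_ENCRYPTION_BY_DEFAULT
EOF
```

Config rules are regional, so repeat this per region (or use an organization-level deployment mechanism) if you record in all regions.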
Retention and costs (keep it boring)
- S3 lifecycle: move CloudTrail and Config objects to cheaper storage after 90 days; retain for 365+ days to suit your policy.
- CloudWatch Logs: set explicit retention (e.g., 30 or 90 days). Don’t leave “Never expire”.
- Data events: enable for high-value buckets/functions only; review quarterly.
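The first two retention points could look roughly like this. The bucket name is a placeholder, the 90/365-day values simply mirror the numbers above, and GLACIER is one assumed choice of "cheaper storage"; adjust all of them to your policy. Commands are printed rather than executed unless you set AWS=aws.

```shell
# Dry run by default: commands are printed, not executed. Set AWS=aws to run them.
AWS="${AWS:-echo aws}"
LOG_BUCKET="example-org-cloudtrail-logs"   # hypothetical bucket name

# Lifecycle rule for every object in the bucket: archive at 90 days, expire at 365.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-then-expire-audit-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [ { "Days": 90, "StorageClass": "GLACIER" } ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF

# Note: this call replaces any lifecycle configuration already on the bucket.
$AWS s3api put-bucket-lifecycle-configuration \
  --bucket "$LOG_BUCKET" \
  --lifecycle-configuration file://lifecycle.json

# Explicit retention on the CloudTrail log group instead of "Never expire".
$AWS logs put-retention-policy \
  --log-group-name org-cloudtrail \
  --retention-in-days 90
```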
Security notes
- KMS and bucket policies: the KMS key policy should let CloudTrail and Config encrypt what they deliver and grant decrypt to your security role only; separately, block deletes on the log bucket with a bucket policy, plus Object Lock if required.
- Least privilege: keep the delivery roles for CloudTrail/Config/Logs narrow. These are service roles with temporary credentials; if anything in the pipeline still uses access keys, rotate them (better, avoid them).
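For the delete protection on the log bucket, a minimal sketch might be a deny statement plus an optional default Object Lock retention. The bucket name, the Sid, the GOVERNANCE mode and the 365-day period are all assumptions for illustration. The statement is written to a file rather than applied, because put-bucket-policy replaces the whole policy, including the statements that let CloudTrail and Config deliver.

```shell
# Dry run by default: commands are printed, not executed. Set AWS=aws to run them.
AWS="${AWS:-echo aws}"
LOG_BUCKET="example-org-cloudtrail-logs"   # hypothetical bucket name

# Statement to merge by hand into the bucket's existing policy (not applied here).
cat > deny-delete-statement.json <<EOF
{
  "Sid": "DenyAuditLogDeletion",
  "Effect": "Deny",
  "Principal": "*",
  "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
  "Resource": "arn:aws:s3:::${LOG_BUCKET}/*"
}
EOF

# Optional: default Object Lock retention. Object Lock requires versioning on the bucket.
$AWS s3api put-object-lock-configuration \
  --bucket "$LOG_BUCKET" \
  --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"GOVERNANCE","Days":365}}}'
```

A blanket deny like this also applies to your own administrators, so decide up front whether a break-glass role needs an explicit condition-based exception.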
CLI snippets (illustrative)
# Create a log group for CloudTrail events
aws logs create-log-group --log-group-name org-cloudtrail
# Example metric filter: Unauthorized API calls
aws logs put-metric-filter \
--log-group-name org-cloudtrail \
--filter-name "unauthorized-api" \
--filter-pattern '{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }' \
--metric-transformations metricName=UnauthorizedApiCalls,metricNamespace=Security,metricValue=1
# Alarm on >=1 in 5 minutes (adjust ARN/topic)
aws cloudwatch put-metric-alarm \
--alarm-name "Unauthorized API calls" \
--metric-name UnauthorizedApiCalls \
--namespace Security \
--statistic Sum --period 300 --threshold 1 --comparison-operator GreaterThanOrEqualToThreshold \
--evaluation-periods 1 --treat-missing-data notBreaching \
--alarm-actions arn:aws:sns:eu-west-1:123456789012:incident-notify
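The other two starter patterns (root account usage and console logins without MFA) can be wired up the same way. The sketch below wraps the two calls above in a helper; the function name, filter names and metric names are made up for illustration, and the log group and SNS topic are the same example values as above. Commands are only printed unless you set AWS=aws.

```shell
# Dry run by default: commands are printed, not executed. Set AWS=aws to run them.
AWS="${AWS:-echo aws}"
LOG_GROUP="org-cloudtrail"
SNS_TOPIC_ARN="arn:aws:sns:eu-west-1:123456789012:incident-notify"

# create_filter_and_alarm <filter-name> <metric-name> <filter-pattern>  (hypothetical helper)
create_filter_and_alarm() {
  $AWS logs put-metric-filter \
    --log-group-name "$LOG_GROUP" \
    --filter-name "$1" \
    --filter-pattern "$3" \
    --metric-transformations "metricName=$2,metricNamespace=Security,metricValue=1"
  $AWS cloudwatch put-metric-alarm \
    --alarm-name "$1" \
    --metric-name "$2" \
    --namespace Security \
    --statistic Sum --period 300 --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --evaluation-periods 1 --treat-missing-data notBreaching \
    --alarm-actions "$SNS_TOPIC_ARN"
}

create_filter_and_alarm "root-account-usage" "RootAccountUsage" \
  '{ ($.userIdentity.type = "Root") && ($.userIdentity.invokedBy NOT EXISTS) && ($.eventType != "AwsServiceEvent") }'

create_filter_and_alarm "console-login-without-mfa" "ConsoleLoginWithoutMfa" \
  '{ ($.eventName = "ConsoleLogin") && ($.additionalEventData.MFAUsed = "No") && ($.responseElements.ConsoleLogin = "Success") }'
```

Keeping the patterns in single quotes stops the shell from expanding the `$.` field selectors.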
What not to do
- Don’t enable data events for every bucket and function “just in case”. Scope it.
- Don’t keep CloudWatch Logs forever. Set retention on day one.
- Don’t send every finding to email. Route to a queue/ticket system and batch notify.