Last updated: 20 Aug 2025
Short version: set log retention explicitly, wire 3 alarms that actually matter, and keep severity mapping simple. Don’t create 40 “informational” alerts and call it monitoring.
1) Set log retention on day one
- Pick a default (e.g., 30 or 90 days) for all CloudWatch log groups. “Never expire” is not a plan.
- Apply exceptions only where justified (e.g., audit streams mirrored to S3).
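One way to enforce the default is a small script that finds log groups with no retention policy and sets one. This is a minimal sketch, not a finished tool: the 30-day default, the `EXCEPTIONS` set and the function names are assumptions for illustration, and `logs_client` is expected to be a boto3 CloudWatch Logs client (`boto3.client("logs")`).

```python
DEFAULT_RETENTION_DAYS = 30  # assumption: use 30 or 90, whichever you picked

# Hypothetical exception list: groups whose retention is managed elsewhere
# (e.g., audit streams mirrored to S3).
EXCEPTIONS = {"/audit/cloudtrail"}


def groups_needing_retention(log_groups, exceptions=EXCEPTIONS):
    """Yield names of log groups that have no retention policy set."""
    for group in log_groups:
        name = group["logGroupName"]
        if name in exceptions:
            continue
        # describe_log_groups omits retentionInDays for "never expire" groups.
        if "retentionInDays" not in group:
            yield name


def apply_default_retention(logs_client, days=DEFAULT_RETENTION_DAYS):
    """Set the default retention on every group that lacks one."""
    changed = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for name in groups_needing_retention(page["logGroups"]):
            logs_client.put_retention_policy(
                logGroupName=name, retentionInDays=days
            )
            changed.append(name)
    return changed


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(apply_default_retention(boto3.client("logs")))
```

Groups that already have a policy are left alone, so the script is safe to re-run on a schedule.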
2) Three alarms that earn their keep
- EC2 instance health: StatusCheckFailed_System > 0 for 5 minutes → page the on-call.
- ALB 5xx rate spike: error rate >= threshold (e.g., 5% over 5 minutes) → notify incident channel.
- RDS free storage low: FreeStorageSpace below threshold (e.g., 15% of allocated storage; the metric is reported in bytes, so convert the percentage to bytes) for 10 minutes → ticket + notify.
Everything else can wait until your runbook exists.
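The three alarms above could be expressed as keyword arguments for the boto3 CloudWatch `put_metric_alarm` call. Treat this as a sketch under stated assumptions: the alarm names, the choice of statistics, and the use of `HTTPCode_ELB_5XX_Count` (5xx generated by the load balancer itself; swap in `HTTPCode_Target_5XX_Count` if you care about backend errors) are illustrative choices, not something the text prescribes.

```python
GIB = 1024 ** 3


def ec2_system_check_alarm(instance_id, page_topic_arn):
    """StatusCheckFailed_System > 0 for five consecutive 1-minute periods."""
    return {
        "AlarmName": f"ec2-system-check-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [page_topic_arn],
    }


def alb_5xx_rate_alarm(name, load_balancer, notify_topic_arn, threshold_pct=5):
    """5xx responses as a percentage of requests over one 5-minute period."""
    def stat(metric_id, metric_name):
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,  # inputs only; the expression is the alarm metric
        }

    return {
        "AlarmName": f"alb-5xx-rate-{name}",
        "Metrics": [
            stat("errors", "HTTPCode_ELB_5XX_Count"),
            stat("requests", "RequestCount"),
            {"Id": "rate", "Expression": "100 * errors / requests",
             "Label": "5xx rate (%)", "ReturnData": True},
        ],
        "EvaluationPeriods": 1,
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [notify_topic_arn],
    }


def rds_free_storage_alarm(db_instance_id, allocated_gib, ticket_topic_arn,
                           min_free_pct=15):
    """FreeStorageSpace (bytes) below a percentage of allocated storage for 10 minutes."""
    threshold_bytes = allocated_gib * GIB * min_free_pct // 100
    return {
        "AlarmName": f"rds-free-storage-{db_instance_id}",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": threshold_bytes,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [ticket_topic_arn],
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**rds_free_storage_alarm(...))
```

Building the parameters as plain dicts keeps the thresholds reviewable in version control before anything touches the account.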
3) Optional: metric filters from CloudTrail
If your CloudTrail is shipped to CloudWatch Logs, add these filters + alarms (they’re security-relevant and low-noise):
# Root account usage
{ ($.userIdentity.type = "Root") && ($.userIdentity.invokedBy NOT EXISTS) && ($.eventType != "AwsServiceEvent") }
# Unauthorized API calls
{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }
# Console logins without MFA
{ ($.eventName = "ConsoleLogin") && ($.additionalEventData.MFAUsed = "No") && ($.responseElements.ConsoleLogin = "Success") }
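To turn the patterns above into metrics you can alarm on, each one needs a metric filter on the CloudTrail log group. The sketch below builds the keyword arguments for the boto3 CloudWatch Logs `put_metric_filter` call; the filter names and the `Security/CloudTrail` namespace are made up for illustration, and the log group name is whatever your trail writes to.

```python
# Filter patterns copied from the section above.
CLOUDTRAIL_FILTERS = {
    "RootAccountUsage": (
        '{ ($.userIdentity.type = "Root") && '
        '($.userIdentity.invokedBy NOT EXISTS) && '
        '($.eventType != "AwsServiceEvent") }'
    ),
    "UnauthorizedApiCalls": (
        '{ ($.errorCode = "*UnauthorizedOperation") || '
        '($.errorCode = "AccessDenied*") }'
    ),
    "ConsoleLoginWithoutMfa": (
        '{ ($.eventName = "ConsoleLogin") && '
        '($.additionalEventData.MFAUsed = "No") && '
        '($.responseElements.ConsoleLogin = "Success") }'
    ),
}

METRIC_NAMESPACE = "Security/CloudTrail"  # hypothetical namespace


def metric_filter_params(log_group_name):
    """Return one put_metric_filter kwargs dict per CloudTrail pattern."""
    return [
        {
            "logGroupName": log_group_name,
            "filterName": name,
            "filterPattern": pattern,
            "metricTransformations": [
                {
                    "metricName": name,
                    "metricNamespace": METRIC_NAMESPACE,
                    "metricValue": "1",  # count one per matching event
                }
            ],
        }
        for name, pattern in CLOUDTRAIL_FILTERS.items()
    ]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   logs = boto3.client("logs")
#   for params in metric_filter_params("my-cloudtrail-log-group"):
#       logs.put_metric_filter(**params)
```

Each resulting metric can then back an alarm on `Sum >= 1`, routed to the security topic of your choice.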
4) Wire alerts sensibly
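- Alarms → SNS → email/chat/incident tool. Use one topic per severity.
- Define what happens in each alarm state (OK, ALARM, INSUFFICIENT_DATA), including how missing data is treated. Don’t spam on flaps; use a 5-minute period and require more than one breaching datapoint (e.g., 2 of the last 3) before the alarm fires.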
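The routing and flap-damping rules above can be kept in one small mapping. The following is a sketch with assumed values: the topic ARNs are placeholders, "2 of 3 datapoints" is one reasonable setting rather than a recommendation from AWS, and `would_alarm` is a simplified model of M-of-N evaluation, not a reimplementation of CloudWatch's exact logic.

```python
# Hypothetical SNS topics, one per severity.
SEVERITY_TOPICS = {
    "page": "arn:aws:sns:eu-west-2:111122223333:alerts-page",
    "notify": "arn:aws:sns:eu-west-2:111122223333:alerts-notify",
    "ticket": "arn:aws:sns:eu-west-2:111122223333:alerts-ticket",
}


def routing(severity, datapoints_to_alarm=2, evaluation_periods=3,
            treat_missing_data="notBreaching"):
    """Shared put_metric_alarm settings for one severity level."""
    topic = SEVERITY_TOPICS[severity]
    return {
        "AlarmActions": [topic],        # ALARM: notify
        "OKActions": [topic],           # OK: send the recovery notice
        "InsufficientDataActions": [],  # INSUFFICIENT_DATA: stay quiet
        "TreatMissingData": treat_missing_data,  # assumption: gaps are not failures
        "Period": 300,
        "EvaluationPeriods": evaluation_periods,
        "DatapointsToAlarm": datapoints_to_alarm,
    }


def would_alarm(breaches, datapoints_to_alarm=2, evaluation_periods=3):
    """Simplified M-of-N check: did at least M of the last N datapoints breach?"""
    window = breaches[-evaluation_periods:]
    return sum(window) >= datapoints_to_alarm


# A single blip does not fire; two breaches in the window do.
print(would_alarm([False, True, False]))  # False
print(would_alarm([True, False, True]))   # True
```

Merging `routing("page")` into an alarm's other parameters keeps severity handling in one place instead of scattered across alarm definitions.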
5) Keep the noise down
- Every alarm must have a clear owner and a runbook link. If nobody owns it, delete it.
- Review alarms monthly; remove those that didn’t help.
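The ownership rule is easier to enforce if it can be checked mechanically. As a sketch, assume a team convention (hypothetical, not a CloudWatch feature) where every alarm description carries `Owner: <team>` and `Runbook: <url>`; the function below takes the `MetricAlarms` list returned by the boto3 `describe_alarms` call and reports which alarms fall short.

```python
import re

# Hypothetical convention for AlarmDescription text.
OWNER_RE = re.compile(r"owner:\s*\S+", re.IGNORECASE)
RUNBOOK_RE = re.compile(r"runbook:\s*https?://\S+", re.IGNORECASE)


def alarms_missing_metadata(metric_alarms):
    """Map alarm name -> list of missing fields ("owner", "runbook")."""
    missing = {}
    for alarm in metric_alarms:
        description = alarm.get("AlarmDescription") or ""
        gaps = []
        if not OWNER_RE.search(description):
            gaps.append("owner")
        if not RUNBOOK_RE.search(description):
            gaps.append("runbook")
        if gaps:
            missing[alarm["AlarmName"]] = gaps
    return missing


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   alarms = boto3.client("cloudwatch").describe_alarms()["MetricAlarms"]
#   print(alarms_missing_metadata(alarms))
```

Anything this reports is a candidate for the monthly review: assign an owner and runbook, or delete the alarm.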