
Security Monitoring and Alerting Design

4,900 words · ≈ 25 min read

Why Security Monitoring and Alerting Design Matters on the SCS-C02 Exam

Security monitoring and alerting design is the heart of Domain 2 on the AWS Certified Security – Specialty (SCS-C02) exam, contributing 18% of the score. Task statements 2.1 and 2.2 expect you to design CloudWatch metric filters, EventBridge rules, Security Hub insights, GuardDuty baselines, and SNS notification fan-out — and then troubleshoot when a security monitoring and alerting pipeline silently breaks. This topic shows up in scenario questions where the wrong service combination, missing IAM permission, or sloppy event pattern is the difference between catching a credential compromise in two minutes versus two days.

Because security monitoring and alerting on AWS spans seven or more services that pass events between each other, candidates who memorize one service in isolation tend to fail. The exam loves chained pipelines: GuardDuty finding → EventBridge rule → Lambda function → SNS topic → AWS Chatbot → Slack. Every hop has a permission boundary, an event-shape contract, and a failure mode. Strong security monitoring and alerting design forces you to think about all of them at once.

This topic note treats security monitoring and alerting as a system rather than a list of features. We will walk the data plane (where events come from), the control plane (which AWS service evaluates them), the action plane (who is notified or what is remediated), and finally the troubleshooting tree for when alerts go missing. By the end you should be able to draw a security monitoring and alerting architecture on a whiteboard, defend each arrow, and diagnose any broken link.

Core Building Blocks for Security Monitoring and Alerting

Security monitoring and alerting on AWS is built from a small set of primitives that compose into many patterns. CloudWatch supplies metrics, alarms, log groups, and dashboards. EventBridge supplies event buses, rules, schedules, and input transformers. Security Hub supplies the AWS Security Finding Format (ASFF) aggregator and custom insights. GuardDuty supplies managed threat intelligence baselines. Lambda and Step Functions supply the compute that takes action. SNS, SQS, and AWS Chatbot supply human notification.

The recurring exam pattern asks you to pick the right primitive for the right job. CloudWatch alarms watch numeric metrics over time windows; EventBridge rules watch the shape of individual events; Security Hub watches normalized findings across security services. If you confuse these three you will reach for an alarm when you needed a rule, or build a Lambda that polls CloudTrail when EventBridge would have invoked you for free.

Security monitoring and alerting also separates monitoring (collecting telemetry, computing aggregates) from alerting (notifying humans or triggering automation). Many wrong answer choices conflate the two — for example, by claiming you can SNS-publish directly from a CloudWatch Logs metric filter. You cannot: the filter creates a metric, an alarm watches the metric, and the alarm publishes to SNS.

ASFF is the JSON schema that Security Hub uses to normalize findings from GuardDuty, Inspector, Macie, IAM Access Analyzer, AWS Config, AWS Health, third-party partners, and your own custom integrations. Every Security Hub finding has the same top-level structure (Severity, Resources, Workflow, Compliance, ProductFields), which makes EventBridge pattern matching predictable across all sources. Reference: https://docs.aws.amazon.com/securityhub/latest/userguide/securityhub-findings-format.html
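To make the shape concrete, here is a minimal sketch of pushing a custom finding into Security Hub in ASFF via boto3; the account ID, instance ARN, and generator ID are placeholders, and the ProductArn shown follows the default format Security Hub assigns to custom integrations.

```python
import boto3
from datetime import datetime, timezone

securityhub = boto3.client("securityhub")

# Minimal ASFF finding; account ID, ARNs, and IDs are placeholders.
now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
finding = {
    "SchemaVersion": "2018-10-08",
    "Id": "custom/app-sec/example-finding-001",
    "ProductArn": "arn:aws:securityhub:us-east-1:111122223333:product/111122223333/default",
    "GeneratorId": "custom-app-scanner",
    "AwsAccountId": "111122223333",
    "Types": ["Software and Configuration Checks/Vulnerabilities"],
    "CreatedAt": now,
    "UpdatedAt": now,
    "Severity": {"Label": "HIGH"},
    "Title": "Example custom finding",
    "Description": "Demonstrates the ASFF top-level structure.",
    "Resources": [{
        "Type": "AwsEc2Instance",
        "Id": "arn:aws:ec2:us-east-1:111122223333:instance/i-0abcd1234",
    }],
    "Workflow": {"Status": "NEW"},
}

response = securityhub.batch_import_findings(Findings=[finding])
print(response["SuccessCount"], response["FailedCount"])
```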

CloudWatch Patterns for Security Monitoring and Alerting

CloudWatch is the substrate for all numeric security monitoring and alerting on AWS. Three patterns dominate the SCS-C02 exam: metric filters on log groups, composite alarms for noise reduction, and anomaly detection bands.

Metric Filters on Log Groups → CloudWatch Alarms → SNS

The first canonical pattern starts with a CloudWatch Logs log group — typically the one CloudTrail or VPC Flow Logs writes into. You attach a metric filter that scans every log event for a pattern (for example, { $.eventName = "ConsoleLogin" && $.errorMessage = "Failed authentication" }). When the pattern matches, the filter increments a custom CloudWatch metric. A CloudWatch alarm watches that metric; when the threshold is breached, the alarm transitions to ALARM and publishes to an SNS topic. The SNS topic fans out to email, AWS Chatbot, PagerDuty, or a Lambda subscriber.
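A minimal boto3 sketch of the full chain, assuming a hypothetical CloudTrail log group, metric namespace, and SNS topic:

```python
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

# 1. Metric filter: count failed console logins in the CloudTrail log group.
logs.put_metric_filter(
    logGroupName="/cloudtrail/management-events",  # hypothetical log group
    filterName="FailedConsoleLogins",
    filterPattern='{ $.eventName = "ConsoleLogin" && $.errorMessage = "Failed authentication" }',
    metricTransformations=[{
        "metricName": "FailedConsoleLoginCount",
        "metricNamespace": "Custom/Security",
        "metricValue": "1",
    }],
)

# 2. Alarm: transition to ALARM when 3+ failures land in one 5-minute bucket,
#    then publish to the SNS topic (hypothetical ARN).
cloudwatch.put_metric_alarm(
    AlarmName="FailedConsoleLogins",
    Namespace="Custom/Security",
    MetricName="FailedConsoleLoginCount",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=3,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:security-alerts"],
)
```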

The exam will test every step. You must remember that CloudTrail logs only land in CloudWatch Logs if the trail was explicitly configured with a CloudWatch Logs role. You must remember that metric filters apply only to log events ingested after the filter is created — historical data stays unscanned. You must remember that an SNS subscription must be confirmed by the recipient before the topic can deliver to it.

Composite Alarms for Noise Reduction

A single alarm on a single metric is too noisy for most security signals. Composite alarms let you express a Boolean rule across multiple child alarms — for example, "alert only if (failed-logins > 5) AND (geographic-anomaly = true) AND (NOT maintenance-window)". Composite alarms reduce the security operations center's pager burden and surface only correlated events that are likely real incidents.

Create a "ChangeWindow" alarm whose state is driven by a CloudWatch metric you publish from your CI/CD pipeline. Reference it in the composite rule with NOT ALARM("ChangeWindow") to silence security alerting during planned deployments without disabling the underlying detector. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Create_Composite_Alarm_How_To.html

Anomaly Detection Bands

For metrics that have a daily or weekly seasonality — login volume, API call rate, S3 GetObject count from a specific role — fixed thresholds either alert too often during peaks or miss attacks during quiet hours. CloudWatch Anomaly Detection trains a band around the expected value using up to two weeks of history. You alarm when the metric leaves the band by more than N standard deviations. This pattern is heavily favored by the exam for "user behavior baseline" scenarios.

If you turn on anomaly detection during an active incident, the model learns the attack as normal. Always seed anomaly-band alarms from a known-good window. The model retrains continuously, so a long-running incident will eventually be absorbed into the expected band. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Anomaly_Detection.html
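A sketch of an anomaly-band alarm via boto3; the Custom/Security namespace and metric name are assumptions:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when ConsoleLoginCount leaves a 2-standard-deviation anomaly band.
cloudwatch.put_metric_alarm(
    AlarmName="LoginVolumeAnomaly",
    ComparisonOperator="LessThanLowerOrGreaterThanUpperThreshold",
    EvaluationPeriods=3,
    DatapointsToAlarm=3,
    ThresholdMetricId="band",        # points at the band expression below
    TreatMissingData="breaching",    # silence from a dead agent should page too
    Metrics=[
        {
            "Id": "m1",
            "ReturnData": True,
            "MetricStat": {
                "Metric": {"Namespace": "Custom/Security", "MetricName": "ConsoleLoginCount"},
                "Period": 300,
                "Stat": "Sum",
            },
        },
        {"Id": "band", "Expression": "ANOMALY_DETECTION_BAND(m1, 2)", "ReturnData": True},
    ],
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:security-alerts"],
)
```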

EventBridge for Security Automation

EventBridge is the event bus that connects detection services to action services. Where CloudWatch alarms watch metrics, EventBridge rules watch the shape of individual events. Every security service worth knowing emits events to the default bus: GuardDuty, Security Hub, Config, AWS Health, IAM Access Analyzer, Macie, Inspector, and CloudTrail management events.

GuardDuty Finding Rule

The most-tested rule on the exam is the GuardDuty finding rule. The pattern matches on source = "aws.guardduty" and detail-type = "GuardDuty Finding". You usually narrow further on detail.severity — for example, { "numeric": [">=", 7] } — to alert only on HIGH (severity 7.0–8.9) and CRITICAL (severity 9.0+) findings, and to suppress LOW findings that would otherwise flood the channel.
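A sketch of the rule, assuming a hypothetical SNS topic whose resource policy already allows events.amazonaws.com to publish:

```python
import json

import boto3

events = boto3.client("events")

# Match only HIGH and CRITICAL GuardDuty findings on the default bus.
pattern = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"severity": [{"numeric": [">=", 7]}]},
}

events.put_rule(Name="guardduty-high-critical", EventPattern=json.dumps(pattern))
events.put_targets(
    Rule="guardduty-high-critical",
    Targets=[{"Id": "pager", "Arn": "arn:aws:sns:us-east-1:111122223333:security-pager"}],
)
```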

IAM Root Account Usage Rule

Every regulated environment must alert when the AWS account root user signs in or makes an API call. The pattern matches CloudTrail events where detail.userIdentity.type = "Root" and detail.eventType != "AwsServiceEvent". Pair this rule with an SNS target that pages the on-call engineer immediately. The exam treats this as a baseline control on every multi-account question.

Config Compliance Change Rule

When a Config rule transitions a resource from COMPLIANT to NON_COMPLIANT, Config emits an event with source = "aws.config" and detail-type = "Config Rules Compliance Change". Filtering on detail.newEvaluationResult.complianceType = "NON_COMPLIANT" lets you trigger remediation Lambdas only when state actually degrades, not on every periodic re-evaluation.
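Expressed as a Python dict (the field names come from the Config event shape just described), the pattern looks like this:

```python
# Match only compliance degradation, not every periodic re-evaluation.
config_noncompliant_pattern = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {
        "newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]},
    },
}
```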

Security Hub Finding Rule with Severity Filter

Security Hub re-emits findings from all underlying services in ASFF format on source = "aws.securityhub" and detail-type = "Security Hub Findings - Imported". The SCS-C02 favorite filter is detail.findings[0].Severity.Label matching "HIGH" or "CRITICAL", paired with detail.findings[0].Workflow.Status = "NEW" so already-triaged findings do not re-page the team.

A rule with "Severity": "HIGH" will not match a finding whose JSON contains "severity": 7 because the field name capitalization is different and the value type is different. Always copy the live event from the Security Hub console "Sample event" panel before authoring the rule, and use input transformers to reshape — not pattern matching to coerce. Reference: https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-event-patterns.html

Auto-Remediation Pipelines

Once an EventBridge rule matches, the question becomes: what should AWS do? The two dominant patterns are EventBridge → Lambda for fixed single-step actions, and EventBridge → Step Functions for multi-step workflows that need branching, retries, or human approval.

EventBridge → Lambda for Fixed Actions

Use Lambda when the remediation is a single API call: revoke a security group ingress rule, disable an IAM access key, attach a quarantine policy, terminate an EC2 instance, snapshot an EBS volume. The Lambda function reads the finding from the input event, calls one or two AWS APIs, and writes a structured log line for audit. Cold-start latency is acceptable because security remediation rarely needs sub-second response.

EventBridge → Step Functions for Multi-Step

Use Step Functions when the remediation needs branching ("if instance is in production, page first; else auto-isolate"), parallel work ("snapshot volume AND copy memory dump AND tag forensic-hold"), retry policies ("re-attempt the API call three times with exponential backoff"), or human-in-the-loop approval via SNS-and-token activities. Step Functions also gives you a visual execution history that auditors love.

EventBridge target permissions differ by target type, and the exam tests the distinction (see the sketch after this list):
  • For a Lambda target, EventBridge uses a resource-based policy on the Lambda function (AddPermission with events.amazonaws.com).
  • For SNS and SQS targets, EventBridge likewise relies on a resource-based policy on the topic or queue; for Step Functions and Kinesis targets, EventBridge assumes an IAM role you specify on the target.
  • For cross-account targets, the destination account's bus policy must allow PutEvents from the source account, and the rule on the destination side dispatches to the actual target. Reference: https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-use-resource-based.html
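A sketch of the two wiring styles, with hypothetical function, state machine, and role names:

```python
import boto3

lambda_client = boto3.client("lambda")
events = boto3.client("events")

# Lambda target: resource-based policy on the function itself.
lambda_client.add_permission(
    FunctionName="quarantine-instance",  # hypothetical function
    StatementId="allow-eventbridge",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn="arn:aws:events:us-east-1:111122223333:rule/guardduty-high-critical",
)

# Step Functions target: an IAM role attached to the rule target instead.
events.put_targets(
    Rule="guardduty-high-critical",
    Targets=[{
        "Id": "isolation-flow",
        "Arn": "arn:aws:states:us-east-1:111122223333:stateMachine:IsolateInstance",
        "RoleArn": "arn:aws:iam::111122223333:role/eventbridge-invoke-sfn",  # needs states:StartExecution
    }],
)
```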

Custom Security Hub Insights and Automation Rules

Security Hub insights are saved queries over ASFF findings. The console ships a set of managed insights (top resources by finding count, top accounts by severity, etc.). Custom insights let you express your own group-by question — for example, "all CRITICAL findings on resources tagged env=prod, grouped by ResourceId, in the last 7 days". Insights are read-only views; they do not trigger anything by themselves. They are the dashboard equivalent of a SQL GROUP BY.

Security Hub automation rules, introduced in 2023, do take action. An automation rule has a criteria block (an ASFF filter) and an actions block (set Workflow.Status, set Severity.Label, suppress, add notes, or change Confidence). They run server-side inside Security Hub before the finding is re-published to EventBridge. This lets you, for example, auto-suppress findings on dev-account resources, or auto-elevate severity for findings on resources tagged pci=true. Automation rules execute in priority order; only matching rules fire, and a rule can be marked terminal to short-circuit the rest.
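A sketch of the prod-tag elevation example using the CreateAutomationRule API; the rule name, tag key, and severity choices are illustrative:

```python
import boto3

securityhub = boto3.client("securityhub")

# Auto-elevate MEDIUM findings on pci-tagged resources to HIGH.
securityhub.create_automation_rule(
    RuleName="elevate-pci-findings",
    RuleOrder=1,
    RuleStatus="ENABLED",
    IsTerminal=False,
    Description="Raise severity for findings on resources tagged pci=true",
    Criteria={
        "ResourceTags": [{"Key": "pci", "Value": "true", "Comparison": "EQUALS"}],
        "SeverityLabel": [{"Value": "MEDIUM", "Comparison": "EQUALS"}],
    },
    Actions=[{
        "Type": "FINDING_FIELDS_UPDATE",
        "FindingFieldsUpdate": {"Severity": {"Label": "HIGH"}},
    }],
)
```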

Because Security Hub automation rules can mutate or suppress a finding before EventBridge re-publishes, your downstream EventBridge rules see the post-rule state. If your auto-remediation Lambda does not fire, check the Security Hub automation rule list before you check the EventBridge pattern. Reference: https://docs.aws.amazon.com/securityhub/latest/userguide/automation-rules.html

GuardDuty Baseline Tracking and Behavior Monitoring

GuardDuty is the managed-threat-detection backbone of AWS security monitoring and alerting. It analyzes three primary data sources by default — VPC Flow Logs, CloudTrail management events, and Route 53 DNS query logs — without requiring you to enable those logs separately. Optional protection plans extend coverage to S3 data events (S3 Protection), EKS audit logs (Kubernetes Protection), EC2 instance malware scanning (Malware Protection), RDS login activity (RDS Protection), and Lambda execution telemetry (Lambda Protection).

The "baseline" aspect that the SCS-C02 exam emphasizes is that GuardDuty profiles each principal's normal behavior over time and flags deviations. A finding like UnauthorizedAccess:IAMUser/ConsoleLoginSuccess.B does not just say "someone logged in"; it says "this principal logged in from a geolocation it has never used before". Disabling and re-enabling GuardDuty resets the behavior baseline, which is why exam answers always prefer the suppression-rule pattern over disable-then-enable.

Suppression Rules vs Filters

A GuardDuty filter is a saved view over findings; it does not change what gets generated. A suppression rule, by contrast, automatically archives matching findings as soon as they arrive, so they never flow to EventBridge or Security Hub. Use suppression rules for known-good service accounts, vulnerability-scanner IPs, and intentional pen-test windows. Never use them to silence findings you simply do not understand.
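A suppression rule is created as a filter with Action="ARCHIVE"; in this sketch the finding type and scanner IP are placeholders:

```python
import boto3

guardduty = boto3.client("guardduty")

detector_id = guardduty.list_detectors()["DetectorIds"][0]

# Archive port-scan findings from a vetted vulnerability scanner on arrival.
guardduty.create_filter(
    DetectorId=detector_id,
    Name="suppress-approved-scanner",
    Action="ARCHIVE",
    FindingCriteria={
        "Criterion": {
            "type": {"Eq": ["Recon:EC2/Portscan"]},
            "service.action.networkConnectionAction.remoteIpDetails.ipAddressV4": {
                "Eq": ["203.0.113.10"]
            },
        }
    },
)
```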

Use AWS Organizations to designate a security-tooling account as the GuardDuty delegated administrator. New member accounts auto-enroll, all findings aggregate to one console, and you author suppression rules and EventBridge wiring in one place instead of N. The same pattern applies to Security Hub, Macie, Inspector, IAM Access Analyzer, and Detective. Reference: https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_organizations.html

Systems Manager Compliance for Baseline Tracking

While GuardDuty tracks behavior baselines, AWS Systems Manager tracks configuration baselines on EC2 fleets. Two pieces matter for security monitoring and alerting design.

Patch Manager publishes patch-compliance data: for each managed instance, it records which approved patches are installed, missing, failed, or rejected. The aggregate compliance state is queryable through the Systems Manager Compliance API, surfaced in the Compliance dashboard, and emitted as Config-rule evaluations when you pair it with the ec2-managedinstance-patch-compliance-status-check managed Config rule.

State Manager and Association Compliance track whether the SSM documents you require to run on every instance — antivirus install, CIS benchmark hardening, log-agent install — are actually executing successfully. Association compliance becomes NON_COMPLIANT the moment a document fails on any target, and Config-rule evaluation re-emits that state into Security Hub via ASFF.

The clinching exam point: Systems Manager Compliance does not page anyone by itself. You wire Config compliance-change events through EventBridge → SNS, the same as any other Config rule. This is why the topic belongs in security monitoring and alerting design rather than in the patching topic.

Any custom security signal your application publishes to CloudWatch must use the cloudwatch:PutMetricData API. The IAM principal — whether an EC2 instance role, ECS task role, Lambda execution role, or on-prem IAM user — must have that permission, and the call must specify a Namespace; any Dimensions you attach become part of the metric's identity, so the alarm must reference them exactly. Missing PutMetricData permission is the #1 cause of "my custom metric is not appearing" troubleshooting tickets. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html
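A minimal publishing sketch; the Custom/AppSec namespace and dimension are illustrative, and the alarm watching this metric must reference the identical namespace, name, and dimension set:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish one data point; namespace + name + dimensions define the metric.
cloudwatch.put_metric_data(
    Namespace="Custom/AppSec",
    MetricData=[{
        "MetricName": "AuthFailures",
        "Dimensions": [{"Name": "Service", "Value": "checkout-api"}],
        "Value": 1.0,
        "Unit": "Count",
    }],
)
```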

Defining Metrics, Thresholds, and Alarm Math

A CloudWatch alarm is defined by five parameters that all show up on the exam: the metric (or metric math expression), the statistic (Average, Sum, Maximum, p99, etc.), the period (the bucket size in seconds), the evaluation periods (how many consecutive buckets), and the threshold (numeric or anomaly band). A sixth parameter, treatMissingData, decides what happens when no data points arrive in a bucket — notBreaching, breaching, ignore, or missing.

For security signals where "no data" means "the agent died and we are blind", you want treatMissingData: breaching. For seasonal signals where gaps are expected (a metric that only emits during business hours), you want notBreaching. Choosing the wrong policy is the most common reason an alarm flaps or, worse, silently fails.

Metric math lets you combine multiple metrics into a single alarm input. Common security expressions: m1/m2 (failed-login ratio), RATE(m1) (per-second rate), IF(m1 > 100, 1, 0) (binary signal). One well-built math alarm can replace several scalar alarms and cuts the number of pages.

Do not under-provision evaluation periods on critical security alarms. With a 1-minute period and 5 evaluation periods you wait 5 minutes minimum. For root-account-usage alerts use 1-of-1 (period = 60s, evaluationPeriods = 1, datapointsToAlarm = 1). For noisy metrics use 3-of-5 to absorb single-bucket spikes without ignoring sustained issues. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
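A sketch combining metric math and M-of-N evaluation — the failed-login ratio alarmed on 3 of 5 one-minute buckets; the metric names are assumptions:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="FailedLoginRatio",
    ComparisonOperator="GreaterThanThreshold",
    Threshold=0.5,
    EvaluationPeriods=5,
    DatapointsToAlarm=3,             # M-of-N: 3 breaching buckets out of 5
    TreatMissingData="notBreaching",
    Metrics=[
        {"Id": "failed", "ReturnData": False, "MetricStat": {
            "Metric": {"Namespace": "Custom/AppSec", "MetricName": "AuthFailures"},
            "Period": 60, "Stat": "Sum"}},
        {"Id": "total", "ReturnData": False, "MetricStat": {
            "Metric": {"Namespace": "Custom/AppSec", "MetricName": "AuthAttempts"},
            "Period": 60, "Stat": "Sum"}},
        {"Id": "ratio", "Expression": "failed / total", "ReturnData": True},
    ],
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:security-alerts"],
)
```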

Troubleshooting Missing Alerts — Decision Tree

Task statement 2.2 is dedicated to troubleshooting security monitoring and alerting failures. Exam scenarios always describe a pipeline that "should" be alerting but is silent. Walk the data path in order from source to destination and check the most common breakage at each hop.

SNS Not Receiving / Subscribers Not Notified

Check that the subscription is in Confirmed state, not PendingConfirmation. Email and HTTPS subscribers must click a confirmation link before SNS will deliver. Check the topic access policy — a Lambda or CloudWatch alarm in another account needs an explicit sns:Publish Allow. Check the SNS delivery status logs (you must enable them per protocol with an IAM role) for HTTP 4xx responses indicating the receiver rejected the payload.

EventBridge Rule Not Firing

Use the rule's MatchedEvents and Invocations CloudWatch metrics to distinguish "pattern never matched" from "matched but target failed". If MatchedEvents is zero, the pattern is wrong: replay a sample event from the source service through the EventBridge sandbox or aws events test-event-pattern and compare. If MatchedEvents is non-zero but Invocations is zero, the target IAM role or resource policy is wrong; check FailedInvocations and the rule's dead-letter queue (DLQ).
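You can check a captured sample event against a pattern without deploying anything, via the TestEventPattern API; the event body here is a trimmed stand-in for a real sample:

```python
import json

import boto3

events = boto3.client("events")

# TestEventPattern requires the standard top-level envelope fields.
sample_event = json.dumps({
    "id": "abc-123",
    "detail-type": "GuardDuty Finding",
    "source": "aws.guardduty",
    "account": "111122223333",
    "time": "2024-01-01T00:00:00Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {"severity": 8},
})
pattern = json.dumps({
    "source": ["aws.guardduty"],
    "detail": {"severity": [{"numeric": [">=", 7]}]},
})

result = events.test_event_pattern(EventPattern=pattern, Event=sample_event)
print(result["Result"])  # True if the pattern matches
```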

Input transformer mistakes are a particularly nasty failure mode: the rule fires, but the target receives malformed JSON and silently rejects it. Always pair input transformers with a target DLQ.

CloudWatch Alarm Flapping or Stuck in INSUFFICIENT_DATA

A flapping alarm is usually a threshold-plus-period mismatch. Increase evaluationPeriods and use datapointsToAlarm (M-of-N) to absorb noise. An alarm stuck in INSUFFICIENT_DATA means no data points are arriving in the configured period; check whether the underlying metric is emitting at all, whether the period matches the metric's native resolution, and whether treatMissingData should be breaching instead.

Lambda Target Not Executing

Check the Lambda's CloudWatch Logs /aws/lambda/<fn> log group. If there is no log stream at all, the function never invoked: check the EventBridge target permissions (resource-based policy on the Lambda) and the source's invoke permissions. If there are log streams with errors, fix the runtime issue and add a DLQ (SQS or SNS) to the function so future failures are visible. Confirm the execution role has every API permission the function uses, including kms:Decrypt for any encrypted environment variables or secrets.

Custom Application Metrics Not Reporting

The instance role, ECS task role, Lambda execution role, or on-prem IAM principal lacks cloudwatch:PutMetricData. Add the permission, optionally restrict to a single namespace via the cloudwatch:namespace condition key. Verify the agent or SDK is configured for the right region — metrics published to us-east-1 by mistake will never surface in eu-west-1 dashboards. Confirm the metric name, namespace, and dimension set match exactly what the alarm is configured to watch; CloudWatch silently creates a separate metric for any deviation.

Without a DLQ, failed deliveries are dropped after EventBridge's internal retry budget (24 hours, exponential backoff). With a DLQ — SQS or SNS — you get a permanent record of the event plus the failure reason, which is the only realistic way to diagnose intermittent target failures. Reference: https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-rule-dlq.html
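A sketch of attaching a DLQ (and an explicit retry policy) to a rule target; the queue must have a resource policy allowing events.amazonaws.com to send messages, and the ARNs are placeholders:

```python
import boto3

events = boto3.client("events")

# Undeliverable events land in the SQS DLQ with failure metadata
# instead of being dropped after the retry budget is exhausted.
events.put_targets(
    Rule="guardduty-high-critical",
    Targets=[{
        "Id": "remediation-fn",
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:quarantine-instance",
        "DeadLetterConfig": {"Arn": "arn:aws:sqs:us-east-1:111122223333:eventbridge-dlq"},
        "RetryPolicy": {"MaximumRetryAttempts": 185, "MaximumEventAgeInSeconds": 86400},
    }],
)
```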

End-to-End Reference Architecture

Putting the pieces together, a production-grade security monitoring and alerting design on AWS looks like this. CloudTrail organization trail writes management events to a central S3 bucket and to a CloudWatch log group in the security-tooling account. VPC Flow Logs and Route 53 Resolver logs land in the same log group set. GuardDuty is enabled organization-wide with the security-tooling account as delegated administrator and S3, EKS, and Malware Protection turned on.

Security Hub is also delegated-admin in the same account, with all member accounts auto-enrolled and the AWS Foundational Security Best Practices, CIS, and PCI standards enabled. Automation rules pre-classify findings: dev-account suppression, prod-tag elevation, known-vulnerability deduplication. The post-automation finding stream flows into EventBridge.

A small set of EventBridge rules fan out from there: HIGH/CRITICAL findings to a Lambda that opens a JIRA ticket and pages via SNS to PagerDuty; root-user activity to a separate SNS topic with AWS Chatbot delivery to a #sec-alerts Slack channel; Config NON_COMPLIANT events to a Step Functions auto-remediation workflow with a 30-second human-override window; GuardDuty findings of type Backdoor: or CryptoCurrency: directly to a Step Functions isolation flow that detaches the instance's security groups and snapshots the volume.

CloudWatch metric filters on the central log group catch the long-tail signals GuardDuty does not cover: console logins without MFA, KMS key disable attempts, S3 bucket policy changes that grant Principal: "*". Each filter feeds a CloudWatch alarm with composite logic to suppress maintenance-window noise. Custom application metrics from the application teams flow through the Custom/AppSec namespace and inherit the same alarm/SNS plumbing.

Use a custom EventBridge bus in the security-tooling account that all member accounts target via cross-account PutEvents. Application teams own the rules that consume from the bus into their own targets, while the security team owns the bus policy, the standard targets (PagerDuty, Slack, JIRA), and the audit trail. Reference: https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-cross-account.html
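A sketch of the bus-policy side, granting one member account PutEvents on a hypothetical custom bus; in a large organization you would typically use an aws:PrincipalOrgID condition instead of per-account statements:

```python
import boto3

events = boto3.client("events")  # run in the security-tooling account

# Allow a member account to put events onto the custom security bus.
events.put_permission(
    EventBusName="security-bus",  # hypothetical custom bus
    StatementId="allow-member-444455556666",
    Action="events:PutEvents",
    Principal="444455556666",
)
```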


Common Exam Pitfalls and How to Avoid Them

The SCS-C02 exam loves to set traps around security monitoring and alerting because the surface area is huge and a single misconfiguration kills the entire pipeline. Watch for these patterns.

The "automatic" answer is usually wrong. If a question says "the system automatically notifies the security team", check whether the proposed solution actually has a notification step. CloudWatch metric filters do not notify; they need an alarm. Security Hub insights do not notify; they need an EventBridge rule or automation rule wired to SNS.

CloudTrail-only is rarely enough. Many wrong answers propose "enable CloudTrail and review logs daily". Daily review is not alerting; it is forensics. The right answer almost always pairs CloudTrail with CloudWatch Logs metric filters or with EventBridge rules.

Per-account is rarely enough. SCS-C02 is multi-account-first. If the answer enables GuardDuty, Security Hub, or Config in a single account when AWS Organizations is in scope, look for the delegated-administrator alternative and prefer it.

FAQ

How is EventBridge different from CloudWatch Events for security monitoring?

EventBridge is the rebranded and extended successor to CloudWatch Events. The default bus is shared between the two service names, and old rules continue to work. EventBridge added custom buses, the schema registry, partner sources (Zendesk, Datadog, Auth0), Pipes, and Scheduler. For security monitoring and alerting design, always say EventBridge; the exam uses both names interchangeably, but the modern features (input transformers on bus targets, archive and replay, cross-account event buses) are EventBridge-only.

When should I use a CloudWatch alarm vs an EventBridge rule for the same security event?

Use a CloudWatch alarm when the trigger is a numeric threshold over time — failed logins per 5 minutes, KMS Decrypt API count above baseline, GuardDuty finding count rising. Use an EventBridge rule when the trigger is a single event whose shape matters — root user signs in once, a specific GuardDuty finding type fires, a Config rule transitions to NON_COMPLIANT. Many real systems use both: the CloudWatch alarm provides the volume signal, and the EventBridge rule provides the per-event detail to the responder.

What is the difference between a Security Hub insight and a Security Hub automation rule?

A Security Hub insight is a saved query — a GROUP BY view over ASFF findings that you read in the console or pull from API. Insights take no action. A Security Hub automation rule has a criteria block and an actions block; it mutates findings server-side (set Workflow.Status to NOTIFIED, change Severity.Label, add notes, suppress) before they re-publish to EventBridge. Use insights to build dashboards and reports; use automation rules to enforce triage policy at scale.

Why is my GuardDuty finding not triggering my EventBridge rule?

Check four things in order: (1) the GuardDuty detector is in the same Region as the EventBridge rule (findings are regional); (2) the event pattern matches the actual finding JSON — copy a sample from the GuardDuty console rather than guessing field names; (3) no Security Hub automation rule is suppressing the finding before re-publish; (4) the finding is new, not an update to an existing one — GuardDuty re-emits subsequent occurrences of the same finding on the findings-export frequency schedule (15 minutes, 1 hour, or the 6-hour default), so make sure your pattern allows updates if you care about them. Add a CloudWatch Logs target to the rule for debugging so you can see exactly what is matching.

What permissions does a Lambda function need to remediate a security finding?

The Lambda execution role needs the AWS APIs the remediation calls (for example, ec2:RevokeSecurityGroupIngress, iam:UpdateAccessKey, s3:PutBucketPolicy), plus logs:CreateLogStream and logs:PutLogEvents for its own logging, plus kms:Decrypt if it reads encrypted environment variables or Secrets Manager secrets. Use the principle of least privilege: scope each remediation Lambda to one resource type so a compromised function cannot escalate. Use Lambda function URLs only with IAM auth, never with public access, and prefer EventBridge invocation over function URLs for security automation.

How do I alert on root account usage immediately?

Create an EventBridge rule on the default bus that matches console sign-in events: {"source":["aws.signin"], "detail":{"userIdentity":{"type":["Root"]}}}. Root API calls arrive separately as "AWS API Call via CloudTrail" events under the calling service's source, so the exact field set differs between the two — cover both for full coverage. Target an SNS topic with email, AWS Chatbot, and PagerDuty subscribers. The rule fires within seconds of the CloudTrail event. Pair it with a CloudWatch alarm on a metric filter for { $.userIdentity.type = "Root" } against the central CloudTrail log group as a belt-and-suspenders backup, and ensure the trail is multi-region so the alert fires regardless of which Region the root user touches.

Can I send security alerts directly from CloudTrail to SNS without CloudWatch?

Not directly. CloudTrail itself has no SNS publish step for alerting purposes (its SnsTopicName field is for trail-delivery notifications only, not for log-content alerting). The two supported paths are: (1) CloudTrail → CloudWatch Logs → metric filter → CloudWatch alarm → SNS, and (2) CloudTrail management events → EventBridge default bus → rule → SNS target. Path 2 is faster (sub-minute) and recommended for most security monitoring and alerting. Path 1 is necessary when you need numeric aggregation across log lines.

How do I prevent alarm fatigue without disabling alarms?

Use composite alarms with maintenance-window suppressors, Security Hub automation rules to mute known-good findings, EventBridge rules with severity filters (Severity.Label in ["HIGH","CRITICAL"]), GuardDuty suppression rules for vetted vulnerability scanners, and datapointsToAlarm (M-of-N) on noisy metrics. Track alert volume itself as a metric: a CloudWatch dashboard tile showing pages per day per category surfaces fatigue trends so the security operations team can tune thresholds before responders start ignoring them.

Is AWS Chatbot required for Slack and Microsoft Teams notifications?

AWS Chatbot is the AWS-native, no-Lambda way to deliver SNS messages to Slack and Teams channels with rich formatting and slash-command actionability. Alternatives — a Lambda subscriber that calls the Slack webhook directly, or third-party tools like PagerDuty's SNS integration — work but require you to maintain code and secrets. For SCS-C02, AWS Chatbot is the canonical answer when the question mentions Slack or Teams alongside SNS.

Key Takeaways for SCS-C02

Security monitoring and alerting design on AWS is the orchestration of CloudWatch (numeric thresholds), EventBridge (event-shape routing), Security Hub (ASFF normalization and automation rules), GuardDuty (managed behavior baseline), Lambda and Step Functions (action), and SNS plus AWS Chatbot (notification). Master the data flow between these services and you master Domain 2 task 2.1.

Troubleshooting follows the same data flow in reverse: SNS subscription → SNS topic policy → CloudWatch alarm state → metric filter pattern → log ingestion permission, or SNS → EventBridge target permission → EventBridge invocation metric → event pattern → source service emission. Knowing the order is more valuable than memorizing any single service.

Multi-account security monitoring and alerting is the SCS-C02 differentiator. Always reach for AWS Organizations, delegated administrators, organization-wide CloudTrail, and cross-account EventBridge buses. Single-account answers are almost always wrong on this exam. With those patterns, plus the troubleshooting decision tree from this topic, you should be able to handle every Domain 2 task 2.1 and 2.2 scenario the exam can construct.
