
Scheduled Tasks, Config Auto-Remediation, and Process Automation

7,400 words · about 37 minutes of reading

Task Statement 3.2 of the SOA-C02 Exam Guide says, in plain words, "automate manual or repeatable processes" — and that single sentence hides three of the most heavily examined operational disciplines on the SysOps Administrator track: scheduling tasks on a calendar, detecting drift from policy, and chaining Config rules to EventBridge rules to Systems Manager Automation runbooks for closed-loop remediation. Where SAA-C03 asks "which AWS service detects a non-compliant S3 bucket?", SOA-C02 asks "the bucket is non-compliant — write the EventBridge target, attach the SSM Automation document, configure the remediation IAM role, and explain why the auto-remediation silently failed at 02:00 last Sunday". Scheduled tasks and Config automation is the topic where EventBridge, EventBridge Scheduler, AWS Config, Systems Manager Automation, and Maintenance Windows all interlock into a single operational fabric.

This guide walks through scheduled tasks and Config automation from the SysOps angle: when EventBridge Scheduler is the right answer instead of an EventBridge rule, how to read and write rate and cron expressions without falling for the 5-field versus 6-field trap, what the difference between a periodic Config rule and a configuration-change Config rule means in practice, exactly how a Config auto-remediation hooks into an SSM Automation document with a remediation role, and how Systems Manager Maintenance Windows let you fold patch jobs, snapshot jobs, and arbitrary runbooks into a recurring schedule with priorities and concurrency limits. You will also see the recurring SOA-C02 scenario shapes around scheduled tasks and Config automation: nightly snapshots that should not be cron jobs on EC2, Config rules that fail to remediate because of a missing trust policy, EventBridge rules that fire the wrong number of times because the cron field count was wrong, and cross-account schedule fan-out via EventBridge Scheduler.

Why Scheduled Tasks and Config Automation Sit at the Heart of SOA-C02 Domain 3.2

The official SOA-C02 Exam Guide v2.3 lists three skills under Task Statement 3.2: use AWS services to automate deployment processes, implement automated patch management, and "schedule automated tasks by using AWS services (for example, EventBridge, AWS Config)". Patching has its own topic in this study set; deployment automation overlaps with CloudFormation; this topic owns the third skill plus the cross-cutting glue: scheduled tasks and Config-driven automation.

At the SysOps tier the framing is operational, not architectural. SAA-C03 asks "which AWS service should we use to detect a public S3 bucket?" — the answer is AWS Config plus the managed rule. SOA-C02 asks "the public-bucket Config rule is firing, but auto-remediation never runs — why?" — the answers usually point at the remediation role's trust policy, a missing parameter on the SSM Automation document, or the rule being in manual rather than automatic mode. Scheduled Tasks and Config Automation is the topic where every other SOA-C02 topic plugs back in: CloudWatch alarms feed EventBridge, Auto Scaling reacts to scheduled actions, CloudFormation deploys schedules and rules as IaC, IAM policies authorize the chain, and VPC endpoints let private instances reach the SSM and EventBridge APIs.

  • EventBridge rule (event-pattern): a rule on an EventBridge event bus that matches incoming events by a JSON pattern and routes them to one or more targets. Reactive, not scheduled.
  • EventBridge rule (scheduled): a rule on the default event bus that fires on a rate(...) or cron(...) expression. The classic "scheduled CloudWatch Events" pattern, still supported today.
  • EventBridge Scheduler: a separate, newer service for scheduling at scale. One-time and recurring schedules, schedule groups, flexible time windows, time zone support, dead-letter queues, and 270+ target services. Not the same API as EventBridge rules.
  • Rate expression: rate(value unit) where unit is minute(s), hour(s), or day(s) — for example rate(5 minutes), rate(1 hour), rate(7 days).
  • Cron expression: a 6-field cron-style string in EventBridge — minutes, hours, day-of-month, month, day-of-week, year. Note the year field; classic Unix cron has 5 fields and no year.
  • AWS Config: a service that records resource configuration over time and evaluates resources against rules.
  • Config rule: a managed or custom evaluator that returns COMPLIANT or NON_COMPLIANT for a resource. Two trigger types: configuration-change and periodic.
  • Conformance pack: a packaged collection of Config rules and remediation actions deployable in one click, often aligned to a compliance framework (CIS, NIST, PCI-DSS).
  • Auto-remediation: a Config rule action that runs an SSM Automation document on a non-compliant resource — manual approval or automatic execution, with optional retries.
  • Remediation role: the IAM role assumed by the Config remediation engine when it executes the SSM Automation document. Trust policy must trust ssm.amazonaws.com.
  • SSM Automation document (runbook): a YAML or JSON document with sequential steps such as aws:executeAwsApi, aws:runCommand, aws:executeScript, aws:waitForAwsResourceProperty, aws:branch, aws:approve. AWS-managed documents have the AWS- prefix.
  • Maintenance Window: a recurring schedule in Systems Manager that bundles targets, tasks, and execution priorities for patching, snapshots, and arbitrary commands or runbooks.
  • Reference: https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is.html

Plain-Language Explanation of Scheduled Tasks, Config Auto-Remediation, and Process Automation

The Scheduled Tasks and Config automation jargon stacks fast. Three analogies help the constructs stick.

Analogy 1: The Clock + Alarm Panel + Janitor Chain

Think of the whole AWS automation fabric as a clock + alarm panel + janitor chain in an office building. EventBridge Scheduler and EventBridge scheduled rules are the wall clocks that announce, "It is 02:00 — kick off the nightly snapshot job." AWS Config rules are the fire and smoke detectors that watch the building continuously and call out, "There is an unencrypted EBS volume on the third floor!" EventBridge event-pattern rules are the central alarm panel that receives every alert (clock chimes, smoke detectors, security cameras) and decides where to dispatch them. SSM Automation documents are the standing orders to the janitor — exactly which steps to take, with parameters like which volume to encrypt or which instance to restart. The remediation role is the janitor's keycard that opens just the right doors to do the job and nothing more. Maintenance Windows are the planned overnight cleaning shifts with their own crew, schedule, and priority — patching at 02:00, log rotation at 02:30, snapshot copy at 03:00. The whole point is that nothing happens because a human clicked something at 02:00; everything happens because the clock, the alarm panel, and the janitor are wired together with explicit policies and schedules.

Analogy 2: The Restaurant SOPs Binder

A modern restaurant kitchen runs on Standard Operating Procedures. The opening checklist ("turn on grills at 10:00, check fridge temps at 10:15, prep mise en place by 10:45") is a scheduled rule — pure time-based. The closing checklist is the same on a different schedule. The food safety rule "if fridge temperature exceeds 5°C, isolate the contents and call maintenance" is a Config rule with auto-remediation — event-driven, conditional, with a runbook attached. The head chef's plating SOP is an SSM Automation document: step 1 plate the protein, step 2 add sauce, step 3 garnish, step 4 wipe the rim. Each step has parameters (which protein, which sauce). New cooks (instances) never wing it; they execute the SOP. When health inspections (compliance audits) ask "show me the SOPs for fridge temperature breaches", the kitchen hands over the binder — exactly what AWS Config + SSM Automation give you in the cloud.

Analogy 3: The Factory Assembly Line With Quality Gates

An automated factory assembly line runs on a clock — every 30 seconds a part advances. That cadence is EventBridge scheduled rules. Each station has a quality gate — sensors that check the part — that is the AWS Config rule trigger on configuration change. When a part fails the quality gate, a diverter arm routes it off the main line into a rework cell that runs a fixed sequence of corrective steps — that is the EventBridge → SSM Automation chain. The rework cell's tools only work because the factory keycard system authorized them; that is the remediation role's trust policy and IAM permissions. Periodically a shift supervisor walks the floor and audits everything against a checklist — that is the periodic Config rule that runs every 24 hours regardless of events. The factory ships products predictably because every step is pre-defined, parameterized, audited, and recoverable.

For SOA-C02, the clock + alarm panel + janitor chain analogy is the most useful when a question describes the full Config-EventBridge-SSM pipeline. The clock is the schedule; the alarm panel is EventBridge as a router; the janitor is SSM Automation. When the question asks "what is missing from the chain?", you can almost always trace it back to one of: a missing keycard (remediation role missing or trust policy wrong), a missing alarm wire (EventBridge rule with the wrong event pattern), or a missing standing order (SSM document parameter not set). Reference: https://docs.aws.amazon.com/config/latest/developerguide/remediation.html

Scheduling Automation on AWS: EventBridge Scheduler vs EventBridge Rules

AWS gives you two different scheduling primitives, both branded under EventBridge but architecturally distinct. Picking the right one is a recurring SOA-C02 question shape.

EventBridge scheduled rules (the original primitive)

EventBridge scheduled rules are the descendant of CloudWatch Events scheduled rules. They live on the default event bus, are configured via the EventBridge "Rules" UI or PutRule, and use either a rate(...) or cron(...) expression. Each rule supports up to five targets, runs in a single AWS Region, and is identified by the rule name. Pricing is bundled into EventBridge — you pay for invocations, not for the rule itself.

When to use scheduled rules:

  • A small number (tens to a few hundred) of recurring schedules per Region.
  • The schedule is naturally tied to an event bus you are already using.
  • You want a unified rule list for both event-pattern and scheduled rules.
  • The classic CloudWatch Events migration path — the same cron(...) syntax still works.

Limitations that drive teams to EventBridge Scheduler:

  • One-to-five targets per rule; for 50 targets you need 10 rules.
  • No native one-time schedule (at(...) expression).
  • No native time zone support — cron is always evaluated in UTC.
  • No flexible time window for jitter — useful when 10,000 schedules would all fire at the same second and overload a downstream API.
  • Limited per-Region quota on the number of rules.
  • Dead-letter queues are configured per target rather than per rule, and were only added in 2020.

EventBridge Scheduler (the newer purpose-built service)

EventBridge Scheduler launched in November 2022 as a standalone service for scheduled invocations at scale. It supports one-time schedules with at(...), recurring schedules with rate(...) or cron(...), time zone–aware cron evaluation, flexible time windows for jitter, dead-letter queues, retry policies with exponential backoff, and schedule groups for organizing schedules and applying tags. It supports 270+ AWS service APIs as targets (versus EventBridge rule's smaller built-in target list), with templated targets for the common ones (Lambda, SQS, SNS, ECS, Step Functions, SSM Automation) and a generic AWS API target for anything else.

When to use EventBridge Scheduler:

  • Hundreds, thousands, or millions of schedules per account.
  • Per-tenant schedules in a SaaS application — each tenant gets their own schedule.
  • Time zone–aware cron — "every weekday at 9am Tokyo time" needs cron(0 9 ? * MON-FRI *) with a TimeZone: Asia/Tokyo setting.
  • One-time schedules — "send this reminder once at 2026-05-15T09:00:00 UTC".
  • Flexible time windows: "fire any time within the 15 minutes after 02:00 UTC" to spread load. The window opens at the scheduled time; it is not centered on it.
  • EventBridge scheduled rule: up to 5 targets per rule, one Region per rule, cron always in UTC, no native one-time at(...).
  • EventBridge Scheduler: one target per schedule (you create more schedules), time zone–aware cron, supports at(...) for one-time, supports flexible time windows, scales to millions of schedules.
  • EventBridge cron has 6 fields (minutes hours day-of-month month day-of-week year). Unix cron has 5.
  • EventBridge Scheduler retry: configurable max retries (0–185) and max age of event (60 s to 1 day).
  • EventBridge invocation cost: roughly the same per-invocation rate; Scheduler has no per-schedule fixed fee.
  • Scheduler schedule group default quota: 500 groups per account; 1,000,000 schedules per account by default with limit increase available.
  • Reference: https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html

Decision summary

Need | Pick
Up to ~100 simple recurring schedules in one Region | EventBridge scheduled rule
Time zone–aware cron | EventBridge Scheduler
One-time future invocation (at(...)) | EventBridge Scheduler
Per-tenant scheduling at SaaS scale | EventBridge Scheduler
Need to attach up to 5 targets to one schedule | EventBridge scheduled rule (Scheduler is one target per schedule)
Flexible jitter window for load spreading | EventBridge Scheduler
Schedule that lives alongside existing rules on the default event bus | EventBridge scheduled rule (scheduled rules run only on the default bus)

A common SOA-C02 distractor is "use one EventBridge Scheduler schedule with five targets". EventBridge Scheduler is one target per schedule — for five targets you create five schedules in the same schedule group. EventBridge scheduled rules allow up to five targets per rule. Mixing the two services' constraints in your head is exactly the trap. Reference: https://docs.aws.amazon.com/scheduler/latest/UserGuide/managing-schedule-group.html
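The one-target-per-schedule constraint can be sketched as a small request builder. This is a minimal illustration, not a definitive implementation: the dictionaries follow the shape of the EventBridge Scheduler CreateSchedule request as commonly passed to an SDK, and every ARN, the group name, and the retry values are hypothetical placeholders.

```python
import json


def build_schedules(group, expression, timezone, role_arn, target_arns, dlq_arn):
    """Build one CreateSchedule-style request per target ARN.

    Scheduler allows a single target per schedule, so five targets
    become five schedules that share one schedule group.
    """
    requests = []
    for index, target_arn in enumerate(target_arns, start=1):
        requests.append({
            "Name": f"{group}-target-{index}",
            "GroupName": group,
            "ScheduleExpression": expression,
            "ScheduleExpressionTimezone": timezone,
            # Jitter: invoke at some point within 15 minutes of the scheduled time.
            "FlexibleTimeWindow": {"Mode": "FLEXIBLE", "MaximumWindowInMinutes": 15},
            "Target": {
                "Arn": target_arn,
                "RoleArn": role_arn,
                "Input": json.dumps({"source": "nightly-schedule"}),
                "DeadLetterConfig": {"Arn": dlq_arn},
                # Assumed values; the allowed ranges are 0-185 retries and 60-86400 seconds.
                "RetryPolicy": {"MaximumRetryAttempts": 3, "MaximumEventAgeInSeconds": 3600},
            },
        })
    return requests


# Hypothetical ARNs, for illustration only.
FIVE_TARGETS = [
    f"arn:aws:lambda:us-east-1:123456789012:function:job-{n}" for n in range(1, 6)
]
SCHEDULES = build_schedules(
    group="nightly-jobs",
    expression="cron(0 2 * * ? *)",
    timezone="Asia/Tokyo",
    role_arn="arn:aws:iam::123456789012:role/SchedulerInvokeRole",
    target_arns=FIVE_TARGETS,
    dlq_arn="arn:aws:sqs:us-east-1:123456789012:scheduler-dlq",
)
```

Each dictionary in SCHEDULES is meant to be passed as keyword arguments to an SDK call such as create_schedule; the sketch stops short of making that call so it runs without credentials.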

EventBridge Rate and Cron Expressions: Syntax and Exam Gotchas

Almost every SOA-C02 candidate sees at least one question that requires reading or writing a rate or cron expression. The fields are deceptively simple but the gotchas are real.

Rate expressions

Rate expressions follow the form rate(value unit) where:

  • value is a positive integer.
  • unit is minute(s), hour(s), or day(s). Singular for 1, plural for everything else.

Valid examples: rate(5 minutes), rate(1 hour), rate(2 hours), rate(7 days).

Invalid examples: rate(30 seconds) (no second resolution), rate(0 minutes) (must be positive), rate(1 minutes) (must be 1 minute), rate(1 month) (no month unit — use rate(30 days) or a cron expression).

Rate expressions count from the time the rule is created. A rule created at 09:13:42 with rate(1 hour) next fires at 10:13:42, then 11:13:42, and so on. To pin invocations to clock-aligned times like exactly 09:00, 10:00, 11:00, you need a cron expression instead.
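The rate rules above can be checked mechanically. The sketch below is an illustrative validator, not the service's own parser: it applies only the rules stated in this section (positive integer, three units, singular for 1 and plural otherwise) and reproduces the "counts from creation time" behavior. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

RATE_RE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")
SECONDS_PER_UNIT = {"minute": 60, "hour": 3600, "day": 86400}


def parse_rate(expression):
    """Return the interval of a rate(...) expression as a timedelta."""
    match = RATE_RE.match(expression)
    if not match:
        raise ValueError(f"not a rate expression: {expression!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        raise ValueError("value must be a positive integer")
    # A value of 1 needs the singular unit; every other value needs the plural.
    if (value == 1) == unit.endswith("s"):
        raise ValueError(f"wrong singular/plural unit for value {value}")
    return timedelta(seconds=value * SECONDS_PER_UNIT[unit.rstrip("s")])


def is_valid_rate(expression):
    """True when parse_rate accepts the expression."""
    try:
        parse_rate(expression)
    except ValueError:
        return False
    return True


def next_fire_times(created_at, expression, count=3):
    """Fire times counted from rule creation, not aligned to the clock."""
    interval = parse_rate(expression)
    return [created_at + interval * n for n in range(1, count + 1)]
```

For a rule created at 09:13:42, next_fire_times(datetime(2026, 1, 1, 9, 13, 42), "rate(1 hour)") yields 10:13:42, 11:13:42, and 12:13:42, which is why clock-aligned schedules need cron instead.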

EventBridge cron expressions — six fields

EventBridge cron has six fields, in this order:

cron(minutes hours day-of-month month day-of-week year)
Field | Values | Wildcards
Minutes | 0–59 | , - * /
Hours | 0–23 | , - * /
Day-of-month | 1–31 | , - * ? / L W
Month | 1–12 or JAN-DEC | , - * /
Day-of-week | 1–7 or SUN-SAT | , - * ? L #
Year | 1970–2199 | , - * /

Critical exam-relevant rules:

  • You cannot specify * for both day-of-month and day-of-week in the same expression. One of them must be ? (the "no specific value" wildcard). Standard Unix cron has no ? wildcard; EventBridge requires it.
  • Day-of-week values are 1=SUN, 2=MON, …, 7=SAT (this differs from some other cron dialects).
  • The year field is required — there is no 5-field cron in EventBridge.
  • Cron is always evaluated in UTC for EventBridge scheduled rules. For EventBridge Scheduler you can set a TimeZone.
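The structural rules in this list lend themselves to a quick pre-flight check. The following is a rough sketch with a hypothetical function name: it verifies only the field count and the placement of the ? wildcard, and deliberately does not validate value ranges or the L, W, and # modifiers.

```python
FIELD_NAMES = ("minutes", "hours", "day-of-month", "month", "day-of-week", "year")


def check_eventbridge_cron(expression):
    """Return a list of structural problems; an empty list means none were found."""
    if not (expression.startswith("cron(") and expression.endswith(")")):
        return ["expression must be wrapped in cron(...)"]
    fields = expression[5:-1].split()
    if len(fields) != 6:
        return [f"expected 6 fields, found {len(fields)}"]
    parsed = dict(zip(FIELD_NAMES, fields))
    problems = []
    # Exactly one of the two day fields carries the ? wildcard.
    day_fields = [parsed["day-of-month"], parsed["day-of-week"]]
    if day_fields.count("?") != 1:
        problems.append("exactly one of day-of-month and day-of-week must be ?")
    # The ? wildcard is only meaningful in the two day fields.
    for name in ("minutes", "hours", "month", "year"):
        if "?" in parsed[name]:
            problems.append(f"? is not allowed in the {name} field")
    return problems
```

A five-field crontab line pasted into cron(...) is reported as a field-count problem, and cron(0 2 * * * *) is reported for the missing ?.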

Common cron patterns

Goal | Expression
Every day at 02:00 UTC | cron(0 2 * * ? *)
Every Monday at 18:00 UTC | cron(0 18 ? * MON *)
Every weekday (Mon–Fri) at 09:00 UTC | cron(0 9 ? * MON-FRI *)
Every 10 minutes between 09:00 and 17:59 on weekdays | cron(0/10 9-17 ? * MON-FRI *)
First day of every month at 00:00 UTC | cron(0 0 1 * ? *)
Last day of every month at 23:59 UTC | cron(59 23 L * ? *)
Every Sunday at 02:00 UTC for the EBS snapshot job | cron(0 2 ? * SUN *)
Every 30 minutes | cron(0/30 * * * ? *) or rate(30 minutes)
Once at 2026-12-25T08:00:00 UTC | EventBridge Scheduler at(2026-12-25T08:00:00)
  • 6 fields: minutes, hours, day-of-month, month, day-of-week, year. Unix cron has 5; EventBridge has 6.
  • ? wildcard required: exactly one of day-of-month or day-of-week must be ?. Both * is invalid.
  • Day-of-week numbering: 1=SUN, 2=MON, …, 7=SAT.
  • L (last) and W (weekday closest) are EventBridge extensions to standard cron.
  • # (nth weekday of month): cron(0 9 ? * 2#1 *) = first Monday of the month at 09:00.
  • All scheduled rules in UTC; Scheduler supports per-schedule time zone.
  • Minimum granularity: 1 minute. No sub-minute scheduling.
  • Reference: https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-cron-expressions.html

The single most-tested syntax gotcha on SOA-C02. Candidates copy a 5-field cron from a Linux crontab (0 2 * * SUN) into the EventBridge console and the validation fails because the year field is missing. The right answer in EventBridge is cron(0 2 ? * SUN *) — note the ? for day-of-month and the trailing * for year. SOA-C02 routinely offers a wrong answer choice that is a perfectly valid 5-field cron expression to trap candidates who skipped the EventBridge syntax docs. Reference: https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-cron-expressions.html
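The crontab-to-EventBridge translation described above can be sketched as a small converter. This is an illustration under stated assumptions, not a complete translator: the function name is hypothetical, minutes, hours, day-of-month, and month pass through unchanged, numeric Unix weekdays (0 or 7 for Sunday) are rewritten as day names to sidestep the numbering difference, and step values in day-of-week are rejected. It also ignores the fact that a crontab runs in the server's local time while scheduled rules run in UTC.

```python
import re

# Unix cron numbers weekdays 0-7, with both 0 and 7 meaning Sunday.
UNIX_DOW = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def _dow_name(match):
    number = int(match.group())
    if number > 7:
        raise ValueError(f"invalid Unix day-of-week number: {number}")
    return UNIX_DOW[number]


def unix_to_eventbridge(crontab_schedule):
    """Convert a 5-field Unix cron schedule to a 6-field EventBridge cron."""
    fields = crontab_schedule.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 Unix cron fields, found {len(fields)}")
    minute, hour, dom, month, dow = fields
    if dow == "*":
        # No weekday restriction: the ? goes in day-of-week.
        dow = "?"
    elif dom == "*":
        if "/" in dow:
            raise ValueError("step values in day-of-week are not handled by this sketch")
        # Weekday restriction only: the ? goes in day-of-month.
        dow = re.sub(r"\d+", _dow_name, dow.upper())
        dom = "?"
    else:
        # Unix cron treats two restricted day fields as OR; EventBridge cannot express that.
        raise ValueError("day-of-month and day-of-week cannot both be restricted")
    # The trailing * is the required year field.
    return f"cron({minute} {hour} {dom} {month} {dow} *)"
```

Calling unix_to_eventbridge("0 2 * * SUN") returns cron(0 2 ? * SUN *), the form the console accepts.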

Common Scheduled Automation Targets: Lambda, SSM Automation, Step Functions, and More

Once a schedule fires, what does it actually invoke? The target dictates the operational shape of your automation.

Lambda — for lightweight, single-step tasks

A Lambda function is the right target when:

  • The work is short (under 15 minutes — the Lambda timeout ceiling).
  • It is a single-step API call or computation — call the EC2 API, write to S3, post to a webhook.
  • You want event payload parsing in Python or Node.js: a Lambda function is flexible enough to reformat the payload and forward it.
  • The target service is not natively supported by EventBridge or Scheduler.

Examples: nightly scan of expired credentials, hourly health probe, weekly cost report email.

Systems Manager Automation document — for multi-step operational sequences

An SSM Automation document is the right target when:

  • The work has multiple steps with branching, retries, or wait-for-resource-state semantics.
  • You need IAM-bounded execution (the document runs under a specific role named in its assumeRole field).
  • You need a documented runbook that humans can also execute manually for the same task.
  • You need parameterization — same document, different inputs (instance ID, tag value).

Examples: stop instances by tag at 18:00 and start at 09:00 (AWS-StopEC2Instance, AWS-StartEC2Instance); create EBS snapshot for tagged volumes (AWS-CreateSnapshot); patch a fleet (an Automation step that invokes the AWS-RunPatchBaseline Run Command document through aws:runCommand); update a security group rule on schedule (custom document).

Step Functions — for long-running, complex orchestration

Step Functions is the right target when:

  • The work crosses multiple services with intricate error handling.
  • The duration may exceed 15 minutes — Standard Workflows last up to one year.
  • You need explicit visualization of the state machine for audit.
  • You need parallel branches with map state semantics — process N items concurrently with retry per item.

Examples: nightly ETL pipeline, multi-region failover orchestration, complex disaster recovery drill.

Other common targets

  • SQS / SNS — for fan-out to many subscribers without invoking compute directly.
  • ECS RunTask — kick off a containerized batch job on schedule.
  • CodePipeline — trigger a pipeline run on schedule.
  • EC2 / Auto Scaling action — scheduled scaling action via Auto Scaling scheduled action (an Auto Scaling–native primitive that overlaps with EventBridge schedule patterns).
  • Generic AWS API target (Scheduler only) — call any AWS API directly without writing a Lambda.

SOA-C02 is biased toward AWS-native targets that minimize custom code. The hierarchy of preference for a "schedule a task" question is roughly: scheduled rule → AWS-managed SSM Automation document; scheduled rule → Lambda; scheduled rule → Step Functions for orchestration; never schedule a cron job inside an EC2 instance if any AWS-native target works. Cron-on-EC2 stops running the moment the instance is stopped, replaced, or terminated, and nothing outside the instance reports the missed run. Reference: https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-targets.html

AWS Config Rule Trigger Types: Periodic vs Configuration Change

AWS Config evaluates resources against rules in two trigger modes, and the choice has major operational consequences.

Configuration-change rules

A configuration-change rule fires whenever Config records a change to a resource of a specified type. Config records changes in near-real-time (typically within minutes) for every supported resource. When the rule fires, it evaluates only the changed resource — not the whole population.

Use a configuration-change rule when:

  • You want immediate feedback when a resource is mis-created or mis-modified.
  • The check is cheap and resource-scoped (e.g., "this EBS volume must be encrypted").
  • You want to drive auto-remediation immediately on the bad resource.

Examples: s3-bucket-public-read-prohibited, encrypted-volumes, restricted-ssh. (iam-user-mfa-enabled is often grouped with these, but it is a periodic rule.)

Periodic rules

A periodic rule fires on a schedule — every 1, 3, 6, 12, or 24 hours. When it fires, it evaluates all in-scope resources, not just changed ones.

Use a periodic rule when:

  • The check requires aggregation over many resources or external state (e.g., "the account has at least one CloudTrail trail").
  • You want to catch slow drift that does not cause a configuration-change event (e.g., a managed-policy attached to many roles).
  • The check is expensive and you want to bound how often it runs.

Examples: cloudtrail-enabled, securityhub-enabled, multi-region-cloudtrail-enabled, iam-user-mfa-enabled. Note that the managed required-tags rule is evaluated on configuration changes, not periodically.

Combined evaluation

Some AWS-managed rules are evaluated on both triggers; for custom rules you choose the trigger type when you create the rule. A common operational pattern is configuration-change for fast feedback plus a daily periodic re-evaluation as a safety net.

  • Configuration-change: fires on resource change, evaluates only the changed resource, near-real-time (minutes).
  • Periodic: fires every 1, 3, 6, 12, or 24 hours, evaluates all in-scope resources.
  • Some managed rules support only one trigger type — check the rule reference before selecting.
  • Custom Lambda rules can be configuration-change or periodic; you write the evaluator.
  • Custom Guard policy rules: declarative CloudFormation Guard syntax, supported for configuration-change.
  • Reference: https://docs.aws.amazon.com/config/latest/developerguide/evaluate-config.html

A common SOA-C02 scenario: "a developer creates a non-compliant resource and the SysOps team expected an immediate alarm, but nothing fired for hours". The trap is that the rule is configured as periodic with a 24-hour frequency. The fix is either to switch the rule to configuration-change trigger (if the rule type allows) or to add a separate configuration-change rule for fast detection while keeping the periodic for completeness. Reference: https://docs.aws.amazon.com/config/latest/developerguide/evaluate-config.html
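The difference between the two trigger types is visible in what a custom Lambda rule receives. The sketch below is a simplified evaluator core under stated assumptions: it reads the messageType from the invoking event, treats the configurationItem field names (including configuration.encrypted for an EBS volume) as assumed, and returns the evaluation instead of reporting it back to Config. Oversized configuration items are not handled.

```python
import json


def evaluate(event):
    """Classify the trigger and, for change triggers, evaluate the changed resource."""
    invoking = json.loads(event["invokingEvent"])
    message_type = invoking["messageType"]

    if message_type == "ScheduledNotification":
        # Periodic trigger: no single resource is attached, so a real rule
        # would list and evaluate every in-scope resource here.
        return {"trigger": "periodic", "evaluations": []}

    if message_type == "ConfigurationItemChangeNotification":
        # Change trigger: only the resource that changed is evaluated.
        item = invoking["configurationItem"]
        configuration = item.get("configuration") or {}
        compliant = bool(configuration.get("encrypted", False))  # assumed field name
        return {
            "trigger": "configuration-change",
            "evaluations": [{
                "ComplianceResourceType": item["resourceType"],
                "ComplianceResourceId": item["resourceId"],
                "ComplianceType": "COMPLIANT" if compliant else "NON_COMPLIANT",
                "OrderingTimestamp": item["configurationItemCaptureTime"],
            }],
        }

    raise ValueError(f"unhandled message type: {message_type}")
```

With a 24-hour periodic rule, the first branch is the only one that ever runs, which is why the non-compliant volume in the scenario above goes unnoticed for hours.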

AWS Config Auto-Remediation: The Full Chain

AWS Config auto-remediation is the headline closed-loop pattern of SOA-C02 Domain 3.2. The full chain is: a Config rule detects non-compliance → a remediation action runs an SSM Automation document → the document fixes the resource → optionally re-evaluation confirms compliance.

Anatomy of a remediation configuration

A remediation configuration is attached to a Config rule and specifies:

  1. Target: an SSM Automation document, either AWS-managed (e.g., AWS-EnableS3BucketEncryption, AWS-DisableS3BucketPublicReadWrite, AWS-ConfigureS3BucketLogging) or a custom document.
  2. Parameters: the inputs to the document. Each parameter is either a static value or a resource value; a resource value of RESOURCE_ID is filled in by Config with the ID of the non-compliant resource.
  3. Resource ID resolver: which Config-recorded resource maps to which document parameter. For an S3 rule, the bucket name is bound to the document's BucketName parameter via this mapping.
  4. Automatic vs manual: in automatic mode, Config invokes the document as soon as the rule fires NON_COMPLIANT. In manual mode, a human clicks "Remediate" in the Config console.
  5. Maximum automatic attempts: 1–25 attempts (default 5) within the retry window set by RetryAttemptSeconds (default 60 seconds).
  6. Execution role: the IAM role that SSM assumes to run the document. This is the single most-error-prone configuration.
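The six settings above map onto a single request structure. The following is a minimal sketch, assuming the shape of the Config PutRemediationConfigurations request; the helper name, the rule and document pairing, and the role ARN are hypothetical examples rather than a prescribed configuration.

```python
def build_remediation_configuration(rule_name, document_name, resource_parameter,
                                    role_arn, automatic=True, attempts=3,
                                    retry_seconds=300):
    """Assemble one remediation configuration for a Config rule."""
    if not 1 <= attempts <= 25:
        raise ValueError("attempts must be between 1 and 25")
    return {
        "ConfigRuleName": rule_name,
        "TargetType": "SSM_DOCUMENT",
        "TargetId": document_name,
        "Parameters": {
            # Config substitutes the non-compliant resource's ID at run time.
            resource_parameter: {"ResourceValue": {"Value": "RESOURCE_ID"}},
            # Static value: the role the Automation document assumes.
            "AutomationAssumeRole": {"StaticValue": {"Values": [role_arn]}},
        },
        "Automatic": automatic,
        "MaximumAutomaticAttempts": attempts,
        "RetryAttemptSeconds": retry_seconds,
    }


# Hypothetical rule, document parameter, and role ARN, for illustration only.
S3_ENCRYPTION_REMEDIATION = build_remediation_configuration(
    rule_name="s3-bucket-server-side-encryption-enabled",
    document_name="AWS-EnableS3BucketEncryption",
    resource_parameter="BucketName",
    role_arn="arn:aws:iam::123456789012:role/ConfigRemediationRole",
)
```

Setting automatic=False in the same structure gives the manual mode, where nothing runs until someone chooses Remediate in the console.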

The remediation IAM role

The remediation role must:

  • Have a trust policy that allows ssm.amazonaws.com to assume it (Service: ssm.amazonaws.com in the principal block).
  • Have permissions broad enough for the SSM Automation document's steps — s3:PutBucketEncryption, ec2:CreateTags, etc.
  • Have iam:PassRole on itself if the document calls another service that needs a role passed.

A common failure mode: the remediation runs to "InProgress" but never moves to "Success" because the assumed role lacks one specific API permission deep inside a multi-step document. Diagnostic: open the SSM Automation execution in the console, drill into the failing step, read the AccessDenied error, add the missing permission to the role policy.
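The trust-policy requirement is easy to check offline before a remediation ever runs. The sketch below is a simplified checker with a hypothetical function name: it looks only for an Allow statement that grants sts:AssumeRole to the service principal and ignores conditions and wildcard actions.

```python
def trusts_service(trust_policy, service="ssm.amazonaws.com"):
    """True when an Allow statement lets the service principal assume the role."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]  # a single statement may appear without a list
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # for example the bare "*" principal
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        if service in services:
            return True
    return False


# A trust policy of the kind the remediation role needs.
REMEDIATION_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ssm.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
```

A role whose trust policy names only config.amazonaws.com fails this check, which matches the "configured but nothing happens" symptom.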

Example: end-to-end auto-remediation for an unencrypted EBS volume

  1. Config rule: encrypted-volumes (managed, configuration-change trigger).
  2. Remediation document: no AWS-managed document encrypts an EBS volume in place (a name like AWS-EnableEbsVolumeEncryption does not exist). The realistic remediation is a custom document, or a managed runbook such as AWS-CreateEncryptedVolumeFromSnapshot where one is available, that snapshots the volume, copies the snapshot encrypted, creates a new volume, detaches the old one, attaches the new one, and tags the resource.
  3. Parameters: VolumeId mapped from the Config resource ID; KmsKeyId set to the encryption KMS key.
  4. Mode: manual (because the workflow is destructive — detaching a volume requires downtime, so a human approves).
  5. Role: SSM remediation role with EC2 snapshot, copy, create-volume, attach, detach, and tagging permissions.

Example: end-to-end auto-remediation for a non-compliant tag

  1. Config rule: required-tags (managed, configuration-change trigger), parameterized with tag1Key=Owner, tag1Value=*.
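  2. Remediation document: AWS-SetRequiredTags (the AWS- prefix marks it as AWS-managed) or an equivalent custom document. It applies the missing tags to the resource ID, which ultimately requires tagging permissions such as ec2:CreateTags for each in-scope service.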
  3. Parameters: ResourceId from Config, tag key/value from rule parameters.
  4. Mode: automatic with 3 retry attempts at 300-second intervals.
  5. Role: remediation role with ec2:CreateTags, s3:PutBucketTagging, etc., for the in-scope resource types.

The single most common SOA-C02 question failure on this topic: "auto-remediation is configured but nothing happens". The answer is almost always the remediation role's trust policy is missing ssm.amazonaws.com, the role lacks one of the action permissions inside the SSM document, or the rule is in manual rather than automatic mode. The Config console's remediation history shows the failing reason — read the AccessDenied or MissingPermission line in the SSM execution to identify the exact gap. Reference: https://docs.aws.amazon.com/config/latest/developerguide/remediation.html

Conformance packs — packaged rules and remediations

A conformance pack is a YAML template that bundles multiple Config rules and remediation actions into a single deployable unit. AWS publishes sample packs aligned to CIS, NIST 800-53, PCI-DSS, HIPAA, and the AWS Operational Best Practices set. Conformance packs deploy at the account level or, via Organizations integration, across an entire organization. The operational use case: "we need to enforce CIS controls in every account" — deploy the CIS conformance pack once at the org level, monitor compliance through the aggregator dashboard, customize parameters per region.

SOA-C02 may ask "the security team wants the same 25 Config rules deployed to 80 accounts with consistent remediation actions". The answer is a conformance pack deployed via AWS Organizations integration — not 25 individual PutConfigRule API calls per account. Conformance packs also allow per-pack rollback and version control. Reference: https://docs.aws.amazon.com/config/latest/developerguide/conformance-packs.html

Systems Manager Maintenance Windows: Recurring Schedules with Targets, Tasks, and Priority

A Maintenance Window is a recurring time slot in Systems Manager during which scheduled tasks run on a defined set of targets. Maintenance Windows are the SOA-C02-canonical answer for "regular operational work that should run on the same fleet at the same time every week".

The four building blocks

  1. Schedule: a cron(...) or rate(...) expression plus a duration and cutoff. Duration is how long the window stays open (e.g., 4 hours); cutoff is how many hours before the window ends to stop launching new tasks (e.g., 1 hour). Time zone is configurable per window.
  2. Targets: a registered set of resources for the window. Targeting modes include instance IDs, tag-based (tag:Environment=Production), or resource group ARNs. Targets are registered once and reused by every task.
  3. Tasks: the actual work to run. Task types include:
    • RUN_COMMAND — execute an SSM Run Command document (e.g., AWS-RunPatchBaseline).
    • AUTOMATION — execute an SSM Automation document.
    • LAMBDA — invoke a Lambda function.
    • STEP_FUNCTIONS — start a Step Functions state machine.
  4. Priority and concurrency: each task has a numeric priority (lower number = higher priority; tasks with the same priority run in parallel). You also set max concurrency (how many targets to act on at once — 10 instances, 25% of the fleet) and max errors (how many failures before halting — 2 instances, 10% of the fleet).
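How duration, cutoff, and priority interact can be illustrated with a small planner. This is a simplified model with hypothetical names, not Systems Manager's scheduler: it computes the last moment a new task may start and groups tasks into waves, with tasks of equal priority in the same wave.

```python
from datetime import datetime, timedelta
from itertools import groupby


def plan_window(start, duration_hours, cutoff_hours, tasks):
    """Return (last_launch_time, waves) for one maintenance window run."""
    if not 0 <= cutoff_hours < duration_hours:
        raise ValueError("cutoff must be shorter than the window duration")
    # No new task starts once the cutoff period before the window end begins.
    last_launch = start + timedelta(hours=duration_hours - cutoff_hours)
    # Lower number means higher priority; equal priorities run in parallel.
    ordered = sorted(tasks, key=lambda task: task["priority"])
    waves = [
        [task["name"] for task in group]
        for _, group in groupby(ordered, key=lambda task: task["priority"])
    ]
    return last_launch, waves


# Hypothetical task list for a Sunday 02:00 window.
SUNDAY_TASKS = [
    {"name": "rotate-logs", "priority": 2},
    {"name": "patch-fleet", "priority": 1},
    {"name": "snapshot-volumes", "priority": 2},
]
LAST_LAUNCH, WAVES = plan_window(datetime(2026, 1, 4, 2, 0), 4, 1, SUNDAY_TASKS)
```

With a 4-hour window and a 1-hour cutoff starting at 02:00, LAST_LAUNCH is 05:00, and WAVES places patch-fleet first, followed by rotate-logs and snapshot-volumes together.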

Operational use cases

  • Patching: AWS-RunPatchBaseline Run Command on a tag-based target every Sunday 02:00 with concurrency 10% and max errors 5%.
  • Snapshots: AWS-CreateSnapshot Automation across all production EBS volumes weekly.
  • Log rotation / cleanup: a custom Run Command rotating old logs on every instance during the window.
  • Configuration drift checks: a Lambda task that polls Config rules for compliance status and posts a report.

Maintenance Windows vs EventBridge scheduled rules

Both can fire on a cron expression. So when do you pick which?

Need | Pick
Run a single API or workflow on a schedule | EventBridge scheduled rule (or Scheduler)
Run patching, snapshots, or fleet commands on a registered set of EC2/RDS targets with concurrency control | Maintenance Window
Need explicit duration and cutoff semantics (window opens for 4h, no new tasks in last hour) | Maintenance Window
Need ordered execution by priority within a single schedule firing | Maintenance Window
Need cross-account scheduling | EventBridge Scheduler (Maintenance Windows are per-account)
Need tag-based fleet targeting with built-in concurrency limits | Maintenance Window

Conceptually a Maintenance Window is "scheduled rule + a registered target set + concurrency and error limits + ordered tasks by priority". You could rebuild a Maintenance Window with EventBridge + Step Functions + custom logic, but Maintenance Windows are AWS-built for the patching/snapshot/fleet-ops use case. SOA-C02 strongly prefers Maintenance Windows for any "every Sunday 02:00 patch the production fleet with controlled rollout" question. Reference: https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-maintenance.html

SSM Automation Document Anatomy: Parameters, Steps, and Outputs

Whenever the answer involves running multi-step automation, an SSM Automation document is what executes. Understanding its anatomy lets you reason about why a runbook stalls or fails.

Document structure (YAML)

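schemaVersion: "0.3"
description: "Stop EC2 instances by instance ID."
assumeRole: "{{ AutomationAssumeRole }}"
parameters:
  InstanceIds:
    type: StringList
    description: List of instance IDs to stop.
  AutomationAssumeRole:
    type: String
    description: (Optional) ARN of the role that Automation assumes.
    default: ""
mainSteps:
  # Stop the instances.
  - name: stopInstances
    action: aws:changeInstanceState
    inputs:
      InstanceIds: "{{ InstanceIds }}"
      DesiredState: stopped
  # Wait until EC2 reports the stopped state (checks the first instance returned).
  - name: waitForStopped
    action: aws:waitForAwsResourceProperty
    inputs:
      Service: ec2
      Api: DescribeInstances
      InstanceIds: "{{ InstanceIds }}"
      PropertySelector: "$.Reservations[0].Instances[0].State.Name"
      DesiredValues:
        - stopped
  # aws:changeInstanceState returns no outputs, so read the final state explicitly.
  - name: getFinalState
    action: aws:executeAwsApi
    inputs:
      Service: ec2
      Api: DescribeInstances
      InstanceIds: "{{ InstanceIds }}"
    outputs:
      - Name: State
        Selector: "$.Reservations[0].Instances[0].State.Name"
        Type: String
outputs:
  - getFinalState.State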

The key fields:

  • schemaVersion: 0.3 is the schema version for Automation documents; Command documents used by Run Command use 1.2, 2.0, or 2.2.
  • assumeRole: which IAM role the document assumes. For Config auto-remediation, this is the remediation role.
  • parameters: typed inputs (String, StringList, Integer, Boolean, MapList) — passed in at execution.
  • mainSteps: ordered actions. Each step has action (the action plugin), inputs, optional onFailure, onCancel, nextStep, and outputs.
  • outputs: top-level outputs of the document, referencing step outputs.
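To make the {{ ... }} placeholders concrete, here is a minimal sketch of how a parameter reference in a step's inputs could be resolved against the values passed in at execution. It is a simplified model for illustration only, not the service's implementation, and it ignores step-output and global references.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def resolve(value, params):
    """Substitute {{ Name }} placeholders in a step's inputs."""
    if isinstance(value, str):
        whole = PLACEHOLDER.fullmatch(value)
        if whole:
            # The whole value is one placeholder: keep the parameter's type
            # (a StringList stays a list instead of becoming text).
            return params[whole.group(1)]
        return PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), value)
    if isinstance(value, dict):
        return {k: resolve(v, params) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, params) for v in value]
    return value

inputs = {"InstanceIds": "{{ InstanceIds }}", "DesiredState": "stopped"}
print(resolve(inputs, {"InstanceIds": ["i-0abc", "i-0def"]}))
# {'InstanceIds': ['i-0abc', 'i-0def'], 'DesiredState': 'stopped'}
```

A missing parameter raises KeyError here, which mirrors the practical point: a document cannot run a step whose placeholder has no value.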

Common Automation actions

  • aws:executeAwsApi — call any AWS API directly.
  • aws:runCommand — run an SSM Run Command document on EC2 / on-prem.
  • aws:executeScript — run a Python or PowerShell script inline.
  • aws:waitForAwsResourceProperty — pause until a resource attribute reaches a desired value (instance running, snapshot completed).
  • aws:assertAwsResourceProperty — fail the document if the property does not match (precondition check).
  • aws:branch — conditional branching based on previous step outputs.
  • aws:approve — pause for manual approval via SNS notification.
  • aws:changeInstanceState — start, stop, or terminate instances (DesiredState of running, stopped, or terminated).

Parameter passing from Config

When Config invokes a document for remediation, it fills the document's parameters from the remediation configuration. Each parameter is bound to either a static value (fixed text such as the remediation role ARN) or a resource value, which Config resolves at run time to the ID of the non-compliant resource (RESOURCE_ID). You state the binding explicitly in the remediation configuration: for example, "bind the document's BucketName parameter to the resource ID of the non-compliant S3 bucket".
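A minimal sketch of that binding, written as the dictionary you would hand to Config's PutRemediationConfigurations API (for example through boto3's put_remediation_configurations). The rule name, role ARN, and the assumption that the document's bucket parameter is called BucketName are illustrative; confirm the parameter list with describe-document before relying on it.

```python
def remediation_config(rule_name, document, role_arn, resource_param):
    """Build one remediation configuration entry as a plain dict."""
    return {
        "ConfigRuleName": rule_name,
        "TargetType": "SSM_DOCUMENT",
        "TargetId": document,
        "Automatic": True,               # False would mean manual mode
        "MaximumAutomaticAttempts": 5,
        "RetryAttemptSeconds": 60,
        "Parameters": {
            # Fixed value: the same role ARN on every run.
            "AutomationAssumeRole": {"StaticValue": {"Values": [role_arn]}},
            # Resolved by Config to the ID of the non-compliant resource.
            resource_param: {"ResourceValue": {"Value": "RESOURCE_ID"}},
        },
    }

config = remediation_config(
    rule_name="s3-bucket-server-side-encryption-enabled",
    document="AWS-EnableS3BucketEncryption",
    role_arn="arn:aws:iam::111111111111:role/ConfigRemediationRole",  # hypothetical
    resource_param="BucketName",  # assumed parameter name
)
print(sorted(config["Parameters"]))
# ['AutomationAssumeRole', 'BucketName']
```

If the document expects a different parameter name than the one bound to RESOURCE_ID, the execution starts with a missing required parameter, which is the mismapping failure mode.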

Every AWS-managed Automation document has a clear parameter list and IAM requirement section. Before attaching an AWS-... document to a Config remediation, read the doc page (or run aws ssm describe-document --name AWS-EnableS3BucketEncryption) to see exactly which parameters are required, which are optional, and what permissions the assumed role needs. SOA-C02 questions sometimes provide the document name and ask "which parameter is missing?" — the answer is in the doc's parameter list. Reference: https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-documents.html

Combining Config + EventBridge + SSM: Continuous Compliance Pipelines

Direct Config-to-SSM remediation handles the simple case. For complex compliance pipelines you usually want EventBridge in the middle for additional routing, transformation, and fan-out.

The classic three-stage pipeline

  1. Detect: AWS Config evaluates a rule and emits a Config Rules Compliance Change event to EventBridge with complianceType: NON_COMPLIANT.
  2. Route: An EventBridge rule matches the event pattern (e.g., specific rule name, account, region, severity) and routes to one or more targets:
    • SNS topic for human notification.
    • Lambda function that filters or enriches the event.
    • SSM Automation execution that fixes the resource.
    • Step Functions state machine for multi-step orchestration.
  3. Remediate: The SSM Automation document executes the fix and emits a completion event back to EventBridge for audit.

This is more flexible than Config's built-in remediation because the EventBridge rule lets you:

  • Route to different targets based on rule name or resource type (e.g., low-severity rules go to a Slack channel; high-severity rules trigger automatic SSM remediation).
  • Apply input transformation to map the Config event to the SSM document's parameter shape.
  • Fan out to a security ticketing system and a remediation pipeline simultaneously.
  • Combine with cross-account event buses to centralize compliance handling in a security account.
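The routing stage hinges on the event pattern. Below is a sketch of a pattern for one rule's NON_COMPLIANT transitions, plus a deliberately simplified matcher that handles only exact-value lists and nested objects (real EventBridge patterns support more operators). The sample event is trimmed and its resource ID is hypothetical.

```python
PATTERN = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {
        "configRuleName": ["s3-bucket-public-read-prohibited"],
        "newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]},
    },
}

def matches(pattern, event):
    """Return True if every pattern key matches the event (exact values only)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested object: recurse into the same key of the event.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            # A list in the pattern means "any of these values".
            return False
    return True

event = {
    "source": "aws.config",
    "detail-type": "Config Rules Compliance Change",
    "detail": {
        "configRuleName": "s3-bucket-public-read-prohibited",
        "resourceId": "example-bucket",  # hypothetical
        "newEvaluationResult": {"complianceType": "NON_COMPLIANT"},
    },
}
print(matches(PATTERN, event))  # True
```

Fields the pattern does not mention (resourceId here) are ignored, so a narrow pattern still matches richer events.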

The role of CloudTrail in the pipeline

CloudTrail does not trigger remediation directly — Config does. But CloudTrail provides the audit trail of who changed the resource and when, which is what the on-call needs after remediation runs. A complete pipeline often pairs Config (continuous configuration) + CloudTrail (API audit) so that "the bucket was made public, fixed by remediation, and the CloudTrail log shows user dev-jane made the change at 14:32" is one queryable timeline.

Example: continuous compliance for a multi-account org

Architecture for a security team operating an AWS Organization:

  1. Config aggregator in the security account collects compliance status from every member account.
  2. Conformance pack deployed via Organizations applies the same 25 rules everywhere.
  3. EventBridge rule in each member account on Config Rules Compliance Change → cross-account event bus in the security account.
  4. Security account EventBridge rule routes by severity:
    • High severity (public S3, missing encryption) → SSM Automation cross-account remediation document with assumed role into the source account.
    • Medium severity (missing tags) → Slack channel via SNS-to-webhook Lambda.
    • Low severity (informational) → CloudWatch Logs for archival.
  5. Audit dashboard built on Config aggregator + CloudWatch metric streams + Athena queries on CloudTrail.
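The severity routing in step 4 can be sketched as a lookup. The sketch assumes severity comes from a team-maintained table keyed by Config rule name; the table contents and target labels are hypothetical, and in a real deployment each severity tier would typically be its own EventBridge rule listing the rule names it covers.

```python
# Hypothetical severity assignments maintained by the security team.
SEVERITY_BY_RULE = {
    "s3-bucket-public-read-prohibited": "high",
    "encrypted-volumes": "high",
    "required-tags": "medium",
}

# Hypothetical labels for the targets described in the architecture.
TARGET_BY_SEVERITY = {
    "high": "ssm-automation-remediation",
    "medium": "sns-slack-webhook",
    "low": "cloudwatch-logs-archive",
}

def route(event):
    """Pick a target label for a Config compliance-change event."""
    rule = event["detail"]["configRuleName"]
    severity = SEVERITY_BY_RULE.get(rule, "low")  # unknown rules are archived
    return TARGET_BY_SEVERITY[severity]

print(route({"detail": {"configRuleName": "required-tags"}}))
# sns-slack-webhook
```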

SOA-C02 explicitly tests the three-service automation chain. Memorize the order: Config detects, EventBridge routes, SSM Automation remediates. Each link has its own configuration: Config rule trigger type, EventBridge event pattern, SSM document and remediation role. When a question asks "the chain isn't working", the diagnosis path is: (1) is Config evaluating? (2) is EventBridge matching the event pattern? (3) is SSM running, and if so, which step is failing? Reference: https://docs.aws.amazon.com/config/latest/developerguide/remediation.html

Automating EBS Snapshots: DLM vs EventBridge → Lambda vs CloudWatch Events Legacy

EBS snapshot scheduling is a topic-favourite scenario because there are several valid patterns and the exam tests the operational trade-offs.

Amazon Data Lifecycle Manager (DLM) — the SOA-preferred answer

DLM is purpose-built for scheduling EBS snapshots and AMI creation. You define a policy (target by tag, schedule, retention rules, cross-region copy, fast snapshot restore) and DLM handles the rest. No Lambda code, no IAM gymnastics beyond the DLM service role, no missed runs after the cron-on-EC2 host gets replaced.

Use DLM when:

  • The work is "snapshot tagged volumes on a schedule with retention".
  • Retention is by count or age (keep last 7 daily, 4 weekly, 12 monthly).
  • You want cross-region or cross-account snapshot copy.
  • You want fast snapshot restore enabled per AZ.

EventBridge scheduled rule → SSM Automation AWS-CreateSnapshot

Use this pattern when the snapshot needs to be part of a larger orchestrated runbook — e.g., "stop the application, snapshot the EBS, run a database backup, restart the application". The Automation document chains the steps; the EventBridge rule fires the chain.

EventBridge scheduled rule → Lambda → EC2 snapshot API

Use this pattern when DLM does not fit (custom retention logic, complex tag-driven naming, integration with a CMDB) and an SSM Automation document is too rigid for the logic.

Why not cron-on-EC2

A cron job inside an EC2 instance stops running the moment that instance is stopped, terminated, or replaced by Auto Scaling, and it misses runs while the instance reboots for patching. There is also no audit trail, no IAM-bounded execution, no retry, and no fleet-level visibility. SOA-C02 rejects cron-on-EC2 in every scheduled-task scenario.

Default to DLM for EBS and AMI snapshots. Use EventBridge → SSM Automation when snapshots are one step of a larger orchestrated runbook. Use EventBridge → Lambda when the logic is too custom for DLM but the orchestration is too simple for SSM. Never use cron-on-EC2. Reference: https://docs.aws.amazon.com/ebs/latest/userguide/snapshot-lifecycle.html

Cross-Account Scheduling: EventBridge Scheduler and Cross-Account Event Buses

Production AWS estates almost always span multiple accounts. Scheduling and remediation must work across the boundary.

Cross-account targets with EventBridge Scheduler

EventBridge Scheduler can target an AWS API in a different account by assuming a cross-account role. The schedule's execution role in the source account is granted sts:AssumeRole on a target-account role; the target-account role grants the actual API permissions and trusts the source-account role. The Scheduler invokes the target API as the assumed role.

This pattern is useful for a centralized scheduling account that fires snapshots, audits, or cleanup jobs into many member accounts.

Cross-account EventBridge event buses

Custom event buses can be granted resource policies that allow other accounts to PutEvents. A common architecture: every member account's local Config emits a Compliance Change event to its local default bus; an EventBridge rule on the default bus forwards the event to the security account's central event bus; rules on the central bus route to remediation and notification targets.
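A sketch of the resource policy on the central bus, assuming the member accounts are listed individually. The account IDs and bus ARN are hypothetical.

```python
import json

def bus_policy(bus_arn, member_account_ids):
    """Resource policy letting the listed accounts send events to the bus."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowMemberAccountsPutEvents",
            "Effect": "Allow",
            "Principal": {"AWS": [f"arn:aws:iam::{a}:root" for a in member_account_ids]},
            "Action": "events:PutEvents",
            "Resource": bus_arn,
        }],
    }

policy = bus_policy(
    "arn:aws:events:us-east-1:999999999999:event-bus/security-central",  # hypothetical
    ["111111111111", "222222222222"],                                    # hypothetical
)
print(json.dumps(policy, indent=2))
```

For a large organization, a single statement conditioned on aws:PrincipalOrgID is usually easier to maintain than a list of account principals.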

Maintenance Window cross-account

Maintenance Windows themselves are per-account, but tasks within a window can call cross-account APIs by assuming roles. For fleet-wide patching across accounts, the operational pattern is one Maintenance Window per account, deployed via CloudFormation StackSets, with a consistent schedule and target tag.

Every cross-account scheduling answer ultimately reduces to "the source account assumes a role in the target account". On SOA-C02, when a question asks "centralize scheduling for 50 accounts", the answer is some combination of EventBridge Scheduler with cross-account roles, EventBridge cross-account event buses, or StackSets-deployed schedules — never custom cron servers running in a "management" account. Reference: https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-cross-account.html

Operational Runbooks: Converting Manual Procedures to SSM Automation Documents

A SysOps team's wiki of manual runbooks ("how to recover the order-processing queue when it backs up") is the natural input to SSM Automation document migration. The conversion is mostly mechanical.

Step-by-step conversion

  1. Identify the steps in the manual runbook and order them. Each step becomes an Automation mainSteps entry.
  2. Identify parameters — anything the operator types in (instance ID, queue name, threshold). These become document parameters.
  3. Identify checkpoints — places where the operator validates state before continuing. These become aws:assertAwsResourceProperty or aws:waitForAwsResourceProperty actions.
  4. Identify approvals — places where the operator waits for a human sign-off. These become aws:approve actions.
  5. Identify error paths — what the runbook says to do if a step fails. These become onFailure clauses (Abort, Continue, step:<name>).
  6. Identify outputs — what the runbook reports at the end (snapshot ID, ticket number). These become document outputs.
  7. Test in non-production with a small parameter set, then graduate to production.
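As an illustration of the mapping, here is a sketch that assembles a small document from a hypothetical manual runbook ("confirm the instance is running, get sign-off, stop it"). The parameter names InstanceId and Approver are assumptions for the example; the output is JSON, which Automation accepts alongside YAML.

```python
import json

def build_document(description, parameters, steps):
    """Assemble an Automation document (schema 0.3) as a plain dict."""
    return {
        "schemaVersion": "0.3",
        "description": description,
        "assumeRole": "{{ AutomationAssumeRole }}",
        "parameters": {**parameters,
                       "AutomationAssumeRole": {"type": "String", "default": ""}},
        "mainSteps": steps,
    }

steps = [
    {   # Checkpoint: the operator used to confirm the instance was running.
        "name": "assertRunning",
        "action": "aws:assertAwsResourceProperty",
        "onFailure": "Abort",
        "inputs": {"Service": "ec2", "Api": "DescribeInstances",
                   "InstanceIds": ["{{ InstanceId }}"],
                   "PropertySelector": "$.Reservations[0].Instances[0].State.Name",
                   "DesiredValues": ["running"]},
    },
    {   # Approval: the human sign-off becomes an aws:approve step.
        "name": "approveStop",
        "action": "aws:approve",
        "onFailure": "Abort",
        "inputs": {"Approvers": ["{{ Approver }}"], "MinRequiredApprovals": 1},
    },
    {   # The action the operator used to perform by hand.
        "name": "stopInstance",
        "action": "aws:changeInstanceState",
        "inputs": {"InstanceIds": ["{{ InstanceId }}"], "DesiredState": "stopped"},
    },
]

document = build_document(
    "Stop an instance after approval.",
    {"InstanceId": {"type": "String"}, "Approver": {"type": "String"}},
    steps,
)
print(json.dumps(document, indent=2))
```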

Why convert at all

  • Audit: every execution is logged in CloudTrail and the SSM Automation history. Manual runbook execution leaves only chat-channel evidence.
  • Repeatability: the document executes the same way every time; humans skip steps under stress.
  • IAM-bounded: the assumed role enforces least privilege; manual operators may have broader credentials.
  • Schedulable: a document can be invoked by EventBridge, Maintenance Windows, or Config remediation.
  • Composable: documents can call other documents, allowing a "library" of building-block runbooks.

"Convert the team's manual operational runbook to an SSM Automation document" is a phrase that appears in SOA-C02 scenarios. The right answer is always to translate manual steps into the document schema — not to write a Lambda function that imitates the steps, not to encode the steps in a CloudFormation template, not to leave them as a wiki. The SSM Automation document is the SysOps standard. Reference: https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-documents.html

Scenario Pattern: S3 Bucket With Public Access Detected — Auto-Block Pipeline

A canonical SOA-C02 scenario combining Config, EventBridge, and SSM. The runbook:

  1. Detection layer: enable AWS Config in the account; turn on the managed rule s3-bucket-public-read-prohibited (configuration-change trigger). Also turn on s3-bucket-public-write-prohibited for write-side coverage.
  2. Routing layer: choose between (a) Config built-in remediation directly invoking SSM, or (b) EventBridge rule on Config Rules Compliance Change events. Pick (b) when you also want to notify SecOps via SNS and create a Jira ticket via Lambda in parallel with remediation.
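  3. Remediation layer: SSM Automation document. For this case, use a custom document or an AWS-managed runbook such as AWS-DisableS3BucketPublicReadWrite or AWSConfigRemediation-ConfigureS3BucketPublicAccessBlock, depending on whether you want to remove public ACL access or turn on the bucket-level public access block; removing a specific bucket policy statement needs a custom document. Parameters: the document's bucket-name parameter bound to the Config resource ID, plus KmsKeyId if encryption hardening is part of the same custom step.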
  4. IAM layer: remediation role trusts ssm.amazonaws.com, has s3:PutBucketPublicAccessBlock, s3:GetBucketPublicAccessBlock, s3:PutBucketPolicy, and audit-logging permissions for the document's aws:executeScript step that records what was changed.
  5. Audit layer: CloudTrail records both the original API call that made the bucket public (with the actor's IAM identity) and the remediation API call (with the SSM remediation role's identity). An EventBridge rule on SSM Automation execution status-change events writes a record to a security S3 bucket, through a Firehose or Lambda target, for SOX/PCI evidence.
  6. Re-evaluation: after remediation, Config re-evaluates the rule. If it transitions back to COMPLIANT, the chain logs success. If it stays NON_COMPLIANT (e.g., the policy was re-added by an external tool), the retry kicks in or escalation paging fires.

The whole pipeline is declarative, lives in a CloudFormation template, and deploys identically to every account via StackSets.

Scenario Pattern: EBS Snapshots Aged Greater Than 30 Days — Sunday 02:00 Cleanup

Another canonical SOA-C02 scenario, this time pure scheduled task. The runbook:

  1. Schedule: EventBridge scheduled rule with cron(0 2 ? * SUN *) — every Sunday at 02:00 UTC.
  2. Target: SSM Automation document, custom — MyOrg-DeleteOldSnapshots. Parameters: MaxAgeDays=30, TagFilter=Environment=Production, DryRun=false.
  3. Document logic: aws:executeScript Python step calling ec2:DescribeSnapshots with tag filter, computing age from StartTime, calling ec2:DeleteSnapshot for each snapshot older than MaxAgeDays, recording the IDs to an S3 audit log.
  4. Role: Automation assume role with ec2:DescribeSnapshots, ec2:DeleteSnapshot, s3:PutObject on the audit bucket, sns:Publish on the summary topic, and kms:GenerateDataKey if the audit bucket is encrypted with a customer managed KMS key.
  5. Notification: document's final step posts a summary to an SNS topic — "Deleted 47 snapshots, total size 1.2 TB, full list in s3://audit-bucket/2026/04/26/snapshots.json".
  6. Maintenance Window alternative: same logic wrapped in a Maintenance Window if the work is part of a broader Sunday-night ops batch; otherwise EventBridge scheduled rule + Automation is simpler.
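The age calculation at the heart of step 3 is small enough to sketch. The function below assumes each snapshot is a dict with SnapshotId and a timezone-aware StartTime, which is the shape boto3's describe_snapshots returns; the snapshot IDs are hypothetical and no AWS call is made.

```python
from datetime import datetime, timedelta, timezone

def snapshots_to_delete(snapshots, max_age_days, now=None):
    """Return IDs of snapshots whose StartTime is older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]

# Hypothetical data; the run time matches the Sunday 02:00 UTC schedule.
run_time = datetime(2026, 4, 26, 2, 0, tzinfo=timezone.utc)
snapshots = [
    {"SnapshotId": "snap-old", "StartTime": datetime(2026, 3, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new", "StartTime": datetime(2026, 4, 20, tzinfo=timezone.utc)},
]
print(snapshots_to_delete(snapshots, 30, now=run_time))
# ['snap-old']
```

Keeping the selection separate from the delete call makes the DryRun parameter trivial: log the returned IDs and skip ec2:DeleteSnapshot.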

This pattern explicitly avoids: a cron job on a "utility EC2 instance" (fragile), a Lambda function that re-implements snapshot age math without an audit trail (low traceability), and a manual quarterly cleanup (slow and error-prone).

Common Trap: Config Auto-Remediation Silent Failure Modes

Config auto-remediation has several silent-failure modes that all show up on SOA-C02. The diagnostic checklist:

  1. Mode is manual, not automatic — the rule fires, the remediation appears in the console, but nothing happens until a human clicks "Remediate". Fix: switch the remediation configuration's mode to automatic.
  2. Remediation role trust policy missing ssm.amazonaws.com — SSM cannot assume the role. Fix: add Service: ssm.amazonaws.com to the trust policy's principal block.
  3. Remediation role lacks API permissions — the role assumes successfully but the document fails at a specific step. Fix: read the SSM Automation execution failure reason and add the missing permission.
  4. Document parameters mismapped — Config passes ResourceId but the document expects BucketName. Fix: configure the parameter mapping in the remediation configuration.
  5. Retry exhausted — the document failed N times and Config gave up. Fix: investigate the failure, then either fix the underlying issue or tune the retry settings (MaximumAutomaticAttempts accepts 1–25 and defaults to 5; RetryAttemptSeconds defaults to 60).
  6. Resource type unsupported by the document — the document only handles a subset of the Config rule's evaluated resources. Fix: split into multiple rules or write a custom document that handles all types.
  7. Region mismatch — Config rule is in us-east-1 but the document is registered in us-west-2. Documents are regional. Fix: register the document in the same region as the rule.
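Item 2 on the checklist is easy to verify mechanically. A minimal sketch that inspects a role's trust policy document (for example, the AssumeRolePolicyDocument returned by IAM's GetRole) follows; it ignores Condition blocks, so treat a True result as "the principal is present", not as a full authorization check.

```python
def trusts_service(trust_policy, service="ssm.amazonaws.com"):
    """Return True if an Allow statement lets `service` call sts:AssumeRole."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):          # a single statement may be a bare object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        if "sts:AssumeRole" not in actions:
            continue
        principal = stmt.get("Principal", {})
        services = principal.get("Service", []) if isinstance(principal, dict) else []
        services = [services] if isinstance(services, str) else services
        if service in services:
            return True
    return False

broken = {"Version": "2012-10-17", "Statement": [{
    "Effect": "Allow", "Principal": {"Service": "ec2.amazonaws.com"},
    "Action": "sts:AssumeRole"}]}
fixed = {"Version": "2012-10-17", "Statement": [{
    "Effect": "Allow", "Principal": {"Service": ["ec2.amazonaws.com", "ssm.amazonaws.com"]},
    "Action": "sts:AssumeRole"}]}
print(trusts_service(broken), trusts_service(fixed))
# False True
```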

Common Trap: EventBridge Cron Field Count and Day-of-Week / Day-of-Month Wildcards

The EventBridge cron syntax tricks every candidate at least once:

  • 6 fields, not 5: cron(0 2 * * SUN) is invalid; cron(0 2 ? * SUN *) is valid.
  • ? is required: one of day-of-month or day-of-week must be ?. If you give a value or * in one of the two fields, the other must be ?; putting * in both is rejected.
  • All UTC: cron(0 9 ? * MON-FRI *) fires at 09:00 UTC, not 09:00 Beijing time. For time zones, use EventBridge Scheduler and set its time zone (the ScheduleExpressionTimezone parameter) to Asia/Shanghai.
  • Day-of-week numbering: 1=SUN through 7=SAT. Some Unix dialects use 0=SUN; do not assume.
  • Write names in uppercase: use MON and JAN as shown in the AWS documentation rather than Mon or mon.

A SysOps engineer migrating a 0 2 * * 0 Linux cron entry to EventBridge will fail validation because EventBridge needs six fields with the year. The correct translation is cron(0 2 ? * SUN *). SOA-C02 routinely offers the wrong 5-field expression as a tempting but invalid distractor. Reference: https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-cron-expressions.html
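The translation can be sketched as a small function. It is a minimal converter under stated assumptions: step values (such as */5) are rejected rather than translated, expressions that set both day fields are rejected because EventBridge cannot express them in one rule, and the hour is not shifted, so the caller still has to convert local time to UTC.

```python
import re

DAY_NAMES = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]

def unix_to_eventbridge(expr):
    """Convert a 5-field Unix cron entry to a 6-field EventBridge expression."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected a 5-field Unix cron expression")
    if any("/" in f for f in fields):
        raise ValueError("step values are not handled by this sketch")
    minute, hour, dom, month, dow = fields
    if dow == "*":
        dow = "?"                      # day-of-month (even "*") carries the schedule
    elif dom == "*":
        dom = "?"
        # Unix counts 0 (or 7) as Sunday; names avoid the 1=SUN offset entirely.
        dow = re.sub(r"\d+", lambda m: DAY_NAMES[int(m.group()) % 7], dow.upper())
    else:
        raise ValueError("day-of-month and day-of-week both set; split into two rules")
    return f"cron({minute} {hour} {dom} {month} {dow} *)"

print(unix_to_eventbridge("0 2 * * 0"))    # cron(0 2 ? * SUN *)
print(unix_to_eventbridge("0 9 * * 1-5"))  # cron(0 9 ? * MON-FRI *)
print(unix_to_eventbridge("30 1 15 * *"))  # cron(30 1 15 * ? *)
```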

Common Trap: Auto Scaling Scheduled Actions vs EventBridge Scheduled Rules

Auto Scaling has its own native scheduled action API — aws autoscaling put-scheduled-update-group-action — separate from EventBridge schedules. Scheduled actions are evaluated by the Auto Scaling service itself and can adjust min, max, and desired capacity at a specific time.

When to use which:

  • Auto Scaling scheduled action — purely capacity changes on a schedule (scale to 50 instances every weekday at 08:00, back to 10 every weekday at 20:00). Native, no EventBridge dependency, simpler.
  • EventBridge scheduled rule — anything other than capacity changes, or when the schedule needs to invoke multiple actions (capacity change plus notification plus dashboard refresh).

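Both accept cron expressions, but in different dialects: an Auto Scaling scheduled action takes the standard five-field Unix cron format in its recurrence setting (with an optional time zone), while EventBridge requires the six-field form. SOA-C02 sometimes asks "schedule the Auto Scaling group to scale up before peak hours"; the AWS-recommended answer is the native Auto Scaling scheduled action when the only goal is capacity.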

Common Trap: Periodic Rule Frequency vs Real-Time Compliance Expectations

A team configures a periodic Config rule with 24-hour frequency and assumes "compliance is checked daily, that's enough". When a developer creates a non-compliant resource at 09:00, the rule does not flag it until the next periodic run — possibly 23 hours later. The fix is either to switch to configuration-change trigger (if the rule supports it) or to layer a configuration-change rule alongside the periodic one for fast detection plus periodic re-validation.
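The 23-hour figure is simple arithmetic, sketched below. The sketch assumes the previous periodic evaluation ran at 08:00, one hour before the resource was created; the dates are arbitrary.

```python
from datetime import datetime, timedelta

def next_periodic_detection(created_at, last_run, frequency_hours):
    """Return the first periodic evaluation at or after the resource's creation."""
    period = timedelta(hours=frequency_hours)
    next_run = last_run
    while next_run < created_at:
        next_run += period
    return next_run

created = datetime(2026, 4, 26, 9, 0)    # resource created at 09:00
last_run = datetime(2026, 4, 26, 8, 0)   # assumed time of the previous evaluation
detected = next_periodic_detection(created, last_run, 24)
print(detected, detected - created)
# 2026-04-27 08:00:00 23:00:00
```

With a configuration-change trigger the same resource would be evaluated shortly after its configuration item is recorded, which is why the two trigger types are often layered.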

SOA-C02 vs SAA-C03: Wiring vs Selection

SAA-C03 and SOA-C02 both reference Config, EventBridge, and SSM, but the lenses differ.

Question style | SAA-C03 lens | SOA-C02 lens
Detecting non-compliant resources | "Which AWS service detects non-compliant configurations?" | "The Config rule fires but auto-remediation does not run — diagnose."
Scheduling | "Which AWS service schedules tasks without managing infrastructure?" | "Convert the Linux cron entry 0 2 * * 0 to a working EventBridge cron expression."
Remediation | "Which AWS service remediates non-compliant resources?" | "Configure the remediation IAM role's trust policy and parameter mapping."
Patching | "Which service patches EC2 instances on a schedule?" | "Build a Maintenance Window with concurrency 10% and max errors 5%."
Scale | "Which service handles millions of schedules?" | "Migrate 200,000 per-tenant cron jobs from a custom server to EventBridge Scheduler with schedule groups."
Cross-account | "How do you centralize compliance across accounts?" | "Configure the cross-account event bus resource policy and the SSM cross-account remediation role."
Conformance | "Which service applies CIS controls?" | "Deploy the CIS conformance pack via Organizations and tune the rule parameters per region."

The SAA candidate selects the service; the SOA candidate writes the cron expression, configures the trust policy, and diagnoses why the chain stalled at 02:00.

Exam Signal: How to Recognize a Domain 3.2 Scheduled-Task or Config-Automation Question

Domain 3.2 questions on SOA-C02 follow predictable shapes. Recognize them and your time on each question drops dramatically.

  • "Convert this cron expression" — the answer almost always involves the 6-field EventBridge syntax with ? wildcard. Watch for 5-field distractors.
  • "Schedule a one-time task in the future" — the answer is EventBridge Scheduler with at(...). EventBridge scheduled rules do not support one-time future invocation cleanly.
  • "Schedule with time zone awareness" — the answer is EventBridge Scheduler with the TimeZone setting.
  • "Schedule millions of per-tenant jobs" — the answer is EventBridge Scheduler with schedule groups.
  • "Auto-remediation does not run" — the answer almost always involves the remediation IAM role's trust policy (ssm.amazonaws.com) or the remediation mode (manual vs automatic).
  • "Detect non-compliance immediately" — the answer is a configuration-change Config rule, not periodic.
  • "Audit all in-scope resources daily" — the answer is a periodic Config rule with 24-hour frequency.
  • "Patch the production fleet every Sunday at 02:00 with controlled rollout" — the answer is a Maintenance Window with AWS-RunPatchBaseline, concurrency, and max errors.
  • "Snapshot tagged EBS volumes weekly with retention" — the answer is DLM, not custom Lambda.
  • "Centralize compliance across the organization" — the answer is a conformance pack via Organizations plus a central Config aggregator.
  • "Convert a manual runbook to automation" — the answer is SSM Automation document, with parameters, sequential steps, and an assumed role.
  • "Run a fleet command after office hours" — the answer is a Maintenance Window with a Run Command task, not a cron-on-EC2.

SOA-C02 Domain 3.2 expects you to wire Config + EventBridge + SSM together correctly, not just to name the services. Memorize the chain (detect → route → remediate), the IAM trust policies (ssm.amazonaws.com for the remediation role), the cron syntax (6 fields, ? wildcard, UTC), and the Maintenance Window structure (schedule + targets + tasks + priority + concurrency + max errors). With those four memorized, most Domain 3.2 questions become reading comprehension. Reference: https://docs.aws.amazon.com/config/latest/developerguide/remediation.html

Decision Matrix — Pick the Right Automation Construct

Use this lookup during the exam.

Operational goal | Primary construct | Notes
Recurring schedule, simple, ≤ 100 schedules in one Region | EventBridge scheduled rule | Cron in UTC, up to 5 targets per rule.
Recurring schedule with time zone | EventBridge Scheduler | Set ScheduleExpressionTimezone to Asia/Tokyo etc.
One-time future invocation | EventBridge Scheduler at(...) | EventBridge rule cannot do this cleanly.
Millions of per-tenant schedules | EventBridge Scheduler with schedule groups | Built for SaaS scale.
Detect non-compliant configuration | AWS Config rule | Choose configuration-change vs periodic by use case.
Auto-remediate non-compliant resource | Config rule + SSM Automation document with remediation role | Mode = automatic; role trusts ssm.amazonaws.com.
Apply many rules across accounts | Conformance pack via Organizations | Single deploy, central aggregation.
Recurring fleet ops (patch, snapshot, run command) | Systems Manager Maintenance Window | Targets + tasks + priority + concurrency.
Snapshot EBS / AMI on schedule with retention | Data Lifecycle Manager (DLM) | Native, no Lambda.
Multi-step orchestrated workflow | Step Functions invoked by schedule | For long-running or branching flows.
Custom logic that does not fit DLM or SSM | EventBridge schedule → Lambda | Always preferred over cron-on-EC2.
Cross-account scheduled tasks | EventBridge Scheduler with cross-account roles | Or StackSets-deployed per-account schedules.
Cross-account compliance routing | Cross-account EventBridge event bus | Resource policy on bus + member-account rule.
Manage all schedules and rules as IaC | CloudFormation AWS::Events::Rule, AWS::Scheduler::Schedule | Versioned in Git.
Capacity changes on Auto Scaling group | Auto Scaling scheduled action | Native, simpler than EventBridge for capacity-only.
React to AWS Health events | EventBridge rule on aws.health | Plus SNS or SSM Automation as target.

Common Traps Recap — Scheduled Tasks, Config Auto-Remediation, and Process Automation

Every SOA-C02 attempt will see two or three of these distractors.

Trap 1: 5-field Unix cron in EventBridge

EventBridge cron has 6 fields. cron(0 2 * * SUN) is invalid; cron(0 2 ? * SUN *) is valid.

Trap 2: Both day-of-month and day-of-week as *

Exactly one must be ?. EventBridge enforces the rule at validation time.

Trap 3: Cron evaluated in local time

EventBridge scheduled rules always evaluate cron in UTC. For other time zones, use EventBridge Scheduler and set ScheduleExpressionTimezone.

Trap 4: EventBridge Scheduler with multiple targets per schedule

Scheduler is one target per schedule. Create more schedules in the same group for multiple targets.

Trap 5: Config remediation in manual mode when automation was expected

The rule fires, the remediation queue fills, nothing happens until a human clicks. Switch to automatic mode.

Trap 6: Remediation role trust policy missing ssm.amazonaws.com

The role exists, has the right permissions, but SSM cannot assume it. Fix the trust policy.

Trap 7: Periodic Config rule for fast detection

Periodic rules run every 1, 3, 6, 12, or 24 hours. For real-time detection use configuration-change trigger.

Trap 8: Cron-on-EC2 for scheduled work

Fragile, opaque, and not exam-correct. Use EventBridge schedule + SSM/Lambda/Step Functions instead.

Trap 9: DLM ignored in favor of custom Lambda for snapshots

DLM is the SOA-preferred answer for EBS and AMI snapshots with tag-based targeting and retention.

Trap 10: Auto Scaling scheduled action vs EventBridge schedule

For pure capacity changes, the native Auto Scaling scheduled action is simpler than EventBridge.

Trap 11: Maintenance Window task without max-errors

A patching task on 1,000 instances with no max-errors limit can wreck the whole fleet if a bad patch is approved. Set max errors = 5% or similar.

Trap 12: Periodic rule with wrong frequency

The frequency choices are 1, 3, 6, 12, or 24 hours. There is no "every 30 minutes" periodic option — for that, use configuration-change trigger or a separate scheduled Lambda.

FAQ — Scheduled Tasks, Config Auto-Remediation, and Process Automation

Q1: When do I pick EventBridge Scheduler over an EventBridge scheduled rule?

Pick EventBridge Scheduler when (a) you need a one-time future invocation with at(...), (b) you need time zone awareness, (c) you have hundreds of thousands or millions of schedules (SaaS per-tenant), (d) you need flexible time windows for jitter, or (e) you want universal targets, which call API operations across 270+ AWS services without writing a Lambda function. Pick EventBridge scheduled rules when you have under a few hundred schedules, want multiple targets per rule (up to 5), are already organizing rules on a custom event bus, or are migrating from CloudWatch Events with the same syntax. The clean rule of thumb: Scheduler is the modern default for new projects; scheduled rules are the legacy-friendly choice for small estates already living on EventBridge buses.

Q2: How do EventBridge cron expressions differ from Unix cron?

Three structural differences. First, EventBridge cron has six fields (minutes, hours, day-of-month, month, day-of-week, year); Unix cron has five and no year field. Second, EventBridge requires the ? wildcard in either day-of-month or day-of-week (you cannot have * in both); standard Unix cron has no ?. Third, EventBridge cron is always evaluated in UTC for scheduled rules (Scheduler accepts a time zone through ScheduleExpressionTimezone); Unix cron uses the host's local time. Day-of-week numbering also differs: EventBridge uses 1=SUN to 7=SAT, while Unix cron uses 0=SUN to 6=SAT. The most common conversion error is pasting a Linux cron entry as-is; always rewrite it to the 6-field form with ? and convert the time to UTC explicitly.

Q3: Why is my Config auto-remediation not running, and how do I diagnose it?

The diagnosis path is sequential. Step 1: confirm the rule is firing — check the Config rules dashboard, look for NON_COMPLIANT resources. Step 2: confirm the remediation configuration is in automatic mode, not manual. Step 3: check the remediation IAM role's trust policy — it must allow ssm.amazonaws.com to assume it. Step 4: open the SSM Automation execution history for the document — find the failed execution, drill into the failing step, read the AccessDenied or parameter-mismatch reason. Step 5: check the parameter mapping in the remediation configuration — does the Config-detected ResourceId map to the document's expected parameter name? Step 6: confirm the document is registered in the same region as the Config rule. About 80 percent of real-world failures are diagnosed at step 3 (trust policy) or step 4 (missing API permission deep in the document).

Q4: How do I choose between a configuration-change Config rule and a periodic Config rule?

Use configuration-change when (a) the rule evaluates a single resource against a property that is visible at change time (encryption flag, public ACL, tag presence) and (b) you want immediate detection (within minutes). Use periodic when (a) the rule needs to aggregate across many resources or check global state (e.g., "the account has at least one CloudTrail trail enabled"), (b) the check is expensive and you want to bound how often it runs, or (c) you want to catch slow drift that does not generate a configuration-change event. Many AWS-managed rules support both triggers — pick the right one for the use case, or layer both for fast detection plus daily completeness.

Q5: How does a Maintenance Window's priority and concurrency work?

A Maintenance Window has a list of registered tasks, each with a priority (lower number = higher priority) and max concurrency and max errors settings. When the window opens, tasks are grouped by priority: tasks with the same priority run in parallel, and tasks with lower priority numbers run before those with higher numbers. Within a single task, max concurrency caps how many targets are processed at once (10 instances or 25% of the fleet). Max errors is the failure threshold: if more targets fail than this limit, the task halts. Example: priority 1 task patches 500 instances with concurrency 10% and max errors 5%; priority 2 task creates snapshots; priority 3 task rotates logs. The window opens at 02:00 and runs for 4 hours with a 1-hour cutoff (no new task launches in the last hour); priority 1 finishes by 03:00, then priority 2 runs, and priority 3 starts only after priority 2 completes. To run snapshots and log rotation in parallel, give both tasks the same priority.

Q6: How does EventBridge Scheduler scale beyond EventBridge scheduled rules?

EventBridge scheduled rules have a per-Region quota on the number of rules and were not designed for the per-tenant SaaS pattern. EventBridge Scheduler was purpose-built to scale to millions of schedules per account, organized into schedule groups (organizational containers, default 500 per account, raisable). Each schedule has its own rate or cron expression, time zone, target, and retry settings. Schedules can be created and deleted programmatically at high rate via the CreateSchedule and DeleteSchedule APIs, subject to the service's per-API request quotas. For a SOA-C02 question that says "every customer in our SaaS gets their own daily report at their local 09:00", the answer is EventBridge Scheduler with one schedule per customer in a customer-reports schedule group, each with ScheduleExpressionTimezone set to the customer's local time zone.
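The per-customer pattern can be sketched as a function that builds the request for one schedule. The field names follow the CreateSchedule API; the function name, schedule naming scheme, ARNs, and payload shape are hypothetical, and the customer-reports schedule group is assumed to exist already.

```python
import json


def build_report_schedule(customer_id, timezone, target_arn, role_arn):
    """Build CreateSchedule parameters for one customer's 09:00 local-time report."""
    return {
        "Name": f"daily-report-{customer_id}",
        "GroupName": "customer-reports",
        # Six fields: minutes hours day-of-month month day-of-week year.
        "ScheduleExpression": "cron(0 9 * * ? *)",
        "ScheduleExpressionTimezone": timezone,
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": target_arn,    # e.g. a Lambda function or SQS queue ARN
            "RoleArn": role_arn,  # execution role that Scheduler assumes
            "Input": json.dumps({"customerId": customer_id}),
        },
    }


# Placeholder ARNs for illustration only.
params = build_report_schedule(
    "acme",
    "Asia/Taipei",
    "arn:aws:lambda:us-east-1:111122223333:function:send-report",
    "arn:aws:iam::111122223333:role/scheduler-invoke-report",
)
print(json.dumps(params, indent=2))

# With boto3 installed and credentials configured, the call would be:
#   boto3.client("scheduler").create_schedule(**params)
```

Keeping the time zone on the schedule itself means the same cron expression serves every customer; only ScheduleExpressionTimezone and the payload differ.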

Q7: What is the difference between Config built-in remediation and an EventBridge → SSM pipeline?

Config built-in remediation binds an SSM Automation document directly to a Config rule. When the rule reports NON_COMPLIANT, Config invokes the document automatically (or, in manual mode, leaves the remediation for an operator to start). It is simpler and less flexible: one rule, one document, one parameter mapping. An EventBridge → SSM pipeline routes the Config Rules Compliance Change event through an EventBridge rule, which can then fan out to multiple targets (SNS for notification, Lambda for enrichment, SSM for remediation, Step Functions for multi-step orchestration). It is more flexible but has more moving parts. Use built-in remediation for simple "rule fires → fix it" cases; use EventBridge → SSM when you need parallel notification + remediation, severity-based routing, or cross-account remediation.
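The "one rule, one document, one parameter mapping" shape of built-in remediation is easiest to see as the request body for PutRemediationConfigurations. This is a sketch under assumptions: the helper name is hypothetical, the rule, document, and parameter names are used as illustrations, and the retry values are arbitrary choices.

```python
def build_remediation_config(rule_name, document_name, role_arn, resource_param):
    """Build one entry for the RemediationConfigurations list.

    `resource_param` is the document parameter that should receive the ID of
    the non-compliant resource; it must match the document's own parameter name.
    """
    return {
        "ConfigRuleName": rule_name,
        "TargetType": "SSM_DOCUMENT",
        "TargetId": document_name,
        "Automatic": True,              # False leaves remediation manual
        "MaximumAutomaticAttempts": 3,  # arbitrary value for this sketch
        "RetryAttemptSeconds": 60,      # arbitrary value for this sketch
        "Parameters": {
            # Static value: the role the Automation document runs as.
            "AutomationAssumeRole": {"StaticValue": {"Values": [role_arn]}},
            # Dynamic value: Config substitutes the non-compliant resource ID.
            resource_param: {"ResourceValue": {"Value": "RESOURCE_ID"}},
        },
    }


# Illustrative names; verify them against your own rule and document.
config_entry = build_remediation_config(
    "s3-bucket-server-side-encryption-enabled",
    "AWS-EnableS3BucketEncryption",
    "arn:aws:iam::111122223333:role/config-remediation",
    "BucketName",
)
print(config_entry["Parameters"])

# With boto3 installed and credentials configured, the call would be:
#   boto3.client("config").put_remediation_configurations(
#       RemediationConfigurations=[config_entry])
```

A mismatch between resource_param and the document's actual parameter name is the parameter-mapping failure that shows up in the Automation execution history.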

Q8: How do I schedule a task across multiple AWS accounts?

Three patterns. Pattern A — EventBridge Scheduler with cross-account roles: a central scheduling account creates schedules whose execution role assumes a target-account role with the actual API permissions. Centralized, but every target account must trust the scheduling account. Pattern B — StackSets-deployed schedules: deploy the same EventBridge scheduled rule (or Scheduler schedule) into every account via CloudFormation StackSets. Decentralized execution but centrally managed configuration. Pattern C — Cross-account EventBridge event bus: emit a scheduled event to a central bus, route to the right account's bus, and execute locally. Pattern A is the cleanest for ad-hoc cross-account tasks; Pattern B is the right answer when each account should own its own execution audit trail; Pattern C is the right answer when scheduling is one piece of a larger event-driven architecture.

Q9: When should I use DLM versus EventBridge → Lambda for EBS snapshots?

Use DLM for any snapshot use case that fits the model: tag-based targeting, time-based schedule, count-or-age retention, optional cross-region copy, optional fast snapshot restore. DLM is the SOA-preferred answer because it requires no custom code, has a dedicated service role, supports cross-account copy, and is auditable in the DLM console. Use EventBridge → Lambda when the snapshot logic does not fit DLM — for example, "snapshot a volume only if a custom tag value matches a CMDB lookup", "name the snapshot with a hash of the application version", or "skip snapshots when a maintenance window is active". Use EventBridge → SSM Automation when snapshots are one step of a larger orchestrated runbook (stop application → snapshot → run database export → restart application). Never write a cron job inside an EC2 instance to take snapshots — it fails the moment that instance is replaced.

Q10: What is a conformance pack, and how does it differ from individual Config rules?

A conformance pack is a YAML or JSON template bundling multiple Config rules and remediation actions into a single deployable unit. AWS publishes sample packs aligned to compliance frameworks (CIS, NIST 800-53, PCI-DSS, HIPAA, the AWS Operational Best Practices set). A conformance pack deploys atomically — either all rules deploy or the deployment fails — and supports parameters that customize the pack per environment. Through AWS Organizations integration, a conformance pack can deploy to every member account in one operation. The operational difference: deploying 25 Config rules individually requires 25 PutConfigRule API calls, separate IaC entries, and per-rule lifecycle management; a conformance pack treats the 25 as a single logical unit with one lifecycle. SOA-C02 prefers conformance packs whenever a question says "consistent compliance across many accounts" or "apply the CIS benchmark to the entire org".

Q11: Why does my EventBridge rule not match the Config event?

Diagnostic checklist. First, confirm the rule is on the default event bus — Config events go there, not to custom buses (unless you forwarded them explicitly). Second, check the event pattern JSON — source must be aws.config, detail-type must be Config Rules Compliance Change, and any additional filters (detail.configRuleName, detail.newEvaluationResult.complianceType) must match the actual event shape exactly. Third, confirm the rule is enabled — disabled rules do not match. Fourth, view the rule's invocations metric in CloudWatch (AWS/Events, Invocations and FailedInvocations) — zero invocations means no match; non-zero failed invocations means the target is rejecting. Fifth, use the EventBridge sandbox in the console to paste a sample event and test the pattern. The most common cause is a typo in detail-type (e.g., Config Rule Compliance Change instead of Config Rules Compliance Change).
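The detail-type typo is easy to reproduce locally. The sketch below pairs an event pattern for Config compliance changes with a deliberately simplified matcher: it supports only exact-value matching on scalar fields, not prefix, numeric, or anything-but operators, and the sample events and rule name are hypothetical.

```python
# Pattern for NON_COMPLIANT results from any Config rule.
PATTERN = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {"newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]}},
}


def matches(pattern, event):
    """Simplified exact-value matcher; not a full EventBridge implementation."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the event value must equal one of the listed values.
            return False
    return True


# Hypothetical, trimmed-down sample event.
SAMPLE_EVENT = {
    "source": "aws.config",
    "detail-type": "Config Rules Compliance Change",
    "detail": {
        "configRuleName": "s3-bucket-server-side-encryption-enabled",
        "newEvaluationResult": {"complianceType": "NON_COMPLIANT"},
    },
}

# The same pattern with the common typo: "Rule" instead of "Rules".
TYPO_PATTERN = dict(PATTERN, **{"detail-type": ["Config Rule Compliance Change"]})

print(matches(PATTERN, SAMPLE_EVENT))
print(matches(TYPO_PATTERN, SAMPLE_EVENT))
```

For an authoritative check against the real matching engine, the EventBridge TestEventPattern API accepts a pattern and a sample event and reports whether they match.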

Q12: How does an SSM Automation document handle approval steps?

The aws:approve action pauses the document and sends an SNS notification to a configured topic. Approvers receive the notification and use aws ssm send-automation-signal --signal-type Approve --automation-execution-id <id> (or the console "Approve" button) to advance the document. The action's parameters specify the SNS topic ARN, the list of approver IAM principals (only those identities can approve), the minimum number of approvals required (default 1), and an optional message. Use approval steps for destructive remediations — detaching an EBS volume, terminating an instance, deleting a snapshot — where automatic execution carries unacceptable risk. The approval step is also a useful audit checkpoint because each approval decision is logged in CloudTrail with the approver's identity.
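The parameters described above map onto the inputs of an aws:approve step. The following sketch builds such a step as a Python dictionary that could be serialized into a document's mainSteps; the helper name, step name, ARNs, and the validation rule are illustrative assumptions.

```python
import json


def approval_step(topic_arn, approvers, min_approvals=1, message=""):
    """Build an aws:approve step for an Automation document's mainSteps."""
    approvers = list(approvers)
    # Sanity check added for this sketch: the threshold must be reachable.
    if not 1 <= min_approvals <= len(approvers):
        raise ValueError("min_approvals must be between 1 and len(approvers)")
    return {
        "name": "approveRemediation",  # hypothetical step name
        "action": "aws:approve",
        "inputs": {
            "NotificationArn": topic_arn,
            "Message": message,
            "MinRequiredApprovals": min_approvals,
            "Approvers": approvers,
        },
    }


# Placeholder ARNs for illustration only.
step = approval_step(
    "arn:aws:sns:us-east-1:111122223333:remediation-approvals",
    ["arn:aws:iam::111122223333:user/ops-lead"],
    message="Approve termination of the non-compliant instance?",
)
print(json.dumps(step, indent=2))
```

Placing this step immediately before the destructive action keeps the rest of the runbook fully automated while gating only the risky part.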

Once schedules and Config automation are in place, the next operational layers are: Systems Manager Automation and Patch Manager for the runbook depth that backs every remediation step; EventBridge Rules, SNS, and Automated Remediation for the alarm-driven side of the same automation fabric; CloudFormation Stacks and StackSets for deploying schedules, Config rules, conformance packs, and Maintenance Windows as IaC across accounts; and CloudTrail and AWS Config for Audit and Compliance for the audit signals that validate every automated change.

Official sources