
AMI Management, EC2 Image Builder, and Deployment Strategies

7,400 words · about 37 minutes reading time

Task Statement 3.1 of the SOA-C02 exam guide is titled "Provision and maintain cloud resources" and it puts AMI hygiene, EC2 Image Builder pipelines, and deployment strategy selection at the center of the SysOps job description. Where SAA-C03 asks an architect to choose a deployment style, SOA-C02 asks the SysOps administrator to operate that deployment — schedule the AMI build, distribute the AMI across accounts and regions, roll the fleet to the new AMI without downtime, troubleshoot the failed deploy at 2am, and decide whether to roll forward, roll back, or pause. Every word of this topic — AMI, EC2 Image Builder, deployment — is something a real on-call SysOps engineer touches every week.

This study note walks through the AMI lifecycle from create through deregister, the EC2 Image Builder pipeline machinery (recipes, components, build/test stages, infrastructure config, distribution config, and lifecycle policy), and the four exam-relevant deployment strategies — all-at-once, rolling, blue/green, and canary — together with how launch templates, Auto Scaling group instance refresh, warm pools, and CloudFormation UpdatePolicy plug them into operations. Along the way you will see the recurring SOA-C02 traps: AMI cross-account sharing that breaks when the snapshot is encrypted with a customer-managed KMS key, deregistering an AMI without realizing it does not terminate running instances, Image Builder lifecycle policies that delete the AMI a launch template still references, and rolling deployments that produce a brief 5xx spike because MinHealthyPercentage and warmup were never tuned. CodeDeploy and CodeBuild are out of scope for SOA-C02 and only appear here in passing for context.

Why AMI, Image Builder, and Deployment Sit at the Heart of TS 3.1

The official SOA-C02 Exam Guide v2.3 lists five skills under Task Statement 3.1: create and manage AMIs (explicitly calling out EC2 Image Builder), create/manage/troubleshoot CloudFormation, provision across regions and accounts, select deployment scenarios (blue/green, rolling, canary), and identify/remediate deployment issues (service quotas, subnet sizing, CloudFormation errors, permissions). This topic owns the first and fourth skills outright and the fifth one for AMI-specific failure modes; CloudFormation has its own topic (cloudformation-stacks-and-stacksets) for the second skill, and StackSets/RAM cover the third.

At the SysOps tier, the framing is operational. SAA-C03 asks "which deployment strategy fits this RTO requirement?" SOA-C02 asks "the rolling deployment is producing 5xx errors during instance replacement — what setting do you tune?" The answer is rarely a different strategy; it is MinHealthyPercentage, health-check grace period, target group deregistration delay, or warm pool size. The same applies to AMIs: the question is rarely "should we use a golden AMI?" — yes, every production fleet should — but rather "the patching pipeline produced a new AMI two weeks ago and the fleet is still on the old one — what is broken?" That is an Image Builder pipeline schedule, an EventBridge rule, an instance refresh that was never initiated, or a launch template still pinned to a specific AMI ID instead of $Latest.

  • AMI (Amazon Machine Image): the immutable template that EC2 uses to launch an instance — root volume snapshot, launch permissions, block device mapping, kernel/architecture metadata. Every running EC2 instance was launched from exactly one AMI.
  • Golden AMI: an organization's pre-hardened, pre-patched, agent-baked base AMI. Every fleet launches from the golden AMI rather than the raw vendor AMI.
  • EC2 Image Builder: managed pipeline service that automates the build, test, and distribution of AMIs (and container images). Replaces hand-rolled Packer + Jenkins toolchains.
  • Image Recipe: an Image Builder document declaring base image plus an ordered list of build/test components and block device mappings. The recipe is what becomes the AMI.
  • Component: a YAML build or test step (install package, run script, validate output). AWS provides managed components; customers can author custom components.
  • Distribution Settings: an Image Builder document declaring which accounts and regions the AMI is copied to, with launch permissions and KMS keys per region.
  • Image Lifecycle Policy: an Image Builder rule that automatically deprecates and deletes old AMIs (and their snapshots) on a schedule.
  • Launch Template: a versioned EC2 launch specification (AMI ID, instance type, security groups, user data, IAM profile) that ASGs and RunInstances reference.
  • Instance Refresh: an Auto Scaling group operation that replaces instances in batches according to a target launch template version, respecting MinHealthyPercentage and warmup.
  • MinHealthyPercentage: the floor on healthy capacity that an instance refresh (or rolling deploy) is allowed to dip to during replacement.
  • Blue/Green Deployment: launch a parallel fleet on the new AMI, cut traffic over (Route 53 weighted, ALB target group swap, or stack swap), keep the old fleet briefly for rollback.
  • Canary Deployment: route a small fraction of traffic (5–10 percent) to the new AMI fleet, watch metrics, then scale out or roll back.
  • Rolling Deployment: replace instances in-place in batches, draining and replacing while the rest of the fleet serves traffic.
  • Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html

AMI, EC2 Image Builder, and Deployment Strategies in Plain Language

Image and deployment terminology stacks fast. Three analogies make the constructs stick.

Analogy 1: The Bakery Master Recipe and Daily Loaves

Think of an AMI as a fully baked, frozen master loaf sitting on the bakery shelf — every customer who orders bread walks out with an identical thawed copy. The AMI ID is the shelf label (ami-0abc...) that uniquely identifies that frozen master. EC2 Image Builder is the automated bakery line that runs every Monday morning: it pulls last week's recipe, fetches the latest flour batch (the upstream Amazon Linux 2023 AMI), kneads in the standard ingredients (CloudWatch agent, SSM agent, security baseline, application runtime), bakes a test loaf, runs taste tests (component test phase), and only if the loaf passes does it stamp a new shelf label and freeze the master loaf. The image recipe is the written cookbook page — base flour plus the ordered list of mix-ins. Components are the labeled ingredient bins the bakery line draws from (amazon-cloudwatch-agent-linux, update-linux, your custom install-payment-app bin). Distribution settings are the delivery roster — which warehouses (regions) and which subsidiary bakeries (other AWS accounts) get a copy of the new master loaf, and which freezer key (KMS key) each warehouse uses. Image lifecycle policies are the freezer rotation rule — any master loaf older than 90 days gets pulled and discarded so the freezer does not fill up with stale stock. Deprecation is the "do not order from this loaf anymore" sticker; deregistration is the actual discarding of the master loaf. Crucially, customers (running EC2 instances) who already bought a thawed loaf are not affected when the master is discarded — they keep eating their bread until they order again.

Analogy 2: The Hotel Renovation Strategies

Now picture deploying a new application version to a 100-room hotel. All-at-once deployment is closing the entire hotel for one weekend and re-painting every room simultaneously — fastest, cheapest, but the hotel earns nothing during the renovation and if the new paint is the wrong color, every guest sees it on Monday. Rolling deployment is renovating 10 rooms at a time — guests in the other 90 rooms keep their stay uninterrupted, but if the new paint is wrong, by the time you notice, 30 rooms are already done and the hotel looks like a patchwork until you fix it. Blue/green deployment is building an entirely new 100-room wing, polishing it to perfection, then redirecting all new check-ins to the new wing while the old wing finishes out its existing reservations and is later torn down — most expensive (you pay for two wings briefly), but the safest because the old wing is still there to fall back to if guests complain. Canary deployment is opening just 5 of the new wing's rooms to a small group of friendly guests (the canary cohort), watching their reviews for two days, and only after the reviews are positive opening the rest of the wing. The SOA-C02 exam tests when each style is the right answer: rolling for routine version bumps and stateless apps, blue/green when you must guarantee zero downtime and instant rollback, canary when you cannot afford to expose a bad change to all users at once, all-at-once only for non-production or maintenance windows.

Analogy 3: The Fleet of Delivery Vans Sharing One Build Spec

A launch template is the build spec sheet the company hands to the van factory: chassis model (instance type), engine (AMI ID), wrap design (security group), driver toolkit (IAM instance profile), pre-loaded software (user data). The fleet manager does not order vans by hand-typing the spec each time — they reference the spec sheet. When the chassis model is upgraded, the manager publishes a new version of the spec sheet (Version 4) and tells the dispatch system "from now on, all replacement vans use spec sheet $Latest". The Auto Scaling group is the dispatch system: it knows it must keep 50 vans on the road at all times, draws replacements from the launch template's latest version, and can perform an instance refresh which is the formal "replace every old-spec van with a new-spec van, but never let healthy fleet capacity drop below 90 percent". A warm pool is a garage of pre-built but parked vans that can be rolled out the moment a new one is needed — cheaper than running them and faster to dispatch than building from scratch.

For SOA-C02, the bakery analogy is the most useful when a question mixes AMI lifecycle (deprecate vs deregister) with cross-region copy and KMS encryption. The hotel analogy is the cleanest mental model for picking between rolling, blue/green, and canary. The van fleet analogy is the right frame when a question describes a launch template version mismatch or an instance refresh failing to replace old instances. Reference: https://docs.aws.amazon.com/imagebuilder/latest/userguide/what-is-image-builder.html

AMI Fundamentals: Anatomy, Permissions, and Block Device Mapping

Before deployment strategies make sense you need a precise mental model of what an AMI actually is.

What lives inside an AMI

An AMI is metadata plus a pointer to one or more EBS snapshots. The metadata records: the architecture (x86_64, arm64), virtualization type (HVM is the only modern option), root device type (almost always ebs), kernel/RAM disk IDs (legacy), and a block device mapping that lists each volume the instance will receive at launch. The block device mapping references EBS snapshots for each volume — the root volume snapshot is mandatory, and additional data volumes can be specified for ephemeral data, swap, or pre-populated reference data.

The AMI itself does not store the snapshot data; it stores pointers. When you copy an AMI to another region, the underlying snapshots are copied too, and the new AMI in the destination region points at the destination-region snapshots. When you deregister an AMI, the snapshots are not automatically deleted — they continue to incur storage charges until you delete them explicitly. Image Builder lifecycle policies do delete the underlying snapshots when they delete the AMI, which is the cleanest operational pattern.

AMI permissions and sharing

Every AMI has a set of launch permissions that decide who can launch instances from it: public (anyone), specific AWS account IDs, or your own account only (the default). On SOA-C02 you must remember that sharing an AMI with another account requires three things to line up:

  1. Launch permissions on the AMI itself granted to the target account ID.
  2. EBS snapshot permissions for the underlying snapshots also granted to the target account.
  3. If the snapshot is encrypted with a customer-managed KMS key, a KMS key grant or key policy permission for the target account, otherwise the target account can see the AMI but cannot launch from it.

The third condition is where SOA-C02 questions trap candidates. You can share the AMI metadata perfectly well, but if the snapshot was encrypted with a CMK whose key policy does not authorize the target account, the launch fails with a key access error. Snapshots encrypted with the default aws/ebs AWS-managed key cannot be shared at all — you must re-encrypt with a customer-managed key during a copy.
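
A boto3 sketch of the first two steps (account and resource IDs are hypothetical; the client calls are left commented so the parameter shapes stand on their own):

```python
target_account = "222233334444"  # hypothetical target account ID

# 1. Launch permission on the AMI itself.
ami_share_params = {
    "ImageId": "ami-0123456789abcdef0",  # hypothetical AMI
    "LaunchPermission": {"Add": [{"UserId": target_account}]},
}
# ec2.modify_image_attribute(**ami_share_params)

# 2. createVolumePermission on EVERY snapshot the AMI's block device
#    mapping references -- forgetting a data-volume snapshot breaks launch.
snapshot_share_params = {
    "SnapshotId": "snap-0123456789abcdef0",  # hypothetical snapshot
    "Attribute": "createVolumePermission",
    "OperationType": "add",
    "UserIds": [target_account],
}
# ec2.modify_snapshot_attribute(**snapshot_share_params)

# 3. If the snapshot is CMK-encrypted, the key policy (or a grant via
#    kms:CreateGrant) must also let the target account use the key.
#    There is no EC2 API for this step -- it is a KMS change.
```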

AMI lifecycle states

An AMI moves through a well-defined lifecycle:

  • Create / register — the AMI is born, either from a running instance (CreateImage), from a snapshot (RegisterImage), or from an Image Builder pipeline run.
  • Available — instances can launch from it; the AMI shows up in the EC2 console under "My AMIs".
  • Deprecated — the AMI is marked deprecated as of a date; it still works for explicit launches by the owning account, but it is hidden from search and console listings for other accounts. Auto Scaling groups will continue to launch from a deprecated AMI.
  • Disabled — a separate state introduced for stronger lifecycle control: a disabled AMI cannot be used to launch new instances at all, even by the owning account, until re-enabled.
  • Deregistered — the AMI metadata is removed; you can no longer launch new instances from this AMI ID. Existing instances launched from it keep running unaffected.
  • Snapshot deletion — the underlying EBS snapshots are deleted (separate API), reclaiming storage cost. Without this step, a deregistered AMI's snapshots linger and cost money.

Quick reference:

  • Default AMI launch permission: private (owning account only).
  • Public AMI: launch permission all. Avoid for production AMIs unless intentionally sharing.
  • Cross-account AMI sharing requires three things: AMI launch permission + snapshot permission + KMS key grant (if CMK-encrypted).
  • Default aws/ebs AWS-managed key: cannot be shared cross-account; must re-encrypt with CMK during copy.
  • Deregister an AMI: prevents NEW launches; running instances are unaffected; snapshots remain and cost money until separately deleted.
  • Deprecate an AMI: hides from search/listings, still launchable by explicit ID; ASGs keep launching from it.
  • Disable an AMI: blocks ALL new launches including by the owner — stronger than deprecate.
  • AMI quota: 50,000 AMIs per region per account by default; 25 instance store-backed AMIs.
  • Image Builder pipeline runs: can be scheduled (cron) or triggered (new dependency, manual, or by EventBridge).
  • Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html
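
The lifecycle transitions map onto distinct EC2 APIs. A hedged boto3 sketch (AMI ID hypothetical, client calls commented out so the parameter shapes stand alone):

```python
from datetime import datetime, timezone

ami_id = "ami-0123456789abcdef0"  # hypothetical

# Deprecate: hide from listings; explicit launches by ID still work.
deprecate_params = {
    "ImageId": ami_id,
    "DeprecateAt": datetime(2026, 1, 1, tzinfo=timezone.utc),
}
# ec2.enable_image_deprecation(**deprecate_params)

# Disable: block ALL new launches, reversible with enable_image.
# ec2.disable_image(ImageId=ami_id)

# Deregister, then delete the snapshots SEPARATELY -- deregistering alone
# leaves them billing. Capture the snapshot IDs before deregistering:
# image = ec2.describe_images(ImageIds=[ami_id])["Images"][0]
# snapshot_ids = [bdm["Ebs"]["SnapshotId"]
#                 for bdm in image["BlockDeviceMappings"] if "Ebs" in bdm]
# ec2.deregister_image(ImageId=ami_id)
# for snap_id in snapshot_ids:
#     ec2.delete_snapshot(SnapshotId=snap_id)
```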

Cross-region AMI copy

The copy-image API duplicates an AMI (and its snapshots) into another region. Three operational considerations:

  • Encryption: you can choose to encrypt the destination AMI with a different KMS key — typical for keeping a region-local CMK rather than re-using the source-region key. If the source is unencrypted, you can encrypt-on-copy. If the source is encrypted with a CMK, the copy operation needs kms:CreateGrant on the source key and kms:Encrypt/kms:GenerateDataKey on the destination key.
  • Permissions: copied AMIs do not inherit launch permissions — the destination AMI defaults to private to your account. You must re-grant explicitly.
  • Cost: storage cost in both regions, plus inter-region snapshot data transfer for the initial copy.
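
The manual version of an encrypt-on-copy looks roughly like this in boto3 (names and key alias are hypothetical; note the client must be created in the destination region):

```python
copy_params = {
    "Name": "golden-ami-copy",                 # hypothetical destination name
    "SourceImageId": "ami-0123456789abcdef0",  # hypothetical source AMI
    "SourceRegion": "us-east-1",
    "Encrypted": True,                         # encrypt-on-copy
    "KmsKeyId": "alias/ebs-golden-eu",         # hypothetical region-local CMK
}
# Call from an EC2 client in the DESTINATION region:
# ec2_eu = boto3.client("ec2", region_name="eu-west-1")
# new_ami_id = ec2_eu.copy_image(**copy_params)["ImageId"]
# The copy arrives private -- launch permissions are NOT inherited and
# must be re-granted on the new AMI ID.
```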

For multi-region deployment, Image Builder distribution settings handle this automatically — you declare the target regions in the distribution config and Image Builder performs the copy and applies the per-region launch permissions and encryption keys without a separate API call.

A SysOps engineer deregisters an old AMI to "clean up" and is surprised that the 200 EC2 instances launched from that AMI keep running normally. Deregistration only prevents NEW launches — it has zero effect on instances already running. Deletion of the underlying EBS snapshots is also separate; the snapshots survive deregistration unless you delete them explicitly. The exam frequently presents this scenario and asks "what changed?" — the answer is "nothing for running instances; only new launches are blocked, and snapshots still cost money". Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/deregister-ami.html

The Golden AMI Pattern: Why Every Fleet Should Start From One

A golden AMI is the organization's pre-hardened base image that every fleet's launch templates point at. It contains the patched OS, the standard agents (CloudWatch, SSM, OS-level monitoring), the security baseline (CIS-aligned hardening, sshd config, audit rules), and optionally the application runtime baseline (JDK version, language runtime). The application code itself is not in the golden AMI — that is layered on top via user data, configuration management, or a separate application AMI built from the golden.

The golden AMI pattern is operational standard for three reasons:

  • Boot-time consistency: every new instance starts identical. No "this instance is missing the CloudWatch agent because user-data failed silently".
  • Patch latency reduction: the latest golden AMI carries last week's CVE patches. Re-baking weekly means new instances boot with current patches, no in-place patching catch-up needed.
  • Audit and compliance: the golden AMI is a single, immutable artifact you can sign, hash, and reference in a compliance audit. Hand-built instances cannot be audited that way.

EC2 Image Builder is the AWS-native way to produce golden AMIs, and SOA-C02 expects you to know its mechanics in detail.

EC2 Image Builder: Pipelines, Recipes, Components

EC2 Image Builder is a managed service that automates the build, test, and distribution of AMIs and container images. It eliminates the need for hand-rolled Packer plus Jenkins (or shell scripts plus AMIs.sh) by giving you a pipeline DSL, a component DSL, distribution settings, and lifecycle policies as first-class AWS resources.

The pipeline object model

An Image Builder pipeline is the top-level object. It binds together:

  • An image recipe (or a container recipe) — the spec for what gets built.
  • An infrastructure configuration — the EC2 instance type, subnet, IAM instance profile, key pair, and security group used to perform the build (Image Builder spins up a temporary builder instance, runs the recipe inside it, takes a snapshot, and tears the instance down).
  • A distribution configuration — where the resulting AMI is copied (regions and accounts) with what permissions and which KMS keys.
  • An optional schedule — a cron expression plus a pipelineExecutionStartCondition of EXPRESSION_MATCH_ONLY (run on cron only) or EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE (run on cron and whenever the upstream base AMI or a referenced component publishes a new version). The cron expression itself uses standard cron(...) syntax.
  • An optional image tests configuration — whether to run the recipe's test components and how long to wait for them.

When the pipeline runs, Image Builder produces a new image (the term Image Builder uses for the output AMI plus its metadata), assigns it a semantic version (1.2.3/4 for recipe version 1.2.3, build counter 4), and executes distribution.
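
A sketch of wiring these objects together with boto3 (all names and ARNs are hypothetical; the call is commented so the parameter shape stands alone):

```python
pipeline_params = {
    "name": "golden-ami-weekly",  # hypothetical names/ARNs throughout
    "imageRecipeArn": ("arn:aws:imagebuilder:us-east-1:111122223333:"
                       "image-recipe/golden/1.2.3"),
    "infrastructureConfigurationArn": (
        "arn:aws:imagebuilder:us-east-1:111122223333:"
        "infrastructure-configuration/golden-infra"),
    "distributionConfigurationArn": (
        "arn:aws:imagebuilder:us-east-1:111122223333:"
        "distribution-configuration/golden-dist"),
    "schedule": {
        # Every Monday 06:00 UTC, and also when the base AMI or a
        # component publishes a new version.
        "scheduleExpression": "cron(0 6 ? * mon *)",
        "pipelineExecutionStartCondition":
            "EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE",
    },
    "imageTestsConfiguration": {"imageTestsEnabled": True,
                                "timeoutMinutes": 60},
}
# imagebuilder.create_image_pipeline(**pipeline_params)
```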

The image recipe

A recipe declares:

  • A parent image — either an AWS-managed base AMI (e.g., Amazon Linux 2023 x86), an AMI ID, or another image previously built by Image Builder. The parent image is a versioned reference; choosing latest means the next pipeline run automatically pulls the latest base.
  • An ordered list of build components — install packages, copy files, run scripts, configure agents.
  • An optional ordered list of test components — run validation scripts after the build, fail the pipeline if they fail.
  • A block device mapping — root volume size, type (gp3 typically), additional data volumes if needed, and encryption settings.
  • Optional working directory and additional instance configuration (user data overrides).

A recipe is immutable once published — to change anything you publish a new recipe version. The pipeline references the recipe by name and version (or latest).

Components

A component is a YAML document that declares a sequence of phases (build, validate, test) with steps inside each. Steps can:

  • Run shell scripts (ExecuteBash, ExecutePowerShell).
  • Download files from S3 (S3Download).
  • Set environment variables and reboot.
  • Call SSM agent operations.

AWS provides hundreds of managed components for common tasks — update-linux, amazon-cloudwatch-agent-linux, aws-cli-version-2-linux, chrony-time-configuration-test, simple-boot-test-linux, and many more. Customers author custom components for their organization-specific configuration: install the corporate root certificate, configure the syslog forwarder, install the application runtime.

Components are versioned (1.0.0, 1.0.1) and published into a regional component repository. Cross-region pipelines need the component to be available in each target region — Image Builder handles this transparently for AWS-managed components, but custom components must be created in every target region (for example, by having your IaC publish the component document to each region).

A common SOA-C02 anti-pattern: install the CloudWatch agent and SSM agent via user-data on every instance launch. This works but is fragile (user-data runs once, can fail silently, and is replayed if the instance is stopped/started). The exam-correct pattern bakes the agents into the golden AMI via Image Builder components — amazon-cloudwatch-agent-linux and SSM agent (which is pre-installed on Amazon Linux 2/2023 anyway). The result is a fleet where every instance boots ready to be monitored and managed. Reference: https://docs.aws.amazon.com/imagebuilder/latest/userguide/manage-components.html
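
A minimal custom component in the Image Builder YAML DSL might look like the following sketch (the package and the binary path are illustrative):

```yaml
# Illustrative custom component: bake the CloudWatch agent into the AMI.
name: install-cloudwatch-agent
description: Install and verify the CloudWatch agent in the golden AMI
schemaVersion: 1.0
phases:
  - name: build
    steps:
      - name: InstallAgent
        action: ExecuteBash
        inputs:
          commands:
            - sudo yum install -y amazon-cloudwatch-agent
  - name: validate
    steps:
      - name: AgentBinaryPresent
        action: ExecuteBash
        inputs:
          commands:
            - test -x /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl
```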

Infrastructure configuration

The infrastructure config tells Image Builder what kind of instance to use as the builder and what network to put it in. Important fields:

  • Instance types — at least one (Image Builder will fall back to others if capacity is unavailable).
  • Subnet ID and security group IDs — must allow outbound access to S3 (component artifacts), to Amazon Linux package repos, and to AWS APIs. Typical setup: a private subnet with a NAT gateway, plus VPC endpoints for SSM, EC2, S3, and Image Builder if you want fully private builds.
  • IAM instance profile — the builder instance assumes this role to do its work. Must include EC2InstanceProfileForImageBuilder, AmazonSSMManagedInstanceCore, and any custom permissions your components need (S3 read for download, etc.).
  • Logging — S3 bucket where the build logs are written; essential for troubleshooting failed builds.
  • Termination on failure — keep the builder instance alive when a build fails so you can SSH in and debug, or terminate it (default: terminate, but for development pipelines override to keep).

Distribution configuration

The distribution config specifies where the resulting AMI goes:

  • Regions — list of target regions (the source region is implied; additional regions are copies).
  • Per-region: AMI name and description, launch permissions (account IDs, organizations, or all), and the KMS key used to encrypt the destination snapshots.
  • License configurations — optional License Manager attachments for BYOL workloads.
  • Launch template association — Image Builder can automatically update a target launch template to point at the new AMI as the latest version, with optional notification via SNS or EventBridge. This is the operational glue that connects "new AMI exists" to "ASG can pick it up".

The cleanest SOA-C02 answer for "the new AMI must automatically replace the old one in the ASG" is to configure the Image Builder distribution settings to update a target launch template. The distribution step creates a new launch template version pointing at the new AMI ID, and the ASG (configured to use $Latest) automatically picks it up. The fleet does not roll over until you trigger an instance refresh, which decouples "AMI is ready" from "fleet is replaced" — exactly what a controlled deployment requires. Reference: https://docs.aws.amazon.com/imagebuilder/latest/userguide/manage-distribution-settings.html
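
A boto3 sketch of a distribution configuration that both shares the AMI and updates a launch template (account IDs, key alias, and launch template ID are hypothetical):

```python
dist_params = {
    "name": "golden-dist",  # hypothetical
    "distributions": [
        {
            "region": "us-east-1",
            "amiDistributionConfiguration": {
                # {{ imagebuilder:buildDate }} is an Image Builder
                # template token expanded at distribution time.
                "name": "golden-ami-{{ imagebuilder:buildDate }}",
                "launchPermission": {"userIds": ["222233334444"]},
                "kmsKeyId": "alias/ebs-golden",  # hypothetical CMK alias
            },
            # The operational glue: publish a new launch template version
            # pointing at the freshly built AMI.
            "launchTemplateConfigurations": [
                {"launchTemplateId": "lt-0123456789abcdef0",
                 "setDefaultVersion": True},
            ],
        },
    ],
}
# imagebuilder.create_distribution_configuration(**dist_params)
```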

Image lifecycle policies

An image lifecycle policy is an Image Builder rule that automatically deprecates and deletes old AMIs (and their snapshots) on a schedule. Without lifecycle policies the AMI count and snapshot storage grow unboundedly — every weekly pipeline run produces a new AMI and the bills accumulate.

A lifecycle policy declares:

  • Resource selection — by image name pattern, tag, or recipe.
  • Retention rules — keep the latest N versions, or keep versions younger than X days.
  • Action sequence — typically deprecate first (mark old AMIs as not-recommended), wait Y days, then delete (deregister AMI and delete snapshots). The exam expects you to know the canonical sequence: deprecate → wait → disable → wait → delete.
  • Cross-account behavior — actions can apply only in the source account or also propagate to the target accounts the AMI was distributed to.

Image Builder lifecycle policies do not check whether a launch template, ASG, or Service Catalog product still references the AMI before deleting it. If your retention rule says "keep only the last 4 weekly AMIs" and a launch template version is pinned to the AMI from 6 weeks ago, the lifecycle policy will deregister that AMI and the next ASG launch attempt will fail with InvalidAMIID.NotFound. The mitigations: pin launch templates to $Latest so they always reference the newest AMI; or make the lifecycle policy retention generous enough to outlast the slowest deployment cadence; or use the deprecate-only action and delete manually after verifying. Reference: https://docs.aws.amazon.com/imagebuilder/latest/userguide/image-lifecycle.html
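
A hedged boto3 sketch of such a policy — names and ARNs are hypothetical, and the field shapes of create_lifecycle_policy below are my best-effort reading of the Image Builder API, so verify them against the current API reference before use:

```python
lifecycle_params = {
    "name": "golden-ami-retention",  # hypothetical
    "resourceType": "AMI_IMAGE",
    "executionRole": "arn:aws:iam::111122223333:role/ImageBuilderLifecycle",
    "policyDetails": [
        {
            # Step 1: deprecate anything older than 30 days...
            "action": {"type": "DEPRECATE"},
            "filter": {"type": "AGE", "value": 30, "unit": "DAYS"},
        },
        {
            # ...then delete (AMI plus snapshots) past 90 days, always
            # retaining the newest 4 images regardless of age.
            "action": {"type": "DELETE",
                       "includeResources": {"amis": True, "snapshots": True}},
            "filter": {"type": "AGE", "value": 90, "unit": "DAYS",
                       "retainAtLeast": 4},
        },
    ],
    "resourceSelection": {"tagMap": {"ImagePipeline": "golden-ami-weekly"}},
}
# imagebuilder.create_lifecycle_policy(**lifecycle_params)
```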

Launch Templates: AMI Pinning vs Latest Reference

Launch templates are versioned EC2 launch specifications. ASGs and the EC2 console reference them when launching new instances.

Why launch templates beat launch configurations

Launch configurations are the legacy alternative — immutable, no versioning, deprecated for new ASGs. Launch templates are versioned, support more EC2 features (T3 unlimited mode, hibernation, Capacity Reservations, instance metadata service v2 enforcement), and integrate with EC2 Image Builder distribution. SOA-C02 expects launch templates throughout; launch configurations appear only as wrong answers.

AMI ID pinning vs $Latest

Inside a launch template version, the AMI ID is stored as a literal value (ami-0abc...). When the underlying AMI changes (a new Image Builder run produces ami-0def...), you have two options:

  • Pin to a specific AMI ID — every launch template version explicitly references one AMI. To roll forward, you publish a new launch template version with the new AMI ID and the ASG points at the new version (or $Latest).
  • Use the special version $Latest in the ASG config — the ASG always launches new instances from whatever the most recent launch template version says. If the most recent version was published yesterday with the new AMI, instances launched today use the new AMI.

A third option, $Default, points at the launch template's designated default version, which is typically the version you have validated for production. Some teams use $Default for ASG launches and reserve $Latest for canary or test ASGs.

A common SOA-C02 misconfiguration: the ASG references launch template version 5 explicitly. Operations publishes version 6 with a new AMI ID and expects the ASG to start using it. New launches keep going to version 5 because the ASG is pinned. The fix is to set the ASG's launch template version to $Latest (always newest) or $Default (always the marked-default version), not a literal version number. Then changing the latest or default version automatically flows to new launches. Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-launch-templates.html
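
The fix in boto3 terms — publish the new version, then point the ASG at $Latest (launch template ID, AMI ID, and ASG name are hypothetical; calls commented so the shapes stand alone):

```python
# Publish a new launch template version carrying the new AMI; SourceVersion
# copies every other setting forward from the current latest version.
new_version_params = {
    "LaunchTemplateId": "lt-0123456789abcdef0",  # hypothetical
    "SourceVersion": "$Latest",
    "LaunchTemplateData": {"ImageId": "ami-0fedcba9876543210"},
}
# ec2.create_launch_template_version(**new_version_params)

# Make the ASG follow versions instead of pinning a literal number.
asg_params = {
    "AutoScalingGroupName": "web-asg",  # hypothetical
    "LaunchTemplate": {"LaunchTemplateId": "lt-0123456789abcdef0",
                       "Version": "$Latest"},
}
# autoscaling.update_auto_scaling_group(**asg_params)
```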

Updating a fleet to a new AMI

Updating a launch template version does not automatically replace already-running instances. New launches pick up the new AMI, but existing instances stay until they are explicitly replaced. Replacement happens via:

  • Auto Scaling group instance refresh — the SOA-canonical answer for ASG-managed fleets.
  • CloudFormation UpdatePolicy: AutoScalingRollingUpdate — the IaC-driven equivalent when the ASG is managed by CloudFormation.
  • CloudFormation UpdatePolicy: AutoScalingReplacingUpdate — blue/green-style stack swap.
  • Manual termination — terminate old instances one by one and let the ASG launch replacements with the new AMI; only acceptable for tiny fleets.
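
For the CloudFormation path, the rolling behavior lives in the UpdatePolicy attribute on the ASG resource. A sketch (resource names and sizes illustrative):

```yaml
# Illustrative CloudFormation fragment: rolling update for a 10-instance ASG.
WebServerGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: "10"
    MaxSize: "11"
    DesiredCapacity: "10"
    LaunchTemplate:
      LaunchTemplateId: !Ref WebLaunchTemplate
      Version: !GetAtt WebLaunchTemplate.LatestVersionNumber
  UpdatePolicy:
    AutoScalingRollingUpdate:
      MinInstancesInService: 9           # analogous to MinHealthyPercentage 90
      MaxBatchSize: 1
      PauseTime: PT5M
      WaitOnResourceSignals: true        # instances must cfn-signal success
      MinSuccessfulInstancesPercent: 100
```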

Auto Scaling Group Instance Refresh: The Operational Heart

Instance refresh is the ASG operation that replaces running instances with a new launch template version. It is the most operationally tested deployment mechanic on SOA-C02.

How instance refresh works

You start an instance refresh on an ASG with parameters:

  • Strategy — currently only Rolling (more strategies may arrive but rolling is the production-ready one for SOA-C02).
  • MinHealthyPercentage (default 90) — the floor on healthy capacity during refresh. If 90, an ASG with 10 instances will replace at most 1 at a time; with 100 instances, at most 10 at a time. Lower this to make refresh faster (more parallelism) at the cost of capacity dip; raise to 100 to require new instances first, then terminate old (zero-dip rolling, slower).
  • InstanceWarmup — how long after a new instance is launched before it counts as healthy and the next batch can begin. Tune to your application's warm-up time (JIT compilation, cache prime, ELB target health check stabilization).
  • CheckpointPercentages and CheckpointDelay — pause the refresh at specified completion percentages for verification. Common pattern: pause at 25 percent for an hour to validate canary cohort, then continue.
  • Skip matching — skip instances already running the new launch template version, so resuming a partially failed refresh does not double-replace.
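
Those parameters map directly onto StartInstanceRefresh. A boto3 sketch (ASG name hypothetical; the call is commented so the parameter shape stands alone):

```python
refresh_params = {
    "AutoScalingGroupName": "web-asg",  # hypothetical
    "Strategy": "Rolling",
    "Preferences": {
        "MinHealthyPercentage": 90,
        "InstanceWarmup": 120,              # seconds before a new instance counts
        "CheckpointPercentages": [25, 100], # pause after the canary batch
        "CheckpointDelay": 3600,            # hold one hour at each checkpoint
        "SkipMatching": True,               # skip instances already up to date
    },
}
# refresh_id = autoscaling.start_instance_refresh(**refresh_params)["InstanceRefreshId"]
```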

MinHealthyPercentage interaction with deployment style

The MinHealthyPercentage setting effectively determines what deployment style instance refresh delivers:

  • MinHealthyPercentage = 100 — new instances are launched first, then old ones terminated. Effectively zero-dip rolling, requires the ASG MaxSize to be larger than DesiredCapacity to allow simultaneous old+new instances. Closest to blue/green at the instance level.
  • MinHealthyPercentage = 90 (default) — small batches replaced in-place, brief capacity dip. Standard rolling.
  • MinHealthyPercentage = 50 — aggressive parallel replacement, half the fleet down at peak. Used for non-customer-facing batch fleets.
  • MinHealthyPercentage = 0 — all-at-once-style replacement (terminate everything, relaunch). Avoid in production.

Quick reference:

  • Default MinHealthyPercentage: 90 — replace 10% of fleet at a time, 90% always serving.
  • Zero-dip rolling: MinHealthyPercentage = 100 — requires MaxSize > DesiredCapacity.
  • Canary checkpoint: typically 5–10 percent of fleet first, pause for ≥ 15 minutes to gather metrics, then continue.
  • Blue/green ratio: 100% old + 100% new running simultaneously during cutover; double cost briefly.
  • Rolling deployment ratio: typically 10–25% of fleet replaced per batch.
  • InstanceWarmup: typical 60–300 seconds; should exceed health check grace period and ELB target stabilization.
  • CloudFormation MinSuccessfulInstancesPercent: similar to ASG MinHealthyPercentage, governs CloudFormation AutoScalingRollingUpdate.
  • Health check grace period: typically 300 seconds for stateful apps; 60 seconds for fast-booting stateless apps.
  • Target group deregistration delay: 30–60 seconds for short-lived connections; up to 3600 for long-poll/WebSocket.
  • Reference: https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-instance-refresh.html
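Putting the knobs above together: an instance refresh with a canary checkpoint can be started with a single CLI call. A minimal sketch of the `start-instance-refresh` input — the ASG name `web-asg` is hypothetical; the preference field names follow the EC2 Auto Scaling StartInstanceRefresh API:

```json
{
  "AutoScalingGroupName": "web-asg",
  "Strategy": "Rolling",
  "Preferences": {
    "MinHealthyPercentage": 90,
    "InstanceWarmup": 180,
    "CheckpointPercentages": [25, 100],
    "CheckpointDelay": 3600,
    "SkipMatching": true
  }
}
```

Run it with `aws autoscaling start-instance-refresh --cli-input-json file://refresh.json`; the refresh pauses for an hour at 25 percent replaced, and SkipMatching ensures a resumed refresh does not re-replace instances already on the new launch template version.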

Warm pools

A warm pool is a pool of pre-initialized instances kept in Stopped or Hibernated state, ready to be brought online quickly when the ASG needs to scale out or refresh. Warm pools shorten effective boot time from minutes (cold launch + user data + warm-up) to seconds (resume + register with ELB).

Warm pools matter for instance refresh because they let InstanceWarmup shrink — a hibernated instance comes up with the JIT warm and caches primed, so the deployment can proceed faster while still being safe. SOA-C02 occasionally tests warm pool sizing for refresh acceleration.
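A warm pool is configured per ASG. A hedged sketch of a `put-warm-pool` input (the ASG name is hypothetical; field names per the EC2 Auto Scaling PutWarmPool API):

```json
{
  "AutoScalingGroupName": "web-asg",
  "PoolState": "Hibernated",
  "MinSize": 2,
  "MaxGroupPreparedCapacity": 6
}
```

Applied with `aws autoscaling put-warm-pool --cli-input-json file://warmpool.json`, this keeps two to six pre-initialized hibernated instances waiting, so scale-out and refresh replacements resume in seconds instead of cold-booting.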

Deployment Strategies: All-at-Once, Rolling, Blue/Green, Canary

The four canonical EC2 deployment strategies. SOA-C02 tests selecting the right one for a given operational constraint and configuring the AWS primitives that implement it.

All-at-once

Terminate every old instance and launch every new instance simultaneously (or in a single batch with no health gating). Fastest, simplest, cheapest in terms of running cost, but the entire fleet is briefly unavailable, and a bad AMI takes the whole fleet down at once. Acceptable for non-production, off-hours batch jobs, or scheduled maintenance windows; never for customer-facing 24/7 production.

Implementation: MinHealthyPercentage = 0 instance refresh, or aws autoscaling update-auto-scaling-group to swap the launch template version followed by manual termination of all instances.

Rolling

Replace instances in batches while the rest of the fleet serves traffic. The dominant strategy for SOA-C02 routine deployments. Tunable via MinHealthyPercentage, InstanceWarmup, and target group DeregistrationDelay. Single fleet, no double cost. Rollback is "start another instance refresh with the old launch template version", which is slow if the new AMI is broken.

Implementation: ASG instance refresh with default or custom MinHealthyPercentage. CloudFormation: UpdatePolicy: AutoScalingRollingUpdate { MaxBatchSize, MinInstancesInService, MinSuccessfulInstancesPercent, PauseTime, WaitOnResourceSignals }.

Blue/Green

Launch a new full fleet (the green) on the new AMI alongside the existing fleet (the blue), validate it, then switch traffic. Two implementations on EC2:

  • ALB target group swap — keep the ALB, swap which target group the listener forwards to. Sub-second cutover, instant rollback by swapping back. The new ASG must register healthy targets in the green target group before the swap.
  • Route 53 weighted routing — the blue and green fleets each have their own ALB or NLB; Route 53 weighted records point at both with weights 100/0, then 0/100, optionally going through 50/50 for canary testing. DNS TTL determines the cutover speed.
  • CloudFormation stack swap — deploy two parallel CloudFormation stacks (blue stack and green stack), use UpdatePolicy: AutoScalingReplacingUpdate so CloudFormation creates a new ASG and switches the LB target, the old ASG is then deleted.

Blue/green has the strongest rollback story (the old fleet is still live during the validation window) but doubles cost during the overlap.
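The ALB target group swap is a single listener update. A sketch of the new default action passed to `aws elbv2 modify-listener --listener-arn <arn> --default-actions file://green.json` — the target group ARN is a placeholder:

```json
[
  {
    "Type": "forward",
    "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/green-tg/0123456789abcdef"
  }
]
```

Rollback is the same call with the blue target group's ARN, which is why this implementation delivers the sub-second cutover and instant rollback the exam scenarios describe.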

Canary

Route a small fraction of traffic (typically 5–10 percent) to a small fleet running the new AMI, observe metrics, and proceed only if the canary is healthy. Implementations:

  • ALB weighted target groups — the listener's forward action references both old and new target groups with weights (95, 5) for example. Traffic splits at the listener.
  • Route 53 weighted records — weights split DNS responses, less precise than ALB weighting because clients cache DNS.
  • Instance refresh checkpoints — start an instance refresh with CheckpointPercentages: [10, 100] and a CheckpointDelay of 30 minutes. The refresh pauses at 10 percent fleet replaced, lets you observe canary metrics, then continues to 100 percent if you do not abort.

Canary is the right answer when the question emphasizes "exposing a small percentage of users first" and "abort early if metrics degrade".
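The ALB weighted split can be expressed as a listener forward action with two weighted target groups (ARNs are placeholders; the structure follows the elbv2 ForwardConfig schema):

```json
[
  {
    "Type": "forward",
    "ForwardConfig": {
      "TargetGroups": [
        {
          "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/stable-tg/0123456789abcdef",
          "Weight": 95
        },
        {
          "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/canary-tg/fedcba9876543210",
          "Weight": 5
        }
      ]
    }
  }
]
```

Promoting the canary is a matter of shifting the weights (95/5 → 50/50 → 0/100) with repeated `modify-listener` calls; unlike Route 53 weighting, the split happens per-request at the listener with no DNS caching.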

::warning

A common SOA-C02 production scenario: a rolling instance refresh produces a 30-second 5xx spike each time a batch is replaced. Three settings to tune, in order: (1) raise MinHealthyPercentage toward 100 so new instances come up before old ones terminate; (2) extend InstanceWarmup so newly launched instances finish JIT warm-up and ELB health-check stabilization before being counted as healthy; (3) increase target group DeregistrationDelay so in-flight requests on the terminating instance complete before the connection is dropped. The combination eliminates the 5xx spike at the cost of a slightly slower deployment. Reference: https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-instance-refresh.html

::

Strategy selection decision matrix

| Operational requirement | Strategy | AWS primitive |
| --- | --- | --- |
| Fastest, lowest-cost, non-production | All-at-once | Instance refresh with MinHealthyPercentage = 0 |
| Routine version bumps, stateless app | Rolling | Instance refresh with the default of 90 |
| Zero downtime, instant rollback, willing to double cost briefly | Blue/green | ALB target group swap or CloudFormation AutoScalingReplacingUpdate |
| Validate change with a small user fraction first | Canary | ALB weighted target groups + checkpointed instance refresh |
| Stateful application that cannot tolerate a capacity dip | Blue/green or zero-dip rolling (MinHealthyPercentage = 100) | Requires MaxSize > DesiredCapacity headroom |
| Batch/non-customer-facing, fast deploy | Aggressive rolling (MinHealthyPercentage = 50) | Instance refresh with a custom percentage |

CloudFormation UpdatePolicy: ASG Rolling and Replacing Updates

When ASGs are managed by CloudFormation, the UpdatePolicy attribute on AWS::AutoScaling::AutoScalingGroup controls how CloudFormation rolls a launch template change.

AutoScalingRollingUpdate

Rolls the existing ASG in place. Key fields:

  • MaxBatchSize — instances replaced per batch.
  • MinInstancesInService — minimum healthy capacity during update (analogous to MinHealthyPercentage).
  • MinSuccessfulInstancesPercent — the percentage of newly launched instances that must signal success for the batch to proceed.
  • PauseTime — how long to wait between batches (ISO 8601 duration like PT5M).
  • WaitOnResourceSignals — if true, CloudFormation waits for cfn-signal from each new instance before counting it healthy. Pairs with user-data calling cfn-signal after app startup.
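The fields above sit in the resource's UpdatePolicy attribute, as a peer of Properties. A sketch in JSON template syntax (batch sizes and timeouts are illustrative):

```json
{
  "UpdatePolicy": {
    "AutoScalingRollingUpdate": {
      "MaxBatchSize": 2,
      "MinInstancesInService": 6,
      "MinSuccessfulInstancesPercent": 100,
      "PauseTime": "PT15M",
      "WaitOnResourceSignals": true
    }
  }
}
```

Note that when WaitOnResourceSignals is true, PauseTime acts as the signal timeout — each batch must cfn-signal within 15 minutes here or CloudFormation treats the batch as failed and rolls back.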

AutoScalingReplacingUpdate

Creates an entirely new ASG, brings it up to capacity, switches the load balancer target, then deletes the old ASG. Effectively blue/green at the ASG level. The new ASG has a new logical resource ID inside the stack.

AutoScalingScheduledAction

Special handling for ASGs with scheduled scaling actions during the update — pause or skip them.

The exam tests recognizing which UpdatePolicy matches a stated requirement. "Replace the ASG entirely so we can roll back by deleting the new one" → AutoScalingReplacingUpdate. "Roll instances in place at 25 percent batch size with 5-minute pauses" → AutoScalingRollingUpdate with MaxBatchSize sized to 25 percent of the fleet (MaxBatchSize is an absolute instance count, not a percentage) and PauseTime: PT5M.
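The replacing variant is terse by comparison — the work happens in the paired CreationPolicy, which gates deletion of the old ASG on the new one signaling success. A sketch (signal count is illustrative):

```json
{
  "UpdatePolicy": {
    "AutoScalingReplacingUpdate": {
      "WillReplace": true
    }
  },
  "CreationPolicy": {
    "ResourceSignal": {
      "Count": 4,
      "Timeout": "PT15M"
    }
  }
}
```

If the four replacement instances fail to cfn-signal within the timeout, CloudFormation deletes the new ASG and the original keeps serving — the stack-level equivalent of an aborted blue/green cutover.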

Scenario Pattern: Fleet Running Outdated AMI After Patching Cycle

Canonical SOA-C02 troubleshooting. The Image Builder pipeline is supposed to run weekly, but the fleet still shows an AMI from six weeks ago. Diagnostic runbook:

  1. Check pipeline execution history in the Image Builder console. Did the pipeline run last week? Last month? If the most recent run was six weeks ago, the schedule is broken.
  2. Inspect the schedule's pipelineExecutionStartCondition and cron expression. Common issues: a cron format error (the EventBridge cron(...) syntax differs from Linux cron — it has six fields, including a year field), or the schedule was disabled.
  3. Check the EventBridge rule that triggers the pipeline (if you trigger by event rather than cron). Was it disabled? Did the rule's IAM role lose permission to start the pipeline?
  4. Inspect the most recent failed build log, written to the S3 bucket configured in the infrastructure config. Common failures: a component referenced by the recipe was deleted; the parent base AMI was deprecated and no replacement was set; the builder instance ran out of disk space; or a test component failed (the pipeline aborts the build when a test fails).
  5. Verify the launch template was updated by the distribution config. If the pipeline succeeded but the launch template is unchanged, the distribution settings did not include the launch template association.
  6. Verify the ASG references $Latest — if it still references a specific version number, new launch template versions are ignored.
  7. Verify an instance refresh was initiated — the ASG only replaces existing instances when an instance refresh runs. If everything else is correct but the existing instances still run the old AMI, no refresh has happened.

The most common root cause in our experience: step 6 — the ASG was created with a specific launch template version number rather than $Latest, so updates never flow.
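The step-6 fix is a one-call change. A sketch of the `update-auto-scaling-group` input that repoints the ASG at $Latest (the ASG and launch template names are hypothetical):

```json
{
  "AutoScalingGroupName": "web-asg",
  "LaunchTemplate": {
    "LaunchTemplateName": "web-lt",
    "Version": "$Latest"
  }
}
```

Apply with `aws autoscaling update-auto-scaling-group --cli-input-json file://use-latest.json`, then start an instance refresh — the version fix only affects new launches.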

Scenario Pattern: New AMI Deployed but Old Instances Not Replaced

A new launch template version was published. The ASG references $Latest. Yet existing instances are still running the old AMI. Diagnostic:

  1. New launches use the new AMI: launch template versioning works at launch time, not retroactively. Existing instances were launched from the old version and are not automatically replaced.
  2. Trigger an instance refresh: this is the explicit operation that replaces existing instances. The exam expects you to know aws autoscaling start-instance-refresh --auto-scaling-group-name <name> is the answer.
  3. Tune the refresh parameters: MinHealthyPercentage, InstanceWarmup, CheckpointPercentages for canary, SkipMatching: true so already-new instances are not double-replaced.
  4. Watch refresh progress in the EC2 Auto Scaling console or via describe-instance-refreshes. The refresh state moves through Pending, InProgress, Successful, or Failed/Cancelled.

Scenario Pattern: AMI Cross-Account Sharing Fails on Encrypted Snapshot

A SysOps team shares an AMI with a partner account. The partner account sees the AMI in their console but RunInstances fails with Client.InvalidVolume.NotFound or a KMS access error. Runbook:

  1. Check AMI launch permission — the partner account ID must be in the AMI's launch permissions. describe-images --image-ids ... --owners self and inspect.
  2. Check snapshot permission — each snapshot referenced by the AMI's block device mapping must also have the partner account in createVolumePermission. describe-snapshots --owner-ids self.
  3. Check encryption — if the snapshot is encrypted, identify the KMS key. If it is the AWS-managed aws/ebs key, you cannot share it — you must copy the AMI, re-encrypting with a customer-managed CMK.
  4. Check KMS key policy — the CMK's key policy must grant the partner account kms:Decrypt, kms:DescribeKey, kms:CreateGrant, kms:GenerateDataKey*, and kms:ReEncrypt*. Grant via a key policy statement or via aws kms create-grant.
  5. Verify with a test launch in the partner account.
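Step 4's key policy statement looks like the following sketch — the partner account ID 444455556666 is a placeholder:

```json
{
  "Sid": "AllowPartnerAccountToUseAmiKey",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::444455556666:root"
  },
  "Action": [
    "kms:Decrypt",
    "kms:DescribeKey",
    "kms:CreateGrant",
    "kms:GenerateDataKey*",
    "kms:ReEncrypt*"
  ],
  "Resource": "*"
}
```

Granting to the account root delegates to the partner account's IAM policies to decide which principals there may actually use the key; "Resource": "*" in a key policy means the key the policy is attached to, not all keys.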

Common Trap: Default aws/ebs Key Cannot Be Shared

A SOA-C02 distractor exploits the encryption permission model. The candidate "shares" the AMI launch permission and snapshot permission, but the snapshot was encrypted with the AWS-managed aws/ebs key, which cannot be shared cross-account. The correct sequence is: copy the AMI within the source account encrypting with a customer-managed CMK, share the new AMI and its CMK-encrypted snapshot, and grant the partner account on the CMK key policy.

Common Trap: Pipeline Lifecycle Policy Deleting Active AMIs

Image Builder lifecycle policies can delete an AMI that a launch template version still references. The deletion happens silently — the lifecycle policy does not introspect launch templates. The next ASG launch attempt fails with InvalidAMIID.NotFound. Mitigation: use $Latest in launch templates; or set lifecycle retention to be more generous than the longest deployment cadence; or use deprecate-only and delete after a long manual review window.
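A safer lifecycle policy combines deprecate-then-delete with a tag-based exclusion. A sketch of the policyDetails section, assuming an "in-use" tag convention (tag name hypothetical; field names per the Image Builder CreateLifecyclePolicy API, which also requires an execution role and resource selection not shown here):

```json
{
  "policyDetails": [
    {
      "action": { "type": "DEPRECATE" },
      "filter": { "type": "AGE", "value": 6, "unit": "MONTHS" }
    },
    {
      "action": { "type": "DELETE" },
      "filter": { "type": "AGE", "value": 12, "unit": "MONTHS" },
      "exclusionRules": {
        "amis": {
          "tagMap": { "in-use": "true" }
        }
      }
    }
  ]
}
```

AMIs deprecate at six months, delete at twelve, and anything tagged in-use=true is never deleted — the tag, not the lifecycle policy, is what protects an AMI a launch template still pins.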

Common Trap: User Data Replacing the Image Builder Component Pattern

A team writes a 200-line user-data shell script that installs the CloudWatch agent, configures syslog, drops in the application code, and starts services. This works but every new AMI requires 90 seconds of user-data execution at boot, and silent user-data failures leave inconsistent fleets. The SOA-correct pattern bakes the agents and configuration into the golden AMI via Image Builder components, so user-data shrinks to a tiny config-only script (set instance role, fetch secrets, start app) that runs in seconds.

SOA-C02 vs SAA-C03: The Operational Lens

SAA and SOA both touch AMIs and deployment, but the questions differ.

| Question style | SAA-C03 lens | SOA-C02 lens |
| --- | --- | --- |
| Choosing AMI strategy | "Which AMI strategy supports horizontal scaling?" | "Pipeline produced new AMI; fleet still on old one — diagnose." |
| Image Builder | Rarely tested. | Pipeline schedule, components, distribution settings, lifecycle policy. |
| Cross-account AMI | "Architecture should share AMIs across accounts." | "Cross-account launch fails with KMS error — list every IAM and KMS step." |
| Deployment strategy | "Which strategy minimizes downtime?" | "Rolling deploy produces 5xx spike — what setting do you tune?" |
| ASG instance refresh | Not tested by name. | MinHealthyPercentage, InstanceWarmup, checkpoints — operational defaults. |
| Launch template | "Use launch template instead of launch configuration." | "ASG references version 5 explicitly; new version 6 ignored — fix it." |
| CloudFormation UpdatePolicy | Mentioned conceptually. | Pick AutoScalingRollingUpdate vs AutoScalingReplacingUpdate for the stated requirement. |
| Blue/green | "Which strategy gives instant rollback?" | "ALB target group swap vs Route 53 weighted vs stack swap — choose for the constraints." |

Exam Signal: How to Recognize a TS 3.1 AMI/Deployment Question

Domain 3.1 questions on SOA-C02 follow predictable shapes for AMI and deployment topics.

  • "The fleet is running an outdated AMI" — answer involves Image Builder pipeline schedule, distribution setting to update launch template, ASG using $Latest, and an instance refresh.
  • "The deploy produces 5xx errors during replacement" — answer is MinHealthyPercentage, InstanceWarmup, target group deregistration delay, and possibly health check grace period.
  • "A small fraction of users should test the new version first" — canary, implemented via ALB weighted target groups or instance refresh checkpoints.
  • "Need instant rollback" — blue/green via ALB target group swap or CloudFormation AutoScalingReplacingUpdate.
  • "Cross-account AMI launch fails" — KMS key policy on the customer-managed CMK plus snapshot permission plus AMI launch permission, all three required.
  • "AMI count is growing without bound" — Image Builder lifecycle policy with deprecate-then-delete sequence.
  • "AMI deregistered but instances still running" — that is the expected behavior; deregister blocks new launches only.
  • "Memory and disk metrics missing on every new instance" — bake the CloudWatch agent into the golden AMI via Image Builder component, not user-data.

Domain 3 is worth 18 percent and TS 3.1 covers AMI, Image Builder, CloudFormation, cross-account/region provisioning, and deployment strategies. Of the roughly 12 questions in this domain, expect 6–10 to involve AMI lifecycle, Image Builder pipeline mechanics, instance refresh tuning, and deployment strategy selection. Mastering this section — in particular the MinHealthyPercentage and KMS-key-grant traps — is high-leverage. Reference: https://d1.awsstatic.com/onedam/marketing-channels/website/aws/en_US/certification/approved/pdfs/docs-sysops-associate/AWS-Certified-SysOps-Administrator-Associate_Exam-Guide.pdf

Decision Matrix — AMI and Deployment Constructs by SysOps Goal

Use this lookup during the exam.

| Operational goal | Primary construct | Notes |
| --- | --- | --- |
| Produce a weekly patched golden AMI | Image Builder pipeline with cron schedule | Components for agent install + OS update. |
| Roll new AMI to fleet automatically | Distribution → launch template $Latest → ASG instance refresh | Three-link chain; all three must be wired. |
| Replace fleet with zero capacity dip | Instance refresh MinHealthyPercentage = 100 | Requires MaxSize > DesiredCapacity headroom. |
| Test new AMI with 5% traffic | ALB weighted target groups + canary ASG | Or instance refresh CheckpointPercentages: [5, 100]. |
| Instant rollback | Blue/green via ALB target group swap | Or CloudFormation AutoScalingReplacingUpdate. |
| Reduce per-instance boot time | Warm pool with hibernated instances | Pre-warmed JIT and caches. |
| Share AMI cross-account (encrypted) | Copy with CMK + share AMI + share snapshot + KMS key policy grant | All four steps required. |
| Distribute AMI to multiple regions | Image Builder distribution config | Per-region encryption keys, launch permissions, naming. |
| Prevent AMI sprawl | Image Builder lifecycle policy | Deprecate → wait → delete sequence. |
| Hide old AMI from launches but keep instances | Deprecate AMI | Or disable the AMI for a stronger block. |
| Block all new launches from an AMI | Disable AMI | Stronger than deprecate; the owner cannot launch either. |
| Bake CloudWatch agent into AMI | Image Builder component amazon-cloudwatch-agent-linux | Not user data. |
| Update fleet via CloudFormation | UpdatePolicy: AutoScalingRollingUpdate | Or AutoScalingReplacingUpdate for blue/green. |
| Pause deployment for canary metrics | Instance refresh CheckpointPercentages + CheckpointDelay | Resume or cancel based on metrics. |
| Avoid replacing already-current instances | SkipMatching: true on instance refresh | Important when resuming a partial refresh. |

Common Traps Recap — AMI, Image Builder, and Deployment

Every SOA-C02 attempt sees two or three of these.

Trap 1: Deregister deletes the AMI completely

It does not. Deregister blocks new launches. Running instances are unaffected. Snapshots survive and cost money until separately deleted.

Trap 2: AWS-managed aws/ebs key is shareable

It is not. Cross-account sharing of encrypted AMIs requires a customer-managed CMK and explicit key policy grants on the partner account.

Trap 3: Detailed monitoring means more metrics

Wrong topic but recurring. Memory and disk metrics require the CloudWatch agent baked into the golden AMI; detailed monitoring only changes the period of existing hypervisor metrics.

Trap 4: ASG references launch template version 5 explicitly

New versions are ignored. Use $Latest or $Default so launch template updates flow to new launches.

Trap 5: Instance refresh is automatic on launch template change

It is not. Updating a launch template version only affects new launches. Existing instances stay until you explicitly start an instance refresh.

Trap 6: Lifecycle policy is safe by default

It is not. The lifecycle policy does not check launch template references and can delete an AMI a launch template still pins to. Use $Latest or generous retention.

Trap 7: Rolling deploy is always zero-downtime

It is not. With MinHealthyPercentage below 100 and short InstanceWarmup, a rolling deploy can produce a 5xx spike. Tune both, plus deregistration delay.

Trap 8: Blue/green has no extra cost

It does. The blue and green fleets run simultaneously during the validation window, doubling compute cost briefly.

Trap 9: User data is the right place for agent install

It is not. Image Builder components bake agents into the AMI; user data is for instance-specific runtime configuration only.

Trap 10: CodeDeploy is the SOA deployment answer

It is not. CodeDeploy and CodeBuild are out of scope for SOA-C02 per the exam guide appendix. SOA uses ASG instance refresh, CloudFormation UpdatePolicy, and Image Builder distribution.

FAQ — AMI, EC2 Image Builder, and Deployment Strategies

Q1: What is the difference between deprecating, disabling, and deregistering an AMI?

Deprecate marks the AMI as not-recommended after a date — it stays launchable by explicit ID for the owning account, ASGs continue to launch from it, but it disappears from search results and console listings for other accounts. Disable blocks all new launches including by the owning account; it is a stronger block, used when you need to stop usage immediately without committing to deletion. Deregister removes the AMI metadata entirely — no new launches from this AMI ID by anyone — but running instances are unaffected and the underlying snapshots remain (and cost money) until you delete them separately. Operational sequence for retiring an AMI: deprecate it (signal don't use), wait, disable it (force stop), wait, deregister it (remove), delete snapshots (reclaim cost).

Q2: My EC2 Image Builder pipeline runs successfully but the ASG never picks up the new AMI. What is wrong?

Three links must all be wired. (a) The Image Builder distribution configuration must reference the target launch template — without this, no new launch template version is created when the pipeline succeeds. (b) The ASG's launch template version must be $Latest or $Default, not a literal version number, so new versions flow automatically. (c) The ASG only launches new instances from the new AMI for new launches — existing instances still run the old AMI until you explicitly start an aws autoscaling start-instance-refresh. Check each link in turn. The most common failure is (b) — an ASG created with a fixed version that ignores newer versions.
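Link (a) is the one most often left unwired. A sketch of an Image Builder distribution configuration that associates the pipeline's output AMI with a launch template (the template ID and naming are hypothetical; field names per the Image Builder CreateDistributionConfiguration API):

```json
{
  "name": "golden-ami-distribution",
  "distributions": [
    {
      "region": "us-east-1",
      "amiDistributionConfiguration": {
        "name": "golden-ami-{{ imagebuilder:buildDate }}"
      },
      "launchTemplateConfigurations": [
        {
          "launchTemplateId": "lt-0abc1234def567890",
          "setDefaultVersion": true
        }
      ]
    }
  ]
}
```

With setDefaultVersion true, each successful build publishes a new launch template version and marks it the default — which pairs with an ASG referencing $Default just as well as $Latest.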

Q3: Why does cross-account AMI sharing fail with a KMS error when the AMI is encrypted?

The encryption permission must be shared in addition to the AMI metadata and snapshot permissions. If the snapshot is encrypted with the AWS-managed aws/ebs key, you cannot share at all — that key is not shareable. The fix is to copy the AMI within the source account specifying a customer-managed CMK as the encryption key, then share the new AMI, share its snapshot, and grant the partner account on the CMK's key policy (kms:Decrypt, kms:DescribeKey, kms:CreateGrant, kms:GenerateDataKey*, kms:ReEncrypt*). All four of those steps are required; missing any one produces an AccessDenied at launch in the partner account.

Q4: What MinHealthyPercentage should I use for an ASG instance refresh?

It depends on the application. For routine version bumps on stateless customer-facing apps, 90 (the default) is a balance between speed and capacity. For zero-dip deployments — for example, stateful sessions or apps with cold-start cost — set 100 so new instances launch first and old ones terminate only after replacements are healthy; this requires the ASG MaxSize to be above DesiredCapacity to allow simultaneous old+new running. For internal batch fleets that do not serve customer traffic, 50 accelerates the deploy. For canary-style deployments, combine MinHealthyPercentage with CheckpointPercentages: [10, 100] and CheckpointDelay of 30+ minutes so the refresh pauses at 10 percent for metric review before continuing. The only setting to avoid in production is 0, which is effectively all-at-once.

Q5: When should I choose blue/green over rolling deployment?

Choose blue/green when (a) you need instant rollback — keeping the old fleet alive during validation lets you flip back in seconds; (b) you cannot tolerate any capacity dip during deploy — both fleets are at full capacity throughout; (c) you can absorb the temporary doubling of compute cost; (d) the application supports two parallel fleets (no shared mutable state, or the state layer is external). Choose rolling when (a) routine version bumps for stateless apps, (b) cost matters and double-compute is unacceptable, (c) the fleet size is small enough that even small batches deploy quickly. The exam giveaway: phrases like "must roll back instantly" or "no capacity dip" point to blue/green; "routine deploy" or "minimize cost" point to rolling.

Q6: Why does my rolling deployment cause a brief HTTP 5xx spike on the load balancer?

Three settings interact. (a) MinHealthyPercentage below 100 means old instances terminate before replacements are fully online; raise toward 100 (with MaxSize > DesiredCapacity) for zero-dip rolling. (b) InstanceWarmup too short means a new instance is counted healthy before its application stack is ready (cold JVM, empty caches); extend warmup to exceed health check stabilization and JIT warm. (c) Target group DeregistrationDelay too short causes in-flight requests on the terminating instance to be dropped; increase to match longest expected request duration (30–60 seconds for short HTTP, up to 3600 for WebSocket). Tune all three for clean rolling deploys.

Q7: Where should I install the CloudWatch agent and SSM agent — in the AMI, in user data, or via Run Command?

Bake them into the AMI via EC2 Image Builder components. AWS provides amazon-cloudwatch-agent-linux (and a Windows equivalent), and the SSM Agent is pre-installed on Amazon Linux 2/2023. User-data installs work but are fragile (they run only at first boot by default and can fail silently) and lengthen boot time. Run Command installs are imperative — they fix instances after the fact, which means every new instance starts unmonitored until Run Command catches up. The SOA-correct pattern is the golden AMI: agents installed and configured in the AMI, user data shrunk to a small per-instance config step (set role tag, fetch secrets, start the app), and every fleet member boots ready to be monitored. This also produces consistent, auditable fleets.

Q8: How do I share AMIs at scale across an AWS Organization?

Three options scale better than per-account sharing. (a) Image Builder distribution config with targetAccountIds specifying all member accounts at once, including per-account launch permissions and KMS keys — declarative and IaC-friendly. (b) AWS Resource Access Manager (RAM) for AMI-as-resource sharing within an organization, but RAM does not handle the KMS key sharing, so you still need key policy grants on the CMK. (c) Centralized AMI account pattern — a dedicated account owns all golden AMIs, distributes via Image Builder, and member accounts reference AMIs by name lookup (Systems Manager Parameter Store with aws:ec2:image data type) rather than ID. The third pattern is the most resilient because launch templates resolve the AMI at launch time from a parameter that points at the latest approved AMI, so retirement and replacement are atomic.
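The centralized-account pattern in (c) hinges on the launch template resolving the AMI at launch time. A sketch, assuming a hypothetical Parameter Store path /golden/web/ami-latest that the AMI account updates after each approved build:

```json
{
  "LaunchTemplateName": "web-lt",
  "LaunchTemplateData": {
    "ImageId": "resolve:ssm:/golden/web/ami-latest",
    "InstanceType": "m5.large"
  }
}
```

Because EC2 resolves the resolve:ssm: reference at each launch, promoting a new golden AMI is a single parameter update — no new launch template version and no per-account AMI ID bookkeeping.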

Q9: What happens if I delete the AMI that an ASG launch template references?

The launch template still holds the AMI ID literally. Existing instances continue to run because they were already launched. New launches — whether for scale-out, instance refresh, or replacement of an unhealthy instance — fail with InvalidAMIID.NotFound. The ASG enters a partial-failure state where it cannot meet desired capacity. The fix is to publish a new launch template version pointing at a valid AMI, set the ASG to use $Latest, and start an instance refresh. Mitigation in the first place: use Image Builder lifecycle policies with generous retention windows so an active AMI is never deleted; pin launch templates to $Latest so they always reference the newest AMI; and tag AMIs so lifecycle policies can exclude AMIs marked "in-use".

Q10: Are CodeDeploy and CodeBuild on the SOA-C02 exam?

No. CodeDeploy, CodeBuild, CodeCommit, and CodeStar are explicitly listed as out-of-scope in the SOA-C02 exam guide appendix. SOA-C02 questions about deployment use AWS-native primitives that are in scope: ASG instance refresh, CloudFormation UpdatePolicy: AutoScalingRollingUpdate and AutoScalingReplacingUpdate, EC2 Image Builder pipelines, Systems Manager Automation runbooks, ALB target group routing for blue/green and canary, Route 53 weighted routing, and EventBridge scheduled triggers. If a question's answer choices include CodeDeploy, that choice is almost always wrong on SOA-C02. The DOP-C02 (DevOps Engineer Professional) exam is where CodeDeploy strategies (in-place, blue/green, hooks, deployment groups) are tested in depth.

Once your AMI factory and deployment workflow are running, the next operational layers are:

  • CloudFormation Stacks and StackSets — orchestrating the surrounding infrastructure the AMI runs in (the second TS 3.1 skill).
  • Systems Manager Automation and Patch Manager — in-place patching of the running fleet between AMI rebakes.
  • EC2 Auto Scaling, ELB, and Multi-AZ HA — the fleet capacity model the AMI supplies.
  • Scheduled Tasks and Config Auto-Remediation — the EventBridge-driven automation that wires Image Builder pipeline triggers and Config-rule-driven remediation runbooks.

Official Sources