High Availability and Multi-AZ Design is the backbone of every production workload that must survive data center failures on AWS. For the SAA-C03 exam, you must be able to read any scenario question and quickly decide which AWS services combine to meet a given availability target — whether that is a stateless web tier behind an Application Load Balancer, an EC2 Auto Scaling Group stretched across three Availability Zones, an RDS Multi-AZ standby, an Aurora cluster with cross-AZ replicas, or Route 53 health-checked DNS failover. Solutions architects live and die by this skill: a large share of Domain 2 questions test whether you can eliminate single points of failure (SPOFs) without overspending.
This page belongs to Domain 2 (Design Resilient Architectures), Task Statement 2.2. It focuses on High Availability and Multi-AZ Design within a single AWS Region. Cross-Region disaster recovery — RTO/RPO targets, pilot light, warm standby, active-active — belongs to the disaster-recovery-strategies sibling topic. For raw database performance characteristics of Aurora or DynamoDB DAX, see high-performing-database-solutions. For API front-door patterns with CloudFront and Global Accelerator, see api-gateway-and-edge.
What Is High Availability and Multi-AZ Design?
High Availability and Multi-AZ Design on AWS means architecting a workload so that the failure of any single component — an instance, a rack, an entire data center, or even a full Availability Zone — does not take the system offline. The canonical High Availability and Multi-AZ Design pattern is to run redundant copies of every tier across two or three Availability Zones, front them with a load balancer or DNS-based router that can detect unhealthy endpoints, and use managed services (RDS Multi-AZ, Aurora, DynamoDB, EFS) that replicate state across AZs automatically.
High Availability (HA) and Fault Tolerance (FT) are related but not identical. HA targets a percentage of uptime — for example 99.95% — and tolerates short interruptions during failover (seconds to minutes). FT is stricter: the system must keep serving with zero user-visible disruption even during a component failure. Multi-AZ RDS is HA (60–120 s failover). Aurora with a cross-AZ replica promoted to writer is HA. DynamoDB is closer to FT because it is natively multi-AZ with no visible failover event. True FT usually requires application-level redundancy, not just infrastructure.
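Those uptime percentages are easier to reason about as downtime budgets. A quick back-of-the-envelope helper (plain Python, no AWS dependency; 730 hours approximates one month):

```python
def downtime_per_month(availability_pct: float, hours_in_month: float = 730.0) -> float:
    """Allowed downtime in minutes per month for a given availability percentage."""
    return (1 - availability_pct / 100) * hours_in_month * 60

# 99.9%  -> ~43.8 minutes/month
# 99.95% -> ~21.9 minutes/month
# 99.99% -> ~4.4 minutes/month
```

Seen this way, 99.9% tolerates a full RDS Multi-AZ failover several times a month, while 99.99% barely tolerates one.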
Why High Availability and Multi-AZ Design Matters for SAA-C03
High Availability and Multi-AZ Design appears on the SAA-C03 exam in three recurring shapes:
- SPOF elimination — "A company runs a web app on a single EC2 instance with an RDS database in a single AZ. Which architecture changes provide the highest availability with the least code change?"
- Service selection — "An application needs sub-millisecond latency at Layer 4 without TCP resets. ALB or NLB?"
- Trap questions — pitting Multi-AZ RDS (HA) against Read Replicas (read scaling), or ALB (Layer 7) against NLB (Layer 4), or cross-zone load balancing on vs off.
The Four Pillars of High Availability and Multi-AZ Design
Every HA question on SAA-C03 can be reduced to four pillars. Drill these into muscle memory.
- Redundancy — more than one of everything that can fail: AZs, instances, NAT gateways, load balancer nodes, database replicas.
- Health detection — ELB target health checks, Route 53 health checks, Auto Scaling health checks, RDS Multi-AZ heartbeat.
- Automated failover — no human in the loop. RDS Multi-AZ DNS flip, Auto Scaling replacement, Route 53 DNS failover, ALB zonal shift.
- Stateless application tier — so any replacement instance can serve any user session without replaying prior state.
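The redundancy pillar has simple math behind it. Assuming failures across AZs are independent — which is exactly what AZ isolation is meant to buy you — availability compounds like this (illustrative sketch):

```python
def parallel_availability(single: float, copies: int) -> float:
    """Availability of N redundant copies with independent failures:
    the tier is down only if every copy is down simultaneously."""
    return 1 - (1 - single) ** copies

# One instance at 99% availability:
#   two copies  -> 99.99%
#   three copies -> 99.9999%
```

This is why "add a second AZ" is so often the correct answer: each independent copy multiplies away another factor of the failure probability.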
- High Availability (HA) — a system designed to stay operational through component failures, usually measured as uptime percentage (99.9%, 99.95%, 99.99%).
- Fault Tolerance (FT) — a stricter property: no user-visible disruption even during a component failure.
- Availability Zone (AZ) — one or more physically separate data centers inside a Region with independent power, cooling, and networking.
- Multi-AZ deployment — a pattern where every tier of the workload runs in at least two AZs.
- Single Point of Failure (SPOF) — any component whose failure can take the whole system down. The entire job of High Availability and Multi-AZ Design is to find and remove SPOFs.
- Elastic Load Balancer (ELB) — AWS managed load balancer (ALB, NLB, GWLB, classic CLB) that distributes traffic across healthy targets across AZs.
- Auto Scaling Group (ASG) — a logical group of EC2 instances that Amazon EC2 Auto Scaling maintains at a desired capacity, replacing unhealthy instances automatically.
AWS Global Infrastructure — Regions, AZs, and Local Zones for HA
Every High Availability and Multi-AZ Design starts with the AWS Global Infrastructure footprint. You cannot design Multi-AZ without knowing the AZ model, and you cannot design multi-Region DR without knowing the Region model.
AZs Are the Atomic Fault Isolation Unit
An AWS Availability Zone is one or more physically separate data centers with independent power, cooling, physical security, and network fabric, isolated from other AZs by meaningful physical distance (typically tens of kilometres) but connected to them by AWS-owned low-latency fibre. Synchronous replication between AZs is feasible — round-trip times are typically in the low single-digit milliseconds — which is why RDS Multi-AZ, Aurora, and EFS can all replicate across AZs without noticeably hurting write latency.
AWS's published design standard is a minimum of three AZs per Region, and every new Region launches with at least three. This three-AZ minimum is what lets you survive a full AZ outage while keeping a quorum in the remaining two AZs (important for databases like Aurora, which use a 4-out-of-6 write quorum across three AZs).
Local Zones and Outposts Extend HA Closer to the User
High Availability and Multi-AZ Design normally stops at the AZ boundary inside a Region, but AWS offers extensions:
- Local Zones — metro-area extensions of a parent Region, useful when you need single-digit-ms latency for users in a specific city. A Local Zone is not a full AZ substitute for HA — it is a single failure domain.
- AWS Outposts — AWS-managed hardware on your premises. Useful for edge workloads but again should not be counted as an AZ for HA purposes.
For true HA, stretch across two or three in-Region AZs. Treat Local Zones and Outposts as latency tools, not availability tools.
Always design across at least two Availability Zones, and preferably three when the service supports it (especially Aurora, RDS Multi-AZ cluster deployment, and any workload that needs quorum-based replication). An architecture that runs in a single AZ is never considered highly available, regardless of how many instances you pack into it.
Multi-AZ Deployment Patterns — EC2, RDS, ElastiCache, EFS
The phrase "Multi-AZ deployment" is not one pattern but a family of per-service patterns. The SAA-C03 exam expects you to know how each major service achieves Multi-AZ.
EC2 Multi-AZ via Auto Scaling Groups
Pure EC2 does not replicate anything. To get Multi-AZ for EC2, you put instances inside an Auto Scaling Group configured with subnets in two or three AZs, and put a load balancer (ALB or NLB) in front. The ASG rebalances instances across AZs automatically. If an AZ goes dark, the ASG spins up replacement instances in the surviving AZs.
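The even-spread behaviour can be modeled in a few lines. This is not the actual Auto Scaling placement algorithm — just a sketch of the balancing outcome (AZ names are illustrative):

```python
def spread_across_azs(desired: int, azs: list[str]) -> dict[str, int]:
    """Model of ASG balancing: spread desired capacity as evenly as
    possible across the configured AZs."""
    base, extra = divmod(desired, len(azs))
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}

azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
spread_across_azs(6, azs)       # 2 instances per AZ
# If us-east-1a goes dark, the ASG relaunches capacity in the survivors:
spread_across_azs(6, azs[1:])   # 3 instances each in 1b and 1c
```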
RDS Multi-AZ — Synchronous Standby
RDS Multi-AZ instance deployment creates a primary in one AZ and a synchronous standby in another AZ. Writes are replicated synchronously. The standby is not readable and does not serve traffic — it exists only to take over during failover. Failover is triggered by host failure, AZ failure, storage failure, or manual reboot with failover, and typically completes in 60–120 seconds via a DNS CNAME flip.
RDS Multi-AZ also supports a newer Multi-AZ DB cluster deployment (available for MySQL and PostgreSQL) that uses two readable standbys in two additional AZs and offers failover typically under 35 seconds.
Aurora Multi-AZ — Distributed Storage Layer
Aurora decouples compute from storage. The storage volume for an Aurora DB cluster is replicated across six copies in three AZs by default, with a 4-of-6 write quorum and 3-of-6 read quorum. Compute nodes (writer + up to 15 readers) can be placed in any AZ, and failover promotes a reader to writer typically in under 30 seconds.
ElastiCache Multi-AZ
ElastiCache Redis supports Multi-AZ with automatic failover when you enable Multi-AZ on a replication group with at least one replica in a different AZ. Memcached does not support cross-AZ replication — it only supports auto-discovery of sharded nodes.
EFS Multi-AZ by Default
Amazon EFS Standard stores files redundantly across multiple AZs in a Region by default. You simply mount the file system from instances in any AZ via mount targets per AZ. No configuration needed — EFS is Multi-AZ out of the box. EFS One Zone is the cheaper single-AZ tier and is not Multi-AZ.
Failover and recovery times worth memorizing:
- RDS Multi-AZ (single standby) — 60–120 seconds typical failover.
- RDS Multi-AZ DB cluster (two readable standbys) — typically under 35 seconds.
- Aurora failover — typically under 30 seconds; with a reader in the target AZ it can be much faster.
- Auto Scaling instance replacement — depends on AMI boot time, usually 2–5 minutes.
- Route 53 health check interval — 30 s standard or 10 s fast; failover decision typically 3 consecutive failures.
- ALB zonal shift — manual, near-immediate DNS removal of a single AZ.
Elastic Load Balancing — ALB vs NLB vs GWLB vs CLB
Elastic Load Balancing is the front door of almost every High Availability and Multi-AZ Design. SAA-C03 asks you to pick among four ELB types. Get the decision tree into muscle memory.
Application Load Balancer (ALB) — Layer 7
ALB operates at HTTP/HTTPS (Layer 7). It can inspect path, host header, query string, cookies, and HTTP methods to route traffic. Target types are instance, IP, and Lambda function; an ALB can itself be registered as a target of an NLB, but not of another ALB.
Key ALB features:
- Host-based and path-based routing — route /api/* to one target group and /static/* to another.
- Native HTTPS and TLS termination with ACM.
- WebSocket and HTTP/2 support.
- Sticky sessions (application-based or duration-based cookies).
- Target health checks per target group.
- Integration with AWS WAF for Layer 7 protection.
- Lambda as target for serverless HTTP APIs.
Use ALB when the workload is HTTP/HTTPS and you want smart routing.
Network Load Balancer (NLB) — Layer 4
NLB operates at TCP / UDP / TLS (Layer 4). It is built for extreme performance — millions of requests per second, ultra-low latency, and static IP addresses per AZ (plus optional Elastic IP).
Key NLB features:
- Static IPs per AZ — useful for whitelisting by IP.
- Preserves source IP by default (no X-Forwarded-For needed).
- TLS termination at the NLB via a TLS listener, or plain TCP passthrough so targets terminate TLS themselves.
- UDP support — the only ELB that handles UDP, required for gaming, VoIP, SIP, and DNS workloads.
- Sub-millisecond latency.
Use NLB when you need raw TCP/UDP performance, static IPs, or UDP protocol support.
Gateway Load Balancer (GWLB) — Layer 3
Gateway Load Balancer operates at IP (Layer 3) and is designed for deploying, scaling, and managing third-party virtual network appliances such as firewalls, intrusion detection systems, and deep packet inspection tools. GWLB uses the GENEVE protocol on port 6081 to tunnel traffic transparently through fleets of appliances.
Use GWLB only for network security appliance insertion patterns. It is not a general application load balancer.
Classic Load Balancer (CLB) — Legacy
CLB is the original ELB from 2009, supporting both Layer 4 and basic Layer 7 for EC2-Classic and simple use cases. AWS now treats it as legacy and recommends ALB or NLB for all new workloads. CLB does appear on SAA-C03 but only as a distractor — rarely the correct answer.
Target Groups
All modern ELBs route to target groups, not directly to targets. A target group is a logical set of targets with a shared health check configuration. A single ALB can route to many target groups based on rules. A single target group can be shared across multiple listeners.
Target types:
- Instance — register EC2 instance IDs. Auto Scaling can manage registration automatically.
- IP — register arbitrary IPs, including on-premises via Direct Connect or peered VPCs.
- Lambda (ALB only) — invoke a Lambda function per request.
- ALB as target (NLB only) — put an NLB in front of an ALB for static IP plus Layer 7 routing.
Health Checks
Every ELB continuously probes its targets. Configurable knobs: protocol (HTTP/HTTPS/TCP), path, port, healthy threshold, unhealthy threshold, timeout, interval, success codes. A target is marked unhealthy after N consecutive failures and removed from rotation; it is marked healthy again after N consecutive successes.
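The consecutive-threshold logic can be sketched as a tiny state machine — a model of the behaviour, not the ELB implementation:

```python
class TargetHealth:
    """Model of ELB health evaluation: a target flips state only after
    N consecutive results in the opposite direction."""
    def __init__(self, healthy_threshold: int = 3, unhealthy_threshold: int = 2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self.streak = 0  # consecutive results disagreeing with current state

    def record(self, check_passed: bool) -> bool:
        if check_passed == self.healthy:
            self.streak = 0  # result agrees with current state; reset streak
        else:
            self.streak += 1
            needed = (self.unhealthy_threshold if self.healthy
                      else self.healthy_threshold)
            if self.streak >= needed:
                self.healthy = check_passed
                self.streak = 0
        return self.healthy
```

With unhealthy_threshold=2, a single failed probe does not pull a target out of rotation — only the second consecutive failure does, which is exactly why a brief GC pause rarely triggers removal.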
Cross-Zone Load Balancing
Without cross-zone load balancing, each AZ-local ELB node only distributes traffic to targets in its own AZ. With cross-zone load balancing on, each node distributes to targets in every AZ.
- ALB — cross-zone load balancing is always on and free.
- NLB — cross-zone is off by default and, when enabled, generates inter-AZ data transfer charges.
- GWLB — cross-zone is off by default.
- CLB — cross-zone is off by default.
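The cost of leaving cross-zone off with an uneven fleet is easy to quantify. A sketch using a hypothetical two-AZ fleet with 2 and 8 targets:

```python
def per_target_share(targets_per_az: dict[str, int], cross_zone: bool) -> dict[str, float]:
    """Fraction of total traffic each target in a given AZ receives.
    Without cross-zone, each AZ's load balancer node keeps its share in-AZ;
    with cross-zone, every node spreads across all targets equally."""
    total = sum(targets_per_az.values())
    azs = len(targets_per_az)
    if cross_zone:
        return {az: 1 / total for az in targets_per_az}
    # each AZ node receives 1/azs of the traffic, split over local targets only
    return {az: (1 / azs) / n for az, n in targets_per_az.items()}

per_target_share({"az-a": 2, "az-b": 8}, cross_zone=False)
# az-a targets each carry 25% of all traffic; az-b targets each carry 6.25%
per_target_share({"az-a": 2, "az-b": 8}, cross_zone=True)
# every target carries 10%
```

This 4x per-target imbalance is the classic symptom an exam scenario describes when the intended answer is "enable cross-zone load balancing."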
Scenario: "The application uses HTTP and needs path-based routing, TLS termination, and WebSocket support." → ALB. Do not pick NLB just because "NLB is faster." NLB cannot do path-based routing.
Scenario: "The application is a gaming server using UDP and needs static IPs for firewall whitelisting." → NLB. UDP support and static IPs are NLB-only.
Scenario: "The application needs to insert a third-party firewall appliance transparently." → GWLB.
EC2 Auto Scaling Groups — Launch Templates, Policies, Hooks
An Auto Scaling Group (ASG) is the AWS mechanism that keeps your EC2 fleet alive and right-sized inside High Availability and Multi-AZ Design. Master its moving parts: launch templates, scaling policies, lifecycle hooks, health checks, AZ rebalancing, and warm pools.
Launch Templates vs Launch Configurations
The ASG needs a blueprint for new instances. Two options exist, but only one is recommended.
- Launch Templates — the modern, versioned blueprint. Supports all EC2 features (Spot mixing, T2/T3 Unlimited, Nitro Enclaves, capacity reservations, multiple instance types per ASG). Use launch templates.
- Launch Configurations — the legacy, non-versioned blueprint. Still supported but lacks newer features. AWS recommends migrating off.
A launch template has versions ($Latest, $Default, or a specific version number) so you can roll AMI updates forward and back.
Scaling Policy Types
The ASG responds to load using one or more scaling policies. SAA-C03 tests all five approaches below.
Target Tracking Scaling
You specify a target metric value — for example "keep average CPU at 50%" — and Auto Scaling automatically computes the instances needed to hit that target. It creates CloudWatch alarms behind the scenes. Target tracking is the default recommendation for most workloads because it is the simplest to reason about.
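A simplified model of the proportional math (the real policy also applies cooldowns, instance warm-up, and alarm evaluation periods):

```python
import math

def target_tracking_desired(current_capacity: int, metric_value: float,
                            target_value: float) -> int:
    """Simplified target-tracking model: scale capacity proportionally so the
    per-instance metric lands on the target, rounding up."""
    return max(1, math.ceil(current_capacity * metric_value / target_value))

# 4 instances averaging 80% CPU against a 50% target:
target_tracking_desired(4, metric_value=80.0, target_value=50.0)  # -> 7
```

Scale-in works the same way in reverse: if average CPU drops to 25% against a 50% target, the computed capacity halves.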
Step Scaling
Based on CloudWatch alarms with multiple steps. For example: "If CPU > 70%, add 1 instance. If CPU > 85%, add 3 instances. If CPU > 95%, add 5 instances." More reactive than target tracking for spiky workloads, and gives you control over the magnitude of scaling action per breach level.
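The step policy in the example can be modeled as a lookup against breach levels — a sketch of the decision table, not the CloudWatch evaluation engine:

```python
def step_adjustment(cpu: float) -> int:
    """Evaluate the example step policy: the highest matching step wins,
    and bigger breaches trigger bigger scale-out adjustments."""
    steps = [(95.0, 5), (85.0, 3), (70.0, 1)]  # (lower bound, instances to add)
    for bound, add in steps:
        if cpu > bound:
            return add
    return 0  # below every step: no scaling action

step_adjustment(72.0)  # -> 1
step_adjustment(90.0)  # -> 3
step_adjustment(99.0)  # -> 5
```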
Simple Scaling
One CloudWatch alarm triggers one scaling adjustment. After the adjustment, a cooldown period blocks further scaling actions until it expires. This is the oldest form and generally superseded by step scaling.
Predictive Scaling
Uses machine learning to forecast traffic based on historical CloudWatch data (minimum 24 hours of data, ideally 14 days). It proactively launches capacity ahead of expected demand, avoiding the lag inherent in reactive scaling. Predictive is ideal for workloads with daily or weekly seasonality (for example a retail site with a lunch-time traffic spike).
Scheduled Scaling
Scales on a cron-like schedule. Example: "At 09:00 on weekdays, set desired capacity to 20; at 19:00, set desired capacity to 5." Useful when you know the pattern rather than having Auto Scaling learn it.
Lifecycle Hooks
Lifecycle hooks let you pause an instance during its launch or termination transition so that a custom action can run. There are two hook types:
- autoscaling:EC2_INSTANCE_LAUNCHING — pause after instance launch, before putting it into service. Use to install software, run bootstrap scripts, or pre-warm caches.
- autoscaling:EC2_INSTANCE_TERMINATING — pause before terminating. Use to drain connections, upload final logs, or complete in-flight transactions.
Hooks put the instance into Pending:Wait or Terminating:Wait state, with a default timeout of one hour (3,600 seconds). Your script calls complete-lifecycle-action to release the hook, or record-lifecycle-action-heartbeat to extend the wait — up to a maximum of 48 hours.
Health Checks and Instance Replacement
An ASG checks each instance via two signals:
- EC2 status checks — the default health source; a failed instance is replaced.
- ELB health checks — opt-in via health-check-type ELB; an instance failing ELB target health is replaced by the ASG even if its EC2 status is still OK. This is essential so that a hung application (process stuck but OS alive) gets replaced.
Combine EC2 and ELB health checks for robust High Availability and Multi-AZ Design.
AZ Balance and Rebalance
An ASG automatically balances instances across its configured AZs. If one AZ temporarily cannot launch capacity (for example capacity shortage), the ASG concentrates in the others and rebalances once capacity returns.
Warm Pools
A warm pool is a pool of pre-initialized instances kept in Stopped state. When scale-out happens, the ASG promotes a warm instance to InService rather than launching from scratch — bringing startup time from minutes down to seconds. Useful for workloads with long AMI boot times.
- Unpredictable but smooth load → target tracking on CPU, request count, or a custom metric.
- Spiky load with known magnitudes → step scaling for a differentiated response.
- Known schedule (business hours, marketing campaign) → scheduled scaling.
- Repeatable daily/weekly pattern → predictive scaling layered on top of target tracking.
- Slow instance boot → add a warm pool.
- Need to run bootstrap before serving → add a launch lifecycle hook.
- Need to drain before killing → add a terminate lifecycle hook.
RDS Multi-AZ vs Read Replicas — The Most Tested Distinction
This is the single most frequently tested boundary in the entire High Availability and Multi-AZ Design topic on SAA-C03. Memorize it exactly.
RDS Multi-AZ — for Availability
- Purpose — survive an AZ-level failure.
- Replication — synchronous from primary to standby.
- Readable — the standby is not readable by applications. It only exists to take over.
- Failover — automatic via DNS CNAME flip on primary failure, host issue, network partition, or manual reboot with failover.
- Failover time — 60–120 seconds for single-standby deployments; ~35 seconds for Multi-AZ DB cluster with two readable standbys (MySQL/PostgreSQL).
- Cost — roughly double the single-AZ price (you pay for the standby compute and storage).
- Where the standby lives — same Region, different AZ. Not cross-Region.
RDS Read Replica — for Scaling Reads
- Purpose — offload read-heavy workloads from the primary.
- Replication — asynchronous, so replicas can lag slightly behind the primary.
- Readable — yes, applications connect to replicas for SELECT queries. You get a separate endpoint per replica.
- Failover — no automatic failover. You can manually promote a replica to become a standalone primary (this breaks the replication link).
- Cross-Region — read replicas can live in a different Region (MySQL, MariaDB, PostgreSQL, Oracle). Multi-AZ standby cannot.
- Number — up to 5 read replicas per source RDS instance (15 for Aurora, across AZs and Regions).
Can You Combine Them?
Yes. A common High Availability and Multi-AZ Design pattern is: primary in AZ-a (Multi-AZ enabled, so synchronous standby in AZ-b), plus two asynchronous read replicas — one in AZ-c and one in a different Region for cross-Region DR. You get HA and read scaling simultaneously.
- Scenario emphasizes availability, failover, surviving an AZ outage, automatic recovery → RDS Multi-AZ.
- Scenario emphasizes offloading read queries, scaling read traffic, reporting, analytics → RDS Read Replica.
- Scenario emphasizes both HA and read scale → both (Multi-AZ primary + read replicas).
- Scenario needs a DR site in another Region → cross-Region Read Replica (or Aurora Global Database).
Amazon Aurora DB Cluster — Storage-Compute Separation
Aurora is AWS's purpose-built cloud-native relational database engine, compatible with MySQL and PostgreSQL. It changes the game for High Availability and Multi-AZ Design by decoupling compute from storage.
Aurora Storage Architecture
- Storage volume — a single distributed volume that automatically grows in 10 GB segments, up to 128 TB.
- Six replicas across three AZs — two copies per AZ.
- Write quorum — 4 of 6 must acknowledge a write.
- Read quorum — 3 of 6 must be available to serve a read.
- Self-healing — Aurora continuously scrubs data blocks, repairing any that fail checksum from the other replicas.
This architecture lets Aurora tolerate the loss of an entire AZ (two of the six copies) without data loss or write-path interruption.
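The quorum arithmetic is worth internalizing. A sketch of why losing a full AZ (two of the six copies) leaves both quorums intact:

```python
def aurora_quorum_ok(copies_lost: int) -> dict[str, bool]:
    """Aurora keeps 6 storage copies across 3 AZs (2 per AZ);
    writes need 4 of 6 available, reads need 3 of 6."""
    available = 6 - copies_lost
    return {"writes": available >= 4, "reads": available >= 3}

aurora_quorum_ok(2)  # full AZ lost        -> writes and reads both OK
aurora_quorum_ok(3)  # AZ plus one more    -> writes blocked, reads still OK
```

Note the asymmetry: Aurora can lose an AZ plus one additional copy and still serve reads while it repairs, but writes require the 4-of-6 majority.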
Aurora Compute Architecture
- Writer instance — one primary DB instance.
- Reader instances — up to 15 Aurora replicas, all reading from the same shared storage volume with replication lag typically under 100 ms.
- Failover — when the writer fails, Aurora promotes a reader to writer. Typical failover time is under 30 seconds, and if a reader already exists in the target AZ, it can be much faster.
- Reader endpoint — a single DNS name that load-balances across all reader instances.
Aurora Serverless v2
Aurora Serverless v2 auto-scales compute in fine-grained increments (ACUs) between 0.5 and 128 ACU, adjusting within seconds. Ideal for workloads with unpredictable traffic, infrequent access, or development/test environments. Still provides the same Multi-AZ storage guarantees.
Aurora Global Database
For cross-Region HA, Aurora Global Database replicates from a primary Region to up to 5 secondary Regions with typical replication lag under 1 second. RTO is measured in minutes for managed failover and under 1 minute for unplanned failover. This is the strongest cross-Region DR option for relational databases on AWS, and it belongs in the disaster-recovery-strategies topic for detailed treatment.
Aurora vs RDS — When to Choose Each
Choose Aurora when you want:
- Faster failover (under 30 s vs 60–120 s for RDS Multi-AZ).
- More read replicas (15 vs 5).
- Storage that auto-grows without capacity planning.
- Better performance — Aurora MySQL is often marketed at ~5x standard MySQL throughput; PostgreSQL at ~3x.
- Cross-Region DR via Global Database.
Choose RDS (non-Aurora) when you need:
- Engines Aurora does not offer (SQL Server, Oracle, MariaDB).
- Lower cost for small workloads.
- Simpler operational model.
RDS Proxy — Connection Pooling for Serverless Workloads
Lambda functions and serverless apps create a new database connection per invocation. At scale this overwhelms RDS, because every connection consumes memory and CPU on the primary. RDS Proxy sits between the client and the database, maintaining a pool of long-lived database connections and multiplexing incoming client connections onto them.
What RDS Proxy Provides
- Connection pooling — dramatically reduces the load on the primary database from high-concurrency clients.
- Faster failover — during RDS Multi-AZ failover, RDS Proxy can reduce failover time by up to 66% because it absorbs the DNS flip on behalf of clients.
- IAM authentication — proxy can enforce IAM-based DB authentication.
- Secrets Manager integration — credentials stored and rotated automatically.
Use RDS Proxy whenever Lambda or another high-concurrency client connects to RDS / Aurora. It is a High Availability and Multi-AZ Design best practice, not just a performance tool.
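The pooling idea itself is simple to model. A toy sketch — not the RDS Proxy implementation — showing how a fixed pool caps real database connections no matter how many clients arrive:

```python
class ConnectionPool:
    """Toy model of connection multiplexing: many client sessions share a
    small, fixed pool of database connections instead of opening one each."""
    def __init__(self, max_db_connections: int):
        self.max_db = max_db_connections
        self.in_use = 0

    def borrow(self) -> bool:
        """A client query borrows a pooled connection; False means it queues."""
        if self.in_use < self.max_db:
            self.in_use += 1
            return True
        return False

    def release(self) -> None:
        self.in_use -= 1

pool = ConnectionPool(max_db_connections=10)
# 1,000 concurrent Lambda invocations never open more than 10 DB connections:
granted = sum(pool.borrow() for _ in range(1000))
# granted == 10; the other 990 queue at the proxy instead of exhausting the DB
```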
DynamoDB Multi-AZ and Global Tables
Amazon DynamoDB is natively Multi-AZ inside a Region. You do not configure Multi-AZ — every DynamoDB table synchronously replicates writes to three copies across three AZs, and reads by default are eventually consistent but can be upgraded to strongly consistent on request.
DynamoDB Global Tables — Multi-Region Active-Active
Global Tables replicate a DynamoDB table across multiple Regions, all of them writable (active-active multi-master). Replication is asynchronous with typical lag under one second. Last-writer-wins conflict resolution is used when two Regions write to the same item concurrently.
Use Global Tables for:
- Globally distributed apps with low-latency writes from every Region.
- Disaster recovery for critical session or metadata stores.
- Compliance setups that need data present in multiple jurisdictions.
DynamoDB Global Tables is the NoSQL counterpart to Aurora Global Database in the High Availability and Multi-AZ Design toolbox.
Route 53 — Health Checks and Failover Routing
Route 53 is the DNS front door that ties everything together. It adds availability above the ELB layer by steering traffic away from failed endpoints or entire Regions.
Route 53 Health Checks
A Route 53 health check continuously probes an endpoint (public IP, ALB, or any URL) or evaluates the state of other health checks (calculated health check) or a CloudWatch alarm. Three types:
- Endpoint health checks — directly probe an IP or domain on TCP, HTTP, or HTTPS. Intervals of 30 s (standard) or 10 s (fast). Health is decided by N consecutive successes or failures.
- Calculated health checks — boolean combinations of child health checks (e.g., "Endpoint A is healthy AND at least 2 of 3 child checks pass").
- CloudWatch alarm health checks — treat a CloudWatch alarm state as the signal. Useful when health cannot be determined by probing (e.g., internal queue depth).
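A calculated health check boils down to a threshold over its child checks. A minimal model:

```python
def calculated_check(child_results: list[bool], threshold: int) -> bool:
    """Model of a Route 53 calculated health check: healthy when at least
    `threshold` of the child health checks report healthy."""
    return sum(child_results) >= threshold

calculated_check([True, True, False], threshold=2)   # -> True
calculated_check([True, False, False], threshold=2)  # -> False
```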
Route 53 Routing Policies
Seven routing policies power failover and traffic steering. For SAA-C03, five are exam-critical.
Simple Routing
One record pointing at one or more values (no health checks, no intelligence). If you return multiple IPs, the client picks one.
Failover Routing
A primary and secondary record, each associated with a health check. Route 53 returns the primary when it is healthy and fails over to the secondary when it is not. Classic active-passive DR pattern.
Latency Routing
Returns the record that has the lowest measured network latency to the querying resolver. Great for multi-Region active-active where users should be served by the nearest Region.
Weighted Routing
Splits traffic by configurable weights — e.g., 90/10 between prod and canary. Used for blue/green deploys and A/B tests. Can be combined with health checks so a zero-weight record is skipped automatically.
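Weighted routing returns each record with probability weight divided by total weight. A deterministic sketch (hostnames are illustrative; the roll parameter stands in for Route 53's random draw so the function is testable):

```python
def weighted_pick(records: list[tuple[str, int]], roll: int) -> str:
    """Pick a record by cumulative weight. `roll` must be in [0, total_weight)."""
    cumulative = 0
    for value, weight in records:
        cumulative += weight
        if roll < cumulative:
            return value
    raise ValueError("roll out of range")

records = [("prod-alb.example.com", 95), ("canary-alb.example.com", 5)]
weighted_pick(records, roll=10)  # prod  (rolls 0-94 land on prod)
weighted_pick(records, roll=97)  # canary (rolls 95-99)
```

Setting the canary weight to 0 removes it from rotation entirely, which is how a weighted rollback works.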
Geolocation Routing
Returns a record based on the geographic location of the DNS resolver. Used for content localization or compliance (e.g., EU users must only hit EU Regions).
Geoproximity Routing
Similar to geolocation but based on coordinates and a bias parameter to shift traffic toward or away from a Region.
Multi-Value Answer Routing
Returns up to 8 healthy records at random, each associated with an optional health check. It is not a replacement for an ELB, but it is a lightweight way to spread clients across multiple IPs. Unlike simple routing, multi-value answer respects health checks.
Combining Routing Policies for HA
Real HA designs often nest routing policies:
- Primary region latency routing → failover to secondary region — latency-based routing for normal traffic, with a failover backup record pointing at a DR Region.
- Weighted across Regions with health checks — each Region's weight is set, and unhealthy Regions are skipped automatically.
- Geolocation with failover — serve EU customers from eu-central-1, but fail over to eu-west-1 if the primary is down.
"Send traffic to the closest AWS Region for the lowest latency" → latency routing (not geolocation). "Direct users in Germany to the EU Region for GDPR compliance" → geolocation routing. "Blue/green deploy with 95/5 split" → weighted routing. "Active-passive DR with automated DNS failover" → failover routing. "Return several healthy IPs so the client can pick one" → multi-value answer routing. Source ↗
Eliminating Single Points of Failure — Redundancy Patterns
The discipline of High Availability and Multi-AZ Design is fundamentally the discipline of SPOF elimination. Walk every tier and ask: "What happens if this exact thing fails?" If the answer is "everything goes down," that's a SPOF.
Classic SPOFs and Their Fixes
- Single EC2 instance → ASG across 2-3 AZs behind an ALB.
- Single AZ database → RDS Multi-AZ or Aurora cluster.
- Single NAT Gateway → one NAT Gateway per AZ, each in its own AZ's public subnet, with subnet-specific route tables.
- Single Internet Gateway → not actually a SPOF: an IGW is a horizontally scaled, redundant, highly available VPC component, so no action is needed (a VPC can only have one anyway).
- Single Direct Connect link → pair with a Site-to-Site VPN as backup, or provision a second Direct Connect through a different location.
- Single Region → cross-Region DR (see disaster-recovery-strategies).
- Session state stored on one EC2 instance → externalize to DynamoDB, ElastiCache, or RDS so any instance can serve any user.
- File uploads on instance local disk → externalize to S3 or EFS.
NAT Gateway HA Pattern
One common question on SAA-C03: where to place NAT Gateways.
- Wrong — one NAT Gateway in a single public subnet, with all private subnets (across three AZs) routing through it. This creates a SPOF: if that AZ fails, all outbound internet is dead.
- Right — one NAT Gateway per AZ, in each AZ's public subnet, with each private subnet routing to its own AZ's NAT Gateway. Costs more but eliminates the SPOF.
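The difference between the two patterns shows up directly in a failure simulation (AZ and NAT gateway names are illustrative):

```python
def azs_with_egress(route_tables: dict[str, str], failed_az: str,
                    nat_az: dict[str, str]) -> list[str]:
    """Which AZs' private subnets still reach the internet after failed_az
    goes down? route_tables maps subnet AZ -> NAT gateway id;
    nat_az maps NAT gateway id -> the AZ it lives in."""
    return [az for az, nat in route_tables.items()
            if az != failed_az and nat_az[nat] != failed_az]

nat_az = {"nat-a": "az-a", "nat-b": "az-b", "nat-c": "az-c"}
single = {"az-a": "nat-a", "az-b": "nat-a", "az-c": "nat-a"}   # SPOF pattern
per_az = {"az-a": "nat-a", "az-b": "nat-b", "az-c": "nat-c"}   # HA pattern

azs_with_egress(single, "az-a", nat_az)  # -> [] : every AZ lost egress
azs_with_egress(per_az, "az-a", nat_az)  # -> ['az-b', 'az-c']
```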
Stateless vs Stateful in High Availability and Multi-AZ Design
The single most important principle in High Availability and Multi-AZ Design is: make the application tier stateless. A stateless tier can be scaled, replaced, and load-balanced freely without needing to coordinate state.
What Makes an Application Tier Stateless
- No server-side session stored on the instance. Session lives in DynamoDB, ElastiCache, or a signed cookie.
- No user-uploaded files on instance disk. Files go to S3 or EFS.
- No in-memory caches that cannot be rebuilt on demand. Use ElastiCache for shared caches.
- Database connections are lightweight or pooled via RDS Proxy.
- Configuration comes from Parameter Store, Secrets Manager, or environment variables, not baked into an instance.
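Statelessness is easiest to see in code. A toy sketch in which two "instances" share one external session store (a plain dict standing in for DynamoDB or ElastiCache; names are illustrative):

```python
class WebInstance:
    """Stateless app instance: all session reads/writes hit the shared
    store, so any instance can serve any user, and any instance can be
    replaced without losing a session."""
    def __init__(self, store: dict):
        self.store = store  # external session store, not instance memory

    def login(self, user: str) -> None:
        self.store[user] = {"cart": []}

    def add_to_cart(self, user: str, item: str) -> list:
        self.store[user]["cart"].append(item)
        return self.store[user]["cart"]

shared_sessions = {}                 # stands in for DynamoDB/ElastiCache
a = WebInstance(shared_sessions)
b = WebInstance(shared_sessions)
a.login("alice")
a.add_to_cart("alice", "book")
b.add_to_cart("alice", "pen")        # a different instance continues the session
# shared_sessions["alice"]["cart"] == ["book", "pen"]
```

Had each instance kept sessions in its own memory, the load balancer routing "alice" to instance b would have dropped her cart — the exact failure mode the stateless principle prevents.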
What Remains Stateful and How to HA It
Some state is irreducible — the database, the file store, the queue. These stateful services are the ones you HA via AWS managed services:
- Database — RDS Multi-AZ, Aurora, DynamoDB, ElastiCache Multi-AZ.
- Files — S3 (natively 11-nines durable, multi-AZ), EFS (multi-AZ), FSx with Multi-AZ deployment.
- Queues — SQS (managed, Multi-AZ).
- Streaming — Kinesis (managed, Multi-AZ).
If your application tier holds user state on an instance, it is neither horizontally scalable nor highly available — replacing an instance means losing a user's session. Externalize session state to a shared backing store before claiming High Availability and Multi-AZ Design. This is the single most common architectural defect in SAA-C03 scenarios.
Immutable Infrastructure — Blue/Green and Rolling Updates
Deployments are a hidden availability risk. A bad release can take down the system as reliably as an AZ outage. High Availability and Multi-AZ Design therefore includes deployment patterns.
Immutable Infrastructure Principle
Instead of patching running instances in place (which drifts configurations and risks breakage), you bake a new AMI or container image for every change and replace instances. Once deployed, an instance is never mutated — it is either running or replaced.
Blue/Green Deployment
- Blue — the currently live production environment.
- Green — a second, parallel environment with the new version.
- Switch — flip traffic from Blue to Green via Route 53 weighted records, ALB target group swap, or DNS CNAME change. If Green misbehaves, roll back by flipping traffic back to Blue.
Blue/green gives near-zero-downtime deployments and fast rollback at the cost of running double capacity during the cutover window.
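The traffic-flip step can be sketched as a weighted DNS decision. This is a simulation of Route 53 weighted-record behavior, not the real API (a real cutover calls ChangeResourceRecordSets or swaps ALB target groups); the weights and environment names are illustrative:

```python
# Sketch: the traffic-flip step of a blue/green cutover, modeled as Route 53
# weighted records. Rollback is the same operation with the weights reversed.
import random

def route(weights, rng):
    """Pick an environment for one DNS query, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

rng = random.Random(0)                      # seeded for a repeatable demo

weights = {"blue": 100, "green": 0}         # phase 1: all traffic on Blue
assert route(weights, rng) == "blue"

weights = {"blue": 90, "green": 10}         # phase 2: 10% canary onto Green
sample = [route(weights, rng) for _ in range(1000)]
green_share = sample.count("green") / len(sample)
assert 0.05 < green_share < 0.2             # roughly one query in ten

weights = {"blue": 0, "green": 100}         # phase 3: full cutover
assert route(weights, rng) == "green"
```

The double-capacity cost shows up in phases 1–2, where Blue and Green must both be fully provisioned.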
Rolling Update
Replace instances in batches within the same ASG. Each batch launches new instances, waits for them to become healthy behind the ELB, then terminates the old ones. Cheaper than blue/green but has a slower rollback path.
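The batch loop can be sketched like this. Health checking is simulated with a callback; a real implementation would poll the ELB target group, and the instance names are made up:

```python
# Sketch: the batching logic of a rolling update within one ASG. If a new
# batch fails its health checks, the update aborts and the remaining old
# instances keep serving — which is why rollback is slower than blue/green.
def rolling_update(old_fleet, batch_size, healthy):
    fleet = list(old_fleet)
    for start in range(0, len(fleet), batch_size):
        stop = min(start + batch_size, len(fleet))
        new_batch = [f"v2-{i}" for i in range(start, stop)]
        if not all(healthy(inst) for inst in new_batch):
            return fleet           # abort: old capacity still in service
        fleet[start:stop] = new_batch   # old batch terminated, new batch live
    return fleet

# Happy path: four instances replaced two at a time.
result = rolling_update(["v1-0", "v1-1", "v1-2", "v1-3"], 2, lambda inst: True)
assert result == ["v2-0", "v2-1", "v2-2", "v2-3"]

# Failure path: the first new batch is unhealthy, so nothing is replaced.
assert rolling_update(["v1-0", "v1-1"], 2, lambda inst: False) == ["v1-0", "v1-1"]
```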
CodeDeploy and ASG Refresh
AWS CodeDeploy supports both blue/green and rolling patterns. EC2 Auto Scaling offers Instance Refresh to replace instances in a controlled wave matching the launch template's current version.
Monitoring High Availability — CloudWatch, X-Ray, and Route 53
You cannot defend availability you cannot measure. High Availability and Multi-AZ Design includes the monitoring stack.
Amazon CloudWatch
- Metrics — per-instance, per-ELB, per-ASG, per-RDS metrics.
- Alarms — trigger SNS notifications, Lambda remediations, or Auto Scaling actions.
- Composite alarms — combine multiple alarms with AND/OR logic for cleaner alerting.
- CloudWatch Synthetics — canaries (headless browsers) that continuously probe user-facing endpoints.
- CloudWatch Logs Insights — run interactive queries across log groups with a purpose-built query language.
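The composite-alarm idea from the list above reduces to boolean logic over child alarm states. A minimal sketch (the rule format and alarm names here are illustrative, not the CloudWatch alarm-rule syntax):

```python
# Sketch: composite-alarm evaluation — combine child alarm states with AND/OR
# so ops only pages when, say, error rate AND latency are both in ALARM.
def composite(rule, states):
    """rule: ('AND'|'OR', [child alarm names]); states: name -> 'ALARM'|'OK'."""
    op, children = rule
    in_alarm = [states[c] == "ALARM" for c in children]
    return all(in_alarm) if op == "AND" else any(in_alarm)

states = {"HighErrorRate": "ALARM", "HighLatency": "OK"}

# OR fires on either child; AND waits for both — fewer noisy pages.
assert composite(("OR", ["HighErrorRate", "HighLatency"]), states) is True
assert composite(("AND", ["HighErrorRate", "HighLatency"]), states) is False
```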
AWS X-Ray
Distributed tracing across microservices. Each request is tagged with a trace ID, and each service reports its spans. X-Ray service maps visualize where latency and errors originate — essential when diagnosing an HA architecture with many moving parts.
Route 53 Health Check Metrics
Each Route 53 health check is itself a CloudWatch metric. Alarm on "health check failed" to alert ops teams even when DNS has already failed over — failover that happens silently is still a failure worth investigating.
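The unhealthy transition follows a consecutive-failure rule. A simplified simulation (Route 53's real checker fleet also requires consecutive successes to recover, and aggregates many checkers worldwide — this models only the threshold logic):

```python
# Sketch: a Route 53-style health check marking an endpoint unhealthy after
# N consecutive failed probes. The default failure threshold is 3.
def evaluate(probes, failure_threshold=3):
    """probes: booleans in time order (True = probe succeeded). Final health."""
    consecutive_failures = 0
    healthy = True
    for ok in probes:
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            healthy = False
        elif ok:
            healthy = True   # simplified: real recovery also needs N successes
    return healthy

assert evaluate([True, False, False, True]) is True    # never 3 failures in a row
assert evaluate([True, False, False, False]) is False  # threshold reached
```

With 30 s standard probes and a threshold of 3, detection alone takes about 90 seconds before DNS answers change.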
Service Quotas and Throttling — HA in Standby Environments
Even if your architecture is perfect, a hidden quota can sabotage availability during a regional failover. Examples:
- ENI/IP capacity per subnet — scale-out can exhaust the free IP addresses in a small subnet CIDR.
- Elastic IP limits per Region — each Region has a soft quota (5 by default).
- RDS instance quota — may block promoting a DR replica.
- ALB target group limits — number of targets per group.
- Auto Scaling Group max size — raise before failover testing.
Before declaring a DR Region ready, request quota increases for the services you'll need to scale up. Quota increases are approved asynchronously — do it before the disaster, not during.
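That audit can be automated as a simple comparison of planned failover capacity against current quotas. The quota names and numbers below are illustrative (a real check would read them from the Service Quotas API):

```python
# Sketch: a pre-failover quota audit — flag every service whose soft quota in
# the DR Region is below what the failover plan needs, so increases can be
# requested before the disaster rather than during it.
def quota_gaps(required, quotas):
    """Return {service: (current_quota, needed)} for every shortfall."""
    return {svc: (quotas.get(svc, 0), need)
            for svc, need in required.items()
            if quotas.get(svc, 0) < need}

required = {"elastic_ips": 8, "asg_max_size": 60, "rds_instances": 12}
quotas   = {"elastic_ips": 5, "asg_max_size": 100, "rds_instances": 40}

gaps = quota_gaps(required, quotas)
# Only Elastic IPs fall short (default soft quota of 5 vs 8 needed):
assert gaps == {"elastic_ips": (5, 8)}
```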
Key Numbers and Must-Memorize Facts for High Availability and Multi-AZ Design
- Minimum AZs per Region — 3.
- RDS Multi-AZ failover time — 60–120 seconds (single-standby); under 35 s for Multi-AZ DB cluster.
- Aurora failover time — typically under 30 seconds.
- RDS read replicas per source — up to 15 for MySQL, MariaDB, and PostgreSQL (5 for SQL Server); up to 15 Aurora Replicas per cluster.
- Aurora storage copies — 6 copies across 3 AZs (4-of-6 write quorum, 3-of-6 read quorum).
- Route 53 health check intervals — 30 s standard or 10 s fast.
- Route 53 multi-value answer — returns up to 8 healthy records.
- ALB cross-zone — always on, free.
- NLB cross-zone — off by default, inter-AZ charges apply when enabled.
- ASG cooldown — default 300 seconds for simple scaling.
- Lifecycle hook default timeout — 1 hour; max 48 hours via heartbeat extension.
- DynamoDB replication — synchronous across 3 AZs within a Region.
- S3 Standard durability — 11 nines across ≥3 AZs.
- EFS Standard — Multi-AZ by default; EFS One Zone is single-AZ only.
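The Aurora quorum numbers above are worth a quick sanity check, because they encode why a whole AZ can fail without losing writes:

```python
# Sketch: why Aurora's 4-of-6 write / 3-of-6 read quorums work. Any read
# quorum must overlap any write quorum, so a read always intersects the
# copies that acknowledged the latest committed write.
COPIES, WRITE_Q, READ_Q = 6, 4, 3

# Overlap guarantee: W + R > N means every read set meets every write set.
assert WRITE_Q + READ_Q > COPIES

# Fault tolerance implied by the quorums:
assert COPIES - WRITE_Q == 2   # lose an entire AZ (2 copies) and writes continue
assert COPIES - READ_Q == 3    # lose an AZ plus one more copy and reads continue
```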
Common Exam Traps in High Availability and Multi-AZ Design
Trap 1 — Multi-AZ vs Read Replica
The single most tested confusion. Multi-AZ = HA, synchronous, non-readable, auto-failover. Read Replica = read scaling, asynchronous, readable, manual promotion. Read the scenario stem carefully: "survive AZ outage" → Multi-AZ; "offload read traffic" → Read Replica.
Trap 2 — ALB vs NLB Layer Decision
ALB does Layer 7 (HTTP features like path routing, headers, WAF). NLB does Layer 4 (TCP/UDP, static IPs, low latency). The moment the scenario mentions UDP, static IP, or source-IP preservation without X-Forwarded-For → NLB. The moment it mentions path-based routing, host-based routing, or WebSocket → ALB.
Trap 3 — Cross-Zone Load Balancing
ALB cross-zone is always on and free. NLB cross-zone is off by default and charges inter-AZ data transfer when turned on. A scenario asking for "the most cost-effective way to keep traffic within the AZ it entered" points to NLB with cross-zone off.
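The cost of leaving NLB cross-zone off is uneven per-target load when AZs hold unequal instance counts. A small model of the split (the two-AZ fleet is illustrative):

```python
# Sketch: per-target traffic share with cross-zone load balancing off vs on,
# for two AZs holding an uneven instance count (2 vs 8).
def per_target_share(az_counts, cross_zone):
    """Fraction of total traffic each target in an AZ receives."""
    shares = {}
    n_azs = len(az_counts)
    total = sum(az_counts.values())
    for az, count in az_counts.items():
        if cross_zone:
            shares[az] = 1 / total            # every target equal, fleet-wide
        else:
            shares[az] = (1 / n_azs) / count  # AZ gets 1/n, split among locals
    return shares

counts = {"az-a": 2, "az-b": 8}
off = per_target_share(counts, cross_zone=False)
on  = per_target_share(counts, cross_zone=True)

assert off["az-a"] == 0.25     # each of 2 instances carries 25% of all traffic
assert off["az-b"] == 0.0625   # each of 8 carries 6.25% — a 4x imbalance
assert on["az-a"] == on["az-b"] == 0.1   # cross-zone on: perfectly even
```

The imbalance is the trade-off the exam scenario is really asking about: keeping traffic in-AZ is free, but only safe when the AZs are symmetrically provisioned.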
Trap 4 — Single NAT Gateway SPOF
Questions sometimes describe "a NAT Gateway in the public subnet routing all private traffic." If all AZs share one NAT, that AZ is a SPOF. Correct answer: one NAT Gateway per AZ.
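The per-AZ fix is easiest to see as route-table data: each private subnet's default route must point at a NAT Gateway in its own AZ. All IDs below are made up for illustration:

```python
# Sketch: per-AZ NAT route tables expressed as data. Each private subnet's
# 0.0.0.0/0 route targets the NAT Gateway in the same AZ, so an AZ failure
# takes down only that AZ's outbound path — no cross-AZ NAT dependency.
def build_route_tables(azs):
    tables = {}
    for az in azs:
        tables[f"rtb-private-{az}"] = {
            "subnet": f"subnet-private-{az}",
            "routes": {"0.0.0.0/0": f"nat-{az}"},   # same-AZ NAT Gateway
        }
    return tables

tables = build_route_tables(["az-a", "az-b", "az-c"])

# Verify no private subnet depends on a NAT Gateway in another AZ:
for az in ["az-a", "az-b", "az-c"]:
    assert tables[f"rtb-private-{az}"]["routes"]["0.0.0.0/0"] == f"nat-{az}"
```

The anti-pattern is the same structure with every route pointing at `nat-az-a` — one quiet edit to three route tables, one hidden SPOF.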
Trap 5 — Placing Session State on EC2
Any scenario where "stickiness solves the problem" is usually a misdirection. Stickiness (session affinity on ALB) can help, but the correct architectural answer is to externalize session state to DynamoDB or ElastiCache so stickiness becomes unnecessary.
Trap 6 — Route 53 Latency vs Geolocation
Latency routing = fastest network path. Geolocation routing = physical location of the user. If the question says "route based on where the user actually is" → geolocation. If it says "route based on network performance" → latency.
Trap 7 — Auto Scaling Termination Policies
The default termination policy balances across AZs first, then terminates the instance with the oldest launch template or launch configuration, then the instance closest to the next billing hour. Scenarios that want "terminate the oldest instances first" call for an explicit termination policy of OldestInstance.
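The difference between the two policies is easy to see on a toy fleet (instance IDs and launch times are fake):

```python
# Sketch: scale-in victim selection under OldestInstance versus the default
# policy's first tie-breaker (AZ balance).
def oldest_instance(instances):
    """instances: (instance_id, az, launch_time). Terminate the oldest overall."""
    return min(instances, key=lambda inst: inst[2])[0]

def default_policy_az(instances):
    """First step of the default policy: pick the AZ with the most instances."""
    az_counts = {}
    for _, az, _ in instances:
        az_counts[az] = az_counts.get(az, 0) + 1
    return max(az_counts, key=az_counts.get)

fleet = [
    ("i-aaa", "az-a", 100),   # oldest instance overall
    ("i-bbb", "az-a", 300),
    ("i-ccc", "az-b", 200),
]

assert oldest_instance(fleet) == "i-aaa"
# The default policy first narrows to az-a (2 instances vs 1), and only then
# applies its age-related tie-breakers within that AZ.
assert default_policy_az(fleet) == "az-a"
```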
Trap 8 — RDS Multi-AZ Is Not Cross-Region
A common distractor: "RDS Multi-AZ provides disaster recovery across AWS Regions." This is wrong — Multi-AZ is same-Region, multi-AZ only. Cross-Region requires a Read Replica in another Region, an Aurora Global Database, or an AWS Backup cross-Region copy.
High Availability and Multi-AZ Design vs Disaster Recovery — Scope Boundary
This topic — High Availability and Multi-AZ Design — covers resilience inside a single Region. The sibling disaster-recovery-strategies topic covers resilience across Regions.
- 2.2 HA (this page) — Multi-AZ, ELB, ASG, RDS Multi-AZ, Aurora, Route 53 health checks, SPOF elimination.
- 2.2 DR (sibling) — RPO/RTO targets, backup-and-restore, pilot light, warm standby, active-active multi-Region, S3 CRR, Aurora Global Database, AWS Elastic Disaster Recovery (DRS).
When a question says "survive an AZ outage" or "99.95% uptime inside one Region" → HA topic. When a question says "survive a full Region outage," mentions RPO/RTO, or describes cross-Region replication → DR topic.
Practice Question Links — Task 2.2 HA Exercises
Use the quiz engine to drill Task 2.2 questions that map to High Availability and Multi-AZ Design concepts:
- "A web application runs on a single EC2 instance and an RDS database in one AZ. Which architectural changes provide the highest availability with minimal code changes?" — targets SPOF elimination via ASG + Multi-AZ.
- "An application must route requests to different microservices based on URL path." — ALB path-based routing.
- "A gaming backend serves UDP traffic and needs static IPs for firewall whitelisting." — NLB.
- "Reads are overwhelming the primary database; how to offload?" — Read Replica.
- "The database must survive an AZ failure with automatic failover." — RDS Multi-AZ.
- "Traffic should go to the Region with the lowest latency for each user." — Route 53 latency routing.
- "Deployments need zero downtime and fast rollback." — blue/green via weighted routing or ALB target group swap.
- "Lambda invocations are exhausting RDS connections." — RDS Proxy.
FAQ — High Availability and Multi-AZ Design Top Questions
Q1. What is the difference between High Availability and Fault Tolerance on AWS?
High Availability (HA) is a property of a system that remains operational through most component failures, typically expressed as an uptime SLA (99.9%, 99.95%, 99.99%). HA allows brief interruptions during failover — RDS Multi-AZ taking 60–120 seconds to promote the standby is a classic example of HA. Fault Tolerance (FT) is stricter: the system must continue serving with zero user-visible interruption even during a failure. Achieving FT typically requires redundant active-active components with automatic load distribution, such as DynamoDB's multi-AZ write path or an ALB with healthy targets in every AZ. For the SAA-C03 exam, most scenarios ask for HA; when the question insists on "no disruption" or "zero downtime," you are in FT territory.
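A quick way to reason about those SLA percentages is to convert them into a downtime budget and check whether a failover mechanism's interruption fits inside it:

```python
# Sketch: converting an uptime SLA into an allowed-downtime budget per
# 30-day month, the fastest mental check for "does this failover fit?".
def downtime_budget_minutes(sla_percent, period_minutes=30 * 24 * 60):
    """Allowed downtime per period (default period: a 30-day month)."""
    return period_minutes * (1 - sla_percent / 100)

assert round(downtime_budget_minutes(99.9), 1) == 43.2    # ~43 min/month
assert round(downtime_budget_minutes(99.95), 1) == 21.6   # ~22 min/month
assert round(downtime_budget_minutes(99.99), 2) == 4.32   # ~4 min/month

# A 60-120 s RDS Multi-AZ failover fits comfortably inside a 99.95% budget;
# true FT ("zero user-visible interruption") effectively has no budget at all.
```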
Q2. When should I use RDS Multi-AZ versus RDS Read Replicas in High Availability and Multi-AZ Design?
Use RDS Multi-AZ when the goal is availability — surviving the failure of an AZ or a host with automatic failover. The standby is synchronous and cannot be read by applications. Use RDS Read Replicas when the goal is read scaling — offloading SELECT queries from the primary. Replicas are asynchronous, readable, and must be manually promoted if you want them to become a primary. Both patterns can coexist: a Multi-AZ primary handles writes and HA, and three read replicas handle reports and analytics. The exam tests this boundary constantly — if the scenario says "survive AZ failure" pick Multi-AZ; if it says "scale read traffic" pick Read Replicas; if it says "cross-Region DR" pick cross-Region Read Replica or Aurora Global Database.
Q3. How do I choose between Application Load Balancer and Network Load Balancer?
Choose Application Load Balancer when the workload is HTTP or HTTPS and you want Layer 7 features: path-based routing, host-based routing, HTTP header matching, native WebSocket, HTTP/2, sticky sessions with cookies, TLS termination with ACM, AWS WAF integration, or Lambda as a target. Choose Network Load Balancer when the workload is TCP or UDP and you need Layer 4 speed: static IP addresses per AZ, Elastic IP, sub-millisecond latency, source IP preservation without X-Forwarded-For, or millions of requests per second. Choose Gateway Load Balancer only for inserting third-party network appliances transparently via the GENEVE protocol. Choose Classic Load Balancer essentially never for new designs — it is legacy.
Q4. How many AZs should an Auto Scaling Group span?
At least two AZs for any production workload; three AZs is the modern best practice because it matches the minimum AZ count of every AWS Region and gives you quorum-style resilience. Spanning three AZs means that losing one AZ still leaves two-thirds of capacity serving traffic. For latency-sensitive or stateful workloads (for example Kafka clusters on EC2), three AZs is required. For stateless web tiers behind an ALB, two AZs is acceptable but three is safer and only marginally more expensive. Always pair the ASG with a load balancer whose target group spans the same AZs so traffic distribution matches instance placement.
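The "only marginally more expensive" claim comes from static-stability math — how much total capacity you must run so that losing one AZ still leaves 100% of required capacity:

```python
# Sketch: over-provisioning factor needed to absorb the loss of one AZ while
# still serving full demand, for 2-AZ vs 3-AZ deployments.
def overprovision_factor(n_azs):
    """Total capacity multiple so the surviving AZs alone cover demand."""
    return n_azs / (n_azs - 1)

assert overprovision_factor(2) == 2.0   # 2 AZs: run 200% of demand
assert overprovision_factor(3) == 1.5   # 3 AZs: only 150% — the cost case
```

Going from two AZs to three cuts the AZ-loss over-provisioning from 100% extra capacity to 50% extra, which is why three AZs is usually the better deal for production fleets of any size.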
Q5. What does it mean for an application to be stateless, and why is that required for High Availability and Multi-AZ Design?
A stateless application tier stores no session-specific or user-specific data on the local instance. All user session state lives in a shared backing store — DynamoDB for low-latency key-value sessions, ElastiCache for in-memory sessions, or a signed cookie for small payloads. File uploads go to S3 or EFS rather than the instance's local disk. Statelessness is required because Auto Scaling, load balancing, and instance replacement only work when any replacement instance can pick up any user's request without needing to replay prior state. If state lives on a specific instance, terminating that instance means losing user data — which defeats the entire High Availability and Multi-AZ Design exercise. Stickiness on the ALB is sometimes used as a crutch, but the proper answer is to externalize state.
Q6. How does Route 53 failover routing actually detect a failure and swap DNS?
Route 53 associates each record (for example the primary A record) with a health check. The health check polls the endpoint on an interval (30 s standard or 10 s fast) over HTTP, HTTPS, or TCP. After a configurable number of consecutive failures (3 by default), the health check is marked unhealthy. Route 53 then stops returning the primary record in DNS responses and begins returning the secondary record. The swap happens at DNS resolution time, so clients with cached DNS entries (remember TTL) see the new record only after their cache expires. Set a short TTL (30–60 seconds) on failover records so clients pick up changes quickly. Also remember that Route 53 health checks run from a fleet of checkers distributed across multiple AWS Regions, outside your workload's Region — they are themselves highly available and do not depend on your Region being healthy.
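Putting the detection interval and the TTL together gives a worst-case bound on how long a client can keep hitting the dead primary:

```python
# Sketch: worst-case client-visible failover time for Route 53 failover
# routing — probes needed to hit the failure threshold, plus the DNS TTL a
# client may still have cached when the record swaps.
def worst_case_failover_seconds(probe_interval, failure_threshold, ttl):
    detection = probe_interval * failure_threshold
    return detection + ttl

# Standard 30 s checks, default threshold of 3, 60 s TTL:
assert worst_case_failover_seconds(30, 3, 60) == 150   # up to 2.5 minutes

# Fast 10 s checks and a 30 s TTL shrink the bound to one minute:
assert worst_case_failover_seconds(10, 3, 30) == 60
```

This is why the advice above pairs fast health checks with short TTLs — either one alone leaves the other term dominating the bound.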
Q7. Does DynamoDB need Multi-AZ configuration like RDS?
No. DynamoDB is natively multi-AZ inside a Region: every write is synchronously replicated to three copies across three AZs before the write is acknowledged. There is nothing to enable, and there is no visible failover event — DynamoDB's availability is a property of the service itself. For cross-Region replication, enable DynamoDB Global Tables, which asynchronously replicate the table to one or more secondary Regions, all of them writable. Global Tables give you multi-Region active-active with last-writer-wins conflict resolution. This is the NoSQL analog of Aurora Global Database and a common High Availability and Multi-AZ Design building block for globally distributed applications.
Q8. Why is a single NAT Gateway a single point of failure in High Availability and Multi-AZ Design?
A NAT Gateway is itself a highly available service within its own AZ — AWS replaces failing NAT nodes automatically. But a NAT Gateway lives in exactly one AZ. If private subnets in three AZs all route their outbound internet traffic through one NAT Gateway in, say, AZ-a, and AZ-a has a power event, the entire workload loses outbound internet — not because EC2 or RDS failed, but because the NAT dependency collapsed. The fix is to deploy one NAT Gateway per AZ, put it in that AZ's public subnet, and configure each private subnet's route table to point 0.0.0.0/0 at its own AZ's NAT Gateway. Costs more but eliminates the SPOF. This is one of the most commonly missed design flaws on the SAA-C03 exam.
Further Reading on High Availability and Multi-AZ Design
- Elastic Load Balancing User Guide — https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/what-is-load-balancing.html
- Application Load Balancer — https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html
- Network Load Balancer — https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html
- Gateway Load Balancer — https://docs.aws.amazon.com/elasticloadbalancing/latest/gateway/introduction.html
- Amazon EC2 Auto Scaling User Guide — https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-is-amazon-ec2-auto-scaling.html
- Auto Scaling Launch Templates — https://docs.aws.amazon.com/autoscaling/ec2/userguide/launch-templates.html
- Auto Scaling Lifecycle Hooks — https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html
- Predictive Scaling — https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-predictive-scaling.html
- RDS Multi-AZ Deployments — https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZ.html
- RDS Read Replicas — https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html
- Amazon Aurora Overview — https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html
- DynamoDB Global Tables — https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GlobalTables.html
- Route 53 Routing Policies — https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html
- Route 53 DNS Failover — https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover.html
- AWS SAA-C03 Exam Guide — https://d1.awsstatic.com/training-and-certification/docs-sa-associate/AWS-Certified-Solutions-Architect-Associate_Exam-Guide.pdf