
High-Performing and Scalable Storage Solutions

6,100 words · ≈ 31 min read

Choosing high-performing, scalable storage solutions is the largest task statement inside SAA-C03 Domain 3, and it is also the domain where most candidates bleed points. The exam does not ask "what is S3" — it asks things like "which storage combination meets 200 GB/s of throughput for a genomics pipeline whose working set is staged from S3" or "pick the storage architecture that lets 5,000 EC2 instances share a POSIX file system with sub-millisecond latency across three AZs." You must know the storage matrix cold: Amazon S3, Amazon EBS, Amazon EFS, the four Amazon FSx flavors, and AWS Storage Gateway, plus the performance levers (IOPS, throughput, latency, protocol, concurrent access) that turn a good answer into the right answer.

This study note walks the full storage portfolio with enough repetition on the core keywords (S3 storage classes, S3 Intelligent-Tiering, EBS volume types, EFS performance modes, FSx for Lustre) that the right service falls out of any SAA-C03 scenario within twenty seconds of reading the stem. By the end you will have a durable mental model for picking scalable storage solutions under real exam pressure.

What Are Scalable Storage Solutions on AWS?

Scalable storage solutions on AWS are managed storage services whose capacity, throughput, IOPS, and concurrent-client count grow independently of the underlying compute — without you ever touching a disk or a filer. A scalable storage solutions portfolio covers five families:

  • Object storage at exabyte scale — Amazon S3 with seven S3 storage classes, multipart upload, Transfer Acceleration, and versioning.
  • Block storage per EC2 instance — Amazon EBS with six volume types spanning SSD (gp3, gp2, io2 Block Express, io1) and HDD (st1, sc1).
  • Shared file storage across many clients — Amazon EFS for NFSv4.1 Linux workloads, with Bursting / Provisioned / Elastic throughput and General Purpose / Max I/O performance modes.
  • Managed specialized file systems — Amazon FSx for Windows File Server (SMB/NTFS/AD), Amazon FSx for Lustre (HPC), Amazon FSx for NetApp ONTAP, and Amazon FSx for OpenZFS.
  • Hybrid storage bridges — AWS Storage Gateway in File, Volume, and Tape modes.

Scalable storage solutions all share five traits you should treat as axioms: AWS manages the hardware, durability is baked in, scope is AZ-level or Region-level depending on the service, billing is pay-as-you-go, and encryption at rest is available through AWS KMS. The SAA-C03 twist versus CLF-C02 is that architectural trade-offs (IOPS vs throughput, AZ-scoped vs Region-scoped, protocol, concurrency) dominate — not identification.

Scalable storage solutions on AWS are object (S3), block (EBS), and file (EFS, FSx) services whose capacity and performance grow without operator intervention. Object = HTTPS API, no mount. Block = one AZ, one EC2 attachment. File = many clients, one mount target per AZ. Burn this classification into memory — every SAA-C03 scalable storage solutions question pivots on it.

The Scalable Storage Solutions Taxonomy at a Glance

| Paradigm | AWS Service | Access Unit | Protocol | Concurrency | Typical SAA-C03 Hook |
|---|---|---|---|---|---|
| Object | Amazon S3 | Object (key + body + metadata) | HTTPS REST | Thousands | Data lake, static assets, archive |
| Block | Amazon EBS | 512 B–4 KiB blocks | Block device to EC2 | 1 EC2 (or Multi-Attach cluster) | Databases, boot volumes |
| File (Linux) | Amazon EFS | Files and folders | NFSv4.1 | Thousands of EC2 / on-prem | Shared POSIX, web tier content |
| File (Windows) | Amazon FSx for Windows File Server | Files and folders | SMB 2/3, NTFS | Thousands | AD-integrated SMB share |
| File (HPC) | Amazon FSx for Lustre | Files and folders | Lustre / POSIX | Thousands | ML training, genomics, CFD |
| File (NetApp) | Amazon FSx for NetApp ONTAP | Files and blocks | NFS, SMB, iSCSI | Thousands | Enterprise NetApp lift-and-shift |
| File (ZFS) | Amazon FSx for OpenZFS | Files and folders | NFSv3/v4.1 | Thousands | Linux ZFS snapshots/clones |
| Hybrid | AWS Storage Gateway | Files, blocks, virtual tapes | NFS, SMB, iSCSI, iSCSI-VTL | Many on-prem | Tape replacement, on-prem cache |

Plain-Language Explanation: Scalable Storage Solutions

Analogy 1: The Library (Object Storage = Amazon S3)

Amazon S3 is a massive public library. Each book is an object. The barcode on the spine is the S3 key (s3://bucket/2026/report.pdf). You never walk into the stacks yourself — you ask the librarian over the counter (the S3 HTTPS API) who brings the whole book to you or takes a new one you hand over. The library has zones from a bright reading room (S3 Standard) to a refrigerated basement (Glacier Deep Archive); the librarian can move books between zones automatically on a schedule (lifecycle rules) or shuffle them continuously based on how often patrons check them out (S3 Intelligent-Tiering). A five-terabyte encyclopedia is too heavy to hand over in one go, so you break it into chapters and hand them one by one (multipart upload). If a patron is abroad, there is a satellite drop-off point closer to them (Transfer Acceleration via CloudFront edges). Point being: you interact with scalable storage solutions like this only through the counter; you cannot plug a USB cable into the library.

Analogy 2: The Private Workbench Drawer (Block Storage = Amazon EBS)

Amazon EBS is a drawer bolted under one specific workbench (one EC2 instance in one AZ). Only that workbench owner uses that drawer. You choose the drawer material by your job: plastic for cheap cold archives (sc1), steel for daily work (gp3), aerospace-grade titanium with extreme capacity (io2 Block Express) for the most demanding precision tasks. You can snap a photo of the drawer's contents at any moment (EBS snapshot) and store the photo in the library (S3 behind the scenes), then ship the photo to another building and reconstruct the drawer there (cross-Region snapshot copy). But you cannot wheel the drawer itself over to a different workbench in a different building — it is bolted down in one AZ. If two people need the same files at the same time, a drawer is the wrong answer. Use a shared filing cabinet instead.

Analogy 3: The Shared Office Filing Cabinet (File Storage = Amazon EFS and Amazon FSx)

The shared filing cabinet sits in the hallway where everyone can open it at once. Amazon EFS is the cabinet for the Linux floor — people use NFS keys. Amazon FSx for Windows File Server is the cabinet on the Windows floor — SMB keys, Active Directory sign-in. Amazon FSx for Lustre is the hyperspeed cabinet in the HPC lab — specialists wheel in millions of files per second; the cabinet pulls its inventory directly from the library (S3 link) and returns results there. Amazon FSx for NetApp ONTAP is the cabinet the finance team brought over from the old NetApp office — SnapMirror, SnapVault, FlexClone all still work. Amazon FSx for OpenZFS is the Linux cabinet with the ZFS engineer's favorite instant-clone tricks. One rule for all of them: you pick the cabinet by who needs to read it (protocol + OS) and how fast (throughput + latency).

Analogy 4: The Logistics Hub (Hybrid Storage = AWS Storage Gateway)

If your office is still on-premises but you want to use the library remotely, you put a local receiving dock (Storage Gateway appliance) in the basement. Staff hand mail to the dock as if it were the old mailroom (NFS/SMB/iSCSI/tape), and a truck quietly carts everything to the cloud library every night. The dock also keeps a small cache of the most-used items locally so nobody waits for the truck. Four receiving-dock modes exist: File Gateway (mail that ends up as objects in S3), FSx File Gateway (cache for FSx for Windows), Volume Gateway (virtual hard drives over iSCSI backed by S3 and snapshotted as EBS snapshots), and Tape Gateway (virtual tape robot so your old backup software never knows it is talking to the cloud).

Bottom line for scalable storage solutions: answer three questions in order — (1) what is the access unit (object / block / file / virtual tape)? (2) who has to read and write at the same time (one EC2, many EC2, on-prem, global)? (3) what performance envelope do you need (latency, IOPS, throughput, protocol)? The right scalable storage solutions service then picks itself.

Core Operating Principles of AWS Scalable Storage Solutions

Every scalable storage solutions service on AWS shares the same operational DNA:

  1. Managed lifecycle — AWS handles provisioning, replication, patching, and capacity expansion. You manage data, IAM policies, encryption keys, and lifecycle rules.
  2. Durability first — S3 offers 99.999999999% (11 nines) object durability; EBS replicates within an AZ; EFS replicates across AZs (Standard) or within one AZ (One Zone); FSx replicates according to the deployment type you chose.
  3. Scope is explicit — S3 = Region. EBS = single AZ. EFS = Region (with per-AZ mount targets) or single AZ (EFS One Zone). FSx = single AZ or Multi-AZ depending on deployment. Storage Gateway = on-prem with cloud back-end.
  4. Pay-as-you-go — You pay for stored capacity, requests, data transfer, and sometimes retrieval (Glacier) or provisioned throughput (EFS, FSx).
  5. Encryption and access control — SSE-S3, SSE-KMS, SSE-C, client-side, EBS KMS encryption, EFS encryption (at rest and in transit via TLS/stunnel), FSx KMS encryption.

Scope is the first performance lever, not just a security concern. EBS is AZ-scoped, so two EC2 instances in different AZs cannot share one EBS volume — that alone disqualifies EBS from half of all "shared data" scenarios. EFS has mount targets per AZ so it scales horizontally across AZs; FSx Multi-AZ similarly survives an AZ outage while FSx Single-AZ does not. Always read the AZ and Region constraint first.

Amazon S3 — Object Storage at Internet Scale

Amazon S3 is the default answer to any SAA-C03 scalable storage solutions question that mentions "objects," "data lake," "static website," "backup," or "archive." You place objects (up to 5 TB each) into a globally named bucket; the S3 data plane handles 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix with no provisioning needed. S3 durability is 11 nines; S3 Standard availability is 99.99%.

Amazon S3 Key Facts for SAA-C03

  • Object size: up to 5 TB. Single PUT: up to 5 GB. Anything over ~100 MB should use multipart upload.
  • Request scaling: thousands of requests per second per prefix, horizontally by using many prefixes.
  • Durability: 11 nines across at least 3 AZs (except One Zone classes).
  • Bucket names are globally unique; objects are regional.
  • Access: HTTPS API, console, CLI, SDK — never mountable as a block device.
  • Strong read-after-write consistency for all operations (since Dec 2020).
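
The "many prefixes" lever from the list above can be sketched in a few lines. This helper is hypothetical (the function name and shard count are illustrative, not an AWS API): it derives a stable hash prefix for each key so request load spreads across up to 16 prefixes, each carrying its own 3,500-write / 5,500-read per-second budget.

```python
import hashlib

def sharded_key(name: str, shards: int = 16) -> str:
    """Prefix a key with a stable hash shard so request load
    spreads across many S3 prefixes (illustrative helper)."""
    digest = hashlib.md5(name.encode()).hexdigest()
    shard = int(digest[:4], 16) % shards
    return f"{shard:02x}/{name}"

# The same name always maps to the same prefix, so readers can
# recompute the full key without a lookup table.
print(sharded_key("logs/2026/01/app.log"))
```

Because the shard is derived from the key itself rather than a timestamp, hot date-based prefixes (a classic throttling cause) are avoided.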

S3 Multipart Upload — The Performance Lever Exam Loves

Multipart upload splits a single large object into parts (5 MB to 5 GB each, up to 10,000 parts) that upload in parallel. Benefits:

  • Throughput scales with the number of parts you upload concurrently — a single EC2 instance can easily saturate 10+ Gbps.
  • Resumability — if a part fails, retry only that part, not the whole 2 TB object.
  • Required for objects larger than 5 GB.

Operational guidance:

  • Use lifecycle rules to abort incomplete multipart uploads after N days so you do not pay for orphaned parts.
  • Pair multipart upload with Transfer Acceleration when the uploader is far from the bucket Region.
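
Those limits imply a minimum workable part size for very large objects. A small sketch (the helper name is ours, not an SDK call) that picks the smallest 1 MiB-aligned part size fitting under the 10,000-part cap:

```python
MiB = 1024 ** 2
MAX_PARTS = 10_000          # S3 hard limit on parts per upload
MIN_PART = 5 * MiB          # minimum size for every part but the last

def choose_part_size(object_bytes: int) -> int:
    """Smallest part size (rounded up to 1 MiB) that keeps the
    upload within the 10,000-part limit, never below 5 MiB."""
    needed = -(-object_bytes // MAX_PARTS)      # ceiling division
    needed = max(needed, MIN_PART)
    return -(-needed // MiB) * MiB              # round up to 1 MiB

# A 5 TiB object (the S3 maximum) needs parts of at least 525 MiB.
print(choose_part_size(5 * 1024**4) // MiB)   # → 525
```

SDK transfer managers do this arithmetic for you, but the exam expects you to know why a 5 TB object cannot be uploaded in 5 MB parts.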

S3 Transfer Acceleration

S3 Transfer Acceleration routes uploads through the nearest CloudFront edge location over the optimized AWS backbone to the destination bucket. Use cases:

  • Uploading from remote offices to a bucket in a distant Region.
  • Users in many geographies uploading to one centralized bucket.
  • Large-object uploads where last-mile internet is the bottleneck.

It is enabled per bucket, costs extra per GB, and has no effect when the uploader is already in the same Region as the bucket. If the S3 Transfer Acceleration Speed Comparison tool shows no benefit, AWS will not charge the surcharge.

S3 Byte-Range Fetch

A symmetric download optimization: GET a specific byte range of an object in parallel using the Range header. Used by FSx for Lustre, Athena, and many SDKs internally. Exam hook: "how to download a single large object faster from S3?" → parallel byte-range fetches.
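
The split itself is simple arithmetic; a sketch (the helper is hypothetical — in practice each yielded value becomes the Range header on a parallel GET, e.g. boto3's `get_object(Range=...)`):

```python
def byte_ranges(total_bytes: int, chunk_bytes: int):
    """Yield HTTP Range header values covering an object of the
    given size, for parallel GETs (offsets are inclusive)."""
    start = 0
    while start < total_bytes:
        end = min(start + chunk_bytes, total_bytes) - 1
        yield f"bytes={start}-{end}"
        start = end + 1

print(list(byte_ranges(10, 4)))   # → ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

Fetching each range on its own connection is what lets a single client exceed single-stream TCP throughput.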

Multipart upload, byte-range fetch, Transfer Acceleration, and many prefixes are the four S3 performance levers. For write-heavy workloads, combine multipart upload with sharded key prefixes. For read-heavy workloads on giant objects, use byte-range fetch. For globally distributed uploaders, add Transfer Acceleration. When an exam scenario mentions any of these pressures, the correct answer is usually one of the four levers.

S3 Versioning

Versioning keeps every version of every object in a bucket. Once enabled you can only suspend it — not fully disable it. Use versioning for:

  • Protecting against accidental overwrite and delete.
  • Enabling Cross-Region Replication and Same-Region Replication (which require versioning).
  • Feeding S3 Object Lock compliance workflows (WORM + versioning).

Cost watch: every version stores a full object copy and counts against your bill. Combine versioning with a lifecycle rule that transitions noncurrent versions to a colder class and expires them after N days.

S3 Storage Classes and S3 Intelligent-Tiering

There are seven S3 storage classes. Memorize them as a performance-and-cost ladder and tag each with one sentence:

  1. S3 Standard — Frequent access, millisecond latency, 99.99% availability, no minimum storage duration.
  2. S3 Intelligent-Tiering — Moves objects through Frequent / Infrequent / Archive Instant / Archive Access / Deep Archive Access tiers automatically based on access pattern. No retrieval fees on the instant tiers. Small monthly monitoring fee per object.
  3. S3 Standard-IA — Data accessed roughly monthly; 99.9% availability, per-GB retrieval fee, 30-day minimum, 128 KB minimum billable object size.
  4. S3 One Zone-IA — Same as Standard-IA but single-AZ storage. 20% cheaper, 99.5% availability. Use only for reproducible data.
  5. S3 Glacier Instant Retrieval — Archive class with millisecond retrieval for once-per-quarter access; 90-day minimum.
  6. S3 Glacier Flexible Retrieval — Minutes-to-hours retrieval (Expedited 1–5 min, Standard 3–5 hr, Bulk 5–12 hr); 90-day minimum.
  7. S3 Glacier Deep Archive — 12–48 hour retrieval, cheapest class, 180-day minimum. Target for 7–10 year compliance archives.

S3 Storage Classes Performance Cheat Sheet

| S3 Storage Class | First-byte latency | Minimum storage | Typical SAA-C03 pick |
|---|---|---|---|
| S3 Standard | ms | None | Default hot data, data lake bronze layer |
| S3 Intelligent-Tiering | ms to hours (lowest tiers) | None | Unknown or changing access pattern |
| S3 Standard-IA | ms | 30 days | Monthly backup retrieval, DR data |
| S3 One Zone-IA | ms | 30 days | Secondary copies, easy to re-create |
| S3 Glacier Instant | ms | 90 days | Quarterly compliance queries |
| S3 Glacier Flexible | min to hr | 90 days | Long-tail backups |
| S3 Glacier Deep Archive | 12–48 hr | 180 days | 7-year financial or medical archives |

S3 Intelligent-Tiering Deep Dive

S3 Intelligent-Tiering is the most heavily tested S3 storage class in SAA-C03 scenarios that say "unknown access pattern," "variable access pattern," or "without managing lifecycle rules." The service moves each object across five access tiers based on actual access:

  • Frequent Access tier (default, like Standard)
  • Infrequent Access tier (30 days no access)
  • Archive Instant Access tier (90 days no access)
  • Archive Access tier (optional, opt-in; 90+ days no access, minutes-to-hours retrieval)
  • Deep Archive Access tier (optional, opt-in; 180+ days no access, 12-hour retrieval)

No retrieval fee on any of the automatically managed tiers; you pay a small monthly monitoring fee per object (only for objects over 128 KB). Use Intelligent-Tiering when access is unpredictable and a human cannot write a reliable lifecycle rule.

S3 Intelligent-Tiering and S3 Standard-IA are both "IA" but they solve different problems. Standard-IA requires you to know the access pattern (quarterly retrieval, 30-day minimum, per-GB retrieval fee) — it is cheaper if you are right. Intelligent-Tiering removes the prediction problem at a small monitoring fee and has no retrieval fees on its auto-tiers. If the question says "unknown access pattern" the answer is Intelligent-Tiering. If the question says "monthly access, 30-day minimum is fine" the answer is Standard-IA.
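
The trade can be sketched with back-of-envelope numbers. The prices below are assumptions for illustration only (check current AWS pricing); the point is the shape of the two bills: Standard-IA charges per GB retrieved, Intelligent-Tiering charges per object monitored.

```python
# Illustrative prices -- assumptions, not current AWS list prices:
IA_STORAGE = 0.0125      # $/GB-month, Standard-IA
IA_RETRIEVAL = 0.01      # $/GB retrieved, Standard-IA
IT_MONITOR = 0.0025      # $ per 1,000 monitored objects per month

def standard_ia_monthly(gb: float, retrieved_gb: float) -> float:
    """Standard-IA bill: storage plus a per-GB retrieval fee."""
    return gb * IA_STORAGE + retrieved_gb * IA_RETRIEVAL

def intelligent_tiering_monthly(gb: float, objects: int) -> float:
    """Intelligent-Tiering bill, assuming objects have settled in
    the Infrequent Access tier: storage plus monitoring, with no
    retrieval fee on the automatic tiers."""
    return gb * IA_STORAGE + (objects / 1000) * IT_MONITOR

# 1 TB of 1 GB objects with 100 GB pulled back each month:
print(round(standard_ia_monthly(1024, 100), 2))
print(round(intelligent_tiering_monthly(1024, 1024), 2))
```

With few large objects and nonzero retrieval, the monitoring fee is tiny and Intelligent-Tiering wins; with millions of small objects and zero retrieval, Standard-IA can come out ahead.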

Lifecycle Policies for Cost and Performance

Lifecycle policies run daily and move objects across S3 storage classes or expire them. Typical data-tiering chain for append-only log data:

Day 0    → S3 Standard
Day 30   → S3 Standard-IA
Day 90   → S3 Glacier Flexible Retrieval
Day 365  → S3 Glacier Deep Archive
Day 2555 → Expire (delete after 7 years)

For unpredictable access: skip lifecycle and put everything into S3 Intelligent-Tiering.
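
The same chain can be written as a boto3-style lifecycle configuration (a sketch: the prefix and rule ID are hypothetical, and GLACIER / DEEP_ARCHIVE are the API enum names for Glacier Flexible Retrieval and Glacier Deep Archive). In a real account this dict would be passed to `put_bucket_lifecycle_configuration`.

```python
# The Day 0 → 30 → 90 → 365 → 2555 chain as a lifecycle config dict.
lifecycle = {
    "Rules": [{
        "ID": "log-tiering-7yr",          # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},    # hypothetical prefix
        "Transitions": [
            {"Days": 30,  "StorageClass": "STANDARD_IA"},
            {"Days": 90,  "StorageClass": "GLACIER"},       # Flexible Retrieval
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
        ],
        "Expiration": {"Days": 2555},     # delete after ~7 years
    }]
}
print(lifecycle["Rules"][0]["ID"])
```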

Amazon EBS — Block Storage Performance Tiers

Amazon EBS is the scalable storage solutions answer any time a single EC2 instance needs a persistent local-looking disk — boot volumes, transactional databases, latency-sensitive apps, log spools. EBS volumes live in one AZ; snapshots live in S3 (managed, invisible to you) and can be copied cross-Region to seed DR.

EBS Volume Types — The Core Performance Matrix

| Type | Media | Max size | Max IOPS per volume | Max throughput | Primary use case |
|---|---|---|---|---|---|
| gp3 | SSD | 16 TiB | 16,000 | 1,000 MiB/s | Default general purpose; decouple IOPS and throughput from size |
| gp2 | SSD | 16 TiB | 16,000 | 250 MiB/s | Legacy general purpose; IOPS scale with size (3 IOPS/GB, burstable to 3,000) |
| io2 Block Express | SSD | 64 TiB | 256,000 | 4,000 MiB/s | Mission-critical: SAP HANA, Oracle, SQL Server Always On |
| io1 | SSD | 16 TiB | 64,000 | 1,000 MiB/s | Legacy provisioned IOPS; consider io2 Block Express instead |
| st1 | HDD | 16 TiB | 500 | 500 MiB/s | Throughput-optimized: big data, log processing, data-warehouse scans |
| sc1 | HDD | 16 TiB | 250 | 250 MiB/s | Cold HDD: infrequently accessed large datasets, lowest EBS cost |

gp3 vs gp2 — The Free Performance Upgrade

gp3 is the current default general-purpose SSD. Compared with gp2:

  • gp3 provisions IOPS (up to 16,000) and throughput (up to 1,000 MiB/s) independently of volume size, while gp2 scales IOPS with size and caps throughput at 250 MiB/s.
  • gp3 is roughly 20% cheaper than gp2 at baseline capacity.
  • For most workloads, migrating from gp2 to gp3 improves price and performance simultaneously.
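
The gp2 sizing rule is worth internalizing; a quick sketch of the baseline math, using the figures above:

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floor of 100, ceiling of
    16,000. Volumes under ~1 TiB can also burst to 3,000 IOPS."""
    return min(max(3 * size_gib, 100), 16_000)

GP3_BASELINE = 3_000   # gp3 includes 3,000 IOPS regardless of size

# A 200 GiB gp2 volume earns only 600 baseline IOPS; the same
# volume on gp3 starts at 3,000 IOPS for roughly 20% less money.
print(gp2_baseline_iops(200))    # → 600
print(gp2_baseline_iops(6000))   # → 16000  (capped)
```

The crossover to memorize: a gp2 volume does not match gp3's included 3,000 IOPS until it reaches 1,000 GiB.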

io2 Block Express — The 256k IOPS Monster

io2 Block Express is the performance apex of scalable storage solutions on Amazon EBS: up to 64 TiB, 256,000 IOPS, 4,000 MiB/s throughput, and 99.999% volume durability. It is also the volume you pick for SAP HANA and the kind of tier-1 OLTP workload that justifies premium per-IOPS pricing.

Throughput-Optimized HDD (st1) vs Cold HDD (sc1)

  • st1 — Streaming throughput workloads: Hadoop, Kafka, data-warehouse scans, video processing. Not for boot volumes. Cannot be used for workloads needing high IOPS on small random reads.
  • sc1 — Cold storage on HDD; the cheapest EBS tier. Good for infrequently accessed large files where Amazon S3 is not an option (because you need a block device).

EBS Snapshots — Point-in-Time Backups

Snapshots are incremental — only changed blocks are copied after the first snapshot — and are stored in S3 behind the scenes. You can:

  • Copy snapshots to another Region (DR seeding).
  • Share encrypted snapshots across AWS accounts via KMS grants.
  • Enable EBS Fast Snapshot Restore (FSR) so volumes created from the snapshot are fully initialized at creation, avoiding the first-read latency penalty on blocks that would otherwise be restored lazily from S3.
  • Use Amazon Data Lifecycle Manager (DLM) or AWS Backup to orchestrate snapshots on a schedule.

EBS performance envelopes you must recall in the exam: gp3 up to 16,000 IOPS and 1,000 MiB/s; io2 Block Express up to 256,000 IOPS and 4,000 MiB/s and 64 TiB; st1 up to 500 MiB/s throughput but only 500 IOPS; sc1 up to 250 MiB/s and 250 IOPS. These numbers are the deciding factor in "which EBS volume type" scenario questions.

EBS Multi-Attach

io1 and io2 volumes support Multi-Attach to up to 16 Nitro EC2 instances in the same AZ for clustered applications (Oracle RAC-style). The application must coordinate writes — EBS does not give you file-system-level locking. Multi-Attach is not a substitute for EFS or FSx; it is a niche feature for clustered databases.

Amazon EFS — Shared NFS File Storage with Regional Reach

Amazon EFS is the scalable storage solutions pick whenever many EC2 instances (or on-prem hosts via VPN / Direct Connect) need to mount a shared POSIX file system over NFSv4.1. EFS handles from a few MB to petabytes with no provisioning, and it scales throughput with the workload.

EFS Performance Modes — General Purpose vs Max I/O

Performance mode is chosen at file-system creation and cannot be changed later:

  • General Purpose (default) — Lowest per-operation latency. Supports up to 35,000 file-system-wide IOPS. The right pick for web tier content, CMS, developer home directories, container persistent storage — i.e., almost everything.
  • Max I/O — Higher aggregate throughput and higher IOPS ceiling at the cost of slightly higher per-operation latency. Use for extremely parallel workloads that benefit from high aggregate throughput and tolerate ms-level extra latency. AWS guidance as of 2023 is to default to General Purpose; Max I/O is now rarely the best answer because Elastic throughput covers most previous Max I/O scenarios.

EFS Throughput Modes — Bursting, Provisioned, Elastic

Throughput mode governs how much data per second the file system can move:

  • Bursting Throughput (default) — Throughput scales with the amount of data stored; small file systems earn burst credits for short high-throughput spikes. Good for steady-state workloads roughly proportional to size.
  • Provisioned Throughput — Fixed throughput that you provision independently of file-system size. Use when workload throughput exceeds what Bursting offers and size is small.
  • Elastic Throughput — Automatic scaling from zero to tens of GB/s with pay-per-use pricing, no provisioning. Current AWS recommendation for most unpredictable workloads.
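
Bursting mode's envelope can be approximated from the classic published figures (an assumption — AWS adjusts these over time, so treat the helper as a sketch): a baseline of 50 KiB/s per GiB stored, bursting to at least 100 MiB/s, or 100 MiB/s per TiB for large file systems.

```python
def efs_bursting_mib_s(stored_gib: float) -> tuple:
    """Approximate (baseline, burst) throughput in MiB/s for an
    EFS file system in Bursting mode, per the classic figures
    (assumption -- verify against current AWS documentation)."""
    baseline = stored_gib * 50 / 1024              # 50 KiB/s per GiB
    burst = max(100.0, (stored_gib / 1024) * 100)  # ≥ 100 MiB/s
    return baseline, burst

print(efs_bursting_mib_s(1024))   # 1 TiB stored → (50.0, 100.0)
```

The exam implication: a small but throughput-hungry file system exhausts its burst credits quickly, which is exactly the scenario where Provisioned or Elastic throughput is the right answer.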

EFS Storage Classes and Lifecycle Management

EFS supports four storage classes:

  • EFS Standard — Multi-AZ, frequently accessed files.
  • EFS Standard-Infrequent Access (Standard-IA) — Multi-AZ, infrequently accessed; much cheaper per GB but has a per-access retrieval charge.
  • EFS One Zone — Single-AZ, frequently accessed files, cheaper than Standard.
  • EFS One Zone-Infrequent Access (One Zone-IA) — Single-AZ, infrequently accessed; cheapest EFS tier.

EFS Lifecycle Management automatically moves files not accessed for N days (7, 14, 30, 60, 90) to the matching IA class and (optionally) moves them back to Standard on access. Using IA + Lifecycle can cut EFS costs by 70–90% for workloads with a long-tail access pattern.

EFS Access Scope and Scaling

  • Region-scoped — EFS Standard and Standard-IA replicate across multiple AZs in the Region. Create a mount target in each AZ of the VPC; EC2 instances in that AZ mount the local mount target.
  • EFS One Zone — Lives in a single AZ. Cheaper; loses data if that AZ is destroyed (snapshots / AWS Backup to another AZ are advised).
  • EFS Replication — Cross-Region replication feature for DR; RPO typically minutes.
  • POSIX permissions and EFS Access Points — App-level mount restrictions and UID/GID enforcement.

Amazon EFS is Linux NFS only. Any SAA-C03 scenario mentioning "Windows," "SMB share," "Active Directory authentication," or "NTFS ACLs" disqualifies EFS immediately. The correct scalable storage solutions pick in those cases is Amazon FSx for Windows File Server (or Amazon FSx for NetApp ONTAP if SMB + NFS + iSCSI are all required).

Amazon FSx — Specialized Managed File Systems

Amazon FSx is the scalable storage solutions umbrella for four managed file systems, each solving a specific workload pattern that EFS cannot.

Amazon FSx for Windows File Server

  • Protocol: SMB 2.0 / 3.0 with NTFS semantics.
  • Identity: Integrates with AWS Managed Microsoft AD or self-managed AD.
  • Features: NTFS ACLs, DFS Namespaces, shadow copies, data deduplication, backups.
  • Deployment: Single-AZ or Multi-AZ (synchronous replication + automatic failover).
  • Use cases: Windows home directories, SharePoint back-end, SQL Server file shares, lift-and-shift Windows applications.

Exam hook: "Windows application needs shared file storage with AD and SMB" → FSx for Windows File Server. Never EFS.

Amazon FSx for Lustre

  • Protocol: Lustre (POSIX).
  • Performance: Sub-millisecond latency, up to 100s of GB/s aggregate throughput, millions of IOPS.
  • S3 integration: An FSx for Lustre file system can be linked to an S3 bucket; it lazy-loads objects as files on first access and can export modified files back to S3.
  • Deployment: Scratch file systems (temporary, no replication) for short-lived workloads; Persistent file systems (replicated in an AZ) for long-running workloads with sensitive data.
  • Use cases: Machine learning training, genomics, financial Monte Carlo, CFD, media rendering, any workload where the bottleneck is shared file throughput.

Exam hook: "ML training streams data from S3 and needs low-latency POSIX file access at 200 GB/s" → FSx for Lustre linked to the S3 bucket.

Amazon FSx for NetApp ONTAP

  • Protocols: NFS v3/v4, SMB 2/3, iSCSI — all in one file system.
  • Features: SnapMirror, SnapVault, FlexClone, ONTAP snapshots, dedup, compression, compaction, thin provisioning, tiering to capacity pool (S3-backed cold storage behind ONTAP).
  • Deployment: Single-AZ or Multi-AZ.
  • Use cases: Enterprise NetApp customers lifting existing workloads to AWS; mixed NFS + SMB + iSCSI environments; workloads that depend on NetApp-specific features like SnapMirror or FlexClone.

Exam hook: "Enterprise needs SMB, NFS, and iSCSI from a single managed file system with SnapMirror replication" → FSx for NetApp ONTAP.

Amazon FSx for OpenZFS

  • Protocol: NFSv3 and NFSv4.1.
  • Features: ZFS snapshots, clones (instant writable copies), compression, copy-on-write.
  • Deployment: Single-AZ (with multi-AZ HA available in newer file-system types).
  • Use cases: Linux workloads needing ZFS-style instant clones (dev/test data copies), low-latency NFS, simpler migrations from on-prem ZFS systems.

Exam hook: "Need instant writable point-in-time clones of a multi-TB file system for dev/test without copying data" → FSx for OpenZFS.

FSx Decision Table — When to Pick Each

| Requirement | Answer |
|---|---|
| Windows SMB + Active Directory | FSx for Windows File Server |
| HPC / ML / sub-ms latency / 100s GB/s throughput / S3 linked | FSx for Lustre |
| SMB + NFS + iSCSI from one file system, NetApp features | FSx for NetApp ONTAP |
| Linux NFS with ZFS snapshots and instant clones | FSx for OpenZFS |
| Generic Linux NFS share, lowest operations overhead | Amazon EFS (not FSx) |

Amazon FSx for Lustre has two deployment types that exam questions often hinge on. Scratch file systems are the cheapest but have no data replication — losing data on hardware failure is possible and acceptable for short-lived HPC jobs. Persistent file systems replicate within the AZ, protect against hardware failure, and are the right pick for long-lived or critical HPC data. If the stem emphasizes durability and long-running workloads, pick Persistent. If it emphasizes cost and temporary working sets, pick Scratch.

AWS Storage Gateway — Hybrid Scalable Storage Solutions

AWS Storage Gateway is a virtual or hardware appliance that bridges on-premises workloads to AWS storage without forcing an application rewrite. There are four modes, all in SAA-C03 scope.

S3 File Gateway

  • Protocols: NFS v3/v4.1 and SMB.
  • Back-end: Each file becomes an S3 object in the target bucket, with S3 metadata.
  • Local cache: Hot files cached on-prem for low-latency reads.
  • Use cases: On-prem applications that need to write files and have those files become cloud-native S3 objects automatically (log ingestion, media ingest, backup targets, hybrid lakes).

FSx File Gateway

  • Protocols: SMB.
  • Back-end: Amazon FSx for Windows File Server in AWS, with low-latency local cache on-prem.
  • Use cases: Branch offices reading and writing a central Windows file share hosted on FSx.

Volume Gateway

  • Protocol: iSCSI block volumes presented to on-prem servers.
  • Modes:
    • Cached volumes — Primary data lives in AWS; hot subset cached locally. Lets you grow storage without buying on-prem disk.
    • Stored volumes — Primary data is on-prem; full async replication to AWS for backup and DR. Low-latency local reads and writes.
  • Snapshots: Point-in-time EBS snapshots stored in S3; can be used to seed AWS-native EBS volumes for lift-and-shift or DR.

Tape Gateway

  • Protocol: iSCSI Virtual Tape Library (VTL) compatible with NetBackup, Veeam, Backup Exec, Dell EMC NetWorker, CommVault, etc.
  • Back-end: Virtual tapes in S3, with retirement to S3 Glacier and S3 Glacier Deep Archive.
  • Use cases: Replacing physical tape libraries without replacing backup software.

Exam hook: "Company with existing tape-based backup software wants to eliminate physical tape" → Tape Gateway. "On-prem app needs to write files that become S3 objects" → S3 File Gateway. "Growing on-prem block storage needs without buying more SAN" → Volume Gateway Cached mode.

Performance-Driven Scalable Storage Solutions Selection

The single most dangerous habit on SAA-C03 scalable storage solutions questions is picking based on service familiarity instead of the required performance envelope. Use this four-dimensional lens on every scenario.

Lens 1: Latency

  • Sub-millisecond / microsecond per operation → FSx for Lustre, EBS io2 Block Express, EC2 Instance Store, FSx for OpenZFS.
  • Single-digit millisecond → EBS gp3, EFS General Purpose, FSx for Windows, FSx for NetApp ONTAP.
  • 10s of ms (network + filesystem overhead) → EFS Max I/O, S3 Standard first byte.
  • Minutes to hours → S3 Glacier Flexible, Glacier Deep Archive restore.

Lens 2: Throughput

  • 100s of GB/s aggregate → FSx for Lustre Persistent.
  • 10s of GB/s per client → EFS Elastic throughput, EBS io2 Block Express, FSx for NetApp ONTAP.
  • GB/s per client → EBS gp3 (1 GiB/s), st1 (500 MiB/s), S3 with multipart upload + many connections.
  • Single-connection unoptimized throughput is usually the bottleneck — parallelism unlocks the real limits.

Lens 3: Protocol

  • HTTPS REST / AWS SDK → S3.
  • Block device (raw disk) → EBS, Instance Store, Volume Gateway.
  • NFSv4.1 (Linux) → EFS, FSx for Lustre, FSx for OpenZFS, FSx for NetApp ONTAP.
  • SMB (Windows) → FSx for Windows, FSx for NetApp ONTAP, S3 File Gateway, FSx File Gateway.
  • iSCSI → Volume Gateway, FSx for NetApp ONTAP, EBS Multi-Attach (within AZ).
  • Virtual Tape Library → Tape Gateway.

Lens 4: Concurrent Access Pattern

  • Single EC2, exclusive access → EBS or Instance Store.
  • Clustered within one AZ, shared block → EBS Multi-Attach (io1/io2) or FSx NetApp ONTAP iSCSI.
  • Many EC2, shared files, single AZ → EFS One Zone, FSx Single-AZ.
  • Many EC2, shared files, multi-AZ → EFS Standard, FSx Multi-AZ.
  • Thousands of clients globally, object fetch → S3 + CloudFront or S3 Transfer Acceleration.
  • On-prem + cloud mixed access → Storage Gateway matching mode.

When in doubt, build a four-column table in scratch space: Latency | Throughput | Protocol | Concurrency. Fill the row with what the question asks for. The correct scalable storage solutions service will usually be the only one whose envelope matches all four cells. This one habit eliminates most scalable storage solutions distractors on SAA-C03.
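
That scratch-space habit can even be expressed as code. The envelope tags below are our own coarse labels (not AWS terms), covering a handful of services purely for illustration: a scenario keeps only the services whose envelope matches every requested cell.

```python
# Coarse, illustrative envelopes -- labels are ours, not AWS's.
ENVELOPES = {
    "EBS io2 Block Express": {"latency": "sub-ms", "protocol": "block", "concurrency": "single"},
    "EFS Standard":          {"latency": "ms",     "protocol": "nfs",   "concurrency": "multi-az"},
    "FSx for Lustre":        {"latency": "sub-ms", "protocol": "posix", "concurrency": "multi"},
    "FSx for Windows":       {"latency": "ms",     "protocol": "smb",   "concurrency": "multi"},
    "Amazon S3":             {"latency": "ms",     "protocol": "https", "concurrency": "global"},
}

def shortlist(**needs):
    """Keep only services whose envelope matches every requested cell."""
    return [svc for svc, env in ENVELOPES.items()
            if all(env.get(k) == v for k, v in needs.items())]

print(shortlist(protocol="smb"))                      # → ['FSx for Windows']
print(shortlist(latency="sub-ms", protocol="posix"))  # → ['FSx for Lustre']
```

One constraint often suffices; two almost always isolate a single service, which is exactly why the four-column habit works under time pressure.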

Determining Storage That Scales to Future Needs

Scalable storage solutions are not only about today's workload — SAA-C03 repeatedly asks about ten-fold growth. Rules of thumb:

  • Prefer services with no hard capacity cap (S3, EFS, FSx NetApp ONTAP with capacity pool, Storage Gateway) over fixed-capacity picks (EBS volumes max out at 16 TiB for most types; 64 TiB for io2 Block Express).
  • Prefer regional-scope services (S3, EFS Standard) for workloads that will need cross-AZ failover.
  • Prefer decoupled IOPS/throughput from size (gp3, Provisioned / Elastic EFS, Provisioned IOPS io2) when size growth will not match performance growth.
  • Use lifecycle rules (S3, EFS) to keep cost proportional to actual hot-data size, not total stored data.
  • Use Intelligent-Tiering for S3 when you cannot predict the access pattern years out.
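The "cost proportional to hot data" rule is easiest to internalize with arithmetic. A minimal sketch comparing monthly S3 cost with and without a lifecycle rule that tiers cold data to Standard-IA; the per-GB prices are illustrative assumptions, not current AWS rates:

```python
# Assumed per-GB-month rates for illustration only.
STANDARD_GB_MONTH = 0.023
STANDARD_IA_GB_MONTH = 0.0125

def monthly_cost(total_gb: float, hot_fraction: float, lifecycle: bool) -> float:
    """Monthly storage cost: without a lifecycle rule everything sits in
    Standard; with one, only the hot fraction does."""
    if not lifecycle:
        return total_gb * STANDARD_GB_MONTH
    hot = total_gb * hot_fraction
    cold = total_gb - hot
    return hot * STANDARD_GB_MONTH + cold * STANDARD_IA_GB_MONTH

# 100 TB total, only 10% actively read: the lifecycle rule keeps cost
# tracking the hot set instead of the total stored data.
without = monthly_cost(100_000, 0.10, lifecycle=False)  # all-Standard
with_rule = monthly_cost(100_000, 0.10, lifecycle=True)  # tiered
```

With these assumed rates the tiered layout costs roughly 60% of the all-Standard one, and the gap widens as the cold fraction grows, which is the point of the rule of thumb.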

Key Numbers to Memorize for Scalable Storage Solutions

  • S3 max object size: 5 TB. Single PUT: 5 GB. Multipart part size: 5 MiB–5 GiB per part (the last part may be smaller), up to 10,000 parts.
  • S3 request scaling: 3,500 write and 5,500 read requests per second per prefix.
  • S3 durability: 11 nines across ≥ 3 AZs (except One Zone classes).
  • S3 Standard availability 99.99%; Standard-IA 99.9%; One Zone-IA 99.5%; Glacier Instant Retrieval 99.9%; Glacier Flexible Retrieval and Deep Archive 99.99%.
  • EBS gp3: up to 16,000 IOPS and 1,000 MiB/s independent of size, 16 TiB max.
  • EBS io2 Block Express: up to 256,000 IOPS, 4,000 MiB/s, 64 TiB, 99.999% durability.
  • EBS st1: 500 MiB/s throughput, 500 IOPS max; sc1: 250 MiB/s, 250 IOPS.
  • EFS General Purpose performance mode: up to 35,000 IOPS.
  • EFS Elastic throughput: scales automatically into the 10s of GB/s range.
  • FSx for Lustre Persistent: 100s of GB/s aggregate throughput, sub-ms latency.
  • FSx for Windows deployments: Single-AZ and Multi-AZ.
  • AWS Storage Gateway modes: S3 File Gateway, FSx File Gateway, Volume Gateway (Cached / Stored), Tape Gateway.
Twenty scalable storage solutions numbers to burn in: S3 11 nines durability; S3 5 TB max object; S3 3,500 write / 5,500 read per prefix; S3 classes = 7; gp3 16,000 IOPS / 1,000 MiB/s; io2 Block Express 256,000 IOPS / 4,000 MiB/s; io2 64 TiB; st1 500 MiB/s; sc1 250 MiB/s; EFS GP mode 35,000 IOPS; EFS One Zone exists; FSx flavors = 4 (Windows, Lustre, ONTAP, OpenZFS); FSx for Lustre scratch vs persistent; Storage Gateway modes = 4; S3 Glacier Deep Archive 12–48 hr; Standard-IA 30-day min; Glacier 90-day min; Deep Archive 180-day min; S3 Transfer Acceleration uses CloudFront edges; EBS is AZ-scoped.
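Two of the multipart numbers above combine into a worked sanity check: with at most 10,000 parts, the minimum part size needed to reach the 5 TB object cap follows directly, and you can size parts for any real upload the same way.

```python
MiB = 1024**2
TB = 10**12  # S3's 5 TB object cap is a decimal-terabyte figure

MAX_PARTS = 10_000
MAX_OBJECT = 5 * TB

# Minimum part size that can still cover a 5 TB object in 10,000 parts:
min_part = MAX_OBJECT // MAX_PARTS     # 500,000,000 bytes (~476.8 MiB)

# A 2 TB upload split into 512 MiB parts stays well under the part cap:
parts_needed = -(-2 * TB // (512 * MiB))  # ceiling division -> 3,726 parts
```

This is why the 5 MiB minimum part size only matters for small objects; at the multi-terabyte end, the 10,000-part cap is what forces parts into the hundreds of MiB.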

Common SAA-C03 Exam Traps for Scalable Storage Solutions

  1. EBS vs EFS for "shared across EC2 instances" — EBS is single-AZ and (outside Multi-Attach) single-instance. The instant you see "many EC2 instances share the same files," it is EFS or FSx, not EBS.
  2. EFS vs FSx for Windows — EFS is Linux NFS only. "Windows SMB share" always points to FSx for Windows, not EFS.
  3. FSx for Lustre Scratch vs Persistent — Scratch is cheapest but has no replication. Persistent is the right pick for long-running or critical HPC datasets.
  4. S3 Intelligent-Tiering vs Standard-IA — Unknown access → Intelligent-Tiering. Known predictable infrequent access → Standard-IA.
  5. S3 One Zone-IA loses data if the AZ fails — Use only for reproducible data.
  6. Transfer Acceleration does not help in-Region uploads — It only helps over long distances.
  7. Glacier Instant Retrieval vs Glacier Flexible Retrieval — Instant is ms retrieval for once-a-quarter access; Flexible is minutes-to-hours for DR archives.
  8. EBS Multi-Attach is not a substitute for EFS — It is a niche feature for clustered databases that coordinate writes themselves, and it is AZ-scoped.
  9. Storage Gateway Tape Gateway replaces physical tape — "existing backup software writes to tape" → Tape Gateway.
  10. FSx for NetApp ONTAP is the one-file-system multiprotocol answer — If the scenario demands NFS, SMB, and iSCSI from the same file system, ONTAP is the only pick.
  11. FSx for Lustre links to S3 — "ML training reads from S3 with POSIX semantics at GB/s" → FSx for Lustre linked to the bucket.
  12. EBS snapshots are incremental and live in S3 — Only changed blocks are copied after the first snapshot; they can be copied cross-Region for DR.
An SAA-C03 scalable storage solutions stem describes "3,000 EC2 Linux instances across 3 AZs, each reads and writes a shared content directory at 5 GB/s aggregate." Candidates who default to EBS with Multi-Attach fail for two reasons: Multi-Attach maxes out at 16 instances and is AZ-scoped. Candidates who pick FSx for Lustre may overspend and miss the simpler Linux POSIX answer. The correct scalable storage solutions pick is Amazon EFS with Elastic throughput, adding Max I/O performance mode only if massive client concurrency dominates, since Max I/O trades some latency for parallelism. Read the three-AZ and thousands-of-clients constraints first.

Scalable Storage Solutions Scenario Patterns

  • Pattern 1 — Static website and mobile app asset distribution with 11 nines durability → S3 Standard + CloudFront (and optionally Transfer Acceleration for uploads).
  • Pattern 2 — Compliance archive, rarely read, 7-year retention → S3 Glacier Deep Archive with lifecycle expiration.
  • Pattern 3 — Unknown access pattern on a new data lake → S3 Intelligent-Tiering.
  • Pattern 4 — EC2 MySQL OLTP needs 20,000 steady IOPS on a 2 TiB volume → EBS gp3 with 20,000 provisioned IOPS or io2 Block Express.
  • Pattern 5 — SAP HANA on EC2 → EBS io2 Block Express.
  • Pattern 6 — 500 Linux EC2 instances serving a CMS, all mount the same content directory across 3 AZs → Amazon EFS, Elastic throughput.
  • Pattern 7 — Windows SharePoint farm needs AD-integrated SMB share with Multi-AZ failover → FSx for Windows File Server Multi-AZ.
  • Pattern 8 — ML training reads terabytes from S3 and needs sub-ms POSIX access at 200 GB/s → FSx for Lustre Persistent linked to the S3 bucket.
  • Pattern 9 — NetApp customer migrating with SnapMirror, SMB + NFS + iSCSI → FSx for NetApp ONTAP.
  • Pattern 10 — Dev/test teams need instant writable clones of a 5 TiB Linux NFS dataset → FSx for OpenZFS.
  • Pattern 11 — On-prem app writes files that need to become S3 objects with low-latency local reads → S3 File Gateway.
  • Pattern 12 — Replace physical tape library used by Veeam → AWS Storage Gateway Tape Gateway.
  • Pattern 13 — Upload 2 TB object from Singapore to us-east-1 bucket fast → Multipart upload + S3 Transfer Acceleration.
  • Pattern 14 — Cross-Region DR for EBS-backed database → Snapshot + copy to target Region (or AWS Backup cross-Region copy).
  • Pattern 15 — Growing on-prem block needs without buying more SAN → Storage Gateway Volume Gateway Cached mode.

How Scalable Storage Solutions Connect to Other SAA-C03 Topics

  • Data Encryption and Key Management (1.3) — Every scalable storage solutions service integrates with AWS KMS: SSE-KMS for S3, KMS keys for EBS volumes and snapshots, KMS for EFS, KMS for FSx.
  • Data Governance and Compliance (1.3) — S3 Object Lock WORM mode, versioning, and AWS Backup cross the scalable storage solutions portfolio.
  • Disaster Recovery Strategies (2.2) — S3 CRR, EBS snapshot copy, EFS replication, FSx backups, and AWS Elastic Disaster Recovery all rely on scalable storage solutions.
  • Cost-Optimized Storage (4.1) — Lifecycle rules, S3 Intelligent-Tiering, EFS Lifecycle Management, gp3 over gp2, and One Zone classes are the cost levers.
  • Data Transfer and Migration (3.5) — DataSync, Snowball, Storage Gateway, and Transfer Family all feed scalable storage solutions.

FAQ — Scalable Storage Solutions Top Questions

Q1: Which EBS volume type should I choose for a mission-critical OLTP database that needs 50,000 sustained IOPS?

The right scalable storage solutions pick is Amazon EBS io2 Block Express, which delivers up to 256,000 provisioned IOPS, up to 4,000 MiB/s throughput, 99.999% durability, and up to 64 TiB per volume. io1 also covers 50,000 IOPS, but io2 Block Express is newer, cheaper per IOPS, and the recommended choice for tier-1 databases like SAP HANA, Oracle, or SQL Server. gp3 tops out at 16,000 IOPS, which is insufficient here.

Q2: When should I pick Amazon FSx for Lustre over Amazon S3 directly?

Pick Amazon FSx for Lustre when the workload demands POSIX file-system semantics with sub-millisecond latency and hundreds of GB/s aggregate throughput — typical of HPC, machine learning training, genomics pipelines, financial simulations, or media rendering. FSx for Lustre can link directly to an S3 bucket and lazy-load objects as files, then export modified files back to S3. Use S3 alone when the application can speak HTTPS or when latency and throughput from POSIX semantics are not required.

Q3: My EC2 instances span three AZs and all need to read and write a shared file system. What is the right scalable storage solutions choice?

Amazon EFS is the right answer for Linux workloads — it is Region-scoped with a mount target per AZ and supports thousands of concurrent NFS clients. Choose Elastic throughput for unpredictable workloads. For Windows, pick Amazon FSx for Windows File Server Multi-AZ. For HPC-scale throughput requirements, pick Amazon FSx for Lustre with a Persistent deployment. EBS is wrong because it is AZ-scoped.

Q4: How do I speed up uploads of a 2 TB file from Europe to an S3 bucket in us-east-1?

Use S3 multipart upload with many parallel parts and enable S3 Transfer Acceleration on the bucket. Multipart lets you upload parts (5 MB–5 GB each, up to 10,000 parts) in parallel — your throughput ceiling becomes the sum of your concurrent TCP streams rather than a single stream. Transfer Acceleration routes the uploads via the nearest CloudFront edge over the AWS optimized backbone. Combined, you can often halve or quarter the wall-clock upload time.
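The "sum of your concurrent streams" claim translates directly into a wall-clock estimate. A minimal sketch with an assumed per-stream WAN bandwidth and idealized linear scaling; real numbers depend on the path and on Transfer Acceleration:

```python
def upload_hours(size_bytes: int, per_stream_MBps: float, streams: int) -> float:
    """Idealized wall-clock time assuming parallel streams scale linearly
    (real transfers hit WAN, NIC, and S3-side limits sooner)."""
    effective_bytes_per_sec = per_stream_MBps * streams * 1e6
    return size_bytes / effective_bytes_per_sec / 3600

# Assumed 25 MB/s per trans-Atlantic stream (illustrative, not measured):
single = upload_hours(2 * 10**12, 25, 1)     # one stream: ~22 hours
parallel = upload_hours(2 * 10**12, 25, 16)  # 16 multipart parts: ~1.4 hours
```

In practice you rarely manage the parts yourself; SDK transfer managers (for example, boto3's `TransferConfig` with `max_concurrency` and `multipart_chunksize`) handle the parallel part uploads for you.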

Q5: What is the difference between EFS Bursting, Provisioned, and Elastic throughput — and which should I pick?

Bursting throughput scales with the stored data size and is appropriate when the workload throughput grows proportionally with capacity. Provisioned throughput is a flat rate you provision independently of size — use when you need more throughput than Bursting gives but the file system is small. Elastic throughput auto-scales up and down without any provisioning and bills per-GB transferred — AWS now recommends it as the default for most unpredictable scalable storage solutions workloads on Amazon EFS because it eliminates both under-provisioning and over-provisioning risk.
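The mode decision hinges on the Bursting baseline, which is roughly 50 KiB/s of baseline throughput per GiB stored (about 50 MiB/s per TiB). A minimal sketch of the decision, with that figure and the decision thresholds as assumptions:

```python
def bursting_baseline_mibps(stored_gib: float) -> float:
    """Approximate Bursting-mode baseline: ~50 KiB/s per GiB stored,
    i.e. ~50 MiB/s per TiB (assumed figure for illustration)."""
    return stored_gib * 50 / 1024

def pick_mode(stored_gib: float, needed_mibps: float, predictable: bool) -> str:
    if bursting_baseline_mibps(stored_gib) >= needed_mibps:
        return "Bursting"
    return "Provisioned" if predictable else "Elastic"

# A 100 GiB file system needing a steady 100 MiB/s: the ~4.9 MiB/s
# baseline is far short, so Bursting is out.
mode = pick_mode(100, 100, predictable=True)  # -> "Provisioned"
```

The classic exam shape is exactly this small-but-busy file system: little stored data, heavy throughput, so Bursting starves and Provisioned (predictable) or Elastic (spiky) wins.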

Q6: When is S3 One Zone-IA the right answer vs S3 Standard-IA?

S3 One Zone-IA is 20% cheaper per GB than Standard-IA but stores data in a single AZ — you lose the data if the AZ is destroyed. Pick it only for data you can recreate: secondary copies, rendered thumbnails, processed analytic outputs. For primary infrequently accessed copies where durability matters, pick S3 Standard-IA (multi-AZ). Both have a 30-day minimum storage duration and per-GB retrieval fees.
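The 20% storage discount is easy to weigh against the shared retrieval fee with a quick comparison. All prices below are illustrative assumptions, not AWS quotes:

```python
# Assumed rates for illustration only.
STANDARD_IA = 0.0125   # $/GB-month
ONE_ZONE_IA = 0.0100   # ~20% less than Standard-IA
RETRIEVAL = 0.01       # $/GB retrieved, same for both IA classes

def monthly(gb_stored: float, gb_retrieved: float, storage_rate: float) -> float:
    """Monthly cost = storage + per-GB retrieval (both IA classes charge it)."""
    return gb_stored * storage_rate + gb_retrieved * RETRIEVAL

# 10 TB stored, 500 GB retrieved per month:
std_ia = monthly(10_000, 500, STANDARD_IA)    # storage-heavy: 125 + 5
one_zone = monthly(10_000, 500, ONE_ZONE_IA)  # 100 + 5
```

The savings are pure storage-rate savings; retrieval costs are identical, so the only real trade is the single-AZ durability risk, which is why One Zone-IA is reserved for reproducible data.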

Q7: Which AWS Storage Gateway mode do I pick for a company replacing physical tape backups?

Tape Gateway. It exposes a virtual tape library (VTL) over iSCSI that existing backup software (Veeam, NetBackup, Backup Exec, CommVault, etc.) treats exactly like a physical tape library. Virtual tapes land in S3 and can archive to S3 Glacier or S3 Glacier Deep Archive for long-term retention. This is a repeat scenario — "eliminate physical tape robots" always maps to Tape Gateway, not to EBS snapshots or AWS Backup.

Q8: My access pattern is completely unpredictable and I do not want to maintain lifecycle rules. Which S3 storage class?

S3 Intelligent-Tiering. It charges a small per-object monthly monitoring fee and then automatically moves each object across Frequent, Infrequent, Archive Instant, and optionally Archive and Deep Archive tiers based on real access. There are no retrieval fees in any Intelligent-Tiering tier, including the opt-in archive tiers. This is the single correct answer to "unknown access pattern, minimize cost, no operational overhead" in the SAA-C03 scalable storage solutions question family.

Q9: What is the fastest way to restore an EBS volume from a snapshot with full performance from the first block read?

Use EBS Fast Snapshot Restore (FSR). Without FSR, a volume created from a snapshot lazy-loads blocks from S3 on first access, which introduces first-touch latency. FSR pre-warms the volume so every block is instantly available at the full provisioned IOPS. Enable FSR on the snapshot in each AZ where you plan to create volumes. This matters for DR and large-scale clone scenarios.

Q10: Which scalable storage solutions service do I pick if my workload needs SMB, NFS, and iSCSI from a single file system?

Amazon FSx for NetApp ONTAP. It is the only managed AWS scalable storage solutions service that exposes NFS, SMB, and iSCSI from the same file system, with NetApp-native features (SnapMirror replication, SnapVault backup, FlexClone instant clones, dedup, compression, tiering to an S3-backed capacity pool). Enterprise NetApp customers lifting existing storage infrastructure to AWS should default to FSx for NetApp ONTAP.

Final Study Tips for Scalable Storage Solutions

  1. Rehearse the object / block / file / hybrid taxonomy until it is automatic — scalable storage solutions questions always pivot on this first.
  2. Commit the EBS volume-type matrix (gp3, gp2, io2 Block Express, io1, st1, sc1) with max IOPS and max throughput to memory.
  3. Know the four FSx flavors by protocol and hook: Windows + AD → FSx Windows; HPC + Lustre → FSx Lustre; multiprotocol + NetApp → FSx ONTAP; Linux ZFS clones → FSx OpenZFS.
  4. Learn the S3 Intelligent-Tiering vs Standard-IA distinction cold — it appears in most cost-related scalable storage solutions questions.
  5. Practice applying the four-lens framework (latency, throughput, protocol, concurrency) to every scalable storage solutions scenario.
  6. Know where hybrid matters: S3 File Gateway (files → S3 objects), Volume Gateway (iSCSI block), Tape Gateway (virtual tape library).
  7. Remember that scalable storage solutions choices interlock with encryption, DR, and cost-optimization topics — the exam will blend them.

Master these scalable storage solutions patterns and SAA-C03 Task Statement 3.1 ("Determine high-performing and/or scalable storage solutions") becomes a reliable source of points on exam day. Good luck.