examhub.cc — The most efficient path to the most valuable certifications.

Data Transfer and Migration Solutions

5,680 words · ≈ 29 min read

AWS data transfer solutions are the services that move bytes between on-premises systems, third parties, and AWS Regions — without you writing rsync loops that die at 2 AM. The SAA-C03 Task 3.5 ("Determine high-performing data ingestion and transformation solutions") tests whether you can look at a migration scenario — "we have 500 TB in a datacenter, a 100 Mbps link, and 30 days to cut over" — and instantly pick between AWS DataSync, AWS Snowball Edge, AWS Snowmobile, AWS Transfer Family, AWS Storage Gateway, AWS Direct Connect, S3 Transfer Acceleration, and AWS Database Migration Service (DMS). This study note walks through every AWS data transfer solution in the SAA-C03 scope, drills the online-vs-offline decision tree, and gives you enough repetition on the high-frequency keywords so you can eliminate wrong answers in under 30 seconds.

AWS data transfer solutions are one of the most testable topic families in Domain 3 because they are scenario-heavy. The exam rarely asks "what is DataSync" — it asks "you have 90 TB, a 1 Gbps link, and 10 days, pick the AWS data transfer service" and expects you to do the arithmetic in your head. Memorize the decision tree first, then the per-service details, then the traps.

What Are AWS Data Transfer Solutions?

AWS data transfer solutions are managed services, appliances, and network products that move data into, out of, and between AWS without requiring you to build the plumbing yourself. They span four categories:

  • Online network transfer — AWS DataSync, AWS Transfer Family, S3 Transfer Acceleration, AWS Storage Gateway. Bytes ride over the public internet, a VPN, or AWS Direct Connect.
  • Offline physical transfer — AWS Snow Family (Snowcone, Snowball Edge Storage Optimized, Snowball Edge Compute Optimized, and the now-deprecated Snowmobile). AWS literally ships you a ruggedized appliance, you load data, you ship it back.
  • Database and schema transfer — AWS Database Migration Service (DMS) and the AWS Schema Conversion Tool (SCT). DMS handles homogeneous (Oracle→Oracle) and heterogeneous (Oracle→Aurora PostgreSQL) migrations with continuous replication.
  • Dedicated bandwidth — AWS Direct Connect. A private fiber link from your datacenter to an AWS Direct Connect location, used for ongoing bulk transfer when the public internet is too slow, too expensive, or too unpredictable.

AWS data transfer solutions differ from AWS storage services: storage services (Amazon S3, Amazon EBS, Amazon EFS, Amazon FSx) persist the bytes; AWS data transfer solutions get the bytes from here to there. Most real migrations combine both — DataSync moves the bytes, S3 stores them, Glue transforms them.

AWS Data Transfer Solutions at a Glance

| Service | Online or Offline | Primary Use Case | Source / Destination |
|---|---|---|---|
| AWS DataSync | Online | Automated one-time or recurring bulk copy | NFS / SMB / HDFS / S3 / on-prem object store → S3 / EFS / FSx |
| AWS Transfer Family | Online | Managed SFTP / FTPS / FTP / AS2 ingestion from partners | External SFTP/FTPS/FTP/AS2 client → S3 / EFS |
| AWS Storage Gateway | Online (hybrid) | Ongoing on-prem ↔ AWS integration | NFS / SMB / iSCSI / VTL on-prem ↔ S3 / EBS snapshots / Glacier |
| S3 Transfer Acceleration | Online | Faster long-distance uploads to one S3 bucket | Any internet client → S3 via CloudFront edge |
| AWS Direct Connect | Online (dedicated) | High-bandwidth, low-latency, predictable pipe | On-prem datacenter ↔ AWS Region |
| AWS Snowcone | Offline (rugged, 8/14 TB) | Edge / small offline transfer | On-prem → ship → S3 |
| AWS Snowball Edge Storage Optimized | Offline (~80 TB usable) | Medium-to-large offline transfer | On-prem → ship → S3 |
| AWS Snowball Edge Compute Optimized | Offline + edge compute | Edge processing plus transfer | On-prem → ship → S3 |
| AWS Snowmobile | Offline (up to 100 PB, deprecated for new orders) | Exabyte-scale datacenter evacuation | On-prem semi-truck → S3 |
| AWS Database Migration Service | Online (DB-aware) | DB migration with continuous replication (CDC) | Source DB → target DB (RDS, Aurora, Redshift, DynamoDB, S3, etc.) |

AWS data transfer solutions are the managed services that move data into, out of, or between AWS. The four sub-families are online network transfer (DataSync, Transfer Family, Storage Gateway, S3 Transfer Acceleration), offline physical transfer (Snow Family), dedicated bandwidth (Direct Connect), and database-aware migration (DMS). Memorize this four-way split before anything else — every SAA-C03 data transfer question maps to exactly one of them.

Analogy 1: The Postal System (Online vs Offline Transfer)

Think of AWS data transfer solutions as a country-wide postal system. If you have a single postcard to send across town, you drop it in a mailbox and it arrives tomorrow — that is online transfer (DataSync, Transfer Family, S3 Transfer Acceleration). The road network (your internet link) is good enough. But if you want to move the entire contents of a warehouse — ten million books — no mailbox is going to help. You call a moving company with an eighteen-wheeler, they load everything at the warehouse, drive to the destination, and unload. That eighteen-wheeler is AWS Snowball Edge; the convoy of eighteen-wheelers is AWS Snowmobile. The pocket-sized courier envelope for an overnight ship is AWS Snowcone.

The reason the AWS Snow Family exists is not that the internet is broken — it is that physics still wins. Shipping a hard drive across a country takes two days regardless of how much data is on it. Uploading 100 TB over a 100 Mbps link takes about 93 days even at 100% link utilization. If volume × urgency beats bandwidth, ship the disk.

Analogy 2: The Kitchen (DataSync vs Storage Gateway vs Transfer Family)

Imagine three kitchen roles, each for a different data transfer purpose:

  • AWS DataSync is the moving van that brings the whole pantry from the old house to the new house. One-time or scheduled bulk move. You point it at an NFS share, it copies everything to S3 or EFS, it tracks incremental changes, and it stops when you tell it to stop. It has an agent (the driver) on-premises that orchestrates the copy.
  • AWS Storage Gateway is the dumbwaiter between the old pantry and the new pantry — they stay connected forever. It caches hot items in the on-prem kitchen and keeps the cold inventory in the cloud pantry. File Gateway (NFS/SMB → S3), Volume Gateway (iSCSI → EBS snapshots), Tape Gateway (VTL → S3/Glacier).
  • AWS Transfer Family is the service door where partners and suppliers drop off ingredients. SFTP, FTPS, FTP, or AS2 endpoints that deliver straight into S3 or EFS, with managed identity, managed TLS, and logical directories to mask bucket paths.

They look similar at a glance, but the role is different: DataSync moves, Storage Gateway bridges, Transfer Family receives from outside partners.

Analogy 3: Choosing a Shipping Method (The Decision Tree)

You have 500 boxes to ship. How do you choose?

  • Small and urgent? Hand it to the courier (S3 Transfer Acceleration or DataSync).
  • Medium and predictable? Regular freight truck (AWS Direct Connect for ongoing, DataSync for one-shot).
  • Large and one-off? Rent a container truck (Snowball Edge). At some point the truck is cheaper per GB than paying bandwidth.
  • Massive and datacenter-sized? Hire a convoy of semi-trucks (Snowmobile — though AWS has deprecated new orders and pushes large customers toward fleets of Snowball Edges).

The rule of thumb: if online transfer would take longer than one week over your available bandwidth, the Snow Family is almost always faster and cheaper. If online transfer takes longer than one month, the Snow Family is overwhelmingly the correct answer.

Plain-English conclusion: picking an AWS data transfer solution is a three-variable arithmetic problem — volume, bandwidth, urgency — and the decision tree tells you which service wins.

Core Operating Principles of AWS Data Transfer Solutions

All AWS data transfer solutions share a few design principles that repeat across the portfolio:

  1. Encryption in transit and at rest by default. DataSync uses TLS, Snow devices use 256-bit encryption with keys held in AWS KMS (never on the device), Transfer Family supports SFTP (SSH) / FTPS (TLS) / AS2 (signed and encrypted payloads). DMS supports TLS and SSL to source/target endpoints.
  2. Managed checkpointing and validation. DataSync validates every transferred object with metadata and can verify at the destination. Snow devices compute checksums at load and import time. DMS has validation-only tasks.
  3. Incremental where possible. DataSync detects changed files and only copies deltas on subsequent runs — this is the single biggest reason to prefer DataSync over hand-rolled rsync scripts. DMS Change Data Capture (CDC) streams ongoing changes after the initial full load.
  4. IAM-based access control. Every data transfer service requires an IAM role on the AWS side. DataSync agents assume a role to write to S3; Transfer Family servers assume a role to write to S3/EFS; DMS replication instances run under a VPC with an IAM role.
  5. Separation of data plane and control plane. Control (create task, schedule, monitor) lives in the AWS Management Console / API; data plane (the actual bytes) flows over the chosen transport (internet, VPC endpoint, Direct Connect, or physical disk).

Every AWS data transfer solution encrypts data in transit. For Snow Family devices, the encryption keys are managed in AWS KMS and never stored on the device — if the device is stolen, the data is unreadable. This is a favorite SAA-C03 security distractor: "how do you protect data on a Snowball?" Answer: "you do not have to; AWS encrypts it end-to-end with KMS-managed keys."

AWS DataSync — Automated Online Data Transfer

AWS DataSync is a managed, online, bulk-copy service that moves file and object data between on-premises storage and AWS, or between AWS storage services. It is the default answer for "we have 10 TB to 100 TB of files and a network link that is fast enough to move it in a few days."

DataSync Architecture

An AWS DataSync deployment has four components:

  • Agent — A virtual appliance (VMware, Hyper-V, KVM, or Amazon EC2) installed near the source storage. The agent reads from NFS, SMB, HDFS, and self-managed object storage. For AWS-to-AWS transfers (S3 → S3 cross-Region, EFS → EFS), no agent is needed.
  • Location — A definition of a source or destination. Examples: nfs://filer01/data, smb://winfs01/share, s3://bucket-name/prefix, efs://fs-0abc/, fsx://fs-0xyz/.
  • Task — A pairing of a source location and a destination location, plus options (include/exclude patterns, verification mode, bandwidth limit, schedule).
  • Task execution — A single run of a task. You can execute on demand, on a schedule (cron-like), or via the API.

Supported Sources and Destinations

AWS DataSync supports a broad matrix:

  • On-premises sources: NFS (v3 / v4.0 / v4.1), SMB (2.1 / 3.x), HDFS, self-managed object storage (S3-compatible)
  • AWS storage destinations/sources: Amazon S3 (all storage classes including Glacier for writes), Amazon EFS, Amazon FSx for Windows File Server, FSx for Lustre, FSx for OpenZFS, FSx for NetApp ONTAP
  • Other clouds: Google Cloud Storage, Microsoft Azure Files, Azure Blob Storage (via agent-based tasks)

DataSync Task Options You Must Know

  • Bandwidth throttling. You can cap a task at N MBps so the copy does not saturate your production internet link during business hours.
  • Scheduling. Hourly, daily, weekly cron expressions. This is how you do "nightly incremental replication" of an on-prem NFS share into S3.
  • Include / exclude filters. Glob patterns to skip temp files, build artifacts, or specific directories.
  • Verification modes. POINT_IN_TIME_CONSISTENT (default; verify entire dataset), ONLY_FILES_TRANSFERRED (faster; verify only what moved), NONE (skip verification, not recommended).
  • Incremental transfer. DataSync detects changed files by comparing source and destination metadata (size, modification time, optionally checksums). Only changed bytes are copied on subsequent runs — this is what makes DataSync dramatically more efficient than repeatedly copying the whole dataset.
  • Transfer over VPC endpoints. You can route DataSync traffic through AWS PrivateLink so data never touches the public internet.
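The agent/location/task model maps directly onto the boto3 `datasync` API. A hedged sketch of a nightly NFS-to-S3 task — the hostname, ARNs, bucket name, and role name are placeholders, and a real deployment would have activated the agent first:

```python
import boto3

datasync = boto3.client("datasync", region_name="us-east-1")

# Source: an on-prem NFS export, read through a previously deployed agent.
nfs_loc = datasync.create_location_nfs(
    ServerHostname="filer01.example.internal",          # placeholder
    Subdirectory="/data",
    OnPremConfig={"AgentArns": [
        "arn:aws:datasync:us-east-1:123456789012:agent/agent-0example",
    ]},
)

# Destination: an S3 prefix, written via an IAM role that DataSync assumes.
s3_loc = datasync.create_location_s3(
    S3BucketArn="arn:aws:s3:::example-landing-bucket",  # placeholder
    Subdirectory="/nightly",
    S3Config={"BucketAccessRoleArn":
              "arn:aws:iam::123456789012:role/DataSyncS3Role"},
)

# Task: pair the locations, throttle to ~50 MBps, verify only the files that
# moved, and run every night at 02:00 UTC (incremental after the first run).
task = datasync.create_task(
    SourceLocationArn=nfs_loc["LocationArn"],
    DestinationLocationArn=s3_loc["LocationArn"],
    Name="nightly-nfs-to-s3",
    Options={
        "VerifyMode": "ONLY_FILES_TRANSFERRED",
        "BytesPerSecond": 50 * 1024 * 1024,
    },
    Schedule={"ScheduleExpression": "cron(0 2 * * ? *)"},
)

# Kick off the first run on demand instead of waiting for the schedule.
datasync.start_task_execution(TaskArn=task["TaskArn"])
```

The schedule handles recurrence; subsequent executions copy only deltas, which is the incremental-transfer behavior described above.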

DataSync Throughput

A single DataSync task can push up to ~10 Gbps per agent in practice (AWS advertises up to tens of Gbps for aggregate tasks with multiple agents). For a 100 TB dataset on a 10 Gbps link with agent parallelism, the copy completes in roughly a day, modulo overhead.

For ongoing nightly sync from an on-prem NFS filer into S3, AWS DataSync is the right answer. Not Storage Gateway (that is for bridging, not one-way sync). Not Snowball (that is for offline one-shots). Not Transfer Family (that is for partner SFTP drops). DataSync with a scheduled task and incremental transfer is the purpose-built fit.

DataSync Pricing Model

You pay per gigabyte transferred (flat rate) plus standard AWS request, storage, and data transfer charges on the destination. There is no charge for the number of agents, tasks, or schedules. This flat per-GB model is the key reason DataSync is so much cheaper than writing your own rsync over EC2 — you pay only for throughput, not for the compute that did the copy.

AWS Transfer Family — Managed SFTP, FTPS, FTP, and AS2

AWS Transfer Family is a fully managed service that exposes SFTP, FTPS, FTP, and AS2 endpoints backed by Amazon S3 or Amazon EFS. It is the right answer when external partners, vendors, or legacy systems need to drop files into AWS using a standard file-transfer protocol and you do not want to run an EC2 SFTP server yourself.

Protocol Endpoints

  • SFTP (SSH File Transfer Protocol). Most common. TCP port 22. File transfer over SSH.
  • FTPS (File Transfer Protocol over TLS). TCP port 21 (explicit) or 990 (implicit). Legacy but still widely used, especially in finance and healthcare.
  • FTP (plain). Not encrypted. Only usable for VPC-internal traffic (the service refuses to expose plain FTP to the public internet).
  • AS2 (Applicability Statement 2). Signed, encrypted, message-based B2B protocol used heavily in EDI (retail, logistics, healthcare). HTTP/HTTPS transport with S/MIME payloads and MDN receipts.

Identity Provider Options

Transfer Family supports three authentication backends:

  • Service-managed users. Usernames and SSH public keys stored directly in the Transfer Family service. Simplest option for small partner lists.
  • AWS Directory Service. Integrates with AWS Managed Microsoft AD or AD Connector for enterprise identity federation — partners authenticate with corporate credentials.
  • Custom identity provider (Lambda-backed). A Lambda function that you write receives the username/password/SSH key and returns IAM role, home directory, and logical-directory mappings. This is what you use when your identity lives in Okta, Azure AD, or a home-grown user database.
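A minimal sketch of that Lambda, assuming a hypothetical in-code user table standing in for Okta / Azure AD / a home-grown database (a real handler would verify hashed credentials against your IdP). The response shape — `Role`, `HomeDirectoryType`, and `HomeDirectoryDetails` as a JSON string — is what Transfer Family expects; an empty response denies access:

```python
import json

# Hypothetical user store; stands in for a call to your real IdP.
USERS = {
    "partner-a": {
        "password": "s3cret",  # illustration only — never store plaintext
        "role": "arn:aws:iam::123456789012:role/TransferPartnerA",
        "bucket": "example-ingest-bucket",
    }
}

def lambda_handler(event, context):
    """Custom identity provider for AWS Transfer Family.

    Transfer Family invokes this function with the username and
    password (or SSH key). Returning {} means "access denied".
    """
    user = USERS.get(event.get("username", ""))
    if user is None or event.get("password") != user["password"]:
        return {}  # deny

    return {
        "Role": user["role"],
        # LOGICAL home directory: the partner sees only /upload and
        # /download — the bucket name and prefixes stay hidden.
        "HomeDirectoryType": "LOGICAL",
        "HomeDirectoryDetails": json.dumps([
            {"Entry": "/upload",
             "Target": f"/{user['bucket']}/partner-a/upload"},
            {"Entry": "/download",
             "Target": f"/{user['bucket']}/partner-a/download"},
        ]),
    }
```

Note how the same response also implements the logical-directory remapping described in the next subsection.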

Logical Directories

By default, an SFTP user dropped into s3://bucket-name/partner-a/ would see the full bucket path. Logical directories let you remap paths so partner A only sees /upload and /download — hiding bucket names, prefixes, and internal structure. This is a common compliance requirement.

Managed File Transfer Workflows

Transfer Family includes managed workflows — event-driven pipelines triggered on file arrival. A workflow can: decrypt PGP-encrypted files, move files to final S3 keys, invoke Lambda for custom processing, tag files for downstream jobs. This is the supported pattern for "when a partner SFTPs a file, validate it, decrypt it, and publish to a processing queue."

Endpoint Types and Network Exposure

Transfer Family servers can be:

  • Public endpoint. Reachable from the internet at an AWS-assigned DNS name.
  • VPC endpoint (internet-facing). Elastic IP(s) you control, placed in your VPC public subnets.
  • VPC endpoint (internal). Private-only, reachable via VPN / Direct Connect / VPC peering.

AWS Transfer Family is specifically for ingesting files from external systems via SFTP, FTPS, FTP, or AS2 — it is not a general-purpose data copy tool. For bulk one-way copy of an on-prem NFS share into S3, use AWS DataSync. For hybrid on-prem caching with NFS/SMB, use AWS Storage Gateway File Gateway. Transfer Family is the answer only when the requirement mentions SFTP, FTPS, FTP, AS2, or "partners need to drop files."

AWS Snow Family — Offline Physical Transfer

AWS Snow Family is the portfolio of ruggedized hardware appliances AWS ships to your location for offline data transfer and edge compute. You load data on-site, ship the device back, and AWS imports the data into S3 inside the Region.

Snowcone

  • Size: Portable, 4.5 lbs, fits in a backpack.
  • Storage: 8 TB HDD or 14 TB SSD.
  • Compute: 2 vCPUs, 4 GB RAM (enough for light edge workloads).
  • Connectivity: Wi-Fi or wired Ethernet (cellular backhaul is possible via an external LTE modem).
  • Use case: Tactical / field / first-responder data collection; small (single-digit TB) offline transfer; IoT edge.
  • Shipping: Small enough to ship as a standard parcel.

Snowball Edge Storage Optimized

  • Storage: ~80 TB of usable HDD capacity (210 TB in the newer SSD variant for object storage tasks).
  • Compute: Modest (e.g., 40 vCPUs, 80 GB RAM).
  • Use case: The default choice for offline transfer in the ~30–100 TB range. Ship one, ship two, ship a fleet — the price-per-GB is dominated by device rental and shipping, not size.

Snowball Edge Compute Optimized

  • Storage: ~42 TB usable (lower, because it trades some disk for compute).
  • Compute: 52 vCPUs, 208 GB RAM, optional GPU (NVIDIA V100 in the GPU variant).
  • Use case: Edge computing at disconnected / intermittently connected sites — run EC2 instances, Lambda functions, and EKS Anywhere on the device while it is on-site. Think offshore oil rigs, military forward bases, remote scientific stations. Data is collected, processed on-device, and the processed output goes back to AWS.

Snowmobile

  • Capacity: Up to 100 PB per truck.
  • Form factor: A 45-foot shipping container pulled by a semi-truck. Literally.
  • Use case: Evacuating exabyte-scale on-prem datacenters. Historically used for media libraries, genomics archives, and satellite imagery hoards.
  • Status: AWS has discontinued Snowmobile and no longer accepts new orders, recommending multiple Snowball Edge devices in parallel for PB-scale migrations. It still appears in SAA-C03 questions — know it exists, know the 100 PB number, know it is for "datacenter-evacuation-scale" only.

Snow Family Security

  • All devices encrypt data with 256-bit encryption.
  • Encryption keys are managed in AWS KMS and never stored on the device — if the device is lost or stolen, the data is unreadable.
  • Tamper-evident and tamper-resistant enclosures with TPM.
  • Trusted Platform Module (TPM) verifies device integrity on return.
  • Chain-of-custody logged through the AWS OpsHub for Snow Family application and the Snow Family console.

The Snowball Decision Tree (Memorize This)

Use this mental model for SAA-C03 "pick the data transfer service" scenarios:

  1. Is the dataset > 100 TB AND the available bandwidth < 1 Gbps? → Snowball Edge (probably multiple devices in parallel).
  2. Is the dataset 10–100 TB with < 500 Mbps bandwidth? → Snowball Edge.
  3. Is the dataset 1–10 TB and you need edge compute too? → Snowcone (with LTE) or Snowball Edge Compute Optimized depending on compute needs.
  4. Is the dataset < 10 TB and bandwidth is adequate? → AWS DataSync online. Or S3 Transfer Acceleration for direct upload.
  5. Is it ongoing, not one-shot? → Storage Gateway (hybrid) or Direct Connect + DataSync (dedicated pipe).
  6. Is it datacenter-scale, PB-level, one-time? → Fleet of Snowball Edges (historically Snowmobile).

The arithmetic heuristic: compute how long online transfer takes. If it is more than one week, Snowball wins. If it is more than one month, Snowball wins overwhelmingly. The formula days = (volume_TB × 8000) / (bandwidth_Mbps × 86.4 × utilization) with utilization ≈ 0.8 is enough to solve any exam scenario in your head.
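That heuristic is a one-liner in Python — the 0.8 utilization factor and the one-week threshold are this note's rules of thumb, not official AWS figures:

```python
def days_online(volume_tb: float, bandwidth_mbps: float,
                utilization: float = 0.8) -> float:
    """Days to move volume_tb over a bandwidth_mbps link.

    1 TB = 8,000 gigabits; bandwidth_mbps * 86.4 = gigabits per day;
    utilization discounts TCP/protocol overhead (~80% of line rate usable).
    """
    return (volume_tb * 8000) / (bandwidth_mbps * 86.4 * utilization)

print(round(days_online(100, 100), 1))   # 100 TB over 100 Mbps ≈ 115.7 days → Snowball
print(round(days_online(10, 1000), 1))   # 10 TB over 1 Gbps ≈ 1.2 days → stay online
```

Anything over 7 from this function points to the Snow Family; anything over 30 points to it overwhelmingly.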

Snowball Edge Storage Optimized holds ~80 TB usable. Snowcone holds 8 TB (HDD) or 14 TB (SSD). Snowmobile holds up to 100 PB. Memorize these three numbers — they appear verbatim in SAA-C03 question stems. Also memorize: Snow devices encrypt with KMS-managed keys; keys are never on the device.

Do not confuse AWS Snowball with AWS Storage Gateway. Snowball is offline, one-time, physical. Storage Gateway is online, continuous, software. A scenario saying "we need to permanently connect our on-prem NAS to S3 with local caching" is always Storage Gateway, never Snowball. A scenario saying "we have 50 TB to migrate once and our WAN is slow" is always Snowball Edge, never Storage Gateway.

AWS Storage Gateway — Hybrid Cloud Storage Integration

AWS Storage Gateway is a hybrid service — on-premises software (a VM or hardware appliance) backed by AWS cloud storage — that exposes AWS storage to on-premises applications using standard storage protocols (NFS, SMB, iSCSI, iSCSI-VTL). Unlike DataSync, it is designed for ongoing connectivity, not one-time migration.

Storage Gateway Types

  • Amazon S3 File Gateway. Exposes S3 buckets as NFS or SMB mounts. Files written on-prem become objects in S3 (one-to-one). Hot files are cached locally for low-latency reads.
  • Amazon FSx File Gateway. Local cache in front of FSx for Windows File Server across a slow WAN. Helps remote offices access FSx with low latency.
  • Volume Gateway. Presents iSCSI block volumes on-prem.
    • Cached Volumes: primary data in S3, frequently accessed data cached locally.
    • Stored Volumes: primary data on-prem, asynchronously backed up to S3 as EBS snapshots.
  • Tape Gateway (VTL). Emulates an LTO tape library over iSCSI VTL. Backup software (Veritas NetBackup, Commvault, Veeam) writes "tapes" that are stored in S3 and archived to S3 Glacier / Glacier Deep Archive.

When to Pick Storage Gateway

  • "We have NFS filers on-prem and want to gradually move cold data to S3 while keeping hot data fast locally." → File Gateway.
  • "We need to replace a physical tape library with cloud storage but keep the existing backup software." → Tape Gateway.
  • "We want on-prem block volumes backed up continuously to AWS as EBS snapshots." → Volume Gateway (Stored Volumes).

If the SAA-C03 scenario says "ongoing," "hybrid," "local cache," or "replace tape library," the answer is AWS Storage Gateway. If it says "one-time," "bulk migration," or "copy and done," the answer is DataSync (online) or Snowball Edge (offline). The trigger words matter more than the volume numbers.

Picking the Transfer Method — The Decision Matrix

This is the core skill the SAA-C03 tests. Use these three variables in this order:

Variable 1: Data Volume

  • < 10 TB → Online is almost always viable.
  • 10 TB – 100 TB → Depends on bandwidth. Run the math.
  • 100 TB – 1 PB → Snowball Edge fleet almost always wins.
  • > 1 PB → Historically Snowmobile; today a large Snowball Edge fleet.

Variable 2: Available Bandwidth

  • < 100 Mbps (typical small-office broadband) → Snowball Edge past ~5 TB.
  • 100 Mbps – 1 Gbps → Online viable up to ~50 TB over a few days.
  • 1 Gbps – 10 Gbps (dedicated link) → Online viable up to ~500 TB over a week. DataSync + Direct Connect.
  • > 10 Gbps (multiple Direct Connects) → Online competitive even at PB scale for non-urgent migrations.

Variable 3: Urgency

  • Hours → S3 Transfer Acceleration or DataSync over the fastest available link — only viable if the dataset is small enough, since no shipped appliance arrives in hours.
  • Days → DataSync over the fastest available link.
  • Weeks → Snowball Edge is most cost-effective for most sizes.
  • Ongoing (not a one-shot) → Storage Gateway or DataSync on a schedule, typically over Direct Connect.

The Decision Tree

Is it a database migration?
├── Yes → AWS DMS (with AWS Schema Conversion Tool for heterogeneous)
└── No ↓

Is it ongoing / continuous hybrid access?
├── Yes → AWS Storage Gateway
└── No ↓

Is it an external partner sending files via SFTP/FTPS/FTP/AS2?
├── Yes → AWS Transfer Family
└── No ↓

Compute days_online = (volume_TB × 8000) / (bandwidth_Mbps × 86.4 × 0.8)

Is days_online > 7?
├── Yes → AWS Snow Family (Snowcone / Snowball Edge / fleet)
└── No ↓

Is it a single S3 bucket upload from globally distributed clients?
├── Yes → S3 Transfer Acceleration
└── No → AWS DataSync (with or without Direct Connect)
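The tree above can be encoded as a toy routine — the boolean flags are illustrative names for the scenario's trigger words, not AWS API parameters:

```python
def pick_transfer_service(
    volume_tb: float,
    bandwidth_mbps: float,
    *,
    database: bool = False,
    ongoing_hybrid: bool = False,
    partner_file_drop: bool = False,
    global_s3_upload: bool = False,
) -> str:
    """Walk the SAA-C03 decision tree in order, top to bottom."""
    if database:
        return "AWS DMS (+ SCT if heterogeneous)"
    if ongoing_hybrid:
        return "AWS Storage Gateway"
    if partner_file_drop:
        return "AWS Transfer Family"
    # days_online arithmetic from earlier in this note (utilization 0.8)
    days = (volume_tb * 8000) / (bandwidth_mbps * 86.4 * 0.8)
    if days > 7:
        return "AWS Snow Family"
    if global_s3_upload:
        return "S3 Transfer Acceleration"
    return "AWS DataSync"

print(pick_transfer_service(90, 1000))                      # 90 TB, 1 Gbps → AWS Snow Family
print(pick_transfer_service(5, 1000, ongoing_hybrid=True))  # → AWS Storage Gateway
```

Note that the qualitative branches (database, ongoing, partner) are checked before any arithmetic — exactly the order the exam expects.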

The SAA-C03 decision-tree questions almost always hinge on three variables: dataset size, available bandwidth, and whether the transfer is one-time or ongoing. Run the days-online arithmetic first. If online transfer would take longer than a week, pick Snow Family. If it is ongoing, pick Storage Gateway. If it is partner-initiated SFTP, pick Transfer Family. Everything else is DataSync.

AWS Direct Connect for Ongoing Bulk Transfer

AWS Direct Connect is a dedicated network connection from your datacenter, office, or colocation environment to AWS. It provides:

  • Dedicated bandwidth: 1 Gbps, 10 Gbps, or 100 Gbps dedicated ports; sub-1 Gbps hosted connections available through AWS Direct Connect partners.
  • Private, predictable latency: Traffic never rides the public internet, so latency variance is low.
  • Lower data-transfer cost: Egress from AWS over Direct Connect is priced significantly below internet egress per GB.
  • Virtual Interfaces (VIFs): Private VIF (to a VPC), Public VIF (to AWS public services like S3), Transit VIF (to a Transit Gateway).

Direct Connect for Data Transfer

Direct Connect is not itself a data transfer service — it is the pipe that other data transfer services use. Combine Direct Connect with:

  • DataSync over Direct Connect for bulk one-time or scheduled transfer without touching the public internet.
  • Storage Gateway over Direct Connect for hybrid integration with predictable latency.
  • DMS over Direct Connect for database migration with stable replication throughput.

Direct Connect + VPN for Encryption

A raw Direct Connect circuit is private but not encrypted. For end-to-end encryption you layer a VPN over the Direct Connect (called "Direct Connect + VPN" or an IPsec tunnel over a public VIF).

For ongoing, high-volume, predictable-latency data transfer between your datacenter and AWS, AWS Direct Connect is the pipe and DataSync / Storage Gateway / DMS is what runs over it. Direct Connect alone does not transfer data — you still need a transfer service on top.

Amazon S3 Transfer Acceleration

Amazon S3 Transfer Acceleration (S3TA) speeds up long-distance uploads to S3 by routing traffic through the AWS global edge network (the same 600+ CloudFront edge points of presence). Clients upload to a nearby edge location, and AWS moves the bytes to the destination S3 bucket over the AWS backbone.

When to Use S3 Transfer Acceleration

  • Globally distributed clients uploading to a single S3 bucket.
  • Large objects (≥ 100 MB) where the AWS backbone meaningfully outperforms the public internet path.
  • Single-bucket workloads — S3TA is enabled per bucket.
  • Workloads where the S3 Transfer Acceleration pricing premium is worth the speedup.

When Not to Use S3 Transfer Acceleration

  • Small objects (< 1 MB); the overhead dwarfs the speedup.
  • Clients in the same Region as the bucket; they already benefit from AWS's regional backbone.
  • Cost-sensitive workloads where the per-GB acceleration premium exceeds the value.

Transfer Acceleration vs Multipart Upload

Multipart upload parallelizes one large upload into chunks — it is independent of S3TA and should always be used for objects > 100 MB. You can combine both: multipart upload over S3 Transfer Acceleration.
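A hedged boto3 sketch combining the two — the bucket and file names are placeholders, and enabling acceleration is a one-time, per-bucket operation:

```python
import boto3
from botocore.config import Config
from boto3.s3.transfer import TransferConfig

# Client that routes requests through the bucket's accelerate endpoint
# (<bucket>.s3-accelerate.amazonaws.com).
s3 = boto3.client(
    "s3",
    region_name="us-east-1",
    config=Config(s3={"use_accelerate_endpoint": True}),
)

# One-time, per bucket: turn Transfer Acceleration on.
s3.put_bucket_accelerate_configuration(
    Bucket="example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Multipart settings are independent of acceleration; combine both.
cfg = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # switch to multipart at 100 MB
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=8,                      # parallel part uploads
)
s3.upload_file("big-file.bin", "example-bucket",
               "uploads/big-file.bin", Config=cfg)
```

The `TransferConfig` handles the chunking; the accelerate endpoint handles the edge routing — they are orthogonal knobs.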

S3 Transfer Acceleration accelerates uploads to a single S3 bucket from globally distributed clients by routing through CloudFront edge locations. It is not a substitute for DataSync (bulk file transfer) or Snowball (offline). Pick S3TA when the scenario mentions "global users," "long distance," and "direct upload to S3."

AWS Database Migration Service — Moving Databases

AWS Database Migration Service (DMS) migrates relational databases, data warehouses, and NoSQL stores into AWS with minimal downtime. It supports two migration classes:

Homogeneous Migration

Source and target speak the same engine — Oracle to Oracle, MySQL to MySQL, PostgreSQL to PostgreSQL. Schema conversion is trivial (or identical). Example: self-managed MySQL on EC2 → Amazon RDS for MySQL.

Heterogeneous Migration

Source and target speak different engines — Oracle to Amazon Aurora PostgreSQL, SQL Server to Amazon RDS for MySQL, Oracle to DynamoDB. Schema conversion is non-trivial and handled by the AWS Schema Conversion Tool (SCT), a desktop application that analyzes the source schema, produces a target schema, and flags manual-fix items. SCT runs first; DMS runs second to move the data.

DMS Architecture

  • Replication instance. An EC2 instance (managed by DMS) that runs the replication engine. Sized for throughput (dms.t3.medium for tests, dms.c5.4xlarge for production-grade migrations).
  • Endpoints. Source endpoint and target endpoint definitions (connection strings, credentials, SSL options).
  • Replication task. The unit of migration. Three task types:
    • Full load. One-time copy of existing data.
    • Full load + CDC. Full load followed by continuous change data capture to keep the target in sync until cutover.
    • CDC only. Apply only ongoing changes (used after a separate one-time load).

Supported Sources and Targets

  • Sources: Oracle, SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, Amazon Aurora, IBM Db2, SAP ASE, Azure SQL, Google Cloud SQL.
  • Targets: All of the above, plus Amazon Redshift, Amazon DynamoDB, Amazon S3 (as Parquet / CSV), Amazon OpenSearch Service, Amazon Kinesis Data Streams, Amazon MSK, Amazon Neptune, Babelfish for Aurora PostgreSQL.

DMS Typical Migration Flow

  1. Run AWS SCT against the source to produce a target schema (heterogeneous only).
  2. Create the target database (Aurora PostgreSQL, RDS, Redshift, etc.).
  3. Create a DMS replication instance in a VPC with network access to both endpoints.
  4. Define source and target endpoints and test connectivity.
  5. Launch a "full load + CDC" task.
  6. When CDC lag is near zero, cut over the application and stop the task.
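Steps 5–6 can be sketched in boto3 — the endpoint and instance ARNs below are placeholders for resources created in steps 2–4, and the table mapping selects every table in every schema (narrow it in a real migration):

```python
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Selection rule: include all tables in all schemas.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-everything",
        "object-locator": {"schema-name": "%", "table-name": "%"},
        "rule-action": "include",
    }]
}

# Full load of existing data, then CDC to keep the target in sync
# until application cutover.
task = dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-to-aurora-pg",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SRC",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TGT",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INST",
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)

dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)
```

The `MigrationType` values map one-to-one to the three task types above: `full-load`, `full-load-and-cdc`, and `cdc`.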

AWS DMS is specifically for migrating databases, data warehouses, and NoSQL stores with minimal downtime. It is not for file or object bulk transfer — that is DataSync, Snowball, or Storage Gateway. Any SAA-C03 question that mentions migrating an Oracle, MySQL, SQL Server, PostgreSQL, MongoDB, or similar database system to AWS points to DMS (with SCT for heterogeneous moves).

Securing AWS Data Transfer Access Points

Every AWS data transfer solution must be locked down for production use. The SAA-C03 tests security integration heavily.

IAM Roles for Transfer Services

  • DataSync agents assume an IAM role to write to S3 / EFS / FSx. The role must have s3:PutObject, s3:ListBucket, and destination-specific permissions.
  • Transfer Family servers assign an IAM role per user (service-managed) or per authentication callback (custom IdP). The role scopes which bucket/prefix the user can read/write.
  • DMS replication instances run in a VPC and use an IAM role (dms-vpc-role) plus endpoint-specific credentials.
  • Snow Family devices are tied to an IAM role for S3 import and AWS KMS keys for encryption.

VPC Endpoints for Private Transfer

Route data transfer traffic through AWS PrivateLink so it never crosses the public internet:

  • DataSync over VPC endpoint (interface type).
  • S3 Gateway VPC endpoint for any service that writes to S3 (free, regional).
  • DMS endpoints inside the VPC with private IP addressing.
  • Transfer Family with VPC endpoint (internal) hostname so only VPN/Direct Connect clients can reach it.

Encryption

  • DataSync uses TLS in transit; destination encryption depends on the destination (SSE-S3 / SSE-KMS for S3, EFS encryption, FSx encryption).
  • Transfer Family enforces SSH (SFTP) or TLS (FTPS) in transit and signs/encrypts AS2 payloads with S/MIME over HTTPS; plain FTP is unencrypted and available only inside a VPC. Data at rest inherits the destination's encryption.
  • Snow Family devices encrypt every byte with 256-bit AES; keys are managed in AWS KMS and are never stored on the device.
  • DMS supports TLS to source and target endpoints and supports encrypting the replication instance storage.
For regulated workloads (HIPAA, PCI-DSS, GDPR), configure every AWS data transfer solution to use VPC endpoints (PrivateLink) so traffic never rides the public internet, and confirm destination encryption (SSE-KMS for S3, KMS-encrypted EBS snapshots for Volume Gateway, encrypted FSx/EFS for file destinations). The SAA-C03 tests this exact integration: "how do you copy 50 TB from on-prem to S3 without the data touching the internet?" Answer: DataSync over Direct Connect with a VPC endpoint.

Transfer Sizing — Estimating Time and Cost

Sizing the transfer is half the SAA-C03 data transfer question family. The arithmetic you need:

Time Arithmetic

Days of online transfer = (volume in TB × 8000) / (bandwidth in Mbps × 86.4 × utilization_factor)

The 8000 converts terabytes to gigabits (1 TB = 8,000 Gb in decimal units); multiplying Mbps by 86.4 gives gigabits moved per day (86,400 seconds per day ÷ 1,000 Mb per Gb).

With utilization ≈ 0.8 (80% of the link usable after TCP overhead):

  • 10 TB over 100 Mbps: (10 × 8000) / (100 × 86.4 × 0.8) ≈ 11.6 days
  • 10 TB over 1 Gbps: ~1.2 days
  • 100 TB over 1 Gbps: ~11.6 days
  • 100 TB over 100 Mbps: ~116 days (hence: Snowball)
  • 1 PB over 10 Gbps: ~12 days
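The worked examples above are a quick sanity check on the formula; the helper below implements it exactly as written, with the 0.8 utilization default:

```python
def transfer_days(volume_tb: float, bandwidth_mbps: float,
                  utilization: float = 0.8) -> float:
    """Days to move volume_tb over bandwidth_mbps at a given utilization.

    volume_tb * 8000  -> gigabits to move (1 TB = 8,000 Gb, decimal)
    mbps * 86.4       -> gigabits per day at full line rate
    """
    return (volume_tb * 8000) / (bandwidth_mbps * 86.4 * utilization)

print(round(transfer_days(10, 100), 1))    # 10 TB over 100 Mbps  -> 11.6 days
print(round(transfer_days(10, 1000), 1))   # 10 TB over 1 Gbps    -> 1.2 days
print(round(transfer_days(100, 100)))      # 100 TB over 100 Mbps -> 116 days
print(round(transfer_days(1000, 10000)))   # 1 PB over 10 Gbps    -> 12 days
```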

Cost Arithmetic

  • DataSync flat per-GB transferred plus destination request/storage/data-transfer.
  • Snowball Edge flat device rental ($X per job) plus shipping plus optional extra-day fees plus standard S3 request/storage. Under ~50 TB, cost per GB is typically higher than DataSync; above ~80 TB on a slow link, Snowball is dramatically cheaper.
  • Direct Connect monthly port fee ($) plus per-GB egress (much lower than internet egress).
  • DMS replication instance per-hour plus data transfer between VPC / Region / internet.
The time-arithmetic formula: days = (volume_TB × 8000) / (bandwidth_Mbps × 86.4 × 0.8). Internalize this. If days > 7, Snowball almost always wins on both time and cost. If days > 30, Snowball overwhelmingly wins. SAA-C03 scenarios are solvable with this one calculation and the three service-trigger keywords (ongoing → Storage Gateway, partner-SFTP → Transfer Family, database → DMS).
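As a study aid, the whole decision tree collapses into a few lines of Python. The function below is illustrative only: the trigger-word flags and the 7-day threshold come straight from this note, not from any AWS tooling.

```python
def pick_transfer_service(volume_tb: float, bandwidth_mbps: float,
                          ongoing_hybrid: bool = False,
                          partner_protocol: bool = False,
                          database: bool = False) -> str:
    """Toy SAA-C03 decision tree: trigger keywords first, then the days math."""
    if database:                  # live database migration
        return "DMS"
    if partner_protocol:          # partners push files via SFTP/FTPS/AS2
        return "Transfer Family"
    if ongoing_hybrid:            # apps keep a live on-prem mount
        return "Storage Gateway"
    days = (volume_tb * 8000) / (bandwidth_mbps * 86.4 * 0.8)
    return "Snowball Edge" if days > 7 else "DataSync"
```

For example, 100 TB over 100 Mbps (~116 days online) resolves to Snowball Edge, while 5 TB over 1 Gbps (well under a day) resolves to DataSync.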

Key Numbers to Memorize

For quick SAA-C03 recall on AWS data transfer solutions:

  • Snowcone: 8 TB HDD or 14 TB SSD; 2 vCPUs; 4 GB RAM.
  • Snowball Edge Storage Optimized: ~80 TB usable HDD; 40 vCPUs; 80 GB RAM.
  • Snowball Edge Compute Optimized: ~42 TB; 52 vCPUs; 208 GB RAM; optional NVIDIA V100 GPU.
  • Snowmobile: up to 100 PB per truck; deprecated for new orders.
  • DataSync: up to ~10 Gbps per agent; flat per-GB pricing.
  • Transfer Family: SFTP, FTPS, FTP, AS2; backed by S3 or EFS.
  • Direct Connect: 1 / 10 / 100 Gbps dedicated ports; lower egress cost than internet.
  • S3 Transfer Acceleration: uses the CloudFront edge network; per-bucket setting; per-GB premium on top of standard S3 pricing.
  • DMS task types: Full load, Full load + CDC, CDC only.
  • AWS SCT: heterogeneous schema conversion; free; desktop tool.
  • Arithmetic: days = (TB × 8000) / (Mbps × 86.4 × 0.8). Seven-day threshold = Snowball.

Common Exam Traps for AWS Data Transfer Solutions

Trap 1: DataSync vs Snowball

The question gives you a dataset size and a bandwidth, and asks which to use. Always do the days arithmetic. Under ~7 days online, DataSync; over ~7 days online, Snowball. Do not be fooled by "we want it secure" distractors — both are encrypted.

Trap 2: Storage Gateway vs DataSync

Storage Gateway is ongoing and bidirectional (apps keep using on-prem mounts while data lives in cloud). DataSync is one-time or scheduled and primarily one-way (copy, not mount). Trigger words: "replace tape library" / "ongoing NAS tiering" → Storage Gateway; "migrate this share to S3" → DataSync.

Trap 3: Transfer Family vs DataSync

Transfer Family is for external partners using a file-transfer protocol (SFTP / FTPS / FTP / AS2). DataSync is for you copying your own data. If the requirement says "our customers / vendors / partners send files," it is Transfer Family. If it says "we need to migrate / sync our own data," it is DataSync.

Trap 4: Snowball vs Snowmobile

Snowmobile was AWS's largest-scale offline option (up to 100 PB per truck, a literal truck) and is now deprecated. Snowball Edge is for terabyte-to-petabyte scale. Today AWS recommends Snowball Edge fleets instead of Snowmobile for most large migrations. If a question says "50 PB datacenter evacuation," the historically correct answer is Snowmobile; the current AWS-recommended answer is a Snowball Edge fleet.
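Fleet sizing is simple division. A sketch using the ~80 TB usable figure for Snowball Edge Storage Optimized quoted in this note:

```python
import math

def snowball_fleet_size(volume_pb: float,
                        usable_tb_per_device: float = 80) -> int:
    """Devices needed to evacuate volume_pb at ~80 TB usable per
    Snowball Edge Storage Optimized (capacity figure from this note)."""
    return math.ceil(volume_pb * 1000 / usable_tb_per_device)

print(snowball_fleet_size(50))   # 50 PB evacuation -> 625 devices
print(snowball_fleet_size(0.1))  # 100 TB           -> 2 devices
```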

Trap 5: DMS vs DataSync for Databases

DMS is database-aware — it reads transaction logs, handles CDC, and writes to a live target database. DataSync is file/object-aware — it reads files. Dumping a database to flat files and moving with DataSync loses transactional consistency and costs you downtime. For any live database migration, DMS is the correct answer.

Trap 6: Direct Connect Alone Does Not Transfer Data

Direct Connect is the pipe. Data transfer services (DataSync, Storage Gateway, DMS) run over the pipe. If a question asks "how do we migrate 500 TB from on-prem to S3 over our existing Direct Connect," the answer is "DataSync over Direct Connect," not "Direct Connect alone."

Trap 7: S3 Transfer Acceleration Is Not DataSync

S3 Transfer Acceleration speeds up direct client uploads to a single S3 bucket via CloudFront edges. DataSync orchestrates file-system-to-S3 bulk copy with scheduling, incrementals, and validation. The question wording is the tell: "global users uploading directly to S3" → S3TA; "scheduled NFS share sync to S3" → DataSync.

The most common SAA-C03 distractor pattern: offering DataSync when the answer is Storage Gateway (ongoing hybrid), or offering Snowball when the answer is DataSync (online transfer fits in available bandwidth and time budget). Anchor every answer to the decision tree and the days arithmetic, not to familiarity bias.

Data Transfer Costs — Minimizing Egress and Transfer Fees

Cost optimization interleaves with transfer design. The high-impact levers:

  • Inbound data transfer into AWS Regions is free. You do not pay per GB to upload to S3 over the internet. Outbound egress is expensive, cross-Region is expensive, AZ-to-AZ has a small fee.
  • S3 Gateway VPC endpoints are free and remove NAT Gateway data-processing charges for S3 traffic from private subnets.
  • Direct Connect egress is much cheaper than internet egress per GB — often the break-even point for a 1 Gbps dedicated port is reached above ~10 TB/month of sustained egress.
  • DataSync has flat per-GB transfer pricing on top of destination storage/request costs; there is no compute premium for the copy.
  • Snowball Edge is a flat device rental fee plus shipping; there is no per-GB transfer charge for the Snow Family data itself (you still pay S3 storage for what lands in the bucket).
  • Transfer Family charges per-hour per enabled endpoint plus per-GB uploaded/downloaded — keep endpoints off when not in use.
For large datasets over slow links, the cheapest way to get data into AWS is often Snowball Edge, because the cost of an online transfer (your WAN bill plus the time-to-value cost) can dwarf the device rental fee. Do the arithmetic: if the online transfer takes longer than your deadline, Snowball is also the cost winner nine times out of ten.
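A back-of-the-envelope comparison makes the break-even concrete. The prices below are illustrative placeholders, not current AWS list prices; check the DataSync and Snow Family pricing pages before relying on them.

```python
import math

def online_vs_snowball_cost(volume_tb: float,
                            datasync_per_gb: float = 0.0125,    # PLACEHOLDER price
                            snowball_flat_per_job: float = 300.0,  # PLACEHOLDER price
                            usable_tb_per_device: float = 80) -> dict:
    """Rough cost comparison; ignores shipping, extra-day fees, and the
    S3 storage/request costs that both paths pay equally."""
    datasync = volume_tb * 1000 * datasync_per_gb          # per-GB copy fee
    devices = math.ceil(volume_tb / usable_tb_per_device)
    snowball = devices * snowball_flat_per_job             # flat device fee
    return {"datasync_usd": round(datasync, 2),
            "snowball_usd": round(snowball, 2)}

print(online_vs_snowball_cost(10))   # small dataset: DataSync side is cheaper
print(online_vs_snowball_cost(160))  # large dataset: Snowball side is cheaper
```

With these placeholder numbers the crossover sits in the tens of terabytes, consistent with the ~50 TB / ~80 TB guidance above.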

Data Transfer vs Data Transformation — Where This Topic Ends

AWS data transfer solutions move bytes. They do not transform bytes. If you also need to convert CSV to Parquet, catalog the schema, clean up PII, or run analytics, those are jobs for AWS Glue, Amazon EMR, AWS Lambda, and Amazon Athena — covered in the Glue / EMR data transformation topic and the Athena / Lake Formation / QuickSight analytics topic. A typical production pipeline is "DataSync → S3 landing zone → Glue crawler → Glue job → S3 curated zone → Athena." Know the boundary — SAA-C03 will offer Glue as a distractor for transfer questions and Transfer Family as a distractor for transformation questions.

FAQ — AWS Data Transfer Solutions Top 7 Questions

1. When should I use AWS DataSync instead of writing my own rsync script?

Use DataSync whenever the source is NFS, SMB, HDFS, or an S3-compatible object store, and the destination is S3, EFS, or FSx. DataSync is a managed service — you pay a flat per-GB fee and AWS handles scheduling, retries, incremental detection, verification, bandwidth throttling, and metadata preservation. Writing your own rsync on EC2 means running a VM, monitoring it, handling crashes, reasoning about verification, and paying for the EC2 hours. DataSync wins on operational overhead for almost every real workload. The only time to roll your own is when the source protocol is not supported (rare) or the dataset is trivially small (a few GB, where any tool works).

2. How do I decide between online transfer and AWS Snow Family offline transfer?

Do the time arithmetic: days = (volume_TB × 8000) / (bandwidth_Mbps × 86.4 × 0.8). If the result is more than one week, Snow Family almost always wins on time, cost, and operational risk (no worries about WAN flaps, saturating production internet, or multi-day copy jobs being interrupted). If the result is less than a few days, online with DataSync is simpler. The bandwidth-volume grid: under 10 TB with reasonable bandwidth → online; over 100 TB with < 1 Gbps → Snow Family; in between → run the math.

3. What is the difference between AWS Transfer Family and AWS DataSync?

Transfer Family receives files from external parties using standard file-transfer protocols (SFTP, FTPS, FTP, AS2). DataSync copies data between your own storage systems on a schedule or on demand. If the scenario is "our partners need to drop daily files," Transfer Family. If the scenario is "we need to migrate or sync our own NFS share," DataSync.

4. Can AWS Database Migration Service migrate without downtime?

DMS supports near-zero-downtime migration using the Full load + CDC task type. The initial full load copies existing data while the CDC component captures every transaction from the source. When the CDC lag is small (seconds), you cut over the application — stop writes on the source, wait for CDC to drain, then start writes on the target. Actual downtime is typically minutes, not hours. For heterogeneous migrations (Oracle → Aurora PostgreSQL), run the AWS Schema Conversion Tool first to translate the schema, then run DMS.

5. When does AWS Direct Connect make sense for data transfer?

Direct Connect makes sense when you have ongoing, high-volume, predictable-latency transfer between your datacenter and AWS, or when you need lower egress cost per GB. For a one-time 10 TB migration, the internet over DataSync is fine. For a 50 TB-per-month ongoing sync with latency-sensitive workloads, Direct Connect + DataSync pays for itself quickly. Remember: Direct Connect is the pipe; you still need DataSync, Storage Gateway, or DMS on top to actually move the data.

6. How do I transfer data securely so nothing touches the public internet?

Combine three things: (a) AWS Direct Connect or AWS Site-to-Site VPN as the transport, (b) VPC endpoints (AWS PrivateLink) for the AWS service endpoints (DataSync, S3 Gateway endpoint, Transfer Family internal endpoint, DMS in-VPC endpoints), and (c) encryption at rest on the destination (SSE-KMS for S3, KMS-encrypted EBS snapshots, encrypted EFS/FSx). This is the standard pattern for HIPAA, PCI-DSS, and GDPR-regulated workloads, and SAA-C03 tests it directly.
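As a sketch of piece (b), the free S3 gateway endpoint is created with boto3's EC2 `create_vpc_endpoint` call; the function below only builds the request parameters, and the VPC and route-table IDs are placeholders:

```python
def s3_gateway_endpoint_params(vpc_id: str, route_table_ids: list,
                               region: str = "us-east-1") -> dict:
    """Kwargs for ec2_client.create_vpc_endpoint(): an S3 gateway
    endpoint so private-subnet traffic to S3 skips NAT and the internet."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": route_table_ids,
    }

params = s3_gateway_endpoint_params("vpc-123", ["rtb-456"])
# Real call: boto3.client("ec2").create_vpc_endpoint(**params)
```

Interface endpoints for DataSync or Transfer Family use the same call with `VpcEndpointType="Interface"` plus subnet and security-group IDs instead of route tables.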

7. Should I use S3 Transfer Acceleration for a bulk on-premises to S3 migration?

Usually no. S3 Transfer Acceleration is designed for globally distributed clients uploading to a single S3 bucket over the public internet — it routes traffic through CloudFront edges to ride the AWS backbone. For a bulk migration from a single on-premises datacenter, DataSync (for file-system sources) or Snowball Edge (for large one-time) is purpose-built and more cost-effective. Reach for S3TA when the requirement says "many users worldwide upload directly to S3 and we want the uploads to be faster."

Data Transfer Solutions — Summary

AWS data transfer solutions split into four families: online network transfer (DataSync, Transfer Family, Storage Gateway, S3 Transfer Acceleration), offline physical transfer (Snow Family), dedicated bandwidth (Direct Connect), and database-aware migration (DMS). The SAA-C03 tests three decision variables repeatedly: volume, bandwidth, and urgency. Run the days-online arithmetic first — if online would take longer than a week, pick Snow Family. Then look for trigger words: "ongoing hybrid" → Storage Gateway, "partner SFTP/FTPS/AS2" → Transfer Family, "database" → DMS, "global direct uploads" → S3 Transfer Acceleration. Everything else that is "copy this storage into AWS once or on a schedule" is DataSync, ideally over Direct Connect and a VPC endpoint for regulated workloads. Memorize the Snow Family capacities (Snowcone 8/14 TB, Snowball Edge Storage Optimized ~80 TB, Snowmobile up to 100 PB) and the arithmetic, and the entire AWS data transfer solutions question family collapses into a 30-second decision tree.

Official sources