AWS data transfer solutions are the services that move bytes between on-premises systems, third parties, and AWS Regions — without you writing rsync loops that die at 2 AM. The SAA-C03 Task 3.5 ("Determine high-performing data ingestion and transformation solutions") tests whether you can look at a migration scenario — "we have 500 TB in a datacenter, a 100 Mbps link, and 30 days to cut over" — and instantly pick between AWS DataSync, AWS Snowball Edge, AWS Snowmobile, AWS Transfer Family, AWS Storage Gateway, AWS Direct Connect, S3 Transfer Acceleration, and AWS Database Migration Service (DMS). This study note walks through every AWS data transfer solution in the SAA-C03 scope, drills the online-vs-offline decision tree, and gives you enough repetition on the high-frequency keywords so you can eliminate wrong answers in under 30 seconds.
AWS data transfer solutions are one of the most testable topic families in Domain 3 because they are scenario-heavy. The exam rarely asks "what is DataSync" — it asks "you have 90 TB, a 1 Gbps link, and 10 days, pick the AWS data transfer service" and expects you to do the arithmetic in your head. Memorize the decision tree first, then the per-service details, then the traps.
What Are AWS Data Transfer Solutions?
AWS data transfer solutions are managed services, appliances, and network products that move data into, out of, and between AWS without requiring you to build the plumbing yourself. They span four categories:
- Online network transfer — AWS DataSync, AWS Transfer Family, S3 Transfer Acceleration, AWS Storage Gateway. Bytes ride over the public internet, a VPN, or AWS Direct Connect.
- Offline physical transfer — AWS Snow Family (Snowcone, Snowball Edge Storage Optimized, Snowball Edge Compute Optimized, and the now-deprecated Snowmobile). AWS literally ships you a ruggedized appliance, you load data, you ship it back.
- Database and schema transfer — AWS Database Migration Service (DMS) and the AWS Schema Conversion Tool (SCT). DMS handles homogeneous (Oracle→Oracle) and heterogeneous (Oracle→Aurora PostgreSQL) migrations with continuous replication.
- Dedicated bandwidth — AWS Direct Connect. A private fiber link from your datacenter to an AWS Direct Connect location, used for ongoing bulk transfer when the public internet is too slow, too expensive, or too unpredictable.
AWS data transfer solutions differ from AWS storage services: storage services (Amazon S3, Amazon EBS, Amazon EFS, Amazon FSx) persist the bytes; AWS data transfer solutions get the bytes from here to there. Most real migrations combine both — DataSync moves the bytes, S3 stores them, Glue transforms them.
AWS Data Transfer Solutions at a Glance
| Service | Online or Offline | Primary Use Case | Source / Destination |
|---|---|---|---|
| AWS DataSync | Online | Automated one-time or recurring bulk copy | NFS / SMB / HDFS / S3 / on-prem object store → S3 / EFS / FSx |
| AWS Transfer Family | Online | Managed SFTP / FTPS / FTP / AS2 ingestion from partners | External SFTP/FTPS/FTP/AS2 client → S3 / EFS |
| AWS Storage Gateway | Online (hybrid) | Ongoing on-prem ↔ AWS integration | NFS / SMB / iSCSI / VTL on-prem ↔ S3 / EBS snapshots / Glacier |
| S3 Transfer Acceleration | Online | Faster long-distance uploads to one S3 bucket | Any internet client → S3 via CloudFront edge |
| AWS Direct Connect | Online (dedicated) | High-bandwidth, low-latency, predictable pipe | On-prem datacenter ↔ AWS Region |
| AWS Snowcone | Offline (rugged, 8/14 TB) | Edge / small offline transfer | On-prem → ship → S3 |
| AWS Snowball Edge Storage Optimized | Offline (~80 TB usable) | Medium-to-large offline transfer | On-prem → ship → S3 |
| AWS Snowball Edge Compute Optimized | Offline + edge compute | Edge processing plus transfer | On-prem → ship → S3 |
| AWS Snowmobile | Offline (up to 100 PB, deprecated for new orders) | Exabyte-scale datacenter evacuation | On-prem semi-truck → S3 |
| AWS Database Migration Service | Online (DB-aware) | DB migration with continuous replication (CDC) | Source DB → target DB (RDS, Aurora, Redshift, DynamoDB, S3, etc.) |
Analogy 1: The Postal System (Online vs Offline Transfer)
Think of AWS data transfer solutions as a country-wide postal system. If you have a single postcard to send across town, you drop it in a mailbox and it arrives tomorrow — that is online transfer (DataSync, Transfer Family, S3 Transfer Acceleration). The road network (your internet link) is good enough. But if you want to move the entire contents of a warehouse — ten million books — no mailbox is going to help. You call a moving company with an eighteen-wheeler, they load everything at the warehouse, drive to the destination, and unload. That eighteen-wheeler is AWS Snowball Edge; the convoy of eighteen-wheelers is AWS Snowmobile. The pocket-sized courier envelope for an overnight ship is AWS Snowcone.
The reason AWS Snow Family exists is not that the internet is broken — it is that physics still wins. Shipping a hard drive across a country takes two days regardless of how big the hard drive is. Uploading 100 TB over a 100 Mbps link takes 92 days. If volume × urgency beats bandwidth, ship the disk.
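That 92-day figure is plain arithmetic. A quick sanity check, assuming a fully saturated link (real links never quite hit 100%, which only strengthens the point):

```python
TB_IN_BITS = 8 * 10**12   # 1 TB = 8e12 bits (decimal terabytes)
MBPS_IN_BITS = 10**6      # 1 Mbps = 1e6 bits/second
SECONDS_PER_DAY = 86_400

def naive_transfer_days(volume_tb: float, bandwidth_mbps: float) -> float:
    """Days to move volume_tb over bandwidth_mbps at 100% utilization."""
    seconds = (volume_tb * TB_IN_BITS) / (bandwidth_mbps * MBPS_IN_BITS)
    return seconds / SECONDS_PER_DAY

print(round(naive_transfer_days(100, 100), 1))  # 100 TB over 100 Mbps ≈ 92.6 days
```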
Analogy 2: The Kitchen (DataSync vs Storage Gateway vs Transfer Family)
Imagine three kitchen roles, each for a different data transfer purpose:
- AWS DataSync is the moving van that brings the whole pantry from the old house to the new house. One-time or scheduled bulk move. You point it at an NFS share, it copies everything to S3 or EFS, it tracks incremental changes, and it stops when you tell it to stop. It has an agent (the driver) on-premises that orchestrates the copy.
- AWS Storage Gateway is the dumbwaiter between the old pantry and the new pantry — they stay connected forever. It caches hot items in the on-prem kitchen and keeps the cold inventory in the cloud pantry. File Gateway (NFS/SMB → S3), Volume Gateway (iSCSI → EBS snapshots), Tape Gateway (VTL → S3/Glacier).
- AWS Transfer Family is the service door where partners and suppliers drop off ingredients. SFTP, FTPS, FTP, or AS2 endpoints that deliver straight into S3 or EFS, with managed identity, managed TLS, and logical directories to mask bucket paths.
They look similar at a glance, but the role is different: DataSync moves, Storage Gateway bridges, Transfer Family receives from outside partners.
Analogy 3: Choosing a Shipping Method (The Decision Tree)
You have 500 boxes to ship. How do you choose?
- Small and urgent? Hand it to the courier (S3 Transfer Acceleration or DataSync).
- Medium and predictable? Regular freight truck (AWS Direct Connect for ongoing, DataSync for one-shot).
- Large and one-off? Rent a container truck (Snowball Edge). At some point the truck is cheaper per GB than paying bandwidth.
- Massive and datacenter-sized? Hire a convoy of semi-trucks (Snowmobile — though AWS has deprecated new orders and pushes large customers toward fleets of Snowball Edges).
The rule of thumb: if online transfer would take longer than one week over your available bandwidth, the Snow Family is almost always faster and cheaper. If online transfer takes longer than one month, the Snow Family is overwhelmingly the correct answer.
Plain-English conclusion: picking an AWS data transfer solution is a three-variable arithmetic problem — volume, bandwidth, urgency — and the decision tree tells you which service wins.
Core Operating Principles of AWS Data Transfer Solutions
All AWS data transfer solutions share a few design principles that repeat across the portfolio:
- Encryption in transit and at rest by default. DataSync uses TLS, Snow devices use 256-bit encryption with keys held in AWS KMS (never on the device), Transfer Family supports SFTP (SSH) / FTPS (TLS) / AS2 (signed and encrypted payloads). DMS supports SSL/TLS to source and target endpoints.
- Managed checkpointing and validation. DataSync checks every transferred file against source metadata and checksums and can verify the full dataset at the destination. Snow devices compute checksums at load and at import time. DMS offers data-validation tasks that compare source and target rows.
- Incremental where possible. DataSync detects changed files and only copies deltas on subsequent runs — this is the single biggest reason to prefer DataSync over hand-rolled rsync scripts. DMS Change Data Capture (CDC) streams ongoing changes after the initial full load.
- IAM-based access control. Every data transfer service requires an IAM role on the AWS side. DataSync agents assume a role to write to S3; Transfer Family servers assume a role to write to S3/EFS; DMS replication instances run under a VPC with an IAM role.
- Separation of data plane and control plane. Control (create task, schedule, monitor) lives in the AWS Management Console / API; data plane (the actual bytes) flows over the chosen transport (internet, VPC endpoint, Direct Connect, or physical disk).
AWS DataSync — Automated Online Data Transfer
AWS DataSync is a managed, online, bulk-copy service that moves file and object data between on-premises storage and AWS, or between AWS storage services. It is the default answer for "we have 10 TB to 100 TB of files and a network link that is fast enough to move it in a few days."
DataSync Architecture
An AWS DataSync deployment has four components:
- Agent — A virtual appliance (VMware, Hyper-V, KVM, or Amazon EC2) installed near the source storage. The agent reads from NFS, SMB, HDFS, object stores, or Amazon S3 sources. For AWS-to-AWS transfers (S3 → S3 cross-Region, EFS → EFS), no agent is needed.
- Location — A definition of a source or destination. Examples: `nfs://filer01/data`, `smb://winfs01/share`, `s3://bucket-name/prefix`, `efs://fs-0abc/`, `fsx://fs-0xyz/`.
- Task — A pairing of a source location and a destination location, plus options (include/exclude patterns, verification mode, bandwidth limit, schedule).
- Task execution — A single run of a task. You can execute on demand, on a schedule (cron-like), or via the API.
Supported Sources and Destinations
AWS DataSync supports a broad matrix:
- On-premises sources: NFS (v3 / v4.0 / v4.1), SMB (2.1 / 3.x), HDFS, self-managed object storage (S3-compatible)
- AWS storage destinations/sources: Amazon S3 (all storage classes including Glacier for writes), Amazon EFS, Amazon FSx for Windows File Server, FSx for Lustre, FSx for OpenZFS, FSx for NetApp ONTAP
- Other clouds: Google Cloud Storage, Microsoft Azure Files, Azure Blob Storage (agent-based tasks)
DataSync Task Options You Must Know
- Bandwidth throttling. You can cap a task at N MBps so the copy does not saturate your production internet link during business hours.
- Scheduling. Hourly, daily, weekly cron expressions. This is how you do "nightly incremental replication" of an on-prem NFS share into S3.
- Include / exclude filters. Glob patterns to skip temp files, build artifacts, or specific directories.
- Verification modes. `POINT_IN_TIME_CONSISTENT` (default; verify entire dataset), `ONLY_FILES_TRANSFERRED` (faster; verify only what moved), `NONE` (skip verification, not recommended).
- Incremental transfer. DataSync detects changed files by comparing source and destination metadata (size, modification time, optionally checksums). Only changed bytes are copied on subsequent runs — this is what makes DataSync dramatically more efficient than repeatedly copying the whole dataset.
- Transfer over VPC endpoints. You can route DataSync traffic through AWS PrivateLink so data never touches the public internet.
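The include/exclude behavior is easy to picture as glob matching. A sketch using Python's fnmatch; DataSync's actual filter syntax differs in detail (patterns are `|`-separated strings in the API), so treat this as a behavioral illustration only, with hypothetical paths and patterns:

```python
from fnmatch import fnmatch

def should_transfer(path: str, excludes: list[str]) -> bool:
    """Return True unless the path matches any exclude pattern."""
    return not any(fnmatch(path, pat) for pat in excludes)

# Hypothetical filter list: skip temp files and a build directory.
excludes = ["*.tmp", "/build/*"]

print(should_transfer("/data/report.csv", excludes))   # True  (copied)
print(should_transfer("/data/cache01.tmp", excludes))  # False (skipped)
print(should_transfer("/build/app.bin", excludes))     # False (skipped)
```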
DataSync Throughput
A single DataSync task can push up to ~10 Gbps per agent in practice (AWS advertises up to tens of Gbps for aggregate tasks with multiple agents). For a 100 TB dataset on a 10 Gbps link with agent parallelism, the copy completes in roughly a day, modulo overhead.
DataSync Pricing Model
You pay per gigabyte transferred (flat rate) plus standard AWS request, storage, and data transfer charges on the destination. There is no charge for the number of agents, tasks, or schedules. This flat per-GB model is the key reason DataSync is so much cheaper than writing your own rsync over EC2 — you pay only for throughput, not for the compute that did the copy.
AWS Transfer Family — Managed SFTP, FTPS, FTP, and AS2
AWS Transfer Family is a fully managed service that exposes SFTP, FTPS, FTP, and AS2 endpoints backed by Amazon S3 or Amazon EFS. It is the right answer when external partners, vendors, or legacy systems need to drop files into AWS using a standard file-transfer protocol and you do not want to run an EC2 SFTP server yourself.
Protocol Endpoints
- SFTP (SSH File Transfer Protocol). Most common. TCP port 22. File transfer over SSH.
- FTPS (File Transfer Protocol over TLS). TCP port 21 (explicit) or 990 (implicit). Legacy but still widely used, especially in finance and healthcare.
- FTP (plain). Not encrypted. Only usable for VPC-internal traffic (the service refuses to expose plain FTP to the public internet).
- AS2 (Applicability Statement 2). Signed, encrypted, message-based B2B protocol used heavily in EDI (retail, logistics, healthcare). HTTP/HTTPS transport with S/MIME payloads and MDN receipts.
Identity Provider Options
Transfer Family supports three authentication backends:
- Service-managed users. Usernames and SSH public keys stored directly in the Transfer Family service. Simplest option for small partner lists.
- AWS Directory Service. Integrates with AWS Managed Microsoft AD or AD Connector for enterprise identity federation — partners authenticate with corporate credentials.
- Custom identity provider (Lambda-backed). A Lambda function that you write receives the username/password/SSH key and returns IAM role, home directory, and logical-directory mappings. This is what you use when your identity lives in Okta, Azure AD, or a home-grown user database.
Logical Directories
By default, an SFTP user dropped into s3://bucket-name/partner-a/ would see the full bucket path. Logical directories let you remap paths so partner A only sees /upload and /download — hiding bucket names, prefixes, and internal structure. This is a common compliance requirement.
Managed File Transfer Workflows
Transfer Family includes managed workflows — event-driven pipelines triggered on file arrival. A workflow can: decrypt PGP-encrypted files, move files to final S3 keys, invoke Lambda for custom processing, tag files for downstream jobs. This is the supported pattern for "when a partner SFTPs a file, validate it, decrypt it, and publish to a processing queue."
Endpoint Types and Network Exposure
Transfer Family servers can be:
- Public endpoint. Reachable from the internet at an AWS-assigned DNS name.
- VPC endpoint (internet-facing). Elastic IP(s) you control, placed in your VPC public subnets.
- VPC endpoint (internal). Private-only, reachable via VPN / Direct Connect / VPC peering.
AWS Snow Family — Offline Physical Transfer
AWS Snow Family is the portfolio of ruggedized hardware appliances AWS ships to your location for offline data transfer and edge compute. You load data on-site, ship the device back, and AWS imports the data into S3 inside the Region.
Snowcone
- Size: Portable, 4.5 lbs, fits in a backpack.
- Storage: 8 TB HDD or 14 TB SSD.
- Compute: 2 vCPUs, 4 GB RAM (enough for light edge workloads).
- Connectivity: Wi-Fi, Ethernet, or LTE (via AWS Snowcone with LTE modem).
- Use case: Tactical / field / first-responder data collection; small (single-digit TB) offline transfer; IoT edge.
- Shipping: Small and light enough to ship as a standard parcel.
Snowball Edge Storage Optimized
- Storage: ~80 TB of usable HDD capacity (210 TB in the newer SSD variant for object storage tasks).
- Compute: Modest (e.g., 40 vCPUs, 80 GB RAM).
- Use case: The default choice for offline transfer in the ~30–100 TB range. Ship one, ship two, ship a fleet — the price-per-GB is dominated by device rental and shipping, not size.
Snowball Edge Compute Optimized
- Storage: ~42 TB usable (lower, because it trades some disk for compute).
- Compute: 52 vCPUs, 208 GB RAM, optional GPU (NVIDIA V100 in the GPU variant).
- Use case: Edge computing at disconnected / intermittently connected sites — run EC2 instances, Lambda functions, and EKS Anywhere on the device while it is on-site. Think offshore oil rigs, military forward bases, remote scientific stations. Data is collected, processed on-device, and the processed output goes back to AWS.
Snowmobile
- Capacity: Up to 100 PB per truck.
- Form factor: A 45-foot shipping container pulled by a semi-truck. Literally.
- Use case: Evacuating exabyte-scale on-prem datacenters. Historically used for media libraries, genomics archives, and satellite imagery hoards.
- Status (as of 2026): AWS no longer accepts new Snowmobile orders for most regions and recommends multiple Snowball Edge devices in parallel for PB-scale migrations. Still appears in SAA-C03 questions — know it exists, know the 100 PB number, know it is for "datacenter-evacuation-scale" only.
Snow Family Security
- All devices encrypt data with 256-bit encryption.
- Encryption keys are managed in AWS KMS and never stored on the device — if the device is lost or stolen, the data is unreadable.
- Tamper-evident and tamper-resistant enclosures with TPM.
- Trusted Platform Module (TPM) verifies device integrity on return.
- Chain-of-custody logged through the AWS OpsHub for Snow Family application and the Snow Family console.
The Snowball Decision Tree (Memorize This)
Use this mental model for SAA-C03 "pick the data transfer service" scenarios:
- Is the dataset > 100 TB AND the available bandwidth < 1 Gbps? → Snowball Edge (probably multiple devices in parallel).
- Is the dataset 10–100 TB with < 500 Mbps bandwidth? → Snowball Edge.
- Is the dataset 1–10 TB and you need edge compute too? → Snowcone (with LTE) or Snowball Edge Compute Optimized depending on compute needs.
- Is the dataset < 10 TB and bandwidth is adequate? → AWS DataSync online. Or S3 Transfer Acceleration for direct upload.
- Is it ongoing, not one-shot? → Storage Gateway (hybrid) or Direct Connect + DataSync (dedicated pipe).
- Is it datacenter-scale, PB-level, one-time? → Fleet of Snowball Edges (historically Snowmobile).
The arithmetic heuristic: compute how long online transfer takes. If it is more than one week, Snowball wins. If it is more than one month, Snowball wins overwhelmingly. The formula days = (volume_TB × 8000) / (bandwidth_Mbps × 86.4 × utilization) with utilization ≈ 0.8 is enough to solve any exam scenario in your head.
AWS Storage Gateway — Hybrid Cloud Storage Integration
AWS Storage Gateway is a hybrid, software-plus-service that exposes AWS storage to on-premises applications using standard storage protocols (NFS, SMB, iSCSI, iSCSI-VTL). Unlike DataSync, it is designed for ongoing connectivity, not one-time migration.
Storage Gateway Types
- Amazon S3 File Gateway. Exposes S3 buckets as NFS or SMB mounts. Files written on-prem become objects in S3 (one-to-one). Hot files are cached locally for low-latency reads.
- Amazon FSx File Gateway. Local cache in front of FSx for Windows File Server across a slow WAN. Helps remote offices access FSx with low latency.
- Volume Gateway. Presents iSCSI block volumes on-prem.
- Cached Volumes: primary data in S3, frequently accessed data cached locally.
- Stored Volumes: primary data on-prem, asynchronously backed up to S3 as EBS snapshots.
- Tape Gateway (VTL). Emulates an LTO tape library over iSCSI VTL. Backup software (Veritas NetBackup, Commvault, Veeam) writes "tapes" that are stored in S3 and archived to S3 Glacier / Glacier Deep Archive.
When to Pick Storage Gateway
- "We have NFS filers on-prem and want to gradually move cold data to S3 while keeping hot data fast locally." → File Gateway.
- "We need to replace a physical tape library with cloud storage but keep the existing backup software." → Tape Gateway.
- "We want on-prem block volumes backed up continuously to AWS as EBS snapshots." → Volume Gateway (Stored Volumes).
Picking the Transfer Method — The Decision Matrix
This is the core skill the SAA-C03 tests. Use these three variables in this order:
Variable 1: Data Volume
- < 10 TB → Online is almost always viable.
- 10 TB – 100 TB → Depends on bandwidth. Run the math.
- 100 TB – 1 PB → Snowball Edge fleet almost always wins.
- Over 1 PB → Historically Snowmobile; today a large Snowball Edge fleet.
Variable 2: Available Bandwidth
- < 100 Mbps (typical small-office broadband) → Snowball Edge past ~5 TB.
- 100 Mbps – 1 Gbps → Online viable up to ~50 TB over a few days.
- 1 Gbps – 10 Gbps (dedicated link) → Online viable up to ~500 TB over a week. DataSync + Direct Connect.
- Over 10 Gbps (multiple Direct Connects) → Online competitive even at PB scale for non-urgent migrations.
Variable 3: Urgency
- Hours → S3 Transfer Acceleration (if the dataset is small enough) or fleet of Snowballs in parallel.
- Days → DataSync over the fastest available link.
- Weeks → Snowball Edge is most cost-effective for most sizes.
- Ongoing (not a one-shot) → Storage Gateway or DataSync on a schedule, typically over Direct Connect.
The Decision Tree
Is it a database migration?
├── Yes → AWS DMS (with AWS Schema Conversion Tool for heterogeneous)
└── No ↓
Is it ongoing / continuous hybrid access?
├── Yes → AWS Storage Gateway
└── No ↓
Is it an external partner sending files via SFTP/FTPS/FTP/AS2?
├── Yes → AWS Transfer Family
└── No ↓
Compute days_online = (volume_TB × 8000) / (bandwidth_Mbps × 86.4 × 0.8)
Is days_online > 7?
├── Yes → AWS Snow Family (Snowcone / Snowball Edge / fleet)
└── No ↓
Is it a single S3 bucket upload from globally distributed clients?
├── Yes → S3 Transfer Acceleration
└── No → AWS DataSync (with or without Direct Connect)
AWS Direct Connect for Ongoing Bulk Transfer
AWS Direct Connect is a dedicated network connection from your datacenter, office, or colocation environment to AWS. It provides:
- Dedicated bandwidth: 1 Gbps, 10 Gbps, or 100 Gbps dedicated ports; sub-1 Gbps hosted connections available through AWS Direct Connect partners.
- Private, predictable latency: Traffic never rides the public internet, so latency variance is low.
- Lower data-transfer cost: Egress from AWS over Direct Connect is priced significantly below internet egress per GB.
- Virtual Interfaces (VIFs): Private VIF (to a VPC), Public VIF (to AWS public services like S3), Transit VIF (to a Transit Gateway).
Direct Connect for Data Transfer
Direct Connect is not itself a data transfer service — it is the pipe that other data transfer services use. Combine Direct Connect with:
- DataSync over Direct Connect for bulk one-time or scheduled transfer without touching the public internet.
- Storage Gateway over Direct Connect for hybrid integration with predictable latency.
- DMS over Direct Connect for database migration with stable replication throughput.
Direct Connect + VPN for Encryption
A raw Direct Connect circuit is private but not encrypted. For end-to-end encryption you layer a VPN over the Direct Connect (called "Direct Connect + VPN" or an IPsec tunnel over a public VIF).
Amazon S3 Transfer Acceleration
Amazon S3 Transfer Acceleration (S3TA) speeds up long-distance uploads to S3 by routing traffic through the AWS global edge network (the same 600+ CloudFront edge points of presence). Clients upload to a nearby edge location, and AWS moves the bytes to the destination S3 bucket over the AWS backbone.
When to Use S3 Transfer Acceleration
- Globally distributed clients uploading to a single S3 bucket.
- Large objects (≥ 100 MB) where the AWS backbone meaningfully outperforms the public internet path.
- Single-bucket workloads — S3TA is enabled per bucket.
- Workloads where the S3 Transfer Acceleration pricing premium is worth the speedup.
When Not to Use S3 Transfer Acceleration
- Small objects (< 1 MB); the overhead dwarfs the speedup.
- Clients in the same Region as the bucket; they already benefit from AWS's regional backbone.
- Cost-sensitive workloads where the per-GB acceleration premium exceeds the value.
Transfer Acceleration vs Multipart Upload
Multipart upload parallelizes one large upload into chunks — it is independent of S3TA and should always be used for objects > 100 MB. You can combine both: multipart upload over S3 Transfer Acceleration.
AWS Database Migration Service — Moving Databases
AWS Database Migration Service (DMS) migrates relational databases, data warehouses, and NoSQL stores into AWS with minimal downtime. It supports two migration classes:
Homogeneous Migration
Source and target speak the same engine — Oracle to Oracle, MySQL to MySQL, PostgreSQL to PostgreSQL. Schema conversion is trivial (or identical). Example: self-managed MySQL on EC2 → Amazon RDS for MySQL.
Heterogeneous Migration
Source and target speak different engines — Oracle to Amazon Aurora PostgreSQL, SQL Server to Amazon RDS for MySQL, Oracle to DynamoDB. Schema conversion is non-trivial and handled by the AWS Schema Conversion Tool (SCT), a desktop application that analyzes the source schema, produces a target schema, and flags manual-fix items. SCT runs first; DMS runs second to move the data.
DMS Architecture
- Replication instance. An EC2 instance (managed by DMS) that runs the replication engine. Sized for throughput (dms.t3.medium for tests, dms.c5.4xlarge for production-grade migrations).
- Endpoints. Source endpoint and target endpoint definitions (connection strings, credentials, SSL options).
- Replication task. The unit of migration. Three task types:
- Full load. One-time copy of existing data.
- Full load + CDC. Full load followed by continuous change data capture to keep the target in sync until cutover.
- CDC only. Apply only ongoing changes (used after a separate one-time load).
Supported Sources and Targets
- Sources: Oracle, SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, Amazon Aurora, IBM Db2, SAP ASE, Azure SQL, Google Cloud SQL.
- Targets: All of the above, plus Amazon Redshift, Amazon DynamoDB, Amazon S3 (as Parquet / CSV), Amazon OpenSearch Service, Amazon Kinesis Data Streams, Amazon MSK, Amazon Neptune, Babelfish for Aurora PostgreSQL.
DMS Typical Migration Flow
- Run AWS SCT against the source to produce a target schema (heterogeneous only).
- Create the target database (Aurora PostgreSQL, RDS, Redshift, etc.).
- Create a DMS replication instance in a VPC with network access to both endpoints.
- Define source and target endpoints and test connectivity.
- Launch a "full load + CDC" task.
- When CDC lag is near zero, cut over the application and stop the task.
Securing AWS Data Transfer Access Points
Every AWS data transfer solution must be locked down for production use. The SAA-C03 tests security integration heavily.
IAM Roles for Transfer Services
- DataSync agents assume an IAM role to write to S3 / EFS / FSx. The role must have `s3:PutObject`, `s3:ListBucket`, and destination-specific permissions.
- Transfer Family servers assign an IAM role per user (service-managed) or per authentication callback (custom IdP). The role scopes which bucket/prefix the user can read/write.
- DMS replication instances run in a VPC and use an IAM role (`dms-vpc-role`) plus endpoint-specific credentials.
- Snow Family devices are tied to an IAM role for S3 import and AWS KMS keys for encryption.
VPC Endpoints for Private Transfer
Route data transfer traffic through AWS PrivateLink so it never crosses the public internet:
- DataSync over VPC endpoint (interface type).
- S3 Gateway VPC endpoint for any service that writes to S3 (free, regional).
- DMS endpoints inside the VPC with private IP addressing.
- Transfer Family with VPC endpoint (internal) hostname so only VPN/Direct Connect clients can reach it.
Encryption
- DataSync uses TLS in transit; destination encryption depends on the destination (SSE-S3 / SSE-KMS for S3, EFS encryption, FSx encryption).
- Transfer Family enforces SSH (SFTP), TLS (FTPS), or S/MIME (AS2) in transit; data at rest inherits the destination's encryption.
- Snow Family encrypts every byte with 256-bit encryption; keys in AWS KMS only.
- DMS supports TLS to source and target endpoints and supports encrypting the replication instance storage.
Transfer Sizing — Estimating Time and Cost
Sizing the transfer is half the SAA-C03 data transfer question family. The arithmetic you need:
Time Arithmetic
Days of online transfer = (volume in TB × 8000) / (bandwidth in Mbps × 86.4 × utilization_factor)
With utilization ≈ 0.8 (80% of link usable after TCP overhead):
- 10 TB over 100 Mbps: (10 × 8000) / (100 × 86.4 × 0.8) ≈ 11.6 days
- 10 TB over 1 Gbps: ~1.2 days
- 100 TB over 1 Gbps: ~11.6 days
- 100 TB over 100 Mbps: ~116 days (hence: Snowball)
- 1 PB over 10 Gbps: ~12 days
Cost Arithmetic
- DataSync flat per-GB transferred plus destination request/storage/data-transfer.
- Snowball Edge flat device rental ($X per job) plus shipping plus optional extra-day fees plus standard S3 request/storage. Under ~50 TB, cost per GB is typically higher than DataSync; above ~80 TB on a slow link, Snowball is dramatically cheaper.
- Direct Connect monthly port fee ($) plus per-GB egress (much lower than internet egress).
- DMS replication instance per-hour plus data transfer between VPC / Region / internet.
Key Numbers to Memorize
For quick SAA-C03 recall on AWS data transfer solutions:
- Snowcone: 8 TB HDD or 14 TB SSD; 2 vCPUs; 4 GB RAM.
- Snowball Edge Storage Optimized: ~80 TB usable HDD; 40 vCPUs; 80 GB RAM.
- Snowball Edge Compute Optimized: ~42 TB; 52 vCPUs; 208 GB RAM; optional NVIDIA V100 GPU.
- Snowmobile: up to 100 PB per truck; deprecated for new orders.
- DataSync: up to ~10 Gbps per agent; flat per-GB pricing.
- Transfer Family: SFTP, FTPS, FTP, AS2; backed by S3 or EFS.
- Direct Connect: 1 / 10 / 100 Gbps dedicated ports; lower egress cost than internet.
- S3 Transfer Acceleration: uses CloudFront edge network; per-bucket setting; per-GB premium on top of standard S3 pricing.
- DMS task types: Full load, Full load + CDC, CDC only.
- AWS SCT: heterogeneous schema conversion; free; desktop tool.
- Arithmetic: days = (TB × 8000) / (Mbps × 86.4 × 0.8). Seven-day threshold = Snowball.
Common Exam Traps for AWS Data Transfer Solutions
Trap 1: DataSync vs Snowball
The question gives you a dataset size and a bandwidth, and asks which to use. Always do the days arithmetic. Under ~7 days online, DataSync; over ~7 days online, Snowball. Do not be fooled by "we want it secure" distractors — both are encrypted.
Trap 2: Storage Gateway vs DataSync
Storage Gateway is ongoing and bidirectional (apps keep using on-prem mounts while data lives in cloud). DataSync is one-time or scheduled and primarily one-way (copy, not mount). Trigger words: "replace tape library" / "ongoing NAS tiering" → Storage Gateway; "migrate this share to S3" → DataSync.
Trap 3: Transfer Family vs DataSync
Transfer Family is for external partners using a file-transfer protocol (SFTP / FTPS / FTP / AS2). DataSync is for you copying your own data. If the requirement says "our customers / vendors / partners send files," it is Transfer Family. If it says "we need to migrate / sync our own data," it is DataSync.
Trap 4: Snowball vs Snowmobile
Snowmobile is for exabyte scale (100 PB, a literal truck). Snowball Edge is for terabyte to petabyte scale. Today AWS recommends Snowball Edge fleets instead of Snowmobile for most large migrations. If a question says "50 PB datacenter evacuation," the historically correct answer is Snowmobile; the current AWS-recommended answer is a Snowball Edge fleet.
Trap 5: DMS vs DataSync for Databases
DMS is database-aware — it reads transaction logs, handles CDC, and writes to a live target database. DataSync is file/object-aware — it reads files. Dumping a database to flat files and moving with DataSync loses transactional consistency and costs you downtime. For any live database migration, DMS is the correct answer.
Trap 6: Direct Connect Alone Does Not Transfer Data
Direct Connect is the pipe. Data transfer services (DataSync, Storage Gateway, DMS) run over the pipe. If a question asks "how do we migrate 500 TB from on-prem to S3 over our existing Direct Connect," the answer is "DataSync over Direct Connect," not "Direct Connect alone."
Trap 7: S3 Transfer Acceleration Is Not DataSync
S3 Transfer Acceleration speeds up direct client uploads to a single S3 bucket via CloudFront edges. DataSync orchestrates file-system-to-S3 bulk copy with scheduling, incrementals, and validation. The question wording is the tell: "global users uploading directly to S3" → S3TA; "scheduled NFS share sync to S3" → DataSync.
Data Transfer Costs — Minimizing Egress and Transfer Fees
Cost optimization interleaves with transfer design. The high-impact levers:
- Inbound data transfer into AWS Regions is free. You do not pay per GB to upload to S3 over the internet. Outbound egress is expensive, cross-Region transfer is expensive, and cross-AZ traffic carries a small per-GB fee.
- S3 Gateway VPC endpoints are free and remove NAT Gateway data-processing charges for S3 traffic from private subnets.
- Direct Connect egress is much cheaper than internet egress per GB — often the break-even point for a 1 Gbps dedicated port is reached above ~10 TB/month of sustained egress.
- DataSync has flat per-GB transfer pricing on top of destination storage/request costs; there is no compute premium for the copy.
- Snowball Edge is a flat device rental fee plus shipping; there is no per-GB transfer charge for the Snow Family data itself (you still pay S3 storage for what lands in the bucket).
- Transfer Family charges per-hour per enabled endpoint plus per-GB uploaded/downloaded — keep endpoints off when not in use.
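The Direct Connect break-even lever above can be sketched as arithmetic; all three prices here are illustrative assumptions for the sketch, not current AWS rates:

```python
# Illustrative prices only (assumptions, not current AWS rates):
INTERNET_EGRESS_PER_GB = 0.09   # assumed internet data-transfer-out rate, $/GB
DX_EGRESS_PER_GB = 0.02         # assumed Direct Connect data-transfer-out rate, $/GB
DX_PORT_PER_MONTH = 216.0       # assumed 1 Gbps dedicated port (~$0.30/hr * 720 hr)

def monthly_cost_internet(egress_gb: float) -> float:
    return egress_gb * INTERNET_EGRESS_PER_GB

def monthly_cost_dx(egress_gb: float) -> float:
    return DX_PORT_PER_MONTH + egress_gb * DX_EGRESS_PER_GB

# Break-even egress: where the flat port fee is paid back by the cheaper per-GB rate.
break_even_gb = DX_PORT_PER_MONTH / (INTERNET_EGRESS_PER_GB - DX_EGRESS_PER_GB)
print(f"break-even ~= {break_even_gb / 1000:.1f} TB/month")
```

With these assumed rates the raw per-GB break-even lands near 3 TB/month; real deployments add cross-connect and colocation fees, which pushes the practical break-even toward the ~10 TB/month rule of thumb above.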
Data Transfer vs Data Transformation — Where This Topic Ends
AWS data transfer solutions move bytes. They do not transform bytes. If you also need to convert CSV to Parquet, catalog the schema, clean up PII, or run analytics, those are jobs for AWS Glue, Amazon EMR, AWS Lambda, and Amazon Athena — covered in the Glue / EMR data transformation topic and the Athena / Lake Formation / QuickSight analytics topic. A typical production pipeline is "DataSync → S3 landing zone → Glue crawler → Glue job → S3 curated zone → Athena." Know the boundary — SAA-C03 will offer Glue as a distractor for transfer questions and Transfer Family as a distractor for transformation questions.
FAQ — AWS Data Transfer Solutions Top 7 Questions
1. When should I use AWS DataSync instead of writing my own rsync script?
Use DataSync whenever the source is NFS, SMB, HDFS, or an S3-compatible object store, and the destination is S3, EFS, or FSx. DataSync is a managed service — you pay a flat per-GB fee and AWS handles scheduling, retries, incremental detection, verification, bandwidth throttling, and metadata preservation. Writing your own rsync on EC2 means running a VM, monitoring it, handling crashes, implementing verification yourself, and paying for the EC2 hours. DataSync wins on operational overhead for almost every real workload. The only time to roll your own is when the source protocol is not supported (rare) or the dataset is trivially small (a few GB, where any tool works).
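Those managed knobs (verification, throttling, scheduling) are just fields on the task. A sketch of a request shaped for boto3's DataSync `create_task` call — the ARNs and every value are placeholders, not a definitive configuration:

```python
# Sketch of a DataSync create_task request, as you'd hand it to
# boto3.client("datasync").create_task(**task_request). ARNs are placeholders.
task_request = {
    "SourceLocationArn": "arn:aws:datasync:us-east-1:123456789012:location/loc-src",   # NFS/SMB source
    "DestinationLocationArn": "arn:aws:datasync:us-east-1:123456789012:location/loc-dst",  # S3 destination
    "Name": "nightly-share-sync",
    "Options": {
        "VerifyMode": "ONLY_FILES_TRANSFERRED",   # checksum-verify what actually moved
        "BytesPerSecond": 50 * 1024 * 1024,       # throttle to ~50 MB/s so the WAN survives
        "PreserveDeletedFiles": "PRESERVE",       # don't delete in S3 when source files vanish
    },
    # The scheduling you'd otherwise hand-roll in cron around rsync:
    "Schedule": {"ScheduleExpression": "cron(0 2 * * ? *)"},   # 2 AM daily
}
print(task_request["Name"])
```

Everything in `Options` is a thing your rsync script would have to implement and monitor itself — that is the operational-overhead argument in one dict.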
2. How do I decide between online transfer and AWS Snow Family offline transfer?
Do the time arithmetic: days = (volume_TB × 8000) / (bandwidth_Mbps × 86.4 × 0.8). If the result is more than one week, Snow Family almost always wins on time, cost, and operational risk (no worries about WAN flaps, saturating production internet, or multi-day copy jobs being interrupted). If the result is less than a few days, online with DataSync is simpler. The bandwidth-volume grid: under 10 TB with reasonable bandwidth → online; over 100 TB with < 1 Gbps → Snow Family; in between → run the math.
3. What is the difference between AWS Transfer Family and AWS DataSync?
Transfer Family receives files from external parties using standard file-transfer protocols (SFTP, FTPS, FTP, AS2). DataSync copies data between your own storage systems on a schedule or on demand. If the scenario is "our partners need to drop daily files," Transfer Family. If the scenario is "we need to migrate or sync our own NFS share," DataSync.
4. Can AWS Database Migration Service migrate without downtime?
DMS supports near-zero-downtime migration using the Full load + CDC task type. The initial full load copies existing data while the CDC component captures every transaction from the source. When the CDC lag is small (seconds), you cut over the application — stop writes on the source, wait for CDC to drain, then start writes on the target. Actual downtime is typically minutes, not hours. For heterogeneous migrations (Oracle → Aurora PostgreSQL), run the AWS Schema Conversion Tool first to translate the schema, then run DMS.
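The full-load-plus-CDC pattern maps directly onto the DMS task definition. A sketch shaped for boto3's DMS `create_replication_task` call — the ARNs, identifier, and schema name are all placeholders:

```python
import json

# Sketch of a near-zero-downtime DMS task request, as you'd hand it to
# boto3.client("dms").create_replication_task(**task_request). ARNs are placeholders.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-app-schema",
        "object-locator": {"schema-name": "app", "table-name": "%"},  # every table in "app"
        "rule-action": "include",
    }]
}
task_request = {
    "ReplicationTaskIdentifier": "orders-to-aurora",
    "SourceEndpointArn": "arn:aws:dms:us-east-1:123456789012:endpoint:SRC",
    "TargetEndpointArn": "arn:aws:dms:us-east-1:123456789012:endpoint:TGT",
    "ReplicationInstanceArn": "arn:aws:dms:us-east-1:123456789012:rep:INST",
    "MigrationType": "full-load-and-cdc",       # initial bulk copy + ongoing change capture
    "TableMappings": json.dumps(table_mappings),  # DMS takes the mappings as a JSON string
}
print(task_request["MigrationType"])
```

`MigrationType` is the exam-relevant field: `full-load` alone means downtime during the copy; `full-load-and-cdc` is what makes the minutes-not-hours cutover possible.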
5. When does AWS Direct Connect make sense for data transfer?
Direct Connect makes sense when you have ongoing, high-volume, predictable-latency transfer between your datacenter and AWS, or when you need lower egress cost per GB. For a one-time 10 TB migration, the internet over DataSync is fine. For a 50 TB-per-month ongoing sync with latency-sensitive workloads, Direct Connect + DataSync pays for itself quickly. Remember: Direct Connect is the pipe; you still need DataSync, Storage Gateway, or DMS on top to actually move the data.
6. How do I transfer data securely so nothing touches the public internet?
Combine three things: (a) AWS Direct Connect or AWS Site-to-Site VPN as the transport, (b) VPC endpoints (AWS PrivateLink) for the AWS service endpoints (DataSync, S3 Gateway endpoint, Transfer Family internal endpoint, DMS in-VPC endpoints), and (c) encryption at rest on the destination (SSE-KMS for S3, KMS-encrypted EBS snapshots, encrypted EFS/FSx). This is the standard pattern for HIPAA, PCI-DSS, and GDPR-regulated workloads, and SAA-C03 tests it directly.
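One piece of that pattern, sketched as code: a bucket policy that denies S3 requests unless they arrive through a specific Gateway VPC endpoint. The bucket name and endpoint ID are placeholders; this is an illustration of the condition-key pattern, not a complete security posture:

```python
import json

# Sketch: deny all S3 access to the landing bucket unless the request comes
# through one Gateway VPC endpoint. Bucket name and vpce- ID are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyOutsideVpcEndpoint",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::migration-landing",      # bucket-level actions
            "arn:aws:s3:::migration-landing/*",    # object-level actions
        ],
        # aws:SourceVpce identifies the VPC endpoint the request traversed.
        "Condition": {"StringNotEquals": {"aws:SourceVpce": "vpce-0123456789abcdef0"}},
    }],
}
print(json.dumps(policy, indent=2)[:60])
```

Pair this with Direct Connect or Site-to-Site VPN as the transport and SSE-KMS on the bucket, and no byte of the transfer ever rides the public internet in the clear.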
7. Should I use S3 Transfer Acceleration for a bulk on-premises to S3 migration?
Usually no. S3 Transfer Acceleration is designed for globally distributed clients uploading to a single S3 bucket over the public internet — it routes traffic through CloudFront edges to ride the AWS backbone. For a bulk migration from a single on-premises datacenter, DataSync (for file-system sources) or Snowball Edge (for a large one-time move) is purpose-built and more cost-effective. Reach for S3TA when the requirement says "many users worldwide upload directly to S3 and we want the uploads to be faster."
Data Transfer Solutions — Summary
AWS data transfer solutions split into four families: online network transfer (DataSync, Transfer Family, Storage Gateway, S3 Transfer Acceleration), offline physical transfer (Snow Family), dedicated bandwidth (Direct Connect), and database-aware migration (DMS). The SAA-C03 tests three decision variables repeatedly: volume, bandwidth, and urgency. Run the days-online arithmetic first — if online would take longer than a week, pick Snow Family. Then look for trigger words: "ongoing hybrid" → Storage Gateway, "partner SFTP/FTPS/AS2" → Transfer Family, "database" → DMS, "global direct uploads" → S3 Transfer Acceleration. Everything else that is "copy this storage into AWS once or on a schedule" is DataSync, ideally over Direct Connect and a VPC endpoint for regulated workloads. Memorize the Snow Family capacities (Snowcone 8/14 TB, Snowball Edge Storage Optimized ~80 TB, Snowmobile up to 100 PB) and the arithmetic, and the entire AWS data transfer solutions question family collapses into a 30-second decision tree.