Vol. I

監督式、非監督式與強化學習

5,600 words · about 28 minutes reading time

The Supervised, Unsupervised, and Reinforcement Learning taxonomy is the first vocabulary AIF-C01 candidates must master in Domain 1. Task statement 1.1, "Explain basic AI concepts and terminologies," routinely puts a business scenario in front of you and asks which of the three paradigms applies. Pick wrong and a good chunk of the 20% Domain 1 weight evaporates.

This study guide walks through every dimension of supervised, unsupervised, and reinforcement learning that appears on AIF-C01, plus the self-supervised and semi-supervised variants that power modern foundation models. You will learn the label-availability heuristic that decides the family, the AWS services that implement each paradigm (SageMaker built-in algorithms, SageMaker JumpStart, AWS DeepRacer, Amazon Bedrock fine-tuning), and the subtle distinctions the exam uses as trap doors — including the notorious Domain Adaptation vs Transfer Learning pair flagged in community pain-point reports.

What Are Supervised, Unsupervised, and Reinforcement Learning?

Supervised, unsupervised, and reinforcement learning are the three canonical machine-learning paradigms, distinguished by the kind of training signal the model sees. Supervised learning uses labeled input-output pairs. Unsupervised learning uses inputs only and hunts for structure. Reinforcement learning uses a reward signal earned by interacting with an environment. Every ML problem on AIF-C01 — and in the wider AWS ML stack — ultimately reduces to one of these three (plus two hybrids we will cover: self-supervised and semi-supervised).

The AIF-C01 exam guide lists these paradigms explicitly under Task 1.1 and expects candidates to (a) recognize each by keywords in a scenario, (b) pick the correct AWS service that hosts that paradigm, and (c) avoid the trap questions where a plausible-sounding but wrong paradigm is offered.

Supervised learning trains a model on (input, label) pairs so it can predict labels for new inputs. Unsupervised learning trains a model on inputs only and finds hidden structure (clusters, low-dimensional embeddings, anomalies). Reinforcement learning trains an agent to pick actions in an environment to maximize cumulative reward. These three categories — collectively supervised, unsupervised, and reinforcement learning — cover the vast majority of classical ML on AWS. Source ↗

Why AIF-C01 Obsesses Over the Three-Family Taxonomy

Domain 1 is worth 20% of the AIF-C01 exam and Task 1.1 alone drives roughly six to ten scored questions in a 65-question sitting. Scenario questions almost never ask "define reinforcement learning." Instead they describe a problem ("an engineering team wants a robot to learn to walk without hand-written rules") and expect you to snap to the paradigm. Memorizing keywords is part of it; understanding the shape of the training signal is the durable skill.

The supervised, unsupervised, and reinforcement learning vocabulary is the conceptual root of:

  • Classification metrics (accuracy, precision, recall, F1) — evaluated in Overfitting, Bias, and Variance
  • The ML Development Lifecycle — data labeling cost depends directly on which paradigm you picked
  • Foundation models — pre-training is self-supervised; fine-tuning is supervised; RLHF is reinforcement learning from human feedback
  • AWS service mapping — SageMaker built-in algorithms partition cleanly into the three families

Why Learning Type Matters: The Label Availability Test

Before any algorithm selection, AWS recommends a single diagnostic question: do you have labels, can you get them cheaply, and what do the labels look like? The answer routes you through supervised, unsupervised, and reinforcement learning in predictable ways.

  • Lots of labeled data → supervised learning is default
  • No labels but lots of raw inputs → unsupervised learning
  • No labels but a simulator/environment with a scoring function → reinforcement learning
  • Labels are expensive but unlabeled data is abundant → semi-supervised or self-supervised learning
  • Labels emerge from the data itself (predict next token, predict masked pixel) → self-supervised learning

This ordering reappears below and is the single most useful heuristic for Task 1.1 scenario questions.
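The routing above can be written down as a tiny decision function. This is an illustrative sketch (the function name and boolean flags are invented here, not any AWS API), encoding the list as a first-match-wins chain:

```python
def pick_paradigm(has_environment: bool, labels_from_data: bool,
                  labels_abundant: bool, some_labels: bool) -> str:
    """Label-availability heuristic as a first-match-wins chain (illustrative)."""
    if has_environment:          # simulator + scoring function available
        return "reinforcement"
    if labels_from_data:         # predict next token, masked pixel, etc.
        return "self-supervised"
    if labels_abundant:          # lots of labeled rows
        return "supervised"
    if some_labels:              # a few labels plus a big unlabeled pool
        return "semi-supervised"
    return "unsupervised"        # raw inputs only: discover structure
```

On exam day you run the same chain mentally: environment first, free labels second, label budget last.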

Supervised, Unsupervised, and Reinforcement Learning in Plain Language

The supervised, unsupervised, and reinforcement learning distinction becomes obvious when you step out of ML jargon and into everyday environments. Three analogies from different domains make the differences unforgettable.

The Open-Book Exam Analogy (Supervised Learning)

Imagine a student preparing for a certification exam. The study materials are a giant question bank where every question comes with the correct answer printed next to it. The student reviews thousands of (question, answer) pairs, notices patterns in how answers are structured, and on exam day is presented with brand-new questions whose answers are hidden. If the student has truly learned, predictions on the new questions will be right most of the time.

That is supervised learning. The labeled question bank is the training set. The hidden-answer exam is the test set. The whole point is generalization from labels you already have to inputs you have not seen. Classification problems (Is this email spam? yes/no) and regression problems (What will this house sell for? a number) are just different output shapes of the same family — its supervised branch.

The Library Analogy (Unsupervised Learning)

Now imagine a new librarian walking into a warehouse full of unshelved, uncatalogued books. Nobody told them which books belong to which genre. All they have is the text itself. Their job is to notice that some books share vocabulary, themes, and writing style — "these fifty books keep mentioning starships; they probably belong together." The librarian is not predicting a pre-existing label. They are discovering structure latent in the data.

That is unsupervised learning. K-Means clustering groups similar customers. Principal Component Analysis squeezes 500 features into 20 while keeping most of the variance. Anomaly detection notices a transaction that does not look like any of the clusters and flags it as potential fraud. Across all three, no ground-truth label exists — the unsupervised branch earns its keep by surfacing patterns no one explicitly taught the model.

The Video-Game Analogy (Reinforcement Learning)

Finally, picture a child learning a new platformer video game. Nobody gives them a textbook of correct moves. They press buttons, fall in lava, die, respawn, press different buttons, reach checkpoints, collect coins, unlock levels. Every reward (coin collected, level cleared) or penalty (death, lost life) adjusts their strategy. Over hours of play, a policy emerges: this state plus this button-combo yields the highest cumulative score.

That is reinforcement learning. The agent is the child, the environment is the game, the state is the current screen, the action is the button press, and the reward is the score. Unlike supervised learning, there is no teacher saying "correct action was X." Unlike unsupervised learning, there is a clear objective (maximize score). Reinforcement learning is the branch that thrives when you can score outcomes but cannot enumerate correct actions — exactly what AWS DeepRacer simulates on a virtual race track.

Which Analogy to Use on Exam Day

All three analogies describe the supervised, unsupervised, and reinforcement learning taxonomy from a different angle. Pick based on scenario wording:

  • Scenario mentions labeled data, historical outcomes, training examples with answers → open-book exam (supervised)
  • Scenario mentions discover patterns, group similar items, no labels available → library (unsupervised)
  • Scenario mentions agent, environment, reward, policy, trial and error, simulator → video game (reinforcement)

Supervised Learning: Labeled Data, Precise Predictions

Supervised learning is the most common of the three paradigms in production AWS workloads and the most common AIF-C01 scenario. You have historical data where every row carries a ground-truth answer, and you want the model to predict that answer on future rows.

The Core Assumption: Labels Exist

Supervised learning presumes you either already have labels or can obtain them. Labels might come from human annotators, business outcomes that naturally labeled themselves (the customer did churn, or did not), or automated labeling via Amazon SageMaker Ground Truth. Without labels, you cannot do supervised learning, full stop — that constraint is what separates it from the other paradigms.

Classification vs Regression: The Output-Type Fork

Supervised learning splits into two sub-categories by output type:

  • Classification predicts a category from a discrete set. Is this email spam or not? Is this X-ray showing pneumonia, tuberculosis, or nothing? Classification outputs a class label (sometimes with a probability).
  • Regression predicts a continuous numeric value. What will this house sell for? How many units will we ship next month? Regression outputs a real number.

The AIF-C01 exam repeatedly tests this fork. Any scenario whose answer is a number (price, temperature, demand) is regression. Any scenario whose answer is a category (yes/no, A/B/C/D, fraud/not-fraud) is classification.

Binary classification is still classification, not regression. Candidates sometimes pick "regression" for a yes/no problem because yes/no can be encoded as 0/1. Do not fall for this. Two-class output = binary classification. Only pick regression when the target is a continuous real number. This is the single most-reported supervised-learning trap in AIF-C01 practice-exam reports. Source ↗

Training, Validation, and Test Splits

Supervised learning rests on splitting labeled data into three subsets:

  • Training set (typically 60–80%) — the model learns patterns here.
  • Validation set (typically 10–20%) — used to tune hyperparameters without touching the test set.
  • Test set (typically 10–20%) — used exactly once at the end to estimate real-world performance.

Mixing these up leaks information and inflates accuracy. AIF-C01 expects you to know that tuning on the test set is a cardinal sin and the validation set exists precisely to prevent it. The formal name is cross-validation when the splits are rotated, and k-fold cross-validation when the rotation covers k distinct folds.
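The three-way split can be sketched in a few lines of plain Python. The 70/15/15 ratio and the helper name are illustrative choices, not an AWS default:

```python
import random

def three_way_split(rows, train=0.70, val=0.15, seed=42):
    """Shuffle once, then carve out train / validation / test subsets."""
    rows = rows[:]                           # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_train = int(n * train)
    n_val = int(n * val)
    return (rows[:n_train],                  # model learns patterns here
            rows[n_train:n_train + n_val],   # tune hyperparameters here
            rows[n_train + n_val:])          # touch exactly once at the end

train_set, val_set, test_set = three_way_split(list(range(100)))
# 70 / 15 / 15 rows respectively
```

Because the shuffle happens before any carving, no row can leak between subsets, which is exactly the property that tuning on the test set would destroy.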

Common Supervised Algorithms (Conceptual Level)

AIF-C01 does not require mathematical derivations, but expects recognition of the following algorithm families:

  • Linear regression — fits a line (or hyperplane) to continuous outputs.
  • Logistic regression — despite the name, this is classification, outputting a probability.
  • Decision trees — rule-based splits, highly interpretable.
  • Random forests — ensembles of decision trees, robust to overfitting.
  • Gradient-boosted trees / XGBoost — the workhorse of tabular ML on AWS; SageMaker has a built-in XGBoost algorithm.
  • Support vector machines (SVMs) — margin-based classifiers.
  • Neural networks — from shallow MLPs to deep CNNs and transformers.

AWS Services That Host Supervised Learning

The supervised branch is the best covered by AWS services:

  • Amazon SageMaker built-in algorithms — Linear Learner, XGBoost, Factorization Machines, Image Classification, Object Detection, BlazingText (supervised mode), Seq2Seq.
  • Amazon SageMaker JumpStart — pre-trained supervised models for text, vision, and tabular tasks.
  • Amazon SageMaker Canvas — no-code supervised classification and regression for business analysts.
  • Amazon Comprehend Custom Classification and Custom Entity Recognition — supervised NLP on your labeled text.
  • Amazon Rekognition Custom Labels — supervised image classification and object detection on your labeled images.
  • Amazon Fraud Detector — supervised fraud classifier trained on your historical outcomes.

When an AIF-C01 scenario contains phrases like "historical labeled data," "predict future value," "classify into categories," or "trained on past outcomes," the answer lives in the supervised branch. Pair that keyword recognition with the regression-vs-classification output-type fork to lock in the correct answer fast. Source ↗

Unsupervised Learning: Pattern Discovery Without Labels

Unsupervised learning is the second of the three families and is used whenever labels are missing or expensive. Instead of predicting a target, unsupervised models find structure in the data itself.

Clustering: Grouping Similar Items

Clustering partitions inputs into groups such that items inside a cluster are more similar to one another than to items in other clusters.

  • K-Means — pick k, then iteratively assign points to nearest centroid and recompute centroids until stable. AIF-C01's most-tested clustering algorithm.
  • Hierarchical clustering — build a tree of nested clusters (agglomerative bottom-up or divisive top-down).
  • DBSCAN — density-based, handles arbitrary cluster shapes and marks outliers automatically.

K-Means is available as a SageMaker built-in algorithm, which is the most common AIF-C01 exam-surface mapping.
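The assign/recompute loop is short enough to sketch from scratch. This is a toy 1-D version for intuition, not the SageMaker implementation (which handles high dimensions, streaming, and smarter initialization):

```python
def kmeans_1d(points, k=2, iters=20):
    """Toy 1-D K-Means: assign to nearest centroid, recompute, repeat."""
    centroids = sorted(points)[:k]                # naive init: first k sorted points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                          # assignment step
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]   # update step
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], k=2)
# converges to centroids [2.0, 11.0] with clusters [1,2,3] and [10,11,12]
```

Note that k is a human choice, which is exactly why the elbow method and silhouette score exist (see the cheat numbers near the end of this guide).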

Dimensionality Reduction: Compression Without Labels

When your data has 500 features, most downstream algorithms slow down or overfit. Dimensionality reduction compresses the feature space while preserving as much signal as possible.

  • Principal Component Analysis (PCA) — projects data onto orthogonal axes that capture maximum variance. Available as a SageMaker built-in algorithm.
  • t-SNE and UMAP — nonlinear embedding methods mostly used for visualization.
  • Autoencoders — neural networks that learn to compress and reconstruct inputs; the bottleneck layer is the low-dimensional representation.
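PCA itself reduces to "center the data, project onto the axes of maximum variance." A minimal sketch via SVD, assuming NumPy is available (the function name is ours, not the SageMaker built-in's API):

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA: center each feature, project onto top principal axes."""
    Xc = X - X.mean(axis=0)                          # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                   # axes of maximum variance
    return Xc @ components.T                         # low-dimensional projection

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                        # 100 rows, 5 features
Z = pca(X, 2)                                        # compressed to 2 features
```

The first output column always carries at least as much variance as the second, which is the "keep most of the signal" guarantee the bullet list describes.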

Anomaly Detection: Finding the Weird Ones

Anomaly detection surfaces inputs that do not fit the observed pattern. Fraud detection, infrastructure monitoring, and manufacturing defect detection all lean on this capability.

  • Random Cut Forest (RCF) — SageMaker's built-in anomaly algorithm, also used inside Amazon Kinesis Data Analytics and Amazon Lookout.
  • Isolation Forest — isolates anomalies by random partitioning; shorter paths correspond to outliers.
  • Amazon Lookout for Metrics / Equipment / Vision — managed anomaly-detection services built on unsupervised techniques.
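The core idea behind all of these is "flag what does not fit the observed pattern." A deliberately simple stand-in (a z-score rule, far simpler than Random Cut Forest but the same unsupervised principle of needing no labeled anomalies):

```python
def zscore_anomalies(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if abs(v - mean) > threshold * std]

flagged = zscore_anomalies([10.0] * 50 + [100.0])
# the single 100.0 transaction is flagged; the 10.0s are not
```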

AWS Services That Host Unsupervised Learning

Unsupervised-learning coverage on AWS:

  • SageMaker built-in algorithms — K-Means, PCA, Random Cut Forest, IP Insights (anomaly on IP pairs).
  • Amazon Lookout for Metrics — anomaly detection on time-series business KPIs.
  • Amazon Lookout for Equipment — anomaly detection on industrial sensor data.
  • Amazon Lookout for Vision — visual defect detection using unsupervised feature learning.
  • Amazon Kinesis Data Analytics — RANDOM_CUT_FOREST SQL function for streaming anomaly detection.

Clustering is not classification. Classification assigns items to pre-existing, labeled classes (spam vs not-spam). Clustering discovers groups with no prior labels. If an AIF-C01 scenario has a fixed set of named categories, the answer is classification (supervised). If the scenario says "group similar customers into segments we have not defined yet," the answer is clustering (unsupervised). This is the most-reported keyword trap in the taxonomy. Source ↗

Reinforcement Learning: Learning from Rewards

Reinforcement learning (RL) is the third canonical family and the one candidates most often dismiss as a distractor in questions where it is actually the correct answer. AIF-C01 covers RL at a conceptual level — enough to identify it from scenario wording and map it to AWS DeepRacer and SageMaker RL.

The Agent-Environment-Reward-Policy Loop

RL formalizes the trial-and-error learning loop with five standard terms, plus the policy that training produces:

  • Agent — the decision-maker being trained.
  • Environment — the world the agent interacts with (real or simulated).
  • State — a snapshot of the environment the agent observes.
  • Action — what the agent chooses to do.
  • Reward — numeric feedback from the environment after the action.
  • Policy — the learned mapping from states to actions; the "strategy" of the agent.

Each time the agent picks an action, the environment transitions to a new state and emits a reward. Over many such steps the agent updates its policy to maximize cumulative future reward (formally, the "expected discounted return").

Q-Learning Intuition

Q-Learning is the textbook RL algorithm AIF-C01 sometimes names. Intuitively, Q(s, a) is a table (or neural network) that estimates "how much total reward can I expect if I take action a from state s and then act optimally thereafter?" The agent repeatedly refines Q based on observed rewards, then picks actions by choosing the a that maximizes Q(s, a) in each state. Deep Q-Networks (DQN) replace the table with a neural network so the approach scales to high-dimensional state spaces like pixels.
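The refinement step is one line of arithmetic. A minimal sketch of the tabular update, assuming a dict-of-dicts Q table (illustrative pedagogy, not SageMaker RL code):

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning step: nudge Q(s, a) toward
    reward + gamma * max over a' of Q(s', a')."""
    best_next = max(Q[next_state].values())       # "act optimally thereafter"
    target = reward + gamma * best_next
    Q[state][action] += alpha * (target - Q[state][action])

Q = {"s0": {"left": 0.0, "right": 0.0},
     "s1": {"left": 1.0, "right": 0.0}}
q_update(Q, "s0", "right", reward=1.0, next_state="s1")
# Q["s0"]["right"] moves from 0.0 toward 1.9, landing at 0.1 * 1.9 = 0.19
```

Alpha is the learning rate and gamma is the discount factor that makes distant rewards worth slightly less, which is where the "expected discounted return" phrasing comes from.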

Exploration vs Exploitation

RL agents face a perpetual tension: should I exploit what I already know (pick the action with the highest estimated reward) or explore (try something new that might turn out better)? The epsilon-greedy strategy — act greedily 1 − epsilon of the time and randomly epsilon of the time — is the classic balance. This tension has no analogue in supervised learning, which is one reason RL sits in a family of its own.
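Epsilon-greedy action selection fits in a few lines (a generic sketch over a dict of action values, not tied to any AWS SDK):

```python
import random

def epsilon_greedy(q_values: dict, epsilon: float = 0.1):
    """Explore with probability epsilon; otherwise exploit the best-known action."""
    if random.random() < epsilon:
        return random.choice(list(q_values))      # explore: any action
    return max(q_values, key=q_values.get)        # exploit: highest estimated Q
```

With epsilon = 0 the agent never explores and can get stuck on an early lucky action; with epsilon = 1 it never exploits and learns nothing useful about its own estimates.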

On-Policy vs Off-Policy (Conceptual)

On-policy methods (e.g. SARSA) learn about the policy they are currently executing. Off-policy methods (e.g. Q-Learning) can learn about the optimal policy while executing a different (exploratory) one. AIF-C01 rarely drills this distinction — knowing it exists is enough.

AWS DeepRacer: The Exam-Friendly RL Example

AWS DeepRacer is an autonomous 1/18th-scale race car that learns to drive around a track using RL. The student defines a reward function (e.g. reward = track_width − distance_from_center), picks a training algorithm (PPO or SAC), trains in a simulator, and optionally deploys to a physical car. DeepRacer is the AIF-C01 canonical RL example because (a) the five RL terms map cleanly onto it, (b) AWS markets it as an educational tool, and (c) it maps to SageMaker RL under the hood.

RL term → DeepRacer mapping:

  • Agent: the 1/18th-scale car
  • Environment: the simulated race track
  • State: camera image + car velocity
  • Action: steering angle + throttle
  • Reward: custom Python function you define
  • Policy: neural network mapping state → action
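The centerline reward idea mentioned above can be written in DeepRacer's shape: a Python reward_function(params) returning a float, where 'track_width' and 'distance_from_center' are among the documented keys of the params dictionary. The exact formula here follows the text's example and is one choice among many:

```python
def reward_function(params):
    """DeepRacer-style reward: larger when the car hugs the centerline."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]
    reward = track_width - distance_from_center   # the example from the text
    return float(max(reward, 0.0))                # DeepRacer expects a float
```

This is the human-authored part of DeepRacer: the agent learns a policy that maximizes whatever this function rewards (a point revisited in the traps section below).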

Other AWS Services That Host Reinforcement Learning

  • SageMaker RL — fully managed RL training jobs with RLlib, Coach, and TensorFlow/PyTorch backends.
  • Amazon Bedrock RLHF (Reinforcement Learning from Human Feedback) — used internally by foundation-model providers to align models to human preferences; surfaces as a concept when discussing model fine-tuning.

AIF-C01 RL cheat sheet:

  • Five RL terms: Agent, Environment, State, Action, Reward → Policy emerges from training
  • AWS flagship RL product: AWS DeepRacer
  • SageMaker RL supports PPO, SAC, DQN algorithms
  • RL hallmark keywords: "trial and error," "reward function," "simulator," "cumulative return," "policy"
  • RLHF = Reinforcement Learning from Human Feedback — used in foundation-model alignment

Source ↗

Self-Supervised Learning: How Foundation Models Pretrain

Self-supervised learning is a relatively recent addition to the machine-learning family tree and the secret sauce behind every modern foundation model on Amazon Bedrock.

The Trick: Create Labels from the Data Itself

Self-supervised learning is technically a special case of supervised learning — there are labels — but the labels are synthesized from the data itself rather than annotated by humans. For text, the label for each position is simply "what token comes next?" For images, the label might be "which patch was masked out?" For audio, "which chunk was silenced?" You need no human annotators, yet you have infinitely many training examples.
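The "labels for free" trick is easiest to see in code. A minimal sketch of turning a raw token sequence into next-token training pairs (the helper name is ours; real pre-training pipelines work on billions of such pairs):

```python
def next_token_pairs(tokens):
    """Turn a raw token sequence into (context, label) training pairs.
    The labels come from the data itself; no annotator is involved."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs(["the", "cat", "sat"])
# [(['the'], 'cat'), (['the', 'cat'], 'sat')]
```

Every position in every document yields one supervised example, which is why a raw text corpus converts into effectively unlimited training data.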

Why It Matters for Foundation Models

Every large language model (LLM) on Amazon Bedrock — Anthropic Claude, Amazon Titan, Meta Llama, Mistral, Cohere — was pre-trained self-supervised on a giant text corpus. The predict-next-token objective gives each model billions of free training examples at essentially zero labeling cost. This is the economic reason foundation models exist: you could never afford to hand-label trillions of tokens, but self-supervision sidesteps the cost entirely.

Other self-supervised objectives:

  • Masked language modeling (MLM) — BERT-style models predict randomly masked tokens.
  • Masked image modeling — vision models predict masked image patches.
  • Contrastive learning — bring similar pairs closer in embedding space, push dissimilar pairs apart (used by CLIP and Titan Multimodal Embeddings).
  • Next-token prediction (autoregressive) — GPT-style models predict the next token given all previous tokens; the standard objective for modern LLMs.

Self-Supervised vs Unsupervised: The Fine Line

Some textbooks lump self-supervised under unsupervised learning because neither uses human labels. AIF-C01 treats them as distinct: self-supervised has an explicit prediction target (even if auto-generated), whereas unsupervised learning has no target at all (clustering has no "correct" answer). When in doubt on the exam, remember: if the model is trained to predict something, it is supervised (including self-supervised); if the model is discovering structure with no predicted output, it is unsupervised.

Semi-Supervised Learning: When Labels Are Scarce

Semi-supervised learning is the last paradigm AIF-C01 expects you to recognize. It sits between supervised and unsupervised in the taxonomy and is useful whenever you have a small labeled dataset plus a large pool of unlabeled data.

The Typical Recipe

You train a first-pass model on the small labeled set, use it to generate pseudo-labels on the unlabeled data, keep the high-confidence pseudo-labels, and retrain on the combined set. Alternatives include co-training (two models label each other's unlabeled data) and graph-based label propagation.
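One self-training round can be sketched as follows. The model interface (fit plus a predict-with-confidence call) and the stub class are assumptions for the demo, not a scikit-learn or SageMaker API:

```python
def pseudo_label_round(model, labeled, unlabeled, confidence=0.9):
    """Fit on labels, pseudo-label the pool, keep confident guesses, retrain."""
    model.fit([x for x, _ in labeled], [y for _, y in labeled])
    accepted = []
    for x in unlabeled:
        label, prob = model.predict_with_confidence(x)
        if prob >= confidence:              # keep only high-confidence pseudo-labels
            accepted.append((x, label))
    combined = labeled + accepted
    model.fit([x for x, _ in combined], [y for _, y in combined])
    return model, combined

class MajorityStub:
    """Toy stand-in model: always predicts the majority training label."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
    def predict_with_confidence(self, x):
        return self.label, 0.95
```

The confidence threshold is the safety valve: drop it too low and wrong pseudo-labels poison the retraining set, which is the "distribution must match" caveat in the next list.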

When to Pick Semi-Supervised

Semi-supervised is the right pick when:

  • Labeling cost is high (medical imaging, legal documents)
  • Unlabeled data is abundant (you have millions of product reviews, only thousands labeled)
  • Unlabeled distribution matches the labeled distribution (crucial; otherwise pseudo-labels are poisonous)

AWS Services That Support Semi-Supervised Workflows

  • Amazon SageMaker Ground Truth — active learning built in; the system automatically labels the "easy" samples and routes hard ones to human annotators, which operationally is a form of semi-supervised labeling.
  • Amazon Comprehend Custom — training on small labeled sets with transfer from Amazon's pre-trained base models.

Semi-supervised learning is rarely a direct answer on AIF-C01, but recognizing it lets you eliminate distractors in scenario questions.

When to Pick Each: Problem Shape + Label Availability

A consolidated decision flow for all three families plus the two hybrids:

  1. Do I have an environment with rewards and actions? → Reinforcement learning (DeepRacer, SageMaker RL).
  2. Can I generate labels for free from the data itself (predict next token, masked patch)? → Self-supervised learning (foundation-model pre-training).
  3. Do I have abundant labels? → Supervised learning.
    • Output is a category → Classification.
    • Output is a number → Regression.
  4. Do I have a few labels plus lots of unlabeled data? → Semi-supervised learning.
  5. Do I have only unlabeled data and need to discover structure? → Unsupervised learning.
    • Group similar items → Clustering.
    • Compress features → Dimensionality reduction.
    • Flag outliers → Anomaly detection.

Keyword-to-Paradigm Cheat Table

Scenario keyword → paradigm → likely AWS service:

  • "historical labeled data," "predict category" → supervised classification → SageMaker XGBoost, Comprehend Custom Classification
  • "predict sales number" → supervised regression → SageMaker Linear Learner, Amazon Forecast (deprecated; see SageMaker Canvas time-series)
  • "segment customers, no predefined groups" → unsupervised clustering → SageMaker K-Means
  • "reduce 500 features" → unsupervised dimensionality reduction → SageMaker PCA
  • "detect unusual behavior, no labeled fraud examples" → unsupervised anomaly detection → SageMaker Random Cut Forest, Lookout for Metrics
  • "trial and error, simulator, reward" → reinforcement learning → AWS DeepRacer, SageMaker RL
  • "pre-train a foundation model on raw text" → self-supervised learning → Bedrock continued pre-training
  • "small labeled set plus lots of unlabeled" → semi-supervised learning → SageMaker Ground Truth (active learning)

SageMaker Built-in Algorithms Mapped to the Paradigms

Amazon SageMaker ships a suite of built-in algorithms that map cleanly onto supervised, unsupervised, and reinforcement learning. AIF-C01 asks you to recognize which algorithm is used for which paradigm.

Supervised Built-ins

  • Linear Learner — classification or regression on tabular data.
  • XGBoost — gradient-boosted trees, the tabular workhorse; classification or regression.
  • Factorization Machines — for high-dimensional sparse datasets (clickstream, recommendations).
  • Image Classification (ResNet-based) and Object Detection (SSD-based) — computer vision supervised tasks.
  • Semantic Segmentation — pixel-level classification.
  • BlazingText (supervised mode) — text classification.
  • Seq2Seq — sequence-to-sequence tasks like translation.
  • DeepAR — supervised time-series forecasting.

Unsupervised Built-ins

  • K-Means — clustering.
  • PCA — dimensionality reduction.
  • Random Cut Forest — anomaly detection.
  • IP Insights — anomaly detection on IP address pairs.
  • BlazingText (unsupervised Word2Vec mode) — word-embedding learning.
  • Object2Vec — general-purpose neural embeddings.
  • Neural Topic Model (NTM) and LDA — topic modeling over text corpora.

Reinforcement Built-ins

  • SageMaker RL — not a single algorithm but a framework supporting PPO, SAC, DQN via RLlib and Coach.
  • AWS DeepRacer — packages RL into a gamified consumer-facing product.

SageMaker BlazingText has two modes — and the mode determines the paradigm. In supervised mode it is a text classifier (supervised learning). In Word2Vec mode it learns word embeddings without labels (unsupervised learning). AIF-C01 loves this ambiguity. If the scenario mentions training labels, the answer is BlazingText supervised. If the scenario mentions learning representations of words from raw corpora, the answer is BlazingText unsupervised. Do not confuse the two. Source ↗

Transfer Learning vs Domain Adaptation: The Exam Trap

Community pain-point reports consistently flag the distinction between Transfer Learning and Domain Adaptation as one of the most-missed taxonomy-adjacent concepts on AIF-C01. The two terms look synonymous in casual usage, but the exam treats them as distinct.

Transfer Learning: New Task, Reuse Knowledge

Transfer learning reuses a model trained on one task to jump-start a different task. Classic example: a ResNet pre-trained on ImageNet (1000-class general image classification) is fine-tuned on a small dataset of chest X-rays (binary pneumonia detection). The source task (ImageNet classification) and target task (X-ray classification) are different tasks. The feature extractor layers are reused; the classification head is replaced or fine-tuned.

Transfer learning is about changing the task.

Domain Adaptation: Same Task, New Domain

Domain adaptation keeps the same task but adapts the model to a new data distribution. Classic example: a sentiment classifier trained on Amazon product reviews is adapted to classify sentiment on movie reviews. The task (sentiment classification) is identical. What changed is the domain (the statistical distribution of the inputs — vocabulary, style, length).

Domain adaptation is about changing the data distribution while keeping the task.

Side-by-Side Comparison

Transfer Learning:

  • Task: changes (ImageNet → X-ray)
  • Domain / input distribution: often changes too
  • Typical scenario: reuse a pre-trained ImageNet backbone
  • Usual method: replace + fine-tune the output layer
  • AWS surface: SageMaker JumpStart fine-tuning

Domain Adaptation:

  • Task: stays the same (sentiment → sentiment)
  • Domain / input distribution: definitionally changes
  • Typical scenario: sentiment model adapted from Amazon reviews to movie reviews
  • Usual method: continued training on target-domain data
  • AWS surface: Bedrock continued pre-training, Comprehend Custom

Why the Exam Loves This Pair

Both concepts describe "use an existing model on a new situation." Without the precise definitions above, candidates shrug and pick whichever word sounds more familiar. The AIF-C01 exam guide explicitly lists both, and community reports confirm they appear as distinct answer options in the same question — forcing you to choose correctly.

Transfer Learning changes the task. Domain Adaptation changes the input distribution while keeping the task. Memorize this one-line distinction. On AIF-C01 it is one of the highest-value rules in Domain 1, because mis-selecting costs a whole question — and community data suggests this pair appears in most AIF-C01 exam sittings. Source ↗

Common Exam Traps for Supervised, Unsupervised, and Reinforcement Learning

Beyond the two traps already flagged (binary classification vs regression, clustering vs classification), AIF-C01 fields a handful of recurring trick patterns around the taxonomy.

Trap 1: "Unsupervised" Does Not Mean "No Training"

Unsupervised models are still trained — they just train without labels. Do not pick "unsupervised" as a synonym for "rule-based" or "no ML involved."

Trap 2: RL Is Not Always the Right Answer for "Adaptive" Systems

Adaptive recommendation engines are often supervised (collaborative filtering, matrix factorization) or a hybrid. RL shows up only when the system takes sequential actions in an environment with delayed rewards. If the scenario is just "recommend products based on past purchases," the answer is supervised, not reinforcement.

Trap 3: Self-Supervised Pre-Training ≠ Fine-Tuning

Foundation models go through two phases: self-supervised pre-training on a giant corpus, then (optionally) supervised fine-tuning on a smaller labeled set. An AIF-C01 question about fine-tuning on domain-specific labeled data is asking about the supervised fine-tuning step, not the original self-supervised pre-training.

Trap 4: Anomaly Detection Can Be Supervised or Unsupervised

Most anomaly detection on AWS (Random Cut Forest, Lookout family) is unsupervised — you do not have pre-labeled anomalies. But Amazon Fraud Detector uses supervised learning because you label past transactions as fraudulent or legitimate. The paradigm depends on whether the anomalies are pre-labeled.

Trap 5: DeepRacer's Reward Function Is Written by the Human, Not Learned

A common misconception: "DeepRacer learns its own reward." Wrong. The student writes the reward function in Python. The agent learns a policy that maximizes the reward function the student defined. If the reward function is bad, the learned policy will be bad.
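A minimal reward function in the shape DeepRacer expects — written by the human, never learned by the agent. The center-line banding below mirrors the standard AWS starter example; treat the exact marker values as illustrative:

```python
def reward_function(params):
    """DeepRacer-style reward: pay the agent more for staying
    close to the track center line. The human picks the bands;
    the agent only learns a policy that maximizes them."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Three bands around the center line; narrower bands earn more
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # effectively off track

    return float(reward)
```

If these bands are badly chosen — say, rewarding speed on a track full of hairpins — the learned policy will faithfully optimize the wrong thing.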

Do not confuse "self-supervised learning" with "unsupervised learning" on AIF-C01. Both skip human labels, but self-supervised models have a clear prediction target auto-generated from the data (predict-next-token, predict-masked-patch); unsupervised models have no prediction target at all (clustering, PCA). The AIF-C01 exam guide treats them as distinct entries in the supervised, unsupervised, and reinforcement learning taxonomy and has been seen to put both as answer options for the same question. Source ↗

Numbers and Constants to Memorize

A handful of numbers show up repeatedly when AIF-C01 exam writers build supervised, unsupervised, and reinforcement learning scenario questions. These are not in any one whitepaper but reflect AWS documentation defaults and typical guidance.

AIF-C01 cheat numbers for Supervised, Unsupervised, and Reinforcement Learning:

  • 3 — canonical paradigms in the ML taxonomy (supervised, unsupervised, reinforcement)
  • 5 — variants when you add self-supervised and semi-supervised
  • 60/20/20 or 70/15/15 or 80/10/10 — typical train/validation/test split ratios
  • 5-fold, 10-fold — most common k-fold cross-validation values
  • k — number of clusters in K-Means, chosen by the human (often via elbow method or silhouette score)
  • PPO, SAC — two standard RL algorithms exposed in AWS DeepRacer
  • 1000 — number of classes in ImageNet, the typical transfer-learning source task
  • 1/18 — DeepRacer car scale ratio

Source ↗
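The split ratios in the list above can be sketched in plain Python. This is a hypothetical helper for intuition; real projects typically reach for a library routine such as scikit-learn's `train_test_split`:

```python
import random

def split_dataset(rows, train=0.8, val=0.1, seed=42):
    """Shuffle and split rows into train/validation/test partitions
    (80/10/10 by default; the test set gets the remainder)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_train = int(n * train)
    n_val = int(n * val)
    return (rows[:n_train],
            rows[n_train:n_train + n_val],
            rows[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 80 10 10
```

Swapping the defaults for `train=0.7, val=0.15` or `train=0.6, val=0.2` gives the other two canonical ratios from the cheat list.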

Supervised, Unsupervised, and Reinforcement Learning vs Foundation Models

Modern foundation models blur the old boundaries of supervised, unsupervised, and reinforcement learning. Understanding how they overlap is a fast way to answer comparison questions.

Pre-Training: Self-Supervised

Foundation-model pre-training is self-supervised. Predict the next token, predict the masked patch. No human labels.
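A sketch of how self-supervision manufactures its own labels: each (prefix, next-token) pair is generated mechanically from the raw sequence, with no annotator involved. The helper name is illustrative, not any library's API:

```python
def next_token_pairs(tokens):
    """Turn an unlabeled token sequence into (input, target) training
    pairs: the 'label' at each step is simply the next token."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs(["the", "cat", "sat", "down"])
# e.g. (["the", "cat"], "sat") -- the target comes from the data itself
```

This is why self-supervised is distinct from unsupervised on the exam: there is a concrete prediction target, it just was not written by a human.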

Supervised Fine-Tuning (SFT)

After pre-training, providers often fine-tune the model on a smaller dataset of (instruction, ideal-response) pairs. This phase is pure supervised learning — the label is the ideal response.

Reinforcement Learning from Human Feedback (RLHF)

A third alignment phase applies reinforcement learning. A reward model is trained on human rankings of model outputs, and the policy (the LLM) is updated to maximize the reward-model score. The final alignment of models like Claude and Titan uses RLHF, so a single production model combines self-supervised pre-training, supervised fine-tuning, and reinforcement learning.

Takeaway for the Exam

When an AIF-C01 question asks how foundation models are trained, the correct answer mentions all three phases: self-supervised pre-training, supervised fine-tuning, and reinforcement learning from human feedback. Single-paradigm answers are usually wrong for foundation-model questions.

AIF-C01 task statement 1.1 "Explain basic AI concepts and terminologies" exercises the supervised, unsupervised, and reinforcement learning vocabulary through scenario-based questions. The templates below are the most common shapes you will see. Detailed practice questions with full explanations appear in the ExamHub question bank.

Template A: Label-Availability Routing

A retailer has five years of customer transaction data with no labels indicating which customers are "high value," and wants to discover natural customer segments to design targeted marketing campaigns. Which paradigm applies? Answer: unsupervised learning (clustering). Distractors: supervised classification (wrong because no labels), reinforcement learning (wrong because no environment/reward).
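The clustering answer can be illustrated with a toy 1-D K-Means on unlabeled spend values — a from-scratch sketch for intuition, not SageMaker's implementation. Note that k is still chosen by the human:

```python
def kmeans_1d(points, k, iters=20):
    """Tiny 1-D K-Means: discover k groups in unlabeled numbers."""
    # Naive initialization: evenly spaced picks from the sorted data
    centers = sorted(points)[:: max(1, len(points) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Unlabeled "annual spend" values with two natural segments
spend = [100, 120, 130, 900, 950, 1000]
print(kmeans_1d(spend, 2))  # two centers: low spenders vs high spenders
```

No label ever enters the loop — the segments emerge from the data, which is exactly the keyword pattern that routes this template to unsupervised learning.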

Template B: Output-Type Fork

A real-estate startup wants to predict the sale price of a house from square footage, location, and number of bedrooms. Which paradigm applies? Answer: supervised regression (continuous numeric target). Distractor: supervised classification (wrong because sale price is continuous, not categorical).
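The regression answer reduces to fitting a continuous function to labeled (feature, price) pairs. A one-feature ordinary-least-squares sketch, with made-up numbers and an illustrative helper name:

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: price ~ a * sqft + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

# Square footage -> sale price (fabricated numbers for illustration)
sqft = [1000, 1500, 2000, 2500]
price = [200_000, 300_000, 400_000, 500_000]
a, b = fit_line(sqft, price)
print(round(a * 1800 + b))  # prints 360000
```

The target is a continuous number, which is what rules out the classification distractor.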

Template C: Environment-with-Reward Pattern

A logistics company wants a warehouse robot to learn the fastest path to pick items, adapting to changing warehouse layouts. Which paradigm applies? Answer: reinforcement learning (agent, environment, reward over time). Distractor: supervised learning (wrong because no labeled optimal paths exist).

Template D: Domain Adaptation vs Transfer Learning

A data scientist has a sentiment-analysis model trained on English product reviews and wants to adapt it to classify sentiment on English movie reviews. Which technique applies? Answer: domain adaptation (same task, different domain). Distractor: transfer learning (wrong because the task is unchanged).

Template E: Foundation-Model Pre-Training

An AI team wants to pre-train a language model on a 1-TB corpus of legal documents with no human labels. Which paradigm applies? Answer: self-supervised learning (predict-next-token is the implicit objective). Distractors: unsupervised learning (technically adjacent but does not capture the prediction objective), supervised learning (no human labels).

Supervised, Unsupervised, and Reinforcement Learning Frequently Asked Questions (FAQ)

What is the difference between supervised, unsupervised, and reinforcement learning?

Supervised learning trains a model on (input, label) pairs to predict labels for new inputs. Unsupervised learning trains on inputs alone and finds hidden structure such as clusters or low-dimensional embeddings. Reinforcement learning trains an agent to pick actions in an environment, guided by a reward signal, to maximize cumulative return. These three families of supervised, unsupervised, and reinforcement learning cover most classical ML on AWS and appear explicitly in the AIF-C01 exam guide.

How do I know whether to use supervised or unsupervised learning?

Start with the label-availability test. If you have labeled training data and want to predict the same label for new inputs, use supervised learning. If you have only raw inputs and want to discover structure (groups, compressed features, anomalies), use unsupervised learning. If labels are scarce but unlabeled data is abundant, semi-supervised learning or self-supervised pre-training followed by supervised fine-tuning may beat pure supervised.

Is classification the same as regression?

No. Both are supervised learning, but classification predicts a category from a discrete set (spam / not spam, A / B / C) while regression predicts a continuous numeric value (price, temperature). AIF-C01 asks you to distinguish them by looking at the target variable's type. Numeric → regression. Categorical → classification. Binary yes/no is classification (specifically binary classification), not regression, even though yes/no can be encoded as 0/1.

What is reinforcement learning and when should I pick it on AIF-C01?

Reinforcement learning trains an agent via trial and error to maximize reward in an environment. Pick it when the scenario involves sequential decisions, a simulator or real environment, and a scoring function rather than a labeled dataset. AWS DeepRacer is the AIF-C01 canonical example. Keywords like "agent," "reward," "policy," "simulator," and "trial and error" should snap your attention to reinforcement learning.

What is self-supervised learning and how does it relate to foundation models?

Self-supervised learning creates labels from the data itself — for text, predict the next token; for images, predict the masked patch — so no human annotation is needed. Every modern foundation model on Amazon Bedrock (Anthropic Claude, Amazon Titan, Meta Llama, Mistral) is pre-trained self-supervised on a giant corpus. Self-supervised pre-training is what makes foundation models economically feasible — human-labeled data at the required scale would cost billions. AIF-C01 expects you to recognize self-supervised learning as the mechanism behind foundation-model pre-training.

What is the difference between transfer learning and domain adaptation?

Transfer learning changes the task (e.g. ImageNet classifier → chest X-ray classifier) by reusing learned feature representations. Domain adaptation keeps the task and adapts the model to a new input distribution (e.g. product-review sentiment classifier → movie-review sentiment classifier). Both concepts appear in AIF-C01 Task 1.1 and 3.3, and community reports consistently flag this pair as a high-trap distinction. Memorize: transfer learning = new task; domain adaptation = new data distribution.

Which AWS services implement each learning paradigm?

For supervised learning, use SageMaker built-in algorithms (XGBoost, Linear Learner, Image Classification), SageMaker JumpStart, Comprehend Custom, Rekognition Custom Labels, Amazon Fraud Detector, and Amazon Personalize. For unsupervised learning, use SageMaker K-Means, PCA, Random Cut Forest, and the Amazon Lookout family for anomaly detection. For reinforcement learning, use AWS DeepRacer and SageMaker RL. For self-supervised learning, Amazon Bedrock continued pre-training lets you continue a foundation model's self-supervised training on your own unlabeled data.

Does AIF-C01 require me to code any of these algorithms?

No. AIF-C01 is a foundational certification that tests conceptual understanding, vocabulary, and AWS service mapping. You need to recognize the paradigms, distinguish their sub-types, match scenarios to the correct family, and identify which AWS service hosts which approach. You do not need to implement K-Means or gradient descent. That level of depth is reserved for the AWS Certified Machine Learning Engineer – Associate (MLA-C01) and AWS Certified Machine Learning – Specialty (MLS-C01) certifications.

Further Reading

Related ExamHub topics: AI and ML Core Concepts, Overfitting, Underfitting, Bias, and Variance, ML Development Lifecycle, Practical AI and ML Use Cases.

Official Sources