Generative AI concepts are the foundation of Domain 2 on the AWS Certified AI Practitioner (AIF-C01) exam, and Domain 2 carries 24 percent of the total score. The AIF-C01 exam guide states that Task 2.1 asks you to "explain the basic concepts of generative AI" — which means you must recognize what makes a model generative, how generative AI differs from older discriminative models, and how generative AI produces text, images, audio, video, code, and synthetic data. Miss the generative AI concepts section and you lose a quarter of the exam. This topic covers every generative AI concept the exam guide names, decodes the transformer architecture at the conceptual depth the AIF-C01 expects, walks through hallucination causes and mitigation, and closes with the cost, privacy, and intellectual-property considerations that the exam loves to test in scenario form.
Generative AI concepts on AIF-C01 map specifically to foundation models accessed through Amazon Bedrock, Amazon Q, Amazon SageMaker JumpStart, and the task-specific AWS AI/ML services (Amazon Comprehend, Amazon Rekognition, Amazon Textract) when they are contrasted against generative AI approaches. This topic stays at the AIF-C01 scope — concept recognition — and flags every spot where a deeper, professional-level exam would go further into build and deployment details.
What is Generative AI? — From Discriminative to Generative Paradigm
Generative AI is the category of machine learning systems whose primary output is new content rather than a classification label or a numeric score. Traditional discriminative AI looks at an input and decides a category: "this email is spam," "this image contains a cat," "this transaction is fraud." Generative AI looks at an input and produces a fresh artifact in the same modality as its training data: an email draft, a new cat image, a synthetic transaction record.
The technical definition the AIF-C01 exam expects: a generative model learns a probability distribution over sequences (or over pixel grids, audio samples, or other structured outputs) so that it can sample new, realistic examples from that distribution. A text generative model learns the distribution P(next token | previous tokens). An image generative model learns the distribution P(image | prompt). When the model "generates," it is sampling from that learned distribution — which is why the same prompt can produce different outputs on different runs.
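The idea that generation is sampling from a learned distribution can be shown with a toy bigram model. The corpus and counting scheme below are illustrative, but the sampling step is the same one an LLM performs for every token, which is why identical prompts produce different outputs:

```python
import random

# Toy "learned" distribution P(next token | previous token), built by
# counting bigrams in a tiny corpus. A real LLM learns this mapping with a
# transformer over a vocabulary of ~100k tokens; the sampling step is the same idea.
corpus = "the cat sat on the mat the cat ate".split()

counts: dict[str, dict[str, int]] = {}
for prev, nxt in zip(corpus, corpus[1:]):
    counts.setdefault(prev, {}).setdefault(nxt, 0)
    counts[prev][nxt] += 1

def sample_next(prev: str, rng: random.Random) -> str:
    """Sample from P(next | prev) estimated by bigram counts."""
    dist = counts[prev]
    tokens, weights = list(dist), list(dist.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)
# Same context, repeated sampling: in this corpus "the" is followed by
# "cat" two times out of three and "mat" one time out of three.
draws = [sample_next("the", rng) for _ in range(1000)]
print(draws.count("cat") / len(draws))  # roughly 0.67
```

The same context yields different continuations across calls, which is the stochasticity the exam attributes to generative models.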
Why Generative AI Matters for AIF-C01
Domain 2 (Fundamentals of Generative AI) is 24 percent of AIF-C01. Domain 3 (Applications of Foundation Models) is 28 percent. Together they are 52 percent of the exam — and every foundation-model question assumes you already know the generative AI concepts covered here. If you can articulate what makes a model generative, you automatically unlock Bedrock, prompt engineering, RAG, fine-tuning, and foundation-model evaluation questions downstream.
Core Claim: Generative AI Learns a Probability Distribution Over Sequences
Every modern generative AI system — GPT-style LLMs, Stable Diffusion, Amazon Titan Image Generator, Amazon Titan Text — ultimately reduces to the same mathematical idea. The model is a compressed representation of the probability distribution of its training corpus. Text generation is sampling from P(next token | context). Image generation is sampling from P(pixel arrangement | prompt). Audio generation is sampling from P(audio sample | prompt). Understanding this one idea unlocks every downstream concept on the exam.
Generative AI is a class of AI models that can produce new content (text, images, audio, video, code, or synthetic data) by learning the statistical patterns of training data and sampling from a learned probability distribution over outputs. Foundation models are the dominant architecture for modern generative AI, accessed on AWS primarily through Amazon Bedrock. Source ↗
Generative vs Discriminative Models — The Core Distinction
This distinction is the single most-tested generative AI concept on AIF-C01. Memorize it.
A discriminative model learns the boundary between classes. Given an input X, it predicts a label Y. Mathematically it models P(Y | X) — the conditional probability of the label given the input. Examples on AWS: Amazon Comprehend sentiment classification, Amazon Rekognition object detection, Amazon Fraud Detector risk scores, a SageMaker-trained XGBoost churn classifier. Discriminative models answer the question "which class does this input belong to?"
A generative model learns how the data itself was produced. It models the joint distribution P(X, Y) or the input distribution P(X) directly. Because it understands what the data looks like from the inside, it can generate new examples that look like training data. Examples on AWS: Anthropic Claude on Amazon Bedrock, Amazon Titan Text, Stable Diffusion on Amazon Bedrock, Amazon Titan Image Generator, Amazon CodeWhisperer (now Amazon Q Developer). Generative models answer the question "what would a new, realistic example of this data look like?"
The practical consequence: discriminative models are cheaper to train, more accurate for classification, and more interpretable — but they cannot produce new content. Generative models are far more expensive to train, carry hallucination risk, and are harder to evaluate — but they can write, draw, code, and converse.
Generative vs Discriminative — Side by Side
| Dimension | Discriminative Models | Generative Models |
|---|---|---|
| What it learns | P(Y \| X) — label given input | P(X) or P(X, Y) — the data distribution itself |
| What it outputs | A label, score, or bounding box | New content in the training modality |
| Training cost | Low to medium (task-specific) | Very high (foundation model scale) |
| Inference cost | Cheap, deterministic | Expensive, stochastic (per-token pricing on Bedrock) |
| Examples on AWS | Amazon Comprehend, Amazon Rekognition, SageMaker XGBoost | Anthropic Claude, Amazon Titan, Stable Diffusion via Bedrock |
| Output determinism | Same input → same label | Same input → different outputs per run (temperature) |
| Hallucination risk | None (bounded label set) | Yes (can fabricate confident but false content) |
Every AIF-C01 attempt includes at least one question that requires this distinction. Shortcut: if the output is a fixed label from a predefined set, the model is discriminative. If the output is new content of arbitrary structure (text, image, code), the model is generative. Amazon Comprehend detecting sentiment = discriminative. Amazon Bedrock with Claude writing a summary = generative. Amazon Rekognition labeling objects = discriminative. Amazon Titan Image Generator creating a product photo = generative. Source ↗
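The P(Y | X) versus P(X, Y) distinction can be made concrete with a sketch over a tiny made-up dataset. The same word counts, read one way, yield only a fixed label (discriminative); read the other way, they let you sample brand-new text (generative):

```python
import random

# Tiny labeled dataset: (text, label). The discriminative view learns only
# the input-to-label boundary; the generative view learns the data itself
# and can therefore produce new examples.
data = [("win money now", "spam"), ("meeting at noon", "ham"),
        ("free money offer", "spam"), ("lunch at noon", "ham")]

# Discriminative direction: score P(label | word) via counts.
word_label: dict[str, dict[str, int]] = {}
for text, label in data:
    for w in text.split():
        word_label.setdefault(w, {"spam": 0, "ham": 0})
        word_label[w][label] += 1

def classify(text: str) -> str:
    """Output is always a label from the fixed set {spam, ham}."""
    score = {"spam": 0, "ham": 0}
    for w in text.split():
        for lbl, c in word_label.get(w, {}).items():
            score[lbl] += c
    return max(score, key=score.get)

# Generative direction: the same data, modeled as P(word | label),
# lets us SAMPLE new text -- something classify() can never do.
def generate(label: str, n: int, rng: random.Random) -> str:
    vocab = [w for text, lbl in data if lbl == label for w in text.split()]
    return " ".join(rng.choice(vocab) for _ in range(n))

print(classify("free money"))                 # a fixed label: spam
print(generate("spam", 3, random.Random(1)))  # new, spam-like text
```

This mirrors the exam shortcut above: fixed label set means discriminative, arbitrary new content means generative.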
Content Types Generated — Text, Images, Audio, Video, Code, Synthetic Data
Generative AI is not only LLMs. The AIF-C01 exam expects you to recognize every output modality and the generative AI approach typically used to produce it.
- Text — Large language models (LLMs) like Anthropic Claude, Amazon Titan Text, Meta Llama, Mistral, and Cohere Command. Accessed on AWS via Amazon Bedrock.
- Images — Diffusion models like Stable Diffusion and Amazon Titan Image Generator. Accessed on AWS via Amazon Bedrock.
- Audio / Speech — Neural text-to-speech (Amazon Polly neural voices) and generative audio models for music and sound effects (third-party on AWS Marketplace).
- Video — Emerging video diffusion models (Amazon Nova Reel on Amazon Bedrock). Text-to-video and image-to-video.
- Code — Code-specialized LLMs like Amazon Q Developer (formerly Amazon CodeWhisperer). Generates, explains, and reviews source code.
- Synthetic Data — Tabular and simulation data used to augment scarce training sets or preserve privacy. Generated by specialized GANs or LLMs and used in fraud modeling, healthcare, and autonomous-vehicle training.
On the exam, if a question says "generate marketing copy," that is text. "Produce a product image from a description" is an image diffusion model. "Write Python unit tests for this function" is code generation through Amazon Q Developer or Amazon Bedrock.
Large Language Models — Predict-Next-Token Mechanism
A Large Language Model (LLM) is a generative AI model trained to predict the next token in a sequence of text. The whole conversational, summarization, question-answering, code-writing surface of modern chatbots like Anthropic Claude, Amazon Titan, and Meta Llama emerges from this single objective: given a sequence of tokens, predict the probability distribution over the next token, then sample from it, then append it, then repeat.
Tokens are the LLM's unit of operation — usually subword fragments (a common rule of thumb on AIF-C01: one token is roughly 4 English characters, or about 0.75 English words). When you send a prompt to Amazon Bedrock, the service tokenizes your text, runs the transformer forward pass to compute the probability distribution over the vocabulary, selects the next token (using temperature, top-p, top-k), appends it to the context, and runs the forward pass again. That loop continues until a stop condition — a stop sequence, a max-token limit, or the model generating an end-of-sequence token.
Because every token generation is a sample from a probability distribution, LLMs are stochastic by design. The same prompt can yield different completions across calls. Temperature controls how "peaky" the sampling distribution is: low temperature biases sampling toward the highest-probability token (nearly deterministic); high temperature flattens the distribution (more creative, more variety, more risk of hallucination).
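How temperature reshapes the sampling distribution can be sketched with the standard softmax-with-temperature formula. The token scores below are made-up values; the sharpening-versus-flattening behavior is the general mechanism:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw next-token scores into a sampling distribution.
    Low temperature sharpens the distribution toward the top token;
    high temperature flattens it toward uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-token scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(logits, 0.1)  # near-deterministic
hot = softmax_with_temperature(logits, 5.0)   # nearly uniform

print([round(p, 3) for p in cold])  # almost all mass on the top token
print([round(p, 3) for p in hot])   # mass spread across all tokens
```

At temperature near 0 the top token gets essentially all the probability mass (nearly deterministic output); at high temperature the candidates become close to equally likely (more variety, more hallucination risk).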
LLM Capabilities Without Task-Specific Training
The remarkable property of LLMs is that a single pretrained model — trained only on next-token prediction over a massive corpus — can perform summarization, classification, translation, question answering, code generation, entity extraction, and many more downstream tasks without any task-specific training. You steer the model by writing a prompt that describes the task, and because the training corpus included examples of every human language task ever written down, the model has implicit competence at all of them. This is the foundation for prompt engineering, zero-shot prompting, and few-shot prompting — covered in detail in the prompt-engineering topic.
Diffusion Models — Image Generation via Noise Denoising
Diffusion models are the dominant generative AI approach for images. The concept is counter-intuitive but testable on AIF-C01.
Training: take a real image, progressively add random Gaussian noise over many steps until the image becomes pure noise. Train a neural network to reverse one step of that noise at a time — given a slightly noisier image, predict the slightly less noisy version.
Inference (generation): start from pure random noise. Run the trained denoising network many times, conditioning on a text prompt (through an embedding). Each step removes a little noise, pushing the pixel arrangement toward something that matches the prompt. After many steps — typically 20 to 100 — the final output is a coherent image.
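The generation loop can be illustrated with a 1-D toy. The "denoiser" below is a stand-in that already knows the clean target; a real diffusion model replaces it with a trained, prompt-conditioned neural network. Only the iterative noise-removal structure carries over:

```python
import random

# Conceptual 1-D analogue of diffusion sampling: start from pure noise and
# repeatedly remove a little noise per step. The stand-in denoiser nudges
# the signal toward a known clean target; a real model predicts that
# direction with a neural network conditioned on the text prompt.
rng = random.Random(0)
target = [0.5, -1.0, 2.0, 0.0]            # the "clean image" (4 pixels)
x = [rng.gauss(0, 1) for _ in target]     # step 0: pure Gaussian noise

def denoise_step(x, target, strength=0.1):
    """Remove one step of noise: move each value slightly toward the
    denoiser's prediction of the clean signal."""
    return [xi + strength * (ti - xi) for xi, ti in zip(x, target)]

dist_before = sum((xi - ti) ** 2 for xi, ti in zip(x, target))
for _ in range(50):                        # real models use roughly 20-100 steps
    x = denoise_step(x, target)
dist_after = sum((xi - ti) ** 2 for xi, ti in zip(x, target))

print(dist_after < dist_before)  # True: iterative denoising converged
```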
On AWS, Stable Diffusion (Stability AI) and Amazon Titan Image Generator are both diffusion models available through Amazon Bedrock. The exam does not require mathematical depth; it requires you to recognize diffusion as an image generative AI approach distinct from LLMs, and to know diffusion models are typically text-conditioned (prompt-in, image-out).
GANs — Generator vs Discriminator Architecture
Generative Adversarial Networks (GANs) were the previous-generation approach to image generation. Diffusion models have largely replaced them for state-of-the-art image quality, but GANs still appear on the AIF-C01 exam as a generative AI concept you must recognize.
A GAN has two neural networks playing a zero-sum game:
- Generator — takes random noise as input and produces synthetic content (usually an image).
- Discriminator — takes content (real or synthetic) and classifies it as real or fake.
The generator tries to fool the discriminator. The discriminator tries to catch the generator. Both improve through training. When training converges, the generator produces outputs indistinguishable from real data.
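The two-player loop can be sketched with scalar stand-ins for the two networks. The update rules below are illustrative, not real GAN gradients; what carries over is the structure: generate, discriminate, update both:

```python
import random

# Minimal illustration of the GAN adversarial loop with scalar "networks":
# the generator shifts noise by a learned offset, the discriminator learns
# a decision threshold ("real" means above the threshold). Real GANs use
# deep networks trained by gradient descent; only the loop shape is shared.
rng = random.Random(0)
real_mean = 4.0        # the real data lives around 4.0
gen_offset = 0.0       # generator's single "parameter"
threshold = 0.0        # discriminator's single "parameter"

for _ in range(500):
    real = real_mean + rng.gauss(0, 0.5)   # a real sample
    fake = gen_offset + rng.gauss(0, 0.5)  # generator produces a sample
    # Discriminator update: nudge the threshold toward the midpoint of the
    # real and fake samples it just saw, to keep separating them.
    threshold += 0.05 * ((real + fake) / 2 - threshold)
    # Generator update: when the discriminator catches the fake (below
    # threshold), shift output toward the region labeled "real".
    if fake < threshold:
        gen_offset += 0.05 * (threshold - gen_offset)

print(round(gen_offset, 1))  # close to real_mean: fakes now resemble real data
```

At convergence the generator's outputs are statistically close to the real data, which is exactly the training goal the section describes.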
GANs are the technology behind many synthetic-face generators, style transfer systems, and older image synthesis products. On AIF-C01 you need to recognize GAN as a generative AI architecture, know it has a generator and a discriminator, and know it is one of several generative approaches alongside LLMs and diffusion models.
LLM = predict the next token in a sequence (text). Diffusion = reverse noise one step at a time (images). GAN = generator vs discriminator adversarial game (older images, synthetic data). If an AIF-C01 scenario describes a chatbot, expect an LLM. If it describes creating images from text, expect diffusion. If it describes synthetic data for training augmentation, expect either GAN or diffusion. Source ↗
Transformer Architecture at Concept Level — Attention Is All You Need
The transformer is the neural network architecture behind every modern LLM and most multimodal models. The AIF-C01 exam guide lists transformer architecture knowledge at a conceptual level — no math, no gradients, just the building blocks and what they do.
Why Transformers Replaced RNNs
Before transformers (2017), sequence models used recurrent neural networks (RNNs) and long short-term memory (LSTM) networks. RNNs processed tokens one at a time, carrying a hidden state forward. They had two problems: (1) they struggled with long-range dependencies because information decayed over many time steps, and (2) they could not be trained in parallel because each token's computation depended on the previous one.
Transformers solve both problems with one mechanism: attention. Every token can look directly at every other token in the sequence in parallel, regardless of distance. That removes the long-range-dependency bottleneck and enables massive parallel training on GPU and AWS Trainium clusters.
The Attention Mechanism — Conceptually
Attention answers the question "when generating the next token, which earlier tokens in the sequence should I focus on?" For each token, the model computes three vectors: a query (what this token is looking for), a key (what this token offers), and a value (the actual content to pass along if attended to). The model computes attention weights by comparing every query to every key (via dot product), normalizes them into a probability distribution with softmax, and then takes a weighted sum of the values. The result is a new representation of each token that incorporates relevant information from across the whole sequence.
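The compare-normalize-weighted-sum structure described above can be sketched in a few lines. The 2-D token vectors are toy values, and the learned query/key/value projections are omitted, but the computation is the real scaled dot-product attention mechanism:

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over toy token vectors. For each query:
    compare it to every key (dot product, scaled by sqrt(dim)), normalize
    the scores with softmax, then take the weighted sum of the values."""
    d = len(keys[0])
    out, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)                        # stability shift for softmax
        exps = [math.exp(s - m) for s in scores]
        weights = [e / sum(exps) for e in exps]
        all_weights.append(weights)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out, all_weights

# Three tokens; in self-attention, queries, keys, and values all come from
# the same sequence (after learned linear projections, omitted here).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = attention(tokens, tokens, tokens)
print([round(w, 2) for w in weights[0]])  # attention weights sum to 1
```

Each output row is a new representation of one token, blended from every token in the sequence in parallel, which is what removes the RNN's sequential bottleneck.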
For AIF-C01, you do not need to derive attention formulas. You need to recognize three facts: (1) attention is the core innovation of transformers, (2) attention lets the model weigh different parts of the input when producing each output, and (3) attention enables parallel computation and long-range dependency handling.
Self-Attention vs Cross-Attention
- Self-attention — every token attends to every other token in the same sequence. This is what lets an LLM understand that the pronoun "it" in the sentence "The trophy didn't fit in the suitcase because it was too big" refers to "trophy," not "suitcase." Self-attention is the dominant mechanism in decoder-only models like GPT, Anthropic Claude, and Amazon Titan Text.
- Cross-attention — tokens in one sequence attend to tokens in a different sequence. Used in encoder-decoder architectures for translation and in multimodal models (where text tokens attend to image patch tokens).
Encoder vs Decoder
The original 2017 transformer paper proposed an encoder-decoder architecture for translation. Modern generative AI has diverged into three flavors the AIF-C01 exam may reference:
- Encoder-only — transformers that read input and produce a rich representation, but do not generate new tokens. Used for classification and embedding generation. Example: BERT (historical), Amazon Titan Embeddings.
- Decoder-only — transformers that generate tokens one at a time from a prompt. Used for all modern chat and completion LLMs. Examples: Anthropic Claude, Amazon Titan Text, Meta Llama, Mistral.
- Encoder-decoder — transformers with a separate reading and writing network. Used for translation and some summarization models (T5, BART lineage).
For AIF-C01, the decoder-only architecture is the most common generative AI case. Encoder-only is associated with embedding and classification. The exam may ask you to match architecture family to use case, but will not ask you to describe layer counts or attention head configurations.
A transformer is a neural network architecture based on self-attention that processes sequences in parallel instead of step by step. Transformers power nearly all modern large language models and many multimodal generative AI systems. On AWS, transformers are trained on AWS Trainium and NVIDIA GPU instances and served for inference on AWS Inferentia and NVIDIA GPU instances. Source ↗
Training Data Scale and Compute Requirements
Generative AI foundation models are expensive because they are trained at a scale that discriminative models never reach. The AIF-C01 exam does not require you to memorize parameter counts, but it does expect you to understand the relative magnitude and the infrastructure implications.
The Scale Gap
A typical discriminative model might have 1 million to 100 million parameters, train on 100 GB of labeled data, and take a few hours on a handful of GPUs. A foundation model like Anthropic Claude or Amazon Titan Text trains on trillions of tokens of text scraped from the public internet, books, and code repositories, may have tens to hundreds of billions of parameters, and requires thousands of GPU or AWS Trainium chips running for weeks or months. That is four to six orders of magnitude more compute.
Compute Infrastructure on AWS
- AWS Trainium — purpose-built AWS silicon for foundation-model training. Trn1 and Trn2 instances deliver price-performance advantages over general-purpose GPUs for large-model pretraining.
- NVIDIA GPUs on AWS — P4d, P5, and P5e instances with H100 GPUs for the most demanding training workloads. Amazon SageMaker HyperPod orchestrates thousands of these for foundation-model training.
- AWS Inferentia — purpose-built AWS silicon for inference. Inf1 and Inf2 instances lower per-token inference cost versus GPU-based inference.
- Amazon Bedrock — fully managed service that hides all the infrastructure. You never see the underlying Trainium, Inferentia, or GPU; you just call the model API.
The AIF-C01 exam pattern: if a question mentions "training a foundation model at lowest cost," the answer direction is AWS Trainium. If it mentions "serving inference at lowest cost," the direction is AWS Inferentia. If it mentions "no infrastructure to manage, call an API," the answer is Amazon Bedrock.
Why the Scale Matters
The scale of training data and compute is what produces emergent capabilities — model behaviors that do not exist in smaller versions and appear suddenly as scale increases. Emergent capabilities include chain-of-thought reasoning, few-shot task learning from in-context examples, code generation, and cross-lingual transfer. The AIF-C01 exam frames emergent capabilities as a property of foundation models (large, pre-trained, general-purpose) that smaller task-specific models do not have.
Emergent Capabilities — What Scale Unlocks
Emergent capabilities are generative AI abilities that arise only when training data and parameter count pass certain thresholds. The AIF-C01 exam guide treats emergent capabilities as a key generative AI concept because they justify the existence of foundation models in the first place — if small models could do everything large models do, there would be no reason to spend millions on foundation model training.
Concrete emergent capabilities the exam may reference:
- Few-shot in-context learning — solving a new task from a handful of examples placed in the prompt, without any weight updates. This is the basis of few-shot prompting.
- Chain-of-thought reasoning — producing multi-step logical arguments when prompted with "think step by step." Smaller models cannot do this reliably.
- Instruction following — understanding natural-language instructions like "summarize this in three bullets" without ever having been trained on that specific instruction.
- Cross-lingual transfer — performing tasks in languages that were underrepresented in training, because shared conceptual structures transfer across languages.
- Code generation — writing working code from natural-language descriptions of behavior.
Emergent capabilities are why the AIF-C01 exam frames foundation models as "general-purpose adaptable models" rather than "one model per task" — scale produces a single model that handles many tasks through prompting, which is the business case for Amazon Bedrock.
Multimodal Models — Accepting and Producing Multiple Input Types
A multimodal model processes or produces more than one type of content. The AIF-C01 exam expects recognition of multimodal as a concept and awareness of multimodal foundation models on AWS.
Types of Multimodal Behavior
- Vision-language input — model accepts text + image and produces text output. Example: upload a product photo with the prompt "describe this item for an e-commerce listing" and the model writes the description. Anthropic Claude on Amazon Bedrock supports vision input.
- Text-to-image — pure text prompt produces an image. Example: Stable Diffusion and Amazon Titan Image Generator on Amazon Bedrock.
- Text-to-video — text prompt produces a video. Example: Amazon Nova Reel on Amazon Bedrock.
- Speech-to-text-to-speech pipelines — Amazon Transcribe converts speech to text, a foundation model on Amazon Bedrock processes the text, and Amazon Polly converts the response back to speech. This is not a single multimodal model, but a multimodal pipeline the exam may describe.
Multimodal vs Unimodal on the Exam
The AIF-C01 exam will test whether you can recognize a multimodal scenario. Keywords in scenarios: "image plus text input," "analyze and describe a chart," "generate a video from a storyboard," "caption an image." All of these require multimodal capability. A task-specific service alone — Amazon Rekognition for image analysis, Amazon Comprehend for text — cannot combine modalities. Only a multimodal foundation model can.
Hallucination — Causes and Mitigation
Hallucination is the single most-tested generative AI concept on Domain 2 of AIF-C01. You must be able to define it, explain why it happens, and name the mitigation strategies AWS offers.
What Hallucination Is
Hallucination is when a generative AI model produces content that is factually incorrect, unsupported by any source, or fabricated — yet sounds confident and plausible. Examples: an LLM citing a paper that does not exist, inventing a non-existent person's biography, generating incorrect legal precedent, or producing code that calls a non-existent API.
Hallucination is not a bug in a specific model; it is a structural property of generative AI. Because LLMs are trained to produce plausible-sounding token sequences — not to verify factual accuracy — the same mechanism that lets them write fluent prose also lets them write fluent falsehoods.
Why Hallucination Happens — Five Causes
- Training-data gaps — the model was never exposed to information about a topic, so it interpolates from nearby concepts and produces something that sounds right but is wrong.
- Training-data staleness — the training corpus has a knowledge cutoff. Events after the cutoff are unknown, and if asked, the model may fabricate.
- Tokenization and sampling noise — the model samples from a probability distribution. Low-probability but confident-sounding tokens can produce confidently wrong answers, especially at high temperature.
- Over-optimization for fluency — the model is optimized to sound natural, not to verify facts. Fluency and accuracy are decoupled objectives.
- Prompt ambiguity — vague or under-specified prompts push the model to fill gaps, which often produces fabricated details.
Hallucination Mitigation Strategies on AWS
The AIF-C01 exam will test these mitigation strategies. The list below is not exhaustive, but these five are the canonical answers:
- Retrieval-Augmented Generation (RAG) — retrieve authoritative source documents at query time, inject them into the prompt, and instruct the model to answer only from the retrieved context. Implemented on AWS via Amazon Bedrock Knowledge Bases with a vector database (Amazon OpenSearch Service, Amazon Aurora PostgreSQL with pgvector, or Amazon Neptune Analytics). RAG is the primary hallucination-mitigation pattern on AWS.
- Grounding check via Amazon Bedrock Guardrails — an automated check that compares the generated response against a reference source and flags or blocks answers that are not grounded in that source.
- Lower temperature — reducing temperature toward 0 biases sampling toward the highest-probability (and usually safer) tokens, reducing creative hallucination at the cost of variety.
- Prompt engineering with explicit grounding instructions — "Answer only from the provided context. If the context does not contain the answer, respond 'I do not know.'" This does not eliminate hallucination but measurably reduces it.
- Human-in-the-loop review via Amazon A2I (Augmented AI) — route low-confidence or high-risk generative outputs to a human reviewer before delivery.
Candidates often confuse hallucination with general model inaccuracy. A discriminative model that misclassifies a spam email is not hallucinating — it is just wrong. Hallucination specifically refers to generative AI fabricating plausible-sounding content that has no basis in training data or retrieved context. If the exam question mentions "the model confidently produced a source that does not exist" or "the model invented a fact," that is hallucination. If the exam asks for hallucination mitigation, the top answers are RAG, Bedrock Guardrails grounding check, lower temperature, and explicit grounding prompts. Source ↗
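The grounding-instruction pattern from the mitigation list can be sketched as a prompt builder. The template wording is illustrative rather than an AWS-mandated format; Amazon Bedrock Knowledge Bases assembles a comparable prompt internally when it runs RAG for you:

```python
# A minimal grounding-prompt builder: inject retrieved context and an
# explicit refusal instruction. The exact wording is illustrative; the
# pattern (context + "answer only from context" + refusal phrase) is what
# measurably reduces hallucination.
def build_grounded_prompt(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer ONLY from the context below. If the context does not "
        "contain the answer, respond exactly: I do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = ["Amazon Bedrock is a fully managed service for foundation models."]
prompt = build_grounded_prompt("What is Amazon Bedrock?", chunks)
print("I do not know" in prompt and chunks[0] in prompt)  # True
```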
Cost Drivers of Generative AI Workloads
Generative AI workloads have a distinct cost profile from traditional ML workloads. The AIF-C01 exam will test whether you can identify what drives cost and how AWS services price generative AI access.
Training Cost Drivers
- Parameter count — more parameters mean more compute per training step. A 70-billion-parameter model costs roughly an order of magnitude more to train than a 7-billion-parameter model.
- Training data volume — more tokens seen during training means more forward and backward passes. Modern foundation models train on trillions of tokens.
- Compute hardware choice — AWS Trainium offers better price-performance than general-purpose GPUs for large-model training. For a foundation-model training job, switching from GPU to Trainium can substantially reduce cost.
- Training duration — bigger models on more data take weeks to months of continuous cluster time. Fault tolerance, checkpointing, and resumption (via Amazon SageMaker HyperPod) affect total cost.
For AIF-C01, the key insight is that training foundation models from scratch is prohibitively expensive for most customers. That is why Amazon Bedrock exists — you pay to use a pretrained model through an API, not to train your own. Fine-tuning an existing foundation model on a smaller domain corpus is much cheaper than training from scratch, and is the cost sweet spot when prompt engineering and RAG do not suffice.
Inference Cost Drivers
- Input tokens — the number of tokens you send to the model. Long context windows and long RAG-retrieved context drive input-token cost.
- Output tokens — the number of tokens the model generates. Typically priced higher per token than input tokens because output generation is sequential (one token at a time).
- Model choice — larger models (Anthropic Claude Opus, larger Amazon Titan variants) cost more per token than smaller models (Claude Haiku, smaller Titan variants). The AIF-C01 exam may frame this as "right-sizing" the model to the task.
- Provisioned throughput vs on-demand — Amazon Bedrock offers on-demand pricing (pay per token, no commitment) and provisioned throughput (reserved capacity for predictable high-volume workloads). On-demand is cheaper for bursty workloads; provisioned throughput is cheaper for sustained heavy use.
- Embedding generation — RAG pipelines must embed every query and every chunk of source data. Amazon Titan Embeddings and Cohere Embed on Amazon Bedrock charge per input token for embedding generation. Over millions of chunks, this becomes a real line item.
- Vector database storage and query cost — the RAG retrieval side of the system has its own cost: Amazon OpenSearch Service node-hours, Amazon Aurora instance cost, or Amazon Bedrock Knowledge Bases managed-infrastructure cost.
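The input/output token drivers above can be sketched as a back-of-the-envelope calculator. The per-1,000-token prices are hypothetical placeholders, not real Amazon Bedrock rates; always check the current pricing page:

```python
# Back-of-the-envelope inference cost model. Prices per 1,000 tokens are
# HYPOTHETICAL placeholders for illustration, not real Amazon Bedrock rates.
def inference_cost(input_tokens: int, output_tokens: int,
                   price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Output tokens are typically priced higher than input tokens."""
    return (input_tokens / 1000) * price_in_per_1k + \
           (output_tokens / 1000) * price_out_per_1k

# A typical RAG request: long retrieved context (input), short answer (output).
cost = inference_cost(input_tokens=6000, output_tokens=400,
                      price_in_per_1k=0.003, price_out_per_1k=0.015)
print(round(cost, 4))  # 0.024 -- input volume dominates despite the cheaper rate
```

Note how the long RAG context dominates the bill even at the lower input rate, which is why "trim context aggressively" appears in the optimization tactics below.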
Cost Optimization Tactics
- Choose the smallest model that meets the quality bar — Claude Haiku before Claude Sonnet before Claude Opus.
- Cache frequent prompts — Amazon Bedrock supports prompt caching to avoid reprocessing identical context.
- Trim context aggressively — long RAG context inflates input-token cost. Chunk, rerank, and truncate.
- Use batch inference for non-real-time workloads — Amazon Bedrock batch inference offers a discount over real-time inference.
- Consider provisioned throughput for sustained high-volume production traffic.
When an AIF-C01 scenario mentions cost, the answer direction is usually: right-size the model (smaller model if quality is sufficient), shorten the context window, prefer batch over real-time for non-interactive workloads, and use Amazon Bedrock on-demand for variable traffic versus provisioned throughput for steady production. For training specifically, the cost-optimal direction is AWS Trainium over GPU. Source ↗
Privacy and Intellectual Property Considerations
The AIF-C01 exam treats privacy and IP considerations as first-class generative AI concepts under Domain 4 (Responsible AI) and Domain 5 (Security, Compliance, Governance). Three pain points appear in nearly every exam sitting.
Training Data Provenance
Foundation models are pretrained on massive corpora scraped from the public internet, licensed datasets, and partner data. That corpus may include copyrighted text, code, images, and personally identifiable information (PII). Downstream consequences:
- Copyright risk — if the model reproduces training-data content verbatim, it may infringe copyright. Generative AI vendors increasingly offer indemnification for outputs, but the customer remains responsible for final use.
- Brand and trademark risk — an image generative model may produce outputs resembling trademarked characters or logos.
- PII memorization — LLMs can memorize rare strings from training data, including names, email addresses, and credit card numbers. Prompt extraction attacks can sometimes recover these.
Customer Data Privacy on Amazon Bedrock
A core AIF-C01 exam fact: customer data sent to Amazon Bedrock is not used to train the underlying foundation models. Customer prompts and completions are encrypted in transit and at rest, not shared with the model providers for retraining, and retained according to the service's documented policy. This is the single most important privacy statement for the exam.
Amazon Bedrock supports:
- VPC endpoints — private connectivity without traversing the public internet.
- KMS encryption — customer-managed KMS keys for customer data at rest where applicable.
- No cross-customer data leakage — a customer's prompts are not exposed to any other customer or to model providers.
- Guardrails — Amazon Bedrock Guardrails can detect and redact PII in inputs and outputs, supporting GDPR and HIPAA-style compliance requirements.
Intellectual Property of Generative Outputs
Ownership of content produced by a foundation model is a developing legal area. AIF-C01 scenarios may frame this as a governance question: "Who owns the copyright to marketing copy generated by Amazon Bedrock?" The exam-guide-friendly answer: customers are generally granted usage rights to outputs they generate through Amazon Bedrock, but broader copyright law is still evolving, and customers should not assume outputs are copyrightable or free of third-party claims without legal review.
Privacy Mitigation on AWS
- Amazon Bedrock Guardrails — PII filter — detects and redacts PII (names, SSNs, emails, phone numbers) in prompts and completions.
- Amazon Macie — scans Amazon S3 training datasets for PII before that data is used to fine-tune a model.
- Amazon Comprehend PII Detection — standalone PII detection API for custom pipelines.
- VPC endpoints and KMS — standard AWS security controls applied to generative AI.
- Model Cards and AI Service Cards — AWS-published transparency documents describing intended use, known limitations, and training-data sources for managed models.
Expect at least one scenario asking about customer data, training, and Amazon Bedrock. The canonical answer: Amazon Bedrock does not use customer prompts or completions to train the foundation models. For training-data PII, the canonical answer involves Amazon Macie for S3 scanning and Amazon Bedrock Guardrails PII filters at inference. For copyright concerns, the direction is that training-data provenance and output-usage rights are governance-level concerns that customers must document in their own policies.
Capabilities and Limitations Overview
The AIF-C01 exam expects a balanced view — what generative AI does well and where it fails.
Capabilities
- Fluent text generation across languages, domains, and styles.
- Summarization and extraction from long documents (within context window limits).
- Code generation and review at developer productivity level.
- Image and video generation from natural-language descriptions.
- Conversational interfaces and natural-language question answering over internal data (via RAG).
- Synthetic data generation for privacy-preserving analytics.
- Multi-step agentic workflows via Amazon Bedrock Agents.
Limitations
- Hallucination — covered above; the most exam-tested limitation.
- Knowledge cutoff — the model's training data has a fixed date; anything after that date is unknown unless retrieved via RAG.
- Context window limit — there is a hard limit on how many tokens fit in a single prompt. Long documents must be chunked, retrieved, or summarized.
- Cost at scale — per-token pricing compounds rapidly at high volume. Discriminative models remain cheaper for fixed-output tasks.
- Non-determinism — outputs vary run to run. Not suitable for workflows requiring deterministic outputs without prompt engineering and low temperature.
- Bias and fairness — training data biases are amplified at foundation-model scale. Human review and bias audits are required for high-stakes deployments.
- Explainability — foundation models are effectively black boxes. Explainability tooling like SageMaker Clarify applies to discriminative models; LLM explainability is an active research area.
- Security attack surface — prompt injection, jailbreaking, and data exfiltration are new generative AI-specific threat categories.
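The non-determinism limitation connects directly to inference parameters. As a sketch, a Bedrock Converse API call with temperature pinned low reduces run-to-run variance (the model ID below is an example, and the parameter names follow the Converse inferenceConfig as documented):

```python
def build_converse_request(model_id, prompt, temperature=0.0, max_tokens=512):
    """Build kwargs for bedrock-runtime's converse call with low temperature.

    Low temperature reduces output variance between runs; it does NOT
    improve factual accuracy — a low-temperature model can still hallucinate.
    """
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "temperature": temperature,
            "maxTokens": max_tokens,
        },
    }

req = build_converse_request(
    "anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    "Summarize our refund policy using only the provided context.",
)
# import boto3
# boto3.client("bedrock-runtime").converse(**req)
```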
Common Exam Traps — Domain 2 Generative AI Concepts
- Generative AI is not always an LLM — diffusion models, GANs, and multimodal systems are also generative AI. If a scenario describes image generation, do not answer LLM.
- Generative vs discriminative is not always obvious — Amazon Rekognition labeling objects is discriminative even though it processes images. Amazon Titan Image Generator creating images is generative.
- Transformer architecture applies to encoder-only, decoder-only, and encoder-decoder models — not only LLMs. Embedding models (Amazon Titan Embeddings) are encoder-only transformers.
- Attention ≠ supervised labeling — attention is a computation inside the network that weights different parts of the input. It is not a labeling step.
- Hallucination ≠ general accuracy error — hallucination is a generative-specific failure mode, not the same as classification error.
- Temperature ≠ accuracy control — lowering temperature reduces variance, not factual accuracy. A low-temperature model can still hallucinate.
- Foundation model ≠ AGI — foundation models are very capable, domain-general ML models. They are not general intelligence.
- Amazon Bedrock does not use customer data to train base models — memorize this one statement.
- AWS Trainium is for training, AWS Inferentia is for inference — do not confuse the two. Trainium is for pre-training and fine-tuning large models; Inferentia is for cost-efficient serving.
- Multimodal ≠ using multiple AI services — a pipeline that chains Amazon Rekognition into Amazon Comprehend is multi-service, not multimodal in the model sense. Multimodal refers to a single model that accepts multiple input modalities.
A very common AIF-C01 trap: a scenario describes generating photorealistic product images and asks which architecture applies. Candidates sometimes answer LLM because LLM is the most famous generative AI term. The correct answer is a diffusion model (or historically a GAN). LLMs generate text tokens, not pixels. If a scenario mentions images, think diffusion first, GAN second. If it mentions synthetic tabular data, think GAN. If it mentions text, code, or conversation, think LLM.
Distinction Note — AIF-C01 Concept Recognition vs AIP-C01 Build Depth
The AWS certification path has two AI-focused certifications with overlapping names, and the AIF-C01 exam guide explicitly scopes what AIF-C01 does NOT cover. Misunderstanding the scope is a common cause of over-studying and exam-time confusion.
- AIF-C01 (AI Practitioner, Foundational) — tests concept recognition. You must recognize what generative AI is, what a foundation model is, what an LLM does, what hallucination means, which AWS service applies to which use case. You do not need to write code, tune hyperparameters, or architect end-to-end systems.
- AIP-C01 (AI/ML Engineer Associate or Professional-level) — tests build depth. You must configure Amazon Bedrock Knowledge Bases, tune retrieval parameters, implement evaluation pipelines, optimize inference cost with provisioned throughput, and troubleshoot agent workflows.
Concrete AIF-C01 scope examples: "Which architecture underlies LLMs?" (Transformer). "What causes hallucination?" (Training gaps, staleness, sampling). "Which AWS service accesses foundation models without infrastructure?" (Amazon Bedrock). "What mitigates hallucination?" (RAG, Guardrails grounding check, lower temperature).
Out-of-scope for AIF-C01 (but AIP-C01 territory): configuring specific chunking strategies for a Knowledge Base, choosing between HNSW and IVF indexes in a vector DB, tuning top_k retrieval at the query planner layer, building a LangGraph agent loop. If an exam question seems to require implementation detail, re-read it — AIF-C01 questions stay at the concept level.
The GenAI Value Chain — Foundation Models to Business Outcomes
The AIF-C01 exam frames generative AI as a layered value chain:
- Compute infrastructure — AWS Trainium, AWS Inferentia, GPU instances. Customers rarely see this layer directly.
- Foundation models — Anthropic Claude, Amazon Titan, Meta Llama, Mistral, Stability AI, Cohere, AI21 Labs. Pretrained by vendors at massive cost.
- Foundation model access platform — Amazon Bedrock (single API to all the FMs above), Amazon SageMaker JumpStart (FMs you can customize and deploy to your own endpoints).
- Application building blocks — Amazon Bedrock Knowledge Bases (managed RAG), Amazon Bedrock Agents (multi-step orchestration), Amazon Bedrock Guardrails (content safety), prompt management, model evaluation.
- Purpose-built generative AI assistants — Amazon Q Business (enterprise assistant), Amazon Q Developer (coding assistant), Amazon Q in QuickSight (BI assistant), Amazon Q in Connect (contact-center assistant).
- Business outcomes — customer support automation, content marketing, code acceleration, knowledge management, drug discovery, creative production.
The AIF-C01 exam will test your ability to pick the right layer for a use case. "Business users want a chat assistant over internal SharePoint and S3" — layer 5, Amazon Q Business. "Developers need API access to Anthropic Claude for a custom app" — layer 3, Amazon Bedrock. "ML team wants to fine-tune Llama on healthcare records and deploy to a private endpoint" — layer 3 alternative, Amazon SageMaker JumpStart.
Plain-Language Explanation: Generative AI Concepts
Generative AI concepts get abstract fast. Three non-technical analogies cover almost every exam scenario.
Analogy 1 — The multiple-choice exam vs the essay exam
Discriminative AI is a multiple-choice exam. The question is fixed, the answers are one of four letters, and the model's job is to pick the right letter. There is no creativity. Amazon Comprehend sentiment classification is a multiple-choice exam — the four answers are positive, negative, neutral, mixed, and the model picks one.
Generative AI is a closed-book essay exam. The question is "write me 500 words on X," and the model has to compose an answer from memory, drawing on everything it has ever read. There is no single right answer, the output is long and structured, and two students given the same question write different essays. Anthropic Claude on Amazon Bedrock is taking an essay exam every time it answers. If the student invents a quote from a book that does not exist — that is hallucination. The fix is to make the exam open-book: hand the student the relevant reference material during the exam. That is retrieval-augmented generation (RAG).
Analogy 2 — The master chef and the probability cookbook
Imagine a chef who has cooked millions of meals and built up a mental cookbook — not a rigid recipe list, but a learned intuition that says "when I have these ingredients, the next step is probably sauté, with some probability roast, with a small probability boil." That intuition is a probability distribution over next steps. A generative model is exactly this chef. It does not store every training-data dish verbatim; it stores a distribution over "what comes next."
When you prompt an LLM with "tell me a story about a dragon," the model is not looking up a story in a database. It is checking its internal distribution — "given this starting context, what is the most likely next word?" — and sampling one. Then it repeats. Sometimes the chef invents a dish that tastes great but uses an ingredient that does not exist in the kitchen. That is hallucination. Lower temperature = the chef plays it safe, always using the most predictable ingredient. Higher temperature = the chef gets creative, sometimes brilliantly, sometimes with fabricated ingredients.
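The chef's "what comes next" intuition can be sketched in a few lines of Python: a toy temperature-scaled softmax sampler over a three-token vocabulary. This is purely illustrative (real LLM vocabularies have tens of thousands of tokens and the scores come from a trained network), but the temperature behavior is the same:

```python
import math
import random

def sample_next_token(logits, temperature, rng):
    """Sample one token index from raw scores via temperature-scaled softmax.

    Temperature near 0 approaches greedy argmax (the chef plays it safe);
    higher temperature flattens the distribution (the chef gets creative).
    """
    if temperature <= 1e-6:
        # Greedy decoding: always pick the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [score / temperature for score in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample from the resulting categorical distribution.
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]  # toy scores for a 3-token vocabulary
greedy = sample_next_token(logits, 0.0, random.Random(42))   # always index 0
creative = sample_next_token(logits, 1.5, random.Random(42)) # varies with seed
```

This is also why the same prompt produces different outputs on different runs: the model samples, it does not look up.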
A diffusion model is a different kind of chef. Instead of writing a recipe one word at a time, the chef starts with a plate of random ingredients scattered around, then cleans up the plate one pass at a time until it looks like the final dish. Each cleanup pass is guided by "what should this dish look like based on the prompt?" After 50 passes, the plate is a finished dish. That is how text-to-image models like Amazon Titan Image Generator work.
Analogy 3 — The traffic control tower
The transformer's attention mechanism is a traffic control tower where every car (token) can see every other car on the radar at once and decides how much to be influenced by each. In an old RNN architecture, cars could only see the car directly in front of them, and information about distant cars faded fast — so long-range planning was impossible. The attention mechanism puts every car on a single radar screen. When the navigation AI plans the next move, it queries the radar ("which of the other cars are relevant to my lane change?"), weighs them, and decides.
Self-attention is the controller looking at cars in the same direction (same sequence). Cross-attention is the controller comparing cars in two different lanes — for example, text tokens attending to image tokens in a multimodal model. Encoder-only transformers are controllers that only read the current state of traffic. Decoder-only transformers are controllers that also issue movement commands one car at a time. Encoder-decoder transformers are two controllers that hand off — one reads, the other commands.
The AIF-C01 exam uses this mental picture without asking for formulas. If the question mentions "attention," think "every token looks at every other token simultaneously in parallel." That is the insight.
FAQ — Generative AI Concepts Top Questions
1. What makes a model "generative" versus "discriminative"?
A generative model learns a probability distribution over sequences (or images, audio, etc.) so it can sample new examples that look like training data. It models P(X) or P(X, Y). A discriminative model learns only the boundary between classes — it models P(Y | X), outputting a label given an input. Amazon Comprehend classifying sentiment is discriminative. Amazon Bedrock with Anthropic Claude writing a summary is generative. The exam shortcut: if the output is a fixed label from a known set, it is discriminative; if the output is new content of arbitrary structure, it is generative.
2. What is the transformer's attention mechanism, in plain English?
Attention is a computation inside a transformer that lets every token in a sequence look at every other token in parallel, and decide how much weight to give each one when producing its output representation. Self-attention applies this within the same sequence (every input token attends to every other input token). Cross-attention applies it across sequences (text tokens attending to image tokens in a multimodal model). Attention is the core innovation that made modern LLMs possible because it handles long-range dependencies and enables massive parallel training on AWS Trainium and GPU clusters.
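The "every token weighs every other token" computation can be shown in a toy scaled dot-product self-attention over tiny hand-written vectors. This strips out the learned Q/K/V projections that a real transformer trains, so it is an illustration of the all-pairs weighting, not a faithful implementation:

```python
import math

def softmax(xs):
    """Normalize scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Toy self-attention: each token vector attends to all token vectors.

    Real transformers first project tokens into learned query/key/value
    spaces; here the raw vectors play all three roles.
    """
    d = len(tokens[0])
    out = []
    for query in tokens:
        # Scaled dot-product score of this token against every token.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in tokens]
        weights = softmax(scores)  # how much to "listen" to each token
        # Output is the attention-weighted mix of all token vectors.
        out.append([sum(w * value[j] for w, value in zip(weights, tokens))
                    for j in range(d)])
    return out

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three 2-d "tokens"
mixed = self_attention(seq)  # every output row blends all three inputs
```

Note that nothing here is sequential: each output row depends on all inputs at once, which is why attention parallelizes so well across GPU and AWS Trainium clusters.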
3. What causes hallucination in generative AI, and how do I mitigate it?
Hallucination happens because LLMs are trained to produce plausible-sounding token sequences, not to verify facts. Causes: training-data gaps, knowledge cutoffs, sampling noise, and prompt ambiguity. The main mitigation strategies on AWS are: (1) retrieval-augmented generation (RAG) with Amazon Bedrock Knowledge Bases, which injects authoritative source documents into the prompt; (2) Amazon Bedrock Guardrails grounding check, which verifies responses against a reference source; (3) lower temperature, which biases sampling toward safer high-probability tokens; (4) explicit grounding prompts like "answer only from the provided context"; and (5) Amazon A2I human-in-the-loop review for high-stakes outputs.
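Mitigation (4), the explicit grounding prompt, is nothing more than prompt assembly. A minimal sketch, with wording that is illustrative rather than an AWS-prescribed template:

```python
def build_grounded_prompt(question, passages):
    """Assemble a grounding prompt: retrieved passages first, then an
    explicit instruction to answer only from that context."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "What is the refund window?",
    ["Refunds within 30 days of purchase.", "Store credit after 30 days."],
)
```

In a managed RAG setup, Amazon Bedrock Knowledge Bases performs the retrieval-and-assembly step for you; the sketch above is the do-it-yourself equivalent of that pattern.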
4. What is a multimodal model and when do I need one?
A multimodal model is a single foundation model that processes or produces more than one content type — for example, accepting text and images as input, or generating video from a text prompt. You need a multimodal model when the task cannot be separated into independent unimodal steps: describing a chart from an image, generating images from text, or captioning video. On Amazon Bedrock, Anthropic Claude supports vision input (image + text), Amazon Titan Image Generator does text-to-image, and Amazon Nova Reel does text-to-video. A pipeline that chains Amazon Rekognition into Amazon Comprehend is multi-service, but not multimodal in the model-architecture sense.
5. Why are foundation models so expensive to train, and does that matter for my exam?
Foundation models train on trillions of tokens with tens to hundreds of billions of parameters, requiring thousands of GPU or AWS Trainium chips for weeks or months. That is roughly four to six orders of magnitude more compute than a typical discriminative model. For AIF-C01, this matters because it explains (a) why Amazon Bedrock exists — you rent access to pretrained models instead of training your own; (b) why fine-tuning is cheaper than pre-training and is the realistic customization path for most customers; and (c) why AWS Trainium is the cost-optimal training hardware and AWS Inferentia is the cost-optimal inference hardware. If a cost-related scenario asks about training, the answer direction is Trainium; for inference, Inferentia or Amazon Bedrock on-demand.
6. Does Amazon Bedrock use my prompts and data to train foundation models?
No. This is one of the most important privacy facts on AIF-C01. Customer prompts and completions sent to Amazon Bedrock are not used to train the underlying foundation models. Data is encrypted in transit and at rest, can be routed through VPC endpoints to avoid the public internet, and is not shared with model providers for retraining. For additional PII protection, Amazon Bedrock Guardrails can detect and redact sensitive information in both inputs and outputs, and Amazon Macie can pre-scan Amazon S3 training datasets for PII before fine-tuning.
7. What is the difference between a foundation model, an LLM, and generative AI?
Generative AI is the broadest term — any ML system that produces new content (text, images, audio, video, code, synthetic data). A foundation model is a specific kind of generative AI: a large, pretrained, general-purpose model adaptable to many downstream tasks via prompting, fine-tuning, or RAG. An LLM (large language model) is the text-specialized subset of foundation models — foundation models for text. On AWS, Anthropic Claude and Amazon Titan Text are LLMs, Amazon Titan Image Generator is a foundation model but not an LLM (it generates images), and both are examples of generative AI.
8. Are emergent capabilities a real thing, and does AIF-C01 test them?
Yes and yes. Emergent capabilities are abilities that appear in foundation models only when scale crosses a threshold — few-shot in-context learning, chain-of-thought reasoning, instruction following, cross-lingual transfer. Smaller models cannot reliably do these. AIF-C01 frames emergent capabilities as a justification for foundation models: if scale did not unlock new capabilities, no one would spend millions on pre-training. The exam tests this as a property of foundation models, not as a rigorous benchmark — you need to recognize the term and associate it with "large, pretrained, general-purpose models."
Further Reading
- AWS Certified AI Practitioner (AIF-C01) Exam Guide. https://d1.awsstatic.com/training-and-certification/docs-ai-practitioner/AWS-Certified-AI-Practitioner_Exam-Guide.pdf
- Amazon Bedrock User Guide. https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html
- What is Generative AI? (AWS). https://aws.amazon.com/ai/generative-ai/
- What are Foundation Models? (AWS). https://aws.amazon.com/what-is/foundation-models/
- What are Transformers in AI? (AWS). https://aws.amazon.com/what-is/transformers-in-artificial-intelligence/
- What is a Large Language Model? (AWS). https://aws.amazon.com/what-is/large-language-model/
- Generative AI on AWS Overview (Whitepaper). https://docs.aws.amazon.com/whitepapers/latest/generative-ai-on-aws-overview/generative-ai-on-aws-overview.html
- Generative AI Security Scoping Matrix (Whitepaper). https://docs.aws.amazon.com/whitepapers/latest/generative-ai-security-scoping-matrix/generative-ai-security-scoping-matrix.html
- Amazon Bedrock Guardrails. https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html
- AWS Responsible AI. https://aws.amazon.com/ai/responsible-ai/
- AWS Trainium. https://aws.amazon.com/machine-learning/trainium/
- AWS Inferentia. https://aws.amazon.com/machine-learning/inferentia/
Summary
Generative AI concepts are the foundation of Domain 2 (24 percent) and implicit in Domain 3 (28 percent) of AIF-C01 — together more than half the exam. A generative AI model learns a probability distribution over sequences (or images, audio, etc.) and samples new content from it; a discriminative model only learns a boundary between classes. Modern generative AI is built on the transformer architecture, whose attention mechanism lets every token attend to every other token in parallel, enabling long-range dependency handling and massive-scale parallel training on AWS Trainium and GPU clusters. Scale produces emergent capabilities — few-shot learning, chain-of-thought reasoning, instruction following — that small models do not have. Generative AI spans text (LLMs), images (diffusion models), audio, video, code, and synthetic data (GANs or diffusion), and multimodal foundation models combine modalities in a single network. The primary generative AI failure mode is hallucination: fluent but fabricated content, mitigated on AWS through retrieval-augmented generation with Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails grounding check, lower temperature, explicit grounding prompts, and Amazon A2I human review. Generative AI cost drivers are training compute (lowest-cost on AWS Trainium), inference tokens (right-size the model, use batch when possible, consider provisioned throughput for steady workloads), and retrieval infrastructure. Privacy and IP considerations center on training-data provenance, customer data isolation (Amazon Bedrock does not use customer prompts to train base models), PII handling with Amazon Macie and Amazon Bedrock Guardrails, and output-usage rights. For AIF-C01, stay at the concept-recognition level — transformer architecture without math, attention without gradient derivations, hallucination mitigation by service name — and leave the build-depth detail to AIP-C01.