Foundation models and LLMs are the centre of gravity of the AWS Certified AI Practitioner AIF-C01 exam. Task Statement 2.1 demands that you can define a foundation model, explain why a large language model is a specific subtype, list the foundation models available on Amazon Bedrock (Anthropic Claude, Amazon Titan, Amazon Nova, Meta Llama, Mistral AI, AI21 Labs Jurassic, Cohere Command, Stability AI Stable Diffusion), and reason about the tradeoffs that drive model choice — size versus cost versus latency versus accuracy, open weight versus proprietary, text versus image versus multimodal. This study note is written to that bar. Foundation models and LLMs receive roughly 350 of the 1800 questions budgeted for Domain 2, and every remaining Domain 2 and Domain 3 topic assumes you already know the foundation models and LLMs vocabulary cold.
This guide also marks where AIF-C01 ends and the forthcoming AIP-C01 professional tier begins — the foundational exam asks you to recognise and select foundation models, while AIP-C01 asks you to design, optimise, and operate them in production. Keep that scope boundary in mind as you read.
What Are Foundation Models and LLMs?
A foundation model is a large neural network that has been pretrained on a very broad dataset using self-supervised learning, and is therefore adaptable — through prompting, retrieval, or fine-tuning — to a wide range of downstream tasks without being rebuilt from scratch. The term was introduced by Stanford's Center for Research on Foundation Models in 2021 precisely to capture that "one model, many tasks" property. The AIF-C01 exam adopts this definition verbatim.
A large language model (LLM) is the text-specific subtype of foundation model. An LLM ingests sequences of tokens and predicts the next token; that simple mechanism, repeated billions of times during pretraining, produces emergent capabilities such as summarisation, translation, question answering, classification, and code generation. Every LLM on Amazon Bedrock — Claude, Titan Text, Llama, Mistral, Jurassic, Cohere Command — is a foundation model, but not every foundation model is an LLM. Stability AI Stable Diffusion, for example, is a foundation model for image generation; Amazon Titan Image Generator is a foundation model for images; Amazon Nova is a family of multimodal foundation models that covers text, images, and video.
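The "predict the next token, repeatedly" mechanism can be sketched in a few lines. The bigram lookup table below is a made-up stand-in for a real transformer; only the generate-one-token-at-a-time loop reflects how LLMs actually work:

```python
# Toy illustration of the core LLM mechanism: repeatedly predict the next
# token from the tokens so far. The "model" here is a hypothetical bigram
# lookup table, not a real transformer; the loop structure is the point.
TOY_MODEL = {
    "summarise": "this",
    "this": "document",
    "document": "briefly",
    "briefly": "<eos>",  # end-of-sequence marker
}

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = TOY_MODEL.get(tokens[-1], "<eos>")  # "predict" next token
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["summarise"]))  # ['summarise', 'this', 'document', 'briefly']
```

A real LLM replaces the lookup table with a transformer that scores every token in its vocabulary, but the autoregressive loop is the same.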
The word "foundation" is therefore load-bearing: it signals (1) large scale in parameters and training tokens, (2) broad pretraining coverage, and (3) downstream adaptability. Miss any of the three and the model is not a foundation model in the exam sense. Amazon Bedrock is the AWS service through which you call foundation models as APIs; Amazon SageMaker JumpStart is the AWS service through which you deploy foundation models onto managed infrastructure you control. Both are relevant, both appear in AIF-C01 questions, and both rely on the foundation models and LLMs vocabulary you are about to master.
Why Foundation Models and LLMs Dominate AIF-C01
AIF-C01 Domain 2 (Fundamentals of Generative AI) carries 24 percent of the exam and every Domain 2 question either directly tests foundation models and LLMs or assumes you can recognise one. Domain 3 (Applications of Foundation Models) adds another 28 percent — so more than half of the AIF-C01 exam depends on the foundation models and LLMs knowledge in this note. Skipping this topic is not an option.
A foundation model is (1) large, (2) pretrained on broad data with self-supervision, and (3) adaptable to many downstream tasks. All three clauses must hold. A task-specific CNN trained only on chest X-rays is large and pretrained but not broad, so it is not a foundation model. A small rules engine is broad but not pretrained, so it is not a foundation model. Drill this definition; roughly one in every five Domain 2 questions rewards a clean recitation.
Foundation Models and LLMs, Explained in Plain Language
Foundation models and LLMs sound academic, but three everyday analogies make the concept obvious.
Analogy 1 — The Open-Book Exam Library
Imagine you are sitting an open-book exam. You walk into a library that has already catalogued every textbook, newspaper, code repository, and Wikipedia article on Earth. You did not build the library; a team of librarians (the pretraining team) spent millions of dollars and months of GPU time organising it. When the exam starts, you do not re-read every book — you ask pointed questions and the library hands you the relevant pages.
- The library itself is the foundation model.
- Reading a single shelf end-to-end is pretraining.
- The librarian's answer to your question is inference.
- Writing a sticky note to narrow a future query is prompt engineering.
- Re-indexing a private section so the library speaks your company's jargon is fine-tuning.
- Adding your company's intranet to the catalogue without retraining is retrieval-augmented generation (RAG).
Large language models are the text wing of that library. Image foundation models like Stable Diffusion are the art collection. Multimodal foundation models like Amazon Nova are the cross-referenced media room. A question on AIF-C01 that asks "which approach lets a pretrained model answer domain-specific questions without changing its weights" is really asking "which approach uses the library without re-cataloguing it," and the answer is prompt engineering or RAG — not fine-tuning and definitely not pretraining from scratch.
Analogy 2 — The Swiss Army Knife Factory
A foundation model is a Swiss Army knife. It is not the sharpest kitchen knife, not the most precise scalpel, and not the strongest crowbar — but it is acceptable at hundreds of tasks out of the box. The factory (OpenAI, Anthropic, Meta, Amazon, Mistral, AI21, Cohere, Stability AI) forges the knife once with enormous capital; you rent it from Amazon Bedrock by the token.
- The big main blade is the LLM text capability — summarise, classify, translate, write code.
- The scissors are the multimodal vision capability — describe an image, read a chart.
- The corkscrew is the code assistant capability — explain a stack trace, generate a function.
- The small blade is the embedding model capability — Amazon Titan Embeddings, Cohere Embed.
- The toothpick is the guardrail — small, quiet, but keeps the whole tool safe.
Picking a foundation model is picking which Swiss Army knife to clip to your belt. A jeweller (latency-critical chat) wants a small, fast knife such as Claude Haiku or Amazon Nova Micro. A carpenter framing a house (long-context analysis of a 200-page contract) wants a big robust knife such as Claude Opus or Amazon Nova Premier. Both are foundation models; both are LLMs; the difference is the cost-to-capability ratio — which is exactly the tradeoff AIF-C01 loves to probe.
Analogy 3 — The Electrical Grid
Treat foundation models and LLMs as an electrical grid. Before the grid, every factory ran its own coal generator — the pre-2018 world in which each company trained its own model from scratch. The grid arrives (pretraining by hyperscalers) and now any household can plug in a toaster (a downstream task) and get reliable power. Amazon Bedrock is the wall socket; the model provider (Anthropic, Meta, Amazon, Mistral) is the power plant; tokens are kilowatt-hours you pay by the meter.
- Pretraining a foundation model from scratch costs tens to hundreds of millions of dollars — like building a nuclear plant.
- Fine-tuning is installing a voltage converter so the grid speaks your appliance's language.
- Prompt engineering is flipping the switch correctly.
- Proprietary models are like a regulated utility — you pay the meter and obey the EULA.
- Open-weight models such as Meta Llama or Mistral are like a cooperative microgrid — you can download the generator and run it yourself on Amazon SageMaker JumpStart, but you inherit the maintenance.
If a question asks "why did the industry move from task-specific ML to foundation models," the grid analogy gives you the answer: centralising the expensive work (pretraining) behind a metered API dramatically lowers the cost of each downstream application.
Transformer Architecture — The Engine Inside Every LLM
The transformer is the neural-network architecture that powers every LLM on Amazon Bedrock. The AIF-C01 exam does not require the math; it requires the conceptual vocabulary.
The Attention Mechanism in One Paragraph
Older architectures processed text one word at a time and forgot earlier context quickly. The transformer, introduced by Vaswani et al. in 2017, replaces that sequential reading with an attention mechanism: for every token being generated, the model looks at every other token in the context window simultaneously and decides which ones matter most. That parallelism is why transformers scale to hundreds of billions of parameters and to context windows of hundreds of thousands of tokens, and it is why a single foundation model can handle summarisation, translation, and code generation with the same weights.
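The attention computation itself fits in a few lines of NumPy. This is a minimal sketch of scaled dot-product attention with toy dimensions, not a production implementation:

```python
import numpy as np

# Minimal sketch of scaled dot-product attention: every token attends to
# every other token in the context window in parallel.
def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax per row
    return weights @ V  # each output is a weighted mix of all value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))  # 5 tokens, 8-dim embeddings (toy sizes)
out = attention(x, x, x)     # self-attention: Q = K = V
print(out.shape)             # (5, 8): one contextualised vector per token
```

Because every token is compared with every other token in one matrix multiply, the computation parallelises across GPUs, which is what lets transformers scale.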
Why It Matters for Model Selection
Two transformer properties drive every AIF-C01 model-selection question.
- Context window size is bounded by the transformer's attention mechanism — doubling the window roughly quadruples attention compute. Note that window length and model size are separate cost drivers: Claude Sonnet and Claude Haiku can offer the same 200k-token window, yet Sonnet costs more per token because it is the larger model. Context window is a foundation models and LLMs design parameter; the exam expects you to recognise it as a cost driver.
- Parameter count (the billions of weights learned during pretraining) controls capability. More parameters usually mean more world knowledge and stronger reasoning, but also higher latency, higher price per token, and more GPU memory. Small LLMs like Amazon Nova Micro, Claude Haiku, Mistral 7B are cheap and fast. Large LLMs like Claude Opus, Amazon Nova Premier, Llama 3.1 405B are expensive and deliberate.
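The quadratic cost claim in the first bullet is simple arithmetic to verify: attention compares every token with every other token, so the number of comparisons grows with the square of the context length.

```python
# Attention performs roughly one comparison per (token, token) pair, so
# compute grows with the square of the context length. Illustrative only.
def attention_pairs(context_tokens):
    return context_tokens ** 2

# Doubling the context from 8k to 16k tokens quadruples the attention work.
print(attention_pairs(16_000) / attention_pairs(8_000))  # 4.0
```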
A foundation model (FM) is a large neural network pretrained on a broad, mostly unlabelled dataset using self-supervised learning, such that it can be adapted — via prompting, in-context learning, retrieval augmentation, or fine-tuning — to many downstream tasks without being retrained from scratch. AWS specifically markets Amazon Bedrock as "the easiest way to build and scale generative AI applications with foundation models."
Pretraining at Scale — How a Foundation Model Is Born
Pretraining is the single most expensive step in the lifecycle of a foundation model. The pretraining team assembles a corpus — for LLMs this is typically trillions of tokens drawn from web crawls, books, code repositories, and licensed datasets — and trains the transformer to predict the next token across that entire corpus. The training run itself takes weeks to months on thousands of GPUs or AWS Trainium chips, and the cost can run from single-digit millions of dollars for a 7-billion-parameter model to more than a hundred million dollars for a frontier-class model.
The AIF-C01 exam does not ask you to price a pretraining run, but it does ask you to recognise three consequences of pretraining at scale:
- Foundation models and LLMs exhibit emergent capabilities — behaviours (chain-of-thought reasoning, zero-shot translation, tool use) that appear only above a certain scale and are not explicitly programmed.
- Foundation models and LLMs carry a knowledge cutoff — the most recent date of their training data. Anything after that date is invisible unless supplied via RAG or tool use.
- Foundation models and LLMs inherit their training data's biases, copyrighted material, and factual errors. This is why responsible-AI guardrails (covered in Domain 4) exist.
Pretraining is also why the industry moved to renting foundation models rather than building them. For most enterprises, training a new foundation model from scratch is economically irrational; calling one on Amazon Bedrock is the dominant choice.
Pretraining, fine-tuning, continued pretraining, and instruction tuning are four distinct operations. Pretraining builds a foundation model from scratch on broad data. Continued pretraining extends an existing foundation model on more broad data (for example, domain text). Fine-tuning updates weights on labelled (instruction, response) pairs. Instruction tuning is a specific kind of fine-tuning that teaches a model to follow human-style instructions — the step that turns a raw LLM into a chat assistant. Confusing them is a classic AIF-C01 trap.
Model Parameters — What "Billions of Parameters" Means
A parameter is one learned weight inside the transformer. A 7-billion-parameter LLM has seven billion such weights; a 405-billion-parameter LLM has 405 billion. On AIF-C01 you need three practical intuitions.
- More parameters generally mean more capability, especially on reasoning-heavy tasks — but with diminishing returns. Doubling parameters does not double accuracy.
- More parameters mean more GPU memory. A 70B model in 16-bit precision needs roughly 140 GB of accelerator memory just to hold the weights; a 7B model fits on a single consumer GPU. This drives instance selection on Amazon SageMaker and drives per-token pricing on Amazon Bedrock.
- More parameters mean higher latency per token. For interactive chat, latency is often the binding constraint even when capability is acceptable.
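The 140 GB figure in the second bullet comes from a back-of-envelope calculation you can reproduce (weights only; the KV cache and activations need additional memory on top):

```python
# Back-of-envelope accelerator memory just to hold the weights:
# parameters x bytes per parameter (2 bytes each at 16-bit precision).
def weight_memory_gb(params_billion, bytes_per_param=2):
    return params_billion * bytes_per_param  # billions of params x bytes = GB

print(weight_memory_gb(7))   # 14.0 GB: fits on a single large GPU
print(weight_memory_gb(70))  # 140.0 GB: needs multiple accelerators
```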
Parameter count is therefore one axis of the model-selection decision, not the only axis. The foundation models and LLMs on Amazon Bedrock are deliberately offered in multiple sizes — Claude Haiku / Sonnet / Opus, Amazon Nova Micro / Lite / Pro / Premier, Llama 3.1 8B / 70B / 405B, Mistral 7B / 8x7B / Large — so that you can match the size to the workload.
Parameters vs Training Tokens — The Scaling Laws
The DeepMind Chinchilla paper showed that, for a given compute budget, a smaller model trained on more tokens can beat a larger model trained on fewer tokens. That insight explains why modern foundation models advertise both parameter count and training-token count. AIF-C01 does not ask you to compute scaling laws, but it expects you to know that model quality is a function of parameters and data, not parameters alone.
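The Chinchilla result is commonly summarised as a rule of thumb of roughly 20 training tokens per parameter for compute-optimal training. A quick sketch of that heuristic (a rule of thumb, not something the exam asks you to compute):

```python
# Commonly cited Chinchilla rule of thumb: ~20 training tokens per
# parameter for compute-optimal pretraining. Heuristic, not exam math.
def chinchilla_optimal_tokens(params_billion, tokens_per_param=20):
    return params_billion * tokens_per_param  # in billions of tokens

print(chinchilla_optimal_tokens(7))   # 140  -> ~140B tokens for a 7B model
print(chinchilla_optimal_tokens(70))  # 1400 -> ~1.4T tokens for a 70B model
```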
Foundation Model Families on Amazon Bedrock
Amazon Bedrock aggregates foundation models from six external providers plus Amazon's own Titan and Nova families behind a single API. On the exam you must be able to map each family to its typical use case.
Anthropic Claude
Anthropic's Claude family (Haiku, Sonnet, Opus, and the Claude 3.5/3.7 generations at time of writing) is the proprietary text and multimodal LLM line-up that Amazon Bedrock highlights first. Claude excels at long-context reasoning (up to 200k tokens), careful instruction following, and constitutional AI guardrailing. Anthropic is an AWS strategic partner and Claude is typically the default "capable and safe" choice on the exam. Claude is a closed foundation model — you access it via Amazon Bedrock API, you do not download the weights.
Amazon Titan
Amazon Titan is Amazon's first-party foundation model family. Titan Text (Express, Lite) covers general-purpose text generation; Titan Embeddings produces vectors for semantic search and RAG; Titan Image Generator produces images from text prompts. Titan is deeply integrated with Amazon Bedrock Knowledge Bases and Amazon Bedrock Guardrails. The exam often positions Titan as the "stay within AWS, predictable pricing" option.
Amazon Nova
Amazon Nova, introduced at re:Invent 2024, is Amazon's frontier multimodal foundation model family. Nova Micro is text-only and optimised for latency; Nova Lite, Nova Pro, and Nova Premier handle text, images, and video inputs. Nova is offered only through Amazon Bedrock, is tightly integrated with Amazon Bedrock Agents, and is positioned as Amazon's price/performance leader. Expect AIF-C01 to include Nova in the Bedrock model catalogue by exam-guide v1.1.
Meta Llama
Meta's Llama family (Llama 2, Llama 3, Llama 3.1, Llama 3.2) is the flagship open-weight foundation model line. Llama 3.1 ships at 8B, 70B, and 405B parameters; Llama 3.2 adds smaller and multimodal variants. On Amazon Bedrock, Llama is consumed via API like any other foundation model; on Amazon SageMaker JumpStart, Llama weights can be deployed onto customer-controlled infrastructure for heavier customisation. Llama's Community License is permissive but not pure open-source — large deployments (over 700M monthly active users) require a separate licence from Meta. That licensing nuance matters for the exam's open-versus-proprietary questions.
Mistral AI
Mistral AI (a French lab) offers Mistral 7B, Mixtral 8x7B (a mixture-of-experts architecture), and Mistral Large. Mistral 7B and Mixtral 8x7B are released under the permissive Apache 2.0 licence; Mistral Large is proprietary. Mistral's niche on the exam is "efficient foundation model, strong at European languages, available open-weight." Mistral models are available on Amazon Bedrock and via SageMaker JumpStart.
AI21 Labs Jurassic
AI21 Labs provides the Jurassic-2 family and the newer Jamba family on Amazon Bedrock. Jurassic is positioned as a proprietary text LLM optimised for enterprise content generation in multiple languages. On AIF-C01 you need only recognise Jurassic as one of the Amazon Bedrock foundation model providers, not memorise its benchmark scores.
Cohere Command
Cohere's Command family (Command, Command R, Command R+) is a proprietary text LLM line with an enterprise focus, especially on RAG workflows and tool use. Cohere Embed is the companion embedding model. Cohere is a core citizen of Amazon Bedrock Knowledge Bases.
Stability AI Stable Diffusion
Stability AI contributes Stable Diffusion (SD3, SDXL) and Stable Image models — foundation models for image generation rather than text. Stable Diffusion is the canonical example the AIF-C01 exam uses to show that "foundation model" is broader than "LLM." Stable Diffusion weights are released under the Stability AI Community License, an open-weight licence that permits research and smaller-scale commercial use but imposes revenue-based thresholds on larger commercial deployments.
Other Amazon Bedrock Foundation Models
The Amazon Bedrock model catalogue also includes models from HuggingFace (via custom model import), Writer (Palmyra), and specialty providers. For AIF-C01, memorise the seven headline families listed above — Anthropic, Amazon (Titan and Nova), Meta, Mistral, AI21, Cohere, Stability AI — and treat the rest as awareness-level knowledge.
For AIF-C01 you must be able to recite the providers on Amazon Bedrock: Anthropic Claude, Amazon Titan, Amazon Nova, Meta Llama, Mistral AI, AI21 Labs Jurassic, Cohere Command, Stability AI Stable Diffusion. Missing even one in a drag-and-drop question is a direct point loss. Build a flashcard for each provider and each model's primary modality (text, image, multimodal, embedding).
Text vs Image vs Multimodal Foundation Models
Foundation models come in three modality flavours, and the AIF-C01 exam tests scenario-to-modality mapping.
Text Foundation Models (LLMs)
Text foundation models — the classic LLM — accept token sequences and emit token sequences. Examples: Anthropic Claude Sonnet, Amazon Titan Text Express, Meta Llama 3.1 70B, Mistral Large, AI21 Jurassic-2 Ultra, Cohere Command R+. Use cases: summarisation, Q&A, translation, classification, code generation, chat.
Image Foundation Models
Image foundation models accept text prompts (and optionally image inputs) and emit images. Examples: Stability AI Stable Diffusion SDXL, Amazon Titan Image Generator. Use cases: marketing creative generation, synthetic training data, variation and inpainting. On Amazon Bedrock, image models are billed per generated image, not per token.
Multimodal Foundation Models
Multimodal foundation models accept more than one modality simultaneously. Anthropic Claude Sonnet and Opus accept text plus images (vision input). Amazon Nova Pro and Premier accept text, images, and video. Multimodal models enable use cases such as chart understanding, document layout analysis beyond OCR, and video summarisation. On AIF-C01, expect at least one question that requires you to pick a multimodal model for a scenario like "summarise this 10-minute video transcript and attached slide deck."
Embedding Models
Separately, embedding models are a foundation model subtype that map text (or images) into fixed-length numeric vectors for semantic search and RAG. Examples: Amazon Titan Embeddings, Cohere Embed. Embeddings are covered in depth in the embeddings-and-vector-databases topic.
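A quick sketch of why embeddings enable semantic search: similar texts map to nearby vectors, and cosine similarity measures that closeness. The three-dimensional vectors below are invented for illustration, not real Titan Embeddings or Cohere Embed outputs (which have hundreds of dimensions):

```python
import numpy as np

# Embedding models map text to vectors; semantic similarity becomes
# cosine similarity between those vectors. Toy, made-up vectors below.
def cosine_similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat = [0.9, 0.1, 0.2]       # hypothetical embedding of "cat"
kitten = [0.8, 0.2, 0.25]   # hypothetical embedding of "kitten"
invoice = [0.05, 0.9, 0.1]  # hypothetical embedding of "invoice"

print(cosine_similarity(cat, kitten))   # close to 1.0: semantically similar
print(cosine_similarity(cat, invoice))  # much lower: semantically unrelated
```

RAG pipelines use exactly this comparison to retrieve the document chunks most relevant to a query.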
Model Size vs Cost vs Latency vs Accuracy — The Four-Axis Tradeoff
This is the single most tested mental model in foundation models and LLMs. Every real-world choice on Amazon Bedrock is a point on a four-axis tradeoff.
Axis 1 — Capability (Accuracy, Reasoning Depth)
Larger foundation models score higher on reasoning benchmarks, handle more ambiguous instructions, and produce higher-quality creative output. Claude Opus and Llama 3.1 405B sit at the top of this axis; Claude Haiku and Mistral 7B sit lower.
Axis 2 — Latency (Tokens per Second and First-Token Time)
Smaller models emit tokens faster. A 7B LLM can produce 100+ tokens per second on a modest GPU; a 400B+ LLM may emit 20–30. For an interactive chat UI with a 1-second latency budget, only small and medium foundation models qualify. For batch document processing with a 10-minute SLA, the fastest model no longer matters.
Axis 3 — Cost (Per Input Token and Per Output Token)
Amazon Bedrock bills per 1,000 input tokens and per 1,000 output tokens, and prices scale roughly with parameter count. At the time of writing, Claude Haiku is roughly 10–20× cheaper per output token than Claude Opus. Batch inference and provisioned throughput further change the cost profile — for AIF-C01, recognise that cost is a first-class model-selection factor, not a footnote.
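Per-token billing is easy to model. The prices below are illustrative placeholders, not current Amazon Bedrock list prices (always check the pricing page); the point is the order-of-magnitude spread between a small and a large model at volume:

```python
# Per-token billing sketch. Prices are made-up placeholders, NOT real
# Amazon Bedrock rates; only the shape of the calculation is the point.
def monthly_cost(calls, in_tokens, out_tokens, price_in_per_1k, price_out_per_1k):
    per_call = (in_tokens / 1000) * price_in_per_1k \
             + (out_tokens / 1000) * price_out_per_1k
    return calls * per_call

# Hypothetical: 1M calls/month, 500 input and 200 output tokens per call.
small = monthly_cost(1_000_000, 500, 200, 0.00025, 0.00125)  # small model
large = monthly_cost(1_000_000, 500, 200, 0.003, 0.015)      # large model
print(f"small: ${small:,.0f}  large: ${large:,.0f}")  # roughly a 12x spread
```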
Axis 4 — Context Window (Tokens per Request)
Foundation models differ on context window length. Claude Sonnet supports 200k tokens; Llama 3.1 supports 128k; older Titan Text Express supports 8k. Long context windows unlock whole-document summarisation without chunking but cost more per request. Pick the smallest context window that covers your worst-case input.
Combining the Axes — Small Fast vs Large Accurate
Use the following heuristic on the exam:
- Real-time customer-facing chat with short prompts → small, fast foundation model (Claude Haiku, Nova Micro, Mistral 7B).
- Complex multi-step reasoning or nuanced writing → large foundation model (Claude Opus, Nova Premier, Llama 3.1 405B).
- High-volume batch classification → small foundation model with batch inference or provisioned throughput.
- Long-context analysis of contracts or research → large foundation model with big context window (Claude Sonnet/Opus).
Exam distractors frequently suggest the largest foundation model "because it is the most accurate." That is wrong whenever the small model already passes the accuracy bar at a tenth of the cost and a fifth of the latency. AIF-C01 rewards the minimum viable foundation model for the scenario, not the biggest one. If the question says "optimise cost while meeting the requirement," pick the smallest Claude (Haiku) or Amazon Nova (Micro/Lite) that plausibly solves the task.
Open-Weight vs Proprietary Foundation Models — The Licensing Landscape
Foundation models split into two commercial categories, and licensing is the exam's favourite trap after size-versus-cost.
Proprietary (Closed-Weight) Foundation Models
A proprietary foundation model is one whose weights are never released to customers. You call it through an API — Amazon Bedrock, the provider's own endpoint, or Azure/GCP equivalents — and you accept the provider's end-user licence agreement (EULA) for each invocation. Examples: Anthropic Claude, Amazon Titan, Amazon Nova, AI21 Jurassic, Cohere Command, Mistral Large.
Proprietary advantages: no infrastructure to run; continuous updates from the provider; strong guardrails and safety training baked in.
Proprietary disadvantages: per-token cost over the product lifetime; dependency on provider availability; weights cannot be audited or deployed on-premises.
Open-Weight Foundation Models
An open-weight foundation model is one whose weights are published, usually on HuggingFace, under a licence that permits download and self-hosting. Examples: Meta Llama 3.1, Mistral 7B, Mixtral 8x7B, Stability AI Stable Diffusion. These models are still available on Amazon Bedrock as managed APIs, and they can additionally be deployed onto Amazon SageMaker JumpStart or self-managed compute.
Open-weight advantages: full control over deployment; one-time infrastructure cost rather than per-token cost; ability to fine-tune weights aggressively; audit-friendly.
Open-weight disadvantages: you run the infrastructure; you are responsible for safety guardrails; licence terms can restrict commercial use.
Open-Weight Is Not the Same as Open-Source
A classic AIF-C01 trap: "open-weight" and "open-source" are different. Meta's Llama Community License is open-weight but not OSI-approved open-source — it restricts large-scale deployments and certain use cases. Stability AI's Community License is open-weight but non-commercial beyond certain revenue thresholds. Only models released under Apache 2.0, MIT, or equivalent (for example, Mistral 7B and Mixtral 8x7B) are genuinely open-source.
EULA and Responsible Use
Every foundation model on Amazon Bedrock carries an acceptable-use policy. You cannot use Claude to generate content that violates child-safety policies; you cannot use Llama to power a mass-surveillance system in violation of Meta's licence; you cannot use Stable Diffusion to create non-consensual imagery. The exam expects you to recognise that the EULA is enforced by the provider, not by AWS, and that AWS provides additional controls (Amazon Bedrock Guardrails) that customers can layer on top.
Proprietary = weights never leave the provider (Claude, Titan, Nova, Jurassic, Cohere Command, Mistral Large). Open-weight = weights downloadable under a custom licence with some restrictions (Llama, Stable Diffusion). Open-source = weights plus permissive OSI-approved licence such as Apache 2.0 or MIT (Mistral 7B, Mixtral 8x7B). On the AIF-C01 exam, "open-source" is the strictest label — do not apply it to Llama unless the question explicitly allows the Community License.
When to Choose a Small Fast Model vs a Large Accurate Model
Bringing the four axes together, here is the decision rubric AIF-C01 expects you to apply.
Choose a Small Fast Foundation Model When
- Latency is a hard constraint (sub-500ms first token).
- Cost per invocation must be very low (millions of calls per day).
- The task is simple classification, extraction, or templated generation.
- Volume is high enough that provisioned throughput on a small model beats on-demand on a large model.
- The product is in rapid prototyping and you want quick iteration cycles.
Concrete picks: Claude 3 Haiku, Amazon Nova Micro, Amazon Titan Text Lite, Mistral 7B, Llama 3.1 8B, Cohere Command Light.
Choose a Large Accurate Foundation Model When
- Task requires multi-step reasoning, nuance, or creative writing judged by humans.
- Context window needs to hold a whole contract, research paper, or codebase.
- Accuracy on long-tail questions is worth a 10× cost premium.
- The foundation model's output feeds directly into a regulated decision (still subject to responsible-AI review).
- You need the strongest multimodal understanding (charts, diagrams, video).
Concrete picks: Claude 3.5 Sonnet, Claude 3 Opus, Amazon Nova Pro, Amazon Nova Premier, Llama 3.1 70B or 405B, Mistral Large, Cohere Command R+.
The Hybrid Pattern — Small by Default, Escalate on Uncertainty
Real production systems rarely pick one foundation model for all traffic. A common pattern — and an AIF-C01 scenario staple — is to route 95 percent of traffic to a small fast foundation model and escalate the uncertain 5 percent to a large accurate one. Amazon Bedrock supports this pattern natively through model routing via application logic and through cross-region inference profiles for resilience.
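The routing pattern can be sketched as plain application logic. The model names and the confidence heuristic below are illustrative assumptions, not real Amazon Bedrock model IDs; in production both branches would call Amazon Bedrock with the chosen model:

```python
# Sketch of the "small by default, escalate on uncertainty" pattern.
# Model names and the confidence signal are illustrative assumptions.
SMALL_MODEL = "small-fast-model"      # e.g. a Haiku / Nova Micro class model
LARGE_MODEL = "large-accurate-model"  # e.g. an Opus / Nova Premier class model

def route(prompt, small_model_confidence, threshold=0.8):
    # Try the cheap model first; escalate only when its (heuristic or
    # self-reported) confidence falls below the threshold.
    if small_model_confidence >= threshold:
        return SMALL_MODEL
    return LARGE_MODEL

print(route("What is your refund policy?", 0.95))          # small-fast-model
print(route("Compare these two 80-page contracts", 0.40))  # large-accurate-model
```

With a well-chosen threshold, the expensive model only handles the minority of traffic that actually needs it, which is where the cost savings come from.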
Why Foundation Models Changed the ML Industry
Before foundation models, every business ML problem required its own dataset, its own training pipeline, and its own deployed model. A bank building a fraud classifier, a retailer building a product-description generator, and a hospital building a radiology assistant each trained from scratch — expensive, slow, and brittle. Foundation models collapsed that per-task cost because a single pretrained foundation model can be adapted to many downstream tasks through prompting, RAG, or light fine-tuning.
The AIF-C01 exam expects you to articulate this shift in one sentence: foundation models and LLMs let organisations solve new ML problems in days instead of months, by renting capability from Amazon Bedrock rather than building it on Amazon SageMaker from scratch. That shift is the economic reason generative AI became a C-suite priority.
The Stanford CRFM Framing
The Stanford Center for Research on Foundation Models coined the term foundation model precisely to capture three risks of this shift. First, homogenisation: if every downstream application rests on a handful of foundation models, a single flaw in one foundation model propagates everywhere. Second, emergence: capabilities appear at scale that the pretraining team did not anticipate, which complicates safety analysis. Third, centralisation: the economic cost of pretraining concentrates power in a few providers. AIF-C01 does not ask you to debate these risks, but it does expect you to recognise "foundation model" as a term of art with specific properties.
Intrinsic Risks of Foundation Models and LLMs
The exam tests three intrinsic risks repeatedly. Each risk is a direct consequence of how foundation models are trained.
Hallucination
An LLM generates fluent text even when it has no factual grounding. It will confidently invent a case citation, a library function, or a historical date. Hallucination is not a bug — it is the predictable outcome of training the model to predict the next token without a truth oracle. Mitigations include RAG (supply grounded context), lower temperature (reduce randomness), and Amazon Bedrock Guardrails grounding check.
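The temperature mitigation is easy to see numerically: dividing the next-token logits by a temperature below 1 sharpens the softmax distribution, concentrating probability on the top candidate and reducing the chance of sampling a low-probability (often hallucinated) token:

```python
import math

# Lower temperature sharpens the next-token distribution, making the
# model pick its top candidate more deterministically.
def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy scores for three candidate next tokens
print(softmax_with_temperature(logits, 1.0))  # probability mass spread out
print(softmax_with_temperature(logits, 0.2))  # mass concentrates on top token
```

Temperature 0 (greedy decoding) is the limiting case: the top token is always chosen.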
Bias Amplification
Foundation models inherit and often amplify biases present in the pretraining corpus. A model trained on English internet text will speak more confidently about Western topics, associate certain professions with certain genders, and underperform on minority languages. Mitigations include careful data curation, instruction tuning with diverse feedback, and post-deployment monitoring through Amazon SageMaker Clarify.
Training-Data Memorisation
Foundation models can regurgitate verbatim chunks of training data — copyrighted text, licensed code, or PII. This is both a legal risk and a privacy risk. Mitigations include provider-side deduplication during training, differential-privacy techniques, and Amazon Bedrock Guardrails sensitive-information filters on output.
A distractor that suggests a foundation model "is on the path to AGI" or "has general intelligence" is always wrong on AIF-C01. Foundation models are statistical next-token predictors with emergent capabilities; they are not artificial general intelligence. The exam guide explicitly frames current AI as narrow AI, and every Amazon Bedrock marketing page frames foundation models as tools, not agents with general intelligence. Pick the option that describes narrow capability, not general cognition.
Foundation Models on Amazon Bedrock vs Amazon SageMaker JumpStart
AIF-C01 tests the boundary between Amazon Bedrock and Amazon SageMaker JumpStart at least once per exam.
Amazon Bedrock — Managed API
Amazon Bedrock is the serverless, API-first service for foundation models. You pick a model ID, you send prompts, you pay per token. AWS runs the accelerators, patches the runtime, and keeps the models available. Amazon Bedrock supports managed features — Knowledge Bases for RAG, Agents for tool use, Guardrails for content safety, Model Evaluation for benchmarking — all attached to the same foundation model catalogue.
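A minimal sketch of what "pick a model ID, send prompts" looks like through the Bedrock Converse API with boto3. The model ID below is one example from the catalogue and availability varies by region; the network call itself is left commented out so the snippet stays self-contained:

```python
# Shape of an Amazon Bedrock Converse API request (boto3 "bedrock-runtime"
# client). Model ID is an example; check the console for your region.
request = {
    "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
    "messages": [
        {"role": "user",
         "content": [{"text": "Summarise this ticket in one sentence."}]}
    ],
    "inferenceConfig": {"maxTokens": 200, "temperature": 0.2},
}

# The actual call (requires AWS credentials and model access):
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**request)
# print(response["output"]["message"]["content"][0]["text"])
print(request["modelId"])
```

Swapping models is a one-line change to `modelId`, which is exactly the "single API over many foundation models" value proposition the exam tests.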
Amazon SageMaker JumpStart — Managed Deployment
Amazon SageMaker JumpStart hosts foundation model weights (proprietary and open-weight) and lets you deploy them onto Amazon SageMaker endpoints that you control. You pick instance types, autoscaling, and VPC placement; you pay for instance hours rather than tokens; you can fine-tune weights with deeper customisation than Amazon Bedrock's managed fine-tuning.
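The per-token versus per-instance distinction becomes concrete with a back-of-the-envelope comparison: at low volume, paying per token is cheaper; past a break-even point, a dedicated endpoint running around the clock wins. All rates below are hypothetical placeholders, not real AWS prices:

```python
def monthly_cost_bedrock(tokens_per_month, price_per_1k_tokens):
    """Per-token pricing: cost scales linearly with usage."""
    return tokens_per_month / 1000 * price_per_1k_tokens

def monthly_cost_endpoint(instance_hourly_rate, hours=730):
    """Per-instance pricing: flat cost for an always-on endpoint (~730 h/month)."""
    return instance_hourly_rate * hours

# Hypothetical illustrative rates, not real AWS prices.
low_volume = monthly_cost_bedrock(5_000_000, 0.003)       # 5M tokens/month
high_volume = monthly_cost_bedrock(2_000_000_000, 0.003)  # 2B tokens/month
endpoint = monthly_cost_endpoint(1.50)                    # one always-on instance
```

At 5M tokens the per-token bill is a few dollars; at 2B tokens it dwarfs the flat endpoint cost. Exam scenarios rarely give numbers, but this is the intuition behind "per-token pricing" versus "instance hours."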
Decision Rule
- Want per-token pricing, no infrastructure, and quick integration? Amazon Bedrock.
- Want per-instance pricing, VPC isolation, heavy fine-tuning, or unusual model variants? Amazon SageMaker JumpStart.
Both are valid AIF-C01 answers; the scenario specifies which to pick. The same foundation model families appear on both services, so do not confuse choosing a service with choosing a model.
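The decision rule above can be encoded as a toy function, which is how the exam expects you to reason through a scenario. The flag names are ours for illustration, not AWS terminology:

```python
def choose_service(needs_vpc_isolation=False, heavy_fine_tuning=False,
                   unusual_model_variant=False):
    """Toy encoding of the Bedrock vs JumpStart decision rule in this guide.
    Any infrastructure-control requirement pushes toward SageMaker JumpStart;
    otherwise default to the managed, per-token Bedrock API."""
    if needs_vpc_isolation or heavy_fine_tuning or unusual_model_variant:
        return "Amazon SageMaker JumpStart"
    return "Amazon Bedrock"

default_pick = choose_service()
isolated_pick = choose_service(needs_vpc_isolation=True)
```

Reading a scenario, ask whether any flag is set; if none is, Bedrock is the exam-preferred default.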
Foundation Model Evaluation at Selection Time
Selecting a foundation model is an evaluation problem. AIF-C01 frames evaluation at two moments: at selection (this topic) and at post-deployment monitoring (covered in foundation-model-evaluation).
Benchmarks You Should Recognise
- MMLU (Massive Multitask Language Understanding) — 57-subject academic benchmark.
- HellaSwag — commonsense reasoning.
- HumanEval — Python code generation.
- GSM8K — grade-school math word problems.
- MT-Bench — open-ended chat quality judged by GPT-4.
These benchmarks are useful for initial shortlisting of foundation models. AIF-C01 does not ask you to compute them, only to recognise them as standard comparison tools.
Amazon Bedrock Model Evaluation
Amazon Bedrock provides a managed Model Evaluation feature — automated evaluation on your own dataset and prompts, plus optional human evaluation through Amazon SageMaker Ground Truth. Use it to compare candidate foundation models (Claude Haiku vs Amazon Nova Lite vs Llama 3.1 8B) against your actual use case, not an abstract benchmark. This is the exam-preferred approach: "benchmark on your own data" beats "trust the leaderboard."
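"Benchmark on your own data" can be as simple as scoring each candidate on a held-out prompt set. A minimal sketch with stub functions standing in for real Bedrock model invocations (exact match is a deliberately crude metric; real evaluations use richer scoring):

```python
def exact_match_accuracy(model_fn, eval_set):
    """Fraction of prompts where the model's answer exactly matches the reference."""
    hits = sum(1 for prompt, expected in eval_set
               if model_fn(prompt).strip().lower() == expected.lower())
    return hits / len(eval_set)

# Your own evaluation dataset: (prompt, reference answer) pairs.
eval_set = [
    ("capital of France?", "Paris"),
    ("2+2?", "4"),
    ("colour of the sky?", "blue"),
]

# Stub "models": in practice each would wrap a Bedrock invocation.
model_a = lambda p: {"capital of France?": "Paris", "2+2?": "4"}.get(p, "unsure")
model_b = lambda p: "Paris"

score_a = exact_match_accuracy(model_a, eval_set)
score_b = exact_match_accuracy(model_b, eval_set)
```

The point is the workflow, not the metric: shortlist with public benchmarks, then let your own dataset pick the winner, which is what Bedrock Model Evaluation automates.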
Common Exam Traps for Foundation Models and LLMs
Trap 1 — "LLM" and "Foundation Model" Used Interchangeably
Every LLM is a foundation model; not every foundation model is an LLM. Stable Diffusion is a foundation model but not an LLM. Embeddings models are foundation models but not LLMs. The exam uses this distinction to catch careless readers.
Trap 2 — "Pretrained" Means "Task-Ready Out of the Box"
A pretrained foundation model can do many things but rarely does your specific task perfectly without prompting, RAG, or fine-tuning. Questions that ask "what else do we need" after selecting a foundation model often want you to pick prompt engineering, RAG, or fine-tuning.
Trap 3 — "Open-Source" Applied to Meta Llama
Meta Llama is open-weight under the Llama Community License. It is not OSI open-source. If a question demands an OSI open-source foundation model, pick Mistral 7B or Mixtral 8x7B (Apache 2.0), not Llama.
Trap 4 — "Amazon Bedrock Hosts Only Amazon Models"
False. Amazon Bedrock hosts Anthropic, Amazon, Meta, Mistral, AI21, Cohere, and Stability AI foundation models. Do not confuse Amazon Bedrock with Amazon Titan or Amazon Nova.
Trap 5 — "Larger Context Window = Better Model"
Context window and capability are independent axes. A small model with a 128k context window (some Llama 3.1 8B variants) is still a small model. Pick foundation models on the axis that matches the requirement.
Trap 6 — "Foundation Model Training Is Fast"
Pretraining a frontier foundation model takes weeks to months on thousands of accelerators and costs tens to hundreds of millions of dollars. The exam never rewards an answer that treats pretraining as cheap or quick.
Trap 7 — "Bedrock Model Invocations Train the Base Model"
False. Amazon Bedrock explicitly documents that customer inputs to foundation models are not used to retrain the base models. This is a core trust and compliance property that the exam tests.
AIF-C01 vs AIP-C01 Scope on Foundation Models and LLMs
Foundation models and LLMs appear on both AWS AI certifications. The scope differs sharply.
AIF-C01 (Foundational Tier — This Exam)
- Define a foundation model and an LLM.
- Recite the Amazon Bedrock model catalogue.
- Compare foundation models on size, cost, latency, and context window.
- Recognise open-weight vs proprietary licensing categories.
- Map a business scenario to a small vs large foundation model.
- Identify intrinsic risks: hallucination, bias amplification, training-data memorisation.
AIP-C01 (Professional / Practitioner-Plus Tier — Future Exam)
- Architect multi-model foundation model pipelines on Amazon Bedrock Agents.
- Design fine-tuning and continued-pretraining workflows on Amazon SageMaker.
- Optimise inference cost with provisioned throughput, custom model import, and Amazon SageMaker endpoints with Inferentia2.
- Build custom guardrails and prompt-injection defences at production scale.
- Operate foundation model lifecycles: shadow deployment, A/B testing, rollback, drift detection.
If a question dives into GPU memory calculations, fine-tuning hyperparameters, or custom-model import details, it is AIP-C01 territory, not AIF-C01. Do not over-study — AIF-C01 rewards clean recognition and scenario mapping, not hands-on optimisation.
Spend 60 percent of your foundation models and LLMs study time on the Amazon Bedrock catalogue (which provider, which modality, which size tier) and on the four-axis tradeoff. Spend 25 percent on licensing (proprietary, open-weight, open-source). Spend 15 percent on intrinsic risks (hallucination, bias, memorisation). That allocation mirrors the emphasis in the published AIF-C01 exam guide.
Practice Anchors — How the Exam Tests Foundation Models and LLMs
Expect three question archetypes on AIF-C01.
- Definition recognition — "Which of the following best describes a foundation model?" Pick the option that includes large, pretrained, broad, and adaptable.
- Catalogue selection — "A company needs to generate product marketing images from text prompts on AWS." Pick Stability AI Stable Diffusion on Amazon Bedrock or Amazon Titan Image Generator.
- Tradeoff scenario — "A contact-centre chatbot must answer in under one second at millions of calls per day while staying on AWS." Pick a small fast foundation model such as Claude Haiku or Amazon Nova Micro, not Claude Opus.
Drill these three archetypes and you will capture the majority of Domain 2 and Domain 3 foundation models and LLMs points.
FAQ — Foundation Models and LLMs on AIF-C01
Q1. What exactly makes a model a "foundation model" rather than just a "big model"?
A foundation model must satisfy three criteria simultaneously: it is large in parameters and training data, it is pretrained on broad general-purpose data using self-supervised learning, and it is adaptable to many downstream tasks without being rebuilt. A model that is merely large but trained on a single narrow task (for example, a 50-billion-parameter classifier trained only on chest X-rays) is not a foundation model in the AIF-C01 sense. Amazon Bedrock's marketing copy, the Stanford CRFM paper, and the AIF-C01 exam guide all use this three-part definition.
Q2. Is every large language model (LLM) on Amazon Bedrock a foundation model?
Yes. Every LLM on Amazon Bedrock — Anthropic Claude (Haiku, Sonnet, Opus), Amazon Titan Text, Amazon Nova (Micro, Lite, Pro, Premier in their text roles), Meta Llama (8B, 70B, 405B), Mistral (7B, Mixtral, Large), AI21 Jurassic, Cohere Command — is a foundation model. The reverse is not true: Stability AI Stable Diffusion is a foundation model for images, not an LLM. Amazon Titan Image Generator is a foundation model but not an LLM. Keep LLM as the text-generation subset of foundation models in your mental model.
Q3. Should I pick a small fast foundation model or a large accurate one for a real-time chatbot?
Start small. For real-time customer-facing chat with a sub-second latency budget and millions of calls per day, a small fast foundation model such as Claude 3 Haiku, Amazon Nova Micro, or Mistral 7B is the AIF-C01 preferred answer. Escalate to a larger model only when the small model fails to meet the accuracy bar on your own evaluation dataset. This escalation pattern is exactly what Amazon Bedrock's multi-model support is designed for. On the exam, an answer that picks the largest foundation model for a latency-sensitive scenario almost always loses to an answer that picks a smaller one that still meets the quality requirement.
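The "start small, escalate" pattern is sometimes called a model cascade, and it can be sketched in a few lines. The stub answer functions and placeholder model names below stand in for real model calls, and the confidence threshold is our assumption:

```python
def cascade_answer(prompt, models, quality_bar=0.8):
    """Try models from smallest/cheapest to largest; return the first answer
    whose self-reported confidence clears the bar, else fall back to the
    largest model unconditionally."""
    for name, answer_fn in models[:-1]:
        text, confidence = answer_fn(prompt)
        if confidence >= quality_bar:
            return name, text
    name, answer_fn = models[-1]
    text, _ = answer_fn(prompt)
    return name, text

# Stub models (placeholder names): a small fast one and a large fallback.
# The small model is "confident" only on short prompts in this toy example.
small = ("small-model", lambda p: ("short answer", 0.9 if len(p) < 40 else 0.3))
large = ("large-model", lambda p: ("thorough answer", 0.99))

used_simple, _ = cascade_answer("What are your hours?", [small, large])
used_hard, _ = cascade_answer(
    "Explain our refund policy across three jurisdictions in detail",
    [small, large])
```

Most traffic is answered cheaply by the small model; only the hard tail pays the latency and cost of the large one.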
Q4. What is the difference between open-weight, open-source, and proprietary foundation models?
Proprietary foundation models (Claude, Titan, Nova, Jurassic, Cohere Command, Mistral Large) keep their weights closed — you call them only through Amazon Bedrock or the provider's API. Open-weight foundation models (Meta Llama, Stability AI Stable Diffusion) publish downloadable weights under custom licences that usually include usage restrictions. Open-source foundation models (Mistral 7B, Mixtral 8x7B under Apache 2.0) go further and publish weights under OSI-approved permissive licences. On AIF-C01, expect at least one question that separates these three categories — Meta Llama is the most common trap because many study materials incorrectly call it open-source.
Q5. Are my prompts and outputs used to train the base foundation model on Amazon Bedrock?
No. Amazon Bedrock documentation is explicit: inputs and outputs from foundation model invocations through the Amazon Bedrock API are not used by AWS or by the model providers to train or improve the underlying foundation models. This is a core trust property that matters for regulated industries. You can optionally enable model invocation logging (delivered to Amazon CloudWatch Logs or Amazon S3 in your own account) for your own auditability, but AWS does not harvest this data back into pretraining corpora.
Q6. Do I need to understand the transformer math for AIF-C01?
No. AIF-C01 is a foundational exam and asks for conceptual vocabulary, not mathematics. You need to know that the transformer uses an attention mechanism to relate tokens to each other, that context window length is bounded by the attention design, that parameters (billions of weights) are learned during pretraining, and that scale enables emergent capabilities. Derivations of self-attention, softmax, or positional encodings are AIP-C01 and MLS-C01 territory. Spend your study time on Amazon Bedrock model selection instead — that is what the exam rewards.
Q7. Is Amazon Bedrock the only way to use foundation models on AWS?
No. Amazon Bedrock is the managed, API-first entry point and is the first answer for most AIF-C01 scenarios. Amazon SageMaker JumpStart is the alternative — it lets you deploy foundation model weights (Llama, Mistral, Falcon, Stable Diffusion, and many more) onto Amazon SageMaker endpoints inside your own VPC, pay for instances rather than tokens, and perform deeper fine-tuning than Amazon Bedrock's managed jobs allow. Pick Amazon Bedrock when the scenario emphasises zero infrastructure and per-token pricing. Pick Amazon SageMaker JumpStart when the scenario emphasises VPC isolation, instance-level cost control, or aggressive customisation.
Summary — Foundation Models and LLMs Cheat Sheet
- A foundation model is large, pretrained, broad, and adaptable. An LLM is the text subtype.
- Amazon Bedrock catalogue: Anthropic Claude, Amazon Titan, Amazon Nova, Meta Llama, Mistral, AI21 Jurassic, Cohere Command, Stability AI Stable Diffusion.
- Modalities: text (LLM), image (Stable Diffusion, Titan Image Generator), multimodal (Claude Sonnet/Opus, Amazon Nova Pro/Premier), embeddings (Titan Embeddings, Cohere Embed).
- Four-axis tradeoff: capability, latency, cost, context window. The exam rewards the minimum viable foundation model, not the largest.
- Licensing: proprietary (closed), open-weight (downloadable with restrictions), open-source (Apache 2.0 or MIT). Llama is open-weight, not open-source.
- Intrinsic risks: hallucination, bias amplification, training-data memorisation — mitigated by RAG, guardrails, monitoring.
- Deployment options: Amazon Bedrock (API, per-token) or Amazon SageMaker JumpStart (endpoints, per-instance).
- AIF-C01 scope: recognise and select foundation models. AIP-C01 scope: architect, fine-tune, and operate foundation models.
Master this foundation models and LLMs cheat sheet and you have already banked a large chunk of the AIF-C01 Domain 2 and Domain 3 question budget. The remaining topics in this study track — tokens-context-window-temperature, fine-tuning-vs-in-context-learning, bedrock-model-selection, rag-retrieval-augmented-generation — all build on the foundation models and LLMs vocabulary you now hold.