
Amazon Bedrock — Model Selection and Pricing

6,850 words · ≈ 35 min read

Amazon Bedrock model selection is the single most tested design decision on the AWS Certified AI Practitioner (AIF-C01) exam. Task Statement 3.1 asks you to describe design considerations for applications that use foundation models (FMs), and almost every scenario funnels through the same question: which Amazon Bedrock model do you pick, and how do you serve it? This topic covers Amazon Bedrock end-to-end — what Amazon Bedrock is, the full Amazon Bedrock model catalog (Anthropic Claude, Amazon Titan, Amazon Nova, Meta Llama, Mistral, AI21 Jurassic and Jamba, Cohere, Stability AI), the axes of Amazon Bedrock model selection (capability, cost, latency, context window, multimodality, licensing), on-demand vs Provisioned Throughput on Amazon Bedrock, Cross-Region Inference in Amazon Bedrock, Amazon Bedrock Model Evaluation, Amazon Bedrock custom model import, Amazon Bedrock Agents, and Amazon Bedrock Knowledge Bases. If you can pick the right Amazon Bedrock model for a scenario and defend the Amazon Bedrock throughput choice, you will answer a large fraction of Domain 3 correctly.

This Amazon Bedrock model selection guide is written for the AIF-C01 foundational scope. Where Amazon Bedrock decisions go deeper into production engineering — Amazon Bedrock Agents orchestration code, fine-tuning job pipelines, custom guardrail plug-ins — that depth belongs to the AWS Certified Machine Learning Engineer — Associate (MLA-C01) and the future AWS Certified AI Practitioner Professional (AIP-C01) scopes. AIF-C01 asks you to choose, not build. Keep that frame, and every Amazon Bedrock question becomes tractable.

What is Amazon Bedrock?

Amazon Bedrock is a fully managed AWS service that exposes a catalog of foundation models from multiple providers behind a single API surface. You call one API, InvokeModel or Converse, and Amazon Bedrock routes the request to Anthropic Claude, Amazon Titan, Amazon Nova, Meta Llama, Mistral, AI21 Jurassic or Jamba, Cohere, or Stability AI depending on the modelId you specify. Amazon Bedrock is serverless — there are no GPUs for you to provision, no model servers to patch, and no scaling policies to tune. You pay per input token and per output token (on-demand) or by committing hourly capacity (Provisioned Throughput).
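The single-API-surface idea is easiest to see in code. Below is a minimal sketch (not an official example) of assembling a Converse request with boto3; the model ID is one documented example, and the live call is shown as a comment because it needs AWS credentials and model access.

```python
def build_messages(user_text):
    """Shape a single-turn conversation in the Converse API message format."""
    return [{"role": "user", "content": [{"text": user_text}]}]

def converse_request(model_id, user_text):
    """Assemble the keyword arguments for a bedrock-runtime Converse call.
    Swapping models means swapping only modelId - the rest is unchanged."""
    return {
        "modelId": model_id,
        "messages": build_messages(user_text),
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.5},
    }

# With credentials and model access configured, the live call is:
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   resp = client.converse(**converse_request(
#       "anthropic.claude-3-5-sonnet-20241022-v2:0", "Summarize RAG in one sentence."))
#   print(resp["output"]["message"]["content"][0]["text"])

req = converse_request("anthropic.claude-3-5-sonnet-20241022-v2:0",
                       "Summarize RAG in one sentence.")
print(req["modelId"])
```

Because every provider sits behind the same request shape, switching from Claude to Titan or Llama is a one-string change, which is exactly the serverless, multi-provider promise described above.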

Amazon Bedrock differs from Amazon SageMaker in one sentence: Amazon Bedrock is for calling somebody else's foundation model through an API, and Amazon SageMaker is for building, training, and deploying your own model end-to-end. The AIF-C01 exam loves this distinction, so keep the mental bookmark active. Within Amazon Bedrock itself, there are several value-added capabilities — Amazon Bedrock Agents for tool use, Amazon Bedrock Knowledge Bases for managed retrieval-augmented generation (RAG), Amazon Bedrock Guardrails for content safety, Amazon Bedrock Model Evaluation for objective benchmarking, and Amazon Bedrock custom model import for bringing your own weights.

Why Amazon Bedrock model selection matters for AIF-C01

AIF-C01 Domain 3 — Applications of Foundation Models — carries a 28 percent weighting on the exam. Amazon Bedrock is referenced in at least four of the six task statements under Domain 3 and appears again in Domain 2 (Task 2.3, AWS infrastructure for generative AI) and Domain 5 (Task 5.1, securing AI systems via Amazon Bedrock Guardrails). Every scenario that starts with "a company wants to add a chatbot using Anthropic Claude," "the team needs to generate images from text," or "a developer wants to ground model responses in internal documents without managing infrastructure" resolves to an Amazon Bedrock choice. Community pain points collected from AIF-C01 post-exam reports highlight three recurring traps: confusing Amazon Bedrock with Amazon SageMaker, confusing Amazon Bedrock Guardrails with AWS IAM, and misjudging when to pay for Provisioned Throughput on Amazon Bedrock instead of staying on-demand.

The AIF-C01 vs AIP-C01 scope note for Amazon Bedrock

AIF-C01 (AI Practitioner) tests recognition of Amazon Bedrock features and model selection reasoning. You must know which Amazon Bedrock model fits which use case, what Provisioned Throughput on Amazon Bedrock is, and why Amazon Bedrock Knowledge Bases exist. AIP-C01 (future professional tier) is expected to test deeper Amazon Bedrock Agents workflow design, fine-tuning job configuration, and production monitoring. If a practice question asks you to write the Amazon Bedrock Agents action group schema or tune a Provisioned Throughput commitment level, that is AIP-C01 territory, not AIF-C01.

Plain-Language Explanation of Amazon Bedrock Model Selection

Amazon Bedrock model selection sounds intimidating until you strip it down with analogies. Three different pictures cover the whole surface.

Analogy 1 — The restaurant menu

Imagine you walk into a multi-cuisine restaurant. Amazon Bedrock is the restaurant, and every foundation model on the Amazon Bedrock menu is a dish from a different chef.

  • Anthropic Claude is the French tasting menu — careful, precise, long conversations, strong reasoning.
  • Amazon Titan and Amazon Nova are the house specials — affordable, reliable, optimized for the restaurant's own supply chain.
  • Meta Llama is the open-source family recipe — you can see the ingredient list, and the kitchen is happy to let you take the recipe home (with licensing conditions).
  • Mistral is the European bistro plate — compact, efficient, good for bulk orders.
  • AI21 Jurassic and Jamba are the long-course banquet — massive context windows for extended documents.
  • Cohere is the enterprise catering plate — tuned for summarization, classification, and embeddings at scale.
  • Stability AI is the dessert station — text-to-image generation with Stable Diffusion.

You, the customer (developer), only hand the waiter (the Amazon Bedrock API) one instruction: "I want the Claude 3.5 Sonnet with a side of 200K context." The kitchen handles the rest. Provisioned Throughput on Amazon Bedrock is like buying a reserved table with a pre-paid tasting menu — the table is always ready, but you pay whether you show up or not. On-demand on Amazon Bedrock is walk-in seating — cheaper if you eat occasionally, slower if the restaurant is full.

This analogy makes one thing obvious: Amazon Bedrock does not cook its own foundation models from scratch for every dish. Amazon Bedrock is the restaurant management layer; the chefs (Anthropic, Meta, Mistral, Amazon, AI21, Cohere, Stability AI) are the foundation model providers. Your Amazon Bedrock model selection is really a chef selection.

Analogy 2 — The power grid

Think of Amazon Bedrock as the power grid for generative AI.

  • Each Amazon Bedrock foundation model provider is a different power plant — Anthropic runs one nuclear plant (high capability, steady output), Amazon runs a hydro plant (Titan and Nova, cheap and reliable), Meta runs a wind farm (Llama, open and variable), Mistral runs a solar array (compact and efficient).
  • The Amazon Bedrock API is the transmission network — you plug into one socket and receive electricity without knowing which plant generated it.
  • On-demand pricing on Amazon Bedrock is the residential electric bill — pay by the kilowatt-hour (token).
  • Provisioned Throughput on Amazon Bedrock is the industrial contract — commit to X megawatts per hour for 1 or 6 months and get a volume discount plus guaranteed capacity.
  • Cross-Region Inference on Amazon Bedrock is the national grid interconnection — if the local plant is at capacity, the grid can draw from a neighboring region transparently.
  • Amazon Bedrock Guardrails are the circuit breakers — they trip when dangerous current (harmful content) tries to flow through.
  • Amazon Bedrock custom model import is the private solar panel on your own roof feeding into the grid — you brought the weights, Amazon Bedrock hosts them.

The power grid analogy explains why Amazon Bedrock's value is not the models themselves but the integration: unified API, unified auth, unified logging, unified Guardrails, unified monitoring. For the same reason no one builds a power plant for a single house, teams pick Amazon Bedrock over self-hosting open-source models on EC2 GPUs.

Analogy 3 — The workshop toolbox

Imagine a carpenter's toolbox. The toolbox itself is Amazon Bedrock. Inside, every foundation model is a different tool with a different purpose.

  • Claude 3.5 Sonnet is the precision chisel — slow, expensive per cut, unmatched detail. Use it for coding assistance, long-form writing, complex reasoning.
  • Claude 3 Haiku is the utility knife — fast, cheap, handles 80 percent of everyday cuts. Use it for high-volume chat, summarization, classification.
  • Amazon Titan Text is the house-brand hammer — reliable, low cost, good enough for most general tasks.
  • Amazon Nova Micro, Lite, and Pro are a three-piece wrench set — different sizes for different jobs, all from the same manufacturer.
  • Meta Llama 3.1 70B is the power drill — high throughput, open-weight, customizable with attachments.
  • Mistral Large is the Swiss file — compact, efficient, strong on European languages and structured output.
  • AI21 Jamba 1.5 Large is the extended tape measure — context windows up to 256K tokens for reading long contracts.
  • Cohere Command R+ is the precision jigsaw — tuned for retrieval-augmented generation and multilingual text.
  • Stability AI Stable Diffusion is the paint sprayer — turns text into images.
  • Amazon Titan Embeddings and Cohere Embed are the measuring calipers — not for building, for measuring similarity (embeddings for vector search).

You do not reach for the precision chisel when a utility knife will do. You do not reach for the paint sprayer when you need text. Amazon Bedrock model selection is exactly this: match the tool to the job on dimensions of capability, cost, latency, context window, multimodality, and licensing.

Amazon Bedrock Model Catalog — Providers and Families

The Amazon Bedrock catalog is the centerpiece of Amazon Bedrock model selection. Below are the provider families you must recognize for AIF-C01. Exam questions typically describe a use case, quote a behavior (long context, image generation, cheap summarization, agentic tool use), and ask you to pick the best-matched Amazon Bedrock model family.

Anthropic Claude on Amazon Bedrock

Anthropic Claude is the highest-capability text foundation model family on Amazon Bedrock. The Claude family on Amazon Bedrock includes Claude 3.5 Sonnet (flagship reasoning and coding), Claude 3.5 Haiku (fast, cheap, intelligent enough for most production chat), Claude 3 Opus (deepest reasoning, highest cost), Claude 3 Sonnet (balanced), and Claude 3 Haiku (original fast tier). Claude on Amazon Bedrock supports up to 200K-token context windows, vision inputs (image plus text), and tool use for Amazon Bedrock Agents.

Claude shines on tasks that require careful instruction-following, nuanced reasoning, code generation, long-document summarization, and multi-turn conversation. On the AIF-C01 exam, scenarios that say "highest-quality assistant," "coding copilot," "complex reasoning over long documents," or "multi-turn agent with tool use" map to Claude on Amazon Bedrock.

Amazon Titan and Amazon Nova on Amazon Bedrock

Amazon Titan is the first-party foundation model family from AWS on Amazon Bedrock. Amazon Titan Text Express and Titan Text Lite handle general-purpose text generation at low cost. Amazon Titan Text Embeddings and Titan Text Embeddings V2 produce vector embeddings for semantic search and RAG. Amazon Titan Image Generator creates images from text. Amazon Titan Multimodal Embeddings encode text and images into the same vector space.

Amazon Nova is the newer AWS foundation model family on Amazon Bedrock, announced at re:Invent 2024. Amazon Nova Micro, Nova Lite, and Nova Pro are text and multimodal models tuned for low cost and low latency. Amazon Nova Canvas generates images, and Amazon Nova Reel generates short videos. On AIF-C01, Amazon Titan and Amazon Nova are the "AWS-native, cost-optimized" answer when a scenario emphasizes budget, compliance with AWS-only providers, or embeddings for Amazon Bedrock Knowledge Bases.

Meta Llama on Amazon Bedrock

Meta Llama models on Amazon Bedrock (Llama 3, Llama 3.1, Llama 3.2, Llama 3.3 at various parameter counts — 8B, 70B, 405B, plus vision variants) are open-weight foundation models served through the Amazon Bedrock API. Llama on Amazon Bedrock is attractive when teams want open licensing (Llama Community License), multilingual support, and large parameter counts. Llama 3.2 adds vision capability. On AIF-C01 the Llama answer fits scenarios emphasizing "open-source model via a managed API" or "bring my own Llama fine-tuning from SageMaker JumpStart back into Amazon Bedrock."

Mistral on Amazon Bedrock

Mistral AI models on Amazon Bedrock include Mistral 7B, Mixtral 8x7B, Mistral Large, Mistral Large 2, and Mistral Small. Mistral on Amazon Bedrock is known for efficient parameter counts, strong structured-output behavior, strong European-language coverage, and competitive cost. Pick Mistral on Amazon Bedrock when the scenario says "efficient small model," "structured JSON output," or "European-language processing."

AI21 Labs Jurassic and Jamba on Amazon Bedrock

AI21 Labs on Amazon Bedrock offers the Jurassic-2 family (instruction-tuned text generation) and the newer Jamba 1.5 Mini and Jamba 1.5 Large. Jamba uses a hybrid Transformer-Mamba architecture and supports context windows up to 256K tokens — one of the largest on Amazon Bedrock. Pick AI21 Jamba on Amazon Bedrock when the scenario emphasizes "very long context," "contract analysis," or "entire codebase reasoning."

Cohere on Amazon Bedrock

Cohere on Amazon Bedrock includes Command, Command Light, Command R, and Command R+ for text generation, plus Cohere Embed English and Cohere Embed Multilingual for embeddings. Cohere Command R+ on Amazon Bedrock is explicitly tuned for RAG and tool use. Pick Cohere on Amazon Bedrock when the scenario says "enterprise retrieval-augmented generation," "multilingual embeddings for Amazon Bedrock Knowledge Bases," or "summarization at scale."

Stability AI on Amazon Bedrock

Stability AI on Amazon Bedrock provides Stable Diffusion (SD3, SDXL, SD3.5 Large) for text-to-image generation and image-to-image transformations. Pick Stability AI on Amazon Bedrock when the use case is image generation, marketing creative automation, or synthetic image data.

An Amazon Bedrock model ID is the string you pass to the Amazon Bedrock API to select a specific foundation model and version, such as anthropic.claude-3-5-sonnet-20241022-v2:0 or amazon.titan-text-express-v1. The Amazon Bedrock model ID encodes provider, model family, variant, date, and version — choose it carefully because model versions on Amazon Bedrock are not silently upgraded under you.

Amazon Bedrock Model Selection Axes — How to Choose

Every Amazon Bedrock model selection decision lives on six axes. Memorize them; AIF-C01 scenarios are almost always disguised axis questions.

Axis 1 — Capability

Does the model reason well? Does it follow complex instructions? Does it write code at senior-engineer quality? Amazon Bedrock models sort roughly as Claude 3.5 Sonnet / Claude 3 Opus at the top of general reasoning, followed by Llama 3.1 405B, Mistral Large 2, Nova Pro, Command R+, Jamba 1.5 Large. For multimodal vision, Claude 3.5 Sonnet, Nova Pro, and Llama 3.2 lead on Amazon Bedrock. For image generation, Stability AI Stable Diffusion and Amazon Titan Image Generator / Nova Canvas are the primary Amazon Bedrock options.

Axis 2 — Cost

Amazon Bedrock on-demand pricing is quoted per 1,000 input tokens and per 1,000 output tokens, and output tokens cost more than input tokens. Within a family, smaller models cost one-fifth to one-twentieth as much as flagship models. On Amazon Bedrock, Claude 3.5 Haiku runs roughly an order of magnitude cheaper than Claude 3.5 Sonnet, and Nova Micro is cheaper again. For high-volume, low-complexity tasks (classification, short summarization, entity extraction), a cheaper Amazon Bedrock model typically wins on unit economics.
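The per-1,000-token arithmetic is worth doing once by hand. The sketch below uses illustrative placeholder rates, not published Amazon Bedrock prices — always check the current pricing page — but the billing shape (input and output metered separately) is as described above.

```python
def on_demand_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """On-demand charge: tokens are billed per 1,000, input and output separately."""
    return (input_tokens / 1000) * price_in_per_1k \
         + (output_tokens / 1000) * price_out_per_1k

# Hypothetical rates for a flagship vs a small model (assumed numbers,
# chosen only to show the roughly order-of-magnitude gap):
flagship = on_demand_cost(2000, 500, price_in_per_1k=0.003,  price_out_per_1k=0.015)
small    = on_demand_cost(2000, 500, price_in_per_1k=0.0003, price_out_per_1k=0.0015)

print(f"flagship request: ${flagship:.4f}")  # $0.0135
print(f"small request:    ${small:.4f}")     # $0.0014 - ~10x cheaper per call
```

Multiplied across millions of daily classification or summarization calls, that gap is why the cheapest capable model, not the most capable model, usually wins Axis 2.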

Axis 3 — Latency

Time-to-first-token and tokens-per-second vary across Amazon Bedrock models. Smaller models (Haiku, Nova Micro, Mistral Small) emit faster for interactive UX. Larger models (Opus, Llama 405B) take longer. For real-time chat, pick the smallest Amazon Bedrock model that meets capability, not the largest. For batch summarization, latency matters less and capability dominates.

Axis 4 — Context Window

Context window is the token ceiling for the combined prompt plus conversation history plus retrieved context. Amazon Bedrock models range from 4K (some older Titan and Jurassic) up to 200K (Claude 3 and 3.5) and 256K (Jamba 1.5 Large). If the scenario needs to reason over a 300-page contract, an Amazon Bedrock model with at least 200K-token context is required — or combine a smaller-context model with Amazon Bedrock Knowledge Bases (RAG) to retrieve only the relevant passages.

Axis 5 — Multimodality

Not every Amazon Bedrock model accepts images. Claude 3, Claude 3.5, Nova Lite, Nova Pro, and Llama 3.2 Vision take image plus text inputs on Amazon Bedrock. If the scenario says "analyze this screenshot" or "describe a product photo," a vision-capable Amazon Bedrock model is required. For image output (text-to-image), Stable Diffusion, Titan Image Generator, and Nova Canvas are the Amazon Bedrock answers.

Axis 6 — Licensing and Data Policy

All Amazon Bedrock model providers guarantee that customer prompts and completions are not used to train the base foundation models — this is a consistent Amazon Bedrock promise. Beyond that, licensing differs: Anthropic Claude, Amazon Titan, and Amazon Nova are proprietary; Meta Llama carries the Llama Community License; Mistral has Apache 2.0 on some variants and commercial license on Large. For AIF-C01 you do not need to recite license text, but you should know that "customer data never trains base models" is the default on Amazon Bedrock.

On AIF-C01, always ask the six questions in order: What capability tier do I need? What is my per-token budget? What latency does the user see? How long is my context? Do I need images in or out? Any licensing constraint? The answers narrow the Amazon Bedrock catalog to one or two candidates, and the scenario's adjectives (highest-quality, cheapest, fastest, longest-context, vision, open-source) pick the final Amazon Bedrock model.
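The narrowing process above can be sketched as a filter over a toy catalog. The entries and attribute values below are simplified assumptions for illustration (capability tiers and flags are coarse), not an authoritative feature matrix.

```python
# Toy catalog: hard axes only (context window, modality). Capability, cost,
# and latency then rank whatever survives the filter.
CATALOG = [
    {"model": "claude-3-5-sonnet", "tier": "flagship", "context": 200_000,
     "vision_in": True,  "image_out": False},
    {"model": "claude-3-5-haiku",  "tier": "small",    "context": 200_000,
     "vision_in": True,  "image_out": False},
    {"model": "jamba-1-5-large",   "tier": "flagship", "context": 256_000,
     "vision_in": False, "image_out": False},
    {"model": "nova-canvas",       "tier": "small",    "context": 0,
     "vision_in": False, "image_out": True},
]

def shortlist(catalog, min_context=0, need_vision_in=False, need_image_out=False):
    """Keep only models that satisfy every hard requirement."""
    return [
        m["model"] for m in catalog
        if m["context"] >= min_context
        and (m["vision_in"] or not need_vision_in)
        and (m["image_out"] or not need_image_out)
    ]

print(shortlist(CATALOG, min_context=250_000))   # long-contract scenario -> Jamba
print(shortlist(CATALOG, need_image_out=True))   # text-to-image scenario -> Canvas
```

The exam works the same way: the scenario's hard constraints eliminate most of the catalog, and the adjectives pick among what is left.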

Amazon Bedrock Pricing Modes — On-Demand vs Provisioned Throughput

Amazon Bedrock offers three ways to pay: On-Demand, Batch, and Provisioned Throughput. The AIF-C01 exam tests the choice between On-Demand and Provisioned Throughput most often.

On-Demand on Amazon Bedrock

On-Demand on Amazon Bedrock charges per 1,000 input tokens and per 1,000 output tokens consumed. There is no commitment, no minimum spend, and capacity is pooled across all Amazon Bedrock customers. On-Demand is the right Amazon Bedrock pricing mode for variable, unpredictable, or spiky workloads. Development environments, proofs of concept, and most production chat applications start and stay on On-Demand on Amazon Bedrock.

Pros: no commitment, pay for exactly what you use, easy to scale down to zero. Cons: shared capacity means no throughput guarantee during peak hours; some advanced features (like custom fine-tuned models) require Provisioned Throughput on Amazon Bedrock.

Batch on Amazon Bedrock

Batch inference on Amazon Bedrock lets you submit a file of prompts to Amazon S3, Amazon Bedrock processes them asynchronously (hours, not seconds), and results land back in Amazon S3. Batch on Amazon Bedrock costs roughly half the On-Demand price. Use Batch on Amazon Bedrock for overnight document summarization, nightly content moderation, or bulk embeddings generation — anywhere latency does not matter.

Provisioned Throughput on Amazon Bedrock

Provisioned Throughput on Amazon Bedrock (PT) reserves dedicated capacity for a specific model at a fixed hourly rate. You commit to a term — No Commitment, 1 Month, or 6 Month — and buy "model units" (MUs), where each MU grants a guaranteed number of input and output tokens per minute. Provisioned Throughput on Amazon Bedrock is required for:

  • Custom fine-tuned models on Amazon Bedrock.
  • Custom imported models on Amazon Bedrock.
  • Production workloads that demand guaranteed latency or throughput SLAs.
  • Very high steady-state volume that exceeds the On-Demand cost-equivalence threshold.

The cost-equivalence threshold is the break-even point: if the on-demand cost of your steady hourly token volume exceeds the PT hourly rate, PT becomes cheaper. Below that threshold, On-Demand wins.
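That break-even comparison is simple arithmetic. The rates below are assumed placeholders (neither the per-token prices nor the PT hourly rate are published numbers from this guide); the point is the shape of the comparison, not the figures.

```python
def on_demand_hourly(tokens_in_per_hour, tokens_out_per_hour,
                     price_in_per_1k, price_out_per_1k):
    """What the same steady hourly volume would cost at on-demand rates."""
    return (tokens_in_per_hour / 1000) * price_in_per_1k \
         + (tokens_out_per_hour / 1000) * price_out_per_1k

def cheaper_mode(tokens_in_per_hour, tokens_out_per_hour,
                 price_in_per_1k, price_out_per_1k, pt_hourly_rate):
    """PT wins only when equivalent on-demand spend exceeds the PT hourly rate."""
    od = on_demand_hourly(tokens_in_per_hour, tokens_out_per_hour,
                          price_in_per_1k, price_out_per_1k)
    return "provisioned-throughput" if od > pt_hourly_rate else "on-demand"

# Hypothetical: 10M input + 2M output tokens every hour vs a $50/hour PT rate.
# On-demand equivalent = 10,000 * $0.003 + 2,000 * $0.015 = $60/hour > $50.
print(cheaper_mode(10_000_000, 2_000_000, 0.003, 0.015, pt_hourly_rate=50.0))
```

Note the word steady: the PT rate is charged every hour whether traffic arrives or not, so spiky workloads must average their volume over the whole commitment, which is why they usually fall under the threshold.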

A common AIF-C01 trap: assuming Provisioned Throughput on Amazon Bedrock gives you unlimited capacity. It does not. Each model unit on Amazon Bedrock has a fixed tokens-per-minute cap. Overshoot the cap and requests get throttled. Provisioned Throughput on Amazon Bedrock is about reserved baseline capacity, not elasticity. If traffic is spiky and unpredictable, On-Demand on Amazon Bedrock or Cross-Region Inference on Amazon Bedrock is the right answer, not Provisioned Throughput.

Rule of thumb for AIF-C01: stay On-Demand on Amazon Bedrock unless one of three things is true. First, you have deployed a custom fine-tuned or imported model on Amazon Bedrock — PT is mandatory. Second, you need predictable latency for a production SLA and On-Demand throttling is hurting you. Third, your steady-state token volume is high enough that the hourly PT rate undercuts equivalent On-Demand spend. Otherwise, do not buy PT on Amazon Bedrock.

Cross-Region Inference on Amazon Bedrock

Cross-Region Inference on Amazon Bedrock is a routing feature that lets the Amazon Bedrock runtime automatically serve a request from a different AWS Region when the home Region is at capacity, without rewriting your application code. You opt in by swapping the plain modelId for a Cross-Region Inference profile ID — an inference profile that aggregates multiple Regions.

Cross-Region Inference on Amazon Bedrock improves availability during traffic spikes, reduces throttling, and can offer better per-token throughput. Data remains within a geographic boundary group (for example, US inference profiles route across US Regions only), which is important for residency considerations. For AIF-C01, remember: Cross-Region Inference on Amazon Bedrock is the answer when the scenario says "spiky traffic, no capacity commitment, want to reduce throttling." It is not the answer for data residency problems that require strict single-Region confinement.
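Opting in really is a one-string change. The sketch below follows the documented geography-prefix convention for inference profile IDs (for example a "us." prefix for US-Region profiles); the specific profile ID shown is an example — list the profiles available in your account for real IDs.

```python
def to_inference_profile(model_id, geo_prefix="us"):
    """Prepend the geography prefix that turns a plain model ID into a
    cross-region inference profile ID (e.g. 'us.', 'eu.')."""
    return f"{geo_prefix}.{model_id}"

single_region = "anthropic.claude-3-5-sonnet-20241022-v2:0"
cross_region = to_inference_profile(single_region)
print(cross_region)  # us.anthropic.claude-3-5-sonnet-20241022-v2:0

# Everything else in the call is unchanged, e.g.:
#   client.converse(modelId=cross_region, messages=...)
```

Because the swap is only in the identifier, turning Cross-Region Inference on or off requires no code-path changes — which is why it is the low-friction answer for spiky traffic.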

Cross-Region Inference on Amazon Bedrock is designed to smooth short-term capacity shortages and improve throughput availability. It is not a disaster-recovery failover mechanism and it is not a latency-reduction feature for distant users. AIF-C01 scenarios that say "reduce throttling during peak hours" point to Cross-Region Inference. Scenarios that say "active-active DR across Regions" or "user in Tokyo should hit a Tokyo endpoint" are different problems.

Amazon Bedrock Model Evaluation

Amazon Bedrock Model Evaluation is the built-in job type that benchmarks Amazon Bedrock models for a specific task using either automated metrics or human reviewers. You upload a prompt dataset (with expected outputs for ground-truth evaluation, or without for preference comparison), pick one or more Amazon Bedrock models, choose a task type — General Text, Text Summarization, Question & Answer, or Text Classification — and Amazon Bedrock runs the evaluation job, produces metric reports, and stores results in Amazon S3.

Automated evaluation on Amazon Bedrock

Automated Amazon Bedrock Model Evaluation computes task-specific metrics: accuracy (for classification), F1 (for QA), BLEU and ROUGE (for summarization and translation), BERTScore (for semantic similarity), and robustness/toxicity scores. No humans needed — results are ready in minutes to hours depending on dataset size.
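To make two of those metrics concrete, here is a toy illustration of accuracy (classification) and a simplified token-overlap F1 (QA). Real Amazon Bedrock Model Evaluation jobs compute these for you and use more careful tokenization; this sketch only shows what the reported numbers mean.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the ground-truth labels."""
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

def token_f1(prediction, reference):
    """Simplified QA F1: harmonic mean of token precision and recall,
    using unique lowercase tokens (a toy version of the real metric)."""
    pred, ref = set(prediction.lower().split()), set(reference.lower().split())
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(accuracy(["pos", "neg", "pos"], ["pos", "neg", "neg"]), 2))      # 0.67
print(round(token_f1("the capital is Paris",
                     "Paris is the capital of France"), 2))                  # 0.8
```

BLEU, ROUGE, and BERTScore follow the same pattern at higher sophistication: compare model output to a reference and emit a 0-to-1 score, which is what makes automated evaluation fast enough to run without human reviewers.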

Human evaluation on Amazon Bedrock

Human evaluation on Amazon Bedrock routes model outputs to your own team (bring-your-own workforce) or to an AWS-managed workforce. Reviewers rate responses on dimensions you define — relevance, coherence, harmlessness, style match, factual accuracy — via thumbs-up/down, Likert scales, or ranked comparisons.

Retrieval-augmented evaluation on Amazon Bedrock

Amazon Bedrock Model Evaluation also supports RAG evaluation: given an Amazon Bedrock Knowledge Base and a prompt dataset, it measures retrieval quality (context relevance, precision, recall) alongside generation quality (answer faithfulness to retrieved context).

For AIF-C01, recognize Amazon Bedrock Model Evaluation as the managed, comparison-friendly answer when a scenario asks "how do we objectively decide between Claude 3.5 Sonnet and Llama 3.1 70B for our use case."

Amazon Bedrock Custom Model Import

Amazon Bedrock custom model import lets you bring an externally trained, open-weight model into Amazon Bedrock and call it through the same Amazon Bedrock API surface as the built-in catalog. Supported architectures include Llama, Mistral, Mixtral, Flan, and compatible variants; you upload the weights to Amazon S3, Amazon Bedrock registers and hosts the model, and you invoke it via modelId the same way as any first-party Amazon Bedrock model.

Custom imported models on Amazon Bedrock always run on Provisioned Throughput — there is no on-demand capacity pool for arbitrary imported weights. This is a common AIF-C01 trap: if a scenario says "we fine-tuned Llama on our GPUs and now want to serve it via Amazon Bedrock," the correct answer includes Amazon Bedrock custom model import plus Provisioned Throughput.

For AIF-C01: any custom fine-tuned or custom imported model on Amazon Bedrock requires Provisioned Throughput. On-demand pricing applies only to the Amazon Bedrock first-party catalog models. If the scenario says "serve our fine-tuned Llama from Amazon Bedrock cheaply with on-demand pricing," that is impossible — the correct answer is Provisioned Throughput on Amazon Bedrock or hosting on Amazon SageMaker endpoints instead.

Amazon Bedrock Agents

Amazon Bedrock Agents is the orchestration feature that turns a foundation model on Amazon Bedrock into a tool-using, multi-step agent. An Amazon Bedrock Agent is configured with:

  • A foundation model (typically Claude on Amazon Bedrock) as the reasoning brain.
  • One or more Action Groups — OpenAPI schemas or AWS Lambda functions the agent can call.
  • Optional Amazon Bedrock Knowledge Bases attached for retrieval.
  • Instruction prompt defining the agent's role, goals, and constraints.

When a user sends a request, the Amazon Bedrock Agent plans the steps, picks actions, calls Lambdas or Knowledge Bases, observes results, and iterates until the task completes. The whole orchestration loop runs inside Amazon Bedrock — you do not write the plan-act-observe loop yourself, which is the key difference from rolling your own agent framework.
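From the client's perspective, all of that orchestration collapses into one InvokeAgent call. The sketch below uses placeholder agent and alias IDs; the live call is shown as a comment because it needs credentials and a deployed agent, and the response-stream handling follows the bedrock-agent-runtime event-stream shape.

```python
def agent_request(agent_id, alias_id, session_id, user_text):
    """Assemble keyword arguments for bedrock-agent-runtime InvokeAgent.
    The plan-act-observe loop runs server-side inside Amazon Bedrock."""
    return {
        "agentId": agent_id,
        "agentAliasId": alias_id,
        "sessionId": session_id,  # reuse across turns to keep conversation state
        "inputText": user_text,
    }

# With credentials and a deployed agent (IDs below are placeholders):
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   resp = client.invoke_agent(**agent_request(
#       "AGENT123", "ALIAS456", "sess-1", "Where is order 42?"))
#   for event in resp["completion"]:          # reply arrives as an event stream
#       if "chunk" in event:
#           print(event["chunk"]["bytes"].decode())

print(agent_request("AGENT123", "ALIAS456", "sess-1", "Where is order 42?")["sessionId"])
```

Note what is absent: no planning prompt, no tool-dispatch loop, no retry logic. That is the managed-orchestration point the paragraph above makes.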

Typical Amazon Bedrock Agents use cases on AIF-C01: customer support bots that can look up order status via an API, internal ops assistants that can query databases through Lambda, coding assistants that can execute functions. If the scenario mentions "multi-step task," "agent calls an API," or "orchestrate tool use," Amazon Bedrock Agents is the answer.

Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases is the managed retrieval-augmented generation (RAG) pipeline on Amazon Bedrock. You connect a data source (Amazon S3, Confluence, Salesforce, SharePoint, web crawl), Amazon Bedrock Knowledge Bases chunks the documents, generates embeddings using an embedding model (Amazon Titan Text Embeddings or Cohere Embed), stores vectors in a supported vector store (Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL with pgvector, Pinecone, Redis Enterprise Cloud, MongoDB Atlas), and exposes a RetrieveAndGenerate API that does retrieval plus generation in a single call.

Amazon Bedrock Knowledge Bases removes the operational burden of building your own RAG stack: no custom chunking code, no embedding orchestration, no vector DB provisioning logic. You pay for underlying services (OpenSearch Serverless OCUs, embedding tokens on Amazon Bedrock, generation tokens on Amazon Bedrock) plus standard Amazon Bedrock pricing. For AIF-C01, recognize Amazon Bedrock Knowledge Bases as the answer whenever the scenario says "ground model responses in our private documents with minimal infrastructure."

RetrieveAndGenerate is the single Amazon Bedrock API call that takes a user query, retrieves relevant passages from an Amazon Bedrock Knowledge Base, injects them into the prompt, and invokes the chosen Amazon Bedrock foundation model to produce a grounded answer with citations. It collapses what would otherwise be a five-step pipeline (embed query → search vector store → rerank → assemble prompt → invoke model) into one managed call on Amazon Bedrock.
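A request-body sketch makes the "one managed call" claim concrete. The knowledge base ID and model ARN below are placeholders; the nested shape follows the bedrock-agent-runtime RetrieveAndGenerate API, with the live call left as a comment.

```python
def rag_request(query, kb_id, model_arn):
    """Assemble a RetrieveAndGenerate request: one query in, one grounded
    answer (with citations) out - retrieval and generation are both managed."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,      # placeholder ID
                "modelArn": model_arn,         # generation model, as an ARN
            },
        },
    }

# With credentials and a populated Knowledge Base:
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   resp = client.retrieve_and_generate(**rag_request(
#       "What is our refund policy?", "KB123456",
#       "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"))
#   print(resp["output"]["text"])   # grounded answer
#   print(resp["citations"])        # passages that support it

req = rag_request("What is our refund policy?", "KB123456", "arn:aws:bedrock:...")
print(req["retrieveAndGenerateConfiguration"]["type"])
```

Everything the five-step pipeline would require — embedding the query, searching the vector store, assembling the prompt — happens behind this one call.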

Amazon Bedrock Guardrails

Amazon Bedrock Guardrails apply programmable content-safety policies to both prompts and model completions. A Guardrail on Amazon Bedrock can block or redact:

  • Harmful content categories (hate, violence, sexual, insults, misconduct) at configurable severity levels.
  • Denied topics (e.g., "no financial advice," "no medical diagnosis") defined by name and sample phrases.
  • Specific words and phrases (profanity list, competitor names).
  • Personally Identifiable Information (PII) like SSNs, phone numbers, emails — either blocked or redacted.
  • Ungrounded responses (via the contextual grounding check), which flags completions that are not supported by the retrieved Amazon Bedrock Knowledge Base context.

Amazon Bedrock Guardrails attach to an InvokeModel call via guardrailIdentifier, and they work across all supported Amazon Bedrock text models. Critically, Amazon Bedrock Guardrails are distinct from AWS IAM — Guardrails govern content safety, IAM governs who can call Amazon Bedrock at all. Conflating them is a top AIF-C01 trap.

AIF-C01 frequently offers distractor answers that propose AWS IAM policies to "filter out hate speech" or "redact PII from model output." That is not what IAM does. AWS IAM on Amazon Bedrock controls whether a principal can call bedrock:InvokeModel, which models they can target, and which Guardrail IDs they can attach — it is access control. Content-safety filtering on Amazon Bedrock is the job of Amazon Bedrock Guardrails. If the scenario is about content filtering or PII redaction at inference time, pick Amazon Bedrock Guardrails; if it is about who can invoke which model, pick AWS IAM.
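Attaching a Guardrail to a call is a small configuration block, which is worth seeing once. The guardrail ID below is a placeholder created ahead of time in the console or API; the request shape follows the Converse API's guardrailConfig field.

```python
def guarded_converse_request(model_id, user_text, guardrail_id, guardrail_version="1"):
    """A Converse request with a Guardrail attached. The Guardrail screens
    both the prompt and the completion; IAM only decides whether this call
    is allowed to happen at all."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "guardrailConfig": {
            "guardrailIdentifier": guardrail_id,   # placeholder ID
            "guardrailVersion": guardrail_version,
        },
    }

# With credentials configured:
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   resp = client.converse(**guarded_converse_request(
#       "anthropic.claude-3-5-haiku-20241022-v1:0", "hello", "gr-abc123"))
# A blocked prompt returns the Guardrail's configured canned response rather
# than a model completion - the request itself still succeeds at the IAM layer.

req = guarded_converse_request("anthropic.claude-3-5-haiku-20241022-v1:0",
                               "hello", "gr-abc123")
print(req["guardrailConfig"]["guardrailIdentifier"])
```

The comment at the bottom is the exam distinction in miniature: IAM decides whether the call happens, the Guardrail decides what content passes through it.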

Amazon Bedrock vs Amazon SageMaker

The Amazon Bedrock vs Amazon SageMaker decision is the most-asked AIF-C01 question in Domain 3. Internalize this table.

  • Primary purpose — Amazon Bedrock: call pre-built foundation models via API. Amazon SageMaker: build, train, and deploy your own model.
  • Infrastructure — Amazon Bedrock: fully serverless. Amazon SageMaker: managed but visible instances.
  • Typical user — Amazon Bedrock: application developer. Amazon SageMaker: data scientist or ML engineer.
  • Pricing unit — Amazon Bedrock: per token, or per hour per model unit. Amazon SageMaker: per instance-hour.
  • Customization — Amazon Bedrock: prompt engineering, fine-tuning, RAG. Amazon SageMaker: full algorithm choice, custom training code.
  • Model catalog — Amazon Bedrock: Anthropic, Amazon, Meta, Mistral, AI21, Cohere, Stability AI. Amazon SageMaker: any algorithm you can code, plus JumpStart models.
  • Operational effort — Amazon Bedrock: minimal. Amazon SageMaker: significant (data prep, tuning, monitoring).
  • Best-fit scenarios — Amazon Bedrock: GenAI apps, chatbots, RAG, agents. Amazon SageMaker: custom fraud models, forecasting, domain-specific CV.

The core distinction is the same rule tested on CLF-C02: Amazon Bedrock calls somebody else's foundation model; Amazon SageMaker builds your own. Amazon SageMaker JumpStart blurs the line slightly by hosting pretrained foundation models you can fine-tune and deploy to SageMaker endpoints, but the AIF-C01 exam generally keeps the boundary clean — calling a foundation model via a single API means Amazon Bedrock.

Amazon Bedrock vs Amazon Q

Amazon Q is the packaged, end-user-facing generative AI assistant built on top of Amazon Bedrock. Amazon Q Business connects to enterprise content and answers employee questions with citations; Amazon Q Developer sits inside IDEs and the AWS Console to help write and debug code; Amazon Q in QuickSight produces BI narratives; Amazon Q in Connect helps contact-center agents. Amazon Q is what non-developers use. Amazon Bedrock is what developers call from code.

On AIF-C01, questions phrased as "business users want a chat interface over internal documents without writing code" point to Amazon Q Business, which uses Amazon Bedrock under the hood. Questions phrased as "developers want API access to Claude or Llama to build a custom generative AI app" point to Amazon Bedrock directly.

Amazon Bedrock Integration with Other AWS Services

Amazon Bedrock is designed to compose cleanly with the rest of AWS.

  • Amazon S3 — source for Amazon Bedrock Knowledge Bases documents, Batch inference input/output, custom model import weights.
  • AWS Lambda — invoked by Amazon Bedrock Agents as Action Group handlers; also a common client that calls Amazon Bedrock from serverless apps.
  • Amazon OpenSearch Serverless — default vector store for Amazon Bedrock Knowledge Bases.
  • Amazon Aurora PostgreSQL (pgvector) — alternative vector store for Amazon Bedrock Knowledge Bases.
  • AWS IAM — controls bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream, bedrock-agent:InvokeAgent, and Guardrail attach/detach permissions.
  • Amazon CloudWatch — metrics for Amazon Bedrock invocations, latencies, throttles.
  • AWS CloudTrail — audit log of every Amazon Bedrock API call.
  • AWS KMS — customer-managed keys for encrypting custom model artifacts, fine-tuning datasets, and Knowledge Base storage on Amazon Bedrock.
  • VPC Endpoints (AWS PrivateLink) — private connectivity so Amazon Bedrock calls never traverse the public internet.
  • Amazon SageMaker — source for fine-tuned models that are then imported into Amazon Bedrock, or alternative hosting target if Amazon Bedrock is not suitable.
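
Of these integrations, Lambda-as-client is the one most worth internalizing. The sketch below shows the shape of a Lambda handler that calls Amazon Bedrock; the model ID is a real-style Nova identifier but should be treated as illustrative, and the boto3 call is commented out so the sketch stays self-contained without AWS credentials.

```python
# Sketch: an AWS Lambda handler that calls Amazon Bedrock from a
# serverless app. The event shape and model ID are assumptions.

def build_payload(event):
    """Turn an incoming Lambda event into a Converse-style request."""
    prompt = event.get("prompt", "")
    return {
        "modelId": "amazon.nova-lite-v1:0",  # example AWS-native model ID
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

def handler(event, context):
    payload = build_payload(event)
    # import boto3
    # client = boto3.client("bedrock-runtime")
    # result = client.converse(**payload)
    # return result["output"]["message"]["content"][0]["text"]
    return payload  # returned here so the sketch runs without AWS access

out = handler({"prompt": "Hello"}, None)
```

In production the handler's execution role would need bedrock:InvokeModel permission, and the call would ideally traverse a PrivateLink VPC endpoint rather than the public internet.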

Common Amazon Bedrock Exam Traps

  • Amazon Bedrock vs Amazon SageMaker — if the scenario mentions training, notebooks, or hyperparameters, it is Amazon SageMaker. If it mentions foundation models, Claude, Titan, Nova, Llama, or generative AI API, it is Amazon Bedrock.
  • Amazon Bedrock Guardrails vs AWS IAM — Guardrails filter content; IAM controls who can call the API. Do not cross them.
  • Provisioned Throughput on Amazon Bedrock vs on-demand — on-demand is default and cheapest for variable load. Provisioned Throughput is mandatory for custom imported models on Amazon Bedrock and for guaranteed SLAs, not for casual scaling.
  • Cross-Region Inference on Amazon Bedrock — it is a capacity-smoothing feature, not a DR pattern, not a latency reducer for distant users.
  • Amazon Bedrock Knowledge Bases vs building your own RAG — if the scenario says "minimal infrastructure," "managed RAG," or "S3 plus vector store plus generation in one call," the answer is Amazon Bedrock Knowledge Bases.
  • Amazon Bedrock Agents vs AWS Step Functions — Step Functions orchestrates deterministic workflows. Amazon Bedrock Agents orchestrates LLM-driven, non-deterministic tool use. Very different patterns.
  • Amazon Bedrock Model Evaluation vs human testing — if the question emphasizes "objective metric comparison between foundation models," pick Amazon Bedrock Model Evaluation. Do not invent a custom evaluation pipeline when a managed one exists.
  • Amazon Bedrock custom model import vs fine-tuning on Amazon Bedrock — fine-tuning on Amazon Bedrock takes a supported base model and trains it further. Custom model import takes externally trained weights (from outside Amazon Bedrock) and hosts them. Both require Provisioned Throughput at serving time.
  • Amazon Titan vs Amazon Nova — both are AWS-native foundation models on Amazon Bedrock. Titan is the original family; Nova is the newer, broader family covering text, multimodal, image, and video. When a 2025-era scenario asks for "the newest AWS-native multimodal model," Nova is usually the intended answer.
  • Vision capability on Amazon Bedrock — Claude 3 and 3.5, Nova Lite and Pro, Llama 3.2 Vision accept images. Titan Text Express and Jurassic-2 do not. Mismatches here sink answers.

Must-Memorize Numbers and Facts for Amazon Bedrock

  • Amazon Bedrock On-Demand pricing is billed per 1,000 tokens; output tokens cost more than input tokens on every model.
  • Amazon Bedrock Provisioned Throughput terms are No Commitment, 1 Month, and 6 Month. Longer commitments have deeper discounts.
  • Claude 3 and 3.5 on Amazon Bedrock support up to 200K-token context.
  • Jamba 1.5 Large on Amazon Bedrock supports up to 256K-token context — the largest in the public Amazon Bedrock catalog at the time of writing.
  • Amazon Bedrock Guardrails can be attached at invocation time or set as a default for a model.
  • Amazon Bedrock Knowledge Bases supports Amazon OpenSearch Serverless, Aurora PostgreSQL (pgvector), Pinecone, Redis Enterprise Cloud, and MongoDB Atlas as vector stores.
  • Amazon Bedrock Agents supports OpenAPI 3.0 schemas and AWS Lambda as action backends.
  • Customer prompts and completions on Amazon Bedrock are never used to train base foundation models.
  • Amazon Bedrock custom model import requires Provisioned Throughput for inference.
  • Cross-Region Inference on Amazon Bedrock stays within a geographic boundary group (US, EU, APAC).
  • Amazon Bedrock Batch inference is roughly 50 percent of on-demand per-token pricing.
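
The first and last facts in this list combine into a simple cost calculation worth being able to do on sight. The rates below are hypothetical placeholders, not published Amazon Bedrock prices; only the per-1,000-token structure and the roughly 50 percent Batch discount come from the facts above.

```python
# Worked example of the per-1,000-token pricing model. Rates are
# hypothetical; only the billing structure mirrors Amazon Bedrock.

def on_demand_cost(input_tokens, output_tokens, in_rate_per_1k, out_rate_per_1k):
    """On-Demand: input and output tokens are billed at different rates."""
    return (input_tokens / 1000) * in_rate_per_1k \
         + (output_tokens / 1000) * out_rate_per_1k

# 1M input tokens and 200K output tokens at illustrative rates
on_demand = on_demand_cost(1_000_000, 200_000,
                           in_rate_per_1k=0.003, out_rate_per_1k=0.015)
batch = on_demand * 0.5  # Batch inference ≈ 50% of On-Demand per-token pricing

print(f"On-Demand: ${on_demand:.2f}, Batch: ${batch:.2f}")
# → On-Demand: $6.00, Batch: $3.00
```

Notice that the 200K output tokens cost as much as the 1M input tokens — the output-tokens-cost-more rule dominates for generation-heavy workloads.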

Practice Scenarios — AIF-C01 Amazon Bedrock Mapping

Scenario 1: A startup wants to add a coding assistant to its developer tools, needs the highest-quality reasoning on long Python files, and latency is acceptable at 1-2 seconds per response. Correct Amazon Bedrock choice: Claude 3.5 Sonnet on Amazon Bedrock.

Scenario 2: A SaaS app needs to classify 10 million short support messages per day into 12 categories. Cost is the primary constraint; quality at high volume matters more than peak reasoning. Correct Amazon Bedrock choice: Claude 3.5 Haiku or Amazon Nova Micro/Lite on Amazon Bedrock (Batch inference further cuts cost).

Scenario 3: A legal team must summarize 300-page contracts end-to-end without chunking. Correct Amazon Bedrock choice: AI21 Jamba 1.5 Large on Amazon Bedrock (256K context) or Claude 3.5 Sonnet (200K context).

Scenario 4: A marketing team wants on-brand product images generated from text descriptions. Correct Amazon Bedrock choice: Stability AI Stable Diffusion or Amazon Titan Image Generator or Amazon Nova Canvas on Amazon Bedrock.

Scenario 5: A company fine-tuned a Llama model on its proprietary dataset outside of AWS and wants to serve it through the same Amazon Bedrock API their app already uses. Correct Amazon Bedrock choice: Amazon Bedrock custom model import with Provisioned Throughput.

Scenario 6: An enterprise wants a chat assistant grounded in Confluence, SharePoint, and S3 documents with minimal infrastructure and managed chunking + embedding + retrieval. Correct Amazon Bedrock choice: Amazon Bedrock Knowledge Bases with Claude or Nova as the generation model.

Scenario 7: A customer-support bot needs to call internal order-status and shipping APIs across multi-turn conversation to resolve tickets end-to-end. Correct Amazon Bedrock choice: Amazon Bedrock Agents with Lambda Action Groups, Claude as the reasoning model.

Scenario 8: A production chatbot is hitting throttling during peak hours. The team does not want to commit to monthly reserved capacity. Correct Amazon Bedrock choice: enable Cross-Region Inference on Amazon Bedrock.

Scenario 9: A healthcare customer needs to ensure the chatbot never gives medical diagnoses and must redact patient names from outputs. Correct Amazon Bedrock choice: Amazon Bedrock Guardrails with Denied Topics + PII filters.

Scenario 10: The AI team must objectively compare Claude 3.5 Sonnet, Llama 3.1 70B, and Mistral Large 2 for summarization quality before picking one. Correct Amazon Bedrock choice: Amazon Bedrock Model Evaluation with automated ROUGE/BERTScore + optional human review.

Scenario 11: A data scientist wants to build and train a custom XGBoost fraud model with notebooks, feature engineering, and hyperparameter tuning. Correct choice: Amazon SageMaker (not Amazon Bedrock — this is out of scope for Amazon Bedrock).

Scenario 12: A non-technical business analyst wants to query company sales documents in natural language and get cited answers without any code. Correct choice: Amazon Q Business (built on Amazon Bedrock, but it is the end-user-facing product rather than a developer API).

FAQ — Amazon Bedrock Model Selection Top Questions

1. How do I choose between Amazon Bedrock and Amazon SageMaker for a generative AI project?

Pick Amazon Bedrock when you want to call a foundation model through an API, skip infrastructure management, and customize via prompts, RAG (Amazon Bedrock Knowledge Bases), or managed fine-tuning. Pick Amazon SageMaker when you need to build a model from scratch, have specialized model architectures, or need full control over training code and serving instances. A practical 2026 pattern: most companies use Amazon Bedrock for 80 percent of generative AI workloads and reserve Amazon SageMaker for the 20 percent that require truly custom modeling. On AIF-C01, any scenario with "foundation model," "Claude," "Titan," "Nova," or "generative AI API" maps to Amazon Bedrock; any scenario with "notebooks," "hyperparameter tuning," or "build a custom model" maps to Amazon SageMaker.

2. When should I use Provisioned Throughput on Amazon Bedrock instead of On-Demand?

Provisioned Throughput on Amazon Bedrock makes sense in three situations. First, whenever you serve a custom fine-tuned or custom imported model on Amazon Bedrock — Provisioned Throughput is the only serving mode available. Second, when production traffic is heavy and steady enough that the per-hour PT cost undercuts equivalent on-demand token spend. Third, when you need guaranteed capacity and predictable latency for an SLA-bound workload. Stay on On-Demand on Amazon Bedrock for dev environments, proofs of concept, and most production chat workloads — throughput is shared but generally sufficient, and cost scales to zero.
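
The "heavy and steady enough" test in the second situation is just a break-even comparison. The sketch below uses entirely hypothetical rates (real model-unit pricing varies by model and commitment term) but shows the shape of the calculation.

```python
# Break-even sketch for On-Demand vs Provisioned Throughput (PT).
# All rates are hypothetical placeholders.

def monthly_on_demand(tokens_per_month, blended_rate_per_1k):
    """On-Demand spend at a blended input/output per-1,000-token rate."""
    return (tokens_per_month / 1000) * blended_rate_per_1k

def monthly_provisioned(model_units, rate_per_mu_hour, hours=730):
    """PT spend: model units billed per hour, all month, used or not."""
    return model_units * rate_per_mu_hour * hours

od = monthly_on_demand(2_000_000_000, blended_rate_per_1k=0.005)  # steady 2B tokens
pt = monthly_provisioned(1, rate_per_mu_hour=10.0)

cheaper = "provisioned" if pt < od else "on-demand"
print(f"On-Demand ${od:,.0f}/mo vs PT ${pt:,.0f}/mo -> {cheaper}")
```

At this illustrative volume PT wins; halve the traffic and On-Demand wins, which is the point — PT bills for the hours whether or not tokens flow, while On-Demand scales to zero.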

3. What is the difference between Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases?

Amazon Bedrock Knowledge Bases is the managed retrieval-augmented generation (RAG) pipeline — it connects data sources, generates embeddings, stores vectors, and answers queries with grounded context in a single RetrieveAndGenerate call. Amazon Bedrock Agents is a tool-using orchestrator — it takes a goal, plans steps, calls APIs or Lambdas, and iterates until done. Knowledge Bases retrieves and answers. Agents reason and act. They compose: you can attach an Amazon Bedrock Knowledge Base to an Amazon Bedrock Agent so the agent grounds its reasoning in your documents while also calling APIs to perform actions.
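
The "single RetrieveAndGenerate call" can be sketched as follows with boto3's bedrock-agent-runtime client. The Knowledge Base ID and model ARN are hypothetical placeholders, and the client call is commented out so the sketch runs without AWS credentials.

```python
# Sketch: managed RAG in one RetrieveAndGenerate call.
# The Knowledge Base ID and model ARN are placeholders.

def build_rag_request(question, kb_id, model_arn):
    """One call covers vector retrieval and grounded generation."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

request = build_rag_request(
    "What is our parental leave policy?",
    kb_id="KBEXAMPLE01",  # hypothetical Knowledge Base ID
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/"
              "anthropic.claude-3-5-sonnet-20240620-v1:0",
)

# import boto3
# client = boto3.client("bedrock-agent-runtime")
# response = client.retrieve_and_generate(**request)
# print(response["output"]["text"])  # grounded answer, citations available
```

An Agent, by contrast, would wrap this retrieval inside a plan-act loop that can also call Lambdas — retrieval becomes one tool among several.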

4. Can I use Amazon Bedrock with my own fine-tuned Llama model?

Yes, through Amazon Bedrock custom model import. You upload the Llama weights to Amazon S3, register them with Amazon Bedrock, and invoke the imported model through the same Amazon Bedrock API used for first-party catalog models. Two constraints: the architecture must be supported (Llama, Mistral, Mixtral, Flan families at the time of writing), and inference requires Provisioned Throughput — there is no on-demand capacity for arbitrary imported weights. If those constraints do not fit, an alternative is deploying the fine-tuned Llama on Amazon SageMaker endpoints and keeping Amazon Bedrock for first-party models.
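
The two constraints above make a natural pre-flight check. This sketch encodes only the architecture families named in this answer; the real supported list may change over time, so treat it as illustrative.

```python
# Pre-flight check mirroring the custom model import constraints:
# supported architecture, and Provisioned Throughput at serving time.
# The architecture list reflects this guide's text, not a live API.

SUPPORTED_ARCHITECTURES = {"llama", "mistral", "mixtral", "flan"}

def import_plan(architecture: str) -> str:
    """Decide between Bedrock custom model import and a SageMaker endpoint."""
    if architecture.lower() not in SUPPORTED_ARCHITECTURES:
        return "unsupported: deploy on an Amazon SageMaker endpoint instead"
    return "importable: serve via Provisioned Throughput (no on-demand mode)"

plan = import_plan("Llama")
```

Either branch keeps the fine-tuned model servable; the difference is whether the existing Amazon Bedrock API surface can front it.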

5. Do Amazon Bedrock providers use my prompts and completions to train their models?

No. All providers in the Amazon Bedrock catalog contractually agree that customer input (prompts) and output (completions) are not used to train the base foundation models. This is a foundational Amazon Bedrock guarantee and a frequent AIF-C01 scenario trap — if a distractor says "switch away from Amazon Bedrock because our prompts train the vendor's models," that distractor is false. Your data stays your data on Amazon Bedrock.

6. What is Cross-Region Inference on Amazon Bedrock and when should I turn it on?

Cross-Region Inference on Amazon Bedrock is an opt-in routing feature that lets the Amazon Bedrock runtime serve a request from a different AWS Region in the same geographic boundary group (US, EU, APAC) when the home Region is at capacity. You use a Cross-Region Inference profile ID as the modelId and Amazon Bedrock handles the routing transparently. Turn it on when you experience throttling during peak hours and do not want to commit to Provisioned Throughput. Do not turn it on expecting disaster recovery — if the whole geographic group is down, Cross-Region Inference will not save you. And do not expect latency improvements for globally distributed users; routing stays within the boundary group, not across continents.
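
Mechanically, "use a Cross-Region Inference profile ID as the modelId" looks like the sketch below. The geography-prefix convention matches public profile IDs such as us.anthropic.claude-3-5-sonnet-20240620-v1:0, but treat the mapping as illustrative rather than an exhaustive API contract.

```python
# Sketch: building a Cross-Region Inference profile ID. Routing stays
# inside the geographic boundary group (US, EU, APAC) — this is capacity
# smoothing, not disaster recovery and not global latency optimization.

GEO_PREFIXES = {"US": "us", "EU": "eu", "APAC": "apac"}

def inference_profile_id(base_model_id: str, geography: str) -> str:
    """Prefix a base model ID with its boundary-group code."""
    return f"{GEO_PREFIXES[geography]}.{base_model_id}"

profile = inference_profile_id(
    "anthropic.claude-3-5-sonnet-20240620-v1:0", "US"
)
# Pass `profile` as the modelId; Bedrock may serve the request from any
# Region inside the US boundary group when the home Region is at capacity.
```

The application change is just the modelId string — no endpoint, SDK, or IAM rewiring — which is why this is the low-friction answer to peak-hour throttling.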

7. How does Amazon Bedrock pricing actually work?

Amazon Bedrock On-Demand pricing charges per 1,000 input tokens and per 1,000 output tokens, with rates varying by model — Claude 3.5 Sonnet is roughly 10x Claude 3.5 Haiku, which is roughly 3x Amazon Nova Micro. Output tokens always cost more than input tokens on every Amazon Bedrock model. Batch inference on Amazon Bedrock costs approximately 50 percent of on-demand per-token pricing with higher latency (hours). Provisioned Throughput on Amazon Bedrock charges per model unit per hour — the commitment term (No Commitment, 1 Month, 6 Month) determines the discount level. Amazon Bedrock Knowledge Bases add underlying vector store costs (for example, Amazon OpenSearch Serverless OCU hours). Amazon Bedrock Guardrails are priced per 1,000 text units evaluated. Fine-tuning on Amazon Bedrock costs per training token, with the resulting custom model requiring Provisioned Throughput for inference.

8. What are Amazon Bedrock Guardrails and why are they not just IAM?

Amazon Bedrock Guardrails are content-safety filters applied to prompts and completions on Amazon Bedrock. They block or redact harmful content (hate, violence, sexual, misconduct), denied topics, specific words, PII, and ungrounded responses. AWS IAM is access control — it decides whether a principal can call bedrock:InvokeModel at all. Guardrails are about what passes through the model at inference time; IAM is about who can call the model in the first place. Both are necessary in a mature Amazon Bedrock deployment, and they solve different problems. On AIF-C01 this distinction is a top-five trap — never substitute one for the other.

Further Reading for Amazon Bedrock Model Selection

Summary of Amazon Bedrock Model Selection

Amazon Bedrock is the AWS service for calling foundation models from Anthropic (Claude), Amazon (Titan, Nova), Meta (Llama), Mistral, AI21 (Jurassic, Jamba), Cohere, and Stability AI through a single serverless API. Amazon Bedrock model selection is driven by six axes — capability, cost, latency, context window, multimodality, and licensing — and the right choice is usually the smallest, cheapest model that meets capability.

Amazon Bedrock offers three pricing modes: On-Demand (default, pay-per-token), Batch (half price, high latency), and Provisioned Throughput (reserved capacity, mandatory for custom and imported models on Amazon Bedrock). Amazon Bedrock composes with Cross-Region Inference for capacity smoothing, Amazon Bedrock Agents for tool use, Amazon Bedrock Knowledge Bases for managed RAG, Amazon Bedrock Guardrails for content safety, Amazon Bedrock Model Evaluation for benchmarking, and Amazon Bedrock custom model import for bring-your-own-weights.

On AIF-C01, Amazon Bedrock is tested as a selection and design problem — choose the right Amazon Bedrock model, choose the right Amazon Bedrock throughput mode, compose with the right Amazon Bedrock feature — and that framing covers a majority of Domain 3 scenarios. Keep Amazon Bedrock distinct from Amazon SageMaker (build your own), distinct from Amazon Q (end-user assistant), and distinct from AWS IAM (access control, not content safety), and the remaining Amazon Bedrock traps collapse quickly.

Official sources