
GenAI Business Capabilities, Limitations, and AWS Infrastructure

5,320 words · ≈ 27 min read

Generative AI capabilities and limitations form the one task statement on AIF-C01 (Task 2.2, with infrastructure overlap into Task 2.3) that decides whether a candidate can make real-world generative AI deployment decisions rather than just recite definitions. The AWS Certified AI Practitioner (AIF-C01) exam expects you to recognize what generative AI does well (summarization, translation, draft-to-polish, code completion, open-domain Q&A, creative ideation), what it does poorly (deterministic math, citation-verifiable facts, real-time data access, complex multi-step logic without scaffolding), and how to mitigate its failure modes, such as hallucination and bias. Roughly 15 to 20 percent of Domain 2 questions touch some aspect of generative AI capabilities and limitations — three to four questions per sitting, depending on form assembly — and they are among the most scenario-heavy items on the blueprint.

This study guide walks through every sub-skill of generative AI capabilities and limitations that appears on AIF-C01: the capability catalog, the failure-mode catalog, hallucination causes and mitigations (Retrieval-Augmented Generation, grounding, structured output, self-reflection, temperature=0), bias in generative AI outputs, cost drivers (parameter count multiplied by token volume multiplied by throughput), latency for streaming versus batch, intellectual property considerations (training data provenance and output licensing), the AWS infrastructure stack backing generative AI (AWS Trainium, AWS Inferentia, Amazon EC2 P5/G5, Amazon Bedrock), and the decision rubric for when NOT to use generative AI at all. Three analogies, ten-plus callouts, and seven FAQ entries anchor the memory.

What are Generative AI Capabilities and Limitations?

Generative AI capabilities and limitations describe the two-sided map of what foundation models can and cannot reliably do in a production application. Capabilities are the tasks where a foundation model reliably beats a rule-based baseline at acceptable cost — summarization, translation, draft-to-polish editing, code completion, open-domain question answering, creative ideation, and structured extraction from unstructured text. Limitations are the tasks where a foundation model fails silently, confidently, or expensively — deterministic arithmetic, citation-verifiable facts outside the training cutoff, real-time data access without tools, complex multi-step logic without explicit scaffolding, and any decision where a wrong answer has unacceptable downstream cost.

On AIF-C01 the phrase "generative AI capabilities and limitations" is the hinge between Task 2.2 ("Understand the capabilities and limitations of generative AI for solving business problems") and Task 2.3 ("Describe AWS infrastructure and technologies for building generative AI applications"). Task 2.2 asks: given a business scenario, is generative AI the right tool, and what risks must you mitigate? Task 2.3 asks: once you decide to use generative AI, which AWS services and chips deliver it? Both tasks lean on the same mental model — capabilities multiplied by mitigations equals deployable value; limitations minus mitigations equals production risk.

Why generative AI capabilities and limitations matter for AIF-C01

The AIF-C01 blueprint weights Domain 2 (Fundamentals of Generative AI) at 28 percent — the largest single-domain slice. Community pain-point reports show that candidates who memorize foundation-model trivia (parameters, tokenizers, attention) but cannot articulate generative AI capabilities and limitations in a business scenario routinely miss these 3 to 4 questions. Worse, scenarios that test generative AI limitations are written adversarially: the "obvious" answer (use generative AI for everything) is almost always wrong. Mastering this topic is the fastest way to add points on exam day.

Plain-Language Explanation: Generative AI Capabilities and Limitations

Generative AI capabilities and limitations sound abstract, but three white-board analogies lock them down. Each analogy maps one dimension of the capability-versus-limitation trade-off.

Analogy 1 — The open-book exam

Think of generative AI as a brilliant but amnesiac student sitting an open-book exam.

  • The student has read the entire library once (pre-training) and remembers the gist but not every page number.
  • Ask the student to summarize a chapter — excellent, that is a capability where generative AI shines.
  • Ask the student to translate a paragraph into French — excellent again.
  • Ask the student to cite the exact page number of a law passed last Tuesday — disaster, the student will fabricate a number with total confidence. This is hallucination, the headline limitation of generative AI.
  • Now hand the student an open-book resource sheet (Retrieval-Augmented Generation) that contains the actual Tuesday law. The student's answer instantly becomes grounded and trustworthy.
  • Ask the student to multiply 879,234 by 2,314 in their head — they will guess. But hand them a calculator (tool use) and they will compute it exactly.

The lesson: generative AI is a pattern-matching reasoner, not a database or calculator. Capabilities emerge where patterns dominate; limitations appear where exact retrieval or exact computation dominate. Every mitigation on AIF-C01 (RAG, grounding, tool use, structured output, temperature=0) is a way to hand the student the right resource for the right kind of question.

Analogy 2 — The kitchen brigade running a new menu

Picture a new restaurant with foundation models as the line cooks.

  • Summarization, translation, draft-to-polish are weeknight pasta — cheap ingredients (input tokens), quick plating (output tokens), consistent quality. Any cook (small model) can do it.
  • Code completion, complex Q&A, creative ideation are tasting-menu courses — they require the experienced chef (a larger foundation model with more parameters) and more prep time (longer context window). Quality scales with model size, but so does cost.
  • Arithmetic on the check, real-time table availability, citing specific health codes are jobs for the point-of-sale system, the reservations system, and the compliance binder — NOT the cook. If the cook tries to do them from memory, they hallucinate.
  • Service speed depends on whether dishes stream out course by course (streaming tokens, perceived latency low, time-to-first-token 200 to 800 ms) or arrive all at once at the end of the meal (batch inference, total throughput higher, per-request latency much higher).
  • Cost is parameter count multiplied by ingredients consumed multiplied by how fast you need them plated — a three-Michelin-star kitchen (a 400B-parameter model on high-throughput GPUs) will always cost orders of magnitude more than a diner (a 7B-parameter model on AWS Inferentia).

On exam day, any scenario asking you to balance speed, quality, and cost is asking you to pick the right station in the brigade.

Analogy 3 — Insurance

Generative AI deployment is an insurance underwriting problem.

  • Every capability has a premium you pay in tokens: input tokens (what you feed in, including RAG context) plus output tokens (what the model writes back). The more context, the higher the premium.
  • Every limitation has a deductible you pay when things go wrong: a hallucinated legal citation in a production contract, a biased hiring recommendation, a copied passage from training data that triggers a lawsuit. The deductible is reputation, regulatory fines, and lost customers.
  • Mitigations are the safety devices that lower the premium-plus-deductible total: RAG lowers hallucination risk, Amazon Bedrock Guardrails lower toxicity and PII leakage risk, temperature=0 lowers output variance, human-in-the-loop review caps catastrophic failure.
  • When to NOT buy the policy at all — when the deductible (cost of a wrong answer in a life-safety, medical, legal, or regulated financial decision) is so high that no amount of premium-reduction makes the risk acceptable. Generative AI is wrong for those scenarios. Pick deterministic rules or human experts instead.

The insurance frame is the single most exam-ready way to reason about generative AI capabilities and limitations.

What Generative AI Does Well — The Capability Catalog

Generative AI capabilities cluster into eight repeatable patterns. Memorize them; AIF-C01 scenarios almost always reuse this list verbatim.

Summarization

Foundation models excel at condensing long inputs into short outputs. Summarization works because pre-training teaches the model which tokens tend to matter and which are filler. Amazon Bedrock with Anthropic Claude, Amazon Titan Text, Meta Llama, or Mistral all ship strong summarization. Use cases: call-center transcript summaries, legal-brief digests, earnings-call highlights, incident post-mortems. Capability ceiling: summaries remain extractive-abstractive, not fact-checked. Pair with grounding when source fidelity matters.

Translation

Machine translation is a sequence-to-sequence capability that modern foundation models handle at near-professional quality for major language pairs. For AIF-C01, recognize that a dedicated Amazon Translate call is often cheaper and simpler, while Amazon Bedrock translation shines when you need tone transfer, domain-specific vocabulary, or combined translation-plus-summarization in a single prompt.

Draft-to-polish editing

Generative AI can take a rough draft and rewrite it for a target audience, tone, or style — press releases, marketing copy, developer documentation, resume bullets. This is the highest-ROI office productivity capability of generative AI and the core of Amazon Q in productivity surfaces.

Code completion and code explanation

Foundation models trained on public source code (Amazon Q Developer, GitHub-based assistants) can generate, complete, explain, and refactor code. Capability is strongest for mainstream languages (Python, JavaScript, Java, Go, Rust) and common patterns. Capability degrades for proprietary domain-specific languages and cutting-edge library versions (knowledge cutoff limitation).

Open-domain question answering

Foundation models can answer broad questions using pre-training knowledge. This is the single most hallucination-prone capability, so AIF-C01 scenarios heavily feature question answering plus grounding plus RAG as a combined mitigation pattern.

Creative ideation

Generative AI can brainstorm taglines, product names, storyboard concepts, plot variations, and marketing angles at a rate no human writer can match. Quality is judged by the human curator downstream, not by automated metrics.

Structured extraction from unstructured text

With explicit prompt engineering (JSON schema prompts, structured-output modes), foundation models extract entities, fields, and relationships from emails, invoices, contracts, and medical notes. Amazon Textract handles OCR-plus-forms; generative AI handles the remaining prose extraction.

Conversational interfaces

Multi-turn chat is a capability unique to generative AI versus classical ML. Amazon Bedrock Agents plus Amazon Bedrock Knowledge Bases plus Amazon Bedrock Guardrails assemble a conversational interface without you writing dialog code.

Every AIF-C01 generative AI scenario describes one of these eight capability patterns in plain English. If you recognize the pattern in the first sentence of the stem, you have already narrowed the answer to "use generative AI with mitigation X." Summarization plus fidelity requirement equals RAG. Open-domain Q&A plus fidelity requirement equals RAG. Code completion equals Amazon Q Developer. Creative ideation equals Amazon Bedrock direct call. Memorize the capability-to-service mapping.
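The stem-to-answer mapping above can be sketched as a lookup table. Everything here — the dict name, the boolean fidelity flag, the `recommend` helper — is illustrative scaffolding; only the pairings themselves come from the text.

```python
# Exam-style capability-to-service mapping (illustrative structure only).
CAPABILITY_MAP = {
    ("summarization", True): "Amazon Bedrock + RAG",
    ("summarization", False): "Amazon Bedrock direct call",
    ("open-domain Q&A", True): "Amazon Bedrock + RAG",
    ("code completion", False): "Amazon Q Developer",
    ("creative ideation", False): "Amazon Bedrock direct call",
}

def recommend(capability: str, needs_fidelity: bool) -> str:
    """Return the exam-style recommendation for a recognized capability
    pattern, or a fallback when the pattern is not in the table."""
    return CAPABILITY_MAP.get((capability, needs_fidelity), "re-read the stem")
```

The two-key lookup mirrors how the stems work: the capability names the pattern, and a fidelity requirement in the same sentence flips the answer toward RAG.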

What Generative AI Does Poorly — The Limitation Catalog

Generative AI limitations are equally predictable. The AIF-C01 exam will always frame at least one question where the "obvious" use of generative AI is the wrong answer.

Deterministic math and arithmetic

Foundation models are next-token predictors, not calculators. Ask a model to compute the exact total of a 47-line invoice and you will get a plausible number that is often wrong. Mitigation: tool use (the model emits a function call to a calculator or Python sandbox) or just do the math in code before the prompt. On AIF-C01, any scenario that says "ensure financial calculations are accurate to the cent" is not a generative AI-only scenario.
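A minimal sketch of the "do the math in code before the prompt" mitigation, assuming a hypothetical invoice structure: the total is computed deterministically with `Decimal`, and only the verified number is handed to the model for prose.

```python
from decimal import Decimal

def invoice_total(line_items: list[tuple[str, str]]) -> Decimal:
    """Sum invoice line amounts exactly. Decimal avoids float rounding,
    so the total is accurate to the cent — something a next-token
    predictor cannot guarantee."""
    return sum((Decimal(amount) for _, amount in line_items), Decimal("0"))

items = [("widgets", "19.99"), ("shipping", "4.05"), ("tax", "1.92")]
total = invoice_total(items)  # exact: Decimal('25.96')

# The model only drafts prose around the pre-verified figure.
prompt = f"Draft a payment reminder. The verified total due is ${total}."
```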

Citation-verifiable facts beyond training cutoff

Training data has a knowledge cutoff date. Ask about events after that date and the model either refuses or hallucinates. Mitigation: Retrieval-Augmented Generation or tool use that fetches fresh data.

Real-time data access

Foundation models cannot natively query a database, hit an API, or read today's stock price. They need explicit tools. Amazon Bedrock Agents provides tool use; without it the limitation is absolute.

Complex multi-step logic without scaffolding

Foundation models handle two- to three-step reasoning natively but degrade quickly on five- to ten-step logical chains. Mitigation: Chain-of-Thought prompting, prompt chaining, or Amazon Bedrock Agents that decompose the task.
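Prompt chaining can be sketched with a stub in place of the model call; `call_model` is a placeholder for a real foundation model invocation (for example via Amazon Bedrock), and the step strings are invented for illustration.

```python
def call_model(prompt: str) -> str:
    """Stub standing in for a foundation model API call."""
    return f"[answer to: {prompt}]"

def chain(question: str, steps: list[str]) -> str:
    """Run a multi-step task as a chain of small prompts, feeding each
    intermediate answer into the next step instead of asking the model
    to hold a ten-step logical chain in a single generation."""
    context = question
    for step in steps:
        context = call_model(f"{step}\n\nContext so far:\n{context}")
    return context

result = chain("Which option is cheapest?",
               ["List the candidate options.", "Compare costs.", "Pick one."])
```

Each link in the chain is a two- to three-step task the model handles natively; the scaffolding, not the model, carries the long-range logic.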

Factual precision in regulated domains

Medical dosing, legal citations, tax calculations, aviation safety — domains where a hallucinated answer is dangerous. Generative AI can draft, but a licensed human or deterministic system must verify.

Consistency across long generations

Models drift over long outputs. A 10,000-token report may contradict itself at token 8,000. Mitigation: chunked generation with verification steps between chunks.

Understanding truly novel information

If a concept was never in training data and no context is supplied, the model cannot invent coherent understanding from nothing. RAG fixes this only when the retrieved context is complete.

AIF-C01 scenarios often describe a task that superficially screams "use Amazon Bedrock." Read the stem for disqualifying phrases: "must be exact to the cent," "must cite the current regulation," "safety-critical," "regulatory determinism required," "no tolerance for hallucination." Any of these phrases flip the correct answer away from pure generative AI and toward either deterministic rules, RAG-plus-grounding-plus-human-review, or a classical ML approach with Amazon SageMaker. Source ↗

Hallucination — The Flagship Generative AI Limitation

Hallucination is when a foundation model generates confident output that is factually wrong or unsupported by any source. AIF-C01 tests hallucination more often than any other generative AI failure mode.

Why hallucination happens

Foundation models optimize for the most plausible next token given the preceding tokens and pre-training distribution. They are not optimizing for truth. Four root causes:

  1. Knowledge cutoff — the fact is not in training data.
  2. Knowledge gap — the fact was in training data but under-represented, so the model cannot recover it reliably.
  3. Prompt ambiguity — the question is vague, so the model guesses a specific interpretation.
  4. Sampling randomness — non-zero temperature injects variance that produces different wrong answers on each run.
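Cause 4 can be demonstrated with a toy next-token sampler (invented logits, not a real model): temperature=0 collapses to a deterministic argmax, while a higher temperature reintroduces run-to-run variance.

```python
import math
import random

def sample(logits: dict[str, float], temperature: float, seed: int = 0) -> str:
    """Pick the next token. temperature=0 degenerates to greedy argmax
    (deterministic); higher temperatures flatten the distribution and
    inject the sampling randomness described above."""
    if temperature == 0:
        return max(logits, key=logits.get)
    weights = [math.exp(v / temperature) for v in logits.values()]
    return random.Random(seed).choices(list(logits), weights=weights)[0]

logits = {"Paris": 4.0, "Lyon": 2.0, "Berlin": 1.0}
sample(logits, temperature=0)  # always "Paris"
```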

Hallucination is generative AI output that is syntactically fluent and presented with confidence but is factually incorrect, unsupported by cited sources, or invented. The term covers both false facts (the model states a wrong date) and false citations (the model fabricates a URL or legal case that does not exist). Hallucination is intrinsic to generative AI because foundation models optimize for plausibility, not truth.

Mitigations for hallucination

AIF-C01 expects fluency in five mitigation techniques. Memorize the name and one-line description of each.

Retrieval-Augmented Generation (RAG)

RAG grounds the model in retrieved source documents. Before calling the foundation model, the application embeds the user query, searches a vector store (Amazon OpenSearch Service k-NN, Amazon Aurora PostgreSQL pgvector, or an Amazon Bedrock Knowledge Base), and injects the top-ranked chunks into the prompt. The model then answers from retrieved context, not memory. RAG is the most-cited generative AI hallucination mitigation on AIF-C01.
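The retrieve-then-inject flow can be sketched with a toy word-overlap retriever standing in for the embedding-plus-vector-store step; the document chunks, scoring, and function names are illustrative only.

```python
def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank chunks by word overlap with the query. A real
    system would embed the query and search a vector store (for example
    an Amazon Bedrock Knowledge Base); overlap merely stands in for
    cosine similarity on embeddings."""
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))[:k]

def build_rag_prompt(query: str, chunks: list[str]) -> str:
    """Inject the top-ranked chunks so the model answers from retrieved
    context, not from memory."""
    context = "\n".join(retrieve(query, chunks))
    return (f"Answer ONLY from the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

docs = ["The return window is 30 days.", "Shipping is free over $50."]
prompt = build_rag_prompt("How long is the return window?", docs)
```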

Grounding and grounding checks

Grounding is a broader concept: any technique that anchors the model to authoritative data. Amazon Bedrock Guardrails includes a contextual grounding check that compares the model's response to the provided source and flags unsupported claims. Grounding checks catch hallucinations that slip past RAG retrieval.

Structured output and JSON schema constraints

Forcing the model to emit JSON with a fixed schema constrains hallucination because the allowed output space is narrow. Amazon Bedrock structured output modes (response format constraints) and tool-use schemas provide this. A hallucinated JSON key is easier to detect and reject than a hallucinated prose sentence.
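A minimal sketch of schema-constrained validation, assuming a hypothetical invoice schema: a response with a missing or invented key is rejected instead of passing as fluent prose.

```python
import json

REQUIRED_KEYS = {"invoice_id", "vendor", "total"}  # hypothetical schema

def validate(model_output: str) -> dict:
    """Parse a model response as JSON and reject anything that misses
    or invents schema keys — a hallucinated key fails loudly here."""
    data = json.loads(model_output)
    if set(data) != REQUIRED_KEYS:
        raise ValueError(f"schema mismatch: {set(data) ^ REQUIRED_KEYS}")
    return data

good = validate('{"invoice_id": "A-17", "vendor": "Acme", "total": "25.96"}')
```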

Self-reflection and self-consistency

Self-reflection is a prompt pattern where the model reviews its own draft output against a checklist before finalizing. Self-consistency samples the model multiple times and picks the most frequent answer. Both add cost but reduce hallucination rate meaningfully on reasoning tasks.
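Self-consistency reduces to a majority vote over repeated samples; the answer strings below are invented for illustration.

```python
from collections import Counter

def self_consistent_answer(samples: list[str]) -> str:
    """Self-consistency: sample the model several times (non-zero
    temperature) and keep the most frequent answer, so occasional
    hallucinations get outvoted by the modal response."""
    return Counter(samples).most_common(1)[0][0]

self_consistent_answer(["42", "42", "41", "42", "40"])  # "42"
```

The cost is linear in the number of samples, which is why the technique is reserved for reasoning tasks where the error rate justifies it.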

Temperature=0 (deterministic decoding)

Setting temperature=0 (or the lowest supported value) makes the model pick the single most probable token at each step. The output becomes deterministic (same prompt yields same output) and less creative but more faithful. For factual Q&A, temperature=0 is the default recommendation.

Remember five hallucination mitigations with RGSST: RAG (retrieve authoritative context), Grounding check (Amazon Bedrock Guardrails contextual grounding), Structured output (JSON schema), Self-reflection (model reviews own draft), Temperature=0 (deterministic decoding). On AIF-C01 every hallucination-mitigation question maps to one or more RGSST techniques. If the scenario mentions "ground truth documents" or "citations," the answer is RAG. If it mentions "verify response against source," the answer is grounding check. If it mentions "force exact JSON," the answer is structured output. If it mentions "reproducibility," the answer is temperature=0.

Hallucination versus bias — different failure modes

Hallucination is a factuality failure. Bias is a fairness failure. A model can hallucinate without bias (wrong but even-handed) or be biased without hallucinating (systematically skewed but retrievable). AIF-C01 sometimes contrasts the two in a single stem — read carefully.

Bias in Generative AI Outputs

Bias in generative AI is any systematic skew in model outputs that correlates with protected attributes (gender, race, age, nationality) or with demographic subgroups. Bias in generative AI has three root causes worth knowing for AIF-C01.

Training data bias

Foundation models learn from internet-scale corpora. Those corpora over-represent English, Western cultural references, male-written code, and historical stereotypes. The model absorbs and amplifies whatever the corpus encodes. This is the dominant source of generative AI bias.

Algorithmic bias

The optimization objective (next-token prediction) does not include fairness. Architectural choices and loss functions can unintentionally favor majority patterns over minority patterns.

Deployment bias

Even a fair model becomes biased in deployment if the human-facing prompt or the downstream business rule reintroduces skew (for example, a resume-screening prompt that mentions "culture fit" will rediscover demographic patterns).

Bias mitigations on AWS

  • Amazon SageMaker Clarify for Foundation Models runs automated bias evaluation jobs against Amazon Bedrock models and custom models.
  • Amazon Bedrock Guardrails filters hate, insults, sexual, violence, and misconduct categories — a defense against bias-driven toxic output.
  • Human-in-the-loop review (Amazon A2I) routes low-confidence or sensitive outputs to human reviewers.
  • Prompt engineering — explicit fairness instructions in system prompts reduce bias in outputs, though never eliminate it.

A single biased training example in a dataset of billions is invisible. But when a foundation model generates millions of outputs per day in production, even a small bias becomes systemic discrimination affecting real users. On AIF-C01 any scenario involving hiring, lending, insurance pricing, healthcare triage, or criminal-justice decisions must treat generative AI bias as a first-class risk — not an afterthought. The correct answer always includes bias evaluation (Amazon SageMaker Clarify) plus human review plus documented limitations (AWS AI Service Cards or Amazon SageMaker Model Cards).

Cost Drivers — Parameter Count × Tokens × Throughput

Generative AI cost is predictable once you know the three multipliers. AIF-C01 will test your ability to estimate relative cost between options.

Parameter count

Larger models (70B, 175B, 400B parameters) cost more per token than smaller models (7B, 13B). The per-token price difference between Anthropic Claude Haiku and Claude Opus on Amazon Bedrock spans roughly 10x to 60x depending on the model generation and token type. Always pick the smallest model that meets quality requirements.

Token volume

Cost scales linearly with input tokens plus output tokens. Input tokens include the system prompt, user message, few-shot examples, and RAG context. Output tokens are what the model writes back. Amazon Bedrock bills input and output separately, often with output priced higher.

Throughput mode

  • On-demand — pay per token, no commitment. Use for unpredictable or low-volume workloads.
  • Provisioned Throughput — reserve model units for a time commitment (1 month or 6 months). Lower per-token price but a minimum commitment. Use for sustained high-throughput production.
  • Batch inference — Amazon Bedrock batch API offers discounted pricing (typically 50 percent off on-demand) for jobs that can tolerate hours of latency. Use for overnight summarization, back-office document processing, or dataset labeling.

Additional cost levers

  • Context window size — larger context windows (up to 200K tokens on Claude 3) cost more and have non-linear latency growth.
  • Fine-tuning cost — separate training compute cost, plus ongoing Provisioned Throughput for the fine-tuned model.
  • RAG cost — embedding model invocations plus vector database storage and query cost. Often the biggest hidden line item.
  • Guardrails cost — Amazon Bedrock Guardrails bills per policy evaluation.

For rough cost estimation on AIF-C01 scenarios use: (parameters relative cost multiplier) × (input tokens + output tokens) × (throughput mode discount). Anthropic Claude Haiku is roughly 1x baseline. Claude Sonnet is roughly 10x to 15x Haiku. Claude Opus is roughly 60x Haiku. Batch mode knocks 50 percent off. Provisioned Throughput is break-even around 80 percent utilization. Use the smallest model that meets quality — then optimize token count with prompt engineering and RAG chunk pruning.
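The rule of thumb above can be sketched as a relative-cost estimator; the multipliers are the exam heuristics from the text, not published prices, and the function name is invented.

```python
# Rough exam heuristics: Haiku ~1x, Sonnet ~10-15x, Opus ~60x.
MODEL_MULTIPLIER = {"haiku": 1, "sonnet": 12, "opus": 60}

def relative_cost(model: str, input_tokens: int, output_tokens: int,
                  batch: bool = False) -> float:
    """(parameters multiplier) x (input + output tokens) x (throughput
    discount); batch inference takes roughly 50 percent off on-demand."""
    cost = MODEL_MULTIPLIER[model] * (input_tokens + output_tokens)
    return cost * 0.5 if batch else cost

relative_cost("opus", 1000, 500) / relative_cost("haiku", 1000, 500)  # 60.0
```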

Latency — Streaming versus Batch

Latency for generative AI behaves differently from classical ML inference. Two numbers matter on AIF-C01.

Time-to-first-token (TTFT)

The delay between API call and the first output token arriving. Typical range 200 ms to 2 seconds depending on model size, prompt length, and region. TTFT dominates perceived latency in conversational UIs because users watch the first token appear.

Tokens-per-second (output rate)

After the first token, the model streams tokens at a rate of roughly 30 to 80 tokens per second for mid-size models, and 10 to 30 tokens per second for the largest models. A 500-token answer at 50 tokens per second takes 10 seconds to complete.
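The arithmetic above — TTFT plus remaining tokens at the streaming rate — in one small helper (names are illustrative):

```python
def end_to_end_seconds(ttft_ms: float, output_tokens: int,
                       tokens_per_second: float) -> float:
    """Total generation time = time-to-first-token plus output tokens at
    the streaming rate. Streaming changes perceived latency, not this
    total."""
    return ttft_ms / 1000 + output_tokens / tokens_per_second

end_to_end_seconds(500, 500, 50)  # 0.5 s TTFT + 10 s of tokens = 10.5 s
```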

Streaming mode

Amazon Bedrock InvokeModelWithResponseStream returns tokens as they are generated. Users see output progressively. Total end-to-end time is the same as non-streaming, but perceived latency drops dramatically. Use for any chat UI.

Batch mode

Amazon Bedrock batch inference accepts a job file, processes it asynchronously (minutes to hours), and writes results to Amazon S3. Total latency is much higher per request, but throughput cost is 50 percent lower. Use for offline workloads.

Latency reduction techniques

  • Smaller model — smaller parameter count means faster generation.
  • Shorter prompts — TTFT scales with input token count.
  • Prompt caching — Amazon Bedrock prompt caching reuses the already-processed prompt prefix across calls, so repeated system prompts and few-shot examples are not reprocessed every time.
  • Cross-Region inference — Amazon Bedrock cross-Region inference can route to least-loaded Region to smooth spikes.

Intellectual Property in Generative AI

Intellectual property is a sleeper topic on AIF-C01. Two angles appear.

Training data provenance

Foundation models are trained on corpora that include copyrighted text, code, and images. If model output closely reproduces a training passage, it can trigger copyright claims. Amazon Bedrock mitigates this through vendor indemnification (for supported providers and eligible uses) and through the Amazon Titan family, which is trained on AWS-controlled data. Customers remain responsible for reviewing output before public use.

Output licensing

Output from a foundation model is typically assigned to the customer under the model provider's terms, but terms vary. Key AIF-C01 facts:

  • Amazon Titan — customer owns output; AWS-trained on data AWS has rights to use.
  • Third-party models on Bedrock — each provider (Anthropic, Meta, Mistral, Cohere, AI21, Stability) has its own license; Amazon Bedrock passes those through.
  • Customer data — prompts and completions sent to Amazon Bedrock are NOT used to train the base models. This is a compliance-grade commitment.

Customer obligations

Customers must still:

  • Avoid feeding regulated data (PHI, PCI, PII) without proper controls (HIPAA BAA, encryption, VPC endpoints).
  • Review generative AI output before publishing, especially code (license compatibility) and marketing text (plagiarism).
  • Document generative AI use in internal AI governance programs (aligning with NIST AI RMF and ISO/IEC 42001).

On Amazon Bedrock, customer inputs (prompts) and outputs (completions) are NOT used to train or improve the base foundation models. Data is encrypted in transit (TLS) and at rest (AWS KMS), stays in the customer's AWS Region, and can be isolated using VPC endpoints. This distinguishes Amazon Bedrock from some consumer-facing generative AI products that may retain conversations for retraining. For AIF-C01, remember: "Does Amazon Bedrock train on my data?" — the answer is no.

AWS Infrastructure for Generative AI

Task 2.3 asks you to identify which AWS technologies build generative AI applications. The stack has four layers.

Layer 1 — Custom chips

  • AWS Trainium (Trn1, Trn2 instances) — purpose-built training chip. Lowest cost per training FLOP for foundation model pre-training and large-scale fine-tuning. Amazon SageMaker HyperPod uses Trainium for multi-node distributed training.
  • AWS Inferentia (Inf1, Inf2 instances) — purpose-built inference chip. Lowest cost per inference token for steady-state production serving.

Layer 2 — General-purpose GPU

  • Amazon EC2 P5 / P5e (NVIDIA H100) — largest-scale training, foundation model pre-training. Highest performance, highest cost.
  • Amazon EC2 P4d (NVIDIA A100) — previous-generation training.
  • Amazon EC2 G5 / G6 (NVIDIA A10G / L4) — cost-efficient inference for mid-size models and fine-tuning.

Layer 3 — Managed platforms

  • Amazon SageMaker — end-to-end platform for building, training, and deploying custom models including foundation models via Amazon SageMaker JumpStart.
  • Amazon Bedrock — fully managed foundation model API. Serverless. No infrastructure to manage. The default AIF-C01 answer for "build a generative AI app without operating infrastructure."

Layer 4 — Application services

  • Amazon Q Business — pre-built enterprise assistant on top of Amazon Bedrock with connectors to SharePoint, Salesforce, ServiceNow, Amazon S3.
  • Amazon Q Developer — IDE and console coding assistant.
  • Amazon Q in QuickSight — natural-language BI.
  • Amazon Q in Connect — real-time contact-center agent assistance.

AWS Trainium is for TRAINING (the word is in the name). AWS Inferentia is for INFERENCE (the word is in the name). Both are AWS-designed chips with lower cost per operation than equivalent NVIDIA GPUs for their target workload. Use NVIDIA GPU instances (P5, G5, G6) when you need CUDA-specific libraries or when a model is not yet ported to Trainium or Inferentia. On AIF-C01, scenarios mentioning "lowest training cost" map to Trainium; "lowest inference cost" map to Inferentia; "highest peak training performance" map to Amazon EC2 P5.

Amazon Bedrock and Amazon SageMaker both appear in generative AI scenarios. The decision rule for AIF-C01: if the scenario says "access pretrained foundation models via API with no infrastructure to manage," pick Amazon Bedrock. If it says "build, train, deploy a custom model" or "fine-tune with full control using Jupyter notebooks and training jobs," pick Amazon SageMaker. Amazon SageMaker JumpStart bridges them by offering pretrained foundation models inside the SageMaker environment — useful when deep fine-tuning control is needed. The single biggest AIF-C01 trap is picking SageMaker when Bedrock would be simpler.

When NOT to Use Generative AI

The highest-value AIF-C01 skill inside generative AI capabilities and limitations is recognizing scenarios where generative AI is the WRONG choice. These recur on every sitting.

Do not use generative AI for

  1. Safety-critical decisions — medical diagnosis confirmation, aviation control logic, autonomous vehicle actuation, industrial safety interlocks. Use certified deterministic systems with formal verification.
  2. Regulated financial calculations requiring precision — tax computation to the cent, interest accrual, regulatory reporting totals. Use deterministic calculation engines; at most, let generative AI draft the report prose afterward.
  3. Legal compliance answers that must be cited — specific law citations, specific case numbers, specific regulation paragraphs. Without RAG plus grounding check plus human attorney review, the risk is too high.
  4. Real-time exact data — current stock prices, current inventory counts, current reservation availability. Use the authoritative system of record; if needed, wrap it in a tool for an agent.
  5. Low-volume tasks where rules are cheaper — if a 50-line regex or a 20-line SQL query solves the problem, generative AI is over-engineering.
  6. Tasks requiring true novelty outside training distribution — inventing a new mathematical theorem, discovering a new physics principle. Foundation models recombine, they do not invent de novo.
  7. Highly sensitive data without proper controls — classified information, un-redacted PHI, payment card numbers. Even on Amazon Bedrock (which does not train on customer data), the compliance posture must be designed deliberately with VPC endpoints, Amazon Macie scanning, and Amazon Bedrock Guardrails PII filters.

Do use generative AI for

  1. Summarization of long-form content.
  2. Translation between major languages.
  3. Draft-to-polish editing.
  4. Code completion and code review assistance.
  5. Open-domain Q&A with RAG grounding.
  6. Creative ideation and brainstorming.
  7. Structured extraction from unstructured text (with validation).
  8. Conversational interfaces over bounded knowledge bases.

If an AIF-C01 scenario stem includes the phrases "regulatory requirement," "auditable," "deterministic output required," "exact numerical result," "cannot tolerate variation," or "must be reproducible with legal certainty," generative AI is rarely the whole answer. The correct choice is either a deterministic rule engine, classical ML on Amazon SageMaker, or generative AI sharply constrained by RAG plus grounding plus temperature=0 plus human review. Pure "use Amazon Bedrock" answers are wrong when the stem signals regulatory determinism.
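The disqualifying-phrase screen can be sketched as a simple lookup; the phrase list comes from the stem signals described above, while the helper name and example stems are invented.

```python
# Phrases that rule out an unconstrained "use Amazon Bedrock" answer.
DISQUALIFIERS = (
    "exact to the cent",
    "safety-critical",
    "regulatory determinism",
    "no tolerance for hallucination",
    "must be reproducible",
)

def pure_genai_is_wrong(stem: str) -> bool:
    """Flag exam stems where pure generative AI should be ruled out in
    favor of deterministic rules, classical ML, or heavily constrained
    generative AI."""
    stem = stem.lower()
    return any(phrase in stem for phrase in DISQUALIFIERS)

pure_genai_is_wrong("Totals must be exact to the cent for audit.")  # True
```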

Responsible Generative AI Deployment — Putting It Together

A production-ready generative AI application on AWS combines capabilities, limitations, and mitigations in a layered architecture.

  1. Scope the use case using the AWS Generative AI Security Scoping Matrix — is the app a consumer application you merely use, an enterprise SaaS app, or an app you build on a pre-trained model, a fine-tuned model, or a self-trained model?
  2. Select the model on Amazon Bedrock using cost, capability, context window, and compliance constraints.
  3. Ground the model with Amazon Bedrock Knowledge Bases (managed RAG) when factual accuracy matters.
  4. Constrain outputs with Amazon Bedrock Guardrails (content filters, denied topics, PII redaction, contextual grounding check).
  5. Set inference parameters (temperature=0 for factual tasks, higher temperature for creative tasks; max tokens to control cost).
  6. Log and monitor with Amazon CloudWatch for metrics and AWS CloudTrail for audit.
  7. Evaluate continuously with Amazon Bedrock Model Evaluation and Amazon SageMaker Clarify for Foundation Models.
  8. Route low-confidence cases to human reviewers via Amazon A2I.
  9. Document purpose, limitations, and intended use in Amazon SageMaker Model Cards and AWS AI Service Cards.
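
The request path through those layers can be sketched as plain Python stubs. Every function below is a hypothetical stand-in — in a real application they would call Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, the model itself, Amazon CloudWatch, and Amazon A2I; the confidence value and threshold are invented for illustration:

```python
# Hypothetical request flow mirroring the layered architecture above.
# All functions are stand-in stubs; names and values are this sketch's own.
def retrieve_context(query):          # layer 3: RAG grounding
    return ["<retrieved chunk>"]

def apply_guardrails(text):           # layer 4: content filters / PII redaction
    return text

def invoke_model(query, context, temperature=0.0, max_tokens=512):  # layer 5
    return {"answer": "<model output>", "confidence": 0.62}

def log_request(query, response):     # layer 6: metrics + audit trail
    pass

def handle(query, review_threshold=0.8):
    context = retrieve_context(query)
    safe_query = apply_guardrails(query)
    response = invoke_model(safe_query, context)
    log_request(query, response)
    if response["confidence"] < review_threshold:
        return {"route": "human-review", **response}   # layer 8: Amazon A2I
    return {"route": "auto", **response}

print(handle("What is our refund policy?")["route"])   # human-review
```

The point of the sketch is the ordering: grounding and guardrails run before the model call, logging runs unconditionally, and low-confidence responses never reach the user without a human in the loop.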

Common Exam Traps for Generative AI Capabilities and Limitations

  • Hallucination ≠ bias — hallucination is a factual failure; bias is a fairness failure. Both can occur in the same system, and each requires a different mitigation.
  • RAG ≠ fine-tuning — RAG adds fresh context at inference time; fine-tuning updates model weights on training data. Scenarios about "update model with new product catalog weekly" lean RAG, not fine-tuning.
  • Temperature=0 ≠ accuracy — temperature=0 is deterministic, not correct. A deterministically wrong answer is still wrong. Pair with RAG for factual accuracy.
  • Trainium vs Inferentia — Trainium for training, Inferentia for inference. Do not swap.
  • Amazon Bedrock ≠ Amazon SageMaker — Bedrock for FM-as-API, SageMaker for custom model lifecycle. Both can run foundation models but differ in operational model.
  • Generative AI ≠ always better than classical ML — for structured prediction on tabular data, classical ML on Amazon SageMaker usually beats generative AI on cost and accuracy.
  • Guardrails ≠ IAM — Amazon Bedrock Guardrails is content safety; IAM is access control. Both are needed.
  • Hallucination is not just "wrong" — it specifically means confident, fluent, unsupported output. A refusal ("I don't know") is not a hallucination.
  • Customer data on Bedrock is NOT used to train base models — common compliance question.
  • "Open-source" foundation model ≠ free to operate — you still pay infrastructure cost.

FAQ — Generative AI Capabilities and Limitations Top Questions

1. What is the single most important mitigation for generative AI hallucination?

Retrieval-Augmented Generation (RAG) is the highest-impact hallucination mitigation for production generative AI. RAG grounds the foundation model's output in retrieved source documents, so the model answers from provided context rather than pre-training memory. On AWS, Amazon Bedrock Knowledge Bases provides a fully managed RAG pipeline (ingest documents from Amazon S3 or other connectors, embed and index into a vector store, and retrieve top chunks at query time). Pair RAG with Amazon Bedrock Guardrails contextual grounding check for an additional verification layer that flags any model claims not supported by retrieved context. For AIF-C01, any scenario about "reduce hallucination while using up-to-date internal documents" has RAG as the core answer.
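
The retrieval half of RAG reduces to scoring document chunks against the query and pasting the best matches into the prompt. The toy sketch below uses hand-made three-dimensional "embeddings" and a stdlib cosine similarity — real pipelines use a managed embedding model and vector store such as Amazon Bedrock Knowledge Bases; every vector and chunk here is invented:

```python
import math

# Toy RAG retrieval: score chunks against a query by cosine similarity
# of hand-made "embedding" vectors, then ground the prompt in the winner.
# Vectors and chunks are invented for illustration.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

chunks = {
    "Refunds are issued within 14 days.": [0.9, 0.1, 0.0],
    "Our office is in Berlin.":           [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "refund policy?"

top = max(chunks, key=lambda c: cosine(chunks[c], query_vec))
prompt = f"Answer ONLY from this context:\n{top}\n\nQuestion: refund policy?"
print(top)  # Refunds are issued within 14 days.
```

The "Answer ONLY from this context" instruction is the grounding step: the model is steered to answer from retrieved text rather than pre-training memory, which is exactly what a contextual grounding check then verifies.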

2. What does "temperature=0" actually do, and when should I use it?

Temperature controls the randomness of token selection during generation. At temperature=0 (or the lowest supported value) the model always picks the single most probable next token, producing deterministic output — the same prompt yields the same response every time. Use temperature=0 for factual Q&A, structured extraction, code generation from specifications, and any scenario requiring reproducibility or regulatory auditability. Use higher temperatures (0.7 to 1.0) for creative ideation, marketing copy, and brainstorming where variety is valuable. Temperature=0 reduces hallucination variance (you will not get different wrong answers on re-run) but does not reduce hallucination rate by itself — the deterministic answer can still be wrong. Combine with RAG for factual correctness.
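
Mechanically, temperature divides the model's logits before the softmax: as T approaches 0 the most probable token takes all the mass (greedy, deterministic), while higher T flattens the distribution. A minimal sketch with made-up logits:

```python
import math

# How temperature reshapes the next-token distribution: logits are
# divided by T before softmax. Logits below are made up for illustration.
def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                  # model's scores for 3 tokens

sharp = softmax_with_temperature(logits, 0.1)   # near-greedy
flat  = softmax_with_temperature(logits, 1.0)   # more variety

greedy = logits.index(max(logits))        # temperature=0 in practice: argmax
print(sharp[0], flat[0], greedy)          # top token hoards mass at low T
```

At T=0.1 the top token's probability is nearly 1.0; at T=1.0 it drops toward 0.63, leaving real probability for the alternatives — which is the variety you want for brainstorming and the variance you do not want for factual Q&A.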

3. What are the three biggest cost drivers for a generative AI application on Amazon Bedrock?

The three cost drivers in rough order of impact are: (1) Model choice by parameter count — Anthropic Claude Opus can cost 60x more per token than Claude Haiku; always pick the smallest model that meets quality. (2) Token volume — input tokens (system prompt plus user message plus RAG context plus few-shot examples) multiplied by output tokens. RAG context often dwarfs user input. (3) Throughput mode — on-demand is most flexible but most expensive; Provisioned Throughput is cheaper at sustained high utilization; batch inference offers roughly 50 percent off for asynchronous workloads. Secondary drivers include context window size, fine-tuning training compute, vector database cost for RAG, and Amazon Bedrock Guardrails per-evaluation charges.
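
A back-of-envelope cost model makes the "RAG context dwarfs user input" point concrete. All per-token prices below are placeholders invented for this sketch — check the Amazon Bedrock pricing page for real figures:

```python
# Back-of-envelope monthly cost: (input tokens x input price) +
# (output tokens x output price), scaled by request volume.
# All prices are hypothetical placeholders, not real Bedrock rates.
PRICE_PER_1K = {                      # (input, output) USD per 1K tokens
    "small-model": (0.00025, 0.00125),
    "large-model": (0.015, 0.075),
}

def monthly_cost(model, req_per_day, in_tokens, out_tokens, batch=False):
    p_in, p_out = PRICE_PER_1K[model]
    per_req = in_tokens / 1000 * p_in + out_tokens / 1000 * p_out
    if batch:                         # batch inference: roughly 50% discount
        per_req *= 0.5
    return per_req * req_per_day * 30

# Input = user message (100) + system prompt (400) + RAG context (3,500):
cost = monthly_cost("small-model", req_per_day=10_000,
                    in_tokens=100 + 400 + 3_500, out_tokens=300)
print(f"${cost:,.2f}/month")          # $412.50/month
```

Swapping "small-model" for "large-model" in the same call multiplies the bill by roughly 60x, which is why model right-sizing is listed as the first cost lever.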

4. When should I NOT use generative AI at all?

Do not use generative AI for safety-critical decisions (medical dosing, aviation control, industrial safety), for calculations requiring regulatory numerical precision (tax, financial reporting), for answers requiring verifiable legal citations without human attorney review, for real-time exact data lookups (current stock prices, live inventory), for tasks where a simple rule or SQL query already works, or for highly sensitive data without proper controls (VPC endpoints, Amazon Macie scanning, Amazon Bedrock Guardrails PII filters). For AIF-C01, scenarios with "regulatory determinism," "exact numerical result," "auditable to the cent," or "life-safety" disqualify pure generative AI answers and push toward deterministic systems, classical ML on Amazon SageMaker, or heavily scaffolded generative AI (RAG plus grounding plus human review).

5. What is the difference between AWS Trainium and AWS Inferentia?

AWS Trainium is an AWS-designed chip purpose-built for machine learning TRAINING workloads, available on Amazon EC2 Trn1 and Trn2 instances. It is optimized for foundation model pre-training and large-scale fine-tuning at lower cost per training FLOP than comparable NVIDIA GPUs. AWS Inferentia is an AWS-designed chip purpose-built for machine learning INFERENCE workloads, available on Amazon EC2 Inf1 and Inf2 instances. It is optimized for serving predictions at lower cost per inference token than comparable GPUs. Mnemonic: Trainium contains "Train," Inferentia contains "Infer." For peak training performance or when using CUDA-specific libraries, use NVIDIA-based Amazon EC2 P5 (H100) or P4d (A100) instances. For cost-efficient inference of mid-size models, use Amazon EC2 G5 or G6 (NVIDIA A10G / L4).

6. How does Amazon Bedrock protect customer intellectual property and data privacy?

Amazon Bedrock applies four privacy commitments. First, customer prompts and completions are NOT used to train the base foundation models — this is a contractual commitment, not a best-effort promise. Second, data is encrypted in transit with TLS and at rest with AWS-managed or customer-managed AWS KMS keys. Third, data stays in the customer's selected AWS Region (regional data residency). Fourth, VPC endpoints (AWS PrivateLink) allow private connectivity so Bedrock traffic never traverses the public internet. For regulated industries, Amazon Bedrock supports HIPAA-eligible workloads and has compliance attestations in AWS Artifact. Customers remain responsible for reviewing output for IP and licensing compatibility, especially for generated code (which may need license review) and generated marketing text (which should be checked for unintended similarity to training data).

7. When should I pick Amazon Bedrock over Amazon SageMaker for a generative AI use case?

Pick Amazon Bedrock when you want API access to pretrained foundation models with no infrastructure to operate, you are fine with vendor-provided models (Anthropic Claude, Meta Llama, Mistral, Amazon Titan, Cohere, AI21, Stability AI), and your customization needs are served by prompt engineering, RAG via Amazon Bedrock Knowledge Bases, or managed fine-tuning. Pick Amazon SageMaker when you need to train a custom model from scratch, run deep fine-tuning with full control (custom training loops, distributed training on AWS Trainium or NVIDIA GPU clusters), deploy models in customer-managed VPCs with specific networking, or run classical ML alongside foundation models in a single MLOps pipeline. Amazon SageMaker JumpStart is the bridge: it exposes pretrained foundation models inside the SageMaker environment for teams that want managed model selection plus deep training control. For most AIF-C01 "build a generative AI app quickly" scenarios, the answer is Amazon Bedrock.

Summary

Generative AI capabilities and limitations is the decision-making backbone of AIF-C01 Domain 2. Capabilities cluster into eight repeatable patterns (summarization, translation, draft-to-polish, code completion, open-domain Q&A, creative ideation, structured extraction, conversational interfaces). Limitations cluster into seven failure modes (deterministic math, knowledge cutoff, real-time data, complex logic, regulated-domain precision, long-generation drift, true novelty).

Hallucination is the flagship limitation, mitigated by the RGSST stack (RAG, grounding check, structured output, self-reflection, temperature=0). Bias is a separate fairness failure mode, mitigated by Amazon SageMaker Clarify evaluation, Amazon Bedrock Guardrails content filters, and Amazon A2I human review.

Cost equals parameters multiplied by tokens multiplied by throughput mode, with Provisioned Throughput and batch inference as the main cost levers. Latency splits into time-to-first-token and tokens-per-second, with streaming mode for chat UIs and batch for offline. Intellectual property considerations center on Amazon Bedrock's no-train-on-customer-data commitment and per-provider output licensing. AWS infrastructure spans AWS Trainium (training), AWS Inferentia (inference), Amazon EC2 P5/G5 (GPU), Amazon SageMaker (custom ML), and Amazon Bedrock (managed FM API).

The most exam-valuable generative AI skill is recognizing when NOT to use generative AI — safety-critical, regulatory-deterministic, exact-calculation, and real-time-data scenarios push the correct answer away from pure generative AI toward deterministic systems or heavily scaffolded generative AI with RAG plus grounding plus human review. Master the capability catalog, the limitation catalog, and the mitigation stack, and generative AI capabilities and limitations questions become the easiest points on AIF-C01.

Official sources