examhub .cc The most efficient path to the most valuable certifications.

AI/ML & Analytics Services

4,180 words · ≈ 21 min read

AWS AI/ML and analytics services are the managed cloud products that let customers apply machine learning, generative AI, and large-scale data analytics without running their own GPU clusters, Spark farms, or data warehouses. On the AWS Certified Cloud Practitioner (CLF-C02) exam, Task Statement 3.7 asks you to recognize which AWS AI/ML services fit a given use case and which AWS analytics services process which style of data. The most tested names are Amazon SageMaker, Amazon Bedrock, Amazon Q, Amazon Rekognition, Amazon Comprehend, Amazon Textract, Amazon Athena, Amazon Redshift, Amazon Kinesis, AWS Glue, and Amazon QuickSight. This topic is the single fastest-rising sub-area of Domain 3 (+25 percent year over year), so expect at least three to five questions on your exam.

This study guide covers every AWS AI/ML service and every AWS analytics service in the CLF-C02 blueprint, decodes the traps between Amazon SageMaker and Amazon Bedrock, between Amazon Kinesis Data Streams and Amazon Data Firehose, and between Amazon Athena and Amazon Redshift, and ends with five FAQ entries plus a practice-ready summary.

What are AWS AI/ML & Analytics Services?

AWS AI/ML services are a three-layer stack. The bottom layer is Amazon SageMaker, the end-to-end AWS AI/ML platform for data scientists who train and deploy custom models. The middle layer is Amazon Bedrock, the serverless API that hands you pretrained foundation models such as Anthropic Claude, Meta Llama, and Amazon Titan for generative AI. The top layer is the family of task-specific AWS AI/ML services (Amazon Rekognition, Amazon Comprehend, Amazon Transcribe, Amazon Polly, Amazon Translate, Amazon Textract, Amazon Lex, Amazon Personalize, Amazon Forecast, Amazon Kendra) plus Amazon Q — all consumed through a single API call with zero model tuning required.

AWS analytics services sit next to this AI/ML stack. Amazon Athena runs serverless SQL on Amazon S3. Amazon Redshift is the petabyte-scale data warehouse. AWS Glue handles ETL and the Data Catalog. Amazon Kinesis (and Amazon MSK) stream events in real time. Amazon EMR runs managed Hadoop and Spark. Amazon OpenSearch Service powers search and observability. Amazon QuickSight delivers BI dashboards. AWS Lake Formation governs the data lake.

Together, AWS AI/ML services and AWS analytics services form the backbone of every modern data product on AWS — and the exam will test whether you can pick the right service on the first read.

Why AI/ML & Analytics Services matter for CLF-C02

CLF-C02 Domain 3 accounts for 34 percent of the exam. Task Statement 3.7 is newly enriched with generative AI coverage — Amazon Bedrock and Amazon Q entered the blueprint post-2024. Explorer data shows the fastest trend line of any topic (+25 percent mention growth) and an exam-signal frequency of 48 for generative AI questions (+35 percent). Missing this topic is the single biggest way to fail CLF-C02 today.

Plain-Language Explanation: AI/ML & Analytics Services

AWS AI/ML services and AWS analytics services sound complicated, but three plain-language analogies make them click.

Analogy 1 — The kitchen brigade

Think of data work as a restaurant kitchen.

  • Amazon SageMaker is the pastry chef with a full molecular-gastronomy lab. You supply raw ingredients (training data), and you bake a custom cake (the model) from scratch.
  • Amazon Bedrock is the ready-to-serve dessert station — pretrained foundation models (Claude, Llama, Titan) are already plated. You just add your own toppings through a prompt.
  • Amazon Q is the waiter who already knows the menu and your regular customers; it answers business questions on the fly.
  • Amazon Rekognition, Amazon Comprehend, Amazon Transcribe, Amazon Polly, Amazon Translate, Amazon Textract are the single-function appliances — the blender, the toaster, the juicer. Drop one ingredient in, get one result out.
  • Amazon Athena, Amazon Redshift, Amazon EMR, Amazon Kinesis, AWS Glue, Amazon QuickSight are the prep station, the cold storage, the industrial mixer, the conveyor belt, the dishwasher, and the plating window — every AWS analytics service maps to a different kitchen step.

If the exam question is "we want to analyze raw customer photos," you reach for the blender (Amazon Rekognition), not the molecular-gastronomy lab (Amazon SageMaker).

Analogy 2 — The Swiss Army knife

The AWS AI/ML services portfolio is a 13-blade Swiss Army knife.

  • The big custom blade is Amazon SageMaker — you sharpen it yourself.
  • The corkscrew is Amazon Bedrock — already shaped, just pull the cork (prompt).
  • The smaller tools — the scissors, tweezers, toothpick, file, saw, and the rest — are the task-specific services: Amazon Rekognition (image/video), Amazon Comprehend (NLP), Amazon Textract (OCR), Amazon Transcribe (speech-to-text), Amazon Polly (text-to-speech), Amazon Translate (translation), Amazon Lex (chatbots), Amazon Personalize (recommendations), Amazon Forecast (time series), and Amazon Kendra (enterprise search).

On the CLF-C02 exam you do not need to build anything; you only need to pick the correct blade for the job. "Extract text and tables from a scanned PDF" = Amazon Textract. "Translate reviews into Spanish" = Amazon Translate. "Detect sentiment in tweets" = Amazon Comprehend. That is the whole trick.

Analogy 3 — The postal system

AWS analytics services route data the way a postal system routes mail.

  • Amazon Kinesis Data Streams is the live conveyor belt inside the sorting facility — packets fly by in real time, you decide where they land.
  • Amazon Data Firehose (previously Kinesis Data Firehose) is the automatic delivery truck that drops mail at a preconfigured address (Amazon S3, Amazon Redshift, Amazon OpenSearch Service).
  • Amazon MSK is the same conveyor belt, but built on Apache Kafka for customers who already standardized on Kafka.
  • Amazon S3 is the warehouse where letters live long-term.
  • AWS Glue is the mailroom that labels every envelope (Data Catalog) and rewrites the address format (ETL).
  • Amazon Athena is the clerk who reads any letter right inside the warehouse with SQL.
  • Amazon Redshift is the high-security archive with indexed shelves — fast retrieval for petabyte-scale OLAP reports.
  • Amazon EMR is the industrial sorting robot running Spark and Hadoop.
  • Amazon QuickSight is the front-desk monitor that displays the daily mail statistics.
  • AWS Lake Formation is the postmaster general who sets the permission rules for the whole building.
  • Amazon OpenSearch Service is the search index — ask "where is this letter?" and get an instant answer.

Keep this mail-routing picture in mind and every AWS analytics service question becomes a geography quiz.

Core Operating Principles — Pre-built AI APIs vs Custom ML vs Generative AI

AWS AI/ML services follow a three-tier abstraction model. Understanding the tier boundary is the single most useful mental tool for CLF-C02.

  1. Tier 1 — AI Services (pre-built APIs): Amazon Rekognition, Amazon Comprehend, Amazon Transcribe, Amazon Polly, Amazon Translate, Amazon Textract, Amazon Lex, Amazon Personalize, Amazon Forecast, Amazon Kendra. No model training required. You call an API, you get a result.
  2. Tier 2 — Generative AI / Foundation Models: Amazon Bedrock (foundation model marketplace) and Amazon Q (pre-built assistant built on top of Bedrock). You provide prompts; the model reasons and generates text, images, or code.
  3. Tier 3 — ML Platform: Amazon SageMaker. You bring data, pick an algorithm, train, tune, deploy. Maximum flexibility, maximum effort.

Questions that say "the company wants to use pretrained models with minimal ML expertise" map to Tier 1 or Tier 2. Questions that say "the data science team needs a notebook environment to train a custom model" map to Tier 3 (Amazon SageMaker).

A foundation model is a large, pretrained model (such as Anthropic Claude or Amazon Titan) trained on massive general-purpose data that can be adapted via prompts or fine-tuning to many downstream tasks. Amazon Bedrock is the AWS service for accessing foundation models as APIs.

The Pre-built vs Custom decision tree

  • "I have no ML team, I want OCR" → Amazon Textract (pre-built).
  • "I have no ML team, I want sentiment detection" → Amazon Comprehend (pre-built).
  • "I want a chatbot that sounds human" → Amazon Lex plus Amazon Polly, or Amazon Bedrock for generative responses.
  • "I want to deploy my own trained fraud model" → Amazon SageMaker.
  • "I want to summarize internal documents with a chat UI" → Amazon Q Business.
  • "I want to generate marketing copy via API" → Amazon Bedrock with Claude or Titan.
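The tier boundary and the decision tree above can be condensed into a toy Python classifier. This is a study aid only — the keyword lists are my own assumptions about typical exam phrasing, not anything AWS publishes:

```python
# Toy classifier for the three-tier model: map scenario wording to a tier.
# The signal phrases are illustrative assumptions, not an AWS API.
TIER_SIGNALS = {
    1: ["pretrained api", "no ml expertise", "single api call"],
    2: ["foundation model", "generative ai", "prompt"],
    3: ["notebook", "training data", "hyperparameter", "custom model"],
}

def classify(scenario: str) -> int:
    """Return 1 (pre-built AI API), 2 (generative AI), or 3 (ML platform)."""
    s = scenario.lower()
    for tier, signals in TIER_SIGNALS.items():
        if any(sig in s for sig in signals):
            return tier
    return 1  # default: reach for a pre-built API first

print(classify("The data science team needs a notebook to train a custom model"))  # 3
print(classify("Call a pretrained foundation model via prompt"))                   # 2
```

The default of Tier 1 mirrors the exam's bias: when in doubt, the cheapest answer is the pre-built API.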

Generative AI Services — Amazon Bedrock and Amazon Q

Amazon Bedrock

Amazon Bedrock is the fully managed AWS AI/ML service that provides foundation models from Anthropic (Claude), Meta (Llama), AI21 Labs (Jurassic), Cohere, Mistral AI, Stability AI (Stable Diffusion), and Amazon (Titan, Nova) through a single API. Amazon Bedrock is serverless — no GPUs to provision, no model servers to run. Customers can customize foundation models with their own data via fine-tuning or Retrieval Augmented Generation (RAG) using Amazon Bedrock Knowledge Bases, and they can chain models with Amazon Bedrock Agents.

Key Amazon Bedrock exam facts:

  • Serverless, no infrastructure management.
  • Multiple foundation-model providers behind one API.
  • Data sent to Amazon Bedrock is not used to train the base models.
  • Available in many AWS Regions with per-region model availability.
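To make the "single API" claim concrete, here is a minimal sketch of building an InvokeModel request for a Claude model on Amazon Bedrock with boto3. The request body follows Bedrock's documented Anthropic Messages format; the model ID, Region, and prompt are placeholder assumptions, and the network call itself is left commented out because it requires AWS credentials:

```python
import json

def build_claude_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build the Anthropic Messages request body that Bedrock's
    InvokeModel API expects for Claude-family models."""
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

body = json.dumps(build_claude_request("Summarize our Q3 sales notes."))

# The actual call (needs AWS credentials; model ID and per-Region
# availability vary, so treat these values as placeholders):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# resp = client.invoke_model(
#     modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body)
# print(json.loads(resp["body"].read())["content"][0]["text"])
```

Swapping providers means changing the model ID and the body schema — the endpoint and the serverless billing model stay the same.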

Amazon Q

Amazon Q is the business-facing AI assistant family powered partly by Amazon Bedrock.

  • Amazon Q Business is the enterprise assistant that connects to your company documents, wikis, S3 buckets, Salesforce, ServiceNow, and more, and answers natural-language questions with citations.
  • Amazon Q Developer (previously Amazon CodeWhisperer) is the in-IDE coding assistant that generates, reviews, and explains code; it also helps with AWS Management Console troubleshooting.
  • Amazon Q in QuickSight generates BI narratives and dashboards from plain-English questions.
  • Amazon Q in Connect assists contact-center agents in real time.

On the CLF-C02 exam, if the scenario says "business users want a chat assistant over internal documents," pick Amazon Q Business. If it says "developers need API access to Claude / Llama / Titan to build a custom generative AI app," pick Amazon Bedrock. Amazon Q is what non-technical users touch; Amazon Bedrock is what developers call from code.

Custom ML Platform — Amazon SageMaker

Amazon SageMaker is the flagship end-to-end AWS AI/ML platform. It covers every step of the ML lifecycle:

  1. Data prep — Amazon SageMaker Data Wrangler, Amazon SageMaker Feature Store, Amazon SageMaker Ground Truth for labeling.
  2. Model building — Amazon SageMaker Studio notebooks, built-in algorithms, JumpStart pretrained models.
  3. Training — managed training jobs with distributed training, Automatic Model Tuning (hyperparameter search), Amazon SageMaker HyperPod for large-scale foundation-model training.
  4. Deployment — real-time endpoints, serverless endpoints, batch transform, Amazon SageMaker Asynchronous Inference, Multi-Model Endpoints.
  5. MLOps — Amazon SageMaker Pipelines, Model Registry, Model Monitor, Clarify (bias detection).

For CLF-C02 you only need to recognize Amazon SageMaker as the AWS AI/ML service for building, training, and deploying custom models end-to-end. Deep feature recall (which sub-feature does what) is scoped to AIF-C01 and MLS-C01, not CLF-C02.

Amazon SageMaker = build, train, deploy YOUR OWN model. Amazon Bedrock = call SOMEONE ELSE'S foundation model via API. If the question mentions "training data," "notebooks," or "hyperparameters," it is Amazon SageMaker. If it mentions "foundation model," "Claude," "Titan," or "generative AI," it is Amazon Bedrock.

Pre-built AI/ML APIs — The Service-per-Task Catalog

These are the AWS AI/ML services that solve a single job with a single API call. Memorize the noun-to-service mapping.

Amazon Rekognition — Image and video analysis

Amazon Rekognition analyzes images and videos to detect objects, scenes, activities, unsafe content, text in images, and faces (including celebrity recognition, face comparison, and face search against a stored collection). Live video analysis works with Amazon Kinesis Video Streams.

Use cases: content moderation on a user-generated platform, face-based login, workplace safety (detecting PPE).

Amazon Comprehend — Natural language processing

Amazon Comprehend is the pre-built NLP AWS AI/ML service. It extracts entities (people, places, organizations), key phrases, sentiment (positive/negative/neutral/mixed), language detection, syntax, and Personally Identifiable Information (PII). Amazon Comprehend Medical adds medical-specific NLP (ICD-10-CM codes, RxNorm drugs).

Use cases: customer review sentiment scoring, compliance redaction, multilingual content routing.

Amazon Textract — Document OCR plus forms and tables

Amazon Textract goes beyond OCR. It preserves the structure of forms (key/value pairs) and tables from PDFs, invoices, IDs, and handwritten pages. Unlike plain OCR, Amazon Textract returns structured JSON with cells and field relationships.

Use cases: automated invoice processing, loan-application intake, medical-form digitization.
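Because Amazon Textract returns structured JSON rather than flat text, pulling out form fields means walking its Block graph. The sketch below parses a hand-made sample response that follows the documented Block schema (BlockType, EntityTypes, Relationships); a real AnalyzeDocument response is shaped the same way, just larger:

```python
# Sketch: turning Amazon Textract's AnalyzeDocument Block list into
# key/value pairs. The sample response is fabricated for illustration
# but mirrors the documented Block structure.

def get_text(block_id, blocks):
    """Concatenate the WORD children of a block."""
    words = []
    for rel in blocks[block_id].get("Relationships", []):
        if rel["Type"] == "CHILD":
            words += [blocks[i]["Text"] for i in rel["Ids"]]
    return " ".join(words)

def key_values(response):
    """Map each KEY block's text to its linked VALUE block's text."""
    blocks = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for b in blocks.values():
        if b["BlockType"] == "KEY_VALUE_SET" and "KEY" in b.get("EntityTypes", []):
            key = get_text(b["Id"], blocks)
            for rel in b.get("Relationships", []):
                if rel["Type"] == "VALUE":
                    for vid in rel["Ids"]:
                        pairs[key] = get_text(vid, blocks)
    return pairs

sample = {"Blocks": [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Total:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "$120.00"},
]}

print(key_values(sample))  # {'Total:': '$120.00'}
```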

Amazon Transcribe — Speech-to-text

Amazon Transcribe converts audio to text in batch or streaming mode, supports many languages, speaker identification, custom vocabulary, automatic language detection, and Amazon Transcribe Medical for clinical speech.

Use cases: call-center transcription, podcast captions, meeting minutes.

Amazon Polly — Text-to-speech

Amazon Polly turns text into lifelike speech using neural and long-form voices. Output can be MP3, Ogg Vorbis, or PCM. Supports Speech Synthesis Markup Language (SSML) for fine-grained control.

Use cases: IVR prompts, audiobook generation, accessibility tooling.

Amazon Translate — Neural machine translation

Amazon Translate offers neural translation across 75+ languages, real-time or batch, with custom terminology for brand-specific vocabulary and Active Custom Translation for domain adaptation.

Use cases: localize product catalog, real-time chat translation, multilingual customer support.

Amazon Lex — Conversational chatbots

Amazon Lex is the conversational AI AWS AI/ML service, built on the same deep-learning technology that powers Alexa. It builds voice and text chatbots with intents, slots, and fulfillment via AWS Lambda. Amazon Lex V2 adds multi-language bots and streaming conversations.

Use cases: customer-service bots, appointment scheduling, banking IVR.

Amazon Personalize — Real-time recommendations

Amazon Personalize uses the same recommendation technology that Amazon.com uses. Feed in user interactions and item catalogs; get real-time personalized recommendations, related items, and personalized rankings via API.

Use cases: product recommendations, content feed personalization, personalized emails.

Amazon Forecast — Time-series forecasting

Amazon Forecast generates time-series forecasts using the same technology Amazon.com uses for demand planning. It applies AutoML across multiple statistical and deep-learning algorithms (ARIMA, Prophet, DeepAR+, CNN-QR).

Use cases: retail inventory forecasting, workforce planning, financial metric projection.

Amazon Kendra — Intelligent enterprise search

Amazon Kendra is the intelligent enterprise search AWS AI/ML service. It understands natural-language questions across internal repositories (Amazon S3, Microsoft SharePoint, Salesforce, ServiceNow, Confluence, Google Drive) and returns precise answers, not keyword matches.

Use cases: internal knowledge base, IT help-desk search, customer-facing FAQ search.

For AWS AI/ML services pre-built APIs, always map the noun in the question to a single service. Image or video → Amazon Rekognition. Speech to text → Amazon Transcribe. Text to speech → Amazon Polly. Translate languages → Amazon Translate. Sentiment or entities → Amazon Comprehend. Forms and tables from documents → Amazon Textract. Chatbot → Amazon Lex. Recommendations → Amazon Personalize. Forecasting → Amazon Forecast. Enterprise search → Amazon Kendra.
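The same mapping, written out as a literal lookup table for drilling (a flash-card study aid, not an AWS API):

```python
# Noun-to-service flash cards for the pre-built AI API catalog.
NOUN_TO_SERVICE = {
    "image or video": "Amazon Rekognition",
    "speech to text": "Amazon Transcribe",
    "text to speech": "Amazon Polly",
    "translate languages": "Amazon Translate",
    "sentiment or entities": "Amazon Comprehend",
    "forms and tables from documents": "Amazon Textract",
    "chatbot": "Amazon Lex",
    "recommendations": "Amazon Personalize",
    "forecasting": "Amazon Forecast",
    "enterprise search": "Amazon Kendra",
}

for noun, service in NOUN_TO_SERVICE.items():
    print(f"{noun:34} -> {service}")
```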

AWS Analytics Services — The Full Stack

AWS analytics services cover ingestion, storage, cataloging, querying, warehousing, big-data processing, BI, search, and governance. The CLF-C02 exam tests recognition, not deep tuning.

Amazon Athena — Serverless SQL on Amazon S3

Amazon Athena runs standard SQL directly on data in Amazon S3 with zero infrastructure. You pay per terabyte scanned. Amazon Athena uses the AWS Glue Data Catalog as its metadata store. Amazon Athena federated queries can also read from Amazon DynamoDB, Amazon RDS, and other sources. Perfect for ad-hoc analysis of log files, CSV exports, Apache Parquet, and Apache ORC datasets in Amazon S3.
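A minimal boto3 sketch of the Athena flow: build the StartQueryExecution parameters, then (with credentials) submit them. The database name, table, and results bucket below are placeholder assumptions:

```python
# Sketch of running an Athena query via boto3. All resource names here
# (database, table, results bucket) are placeholders, not real resources.
def athena_request(sql: str, database: str, output_s3: str) -> dict:
    """Build kwargs for Athena's StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

params = athena_request(
    "SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    database="weblogs",
    output_s3="s3://my-athena-results/",  # placeholder bucket you own
)

# Requires AWS credentials; uncomment to run for real:
# import boto3
# athena = boto3.client("athena")
# qid = athena.start_query_execution(**params)["QueryExecutionId"]
# results = athena.get_query_results(QueryExecutionId=qid)  # once it succeeds
```

Note there is no cluster anywhere in this flow — the only infrastructure you name is the S3 bucket for results.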

Amazon Redshift — Petabyte-scale data warehouse

Amazon Redshift is the AWS analytics service for Online Analytical Processing (OLAP). It is a columnar, massively parallel processing data warehouse that scales to petabytes. Amazon Redshift Serverless auto-provisions capacity. Amazon Redshift Spectrum lets you query exabytes of data in Amazon S3 without loading it. Amazon Redshift excels at complex joins and aggregations over structured data at enterprise scale.

Amazon EMR — Managed Hadoop, Spark, Hive, Presto

Amazon EMR is the managed big-data AWS analytics service that runs Apache Spark, Apache Hadoop, Apache Hive, Presto, Apache HBase, and Apache Flink on EC2, Amazon EKS, AWS Outposts, or Amazon EMR Serverless. Use Amazon EMR when you need full code-level control over Spark jobs, large-scale ETL, or machine-learning preprocessing.

Amazon Kinesis — Real-time streaming

Amazon Kinesis is a family of three streaming AWS analytics services.

  • Amazon Kinesis Data Streams — a durable real-time stream of records with producer and consumer APIs. You write your own consumers (AWS Lambda, Amazon Kinesis Client Library, AWS Glue streaming). Retention 1–365 days. Ordered per shard.
  • Amazon Data Firehose (formerly Kinesis Data Firehose) — a fully managed, no-code delivery pipeline. It ingests streams and delivers them to Amazon S3, Amazon Redshift, Amazon OpenSearch Service, Splunk, HTTP endpoints, and more, with optional inline transformation via AWS Lambda and format conversion to Apache Parquet.
  • Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) — managed Apache Flink for real-time analytics and streaming SQL.

This is the most-tested Amazon Kinesis trap on CLF-C02. Amazon Kinesis Data Streams is for custom real-time processing — you code the consumer. Amazon Data Firehose is for delivery without code — it loads data into Amazon S3, Amazon Redshift, or Amazon OpenSearch Service automatically and is near-real-time (buffering). If the scenario says "no code, deliver straight to S3," pick Amazon Data Firehose. If it says "custom consumer with Lambda" or "sub-second latency," pick Amazon Kinesis Data Streams.
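The trap reduces to two signals — do you write consumer code, and how tight is the latency budget? As a mnemonic (this is exam shorthand, not an AWS API):

```python
# Exam mnemonic for the Streams-vs-Firehose trap, expressed as code.
def pick_stream_service(needs_custom_consumer: bool,
                        max_latency_seconds: float) -> str:
    """Custom consumer code or sub-second latency -> Data Streams;
    otherwise the no-code, ~60 s buffered Firehose delivery wins."""
    if needs_custom_consumer or max_latency_seconds < 1:
        return "Amazon Kinesis Data Streams"
    return "Amazon Data Firehose"

print(pick_stream_service(False, 120))  # Amazon Data Firehose
print(pick_stream_service(True, 120))   # Amazon Kinesis Data Streams
print(pick_stream_service(False, 0.2))  # Amazon Kinesis Data Streams
```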

Amazon MSK — Managed Streaming for Apache Kafka

Amazon MSK provides fully managed Apache Kafka clusters. Choose Amazon MSK when the organization has existing Kafka expertise or Kafka-based integrations. Amazon MSK Serverless removes broker sizing. Amazon MSK Connect runs Kafka Connect workers.

Amazon OpenSearch Service — Search and observability

Amazon OpenSearch Service is the managed AWS analytics service for OpenSearch (the Apache 2.0 fork of Elasticsearch). Use cases: log analytics, application search, full-text search, security event analytics (SIEM), and observability dashboards via OpenSearch Dashboards (the fork of Kibana).

AWS Glue — Serverless ETL and Data Catalog

AWS Glue is the serverless ETL AWS analytics service. It auto-discovers schemas with Glue Crawlers, stores metadata in the AWS Glue Data Catalog (used by Amazon Athena, Amazon EMR, Amazon Redshift Spectrum), and runs ETL jobs in Apache Spark or Python shell. AWS Glue DataBrew is the visual no-code data prep tool. AWS Glue Studio is the low-code visual ETL designer.

Amazon QuickSight — BI dashboards

Amazon QuickSight is the serverless business-intelligence AWS analytics service. Pay per session with Enterprise edition. Amazon QuickSight Q (now enhanced with Amazon Q in QuickSight) supports natural-language questions that generate charts automatically. Amazon QuickSight SPICE is the in-memory engine that caches data for fast dashboards.

AWS Lake Formation — Data lake governance

AWS Lake Formation sets up and secures a data lake on Amazon S3. It centralizes fine-grained access control (row, column, cell level) across Amazon Athena, Amazon Redshift Spectrum, Amazon EMR, and AWS Glue using the AWS Glue Data Catalog. AWS Lake Formation is the AWS analytics service that answers "who can see what data in our lake."

Side-by-Side Comparisons (High-Value for Exam)

Amazon Athena vs Amazon Redshift

  • Data location — Amazon Athena: Amazon S3 direct. Amazon Redshift: managed Redshift storage (or Redshift Spectrum on Amazon S3).
  • Pricing — Amazon Athena: per TB scanned. Amazon Redshift: per node-hour or Redshift Serverless RPU.
  • Best for — Amazon Athena: ad-hoc SQL on raw files. Amazon Redshift: complex recurring OLAP reports.
  • Setup — Amazon Athena: serverless, zero clusters. Amazon Redshift: provisioned cluster (or serverless).
  • Scale — Amazon Athena: petabytes (but you pay per scan). Amazon Redshift: petabytes, tuned with sort and distribution keys.

Pick Amazon Athena for occasional exploration of Amazon S3 data. Pick Amazon Redshift when you need sustained high-performance analytics workloads with BI tool connections.

Amazon Kinesis Data Streams vs Amazon Data Firehose vs Amazon MSK

  • Paradigm — Kinesis Data Streams: custom stream consumers. Data Firehose: no-code delivery. Amazon MSK: managed Apache Kafka.
  • Latency — Kinesis Data Streams: sub-second. Data Firehose: ~60 seconds (buffering). Amazon MSK: sub-second.
  • Code required — Kinesis Data Streams: yes (you write the consumer). Data Firehose: minimal. Amazon MSK: yes.
  • Destinations — Kinesis Data Streams: anywhere (you write it). Data Firehose: S3, Redshift, OpenSearch Service, Splunk. Amazon MSK: any Kafka consumer.
  • Best for — Kinesis Data Streams: real-time custom apps. Data Firehose: stream-to-warehouse ETL. Amazon MSK: Kafka-native organizations.

Amazon Lex vs Amazon Bedrock

Amazon Lex is purpose-built for structured chatbots with intents and slots. Amazon Bedrock generates free-form text via foundation models. A modern pattern is Amazon Lex for dialog orchestration combined with Amazon Bedrock for generative fallback responses.

Key Numbers and Must-Memorize Facts

  • Well-Architected ML pillar: Amazon SageMaker follows the same AWS Well-Architected principles, but you own model data and code (shared responsibility).
  • Amazon Bedrock retention: customer prompts and completions are not used to train base foundation models.
  • Amazon Kinesis Data Streams retention: 24 hours default, configurable up to 365 days.
  • Amazon Data Firehose latency: typically ~60 seconds (buffer size/interval configurable).
  • Amazon Redshift storage per cluster: petabytes (RA3 node type decouples storage and compute).
  • Amazon Athena cost model: $5 per TB scanned on-demand (store data as Parquet/ORC to cut scan costs drastically).
  • Amazon QuickSight pricing: per user (Standard/Enterprise) and per session (Enterprise).
  • Amazon Rekognition: supports stored image, stored video, and streaming video (via Amazon Kinesis Video Streams).
  • Amazon Transcribe: supports real-time (streaming) and batch transcription.
  • AWS Glue Data Catalog: Hive-compatible, shared across Amazon Athena, Amazon EMR, Amazon Redshift Spectrum.
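A quick worked example of the Amazon Athena figure above: at $5 per TB scanned, converting a 1 TB CSV dataset to compressed columnar Parquet shrinks the per-query cost proportionally. The 10x reduction below is an illustrative assumption, not a guarantee:

```python
# Worked example for the $5/TB Athena scan-pricing fact.
PRICE_PER_TB = 5.00  # USD, on-demand scan price cited above

def scan_cost(tb_scanned: float) -> float:
    """Athena on-demand cost for a query that scans the given volume."""
    return tb_scanned * PRICE_PER_TB

csv_cost = scan_cost(1.0)           # full 1 TB scanned per query
parquet_cost = scan_cost(1.0 / 10)  # assumed 10x smaller scan (illustrative)

print(f"CSV: ${csv_cost:.2f}, Parquet: ${parquet_cost:.2f}")  # CSV: $5.00, Parquet: $0.50
```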

Common Exam Traps

  • Amazon SageMaker vs Amazon Bedrock: training vs calling. Questions with "pretrained foundation models" and "generative AI API" are Amazon Bedrock. Questions with "build, train, deploy a custom model" are Amazon SageMaker.
  • Amazon Athena vs Amazon Redshift: direct S3 SQL vs OLAP data warehouse. "Ad-hoc SQL on S3 log files" is Amazon Athena. "Complex joins over a curated star schema for BI dashboards" is Amazon Redshift.
  • Kinesis Data Streams vs Data Firehose: code vs no-code. "Deliver to S3 without writing consumer code" is Amazon Data Firehose. "Process each record with AWS Lambda within milliseconds" is Amazon Kinesis Data Streams.
  • Amazon Lex vs Amazon Bedrock: structured chatbot vs free-form generation. "Voice or text bot with intents" is Amazon Lex. "Summarize or generate creative text" is Amazon Bedrock.
  • Amazon Rekognition vs Amazon Textract: images vs documents. "Detect people in a photo" is Amazon Rekognition. "Extract fields from an invoice PDF" is Amazon Textract.
  • Amazon Comprehend vs Amazon Kendra: NLP extraction vs search. "Sentiment, entities, PII" is Amazon Comprehend. "Find the right document across SharePoint and S3" is Amazon Kendra.
  • Amazon EMR vs AWS Glue: DIY Spark vs serverless ETL. "We have Spark developers running custom code" is Amazon EMR. "No-ops ETL with crawlers and Data Catalog" is AWS Glue.
  • Amazon MSK vs Kinesis: Kafka ecosystem vs AWS-native streaming. "We already use Apache Kafka" is Amazon MSK. "We want AWS-native simplicity" is Amazon Kinesis.
  • Amazon Q Business vs Amazon Q Developer: business users vs developers. Non-technical users asking questions over internal docs is Amazon Q Business. In-IDE code suggestions is Amazon Q Developer.

Amazon Kendra is a managed AI/ML search product that understands natural-language questions and is optimized for enterprise content across many connectors. Amazon OpenSearch Service is a managed search and analytics engine (fork of Elasticsearch) — you operate the cluster, define indices, and write queries. If the question emphasizes "natural-language question answering over company documents," pick Amazon Kendra. If it emphasizes "build a custom search index or log analytics dashboards," pick Amazon OpenSearch Service.

Explorer data shows generative AI questions growing +35 percent. Every recent CLF-C02 sitting includes at least one Amazon Bedrock or Amazon Q question. Do not skip this section. Memorize: Amazon Bedrock = API to foundation models; Amazon Q = pre-built AI assistants (Business, Developer, in QuickSight, in Connect). These two services are the single highest-value additions to the 2024–2026 CLF-C02 exam.

AI/ML Services vs Other Service Categories — Boundary with 3.8

Task 3.7 covers AI/ML and analytics. Task 3.8 covers everything else — application integration (Amazon SQS, Amazon SNS, Amazon EventBridge), developer tools (AWS CodePipeline), end-user compute (Amazon WorkSpaces), and IoT (AWS IoT Core). There is one friction point: Amazon Kinesis Video Streams is sometimes referenced with Amazon Rekognition (AI/ML) but technically lives in the streaming family (closer to analytics). On the exam, treat Amazon Kinesis Video Streams as an AWS analytics ingestion service that can feed Amazon Rekognition.

Another boundary: Amazon OpenSearch Service can act as a search back-end or an observability tool. If the scenario is security event analytics, it still sits in AWS analytics services, not in the security domain (Domain 2).

Scenario 1: A marketing team wants to translate product descriptions into 12 languages without hiring translators. Correct choice: Amazon Translate.

Scenario 2: A developer wants to add a chat experience to a mobile app using Anthropic Claude. Correct choice: Amazon Bedrock.

Scenario 3: A data science team needs Jupyter notebooks, distributed training, and managed deployment for a fraud-detection model. Correct choice: Amazon SageMaker.

Scenario 4: A bank wants to extract fields from scanned loan applications. Correct choice: Amazon Textract.

Scenario 5: A SaaS company wants to deliver clickstream events from their website to Amazon S3 for later analysis, with near-real-time buffering and no custom code. Correct choice: Amazon Data Firehose.

Scenario 6: An analyst wants to run ad-hoc SQL queries against JSON logs stored in Amazon S3 without provisioning any cluster. Correct choice: Amazon Athena.

Scenario 7: A BI team wants to build interactive dashboards with natural-language question answering. Correct choice: Amazon QuickSight (with Amazon Q in QuickSight).

Scenario 8: An enterprise wants employees to search across SharePoint, Salesforce, and Amazon S3 using plain-English questions. Correct choice: Amazon Kendra.

Scenario 9: The operations team needs to run complex joins across a petabyte-scale star schema for nightly BI. Correct choice: Amazon Redshift.

Scenario 10: A media company wants to detect unsafe content in user-uploaded videos. Correct choice: Amazon Rekognition.

FAQ — AWS AI/ML & Analytics Services Top Questions

1. What is the difference between Amazon Bedrock and Amazon SageMaker?

Amazon Bedrock gives you API access to pretrained foundation models (Claude, Llama, Titan, Mistral, and more) for generative AI — no training required. Amazon SageMaker is the end-to-end AWS AI/ML platform for building, training, and deploying your own custom models. If you never want to train a model, use Amazon Bedrock. If you need to train or fine-tune with your own data at the code level, use Amazon SageMaker (Amazon SageMaker JumpStart also exposes some foundation models for fine-tuning). On the CLF-C02 exam, questions mentioning "foundation models" or "generative AI" almost always map to Amazon Bedrock.

2. When should I pick Amazon Athena over Amazon Redshift?

Pick Amazon Athena when your data already lives in Amazon S3, queries are ad-hoc or infrequent, and you want zero infrastructure. Pick Amazon Redshift when you need sustained OLAP performance, complex joins, materialized views, and BI-tool connectivity for hundreds of analysts. Amazon Athena pricing is per terabyte scanned — cheap for small or infrequent queries, expensive for repeated full-table scans. Amazon Redshift is per-node-hour (or Redshift Serverless RPU) — cheaper at sustained heavy workload.

3. Is Amazon Kinesis Data Streams the same as Amazon Data Firehose?

No. Amazon Kinesis Data Streams is a durable real-time stream where you write custom consumers (AWS Lambda, Amazon Kinesis Client Library, AWS Glue streaming). Amazon Data Firehose is a no-code delivery pipeline that automatically writes streams to Amazon S3, Amazon Redshift, Amazon OpenSearch Service, or Splunk with near-real-time buffering. If the question says "custom real-time processing," choose Amazon Kinesis Data Streams. If it says "deliver straight to a destination without code," choose Amazon Data Firehose.

4. What is Amazon Q and how is it different from Amazon Bedrock?

Amazon Q is a family of pre-built AI assistants (Amazon Q Business for enterprise Q&A, Amazon Q Developer for coding, Amazon Q in QuickSight for BI, Amazon Q in Connect for agent assistance). Amazon Bedrock is the underlying AWS AI/ML service for accessing raw foundation models via API. Amazon Q is what non-technical users touch through a ready-made assistant; Amazon Bedrock is what developers call from code to build their own generative AI app. Think of Amazon Q as the finished car and Amazon Bedrock as the engine.

5. Which AWS AI/ML service should I use for OCR?

Use Amazon Textract for OCR plus structure extraction from documents, forms, and tables. Amazon Textract is superior to generic OCR because it preserves key/value pairs, table cells, and relationships — perfect for invoices, tax forms, identity documents, and medical records. Plain text-only OCR from images (such as a street sign) can also be handled by Amazon Rekognition's text-in-image detection, but any document-oriented use case (receipts, PDFs, forms) should choose Amazon Textract.

6. Do I need AWS Glue if I already use Amazon Athena?

Often yes — Amazon Athena uses the AWS Glue Data Catalog as its default metadata store. AWS Glue Crawlers auto-discover schemas in Amazon S3 and register them so Amazon Athena knows which columns exist. AWS Glue ETL jobs also transform data into columnar formats like Apache Parquet, which slashes Amazon Athena scan costs. For CLF-C02, remember: AWS Glue = serverless ETL plus Data Catalog; Amazon Athena = serverless SQL. They are complementary, not competing.

7. Are AWS AI/ML services covered under the AWS Shared Responsibility Model?

Yes. AWS owns the security of the AI/ML services (infrastructure, managed model hosting, patching). Customers own the security in the services (training data protection, IAM policies, prompt content, model artifacts, API-key management). This is identical to other managed services like Amazon RDS. For generative AI with Amazon Bedrock, customer prompts and completions are not used to train foundation models — a common compliance concern in exam scenarios.


Summary

AWS AI/ML services divide into three tiers: Amazon SageMaker (custom ML), Amazon Bedrock and Amazon Q (generative AI), and task-specific APIs (Amazon Rekognition, Amazon Comprehend, Amazon Textract, Amazon Transcribe, Amazon Polly, Amazon Translate, Amazon Lex, Amazon Personalize, Amazon Forecast, Amazon Kendra). AWS analytics services split into ingestion (Amazon Kinesis, Amazon MSK), storage and cataloging (Amazon S3 plus AWS Glue Data Catalog, AWS Lake Formation), processing (Amazon Athena, Amazon EMR, AWS Glue), warehousing (Amazon Redshift), search (Amazon OpenSearch Service), and BI (Amazon QuickSight). For CLF-C02, recognize the noun-to-service mapping, remember the SageMaker-vs-Bedrock rule, and never mix up Amazon Kinesis Data Streams with Amazon Data Firehose. Given the +25 percent trend in AWS AI/ML services exam questions and the +35 percent rise in generative AI signals, this is the single highest-ROI topic in Domain 3 to master before your CLF-C02 sitting.
