examhub.cc: the most efficient way to earn the most valuable certifications
Vol. I

Amazon CloudWatch: Metrics, Logs, and Alarms

6,800 words · about 34 minutes of reading

Amazon CloudWatch is the single most tested observability service on the AWS Certified Developer Associate (DVA-C02) exam. Task statement 4.2 — "Instrument code for observability" — hands you a compact but dense target: Amazon CloudWatch Metrics, Amazon CloudWatch Logs, Amazon CloudWatch Alarms, Amazon CloudWatch Dashboards, Amazon CloudWatch Synthetics canaries, Amazon CloudWatch RUM, plus Embedded Metric Format (EMF), subscription filters, metric filters, the CloudWatch Agent, and the difference between Amazon CloudWatch and adjacent services like AWS CloudTrail or AWS Config. This chapter trains you to recognize every Amazon CloudWatch concept the DVA-C02 exam can throw at you and to map each to the correct answer under time pressure. Amazon CloudWatch deserves 8–10 percent of your total study time; mastering Amazon CloudWatch observability is how you convert "my Lambda is broken" questions into guaranteed points.

What Is Amazon CloudWatch?

Amazon CloudWatch is the AWS-managed observability service that collects, stores, queries, visualizes, and alerts on metrics and logs from AWS services, your applications, and on-premises systems. Amazon CloudWatch ingests three signal types — metrics (numeric time series), logs (text / JSON events), and events (now rebranded as Amazon EventBridge) — and exposes them through dashboards, alarms, Logs Insights queries, Synthetics canaries, and the CloudWatch API. Every AWS service publishes a baseline of built-in Amazon CloudWatch metrics automatically, and your application code can push custom Amazon CloudWatch metrics and Amazon CloudWatch Logs events via the AWS SDK or the CloudWatch Agent.

How Amazon CloudWatch Fits the DVA-C02 Exam Map

On DVA-C02, Amazon CloudWatch appears across every domain:

  • Domain 1 (Development): AWS Lambda writes to Amazon CloudWatch Logs automatically; API Gateway access logs land in Amazon CloudWatch Logs; DynamoDB publishes throttling metrics to Amazon CloudWatch.
  • Domain 2 (Security): Amazon CloudWatch Logs encryption with AWS KMS, resource policies for log destination accounts, Amazon CloudWatch Alarms on failed login metrics.
  • Domain 3 (Deployment): AWS CodeDeploy rollback triggers on Amazon CloudWatch Alarms, Amazon CloudWatch dashboards for deployment health.
  • Domain 4 (Troubleshooting): Amazon CloudWatch Logs Insights for root cause analysis, Amazon CloudWatch Synthetics canaries for availability, Embedded Metric Format for high-cardinality metrics.

Memorize Amazon CloudWatch fundamentals and Domain 4 scoring becomes a walkover, while the cross-domain scenarios where Amazon CloudWatch is a supporting actor turn into easy points.

The Amazon CloudWatch Data Model at 30,000 Feet

Every piece of telemetry in Amazon CloudWatch lives in one of two stores. The metrics store uses namespace + metric name + dimensions as the key, holds numeric time series as the value, and keeps data for 15 months with progressive aggregation. The logs store uses log group + log stream as the key, holds timestamped event text as the value, and keeps data for a configurable period from 1 day to 10 years, or forever. Alarms watch metrics or metric math expressions. Metric filters convert matching log events into metrics. Subscription filters stream raw log events to Amazon Kinesis Data Streams, AWS Lambda, or Amazon Kinesis Data Firehose in near real time. Dashboards render metrics and Logs Insights queries. Synthetics canaries generate synthetic metrics and logs from scheduled scripts. Everything on DVA-C02 flows from these primitives.

Amazon CloudWatch in Plain Language

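Put simply, Amazon CloudWatch is AWS's "monitoring room." The heartbeats, gauges, and black-box recordings of every service hang on the walls of this one room. The three analogies below make the abstract Amazon CloudWatch concepts stick on the first pass.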

Analogy 1 — The Electrical Grid Control Room

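Think of Amazon CloudWatch as a power-grid dispatch center. Every substation in the city (an AWS service) automatically wires its voltage, current, and frequency readings (built-in metrics) to the gauge panels in the control room (Amazon CloudWatch Dashboards). When any reading crosses the red line, a warning light flashes (Amazon CloudWatch Alarms) and repair crews go out automatically (Amazon SNS notifications, Auto Scaling, Systems Manager Automation). Meanwhile, a black-box recorder (Amazon CloudWatch Logs) writes down every operation. To look back, you pick up the magnifying glass (Amazon CloudWatch Logs Insights) and search by time range or keyword. Composite alarms are the supervisor's combined rule: "it is only an emergency when substation A trips and substation B shows abnormal voltage." That rule keeps a single stray alarm from waking you in the middle of the night.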

Amazon CloudWatch = the power-grid dispatch room for cloud services: gauges, alarms, and black box in one place.

Analogy 2 — The Hospital Vital Signs Monitor

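Amazon CloudWatch is also like a hospital's vital-signs monitoring system. Every patient (an AWS resource or application instance) is hooked up to sensors (metrics collectors) that report heart rate, blood pressure, and blood oxygen every minute (standard 1-minute metrics). ICU patients get second-by-second monitoring (high-resolution 1-second metrics). The big screen at the nurses' station (Amazon CloudWatch Dashboards) lets the on-duty physician survey the whole ward at a glance. When a reading crosses the red line, the alarm sounds (Amazon CloudWatch Alarms). It only counts as an emergency when a reading breaches twice in a row and other vital signs turn abnormal at the same time (anomaly detection + composite alarm), which avoids false alarms. Nurses also make rounds on a fixed schedule with a fixed checklist (Amazon CloudWatch Synthetics canaries), so they catch broken equipment even when no real patients are in the ward.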

Amazon CloudWatch = the in-hospital monitoring system for your AWS environment: fully automatic, around the clock, covering every department.

Analogy 3 — The Black Box Flight Recorder + Air Traffic Radar

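Amazon CloudWatch is also like an aircraft's black-box flight recorder combined with air traffic control radar. Every step, every error, and every cockpit conversation of each flight (request) goes into the black box (Amazon CloudWatch Logs), so even after a crash, investigators can recover it and reconstruct what happened. The radar (Amazon CloudWatch Metrics) shows real-time altitude, speed, and heading (Invocations, Duration, Errors). The black box can stream live to the operations center (subscription filter → Kinesis / Lambda / Firehose), so people on the ground spot anomalies first. It can also be analyzed after the fact with SQL-like filter + stats queries in Amazon CloudWatch Logs Insights. When a controller hands off to another tower (cross-account / cross-Region dashboards), Amazon CloudWatch supports sharing and cross-account observability.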

Taken together, the three analogies make Amazon CloudWatch's five traits clear: centralized, real-time, queryable, alertable, and synthetic.

Observability in Amazon CloudWatch rests on three pillars:

  • Metrics: time-series numbers, retained natively for 15 months with aggregation.
  • Logs: timestamped text/JSON events in log groups, with retention configurable from 1 day to 10 years, or never expire.
  • Traces: handled by AWS X-Ray and integrated with Amazon CloudWatch via ServiceLens.

Synthetic monitoring (canaries) and real-user monitoring (RUM) add to these pillars. Canaries in particular generate signal even when there is no real traffic. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html

Amazon CloudWatch Logs: Log Groups, Log Streams, and Retention

Amazon CloudWatch Logs is the managed log store that every AWS developer touches within their first hour on AWS.

Log Groups vs Log Streams

An Amazon CloudWatch log group is the top-level container that holds shared configuration — retention policy, KMS encryption key, metric filters, subscription filters, access control. A log stream is the sequence of log events from a single source inside a log group. For Amazon CloudWatch Logs inside AWS Lambda, each execution environment (sandbox) writes to its own stream with a name like 2026/04/20/[$LATEST]abc123. For Amazon EC2 instances, the CloudWatch Agent typically creates one stream per instance per log file.

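Naming conventions you should memorize:

  • AWS Lambda writes to a log group named /aws/lambda/FUNCTION_NAME by default. Newer Lambda logging settings let you choose a different group.
  • Amazon API Gateway access logs go to whichever log group you specify in the stage's access-logging settings.
  • REST API execution logs go to API-Gateway-Execution-Logs_REST_API_ID/STAGE_NAME.

AWS Lambda creates its log group on first write, with Never Expire retention. Pre-create the group in IaC to set retention from day one.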

Retention Policies

By default, Amazon CloudWatch Logs retains logs forever, which is a silent cost grenade. You must set retention explicitly. Supported retention values (in days) are 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731 (2 years), 1096 (3 years), 1827 (5 years), 2192 (6 years), 2557 (7 years), 2922 (8 years), 3288 (9 years), 3653 (10 years), or Never Expire. Set retention per log group with aws logs put-retention-policy --log-group-name NAME --retention-in-days 30.
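Here is a minimal sketch of setting retention from code. It assumes you pass in a boto3 CloudWatch Logs client (boto3.client("logs")). The helper name set_retention and the ALLOWED_RETENTION_DAYS constant are hypothetical. The underlying call, put_retention_policy(logGroupName=..., retentionInDays=...), is the boto3 equivalent of the CLI command above. Checking the value locally fails fast with a clear message instead of waiting for a service-side validation error.

```python
# Retention values (days) accepted by CloudWatch Logs.
ALLOWED_RETENTION_DAYS = frozenset({
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
})


def set_retention(logs_client, log_group_name: str, days: int) -> None:
    """Apply a retention policy to one log group (hypothetical helper).

    logs_client is expected to be boto3.client("logs") or any object that
    exposes put_retention_policy with the same keyword arguments.
    """
    if days not in ALLOWED_RETENTION_DAYS:
        raise ValueError(f"{days} is not a supported retention value")
    logs_client.put_retention_policy(
        logGroupName=log_group_name,
        retentionInDays=days,
    )
```

For example, set_retention(boto3.client("logs"), "/aws/lambda/CheckoutFn", 30) caps that function's logs at 30 days.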

Encryption at Rest

Amazon CloudWatch Logs is always encrypted at rest. By default, AWS manages the key. For compliance, you can associate a customer-managed AWS KMS key with each log group. Amazon CloudWatch Logs then calls KMS operations such as kms:GenerateDataKey and kms:Decrypt on your behalf. The key policy must therefore allow the logs.REGION.amazonaws.com service principal to use the key, typically scoped with a kms:EncryptionContext:aws:logs:arn condition.
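The key policy statement that lets CloudWatch Logs use a customer-managed key looks roughly like the one this sketch builds. It is modeled on the statement shown in the CloudWatch Logs encryption documentation. The Region, account ID, and log group name are placeholders, and the helper name is hypothetical. Scoping the ArnLike condition to a single log group ARN stops the key from being used for any other log group.

```python
import json


def logs_kms_statement(region: str, account_id: str, log_group: str) -> dict:
    """Build a key policy statement letting CloudWatch Logs use a CMK
    for one log group (hypothetical helper; all arguments are placeholders)."""
    log_group_arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}"
    return {
        "Sid": "AllowCloudWatchLogsUse",
        "Effect": "Allow",
        # The regional CloudWatch Logs service principal.
        "Principal": {"Service": f"logs.{region}.amazonaws.com"},
        "Action": [
            "kms:Encrypt*",
            "kms:Decrypt*",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:Describe*",
        ],
        "Resource": "*",
        # Only allow use when the encryption context names this log group.
        "Condition": {
            "ArnLike": {"kms:EncryptionContext:aws:logs:arn": log_group_arn}
        },
    }


print(json.dumps(
    logs_kms_statement("us-east-1", "111122223333", "/aws/lambda/CheckoutFn"),
    indent=2,
))
```

After the statement is in the key policy, the key is attached to the log group with the logs client's associate_kms_key(logGroupName=..., kmsKeyId=...) call.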

A classic DVA-C02 trap: "the Amazon CloudWatch Logs bill is exploding — what is the cheapest fix?" The answer is almost always set a retention policy on the log groups. Default retention is Never Expire, so storage costs grow forever. Pre-creating log groups in AWS SAM or AWS CloudFormation with RetentionInDays is the idiomatic fix. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SettingLogRetention.html

Amazon CloudWatch Logs Insights: Query Syntax

Amazon CloudWatch Logs Insights is the interactive query engine for Amazon CloudWatch Logs. It parses JSON fields automatically and supports a SQL-like pipeline language.

Core Commands

Every Amazon CloudWatch Logs Insights query is a pipeline of commands separated by |:

  • fields — select which columns to surface (e.g., fields @timestamp, @message, level).
  • filter — restrict rows (e.g., filter level = "ERROR" and duration > 1000).
  • stats — aggregate (e.g., stats count() by bin(5m), stats avg(duration) by functionName).
  • sort — order results (sort @timestamp desc).
  • limit — cap rows (limit 20).
  • parse — extract fields from unstructured messages with a glob or regex pattern (parse @message "user=* action=*" as user, action).
  • display — choose which fields the result table shows.
  • dedup — de-duplicate by chosen fields.

Reserved Fields

Amazon CloudWatch Logs Insights automatically exposes @timestamp, @message, @logStream, @log, and @ingestionTime. For AWS Lambda log groups, it also discovers @type (for example REPORT), @requestId, @duration, @billedDuration, @memorySize, @maxMemoryUsed, and @initDuration. These fields are extremely useful for cold-start analysis.

Worked Examples

Find the slowest AWS Lambda invocations in the last hour:

fields @timestamp, @requestId, @duration
| filter @type = "REPORT"
| sort @duration desc
| limit 20

Count ERROR-level events per minute bucket:

fields @timestamp, @message
| filter @message like /ERROR/
| stats count() by bin(1m)

Parse a custom log line and aggregate by user:

parse @message "user=* action=* latency=*" as user, action, latency
| filter ispresent(user)
| stats avg(latency) as avgLatency, count() as requests by user
| sort requests desc
| limit 10

A typical pipeline runs fields → parse → filter → stats → sort → limit. Commands can repeat and appear in other orders, but parse must come before any filter or stats that uses the fields it extracts. On DVA-C02, parse is the command that catches candidates. It extracts named fields from unstructured text and is usually paired with filter ispresent(name) to drop non-matching rows before aggregation. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html
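Running one of these queries from code uses the asynchronous StartQuery / GetQueryResults pair. The sketch below assumes a boto3 CloudWatch Logs client. The name run_insights_query, the poll interval, and the timeout are hypothetical choices. The sleep function is injectable, so you can test the polling loop without AWS. start_query takes epoch seconds, and results come back as lists of {"field", "value"} cells, which the helper flattens into dicts.

```python
import time


def run_insights_query(logs_client, log_group, query, start, end,
                       poll_seconds=1.0, timeout_seconds=60.0, sleep=time.sleep):
    """Run a Logs Insights query and return rows as dicts (hypothetical helper).

    start and end are epoch seconds, as StartQuery expects.
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=int(start),
        endTime=int(end),
        queryString=query,
    )["queryId"]

    waited = 0.0
    while True:
        resp = logs_client.get_query_results(queryId=query_id)
        status = resp["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} cells.
            return [{c["field"]: c["value"] for c in row} for row in resp["results"]]
        if status in ("Failed", "Cancelled", "Timeout", "Unknown"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"query {query_id} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

For the slowest-invocations query, you might call run_insights_query(boto3.client("logs"), "/aws/lambda/CheckoutFn", query_text, now - 3600, now), where now = time.time().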

Pricing Model

Amazon CloudWatch Logs Insights is billed by bytes of data scanned. Minimize scan cost in three ways:

  • Query only the log groups you need.
  • Narrow the time window.
  • Pre-aggregate with metric filters when a count is all you need.

Trimming the fields list does not lower the bill, because Logs Insights charges for data scanned, not fields returned.

Amazon CloudWatch Logs Subscription Filters

Amazon CloudWatch Logs subscription filters turn Amazon CloudWatch Logs into a real-time event stream.

Supported Destinations

A subscription filter matches a log pattern and streams matching events to one of:

  • AWS Lambda — direct invocation per log batch. Classic pattern for log-triggered alerting and ETL.
  • Amazon Kinesis Data Streams — high-throughput streaming to any consumer (KCL, another AWS Lambda, custom app).
  • Amazon Kinesis Data Firehose — managed delivery to Amazon S3, Amazon Redshift, Amazon OpenSearch Service, or Splunk.

Each log group can have up to 2 subscription filters at a time. Filter patterns use the Amazon CloudWatch Logs filter-pattern grammar: space-separated terms, JSON field matchers like { $.level = "ERROR" }, or regular expressions enclosed in % characters.
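When the destination is AWS Lambda, CloudWatch Logs delivers each batch as base64-encoded, gzip-compressed JSON under event["awslogs"]["data"], with the matched events in a logEvents list. Below is a minimal sketch of a Python handler that unpacks that payload. The handler logic (returning only messages that contain ERROR) is an illustrative assumption. You configure the filter pattern and destination separately with the logs client's put_subscription_filter call, and the function needs a resource-based policy that lets CloudWatch Logs invoke it.

```python
import base64
import gzip
import json


def decode_awslogs(event: dict) -> dict:
    """Unpack the base64 + gzip payload CloudWatch Logs sends to Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context):
    """Illustrative subscriber: collect messages that contain ERROR."""
    payload = decode_awslogs(event)
    # Non-data messages (such as CONTROL_MESSAGE) carry no log events; skip them.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    errors = []
    for log_event in payload["logEvents"]:
        if "ERROR" in log_event["message"]:
            errors.append({
                "logGroup": payload["logGroup"],
                "timestamp": log_event["timestamp"],
                "message": log_event["message"],
            })
    return errors
```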

Cross-Account Subscription

Subscription filters support cross-account delivery. The source account's log group streams events to a CloudWatch Logs destination in a central aggregation account. That destination fronts an Amazon Kinesis Data Streams stream or a Kinesis Data Firehose delivery stream. This is the backbone of enterprise log pipelines.

Subscription Filter vs Metric Filter

These two are often confused:

  • Subscription filter = real-time streaming to AWS Lambda, Amazon Kinesis, or Amazon Kinesis Data Firehose. Event payload flows downstream.
  • Metric filter = pattern match that publishes a numeric metric to Amazon CloudWatch Metrics. No event payload flows; only a count or extracted value.

If the scenario says "forward error logs to an Amazon OpenSearch Service (formerly Elasticsearch) cluster" → subscription filter to Amazon Kinesis Data Firehose. If the scenario says "trigger an alarm when the word ERROR appears 5 times per minute" → metric filter that publishes to a custom Amazon CloudWatch metric, then an Amazon CloudWatch Alarm on that metric. Swapping these two is the #1 Amazon CloudWatch Logs trap. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html

Amazon CloudWatch Logs Metric Filters

Amazon CloudWatch Logs metric filters convert log patterns into numeric Amazon CloudWatch metrics.

How Metric Filters Work

A metric filter has three parts: (1) a filter pattern that matches log events, (2) a metric transformation specifying the namespace, metric name, default value, and optional extracted numeric value, and (3) optional dimensions populated from parsed log fields.

Example — publish a metric named ErrorCount to namespace MyApp every time ERROR appears in AWS Lambda logs:

Filter pattern: ERROR
Metric name:    ErrorCount
Namespace:      MyApp
Metric value:   1

Extract a numeric field for aggregation — JSON log { "latency_ms": 234 }:

Filter pattern: { $.latency_ms = * }
Metric name:    ApiLatency
Namespace:      MyApp
Metric value:   $.latency_ms

Metric filters carry no charge of their own and are evaluated at ingestion time, but the metrics they publish are billed as custom metrics. They back the classic "alarm on error count" pattern.
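You can create the ErrorCount filter above from code with a single PutMetricFilter call. The sketch assumes a boto3 CloudWatch Logs client. The helper name and the filter name are hypothetical. Setting defaultValue to 0 makes the filter emit 0 for log events that arrive but do not match. Your error-count alarm then sees explicit zeros instead of gaps while the log group is active.

```python
def create_error_count_filter(logs_client, log_group_name: str) -> None:
    """Publish MyApp/ErrorCount = 1 for every log event containing ERROR.

    Hypothetical helper; logs_client is expected to be boto3.client("logs").
    """
    logs_client.put_metric_filter(
        logGroupName=log_group_name,
        filterName="error-count",          # any name unique within the log group
        filterPattern="ERROR",             # term match (case-sensitive)
        metricTransformations=[{
            "metricName": "ErrorCount",
            "metricNamespace": "MyApp",
            "metricValue": "1",            # a string: a literal or a $.field selector
            "defaultValue": 0.0,           # emitted for events that do not match
            "unit": "Count",
        }],
    )
```

For the latency variant, metricValue would be "$.latency_ms" and filterPattern the JSON matcher shown earlier.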

Metric Filter Limits

  • Up to 100 metric filters per log group.
  • Extracted values use JSON path syntax ($.fieldName) for JSON logs, or positional parse with [ip, user, timestamp] for space-separated logs.
  • Metric filters are not retroactive — they only apply to new log events after creation.

The CloudWatch Agent: EC2 and On-Premises

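AWS Lambda, Amazon API Gateway, Amazon DynamoDB, and most managed services publish metrics and logs to Amazon CloudWatch automatically. Amazon EC2 publishes only hypervisor-level metrics on its own, and on-premises servers publish nothing. To collect OS-level metrics and log files from either, you need the CloudWatch Agent. This is the unified CloudWatch agent, which replaces the older, deprecated CloudWatch Logs agent.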

What the CloudWatch Agent Collects

The CloudWatch Agent ships two signal families:

  • System metrics beyond the hypervisor: memory used, swap used, disk filesystem used, process count, TCP connection state. These cannot be seen from outside the instance — the EC2 hypervisor only sees CPU, network, and disk I/O.
  • Logs: any file on disk (/var/log/nginx/access.log, Windows Event Log, IIS logs) streamed to Amazon CloudWatch Logs.

The agent reads a JSON config file that declares which metrics and logs to collect. The file is typically stored in AWS Systems Manager Parameter Store and pushed to instances with Systems Manager Run Command or State Manager.
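Below is a minimal agent configuration that adds memory and filesystem metrics and ships one log file. It is written as a Python dictionary so you can serialize it and store it in Parameter Store. The section names (agent, metrics, metrics_collected, mem, disk, logs, logs_collected, files, collect_list) follow the agent's documented schema as best understood here, so check them against the agent configuration reference. The log group name and file path are illustrative.

```python
import json

AGENT_CONFIG = {
    "agent": {"metrics_collection_interval": 60},
    "metrics": {
        # Tag every data point with the EC2 instance ID.
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        },
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [{
                    "file_path": "/var/log/nginx/access.log",
                    "log_group_name": "nginx-access",      # illustrative name
                    "log_stream_name": "{instance_id}",    # one stream per instance
                }]
            }
        }
    },
}

config_json = json.dumps(AGENT_CONFIG, indent=2)
print(config_json)
```

Suppose you store config_json in a parameter named AmazonCloudWatch-linux (an assumed name). The agent can then load it with amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c ssm:AmazonCloudWatch-linux.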

On-Premises Support

The CloudWatch Agent runs on on-premises servers using an IAM user's access keys or an IAM Roles Anywhere profile. The same agent binary supports Linux and Windows. This is how hybrid architectures unify observability in Amazon CloudWatch.

Memory and Disk Metrics Are Not Built-In

A frequently missed DVA-C02 detail: Amazon EC2 memory usage and disk-filesystem usage are NOT built-in Amazon CloudWatch metrics. You must install the CloudWatch Agent (or publish custom metrics) to get them. Built-in EC2 metrics cover CPU, network, disk I/O (from the hypervisor), and status checks — not memory, not filesystem usage.

On DVA-C02, if the scenario asks "how do I monitor Amazon EC2 memory usage?" the answer is install the CloudWatch Agent — NOT "use built-in Amazon CloudWatch metrics," because memory is not a built-in metric. Same for disk free space / filesystem usage. This one-liner answers several distinct questions. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html

Embedded Metric Format (EMF)

Embedded Metric Format (EMF) is the modern, idiomatic way to publish custom metrics from AWS Lambda. It avoids the latency of a synchronous PutMetricData call and keeps high-cardinality context in the log instead of in metric dimensions.

How EMF Works

You emit a structured JSON log line that follows the EMF schema — a top-level _aws block lists the metric name, unit, and dimensions, while sibling properties carry values and dimension values. Amazon CloudWatch Logs automatically parses EMF-shaped events and extracts the metrics into Amazon CloudWatch Metrics at ingestion time, no PutMetricData API call required.

Minimal EMF log line (runtime-agnostic; a Node.js or Python AWS Lambda function emits the same JSON):

{
  "_aws": {
    "Timestamp": 1713571200000,
    "CloudWatchMetrics": [{
      "Namespace": "MyApp",
      "Dimensions": [["Service", "Operation"]],
      "Metrics": [{ "Name": "Latency", "Unit": "Milliseconds" }]
    }]
  },
  "Service": "CheckoutService",
  "Operation": "CreateOrder",
  "Latency": 123,
  "RequestId": "abc-123",
  "UserId": "u-42"
}

The ingest parses Latency = 123 as an Amazon CloudWatch metric with dimensions Service=CheckoutService, Operation=CreateOrder, while RequestId and UserId remain in the log event as searchable fields — high cardinality on the log side, safe cardinality on the metric side.
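Emitting that line from Python needs only the standard library: print one JSON object per metric event to stdout, and Lambda forwards it to CloudWatch Logs. The helper below is a hand-rolled sketch of what the aws-embedded-metrics library does for you. Its name and arguments are hypothetical. It reproduces the structure of the example above, with the timestamp in epoch milliseconds.

```python
import json
import time


def emf_line(namespace, dimensions, metrics, properties=None, timestamp_ms=None):
    """Build one EMF log line (hypothetical helper).

    dimensions: low-cardinality name -> value pairs, published as metric dimensions.
    metrics:    name -> (value, unit) pairs, published as metric values.
    properties: high-cardinality context kept only in the log event.
    """
    doc = {
        "_aws": {
            "Timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # one dimension set: the dict's keys
                "Metrics": [{"Name": n, "Unit": u} for n, (_, u) in metrics.items()],
            }],
        },
    }
    doc.update(dimensions)                              # dimension values
    doc.update({n: v for n, (v, _) in metrics.items()})  # metric values
    doc.update(properties or {})                        # searchable log-only fields
    return json.dumps(doc)


# In a Lambda handler, printing the line is the whole "publish" step.
print(emf_line(
    "MyApp",
    {"Service": "CheckoutService", "Operation": "CreateOrder"},
    {"Latency": (123, "Milliseconds")},
    {"RequestId": "abc-123", "UserId": "u-42"},
    timestamp_ms=1713571200000,
))
```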

Why EMF Wins for AWS Lambda

  • No synchronous API call: emitting a log line is essentially free, while PutMetricData adds a network round trip to every invocation and is billed per API request.
  • Atomic — metric and log event are emitted together in one line; they cannot get out of sync.
  • High-cardinality safe — you keep UserId, RequestId, OrderId in the log for debugging but publish only low-cardinality dimensions to Amazon CloudWatch Metrics.
  • Works out of the box — AWS provides the aws-embedded-metrics library for Node.js, Python, Java, and .NET.

The DVA-C02 Exam Guide v2.1 (December 2024) explicitly added "Embedded Metric Format" to Task 4.2. Expect at least one question that contrasts EMF with PutMetricData: EMF is the preferred pattern from AWS Lambda because it avoids an extra API call, keeps high-cardinality attributes in the log, and publishes low-cardinality dimensions to Amazon CloudWatch Metrics in one atomic log write. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Embedded_Metric_Format.html

Amazon CloudWatch Logs Live Tail

Amazon CloudWatch Logs Live Tail is the interactive tail-follow experience for debugging live systems.

What Live Tail Does

Live Tail streams new log events as they are ingested, across one or more log groups, with optional filter patterns. It replaces the old awkward "refresh the console" workflow for real-time debugging. Live Tail is ideal for verifying a deployment is writing healthy logs, reproducing a bug while watching AWS Lambda output, or tailing an Amazon EKS cluster's control plane logs.

Live Tail Limits and Cost

  • Each Live Tail session lasts up to 3 hours before you must restart.
  • Billed per minute of active session, per log group included.
  • Supports up to 10 log groups per session (check current limits).
  • The filter pattern must match the Amazon CloudWatch Logs filter-pattern syntax.

For DVA-C02, recognize Live Tail as the "tail -f for the cloud" counterpart to Amazon CloudWatch Logs Insights (historical queries).

Amazon CloudWatch Metrics: Namespaces and Dimensions

Amazon CloudWatch Metrics is a time-series database indexed by namespace, metric name, and a set of dimensions.

Namespaces

A namespace is the top-level metric container — AWS services publish to built-in namespaces like AWS/Lambda, AWS/EC2, AWS/DynamoDB, AWS/ApiGateway. Your custom metrics go under any namespace you choose (typically MyApp/Service). Namespaces cannot start with AWS/ — that prefix is reserved for AWS services.

Dimensions

A dimension is a name-value pair that scopes a metric. An AWS Lambda Invocations metric has the FunctionName dimension. DynamoDB's ConsumedReadCapacityUnits has TableName, plus GlobalSecondaryIndexName for index-level data. A metric with no dimensions and a metric with dimensions are different time series. Amazon CloudWatch does not automatically aggregate your custom metrics across dimension combinations. If you publish only with Service and Operation, you cannot retrieve a Service-only total unless you also publish that combination.

You can attach up to 30 dimensions per metric. Each unique (namespace + metric name + full dimension set) is a separate custom metric billed individually — high cardinality in dimensions is the easiest way to blow up the Amazon CloudWatch bill.
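To see why a per-user dimension is expensive, this sketch counts distinct time series the way billing does: one per unique namespace + metric name + full dimension set. It is a local model for intuition only, with made-up data points, and it does not call CloudWatch.

```python
def count_time_series(data_points):
    """Count unique (namespace, metric name, dimension set) combinations.

    Each data point is (namespace, metric_name, {dimension_name: value}).
    Dimension order does not matter, so the dict is frozen into a frozenset.
    """
    return len({(ns, name, frozenset(dims.items())) for ns, name, dims in data_points})


# 1,000 users hitting 2 operations.
low = [("MyApp", "Latency", {"Operation": op}) for op in ("Get", "Put") for _ in range(1000)]
high = [("MyApp", "Latency", {"Operation": op, "UserId": f"u-{i}"})
        for op in ("Get", "Put") for i in range(1000)]

print(count_time_series(low))   # prints 2
print(count_time_series(high))  # prints 2000
```

The same traffic produces either 2 or 2,000 billable custom metrics, depending only on whether UserId is a dimension.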

Standard vs High-Resolution Metrics

Amazon CloudWatch supports two resolution tiers:

  • Standard resolution — data points at 1-minute granularity. This is the default for AWS services and custom metrics published with default StorageResolution=60.
  • High-resolution — data points at 1-second granularity. You publish with StorageResolution=1 in PutMetricData. High-resolution metrics cost more and only keep 1-second granularity for 3 hours (then aggregate up to 1 minute for 15 days, 5 minutes for 63 days, and 1 hour for 15 months).

Metric Retention Ladder

Amazon CloudWatch retains metrics for 15 months total with automatic aggregation:

  • 1-second data (high-resolution only): 3 hours.
  • 1-minute data: 15 days.
  • 5-minute data: 63 days.
  • 1-hour data: 15 months.

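Beyond 15 months, metrics roll off. For longer retention, export the data yourself. You can read it with the GetMetricData API and write it to Amazon S3 or a data warehouse, or stream it continuously with CloudWatch Metric Streams.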

Publishing Custom Metrics with PutMetricData

The traditional way to publish custom Amazon CloudWatch metrics is the PutMetricData API.

API Shape

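import boto3

# Client for the CloudWatch metrics API (credentials and Region come from the environment).
cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "OrdersProcessed",
        "Dimensions": [{"Name": "Env", "Value": "prod"}],
        "Value": 1.0,
        "Unit": "Count",
        "StorageResolution": 60,  # 60 = standard resolution; 1 = high resolution
    }],
)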

PutMetricData Best Practices

  • Batch up to 1,000 metric data items per call to reduce API overhead and cost.
  • Set the right Unit (Count, Seconds, Milliseconds, Bytes, Percent, etc.) so graphs and alarm thresholds are interpreted consistently. Amazon CloudWatch does not convert between units for you.
  • Prefer EMF from AWS Lambda: PutMetricData adds synchronous latency and is billed per call, and EMF avoids both.
  • Never publish high-cardinality dimensions: UserId as a dimension means one custom metric per user, a billing disaster. Use EMF to keep those attributes in the log and publish only aggregated metrics.

Amazon CloudWatch Metrics is billed per unique combination of (namespace + metric name + dimensions). Putting UserId, OrderId, or RequestId as a dimension creates one billable custom metric per unique value. Use EMF to retain high-cardinality fields as log properties while publishing only low-cardinality dimensions as metrics. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html
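Batching is mostly bookkeeping: split the data points into chunks no larger than the per-request limit and send one PutMetricData call per chunk. Here is a minimal sketch. It assumes a boto3 CloudWatch client and the 1,000-item limit quoted above; the overall request-size limit still applies, so keep individual items small. The helper name is hypothetical.

```python
def put_metrics_batched(cloudwatch_client, namespace, metric_data, batch_size=1000):
    """Send metric_data in chunks of at most batch_size items per PutMetricData call.

    Hypothetical helper; cloudwatch_client is expected to be boto3.client("cloudwatch").
    Returns the number of API calls made.
    """
    calls = 0
    for start in range(0, len(metric_data), batch_size):
        cloudwatch_client.put_metric_data(
            Namespace=namespace,
            MetricData=metric_data[start:start + batch_size],
        )
        calls += 1
    return calls
```

With 2,500 buffered data points, this makes 3 calls instead of 2,500.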

Amazon CloudWatch Alarms

Amazon CloudWatch Alarms watch a metric (or a metric math expression) and transition through three states.

Alarm States

  • OK — most recent data points are inside the threshold.
  • ALARM — most recent data points breached the threshold.
  • INSUFFICIENT_DATA — not enough data points to decide (new alarm, or metric source stopped reporting).

You configure: metric + statistic (Average, Sum, Maximum, Minimum, p99, etc.) + period (60 s, 300 s, ...) + evaluation periods (how many periods to consider) + datapoints-to-alarm (how many of those periods must breach). Example: "CPU p95 > 80% for 3 of 5 evaluation periods of 1 minute" gives a robust alarm that shrugs off single-minute spikes.

Treat Missing Data

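The TreatMissingData parameter decides how gaps are interpreted:

  • missing (the default): gaps are skipped, and if every data point in the evaluation range is missing, the alarm goes to INSUFFICIENT_DATA.
  • notBreaching: gaps count as within the threshold.
  • breaching: gaps count as breaching.
  • ignore: the current alarm state is kept.

The choice matters for sparse metrics like error counts, which only publish when errors happen.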
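The "CPU p95 > 80% for 3 of 5 one-minute periods" alarm maps onto PutMetricAlarm parameters as shown below. This sketch assumes a boto3 CloudWatch client. The instance ID and SNS topic ARN are placeholders, and the helper is hypothetical. Percentiles go in ExtendedStatistic rather than Statistic. The sketch also assumes the metric supports percentile statistics and that the instance has detailed monitoring, since EC2 basic monitoring reports at 5-minute intervals.

```python
def put_cpu_p95_alarm(cloudwatch_client, instance_id, sns_topic_arn):
    """Alarm when CPU p95 > 80% in 3 of the last 5 one-minute periods (hypothetical helper)."""
    cloudwatch_client.put_metric_alarm(
        AlarmName=f"cpu-p95-high-{instance_id}",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        ExtendedStatistic="p95",          # percentiles use ExtendedStatistic, not Statistic
        Period=60,                        # 1-minute periods
        EvaluationPeriods=5,              # look at the last 5 periods...
        DatapointsToAlarm=3,              # ...and alarm if 3 of them breach
        Threshold=80.0,
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="missing",       # the default gap handling
        AlarmActions=[sns_topic_arn],     # notify on transition to ALARM
        OKActions=[sns_topic_arn],        # and again on recovery
    )
```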

Alarm Actions

Amazon CloudWatch Alarms can trigger actions on state change:

  • Amazon SNS topic — the universal fan-out (email, SMS, Slack via AWS Chatbot, AWS Lambda, HTTPS webhook).
  • Auto Scaling action — scale up / scale down the Auto Scaling group.
  • Amazon EC2 action — stop, terminate, reboot, or recover the instance that hosts the metric.
  • AWS Systems Manager action: create an OpsCenter OpsItem or an Incident Manager incident.
  • AWS Lambda function (via the newer alarm action) — trigger arbitrary remediation.

Composite Alarms

A composite alarm combines the states of multiple child alarms with a boolean expression:

ALARM("HighCPU") AND (ALARM("HighErrorRate") OR ALARM("LowDisk"))

Composite alarms suppress alarm storms. When an upstream component fails, downstream alarms cascade, and a composite condition catches the real root cause while staying quiet on the noise. Composite alarms also support an actions suppressor (ActionsSuppressor). While a designated suppressor alarm, such as a maintenance-window alarm, is in ALARM, the composite alarm's actions are muted.
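Creating the composite alarm above, with a maintenance-window alarm as its actions suppressor, takes one PutCompositeAlarm call. This sketch assumes a boto3 CloudWatch client. The child alarm names come from the rule above. MaintenanceWindow, the SNS topic ARN, and the wait and extension periods are placeholders chosen for illustration.

```python
def put_checkout_composite(cloudwatch_client, sns_topic_arn):
    """Page only when HighCPU co-occurs with HighErrorRate or LowDisk (hypothetical helper)."""
    cloudwatch_client.put_composite_alarm(
        AlarmName="checkout-degraded",
        AlarmRule='ALARM("HighCPU") AND (ALARM("HighErrorRate") OR ALARM("LowDisk"))',
        AlarmActions=[sns_topic_arn],
        # While the MaintenanceWindow alarm is in ALARM, this composite alarm
        # still changes state, but its actions are suppressed.
        ActionsSuppressor="MaintenanceWindow",
        ActionsSuppressorWaitPeriod=120,       # seconds to wait for the suppressor to enter ALARM
        ActionsSuppressorExtensionPeriod=300,  # keep suppressing this long after it leaves ALARM
    )
```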

Metric Math

Amazon CloudWatch alarms can evaluate metric math expressions rather than a single metric. Use this to divide error count by total invocations for an error rate, sum metrics across dimensions, or apply anomaly detection.

Example error-rate alarm: fire when Errors / Invocations exceeds 0.05 in 3 of the last 5 five-minute periods:

m1 = Errors      (AWS/Lambda, FunctionName=CheckoutFn, Sum, 300s)
m2 = Invocations (AWS/Lambda, FunctionName=CheckoutFn, Sum, 300s)
e1 = m1 / m2
Alarm: e1 > 0.05 for 3 of 5 datapoints
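Here is the same error-rate alarm expressed as PutMetricAlarm input. The m1, m2, and e1 IDs become entries in the Metrics list, only e1 sets ReturnData to True, and the threshold applies to that expression. This sketch assumes a boto3 CloudWatch client; the alarm name and helpers are hypothetical. One assumption worth stating: in a period with no invocations, m2 has no data point and e1 has no value. The sketch treats that missing data as notBreaching.

```python
def lambda_metric(metric_id, metric_name, function_name):
    """One MetricDataQuery entry for a Lambda metric summed over 5 minutes."""
    return {
        "Id": metric_id,
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/Lambda",
                "MetricName": metric_name,
                "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
            },
            "Period": 300,
            "Stat": "Sum",
        },
        "ReturnData": False,  # inputs only; the alarm watches e1
    }


def put_error_rate_alarm(cloudwatch_client, function_name="CheckoutFn"):
    """Alarm when Errors / Invocations > 0.05 in 3 of 5 five-minute periods (hypothetical helper)."""
    cloudwatch_client.put_metric_alarm(
        AlarmName=f"{function_name}-error-rate",
        Metrics=[
            lambda_metric("m1", "Errors", function_name),
            lambda_metric("m2", "Invocations", function_name),
            {"Id": "e1", "Expression": "m1 / m2", "Label": "ErrorRate", "ReturnData": True},
        ],
        EvaluationPeriods=5,
        DatapointsToAlarm=3,
        Threshold=0.05,
        ComparisonOperator="GreaterThanThreshold",
        # No invocations -> no e1 value; do not count that as an error.
        TreatMissingData="notBreaching",
    )
```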

Anomaly Detection Bands

Amazon CloudWatch anomaly detection fits a machine-learning model (seasonal + trend) on a metric's history and emits an expected band of values. You alarm when the metric leaves the band, not on a static threshold. Anomaly detection handles daily and weekly seasonality automatically — ideal for traffic-driven metrics that static thresholds cannot pin down.
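An anomaly-detection alarm is also a metric math alarm. One entry returns the metric, and another returns ANOMALY_DETECTION_BAND(m1, 2), a band two standard deviations wide. ThresholdMetricId then points at the band instead of a static Threshold. This sketch assumes a boto3 CloudWatch client. The alarm name, the MyApp namespace, the RequestCount metric, and the band width are placeholders.

```python
def put_anomaly_alarm(cloudwatch_client):
    """Alarm when MyApp/RequestCount leaves its expected band (hypothetical helper)."""
    cloudwatch_client.put_metric_alarm(
        AlarmName="request-count-anomaly",
        Metrics=[
            {
                "Id": "m1",
                "MetricStat": {
                    "Metric": {"Namespace": "MyApp", "MetricName": "RequestCount"},
                    "Period": 300,
                    "Stat": "Sum",
                },
                "ReturnData": True,
            },
            {
                "Id": "ad1",
                # Band width of 2 standard deviations around the model's expected value.
                "Expression": "ANOMALY_DETECTION_BAND(m1, 2)",
                "Label": "RequestCount (expected)",
                "ReturnData": True,
            },
        ],
        ThresholdMetricId="ad1",   # compare m1 against the band, not a fixed number
        ComparisonOperator="LessThanLowerOrGreaterThanUpperThreshold",
        EvaluationPeriods=3,
        DatapointsToAlarm=2,
    )
```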

DVA-C02 increasingly tests "how do I avoid false alarms while catching real incidents?" The answer combines three techniques. Use (a) anomaly detection bands instead of static thresholds for seasonal metrics, (b) composite alarms to require co-occurring signals, and (c) an actions suppressor alarm (ActionsSuppressor) to mute composite alarm notifications during known maintenance. Memorize this triad. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Create_Composite_Alarm_How_To.html

Amazon CloudWatch Dashboards

Amazon CloudWatch Dashboards are the visualization layer — a canvas of widgets driven by metrics, metric math, and Logs Insights queries.

Widget Types

  • Line, stacked area, bar, number — time-series metric widgets.
  • Text (Markdown) — runbook snippets, contact info.
  • Log table — render an Amazon CloudWatch Logs Insights query result.
  • Alarm status — show current alarm states.
  • Explorer — auto-grouped view over tags or resource groups.

Each widget holds a metric or metric-math expression, a time range, and a statistic. Dashboards can be auto-refreshed at 10 s, 1 min, 2 min, 5 min, or 15 min intervals.
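A dashboard is stored as a JSON document (DashboardBody) of positioned widgets, which PutDashboard accepts as a string. This sketch assumes a boto3 CloudWatch client. The dashboard name, Region, function name, and runbook text are placeholders. Metric entries use the array form [namespace, metric name, dimension name, dimension value].

```python
import json


def dashboard_body(function_name="CheckoutFn", region="us-east-1"):
    """Two widgets: a Markdown runbook note and a Lambda errors graph."""
    return {
        "widgets": [
            {
                "type": "text",
                "x": 0, "y": 0, "width": 24, "height": 2,
                "properties": {"markdown": "## Checkout\nOn-call runbook: see team wiki."},
            },
            {
                "type": "metric",
                "x": 0, "y": 2, "width": 12, "height": 6,
                "properties": {
                    "metrics": [["AWS/Lambda", "Errors", "FunctionName", function_name]],
                    "stat": "Sum",
                    "period": 300,
                    "region": region,
                    "title": f"{function_name} errors",
                    "view": "timeSeries",
                },
            },
        ]
    }


def put_dashboard(cloudwatch_client, name="checkout"):
    """Create or overwrite the dashboard (hypothetical helper)."""
    cloudwatch_client.put_dashboard(
        DashboardName=name,
        DashboardBody=json.dumps(dashboard_body()),
    )
```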

Shared and Cross-Account Dashboards

  • Shareable dashboards: grant read-only access to specific email addresses (viewers sign in with a password), share publicly via a link, or share through a single sign-on (SSO) provider.
  • Cross-account dashboards — with AWS Organizations + the Amazon CloudWatch cross-account observability feature, one central monitoring account can render metrics and logs from many source accounts on one dashboard. This is the foundation of centralized platform engineering observability.

Pricing is flat: $3.00 per dashboard per month, and the free tier covers 3 dashboards with up to 50 metrics each.

Amazon CloudWatch Synthetics Canaries

Amazon CloudWatch Synthetics canaries are scheduled scripts that simulate user behavior against your endpoints — the synthetic-monitoring side of observability.

Canary Blueprints

You pick from standard blueprints or write custom scripts:

  • Heartbeat Monitor — hit an HTTPS URL and verify a 200 response.
  • API Canary — sequence of REST calls with assertions on response body and headers.
  • Broken Link Checker — crawl a page and verify every link resolves.
  • Visual Monitor — screenshot comparison against a baseline.
  • GUI Workflow — Puppeteer / Selenium-style scripted user journey (login, add to cart, checkout) using the Amazon CloudWatch Synthetics Recorder.

Canaries run on a schedule written either as a rate expression (from rate(1 minute) upward, e.g. rate(5 minutes)) or as a cron expression. They publish metrics (SuccessPercent, Duration, 4xx, 5xx) to the CloudWatchSynthetics namespace, plus screenshots and HAR files to Amazon S3.

Canary Scripts Are AWS Lambda Under the Hood

A canary runs as an AWS Lambda function you do not manage directly — Amazon CloudWatch Synthetics packages the Node.js / Python + Puppeteer / Selenium runtime for you. You provide the script, Amazon CloudWatch runs it.

Alarms on Canary Metrics

Canaries are most valuable when paired with Amazon CloudWatch Alarms on SuccessPercent < 100 — this catches outages the moment they start, before a real user complains. Canaries run 24/7, so they provide baseline availability signal even on 3 a.m. no-traffic periods.

CloudWatch RUM: The Real-User Counterpart

Amazon CloudWatch RUM (Real User Monitoring) is the real-user, browser-side telemetry that complements canaries. You embed the RUM JavaScript snippet in your web app. It then reports Core Web Vitals (such as LCP and CLS), page load times, JavaScript errors, and user session flow to Amazon CloudWatch. Together, RUM and Synthetics canaries cover the full "real-user vs synthetic" matrix.

Canaries = simulated user, runs on schedule, catches outages in no-traffic windows. RUM = real user in the browser, runs only when someone is using your app, catches per-geo / per-browser regressions. Use both — canaries for availability SLOs, RUM for performance SLOs. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries.html

ServiceLens and Contributor Insights

Two advanced Amazon CloudWatch features appear on DVA-C02 as supporting actors.

ServiceLens

Amazon CloudWatch ServiceLens stitches together AWS X-Ray traces, Amazon CloudWatch Metrics, Amazon CloudWatch Logs, and Synthetics canaries into one unified service map. You see each node's health (alarm status), metrics (latency, error rate), traces (slowest operations), and logs (recent errors) without switching consoles. ServiceLens is the recommended starting point for root-cause analysis on microservice architectures.

Contributor Insights

Amazon CloudWatch Contributor Insights identifies the top N contributors to a metric over a time window — e.g., "which 5 IP addresses sent the most requests to the ALB," "which 10 DynamoDB partition keys drove the most throttles." You define rules that parse log fields and group by contributor key. Contributor Insights is the fastest way to find the hot-key producer in a DynamoDB throttling scenario.
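A log-based Contributor Insights rule is a JSON definition passed to PutInsightRule. The sketch below counts requests per client IP. It assumes a boto3 CloudWatch client and JSON-formatted access logs with an ip field. The rule name, log group, and $.ip path are illustrative. The schema fields follow the CloudWatchLogRule format as understood here, so check them against the rule syntax reference before use. For the DynamoDB hot-key case, you would instead enable Contributor Insights on the table itself, for example with the DynamoDB UpdateContributorInsights API.

```python
import json


def ip_rule_definition(log_group_name):
    """Top-N rule: count log events per $.ip value."""
    return {
        "Schema": {"Name": "CloudWatchLogRule", "Version": 1},
        "LogGroupNames": [log_group_name],
        "LogFormat": "JSON",
        "Contribution": {
            "Keys": ["$.ip"],   # the contributor key
            "Filters": [],      # no filtering: every event counts
        },
        "AggregateOn": "Count",
    }


def put_ip_rule(cloudwatch_client, log_group_name="/myapp/access"):
    """Create and enable the rule (hypothetical helper)."""
    cloudwatch_client.put_insight_rule(
        RuleName="top-client-ips",
        RuleState="ENABLED",
        RuleDefinition=json.dumps(ip_rule_definition(log_group_name)),
    )
```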

On DVA-C02, the scenario "which partition key is throttling my DynamoDB table?" points to Contributor Insights. The scenario "a user reports slow checkout — trace it across services" points to ServiceLens. Different tools, different questions. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ServiceLens.html

Amazon CloudWatch vs AWS CloudTrail vs AWS Config

The DVA-C02 exam loves to confuse these three. Scope them cleanly:

  • Amazon CloudWatch: operational observability. Metrics (numeric time series), Logs (application / system log events), Alarms, Dashboards. Answers "is my application healthy right now?"
  • AWS CloudTrail: audit log of AWS API calls. Every call to AWS APIs: who called, when, from which IP, which resource, what parameters. Answers "who deleted my S3 bucket?" CloudTrail events can be delivered to Amazon CloudWatch Logs for alerting (e.g., alarm on DeleteBucket or root-user login).
  • AWS Config: configuration compliance. Snapshots the configuration of every resource over time and evaluates rules. Answers "is my resource compliant with my policies, and when did it drift?"

AWS CloudTrail + Amazon CloudWatch Logs is the idiomatic way to alarm on suspicious API activity. AWS Config + AWS Lambda remediation is the idiomatic way to auto-fix compliance drift. Amazon CloudWatch alone covers app-level health.

"Who called the AWS API?" = AWS CloudTrail. "Is my resource configured correctly over time?" = AWS Config. "Is my application healthy right now?" = Amazon CloudWatch. When a scenario asks about API-call auditing, do not answer Amazon CloudWatch Logs alone — the source of truth is AWS CloudTrail, which may stream into Amazon CloudWatch Logs as a downstream target. Reference: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html

Amazon CloudWatch Cost Considerations

Amazon CloudWatch pricing is where costs grow silently. The DVA-C02 exam occasionally asks for the "cheapest way to reduce the Amazon CloudWatch bill."

Amazon CloudWatch Logs Cost Drivers

  • Ingestion — billed per GB ingested (the dominant cost for chatty apps).
  • Storage — billed per GB per month stored, compounded by default Never Expire retention.
  • Logs Insights queries — billed per GB scanned.
  • Live Tail — billed per minute per log group.

Biggest wins: set retention policies, trim log verbosity in hot paths, use log levels, prefer EMF (one structured event carries both the metric and its context, instead of several separate prints), and use metric filters to count occurrences instead of downloading logs to count them.

Amazon CloudWatch Metrics Cost Drivers

  • Custom metrics — billed per unique (namespace + metric name + dimension set) per month. High-cardinality dimensions multiply fast.
  • PutMetricData API calls — billed per 1,000 calls. Batch up to 1,000 data points per call.
  • High-resolution metrics — cost more than standard-resolution. Use only when you truly need 1-second granularity.
  • GetMetricData / GetMetricStatistics API — billed per metric queried.

Biggest wins: avoid user/order/request IDs as dimensions (use EMF to keep them in logs), batch PutMetricData, and use standard resolution unless you need sub-minute granularity.
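The batching and low-cardinality advice can be sketched as below. It assumes a boto3 CloudWatch client and uses the 1,000-entries-per-call figure quoted above. Each request is also capped in payload size (1 MB), which this sketch does not check. The namespace, metric name, and `Route` dimension are hypothetical.

```python
from typing import Iterator

MAX_PER_CALL = 1000  # per-request MetricData cap quoted above


def chunk(items: list, size: int = MAX_PER_CALL) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def latency_datum(route: str, ms: float) -> dict:
    # The only dimension is the route (low cardinality); no user or request IDs here.
    return {
        "MetricName": "Latency",
        "Dimensions": [{"Name": "Route", "Value": route}],
        "Value": ms,
        "Unit": "Milliseconds",
        "StorageResolution": 60,  # standard resolution; 1 would make it high-resolution
    }


def publish(client, namespace: str, data: list) -> int:
    """Send data in batches; return the number of PutMetricData calls made."""
    calls = 0
    for batch in chunk(data):
        client.put_metric_data(Namespace=namespace, MetricData=batch)
        calls += 1
    return calls
```

With 2,500 data points this makes three calls (1,000 + 1,000 + 500) instead of 2,500.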

Amazon CloudWatch Alarms and Dashboards Cost

  • Standard alarms — $0.10 per alarm per month.
  • High-resolution alarms — $0.30 per alarm per month.
  • Composite alarms — $0.50 per alarm per month.
  • Anomaly detection alarms — $0.30 per alarm per month.
  • Dashboards — $3.00 per dashboard per month (first 3 with up to 50 metrics free).

Route paging through composite alarms: one composite parent paging the on-call is far quieter than 20 child alarms paging separately. The child alarms still exist and are still billed, so keep each one focused.

Amazon CloudWatch Common Exam Traps

Knowing the traps is worth as many points as knowing the docs.

Trap 1 — Subscription Filter vs Metric Filter

Subscription filter = stream events to AWS Lambda, Amazon Kinesis Data Streams, or Amazon Kinesis Data Firehose. Metric filter = publish a number to Amazon CloudWatch Metrics. Pick based on whether the downstream needs the event payload or just a count.
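For contrast with a metric filter, this sketch builds the arguments for the CloudWatch Logs `put_subscription_filter` call. Matching events themselves, not a count, are streamed to a Lambda function. The filter name and ARN are hypothetical. A Lambda destination also needs a resource-based permission allowing `logs.amazonaws.com` to invoke it; `roleArn` is only used for Kinesis and Firehose destinations, so it is omitted here.

```python
def lambda_subscription(log_group: str, function_arn: str) -> dict:
    """kwargs for CloudWatch Logs put_subscription_filter: stream ERROR lines to Lambda."""
    return {
        "logGroupName": log_group,
        "filterName": "errors-to-lambda",  # hypothetical name
        "filterPattern": "ERROR",          # an empty string "" would forward every event
        "destinationArn": function_arn,    # the downstream receives the full event payload
    }
```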

Trap 2 — Default Log Retention Is Forever

Amazon CloudWatch Logs default retention is Never Expire. Set RetentionInDays explicitly in your IaC to cap storage cost.
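In CloudFormation this is the `RetentionInDays` property of `AWS::Logs::LogGroup`. The same setting through the API can be sketched as below, assuming a boto3 CloudWatch Logs client. The accepted values are taken from the PutRetentionPolicy reference as the author of this sketch recalls it, so re-check that list before relying on it.

```python
# Retention values CloudWatch Logs accepts (days); having no policy means Never Expire.
VALID_RETENTION_DAYS = {
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
}


def set_retention(client, log_group: str, days: int) -> None:
    """Cap storage cost by applying an explicit retention policy to one log group."""
    if days not in VALID_RETENTION_DAYS:
        raise ValueError(f"{days} is not an accepted retention value")
    client.put_retention_policy(logGroupName=log_group, retentionInDays=days)
```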

Trap 3 — EC2 Memory / Disk Not Built-In

Built-in Amazon EC2 metrics do not include memory or filesystem usage. Install the CloudWatch Agent; no built-in metric can answer an exam scenario that asks for memory or disk-space utilization.

Trap 4 — EMF vs PutMetricData From AWS Lambda

Inside AWS Lambda, prefer EMF (structured log → metric) over PutMetricData (synchronous API call). V2.1 emphasis.

Trap 5 — CloudWatch vs CloudTrail vs Config

Operational health = Amazon CloudWatch. API audit = AWS CloudTrail. Configuration compliance = AWS Config. Three different questions, three different services.

Trap 6 — Standard vs High-Resolution Metrics

Standard resolution = 1 minute (default, cheaper). High-resolution = 1 second (StorageResolution=1, costs more, only used when truly needed).

Trap 7 — Alarm Evaluation vs Datapoints

EvaluationPeriods = 5 and DatapointsToAlarm = 3 means "3 out of 5 periods breach," not "5 consecutive breaches." This M-of-N model avoids flapping.
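The M-of-N rule can be illustrated with a small pure-Python model. It is a simplification that assumes a GreaterThanThreshold comparison and ignores missing-data handling (TreatMissingData), so it is not CloudWatch's actual evaluator.

```python
def m_of_n_breaching(values: list, threshold: float,
                     evaluation_periods: int, datapoints_to_alarm: int) -> bool:
    """True when at least `datapoints_to_alarm` of the last `evaluation_periods` values exceed the threshold."""
    window = values[-evaluation_periods:]             # only the most recent N periods count
    breaches = sum(1 for v in window if v > threshold)
    return breaches >= datapoints_to_alarm            # breaches need not be consecutive
```

With a threshold of 80, `[90, 50, 91, 40, 95]` alarms under 3-of-5 (three breaching periods, none adjacent) but would not alarm under 5-of-5.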

Trap 8 — Canary Alarm Target

Synthetics canaries publish to namespace CloudWatchSynthetics with metrics SuccessPercent, Duration, 4xx, 5xx. Alarm on SuccessPercent < 100 for the fastest outage detection.
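A sketch of that alarm as arguments for the CloudWatch `put_metric_alarm` call. The canary name, SNS topic, and 5-minute period are hypothetical, and the period assumes the canary runs at least once every 5 minutes. Treating missing data as breaching is a design choice: a canary that stops reporting also pages.

```python
def canary_alarm(canary_name: str, topic_arn: str) -> dict:
    """kwargs for CloudWatch put_metric_alarm: page when a canary run fails."""
    return {
        "AlarmName": f"{canary_name}-success",
        "Namespace": "CloudWatchSynthetics",
        "MetricName": "SuccessPercent",
        "Dimensions": [{"Name": "CanaryName", "Value": canary_name}],
        "Statistic": "Average",
        "Period": 300,                        # assumes at least one run every 5 minutes
        "EvaluationPeriods": 1,
        "Threshold": 100.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",      # no canary data at all is treated as failure
        "AlarmActions": [topic_arn],
    }
```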

Trap 9 — Composite Alarm Suppression

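A composite alarm's actions suppressor (the ActionsSuppressor parameter, which names another alarm) silences the composite's actions while that suppressor alarm is in ALARM, e.g., during a maintenance window. Use it instead of disabling alarms by hand.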
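A sketch of a suppressed composite alarm as arguments for the CloudWatch `put_composite_alarm` call. All alarm names and the topic ARN are hypothetical. The wait and extension periods (in seconds) control how long the composite waits for the suppressor to enter ALARM and how long suppression lingers after it leaves ALARM; the values here are illustrative only.

```python
def composite_with_suppressor(name: str, cpu_alarm: str, error_alarm: str,
                              maintenance_alarm: str, topic_arn: str) -> dict:
    """kwargs for CloudWatch put_composite_alarm with an actions suppressor."""
    return {
        "AlarmName": name,
        # Fire only when both child alarms are in ALARM at the same time
        "AlarmRule": f'ALARM("{cpu_alarm}") AND ALARM("{error_alarm}")',
        "AlarmActions": [topic_arn],
        "ActionsSuppressor": maintenance_alarm,  # while this alarm is in ALARM, no pages
        "ActionsSuppressorWaitPeriod": 60,
        "ActionsSuppressorExtensionPeriod": 300,
    }
```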

Trap 10 — Logs Insights Query Pipeline Order

fields → filter → parse → stats → sort → limit. parse extracts fields from unstructured text; a following filter ispresent(name) then drops the rows where the parse did not match.

FAQ — Amazon CloudWatch Observability

Q1. What is Amazon CloudWatch in one sentence for DVA-C02?

Amazon CloudWatch is the AWS-managed observability service that ingests metrics, logs, and synthetic signals from AWS services, your applications, and on-premises systems; stores them with configurable retention; and exposes them through Amazon CloudWatch Dashboards, Amazon CloudWatch Alarms, Amazon CloudWatch Logs Insights, Amazon CloudWatch Synthetics canaries, and the Amazon CloudWatch API. For DVA-C02, Amazon CloudWatch is the default answer for any "monitor / alert / troubleshoot" scenario and a supporting actor in nearly every other domain.

Q2. How do Amazon CloudWatch Logs subscription filters differ from Amazon CloudWatch Logs metric filters?

A subscription filter streams raw log events in near real time to AWS Lambda, Amazon Kinesis Data Streams, or Amazon Kinesis Data Firehose — the downstream receives the event payload and can process it (ETL, alerting, forwarding to OpenSearch). A metric filter matches a log pattern and publishes a numeric Amazon CloudWatch metric (count, or an extracted numeric value); no event payload flows. If the question asks "forward logs downstream" pick subscription filter; if it asks "alarm when keyword appears N times" pick metric filter.

Q3. What is Embedded Metric Format and why should AWS Lambda use it?

Embedded Metric Format (EMF) is a JSON log schema where a top-level _aws block declares metric names and dimensions and sibling properties carry the values. Amazon CloudWatch Logs parses EMF at ingestion and automatically creates metrics in Amazon CloudWatch Metrics — no PutMetricData API call required. AWS Lambda should use EMF because (a) it avoids the 10–50 ms latency of a synchronous API call, (b) it keeps high-cardinality fields like UserId and RequestId in the log for debugging while publishing only low-cardinality dimensions as metrics, and (c) the AWS-provided aws-embedded-metrics library makes emission one line.
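A hand-rolled EMF line, sketched without the aws-embedded-metrics library. It assumes a Python Lambda runtime, where stdout lines land in the function's log group. The namespace `MyApp` and the `userId` event key are hypothetical. Only `FunctionName` is declared as a dimension; `UserId` and `RequestId` stay as searchable log properties and never become metric dimensions.

```python
import json
import time


def emf_line(namespace: str, function_name: str, latency_ms: float,
             user_id: str, request_id: str) -> str:
    """Build one EMF record: metric metadata under _aws, values as sibling properties."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [["FunctionName"]],  # one low-cardinality dimension set
                    "Metrics": [{"Name": "Latency", "Unit": "Milliseconds"}],
                }
            ],
        },
        "FunctionName": function_name,  # dimension value
        "Latency": latency_ms,          # metric value
        "UserId": user_id,              # searchable in Logs Insights, not a dimension
        "RequestId": request_id,
    }
    return json.dumps(record)


def handler(event, context):
    # Printing is enough: CloudWatch Logs extracts the metric at ingestion, no API call.
    print(emf_line("MyApp", context.function_name, 42.0,
                   event.get("userId", "anonymous"), context.aws_request_id))
    return {"ok": True}
```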

Q4. How do I monitor Amazon EC2 memory and disk usage with Amazon CloudWatch?

Built-in Amazon CloudWatch metrics for Amazon EC2 cover CPU, network, disk I/O (from the hypervisor), and status checks — but NOT memory usage and NOT filesystem usage. To monitor those, install the CloudWatch Agent on the instance (or on-premises server) with a JSON config that declares the metrics to collect (mem_used_percent, disk_used_percent). The agent publishes the metrics to a namespace (typically CWAgent) and you alarm on them like any other Amazon CloudWatch metric.
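A minimal agent configuration covering the two metrics named above, written out from Python so it can be generated or templated. It is a sketch, not a complete config: it collects only memory and root-filesystem usage and tags each series with the instance ID. In the agent's `disk` section the measurement is written `used_percent`, and the agent publishes it as `disk_used_percent`.

```python
import json

AGENT_CONFIG = {
    "metrics": {
        "namespace": "CWAgent",
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},  # resolved by the agent
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            # published as disk_used_percent for the root filesystem only
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        },
    }
}


def write_config(path: str) -> None:
    """Write the agent config as JSON to `path`."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(AGENT_CONFIG, fh, indent=2)
```

On Linux the agent can then typically load the file with `amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:<path> -s`.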

Q5. What is the difference between Amazon CloudWatch, AWS CloudTrail, and AWS Config?

Amazon CloudWatch is for operational telemetry — metrics, logs, alarms, dashboards, answering "is my application healthy right now?" AWS CloudTrail is the audit log of every AWS API call — who called, from where, what parameters — answering "who deleted my bucket?" AWS Config records the configuration of every resource over time and evaluates compliance rules — answering "is my resource still compliant and when did it drift?" Amazon CloudWatch Logs often serves as a downstream destination for AWS CloudTrail events so that you can alarm on suspicious API activity, but the source of truth for API audit is AWS CloudTrail.

Q6. How do composite alarms and anomaly detection reduce false alarms?

A composite alarm combines child alarm states with a boolean expression (e.g., ALARM(HighCPU) AND ALARM(HighErrorRate)), so it fires only when the co-occurrence you care about is true, suppressing cascading noise. Anomaly detection fits a seasonal + trend machine-learning model on a metric's history and emits an expected band; you alarm when the metric leaves the band rather than on a static threshold, so normal daily and weekly patterns do not trigger false pages. Pair composite alarms with anomaly detection, and give the composite an actions suppressor alarm (ActionsSuppressor) for maintenance windows, to achieve quiet-but-smart paging.
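An anomaly-detection alarm is a metric-math alarm whose threshold is the band itself. The sketch below builds arguments for `put_metric_alarm` on a Lambda function's average Duration. The function name, topic ARN, period, and band width of 2 standard deviations are illustrative assumptions.

```python
def anomaly_alarm(function_name: str, topic_arn: str, band_width: float = 2.0) -> dict:
    """kwargs for CloudWatch put_metric_alarm: alarm when Duration leaves the expected band."""
    return {
        "AlarmName": f"{function_name}-duration-anomaly",
        "ComparisonOperator": "LessThanLowerOrGreaterThanUpperThreshold",
        "EvaluationPeriods": 3,
        "DatapointsToAlarm": 2,
        "ThresholdMetricId": "band",  # the band, not a number, is the threshold
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/Lambda",
                        "MetricName": "Duration",
                        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                    },
                    "Period": 300,
                    "Stat": "Average",
                },
            },
            {
                "Id": "band",
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "ReturnData": True,
            },
        ],
        "AlarmActions": [topic_arn],
    }
```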

Q7. What is Amazon CloudWatch Logs Insights and what does its query pipeline look like?

Amazon CloudWatch Logs Insights is an interactive SQL-like query engine for Amazon CloudWatch Logs. The canonical pipeline is fields → filter → parse → stats → sort → limit. fields selects columns, filter narrows rows, parse extracts fields from unstructured text via glob or regex, stats aggregates (e.g., stats avg(duration) by functionName), sort orders, and limit caps rows. Amazon CloudWatch Logs Insights pre-parses JSON log fields automatically and exposes AWS Lambda specifics like @requestId, @duration, @billedDuration, @memorySize, @maxMemoryUsed, and @initDuration. Queries are billed per GB of data scanned.
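A sketch of running a cold-start query through the API, assuming a boto3 CloudWatch Logs client and a hypothetical Lambda log group. The query follows the fields → filter → stats → sort → limit order described above. Because billing is per GB scanned, the time window is kept to a bounded number of hours.

```python
import time

QUERY = """fields @timestamp, @requestId, @duration, @initDuration
| filter ispresent(@initDuration)
| stats count(*) as coldStarts, avg(@initDuration) as avgInitMs by bin(1h)
| sort coldStarts desc
| limit 20"""


def run_query(client, log_group: str, hours: int = 24, poll_seconds: float = 1.0) -> list:
    """Start a Logs Insights query and poll until it reaches a terminal status."""
    end = int(time.time())
    start = end - hours * 3600  # narrower window means fewer GB scanned
    query_id = client.start_query(
        logGroupName=log_group, startTime=start, endTime=end, queryString=QUERY
    )["queryId"]
    while True:
        resp = client.get_query_results(queryId=query_id)
        if resp["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            return resp["results"]
        time.sleep(poll_seconds)
```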

Q8. What are Amazon CloudWatch Synthetics canaries and how do they differ from Amazon CloudWatch RUM?

Amazon CloudWatch Synthetics canaries are scheduled scripts (Node.js / Python with Puppeteer or Selenium) that simulate user behavior against your endpoints — heartbeat checks, API sequences, GUI workflows, visual regression. They publish metrics (SuccessPercent, Duration, 4xx, 5xx) and screenshots, and alarm on SuccessPercent < 100 to detect outages even when no real users are present. Amazon CloudWatch RUM is a browser-side JavaScript snippet that captures real user behavior — Core Web Vitals, page load times, JavaScript errors — for in-production user experience data. Use canaries for availability SLOs and RUM for real-user performance SLOs.

Q9. What determines the cost of Amazon CloudWatch Logs and how do I reduce it?

Amazon CloudWatch Logs cost has three main drivers: ingestion (per GB), storage (per GB per month with default Never Expire retention), and Logs Insights queries (per GB scanned). Biggest wins: set retention policies (default is forever), lower log verbosity in hot paths, use log levels, prefer Embedded Metric Format (one structured event carries metric + searchable context), replace log-counting download workflows with metric filters, and narrow Logs Insights time windows and log-group scope before each query to reduce scan bytes.

Q10. What are the must-memorize Amazon CloudWatch limits for DVA-C02?

Memorize: metric retention 15 months with automatic aggregation (1 s → 3 h, 1 min → 15 days, 5 min → 63 days, 1 h → 15 months); 30 dimensions per metric maximum; 2 subscription filters per log group maximum; 100 metric filters per log group; PutMetricData batches up to 1,000 data points per call; standard alarm $0.10/month, composite $0.50/month, anomaly detection $0.30/month; Amazon CloudWatch Dashboards flat $3.00/dashboard/month above the free tier; Logs Insights billed per GB scanned; log retention values from 1 day to 10 years or Never Expire. These numbers answer the majority of raw-recall Amazon CloudWatch questions on DVA-C02.
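The aggregation schedule can be turned into a lookup: given how old a data point is, which is the finest period still available? This is a study aid built only from the numbers above, reading 15 months as 455 days. Standard-resolution metrics never had sub-minute data, so for them the finest period starts at 60 seconds.

```python
HOUR = 3600
DAY = 24 * HOUR

# (max age in seconds, finest period in seconds), following the schedule above
SCHEDULE = [
    (3 * HOUR, 1),
    (15 * DAY, 60),
    (63 * DAY, 300),
    (455 * DAY, 3600),  # 15 months
]


def finest_period(age_seconds: int, high_resolution: bool = True):
    """Finest period (seconds) still retained for data of this age, or None if expired."""
    for max_age, period in SCHEDULE:
        if not high_resolution and period < 60:
            continue  # standard-resolution metrics start at 1 minute
        if age_seconds <= max_age:
            return period
    return None
```

For example, a 2-day-old data point is still available at 1-minute granularity, while a 30-day-old one is only available as a 5-minute aggregate.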

Summary — Amazon CloudWatch Observability at a Glance

  • Amazon CloudWatch is the central observability service on DVA-C02, mapping directly to Task 4.2 and appearing as a supporting actor in every other domain.
  • Three primitives: Metrics (time series, 15-month retention with aggregation), Logs (log groups and streams, configurable retention, default Never Expire), and Alarms (OK / ALARM / INSUFFICIENT_DATA state machine).
  • Amazon CloudWatch Logs Insights query pipeline: fields → filter → parse → stats → sort → limit; @timestamp and @message exist for every log group, and AWS Lambda log groups add fields such as @requestId, @duration, and @initDuration.
  • Subscription filters stream log events to AWS Lambda, Amazon Kinesis Data Streams, or Amazon Kinesis Data Firehose (2 per log group). Metric filters publish numeric metrics from log patterns (100 per log group).
  • The CloudWatch Agent is required to collect memory, disk-filesystem, and custom logs from Amazon EC2 and on-premises servers — these are NOT built-in.
  • Embedded Metric Format emits structured JSON logs that become Amazon CloudWatch metrics automatically — V2.1 emphasis, preferred pattern from AWS Lambda over PutMetricData.
  • Amazon CloudWatch Logs Live Tail is the real-time tail-follow experience; Amazon CloudWatch Logs Insights is the historical SQL-like query engine.
  • Metrics split into standard resolution (1 minute, default) and high-resolution (1 second, costs more; 1-second data points are kept for 3 hours).
  • PutMetricData supports batching up to 1,000 data points; avoid high-cardinality dimensions — they explode the bill.
  • Alarms support metric math for derived metrics (error rate = errors / invocations), composite alarms for boolean combinations with suppressors, and anomaly detection bands for seasonal metrics.
  • Alarm actions: Amazon SNS, Auto Scaling, Amazon EC2 stop / terminate / reboot / recover, AWS Systems Manager (create an OpsCenter OpsItem or an Incident Manager incident), AWS Lambda.
  • Dashboards support cross-account rendering via AWS Organizations + Amazon CloudWatch cross-account observability, and can be shared publicly by link, with specific email addresses, or through an SSO provider.
  • Synthetics canaries are scheduled AWS Lambda scripts (Puppeteer / Selenium) that simulate users; alarm on SuccessPercent < 100. Amazon CloudWatch RUM captures real-user browser telemetry including Core Web Vitals.
  • ServiceLens unifies X-Ray traces, Amazon CloudWatch Metrics, Amazon CloudWatch Logs, and Synthetics on one service map; Contributor Insights finds top-N contributors to a metric (hot partition keys, noisy IPs).
  • Amazon CloudWatch (operational health) vs AWS CloudTrail (API audit) vs AWS Config (configuration compliance) — three distinct scopes the exam loves to test.
  • Cost control: set log retention, prefer EMF over PutMetricData, avoid high-cardinality dimensions, batch metric writes, use standard resolution by default, and consolidate pages with composite alarms.
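The metric-math error rate from the summary can be sketched as arguments for `put_metric_alarm`. The alarm evaluates `100 * errors / invocations` over two Lambda metrics. The function name, topic ARN, and 5 percent threshold are hypothetical. Exactly one entry returns data, because an alarm must evaluate a single time series. The 3-of-5 evaluation reuses the M-of-N idea from the traps section.

```python
def error_rate_alarm(function_name: str, topic_arn: str, percent: float = 5.0) -> dict:
    """kwargs for CloudWatch put_metric_alarm on a derived error-rate percentage."""

    def lambda_metric(metric_id: str, name: str) -> dict:
        return {
            "Id": metric_id,
            "ReturnData": False,  # inputs only; the alarm evaluates the expression
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": name,
                    "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                },
                "Period": 60,
                "Stat": "Sum",
            },
        }

    return {
        "AlarmName": f"{function_name}-error-rate",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": percent,
        "EvaluationPeriods": 5,
        "DatapointsToAlarm": 3,
        "Metrics": [
            lambda_metric("errors", "Errors"),
            lambda_metric("invocations", "Invocations"),
            {"Id": "rate", "Expression": "100 * errors / invocations", "ReturnData": True},
        ],
        "AlarmActions": [topic_arn],
    }
```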

Official Sources