Why Log Analysis Matters for SCS-C02
Log analysis is the backbone of cloud security operations, and the SCS-C02 exam treats it as a first-class skill under Domain 2 task statement 2.5. Every detective control you build — from GuardDuty to Security Hub to custom Lambda rules — eventually produces logs that must be parsed, normalized, correlated, and queried to answer real questions. Log analysis turns raw JSON, syslog, and flow records into actionable intelligence: who assumed which role, which IP scanned your VPC, which WAF rule blocked an attack, and whether last night's anomaly was a misconfigured cron job or a real intrusion. AWS gives you five complementary log analysis surfaces — Amazon Athena for ad-hoc SQL over S3, CloudWatch Logs Insights for live streams, CloudTrail Insights for management-event anomaly detection, Security Hub Insights for finding aggregation, and Amazon Security Lake for OCSF-normalized cross-account analysis — and a competent SCS-C02 candidate must know when to reach for each.
The exam will not ask you to write a complete Athena DDL from memory. Instead, it tests service selection and architectural fit: given a scenario with cost constraints, query latency targets, retention windows, multi-account scope, and integration requirements, which log analysis tool is correct, and how should you design the underlying storage so that log analysis is fast, cheap, and tamper-resistant? This topic walks through the patterns that show up most often in scored questions and gives you the reasoning frameworks to handle novel scenarios.
Security Logging and Monitoring is 18 percent of SCS-C02. Within that domain, task 2.5 (log analysis) directly tests Athena, CloudWatch Logs Insights, CloudTrail Insights, and Security Hub Insights — alongside the newer Security Lake. Expect 4 to 7 questions that require log analysis service selection. See the official exam guide.
The Five Log Analysis Surfaces at a Glance
AWS does not have a single "log analysis service." It has a layered set of tools, each tuned for a specific access pattern, retention window, and cost profile. Memorize the five surfaces and the boundary between them, because the exam will hand you a scenario and expect you to pick exactly one.
The first surface is Amazon Athena — serverless SQL over data in S3. Athena reads CloudTrail logs, VPC Flow Logs, ALB logs, CloudFront logs, WAF logs, and any custom JSON or Parquet you drop into a bucket. It uses the AWS Glue Data Catalog for schema and is billed per terabyte scanned. Athena is the right answer when the question mentions "query S3 logs", "ad-hoc", "long retention", or "forensic investigation across months of data".
The second surface is CloudWatch Logs Insights — a query language built into CloudWatch Logs that operates over log groups (not S3). It is fast for the last days to weeks of logs that are still in CloudWatch retention. It is the right answer for "interactive troubleshooting of Lambda errors", "failed login spikes in the last hour", or "application logs that are streamed via the CloudWatch agent".
The third surface is CloudTrail Insights — a feature of CloudTrail itself that uses machine learning to detect anomalous management event API call rates. It does not query, it detects. It is the right answer when the scenario mentions "unusual API call volume" or "baseline drift on management events".
The fourth surface is Security Hub Insights — saved filtered views over the AWS Security Finding Format (ASFF) findings that Security Hub aggregates. Insights are not arbitrary log queries; they are filters over already-normalized security findings.
The fifth surface is Amazon Security Lake — a managed data lake that ingests logs from native AWS sources and third-party partners, normalizes them into the Open Cybersecurity Schema Framework (OCSF) format, stores them as Parquet in a customer-owned S3 bucket, and exposes them to subscribers through Athena, OpenSearch, or third-party SIEMs. Security Lake is the right answer for "multi-account, multi-source, schema-normalized analysis" and "share normalized logs with a SIEM team".
S3-resident logs and SQL ad-hoc → Athena. Live CloudWatch log groups → Logs Insights. Anomaly on API call rate → CloudTrail Insights. Filtered finding view → Security Hub Insights. Cross-account OCSF-normalized data lake → Security Lake. See Athena docs and Security Lake docs.
Plain-Language Explanation:
Think of cloud logs as a kitchen: CloudTrail, VPC Flow Logs, and WAF logs are the ingredients, shipped from different farms (services) to your central warehouse, S3. Ingredients alone aren't a meal; you need knife skills and recipes to turn them into dishes. Athena is the Swiss Army knife: you cut whatever you want, whenever you want, with no oven to preheat and no refrigerator to stock. The sharpness of the knife (performance) and how the ingredients are arranged (partitions, Parquet) directly determine both speed and the bill.
A second analogy: CloudWatch Logs Insights is the convenience store. It's closest to you, open 24 hours, with a limited selection you can grab and go. You wouldn't cater a holiday banquet from a convenience store (that's Athena's job, large-scale analysis), but for the last hour of Lambda errors or a login failure from five minutes ago, the convenience store is the fastest option. The Logs Insights query language (filter / stats / parse / sort) is the store's aisle numbering: once you know it, you find what you need in seconds.
A third analogy: Security Lake is the postal system. Every service (CloudTrail, VPC, Route 53, Security Hub) writes letters in a different format — some handwritten, some typed, some in calligraphy — and the recipients would go mad. The central post office repackages every letter into a standard envelope (the OCSF schema), so recipients (Athena subscribers, OpenSearch, third-party SIEMs) only need to know how to open one kind of envelope to read every source. That is the value of normalization: your SOC analysts no longer write a separate parser for every log format.
OCSF is an open, vendor-neutral schema for security events, originally developed by Splunk, AWS, and other vendors. Security Lake stores all native and custom sources as Parquet files conforming to OCSF event classes (e.g. api_activity, network_activity, dns_activity, file_activity). Subscribers query a unified taxonomy regardless of the original source format. See the Security Lake OCSF mapping reference.
Amazon Athena Query Patterns for Security
Athena is the SCS-C02 default for "I have months of CloudTrail or VPC Flow Logs in S3 and need to find evidence." Three things make Athena work well: a sane Glue Data Catalog table, partition projection so you don't pay for full-bucket scans, and Parquet (or at least gzipped JSON) so each scan reads compressed columnar data instead of raw text.
CloudTrail Query Patterns
For CloudTrail, the table schema is published by AWS and you can create it with one CREATE EXTERNAL TABLE. The most common security queries are:
- Find AssumeRole abuse: filter `eventName='AssumeRole'` and group by `userIdentity.arn`, the source IP, and the assumed role ARN; look for principals assuming roles they have never touched before, or short bursts of assumes from non-corporate IPs.
- Identify unusual region usage: select `awsRegion`, `eventSource`, and `userIdentity.userName`, group by region, and flag any region outside your approved list. This catches credential theft when an attacker uses an unused region (e.g. `ap-south-1`) hoping the victim isn't watching.
- User activity timeline: filter on `userIdentity.arn = 'arn:aws:iam::...:user/alice'` over the past 30 days, order by `eventTime`, and you have a chronological audit trail. Combine with `sourceIPAddress` and `userAgent` to spot machine-generated calls (CLI vs SDK vs console).
- Detect console logins without MFA: `eventName='ConsoleLogin'` and `additionalEventData.MFAUsed='No'` — a classic finding for compliance auditors (sketched after this list).
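A minimal sketch of the no-MFA query, assuming a `cloudtrail_logs` table created from the published CloudTrail schema; the `year`/`month`/`day` partition columns are illustrative:

```sql
-- Console logins without MFA, scoped to one day of partitions so Athena
-- scans only that prefix. Table and partition names are illustrative.
SELECT eventtime,
       useridentity.arn AS principal,
       sourceipaddress
FROM cloudtrail_logs
WHERE year = '2024' AND month = '06' AND day = '15'
  AND eventname = 'ConsoleLogin'
  AND json_extract_scalar(additionaleventdata, '$.MFAUsed') = 'No'
ORDER BY eventtime;
```

In the published schema `additionaleventdata` is a JSON string, hence the `json_extract_scalar` call rather than dot access.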
CloudTrail Lake is a fully managed query engine that uses an immutable event data store with up to 7 years of retention and built-in SQL querying. Athena is BYO storage — you control S3, lifecycle, encryption, and partitions. Use CloudTrail Lake when you want managed retention and compliance evidence. Use Athena when you have a wider data lake and need to JOIN across CloudTrail, VPC Flow Logs, WAF logs, and custom data. See the CloudTrail Lake docs.
VPC Flow Logs Query Patterns
VPC Flow Logs (especially v5 with extended fields) are the canonical network detective control. Common Athena patterns include:
- Top talkers: group by `srcaddr` and `dstaddr`, sum `bytes`, order descending. The top 20 rows often surface a runaway data-exfiltration job.
- REJECTED traffic patterns: filter `action='REJECT'`, group by `srcaddr` and `dstport`. A spike of REJECTs on port 22 from a single external IP usually means an SSH brute-force scan (see the sketch after this list).
- Unusual external IP flows: join Flow Logs against a list of known corporate CIDRs; any flow whose `srcaddr` and `dstaddr` are both outside your VPC CIDR and outside your corporate range is suspicious.
- Egress-only abuse: find ENIs whose `bytes_sent` greatly exceeds `bytes_received` and that talk to ports 80 / 443 of unknown destinations — classic data exfiltration via HTTPS.
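A sketch of the brute-force pattern, assuming a `vpc_flow_logs` table with the default v2 columns; the table name, `day` partition, and threshold are illustrative:

```sql
-- Candidate SSH brute-force sources: REJECTed flows to port 22 per source IP.
SELECT srcaddr,
       COUNT(*) AS reject_count
FROM vpc_flow_logs
WHERE day = '2024/06/15'
  AND action = 'REJECT'
  AND dstport = 22
GROUP BY srcaddr
HAVING COUNT(*) > 100
ORDER BY reject_count DESC;
```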
WAF Log Patterns
WAF logs are JSON, store rich metadata about each request, and are a goldmine for application-layer attack analysis. Useful queries include:
- Blocked requests by rule: group by `terminatingRuleId` and count, ordered descending, to see which managed or custom rule is doing most of the work (sketched after this list).
- Geographic distribution of attacks: parse the `httpRequest.country` field and bucket by country code. A large block from a country your business doesn't serve is a candidate for a geo-block rule.
- Bot signatures: group by `httpRequest.headers.User-Agent` and look for empty UAs, scripted UAs, and outdated browsers.
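A sketch of the blocked-requests pattern, assuming a `waf_logs` table built from the documented WAF log schema (the table name is illustrative):

```sql
-- Which WAF rules are blocking the most traffic?
SELECT terminatingruleid,
       COUNT(*) AS blocked
FROM waf_logs
WHERE action = 'BLOCK'
GROUP BY terminatingruleid
ORDER BY blocked DESC
LIMIT 20;
```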
A naive SELECT * FROM cloudtrail_logs WHERE eventTime > ... scans every byte of every CloudTrail file in your bucket — easily hundreds of GB and tens of dollars per query. Always partition by year, month, and day, and filter on the partition columns in the WHERE clause. Even better, use partition projection so Athena calculates partitions from the column expression and you don't have to run MSCK REPAIR TABLE after every new daily partition. See partition projection docs.
Glue Data Catalog and Partition Projection
The Glue Data Catalog is the metadata layer Athena, EMR, and Redshift Spectrum all share. For security log analysis, you typically:
- Create one Glue database (e.g. `security_logs`).
- Create one external table per log source (cloudtrail, vpcflow, alb, cloudfront, waf, route53query) pointing at the matching S3 prefix.
- Use partition projection instead of crawler-managed partitions; partition projection computes partition values from the table properties, so new daily prefixes are queryable instantly without running a crawler (see the DDL sketch after this list).
- Convert to Parquet with a CTAS (`CREATE TABLE AS SELECT`) job for any logs you query repeatedly. Parquet typically reduces scan size by 10x or more and applies columnar pushdown, so a `SELECT eventName` query reads only the eventName column.
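A DDL sketch of partition projection; the bucket, database, and column names are illustrative:

```sql
-- Partition projection: Athena derives partition values from table properties,
-- so new daily prefixes are queryable with no crawler and no MSCK REPAIR TABLE.
CREATE EXTERNAL TABLE security_logs.vpc_flow_logs (
    srcaddr   string,
    dstaddr   string,
    dstport   int,
    action    string,
    bytes     bigint,
    starttime bigint
)
PARTITIONED BY (day string)
STORED AS PARQUET
LOCATION 's3://example-log-archive/vpcflow/'
TBLPROPERTIES (
    'projection.enabled'           = 'true',
    'projection.day.type'          = 'date',
    'projection.day.range'         = '2024/01/01,NOW',
    'projection.day.format'        = 'yyyy/MM/dd',
    'projection.day.interval'      = '1',
    'projection.day.interval.unit' = 'DAYS',
    'storage.location.template'    = 's3://example-log-archive/vpcflow/${day}/'
);
```

With projection enabled, a `WHERE day = ...` filter like the ones in the earlier sketches prunes directly to the matching S3 prefix.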
Three cost controls for Athena: (1) partition projection, (2) Parquet, (3) workgroup-level data scan limits. The third one is a hard guardrail — if a user runs a query that would scan above the limit, Athena cancels it. This protects you from a junior analyst running SELECT * on five years of logs. See the Athena workgroups guide.
CloudWatch Logs Insights Deep Dive
CloudWatch Logs Insights is a query language layered on top of CloudWatch Logs. It is interactive, returns results in seconds, and is paid per gigabyte of data scanned (cheaper than CloudTrail Lake but more expensive than Athena over Parquet). The query language has six top-level commands you must memorize: fields, filter, stats, sort, limit, and parse.
Common Security Patterns in Logs Insights
- Failed login bursts: `filter @message like /Failed password/ | stats count() by bin(5m)` over an SSH log group; spikes show up as histogram peaks.
- Lambda error rate: `fields @timestamp, @message | filter @type = "REPORT" and @message like /ERROR/ | stats count() by bin(1m)`.
- Top error codes: `fields @message | parse @message /errorCode=(?<code>[^,]+)/ | stats count() by code | sort count desc`.
- Slow API requests: `filter @duration > 5000 | stats avg(@duration), max(@duration) by @logStream`.
Parse for Structured Logs
The parse command converts a free-form log line into named fields using a glob pattern or regex, after which stats and filter work on those fields. This is the bridge between unstructured application logs and structured analysis. If your application emits JSON natively, Logs Insights discovers the top-level keys as queryable fields automatically (the @ prefix is reserved for system fields such as @timestamp and @message), so you can often skip parse entirely.
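A sketch of the parse-then-aggregate pattern, assuming application log lines containing a `code=...` token (the field name and pattern are illustrative):

```
fields @timestamp, @message
| parse @message /code=(?<errorCode>[^\s]+)/
| filter ispresent(errorCode)
| stats count(*) as hits by errorCode
| sort hits desc
| limit 10
```

Once `parse` has named the capture group, `filter`, `stats`, and `sort` treat `errorCode` like any discovered field.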
Logs Insights queries log groups (CloudWatch). Athena queries S3. If a question shows logs already going to S3 (via subscription filter or Firehose), the answer is Athena. If logs are only in a CloudWatch log group, the answer is Logs Insights. If both, choose based on retention window and cost: CloudWatch retention is configurable per log group, but for logs older than ~30 days the Athena-on-S3 path is usually cheaper. See CloudWatch Logs Insights docs.
CloudTrail Insights — Anomaly Detection on Management Events
CloudTrail Insights is not a query tool. It is an anomaly detection feature you enable on a CloudTrail trail: it learns the baseline rate of write management events in the account, then surfaces deviations as Insights events. A typical CloudTrail Insight reads "the rate of RunInstances calls between 14:00 and 15:00 UTC was 27 standard deviations above baseline" — exactly the signal you want when an attacker uses stolen credentials to spin up crypto-mining instances.
There are two important boundaries you must know:
- CloudTrail Insights detects write management events only. It does not analyze data events (S3 GetObject, Lambda Invoke, DynamoDB GetItem). If the scenario asks about anomalous data event rates, the answer is GuardDuty (S3 Protection) or a custom CloudWatch metric filter, not CloudTrail Insights.
- CloudTrail Insights costs extra, billed per 100,000 management events analyzed. You enable it per trail and per event type. For multi-account organizations, enable it on the org-level trail in the log archive account.
Both are anomaly detectors, but they overlap minimally. CloudTrail Insights focuses purely on API call volume baselines. GuardDuty looks at CloudTrail, VPC Flow Logs, DNS logs, and S3 data events for IOC matching, behavioral anomalies, and known threat patterns. If the question mentions IP reputation, malware C2, or credential exfiltration, the answer is GuardDuty. If it mentions "unusually high CreateUser rate", the answer is CloudTrail Insights. See CloudTrail Insights docs.
Security Hub Insights — Filtered Views over ASFF
Security Hub Insights are saved filters that aggregate findings (in AWS Security Finding Format, ASFF) by a chosen attribute. They are not queries over raw logs; they are queries over already-normalized findings. There are two kinds:
- AWS managed insights: 100+ pre-built insights covering common patterns. Examples include "AWS resources with the most findings", "S3 buckets with public read or public write permissions", "IAM users with suspicious activity", and "Access keys not rotated in the last 90 days".
- Custom insights: define your own filter over ASFF fields (resource type, severity, label, workflow status, compliance status) and group by an attribute. Common custom insights include "all CRITICAL findings in production accounts grouped by resource", "all unresolved findings older than 30 days grouped by responsible team tag", and "Macie findings on PCI-tagged buckets grouped by sensitive data type".
Custom insights are the answer when an exam question asks "how do you create a continuously updated dashboard of org-specific finding patterns without building custom infrastructure?". They run on a schedule and drive both the Security Hub console and EventBridge rules.
Security Hub Insights cannot JOIN findings against external tables, cannot do windowed aggregation, and cannot enrich with arbitrary data. If you need richer analysis, export findings via EventBridge to S3 (Firehose) and use Athena, or use Security Lake. See Security Hub Insights docs.
Amazon Security Lake — OCSF-Normalized Data Lake
Security Lake is the newest and most architecturally significant tool in the SCS-C02 log analysis arsenal. It is a managed service that:
- Ingests logs from native AWS sources and third-party partners.
- Normalizes them into the OCSF schema, version 1.x, with one Parquet table per OCSF event class.
- Stores them in a customer-owned S3 bucket (Security Lake creates the bucket but you own the data) with rollup regions and lifecycle policies.
- Exposes the data to subscribers, who can query through Athena, OpenSearch, or any third-party SIEM that understands OCSF Parquet.
Native Sources
Security Lake supports first-class native sources at GA:
- AWS CloudTrail management events and Amazon S3 data events (captured via CloudTrail).
- Amazon VPC Flow Logs.
- Amazon Route 53 Resolver query logs (DNS).
- AWS Lambda data events via CloudTrail.
- AWS Security Hub findings (ASFF, mapped to OCSF Detection Finding class).
- Amazon EKS audit logs.
These sources are configured with one click; Security Lake handles the partition layout, the OCSF mapping, and the Parquet conversion. There is no Glue crawler to manage.
Custom Sources
You can register a custom source for any third-party log format. The source must produce OCSF-conformant Parquet (you do the mapping), drop it in the prescribed S3 prefix, and Security Lake automatically picks it up and exposes it to subscribers. Common custom sources include Okta logs, Salesforce audit logs, on-premises firewall logs, and Crowdstrike or SentinelOne EDR telemetry. The mapping work is the cost — but once done, every subscriber benefits.
Subscriber Model
Security Lake separates producers (the sources) from consumers (the subscribers). Subscribers come in two flavors:
- Query access: subscriber gets read access to the S3 data via Lake Formation grants, and can query directly with their own Athena workgroup. Best for in-house SOC teams and data scientists.
- Data access: subscriber receives an SQS notification when new Parquet files land, and pulls the data into their own SIEM (Splunk, Sumo Logic, IBM QRadar, etc.). Best for third-party SIEM integrations.
A Security Lake rollup region is a designated region that aggregates data from one or more contributing regions. You typically choose one rollup region per geographic compliance zone (e.g. one for US, one for EU) so that subscribers in that zone query a single bucket containing all relevant data. Lifecycle and replication still respect data residency. See Security Lake regions docs.
The exam answer is Security Lake when the scenario combines: multiple accounts, multiple log sources (CloudTrail + VPC + Route 53 + third-party), schema normalization for SIEM consumption, and long-term storage as Parquet in S3. If a single dimension is missing — say only one account, or only CloudTrail — Athena alone is usually correct. See Security Lake user guide.
Log Normalization, Parsing, and Correlation Patterns
Normalization turns heterogeneous log formats into a single schema your analysts can reason about. Parsing extracts structured fields from semi-structured text. Correlation links events across sources to build a coherent narrative. SCS-C02 expects you to know patterns for all three.
CloudTrail JSON Parsing
CloudTrail records are nested JSON. The fields you'll touch most often are:
- `eventTime` (ISO-8601, UTC).
- `eventName` (the API call, e.g. `RunInstances`).
- `eventSource` (the service, e.g. `ec2.amazonaws.com`).
- `userIdentity.type` (`IAMUser`, `AssumedRole`, `Root`, `AWSService`).
- `userIdentity.arn` and `userIdentity.principalId`.
- `sourceIPAddress` and `userAgent`.
- `requestParameters` and `responseElements` (call-specific payloads, often the most interesting forensic detail).
- `errorCode` and `errorMessage` (present only on failed calls).
- `awsRegion`.
In Athena, you select these with dot syntax (`userIdentity.arn`) once the table is defined with the published CloudTrail Glue schema. In CloudWatch Logs Insights, JSON fields are auto-discovered (nested keys are addressed with dot notation, up to the discovery limit); fall back to `parse @message` with a regex for anything that isn't discovered.
VPC Flow Log v5 Fields
The default v2 format has 14 fields. The v5 extended format adds pkt-srcaddr, pkt-dstaddr, region, az-id, tcp-flags, type (IPv4/IPv6), flow-direction, traffic-path, and sublocation-type among others. The two pkt-* fields are critical for Transit Gateway and middlebox scenarios because they show the original source/destination before NAT or proxy rewriting, while the standard srcaddr/dstaddr show what the ENI saw.
OCSF Unified Taxonomy
OCSF maps every event into one of several event classes, each with a defined attribute set. For SCS-C02, the relevant classes are:
- `api_activity` — covers CloudTrail and similar API audit events.
- `network_activity` — covers VPC Flow Logs and similar L4 telemetry.
- `dns_activity` — covers Route 53 Resolver query logs.
- `detection_finding` — covers Security Hub findings (ASFF mapped to OCSF).
- `file_activity` — for object access events (S3 data events).
Once data is in OCSF, an analyst writes one query like SELECT * FROM amazon_security_lake_glue_db.amazon_security_lake_table_us_east_1_cloud_trail_mgmt_2_0 WHERE actor.user.name = 'alice' and gets a consistent shape regardless of source.
Correlation Across Sources
Correlation is what turns five logs into one story. Classic patterns:
- Compromise narrative: GuardDuty finding (`UnauthorizedAccess:IAMUser/MaliciousIPCaller`) → CloudTrail entry showing the same principal calling `AssumeRole` from the malicious IP → VPC Flow Logs showing the assumed role's session reaching out to a C2 IP → S3 data event showing the same session downloading a sensitive object. Amazon Detective builds this graph automatically; Athena can do it manually with JOINs on time windows and principal IDs (see the sketch after this list).
- Failed-then-successful login: SSH log in CloudWatch Logs Insights showing 50 `Failed password` attempts from one IP, followed within minutes by an `Accepted password`. JOIN with CloudTrail `ConsoleLogin` if the same human escalates from instance shell to AWS console.
- Bucket exposure: Macie sensitive-data finding → CloudTrail showing `PutBucketPolicy` that made the bucket public → Security Hub aggregated finding with severity CRITICAL.
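A manual-correlation sketch in Athena, assuming the `cloudtrail_logs` and `vpc_flow_logs` tables from earlier and a hypothetical suspect IP; column names are illustrative:

```sql
-- Correlate API calls and network flows tied to one suspect IP within a
-- five-minute window. Keep timestamps in UTC throughout the JOIN.
WITH api_calls AS (
    SELECT CAST(from_iso8601_timestamp(eventtime) AS timestamp) AS ts,
           eventname,
           useridentity.arn AS principal
    FROM cloudtrail_logs
    WHERE sourceipaddress = '203.0.113.7'   -- hypothetical malicious IP
)
SELECT a.ts, a.principal, a.eventname,
       f.dstaddr, f.dstport, f.bytes
FROM api_calls a
JOIN vpc_flow_logs f
  ON f.srcaddr = '203.0.113.7'
 AND from_unixtime(f.starttime)
     BETWEEN a.ts - INTERVAL '5' MINUTE AND a.ts + INTERVAL '5' MINUTE;
```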
All AWS log sources emit timestamps in UTC. When you correlate across sources, never convert to local time during the JOIN — it leads to off-by-one-hour bugs near DST. Convert at the presentation layer only. Athena's from_iso8601_timestamp() and CloudWatch's @timestamp are both UTC. See the CloudTrail record contents reference.
Cost-Aware Log Analysis Architecture
Log analysis is cheap until it isn't. A poorly designed pipeline can burn five-figure monthly bills with little useful output. SCS-C02 expects you to recognize cost-aware patterns.
S3 Lifecycle to Glacier
Logs older than 90 days are rarely queried interactively but must be retained for compliance (HIPAA, PCI, SOC 2 typically demand 1–7 years). The pattern is:
- Day 0–30: S3 Standard, fully indexed in Glue, queried by Athena.
- Day 30–90: S3 Standard-IA — same access latency, half the storage cost.
- Day 90–365: S3 Glacier Instant Retrieval — millisecond access for occasional Athena reruns.
- Day 365–7 years: S3 Glacier Deep Archive — 12-hour retrieval, lowest cost. Plan ahead: Athena cannot read directly from Deep Archive.
Parquet and Partition Hygiene
Converting raw JSON or text to Parquet typically compresses data 5–10x and applies predicate pushdown so Athena reads only the columns you select. A scheduled CTAS query that converts yesterday's CloudTrail JSON to Parquet runs in minutes and pays for itself within a week of analyst queries. Partition by year, month, day, and (optionally) account ID and region — but don't over-partition (more than ~10,000 partitions per table degrades performance).
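A CTAS sketch, assuming the raw `cloudtrail_logs` JSON table from earlier; names and the output location are illustrative:

```sql
-- Convert one day of raw JSON CloudTrail into partitioned Parquet.
-- CTAS creates a new table; a recurring daily job would typically
-- INSERT INTO an existing Parquet table instead.
CREATE TABLE security_logs.cloudtrail_parquet
WITH (
    format = 'PARQUET',
    external_location = 's3://example-log-archive/cloudtrail-parquet/',
    partitioned_by = ARRAY['year', 'month', 'day']
)
AS
SELECT eventtime, eventname, eventsource, useridentity,
       sourceipaddress, useragent, awsregion,
       year, month, day   -- partition columns must come last in the SELECT
FROM cloudtrail_logs
WHERE year = '2024' AND month = '06' AND day = '15';
```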
Subscription Filter to Firehose to S3
CloudWatch Logs is comparatively expensive to retain at scale ($0.50/GB ingested + storage). The standard pattern for security log retention is:
- Application emits logs to CloudWatch Logs (live tail and Logs Insights).
- Subscription filter sends matching events to Kinesis Data Firehose.
- Firehose buffers, converts to Parquet (using a Glue table as schema), and writes to S3.
- CloudWatch retention is set to 30 days; S3 retention is set per compliance.
- Athena queries the S3 archive for anything older than 30 days.
This pattern keeps CloudWatch costs proportional to the working set while allowing unbounded retention in S3.
Multi-account orgs should send all org-trail CloudTrail, VPC Flow, ALB, and CloudFront logs to a dedicated log archive account with S3 Object Lock in compliance mode. Athena then queries this single bucket via cross-account permissions. This is the architecture in the AWS Security Reference Architecture.
For interactive search and dashboards over a moving 30–90 day window, OpenSearch Service is excellent. But for multi-year retention and cheap ad-hoc SQL, OpenSearch is far more expensive than S3+Athena because it stores hot data on EBS. The exam often gives a scenario emphasizing long-retention forensic queries — that points to Athena, not OpenSearch. Use OpenSearch when sub-second dashboards on a small window matter; use Athena when cost-per-TB on a large archive matters.
Apache Iceberg and Modern Data Lake Patterns
Athena now supports Apache Iceberg tables, which add ACID transactions, time travel, schema evolution, and row-level deletes on top of S3 Parquet. For security log analysis, Iceberg means you can:
- Update enrichment columns without rewriting partitions (e.g. backfill a `geo_country` column after ingestion).
- Time-travel to query the table state as of a past timestamp, which is invaluable for forensic reproducibility (see the sketch after this list).
- MERGE new data into existing partitions transactionally.
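A time-travel sketch using Athena's Iceberg syntax, assuming a hypothetical Iceberg table named `cloudtrail_iceberg`:

```sql
-- Query an Iceberg table as it existed at a past instant, so a forensic
-- query can be re-run against the exact table state an analyst saw.
SELECT eventname, useridentity.arn, sourceipaddress
FROM security_logs.cloudtrail_iceberg
FOR TIMESTAMP AS OF TIMESTAMP '2024-06-15 00:00:00 UTC'
WHERE eventname = 'AssumeRole';
```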
Security Lake stores its tables in Iceberg-compatible Parquet by default, and CloudTrail Lake also uses an Iceberg-style internal store. For exam purposes, know that Iceberg is the modern choice for a security data lake on S3 and that it integrates natively with Athena, EMR, and Glue. See the Athena Iceberg docs.
OpenSearch Service for Security Log Search
For real-time search and pre-built dashboards (Kibana-style), OpenSearch Service is a complement to Athena, not a replacement. Common SCS-C02-relevant patterns:
- GuardDuty finding ingestion: stream findings via EventBridge → Lambda → OpenSearch for live triage dashboards.
- VPC Flow Log near-real-time: subscription filter from CloudWatch Logs → Firehose → OpenSearch with index lifecycle moving older indices to UltraWarm or cold storage on S3.
- Pre-built security analytics: OpenSearch Security Analytics plugin ships with pre-built detector rules (Sigma rule format) for common attacker techniques across CloudTrail, VPC, and host logs.
OpenSearch wins when you need sub-second search latency, rich dashboards, and alerting on stream. Athena wins on cost per TB, schema flexibility, and multi-year retention. Security Lake feeds either or both.
Threat Indicator Searches
Threat intelligence feeds (commercial like Recorded Future or open like AlienVault OTX) give you lists of bad IPs, domains, file hashes, and TLS JA3 fingerprints. The log analysis problem is searching for these indicators across your logs at scale. Patterns:
- IOC table in S3: maintain a daily-refreshed Parquet table of bad IPs and their threat scores; JOIN against VPC Flow Logs in Athena to find any internal flow that touched a known-bad IP (sketched after this list).
- GuardDuty managed threat intel: AWS supplies a curated list under the hood, and you can also upload your own threat intelligence sets and trusted IP lists per detector.
- CloudWatch metric filter on user-agent: if a known malicious tool has a distinctive UA string, a metric filter generates an alarm in seconds without any extra storage cost.
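A sketch of the first pattern, assuming a hypothetical `threat_intel.bad_ips` indicator table alongside the `vpc_flow_logs` table from earlier:

```sql
-- Sweep a day of flow logs for contact with known-bad IPs.
SELECT f.srcaddr, f.dstaddr, f.dstport,
       SUM(f.bytes)        AS total_bytes,
       MAX(t.threat_score) AS threat_score
FROM vpc_flow_logs f
JOIN threat_intel.bad_ips t ON f.dstaddr = t.ip
WHERE f.day = '2024/06/15'
GROUP BY f.srcaddr, f.dstaddr, f.dstport
ORDER BY total_bytes DESC;
```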
GuardDuty for AWS-curated and self-uploaded IOC lists with continuous evaluation. Athena for ad-hoc or historical IOC search across petabytes. CloudWatch metric filter for cheap real-time alarming on a string match. Security Lake when you want to share IOC-enriched data with a third-party SIEM. See GuardDuty threat intel.
End-to-End Reference Architecture
A SCS-C02-grade log analysis architecture for a mid-size org typically looks like:
- Org-level CloudTrail (management + data events, multi-region) → log archive account S3 with Object Lock + KMS CMK.
- VPC Flow Logs v5 in Parquet → log archive account S3.
- Route 53 Resolver query logs → log archive account S3.
- CloudFront, ALB, WAF logs → log archive account S3 (separate prefixes).
- Application logs via CloudWatch agent → CloudWatch Logs (30-day retention) → subscription filter → Firehose → log archive S3 (long-term).
- GuardDuty + Security Hub in delegated admin account; findings exported via EventBridge → Firehose → S3 in ASFF JSON.
- Amazon Security Lake subscribes to CloudTrail, VPC, Route 53, Security Hub natively + custom source for application logs; OCSF-normalized Parquet in rollup region S3.
- Athena workgroups with per-team data scan limits, Glue Data Catalog with partition projection, and Iceberg tables for enriched/curated views.
- OpenSearch Service with UltraWarm tier for live SOC dashboards over the last 30 days.
- Detective for entity-relationship investigation; CloudTrail Lake for compliance-grade immutable retention.
This architecture maps cleanly to most exam scenarios. When a question describes a subset of these requirements, identify which layer is implicated.
SCS-C02 Exam Tips
- The exam loves service boundary questions. Memorize: CloudTrail Insights detects rate anomalies on management events only; GuardDuty detects threats based on patterns; Security Hub aggregates findings; Detective investigates relationships; Athena queries S3; Logs Insights queries log groups; Security Lake normalizes to OCSF.
- When a question mentions "long retention" or "forensic" or "cheap" plus "S3", the answer involves Athena.
- When a question mentions "interactive" or "recent logs" or "CloudWatch log group", the answer involves Logs Insights.
- When a question mentions "third-party SIEM" plus "normalized" plus "multi-account", the answer is Security Lake with a data-access subscriber.
- When a question mentions "unusual API call rate" with no other anomaly type, the answer is CloudTrail Insights.
- When a question mentions "saved view" or "finding pattern" or "insight" in a Security Hub context, the answer is Security Hub Insights.
- Watch for "automatically" and "console" distractors — they're often wrong, per community wisdom.
You will not write Athena DDL or Logs Insights syntax from scratch. You will recognize the right tool for a scenario and the right architectural component to fix a problem. Focus practice time on service selection and integration patterns, not query authoring.
FAQ
Q1. Can I run Athena queries directly against S3 logs in another AWS account?
Yes. The bucket policy in the log archive account must grant the Athena query account s3:GetObject and s3:ListBucket (or use Lake Formation cross-account grants). The Glue Data Catalog can either live in the log archive account (and be shared via Lake Formation) or be replicated to the query account. Most large orgs use a single Glue Data Catalog in the log archive account and grant cross-account access via Lake Formation, which gives row- and column-level security as a bonus.
Q2. What's the difference between CloudTrail Insights and CloudTrail Lake?
CloudTrail Insights is an anomaly detection feature layered on a regular CloudTrail trail; it surfaces events when API call rates deviate from baseline, billed per 100,000 events analyzed. CloudTrail Lake is a managed event data store with up to 7 years retention and SQL query capability over CloudTrail records, billed per GB ingested and per GB scanned. They are independent — you can have either, both, or neither.
Q3. Should I use Security Lake or just keep using Athena over my centralized S3 logs?
If your needs are single-source (just CloudTrail) and single-team (your in-house SOC), Athena alone is fine. Security Lake adds value when you have multiple sources (CloudTrail + VPC + Route 53 + third-party), multiple consumers (in-house team + external SIEM + data scientists), and want schema normalization so each consumer doesn't reinvent parsing. The OCSF mapping is the killer feature; if you don't need it, Security Lake is overhead.
Q4. How do I detect anomalous data events in S3 (e.g. someone downloading lots of objects)?
CloudTrail Insights does not cover data events. The right answer is GuardDuty S3 Protection, which analyzes CloudTrail data events for behavioral anomalies (Exfiltration:S3/AnomalousBehavior, UnauthorizedAccess:S3/MaliciousIPCaller). For custom thresholds, build a CloudWatch metric filter on the data-event log group, or run a scheduled Athena query that counts GetObject calls per principal per hour and alerts via SNS when a threshold is exceeded.
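A sketch of the scheduled-query approach, assuming a hypothetical `cloudtrail_data_events` table over a data-events trail; names and the threshold are illustrative:

```sql
-- Hourly GetObject volume per principal; a scheduled version of this query
-- could publish to SNS whenever a threshold is crossed.
SELECT useridentity.arn AS principal,
       date_trunc('hour', from_iso8601_timestamp(eventtime)) AS hr,
       COUNT(*) AS get_count
FROM cloudtrail_data_events
WHERE eventname = 'GetObject'
  AND year = '2024' AND month = '06' AND day = '15'
GROUP BY useridentity.arn,
         date_trunc('hour', from_iso8601_timestamp(eventtime))
HAVING COUNT(*) > 10000
ORDER BY get_count DESC;
```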
Q5. What's the cheapest way to keep 7 years of CloudTrail logs queryable for occasional audits?
Use S3 Object Lock in compliance mode + lifecycle: Standard for the first 30 days, Standard-IA for 30–90, Glacier Instant Retrieval for 90–365, Glacier Deep Archive thereafter. Convert daily prefixes to Parquet with a CTAS job. Athena can query Standard, IA, and Glacier Instant Retrieval with no special setup. For Deep Archive, you must restore the objects first (12+ hours), then query — acceptable for rare audits.
Q6. Can I use CloudWatch Logs Insights to query logs that already moved to S3 via subscription filter?
No. Logs Insights queries log groups, not S3. Once logs are in S3, switch to Athena. This is a frequent exam trap: the scenario describes a log group with a subscription filter to Firehose to S3, and asks how to query the archive — the answer is Athena, not Logs Insights.
Q7. How does OCSF in Security Lake handle vendor-specific fields that don't fit a standard event class?
OCSF event classes have an unmapped attribute (a key-value object) where vendors put fields that don't map to the standard schema. Subscribers can reference unmapped['vendor_field_name'] in queries, but core security analytics typically rely on the standard attributes. When mapping a custom source, prioritize getting the standard attributes right; put exotic fields in unmapped rather than forcing a poor fit into a standard attribute.
Q8. What's the role of Detective in log analysis?
Amazon Detective ingests CloudTrail, VPC Flow Logs, GuardDuty, and EKS audit logs and builds a graph of entity-to-entity relationships (principal → role → IP → resource → finding) over the past year. It's not a query tool; it's an investigation UI that answers "show me everything related to this GuardDuty finding". Detective complements Athena (which is for ad-hoc SQL) by visualizing relationships interactively. For SCS-C02, Detective is the answer when the question asks "investigate", "root cause", or "graph of related activity".