
Cost Architecture Design for New Solutions

7,850 words · ~40-minute read

New solutions cost design is the SAP-C02 discipline of building cost optimization into the architecture on day one, not bolting it on after the invoice explodes. Domain 2.6 of the AWS Certified Solutions Architect Professional exam guide asks you to determine a cost optimization strategy to meet solution goals — and in practice this means a Professional-level architect must stand at the whiteboard and simultaneously balance Savings Plans portfolio construction, Reserved Instance size flexibility, Spot pattern libraries, Graviton migration paths, serverless cost envelopes, data transfer traps, tagging at organization scale, and machine-learning-driven anomaly detection, all before a single line of Terraform is written. This new solutions cost design guide walks through each dimension at Pro depth, with a running scenario where a VP of Engineering hands you a 500,000 dollar per month cloud bill and says "cut thirty percent without losing features in one quarter." By the end of this note you will know exactly which commitment to buy, which compute to swap, which endpoint to add, and which anomaly monitor to wire up.

What Is New Solutions Cost Design on AWS?

New solutions cost design is the architectural practice of making cost a first-class non-functional requirement in every greenfield AWS workload. In Phase 1 this is the Well-Architected Cost Optimization pillar applied to a blank canvas; in Phase 2 it is the SAP-C02 exam testing whether you can reason about Savings Plans versus Reserved Instances, Spot versus On-Demand, Fargate versus EC2, Lambda versus always-on, CloudFront versus direct origin, and VPC endpoints versus NAT Gateway, in the same 4-minute question window. New solutions cost design on SAP-C02 is never "which service is cheapest" in isolation — it is "given this workload shape, this growth curve, and this risk tolerance, which combination of pricing models, instance families, networking primitives, and storage classes produces the lowest total cost of ownership without sacrificing the stated SLOs."

Why New Solutions Cost Design Matters on SAP-C02

Community data from recent SAP-C02 takers shows cost design scenarios appearing heavily in Domain 2 (new solutions, 29 percent) and Domain 3 (continuous improvement, 25 percent). The exam rarely asks you to recite a price; it asks you to match a workload pattern (bursty, predictable, interruptible, latency-sensitive, data-heavy) to the correct cost primitive. New solutions cost design at Pro level is therefore a decision-matrix discipline, not a calculator discipline.

Core term — New solutions cost design on AWS is the greenfield architectural pattern of selecting Savings Plans and Reserved Instances for predictable baseline, Spot and Fargate Spot for interruptible capacity, Graviton for modern workloads, serverless where request patterns favour it, VPC endpoints and CloudFront to eliminate data transfer, Intelligent-Tiering for unknown access patterns, organization-level tagging for attribution, and Compute Optimizer plus Cost Anomaly Detection for continuous feedback — all committed before any production traffic lands.

New Solutions Cost Design in Plain Language

New solutions cost design sounds abstract because it mixes a dozen pricing models at once. Four analogies from completely different domains make the whole picture click.

Analogy 1 — The Restaurant Kitchen Staffing Plan

Imagine you are opening a restaurant that will serve breakfast, lunch, dinner, and an occasional weekend brunch rush. On-Demand EC2 is hiring a waiter by the hour through a temp agency — infinitely flexible, but the hourly rate is the highest. Compute Savings Plans are signing a one-year or three-year contract with a core kitchen staff who can cook anything on the menu — you commit to paying them forty hours a week whether or not the dining room is full, and in exchange their hourly rate drops thirty to forty percent. EC2 Instance Savings Plans are committing to a head pizza chef specifically — cheaper than the generalist contract, but if you later pivot to sushi that commitment is wasted. Spot Instances are the line cooks from a hospitality school who come in as extras on busy nights at a fraction of the price, but the head of school can pull them back with two minutes' notice when there is a school event. Fargate Spot is the same cheap extras, delivered through an agency that handles the paperwork. Lambda is paying a caterer per plate served — no wages between orders, perfect if you only open three nights a week. New solutions cost design is the head chef sitting down in month zero, looking at the menu and the projected covers, and deciding the mix: how much permanent staff, how much generalist contract, how much specialist contract, and how much per-shift hiring. The restaurant that gets this mix right thrives; the one that hires everybody full time burns cash in quiet weeks, and the one that relies only on temps pays double on weekends.

Analogy 2 — The Hotel Room Booking Strategy for a Conference Organiser

You run an annual conference and must reserve rooms for attendees. The hotel offers several contracts. On-Demand is the rack rate: book a room the day before and you pay the sticker price. A one-year Compute Savings Plan is a corporate contract where you commit to a total dollar amount of hotel stays per month across any hotel in the chain, and any room booked against that contract gets a thirty percent discount; if one attendee cancels their room you can reassign the discount to another attendee in another city. A three-year contract is the same with a deeper discount but locks you in longer. A no-upfront payment is pay-as-you-go against the contract month by month; an all-upfront payment is writing one big cheque now and trading cash flow for a deeper discount. An EC2 Instance Savings Plan is like committing specifically to the Hilton brand — cheaper than the chain-wide contract but useless if the conference moves to a Marriott city. A Reserved Instance is a specific room type at a specific property — size flexibility means if you booked a queen you can apply the discount to any queen or king in the same property up to your commitment amount. A Convertible RI is the room booking you can exchange for any other room type mid-year. Spot is the hotel's late-cancellation inventory sold at seventy percent off, on the condition that the hotel can take the room back with two minutes' notice if a full-price guest walks in. New solutions cost design for the conference is deciding: what fraction of my attendees have fixed dates (book Savings Plans), what fraction can stay within the loyalty chain (EC2 Instance SP), what fraction are flexible on dates and locations (Spot), and how much cash I have upfront to deepen the discount.

Analogy 3 — The Electrical Grid and Peaking Plants

An electricity utility provides power to a city. Baseload demand — the power the city needs twenty-four hours a day, seven days a week — is served by nuclear or coal plants with low per-kilowatt-hour cost but multi-year construction commitments. Intermediate-load demand is served by natural gas combined-cycle plants — cheaper than peakers, more flexible than baseload. Peak demand during heatwaves is served by fast-start peaker plants whose per-kilowatt-hour cost is ten times higher but which can come online in minutes. Spot inventory is surplus renewable generation sold at negative prices when the grid has too much wind. New solutions cost design is the utility planner deciding: how much baseload to commit to (three-year Savings Plans for the workload floor), how much intermediate (one-year Savings Plans), how much peaker (On-Demand and Auto Scaling), and how much opportunistic (Spot for batch jobs). A utility that builds only peaker plants burns the city's money; a utility that builds only baseload cannot handle a heatwave; the right mix is engineering plus forecasting plus a pricing model for each.

Analogy 4 — The Subway System Versus Ride-Sharing

A city transport authority needs to move commuters. The subway is a serverless-like fixed-cost service: massive upfront capital (infrastructure), then cheap per-rider cost and it scales elastically within its design envelope. Ride-sharing is On-Demand EC2: zero commitment, infinitely flexible, but expensive per mile. Buses are a Savings Plan: committed routes that run whether full or empty during their service windows. New solutions cost design models the commuter pattern: if ninety percent of trips follow the subway corridor at predictable times, build the subway; if usage is bursty and unpredictable, keep ride-sharing; if a specific route has steady but modest demand, run a bus. Mixing the three in the right ratio is the art of the transit planner, and it is the art of the AWS architect.

Hold these four mental models — restaurant kitchen, hotel conference, electric grid, subway system — and every SAP-C02 new solutions cost design question becomes a ratio decision, not a memorisation problem.

The New Solutions Cost Design Reference Architecture

Every serious new solutions cost design on AWS layers eight capabilities on top of a fresh landing zone. Picture this as a cost stack.

Layer 0 — AWS Organizations and Tagging Foundation

Nothing else works at scale without this. AWS Organizations with all features enabled, tag policies at the root, and cost allocation tag activation in the payer account form the substrate on which every later layer is measured.

Layer 1 — Commitment Portfolio: Savings Plans and Reserved Instances

The predictable baseline of your new solution is covered by Compute Savings Plans plus a thin layer of EC2 Instance Savings Plans and a sliver of Reserved Instances for specific families. This is the new solutions cost design foundation.

Layer 2 — Opportunistic Capacity: Spot Patterns

On top of the commitment floor, interruptible workloads run on Spot, Fargate Spot, and Karpenter-managed Spot nodes using diversification across instance families, sizes, and Availability Zones.

Layer 3 — Compute Substrate: Graviton First

New solutions cost design on greenfield code chooses Graviton (ARM64) as the default instance family, falling back to Intel or AMD only for workloads with hard dependencies.

Layer 4 — Serverless Where the Math Works

Lambda, Fargate, Step Functions Express, API Gateway HTTP, EventBridge, and DynamoDB on-demand replace always-on EC2 wherever request density favours pay-per-invocation pricing.

Layer 5 — Data Transfer Elimination

VPC endpoints, CloudFront, Origin Shield, Transit Gateway versus VPC Peering cost crossover, and PrivateLink are the new solutions cost design levers for shaving the silent data-transfer tax.

Layer 6 — Storage Class Intelligence

S3 Intelligent-Tiering by default, lifecycle rules for known access patterns, EBS gp3 as the default block storage, and EFS Intelligent-Tiering for file workloads.

Layer 7 — Feedback Loop: Compute Optimizer, Cost Anomaly Detection, Storage Lens

Continuous rightsizing recommendations, ML-based anomaly alerts, and organization-wide S3 analytics close the loop from design to operation.

The Savings Plans Portfolio for New Solutions Cost Design

Savings Plans are the single biggest lever a Pro-level architect has. New solutions cost design decides the commitment portfolio before the first production instance launches.

Compute Savings Plans versus EC2 Instance Savings Plans

Compute Savings Plans apply to EC2 regardless of family, region, size, tenancy, or operating system, and also apply to AWS Fargate and AWS Lambda. Flexibility is maximum; discount is up to 66 percent off On-Demand. EC2 Instance Savings Plans apply only to a specific instance family in a specific region (for example m6i in us-east-1), across sizes and across Availability Zones within that family and region. Flexibility is lower; discount is up to 72 percent off On-Demand.

The New Solutions Cost Design Portfolio Rule

Cover the predictable organization-wide baseline with a Compute Savings Plan at the payer level, so the discount can flow to any member account and any compute service. Cover a specific high-volume, stable workload that you know will run on a specific family (for example a fleet of Graviton m7g web servers) with an EC2 Instance Savings Plan in the account that owns that fleet to capture the extra six percentage points of discount. Reserve Reserved Instances for edge cases like Windows with SQL Server BYOL that Savings Plans do not cover.

The 1-Year versus 3-Year Decision Matrix

Dimension | 1-Year | 3-Year
Discount depth (Compute SP, no upfront) | ~27% | ~54%
Discount depth (EC2 Instance SP, all upfront) | ~40% | ~72%
Commitment risk | 12 months | 36 months
Workload confidence needed | Medium | High (stable platform roadmap, region, and architecture)
Break-even vs On-Demand | Payback within months | Payback within weeks, but locked in 3x longer

The new solutions cost design rule of thumb: commit 3-year for the workload layer you are absolutely certain will still be on AWS in three years (core services, data platform), 1-year for services where architecture may evolve (experimental ML, new product lines), and leave the rest uncovered for On-Demand or Spot.

The No-Upfront versus All-Upfront Decision Matrix

Payment Option | Cash Flow Impact | Discount Uplift vs On-Demand | When to Choose
No Upfront | $0 now, monthly charge | ~27% (1y Compute) | Preserve cash; CFO prefers OPEX
Partial Upfront | ~50% now, reduced monthly | ~28.5% (1y Compute) | Middle ground
All Upfront | 100% now, $0 monthly | ~30% (1y Compute) | Cash rich; CFO wants deepest unit economics

For a 3-year Compute Savings Plan the delta between No Upfront and All Upfront can be 4 to 6 percentage points of additional discount — material on a 500,000 dollar per month bill.

Savings Plans apply automatically across the whole organization — A Savings Plan purchased in the payer account (or any member account) is applied to matching usage across every member account in the AWS Organization by default, just like Reserved Instances. This is why new solutions cost design almost always buys Savings Plans at the payer level — centralised purchasing, distributed benefit. To prevent a specific member account from consuming central commitments, toggle sharing off per-member in the payer billing preferences. SCPs and IAM policies cannot achieve this; it is a billing-layer toggle.

Savings Plans cannot be cancelled or exchanged — Unlike a Convertible Reserved Instance, an AWS Savings Plan cannot be cancelled, modified, or exchanged once purchased. The only safety valve is letting it expire at the end of its term. New solutions cost design therefore recommends laddering commitments: if your baseline is 100 units, buy 60 units now at 3-year and 30 units now at 1-year, and leave 10 units uncommitted. When demand grows you top up with another layer; when demand shrinks you simply let the 1-year expire without buying the next round.
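The laddering rule above reduces to simple arithmetic. The sketch below uses the ~54% and ~27% no-upfront discounts quoted earlier as illustrative placeholders (real Savings Plans rates vary by region and instance mix):

```python
# Sketch of a laddered Savings Plans portfolio for a 100-unit baseline.
# Discount rates are illustrative placeholders, not quoted AWS prices.
ON_DEMAND_RATE = 1.00   # normalized $/unit-hour
DISCOUNT_3Y = 0.54      # ~54% off for a 3-year Compute SP (no upfront)
DISCOUNT_1Y = 0.27      # ~27% off for a 1-year Compute SP (no upfront)

def hourly_cost(baseline_units, three_yr_units, one_yr_units):
    """Blended hourly cost: committed layers at a discount, remainder On-Demand."""
    uncovered = baseline_units - three_yr_units - one_yr_units
    return (three_yr_units * ON_DEMAND_RATE * (1 - DISCOUNT_3Y)
            + one_yr_units * ON_DEMAND_RATE * (1 - DISCOUNT_1Y)
            + uncovered * ON_DEMAND_RATE)

all_on_demand = hourly_cost(100, 0, 0)   # 100.0
laddered = hourly_cost(100, 60, 30)      # 60*0.46 + 30*0.73 + 10 = 59.5
savings_pct = 100 * (1 - laddered / all_on_demand)
print(f"laddered portfolio: {laddered:.1f}/hr, {savings_pct:.1f}% below On-Demand")
```

The 60/30/10 split keeps roughly 40 percent of spend off the On-Demand rate while leaving an annual exit ramp (the expiring 1-year layer) if demand shrinks.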

Reserved Instance Strategy Across Accounts

Even with Savings Plans available, RIs remain relevant for three cases: RDS, ElastiCache, Redshift, and OpenSearch (where Savings Plans do not apply), Windows BYOL scenarios, and organizations with pre-Savings Plans commitments.

Size Flexibility Groups for Regional RIs

A Regional Reserved Instance applies across sizes in the same instance family, Linux/Unix, default tenancy. AWS computes the "normalized units" per size: nano=0.25, micro=0.5, small=1, medium=2, large=4, xlarge=8, 2xlarge=16, 4xlarge=32, 8xlarge=64, 16xlarge=128. A single m5.4xlarge Regional RI covers four m5.xlarge instances, or two m5.2xlarge instances, or eight m5.large instances — whatever combination adds up to 32 normalized units in that region, that family.
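The normalization factors above make RI coverage a pure unit sum. A minimal sketch (the factor table is copied from the text; the coverage check assumes same family, region, platform, and tenancy):

```python
# Regional RI size flexibility via normalization factors (values from the text).
NORM = {"nano": 0.25, "micro": 0.5, "small": 1, "medium": 2, "large": 4,
        "xlarge": 8, "2xlarge": 16, "4xlarge": 32, "8xlarge": 64, "16xlarge": 128}

def units(instance_type):
    """Normalized units of one instance, e.g. 'm5.4xlarge' -> 32."""
    family, size = instance_type.split(".")
    return NORM[size]

def covered(ri_type, running):
    """True if the RI's units fully cover the running fleet
    (same family, region, Linux/Unix, default tenancy assumed)."""
    return sum(units(i) for i in running) <= units(ri_type)

print(covered("m5.4xlarge", ["m5.xlarge"] * 4))  # True: 4 x 8 = 32 units
print(covered("m5.4xlarge", ["m5.large"] * 9))   # False: 9 x 4 = 36 > 32
```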

The RI Exchange Versus Cancel Decision

Situation | Correct Action | Why
Need to switch m5 to m6i mid-term | Exchange a Convertible RI | Preserves commitment value
Need to switch m5 to r5 mid-term | Exchange a Convertible RI | Family change requires Convertible
No longer need the capacity at all | Sell on the Reserved Instance Marketplace (Standard RI only, one-time payment) | Recover partial value
Standard RI, no buyer on marketplace | Accept sunk cost | Standard RIs cannot be cancelled
Region change needed | Exchange (Convertible) or wait it out | Region is a Convertible-exchangeable attribute

The New Solutions Cost Design Rule for RIs

On a greenfield build, default to Compute Savings Plans for EC2. Buy Convertible RIs only for RDS, ElastiCache, Redshift, and OpenSearch where Savings Plans do not apply. Never buy Standard RIs on a new solution unless you have absolute certainty about the instance family for three years — which, by definition, a new solution rarely has.

The Spot Pattern Library at Professional Depth

Spot Instances are offered at up to 90 percent discount off On-Demand, in exchange for AWS's right to reclaim the capacity with a two-minute notice. New solutions cost design treats Spot not as a single primitive but as a library of patterns.

Pattern 1 — ASG MixedInstancesPolicy with Capacity-Optimized Allocation

Auto Scaling Groups support a mixed instances policy that lets you specify multiple instance types, multiple purchase options (On-Demand plus Spot), and an allocation strategy. The recommended strategy for Spot is capacity-optimized, which picks the Spot pools with the deepest available capacity (lowest interruption probability), rather than lowest-price (cheapest pool, higher interruption rate); the newer price-capacity-optimized strategy balances both signals.

A typical new solutions cost design for a stateless web tier:

  • Instance families: m6i.large, m6a.large, m5.large, m5a.large, m6g.large, m7g.large (diversified across Intel, AMD, Graviton)
  • On-Demand base: 20 percent (covers the workload floor with guaranteed capacity)
  • Spot portion: 80 percent with capacity-optimized allocation across the six pools
  • Result: ~60 to 75 percent effective discount vs all-On-Demand, with interruption rate under 5 percent per pool
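As a sketch, the design above maps to the MixedInstancesPolicy shape that boto3's autoscaling `create_auto_scaling_group` accepts. The launch template name `web-tier` is a hypothetical placeholder, and in practice the Graviton overrides would carry their own per-override launch template referencing an arm64 AMI:

```python
# Sketch of the stateless web-tier MixedInstancesPolicy described above,
# in the parameter shape boto3's create_auto_scaling_group expects.
# "web-tier" is a hypothetical launch template name; the m6g/m7g overrides
# would need a separate arm64 launch template in a real deployment.
mixed_instances_policy = {
    "LaunchTemplate": {
        "LaunchTemplateSpecification": {
            "LaunchTemplateName": "web-tier",
            "Version": "$Latest",
        },
        # Diversify across Intel, AMD, and Graviton pools.
        "Overrides": [{"InstanceType": t} for t in [
            "m6i.large", "m6a.large", "m5.large",
            "m5a.large", "m6g.large", "m7g.large",
        ]],
    },
    "InstancesDistribution": {
        "OnDemandBaseCapacity": 0,
        "OnDemandPercentageAboveBaseCapacity": 20,  # 20% On-Demand floor
        "SpotAllocationStrategy": "capacity-optimized",
    },
}
# import boto3
# boto3.client("autoscaling").create_auto_scaling_group(
#     AutoScalingGroupName="web", MinSize=2, MaxSize=20,
#     MixedInstancesPolicy=mixed_instances_policy,
#     VPCZoneIdentifier="subnet-aaa,subnet-bbb,subnet-ccc")
```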

Pattern 2 — EC2 Spot Fleet with InstanceRequirements

Spot Fleet is the older API; for new solutions cost design on greenfield, ASG with InstanceRequirements (attributes-based instance-type selection) is preferred. You declare "give me instances with 4 to 8 vCPUs, 16 to 32 GB RAM, ARM or x86, any generation newer than current minus two" and ASG picks from every matching pool in every AZ, massively widening the Spot diversification.

Pattern 3 — Fargate Spot for Containerised Batch

Fargate Spot is a capacity provider for ECS that runs tasks on interruptible Fargate capacity at ~70 percent discount. Combine with a Fargate On-Demand base capacity provider and weights: FARGATE weight 1, base 2 tasks; FARGATE_SPOT weight 3, base 0. This runs the first two tasks guaranteed, then places three of every four additional tasks on Spot.
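The base-plus-weights arithmetic can be sketched as a rough one-shot simulation (the real ECS scheduler maintains a running ratio rather than computing the split once):

```python
# Rough simulation of an ECS capacity provider strategy, per the text:
# FARGATE weight 1 with base 2, FARGATE_SPOT weight 3 with base 0.
def split_tasks(total, base_od=2, w_od=1, w_spot=3):
    """Return (on_demand_tasks, spot_tasks) for a two-provider strategy."""
    od = min(total, base_od)                       # base tasks land on FARGATE first
    rest = total - od
    spot = round(rest * w_spot / (w_od + w_spot))  # remainder split by weight
    return od + (rest - spot), spot

print(split_tasks(10))  # (4, 6): 2 base + 2 of 8 on FARGATE, 6 of 8 on Spot
```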

Pattern 4 — Karpenter Spot for EKS

Karpenter is the open-source just-in-time node provisioner for EKS that replaces the Cluster Autoscaler and managed node groups. A single Karpenter NodePool can specify spot capacity type, a list of instance families and sizes (or InstanceRequirements), and Karpenter consolidates workloads by packing pods efficiently and replacing underutilised nodes. Interruption handling is built in via SQS + EventBridge integration with the EC2 Spot interruption notice.

Pattern 5 — Checkpointing for Long-Running Spot Jobs

For machine learning training, video encoding, scientific computing — jobs that run for hours — new solutions cost design adds application-level checkpoints to S3 every N minutes. On a Spot interruption notice the workload flushes state to S3, the replacement instance pulls state on startup, and the job resumes. AWS Batch with Spot and managed compute environments does this for you; SageMaker Training Jobs support managed Spot with checkpoints natively.

Spot is NOT suitable for stateful databases or latency-critical synchronous APIs — A classic SAP-C02 distractor offers Spot as a cost optimization for a primary RDS database or a customer-facing synchronous API with strict p99 latency SLOs. The two-minute interruption notice makes Spot unsuitable for any workload where loss of a node breaks data durability or where connection setup latency would be unacceptable. Spot is for stateless, idempotent, horizontally scalable, interruption-tolerant workloads: web tier behind a load balancer, batch processing, CI runners, rendering farms, training jobs with checkpoints. New solutions cost design keeps primary databases, single-node stateful services, and licensed workloads on On-Demand or RIs.

Use Spot placement scores for capacity confidence — Before committing a new solutions cost design to Spot, run the EC2 Spot placement score API to get a 1-to-10 score per region and instance mix, indicating AWS's confidence that it can fulfil that Spot capacity. A score of 8 or higher means your Spot strategy is safe; lower scores suggest diversifying across more instance families or more regions.
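The request shape for that API, as boto3's `ec2.get_spot_placement_scores` expects it, looks like the following sketch; the instance list, capacity target, and regions are illustrative, and the call itself is shown commented rather than executed:

```python
# Shape of an EC2 Spot placement score request for boto3's
# ec2.get_spot_placement_scores. All values are illustrative.
params = {
    "InstanceTypes": ["m6i.large", "m6a.large", "m6g.large", "m7g.large"],
    "TargetCapacity": 50,
    "TargetCapacityUnitType": "units",
    "RegionNames": ["us-east-1", "us-west-2", "eu-west-1"],
    "SingleAvailabilityZone": False,
}
# import boto3
# scores = boto3.client("ec2").get_spot_placement_scores(**params)["SpotPlacementScores"]
# Per the text, a score of 8 or higher means the Spot strategy is safe;
# lower scores call for more instance families or more regions.
```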

Graviton Migration Plan for New Solutions Cost Design

AWS Graviton processors (ARM64) deliver up to 40 percent better price-performance than comparable x86 instances for a wide range of workloads. New solutions cost design defaults to Graviton unless a hard blocker exists.

The Graviton Migration Waterfall

  1. Managed services — Aurora, RDS, ElastiCache, OpenSearch, MemoryDB, MSK all support Graviton with a single parameter change on the instance class. Zero code change.
  2. Serverless — Lambda arm64 architecture, Fargate platform version with ARM. One-line change in the function config or task definition.
  3. Container workloads — ECS and EKS on Graviton nodes. Requires multi-architecture container images (docker buildx producing linux/amd64,linux/arm64 manifests).
  4. EC2 application tier — managed-runtime and easily recompiled languages (Node.js, Python, Java, Go, .NET 6+) are typically drop-in; check native dependencies.
  5. EC2 with native dependencies — compiled C/C++ libraries, proprietary agents, BYOL software: test carefully, last to migrate.

Graviton Savings Plan Interaction

Compute Savings Plans apply to Graviton the same as to Intel and AMD. An EC2 Instance Savings Plan on the m7g family stacks with Graviton's inherent price-performance advantage, compounding the savings.

The New Solutions Cost Design Rule

Every new workload starts on Graviton. Fall back to x86 only when a specific native dependency fails testing. This alone is the single biggest compute cost reduction available on greenfield.

Serverless Cost Patterns at Pro Depth

Serverless is not automatically cheaper — it is cheaper when request patterns are bursty, idle periods are long, or traffic is unpredictable. New solutions cost design uses serverless strategically.

Pattern 1 — Lambda Power Tuning

Lambda is billed per GB-second (memory × duration). Counter-intuitively, increasing memory often reduces total cost because CPU scales with memory and functions finish faster. The open-source AWS Lambda Power Tuning tool runs a function at memory values from 128 MB to 10,240 MB, plots cost and latency against memory, and identifies the sweet spot. A CPU-bound function tuned from 512 MB to 1,792 MB can cut cost by 40 percent and latency by 60 percent simultaneously.
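The GB-second arithmetic behind that claim can be sketched directly. The per-GB-second rate below is the published x86 us-east-1 price at the time of writing (treat as illustrative), and the request charge is excluded:

```python
# Lambda duration cost in GB-seconds (request charge excluded).
# Rate is the published x86 us-east-1 price; treat as illustrative.
PRICE_PER_GB_S = 0.0000166667

def duration_cost(memory_mb, avg_seconds, invocations):
    return (memory_mb / 1024) * avg_seconds * invocations * PRICE_PER_GB_S

# CPU scales with memory: 512 MB -> 1792 MB is 3.5x CPU, so a CPU-bound
# function's duration can drop from 1.00 s to roughly 0.29 s.
slow = duration_cost(512, 1.00, 1_000_000)   # ≈ $8.33 per 1M invocations
fast = duration_cost(1792, 0.29, 1_000_000)  # ≈ $8.46 — near-identical cost
print(round(slow, 2), round(fast, 2))
```

For a perfectly CPU-bound function the tuned configuration is roughly cost-neutral with ~70 percent lower latency; when duration drops faster than memory rises (warm caches, less GC pressure), the cost reductions the text describes appear as well.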

Pattern 2 — Step Functions Express versus Standard

Dimension | Standard Workflows | Express Workflows
Execution duration | Up to 1 year | Up to 5 minutes
Pricing model | Per state transition | Per request + duration (similar to Lambda)
Execution history | Visual history retained 90 days | CloudWatch Logs only
Workflow semantics | Exactly-once | At-least-once (async) or at-most-once (sync)
Cost at 1M executions (25 state transitions each) | ~$625 | ~$1 plus duration (100 ms average)

New solutions cost design selects Express for high-throughput short-lived microservice orchestration (API Gateway backends, IoT data processing), and Standard for long-running workflows with human approval, legal retention, or financial reconciliation.
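The Standard-side number is worth computing explicitly, since it dominates the comparison. Using the published Standard rate of $0.025 per 1,000 state transitions (the Express duration component is omitted here):

```python
# Standard Workflows cost at scale: $0.025 per 1,000 state transitions.
STANDARD_PER_TRANSITION = 0.025 / 1000

def standard_cost(executions, transitions_per_exec):
    return executions * transitions_per_exec * STANDARD_PER_TRANSITION

# 1M executions x 25 transitions each = 25M transitions.
print(standard_cost(1_000_000, 25))  # 625.0 — vs ~$1 + duration on Express
```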

Pattern 3 — EventBridge Rules Cost

EventBridge is billed per million events published to custom buses (no charge for default AWS service bus events and some targets). At scale a chatty microservice producing a billion events per month costs roughly 1,000 dollars on EventBridge alone. New solutions cost design uses content filtering at the rule level (rather than filtering inside Lambda after delivery) to avoid firing downstream targets unnecessarily, and Pipes with built-in filtering and enrichment to collapse lambda-based connector code.

Pattern 4 — API Gateway HTTP API versus REST API

HTTP APIs are roughly 70 percent cheaper than REST APIs at equivalent request volume and support most common patterns (Lambda proxy, JWT authorizer, CORS). New solutions cost design defaults to HTTP APIs, falling back to REST APIs only when features like API keys with usage plans, request validators, or WAF native integration are required.

Pattern 5 — DynamoDB On-Demand versus Provisioned

On-Demand bills per request (about $1.25 per million write requests, $0.25 per million reads in us-east-1). Provisioned is cheaper at steady utilisation above ~18 percent. New solutions cost design starts new tables on On-Demand for the first months while access patterns are unknown, then switches to Provisioned with auto-scaling once utilisation curves stabilise.
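The breakeven can be derived from the two price lines. The sketch below uses published us-east-1 write rates as illustrative inputs; it lands near 14 percent, and published breakevens range roughly 14 to 18 percent depending on assumptions:

```python
# Breakeven utilisation between DynamoDB On-Demand and Provisioned writes.
# Rates are published us-east-1 figures; treat as illustrative.
ON_DEMAND_PER_WRITE = 1.25 / 1_000_000  # $ per write request
WCU_PER_HOUR = 0.00065                  # $ per provisioned WCU-hour
HOURS_PER_MONTH = 730

# One WCU sustains 1 write/sec; at utilisation u it serves u*3600*730 writes/mo.
provisioned_month = WCU_PER_HOUR * HOURS_PER_MONTH
writes_at_full = 3600 * HOURS_PER_MONTH
breakeven_u = provisioned_month / (writes_at_full * ON_DEMAND_PER_WRITE)
print(f"breakeven utilisation ≈ {breakeven_u:.1%}")  # below this, On-Demand wins
```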

Pattern 6 — S3 Intelligent-Tiering as Default

S3 Intelligent-Tiering moves objects between access tiers (Frequent, Infrequent, Archive Instant, Archive, Deep Archive) based on observed access. There is a small monitoring fee per 1,000 objects per month but no retrieval fee for the Instant tiers. New solutions cost design puts every new bucket on Intelligent-Tiering by default unless the access pattern is known with certainty (for example, logs that are write-once read-never go to Standard-IA + Glacier Deep Archive via lifecycle).

Lambda Power Tuning is a free one-afternoon win — Running AWS Lambda Power Tuning across every production function typically finds 20 to 40 percent cost reduction and 30 to 60 percent latency improvement. The tool itself is a Step Functions state machine deployable from AWS Serverless Application Repository in minutes. Make it a checkpoint in every CI/CD pipeline — new solutions cost design means Power Tuning runs before promotion to production.

Data Transfer Reduction Patterns

Data transfer is the silent killer of AWS bills. NAT Gateway processing, inter-AZ replication, and egress to the internet compound into surprise line items.

Pattern 1 — VPC Gateway Endpoints for S3 and DynamoDB

Gateway endpoints are free and route S3 and DynamoDB traffic within the AWS network without ever traversing NAT Gateway. On a VPC processing 100 TB of S3 traffic per month via NAT, the NAT Gateway data processing charges alone can exceed 4,500 dollars per month. Adding a gateway endpoint to the route table reduces this to zero.
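The arithmetic behind that figure, using the $0.045 per GB NAT processing rate quoted later in this section:

```python
# NAT Gateway data processing for 100 TB/month vs a free S3 gateway endpoint.
NAT_PER_GB = 0.045   # NAT Gateway data processing, $ per GB
tb_per_month = 100
nat_monthly = tb_per_month * 1024 * NAT_PER_GB
print(f"NAT processing: ${nat_monthly:,.0f}/month; gateway endpoint: $0")
```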

Pattern 2 — VPC Interface Endpoints for AWS Service APIs

Interface endpoints (PrivateLink) expose AWS service APIs (SSM, Secrets Manager, ECR, STS, KMS, SQS, SNS, and more) as private ENIs inside your VPC. Per-hour ENI charge is ~$0.01 per AZ plus per-GB processing. For high-traffic services like ECR (container image pulls) or CloudWatch Logs, interface endpoints beat NAT Gateway on both cost and latency.

Pattern 3 — CloudFront Origin Shield

Origin Shield is an additional caching layer between CloudFront regional edge caches and your origin. It increases cache hit ratio on the origin by consolidating requests from all regional edges into one or two shield POPs per origin. For a video streaming workload with 100 TB per month egress from S3 through CloudFront, enabling Origin Shield can reduce origin egress by 30 to 50 percent at a small Origin Shield request fee that nets a positive outcome.

Pattern 4 — Transit Gateway versus VPC Peering Cost Crossover

VPC Peering is free for the connection itself (only inter-AZ data transfer cost applies). Transit Gateway charges per-hour per attachment (~$0.05) plus per-GB processed ($0.02). At low VPC count Peering is cheaper; at scale TGW's operational simplicity wins. The crossover depends on traffic volume per attachment:

  • For under ~5 VPCs, Peering is usually cheaper
  • At 10+ VPCs, TGW's N-attachment model beats N-squared peering operationally
  • If heavy traffic flows across more than 2 VPCs (e.g. shared services pattern), TGW's centralisation avoids the hairpin cost of peering

New solutions cost design uses Peering for small stable pairs (for example dev and prod) and TGW for everything else.
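The crossover above can be roughed out with the rates quoted in this pattern. The sketch ignores inter-AZ charges on the TGW path and assumes all peered traffic crosses AZs, so treat it as an order-of-magnitude comparison:

```python
# Rough monthly cost comparison: Transit Gateway vs VPC Peering for N VPCs
# exchanging G GB/month. Rates as quoted in the text; simplifying assumptions
# noted in the lead-in.
TGW_ATTACH_HR, TGW_PER_GB, HOURS = 0.05, 0.02, 730
INTER_AZ_PER_GB = 0.01  # charged in each direction

def tgw_monthly(n_vpcs, gb):
    return n_vpcs * TGW_ATTACH_HR * HOURS + gb * TGW_PER_GB

def peering_monthly(gb, cross_az_fraction=1.0):
    return gb * cross_az_fraction * INTER_AZ_PER_GB * 2  # both directions

# 5 VPCs exchanging 10 TB/month: TGW ≈ $382.50, Peering ≈ $200.
print(tgw_monthly(5, 10_000), peering_monthly(10_000))
```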

Pattern 5 — PrivateLink for Third-Party SaaS

When consuming a third-party SaaS service privately (say, a security vendor's API), PrivateLink avoids internet egress charges on the consumer side and eliminates public IP surface area. The provider side incurs NLB cost; the consumer side incurs interface endpoint hourly plus processing. Compared to routing through NAT Gateway to the public API, PrivateLink is typically cheaper above ~200 GB per month per service.

Pattern 6 — Same-Region, Same-AZ for Hot Paths

Inter-AZ data transfer is $0.01 per GB in each direction ($0.02 round trip). For synchronous chatty microservice pairs, using zonal endpoints and placing both ends in the same AZ eliminates this. For an RDS read replica, placing the replica in the same AZ as the consumer tier that reads from it eliminates the inter-AZ charge on the read path (cross-AZ replication from the primary may still apply). New solutions cost design considers AZ placement as a cost dimension, not only a reliability dimension.

NAT Gateway is the most common surprise line item on new workloads — NAT Gateway charges for both hourly availability ($0.045 per hour per AZ) and data processing ($0.045 per GB processed). A three-AZ architecture processing 50 TB per month on NAT Gateway is approximately 2,300 dollars per month in data processing alone. New solutions cost design adds S3 and DynamoDB gateway endpoints before production cutover, adds interface endpoints for the top 5 to 10 AWS services the workload uses, and routes all egress traffic (including application logs, metrics, and agent heartbeats) through endpoints where available.

Tagging Policy and Cost Allocation at Scale

Attribution without tags is useless. New solutions cost design writes the tagging policy before the first resource deploys.

Mandatory Tag Schema

  • CostCenter — finance attribution (e.g. CC-Retail-100)
  • Project — product or initiative (e.g. checkout-v3)
  • Environment — prod, staging, dev, sandbox
  • Owner — team email or group (e.g. [email protected])
  • DataClassification — public, internal, confidential, restricted
  • ManagedBy — terraform, cdk, console, cloudformation

Enforcement Layers

  1. Tag policies at the AWS Organizations root define the schema and allowed values.
  2. SCPs deny RunInstances, CreateBucket, CreateFunction, and friends when aws:RequestTag/CostCenter is missing or not in the approved list.
  3. Terraform / CDK modules automatically apply tags from a shared library so developers inherit them without thinking.
  4. AWS Config rule required-tags flags existing non-compliant resources for remediation.
  5. Cost allocation tag activation in the payer account makes the tag appear in Cost Explorer, CUR, and Budgets.
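The SCP from step 2 follows the standard Null-condition pattern for missing request tags. A minimal sketch, shown here as a Python dict (the Sid is a hypothetical name; a real policy would extend the Action list to the other create calls):

```python
# Sketch of a tag-enforcement SCP: deny ec2:RunInstances when the request
# carries no CostCenter tag, using the standard Null-condition pattern.
deny_untagged_instances = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyRunInstancesWithoutCostCenter",  # hypothetical Sid
        "Effect": "Deny",
        "Action": "ec2:RunInstances",
        "Resource": "arn:aws:ec2:*:*:instance/*",
        # Null=true means "the tag key is absent from the request".
        "Condition": {"Null": {"aws:RequestTag/CostCenter": "true"}},
    }],
}
```

Enforcing the approved-value list (rather than mere presence) adds a second statement with a StringNotEquals condition on the same tag key.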

Showback and Chargeback Design

New solutions cost design decides upfront whether the organization will do showback (report what each team spent, no internal invoicing) or chargeback (internal invoicing, each team's budget gets charged). Showback uses Cost Categories and Cost Explorer filtered by tag. Chargeback uses AWS Billing Conductor to produce proforma invoices per business unit with optional internal markup.

AWS Compute Optimizer in a Multi-Account New Solution

Compute Optimizer is the free AWS ML-based rightsizing service. For new solutions cost design it closes the feedback loop two weeks after go-live.

Enable at the Organization Level

From the payer account, enable Compute Optimizer with Organizations enrollment. This gives a single pane across every member account: EC2, EBS, Lambda, ECS on Fargate, Auto Scaling groups, and RDS (MySQL, PostgreSQL) recommendations.

Recommendation Categories

  • Under-provisioned — workload is CPU or memory bound, upgrade
  • Over-provisioned — utilisation low, downsize
  • Optimized — current choice is good
  • Not Optimized (risk) — recommendation involves risk (e.g. family change) — review manually

The 14-Day Observation Window

Compute Optimizer needs 14 days of CloudWatch metrics before it will issue recommendations, and for memory-based EC2 recommendations the CloudWatch agent must be installed (it is not installed by default). New solutions cost design therefore bakes the CloudWatch agent into the AMI or deploys it via Systems Manager State Manager as a baseline.
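The memory prerequisite boils down to one agent configuration block. This is a minimal sketch of the agent's JSON schema assuming only the memory-utilisation metric is needed; the State Manager distribution path in the comment is one common approach, not the only one — validate against the CloudWatch agent documentation.

```python
# Minimal CloudWatch agent config that publishes the memory metric Compute
# Optimizer needs for memory-based EC2 recommendations (sketch of the agent's
# JSON config schema; extend with disk/other metrics as required).
import json

AGENT_CONFIG = {
    "metrics": {
        "metrics_collected": {
            "mem": {
                "measurement": ["mem_used_percent"],
                "metrics_collection_interval": 60,
            }
        }
    }
}

# One distribution pattern: store this JSON in SSM Parameter Store and apply it
# fleet-wide with State Manager, e.g.
#   amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:<parameter-name>
print(json.dumps(AGENT_CONFIG))
```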

Actioning Recommendations

  1. Filter recommendations to medium or low risk and at least 20 percent savings
  2. Validate via the referenced CloudWatch metrics
  3. Apply during a maintenance window via Systems Manager Automation (modify launch template, cycle instances through ASG)
  4. Re-check after 30 days to confirm SLOs held

AWS Cost Anomaly Detection for New Workloads

Cost Anomaly Detection is free and uses machine learning to learn each monitor's baseline, then alerts on statistically significant spikes. For a new solution the monitor set is deployed on day zero.

The Standard Monitor Set for a New Solution

  1. Services monitor — covers all AWS services in the account
  2. Linked account monitor per production member account
  3. Cost allocation tag monitor on Project tag for each major new workload
  4. Cost Category monitor on business-unit categories

Alert Subscription

Alerts fire through SNS (with SQS or Lambda fan-out) or email, with a minimum anomaly threshold you control ($100, $1,000, custom). New solutions cost design routes anomaly alerts to the same on-call channel as service SLO alerts — cost anomalies are operational events.
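The day-zero services monitor and its SNS subscription can be sketched as the request payloads the Cost Explorer CreateAnomalyMonitor and CreateAnomalySubscription APIs expect. The ARNs, names, and $100 threshold below are illustrative assumptions — verify the exact field shapes against the current Cost Explorer API reference before deploying.

```python
# Sketch of a day-zero anomaly monitor + SNS subscription as API payloads.
# All ARNs and names are placeholders, not real resources.
def services_monitor(name: str) -> dict:
    """Payload for a monitor covering all AWS services in the account."""
    return {
        "MonitorName": name,
        "MonitorType": "DIMENSIONAL",
        "MonitorDimension": "SERVICE",
    }

def sns_subscription(name: str, monitor_arn: str, topic_arn: str,
                     min_impact_usd: int) -> dict:
    """Payload routing anomalies above a dollar threshold to an SNS topic."""
    return {
        "SubscriptionName": name,
        "MonitorArnList": [monitor_arn],
        "Subscribers": [{"Type": "SNS", "Address": topic_arn}],
        "Frequency": "IMMEDIATE",
        "ThresholdExpression": {
            "Dimensions": {
                "Key": "ANOMALY_TOTAL_IMPACT_ABSOLUTE",
                "MatchOptions": ["GREATER_THAN_OR_EQUAL"],
                "Values": [str(min_impact_usd)],
            }
        },
    }

sub = sns_subscription(
    "prod-cost-anomalies",
    "arn:aws:ce::111122223333:anomalymonitor/EXAMPLE",
    "arn:aws:sns:us-east-1:111122223333:oncall-cost",
    100,
)
```

Pointing the subscriber at the on-call SNS topic implements the "cost anomalies are operational events" stance above.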

New solutions cost design day-zero checklist — On a new greenfield AWS workload the Pro-level architect commits to day zero: (1) AWS Organizations all features + tag policy + cost allocation tags activated in payer; (2) Compute Savings Plan for the committed baseline, laddered 3-year 60 percent + 1-year 30 percent, leave 10 percent uncommitted; (3) Graviton as default CPU architecture; (4) MixedInstancesPolicy with Spot for stateless tiers; (5) VPC gateway endpoints for S3 and DynamoDB + interface endpoints for top services; (6) CloudFront with Origin Shield for any S3 origin serving the public; (7) S3 Intelligent-Tiering by default; (8) Compute Optimizer enabled with organization enrollment; (9) Cost Anomaly Detection monitors for services + linked accounts + tags + categories; (10) S3 Storage Lens activated for organization-wide storage analytics. This checklist is the SAP-C02 answer pattern to "design a cost-optimized greenfield architecture."

Scenario — Cutting a 500,000 Dollar Monthly Bill by 30 Percent

A VP of Engineering shares the monthly bill: 500,000 dollars. Cost Explorer shows the top line items: EC2 (230,000), RDS (85,000), NAT Gateway processing (42,000), S3 (38,000), data transfer out (35,000), Lambda (18,000), others (52,000). Reduce the total by 30 percent (150,000) per month in one quarter, without losing product features.

Move 1 — Commit Savings Plans on Predictable EC2 (expected savings: 55,000/month)

Analyse Cost Explorer's Savings Plans recommendations for 3-year and 1-year commitments. Purchase a 3-year Compute Savings Plan at no-upfront covering roughly $200,000 of the $230,000 EC2 baseline — at a ~27.5 percent discount on the covered usage, that is $55,000/month saved. Keep the remainder uncommitted for growth and Spot migration headroom.
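The Move 1 arithmetic is worth making explicit, because the same two-line model recurs in every commitment question. The discount rate here is the scenario's illustrative figure, not a quoted AWS rate.

```python
# Effective-rate arithmetic behind a Savings Plan purchase (illustrative rates).
def sp_monthly_savings(covered_usage: float, discount: float) -> float:
    """Monthly saving from a Savings Plan over the covered On-Demand usage."""
    return round(covered_usage * discount, 2)

def blended_bill(total_on_demand: float, covered: float, discount: float) -> float:
    """New monthly bill: covered usage at the SP rate, the rest On-Demand."""
    return round(covered * (1 - discount) + (total_on_demand - covered), 2)

print(sp_monthly_savings(200_000, 0.275))       # ~55,000/month saved
print(blended_bill(230_000, 200_000, 0.275))    # EC2 line drops to ~175,000
```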

Move 2 — Migrate Stateless Tiers to Spot (expected savings: 30,000/month)

Identify stateless web and API tiers (currently $100,000 On-Demand before SP). Reconfigure ASGs to MixedInstancesPolicy with 20 percent On-Demand base and 80 percent Spot across six diversified pools, capacity-optimized allocation. Expected effective discount on the 80 percent Spot portion: ~60 percent → $48,000 savings, offset by Savings Plan overlap, net $30,000.
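The ASG reconfiguration in Move 2 maps to one MixedInstancesPolicy structure. A sketch of the Auto Scaling API payload shape — the launch template ID and the six Graviton instance types are assumptions chosen for illustration; pick pools from your own fleet.

```python
# Sketch of a MixedInstancesPolicy: 20% On-Demand above the base capacity,
# 80% Spot, capacity-optimized allocation across six diversified pools.
# Launch template ID and instance types below are placeholders.
def spot_mixed_policy(launch_template_id: str, pools: list[str]) -> dict:
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": launch_template_id,
                "Version": "$Latest",
            },
            "Overrides": [{"InstanceType": t} for t in pools],
        },
        "InstancesDistribution": {
            "OnDemandPercentageAboveBaseCapacity": 20,
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }

POOLS = ["m7g.large", "m6g.large", "m7g.xlarge",
         "m6g.xlarge", "c7g.xlarge", "c6g.xlarge"]
policy = spot_mixed_policy("lt-0abc123EXAMPLE", POOLS)
```

Diversification is the point: six pools means a capacity event in one pool drains at most a sixth of the Spot fleet.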

Move 3 — Graviton Migration for Managed Services (expected savings: 18,000/month)

Switch RDS primaries and replicas from db.m5 to db.m7g (on Aurora, db.r5 to db.r7g), ElastiCache from cache.r5 to cache.r7g, and OpenSearch data nodes from r5 to r6g — a single instance-class parameter change per service. ~20 percent cost reduction on $85,000 RDS + $20,000 cache = $21,000 saving, offset by migration effort, net $18,000.

Move 4 — Eliminate NAT Gateway Data Processing via Endpoints (expected savings: 28,000/month)

VPC Flow Logs reveal the NAT Gateway traffic: 60 percent is S3, 15 percent is DynamoDB, 15 percent is ECR image pulls, 10 percent is other. Add S3 and DynamoDB gateway endpoints (free), plus interface endpoints for ECR, Secrets Manager, SSM, CloudWatch Logs, STS. NAT Gateway data processing drops from $42,000 to ~$12,000. Interface endpoint cost adds ~$2,000. Net saving: $28,000.

Move 5 — CloudFront Origin Shield for S3 Egress (expected savings: 12,000/month)

The $35,000 data transfer out includes $18,000 from S3 origins. Enable CloudFront in front of public S3 buckets with Origin Shield in us-east-1. Public egress moves from S3 to CloudFront (cheaper per GB above tier thresholds), origin egress drops 40 percent via Origin Shield cache consolidation. Net saving: $12,000.

Move 6 — Lambda Power Tuning and Step Functions Express Migration (expected savings: 6,000/month)

Run Power Tuning on every production Lambda, adjust memory to cost-optimal point. Migrate short-lived Step Functions workflows from Standard to Express. Lambda cost drops from $18,000 to ~$12,000. Saving: $6,000.

Move 7 — S3 Intelligent-Tiering and Lifecycle Rules (expected savings: 9,000/month)

Analyse S3 Storage Lens: 60 percent of S3 cost is in Standard storage older than 90 days with no access in the last 30 days. Apply Intelligent-Tiering on new prefixes, lifecycle rules moving data older than 90 days to Standard-IA and older than 365 days to Glacier Deep Archive. Net saving: $9,000.
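The Move 7 lifecycle rule is a small, declarative payload. A sketch of the S3 PutBucketLifecycleConfiguration structure — the rule ID and the empty prefix (whole bucket) are illustrative assumptions; scope the filter to the cold prefixes Storage Lens identified.

```python
# Sketch of a lifecycle configuration: 90 days -> Standard-IA,
# 365 days -> Glacier Deep Archive. Rule ID and prefix are placeholders.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "age-out-cold-data",
            "Filter": {"Prefix": ""},   # whole bucket; narrow this in practice
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}
```

Applied with, for example, boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=LIFECYCLE)` or the equivalent Terraform resource.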

Total Expected Monthly Savings: 158,000 Dollars

That is 31.6 percent on a 500,000 dollar bill. The moves compound: Move 1 reduces the baseline that Move 2 and Move 3 then optimise further; Move 4 and Move 5 eliminate charges that are not reduced by commitments. None of these moves require feature cuts, architectural rewrites, or customer-facing SLO changes. Every one of them is a SAP-C02 new solutions cost design answer pattern — not memorised tricks but the application of the cost primitives covered in this note.
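As a sanity check, the seven moves sum as claimed against the 500,000 dollar bill:

```python
# Verifying the scenario arithmetic: expected monthly savings per move.
MOVES = {
    "savings_plans": 55_000, "spot": 30_000, "graviton": 18_000,
    "vpc_endpoints": 28_000, "cloudfront": 12_000,
    "lambda_tuning": 6_000, "s3_lifecycle": 9_000,
}
total = sum(MOVES.values())
print(total, f"{total / 500_000:.1%}")  # 158000 31.6%
```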

Key Numbers and Must-Memorize Facts for New Solutions Cost Design

  • Compute Savings Plans discount up to ~66 percent off On-Demand; apply to EC2, Fargate, Lambda; flexibility across family, size, region, OS, tenancy.
  • EC2 Instance Savings Plans discount up to ~72 percent off On-Demand; locked to a specific instance family in a specific region; still flexible across size and AZ.
  • 3-year All-Upfront Savings Plan delivers the deepest discount, roughly double the 1-year No-Upfront depth.
  • Savings Plans cannot be cancelled or exchanged — ladder your commitments.
  • Convertible RIs can be exchanged family-to-family, size-to-size, region-to-region; Standard RIs only resold on the Marketplace.
  • Reserved Instance size flexibility within a family-region uses normalized units (nano=0.25 ... 16xlarge=128).
  • Spot discount up to ~90 percent; 2-minute interruption notice; use capacity-optimized allocation across ≥6 pools.
  • Fargate Spot ~70 percent off Fargate On-Demand.
  • Graviton delivers up to 40 percent better price-performance; default CPU for new workloads.
  • Lambda Power Tuning often finds 20 to 40 percent cost reduction.
  • Step Functions Express ~95 percent cheaper than Standard for short workflows.
  • S3 Intelligent-Tiering has a small monitoring fee per 1,000 objects; no retrieval fee for Instant tiers.
  • NAT Gateway ~$0.045/hour per AZ + ~$0.045/GB processed.
  • VPC gateway endpoints for S3 and DynamoDB are free.
  • Interface endpoints ~$0.01 per ENI-hour per AZ + per-GB processing.
  • TGW attachment ~$0.05/hour + $0.02/GB processed; VPC Peering attachment is free.
  • Compute Optimizer needs 14 days of metrics; memory recommendations need the CloudWatch agent.
  • Cost Anomaly Detection is free; ML-based; supports 4 monitor types (service, linked account, Cost Category, cost allocation tag).
  • S3 Storage Lens free tier provides account-level usage metrics with 14-day retention; the advanced tier adds activity and prefix-level insights with 15-month retention.
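The normalized-units bullet above can be made concrete. The unit table for common sizes is the standard RI normalization scale; the worked example (one xlarge covering two large in the same family and region) is illustrative.

```python
# RI size flexibility via normalized units: a reservation's units cover any
# mix of sizes within the same instance family and region (Linux/default tenancy).
UNITS = {"nano": 0.25, "micro": 0.5, "small": 1, "medium": 2, "large": 4,
         "xlarge": 8, "2xlarge": 16, "4xlarge": 32, "8xlarge": 64,
         "16xlarge": 128}

def instances_covered(ri_size: str, ri_count: int, target_size: str) -> int:
    """How many instances of target_size a reservation fully covers."""
    return int(ri_count * UNITS[ri_size] // UNITS[target_size])

print(instances_covered("xlarge", 1, "large"))     # 2
print(instances_covered("16xlarge", 1, "large"))   # 32
```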

Common SAP-C02 Traps for New Solutions Cost Design

Trap 1 — Picking Spot for a stateful or synchronous workload

Spot is stateless, idempotent, horizontally scalable work only. If the scenario mentions "primary database", "customer-facing API with strict p99", or "single-node stateful service", Spot is the wrong answer.

Trap 2 — Buying Standard RIs on a greenfield

Standard RIs cannot be exchanged. On a new solution where the instance family may change, Convertible RI or Savings Plans is the correct answer.

Trap 3 — Choosing EC2 Instance SP when workload may change family

EC2 Instance SP is locked to a family. If the scenario describes an evolving platform (e.g. "team plans to evaluate Graviton in 6 months"), Compute Savings Plan is the correct answer.

Trap 4 — Forgetting NAT Gateway data processing charges

A VPC-heavy design serving S3 through NAT generates surprise costs. The correct answer adds gateway endpoints for S3 and DynamoDB, interface endpoints for other services.

Trap 5 — Putting S3 on Intelligent-Tiering for write-once read-never logs

Intelligent-Tiering has a small monitoring fee. For data with known access patterns (archival logs), lifecycle rules to Standard-IA and Glacier Deep Archive are cheaper than Intelligent-Tiering.

Trap 6 — Using VPC Peering for a 50-VPC organization

Peering is N-squared operationally and non-transitive. Transit Gateway is the right answer at scale despite the per-attachment hourly cost.

Trap 7 — Mixing up Compute Optimizer and Cost Anomaly Detection

Compute Optimizer rightsizes resources (structural recommendation). Cost Anomaly Detection alerts on spend spikes (runtime alert). The exam asks for both; pick the right one per scenario.

Trap 8 — Turning off commitment sharing organization-wide to stop one subsidiary

RI and Savings Plans discount sharing is toggled per member account in the payer's billing preferences — there is no need to disable it organization-wide, and SCPs and IAM policies cannot change commitment sharing.

Trap 9 — Forgetting Savings Plans cannot be cancelled

Candidates under-commit or over-commit because they expect a safety valve. Ladder the commitments: 3-year on the absolute floor, 1-year on the confident middle layer, On-Demand on the uncertain tip.

Trap 10 — Using Step Functions Standard for short-lived high-throughput

Express is 90+ percent cheaper for <5-minute workflows at scale. Standard is only correct for long-running, human-in-the-loop, or exactly-once-critical orchestration.

Practice Question Patterns

  1. "Greenfield stateless web application with baseline 100 m5.large and peaks of 300 m5.large." → Compute Savings Plan covering 100 instances + MixedInstancesPolicy with Spot diversification for the 200 peak delta.
  2. "Batch ML training runs 6 hours/day on 100 GPU instances." → Spot + S3 checkpoints via AWS Batch or SageMaker Managed Spot.
  3. "New SaaS expects 10x growth in 12 months, family may change." → Compute Savings Plan (family-flexible) + Convertible RI for managed services.
  4. "VPC processes 200 TB/month S3 traffic via NAT Gateway." → S3 Gateway Endpoint (free, solves it).
  5. "Public-facing video platform with S3 origin, 500 TB egress/month." → CloudFront + Origin Shield with S3 as origin.
  6. "New Lambda function, team unsure of optimal memory." → Lambda Power Tuning.
  7. "High-throughput microservice orchestration, <1 second average duration, 10M executions/day." → Step Functions Express.
  8. "New S3 bucket, unknown access pattern." → Intelligent-Tiering default.
  9. "Finance wants alerts on unusual cost spikes per business unit." → Cost Anomaly Detection with Cost Category or tag monitors.
  10. "New EKS cluster, want aggressive Spot adoption without node group churn." → Karpenter with spot capacity type and InstanceRequirements.

FAQ — New Solutions Cost Design Top Questions

Q1: On a greenfield solution, when do I pick a Compute Savings Plan versus an EC2 Instance Savings Plan?

Default to Compute Savings Plans for the organization-wide baseline because flexibility matters more than the extra 6 percentage points of discount when the architecture is still evolving. A Compute SP applies across EC2 regardless of family, size, region, OS, and tenancy, and also covers AWS Fargate and AWS Lambda — so if in month four the team decides to move a service from EC2 to Fargate, the commitment follows the workload. Layer EC2 Instance Savings Plans on top for a specific stable high-volume fleet where you are absolutely certain the family will not change (for example a production Graviton m7g web tier with 100 committed vCPUs). The extra discount of EC2 Instance SP is real but the family lock-in only pays off when the workload is well-characterised. New solutions cost design on SAP-C02 almost always starts with Compute SP as the foundation and adds EC2 Instance SP as a precision tool on specific workloads.

Q2: How do I decide between 1-year and 3-year Savings Plans, and between no-upfront and all-upfront?

Ladder the commitments. Take your committed baseline and split it into three tranches. The bottom tranche — the absolute floor you are certain will still exist in three years — commit 3-year. The middle tranche — workloads that are confident for the next 12 to 18 months but may evolve beyond — commit 1-year. The top tranche — growth headroom and experimental workloads — leave uncommitted on On-Demand or Spot. Within each tranche, pick the payment option that matches your finance team's preference: all-upfront gives the deepest discount (roughly 4 to 6 percentage points extra on a 3-year term) and is appropriate when the CFO wants to lock unit economics and the cash is available; no-upfront preserves cash flow and keeps the discount as an OPEX reduction; partial-upfront is the middle ground. Concrete example: on a 500,000 dollar monthly bill with 60 percent predictable baseline, commit 180,000 dollars per month 3-year all-upfront (if cash available) plus 90,000 dollars per month 1-year no-upfront, leaving 30,000 dollars per month On-Demand for flexibility. Savings Plans cannot be cancelled, so the laddering is the safety mechanism — you let the shortest layer expire first if the workload shrinks.

Q3: When is Spot appropriate on a greenfield architecture and when is it a trap?

Spot is appropriate for stateless, idempotent, horizontally scalable, interruption-tolerant workloads. Canonical fits are stateless web and API tiers behind a load balancer, batch processing (Glue, EMR, Batch), CI/CD runners, rendering farms, Kubernetes workloads with PodDisruptionBudgets and graceful termination handling, and machine learning training with S3 checkpoints. Spot is a trap for primary databases (RDS, ElastiCache, MemoryDB), single-node stateful services, licensed workloads that take minutes to relicense, and customer-facing synchronous APIs with hard p99 SLOs where a two-minute eviction would cause perceptible impact. The new solutions cost design pattern for Spot on stateless tiers is a MixedInstancesPolicy with 20 percent On-Demand base (guaranteed workload floor) and 80 percent Spot across six or more diversified instance pools with capacity-optimized allocation. Add the Spot Instance interruption notice handler (EventBridge → Lambda → graceful drain) as part of the baseline infrastructure. For containers, ECS Fargate Spot and EKS Karpenter with spot capacity type give the same savings with simpler operations. Never put the absolute workload floor on Spot — that floor belongs on Savings Plans.
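The interruption-notice handler mentioned above (EventBridge → Lambda → graceful drain) can be sketched as a small Lambda function. The event shape is the standard "EC2 Spot Instance Interruption Warning" EventBridge event; the drain step is deployment-specific, so it is left as a placeholder comment rather than real deregistration logic.

```python
# Sketch of a Spot interruption drain handler for EventBridge-invoked Lambda.
def handler(event: dict, context=None) -> dict:
    """Handle an 'EC2 Spot Instance Interruption Warning' EventBridge event."""
    detail = event.get("detail", {})
    instance_id = detail.get("instance-id")
    action = detail.get("instance-action")
    if not instance_id or action != "terminate":
        return {"drained": False}
    # Two minutes to act: deregister the instance from its target groups so the
    # load balancer stops routing new requests, then let in-flight work finish.
    # e.g. elbv2.deregister_targets(TargetGroupArn=..., Targets=[{"Id": instance_id}])
    return {"drained": True, "instance_id": instance_id}

SAMPLE_EVENT = {
    "source": "aws.ec2",
    "detail-type": "EC2 Spot Instance Interruption Warning",
    "detail": {"instance-id": "i-0123456789abcdef0",
               "instance-action": "terminate"},
}
```

Shipping this handler as baseline infrastructure is what makes the 80 percent Spot tier safe to run in production.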

Q4: My team is building a new microservices platform and every service processes data from S3 and writes to DynamoDB. What is the most important new solutions cost design lever we can pull on day one?

Install VPC gateway endpoints for Amazon S3 and Amazon DynamoDB before production cutover. Both endpoints are free and route traffic to S3 and DynamoDB within the AWS backbone without traversing NAT Gateway. A three-AZ architecture processing 100 terabytes per month of S3 traffic via NAT Gateway incurs roughly 4,500 dollars per month in NAT data processing charges plus NAT hourly charges — all of which disappear the moment the gateway endpoint is added to the VPC route tables. The change is non-breaking: existing S3 and DynamoDB SDK clients use the same endpoint URLs, and traffic automatically routes via the gateway endpoint entry in the route table. Pair with interface endpoints (PrivateLink) for the top AWS service APIs your workload calls — Amazon ECR (container image pulls are often the next-biggest NAT spend), AWS Secrets Manager, AWS Systems Manager, Amazon CloudWatch Logs, AWS STS, AWS KMS, Amazon SQS, Amazon SNS. Interface endpoints charge per-ENI-hour plus per-GB processing, but the economics turn positive above a few hundred gigabytes per month per service. Finally, enable S3 Storage Lens at the organization level so you can see, per prefix, how traffic flows and where to optimize next.

Q5: How do I design the commitment portfolio for a new SaaS product where growth is uncertain — we might 10x in twelve months or we might stay flat?

Start with a conservative floor and ladder. Buy Savings Plans only for the baseline you are certain will exist regardless of growth — typically the shared platform services (observability, CI/CD runners, internal APIs, data platform) that serve every customer. Leave the customer-variable tier (web tier, worker tier) uncommitted or on a 1-year Compute Savings Plan for 40 to 60 percent of its expected base, with the remainder on On-Demand and Spot. As the growth curve resolves over the first three to six months, use Cost Explorer's Savings Plans recommendations (which analyses the trailing 7, 30, or 60 days of usage) to top up with another commitment layer. This laddering approach accepts that the first months pay slightly more on uncommitted usage in exchange for not over-committing to a growth scenario that may not materialise. Complement with AWS Cost Anomaly Detection monitors scoped to the new product's Cost Category and tag, so you detect both spend spikes (feature launches, runaway costs) and dips (product changes that reduce compute) and can react fast. Do not forget the Graviton default and the Spot diversification patterns — both compound with commitments to deliver 50 to 70 percent effective discount vs all-On-Demand at steady state.

Q6: How do Compute Optimizer and Cost Anomaly Detection fit into a new solutions cost design workflow?

They close the feedback loop. In the design phase you pick the initial instance types, memory sizes, and pricing models based on best guesses and benchmarks. After 14 days of production traffic, AWS Compute Optimizer starts emitting recommendations: which EC2 instances are over-provisioned, which ASGs could use a different type, which EBS volumes could shrink, which Lambda memory configurations are sub-optimal, which ECS Fargate tasks are over-sized. Review the medium-risk, 20-percent-or-more savings recommendations weekly and action them during maintenance windows via Systems Manager Automation. In parallel, AWS Cost Anomaly Detection monitors learn the baselines for each monitor — services, linked accounts, Cost Categories, cost allocation tags — and alert when the statistical deviation exceeds your threshold (configurable, typically 100 dollars or 1,000 dollars). Route these alerts to the same SNS topic that feeds the on-call channel. A typical new solutions cost design sees Compute Optimizer capture another 10 to 20 percent in structural savings over the first 90 days, and Cost Anomaly Detection catches two to three surprise spikes per quarter (a forgotten dev environment, a misconfigured batch job, a data transfer misroute) before they blow the budget. Both services are free.

Q7: What is the single biggest mistake teams make on new solutions cost design, and how do I avoid it?

Treating cost optimization as a Q3 project after the bill is alarming, rather than as a day-zero architectural commitment. The single biggest mistake is building on On-Demand EC2 with Intel instances, no tags, no endpoints, no Spot, no commitments, and no feedback loop — then three quarters later trying to retrofit Savings Plans, Graviton, and endpoints on a production system that is now resistant to change. Retrofit works but costs 3 to 5 times more engineering effort than doing it right the first time. Avoid this by making new solutions cost design a checklist item on the pre-production design review: Graviton default, Compute Savings Plan purchased for the committed baseline, MixedInstancesPolicy with Spot diversification for stateless tiers, VPC endpoints for S3 and DynamoDB and top service APIs, CloudFront with Origin Shield for public origins, S3 Intelligent-Tiering by default, tag policy enforced via SCP, Compute Optimizer enrolled, Cost Anomaly Detection monitors deployed, S3 Storage Lens activated. Every item is a single Terraform module or a single console checkbox; the architect's job is to make sure all of them ship before the production cutover. The SAP-C02 exam tests exactly this discipline — the correct answer pattern for any "design a cost-optimized new workload" question is not one service, it is the layered stack.

Further Reading

  • Cost Visibility Multi-Account — the organization-level visibility, tagging, and chargeback patterns that feed the attribution side of new solutions cost design.
  • Continuous Improvement Cost Optimization — the remediation patterns for existing workloads that complement the greenfield focus of new solutions cost design.
  • New Solutions Performance — the performance architecture lever where caching and purpose-built databases intersect with cost decisions.
  • Event-Driven Serverless Architecture — the serverless patterns that new solutions cost design selects when request density favours pay-per-invocation.
  • Containerization ECS EKS — the container cost patterns including Fargate Spot, Karpenter, and EKS node group strategy.

Nail new solutions cost design at this depth and you have covered roughly one-fifth of the cost surface area on SAP-C02 — plus built the FinOps discipline that every production AWS workload should begin with.

Official Sources