
Lambda and Application Performance Optimization

6,400 words · ≈ 32 min read

AWS Lambda performance optimization is Task 4.3 of the DVA-C02 exam guide — "Optimize applications by using AWS services and features" — and it is where the exam stops asking "what is Lambda?" and starts asking "why is this Lambda slow, expensive, or throttled, and which lever fixes it?" This chapter picks up where the Lambda fundamentals topic left off and drills into the operator-level levers: memory and CPU tuning with AWS Lambda Power Tuning, cold start anatomy and the three mitigation strategies (smaller init, SnapStart, Provisioned Concurrency), Reserved Concurrency for throttling, execution context reuse for connection pooling, Hyperplane ENI for VPC-attached functions, arm64 Graviton2 for 20% price-performance, Lambda Layers and Lambda Extensions tradeoffs, Function URL versus API Gateway latency, burst concurrency math, the 6 MB / 256 KB payload escape hatch via Amazon S3, efficient packaging, and destinations versus DLQ for asynchronous retry control. Every single one of these is a button the DVA-C02 exam can press — master AWS Lambda performance optimization and Task 4.3 scores become near-automatic.

What AWS Lambda Performance Optimization Means on DVA-C02

AWS Lambda performance optimization on DVA-C02 is not a single knob but a decision tree across four dimensions: latency, throughput, cost, and reliability. Every AWS Lambda performance optimization scenario on the exam ultimately asks you to pick the right lever for the right constraint. The Lambda fundamentals topic teaches the service surface; this AWS Lambda performance optimization topic teaches the dials you turn once you ship the function.

Why AWS Lambda Performance Optimization Is Its Own Task Statement

The DVA-C02 exam guide lists Task 4.3 separately because the skills are different from building the function. Developing a Lambda function asks "what goes inside handler(event, context)?" AWS Lambda performance optimization asks "given the function already works, how do I cut the p99 from 2 s to 200 ms and the bill from $400 to $80?" These are two different muscles, and the exam tests both.

The Four Dimensions of AWS Lambda Performance Optimization

  • Latency — p50 / p95 / p99 invocation time, which drives user-facing UX through API Gateway or Function URLs.
  • Throughput — invocations per second the account or function can sustain without throttling.
  • Cost — GB-seconds billed plus any Provisioned Concurrency fee, network egress, and X-Ray sampling.
  • Reliability — error rate under load, retry behavior on async sources, poison-pill handling on streams.

Every AWS Lambda performance optimization lever in this chapter pulls at one or more of these four dimensions — and the classic DVA-C02 trap is a lever that fixes one dimension while silently worsening another (e.g., Provisioned Concurrency crushes latency but increases cost; larger memory cuts duration but may or may not cut cost depending on the CPU-bound ratio of the workload).

Plain-Language Explanation: AWS Lambda Performance Optimization

The six key terms of AWS Lambda performance optimization — cold start, memory tuning, context reuse, concurrency, Graviton2, payload — click into place once you run them through the three analogies below, in plain language.

Analogy 3 — The Commuter Train System

You can also picture AWS Lambda performance optimization as a commuter rail system. A Function URL is the direct express train — one fewer stop (no API Gateway layer), so lower latency. API Gateway is the transfer station with ticket gates — a little extra latency in exchange for authorization, throttling, mapping templates, and caching. Lambda Layers are the supply cars every train has to tow — convenient for sharing, but the more you tow, the slower departure (init) gets. Lambda Extensions are independent monitoring cars on the train — they share the train's power (the same execution environment, billed by GB-seconds) but have their own lifecycle, and at SHUTDOWN they get up to 2 extra seconds to ship data home. The 6 MB / 256 KB payload limits are the carriage weight limit — moving furniture (large files) doesn't fit on the metro; it goes by freight instead (Amazon S3 pre-signed URLs, an Amazon EFS mount). Async Destinations vs DLQ is the lost-and-found — modern Destinations deliver to SQS / SNS / EventBridge / Lambda with richer context, while a DLQ is the older single-target SQS/SNS option. AWS Lambda performance optimization is tuning this whole rail network — train speed, headway, carriage load, and lost-item handling — at the same time.

Put the three analogies together and the full map of AWS Lambda performance optimization comes into focus.

A cold start is any AWS Lambda invocation that runs on a freshly initialized execution environment — AWS Lambda must download the deployment package, start the runtime, and execute your init code outside the handler before invoking handler(event, context). A warm start reuses a frozen environment whose init has already completed; only the handler itself runs. AWS Lambda performance optimization techniques split into two categories: (1) make cold starts rarer or faster, and (2) make warm starts reuse more work. Reference: https://docs.aws.amazon.com/lambda/latest/operatorguide/execution-environments.html

The AWS Lambda Execution Environment Lifecycle

Every AWS Lambda performance optimization technique operates on one of three lifecycle phases, so mastering the lifecycle is prerequisite.

Phase 1 — Init Phase (INIT)

When AWS Lambda creates a new execution environment, it runs three steps in sequence:

  1. Extension init — every Lambda Extension in /opt/extensions/ is started.
  2. Runtime init — AWS Lambda starts the language runtime (Node.js, Python, JVM, .NET CLR, Ruby, custom provided.al2023).
  3. Function init — AWS Lambda executes your module-level code: import statements, SDK client construction, database connection pools, loaded ML models.

Phase 1 is what the exam calls "cold start." AWS Lambda bills INIT time (at the function memory rate) on standard functions; with SnapStart you instead pay a restore fee each time a snapshot is restored.

Phase 2 — Invoke Phase (INVOKE)

AWS Lambda delivers the event to the runtime, the runtime calls your handler, and the handler runs to completion (return or unhandled exception) or timeout. Every warm invocation is Phase 2 only.

Phase 3 — Shutdown Phase (SHUTDOWN)

When AWS Lambda reclaims an idle execution environment, the runtime and extensions receive a shutdown signal (up to 2 seconds). Extensions can use this window to flush telemetry; most handlers have no shutdown hook.

Every AWS Lambda performance optimization lever on DVA-C02 maps to one phase: INIT levers (SnapStart, Provisioned Concurrency, smaller package, lazy loading, arm64 Graviton2) make cold starts rare or cheap; INVOKE levers (memory/CPU tuning, context reuse, connection pooling, efficient algorithms) make warm runs faster; SHUTDOWN levers (extension flush windows, destinations) shape reliability. Diagnose the phase first, then pick the lever. Reference: https://docs.aws.amazon.com/lambda/latest/operatorguide/execution-environments.html

AWS Lambda Memory Configuration: The Primary Performance Knob

Memory is the single most important AWS Lambda performance optimization setting because it controls both CPU and cost.

The 128 MB to 10,240 MB Range

AWS Lambda lets you configure memory from 128 MB up to 10,240 MB (10 GB) in 1 MB increments. CPU is allocated linearly with memory: you reach one full vCPU at approximately 1,769 MB and roughly six vCPUs at 10,240 MB. This means AWS Lambda performance optimization through memory is really CPU optimization, and CPU-bound workloads often get faster and cheaper at higher memory simply because duration drops faster than GB-seconds rise.

The Counterintuitive Cost Curve

AWS Lambda billing is GB-seconds × request count × regional rate. A naive engineer assumes minimum memory = minimum cost. But for CPU-bound code (image transformation, PDF rendering, crypto, JSON parsing of large payloads) moving from 512 MB to 1,792 MB raises the per-millisecond rate 3.5×, and if the extra CPU cuts duration 5× — say from 6 s to 1.2 s — the bill drops roughly 30% while the function runs five times faster. AWS Lambda performance optimization through memory tuning is one of the few places on AWS where you can buy more speed and save money simultaneously.
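The cost curve is easy to sanity-check with a few lines of arithmetic. In this sketch the durations are illustrative assumptions, and the rate constant is the published us-east-1 x86_64 price per GB-second at the time of writing:

```python
# Rough GB-second cost comparison for a CPU-bound function.
# Durations are illustrative; rate is the us-east-1 x86_64 price per GB-second.
RATE_PER_GB_SECOND = 0.0000166667

def invocation_cost(memory_mb: int, duration_ms: int) -> float:
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * RATE_PER_GB_SECOND

slow = invocation_cost(512, 6000)    # 6 s at 512 MB (a fraction of a vCPU)
fast = invocation_cost(1792, 1200)   # 1.2 s at 1,792 MB (one full vCPU)
print(f"512 MB:  ${slow:.8f}/invocation")
print(f"1792 MB: ${fast:.8f}/invocation")
print(f"net savings: {1 - fast / slow:.0%}")
```

The per-ms rate rose 3.5× but duration fell 5×, so each invocation costs about 30% less.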

AWS Lambda Power Tuning

AWS Lambda Power Tuning is an open-source AWS Step Functions state machine (deployable from the AWS Serverless Application Repository) that empirically finds the optimal memory for a function. You pass in:

  • The function ARN (and alias).
  • A representative invocation payload.
  • A list of memory sizes to test (e.g., [128, 256, 512, 1024, 1536, 3008, 5120, 10240]).
  • The number of invocations per memory setting (statistical confidence).
  • A strategy: speed, cost, or balanced.

The state machine invokes the function N times at each memory, measures duration and calculates cost, and returns a visualization pinpointing the sweet spot. AWS Lambda Power Tuning is the DVA-C02-blessed answer whenever a scenario says "how do you pick the optimal memory?"
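A minimal execution input for the state machine can be sketched as follows; the field names follow the aws-lambda-power-tuning project's documented input format, while the ARN and payload are placeholders:

```python
import json

# Execution input for the AWS Lambda Power Tuning state machine.
# Field names follow the aws-lambda-power-tuning project; values are placeholders.
tuning_input = {
    "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
    "powerValues": [128, 256, 512, 1024, 1536, 3008],
    "num": 50,                       # invocations per memory setting
    "payload": {"orderId": "test-123"},
    "strategy": "balanced",          # "cost" | "speed" | "balanced"
}

print(json.dumps(tuning_input, indent=2))
# Start it with: boto3.client("stepfunctions").start_execution(
#     stateMachineArn=..., input=json.dumps(tuning_input))
```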

Memory-Tuning Decision Rules

  • I/O-bound (function spends most time waiting on DynamoDB, S3, HTTP) → small memory (128–512 MB) is usually optimal; extra CPU goes unused.
  • CPU-bound (image/PDF/crypto/compression) → run AWS Lambda Power Tuning across the full range; often lands around 1,792 MB or 3,008 MB.
  • Memory-bound (big in-memory data structures, ML models) → size to working set + 20% headroom.
  • Mixed → AWS Lambda Power Tuning with balanced strategy.

128 MB ≈ tiny CPU slice. 1,769 MB = 1 full vCPU. 3,538 MB ≈ 2 vCPUs. 5,307 MB ≈ 3 vCPUs. 10,240 MB ≈ 6 vCPUs. These anchors drive every AWS Lambda memory-tuning decision — CPU-bound workloads rarely benefit from below 1,769 MB, and multi-threaded runtimes (Node.js worker threads, Java ForkJoinPool, Go goroutines) need ≥ 3,538 MB to actually parallelize. Reference: https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html
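Those anchors all follow from the linear allocation rule, which reduces to a one-line helper (the 1,769 MB constant is the documented full-vCPU point; treat the output as an approximation):

```python
def approx_vcpus(memory_mb: int) -> float:
    """Approximate vCPU share: Lambda allocates CPU linearly, 1,769 MB = 1 full vCPU."""
    return memory_mb / 1769

for mb in (128, 1769, 3538, 10240):
    print(f"{mb:>6} MB -> ~{approx_vcpus(mb):.2f} vCPU")
```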

AWS Lambda Cold Start Anatomy

AWS Lambda cold start is a composite of three sub-latencies, and AWS Lambda performance optimization requires attacking the right sub-latency.

Sub-Latency 1 — Runtime Bootstrap

AWS Lambda launches a Firecracker micro-VM, mounts the deployment package, and starts the language runtime. This sub-latency is AWS-controlled and largely invariant: ~50–100 ms for Node.js and Python, ~150–300 ms for Ruby and Go, ~300–500 ms for .NET, ~400–800 ms for Java. You cannot shrink runtime bootstrap other than choosing a lighter runtime — this is why Java cold starts are the longest and why SnapStart was invented specifically for Java (and now Python/.NET).

Sub-Latency 2 — Deployment Package Load

AWS Lambda downloads and extracts the deployment package into the sandbox. Time scales with the unzipped size plus the number of files. A 250 MB unzipped ZIP takes hundreds of ms to load; a tight 5 MB bundle loads in tens of ms. This is why AWS Lambda performance optimization almost always starts with "shrink the deployment package."

Sub-Latency 3 — Function Init Code

The code you execute at module scope — import boto3, const dynamodb = new DynamoDBClient(), loading a Hugging Face model — all runs during INIT. Heavy imports and eager initialization dominate cold start for data-science and ML workloads. Move expensive work into lazy loaders inside the handler only if you will not need it on every invocation.

Per-Runtime Cold Start Ranking

From shortest to longest (typical p50 on a trivial handler at 1,024 MB):

  • Node.js 20.x — 80–150 ms.
  • Python 3.12 — 100–200 ms.
  • Go on provided.al2023 — 100–250 ms.
  • Ruby 3.3 — 200–400 ms.
  • .NET 8 — 400–800 ms (200–500 ms with SnapStart).
  • Java 21 — 500–1500 ms (150–400 ms with SnapStart).

These numbers move per region, package size, and init code, but the ordering is stable and exam-relevant.

Work the ladder top-down: (1) shrink the deployment package (tree-shake, prune dev dependencies, compile to native where possible); (2) move heavy imports behind lazy if branches inside the handler when they are not universally needed; (3) enable AWS Lambda SnapStart if the runtime supports it (Java 11/17/21, Python 3.12+, .NET 8+); (4) reach for Provisioned Concurrency only when (1)–(3) are exhausted and latency SLO still misses. AWS Lambda performance optimization is cheapest when you climb the ladder in order. Reference: https://aws.amazon.com/blogs/compute/operating-lambda-performance-optimization-part-1/

AWS Lambda SnapStart

AWS Lambda SnapStart is the most important AWS Lambda performance optimization feature added in the last three years, and DVA-C02 v2.1 emphasizes it explicitly.

How AWS Lambda SnapStart Works

AWS Lambda SnapStart snapshots the entire execution environment — Firecracker memory, JVM heap, loaded classes, initialized SDK clients — after the INIT phase completes on a published version. On every subsequent cold start, AWS Lambda resumes the environment from the snapshot instead of running INIT again. The restore cost is typically 100–400 ms versus the 1–6 seconds of cold JVM init.

Supported Runtimes

As of DVA-C02 v2.1 scope:

  • Java 11 (Corretto) — the original launch runtime (November 2022).
  • Java 17 and Java 21 — added as the Corretto managed runtimes landed.
  • Python 3.12+ — SnapStart expanded to Python in late 2024.
  • .NET 8 — SnapStart expanded to .NET in late 2024.

AWS Lambda SnapStart is not available for Node.js, Ruby, Go, or custom runtimes — those runtimes already cold-start fast enough that the engineering effort would not pay off.

AWS Lambda SnapStart Billing Model

AWS Lambda SnapStart is included at no extra charge for Java; for the Python and .NET runtimes, AWS charges for caching the snapshot and for each restore. Either way you typically save money versus Provisioned Concurrency for spiky traffic, because you pay only when a cold invocation actually restores the snapshot, not for idle pre-warmed capacity.

AWS Lambda SnapStart Restore Hooks and Uniqueness Re-Seeding

The biggest AWS Lambda SnapStart gotcha is state captured in the snapshot that must be unique per sandbox: random seeds, UUIDs, cached IAM credentials, database connection IDs, ephemeral ports. If a thousand cold starts restore from the same snapshot, they will all inherit the same RNG state and possibly the same expired credential cache.

The fix is CRaC (Coordinated Restore at Checkpoint) Resource hooks (org.crac.Resource):

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
import java.security.SecureRandom;

public class ConnectionPool implements Resource {
  private SecureRandom random = new SecureRandom();

  public ConnectionPool() {
    Core.getGlobalContext().register(this); // hooks only fire for registered resources
  }

  @Override public void beforeCheckpoint(Context<? extends Resource> ctx) throws Exception {
    pool.closeAllConnections();   // keep sandbox-specific state out of the snapshot
  }

  @Override public void afterRestore(Context<? extends Resource> ctx) throws Exception {
    pool.reopenAllConnections();  // fresh connections per restored sandbox
    random = new SecureRandom();  // re-seed: the frozen RNG state is shared by every restore
  }
}

Register the resource in static init; AWS Lambda SnapStart invokes the hooks around checkpoint/restore. Python and .NET expose equivalent beforeSnapshot / afterRestore callbacks through their runtime hooks APIs.

A classic AWS Lambda performance optimization trap: you enable SnapStart, cold starts drop 80%, and three weeks later you discover every request in a thousand-concurrent burst is assigned the same UUID because UUID.randomUUID() was called once during INIT and the SecureRandom state was frozen into the snapshot. Always audit module-scope init for uniqueness-sensitive state and re-seed in afterRestore. On DVA-C02, any question mentioning "duplicate values" + "SnapStart" points straight at this. Reference: https://docs.aws.amazon.com/lambda/latest/dg/snapstart-uniqueness.html

AWS Lambda Provisioned Concurrency

AWS Lambda Provisioned Concurrency is the brute-force AWS Lambda performance optimization button — pay money to keep N sandboxes fully warm.

What Provisioned Concurrency Guarantees

When you set Provisioned Concurrency = 50 on an alias:

  • AWS Lambda pre-initializes 50 execution environments — all three INIT sub-phases complete before traffic arrives.
  • The first 50 concurrent invocations arrive on warm sandboxes with no cold start (p99 under 50 ms for most runtimes).
  • Invocations 51 through account-limit fall back to on-demand scaling and can cold-start.
  • You pay a per-GB-second fee for the pre-warmed capacity plus normal invocation cost.

Application Auto Scaling Integration

Provisioned Concurrency is itself a scalable target for AWS Application Auto Scaling, so you can:

  • Schedule pre-warming for business-hours peaks (e.g., 9 AM – 6 PM weekdays).
  • Target-track on ProvisionedConcurrencyUtilization to keep utilization at, say, 70%.

Combined with CloudWatch alarms, this makes AWS Lambda Provisioned Concurrency economically reasonable for predictable spiky traffic — the classic "traffic jumps 20× at market open" pattern.
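A hedged sketch of the scheduled pre-warming setup using the Application Auto Scaling API — the function name, alias, capacities, and cron expression are placeholders, and the boto3 calls are shown commented so the parameter shapes can be read without an AWS session:

```python
# Parameters for scheduling Provisioned Concurrency via Application Auto Scaling.
# Function name, alias, capacities, and schedule are placeholder assumptions.
resource_id = "function:checkout-api:prod"          # format: function:<name>:<alias>
dimension = "lambda:function:ProvisionedConcurrency"

scalable_target = {
    "ServiceNamespace": "lambda",
    "ResourceId": resource_id,
    "ScalableDimension": dimension,
    "MinCapacity": 0,
    "MaxCapacity": 100,
}
scale_up = {
    "ServiceNamespace": "lambda",
    "ScheduledActionName": "business-hours-warm",
    "ResourceId": resource_id,
    "ScalableDimension": dimension,
    "Schedule": "cron(0 9 ? * MON-FRI *)",          # 9 AM weekdays
    "ScalableTargetAction": {"MinCapacity": 50, "MaxCapacity": 50},
}
# client = boto3.client("application-autoscaling")
# client.register_scalable_target(**scalable_target)
# client.put_scheduled_action(**scale_up)
```

A matching scale-down action (e.g. 6 PM, capacity 0) completes the business-hours pattern.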

Provisioned Concurrency vs SnapStart Decision

  • SnapStart — good for Java, Python, .NET; free for Java, metered snapshot caching and restores for Python/.NET; ideal for unpredictable traffic.
  • Provisioned Concurrency — works for any runtime including Node.js and Go; ideal for predictable bursts where you need sub-50 ms tail latency.
  • You can combine: SnapStart reduces the INIT time of the sandboxes that Provisioned Concurrency pre-warms, so the warm-up fleet reaches ready state faster.

A perennial DVA-C02 AWS Lambda performance optimization confusion: Provisioned Concurrency is a floor of pre-warmed capacity (50 warm always available), while Reserved Concurrency is a floor AND ceiling (exactly N concurrent, no more no less). If you need "guaranteed warm + guaranteed capped," set Reserved = 100 and Provisioned = 50 on the same function; then you get 50 warm, up to 100 total, and no function starves the rest of the account. Reference: https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html

AWS Lambda Reserved Concurrency for Throttling

AWS Lambda Reserved Concurrency is the throttling knob of AWS Lambda performance optimization — it reshapes throughput and protects downstream dependencies.

How Reserved Concurrency Works

When you set Reserved Concurrency = 100 on function F:

  • F's concurrency is drawn from a private pool of exactly 100 (guaranteed floor).
  • F cannot exceed 100 concurrent executions — requests over 100 are throttled with HTTP 429 (sync) or sent to the async queue for retry / DLQ (async).
  • The remaining account pool (1,000 − 100 = 900 on a default account) is shared by every other function in the region.

Why Reserved Concurrency Is a Performance Lever, Not Just a Limit

AWS Lambda Reserved Concurrency is a performance lever because modern applications usually have a downstream bottleneck that Lambda must respect:

  • Amazon RDS connection limit (Postgres max_connections = 100 by default; 1,000 Lambdas hammering it exhausts connections).
  • Third-party API rate limit (Stripe, Twilio, SendGrid).
  • DynamoDB provisioned throughput on a legacy table.
  • Legacy on-premises SOAP endpoint with tiny thread pool.

Setting Reserved Concurrency below the downstream ceiling turns Lambda into a well-behaved client and prevents cascading failures. It's AWS Lambda performance optimization for the whole system, not just the function.

Reserved Concurrency Set to Zero = Kill Switch

A special case: Reserved Concurrency = 0 throttles every invocation of the function. This is the safe "kill switch" pattern — use it to pause a misbehaving async consumer without deleting the function, then raise it when the fix is deployed.
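The kill switch is a single API call; a minimal sketch, with the function name as a placeholder and the boto3 call commented so the parameters can be inspected offline:

```python
# Kill switch / throttle helper built on Reserved Concurrency.
# "order-consumer" is a placeholder function name.
def concurrency_params(function_name: str, limit: int) -> dict:
    """Build PutFunctionConcurrency parameters; limit=0 throttles every invocation."""
    if limit < 0:
        raise ValueError("reserved concurrency must be >= 0")
    return {"FunctionName": function_name, "ReservedConcurrentExecutions": limit}

pause = concurrency_params("order-consumer", 0)     # kill switch: throttle everything
resume = concurrency_params("order-consumer", 100)  # restore the cap after the fix ships
# boto3.client("lambda").put_function_concurrency(**pause)
```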

AWS Lambda Execution Context Reuse and Connection Pooling

AWS Lambda execution context reuse is the single cheapest AWS Lambda performance optimization technique — just write code outside the handler.

The Global Scope Pattern

Any code at module scope (outside the handler) runs once per execution environment, during INIT. For warm invocations, the results are already in memory. The canonical optimization:

// Node.js — runs ONCE during INIT
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const ddb = new DynamoDBClient({});

// Runs EVERY invocation
exports.handler = async (event) => {
  const result = await ddb.send(/* ... */);
  return result;
};

Things to initialize globally:

  • AWS SDK clients — they pool HTTPS keep-alive connections (the SDK for JavaScript v3 enables keep-alive by default; v2 required an explicit keepAlive: true agent; Python's boto3 reuses connections by default through its urllib3 pool, and botocore's Config(tcp_keepalive=True) adds TCP keepalive on top).
  • Database connection pools — MySQL, Postgres, MongoDB, Redis clients.
  • Secrets / configuration — pull from Secrets Manager or Parameter Store once during INIT (but remember the uniqueness caveat with SnapStart).
  • Large static data — reference data, config JSON loaded from S3.
  • ML models — load once into memory at module scope.

The Anti-Pattern — Reconnecting Every Invocation

# ANTI-PATTERN: opens and closes a connection on every invocation (+20-50 ms each)
import psycopg2

def lambda_handler(event, context):
    conn = psycopg2.connect(host=...)  # remaining connection details elided
    # ... query work ...
    conn.close()

Versus:

import psycopg2

# Global init — runs once per sandbox, reused by every warm invocation
conn = psycopg2.connect(host=..., keepalives=1)  # remaining details elided

def lambda_handler(event, context):
    with conn.cursor() as cur:
        ...  # query work

This single refactor typically cuts p50 latency by 20–100 ms per invocation and reduces RDS connection churn by 95%+. It is the highest-ROI AWS Lambda performance optimization technique on the whole exam.

Amazon RDS Proxy as a Connection Pool Layer

When even pooled connections overwhelm RDS at high concurrency, Amazon RDS Proxy sits between AWS Lambda and RDS, maintaining a fixed-size pool on the RDS side and multiplexing thousands of Lambda invocations onto it. On DVA-C02, "Lambda exhausts RDS connections" → RDS Proxy is the canonical answer.

Moving SDK clients, DB pools, cached secrets, and reference data from inside the handler to module scope is the cheapest AWS Lambda performance optimization move — zero cost, zero deployment risk, and typically 20–100 ms shaved off p50. If a DVA-C02 scenario shows code that connects to DynamoDB / RDS / Redis inside the handler, the first refactor is always to hoist the client to global scope. Reference: https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html

AWS Lambda VPC Configuration and Hyperplane ENI

AWS Lambda performance optimization in a VPC used to be the hardest problem on this topic; Hyperplane ENI largely fixed it, but the exam still tests the mechanics.

The Pre-2019 VPC Cold Start Nightmare

Before September 2019, attaching AWS Lambda to a VPC meant every sandbox got a dedicated ENI provisioned at cold start. ENI creation + attachment added 10–20 seconds to the first invocation, and at scale the account ran out of ENIs. This is the history DVA-C02 expects you to recognize when a question mentions "VPC cold start" — the historical context frames why Hyperplane ENI is the answer.

Hyperplane ENI (Current Architecture)

Today AWS Lambda uses Hyperplane — a shared NAT/proxy layer that provisions a small pool of ENIs lazily at function configuration time (not per-sandbox) and shares them across thousands of concurrent sandboxes. Consequences:

  • First attachment to a new VPC config still creates ENIs; subsequent cold starts reuse them.
  • VPC cold start overhead drops from 10+ seconds to tens of milliseconds.
  • ENI count scales with VPC subnets × security groups combinations, not concurrency.
  • Functions in the same account/region that share subnets + security groups share the underlying Hyperplane ENI pool.

VPC + Lambda Outbound Internet

A VPC-attached AWS Lambda in a private subnet has no route to the internet by default. For AWS service targets (S3, DynamoDB, Secrets Manager, KMS, STS, Lambda itself, CloudWatch Logs) the right answer is a VPC Endpoint — Gateway Endpoints for S3/DynamoDB, Interface Endpoints for everything else. For public-internet targets (Stripe, Twilio, Google API), you need a NAT Gateway in a public subnet.

Classic DVA-C02 AWS Lambda performance optimization trap: a function suddenly times out after being attached to a VPC. Root cause is almost never cold start (Hyperplane ENI killed that) — it's the missing NAT Gateway or VPC Endpoint. The fix is a route to the target, not Provisioned Concurrency. Always check network routing before reaching for concurrency features. Reference: https://docs.aws.amazon.com/lambda/latest/dg/foundation-networking.html

AWS Lambda arm64 Graviton2 — 20% Price-Performance

Switching AWS Lambda to arm64 Graviton2 is the most disproportionate AWS Lambda performance optimization move: a one-line config change for up to 20% better price-performance.

What Graviton2 Actually Gives You

AWS Lambda supports two instruction set architectures:

  • x86_64 — the default Intel/AMD architecture.
  • arm64 — the AWS Graviton2 processor.

At the same memory setting, arm64 functions bill 20% less per GB-second than x86_64 on AWS Lambda, and real-world workloads often run 5–15% faster thanks to Graviton2's wider vector units and better memory bandwidth on compiled code. The net effect is 20–34% cost reduction for compatible workloads.

Runtime Compatibility

  • Fully supported — Python, Node.js, Java, Ruby, .NET (6 / 8), provided.al2023.
  • Container images — must be built for linux/arm64 (use docker buildx multi-platform builds).
  • Native binaries — if your Lambda bundles a compiled C/C++/Rust/Go binary, you must ship an arm64 build; x86_64 binaries will not run.
  • Third-party native dependencies — NumPy, Pandas, Sharp, Prisma engines, etc. must have arm64 wheels/binaries. Most major libraries now do.

When Not to Switch

  • Legacy functions with an old native dependency that has no arm64 build.
  • Workloads benchmarked faster on x86_64 (rare — usually old Intel-specific SIMD code paths).

On DVA-C02, "how to reduce Lambda cost by 20%" + "no code changes" → arm64 Graviton2 is the canonical answer.
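The "one-line change" is literally the Architectures field on UpdateFunctionConfiguration. A hedged boto3 sketch with a placeholder function name, the call commented out so the parameters are inspectable without credentials:

```python
# Switch a function to arm64 Graviton2 ("image-resizer" is a placeholder name).
# For ZIP functions with pure Python/JS code this is config-only; bundled native
# binaries and container images must be rebuilt for linux/arm64 first.
update_params = {
    "FunctionName": "image-resizer",
    "Architectures": ["arm64"],   # the default is ["x86_64"]
}
# boto3.client("lambda").update_function_configuration(**update_params)
```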

AWS Lambda Layers — Layer Size vs INIT Time Tradeoff

Lambda Layers are a sharing feature, but from an AWS Lambda performance optimization perspective they are also an INIT-time tax.

How Layers Affect Cold Start

Every Layer is downloaded, unzipped, and mounted at /opt during sandbox provisioning. Each Layer adds a few milliseconds of mount time plus any bytes loaded during runtime import. Consequences:

  • 1 large Layer (your entire shared util library) loads faster than 5 small Layers because unzip overhead dominates.
  • Layer contents that are never import-ed still cost nothing beyond the mount — dead code in Layers is free at runtime (but costs storage and deploy time).
  • A 5-Layer, near-250 MB function can add 100–300 ms to cold start versus a 5 MB bundled function.

Best Practices for Layers in Performance Contexts

  • Use Layers for cross-function shared dependencies (NumPy across 20 data-science functions).
  • Do not use Layers just to shrink your ZIP — bundlers (esbuild, webpack, Rollup) that tree-shake into a single 5 MB file outperform Layers for latency.
  • Combine multiple small custom Layers into one Layer when they always deploy together.
  • For Lambda Extensions delivered as Layers, remember extensions run during INIT and INVOKE — a heavy extension adds latency to every cold start.

Counting the Combined 250 MB Limit

Function code + all attached Layers + Extensions cannot exceed 250 MB unzipped total. Running close to the limit hurts cold start; staying under 50 MB total is the AWS Lambda performance optimization sweet spot for most functions.

AWS Lambda Extensions Cost Model

Lambda Extensions change the AWS Lambda performance optimization calculus because they charge compute and bend the SHUTDOWN window.

How Extensions Bill

A Lambda Extension runs as a separate process (external) or inside the runtime (internal) inside the same execution environment. Consequences:

  • The extension's CPU and memory usage is billed against the function's GB-seconds — AWS Lambda cannot distinguish "your code" from "your extension."
  • A chatty observability extension can add 5–20% overhead per invocation.
  • Extensions participate in INIT (up to 10 s), INVOKE (event-based), and SHUTDOWN (up to 2 s). The SHUTDOWN budget lets extensions flush telemetry to Datadog / New Relic / Dynatrace after the handler returns.

The Parameters and Secrets Lambda Extension

The AWS Parameters and Secrets Lambda Extension is an official Extension that caches Parameter Store and Secrets Manager values in-process, serving them from localhost:2773 with zero network calls after the first fetch. AWS Lambda performance optimization when retrieving secrets drops from ~80 ms (first call) to <5 ms (cached) with a TTL you control. On DVA-C02, "cache secrets across invocations without code changes" → Parameters and Secrets Lambda Extension is the canonical answer.
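Reading a cached value from the extension is just a localhost HTTP GET. The port and header below come from the extension's documentation; the parameter name is a placeholder, and the actual fetch is left as a comment since it only works inside a Lambda sandbox running the extension:

```python
import os
import urllib.parse
import urllib.request

def cached_parameter_request(name: str) -> urllib.request.Request:
    """Build the localhost request served by the Parameters and Secrets Extension."""
    url = ("http://localhost:2773/systemsmanager/parameters/get?name="
           + urllib.parse.quote(name, safe=""))
    return urllib.request.Request(url, headers={
        # The extension authenticates callers with the sandbox's session token.
        "X-Aws-Parameters-Secrets-Token": os.environ.get("AWS_SESSION_TOKEN", ""),
    })

req = cached_parameter_request("/prod/db/password")
# Inside Lambda: value = json.load(urllib.request.urlopen(req))["Parameter"]["Value"]
```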

When NOT to Use an Extension

  • Latency-critical functions where every millisecond counts.
  • Cost-sensitive functions with already-optimized SDK code.
  • Cases where a global-scope SDK fetch + in-process cache achieves the same effect.

AWS Lambda Function URL vs API Gateway — Latency Tradeoff

Function URL versus API Gateway is a DVA-C02 AWS Lambda performance optimization question that splits latency, features, and cost.

What a Lambda Function URL Gives You

A Lambda Function URL is a dedicated HTTPS endpoint bound directly to a specific function + alias/version. Configuration is:

  • HTTPS only, TLS 1.2+.
  • Auth: NONE (public) or AWS_IAM.
  • CORS configured on the function.
  • No mapping templates, no authorizers, no usage plans, no API keys.

Latency overhead: Function URL adds ~5–10 ms of HTTP-to-Lambda wiring — the lowest-latency synchronous entry into AWS Lambda.

What API Gateway Gives You

API Gateway REST/HTTP API add 20–50 ms (REST) or 10–20 ms (HTTP) of routing, auth, throttling, and transformation overhead in exchange for Cognito authorizers, Lambda authorizers, usage plans, API keys, mapping templates, WAF integration, caching, custom domains with ACM certificates, and OpenAPI import.

Choosing for AWS Lambda Performance Optimization

  • Function URL — when you need raw HTTP speed, IAM sufficient for auth, and no fancy transforms. Common for internal service-to-service calls or Auth0/Clerk-authenticated webhooks.
  • API Gateway HTTP API — most common balance: sub-20 ms overhead, Cognito/JWT auth, and route-based Lambda integration.
  • API Gateway REST API — when you need request validation, mapping templates, usage plans, or edge-optimized/private endpoints.

The DVA-C02 exam routinely asks "lowest latency for invoking a Lambda from an external HTTP client" and the answer depends on auth needs: pure internal → Function URL; needs Cognito/JWT → HTTP API; needs mapping templates → REST API.

AWS Lambda Concurrency Quotas and Burst Limits

Concurrency quota math is essential for AWS Lambda performance optimization at scale.

The Account Concurrency Pool

Every AWS account starts with a regional soft quota of 1,000 concurrent executions. This is a single pool shared across all functions in that region. You request increases through AWS Service Quotas / AWS Support — multi-thousand and tens-of-thousands limits are common for large customers.

Burst Concurrency (Initial Scaling Rate)

When a function scales from zero, AWS Lambda allows an immediate burst of 500–3,000 concurrent executions depending on region (US/EU: 3,000; some other regions: 1,000 or 500). After the initial burst, concurrency grows +1,000 per minute until it hits the account pool or the function's Reserved Concurrency ceiling.

Example: function is idle, traffic suddenly spikes to 5,000 RPS of 1-second invocations in us-east-1.

  • Second 0 → up to 3,000 concurrent.
  • Second 60 → up to 4,000 concurrent.
  • Second 120 → up to 5,000 concurrent.
  • During seconds 0–120, any invocations above the cap are throttled (HTTP 429 sync, SQS retry / DLQ async).
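The ramp in that example is plain arithmetic, sketched below with the us-east-1 figures quoted above and an assumed account quota raised to 5,000 to match the scenario:

```python
def max_concurrency(t_seconds: int, initial_burst: int = 3000,
                    ramp_per_minute: int = 1000, account_pool: int = 5000) -> int:
    """Upper bound on concurrent executions t seconds after scaling from zero."""
    return min(account_pool, initial_burst + ramp_per_minute * (t_seconds // 60))

for t in (0, 60, 120):
    print(f"t={t:>3}s -> up to {max_concurrency(t)} concurrent")
```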

When Burst Capacity Is Insufficient

Solutions, in order of cost:

  1. SnapStart — faster warm-up inside the burst window.
  2. Provisioned Concurrency — pre-warm N sandboxes so burst-from-zero becomes burst-from-N.
  3. Reserved Concurrency on noisy neighbors — prevents other functions from starving the pool.
  4. Quota increase — request via AWS Support if aggregate account demand truly needs >1,000.

1,000 default regional concurrent executions (soft quota). Initial burst 500, 1,000, or 3,000 depending on region. After burst, scaling at +1,000 per minute. Reserved Concurrency caps a function; Provisioned Concurrency pre-warms. Memorize these three numbers and every AWS Lambda performance optimization throttling scenario becomes a simple arithmetic problem. Reference: https://docs.aws.amazon.com/lambda/latest/dg/lambda-concurrency.html

AWS Lambda Payload Limits — 6 MB Sync / 256 KB Async

Payload size is a hidden AWS Lambda performance optimization lever: the hard limits force you into different integration patterns for large data.

The Hard Limits

  • Synchronous invocation — 6 MB request AND 6 MB response.
  • Asynchronous invocation — 256 KB per event.
  • Event source mapping batches — the batch delivered to the function still counts against the 6 MB synchronous payload limit; SQS and stream sources can batch up to 10,000 records per invocation, but the total batch must fit within 6 MB.

The S3 Escape Hatch

When payloads exceed these limits, the idiomatic pattern is Amazon S3 as the data carrier:

  1. Caller uploads the large object to S3 (direct PUT or pre-signed URL).
  2. Caller invokes AWS Lambda with a small pointer event (bucket + key).
  3. Lambda downloads the object from S3, processes it, writes the result back to S3.
  4. Lambda returns a small response (result key or status).

This pattern scales to gigabyte payloads with no Lambda-side size limit, at the cost of two additional S3 round-trips per invocation. AWS Lambda performance optimization for large data = S3 pointer pattern.

The Amazon EFS Escape Hatch

For workloads that share state across invocations — large reference datasets, model artifacts, build caches — Amazon EFS can mount into the Lambda execution environment at a local path. The file system is shared across all warm sandboxes of the function and across function versions. EFS adds single-digit-millisecond first-access latency, but it sidesteps the 250 MB unzipped package cap (large artifacts live on the file system instead of in the package) and avoids a per-invocation S3 download.

Decision rule:

  • Transient large payloads → S3 pointer.
  • Persistent large reference data shared across invocations → Amazon EFS mount.
  • Per-sandbox large temp data → /tmp up to 10 GB.

AWS X-Ray Active Tracing Overhead

AWS X-Ray active tracing is the default observability tool, but it is not free from an AWS Lambda performance optimization angle.

The Overhead Profile

With active tracing enabled:

  • Each sampled invocation writes trace segments through the X-Ray SDK.
  • Segment serialization and transmission add 1–5 ms per invocation.
  • Subsegments for downstream calls add another 0.5–2 ms each.
  • The default sampling rule is ~1 request per second + 5% beyond — so most invocations pay little.
  • At high sampling rates (e.g., 100%) the cost can reach 5–10% of duration.

Performance Best Practices

  • Sampling rules — tune per-service to sample 1–5% of steady-state traffic; 100% only for debugging windows.
  • Annotations vs metadata — annotations are indexed (filterable, billed per dimension); prefer metadata for unbounded key-value data.
  • Subsegment budget — avoid wrapping every function call; focus on downstream network calls (DynamoDB, HTTP APIs).

On DVA-C02, "reduce X-Ray billing without losing key traces" → tune sampling rules, not disable tracing.
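A sampling rule matching this advice — a 1-trace-per-second reservoir plus 5% of the remainder — can be expressed as follows. The rule name and service name are illustrative; applying the rule needs AWS credentials, so that call is kept in a separate helper.

```python
# Illustrative X-Ray sampling rule: reservoir of 1 trace/sec + 5% beyond.
sampling_rule = {
    "RuleName": "steady-state-5pct",   # hypothetical name
    "Priority": 100,                   # lower number wins when rules overlap
    "ReservoirSize": 1,                # always trace ~1 request/second
    "FixedRate": 0.05,                 # then sample 5% of the rest
    "ServiceName": "checkout-service", # hypothetical service
    "ServiceType": "*",
    "Host": "*",
    "HTTPMethod": "*",
    "URLPath": "*",
    "ResourceARN": "*",
    "Version": 1,
}

def apply_rule(rule):
    import boto3  # deferred so the rule itself stays inspectable offline
    boto3.client("xray", region_name="us-east-1").create_sampling_rule(
        SamplingRule=rule
    )
```

Tightening `FixedRate` back to 1.0 for a debugging window is a rule update, not a code deploy.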

Efficient Packaging — Bundle, Tree-Shake, Minify

Efficient packaging is table-stakes AWS Lambda performance optimization and appears in DVA-C02 as "reduce cold start."

Node.js / TypeScript

Use esbuild (via the SAM CLI's esbuild BuildMethod) or webpack to bundle source and dependencies into a single file, tree-shaking dead code and minifying the output. A typical Node.js Lambda goes from 50 MB (node_modules + source) to 1–5 MB bundled — cold start drops proportionally.

# AWS SAM — esbuild build (Metadata attaches to the function resource)
Resources:
  BundledFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: nodejs20.x
    Metadata:
      BuildMethod: esbuild
      BuildProperties:
        Minify: true
        Target: es2022
        EntryPoints:
          - app.ts
        External:
          - "@aws-sdk/*"   # use Lambda's baked-in SDK v3

External-izing @aws-sdk/* (provided by the Lambda Node.js 18+/20+ runtime) shrinks the bundle further.

Python

Use pip install --target ./package + zip -r to produce a minimal ZIP. Exclude *.pyc, __pycache__, tests, documentation. For heavy ML dependencies, consider container images to avoid the 250 MB limit.

Java

Use jlink to produce a custom JRE containing only the modules you need, or compile with GraalVM native-image (deployed on a custom runtime) to produce a native binary that cold-starts in tens of milliseconds. For the managed Java runtimes, SnapStart (covered earlier) is usually the simpler first lever.

.NET

Use dotnet publish -c Release against the managed .NET 8 runtime and enable ReadyToRun compilation (PublishReadyToRun=true) to cut JIT work at cold start. Assembly trimming (PublishTrimmed=true) requires a self-contained publish, so reserve it for custom-runtime or Native AOT deployments.

Go / Rust

Compile to static binaries against the provided.al2023 runtime — cold starts are already 100–200 ms because there is no runtime init beyond the binary startup.

Before enabling SnapStart or Provisioned Concurrency, shrink the deployment package. An esbuild-bundled Node.js function at 2 MB cold-starts 2–3× faster than a 50 MB raw node_modules ZIP, for zero ongoing cost. On DVA-C02, "reduce cold start without increasing cost" + "Node.js" points at bundling/tree-shaking as the answer. Reference: https://aws.amazon.com/blogs/compute/operating-lambda-performance-optimization-part-2/

Async Invocation — Destinations vs DLQ for Retry Control

AWS Lambda performance optimization on asynchronous invocations hinges on retry semantics and failure routing.

The Retry Model

Asynchronous AWS Lambda:

  • Default retries = 2 additional attempts (configurable 0/1/2), with exponential backoff.
  • Default max event age = 6 hours (configurable 60 s – 6 h).
  • After all retries fail, the event goes to a destination (on-failure) or DLQ.
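All three retry knobs can be set in one API call via PutFunctionEventInvokeConfig. The function name and queue ARN below are placeholders; the kwargs are split out from the client call so they stay inspectable without AWS credentials.

```python
# Illustrative async invoke config: fewer retries, shorter max age,
# failures routed to an (assumed) SQS queue.
invoke_config = {
    "FunctionName": "my-async-function",  # placeholder
    "MaximumRetryAttempts": 1,            # 0, 1, or 2 (default 2)
    "MaximumEventAgeInSeconds": 3600,     # 60 s - 6 h (default 6 h)
    "DestinationConfig": {
        "OnFailure": {
            "Destination": "arn:aws:sqs:us-east-1:123456789012:failed-events"
        },
    },
}

def apply_config(config):
    import boto3  # deferred; the call itself needs AWS credentials
    boto3.client("lambda", region_name="us-east-1").put_function_event_invoke_config(
        **config
    )
```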

Lambda Destinations let you route both on-success and on-failure results to one of four targets:

  • Amazon SQS — durable queue for failure inspection.
  • Amazon SNS — fan-out alerting.
  • Amazon EventBridge — schema-aware routing, cross-account.
  • Another AWS Lambda function — chain into a compensation workflow.

Destinations carry rich context: the original invocation payload, the response or error, the number of retries, and the invocation metadata. This dramatically simplifies distributed troubleshooting.

DLQ (Legacy, Still Tested)

Lambda Dead-Letter Queues predate Destinations. A DLQ is a single SQS queue or SNS topic configured on the function; on final failure, AWS Lambda writes only the original event (no error context). DLQs still appear on DVA-C02 because many real-world accounts still use them.

Choosing

  • New work → Destinations (on-failure + on-success).
  • Legacy consistency → DLQ.
  • Best practice → both can be set simultaneously; Destinations win the richer metadata game.

Destinations (on-failure) carry the original event, the error, and context — ideal for observability and compensation. DLQ carries only the original event. For new async Lambdas on DVA-C02, the answer is always Destinations; DLQ is the "legacy" distractor. Both fire only on async failures — synchronous failures return the error to the caller and neither Destination nor DLQ is involved. Reference: https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html

Event Source Mapping Performance Tuning

Poll-based sources (SQS, Kinesis, DynamoDB Streams, MSK) have their own AWS Lambda performance optimization levers.

SQS Performance Levers

  • BatchSize — up to 10 by default; standard queues support up to 10,000 when MaximumBatchingWindowInSeconds is set (FIFO stays at 10). More messages per invocation = less overhead per message.
  • MaximumBatchingWindowInSeconds — wait up to N seconds to fill a batch before invoking.
  • ScalingConfig.MaximumConcurrency (2022+) — cap the SQS-driven concurrency independently of Reserved Concurrency.
  • ReportBatchItemFailures — return partial batch failures so only poison messages go back to the queue.
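The handler side of ReportBatchItemFailures looks like the sketch below: return the IDs of only the failed messages, so the rest of the batch is deleted from the queue instead of being redelivered wholesale. `process_record` is a stand-in for real business logic.

```python
def process_record(body):
    # Placeholder business logic: one poison message fails.
    if body == "poison":
        raise ValueError("cannot process")

def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process_record(record["body"])
        except Exception:
            # Only these IDs go back to the queue; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Returning an empty `batchItemFailures` list marks the whole batch as successful; omitting the response shape entirely (with the feature enabled) causes the full batch to retry.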

Kinesis / DynamoDB Streams Performance Levers

  • BatchSize up to 10,000.
  • ParallelizationFactor 1–10 — run up to 10 Lambdas per shard in parallel (breaks per-record ordering, preserves per-partition-key ordering).
  • MaximumRecordAgeInSeconds — skip records older than threshold to avoid being stuck on a poison-pill batch forever.
  • BisectBatchOnFunctionError — halve-and-retry to isolate the failing record.
  • On-failure destination — ship the failing batch metadata to SQS/SNS for triage.
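These stream levers apply together through a single UpdateEventSourceMapping call. The mapping UUID, values, and queue ARN below are placeholders; the update dict is separated from the client call so it stays inspectable offline.

```python
# Illustrative stream tuning for a Kinesis/DynamoDB Streams event source
# mapping (all identifiers are placeholders).
esm_update = {
    "UUID": "esm-uuid-placeholder",
    "BatchSize": 1000,
    "ParallelizationFactor": 10,          # up to 10 pollers per shard
    "MaximumRecordAgeInSeconds": 3600,    # skip records older than 1 h
    "BisectBatchOnFunctionError": True,   # halve-and-retry poison batches
    "DestinationConfig": {
        "OnFailure": {
            "Destination": "arn:aws:sqs:us-east-1:123456789012:stream-failures"
        },
    },
}

def apply_update(update):
    import boto3  # deferred; requires AWS credentials
    boto3.client("lambda", region_name="us-east-1").update_event_source_mapping(
        **update
    )
```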

SQS vs Kinesis for AWS Lambda Performance Optimization

  • SQS — unlimited throughput (charged per API call), no ordering guarantee (standard) or per-MessageGroup ordering (FIFO), no replay, simple retry semantics.
  • Kinesis — ordered per shard, 1 MB/s in / 2 MB/s out per shard (or enhanced fan-out for a dedicated 2 MB/s per consumer), replayable within retention (24 h default, extendable to 7 days, up to 365 days with long-term retention), batch-oriented with backoff.

DVA-C02 routinely asks "high-throughput event processing with replay" → Kinesis; "simple queue with per-message retry" → SQS.

AWS Lambda Performance Optimization Traps on DVA-C02

Knowing the traps is worth as many points as knowing the features.

Trap 1 — Provisioned vs Reserved Name Confusion

Reserved sounds like "pre-reserves capacity" but does NOT pre-warm sandboxes — it only carves out a slice of the pool. Provisioned is the pre-warming one. Re-read these two words slowly every time they appear.

Trap 2 — SnapStart Uniqueness Inheritance

Module-scope UUID.randomUUID() captured in a SnapStart snapshot will yield the same value across all restored sandboxes. Always re-seed uniqueness in afterRestore hooks.

Trap 3 — VPC Cold Start No Longer the Villain

Post-Hyperplane ENI, VPC cold start overhead is tens of ms, not seconds. If a 2024 scenario blames "VPC" for 10 s cold starts, the answer is almost always elsewhere (large package, heavy JVM init, missing SnapStart).

Trap 4 — Global Init Versus Handler-Local

Writing new DynamoDBClient() inside the handler runs on every invocation; outside the handler, once per sandbox. Always hoist SDK clients, DB pools, and secrets fetchers to global scope.

Trap 5 — Payload Limit Escape via S3

When a scenario describes "processing 100 MB files," Lambda itself cannot receive 100 MB — the caller must upload to S3 and pass a pointer. "Invoke Lambda with 100 MB payload" is always a distractor.

Trap 6 — Function URL vs API Gateway Latency

Function URL is the fastest path to a Lambda (~5–10 ms overhead) but lacks Cognito authorizers, mapping templates, and usage plans. If the scenario mentions any of those features, the answer is API Gateway.

Trap 7 — arm64 Incompatibility with Native Binaries

Switching to arm64 Graviton2 saves 20% but breaks any x86_64-only native binary. Verify dependency compatibility before flipping the switch.

Trap 8 — Destinations and DLQ on Synchronous Failures

Neither Destinations nor DLQ fire on synchronous AWS Lambda failures — only on async. "API Gateway → Lambda fails, why is DLQ empty?" Because DLQ only receives async failures.

Trap 9 — Lowering Memory to Cut Cost

The candidate sees "reduce Lambda bill" and picks "lower memory to 128 MB." This is wrong for CPU-bound functions — lowering memory below the 1-vCPU anchor (1,769 MB) often triples duration and increases total GB-seconds. AWS Lambda performance optimization requires running AWS Lambda Power Tuning to find the empirical sweet spot, which is frequently HIGHER than current memory for CPU-bound workloads. Reference: https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html
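The GB-second arithmetic makes the trap concrete. The durations below are illustrative (a CPU-bound job at 1,792 MB versus the same job roughly 4x slower at 512 MB); the price is the x86_64 on-demand rate in us-east-1.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # x86_64 on-demand, us-east-1

def cost_per_million_invokes(memory_mb, duration_ms):
    # Lambda bills memory (in GB) x duration (in seconds) per invocation.
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND * 1_000_000

low_mem = cost_per_million_invokes(512, 1200)    # "cheaper" memory, 4x slower
high_mem = cost_per_million_invokes(1792, 300)   # more memory, faster
print(f"512 MB: ${low_mem:.2f}  1,792 MB: ${high_mem:.2f}")
# → 512 MB: $10.00  1,792 MB: $8.75
```

The higher-memory configuration is both 4x faster and cheaper per million invocations — the exact outcome Power Tuning surfaces empirically.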

AWS Lambda Performance Optimization Decision Matrix

A consolidated decision matrix to map symptom → lever.

Symptom: High p99 Latency on Cold Starts

  1. Shrink package (bundler, tree-shake, arm64 native).
  2. Lazy-load rarely-used imports inside the handler.
  3. Enable SnapStart (Java 11/17/21, Python 3.12+, .NET 8+).
  4. Enable Provisioned Concurrency for predictable bursts.

Symptom: High p50 Latency on Warm Invocations

  1. Hoist SDK clients and DB pools to global scope.
  2. Run AWS Lambda Power Tuning to raise memory (CPU-bound).
  3. Use Parameters and Secrets Lambda Extension to cache secrets.
  4. Cache reference data in /tmp or global variables.
  5. For I/O-heavy patterns, consider Step Functions Express to parallelize instead of sequencing inside one handler.

Symptom: Function Throttled Under Spike

  1. Check the regional initial burst limit (500/1,000/3,000) against peak concurrency — throttles in the first minutes of a spike are often just burst math.
  2. Enable Provisioned Concurrency for the first N concurrent.
  3. Request an account concurrency quota increase.
  4. Add Reserved Concurrency to noisy neighbors to protect this function.

Symptom: Downstream Service Overwhelmed by Lambda

  1. Set Reserved Concurrency = downstream capacity.
  2. Use RDS Proxy for database-backed workloads.
  3. Switch triggers to SQS with tuned BatchSize + MaximumConcurrency to smooth bursts.

Symptom: Bill Too High

  1. Run AWS Lambda Power Tuning with cost strategy.
  2. Switch to arm64 Graviton2 (20% off per GB-second).
  3. Check for silent re-work inside the handler that could be moved to global init.
  4. Audit Lambda Extensions — each extension bills against function GB-seconds.
  5. Lower X-Ray sampling rate in production.

Symptom: Events Larger Than 6 MB

  1. Switch to S3 pointer pattern — upload to S3, invoke Lambda with bucket+key.
  2. For shared reference data, mount Amazon EFS.
  3. For high-throughput event fan-out, keep events small and store payload in S3.

AWS Lambda Performance Optimization Limits Cheat Sheet

Memorize these AWS Lambda performance optimization numbers cold:

  • Memory — 128 MB – 10,240 MB; 1,769 MB = 1 vCPU; 10,240 MB ≈ 6 vCPUs.
  • Timeout — 900 s (15 min) max; default 3 s.
  • Ephemeral /tmp — 512 MB default, up to 10,240 MB.
  • Deployment package — 50 MB direct ZIP, 250 MB unzipped via S3, 10 GB container image.
  • Payload — 6 MB sync / 256 KB async.
  • Layers — 5 per function, 250 MB combined unzipped.
  • Environment variables — 4 KB total.
  • Account concurrency — 1,000 default (soft), raisable.
  • Burst concurrency — 500 / 1,000 / 3,000 depending on region; +1,000 per minute after.
  • Async retries — 0 / 1 / 2 (default 2 = 3 total attempts).
  • Max event age — 60 s – 6 h (default 6 h).
  • SnapStart restore latency — typically 100–400 ms.
  • Provisioned Concurrency auto-scaling — via Application Auto Scaling (scheduled or target-tracking on utilization).
  • arm64 discount — ~20% cheaper per GB-second versus x86_64.

FAQ — AWS Lambda Performance Optimization

Q1. What is the fastest way to cut AWS Lambda cold start latency on DVA-C02?

The cheapest, fastest AWS Lambda performance optimization for cold starts is to shrink the deployment package — bundle and tree-shake Node.js/TypeScript with esbuild, trim unused Python dependencies, or ship arm64 native binaries for Go/Rust. A 2 MB bundle cold-starts 2–3× faster than a 50 MB raw ZIP for free. Only after shrinking the package does it make sense to enable AWS Lambda SnapStart (Java, Python 3.12+, .NET 8+) or Provisioned Concurrency (any runtime). These three levers — shrink, snapshot, pre-warm — form the cold-start ladder you climb in order.

Q2. When should I pick Provisioned Concurrency vs SnapStart vs Reserved Concurrency?

Provisioned Concurrency is the brute-force pre-warming lever — pay per-GB-second to keep N sandboxes fully initialized so that the first N concurrent invocations hit no cold start. SnapStart is the free-or-cheap snapshot lever for supported runtimes (Java/Python/.NET) — it restores a post-INIT snapshot in 100–400 ms, killing most of the cold start pain without paying for idle capacity. Reserved Concurrency is the throttling lever — it guarantees a capacity floor AND caps the function's maximum concurrency, protecting both the function (from starvation by noisy neighbors) and downstream services (from being overwhelmed). AWS Lambda performance optimization usually combines all three: SnapStart for speed, Provisioned for predictable bursts, Reserved to protect RDS or third-party APIs.

Q3. How do I find the optimal AWS Lambda memory setting?

Use AWS Lambda Power Tuning — an open-source Step Functions state machine deployable from the AWS Serverless Application Repository. Pass the function ARN, a representative payload, and a list of memory sizes (e.g., [128, 512, 1024, 1792, 3008, 5120, 10240]) plus a strategy (speed, cost, or balanced). The state machine invokes the function N times at each memory and returns a visualization with the sweet spot. For CPU-bound workloads, the optimum is frequently higher than intuition suggests — 1,792 MB or 3,008 MB — because CPU scales linearly with memory (1 full vCPU at 1,769 MB, ~6 vCPUs at 10,240 MB) and shorter duration often outweighs the per-ms cost increase.

Q4. What is AWS Lambda SnapStart and which runtimes support it?

AWS Lambda SnapStart snapshots the execution environment after the INIT phase on a published version and caches it. On subsequent cold starts, AWS Lambda restores from the snapshot in ~100–400 ms instead of re-running INIT — this is transformative for Java (cold start drops from 1–6 s to 200–400 ms). As of DVA-C02 v2.1 scope, SnapStart is supported on Java 11, 17, 21, Python 3.12+, and .NET 8+. The critical gotcha is uniqueness re-seeding: any module-scope state that must be unique per sandbox (RNG seeds, UUIDs, cached credentials, DB connection IDs) will be frozen into the snapshot and inherited by every restore. Register CRaCResource (Java) or beforeSnapshot / afterRestore (Python, .NET) hooks to re-seed in the afterRestore callback.

Q5. How does AWS Lambda execution context reuse speed up warm invocations?

AWS Lambda freezes and reuses execution environments across invocations, so any code you run at module scope (outside the handler) executes exactly once during INIT and its results stay in memory for all warm invocations. The AWS Lambda performance optimization pattern is to hoist AWS SDK clients (which hold HTTPS keep-alive connection pools), database connection pools, cached secrets, and reference data to module scope; the handler then reuses them. This single refactor typically cuts p50 latency by 20–100 ms and reduces RDS connection churn by 95%+. The anti-pattern is creating new clients or connections inside the handler — this forces a TLS handshake and connection open on every invocation.

Q6. How does Hyperplane ENI affect AWS Lambda VPC performance?

Before 2019, attaching AWS Lambda to a VPC created a dedicated ENI per sandbox, adding 10+ seconds to cold start and exhausting account ENI limits at scale. AWS Lambda now uses Hyperplane ENIs — a shared NAT/proxy layer that provisions a small pool of ENIs lazily at function configuration time and multiplexes thousands of concurrent sandboxes through them. Current VPC cold start overhead is tens of milliseconds, not seconds. However, a VPC-attached AWS Lambda in a private subnet still has no internet access without a NAT Gateway (for public APIs) or VPC Endpoints (for AWS services — prefer Gateway Endpoints for S3/DynamoDB, Interface Endpoints for Secrets Manager, KMS, STS). On DVA-C02, "Lambda in VPC times out calling external API" = missing NAT or missing Endpoint, not cold start.

Q7. When should I switch AWS Lambda to arm64 Graviton2?

Switch to arm64 Graviton2 whenever the dependency graph supports it — you get ~20% cheaper per GB-second billing and frequently 5–15% faster real-world duration for a single config change. Compatibility requirements: (1) runtime must support arm64 (Python, Node.js, Java, Ruby, .NET 6/8, provided.al2023 all do); (2) container images must be built for linux/arm64; (3) bundled native binaries (Go, Rust, C extensions) must be compiled for arm64; (4) native Python/Node.js dependencies (NumPy, Pandas, Sharp, Prisma) must have arm64 wheels — most major libraries now do. On DVA-C02, "reduce Lambda cost by 20% with no code changes" is the canonical arm64 Graviton2 scenario.

Q8. How do I handle AWS Lambda payloads larger than 6 MB?

AWS Lambda hard limits are 6 MB synchronous (request and response) and 256 KB asynchronous per event. For larger data, use the Amazon S3 pointer pattern: caller uploads the large object to S3 (direct PUT or pre-signed URL), then invokes AWS Lambda with a small event containing {bucket, key}. The handler downloads from S3, processes, and writes results back to S3, returning only a status or result key. For persistent large reference data shared across invocations (ML models, lookup tables), mount Amazon EFS at a local path — EFS is shared across all warm sandboxes of the function and removes the 250 MB unzipped package cap. For per-sandbox temp data up to 10 GB, use /tmp ephemeral storage. These three escape hatches cover every "Lambda can't handle big payload" scenario on DVA-C02.

Q9. What is the difference between Lambda Destinations and DLQ for async invocations?

Both Destinations and DLQ catch asynchronous AWS Lambda failures after all retries exhaust (default 2 retries, configurable 0–2). DLQ (legacy) is a single SQS queue or SNS topic that receives only the original event — no error context, no response, no retry count. Lambda Destinations (modern, recommended) let you configure on-success AND on-failure targets across four service types (SQS, SNS, EventBridge, another Lambda) and carry rich context: original event, response or error, retry count, invocation metadata. Destinations are the DVA-C02-preferred answer for new async work; DLQ remains in scope because many legacy systems still use it. Neither fires on synchronous invocation failures — synchronous failures return the error to the caller, and the caller is responsible for retries.

Q10. What memory setting gives AWS Lambda one full vCPU and why does it matter?

AWS Lambda allocates CPU linearly with memory. The anchor points to memorize: 1,769 MB = 1 full vCPU, 3,538 MB ≈ 2 vCPUs, 5,307 MB ≈ 3 vCPUs, 10,240 MB ≈ 6 vCPUs. Matters because most CPU-bound workloads (image processing, PDF rendering, crypto, large JSON parsing, compression) perform dramatically worse below 1,769 MB — a 512 MB function often runs 3× slower than a 1,792 MB function, and since AWS Lambda bills GB-seconds, the higher-memory version is both faster AND cheaper. AWS Lambda performance optimization through memory tuning is therefore rarely "minimum memory = minimum cost"; run AWS Lambda Power Tuning and let the data drive the choice.

Summary — AWS Lambda Performance Optimization at a Glance

  • AWS Lambda performance optimization is Task 4.3 on DVA-C02 — four dimensions (latency, throughput, cost, reliability) governed by levers that target the three lifecycle phases (INIT, INVOKE, SHUTDOWN).
  • Memory is the primary knob — 128 MB–10,240 MB maps linearly to CPU (1 vCPU at 1,769 MB, 6 vCPUs at 10,240 MB); use AWS Lambda Power Tuning to find the empirical sweet spot.
  • Cold start is a composite of runtime bootstrap + package load + function init; mitigate via (1) package shrinking, (2) SnapStart (Java/Python/.NET), (3) Provisioned Concurrency (any runtime).
  • SnapStart snapshots post-INIT state and restores in 100–400 ms; always re-seed uniqueness (RNG, UUIDs, DB connection IDs) in afterRestore hooks.
  • Provisioned Concurrency pre-warms N sandboxes for zero cold start up to N concurrent; integrates with Application Auto Scaling.
  • Reserved Concurrency is a floor AND ceiling — protects the function and caps downstream impact.
  • Execution context reuse — hoist SDK clients and DB pools to global scope for 20–100 ms p50 savings; use RDS Proxy to pool database connections at scale.
  • VPC cold start is tens of ms post-Hyperplane ENI; use VPC Endpoints (AWS services) or NAT Gateway (public internet) for outbound.
  • arm64 Graviton2 gives ~20% price-performance for a one-line config change, contingent on dependency compatibility.
  • Lambda Layers trade share-ability for INIT time — bundling usually beats Layers for cold start.
  • Lambda Extensions bill against function GB-seconds; the Parameters and Secrets Lambda Extension is the canonical answer for caching Secrets Manager / Parameter Store values.
  • Function URL is the lowest-latency HTTPS entry to Lambda (~5–10 ms overhead); API Gateway adds 20–50 ms in exchange for auth, throttling, and mapping templates.
  • Concurrency quotas — 1,000 default, 500/1,000/3,000 burst, +1,000 per minute scaling.
  • Payload limits — 6 MB sync / 256 KB async; use S3 pointer pattern or Amazon EFS for larger data.
  • Efficient packaging (esbuild/webpack, trimmed Python, arm64 native, GraalVM) is the cheapest cold-start win.
  • Destinations replace DLQ for modern async failure routing; both fire only on async, never on sync.
  • Event source mapping — tune BatchSize, MaximumBatchingWindowInSeconds, ParallelizationFactor, and ReportBatchItemFailures for stream and queue throughput.

Master these AWS Lambda performance optimization levers and Task 4.3 becomes the highest-accuracy section on the whole DVA-C02 exam — and the skills transfer directly to running serverless systems in production, which is the real point of the certification.

Official sources