DynamoDB data modeling is arguably the single heaviest topic on the AWS Certified Developer Associate DVA-C02 exam. While the exam expects broad coverage of AWS developer services, it hits DynamoDB data modeling questions hardest because DynamoDB is AWS's flagship NoSQL primitive and the default state store for serverless architectures. A developer who cannot reason about partition keys, sort keys, single-table design, LSI versus GSI, RCU and WCU math, streams, transactions, and PITR will fail DynamoDB data modeling scenarios even if they memorise every other service. This deep dive walks through every DynamoDB data modeling construct in the DVA-C02 blueprint, pairs each concept with a plain-language analogy, loads up callouts for traps and tips, and closes with exam-style FAQs so you can drill DynamoDB data modeling until the patterns are automatic.
What Is DynamoDB Data Modeling?
DynamoDB data modeling is the practice of designing tables, keys, indexes, and capacity settings in Amazon DynamoDB so that every application access pattern can be served with predictable single-digit millisecond latency at any scale. Unlike relational data modeling, which starts from normalised entities and relies on JOINs to answer ad-hoc queries, DynamoDB data modeling starts from a list of access patterns — "get order by order id", "list orders for a customer between two dates", "find all items in cart X" — and works backwards to a table shape that serves those patterns without scans.
DynamoDB data modeling is the skill that distinguishes a junior DynamoDB user from a serverless architect. The DVA-C02 exam guide explicitly calls out DynamoDB data modeling under Domain 1 (Development with AWS Services) because data access layers written against DynamoDB are the core state primitive of AWS Lambda, Amazon API Gateway, and AWS AppSync applications. Getting DynamoDB data modeling right means your application scales from zero to millions of requests without re-architecting. Getting DynamoDB data modeling wrong means hot partitions, surprise bills, throttled writes, and expensive full-table scans.
Why DynamoDB Data Modeling Is Hard for Relational Developers
Candidates coming from PostgreSQL, MySQL, or Oracle backgrounds often find DynamoDB data modeling counter-intuitive. Relational modeling rewards normalisation, referential integrity, and deferring access-pattern decisions until query time. DynamoDB data modeling inverts every one of those instincts: you denormalise aggressively, embed related entities, and enumerate every access pattern before creating the first table. DVA-C02 questions exploit this inversion constantly; the right answer to many DynamoDB data modeling scenarios is counter-intuitive unless you have internalised the NoSQL mindset.
Where DynamoDB Data Modeling Sits in the DVA-C02 Blueprint
Domain 1 (Development with AWS Services, 32% of the DVA-C02 exam) is where DynamoDB data modeling is scored most heavily, but DynamoDB also appears in Domain 2 (Security) via IAM condition keys on partition keys, in Domain 3 (Deployment) via infrastructure-as-code for tables and indexes, and in Domain 4 (Troubleshooting and Optimization) via capacity tuning, DAX caching, and CloudWatch metrics. Expect DynamoDB data modeling questions on roughly 10 to 15 of the 65 scored DVA-C02 questions.
DynamoDB Data Modeling in Plain Language
If the formal language of DynamoDB data modeling still feels abstract, reframe it using one of these three everyday analogies. Pick whichever sticks best for you and recall it during the exam.
Analogy 1: The Coat-Check Room
Picture DynamoDB data modeling as running a massive coat-check room at a concert venue. The partition key is the hook number printed on your ticket — whoever brings hook number 42 gets coat 42 returned instantly, no scanning the rack. The sort key is the order in which multiple items hang on the same hook: a coat, a bag, an umbrella, a hat — all tied to ticket 42 but retrievable in a predictable order. Single-table design is the observation that one giant coat-check room with colour-coded tickets (USER#, ORDER#, ITEM#) is cheaper and faster than running ten separate rooms. A Global Secondary Index (GSI) is a second ticket booth that sorts items by a different attribute, such as "all red coats" across all hooks. DynamoDB Streams is the CCTV feed that broadcasts every hook event (put, modify, remove) to any listener. Transactions are the bouncer who guarantees that "hand in coat AND return deposit" either both happen or neither does. DAX is the fast-lane desk at the entrance that remembers your face so you skip the lookup entirely.
Analogy 2: The Library Card Catalogue
Treat DynamoDB data modeling as organising a public library's card catalogue. The partition key is the primary drawer — ISBN. The sort key is the order inside the drawer — edition year. A Local Secondary Index (LSI) is a second set of tabs inside the same drawer that lets you scan by author name instead of edition year; because the drawer is the same, you must decide at drawer-construction time whether to add author tabs. A Global Secondary Index (GSI) is an entirely separate catalogue cabinet that re-indexes the library by subject or by language; you can build a new GSI cabinet at any time, but the new cabinet is refreshed eventually, not instantly. Strongly consistent reads are like checking the master copy at the circulation desk — slower but always current. Eventually consistent reads are like checking a branch cabinet — twice as cheap, but a freshly returned book might not appear for a second or two. Transactional reads and writes are the librarian stamp that ensures every book in a multi-step check-out operation gets updated atomically, at twice the usual cost. Point-in-time recovery is the microfilm archive that lets you rewind the catalogue to any moment in the last 35 days.
Analogy 3: The Open-Book Exam Cheat Sheet
Imagine DynamoDB data modeling as compressing all your course notes onto a single cheat sheet for an open-book exam. You cannot include everything, so you write down only the answers to the specific questions you expect to be asked. The partition key is the heading at the top of each section — you flip straight to "ORDERS" or "USERS" without scanning the whole sheet. The sort key orders notes within a section — chronological, alphabetical, priority. A composite primary key lets a single section contain many related entries. Overloaded keys (USER#123 alongside ORDER#456 in the same table) let you pack two topics onto one sheet. If you anticipate a new question type after the sheet is printed, you can staple a GSI photocopy on the back — a copy of the cheat sheet re-sorted by a different attribute. You cannot staple a new LSI after printing, because LSIs live inside the original partition section. DAX is your short-term memory — you have already rehearsed the top 100 facts so you do not even look at the sheet. Transactions are the teacher's rule that you must answer parts (a), (b), and (c) together or leave the whole question blank. TTL is the eraser that wipes expired scribbles after a set time.
Core Operating Principles of DynamoDB Data Modeling
Every DynamoDB data modeling decision boils down to a small number of principles. Internalise these and the rest of the exam material clicks into place.
Principle 1: Model Access Patterns First
In relational design you normalise entities and defer query design. In DynamoDB data modeling you enumerate every access pattern first — "get user by id", "list orders for user by date", "find cart items by session", "count messages by room" — and then you reverse-engineer the partition key, sort key, and secondary indexes to answer each pattern with one query and no scans.
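Teams often capture this inventory as a literal worksheet before creating any table. Here is a minimal sketch in Python; the patterns and key shapes are hypothetical examples for an e-commerce app, not a prescribed format:

```python
# Hypothetical access-pattern worksheet for an e-commerce app. Each row
# records the pattern and the key design that serves it with a single
# GetItem or Query (never a Scan).
ACCESS_PATTERNS = [
    {"pattern": "get user by id",
     "operation": "GetItem",
     "key_design": "PK = user_id"},
    {"pattern": "list orders for user by date",
     "operation": "Query",
     "key_design": "PK = user_id, SK = order_date (BETWEEN)"},
    {"pattern": "find cart items by session",
     "operation": "Query",
     "key_design": "PK = session_id, SK begins_with CART#"},
]

# Design review gate: every pattern must be served without a Scan.
assert all(p["operation"] in ("GetItem", "Query") for p in ACCESS_PATTERNS)
```

Only once every row passes this gate do you commit to a table shape.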
Principle 2: Avoid Scans and Filters in Production Paths
A DynamoDB Scan reads every item in the table and costs one RCU per 4 KB. On a 10 GB table, a single scan can cost thousands of RCUs. DynamoDB data modeling done well means every production request is a Query (partition-key scoped) or a GetItem (single key), not a Scan. Filters are applied after reading from disk, so they reduce payload but not cost.
Principle 3: Distribute Load Across Partitions
DynamoDB spreads items across physical partitions based on a hash of the partition key. If one partition key value gets 90% of traffic (a "hot partition"), you will hit the per-partition throughput cap regardless of how much capacity the table has. Good DynamoDB data modeling picks partition keys with high cardinality (user id, device id, session id) and avoids low-cardinality keys (status=ACTIVE, country=US, yyyy-mm-dd).
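The effect of key cardinality can be demonstrated with a toy hash model. Here hashlib stands in for DynamoDB's internal (undocumented) hash function, and the partition count is made up; only the distribution behaviour is the point:

```python
import hashlib
from collections import Counter

def partition_for(key: str, partitions: int = 8) -> int:
    # Toy stand-in for DynamoDB's internal hash: map a partition-key
    # value onto one of N physical partitions.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % partitions

# High-cardinality key (user id): traffic spreads across all partitions.
user_traffic = Counter(partition_for(f"USER#{i}") for i in range(10_000))

# Low-cardinality key (status flag): all traffic lands on one partition.
status_traffic = Counter(partition_for(s) for s in ["ACTIVE"] * 10_000)

assert len(user_traffic) == 8     # every partition sees load
assert len(status_traffic) == 1   # one hot partition takes everything
```

The second Counter is the "throttling despite low total utilisation" scenario in miniature: total capacity is irrelevant when every request hashes to the same partition.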
Principle 4: Denormalise Aggressively
Embed related data inside a single item where it is always read together (for example, order line items inside the order record) rather than spreading across multiple tables. The trade-off is larger items and duplicated data, but the benefit is one query instead of many. DynamoDB items can be up to 400 KB, which is enough headroom for most embedded patterns.
Principle 5: Choose Capacity Mode to Match Workload Shape
On-Demand absorbs unpredictable spikes without capacity planning. Provisioned with Auto Scaling is cheaper for steady-state workloads. DynamoDB data modeling includes capacity planning because the wrong mode inflates bills by 5× to 10×.
DynamoDB data modeling is the discipline of designing partition keys, sort keys, secondary indexes, item shapes, and capacity settings in Amazon DynamoDB to serve every application access pattern with predictable single-digit millisecond latency at any scale. Unlike relational modeling, DynamoDB data modeling starts from enumerated access patterns and denormalises aggressively to avoid scans and JOINs. Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html
Primary Keys in DynamoDB Data Modeling
Every DynamoDB table has a primary key that uniquely identifies each item. DynamoDB data modeling offers two primary-key shapes, and the choice drives everything that follows.
Partition Key Only (Simple Primary Key)
A simple primary key is a single attribute — the partition key. DynamoDB hashes the partition key value and routes the item to the physical partition that owns that hash range. GetItem by partition key is an O(1) operation that returns in single-digit milliseconds. Use a simple primary key when you only ever look up items by a unique id (for example, a user id, an order id, or a session token) and you never need to query for multiple items that share a common attribute.
Partition Key + Sort Key (Composite Primary Key)
A composite primary key combines a partition key with a sort key. Items that share the same partition key are stored physically together, sorted by sort-key value. This unlocks range queries such as "all orders for user U42 between 2026-01-01 and 2026-03-31" or "all messages in room R7 after timestamp T". DynamoDB data modeling for composite keys lets you encode hierarchical relationships (one user to many orders, one room to many messages) in a single table. The sort key is also the foundation of the single-table design pattern covered later in this guide.
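As a concrete sketch, the low-level Query request for the date-range example might look like the following; table and attribute names are illustrative:

```python
# Low-level DynamoDB Query request shape for "all orders for user U42
# between 2026-01-01 and 2026-03-31". The partition key pins the query
# to one partition; the sort-key BETWEEN walks a contiguous, pre-sorted
# range, so no Scan is involved.
query_request = {
    "TableName": "Orders",
    "KeyConditionExpression":
        "UserId = :u AND OrderDate BETWEEN :start AND :end",
    "ExpressionAttributeValues": {
        ":u": {"S": "U42"},
        ":start": {"S": "2026-01-01"},
        ":end": {"S": "2026-03-31"},
    },
}

assert "BETWEEN" in query_request["KeyConditionExpression"]
```

The same shape with `begins_with(OrderDate, :prefix)` in the key condition serves "all orders in a given month" without any change to the table.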
Uniqueness Rules
In a simple-key table, every partition-key value must be unique. In a composite-key table, the pair (partition key, sort key) must be unique — two items with the same partition key are allowed as long as their sort keys differ. This rule is why composite keys model one-to-many relationships naturally.
DynamoDB distributes items across physical partitions by hashing the partition key. Low-cardinality partition keys (status, country, category, date) funnel traffic into a small number of hashes and create hot partitions that throttle even on a table with abundant total capacity. High-cardinality partition keys (user id, device id, request id, session id) spread load evenly. Every DynamoDB data modeling exam scenario that mentions "throttling despite low total utilisation" is testing hot-partition awareness. Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-uniform-load.html
Single-Table Design Pattern
Single-table design is the DynamoDB data modeling pattern most strongly associated with advanced serverless architectures and it appears in multiple DVA-C02 scenario questions. The idea is deliberately counter-intuitive: instead of creating one table per entity (Users table, Orders table, Products table), you pack multiple entity types into a single DynamoDB table and use overloaded key attributes to keep them separate.
Overloaded Partition Keys and Sort Keys
In a single-table design, the partition-key attribute is typically named PK and the sort-key attribute is named SK. Their values carry an entity prefix that identifies which entity type each item represents.
- A user item might have PK = USER#u42, SK = PROFILE.
- An order item for the same user might have PK = USER#u42, SK = ORDER#2026-04-20#o987.
- A line item inside that order might have PK = ORDER#o987, SK = LINE#1.
- A product item might have PK = PRODUCT#p123, SK = METADATA.
All four items live in the same table. A Query for PK = USER#u42 returns the profile plus every order, sorted by the order sort key. This single query replaces a JOIN across three relational tables.
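One way to keep overloaded keys consistent across a codebase is a set of small key-builder helpers. This is a sketch with hypothetical function names, not a prescribed API:

```python
# Hypothetical key builders for the single-table layout described above.
def profile_key(user_id: str) -> dict:
    return {"PK": f"USER#{user_id}", "SK": "PROFILE"}

def order_key(user_id: str, order_date: str, order_id: str) -> dict:
    # Date-first sort key so orders sort chronologically within the user
    # partition, and "SK begins_with ORDER#2026" slices by year.
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_date}#{order_id}"}

def line_item_key(order_id: str, line_no: int) -> dict:
    return {"PK": f"ORDER#{order_id}", "SK": f"LINE#{line_no}"}

assert profile_key("u42") == {"PK": "USER#u42", "SK": "PROFILE"}
assert order_key("u42", "2026-04-20", "o987")["SK"] == "ORDER#2026-04-20#o987"
```

Centralising prefix construction like this prevents the subtle bugs that appear when one code path writes `USER#u42` and another queries `USERS#u42`.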
Entity Prefixes as Type Discriminators
The entity prefix (USER#, ORDER#, PRODUCT#) is both a human-readable type tag and a sorting trick. Because DynamoDB sorts by sort-key string, items beginning with ORDER# group together ahead of items beginning with PROFILE. Prefixes also let you query a slice of the partition — SK begins_with "ORDER#2026" retrieves all 2026 orders without scanning other entity types.
When Single-Table Design Wins
Single-table design is optimal when:
- Your access patterns frequently mix entities (user + orders + line items retrieved together).
- Write and read throughput are easier to reason about as one aggregate number.
- You want one CloudFormation resource, one IAM policy, and one backup to maintain instead of many.
- You are building a serverless application that benefits from minimising round trips.
When Multi-Table Design Wins
Multi-table design remains acceptable when entities share nothing in common, have radically different capacity profiles (one entity is written 1000× more than another), or must be owned by different IAM principals. For typical DVA-C02 scenarios involving related entities, the exam favours single-table design.
When you overload PK and SK values, always include a human-readable entity prefix such as USER#, ORDER#, PRODUCT#, CART#, SESSION#. Your AWS CloudTrail logs, CloudWatch metric dimensions, and ad-hoc queries all become self-explanatory. Without a prefix, a raw id like u42 in the partition key forces every operator to consult an external schema document just to know what the item is.
Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-aggregation.html
Local Secondary Index (LSI) vs Global Secondary Index (GSI)
Secondary indexes are the DynamoDB data modeling feature that lets you query the same items by a different key without duplicating data yourself. DynamoDB offers two kinds of secondary index and the DVA-C02 exam loves to test the differences.
Local Secondary Index (LSI) Essentials
A Local Secondary Index shares the same partition key as the base table but uses a different sort key. LSIs are "local" because they only reorder items within the same partition.
Key LSI facts for DynamoDB data modeling:
- Same partition key as the base table.
- Different sort key from the base table.
- Must be created at table-creation time — you cannot add an LSI later without rebuilding the table.
- Up to 5 LSIs per table.
- Strongly consistent reads are supported (because the LSI shares the base-table partition).
- Tables with at least one LSI cap each item collection (all items sharing a partition-key value, across the base table and its LSIs) at 10 GB.
- LSIs consume RCUs and WCUs from the base table — no separate capacity.
Global Secondary Index (GSI) Essentials
A Global Secondary Index has its own partition key and optional sort key, independent of the base table. GSIs are "global" because they re-partition the data across every partition.
Key GSI facts for DynamoDB data modeling:
- Different partition key and sort key from the base table.
- Can be added, modified, or deleted at any time — not just at table creation.
- Up to 20 GSIs per table (soft limit — request higher via AWS Support).
- Only eventually consistent reads are supported.
- GSIs have their own provisioned capacity (or share the table's On-Demand billing).
- GSI writes cost additional WCUs — every base-table write that touches indexed attributes also writes to every affected GSI.
GSI Overloading for Single-Table Design
Advanced single-table designs overload GSIs too: GSI1PK and GSI1SK attributes carry entity-prefixed values that power a different access pattern, such as "list orders by status" or "find users by email". One GSI can serve multiple access patterns when prefixes are chosen carefully.
This is the highest-frequency DVA-C02 exam trap for DynamoDB data modeling. If the scenario says "we need to add a new query pattern to an existing production table without downtime", the correct answer is a Global Secondary Index (GSI). A Local Secondary Index (LSI) cannot be added after the table is created — you would have to recreate the table and migrate data. Memorise this distinction verbatim: LSI is create-time only; GSI is any-time. Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html
LSI vs GSI Comparison Matrix
| Dimension | LSI | GSI |
|---|---|---|
| Partition key | Same as base | Different |
| Sort key | Different from base | Different (optional) |
| Add after creation | No | Yes |
| Consistency | Strong or eventual | Eventual only |
| Count limit | 5 per table | 20 per table (soft) |
| Capacity | Shared with base | Independent |
| 10 GB item-collection limit | Applies | Does not apply |
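To make the "GSI is any-time" point concrete, here is a sketch of the low-level UpdateTable request that adds a GSI to an existing table, the operation an LSI cannot perform. Names are illustrative, and an On-Demand table is assumed so no ProvisionedThroughput block is needed:

```python
# UpdateTable request shape that adds GSI1 to a live table. The new
# index backfills in the background; the base table stays online.
add_gsi_request = {
    "TableName": "AppTable",
    "AttributeDefinitions": [
        {"AttributeName": "GSI1PK", "AttributeType": "S"},
        {"AttributeName": "GSI1SK", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexUpdates": [
        {"Create": {
            "IndexName": "GSI1",
            "KeySchema": [
                {"AttributeName": "GSI1PK", "KeyType": "HASH"},
                {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
            ],
            # Project all attributes so the GSI can answer queries alone.
            "Projection": {"ProjectionType": "ALL"},
        }}
    ],
}
```

There is no equivalent request for an LSI; the only LSI "update" is recreating the table.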
RCU and WCU Math
Capacity math is a staple of DynamoDB data modeling questions on DVA-C02. Memorise these formulas until they are automatic.
Write Capacity Unit (WCU)
One WCU represents one standard write per second for an item up to 1 KB.
- Item ≤ 1 KB: 1 WCU per write.
- Item > 1 KB: round up to the next whole KB, then 1 WCU per KB. A 3.2 KB item costs 4 WCUs per write.
- Transactional writes cost 2× the standard WCU. A 1 KB transactional write costs 2 WCUs.
Read Capacity Unit (RCU)
One RCU represents one strongly consistent read per second for an item up to 4 KB.
- Strongly consistent read, item ≤ 4 KB: 1 RCU.
- Eventually consistent read, item ≤ 4 KB: 0.5 RCU (half the cost).
- Transactional read, item ≤ 4 KB: 2 RCUs (double the strongly consistent cost).
- Item > 4 KB: round up to the next whole 4 KB before applying the multiplier. An 11 KB item requires ceil(11/4) = 3 base units; strongly consistent is 3 RCUs, eventually consistent is 1.5 RCUs, transactional is 6 RCUs.
Worked Examples for DynamoDB Data Modeling
- 100 eventually consistent reads per second of 8 KB items: 100 × ceil(8/4) × 0.5 = 100 × 2 × 0.5 = 100 RCUs.
- 200 strongly consistent reads per second of 2 KB items: 200 × 1 = 200 RCUs.
- 500 writes per second of 1.5 KB items: 500 × ceil(1.5/1) = 500 × 2 = 1000 WCUs.
- 50 transactional writes per second of 500 B items: 50 × 1 × 2 = 100 WCUs.
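The rounding rules can be captured in two small helper functions. This sketch reproduces the worked examples above:

```python
import math

def wcus(item_kb: float, writes_per_sec: int, transactional: bool = False):
    # Round item size up to the next whole KB (minimum 1 KB), then
    # double for transactional writes.
    units = max(math.ceil(item_kb), 1)
    return writes_per_sec * units * (2 if transactional else 1)

def rcus(item_kb: float, reads_per_sec: int, mode: str = "strong"):
    # Round item size up to the next whole 4 KB block (minimum one),
    # then apply the consistency-mode multiplier.
    units = max(math.ceil(item_kb / 4), 1)
    factor = {"strong": 1, "eventual": 0.5, "transactional": 2}[mode]
    return reads_per_sec * units * factor

# Reproduce the worked examples above.
assert rcus(8, 100, "eventual") == 100
assert rcus(2, 200, "strong") == 200
assert wcus(1.5, 500) == 1000
assert wcus(0.5, 50, transactional=True) == 100
```

Feeding in the 11 KB item from the RCU section gives the same 3 / 1.5 / 6 figures for strong, eventual, and transactional reads.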
Burst Capacity and Adaptive Capacity
DynamoDB reserves up to 5 minutes of unused capacity as burst, which can absorb short spikes. Adaptive capacity automatically reallocates throughput from cold partitions to hot partitions, mitigating minor imbalances without developer action. Neither mechanism rescues a badly designed partition key; hot-partition throttling still occurs when a single partition key absorbs too much traffic.
Write 1 KB = 1 WCU. Round item size up to next whole KB. Transactional write = 2×. Read 4 KB strongly consistent = 1 RCU. Eventually consistent = 0.5 RCU. Transactional read = 2 RCUs. Round item size up to next whole 4 KB. Scan reads every item so size × count, not per-match. Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html
On-Demand vs Provisioned Capacity
DynamoDB data modeling includes choosing a capacity mode. The two modes have different billing shapes, different throttle behaviours, and different fits for workload profiles.
On-Demand Capacity Mode
On-Demand billing charges per request — a fixed price per million read request units and per million write request units. There is no capacity to provision, no auto-scaling to configure, and no throttling when load ramps smoothly. The trade-off is a higher per-unit price compared with optimally provisioned capacity, making On-Demand ideal for:
- New applications where you do not yet know the traffic shape.
- Spiky workloads where peak is 10× or more above steady-state.
- Development and test environments used sporadically.
- Serverless event-driven patterns where load correlates with upstream bursts.
Provisioned Capacity Mode with Auto Scaling
Provisioned capacity asks you to declare WCUs and RCUs up front. You pay for the capacity whether you use it or not, but the per-unit price is lower. Auto Scaling adjusts provisioned capacity inside a min-max band based on consumed-capacity CloudWatch metrics, typically targeting 70% utilisation. Provisioned + Auto Scaling is the right DynamoDB data modeling choice when:
- Traffic is predictable and sustained.
- Cost sensitivity matters more than absorbing unpredictable spikes.
- You already have strong observability and can tune the min-max band.
Capacity Mode Switching
You can switch between On-Demand and Provisioned once every 24 hours per table. Many teams launch in On-Demand to learn the traffic shape, then switch to Provisioned once the pattern stabilises. Some teams do the opposite — stay on Provisioned for cost efficiency and temporarily flip to On-Demand before a known traffic spike (product launch, marketing campaign).
A brand-new table in On-Demand mode can immediately serve up to 4000 WCUs and 12000 RCUs. However, if you spike from zero to 40000 WCUs in one second, DynamoDB still throttles you while it auto-scales. On-Demand absorbs gradual doubling well but not instant 10× jumps. For planned launches that expect cold-start spikes, pre-warm the table by ramping traffic gradually or switch to provisioned with a high floor. Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/on-demand-capacity-mode.html
DynamoDB Streams
DynamoDB Streams is the change-data-capture feature that turns DynamoDB into an event source. Every item-level modification in the base table is captured as a stream record that downstream consumers can read in order. DynamoDB Streams is the foundation of event-driven DynamoDB data modeling patterns.
Stream Record Contents
When you enable DynamoDB Streams on a table, you choose what each stream record carries:
- KEYS_ONLY — only the key attributes of the modified item.
- NEW_IMAGE — the item after modification.
- OLD_IMAGE — the item before modification.
- NEW_AND_OLD_IMAGES — both before and after, ideal for audit and diff-based downstream logic.
Stream Retention and Ordering
DynamoDB Streams retains records for 24 hours, after which they are purged. Records are delivered in strict partition-key order within a shard, giving you per-item ordering guarantees. The 24-hour retention window means consumers must be reliable — if a Lambda consumer fails for longer than a day, records are lost permanently.
Lambda Trigger Integration
The most common DynamoDB Streams consumer is AWS Lambda. You configure a stream as an event source for a Lambda function and Lambda polls the stream on your behalf. Lambda retries failed batches and reports to a dead-letter queue after configurable max attempts. Typical Lambda-on-Streams patterns include:
- Replicating items to a search index (Amazon OpenSearch).
- Publishing notifications to Amazon SNS or Amazon SQS.
- Maintaining aggregate counters in a second DynamoDB table.
- Triggering downstream workflows in AWS Step Functions.
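A minimal Lambda handler consuming a NEW_AND_OLD_IMAGES batch might detect status transitions like this. The event shape follows the DynamoDB Streams record format; the handler logic, attribute names, and sample values are illustrative:

```python
# Sketch: detect PENDING -> SHIPPED transitions in a stream batch.
def handler(event, context):
    transitions = []
    for record in event["Records"]:
        if record["eventName"] != "MODIFY":
            continue  # INSERT has no OldImage, REMOVE has no NewImage
        old = record["dynamodb"].get("OldImage", {})
        new = record["dynamodb"].get("NewImage", {})
        old_status = old.get("status", {}).get("S")
        new_status = new.get("status", {}).get("S")
        if old_status == "PENDING" and new_status == "SHIPPED":
            transitions.append(record["dynamodb"]["Keys"]["PK"]["S"])
    return transitions

# Hypothetical single-record batch in the Streams record format.
sample_event = {"Records": [{
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys": {"PK": {"S": "ORDER#o987"}},
        "OldImage": {"status": {"S": "PENDING"}},
        "NewImage": {"status": {"S": "SHIPPED"}},
    },
}]}
assert handler(sample_event, None) == ["ORDER#o987"]
```

Note the diff is only possible because both images are present; with KEYS_ONLY the handler would have to re-read the item and could never see the pre-update value.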
Kinesis Data Streams Integration for DynamoDB
For higher-throughput, longer-retention, or fan-out use cases, DynamoDB can emit changes to Amazon Kinesis Data Streams instead of (or in addition to) the native DynamoDB Streams endpoint. Kinesis Data Streams retains data for up to 365 days and supports multiple parallel consumers per shard via Enhanced Fan-Out. Use Kinesis Data Streams integration for DynamoDB when:
- You need retention longer than 24 hours.
- Multiple independent consumers need the same change feed.
- You want to combine DynamoDB change data with other Kinesis sources in a single stream-processing pipeline.
When your DynamoDB Streams consumer needs to know what changed (versus only that something changed), enable NEW_AND_OLD_IMAGES. This view is the only one that lets you compute before-and-after diffs for audit logs, change notifications, and conditional downstream triggers like "notify when status transitions from PENDING to SHIPPED". KEYS_ONLY is cheaper but forces downstream consumers to re-read the item, which doubles the read cost and loses the original value if the item was deleted.
Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
DynamoDB Transactions
Until DynamoDB added transactions, developers implemented multi-item atomicity by hand with conditional writes and retry loops. Native transactions simplify DynamoDB data modeling for patterns that must succeed or fail as a unit — for example, debiting one account and crediting another.
TransactWriteItems
TransactWriteItems groups up to 100 write operations (Put, Update, Delete, ConditionCheck) into a single atomic unit. All operations succeed together or none are applied. The limit was raised from 25 to 100 items per transaction in September 2022, so older study material may still cite 25; current AWS documentation uses 100. Transactional writes cost 2× the standard WCU.
TransactGetItems
TransactGetItems groups up to 100 read operations into a consistent snapshot, ensuring that no item in the group changes between reads. This is useful for reconciling balances, inventory, or any scenario where read-skew between items would corrupt downstream logic. Transactional reads cost 2× the strongly consistent RCU.
When to Use Transactions
Use DynamoDB transactions when two or more items must change together with strict all-or-nothing semantics — financial transfers, inventory reservations, idempotent workflows that touch multiple entity types. Do not use transactions as a general substitute for careful key design; the 2× cost multiplier adds up quickly.
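As a sketch, the low-level TransactWriteItems request for the debit-and-credit example might look like this; table names, keys, and amounts are hypothetical:

```python
# Debit account a1 and credit account a2 atomically. The condition on
# the debit side rejects the whole transaction if it would overdraft.
transfer_request = {
    "TransactItems": [
        {"Update": {
            "TableName": "Accounts",
            "Key": {"PK": {"S": "ACCOUNT#a1"}},
            "UpdateExpression": "SET balance = balance - :amt",
            "ConditionExpression": "balance >= :amt",  # no overdraft
            "ExpressionAttributeValues": {":amt": {"N": "100"}},
        }},
        {"Update": {
            "TableName": "Accounts",
            "Key": {"PK": {"S": "ACCOUNT#a2"}},
            "UpdateExpression": "SET balance = balance + :amt",
            "ExpressionAttributeValues": {":amt": {"N": "100"}},
        }},
    ],
}

assert len(transfer_request["TransactItems"]) <= 100  # per-transaction cap
```

If the condition on the first Update fails, the second Update is not applied either; that all-or-nothing guarantee is exactly what hand-rolled retry loops struggled to provide.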
Conditional Writes
Conditional writes let a single PutItem, UpdateItem, or DeleteItem succeed only if a condition expression evaluates to true. Conditions can reference attribute existence (attribute_not_exists(pk)), attribute values (#status = :pending), or compound boolean logic. Conditional writes do not cost extra — failed conditions still consume the WCU because DynamoDB had to read the item to evaluate the condition.
Optimistic Locking with Version Attribute
Optimistic locking is a DynamoDB data modeling pattern that uses a version attribute to detect concurrent updates without taking a pessimistic lock. The flow is:
- Client reads the item and notes the current version, say version = 7.
- Client computes the update and issues UpdateItem with condition version = 7 and SET version = 8.
- If another client wrote version = 8 in the meantime, the condition fails and the update is rejected. The client reads the fresh state and retries.
Optimistic locking is cheap (one WCU per successful write), scales across many writers, and is the idiomatic concurrency-control pattern for DynamoDB. It replaces the SELECT FOR UPDATE style you would use in a relational database.
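The read-condition-retry loop can be simulated end to end with an in-memory dict standing in for DynamoDB. All names here are illustrative; the version check plays the role of a ConditionExpression such as "version = :expected":

```python
class ConditionFailed(Exception):
    """Stands in for DynamoDB's ConditionalCheckFailedException."""

# In-memory stand-in for the table, pre-loaded at version 7.
table = {"ITEM#1": {"payload": "old", "version": 7}}

def conditional_update(key, new_payload, expected_version):
    item = table[key]
    if item["version"] != expected_version:  # the condition expression
        raise ConditionFailed
    table[key] = {"payload": new_payload, "version": expected_version + 1}

def update_with_retry(key, new_payload, max_attempts=3):
    for _ in range(max_attempts):
        current = table[key]                 # read item, note version
        try:
            conditional_update(key, new_payload, current["version"])
            return True
        except ConditionFailed:
            continue                         # lost the race: re-read, retry
    return False

assert update_with_retry("ITEM#1", "new")
assert table["ITEM#1"] == {"payload": "new", "version": 8}
```

Against real DynamoDB the same flow uses UpdateItem with a ConditionExpression; a failed condition still consumes the write capacity, which is why retry counts should be bounded.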
TransactWriteItems and TransactGetItems cost 2× the standard WCU and RCU respectively. Use them only when multi-item atomicity is strictly required — financial transfers, inventory reservations, multi-step state machines. For single-item updates, conditional writes plus optimistic locking (using a version attribute) deliver safe concurrency control at the normal 1× cost.
Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transactions.html
DynamoDB Accelerator (DAX) for Microsecond Reads
DynamoDB Accelerator (DAX) is an in-memory, write-through cache that sits in front of DynamoDB tables and delivers microsecond read latency — roughly 10× faster than the single-digit millisecond base DynamoDB latency. DAX is a purpose-built cache for DynamoDB; it is not ElastiCache, and it speaks the DynamoDB API so your existing SDK code works unchanged once you point it at the DAX endpoint.
DAX Architecture
A DAX cluster consists of one or more nodes (up to 11 nodes per cluster) deployed inside a VPC. The cluster exposes a single endpoint. Your DynamoDB SDK client connects to DAX instead of the regular DynamoDB endpoint; DAX serves cache hits directly and proxies cache misses to DynamoDB.
DAX Caching Behaviour
- Item Cache — caches GetItem and BatchGetItem responses by primary key. Default TTL is 5 minutes.
- Query Cache — caches Query and Scan results by parameter hash. Default TTL is 5 minutes.
- Write-Through — writes pass through DAX to DynamoDB synchronously; DAX updates its cache on successful writes so subsequent reads are consistent with the last write.
When to Use DAX
Use DAX when you need microsecond reads on read-heavy, key-based access patterns — product catalogues, leaderboards, ad-tech decisioning, gaming sessions. DAX does not help with:
- Write-heavy workloads (every write still hits DynamoDB).
- Scans and filters over large result sets with high cardinality.
- Workloads that need strongly consistent reads (DAX passes strongly consistent reads through to DynamoDB without caching them, so they see no latency benefit).
DAX vs ElastiCache for DynamoDB Caching
On DVA-C02, if the scenario asks for caching in front of DynamoDB with minimal application code change, DAX is the correct answer because it speaks the DynamoDB API natively. If the scenario requires caching across heterogeneous sources (DynamoDB plus RDS plus API responses), ElastiCache is the right answer — but at the cost of explicit cache-key management in your application code.
DynamoDB Time to Live (TTL)
DynamoDB TTL is a background feature that automatically deletes items whose TTL attribute has expired. TTL is free — deletions consume no WCUs — making it the idiomatic way to manage ephemeral data in DynamoDB data modeling.
How TTL Works
- Enable TTL on the table and specify which attribute holds the expiration timestamp (typical names: ttl or expiresAt).
- Set the attribute value to a Unix epoch time in seconds at which the item should expire.
- DynamoDB scans for expired items asynchronously and deletes them. Deletion usually occurs within 48 hours of expiration.
- Deletions appear in DynamoDB Streams with a userIdentity whose type is Service and whose principalId is dynamodb.amazonaws.com, so downstream consumers can differentiate TTL deletes from user-initiated deletes.
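Two small helpers illustrate the mechanics: computing the epoch value to store, and recognising TTL deletes in a stream record. Function names are hypothetical; the userIdentity shape follows the DynamoDB Streams documentation:

```python
import time

def expires_at(ttl_hours, now=None):
    # Unix epoch seconds at which the item should expire; this is the
    # value written into the table's TTL attribute.
    base = now if now is not None else time.time()
    return int(base + ttl_hours * 3600)

def is_ttl_delete(stream_record):
    # TTL deletions carry a service principal in userIdentity;
    # user-initiated deletes have no such block.
    identity = stream_record.get("userIdentity") or {}
    return (identity.get("type") == "Service"
            and identity.get("principalId") == "dynamodb.amazonaws.com")

assert expires_at(2, now=1_000_000) == 1_007_200
assert is_ttl_delete({"userIdentity": {
    "type": "Service", "principalId": "dynamodb.amazonaws.com"}})
```

A Streams consumer that archives user deletions, for example, would filter with is_ttl_delete so that routine TTL expiry does not pollute the audit trail.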
TTL Use Cases
- Session tokens that expire after N hours.
- Temporary one-time passwords.
- Cache entries with bounded lifetime.
- Event data that should age out after a retention window.
TTL Caveats
- Deletion is eventual, not immediate — queries may return expired items for up to 48 hours after the timestamp has passed. Always filter by TTL in your application if timeliness matters.
- TTL is best-effort. Critical compliance deletions should use explicit DeleteItem calls.
DynamoDB Global Tables
DynamoDB Global Tables replicate a table across multiple AWS Regions with full active-active multi-master semantics. Every Region accepts reads and writes; DynamoDB propagates changes to the other Regions asynchronously, typically within one second.
Global Tables Key Properties
- Active-active multi-master replication across multiple supported AWS Regions.
- Last-writer-wins conflict resolution based on server-side timestamps.
- Replication latency usually < 1 second (there is no formal latency SLA; Global Tables carry a 99.999% availability SLA).
- Requires DynamoDB Streams to be enabled on the table.
- Uses the same RCU and WCU units locally in each Region; replication adds replicated write request units (rWRUs).
When to Use Global Tables
- Globally distributed applications that need local-Region latency for users in each geography.
- Disaster recovery across Regions with zero RTO for reads.
- Multi-Region active-active architectures where every Region accepts writes.
Global Tables Gotchas
- Conflict resolution is last-writer-wins, which can silently overwrite concurrent updates in different Regions. If your application cannot tolerate lost updates across Regions, add application-level conflict-resolution logic (for example, merge arrays instead of overwriting).
- PITR must be configured per Region.
- Cost includes cross-Region replicated write request units in addition to the local WCUs.
If your scenario demands multi-Region active-active writes with single-digit millisecond local reads, Global Tables is the correct DynamoDB data modeling answer. Remember the conflict-resolution rule: when two Regions write to the same item simultaneously, the write with the latest timestamp wins. If your application cannot tolerate this, either partition writes by Region or design idempotent merges at the application layer. Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GlobalTables.html
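To make the conflict-resolution rule concrete, here is a toy in-memory simulation contrasting last-writer-wins with an application-level merge that unions a collection attribute instead of overwriting it. The item shape and the ts field are hypothetical stand-ins for DynamoDB's server-side timestamps:

```python
def last_writer_wins(a: dict, b: dict) -> dict:
    """DynamoDB-style resolution: the version with the later timestamp
    wins wholesale; the other Region's write is silently discarded."""
    return a if a["ts"] >= b["ts"] else b

def merge_tags(a: dict, b: dict) -> dict:
    """Application-level alternative: start from the newer version but
    union the 'tags' collection so neither Region's additions are lost."""
    merged = dict(last_writer_wins(a, b))
    merged["tags"] = sorted(set(a.get("tags", [])) | set(b.get("tags", [])))
    return merged

us_write = {"ts": 10, "tags": ["alpha"]}  # concurrent write in us-east-1
eu_write = {"ts": 11, "tags": ["beta"]}   # concurrent write in eu-west-1
# last_writer_wins drops "alpha" entirely; merge_tags keeps both tags.
```

This is exactly the "merge arrays instead of overwriting" pattern mentioned above, expressed as code.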
Backup Strategy: On-Demand Backup and Point-in-Time Recovery
DynamoDB offers two complementary backup mechanisms, and both appear in DVA-C02 DynamoDB data modeling scenarios.
On-Demand Backup
On-Demand backups are full snapshots that you trigger manually or via schedule (for example, AWS Backup). They are stored indefinitely and cost per GB of snapshot storage. Use on-demand backups for:
- Pre-deployment checkpoints before a risky schema change.
- Regulatory retention that requires backups older than 35 days.
- Cross-account or cross-Region archival via AWS Backup.
Point-in-Time Recovery (PITR)
PITR is continuous backup that lets you restore the table to any second within the last 35 days. PITR is the idiomatic DynamoDB data modeling answer for protecting against accidental writes and deletes. Key PITR facts:
- 35-day rolling window — memorise the number precisely for the exam.
- Restores are a new table; the original remains untouched. You restore to a new table name and swap at the application layer.
- PITR is a table-level setting and must be enabled explicitly.
- PITR cost is charged per GB-month of table size.
When to Use Which
- Accidental UpdateItem or DeleteItem within the last 35 days → PITR.
- Full table disaster recovery across Regions → On-Demand backup + AWS Backup cross-Region copy.
- Regulatory retention beyond 35 days → On-Demand backups stored long-term.
- Point-in-time restore after a bad application deployment → PITR.
DynamoDB Streams retention = 24 hours. DynamoDB PITR window = 35 days. Max transactional items = 100 (legacy 25). Max item size = 400 KB. Max LSIs per table = 5. Max GSIs per table = 20 (soft). DAX nodes per cluster = up to 11. Write 1 KB = 1 WCU. Read 4 KB strongly consistent = 1 RCU. Eventually consistent read = 0.5 RCU. Transactional read or write = 2× normal cost. Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/PointInTimeRecovery.html
Common Exam Traps for DynamoDB Data Modeling
The DVA-C02 question bank reliably hits the following DynamoDB data modeling gotchas. Drill each trap until the reflex is automatic.
Trap 1: Assuming Scans Are Acceptable in Production
A Scan reads every item and pays full RCU per 4 KB scanned. On the exam, any answer that solves a query pattern with a Scan is almost always wrong unless the scenario explicitly accepts bulk offline processing. The correct DynamoDB data modeling answer is typically a Query against a GSI or a composite-key Query with a sort-key condition.
Trap 2: Choosing LSI When You Need Runtime Flexibility
LSI is create-time only. If the scenario describes an existing production table and asks how to add a new query pattern without downtime, the answer is GSI, not LSI.
Trap 3: Using Strong Consistency Where Eventual Consistency Suffices
Strongly consistent reads cost twice as much as eventually consistent reads. Many DynamoDB data modeling exam scenarios describe reads that can tolerate a second of staleness (profile reads, catalogue lookups, leaderboard views) — those should use eventually consistent reads to halve the bill.
Trap 4: Missing the 2× Multiplier on Transactions
Transactional reads and writes cost double. If the exam asks you to size capacity for a workload that includes transactions, always apply the 2× multiplier before comparing answer choices.
Trap 5: Hot Partitions from Low-Cardinality Keys
Scenarios that use date, status, country, or category as a partition key are testing hot-partition awareness. The correct answer rewrites the key to include a high-cardinality component (user id, session id, request id) or adds a random suffix (write sharding) to distribute load.
Trap 6: Forgetting DynamoDB Streams Has 24-Hour Retention
If the scenario requires replaying change events after a multi-day outage, DynamoDB Streams alone is insufficient — the correct answer integrates Kinesis Data Streams (up to 365 days retention) for longer replay windows.
Trap 7: DAX Is Not ElastiCache
DAX is purpose-built for DynamoDB and speaks the DynamoDB API. ElastiCache (Redis or Memcached) caches arbitrary values at the application layer. When the scenario says "cache in front of DynamoDB with minimal code change and write-through consistency", the answer is DAX.
Trap 8: Assuming Global Tables Solve Every Multi-Region Problem
Global Tables give active-active replication with last-writer-wins conflict resolution. If the scenario requires strict single-writer semantics across Regions (for example, financial ledger), the answer is to designate a single Region as the writer and use Global Tables for read-only replicas elsewhere — or use a different service.
Trap 9: Expecting TTL to Delete Items Instantly
TTL deletions are best-effort within 48 hours of expiration. Applications that require precise timing must filter by TTL in every query.
Trap 10: Forgetting to Handle ProvisionedThroughputExceededException
When a Provisioned table throttles, the SDK surfaces ProvisionedThroughputExceededException. DynamoDB data modeling exam scenarios about error handling expect exponential backoff (the AWS SDK does this by default) plus either capacity scaling or key redesign as the long-term fix.
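The AWS SDKs perform exponential backoff automatically, but the pattern is worth internalising for the exam. A hand-rolled sketch against a stand-in operation — the exception class and the flaky call below are simulated, not boto3:

```python
import time

class ProvisionedThroughputExceeded(Exception):
    """Stand-in for the SDK's ProvisionedThroughputExceededException."""

def with_backoff(op, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Retry op() with exponentially growing delays: 1x, 2x, 4x ... base_delay.
    Re-raises once the attempt budget is exhausted."""
    for attempt in range(max_attempts):
        try:
            return op()
        except ProvisionedThroughputExceeded:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))

# Simulated table call that throttles twice before succeeding.
calls = {"n": 0}
def flaky_put():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ProvisionedThroughputExceeded()
    return "ok"

result = with_backoff(flaky_put, sleep=lambda _: None)  # succeeds on attempt 3
```

In production the backoff is only the short-term mitigation; the long-term fix is capacity scaling or partition-key redesign, exactly as the trap states.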
DynamoDB Data Modeling Real-World Pattern: Orders and Carts
A realistic single-table DynamoDB data modeling example for an e-commerce platform:
- Table name: retail-app.
- Base keys: PK, SK.
- GSI1 keys: GSI1PK, GSI1SK.
Items:
- User profile: PK=USER#u42, SK=PROFILE, email=..., GSI1PK=EMAIL#[email protected], GSI1SK=USER#u42.
- Cart: PK=USER#u42, SK=CART#c001, status=ACTIVE, updatedAt=....
- Cart line item: PK=CART#c001, SK=ITEM#p123, quantity=2.
- Order: PK=USER#u42, SK=ORDER#2026-04-20#o987, status=PAID, GSI1PK=ORDER#STATUS#PAID, GSI1SK=2026-04-20#o987.
Access patterns served without scans:
- Get user profile → GetItem PK=USER#u42, SK=PROFILE.
- List all orders for a user sorted by date → Query PK=USER#u42, SK begins_with "ORDER#".
- List items in a cart → Query PK=CART#c001, SK begins_with "ITEM#".
- Find user by email → Query GSI1 with GSI1PK=EMAIL#[email protected].
- List all paid orders across users for reporting → Query GSI1 with GSI1PK=ORDER#STATUS#PAID, sort by date.
Every access pattern is a Query or a GetItem. No scans, no cross-table JOINs, one table, two indexes.
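Overloaded keys are error-prone to assemble by hand, so teams usually wrap them in small builder helpers. A sketch of that idea — the helper names are my own, but the key formats mirror the pattern above:

```python
def user_pk(user_id: str) -> str:
    return f"USER#{user_id}"

def order_sk(iso_date: str, order_id: str) -> str:
    # Sort key leads with the ISO date so Query results sort chronologically.
    return f"ORDER#{iso_date}#{order_id}"

def user_orders_query(user_id: str) -> dict:
    """Shape of a Query for 'list all orders for a user sorted by date'."""
    return {
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk)",
        "ExpressionAttributeValues": {":pk": user_pk(user_id), ":sk": "ORDER#"},
    }

cond = user_orders_query("u42")
# :pk -> "USER#u42", :sk -> "ORDER#"
```

Centralising key construction this way keeps entity prefixes consistent across every access pattern and makes a later key-schema change a one-file edit.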
DynamoDB Data Modeling Security and Observability
DynamoDB integrates with AWS IAM for fine-grained access control, AWS KMS for encryption at rest, AWS CloudTrail for API auditing, and Amazon CloudWatch for metrics and alarms. IAM condition keys can restrict access by partition-key prefix, attribute name, or returned attribute list — for example, allowing a tenant to read only items whose partition key starts with their tenant id. Amazon CloudWatch metrics for DynamoDB include ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, ThrottledRequests, SystemErrors, and UserErrors. DynamoDB data modeling for multi-tenant applications leans heavily on IAM condition keys for tenant isolation.
DynamoDB Data Modeling Cost Levers
- Pick the right capacity mode (On-Demand vs Provisioned + Auto Scaling).
- Use eventually consistent reads where staleness is acceptable — half the cost.
- Avoid Scans in production; always serve queries via partition-key-scoped Query or GetItem.
- Project only the attributes you need into GSIs (use KEYS_ONLY or INCLUDE projections) to cut GSI storage and write cost.
- Enable DAX for read-heavy workloads — cache hits are served from memory and consume no base-table RCUs.
- Use TTL to age out data automatically instead of paying for permanent storage.
- Tune item size — large items consume more WCUs and RCUs; split rarely-accessed attributes into a separate item if necessary.
Every GSI in DynamoDB data modeling has a projection — KEYS_ONLY, INCLUDE, or ALL. ALL projects every base-table attribute into the GSI and inflates both storage cost and GSI write cost because every base-table update that touches any attribute rewrites the GSI. Use KEYS_ONLY or INCLUDE when your query pattern only needs a subset of attributes, and follow up with a GetItem to the base table if you need more — usually cheaper than projecting everything.
Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html
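For illustration, this is roughly the shape of a KEYS_ONLY index definition as it would appear in the GlobalSecondaryIndexes parameter of a boto3-style CreateTable or UpdateTable request; the index and attribute names are assumptions carried over from the earlier example:

```python
# KEYS_ONLY projection: the index stores only the base-table keys plus the
# index keys, so updates that touch other attributes never rewrite the GSI.
gsi_keys_only = {
    "IndexName": "GSI1",
    "KeySchema": [
        {"AttributeName": "GSI1PK", "KeyType": "HASH"},
        {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "KEYS_ONLY"},
}
```

Switching ProjectionType to ALL is the one-line change that inflates both storage and write amplification, which is why it should be a deliberate choice rather than a default.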
Key Numbers and Must-Memorise Facts for DynamoDB Data Modeling
- Max item size: 400 KB.
- Partition key length: up to 2048 bytes.
- Sort key length: up to 1024 bytes.
- LSIs per table: 5, create-time only.
- GSIs per table: 20 (soft limit).
- Max items in TransactWriteItems: 100 (legacy 25).
- Max items in TransactGetItems: 100 (legacy 25).
- BatchWriteItem max items: 25, max 16 MB.
- BatchGetItem max items: 100, max 16 MB.
- DynamoDB Streams retention: 24 hours.
- Kinesis Data Streams integration for DynamoDB retention: up to 365 days.
- PITR retention: 35 days.
- On-Demand capacity default burst: 4000 WCUs and 12000 RCUs for new tables.
- Write 1 KB = 1 WCU; read 4 KB strongly consistent = 1 RCU.
- Eventually consistent read = 0.5 RCU; transactional read or write = 2× normal cost.
- DAX latency: microseconds (vs. single-digit milliseconds for base DynamoDB).
- Global Tables replication: typically < 1 second, last-writer-wins.
FAQ — DynamoDB Data Modeling Top Questions
1. How do I choose between LSI and GSI in DynamoDB data modeling?
Choose an LSI when you must query items that share the same partition key by a different sort key, you know the secondary access pattern at table-creation time, and you need strongly consistent reads on the index. Choose a GSI when you need a different partition key, when the access pattern emerged after the table was already in production, when you are willing to accept eventually consistent reads, or when the secondary access pattern spans partitions. Remember the most important DVA-C02 rule: LSI must be created at table-creation time; GSI can be added, modified, or dropped any time.
2. What is single-table design and when should I use it for DynamoDB data modeling?
Single-table design packs multiple entity types (users, orders, products) into one DynamoDB table using overloaded partition and sort keys with entity prefixes like USER#, ORDER#, PRODUCT#. It lets a single Query answer access patterns that would require multi-table JOINs in a relational database, while also reducing infrastructure surface area to one table, one IAM policy, and one backup. Use single-table design when your application has related entities retrieved together, and when you want serverless-friendly minimal round trips. Stick with multi-table design only when entities are truly unrelated, have radically different capacity profiles, or must be owned by different IAM principals.
3. How do I calculate DynamoDB capacity for a given workload?
For writes, one WCU serves one standard write per second of an item up to 1 KB — round item size up to the next whole KB, then multiply by requests per second, then double for transactional writes. For reads, one RCU serves one strongly consistent read per second of an item up to 4 KB — round item size up to the next whole 4 KB, divide by 2 for eventually consistent reads, or multiply by 2 for transactional reads. Worked example: 500 strongly consistent reads per second of 8 KB items = 500 × ceil(8/4) = 1000 RCUs. Always include the 2× multiplier for transactions and the 0.5× for eventually consistent reads.
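Those rounding rules can be written down directly. The two helpers below are a sketch of my own, not an AWS API, but they encode exactly the 4 KB / 1 KB rounding and the consistency multipliers described above:

```python
import math

def rcus(reads_per_sec: int, item_kb: float, consistency: str = "strong") -> float:
    """RCUs needed: one strongly consistent 4 KB read/sec costs 1 RCU.
    Eventually consistent halves the cost; transactional doubles it."""
    units = reads_per_sec * math.ceil(item_kb / 4)
    return {"strong": 1, "eventual": 0.5, "transactional": 2}[consistency] * units

def wcus(writes_per_sec: int, item_kb: float, transactional: bool = False) -> float:
    """WCUs needed: one standard 1 KB write/sec costs 1 WCU; 2x for transactions."""
    units = writes_per_sec * math.ceil(item_kb)
    return units * (2 if transactional else 1)

rcus(500, 8)                          # 500 * ceil(8/4) = 1000
rcus(500, 8, "eventual")              # 500.0
wcus(100, 1.5, transactional=True)    # 100 * ceil(1.5) * 2 = 400
```

Running the worked example through the helper reproduces the 1000-RCU answer, which is a useful self-check when drilling capacity questions.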
4. When should I use DynamoDB Streams versus Kinesis Data Streams for change data capture from DynamoDB?
DynamoDB Streams is the native, free (you only pay read request units) change feed with 24-hour retention and strict per-item ordering. It is ideal for AWS Lambda triggers, small-scale replication, and lightweight event-driven patterns. Kinesis Data Streams integration for DynamoDB is the right choice when you need retention longer than 24 hours (up to 365 days), multiple concurrent consumers per shard with Enhanced Fan-Out, or integration with a broader Kinesis-based analytics pipeline. Some teams enable both — DynamoDB Streams for immediate Lambda triggers and Kinesis Data Streams for longer-term replay and analytics.
5. How does optimistic locking work in DynamoDB, and when should I use transactions instead?
Optimistic locking adds a version attribute to each item. When a client wants to update the item, it reads the current version, performs the update, and issues an UpdateItem with a condition expression version = <last-read-version> plus SET version = version + 1. If another writer has bumped the version in the meantime, the condition fails, the update is rejected, and the client reads fresh state to retry. Optimistic locking costs one WCU per successful write and scales to high concurrency. Use DynamoDB transactions (TransactWriteItems) instead when you must update two or more items atomically — for example, debiting one account and crediting another. Transactions cost 2× the normal WCU but guarantee all-or-nothing semantics across up to 100 items.
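A toy in-memory simulation of that version check — the real call would be an UpdateItem with a ConditionExpression like version = :expected, but this sketch models only the accept-or-reject semantics:

```python
class ConditionFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""

def conditional_update(store: dict, key: str, expected_version: int,
                       changes: dict) -> None:
    """Apply changes only if the stored version matches what the caller
    last read; otherwise reject so the caller re-reads and retries."""
    item = store[key]
    if item["version"] != expected_version:
        raise ConditionFailed()
    item.update(changes)
    item["version"] += 1  # mirrors SET version = version + 1

table = {"USER#u42": {"version": 1, "name": "Ada"}}
conditional_update(table, "USER#u42", expected_version=1,
                   changes={"name": "Grace"})
# version is now 2; a second writer still holding version 1 is rejected.
```

A stale writer that calls conditional_update with expected_version=1 after this point raises ConditionFailed, which is the signal to re-read and retry.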
6. What is the difference between DAX and ElastiCache for DynamoDB caching?
DAX (DynamoDB Accelerator) is a purpose-built, write-through cache that speaks the DynamoDB API natively. You point your existing DynamoDB SDK client at the DAX endpoint and the SDK code is unchanged; cache hits return in microseconds versus single-digit milliseconds for the base table. DAX is best for read-heavy, key-based access patterns. ElastiCache (Redis or Memcached) is a general-purpose in-memory cache that requires explicit cache-key management in application code; it can cache responses from DynamoDB, RDS, external APIs, or any other source. Choose DAX when the only cached source is DynamoDB and you want zero code change; choose ElastiCache when you need to cache heterogeneous sources or require rich Redis data structures such as sorted sets or pub/sub.
7. How do I prevent hot partitions in DynamoDB data modeling?
Hot partitions occur when one partition-key value receives a disproportionate share of traffic, throttling the physical partition that owns that hash range regardless of how much total capacity the table has. Prevent hot partitions by (a) choosing high-cardinality partition keys such as user id, device id, or session id rather than low-cardinality keys such as status, country, or date; (b) sharding high-traffic keys by appending a random suffix (for example CATEGORY#ELECTRONICS#01 through #10) and scatter-gathering reads across suffixes; (c) using write sharding patterns where incoming events are routed to one of N synthetic partition keys; and (d) leveraging DynamoDB's built-in adaptive capacity for minor imbalances. On the DVA-C02 exam, scenarios that describe "throttling despite low total utilisation" are always hot-partition problems, and the correct fix redesigns the partition key.
8. What happens to DynamoDB Streams records when I delete items via TTL?
TTL deletions are emitted as stream records with a special userIdentity object: principalId is dynamodb.amazonaws.com and the record type is REMOVE. This lets downstream consumers distinguish TTL-driven deletes from user-initiated deletes so they can apply different logic — for example, archiving the expired item to Amazon S3 rather than treating it as an intentional user delete. TTL deletions also appear in Kinesis Data Streams integration for DynamoDB.
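A downstream consumer can branch on that marker with a one-line predicate. The record shape below follows the Streams event format, with defensive field access since user-initiated deletes may omit userIdentity entirely:

```python
def is_ttl_delete(record: dict) -> bool:
    """True when a Streams REMOVE record was produced by the TTL service
    rather than a user-initiated DeleteItem."""
    return (
        record.get("eventName") == "REMOVE"
        and record.get("userIdentity", {}).get("principalId")
        == "dynamodb.amazonaws.com"
    )

ttl_rec = {
    "eventName": "REMOVE",
    "userIdentity": {"type": "Service", "principalId": "dynamodb.amazonaws.com"},
}
user_rec = {"eventName": "REMOVE"}  # user deletes carry no TTL userIdentity
# is_ttl_delete(ttl_rec) -> True; is_ttl_delete(user_rec) -> False
```

This is the branch point for patterns like archiving expired items to Amazon S3 while treating user deletes as intentional removals.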
9. Can I restore a DynamoDB table to a previous point in time, and how far back?
Yes, if you enable Point-in-Time Recovery (PITR) on the table. PITR maintains continuous backups for the last 35 days and lets you restore to any second within that window. The restore creates a new table rather than overwriting the original, so your application must swap table names after the restore completes. On-Demand backups complement PITR for retention beyond 35 days and for cross-Region or cross-account archival via AWS Backup.
10. How do DynamoDB Global Tables handle write conflicts across Regions?
DynamoDB Global Tables use last-writer-wins conflict resolution based on server-side timestamps. When two Regions write to the same item simultaneously, the write with the later timestamp eventually overwrites the earlier one in every Region. This is simple and latency-friendly but can silently lose updates if your application cannot tolerate write conflicts. For strict single-writer semantics across Regions, designate one Region as the primary writer, route all writes there, and use Global Tables only to replicate reads to other Regions — or choose a different service such as Amazon Aurora Global Database with strong single-writer semantics.
Further Reading
- Amazon DynamoDB Developer Guide: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html
- DynamoDB Best Practices: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html
- NoSQL Design for DynamoDB: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html
- Secondary Indexes: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html
- Read/Write Capacity Modes: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html
- DynamoDB Streams: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
- DynamoDB Transactions: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transactions.html
- DynamoDB Accelerator (DAX): https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.html
- DynamoDB TTL: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html
- DynamoDB Global Tables: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GlobalTables.html
- Point-in-Time Recovery: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/PointInTimeRecovery.html
- AWS DVA-C02 Exam Guide: https://d1.awsstatic.com/training-and-certification/docs-dev-associate/AWS-Certified-Developer-Associate_Exam-Guide.pdf