The Context Management in Large Codebase Exploration topic is the Domain 5.4 anchor of the Claude Certified Architect — Foundations (CCA-F) exam. Task statement 5.4 — "Manage context effectively in large codebase exploration" — is one of six tasks inside the 15%-weight Context Management & Reliability domain, but its architectural footprint reaches into three of the six scoring scenarios: Code Generation with Claude Code, Developer Productivity with Claude, and Claude Code for Continuous Integration. Any Claude Code deployment that touches a real-world repository — tens of thousands of files, millions of lines, hundreds of modules — will, within its first autonomous task, overflow the context window by two or three orders of magnitude unless the agent deliberately chooses which slice of the repository to read, in which order, at which granularity, and which to leave on disk. Task 5.4 is precisely that discipline.
This study note covers every surface a CCA-F candidate is expected to design at the architecture level for Context Management in Large Codebase Exploration: why an entire repository cannot fit in the window, how the lazy-loading principle governs tool selection, the canonical Glob → Read pipeline, the Grep-first approach for symbol-driven exploration, the scratchpad file pattern that externalizes state, the incremental dependency-graph mapping discipline, the prioritization heuristics that pick entry points and configuration files over random file walks, context hygiene for discarding already-extracted file contents, high-level repo map files that survive across sessions, context budget tracking before a risky large read, per-file summary notes for later reference, and the common traps the exam exploits around Grep vs Glob confusion and whole-file reads. The topic intentionally does not duplicate the conversation-context material of task 5.1 — which focuses on case facts blocks, progressive summarization risk, and the lost-in-the-middle effect in conversational agents — and instead concentrates on the codebase-specific mechanics: Explore-agent patterns, Grep, Glob, Read, hierarchical read strategies, scratchpad files, and the Claude Code built-in tool surface.
Large Codebase Challenge — Entire Repo Exceeds Context Window by Orders of Magnitude
A production repository is not a document Claude can ingest. A modest microservice codebase with 2,000 files at an average of 200 lines each already exceeds four million tokens once line numbers, imports, and comments are included — orders of magnitude larger than any Claude context window. Even a relatively focused service with 300 files will usually weigh in at several hundred thousand tokens. Context Management in Large Codebase Exploration starts from the axiom that you cannot read everything, so the architectural question is never "how do I fit the repository into context" but "how do I read the minimum slice that lets Claude do the next correct thing."
The exam treats this axiom as non-negotiable. Any answer choice that proposes "load the entire codebase", "recursively read all files", or "concatenate the full repository into a single context prompt" is a distractor, regardless of how many supporting words surround it. The CCA-F Code Generation with Claude Code scenario, in particular, is structured around this trap: candidates who instinctively reach for bulk reads fail the questions where selective, symbol-driven exploration is the correct answer.
The large codebase context problem is the structural mismatch between repository size (typically millions of tokens across thousands of files) and the Claude context window (a bounded working-memory budget consumed simultaneously by the system prompt, CLAUDE.md content, tool definitions, conversation history, accumulated tool results, and the current user turn). Because an entire repository cannot fit, every codebase-exploration architecture in Claude Code must treat context as a budget to allocate, not a constraint to fight: decide which files to load, at which granularity, in which order, and which to leave on disk until evidence says they matter.
Why This Matters on CCA-F
The Code Generation with Claude Code scenario and the Developer Productivity with Claude scenario both present the candidate with long-running tasks — bug investigations, multi-file refactors, test authoring, onboarding code tours — where the agent cannot pre-plan a full reading order before the first tool call. Instead, the agent must iteratively discover, narrow, and read, preserving context headroom for the actual code change at the end of the task. Community pass reports consistently flag that candidates who treat file reading as free fail Domain 5.4 questions; candidates who treat reading as the most expensive action the agent takes tend to select the correct lazy-loading architectures.
Lazy Loading Strategy — Read Only Relevant Files, Not Entire Codebase
The governing principle of Context Management in Large Codebase Exploration is lazy loading: defer the cost of reading a file until you have evidence the file is relevant to the current task. Lazy loading flips the default of "read everything, narrow later" into "narrow first, read only what survives narrowing." The Claude Code built-in tool surface — Read, Write, Edit, Bash, Grep, Glob — is designed around exactly this flip.
The Three Lazy-Loading Laws
- Discovery is cheaper than reading. Glob (path discovery) and Grep (content search) return lightweight structural signals — file paths, line matches — that cost a fraction of the tokens a full Read would consume. Spend discovery first.
- Reading is cheaper than editing. A Read lets Claude examine a file without committing to a modification. Before Edit, always Read the target region and confirm the precondition.
- Every file loaded is a file that displaces something else. The context window is a shared budget; a 3,000-line file loaded early in a task will crowd out the test file that actually drives the next decision.
When to Break the Lazy-Loading Default
Lazy loading is a default, not a prohibition. A handful of small, centrally important files — the top-level CLAUDE.md, the service's main entry point, the test configuration — are legitimate eager reads because they inform every subsequent decision. The exam distinguishes between legitimate eager reads (small, high-information-density, known in advance) and the illegitimate "read the whole repository" pattern (bulk, low-signal-per-token, speculative). The heuristic is the same as a senior engineer's: you read the README and the main entry point eagerly; you do not read every test file until you know you care about tests.
Lazy loading is the first principle behind every exam-correct answer in Context Management in Large Codebase Exploration. Distractors in the Code Generation with Claude Code scenario frequently propose "read all related files up front to build full context" — this is wrong because it consumes the budget that must remain available for the later reasoning and editing steps. The correct architecture reads only what the current step needs, extracts the relevant fragment, summarizes it into a scratchpad or short notes, and discards the raw file content.
Glob → Read Pipeline — Discovering Candidate Files Before Selective Loading
The canonical lazy-loading pipeline in Claude Code is Glob followed by selective Read of a narrow subset. Glob walks the filesystem matching a path pattern (src/**/*.ts, **/auth/*.py, apps/*/config/*.yaml) and returns only the list of matching paths — no file contents. That list of paths is orders of magnitude cheaper than the contents of the same files, and it lets Claude make an informed selection before spending read budget.
How the Pipeline Works Step by Step
- Formulate a path hypothesis. Given the task, what directories or naming patterns should contain the relevant files? "Authentication likely lives in **/auth/** or **/login/**."
- Run Glob against the hypothesis. The tool returns a list of paths (potentially hundreds) without reading any of them.
- Narrow the list. Use filename cues (*.test.ts, index.*, *.config.*) to pick the handful of files most likely to contain the relevant code.
- Read only the narrowed set. Call Read on each surviving path. Ideally, use a line-range parameter so only the relevant region enters context.
- Iterate if the hypothesis failed. If the first Glob produced nothing useful, refine the pattern; do not widen to a recursive **/* sweep.
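The five steps above can be sketched in plain Python with pathlib; the helper name, signature, and narrowing predicate are illustrative stand-ins for the Glob and Read tools, not a Claude Code API:

```python
from pathlib import Path

def glob_then_read(root, pattern, keep=lambda p: True, line_range=None):
    """Glob -> narrow -> selective Read, never bulk-loading contents."""
    # Steps 1-2: path discovery only; no file contents are read here.
    candidates = sorted(Path(root).glob(pattern))
    # Step 3: narrow on filename cues before spending any read budget.
    survivors = [p for p in candidates if keep(p)]
    # Step 4: read only the survivors, ideally just a line range.
    contents = {}
    for p in survivors:
        lines = p.read_text().splitlines()
        if line_range is not None:
            start, end = line_range  # 1-indexed, inclusive
            lines = lines[start - 1:end]
        contents[str(p)] = "\n".join(lines)
    return survivors, contents
```

The key property to notice: the expensive `read_text` call happens only after the cheap candidate list has been narrowed, mirroring step 5's rule of refining the pattern rather than widening the read.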
Glob Is a Path Tool, Not a Content Tool
The single most-tested distinction between Glob and Grep on CCA-F is that Glob matches path patterns only and does not open any files. It answers "what files match this path pattern?" — never "which files mention this symbol?" If your question is content-oriented ("which files import AuthService?"), Glob cannot answer it; you need Grep.
The Glob tool is a Claude Code built-in that walks the filesystem and returns the list of paths matching a supplied glob pattern (*, **, character classes, extension filters). It reads no file contents — only the directory structure — which makes it the cheapest available form of codebase discovery. Use Glob to answer "what files exist at this path shape?" and to build the candidate list that a later Read or Grep will narrow. Any use of Glob to "search code" is a misuse; content search is the role of Grep.
Grep-First Approach — Searching for Relevant Symbols Before Reading Full Files
Grep is the symbol-driven dual of Glob. Where Glob finds files by name, Grep finds files by content — a class declaration, a function name, a configuration key, a string constant. For most real codebase exploration tasks, Grep-first is the exam-correct pattern: before reading a full file, run Grep for the exact symbol or text the task mentions, and use the matching line numbers to target a narrow Read against just those lines.
The Grep-First Workflow
- Extract the search term from the task. If the task mentions processRefund, the first move is Grep for processRefund, not Read on a directory of plausible files.
- Interpret the match list. Grep returns file paths with matching line numbers. A handful of hits often reveal the definition site (where the symbol is declared) and the call sites (where it is used).
- Targeted read. For each hit worth investigating, Read the file with a line range that includes the matched line plus a small surrounding window (for example, 20 lines on each side). Avoid Read without a line range on any file above a few hundred lines.
- Follow dependencies only if needed. If the targeted read shows the function delegates to another symbol, repeat the Grep-first pattern for that symbol rather than reading the entire delegate file.
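A minimal, self-contained sketch of this workflow, using plain Python functions as stand-ins for the Grep and Read tools (the function names and defaults are assumptions for the example):

```python
import re
from pathlib import Path

def grep(root, symbol, pattern="**/*.py"):
    # Content search: returns cheap (path, line_number, line) hits,
    # not full file bodies.
    hits = []
    for path in sorted(Path(root).glob(pattern)):
        for n, line in enumerate(path.read_text().splitlines(), start=1):
            if re.search(symbol, line):
                hits.append((str(path), n, line.strip()))
    return hits

def targeted_read(path, line_number, window=20):
    # Load only the matched line plus a small surrounding window.
    lines = Path(path).read_text().splitlines()
    start = max(0, line_number - 1 - window)
    return "\n".join(lines[start:line_number + window])
```

Note that `grep` does open every scanned file (as real content search must), but only light `path:line` tuples enter the agent's working state; the heavyweight read is deferred to `targeted_read` against a known line.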
Why Grep-First Preserves Budget
A typical definition site for a function occupies 20–60 lines. A full file read for the same function may load 800 lines of surrounding code — a 20× inflation. Across a multi-step task, the difference compounds: a Grep-first agent can investigate ten distinct symbols in the budget that a whole-file-read agent burns investigating two.
The Grep-first pattern is the discipline of resolving "where does this symbol live?" and "where is it used?" with the Grep tool before opening any file with Read. Grep returns file paths with matching line numbers at a fraction of the token cost of the underlying files, which lets Claude target narrow line-range reads at exactly the relevant regions. The pattern is the highest-leverage single move in Context Management in Large Codebase Exploration because it converts an open-ended "find the code" step into a bounded, budget-predictable lookup.
Grep vs Glob — The Exam Distinction
- Glob searches file paths. Input: a path pattern. Output: a file list. No file contents are read.
- Grep searches file contents. Input: a regex or literal. Output: matching path:line:content hits.
A candidate who confuses these tools will either run Grep where Glob suffices (expensive, because Grep reads every file it scans) or run Glob where Grep is needed (useless, because Glob cannot find symbols inside files). Both errors appear as distractor choices in CCA-F Code Generation scenario questions.
Scratchpad Files — Writing Exploration Notes to External Files for Persistent State
A scratchpad file is a disk-resident file (conventionally NOTES.md, SCRATCH.md, .claude/scratchpad/task-<id>.md, or similar) that the agent writes during exploration to hold intermediate findings — discovered file paths, extracted function signatures, open hypotheses, to-do items. Scratchpads are the codebase-exploration counterpart to the case facts block from task 5.1: they move durable state out of the context window and onto disk, where it can be referenced verbatim later by a short Read instead of replaying multi-turn conversation history.
Why Scratchpad Files Beat In-Conversation Notes
In-conversation notes — the agent typing "I have confirmed that processRefund lives in billing/refunds.ts at line 42, and it delegates to refundGateway.submit" — are subject to all the pathologies of Context Management in Large Codebase Exploration. The note must be replayed on every subsequent turn. It competes for salience with actual code content. It is susceptible to the lost-in-the-middle effect when the conversation grows. A scratchpad file avoids all three costs: it lives on disk at zero ongoing token cost, it can be re-read at a predictable cost when needed, and the agent can append new findings without rewriting the history.
Scratchpad Content Patterns
A well-structured scratchpad captures:
- Confirmed facts. "processRefund is defined at billing/refunds.ts:42."
- Open hypotheses. "The refund amount may be clamped in billing/policy.ts — not yet confirmed."
- Rejected leads. "legacy/old-refund.ts is unused — import graph shows no callers."
- Decision log. "Chose to refactor refundGateway.submit rather than replace it; reason: three downstream call sites depend on current signature."
- To-do list. "1. Confirm test coverage on refundGateway.submit. 2. Run migration scripts."
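One way to keep such a scratchpad sectioned and append-friendly is sketched below; the filename convention, section names, and helper are assumptions for the example, not a Claude Code format:

```python
from datetime import date
from pathlib import Path

SECTIONS = ("Confirmed facts", "Open hypotheses", "Rejected leads",
            "Decision log", "To-do")

def append_note(scratchpad, section, note):
    # Create the scratchpad with its section skeleton on first use.
    path = Path(scratchpad)
    if not path.exists():
        header = f"# Exploration scratchpad ({date.today().isoformat()})\n"
        path.write_text(header + "".join(f"\n## {s}\n" for s in SECTIONS))
    text = path.read_text()
    marker = f"## {section}\n"
    # Insert the newest finding directly under its section heading.
    i = text.index(marker) + len(marker)
    path.write_text(text[:i] + f"- {note}\n" + text[i:])
```

Because each finding is a single appended bullet, the agent pays a short Write per fact and zero ongoing token cost until a targeted Read pulls the notes back in.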
Session-Boundary Survival
Scratchpad files survive the end of a Claude Code session, which means they double as handoff documents when the user resumes work the next day or when a subagent is spawned with the scratchpad as its primary input. This property connects Context Management in Large Codebase Exploration to session-state-resumption-and-forking (task 1.7): the scratchpad is the persistent carrier of exploration state.
Scratchpad files are the codebase-exploration equivalent of the case facts block in conversational agents. On CCA-F, whenever an answer choice frames long-running exploration as "keep all findings in the conversation and replay them every turn," it is incorrect. The exam-correct architecture externalizes durable findings to disk in a scratchpad, references them via targeted Read when needed, and leaves the conversation history free for active reasoning about the current step.
Incremental Exploration — Mapping Dependency Graph Before Deep File Reads
Senior engineers explore unfamiliar codebases top-down: first they learn the module boundaries, then the public interfaces, then the internal implementation of whichever module the task actually touches. Incremental exploration is the Claude Code analog. Rather than reading a single file deeply, the agent first maps the dependency graph at a shallow level — which modules exist, which depend on which — then drills into only the modules on the path between the task's inputs and outputs.
The Incremental Exploration Protocol
- Layer 0: repo shape. Glob for top-level directories; Read the root README.md, package.json/pyproject.toml, and the top-level CLAUDE.md. Total cost: a few hundred tokens. Payoff: a map of the project's technology stack and module boundaries.
- Layer 1: public interfaces. For each module relevant to the task, Glob for index.* or equivalent public-surface files and Read only those. These files usually list what the module exports — effectively a table of contents for the module's public API.
- Layer 2: definition sites. Grep-first for the specific symbols the task mentions, and targeted Read at matched line ranges.
- Layer 3: implementation details. Only after the previous three layers justify it should the agent do full-file reads of large implementation files.
Skipping layers — for example, jumping from "user wants to change the refund flow" directly to full-file reads of every file in the billing/ directory — is the failure mode the exam tests. Layered exploration front-loads the cheap, high-signal reads and defers the expensive, low-marginal-signal reads until evidence accumulates.
Dependency Graph vs Call Graph
A dependency graph is the coarse structural map (module A depends on module B). A call graph is the finer symbol map (function foo() calls function bar()). Incremental exploration typically relies on the dependency graph for Layers 0 and 1, and on Grep-first call-graph lookups for Layers 2 and 3.
Prioritization Heuristics — Entry Points, Configuration Files, Test Files as Starting Points
When the task is ambiguous or the codebase is unfamiliar, the agent needs a deterministic starting-point heuristic rather than a random walk. CCA-F expects recognition of the four canonical starting points, in order of signal density.
Entry Points
The main binary's entry file (main.ts, index.ts, app.py, cmd/*/main.go) reveals the service's top-level composition: which modules are wired together, which background workers exist, which HTTP routes are registered. An entry file is almost always worth eagerly reading because it shapes every later decision about where code lives.
Configuration Files
package.json, tsconfig.json, pyproject.toml, docker-compose.yml, .env.example, Cargo.toml, and equivalents describe the project's dependencies, build tooling, and runtime environment. These files are small, information-dense, and almost never off-topic. Reading them early saves the agent from later confusion about which test runner, linter, or package manager applies.
Test Files
Test files often encode the highest-quality executable specification of the code under test. When the task is "understand what this function is supposed to do", the nearest test file is usually the fastest path to a correct mental model — and it doubles as a verification artifact for any change the agent makes. Treat *.test.*, *_test.*, spec/** and tests/** as first-class starting points alongside entry points and configs.
CLAUDE.md Files
The hierarchy of CLAUDE.md files (global, project, directory) is a starting point specifically curated for Claude. Path-specific CLAUDE.md instructions are loaded conditionally based on which files the agent touches — they encode project conventions, do-not-edit lists, and architectural invariants. Reading the nearest CLAUDE.md at each exploration step is an expected reflex on CCA-F.
When the Code Generation with Claude Code scenario presents a task in an unfamiliar repository, the exam-correct first four reads are: (1) the top-level CLAUDE.md, (2) the root configuration file for the primary language, (3) the main entry point, (4) the most specific CLAUDE.md along the likely target directory path. Only after these four reads should the agent start Grep-ing for symbols. Candidates who dive straight into deep Grep without establishing repo shape routinely misidentify the target module and waste subsequent budget.
Context Hygiene — Discarding Irrelevant File Content After Extracting Needed Information
Reading a file is the beginning of the lifecycle, not the end. Once the agent has extracted the two or three facts it needs from a 600-line file, the full 600 lines should not continue to occupy the context window on every subsequent turn. Context hygiene is the discipline of condensing or discarding already-mined file content so that later steps retain their budget.
Three Hygiene Moves
- Extract and paraphrase. After reading a file, write one or two sentences into the scratchpad — "refundGateway.submit(amount, orderId) posts to /v1/refunds and returns {status, traceId}" — and treat that summary as the canonical reference going forward. The raw file content becomes disposable.
- Compact or clear stale tool results. Claude 4 supports context-editing features that drop superseded tool results; use them deliberately for file reads that have been fully mined.
- Do not re-read the same file unchanged. If the file has not been edited since the last Read, the earlier Read's extracted facts are still valid; a re-read simply doubles the cost without adding information.
Hygiene vs Summarization
Hygiene is a stronger move than generic summarization. Hygiene says "I have extracted what this file contributes; I no longer need the raw content." Summarization says "here is a compressed version of the raw content." Hygiene is exactly the move the exam rewards — summarization is usually a distractor framed as an architectural solution when a harder decision (drop the content entirely) is the correct answer.
Repo Map Patterns — Maintaining a High-Level Index File for Navigation
A repo map is a hand-curated or agent-maintained file (often REPO_MAP.md, .claude/REPO_MAP.md, or embedded in CLAUDE.md) that summarizes the codebase's module boundaries, public APIs, and cross-module dependencies in a dense, navigable form. A good repo map lets Claude answer "where does authentication live?" or "what are the public interfaces of the billing module?" without running a single Glob or Grep.
What Belongs in a Repo Map
- Module inventory. A bullet list of top-level modules with a one-sentence description each.
- Public interface index. For each module, the names of its public exports and a pointer to where they are declared.
- Dependency arrows. Coarse-grained "module X depends on module Y" links so Claude can reason about blast radius.
- Known invariants. "All monetary amounts are integers in cents; floating-point amounts are a bug."
- Do-not-edit list. Files or directories that are generated, third-party, or otherwise off-limits.
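As a concrete illustration, a minimal repo map for a hypothetical billing service might look like the fragment below; the module names, paths, and symbols are invented for the example:

```markdown
# REPO_MAP.md  (last-verified: 2025-06-01)

## Modules
- billing: refund and invoice orchestration
- gateway: outbound HTTP client for the payments provider
- auth: session and API-key validation

## Public interfaces
- billing: processRefund, createInvoice (billing/index.ts)
- gateway: submit (gateway/index.ts)

## Dependencies
- billing -> gateway
- billing -> auth

## Invariants
- All monetary amounts are integers in cents; floating-point amounts are a bug.

## Do not edit
- generated/** (build artifacts), vendor/** (third-party)
```

A map this size costs a few hundred tokens to read and can substitute for several rounds of Glob and Grep on repeat tasks.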
Maintenance Discipline
A stale repo map is worse than no repo map because it misleads. The canonical discipline is to update the repo map as part of any PR that touches module boundaries, treat the map as a review artifact, and include a version or "last-verified" timestamp. Claude Code can itself be tasked with updating the repo map after a restructuring task, which closes the loop between exploration and durable knowledge.
Repo Map vs Scratchpad vs CLAUDE.md
These three artifacts have distinct roles:
- CLAUDE.md — project rules, conventions, do-not-edit lists, prompting hints. Stable across tasks.
- Repo map — structural index of modules, interfaces, and dependencies. Stable across tasks; updated on structural changes.
- Scratchpad — transient findings, open hypotheses, task-specific decision log. Reset per task.
Confusing these artifacts — for example, putting transient task state in CLAUDE.md, or structural architecture in the scratchpad — is a real exam trap.
Context Budget Tracking — Estimating Remaining Context Before Loading Large Files
Every autonomous Claude Code agent should treat the context window as a budget with an observable remaining balance. Context budget tracking is the discipline of estimating how much headroom remains and gating large reads against that estimate. It is the architectural habit that prevents the agent from running out of budget halfway through a multi-file refactor.
Signals That Inform the Budget
- Iteration count. Each turn adds at least two messages (assistant + tool_result); deep loops accumulate fast.
- Cumulative tool result size. Large Read or Bash outputs dominate long-session token usage.
- CLAUDE.md + tool definition overhead. These are replayed on every turn and set a permanent floor.
- Model's advertised context window. The ceiling against which everything else is measured.
Budget-Aware Gate Before a Risky Read
Before issuing a Read on a large file, an agent (or a hook) should check:
- Estimated file size. A Glob-returned path or a quick Bash wc -l can give a size estimate.
- Current remaining budget. Rough estimation from the sum of prior tool result sizes.
- Alternative. If the file is large and the specific region is known (from a prior Grep hit), read only a line range. If the region is not known, spend a Grep first.
An agent that always reads full files without checking size regularly blows its budget on a single oversized file. An agent that always does a Grep or line-ranged read on files above some threshold (say, 500 lines) runs smoothly for far longer sessions.
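Such a gate can be sketched as a single predicate. Every constant here (tokens per line, line ceiling, reserve fraction) is an illustrative assumption, not a documented Claude Code threshold:

```python
def should_full_read(file_lines, remaining_tokens,
                     tokens_per_line=12, max_lines=500, reserve=0.5):
    # Assumed heuristics: ~12 tokens per source line, a 500-line
    # full-read ceiling, and half the remaining budget held back
    # for later reasoning and edits.
    if file_lines > max_lines:
        return False  # large file: Grep first, then a line-range Read
    estimated_cost = file_lines * tokens_per_line
    return estimated_cost <= remaining_tokens * reserve
```

Wired into a pre-read hook, this turns "read the whole file?" from a reflex into a budget decision; when the gate returns False, the fallback is the Grep-then-line-range path described above.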
A common distractor in Developer Productivity with Claude scenario questions suggests "read the full file to ensure the agent has complete context before editing." This is almost always wrong when the file is large. The exam-correct move is to Grep for the relevant symbol, Read a targeted line range, make the edit, and, if the edit's verification requires seeing more of the file, do a second targeted read afterwards. Full-file reads should be the exception, not the default, and are reserved for small files (a few hundred lines) or for files central to the task.
Summarization of Explored Files — Creating Brief Per-File Notes for Later Reference
As the agent explores, it should build a per-file notes index — one or two sentences per explored file, captured in the scratchpad — so that a later step can ask "what did we learn about billing/refunds.ts?" without re-reading the file or replaying a 600-line Read tool result.
What a Per-File Note Contains
- One-line purpose statement. "Contains the processRefund handler; orchestrates refund calculation and calls refundGateway.submit."
- Key symbols or line numbers. "processRefund at line 42; clampRefundAmount at line 118."
- Known quirks. "Amounts are in cents, not dollars; processRefund expects positive integers."
- Status. "Reviewed on turn 5; no edits pending."
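A per-file note index can be modeled as a small in-memory (or scratchpad-serialized) structure; the field names mirror the bullets above, and the whole shape is an assumption for the sketch:

```python
from dataclasses import dataclass, field

@dataclass
class FileNote:
    purpose: str                           # one-line purpose statement
    symbols: dict                          # symbol name -> line number
    quirks: list = field(default_factory=list)
    status: str = ""

notes = {}  # path -> FileNote; the raw read content is dropped

def record(path, note):
    notes[path] = note

def recall(path):
    # Later turns consult the note instead of re-reading the file.
    return notes.get(path)
```

The distilled note is a few dozen tokens where the original read was several hundred lines, which is exactly the trade the pattern is buying.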
Per-File Notes vs Repo Map vs Scratchpad Decision Log
The scratchpad hosts all three — but each fulfills a different function. Per-file notes are the distilled output of individual reads. The decision log records choices made across files. The repo map (if maintained at project level) is structural and survives across sessions. Mixing them is fine; conflating them — for example, promoting a task-specific per-file note into the global repo map — is not.
Why This Pattern Matters for CCA-F
Exam questions in the Developer Productivity scenario routinely test whether the candidate recognizes that, on turn 15 of a long task, the agent should not re-read a file it already explored on turn 4. The correct architecture referenced the file once, wrote a brief note to the scratchpad, and now refers to the note — not the original 600-line read — as the source of truth. A distractor answer that has the agent re-reading every previously explored file on every subsequent turn describes the failure mode, not the solution.
Explore Agent Patterns — Read-Only Subagents for Codebase Investigation
A recurring pattern in large codebase exploration is to dedicate a read-only Explore subagent whose job is to investigate the repository, produce a scratchpad of findings, and return — without ever editing a file. The coordinator agent then consumes the scratchpad and decides whether the main agent should proceed with an edit. This pattern cleanly separates discovery from mutation and, because subagents operate with isolated context, prevents the coordinator's context from being polluted by transient raw file contents.
When to Spawn an Explore Subagent
- The task requires reading a large number of files before the first edit can be planned safely.
- The exploration will produce tens of thousands of tokens of raw file content that the coordinator does not need to see directly.
- The coordinator's remaining context budget is constrained and the exploration alone would exhaust it.
- The exploration output can be serialized into a scratchpad file that the coordinator can read at bounded cost.
Subagent Tool Allowlist Discipline
An Explore subagent should have Read, Grep, Glob, and Write (for the scratchpad) on its tool allowlist — and should not have Edit or destructive Bash tools. Restricting the allowlist is both a safety measure (the subagent cannot accidentally mutate the repo) and a context measure (the subagent cannot be tempted to branch out of its exploration role).
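That allowlist discipline can be expressed as a configuration check. The tool names match the Claude Code built-ins discussed above, while the validation helper itself is hypothetical; for simplicity the sketch treats all Bash as mutating:

```python
EXPLORE_ALLOWLIST = {"Read", "Grep", "Glob", "Write"}  # Write: scratchpad only
MUTATING_TOOLS = {"Edit", "Bash"}

def validate_explore_tools(tools):
    # Reject any Explore-subagent configuration that could mutate the repo.
    requested = set(tools)
    illegal = requested & MUTATING_TOOLS
    if illegal:
        raise ValueError(f"Explore subagent must not carry {sorted(illegal)}")
    # Remaining tools must all be on the read-plus-scratchpad allowlist.
    return requested <= EXPLORE_ALLOWLIST
```

Running such a check when the subagent is spawned makes the safety property structural rather than a matter of prompt discipline.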
Explore subagents are the exam-correct answer when a Code Generation scenario presents a deep multi-file investigation before any edit can begin. The pattern isolates the exploration's raw file content inside the subagent's context, hands back only the scratchpad summary to the coordinator, and keeps the coordinator's remaining budget available for planning and edits. Distractors that have the coordinator do all the reading itself, or that have a single agent with Edit permission perform exploration, are lower-quality architectures for large-codebase tasks.
Plain-English Explanation
Abstract context-budget discipline sticks when it is grounded in physical systems most engineers already know. Three analogies — deliberately chosen from very different domains — cover the full Context Management in Large Codebase Exploration surface.
Analogy 1: The Reference Library — Card Catalogue, Not the Stacks
Imagine a researcher walking into a large reference library. The library has hundreds of thousands of volumes — far more than any one person can read in a lifetime. A novice researcher might grab the first ten books on the topic and sit down to skim them all. An experienced researcher does something very different: they go to the card catalogue first, look up the exact topic, pull only the specific volumes the catalogue lists, and inside each volume they use the index to jump directly to the relevant pages.
The card catalogue is Glob. The volume index is Grep. The specific pages are line-ranged Read. The researcher's notebook, where they jot down the two sentences that matter from each consulted book, is the scratchpad. And the librarian's permanent "subject guide" pamphlet listing the library's major collection areas is the repo map.
A researcher who skipped the catalogue and the indexes, and instead sat down to read every volume in the architecture section, would run out of time long before they finished their actual research question. That is exactly what happens to a Claude Code agent that skips Glob and Grep and defaults to full-file reads across a large repository.
Analogy 2: The Open-Book Exam — Budget, Not Library
Picture a candidate sitting an eight-hour open-book exam on a large textbook. They are allowed the textbook, a blank notepad, and unlimited pens — but they can only keep a small number of pages physically open on their desk at any one time. The desk is the context window. The textbook is the repository. The notepad is the scratchpad.
The candidate who spreads every chapter across the desk simultaneously cannot find anything; the desk is cluttered, and by question three they are running out of physical space. The candidate who instead indexes the textbook, copies the three formulas that show up everywhere onto the notepad, jumps to specific pages for specific questions, and closes each page once they have extracted what they need, comfortably finishes the exam.
The candidate's discipline of "copy the two sentences that matter, close the book, move on" is context hygiene. The candidate's reflex to open the textbook to the index first, not to page one, is the Grep-first pattern. The candidate's acceptance that they cannot read everything — and their willingness to plan which pages they will not read — is lazy loading made personal.
Analogy 3: The Construction Site — Work Bench vs the Warehouse
Picture a carpenter working on a cabinet inside a workshop that is attached to a massive warehouse of lumber, fixtures, and tools. The workbench holds whatever the carpenter is currently working with — a few boards, a handful of tools, the plans for the current piece. The warehouse holds everything else: thousands of boards, hundreds of tool kits, decades of archived projects.
A carpenter who dragged the entire warehouse onto the workbench could not work; the bench would be buried. A carpenter who never looked in the warehouse would run out of supplies within the hour. The working discipline is: consult the warehouse manifest (the repo map) to know what exists, fetch the specific boards and tools needed for the current step (targeted Read), make a cut list on a clipboard at the bench (the scratchpad), return what you are done with to the warehouse (context hygiene), and keep the bench clear enough to do the actual work.
The exam-correct Claude Code agent is this carpenter. The exam-incorrect agent is the one who piles every board in the warehouse onto the bench "just in case" and then wonders why the dovetail joint is not cutting cleanly.
Which Analogy Fits Which Exam Cue
- Glob/Grep/Read pipeline, Grep-first pattern, lazy loading → the reference library card catalogue.
- Context budget tracking, context hygiene, hygiene vs summarization → the open-book exam budget.
- Scratchpad files, repo maps, Explore-subagent isolation → the carpenter's workbench and warehouse.
Reconciling with Conversation Context Management (Task 5.1)
Task 5.4 and task 5.1 share a context-window substrate but address different content types. Task 5.1 focuses on conversation content — user turns, assistant replies, case facts, promises — where the failure mode is summarization drift of transactional facts. Task 5.4 focuses on file content — source code, configuration, logs — where the failure mode is budget exhaustion from indiscriminate reads.
The architectural parallels are deliberate. A case facts block in task 5.1 maps to a scratchpad file in task 5.4: both externalize durable state to a machine-maintained artifact that is replayed at bounded cost. Tool output trimming in task 5.1 maps to Grep-first targeted reads in task 5.4: both stop verbose content from entering the window in the first place. Section headers in task 5.1 map to repo maps in task 5.4: both provide navigational scaffolding over large, unstructured payloads. A candidate who has internalized task 5.1 will find task 5.4 mostly a re-deployment of the same principles against a different content domain.
Exam questions inside the Code Generation with Claude Code and Developer Productivity with Claude scenarios frequently combine task 5.1 and task 5.4 concerns: a multi-day pairing session with a developer where the agent must preserve both conversational commitments and codebase findings. The exam-correct architecture uses a conversation-level case facts block for the interaction layer and a disk-resident scratchpad for the codebase layer. Treating either as a substitute for the other is a common distractor.
Common Exam Traps
CCA-F Domain 5.4 exploits six recurring trap patterns tied to Context Management in Large Codebase Exploration. Each trap is documented in community pass reports as a distractor that sounds reasonable until you apply the lazy-loading axiom.
Trap 1: "Read the Entire Relevant File Up Front"
Distractor wording: "To ensure Claude has full context before editing, read the entire file first." This is wrong for any file above a few hundred lines. A targeted Grep plus a line-ranged Read yields the same decision quality at a fraction of the token cost. The exam treats full-file reads on large files as the default bad behaviour, not the default safe behaviour.
Trap 2: "Grep Searches File Paths"
Distractor wording: "Use Grep to discover which files exist under src/auth/." This confuses Grep with Glob. Grep searches file contents; Glob searches file paths. Any answer that swaps the two is wrong, and the community consistently reports this as the single most-missed Domain 5.4 distinction.
Trap 3: "Keep All Findings in the Conversation"
Distractor wording: "Maintain exploration findings in the conversation history so Claude can reference them across turns." This works for short tasks but fails at scale. The exam-correct move is to externalize durable findings to a scratchpad file, which survives across turns at bounded cost and is immune to the lost-in-the-middle effect.
Trap 4: "Bigger Context Window Is the Fix"
Distractor wording: "Switch to a model with a larger context window to fit the whole repository." Larger windows do not remove the structural mismatch — a repository is still orders of magnitude larger than any available window, and a bloated window worsens positional-salience effects. The fix is architectural (lazy loading, Grep-first, scratchpad), not capacity expansion.
Trap 5: "Let Claude Maintain Its Own Repo Map In-Prompt"
Distractor wording: "Instruct Claude in the system prompt to build and maintain a repo map as it explores." Model-maintained state in prompts is subject to drift and summarization loss — the same failure mode that sinks in-conversation case facts. The exam-correct architecture has Claude write the repo map (or scratchpad) to a file, not to a persistent prompt section.
Trap 6: "Always Spawn a Subagent for Exploration"
Distractor wording: "Always use an Explore subagent for any codebase investigation." Subagents are valuable when the exploration is deep enough to otherwise displace coordinator context — but for short, targeted tasks (single-file edits, quick Grep lookups) spawning a subagent adds overhead without benefit. The exam rewards picking the lightest pattern that solves the problem.
Practice Anchors — Task 5.4 Scenario Question Templates
CCA-F practice questions tied to Context Management in Large Codebase Exploration cluster into five recurring shapes across the Code Generation with Claude Code, Developer Productivity with Claude, and Claude Code for Continuous Integration scenarios. Detailed questions live in the ExamHub CCA-F question bank; the templates below train the pattern recognition needed to navigate them.
Template A: The Symbol Lookup
The agent is asked to change the behaviour of processRefund in a 2,000-file repository. The proposed architectures include (a) Glob every file containing refund in the path, (b) Read every *.ts file in src/billing/, (c) Grep for processRefund and Read line ranges at each hit, (d) spawn an Explore subagent that reads every billing file. What is the correct first move?
- Correct move: (c) Grep-first. The symbol is explicit and the task is targeted; a Grep plus line-ranged reads gives the definition site and call sites at bounded cost. Options (a), (b), and (d) over-read.
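The Grep-first move in Template A can be sketched in plain Python. This is a minimal illustration, not Claude Code's actual tool implementation: a hypothetical grep_first helper stands in for what the agent's Grep plus line-ranged Read tools accomplish together, returning only a small window of lines around each hit instead of whole files.

```python
import re
from pathlib import Path

def grep_first(root: str, symbol: str, window: int = 20):
    """Grep-first: locate every line mentioning `symbol`, then keep only
    a small line range around each hit instead of the whole file."""
    pattern = re.compile(re.escape(symbol))
    hits = []
    for path in Path(root).rglob("*.py"):  # scope the scan to source files
        lines = path.read_text(errors="ignore").splitlines()
        for i, line in enumerate(lines):
            if pattern.search(line):
                lo = max(0, i - window // 2)
                # (path, 1-based line number, surrounding line range)
                hits.append((str(path), i + 1, lines[lo:i + window // 2 + 1]))
    return hits
```

The returned ranges are what enters the context window; the rest of each file stays on disk, which is the whole point of option (c).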
Template B: The Unknown Repository
The user has handed Claude a repository it has never seen. The task is to add a feature the user describes in one sentence. What is the correct exploration order?
- Correct order: (1) the top-level CLAUDE.md, (2) the primary config file (package.json, pyproject.toml, or equivalent), (3) the main entry point, (4) the nearest CLAUDE.md along the likely target directory. Only after these four reads should Grep-first exploration begin. Candidates who jump straight to Grep without establishing repo shape misroute subsequent budget.
Template C: The Long Refactor
The agent is 20 turns into a multi-file refactor. Earlier turns read full files that are no longer needed. Remaining budget is tight, and the agent still has to make the critical edit and run tests. What is the correct hygiene move?
- Correct move: context hygiene — condense already-mined files into short scratchpad notes and drop or compact the original full-file reads. Do not attempt to re-read everything; do not switch to a model with a bigger window; do not continue without any hygiene on the assumption that remaining budget suffices.
Template D: The Glob vs Grep Distractor
Which of the following is the correct way to find every file that imports the AuthService symbol?
- Correct: Grep for AuthService (or import.*AuthService) across the repository and read the matching hits.
- Incorrect: Glob for **/*AuthService* — this only matches file paths containing the string, not files whose contents import the symbol.
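The path-vs-content distinction behind Template D can be made concrete with Python's standard library. These are hypothetical helpers that mimic the two tools' roles, not the real Claude Code implementations: glob_paths never opens a file, while grep_contents must scan file contents.

```python
import re
from pathlib import Path

def glob_paths(root: str, pattern: str):
    """Glob: answers 'what files exist at this path shape?' --
    matches PATHS only, never opens a file."""
    return sorted(str(p) for p in Path(root).glob(pattern))

def grep_contents(root: str, regex: str):
    """Grep: answers 'which files CONTAIN this pattern?' --
    scans file contents, regardless of file name."""
    pat = re.compile(regex)
    matches = []
    for p in Path(root).rglob("*"):
        if p.is_file() and pat.search(p.read_text(errors="ignore")):
            matches.append(str(p))
    return sorted(matches)
```

On a repo where login.ts imports AuthService but is not named after it, only grep_contents finds it; glob_paths would find a file named AuthService.ts even if nothing imports the symbol.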
Template E: The Exploration Subagent Decision
The task is a deep multi-file investigation before a planned major refactor. The coordinator's context budget is tight and the investigation alone will produce tens of thousands of tokens of raw file content. Should the coordinator explore itself or spawn an Explore subagent?
- Correct move: spawn an Explore subagent with Read, Grep, Glob, and Write (scratchpad) but no Edit. The subagent's isolated context absorbs the raw file content; only the scratchpad summary returns to the coordinator. This preserves coordinator budget for planning and edits.
The seven-move playbook for Context Management in Large Codebase Exploration (Domain 5.4):
- Never load the whole repo. A repository is orders of magnitude larger than any context window.
- Glob for paths, Grep for symbols. Two different tools; confusing them is a top exam trap.
- Grep-first. Resolve "where does this live?" before Read-ing any large file.
- Read line ranges, not whole files. Targeted reads beat full-file reads above ~500 lines.
- Externalize durable state to a scratchpad. Disk, not conversation, is the home of exploration findings.
- Establish repo shape before deep dives. CLAUDE.md → config → entry point → nearest directory CLAUDE.md, then Grep.
- Practice context hygiene. After extracting what a file contributes, let the raw content fall out of context.
Distractor cue: if an answer proposes reading the entire repository, relying on a bigger window, or keeping all findings in the conversation, it is wrong.
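The context-budget-tracking discipline that runs through this playbook can be sketched as a guard evaluated before any risky large read. This is a rough sketch under stated assumptions: a crude 4-characters-per-token estimate (not a real tokenizer) and a hypothetical reserve kept back for the reasoning and editing the task still requires.

```python
import os

CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer

def estimated_read_tokens(path: str) -> int:
    """Cheap token estimate from file size alone -- no read required."""
    return os.path.getsize(path) // CHARS_PER_TOKEN

def safe_to_read(path: str, remaining_budget: int, reserve: int = 20_000) -> bool:
    """Refuse any read that would eat into the reserve held for later
    reasoning and edits; fall back to Grep plus line ranges instead."""
    return estimated_read_tokens(path) <= remaining_budget - reserve
```

The useful property is that the decision costs a stat call, not a read: the agent knows a file is too big before a single token of it enters the window.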
Context Management in Large Codebase Exploration — Frequently Asked Questions
Why is it wrong to load the entire codebase into Claude's context at the start of a task?
A production repository routinely spans millions of tokens across thousands of files — two to three orders of magnitude larger than any Claude context window. Even a small service typically exceeds the available budget after only a few hundred file reads. More importantly, every token spent loading speculative context is a token unavailable for the later reasoning and editing the task actually requires. The correct pattern is lazy loading: establish repo shape with a few targeted reads (CLAUDE.md, config, entry point), then Grep-first for the symbols the task mentions, then line-range Read only the surviving candidates, then extract findings to a scratchpad and discard the raw file content.
What is the difference between Glob and Grep on Claude Code, and why does the distinction matter?
Glob matches file path patterns — src/**/*.ts, **/auth/*.py — and returns the list of matching file paths without reading any contents. Grep searches file contents for a regex or literal and returns matching path:line:content hits across the files it scans. Use Glob when the question is "what files exist at this path shape?" and Grep when the question is "which files contain this symbol or string?" Confusing them is the most-missed Domain 5.4 distinction on CCA-F: using Grep where Glob would have sufficed burns budget on unnecessary content scanning, while using Glob where Grep is needed cannot find symbols inside files at all.
What is a scratchpad file and why is it preferred over keeping findings in the conversation?
A scratchpad file is a disk-resident file (conventionally NOTES.md, SCRATCH.md, or .claude/scratchpad/task-<id>.md) where the agent writes its accumulating findings during exploration — confirmed facts, open hypotheses, rejected leads, decision logs, per-file notes. It is preferred over conversation-based notes because durable state in the conversation must be replayed on every turn, competes with active reasoning for salience, and is vulnerable to the lost-in-the-middle effect. Durable state on disk costs nothing on turns that do not reference it, can be re-read at bounded cost when needed, and survives across session boundaries, making it the codebase-exploration equivalent of the case facts block from task 5.1.
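The write-once, re-read-at-bounded-cost property described above can be sketched in a few lines. note and recall are hypothetical helpers, not part of any Claude Code API; the point is that appending a finding costs one write, and retrieving it later costs a targeted read of short notes rather than a re-read of the raw file.

```python
from pathlib import Path

def note(scratchpad: Path, file: str, finding: str) -> None:
    """Append one durable finding per line; the raw file content can then
    fall out of context (hygiene) while the fact survives on disk."""
    with scratchpad.open("a") as f:
        f.write(f"- `{file}`: {finding}\n")

def recall(scratchpad: Path, file: str) -> list[str]:
    """Retrieve only the notes about one file -- bounded cost,
    instead of re-reading the raw file."""
    if not scratchpad.exists():
        return []
    return [line for line in scratchpad.read_text().splitlines()
            if f"`{file}`" in line]
```

A per-file backtick marker is one simple keying convention; any stable per-file prefix that makes notes greppable would serve the same purpose.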
When should I spawn an Explore subagent instead of doing the investigation in the main agent?
Spawn an Explore subagent when the investigation is deep enough that its raw file content would otherwise displace critical coordinator context. Clear triggers: the investigation touches tens of files, produces tens of thousands of tokens of raw content, precedes a planned major refactor, or the coordinator's remaining budget is already constrained. Give the subagent Read, Grep, Glob, and Write (for scratchpad) but not Edit — the subagent's job is discovery, not mutation. For short, targeted investigations, a subagent adds overhead without benefit; the lightest pattern that solves the problem is the exam-correct pattern.
How do I decide what to read eagerly vs lazily when starting a task in an unfamiliar repo?
Eagerly read files that are small, information-dense, and inform every subsequent decision: the top-level CLAUDE.md, the primary configuration file (package.json, pyproject.toml, Cargo.toml, etc.), the main entry point, and the most specific CLAUDE.md along the likely target directory. These four reads typically cost a few thousand tokens and give the agent a map of the project's technology stack, module boundaries, and conventions. Read everything else lazily: narrow with Glob and Grep first, read line ranges rather than whole files, and extract findings to a scratchpad before moving on.
How should I manage context when a long task needs to read files the agent already examined earlier in the session?
Do not re-read unchanged files. If the agent extracted the relevant facts during the earlier read and captured them in the scratchpad, a targeted scratchpad Read retrieves the extracted summary at a fraction of the original file's cost. Re-reading an unchanged 600-line file to retrieve two facts that are already summarized in one scratchpad line doubles the token cost without adding information. The discipline is: extract on first read, summarize into the scratchpad, treat the scratchpad as the canonical reference going forward, and only re-read the raw file if it has changed or if a deeper region is now relevant.
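The "only re-read if it has changed" rule can be sketched with a modification-time check. This assumes the agent records each file's mtime at first read; the helpers and the in-memory registry are hypothetical, and a content hash would be a more robust (if costlier) variant of the same idea.

```python
import os

# path -> mtime recorded when the file was first read and summarized
_seen: dict[str, float] = {}

def mark_read(path: str) -> None:
    """Record a file's state after extracting its facts to the scratchpad."""
    _seen[path] = os.path.getmtime(path)

def needs_reread(path: str) -> bool:
    """True only if the file is new or has changed since it was mined;
    otherwise the scratchpad summary is the canonical reference."""
    return path not in _seen or os.path.getmtime(path) != _seen[path]
```

With this guard in place, the 600-line file from the example is re-read only after an edit touches it; every other retrieval is a one-line scratchpad lookup.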
What is context hygiene and how is it different from summarization?
Context hygiene is the discipline of actively discarding or compacting file content that has already been mined for its relevant facts — letting the raw content fall out of the window so later steps retain budget. Summarization produces a compressed version of the raw content that stays in the window at reduced cost. Hygiene goes one step further: the extracted facts live in the scratchpad, and the raw file content is treated as fully disposable. On CCA-F, answer choices that propose summarization when hygiene is the correct move are common distractors, because summarization still pays ongoing token cost whereas hygiene — combined with a disk-resident scratchpad — does not.
Further Reading
- Claude Certified Architect — Foundations (CCA-F) Exam Guide: https://everpath-course-content.s3-accelerate.amazonaws.com/instructor/8lsy243ftffjjy1cx9lm3o2bw/public/1773274827/Claude+Certified+Architect+%E2%80%93+Foundations+Certification+Exam+Guide.pdf
- Context windows — Claude API Docs: https://docs.anthropic.com/en/docs/build-with-claude/context-windows
- Long context prompting tips — Claude API Docs: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/long-context-tips
- Tool reference — Anthropic-provided built-in tools: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/tool-reference
- CLAUDE.md configuration — Claude Code Docs: https://docs.anthropic.com/en/docs/claude-code/claude-md
- Community pass report (893/1000, Kishor Kukreja, 2026-04-09): https://medium.com/@kishorkukreja/i-passed-anthropics-claude-certified-architect-foundations-exam-with-a-score-of-893-1000-2206c27efd6c
Related ExamHub topics: Managing Conversation Context Across Long Interactions, Built-in Tools Selection and Application, Path-Specific Rules and Conditional Convention Loading, Session State, Resumption, and Forking.