The Engine — Main Loop and Prompt Assembly
How cli.tsx hands off to query.ts, why it's an async generator instead of recursion, and the 10 steps that run on every turn.
The Request Chain
Every interactive turn follows the same path: cli.tsx receives input → calls into main.tsx → main.tsx calls query() → query() runs queryLoop(). That queryLoop is a while(true) loop. It does not return until the session ends.
main.tsx is 4683 lines because it’s where most cross-cutting concerns land: session initialization, interrupt handling, display state, the connection between user input and the agent loop. It’s the glue layer.
Why an Async Generator
query.ts (1729 lines) implements the main state machine as an async generator — a function that yields intermediate results rather than returning a final value. This replaced an earlier recursive design.
The reason is practical: recursive implementations blow the call stack in long sessions. If every agent turn calls into the next turn recursively, a 200-turn session creates a 200-frame call stack. Node.js has limits. Async generators flatten this into a loop, keeping memory usage constant regardless of session length.
The generator yields partial results as the model streams tokens, so the UI can update incrementally. The caller (queryLoop) pulls from the generator in a loop, processing each yielded chunk.
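A minimal sketch of this shape, assuming hypothetical names (`AgentEvent`, `runTurn`, the `maxTurns` cap are illustrative, not the actual query.ts API):

```typescript
// Illustrative sketch only -- not the real query.ts implementation.
type AgentEvent = { kind: "chunk"; text: string } | { kind: "done" };

async function* query(maxTurns: number): AsyncGenerator<AgentEvent> {
  // A while loop, not recursion: stack depth stays constant
  // no matter how many turns the session runs.
  let turn = 0;
  while (turn < maxTurns) {
    // Stream partial results out to the caller as they arrive.
    yield { kind: "chunk", text: `turn ${turn}` };
    turn++;
  }
  yield { kind: "done" };
}

async function queryLoop(): Promise<string[]> {
  const seen: string[] = [];
  // The caller pulls each yielded chunk and updates the UI.
  for await (const ev of query(3)) {
    if (ev.kind === "chunk") seen.push(ev.text);
  }
  return seen;
}
```

The key property is that `yield` suspends the generator without adding a stack frame per turn; the `for await` consumer resumes it on demand.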
The 10-Step Loop
Each iteration of the main loop runs these steps in order:
1. Context compression — check whether accumulated context needs to be summarized before sending
2. Token budget check — verify the current prompt fits within the model’s context window
3. Model API call — send the assembled prompt and begin streaming
4. Streaming response — process tokens as they arrive, yielding to the UI
5. Error recovery — handle API errors, rate limits, and network failures
6. Stop hooks — run any hooks registered for the stop event
7. Token budget update — record actual token usage after the response completes
8. Tool execution — if the model requested tools, execute them (see Ch3)
9. Attachment injection — inject tool results back into the conversation
10. Next turn — loop back with the updated context
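The steps above can be condensed into a strictly sequential sketch. Everything here is a stand-in — the stub model, the function names, and the message shapes are all hypothetical, and the real loop is considerably more involved:

```typescript
// Illustrative, simplified sketch of one loop iteration; all names
// and shapes are hypothetical, not the actual query.ts code.
interface Turn { messages: string[]; tokensUsed: number }
interface ModelResponse { text: string; tokens: number; toolCalls: string[] }

// Stub model: "calls" a tool on the first turn, then finishes.
async function callModel(ctx: Turn): Promise<ModelResponse> {
  const first = !ctx.messages.some((m) => m.startsWith("tool_result"));
  return first
    ? { text: "let me check", tokens: 10, toolCalls: ["read_file"] }
    : { text: "done", tokens: 5, toolCalls: [] };
}

async function runOneTurn(ctx: Turn): Promise<boolean> {
  // Steps 1-2 (compression, budget check) omitted in this stub.
  const res = await callModel(ctx);                 // steps 3-5: call, stream, recover
  ctx.tokensUsed += res.tokens;                     // steps 6-7: stop hooks, budget update
  const results = res.toolCalls.map((t) => `tool_result:${t}`); // step 8: tool execution
  ctx.messages.push(res.text, ...results);          // step 9: attachment injection
  return res.toolCalls.length > 0;                  // step 10: another turn needed?
}

async function queryLoop(): Promise<Turn> {
  const ctx: Turn = { messages: [], tokensUsed: 0 };
  while (await runOneTurn(ctx)) { /* loop until no tool calls remain */ }
  return ctx;
}
```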
Steps 3 and 8 overlap intentionally. Claude Code starts executing tools before the model finishes streaming its response. If the model outputs a tool call early in its response, the tool can begin running while the model is still generating the explanation text that follows. This is Speculative Tool Execution — covered more in Ch3, but the loop architecture is what makes it possible.
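The mechanism boils down to not awaiting the tool at the point where its call appears in the stream. A sketch, with a fabricated event shape and a fake stream standing in for the real streaming API:

```typescript
// Illustrative sketch of speculative tool execution; the event shape
// and function names are hypothetical.
type StreamEvent =
  | { kind: "tool_call"; name: string }
  | { kind: "text"; text: string };

async function runTool(name: string): Promise<string> {
  return `result of ${name}`; // stand-in for real tool execution
}

async function consumeStream(stream: AsyncIterable<StreamEvent>): Promise<string[]> {
  const pending: Promise<string>[] = [];
  for await (const ev of stream) {
    if (ev.kind === "tool_call") {
      // Start the tool immediately -- deliberately NOT awaited here,
      // so it runs while later tokens are still streaming in.
      pending.push(runTool(ev.name));
    }
    // Text events keep flowing to the UI while the tool runs.
  }
  // By the time the stream ends, the tool may already have finished.
  return Promise.all(pending);
}

async function* fakeStream(): AsyncGenerator<StreamEvent> {
  yield { kind: "tool_call", name: "grep" };      // tool call arrives early
  yield { kind: "text", text: "explanation..." }; // model still generating
}
```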
Prompt Assembly
Before step 3, the prompt needs to be assembled. getSystemPrompt() returns a string array, not a single string. This matters because the array structure maps onto the model’s caching behavior.
The prompt has two sections:
Static (cacheable): Core instructions, tool definitions, safety rules, and the agent’s role description. These don’t change between turns, so the model can cache the KV representations. Prompt caching significantly cuts input token costs in long sessions — the static portion is processed once and then served from cache on subsequent turns.
Dynamic (never cached): Current working directory, git status, environment variables, session-specific context. These change every turn and must be re-processed each time.
The boundary between the two sections is marked by SYSTEM_PROMPT_DYNAMIC_BOUNDARY. Everything before the marker is static; everything after is dynamic. The placement of this marker is not cosmetic — it determines what gets cached and what doesn’t.
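A sketch of this arrangement — the `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` name comes from the text above, but the marker value, prompt contents, and `splitForCaching` helper are all invented for illustration:

```typescript
// Illustrative sketch; only the boundary-marker concept reflects
// the real code -- everything else here is hypothetical.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = "<<DYNAMIC>>";

function getSystemPrompt(cwd: string, gitStatus: string): string[] {
  return [
    // Static, cacheable: byte-identical every turn, so the KV cache hits.
    "You are a coding agent.",
    "Tool definitions: ...",
    "Safety rules: ...",
    SYSTEM_PROMPT_DYNAMIC_BOUNDARY,
    // Dynamic, never cached: recomputed on every turn.
    `Working directory: ${cwd}`,
    `Git status: ${gitStatus}`,
  ];
}

// Split at the marker: everything before it is eligible for caching.
function splitForCaching(parts: string[]): { cached: string[]; fresh: string[] } {
  const i = parts.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY);
  return { cached: parts.slice(0, i), fresh: parts.slice(i + 1) };
}
```

The design constraint this encodes: anything that varies per turn must sit after the marker, because a single changed byte in the cached prefix invalidates the cache for everything that follows it.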
Behavior Rules
getSimpleDoingTasksSection() generates a section of the system prompt that encodes behavioral rules: how to respond when given a task, when to ask for clarification versus proceeding, how to handle ambiguous instructions, when to stop and report versus continue. This section is re-evaluated each turn because some rules are context-dependent (for example, rules about when to pause differ between interactive and non-interactive mode).
This is where a lot of the “personality” of Claude Code lives — not as trained behavior but as explicit instructions in the prompt. If you want to understand why Claude Code behaves a certain way in edge cases, this function is a good place to look.
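To make "context-dependent rules" concrete, here is a toy sketch — the function name is real per the text above, but the parameter, rule text, and branching are entirely invented:

```typescript
// Toy sketch only: the real getSimpleDoingTasksSection's signature
// and rules are not shown here; this just illustrates the idea of
// prompt rules that vary with execution mode.
function getSimpleDoingTasksSection(interactive: boolean): string {
  const rules = [
    "When given a task, start working unless key details are missing.",
    interactive
      ? "If instructions are ambiguous, pause and ask for clarification."
      : "If instructions are ambiguous, make a reasonable choice and note it.",
    "Stop and report when the task is complete.",
  ];
  return rules.join("\n");
}
```

Because the section is regenerated each turn, a mode switch mid-session changes the rules the model sees on the very next turn.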
What the Loop Is Not
The loop is not a chat interface that alternates human/assistant turns. It’s an agent loop — the “human” turns are often injected programmatically (tool results, system notifications, automatic continuations). The model doesn’t necessarily see the same turn structure that a casual observer of the conversation might expect.
The queryLoop handles the mechanics of this: when a tool result comes back, it gets packaged as a “user” message and injected back into the loop. The model sees it as a new input, processes it, potentially calls more tools, and so on until it produces a final response with no tool calls.
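The packaging step might look roughly like this. The shape loosely follows the Anthropic Messages API's `tool_result` content blocks, but treat the types and the `injectToolResult` helper as illustrative:

```typescript
// Illustrative sketch of re-injecting a tool result as a "user" turn;
// types loosely modeled on Messages API content blocks, not exact.
interface ToolResult {
  type: "tool_result";
  tool_use_id: string;
  content: string;
}

type Message =
  | { role: "user"; content: string | ToolResult[] }
  | { role: "assistant"; content: string };

function injectToolResult(history: Message[], id: string, output: string): Message[] {
  // The model sees the tool output as a brand-new "user" input,
  // even though no human typed anything.
  return [
    ...history,
    { role: "user", content: [{ type: "tool_result", tool_use_id: id, content: output }] },
  ];
}
```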
Understanding the loop structure matters when debugging unexpected behavior. If the agent seems to be “thinking too hard” or looping, it’s often because tool results are triggering additional tool calls, and the loop is working as designed — just with more iterations than expected.
Reference: This chapter draws on Xiao Tan’s (@tvytlx) Claude Code Architecture Deep Dive V2.0 report.