Before, During, After: The Three Checkpoints

A change clears every gate at merge time and still runs wrong three weeks later. The spec is solid, the tests are real proof, and the PR lands clean. Then you find a comment pointing at a design document that no longer matches the code. The decision it depended on was reversed in a different PR, and nothing caught the mismatch because the change sat outside the diff anyone reviewed.

Quality is not a single gate. It is three, in sequence, and each catches a failure the other two miss.

  • Before: did the work start from legible architecture, current agent instructions, and a reviewed spec?
  • During: is the implementation running against that spec and the commit hooks, or has the session drifted?
  • After: do the tests prove the acceptance criteria or assert behavior the agent invented?

Before: the foundation gate

The before-gate does not build the foundation. Earlier chapters did that: legible architecture (Foundation), agent instructions reachable from the AGENTS.md hub (Agent Instructions), a documented design system, and a test convention the agent reads before its first test (Test Strategy and Convention). The gate asks one question about all of it: is each still current and reachable, or has it rotted since the last change?

That question is cheaper than it sounds because most of it is deterministic. A hub that points at a file that no longer exists loads nothing, and the agent codes against a convention it never saw. A link checker catches that. A file-size guard and an index-staleness scan catch the rest, all in CI where humans forget to look. What stays manual is the judgment no scan makes: whether the architecture the docs describe is the architecture the agent will meet in the code.
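The deterministic part of that paragraph can be sketched in a few lines. The following is a minimal illustration, not a reference implementation: it assumes the hub uses Markdown-style inline links, and the function names `broken_links` and `oversized` and the 200-line budget are hypothetical choices made for the example.

```python
import re
from pathlib import Path

# Matches Markdown inline links of the form [text](target).
# Assumption: the hub file uses this link style.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_links(hub_text: str, root: Path) -> list[str]:
    """Return local link targets in hub_text that do not exist under root."""
    missing = []
    for target in LINK_PATTERN.findall(hub_text):
        # External URLs and in-page anchors are out of scope for this check.
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue
        path = target.split("#", 1)[0]  # drop any fragment before resolving
        if not (root / path).exists():
            missing.append(target)
    return missing


def oversized(text: str, max_lines: int = 200) -> bool:
    """File-size guard: True when the text exceeds the (hypothetical) line budget."""
    return len(text.splitlines()) > max_lines
```

Run in CI, a non-empty result from `broken_links` or a `True` from `oversized` would fail the build. Neither function says anything about whether the linked document is still accurate, which is the manual judgment the paragraph above leaves to a person.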

Keeping Documentation Up to Date adds one more deterministic layer to this gate: a document declares which code it describes, and a validator flags the document when the code changed after the last review date. The check does not prove the prose is right. The check proves nobody has re-verified it since the source moved.
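One way such a validator could look, as a sketch under stated assumptions: the `DocRecord` shape, the `describes` field, and the `last_changed` mapping are all hypothetical. In practice the last-changed dates would come from version control history, which this example takes as an input rather than querying.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DocRecord:
    path: str
    describes: list[str]   # source files the document declares it describes
    last_reviewed: date    # date a person last re-verified the document


def stale_docs(docs: list[DocRecord], last_changed: dict[str, date]) -> list[str]:
    """Return paths of documents whose described code changed after the last review."""
    flagged = []
    for doc in docs:
        # A source file with no recorded change date is treated as unchanged.
        if any(last_changed.get(src, date.min) > doc.last_reviewed
               for src in doc.describes):
            flagged.append(doc.path)
    return flagged
```

A flagged document is not necessarily wrong. The flag only records that the source moved after the review date, so someone has to look again.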

One input sits upstream of the spec itself: the architectural decision the spec executes. The chain runs ADR, then design doc, then spec (Spec Lifecycle). Freeze a spec against a decision still open, or against one reversed in a later PR, and the spec executes a decision the architecture no longer follows. The gate confirms the governing ADR is approved, and the design doc the spec leans on still says what the spec assumes. A link checker proves the reference resolves. Whether the decision still holds is the same manual judgment the architecture check already demands.
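The status half of that check is mechanical and can be sketched as follows. The ADR ID format, the status strings, and the function name `unapproved_decisions` are assumptions made for illustration, and the sketch expects the statuses to have been read from the ADR files already.

```python
def unapproved_decisions(spec_adrs: list[str], adr_status: dict[str, str]) -> list[str]:
    """Return the ADR IDs a spec depends on that are not currently approved.

    An ADR absent from adr_status is reported too, since a reference that
    resolves to nothing cannot be confirmed as approved.
    """
    return [adr for adr in spec_adrs
            if adr_status.get(adr, "missing") != "approved"]
```

An empty result means every governing decision is recorded as approved. It does not mean the decision still holds in the code.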

Sources: Anthropic, "Building effective agents" (Dec 2024), preparing the agent's context before it starts work. AGENTS.md (agents.md, ongoing), AGENTS.md as a project-level entry point for agent instructions.

During: the implementation gate

The during-gate watches three inputs while the agent codes: the spec it loaded, the hooks firing on each commit, and the state of the context window. The question is whether the implementation is still running against the right ones.

Freeze the spec first. A spec still under negotiation while the code is written is two moving targets, and the implementation drifts toward wherever the agent guesses it is heading. If it has to change, change it explicitly and restart the affected scenario rather than let the code chase a spec in motion.

The other two inputs split along the line this chapter keeps drawing. Hooks are deterministic: the pre-commit checks that Skills, Commands, and Hooks installs (lint, secret detection, and AC-ID-to-tag verification) all fire unprompted. Context erodes silently, the way Context Window Management described. A session three hours deep, with the window full and the spec buried under two unrelated tasks, writes worse code than the same session at its start. The during-gate re-teaches neither topic. It enforces both. The tell that context has lapsed is the agent re-deriving in hour three an import path it had right in hour one.

The minimum during-checkpoint is three questions. Is the spec the same one the agent loaded? Are the deterministic checks still passing? Has the context window been refreshed in the last hour? If two or more answers are no, pause the work.
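The rule is small enough to write down directly. This is a sketch only: the `DuringCheck` type and its field names are hypothetical, and how each answer is obtained (comparing a spec hash, reading hook results, tracking session age) is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class DuringCheck:
    spec_unchanged: bool                 # same spec the agent loaded?
    checks_passing: bool                 # deterministic checks still green?
    context_refreshed_within_hour: bool  # window refreshed in the last hour?


def should_pause(check: DuringCheck) -> bool:
    """Pause when at least two of the three answers are no."""
    answers = (
        check.spec_unchanged,
        check.checks_passing,
        check.context_refreshed_within_hour,
    )
    return answers.count(False) >= 2
```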

After: the verification gate

The after-checkpoint runs on what was produced. The spec is done, the implementation is done, and the tests pass. The question is whether the artifact closes the loop.

The verification checks the things automation cannot catch on its own. Did the implementation introduce code unrelated to the spec? Scope creep in agentic PRs is common: the agent passes through a file, fixes things it noticed along the way, and those fixes ship without review. Do the new tests prove the acceptance criteria, or do they assert behavior the agent invented? An AC ID that links a scenario to a test asserting something different is the silent-drift failure mode.
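Automation cannot judge scope, but it can narrow where a reviewer looks. The sketch below assumes the spec declares the paths it expects to touch as glob patterns. That declaration is a hypothetical convention, not something the earlier chapters prescribe, and `out_of_scope` is an illustrative name.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return changed files that match none of the spec's declared path patterns."""
    # fnmatchcase is used so matching is case-sensitive on every platform.
    return [
        path for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]
```

A non-empty result is a prompt for a question, not a verdict: the extra file may be a legitimate dependency of the change or a drive-by fix that never got reviewed.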

One closing check has nothing to do with the diff and everything to do with what the diff invalidated. Reverse an earlier decision, and the ADR that recorded it, along with any design doc citing it, now describes a system no longer in the code. The diff cannot flag this, because the stale document sits outside it: the failure this chapter opened on, a comment pointing at a design doc whose decision was overturned in a separate PR. So the after-gate asks whether the change invalidated a recorded decision. If it did, was the ADR updated or marked superseded, and does the design doc still match what shipped?
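Finding the documents that need a second look can be partly mechanized. The following sketch assumes documents cite decisions by IDs of the form `ADR-<number>` and that the set of superseded IDs is already known. The ID format and the name `docs_citing` are hypothetical.

```python
import re

# Assumed citation convention: "ADR-" followed by digits.
ADR_REF = re.compile(r"\bADR-\d+\b")


def docs_citing(superseded: set[str], docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each document path to the superseded ADR IDs its text cites."""
    hits: dict[str, list[str]] = {}
    for path, text in docs.items():
        cited = sorted(set(ADR_REF.findall(text)) & superseded)
        if cited:
            hits[path] = cited
    return hits
```

The output is a reading list for the reviewer. Whether each listed document still matches what shipped remains a human call.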

Refactoring is the step most teams skip: the code worked, so it ships. But the agent's first generation is rarely the right shape for the next change. The cheapest moment to fix that is now, while the spec and the code are both fresh, not in a follow-up PR three weeks later when the next developer is reverse-engineering unfamiliar generated code.

Review is the third part of the after-checkpoint, and the order is the one Trunk-Based Development with Agents sets out: the spec first, then the diff against the spec, then the diff on its own merits. The after-gate is where that order is most often reversed under time pressure. Reverse it and the review checks whether the code looks reasonable, not whether it implements what was specified.

A worked sequence

A small change runs through all three gates in sequence:

Each gate has its own failure mode. Skip the before-gate and the agent improvises against unknowns. Skip the during-gate and the work drifts inside the session. Skip the after-gate, and the merged artifact does not match the merged intent. The three gates are not redundant because they catch different problems at different points in the lifecycle.

Where the attention goes

Every gate has a deterministic part and a human part. Automation enforces link validity, file-size limits, AC-ID traceability, and test-coverage pairing. The human part is what no check reaches: whether the spec describes the right thing, whether the implementation is in the right shape, whether the test proves the scenario rather than something adjacent. Maximize the deterministic part, because hooks scale to agentic speeds and human review does not. Human review time is scarce, and most quality programs spend it on things a hook would have caught.

The three gates do not draw on review time equally. The before-gate is mostly maintenance, and the during-gate is mostly automation, both cheap once set up. The after-gate is where the review time goes and where it pays off most. Plan for the asymmetry.

The sequence is logical, not temporal

The gates are not project phases. Finishing the spec before implementation starts is an ordering, not a calendar interval: the during-gate can follow the before-gate immediately, in the same afternoon. The sequence is logical, not temporal. Forcing it to map to days or weeks recreates waterfall, which is exactly the failure mode Why Specs? argued against.

The three gates catch divergence from spec and drift from the architecture. They do not catch the case where the spec is silent and every pattern the agent finds in the codebase is valid, including the broken one. The next chapter covers security failure modes that survive because they match the examples the agent was shown, not because any check missed them.