My Claude Code Workflow, Evolved

Six months ago I wrote about what actually works with Claude Code. If you haven't read it, the short version: context management matters, CLAUDE.md should be small, don't one-shot large tasks.

All of that still holds. But I was treating each session as a one-off. Set up context, work until things got fuzzy, compact, continue. It worked until a session would go sideways and I'd realise I had no clean state to go back to. The workflow now has more structure upfront, and I almost never have to throw away a session mid-task.

CLAUDE.md evolved

I said keep it small. That's still right. What changed is what I put in it.

My original file had too many conventions Claude could already infer from the code. It was bloated, and Claude was quietly ignoring chunks of it. I stripped it down to the things Claude kept getting wrong: plan limits for a SaaS feature, a third-party API that returns 200 on errors, the constraint that migrations go through the ORM CLI and never get hand-edited.

One addition that actually helped: @ imports. Root CLAUDE.md stays short and delegates to topic files:

See @docs/architecture.md for service boundaries.
Git conventions: @docs/git-workflow.md

Keeps the root file under 50 lines. The longer it gets, the more Claude ignores. Treat it like code: review it when things go wrong, prune regularly.

Tell Claude to ask questions. I have a single line: "ask if anything is unclear before implementing." Without it, Claude guesses. With it, it asks. The confident wrong implementation is worse than the question.

Memory files

CLAUDE.md handles project context. Memory files handle everything else.

Claude Code persists memory across sessions in ~/.claude/projects/<project>/memory/. I keep multiple files broken out by topic: workflow.md for how I like tasks structured and committed, debugging.md for patterns that keep coming up in this codebase, preferences.md for output style and things I keep having to correct. Each file stays focused so Claude loads what's relevant rather than one bloated document.

The split that matters: CLAUDE.md goes in the repo and is for the whole team. Memory files are local to your machine and are for you. My CLAUDE.md tells Claude about the project. My memory files tell Claude how I work.

In practice this means I almost never re-explain preferences mid-session. Claude already knows I want small commits per task, that I review plans before execution, that I don't want it touching unrelated files. That's all in memory. It carries forward.

Worktrees

The old habit: git stash, switch branch, work, switch back, hope the pop goes cleanly. It never does, not really. It always costs time somewhere.

git worktree add .worktrees/feature-name -b feature/feature-name
cd .worktrees/feature-name && npm install && npm test

Open Claude Code from inside the worktree. Every file it touches, every commit it makes is isolated to that branch. Main stays clean. When Claude goes sideways (and it will), you delete the worktree and start over. Nothing else touched.

The review also gets cleaner. Claude's work shows up as a diff on the feature branch, not as a dirty working directory you have to mentally untangle.

One step people miss: add .worktrees/ to .gitignore before creating the first one. Otherwise git tracks the directory and your status becomes noise. I forgot this twice before it stuck.

Plan Mode

I used to go straight from exploration to implementation. For small tasks, fine. For anything touching multiple files, Claude would consistently pick the wrong ordering, want to create files that already existed under slightly different names, or scope a refactor wider than I asked. I'd spend an hour fixing what a two-minute plan review would have caught.

Shift+Tab twice to toggle Plan Mode. Claude reads, analyzes, generates a plan without touching anything. Then Ctrl+G opens the plan in your editor. You edit it before Claude executes.

That edit step is the point. It's where you catch the wrong assumptions. Two minutes there saves the cleanup session later.

Context

The official docs open with this, and they're right to: performance degrades as context fills. I knew it, but I treated it as a background concern rather than the main one.

The habits that changed things:

/clear between unrelated tasks. Every time, not just when things feel fuzzy. I used to preserve context across tasks thinking continuity helped. It doesn't, it clutters. Once a session accumulates two or three corrections on the same issue, output quality drops noticeably. Clear it and write a better initial prompt. That almost always works better than continuing.

/compact <instructions> instead of just /compact. You can tell Claude what to preserve: /compact Focus on the auth contract changes. Without the instruction, compaction guesses. The wrong things get dropped.

Subagents for any investigation that spans multiple files. The research runs in a separate context and reports back. Main conversation stays clean for the implementation. This one change made the most difference in how well Claude holds its decisions through longer tasks.

Hooks and allowed commands

CLAUDE.md instructions are advisory. As context fills, Claude drifts from them. Hooks run regardless.

I have one that blocks writes to the migrations folder. Claude will try to hand-edit migrations if nothing stops it. Another runs the linter after every file edit. Both came from Claude doing the thing I didn't want, enough times to matter.

You can ask Claude to write the hooks: "write a hook that runs eslint after every file edit." It works. Run /hooks to configure, or edit .claude/settings.json directly.

The same file handles allowed commands, which is worth setting up early. By default Claude asks permission before running anything: every npm test, every git diff, every file read. That back-and-forth kills flow in a long session. Pre-approving the safe ones fixes it:

{
  "permissions": {
    "allow": [
      "Bash(npm test)",
      "Bash(npm run lint)",
      "Bash(git diff*)",
      "Bash(git status)",
      "Bash(git log*)"
    ]
  }
}

Claude runs those without asking. Anything destructive or outside the list still prompts. The interruptions drop from constant to occasional, which is what you want.

TDD

In the first post I mentioned catching HMAC validation that wasn't actually validating. The code looked correct. It wasn't.

The fix isn't reading the output more carefully. It's watching the test fail first.

Write the test. Run it. Confirm it fails with the right error. Implement. Watch it pass. The failure is the signal that the test means something. Skip that step and you're back to reading plausible-looking code and hoping.

What changed in my prompts: "write a failing test for X, run it to confirm it fails, then implement." Not "write a function that does X." The test is the verification. Reading the code is not.

On opencode

I've been running opencode alongside Claude Code for a few weeks. The short version: Claude Code still wins on output quality, but opencode is worth understanding.

The terminal UI is genuinely better. Independent buffer system, scrollable beyond terminal limits, good themes. If you spend long sessions in the terminal, it's a more comfortable experience.

The model flexibility is the real draw. You can run Sonnet-4 through opencode and get close to Claude Code's output quality, though it occasionally reformats code it wasn't asked to touch. If you have a Copilot subscription, GPT-4.1 runs through opencode at no extra cost. That's a real difference if you care about API costs.

opencode zen is a separate thing, not a UI variant. It's a curated model service: the team benchmarked which models actually perform well with coding agents and made them available pay-per-request at cost. Useful if you don't want to figure out yourself which provider is worth using this month.

I'm keeping it in the rotation. Not replacing Claude Code yet.

The first post was a list of observations from someone still figuring out the tool. This one is what happened after the workflow settled. Less improvising per session, more structure before the session starts. The output got more predictable. That's the actual difference.

Sources: