Overview
This page explains (in human terms) how Claude, running inside Claude Code, organized itself to build a large Rust-based C compiler using a very simple agent-team harness. The harness ran Claude in an infinite loop and scaled up to many parallel Claude instances working on the same Git repository.
This explainer is based on the public Anthropic engineering write-up and the public compiler repository. Where the original author did not publish exact prompt files or scripts, this site uses clearly labeled reconstructions and focuses on the observable workflow and coordination mechanisms.
In short, the experiment produced:
- A dependency-light, from-scratch Rust C compiler targeting multiple architectures, intended to compile large real-world programs, including a Linux kernel build (with some caveats).
- Many Claude Code sessions running in containers, all pushing to a shared upstream Git repo.
- Agents that “claimed” work by creating lock files inside a current_tasks/ folder, then merged and pushed their changes.
The harness loop
The core idea: never let the agent stop. When one Claude Code session finishes, it immediately starts a new one. Output from each session is written to a logfile tied to the current Git commit hash.
#!/bin/bash
# Never let the agent stop: when one Claude Code session exits,
# immediately start another, logging each session against the commit
# it started from.
mkdir -p agent_logs
while true; do
  COMMIT=$(git rev-parse --short=6 HEAD)
  LOGFILE="agent_logs/agent_${COMMIT}.log"
  claude --dangerously-skip-permissions \
    -p "$(cat AGENT_PROMPT.md)" \
    --model claude-opus-X-Y &> "$LOGFILE"
done
At one point an agent ran pkill -9 bash and killed its own harness process. The loop ends because Bash itself is gone.
What lives in AGENT_PROMPT.md?
The blog describes the prompt at a high level: tell Claude the goal, and explicitly instruct it to break work into small pieces, track what it’s doing, decide what’s next, and keep iterating.
The exact file contents weren’t published in the post. Below is a plausible structure (reconstruction), not a verbatim prompt.
Goal:
- Build a Rust C compiler that can compile increasingly hard targets.
Process:
- Read project status files / READMEs first.
- Choose ONE concrete failing test / build target to improve.
- Claim work by writing a lock file in current_tasks/.
- Implement the fix with minimal changes.
- Run the relevant tests (prefer fast sampling unless deep debugging is needed).
- Commit and push your changes.
- Remove the lock file when done.
- Update progress notes with what you changed and what remains.
Rules of thumb:
- Avoid huge outputs; write details to logs and mark errors with 'ERROR:' lines for grep.
- If stuck, write down attempted approaches + hypotheses for the next agent.
Parallel agents & Git coordination
The parallel setup is intentionally bare-bones: each agent runs in its own container, with the upstream Git repo mounted at /upstream. Each agent clones into its own /workspace, works independently, then pushes back upstream.
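The post does not include the launcher script itself. A minimal reconstruction under those assumptions might look like the sketch below; the image name, host path, agent count, and agent_loop.sh entry point are all hypothetical stand-ins, with agent_loop.sh playing the role of the infinite harness loop shown earlier.

#!/bin/bash
# Hypothetical launcher: one container per agent, all sharing one bare
# upstream repo mounted at /upstream. Each agent clones into its own
# /workspace and runs the harness loop until the container is stopped.
for i in $(seq 1 8); do
  docker run -d --name "agent_$i" \
    --volume /srv/compiler.git:/upstream \
    agent-image \
    bash -c 'git clone /upstream /workspace/repo &&
             cd /workspace/repo &&
             ./agent_loop.sh'
done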
Git provides shared state, conflict detection (merge conflicts), and a simple “synchronization primitive”, with no need to build a custom queue, database, or coordination service.
There was no separate “manager agent” handing out tasks. Each Claude instance decided what to do next, usually choosing the “next most obvious” problem from the environment (tests + docs + logs).
Lock files: the key coordination mechanism
Agents claimed work by creating a simple lock file inside current_tasks/. If two agents attempted the same lock, the Git sync produced a conflict and the second agent had to pick a different task.
Lock-file algorithm (step-by-step)
- Claim: write current_tasks/some_task.txt (a “lock”).
- Work: implement changes and run verifiers.
- Sync: pull from upstream, merge changes, resolve conflicts.
- Publish: push your commit(s).
- Release: delete the lock file (task is now free/complete).
- Repeat: new Claude session in a fresh container.
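The original scripts were not published, but the cycle above is easy to make concrete. A minimal Bash sketch of one claim/work/publish/release pass, with a hypothetical task name and verifier script, could look like:

#!/bin/bash
# Hypothetical single cycle. Task name and verifier (run_tests.sh) are
# illustrative; only the lock-file mechanism comes from the write-up.
TASK="fix_bitfield_codegen"
LOCK="current_tasks/${TASK}.txt"

# Claim: write the lock file and commit it.
echo "claimed by $HOSTNAME" > "$LOCK"
git add "$LOCK" && git commit -m "claim: $TASK"

# Work: implement the change (edits happen here), then verify and commit.
./run_tests.sh --fast || exit 1
git commit -am "fix: $TASK"

# Sync: merge upstream work. A conflict on the lock file means another
# agent claimed the same task first: back off and choose a different task.
git pull --no-rebase origin main || { git merge --abort; exit 0; }

# Publish + Release: push the commits, then delete the lock.
git push origin main
git rm "$LOCK" && git commit -m "release: $TASK" && git push origin main

In the real runs, per the write-up, Claude generally resolved ordinary merge conflicts itself rather than aborting; only a conflicting lock meant abandoning the task.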
How agents self-organize
There wasn’t a message bus, shared chat room, or central scheduler. The harness relied on a simpler question: “Can each agent infer the best next step from the repo state?” In practice, the repo offers plenty of signal:
- Failing tests / build scripts (what’s broken right now)
- Progress notes / READMEs (what has already been tried)
- Issue lists / TODOs (what’s still missing)
- Git history (recent changes, conflicts, direction)
From those signals, each iteration follows the same loop:
- Orient: read docs and recent logs.
- Pick a concrete failing check to fix.
- Lock the task file.
- Implement + run verifiers.
- Merge/push/unlock.
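Concretely, the “orient” step can be a handful of repo queries. This is an illustrative sketch (PROGRESS.md is a hypothetical progress-notes file), not wording from the published prompt:

ls current_tasks/                    # which tasks are already claimed
git log --oneline -15                # recent changes and their intent
grep -h '^ERROR:' agent_logs/*.log | sort | uniq -c | sort -rn | head
cat PROGRESS.md                      # what earlier agents tried and left behind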
Lock files prevent two agents from claiming the same named task. They do not prevent “convergent work” where many agents independently chase the same underlying failure mode (e.g., a single giant kernel-build problem).
Keeping agents on-track
The author emphasizes that the loop itself is easy. The hard part is building an environment that makes “doing the right thing” the easiest path for the agent.
1) Extremely high-quality tests
If your verifier is even slightly wrong, an autonomous agent will “solve” the wrong problem and keep digging. The post describes investing heavily in compiler test suites, OSS build scripts, and a stricter CI pipeline that prevents regressions.
2) Design for Claude’s constraints
Tests should print a few useful lines, log details to files, and label errors so they’re grep-friendly (e.g., ERROR: <reason> on one line).
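As an illustration of that convention, a wrapper in this spirit keeps the console to one summary line and pushes detail into a log; tests/run_all.sh and the log path are hypothetical:

#!/bin/bash
# Hypothetical test wrapper: one-line console summary, full detail in a
# log file, failures tagged with a grep-friendly 'ERROR:' prefix.
LOG="test_logs/run_$(git rev-parse --short HEAD).log"
mkdir -p test_logs
./tests/run_all.sh > "$LOG" 2>&1
FAILS=$(grep -c '^ERROR:' "$LOG")
echo "tests finished: $FAILS failures (grep '^ERROR:' $LOG for details)"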
The harness should avoid letting the agent burn hours on huge test runs. Provide a default --fast mode that runs deterministic random subsamples per agent.
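One way to get sampling that is deterministic per agent but varies across agents is to seed the shuffle with the agent’s identity. A sketch, assuming GNU shuf and a per-agent AGENT_ID variable:

# Hypothetical --fast mode: each agent always re-runs the same 50-test
# subsample, while different agents cover different subsamples.
ls tests/*.c \
  | shuf --random-source=<(yes "${AGENT_ID:-$HOSTNAME}") \
  | head -n 50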
Making the Linux kernel “parallelizable”
Early on, parallelization is easy: many failing tests → many agents fix different tests. But a Linux kernel build is effectively one massive end-to-end test.
The author used GCC as a known-good compiler: compile most files with GCC, compile the remaining subset with Claude’s compiler, and see if the kernel still boots. If it fails, narrow the subset further. This turns one giant task into many smaller ones.
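Mechanically, this reduces to a “hybrid build” oracle. The sketch below compresses a real kernel build enormously; our_cc, boot_test, and the exact make invocations are stand-ins for the idea, not the author’s scripts:

# Hypothetical hybrid build: compile everything with GCC first, then
# recompile a chosen subset with the candidate compiler and relink.
hybrid_build() {
  local subset="$1"                 # file listing the .c paths under test
  make CC=gcc -j"$(nproc)" > /dev/null || return 2
  while read -r src; do
    our_cc -c "$src" -o "${src%.c}.o" || return 2
  done < "$subset"
  make CC=gcc vmlinux > /dev/null || return 2
  boot_test                         # e.g. boot in QEMU and check the console
}

If hybrid_build fails for a subset, split the subset and recurse; each smaller failing subset becomes its own claimable task.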
Why “oracle + refinement” enabled parallel work
Once you can isolate failures to smaller sets of files, you can hand different subsets to different agents, and they stop tripping over each other. The post also mentions needing delta debugging to find pairs of files that fail only together.
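The pairwise case can reuse the same oracle. A naive quadratic sketch over files that each pass alone (ok_alone.txt is a hypothetical list produced by the single-file runs):

# Hypothetical: find file pairs that only fail in combination.
while read -r a; do
  while read -r b; do
    [ "$a" = "$b" ] && continue
    printf '%s\n%s\n' "$a" "$b" > pair.txt
    hybrid_build pair.txt || echo "ERROR: pair fails together: $a $b"
  done < ok_alone.txt
done < ok_alone.txt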
Specialized roles
Running many agents isn’t only about speed — it also enables specialization. In the write-up, agents were assigned distinct “jobs” such as deduplication and performance work.
Gotchas & hands-on learnings
Gotcha: tests become the “spec” (and can be wrong)
With no active human steering, tests aren’t just “verification”; they’re also the de facto spec. The author notes that the task verifier must be nearly perfect or Claude will optimize for the wrong outcome.
Practical takeaway: invest early in verifiers and regression protection; treat test quality as a first-class deliverable.
Gotcha: context pollution destroys velocity
Flooding output with thousands of lines wastes context and makes it harder for the agent to spot the signal. The write-up recommends concise console output + grep-friendly log files and summary stats.
Practical takeaway: “structured logs” is not optional for long-running agents.
Gotcha: time blindness
Left alone, agents can sink hours into extremely slow test runs. The harness used a default fast sampling mode (deterministic per agent, random across VMs) and printed progress only infrequently.
Practical takeaway: make “fast feedback” the default and “full suite” the exception.
Gotcha: lock files don’t prevent convergent work
When the workload is really one giant problem (like “compile the kernel”), everyone heads toward the same thing anyway. Lock files prevent collisions on the same task name, but can’t invent a decomposition.
Practical takeaway: if parallelism stalls, redesign the harness to create real independent units of work.
Gotcha: merge conflicts are frequent (but survivable)
Frequent merges are expected when many agents push to one repo. The post notes merge conflicts were common, and Claude generally handled them, but this is real overhead.
Practical takeaway: expect merge overhead; keep changes small; encourage task-level isolation.
Gotcha: an agent can shoot the harness
A real example: an agent accidentally ran pkill -9 bash, killing the loop that was keeping it alive.
Practical takeaway: run in containers and sandbox permissions; assume the agent may run destructive commands.
Interactive simulator: lock files, merge conflicts, and “giant tasks”
This simulator is a toy model that helps build intuition for the coordination mechanism described in the write-up: Git-based locking + merging works great when you have many distinct tasks, and works poorly when the problem is one giant bottleneck.
How to interpret the events
- LOCK → agent claims a task (analogous to writing a file in current_tasks/).
- WORK → agent implements and tests.
- MERGE → agent pulls/merges and may hit conflicts.
- PUSH → agent publishes changes and releases the lock.