Public Beta · v0.2.0

Forge the workflow.

Anvil is a single, self-contained AI coding agent that reads, writes, edits, and runs your project itself — wrapped in a disciplined two-gate review workflow so a long vibe-coding session can't quietly drift off the rails. One Rust binary. Model-agnostic. Installs in one line and updates itself.

Public Beta · v0.2.0

Anvil is feature-complete for its core workflow and installs/updates cleanly — but it's early. It's been field-tested mostly on Windows (the macOS/Linux builds pass CI but haven't been run in anger yet); several big pieces landed recently; and the review-gate flow is verified by hand rather than an automated suite. Expect rough edges, and please file issues — that feedback is exactly what the beta is for. For the best experience, use a capable tool-calling model for the coder role.

Everyone has a coder. Anvil has the workflow.

What makes Anvil different isn't the agent — everyone has one of those now. It's the discipline around it: at two human gates, two different model families review the work — once on the plan, once on each phase's git diff — in a sequential review → fix → re-review loop.

🔨

Two human gates, cross-vendor review

The plan and each build phase are critiqued by two different model families. Exactly two reviews per gate — no R3+, no drift, no rubber-stamping.

🔁

Sequential review loop

R1 reviews → the coder applies fixes → you review → R2 reviews → the coder applies fixes → summary. R2 re-reviews after R1's fixes, catching bugs those fixes introduce.

🤖

A real agent coder

It reads, edits, and runs the repo through tools — including a reliable apply_patch diff format. No attaching files, no copy-pasting output back to disk.

🧠

Doesn't lose the plot

A persistent task anchor, automatic compaction into working memory, and a lightweight repo map keep the coder on-goal across long, tool-heavy sessions.

🌐

Any provider

Ollama / local, OpenAI, xAI, Anthropic, Groq, Azure, AWS Bedrock, Google, OpenRouter, and any OpenAI-compatible gateway. Mix vendors freely — that's the whole point.

📦

One-line install + self-update

Prebuilt binaries for macOS, Linux, and Windows. anvil update swaps the binary in place — no reinstall, no toolchain.

Two gates. Two model families. One honest second opinion.

Free-form agentic chat by default; structure only at the two gates. You approve; the coder does the work; a genuine cross-vendor second opinion keeps it honest.

Chat

Talk with the coder about goals, constraints, and architecture, then ask it to write plan.md itself — phased, with goals and acceptance criteria.

Plan gate

/lock-plan runs the review loop on the plan; /accept-plan quenches it — records its hash and unlocks phase work.

Phase gate

The coder implements a phase directly — reads, edits, runs tests. /accept-phase runs the same review loop on the actual git diff.

Ship

/ship-phase quenches the phase. Re-run /accept-phase any time to re-review. Repeat per phase until you're done.

The review loop, at every gate

R1 — reviewer-a Critiques the locked artifact (the plan, then each phase's diff). The coder applies the fixes, then Anvil pauses for you to review.
R2 — reviewer-b A different model family critiques the revised work after R1's fixes land — catching the problems those fixes introduce, the part single-pass review misses.
Summary → quench The coder summarizes both rounds; you approve. Findings are saved to REVIEW_*.md at the repo root. The human approves at each gate; the coder does the creative work.

A real agent, built to stay on task.

Between the gates it's an ordinary, capable coding agent — driven by tools, scoped to the repo root, and engineered not to flail across long sessions.

The coder core

Repo-scoped tools read_file, write_file, apply_patch (a validated, context-located multi-file diff format), edit_file, list_dir, grep, run_command, project_state — paths that escape the root are rejected.
Task anchor The original objective is pinned into every turn, so even a long, compacted session never loses the goal.
Auto-compaction "Clinkering": when the conversation outgrows the model's context budget, older turns are summarized into working memory and re-injected — not silently dropped.
Repo map + budgeting A ranked, budgeted map of top-level symbols is injected each turn. Anvil reads each model's real context window from models.dev and sizes the working window to fit.
Loop breaker + resilience Repeated identical reads are short-circuited; transient provider hiccups (timeouts, 5xx, rate limits) are retried with backoff instead of killing the turn.
Durable memory + grounding An append-only ledger (.anvil/session.json) survives restarts, alongside a live reality snapshot and user-editable context files: decisions.md, assumptions.md, scratch.md, ARCHITECTURE.md.

Any provider. Mix vendors freely.

Set up providers, models, and roles once in a global config — every future repo is configured the moment you cd in. Keys live in the OS keyring or a shared .env; live model lists where the provider exposes them.

Hosted

OpenAI, xAI (Grok), Anthropic, Groq, Google, Together, Fireworks, OpenRouter, Mistral, DeepSeek, Perplexity, Cohere, and more.

Local

Ollama (http://localhost:11434/v1) and LM Studio. A first-run quick-setup picks live local models for CODER / R1 / R2.

Enterprise & custom

Azure OpenAI, AWS Bedrock, Google Vertex, Gradient — plus any OpenAI-compatible endpoint via a custom provider and base URL.

A cost-smart setup

Only the coder needs a strong, tool-calling model. The two reviewers are single-shot critique, so local models handle them fine — and keeping them a different family than the coder is exactly the cross-vendor second opinion Anvil is built around. A capable cloud model as the coder; local Ollama models as R1 / R2.

Tool-calling matters

The coder drives itself through tool calls, so it needs a model that supports tool/function calling. Anvil reads each model's capabilities from models.dev and warns you at startup if your coder model can't do tool calls. /models shows each role's window, tool support, and price.

Install in one line.

No Rust toolchain required — the installer downloads a prebuilt binary for your OS and puts it on your PATH. Prebuilt binaries cover macOS (Intel + Apple Silicon), Linux (x86_64 + arm64, static musl), and Windows (x86_64).

macOS / Linux

$ curl -fsSL https://raw.githubusercontent.com/\
ai-nhancement/Anvil/master/install.sh | sh

Windows (PowerShell)

> irm https://raw.githubusercontent.com/\
ai-nhancement/Anvil/master/install.ps1 | iex

Quick start

In any repo, create Anvil's per-project state dir and launch the TUI — the setup wizard runs automatically on first use, then just type to chat with your coder.

$ anvil init   # create .anvil/ state dir
$ anvil        # launch the full TUI

Self-updating

On launch Anvil does a quiet, cached check for a newer release and shows a pulsing update badge when one exists. To update, run anvil update from the shell or /update from inside the TUI — it downloads the right binary and replaces the running one in place. Pin a build with ANVIL_VERSION=v0.2.0 on the install command.

The forge-themed TUI

Live forge heat The coder is smithing (thinking) and forging (running a tool); older context gets clinkered (compacted); reviewers temper the work; you quench it to lock it in. Streaming responses with live tool activity throughout.
Selectable command approval File reads/writes/edits run automatically; run_command is gated with a selectable prompt — run once, allow that program for the session, or decline. Break in anytime with Ctrl+B.
Command palette Type / for workflow, config, model, and memory commands — /status, /models, /config, /memory, /view-plan, /view-reviews — plus a headless CLI (init, setup, status, update) for scripting.