Reverse-Engineering Claude Code

Open-Source Models.
Frontier Quality.

9 providers. 38 models. Quality-tuned prompting with chain-of-thought reasoning and auto-fallback. The same agentic architecture as Claude Code — bash, file ops, code search, streaming — running on Gemini, Llama, Mistral, Qwen, DeepSeek, and more.

how we match quality
5 Techniques Stacked

How Open-Source Approaches Opus

Model quality is the bottleneck. These techniques compensate — disciplined prompting makes weaker models behave like stronger ones.

High
Quality System Prompt
2000+ tokens of behavioral instructions. Think internally, self-critique, verify before outputting. Never expose scaffolding.
High
Chain-of-Thought (Internal)
Models reason step-by-step silently. Only the final, polished answer is shown. No leaked "### Planning" headers.
Medium
Auto-Fallback Chain
Gemini 429s? Silently falls back to Groq → OpenRouter → NVIDIA → Cerebras → Together → SambaNova. User never sees an error.
Medium
Low Temperature (0.4)
Reduced randomness for more deterministic, accurate outputs. Higher precision on code and factual claims.
Low
8K Token Output
Double the default max_tokens. Complex answers aren't truncated mid-thought.
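Stacked together, these techniques come down to a disciplined system prompt plus a handful of request parameters. A minimal sketch of how they combine into one chat-completion request; the names (`buildRequest`, `SYSTEM_PROMPT`) are illustrative, and the prompt shown is a short stand-in for the real 2000+ token version:

```typescript
// Short stand-in for the 2000+ token behavioral prompt (techniques 1 + 2).
const SYSTEM_PROMPT = [
  "You are a careful coding assistant.",
  "Reason step-by-step internally; never expose planning headers or scaffolding.",
  "Self-critique and verify claims before answering.",
  "Output only the final, polished answer.",
].join("\n");

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assemble one OpenAI-style chat-completion request with the stacked settings.
function buildRequest(history: ChatMessage[], userInput: string) {
  return {
    messages: [
      { role: "system", content: SYSTEM_PROMPT }, // quality prompt + internal CoT
      ...history,                                 // multi-turn conversation history
      { role: "user", content: userInput },
    ],
    temperature: 0.4, // low temperature: fewer hallucinated details in code
    max_tokens: 8192, // 8K output: complex answers aren't truncated mid-thought
    stream: true,     // tokens stream to the REPL as they arrive
  };
}
```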
the providers
9 Providers, 1 Adapter

All OpenAI-Compatible

Every provider speaks the OpenAI wire format. One universal adapter with different base URLs. If one provider rate-limits, the system silently falls back to the next.
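The fallback pattern is small enough to sketch. The provider list, URLs, and `completeWithFallback` below are illustrative stand-ins for the real adapter (the actual chain is Gemini → Groq → OpenRouter → NVIDIA → Cerebras → Together → SambaNova), and the API keys are placeholders:

```typescript
// One OpenAI-compatible adapter, many base URLs, silent fallback on rate limits.
interface Provider {
  name: string;
  baseUrl: string; // every provider accepts POST {baseUrl}/chat/completions
  apiKey: string;
}

// Illustrative subset of the chain; keys are placeholders.
const CHAIN: Provider[] = [
  { name: "gemini",     baseUrl: "https://generativelanguage.googleapis.com/v1beta/openai", apiKey: "..." },
  { name: "groq",       baseUrl: "https://api.groq.com/openai/v1",                          apiKey: "..." },
  { name: "openrouter", baseUrl: "https://openrouter.ai/api/v1",                            apiKey: "..." },
];

// Try each provider in order; a 429 (or a network error) moves to the next.
// The LLM call itself is injected so the chain logic stays testable.
async function completeWithFallback(
  body: unknown,
  call: (p: Provider, body: unknown) => Promise<Response>,
  chain: Provider[] = CHAIN,
): Promise<{ provider: string; response: Response }> {
  for (const provider of chain) {
    try {
      const response = await call(provider, body);
      if (response.status === 429) continue; // rate-limited: fall through silently
      return { provider: provider.name, response };
    } catch {
      continue; // network error: try the next provider
    }
  }
  throw new Error("all providers exhausted");
}
```

Because every provider speaks the same wire format, the fallback is just a different base URL; no per-provider translation layer is needed.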

Google Gemini
active
Default. Closest to Opus.
Strongest · ~1s · Tool calling · 3 models
Gemini 2.5 Pro · Gemini 2.5 Flash · Gemini 2.0 Flash
Groq
active
Primary fallback. Ultra-fast inference.
Ultra-fast · ~200ms · Tool calling · 5 models
Llama 3.3 70B · Llama 3.1 70B · Llama 3.1 8B · Gemma 2 9B · Mixtral 8x7B
Cerebras
new
2000+ tok/s. Fastest in the world.
Instant · ~100ms · Tool calling · 2 models
Llama 3.3 70B · Llama 3.1 8B
OpenRouter
active
Most model variety. 200+ models.
Meta-router · ~1.5s · Tool calling · 7 models
Llama 4 Maverick · Llama 4 Scout · Llama 3.3 70B · Qwen 2.5 Coder · DeepSeek R1 · Mistral Large · Gemini 2.5 Pro Exp
NVIDIA NIM
active
8 models. 405B available.
GPU-native · ~800ms · Tool calling · 8 models
Llama 3.3 70B · Llama 3.1 405B · Llama 3.1 70B · Nemotron 70B · DeepSeek R1 · Qwen 2.5 72B · Mistral Large · Gemma 2 27B
Together AI
new
405B Turbo. Free credits.
Open-source Hub · ~1s · Tool calling · 6 models
Llama 3.3 70B Turbo · Llama 3.1 405B Turbo · Llama 3.1 70B Turbo · Qwen 2.5 72B Turbo · DeepSeek R1 · Mixtral 8x22B
SambaNova
new
405B at insane speed. Free tier.
Fast 405B · ~500ms · Tool calling · 4 models
Llama 3.3 70B · Llama 3.1 405B · Llama 3.1 8B · DeepSeek R1
DeepSeek
credits exhausted
R1 reasoning model.
Reasoning · ~2s · Tool calling · 2 models
DeepSeek-V3 · DeepSeek-R1
Sarvam AI
active
Indian language specialist.
Indic NLP · ~1.5s · Chat only · 1 model
Sarvam-M
the architecture
5 Layers, ~1,550 Lines

Clean-Room Architecture

CLI Layer
src/cli.ts
Interactive REPL with streaming, provider switching, session stats
~280 lines
Agent Runner
src/runner.ts
Agentic loop — calls LLM, extracts tool calls, executes, feeds back
~340 lines
LLM Adapter
src/llm/adapter.ts
Universal OpenAI-compatible adapter — one adapter, all 9 providers
~330 lines
Tool Framework
src/tools/framework.ts
Zod-validated tools, registry, concurrent executor, JSON Schema conversion
~260 lines
Built-in Tools
src/tools/built-in.ts
bash, file_read, file_write, file_edit, grep, glob, ls
~340 lines
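The agent runner's loop is the heart of the pattern: call the LLM, execute any tool calls it requests, feed the results back, and repeat until the model answers in plain text or the turn budget runs out. A self-contained sketch with illustrative types (not the actual src/runner.ts interfaces):

```typescript
interface ToolCall { name: string; args: Record<string, unknown>; }
interface LlmReply { text: string; toolCalls: ToolCall[]; }
interface Msg { role: "user" | "assistant" | "tool"; content: string; }

type Llm = (messages: Msg[]) => Promise<LlmReply>;
type Tools = Record<string, (args: Record<string, unknown>) => Promise<string>>;

const MAX_TURNS = 15; // matches the "up to 15 turns" loop limit

// LLM → tools → LLM until the model stops requesting tools.
async function runAgent(llm: Llm, tools: Tools, prompt: string): Promise<string> {
  const messages: Msg[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < MAX_TURNS; turn++) {
    const reply = await llm(messages);
    if (reply.toolCalls.length === 0) return reply.text; // no tools requested: done
    messages.push({ role: "assistant", content: reply.text });
    for (const call of reply.toolCalls) {
      const result = await tools[call.name](call.args); // execute the tool
      messages.push({ role: "tool", content: result }); // feed the result back
    }
  }
  return "max turns reached";
}
```

Everything else (streaming, provider switching, validation) hangs off this loop; the loop itself is commodity code.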
the tools

7 Built-in Tools

bash
Execute shell commands
file_read
Read files with line numbers
file_write
Create and write files
file_edit
Find-and-replace edits
grep
Search code with regex
glob
Find files by pattern
ls
List directory contents
claude code vs. ours
Feature Matrix

What We Matched. What's Left.

Based on the claurst reverse-engineering of Claude Code internals. 12 of 16 features shipped. 4 planned.

| Feature | Claude | Ours | Note |
| --- | --- | --- | --- |
| Agentic Tool Loop | ✓ | ✓ | LLM → tools → LLM, up to 15 turns |
| Bash Execution | ✓ | ✓ | Shell commands with timeout + cwd |
| File Read/Write/Edit | ✓ | ✓ | Find-and-replace, line numbers, create dirs |
| Code Search (grep/glob) | ✓ | ✓ | ripgrep with glob filters |
| Streaming Output | ✓ | ✓ | Real-time token streaming via SSE |
| Multi-Provider (9) | n/a | ✓ | Hot-switch with /provider command |
| 38 Models | n/a | ✓ | Gemini, Llama, Mistral, Qwen, DeepSeek, Gemma |
| Auto-Fallback | n/a | ✓ | 429? Silently tries next provider |
| Quality System Prompt | ✓ | ✓ | 2000+ token behavioral instructions |
| Chain-of-Thought | ✓ | ✓ | Internal reasoning, clean output |
| Parallel Tool Calls | ✓ | ✓ | Concurrent execution with semaphore |
| Conversation History | ✓ | ✓ | Multi-turn with full context |
| Coordinator Mode | ✓ | planned | Multi-agent orchestration |
| Memory Consolidation | ✓ | planned | autoDream pattern |
| Permission System | ✓ | planned | Risk classification |
| Browser Tool | ✓ | planned | Web browsing |

✓ = shipped · planned = on the roadmap · n/a = not applicable
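The "Parallel Tool Calls" entry hides a small concurrency trick worth showing: a counting semaphore lets tool executions run concurrently while capping how many are in flight at once. An illustrative sketch, not the project's actual executor:

```typescript
// Counting semaphore: acquire() waits when no permits remain,
// release() hands a freed permit straight to the next waiter.
class Semaphore {
  private queue: (() => void)[] = [];
  constructor(private permits: number) {}
  async acquire(): Promise<void> {
    if (this.permits > 0) { this.permits--; return; }
    await new Promise<void>((resolve) => this.queue.push(resolve));
  }
  release(): void {
    const next = this.queue.shift();
    if (next) next();
    else this.permits++;
  }
}

// Run all tasks concurrently, but at most `limit` at a time.
// Promise.all preserves input order in the results.
async function runConcurrent<T>(tasks: (() => Promise<T>)[], limit: number): Promise<T[]> {
  const sem = new Semaphore(limit);
  return Promise.all(tasks.map(async (task) => {
    await sem.acquire();
    try { return await task(); }
    finally { sem.release(); }
  }));
}
```

The `finally` matters: a tool that throws still releases its permit, so one failed call never starves the rest of the batch.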
the thesis

The Gap Is Closing.

Claude Opus 4.6 is a frontier model. The architecture around it — the agentic loop, tool calling, streaming — is commodity code. ~1,550 lines of TypeScript replicate the entire pattern.

What makes it feel magical is the model quality — knowing when to read before editing, when to grep before writing, when to stop. Gemini 2.5 Pro gets you 85% of the way. Llama 3.1 405B on Together or SambaNova closes the gap further. Both are free.

Stack quality prompting + chain-of-thought + auto-fallback across 9 providers, and you have a system that produces frontier-competitive output from entirely open-source models. The remaining difference is taste, not capability.

9 Providers · 38 Models · 7 Built-in Tools · ~1.5K Lines of Code · $0 Cost

Built on open-multi-agent (MIT). Internals informed by claurst. No proprietary code was used.