Reverse-Engineering Claude Code

Open-Source Models.
Frontier Quality.

9 providers. 38 models. Quality-tuned prompting with chain-of-thought reasoning and auto-fallback. The same agentic architecture as Claude Code — bash, file ops, code search, streaming — running on Gemini, Llama, Mistral, Qwen, DeepSeek, and more.

how we match quality
5 Techniques Stacked

How Open-Source Approaches Opus

Model quality is the bottleneck. These techniques compensate — disciplined prompting makes weaker models behave like stronger ones.

High
Quality System Prompt
2000+ tokens of behavioral instructions. Think internally, self-critique, verify before outputting. Never expose scaffolding.
High
Chain-of-Thought (Internal)
Models reason step-by-step silently. Only the final, polished answer is shown. No leaked "### Planning" headers.
Medium
Auto-Fallback Chain
Gemini 429s? Silently falls back to Groq → OpenRouter → NVIDIA → Cerebras → Together → SambaNova. User never sees an error.
Medium
Low Temperature (0.4)
Reduced randomness for more deterministic, accurate outputs. Higher precision on code and factual claims.
Low
8K Token Output
Double the default max_tokens. Complex answers aren't truncated mid-thought.
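Stacked together, these techniques come down to a disciplined system prompt plus a handful of request parameters. A minimal sketch of how they combine into one chat-completion request; the names (`buildRequest`, `SYSTEM_PROMPT`) are illustrative, and the prompt shown is a short stand-in for the real 2000+ token version:

```typescript
// Short stand-in for the 2000+ token behavioral prompt (techniques 1 + 2).
const SYSTEM_PROMPT = [
  "You are a careful coding assistant.",
  "Reason step-by-step internally; never expose planning headers or scaffolding.",
  "Self-critique and verify claims before answering.",
  "Output only the final, polished answer.",
].join("\n");

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assemble one OpenAI-style chat-completion request with the stacked settings.
function buildRequest(history: ChatMessage[], userInput: string) {
  return {
    messages: [
      { role: "system", content: SYSTEM_PROMPT }, // quality prompt + internal CoT
      ...history,                                 // multi-turn conversation history
      { role: "user", content: userInput },
    ],
    temperature: 0.4, // low temperature: fewer hallucinated details in code
    max_tokens: 8192, // 8K output: complex answers aren't truncated mid-thought
    stream: true,     // tokens stream to the REPL as they arrive
  };
}
```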
the providers
9 Providers, 1 Adapter

All OpenAI-Compatible

Every provider speaks the OpenAI wire format. One universal adapter with different base URLs. If one provider rate-limits, the system silently falls back to the next.
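The fallback pattern is small enough to sketch. The provider list, URLs, and `completeWithFallback` below are illustrative stand-ins for the real adapter (the actual chain is Gemini → Groq → OpenRouter → NVIDIA → Cerebras → Together → SambaNova), and the API keys are placeholders:

```typescript
// One OpenAI-compatible adapter, many base URLs, silent fallback on rate limits.
interface Provider {
  name: string;
  baseUrl: string; // every provider accepts POST {baseUrl}/chat/completions
  apiKey: string;
}

// Illustrative subset of the chain; keys are placeholders.
const CHAIN: Provider[] = [
  { name: "gemini",     baseUrl: "https://generativelanguage.googleapis.com/v1beta/openai", apiKey: "..." },
  { name: "groq",       baseUrl: "https://api.groq.com/openai/v1",                          apiKey: "..." },
  { name: "openrouter", baseUrl: "https://openrouter.ai/api/v1",                            apiKey: "..." },
];

// Try each provider in order; a 429 (or a network error) moves to the next.
// The LLM call itself is injected so the chain logic stays testable.
async function completeWithFallback(
  body: unknown,
  call: (p: Provider, body: unknown) => Promise<Response>,
  chain: Provider[] = CHAIN,
): Promise<{ provider: string; response: Response }> {
  for (const provider of chain) {
    try {
      const response = await call(provider, body);
      if (response.status === 429) continue; // rate-limited: fall through silently
      return { provider: provider.name, response };
    } catch {
      continue; // network error: try the next provider
    }
  }
  throw new Error("all providers exhausted");
}
```

Because every provider speaks the same wire format, the fallback is just a different base URL; no per-provider translation layer is needed.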

Google Gemini
active
Default. Closest to Opus.
Strongest · ~1s · Tool calling · 3 models
Gemini 2.5 Pro · Gemini 2.5 Flash · Gemini 2.0 Flash
Groq
active
Primary fallback. Ultra-fast inference.
Ultra-fast · ~200ms · Tool calling · 5 models
Llama 3.3 70B · Llama 3.1 70B · Llama 3.1 8B · Gemma 2 9B · Mixtral 8x7B
Cerebras
new
2000+ tok/s. Fastest in the world.
Instant · ~100ms · Tool calling · 2 models
Llama 3.3 70B · Llama 3.1 8B
OpenRouter
active
Most model variety. 200+ models.
Meta-router · ~1.5s · Tool calling · 7 models
Llama 4 Maverick · Llama 4 Scout · Llama 3.3 70B · Qwen 2.5 Coder · DeepSeek R1 · Mistral Large · Gemini 2.5 Pro Exp
NVIDIA NIM
active
8 models. 405B available.
GPU-native · ~800ms · Tool calling · 8 models
Llama 3.3 70B · Llama 3.1 405B · Llama 3.1 70B · Nemotron 70B · DeepSeek R1 · Qwen 2.5 72B · Mistral Large · Gemma 2 27B
Together AI
new
405B Turbo. Free credits.
Open-source Hub · ~1s · Tool calling · 6 models
Llama 3.3 70B Turbo · Llama 3.1 405B Turbo · Llama 3.1 70B Turbo · Qwen 2.5 72B Turbo · DeepSeek R1 · Mixtral 8x22B
SambaNova
new
405B at insane speed. Free tier.
Fast 405B · ~500ms · Tool calling · 4 models
Llama 3.3 70B · Llama 3.1 405B · Llama 3.1 8B · DeepSeek R1
DeepSeek
credits exhausted
R1 reasoning model.
Reasoning · ~2s · Tool calling · 2 models
DeepSeek-V3 · DeepSeek-R1
Sarvam AI
active
Indian language specialist.
Indic NLP · ~1.5s · Chat only · 1 model
Sarvam-M
the architecture
5 Layers, ~1,550 Lines

Clean-Room Architecture

CLI Layer
src/cli.ts
Interactive REPL with streaming, provider switching, session stats
~280 lines
Agent Runner
src/runner.ts
Agentic loop — calls LLM, extracts tool calls, executes, feeds back
~340 lines
LLM Adapter
src/llm/adapter.ts
Universal OpenAI-compatible adapter — one adapter, all 9 providers
~330 lines
Tool Framework
src/tools/framework.ts
Zod-validated tools, registry, concurrent executor, JSON Schema conversion
~260 lines
Built-in Tools
src/tools/built-in.ts
bash, file_read, file_write, file_edit, grep, glob, ls
~340 lines
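The agent runner's loop is the heart of the pattern: call the LLM, execute any tool calls it requests, feed the results back, and repeat until the model answers in plain text or the turn budget runs out. A self-contained sketch with illustrative types (not the actual src/runner.ts interfaces):

```typescript
interface ToolCall { name: string; args: Record<string, unknown>; }
interface LlmReply { text: string; toolCalls: ToolCall[]; }
interface Msg { role: "user" | "assistant" | "tool"; content: string; }

type Llm = (messages: Msg[]) => Promise<LlmReply>;
type Tools = Record<string, (args: Record<string, unknown>) => Promise<string>>;

const MAX_TURNS = 15; // matches the "up to 15 turns" loop limit

// LLM → tools → LLM until the model stops requesting tools.
async function runAgent(llm: Llm, tools: Tools, prompt: string): Promise<string> {
  const messages: Msg[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < MAX_TURNS; turn++) {
    const reply = await llm(messages);
    if (reply.toolCalls.length === 0) return reply.text; // no tools requested: done
    messages.push({ role: "assistant", content: reply.text });
    for (const call of reply.toolCalls) {
      const result = await tools[call.name](call.args); // execute the tool
      messages.push({ role: "tool", content: result }); // feed the result back
    }
  }
  return "max turns reached";
}
```

Everything else (streaming, provider switching, validation) hangs off this loop; the loop itself is commodity code.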
the tools

7 Built-in Tools

bash
Execute shell commands
file_read
Read files with line numbers
file_write
Create and write files
file_edit
Find-and-replace edits
grep
Search code with regex
glob
Find files by pattern
ls
List directory contents
claude code vs. ours
Feature Matrix

What We Matched. What's Left.

Based on the claurst reverse-engineering of Claude Code internals. 12 of 16 features shipped. 4 planned.

| Feature | Claude | Ours | Note |
| --- | --- | --- | --- |
| Agentic Tool Loop | ✓ | ✓ | LLM → tools → LLM, up to 15 turns |
| Bash Execution | ✓ | ✓ | Shell commands with timeout + cwd |
| File Read/Write/Edit | ✓ | ✓ | Find-and-replace, line numbers, create dirs |
| Code Search (grep/glob) | ✓ | ✓ | ripgrep with glob filters |
| Streaming Output | ✓ | ✓ | Real-time token streaming via SSE |
| Multi-Provider (9) | n/a | ✓ | Hot-switch with /provider command |
| 38 Models | n/a | ✓ | Gemini, Llama, Mistral, Qwen, DeepSeek, Gemma |
| Auto-Fallback | n/a | ✓ | 429? Silently tries next provider |
| Quality System Prompt | ✓ | ✓ | 2000+ token behavioral instructions |
| Chain-of-Thought | ✓ | ✓ | Internal reasoning, clean output |
| Parallel Tool Calls | ✓ | ✓ | Concurrent execution with semaphore |
| Conversation History | ✓ | ✓ | Multi-turn with full context |
| Coordinator Mode | ✓ | planned | Multi-agent orchestration |
| Memory Consolidation | ✓ | planned | autoDream pattern |
| Permission System | ✓ | planned | Risk classification |
| Browser Tool | ✓ | planned | Web browsing |

✓ = shipped · planned = on the roadmap · n/a = not applicable
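The "Parallel Tool Calls" entry hides a small concurrency trick worth showing: a counting semaphore lets tool executions run concurrently while capping how many are in flight at once. An illustrative sketch, not the project's actual executor:

```typescript
// Counting semaphore: acquire() waits when no permits remain,
// release() hands a freed permit straight to the next waiter.
class Semaphore {
  private queue: (() => void)[] = [];
  constructor(private permits: number) {}
  async acquire(): Promise<void> {
    if (this.permits > 0) { this.permits--; return; }
    await new Promise<void>((resolve) => this.queue.push(resolve));
  }
  release(): void {
    const next = this.queue.shift();
    if (next) next();
    else this.permits++;
  }
}

// Run all tasks concurrently, but at most `limit` at a time.
// Promise.all preserves input order in the results.
async function runConcurrent<T>(tasks: (() => Promise<T>)[], limit: number): Promise<T[]> {
  const sem = new Semaphore(limit);
  return Promise.all(tasks.map(async (task) => {
    await sem.acquire();
    try { return await task(); }
    finally { sem.release(); }
  }));
}
```

The `finally` matters: a tool that throws still releases its permit, so one failed call never starves the rest of the batch.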
the thesis

The Gap Is Closing.

Claude Opus 4.6 is a frontier model. The architecture around it — the agentic loop, tool calling, streaming — is commodity code. ~1,550 lines of TypeScript replicate the entire pattern.

What makes it feel magical is the model quality — knowing when to read before editing, when to grep before writing, when to stop. Gemini 2.5 Pro gets you 85% of the way. Llama 3.1 405B on Together or SambaNova closes the gap further. Both are free.

Stack quality prompting + chain-of-thought + auto-fallback across 9 providers, and you have a system that produces frontier-competitive output from entirely open-source models. The remaining difference is taste, not capability.

9 Providers · 38 Models · 7 Built-in Tools · ~1.5K Lines of Code · $0 Cost

Built on open-multi-agent (MIT). Internals informed by claurst. No proprietary code was used.