ButlerBot

A multi-strategy algorithmic trading framework with a quant validation pipeline, designed to be driven through Claude Code. Four AI agents pressure-test every strategy through five gates before it ever touches paper trading.

The problem

Most retail algo trading goes one of two ways. Either you write a backtest script, see a 3.0 Sharpe, and lose money in week one because you forgot slippage and overfit to noise. Or you spend six months building a research stack and never deploy anything because the surface area got too big.

ButlerBot is the version of this I wish I had on day one. It treats strategy development as a research-to-live pipeline with explicit gates, opinionated defaults (friction always applied, no look-ahead, fail-closed on feed outages), and a quant team of AI agents whose default answer is REJECT.

About this version

ButlerBot is a generalised, simplified version of a private trading bot (artemisBot). The included EMA crossover strategy is a textbook example, and the validation tests are intentionally basic. The framework is designed to be extended — bring your own strategies, add your own adversarial tests, customise the pipeline.

Driven through Claude Code

You don't need to know Python or the command line to use it. Clone the repo, open it in Claude Code, type /start, and follow along. From there it's plain English (“test this idea”, “what was my last backtest”, “try a different timeframe”) or these slash commands:

/start

First-run orientation. Checks your setup, walks through anything missing (Alpaca keys, dependencies), and recommends what to do next.

/backtest

Runs a backtest on a strategy and explains the results in plain English — no Sharpe-ratio jargon dump.

/new-thesis

Coaches a rough trading idea ("what if I bought when X happens?") into a testable strategy. Drafts both the thesis and the strategy code.

/test-thesis

Runs the 4-agent adversarial validation pipeline on a strategy. Narrates each gate and translates failures into next steps.

The quant team

Every strategy passes through four agents and five gates before it deploys to paper. Skip a gate and the pipeline halts. The Adversarial Statistician's default is REJECT — strategies are guilty until proven robust.

Hypothesizer

Validates economic rationale

Thesis must have a structural driver — not just a backtest that happens to look good.

Data Archaeologist

Data quality + feature engineering

Missing data under 5%, no survivorship bias, no look-ahead leaks in features.

Model Architect

Builds and backtests the model

In-sample Sharpe ≥ 0.5, max drawdown under 30%.

Adversarial Statistician

Three validation tests, default REJECT

Out-of-sample Sharpe holds ≥ 50% of in-sample. Param sensitivity stays low (Sharpe std < 0.3). Doubling friction doesn't kill profitability.
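The numeric thresholds above can be written down as a single default-REJECT check. This is a minimal sketch, not the framework's actual code: `BacktestStats`, `failed_checks`, and `verdict` are hypothetical names, "Sharpe std" is assumed to be the population standard deviation across nearby parameter sets, and "doesn't kill profitability" is assumed to mean a positive net return with costs doubled.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class BacktestStats:
    """Hypothetical container for the numbers the gates inspect."""
    in_sample_sharpe: float
    out_of_sample_sharpe: float
    max_drawdown: float                # fraction, e.g. 0.18 for 18%
    missing_data_fraction: float       # fraction of bars that are missing
    sharpe_across_params: list[float]  # Sharpe for each nearby parameter set
    net_return_double_friction: float  # total return with costs doubled


def failed_checks(s: BacktestStats) -> list[str]:
    """Return the failed checks; an empty list means every threshold held."""
    failures = []
    if s.missing_data_fraction >= 0.05:
        failures.append("data: missing data is not under 5%")
    if s.in_sample_sharpe < 0.5:
        failures.append("model: in-sample Sharpe is below 0.5")
    if s.max_drawdown >= 0.30:
        failures.append("model: max drawdown is not under 30%")
    if s.out_of_sample_sharpe < 0.5 * s.in_sample_sharpe:
        failures.append("stats: out-of-sample Sharpe is under 50% of in-sample")
    if pstdev(s.sharpe_across_params) >= 0.3:
        failures.append("stats: Sharpe std across parameters is not under 0.3")
    if s.net_return_double_friction <= 0:
        failures.append("stats: unprofitable with friction doubled")
    return failures


def verdict(s: BacktestStats) -> str:
    """REJECT unless every check passes."""
    return "REJECT" if failed_checks(s) else "PASS"


robust = BacktestStats(1.2, 0.8, 0.18, 0.01, [1.1, 1.2, 1.3], 0.04)
overfit = BacktestStats(3.0, 0.4, 0.12, 0.0, [3.0, 1.0, 0.2], -0.02)
print(verdict(robust), verdict(overfit))
for reason in failed_checks(overfit):
    print(" -", reason)
```

The `overfit` example is the 3.0-Sharpe backtest from the opening section: it looks excellent in-sample and still fails all three Adversarial Statistician tests.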

Architecture

The pipeline is linear by design — each agent gates the next, and nothing reaches live execution without surviving the previous stage.

Thesis YAML

→ Hypothesizer [GATE 1]

→ Data Archaeologist [GATE 2]

→ Model Architect [GATE 3]

→ Adversarial Stats [GATE 4]

→ Paper Deploy [GATE 5]
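The linear, halt-on-failure flow above can be sketched as a loop over gates. Everything here is illustrative: `run_pipeline` is a hypothetical name, the thesis is a plain dict rather than the real thesis YAML, and the gate functions are toy stand-ins for the agents.

```python
from typing import Callable

# A gate takes the thesis and returns (passed, reason).
Gate = Callable[[dict], tuple[bool, str]]


def run_pipeline(thesis: dict, gates: list[tuple[str, Gate]]) -> dict:
    """Run gates in order and stop at the first failure."""
    log = []
    for name, gate in gates:
        passed, reason = gate(thesis)
        log.append((name, passed, reason))
        if not passed:
            # Later gates never run, so nothing downstream can deploy.
            return {"deployed": False, "halted_at": name, "log": log}
    return {"deployed": True, "halted_at": None, "log": log}


# Toy gates with made-up field names, ordered as in the diagram above.
GATES: list[tuple[str, Gate]] = [
    ("Hypothesizer",
     lambda t: (bool(t.get("structural_driver")), "needs a structural driver")),
    ("Data Archaeologist",
     lambda t: (t["missing_fraction"] < 0.05, "missing data must be under 5%")),
    ("Model Architect",
     lambda t: (t["in_sample_sharpe"] >= 0.5, "in-sample Sharpe must be >= 0.5")),
    ("Adversarial Stats",
     lambda t: (t["oos_sharpe"] >= 0.5 * t["in_sample_sharpe"],
                "out-of-sample Sharpe must hold >= 50% of in-sample")),
    ("Paper Deploy",
     lambda t: (True, "ready for paper trading")),
]

thesis = {
    "structural_driver": "momentum persistence after earnings",
    "missing_fraction": 0.08,   # fails GATE 2
    "in_sample_sharpe": 1.1,
    "oos_sharpe": 0.7,
}
result = run_pipeline(thesis, GATES)
print(result["halted_at"], len(result["log"]))
```

Because the loop returns on the first failure, a thesis with bad data never gets a backtest at all, which is the point of ordering the gates this way.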

Underneath that, the runtime data flow:

data_ingestion (Alpaca + parquet cache)

↓

strategy_engine (BaseStrategy + registry)

↓

ml_regime_filter (RandomForest)

↓

backtester (vectorbt, friction always on)

↓

execution (Alpaca, atomic locking)

↓

watchdog (heartbeat, fail-closed)

Dashboard

A FastAPI + Plotly + HTMX dashboard runs at localhost:8050. Ask Claude (“open the dashboard”) or run python -m dashboard.app directly.

Bot Status: watchdog health, positions, regime state

Research Pipeline: view/resume validation runs, agent audit logs

Backtests & Results: equity curves, tearsheet comparisons

Paper Strategies: strategies deployed for forward testing

Strategy 101: educational content on algo trading with historical case studies

Logs: real-time log viewer with level filtering

Five rules I won't break

The full set lives in DESIGN_SPEC.md (twenty rules total). These five are the ones that actually decide whether a backtest is honest.

No look-ahead bias

Row i uses data[0..i-1] only. Every indicator is shifted one bar with .shift(1) before use.

Friction always applied

Commission 0.001, slippage 0.001 in every backtest. Zero-fee results are rejected.

alpaca-py v3 only

Never the deprecated alpaca-trade-api. Pin alpaca-py ≥ 0.30.

Atomic state locking

The entire order cycle runs under threading.Lock — no concurrent order mutations.

Fail-closed

Feed dead more than 10s = cancel all open orders immediately.
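The first two rules above, no look-ahead and friction always applied, can be illustrated with a toy EMA crossover in plain Python. This is a sketch under stated assumptions rather than the project's backtester (which uses vectorbt): the function names are hypothetical, the EMA spans are arbitrary, and friction is assumed to be charged as commission plus slippage on every change of position.

```python
def ema(values, span):
    """Exponential moving average, seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def positions(closes, fast=3, slow=6):
    """Long (1) or flat (0) per bar, using only data from earlier bars."""
    fast_ema, slow_ema = ema(closes, fast), ema(closes, slow)
    # raw[i] is computed from closes[0..i], so it cannot be acted on in bar i.
    raw = [1 if f > s else 0 for f, s in zip(fast_ema, slow_ema)]
    # Equivalent of .shift(1): the position for bar i comes from raw[i-1].
    return [0] + raw[:-1]


def backtest(closes, commission=0.001, slippage=0.001):
    """Total return of the crossover, with costs charged on every trade."""
    pos = positions(closes)
    cost_per_trade = commission + slippage
    equity = 1.0
    for i in range(1, len(closes)):
        bar_return = closes[i] / closes[i - 1] - 1
        traded = abs(pos[i] - pos[i - 1])  # 1 on entry or exit, else 0
        equity *= (1 - cost_per_trade * traded) * (1 + pos[i] * bar_return)
    return equity - 1


closes = [100, 101, 102, 104, 107, 110, 108, 105, 101, 98, 97]
net = backtest(closes)
gross = backtest(closes, commission=0.0, slippage=0.0)
print(f"net={net:.4f} gross={gross:.4f}")
```

A quick way to check for look-ahead in a sketch like this is to change the final close and confirm that no position moves: if `positions(closes)` differs from `positions(closes[:-1] + [999.0])`, some bar is peeking at data it should not have.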

Tech stack

Python 3.12: core language

alpaca-py ≥ 0.30: v3 SDK for market data and order execution

vectorbt: vectorized backtesting via Portfolio.from_orders()

scikit-learn: RandomForest regime filter (trending vs choppy markets)

SQLite: order persistence + research audit trail

FastAPI + Plotly + HTMX: localhost web dashboard

Docker: containerized deployment

Quick start

You need free Alpaca paper trading API keys (sign up at alpaca.markets) and Claude Code installed. Everything else is in the repo.

# Clone and open in Claude Code
git clone https://github.com/DiffTheEnder/butlerBot.git
cd butlerBot
claude

# Then in Claude Code
/start

From there Claude walks you through Alpaca keys, dependencies, and your first backtest in plain English.

Advanced: direct CLI

Skip Claude Code if you'd rather drive ButlerBot directly from a terminal — everything works as a normal Python project. Three modes:

Backtest

python main.py --mode backtest

Run a strategy against historical data with friction always applied.

Research

python main.py --mode research --thesis <name>

Send a strategy through the full 4-agent, 5-gate validation pipeline.

Live

python main.py --mode live

Paper or live trading via Alpaca with watchdog monitoring and fail-closed behaviour.
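The fail-closed behaviour in live mode, combined with the atomic-locking rule, can be sketched as a small watchdog. This is an illustration under assumptions, not the project's watchdog: `FailClosedWatchdog` and its methods are hypothetical, the order-cancelling callback is injected (with alpaca-py, `TradingClient.cancel_orders` would be a plausible candidate), and staying halted after a trip is my own choice, since the source does not say how trading resumes.

```python
import threading
import time


class FailClosedWatchdog:
    """Cancel all open orders if the data feed goes silent for too long."""

    def __init__(self, cancel_all_orders, max_silence_s=10.0, clock=time.monotonic):
        self._cancel_all_orders = cancel_all_orders  # injected callback
        self._max_silence_s = max_silence_s
        self._clock = clock
        self._lock = threading.Lock()  # one lock guards all state changes
        self._last_heartbeat = clock()
        self.halted = False

    def heartbeat(self):
        """Call whenever the feed delivers data."""
        with self._lock:
            self._last_heartbeat = self._clock()

    def check(self):
        """Return True if trading may continue; trip and cancel otherwise."""
        with self._lock:
            if self.halted:
                return False  # assumption: stay halted until a manual reset
            if self._clock() - self._last_heartbeat > self._max_silence_s:
                self.halted = True
                self._cancel_all_orders()
                return False
            return True


# Usage with a fake clock so the example runs instantly.
now = [0.0]
cancelled = []
watchdog = FailClosedWatchdog(lambda: cancelled.append("all"), clock=lambda: now[0])

now[0] = 5.0
ok_at_5s = watchdog.check()      # feed silent for 5s: still fine
now[0] = 10.5
ok_at_10s = watchdog.check()     # silent for more than 10s: trips
watchdog.heartbeat()
ok_after_recovery = watchdog.check()  # feed is back, but still halted
print(ok_at_5s, ok_at_10s, ok_after_recovery, cancelled)
```

Injecting the clock and the cancel callback keeps the timing rule testable without a broker connection or real waiting.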

Disclaimer

Educational and research purposes only. Not financial advice. Always use paper trading first. Past performance does not guarantee future results. Markets stay irrational longer than your conviction holds.


Open source · MIT licence
