LLM-driven formulaic alpha mining with typed operators, structured memory, strict runtime recomputation, and a Phase 2 Helix research lane
FactorMiner is a research framework for discovering interpretable alpha factors from market data. It combines:
- a typed DSL over OHLCV-style market features
- an LLM-guided mining loop
- structured experience memory
- library admission and replacement based on predictive power and orthogonality
- strict runtime recomputation for analysis and benchmark reporting
- an extended Helix lane for Phase 2 retrieval, canonicalization, and post-admission validation
The implementation is based on FactorMiner: A Self-Evolving Agent with Skills and Experience Memory for Financial Alpha Discovery (Wang et al., 2026), then extended with a cleaner architecture layer and a broader research surface.
Current implementation focus:
- canonical paper-style and research mining lanes
- typed DSL operators for OHLCV-style factor formulas
- 110 paper factors shipped in the built-in catalog
- runtime recomputation for analysis and benchmark reporting
- CI-backed lint, test, package, CLI smoke, and benchmark-smoke checks
For live local counts, run:
```bash
uv run pytest --collect-only -q factorminer/tests
uv run python - <<'PY'
from pathlib import Path

files = sorted(Path("factorminer").rglob("*.py"))
# splitlines() avoids counting a phantom extra line after a trailing newline
lines = sum(len(p.read_text(errors="ignore").splitlines()) for p in files)
print(f"Python files: {len(files)}")
print(f"Python lines: {lines}")
PY
```

Primary execution surfaces:

- `RalphLoop`: canonical paper-style mining loop
- `HelixLoop`: Phase 2 research loop with optional retrieval and validation extensions
- `factorminer.benchmark.runtime`: canonical benchmark runner
- `factorminer.architecture`: canonical contracts, policies, stages, and services

Documentation:

- Architecture Deep Dive
- Metric Semantics
- Paper Claims Matrix
- Benchmark Baselines
- FAQ
- Reproducibility Guide
- Binance Reproduction Notes
- Bundled Data Notes
- Repo Audit
- Contributing
- Roadmap
```mermaid
flowchart TD
A["Market Data"] --> B["DatasetContract"]
B --> C["Typed DSL + Operator Registry"]
C --> D["Ralph / Helix Stage Pipeline"]
D --> E["EvaluationKernel"]
E --> F["FactorAdmissionService"]
F --> G["FactorLibrary"]
D --> H["MemoryPolicy"]
H --> I["PromptContextBuilder"]
I --> D
G --> J["Runtime Analysis"]
G --> K["Runtime Benchmarks"]
H --> K
B --> K
```
Two execution lanes share the same core contracts:

| Lane | Purpose | Canonical loop | Typical use |
|---|---|---|---|
| Paper lane | strict, benchmark-facing mining | `RalphLoop` | reproducible paper-style runs, library freeze, runtime evaluation |
| Helix lane | extended research mode | `HelixLoop` | debate, KG retrieval, family-aware prompts, canonicalization, Phase 2 validation |

Factors are formulas over the canonical feature set:
```text
$open, $high, $low, $close, $volume, $amt, $vwap, $returns
```
The DSL is parsed into expression trees, executed through the operator registry, and recomputed on demand during analysis and benchmarks.
Paper appendix operator names such as SignedPower, Med, Rsquare,
Slope, Resi, Eq, Min2, Max2, TsDecay, and Scale are accepted by
the parser.
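
The parse, execute, and recompute flow can be sketched in plain Python: tokenize a formula, build a nested expression tree, and evaluate it through a name-to-function registry on whatever dataset is supplied. This is a hypothetical toy, not FactorMiner's parser or operator registry. The tuple-based tree, the `OPERATORS` table, the operator subset (`Add`, `Sub`, `Mul`, `Div`, `Neg`, `Delta`), and the one-list-per-feature data layout are all assumptions.

```python
import re

# Tokens: $features, operator names, numeric literals, and punctuation.
TOKEN = re.compile(r"\s*(\$\w+|[A-Za-z_]\w*|\d+(?:\.\d+)?|[(),])")


def tokenize(formula):
    formula, pos, tokens = formula.strip(), 0, []
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"cannot tokenize {formula[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Recursive descent into nested tuples (a toy expression tree)."""

    def expr(i):
        tok = tokens[i]
        if tok.startswith("$"):
            return ("feature", tok[1:]), i + 1
        if tok[0].isdigit():
            return ("const", float(tok)), i + 1
        if i + 1 >= len(tokens) or tokens[i + 1] != "(":
            raise ValueError(f"expected '(' after {tok!r}")
        args, i = [], i + 2
        while True:
            node, i = expr(i)
            args.append(node)
            if tokens[i] == ")":
                return ("call", tok, args), i + 1
            if tokens[i] != ",":
                raise ValueError(f"unexpected token {tokens[i]!r}")
            i += 1

    tree, end = expr(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


def _elementwise(fn):
    """Lift a scalar function to lists, broadcasting constants and propagating None."""

    def apply(*args):
        n = max(len(a) for a in args if isinstance(a, list))
        cols = [a if isinstance(a, list) else [a] * n for a in args]
        return [None if None in row else fn(*row) for row in zip(*cols)]

    return apply


def _delta(x, d):
    """x[t] - x[t - d]; the first d points have no history and become None."""
    d = int(d)
    head = [None] * min(d, len(x))
    return head + [
        None if x[t] is None or x[t - d] is None else x[t] - x[t - d]
        for t in range(d, len(x))
    ]


# Hypothetical operator registry: DSL name -> implementation.
OPERATORS = {
    "Add": _elementwise(lambda a, b: a + b),
    "Sub": _elementwise(lambda a, b: a - b),
    "Mul": _elementwise(lambda a, b: a * b),
    "Div": _elementwise(lambda a, b: a / b if b != 0 else None),
    "Neg": _elementwise(lambda a: -a),
    "Delta": _delta,
}


def evaluate(node, data):
    """Recompute a factor from its tree on the supplied dataset."""
    kind = node[0]
    if kind == "feature":
        return data[node[1]]
    if kind == "const":
        return node[1]
    _, name, args = node
    return OPERATORS[name](*[evaluate(arg, data) for arg in args])


data = {"open": [10.0, 11.0, 12.0], "close": [11.0, 11.0, 15.0]}
tree = parse(tokenize("Div(Sub($close, $open), $open)"))
print(evaluate(tree, data))  # [0.1, 0.0, 0.25]
```

Because only the formula string is needed to rebuild the tree, calling `evaluate` on a different dataset produces fresh signals instead of reusing stored values, which mirrors the idea behind runtime recomputation.
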
Mining is not plain prompt-and-filter generation. The loop builds a structured retrieval signal from experience memory and library state, then uses it to steer candidate generation.
Supported memory policies:
- `paper`
- `none`
- `kg`
- `family_aware`
- `regime_aware`

Saved library metadata is not treated as the final source of truth for analysis. The evaluate, combine, visualize, and benchmark paths recompute factor signals from formulas on the supplied dataset.
factorminer.benchmark.runtime is the canonical benchmark entry point. It supports:
- Top-K freeze evaluation across universes
- memory ablations
- strategy-grid ablations over memory policy × dependence metric × backend
- cost-pressure evaluation
- operator and factor efficiency benchmarking
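
A strategy grid of this kind is a Cartesian product of its three axes. The sketch below enumerates it with `itertools.product`, using the option values listed for `memory.policy`, `evaluation.redundancy_metric`, and `evaluation.backend` in the configuration notes. Treating the redundancy metrics as the dependence-metric axis and the helper name `strategy_grid` are assumptions; the real runner may filter or order cells differently.

```python
from itertools import product

MEMORY_POLICIES = ("paper", "none", "kg", "family_aware", "regime_aware")
DEPENDENCE_METRICS = ("spearman", "pearson", "distance_correlation")
BACKENDS = ("numpy", "c", "gpu")


def strategy_grid(policies=MEMORY_POLICIES, metrics=DEPENDENCE_METRICS, backends=BACKENDS):
    """Yield one dict per ablation cell (hypothetical helper)."""
    for policy, metric, backend in product(policies, metrics, backends):
        yield {"memory_policy": policy, "dependence_metric": metric, "backend": backend}


# Example: skip the C and GPU backends on a laptop.
cells = list(strategy_grid(backends=("numpy",)))
print(len(cells))  # 15
```

The full grid has 5 × 3 × 3 = 45 cells, so trimming an axis (for example, dropping `gpu` on a machine without CUDA) is the cheapest way to shrink a run.
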
```mermaid
flowchart LR
A["RetrieveStage"] --> B["GenerateStage"]
B --> C["EvaluateStage"]
C --> D["LibraryUpdateStage"]
D --> E["DistillStage"]
E --> A
A -.-> M["MemoryPolicy"]
C -.-> K["EvaluationKernel"]
D -.-> L["FactorAdmissionService"]
B -.-> P["PromptContextBuilder"]
```
The same stage contract is used by both Ralph and Helix. Helix swaps in richer implementations for retrieval, proposal, validation, and distillation without changing the orchestration model.
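
A minimal sketch of what a shared stage contract can look like: every stage takes and returns an iteration context, and the loop only knows an ordered list of stages, so a richer lane is built by swapping one stage object. `IterationContext`, `Stage`, `run_loop`, and the toy stage classes are hypothetical stand-ins, not the classes in `factorminer.architecture`.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class IterationContext:
    """Hypothetical state passed from stage to stage."""

    iteration: int = 0
    memory_hints: list = field(default_factory=list)
    candidates: list = field(default_factory=list)
    library: list = field(default_factory=list)
    log: list = field(default_factory=list)


class Stage(Protocol):
    name: str

    def run(self, ctx: IterationContext) -> IterationContext: ...


class Retrieve:
    name = "retrieve"

    def run(self, ctx):
        # Stand-in for memory retrieval: summarize library state.
        ctx.memory_hints = [f"library_size={len(ctx.library)}"]
        return ctx


class Generate:
    name = "generate"

    def run(self, ctx):
        # Stand-in for candidate proposal.
        ctx.candidates = [f"cand-{ctx.iteration}"]
        return ctx


class DebateGenerate(Generate):
    name = "debate_generate"

    def run(self, ctx):
        # Richer proposal step swapped in for a Helix-style run.
        ctx.candidates = [f"debated-{ctx.iteration}"]
        return ctx


class LibraryUpdate:
    name = "library_update"

    def run(self, ctx):
        # Stand-in for admission: accept every candidate.
        ctx.library.extend(ctx.candidates)
        return ctx


@dataclass
class PassThrough:
    """Placeholder for stages whose real logic is out of scope here."""

    name: str

    def run(self, ctx):
        return ctx


def run_loop(stages: list[Stage], iterations: int) -> IterationContext:
    ctx = IterationContext()
    for i in range(iterations):
        ctx.iteration = i
        for stage in stages:
            ctx = stage.run(ctx)
            ctx.log.append(f"{i}:{stage.name}")
    return ctx


RALPH_STAGES = [Retrieve(), Generate(), PassThrough("evaluate"),
                LibraryUpdate(), PassThrough("distill")]
HELIX_STAGES = [Retrieve(), DebateGenerate(), PassThrough("evaluate"),
                LibraryUpdate(), PassThrough("distill")]

print(run_loop(HELIX_STAGES, 2).library)  # ['debated-0', 'debated-1']
```
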
```bash
git clone https://github.com/minihellboy/factorminer.git
cd factorminer
uv sync --group dev                  # base package + dev tools
uv sync --group dev --extra llm      # adds the llm extra
uv sync --group dev --all-extras     # all optional extras
```

Notes:

- `uv sync --group dev --all-extras` is the intended full contributor setup.
- The GPU extra is Linux-oriented because `cupy-cuda12x` is not generally installable on macOS.
- The packaged default config uses the portable NumPy backend. Pass `--gpu` only when CUDA is available.
- Wheels and sdists include `factorminer/configs/*.yaml` and exclude the internal test package.
- Use `uv run ...` for all local commands.


Plain pip alternative:

```bash
python3 -m pip install -e .
python3 -m pip install -e ".[llm]"
python3 -m pip install -e ".[all]"
```

```bash
uv run python run_demo.py
```

```bash
uv run factorminer quickstart
```

This runs `doctor`, mines a tiny mock library into `/tmp/factorminer-quickstart`, generates a static HTML report, and prints the next commands for real data.

For a runnable, data-shaped walkthrough with sample CSVs and safe /tmp output paths, see examples/quickstart/README.md.
```bash
uv run factorminer --help
```

Primary commands:

- `doctor`
- `init-config`
- `quickstart`
- `validate-data`
- `resample-data`
- `mine`
- `helix`
- `evaluate`
- `combine`
- `visualize`
- `benchmark`
- `export`
- `session inspect`

```bash
uv run factorminer mine --mock -n 2 -b 8 -t 10
```

Omitting `--gpu` and `--cpu` respects the configured backend. The shipped default is `numpy`; `--gpu` and `--cpu` are explicit overrides.

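
The override rule amounts to a small precedence function. This is a hypothetical sketch, not FactorMiner's CLI code; rejecting `--gpu` and `--cpu` together and mapping `--cpu` to the `numpy` backend are assumptions.

```python
def resolve_backend(config_backend: str, gpu: bool = False, cpu: bool = False) -> str:
    """Return the evaluation backend after applying optional CLI overrides."""
    if gpu and cpu:
        # Assumed behavior: conflicting overrides are an error.
        raise ValueError("--gpu and --cpu are mutually exclusive")
    if gpu:
        return "gpu"
    if cpu:
        # Assumption: --cpu selects the portable NumPy backend.
        return "numpy"
    # No flag: respect evaluation.backend from the config.
    return config_backend


print(resolve_backend("c"))             # c
print(resolve_backend("c", gpu=True))   # gpu
```
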
Paper-mode admission and benchmark selection use `ic_paper_mean = abs(mean(IC_t))` and `ic_paper_icir = abs(mean(IC_t)) / std(IC_t)`. The legacy diagnostic `ic_abs_mean = mean(abs(IC_t))` is still reported but is not the default paper quality gate. See Metric Semantics.
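
The difference between the paper metrics and the legacy diagnostic is easiest to see on a toy IC series. A minimal standard-library sketch, assuming the per-period `IC_t` values are already computed and that `std` is the population standard deviation (FactorMiner may use the sample version); `ic_summary` is a hypothetical helper name.

```python
from statistics import fmean, pstdev


def ic_summary(ic_series):
    """Summarize a per-period IC series the three ways described above."""
    mean_ic = fmean(ic_series)
    std_ic = pstdev(ic_series)  # assumption: population std (ddof=0)
    return {
        "ic_paper_mean": abs(mean_ic),
        "ic_paper_icir": abs(mean_ic) / std_ic if std_ic > 0 else float("nan"),
        "ic_abs_mean": fmean(abs(ic) for ic in ic_series),
    }


# A sign-flipping factor: large |IC| every period, no stable direction.
print(ic_summary([0.05, -0.05, 0.05, -0.05]))
# ic_paper_mean and ic_paper_icir are 0.0; ic_abs_mean is 0.05
```

A factor whose IC flips sign every period scores well on `ic_abs_mean` but zero on both paper metrics, which shows how the two can disagree.
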
```bash
uv run factorminer doctor
uv run factorminer doctor --json
```

```bash
uv run factorminer init-config factorminer.local.yaml
uv run factorminer --config factorminer.local.yaml mine --mock
```

```bash
uv run factorminer --cpu helix --mock --debate --canonicalize -n 2 -b 8 -t 10
```

```bash
uv run factorminer --cpu evaluate output/factor_library.json --mock --period both --top-k 10
```

```bash
uv run factorminer --cpu combine output/factor_library.json \
  --mock \
  --fit-period train \
  --eval-period test \
  --method all \
  --selection lasso \
  --top-k 20
```

```bash
uv run factorminer --cpu visualize output/factor_library.json \
  --mock \
  --period test \
  --correlation \
  --ic-timeseries \
  --quintile \
  --tearsheet
```

```bash
uv run factorminer --cpu --config factorminer/configs/paper_repro.yaml \
  benchmark table1 --mock --baseline factor_miner
uv run factorminer --cpu benchmark ablation-strategy --mock --baseline factor_miner
```

```bash
uv run factorminer session inspect output
uv run factorminer session inspect output --json
```

Available benchmark commands:

- `benchmark table1`
- `benchmark ablation-memory`
- `benchmark ablation-strategy`
- `benchmark cost-pressure`
- `benchmark efficiency`
- `benchmark suite`

The benchmark suite uses the runtime recomputation layer and carries protocol, dataset, and runtime-manifest metadata into emitted artifacts. See Benchmark Baselines for which baselines are real, partial, or proxy-backed today.
The default config lives at factorminer/configs/default.yaml.
Top-level config sections:
- `mining`
- `evaluation`
- `data`
- `llm`
- `memory`
- `phase2`
- `benchmark`
- `research`

Important configuration themes:
- `evaluation.backend`: `numpy`, `c`, or `gpu`
- `--gpu` / `--cpu`: explicit CLI backend override; omitted means use config
- `evaluation.redundancy_metric`: `spearman`, `pearson`, or `distance_correlation`
- `memory.policy`: `paper`, `none`, `kg`, `family_aware`, or `regime_aware`
- `benchmark.strategy_ablation.*`: runtime grid over memory policy, dependence metric, and backend
- `research.*`: multi-horizon scoring, uncertainty controls, and selection models

Profile configs shipped in the repo:
- `factorminer/configs/binance_sample.yaml`
- `factorminer/configs/paper_repro.yaml`
- `factorminer/configs/paper_repro_binance.yaml`
- `factorminer/configs/benchmark_full.yaml`
- `factorminer/configs/helix_research.yaml`
- `factorminer/configs/demo_local.yaml`

Input data is expected to include at least:
`datetime`, `asset_id`, `open`, `high`, `low`, `close`, `volume`, `amount`

Accepted column aliases include `code`, `ticker`, `symbol`, `ts_code`, and `amt`. If `vwap` or `returns` are missing, the runtime layer derives them.
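
A dependency-free sketch of alias handling and derivation over plain row dictionaries. The alias names come from the list above, but the mapping targets (`code`, `ticker`, `symbol`, `ts_code` to `asset_id`; `amt` to `amount`), the derivation formulas (`vwap = amount / volume`, `returns` as the per-asset simple close-to-close return), and the helper name `normalize` are assumptions, not the runtime layer's exact rules.

```python
from collections import defaultdict

# Assumed alias targets; the README lists the aliases, not this exact table.
ALIASES = {"code": "asset_id", "ticker": "asset_id", "symbol": "asset_id",
           "ts_code": "asset_id", "amt": "amount"}
REQUIRED = ("datetime", "asset_id", "open", "high", "low", "close", "volume", "amount")


def normalize(rows):
    """Rename aliased columns, check required fields, and derive vwap/returns."""
    out = [{ALIASES.get(k, k): v for k, v in row.items()} for row in rows]
    missing = [c for c in REQUIRED if any(c not in r for r in out)]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    # Group rows per asset so returns never cross asset boundaries.
    by_asset = defaultdict(list)
    for row in out:
        by_asset[row["asset_id"]].append(row)

    for series in by_asset.values():
        series.sort(key=lambda r: r["datetime"])  # sorts this asset's rows only
        prev_close = None
        for row in series:
            if "vwap" not in row:
                row["vwap"] = row["amount"] / row["volume"] if row["volume"] else None
            if "returns" not in row:
                row["returns"] = None if not prev_close else row["close"] / prev_close - 1
            prev_close = row["close"]
    return out


rows = [{"datetime": "2024-01-01", "ticker": "A", "open": 9, "high": 10, "low": 8,
         "close": 10.0, "volume": 50, "amt": 475.0}]
print(normalize(rows)[0]["vwap"])  # 9.5
```
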
```text
factorminer/
├── agent/ LLM providers, prompts, debate
├── architecture/ Canonical contracts, policies, stages, services
├── benchmark/ Runtime benchmark suite and legacy benchmark helpers
├── configs/ YAML profiles
├── core/ Loops, parser, expression trees, factor library, I/O
├── data/ Loaders, preprocessing, tensor building, mock data
├── evaluation/ Metrics, runtime recomputation, analysis, validation
├── memory/ Experience memory, KG retrieval, embeddings
├── operators/ Operator specs, backends, registry
├── tests/ Pytest coverage
└── utils/ Config loading, plotting, reporting
```
- `factorminer.architecture` is now the canonical place for protocol, dataset, memory, evaluation, stage, and prompt boundaries.
- `factorminer.benchmark.runtime` is the canonical benchmark runner.
- `factorminer.benchmark.helix_benchmark` and `run_phase2_benchmark.py` are still present, but they are legacy-facing compared with the runtime suite.
- `output/` is ignored and should be treated as mutable runtime state, not source-controlled project state.
```bash
uv run pytest -q factorminer/tests
uv run ruff check .
uv build
```

Full-repo mypy is intentionally non-blocking for now. The current cleanup target type-checks a fixed stabilization surface and reports the results without making CI depend on them:

```bash
uv run mypy --ignore-missing-imports --follow-imports=skip \
  factorminer/utils/config.py \
  factorminer/evaluation/runtime.py \
  factorminer/operators/sandbox.py \
  factorminer/operators/custom.py \
  factorminer/operators/auto_inventor.py \
  factorminer/memory/evolution.py \
  factorminer/memory/online_regime_memory.py \
  factorminer/cli.py
```

MIT. See LICENSE.