FactorMiner

LLM-driven formulaic alpha mining with typed operators, structured memory, strict runtime recomputation, and a Phase 2 Helix research lane

FactorMiner is a research framework for discovering interpretable alpha factors from market data. It combines:

a typed DSL over OHLCV-style market features
an LLM-guided mining loop
structured experience memory
library admission and replacement based on predictive power and orthogonality
strict runtime recomputation for analysis and benchmark reporting
an extended Helix lane for Phase 2 retrieval, canonicalization, and post-admission validation

The implementation is based on FactorMiner: A Self-Evolving Agent with Skills and Experience Memory for Financial Alpha Discovery (Wang et al., 2026), then extended with a cleaner architecture layer and a broader research surface.

Repository Status

Current implementation focus:

canonical paper-style and research mining lanes
typed DSL operators for OHLCV-style factor formulas
110 paper factors shipped in the built-in catalog
runtime recomputation for analysis and benchmark reporting
CI-backed lint, test, package, CLI smoke, and benchmark-smoke checks

For live local counts of collected tests and Python source lines, run:

uv run pytest --collect-only -q factorminer/tests
uv run python - <<'PY'
from pathlib import Path
files = sorted(Path("factorminer").rglob("*.py"))
lines = sum(len(p.read_text(errors="ignore").splitlines()) for p in files)
print(f"Python files: {len(files)}")
print(f"Python lines: {lines}")
PY

Primary execution surfaces:

RalphLoop: canonical paper-style mining loop
HelixLoop: Phase 2 research loop with optional retrieval and validation extensions
factorminer.benchmark.runtime: canonical benchmark runner
factorminer.architecture: canonical contracts, policies, stages, and services

Documentation Map

Architecture At A Glance

flowchart TD
    A["Market Data"] --> B["DatasetContract"]
    B --> C["Typed DSL + Operator Registry"]
    C --> D["Ralph / Helix Stage Pipeline"]
    D --> E["EvaluationKernel"]
    E --> F["FactorAdmissionService"]
    F --> G["FactorLibrary"]
    D --> H["MemoryPolicy"]
    H --> I["PromptContextBuilder"]
    I --> D
    G --> J["Runtime Analysis"]
    G --> K["Runtime Benchmarks"]
    H --> K
    B --> K

Two execution lanes share the same core contracts:

| Lane | Purpose | Canonical loop | Typical use |
| --- | --- | --- | --- |
| Paper lane | strict, benchmark-facing mining | `RalphLoop` | reproducible paper-style runs, library freeze, runtime evaluation |
| Helix lane | extended research mode | `HelixLoop` | debate, KG retrieval, family-aware prompts, canonicalization, Phase 2 validation |

Core Concepts

1. Typed factor DSL

Factors are formulas over the canonical feature set:

$open, $high, $low, $close, $volume, $amt, $vwap, $returns

The DSL is parsed into expression trees, executed through the operator registry, and recomputed on demand during analysis and benchmarks. Paper appendix operator names such as SignedPower, Med, Rsquare, Slope, Resi, Eq, Min2, Max2, TsDecay, and Scale are accepted by the parser.
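
The parse, then tree, then registry flow can be sketched in a few dozen lines. This is a toy, not FactorMiner's parser. The `Op(arg, ...)` call syntax, the operator names `Neg`/`Add`/`Sub`/`Div`, and the arity-only type check are assumptions chosen for illustration. The real registry also distinguishes argument types such as series and window sizes, and it covers time-series and cross-sectional operators.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy registry: operator name -> (arity, elementwise function).
REGISTRY = {
    "Neg": (1, lambda x: -x),
    "Add": (2, lambda x, y: x + y),
    "Sub": (2, lambda x, y: x - y),
    "Div": (2, lambda x, y: x / y if y != 0 else float("nan")),
}

# Features ($close), operator names (Sub), numeric constants, punctuation.
TOKEN = re.compile(r"\s*(\$\w+|[A-Za-z_]\w*|-?\d+(?:\.\d+)?|[(),])")


@dataclass
class Node:
    kind: str                                  # "feature", "const", or an operator name
    args: list = field(default_factory=list)   # child nodes for operators
    value: object = None                       # feature name or constant for leaves


def tokenize(text: str) -> list:
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(tokens: list) -> Node:
    tok = tokens.pop(0)
    if tok.startswith("$"):
        return Node("feature", value=tok[1:])
    if tok[0].isdigit() or tok[0] == "-":
        return Node("const", value=float(tok))
    if tok not in REGISTRY:
        raise SyntaxError(f"unknown operator {tok!r}")
    if tokens.pop(0) != "(":
        raise SyntaxError(f"expected '(' after {tok}")
    args = [parse(tokens)]
    while tokens[0] == ",":
        tokens.pop(0)
        args.append(parse(tokens))
    if tokens.pop(0) != ")":
        raise SyntaxError(f"expected ')' to close {tok}")
    arity, _ = REGISTRY[tok]
    if len(args) != arity:  # the "typed" check in this toy is arity only
        raise TypeError(f"{tok} takes {arity} argument(s), got {len(args)}")
    return Node(tok, args)


def parse_formula(text: str) -> Node:
    tokens = tokenize(text)
    tree = parse(tokens)
    if tokens:
        raise SyntaxError(f"trailing tokens: {tokens}")
    return tree


def evaluate(node: Node, data: dict) -> list:
    """Recompute a tree on one asset's aligned feature series."""
    if node.kind == "feature":
        return data[node.value]
    if node.kind == "const":
        return [node.value] * len(next(iter(data.values())))
    _, fn = REGISTRY[node.kind]
    columns = [evaluate(arg, data) for arg in node.args]
    return [fn(*row) for row in zip(*columns)]


tree = parse_formula("Neg(Div(Sub($close, $open), $open))")
print(evaluate(tree, {"open": [10.0, 20.0], "close": [11.0, 18.0]}))  # [-0.1, 0.1]
```

Storing the tree rather than its output is what makes on-demand recomputation possible. The same formula can be re-run against any dataset that supplies the referenced features.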

2. Memory-guided mining

Mining is not plain prompt-and-filter generation. The loop builds a structured retrieval signal from experience memory and library state, then uses it to steer candidate generation.

Supported memory policies:

paper
none
kg
family_aware
regime_aware
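
As a rough illustration of memory steering generation, the sketch below renders a hypothetical experience memory into a prompt section and returns nothing under the `none` ablation. The `ExperienceMemory` fields and the "Explore/Avoid" wording are assumptions. The other policies (`kg`, `family_aware`, `regime_aware`) presumably add their own retrieval sources and are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class ExperienceMemory:
    # Hypothetical fields: patterns distilled from earlier mining rounds.
    recommended: list = field(default_factory=list)  # directions that led to admissions
    forbidden: list = field(default_factory=list)    # directions that kept failing
    library_size: int = 0


def build_prompt_context(memory: ExperienceMemory, policy: str) -> str:
    """Render memory into a prompt section for the candidate generator.

    Only "none" is special-cased; every other policy name falls through
    to this paper-style rendering in the sketch.
    """
    if policy == "none":
        return ""  # memory ablation: candidates are generated unsteered
    lines = [f"Library size: {memory.library_size} factors."]
    lines += [f"Explore: {p}" for p in memory.recommended]
    lines += [f"Avoid: {p}" for p in memory.forbidden]
    return "\n".join(lines)


mem = ExperienceMemory(
    recommended=["volume-weighted price momentum"],
    forbidden=["near-duplicates of raw close-to-open returns"],
    library_size=12,
)
print(build_prompt_context(mem, "paper"))
```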

3. Strict runtime recomputation

Saved library metadata is not treated as the final source of truth for analysis. The evaluate, combine, visualize, and benchmark paths recompute factor signals from formulas on the supplied dataset.

4. Canonical benchmark surface

factorminer.benchmark.runtime is the canonical benchmark entry point. It supports:

Top-K freeze evaluation across universes
memory ablations
strategy-grid ablations over memory policy × dependence metric × backend
cost-pressure evaluation
operator and factor efficiency benchmarking

Canonical Runtime Flow

flowchart LR
    A["RetrieveStage"] --> B["GenerateStage"]
    B --> C["EvaluateStage"]
    C --> D["LibraryUpdateStage"]
    D --> E["DistillStage"]
    E --> A

    A -.-> M["MemoryPolicy"]
    C -.-> K["EvaluationKernel"]
    D -.-> L["FactorAdmissionService"]
    B -.-> P["PromptContextBuilder"]

The same stage contract is used by both Ralph and Helix. Helix swaps in richer implementations for retrieval, proposal, validation, and distillation without changing the orchestration model.
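
A shared stage contract can be modeled as a list of callables that each take and return a state. One loop then drives any ordered list. Every stage body below is a stand-in with fake scores, and the names (`run`, `helix_retrieve`, `ADMIT_THRESHOLD`) are hypothetical. The sketch only illustrates how Helix can swap a stage without changing orchestration.

```python
from typing import Callable

State = dict
Stage = Callable[[State], State]

ADMIT_THRESHOLD = 0.02  # hypothetical admission cut-off


def retrieve(state: State) -> State:
    # Stand-in for MemoryPolicy retrieval.
    state["context"] = f"{len(state['library'])} factors, {len(state['memory'])} memory notes"
    return state


def generate(state: State) -> State:
    # Stand-in for the LLM call that PromptContextBuilder would feed.
    it = state["iteration"]
    state["candidates"] = [f"cand_{it}_{i}" for i in range(2)]
    return state


def evaluate(state: State) -> State:
    # Stand-in for EvaluationKernel: fixed fake scores for illustration.
    state["scored"] = [(c, 0.03 if i == 0 else 0.01) for i, c in enumerate(state["candidates"])]
    return state


def update_library(state: State) -> State:
    # Stand-in for FactorAdmissionService.
    state["admitted"] = [c for c, score in state["scored"] if score >= ADMIT_THRESHOLD]
    state["library"].extend(state["admitted"])
    return state


def distill(state: State) -> State:
    # Stand-in for writing lessons back into experience memory.
    state["memory"].append(f"iteration {state['iteration']}: admitted {len(state['admitted'])}")
    return state


def run(stages: list, iterations: int) -> State:
    """Drive any ordered stage list with the same loop."""
    state: State = {"library": [], "memory": [], "iteration": 0}
    for it in range(iterations):
        state["iteration"] = it
        for stage in stages:
            state = stage(state)
    return state


RALPH_STAGES = [retrieve, generate, evaluate, update_library, distill]


def helix_retrieve(state: State) -> State:
    # A richer retrieval stage (e.g. graph lookup) swapped in without touching run().
    state = retrieve(state)
    state["context"] += " + graph neighbours"
    return state


HELIX_STAGES = [helix_retrieve if s is retrieve else s for s in RALPH_STAGES]

print(run(RALPH_STAGES, iterations=3)["library"])  # ['cand_0_0', 'cand_1_0', 'cand_2_0']
print(run(HELIX_STAGES, iterations=1)["context"])
```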

Setup

Recommended: `uv`

git clone https://github.com/minihellboy/factorminer.git
cd factorminer

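# Pick one of:
uv sync --group dev                 # core package + dev tools
uv sync --group dev --extra llm     # also install the llm extra (LLM providers)
uv sync --group dev --all-extras    # full contributor setup, all extras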

Notes:

uv sync --group dev --all-extras is the intended full contributor setup.
The GPU extra is Linux-oriented because cupy-cuda12x is not generally installable on macOS.
The packaged default config uses the portable NumPy backend. Pass --gpu only when CUDA is available.
Wheels and sdists include factorminer/configs/*.yaml and exclude the internal test package.
Use uv run ... for all local commands.

`pip` fallback

# Pick one of:
python3 -m pip install -e .            # core package
python3 -m pip install -e ".[llm]"     # plus the llm extra
python3 -m pip install -e ".[all]"     # all extras

Quick Start

Demo without API keys

uv run python run_demo.py

One-command quickstart

uv run factorminer quickstart

This runs doctor, mines a tiny mock library into /tmp/factorminer-quickstart, generates a static HTML report, and prints the next commands for real data.

Deterministic quickstart examples

For a runnable, data-shaped walkthrough with sample CSVs and safe /tmp output paths, see examples/quickstart/README.md.

CLI overview

uv run factorminer --help

Primary commands:

doctor
init-config
quickstart
validate-data
resample-data
mine
helix
evaluate
combine
visualize
benchmark
export
session inspect

Common Workflows

Mine with mock data

uv run factorminer mine --mock -n 2 -b 8 -t 10

If neither --gpu nor --cpu is passed, the configured backend is used. The shipped default is numpy; --gpu and --cpu are explicit overrides.

Paper-mode admission and benchmark selection use ic_paper_mean = abs(mean(IC_t)) and ic_paper_icir = abs(mean(IC_t)) / std(IC_t). The legacy diagnostic ic_abs_mean = mean(abs(IC_t)) is still reported but is not the default paper quality gate. See Metric Semantics.
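
The three summaries above can be computed directly from a per-period IC series. The sketch assumes IC_t has already been computed upstream (per-period cross-sectional correlation between factor values and forward returns). It also uses a population standard deviation, which may differ from the library's choice. The example shows why the paper gate matters: a factor whose sign keeps flipping looks strong on ic_abs_mean but scores zero on ic_paper_mean.

```python
from statistics import mean, pstdev


def ic_metrics(ic_series):
    """Paper-mode summaries and the legacy diagnostic for a per-period IC series."""
    m = mean(ic_series)
    sd = pstdev(ic_series)  # assumption: population std; the library may use ddof=1
    return {
        "ic_paper_mean": abs(m),                                     # abs(mean(IC_t))
        "ic_paper_icir": abs(m) / sd if sd > 0 else float("nan"),    # abs(mean(IC_t)) / std(IC_t)
        "ic_abs_mean": mean([abs(x) for x in ic_series]),            # legacy: mean(abs(IC_t))
    }


# A sign-flipping factor: strong |IC| every period, but no consistent direction.
flipping = ic_metrics([0.03, -0.03, 0.03, -0.03])
print(flipping["ic_paper_mean"], flipping["ic_abs_mean"])  # 0.0 0.03

# A mostly consistent factor scores on both.
steady = ic_metrics([0.02, -0.01, 0.03, 0.04])
print(round(steady["ic_paper_mean"], 4), round(steady["ic_abs_mean"], 4))  # 0.02 0.025
```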

First-run health check

uv run factorminer doctor
uv run factorminer doctor --json

Create a local starter config

uv run factorminer init-config factorminer.local.yaml
uv run factorminer --config factorminer.local.yaml mine --mock

Run Helix with selected Phase 2 features

uv run factorminer --cpu helix --mock --debate --canonicalize -n 2 -b 8 -t 10

Evaluate a saved library with strict recomputation

uv run factorminer --cpu evaluate output/factor_library.json --mock --period both --top-k 10

Combine factors on explicit fit/eval splits

uv run factorminer --cpu combine output/factor_library.json \
  --mock \
  --fit-period train \
  --eval-period test \
  --method all \
  --selection lasso \
  --top-k 20

Visualize recomputed artifacts

uv run factorminer --cpu visualize output/factor_library.json \
  --mock \
  --period test \
  --correlation \
  --ic-timeseries \
  --quintile \
  --tearsheet

Run the strict paper benchmark lane

uv run factorminer --cpu --config factorminer/configs/paper_repro.yaml \
  benchmark table1 --mock --baseline factor_miner

Run the strategy-grid ablation lane

uv run factorminer --cpu benchmark ablation-strategy --mock --baseline factor_miner

Inspect a completed or partial session

uv run factorminer session inspect output
uv run factorminer session inspect output --json

Benchmark Surface

Available benchmark commands:

benchmark table1
benchmark ablation-memory
benchmark ablation-strategy
benchmark cost-pressure
benchmark efficiency
benchmark suite

The benchmark suite uses the runtime recomputation layer and carries protocol, dataset, and runtime-manifest metadata into emitted artifacts. See Benchmark Baselines for which baselines are real, partial, or proxy-backed today.

Configuration Model

The default config lives at factorminer/configs/default.yaml.

Top-level config sections:

mining
evaluation
data
llm
memory
phase2
benchmark
research

Important configuration themes:

evaluation.backend: numpy, c, or gpu
--gpu/--cpu: explicit CLI backend override; omitted means use config
evaluation.redundancy_metric: spearman, pearson, or distance_correlation
memory.policy: paper, none, kg, family_aware, or regime_aware
benchmark.strategy_ablation.*: runtime grid over memory policy, dependence metric, and backend
research.*: multi-horizon scoring, uncertainty controls, and selection models
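
The option sets and the CLI override rule above can be sketched as a small validator and resolver. It assumes the config is flattened to dotted keys, and that `--cpu` selects the NumPy backend. Rejecting both flags at once is a choice made for this sketch, not documented CLI behavior.

```python
VALID_VALUES = {
    "evaluation.backend": {"numpy", "c", "gpu"},
    "evaluation.redundancy_metric": {"spearman", "pearson", "distance_correlation"},
    "memory.policy": {"paper", "none", "kg", "family_aware", "regime_aware"},
}


def validate(config: dict) -> None:
    """Raise ValueError if a known dotted key holds an undocumented value."""
    for key, allowed in VALID_VALUES.items():
        if key in config and config[key] not in allowed:
            raise ValueError(f"{key}={config[key]!r}; expected one of {sorted(allowed)}")


def resolve_backend(config: dict, gpu: bool = False, cpu: bool = False) -> str:
    """CLI flags override the config; with neither flag, the config value wins."""
    if gpu and cpu:
        raise ValueError("pass at most one of --gpu / --cpu")  # choice made for this sketch
    if gpu:
        return "gpu"
    if cpu:
        return "numpy"  # assumption: --cpu selects the portable NumPy backend
    return config.get("evaluation.backend", "numpy")  # shipped default is numpy


cfg = {"evaluation.backend": "c", "memory.policy": "kg"}
validate(cfg)
print(resolve_backend(cfg))            # c
print(resolve_backend(cfg, gpu=True))  # gpu
```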

Profile configs shipped in the repo:

Data Format

Input data is expected to include at least:

datetime, asset_id, open, high, low, close, volume, amount

Accepted column aliases include code, ticker, symbol, and ts_code (for asset_id) and amt (for amount). If vwap or returns are missing, the runtime layer derives them.
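
A minimal sketch of alias normalization and derived columns follows, using plain row dicts. The alias table is assembled from the names above. The derivations are assumptions, since the README does not give formulas: VWAP is taken as amount / volume, and returns as the simple close-to-close change within each asset. ISO-formatted datetime strings are assumed so that they sort chronologically.

```python
# Hypothetical alias table assembled from the names listed above.
ALIASES = {
    "code": "asset_id", "ticker": "asset_id", "symbol": "asset_id", "ts_code": "asset_id",
    "amt": "amount",
}


def normalize(rows: list) -> list:
    """Rename alias columns, sort by (asset_id, datetime), and fill vwap/returns if absent."""
    renamed = [{ALIASES.get(k, k): v for k, v in row.items()} for row in rows]
    renamed.sort(key=lambda r: (r["asset_id"], r["datetime"]))
    prev_close = {}
    for r in renamed:
        if "vwap" not in r:
            # Assumption: bar VWAP = traded amount / traded volume.
            r["vwap"] = r["amount"] / r["volume"] if r["volume"] else float("nan")
        if "returns" not in r:
            # Assumption: simple close-to-close return per asset; first bar has none.
            p = prev_close.get(r["asset_id"])
            r["returns"] = r["close"] / p - 1 if p else float("nan")
        prev_close[r["asset_id"]] = r["close"]
    return renamed


rows = [
    {"datetime": "2024-01-03", "ticker": "AAA", "open": 10.0, "high": 11.2,
     "low": 9.9, "close": 11.0, "volume": 100, "amt": 1050.0},
    {"datetime": "2024-01-02", "ticker": "AAA", "open": 9.8, "high": 10.3,
     "low": 9.7, "close": 10.0, "volume": 200, "amt": 2000.0},
]
for r in normalize(rows):
    print(r["datetime"], r["asset_id"], r["vwap"], r["returns"])
```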

Project Layout

factorminer/
├── agent/           LLM providers, prompts, debate
├── architecture/    Canonical contracts, policies, stages, services
├── benchmark/       Runtime benchmark suite and legacy benchmark helpers
├── configs/         YAML profiles
├── core/            Loops, parser, expression trees, factor library, I/O
├── data/            Loaders, preprocessing, tensor building, mock data
├── evaluation/      Metrics, runtime recomputation, analysis, validation
├── memory/          Experience memory, KG retrieval, embeddings
├── operators/       Operator specs, backends, registry
├── tests/           Pytest coverage
└── utils/           Config loading, plotting, reporting

Current Implementation Notes

factorminer.architecture is now the canonical place for protocol, dataset, memory, evaluation, stage, and prompt boundaries.
factorminer.benchmark.runtime is the canonical benchmark runner.
factorminer.benchmark.helix_benchmark and run_phase2_benchmark.py are still present, but they are legacy-facing compared with the runtime suite.
output/ is git-ignored and should be treated as mutable runtime state, not source-controlled project state.

Development

Run tests

uv run pytest -q factorminer/tests

Lint

uv run ruff check .

Build a wheel

uv build

Scoped type-health check

Full-repo mypy is intentionally non-blocking for now. The command below type-checks a fixed set of modules under stabilization and reports on them without making CI depend on the result:

uv run mypy --ignore-missing-imports --follow-imports=skip \
  factorminer/utils/config.py \
  factorminer/evaluation/runtime.py \
  factorminer/operators/sandbox.py \
  factorminer/operators/custom.py \
  factorminer/operators/auto_inventor.py \
  factorminer/memory/evolution.py \
  factorminer/memory/online_regime_memory.py \
  factorminer/cli.py

License

MIT. See LICENSE.
