Version: 0.1.0
This page documents the core data structures used across DecisionBox.
A complete discovery run output. Stored in the discoveries MongoDB collection.
| Field |
Type |
Description |
id |
string |
MongoDB ObjectID |
project_id |
string |
Project that owns this discovery |
domain |
string |
Domain (e.g., gaming, social) |
category |
string |
Category (e.g., match3, idle, casual, content_sharing) |
run_type |
string |
full (all areas), partial (some areas or some failed), failed (all areas failed) |
areas_requested |
string[] |
Area IDs requested (empty for full run) |
discovery_date |
timestamp |
When the discovery ran |
total_steps |
int |
Number of exploration steps executed |
duration |
int64 |
Duration in nanoseconds |
insights |
Insight[] |
Discovered patterns |
recommendations |
Recommendation[] |
Actionable advice |
summary |
Summary |
Aggregate stats |
exploration_log |
ExplorationStep[] |
Every SQL query + AI reasoning |
analysis_log |
AnalysisStep[] |
Full LLM dialog per analysis area |
recommendation_log |
RecommendationStep |
Full LLM dialog for recommendations |
validation_log |
ValidationResult[] |
Verification queries + results |
created_at |
timestamp |
When result was saved |
A discovered pattern or finding. Generated by the analysis phase.
| Field |
Type |
Description |
id |
string |
Deterministic ID: {area}-{index} (e.g., churn-1, monetization-3). Auto-generated if LLM omits it. |
analysis_area |
string |
Which area found this (e.g., churn, levels) |
name |
string |
Specific descriptive name (e.g., "Day 0-to-Day 1 Drop: 67% Never Return") |
description |
string |
Detailed description with exact numbers and percentages |
severity |
string |
critical, high, medium, or low |
affected_count |
int |
Number of affected users (COUNT DISTINCT user_id) |
risk_score |
float64 |
0.0 to 1.0 risk assessment |
confidence |
float64 |
0.0 to 1.0 confidence level |
metrics |
map |
Flexible key-value metrics (e.g., {"churn_rate": 0.67, "avg_sessions": 3.2}) |
indicators |
string[] |
Specific metric indicators (e.g., "Session drop: 12min → 4min") |
target_segment |
string |
Description of affected user segment |
source_steps |
int[] |
Exploration step numbers that support this insight |
validation |
InsightValidation |
Warehouse verification result (if validated) |
discovered_at |
timestamp |
When this insight was generated |
Attached to an insight after warehouse verification.
| Field |
Type |
Description |
status |
string |
confirmed, adjusted, rejected, or error |
original_count |
int |
Count claimed by the AI |
verified_count |
int |
Count verified from the warehouse |
reasoning |
string |
Explanation of the validation result |
validated_at |
timestamp |
When validation was performed |
An actionable suggestion based on discovered insights.
| Field |
Type |
Description |
id |
string |
Recommendation ID |
category |
string |
Category: churn, engagement, monetization, difficulty |
title |
string |
Specific action title |
description |
string |
Detailed explanation with numbers |
priority |
int |
1 (critical) to 5 (optional). P1 = highest priority. |
target_segment |
string |
Exact segment criteria |
segment_size |
int |
Number of users in the segment |
expected_impact |
Impact |
Expected improvement |
actions |
string[] |
Numbered implementation steps |
related_insight_ids |
string[] |
IDs of insights this recommendation addresses (e.g., ["churn-1", "levels-2"]) |
confidence |
float64 |
0.0 to 1.0 confidence |
Expected impact of a recommendation.
| Field |
Type |
Description |
metric |
string |
What metric improves (e.g., retention_rate, revenue) |
estimated_improvement |
string |
Expected improvement (e.g., "+15-20%", "+$4,975/month") |
reasoning |
string |
Why this improvement is expected |
Aggregate stats for a discovery run.
| Field |
Type |
Description |
total_insights |
int |
Number of insights generated |
total_recommendations |
int |
Number of recommendations generated |
queries_executed |
int |
Number of SQL queries executed |
errors |
string[] |
Error messages from failed analysis areas (if any) |
One step in the autonomous exploration phase. Represents a single LLM call + SQL query.
| Field |
Type |
Description |
step |
int |
Step number (1-based) |
timestamp |
timestamp |
When this step ran |
action |
string |
Always query_data |
thinking |
string |
AI's reasoning for this query |
query_purpose |
string |
Short description of query intent |
query |
string |
The SQL query executed |
row_count |
int |
Number of rows returned |
execution_time_ms |
int64 |
Query execution time in milliseconds |
error |
string |
Error message if query failed |
fixed |
bool |
True if the query was auto-fixed after a SQL error |
Full LLM dialog for one analysis area. Captures the complete prompt and response.
| Field |
Type |
Description |
area_id |
string |
Analysis area ID (e.g., churn) |
area_name |
string |
Display name (e.g., Churn Risks) |
run_at |
timestamp |
When this analysis ran |
relevant_queries |
int |
Number of exploration queries used as context |
tokens_in |
int |
Input tokens consumed |
tokens_out |
int |
Output tokens generated |
duration_ms |
int64 |
LLM call duration in milliseconds |
insight_count |
int |
Number of insights extracted |
error |
string |
Error message if analysis failed |
Warehouse verification of an insight's claims.
| Field |
Type |
Description |
insight_id |
string |
ID of the validated insight |
analysis_area |
string |
Area this insight belongs to |
claimed_count |
int |
Count claimed by the AI |
verified_count |
int |
Count verified from the warehouse |
status |
string |
confirmed, adjusted, rejected, error |
reasoning |
string |
Explanation of the result |
query |
string |
The verification SQL query |
validated_at |
timestamp |
When validation was performed |
Live status of a running discovery. Stored in discovery_runs collection, updated in real-time.
| Field |
Type |
Description |
id |
string |
Run ID |
project_id |
string |
Project being discovered |
status |
string |
pending, running, completed, failed, cancelled |
phase |
string |
Current phase: init, schema_discovery, exploration, analysis, validation, recommendations, saving, complete |
phase_detail |
string |
Human-readable phase description |
progress |
int |
0 to 100 percentage |
started_at |
timestamp |
When the run started |
updated_at |
timestamp |
Last status update |
completed_at |
timestamp |
When the run finished (null if running) |
error |
string |
Error message (if failed) |
steps |
RunStep[] |
Live step feed |
total_queries |
int |
Total SQL queries executed |
successful_queries |
int |
Queries that returned results |
failed_queries |
int |
Queries that errored |
insights_found |
int |
Insights generated so far |
One step in the live progress feed.
| Field |
Type |
Description |
phase |
string |
Which phase this step belongs to |
step_num |
int |
Step number |
timestamp |
timestamp |
When this step occurred |
type |
string |
query, insight, analysis, validation, recommendation, error |
message |
string |
Step description |
llm_thinking |
string |
AI's reasoning text |
query |
string |
SQL query (if type=query) |
query_result |
string |
Query result summary |
row_count |
int |
Rows returned |
query_time_ms |
int |
Query execution time |
query_fixed |
bool |
Whether query was auto-fixed |
insight_name |
string |
Insight name (if type=insight) |
insight_severity |
string |
Insight severity (if type=insight) |
error |
string |
Error message (if type=error) |
User feedback on insights, recommendations, or exploration steps.
| Field |
Type |
Description |
id |
string |
Feedback ID |
project_id |
string |
Project ID |
discovery_id |
string |
Discovery run ID |
target_type |
string |
insight, recommendation, exploration_step |
target_id |
string |
ID of the rated item |
rating |
string |
like or dislike |
comment |
string |
Optional comment (typically with dislikes) |
created_at |
timestamp |
When feedback was submitted |
Project configuration. Stored in projects collection.
| Field |
Type |
Description |
id |
string |
MongoDB ObjectID |
name |
string |
Project name |
description |
string |
Project description |
domain |
string |
Domain (e.g., gaming, social) |
category |
string |
Category (e.g., match3, idle, casual, content_sharing) |
warehouse |
WarehouseConfig |
Data warehouse configuration |
llm |
LLMConfig |
LLM provider configuration |
schedule |
ScheduleConfig |
Discovery schedule |
profile |
map |
Domain-specific profile (from JSON Schema form) |
prompts |
ProjectPrompts |
Per-project prompt overrides |
status |
string |
Project status |
last_run_at |
timestamp |
When the last discovery ran |
last_run_status |
string |
Last run result |
created_at |
timestamp |
When the project was created |
updated_at |
timestamp |
Last update |
| Field |
Type |
Description |
provider |
string |
Provider ID: bigquery, redshift |
project_id |
string |
GCP project ID (BigQuery) |
datasets |
string[] |
Dataset/schema names |
location |
string |
Data location |
filter_field |
string |
Multi-tenant filter column |
filter_value |
string |
Multi-tenant filter value |
config |
map |
Provider-specific key-value config |
| Field |
Type |
Description |
provider |
string |
Provider ID: claude, openai, ollama, vertex-ai, bedrock |
model |
string |
Model identifier (free text) |
config |
map |
Provider-specific key-value config (e.g., project_id, location for Vertex AI) |
| Field |
Type |
Description |
enabled |
bool |
Whether automatic discovery is enabled |
cron_expr |
string |
Cron expression (e.g., 0 2 * * * = daily at 2 AM) |
max_steps |
int |
Max exploration steps for scheduled runs |