Skip to content

Latest commit

 

History

History
262 lines (213 loc) · 11.1 KB

File metadata and controls

262 lines (213 loc) · 11.1 KB

Data Models Reference

Version: 0.1.0

This page documents the core data structures used across DecisionBox.

DiscoveryResult

A complete discovery run output. Stored in the discoveries MongoDB collection.

Field Type Description
id string MongoDB ObjectID
project_id string Project that owns this discovery
domain string Domain (e.g., gaming, social)
category string Category (e.g., match3, idle, casual, content_sharing)
run_type string full (all areas), partial (some areas or some failed), failed (all areas failed)
areas_requested string[] Area IDs requested (empty for full run)
discovery_date timestamp When the discovery ran
total_steps int Number of exploration steps executed
duration int64 Duration in nanoseconds
insights Insight[] Discovered patterns
recommendations Recommendation[] Actionable advice
summary Summary Aggregate stats
exploration_log ExplorationStep[] Every SQL query + AI reasoning
analysis_log AnalysisStep[] Full LLM dialog per analysis area
recommendation_log RecommendationStep Full LLM dialog for recommendations
validation_log ValidationResult[] Verification queries + results
created_at timestamp When result was saved

Insight

A discovered pattern or finding. Generated by the analysis phase.

Field Type Description
id string Deterministic ID: {area}-{index} (e.g., churn-1, monetization-3). Auto-generated if LLM omits it.
analysis_area string Which area found this (e.g., churn, levels)
name string Specific descriptive name (e.g., "Day 0-to-Day 1 Drop: 67% Never Return")
description string Detailed description with exact numbers and percentages
severity string critical, high, medium, or low
affected_count int Number of affected users (COUNT DISTINCT user_id)
risk_score float64 0.0 to 1.0 risk assessment
confidence float64 0.0 to 1.0 confidence level
metrics map Flexible key-value metrics (e.g., {"churn_rate": 0.67, "avg_sessions": 3.2})
indicators string[] Specific metric indicators (e.g., "Session drop: 12min → 4min")
target_segment string Description of affected user segment
source_steps int[] Exploration step numbers that support this insight
validation InsightValidation Warehouse verification result (if validated)
discovered_at timestamp When this insight was generated

InsightValidation

Attached to an insight after warehouse verification.

Field Type Description
status string confirmed, adjusted, rejected, or error
original_count int Count claimed by the AI
verified_count int Count verified from the warehouse
reasoning string Explanation of the validation result
validated_at timestamp When validation was performed

Recommendation

An actionable suggestion based on discovered insights.

Field Type Description
id string Recommendation ID
category string Category: churn, engagement, monetization, difficulty
title string Specific action title
description string Detailed explanation with numbers
priority int 1 (critical) to 5 (optional). P1 = highest priority.
target_segment string Exact segment criteria
segment_size int Number of users in the segment
expected_impact Impact Expected improvement
actions string[] Numbered implementation steps
related_insight_ids string[] IDs of insights this recommendation addresses (e.g., ["churn-1", "levels-2"])
confidence float64 0.0 to 1.0 confidence

Impact

Expected impact of a recommendation.

Field Type Description
metric string What metric improves (e.g., retention_rate, revenue)
estimated_improvement string Expected improvement (e.g., "+15-20%", "+$4,975/month")
reasoning string Why this improvement is expected

Summary

Aggregate stats for a discovery run.

Field Type Description
total_insights int Number of insights generated
total_recommendations int Number of recommendations generated
queries_executed int Number of SQL queries executed
errors string[] Error messages from failed analysis areas (if any)

ExplorationStep

One step in the autonomous exploration phase. Represents a single LLM call + SQL query.

Field Type Description
step int Step number (1-based)
timestamp timestamp When this step ran
action string Always query_data
thinking string AI's reasoning for this query
query_purpose string Short description of query intent
query string The SQL query executed
row_count int Number of rows returned
execution_time_ms int64 Query execution time in milliseconds
error string Error message if query failed
fixed bool True if the query was auto-fixed after a SQL error

AnalysisStep

Full LLM dialog for one analysis area. Captures the complete prompt and response.

Field Type Description
area_id string Analysis area ID (e.g., churn)
area_name string Display name (e.g., Churn Risks)
run_at timestamp When this analysis ran
relevant_queries int Number of exploration queries used as context
tokens_in int Input tokens consumed
tokens_out int Output tokens generated
duration_ms int64 LLM call duration in milliseconds
insight_count int Number of insights extracted
error string Error message if analysis failed

ValidationResult

Warehouse verification of an insight's claims.

Field Type Description
insight_id string ID of the validated insight
analysis_area string Area this insight belongs to
claimed_count int Count claimed by the AI
verified_count int Count verified from the warehouse
status string confirmed, adjusted, rejected, error
reasoning string Explanation of the result
query string The verification SQL query
validated_at timestamp When validation was performed

DiscoveryRunStatus

Live status of a running discovery. Stored in discovery_runs collection, updated in real-time.

Field Type Description
id string Run ID
project_id string Project being discovered
status string pending, running, completed, failed, cancelled
phase string Current phase: init, schema_discovery, exploration, analysis, validation, recommendations, saving, complete
phase_detail string Human-readable phase description
progress int 0 to 100 percentage
started_at timestamp When the run started
updated_at timestamp Last status update
completed_at timestamp When the run finished (null if running)
error string Error message (if failed)
steps RunStep[] Live step feed
total_queries int Total SQL queries executed
successful_queries int Queries that returned results
failed_queries int Queries that errored
insights_found int Insights generated so far

RunStep

One step in the live progress feed.

Field Type Description
phase string Which phase this step belongs to
step_num int Step number
timestamp timestamp When this step occurred
type string query, insight, analysis, validation, recommendation, error
message string Step description
llm_thinking string AI's reasoning text
query string SQL query (if type=query)
query_result string Query result summary
row_count int Rows returned
query_time_ms int Query execution time
query_fixed bool Whether query was auto-fixed
insight_name string Insight name (if type=insight)
insight_severity string Insight severity (if type=insight)
error string Error message (if type=error)

Feedback

User feedback on insights, recommendations, or exploration steps.

Field Type Description
id string Feedback ID
project_id string Project ID
discovery_id string Discovery run ID
target_type string insight, recommendation, exploration_step
target_id string ID of the rated item
rating string like or dislike
comment string Optional comment (typically with dislikes)
created_at timestamp When feedback was submitted

Project

Project configuration. Stored in projects collection.

Field Type Description
id string MongoDB ObjectID
name string Project name
description string Project description
domain string Domain (e.g., gaming, social)
category string Category (e.g., match3, idle, casual, content_sharing)
warehouse WarehouseConfig Data warehouse configuration
llm LLMConfig LLM provider configuration
schedule ScheduleConfig Discovery schedule
profile map Domain-specific profile (from JSON Schema form)
prompts ProjectPrompts Per-project prompt overrides
status string Project status
last_run_at timestamp When the last discovery ran
last_run_status string Last run result
created_at timestamp When the project was created
updated_at timestamp Last update

WarehouseConfig

Field Type Description
provider string Provider ID: bigquery, redshift
project_id string GCP project ID (BigQuery)
datasets string[] Dataset/schema names
location string Data location
filter_field string Multi-tenant filter column
filter_value string Multi-tenant filter value
config map Provider-specific key-value config

LLMConfig

Field Type Description
provider string Provider ID: claude, openai, ollama, vertex-ai, bedrock
model string Model identifier (free text)
config map Provider-specific key-value config (e.g., project_id, location for Vertex AI)

ScheduleConfig

Field Type Description
enabled bool Whether automatic discovery is enabled
cron_expr string Cron expression (e.g., 0 2 * * * = daily at 2 AM)
max_steps int Max exploration steps for scheduled runs

Next Steps