Data Models Reference

Version: 0.1.0

This page documents the core data structures used across DecisionBox.

DiscoveryResult

A complete discovery run output. Stored in the discoveries MongoDB collection.

Field	Type	Description
`id`	string	MongoDB ObjectID
`project_id`	string	Project that owns this discovery
`domain`	string	Domain (e.g., `gaming`, `social`)
`category`	string	Category (e.g., `match3`, `idle`, `casual`, `content_sharing`)
`run_type`	string	`full` (all areas), `partial` (some areas or some failed), `failed` (all areas failed)
`areas_requested`	string[]	Area IDs requested (empty for full run)
`discovery_date`	timestamp	When the discovery ran
`total_steps`	int	Number of exploration steps executed
`duration`	int64	Duration in nanoseconds
`insights`	Insight[]	Discovered patterns
`recommendations`	Recommendation[]	Actionable advice
`summary`	Summary	Aggregate stats
`exploration_log`	ExplorationStep[]	Every SQL query + AI reasoning
`analysis_log`	AnalysisStep[]	Full LLM dialog per analysis area
`recommendation_log`	RecommendationStep	Full LLM dialog for recommendations
`validation_log`	ValidationResult[]	Verification queries + results
`created_at`	timestamp	When result was saved

Insight

A discovered pattern or finding. Generated by the analysis phase.

Field	Type	Description
`id`	string	Deterministic ID: `{area}-{index}` (e.g., `churn-1`, `monetization-3`). Auto-generated if LLM omits it.
`analysis_area`	string	Which area found this (e.g., `churn`, `levels`)
`name`	string	Specific descriptive name (e.g., "Day 0-to-Day 1 Drop: 67% Never Return")
`description`	string	Detailed description with exact numbers and percentages
`severity`	string	`critical`, `high`, `medium`, or `low`
`affected_count`	int	Number of affected users (COUNT DISTINCT user_id)
`risk_score`	float64	0.0 to 1.0 risk assessment
`confidence`	float64	0.0 to 1.0 confidence level
`metrics`	map	Flexible key-value metrics (e.g., `{"churn_rate": 0.67, "avg_sessions": 3.2}`)
`indicators`	string[]	Specific metric indicators (e.g., "Session drop: 12min → 4min")
`target_segment`	string	Description of affected user segment
`source_steps`	int[]	Exploration step numbers that support this insight
`validation`	InsightValidation	Warehouse verification result (if validated)
`discovered_at`	timestamp	When this insight was generated

InsightValidation

Attached to an insight after warehouse verification.

Field	Type	Description
`status`	string	`confirmed`, `adjusted`, `rejected`, or `error`
`original_count`	int	Count claimed by the AI
`verified_count`	int	Count verified from the warehouse
`reasoning`	string	Explanation of the validation result
`validated_at`	timestamp	When validation was performed

Recommendation

An actionable suggestion based on discovered insights.

Field	Type	Description
`id`	string	Recommendation ID
`category`	string	Category: `churn`, `engagement`, `monetization`, `difficulty`
`title`	string	Specific action title
`description`	string	Detailed explanation with numbers
`priority`	int	1 (critical) to 5 (optional). P1 = highest priority.
`target_segment`	string	Exact segment criteria
`segment_size`	int	Number of users in the segment
`expected_impact`	Impact	Expected improvement
`actions`	string[]	Numbered implementation steps
`related_insight_ids`	string[]	IDs of insights this recommendation addresses (e.g., `["churn-1", "levels-2"]`)
`confidence`	float64	0.0 to 1.0 confidence

Impact

Expected impact of a recommendation.

Field	Type	Description
`metric`	string	What metric improves (e.g., `retention_rate`, `revenue`)
`estimated_improvement`	string	Expected improvement (e.g., "+15-20%", "+$4,975/month")
`reasoning`	string	Why this improvement is expected

Summary

Aggregate stats for a discovery run.

Field	Type	Description
`total_insights`	int	Number of insights generated
`total_recommendations`	int	Number of recommendations generated
`queries_executed`	int	Number of SQL queries executed
`errors`	string[]	Error messages from failed analysis areas (if any)

ExplorationStep

One step in the autonomous exploration phase. Represents a single LLM call + SQL query.

Field	Type	Description
`step`	int	Step number (1-based)
`timestamp`	timestamp	When this step ran
`action`	string	Always `query_data`
`thinking`	string	AI's reasoning for this query
`query_purpose`	string	Short description of query intent
`query`	string	The SQL query executed
`row_count`	int	Number of rows returned
`execution_time_ms`	int64	Query execution time in milliseconds
`error`	string	Error message if query failed
`fixed`	bool	True if the query was auto-fixed after a SQL error

AnalysisStep

Full LLM dialog for one analysis area. Captures the complete prompt and response.

Field	Type	Description
`area_id`	string	Analysis area ID (e.g., `churn`)
`area_name`	string	Display name (e.g., `Churn Risks`)
`run_at`	timestamp	When this analysis ran
`relevant_queries`	int	Number of exploration queries used as context
`tokens_in`	int	Input tokens consumed
`tokens_out`	int	Output tokens generated
`duration_ms`	int64	LLM call duration in milliseconds
`insight_count`	int	Number of insights extracted
`error`	string	Error message if analysis failed

ValidationResult

Warehouse verification of an insight's claims.

Field	Type	Description
`insight_id`	string	ID of the validated insight
`analysis_area`	string	Area this insight belongs to
`claimed_count`	int	Count claimed by the AI
`verified_count`	int	Count verified from the warehouse
`status`	string	`confirmed`, `adjusted`, `rejected`, `error`
`reasoning`	string	Explanation of the result
`query`	string	The verification SQL query
`validated_at`	timestamp	When validation was performed

DiscoveryRunStatus

Live status of a running discovery. Stored in discovery_runs collection, updated in real-time.

Field	Type	Description
`id`	string	Run ID
`project_id`	string	Project being discovered
`status`	string	`pending`, `running`, `completed`, `failed`, `cancelled`
`phase`	string	Current phase: `init`, `schema_discovery`, `exploration`, `analysis`, `validation`, `recommendations`, `saving`, `complete`
`phase_detail`	string	Human-readable phase description
`progress`	int	0 to 100 percentage
`started_at`	timestamp	When the run started
`updated_at`	timestamp	Last status update
`completed_at`	timestamp	When the run finished (null if running)
`error`	string	Error message (if failed)
`steps`	RunStep[]	Live step feed
`total_queries`	int	Total SQL queries executed
`successful_queries`	int	Queries that returned results
`failed_queries`	int	Queries that errored
`insights_found`	int	Insights generated so far

RunStep

One step in the live progress feed.

Field	Type	Description
`phase`	string	Which phase this step belongs to
`step_num`	int	Step number
`timestamp`	timestamp	When this step occurred
`type`	string	`query`, `insight`, `analysis`, `validation`, `recommendation`, `error`
`message`	string	Step description
`llm_thinking`	string	AI's reasoning text
`query`	string	SQL query (if type=query)
`query_result`	string	Query result summary
`row_count`	int	Rows returned
`query_time_ms`	int	Query execution time
`query_fixed`	bool	Whether query was auto-fixed
`insight_name`	string	Insight name (if type=insight)
`insight_severity`	string	Insight severity (if type=insight)
`error`	string	Error message (if type=error)

Feedback

User feedback on insights, recommendations, or exploration steps.

Field	Type	Description
`id`	string	Feedback ID
`project_id`	string	Project ID
`discovery_id`	string	Discovery run ID
`target_type`	string	`insight`, `recommendation`, `exploration_step`
`target_id`	string	ID of the rated item
`rating`	string	`like` or `dislike`
`comment`	string	Optional comment (typically with dislikes)
`created_at`	timestamp	When feedback was submitted

Project

Project configuration. Stored in projects collection.

Field	Type	Description
`id`	string	MongoDB ObjectID
`name`	string	Project name
`description`	string	Project description
`domain`	string	Domain (e.g., `gaming`, `social`)
`category`	string	Category (e.g., `match3`, `idle`, `casual`, `content_sharing`)
`warehouse`	WarehouseConfig	Data warehouse configuration
`llm`	LLMConfig	LLM provider configuration
`schedule`	ScheduleConfig	Discovery schedule
`profile`	map	Domain-specific profile (from JSON Schema form)
`prompts`	ProjectPrompts	Per-project prompt overrides
`status`	string	Project status
`last_run_at`	timestamp	When the last discovery ran
`last_run_status`	string	Last run result
`created_at`	timestamp	When the project was created
`updated_at`	timestamp	Last update

WarehouseConfig

Field	Type	Description
`provider`	string	Provider ID: `bigquery`, `redshift`
`project_id`	string	GCP project ID (BigQuery)
`datasets`	string[]	Dataset/schema names
`location`	string	Data location
`filter_field`	string	Multi-tenant filter column
`filter_value`	string	Multi-tenant filter value
`config`	map	Provider-specific key-value config

LLMConfig

Field	Type	Description
`provider`	string	Provider ID: `claude`, `openai`, `ollama`, `vertex-ai`, `bedrock`
`model`	string	Model identifier (free text)
`config`	map	Provider-specific key-value config (e.g., `project_id`, `location` for Vertex AI)

ScheduleConfig

Field	Type	Description
`enabled`	bool	Whether automatic discovery is enabled
`cron_expr`	string	Cron expression (e.g., `0 2 * * *` = daily at 2 AM)
`max_steps`	int	Max exploration steps for scheduled runs

Next Steps

API Reference — Endpoints that return these models
Configuration Reference — Environment variables

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data Models Reference

DiscoveryResult

Insight

InsightValidation

Recommendation

Impact

Summary

ExplorationStep

AnalysisStep

ValidationResult

DiscoveryRunStatus

RunStep

Feedback

Project

WarehouseConfig

LLMConfig

ScheduleConfig

Next Steps

FilesExpand file tree

data-models.md

Latest commit

History

data-models.md

File metadata and controls

Data Models Reference

DiscoveryResult

Insight

InsightValidation

Recommendation

Impact

Summary

ExplorationStep

AnalysisStep

ValidationResult

DiscoveryRunStatus

RunStep

Feedback

Project

WarehouseConfig

LLMConfig

ScheduleConfig

Next Steps