Generate machine telemetry and IoT sensor data for dashboard and observability demos.
The telemetry generators create data suitable for:
- DevOps/SRE dashboards
- Anomaly detection systems
- Capacity planning demos
- Infrastructure monitoring
- Crossfilter visualization
The telemetry() function generates comprehensive machine metrics with configurable scenarios.
```python
from superstore import telemetry

# Generate telemetry with default settings
df = telemetry(n_machines=50, n_readings=1000)

# Use a preset scenario
df = telemetry(scenario="anomaly_detection", n_machines=100)
```

| Column | Type | Description |
|---|---|---|
| timestamp | datetime | Reading timestamp |
| machine_id | str | Machine identifier |
| machine_type | str | Machine type (core, edge, worker) |
| zone | str | Zone identifier |
| region | str | Region identifier |
| cpu_percent | float | CPU utilization (0-100) |
| memory_percent | float | Memory utilization (0-100) |
| disk_percent | float | Disk utilization (0-100) |
| network_in_mbps | float | Network ingress (Mbps) |
| network_out_mbps | float | Network egress (Mbps) |
| request_count | int | Request count |
| error_count | int | Error count |
| latency_p50 | float | 50th percentile latency (ms) |
| latency_p99 | float | 99th percentile latency (ms) |
| is_anomaly | bool | Anomaly label (for ML training) |
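These columns are enough to derive a few dashboard-ready metrics directly. A minimal sketch, assuming the generator returns a pandas DataFrame: compute a per-reading error rate and the anomaly share per machine type.

```python
from superstore import telemetry

df = telemetry(n_machines=50, n_readings=1000)

# Error rate per reading (clip avoids division by zero on idle machines)
df["error_rate"] = df["error_count"] / df["request_count"].clip(lower=1)

# Share of readings labeled anomalous, per machine type
print(df.groupby("machine_type")["is_anomaly"].mean())
```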
Use preset scenarios for common use cases:
```python
from superstore import telemetry, TELEMETRY_SCENARIOS

# See available scenarios
print(TELEMETRY_SCENARIOS)

# Use a scenario
df = telemetry(scenario="production", n_machines=100)
```

| Scenario | Description |
|---|---|
| baseline | Normal operating conditions |
| maintenance_window | Scheduled maintenance patterns |
| capacity_planning | Growth trend data |
| anomaly_detection | Training data with labeled anomalies |
| multi_zone | Multi-datacenter deployment |
| cpu_bound | High CPU workload |
| memory_bound | High memory workload |
| network_heavy | Network-intensive workload |
| degradation_cycle | Progressive degradation and recovery |
| production | Full realistic environment |
| chaos | High anomaly rates for chaos engineering |
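To compare scenarios side by side, you can generate a small frame per scenario and concatenate them. A sketch, assuming that iterating TELEMETRY_SCENARIOS yields the scenario names listed above and that the frames are pandas DataFrames:

```python
import pandas as pd
from superstore import telemetry, TELEMETRY_SCENARIOS

# Generate a small sample for every preset scenario
frames = []
for name in TELEMETRY_SCENARIOS:
    sample = telemetry(scenario=name, n_machines=10)
    sample["scenario"] = name
    frames.append(sample)

combined = pd.concat(frames, ignore_index=True)
print(combined.groupby("scenario")["cpu_percent"].mean().round(1))
```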
For dashboard demos, use the individual crossfilter generators:
Generate machine metadata:
```python
from superstore import machines

df = machines(n_machines=100)
```

| Column | Type | Description |
|---|---|---|
| machine_id | str | Machine identifier |
| machine_type | str | Machine type |
| cores | int | CPU cores |
| memory_gb | int | Memory in GB |
| zone | str | Zone |
| region | str | Region |
| created_at | datetime | Provisioning date |
Generate machine usage metrics:
```python
from superstore import usage

df = usage(n_machines=100, n_readings=1000)
```

| Column | Type | Description |
|---|---|---|
| timestamp | datetime | Reading timestamp |
| machine_id | str | Machine identifier |
| cpu_percent | float | CPU utilization |
| memory_percent | float | Memory utilization |
| disk_percent | float | Disk utilization |
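Because machines() and usage() share the machine_id column, their output can be joined to slice utilization by topology. A sketch, assuming both return pandas DataFrames and produce matching identifiers for the same n_machines:

```python
from superstore import machines, usage

fleet = machines(n_machines=100)
metrics = usage(n_machines=100, n_readings=1000)

# Attach topology metadata to each reading, then summarize utilization by zone
joined = metrics.merge(fleet[["machine_id", "zone", "region"]], on="machine_id", how="left")
print(joined.groupby("zone")[["cpu_percent", "memory_percent"]].mean())
```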
Generate machine status records:
```python
from superstore import status

df = status(n_machines=100)
```

| Column | Type | Description |
|---|---|---|
| machine_id | str | Machine identifier |
| status | str | Current status (healthy, degraded, down) |
| last_heartbeat | datetime | Last heartbeat time |
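For a quick fleet-health summary, tally the status values directly (same pandas assumption as above):

```python
from superstore import status

df = status(n_machines=100)

# Count machines per status and list the unhealthy ones
print(df["status"].value_counts())
unhealthy = df[df["status"].isin(["degraded", "down"])]
print(f"{len(unhealthy)} machines need attention")
```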
Generate job/task records:
```python
from superstore import jobs

df = jobs(n_machines=100, n_jobs=5000)
```

| Column | Type | Description |
|---|---|---|
| job_id | str | Job identifier |
| machine_id | str | Executing machine |
| job_type | str | Job type |
| status | str | Job status |
| start_time | datetime | Job start time |
| end_time | datetime | Job end time |
| duration_seconds | float | Job duration (seconds) |
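Job records work well for latency-style summaries, for example per-type duration percentiles. A sketch, again assuming a pandas DataFrame:

```python
from superstore import jobs

df = jobs(n_machines=100, n_jobs=5000)

# Median and 95th-percentile duration for each job type
duration_stats = df.groupby("job_type")["duration_seconds"].quantile([0.5, 0.95]).unstack()
print(duration_stats)
```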
Use CrossfilterConfig for detailed control:
```python
from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(
    n_machines=200,
    n_readings=2000,
    seed=42,
)

df = telemetry(config=config)
```

Configure the machine fleet:
```python
config = CrossfilterConfig(
    n_machines=100,
    # Machine types
    machine_types=["core", "edge", "worker"],
    # Hardware specs
    cores_range=(8, 128),  # CPU cores range
    # Topology
    zones=["zone-a", "zone-b", "zone-c", "zone-d"],
    regions=["us-east-1", "us-west-2", "eu-west-1", "ap-northeast-1"],
)
```

| Parameter | Default | Description |
|---|---|---|
| machine_types | [core, edge, worker] | Types of machines |
| cores_range | (4, 64) | CPU cores range |
| zones | [zone-a, zone-b, zone-c] | Available zones |
| regions | [us-east-1, us-west-2, eu-west-1] | Available regions |
Configure baseline resource utilization:
```python
config = CrossfilterConfig(
    n_machines=100,
    n_readings=1000,
    base_cpu_load=0.4,     # 40% base CPU
    base_memory_load=0.6,  # 60% base memory
    load_variance=0.25,    # Load variability
)
```

| Parameter | Default | Description |
|---|---|---|
| base_cpu_load | 0.3 | Base CPU utilization (0-1) |
| base_memory_load | 0.5 | Base memory utilization (0-1) |
| load_variance | 0.2 | Variance in load readings |
Inject anomalies for detection training:
```python
config = CrossfilterConfig(
    n_machines=100,
    n_readings=2000,
    anomalies={
        "enable": True,
        "cpu_spike_probability": 0.03,           # 3% CPU spikes
        "memory_leak_probability": 0.015,        # Memory leak starts
        "network_saturation_probability": 0.02,  # Network saturation
    },
)
```

| Parameter | Default | Description |
|---|---|---|
| enable | False | Enable anomaly injection |
| cpu_spike_probability | 0.02 | Probability of CPU spike |
| memory_leak_probability | 0.01 | Probability of memory leak |
| network_saturation_probability | 0.01 | Probability of network saturation |
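With injection enabled, the is_anomaly column doubles as a training label. A sketch of a baseline detector, assuming scikit-learn is installed separately (it is not a dependency of the generators):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(
    n_machines=100,
    n_readings=2000,
    anomalies={
        "enable": True,
        "cpu_spike_probability": 0.03,
        "memory_leak_probability": 0.015,
        "network_saturation_probability": 0.02,
    },
)
df = telemetry(config=config)

# Fit a simple baseline detector on the injected labels
features = ["cpu_percent", "memory_percent", "network_in_mbps", "latency_p99"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["is_anomaly"], test_size=0.25, random_state=42
)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
print(f"Holdout accuracy: {clf.score(X_test, y_test):.3f}")
```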
Add realistic time-of-day and day-of-week patterns:
```python
config = CrossfilterConfig(
    n_machines=100,
    n_readings=5000,
    temporal_patterns={
        "enable_diurnal": True,       # Day/night patterns
        "enable_weekly": True,        # Weekday/weekend patterns
        "peak_hour": 14,              # Peak at 2 PM
        "night_load_factor": 0.25,    # 25% load at night
        "weekend_load_factor": 0.4,   # 40% load on weekends
    },
)
```

| Parameter | Default | Description |
|---|---|---|
| enable_diurnal | False | Enable day/night load patterns |
| enable_weekly | False | Enable weekday/weekend patterns |
| peak_hour | 14 | Hour of peak load (0-23) |
| night_load_factor | 0.3 | Load factor during night |
| weekend_load_factor | 0.5 | Load factor during weekends |
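To confirm the pattern, aggregate CPU by hour of day and look for the configured peak. A sketch, assuming omitted temporal keys fall back to the defaults in the table above and that timestamp is a pandas datetime column:

```python
from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(
    n_machines=50,
    n_readings=5000,
    temporal_patterns={
        "enable_diurnal": True,
        "enable_weekly": True,
        "peak_hour": 14,
    },
)
df = telemetry(config=config)

# Mean CPU by hour of day; expect a peak near hour 14 and a trough overnight
hourly = df.groupby(df["timestamp"].dt.hour)["cpu_percent"].mean()
print(hourly.round(1))
```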
Simulate machine failures and cascades:
```python
config = CrossfilterConfig(
    n_machines=100,
    n_readings=2000,
    enable_failures=True,
    failure_probability=0.002,         # 0.2% failure rate per reading
    cascade_failure_probability=0.4,   # 40% chance of cascade
)
```

| Parameter | Default | Description |
|---|---|---|
| enable_failures | False | Enable failure simulation |
| failure_probability | 0.001 | Per-reading failure probability |
| cascade_failure_probability | 0.3 | Cascade probability when a dependent machine fails |
Putting it all together, a full configuration that combines topology, load profiles, anomalies, temporal patterns, and failures:

```python
from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(
    n_machines=200,
    n_readings=3000,
    seed=42,
    # Machine topology
    machine_types=["core", "edge", "worker"],
    cores_range=(8, 96),
    zones=["zone-a", "zone-b", "zone-c"],
    regions=["us-east-1", "us-west-2", "eu-west-1"],
    # Usage profiles
    base_cpu_load=0.35,
    base_memory_load=0.55,
    load_variance=0.2,
    # Anomalies
    anomalies={
        "enable": True,
        "cpu_spike_probability": 0.02,
        "memory_leak_probability": 0.01,
        "network_saturation_probability": 0.01,
    },
    # Temporal patterns
    temporal_patterns={
        "enable_diurnal": True,
        "enable_weekly": True,
        "peak_hour": 15,
        "night_load_factor": 0.2,
        "weekend_load_factor": 0.35,
    },
    # Failures
    enable_failures=True,
    failure_probability=0.001,
    cascade_failure_probability=0.25,
)

df = telemetry(config=config)
```

Access schema constants for validation:
```python
from superstore import (
    MACHINE_SCHEMA,
    USAGE_SCHEMA,
    STATUS_SCHEMA,
    JOBS_SCHEMA,
    TELEMETRY_SCHEMA,
    TELEMETRY_SCENARIOS,
)
```
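For example, a lightweight check that a generated frame carries the expected columns. This sketch assumes the schema constants expose column names (e.g. as dict keys or a list); adjust to the constant's actual structure:

```python
from superstore import telemetry, TELEMETRY_SCHEMA

df = telemetry(n_machines=10, n_readings=100)

# Assumes iterating TELEMETRY_SCHEMA yields the expected column names
missing = set(TELEMETRY_SCHEMA) - set(df.columns)
assert not missing, f"missing columns: {missing}"
print("telemetry output matches the expected schema columns")
```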
See the full API documentation: