Log Data Generation

Generate realistic web server access logs and application event logs with configurable traffic patterns.

Overview

The log generators create data suitable for:

  • Log analytics dashboards
  • Error rate monitoring
  • Latency analysis
  • Security analysis and fraud detection
  • Observability pipeline testing

Web Server Logs

The logs() function generates HTTP access logs in Apache Combined or Common format, or as structured JSON.

Basic Usage

from superstore import logs

# Generate 10,000 log entries
df = logs(count=10000)

# Apache Combined format (default)
df = logs(count=10000, format="combined")

# JSON structured logs
df = logs(count=10000, format="json")

Output Schema

| Column | Type | Description |
| --- | --- | --- |
| timestamp | datetime | Request timestamp |
| ip_address | str | Client IP address |
| method | str | HTTP method (GET, POST, etc.) |
| path | str | Request path |
| status_code | int | HTTP status code |
| response_size | int | Response size in bytes |
| latency_ms | float | Request latency in milliseconds |
| user_agent | str | User agent string |
| referer | str | Referrer URL |
| user_id | str | User identifier (if authenticated) |
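
To make the relationship between these columns and the Apache Combined format concrete, here is a minimal sketch that renders one row as a Combined Log Format line. The helper `to_combined_line` and the sample row are hypothetical and not part of superstore. The sketch assumes a row is available as a plain dict with a timezone-aware timestamp, and it hardcodes `HTTP/1.1` because the schema has no protocol column.

```python
from datetime import datetime, timezone


def to_combined_line(row: dict) -> str:
    """Render one row (hypothetical dict form) as an Apache Combined log line."""
    # Apache timestamp layout: day/month/year:hour:minute:second zone
    ts = row["timestamp"].strftime("%d/%b/%Y:%H:%M:%S %z")
    # Assumption: the protocol is not in the schema, so HTTP/1.1 is hardcoded.
    request = f'{row["method"]} {row["path"]} HTTP/1.1'
    # Missing values are written as "-" in Apache access logs.
    user = row.get("user_id") or "-"
    referer = row.get("referer") or "-"
    agent = row.get("user_agent") or "-"
    return (
        f'{row["ip_address"]} - {user} [{ts}] "{request}" '
        f'{row["status_code"]} {row["response_size"]} "{referer}" "{agent}"'
    )


# Hypothetical sample row using the column names from the schema above.
sample_row = {
    "timestamp": datetime(2024, 1, 15, 10, 0, 0, tzinfo=timezone.utc),
    "ip_address": "203.0.113.7",
    "method": "GET",
    "path": "/api/orders",
    "status_code": 200,
    "response_size": 512,
    "latency_ms": 48.2,
    "user_agent": "curl/8.0",
    "referer": "",
    "user_id": "",
}
line = to_combined_line(sample_row)
print(line)
```

The Combined format has no latency field, so `latency_ms` is not part of the rendered line.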

Application Logs

The app_logs() function generates application-level event logs with log levels, trace IDs, and exceptions.

Basic Usage

from superstore import app_logs

# Generate 5,000 application log entries
df = app_logs(count=5000)

Output Schema

| Column | Type | Description |
| --- | --- | --- |
| timestamp | datetime | Event timestamp |
| level | str | Log level (DEBUG, INFO, WARN, ERROR) |
| logger | str | Logger name/component |
| message | str | Log message |
| trace_id | str | Distributed trace ID |
| span_id | str | Span ID |
| exception | str | Exception type (if error) |
| stack_trace | str | Stack trace (if error) |
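
As an illustration of how these columns combine in practice, the following sketch counts entries per level, tallies exception types, and collects the trace IDs that contain an error. The sample rows and the `make_row` helper are hypothetical stand-ins for generated data, and the sketch assumes rows are available as plain dicts.

```python
from collections import Counter
from datetime import datetime


def make_row(level, logger, message, trace_id, span_id, exception=None):
    """Build one hypothetical row with the columns from the schema above."""
    return {
        "timestamp": datetime(2024, 1, 15, 10, 0, 0),
        "level": level,
        "logger": logger,
        "message": message,
        "trace_id": trace_id,
        "span_id": span_id,
        "exception": exception,
        "stack_trace": "Traceback ..." if exception else None,
    }


sample_rows = [
    make_row("INFO", "api.orders", "order created", "trace-1", "span-1"),
    make_row("WARN", "api.orders", "slow query", "trace-1", "span-2"),
    make_row("ERROR", "db.pool", "connection lost", "trace-2", "span-3", "TimeoutError"),
    make_row("ERROR", "api.auth", "token rejected", "trace-3", "span-4", "ValueError"),
    make_row("ERROR", "db.pool", "connection lost", "trace-3", "span-5", "TimeoutError"),
]

# Entries per log level.
level_counts = Counter(row["level"] for row in sample_rows)
# Exception types, counted over ERROR rows only.
exception_counts = Counter(
    row["exception"] for row in sample_rows if row["level"] == "ERROR"
)
# Traces that contain at least one ERROR entry.
error_traces = sorted({row["trace_id"] for row in sample_rows if row["level"] == "ERROR"})

print(level_counts)
print(exception_counts)
print(error_traces)
```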

Configuration

Use LogsConfig for detailed control over log generation:

from superstore import logs, LogsConfig

config = LogsConfig(
    count=50000,
    seed=42,
    format="combined",
)
df = logs(config=config)

Traffic Patterns

Control the traffic rate and timing:

config = LogsConfig(
    count=10000,
    start_time="2024-01-15T10:00:00",  # ISO format start time
    requests_per_second=250.0,          # Average RPS (Poisson arrival)
)

| Parameter | Default | Description |
| --- | --- | --- |
| start_time | (current time) | Start timestamp in ISO format |
| requests_per_second | 100.0 | Average requests per second |
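
The "Poisson arrival" note means that gaps between consecutive requests are exponentially distributed with mean `1 / requests_per_second`. The sketch below shows that idea using only the standard library. It is an illustration of the technique, not superstore's actual implementation, and `poisson_timestamps` is a hypothetical helper.

```python
import random
from datetime import datetime, timedelta


def poisson_timestamps(start_time: str, requests_per_second: float,
                       count: int, seed: int = 42) -> list:
    """Generate `count` arrival times for a Poisson process (hypothetical helper)."""
    rng = random.Random(seed)
    current = datetime.fromisoformat(start_time)
    stamps = []
    for _ in range(count):
        # Exponential gaps with mean 1/rate produce Poisson arrivals.
        current += timedelta(seconds=rng.expovariate(requests_per_second))
        stamps.append(current)
    return stamps


start = "2024-01-15T10:00:00"
stamps = poisson_timestamps(start, requests_per_second=250.0, count=10_000)
elapsed = (stamps[-1] - datetime.fromisoformat(start)).total_seconds()
observed_rps = len(stamps) / elapsed
print(f"{observed_rps:.1f} requests/second over {elapsed:.1f} s")
```

The observed rate fluctuates around the configured average, so short windows can look noticeably busier or quieter than `requests_per_second` suggests.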

Status Code Distribution

Configure success and error rates:

config = LogsConfig(
    count=10000,
    success_rate=0.98,  # 98% success (2xx responses)
)

| Parameter | Default | Description |
| --- | --- | --- |
| success_rate | 0.95 | Base success rate (2xx responses) |

Error Bursts

Simulate error bursts for monitoring/alerting demos:

config = LogsConfig(
    count=50000,
    error_burst={
        "enable": True,
        "burst_probability": 0.03,      # 3% chance of entering burst
        "burst_duration_seconds": 45,    # Average burst duration
        "burst_error_rate": 0.6,         # 60% errors during burst
    }
)

| Parameter | Default | Description |
| --- | --- | --- |
| enable | True | Enable error burst simulation |
| burst_probability | 0.02 | Probability of entering burst state |
| burst_duration_seconds | 30 | Average burst duration |
| burst_error_rate | 0.5 | Error rate during bursts |
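
One way to picture these parameters is as a two-state process that alternates between normal operation and a burst. The sketch below is an illustration under stated assumptions, not superstore's implementation: it assumes `burst_probability` is evaluated once per simulated second and that burst lengths are exponentially distributed around `burst_duration_seconds`. This document does not specify either detail. `simulate_bursts` and `observed_error_rate` are hypothetical helpers.

```python
import random


def simulate_bursts(seconds, requests_per_second=100, success_rate=0.95,
                    burst_probability=0.02, burst_duration_seconds=30,
                    burst_error_rate=0.5, seed=42):
    """Return one (in_burst, requests, errors) tuple per simulated second."""
    rng = random.Random(seed)
    burst_remaining = 0.0  # seconds left in the current burst
    timeline = []
    for _ in range(seconds):
        # Assumption: the chance of entering a burst is checked once per second.
        if burst_remaining <= 0 and rng.random() < burst_probability:
            # Assumption: burst length is exponential with the configured mean.
            burst_remaining = rng.expovariate(1.0 / burst_duration_seconds)
        in_burst = burst_remaining > 0
        error_rate = burst_error_rate if in_burst else 1.0 - success_rate
        errors = sum(rng.random() < error_rate for _ in range(requests_per_second))
        timeline.append((in_burst, requests_per_second, errors))
        burst_remaining -= 1.0
    return timeline


def observed_error_rate(timeline, in_burst):
    """Error fraction over all seconds in the given state."""
    rows = [row for row in timeline if row[0] == in_burst]
    requests = sum(row[1] for row in rows)
    return sum(row[2] for row in rows) / requests if requests else 0.0


timeline = simulate_bursts(seconds=600)
print(f"normal: {observed_error_rate(timeline, False):.3f}")
print(f"burst:  {observed_error_rate(timeline, True):.3f}")
```

Outside a burst the error rate follows `1 - success_rate`. Inside a burst it jumps to `burst_error_rate`, which is the kind of step change an alerting rule is meant to catch.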

Latency Distribution

Configure request latency behavior:

config = LogsConfig(
    count=10000,
    latency={
        "base_latency_ms": 60.0,        # Median latency
        "latency_stddev": 0.9,          # Log-normal spread
        "slow_request_probability": 0.08,  # 8% slow requests
        "slow_request_multiplier": 15.0,   # Slow = 15x base
    }
)

| Parameter | Default | Description |
| --- | --- | --- |
| base_latency_ms | 50.0 | Base/median latency in milliseconds |
| latency_stddev | 0.8 | Standard deviation (log-normal) |
| slow_request_probability | 0.05 | Probability of slow requests |
| slow_request_multiplier | 10.0 | Multiplier for slow request latency |
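
To see how these four parameters interact, the sketch below samples latencies from a log-normal distribution whose median is `base_latency_ms`, then multiplies a random fraction of them by `slow_request_multiplier`. It assumes `latency_stddev` is the standard deviation in log space (the sigma of the log-normal), which this document implies but does not state outright. `sample_latency_ms` is a hypothetical helper, not a superstore function.

```python
import math
import random
import statistics


def sample_latency_ms(rng, base_latency_ms=50.0, latency_stddev=0.8,
                      slow_request_probability=0.05, slow_request_multiplier=10.0):
    """Draw one latency value in milliseconds (hypothetical helper)."""
    # A log-normal with mu = ln(base) has median exp(mu) = base.
    latency = rng.lognormvariate(math.log(base_latency_ms), latency_stddev)
    # A small fraction of requests is scaled up to model slow outliers.
    if rng.random() < slow_request_probability:
        latency *= slow_request_multiplier
    return latency


rng = random.Random(42)
samples = [sample_latency_ms(rng) for _ in range(20_000)]
median = statistics.median(samples)
p99 = statistics.quantiles(samples, n=100)[98]  # 99th percentile
print(f"median={median:.1f} ms, p99={p99:.1f} ms")
```

The median stays close to `base_latency_ms` (slow requests pull it up only slightly), while the tail percentiles are dominated by the slow-request multiplier.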

Request Details

Customize request generation:

config = LogsConfig(
    count=10000,
    include_user_agent=True,   # Include user agent strings
    unique_ips=2000,           # Number of unique client IPs
    unique_users=800,          # Number of unique user IDs
    api_path_ratio=0.8,        # 80% API paths, 20% static
)

| Parameter | Default | Description |
| --- | --- | --- |
| include_user_agent | True | Include user agent strings |
| unique_ips | 1000 | Number of unique IP addresses |
| unique_users | 500 | Number of unique user IDs |
| api_path_ratio | 0.7 | Ratio of API vs static paths |

Complete Example

from superstore import logs, LogsConfig

config = LogsConfig(
    count=100000,
    seed=42,
    format="json",

    # Traffic
    start_time="2024-06-01T00:00:00",
    requests_per_second=500.0,

    # Success rate
    success_rate=0.97,

    # Error bursts for monitoring demos
    error_burst={
        "enable": True,
        "burst_probability": 0.02,
        "burst_duration_seconds": 60,
        "burst_error_rate": 0.7,
    },

    # Latency
    latency={
        "base_latency_ms": 45.0,
        "latency_stddev": 0.7,
        "slow_request_probability": 0.05,
        "slow_request_multiplier": 20.0,
    },

    # Request details
    unique_ips=5000,
    unique_users=2000,
    api_path_ratio=0.85,
)

df = logs(config=config)
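
Once logs like these are generated, a typical monitoring check is the error rate per minute, where an error burst shows up as a spike. The sketch below works on plain dict rows so that it runs without superstore. If `df` is a pandas DataFrame (an assumption, since this document does not state the return type), `df.to_dict("records")` would produce rows in this shape. `error_rate_per_minute` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def error_rate_per_minute(rows):
    """Map each minute to the fraction of non-2xx responses in that minute."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for row in rows:
        minute = row["timestamp"].replace(second=0, microsecond=0)
        totals[minute] += 1
        # success_rate is defined in terms of 2xx, so anything else is an error.
        if not 200 <= row["status_code"] < 300:
            errors[minute] += 1
    return {minute: errors[minute] / totals[minute] for minute in sorted(totals)}


# Hypothetical rows: a quiet minute followed by a minute with a burst.
start = datetime(2024, 6, 1, 0, 0, 0)
statuses = [200, 200, 201, 204, 500, 503, 200, 502]
sample_rows = [
    {"timestamp": start + timedelta(seconds=15 * i), "status_code": status}
    for i, status in enumerate(statuses)
]

rates = error_rate_per_minute(sample_rows)
for minute, rate in rates.items():
    print(minute.isoformat(), f"{rate:.2f}")
```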

API Reference

See the full API documentation: