Merged
31 changes: 11 additions & 20 deletions .github/workflows/rust.yml
@@ -10,26 +10,21 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout sources
-        uses: actions/checkout@v2
+        uses: actions/checkout@v4
 
       - name: Install nightly toolchain
-        uses: actions-rs/toolchain@v1
+        uses: dtolnay/rust-toolchain@nightly
         with:
-          profile: minimal
-          toolchain: nightly
-          override: true
           components: rustfmt, clippy
 
       - name: Run cargo check
-        uses: actions-rs/cargo@v1
-        with:
-          command: check
+        run: cargo check
 
       - name: Run cargo fmt
-        uses: actions-rs/cargo@v1
-        with:
-          command: fmt
-          args: -- --check
+        run: cargo fmt -- --check
+
+      - name: Run cargo clippy
+        run: cargo clippy -- -D warnings
 
   test:
     runs-on: ubuntu-latest
@@ -41,16 +36,12 @@ jobs:
 
     steps:
       - name: Checkout sources
-        uses: actions/checkout@v2
+        uses: actions/checkout@v4
 
-      - name: Install stable toolchain
-        uses: actions-rs/toolchain@v1
+      - name: Install toolchain
+        uses: dtolnay/rust-toolchain@master
         with:
-          profile: minimal
           toolchain: ${{ matrix.toolchain }}
-          override: true
 
       - name: Run cargo test
-        uses: actions-rs/cargo@v1
-        with:
-          command: test
+        run: cargo test
91 changes: 61 additions & 30 deletions README.md
@@ -1,52 +1,83 @@
# Hstats: Online Statistics and Histograms for Data Streams

[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/antimora/hstats/rust.yml)](https://github.com/antimora/hstats/actions)
[![GitHub tag](https://img.shields.io/github/checks-status/antimora/hstats/main.svg)](https://github.com/antimora/hstats)
[![Crates.io](https://img.shields.io/crates/v/hstats.svg)](https://crates.io/crates/hstats)
[![Docs.rs](https://docs.rs/hstats/badge.svg)](https://docs.rs/hstats)

`hstats` is a streamlined and high-performance library engineered for online statistical analysis
and histogram generation from data streams. With a focus on multi-threaded environments, `hstats`
facilitates parallel operations that can later be merged into a single `Hstats` instance.
A Rust library for computing histograms and statistics from data streams without loading entire
datasets into memory. Designed for parallel workloads where independent histograms can be merged
into a single result.

During the histogram creation process, the number and width of bins are predetermined. The bin width
is calculated using the formula (end - start)/nbins, based on the parameters provided by the user.
Values that fall within the range of [start, end) are assigned to the appropriate bins, while values
outside this range are counted in underflow and overflow bins, which allows for subsequent
adjustments to the histogram's range.
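The binning rule described above can be illustrated with a small standalone sketch. It is not the hstats source: the `Bins` type and its fields are hypothetical, and the clamp on the last index is a defensive choice of this sketch rather than something the README states.

```rust
/// Hypothetical fixed-width histogram, used only to illustrate the binning
/// rule; it is not the hstats implementation.
struct Bins {
    start: f64,
    end: f64,
    counts: Vec<u64>,
    underflow: u64,
    overflow: u64,
}

impl Bins {
    fn new(start: f64, end: f64, nbins: usize) -> Self {
        assert!(nbins > 0 && end > start, "need at least one bin and end > start");
        Bins { start, end, counts: vec![0; nbins], underflow: 0, overflow: 0 }
    }

    fn add(&mut self, value: f64) {
        if value < self.start {
            self.underflow += 1; // below [start, end)
        } else if value >= self.end {
            self.overflow += 1; // at or above end (the range is half-open)
        } else {
            // Bin width is (end - start) / nbins; the index is how many whole
            // widths `value` lies above `start`.
            let width = (self.end - self.start) / self.counts.len() as f64;
            let idx = ((value - self.start) / width) as usize;
            // Clamp in case floating-point rounding yields idx == nbins.
            let idx = idx.min(self.counts.len() - 1);
            self.counts[idx] += 1;
        }
    }
}

fn main() {
    // 10 bins over [0, 100): width = 10.0
    let mut h = Bins::new(0.0, 100.0, 10);
    for v in [-5.0, 0.0, 9.99, 10.0, 55.0, 99.9, 100.0, 250.0] {
        h.add(v);
    }
    println!("bins: {:?}", h.counts);
    println!("underflow: {}, overflow: {}", h.underflow, h.overflow);
}
```

With 10 bins over [0, 100) the width is 10.0, so 9.99 lands in bin 0, 10.0 in bin 1, and 100.0 is counted as overflow because the range is half-open.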
## Features

`hstats` utilizes Welford's algorithm via the
[rolling-stats](https://github.com/ryankurte/rust-rolling-stats) library to compute mean and
standard deviation statistics. The `hstats` library is compatible with
[no_std environments](https://docs.rust-embedded.org/book/intro/no-std.html) that support alloc.
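For reference, Welford's update keeps a running count, mean and sum of squared deviations (often called M2) and touches each value exactly once. The sketch below is a minimal, self-contained version; it is not the rolling-stats API, and the sample (n - 1) denominator is an assumption of this sketch, so check rolling-stats' documentation for the convention it actually uses.

```rust
/// Minimal Welford accumulator (illustrative; rolling-stats' own type and
/// method names may differ).
#[derive(Debug, Default, Clone, Copy)]
struct Welford {
    count: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the mean
}

impl Welford {
    fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        // The second factor uses the already-updated mean.
        self.m2 += delta * (x - self.mean);
    }

    /// Sample standard deviation (n - 1 denominator); 0.0 for fewer than 2 values.
    fn std_dev(&self) -> f64 {
        if self.count < 2 {
            0.0
        } else {
            (self.m2 / (self.count - 1) as f64).sqrt()
        }
    }
}

fn main() {
    let mut w = Welford::default();
    for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0] {
        w.update(x);
    }
    println!("n = {}, mean = {:.3}, std_dev = {:.3}", w.count, w.mean, w.std_dev());
}
```

Memory use stays constant regardless of stream length, which is what makes the approach suitable for online statistics.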

To simplify the output of statistics and histograms, `hstats` implements the `Display` trait for
`Hstats`. This allows users to define the floating-point precision (default is 2) for the printed
statistics and choose the character used for the histogram bars (default is `░`).
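The precision and bar-character options come down to two standard Rust formatting tools: `{:.*}`, which takes the precision as an extra argument (the same specifier the `Display` impl in `src/hstats.rs` uses), and `str::repeat`. The `format_row` function and its row layout are hypothetical, chosen only to show both in isolation.

```rust
/// Formats one histogram row with a runtime precision and a configurable bar
/// character. `format_row` and its layout are hypothetical, not hstats' output.
fn format_row(start: f64, end: f64, bar_len: usize, precision: usize, bar_char: &str) -> String {
    // `{:.*}` reads the precision from the argument list, just before the value.
    format!(
        "{:.*} | {:.*} | {}",
        precision,
        start,
        precision,
        end,
        bar_char.repeat(bar_len) // one copy of bar_char per unit of bar length
    )
}

fn main() {
    println!("{}", format_row(0.0, 10.0, 5, 2, "░")); // default-style settings
    println!("{}", format_row(0.0, 10.0, 5, 1, "#")); // custom precision and bar
}
```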
- **Online computation** - processes values one at a time, constant memory usage
- **Parallel-friendly** - build histograms per-thread, then `merge()` them
- **Underflow/overflow tracking** - values outside `[start, end)` are counted separately
- **Statistics** - min, max, mean, and standard deviation via Welford's algorithm
([rolling-stats](https://github.com/ryankurte/rust-rolling-stats))
- **`Display` trait** - configurable text-based histogram output with custom precision and bar characters
- **`no_std` compatible** - works in `no_std` environments that support `alloc`

## Getting Started

Add the following to your `Cargo.toml`:

```toml
[dependencies]
hstats = "0.1.0"
hstats = "0.2.0"
```

## Usage

```rust
use hstats::Hstats;

// Create a histogram with 10 bins over the range [0.0, 100.0)
let mut hist = Hstats::new(0.0, 100.0, 10);

// Add values
for value in &[15.0, 25.0, 35.5, 50.0, 72.0, 91.0] {
hist.add(*value);
}

// Query statistics
println!("count: {}, mean: {:.2}, std_dev: {:.2}", hist.count(), hist.mean(), hist.std_dev());
println!("min: {:.2}, max: {:.2}", hist.min(), hist.max());

// Print the histogram
println!("{}", hist.with_precision(1));
```

### Parallel usage

Build histograms independently on each thread, then merge:

```rust
// On each thread:
let mut local = Hstats::new(0.0, 100.0, 10);
for value in chunk {
local.add(*value);
}

// After all threads finish, merge results:
let combined = histograms.into_iter()
.reduce(|a, b| a.merge(&b))
.unwrap();
```

See [examples/single-thread.rs](examples/single-thread.rs) and
[examples/multi-thread.rs](examples/multi-thread.rs) for complete runnable examples.
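The parallel snippet above is intentionally partial (`chunk` and `histograms` are not defined there). A runnable, dependency-free sketch of the same per-thread-then-merge pattern follows, using `std::thread::scope` (Rust 1.63+). The `Counts` type, `parallel_histogram` and the chunking strategy are assumptions of this sketch; with hstats you would build an `Hstats` per thread and combine them with `merge` instead.

```rust
use std::thread;

/// Hypothetical per-thread histogram state (bin counts plus under/overflow).
#[derive(Debug, Clone, PartialEq)]
struct Counts {
    bins: Vec<u64>,
    underflow: u64,
    overflow: u64,
}

impl Counts {
    fn new(nbins: usize) -> Self {
        Counts { bins: vec![0; nbins], underflow: 0, overflow: 0 }
    }

    fn add(&mut self, v: f64, start: f64, end: f64) {
        if v.is_nan() {
            return; // skip NaN, mirroring the guard this PR adds to `add`
        }
        if v < start {
            self.underflow += 1;
        } else if v >= end {
            self.overflow += 1;
        } else {
            let width = (end - start) / self.bins.len() as f64;
            let i = (((v - start) / width) as usize).min(self.bins.len() - 1);
            self.bins[i] += 1;
        }
    }

    /// Element-wise sum; only meaningful when both sides share start, end and bin count.
    fn merge(&self, other: &Counts) -> Counts {
        assert_eq!(self.bins.len(), other.bins.len(), "Bin counts must be equal");
        Counts {
            bins: self.bins.iter().zip(&other.bins).map(|(a, b)| a + b).collect(),
            underflow: self.underflow + other.underflow,
            overflow: self.overflow + other.overflow,
        }
    }
}

/// Splits `data` into roughly equal chunks, bins each chunk on its own scoped
/// thread, then merges the per-thread results.
fn parallel_histogram(data: &[f64], threads: usize, start: f64, end: f64, nbins: usize) -> Counts {
    let threads = threads.max(1);
    let chunk_len = ((data.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|part| {
                s.spawn(move || {
                    let mut local = Counts::new(nbins);
                    for &v in part {
                        local.add(v, start, end);
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .fold(Counts::new(nbins), |acc, c| acc.merge(&c))
    })
}

fn main() {
    let data: Vec<f64> = (0..1000).map(|i| i as f64 / 10.0).collect(); // 0.0 ..= 99.9
    let h = parallel_histogram(&data, 4, 0.0, 100.0, 10);
    println!("{:?}", h);
}
```

Because each thread only touches its own `Counts`, no locking is needed; the merge step is a plain element-wise sum, which is why it requires identical `start`, `end` and bin count.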

## Examples

1. Single thread example: See [examples/single-thread.rs](examples/single-thread.rs) Run the example
with:
```shell
time cargo run --example single-thread --release
```
2. Multi-thread example: See [examples/multi-thread.rs](examples/multi-thread.rs) Run the example
with:
```shell
time cargo run --example multi-thread --release
```

Here is a sample output from the multi-thread example:
Run the examples with:

```shell
cargo run --example single-thread --release
cargo run --example multi-thread --release
```

Sample output from the multi-thread example:

```
Number of random samples: 50000000
```
100 changes: 79 additions & 21 deletions src/hstats.rs
@@ -4,6 +4,7 @@ use core::{
ops::AddAssign,
};

use alloc::format;
use alloc::string::{String, ToString};
use alloc::vec;
use alloc::vec::Vec;
@@ -22,7 +23,7 @@ const DEFAULT_PRECISION: usize = 2;
///
/// The struct includes fields for managing the histogram bins, underflow,
/// overflow, and other statistics.
#[derive(Debug, Default, Clone)]
#[derive(Debug, Clone)]
pub struct Hstats<T>
where
T: Float + AddAssign + FromPrimitive + Debug + Display,
@@ -83,6 +84,10 @@ where
///
/// * `value`: Value to be added to the histogram.
pub fn add(&mut self, value: T) {
if value.is_nan() {
return;
}

self.stats.update(value);

if value < self.start {
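The new early return matters because of two Rust floating-point rules: every comparison involving NaN is false, so a NaN passes range checks of the form `value < start` and `value >= end`, and a float-to-integer `as` cast saturates and maps NaN to 0. Assuming the collapsed remainder of `add` computes the bin index with the README's `(end - start)/nbins` rule, an unguarded NaN would be counted in the first bin, and passing it to `stats.update` would likely turn the running mean into NaN, since NaN propagates through arithmetic. The hypothetical `bin_index` helper below shows the effect with and without the guard.

```rust
/// Hypothetical helper: the bin a value would be counted in, or `None` when it
/// is skipped or outside [start, end). `skip_nan` toggles a guard like the one
/// this PR adds.
fn bin_index(value: f64, start: f64, end: f64, nbins: usize, skip_nan: bool) -> Option<usize> {
    if skip_nan && value.is_nan() {
        return None;
    }
    // Every comparison involving NaN is false, so NaN passes both range checks.
    if value < start || value >= end {
        return None;
    }
    let width = (end - start) / nbins as f64;
    // Float-to-int `as` casts saturate and map NaN to 0, so an unguarded NaN
    // would be silently counted in the first bin.
    Some(((value - start) / width) as usize)
}

fn main() {
    println!("{:?}", bin_index(f64::NAN, 0.0, 10.0, 10, false));
    println!("{:?}", bin_index(f64::NAN, 0.0, 10.0, 10, true));
    println!("{:?}", bin_index(5.0, 0.0, 10.0, 10, true));
}
```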
@@ -117,17 +122,16 @@ where
assert_eq!(self.bin_count, other.bin_count, "Bin counts must be equal");

let mut merged = Hstats::new(self.start, self.end, self.bin_count);
merged.precision = self.precision;
merged.bar_char = self.bar_char.clone();

// Add the underflow and overflow together
merged.underflow = self.underflow + other.underflow;
merged.overflow = self.overflow + other.overflow;

// Add the bins together
for (i, (left, right)) in (self.bins.iter().zip(other.bins.iter())).enumerate() {
merged.bins[i] = *left + *right;
}

// Merge the stats
merged.stats = self.stats.merge(&other.stats);

merged
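The bin, underflow and overflow merge above is a plain sum, but the statistics cannot simply be added: combining the means and variances of two partitions needs a correction term. The diff delegates this to `self.stats.merge(&other.stats)` without showing rolling-stats' internals, so the sketch below uses the well-known pairwise formula of Chan et al. as a stand-in; the `Summary` type is hypothetical and rolling-stats may implement the combination differently.

```rust
/// (count, mean, M2) summary of one partition; M2 is the sum of squared
/// deviations from the mean. Illustrative only, not rolling-stats' type.
#[derive(Debug, Clone, Copy)]
struct Summary {
    n: f64,
    mean: f64,
    m2: f64,
}

impl Summary {
    /// Two-pass summary of a slice, used here as a reference value.
    fn from_slice(xs: &[f64]) -> Self {
        let n = xs.len() as f64;
        let mean = if xs.is_empty() { 0.0 } else { xs.iter().sum::<f64>() / n };
        let m2: f64 = xs.iter().map(|x| (x - mean).powi(2)).sum();
        Summary { n, mean, m2 }
    }

    /// Pairwise combination (Chan et al.): merges two partitions without
    /// revisiting their raw values.
    fn merge(&self, other: &Summary) -> Summary {
        let n = self.n + other.n;
        if n == 0.0 {
            return *self;
        }
        let delta = other.mean - self.mean;
        Summary {
            n,
            mean: self.mean + delta * other.n / n,
            m2: self.m2 + other.m2 + delta * delta * self.n * other.n / n,
        }
    }
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [10.0, 20.0];
    let merged = Summary::from_slice(&a).merge(&Summary::from_slice(&b));
    let all = Summary::from_slice(&[1.0, 2.0, 3.0, 4.0, 10.0, 20.0]);
    println!("merged: {:?}\ndirect: {:?}", merged, all);
}
```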
@@ -242,44 +246,56 @@ where
}
}

impl<T> Default for Hstats<T>
where
T: Float + AddAssign + FromPrimitive + Debug + Display,
{
fn default() -> Self {
Self::new(T::zero(), T::one(), 10)
}
}
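Replacing `#[derive(Default)]` with a hand-written impl avoids an all-zero default. Assuming `start`, `end` and `bin_count` are plain numeric fields (their declarations are collapsed in this view), a derived `Default` would give `start = end = 0` and zero bins, so the README's width `(end - start)/nbins` would be `0.0 / 0.0`, i.e. NaN. The stand-in structs below are hypothetical and only contrast the two defaults.

```rust
/// Hypothetical stand-ins for the histogram's range fields.
#[derive(Debug, Default)]
struct DerivedRange {
    start: f64,
    end: f64,
    bin_count: usize,
}

#[derive(Debug)]
struct ManualRange {
    start: f64,
    end: f64,
    bin_count: usize,
}

impl Default for ManualRange {
    // Same values as the manual impl in the diff: [0, 1) with 10 bins.
    fn default() -> Self {
        ManualRange { start: 0.0, end: 1.0, bin_count: 10 }
    }
}

/// Bin width as described in the README: (end - start) / nbins.
fn bin_width(start: f64, end: f64, bin_count: usize) -> f64 {
    (end - start) / bin_count as f64
}

fn main() {
    let d = DerivedRange::default(); // every field zeroed
    let m = ManualRange::default();
    println!("derived width: {}", bin_width(d.start, d.end, d.bin_count));
    println!("manual width:  {}", bin_width(m.start, m.end, m.bin_count));
}
```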

/// Display the histogram as a text-based histogram.
impl<T> Display for Hstats<T>
where
T: Float + AddAssign + FromPrimitive + Debug + Display,
{
fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error> {
const MAX_BAR_SIZE: usize = 60; // Maximum size of the histogram bar
const MAX_BAR_SIZE: usize = 60;

// Find the bin with maximum count
let max_count = *self.bins.iter().max().unwrap_or(&0);
let total_count = self.count();
let bins = self.bins();

// let col1_width = self.bins.iter().max_by_key(|(start, _, _)| *start).unwrap();

let col1 = self
.bins()
let col1 = bins
.iter()
.map(|(start, _, _)| format!("{:.*}", self.precision, start).len())
.max()
.unwrap();
.unwrap_or(5);

let col2 = self
.bins()
let col2 = bins
.iter()
.map(|(_, end, _)| format!("{:.*}", self.precision, end).len())
.max()
.unwrap();
.unwrap_or(5);

let precision = self.precision;

writeln!(f, "{:^col1$} | {:^col2$}", "Start", "End")?;
writeln!(f, "{:-^col1$}-|-{:-^col2$}-", "", "")?;
for (range_start, range_end, count) in self.bins() {
// Calculate the length of the bar
let bar_length = ((count as f64 / max_count as f64) * MAX_BAR_SIZE as f64) as usize;
for (range_start, range_end, count) in &bins {
let bar_length = if max_count > 0 {
((*count as f64 / max_count as f64) * MAX_BAR_SIZE as f64) as usize
} else {
0
};

let percent = if total_count > 0 {
*count as f64 / total_count as f64 * 100.0
} else {
0.0
};

let percent = count as f64 / self.count() as f64 * 100.0;

// Create the bar string with '#' characters
let bar = self.bar_char.repeat(bar_length);

writeln!(
@@ -288,7 +304,7 @@ where
)?;
}
writeln!(f)?;
write!(f, "Total Count: {}", self.count())?;
write!(f, "Total Count: {}", total_count)?;
write!(f, " Min: {:.*}", self.precision, self.min())?;
write!(f, " Max: {:.*}", self.precision, self.max())?;
write!(f, " Mean: {:.*}", self.precision, self.mean())?;
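The guards in the new `fmt` cover the empty-histogram case, which the new `test_display_empty_histogram` test exercises. In the removed line `count as f64 / self.count() as f64 * 100.0`, an empty histogram gives `0.0 / 0.0`, which is NaN and prints as `NaN`; the old bar length only came out as 0 because Rust's `as usize` cast maps NaN to 0. The functions below are hypothetical stand-ins that reproduce the old and new arithmetic in isolation.

```rust
const MAX_BAR_SIZE: usize = 60;

/// Old-style percentage: divides by the total unconditionally.
fn percent_unguarded(count: u64, total: u64) -> f64 {
    count as f64 / total as f64 * 100.0
}

/// Guarded percentage, following the pattern in the new `fmt`.
fn percent_guarded(count: u64, total: u64) -> f64 {
    if total > 0 {
        count as f64 / total as f64 * 100.0
    } else {
        0.0
    }
}

/// Guarded bar length, scaled so the fullest bin gets MAX_BAR_SIZE characters.
fn bar_length(count: u64, max_count: u64) -> usize {
    if max_count > 0 {
        ((count as f64 / max_count as f64) * MAX_BAR_SIZE as f64) as usize
    } else {
        0
    }
}

fn main() {
    // An empty histogram: every bin count, the max count and the total are 0.
    println!("old: {:.2}%", percent_unguarded(0, 0));
    println!("new: {:.2}%", percent_guarded(0, 0));
    println!("bar: {}", bar_length(0, 0));
}
```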
@@ -401,6 +417,48 @@ mod tests {
let _ = hstats1.merge(&hstats2);
}

#[test]
fn test_default() {
let hstats: Hstats<f64> = Hstats::default();
assert_eq!(hstats.start(), 0.0);
assert_eq!(hstats.end(), 1.0);
assert_eq!(hstats.bin_count(), 10);
assert_eq!(hstats.count(), 0);
}

#[test]
fn test_add_nan() {
let mut hstats = Hstats::new(0.0, 10.0, 10);
hstats.add(5.0);
hstats.add(f64::NAN);
hstats.add(3.0);

assert_eq!(hstats.count(), 2);
assert_eq!(hstats.bins[5], 1);
assert_eq!(hstats.bins[3], 1);
}

#[test]
fn test_merge_preserves_settings() {
let mut h1 = Hstats::new(0.0, 10.0, 10)
.with_precision(4)
.with_bar_char("#");
h1.add(5.0);
let mut h2 = Hstats::new(0.0, 10.0, 10);
h2.add(6.0);

let merged = h1.merge(&h2);
assert_eq!(merged.precision, 4);
assert_eq!(merged.bar_char, "#");
}

#[test]
fn test_display_empty_histogram() {
let hstats = Hstats::new(0.0, 10.0, 5);
let output = format!("{}", hstats);
assert!(output.contains("Total Count: 0"));
}

#[test]
fn stats_for_large_random_data() {
type T = f64;
1 change: 0 additions & 1 deletion src/lib.rs
@@ -4,5 +4,4 @@ mod hstats;

pub use crate::hstats::Hstats;

#[macro_use]
extern crate alloc;
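Removing `#[macro_use]` from `extern crate alloc;` means `alloc`'s macros are no longer in scope crate-wide, which is why the diff adds `use alloc::format;` next to the existing `use alloc::vec;` (that path names both the `vec` module and the `vec!` macro). Since the 2018 edition, exported macros can be imported by path like this. The module and helper functions below are hypothetical; the crate presumably builds under `#![no_std]` (consistent with the README's no_std claim, though the attribute is not visible here), while this sketch declares `extern crate alloc` inside an ordinary std binary so it runs as-is.

```rust
// `extern crate alloc` also works in a std binary, so this sketch runs directly.
extern crate alloc;

mod labels {
    // Explicit path imports replace `#[macro_use] extern crate alloc;`.
    // `alloc::vec` names both the `vec` module and the `vec!` macro.
    use alloc::format;
    use alloc::string::String;
    use alloc::vec;
    use alloc::vec::Vec;

    /// Hypothetical helper: "bin 0", "bin 1", ... built with `format!`.
    pub fn bin_labels(n: usize) -> Vec<String> {
        (0..n).map(|i| format!("bin {}", i)).collect()
    }

    /// Hypothetical helper: zero-initialised counts built with `vec!`.
    pub fn zeroed_bins(n: usize) -> Vec<u64> {
        vec![0; n]
    }
}

fn main() {
    println!("{:?}", labels::bin_labels(3));
    println!("{:?}", labels::zeroed_bins(3));
}
```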