
Conversation

@weaversam8

Earlier today I opened bencheeorg/benchee#472, and I started thinking about how Statistex could be used to accomplish this. Statistex already supports calculating several statistics without touching the list of samples, as long as the prerequisite values are available to the caller (e.g. average can be called with just :total and :sample_size).

One statistic that didn't support this was variance (and, by extension, standard_deviation), since both require the entire list of samples to operate. Algorithms to calculate variance "on-line" exist, such as Welford's Online Algorithm, but they depend on a different statistic, m2: the running sum of squared differences from the current mean, which really only has meaning in the context of computing variance.
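For concreteness, here's a minimal sketch of Welford's update in Python (illustrative only - the actual PR is Elixir; variable names mirror the algorithm's usual presentation):

```python
def welford(samples):
    """Single pass tracking count, mean, and m2, the running sum of
    squared differences from the current mean."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in samples:
        count += 1
        delta = x - mean       # deviation from the old mean
        mean += delta / count  # update the running mean
        delta2 = x - mean      # deviation from the new mean
        m2 += delta * delta2   # accumulate squared differences
    return count, mean, m2

count, mean, m2 = welford([4.0, 7.0, 13.0, 16.0])
variance = m2 / (count - 1)  # sample variance derived from m2
```

Because the loop only needs the previous count, mean, and m2, a caller holding those three values can fold in new samples without revisiting the old ones - which is what exposing m2 enables.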

In this PR, I've:

  • added this m2 statistic
  • updated the variance statistic to calculate variance from m2 instead of from average
  • exposed the m2 statistic on the top-level Statistex struct, even though it's not a particularly useful statistic on its own
    • (I figured the ability for users to incrementally update statistics like variance and standard_deviation was in keeping with Statistex's philosophy of avoiding recomputation.)

This PR also updates some typespecs to better capture the :ignored behavior for statistics that support it, and tweaks the last few digits of the floating point results of the variance and standard_deviation statistics. I'm not a statistics expert, but I believe my floating point tweaks represent a shift towards more accuracy, not less, given the notes about loss of precision on the Wikipedia page linked above:

[Welford's online] algorithm is much less prone to loss of precision due to catastrophic cancellation, but might not be as efficient because of the division operation inside the loop.
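To make the precision point concrete, here's a small Python demonstration with illustrative numbers (not from the PR): the textbook one-pass formula sum(x²) − n·mean² cancels catastrophically when the mean is large relative to the spread, while Welford's update holds up.

```python
def naive_variance(samples):
    # textbook one-pass formula; prone to catastrophic cancellation
    n = len(samples)
    mean = sum(samples) / n
    return (sum(x * x for x in samples) - n * mean * mean) / (n - 1)

def welford_variance(samples):
    count, mean, m2 = 0, 0.0, 0.0
    for x in samples:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / (count - 1)

# The sample variance of [4, 7, 13, 16] is 30, regardless of a constant offset.
data = [1e8 + x for x in [4.0, 7.0, 13.0, 16.0]]
print(welford_variance(data))  # very close to 30.0
print(naive_variance(data))    # noticeably off due to cancellation
```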

I didn't open an issue before opening this PR, so I know these changes might not be desired - if so, please let me know! I'd love to discuss better ways to handle what I described in bencheeorg/benchee#472 if this isn't the best approach.

- Exposes the `m2` statistic so this algorithm can be used from user code
- Adds a test for the above
- Updates dialyzer specs to better tolerate the `:ignored` option when calling certain statistic functions
Member

@PragTob PragTob left a comment


👋

First of all, thanks a lot for the PR and the ideas!

On a high level, I'm always happy for statistex to support more and help people more, so from that point of view I love this! You also correctly picked up that it's very much within statistex's philosophy to let you provide as many intermediary values as possible if you want to. 👍 💚

The high level questions to me are:

  • Does this have an impact on performance for the "normal" use case? Online algorithms are pretty cool, but they usually come at a cost, such that the "everything at once" algorithms are faster. I'd love to have a benchmark showing how fast each approach is - if only for my benchmarking nerdiness.
  • The question of whether we're gonna do bencheeorg/benchee#472 is also still open - I just wanna keep that in mind. But statistex is a general purpose library, so even if we don't, I'm happy to have this in statistex (pending performance results; as a separate code path if it's slower)

Also 👋 I believe this is the first PR review you're getting from me. I leave a lot of comments. That's normal. It doesn't mean I don't like the PR - in fact, I think this PR is fantastic! I'm just rather detail oriented and want to understand things, so I'll leave small comments on wording/naming and ask questions. Just because I leave a comment doesn't mean things have to change - I'm often just genuinely curious.

On the bigger level for this PR:

  • we need to keep backwards compatibility with options that people can pass in (the missing :average in variance)
  • The performance results are interesting to me to see whether we can, as this PR does, just always use the m2 variant (if performance is comparable), or whether we need to make it opt-in/a separate API/whatever
  • since I think most people don't know what m2 is, we might need to add some more docs and/or name it differently (I just want people who see m2 in the struct to be able to quickly find the answer to "what the hell is this?")

Again, thanks a lot! Sorry for the many comments - I know they can seem intimidating, but they're definitely not meant that way 😅 Let me see if the train wifi allows me to upload a bunny pic...

[image: IMG_20171130_092158]

* `mix test` to run tests
* `mix dialyzer` to run dialyzer for type checking, might take a while on the first invocation (try building plts first with `mix dialyzer --plt`)
* `mix credo` to find code style problems
* To generate code coverage information:
Member

This isn't accurate for this project, as we use excoveralls - although, to be honest, I haven't used the old school `--cover` in a long while. So it's `mix coveralls.html` :)

m2: 17142.857142857145,
variance: 2857.1428571428573,
standard_deviation: 53.45224838248488,
standard_deviation_ratio: 0.10845383729779542,
Member

Thanks for the comment, but the differences are so far back that I don't think they matter - a normal test with "to a precision of 5 digits" wouldn't even have caught them.

** (ArgumentError) Passed an empty list ([]) to calculate statistics from, please pass a list containing at least one number.
"""
@spec standard_deviation(samples, keyword) :: float
@spec standard_deviation(samples | :ignored, keyword) :: float
Member

So, this confused me a bit :) I can now see why you did it, but I don't think it should be part of the type specs. Me passing in :ignored in some of the doc tests was just my cheeky way of showing that if you provide all the other data needed, then the samples truly aren't even touched and hence don't matter. There is no special meaning to :ignored - you could pass in a pid or whatever. I don't think people should do that, though.

I think if we wanted to highlight/encourage that more, we should supply a function/interface that just doesn't take any samples and either works on direct arguments or the keyword list, but raises if any of them aren't there :)
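Such a samples-free interface could look roughly like this (Python for shape only; the function name and argument set are hypothetical, not part of Statistex):

```python
def standard_deviation_from(*, sample_size, m2):
    """Hypothetical samples-free interface: the prerequisites are
    required keyword arguments, so calling without them raises
    instead of silently falling back to recomputation."""
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    variance = m2 / (sample_size - 1)
    return variance ** 0.5

standard_deviation_from(sample_size=4, m2=90.0)  # sqrt(30)
```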

** (ArgumentError) Passed an empty list ([]) to calculate statistics from, please pass a list containing at least one number.
"""
@spec standard_deviation_ratio(samples, keyword) :: float
@spec standard_deviation_ratio(samples | :ignored, keyword) :: float
Member

(same :ignored comment as above)

@doc """
Calculate the running sum of squared differences from the current mean.
This value is only used when trying to calculate the variance in a single pass, using Welford's online algorithm.
Member

A link similar to the one in your PR may be great here!

0

{0, 0} ->
0
Member

I think an `if` reads clearer here, as you can do `== 0` - then we don't need 2 separate cases for `0.0` and `0`.


_ ->
Keyword.get_lazy(options, :average, fn ->
average(:ignored, sample_size: count, total: total)
Member

I think here I'd just extract a (private) `calc_average` function, passing `sample_size` and `total` in as the args - it'll be extremely tiny, but yeah - it avoids the `:ignored` bit 👀

def m2([], _), do: raise(ArgumentError, @empty_list_error_message)

def m2(samples, options) when is_list(samples) do
count = Keyword.get(options, :sample_size, 0)
Member

I know it's nitpicky/focused on wording, but as the rest of the code calls this `sample_size`, I think it'd be worth keeping that wording :) (or, since we keep updating it, maybe something like `current_sample_size`?)

count = count + 1
delta = sample - mean
mean = mean + delta / count
delta2 = sample - mean
Member

Is there a better name for `delta2`? I haven't read the algorithm description yet, but I would hope there is a better name to use 😁


average =
Keyword.get_lazy(options, :average, fn -> average(samples, sample_size: sample_size) end)
m2 = Keyword.get_lazy(options, :m2, fn -> m2(samples) end)
Member

Not allowing `:average` to be provided any more would be a breaking change, which we should avoid (as I wouldn't want to bump a major version for this). I suppose that means we'd need to keep the "old" offline way to calculate variance around as well.
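Keeping both code paths is straightforward since they agree on the result; a Python sketch with illustrative names, assuming sample variance with the n − 1 divisor as elsewhere in this thread:

```python
def variance_offline(samples, average=None):
    # "old" path: honours a caller-supplied average (the :average option)
    n = len(samples)
    if average is None:
        average = sum(samples) / n
    return sum((x - average) ** 2 for x in samples) / (n - 1)

def variance_from_m2(m2, sample_size):
    # "online" path: variance derived from a precomputed m2
    return m2 / (sample_size - 1)

samples = [4.0, 7.0, 13.0, 16.0]
variance_offline(samples)                # 30.0
variance_offline(samples, average=10.0)  # 30.0, average supplied by caller
variance_from_m2(90.0, sample_size=4)    # 30.0, from precomputed m2
```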

@weaversam8
Author

Thanks for the detailed comments! No worries about the quantity, I also prefer detailed notes instead of one larger comment. I'll take a look at making some of these tweaks soon.

I was also curious about the benchmarking, and my first thought was to use Benchee. @PragTob, would you be OK with me adding some tests that benchmark this library using Benchee, even if that might create a circular dependency between Benchee and Statistex?


PragTob commented Oct 5, 2025

@weaversam8 if you can make the circular dependency work, for sure! Otherwise, a separate repo may be the solution. I've definitely benchmarked benchee using benchee :)

Also, sorry for the silence... I'll apologize in some other thread 😅
