Skip to content

Implementing Hypergeometric sampling to remove RFunctions dependency#1994

Open
JaoCR wants to merge 31 commits intoJuliaStats:masterfrom
JaoCR:hypergeosampling
Open

Implementing Hypergeometric sampling to remove RFunctions dependency#1994
JaoCR wants to merge 31 commits intoJuliaStats:masterfrom
JaoCR:hypergeosampling

Conversation

@JaoCR
Copy link
Copy Markdown

@JaoCR JaoCR commented Aug 18, 2025

Following the TODO in the Hypergeometric distribution sampling code, this PR implements the H2PE/HIM algorithm presented by Kachitvichyanukul & Schmeiser in 1984, removing this dependency on RFunctions and improving the sampling performance with a penalty in setup time and some extra stored constants.

Added some test cases for complete coverage of the distribution code. More extensive testing with the distribution of the generated variables was done (locally, manually) on both of the sub-cases of this algorithm.

Example case of HIM

The HIM algorith runs for smaller distributions. To exemplify, the Hypergeometric(20, 20, 10):

Setup Benchmark (with RFunctions):
image

Setup Benchmark (HIM):
image

Sampling Benchmark (with RFunctions):
image

Sampling Benchmark (HIM):
image

QQPlot with 1000 samples

plot_3

Example case of H2PE

The H2PE algorithm is used for larger cases. To examplify, Hypergeometric(200, 200, 100):

Setup Benchmark (with RFunctions):
image

Setup Benchmark (H2PE)
image

Sampling Benchmark (with RFunctions):
image

Sampling Benchmark (H2PE):
image

QQPlot with 1000 samples
plot_4

Feel free to ask any questions and do/request any adjustments. Have a nice week.

@devmotion
Copy link
Copy Markdown
Member

👍 Can you update the PR to follow the design of Distributions.jl, ie create a new sampler in src/samplers instead of changing the distribution type? 🙂

@JaoCR
Copy link
Copy Markdown
Author

JaoCR commented Aug 18, 2025

Tanks for taking a look at it, will do. Will be back with an update in the near future. (Thinking about it, this may solve the test that is currently failing, I'm dispatching on a type parameter to switch between algorithms and this seems to be against the intended pattern here)

@JaoCR

This comment was marked as resolved.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Aug 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (bd05c70) to head (5fc62b9).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1994      +/-   ##
==========================================
+ Coverage   86.46%   86.67%   +0.21%     
==========================================
  Files         147      148       +1     
  Lines        8837     8903      +66     
==========================================
+ Hits         7641     7717      +76     
+ Misses       1196     1186      -10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JaoCR

This comment was marked as resolved.

@JaoCR

This comment was marked as resolved.

@JaoCR

This comment was marked as resolved.

@JaoCR
Copy link
Copy Markdown
Author

JaoCR commented Aug 23, 2025

I'm back. The errors seen before, after further investigation, also happen with other distributions due to the lack of support for the logpdf function for Dual values of x in StatsFuns. Those other distributions are already excluded from the test. I've included the Hypergeometric distribution to this exclusion rule (it is still tested with Float64 values of x in the logpdf). Also added one more test to cover the modes function which was uncovered before.

@JaoCR
Copy link
Copy Markdown
Author

JaoCR commented Aug 23, 2025

Checks are passing and test coverage seems to be complete now. @devmotion , please review this when you have time, thanks.

@oscardssmith
Copy link
Copy Markdown
Contributor

@devmotion can this be rebased and merged? I'm happy to do the rebase if you give me permission.

@devmotion
Copy link
Copy Markdown
Member

can this be rebased and merged

I rebased, but the PR has to be reviewed before merging. Apparently, I missed the ping for review above.

@oscardssmith
Copy link
Copy Markdown
Contributor

I've given the styling changes that I would make if I were writing this, but from the perspective of functionality, correctness, appropriateness of tests, and performance this looks good to me. (It was a pretty easy PR for me to review since I found it after writing my own implementation of the same algorithm)

@JaoCR
Copy link
Copy Markdown
Author

JaoCR commented Feb 17, 2026

I'll have a little time this week to go through the requested reviews. Just saw also that #2032 talks about some performance improvements, so I'll also look at what @oscardssmith worked on over there and test those improvements as well.

Co-authored-by: Oscar Smith <oscardssmith@gmail.com>
Co-authored-by: David Müller-Widmann <devmotion@users.noreply.github.com>
@oscardssmith
Copy link
Copy Markdown
Contributor

Thanks for picking this up again! Feel free to take any code from #2032 as useful.

@JaoCR
Copy link
Copy Markdown
Author

JaoCR commented Feb 28, 2026

Turns out I didn't have time last week after all, but I'll be doing a few things today. @oscardssmith, I didn't have time to go through your version in more detail, but I see your approach to the algorithm selection is using a boolean flag instead of type dispatch. Do you think this is improving the performance or is it some other optimization?

@JaoCR
Copy link
Copy Markdown
Author

JaoCR commented Feb 28, 2026

Testing it here it looks like most of the improvement comes from the optimization of the acceptance check at the end of H2PE, my implementation didn't show improvements when I tried it back then but yours is, that's great! I'm incorporating the optimization in this branch. I don't know how I could directly merge from your code so I'm leaving a comment giving credits to your implementation.

@JaoCR
Copy link
Copy Markdown
Author

JaoCR commented Feb 28, 2026

Just to keep the rigor, here are the same benchmarks and qqplots after the updates.
/home/jao/Imagens/plot_2.png

Sampler Initialization with HIN:
image

Sampling with HIN:
image

qq-plot with 1000 samples with HIN:
plot_2

Sampler Initialization with H2PE:
image

Sampling with H2PE:
image

qq-plot with 1000 samples with H2PE:
plot_4

@oscardssmith
Copy link
Copy Markdown
Contributor

Looks great to me! The last change I would make is you can remove RMath as a direct dependency, and you can delete the macro rand_rdist(D) from src/common.jl (see my Pr for details). Other than that, IMO this is good to merge. @devmotion can you give it a review?

@JaoCR
Copy link
Copy Markdown
Author

JaoCR commented Mar 2, 2026

Sorry, sent some stuff as review answers but they should've been just comments. Anyhow, I've applied most of the suggestions and some fixes that where needed. The substitution of loggamma for logbeta calls worsens the performance, here is the run without this change:
image
and with the change:
image
So I've reverted it.

@oscardssmith
Copy link
Copy Markdown
Contributor

lgtm

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes seem unrelated?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are these referring to? This isn't associated with a line.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a file comment. I'm referring to all changes in this file (same below).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JaoCR why is Hypergeometric excluded from the tests below?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this seems unrelated?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's slightly unrelated, but new correct tests doesn't seem like a bad thing to me

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's slightly unrelated, but new correct tests doesn't seem like a bad thing to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants