Implementing Hypergeometric sampling to remove RFunctions dependency#1994
Implementing Hypergeometric sampling to remove RFunctions dependency#1994JaoCR wants to merge 31 commits intoJuliaStats:masterfrom
Conversation
|
👍 Can you update the PR to follow the design of Distributions.jl, ie create a new sampler in src/samplers instead of changing the distribution type? 🙂 |
|
Tanks for taking a look at it, will do. Will be back with an update in the near future. (Thinking about it, this may solve the test that is currently failing, I'm dispatching on a type parameter to switch between algorithms and this seems to be against the intended pattern here) |
This comment was marked as resolved.
This comment was marked as resolved.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1994 +/- ##
==========================================
+ Coverage 86.46% 86.67% +0.21%
==========================================
Files 147 148 +1
Lines 8837 8903 +66
==========================================
+ Hits 7641 7717 +76
+ Misses 1196 1186 -10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
|
I'm back. The errors seen before, after further investigation, also happen with other distributions due to the lack of support for the logpdf function for Dual values of x in StatsFuns. Those other distributions are already excluded from the test. I've included the Hypergeometric distribution to this exclusion rule (it is still tested with Float64 values of x in the logpdf). Also added one more test to cover the modes function which was uncovered before. |
|
Checks are passing and test coverage seems to be complete now. @devmotion , please review this when you have time, thanks. |
|
@devmotion can this be rebased and merged? I'm happy to do the rebase if you give me permission. |
I rebased, but the PR has to be reviewed before merging. Apparently, I missed the ping for review above. |
|
I've given the styling changes that I would make if I were writing this, but from the perspective of functionality, correctness, appropriateness of tests, and performance this looks good to me. (It was a pretty easy PR for me to review since I found it after writing my own implementation of the same algorithm) |
|
I'll have a little time this week to go through the requested reviews. Just saw also that #2032 talks about some performance improvements, so I'll also look at what @oscardssmith worked on over there and test those improvements as well. |
Co-authored-by: Oscar Smith <oscardssmith@gmail.com>
Co-authored-by: David Müller-Widmann <devmotion@users.noreply.github.com>
|
Thanks for picking this up again! Feel free to take any code from #2032 as useful. |
|
Turns out I didn't have time last week after all, but I'll be doing a few things today. @oscardssmith, I didn't have time to go through your version in more detail, but I see your approach to the algorithm selection is using a boolean flag instead of type dispatch. Do you think this is improving the performance or is it some other optimization? |
|
Testing it here it looks like most of the improvement comes from the optimization of the acceptance check at the end of H2PE, my implementation didn't show improvements when I tried it back then but yours is, that's great! I'm incorporating the optimization in this branch. I don't know how I could directly merge from your code so I'm leaving a comment giving credits to your implementation. |
|
Looks great to me! The last change I would make is you can remove RMath as a direct dependency, and you can delete the |
Co-authored-by: David Müller-Widmann <devmotion@users.noreply.github.com>
Co-authored-by: David Müller-Widmann <devmotion@users.noreply.github.com>
|
lgtm |
There was a problem hiding this comment.
These changes seem unrelated?
There was a problem hiding this comment.
what are these referring to? This isn't associated with a line.
There was a problem hiding this comment.
It's a file comment. I'm referring to all changes in this file (same below).
There was a problem hiding this comment.
@JaoCR why is Hypergeometric excluded from the tests below?
There was a problem hiding this comment.
Same here, this seems unrelated?
There was a problem hiding this comment.
it's slightly unrelated, but new correct tests doesn't seem like a bad thing to me
There was a problem hiding this comment.
it's slightly unrelated, but new correct tests doesn't seem like a bad thing to me








Following the TODO in the Hypergeometric distribution sampling code, this PR implements the H2PE/HIM algorithm presented by Kachitvichyanukul & Schmeiser in 1984, removing this dependency on RFunctions and improving the sampling performance with a penalty in setup time and some extra stored constants.
Added some test cases for complete coverage of the distribution code. More extensive testing with the distribution of the generated variables was done (locally, manually) on both of the sub-cases of this algorithm.
Example case of HIM
The HIM algorith runs for smaller distributions. To exemplify, the
Hypergeometric(20, 20, 10):Setup Benchmark (with RFunctions):

Setup Benchmark (HIM):

Sampling Benchmark (with RFunctions):

Sampling Benchmark (HIM):

QQPlot with 1000 samples
Example case of H2PE
The H2PE algorithm is used for larger cases. To examplify,
Hypergeometric(200, 200, 100):Setup Benchmark (with RFunctions):

Setup Benchmark (H2PE)

Sampling Benchmark (with RFunctions):

Sampling Benchmark (H2PE):

QQPlot with 1000 samples

Feel free to ask any questions and do/request any adjustments. Have a nice week.