ENH: Implement cvm_cdf_inf from scipy.stats._hypotests #97

Open
fbourgey wants to merge 3 commits into scipy:main from fbourgey:add_cdf_cvm_inf

@fbourgey
Member

Following @mdhaber's comment and @steppi's suggestion, this PR:

  • Ports the CDF of the asymptotic (infinite sample size) null distribution of the one-sample Cramér-von Mises statistic, _cdf_cvm_inf in scipy.stats._hypotests, to xsf/stats.h as cvm_cdf_inf.

  • Adds a test with reference values computed from scipy.stats._hypotests._cdf_cvm_inf.

  • For reference, the implementation is based on the asymptotic formula from Csörgő, S. and Faraway, J. (1996).

[Screenshot, 2026-02-14 at 3:30:42 PM]
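
For readers who want to see the shape of the series being ported, here is a minimal standalone sketch. It is not the code in this PR. It assumes the series is F(x) = 1 / (pi^(3/2) sqrt(x)) * sum over k >= 0 of Gamma(k + 1/2) / Gamma(k + 1) * sqrt(4k + 1) * exp(-q_k) * K_{1/4}(q_k) with q_k = (4k + 1)^2 / (16 x), and it uses std::cyl_bessel_k from the standard library where an xsf version would presumably call xsf's own Bessel routine. The name cvm_cdf_inf_sketch, the max_terms cap, the early exit for large q, and the handling of x <= 0 are assumptions added for illustration.

```cpp
// Ask libstdc++ to expose the special math functions before C++17 as well.
#define __STDCPP_WANT_MATH_SPEC_FUNCS__ 1
#include <cmath>

// Hypothetical sketch of the asymptotic Cramér-von Mises CDF series.
// Not the PR's implementation; names and guards are illustrative.
inline double cvm_cdf_inf_sketch(double x, double term_tol = 1e-7, int max_terms = 1000) {
    if (std::isnan(x)) {
        return x;  // propagate NaN
    }
    if (x <= 0.0) {
        return 0.0;  // assumption: the statistic is positive, so the CDF is 0 here
    }
    const double pi = std::acos(-1.0);
    const double prefactor = 1.0 / (std::pow(pi, 1.5) * std::sqrt(x));
    double total = 0.0;
    for (int k = 0; k < max_terms; ++k) {
        const double y = 4.0 * k + 1.0;
        const double q = y * y / (16.0 * x);
        if (q > 400.0) {
            // exp(-q) * K_{1/4}(q) decays like exp(-2q), which underflows to
            // zero in double precision here, so this and all later terms vanish.
            break;
        }
        // Gamma(k + 1/2) / Gamma(k + 1), computed via log-gamma to avoid overflow.
        const double gamma_ratio = std::exp(std::lgamma(k + 0.5) - std::lgamma(k + 1.0));
        const double term =
            prefactor * gamma_ratio * std::sqrt(y) * std::exp(-q) * std::cyl_bessel_k(0.25, q);
        total += term;
        if (std::abs(term) < term_tol) {
            break;  // stop once a term falls below the threshold (1e-7 by default)
        }
    }
    return total;
}
```

The default threshold of 1e-7 mirrors the `std::abs(z) < 1e-7` check in the diff. Because the stopping rule looks only at the size of the last term, it is a heuristic rather than a proven bound on the truncation error.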

Contributor

@mdhaber left a comment

Thanks @fbourgey!

@steppi I have a few questions about preferred style in xsf, but I checked this from a content perspective, and it LGTM. It's a pretty direct translation of the Python code, and the test compares the results against those from Python. I assume the CI failure is unrelated, so I think it's ready for your review.

Comment on lines +57 to +59
if (std::abs(z) < 1e-7) {
    break;
}
Contributor

Does xsf have any tools for summing infinite series like SciPy/mpmath's nsum?
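
To make the question concrete, this is roughly what a reusable helper for the "add terms until they are small" pattern could look like. It is a hedged sketch only: sum_until_small and SeriesResult are hypothetical names, not existing xsf or SciPy APIs, and unlike mpmath's nsum it does no convergence acceleration.

```cpp
#include <cmath>

// Hypothetical result type: the partial sum, how many terms were added,
// and whether the stopping criterion was met before hitting max_terms.
struct SeriesResult {
    double sum;
    int n_terms;
    bool converged;
};

// Hypothetical helper: add term(0), term(1), ... until |term(k)| < tol
// or max_terms terms have been added.
template <typename TermFn>
SeriesResult sum_until_small(TermFn term, double tol, int max_terms) {
    SeriesResult res{0.0, 0, false};
    for (int k = 0; k < max_terms; ++k) {
        const double z = term(k);
        res.sum += z;
        res.n_terms = k + 1;
        if (std::abs(z) < tol) {
            res.converged = true;
            break;
        }
    }
    return res;
}
```

Returning the converged flag lets a caller distinguish "the last term was small" from "the loop ran out of terms". A small last term does not by itself bound the remaining tail, so the criterion suits rapidly decaying series like this one better than slowly converging ones.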

Comment on lines 5 to 14
// Generate linspace: xs = np.linspace(2e-3, 1 - 2e-3, 51)
const int n_points = 51;
const double start = 2e-3;
const double end = 1.0 - 2e-3;
std::vector<double> xs(n_points);
for (int i = 0; i < n_points; ++i) {
    xs[i] = start + (end - start) * i / (n_points - 1);
}

// Reference values computed with scipy.stats._hypotests._cdf_cvm_inf
Contributor

Would it be good to just include the complete Python code? That makes it easier to verify that the numbers are correct.

// Reference values computed with scipy.stats._hypotests._cdf_cvm_inf
// import numpy as np
// from scipy.stats._hypotests import _cdf_cvm_inf
// x = np.linspace(2e-3, 1-2e-3, 51)
// res = _cdf_cvm_inf(x)

Member Author

I've added the suggestion.


// Reference values computed with scipy.stats._hypotests._cdf_cvm_inf
const std::vector<double> expected = {
1.14362132e-27, 5.17604145e-03, 7.65622444e-02, 1.97103480e-01, 3.17803769e-01, 4.22913972e-01, 5.10702567e-01,
Contributor

Is it preferred in xsf to truncate the results to a certain number of digits? In any case, these are the right numbers.
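
For context on the digits question: the values above carry 9 significant digits, so each can differ from the full double by a relative rounding error of up to 5e-9, which is already half of the rtol = 1e-8 used in the test. A double needs up to 17 significant digits to survive a text round trip. The sketch below illustrates this; format_sig and round_trips are hypothetical helper names, not part of xsf.

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Number of significant decimal digits that guarantees an exact
// text round trip for double (17 for IEEE 754 binary64).
constexpr int kRoundTripDigits = std::numeric_limits<double>::max_digits10;

// Hypothetical helper: format a double in scientific notation with
// the requested number of significant digits.
inline std::string format_sig(double value, int sig_digits) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.*e", sig_digits - 1, value);
    return std::string(buf);
}

// Hypothetical helper: true if parsing the formatted text gives back
// exactly the same double.
inline bool round_trips(double value, int sig_digits = kRoundTripDigits) {
    return std::strtod(format_sig(value, sig_digits).c_str(), nullptr) == value;
}
```

On the Python side, printing the reference array with 17 significant digits would give literals that reproduce SciPy's doubles exactly, at the cost of longer lines in the test file.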


const double rtol = 1e-8;

for (int i = 0; i < n_points; ++i) {
Contributor

@steppi your eyes will be more attuned to the standard testing idioms, but this looks like it's comparing the right things.
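
As a reference point for what "comparing the right things" means here, this is a minimal sketch of an element-wise relative-tolerance check. relative_error and all_close are hypothetical names; xsf's own test utilities and assertion macros may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: |actual - expected| / |expected|, with exact
// matches (including 0 vs 0) reported as zero error.
inline double relative_error(double actual, double expected) {
    if (actual == expected) {
        return 0.0;
    }
    if (expected == 0.0) {
        return std::numeric_limits<double>::infinity();
    }
    return std::abs(actual - expected) / std::abs(expected);
}

// Hypothetical helper: true if the vectors have the same length and every
// element agrees with its reference value to within rtol.
inline bool all_close(const std::vector<double>& actual,
                      const std::vector<double>& expected, double rtol) {
    if (actual.size() != expected.size()) {
        return false;
    }
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (!(relative_error(actual[i], expected[i]) <= rtol)) {
            return false;  // the negated comparison also rejects NaN
        }
    }
    return true;
}
```

A relative check matters for this test because the first reference value is about 1.1e-27: an absolute tolerance of 1e-8 would accept a result of exactly 0 there, while a relative tolerance would not.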


inline double chdtri(double df, double y) { return cephes::chdtri(df, y); }

inline double cvm_cdf_inf(double x) {
Contributor

This looks like a faithful translation of the original code.
