Skip to content

Subtypes of HypothesisTest can have optional fields :tail and :alpha#100

Open
mirkobunse wants to merge 19 commits intoJuliaStats:masterfrom
mirkobunse:master
Open

Subtypes of HypothesisTest can have optional fields :tail and :alpha#100
mirkobunse wants to merge 19 commits intoJuliaStats:masterfrom
mirkobunse:master

Conversation

@mirkobunse
Copy link
Copy Markdown

This enables the specification of tail (left/right/both) and alpha value in the constructors of HypothesisTest subtypes:

using HypothesisTests
EqualVarianceTTest(x, y, alpha=0.0025, tail=:right)

Since the additional arguments of the constructors are named arguments, nobody using this package has to change his constructor calls. Defaults are used so that the package works just as before when the optional arguments are not provided.

Currently, the new optional functionality of this pull request is only implemented for the t-tests. Other subtypes of HypothesisTest can easily follow the adaptations of src/t.jl.

I am looking forward to your comments!
Mirko


Note that this pull request was already present as pull request #99. Since there was a problem with the Travis build routine, I reopen that pull request here.

Copy link
Copy Markdown
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks useful. However, the new feature should be tested. Also, the pvalue function should only be modified for tests which support tail, else it's misleading. Finally, the docs must be updated too.

Comment thread src/HypothesisTests.jl Outdated
end

# Utility to get an optional field (e.g., :tail and :alpha)
getfield(value::HypothesisTest, name::Symbol, default::Any) =
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not use the name getfield, which is associated with the a.b syntax and is currently not overloadable. Also, this approach isn't really Julian. Better define fallbacks for get_tail and get_alpha which return the default (or maybe better, print a warning), and define specific methods for the distributions which support it (and which are very few).

Also, remove the "get_" prefix.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the "get_" prefix, I simply called the functions tail and alpha. The tail is :undefined, which is the value returned by default_tail if that method is not overloaded. For all tests that always use some default tail, the tail method returns that tail, just like default_tail does. For t-tests, which can be configured to use another tail, that other tail is returned (or the default, if no other tail is specified).

I feel that the method default_tail is superseded by this implementation of the tail method.
default_tail can still be used (and behaves just like tail) but it emits a deprecation warning.

Comment thread src/HypothesisTests.jl Outdated

alpha = get_alpha(test)
tail = get_tail(test)
conf_string = string((1 - alpha) * 100) * "%"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to do string((1 - alpha) * 100) and to insert % directly in the final string to avoid one allocation.

Comment thread src/HypothesisTests.jl Outdated

# population details
has_ci = applicable(StatsBase.confint, test)
label_len = has_ci ? length(conf_string) + 21 : 22 # 21 is length of ' confidence interval:', 22 that of 'parameter of interest:'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Break line at 92 chars, e.g. by moving the comment to a separate line.

But this actually sounds kind of overkill to me: it would be simpler to align all lines (of all blocks) to the longest possible one all the time.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a function prettify_detail inside Base.show now that aligns each line consisting of label and value to the longest possible length.

Comment thread src/HypothesisTests.jl Outdated
get_tail(test::HypothesisTest) = getfield(test, :tail, default_tail(test))
get_alpha(test::HypothesisTest) = getfield(test, :alpha, 0.05)

# Utility for pretty-printing: Append white space so that length(with_trailing_whitespace(s)) = max(len, length(s))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap line at 92 chars (check other places too).

Comment thread src/HypothesisTests.jl Outdated
get_alpha(test::HypothesisTest) = getfield(test, :alpha, 0.05)

# Utility for pretty-printing: Append white space so that length(with_trailing_whitespace(s)) = max(len, length(s))
with_trailing_whitespace(s::String, len::Int) = s * join(repeat([" "], outer=max(len - length(s), 0)), "")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just " "^max(len - length(s), 0)).

Comment thread src/augmented_dickey_fuller.jl Outdated
end

pvalue(x::ADFTest) = HypothesisTests.pvalue(Normal(0, 1), adf_pv_aux(x.stat, x.deterministic); tail=:left)
pvalue(x::ADFTest; tail=default_tail(x)) = HypothesisTests.pvalue(Normal(0, 1), adf_pv_aux(x.stat, x.deterministic); tail=tail)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you don't use tail=:left here, since you already know the result of default_tail for that test. Same below. Also, 92 char limit.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this change by removing the named tail parameter for all but the t-tests

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather keep it, but call tail rather than default_tail.

Comment thread src/circular.jl Outdated
function pvalue(x::RayleighTest)
function pvalue(x::RayleighTest; tail=default_tail(x))
if tail != default_tail(x) # warn that tail is not used here
warn("tail=$tail has no effect on the computation of the p value")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better throw an error if the argument doesn't work.

Comment thread src/t.jl Outdated

if tail == :left
(-Inf, StatsBase.confint(x, alpha*2)[2])
(-Inf, StatsBase.confint(x, alpha*2, tail=:both)[2]) # tail=:both required as recursive anchor
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"recursive anchor"? I don't understand what happens here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quantiles are only computed if tail==:both. The recursion would be infinite if the argument tail=:both would not be provided because tail is set to the tail of x now, which may be one-sided.

The term "recursive anchor" is chosen because tail==:both will not allow infinite recursion.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that the term is not obvious, at least not to me. Just say something about blocking infinite recursion.

@mirkobunse
Copy link
Copy Markdown
Author

mirkobunse commented Jun 17, 2017

Thank you very much, @nalimilan, for your detailed review! I implemented all changes you requested and even learned about some other nice features of Julia during the process. Docs and Unit tests are updated.

See my comments for more information and please let me know, if there is anything else to change.

@mirkobunse
Copy link
Copy Markdown
Author

Please note that the Travis build failed for julia:nightly because the package Rmath (and thus also StatsFuns and Distributions, on which this package depends) could not be precompiled.

I hope this is no problem for this pull request. The REQUIRE file states that only julia 0.5 is meant to be supported.

Comment thread .travis.yml Outdated
julia:
- 0.5
- nightly
# - nightly # The REQUIRE file states that only julia 0.5 is supported
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave this as it was, it's not an issue if the CI fails for this version (we're not robots).

Comment thread doc/api/confint.rst Outdated

Compute a confidence interval C with coverage 1-``alpha``.
Compute a confidence interval C with coverage 1-``alpha``
(``alpha=0.05`` is the default for all tests).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.05 is the default for all tests unless overridden when creating the object.

Comment thread doc/parametric/test_t.rst Outdated
\nu_{\chi'} \approx \frac{\left(\sum_{i=1}^n k_i s_i^2\right)^2}
{\sum_{i=1}^n \frac{(k_i s_i^2)^2}{\nu_i}}


Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove trailing spaces here and elsewhere.

Comment thread src/HypothesisTests.jl Outdated

# utilities for pretty-printing
conf_string = string(floor((1 - alpha(test)) * 100, 6)) # limit to 6 decimals in %
prettify_detail(label::String, value::Any, len::Int) = # len is max length of label
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say "format" rather than "prettify".

Comment thread src/HypothesisTests.jl Outdated
println(io, prettify_detail("point estimate:", param_estimate, 32))
if has_ci
println(io, " 95% confidence interval: $(StatsBase.confint(test))")
println(io, prettify_detail(conf_string*"% confidence interval:", StatsBase.confint(test), 32))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for StatsBase., right?

Comment thread doc/api/confint.rst Outdated
(``alpha=0.05`` is the default for all tests).

If ``tail`` is ``:both`` (default), then a two-sided confidence
If ``tail`` is ``:both`` (default for most tests), then a two-sided confidence
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"most" isn't precise enough: give a list of tests for which it is the case below. Same remark for pvalue.

Comment thread doc/parametric/test_t.rst Outdated
the alternative hypothesis that the distribution does not have mean
``mu0``.

``tail`` and ``alpha`` specify the defaults for the application of
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"for the application of" is an unusual vocabulary. Maybe "when calling"?

Comment thread src/HypothesisTests.jl Outdated
p = pvalue(test)
outcome = if p > 0.05 "fail to reject" else "reject" end
tail = default_tail(test)
tail = HypothesisTests.tail(test)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HypothesisTests. isn't needed. Same below in comment.

Comment thread src/HypothesisTests.jl Outdated
tail = default_tail(test)
tail = HypothesisTests.tail(test)
p = pvalue(test) # obeys value of HypothesisTests.tail(test) if applicable
outcome = if p > alpha(test) "fail to reject" else "reject" end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more standard to use p > alpha(test) ? ... : .... Below, since the conditions are relatively long, would be better to repeat tailvalue = inside each branch (i.e. closer to what it was before).

Comment thread src/t.jl

default_tail(test::TTest) = :both
pvalue(x::TTest; tail=x.tail) = pvalue(TDist(x.df), x.t; tail=tail)
tail(x::TTest) = x.tail # defaults set by constructors
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than defining this here, can you just use these definitions for all HypothesisTest types, and add the needed fields to all tests? If tail doesn't make sense for some types, you can then override the method (to a hardcoded value) instead of adding the field.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nalimilan,

Adding a tail field to all types, which is not used by several of them, is not a quite clean solution. I would rather add the field to those types that actually can make use of it. These are quite many, though: Below, you find a list of types for which the pvalue function has a tail parameter, and those types for which pvalue has a fixed tail.

I will add a tail field to those types, for which the pvalue function takes a tail parameter.

The alpha value is relevant to all tests. Therefore, I will add an alpha field to all types. I will also add a simple function obtaining the test outcome:

reject_h0(test::HypothesisTest) = pvalue(test) <= test.alpha

I am looking forward to your comments!


pvalue with fixed tail (no tail parameter; 9 types):

  • OneSampleADTest
  • KSampleADTest
  • ADFTest
  • BoxPierceTest
  • LjungBoxTest
  • BreuschGodfreyTest
  • RayleighTest
  • JarqueBeraTest
  • KruskalWallisTest

pvalue with tail parameter (15 types):

  • BinomialTest
  • SignTest
  • FisherTLinearAssociation
  • JammalamadakaCircularCorrelation
  • FisherExactTest
  • ExactKSTest
  • ApproximateOneSampleKSTest
  • ApproximateTwoSampleKSTest
  • ExactMannWhitneyUTest
  • ApproximateMannWhitneyUTest
  • PowerDivergenceTest
  • ExactSignedRankTest
  • ApproximateSignedRankTest
  • ZTest
  • TTest

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a tail field to all types, which is not used by several of them, is not a quite clean solution. I would rather add the field to those types that actually can make use of it. These are quite many, though: Below, you find a list of types for which the pvalue function has a tail parameter, and those types for which pvalue has a fixed tail.

I will add a tail field to those types, for which the pvalue function takes a tail parameter.

Yes, of course, no need to add the field to tests which don't need it, overloading the method will be enough for these.

The alpha value is relevant to all tests. Therefore, I will add an alpha field to all types.

Right.

I will also add a simple function obtaining the test outcome:

reject_h0(test::HypothesisTest) = pvalue(test) <= test.alpha

I'd rather not add this function, or at least not in this PR, as that's a different issue which deserves some discussion.

Copy link
Copy Markdown
Author

@mirkobunse mirkobunse Jul 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another branch, I already started to add the alpha field to all types, and the tail field those types for which the pvalue method has a tail parameter. I also develop corresponding unit tests and documentation.

Apparently, these changes are quite extensive. We should discuss what the present PR should be going for.

Initially, this PR was meant to (only) provide the infrastructure for subtypes of HypothesisTest to have tail and alpha fields. Initially, I was not planning to actually add these fields to all types. Starting to do so confirmed my impression that this should be a different work package.

I see that all tests should have alpha and tail fields (if applicable), but I suggest to delay these changes to a follow-up PR. I'd like the present PR to focus on the infrastructure, as initially intended. Of course, I will be applying the changes required to make this PR a complete extension to the HypothesisTests package. With what I already did in adding the fields to all types, the next PR is already in preparation.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to add support for all tests at the same time. Else the package will be in an inconsistent state. What's the interest of merging the PR in a partial state?

Comment thread src/t.jl

# confidence interval by inversion
function StatsBase.confint(x::TTest, alpha::Float64=0.05; tail=:both)
function StatsBase.confint(x::TTest, alpha::Float64=x.alpha; tail::Symbol=x.tail)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use alpha(x) and tail(x) since that gives an identical signature for all tests.

Comment thread doc/api/pvalue.rst

Compute the p-value for a given significance test.

If ``tail`` is ``:both`` (default), then the p-value for the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to mention that the default tail depends on the specific test, and give a list of tests with their default.

Comment thread src/HypothesisTests.jl
p = pvalue(test) # obeys value of tail(test) if applicable
outcome = p > alpha(test) ? "fail to reject" : "reject"
if testtail == :both
plabel ="two-sided p-value:"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space.

Comment thread src/HypothesisTests.jl
plabel = "p-value:"
end
println(io, "Test summary:")
println(io, format_detail("outcome with "*conf_string*"% confidence:", outcome*" h_0", 36))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add spaces around * for consistency and readability. Here you can actually use string interpolation instead.

Comment thread src/HypothesisTests.jl
if testtail == :both
plabel ="two-sided p-value:"
elseif testtail == :left || testtail == :right
plabel = "one-sided p-value ($(string(testtail)) tail):"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string isn't needed AFAIK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants