Subtypes of HypothesisTest can have optional fields :tail and :alpha by mirkobunse · Pull Request #100 · JuliaStats/HypothesisTests.jl

mirkobunse · 2017-06-04T10:51:13Z

This enables the specification of tail (left/right/both) and alpha value in the constructors of HypothesisTest subtypes:

using HypothesisTests
EqualVarianceTTest(x, y, alpha=0.0025, tail=:right)

Since the additional arguments of the constructors are named arguments, nobody using this package has to change his constructor calls. Defaults are used so that the package works just as before when the optional arguments are not provided.

Currently, the new optional functionality of this pull request is only implemented for the t-tests. Other subtypes of HypothesisTest can easily follow the adaptations of src/t.jl.

I am looking forward to your comments!
Mirko

Note that this pull request was already present as pull request #99. Since there was a problem with the Travis build routine, I reopen that pull request here.

…with default

…ctions

nalimilan

Thanks, this looks useful. However, the new feature should be tested. Also, the pvalue function should only be modified for tests which support tail, else it's misleading. Finally, the docs must be updated too.

nalimilan · 2017-06-13T11:49:32Z

 end

+# Utility to get an optional field (e.g., :tail and :alpha)
+getfield(value::HypothesisTest, name::Symbol, default::Any) =


I'd rather not use the name getfield, which is associated with the a.b syntax and is currently not overloadable. Also, this approach isn't really Julian. Better define fallbacks for get_tail and get_alpha which return the default (or maybe better, print a warning), and define specific methods for the distributions which support it (and which are very few).

Also, remove the "get_" prefix.

Removing the "get_" prefix, I simply called the functions tail and alpha. The tail is :undefined, which is the value returned by default_tail if that method is not overloaded. For all tests that always use some default tail, the tail method returns that tail, just like default_tail does. For t-tests, which can be configured to use another tail, that other tail is returned (or the default, if no other tail is specified).

I feel that the method default_tail is superseded by this implementation of the tail method.
default_tail can still be used (and behaves just like tail) but it emits a deprecation warning.

nalimilan · 2017-06-13T11:50:37Z

+
+    alpha = get_alpha(test)
+    tail  = get_tail(test)
+    conf_string = string((1 - alpha) * 100) * "%"


Would be better to do string((1 - alpha) * 100) and to insert % directly in the final string to avoid one allocation.

nalimilan · 2017-06-13T11:52:17Z


    # population details
    has_ci = applicable(StatsBase.confint, test)
+    label_len = has_ci ? length(conf_string) + 21 : 22 # 21 is length of ' confidence interval:', 22 that of 'parameter of interest:'


Break line at 92 chars, e.g. by moving the comment to a separate line.

But this actually sounds kind of overkill to me: it would be simpler to align all lines (of all blocks) to the longest possible one all the time.

There is a function prettify_detail inside Base.show now that aligns each line consisting of label and value to the longest possible length.

nalimilan · 2017-06-13T11:58:20Z

+get_tail(test::HypothesisTest) = getfield(test, :tail, default_tail(test))
+get_alpha(test::HypothesisTest) = getfield(test, :alpha, 0.05)
+
+# Utility for pretty-printing: Append white space so that length(with_trailing_whitespace(s)) = max(len, length(s))


Wrap line at 92 chars (check other places too).

nalimilan · 2017-06-13T11:59:47Z

+get_alpha(test::HypothesisTest) = getfield(test, :alpha, 0.05)
+
+# Utility for pretty-printing: Append white space so that length(with_trailing_whitespace(s)) = max(len, length(s))
+with_trailing_whitespace(s::String, len::Int) = s * join(repeat([" "], outer=max(len - length(s), 0)), "")


Just " "^max(len - length(s), 0)).

nalimilan · 2017-06-13T15:38:42Z

 end

-pvalue(x::ADFTest) = HypothesisTests.pvalue(Normal(0, 1), adf_pv_aux(x.stat, x.deterministic); tail=:left)
+pvalue(x::ADFTest; tail=default_tail(x)) = HypothesisTests.pvalue(Normal(0, 1), adf_pv_aux(x.stat, x.deterministic); tail=tail)


I don't understand why you don't use tail=:left here, since you already know the result of default_tail for that test. Same below. Also, 92 char limit.

I reverted this change by removing the named tail parameter for all but the t-tests

Rather keep it, but call tail rather than default_tail.

nalimilan · 2017-06-13T15:40:02Z

-function pvalue(x::RayleighTest)
+function pvalue(x::RayleighTest; tail=default_tail(x))
+    if tail != default_tail(x) # warn that tail is not used here
+        warn("tail=$tail has no effect on the computation of the p value")


Better throw an error if the argument doesn't work.

nalimilan · 2017-06-13T15:40:58Z


    if tail == :left
-        (-Inf, StatsBase.confint(x, alpha*2)[2])
+        (-Inf, StatsBase.confint(x, alpha*2, tail=:both)[2]) # tail=:both required as recursive anchor


"recursive anchor"? I don't understand what happens here.

The quantiles are only computed if tail==:both. The recursion would be infinite if the argument tail=:both would not be provided because tail is set to the tail of x now, which may be one-sided.

The term "recursive anchor" is chosen because tail==:both will not allow infinite recursion.

It's just that the term is not obvious, at least not to me. Just say something about blocking infinite recursion.

…ult instead of calling default_tail

The default_tail function was renamed to tail and overloaded for the different defaults or field access (in the case of t-tests).

… pvalue function

All tests pass but the deprecation warnings are too annoying to actually run the tests every time.

mirkobunse · 2017-06-17T08:00:47Z

Thank you very much, @nalimilan, for your detailed review! I implemented all changes you requested and even learned about some other nice features of Julia during the process. Docs and Unit tests are updated.

See my comments for more information and please let me know, if there is anything else to change.

mirkobunse · 2017-06-17T08:37:06Z

Please note that the Travis build failed for julia:nightly because the package Rmath (and thus also StatsFuns and Distributions, on which this package depends) could not be precompiled.

I hope this is no problem for this pull request. The REQUIRE file states that only julia 0.5 is meant to be supported.

nalimilan · 2017-06-26T20:21:31Z

 julia:
  - 0.5
-  - nightly
+#  - nightly    # The REQUIRE file states that only julia 0.5 is supported


Please leave this as it was, it's not an issue if the CI fails for this version (we're not robots).

nalimilan · 2017-06-27T20:25:03Z


-    Compute a confidence interval C with coverage 1-``alpha``.
+    Compute a confidence interval C with coverage 1-``alpha``
+    (``alpha=0.05`` is the default for all tests).


0.05 is the default for all tests unless overridden when creating the object.

nalimilan · 2017-06-27T20:26:33Z

        \nu_{\chi'} \approx \frac{\left(\sum_{i=1}^n k_i s_i^2\right)^2}
                                 {\sum_{i=1}^n \frac{(k_i s_i^2)^2}{\nu_i}}
-
+


Remove trailing spaces here and elsewhere.

nalimilan · 2017-06-27T20:27:58Z

+
+    # utilities for pretty-printing
+    conf_string = string(floor((1 - alpha(test)) * 100, 6)) # limit to 6 decimals in %
+    prettify_detail(label::String, value::Any, len::Int) = # len is max length of label


I'd say "format" rather than "prettify".

nalimilan · 2017-06-27T20:28:22Z

+    println(io, prettify_detail("point estimate:", param_estimate, 32))
    if has_ci
-        println(io, "    95% confidence interval: $(StatsBase.confint(test))")
+        println(io, prettify_detail(conf_string*"% confidence interval:", StatsBase.confint(test), 32))


No need for StatsBase., right?

nalimilan · 2017-06-27T20:44:22Z

+    (``alpha=0.05`` is the default for all tests).

-    If ``tail`` is ``:both`` (default), then a two-sided confidence
+    If ``tail`` is ``:both`` (default for most tests), then a two-sided confidence


"most" isn't precise enough: give a list of tests for which it is the case below. Same remark for pvalue.

nalimilan · 2017-06-27T20:47:08Z

    the alternative hypothesis that the distribution does not have mean
    ``mu0``.
+
+    ``tail`` and ``alpha`` specify the defaults for the application of 


"for the application of" is an unusual vocabulary. Maybe "when calling"?

nalimilan · 2017-06-27T20:48:28Z

-    p = pvalue(test)
-    outcome = if p > 0.05 "fail to reject" else "reject" end
-    tail = default_tail(test)
+    tail = HypothesisTests.tail(test)


HypothesisTests. isn't needed. Same below in comment.

nalimilan · 2017-06-27T20:50:32Z

-    tail = default_tail(test)
+    tail = HypothesisTests.tail(test)
+    p = pvalue(test) # obeys value of HypothesisTests.tail(test) if applicable
+    outcome = if p > alpha(test) "fail to reject" else "reject" end


It's more standard to use p > alpha(test) ? ... : .... Below, since the conditions are relatively long, would be better to repeat tailvalue = inside each branch (i.e. closer to what it was before).

nalimilan · 2017-06-27T20:53:49Z

-
-default_tail(test::TTest) = :both
+pvalue(x::TTest; tail=x.tail) = pvalue(TDist(x.df), x.t; tail=tail)
+tail(x::TTest)  = x.tail    # defaults set by constructors


Rather than defining this here, can you just use these definitions for all HypothesisTest types, and add the needed fields to all tests? If tail doesn't make sense for some types, you can then override the method (to a hardcoded value) instead of adding the field.

Hi @nalimilan,

Adding a tail field to all types, which is not used by several of them, is not a quite clean solution. I would rather add the field to those types that actually can make use of it. These are quite many, though: Below, you find a list of types for which the pvalue function has a tail parameter, and those types for which pvalue has a fixed tail.

I will add a tail field to those types, for which the pvalue function takes a tail parameter.

The alpha value is relevant to all tests. Therefore, I will add an alpha field to all types. I will also add a simple function obtaining the test outcome:

reject_h0(test::HypothesisTest) = pvalue(test) <= test.alpha

I am looking forward to your comments!

pvalue with fixed tail (no tail parameter; 9 types):

OneSampleADTest

KSampleADTest

ADFTest

BoxPierceTest

LjungBoxTest

BreuschGodfreyTest

RayleighTest

JarqueBeraTest

KruskalWallisTest

pvalue with tail parameter (15 types):

BinomialTest

SignTest

FisherTLinearAssociation

JammalamadakaCircularCorrelation

FisherExactTest

ExactKSTest

ApproximateOneSampleKSTest

ApproximateTwoSampleKSTest

ExactMannWhitneyUTest

ApproximateMannWhitneyUTest

PowerDivergenceTest

ExactSignedRankTest

ApproximateSignedRankTest

ZTest

TTest

Adding a tail field to all types, which is not used by several of them, is not a quite clean solution. I would rather add the field to those types that actually can make use of it. These are quite many, though: Below, you find a list of types for which the pvalue function has a tail parameter, and those types for which pvalue has a fixed tail.

I will add a tail field to those types, for which the pvalue function takes a tail parameter.

Yes, of course, no need to add the field to tests which don't need it, overloading the method will be enough for these.

The alpha value is relevant to all tests. Therefore, I will add an alpha field to all types.

Right.

I will also add a simple function obtaining the test outcome:

reject_h0(test::HypothesisTest) = pvalue(test) <= test.alpha

I'd rather not add this function, or at least not in this PR, as that's a different issue which deserves some discussion.

In another branch, I already started to add the alpha field to all types, and the tail field those types for which the pvalue method has a tail parameter. I also develop corresponding unit tests and documentation.

Apparently, these changes are quite extensive. We should discuss what the present PR should be going for.

Initially, this PR was meant to (only) provide the infrastructure for subtypes of HypothesisTest to have tail and alpha fields. Initially, I was not planning to actually add these fields to all types. Starting to do so confirmed my impression that this should be a different work package.

I see that all tests should have alpha and tail fields (if applicable), but I suggest to delay these changes to a follow-up PR. I'd like the present PR to focus on the infrastructure, as initially intended. Of course, I will be applying the changes required to make this PR a complete extension to the HypothesisTests package. With what I already did in adding the fields to all types, the next PR is already in preparation.

I think it makes sense to add support for all tests at the same time. Else the package will be in an inconsistent state. What's the interest of merging the PR in a partial state?

…eld of every test.

nalimilan · 2017-07-02T13:46:22Z


 # confidence interval by inversion
-function StatsBase.confint(x::TTest, alpha::Float64=0.05; tail=:both)
+function StatsBase.confint(x::TTest, alpha::Float64=x.alpha; tail::Symbol=x.tail)


Use alpha(x) and tail(x) since that gives an identical signature for all tests.

nalimilan · 2017-07-02T13:50:44Z


    Compute the p-value for a given significance test.

-    If ``tail`` is ``:both`` (default), then the p-value for the


Would be nice to mention that the default tail depends on the specific test, and give a list of tests with their default.

nalimilan · 2017-07-02T13:51:42Z

+    p = pvalue(test) # obeys value of tail(test) if applicable
+    outcome = p > alpha(test) ? "fail to reject" : "reject"
+    if testtail == :both
+        plabel ="two-sided p-value:"


Missing space.

nalimilan · 2017-07-02T13:53:15Z

+        plabel = "p-value:"
    end
+    println(io, "Test summary:")
+    println(io, format_detail("outcome with "*conf_string*"% confidence:", outcome*" h_0", 36))


Add spaces around * for consistency and readability. Here you can actually use string interpolation instead.

nalimilan · 2017-07-02T13:53:33Z

+    if testtail == :both
+        plabel ="two-sided p-value:"
+    elseif testtail == :left || testtail == :right
+        plabel = "one-sided p-value ($(string(testtail)) tail):"


string isn't needed AFAIK.

mirkobunse added 6 commits June 2, 2017 18:01

Subtypes of HypothesisTest have an optional tail field

ef60cb7

Subtypes of HypothesisTest have an optional alpha field

5f82c9c

Indent all labels of details in Base.show by the same length

14f15d1

get_tail and get_alpha reuse the same function getting a field value …

c3363a1

…with default

Fixed test run by providing optional tail parameter to all pvalue fun…

596444f

…ctions

Confidence interval in Base.show respects alpha parameter

15282b5

nalimilan requested changes Jun 13, 2017

View reviewed changes

mirkobunse added 10 commits June 14, 2017 15:59

Optional tail parameter of pvalue is initialized with hard-coded defa…

8007b1e

…ult instead of calling default_tail

Pretty-printing with fixed detail line length

e63c378

Fallback-function approach to tail and alpha.

c882e25

The default_tail function was renamed to tail and overloaded for the different defaults or field access (in the case of t-tests).

Redirect deprecated default_tail to tail

c1ac852

Fixed Base.show for the case of tail being no keyword argument of the…

36a9e76

… pvalue function

Added comments describing how to test deprecation forwarding.

e6bc7a6

All tests pass but the deprecation warnings are too annoying to actually run the tests every time.

Define IO Buffer once to ease temporary redirection to STDOUT

bda80a7

Added unit tests for tail parameter

480e152

Updated doc of t-tests

96eb8cf

Merge remote-tracking branch 'origin/master' into tail_alpha

7f66aa8

mirkobunse added 2 commits June 17, 2017 12:43

Uncommented julia:nightly because not even specified by REQUIRE file

86626ec

Removed named tail parameter from the remaining pvalue functions

1938ffb

nalimilan reviewed Jun 27, 2017

View reviewed changes

Realised requested changes. Only missing change: tail and alpha as fi…

5d38a50

…eld of every test.

nalimilan reviewed Jul 2, 2017

View reviewed changes

		\nu_{\chi'} \approx \frac{\left(\sum_{i=1}^n k_i s_i^2\right)^2}
		{\sum_{i=1}^n \frac{(k_i s_i^2)^2}{\nu_i}}


		Compute the p-value for a given significance test.

		If ``tail`` is ``:both`` (default), then the p-value for the

Conversation

mirkobunse commented Jun 4, 2017

Uh oh!

nalimilan left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mirkobunse commented Jun 17, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mirkobunse commented Jun 17, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mirkobunse Jul 4, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

mirkobunse commented Jun 17, 2017 •

edited

Loading

mirkobunse Jul 4, 2017 •

edited

Loading