Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new experimental GLM option remove_offset_effects to keep offsets during training but remove their effect during scoring/prediction and model metrics, aligning with the “restricted vs unrestricted” model pattern already used for control_variables.
Changes:
- Introduces
remove_offset_effectsparameter in GLM (backend + REST schema) and exposes it in R/Python clients. - Updates GLM scoring/metrics/scoring-history flow to compute both restricted (offset removed) and unrestricted metrics, and enables
make_unrestricted_glm_modelfor this use case. - Adds docs + new tests/examples across Java/R/Python to exercise the feature.
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 20 comments.
Show a summary per file
| File | Description |
|---|---|
| h2o-algos/src/main/java/hex/glm/GLM.java | Implements restricted/unrestricted scoring-history + metrics computation when remove_offset_effects is enabled. |
| h2o-algos/src/main/java/hex/glm/GLMModel.java | Adds new parameter + basic validation for remove_offset_effects. |
| h2o-algos/src/main/java/hex/glm/GLMScore.java | Skips adding offset into the linear predictor when restricted scoring is enabled. |
| h2o-algos/src/main/java/hex/glm/GLMUtils.java | Renames/extends scoring history combiner for “restricted” use. |
| h2o-algos/src/main/java/hex/schemas/GLMV3.java | Exposes remove_offset_effects via REST schema. |
| h2o-algos/src/main/java/hex/api/MakeGLMModelHandler.java | Allows creating unrestricted model when remove_offset_effects was used; resets the flag on the derived model. |
| h2o-algos/src/test/java/hex/glm/GLMControlVariablesTest.java | Adds backend tests for remove_offset_effects behavior and its interaction with control_variables. |
| h2o-r/h2o-package/R/glm.R | Adds R API parameter + expands make_unrestricted_glm_model guard. |
| h2o-bindings/bin/custom/R/gen_glm.py | Updates generated R binding template for make_unrestricted_glm_model guard. |
| h2o-r/tests/testdir_algos/glm/runit_GLM_remove_offset_effects_explain.R | Adds an R explain/learning-curve smoke test with remove_offset_effects. |
| h2o-py/h2o/estimators/glm.py | Adds Python API parameter + getter/setter. |
| h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects.py | Adds Python test comparing behavior with/without remove_offset_effects. |
| h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_glm.py | Adds Python test scaffold around offset scoring behavior. |
| h2o-docs/src/product/data-science/algo-params/remove_offset_effects.rst | Documents the new parameter and provides examples. |
| h2o-docs/src/product/data-science/algo-params/control_variables.rst | Links control_variables docs to remove_offset_effects docs. |
| h2o-core/src/main/java/hex/ModelMetricsBinomial.java | Minor signature cleanup (parameter rename). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
h2o-r/tests/testdir_algos/glm/runit_GLM_remove_offset_effects_explain.R
Outdated
Show resolved
Hide resolved
h2o-r/tests/testdir_algos/glm/runit_GLM_remove_offset_effects_explain.R
Outdated
Show resolved
Hide resolved
h2o-docs/src/product/data-science/algo-params/remove_offset_effects.rst
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
h2o-docs/src/product/data-science/algo-params/remove_offset_effects.rst
Outdated
Show resolved
Hide resolved
tomasfryda
left a comment
There was a problem hiding this comment.
@maurever Correct me if I am wrong but it looks to me that there is no way of having both control variables(CV) and remove offset and getting:
- model that uses CV and offset
- model that doesn't use CV but does use offset
- model that uses CV and doesn't use offset
- model that doesn't use CV and doesn't use offset
(No way other than training multiple models)
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects_compare.py
Outdated
Show resolved
Hide resolved
h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects_compare.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 20 changed files in this pull request and generated 13 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
h2o-py/tests/testdir_algos/glm/pyunit_glm_make_unrestricted_model.py
Outdated
Show resolved
Hide resolved
h2o-r/tests/testdir_algos/glm/runit_GLM_make_unrestricted_model.R
Outdated
Show resolved
Hide resolved
h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects_compare.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 20 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
h2o-algos/src/test/java/hex/glm/GLMControlVariablesTest.java:433
trainis created with a key and then referenced viaparams._train = train._key, but it is no longer put into the DKV. This will likely cause the builder to fail with a missing-frame error (or a different exception than expected), making the test unreliable. AddDKV.put(train);here (as done in the surrounding tests) before training the model.
Vec cat1 = Vec.makeVec(new long[]{1,1,1,0,0},new String[]{"black","red"},Vec.newKey());
Vec cat2 = Vec.makeVec(new long[]{1,1,1,0,0},new String[]{"a","b"},Vec.newKey());
Vec res = Vec.makeVec(new double[]{1,1,2,0,0},cat1.group().addVec());
train = new Frame(Key.<Frame>make("train"),new String[]{"x1", "x2", "y"},new Vec[]{cat1, cat2,res});
GLMModel.GLMParameters params = new GLMModel.GLMParameters();
params._train = train._key;
params._alpha = new double[]{0};
params._response_column = "y";
params._intercept = false;
params._control_variables = new String[]{"x1"};
params._distribution = DistributionFamily.multinomial;
glm = new GLM(params).trainModel().get();
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
h2o-docs/src/product/data-science/algo-params/remove_offset_effects.rst
Outdated
Show resolved
Hide resolved
h2o-docs/src/product/data-science/algo-params/control_variables.rst
Outdated
Show resolved
Hide resolved
Co-authored-by: Copilot <[email protected]>
# Conflicts: # h2o-algos/src/main/java/hex/glm/GLMModel.java
* initial fixes/changes * Add tests * Remove redundant tests and incorrect changes * Remove a python test * revert unnecessary changes
38c2b15 to
6dbea01
Compare
|
tomasfryda
left a comment
There was a problem hiding this comment.
LGTM, thank you @maurever !


Issue: #16676