Support categorical features in ranger.unify and gbm.unify #54

Draft
Copilot wants to merge 3 commits into master from copilot/add-support-categorical-features

Copilot AI commented May 13, 2026

ranger.unify and gbm.unify rejected models trained on factor columns. This PR implements categorical split support across the full treeshap pipeline.

Encoding convention

A new decision_type = 3 (displayed as "==") represents a categorical bitmask split. For these nodes the Split value is a right-group bitmask: if bit k-1 is set, factor level k goes to the No (right) child. Factor columns in datasets are converted with as.numeric(), which the existing sapply(x, as.numeric) calls already do, yielding 1-based integer level codes.
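As a rough illustration of this encoding, the sketch below builds a right-group bitmask from 1-based level codes and queries it. The helper names `right_group_mask` and `level_goes_right` are hypothetical and do not appear in the PR. The 31-level limit is an assumption that follows from using a 32-bit signed `int` for the mask.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the PR): build a right-group bitmask from
// 1-based factor level codes. Bit k-1 set means level k goes to the No child.
inline int right_group_mask(const std::vector<int>& right_levels) {
    int mask = 0;
    for (int level : right_levels) {
        // Assumption: 32-bit signed int, so only levels 1..31 are representable.
        assert(level >= 1 && level <= 31);
        mask |= 1 << (level - 1);
    }
    return mask;
}

// Hypothetical helper: true when the 1-based level code is in the right group.
inline bool level_goes_right(int mask, int level) {
    assert(level >= 1 && level <= 31);
    return ((mask >> (level - 1)) & 1) != 0;
}
```

For example, a split that sends levels 2 and 3 to the right would be stored as binary 110, that is, a Split value of 6.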

C++ traversal (treeshap.cpp, set_reference_dataset.cpp, predict.cpp)

Added a decision_type == 3 branch to the go-left condition in all three tree-walking functions:

|| ((decision_type[j] == 3) && ((int)observation[feature[j]] >= 1)
    && !(((int)split[j] >> ((int)observation[feature[j]] - 1)) & 1))

If the bit for the observation's level is not set in the right-group bitmask, the observation goes left (Yes child); otherwise it goes right (No child).
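The go-left rule can be sketched as a standalone predicate, which may make the three decision types easier to compare. This is only an illustration: the function name `goes_left` is hypothetical, and the codes 1 for "<=" and 2 for "<" are assumed here rather than stated in this description.

```cpp
#include <cassert>

// Hypothetical predicate mirroring the condition quoted above.
// Assumption: decision_type 1 means "<=" and 2 means "<".
inline bool goes_left(int decision_type, double split, double value) {
    assert(decision_type >= 1 && decision_type <= 3);
    if (decision_type == 1) return value <= split;
    if (decision_type == 2) return value < split;
    // decision_type == 3: split holds a right-group bitmask.
    const int level = static_cast<int>(value);  // 1-based factor code
    if (level < 1) return false;                // guard from the quoted condition
    // Like the quoted condition, no upper bound is checked here; levels
    // above 31 are outside what this sketch handles.
    return ((static_cast<int>(split) >> (level - 1)) & 1) == 0;
}
```

With a bitmask of 6 (levels 2 and 3 in the right group), `goes_left(3, 6.0, 1.0)` is true while `goes_left(3, 6.0, 2.0)` is false.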

gbm.unify

  • Removed the stop() call that blocked categorical features.
  • For each categorical split node, reads the 0-based c.splits index stored in Split, retrieves gbm_model$c.splits[[idx]], and computes a right-group bitmask (levels where c_split[k] == 1 go right). Sets Decision.type = "==" for those nodes.
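The conversion from a c.splits entry to a bitmask could look roughly like the following. The actual change lives in R; this C++ sketch only restates the rule "levels where c_split[k] == 1 go right" with a 0-based vector index, so position k here corresponds to factor level k+1. The name `mask_from_c_split` is hypothetical, and treating every value other than 1 as "not in the right group" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: c_split has one entry per factor level.
// Entries equal to 1 mark levels that go right; any other value is
// treated as "not in the right group" (assumption).
inline int mask_from_c_split(const std::vector<int>& c_split) {
    assert(c_split.size() <= 31);  // assumption: mask fits a 32-bit signed int
    int mask = 0;
    for (std::size_t k = 0; k < c_split.size(); ++k) {
        if (c_split[k] == 1) {
            mask |= 1 << k;  // 0-based position k is 1-based level k+1
        }
    }
    return mask;
}
```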

ranger.unify / ranger_unify.common

  • Extracts !forest$is.ordered (per-feature unordered flag) from the ranger model and passes it into ranger_unify.common.
  • When treeInfo returns character splitval (partition mode), parses the comma-separated right-going level indices and computes the right-group bitmask. Threshold splits on ordered/numeric features in the same trees (also coerced to character by R) are handled by as.numeric() conversion.
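Parsing a partition-mode split value into a bitmask might be sketched as below. Again, the PR does this in R; the C++ version is only meant to show the parsing step, and the name `mask_from_level_list` is hypothetical. It assumes the string holds only comma-separated 1-based level indices, so a threshold split such as "0.5" would need to be routed elsewhere before calling it.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: turn a comma-separated list of right-going,
// 1-based level indices (for example "2,3") into a right-group bitmask.
inline int mask_from_level_list(const std::string& splitval) {
    std::istringstream in(splitval);
    std::string token;
    int mask = 0;
    while (std::getline(in, token, ',')) {
        const int level = std::stoi(token);  // throws on non-numeric input
        assert(level >= 1 && level <= 31);   // assumption: 32-bit signed int mask
        mask |= 1 << (level - 1);
    }
    return mask;
}
```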

model_unified.R

  • is.model_unified() now accepts c("<=", "<", "==") as a valid level set (backward-compatible with existing two-level models).
  • Updated Split and Decision.type column documentation.

Usage

# ranger with unordered factor splits (partition mode)
rf <- ranger::ranger(target ~ ., data = data_with_factors,
                     respect.unordered.factors = "partition")
unified <- ranger.unify(rf, data_with_factors)
treeshap(unified, data_with_factors[1:5, ])

# gbm with factor columns
gbm_model <- gbm::gbm(target ~ ., data = data_with_factors)
unified <- gbm.unify(gbm_model, data_with_factors)
treeshap(unified, data_with_factors[1:5, ])

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • cran.r-project.org
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI linked an issue May 13, 2026 that may be closed by this pull request
Copilot AI changed the title from "[WIP] Add support for categorical features in ranger and gbm unify" to "Support categorical features in ranger.unify and gbm.unify" May 13, 2026
Copilot AI requested a review from mayer79 May 13, 2026 17:17

Development

Successfully merging this pull request may close these issues.

support categorical features
