speed up the model code by EmilHvitfeldt · Pull Request #229 · tidymodels/tidypredict

EmilHvitfeldt · 2026-02-23T17:03:48Z

Results

ranger

Issue 1: Data frame row subsetting in recursive function

The original code in build_nested_ranger_node() used tree[tree$nodeID == node_id, ] on every recursive call. Each such subset scans the whole data frame, so the lookup is O(n) per node.

Fix: Pre-extract all columns as vectors before recursion and use direct vector indexing.

Configuration	Before	After Round 1	Speedup
max.depth=10	2.15s	0.45s	4.8x
max.depth=20	7.75s	2.57s	3.0x
num.trees=500	10.72s	2.45s	4.4x
num.trees=1000	22.20s	5.05s	4.4x
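
The Issue 1 fix can be illustrated with a minimal sketch. It assumes a toy table in the shape of ranger::treeInfo() output with made-up values, and the helper names count_leaves_slow() and count_leaves_fast() are hypothetical, not tidypredict internals. The fast version uses a named id_to_idx lookup, the same lookup named in Issue 2 as the state after round 1.

```r
# Toy tree in the shape of ranger::treeInfo() output (values are made up).
tree <- data.frame(
  nodeID     = 0:4,
  leftChild  = c(1L, NA, 3L, NA, NA),
  rightChild = c(2L, NA, 4L, NA, NA),
  terminal   = c(FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Slow pattern: one full-column comparison and one row subset per visited node.
count_leaves_slow <- function(tree, node_id = 0L) {
  row <- tree[tree$nodeID == node_id, ]
  if (row$terminal) return(1L)
  count_leaves_slow(tree, row$leftChild) + count_leaves_slow(tree, row$rightChild)
}

# Faster pattern: pull the columns out once, then recurse over plain vectors.
count_leaves_fast <- function(tree) {
  left <- tree$leftChild
  right <- tree$rightChild
  terminal <- tree$terminal
  # Named lookup from node ID to row position (hypothetical helper vector).
  id_to_idx <- setNames(seq_along(tree$nodeID), tree$nodeID)

  walk <- function(node_id) {
    i <- id_to_idx[[as.character(node_id)]]
    if (terminal[i]) return(1L)
    walk(left[i]) + walk(right[i])
  }
  walk(0L)
}

count_leaves_slow(tree)
count_leaves_fast(tree)
```

Both functions visit the same nodes. The difference is only in how each node's fields are fetched.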

Issue 2: Named vector lookup with as.character() conversion

After round 1, profiling showed 55% of time spent on id_to_idx[[as.character(node_id)]] - the as.character() call on every recursive iteration was expensive.

Fix: Since ranger nodeIDs are 0-indexed and sequential, use direct integer indexing node_id + 1L instead of named lookup.

Configuration	After Round 1	After Round 2	Speedup
max.depth=10	0.45s	0.17s	2.6x
max.depth=20	2.57s	0.44s	5.8x
num.trees=500	2.45s	0.92s	2.7x
num.trees=1000	5.05s	1.88s	2.7x
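
A small sketch of the Issue 2 change, under the same assumption the fix states (node IDs are 0-based and sequential). The vectors and the function names is_leaf_named() and is_leaf_direct() are hypothetical and only serve to contrast the two lookups.

```r
# Hypothetical node table: IDs are 0-based and sequential, as the fix assumes.
node_ids <- 0:4
terminal <- c(FALSE, TRUE, FALSE, TRUE, TRUE)

# Round 1 style: named lookup, which converts the ID to a string on every call.
id_to_idx <- setNames(seq_along(node_ids), node_ids)
is_leaf_named <- function(node_id) terminal[id_to_idx[[as.character(node_id)]]]

# Round 2 style: the row position is simply ID + 1, no conversion needed.
is_leaf_direct <- function(node_id) terminal[node_id + 1L]

# The shortcut is only valid while this invariant holds.
stopifnot(identical(node_ids, seq_along(node_ids) - 1L))
```

If the IDs ever stopped being sequential, the invariant check would fail and the named lookup (or match()) would be needed again.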

Total improvement:

Configuration	Original	Final	Total Speedup
max.depth=10	2.15s	0.17s	12.6x
max.depth=20	7.75s	0.44s	17.6x
num.trees=500	10.72s	0.92s	11.7x
num.trees=1000	22.20s	1.88s	11.8x
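
As a quick arithmetic check, the total speedup is the original time divided by the final time, which equals the product of the unrounded per-round speedups. The timings below are copied from the tables above; the variable names are illustrative.

```r
# Timings in seconds, in table order: max.depth=10, max.depth=20,
# num.trees=500, num.trees=1000.
before <- c(2.15, 7.75, 10.72, 22.20)
round1 <- c(0.45, 2.57, 2.45, 5.05)
final  <- c(0.17, 0.44, 0.92, 1.88)

# Total speedup per configuration.
total <- before / final
round(total, 1)

# The per-round speedups multiply to the same total (before rounding).
stopifnot(isTRUE(all.equal(total, (before / round1) * (round1 / final))))
```

Multiplying the already rounded per-round figures can be slightly off (for example 4.8 × 2.6 ≈ 12.5 rather than 12.6), which is why the totals are computed from the raw timings.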

Final profile breakdown (500 trees, depth=10):

26% GC (garbage collection from expression building)
21% enexpr (rlang expression building)
7% rlang::sym (symbol creation)
Remaining time distributed across inherent operations

No further obvious optimizations - remaining time is inherent to rlang expression building.

lightgbm

Issue 1: Duplicate JSON parsing

The original code parsed model$dump_model() via jsonlite::fromJSON() twice: once in parse_model.lgb.Booster() and again in extract_lgb_trees_nested(). JSON parsing took 57% of total time.

Fix: Pass feature_names from parsedmodel to extract_lgb_trees_nested() to avoid redundant JSON parsing.

Issue 2: Slow grepl() for string matching

The parse_lgb_linear_trees() function used grepl("^prefix", line) in a loop over every line of the model string.

Fix: Replace grepl("^prefix", ...) with startsWith(line, "prefix"), which compares a fixed string instead of running a regular expression and is faster for prefix matching.
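
A minimal sketch of the replacement, assuming a few made-up lines that only resemble a text model dump (the keys shown are illustrative, not the actual lightgbm format). Both calls are base R and vectorized.

```r
# Hypothetical lines; the keys are for illustration only.
lines <- c("Tree=0", "num_leaves=3", "leaf_value=0.1 0.2 0.3", "Tree=1", "num_leaves=2")

# Regex version: the pattern is matched as a regular expression.
is_tree_regex <- grepl("^Tree=", lines)

# Fixed-prefix version: plain string comparison over all lines at once.
is_tree_prefix <- startsWith(lines, "Tree=")

stopifnot(identical(is_tree_regex, is_tree_prefix))
which(is_tree_prefix)
```

One caveat: startsWith() treats the prefix literally, so the swap is only equivalent when the original pattern contains no regex metacharacters after the leading ^.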

Issue 3: Data frame column access in path building

The get_lgb_path() function used which(tree_df$split_index == current_parent_split) (O(n) per iteration) and repeated tree_df$column[[parent_row]] access.

Fix: Pre-extract columns as vectors and build a split_index to row lookup array for O(1) access.

Configuration	Before	After	Speedup
nrounds=100	1.05s	0.62s	1.7x
nrounds=250	2.69s	1.71s	1.6x
nrounds=500	5.75s	3.71s	1.5x
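
The Issue 3 lookup array can be sketched as follows. The toy table, its column names other than split_index, and the helper path_to_root() are assumptions made for illustration. The actual get_lgb_path() is not shown in this description.

```r
# Toy table: internal nodes carry a 0-based split_index, leaf rows have NA.
tree_df <- data.frame(
  split_index = c(0L, 1L, NA, NA, NA),
  node_parent = c(NA, 0L, 1L, 1L, 0L)  # split_index of each row's parent
)

# Slow pattern: scan the whole column each time a parent row is needed.
find_row_scan <- function(split) which(tree_df$split_index == split)

# Faster pattern: build the lookup once; position k + 1 holds the row of split k.
split_index <- tree_df$split_index
node_parent <- tree_df$node_parent
has_split <- !is.na(split_index)
row_of_split <- integer(max(split_index[has_split]) + 1L)
row_of_split[split_index[has_split] + 1L] <- which(has_split)

# Walk from a row up to the root, collecting the parent rows visited.
path_to_root <- function(row) {
  path <- integer(0)
  parent <- node_parent[row]
  while (!is.na(parent)) {
    row <- row_of_split[parent + 1L]
    path <- c(path, row)
    parent <- node_parent[row]
  }
  path
}

path_to_root(3L)
```

Each step of the walk is now a constant-time vector index instead of a which() scan over the column.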

randomForest

Issue: Matrix row access and repeated unname() calls

The original code used tree[node_id, ] to get a row, then called unname() on each column value (4-5 times per internal node). This caused:

23.8% of time in unname() calls
128 MB memory allocation from unname() alone

Fix: Pre-extract all matrix columns as vectors with unname() called once per column, then use direct integer indexing.

Configuration	Before	After	Speedup
ntree=100	0.96s	0.56s	1.7x
ntree=250	2.73s	1.55s	1.8x
ntree=500	5.80s	3.42s	1.7x
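
A minimal sketch of the fix, assuming a small hand-built matrix with the column names that randomForest::getTree() uses (the values are made up). node_slow() and node_fast() are hypothetical names.

```r
# Toy tree matrix with getTree()-style column names; values are made up.
tree <- matrix(
  c(2, 3, 1, 0.5,  1,  0,
    0, 0, 0, 0,   -1, 10,
    0, 0, 0, 0,   -1, 20),
  ncol = 6, byrow = TRUE,
  dimnames = list(
    as.character(1:3),
    c("left daughter", "right daughter", "split var",
      "split point", "status", "prediction")
  )
)

# Slow pattern: subset a row, then strip the name from each value separately.
node_slow <- function(node_id) {
  row <- tree[node_id, ]
  list(left = unname(row["left daughter"]), split = unname(row["split point"]))
}

# Faster pattern: one unname() per column up front, then plain integer indexing.
left <- unname(tree[, "left daughter"])
split_point <- unname(tree[, "split point"])
node_fast <- function(node_id) {
  list(left = left[node_id], split = split_point[node_id])
}
```

The number of unname() calls drops from several per node to one per column, which is where the allocation savings come from.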

Final profile breakdown (500 trees):

34% GC (garbage collection from expression building)
21% enexpr (rlang expression building)
5% rlang::sym (symbol creation)
Remaining time distributed across inherent operations

No further obvious optimizations - remaining time is inherent to rlang expression building.

catboost

Issue: Repeated feature info lookups in oblivious tree parsing

For oblivious trees (same splits for all leaves), the original code parsed each split separately for every leaf. With depth=6 (64 leaves), each of the 6 splits was looked up 64 times per tree instead of once.

Fix: Pre-extract split info once per tree, then for each leaf only determine the direction (op) based on bit value.

Configuration	Before	After	Speedup
iterations=100	0.18s	0.15s	1.2x
iterations=250	0.48s	0.42s	1.1x
iterations=500	0.97s	0.73s	1.3x
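
The idea behind the fix can be sketched for a depth-2 oblivious tree. The split table, the operators, and the bit ordering (bit k of the leaf index belonging to split k + 1) are assumptions for illustration. The real catboost layout should be checked against its documentation.

```r
# Hypothetical split info for one depth-2 oblivious tree, extracted once.
splits <- data.frame(feature = c("x1", "x2"), border = c(0.5, 1.5))
depth <- nrow(splits)

# For each leaf only the comparison direction differs; it is read from the
# bits of the leaf index (an assumed ordering, see the lead-in).
leaf_path <- function(leaf) {
  bits <- (leaf %/% 2^(seq_len(depth) - 1)) %% 2
  op <- ifelse(bits == 1, ">", "<=")
  paste(splits$feature, op, splits$border, collapse = " & ")
}

paths <- vapply(0:(2^depth - 1), leaf_path, character(1))
paths
```

The split table is built once per tree, so the per-leaf work shrinks to extracting bits and pasting the pre-extracted pieces together.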

Note: catboost was already quite fast. Most time is spent in JSON parsing (saving model to JSON file, then reading it), which is unavoidable.

cubist

Issue: Repeated data frame subsetting with == in nested loops

The original code used coefs[coefs$rule == .x & coefs$committee == comm, ] inside nested loops, causing O(n) scans for every rule in every committee. With 100 committees and many rules, this became O(n * committees * rules_per_committee).

Fix: Pre-split data frames by committee and rule using split() once, then look up each group directly by its key.

Configuration	Before	After	Speedup
committees=10	0.16s	0.15s	1.1x
committees=25	0.42s	0.35s	1.2x
committees=50	1.09s	0.74s	1.5x
committees=100	2.78s	1.64s	1.7x

The speedup scales with model complexity - larger models see more benefit.
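
A minimal sketch of the pre-split pattern, assuming a made-up coefficient table. The committee and rule columns follow the text, while the other columns, the key format, and the helpers get_slow() and get_fast() are hypothetical.

```r
# Hypothetical coefficient table; values are made up.
coefs <- data.frame(
  committee = c(1, 1, 1, 2, 2),
  rule      = c(1, 1, 2, 1, 1),
  var       = c("(Intercept)", "x1", "(Intercept)", "(Intercept)", "x2"),
  value     = c(0.1, 2.0, 0.3, 0.4, 5.0)
)

# Slow pattern: a full-table comparison for every (committee, rule) pair.
get_slow <- function(comm, rule) {
  coefs[coefs$committee == comm & coefs$rule == rule, ]
}

# Faster pattern: split once, then fetch each group by its key.
key <- paste(coefs$committee, coefs$rule, sep = "_")
by_key <- split(coefs, key)
get_fast <- function(comm, rule) by_key[[paste(comm, rule, sep = "_")]]

get_fast(2, 1)
```

In this sketch a missing key returns NULL rather than a zero-row data frame, so callers would need to handle that case explicitly.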

rpart

Already fast, so no optimization was needed: under 10 ms for 617 nodes.

partykit

Issue: Repeated tree traversal via nodeapply() and model[[.x]]

The original code called partykit::nodeapply(model, .x) and model[[.x]] for every single node. Each call traverses the entire tree to find that node, resulting in O(n²) complexity. 73.7% of time was spent in rid/nodeids.partynode (partykit's internal recursive tree traversal).

Fix: Extract all nodes at once using partykit::nodeapply(model, ids = all_node_ids, FUN = identity) and compute predictions using tapply() instead of per-node iteration.

Configuration	Before	After	Speedup
maxdepth=5 (61 nodes)	0.162s	0.005s	32x
maxdepth=10 (193 nodes)	1.197s	0.021s	57x
maxdepth=15 (197 nodes)	1.212s	0.005s	242x

The speedup is dramatic because we eliminated O(n²) tree traversals.
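
The tapply() half of the fix can be sketched in base R. This assumes the terminal node ID of every training row is already known; the vectors below are made up, and the partykit::nodeapply() half is not reproduced because it needs a fitted model.

```r
# Hypothetical training data: terminal node ID and response for each row.
fitted_node <- c(2L, 2L, 4L, 5L, 5L, 5L)
response    <- c(1.0, 3.0, 10.0, 4.0, 6.0, 8.0)

# Per-node style: one filter over all rows for every node.
pred_loop <- vapply(
  sort(unique(fitted_node)),
  function(id) mean(response[fitted_node == id]),
  numeric(1)
)

# Grouped style: group the rows once and average within each group.
pred_tapply <- tapply(response, fitted_node, mean)
pred_tapply
```

The result of tapply() is named by node ID, so a node's prediction can be read with pred_tapply[["5"]] without touching the tree again.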

lm, glm, glmnet, earth

All regression-based models are already fast - no optimization needed.

Model	Time	Size
lm	0.016s	74 coefficients
glm	0.012s	74 coefficients
glmnet	0.003s	51 non-zero coefficients
earth	0.065s	41 terms

These models are simple coefficient-based formulas with minimal computation required.

Summary

All 11 model types have been profiled. Optimizations were made to 6 of them:

Model	Before	After	Speedup	Key Fix
ranger	10.72s	0.92s	11.7x	Pre-extract columns, direct integer indexing
randomForest	5.80s	3.42s	1.7x	Pre-extract matrix columns with `unname()`
lightgbm	5.75s	3.71s	1.5x	Avoid duplicate JSON parsing, pre-extract columns
catboost	0.97s	0.73s	1.3x	Pre-extract split info once per tree
cubist	2.78s	1.64s	1.7x	Pre-split data frames by committee/rule
partykit	1.20s	0.02s	57x	Extract all nodes at once, eliminate O(n²) traversals
rpart	-	<0.01s	N/A	Already fast
lm/glm/glmnet/earth	-	<0.07s	N/A	Already fast

Common optimization patterns:

Pre-extract data frame/matrix columns as vectors before recursive functions
Use direct integer indexing instead of named lookups with as.character()
Avoid repeated O(n) scans in loops - pre-split or build lookup tables
Avoid calling tree-traversal functions (like nodeapply) per-node

speed up the model code

606b460

EmilHvitfeldt merged commit 2a626c0 into main Feb 23, 2026
7 checks passed

EmilHvitfeldt merged 1 commit into main from speedup
