
speed up the model code #229

Merged: EmilHvitfeldt merged 1 commit into main from speedup on Feb 23, 2026

@EmilHvitfeldt (Member)

Results

ranger

Issue 1: Data frame row subsetting in recursive function

The original code in build_nested_ranger_node() used tree[tree$nodeID == node_id, ] on every recursive call. Each lookup scans the whole tree data frame, so it costs O(n) per node and O(n²) for the full traversal.

Fix: Pre-extract all columns as vectors before recursion and use direct vector indexing.
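A minimal sketch of the round-1 pattern, assuming a toy tree table whose columns are modeled on ranger::treeInfo() output. build_nested_tree() is a hypothetical helper that returns a nested list rather than rlang expressions, so it only illustrates the indexing strategy: every column is pulled out once, and each recursive call reads plain vectors by position instead of subsetting the data frame.

```r
# Toy tree whose columns are modeled on ranger::treeInfo() output
# (0-indexed nodeID; child IDs are NA for terminal nodes).
tree <- data.frame(
  nodeID       = 0:4,
  leftChild    = c(1L, NA, 3L, NA, NA),
  rightChild   = c(2L, NA, 4L, NA, NA),
  splitvarName = c("x1", NA, "x2", NA, NA),
  splitval     = c(0.5, NA, 1.5, NA, NA),
  terminal     = c(FALSE, TRUE, FALSE, TRUE, TRUE),
  prediction   = c(NA, 10, NA, 20, 30)
)

# Hypothetical helper, not the package's build_nested_ranger_node().
build_nested_tree <- function(tree) {
  # Pull every column out of the data frame once, before recursing.
  left     <- tree$leftChild
  right    <- tree$rightChild
  var      <- tree$splitvarName
  val      <- tree$splitval
  terminal <- tree$terminal
  pred     <- tree$prediction
  # Round-1 style nodeID -> row mapping via a named integer vector.
  id_to_idx <- setNames(seq_along(tree$nodeID), tree$nodeID)

  build <- function(node_id) {
    i <- id_to_idx[[as.character(node_id)]]
    if (terminal[i]) {
      return(list(prediction = pred[i]))
    }
    list(
      split = list(var = var[i], value = val[i]),
      left  = build(left[i]),
      right = build(right[i])
    )
  }
  build(0L)
}

nested <- build_nested_tree(tree)
```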

| Configuration | Before | After Round 1 | Speedup |
|---|---|---|---|
| max.depth=10 | 2.15s | 0.45s | 4.8x |
| max.depth=20 | 7.75s | 2.57s | 3.0x |
| num.trees=500 | 10.72s | 2.45s | 4.4x |
| num.trees=1000 | 22.20s | 5.05s | 4.4x |

Issue 2: Named vector lookup with as.character() conversion

After round 1, profiling showed that 55% of the time was spent in id_to_idx[[as.character(node_id)]]: converting node_id with as.character() on every recursive call was expensive.

Fix: Since ranger nodeIDs are 0-indexed and sequential, use direct integer indexing node_id + 1L instead of named lookup.
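A small sketch contrasting the two lookups, assuming nodeIDs are exactly 0, 1, ..., n - 1 (the property the fix relies on). node_ids, prediction and both lookup functions are hypothetical:

```r
# Toy nodeID column in the 0-indexed, sequential form the fix relies on.
node_ids   <- 0:9999
prediction <- node_ids * 2

# Guard the assumption: nodeID k must sit at position k + 1.
stopifnot(all(node_ids == seq_along(node_ids) - 1L))

# Round 1: character-keyed lookup, paying for as.character() on every call.
id_to_idx    <- setNames(seq_along(node_ids), node_ids)
lookup_named <- function(node_id) prediction[[id_to_idx[[as.character(node_id)]]]]

# Round 2: plain integer arithmetic, no string conversion or name matching.
lookup_direct <- function(node_id) prediction[[node_id + 1L]]
```

If the sequential assumption ever failed, the direct index would silently return the wrong row, which is why the sketch checks it up front.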

| Configuration | After Round 1 | After Round 2 | Speedup |
|---|---|---|---|
| max.depth=10 | 0.45s | 0.17s | 2.6x |
| max.depth=20 | 2.57s | 0.44s | 5.8x |
| num.trees=500 | 2.45s | 0.92s | 2.7x |
| num.trees=1000 | 5.05s | 1.88s | 2.7x |

Total improvement:

| Configuration | Original | Final | Total Speedup |
|---|---|---|---|
| max.depth=10 | 2.15s | 0.17s | 12.6x |
| max.depth=20 | 7.75s | 0.44s | 17.6x |
| num.trees=500 | 10.72s | 0.92s | 11.7x |
| num.trees=1000 | 22.20s | 1.88s | 11.8x |

Final profile breakdown (500 trees, depth=10):

  • 26% GC (garbage collection from expression building)
  • 21% enexpr (rlang expression building)
  • 7% rlang::sym (symbol creation)
  • Remaining time distributed across inherent operations

No obvious optimizations are left; the remaining time is inherent to rlang expression building.


lightgbm

Issue 1: Duplicate JSON parsing

The original code parsed model$dump_model() via jsonlite::fromJSON() twice: once in parse_model.lgb.Booster() and again in extract_lgb_trees_nested(). JSON parsing took 57% of total time.

Fix: Pass feature_names from parsedmodel to extract_lgb_trees_nested() to avoid redundant JSON parsing.

Issue 2: Slow grepl() for string matching

The parse_lgb_linear_trees() function used grepl("^prefix", line) in a loop over every line of the model string.

Fix: Replace grepl("^prefix", ...) with startsWith(line, "prefix"), which is faster for prefix matching because it compares fixed strings instead of evaluating a regular expression.
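An illustration of the swap on made-up lines (the Tree= prefix is only an example, not necessarily what parse_lgb_linear_trees() matches). startsWith() is vectorised, so it can also be applied to all lines in a single call rather than inside a loop:

```r
# Made-up model-string lines; the prefixes are illustrative only.
lines <- c("Tree=0", "num_leaves=3", "leaf_value=0.1 0.2 0.3", "Tree=1")

# Regex version: evaluates a regular expression against each line.
is_tree_regex <- grepl("^Tree=", lines)

# Fixed-prefix version: a literal string comparison, no regex involved.
is_tree_prefix <- startsWith(lines, "Tree=")
```

One behavioural difference to keep in mind: startsWith() treats the prefix literally, so it only matches grepl("^prefix", ...) when the prefix contains no regex metacharacters.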

Issue 3: Data frame column access in path building

The get_lgb_path() function used which(tree_df$split_index == current_parent_split), an O(n) scan on every iteration, along with repeated tree_df$column[[parent_row]] accesses.

Fix: Pre-extract columns as vectors and build a split_index-to-row lookup array for O(1) access.
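A minimal sketch of the lookup-array idea, assuming 0-based split indices and a toy table. The column names split_index and parent_split and both helper functions are hypothetical stand-ins, and the path here is just the list of split indices rather than the conditions the real function builds:

```r
# Toy tree table: split_index is NA for leaves, parent_split is NA for the root.
tree_df <- data.frame(
  split_index  = c(0L, 1L, 2L, NA, NA, NA, NA),
  parent_split = c(NA, 0L, 1L, 2L, 2L, 1L, 0L)
)

# Build once per tree: position k + 1 holds the row of split_index k.
make_split_lookup <- function(split_index) {
  has_split <- !is.na(split_index)
  row_of_split <- rep(NA_integer_, max(split_index, na.rm = TRUE) + 1L)
  row_of_split[split_index[has_split] + 1L] <- which(has_split)
  row_of_split
}

# Walk from a leaf row up to the root, collecting split indices.
get_path_fast <- function(parent_split, row_of_split, leaf_row) {
  path <- integer(0)
  current <- parent_split[leaf_row]
  while (!is.na(current)) {
    path <- c(path, current)
    # O(1) array lookup instead of which(split_index == current).
    current <- parent_split[row_of_split[current + 1L]]
  }
  path
}

# Columns are extracted once, outside the per-leaf calls.
parent_split <- tree_df$parent_split
row_of_split <- make_split_lookup(tree_df$split_index)
```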

| Configuration | Before | After | Speedup |
|---|---|---|---|
| nrounds=100 | 1.05s | 0.62s | 1.7x |
| nrounds=250 | 2.69s | 1.71s | 1.6x |
| nrounds=500 | 5.75s | 3.71s | 1.5x |

randomForest

Issue: Matrix row access and repeated unname() calls

The original code used tree[node_id, ] to get a row, then called unname() on each column value (4-5 times per internal node). This caused:

  • 23.8% of time in unname() calls
  • 128 MB memory allocation from unname() alone

Fix: Pre-extract all matrix columns as vectors with unname() called once per column, then use direct integer indexing.
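A minimal sketch of the pattern, assuming a toy matrix whose column names mimic randomForest::getTree() output with labelVar = FALSE, where a left daughter of 0 marks a terminal node. The row names are added so that each extracted column is a named vector worth unname()-ing; build_rf_tree() is hypothetical and returns a nested list:

```r
# Toy 3-node tree; column names mimic randomForest::getTree(labelVar = FALSE).
tree <- matrix(
  c(2, 0, 0,        # left daughter (0 = terminal)
    3, 0, 0,        # right daughter
    1, 0, 0,        # split var
    0.5, 0, 0,      # split point
    1, -1, -1,      # status
    NA, 10, 20),    # prediction
  nrow = 3,
  dimnames = list(c("1", "2", "3"),
                  c("left daughter", "right daughter", "split var",
                    "split point", "status", "prediction"))
)

# Hypothetical builder: unname() runs once per column, not once per node.
build_rf_tree <- function(tree) {
  left  <- unname(tree[, "left daughter"])
  right <- unname(tree[, "right daughter"])
  var   <- unname(tree[, "split var"])
  point <- unname(tree[, "split point"])
  pred  <- unname(tree[, "prediction"])

  walk <- function(i) {
    if (left[i] == 0) {
      return(list(prediction = pred[i]))
    }
    list(var = var[i], point = point[i],
         left = walk(left[i]), right = walk(right[i]))
  }
  walk(1L)
}

rf_nested <- build_rf_tree(tree)
```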

| Configuration | Before | After | Speedup |
|---|---|---|---|
| ntree=100 | 0.96s | 0.56s | 1.7x |
| ntree=250 | 2.73s | 1.55s | 1.8x |
| ntree=500 | 5.80s | 3.42s | 1.7x |

Final profile breakdown (500 trees):

  • 34% GC (garbage collection from expression building)
  • 21% enexpr (rlang expression building)
  • 5% rlang::sym (symbol creation)
  • Remaining time distributed across inherent operations

No obvious optimizations are left; the remaining time is inherent to rlang expression building.


catboost

Issue: Repeated feature info lookups in oblivious tree parsing

In oblivious trees, every node at a given depth uses the same split, so all leaves share one set of splits. The original code nevertheless parsed the splits separately for every leaf: with depth=6 (64 leaves), each of the 6 splits was looked up 64 times per tree instead of once.

Fix: Pre-extract the split info once per tree; for each leaf, only determine the direction (op) from the corresponding bit of the leaf index.
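A minimal sketch of decoding leaf directions from bits, assuming a toy depth-3 tree and an assumed convention: bit k (least significant first) of the 0-based leaf index picks the side of split k, with a set bit meaning "> border". Check catboost's JSON model format before relying on that convention; splits and leaf_conditions() are hypothetical:

```r
# Hypothetical split info for one depth-3 oblivious tree, extracted once.
splits <- data.frame(
  feature = c("x1", "x2", "x3"),
  border  = c(0.5, 1.5, 2.5)
)

# Assumed convention: bit k of the leaf index selects the side of split k.
leaf_conditions <- function(leaf, splits) {
  depth <- nrow(splits)
  # Extract bits 0 .. depth-1 of the leaf index with integer arithmetic.
  goes_right <- (leaf %/% 2^(seq_len(depth) - 1)) %% 2 == 1
  op <- ifelse(goes_right, ">", "<=")
  # Only op varies per leaf; feature and border are reused as-is.
  paste(splits$feature, op, splits$border)
}

n_leaves   <- 2^nrow(splits)
all_leaves <- lapply(seq_len(n_leaves) - 1L, leaf_conditions, splits = splits)
```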

| Configuration | Before | After | Speedup |
|---|---|---|---|
| iterations=100 | 0.18s | 0.15s | 1.2x |
| iterations=250 | 0.48s | 0.42s | 1.1x |
| iterations=500 | 0.97s | 0.73s | 1.3x |

Note: catboost was already quite fast. Most of the time is spent in JSON parsing (saving the model to a JSON file, then reading it back), which is unavoidable.


cubist

Issue: Repeated data frame subsetting with == in nested loops

The original code used coefs[coefs$rule == .x & coefs$committee == comm, ] inside nested loops, causing O(n) scans for every rule in every committee. With 100 committees and many rules, this became O(n * committees * rules_per_committee).

Fix: Pre-split the data frames by committee and rule once using split(), then look up each subset directly by its key.
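A minimal sketch comparing the two approaches on a toy coefficient table. The column names follow the snippet above; term, value, slow_lookup() and fast_lookup() are hypothetical. split() with a list of grouping vectors keys each group as "committee.rule":

```r
# Toy coefficient table; term and value are made-up columns.
coefs <- data.frame(
  committee = c(1, 1, 1, 2, 2),
  rule      = c(1, 1, 2, 1, 1),
  term      = c("(Intercept)", "x1", "(Intercept)", "(Intercept)", "x2"),
  value     = c(1.0, 0.5, 2.0, 3.0, -1.0)
)

# Split once into a named list keyed "committee.rule"; empty groups dropped.
by_key <- split(coefs, list(coefs$committee, coefs$rule), drop = TRUE)

# Slow: scan the whole table for every (committee, rule) pair.
slow_lookup <- function(comm, r) coefs[coefs$rule == r & coefs$committee == comm, ]

# Fast: fetch the pre-split group by name (NULL if the pair does not exist).
fast_lookup <- function(comm, r) by_key[[paste(comm, r, sep = ".")]]
```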

| Configuration | Before | After | Speedup |
|---|---|---|---|
| committees=10 | 0.16s | 0.15s | 1.1x |
| committees=25 | 0.42s | 0.35s | 1.2x |
| committees=50 | 1.09s | 0.74s | 1.5x |
| committees=100 | 2.78s | 1.64s | 1.7x |

The speedup scales with model complexity: larger models see more benefit.


rpart

Already fast, so no optimization was needed: under 10 ms for 617 nodes.


partykit

Issue: Repeated tree traversal via nodeapply() and model[[.x]]

The original code called partykit::nodeapply(model, .x) and model[[.x]] for every single node. Each call traverses the entire tree to find that node, resulting in O(n²) complexity. 73.7% of time was spent in rid/nodeids.partynode (partykit's internal recursive tree traversal).

Fix: Extract all nodes at once using partykit::nodeapply(model, ids = all_node_ids, FUN = identity) and compute predictions using tapply() instead of per-node iteration.
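A base-R sketch of the tapply() step, assuming a regression tree where each terminal node predicts the mean response of the training rows it contains. The node assignments and responses are made up; in the real code they would come from the fitted party object. All node predictions come out of one grouped pass instead of one visit per node:

```r
# Made-up terminal-node ID for each training row, plus the response.
node_of_row <- c(3L, 3L, 4L, 4L, 4L, 6L, 7L, 7L)
y           <- c(1, 2, 5, 6, 7, 10, 20, 22)

# One grouped pass: mean response for every terminal node at once.
node_pred <- tapply(y, node_of_row, mean)

# Per-row predictions by looking each row's node up by name.
row_pred <- as.vector(node_pred[as.character(node_of_row)])
```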

| Configuration | Before | After | Speedup |
|---|---|---|---|
| maxdepth=5 (61 nodes) | 0.162s | 0.005s | 32x |
| maxdepth=10 (193 nodes) | 1.197s | 0.021s | 57x |
| maxdepth=15 (197 nodes) | 1.212s | 0.005s | 242x |

The speedup is dramatic because we eliminated O(n²) tree traversals.


lm, glm, glmnet, earth

All regression-based models are already fast - no optimization needed.

| Model | Time | Size |
|---|---|---|
| lm | 0.016s | 74 coefficients |
| glm | 0.012s | 74 coefficients |
| glmnet | 0.003s | 51 non-zero coefficients |
| earth | 0.065s | 41 terms |

These models are simple coefficient-based formulas with minimal computation required.


Summary

All 11 model types have been profiled. Optimizations were made to 6 of them:

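| Model | Benchmark | Before | After | Speedup | Key Fix |
|---|---|---|---|---|---|
| ranger | num.trees=500 | 10.72s | 0.92s | 11.7x | Pre-extract columns, direct integer indexing |
| randomForest | ntree=500 | 5.80s | 3.42s | 1.7x | Pre-extract matrix columns, calling unname() once per column |
| lightgbm | nrounds=500 | 5.75s | 3.71s | 1.5x | Avoid duplicate JSON parsing, pre-extract columns |
| catboost | iterations=500 | 0.97s | 0.73s | 1.3x | Pre-extract split info once per tree |
| cubist | committees=100 | 2.78s | 1.64s | 1.7x | Pre-split data frames by committee/rule |
| partykit | maxdepth=10 (193 nodes) | 1.20s | 0.02s | 57x | Extract all nodes at once, eliminate O(n²) traversals |
| rpart | 617 nodes | - | <0.01s | N/A | Already fast |
| lm/glm/glmnet/earth | - | - | <0.07s | N/A | Already fast |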

Common optimization patterns:

  1. Pre-extract data frame/matrix columns as vectors before recursive functions
  2. Use direct integer indexing instead of named lookups with as.character()
  3. Avoid repeated O(n) scans in loops - pre-split or build lookup tables
  4. Avoid calling tree-traversal functions (like nodeapply) per-node

@EmilHvitfeldt merged commit 2a626c0 into main on Feb 23, 2026 (7 checks passed).