Commit 37ae2e3

fix: resolve LSI dimension mismatch with native Ruby SVD
Native Ruby SVD returns transposed matrix dimensions when row_size < column_size (common case: few terms, many documents). This caused ExceptionForMatrix::ErrDimensionMismatch during classification with 10+ similar documents.

Two changes:
- Transpose the reduced matrix when its dimensions don't match the input
- Iterate over column_size (documents), not row_size (terms)

Fixes #72
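The first change can be illustrated in isolation. The following is a minimal sketch using only Ruby's standard `matrix` library; the helper name `match_input_dimensions` and the toy data are hypothetical and are not part of the gem. It assumes, as the commit message describes, that the reconstructed matrix can come back with rows and columns swapped relative to the input.

```ruby
require 'matrix'

# Hypothetical helper (not from the gem): return `result` oriented like `input`.
# Only row_size is compared, mirroring the check in this commit, so a square
# matrix is always returned unchanged.
def match_input_dimensions(result, input)
  return result if result.row_size == input.row_size

  result.transpose
end

# Toy term-document matrix: 2 terms (rows) x 4 documents (columns).
input = Matrix[[1, 0, 2, 1],
               [0, 3, 1, 0]]

# Stand-in for a reconstruction that came back flipped (4 x 2).
flipped = input.transpose

fixed = match_input_dimensions(flipped, input)
puts "#{fixed.row_size} x #{fixed.column_size}"
```

A result that already has the input's orientation passes through untouched, so the guard is safe to apply unconditionally on the native Ruby path.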
1 parent 6e43186 commit 37ae2e3

1 file changed

Lines changed: 10 additions & 2 deletions

lib/classifier/lsi.rb

@@ -144,7 +144,7 @@ def build_index(cutoff = 0.75)
       tdm = Matrix.rows(tda).trans
       ntdm = build_reduced_matrix(tdm, cutoff)

-      ntdm.row_size.times do |col|
+      ntdm.column_size.times do |col|
         next unless doc_list[col]

         column = ntdm.column(col)
@@ -332,7 +332,15 @@ def build_reduced_matrix(matrix, cutoff = 0.75)
         s[ord] = 0.0 if s[ord] < s_cutoff
       end
       # Reconstruct the term document matrix, only with reduced rank
-      u * (self.class.gsl_available ? GSL::Matrix : ::Matrix).diag(s) * v.trans
+      result = u * (self.class.gsl_available ? GSL::Matrix : ::Matrix).diag(s) * v.trans
+
+      # Native Ruby SVD returns transposed dimensions when row_size < column_size
+      # Ensure result matches input dimensions
+      if !self.class.gsl_available && result.row_size != matrix.row_size
+        result = result.trans
+      end
+
+      result
     end

     def node_for_content(item, &block)
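
The change in `build_index` matters because the term-document matrix has one row per term and one column per document. A small sketch with the standard `matrix` library shows the difference between the two loop bounds; the matrix values and the document names below are made up for illustration.

```ruby
require 'matrix'

# Toy term-document matrix: 2 terms (rows) x 4 documents (columns).
tdm = Matrix[[1, 0, 2, 1],
             [0, 3, 1, 0]]
doc_names = %w[a b c d] # hypothetical document labels, one per column

# Looping up to row_size only reaches as many documents as there are terms.
by_rows = tdm.row_size.times.map { |col| doc_names[col] }

# Looping up to column_size reaches every document.
by_columns = tdm.column_size.times.map { |col| doc_names[col] }

# Each document's vector is the matching column of the matrix.
doc_vectors = tdm.column_size.times.map { |col| [doc_names[col], tdm.column(col).to_a] }.to_h

puts by_rows.inspect    # ["a", "b"]
puts by_columns.inspect # ["a", "b", "c", "d"]
```

With fewer terms than documents, the `row_size` loop silently skips the trailing documents, which is consistent with the failure described in the commit message.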
