Commit 32fffc5

fix: resolve LSI dimension mismatch with native Ruby SVD (#78)
Native Ruby SVD returns transposed matrix dimensions when row_size < column_size (common case: few terms, many documents). This caused ExceptionForMatrix::ErrDimensionMismatch during classification with 10+ similar documents.

Two changes:
- Transpose reduced matrix when dimensions don't match input
- Iterate over column_size (documents) not row_size (terms)

Fixes #72
1 parent e7a33cb commit 32fffc5
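
The first change in the commit message (transposing the reduced matrix when its dimensions do not match the input) can be illustrated with Ruby's bundled `matrix` library. This is only a sketch: `match_input_dimensions` is a hypothetical helper name, not something the gem defines, and the transposed matrix is built by hand rather than produced by an actual SVD.

```ruby
require 'matrix'

# Hypothetical helper (not part of the classifier gem): return `result`
# oriented so that its row count matches `input`.
def match_input_dimensions(result, input)
  return result if result.row_size == input.row_size

  result.transpose
end

# 2 terms x 4 documents: fewer rows than columns, the case the commit describes.
input = Matrix[[1, 0, 2, 1],
               [0, 3, 1, 0]]

# Stand-in for a reconstruction that came back as documents x terms.
flipped = input.transpose

fixed = match_input_dimensions(flipped, input)
puts "#{fixed.row_size}x#{fixed.column_size}" # prints "2x4"
```

The guard compares only `row_size`, mirroring the condition added in the diff.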

2 files changed: 28 additions and 2 deletions

lib/classifier/lsi.rb

Lines changed: 8 additions & 2 deletions
@@ -144,7 +144,7 @@ def build_index(cutoff = 0.75)
       tdm = Matrix.rows(tda).trans
       ntdm = build_reduced_matrix(tdm, cutoff)
 
-      ntdm.row_size.times do |col|
+      ntdm.column_size.times do |col|
         next unless doc_list[col]
 
         column = ntdm.column(col)
@@ -332,7 +332,13 @@ def build_reduced_matrix(matrix, cutoff = 0.75)
         s[ord] = 0.0 if s[ord] < s_cutoff
       end
       # Reconstruct the term document matrix, only with reduced rank
-      u * (self.class.gsl_available ? GSL::Matrix : ::Matrix).diag(s) * v.trans
+      result = u * (self.class.gsl_available ? GSL::Matrix : ::Matrix).diag(s) * v.trans
+
+      # Native Ruby SVD returns transposed dimensions when row_size < column_size
+      # Ensure result matches input dimensions
+      result = result.trans if !self.class.gsl_available && result.row_size != matrix.row_size
+
+      result
     end
 
     def node_for_content(item, &block)
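
To see why the loop in build_index has to run over column_size, consider a term-document matrix with fewer terms (rows) than documents (columns). The sketch below uses Ruby's bundled `matrix` library; the matrix values and the `doc_list` labels are made up for illustration and are not taken from the gem.

```ruby
require 'matrix'

# Rows are terms, columns are documents (3 terms, 5 documents).
tdm = Matrix[[1, 0, 2, 0, 1],
             [0, 1, 0, 3, 0],
             [2, 0, 1, 0, 1]]

# Hypothetical document labels, one per column.
doc_list = %w[d0 d1 d2 d3 d4]

# Indexing documents by row count stops after the third document.
by_rows = tdm.row_size.times.map { |col| doc_list[col] }

# Indexing by column count visits every document.
by_columns = tdm.column_size.times.map { |col| doc_list[col] }

puts by_rows.inspect    # prints ["d0", "d1", "d2"]
puts by_columns.inspect # prints ["d0", "d1", "d2", "d3", "d4"]
```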

test/lsi/lsi_test.rb

Lines changed: 20 additions & 0 deletions
@@ -305,4 +305,24 @@ def test_empty_word_hash_handling
 
     refute_predicate lsi, :needs_rebuild?
   end
+
+  def test_large_similar_document_sets
+    # Regression test for issue #72
+    # When many similar documents create few unique terms (M < N),
+    # native Ruby SVD returns transposed dimensions causing ErrDimensionMismatch
+    lsi = Classifier::LSI.new auto_rebuild: false
+
+    10.times do |i|
+      lsi.add_item "This text deals with dogs. Dogs number #{i}.", 'Dog'
+    end
+    10.times do |i|
+      lsi.add_item "This text deals with cats. Cats number #{i}.", 'Cat'
+    end
+
+    lsi.build_index
+
+    result = lsi.classify('Dogs are great pets')
+
+    assert_equal 'Dog', result
+  end
 end
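
The ErrDimensionMismatch named in the regression test's comment is raised by Ruby's `matrix` library when operand shapes are incompatible. The sketch below reproduces the error class with two hand-written matrices; it does not reproduce the exact call inside the classifier that failed, which the commit does not spell out.

```ruby
require 'matrix'

a = Matrix[[1, 2, 3],
           [4, 5, 6]] # 2x3
b = Matrix[[1, 2, 3],
           [4, 5, 6]] # 2x3, so a * b is undefined (3 columns vs 2 rows)

# Capture the exception instead of letting it abort the script.
error = begin
  a * b
  nil
rescue ExceptionForMatrix::ErrDimensionMismatch => e
  e
end

puts error.class

# Transposing the right-hand operand makes the shapes compatible (2x3 * 3x2).
product = a * b.transpose
puts "#{product.row_size}x#{product.column_size}"
```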
