Ensures source and target IDs are paired up in SGD by weefuzzy · Pull Request #296 · flucoma/flucoma-core

weefuzzy · 2025-03-20T22:30:14Z

fixes #292

Ok, ok, I know this looks like a massive PR, but it's not really 😄

The problem

SGD::train assumes that its source and target data are aligned, i.e. that each index represents the same ID. This is not necessarily the case, because we're using unordered containers by design. You can make training silently fail to do anything sensible by adding points to source and target in a different order, for instance. This is baaaaad.
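As a toy illustration of that failure mode (a sketch only: `ToyDataSet`, `pairByIndex` and the other names here are hypothetical stand-ins, not the FluidDataSet or SGD API), pairing rows purely by index goes wrong as soon as the two sets were filled in a different order:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dataset: one value per point, stored in
// insertion order alongside its ID. Not the real FluidDataSet.
struct ToyDataSet
{
  std::vector<std::string> ids;
  std::vector<double>      values;

  void add(const std::string& id, double v)
  {
    ids.push_back(id);
    values.push_back(v);
  }
};

// Naive pairing: row i of the source goes with row i of the target,
// without ever looking at the IDs.
std::vector<std::pair<double, double>> pairByIndex(const ToyDataSet& in,
                                                   const ToyDataSet& out)
{
  assert(in.values.size() == out.values.size());
  std::vector<std::pair<double, double>> pairs;
  for (std::size_t i = 0; i < in.values.size(); ++i)
    pairs.emplace_back(in.values[i], out.values[i]);
  return pairs;
}

// Counts how many rows at the same index really share an ID.
std::size_t countAlignedIds(const ToyDataSet& in, const ToyDataSet& out)
{
  assert(in.ids.size() == out.ids.size());
  std::size_t n = 0;
  for (std::size_t i = 0; i < in.ids.size(); ++i)
    if (in.ids[i] == out.ids[i]) ++n;
  return n;
}

// Same three IDs in both sets, added in a different order.
ToyDataSet makeSource()
{
  ToyDataSet d;
  d.add("a", 1.0);
  d.add("b", 2.0);
  d.add("c", 3.0);
  return d;
}

ToyDataSet makeTargetReordered()
{
  ToyDataSet d;
  d.add("c", 30.0);
  d.add("a", 10.0);
  d.add("b", 20.0);
  return d;
}
```

With these two sets, `pairByIndex` pairs the source value for "a" with the target value for "c", and nothing flags the mismatch, which is the silent failure described above.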

But why is this diff so large, you 'orrible man?

The core of this PR is decoupling the batching and sampling of data from SGD, a bit like PyTorch. I like that SGD doesn't want to know about DataSet, it just needed to go further.

So we have:

- some new classes that do data sampling: a base class, an implementation that properly matches source and target IDs, and a simple implementation for when data really are aligned (this opens up some cool future extensions*).
- changes to SGD to remove its batching code and use this instead
- supporting changes to FluidDataSet, FluidTensorView
- some tests
- updates to MLP clients to actually check that IDs match and use the appropriate sampler
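The shape of that sampler split can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `PairSampler`, `AlignedSampler`, `IdMatchedSampler` and their members are hypothetical names, and the sketch needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Ids = std::vector<std::string>;
using IndexPair = std::pair<std::size_t, std::size_t>; // (source row, target row)

// Hypothetical base class: a trainer would only ask for row pairs and
// never see IDs or datasets.
struct PairSampler
{
  virtual ~PairSampler() = default;
  virtual std::vector<IndexPair> pairs() const = 0;
};

// For data that really are aligned: row i goes with row i.
class AlignedSampler : public PairSampler
{
public:
  explicit AlignedSampler(std::size_t size) : mSize(size) {}

  std::vector<IndexPair> pairs() const override
  {
    std::vector<IndexPair> result;
    result.reserve(mSize);
    for (std::size_t i = 0; i < mSize; ++i) result.emplace_back(i, i);
    return result;
  }

private:
  std::size_t mSize;
};

// Pairs rows by ID, whatever order the two sets were filled in.
class IdMatchedSampler : public PairSampler
{
public:
  // Returns std::nullopt unless both sides hold exactly the same set of
  // unique IDs, so a caller can report an error instead of training.
  static std::optional<IdMatchedSampler> make(const Ids& source,
                                              const Ids& target)
  {
    if (source.size() != target.size()) return std::nullopt;

    std::unordered_map<std::string, std::size_t> targetRow;
    for (std::size_t i = 0; i < target.size(); ++i)
      targetRow.emplace(target[i], i);
    if (targetRow.size() != target.size()) return std::nullopt; // duplicates

    std::vector<bool>      used(target.size(), false);
    std::vector<IndexPair> matched;
    matched.reserve(source.size());
    for (std::size_t i = 0; i < source.size(); ++i)
    {
      auto it = targetRow.find(source[i]);
      if (it == targetRow.end() || used[it->second]) return std::nullopt;
      used[it->second] = true;
      matched.emplace_back(i, it->second);
    }
    return IdMatchedSampler(std::move(matched));
  }

  std::vector<IndexPair> pairs() const override { return mPairs; }

private:
  explicit IdMatchedSampler(std::vector<IndexPair> p) : mPairs(std::move(p)) {}
  std::vector<IndexPair> mPairs;
};
```

In this sketch a client would try `IdMatchedSampler::make` with the two ID lists and refuse to train if it returns `std::nullopt`, falling back to `AlignedSampler` only when the data are known to be aligned by construction.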

User-side, nothing much should change besides a couple of error messages: so, reviewers: please test this proposition by training against your own stuff and making sure it still works as it did.

* Like, it would now be pretty easy to enable quicker n' dirtier MLP training (resp. the rest of the data objects) with CCE buffers alone for those cases where people don't want / care about IDs. I think this would have significant workflow and teaching upsides...
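For the ID-free case in that footnote, a rough sketch of what batching straight from plain, index-aligned buffers could look like is given here. Everything in it is an assumption made for illustration (`makeBatches`, `gatherRows`, and the flat row-major buffer layout are hypothetical, not part of flucoma-core):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: shuffle the row indices 0..nRows-1 and cut them
// into mini-batches. The last batch may be smaller than batchSize.
std::vector<std::vector<std::size_t>>
makeBatches(std::size_t nRows, std::size_t batchSize, unsigned seed)
{
  assert(batchSize > 0);
  std::vector<std::size_t> order(nRows);
  std::iota(order.begin(), order.end(), std::size_t{0});

  std::mt19937 rng(seed);
  std::shuffle(order.begin(), order.end(), rng);

  std::vector<std::vector<std::size_t>> batches;
  for (std::size_t start = 0; start < nRows; start += batchSize)
  {
    const std::size_t end = std::min(start + batchSize, nRows);
    batches.emplace_back(order.begin() + static_cast<std::ptrdiff_t>(start),
                         order.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return batches;
}

// Hypothetical helper: copy the chosen rows out of a flat row-major
// buffer with `cols` values per row.
std::vector<double> gatherRows(const std::vector<double>&      buffer,
                               std::size_t                     cols,
                               const std::vector<std::size_t>& rows)
{
  std::vector<double> out;
  out.reserve(rows.size() * cols);
  for (std::size_t r : rows)
  {
    assert((r + 1) * cols <= buffer.size());
    const auto first = buffer.begin() + static_cast<std::ptrdiff_t>(r * cols);
    out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(cols));
  }
  return out;
}
```

Because the same row indices would be used to gather from both the input and the output buffer, the pairing holds by construction and no ID lookup is needed.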

tremblap · 2025-03-21T06:10:26Z

Thanks for this mammoth work! Maybe @lewardo wants to also try this in his spare time. That will have consequences for the WiP of DataSeries, for which @weefuzzy has another idea IIRC, but for now it would be good to also get him to read it. Also, if @AlexHarker wants to try this with his bespoke objects for his piece from that branch, that might be a good idea. I will run the test on Monday as I am in compo/impro/code residency for the next few days...

tremblap · 2025-03-22T19:52:34Z

it all works here. Is it possible that it doesn't bounce as much as before, or have I been lucky?

tremblap · 2025-03-22T19:55:02Z

I think I've been lucky. In all cases it works fine.

jamesb93 · 2025-03-25T15:20:36Z

Working for me!

weefuzzy added the bug and enhancement labels Mar 20, 2025

weefuzzy requested review from jamesb93, tedmoore and tremblap March 20, 2025 22:30

weefuzzy self-assigned this Mar 20, 2025

tremblap approved these changes Mar 22, 2025


weefuzzy added 4 commits April 5, 2025 17:08

Ensures source and target IDs are paired up in SGD

865fbba

Make code less spicy for MSVC

23af0ab

SGD: Correct validation output data source

fa39e26

warnings

0454ce0

weefuzzy force-pushed the fix/mlp_unaligned_data branch from 2aae3ba to 0454ce0 April 5, 2025 16:09

TestFluidDataSet missing header

226d4e4

weefuzzy merged commit e04085f into flucoma:main Apr 6, 2025
3 checks passed

weefuzzy deleted the fix/mlp_unaligned_data branch April 8, 2025 22:57

weefuzzy mentioned this pull request Jun 25, 2025

FluidDataSet: point args in add and update are now const #311

Merged

weefuzzy merged 5 commits into flucoma:main from weefuzzy:fix/mlp_unaligned_data
