Use standardized parameter estimates for metrics instead of unstandardized ones #1337
tsalo wants to merge 17 commits into ME-ICA:main
Conversation
Codecov Report ❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1337      +/-   ##
==========================================
- Coverage   90.30%   90.06%   -0.25%
==========================================
  Files          30       30
  Lines        4726     4750      +24
  Branches      790      797      +7
==========================================
+ Hits         4268     4278      +10
- Misses        309      318      +9
- Partials      149      154      +5
```

View full report in Codecov by Sentry.
Pull request overview
This pull request updates tedana to use standardized parameter estimates (map weight) instead of unstandardized parameter estimates (map optcom betas) when calculating several key metrics. This change addresses issue #1336, which identified that unstandardized parameter estimates are confounded by data variance scaling (a toy illustration of the distinction follows the change list below).
Changes:
- Modified metric dependency configuration to use standardized parameter estimates for variance explained, map beta T2 clusterized, and map beta S0 clusterized calculations
- Updated code to ensure both standardized and unstandardized parameter estimate maps are always calculated (the latter is still needed for visualization and percent signal change)
- Updated documentation terminology to use more precise language about parameter estimates
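
To make the distinction concrete, here is a minimal sketch, not tedana's implementation: the function name, array shapes, and z-scoring approach are illustrative assumptions about how unstandardized versus standardized estimates of component time series fit to voxel time series might be computed.

```python
import numpy as np

def parameter_estimates(data, mixing):
    """Illustrative only -- not tedana's API.

    data : (n_voxels, n_timepoints) voxel time series
    mixing : (n_timepoints, n_components) component time series
    """
    # Unstandardized betas: fit to the raw data, so their magnitudes
    # scale with each voxel's signal variance.
    betas, *_ = np.linalg.lstsq(mixing, data.T, rcond=None)

    # Standardized estimates: z-score the voxel time series and the mixing
    # matrix columns first, removing the voxel-wise variance confound.
    data_z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
    mixing_z = (mixing - mixing.mean(axis=0)) / mixing.std(axis=0)
    weights, *_ = np.linalg.lstsq(mixing_z, data_z.T, rcond=None)

    return betas.T, weights.T  # both (n_voxels, n_components)
```

Under these assumptions, thresholding or ranking voxels on the betas favors high-variance voxels regardless of how well a component explains them, while the standardized weights do not.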
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Summary per file:
| File | Description |
|---|---|
| tedana/resources/config/metrics.json | Updated three metric dependencies to use map weight (standardized) instead of map optcom betas (unstandardized) |
| tedana/metrics/collect.py | Modified metrics initialization to ensure both map types are calculated; updated function calls and log messages to use standardized estimates |
| docs/outputs.rst | Updated terminology from "beta coefficients" to "parameter estimates" in component visualization description |
| docs/denoising.rst | Changed variable names from "betas" to "pes" for clarity in example code |
| docs/approach.rst | Updated documentation to clarify when unstandardized vs standardized parameter estimates are used |
| tedana/tests/test_integration.py | Updated expected classification from "Likely BOLD" to "Accept borderline" due to metric calculation changes |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
eurunuela left a comment
These look good to me, thanks. That said, I have not tried running it on data. I wonder how much it changes the classification, and therefore the denoising. Have you seen big changes? If so, it might be worth posting some comparison figures here.
I did compare metrics on some data I have locally. There were two components out of 88 whose classification changed from rejected to accepted, but that's it for classification changes. The variance explained and Dice metrics did change. I think the change to the variance explained measure makes sense, since variance explained (really "coefficient energy") shouldn't be conflated with signal amplitude. I dug into the Dice metrics as well. As far as I can tell, Dice similarity drops substantially when using standardized parameter estimates because the unstandardized parameter estimate maps, the F-T2 maps, and the F-S0 maps are all more confounded with the voxel-wise scale (standard deviation) of the data, so they end up having higher values in higher-variance areas.
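A toy demonstration of the scale confound described above (not tedana code; the numbers are arbitrary): rescaling a voxel's time series inflates its unstandardized beta, but the standardized estimate, which reduces to a correlation in the single-regressor case, is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
ts = rng.standard_normal(200)                  # component time series
voxel = 0.5 * ts + rng.standard_normal(200)    # voxel containing the component
scaled = 10 * voxel                            # same voxel at 10x the data scale

for y in (voxel, scaled):
    beta = np.polyfit(ts, y, 1)[0]     # unstandardized slope: grows 10x
    weight = np.corrcoef(ts, y)[0, 1]  # standardized estimate: unchanged
    print(f"beta={beta:.3f}  weight={weight:.3f}")
```

On this reasoning, maps built from unstandardized slopes would overlap more with F maps that share the same variance confound, which matches the explanation above for the Dice drop.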
I'm a bit hesitant to change a key parameter without giving users a way to compare. I'm fine with making this a breaking change and making it the default, but maybe retain a way to calculate and save the older metrics. This would also require clear changes and explanations in the documentation.
I agree. I don't love having a metric that doesn't make sense, but given how much using a more appropriate metric changes things, it seems like we still need to support the old one. I can have the original ones called
Maybe we can also put implementations of all the current trees with the current metrics onto figshare, so that it's easy for someone to replicate older analyses without needing to clutter our main code base with a growing number of built-in trees. For our next release notes and mass email, I'd like to be able to say: if you want to replicate any of the previous trees, use

Once you have the rest of this done, I can do the figshare part. We'll need to update figshare anyway, because any change in default trees should be paired with a new version number on our figshare repo.
Except in tedana_orig.
This reverts commit d2d67bb.
I've reduced the changes to (1) add warnings when using the deprecated metrics and (2) add new metrics that use the standardized parameter estimates. The bad metrics are still incorrectly named (i.e., using "betas", which implies standardized parameter estimates instead), but I'm not sure there's a good way around that while still keeping things working as they did in the past.
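A hedged sketch of what such a deprecation warning could look like; the mapping and helper function below are hypothetical and not tedana's actual code, though the metric names "map optcom betas" and "map weight" come from the PR summary above.

```python
import warnings

# Hypothetical mapping from deprecated metric names to their
# standardized replacements; the real names live in metrics.json.
DEPRECATED_METRICS = {
    "map optcom betas": "map weight",
}

def warn_if_deprecated(requested_metrics):
    """Warn when a requested metric relies on unstandardized estimates."""
    for name in requested_metrics:
        if name in DEPRECATED_METRICS:
            warnings.warn(
                f"Metric '{name}' uses unstandardized parameter estimates and "
                f"is deprecated; consider '{DEPRECATED_METRICS[name]}'.",
                DeprecationWarning,
                stacklevel=2,
            )
```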
Now that the PR is streamlined, we can think about how to make breaking changes that don't break things too badly. @handwerkerd I think stashing decision trees is a good idea. That leaves us with the opportunity to change things in the built-in decision trees in the repo. Which decision trees are you comfortable with me changing? My first thought was to update all of them except
I think the big Q is what we want the default to be. Fine with changing the metrics in the others. The core issue here, and it will keep coming up, is that it is hard to predict how trees will develop. We don't have a naming scheme, and I don't think we could make one we'd be able to use consistently; there are just too many variables for how things may or may not change in the future.

Closes #1336.
Changes proposed in this pull request:
- I might need to figure out alternatives to `varex kappa ratio` (which is calculated within the decision trees) and `d_table_score` (both calculated outside and within decision trees).