Summary
This issue proposes decoupling fMRIPrep from sMRIPrep by adopting a precomputed-first strategy:
- fMRIPrep first checks whether all sMRIPrep fit derivatives are already present (via `--derivatives`). If so, it populates buffer nodes and skips all sMRIPrep computation.
- If the fit derivatives are incomplete, fMRIPrep falls back to calling `init_anat_fit_wf` exactly as today, passing any partial cache as `precomputed=`.

This would remove ~200 lines of fragile inline orchestration from `init_single_subject_wf`, make the sMRIPrep output contract explicit, and enable the "run sMRIPrep once, fMRIPrep many times" workflow for multi-task/multi-session datasets.
Current coupling: three layers
Layer 1: Workflow embedding (tight)
fMRIPrep imports `init_anat_fit_wf()` and embeds it directly into its Nipype DAG (call site at L353-L371), passing 16 configuration kwargs. The returned `pe.Workflow` is a sub-workflow of fMRIPrep's subject graph, so they share the same execution context.
Additionally, `clean_datasinks()` patches every DataSink in the entire workflow tree to remove `out_path_base`, forcing all sMRIPrep outputs into fMRIPrep's output directory. This is a maintenance hazard: any new DataSink added in sMRIPrep is silently patched. It is called from L617 and L941.
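To make the hazard concrete, here is a pure-Python illustration with no Nipype involved. `Interface`, `patch_datasinks`, and the node names are hypothetical stand-ins, and matching DataSinks by a `ds_` prefix on the node name is an assumption about how `clean_datasinks()` finds them.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    """Hypothetical stand-in for a DataSink interface (not Nipype's class)."""
    out_path_base: str = "smriprep"


def patch_datasinks(nodes: dict[str, Interface]) -> list[str]:
    """Blank out ``out_path_base`` on every node whose last name component
    starts with ``ds_`` (assumed matching rule); return the patched names."""
    patched = []
    for name, interface in nodes.items():
        if name.split(".")[-1].startswith("ds_"):
            interface.out_path_base = ""
            patched.append(name)
    return patched


nodes = {
    "anat_fit_wf.ds_t1w_preproc": Interface(),
    # A DataSink newly added upstream is caught by the same rule,
    # with no change (or review) on the fMRIPrep side.
    "anat_fit_wf.ds_new_output": Interface(),
    "anat_fit_wf.t1w_brain": Interface(),
}
patched = patch_datasinks(nodes)
```

Nothing in this loop knows which outputs it is redirecting, which is why a new upstream DataSink changes fMRIPrep's output layout without any fMRIPrep code changing.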
Layer 2: Transform-stage orchestration (~200 lines — the main cost)
Beyond `init_anat_fit_wf`, fMRIPrep imports and assembles ~10 sMRIPrep transform-stage workflows inline at `level == 'full'`. These are the most fragile lines in fMRIPrep: they break whenever sMRIPrep changes workflow signatures.

| Code block | Lines | What it does |
|---|---|---|
| Template iterator + standard-space volumes | 29 | `init_template_iterator_wf` + `init_ds_anat_volumes_wf`: resample T1w/mask/dseg/tpms into each standard space |
| Surface derivatives + outputs | 39 | `init_surface_derivatives_wf` + `init_ds_surfaces_wf` + `init_ds_surface_metrics_wf` + `init_ds_fs_segs_wf`: inflated surfaces, curvature, aparc, aseg |
| CIFTI morphometrics pipeline | 111 | `init_gifti_morphometrics_wf` + `init_hcp_morphometrics_wf` + `init_morph_grayords_wf` + `init_resample_surfaces_wf` + `init_ds_grayord_metrics_wf` + `init_ds_surfaces_wf` |

Layer 3: Utility imports (loose, acceptable)
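Other sMRIPrep imports across fMRIPrep:

| Location | Import |
|---|---|
| `base.py:100` | `stringify_sessions` |
| `base.py:271` | `collect_derivatives` (as `collect_anat_derivatives`) |
| `base.py:505` | `TemplateFlowSelect` |
| `bold/base.py:678` | `init_resample_surfaces_wf` (for non-CIFTI surface outputs) |
| `bold/resampling.py:884,961` | `smriprep.data` (atlas ROI files) |
| `interfaces/reports.py:43` | `ReconAll` (for reports) |
| `interfaces/bids.py:184` | `stringify_sessions` |
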
What fMRIPrep consumes from sMRIPrep
Every output field flowing from `anat_fit_wf.outputnode` into downstream workflows (connection block at L882-L900 for BOLD, plus L427-L445 for standard space, L474-L498 for surfaces, and L556-L614 for CIFTI):
Tier 1, core volumetric (always needed):
`t1w_preproc`, `t1w_mask`, `t1w_dseg`, `t1w_tpms`, `t1w_valid_list`
Tier 2, FreeSurfer / surfaces (when `run_reconall`):
`subjects_dir`, `subject_id`, `fsnative2t1w_xfm`, `white`, `pial`, `midthickness`, `thickness`, `sulc`, `curv`, `anat_ribbon`, `cortex_mask`, `sphere_reg_fsLR`/`sphere_reg_msm`
Tier 3, standard space and CIFTI (at `level='full'`):
`template`, `anat2std_xfm`, `std2anat_xfm`, `std_t1w`, `std_mask`, `midthickness_fsLR`, `curv_fsLR`, `thickness_fsLR`, `sulc_fsLR`
Critical observation: There is no formal contract specification. The coupling is encoded purely in Nipype connection tuples scattered across `base.py`. A field rename in sMRIPrep silently breaks fMRIPrep.
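One way to make the contract explicit is to declare it as data that both projects can test against. The sketch below transcribes the three tiers as field sets. `AnatConfig` and `required_fields` are hypothetical names; requiring exactly one of `sphere_reg_msm`/`sphere_reg_fsLR` depending on `msm_sulc`, and gating the fsLR morphometrics on `cifti_output`, are assumptions about how the tiers map onto the configuration.

```python
from dataclasses import dataclass

# Field names transcribed from the three tiers.
CORE = frozenset({"t1w_preproc", "t1w_mask", "t1w_dseg", "t1w_tpms", "t1w_valid_list"})
SURFACES = frozenset({
    "subjects_dir", "subject_id", "fsnative2t1w_xfm", "white", "pial",
    "midthickness", "thickness", "sulc", "curv", "anat_ribbon", "cortex_mask",
})
STANDARD = frozenset({"template", "anat2std_xfm", "std2anat_xfm", "std_t1w", "std_mask"})
CIFTI = frozenset({"midthickness_fsLR", "curv_fsLR", "thickness_fsLR", "sulc_fsLR"})


@dataclass(frozen=True)
class AnatConfig:
    """Hypothetical subset of the fMRIPrep settings that affect the contract."""
    level: str = "full"
    run_reconall: bool = True
    cifti_output: bool = False
    msm_sulc: bool = False


def required_fields(cfg: AnatConfig) -> frozenset[str]:
    """Return the outputnode fields fMRIPrep needs under ``cfg``."""
    fields = set(CORE)
    if cfg.run_reconall:
        fields |= SURFACES
        # Assumption: exactly one sphere registration is consumed.
        fields.add("sphere_reg_msm" if cfg.msm_sulc else "sphere_reg_fsLR")
    if cfg.level == "full":
        fields |= STANDARD
        if cfg.cifti_output:
            # Assumption: fsLR morphometrics are only needed for CIFTI.
            fields |= CIFTI
    return frozenset(fields)
```

With a table like this, a rename in sMRIPrep would fail a contract test on the sMRIPrep side instead of surfacing in fMRIPrep's connection tuples.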
Proposed architecture
The two-phase design
Phase 1 (precomputed-first lookup): fMRIPrep checks for existing sMRIPrep derivatives (via `--derivatives` or `--anat-derivatives`). Using `smriprep.utils.bids.collect_derivatives()`, it populates an `anatomical_cache` dict. If the cache contains all fields required by the current configuration (`level`, `run_reconall`, `spaces`, `cifti_output`, `msm_sulc`), fMRIPrep skips `init_anat_fit_wf` entirely and feeds the cache into buffer nodes.
Phase 2 (fallback to sMRIPrep): If the cache is incomplete, fMRIPrep calls `init_anat_fit_wf` as today, with the partial cache as `precomputed=`. This preserves the single-command user experience: `fmriprep /data /out participant` continues to work without pre-running sMRIPrep.
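The choice between the two phases reduces to a set comparison. A minimal sketch, assuming the cache is a dict keyed by `outputnode` field names and that the required set has already been derived from the configuration; `plan_anat_stage` is a hypothetical helper.

```python
from typing import Any


def plan_anat_stage(cache: dict[str, Any], required: set[str]) -> tuple[str, set[str]]:
    """Return ('buffer', set()) if the cache covers every required field,
    else ('fallback', missing) so init_anat_fit_wf can fill the gaps."""
    # Treat empty values (None, '', []) as absent.
    present = {key for key, value in cache.items() if value}
    missing = set(required) - present
    if not missing:
        return "buffer", set()
    return "fallback", missing


required = {"t1w_preproc", "t1w_mask", "t1w_dseg", "t1w_tpms"}
partial = {
    "t1w_preproc": "sub-01_desc-preproc_T1w.nii.gz",
    "t1w_mask": "sub-01_desc-brain_mask.nii.gz",
}
decision, missing = plan_anat_stage(partial, required)
```

In the fallback case the partial cache is still passed as `precomputed=`, so only the missing pieces are recomputed.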
Targeted edits in `init_single_subject_wf`
1. Cache collection + validation (enhance L269-L283)
Currently ~15 lines collecting the cache. Add a validation function (~40 lines) that checks whether the cache satisfies the current configuration:

```python
anatomical_cache = collect_and_validate_anat_cache(
    derivatives=config.execution.derivatives,
    subject_id=subject_id,
    session_id=session_id,
    spaces=spaces,
    freesurfer=freesurfer,
    msm_sulc=msm_sulc,
    cifti_output=config.workflow.cifti_output,
    level=config.workflow.level,
)
anat_cache_complete = _is_anat_cache_complete(anatomical_cache, ...)
```

Validate outputs, not settings: if the right files exist in the right spaces, they are usable regardless of the parameters that produced them.
2. Conditional `anat_fit_wf` (replace L353-L371)

```python
if anat_cache_complete:
    # Buffer-only: no sMRIPrep computation
    anat_fit_wf = _init_anat_buffer_wf(anatomical_cache, name='anat_fit_wf')
else:
    # Fallback: run sMRIPrep, passing the partial cache
    from smriprep.workflows.anatomical import init_anat_fit_wf

    anat_fit_wf = init_anat_fit_wf(
        # ...same 16 kwargs as today...
        precomputed=anatomical_cache,
    )
```

The `_init_anat_buffer_wf` function (~30 lines) returns a trivial workflow whose `outputnode` mirrors the `outputnode` of `init_anat_fit_wf`, populated from the cache.
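Stripped of Nipype, the buffer's job is to fill `outputnode` fields from the cache while keeping the full field list. In a Nipype implementation the values would be set on an `IdentityInterface` node named `outputnode`; here a plain dict stands in for it. `buffer_inputs` and the field tuple are hypothetical, and the tuple is only an illustrative subset of the real outputnode fields.

```python
from typing import Any

# Illustrative subset of init_anat_fit_wf's outputnode fields (assumption).
OUTPUTNODE_FIELDS = (
    "t1w_preproc", "t1w_mask", "t1w_dseg", "t1w_tpms", "template", "anat2std_xfm",
)


def buffer_inputs(
    cache: dict[str, Any], fields: tuple[str, ...] = OUTPUTNODE_FIELDS
) -> tuple[dict[str, Any], list[str]]:
    """Split outputnode fields into values taken from the cache and fields
    the cache cannot provide (these would stay undefined on the node)."""
    values = {field: cache[field] for field in fields if field in cache}
    unset = [field for field in fields if field not in cache]
    return values, unset
```

Keeping the same field list on both paths is what lets the BOLD connections stay untouched; a non-empty `unset` list for a field that is actually consumed downstream signals that the buffer path should not have been chosen.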
3. REMOVE inline transform-stage orchestration (L413-L614)
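This is the big win. These ~200 lines assemble 10+ sMRIPrep workflows with intricate connections. Under decoupling, all of these become sMRIPrep's responsibility when run standalone. fMRIPrep would only retain:
- the `select_MNI2009c_xfm` `KeySelect`, which feeds into the BOLD carpetplots (L920-L925);
- the `select_MNI6_xfm`/`select_MNI6_tpl` nodes, which feed into BOLD CIFTI resampling (L932-L934).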
These are not anatomical outputs — they are transform selectors that thread spatial mappings into the functional pipeline. They should stay.
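The retained selectors pick one entry out of the parallel `template`/`anat2std_xfm` lists coming from the anatomical workflow. A plain-Python sketch of that selection, similar in spirit to niworkflows' `KeySelect`; `select_by_key` and the file names are hypothetical.

```python
def select_by_key(keys: list[str], values: list[str], key: str) -> str:
    """Return the value paired with ``key`` in two parallel lists."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    try:
        return values[keys.index(key)]
    except ValueError:
        raise KeyError(f"{key!r} not among {keys}") from None


templates = ["MNI152NLin2009cAsym", "MNI152NLin6Asym"]
anat2std_xfm = [
    "sub-01_from-T1w_to-MNI152NLin2009cAsym_mode-image_xfm.h5",
    "sub-01_from-T1w_to-MNI152NLin6Asym_mode-image_xfm.h5",
]
# Analogous to select_MNI2009c_xfm feeding the carpetplots
carpet_xfm = select_by_key(templates, anat2std_xfm, "MNI152NLin2009cAsym")
# Analogous to select_MNI6_xfm feeding CIFTI resampling
cifti_xfm = select_by_key(templates, anat2std_xfm, "MNI152NLin6Asym")
```

The selectors consume only the lists, so they work identically whether the lists come from the buffer workflow or from `init_anat_fit_wf`.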
4. REMOVE `clean_datasinks()` (L1022-L1027)
With sMRIPrep writing to its own derivatives directory, there is no need to patch `out_path_base`. fMRIPrep would instead add sMRIPrep's output as a DatasetLink for BIDS-URI provenance. Remove both call sites at L617 and L941.
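BIDS `dataset_description.json` can declare `DatasetLinks`, a mapping from a dataset name to a location, so that files elsewhere can be cited as BIDS URIs of the form `bids:<name>:<path>`. A minimal sketch of registering the sMRIPrep derivatives; `add_dataset_link`, `bids_uri`, the `smriprep` key, and the paths are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def add_dataset_link(description_file: Path, name: str, uri: str) -> dict:
    """Add (or overwrite) one DatasetLinks entry, keeping other fields."""
    description = {}
    if description_file.exists():
        description = json.loads(description_file.read_text())
    description.setdefault("DatasetLinks", {})[name] = uri
    description_file.write_text(json.dumps(description, indent=2))
    return description


def bids_uri(name: str, relpath: str) -> str:
    """Build a BIDS URI pointing into a linked dataset."""
    return f"bids:{name}:{relpath}"


with tempfile.TemporaryDirectory() as tmp:
    desc = add_dataset_link(Path(tmp) / "dataset_description.json", "smriprep", "/out/smriprep")
source = bids_uri("smriprep", "sub-01/anat/sub-01_desc-preproc_T1w.nii.gz")
```

fMRIPrep's sidecars could then cite anatomical inputs by such URIs instead of copying the files into its own output tree.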
5. BOLD connections stay unchanged (L882-L939)
The connections from `anat_fit_wf.outputnode` to `bold_wf.inputnode` remain identical. The buffer workflow presents the same `outputnode` interface as the full `init_anat_fit_wf`.
6. Remaining sMRIPrep imports to address

| File | Import | Action |
|---|---|---|
| `bold/base.py:678` | `init_resample_surfaces_wf` (for non-CIFTI surface outputs) | Could be buffered if sMRIPrep emits `midthickness_{template}_{density}` |
| `bold/resampling.py:884,961` | `smriprep.data` (atlas ROI files for CIFTI) | Atlas ROIs should be moved to a shared location or looked up from the sMRIPrep derivatives |
| `interfaces/reports.py:43` | `ReconAll` (for reports) | Lightweight interface import; acceptable to keep |
| `interfaces/bids.py:184` | `stringify_sessions` | Utility; acceptable to keep |

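If sMRIPrep emitted the resampled midthickness surfaces, fMRIPrep could look them up instead of calling `init_resample_surfaces_wf`. A sketch assuming BIDS-Derivatives-style names such as `sub-01_hemi-L_space-fsLR_den-32k_midthickness.surf.gii`; the exact naming sMRIPrep would adopt is an assumption, and `find_midthickness` is hypothetical.

```python
import tempfile
from pathlib import Path


def find_midthickness(anat_dir: Path, subject: str, space: str, density: str) -> list[Path]:
    """Return [left, right] midthickness surfaces for one template/density,
    or an empty list if either hemisphere is missing."""
    found = []
    for hemi in ("L", "R"):
        # Assumed filename pattern; not a confirmed sMRIPrep output name.
        path = anat_dir / f"sub-{subject}_hemi-{hemi}_space-{space}_den-{density}_midthickness.surf.gii"
        if not path.exists():
            return []  # incomplete pair: keep resampling in fMRIPrep
        found.append(path)
    return found


with tempfile.TemporaryDirectory() as tmp:
    anat = Path(tmp)
    for hemi in ("L", "R"):
        (anat / f"sub-01_hemi-{hemi}_space-fsLR_den-32k_midthickness.surf.gii").touch()
    surfaces = [p.name for p in find_midthickness(anat, "01", "fsLR", "32k")]
    missing = find_midthickness(anat, "01", "fsLR", "59k")
```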
Code simplification assessment

| Section | Current lines | After decoupling | Change |
|---|---|---|---|
| Cache collection + validation | ~15 | ~60 | +45 (new validation + buffer function) |
| `anat_fit_wf` call | ~19 | ~25 | +6 (conditional + fallback) |
| Transform-stage orchestration (L413-L614) | ~200 | ~0 | -200 |
| `clean_datasinks()` + calls | ~8 | ~0 | -8 |
| sMRIPrep import block (L173-L185) | ~13 | ~3 | -10 (keep `init_anat_fit_wf`, `collect_derivatives`, `stringify_sessions`) |
| Net | ~255 | ~88 | -167 lines, -27% of `init_single_subject_wf` |

The 200 most complex lines are eliminated — the ones that assemble 10+ sMRIPrep workflows with intricate connection tuples that break whenever sMRIPrep's function signatures change.
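The net row follows from the per-section rows; a quick check of the arithmetic using the approximate counts in the table:

```python
# (current, after decoupling) line counts per section, from the table
sections = {
    "cache collection + validation": (15, 60),
    "anat_fit_wf call": (19, 25),
    "transform-stage orchestration": (200, 0),
    "clean_datasinks() + calls": (8, 0),
    "sMRIPrep import block": (13, 3),
}
total_before = sum(before for before, _ in sections.values())
total_after = sum(after for _, after in sections.values())
net_change = total_after - total_before
```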
Prerequisites (changes needed in sMRIPrep)
- sMRIPrep standalone must emit ALL outputs fMRIPrep needs, including surface derivatives, fsLR-resampled surfaces, CIFTI morphometrics, and cortex masks. Currently, fMRIPrep builds these transform-stage workflows itself because `init_anat_preproc_wf` does not cover everything.
- Add a `--level` flag to the sMRIPrep CLI: `--level minimal` (fit only; fast, deterministic, reusable) vs. `--level full` (fit + all transforms). This mirrors fMRIPrep's existing level gating.
- Expand `collect_derivatives()` to cover all fields, including `midthickness_fsLR`, `sphere_reg_fsLR`/`sphere_reg_msm`, `cortex_mask`, `anat_ribbon`, and CIFTI morphometric dscalars.
- Standardize the FreeSurfer output path: fMRIPrep's BBR registration (`init_bbreg_wf`) needs a live `SUBJECTS_DIR` with the full reconstruction (not just output surfaces). sMRIPrep should place this at a standard path (`sourcedata/freesurfer/`).
- Version stamp in `dataset_description.json`: `GeneratedBy` with the sMRIPrep version and container info for compatibility checking.
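With a standardized FreeSurfer location, checking for a usable reconstruction becomes a fixed-path lookup. A sketch assuming the layout `sourcedata/freesurfer/<subject_id>` and checking only the two key files named in the risks table (`mri/brain.mgz`, `surf/lh.white`); a real check would likely need more files, and `freesurfer_dir_complete` is hypothetical.

```python
import tempfile
from pathlib import Path

# Partial list: only the files named in this proposal; a real check may need more.
KEY_FILES = ("mri/brain.mgz", "surf/lh.white")


def freesurfer_dir_complete(deriv_root: Path, subject_id: str) -> bool:
    """Check sourcedata/freesurfer/<subject_id> for the key reconstruction files."""
    subject_dir = deriv_root / "sourcedata" / "freesurfer" / subject_id
    return all((subject_dir / rel).is_file() for rel in KEY_FILES)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = freesurfer_dir_complete(root, "sub-01")
    for rel in KEY_FILES:
        target = root / "sourcedata" / "freesurfer" / "sub-01" / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    after = freesurfer_dir_complete(root, "sub-01")
```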
Risks and mitigations

| Risk | Mitigation |
|---|---|
| Version mismatch between sMRIPrep producer and fMRIPrep consumer | Validate outputs, not settings. If the required files exist, they are usable. Check `GeneratedBy` in `dataset_description.json` for compatibility warnings. |
| Configuration divergence (e.g., sMRIPrep run without `MNI152NLin2009cAsym`, which fMRIPrep needs for carpetplots) | `_is_anat_cache_complete()` checks that the needed normalizations/surfaces exist. If any are missing, fail with a clear error or fall back to running sMRIPrep. |
| FreeSurfer directory completeness | `collect_derivatives()` validates that the FreeSurfer directory exists and contains key files (`mri/brain.mgz`, `surf/lh.white`, etc.). |
| Single-command UX regression | The fallback path ensures `fmriprep /data /out participant` continues to work without pre-running sMRIPrep. Decoupling is an optimization, not a requirement. |
| Transform-stage boundary is fuzzy (surface derivatives involve computation, not just file writes) | sMRIPrep owns these computations when run standalone. The contract is "provide these BIDS-Derivatives files," not "run these specific workflows." |

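For the version-mismatch row, BIDS-Derivatives datasets record their producers in the `GeneratedBy` list of `dataset_description.json` (entries with `Name`, `Version`, and optionally `Container`). The sketch below only warns, consistent with "validate outputs, not settings". Comparing the `major.minor` series and expecting the entry to be named `sMRIPrep` are assumptions, not an agreed policy, and both helpers are hypothetical.

```python
import json
import warnings
from typing import Optional


def producer_version(description: dict, name: str = "sMRIPrep") -> Optional[str]:
    """Return the Version of the first GeneratedBy entry named ``name``."""
    for entry in description.get("GeneratedBy", []):
        if entry.get("Name") == name:
            return entry.get("Version")
    return None


def warn_if_incompatible(description: dict, expected_series: str) -> bool:
    """Warn (never fail) if the producer's major.minor series differs.
    Returns True when a warning was issued."""
    version = producer_version(description)
    if version is None:
        warnings.warn("no sMRIPrep entry in GeneratedBy; cannot check compatibility")
        return True
    series = ".".join(version.split(".")[:2])
    if series != expected_series:
        warnings.warn(f"derivatives from sMRIPrep {version}; expected {expected_series}.x")
        return True
    return False


description = json.loads(
    '{"Name": "sMRIPrep output", "BIDSVersion": "1.9.0", "DatasetType": "derivative",'
    ' "GeneratedBy": [{"Name": "sMRIPrep", "Version": "0.15.0"}]}'
)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mismatch = warn_if_incompatible(description, "0.16")
```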
Recommended phasing

| Phase | Scope | Work |
|---|---|---|
| 1 | sMRIPrep only | Add `--level`, emit all outputs fMRIPrep needs standalone, expand `collect_derivatives()`, standardize FreeSurfer path |
| 2 | fMRIPrep only | Add `_is_anat_cache_complete()` + `_init_anat_buffer_wf()`, make `anat_fit_wf` conditional, remove inline transform orchestration (L413-L614), remove `clean_datasinks()` (L1022-L1027), add DatasetLink for sMRIPrep |
| 3 | Integration | Integration tests with synthetic sMRIPrep derivative fixtures, contract documentation, user-facing documentation for the two-step workflow |

Phase 1 is backwards-compatible: fMRIPrep continues to work unchanged until Phase 2 is implemented.
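The Phase 1 `--level` flag could be parsed as below. This is a standard-library `argparse` sketch, not sMRIPrep's actual parser; the `STAGES` mapping is hypothetical, and defaulting to `full` is an assumption chosen so that existing invocations keep their current behavior.

```python
import argparse

# Hypothetical mapping from --level to the stages sMRIPrep would run.
STAGES = {
    "minimal": ("fit",),
    "full": ("fit", "transforms"),
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="smriprep")
    parser.add_argument("bids_dir")
    parser.add_argument("output_dir")
    parser.add_argument("analysis_level", choices=["participant"])
    parser.add_argument(
        "--level",
        choices=sorted(STAGES),
        default="full",
        help="minimal: fit only (reusable cache); full: fit + all transforms",
    )
    return parser


args = build_parser().parse_args(["/data", "/out", "participant", "--level", "minimal"])
stages = STAGES[args.level]
```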