Dataset5dstem torch native initial implementation (MAPED, time-series)#231

Draft
bobleesj wants to merge 8 commits into
electronmicroscopy:devfrom
bobleesj:5dstem-may-2026

@bobleesj
Collaborator

@bobleesj bobleesj commented May 24, 2026

What problem this PR addresses

DRAFT: will pull in commits from #228 once it is merged.

~200 LOC changed (90e7331)

What should the reviewer(s) do

DRAFT

from_tensor:

dset5d = Dataset5dstem.from_tensor(
    torch.from_dlpack(data_cp),
    sampling=(0.5, 0.5, 0.46, 0.46),
    units=['A', 'A', 'mrad', 'mrad'],
    name='gold_3frame_stack',
    series_type='time',
    series=[0, 1, 2],
)

from_4dstem:

dset5d = Dataset5dstem.from_4dstem(
    [Dataset4dstem],  # list of Dataset4dstem objects, one per series entry (7 here)
    series_type='time',
    series=[0, 1, 2, 3, 4, 5, 6],
)
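As a rough illustration of what from_4dstem has to check before stacking, here is a minimal sketch. It assumes every input shares one sampling/units setup and that the new series axis is prepended as axis 0; Stub4dstem and check_from_4dstem are hypothetical stand-ins, not the PR's actual classes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Stub4dstem:
    """Hypothetical stand-in for Dataset4dstem: only the fields this check needs."""
    shape: tuple[int, int, int, int]
    sampling: tuple[float, float, float, float]
    units: tuple[str, str, str, str]


def check_from_4dstem(datasets: Sequence[Stub4dstem], series: Sequence[float]) -> tuple[int, ...]:
    """Validate inputs to a from_4dstem-style constructor; return the stacked 5D shape."""
    if len(datasets) == 0:
        raise ValueError("need at least one 4D dataset")
    if len(series) != len(datasets):
        raise ValueError(f"len(series)={len(series)} but got {len(datasets)} datasets")
    first = datasets[0]
    for i, d in enumerate(datasets[1:], start=1):
        # Assumption: one shared scan/detector setup across all frames.
        if d.shape != first.shape:
            raise ValueError(f"dataset {i} shape {d.shape} != {first.shape}")
        if d.sampling != first.sampling or d.units != first.units:
            raise ValueError(f"dataset {i} has different sampling or units")
    # The series axis is prepended as axis 0.
    return (len(datasets), *first.shape)
```

With seven identical 4D inputs and series=[0, 1, 2, 3, 4, 5, 6], this returns a shape whose leading dimension is 7; a mismatched series length raises instead of silently truncating.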

This PR is intentionally limited to a single GPU for now; the next PR can focus on a multi-GPU implementation.

@arthurmccray
Collaborator

arthurmccray commented May 28, 2026

Rather than as you had it:

dset5d = Dataset5dstem.from_tensor(
    torch.from_dlpack(data_cp),
    sampling=(0.5, 0.5, 0.46, 0.46),
    units=['A', 'A', 'mrad', 'mrad'],
    name='gold_3frame_stack',
    series_type='time',
    series=[0, 1, 2],
)

Wouldn't it make more sense to stick with our general DatasetNd convention and extend all the metadata to match? If you want to keep the series_type Literals that's fine, but I think that would belong in the metadata dictionary rather than as a new attribute. The options/literals for series_type should also be moved lower down the class hierarchy, since a Dataset3d could also be a time/defocus/tilt/etc. series.

dset5d = Dataset5dstem.from_tensor(
    torch.from_dlpack(data_cp), # tensor.ndim == 5 
    sampling=(10, 0.5, 0.5, 0.46, 0.46),
    units=['sec', 'A', 'A', 'mrad', 'mrad'],
    name='gold_3frame_stack',
    metadata={"series_type": "time"},
)
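Under this convention every per-axis field has exactly one entry per tensor dimension, with the series axis first. A minimal sketch of that consistency check, assuming a hypothetical helper check_nd_convention and an assumed set of allowed series_type values:

```python
from typing import Any, Optional, Sequence

SERIES_TYPES = ("time", "tilt", "defocus")  # assumed allowed values, for illustration only


def check_nd_convention(
    shape: Sequence[int],
    sampling: Sequence[Optional[float]],
    units: Sequence[str],
    metadata: dict[str, Any],
) -> None:
    """Check that per-axis fields line up with the tensor's dimensions."""
    ndim = len(shape)
    if len(sampling) != ndim or len(units) != ndim:
        raise ValueError(
            f"expected {ndim} sampling/units entries, got {len(sampling)}/{len(units)}"
        )
    # series_type lives in metadata rather than as a dedicated attribute.
    series_type = metadata.get("series_type")
    if series_type is not None and series_type not in SERIES_TYPES:
        raise ValueError(f"unknown series_type {series_type!r}")
```

The 4-entry sampling from the original from_tensor call would fail this check against a 5D tensor, which is the mismatch the 5-entry version avoids.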

Inconsistent sampling values could be metadata as well, though this would quickly spiral if we have inconsistent sampling in multiple dimensions. At that point though, we should probably be using another data class?

dset5d = Dataset5dstem.from_tensor(
    torch.from_dlpack(data_cp), # tensor.ndim == 5
    sampling=(None, 0.5, 0.5, 0.46, 0.46), # not sure if we do checks for sampling values>0, but would want some form of Null value for sampling along an axis that is inconsistent
    units=['sec', 'A', 'A', 'mrad', 'mrad'],
    name='gold_3frame_stack',
    metadata={
        "series_type": "time",
        "series_dim": 0, # not necessary if we only allow a single non-fixed axis and if we force it to be the first
        "series_values": [0, 10, 30], # time in sec of each frame, len(`series_values`) must equal tensor.shape[`series_dim`]
    },
)
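The rules stated in the comments above (None only on the single non-fixed axis, positive sampling elsewhere, and len(series_values) equal to tensor.shape[series_dim]) could be enforced roughly as follows. This is a sketch; check_series_metadata is a hypothetical name, and defaulting series_dim to 0 assumes the series axis is forced to be first.

```python
from typing import Any, Optional, Sequence


def check_series_metadata(
    shape: Sequence[int],
    sampling: Sequence[Optional[float]],
    metadata: dict[str, Any],
) -> None:
    """Validate the proposed non-uniform series metadata."""
    series_dim = metadata.get("series_dim", 0)  # assumption: series axis defaults to axis 0
    values = metadata.get("series_values")
    for axis, s in enumerate(sampling):
        if s is None:
            # None marks the single axis without a constant step.
            if axis != series_dim:
                raise ValueError(f"sampling is None on axis {axis}, but series_dim is {series_dim}")
        elif s <= 0:
            raise ValueError(f"sampling on axis {axis} must be > 0, got {s}")
    if values is not None and len(values) != shape[series_dim]:
        raise ValueError(
            f"len(series_values)={len(values)} != tensor.shape[{series_dim}]={shape[series_dim]}"
        )
    if sampling[series_dim] is None and values is None:
        raise ValueError("sampling is None on the series axis, so series_values is required")
```

The [0, 10, 30] example passes for a tensor with 3 frames; dropping series_values or giving it the wrong length raises.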

@bobleesj
Collaborator Author

bobleesj commented May 28, 2026

@arthurmccray

Yeah, your approach is better; it's more natural to have tilt/time on the 0th axis, as in the use case you've shown:

dset5d = Dataset5dstem.from_tensor(
    torch.from_dlpack(data_cp), # tensor.ndim == 5 
    sampling=(10, 0.5, 0.5, 0.46, 0.46),
    units=['sec', 'A', 'A', 'mrad', 'mrad'],
    name='gold_3frame_stack',
    metadata={"series_type": "time"},
)

Regarding metadata: this is where I wanted to make it easy for people to just write dset.series_type rather than dset.metadata["series_type"], which I really don't like when coding by hand (the last thing I want to do is recall the keys).

Here, series_type isn't metadata in the usual sense, i.e. the context surrounding the ground truth (operator, conditions, location); it describes the ground truth itself (not really "meta"), so I wasn't a fan of putting it in metadata for that reason.
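One possible middle ground between the two positions, sketched here only as an illustration: keep the values in the metadata dict (as proposed above) but expose them as properties, so dset.series_type still works by hand. SeriesAccessMixin and DemoDataset are hypothetical names, not existing classes.

```python
from typing import Any, Optional, Sequence


class SeriesAccessMixin:
    """Hypothetical mixin: attribute-style access to series fields stored in metadata."""

    metadata: dict[str, Any]

    @property
    def series_type(self) -> Optional[str]:
        # Reads through to the dict, so metadata stays the single source of truth.
        return self.metadata.get("series_type")

    @series_type.setter
    def series_type(self, value: str) -> None:
        self.metadata["series_type"] = value

    @property
    def series_values(self) -> Optional[Sequence[float]]:
        return self.metadata.get("series_values")


class DemoDataset(SeriesAccessMixin):
    """Toy dataset holding only a metadata dict (stands in for a real DatasetNd)."""

    def __init__(self, metadata: Optional[dict[str, Any]] = None) -> None:
        self.metadata = dict(metadata or {})
```

The trade-off is that the accessor names become part of the API, which only makes sense for the few keys that describe the data itself.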

@bobleesj
Collaborator Author

This PR doesn't need to be reviewed anytime soon, since I will be creating my own custom 5D-STEM class that works for my workflow and mental model, primarily time and tilt series with real Arina data at the moment.

Leaving a few examples below. I'm designing the API top-down from use cases rather than bottom-up, to prevent over-abstraction:

Time (in-situ):

dset5d = Dataset5dstem.from_tensors(
    frames,                              # 12 x (256, 256, 192, 192), spilled across 2 GPUs
    sampling=(0.5, 0.5, 0.46, 0.46),     # scan + detector pitch, ONE setup, shared
    units=['A', 'A', 'mrad', 'mrad'],
    series_type='time',
    series=[0, 5, 12, 30, 60, 120, 240, 480, 600, 900, 1200, 1800],  # sec, NON-uniform
    name='LiCoO2_insitu_heating',
)

Tilt (tomography):

dset5d = Dataset5dstem.from_tensors(
    frames,                              # 41 x (128, 128, 96, 96), one GPU after binning
    sampling=(0.4, 0.4, 0.5, 0.5),       # scan + detector pitch, shared across tilts
    units=['A', 'A', 'mrad', 'mrad'],
    series_type='tilt',
    series=np.linspace(-60, 60, 41),     # degrees, uniform tilt
    name='Pt_nanoparticle_tomo',
)
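The "spilled across 2 GPUs" versus "one GPU after binning" comments come down to simple size arithmetic. A minimal sketch, assuming float32 storage (4 bytes per element; uint16 counts would halve everything) and a hypothetical contiguous split of frames across devices:

```python
from math import prod


def frame_bytes(shape: tuple[int, ...], bytes_per_element: int = 4) -> int:
    """Size of one 4D frame in bytes; 4 bytes per element assumes float32."""
    return prod(shape) * bytes_per_element


def split_contiguous(n_frames: int, n_devices: int) -> list[range]:
    """Split frame indices into contiguous, nearly equal chunks, one per device."""
    base, extra = divmod(n_frames, n_devices)
    chunks, start = [], 0
    for d in range(n_devices):
        # The first `extra` devices each take one additional frame.
        size = base + (1 if d < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks
```

Under the float32 assumption, one (256, 256, 192, 192) frame is exactly 9 GiB, so the 12-frame time series is 108 GiB, or 54 GiB per GPU when split 6/6. One binned (128, 128, 96, 96) tilt frame is 576 MiB, about 23 GiB for all 41 tilts.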

MAPED:

dset5d = Dataset5dstem.from_tensors(
    frames,                              # 7 x (256, 256, 192, 192), spilled across 2 GPUs
    sampling=(0.5, 0.5, 0.46, 0.46),     # scan + detector pitch, ONE setup, shared
    units=['A', 'A', 'mrad', 'mrad'],
    series_type='tilt',
    series=[-30, -18, -5, 8, 14, 22, 33],  # degrees, NON-uniform - this is the experiment
    name='Fe3O4_maped',
)
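These three examples map onto the sampling=None idea from earlier in the thread: a uniform series (the tilt example) could collapse to one series-axis sampling value, while the non-uniform time and MAPED series would need explicit series values. A minimal sketch of that detection; uniform_step is a hypothetical helper and the tolerance is an arbitrary choice.

```python
from typing import Optional, Sequence


def uniform_step(values: Sequence[float], rel_tol: float = 1e-9) -> Optional[float]:
    """Return the constant spacing of `values`, or None if the spacing varies."""
    if len(values) < 2:
        return None  # a single value has no spacing
    steps = [b - a for a, b in zip(values[:-1], values[1:])]
    first = steps[0]
    tol = rel_tol * max(abs(first), 1.0)
    if all(abs(s - first) <= tol for s in steps):
        return first
    return None
```

For the 41 tilts from -60 to 60 degrees this gives a step of 3; for the time series and the MAPED tilts it returns None.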

@TomaSusi

I would support the suggestion to "stick with our general DatasetNd convention and extend all the metadata to match".


3 participants