`Dataset5dstem` torch native initial implementation (MAPED, time-series) by bobleesj · Pull Request #231 · electronmicroscopy/quantem

bobleesj · 2026-05-24T22:48:37Z

What problem this PR addresses

DRAFT - will pull commits from #228 after it is merged.

~200 LOC changed - 90e7331

What should the reviewer(s) do

DRAFT

from_tensor:

dset5d = Dataset5dstem.from_tensor(
    torch.from_dlpack(data_cp),
    sampling=(0.5, 0.5, 0.46, 0.46),
    units=['A', 'A', 'mrad', 'mrad'],
    name='gold_3frame_stack',
    series_type='time',
    series=[0, 1, 2],
)

from_4dstem:

dset5d = Dataset5dstem.from_4dstem(
    [Dataset4dstem], series_type='time', series=[0, 1, 2, 3, 4, 5, 6],
)

This PR is intentionally limited to a single GPU for now. A follow-up PR can focus on the multi-GPU implementation.

arthurmccray · 2026-05-28T21:33:55Z

Rather than as you had it:

dset5d = Dataset5dstem.from_tensor(
    torch.from_dlpack(data_cp),
    sampling=(0.5, 0.5, 0.46, 0.46),
    units=['A', 'A', 'mrad', 'mrad'],
    name='gold_3frame_stack',
    series_type='time',
    series=[0, 1, 2],
)

Wouldn't it make more sense to stick with our general DatasetNd convention and extend all the metadata to match? If you want to keep the series_type Literals that's fine, but I think that would belong in the metadata dictionary rather than as a new attribute. The options/literals for series_type should also be moved lower down, as a Dataset3d could be a time/defocus/tilt/etc. series as well.

dset5d = Dataset5dstem.from_tensor(
    torch.from_dlpack(data_cp), # tensor.ndim == 5 
    sampling=(10, 0.5, 0.5, 0.46, 0.46),
    units=['sec', 'A', 'A', 'mrad', 'mrad'],
    name='gold_3frame_stack',
    metadata={"series_type": "time"},
)
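The suggestion to move the series_type options "lower down" could look roughly like the sketch below. It is only an illustration under stated assumptions: `SeriesType`, `DatasetBase`, and the constructor signatures are hypothetical stand-ins, not the quantem classes, and the option set is taken from the "time/defocus/tilt/etc." examples mentioned above.

```python
from typing import Literal, get_args

# Hypothetical option set, defined once at the base level so that any
# dimensionality (3D, 5D, ...) can describe itself as a series.
SeriesType = Literal["time", "tilt", "defocus"]


class DatasetBase:
    """Stand-in for a shared N-d dataset base class (not the quantem API)."""

    def __init__(self, ndim, metadata=None):
        self.ndim = ndim
        self.metadata = dict(metadata or {})
        series_type = self.metadata.get("series_type")
        # Validate only when the key is present; plain datasets stay valid.
        if series_type is not None and series_type not in get_args(SeriesType):
            raise ValueError(
                f"series_type must be one of {get_args(SeriesType)}, got {series_type!r}"
            )


class Dataset3d(DatasetBase):
    def __init__(self, metadata=None):
        super().__init__(3, metadata)


class Dataset5dstem(DatasetBase):
    def __init__(self, metadata=None):
        super().__init__(5, metadata)


# Both dimensionalities share the same validated key.
focal_series = Dataset3d(metadata={"series_type": "defocus"})
time_series = Dataset5dstem(metadata={"series_type": "time"})
```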

Inconsistent sampling values could be metadata as well, though this would quickly spiral if we have inconsistent sampling in multiple dimensions. At that point though, we should probably be using another data class?

dset5d = Dataset5dstem.from_tensor(
    torch.from_dlpack(data_cp), # tensor.ndim == 5
    sampling=(None, 0.5, 0.5, 0.46, 0.46), # not sure if we do checks for sampling values>0, but would want some form of Null value for sampling along an axis that is inconsistent
    units=['sec', 'A', 'A', 'mrad', 'mrad'],
    name='gold_3frame_stack',
    metadata={
        "series_type": "time",
        "series_dim": 0, # not necessary if we only allow a single non-fixed axis and if we force it to be the first
        "series_values": [0, 10, 30], # time in sec of each frame, len(`series_values`) must equal tensor.shape[`series_dim`]
    },
)
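The constraints written in the comments above (a null sampling value only on the series axis, and len(series_values) equal to tensor.shape[series_dim]) could be checked with something like the following. This is a minimal sketch: `validate_series_metadata` is a hypothetical helper, it works on a plain shape tuple rather than a tensor, and whether the real class checks for sampling values > 0 is left open in the comment above.

```python
def validate_series_metadata(shape, sampling, metadata):
    """Check series metadata against a tensor shape and return the series axis.

    Assumptions (not confirmed by the library): `series_dim` defaults to 0,
    and None is the null value for an axis with inconsistent sampling.
    """
    if len(sampling) != len(shape):
        raise ValueError("sampling must have one entry per tensor dimension")

    series_dim = metadata.get("series_dim", 0)
    if not 0 <= series_dim < len(shape):
        raise ValueError(f"series_dim {series_dim} is out of range for shape {shape}")

    values = metadata.get("series_values")
    for axis, step in enumerate(sampling):
        if step is None:
            # Only the series axis may have inconsistent (null) sampling.
            if axis != series_dim:
                raise ValueError(f"sampling is None on non-series axis {axis}")
            if values is None:
                raise ValueError("series_values is required when sampling is None")
        elif step <= 0:
            raise ValueError(f"sampling on axis {axis} must be positive")

    if values is not None and len(values) != shape[series_dim]:
        raise ValueError(
            f"len(series_values)={len(values)} does not match "
            f"shape[{series_dim}]={shape[series_dim]}"
        )
    return series_dim


axis = validate_series_metadata(
    shape=(3, 256, 256, 192, 192),
    sampling=(None, 0.5, 0.5, 0.46, 0.46),
    metadata={"series_type": "time", "series_dim": 0, "series_values": [0, 10, 30]},
)
```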

bobleesj · 2026-05-28T22:18:58Z

@arthurmccray

Yeah, your implementation is better, and it is more natural to have the tilt/time on the 0th axis, as in the use case you've shown:

dset5d = Dataset5dstem.from_tensor(
    torch.from_dlpack(data_cp), # tensor.ndim == 5 
    sampling=(10, 0.5, 0.5, 0.46, 0.46),
    units=['sec', 'A', 'A', 'mrad', 'mrad'],
    name='gold_3frame_stack',
    metadata={"series_type": "time"},
)

Regarding metadata, this is where I wanted to make it easy for people to just do dset.series_type rather than dset.metadata["series_type"], which I really don't like if I ever need to do any coding by hand (the last thing I want to do is recall the keys).

Here, metadata should be the environment that surrounds the ground truth, like operator, condition, or location. The series_type data describes the ground truth itself (it is not really "meta"), so I wasn't a fan of putting it in metadata for that reason.
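One possible middle ground between the two positions is to store the value in the metadata dictionary but expose it as an attribute. The sketch below is an assumption about how that could be wired up, not something either commenter proposed verbatim; `SeriesDataset` is a hypothetical stand-in class.

```python
class SeriesDataset:
    """Stand-in dataset: storage lives in `metadata`, access is an attribute."""

    def __init__(self, metadata=None):
        self.metadata = dict(metadata or {})

    @property
    def series_type(self):
        # Returns None when the dataset is not a series.
        return self.metadata.get("series_type")

    @series_type.setter
    def series_type(self, value):
        # Writes go to the dictionary, so both access styles stay in sync.
        self.metadata["series_type"] = value


dset = SeriesDataset(metadata={"series_type": "time"})
kind = dset.series_type      # attribute access, no key to remember
dset.series_type = "tilt"    # updates dset.metadata["series_type"] as well
```

With this shape, serialization only has to handle the dictionary, while hand-written code can use dset.series_type.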

bobleesj · 2026-05-29T02:16:32Z

This PR doesn't need to be reviewed anytime soon, since I will be creating my own custom 5DSTEM class that works for my workflow and mental model - primarily time and tilt at the moment, with real data from the Arina.

Leaving a few examples below, designing the API top-down based on use cases, rather than bottom-up, to prevent over-abstraction:

Time (in-situ):

dset5d = Dataset5dstem.from_tensors(
    frames,                              # 12 x (256, 256, 192, 192), spilled across 2 GPUs
    sampling=(0.5, 0.5, 0.46, 0.46),     # scan + detector pitch, ONE setup, shared
    units=['A', 'A', 'mrad', 'mrad'],
    series_type='time',
    series=[0, 5, 12, 30, 60, 120, 240, 480, 600, 900, 1200, 1800],  # sec, NON-uniform
    name='LiCoO2_insitu_heating',
)

Tilt (tomography):

dset5d = Dataset5dstem.from_tensors(
    frames,                              # 41 x (128, 128, 96, 96), one GPU after binning
    sampling=(0.4, 0.4, 0.5, 0.5),       # scan + detector pitch, shared across tilts
    units=['A', 'A', 'mrad', 'mrad'],
    series_type='tilt',
    series=np.linspace(-60, 60, 41),     # degrees, uniform tilt
    name='Pt_nanoparticle_tomo',
)

MAPED:

dset5d = Dataset5dstem.from_tensors(
    frames,                              # 7 x (256, 256, 192, 192), spilled across 2 GPUs
    sampling=(0.5, 0.5, 0.46, 0.46),     # scan + detector pitch, ONE setup, shared
    units=['A', 'A', 'mrad', 'mrad'],
    series_type='tilt',
    series=[-30, -18, -5, 8, 14, 22, 33],  # degrees, NON-uniform - this is the experiment
    name='Fe3O4_maped',
)
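The three use cases differ mainly in whether the series values are evenly spaced. A small check like the one below could decide whether a series can be summarized by a single sampling value on axis 0 or needs explicit per-frame values. It is a sketch only: `uniform_step` is a hypothetical helper, and the tilt list reproduces np.linspace(-60, 60, 41) with integer arithmetic (a 3 degree step) to keep the example dependency-free.

```python
import math


def uniform_step(series, rel_tol=1e-9):
    """Return the common spacing if `series` is evenly spaced, else None."""
    if len(series) < 2:
        return None
    steps = [b - a for a, b in zip(series, series[1:])]
    first = steps[0]
    if all(math.isclose(s, first, rel_tol=rel_tol, abs_tol=1e-12) for s in steps):
        return first
    return None


tilt = [-60 + 3 * i for i in range(41)]        # uniform tomography tilts, degrees
maped = [-30, -18, -5, 8, 14, 22, 33]          # non-uniform MAPED tilts, degrees
heating = [0, 5, 12, 30, 60, 120, 240, 480, 600, 900, 1200, 1800]  # sec

tilt_step = uniform_step(tilt)        # a single sampling value would suffice
maped_step = uniform_step(maped)      # None: per-frame values are needed
heating_step = uniform_step(heating)  # None: per-frame values are needed
```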

TomaSusi · 2026-05-29T10:30:18Z

I would support the suggestion to "stick with our general DatasetNd convention and extend all the metadata to match".

bobleesj added 8 commits May 18, 2026 22:36

dataset4d, dataset4dstem hold torch array

98ddd0d

bring original docstring back

d2dc32b

remove need for cached numpy array

6b7319d

further cleaup api docstring

7f9913f

use _array _tensor duck typing for show4dstem

9c93ae9

use row, col convention in docstring

40878d3

fix: tolerate missing _tensor slot on autoserialize-loaded datasets

21b684f

feat: add Dataset5dstem for torch-backed 5D-STEM series

90e7331

bobleesj wants to merge 8 commits into electronmicroscopy:dev from bobleesj:5dstem-may-2026

3 participants
