Skip to content

feat: introduce stack-allocated PyBuffer#5894

Open
winstxnhdw wants to merge 12 commits intoPyO3:mainfrom
winstxnhdw:feat/stacked-pybuffer
Open

feat: introduce stack-allocated PyBuffer#5894
winstxnhdw wants to merge 12 commits intoPyO3:mainfrom
winstxnhdw:feat/stacked-pybuffer

Conversation

@winstxnhdw
Copy link
Copy Markdown

Summary

This PR implements a pinned stack-allocated PyUntypedBuffer variant, PyUntypedBufferView.

Closes #5836

Copy link
Copy Markdown
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for this. Various thoughts around the accessor methods.

Also, out of scope for this PR, but I keep wondering if we should provide iterators for these structures. Especially with the strides / suboffsets etc, it's not necessarily trivial to get this right.

/// Gets the size of a single element, in bytes.
#[inline]
pub fn item_size(&self) -> usize {
self.raw.itemsize as usize
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting note from the Python docs for itemsize

If shape is NULL as a result of a PyBUF_SIMPLE or a PyBUF_WRITABLE request, the consumer must disregard itemsize and assume itemsize == 1.

I am unsure whether it's better to have a corresponding check for shape here and return 1 if needed, or to return None if shape is None, or to just keep the implementation as-is and document this for users.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favour of returning 1. Also, instead of doing the NULL check inside the method, we could probably do it in with() instead so that the check is only done once.

@winstxnhdw
Copy link
Copy Markdown
Author

winstxnhdw commented Mar 20, 2026

Also, out of scope for this PR, but I keep wondering if we should provide iterators for these structures. Especially with the strides / suboffsets etc, it's not necessarily trivial to get this right.

Yes, I think it would definitely be useful as well. I am not yet sure how this would look like though.

Copy link
Copy Markdown
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the continued work here, looking great!

Implementing Drop directly on PyUntypedBufferView is functionally equivalent to what I had in mind from the "drop guard", so gets a 👍 from me.

Afraid I had quite a few more thoughts around some of the edge cases (plus some ideas we can probably ignore). Hopefully helps us get to the right eventual abstraction!

@winstxnhdw winstxnhdw force-pushed the feat/stacked-pybuffer branch 3 times, most recently from 5430e96 to 48b5fdd Compare March 22, 2026 18:11
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Mar 24, 2026

Merging this PR will not alter performance

✅ 105 untouched benchmarks
⏩ 1 skipped benchmark1


Comparing winstxnhdw:feat/stacked-pybuffer (c1872f3) with main (cd87dcf)

Open in CodSpeed

Footnotes

  1. 1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, click here and archive it to remove it from the performance reports.

@winstxnhdw winstxnhdw force-pushed the feat/stacked-pybuffer branch from b4f1eab to dd63129 Compare March 27, 2026 15:03
src/buffer.rs Outdated
impl_element!(f64, Float);

/// Sealed marker for buffer field availability. Either [`Known`] or [`Unknown`].
mod buffer_info {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be moved to another file or moved to the top level.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually already have pyo3::pyclass::boolean_struct::True / False which could potentially be deduplicated with these types.

It might be possible to also use const FORMAT: bool etc as const-generics instead of types. Not sure if that offers any practical benefit here. I think it's not possible to provide defaults for const generics, so it might work out strictly worse I guess?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be possible to also use const FORMAT: bool etc as const-generics instead of types.

You're right. It works, and it's a lot cleaner. It's possible to set defaults for const generics.

@winstxnhdw winstxnhdw requested a review from davidhewitt March 30, 2026 10:09
Copy link
Copy Markdown
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool!

After reading through this implementation a couple of times, I really like the expressiveness this gives in the type system. I have some ideas which might refine it further (see BufferFlags struct idea).

Main concern is about generic code bloat from explosion of generic parameters. I am unsure if there's a way that we can mitigate that at all.

src/buffer.rs Outdated
impl_element!(f64, Float);

/// Sealed marker for buffer field availability. Either [`Known`] or [`Unknown`].
mod buffer_info {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually already have pyo3::pyclass::boolean_struct::True / False which could potentially be deduplicated with these types.

It might be possible to also use const FORMAT: bool etc as const-generics instead of types. Not sure if that offers any practical benefit here. I think it's not possible to provide defaults for const generics, so it might work out strictly worse I guess?

src/buffer.rs Outdated
Comment on lines +933 to +944
impl<Format: FieldInfo, Stride: FieldInfo> PyUntypedBufferView<Format, Known, Stride> {
/// Returns the shape array. `shape[i]` is the length of dimension `i`.
///
/// Despite Python using an array of signed integers, the values are guaranteed to be
/// non-negative. However, dimensions of length 0 are possible and might need special
/// attention.
#[inline]
pub fn shape(&self) -> &[usize] {
debug_assert!(!self.raw.shape.is_null());
unsafe { slice::from_raw_parts(self.raw.shape.cast(), self.raw.ndim as usize) }
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's potentially also valid to have PyUntypedBufferView<Format, Unknown, Stride> implement a shape() accessor which returns Option<&[usize]>. I am unsure if that's of any practical value, it seems to me that it would be expected to always return None.

(Maybe similar observation for format / stride parameters.)

Copy link
Copy Markdown
Author

@winstxnhdw winstxnhdw Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see this being useful, and I can't think of a good way to implement this. I am assuming this is necessary for someone who wants to resolve the flags dynamically at runtime? I don’t think it’s worth supporting that. Such users should just request for all the flags.

src/buffer.rs Outdated
#[repr(transparent)]
pub struct PyBufferView<
T,
Format: FieldInfo = Known,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the typed buffer view always needs to know format, this parameter might not be necessary (however I guess is useful for consistency).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd have to add another flag type, which isn't worth it imo.

Comment on lines +901 to +930
/// Attempt to interpret this untyped view as containing elements of type `T`.
pub fn as_typed<T: Element>(&self) -> PyResult<&PyBufferView<T, Known, Shape, Stride>> {
self.ensure_compatible_with::<T>()?;
// SAFETY: PyBufferView<T, ..> is repr(transparent) around PyUntypedBufferView<..>
let typed = unsafe {
NonNull::from(self)
.cast::<PyBufferView<T, Known, Shape, Stride>>()
.as_ref()
};

Ok(typed)
}

fn ensure_compatible_with<T: Element>(&self) -> PyResult<()> {
let name = std::any::type_name::<T>();

if mem::size_of::<T>() != self.item_size() || !T::is_compatible_format(self.format()) {
return Err(PyBufferError::new_err(format!(
"buffer contents are not compatible with {name}"
)));
}

if self.raw.buf.align_offset(mem::align_of::<T>()) != 0 {
return Err(PyBufferError::new_err(format!(
"buffer contents are insufficiently aligned for {name}"
)));
}

Ok(())
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One downside of the high numbers of generic parameters is that these methods are monomorphised many times over during codegen. We might want to have fn inner internal methods which are only generic on T for these larger methods. Small ones probably aren't worth applying such complexity for, I don't have a good instinct for a cut-off threshold on that.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the methods are pretty short. I guess ensure_compatible_with is a good candidate?

@davidhewitt
Copy link
Copy Markdown
Member

Thanks for all of this - would be interested to see what you think of these suggestions (none are mandatory, I'm just pushing out ideas of what I think I like, which may not be to others' taste!)

@davidhewitt
Copy link
Copy Markdown
Member

#5870 has now merged, we'll want to add a similar API here.

@winstxnhdw winstxnhdw force-pushed the feat/stacked-pybuffer branch from dd63129 to 5d3c801 Compare April 7, 2026 17:59
@winstxnhdw winstxnhdw force-pushed the feat/stacked-pybuffer branch from 5d3c801 to c9e582a Compare April 7, 2026 18:07
@winstxnhdw winstxnhdw force-pushed the feat/stacked-pybuffer branch 3 times, most recently from a462497 to fcb9203 Compare April 7, 2026 21:25
@winstxnhdw winstxnhdw force-pushed the feat/stacked-pybuffer branch from fcb9203 to c1872f3 Compare April 7, 2026 21:31
@winstxnhdw
Copy link
Copy Markdown
Author

winstxnhdw commented Apr 7, 2026

Sorry for the delay. I've been a little burnt out from school + work. I've adapted your suggestions and modified them to be what I think is appropriate.

Also, if you do PyBufferFlags::simple().full(), you won't be able to append let's say format() on it anymore because full() already implies format(). Just a nice DX win that I haven't really seen in other libraries like reqwest or tokio.

@winstxnhdw winstxnhdw requested a review from davidhewitt April 9, 2026 06:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider stack-allocated PyBuffer variant

2 participants