io: implement posix.open/read/write.close + _io.FileIO + open #434

Open
paugier wants to merge 7 commits into spylang:main from paugier:posix-open-read-write-close

Conversation

@paugier
Contributor

@paugier paugier commented Mar 26, 2026

I was helped by Claude, which in particular wrote the C code! It seems reasonable but I'm quite unable to judge if it is really correct.

The most important tests (test_open_read and test_open_write) raise an error with the C backend.

>               raise SPyError.simple(etype, message, "", loc)
E               spy.errors.SPyError: Traceback (most recent call last):
E               
E               ValueError: No such file or directory
E                 | /home/pierre/dev/spy/spy/libspy/include/spy/posix.h:51
E                 |         spy_panic("ValueError", strerror(errno), __FILE__, __LINE__);
E                 |  |__________________________________________________________________|

Claude told me that it is due to the WASI sandbox and that the test can be skipped with

@skip_backends("C", reason="WASI sandbox cannot access host filesystem paths")

Nevertheless, the good news is that this works fine:

cd examples
spy build io_low_level.spy
./build/io_low_level
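
For readers who have not used the four calls named in the title, the sequence they form can be sketched with CPython's `os` module. This is plain Python, not SPy, and it is not the content of `io_low_level.spy` (which is not shown in this thread); the helper names `write_file` and `read_file` are made up for illustration.

```python
import os
import tempfile


def write_file(path: str, data: bytes) -> int:
    """Write data to path using raw file-descriptor calls; return bytes written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # write() may accept fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])
        return written
    finally:
        os.close(fd)


def read_file(path: str, chunk_size: int = 4096) -> bytes:
    """Read the whole file with repeated read() calls on a file descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # an empty result means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hello.txt")
        write_file(path, b"hello from a file descriptor\n")
        print(read_file(path))
```

Every `os.read` and `os.write` here is one system call with no buffering in between, which is the level the `posix` functions in this PR operate at.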

@paugier paugier changed the title from "io: implement the open, read and write syscalls" to "io: implement posix.open, read, write and close + _io.FileIO" Mar 26, 2026
@paugier paugier changed the title from "io: implement posix.open, read, write and close + _io.FileIO" to "io: implement posix.open, read, write, close + _io.FileIO" Mar 26, 2026
@paugier paugier changed the title from "io: implement posix.open, read, write, close + _io.FileIO" to "io: implement posix.open/read/write.close + _io.FileIO + open" Mar 27, 2026
@paugier
Contributor Author

paugier commented Mar 27, 2026

I think it is ready for review.

One obvious and very annoying issue is that the C code is not actually tested!

@paugier paugier force-pushed the posix-open-read-write-close branch from 2be2633 to bc1176f March 27, 2026 10:19
@paugier paugier force-pushed the posix-open-read-write-close branch from 53a6c58 to 0217b07 March 27, 2026 10:43
@antocuni
Member

@paugier thank you! I'll give it a proper review later, but in the meantime, some comments:

  • I think this fixes #355 (Implement posix.open and posix.read) and #356 (Implement posix.write), but I don't consider it a fix for #357 (Create a file object in the stdlib). The problem is that a usable file object must have some kind of buffering. The approach of having an unbuffered _io.FileIO like in CPython is cool, but then we need a Buffered wrapper around it. However, it might be that we lack the proper low-level SPy constructs to implement it right now (e.g. we probably need some kind of "create spy string out of this raw memory buffer from index N with length N", without doing too many copies).

  • we should find a way to test it :). I think that wasmtime can be configured in such a way to be able to read the host filesystem and/or a specific part of it. I'd start to investigate starting from spy/llwasm/wasmtime.py.

  • the same fix should work both for the tests and for the CLI interpreter. Bonus points if the interpreter has a way to configure which part of the filesystem to expose, similar to wasmtime run --dir. This doesn't necessarily need to be part of this PR; it can be done later. An open question is what the default should be (/ as CPython does? Only the cwd? Nothing by default?)

  • you said that you were helped by claude. That's fine and thank you a lot for declaring it, it makes my life easier as a reviewer. Can you mark clearly which files/part of code were written by claude and which were written by you? I'd also be interested in the prompts if available (not a big deal but it helps, plus I'm curious :)).

@antocuni
Member

@paugier so, I have done some extra investigation on this topic.
I asked claude to do two separate research projects:

  1. How is I/O implemented in major languages: https://github.com/antocuni/vibes/tree/main/io-abstraction-layers
  2. implement a python-like I/O tower in C: https://github.com/antocuni/vibes/tree/main/c-io-tower

I think that what we ultimately want is something like (2), but written in SPy. However, we are probably lacking some low-level capabilities at the moment (e.g. we cannot call memmove and memcpy from SPy).

I also asked claude what it thinks is missing:

memcpy / memmove
The buffered layer relies heavily on bulk memory copies: filling buffers, compacting with memmove, copying chunks to the accumulation buffer. SPy has no bulk memory operation; you'd have to loop byte-by-byte with ptr[i] indexing, which kills the performance goal.

str ↔ raw bytes interop
The whole point of the text layer is to build a str from raw bytes read from a buffer, and to extract raw UTF-8 bytes from a str for writing. SPy's str is opaque; there's no way to:

  • Construct a str from a ptr[u8] + length (the equivalent of spy_str_new)
  • Get a ptr[u8] to the internal UTF-8 data of an existing str

Pointer casting / arithmetic
The C code does things like treating a char * buffer at different offsets, passing buf + offset to functions, and memchr scanning. SPy pointers support indexing (ptr[i]) but there's no way to get a sub-pointer (e.g., "pointer starting at element N") or cast between pointer types.
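
The three gaps listed above correspond to operations that CPython already exposes on `bytearray` and `memoryview`, which makes them easy to illustrate even though SPy does not have them yet. The following is a sketch in plain Python, not SPy: slice assignment stands in for memmove, a `memoryview` slice stands in for a sub-pointer such as `buf + offset`, and `str(view, "utf-8")` stands in for building a str from a pointer plus a length. The function names `compact` and `str_from_buffer` are hypothetical.

```python
def compact(buf: bytearray, start: int, end: int) -> int:
    """Move the unread bytes buf[start:end] to the front of buf.

    Similar in spirit to memmove(buf, buf + start, end - start).
    Returns the number of valid bytes now sitting at the front.
    """
    n = end - start
    # The right-hand side is copied into a temporary first, so the
    # overlapping source and destination ranges are handled safely.
    buf[0:n] = buf[start:end]
    return n


def str_from_buffer(buf: bytearray, offset: int, length: int) -> str:
    """Build a str from `length` UTF-8 bytes starting at `offset`."""
    # Slicing a memoryview does not copy: it behaves like a sub-pointer.
    view = memoryview(buf)[offset:offset + length]
    # Decoding reads the bytes through the view and creates the new str.
    return str(view, "utf-8")


if __name__ == "__main__":
    buf = bytearray(b"xxxxhello world")
    valid = compact(buf, 4, len(buf))
    print(valid, str_from_buffer(buf, 0, 5))
```

In SPy the equivalent would presumably need the low-level primitives discussed above rather than Python's built-in types; this sketch only shows what each primitive is used for in a buffered reader.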


I think that this is what we should strive for, but as you see it might require a while.
But I agree that we want a file object soon. I suggest we go the easy way and implement open and file on top of C's fopen() / fread() / fwrite(). This should be much easier because libc takes care of buffering and should give reasonable performance. This is basically how the file object worked in Python 2.
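
To make the fopen() / fread() / fwrite() suggestion concrete, here is a minimal sketch of a file object that delegates everything, including buffering, to libc. It is written in CPython with `ctypes` and is not the implementation that would live in SPy; the class name `LibcFile` is hypothetical, and loading libc with `ctypes.CDLL(None)` is an assumption that holds on POSIX systems only.

```python
import ctypes
import os
import tempfile

# Assumption: POSIX system, where CDLL(None) exposes the process's libc symbols.
libc = ctypes.CDLL(None)

libc.fopen.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.fopen.restype = ctypes.c_void_p  # FILE *
libc.fwrite.argtypes = [ctypes.c_char_p, ctypes.c_size_t,
                        ctypes.c_size_t, ctypes.c_void_p]
libc.fwrite.restype = ctypes.c_size_t
libc.fread.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                       ctypes.c_size_t, ctypes.c_void_p]
libc.fread.restype = ctypes.c_size_t
libc.fclose.argtypes = [ctypes.c_void_p]
libc.fclose.restype = ctypes.c_int


class LibcFile:
    """Hypothetical file object whose buffering is done entirely by libc."""

    def __init__(self, path: str, mode: str) -> None:
        self._fp = libc.fopen(os.fsencode(path), mode.encode("ascii"))
        if not self._fp:  # fopen returned NULL
            raise OSError(f"fopen failed for {path!r}")

    def write(self, data: bytes) -> int:
        # libc accumulates the data in its own buffer and flushes as needed.
        return libc.fwrite(data, 1, len(data), self._fp)

    def read(self, size: int) -> bytes:
        buf = ctypes.create_string_buffer(size)
        n = libc.fread(buf, 1, size, self._fp)
        return buf.raw[:n]

    def close(self) -> None:
        if self._fp:
            libc.fclose(self._fp)  # also flushes pending writes
            self._fp = None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        f = LibcFile(path, "wb")
        f.write(b"buffered by libc\n")
        f.close()
        f = LibcFile(path, "rb")
        print(f.read(100))
        f.close()
```

The file object itself holds only the `FILE *` handle; no buffer management code is needed on the language side.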

Then, when we have more low-level capabilities, we can revisit this plan and implement "proper" IO.

What do you think?

@paugier
Contributor Author

paugier commented Mar 30, 2026

you said that you were helped by claude. That's fine and thank you a lot for declaring it, it makes my life easier as a reviewer. Can you mark clearly which files/part of code were written by claude and which were written by you? I'd also be interested in the prompts if available (not a big deal but it helps, plus I'm curious :)).

This is not exactly simple... One thing is that I "discussed" a lot with Claude about SPy and its implementation (not particularly for this PR). I'm really impressed by how well this model explains quite complex ideas. I don't know if Claude keeps some context from previous conversations.

For this particular PR, I explained my goals, mainly by giving the text of the issues, and asked for the implementation of spy/vm/modules/posix.py and the related tests (nothing special in terms of prompts). I was happy with posix.py (except for a Mypy issue and ValueError), but strangely the generated tests were not perfect from my point of view, so I cleaned them up. I asked why import os was placed inside the functions and Claude gave me a reasonable explanation. I knew that Claude would be much faster and better than me at the C code, so I asked, and Claude wrote all the changes in posix.h. I checked that it works for native compilation and saw that the C tests fail. I asked about these errors and Claude explained them. I asked a few questions about implementing _io.FileIO. Claude first told me that I had to use __ll__ like in stdlib._list.spy, but I thought that was not necessary for a file descriptor, so I didn't follow that advice. The other small changes, tests, docs and examples were written manually.

I'm very impressed by these interactions with an LLM, and this is totally new to me... One thing is that I still do not use Claude Code, which helps me keep control. I copy and paste, format, diff, test, understand and potentially modify until I reach the feeling that "this is similar to what I would have written manually". For the C code, I am not as comfortable as in Python, which is why I didn't feel very confident about it.

@paugier
Contributor Author

paugier commented Mar 31, 2026

What do you think?

I didn't realize that io was so complex 🙂

I also better realize the tension, when building a new language, between implementing low-level features to build the tower on strong underpinnings and quickly providing high-level features so that people can start to experiment with the language and are happy with what they see.

As a potential SPy user, I tend to think more about user-facing syntax and utilities that let people start doing real things. However, I don't want to rush you in this direction.

I see that implementing in SPy something like 2. (a python-like I/O tower, as done in C) would be a very nice demo that SPy can be useful for low-level code. And I see that we need (i) memcpy / memmove, (ii) bytes and str/bytes interop and (iii) more pointer capabilities.

To start to use SPy for things like benchmarks, it seems to me that the IO can be quite limited. Just FileIO might even be sufficient?

@antocuni
Member

antocuni commented Apr 2, 2026

I see that implementing in SPy something like 2. (a python-like I/O tower, as done in C) would be a very nice demo that SPy can be useful for low-level code. And I see that we need (i) memcpy / memmove, (ii) bytes and str/bytes interop and (iii) more pointer capabilities.

To start to use SPy for things like benchmarks, it seems to me that the IO can be quite limited. Just FileIO might even be sufficient?

I think that would be very bad, especially for benchmarks: unbuffered IO is extremely slow and would make all the benchmarks look bad.
I just merged #447 which adds a file object based on libc, which is good enough for now.
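
The cost of unbuffered IO can be made visible by counting how many times the raw layer is hit when a program consumes a file one byte at a time. The sketch below uses CPython's `io` module, not SPy; `CountingRaw` is a hypothetical in-memory stand-in for a raw file, where each `readinto()` call represents what would be one read system call on a real file descriptor.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts readinto() calls.

    On a real file descriptor each of these calls would be a system call.
    """

    def __init__(self, data: bytes) -> None:
        super().__init__()
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        self.calls += 1
        n = min(len(b), len(self._data) - self._pos)
        if n > 0:
            b[:n] = self._data[self._pos:self._pos + n]
            self._pos += n
        return n  # 0 signals end of file


def count_raw_reads(reader, raw: CountingRaw) -> int:
    """Consume `reader` one byte at a time; return how often `raw` was hit."""
    while reader.read(1):
        pass
    return raw.calls


if __name__ == "__main__":
    data = b"x" * 10_000

    raw = CountingRaw(data)
    print("unbuffered:", count_raw_reads(raw, raw))

    raw = CountingRaw(data)
    buffered = io.BufferedReader(raw, buffer_size=4096)
    print("buffered:  ", count_raw_reads(buffered, raw))
```

Without a buffer, the number of raw reads grows with the number of bytes consumed; with a 4096-byte buffer it grows with the number of buffer refills, which is a handful for this input.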

Sorry for having made you do unnecessary work. I even considered merging this PR anyway and keeping it for the future, but I don't think it's a good idea because we need to think much more deeply about it: in particular, on CPython os.read allocates and returns a new bytes object at every call. We surely don't want this in our performance-oriented code: for FileIO, we probably want a different API which lets us read into a preallocated buffer.
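
The difference between the two API styles can be sketched in CPython, which offers both: `os.read` returns a fresh object per call, while `io.FileIO.readinto` fills a buffer the caller owns. This is an illustration of the idea only, not a proposal for the SPy API; the function names `checksum_alloc` and `checksum_readinto` are hypothetical.

```python
import io
import os
import tempfile


def checksum_alloc(path: str, chunk_size: int = 4096) -> int:
    """Sum all bytes of a file; every os.read call allocates a new object."""
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        while True:
            chunk = os.read(fd, chunk_size)  # new bytes object each time
            if not chunk:
                break
            total += sum(chunk)
        return total
    finally:
        os.close(fd)


def checksum_readinto(path: str, chunk_size: int = 4096) -> int:
    """Same result, but all reads go into one preallocated buffer."""
    buf = bytearray(chunk_size)  # allocated once, reused for every read
    view = memoryview(buf)
    total = 0
    with io.FileIO(path, "r") as f:
        while True:
            n = f.readinto(buf)  # number of bytes placed in buf
            if not n:
                break
            total += sum(view[:n])  # slicing the view does not copy
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(bytes(range(256)) * 40)
        print(checksum_alloc(path), checksum_readinto(path))
```

Both functions compute the same value; only the second keeps the number of buffer allocations constant regardless of file size.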

I'm going to leave this PR open. I might decide to "extract" the implementations of os.read and os.write (which seem good), but there is no rush.

Development

Successfully merging this pull request may close these issues.

  • Implement posix.write
  • Implement posix.open and posix.read

2 participants