Skip to content

Rewrite exp list history to avoid using inspect.stack#867

Open
dwpaley wants to merge 3 commits intomainfrom
history_no_stack
Open

Rewrite exp list history to avoid using inspect.stack#867
dwpaley wants to merge 3 commits intomainfrom
history_no_stack

Conversation

@dwpaley
Copy link
Contributor

@dwpaley dwpaley commented Nov 23, 2025

Calling inspect.stack() appears harmful in a medium-sized (512 ranks) MPI job because every rank hits the filesystem to resolve module absolute paths. This was causing hangs of up to 5 minutes. Here we switch to sys._getframe to work around the problem.

@ndevenish ndevenish requested a review from dagewa December 4, 2025 16:11
@dagewa
Copy link
Member

dagewa commented Dec 5, 2025

Thanks @dwpaley. Reading https://docs.python.org/3/library/sys.html#sys._getframe I see that sys._getframe "should be used for internal and specialized purposes only" and that it is not guaranteed to exist. Well, I guess this use does qualify as specialized, but perhaps we should catch if it doesn't exist and fall back to inspect.stack() in such cases.

I want to test this in a few production environments (dev build DIALS, release build DIALS, CCP4 DIALS, conda DIALS...) just to be sure we are getting what we expect. Might take a little while to get on to it, but it's on my list.

Copy link
Member

@dagewa dagewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Success with DIALS (cmake) bootstrap build, release build, and DIALS installed from conda. Can't easily check CCP4 DIALS yet as it's on DIALS 3.21 and doesn't have the required code for the experiment history. But I think if there's any problems there we can make a special case for that later on.

So, I'm happy, but I think a fallback to inspect.stack() is a good idea if someone installs DIALS in some weird environment where sys._getframe does not exist. Plus needs a newsfragment.

@ndevenish
Copy link
Collaborator

CPython implementation detail: This function should be used for internal and specialized purposes only. It is not guaranteed to exist in all implementations of Python.

This looks like we are fine, because it seems to imply it is always present in CPython. However, we should at the very least check _getframe exists and throw our hands up (or just do nothing with history)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants