Aurashk/improve process manager terminate behaviour#733
Aurashk/improve process manager terminate behaviour#733
Conversation
improve kill handling to kill remote processes directly via ssh via sigterm
jamesturner246
left a comment
There was a problem hiding this comment.
Not sure if you can still see our meeting chat any more, but I think I have signal forwarding working without any monitor threads or other hacks.
The secret is to run the ssh client with the -t option, which forces remote to create a PTY, and then run the command inside of that, instead of without, where it runs the command directly. Advantage of this way is signals like SIGTERM et al are actually forwarded to remote command properly, thanks to the PTY's built-in signal handling.
So all it means in practice is using ssh -t ... instead of ssh ..., and it should work without the hacks.
|
One caveat though, SIGTERMing the local ssh means that in fact a SIGHUP is actually what appears on the remote command side. Butt from this use case I don't think it matters which of SIGTERM or SIGHUP reaches remote command, just the fact that a TERM-ish signal is reaching remote reliably. |
|
One can in fact be EXTRA safe (probably would recommend) by sending the |
From my understanding of #658 and #649 this is exactly the problem we want to solve. We want to explicitly send signals to the remote process, but the current implementation send signals to the local process. If a remote process gets a SIGHUP it leads to ambiguity, as it just means the connection/terminal was closed from the client side. The current intended behaviour attempts to shut everything down cleanly with a This is all my interpretation of what the code should be doing from issues and looking at the current implementation though, I'd say ideally we should have these nuances documented somewhere it's clear what |
Monitor threads are used for a few things in the shell ssh process manager which we should probably document better (reading logs from remote processes asynchronously, checking the remote process is alive, running a callback function when the remote process exits), I don't think it's that straightforward to remove them. We are already using |
|
Just want to flag something about the If you try, for example You will get most of your processes forcibly closed with |
|
As discussed with @jamesturner246 we can use the existing monitor threads to have one persistent ssh connection per process which will block until the remote process dies. This means we don't need to reconnect to every remote process for each The fundamental limitation I'm seeing is:
I think this means we need to do one of the following but open to other solutions:
|
|
Thank you both for the discussion on this PR thread. @Aurashk you have made the correct assumptions about the code. When we The order for process shutdown does matter, in the sense that we want to kill Your suggestions on process ordering are good, but it would be better to contain the process termination within the reporting period - leaving the processes as cleaning up after the result of Doing TLDR
|
…ttps://github.com/DUNE-DAQ/drunc into aurashk/improve-process-manager-terminate-behaviour
Thanks for the helpful comments @PawelPlesniak, it wasn't too complicated to do the terminate in asynchronous batches (in the order specified, same as k8s) so I went with that. It seems to work nicely and much more performant than killing the processes one-after-the-other. I turned up the timeout in |
|
@MRiganSUSX I made a small edit to the k8s process manager so that it shares the same shutdown ordering list as the ssh process manager stored as a config variable. It's a bit nicer but not strictly necessary for this PR so I can revert it if it's going to mess with anything you are doing. |
|
Hi Aurash, thank you for this cleanup, the code is much easier to read now. I a request - could we suppress the output from the Note - this was tested with increasing |
|
@Aurashk I have not tested anything and only looked at the k8s PM related bits, but seems good to me. Thanks for checking. |
Description
Fixes #658
Modifies both flavours of SSH process manager (shell and paramiko) to query/kill remote processes directly rather than through the local ssh client. This is achieved by running headless remote processes and storing the pid of the remote process in metadata. Then the pid can be used to send signals through ssh directly to the remote process.
This has some desired effects:
SIGHUPto the remote processNote: There is an adjacent comment in the issue about fixing the terminate order to match K8s. I think that would be straightforward to add to this PR but I will wait for feedback on the approach first
Type of change
Key checklist
python -m pytest)pre-commit run --all-files)Further checks
(Indicate issue here: # (issue))