## Computing Backends
Currently, Executorlib supports four different computing backends, specified by the `backend` constructor argument. The first is a `"local"` backend for rapid prototyping on a local workstation in a way that is functionally similar to the standard `ProcessPoolExecutor`. The second, `"slurm_submission"`, can be used to submit Python functions as individual jobs to a SLURM job scheduler using the `sbatch` command, which is useful for long-running tasks, e.g., those that call a compute-intensive legacy code. This mode also has the advantage that the required hardware resources do not all have to be secured before launching the workflow and can naturally vary over time. The third is a `"slurm_allocation"` backend, which distributes Python functions within an existing queuing-system allocation using the `srun` command. Finally, the fourth is a `"flux_allocation"` backend, which uses Flux as a hierarchical resource manager inside a given SLURM allocation. While the `"slurm_submission"` backend uses file-based communication under the hood (the Python function to execute and its inputs are stored on the file system, the function is executed in a separate Python process, and its output is again stored in a file), the other backends rely on socket-based communication to improve computational efficiency.
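Since the `"local"` backend is described above as functionally similar to the standard `ProcessPoolExecutor`, the basic submission pattern can be sketched with the Python standard library alone; this is a minimal illustration of the shared `Executor` interface, not Executorlib itself:

```python
from concurrent.futures import ProcessPoolExecutor

def add(a, b):
    # A trivial stand-in for a compute-intensive Python function.
    return a + b

if __name__ == "__main__":
    # Executorlib's "local" backend exposes the same submit()/result()
    # pattern as this standard-library executor.
    with ProcessPoolExecutor(max_workers=2) as exe:
        future = exe.submit(add, 1, 2)
        print(future.result())  # prints 3
```

Switching from the standard library to Executorlib then mainly amounts to choosing a backend and, where needed, attaching resource requests to each submission.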
## Resource Assignment
To assign dedicated computing resources to individual Python functions, the Executorlib `Executor` class extends the submission function `submit()` to accept not only the Python function and its inputs but also a Python dictionary of requested computing resources, `resource_dict`. The resource dictionary can define the number of compute cores, the number of threads, and the number of GPUs, as well as job-scheduler-specific parameters such as the working directory, the maximum run time, or the maximum memory. With this hierarchical approach, Executorlib allows the user to finely control the execution of each individual Python function, using parallel communication libraries like the Message Passing Interface (MPI) for Python [@mpi4py] or GPU-optimized libraries to aggressively optimize complex, compute-intensive tasks of heterogeneous HPC workflows that are best solved by tightly coupled parallelization approaches, while offering a simple and easy-to-maintain approach to orchestrating many such weakly coupled tasks. This ability to seamlessly combine different programming models further accelerates the rapid prototyping of heterogeneous HPC workflows without sacrificing the performance of critical code components.
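To make the idea of a resource dictionary concrete, the hypothetical helper below (not part of Executorlib, and with illustrative key names) shows how such a dictionary could be translated into SLURM command-line flags:

```python
def resource_dict_to_sbatch(resource_dict):
    # Hypothetical mapping from illustrative resource-dictionary keys to
    # SLURM sbatch flags; the exact key names accepted by Executorlib's
    # resource_dict may differ.
    mapping = {
        "cores": "--ntasks={}",
        "threads_per_core": "--cpus-per-task={}",
        "gpus_per_core": "--gpus-per-task={}",
        "run_time_max": "--time={}",
        "memory_max": "--mem={}",
    }
    flags = [mapping[k].format(v) for k, v in resource_dict.items() if k in mapping]
    if "working_directory" in resource_dict:
        flags.append("--chdir={}".format(resource_dict["working_directory"]))
    return flags

if __name__ == "__main__":
    print(resource_dict_to_sbatch({"cores": 2, "memory_max": "2G"}))
    # prints ['--ntasks=2', '--mem=2G']
```

Per-submission dictionaries like this are what allow two functions in the same workflow to run with entirely different hardware footprints.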
## Dependencies
While two interdependent Python functions with similar computational resource requirements can always be combined and executed as a single Python function, this is no longer the case for Python functions with dramatically different resource requirements. In this case, Executorlib again extends the submission function `submit()` to accept `concurrent.futures.Future` objects from the Python standard library as inputs. When such an argument is passed as input to a subsequent function, the `Executor` waits until the Python process linked to the `Future` object completes its execution before submitting the dependent Python function for execution. In the case of the `"slurm_submission"` backend, which uses file-based communication, the dependencies of the Python functions are communicated to the SLURM job scheduler, so the communication is decoupled from the initial Python process that submitted the functions, as execution can be delayed until the user receives access to the requested computing resources from the SLURM job scheduler. Finally, by enabling the plotting parameter `plot_dependency_graph=True` during the initialization of the Executorlib `Executor` class, the resulting dependency graph can be visualized to validate the dependency relationships between the different `Future` objects.
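The dependency pattern can be imitated with the standard library alone; the difference is that Executorlib accepts the `Future` object directly as an argument to `submit()`, whereas the standard-library sketch below (function names are illustrative) has to resolve it manually with `result()`:

```python
from concurrent.futures import ProcessPoolExecutor

def generate(n):
    # First stage: produce data.
    return list(range(n))

def total(values):
    # Second stage: depends on the first stage's output.
    return sum(values)

if __name__ == "__main__":
    with ProcessPoolExecutor() as exe:
        data_future = exe.submit(generate, 5)
        # Executorlib would accept data_future itself as an input, deferring
        # execution; here the standard library forces a blocking result() call.
        result_future = exe.submit(total, data_future.result())
        print(result_future.result())  # prints 10
```

Accepting the `Future` directly is what lets Executorlib build and, with `plot_dependency_graph=True`, visualize the full task graph before any resources are consumed.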
## Performance Optimization
While Executorlib is not a priori designed for Python functions with runtimes of less than about a minute, given the overhead of requesting dedicated computing resources and starting a new Python process, the execution of such functions can be significantly accelerated by reusing dedicated computing resources across the submission of multiple Python functions. This is enabled by setting the block allocation parameter `block_allocation` to `True` during the initialization of the `Executor` class. Rather than starting a separate Python process for each submitted Python function, the block allocation mode starts a fixed number of workers (with a fixed resource allocation over their lifetime) and then allows the user to submit Python functions to these pre-defined workers. To further improve computational efficiency when multiple analysis functions are applied to the same large dataset, data can be pre-loaded during worker initialization using the initialization function `init_function`. This initializes a Python dictionary in the worker's Python process that is accessible to all subsequently submitted Python functions.
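The standard library offers a rough analogue of this pre-loading idea in the `initializer` argument of `ProcessPoolExecutor`; the sketch below uses it to mimic the role of `init_function` (the module-level dictionary and function names are illustrative, not Executorlib's API):

```python
from concurrent.futures import ProcessPoolExecutor

# Per-worker storage, filled once at worker start-up.
_shared = {}

def init_worker():
    # Runs once per worker process, mirroring the role of Executorlib's
    # init_function in block-allocation mode: expensive data is loaded
    # once and reused by every subsequently submitted function.
    _shared["dataset"] = list(range(1000))

def mean_of_dataset():
    data = _shared["dataset"]
    return sum(data) / len(data)

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2, initializer=init_worker) as exe:
        print(exe.submit(mean_of_dataset).result())  # prints 499.5
```

The payoff is the same in both cases: the per-task cost drops to little more than the function call itself, because process start-up and data loading are paid once per worker rather than once per submission.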
## Caching
The development of an HPC workflow is commonly an iterative process, in which the initial steps are repeated multiple times until the workflow is fully developed. To accelerate this process, Executorlib provides the option to cache the outputs of previously evaluated Python functions, so these outputs can be reloaded without repeating the evaluation of the same, potentially expensive, Python functions. Caching in Executorlib uses the same file storage interface as the file-based communication of the `"slurm_submission"` backend. Caching is enabled by defining the cache directory parameter `cache_directory` as an additional input during the initialization of the Executorlib `Executor` class. Finally, the cache also stores the execution time as additional information, enabling performance analysis of the workflow during the development cycle.
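A minimal sketch of this kind of file-based output cache, which also records the execution time, might look as follows; this is a hypothetical illustration of the concept, not Executorlib's actual implementation or file format:

```python
import hashlib
import pickle
import tempfile
import time
from pathlib import Path

def cached_call(func, args, cache_directory):
    # Hypothetical cache: the key is a hash of the function name and its
    # inputs; the stored record keeps the execution time alongside the
    # result, enabling later performance analysis.
    cache = Path(cache_directory)
    cache.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(pickle.dumps((func.__name__, args))).hexdigest()
    path = cache / (key + ".pkl")
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)["result"]  # reload instead of re-evaluating
    start = time.time()
    result = func(*args)
    with path.open("wb") as f:
        pickle.dump({"result": result, "runtime": time.time() - start}, f)
    return result

if __name__ == "__main__":
    def square(x):
        return x * x

    with tempfile.TemporaryDirectory() as cache_dir:
        print(cached_call(square, (3,), cache_dir))  # computed: prints 9
        print(cached_call(square, (3,), cache_dir))  # reloaded: prints 9
```

Hashing the function identity together with its inputs is the standard design choice here: it makes cache hits input-sensitive, so changing an argument transparently triggers re-evaluation.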
## Advanced Example
To demonstrate the advanced functionality of Executorlib beyond the scope of the `Executor` interface of the Python standard library, a second, advanced example is provided. This example requires the Flux framework to be installed, at least one compute node in a given queuing-system allocation, and at least one GPU per compute node.