BUG: parallel-safe pykokkos #61

Open
tylerjereddy wants to merge 1 commit into kokkos:main from tylerjereddy:treddy_parallel_safety

@tylerjereddy
Contributor

  • the current develop branch appears to not be parallel
    safe as described at WIP, ENH: support parallel runtests #60 (comment)

  • this branch allows pykokkos to compile/run code both
    in serial and in parallel by providing genuinely unique
    identifiers (file paths) to each "compilation unit"; careful though,
    this will slow down the serial execution time for the testsuite
    substantially, probably because it removes reuse in favor
    of safety from a compilation standpoint--there's probably an approach
    that is both fast and safe, and I'm certainly open to that, but
    I'd also argue that safe and slow > (parallel) unsafe and fast

  • combined with WIP, ENH: support parallel runtests #60, this allows:

    • OMP_NUM_THREADS=1 python runtests.py -n 10
    • 123 passed, 9 skipped, 9 xfailed, 16 warnings in 92.10s (0:01:32)
    • that's more than twice as fast as the serial test run on develop on the same machine
    • python runtests.py
    • 123 passed, 9 skipped, 9 xfailed, 16 warnings in 212.40s (0:03:32) (3:18 with OMP_NUM_THREADS=1)
    • however, this branch slows down the serial test run a lot, to 11.5
      minutes!
  • of course, you'd probably have to thoroughly test OMP_NUM_THREADS
    values and so on to benchmark the hierarchical parallelism situation
    to determine scenarios where you'd even want to use "parallel pykokkos,"
    but I certainly think we should try to be "safe" for concurrent usage
    so that we don't impose a certain model of concurrency on our
    consumers
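
The `OMP_NUM_THREADS` versus `-n` trade-off mentioned in the last bullet could be explored with a small sweep. This is only a minimal sketch, assuming `runtests.py` accepts `-n` as in the commands above; the helper names `sweep_configs` and `time_config` are hypothetical and not part of pykokkos.

```python
import itertools
import os
import subprocess
import sys
import time


def sweep_configs(omp_threads, workers):
    """Yield (env_overrides, command) for every thread/worker combination."""
    for n_omp, n_workers in itertools.product(omp_threads, workers):
        cmd = [sys.executable, "runtests.py"]
        if n_workers > 1:
            # -n is the flag used for the parallel run quoted above
            cmd += ["-n", str(n_workers)]
        yield {"OMP_NUM_THREADS": str(n_omp)}, cmd


def time_config(env_overrides, cmd):
    """Run one configuration; return (wall-clock seconds, return code)."""
    env = {**os.environ, **env_overrides}
    start = time.perf_counter()
    proc = subprocess.run(cmd, env=env)
    return time.perf_counter() - start, proc.returncode
```

Looping over `sweep_configs([1, 2, 4], [1, 2, 10])` and passing each pair to `time_config` would give one timing per combination; the grid values here are arbitrary.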

@tylerjereddy added the bug label (Something isn't working) on Aug 22, 2022
# the same module/class; try using the memory loc of the Python
# metadata object
mem_id = id(metadata)
dirname: str = f"{filename}_{metadata.name}_{mem_id}"
Contributor Author

There must be a scheme that's both fast and safe? Giving each "compilation unit" its own specific directory was the easiest approach I could think of, but presumably the serial test run is so much faster on develop because sharing a directory either allows reuse of some previously-compiled code and/or of some shared common compiled bits between different workunits/kernels?
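
One possible direction for "fast and safe", sketched here purely as an illustration and not as what this branch does: key the compilation directory on a hash of the source (so identical units can be reused), build into a private temporary directory, and publish it with a rename so concurrent workers never see a half-written directory. The names `cache_key` and `get_or_build`, and the `build(source, out_dir)` callback, are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def cache_key(source: str, compiler_flags: str = "") -> str:
    """Derive a directory name from the content that determines the build."""
    payload = (compiler_flags + "\0" + source).encode()
    return hashlib.sha256(payload).hexdigest()[:16]


def get_or_build(cache_root: Path, source: str, build) -> Path:
    """Return the directory holding the compiled output for `source`."""
    final_dir = cache_root / cache_key(source)
    if final_dir.is_dir():
        return final_dir  # reuse a previously published build

    cache_root.mkdir(parents=True, exist_ok=True)
    # Each worker builds in its own scratch directory, so writes never collide.
    tmp_dir = Path(tempfile.mkdtemp(dir=cache_root, prefix="build_"))
    try:
        build(source, tmp_dir)
    except BaseException:
        shutil.rmtree(tmp_dir, ignore_errors=True)
        raise
    try:
        os.rename(tmp_dir, final_dir)  # publish the finished directory
    except OSError:
        # Most likely another worker published the same key first.
        shutil.rmtree(tmp_dir, ignore_errors=True)
        if not final_dir.is_dir():
            raise
    return final_dir
```

Two workers that race on the same key may both compile, but only one result is kept and later callers reuse it. Whether a content hash is a sufficient key for pykokkos compilation units is an assumption that would need checking.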

@tylerjereddy
Contributor Author

With whatever final solution gets adopted here, a regression test on, e.g., the directory structure that is decided upon for compilation would probably be a good idea as well.
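
A rough idea of what such a regression test might look like, assuming the naming scheme from the diff above stays as is. `compilation_dirname` is a hypothetical stand-in that mirrors the f-string in the diff; the real test would call whatever pykokkos function ends up producing the directory name.

```python
from types import SimpleNamespace


def compilation_dirname(filename: str, metadata) -> str:
    # Hypothetical stand-in mirroring the f-string in the diff above.
    return f"{filename}_{metadata.name}_{id(metadata)}"


def test_distinct_units_get_distinct_dirs():
    # Two live metadata objects with the same name must not share a directory.
    a = SimpleNamespace(name="workunit")
    b = SimpleNamespace(name="workunit")
    assert compilation_dirname("kernels", a) != compilation_dirname("kernels", b)


def test_same_unit_is_stable():
    # Asking twice for the same object must give the same directory.
    a = SimpleNamespace(name="workunit")
    assert compilation_dirname("kernels", a) == compilation_dirname("kernels", a)
```

Note that `id()` is only guaranteed to be unique among objects that are alive at the same time within one process, so a test like this checks in-process uniqueness only, not uniqueness across separate worker processes.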

@NaderAlAwar changed the base branch from develop to main on May 24, 2023 20:31