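Modify run_examples.py to run each benchmark multiple times and save the output log from every run. Use these logs to verify that the simulator is deterministic, i.e., that repeated runs of the same benchmark produce identical output.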
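Below is a minimal sketch of the check. It assumes each benchmark can be launched as a subprocess and that "deterministic" means the logs from every run are byte-identical. The function names `run_benchmark_repeatedly` and `is_deterministic`, the `run_<i>.log` file naming, and the stand-in benchmark command are all hypothetical. They do not come from run_examples.py.

```python
import hashlib
import subprocess
import sys
import tempfile
from pathlib import Path


def run_benchmark_repeatedly(cmd, runs, log_dir):
    """Run `cmd` `runs` times and save each run's combined stdout/stderr
    to its own log file. Return the SHA-256 digest of every log."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    digests = []
    for i in range(runs):
        # Merge stderr into stdout so a single log captures everything the run printed.
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, check=True
        )
        (log_dir / f"run_{i}.log").write_bytes(result.stdout)
        digests.append(hashlib.sha256(result.stdout).hexdigest())
    return digests


def is_deterministic(digests):
    """Return True only if there was at least one run and all logs are identical."""
    return len(set(digests)) == 1


if __name__ == "__main__":
    # Hypothetical stand-in for one simulator benchmark invocation.
    benchmark_cmd = [sys.executable, "-c", "print('cycles: 1234')"]
    with tempfile.TemporaryDirectory() as tmp:
        digests = run_benchmark_repeatedly(benchmark_cmd, runs=3, log_dir=tmp)
        print("deterministic:", is_deterministic(digests))
```

Hashing each log keeps the comparison cheap. Keeping the per-run files means you can diff two runs when their digests disagree. If the simulator prints wall-clock timestamps or host-specific paths, strip those fields before hashing. Otherwise every run would look non-deterministic.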