I'm running the code verbatim but not getting the results I expected. For example, ping_pong_a2c shows barely any improvement after more than 8,000 runs, whereas I would expect reasonable performance (at least a score above 0) by about 5,000 iterations, based on results other people have reported for RL on Atari Pong.
Is there something I'm missing? Do the hyperparameters need to be tuned, or is the code meant to work as is?
Thank you for creating the code base.
