SACD Discrete Soft Actor Critic #203
splatter96 wants to merge 9 commits into Stable-Baselines-Team:master from
Conversation
Hello,
please don't forget that part (see the contributing guide).
Hello,
yes please =)
Do you have the performance results for this? I came across this PR looking for implementations of SACD. Thank you.
Unfortunately, I never found the time to run the performance benchmark. However, I use this implementation in several of my projects with good results, so the implementation seems to be correct.
    def get_crit_params(self, n):
        return self.q_networks[n].parameters()

    def forward(self, obs: th.Tensor) -> Tuple[th.Tensor, ...]:
It seems that the case where self.features_extractor is None is not handled in the forward method.
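One possible guard is sketched below. Falling back to the raw observation when `self.features_extractor` is `None` is only an assumed behaviour (not a confirmed design decision), and `extract_features` is used here in its single-argument SB3 form:

```python
from typing import Tuple

import torch as th

# Fragment-level sketch (a method of the discrete critic class), assuming the
# SB3-style helpers used elsewhere in this PR. The fallback to the raw
# observation when no features extractor is configured is an assumption.
def forward(self, obs: th.Tensor) -> Tuple[th.Tensor, ...]:
    if self.features_extractor is not None:
        features = self.extract_features(obs)
    else:
        features = obs
    # Return one Q-value tensor per critic network.
    return tuple(q_net(features) for q_net in self.q_networks)
```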
Hi, I’d like to contribute to this PR by adding the rl-baselines3-zoo benchmarks. Plan:
I already tested the PR locally on CartPole and it trains perfectly (reaches ~250 reward very fast). Before I run the full 5-seed Atari experiments, could you confirm that the environments and timestep budget look good, or would you prefer anything different? I can have the results ready within the next few days and either post the plots here or open a small follow-up PR in rl-baselines3-zoo if that's cleaner. Thanks!
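For reference, a minimal smoke test along the lines of the CartPole run mentioned above might look like the sketch below. The `SACD` class name and the `sb3_contrib` import path are assumptions based on this PR's goal, not a released API:

```python
# Hypothetical usage sketch; assumes this PR exposes the algorithm as
# `SACD` in sb3_contrib with the usual SB3 constructor signature.
from sb3_contrib import SACD

model = SACD("MlpPolicy", "CartPole-v1", verbose=1, seed=1)
model.learn(total_timesteps=100_000)
```

The rl-baselines3-zoo benchmark runs would presumably go through its `train.py` entry point once the algorithm and its hyperparameters are registered there.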
This PR introduces the Soft Actor Critic for discrete actions (SACD) algorithm.
Description
This PR implements the SAC-Discrete algorithm as described in the paper https://arxiv.org/abs/1910.07207. The implementation borrows code from the paper's original implementation (https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch) as well as from the implementation provided by the author of the issue that requested this feature in Stable Baselines (https://github.com/toshikwa/sac-discrete.pytorch).
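For readers skimming the diff, the key difference from continuous SAC is that, with a finite action set, the critic targets and the actor/entropy terms can be computed as exact expectations over the action probabilities instead of via reparameterized sampling. The sketch below illustrates this in plain PyTorch; all names (`actor`, `q_net`, `q_net_target`, `ent_coef`) are illustrative and do not necessarily match the classes or attributes introduced by this PR:

```python
import torch as th

def sacd_losses(obs, actions, rewards, next_obs, dones,
                actor, q_net, q_net_target, ent_coef, gamma=0.99):
    # Assumptions: actor(obs) returns a (batch, n_actions) tensor of action
    # probabilities; q_net(obs) / q_net_target(obs) return a tuple of two
    # (batch, n_actions) Q-value tensors (twin critics); actions, rewards and
    # dones have shape (batch, 1).
    with th.no_grad():
        next_probs = actor(next_obs)
        next_log_probs = th.log(next_probs + 1e-8)
        # Exact expectation over next actions instead of sampling.
        next_q = th.min(*q_net_target(next_obs))
        next_v = (next_probs * (next_q - ent_coef * next_log_probs)).sum(dim=1, keepdim=True)
        target_q = rewards + (1.0 - dones) * gamma * next_v

    current_q1, current_q2 = q_net(obs)
    current_q1 = current_q1.gather(1, actions.long())
    current_q2 = current_q2.gather(1, actions.long())
    critic_loss = 0.5 * (
        (current_q1 - target_q) ** 2 + (current_q2 - target_q) ** 2
    ).mean()

    # Actor loss: again an exact expectation over the action probabilities;
    # gradients only need to flow through the actor, so the Q-values are detached.
    probs = actor(obs)
    log_probs = th.log(probs + 1e-8)
    with th.no_grad():
        min_q = th.min(*q_net(obs))
    actor_loss = (probs * (ent_coef * log_probs - min_q)).sum(dim=1).mean()
    return critic_loss, actor_loss
```

Since actions are enumerable, no reparameterization trick or squashing is needed, and the entropy term is computed exactly from the categorical distribution.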
Context
Types of changes
Checklist:

- `make format` (required)
- `make check-codestyle` and `make lint` (required)
- `make pytest` and `make type` both pass. (required)

Note: we are using a maximum length of 127 characters per line