SACD Discrete Soft Actor Critic #203
splatter96 wants to merge 9 commits into Stable-Baselines-Team:master from
Conversation
Hello,
please don't forget that part (see the contributing guide).
Hello,
yes please =)
Do you have the performance results for this? I came across this PR looking for implementations of SACD. Thank you.
Unfortunately, I never found the time to run the performance benchmark. However, I use this implementation in several of my projects with good results, so the implementation seems to be correct.
    def get_crit_params(self, n):
        return self.q_networks[n].parameters()

    def forward(self, obs: th.Tensor) -> Tuple[th.Tensor, ...]:
It seems that the case where self.features_extractor is None is not handled in the forward method.
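One possible guard is sketched below. Falling back to the raw observation when `self.features_extractor` is `None` is only an assumed behaviour (not a confirmed design decision), and `extract_features` is used here in its single-argument SB3 form:

```python
from typing import Tuple

import torch as th

# Fragment-level sketch (a method of the discrete critic class), assuming the
# SB3-style helpers used elsewhere in this PR. The fallback to the raw
# observation when no features extractor is configured is an assumption.
def forward(self, obs: th.Tensor) -> Tuple[th.Tensor, ...]:
    if self.features_extractor is not None:
        features = self.extract_features(obs)
    else:
        features = obs
    # Return one Q-value tensor per critic network.
    return tuple(q_net(features) for q_net in self.q_networks)
```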
Hi, I’d like to contribute to this PR by adding the rl-baselines3-zoo benchmarks. Plan:
I already tested the PR locally on CartPole and it trains perfectly (reaches ~250 reward very fast). Before I run the full 5-seed Atari experiments, could you confirm that the environments and timestep budget look good, or would you prefer anything different? I can have the results ready within the next few days and either post the plots here or open a small follow-up PR in rl-baselines3-zoo if that's cleaner. Thanks!
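For reference, a minimal smoke test along the lines of the CartPole run mentioned above might look like the sketch below. The `SACD` class name and the `sb3_contrib` import path are assumptions based on this PR's goal, not a released API:

```python
# Hypothetical usage sketch; assumes this PR exposes the algorithm as
# `SACD` in sb3_contrib with the usual SB3 constructor signature.
from sb3_contrib import SACD

model = SACD("MlpPolicy", "CartPole-v1", verbose=1, seed=1)
model.learn(total_timesteps=100_000)
```

The rl-baselines3-zoo benchmark runs would presumably go through its `train.py` entry point once the algorithm and its hyperparameters are registered there.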
This PR introduces the Soft Actor Critic for discrete actions (SACD) algorithm.
Description
This PR implements the SAC-Discrete algorithm as described in the paper https://arxiv.org/abs/1910.07207. The implementation borrows code from the paper's original implementation (https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch) as well as from the implementation provided by the author of the issue that requested this feature in Stable Baselines (https://github.com/toshikwa/sac-discrete.pytorch).
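For readers skimming the diff, the key difference from continuous SAC is that, with a finite action set, the critic targets and the actor/entropy terms can be computed as exact expectations over the action probabilities instead of via reparameterized sampling. The sketch below illustrates this in plain PyTorch; all names (`actor`, `q_net`, `q_net_target`, `ent_coef`) are illustrative and do not necessarily match the classes or attributes introduced by this PR:

```python
import torch as th

def sacd_losses(obs, actions, rewards, next_obs, dones,
                actor, q_net, q_net_target, ent_coef, gamma=0.99):
    # Assumptions: actor(obs) returns a (batch, n_actions) tensor of action
    # probabilities; q_net(obs) / q_net_target(obs) return a tuple of two
    # (batch, n_actions) Q-value tensors (twin critics); actions, rewards and
    # dones have shape (batch, 1).
    with th.no_grad():
        next_probs = actor(next_obs)
        next_log_probs = th.log(next_probs + 1e-8)
        # Exact expectation over next actions instead of sampling.
        next_q = th.min(*q_net_target(next_obs))
        next_v = (next_probs * (next_q - ent_coef * next_log_probs)).sum(dim=1, keepdim=True)
        target_q = rewards + (1.0 - dones) * gamma * next_v

    current_q1, current_q2 = q_net(obs)
    current_q1 = current_q1.gather(1, actions.long())
    current_q2 = current_q2.gather(1, actions.long())
    critic_loss = 0.5 * (
        (current_q1 - target_q) ** 2 + (current_q2 - target_q) ** 2
    ).mean()

    # Actor loss: again an exact expectation over the action probabilities;
    # gradients only need to flow through the actor, so the Q-values are detached.
    probs = actor(obs)
    log_probs = th.log(probs + 1e-8)
    with th.no_grad():
        min_q = th.min(*q_net(obs))
    actor_loss = (probs * (ent_coef * log_probs - min_q)).sum(dim=1).mean()
    return critic_loss, actor_loss
```

Since actions are enumerable, no reparameterization trick or squashing is needed, and the entropy term is computed exactly from the categorical distribution.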
Context
Types of changes
Checklist:

- `make format` (required)
- `make check-codestyle` and `make lint` (required)
- `make pytest` and `make type` both pass. (required)

Note: we are using a maximum length of 127 characters per line