Conversation
|
Hello,
Just to be clear, because people want the feature but don't have time to maintain it, you are suggesting to give me the additional maintenance load? (In addition to SB3, RL Zoo, algorithms already in SB3-Contrib and SBX) |
Hello, Thanks for getting back on this. While it may seem like a maintenance load, it actually isn't, because I didn't write any new lines of code here. I was just a facilitator: I took the common code of both implementations and merged it while keeping code reuse at a maximum. And I personally don't think it would be much of a hassle to update this algorithm if the implementation of RPPO or MPPO gets modified. Regards |
It's not a maintenance load, but somehow no one, not even you, wants to maintain it in the medium/long term?
I agree it's mostly a copy-pasting of the two implementations, but that's also why I would prefer a separate repo (and a link to that repo in our doc), instead of adding 1400 lines of duplicated code. |
Yeah, because it's nearly impossible for me to keep up with the rest of the changes happening in the parent repository. That's why we try to make a PR in the first place.
Then I think the most rational option here would be to have the current RecurrentPPO implementation simply accept a parameter to enable masking. What do you think about this? |
A simple merge of maskable PPO and recurrent PPO for the latest version of sb3-contrib
Description
Closes #101
This may be one of the most controversial PRs here. The reasons for this PR are the following:
Instructions
Import using
from sb3_contrib import MaskableRecurrentPPO
The rest of the usage details are the same, e.g., lstm_states in predict and the action_masks method inside the gym env.
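For illustration, here is a minimal sketch of that usage pattern: the recurrent state is carried between predict calls while the environment supplies a fresh action mask at every step. It uses a stub policy and a toy environment, not the actual sb3-contrib classes. ToyMaskedEnv, StubRecurrentMaskedPolicy, masked_softmax and run_episode are hypothetical names, and the predict keyword arguments (state, episode_start, action_masks) are assumed to mirror those of RecurrentPPO and MaskablePPO; the signature in this PR may differ.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits where actions with mask=False get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions are set to -inf so that exp() maps them to exactly 0.
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    top = max(masked)
    exps = [math.exp(x - top) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMaskedEnv:
    """Hypothetical 3-action environment; action 0 is invalid on even steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def action_masks(self):
        # Same method name that the maskable algorithms look for on the env.
        return [self.t % 2 == 1, True, True]

    def step(self, action):
        self.t += 1
        done = self.t >= 6
        return self.t, 0.0, done


class StubRecurrentMaskedPolicy:
    """Stand-in for a recurrent, maskable policy (assumption, not the PR's class)."""

    def predict(self, obs, state=None, episode_start=False, action_masks=None):
        # A decaying sum stands in for the LSTM state; it resets at episode start.
        hidden = 0.0 if state is None or episode_start else state
        hidden = 0.5 * hidden + obs
        logits = [hidden, 1.0, -hidden]
        mask = action_masks if action_masks is not None else [True] * 3
        probs = masked_softmax(logits, mask)
        action = max(range(3), key=lambda a: probs[a])
        return action, hidden


def run_episode(env, policy):
    """Roll out one episode, threading the recurrent state and the mask through predict."""
    obs = env.reset()
    state, episode_start, done = None, True, False
    history = []
    while not done:
        mask = env.action_masks()
        action, state = policy.predict(obs, state=state, episode_start=episode_start, action_masks=mask)
        history.append((action, mask))
        obs, _, done = env.step(action)
        episode_start = done
    return history
```

The point of the sketch is only the calling convention: the state returned by one predict call is fed into the next, and the mask is re-queried from the env before every call.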
Notes
P.S: I wrote this for v1.7.0 last year. However, someone recently tagged me on the issue again, so it felt like the right time to invest some time in redoing it.
Context
Types of changes
Checklist:
make format (required)
make check-codestyle and make lint (required)
make pytest and make type both pass. (required)
Note: we are using a maximum length of 127 characters per line