
Added AdamW optimization function to deep_learning/optimizers.py#121

Open
showmyth wants to merge 1 commit into eriklindernoren:master from showmyth:master

Conversation


@showmyth showmyth commented Oct 8, 2025

Description

The AdamW optimizer is a variant of Adam that improves deep learning model generalization by decoupling weight decay from the gradient update. Standard Adam implementations fold L2 regularization into the gradient, which interacts poorly with adaptive per-parameter learning rates; AdamW instead applies weight decay directly to the parameters after the gradient-based update.
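
Concretely, with learning rate $\eta$, weight decay coefficient $\lambda$, and bias-corrected moment estimates $\hat{m}_t$ and $\hat{v}_t$, the decoupled update can be written as:

$$
\theta_{t+1} = \theta_t - \eta \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda\, \theta_t \right)
$$

whereas Adam with L2 regularization adds $\lambda\, \theta_t$ to the gradient *before* the moment estimates, so the decay term gets rescaled by the adaptive denominator.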

Necessity

Decoupled weight decay leads to more stable, better-regularized, and more effective training, especially for large models such as transformers. AdamW is also far more widely used in practice than the original Adam optimizer.

Features

  • AdamW optimizer class with configurable hyperparameters (learning_rate, b1, b2, weight_decay)
  • Decoupled weight decay application
  • Bias correction with time step tracking
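
The features above can be sketched as follows. This is a minimal illustration, not the PR's actual diff; it assumes the `update(w, grad_wrt_w)` convention used by the existing optimizers in ML-From-Scratch, and the default hyperparameter values are assumptions.

```python
import numpy as np

class AdamW:
    """Adam with decoupled weight decay (illustrative sketch)."""

    def __init__(self, learning_rate=0.001, b1=0.9, b2=0.999, weight_decay=0.01):
        self.learning_rate = learning_rate
        self.eps = 1e-8
        self.b1 = b1
        self.b2 = b2
        self.weight_decay = weight_decay
        self.m = None  # first moment estimate
        self.v = None  # second moment estimate
        self.t = 0     # time step, used for bias correction

    def update(self, w, grad_wrt_w):
        # Lazily initialize moment buffers to match the gradient shape
        if self.m is None:
            self.m = np.zeros(np.shape(grad_wrt_w))
            self.v = np.zeros(np.shape(grad_wrt_w))
        self.t += 1
        # Exponential moving averages of the gradient and its square
        self.m = self.b1 * self.m + (1 - self.b1) * grad_wrt_w
        self.v = self.b2 * self.v + (1 - self.b2) * np.power(grad_wrt_w, 2)
        # Bias correction for the zero-initialized moments
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        # Gradient-based step, then weight decay applied directly to the
        # parameters rather than being folded into the gradient
        w_new = w - self.learning_rate * m_hat / (np.sqrt(v_hat) + self.eps)
        w_new -= self.learning_rate * self.weight_decay * w
        return w_new
```

Note the last two lines: the decay term uses the *original* parameters `w` and bypasses the adaptive `sqrt(v_hat)` denominator entirely, which is the defining difference from Adam with L2 regularization.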

References

Designed to be compatible with the ML-From-Scratch optimizer API
