fix consolidate training run outputs into a single runs/ directory by sudhansu-24 · Pull Request #580 · mllam/neural-lam

sudhansu-24 · 2026-04-04T14:34:13Z

Describe your changes

Training and evaluation artifacts are now written under a single directory, runs/<run-name>/:
- ModelCheckpoint writes to runs/<run-name>/checkpoints/.
- Trainer(default_root_dir=...) keeps Lightning CSV logs under that run instead of a top-level lightning_logs/.
- WandbLogger / CustomMLFlowLogger use save_dir=run_dir, so internal logger paths and code using self.logger.save_dir (e.g. plots) stay under the run root.

Checkpoints remain outside W&B’s wandb/ subtree so large files are not synced by default.
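As an illustration of the layout described above, here is a minimal sketch of how the per-run paths could be derived. The helper name `build_run_paths` and the `root` parameter are hypothetical and are not taken from the PR diff; only the `runs/<run-name>/checkpoints/` structure comes from the description.

```python
from pathlib import Path


def build_run_paths(run_name: str, root: str = "runs") -> dict[str, Path]:
    """Return the directories for one run (hypothetical helper, not from the PR)."""
    run_dir = Path(root) / run_name
    return {
        # Common root shared by loggers and the trainer.
        "run_dir": run_dir,
        # Checkpoints sit next to, not inside, any wandb/ subtree.
        "checkpoint_dir": run_dir / "checkpoints",
    }


paths = build_run_paths("example-run")

# Assumed wiring, shown as comments so the sketch runs without Lightning:
#   ModelCheckpoint(dirpath=paths["checkpoint_dir"])
#   Trainer(default_root_dir=paths["run_dir"])
#   WandbLogger(save_dir=paths["run_dir"])
print(paths["checkpoint_dir"].as_posix())
```

Deriving every path from one `run_dir` value is what keeps the logger, trainer, and checkpoint locations from drifting apart.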

Motivation: Issue #293 and maintainer feedback (W&B selective sync, common run root, MLflow temp images not in CWD).
Dependencies: None

Issue Link

closes #293

Type of change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
I have performed a self-review of my code
For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
I have updated the README to cover introduced code changes
I have added tests that prove my fix is effective or that my feature works
I have given the PR a name that clearly describes the change, written in imperative form (context).
[] I have requested a reviewer and an assignee (assignee is responsible for merging). This applies only if you have write access to the repo, otherwise feel free to tag a maintainer to add a reviewer and assignee.

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

the code is readable
the code is well tested
the code is documented (including return types and parameters)
the code is easy to maintain

Author checklist after completed review

I have added a line to the CHANGELOG describing this change, in a section
reflecting type of change (add section where missing):
- added: when you have added new functionality
- changed: when default behaviour of the code has been changed
- fixes: when your contribution fixes a bug
- maintenance: when your contribution relates to repo maintenance, e.g. CI/CD or documentation

Checklist for assignee

PR is up to date with the base branch
the tests pass
(if the PR is not just maintenance/bugfix) the PR is assigned to the next milestone. If it is not, propose it for a future milestone.
author has added an entry to the changelog (and designated the change as added, changed, fixed or maintenance)
Once the PR is ready to be merged, squash commits and merge the PR.

joeloskarsson · 2026-04-05T09:54:54Z

@sadamov assigning you here to decide later if this should be closed in favor of #297 or what is the best path forward with this.

sudhansu-24 · 2026-04-08T08:23:19Z

thanks @sadamov for clarifying.

I’ll continue implementation/revisions on #580 and coordinate here.
@Shyam-Sunder-saini @techaadii, if you have any pending changes or preferences from your earlier work that should be included, please share them and I will incorporate them into this PR so we can converge quickly.

Shyam-Sunder-saini · 2026-04-08T16:12:38Z

Thanks @sudhansu-24 for taking this forward!

From my side, I’ve aligned all training artifacts so they are now scoped under runs/<run-name>/, including checkpoints, Lightning logs, W&B, and MLflow outputs.

I also updated the logger setup so that both WandbLogger and CustomMLFlowLogger use the same save_dir (run directory). Additionally, MLflow now falls back to the run directory if MLFLOW_TRACKING_URI is not set.
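The fallback mentioned here could look roughly like the sketch below. The function name `resolve_tracking_uri` and the choice to express the run directory as a `file://` URI are assumptions for illustration; the actual change in the branch may differ.

```python
import os
from pathlib import Path


def resolve_tracking_uri(run_dir, env=None):
    """Pick an MLflow tracking URI (hypothetical helper, not from the PR).

    Uses MLFLOW_TRACKING_URI when it is set and non-empty; otherwise falls
    back to the run directory, expressed as a local file URI.
    """
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI")
    if uri:
        return uri
    # Assumption: a local file store rooted at the run directory.
    return Path(run_dir).resolve().as_uri()


# An explicit setting wins; without one, the run directory is used.
print(resolve_tracking_uri("runs/example-run", env={"MLFLOW_TRACKING_URI": "http://localhost:5000"}))
print(resolve_tracking_uri("runs/example-run", env={}))
```

Passing `env` explicitly is only a convenience for testing the two branches without touching the real environment.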

Currently, only one logger is active at a time (default is W&B), and MLflow artifacts are generated when explicitly running with --logger mlflow.
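A minimal sketch of the single-active-logger selection described here, assuming an argparse-style `--logger` flag. Only `--logger mlflow` and the W&B default are stated in the comment above; the `choices` list and the function name `parse_logger_choice` are hypothetical.

```python
import argparse


def parse_logger_choice(argv):
    """Return the selected logger name (hypothetical helper, not from the PR)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--logger",
        choices=["wandb", "mlflow"],  # assumed set of options
        default="wandb",  # W&B is the default per the comment above
    )
    return parser.parse_args(argv).logger


print(parse_logger_choice([]))
print(parse_logger_choice(["--logger", "mlflow"]))
```

Because the flag takes a single value, exactly one logger is constructed per run, and both would receive the same `save_dir`.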

If there are any preferences around structure or logging behavior from earlier work, I’m happy to incorporate them.

Let me know if you’d like me to push any additional changes to the PR.

sudhansu-24 · 2026-04-08T18:15:25Z

Thanks @Shyam-Sunder-saini

Could you share the exact changes you want added beyond the current #580 state (especially around the MLFLOW_TRACKING_URI fallback), either as:

- a short checklist by file, or
- a commit/PR branch we can cherry-pick from?

If you post that, I will incorporate it quickly so we can finalize review.

sudhansu-24 mentioned this pull request Apr 4, 2026

Consolidate training run outputs into a single runs/<run-name>/ directory #297

Closed

21 tasks

joeloskarsson assigned sadamov Apr 5, 2026

sadamov mentioned this pull request Apr 8, 2026

Consolidate Training/Evaluation Run Outputs into a Single runs/ Directory #293

Open

fix consolidate training run outputs into a single runs/ directory

dd2a1a2

sudhansu-24 force-pushed the fix-run-outputs branch from cd1a3ef to dd2a1a2 on April 12, 2026 15:57

sudhansu-24 wants to merge 1 commit into mllam:main from sudhansu-24:fix-run-outputs

4 participants
