
Optimize memory by reducing MAX_SEQ_LENGTH and improve error handling #37

Merged

aravind-3105 merged 2 commits into main from fix/preference-alignment-coder on Mar 24, 2026

Conversation


aravind-3105 (Member) commented Mar 24, 2026

Summary

This pull request updates the preference alignment pipeline to improve compatibility, reproducibility, and ease of use, especially for users running on different hardware or cloud environments. The changes include improvements to dataset setup instructions, environment configuration, model loading, evaluation flexibility, and training defaults.

Clickup Ticket(s): Link(s) if applicable.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🔧 Refactoring (no functional changes)
  • ⚡ Performance improvement
  • 🧪 Test improvements
  • 🔒 Security fix

Changes Made

Key changes include:

Dataset setup and documentation improvements

  • Updated the README to clarify that filtered .parquet files are hosted in a GCP bucket and provided clearer, step-by-step extraction and cleanup instructions, including directory structure expectations after setup. Added guidance on using the --active flag with uv sync and details for installing flash-attn via pre-built wheels, with platform-specific notes. [1] [2] [3]
  • Adjusted notebook error handling to suppress startup errors when data files are missing, making it easier to get started after following setup instructions. [1] [2]
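The "suppress startup errors" change above can be sketched as a small helper. This is a hypothetical reconstruction of the notebook logic (the function name `load_dataset_if_present` and the warning text are assumptions, not code from the PR): instead of raising when the filtered `.parquet` files are absent, it warns and returns `None` so the notebook still opens cleanly.

```python
from pathlib import Path
import warnings

def load_dataset_if_present(data_dir: str, pattern: str = "*.parquet"):
    """Return matching data files, or None (with a warning) if none exist yet."""
    files = sorted(Path(data_dir).glob(pattern))
    if not files:
        # Warn instead of raising, so a fresh clone can run the notebook
        # top-to-bottom before the data download step is done.
        warnings.warn(
            f"No {pattern} files found under {data_dir!r}; "
            "follow the README setup steps to fetch them from the GCP bucket."
        )
        return None
    return files
```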

Environment and configuration handling

  • Improved root directory detection and .env file handling in 05_evaluation.ipynb, ensuring consistent configuration loading regardless of execution context. Now supports both Gemini and OpenAI judge providers, with automatic client setup and clearer environment variable usage.
  • Added explicit setting of PYTORCH_ALLOC_CONF and clarified model/data path handling to improve reproducibility across machines.
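A minimal sketch of the root-detection and `.env` handling described above, using only the standard library. The marker file (`pyproject.toml`), the helper names, and the `PYTORCH_ALLOC_CONF` value are illustrative assumptions; the PR itself does not specify them.

```python
import os
from pathlib import Path

def find_project_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    for candidate in [start, *start.parents]:
        if (candidate / marker).exists():
            return candidate
    return start  # fall back to the starting directory

def load_dotenv_file(path: Path) -> None:
    """Load KEY=VALUE lines from a .env file without overriding existing vars."""
    if not path.exists():
        return
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Pin the allocator config up front for reproducible memory behavior
# (the specific value here is an assumption for illustration):
os.environ.setdefault("PYTORCH_ALLOC_CONF", "expandable_segments:True")
```

Resolving paths relative to the detected root (rather than the notebook's working directory) is what makes the configuration load the same way whether the notebook runs from the repo root, a subdirectory, or a cloud runtime.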

Model loading and hardware compatibility

  • Changed default attention implementation from "flash_attention_2" to "sdpa" in model loading code for both inference and evaluation, with clear comments on how to switch based on hardware capabilities.
  • In training helpers, added detection for flash_attn and enabled it if available; otherwise, defaults to standard attention mechanisms. Device map is now set to "auto" for multi-GPU support.
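The flash_attn detection in the training helpers might look like the sketch below. The helper name is an assumption; the commented `from_pretrained` call is illustrative transformers-style usage, not the PR's actual code.

```python
import importlib.util

def pick_attn_implementation() -> str:
    """Prefer FlashAttention 2 only when the optional package is importable."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    # PyTorch's built-in scaled-dot-product attention works on any GPU.
    return "sdpa"

# Illustrative usage (names assumed):
# model = AutoModelForCausalLM.from_pretrained(
#     MODEL_PATH,
#     attn_implementation=pick_attn_implementation(),
#     device_map="auto",  # spread layers across available GPUs
# )
```

Defaulting to `"sdpa"` in the inference/evaluation paths avoids a hard crash on hardware or environments where `flash_attn` wheels are unavailable, while the training helper opportunistically upgrades when the package is present.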

Evaluation and judge flexibility

  • Refactored evaluation helpers and notebooks to use a generic judge_with_llm function instead of the hardcoded OpenAI judge, supporting any OpenAI-compatible API (including Gemini). Judge provider and model can be selected via configuration.
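One way the provider selection behind `judge_with_llm` could resolve its endpoint is sketched below. The endpoint mapping and function name `resolve_judge_config` are assumptions for illustration; any OpenAI-compatible client can then be pointed at the returned `base_url`.

```python
# Assumed provider-to-endpoint mapping; Gemini exposes an
# OpenAI-compatible surface at the URL below.
JUDGE_ENDPOINTS = {
    "openai": "https://api.openai.com/v1",
    "gemini": "https://generativelanguage.googleapis.com/v1beta/openai/",
}

def resolve_judge_config(provider: str, model: str) -> dict:
    """Map a configured judge provider/model to client settings."""
    if provider not in JUDGE_ENDPOINTS:
        raise ValueError(f"Unsupported judge provider: {provider}")
    return {"base_url": JUDGE_ENDPOINTS[provider], "model": model}
```

Keeping the judge behind a single function means swapping providers is a configuration change rather than a code change in every evaluation notebook.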

Training defaults and resource usage

  • Reduced default MAX_SEQ_LENGTH to 1024 for better compatibility with 22GB GPUs, and adjusted DPO trainer batch sizes to maintain effective batch size while reducing per-device memory requirements.
  • Lowered the default record limit in 02_inference_runner.ipynb for faster test runs, with comments on how to increase for larger experiments.
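The batch-size adjustment above relies on the usual trade-off: effective batch size is the product of per-device batch, gradient-accumulation steps, and device count, so halving one factor while doubling another leaves gradients statistically unchanged. The concrete numbers below are illustrative, not the PR's actual values.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Effective batch = per-device batch x accumulation steps x devices."""
    return per_device * grad_accum * num_devices

# Halving the per-device batch while doubling accumulation keeps the
# effective batch size constant, but roughly halves activation memory:
assert effective_batch_size(4, 2) == effective_batch_size(2, 4)
```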

These changes collectively make the pipeline easier to set up, more robust to different environments, and more flexible for evaluation and training.

Testing

  • Tests pass locally (uv run pytest tests/)
  • Type checking passes (uv run mypy <src_dir>)
  • Linting passes (uv run ruff check src_dir/)
  • Manual testing performed (describe below)

Manual testing details:

Screenshots/Recordings

Related Issues

Deployment Notes

Checklist

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Documentation updated (if applicable)
  • No sensitive information (API keys, credentials) exposed

…ion, and improve dataset error handling in notebooks
aravind-3105 self-assigned this on Mar 24, 2026
aravind-3105 added the "bug" label on Mar 24, 2026
aravind-3105 merged commit d082817 into main on Mar 24, 2026
2 checks passed
