Optimize memory by reducing MAX_SEQ_LENGTH and improve error handling #37
Merged
aravind-3105 merged 2 commits into main on Mar 24, 2026
…ion, and improve dataset error handling in notebooks
Summary
This pull request updates the preference alignment pipeline to improve compatibility, reproducibility, and ease of use, especially for users running on different hardware or cloud environments. The changes include improvements to dataset setup instructions, environment configuration, model loading, evaluation flexibility, and training defaults.
Clickup Ticket(s): Link(s) if applicable.
Type of Change
Changes Made
Key changes include:
Dataset setup and documentation improvements
- Clarified that the .parquet files are hosted in a GCP bucket and provided clearer, step-by-step extraction and cleanup instructions, including directory structure expectations after setup.
- Added guidance on using the --active flag with uv sync and details for installing flash-attn via pre-built wheels, with platform-specific notes. [1] [2] [3]

Environment and configuration handling
- Improved .env file handling in 05_evaluation.ipynb, ensuring consistent configuration loading regardless of execution context. The notebook now supports both Gemini and OpenAI judge providers, with automatic client setup and clearer environment variable usage.
- Addressed PYTORCH_ALLOC_CONF configuration and clarified model/data path handling to improve reproducibility across machines.

Model loading and hardware compatibility
- Switched the attention implementation from "flash_attention_2" to "sdpa" in model loading code for both inference and evaluation, with clear comments on how to switch based on hardware capabilities.
- Added a check for flash_attn that enables it if available; otherwise, the code defaults to standard attention mechanisms. Device map is now set to "auto" for multi-GPU support.

Evaluation and judge flexibility
- Evaluation now uses the judge_with_llm function instead of the hardcoded OpenAI judge, supporting any OpenAI-compatible API (including Gemini). Judge provider and model can be selected via configuration.

Training defaults and resource usage
- Reduced MAX_SEQ_LENGTH to 1024 for better compatibility with 22 GB GPUs, and adjusted DPO trainer batch sizes to maintain effective batch size while reducing per-device memory requirements.
- Lowered a default in 02_inference_runner.ipynb for faster test runs, with comments on how to increase it for larger experiments.

These changes collectively make the pipeline easier to set up, more robust to different environments, and more flexible for evaluation and training.
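For reviewers who want a concrete picture of the attention fallback described under model loading, here is a minimal sketch. It is not the code from this PR: the helper name `pick_attn_implementation` is hypothetical, and the commented-out loading call assumes Hugging Face transformers with a placeholder `model_path`.

```python
import importlib.util


def pick_attn_implementation(prefer_flash: bool = True) -> str:
    """Return "flash_attention_2" when flash_attn is importable, else "sdpa".

    Hypothetical helper for illustration; the PR's notebooks may structure
    this check differently.
    """
    # find_spec returns None when the package is not installed.
    has_flash = importlib.util.find_spec("flash_attn") is not None
    return "flash_attention_2" if (prefer_flash and has_flash) else "sdpa"


ATTN_IMPLEMENTATION = pick_attn_implementation()
print(ATTN_IMPLEMENTATION)

# Assumed usage with transformers (not executed here; model_path is a placeholder):
# model = AutoModelForCausalLM.from_pretrained(
#     model_path,
#     attn_implementation=ATTN_IMPLEMENTATION,
#     device_map="auto",
# )
```

Passing `prefer_flash=False` forces "sdpa" even when flash-attn is installed, which can be useful on GPUs that the installed wheel does not support.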
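The batch-size adjustment relies on the usual relationship: effective batch size = per-device batch size × gradient accumulation steps × number of devices. The sketch below works through that arithmetic. The numbers (4 × 4 before, 2 × 8 after) are illustrative assumptions, not the values changed in this PR, and both function names are hypothetical.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


def rebalance_grad_accum(per_device: int, grad_accum: int, new_per_device: int) -> int:
    """Accumulation steps needed to keep the same effective batch size
    after changing the per-device batch size."""
    total = per_device * grad_accum
    if total % new_per_device != 0:
        raise ValueError("new per-device batch size must divide the effective batch size")
    return total // new_per_device


# Illustrative values only: halve the per-device batch, double the accumulation.
before = effective_batch_size(per_device=4, grad_accum=4)
new_accum = rebalance_grad_accum(per_device=4, grad_accum=4, new_per_device=2)
after = effective_batch_size(per_device=2, grad_accum=new_accum)
print(before, new_accum, after)
```

Per-device activation memory scales with the per-device batch size, so this trade lowers peak memory at the cost of more forward/backward passes per optimizer step.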
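To illustrate how a judge provider and model might be selected via configuration, here is a small sketch that resolves settings from environment variables. Everything named here is an assumption for illustration: the `JUDGE_PROVIDER` and `JUDGE_MODEL` variable names, the `JudgeConfig` class, the default model names, and the API-key variable names are not taken from the PR. The Gemini base URL is the OpenAI-compatible endpoint as I understand Google's documentation; verify it before relying on it.

```python
import os
from dataclasses import dataclass, replace
from typing import Mapping, Optional


@dataclass(frozen=True)
class JudgeConfig:
    provider: str
    model: str
    base_url: Optional[str]  # None means the client's default endpoint
    api_key_env: str         # name of the env var holding the API key


# Assumed defaults; model names are placeholders.
_PROVIDERS = {
    "openai": JudgeConfig("openai", "gpt-4o-mini", None, "OPENAI_API_KEY"),
    "gemini": JudgeConfig(
        "gemini",
        "gemini-2.0-flash",
        "https://generativelanguage.googleapis.com/v1beta/openai/",
        "GEMINI_API_KEY",
    ),
}


def resolve_judge(env: Mapping[str, str] = os.environ) -> JudgeConfig:
    """Pick a judge configuration from (hypothetical) JUDGE_* variables."""
    provider = env.get("JUDGE_PROVIDER", "openai").lower()
    if provider not in _PROVIDERS:
        raise ValueError(f"unknown judge provider: {provider!r}")
    base = _PROVIDERS[provider]
    # An explicit JUDGE_MODEL overrides the provider's default model.
    return replace(base, model=env.get("JUDGE_MODEL", base.model))


cfg = resolve_judge({"JUDGE_PROVIDER": "gemini"})
print(cfg.provider, cfg.model)

# Assumed usage with the openai package (not executed here):
# client = OpenAI(api_key=os.environ[cfg.api_key_env], base_url=cfg.base_url)
```

Keeping the provider table in one place means a function such as judge_with_llm only needs a client and a model name, regardless of which backend is configured.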
Testing
- uv run pytest tests/
- uv run mypy <src_dir>
- uv run ruff check src_dir/

Manual testing details:
Screenshots/Recordings
Related Issues
Deployment Notes
Checklist