
Reference implementation 4 #13

Merged: Sindhuja217 merged 10 commits into main from ref-impl-4 on Feb 17, 2026

@Sindhuja217
Collaborator

This reference implementation includes all core helper utilities, end-to-end notebooks, and documentation required to run a Direct Preference Optimization (DPO) pipeline with an LLM-as-a-Judge setup. The implementation covers dataset construction, judge-based inference, preference pair generation, DPO training, and evaluation, and is structured for modularity and reproducibility.

This is an initial version of the reference implementation. While the codebase is complete and internally consistent, it has not yet been executed or validated on Google Colab. Minor environment- or runtime-specific adjustments may be required when running in Colab.
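
A minimal sketch of the preference pair generation step named in the description, for readers new to the pipeline. It is not taken from this PR: the input record layout, the function name `build_preference_pairs`, and the `min_margin` threshold are assumptions made for illustration. The `prompt` / `chosen` / `rejected` output keys follow a common convention for DPO datasets and may differ from what the notebooks use.

```python
from typing import Dict, List


def build_preference_pairs(judged: List[Dict], min_margin: float = 1.0) -> List[Dict]:
    """Turn judge-scored responses into prompt/chosen/rejected records.

    Assumed (hypothetical) input layout per record:
    {"prompt": str, "responses": [{"text": str, "score": float}, ...]}
    """
    pairs = []
    for record in judged:
        # Rank candidate responses from best to worst judge score.
        ranked = sorted(record["responses"], key=lambda r: r["score"], reverse=True)
        if len(ranked) < 2:
            continue  # a pair needs at least two candidates
        best, worst = ranked[0], ranked[-1]
        # Skip near-ties: a weak preference signal adds noise to training.
        if best["score"] - worst["score"] < min_margin:
            continue
        pairs.append(
            {"prompt": record["prompt"], "chosen": best["text"], "rejected": worst["text"]}
        )
    return pairs


example = [
    {"prompt": "Q1", "responses": [{"text": "a", "score": 2.0}, {"text": "b", "score": 9.0}]},
    {"prompt": "Q2", "responses": [{"text": "c", "score": 5.0}, {"text": "d", "score": 5.5}]},
]
pairs = build_preference_pairs(example)
```

Here only `Q1` yields a pair, because the two `Q2` responses are scored within `min_margin` of each other.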

@Sindhuja217 Sindhuja217 self-assigned this Feb 9, 2026
@aravind-3105 aravind-3105 added the enhancement New feature or request label Feb 9, 2026
Member

@aravind-3105 aravind-3105 left a comment

Adding comments here as I go through them

},
"outputs": [],
"source": [
"raw_dataset = load_parquet_dataset(PARQUET_PATH)\n",
Member

PARQUET_PATH will be a PosixPath object, so the call needs to be raw_dataset = load_parquet_dataset(str(PARQUET_PATH)).

Also, the .parquet file isn't going to be in the repo, so data download steps should be added to this notebook.
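
One way to cover both points is a small guard that runs before loading. This is a sketch only: `resolve_parquet_path` is a hypothetical helper name, not something in the repo, and it assumes `load_parquet_dataset` accepts a plain string path.

```python
import os
from pathlib import Path


def resolve_parquet_path(path) -> str:
    """Return `path` as a plain string, failing early if the file is missing."""
    p = Path(path)
    if not p.is_file():
        # Point the reader at the missing download step instead of a
        # lower-level error from the loader.
        raise FileNotFoundError(
            f"{p} not found. Download the filtered data before running this cell."
        )
    # os.fspath converts a Path (PosixPath on Linux/Colab) to str.
    return os.fspath(p)


# Intended use in the notebook (load_parquet_dataset is the repo's helper):
# raw_dataset = load_parquet_dataset(resolve_parquet_path(PARQUET_PATH))
```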

Collaborator Author

@Sindhuja217 Sindhuja217 Feb 12, 2026

The .parquet files are in this folder:
"/projects/aieng/interp_agents_bootcamp/reference_implementation_4"
Will the participants have access to this folder? The files were not downloaded directly from Hugging Face; I filtered the Hugging Face data to produce them.

Also, should I add the code for the data filtering step anywhere?

Member

Keep the data filtering script and add instructions in the README on how to create it. Ideally, the filtered data will be stored in a GCP bucket, which participants can access to download and place in the reference_implementation_4 folder. They won't have access to the cluster.
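
A rough sketch of what the download step could look like, assuming the bucket ends up publicly readable. The bucket name and file names below are placeholders, not real values; a private bucket would need authenticated access (for example through the gcloud CLI) instead of plain HTTPS.

```python
import urllib.request
from pathlib import Path
from urllib.parse import quote

BUCKET = "example-bootcamp-data"  # hypothetical placeholder, not the real bucket
FILES = ["train.parquet", "test.parquet"]  # hypothetical file names


def public_gcs_url(bucket: str, object_name: str) -> str:
    """Build the HTTPS URL for a publicly readable Cloud Storage object."""
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"


def download_missing(dest_dir, bucket: str = BUCKET, files=FILES):
    """Download only the files that are not already present in dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in files:
        target = dest / name
        if target.exists():
            continue  # already downloaded, skip the network call
        urllib.request.urlretrieve(public_gcs_url(bucket, name), str(target))
        fetched.append(target)
    return fetched


# Intended use: download_missing("reference_implementation_4")
```

Skipping files that already exist keeps the notebook cell safe to re-run.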

Collaborator

@shainarazavi shainarazavi left a comment

There are some icons in these; do you want to keep them? Ideally it would be good to have a reference for the judge model, although it's your prompt, @Sindhuja217.

Collaborator

@shainarazavi shainarazavi left a comment

I suggest adding a little context before each step.

@shainarazavi shainarazavi self-requested a review February 11, 2026 16:44
Collaborator

It would be nice to add a short note above each line of code explaining why it is there, for example why a seed is needed or why a given step is performed.
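
As an illustration of the kind of note a seed could get: a fixed seed makes a random train/test split repeatable, so reruns and other participants see the same rows in each split. The sketch below uses only the standard library; `split_indices` is a hypothetical name and is not the notebook's actual split code.

```python
import random


def split_indices(n: int, test_fraction: float, seed: int):
    """Shuffle row indices 0..n-1 with a fixed seed and split into train/test."""
    rng = random.Random(seed)  # local generator, leaves global random state alone
    indices = list(range(n))
    rng.shuffle(indices)
    cut = int(n * (1 - test_fraction))
    return indices[:cut], indices[cut:]


# Same seed, same split: results can be compared across runs.
train_a, test_a = split_indices(10, 0.2, seed=42)
train_b, test_b = split_indices(10, 0.2, seed=42)
```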

Collaborator

@Sindhuja217 can we add some context before each line of code, briefly explaining what is happening? There are many LLM-as-a-judge papers, so it would be good to add references to 1-2 strong ones.

Collaborator

@Sindhuja217 I would prefer to add some context before each line of code and some references at the end.

Collaborator

What am I missing, @Sindhuja217 @aravind-3105? Can we add a bit of context before each line and a reference to related work at the end? (I know there is one reference we are following, but I am looking at it from a more academic view.)

@Sindhuja217
Collaborator Author

@shainarazavi I addressed all your comments and added context for the important cells. I'm planning to add more once I run the code on a GPU. I also included the respective references.

@aravind-3105
Member

aravind-3105 commented Feb 13, 2026

I’ve gone through all the notebooks (except the 5th, which needs an API key, I will try it once approval comes through) and everything is working well. One suggestion for the first notebook: instead of jumping straight into the "Dataset Construction for Preference Alignment (DPO)" section, it might be helpful to start with a main title, #Preference Alignment (DPO), that explains why we follow the four steps and includes a brief description (maybe even 1-2 images) about what preference alignment is. This, along with the slides, would make it easier to explain and for participants to grasp. The same content could also be added to the readme so both look complete.

Another addition, based on Shaina’s feedback for other notebooks, is to include 3-4 questions, answers, or discussion points on the topic. Since the notebooks are divided into sections, these could be added to the readme instead of any particular notebook. Once these two additions are in place, it’s good to merge. Thanks for addressing the comments so promptly, really appreciate it.
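
For the introductory section suggested above, a small worked example of the DPO objective could help. The sketch below computes the standard per-example DPO loss, `-log(sigmoid(beta * ((log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x)))))`, from four sequence log-probabilities. The function name, argument names, and `beta=0.1` default are illustrative assumptions, not the training code from the notebooks.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities."""
    # Implicit rewards: how much more likely the policy makes each response
    # compared with the frozen reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Raising the chosen response and lowering the rejected one reduces the loss.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

The loss only falls below `log(2)` when the policy prefers the chosen response more strongly than the reference does, which is the behaviour the training step is meant to encourage.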

Member

@aravind-3105 aravind-3105 left a comment

Everything looks good now to merge.

@Sindhuja217 Sindhuja217 merged commit 7ac52b4 into main Feb 17, 2026
1 of 2 checks passed
@aravind-3105 aravind-3105 deleted the ref-impl-4 branch March 10, 2026 19:54