Conversation
aravind-3105
left a comment
Adding comments here as I go through them
},
"outputs": [],
"source": [
"raw_dataset = load_parquet_dataset(PARQUET_PATH)\n",
PARQUET_PATH will be a PosixPath, so raw_dataset = load_parquet_dataset(str(PARQUET_PATH)) is needed.
Also, the .parquet files aren't going to be in the repo, so data-download steps should be added to this notebook.
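A minimal sketch of the suggested fix. The filename here is hypothetical; the real PARQUET_PATH is defined earlier in the notebook:

```python
from pathlib import Path

# Hypothetical path; the notebook defines the real PARQUET_PATH earlier.
PARQUET_PATH = Path("reference_implementation_4") / "filtered_data.parquet"

# Cast to str so helpers that expect a plain string path accept it;
# a bare PosixPath can trip up string-only file loaders.
path_str = str(PARQUET_PATH)
print(path_str)
```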
The .parquet files are added in this folder:
"/projects/aieng/interp_agents_bootcamp/reference_implementation_4"
Will the participants have access to this folder? Those files are not downloaded directly from Hugging Face; I filtered the Hugging Face data and produced those parquet files myself.
Also, should I add the code for the data-filtering part anywhere?
Keep the data filtering script and add instructions in the README on how to create it. Ideally, the filtered data will be stored in a GCP bucket, which participants can access to download and place in the reference_implementation_4 folder. They won't have access to the cluster.
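A small sketch of a check the notebook could run after the manual download step, so participants get a clear message if the data is not in place. The directory layout and filenames below are hypothetical; the actual names come from the filtering script:

```python
from pathlib import Path

# Hypothetical filenames; replace with the actual parquet files produced
# by the filtering script once they are uploaded to the GCP bucket.
DATA_DIR = Path("reference_implementation_4")
EXPECTED_FILES = ["dpo_train.parquet", "dpo_eval.parquet"]

# List whatever is missing so participants know to fetch it from the
# bucket before running the rest of the notebook.
missing = [name for name in EXPECTED_FILES if not (DATA_DIR / name).exists()]
for name in missing:
    print(f"missing: {name} - download it from the GCP bucket into {DATA_DIR}/")
```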
shainarazavi
left a comment
There are some icons in these; do you want to keep them? Ideally it would be good to have some reference for the judge model, although it's your prompt @Sindhuja217
shainarazavi
left a comment
I suggest adding a little context before each step.
It would be nice to add a bit about why the seed is needed and why we perform each step, a little above each line of code.
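On the seed point, the one-line rationale is reproducibility: any stochastic step (shuffling, sampling completions for the judge) gives identical results across runs only if the seed is fixed first. A minimal illustration, not tied to this repo's code:

```python
import random

def sample_indices(seed: int, n: int = 5) -> list[int]:
    # A local Random instance seeded explicitly: the same seed always
    # yields the same draw, which is what makes notebook reruns reproducible.
    rng = random.Random(seed)
    return [rng.randrange(100) for _ in range(n)]

# Same seed -> identical "random" sample on every run.
assert sample_indices(42) == sample_indices(42)
```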
@Sindhuja217 can we add some context before each line of code about what is happening? There are many LLM-judge papers; it would be good to add references to 1-2 strong ones.
@Sindhuja217 I'd prefer to add some context before each line of code and some references at the end.
What am I missing, @Sindhuja217 @aravind-3105? We can add a bit of context before each line, and a reference to related work at the end (I know we have one reference we're following, but see it more from an academic view).
@shainarazavi I addressed all your comments and added context for the important cells. I'm planning to add more once I run the code on a GPU, and I've also included the respective references.

I've gone through all the notebooks (except the 5th, which needs an API key; I will try it once approval comes through) and everything is working well. One suggestion for the first notebook: instead of jumping straight into the "Dataset Construction for Preference Alignment (DPO)" section, it might be helpful to start with a main title, # Preference Alignment (DPO), that explains why we follow the four steps and includes a brief description (maybe even 1-2 images) of what preference alignment is. This, along with the slides, would make it easier to explain and for participants to grasp. The same content could also be added to the README so both look complete. Another addition, based on Shaina's feedback for other notebooks, is to include 3-4 questions, answers, or discussion points on the topic. Since the notebooks are divided into sections, these could be added to the README instead of any particular notebook. Once these two additions are in place, it's good to merge. Thanks for addressing the comments so promptly, really appreciate it.
aravind-3105
left a comment
Everything looks good now; good to merge.
This reference implementation includes all core helper utilities, end-to-end notebooks, and documentation required to run a Direct Preference Optimization (DPO) pipeline with an LLM-as-a-Judge setup. The implementation covers dataset construction, judge-based inference, preference pair generation, DPO training, and evaluation, and is structured for modularity and reproducibility.
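For context on the preference-pair step, the records such pipelines produce typically follow the prompt/chosen/rejected layout used by common DPO trainers. The field names and text below are the conventional ones, not necessarily this repo's exact schema:

```python
# Hypothetical record; the judge model decides which of two sampled
# completions becomes "chosen" and which "rejected" for each prompt.
preference_pair = {
    "prompt": "Summarize the article in two sentences.",
    "chosen": "The completion the judge preferred.",
    "rejected": "The completion the judge ranked lower.",
}

# DPO training then teaches the policy to rank "chosen" above
# "rejected" relative to a frozen reference model.
print(sorted(preference_pair))
```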
This is an initial version of the reference implementation. While the codebase is complete and internally consistent, it has not yet been executed or validated on Google Colab. Minor environment- or runtime-specific adjustments may be required when running in Colab.