No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
💤 Files with no reviewable changes (3)

📝 Walkthrough

This PR removes numerous top-level and experiment sub-routers and their registrations from the FastAPI app, deleting associated endpoints, request models, and orchestration helpers across training, evaluations, tasks, prompts, batched prompts, recipes, conversations, diffusion, export, generations, RAG, and workflows.

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
deep1401
left a comment
I think you might have to fix the rag.reindex in documents and also the ruff alert for shared
Paragon Summary

This pull request review analyzed 17 files and found no issues. The review examined code changes, potential bugs, security vulnerabilities, performance issues, and code quality concerns using automated analysis tools. Paragon did not detect any problems in the current diff. Proceed with merge after your normal checks.

This PR removes numerous API routes from the FastAPI backend, including endpoints for batched prompts, evals, experiments, conversations, diffusion, documents, generations, RAG, workflows, prompts, recipes, tasks, and training. The changes affect the main API entry point, multiple router modules, and shared utilities.

Confidence score: 5/5

17 files reviewed, 0 comments
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
api/transformerlab/shared/shared.py (2)
500-531: ⚠️ Potential issue | 🔴 Critical

EVAL jobs silently complete without execution, and GENERATE jobs will fail with `NameError` due to a missing `run_generation_script` definition.

The EVAL branch (lines 500–531) comments out `run_evaluation_script` but still returns `{"status": "complete", ...}` after only checking the stop flag. If EVAL jobs are creatable, they succeed without performing any work.

More critically, line 548 calls `await run_generation_script(...)` for GENERATE jobs, but this function is not defined or imported anywhere in the codebase. This will cause a `NameError` at runtime.

Additionally, the `run_job` function signature lacks type hints for the `job_config` parameter and a return type annotation, violating the coding guideline that requires type hints for all function arguments and returns in `api/**/*.py`.

Required actions:

- Either route EVAL jobs to a new evaluation implementation or fail immediately with a clear error message.
- Define or import `run_generation_script` before calling it at line 548.
- Add missing type hints: specify a type for `job_config`, add a return type annotation, and consider using Pydantic models from `schemas/` for the `job_config` and `job_details` parameters per the `api/transformerlab/**/*.py` guideline.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/shared/shared.py` around lines 500 - 531, The EVAL branch currently skips execution (commented-out run_evaluation_script) and returns success, run_generation_script is called but undefined, and run_job lacks type hints/return annotation; fix by either wiring EVAL to call a real evaluator (restore or implement run_evaluation_script and call it with (experiment_name, plugin_name, eval_name, job_id, org_id=org_id, user_id=user_id_from_job)) or explicitly fail early with job_update_status(..., "FAILED") and an explanatory error message; ensure run_generation_script is defined or imported before its call (or rename the call to the actual generation function) so no NameError occurs; and add type hints to the run_job signature: annotate job_config (use a Pydantic schema from schemas/ if available) and job_details, plus a return type (e.g., -> dict) to satisfy api/**/*.py guidelines.
533-550: ⚠️ Potential issue | 🔴 Critical

`run_generation_script` is undefined and will cause a `NameError` at runtime.

Line 548 calls `run_generation_script`, but this function is not defined or imported anywhere in the codebase. If GENERATE jobs execute, this will fail at runtime. The parallel EVAL job has the equivalent call commented out; either restore the missing import/definition or disable the GENERATE path similarly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/shared/shared.py` around lines 533 - 550, The GENERATE branch calls an undefined function run_generation_script which will raise a NameError; fix by either importing the correct function or calling the intended existing helper (restore or add the missing definition). Locate the GENERATE handling block (master_job_type == "GENERATE") and replace the run_generation_script call with the correctly imported symbol or implement/run the appropriate function that matches the EVAL path pattern (use the same parameter set: experiment_name, plugin_name, generation_name, job_id, org_id, user_id_from_job) and ensure _get_user_id_for_subprocess, job_update_status and storage usage remain consistent; also add the needed import at module top if restoring an external function.

api/transformerlab/routers/experiment/documents.py (5)
248-250: ⚠️ Potential issue | 🔴 Critical

Remove dangling RAG reindex calls after removing `rag`.

`rag` is no longer imported, so these blocks now raise `NameError` and already fail Ruff (F821). Remove or replace with the new service entrypoint.

🧹 Proposed fix

```diff
-    # reindex the vector store on every file upload
-    if folder == "rag":
-        await rag.reindex(experimentId)
+    # RAG reindex removed with RAG feature
```

Also applies to: 327-329
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/routers/experiment/documents.py` around lines 248 - 250, The code currently calls rag.reindex(experimentId) when folder == "rag", but rag was removed and causes F821 NameError; either remove these conditional blocks or replace them with the new RAG service entrypoint (call the new service's reindex method) — update the branches that reference rag.reindex (the block using folder and experimentId around the upload handler and the similar block at the later location) to either delete the call or invoke the new service API (e.g., NewRagService.reindex(experimentId) or await rag_service.reindex(experimentId)) so no unresolved rag symbol remains.
345-373: 🛠️ Refactor suggestion | 🟠 Major

Move ZIP download/extraction into a service.

This endpoint now embeds significant business logic (download, extract, cleanup). Please move it to `api/transformerlab/services/` and keep the router focused on HTTP validation + delegation.

As per coding guidelines, `api/transformerlab/{routers,services}/**/*.py`: Implement business logic in `api/transformerlab/services/` using the Service Pattern; routers should only handle HTTP validation and call services.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/routers/experiment/documents.py` around lines 345 - 373, Extract the ZIP download/extraction/cleanup logic out of the router into a new service (e.g., DocumentImportService) and expose a single method like import_from_url(experiment_id: str, url: str) that performs the following: instantiate Experiment(experimentId) and call get_dir(), compute documents_dir, perform the httpx.AsyncClient GET and response.raise_for_status(), write the response.content to a tempfile (temp_zip_path), open zipfile.ZipFile to extract into documents_dir and build extracted_files, remove the temp file, and (optionally) call rag.reindex(experimentId) if any extracted_files start with "rag/"; return the same payload currently returned ({"status":"success", "extracted_files":..., "total_files":...}). Replace the inline code in the router with validation + a call to DocumentImportService.import_from_url(experimentId, url) and return its result; keep function/class references from the diff (Experiment, get_dir, documents_dir, temp_zip_path, extracted_files, rag.reindex) to locate the moved logic.
269-270: ⚠️ Potential issue | 🟠 Major

Use Pydantic request models and add return type hints.

Both `document_upload_links` (line 270) and `document_download_zip` (line 334) endpoints accept untyped `dict` bodies without Pydantic models and omit return type annotations. Additionally, the `folder` parameter should use explicit optional syntax (`str | None = None` instead of `str = None`).

Create an `api/transformerlab/schemas/documents.py` file with request and response models, then update both functions to:

- Use the Pydantic models for request bodies
- Add explicit return type annotations
- Use modern optional syntax for the `folder` parameter

Note that `document_upload_links` can return both success and error responses with different structures (`{"status": "error", "message": "..."}` or `{"status": "success", "filename": ...}`), so the return type should account for both cases.

This applies to both functions at lines 269–270 and 333–335.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/routers/experiment/documents.py` around lines 269 - 270, Create a new Pydantic schema module api/transformerlab/schemas/documents.py defining request models (e.g., UploadLinksRequest with fields for the links list and any other payload keys, and DownloadZipRequest) and response models (e.g., ErrorResponse {status: str, message: str} and UploadSuccessResponse {status: str, filename: str}, plus DownloadSuccessResponse). Then update the router functions document_upload_links and document_download_zip to accept those models as typed bodies instead of raw dicts (e.g., data: UploadLinksRequest), change the folder parameter signature to use modern optional syntax (folder: str | None = None), and add explicit return type annotations that are unions of the success and error response models (e.g., ErrorResponse | UploadSuccessResponse). Ensure all referenced model names match those you add so the function signatures and return annotations compile cleanly.
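To make the suggestion concrete, a minimal sketch of the proposed `api/transformerlab/schemas/documents.py` could look like this. The model names follow the review comment; the field types and the `links`/`url` payload keys are assumptions, since the actual request bodies are not shown here.

```python
# Hypothetical sketch of api/transformerlab/schemas/documents.py.
# Model names come from the review comment; field types are assumptions.
from pydantic import BaseModel


class UploadLinksRequest(BaseModel):
    links: list[str]  # assumed payload key for the list of URLs


class DownloadZipRequest(BaseModel):
    url: str  # assumed payload key for the ZIP source URL


class ErrorResponse(BaseModel):
    status: str  # always "error"
    message: str


class UploadSuccessResponse(BaseModel):
    status: str  # always "success"
    filename: str
```

The router signature would then read roughly `async def document_upload_links(experimentId: str, data: UploadLinksRequest, folder: str | None = None) -> UploadSuccessResponse | ErrorResponse:`.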
360-362: ⚠️ Potential issue | 🔴 Critical

Implement safe ZIP extraction with path validation to prevent Zip Slip attacks.

`extractall()` writes files based on archive member paths without validation, allowing directory traversal via paths like `../../../file`. The domain whitelist validates the source but not the ZIP contents; a malicious archive could originate from a whitelisted domain. Implement member-level validation and resolution checks before extraction.

Additionally, add the missing return type annotation to comply with the type hints requirement for `api/**/*.py` files.

Proposed fix

```diff
+from pathlib import Path

-    with zipfile.ZipFile(temp_zip_path, "r") as zip_ref:
-        zip_ref.extractall(documents_dir)
-        extracted_files = [f for f in zip_ref.namelist() if not f.endswith("/") and not f.startswith(".")]
+    with zipfile.ZipFile(temp_zip_path, "r") as zip_ref:
+        base_dir = Path(documents_dir).resolve()
+        extracted_files = []
+        for member in zip_ref.infolist():
+            if member.is_dir():
+                continue
+            target = (base_dir / member.filename).resolve()
+            if not str(target).startswith(str(base_dir)):
+                raise HTTPException(status_code=400, detail="ZIP contains invalid paths")
+            zip_ref.extract(member, documents_dir)
+            if not member.filename.startswith("."):
+                extracted_files.append(member.filename)

-async def document_download_zip(experimentId: str, data: dict = Body(...)):
+async def document_download_zip(experimentId: str, data: dict = Body(...)) -> dict:
```
51-52: ⚠️ Potential issue | 🟠 Major

Guard `secure_filename` when `folder` is None.

With `folder=None`, `secure_filename` coerces it to the string `"None"`, causing the subsequent `if folder and folder != ""` checks to treat it as a truthy value. Requests without an explicit folder parameter then incorrectly look for a folder literally named `"None"`. Only sanitize when a folder is provided.

Proposed fix

```diff
-    folder = secure_filename(folder)
+    folder = secure_filename(folder) if folder else None
```

Also applies to: 75, 272
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/routers/experiment/documents.py` around lines 51 - 52, document_name should still be sanitized, but avoid calling secure_filename on folder when folder is None (or empty) because secure_filename(None) becomes "None" and makes subsequent if folder and folder != "" treat it as truthy; change the code to only call secure_filename for folder when folder is provided (e.g. if folder is not None and folder != "": folder = secure_filename(folder)) and apply the same guard at the other occurrences mentioned (around the uses at the other two locations referenced), leaving document_name = secure_filename(document_name) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@api/transformerlab/routers/experiment/documents.py`:
- Around line 248-250: The code currently calls rag.reindex(experimentId) when
folder == "rag", but rag was removed and causes F821 NameError; either remove
these conditional blocks or replace them with the new RAG service entrypoint
(call the new service's reindex method) — update the branches that reference
rag.reindex (the block using folder and experimentId around the upload handler
and the similar block at the later location) to either delete the call or invoke
the new service API (e.g., NewRagService.reindex(experimentId) or await
rag_service.reindex(experimentId)) so no unresolved rag symbol remains.
- Around line 345-373: Extract the ZIP download/extraction/cleanup logic out of
the router into a new service (e.g., DocumentImportService) and expose a single
method like import_from_url(experiment_id: str, url: str) that performs the
following: instantiate Experiment(experimentId) and call get_dir(), compute
documents_dir, perform the httpx.AsyncClient GET and
response.raise_for_status(), write the response.content to a tempfile
(temp_zip_path), open zipfile.ZipFile to extract into documents_dir and build
extracted_files, remove the temp file, and (optionally) call
rag.reindex(experimentId) if any extracted_files start with "rag/"; return the
same payload currently returned ({"status":"success", "extracted_files":...,
"total_files":...}). Replace the inline code in the router with validation + a
call to DocumentImportService.import_from_url(experimentId, url) and return its
result; keep function/class references from the diff (Experiment, get_dir,
documents_dir, temp_zip_path, extracted_files, rag.reindex) to locate the moved
logic.
- Around line 269-270: Create a new Pydantic schema module
api/transformerlab/schemas/documents.py defining request models (e.g.,
UploadLinksRequest with fields for the links list and any other payload keys,
and DownloadZipRequest) and response models (e.g., ErrorResponse {status: str,
message: str} and UploadSuccessResponse {status: str, filename: str}, plus
DownloadSuccessResponse). Then update the router functions document_upload_links
and document_download_zip to accept those models as typed bodies instead of raw
dicts (e.g., data: UploadLinksRequest), change the folder parameter signature to
use modern optional syntax (folder: str | None = None), and add explicit return
type annotations that are unions of the success and error response models (e.g.,
ErrorResponse | UploadSuccessResponse). Ensure all referenced model names match
those you add so the function signatures and return annotations compile cleanly.
- Around line 51-52: document_name should still be sanitized, but avoid calling
secure_filename on folder when folder is None (or empty) because
secure_filename(None) becomes "None" and makes subsequent if folder and folder
!= "" treat it as truthy; change the code to only call secure_filename for
folder when folder is provided (e.g. if folder is not None and folder != "":
folder = secure_filename(folder)) and apply the same guard at the other
occurrences mentioned (around the uses at the other two locations referenced),
leaving document_name = secure_filename(document_name) unchanged.
In `@api/transformerlab/shared/shared.py`:
- Around line 500-531: The EVAL branch currently skips execution (commented-out
run_evaluation_script) and returns success, run_generation_script is called but
undefined, and run_job lacks type hints/return annotation; fix by either wiring
EVAL to call a real evaluator (restore or implement run_evaluation_script and
call it with (experiment_name, plugin_name, eval_name, job_id, org_id=org_id,
user_id=user_id_from_job)) or explicitly fail early with job_update_status(...,
"FAILED") and an explanatory error message; ensure run_generation_script is
defined or imported before its call (or rename the call to the actual generation
function) so no NameError occurs; and add type hints to the run_job signature:
annotate job_config (use a Pydantic schema from schemas/ if available) and
job_details, plus a return type (e.g., -> dict) to satisfy api/**/*.py
guidelines.
- Around line 533-550: The GENERATE branch calls an undefined function
run_generation_script which will raise a NameError; fix by either importing the
correct function or calling the intended existing helper (restore or add the
missing definition). Locate the GENERATE handling block (master_job_type ==
"GENERATE") and replace the run_generation_script call with the correctly
imported symbol or implement/run the appropriate function that matches the EVAL
path pattern (use the same parameter set: experiment_name, plugin_name,
generation_name, job_id, org_id, user_id_from_job) and ensure
_get_user_id_for_subprocess, job_update_status and storage usage remain
consistent; also add the needed import at module top if restoring an external
function.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- api/api.py
- api/transformerlab/routers/batched_prompts.py
- api/transformerlab/routers/evals.py
- api/transformerlab/routers/experiment/conversations.py
- api/transformerlab/routers/experiment/diffusion.py
- api/transformerlab/routers/experiment/documents.py
- api/transformerlab/routers/experiment/evals.py
- api/transformerlab/routers/experiment/experiment.py
- api/transformerlab/routers/experiment/export.py
- api/transformerlab/routers/experiment/generations.py
- api/transformerlab/routers/experiment/rag.py
- api/transformerlab/routers/experiment/workflows.py
- api/transformerlab/routers/prompts.py
- api/transformerlab/routers/recipes.py
- api/transformerlab/routers/tasks.py
- api/transformerlab/routers/train.py
- api/transformerlab/shared/shared.py
💤 Files with no reviewable changes (15)
- api/api.py
- api/transformerlab/routers/train.py
- api/transformerlab/routers/experiment/rag.py
- api/transformerlab/routers/prompts.py
- api/transformerlab/routers/experiment/export.py
- api/transformerlab/routers/experiment/generations.py
- api/transformerlab/routers/experiment/experiment.py
- api/transformerlab/routers/evals.py
- api/transformerlab/routers/experiment/conversations.py
- api/transformerlab/routers/recipes.py
- api/transformerlab/routers/experiment/evals.py
- api/transformerlab/routers/batched_prompts.py
- api/transformerlab/routers/experiment/diffusion.py
- api/transformerlab/routers/tasks.py
- api/transformerlab/routers/experiment/workflows.py
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/transformerlab/shared/shared.py (1)
500-531: ⚠️ Potential issue | 🟠 Major

EVAL and GENERATE jobs silently succeed without performing any work.

Both branches now transition `RUNNING → COMPLETE` immediately after the commented-out script calls, with no actual evaluation or generation taking place. Any caller (UI, workflow runner, test) will receive a `"complete"` response and a `COMPLETE` job status, giving the false impression that real work was done. If these job types are intentionally being retired, the branches should return an explicit unsupported/error response instead of a silent no-op success.

🛠️ Suggested fix: return an explicit error for unsupported job types

```diff
     if master_job_type == "EVAL":
-        # eval_name = job_config.get("evaluator", "")
         await job_update_status(job_id, "RUNNING", experiment_id=experiment_name)
         print("Running evaluation script")
         evals_output_file = storage.join(output_temp_file_dir, f"output_{job_id}.txt")
         if not await storage.exists(evals_output_file):
             async with await storage.open(evals_output_file, "w") as f:
                 await f.write("")
-        user_id_from_job = _get_user_id_for_subprocess(job_details)
-        # await run_evaluation_script(
-        #     experiment_name, plugin_name, eval_name, job_id, org_id=org_id, user_id=user_id_from_job
-        # )
-        # Check if stop button was clicked and update status accordingly
-        job_row = await job_service.job_get(job_id)
-        job_data = job_row.get("job_data", None) if job_row else None
-        if job_data is None:
-            await job_update_status(job_id, "FAILED", experiment_id=experiment_name)
-            return {"status": "error", "job_id": job_id, "message": "Evaluation job failed: No job data found"}
-
-        if job_data.get("stop", False):
-            await job_update_status(job_id, "STOPPED", experiment_id=experiment_name)
-            return {"status": "stopped", "job_id": job_id, "message": "Evaluation job was stopped by user"}
-        else:
-            job = await job_service.job_get(job_id)
-            current_status = job.get("status")
-            if current_status != "FAILED":
-                await job_update_status(job_id, "COMPLETE", experiment_id=experiment_name)
-            return {"status": "complete", "job_id": job_id, "message": "Evaluation job completed successfully"}
+        await job_update_status(job_id, "FAILED", experiment_id=experiment_name)
+        return {"status": "error", "job_id": job_id, "message": "EVAL job type is not supported in this version"}
     elif master_job_type == "GENERATE":
         plugin_name = job_config["plugin"]
-        # generation_name = job_config["generator"]
         await job_update_status(job_id, "RUNNING", experiment_id=experiment_name)
         print("Running generation script")
         gen_output_file = storage.join(output_temp_file_dir, f"output_{job_id}.txt")
         if not await storage.exists(gen_output_file):
             async with await storage.open(gen_output_file, "w") as f:
                 await f.write("")
-        user_id_from_job = _get_user_id_for_subprocess(job_details)
-        # await run_generation_script(
-        #     experiment_name, plugin_name, generation_name, job_id, org_id=org_id, user_id=user_id_from_job
-        # )
-        # Check should_stop flag and update status accordingly
-        job_row = await job_service.job_get(job_id)
-        job_data = job_row.get("job_data", None) if job_row else None
-        if job_data is None:
-            await job_update_status(job_id, "FAILED", experiment_id=experiment_name)
-            return {"status": "error", "job_id": job_id, "message": "Generation job failed: No job data found"}
-
-        if job_data.get("stop", False):
-            await job_update_status(job_id, "STOPPED", experiment_id=experiment_name)
-            return {"status": "stopped", "job_id": job_id, "message": "Generation job was stopped by user"}
-        else:
-            job = await job_service.job_get(job_id)
-            current_status = job.get("status")
-            if current_status != "FAILED":
-                await job_update_status(job_id, "COMPLETE", experiment_id=experiment_name)
-            return {"status": "complete", "job_id": job_id, "message": "Generation job completed successfully"}
+        await job_update_status(job_id, "FAILED", experiment_id=experiment_name)
+        return {"status": "error", "job_id": job_id, "message": "GENERATE job type is not supported in this version"}
```

Also applies to: 533-568
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/shared/shared.py` around lines 500 - 531, The EVAL/GENERATE branches currently mark jobs RUNNING→COMPLETE without doing work because the actual calls (e.g., run_evaluation_script) are commented out; change these branches in the handler that checks master_job_type ("EVAL" and "GENERATE") to return an explicit error/unsupported response instead of silently succeeding: update the code around job_update_status(job_id, "RUNNING", ...) and the subsequent logic to set status to "FAILED" (or a new "UNSUPPORTED") and return {"status":"error","job_id":job_id,"message":"EVAL/GENERATE jobs are not supported"} (include org_id/user_id handling via _get_user_id_for_subprocess if needed) so callers and job_service.job_get see an explicit failure rather than a no-op COMPLETE; remove the final COMPLETE path for these branches until run_evaluation_script/run_generation_script are restored.
♻️ Duplicate comments (2)
api/transformerlab/routers/experiment/documents.py (2)
328-329: Same concern as lines 248-250 regarding RAG reindex removal and dead-code cleanup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/routers/experiment/documents.py` around lines 328 - 329, Remove the commented-out RAG reindex dead code: delete the two commented lines referencing the "rag" folder and the rag.reindex(experimentId) call (the same dead-code pattern seen around the earlier block with folder == "rag"), and if reindexing is still required implement it explicitly where folder handling occurs (use the existing rag.reindex method and the experimentId variable) or add a clear TODO instead of leaving commented code.
368-371: Same concern as lines 248-250 regarding RAG reindex removal and dead-code cleanup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/transformerlab/routers/experiment/documents.py` around lines 368 - 371, The commented-out RAG reindex block using extracted_files and rag.reindex(experimentId) is duplicate dead code (same as the earlier block around the other lines) and should be cleaned up: either remove these commented lines entirely to avoid confusion, or if reindexing is required, restore a single canonical implementation that inspects extracted_files for paths starting with "rag/" and calls rag.reindex(experimentId); update or remove the duplicate commented occurrence accordingly and ensure only one active reindex call exists (refer to the variables extracted_files and the call rag.reindex(experimentId)).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/transformerlab/routers/experiment/documents.py`:
- Around line 248-250: The commented-out reindex calls leave the RAG vector
store stale; restore RAG indexing by re-enabling and awaiting
rag.reindex(experimentId) where files are uploaded (the commented blocks
referencing "rag" and reindex at the locations around the upload handlers), or
if indexing is intentionally disabled remove those commented lines to avoid dead
code. Specifically, in the upload handlers that check folder (the places where
"if folder == \"rag\":" and the commented "await rag.reindex(experimentId)"
appear), either uncomment and ensure rag.reindex(experimentId) is awaited and
rag is available/imported, or delete the commented blocks across all three spots
so the codebase is consistent.
In `@api/transformerlab/shared/shared.py`:
- Line 511: The local variable user_id_from_job assigned via
_get_user_id_for_subprocess is dead code because the calls that consumed it
(run_evaluation_script and run_generation_script) are commented out; remove the
user_id_from_job assignment and delete the associated commented-out
run_evaluation_script / run_generation_script blocks to clean up dead code and
avoid confusion, ensuring no other code references _get_user_id_for_subprocess
remains; if _get_user_id_for_subprocess is now unused elsewhere consider
removing or refactoring that helper too.
---
Outside diff comments:
In `@api/transformerlab/shared/shared.py`:
- Around line 500-531: The EVAL/GENERATE branches currently mark jobs
RUNNING→COMPLETE without doing work because the actual calls (e.g.,
run_evaluation_script) are commented out; change these branches in the handler
that checks master_job_type ("EVAL" and "GENERATE") to return an explicit
error/unsupported response instead of silently succeeding: update the code
around job_update_status(job_id, "RUNNING", ...) and the subsequent logic to set
status to "FAILED" (or a new "UNSUPPORTED") and return
{"status":"error","job_id":job_id,"message":"EVAL/GENERATE jobs are not
supported"} (include org_id/user_id handling via _get_user_id_for_subprocess if
needed) so callers and job_service.job_get see an explicit failure rather than a
no-op COMPLETE; remove the final COMPLETE path for these branches until
run_evaluation_script/run_generation_script are restored.
---
Duplicate comments:
In `@api/transformerlab/routers/experiment/documents.py`:
- Around line 328-329: Remove the commented-out RAG reindex dead code: delete
the two commented lines referencing the "rag" folder and the
rag.reindex(experimentId) call (the same dead-code pattern seen around the
earlier block with folder == "rag"), and if reindexing is still required
implement it explicitly where folder handling occurs (use the existing
rag.reindex method and the experimentId variable) or add a clear TODO instead of
leaving commented code.
- Around line 368-371: The commented-out RAG reindex block using extracted_files
and rag.reindex(experimentId) is duplicate dead code (same as the earlier block
around the other lines) and should be cleaned up: either remove these commented
lines entirely to avoid confusion, or if reindexing is required, restore a
single canonical implementation that inspects extracted_files for paths starting
with "rag/" and calls rag.reindex(experimentId); update or remove the duplicate
commented occurrence accordingly and ensure only one active reindex call exists
(refer to the variables extracted_files and the call rag.reindex(experimentId)).
```diff
     # reindex the vector store on every file upload
-    if folder == "rag":
-        await rag.reindex(experimentId)
+    # if folder == "rag":
+    #     await rag.reindex(experimentId)
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Check for RAG-related code and reindex usage
echo "=== Searching for RAG references ==="
rg -n --type=py '\brag\b' api --ignore-case | head -50
echo -e "\n=== Searching for reindex calls ==="
rg -n --type=py 'reindex' api | head -50
echo -e "\n=== Checking if rag module/service exists ==="
find api -type f -name "*rag*" 2>/dev/null
echo -e "\n=== Looking at documents.py context around commented lines ==="
sed -n '240,260p' api/transformerlab/routers/experiment/documents.py
sed -n '320,335p' api/transformerlab/routers/experiment/documents.py
sed -n '360,375p' api/transformerlab/routers/experiment/documents.py
```

Repository: transformerlab/transformerlab-app
Length of output: 6495
Remove the commented-out reindex blocks or restore RAG integration; otherwise the vector stores go stale.
RAG is actively used in the codebase (plugins/generate_rag_outputs/main.py makes live calls to /experiment/{experiment_id}/rag/query), but reindex is disabled across all upload endpoints (lines 248–250, 328–329, 368–371). This creates inconsistency: documents are stored but not indexed, causing RAG queries to return outdated results.
Either:
- Delete the dead code if RAG indexing is intentionally disabled
- Fix and uncomment reindex calls if RAG should be kept in sync with document uploads
The current state leaves vector stores stale.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/transformerlab/routers/experiment/documents.py` around lines 248 - 250,
The commented-out reindex calls leave the RAG vector store stale; restore RAG
indexing by re-enabling and awaiting rag.reindex(experimentId) where files are
uploaded (the commented blocks referencing "rag" and reindex at the locations
around the upload handlers), or if indexing is intentionally disabled remove
those commented lines to avoid dead code. Specifically, in the upload handlers
that check folder (the places where "if folder == \"rag\":" and the commented
"await rag.reindex(experimentId)" appear), either uncomment and ensure
rag.reindex(experimentId) is awaited and rag is available/imported, or delete
the commented blocks across all three spots so the codebase is consistent.
```diff
@@ -511,9 +509,9 @@ async def run_job(job_id: str, job_config, experiment_name: str = "default", job
             await f.write("")
         # Pass user_id extracted from job_details if available
         user_id_from_job = _get_user_id_for_subprocess(job_details)
```
🛠️ Refactor suggestion | 🟠 Major
`user_id_from_job` is dead code in both EVAL and GENERATE branches.

The variable is computed only to be passed to `run_evaluation_script` / `run_generation_script`, both of which are now commented out. These lines should be removed along with the commented-out blocks.

```diff
-        user_id_from_job = _get_user_id_for_subprocess(job_details)
-        # await run_evaluation_script(
-        #     experiment_name, plugin_name, eval_name, job_id, org_id=org_id, user_id=user_id_from_job
-        # )
```

```diff
-        user_id_from_job = _get_user_id_for_subprocess(job_details)
-        # await run_generation_script(
-        #     experiment_name, plugin_name, generation_name, job_id, org_id=org_id, user_id=user_id_from_job
-        # )
```

Also applies to: 547
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/transformerlab/shared/shared.py` at line 511, The local variable
user_id_from_job assigned via _get_user_id_for_subprocess is dead code because
the calls that consumed it (run_evaluation_script and run_generation_script) are
commented out; remove the user_id_from_job assignment and delete the associated
commented-out run_evaluation_script / run_generation_script blocks to clean up
dead code and avoid confusion, ensuring no other code references
_get_user_id_for_subprocess remains; if _get_user_id_for_subprocess is now
unused elsewhere consider removing or refactoring that helper too.
Codecov Report

✅ All modified and coverable lines are covered by tests.