feat(huggingFace): add image task family via ImageTaskCodegen #5320
Open
PG1204 wants to merge 9 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5320      +/-   ##
============================================
  Coverage     53.02%   53.03%
  Complexity     2657     2657
============================================
  Files          1094     1097       +3
  Lines         42286    42420     +134
  Branches       4541     4556      +15
============================================
+ Hits          22423    22496      +73
- Misses        18554    18610      +56
- Partials       1309     1314       +5
/request-review @Ma77Ball
Ma77Ball suggested changes on Jun 4, 2026
Ma77Ball suggested changes on Jun 10, 2026
✅ No material benchmark regressions detected
🟢 6 better · 🔴 0 worse · ⚪ 9 noise (<±5%) · 0 without baseline
Baseline details: Latest main
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,457.24,200,128000,437,0.267,21470.16,32275.64,32275.64
1,100,10,64,20,2099.62,2000,1280000,953,0.581,106634.63,120609.75,120609.75
2,1000,10,64,20,17874.12,20000,12800000,1119,0.683,901581.83,922951.99,922951.99
Plugs the 9-task image family into the dispatcher pattern established
in PR 2:

  image-only      image-classification, object-detection,
                  image-segmentation, image-to-text
  image + prompt  visual-question-answering, document-question-answering,
                  zero-shot-image-classification, image-text-to-text,
                  image-to-image

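
The two groups above could be mirrored on the Python side roughly as follows. This is only a sketch: the constant names echo the image_only_tasks / image_prompt_tasks / image_tasks tuples that this change adds to PythonCodegenBase, but `needs_prompt` is a hypothetical helper introduced here for illustration, and the real generated code may route tasks differently.

```python
# Hypothetical module-level tuples mirroring the two task groups listed above.
IMAGE_ONLY_TASKS = (
    "image-classification",
    "object-detection",
    "image-segmentation",
    "image-to-text",
)
IMAGE_PROMPT_TASKS = (
    "visual-question-answering",
    "document-question-answering",
    "zero-shot-image-classification",
    "image-text-to-text",
    "image-to-image",
)
IMAGE_TASKS = IMAGE_ONLY_TASKS + IMAGE_PROMPT_TASKS


def needs_prompt(task: str) -> bool:
    """Return True when the task expects a text prompt alongside the image.

    Hypothetical helper: rejects anything outside the image family.
    """
    if task not in IMAGE_TASKS:
        raise ValueError(f"not an image task: {task!r}")
    return task in IMAGE_PROMPT_TASKS
```

Keeping the combined tuple as the concatenation of the two groups means a task added to either group is picked up by the family-wide check without a second edit.
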
- ImageTaskCodegen supplies payload + parse Python for all 9 tasks
- TaskCodegen trait gains a `tasks: Set[String]` default method so a
single codegen can register under multiple task strings; the
dispatcher map in HuggingFaceInferenceOpDesc is built from
registeredCodegens.tasks.flatMap(...)
- CodegenContext extended with imageInput + inputImageColumn
(EncodableString)
- HuggingFaceInferenceOpDesc gains 2 new @JsonProperty fields and
registers ImageTaskCodegen
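
The multi-task registration described in the bullets above can be illustrated with a small Python analogue. The real trait is Scala, so everything here is an assumption made for illustration: the class names, the `extra_tasks` field, the "text-generation" example task, and the duplicate-task check are not taken from the PR. The point is only the shape of the pattern: a default `tasks` set containing the single primary task, an override that returns several, and a dispatcher map flattened from all registered codegens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCodegen:
    """Hypothetical stand-in for the Scala TaskCodegen trait."""

    task: str

    @property
    def tasks(self) -> frozenset:
        # Default behaviour: register only under the single primary task.
        return frozenset({self.task})


@dataclass(frozen=True)
class MultiTaskCodegen(TaskCodegen):
    """Hypothetical codegen that overrides `tasks` to cover several strings."""

    extra_tasks: tuple = ()

    @property
    def tasks(self) -> frozenset:
        return frozenset({self.task, *self.extra_tasks})


def build_dispatcher(registered):
    """Flatten every codegen's task set into one task -> codegen map."""
    dispatcher = {}
    for codegen in registered:
        for task in codegen.tasks:
            # Assumption: two codegens claiming one task is treated as an error.
            if task in dispatcher:
                raise ValueError(f"task registered twice: {task!r}")
            dispatcher[task] = codegen
    return dispatcher
```

With this shape, a single-task codegen needs no changes when a multi-task one is added, because the default already yields a one-element set.
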
PythonCodegenBase grows to host the shared image infrastructure:
- image_only_tasks / image_prompt_tasks / image_tasks tuples and
image_headers in process_table
- per-row image bytes resolution from upload (self._read_image_input)
or input column (self._read_binary_value + self._compress_image_bytes)
- use_raw_binary_body / raw_binary_headers state threaded through
_post_with_fallback (signature extended)
- _post_with_fallback adds the image-text-to-text chat-completions
branch and the model-author vision branch
- _call_provider adds branches for zai-org's custom API, Replicate
predictions + polling, Fal-ai, Wavespeed submit+poll, and image
embedding in OpenAI-compatible / unknown-provider fallbacks
- image-content-type response handling returns data:image URLs
- image helpers added: _read_image_input, _compress_image_bytes,
_image_input_as_base64, _read_binary_value, _looks_like_html,
_html_to_image_bytes, _extract_json_arg, _url_to_data_url
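
As an illustration of the image-content-type handling in the list above, here is a minimal sketch of turning a binary image response into a data:image URL. The function name `to_data_url` and its validation rules are assumptions made for this example; they are not the PR's `_url_to_data_url` or its response-handling code.

```python
import base64


def to_data_url(content_type: str, body: bytes) -> str:
    """Encode raw image bytes as a base64 data: URL.

    Hypothetical helper; the PR's own helpers may behave differently.
    """
    # Drop parameters such as "; charset=..." and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if not mime.startswith("image/"):
        raise ValueError(f"not an image content type: {content_type!r}")
    encoded = base64.b64encode(body).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Returning a data URL keeps the result a plain string, so it can sit in the same output column as text results from other tasks.
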
User-input strings continue to flow through pyb"..." + EncodableString
so they reach Python as self.decode_python_template('<base64>') rather
than raw literals. PythonCodeRawInvalidTextSpec still passes
(117/117 descriptors py_compile cleanly).
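
To show why routing user input through a base64 token is safer than emitting raw literals, here is a small self-contained sketch. It is not the project's implementation: `encode_for_template`, `render_assignment`, and this free-function version of `decode_python_template` are hypothetical stand-ins, and the real encoding used by pyb"..." and EncodableString may differ.

```python
import base64


def encode_for_template(user_text: str) -> str:
    """Generator side: turn arbitrary user text into an ASCII-only token."""
    return base64.b64encode(user_text.encode("utf-8")).decode("ascii")


def decode_python_template(token: str) -> str:
    """Runtime side: hypothetical stand-in for the method named above."""
    return base64.b64decode(token).decode("utf-8")


def render_assignment(name: str, user_text: str) -> str:
    """Emit one line of Python source that never contains the raw user text."""
    token = encode_for_template(user_text)
    # The base64 alphabet has no quotes, backslashes, or newlines, so the
    # token cannot terminate the string literal it is embedded in.
    return f"{name} = decode_python_template('{token}')"
```

Because the emitted source contains only the token, a prompt with quotes or newlines still yields a line that compiles, and the original text is recovered at run time.
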
Frontend integration adds only the HF lines (no agent / dataset
noise from the source branch):
- HuggingFaceImageUploadComponent declared in app.module.ts
- huggingface-image-upload formly type registered in formly-config.ts
- Image upload component .ts/.html/.scss cherry-picked from huggingFace
- HuggingFace.png + sample-image.png assets
PR 3 of a stacked 9-PR series. Stacks on hf/02-operator-textgen.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tests in HuggingFaceInferenceOpDescSpec for the fixes
dd644d4 to
5e0df3e
What changes were proposed in this PR?

Adds the image task family (9 HF pipeline tasks) as the second `TaskCodegen` plugged into the dispatcher established by #5278:

- image-only: image-classification, object-detection, image-segmentation, image-to-text
- image + prompt: visual-question-answering, document-question-answering, zero-shot-image-classification, image-text-to-text, image-to-image

- `codegen/ImageTaskCodegen.scala` supplies the per-task payload + parse Python branches for all 9 tasks.
- The `TaskCodegen` trait gains a `tasks: Set[String]` default method (defaults to `Set(task)`) so a single codegen can register under multiple task strings; `ImageTaskCodegen` is the first multi-task codegen to use it.
- `CodegenContext` is extended with `imageInput` + `inputImageColumn` (`EncodableString`).
- `HuggingFaceInferenceOpDesc.scala` gains 2 new `@JsonProperty` fields and registers `ImageTaskCodegen` via the new `tasks` flat-map.

`PythonCodegenBase.scala` grows to host the shared image infrastructure:

- Task tuples (`image_only_tasks`, `image_prompt_tasks`, `image_tasks`) + `image_headers` in `process_table`.
- Per-row image bytes resolution from the upload or the input column via `_read_image_input` / `_read_binary_value` / `_compress_image_bytes`.
- `_post_with_fallback` is extended with `raw_binary_headers` + `use_raw_binary_body`, and adds the image-text-to-text chat-completions and model-author vision branches.
- `_call_provider` gains zai-org, Replicate predictions + polling, Fal-ai, and Wavespeed submit+poll branches, plus image embedding for OpenAI-compatible / unknown-provider fallbacks.
- Image-content-type response handling returns `data:image/...;base64,...` URLs.
- Image helpers: `_read_image_input`, `_compress_image_bytes`, `_image_input_as_base64`, `_read_binary_value`, `_looks_like_html`, `_html_to_image_bytes`, `_extract_json_arg`, `_url_to_data_url`.

Frontend integration (HF lines only; no agent / dataset noise):

- `HuggingFaceImageUploadComponent` declared in `app.module.ts`, `huggingface-image-upload` formly type registered, image upload component .ts/.html/.scss + `HuggingFace.png` + `sample-image.png` assets.

User-input strings continue to flow through `pyb"..."` + `EncodableString` so they reach Python as `self.decode_python_template('<base64>')` rather than raw literals. `PythonCodeRawInvalidTextSpec` still passes (117/117 descriptors `py_compile` cleanly).

Any related issues, documentation, or discussions?

How was this PR tested?

- `sbt "WorkflowOperator/compile; WorkflowOperator/Test/compile"` clean.
- `sbt scalafmtCheck` clean.
- `sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.huggingFace.HuggingFaceInferenceOpDescSpec"`: 18/18 pass (PR 2's 13 spec tests + 5 new image-task tests: image-only routing, VQA / document-QA payload, image-text-to-text chat-completions, image-to-image data-URL parse, all-9-tasks dispatcher coverage).
- `sbt "WorkflowOperator/testOnly org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"`: 117/117 descriptors `py_compile` cleanly with the new operator code paths, no marker leaks.
- `python3 -m py_compile` on sample image-task outputs.

Was this PR authored or co-authored using generative AI tooling?

Yes, co-authored with Claude Opus 4.7.
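
For readers who want to reproduce the `py_compile` step from the testing notes on their own generated output, a minimal sketch follows. It assumes the generated operator code is available as a string; the helper name `compiles_cleanly` and the temporary file names are hypothetical and not part of the PR or of `PythonCodeRawInvalidTextSpec`.

```python
import os
import py_compile
import tempfile


def compiles_cleanly(source: str) -> bool:
    """Return True when `source` byte-compiles without a syntax error."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "generated_operator.py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(source)
        try:
            # doraise=True turns compile failures into PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(
                path, cfile=os.path.join(tmp, "out.pyc"), doraise=True
            )
        except py_compile.PyCompileError:
            return False
    return True
```

This only checks that the source parses and byte-compiles; it does not import or run the generated operator, so runtime errors in provider calls would not be caught by it.
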