feat(huggingFace): add image task family via ImageTaskCodegen#5320

Open: PG1204 wants to merge 9 commits into apache:main from ELin2025:hf/03-image-tasks

PG1204 commented Jun 3, 2026

⚠️ This PR is stacked on #5278. Until that lands, the diff below also includes #5278's operator + codegen + spec changes. The new code in this PR is codegen/ImageTaskCodegen.scala, the image-related additions to codegen/PythonCodegenBase.scala, the new image fields on HuggingFaceInferenceOpDesc.scala, the frontend image-upload component, and the image-task tests in HuggingFaceInferenceOpDescSpec.scala. Once #5278 merges, this diff will auto-clean to ~856 lines.

What changes were proposed in this PR?

Adds the image task family — 9 HF pipeline tasks — as the second TaskCodegen plugged into the dispatcher established by #5278:

image-only: image-classification, object-detection, image-segmentation, image-to-text
image + prompt: visual-question-answering, document-question-answering, zero-shot-image-classification, image-text-to-text, image-to-image

  • codegen/ImageTaskCodegen.scala supplies the per-task payload + parse Python branches for all 9 tasks.
  • TaskCodegen trait gains a tasks: Set[String] default method (defaults to Set(task)) so a single codegen can register under multiple task strings; ImageTaskCodegen is the first multi-task codegen to use it.
  • CodegenContext extended with imageInput + inputImageColumn (EncodableString).
  • HuggingFaceInferenceOpDesc.scala gains 2 new @JsonProperty fields and registers ImageTaskCodegen via the new tasks flat-map.
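
The multi-task registration described above can be sketched as follows. This is a minimal Python analogue of the idea, not the PR's Scala code: the class names, the `build_dispatcher` helper, and the `text-generation` task string are all assumptions made for illustration. It only shows how a default `tasks` set of one element plus a flat-map over all registered codegens yields a single task-to-codegen lookup.

```python
class TaskCodegen:
    """Hypothetical Python stand-in for the Scala TaskCodegen trait."""

    task: str = ""

    @property
    def tasks(self) -> frozenset:
        # Default behaviour: register under the single `task` string.
        return frozenset({self.task})


class TextGenCodegen(TaskCodegen):
    # Assumed task name for the codegen added by the earlier PR.
    task = "text-generation"


class ImageCodegen(TaskCodegen):
    task = "image-classification"

    @property
    def tasks(self) -> frozenset:
        # Override: one codegen serves several task strings (subset shown).
        return frozenset(
            {"image-classification", "object-detection", "image-to-text"}
        )


def build_dispatcher(codegens):
    """Flat-map every codegen's task set into one task -> codegen lookup."""
    dispatcher = {}
    for codegen in codegens:
        for task in codegen.tasks:
            if task in dispatcher:
                # Two codegens claiming the same task is a wiring error.
                raise ValueError(f"duplicate registration for task {task!r}")
            dispatcher[task] = codegen
    return dispatcher


text_codegen, image_codegen = TextGenCodegen(), ImageCodegen()
dispatcher = build_dispatcher([text_codegen, image_codegen])
```

Rejecting duplicate registrations is a design choice of this sketch; the PR description does not say how the real dispatcher treats collisions.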

PythonCodegenBase.scala grows to host the shared image infrastructure:

  • Task-family tuples (image_only_tasks, image_prompt_tasks, image_tasks) + image_headers in process_table.
  • Per-row image-bytes resolution from upload or column with _read_image_input / _read_binary_value / _compress_image_bytes.
  • _post_with_fallback extended with raw_binary_headers + use_raw_binary_body; adds image-text-to-text chat-completions and model-author vision branches.
  • _call_provider gains zai-org, Replicate predictions + polling, Fal-ai, Wavespeed submit+poll branches, and image embedding for OpenAI-compatible / unknown-provider fallbacks.
  • Image content-type response handling returns data:image/...;base64,... URLs.
  • Image helpers added: _read_image_input, _compress_image_bytes, _image_input_as_base64, _read_binary_value, _looks_like_html, _html_to_image_bytes, _extract_json_arg, _url_to_data_url.
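
As a rough illustration of two items in this list, the sketch below groups the nine task strings into family tuples and turns raw image bytes into a `data:image/...;base64,...` URL. The task strings come from the PR description; the constant names, `needs_prompt`, and `image_bytes_to_data_url` are hypothetical and are not claimed to match the helpers in PythonCodegenBase.scala.

```python
import base64

# Task families as listed in the PR description.
IMAGE_ONLY_TASKS = (
    "image-classification",
    "object-detection",
    "image-segmentation",
    "image-to-text",
)
IMAGE_PROMPT_TASKS = (
    "visual-question-answering",
    "document-question-answering",
    "zero-shot-image-classification",
    "image-text-to-text",
    "image-to-image",
)
IMAGE_TASKS = IMAGE_ONLY_TASKS + IMAGE_PROMPT_TASKS


def needs_prompt(task: str) -> bool:
    """True when the task takes a text prompt alongside the image."""
    return task in IMAGE_PROMPT_TASKS


def image_bytes_to_data_url(image_bytes: bytes, content_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    if not content_type.startswith("image/"):
        raise ValueError(f"not an image content type: {content_type!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{content_type};base64,{encoded}"
```

A caller could branch on `needs_prompt(task)` to decide whether the payload needs a question or label list in addition to the image.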

Frontend integration (HF lines only — no agent / dataset noise):
HuggingFaceImageUploadComponent declared in app.module.ts, huggingface-image-upload formly type registered, image upload component .ts/.html/.scss + HuggingFace.png + sample-image.png assets.

User-input strings continue to flow through pyb"..." + EncodableString so they reach Python as self.decode_python_template('<base64>') rather than raw literals. PythonCodeRawInvalidTextSpec still passes (117/117 descriptors py_compile cleanly).
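
To see why routing user input through base64 matters, here is a minimal sketch under stated assumptions. It uses a free function named `decode_python_template` as a stand-in for the method the generated code calls on `self`, and `emit_assignment` is a hypothetical emitter, not the `pyb"..."` interpolator itself. Because the base64 alphabet contains no quotes, parentheses, or newlines, a hostile string cannot break out of the generated literal.

```python
import base64


def decode_python_template(encoded: str) -> str:
    """Stand-in for the runtime decoder the generated code calls (assumed)."""
    return base64.b64decode(encoded).decode("utf-8")


def emit_assignment(var_name: str, user_value: str) -> str:
    """Emit one line of Python that restores `user_value` at runtime.

    Hypothetical emitter: only the base64 text is spliced into the source.
    """
    encoded = base64.b64encode(user_value.encode("utf-8")).decode("ascii")
    return f"{var_name} = decode_python_template('{encoded}')"


# A value that would inject code if pasted into the source as a raw literal.
hostile = "'); import os; os.system('echo pwned'); ('"
source = emit_assignment("prompt", hostile)

# The generated line compiles and evaluates to the original string.
namespace = {"decode_python_template": decode_python_template}
exec(compile(source, "<generated>", "exec"), namespace)
```

After running this, `namespace["prompt"]` holds the hostile text as plain data, and the generated source contains no trace of the `os.system` call.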

Any related issues, documentation, or discussions?

How was this PR tested?

  • sbt "WorkflowOperator/compile; WorkflowOperator/Test/compile" clean.
  • sbt scalafmtCheck clean.
  • sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.huggingFace.HuggingFaceInferenceOpDescSpec" — 18/18 pass (PR 2's 13 spec tests + 5 new image-task tests: image-only routing, VQA / document-QA payload, image-text-to-text chat-completions, image-to-image data-URL parse, all-9-tasks dispatcher coverage).
  • sbt "WorkflowOperator/testOnly org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec" — 117/117 descriptors py_compile cleanly with the new operator code paths, no marker leaks.
  • Generated Python verified via python3 -m py_compile on sample image-task outputs.
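
The `py_compile` check in the last bullet can be reproduced programmatically. The sketch below is one possible way to do it with the standard library; `compiles_cleanly` and the file name are hypothetical, and it is not the harness used by PythonCodeRawInvalidTextSpec.

```python
import os
import py_compile
import tempfile


def compiles_cleanly(source: str) -> bool:
    """Return True if `source` byte-compiles without a syntax error."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "generated_operator.py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(source)
        try:
            # doraise=True turns compile failures into PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(
                path,
                cfile=os.path.join(tmp, "generated_operator.pyc"),
                doraise=True,
            )
        except py_compile.PyCompileError:
            return False
        return True
```

This only proves the generated text is syntactically valid Python; it does not execute the code or call any provider.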

Was this PR authored or co-authored using generative AI tooling?

Yes, co-authored with Claude Opus 4.7.

github-actions Bot added the frontend and common labels Jun 3, 2026

codecov-commenter commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 60.58394% with 54 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.03%. Comparing base (33903e1) to head (b56f3ac).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...mage-upload/hugging-face-image-upload.component.ts | 50.00% | 41 Missing and 1 partial ⚠️ |
| ...ge-upload/hugging-face-image-upload.component.html | 38.88% | 11 Missing ⚠️ |
| ...ber/operator/huggingFace/codegen/TaskCodegen.scala | 75.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##               main    #5320    +/-   ##
==========================================
  Coverage     53.02%   53.03%            
  Complexity     2657     2657            
==========================================
  Files          1094     1097     +3     
  Lines         42286    42420   +134     
  Branches       4541     4556    +15     
==========================================
+ Hits          22423    22496    +73     
- Misses        18554    18610    +56     
- Partials       1309     1314     +5     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| access-control-service | 70.91% <ø> (ø) | |
| agent-service | 34.36% <ø> (ø) | Carriedforward from 9ab3e60 |
| amber | 53.42% <97.05%> (+0.03%) ⬆️ | |
| computing-unit-managing-service | 1.65% <ø> (ø) | |
| config-service | 56.71% <ø> (ø) | |
| file-service | 57.06% <ø> (ø) | |
| frontend | 47.87% <48.54%> (+<0.01%) ⬆️ | |
| pyamber | 89.77% <ø> (ø) | Carriedforward from 9ab3e60 |
| python | 90.73% <ø> (ø) | Carriedforward from 9ab3e60 |
| workflow-compiling-service | 58.69% <ø> (ø) | |

*This pull request uses carry forward flags.

PG1204 commented Jun 3, 2026

/request-review @Ma77Ball

Ma77Ball left a comment

Please look at the suggestions below.

PG1204 force-pushed the hf/03-image-tasks branch from 8187ac1 to 76f606a Compare June 5, 2026 20:15

github-actions Bot commented Jun 12, 2026

✅ No material benchmark regressions detected

🟢 6 better · 🔴 0 worse · ⚪ 9 noise (<±5%) · 0 without baseline

Compared against main 33903e1 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

| config | throughput | MB/s | latency | max Δ latest / 7d |
|---|---|---|---|---|
| 🟢 bs=10 sw=10 sl=64 | 437 | 0.267 | 21,470/32,276/32,276 us | 🟢 -5.3% / 🟢 -9.7% |
| 🟢 bs=100 sw=10 sl=64 | 953 | 0.581 | 106,635/120,610/120,610 us | 🟢 -23.1% / 🟢 -13.7% |
| 🟢 bs=1000 sw=10 sl=64 | 1,119 | 0.683 | 901,582/922,952/922,952 us | 🟢 -7.4% / 🟢 -9.8% |

Baseline details

Latest main 33903e1 from same runner

| config | metric | PR | latest main | 7d avg | Δ latest | Δ 7d |
|---|---|---|---|---|---|---|
| bs=10 sw=10 sl=64 | throughput | 437 tuples/sec | 442 tuples/sec | 410.82 tuples/sec | -1.1% | +6.4% |
| bs=10 sw=10 sl=64 | MB/s | 0.267 MB/s | 0.27 MB/s | 0.251 MB/s | -1.1% | +6.5% |
| bs=10 sw=10 sl=64 | p50 | 21,470 us | 20,615 us | 23,785 us | +4.1% | -9.7% |
| bs=10 sw=10 sl=64 | p95 | 32,276 us | 34,099 us | 34,980 us | -5.3% | -7.7% |
| bs=10 sw=10 sl=64 | p99 | 32,276 us | 34,099 us | 34,980 us | -5.3% | -7.7% |
| bs=100 sw=10 sl=64 | throughput | 953 tuples/sec | 930 tuples/sec | 891.94 tuples/sec | +2.5% | +6.8% |
| bs=100 sw=10 sl=64 | MB/s | 0.581 MB/s | 0.567 MB/s | 0.544 MB/s | +2.5% | +6.7% |
| bs=100 sw=10 sl=64 | p50 | 106,635 us | 104,082 us | 112,277 us | +2.5% | -5.0% |
| bs=100 sw=10 sl=64 | p95 | 120,610 us | 156,890 us | 139,802 us | -23.1% | -13.7% |
| bs=100 sw=10 sl=64 | p99 | 120,610 us | 156,890 us | 139,802 us | -23.1% | -13.7% |
| bs=1000 sw=10 sl=64 | throughput | 1,119 tuples/sec | 1,107 tuples/sec | 1,041 tuples/sec | +1.1% | +7.5% |
| bs=1000 sw=10 sl=64 | MB/s | 0.683 MB/s | 0.676 MB/s | 0.635 MB/s | +1.0% | +7.5% |
| bs=1000 sw=10 sl=64 | p50 | 901,582 us | 893,996 us | 972,714 us | +0.8% | -7.3% |
| bs=1000 sw=10 sl=64 | p95 | 922,952 us | 996,521 us | 1,023,057 us | -7.4% | -9.8% |
| bs=1000 sw=10 sl=64 | p99 | 922,952 us | 996,521 us | 1,023,057 us | -7.4% | -9.8% |

Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,457.24,200,128000,437,0.267,21470.16,32275.64,32275.64
1,100,10,64,20,2099.62,2000,1280000,953,0.581,106634.63,120609.75,120609.75
2,1000,10,64,20,17874.12,20000,12800000,1119,0.683,901581.83,922951.99,922951.99

PG1204 and others added 6 commits June 15, 2026 17:48
Plugs the 9-task image family into the dispatcher pattern established
in PR 2:

  image-only      image-classification, object-detection,
                  image-segmentation, image-to-text
  image + prompt  visual-question-answering, document-question-answering,
                  zero-shot-image-classification, image-text-to-text,
                  image-to-image

- ImageTaskCodegen supplies payload + parse Python for all 9 tasks
- TaskCodegen trait gains a `tasks: Set[String]` default method so a
  single codegen can register under multiple task strings; the
  dispatcher map in HuggingFaceInferenceOpDesc is built from
  registeredCodegens.tasks.flatMap(...)
- CodegenContext extended with imageInput + inputImageColumn
  (EncodableString)
- HuggingFaceInferenceOpDesc gains 2 new @JsonProperty fields and
  registers ImageTaskCodegen

PythonCodegenBase grows to host the shared image infrastructure:
- image_only_tasks / image_prompt_tasks / image_tasks tuples and
  image_headers in process_table
- per-row image bytes resolution from upload (self._read_image_input)
  or input column (self._read_binary_value + self._compress_image_bytes)
- use_raw_binary_body / raw_binary_headers state threaded through
  _post_with_fallback (signature extended)
- _post_with_fallback adds the image-text-to-text chat-completions
  branch and the model-author vision branch
- _call_provider adds branches for zai-org's custom API, Replicate
  predictions + polling, Fal-ai, Wavespeed submit+poll, and image
  embedding in OpenAI-compatible / unknown-provider fallbacks
- image-content-type response handling returns data:image URLs
- image helpers added: _read_image_input, _compress_image_bytes,
  _image_input_as_base64, _read_binary_value, _looks_like_html,
  _html_to_image_bytes, _extract_json_arg, _url_to_data_url

User-input strings continue to flow through pyb"..." + EncodableString
so they reach Python as self.decode_python_template('<base64>') rather
than raw literals. PythonCodeRawInvalidTextSpec still passes
(117/117 descriptors py_compile cleanly).

Frontend integration adds only the HF lines (no agent / dataset
noise from the source branch):
- HuggingFaceImageUploadComponent declared in app.module.ts
- huggingface-image-upload formly type registered in formly-config.ts
- Image upload component .ts/.html/.scss cherry-picked from huggingFace
- HuggingFace.png + sample-image.png assets

PR 3 of a stacked 9-PR series. Stacks on hf/02-operator-textgen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tests in HuggingFaceInferenceOpDescSpec for the fixes
PG1204 force-pushed the hf/03-image-tasks branch from dd644d4 to 5e0df3e Compare June 16, 2026 00:50

Labels

common, frontend

Development

Successfully merging this pull request may close these issues.

Add image task family (ImageTaskCodegen) to HuggingFace operator

3 participants