Description
Goal: Help developers identify where a regression originates in Ray Data LLM jobs.
Approach: In Ray Data LLM's release tests, we run a benchmark with a single-GPU generation workload. When a regression occurs, it is not immediately clear whether it originates from the vLLM engine or from Ray Data LLM. Once #60385 lands, we should be able to profile the vLLM engine (e.g. TPOT, token throughput, E2E request latency) and print those metrics in the benchmark output, so it is clear whether the regression comes from the vLLM engine. We can also profile the Ray Data LLM job itself.
Relevant file: https://github.com/ray-project/ray/blob/ad1b87448fec4db7ef11f1697f9bc02ae6a7ba09/release/llm_tests/batch/test_batch_single_node_vllm.py
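A minimal sketch of the kind of aggregation this could add to the release test: reduce per-request engine stats into summary metrics (TPOT, token throughput, E2E latency) and print them alongside the existing benchmark output. The row field names (`e2e_latency_s`, `decode_time_s`, `num_generated_tokens`) and the helper itself are hypothetical, not the actual schema emitted by Ray Data LLM or the vLLM engine.

```python
# Hypothetical sketch: aggregate per-request engine stats into benchmark output.
# Field names below are assumptions, not the real vLLM / Ray Data LLM schema.
import statistics
from typing import Dict, List


def summarize_engine_metrics(
    rows: List[Dict[str, float]], wall_clock_s: float
) -> Dict[str, float]:
    """Reduce per-request stats into summary metrics to print with the benchmark."""
    e2e = [r["e2e_latency_s"] for r in rows]
    tokens = [r["num_generated_tokens"] for r in rows]
    # TPOT approximated as decode time divided by generated tokens per request.
    tpot = [r["decode_time_s"] / max(r["num_generated_tokens"], 1) for r in rows]
    return {
        "mean_e2e_latency_s": statistics.mean(e2e),
        "p99_e2e_latency_s": sorted(e2e)[int(0.99 * (len(e2e) - 1))],
        "mean_tpot_s": statistics.mean(tpot),
        "token_throughput_tok_per_s": sum(tokens) / wall_clock_s,
    }


if __name__ == "__main__":
    # Dummy rows standing in for what the profiled vLLM engine would report.
    fake_rows = [
        {"e2e_latency_s": 1.2, "decode_time_s": 0.9, "num_generated_tokens": 128},
        {"e2e_latency_s": 1.5, "decode_time_s": 1.1, "num_generated_tokens": 160},
    ]
    print(summarize_engine_metrics(fake_rows, wall_clock_s=2.0))
```

Printing a breakdown like this in the benchmark output would make it straightforward to tell whether a throughput drop is accompanied by a change in engine-level latency (pointing at vLLM) or not (pointing at the Ray Data LLM layer).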
Hardware requirement: You will need a GPU for this task.
Use case
No response