ray: refactor ray wrapper impl#1016

Merged
DefTruth merged 2 commits into main from dev
May 25, 2026

Conversation

@DefTruth (Member) commented May 25, 2026

python3 examples/ray/ray_wrapper_example.py \
  --model-path $FLUX_2_KLEIN_BASE_9B_DIR \
  --tp 2 \
  --compile --cache \
  --save-path ./tmp/ray_cache.png
Unable to import `torchao` Tensor objects. This may affect loading checkpoints serialized with `torchao`
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 124.37it/s]
Loading weights: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 399/399 [00:00<00:00, 9971.56it/s]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00,  3.91it/s]
[05-25 12:04:24] [Cache-DiT] Auto selected parallelism backend for transformer: Native_PyTorch
2026-05-25 12:04:32,681	INFO worker.py:2013 -- Started a local Ray instance.
[05-25 12:05:24] [Cache-DiT] Ray pipeline worker placement before load: [{'rank': 0, 'device': 'cuda', 'cuda_visible_devices': '6', 'memory_allocated_mib': 0, 'memory_reserved_mib': 0}, {'rank': 1, 'device': 'cuda', 'cuda_visible_devices': '7', 'memory_allocated_mib': 0, 'memory_reserved_mib': 0}]
[05-25 12:05:24] [Cache-DiT] The main-process pipeline is already on CPU before Ray worker loading.
[05-25 12:05:24] [Cache-DiT] Saving the current pipeline snapshot for Ray workers to /workspace/dev/vipshop/cache-dit/.tmp/cache_dit_ray/5ff92be54fe645cfb4295beaa964da06/pipeline.
Writing model shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:17<00:00, 17.73s/it]
[05-25 12:06:02] [Cache-DiT] Saved the pipeline snapshot in 37.61s.
Loading pipeline components...:   0%|          | 0/5 [00:00<?, ?it/s]
Loading weights:   0%|          | 0/399 [00:00<?, ?it/s]
Loading weights: 100%|██████████| 399/399 [00:00<00:00, 9397.09it/s]
Loading pipeline components...:  20%|██        | 1/5 [00:01<00:05,  1.44s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 106.41it/s]
Loading weights: 100%|██████████| 399/399 [00:00<00:00, 9568.15it/s]
Loading pipeline components...: 100%|██████████| 5/5 [00:01<00:00,  3.01it/s]
(RayPipelineWorker pid=3899811) [05-25 12:06:13] [Cache-DiT] Flux2KleinPipeline is officially supported by cache-dit. Use it's pre-defined BlockAdapter directly!
(RayPipelineWorker pid=3899811) [05-25 12:06:13] [Cache-DiT] Auto fill blocks_name: ['transformer_blocks', 'single_transformer_blocks'].
(RayPipelineWorker pid=3899811) [05-25 12:06:13] [Cache-DiT] Match Block Forward Pattern: Flux2TransformerBlock, ForwardPattern.Pattern_1
(RayPipelineWorker pid=3899811) [05-25 12:06:13] [Cache-DiT] IN:('hidden_states', 'encoder_hidden_states'), OUT:('encoder_hidden_states', 'hidden_states'))
(RayPipelineWorker pid=3899811) [05-25 12:06:13] [Cache-DiT] Match Block Forward Pattern: Flux2SingleTransformerBlock, ForwardPattern.Pattern_3
(RayPipelineWorker pid=3899811) [05-25 12:06:13] [Cache-DiT] IN:('hidden_states',), OUT:('hidden_states',))
(RayPipelineWorker pid=3899811) [05-25 12:06:13] [Cache-DiT] Use default 'enable_separate_cfg' from block adapter register: False, Pipeline: Flux2KleinPipeline.
(RayPipelineWorker pid=3899811) [05-25 12:06:13] [Cache-DiT] Collected Context Config: DBCache_F1B0_W8I1M0MC0_R0.08_CFG0, Calibrator Config: None
(RayPipelineWorker pid=3899811) [05-25 12:06:13] [Cache-DiT] Collected Context Config: DBCache_F1B0_W8I1M0MC0_R0.08_CFG0, Calibrator Config: None
(RayPipelineWorker pid=3899811) [05-25 12:06:13] [Cache-DiT] Match Blocks: CachedBlocks_Pattern_0_1_2, for transformer_blocks, cache_context: transformer_blocks_139639897993680, context_manager: Flux2KleinPipeline_139641127455344.
(RayPipelineWorker pid=3899811) [05-25 12:06:13] [Cache-DiT] Match Blocks: CachedBlocks_Pattern_3_4_5, for single_transformer_blocks, cache_context: single_transformer_blocks_139639898257696, context_manager: Flux2KleinPipeline_139641127455344.
(RayPipelineWorker pid=3899811) [05-25 12:06:13] [Cache-DiT] Auto fill blocks_name: ['transformer_blocks', 'single_transformer_blocks'].
(RayPipelineWorker pid=3899811) [05-25 12:06:13] [Cache-DiT] Registered custom attn backends: native, flash, _sdpa_cudnn, sage, _native_npu, _npu_fia.
(RayPipelineWorker pid=3899811) [05-25 12:06:20] [Cache-DiT] Parallelize Transformer: Flux2Transformer2DModel, id:139639897993584, ParallelismConfig(backend=Native_PyTorch, tp_size=2)
(RayPipelineWorker pid=3899811) [05-25 12:06:22] [Cache-DiT] Compiling Ray-owned transformer with compile_repeated_blocks().
[05-25 12:06:23] [Cache-DiT] Loaded saved pipeline snapshots on Ray workers in 58.31s.
[05-25 12:06:23] [Cache-DiT] Ray pipeline worker placement after load: [{'rank': 0, 'device': 'cuda', 'cuda_visible_devices': '6', 'memory_allocated_mib': 24786, 'memory_reserved_mib': 25686}, {'rank': 1, 'device': 'cuda', 'cuda_visible_devices': '7', 'memory_allocated_mib': 24786, 'memory_reserved_mib': 25686}]
[05-25 12:06:23] [Cache-DiT] Enabled Ray parallelism for Flux2KleinPipeline with world_size=2.
Warmup: 1
Repeat: 1
Total Inference Time: 17.73s
Average Inference Time: 17.73s
Saved image to tmp/ray_cache.png
[05-25 12:07:10] [Cache-DiT] Acceleration hooks is disabled for: CacheDitRayFlux2KleinPipeline.
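
The two "worker placement" lines in the log above can be compared programmatically to see how much memory each rank allocated during loading. The following is a minimal sketch, assuming the log format stays exactly as shown; `parse_placement` and `allocated_delta_mib` are hypothetical helper names for illustration and are not part of cache-dit.

```python
import ast
import re

# Placement lines copied verbatim from the log above.
BEFORE = (
    "[05-25 12:05:24] [Cache-DiT] Ray pipeline worker placement before load: "
    "[{'rank': 0, 'device': 'cuda', 'cuda_visible_devices': '6', "
    "'memory_allocated_mib': 0, 'memory_reserved_mib': 0}, "
    "{'rank': 1, 'device': 'cuda', 'cuda_visible_devices': '7', "
    "'memory_allocated_mib': 0, 'memory_reserved_mib': 0}]"
)
AFTER = (
    "[05-25 12:06:23] [Cache-DiT] Ray pipeline worker placement after load: "
    "[{'rank': 0, 'device': 'cuda', 'cuda_visible_devices': '6', "
    "'memory_allocated_mib': 24786, 'memory_reserved_mib': 25686}, "
    "{'rank': 1, 'device': 'cuda', 'cuda_visible_devices': '7', "
    "'memory_allocated_mib': 24786, 'memory_reserved_mib': 25686}]"
)

# Assumption: the list of per-rank dicts is printed as a Python literal
# at the end of the line, as it appears in this log.
PLACEMENT_RE = re.compile(r"worker placement (before|after) load: (\[.*\])\s*$")


def parse_placement(line):
    """Return (phase, list of per-rank dicts) for a placement log line."""
    match = PLACEMENT_RE.search(line)
    if match is None:
        raise ValueError("not a worker placement line")
    return match.group(1), ast.literal_eval(match.group(2))


def allocated_delta_mib(before_line, after_line):
    """Map each rank to the growth in memory_allocated_mib across loading."""
    _, before = parse_placement(before_line)
    _, after = parse_placement(after_line)
    base = {w["rank"]: w["memory_allocated_mib"] for w in before}
    return {w["rank"]: w["memory_allocated_mib"] - base[w["rank"]] for w in after}


if __name__ == "__main__":
    print(allocated_delta_mib(BEFORE, AFTER))
```

For this run, both ranks report the same growth (24786 MiB allocated each), which is what the before/after lines show directly.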

fixed #1010

@DefTruth DefTruth merged commit 1e22079 into main May 25, 2026
4 checks passed
@DefTruth DefTruth deleted the dev branch May 25, 2026 12:19
Development

Successfully merging this pull request may close these issues.

Better ray wrapper integration

1 participant