[ci] Migrate stress test and placement group compute configs to new schema #62607

Open
sai-miduthuri wants to merge 5 commits into master from sai-miduthuri/upgrade-stress-pg-compute-configs

@sai-miduthuri
Contributor

Summary

Migrates 15 Anyscale compute config files from the legacy schema to the new SDK 2026 schema, and adds anyscale_sdk_2026: true to all corresponding test entries in release_tests.yaml.

Compute configs migrated (15 files)

Stress tests (release/nightly_tests/stress_tests/):

  • stress_tests_compute.yaml / stress_tests_compute_gce.yaml
  • stress_tests_compute_large.yaml / stress_tests_compute_large_gce.yaml
  • smoke_test_compute.yaml / smoke_test_compute_gce.yaml
  • stress_test_threaded_actor_compute.yaml
  • placement_group_tests_compute.yaml / placement_group_tests_compute_gce.yaml
  • stress_tests_single_node_oom_compute.yaml / stress_tests_single_node_oom_compute_gce.yaml

Placement group tests (release/nightly_tests/placement_group_tests/):

  • compute.yaml / compute_gce.yaml
  • pg_perf_test_compute.yaml / pg_perf_test_compute_gce.yaml

Tests updated in release_tests.yaml (9 tests)

  1. stress_test_placement_group
  2. stress_test_state_api_scale
  3. stress_test_many_tasks
  4. stress_test_dead_actors
  5. threaded_actors_stress_test
  6. stress_test_many_runtime_envs
  7. single_node_oom
  8. pg_autoscaling_regression_test
  9. placement_group_performance_test
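
As a rough illustration of the `release_tests.yaml` change, the sketch below sets the flag on the nine tests listed above. It assumes the file has already been parsed into a list of dictionaries that each carry a `name` key, and it places `anyscale_sdk_2026` at the top level of each entry; both are assumptions, and the real file may nest the flag elsewhere. The helper `enable_sdk_2026` is hypothetical and is not the tooling used in this PR.

```python
# Test names copied from the list in the PR description.
MIGRATED_TESTS = {
    "stress_test_placement_group",
    "stress_test_state_api_scale",
    "stress_test_many_tasks",
    "stress_test_dead_actors",
    "threaded_actors_stress_test",
    "stress_test_many_runtime_envs",
    "single_node_oom",
    "pg_autoscaling_regression_test",
    "placement_group_performance_test",
}


def enable_sdk_2026(tests, names=MIGRATED_TESTS):
    """Return copies of the test entries with the flag set on the named tests.

    Hypothetical helper: assumes each entry is a dict with a "name" key and
    that the flag belongs at the top level of the entry.
    """
    updated = []
    for entry in tests:
        entry = dict(entry)  # shallow copy so the input list is left untouched
        if entry.get("name") in names:
            entry["anyscale_sdk_2026"] = True
        updated.append(entry)
    return updated
```

Entries whose names are not in the set pass through unchanged, which keeps the edit limited to the tests this PR migrates.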

Schema changes applied

  • cloud_id → cloud, ANYSCALE_CLOUD_ID → ANYSCALE_CLOUD_NAME
  • head_node_type → head_node, worker_node_types → worker_nodes
  • min_workers → min_nodes, max_workers → max_nodes
  • use_spot: false → market_type: ON_DEMAND
  • advanced_configurations_json / gcp_advanced_configurations_json → advanced_instance_config
  • GCE: region + allowed_azs → zones
  • Resources: cpu → CPU, gpu → GPU, flattened custom_resources
  • Removed: region, max_workers, head/worker name fields (kept where multiple workers share instance type)
  • Removed commented-out blocks
  • Added CPU resources to head nodes where wait_for_nodes > worker count
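
The renames listed above can be sketched as a dictionary transformation. This is a minimal illustration, not the script used for this PR and not part of the Anyscale SDK: the helper `migrate_compute_config` is hypothetical, the sketch assumes the advanced-configuration and `allowed_azs` keys sit at the top level while `use_spot` sits on each node, and the instance type in the example is a placeholder. Resource flattening, name-field removal, and the head-node CPU addition are not covered.

```python
from copy import deepcopy

# Key renames taken from the "Schema changes applied" list.
TOP_LEVEL_RENAMES = {
    "cloud_id": "cloud",
    "head_node_type": "head_node",
    "worker_node_types": "worker_nodes",
    "advanced_configurations_json": "advanced_instance_config",
    "gcp_advanced_configurations_json": "advanced_instance_config",
}
NODE_RENAMES = {"min_workers": "min_nodes", "max_workers": "max_nodes"}


def _rename(mapping, renames):
    """Return a new dict with keys renamed, preserving order and values."""
    return {renames.get(key, key): value for key, value in mapping.items()}


def _migrate_node(node):
    node = _rename(node, NODE_RENAMES)
    if "use_spot" in node:
        if node.pop("use_spot"):
            # The PR description only documents the use_spot: false case.
            raise NotImplementedError("only use_spot: false is covered here")
        node["market_type"] = "ON_DEMAND"
    return node


def migrate_compute_config(legacy):
    """Hypothetical helper: apply the listed renames to a parsed legacy config."""
    config = _rename(deepcopy(legacy), TOP_LEVEL_RENAMES)
    if isinstance(config.get("cloud"), str):
        config["cloud"] = config["cloud"].replace(
            "ANYSCALE_CLOUD_ID", "ANYSCALE_CLOUD_NAME"
        )
    if "allowed_azs" in config:  # GCE: region + allowed_azs become zones
        config["zones"] = config.pop("allowed_azs")
    config.pop("region", None)
    config.pop("max_workers", None)  # top-level cap is listed under "Removed"
    if "head_node" in config:
        config["head_node"] = _migrate_node(config["head_node"])
    if "worker_nodes" in config:
        config["worker_nodes"] = [_migrate_node(n) for n in config["worker_nodes"]]
    return config


# Illustrative input; values are placeholders, not copied from the repository.
legacy = {
    "cloud_id": '{{env["ANYSCALE_CLOUD_ID"]}}',
    "region": "us-west-2",
    "max_workers": 4,
    "head_node_type": {"instance_type": "m5.4xlarge"},
    "worker_node_types": [
        {"instance_type": "m5.4xlarge", "min_workers": 4, "max_workers": 4,
         "use_spot": False}
    ],
}
migrated = migrate_compute_config(legacy)
```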

Test plan

  • All 15 config files validated against ComputeConfig.from_yaml()
  • CI passes with anyscale_sdk_2026: true flag on all test entries
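
One way the first check could be scripted is sketched below. The helper `validate_configs` is hypothetical and takes the loader as a parameter so it stays independent of any SDK; in practice the loader would be `ComputeConfig.from_yaml`, whose import path is shown only as an assumption in the trailing comment.

```python
def validate_configs(paths, load):
    """Try to load every config and collect failures instead of stopping early.

    `load` is any callable that raises on an invalid file. Returns a dict
    mapping each failing path to a short error description.
    """
    failures = {}
    for path in paths:
        try:
            load(str(path))
        except Exception as exc:  # report every failure, whatever its type
            failures[str(path)] = f"{type(exc).__name__}: {exc}"
    return failures


# Possible usage (assumed import path and glob pattern, shown for illustration):
#   from pathlib import Path
#   from anyscale.compute_config.models import ComputeConfig
#   paths = Path("release/nightly_tests/stress_tests").glob("*compute*.yaml")
#   failures = validate_configs(paths, ComputeConfig.from_yaml)
```

An empty result means every file loaded; collecting all failures in one pass avoids rerunning the check after each fix.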

🤖 Generated with Claude Code

…chema

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: sai.miduthuri <[email protected]>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request migrates several nightly and stress test compute configurations to a newer schema, likely associated with the Anyscale SDK 2026. Key changes include renaming configuration fields such as cloud_id to cloud, head_node_type to head_node, and worker_node_types to worker_nodes, as well as updating resource keys and market types. Additionally, the anyscale_sdk_2026: true flag has been enabled for these tests in release/release_tests.yaml. I have no feedback to provide.

@ray-gardener ray-gardener bot added the core (Issues that should be addressed in Ray Core) and release-test (release test) labels on Apr 14, 2026
sai-miduthuri and others added 3 commits April 14, 2026 11:11
…ker resources

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: sai.miduthuri <[email protected]>
The legacy config had no explicit cpu on workers (only custom_resources),
so the migration should preserve that — just flatten custom_resources
without adding CPU.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: sai.miduthuri <[email protected]>
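
The flattening rule described in the commit message above can be sketched as follows. The helper `flatten_resources` is hypothetical, and the resource names in the usage lines are placeholders rather than values from the migrated files. The point it illustrates is that a legacy entry with only `custom_resources` comes out with no `CPU` key.

```python
def flatten_resources(legacy_resources):
    """Hypothetical helper: convert a legacy resources mapping to the flat form.

    cpu/gpu are upper-cased, custom_resources entries are lifted to the top
    level, and no CPU entry is added when the legacy mapping had none.
    """
    flat = {}
    for key, value in legacy_resources.items():
        if key == "custom_resources":
            flat.update(value or {})  # lift custom resources to the top level
        elif key in ("cpu", "gpu"):
            flat[key.upper()] = value
        else:
            flat[key] = value
    return flat


# Placeholder resource names, for illustration only.
workers_only_custom = flatten_resources({"custom_resources": {"pg_custom": 1}})
with_cpu = flatten_resources({"cpu": 8, "custom_resources": {"pg_custom": 1}})
```
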
CPU: 0 is the default behavior when worker_nodes is present, so no
need to explicitly set it.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: sai.miduthuri <[email protected]>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 8ed39cd.

…ts_compute_gce

The legacy file had a commented-out advanced_configurations_json block.
Restore it with the updated field name (advanced_instance_config).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: sai.miduthuri <[email protected]>