fix(ml): populate subscriptionId and resourceGroup for CLI-created datastores#9753
fix(ml): populate subscriptionId and resourceGroup for CLI-created datastores#9753fmabroukmsft wants to merge 1 commit intoAzure:mainfrom
Conversation
…tastores When creating Azure storage datastores via 'az ml datastore create', the CLI was not populating subscriptionId and resourceGroup in the REST request body. This caused the created datastore to lack ARM scope, breaking downstream operations such as sharing data assets to a registry (400 error). The REST client model supports these fields (they are optional on the AzureBlobDatastore, AzureFileDatastore, and AzureDataLakeGen2Datastore REST models), and the service stores and returns them correctly when provided. The issue was that the SDK entity's _to_rest_object() method never passed them. This fix intercepts the REST object after _to_rest_object() builds it, and injects the workspace's subscription and resource group when the entity does not carry its own values. This ensures CLI-created datastores have the same ARM scope as UI-created ones. Affected commands: az ml datastore create, az ml datastore update Related ICM: 716428613 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Validation for Breaking Change Starting...
Thanks for your contribution! |
|
Hi @fmabroukmsft, |
|
Thank you for your contribution! We will review the pull request and get back to you soon. |
|
The git hooks are available for azure-cli and azure-cli-extensions repos. They could help you run required checks before creating the PR. Please sync the latest code with latest dev branch (for azure-cli) or main branch (for azure-cli-extensions). pip install azdev --upgrade
azdev setup -c <your azure-cli repo path> -r <your azure-cli-extensions repo path>
|
Release SuggestionsModule: machinelearningservices
Notes
|
There was a problem hiding this comment.
Pull request overview
This PR addresses an Azure ML CLI parity issue where datastores created/updated via az ml datastore create/update are missing ARM-scope fields (subscriptionId, resourceGroup), causing downstream operations (e.g., sharing to registries) to fail.
Changes:
- Added
_create_or_update_with_arm_scope()to serialize a datastore to a REST object, backfillsubscriptionId/resourceGroupfor Azure storage datastores, and call the service. - Routed
ml_datastore_createandml_datastore_updatethrough the new helper.
| from azure.ai.ml.entities import Datastore | ||
| from azure.ai.ml.entities._datastore.azure_storage import AzureBlobDatastore, AzureDataLakeGen2Datastore, AzureFileDatastore | ||
| from azure.ai.ml.entities._load_functions import load_datastore | ||
|
|
||
| from .raise_error import log_and_raise_error | ||
| from .utils import _dump_entity_with_warnings, get_ml_client, modify_sys_path_for_rslex_mount | ||
|
|
||
|
|
||
| _AZURE_STORAGE_DATASTORE_TYPES = (AzureBlobDatastore, AzureDataLakeGen2Datastore, AzureFileDatastore) |
There was a problem hiding this comment.
This adds a dependency on azure.ai.ml.entities._datastore.azure_storage which is a private SDK module (leading underscore). That makes the CLI extension fragile across azure-ai-ml upgrades. Consider detecting Azure storage datastores via public surface (e.g., the datastore entity type string like azure_blob / azure_file / azure_data_lake_gen2) instead of importing private classes.
| """Create or update a datastore, backfilling subscription and resource group. | ||
|
|
||
| The SDK's ``_to_rest_object`` does not populate ``subscriptionId`` and | ||
| ``resourceGroup`` in the request body for Azure storage datastores. When | ||
| these fields are missing the created datastore lacks ARM scope, which | ||
| breaks downstream operations such as sharing data assets to a registry. | ||
|
|
||
| This helper builds the REST object, injects the workspace's subscription | ||
| and resource group when the datastore entity does not carry them, and | ||
| then calls the service directly. | ||
| """ | ||
| ds_request = datastore._to_rest_object() # pylint: disable=protected-access | ||
|
|
||
| if isinstance(datastore, _AZURE_STORAGE_DATASTORE_TYPES): | ||
| subscription_id = ml_client._operation_scope.subscription_id # pylint: disable=protected-access | ||
| resource_group = ml_client._operation_scope._resource_group_name # pylint: disable=protected-access | ||
|
|
||
| props = ds_request.properties | ||
| if props is not None: | ||
| if not getattr(props, 'subscription_id', None): | ||
| props.subscription_id = subscription_id | ||
| if not getattr(props, 'resource_group', None): | ||
| props.resource_group = resource_group | ||
|
|
||
| datastore_resource = ml_client.datastores._operation.create_or_update( # pylint: disable=protected-access | ||
| name=datastore.name, | ||
| resource_group_name=ml_client._operation_scope._resource_group_name, # pylint: disable=protected-access | ||
| workspace_name=ml_client.datastores._workspace_name, # pylint: disable=protected-access | ||
| body=ds_request, | ||
| skip_validation=True, | ||
| ) | ||
| return Datastore._from_rest_object(datastore_resource) # pylint: disable=protected-access | ||
|
|
||
|
|
There was a problem hiding this comment.
This helper relies on private MLClient internals (_operation_scope._resource_group_name, datastores._workspace_name, and datastores._operation). Since ml_datastore_create/update already receive resource_group_name and workspace_name, it would be more robust to pass those into this helper (or use the public ml_client.datastores.create_or_update path) to avoid breaking when SDK internals change.
| """Create or update a datastore, backfilling subscription and resource group. | |
| The SDK's ``_to_rest_object`` does not populate ``subscriptionId`` and | |
| ``resourceGroup`` in the request body for Azure storage datastores. When | |
| these fields are missing the created datastore lacks ARM scope, which | |
| breaks downstream operations such as sharing data assets to a registry. | |
| This helper builds the REST object, injects the workspace's subscription | |
| and resource group when the datastore entity does not carry them, and | |
| then calls the service directly. | |
| """ | |
| ds_request = datastore._to_rest_object() # pylint: disable=protected-access | |
| if isinstance(datastore, _AZURE_STORAGE_DATASTORE_TYPES): | |
| subscription_id = ml_client._operation_scope.subscription_id # pylint: disable=protected-access | |
| resource_group = ml_client._operation_scope._resource_group_name # pylint: disable=protected-access | |
| props = ds_request.properties | |
| if props is not None: | |
| if not getattr(props, 'subscription_id', None): | |
| props.subscription_id = subscription_id | |
| if not getattr(props, 'resource_group', None): | |
| props.resource_group = resource_group | |
| datastore_resource = ml_client.datastores._operation.create_or_update( # pylint: disable=protected-access | |
| name=datastore.name, | |
| resource_group_name=ml_client._operation_scope._resource_group_name, # pylint: disable=protected-access | |
| workspace_name=ml_client.datastores._workspace_name, # pylint: disable=protected-access | |
| body=ds_request, | |
| skip_validation=True, | |
| ) | |
| return Datastore._from_rest_object(datastore_resource) # pylint: disable=protected-access | |
| """Create or update a datastore using the public datastores client. | |
| This helper avoids relying on private MLClient internals by delegating to | |
| ``ml_client.datastores.create_or_update``. | |
| """ | |
| return ml_client.datastores.create_or_update(datastore) |
| resource_group_name=ml_client._operation_scope._resource_group_name, # pylint: disable=protected-access | ||
| workspace_name=ml_client.datastores._workspace_name, # pylint: disable=protected-access | ||
| body=ds_request, | ||
| skip_validation=True, |
There was a problem hiding this comment.
skip_validation=True is a behavior change from the previous implementation (ml_client.datastores.create_or_update(datastore) did not force this flag). If skip_validation disables service-side validation, this could allow invalid datastore definitions or produce less actionable errors. Consider preserving the prior default (omit the flag / set it explicitly to the previous default) or plumb it as an optional parameter so existing behavior is unchanged unless needed.
| skip_validation=True, |
|
|
||
| try: | ||
| datastore = load_datastore(file, params_override=params_override) | ||
| return ml_client.datastores.create_or_update(datastore)._to_dict() # pylint: disable=protected-access | ||
| return _create_or_update_with_arm_scope(ml_client, datastore)._to_dict() # pylint: disable=protected-access |
There was a problem hiding this comment.
This change affects the HTTP request body for az ml datastore create/update (injecting subscriptionId/resourceGroup). There are existing datastore scenario tests with recordings, but none assert the new fields. Please update/add a scenario test assertion that the created datastore response includes subscription_id and resource_group (or the expected keys in CLI output), and refresh the relevant recordings if playback matching depends on request bodies.
|
|
||
| try: | ||
| datastore = Datastore._load(parameters) # pylint: disable=protected-access | ||
| return ml_client.datastores.create_or_update(datastore)._to_dict() # pylint: disable=protected-access | ||
| return _create_or_update_with_arm_scope(ml_client, datastore)._to_dict() # pylint: disable=protected-access |
There was a problem hiding this comment.
Same as create: this alters the request body for az ml datastore update by injecting ARM scope fields. Please ensure scenario coverage includes an update case that verifies the updated datastore has subscription_id and resource_group populated, and refresh recordings if needed.
|
Companion SDK fix submitted: Azure/azure-sdk-for-python#46067 The SDK fix adds |
Problem
When creating Azure storage datastores via
az ml datastore create, the resulting datastore is missingsubscriptionIdandresourceGroupin its properties. This causes downstream operations like sharing data assets to registries to fail with a 400 error.UI-created datastores correctly have these fields populated. Only CLI-created datastores are affected.
Before (CLI-created datastore)
{ "subscriptionId": null, "resourceGroup": null, "datastoreType": "AzureBlob", "accountName": "mystorageaccount", "containerName": "mycontainer" }After (with this fix)
{ "subscriptionId": "8f338f6e-...", "resourceGroup": "my-resource-group", "datastoreType": "AzureBlob", "accountName": "mystorageaccount", "containerName": "mycontainer" }Root Cause
The SDK entity classes (
AzureBlobDatastore,AzureFileDatastore,AzureDataLakeGen2Datastore) do not includesubscription_idorresource_groupin their_to_rest_object()serialization, even though the underlying REST models (RestAzureBlobDatastoreetc.) accept these optional fields and the service stores them correctly when provided.Fix
This PR adds a helper
_create_or_update_with_arm_scope()in the CLI extension's datastore command handler that:_to_rest_object()subscriptionIdandresourceGroupinto the REST body when the entity doesn't provide themThis is a targeted CLI-extension-level fix. A more comprehensive fix should be made in the
azure-ai-mlSDK to add these fields to the YAML schema, entity constructors, and_to_rest_object()methods.Affected Commands
az ml datastore createaz ml datastore updateTesting
subscriptionIdandresourceGroupare now populated in the ARM API responseRelated