SynapseML version
1.0.9
System information
- Language version (e.g. python 3.8, scala 2.12): Python 3.10.12, Scala 2.12.17
- Spark Version (e.g. 3.2.3): 3.4.3
- Spark Platform (e.g. Synapse, Databricks): Microsoft Fabric
Describe the problem
When writing data to an existing Azure Search index that contains scoring profiles, the operation fails with a spray.json.DeserializationException. The JSON parser expects each scoring profile to be a plain string, but Azure Search returns each one as a JSON object.
Root cause:
- AzureSearchSchemas.scala defines scoringProfiles: Option[Seq[String]]
- But Azure Search actually returns complex objects with functionAggregation, functions, text, etc.
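To make the mismatch concrete, here is a small Python analog of the two parsing strategies. It is not SynapseML code: the actual fix would be the equivalent change to the Scala case classes in AzureSearchSchemas.scala. The names parse_profiles_as_strings, parse_profiles_as_objects and ScoringProfile are hypothetical. The string-typed parser fails on the object-shaped profile from this issue, while an object-typed model accepts it. An index with no profiles returns an empty list, which parses under either shape. That is consistent with the workaround of using indexes without profiles.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Index definition shaped like the scoring-profile example in this issue
# ("fields" is left empty here for brevity).
SAMPLE_INDEX_JSON = """
{
  "name": "my-index",
  "fields": [],
  "scoringProfiles": [{
    "name": "freshness_boost",
    "functionAggregation": "sum",
    "functions": [{
      "type": "freshness",
      "boost": 2.0,
      "fieldName": "date",
      "interpolation": "constant",
      "freshness": {"boostingDuration": "P1D"}
    }]
  }]
}
"""


def parse_profiles_as_strings(index_def: Dict[str, Any]) -> Optional[List[str]]:
    """Analog of a schema declaring scoringProfiles: Option[Seq[String]]."""
    profiles = index_def.get("scoringProfiles")
    if profiles is None:
        return None
    for profile in profiles:
        if not isinstance(profile, str):
            # Mirrors the "Expected String ... but got {...}" failure quoted in this issue
            raise TypeError(f"Expected String, but got {json.dumps(profile)}")
    return profiles


@dataclass
class ScoringProfile:
    """Object-shaped model of one scoring profile, as the service returns it."""
    name: str
    functionAggregation: Optional[str] = None
    functions: List[Dict[str, Any]] = field(default_factory=list)
    text: Optional[Dict[str, Any]] = None


def parse_profiles_as_objects(index_def: Dict[str, Any]) -> List[ScoringProfile]:
    """Parse each scoring profile as an object instead of a string."""
    return [
        ScoringProfile(
            name=p["name"],
            functionAggregation=p.get("functionAggregation"),
            functions=p.get("functions") or [],
            text=p.get("text"),
        )
        for p in index_def.get("scoringProfiles") or []
    ]


if __name__ == "__main__":
    index_def = json.loads(SAMPLE_INDEX_JSON)
    try:
        parse_profiles_as_strings(index_def)
    except TypeError as err:
        print("string-typed schema:", err)
    for profile in parse_profiles_as_objects(index_def):
        print("object-typed schema:", profile.name, profile.functionAggregation)
```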
Expected behavior:
Writing data to an index with scoring profiles should work without parsing errors.
Actual behavior:
The operation fails with a DeserializationException while parsing the index definition.
Impact:
This prevents users from writing data to any Azure Search index that has scoring profiles configured, which is a common production scenario for relevance tuning.
Current workaround:
- Create indexes without scoring profiles when using SynapseML
- Add scoring profiles later via Azure Portal/REST API after data is written
- Or recreate the index without scoring profiles each time
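The "add scoring profiles later via REST API" workaround can be scripted. The sketch below is built on several assumptions and is not part of SynapseML. It uses only the Python standard library against the Azure AI Search REST API (GET and PUT on /indexes/{name}), and the api-version, the helper names (detach_scoring_profiles, reattach_scoring_profiles, write_with_profiles_detached, and the others), and the detach/restore approach are all illustrative. It assumes the admin key passed as subscriptionKey is also accepted as the REST api-key. While the profiles are detached, queries lose that relevance tuning, and queries that name a profile explicitly are likely to be rejected. Treat it as unsuitable for an index serving live traffic.

```python
import copy
import json
import urllib.request
from typing import Any, Dict, Tuple

# Assumption: any GA Azure AI Search REST API version with Get/Create-or-Update Index
API_VERSION = "2023-11-01"


def index_url(service_name: str, index_name: str) -> str:
    return (f"https://{service_name}.search.windows.net/indexes/{index_name}"
            f"?api-version={API_VERSION}")


def _without_odata(index_def: Dict[str, Any]) -> Dict[str, Any]:
    # Deep-copy the definition, dropping metadata such as "@odata.context"/"@odata.etag"
    return {k: copy.deepcopy(v) for k, v in index_def.items() if not k.startswith("@odata.")}


def detach_scoring_profiles(index_def: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a definition into (definition without profiles, saved profile settings)."""
    stripped = _without_odata(index_def)
    saved = {
        "scoringProfiles": stripped.get("scoringProfiles") or [],
        "defaultScoringProfile": stripped.get("defaultScoringProfile"),
    }
    stripped["scoringProfiles"] = []
    # The default profile must not point at a profile that no longer exists
    stripped["defaultScoringProfile"] = None
    return stripped, saved


def reattach_scoring_profiles(index_def: Dict[str, Any], saved: Dict[str, Any]) -> Dict[str, Any]:
    restored = _without_odata(index_def)
    restored.update(copy.deepcopy(saved))
    return restored


def _call(method: str, url: str, api_key: str, body: Any = None) -> Dict[str, Any]:
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"api-key": api_key, "Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        payload = response.read()
    return json.loads(payload) if payload else {}  # a PUT may return 204 with no body


def get_index(service_name: str, index_name: str, api_key: str) -> Dict[str, Any]:
    return _call("GET", index_url(service_name, index_name), api_key)


def put_index(service_name: str, index_name: str, api_key: str, index_def: Dict[str, Any]) -> Dict[str, Any]:
    return _call("PUT", index_url(service_name, index_name), api_key, index_def)


def write_with_profiles_detached(df, service_name, index_name, api_key, key_col, action_col):
    """Hypothetical helper: detach profiles, write with SynapseML, then restore them."""
    stripped, saved = detach_scoring_profiles(get_index(service_name, index_name, api_key))
    put_index(service_name, index_name, api_key, stripped)
    try:
        df.writeToAzureSearch(
            subscriptionKey=api_key,
            serviceName=service_name,
            indexName=index_name,
            keyCol=key_col,
            actionCol=action_col,
        )
    finally:
        # Restore the original profiles even if the write fails
        put_index(service_name, index_name, api_key, reattach_scoring_profiles(stripped, saved))
```

Restoring in a finally block is a deliberate choice. It ensures a failed write does not leave the index without its profiles. The pure detach/reattach functions are kept separate so the payload logic can be checked without network access.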
Code to reproduce issue
```python
from synapse.ml.services import *
from pyspark.sql import functions as F

AZURE_SEARCH_SUBSCRIPTION_KEY = "<your-subscription-key>"
AZURE_SEARCH_SERVICE_NAME = "<your-service-name>"
AZURE_SEARCH_INDEX_NAME = "existing-index-with-scoring-profiles"

# Create a simple test DataFrame ('spark' is the session provided by the Fabric notebook)
test_df = spark.createDataFrame([
    ("TEST01", "item1", "2025-05-15"),
    ("TEST02", "item2", "2025-05-20")
], ["id", "name", "date"])
test_df = test_df.withColumn("SearchAction", F.lit("upload"))

# Assume you have an existing Azure Search index that contains scoring profiles like:
# {
#   "name": "my-index",
#   "fields": [...],
#   "scoringProfiles": [{
#     "name": "freshness_boost",
#     "functionAggregation": "sum",
#     "functions": [{
#       "type": "freshness",
#       "boost": 2.0,
#       "fieldName": "date",
#       "interpolation": "constant",
#       "freshness": {"boostingDuration": "P1D"}
#     }]
#   }]
# }

# This FAILS with DeserializationException when the index has scoring profiles
try:
    test_df.writeToAzureSearch(
        subscriptionKey=AZURE_SEARCH_SUBSCRIPTION_KEY,
        serviceName=AZURE_SEARCH_SERVICE_NAME,
        indexName=AZURE_SEARCH_INDEX_NAME,  # Index with scoring profiles
        keyCol="id",
        actionCol="SearchAction"
    )
except Exception as e:
    print(f"Error: {e}")
    # Error: spray.json.DeserializationException: Expected String as JsString, but got {complex scoring profile object}

# WORKAROUND: Create/use an index without scoring profiles
AZURE_SEARCH_INDEX_NAME_NO_PROFILES = "same-index-no-scoring-profiles"

# This works when the index has no scoring profiles
test_df.writeToAzureSearch(
    subscriptionKey=AZURE_SEARCH_SUBSCRIPTION_KEY,
    serviceName=AZURE_SEARCH_SERVICE_NAME,
    indexName=AZURE_SEARCH_INDEX_NAME_NO_PROFILES,  # Index without scoring profiles
    keyCol="id",
    actionCol="SearchAction"
)
# Note: You can add scoring profiles to the index later via Azure Portal/REST API
```
Other info / logs
No response
What component(s) does this bug affect?
What language(s) does this bug affect?
What integration(s) does this bug affect?