Merge pull request #402 from weaviate/v1-37/query-profiling-restore

g-despot · web-flow · commit e137cd64e2ee · 2026-04-23T16:06:34.000+02:00
[v1.37] Query profiling
diff --git a/_includes/code/howto/search.profile.py b/_includes/code/howto/search.profile.py
@@ -0,0 +1,72 @@
+# START ProfileNearVector
+import weaviate
+from weaviate.classes.query import MetadataQuery
+
+client = weaviate.connect_to_local()
+
+collection = client.collections.get("Article")
+
+response = collection.query.near_vector(
+    near_vector=[0.1, 0.2, 0.3],
+    limit=5,
+    return_metadata=MetadataQuery(query_profile=True, distance=True),
+)
+
+if response.query_profile:
+    for shard in response.query_profile.shards:
+        print(f"Shard: {shard.name} (node: {shard.node})")
+        for search_type, profile in shard.searches.items():
+            print(f"  [{search_type}]")
+            for key, value in profile.details.items():
+                print(f"    {key}: {value}")
+# END ProfileNearVector
+
+# START ProfileBM25
+from weaviate.classes.query import MetadataQuery
+
+collection = client.collections.get("Article")
+
+response = collection.query.bm25(
+    query="machine learning",
+    return_metadata=MetadataQuery(query_profile=True, score=True),
+)
+
+if response.query_profile:
+    for shard in response.query_profile.shards:
+        print(f"Shard: {shard.name} (node: {shard.node})")
+        for search_type, profile in shard.searches.items():
+            print(f"  [{search_type}]")
+            for key, value in profile.details.items():
+                print(f"    {key}: {value}")
+# END ProfileBM25
+
+# START ProfileHybrid
+from weaviate.classes.query import MetadataQuery
+
+collection = client.collections.get("Article")
+
+response = collection.query.hybrid(
+    query="machine learning",
+    return_metadata=MetadataQuery(query_profile=True),
+    limit=5,
+)
+
+if response.query_profile:
+    for shard in response.query_profile.shards:
+        print(f"Shard: {shard.name} (node: {shard.node})")
+        for search_type, profile in shard.searches.items():
+            print(f"  [{search_type}]")
+            for key, value in profile.details.items():
+                print(f"    {key}: {value}")
+# END ProfileHybrid
+
+# START ProfileMetadataList
+# You can also use list-style metadata
+response = collection.query.near_vector(
+    near_vector=[0.1, 0.2, 0.3],
+    limit=5,
+    return_metadata=["query_profile", "distance"],
+)
+# END ProfileMetadataList
+
+client.close()
diff --git a/_includes/feature-notes/query-profile.mdx b/_includes/feature-notes/query-profile.mdx
@@ -0,0 +1,2 @@
+:::info Added in `v1.36.9`
+:::
diff --git a/docs/deploy/configuration/logging.md b/docs/deploy/configuration/logging.md
@@ -117,6 +117,8 @@ QUERY_SLOW_LOG_THRESHOLD=2s
 
 When enabled, queries exceeding the threshold will be logged at the configured log level, allowing you to identify and optimize slow operations.
 
+For per-query timing breakdowns on demand (without configuring log thresholds), see [Query profiling](/weaviate/search/query-profile.md).
+
 ### Tenant Activity Logging
 
 For multi-tenant collections, you can configure the log level for tenant read and write activity.
diff --git a/docs/deploy/configuration/monitoring.md b/docs/deploy/configuration/monitoring.md
@@ -497,6 +497,10 @@ your uses perfectly:
 | [Usage](https://github.com/weaviate/weaviate/blob/master/tools/dev/grafana/dashboards/usage.json)                             | Obtain usage metrics, such as number of objects imported, etc.                                                          | ![Usage](./img/weaviate-sample-dashboard-usage.png "Usage")                                                        |
 | [Aysnc index queue](https://github.com/weaviate/weaviate/blob/main/tools/dev/grafana/dashboards/index_queue.json)             | Observe index queue activity                                                                                            | ![Async index queue](./img/weaviate-sample-dashboard-async-queue.png "Async index queue")                          |
 
+## Query profiling
+
+For per-query performance analysis, Weaviate provides [query profiling](/weaviate/search/query-profile.md). Unlike Prometheus metrics, which show aggregate performance, query profiling provides per-shard timing breakdowns for individual queries. This makes it useful for diagnosing specific slow queries.
+
 ## `nodes` API Endpoint
 
 To get collection details programmatically, use the [`nodes`](/deploy/configuration/status.md#cluster-node-data) REST endpoint.
diff --git a/docs/weaviate/api/graphql/additional-properties.md b/docs/weaviate/api/graphql/additional-properties.md
@@ -154,6 +154,18 @@ The `score` will be the hybrid score of the result, based on the nominated [fusi
 The `explainScore` will be the hybrid score of the result, broken down into its vector and keyword search components. This can be used to understand why a result was scored the way it was.
 
 
+### Query profiling
+
+import QueryProfileNote from '/_includes/feature-notes/query-profile.mdx';
+
+<QueryProfileNote/>
+
+Use `queryProfile` to get per-shard timing breakdowns for a search query. Profile data is returned at the response level (in GraphQL, it is attached to the first result), not per object. It includes timing for vector search, keyword scoring, filter evaluation, and object retrieval, broken down by shard and cluster node.
+
+In GraphQL, request `_additional { queryProfile }`. The profile is returned as a JSON string.
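The GraphQL form needs one extra decode step, because `queryProfile` comes back as a JSON string. A minimal sketch, assuming the usual `data.Get.<Collection>` response layout and that the profile sits on the first result; `extract_query_profile` is a hypothetical helper. The decoded layout isn't specified here, so the sketch returns it without interpreting any fields.

```python
import json
from typing import Any


def extract_query_profile(gql_response: dict, collection: str) -> Any:
    """Return the decoded `queryProfile` from a raw GraphQL Get response, or None.

    Assumes the profile is a JSON string on the first result's `_additional` block.
    """
    data = gql_response.get("data") or {}
    results = (data.get("Get") or {}).get(collection) or []
    if not results:
        return None
    raw = (results[0].get("_additional") or {}).get("queryProfile")
    # The profile is delivered as a JSON string, so decode it before use.
    return json.loads(raw) if raw else None


# Hypothetical payload; the inner profile content is a placeholder, not the real schema.
sample = {"data": {"Get": {"Article": [
    {"_additional": {"queryProfile": '{"shards": []}'}},
]}}}
print(extract_query_profile(sample, "Article"))
```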
+
+See [How-to: Query profiling](../../search/query-profile.md) for full details, available metrics, and Python examples.
+
 ### Classification
 
 When a data-object has been <SkipLink href="/weaviate/api/rest#tag/classifications">subjected to classification</SkipLink>, you can get additional information about how the object was classified by running the following command:
diff --git a/docs/weaviate/more-resources/performance.md b/docs/weaviate/more-resources/performance.md
@@ -52,6 +52,10 @@ If you have a nested reference filter, Weaviate starts by resolving the deepest
 
 A tip is to avoid deeply nested filters in the queries. Additionally, try to make your queries as restrictive as possible, because a ten-level deep query would for example not be so expensive if all levels return only a single ID. In that case only ten one ID searches need to be performed, which is a lot of searches in one query, but each search is very cheap.
 
+## Profiling query performance
+
+To diagnose slow queries, use [query profiling](/weaviate/search/query-profile.md) to get per-shard timing breakdowns. It shows how long each phase takes (vector search, keyword scoring, filter evaluation, and object retrieval), broken down by shard and cluster node.
+
 ## Questions and feedback
 
 import DocsFeedback from '/_includes/docs-feedback.mdx';
diff --git a/docs/weaviate/search/basics.md b/docs/weaviate/search/basics.md
@@ -638,6 +638,10 @@ You can specify metadata fields to be returned.
 
 For a comprehensive list of metadata fields, see [GraphQL: Additional properties](../api/graphql/additional-properties.md).
 
+:::tip Debugging query performance
+Use [query profiling](./query-profile.md) to get per-shard timing breakdowns for any search query. Add `query_profile=True` to `MetadataQuery` to see exactly how long each phase takes.
+:::
+
 ## Multi-tenancy
 
 If [multi-tenancy](../concepts/data.md#multi-tenancy) is enabled, specify the tenant parameter in each query.
diff --git a/docs/weaviate/search/query-profile.md b/docs/weaviate/search/query-profile.md
@@ -0,0 +1,169 @@
+---
+title: Query profiling
+sidebar_position: 95
+image: og/docs/howto.jpg
+description: "Profile search queries to get per-shard timing breakdowns for vector search, keyword scoring, and filter evaluation."
+---
+
+import FilteredTextBlock from '@site/src/components/Documentation/FilteredTextBlock';
+import PyCode from '!!raw-loader!/_includes/code/howto/search.profile.py';
+import QueryProfileNote from '/_includes/feature-notes/query-profile.mdx';
+
+<QueryProfileNote/>
+
+Query profiling provides per-shard timing breakdowns for search queries. Enable it on any search request to see how long each phase takes (vector search, keyword scoring, filter evaluation, and object retrieval), broken down by shard and cluster node.
+
+Profiling uses the same instrumentation as [slow query logging](/deploy/configuration/logging.md#slow-query-logging). It adds minimal overhead when enabled and zero overhead when disabled.
+
+## Enable profiling
+
+Add `query_profile=True` to `MetadataQuery`, or include `"query_profile"` in the metadata list:
+
+<FilteredTextBlock
+  text={PyCode}
+  startMarker="# START ProfileNearVector"
+  endMarker="# END ProfileNearVector"
+  language="python"
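+/>
+
+The list-style equivalent:
+
+<FilteredTextBlock
+  text={PyCode}
+  startMarker="# START ProfileMetadataList"
+  endMarker="# END ProfileMetadataList"
+  language="python"
+/>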
+
+Profile data is returned on the response object at `response.query_profile`, not on individual result objects. It represents the entire query across all shards.
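Because the profile is a nested object rather than a table, it can help to flatten it into rows before logging it, writing it to CSV, or loading it into a DataFrame. A minimal sketch, assuming only the attribute names the code samples on this page read (`shards`, `name`, `node`, `searches`, `details`); `iter_profile_rows` is a hypothetical helper, and `SimpleNamespace` stands in for the client's response object so the example runs without a Weaviate instance.

```python
from types import SimpleNamespace
from typing import Iterator, Tuple

Row = Tuple[str, str, str, str, str]


def iter_profile_rows(query_profile) -> Iterator[Row]:
    """Yield (shard, node, search_type, metric, value) for every profile entry."""
    if not query_profile:
        return
    for shard in query_profile.shards:
        for search_type, profile in shard.searches.items():
            for metric, value in profile.details.items():
                yield shard.name, shard.node, search_type, metric, value


# Stand-in for `response.query_profile`, shaped like the documented structure.
fake_profile = SimpleNamespace(shards=[
    SimpleNamespace(name="shard_abc", node="weaviate-0", searches={
        "vector": SimpleNamespace(details={"total_took": "198.666µs"}),
    }),
])
for row in iter_profile_rows(fake_profile):
    print(row)
```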
+
+## Supported search types
+
+| Search type | Profile sections | Query methods |
+| :---------- | :--------------- | :------------ |
+| Vector search | `vector` | `near_vector`, `near_object`, `near_text`, `near_image`, etc. |
+| Keyword search (BM25) | `keyword` | `bm25` |
+| Hybrid search | `vector` + `keyword` | `hybrid` |
+| Object fetch | `object` | `fetch_objects` |
+| Any search + filters | Includes filter metrics | Add `filters` to any search |
+| Any search + groupBy | Profile at query level | Add `group_by` to any search |
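The table can be encoded as a lookup, which is handy as a sanity check when a section you expected (for example, `keyword` on a hybrid query) is missing. A sketch transcribed from the table above; `expected_sections` and `missing_sections` are hypothetical helpers, the `near_` prefix rule is a simplification of the "etc." in the first row, and the filter and `group_by` rows are not modeled because they don't add separate sections.

```python
def expected_sections(method: str) -> set:
    """Profile sections the supported-search-types table lists for a query method."""
    if method == "hybrid":
        return {"vector", "keyword"}
    if method == "bm25":
        return {"keyword"}
    if method == "fetch_objects":
        return {"object"}
    if method.startswith("near_"):  # near_vector, near_text, near_image, ...
        return {"vector"}
    raise ValueError(f"no mapping for query method {method!r}")


def missing_sections(method: str, searches: dict) -> set:
    """Sections expected for `method` but absent from one shard's `searches`."""
    return expected_sections(method) - set(searches)


# A hybrid shard profile that only reported its vector half:
print(missing_sections("hybrid", {"vector": None}))  # {'keyword'}
```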
+
+### BM25 example
+
+<FilteredTextBlock
+  text={PyCode}
+  startMarker="# START ProfileBM25"
+  endMarker="# END ProfileBM25"
+  language="python"
+/>
+
+### Hybrid example
+
+Hybrid search produces both `vector` and `keyword` profile sections per shard:
+
+<FilteredTextBlock
+  text={PyCode}
+  startMarker="# START ProfileHybrid"
+  endMarker="# END ProfileHybrid"
+  language="python"
+/>
+
+## Response structure
+
+The profile is structured as:
+
+```
+response.query_profile
+  └── shards[]
+        ├── name          # Shard identifier (e.g. "shard_0")
+        ├── node          # Cluster node (e.g. "weaviate-0")
+        └── searches      # Dict of search type → profile
+              ├── "vector" → details: { key: value, ... }
+              ├── "keyword" → details: { key: value, ... }
+              └── "object" → details: { key: value, ... }
+```
+
+Each search type contains a `details` dict with string key-value pairs. The available metrics depend on the query type, index configuration, and filter usage.
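Every value in `details` is a string, so numeric analysis needs a conversion step. A minimal sketch, assuming timing values use Go's duration formatting (as in the example output on this page, e.g. `242.75µs`) and that timing metrics end in `_took` or `_time`; both are inferences from the metric names, not a documented contract. `parse_duration` and `coerce_value` are hypothetical helpers.

```python
import re
from typing import Union

# Seconds per unit for Go-style duration strings; both micro signs are accepted.
_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6,
            "\u00b5s": 1e-6, "\u03bcs": 1e-6, "ns": 1e-9}
# "ms" is listed before "s" and "m" so that "3ms" is read as milliseconds.
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|\u00b5s|\u03bcs|ms|s|m|h)")
_INT = re.compile(r"[0-9]+")


def parse_duration(text: str) -> float:
    """Convert a Go duration string such as '242.75µs' or '1m2.5s' to seconds."""
    parts = _PART.findall(text)
    # Reject strings with leftover characters, e.g. 'fast' or '12 ms'.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"not a duration: {text!r}")
    return sum(float(num) * _SECONDS[unit] for num, unit in parts)


def coerce_value(metric: str, value: str) -> Union[bool, int, float, str]:
    """Best-effort typing of one `details` value, which arrives as a string."""
    if value in ("true", "false"):
        return value == "true"
    if _INT.fullmatch(value):
        return int(value)
    if metric.endswith(("_took", "_time")):
        try:
            return parse_duration(value)
        except ValueError:
            pass  # Not a duration after all; keep the raw string.
    return value


details = {"kwd_method": "blockmaxwand", "kwd_time": "242.75µs", "filters_ids_matched": "847"}
print({key: coerce_value(key, value) for key, value in details.items()})
```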
+
+## Available metrics
+
+### General metrics
+
+| Metric | Description | Present when |
+| :----- | :---------- | :----------- |
+| `total_took` | Total time for this shard's search | Always |
+| `objects_took` | Time retrieving objects from storage | Always |
+| `sort_took` | Time sorting results | When sorting is applied |
+
+### Vector search metrics
+
+| Metric | Description |
+| :----- | :---------- |
+| `vector_search_took` | Time spent in vector index search |
+| `knn_search_layer_N_took` | Per-layer HNSW graph traversal time (N = layer number) |
+| `knn_search_rescore_took` | Time rescoring compressed vectors (PQ/BQ/SQ) |
+| `hnsw_flat_search` | Whether flat (brute-force) search was used instead of HNSW (`"true"` or `"false"`) |
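The vector table mixes a variable number of per-layer keys (`knn_search_layer_N_took`) with a string flag. A minimal sketch, assuming key names exactly as listed in the table, that collects the layer timings in layer order and reads the flags; `summarize_vector_details` is a hypothetical helper and timing values are kept as raw strings.

```python
import re

# Matches per-layer keys such as "knn_search_layer_0_took".
_LAYER = re.compile(r"knn_search_layer_(\d+)_took")


def summarize_vector_details(details: dict) -> dict:
    """Extract HNSW-specific fields from one `vector` section's details."""
    layers = []
    for key, value in details.items():
        match = _LAYER.fullmatch(key)
        if match:
            layers.append((int(match.group(1)), value))
    layers.sort()  # layer number ascending; values stay raw strings
    return {
        # The table documents this flag as the string "true" or "false";
        # an absent flag is treated as False here.
        "flat_search": details.get("hnsw_flat_search") == "true",
        "layers": layers,
        "rescored": "knn_search_rescore_took" in details,
    }


# Vector section from the example output on this page.
print(summarize_vector_details({
    "filters_build_allow_list_took": "31.125µs",
    "knn_search_layer_0_took": "14µs",
    "vector_search_took": "40.959µs",
}))
```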
+
+### Filter metrics
+
+| Metric | Description |
+| :----- | :---------- |
+| `filters_build_allow_list_took` | Time building the filter allow-list |
+| `filters_ids_matched` | Number of object IDs matching the filter |
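Summing `filters_ids_matched` across shards gives a rough cluster-wide count of objects that passed the filter, which can be compared with the collection size to judge selectivity. A sketch under two assumptions: each shard appears once and holds disjoint objects, and all sections within one shard evaluate the same filter, so the sketch takes the per-shard maximum rather than adding sections together. `total_filter_matches` is a hypothetical helper.

```python
from types import SimpleNamespace


def total_filter_matches(query_profile) -> int:
    """Sum `filters_ids_matched` across shards, using the max per shard."""
    total = 0
    for shard in query_profile.shards:
        counts = [
            int(section.details["filters_ids_matched"])
            for section in shard.searches.values()
            if "filters_ids_matched" in section.details
        ]
        # Taking the max avoids double counting if several sections report it.
        if counts:
            total += max(counts)
    return total


# Values taken from the example output on this page.
example = SimpleNamespace(shards=[
    SimpleNamespace(node="weaviate-0", searches={
        "vector": SimpleNamespace(details={"filters_ids_matched": "847"}),
        "keyword": SimpleNamespace(details={"kwd_method": "blockmaxwand"}),
    }),
    SimpleNamespace(node="weaviate-1", searches={
        "vector": SimpleNamespace(details={"filters_ids_matched": "912"}),
    }),
])
print(total_filter_matches(example))  # 1759
```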
+
+### BM25 keyword metrics
+
+| Metric | Description |
+| :----- | :---------- |
+| `kwd_method` | BM25 scoring method used (e.g., `blockmaxwand`) |
+| `kwd_time` | Total BM25 scoring time |
+| `kwd_1_tok_time` | Query tokenization time |
+| `kwd_3_term_time` | Term dictionary lookup time |
+| `kwd_4_bmw_time` | BlockMaxWAND scoring time |
+| `kwd_6_res_count` | Number of results from keyword scoring |
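The keyword metric names carry a step number (`kwd_1_tok_time`, `kwd_3_term_time`, and so on). Reading that number as the phase order is an inference from the names, not documented behavior, and the table lists only steps 1, 3, 4, and 6, so gaps are expected. Under that assumption, a hypothetical `keyword_phases` helper can recover the sequence:

```python
import re

# Numbered keyword metrics, e.g. "kwd_1_tok_time" -> step 1, name "tok_time".
_STEP = re.compile(r"kwd_(\d+)_(\w+)")


def keyword_phases(details: dict) -> list:
    """Return numbered BM25 metrics as (step, name, value), sorted by step."""
    phases = []
    for key, value in details.items():
        match = _STEP.fullmatch(key)
        if match:  # unnumbered keys like kwd_method and kwd_time are skipped
            phases.append((int(match.group(1)), match.group(2), value))
    return sorted(phases)


# Keyword section from the example output on this page (insertion order shuffled).
for step in keyword_phases({
    "kwd_method": "blockmaxwand",
    "kwd_4_bmw_time": "156.417µs",
    "kwd_1_tok_time": "18.291µs",
    "kwd_time": "242.75µs",
    "kwd_3_term_time": "52.083µs",
}):
    print(step)
```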
+
+## Example output
+
+For example, a filtered hybrid search on a 3-node cluster produces profiles for both the vector and keyword phases on each shard (two shards shown):
+
+```
+Shard: shard_abc (node: weaviate-0)
+  [keyword]
+    kwd_method:                        blockmaxwand
+    kwd_time:                          242.75µs
+    kwd_1_tok_time:                    18.291µs
+    kwd_3_term_time:                   52.083µs
+    kwd_4_bmw_time:                    156.417µs
+    total_took:                        248.833µs
+  [vector]
+    filters_build_allow_list_took:     31.125µs
+    filters_ids_matched:               847
+    knn_search_layer_0_took:           14µs
+    objects_took:                      153.542µs
+    total_took:                        198.666µs
+    vector_search_took:                40.959µs
+
+Shard: shard_def (node: weaviate-1)
+  [keyword]
+    kwd_method:                        blockmaxwand
+    kwd_time:                          189.333µs
+    total_took:                        195.25µs
+  [vector]
+    filters_build_allow_list_took:     27.458µs
+    filters_ids_matched:               912
+    total_took:                        172.417µs
+    vector_search_took:                35.75µs
+```
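The loop in the code samples prints `key: value` without padding, so its output is not column-aligned like the example above. A sketch of a formatter that produces similar aligned text: the 35-character label width is chosen to match the sample, search types are sorted (which puts `keyword` before `vector`, as in the sample), and detail keys keep the order the client returns them in. `format_profile` is a hypothetical helper and `SimpleNamespace` stands in for the real response object.

```python
from types import SimpleNamespace


def format_profile(query_profile, width: int = 35) -> str:
    """Render a query profile as column-aligned text, one block per shard."""
    lines = []
    for shard in query_profile.shards:
        lines.append(f"Shard: {shard.name} (node: {shard.node})")
        for search_type in sorted(shard.searches):  # 'keyword' sorts before 'vector'
            lines.append(f"  [{search_type}]")
            for key, value in shard.searches[search_type].details.items():
                label = f"{key}:"
                lines.append(f"    {label:<{width}}{value}")
        lines.append("")  # blank line between shards
    return "\n".join(lines).rstrip("\n")


example = SimpleNamespace(shards=[
    SimpleNamespace(name="shard_def", node="weaviate-1", searches={
        "vector": SimpleNamespace(details={
            "filters_build_allow_list_took": "27.458µs",
            "total_took": "172.417µs",
        }),
        "keyword": SimpleNamespace(details={"kwd_method": "blockmaxwand"}),
    }),
])
print(format_profile(example))
```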
+
+## Multi-node behavior
+
+In multi-node clusters, the coordinator node aggregates profile data from all shards across all nodes. Each shard profile includes the `node` field identifying which cluster node executed that shard's search. This makes it straightforward to identify performance imbalances across nodes.
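Since each shard entry carries its `node`, per-node comparisons are a matter of grouping. A hedged sketch: it assumes timing values are Go-style duration strings (as in the example output) and uses the largest section `total_took` per node as a rough proxy for that node's cost. Whether hybrid sections run sequentially or in parallel isn't stated here, so the sketch does not sum them. All helper names are hypothetical.

```python
import re
from collections import defaultdict
from types import SimpleNamespace

# Microseconds per unit for Go-style duration strings (assumed format).
_US = {"ns": 1e-3, "us": 1.0, "\u00b5s": 1.0, "\u03bcs": 1.0,
       "ms": 1e3, "s": 1e6, "m": 6e7, "h": 3.6e9}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|\u00b5s|\u03bcs|ms|s|m|h)")


def to_us(text: str) -> float:
    """Convert a duration string such as '1.5ms' to microseconds."""
    parts = _PART.findall(text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"not a duration: {text!r}")
    return sum(float(n) * _US[u] for n, u in parts)


def slowest_section_per_node(query_profile) -> dict:
    """Map each node to the largest `total_took` (µs) among its shard sections."""
    worst = defaultdict(float)
    for shard in query_profile.shards:
        for section in shard.searches.values():
            took = section.details.get("total_took")
            if took is not None:
                worst[shard.node] = max(worst[shard.node], to_us(took))
    return dict(worst)


def imbalance_ratio(per_node: dict) -> float:
    """Slowest node divided by fastest node; 1.0 means perfectly balanced."""
    fastest, slowest = min(per_node.values()), max(per_node.values())
    return slowest / fastest if fastest > 0 else float("inf")


# `total_took` values from the example output on this page.
example = SimpleNamespace(shards=[
    SimpleNamespace(node="weaviate-0", searches={
        "keyword": SimpleNamespace(details={"total_took": "248.833µs"}),
        "vector": SimpleNamespace(details={"total_took": "198.666µs"}),
    }),
    SimpleNamespace(node="weaviate-1", searches={
        "keyword": SimpleNamespace(details={"total_took": "195.25µs"}),
        "vector": SimpleNamespace(details={"total_took": "172.417µs"}),
    }),
])
per_node = slowest_section_per_node(example)
print(per_node, round(imbalance_ratio(per_node), 2))
```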
+
+## Performance impact
+
+- **When disabled (default):** Zero overhead. A single boolean check skips all profiling code paths.
+- **When enabled:** Adds timing instrumentation to each shard search. The overhead is small (microsecond-level timer reads) but measurable under high-throughput workloads. Use for debugging and optimization, not in production hot paths.
+
+## Limitations
+
+- **Response-level only:** Profile data is on `response.query_profile` and describes the entire query, not individual result objects.
+- **Search phases only:** Profiling covers vector search, keyword scoring, filter evaluation, and object retrieval. It does not include time spent in generative modules, rerankers, or other post-processing.
+- **No per-object breakdown:** You get per-shard timing, not per-object timing.
+- **Metrics vary by query:** Not all metrics appear in every response. Available metrics depend on the search type, index type (HNSW vs. flat), compression settings, and whether filters are used.
+
+## Questions and feedback
+
+import DocsFeedback from '/_includes/docs-feedback.mdx';
+
+<DocsFeedback/>
diff --git a/sidebars.js b/sidebars.js
@@ -634,6 +634,7 @@ const sidebars = {
         "weaviate/search/rerank",
         "weaviate/search/aggregate",
         "weaviate/search/filters",
+        "weaviate/search/query-profile",
         {
           type: "link",
           label: "Search strategies: In depth",
