docs/README.md (1 addition, 0 deletions)
@@ -54,6 +54,7 @@ Immutable records of architectural choices and their rationale.
 - [0022 — OpenTelemetry semantic conventions for aevatar.* activities](adr/0022-otel-aevatar-semantic-conventions.md)
 - [0023 — Two-tier Inspector architecture (canonical readmodel vs observation OTel)](adr/0023-two-tier-inspector-architecture.md)
 - [Chat Route Policy — Config Actor + Boundary Resolver](adr/0024-chat-route-policy.md)
+- [Elasticsearch exact-match field resolution reads live index mapping](adr/0025-elasticsearch-exact-match-resolution-reads-index-truth.md)

 ## History

@@ -0,0 +1,125 @@
---
title: Elasticsearch exact-match field resolution reads live index mapping
status: Accepted
owner: eanzhao
---

# ADR-0025: Elasticsearch exact-match field resolution reads live index mapping

## Context

PR #665 ("Stabilize Elasticsearch projection index mappings", merged 2026-05-18;
design doc `docs/design/2026-05-15-elasticsearch-projection-index-mapping-blueprint.md`)
added `ElasticsearchProjectionDescriptorMappingSupport.AugmentMetadata`: for every
read-model string field whose name matches a stable-identifier shape (`*_id`,
`*_key`, `*_hash`, `*_status`, `*_kind`, `*_type`, ...), the provider injects a
`{"type":"keyword"}` entry into the in-memory `DocumentIndexMetadata.Mappings`.
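The following C# sketch illustrates the kind of name-shape check `AugmentMetadata` performs. The suffix list comes from the examples in the provider README. `IdentifierFieldShape` and its methods are hypothetical names. The real helper also requires the field to be a root-level protobuf `string`, and it never overrides an explicitly declared mapping. This sketch omits both conditions.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of the identifier-shape rule; not the provider's implementation.
// The exact name and suffixes mirror the examples listed in the provider README.
public static class IdentifierFieldShape
{
    private static readonly string[] Suffixes =
        { "_id", "_key", "_hash", "_status", "_kind", "_type", "_type_url" };

    // True for `id` itself and for names such as `actor_id` or `payload_type_url`.
    public static bool IsStableIdentifierName(string fieldName) =>
        string.Equals(fieldName, "id", StringComparison.Ordinal) ||
        Suffixes.Any(suffix => fieldName.EndsWith(suffix, StringComparison.Ordinal));

    // The mapping entry this rule would add to the in-memory mappings, or null when the name does not qualify.
    public static string? AugmentedMappingJson(string fieldName) =>
        IsStableIdentifierName(fieldName) ? "{\"type\":\"keyword\"}" : null;
}
```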

`BuildExactMatchFieldPathResolver` consulted that augmented metadata to decide
whether an exact-match (`term`) filter targets a field directly or through its
`.keyword` sub-field. The contradiction:

- **Augmented metadata is the code's *intent*** — it says "this field should be
a keyword."
- **An Elasticsearch index created before that intent shipped keeps its original
mapping forever.** A string field on such an index carries the ES dynamic
default — `text` with a `.keyword` multi-field — and `EnsureIndexAsync` never
reconciles an already-existing index (it treats `resource_already_exists` as
"done").

For any index created before 2026-05-18, the resolver therefore saw `keyword`
(intent) and emitted the bare field path, while the field was physically `text`
(truth). A `term` query is not analyzed, but a `text` field's indexed terms are:
the default `standard` analyzer splits values on separators such as `-` and
lowercases them. An identifier-shaped value therefore typically matches no single
indexed term, and the query returned **0 hits**, silently. This took down the Lark
bot on 2026-05-20 (issue #743): the relay callback's scope resolver could not resolve
`apiKeyId → scopeId`, and every inbound relay callback returned 401.
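The 0-hit failure mode can be reproduced in miniature with the sketch below. It approximates the `standard` analyzer by splitting on characters other than ASCII letters, digits, and `_`, then lowercasing the pieces. The real tokenizer follows Unicode word-break rules, so treat this only as an approximation. The sketch models a `term` query as an exact comparison against indexed terms. `AnalyzerDemo` is a hypothetical helper, and `Key-ABC-123` is a made-up value, not the real `apiKeyId` format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical illustration of why `term` on an analyzed `text` field misses identifiers.
public static class AnalyzerDemo
{
    // Rough approximation of the `standard` analyzer: split on non-word characters, lowercase.
    public static IReadOnlyList<string> ApproximateStandardTokens(string value) =>
        Regex.Split(value, "[^A-Za-z0-9_]+")
            .Where(token => token.Length > 0)
            .Select(token => token.ToLowerInvariant())
            .ToList();

    // A `term` query is not analyzed: it matches only if one indexed token equals the whole value.
    public static bool TermMatchesTextField(string indexedValue, string termValue) =>
        ApproximateStandardTokens(indexedValue).Contains(termValue, StringComparer.Ordinal);

    // A keyword field (or the `.keyword` multi-field) indexes the whole value as a single term.
    public static bool TermMatchesKeywordField(string indexedValue, string termValue) =>
        string.Equals(indexedValue, termValue, StringComparison.Ordinal);
}
```

The sketch yields the tokens `key`, `abc`, and `123` for `Key-ABC-123`. A `term` query for the full value therefore finds nothing on the `text` field but matches on the `.keyword` sub-field.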

The 2026-05-15 blueprint anticipated incompatibility (§5 hard-constraint #2: the
mapping helper works only from the proto contract + declared
`DocumentIndexMetadata`, never from runtime index state; #9: incompatible
contract changes require a manual clear/rebuild). That stance is defensible for
*index creation*, but it left the *read path* trusting intent over physical
truth, and no gate enforced the rebuild runbook.

## Decision

### D1 — The exact-match resolver reads the live index `_mapping`

`ElasticsearchProjectionDocumentStore.QueryAsync` resolves `keyword`/`text` field
paths from the **actual** Elasticsearch mapping of the target index, obtained via
`GET <index>/_mapping` (`ElasticsearchIndexLifecycleManager.GetActualFieldMappingsAsync`),
not from the code-side augmented `DocumentIndexMetadata`.

This narrows blueprint hard-constraint #2 for the read path only: exact-match
`term` resolution is now sourced from index truth. Index *creation* still works
purely from the proto contract + declared metadata — `AugmentMetadata` and
`EnsureIndexAsync` are unchanged.
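The resolution rule that D1 feeds with the live mapping can be sketched as a pure function. `FieldMapping` and `ExactMatchPath.Resolve` are hypothetical stand-ins for the provider's `TryGetFieldMapping`, `IsKeywordFieldMapping`, and `HasKeywordMultiField` helpers, which operate on parsed JSON dictionaries. The sketch shows only their decision order:

- Given the augmented intent (`keyword`), it returns the bare path.
- Given the dynamic mapping of an index created before 2026-05-18 (`text` plus a `keyword` multi-field), it returns the `.keyword` path that holds the unanalyzed value.

The provider also falls back to the protobuf descriptor when a field is absent from the mappings (string fields get `.keyword`). The sketch leaves that out.

```csharp
using System;
using System.Collections.Generic;

// Simplified model of one field mapping: its type plus the types of its multi-fields.
// Hypothetical; the provider works on IReadOnlyDictionary<string, object?> mappings instead.
public sealed record FieldMapping(string Type, IReadOnlyDictionary<string, string>? SubFields = null);

public static class ExactMatchPath
{
    // keyword field -> bare path; field with a keyword multi-field -> "<path>.keyword"; else bare path.
    public static string Resolve(string path, FieldMapping mapping)
    {
        if (mapping.Type == "keyword")
            return path;

        if (mapping.SubFields != null &&
            mapping.SubFields.TryGetValue("keyword", out var subFieldType) &&
            subFieldType == "keyword")
            return path + ".keyword";

        return path;
    }
}
```

For example, `ExactMatchPath.Resolve("api_key_id", new FieldMapping("keyword"))` returns `"api_key_id"`, which was the pre-#743 answer for every index. Fed the physical `text` mapping, the same call returns `"api_key_id.keyword"`.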

### D2 — Reading mapping is not query-time repair

`GET _mapping` reads index schema metadata. It performs no mapping mutation, no
reindex, no document backfill, and no event replay. The query path stays free of
repair/priming side effects (CLAUDE.md: the query path must not perform mapping
mutation / repair; blueprint §5 #3). The provider still performs no online index
repair and no document-level dual-read.
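Because the probe is a plain read, the query path only needs to parse the response. Below is a minimal sketch with `System.Text.Json`. It assumes the typeless response shape of Elasticsearch 7 and later (`{"<index>": {"mappings": {"properties": {...}}}}`). `MappingResponseParser.TryExtractProperties` is hypothetical. It is not the provider's `TryExtractFieldMappingsFromMappingResponse`, which returns `IReadOnlyDictionary<string, object?>`. A `null` result stands for "physical truth unavailable", matching the D3 fallback contract.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical sketch: extract the root-level `properties` of one index from a
// `GET <index>/_mapping` response body. Returns null when the body is unusable.
public static class MappingResponseParser
{
    public static Dictionary<string, JsonElement>? TryExtractProperties(string payload, string indexName)
    {
        try
        {
            using var document = JsonDocument.Parse(payload);
            if (!TryGetObject(document.RootElement, indexName, out var index) ||
                !TryGetObject(index, "mappings", out var mappings) ||
                !TryGetObject(mappings, "properties", out var properties))
                return null;

            var result = new Dictionary<string, JsonElement>(StringComparer.Ordinal);
            foreach (var field in properties.EnumerateObject())
                result[field.Name] = field.Value.Clone(); // Clone: the JsonDocument is disposed on return.
            return result;
        }
        catch (JsonException)
        {
            return null; // Unparseable body: treat as "truth unavailable".
        }
    }

    // Reads a child property only when both parent and child are JSON objects.
    private static bool TryGetObject(JsonElement parent, string name, out JsonElement child)
    {
        child = default;
        return parent.ValueKind == JsonValueKind.Object &&
               parent.TryGetProperty(name, out child) &&
               child.ValueKind == JsonValueKind.Object;
    }
}
```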

### D3 — Probe failure falls back to declared metadata

When the `_mapping` probe cannot read physical truth (index absent, ES
unreachable, HTTP timeout, unparseable body), the resolver falls back to the
augmented `DocumentIndexMetadata` — the pre-#743 behaviour. A best-effort probe
must never turn a transient mapping-endpoint failure into a query failure.
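D3's contract can be read as a small wrapper around any probe. `BestEffortProbe.ReadOrFallbackAsync` is a hypothetical helper, not a provider API, but it uses the same exception filters as this PR's `ReadActualFieldMappingsAsync`:

- Cancellation by the caller propagates.
- `HttpRequestException` falls back to the declared value.
- An `HttpClient` timeout also falls back. It surfaces as an `OperationCanceledException` even though the caller's token is not cancelled.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of "best-effort probe, then fall back to declared metadata".
public static class BestEffortProbe
{
    public static async Task<T> ReadOrFallbackAsync<T>(
        Func<CancellationToken, Task<T?>> probe,
        T fallback,
        CancellationToken ct)
        where T : class
    {
        try
        {
            // A null result means "physical truth unavailable" (e.g. a non-success status code).
            return await probe(ct) ?? fallback;
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            throw; // The caller cancelled: propagate instead of silently falling back.
        }
        catch (Exception ex) when (ex is HttpRequestException or OperationCanceledException)
        {
            // Unreachable endpoint, or an HttpClient timeout surfacing as TaskCanceledException.
            return fallback;
        }
    }
}
```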

### D4 — The probe result is cached per index for the store lifetime

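`GetActualFieldMappingsAsync` caches a successful read per index name for the lifetime
of the `ElasticsearchIndexLifecycleManager`, which each store instance creates. A failed
probe is not cached, so the next query probes again. Steady-state cost is therefore one
extra `GET _mapping` per index per store instance. Mapping drift within that lifetime is
not a concern for stable query fields: they exist in the proto contract from the start,
and a process restart re-probes.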
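The caching rule, including the point that failed probes are not cached, can be sketched generically. `PerIndexProbeCache<T>` is a hypothetical class. It follows the same double-checked pattern as `GetActualFieldMappingsAsync`:

- a fast path that checks the cache under a short lock;
- a `SemaphoreSlim` so concurrent callers trigger one probe;
- a re-check after acquiring the semaphore.

Unlike the manager, it uses a plain `object` gate instead of `System.Threading.Lock`, so it compiles on older .NET versions.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of D4: cache successful probe results per index name, never cache a
// failed (null) probe, and serialize probes so concurrent callers trigger only one.
public sealed class PerIndexProbeCache<T> : IDisposable where T : class
{
    private readonly Func<string, CancellationToken, Task<T?>> _probe;
    private readonly Dictionary<string, T> _cache = new(StringComparer.Ordinal);
    private readonly object _gate = new();
    private readonly SemaphoreSlim _probeLock = new(1, 1);

    public PerIndexProbeCache(Func<string, CancellationToken, Task<T?>> probe) => _probe = probe;

    public async Task<T?> GetAsync(string indexName, CancellationToken ct)
    {
        // Fast path: a cached success needs no probe and no semaphore.
        lock (_gate)
        {
            if (_cache.TryGetValue(indexName, out var cached))
                return cached;
        }

        await _probeLock.WaitAsync(ct);
        try
        {
            // Re-check: another caller may have populated the cache while we waited.
            lock (_gate)
            {
                if (_cache.TryGetValue(indexName, out var cached))
                    return cached;
            }

            var result = await _probe(indexName, ct);
            if (result != null)
            {
                lock (_gate)
                    _cache[indexName] = result;
            }
            return result; // Null is returned to the caller but not cached.
        }
        finally
        {
            _probeLock.Release();
        }
    }

    public void Dispose() => _probeLock.Dispose();
}
```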

### D5 — Scope: query path only

This ADR fixes the exact-match *filter* resolution that caused #743. It does not
introduce alias indirection, schema fingerprinting, blue-green reindex migration,
or a real-Elasticsearch CI suite. Those items (phases P1–P3 and P5 of issue #743)
remain tracked there as a separate index-lifecycle effort. Neither recovering from
the outage nor making the query path drift-tolerant requires them.

## Alternatives considered

- **Revert #665.** Rejected: descriptor-driven keyword mapping for new indices is
correct and wanted. The missing piece is read-path drift tolerance, not the
augmentation itself.
- **Heuristic patch to the resolver** (e.g. "always also try `.keyword`").
Rejected as #743 non-goal #8: a blind second guess deepens implicit-convention
debt, whereas reading the index's real mapping is ground truth, not a heuristic.
- **Manual clear/rebuild runbook** (the blueprint's original stance). Rejected as
the *primary* mechanism: no gate enforces it, and it has already failed in
production. Reading index truth makes the query path correct without an
operator step.
- **The full index-lifecycle epic now** (alias + fingerprint + migration +
Testcontainers). Deferred: too large for one PR onto the live deploy branch and
unnecessary to recover the outage. Tracked by #743.

## Consequences

- When the `_mapping` probe succeeds, every projection index created before 2026-05-18
with dynamic string mappings now answers identifier-shaped exact-match queries
correctly. The Lark registration lookup recovers without an operator touching
production. So does every other exact-match lookup on an identifier-shaped field that
was silently returning 0 hits.
- One additional `GET _mapping` round-trip per index per store instance, cached after
the first successful read.
- The "自动索引映射" (automatic index mapping) section of
`src/Aevatar.CQRS.Projection.Providers.Elasticsearch/README.md` is updated. It now says the
provider reads the live mapping for read-side field resolution (it still does not
repair or rebuild indices).
- The blueprint's "no runtime index state" constraint now has a recorded, scoped
exception; future read-path work references this ADR instead of silently
re-deciding.

## References

- Issue #743 (ES projection index lifecycle: schema-drift gap silently breaks
by-field queries; Lark bot outage 2026-05-20).
- PR #665 (Stabilize Elasticsearch projection index mappings).
- `docs/design/2026-05-15-elasticsearch-projection-index-mapping-blueprint.md`:
§5 hard-constraints #2/#3, §9 target architecture.
- CLAUDE.md: sections "权威状态 / ReadModel / Projection(强制)" (authoritative state /
ReadModel / Projection, mandatory) and "正确架构优先" (correct architecture first).
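
@@ -95,6 +95,8 @@ owner: aevatar-core
 9. Refactoring semantics must be honest: when a new mapping contract is incompatible with an old index, require clearing / rebuilding the projection index outright. Do not silently repair it in the application read path, and do not design a compatibility layer for historical data that never reached production.
 10. This design must not introduce an internal generic `Metadata` bag; `DocumentIndexMetadata` is Elasticsearch index-boundary metadata, so keeping that name is allowed.

+> **Revision (2026-05-22, ADR-0025)**: Constraints #2 and #9 apply to index initialization and the mapping augmentation helper. Field-path resolution for exact-match (`term`) queries now reads the live `_mapping` of the target index. Augmented metadata is code intent, while the physical mapping of an index created before 2026-05-18 is the fact; the divergence between the two caused the #743 production outage. Reading `_mapping` performs no mutation / repair / replay, so constraint #3 still holds. See `docs/adr/0025-elasticsearch-exact-match-resolution-reads-index-truth.md`.
+
 ## 6. Current baseline

 ### 6.1 What is currently correct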
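
@@ -32,9 +32,17 @@ Elasticsearch Document Provider.
 - When creating a new index, the provider uses the read model's protobuf descriptor to fill in low-risk, stable field mappings: root-level `google.protobuf.Timestamp` maps to `date`, and root-level stable string identifier fields (such as `id`, `actor_id`, `last_event_id`, `*_id`, `*_key`, `*_hash`, `*_status`, `*_kind`, `*_type`, `*_type_url`) map to `keyword`
 - Mappings explicitly declared in `DocumentIndexMetadata` take precedence; the provider does not override custom `text`, analyzer, object, nested, or other business mappings
 - `google.protobuf.Any`, `google.protobuf.Struct`, map, repeated message, and repeated scalar fields stay open by default and are not expanded recursively by the generic helper
-- When a mapping contract change is incompatible with an old Elasticsearch index, clear or rebuild the projection index outright; the provider does no online repair of old indices, no dual-read fallback, and no query-time mapping repair
+- When a mapping contract change is incompatible with an old Elasticsearch index, index initialization still creates **new** indices from the current contract; the provider does not repair, rebuild, or mutate old indices online from the read path
 - `AutoCreateIndex=true` only creates a new index from the current contract when the index is missing; if data must be preserved, restore it through projection replay or an external rebuild process

+## Exact-match field path resolution
+
+- `keyword` / `text` field-path resolution for exact-match (`term` / `terms`) filters is based on the target index's **live** `_mapping` (`GET <index>/_mapping`), not on code-side augmented metadata
+- Rationale: augmented metadata is code intent, but on indices created before 2026-05-18, string fields such as `*_id` may still be dynamic `text` + `.keyword` multi-fields. When the two diverge, a `term` query hits the analyzed `text` field and returns 0 hits for identifier-shaped values (see issue #743, ADR-0025)
+- Reading `_mapping` only reads the index schema; it performs no mapping mutation / reindex / document backfill / event replay. Successful probe results are cached per index
+- When the `_mapping` probe fails (index missing, ES unreachable, timeout, unparseable response), resolution falls back to declared / augmented metadata, i.e. the pre-#743 resolution behaviour
+- Decision record: `docs/adr/0025-elasticsearch-exact-match-resolution-reads-index-truth.md`
+
 References:

 - [_id field](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/mapping-id-field)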

@@ -6,8 +6,11 @@ namespace Aevatar.CQRS.Projection.Providers.Elasticsearch.Stores;
 internal sealed class ElasticsearchIndexLifecycleManager : IDisposable
 {
     private readonly SemaphoreSlim _initLock = new(1, 1);
+    private readonly SemaphoreSlim _mappingProbeLock = new(1, 1);
     private readonly Lock _stateGate = new();
     private readonly HashSet<string> _initializedIndices = new(StringComparer.Ordinal);
+    private readonly Dictionary<string, IReadOnlyDictionary<string, object?>> _actualFieldMappingsByIndex =
+        new(StringComparer.Ordinal);
     private readonly HttpClient _httpClient;
     private readonly bool _autoCreate;

@@ -75,5 +78,74 @@ private void MarkInitialized(string indexName)
         _initializedIndices.Add(indexName);
     }

-    public void Dispose() => _initLock.Dispose();
+    /// <summary>
+    /// Reads the live Elasticsearch <c>_mapping</c> for an index so the query path can resolve
+    /// keyword/text field paths from physical truth rather than code-side augmented metadata.
+    /// Returns <c>null</c> when the index is absent or the mapping cannot be read; callers then
+    /// fall back to declared metadata. Successful reads are cached for the manager lifetime.
+    /// </summary>
+    public async Task<IReadOnlyDictionary<string, object?>?> GetActualFieldMappingsAsync(
+        string indexName,
+        CancellationToken ct)
+    {
+        lock (_stateGate)
+        {
+            if (_actualFieldMappingsByIndex.TryGetValue(indexName, out var cached))
+                return cached;
+        }
+
+        await _mappingProbeLock.WaitAsync(ct);
+        try
+        {
+            lock (_stateGate)
+            {
+                if (_actualFieldMappingsByIndex.TryGetValue(indexName, out var cached))
+                    return cached;
+            }
+
+            var mappings = await ReadActualFieldMappingsAsync(indexName, ct);
+            if (mappings == null)
+                return null;
+
+            lock (_stateGate)
+                _actualFieldMappingsByIndex[indexName] = mappings;
+            return mappings;
+        }
+        finally
+        {
+            _mappingProbeLock.Release();
+        }
+    }
+
+    private async Task<IReadOnlyDictionary<string, object?>?> ReadActualFieldMappingsAsync(
+        string indexName,
+        CancellationToken ct)
+    {
+        try
+        {
+            using var response = await _httpClient.GetAsync($"{indexName}/_mapping", ct);
+            if (!response.IsSuccessStatusCode)
+                return null;
+
+            var payload = await response.Content.ReadAsStringAsync(ct);
+            return ElasticsearchProjectionDocumentStoreMetadataSupport
+                .TryExtractFieldMappingsFromMappingResponse(payload, indexName);
+        }
+        catch (OperationCanceledException) when (ct.IsCancellationRequested)
+        {
+            throw;
+        }
+        catch (Exception ex) when (ex is HttpRequestException or OperationCanceledException)
+        {
+            // Best-effort probe: an unreachable mapping endpoint or HTTP timeout must not fail the
+            // query. The caller falls back to declared metadata (pre-existing resolution behaviour).
+            return null;
+        }
+    }
+
+    public void Dispose()
+    {
+        _initLock.Dispose();
+        _mappingProbeLock.Dispose();
+    }
 }

@@ -36,7 +36,7 @@ public sealed class ElasticsearchProjectionDocumentStore<TReadModel, TKey>
     private readonly DocumentIndexMetadata _indexMetadata;
     private readonly Func<TReadModel, string?>? _indexScopeSelector;
     private readonly Func<string, string> _fieldPathResolver;
-    private readonly Func<ProjectionDocumentFilter, string, string> _exactMatchFieldPathResolver;
+    private readonly IReadOnlyDictionary<string, FieldDescriptor> _descriptorFieldMap;
     private readonly ILogger<ElasticsearchProjectionDocumentStore<TReadModel, TKey>> _logger;

     public ElasticsearchProjectionDocumentStore(
@@ -98,7 +98,7 @@ public ElasticsearchProjectionDocumentStore(
         _indexScopeSelector = indexScopeSelector;
         _defaultSortField = options.DefaultSortField?.Trim() ?? "";
         _fieldPathResolver = BuildFieldPathResolver(descriptor);
-        _exactMatchFieldPathResolver = BuildExactMatchFieldPathResolver(descriptor, _indexMetadata);
+        _descriptorFieldMap = BuildDescriptorFieldMap(descriptor);
         _logger = logger ?? NullLogger<ElasticsearchProjectionDocumentStore<TReadModel, TKey>>.Instance;

         _indexManager = new ElasticsearchIndexLifecycleManager(_httpClient, _autoCreateIndex);
@@ -206,6 +206,7 @@ public async Task<ProjectionDocumentQueryResult<TReadModel>> QueryAsync(
         ct.ThrowIfCancellationRequested();
         ThrowIfDynamicReadModelQueriesUnsupported("query");
         await _indexManager.EnsureIndexAsync(_indexName, _indexMetadata, ct);
+        var exactMatchFieldPathResolver = await BuildExactMatchFieldPathResolverAsync(ct);
         var boundedTake = Math.Clamp(query.Take <= 0 ? 50 : query.Take, 1, _queryTakeMax);

         using var request = new HttpRequestMessage(HttpMethod.Post, $"{_indexName}/_search")
@@ -216,7 +217,7 @@ public async Task<ProjectionDocumentQueryResult<TReadModel>> QueryAsync(
                     _defaultSortField,
                     boundedTake,
                     _fieldPathResolver,
-                    _exactMatchFieldPathResolver),
+                    exactMatchFieldPathResolver),
                 Encoding.UTF8,
                 "application/json"),
         };
@@ -284,11 +285,23 @@ private static Func<string, string> BuildFieldPathResolver(MessageDescriptor des
         return fieldPath => ResolveFieldPath(descriptor, fieldPath);
     }

-    private static Func<ProjectionDocumentFilter, string, string> BuildExactMatchFieldPathResolver(
-        MessageDescriptor descriptor,
-        DocumentIndexMetadata indexMetadata)
+    private async Task<Func<ProjectionDocumentFilter, string, string>> BuildExactMatchFieldPathResolverAsync(
+        CancellationToken ct)
+    {
+        // Exact-match (term) filters must target the field path that physically exists in
+        // Elasticsearch. The resolver consults the live index `_mapping`, not the code-side
+        // augmented `DocumentIndexMetadata`: a string field that augmented metadata declares
+        // `keyword` may still be a dynamic `text` + `.keyword` multi-field on any index created
+        // before that declaration shipped. When the live mapping cannot be read, fall back to the
+        // declared metadata (pre-existing behaviour).
+        // See docs/adr/0025-elasticsearch-exact-match-resolution-reads-index-truth.md.
+        var actualFieldMappings = await _indexManager.GetActualFieldMappingsAsync(_indexName, ct);
+        return BuildExactMatchFieldPathResolver(actualFieldMappings ?? _indexMetadata.Mappings);
+    }
+
+    private Func<ProjectionDocumentFilter, string, string> BuildExactMatchFieldPathResolver(
+        IReadOnlyDictionary<string, object?> mappings)
     {
-        var descriptorFieldMap = BuildDescriptorFieldMap(descriptor);
         return (filter, resolvedFieldPath) =>
         {
             if (resolvedFieldPath.EndsWith(".keyword", StringComparison.Ordinal))
@@ -298,20 +311,20 @@ private static Func<ProjectionDocumentFilter, string, string> BuildExactMatchFie
                 return resolvedFieldPath;

             if (ElasticsearchProjectionDocumentStoreMetadataSupport.TryGetFieldMapping(
-                    indexMetadata.Mappings,
+                    mappings,
                     resolvedFieldPath,
-                    out var explicitMapping))
+                    out var fieldMapping))
             {
-                if (ElasticsearchProjectionDocumentStoreMetadataSupport.IsKeywordFieldMapping(explicitMapping))
+                if (ElasticsearchProjectionDocumentStoreMetadataSupport.IsKeywordFieldMapping(fieldMapping))
                     return resolvedFieldPath;

-                if (ElasticsearchProjectionDocumentStoreMetadataSupport.HasKeywordMultiField(explicitMapping))
+                if (ElasticsearchProjectionDocumentStoreMetadataSupport.HasKeywordMultiField(fieldMapping))
                     return $"{resolvedFieldPath}.keyword";

                 return resolvedFieldPath;
             }

-            return descriptorFieldMap.TryGetValue(resolvedFieldPath, out var field) &&
+            return _descriptorFieldMap.TryGetValue(resolvedFieldPath, out var field) &&
                    field.FieldType == FieldType.String
                 ? $"{resolvedFieldPath}.keyword"
                 : resolvedFieldPath;