Serialize concurrent ConfigureIndexesAsync with distributed lock + cache marker by niemyjski · Pull Request #244 · FoundatioFx/Foundatio.Repositories

niemyjski · 2026-03-24T22:32:06Z

TLDR

When multiple distributed processes (pods, workers, migration runners) call \ConfigureIndexesAsync\ on startup, only the first one does real work. Everyone else waits and skips. Cache marker is cleared on delete, reindex, and maintenance so the next configure call re-validates.

Problem

\ConfigureIndexesAsync\ had no concurrency protection. Every caller ran the full configure pass (index exists checks, create/update settings, put mappings, alias management, reindex detection) regardless of whether another process was already doing the same work. At scale (100 nodes, 100+ indexes), this means thousands of redundant Elasticsearch admin API calls on every deployment, plus racing reindex work item enqueuing.

Solution

Added a distributed double-checked lock + cache marker pattern to \ElasticConfiguration.ConfigureIndexesAsync:

Cache check -- if \configure-indexes\ key exists in the distributed cache, return immediately (zero ES calls, zero lock overhead)
Distributed lock -- acquire via \CacheLockProvider\ (1-minute lock duration, 1-minute wait timeout)
Double-check -- after acquiring the lock, check cache again (another process may have finished while we waited)
Configure -- run the full \ConfigureAsync\ + \MaintainAsync\ on all indexes in parallel
Set cache marker -- 5-minute TTL, only set after successful completion (partial failures retry)

Cache marker is explicitly cleared by \DeleteIndexesAsync, \ReindexAsync, and \MaintainIndexesAsync\ so the next configure call re-runs after any structural change.

Behavior at scale

Caller	Cache lookups	Lock ops	ES API calls
Winner (first process)	2	acquire + release	Full configure
Waiter (during configure)	2	acquire + release	0
Late arrival (after configure)	1	0	0

Failure modes

Cache down: Falls back to current behavior (everyone configures). Idempotent, just redundant.
Lock timeout: Proceeds with warning. Safe because \ConfigureAsync\ is idempotent.
Partial failure: Exception propagates, cache marker never set, next caller retries.
Manual ES changes: 5-minute TTL safety net ensures reconfiguration.

Design decisions

No lock renewal: Even 100+ indexes in parallel complete in seconds. 1-minute lock is generous. If it takes longer, ES is unhealthy.
Explicit \indexes\ parameter bypasses lock+cache: Callers who pass specific indexes know what they want.
No interface changes: \IElasticConfiguration\ signature is unchanged.

Changes

\ElasticConfiguration.ConfigureIndexesAsync\ -- distributed lock + cache marker
\ElasticConfiguration.DeleteIndexesAsync\ -- clears cache marker after deletion
\ElasticConfiguration.MaintainIndexesAsync\ -- clears cache marker after maintenance
\ElasticConfiguration.ReindexAsync\ -- clears cache marker after reindexing

…k + cache marker Prevent redundant Elasticsearch admin API calls when multiple distributed processes call ConfigureIndexesAsync simultaneously on startup.

Copilot

Pull request overview

Adds concurrency protection to Elasticsearch index configuration to prevent redundant admin calls when many distributed processes call ConfigureIndexesAsync at startup, using a distributed lock plus a short-lived cache marker.

Changes:

Serialize ElasticConfiguration.ConfigureIndexesAsync with a distributed lock and “configured recently” cache marker.
Clear the cache marker after index delete, maintenance, and reindex operations so subsequent configure runs re-validate.
Convert several methods to async and await the underlying Task.WhenAll to support the new post-work cache invalidation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-24T22:36:00Z