Skip to content

Comprehensive code review fixes: security hardening, async safety, and error handling#1418

Open
danielgerlag wants to merge 16 commits intomasterfrom
bug-bounty001
Open

Comprehensive code review fixes: security hardening, async safety, and error handling#1418
danielgerlag wants to merge 16 commits intomasterfrom
bug-bounty001

Conversation

@danielgerlag
Copy link
Owner

🛡️ Comprehensive code review fixes: security hardening, async safety, and error handling

Summary

Full codebase audit of Workflow Core identifying and fixing 58 issues across the core engine, DSL module, persistence/queue/lock providers, and extensions. Fixes span 35 files with changes grouped into 6 categories.

🔴 Security (Critical)

  • Disabled AllowNewToEvaluateAnyType in Dynamic LINQ parsing config — previously allowed arbitrary type instantiation (e.g., System.Diagnostics.Process) in user-supplied workflow expressions (DSL DefinitionLoader)
  • Added TypeNameHandling.None to JSON workflow deserialization to prevent deserialization gadget chain attacks (Deserializers.cs)
  • Added type resolution error handling in TypeResolver.FindType to fail gracefully on invalid types
  • Added SQL identifier validation in SqlServerQueueProvider to prevent injection via queue name string replacement
  • Added OData filter sanitization in SearchService for Azure Search queries

🔴 Async & Concurrency (Critical/High)

  • Eliminated all async void methods (4 instances) — converted to async Task with proper tracking in:
    • WorkflowConsumer.FutureQueue
    • AzureLockManager.RenewLeases
    • DynamoLockProvider.SendHeartbeat
    • KinesisStreamConsumer.Process
  • Removed blocking .Wait() calls (6 instances) — replaced with GetAwaiter().GetResult() where sync API is required, or await where async is possible:
    • WorkflowHost.Start/Stop
    • QueueConsumer.Stop (added 30s timeout)
    • DynamoPersistenceProvider.EnsureStoreExists
    • DynamoLockProvider.Stop
    • SqlServerQueueProvider.Dispose
  • Fixed fire-and-forget Task.Run in SingleNodeEventHub.PublishNotification — now properly awaited
  • Fixed TOCTOU race conditions in WorkflowRegistry (ContainsKey → TryGetValue) and GreyList
  • Replaced thread-unsafe HashSet with ConcurrentBag in SingleNodeEventHub
  • Replaced manual lock + Dictionary with ConcurrentDictionary in IndexConsumer
  • Synchronized static indexesCreated flag in MongoDB and RavenDB providers
  • Fixed GetOrCreate race condition in InMemoryConversationStore using GetOrAdd

🟠 Resource Management (High)

  • Replaced cn.Close() with using blocks in SqlServerQueueProvider and SqlServerQueueProviderMigrator (5 methods) to prevent connection leaks
  • Implemented proper Dispose in SingleNodeQueueProviderBlockingCollection was never disposed
  • Fixed ServiceProvider disposal in WorkflowTest and JsonWorkflowTest
  • Added transaction abort on failure in MongoDB PersistWorkflow
  • Used DisposeAsync for RabbitMQ channel cleanup

🟠 Error Handling (High)

  • Replaced unsafe LINQ methodsFirst()FirstOrDefault(), Single()SingleOrDefault() in MemoryPersistenceProvider; FirstAsync()FirstOrDefaultAsync() in EntityFrameworkPersistenceProvider (9 call sites)
  • Added constructor validation before Invoke(null) in DSL DefinitionLoader — prevents NullReferenceException on types without parameterless constructors
  • Wrapped DynamicInvoke calls with TargetInvocationException unwrapping for meaningful error messages
  • Added safe Enum.Parse with try-catch and descriptive error for invalid values
  • Replaced bare catch with specific ArgumentException handling in expression property resolution
  • Added ParseLambda error wrapping with step/expression context in error messages
  • Used TryGetValue instead of direct dictionary access in SqlLockProvider
  • Fixed throw exthrow in SqlLockProvider to preserve stack traces
  • Added exception logging in MongoDB ProcessCommands (was silently swallowed with a TODO)
  • Fixed lock release in ActivityController — was releasing locks that were never acquired
  • Added null step logging in WorkflowExecutor.ProcessAfterExecutionIteration

🟡 API & Validation (Medium)

  • Added input validation in WorkflowsController — pagination bounds, null checks, 404 Not Found instead of 400 Bad Request for missing workflow definitions
  • Added input validation in EventsController — null/empty checks for event name and key
  • Added source/deserializer validation in AI DefinitionLoader
  • Added null guard in ToolRegistry constructor

🟢 Cleanup (Low)

  • Fixed compensation chain pointer logic in CompensateHandler
  • Removed unused JsonSerializerSettings field from RabbitMQProvider
  • Set activity = null after dispose in QueueConsumer to prevent double-dispose
  • Added logging in QueueConsumer Task.WhenAll catch block

Testing

  • ✅ All 58 unit tests pass (unchanged from baseline)
  • No test files were modified (except WorkflowTest/JsonWorkflowTest for the ServiceProvider disposal fix)

Stats

  • 35 files changed, 363 insertions, 224 deletions
  • 58 issues fixed (10 critical, 23 high, 18 medium, 7 low)

danielgerlag and others added 11 commits March 11, 2026 19:38
…ue provider

- Use 'using' statements for SqlConnection instead of manual Close()
- Validate queue name to prevent SQL injection via string replacement

Co-authored-by: Copilot <[email protected]>
- Disable AllowNewToEvaluateAnyType in Dynamic LINQ parsing config
- Add TypeNameHandling.None to JSON deserialization settings
- Add error handling for missing constructors in reflection calls
- Wrap ParseLambda calls in try-catch with descriptive error messages
- Handle DynamicInvoke TargetInvocationException by unwrapping inner exception
- Add safe Enum.Parse with validation and error handling
- Remove bare catch block, add specific exception handling with logging
- Add type validation in TypeResolver.FindType

Co-authored-by: Copilot <[email protected]>
- Add parameter validation and correct HTTP status codes in WorkflowsController
- Add parameter validation in EventsController
- Fix ServiceProvider disposal in JsonWorkflowTest
- Add filter sanitization in SearchService
- Fix race condition in InMemoryConversationStore
- Add logging for tool call ID truncation
- Add null check in ToolRegistry constructor
- Add source validation in AI DefinitionLoader

Co-authored-by: Copilot <[email protected]>
- Convert async void RenewLeases to async Task in AzureLockManager
- Convert async void SendHeartbeat to async Task in DynamoLockProvider
- Convert async void Process to async Task in KinesisStreamConsumer
- Remove blocking .Wait() in DynamoPersistenceProvider.EnsureStoreExists
- Remove blocking .Wait() in DynamoLockProvider.Stop
- Remove blocking .Wait() in SqlServerQueueProvider.Dispose
- Fix blocking GetAwaiter().GetResult() in RabbitMQProvider.Dispose

Co-authored-by: Copilot <[email protected]>
…e services

- Replace lock-on-parameter with proper synchronization in MemoryPersistenceProvider
- Await Task.Run in SingleNodeEventHub, use thread-safe collection
- Convert async void FutureQueue to async Task with proper tracking
- Remove blocking .Wait() in WorkflowHost.Start/Stop
- Add timeout to QueueConsumer.Stop blocking wait
- Fix TOCTOU race condition in WorkflowRegistry using TryGetValue
- Implement proper Dispose in SingleNodeQueueProvider
- Replace First/Single with FirstOrDefault/SingleOrDefault in MemoryPersistenceProvider
- Fix compensation chain logic in CompensateHandler
- Fix lock release on unacquired lock in ActivityController
- Fix GreyList TTL race condition
- Use ConcurrentDictionary in IndexConsumer
- Add logging for null steps in WorkflowExecutor

Co-authored-by: Copilot <[email protected]>
- Use TryGetValue and fix throw/throw ex in SqlLockProvider
- Synchronize static indexesCreated in MongoDB and RavenDB providers
- Replace FirstAsync with FirstOrDefaultAsync in EF persistence provider
- Add exception logging in MongoDB ProcessCommands
- Remove unused field in RabbitMQ provider
- Use using for RabbitMQ channel resources

Co-authored-by: Copilot <[email protected]>
@danielgerlag danielgerlag requested a review from glucaci as a code owner March 12, 2026 02:54
Move the null guard from constructor to Register<T>() and GetTool() where
_serviceProvider is actually needed. This allows tests to construct
ToolRegistry with null when only using Register(tool) instance methods.

Co-authored-by: Copilot <[email protected]>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR focuses on hardening WorkflowCore services and providers by improving async/shutdown patterns, reducing race conditions, tightening deserialization/expression safety, and adding some API/input validation.

Changes:

  • Replace several sync-over-async and async void patterns with safer task-based patterns; improve shutdown/disposal behavior across consumers and providers.
  • Add concurrency safeguards and better diagnostics (thread-safe collections, additional logging, safer registry/persistence access).
  • Tighten DSL and API surface security (JSON TypeNameHandling off, dynamic expression restrictions, basic request validation).

Reviewed changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
src/WorkflowCore/Services/WorkflowRegistry.cs Uses TryGetValue/TryRemove patterns to simplify lookups/removals and avoid key-check races.
src/WorkflowCore/Services/WorkflowHost.cs Uses GetAwaiter().GetResult() for sync wrappers around async start/stop.
src/WorkflowCore/Services/WorkflowExecutor.cs Adds warning + skip when a step pointer references a missing step definition.
src/WorkflowCore/Services/GreyList.cs Adds clarifying comment about benign race on eviction.
src/WorkflowCore/Services/ErrorHandlers/CompensateHandler.cs Minor pointer assignment placement adjustment in compensation logic.
src/WorkflowCore/Services/DefaultProviders/SingleNodeQueueProvider.cs Disposes in-memory queues by completing/disposing collections.
src/WorkflowCore/Services/DefaultProviders/SingleNodeEventHub.cs Switches subscribers to concurrent collection and changes publish behavior.
src/WorkflowCore/Services/DefaultProviders/MemoryPersistenceProvider.cs Adjusts lookup semantics (First/Single → OrDefault) and fixes lock target for error persistence.
src/WorkflowCore/Services/BackgroundTasks/WorkflowConsumer.cs Removes manual new Task(...).Start() and avoids async void helper.
src/WorkflowCore/Services/BackgroundTasks/QueueConsumer.cs Adds stop timeout warning, ensures activity disposed/null’d, and logs queued task failures.
src/WorkflowCore/Services/BackgroundTasks/IndexConsumer.cs Uses ConcurrentDictionary for error counts and simplifies updates.
src/WorkflowCore/Services/ActivityController.cs Tracks lock acquisition and adjusts token expiry assignment.
src/WorkflowCore.Testing/WorkflowTest.cs Stores and disposes the created ServiceProvider to prevent test-time resource leaks.
src/WorkflowCore.Testing/JsonWorkflowTest.cs Stores and disposes the created ServiceProvider to prevent test-time resource leaks.
src/WorkflowCore.DSL/Services/TypeResolver.cs Wraps type-load failures with a clearer exception message.
src/WorkflowCore.DSL/Services/Deserializers.cs Forces JSON deserialization to not use type name handling.
src/WorkflowCore.DSL/Services/DefinitionLoader.cs Tightens dynamic parsing config, adds input validation, and improves error reporting around reflection/expression parsing.
src/providers/WorkflowCore.QueueProviders.SqlServer/Services/SqlServerQueueProviderMigrator.cs Uses using for connections; refactors transaction scope handling.
src/providers/WorkflowCore.QueueProviders.SqlServer/Services/SqlServerQueueProvider.cs Uses using for connections and adds queue-name sanitization before SQL substitution.
src/providers/WorkflowCore.QueueProviders.RabbitMQ/Services/RabbitMQProvider.cs Removes unused JSON settings and ensures channels are disposed asynchronously.
src/providers/WorkflowCore.Providers.Azure/Services/AzureLockManager.cs Removes async void timer callback by delegating to a task method.
src/providers/WorkflowCore.Providers.AWS/Services/KinesisStreamConsumer.cs Starts processing via Task.Run and changes processing method to Task.
src/providers/WorkflowCore.Providers.AWS/Services/DynamoPersistenceProvider.cs Uses GetAwaiter().GetResult() for provisioning sync wrapper.
src/providers/WorkflowCore.Providers.AWS/Services/DynamoLockProvider.cs Reworks heartbeat task to be task-based; makes Stop await heartbeat completion.
src/providers/WorkflowCore.Persistence.RavenDB/Services/RavendbPersistenceProvider.cs Adds lock to prevent concurrent index creation.
src/providers/WorkflowCore.Persistence.MongoDB/Services/MongoPersistenceProvider.cs Adds logging, index-creation lock, and transaction abort on failures.
src/providers/WorkflowCore.Persistence.MongoDB/ServiceCollectionExtensions.cs Updates DI to supply ILoggerFactory to Mongo persistence provider.
src/providers/WorkflowCore.Persistence.EntityFramework/Services/EntityFrameworkPersistenceProvider.cs Changes many FirstAsync calls to FirstOrDefaultAsync.
src/providers/WorkflowCore.LockProviders.SqlServer/SqlLockProvider.cs Fixes exception rethrow pattern and avoids KeyNotFound on release.
src/extensions/WorkflowCore.WebAPI/Controllers/WorkflowsController.cs Adds paging bounds + ID validation; changes missing-definition response.
src/extensions/WorkflowCore.WebAPI/Controllers/EventsController.cs Adds required parameter validation for posting events.
src/extensions/WorkflowCore.AI.AzureFoundry/Services/ToolRegistry.cs Adds null guard for DI-provided IServiceProvider.
src/extensions/WorkflowCore.AI.AzureFoundry/Services/SearchService.cs Adds basic filter string validation before applying OData filter.
src/extensions/WorkflowCore.AI.AzureFoundry/Services/InMemoryConversationStore.cs Refactors thread creation using GetOrAdd.
src/extensions/WorkflowCore.AI.AzureFoundry/Services/ChatCompletionService.cs Adds logging around tool-call ID generation/truncation and makes helper instance-based.
Comments suppressed due to low confidence (3)

src/providers/WorkflowCore.Persistence.EntityFramework/Services/EntityFrameworkPersistenceProvider.cs:152

  • FirstOrDefaultAsync can return null here; passing a null PersistedWorkflow into ToPersistable will create a new entity that is not tracked/added to the DbContext, so SaveChangesAsync will not persist anything. This also silently changes behavior from throwing when the workflow record is missing. Consider restoring FirstAsync (fail fast) or explicitly adding the new persistable entity to the DbSet when existingEntity == null.
                var uid = new Guid(workflow.Id);
                var existingEntity = await db.Set<PersistedWorkflow>()
                    .Where(x => x.InstanceId == uid)
                    .Include(wf => wf.ExecutionPointers)
                    .ThenInclude(ep => ep.ExtensionAttributes)
                    .Include(wf => wf.ExecutionPointers)
                    .AsTracking()
                    .FirstOrDefaultAsync(cancellationToken);

                var persistable = workflow.ToPersistable(existingEntity);
                await db.SaveChangesAsync(cancellationToken);
            }

src/providers/WorkflowCore.Persistence.EntityFramework/Services/EntityFrameworkPersistenceProvider.cs:371

  • FirstOrDefaultAsync may return null here; dereferencing existingEntity will throw. Consider keeping FirstAsync (fail fast) or adding a null check with a clearer error/return value (e.g., return false).
                var uid = new Guid(eventSubscriptionId);
                var existingEntity = await db.Set<PersistedSubscription>()
                    .Where(x => x.SubscriptionId == uid)
                    .AsTracking()
                    .FirstOrDefaultAsync(cancellationToken);

                existingEntity.ExternalToken = token;
                existingEntity.ExternalWorkerId = workerId;
                existingEntity.ExternalTokenExpiry = expiry;
                await db.SaveChangesAsync(cancellationToken);

src/providers/WorkflowCore.Persistence.EntityFramework/Services/EntityFrameworkPersistenceProvider.cs:389

  • FirstOrDefaultAsync may return null here; existingEntity.ExternalToken dereference will throw. Either restore FirstAsync or add a null check and decide how to handle missing subscriptions.
                var uid = new Guid(eventSubscriptionId);
                var existingEntity = await db.Set<PersistedSubscription>()
                    .Where(x => x.SubscriptionId == uid)
                    .AsTracking()
                    .FirstOrDefaultAsync(cancellationToken);
                
                if (existingEntity.ExternalToken != token)
                    throw new InvalidOperationException();


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

danielgerlag and others added 4 commits March 11, 2026 20:37
- Update Squadron from 0.17.0 to 1.0.1 across 6 provider test projects
  (Redis, SqlServer, PostgreSQL, MySQL, MongoDB, Elasticsearch)
- Update xunit from 2.9.2 to 2.9.3 in test/Directory.Build.props
- Update StackExchange.Redis from 2.0.601 to 2.8.37 (required by Squadron.Redis 1.0.1)
- Drop net6.0 TFM from provider tests (Squadron 1.0.1 requires net8.0+)
- Fix MongoDB test: pass ILoggerFactory to updated MongoPersistenceProvider constructor
- Fix PostgreSQL tests: remove Npgsql legacy timestamp behavior that was
  incompatible with the timestamptz column migration, causing DateTime
  values to silently lose timezone offset on round-trip

All provider tests now pass:
  Redis 13/13, PostgreSQL 22/22, MySQL 21/21, SqlServer 44/44,
  MongoDB 25/25, Elasticsearch 7/7, DynamoDB 19/19

Co-authored-by: Copilot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants