| description |
|---|
| Deploy Conductor as a self-hosted workflow engine in production: architecture overview, horizontal scaling, database, queue, indexing, and lock configuration, workflow monitoring, and recommended production deployment settings for this open source workflow orchestration platform. |
Conductor is a self-hosted, open source workflow engine that you deploy on your own infrastructure. This production deployment guide covers everything you need to run Conductor at scale: architecture, backend configuration, horizontal scaling, workflow monitoring, and tuning.
A Conductor deployment consists of these components:
What each component does:
| Component | Role |
|---|---|
| API Server | Exposes REST and gRPC endpoints for workflow and task operations. |
| Decider | The core state machine. Evaluates workflow state and schedules the next set of tasks. |
| Sweeper | Background process that polls for running workflows and triggers the decider to evaluate them. Required for progress on long-running workflows. |
| System Task Workers | Execute built-in task types (HTTP, Event, Wait, Inline, JSON_JQ_TRANSFORM, etc.) within the server JVM. |
| Event Processor | Listens to configured event buses and triggers workflows or completes tasks based on incoming events. |
| Database | Persists workflow definitions, execution state, task state, and poll data. |
| Queue | Manages task scheduling — pending tasks, delayed tasks, and the sweeper's own work queue. |
| Index | Powers workflow and task search in the UI and via the search API. |
| Lock | Distributed lock that prevents concurrent decider evaluations of the same workflow. Required in production. |
For local development and evaluation:
git clone https://github.com/conductor-oss/conductor
cd conductor
docker compose -f docker/docker-compose.yaml up

This starts Conductor with Redis (database + queue), Elasticsearch (indexing), and the server with UI on port 8080.
| URL | Description |
|---|---|
| http://localhost:8080 | Conductor UI |
| http://localhost:8080/swagger-ui/index.html | REST API docs |
| http://localhost:8080/api/ | API base URL |
Pre-built compose files for other backend combinations:
| Compose file | Database | Queue | Index |
|---|---|---|---|
| `docker-compose.yaml` | Redis | Redis | Elasticsearch 7 |
| `docker-compose-postgres.yaml` | PostgreSQL | PostgreSQL | PostgreSQL |
| `docker-compose-postgres-es7.yaml` | PostgreSQL | PostgreSQL | Elasticsearch 7 |
| `docker-compose-mysql.yaml` | MySQL | Redis | Elasticsearch 7 |
| `docker-compose-redis-os2.yaml` | Redis | Redis | OpenSearch 2 |
| `docker-compose-redis-os3.yaml` | Redis | Redis | OpenSearch 3 |
# Example: PostgreSQL for everything
docker compose -f docker/docker-compose-postgres.yaml up
# Example: Redis + OpenSearch 3
docker compose -f docker/docker-compose-redis-os3.yaml up

All configuration is done via Spring Boot properties in application.properties or environment variables. Properties can also be mounted as a Docker volume.
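When a setting is supplied as an environment variable rather than in a properties file, Spring Boot's relaxed binding derives the variable name from the property name: dots become underscores, dashes are removed, and the result is upper-cased. The sketch below illustrates that rule; `to_env_var` is a hypothetical helper written for this guide, not part of Conductor.

```python
def to_env_var(property_name: str) -> str:
    """Map a Spring Boot property name to its environment-variable form."""
    # Spring Boot's documented rule: replace "." with "_", drop "-", upper-case.
    return property_name.replace(".", "_").replace("-", "").upper()


for name in (
    "conductor.db.type",
    "conductor.workflow-execution-lock.type",
    "conductor.app.sweeperThreadCount",
):
    print(f"{name} -> {to_env_var(name)}")
```

For example, `conductor.db.type=postgres` would be passed to a container as `CONDUCTOR_DB_TYPE=postgres`.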
The database stores workflow definitions, execution state, task state, and event handler definitions.
conductor.db.type=postgres

Supported database backends:
| Backend | Property value | When to use | Notes |
|---|---|---|---|
| PostgreSQL | `postgres` | Recommended for production. ACID, battle-tested, supports indexing too. | Requires `spring.datasource.*` config. |
| MySQL | `mysql` | Production alternative if your team already runs MySQL. | Requires `spring.datasource.*` config. Needs separate queue backend (Redis). |
| Redis | `redis_standalone` | Fast, simple. Good for moderate scale. | Requires `conductor.redis.*` config. |
| Cassandra | `cassandra` | High write throughput, multi-region. | Requires `conductor.cassandra.*` config. |
| SQLite | `sqlite` | Local development only. Single-file, zero config. | Default. Not for production. |
conductor.db.type=postgres
conductor.external-payload-storage.type=postgres
spring.datasource.url=jdbc:postgresql://db-host:5432/conductor
spring.datasource.username=conductor
spring.datasource.password=<password>
# Optional tuning
conductor.postgres.deadlockRetryMax=3
conductor.postgres.taskDefCacheRefreshInterval=60s
conductor.postgres.asyncMaxPoolSize=12
conductor.postgres.asyncWorkerQueueSize=100

conductor.db.type=mysql
spring.datasource.url=jdbc:mysql://db-host:3306/conductor
spring.datasource.username=conductor
spring.datasource.password=<password>
# Optional tuning
conductor.mysql.deadlockRetryMax=3
conductor.mysql.taskDefCacheRefreshInterval=60s

conductor.db.type=redis_standalone
# Format: host:port:rack (semicolon-separated for multiple hosts)
conductor.redis.hosts=redis-host:6379:us-east-1c
conductor.redis.workflowNamespacePrefix=conductor
conductor.redis.queueNamespacePrefix=conductor_queues
conductor.redis.taskDefCacheRefreshInterval=1s
# Connection pool
conductor.redis.maxIdleConnections=8
conductor.redis.minIdleConnections=5
# SSL
conductor.redis.ssl=false
# Auth (password is taken from the first host entry: host:port:rack:password)
# Or set conductor.redis.username and conductor.redis.password directly

The queue backend manages task scheduling: it tracks which tasks are pending, delayed, or ready for execution. The sweeper and system task workers all depend on it.
conductor.queue.type=postgres

Supported queue backends:
| Backend | Property value | When to use |
|---|---|---|
| PostgreSQL | `postgres` | Use when database is also PostgreSQL. Simplest stack. |
| Redis | `redis_standalone` | Use when database is Redis or MySQL. Fast, low-latency. |
| SQLite | `sqlite` | Local development only. |
!!! tip "Match your queue backend to your database"
    PostgreSQL database + PostgreSQL queue is the simplest production stack, with one fewer dependency. If you use MySQL for the database, pair it with Redis for the queue.
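
The pairing advice above can be written down as a small lookup. This is only a sketch: `RECOMMENDED_QUEUE` and `backend_properties` are hypothetical names, and Cassandra is left out because this guide does not name a queue backend for it.

```python
# Database-to-queue pairings as recommended in this guide.
RECOMMENDED_QUEUE = {
    "postgres": "postgres",
    "mysql": "redis_standalone",
    "redis_standalone": "redis_standalone",
    "sqlite": "sqlite",  # local development only
}


def backend_properties(db_type: str) -> str:
    """Return the two property lines for a database and its paired queue."""
    if db_type not in RECOMMENDED_QUEUE:
        raise ValueError(f"no queue recommendation for database type {db_type!r}")
    queue_type = RECOMMENDED_QUEUE[db_type]
    return f"conductor.db.type={db_type}\nconductor.queue.type={queue_type}"


print(backend_properties("mysql"))
```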
The indexing backend powers workflow and task search in the UI and via the /api/workflow/search and /api/tasks/search endpoints.
conductor.indexing.enabled=true
conductor.indexing.type=postgres

Supported indexing backends:
| Backend | Property value | When to use | Notes |
|---|---|---|---|
| PostgreSQL | `postgres` | Simplest stack when database is also PostgreSQL. | Set `conductor.elasticsearch.version=0` to disable ES client. |
| Elasticsearch 7 | `elasticsearch` | Best search performance at scale. Full-text search. | Set `conductor.elasticsearch.version=7`. |
| OpenSearch 2 | `opensearch2` | Open-source ES alternative. | Compatible with ES 7 queries. |
| OpenSearch 3 | `opensearch3` | Latest OpenSearch. | |
| SQLite | `sqlite` | Local development only. | |
| Disabled | N/A | Set `conductor.indexing.enabled=false`. UI search won't work. | |
conductor.indexing.enabled=true
conductor.indexing.type=postgres
# Disable Elasticsearch client
conductor.elasticsearch.version=0

conductor.indexing.enabled=true
conductor.elasticsearch.url=http://es-host:9200
conductor.elasticsearch.version=7
conductor.elasticsearch.indexName=conductor
conductor.elasticsearch.clusterHealthColor=yellow
# Performance tuning
conductor.elasticsearch.indexBatchSize=1
conductor.elasticsearch.asyncMaxPoolSize=12
conductor.elasticsearch.asyncWorkerQueueSize=100
conductor.elasticsearch.asyncBufferFlushTimeout=10s
conductor.elasticsearch.indexShardCount=5
conductor.elasticsearch.indexReplicasCount=1
# Auth (if using security)
conductor.elasticsearch.username=elastic
conductor.elasticsearch.password=<password>

conductor.indexing.enabled=true
# Use opensearch2 or opensearch3
conductor.indexing.type=opensearch2
conductor.opensearch.url=http://os-host:9200
conductor.opensearch.indexPrefix=conductor
conductor.opensearch.clusterHealthColor=yellow
conductor.opensearch.indexReplicasCount=0

For high-throughput deployments, enable async indexing to decouple the indexing path from the workflow execution path:
conductor.app.asyncIndexingEnabled=true
conductor.app.asyncUpdateShortRunningWorkflowDuration=30s
conductor.app.asyncUpdateDelay=60s

Control what gets indexed:
conductor.app.taskIndexingEnabled=true
conductor.app.taskExecLogIndexingEnabled=true
conductor.app.eventMessageIndexingEnabled=true
conductor.app.eventExecutionIndexingEnabled=true

!!! warning "Required for production"
    Distributed locking prevents race conditions when multiple server instances evaluate the same workflow concurrently. Always enable locking in production with a distributed lock provider (Redis or Zookeeper).
conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true

Supported lock providers:
| Provider | Property value | When to use |
|---|---|---|
| Redis | `redis` | Recommended. Use when Redis is already in the stack. |
| Zookeeper | `zookeeper` | Use when Zookeeper is available (e.g. Kafka deployments). |
| Local | `local_only` | Single-instance development only. Not safe for multi-instance. |
conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true
# Lock is held for at most 60s
conductor.app.lockLeaseTime=60000
# Wait up to 500ms to acquire the lock
conductor.app.lockTimeToTry=500
# Server type: SINGLE, CLUSTER, or SENTINEL
conductor.redis-lock.serverType=SINGLE
conductor.redis-lock.serverAddress=redis://redis-host:6379
# conductor.redis-lock.serverPassword=<password>
# conductor.redis-lock.serverMasterName=master # for Sentinel
# conductor.redis-lock.namespace=conductor # key prefix
conductor.redis-lock.ignoreLockingExceptions=false

conductor.workflow-execution-lock.type=zookeeper
conductor.app.workflowExecutionLockEnabled=true
conductor.app.lockLeaseTime=60000
conductor.app.lockTimeToTry=500
conductor.zookeeper-lock.connectionString=zk1:2181,zk2:2181,zk3:2181
# conductor.zookeeper-lock.sessionTimeoutMs=60000
# conductor.zookeeper-lock.connectionTimeoutMs=15000
# conductor.zookeeper-lock.namespace=conductor

The sweeper is a background process that monitors running workflows. It polls the queue for workflows that need evaluation and triggers the decider. Without the sweeper, long-running workflows will not make progress.
The sweeper runs automatically as part of the Conductor server. Tune the thread count based on your workflow volume:
# Number of sweeper threads (default: availableProcessors * 2)
conductor.app.sweeperThreadCount=8
# How long to wait when polling the sweep queue (default: 2000ms)
conductor.app.sweeperWorkflowPollTimeout=2000
# Batch size per sweep poll (default: 2)
conductor.app.sweeper.sweepBatchSize=2
# Queue pop timeout in ms (default: 100)
conductor.app.sweeper.queuePopTimeout=100

!!! tip "Sweeper sizing"
    Start with `sweeperThreadCount = 2 * CPU cores`. If you see workflows stuck in RUNNING state, increase it. If CPU usage is high on idle, decrease it.
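
As a rough illustration of the sizing rule above, the starting value can be derived from the core count of the host. `starting_sweeper_threads` is a hypothetical helper; note that `os.cpu_count()` reports the host's cores and may not reflect a container CPU limit, so pass the core count explicitly when you know it.

```python
import os
from typing import Optional


def starting_sweeper_threads(cpu_cores: Optional[int] = None) -> int:
    """Return the suggested starting thread count: 2 * CPU cores."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return 2 * cores


# Emit the property line for a 4-core instance.
print(f"conductor.app.sweeperThreadCount={starting_sweeper_threads(4)}")
```

Treat the result as a starting point only, and adjust it based on the stuck-workflow and idle-CPU signals described above.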
System task workers execute built-in task types (HTTP, Event, Wait, Inline, JSON_JQ_TRANSFORM, etc.) inside the Conductor server JVM. They poll internal queues for scheduled system tasks and execute them.
# Number of system task worker threads (default: availableProcessors * 2)
conductor.app.systemTaskWorkerThreadCount=20
# Max number of tasks to poll at once (default: same as thread count)
conductor.app.systemTaskMaxPollCount=20
# Poll interval (default: 50ms)
conductor.app.systemTaskWorkerPollInterval=50ms
# Callback duration — how often to re-check async system tasks (default: 30s)
conductor.app.systemTaskWorkerCallbackDuration=30s
# Queue pop timeout (default: 100ms)
conductor.app.systemTaskQueuePopTimeout=100ms

In large deployments, you may want to run system task workers on dedicated instances, separate from the API server. Use the execution namespace to isolate which instance handles system tasks:
# On API-only instances — set a namespace that no system task worker listens on
conductor.app.systemTaskWorkerExecutionNamespace=api-only
conductor.app.systemTaskWorkerThreadCount=0
# On dedicated system task worker instances — match the namespace
conductor.app.systemTaskWorkerExecutionNamespace=worker-pool-1
conductor.app.systemTaskWorkerThreadCount=40
conductor.app.systemTaskMaxPollCount=40

For task domain isolation (routing specific tasks to specific worker groups):
# Threads per isolation group (default: 1)
conductor.app.isolatedSystemTaskWorkerThreadCount=4

When a system task has been polled many times without completing (e.g. a Join waiting for branches), Conductor progressively delays re-evaluation to avoid busy-polling:
# After this many polls, begin exponential backoff (default: 200)
conductor.app.systemTaskPostponeThreshold=200

The event processor listens to configured event buses and triggers workflows or completes tasks based on incoming events.
# Thread count for event processing (default: 2)
conductor.app.eventProcessorThreadCount=4
# Event queue polling
# Scheduler poll threads (default: CPU cores)
conductor.app.eventQueueSchedulerPollThreadCount=4
conductor.app.eventQueuePollInterval=100ms
conductor.app.eventQueuePollCount=10
conductor.app.eventQueueLongPollTimeout=1000ms

See the Event-driven recipes for configuring Kafka, NATS, AMQP, and SQS event queues.
Conductor enforces payload size limits to prevent oversized data from degrading performance. When a payload exceeds the threshold, it is automatically stored in external payload storage (S3, PostgreSQL, or Azure Blob).
# Workflow input/output — threshold to move to external storage (default: 5120 KB)
conductor.app.workflowInputPayloadSizeThreshold=5120KB
conductor.app.workflowOutputPayloadSizeThreshold=5120KB
# Workflow input/output — hard limit, fails the workflow (default: 10240 KB)
conductor.app.maxWorkflowInputPayloadSizeThreshold=10240KB
conductor.app.maxWorkflowOutputPayloadSizeThreshold=10240KB
# Task input/output — threshold to move to external storage (default: 3072 KB)
conductor.app.taskInputPayloadSizeThreshold=3072KB
conductor.app.taskOutputPayloadSizeThreshold=3072KB
# Task input/output — hard limit, fails the task (default: 10240 KB)
conductor.app.maxTaskInputPayloadSizeThreshold=10240KB
conductor.app.maxTaskOutputPayloadSizeThreshold=10240KB
# Workflow variables — hard limit (default: 256 KB)
conductor.app.maxWorkflowVariablesPayloadSizeThreshold=256KB

For external payload storage configuration, see External Payload Storage.
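To make the two-tier limits concrete, the sketch below classifies a payload by size using the default values quoted above. It is an illustration, not Conductor's implementation: `classify_payload` is a hypothetical function, and treating a size exactly equal to a limit as within that limit is an assumption.

```python
KB = 1024

# Defaults quoted in this guide, in KB:
# (external-storage threshold, hard limit).
DEFAULT_LIMITS_KB = {
    "workflow": (5120, 10240),
    "task": (3072, 10240),
}


def classify_payload(size_bytes: int, kind: str = "task") -> str:
    """Return 'inline', 'external', or 'rejected' for a payload size."""
    threshold_kb, hard_limit_kb = DEFAULT_LIMITS_KB[kind]
    if size_bytes > hard_limit_kb * KB:
        return "rejected"   # exceeds the hard limit: the task or workflow fails
    if size_bytes > threshold_kb * KB:
        return "external"   # moved to external payload storage
    return "inline"         # stored with the execution state


four_mb = 4 * 1024 * KB
print(classify_payload(four_mb, "task"))      # above the 3072 KB task threshold
print(classify_payload(four_mb, "workflow"))  # below the 5120 KB workflow threshold
```

The same 4 MB payload is handled differently as a task payload than as a workflow payload, which is worth keeping in mind when sizing external storage.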
Conductor exposes Prometheus-compatible metrics out of the box for workflow monitoring and observability:
conductor.metrics-prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus
management.metrics.web.server.request.autotime.percentiles=0.50,0.75,0.90,0.95,0.99
management.endpoint.health.show-details=always

Scrape http://<conductor-host>:8080/actuator/prometheus with Prometheus.
For details on available metrics, see Server Metrics and Client Metrics.
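For a quick look at the endpoint without a Prometheus server, the text it returns can be parsed directly. The sketch below assumes the plain Prometheus text format with no per-sample timestamps; `parse_prometheus_text` and `scrape` are hypothetical helpers, and the host name in the usage note is a placeholder.

```python
import urllib.request


def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse Prometheus text-format lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumes no trailing timestamp, so the value is the last field.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def scrape(base_url: str, timeout: float = 5.0) -> dict[str, float]:
    """Fetch and parse /actuator/prometheus from one Conductor instance."""
    url = base_url.rstrip("/") + "/actuator/prometheus"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Calling `scrape("http://conductor-host:8080")` against a running instance would return a dictionary keyed by series name, including labels.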
One database for everything — fewest moving parts.
# Database
conductor.db.type=postgres
conductor.queue.type=postgres
conductor.external-payload-storage.type=postgres
spring.datasource.url=jdbc:postgresql://db-host:5432/conductor
spring.datasource.username=conductor
spring.datasource.password=<password>
# Indexing (use PostgreSQL, no Elasticsearch needed)
conductor.indexing.enabled=true
conductor.indexing.type=postgres
conductor.elasticsearch.version=0
# Locking (use Redis — lightweight, fast)
conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true
conductor.redis-lock.serverAddress=redis://redis-host:6379
# Sweeper
conductor.app.sweeperThreadCount=8
# System task workers
conductor.app.systemTaskWorkerThreadCount=20
conductor.app.systemTaskMaxPollCount=20
# Metrics
conductor.metrics-prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus

Best search performance and lowest latency for queue operations.
# Database + Queue
conductor.db.type=redis_standalone
conductor.queue.type=redis_standalone
conductor.redis.hosts=redis-host:6379:us-east-1c
conductor.redis.workflowNamespacePrefix=conductor
conductor.redis.queueNamespacePrefix=conductor_queues
# Indexing
conductor.indexing.enabled=true
conductor.elasticsearch.url=http://es-host:9200
conductor.elasticsearch.version=7
conductor.elasticsearch.indexName=conductor
conductor.elasticsearch.clusterHealthColor=yellow
conductor.app.asyncIndexingEnabled=true
# Locking
conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true
conductor.redis-lock.serverAddress=redis://redis-host:6379
# Sweeper
conductor.app.sweeperThreadCount=16
# System task workers
conductor.app.systemTaskWorkerThreadCount=40
conductor.app.systemTaskMaxPollCount=40
# Metrics
conductor.metrics-prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus

git clone https://github.com/conductor-oss/conductor
cd conductor
docker compose -f docker/docker-compose.yaml up

To use a different backend, swap the compose file:

docker compose -f docker/docker-compose-postgres.yaml up

docker run -p 8080:8080 conductoross/conductor:latest

Mount your own properties file to override the defaults without rebuilding the image:
docker run -p 8080:8080 \
-v /path/to/my-config.properties:/app/config/config.properties \
conductoross/conductor:latest

| URL | Description |
|---|---|
| http://localhost:8080 | Conductor UI |
| http://localhost:8080/swagger-ui/index.html | REST API docs |
# Ctrl+C to stop, then:
docker compose down

For high availability and horizontal scaling, run multiple Conductor server instances behind a load balancer. All instances share the same database, queue, index, and lock backends. This architecture enables workflow engine scalability to millions of concurrent executions.
Requirements:
- Distributed locking must be enabled (`redis` or `zookeeper`). Without it, concurrent decider evaluations on the same workflow will cause race conditions.
- All instances must point to the same database, queue, and indexing backends.
- The load balancer should use round-robin or least-connections routing.
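
The first two requirements can be checked before rollout by comparing the property sets of each instance. This is a minimal sketch, assuming each instance's properties have already been loaded into a dictionary; `check_cluster` and `SHARED_KEYS` are hypothetical names, and the key list covers only a few of the connection settings a real check would compare.

```python
# Settings that should be identical on every instance (illustrative subset).
SHARED_KEYS = (
    "conductor.db.type",
    "conductor.queue.type",
    "conductor.indexing.type",
    "spring.datasource.url",
)
DISTRIBUTED_LOCKS = {"redis", "zookeeper"}


def check_cluster(instances: list[dict[str, str]]) -> list[str]:
    """Return a list of problems found across instance configurations."""
    problems = []
    for key in SHARED_KEYS:
        values = {cfg.get(key) for cfg in instances}
        if len(values) > 1:
            problems.append(f"{key} differs across instances")
    for i, cfg in enumerate(instances):
        if cfg.get("conductor.app.workflowExecutionLockEnabled") != "true":
            problems.append(f"instance {i}: workflow execution lock is not enabled")
        elif cfg.get("conductor.workflow-execution-lock.type") not in DISTRIBUTED_LOCKS:
            problems.append(f"instance {i}: lock type must be redis or zookeeper")
    return problems
```

An empty list means the compared settings agree and every instance uses a distributed lock provider.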
Optional: separate API and worker instances:
┌──────────────────┐ ┌──────────────────┐
│ API Instance 1 │ │ API Instance 2 │ ← handle REST/gRPC, low system task threads
│ (systemTask=0) │ │ (systemTask=0) │
└────────┬─────────┘ └────────┬─────────┘
│ │
┌────┴────────────────────────┴────┐
│ Load Balancer │
└────┬────────────────────────┬────┘
│ │
┌────────┴──────────┐ ┌───────┴───────────┐
│ Worker Instance │ │ Worker Instance │ ← high system task threads, sweeper
│ (systemTask=40) │ │ (systemTask=40) │
└───────────────────┘ └───────────────────┘
| Issue | Fix |
|---|---|
| Out of memory or slow performance | Check JVM heap usage and adjust -Xms / -Xmx as necessary. Monitor with jstat or the /actuator/health endpoint. |
| Elasticsearch stuck in yellow health | Set conductor.elasticsearch.clusterHealthColor=yellow or add more ES nodes for green. |
| Workflows stuck in RUNNING | Check sweeper is running and sweeperThreadCount > 0. Check lock provider is reachable. |
| System tasks not executing | Verify systemTaskWorkerThreadCount > 0 and the queue backend is reachable. |
| Config changes not taking effect | Properties are baked into the Docker image at build time. Mount a volume instead of rebuilding. |
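
When working through the table above, a scripted health probe can help confirm which instances are up. The sketch below assumes the `health` actuator endpoint is exposed on port 8080 as configured in the metrics section, and that it returns the standard Spring Boot JSON body with a `status` field; `is_healthy` is a hypothetical helper.

```python
import json
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if /actuator/health answers with status UP."""
    url = base_url.rstrip("/") + "/actuator/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, non-2xx response, timeout, or invalid JSON.
        return False
    return isinstance(body, dict) and body.get("status") == "UP"


if __name__ == "__main__":
    print(is_healthy("http://localhost:8080"))
```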
