description
Deploy Conductor as a self-hosted workflow engine in production — architecture overview, horizontal scaling, database, queue, indexing, and lock configuration, workflow monitoring, and recommended production deployment settings for this open source workflow orchestration platform.

Self-hosted deployment guide

Conductor is a self-hosted, open source workflow engine that you deploy on your own infrastructure. This production deployment guide covers everything you need to run Conductor at scale: architecture, backend configuration, horizontal scaling, workflow monitoring, and tuning.

Architecture overview

A Conductor deployment consists of the following components:

Component	Role
API Server	Exposes REST and gRPC endpoints for workflow and task operations.
Decider	The core state machine. Evaluates workflow state and schedules the next set of tasks.
Sweeper	Background process that polls for running workflows and triggers the decider to evaluate them. Required for progress on long-running workflows.
System Task Workers	Execute built-in task types (HTTP, Event, Wait, Inline, JSON_JQ_TRANSFORM, etc.) within the server JVM.
Event Processor	Listens to configured event buses and triggers workflows or completes tasks based on incoming events.
Database	Persists workflow definitions, execution state, task state, and poll data.
Queue	Manages task scheduling — pending tasks, delayed tasks, and the sweeper's own work queue.
Index	Powers workflow and task search in the UI and via the search API.
Lock	Distributed lock that prevents concurrent decider evaluations of the same workflow. Required in production.

Quick start with Docker Compose

For local development and evaluation:

git clone https://github.com/conductor-oss/conductor
cd conductor
docker compose -f docker/docker-compose.yaml up

This starts Conductor with Redis (database + queue), Elasticsearch (indexing), and the server with UI on port 8080.

URL	Description
`http://localhost:8080`	Conductor UI
`http://localhost:8080/swagger-ui/index.html`	REST API docs
`http://localhost:8080/api/`	API base URL
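
Once the containers are starting, a small readiness check saves guessing when the UI is usable. The sketch below polls the `/actuator/health` endpoint mentioned in the troubleshooting table and assumes the standard Spring Boot Actuator response shape (`{"status": "UP"}`). The function names are illustrative, not part of Conductor.

```python
import json
import time
import urllib.request


def is_up(body: str) -> bool:
    """Return True if a health response body reports status UP."""
    try:
        return json.loads(body).get("status") == "UP"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def http_get(url: str, timeout_s: float = 5.0) -> str:
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")


def wait_until_up(url, fetch=http_get, attempts=30, delay_s=2.0, sleep=time.sleep) -> bool:
    """Poll the health URL until it reports UP or the attempts run out."""
    for attempt in range(attempts):
        try:
            if is_up(fetch(url)):
                return True
        except OSError:
            # Connection refused, timeout, or HTTP error: probably still starting.
            pass
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

For the default compose file this would be called as `wait_until_up("http://localhost:8080/actuator/health")`. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running server.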

Pre-built compose files for other backend combinations:

Compose file	Database	Queue	Index
`docker-compose.yaml`	Redis	Redis	Elasticsearch 7
`docker-compose-postgres.yaml`	PostgreSQL	PostgreSQL	PostgreSQL
`docker-compose-postgres-es7.yaml`	PostgreSQL	PostgreSQL	Elasticsearch 7
`docker-compose-mysql.yaml`	MySQL	Redis	Elasticsearch 7
`docker-compose-redis-os2.yaml`	Redis	Redis	OpenSearch 2
`docker-compose-redis-os3.yaml`	Redis	Redis	OpenSearch 3

# Example: PostgreSQL for everything
docker compose -f docker/docker-compose-postgres.yaml up

# Example: Redis + OpenSearch 3
docker compose -f docker/docker-compose-redis-os3.yaml up

Production configuration

All configuration is done via Spring Boot properties, set either in application.properties or through environment variables. When running in Docker, a properties file can also be mounted into the container as a volume.
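
When setting properties through environment variables, the variable name follows Spring Boot's relaxed binding convention: replace dots with underscores, remove dashes, and uppercase the result. The helper below is a minimal sketch of that rule; `to_env_var` is a hypothetical name, and it is worth confirming against your Conductor version that a given property is picked up from the environment.

```python
def to_env_var(property_name: str) -> str:
    """Convert a Spring Boot property name to its environment variable form."""
    # Rule from Spring Boot's relaxed binding: '.' -> '_', drop '-', uppercase.
    return property_name.replace(".", "_").replace("-", "").upper()


if __name__ == "__main__":
    for name in (
        "conductor.db.type",
        "conductor.workflow-execution-lock.type",
        "conductor.app.sweeperThreadCount",
    ):
        print(f"{name} -> {to_env_var(name)}")
```

For example, `conductor.db.type` maps to `CONDUCTOR_DB_TYPE` and `conductor.workflow-execution-lock.type` maps to `CONDUCTOR_WORKFLOWEXECUTIONLOCK_TYPE`.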

Database

The database stores workflow definitions, execution state, task state, and event handler definitions.

conductor.db.type=postgres

Supported database backends:

Backend	Property value	When to use	Notes
PostgreSQL	`postgres`	Recommended for production. ACID, battle-tested, supports indexing too.	Requires `spring.datasource.*` config.
MySQL	`mysql`	Production alternative if your team already runs MySQL.	Requires `spring.datasource.*` config. Needs separate queue backend (Redis).
Redis	`redis_standalone`	Fast, simple. Good for moderate scale.	Requires `conductor.redis.*` config.
Cassandra	`cassandra`	High write throughput, multi-region.	Requires `conductor.cassandra.*` config.
SQLite	`sqlite`	Local development only. Single-file, zero config.	Default. Not for production.

PostgreSQL

conductor.db.type=postgres
conductor.external-payload-storage.type=postgres

spring.datasource.url=jdbc:postgresql://db-host:5432/conductor
spring.datasource.username=conductor
spring.datasource.password=<password>

# Optional tuning
conductor.postgres.deadlockRetryMax=3
conductor.postgres.taskDefCacheRefreshInterval=60s
conductor.postgres.asyncMaxPoolSize=12
conductor.postgres.asyncWorkerQueueSize=100

MySQL

conductor.db.type=mysql

spring.datasource.url=jdbc:mysql://db-host:3306/conductor
spring.datasource.username=conductor
spring.datasource.password=<password>

# Optional tuning
conductor.mysql.deadlockRetryMax=3
conductor.mysql.taskDefCacheRefreshInterval=60s

Redis

conductor.db.type=redis_standalone

# Format: host:port:rack (semicolon-separated for multiple hosts)
conductor.redis.hosts=redis-host:6379:us-east-1c
conductor.redis.workflowNamespacePrefix=conductor
conductor.redis.queueNamespacePrefix=conductor_queues
conductor.redis.taskDefCacheRefreshInterval=1s

# Connection pool
conductor.redis.maxIdleConnections=8
conductor.redis.minIdleConnections=5

# SSL
conductor.redis.ssl=false

# Auth (password is taken from the first host entry: host:port:rack:password)
# Or set conductor.redis.username and conductor.redis.password directly
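
The `conductor.redis.hosts` format is easy to get wrong by hand. The following sketch parses the `host:port:rack` form (with the optional fourth `password` field) described in the comments above. It only illustrates the format and is not Conductor's parser; `RedisHost` and `parse_redis_hosts` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class RedisHost(NamedTuple):
    host: str
    port: int
    rack: str
    password: Optional[str]


def parse_redis_hosts(value: str) -> List[RedisHost]:
    """Parse a semicolon-separated list of host:port:rack[:password] entries."""
    hosts = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(":")
        if len(parts) not in (3, 4):
            raise ValueError(f"expected host:port:rack[:password], got {entry!r}")
        host, port, rack = parts[:3]
        password = parts[3] if len(parts) == 4 else None
        hosts.append(RedisHost(host, int(port), rack, password))
    return hosts
```

Because the fields are colon-separated, this sketch rejects a password that itself contains a colon. In that case setting `conductor.redis.password` directly is the simpler option.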

Queue

The queue backend manages task scheduling: it tracks which tasks are pending, delayed, or ready for execution. The sweeper and system task workers both depend on it.

conductor.queue.type=postgres

Supported queue backends:

Backend	Property value	When to use
PostgreSQL	`postgres`	Use when database is also PostgreSQL. Simplest stack.
Redis	`redis_standalone`	Use when database is Redis or MySQL. Fast, low-latency.
SQLite	`sqlite`	Local development only.

!!! tip "Match your queue backend to your database"
    PostgreSQL database + PostgreSQL queue is the simplest production stack: one fewer dependency. If you use MySQL for the database, pair it with Redis for the queue.
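
The pairing guidance from the database and queue tables can be turned into a quick pre-deployment check. This is a sketch under the assumption that the tables above are the full set of rules; `check_stack` is a hypothetical helper, and combinations the guide does not mention (for example Cassandra) are left unflagged.

```python
from typing import List


def check_stack(db_type: str, queue_type: str) -> List[str]:
    """Return warnings for db/queue combinations the guide advises against."""
    warnings = []
    if "sqlite" in (db_type, queue_type):
        warnings.append("sqlite is for local development only")
    if db_type == "mysql" and queue_type != "redis_standalone":
        warnings.append("mysql needs a separate queue backend; pair it with redis_standalone")
    if queue_type == "postgres" and db_type != "postgres":
        warnings.append("the postgres queue is intended for use with a postgres database")
    return warnings


if __name__ == "__main__":
    for db, queue in (("postgres", "postgres"), ("mysql", "postgres"), ("sqlite", "sqlite")):
        print(db, queue, check_stack(db, queue))
```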

Indexing

The indexing backend powers workflow and task search in the UI and via the /api/workflow/search and /api/tasks/search endpoints.

conductor.indexing.enabled=true
conductor.indexing.type=postgres

Supported indexing backends:

Backend	Property value	When to use	Notes
PostgreSQL	`postgres`	Simplest stack when database is also PostgreSQL.	Set `conductor.elasticsearch.version=0` to disable ES client.
Elasticsearch 7	`elasticsearch`	Best search performance at scale. Full-text search.	Set `conductor.elasticsearch.version=7`.
OpenSearch 2	`opensearch2`	Open-source ES alternative.	Compatible with ES 7 queries.
OpenSearch 3	`opensearch3`	Latest OpenSearch.
SQLite	`sqlite`	Local development only.
Disabled	N/A	Set `conductor.indexing.enabled=false`. UI search won't work.

PostgreSQL indexing

conductor.indexing.enabled=true
conductor.indexing.type=postgres
# Disable Elasticsearch client
conductor.elasticsearch.version=0

Elasticsearch 7

conductor.indexing.enabled=true
conductor.elasticsearch.url=http://es-host:9200
conductor.elasticsearch.version=7
conductor.elasticsearch.indexName=conductor
conductor.elasticsearch.clusterHealthColor=yellow

# Performance tuning
conductor.elasticsearch.indexBatchSize=1
conductor.elasticsearch.asyncMaxPoolSize=12
conductor.elasticsearch.asyncWorkerQueueSize=100
conductor.elasticsearch.asyncBufferFlushTimeout=10s
conductor.elasticsearch.indexShardCount=5
conductor.elasticsearch.indexReplicasCount=1

# Auth (if using security)
conductor.elasticsearch.username=elastic
conductor.elasticsearch.password=<password>

OpenSearch

conductor.indexing.enabled=true
# Use opensearch2 or opensearch3 to match your OpenSearch major version
conductor.indexing.type=opensearch2
conductor.opensearch.url=http://os-host:9200
conductor.opensearch.indexPrefix=conductor
conductor.opensearch.clusterHealthColor=yellow
conductor.opensearch.indexReplicasCount=0

Async indexing

For high-throughput deployments, enable async indexing to decouple the indexing path from the workflow execution path:

conductor.app.asyncIndexingEnabled=true
conductor.app.asyncUpdateShortRunningWorkflowDuration=30s
conductor.app.asyncUpdateDelay=60s

Indexing toggles

Control what gets indexed:

conductor.app.taskIndexingEnabled=true
conductor.app.taskExecLogIndexingEnabled=true
conductor.app.eventMessageIndexingEnabled=true
conductor.app.eventExecutionIndexingEnabled=true

Locking

!!! warning "Required for production"
    Distributed locking prevents race conditions when multiple server instances evaluate the same workflow concurrently. Always enable locking in production with a distributed lock provider (Redis or Zookeeper).

conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true

Supported lock providers:

Provider	Property value	When to use
Redis	`redis`	Recommended. Use when Redis is already in the stack.
Zookeeper	`zookeeper`	Use when Zookeeper is available (e.g. Kafka deployments).
Local	`local_only`	Single-instance development only. Not safe for multi-instance.

Redis lock

conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true
# Maximum time a lock is held, in ms (60s)
conductor.app.lockLeaseTime=60000
# Maximum time to wait when acquiring a lock, in ms
conductor.app.lockTimeToTry=500

# Server type: SINGLE, CLUSTER, or SENTINEL
conductor.redis-lock.serverType=SINGLE
conductor.redis-lock.serverAddress=redis://redis-host:6379
# conductor.redis-lock.serverPassword=<password>
# Master name, for Sentinel
# conductor.redis-lock.serverMasterName=master
# Key prefix
# conductor.redis-lock.namespace=conductor
conductor.redis-lock.ignoreLockingExceptions=false
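
To make the two timing settings concrete, here is a small in-memory simulation of a lease-based lock. It is not Conductor's Redis lock implementation. It only models the semantics described above, where `lockTimeToTry` bounds how long a caller waits and `lockLeaseTime` bounds how long a holder keeps the lock. `FakeClock` and `LeaseLock` are hypothetical names, and the clock is advanced manually so the behavior is deterministic.

```python
class FakeClock:
    """Manually advanced millisecond clock."""

    def __init__(self):
        self.now_ms = 0

    def __call__(self):
        return self.now_ms

    def sleep(self, ms):
        self.now_ms += ms


class LeaseLock:
    """Single-process stand-in for a distributed lock with a lease."""

    def __init__(self, clock):
        self._clock = clock
        self._holders = {}  # lock id -> (owner, lease expiry in ms)

    def _try_once(self, lock_id, owner, lease_ms):
        now = self._clock()
        holder = self._holders.get(lock_id)
        if holder and holder[0] != owner and holder[1] > now:
            return False  # another owner holds an unexpired lease
        self._holders[lock_id] = (owner, now + lease_ms)
        return True

    def acquire(self, lock_id, owner, time_to_try_ms, lease_ms, sleep, step_ms=50):
        """Retry until the lock is acquired or time_to_try_ms has elapsed."""
        deadline = self._clock() + time_to_try_ms
        while True:
            if self._try_once(lock_id, owner, lease_ms):
                return True
            if self._clock() >= deadline:
                return False
            sleep(step_ms)

    def release(self, lock_id, owner):
        if self._holders.get(lock_id, (None, 0))[0] == owner:
            del self._holders[lock_id]


if __name__ == "__main__":
    clock = FakeClock()
    lock = LeaseLock(clock)
    print(lock.acquire("wf-1", "server-a", 500, 60000, clock.sleep))
    print(lock.acquire("wf-1", "server-b", 500, 60000, clock.sleep))
```

In the usage above, `server-b` gives up after 500 ms because `server-a` still holds the lease. Once the 60 s lease expires, the lock can be taken even if `server-a` never released it, which is why the lease should comfortably exceed the time a single decider evaluation takes.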

Zookeeper lock

conductor.workflow-execution-lock.type=zookeeper
conductor.app.workflowExecutionLockEnabled=true
conductor.app.lockLeaseTime=60000
conductor.app.lockTimeToTry=500

conductor.zookeeper-lock.connectionString=zk1:2181,zk2:2181,zk3:2181
# conductor.zookeeper-lock.sessionTimeoutMs=60000
# conductor.zookeeper-lock.connectionTimeoutMs=15000
# conductor.zookeeper-lock.namespace=conductor

Sweeper

The sweeper is a background process that monitors running workflows. It polls the queue for workflows that need evaluation and triggers the decider. Without the sweeper, long-running workflows will not make progress.

The sweeper runs automatically as part of the Conductor server. Tune the thread count based on your workflow volume:

# Number of sweeper threads (default: availableProcessors * 2)
conductor.app.sweeperThreadCount=8

# How long to wait when polling the sweep queue (default: 2000ms)
conductor.app.sweeperWorkflowPollTimeout=2000

# Batch size per sweep poll (default: 2)
conductor.app.sweeper.sweepBatchSize=2

# Queue pop timeout in ms (default: 100)
conductor.app.sweeper.queuePopTimeout=100

!!! tip "Sweeper sizing"
    Start with sweeperThreadCount = 2 * CPU cores. If you see workflows stuck in RUNNING state, increase it. If CPU usage is high on idle, decrease it.

System task workers

System task workers execute built-in task types (HTTP, Event, Wait, Inline, JSON_JQ_TRANSFORM, etc.) inside the Conductor server JVM. They poll internal queues for scheduled system tasks and execute them.

# Number of system task worker threads (default: availableProcessors * 2)
conductor.app.systemTaskWorkerThreadCount=20

# Max number of tasks to poll at once (default: same as thread count)
conductor.app.systemTaskMaxPollCount=20

# Poll interval (default: 50ms)
conductor.app.systemTaskWorkerPollInterval=50ms

# Callback duration — how often to re-check async system tasks (default: 30s)
conductor.app.systemTaskWorkerCallbackDuration=30s

# Queue pop timeout (default: 100ms)
conductor.app.systemTaskQueuePopTimeout=100ms
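
Both the sweeper and the system task workers default to `availableProcessors * 2` threads, so a starting configuration can be derived from the core count. The sketch below renders those starting values as property lines. The helper names are hypothetical, and the numbers are only the guide's starting point, to be tuned under load.

```python
import os
from typing import Dict, Optional


def starting_thread_counts(cpu_cores: Optional[int] = None) -> Dict[str, int]:
    """Derive starting thread counts from the core count (2 threads per core)."""
    cores = cpu_cores if cpu_cores else (os.cpu_count() or 1)
    threads = cores * 2
    return {
        "conductor.app.sweeperThreadCount": threads,
        "conductor.app.systemTaskWorkerThreadCount": threads,
        # The guide lists the max poll count default as equal to the thread count.
        "conductor.app.systemTaskMaxPollCount": threads,
    }


def as_properties(values: Dict[str, int]) -> str:
    """Render a dict as key=value lines for a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in values.items())


if __name__ == "__main__":
    print(as_properties(starting_thread_counts()))
```

When the server runs in a container with a CPU limit, pass that limit as `cpu_cores` rather than relying on `os.cpu_count()`, which may report the host's core count.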

Running system task workers separately

In large deployments, you may want to run system task workers on dedicated instances, separate from the API server. Use the execution namespace to isolate which instance handles system tasks:

# On API-only instances — set a namespace that no system task worker listens on
conductor.app.systemTaskWorkerExecutionNamespace=api-only
conductor.app.systemTaskWorkerThreadCount=0

# On dedicated system task worker instances — match the namespace
conductor.app.systemTaskWorkerExecutionNamespace=worker-pool-1
conductor.app.systemTaskWorkerThreadCount=40
conductor.app.systemTaskMaxPollCount=40

Isolated system task workers

For task domain isolation (routing specific tasks to specific worker groups):

# Threads per isolation group (default: 1)
conductor.app.isolatedSystemTaskWorkerThreadCount=4

Postpone threshold

When a system task has been polled many times without completing (e.g. a Join waiting for branches), Conductor progressively delays re-evaluation to avoid busy-polling:

# After this many polls, begin exponential backoff (default: 200)
conductor.app.systemTaskPostponeThreshold=200

Event processing

The event processor listens to configured event buses and triggers workflows or completes tasks based on incoming events.

# Thread count for event processing (default: 2)
conductor.app.eventProcessorThreadCount=4

# Event queue polling
# Default: number of CPU cores
conductor.app.eventQueueSchedulerPollThreadCount=4
conductor.app.eventQueuePollInterval=100ms
conductor.app.eventQueuePollCount=10
conductor.app.eventQueueLongPollTimeout=1000ms

See the Event-driven recipes for configuring Kafka, NATS, AMQP, and SQS event queues.

Payload size limits

Conductor enforces payload size limits to prevent oversized data from degrading performance. When a payload exceeds the threshold, it is automatically stored in external payload storage (S3, PostgreSQL, or Azure Blob).

# Workflow input/output — threshold to move to external storage (default: 5120 KB)
conductor.app.workflowInputPayloadSizeThreshold=5120KB
conductor.app.workflowOutputPayloadSizeThreshold=5120KB

# Workflow input/output — hard limit, fails the workflow (default: 10240 KB)
conductor.app.maxWorkflowInputPayloadSizeThreshold=10240KB
conductor.app.maxWorkflowOutputPayloadSizeThreshold=10240KB

# Task input/output — threshold to move to external storage (default: 3072 KB)
conductor.app.taskInputPayloadSizeThreshold=3072KB
conductor.app.taskOutputPayloadSizeThreshold=3072KB

# Task input/output — hard limit, fails the task (default: 10240 KB)
conductor.app.maxTaskInputPayloadSizeThreshold=10240KB
conductor.app.maxTaskOutputPayloadSizeThreshold=10240KB

# Workflow variables — hard limit (default: 256 KB)
conductor.app.maxWorkflowVariablesPayloadSizeThreshold=256KB

For external payload storage configuration, see External Payload Storage.
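
The two-tier limits above amount to a three-way decision per payload. The following sketch models that decision for a payload of a given size. It assumes 1 KB = 1024 bytes and that "exceeds" means strictly greater than, neither of which is confirmed by this guide. `classify_payload` is a hypothetical name.

```python
KB = 1024  # assumption: thresholds are expressed in units of 1024 bytes


def classify_payload(size_bytes: int, threshold_kb: int, max_kb: int) -> str:
    """Return 'inline', 'external', or 'reject' for a payload size."""
    if size_bytes > max_kb * KB:
        return "reject"    # over the hard limit: the workflow or task fails
    if size_bytes > threshold_kb * KB:
        return "external"  # over the soft threshold: moved to external storage
    return "inline"        # small enough to be stored with the execution


if __name__ == "__main__":
    # Task defaults from the configuration above: 3072 KB threshold, 10240 KB max.
    for size in (100 * KB, 4096 * KB, 11264 * KB):
        print(size, classify_payload(size, 3072, 10240))
```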

Workflow monitoring and observability

Conductor exposes Prometheus-compatible metrics out of the box for workflow monitoring and observability:

conductor.metrics-prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus
management.metrics.web.server.request.autotime.percentiles=0.50,0.75,0.90,0.95,0.99
management.endpoint.health.show-details=always

Scrape http://<conductor-host>:8080/actuator/prometheus with Prometheus.

For details on available metrics, see Server Metrics and Client Metrics.
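
For a quick look at the endpoint without a Prometheus server, the text exposition format can be parsed directly. This is a simplified sketch: it keeps each series (name plus labels) as a string key and ignores timestamps. The function names are hypothetical.

```python
import urllib.request
from typing import Dict


def parse_samples(text: str) -> Dict[str, float]:
    """Parse Prometheus text-format samples into {series: value}."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The series is either name{labels} or a bare name; the value follows it.
        cut = line.rfind("}") + 1 if "}" in line else line.find(" ")
        if cut <= 0:
            continue  # no value on this line
        fields = line[cut:].split()
        if fields:
            samples[line[:cut]] = float(fields[0])
    return samples


def scrape(url: str, timeout_s: float = 5.0) -> Dict[str, float]:
    """Fetch a metrics endpoint and parse its samples."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return parse_samples(resp.read().decode("utf-8"))
```

Calling `scrape("http://<conductor-host>:8080/actuator/prometheus")` returns a dictionary that can be filtered by series name prefix.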

Recommended production configurations

PostgreSQL stack (simplest)

One database for everything — fewest moving parts.

# Database
conductor.db.type=postgres
conductor.queue.type=postgres
conductor.external-payload-storage.type=postgres
spring.datasource.url=jdbc:postgresql://db-host:5432/conductor
spring.datasource.username=conductor
spring.datasource.password=<password>

# Indexing (use PostgreSQL, no Elasticsearch needed)
conductor.indexing.enabled=true
conductor.indexing.type=postgres
conductor.elasticsearch.version=0

# Locking (use Redis — lightweight, fast)
conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true
conductor.redis-lock.serverAddress=redis://redis-host:6379

# Sweeper
conductor.app.sweeperThreadCount=8

# System task workers
conductor.app.systemTaskWorkerThreadCount=20
conductor.app.systemTaskMaxPollCount=20

# Metrics
conductor.metrics-prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus

Redis + Elasticsearch stack (high throughput)

Best search performance and lowest latency for queue operations.

# Database + Queue
conductor.db.type=redis_standalone
conductor.queue.type=redis_standalone
conductor.redis.hosts=redis-host:6379:us-east-1c
conductor.redis.workflowNamespacePrefix=conductor
conductor.redis.queueNamespacePrefix=conductor_queues

# Indexing
conductor.indexing.enabled=true
conductor.elasticsearch.url=http://es-host:9200
conductor.elasticsearch.version=7
conductor.elasticsearch.indexName=conductor
conductor.elasticsearch.clusterHealthColor=yellow
conductor.app.asyncIndexingEnabled=true

# Locking
conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true
conductor.redis-lock.serverAddress=redis://redis-host:6379

# Sweeper
conductor.app.sweeperThreadCount=16

# System task workers
conductor.app.systemTaskWorkerThreadCount=40
conductor.app.systemTaskMaxPollCount=40

# Metrics
conductor.metrics-prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus

Running with Docker

Using Docker Compose

git clone https://github.com/conductor-oss/conductor
cd conductor
docker compose -f docker/docker-compose.yaml up

To use a different backend, swap the compose file:

docker compose -f docker/docker-compose-postgres.yaml up

Using the standalone image

docker run -p 8080:8080 conductoross/conductor:latest

Custom configuration via volume mount

Mount your own properties file to override the defaults without rebuilding the image:

docker run -p 8080:8080 \
  -v /path/to/my-config.properties:/app/config/config.properties \
  conductoross/conductor:latest
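
Before mounting a properties file, it can help to lint it for one common mistake: Java `.properties` parsing treats `#` as a comment marker only at the start of a line, so a trailing `# note` after a value becomes part of the value. The sketch below flags such lines. It is deliberately simplified (it only handles `key=value` lines and ignores line continuations), and `find_trailing_comments` is a hypothetical name.

```python
from typing import List, Tuple


def find_trailing_comments(text: str) -> List[Tuple[int, str]]:
    """Return (line number, key) for values that contain a ' #' sequence."""
    findings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or whole-line comment
        key, sep, value = line.partition("=")
        if sep and (" #" in value or "\t#" in value):
            findings.append((number, key.strip()))
    return findings


if __name__ == "__main__":
    sample = "conductor.db.type=postgres\nconductor.app.lockTimeToTry=500  # ms\n"
    for line_no, key in find_trailing_comments(sample):
        print(f"line {line_no}: value of {key} appears to contain a trailing comment")
```

A flagged line is not always wrong, since a value may legitimately contain `#`, so treat the output as a prompt to double-check rather than an error.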

Accessing Conductor

URL	Description
`http://localhost:8080`	Conductor UI
`http://localhost:8080/swagger-ui/index.html`	REST API docs

Shutting down

# Ctrl+C to stop, then:
docker compose down

Multi-instance deployment and horizontal scaling

For high availability and horizontal scaling, run multiple Conductor server instances behind a load balancer. All instances share the same database, queue, index, and lock backends. This architecture lets the workflow engine scale to millions of concurrent executions.

Requirements:

- Distributed locking must be enabled (redis or zookeeper). Without it, concurrent decider evaluations on the same workflow will cause race conditions.
- All instances must point to the same database, queue, and indexing backends.
- The load balancer should use round-robin or least-connections routing.
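
The first two requirements can be checked mechanically by comparing the effective properties of each instance. The sketch below assumes each instance's properties have already been loaded into a dictionary of strings. `check_cluster` and the choice of keys in `SHARED_KEYS` are illustrative, so extend the list with whichever connection properties your backends use.

```python
from typing import Dict, List

# Properties that must be identical on every instance (illustrative selection).
SHARED_KEYS = (
    "conductor.db.type",
    "conductor.queue.type",
    "conductor.indexing.type",
    "spring.datasource.url",
)
DISTRIBUTED_LOCKS = {"redis", "zookeeper"}


def check_cluster(instances: Dict[str, Dict[str, str]]) -> List[str]:
    """Return a list of problems found across the instances' properties."""
    problems = []
    for key in SHARED_KEYS:
        values = {props.get(key) for props in instances.values()}
        if len(values) > 1:
            problems.append(f"{key} differs across instances")
    for name, props in instances.items():
        enabled = props.get("conductor.app.workflowExecutionLockEnabled") == "true"
        lock_type = props.get("conductor.workflow-execution-lock.type")
        if not enabled or lock_type not in DISTRIBUTED_LOCKS:
            problems.append(f"{name}: distributed locking (redis or zookeeper) is not enabled")
    return problems
```

An empty result means the instances agree on the shared backends and each has a distributed lock provider enabled. It does not verify that the backends are reachable.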

Optional: separate API and worker instances:

┌──────────────────┐     ┌──────────────────┐
│  API Instance 1  │     │  API Instance 2  │   ← handle REST/gRPC, low system task threads
│  (systemTask=0)  │     │  (systemTask=0)  │
└────────┬─────────┘     └────────┬─────────┘
         │                        │
    ┌────┴────────────────────────┴────┐
    │         Load Balancer            │
    └────┬────────────────────────┬────┘
         │                        │
┌────────┴──────────┐     ┌───────┴───────────┐
│  Worker Instance  │     │  Worker Instance  │  ← high system task threads, sweeper
│  (systemTask=40)  │     │  (systemTask=40)  │
└───────────────────┘     └───────────────────┘

Troubleshooting

Issue	Fix
Out of memory or slow performance	Check JVM heap usage and adjust `-Xms` / `-Xmx` as necessary. Monitor with `jstat` or the JVM memory metrics exposed at `/actuator/prometheus`.
Elasticsearch stuck in yellow health	Set `conductor.elasticsearch.clusterHealthColor=yellow` or add more ES nodes for green.
Workflows stuck in RUNNING	Check sweeper is running and `sweeperThreadCount > 0`. Check lock provider is reachable.
System tasks not executing	Verify `systemTaskWorkerThreadCount > 0` and the queue backend is reachable.
Config changes not taking effect	Properties are baked into the Docker image at build time. Mount a volume instead of rebuilding.
