Description
Summary
When scaling down a ClickHouseInstallation that uses the Replicated database engine, the operator generates an incorrect SYSTEM DROP DATABASE REPLICA command. The stale replica metadata is therefore not cleaned up in ClickHouse Keeper, which causes REPLICA_ALREADY_EXISTS errors on the next scale-up.
Environment
- ClickHouse Operator version: latest (tested on current main)
- ClickHouse version: 25.3.2.1
- Database engine: Replicated
Steps to Reproduce
- Create a ClickHouseInstallation with 1 shard, 1 replica using the Replicated database engine:

      CREATE DATABASE myDB ON CLUSTER 'all-replicated' ENGINE = Replicated('/clickhouse/databases/myDB', '{all-sharded-shard}-{shard}', '{replica}')
- Scale up to 2 replicas
- Scale back down to 1 replica
- Scale up again to 2 replicas
At step 4, the new replica fails with:

    Code: 253. DB::Exception: There was an error on [chi-clickhouse-scw-0-1:9000]:
    Code: 253. DB::Exception: Replica /clickhouse/databases/default/replicas/1-0|chi-clickhouse-scw-0-1
    already exists. (REPLICA_ALREADY_EXISTS)
Root Cause
In pkg/model/chi/schemer/sql.go:244-248, the sqlDropReplica function generates:

    func (s *ClusterSchemer) sqlDropReplica(shard int, replica string) []string {
        return []string{
            fmt.Sprintf("SYSTEM DROP REPLICA '%s'", replica),
            fmt.Sprintf("SYSTEM DROP DATABASE REPLICA '%d|%s'", shard, replica),
        }
    }

The shard parameter is an int (the 0-based ShardIndex), so the generated command is:

    SYSTEM DROP DATABASE REPLICA '0|chi-clickhouse-scw-0-1'

However, the Replicated database is created with macros for the shard identifier (e.g. {all-sharded-shard}-{shard}), which resolve to a string such as 1-0. ClickHouse Keeper stores replica entries under this resolved database_shard_name (visible in system.clusters), not under the integer shard index. The actual znode path is:
    /clickhouse/databases/default/replicas/1-0|chi-clickhouse-scw-0-1
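For illustration, here is a minimal sketch of how the macro-based shard name ends up in the znode path. The macro substitution helper and the concrete macro values ({all-sharded-shard} = 1, {shard} = 0) are assumptions inferred from the observed path; the real resolution happens inside ClickHouse:

```go
package main

import (
	"fmt"
	"strings"
)

// resolveMacros is an illustrative stand-in for ClickHouse's macro
// substitution: it replaces each {name} placeholder with its value.
func resolveMacros(pattern string, macros map[string]string) string {
	for name, value := range macros {
		pattern = strings.ReplaceAll(pattern, "{"+name+"}", value)
	}
	return pattern
}

func main() {
	// Assumed macro values for the pod chi-clickhouse-scw-0-1.
	macros := map[string]string{
		"all-sharded-shard": "1",
		"shard":             "0",
		"replica":           "chi-clickhouse-scw-0-1",
	}
	// The engine's shard-name argument "{all-sharded-shard}-{shard}"
	// resolves to "1-0", not to the integer shard index "0".
	shardName := resolveMacros("{all-sharded-shard}-{shard}", macros)
	replica := resolveMacros("{replica}", macros)
	fmt.Printf("/clickhouse/databases/default/replicas/%s|%s\n", shardName, replica)
	// -> /clickhouse/databases/default/replicas/1-0|chi-clickhouse-scw-0-1
}
```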
So the correct command should be:

    SYSTEM DROP DATABASE REPLICA '1-0|chi-clickhouse-scw-0-1'
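A minimal sketch of one possible fix, assuming the operator can obtain the resolved database_shard_name (e.g. by querying system.macros or system.clusters on the surviving replica) and pass it instead of the integer index. The string-typed shardName parameter is a hypothetical signature change, not the current operator API:

```go
package main

import "fmt"

// sqlDropReplica (sketch of a possible fix): accept the resolved
// database shard name (e.g. "1-0") rather than the 0-based integer
// shard index, so the generated command matches the znode path that
// ClickHouse Keeper actually stores.
func sqlDropReplica(shardName string, replica string) []string {
	return []string{
		fmt.Sprintf("SYSTEM DROP REPLICA '%s'", replica),
		fmt.Sprintf("SYSTEM DROP DATABASE REPLICA '%s|%s'", shardName, replica),
	}
}

func main() {
	for _, sql := range sqlDropReplica("1-0", "chi-clickhouse-scw-0-1") {
		fmt.Println(sql)
	}
	// -> SYSTEM DROP REPLICA 'chi-clickhouse-scw-0-1'
	// -> SYSTEM DROP DATABASE REPLICA '1-0|chi-clickhouse-scw-0-1'
}
```

With this change, the cleanup command at scale-down would target the znode /clickhouse/databases/default/replicas/1-0|chi-clickhouse-scw-0-1, so the subsequent scale-up no longer hits REPLICA_ALREADY_EXISTS.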