connect: connection refused job #669

@gcr-ran

I created a task with FE node 221 configured as the source and FE node 225 configured as the target. After I killed the source FE node 221, data synchronization continued normally. However, after I killed the target FE node 225, the syncer reported that it could not connect to FE 225. The log is as follows:

```
[2026-01-16 10:38:05.135] INFO GOROUTINE: Total = 28 line=ccr_syncer/monitor.go:45
[2026-01-16 10:38:05.135] INFO MEMORY STATS: Alloc = 3 MiB, TotalAlloc = 18 MiB, Sys = 36 MiB, NumGC = 6, LiveObjects = 19058 line=ccr_syncer/monitor.go:55
[2026-01-16 10:38:05.135] INFO JOB STATS: Total = 1, Running = 1, DBSync = 1, TableSync = 0 line=ccr_syncer/monitor.go:84
[2026-01-16 10:38:05.135] INFO JOB STATUS: FullSync = 0, PartialSync = 0, IncrementalSync = 1 line=ccr_syncer/monitor.go:86
[2026-01-16 10:38:18.156] INFO begin txn 228232, label: ccrj-1c5b:db_sync:1142520:4738312:1258786, db: 4738312 job=ccr_test_2014_210 line=ccr/job_pipeline.go:680
[2026-01-16 10:38:18.161] INFO txn 228232 ingest binlog: run 1 tablet ingest jobs line=ccr/ingest_binlog_job.go:720
[2026-01-16 10:38:18.175] INFO commit txn 228232 success, commit seq: 1258786 job=ccr_test_2014_210 line=ccr/job_pipeline.go:808
[2026-01-16 10:38:21.154] INFO consume prev txn id: 228232 job=ccr_test_2014_210 line=ccr/job_pipeline.go:483
[mysql] 2026/01/16 10:38:21 packets.go:122: closing bad idle connection: EOF

[2026-01-16 10:38:21.155] ERROR wait transaction done failed, err +query restore state failed: [normal] dial tcp 10...225:9030: connect: connection refused job=ccr_test_2014_210 line=base/spec.go:1137
[2026-01-16 10:38:22.156] ERROR wait transaction done failed, err +query restore state failed: [normal] dial tcp 10...225:9030: connect: connection refused job=ccr_test_2014_210 line=base/spec.go:1137
[2026-01-16 10:38:23.157] ERROR wait transaction done failed, err +query restore state failed: [normal] dial tcp 10...225:9030: connect: connection refused job=ccr_test_2014_210 line=base/spec.go:1137
[2026-01-16 10:38:24.158] ERROR wait transaction done failed, err +query restore state failed: [normal] dial tcp 10...225:9030: connect: connection refused job=ccr_test_2014_210 line=base/spec.go:1137
[2026-01-16 10:38:25.159] ERROR wait transaction done failed, err +query restore state failed: [normal] dial tcp 10...225:9030: connect: connection refused job=ccr_test_2014_210 line=base/spec.go:1137
[2026-01-16 10:38:26.160] ERROR wait transaction done failed, err +query restore state failed: [normal] dial tcp 10...225:9030: connect: connection refused job=ccr_test_2014_210 line=base/spec.go:1137
[2026-01-16 10:38:27.161] ERROR wait transaction done failed, err +query restore state failed: [normal] dial tcp 10...225:9030: connect: connection refused job=ccr_test_2014_210 line=base/spec.go:1137
[2026-01-16 10:38:28.162] ERROR wait transaction done failed, err +query restore state failed: [normal] dial tcp 10...225:9030: connect: connection refused job=ccr_test_2014_210 line=base/spec.go:1137
[2026-01-16 10:38:29.163] ERROR wait transaction done failed, err +query restore state failed: [normal] dial tcp 10...225:9030: connect: connection refused job=ccr_test_2014_210 line=base/spec.go:1137
[2026-01-16 10:38:30.163] ERROR wait transaction done failed, err +query restore state failed: [normal] dial tcp 10...225:9030: connect: connection refused job=ccr_test_2014_210 line=base/spec.go:1137
[2026-01-16 10:38:31.165] ERROR wait transaction done failed, err +query restore state failed: [normal] dial tcp 10...225:9030: connect: connection refused job=ccr_test_2014_210 line=base/spec.go:1137
[2026-01-16 10:38:32.165] ERROR wait transaction done failed, err +query restore state failed: [normal] dial tcp 10...225:9030: connect: connection refused job=ccr_test_2014_210 line=base/spec.go:1137
```

I checked the metadata backend storage: within the corresponding job, the configuration of both the source and the target Doris cluster contains information about the other standby FE nodes. Given that, I don't understand why data synchronization proceeds normally when the source FE node 221 is killed, but fails with the error above when the target FE node 225 is killed.
