
[Bug]: CDC daemon task may panic on CN startup before frontend initialization #24175

@LeftHandCold

Description

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch Name

3.0-dev

Commit ID

1b47702

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

CN panics during CDC daemon task startup with a nil pointer dereference in frontend internal executor initialization:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4103038]

goroutine 1027 [running]:
github.com/matrixorigin/matrixone/pkg/frontend.getPu(...)
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/server.go:403
github.com/matrixorigin/matrixone/pkg/frontend.(*internalExecutor).Query(...)
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/internal_executor.go:172
github.com/matrixorigin/matrixone/pkg/frontend.(*CDCTaskExecutor).retrieveCdcTask(...)
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/cdc_exector.go:1214
github.com/matrixorigin/matrixone/pkg/frontend.(*CDCTaskExecutor).Start(...)
	/go/src/github.com/matrixorigin/matrixone/pkg/frontend/cdc_exector.go:245

Expected Behavior

CDC daemon tasks should not start until the CN frontend and internal executor dependencies are ready. CN startup should not panic even if HAKeeper delivers `CreateTaskService` before the frontend stack finishes initialization.

Steps to Reproduce

1. Use branch `3.0-dev`.
2. Ensure there is an existing CDC daemon task that can be claimed when the CN comes up.
3. Start or restart a CN.
4. Let HAKeeper send `CreateTaskService` during early startup.
5. The heartbeat path may create the task service and start the task runner before `initMOServer()` has finished initializing the frontend server-level variables and engine dependencies.
6. A claimed CDC task can then call `frontend.NewInternalExecutor(...).Query(...)` and panic in `getPu(service)`.

Additional information

The root cause is a startup ordering race in CN service initialization:

  • `NewService()` calls `getHAKeeperClient()`, which starts the CN heartbeat task.
  • The heartbeat command path can call `createTaskService()` and `startTaskRunner()` before `initMOServer()` completes.
  • `initMOServer()` initializes the frontend server-level runtime dependencies used by internal executor sessions (`setPu`, engine, txn client, storage engine, etc.).
  • CDC task startup uses an internal executor immediately in `retrieveCdcTask()`, so starting the runner too early can dereference an uninitialized frontend Pu and crash the CN.
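The ordering race above can be reduced to a small, self-contained Go sketch. It assumes a simplified per-service parameter table; `ServerVars`, `ParameterUnit`, `setPu`, `getPu`, and `runCdcTask` are hypothetical stand-ins that only echo names from the stack trace, not MatrixOne's actual types or signatures. The sketch uses `recover` purely so it can print the failure; the real code path has no such guard, which is why the CN process crashes.

```go
package main

import (
	"fmt"
	"sync"
)

// ServerVars and ParameterUnit are hypothetical stand-ins for the frontend's
// server-level parameters; the real pkg/frontend types differ.
type ServerVars struct {
	Name string
}

type ParameterUnit struct {
	SV *ServerVars
}

// pus models per-service state that initMOServer() populates via setPu.
var (
	pusMu sync.RWMutex
	pus   = map[string]*ParameterUnit{}
)

func setPu(service string, pu *ParameterUnit) {
	pusMu.Lock()
	defer pusMu.Unlock()
	pus[service] = pu
}

// getPu returns nil when setPu has not run yet for this service.
func getPu(service string) *ParameterUnit {
	pusMu.RLock()
	defer pusMu.RUnlock()
	return pus[service]
}

// runCdcTask models a claimed CDC task whose first internal query
// dereferences the service's parameters without a nil check.
// The recover exists only so this sketch can report the panic.
func runCdcTask(service string) (name string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return getPu(service).SV.Name, nil
}

func main() {
	// Heartbeat path: the runner starts a CDC task before initMOServer()
	// has called setPu for this service.
	if _, err := runCdcTask("cn-1"); err != nil {
		fmt.Println("before init:", err)
	}

	// initMOServer() completes and installs the server-level parameters.
	setPu("cn-1", &ParameterUnit{SV: &ServerVars{Name: "cn-1"}})

	name, err := runCdcTask("cn-1")
	fmt.Println("after init:", name, err)
}
```

The first call hits the same class of failure as the reported trace (a nil pointer dereference inside the parameter lookup); the second call succeeds only because initialization has finished in between.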

A fix that preserves existing behavior is to allow the task service to be created early but defer starting the task runner until CN startup has completed.
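One way to express this fix is sketched below, assuming the CN service can track two facts: whether the heartbeat has asked for the task runner, and whether startup (including `initMOServer()`) has finished. The `cnService` type and its `createTaskService`, `startTaskRunner`, `markStartupDone`, and `runnerRunning` methods are hypothetical simplifications for illustration, not the actual MatrixOne code.

```go
package main

import (
	"fmt"
	"sync"
)

// cnService is a hypothetical, heavily simplified model of the CN service.
type cnService struct {
	mu            sync.Mutex
	taskService   bool // task service created (allowed early)
	runnerWanted  bool // heartbeat asked for the runner
	runnerStarted bool // runner actually started
	startupDone   bool // set once initMOServer() and the rest of startup finish
}

// createTaskService may run on the heartbeat path at any time.
func (s *cnService) createTaskService() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.taskService = true
}

// startTaskRunner is called from the heartbeat path; before startup completes
// it only records the request instead of starting the runner.
func (s *cnService) startTaskRunner() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.runnerWanted = true
	s.maybeStartRunnerLocked()
}

// markStartupDone is called at the end of CN startup, after initMOServer().
func (s *cnService) markStartupDone() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.startupDone = true
	s.maybeStartRunnerLocked()
}

// maybeStartRunnerLocked starts the runner at most once, and only when the
// task service exists, the heartbeat requested the runner, and startup is
// complete. The caller must hold s.mu.
func (s *cnService) maybeStartRunnerLocked() {
	if s.runnerStarted || !s.taskService || !s.runnerWanted || !s.startupDone {
		return
	}
	s.runnerStarted = true
	// In the real service this is where the task runner would start and
	// begin claiming CDC tasks that use the internal executor.
}

func (s *cnService) runnerRunning() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.runnerStarted
}

func main() {
	s := &cnService{}
	// HAKeeper sends CreateTaskService during early startup.
	s.createTaskService()
	s.startTaskRunner()
	fmt.Println("runner started before startup done:", s.runnerRunning())

	// CN startup finishes; the deferred runner start happens now.
	s.markStartupDone()
	fmt.Println("runner started after startup done:", s.runnerRunning())
}
```

Because both the heartbeat path and the end of startup go through the same guarded helper under one mutex, the runner starts exactly once, whichever of the two events arrives first.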
