
docs: interconnect troubleshooting runbook (diagnosing tier1 peers via pg_cluster_ic_peers) #6

Description

@sqlrush

Summary

We have a complete reference for pg_cluster_ic_peers
(docs/reference/system-views.md) and a working multi-node walkthrough
(docs/user-guide/bootstrap.md), but no task-oriented troubleshooting guide
for when the tier1 (TCP) interconnect doesn't come up. This issue adds one.

Why this is a good first issue

  • Pure documentation — no C, no cluster internals.
  • Everything you need is already in the tree: the pg_cluster_ic_peers column
    reference and the cluster.interconnect_* GUC docs.
  • You can reproduce every failure state locally with two nodes on loopback
    (see the multi-node section of bootstrap.md).

What to do

Add docs/user-guide/troubleshooting-interconnect.md and link it from
bootstrap.md. For each common failure give the symptom as seen in
pg_cluster_ic_peers, the likely cause, and the fix. At minimum cover:

  1. pg_cluster_ic_peers returns zero rows → cluster.interconnect_tier is
    stub, not tier1.
  2. Peer stuck in state = down, connect_error_count climbing,
    last_error = "connect SO_ERROR: Connection refused" (ECONNREFUSED: errno 61
    on macOS/BSD, 111 on Linux), last_error_code = 08001
    → peer not started, or wrong interconnect_addr / port unreachable.
  3. Peer flapping → non-zero reconnect_count; how to read last_connect_at
    vs last_recv_at.
  4. Heartbeats not advancing → compare heartbeat_send_count /
    heartbeat_recv_count across two queries a few seconds apart.
  5. state = rejected → what rejects a peer (membership/handshake) and where to
    look next.
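A minimal Python sketch of the triage order above, assuming each pg_cluster_ic_peers row has already been fetched into a dict keyed by the column names listed in this issue (fetching is left to the caller). The function names, the hint strings, and the assumption that heartbeat counters stay frozen on a down link are illustrative only and not part of the project. Only the state values down and rejected are checked, because those are the only ones this issue names.

```python
# Hypothetical triage helper for the failure scenarios in this issue.
# Rows are plain dicts keyed by pg_cluster_ic_peers column names; fetching
# them from the database is left to the caller.
from collections.abc import Mapping, Sequence
from typing import Any

Row = Mapping[str, Any]


def diagnose_view(rows: Sequence[Row]) -> list[str]:
    """Scenario 1: an empty view usually means tier1 is not enabled."""
    if not rows:
        return ["no peers: check cluster.interconnect_tier (stub instead of tier1?)"]
    return []


def diagnose_peer(before: Row, after: Row) -> list[str]:
    """Compare two snapshots of the same peer taken a few seconds apart."""
    state = after["state"]

    # Scenario 5: rejected peers fail membership/handshake; counters are secondary.
    if state == "rejected":
        return ["rejected: membership/handshake problem, check last_error"]

    findings: list[str] = []

    # Scenario 3: reconnects between the two snapshots mean the link is flapping.
    if after["reconnect_count"] > before["reconnect_count"]:
        findings.append("flapping: reconnect_count grew; compare last_connect_at vs last_recv_at")

    # Scenario 2: stuck down while connect attempts keep failing.
    if state == "down":
        if after["connect_error_count"] > before["connect_error_count"]:
            msg = "down with connect_error_count climbing"
            if after.get("last_error_code") == "08001":
                msg += " (08001: peer not started, or interconnect_addr/port unreachable)"
            findings.append(msg)
        # Assumption: heartbeat counters are not expected to move on a down link.
        return findings

    # Scenario 4: heartbeat counters that did not advance between snapshots.
    if after["heartbeat_send_count"] == before["heartbeat_send_count"]:
        findings.append("heartbeat_send_count not advancing")
    if after["heartbeat_recv_count"] == before["heartbeat_recv_count"]:
        findings.append("heartbeat_recv_count not advancing")
    return findings
```

Take `before` and `after` a few seconds apart, as item 4 suggests. Matching the two rows for the same peer should use whichever identifier column docs/reference/system-views.md documents; the sketch deliberately does not assume one.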

Tip: to reproduce (2), start only node 0 from the bootstrap walkthrough and
watch node 0's view of node 1 before node 1 is up.
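"Watching" in the tip amounts to sampling the same row a few times and checking that a counter only moves up. A minimal sketch, assuming a caller-supplied `fetch` callable that runs a query on node 0 and returns node 1's pg_cluster_ic_peers row as a dict. `watch`, `grows`, and the default sampling interval are hypothetical names and values for illustration.

```python
# Hypothetical polling helpers for reproducing scenario 2 locally.
import time
from collections.abc import Callable, Mapping, Sequence
from typing import Any

Row = Mapping[str, Any]


def watch(fetch: Callable[[], Row], samples: int = 5, interval_s: float = 2.0) -> list[dict]:
    """Call fetch() `samples` times, sleeping interval_s between calls."""
    rows: list[dict] = []
    for i in range(samples):
        if i:
            time.sleep(interval_s)
        rows.append(dict(fetch()))  # copy so later polls cannot mutate earlier ones
    return rows


def grows(rows: Sequence[Row], column: str) -> bool:
    """True if `column` never decreases and ends higher than it started.

    Non-strict, because retries with backoff may skip a sampling interval.
    """
    values = [r[column] for r in rows]
    if len(values) < 2:
        return False
    return all(b >= a for a, b in zip(values, values[1:])) and values[-1] > values[0]
```

With only node 0 running, every sample should show state = down, and `grows(rows, "connect_error_count")` should be true.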

Definition of done

  • New docs/user-guide/troubleshooting-interconnect.md, each scenario as
    symptom → cause → fix.
  • Linked from docs/user-guide/bootstrap.md.
  • Column/GUC names match the existing docs (no invented fields).

Pointers

  • docs/reference/system-views.md → ## pg_cluster_ic_peers
  • docs/user-guide/configuration.md → cluster.interconnect_*
  • docs/user-guide/bootstrap.md → "Multi-node cluster (tier1 TCP interconnect)"

Metadata

  • Assignees: no one assigned
  • Labels: documentation (Improvements or additions to documentation),
    good first issue (Good for newcomers)
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests