sni-router: break domain-fronting loop with pinned Caddy IP #478

Open
dolonet wants to merge 3 commits into 9seconds:master from dolonet:fix/sni-router-fronting-loop

Contributor

dolonet commented Apr 25, 2026

Summary

Follow-up to #462. When the secret's domain is the same as the
fronting domain (the recommended setup, since matching SNI/IP is the
whole point of the SNI-router topology), mtg's default fronting
behavior loops:

  1. Probe arrives on :443 with SNI=example.com → HAProxy routes to mtg.
  2. mtg sees it isn't real Telegram → falls back to domain fronting.
  3. mtg resolves the secret's hostname (example.com) via DNS.
  4. DNS points back at this server → mtg dials its own :443 → HAProxy.
  5. HAProxy sees SNI=example.com → routes to mtg → goto 2.
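
The five steps above can be modelled as a toy state machine. This is only an illustrative sketch, not mtg or HAProxy code: `frontingMode`, `passesThroughMtg`, and the hop cap are hypothetical names introduced here to show why the default path never terminates while a pinned upstream exits after one pass.

```go
package main

import "fmt"

// frontingMode models where mtg sends a connection that is not real Telegram.
// (Hypothetical type, for illustration only.)
type frontingMode int

const (
	// viaDNS: resolve the secret's hostname and dial it on :443 (the default).
	viaDNS frontingMode = iota
	// pinned: dial a fixed upstream (Caddy) directly, bypassing HAProxy.
	pinned
)

// passesThroughMtg counts how often a single probe is handed to mtg before
// the fronting dial reaches a real web server. It gives up after maxHops and
// returns false, which corresponds to the loop described above.
func passesThroughMtg(mode frontingMode, dnsPointsHere bool, maxHops int) (int, bool) {
	for hop := 1; hop <= maxHops; hop++ {
		// Steps 1 and 5: HAProxy matched the SNI and routed the probe to mtg.
		if mode == pinned || !dnsPointsHere {
			// The fronting dial lands on a web server, so the probe is served.
			return hop, true
		}
		// Steps 3 and 4: DNS resolves to this host, so the dial hits
		// HAProxy on :443 again and the next iteration starts.
	}
	return maxHops, false
}

func main() {
	n, ok := passesThroughMtg(viaDNS, true, 5)
	fmt.Printf("default fronting: hops=%d reachedWeb=%v\n", n, ok)

	n, ok = passesThroughMtg(pinned, true, 5)
	fmt.Printf("pinned fronting:  hops=%d reachedWeb=%v\n", n, ok)
}
```

In the model, only the combination "default fronting" plus "DNS points at this host" fails to terminate, which is exactly the SNI-router deployment.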

Reported by @gaudima in #462 (comment).

Fix

Pin Caddy's container address via a static sni network in
docker-compose.yml, and add a [domain-fronting] block in
mtg-config.toml pointing mtg at Caddy directly:

[domain-fronting]
ip = "172.28.0.10"
port = 8443
proxy-protocol = true

mtg now bypasses HAProxy for the fronting connection. PROXY protocol
v2 stays consistent (Caddy's :8443 already has the listener wrapper),
so Caddy's logs still see the real client IP.

domain-fronting.ip is parsed by TypeIP (net.ParseIP) and only
accepts a literal IP, not a hostname — hence the static subnet and
pinned ipv4_address rather than relying on docker DNS.
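
The literal-IP restriction follows from how Go's standard `net.ParseIP` behaves: it returns `nil` for anything that is not an IP literal and never consults DNS. A minimal sketch of that behaviour (the `isLiteralIP` helper is hypothetical and is not mtg's `TypeIP` code):

```go
package main

import (
	"fmt"
	"net"
)

// isLiteralIP reports whether s is an IPv4 or IPv6 literal.
// net.ParseIP performs no name resolution, so hostnames yield nil.
func isLiteralIP(s string) bool {
	return net.ParseIP(s) != nil
}

func main() {
	// "web" stands in for a compose service name; it is not an IP literal.
	for _, v := range []string{"172.28.0.10", "::1", "web", "example.com"} {
		fmt.Printf("%-12s literal IP: %v\n", v, isLiteralIP(v))
	}
}
```

A compose service name such as `web` is therefore rejected at parse time, before any dial is attempted.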

README gains a "Fronting loop" section explaining the cause and the
requirement to keep the pinned IP in sync between the two files.

Test plan

  • docker compose config validates with the new networks block
  • On a test VPS with DNS pointing at the host:
    • Telegram client connects through the proxy as before
    • curl https://DOMAIN/ returns Caddy's content
    • curl --resolve DOMAIN:443:HOST_IP -k -I https://DOMAIN/ (probe simulation: SNI matches the secret, no MTProto handshake) — connection terminates against Caddy without looping; Caddy's access log shows the real client IP

Comment on lines +92 to +94
> (Caddy's pinned address). Caddy may refuse the mixed-family header
> and log the docker-network address instead of the real client IP for
> that connection. Telegram traffic is unaffected.
Contributor

Is it fixable?

Contributor Author

Yes — it disappears with the hostname change in #480. Dual-stack docker DNS lets mtg dial an IPv6 backend for IPv6 clients, so the PROXY v2 source/dest stay same-family. Caveat will be dropped once #480 merges.

Comment on lines +81 to +82
`docker-compose.yml` (mtg's `domain-fronting.ip` only accepts a literal
IP, not a hostname, hence the static `sni` network). `proxy-protocol =
Contributor

Is this a fundamental restriction in mtg? Maybe try to fix it there?

Contributor Author

Not fundamental — TypeIP just calls net.ParseIP, but the rest of the dial path is hostname-capable. Opened #480 to add a sibling [domain-fronting].host that accepts hostname or IP. Once it lands this PR shrinks to a host = "web" line and the static subnet/pin go away.

Owner

I suggest landing #480 first, so that this whole PR can be simplified.

Contributor

So we can shrink it now, right?


networks:
  sni:
    driver: bridge
Contributor

bam80 Apr 27, 2026

Is the bridge driver necessary?

Contributor

bam80 commented Apr 28, 2026

Just to clarify: does this problem happen only when the hostname and the domain are exactly equal, or also when they only partially overlap, e.g. when the hostname is a.b.com and the domain is b.com?

Contributor Author

dolonet commented Apr 28, 2026

Good question — it's not about the names overlapping, it's about DNS.

HAProxy matches the SNI exactly (req_ssl_sni -i …), so a partial suffix wouldn't route. What actually triggers the loop is that mtg's default fronting target is the secret's hostname, which it resolves via DNS. In an SNI-router setup that hostname has to point at this same server (otherwise clients couldn't reach mtg at all), so mtg's fronting dial lands back on HAProxy carrying the original ClientHello → HAProxy sees the secret's SNI → routes to mtg → loop.

So in your a.b.com / b.com example, the relationship between the two names doesn't matter. As long as the secret's hostname resolves to this host (which it must, for the setup to work), the loop reproduces. The [domain-fronting] pin in this PR sidesteps it in every case by routing mtg directly to Caddy without going back through HAProxy.
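
To illustrate the exact-match point: `req_ssl_sni -i` compares the whole SNI string case-insensitively, so a parent or child domain does not match on its own. A rough Go equivalent of that comparison (the `routesToMtg` helper is hypothetical; it models the matching rule only, not HAProxy internals):

```go
package main

import (
	"fmt"
	"strings"
)

// routesToMtg models an exact, case-insensitive SNI match against the
// secret's hostname. No suffix or wildcard matching is applied.
func routesToMtg(sni, secretHost string) bool {
	return strings.EqualFold(sni, secretHost)
}

func main() {
	secret := "b.com" // placeholder hostname from the question above
	for _, sni := range []string{"b.com", "B.COM", "a.b.com"} {
		fmt.Printf("SNI %-8s routes to mtg: %v\n", sni, routesToMtg(sni, secret))
	}
}
```

The match decides only where HAProxy sends a connection; whether the loop occurs still depends on where the secret's hostname resolves.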

Pushed a small README tweak (bcfacec) leading with "the trigger is DNS, not name equality" so the doc doesn't imply the matching-name case is the only one.

Contributor

bam80 commented Apr 29, 2026

Could we add loop detection to the mtg runtime and/or its config-check doctor mode?

Contributor Author

dolonet commented Apr 30, 2026

Good idea, but I'd rather not expand #478 (it's config + docs only). Happy to track it as a separate issue/PR.

Sketch of a feasible runtime check: when [domain-fronting] isn't set, resolve the secret's hostname at startup and compare the result against the local interface addresses / the bind address. On a match, warn that the fronting dial may loop back through the same listener and recommend pinning [domain-fronting] upstream. Warning, not fatal — legitimate self-fronting is rare but possible.

Caveat worth being upfront about: the check only sees the direct case. An SNI-router on a separate IP that ultimately routes back to mtg would slip through, since mtg's outbound dial lands on a "foreign" IP. A precise detector would need an out-of-band marker on the fronting connection, which MTProto doesn't expose cleanly. So 80% coverage from a cheap check, the rest stays a documentation problem.

Shall I open a follow-up issue with this scope?

Contributor

bam80 commented Apr 30, 2026

Sure, thanks.

Contributor Author

dolonet commented Apr 30, 2026

Wanna hear @9seconds's opinion on that before diving in :)

Contributor Author

dolonet commented May 4, 2026

Sounds good — happy to wait on #480. Once it lands, this PR collapses to:

  • drop the static networks: block and ipv4_address pin in docker-compose.yml,
  • replace [domain-fronting].ip = "172.28.0.10" with host = "web" (the compose service name),
  • simplify the README's "Fronting loop" section accordingly (the dual-stack caveat noted at L101 also goes away — docker DNS gives mtg an A or AAAA per client, so PROXY v2 stays same-family).

I'll rebase and force-push the simplified version after #480 merges.

Separately, on the runtime loop-detection idea raised above (#478 (comment)) — would you like me to open a follow-up issue with the "resolve secret hostname at startup, warn if it matches a local interface, non-fatal" sketch? Easy to track, easy to scope, but I didn't want to open it without your nod.

dolonet added 3 commits May 4, 2026 16:12
When the secret's domain points at this server (the recommended
deployment), mtg's default fronting behavior dials that domain on :443
and the connection lands on HAProxy. HAProxy sees the SNI matching the
secret and routes back to mtg, looping until something gives.

Pin Caddy's container address via a static `sni` network and point
mtg's `[domain-fronting]` at it directly with `proxy-protocol = true`,
matching Caddy's :8443 PROXY listener wrapper. mtg's
`domain-fronting.ip` only accepts a literal IP (not a hostname), so the
network needs a fixed subnet.

README documents the loop, the fix, and the requirement to keep the
pinned IP in sync between docker-compose.yml and mtg-config.toml.

Reported by @gaudima in 9seconds#462.

- Use list form for `networks: [sni]` on services that need no
  per-network config; keep map form only on `web` where ipv4_address
  requires it.
- README: note that the 172.28.0.0/24 subnet can be changed if it
  collides with an existing host network (and remind to update both
  files in lockstep).
- README: caveat that IPv6 fronting may lose the real client IP in
  Caddy's logs because mtg constructs a mixed-family PROXY v2 header
  (IPv6 source, IPv4 destination); Telegram traffic unaffected.
3 participants