Skip to content

fix: derive NAT telemetry from ICE, drop leaky go-nats discovery#367

Closed
garmr-ulfr wants to merge 1 commit into
mainfrom
garmr/nat-telemetry-from-ice
Closed

fix: derive NAT telemetry from ICE, drop leaky go-nats discovery#367
garmr-ulfr wants to merge 1 commit into
mainfrom
garmr/nat-telemetry-from-ice

Conversation

@garmr-ulfr

Copy link
Copy Markdown
Contributor

Summary

The per-connection NAT-behavior telemetry ran a standalone go-nats STUN discovery whose goroutines parked on unbuffered channel sends on error paths, so the deferred Close on its pion/turn client never ran — leaking the goroutine and the client's ~7.45 MB read buffer on every connection cycle.

This derives the same telemetry from the WebRTC ICE candidates the consumer already gathers during connection setup, eliminating the standalone discovery entirely.

What changed

  • The STUN cohort is already configured as ICE servers, so the gathered server-reflexive (srflx) candidates yield external_ip, is_natted, port_preservation, and mapping_behavior (cone vs symmetric, by comparing mapped endpoints across servers).
  • The gathered candidates are threaded into consumer FSM state 4, where the span is emitted on both the success and failure paths.
  • otel.CollectAndSendNATBehaviorTelemetry(srvs, name)otel.SendNATBehaviorTelemetry(nat, name); a new pure summarizeNATFromICE maps candidates to the summary.
  • go-nats and the pion/turn v1.3.7 it pulled in are removed from the module (pion/turn/v4, used by pion/webrtc, is untouched).

Behavior change

filtering_behavior is not observable from ICE — it requires an RFC 5780 CHANGE-REQUEST, which ICE never issues — so it is reported as "unspecified" and the three cone variants (full / address-restricted / port-restricted) collapse to a single "Cone NAT". The symmetric-vs-cone signal (what actually breaks P2P traversal) is preserved. All OTel attribute keys are unchanged, so dashboards keep grouping; only the cone nat_type values differ.

The per-connection NAT-behavior telemetry ran a standalone go-nats STUN
discovery whose goroutines parked on unbuffered channel sends on error
paths, so the deferred Close on its pion/turn client never ran, leaking
the goroutine and the client's read buffer on every connection cycle.

Derive the same telemetry from the ICE candidates the consumer already
gathers during connection setup instead. The STUN cohort is configured as
ICE servers, so the gathered server-reflexive candidates yield external
IP, is-natted, port preservation, and mapping behavior (cone vs symmetric)
by comparing mapped endpoints across servers. The candidates are threaded
into consumer FSM state 4 where the span is emitted.

Filtering behavior is not observable from ICE (it needs an RFC 5780
CHANGE-REQUEST), so it is reported as "unspecified" and the cone variants
collapse to a single "Cone NAT"; all OTel attribute keys are unchanged.

Removes go-nats and the pion/turn v1.3.7 it pulled in from the module.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the standalone go-nats-based NAT discovery (and its associated leak risk) and instead derives the same per-connection NAT telemetry from the ICE candidates already gathered during the consumer WebRTC setup flow. This keeps the telemetry span emission but eliminates the redundant STUN probe and drops the extra dependency chain.

Changes:

  • Replace CollectAndSendNATBehaviorTelemetry([]string, name) with SendNATBehaviorTelemetry(NATSummary, spanName) and add a shared otel.NATSummary data type.
  • Add summarizeNATFromICE([]webrtc.ICECandidate) and thread gathered ICE candidates into consumer FSM state 4 so telemetry can be sent on both success and timeout failure paths.
  • Remove go-nats (and related transitive deps) from go.mod/go.sum.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
otel/otel_wasm_impl.go Updates wasm no-op telemetry API to the new SendNATBehaviorTelemetry(NATSummary, ...) signature.
otel/otel_default_impl.go Implements the new SendNATBehaviorTelemetry span emission using NATSummary attributes.
otel/nat.go Introduces NATSummary struct to carry NAT telemetry attributes consistently across build targets.
clientcore/nat.go Adds ICE-candidate-based NAT summarization logic used by the consumer FSM.
clientcore/nat_test.go Adds unit tests for summarizeNATFromICE.
clientcore/consumer.go Threads ICE candidates into state 4 and emits NAT telemetry spans using the derived summary.
go.mod Removes go-nats requirement and related indirect deps.
go.sum Removes checksums for removed dependencies.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread clientcore/nat.go
Comment on lines +84 to +100
type endpoint struct {
addr string
port uint16
}
distinct := make(map[endpoint]struct{}, len(fam))
for _, c := range fam {
distinct[endpoint{c.Address, c.Port}] = struct{}{}
}
switch {
case len(fam) < 2:
// A single srflx candidate can't reveal whether the mapping varies per server.
sum.MappingBehavior = natBehaviorUnknown
case len(distinct) == 1:
sum.MappingBehavior = natMappingIndependent
default:
sum.MappingBehavior = natMappingDependent
}
@garmr-ulfr

Copy link
Copy Markdown
Contributor Author

Closing as we're just going to remove the NAT discovery telemetry altogether.

@garmr-ulfr garmr-ulfr closed this Jun 12, 2026
@garmr-ulfr garmr-ulfr deleted the garmr/nat-telemetry-from-ice branch June 12, 2026 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants