fix: derive NAT telemetry from ICE, drop leaky go-nats discovery#367
Closed
garmr-ulfr wants to merge 1 commit into
Closed
fix: derive NAT telemetry from ICE, drop leaky go-nats discovery#367garmr-ulfr wants to merge 1 commit into
garmr-ulfr wants to merge 1 commit into
Conversation
The per-connection NAT-behavior telemetry ran a standalone go-nats STUN discovery whose goroutines parked on unbuffered channel sends on error paths, so the deferred Close on its pion/turn client never ran, leaking the goroutine and the client's read buffer on every connection cycle. Derive the same telemetry from the ICE candidates the consumer already gathers during connection setup instead. The STUN cohort is configured as ICE servers, so the gathered server-reflexive candidates yield external IP, is-natted, port preservation, and mapping behavior (cone vs symmetric) by comparing mapped endpoints across servers. The candidates are threaded into consumer FSM state 4 where the span is emitted. Filtering behavior is not observable from ICE (it needs an RFC 5780 CHANGE-REQUEST), so it is reported as "unspecified" and the cone variants collapse to a single "Cone NAT"; all OTel attribute keys are unchanged. Removes go-nats and the pion/turn v1.3.7 it pulled in from the module.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR removes the standalone go-nats-based NAT discovery (and its associated leak risk) and instead derives the same per-connection NAT telemetry from the ICE candidates already gathered during the consumer WebRTC setup flow. This keeps the telemetry span emission but eliminates the redundant STUN probe and drops the extra dependency chain.
Changes:
- Replace
CollectAndSendNATBehaviorTelemetry([]string, name)withSendNATBehaviorTelemetry(NATSummary, spanName)and add a sharedotel.NATSummarydata type. - Add
summarizeNATFromICE([]webrtc.ICECandidate)and thread gathered ICE candidates into consumer FSM state 4 so telemetry can be sent on both success and timeout failure paths. - Remove
go-nats(and related transitive deps) fromgo.mod/go.sum.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
otel/otel_wasm_impl.go |
Updates wasm no-op telemetry API to the new SendNATBehaviorTelemetry(NATSummary, ...) signature. |
otel/otel_default_impl.go |
Implements the new SendNATBehaviorTelemetry span emission using NATSummary attributes. |
otel/nat.go |
Introduces NATSummary struct to carry NAT telemetry attributes consistently across build targets. |
clientcore/nat.go |
Adds ICE-candidate-based NAT summarization logic used by the consumer FSM. |
clientcore/nat_test.go |
Adds unit tests for summarizeNATFromICE. |
clientcore/consumer.go |
Threads ICE candidates into state 4 and emits NAT telemetry spans using the derived summary. |
go.mod |
Removes go-nats requirement and related indirect deps. |
go.sum |
Removes checksums for removed dependencies. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+84
to
+100
| type endpoint struct { | ||
| addr string | ||
| port uint16 | ||
| } | ||
| distinct := make(map[endpoint]struct{}, len(fam)) | ||
| for _, c := range fam { | ||
| distinct[endpoint{c.Address, c.Port}] = struct{}{} | ||
| } | ||
| switch { | ||
| case len(fam) < 2: | ||
| // A single srflx candidate can't reveal whether the mapping varies per server. | ||
| sum.MappingBehavior = natBehaviorUnknown | ||
| case len(distinct) == 1: | ||
| sum.MappingBehavior = natMappingIndependent | ||
| default: | ||
| sum.MappingBehavior = natMappingDependent | ||
| } |
Contributor
Author
|
Closing as we're just going to remove the NAT discovery telemetry altogether. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The per-connection NAT-behavior telemetry ran a standalone
go-natsSTUN discovery whose goroutines parked on unbuffered channel sends on error paths, so the deferredCloseon itspion/turnclient never ran — leaking the goroutine and the client's ~7.45 MB read buffer on every connection cycle.This derives the same telemetry from the WebRTC ICE candidates the consumer already gathers during connection setup, eliminating the standalone discovery entirely.
What changed
external_ip,is_natted,port_preservation, andmapping_behavior(cone vs symmetric, by comparing mapped endpoints across servers).candidatesare threaded into consumer FSM state 4, where the span is emitted on both the success and failure paths.otel.CollectAndSendNATBehaviorTelemetry(srvs, name)→otel.SendNATBehaviorTelemetry(nat, name); a new puresummarizeNATFromICEmaps candidates to the summary.go-natsand thepion/turnv1.3.7 it pulled in are removed from the module (pion/turn/v4, used bypion/webrtc, is untouched).Behavior change
filtering_behavioris not observable from ICE — it requires an RFC 5780 CHANGE-REQUEST, which ICE never issues — so it is reported as"unspecified"and the three cone variants (full / address-restricted / port-restricted) collapse to a single"Cone NAT". The symmetric-vs-cone signal (what actually breaks P2P traversal) is preserved. All OTel attribute keys are unchanged, so dashboards keep grouping; only the conenat_typevalues differ.