Skip to content

dev_runtime: restore capture mode on GIN setup failure#2229

Open
wanglei875 wants to merge 1 commit into
NVIDIA:devfrom
wanglei875:fix-gin-setup-capture-mode
Open

dev_runtime: restore capture mode on GIN setup failure#2229
wanglei875 wants to merge 1 commit into
NVIDIA:devfrom
wanglei875:fix-gin-setup-capture-mode

Conversation

@wanglei875

@wanglei875 wanglei875 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Description

ncclDevrCommCreateInternal() switches the CUDA stream capture mode before setting up device communicator resources. Most failure paths after that point use GOTO cleanup labels so the original capture mode is restored.

The GIN setup path used NCCLCHECK(ncclGinDevCommSetup(...)), which returns directly on failure and skips the common fail path. This can leave the thread capture mode unrestored when GIN setup fails.

This change routes the failure through the existing fail label by using NCCLCHECKGOTO.

Changes & Impact

  • Restore the existing cleanup behavior on ncclGinDevCommSetup() failure.
  • No change to the success path.
  • No API or performance impact.

Testing

  • git diff --check
  • Started make -j$(nproc) src.build; src/dev_runtime.cc compiled successfully. Full device kernel build was not completed due to long compile time.

Signed-off-by: WangLei <1539790288@qq.com>
Copilot AI review requested due to automatic review settings June 9, 2026 09:22

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates error handling in ncclDevrCommCreateInternal to route a setup failure through the common cleanup path.

Changes:

  • Switches ncclGinDevCommSetup failure handling from NCCLCHECK to NCCLCHECKGOTO(..., ret, fail).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants