bound remote device count in ib connect and accept #2228

Open
jmestwa-coder wants to merge 1 commit into NVIDIA:master from jmestwa-coder:ib-ndevs-bound

@jmestwa-coder

Peer-supplied ndevs in the IB connection metadata is used to index fixed-size arrays without a range check:

  • ncclIbConnectImpl stores remMeta.devs[i] into remDevs[]/rkeys[] for i up to remMeta.ndevs
  • ncclIbAcceptImpl does the same on the receiver side
  • those arrays are sized NCCL_IB_MAX_DEVS_PER_NIC, so a peer reporting ndevs above that count causes reads and writes past the end of the arrays

Validate that the count is within range right after the metadata is received, on both sides.
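
The check described above could look roughly like the sketch below. It is a standalone illustration, not the code in this PR: `MAX_DEVS_PER_NIC` stands in for NCCL_IB_MAX_DEVS_PER_NIC with an assumed value of 4, and the struct layouts and the helper names `check_remote_ndevs` and `store_remote_devs` are hypothetical. Whether a count of zero should also be rejected is left open here; only negative and oversized counts are refused.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for NCCL_IB_MAX_DEVS_PER_NIC; the value 4 is assumed for illustration. */
#define MAX_DEVS_PER_NIC 4

/* Hypothetical, simplified shapes; the real NCCL structs carry more fields. */
struct dev_info { uint32_t rkey; };
struct conn_meta {
  int ndevs;                               /* peer-supplied, untrusted */
  struct dev_info devs[MAX_DEVS_PER_NIC];
};

/* Returns 0 if ndevs can safely bound a loop over the fixed-size arrays, -1 otherwise. */
static int check_remote_ndevs(int ndevs) {
  if (ndevs < 0 || ndevs > MAX_DEVS_PER_NIC) {
    fprintf(stderr, "peer reported ndevs=%d, max is %d\n", ndevs, MAX_DEVS_PER_NIC);
    return -1;
  }
  return 0;
}

/* Copies the peer's rkeys only after the count has been validated. */
static int store_remote_devs(const struct conn_meta *rem,
                             uint32_t rkeys[MAX_DEVS_PER_NIC]) {
  if (check_remote_ndevs(rem->ndevs) != 0) return -1;
  for (int i = 0; i < rem->ndevs; i++) rkeys[i] = rem->devs[i].rkey;
  return 0;
}
```

Running the check before the copy loop, rather than clamping the count, makes a malformed peer fail the connection instead of silently proceeding with a truncated device list.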
