Add NULL checks after malloc and improve diagnostic logging in shadow pool functions by Pritiks23 · Pull Request #2175 · NVIDIA/nccl

Pritiks23 · 2026-05-13T17:41:29Z

Description

This pull request combines the changes from PRs #2172 and #2173:

Added NULL checks after malloc in ncclShadowPoolAlloc to prevent null pointer dereferences and silent crashes.
Improved diagnostic logging in ncclShadowPoolFree error path to include device pointer address, hash bucket index, and total object count for better debugging.
These changes enhance the robustness and maintainability of the code by addressing potential memory allocation failures and improving error diagnostics.

Related Issues

PR #2172: Add NULL checks after malloc in ncclShadowPoolAlloc
PR #2173: Improve diagnostic logging in ncclShadowPoolFree error path

Changes & Impact

Added NULL checks after malloc calls in ncclShadowPoolAlloc:
pool->table
table1 (hash table expansion)
page (shadow pool metadata)
obj (shadow pool objects)
Improved WARN messages in ncclShadowPoolFree to include:
Device pointer address
Hash bucket index
Total object count in the pool
These changes prevent silent crashes and make debugging memory issues easier.

Performance Impact

No significant performance impact is expected.
Changes have been tested to ensure no regressions in functionality.

Signed-off-by:
Pritika Vipin [pritikavipin@gmail.com]

…resource sharing mode support to all GDAKI APIs ## Summary - Introduce `NCCL_GIN_RESOURCE_SHARING_THREAD`. - Add resource sharing support to Get and Flush in GDAKI. - Add resource sharing thread to all GDAKI APIs. ## Implementation - Add `NCCL_GIN_RESOURCE_SHARING_THREAD` to `enum ncclGinResourceSharingMode`. - Modify Get and Flush APIs in GDAKI to support all resource sharing mode. - Add resource sharing mode thread support to Put and PutValue in GDAKI. Signed-off-by: Pak Markthub <pmarkthub@nvidia.com> Signed-off-by: Xiaofan Li <xiaofanl@nvidia.com> Mirrored-from: c622884df5b4bc3f8fee42de0368baddea93143a

worldGinBarrier sends a signal among all peers. With railed GIN, not all peers are connected, so worldGinBarrier does not work Signed-off-by: Katie Gioioso <kgioioso@nvidia.com> Mirrored-from: 40f87759058e2a86837c14085717150b1318336c

Mirrored-from: 95f7222a565c69972bb7cc65b1e5105444ec93ff

Signed-off-by: Abhilash Kolluri <akolluri@nvidia.com> Mirrored-from: 986da69af53986859ad84384631e6ad54dfd0358

Signed-off-by: Abhilash Kolluri <akolluri@nvidia.com> Mirrored-from: baae513fdbd309cd2012cbeb074ef267a02d6b30

Signed-off-by: Abhilash Kolluri <akolluri@nvidia.com> Mirrored-from: 1b9e5172ba033638fcca26c2e3fd69f459628914

Signed-off-by: Rami Benayed <rbenayed@nvidia.com> Mirrored-from: a7818c7952acf2d3222e4dcbe28efd5982617ede

Signed-off-by: Rami Benayed <rbenayed@nvidia.com> Mirrored-from: 86836e195224482b51651d556479410c6d993471

When multiple GPUs share the network devices, selecting a different starting device is key to avoid conflicts as much as possible. To enforce consistency, we use the `gpu.dev % localGpuCount` in order to avoid breaking it with the mirrorBit operation. This change affect cases with no NET sharing (localGpuCount = 1) and cases with 4 NETs to be shared by multiple GPUs. Signed-off-by: Thomas Gillis <tgillis@nvidia.com> Mirrored-from: f431b8143728d9e6d2032dbc479d83c57e57ddc5

…communicators When multiple NCCL communicators are created in the same process, the first communicator creates virtual NICs (vNICs) via ncclTopoMakeVNics and caches the vNIC count. Subsequent communicators skip vNIC creation (topo.cc:1413) but fail to mark the merged physical NICs as keep=0. This causes the second communicator's topology to contain both physical and virtual NICs, giving the ring search extra NIC paths. When two sub-communicators (e.g., DP groups in a TP+DP setup) are created after a TP communicator, they can end up with different ring topologies due to the extra NICs, leading to different reduction orders in reduce_scatter and bitwise non-deterministic results. The fix queries each existing vNIC's constituent physical devices and marks them keep=0, matching what ncclTopoMakeVnic does at line 1037. Signed-off-by: Naman Goyal <ngoyal2707@gmail.com> Mirrored-from: d7f092607f32b68d2499441c885130e785fc20f5

When a communicator is created after the fusion has already been done, NCCL must ignore the already fused NICs. The physical NICs are always marked as keep = 1, then the virtual NICs are added to the topology as keep = 1 and the fused physical NICs are removed with keep = 0. Signed-off-by: Thomas Gillis <tgillis@nvidia.com> Mirrored-from: 7309217a19f06b782861a9be6177d10523935de7

Fixes: NVIDIA#2026 Signed-off-by: Ronald Eytchison <ronald.eytchison@trailofbits.com> Mirrored-from: c1cbee3d05ab3acac075abf6f44922cc27a150b2

WSAPoll was called on the first 1+maxnpeers slots instead of reaching the listen socket at pollfds[maxProxyConnections], polling INVALID_SOCKET and causing WSAENOTSOCK (10038) during proxy initialization. Signed-off-by: Timothy Lo <tilo@nvidia.com> Mirrored-from: 27b6898359fd3a60d2ba525b134168bb7b388ea6

After creation of the vNICs for a specific network plugin, the physical NICs are marked as unused by that plugin. If the physical dev is unused by all plugin, the dev is marked as keep=0 and trimmed later. Signed-off-by: Ahsan Pervaiz <apervaiz@nvidia.com> Signed-off-by: Thomas Gillis <tgillis@nvidia.com> Mirrored-from: 7beadaadc11b0f7c35e97716363521eccb6ef6ee

Signed-off-by: Mark Santesson <msantesson@nvidia.com> Mirrored-from: ad34e21727f315056b3c32d2c5b3227c79f40578

Reduce LL kernels memory consumption by including the top-k indices in the token message payloads. On dispatch, this allows to avoid maintaining a separate buffer space per local-expert/remote-rank pair and instead having one space for each remote rank. This reduces the memory overhead from O(E x B x H) down to O(N x B x H), where E is number of experts, N - number of ranks, B - batch size, and H - token hidden dimension On combine, the top-k indices are used to reduce the communication buffer from O(E x B x H) to O(K x B x H). Signed-off-by: Artem Y. Polyakov <artemp@nvidia.com> Mirrored-from: 9abffbbc4a83968034f5c5966c44cb7b525d4d1c

Creating CUDA memory pools with the default maxSize=0 causes the driver to reserve virtual address space equal to twice the GPU's physical memory, leading to severe VA exhaustion on 48-bit systems during multi-communicator training. This commit caps the shadow pool's maxSize to a default of 1 GiB, Users can easily customize or disable this limit with a new environment variable. Signed-off-by: Serapheim Dimitropoulos <sdimitropoulos@coreweave.com> Mirrored-from: 5610a4048efcb1a0fe98746576768e52e31335a7

In MoE training loops the expert routing (topk_idx) changes every iteration but buffer sizes are determined by group-level constants. The current API forces a full destroy+create cycle, causing unnecessary cudaMalloc/cudaFree traffic on every iteration. ncclEpUpdateHandle reuses the existing handle's buffers and re-runs only the topk_idx-dependent computation (buffer resets, bitmap conversion, allgather, metadata preprocessing). ncclEpCreateHandle now delegates to it after allocation, so the first-call behavior is unchanged. Signed-off-by: Ke Wen <kwen@nvidia.com> Mirrored-from: 13dcf1fbf49d441f004d3a7d2bc1f8023996c42a

…n HT mode Signed-off-by: Subhadeep Bhattacharya <subhadeepb@nvidia.com> Mirrored-from: 2110fd0b94d180dade732404e601419cb1cfb542

Signed-off-by: Katie Gioioso <kgioioso@nvidia.com> Mirrored-from: 4f0a5727cf123c158aad0897af0eb2e84945a3a8

Move platform-specific PCI topology discovery functions out of graph/xml.cc and into the OS abstraction layer (linux.cc/windows.cc), declared in os.h Signed-off-by: Timothy Lo <tilo@nvidia.com> Mirrored-from: c95e613cf3e63b19d0a106a7886a0eb90f31518f

Introduce platform-specific code paths in ncclTopoGetXmlFromSys and ncclTopoGetXmlFromGpu to support topology detection on Windows Signed-off-by: Timothy Lo <tilo@nvidia.com> Mirrored-from: 89c7ac993392ab8d4ed42c242679ee5470e57235

Add thin wrappers in nvmlwrap.h/nvmlwrap.cc for NVML APIs used by the Windows topology discovery code paths Signed-off-by: Timothy Lo <tilo@nvidia.com> Mirrored-from: 144b16eec3cb13c1c6f7fe8521efe3b77e0c95f0

Signed-off-by: Pak Markthub <pmarkthub@nvidia.com> Mirrored-from: a622fa1430843d91c739c6258a59c8ff4edb2925

The environment plugin init function does not currently take the nccl debug logger function as other plugins do. Adding the logger function to the env plugin init function avoid the plugin implementing adhoc logging solutions. Signed-off-by: Giuseppe Congiu <gcongiu@nvidia.com> Mirrored-from: 7ce326dfaf79cae742b7f13d45a6243fd20faabb

… sizes Replace 6 individual cudaMemsetAsync calls with 2 bulk calls using the preprocessing_zero_region_size and preprocessing_s2d_size already computed during allocation. The buffers are sized to max_tokens capacity, so clearing the full region is safe and eliminates per-call size recomputation with no meaningful performance difference. Signed-off-by: Ke Wen <kwen@nvidia.com> Mirrored-from: fbdcc10487e6bf33987f220d3210786396c63cbc

Signed-off-by: Ludwig Schneider <lschneider@nvidia.com> Mirrored-from: 36a79161b86ab70e343a1efd43c423f9fd0923a3

… HT dispatch Signed-off-by: Subhadeep Bhattacharya <subhadeepb@nvidia.com> Mirrored-from: 228c3aa7405a5462c2311af3710a9cc88deed531

Remove the HOST-tag path that staged per-expert counts through a device temporary and copied them back via cudaMemcpyAsync D2H. Only the DEVICE tag is now accepted, which writes directly to the caller's device buffer with no D2H copy. The remaining D2H copy (num_tokens_for_experts_host mirror) is annotated; removing it is a separate step toward full CUDA graph compatibility. Signed-off-by: Ke Wen <kwen@nvidia.com> Mirrored-from: 25fe7df1f923d16a35b90e8d1704cd7de92d2b76

…ecvTokens The async D2H mirror of num_tokens_for_experts in ncclEpUpdateHandle was the last remaining obstacle to CUDA graph capture. Move the copy to ncclEpHandleGetNumRecvTokens (a synchronous query API that doesn't need to be graphable) and remove the host-pinned cache field entirely. Signed-off-by: Ke Wen <kwen@nvidia.com> Mirrored-from: a91d5d39bc89a7849dd8673bbf506172c06e6ce7

Indicate to plugins when they only need to connect to ranks on the same rail. Implement rankStride in the RMA IB plugin. The parameter is currently ignored in the GDAKI plugin Signed-off-by: Sylvain Jeaugey <sjeaugey@nvidia.com> Mirrored-from: 80aa5e8ad689b8bee76cfdc82d45bb2d4ab95360

## Summary Refactors NCCL internal logging so full location flows through **`ncclDebugLogInternal`** (`file`, `func`, `line`), while `ncclDebugLog` keeps the historical **`ncclDebugLogger_t`** calling convention to be ABI-safe. Adds **`ALLOC_HOST`** under **`NCCL_DEBUG_SUBSYS`** for optional verbose host-heap **`INFO`** traces. Uses **`INFO_LOC` / `INFO_LOC_FN`** (and **`TRACE_LOC` / `TRACE_LOC_FN`**) so messages that should carry **`file:line (func)`** stay uniform without repeating **`"%s:%d (%s)"`** everywhere. ## Implementation details ### Debug symbols & ABI - **`ncclDebugLogInternal(level, flags, file, func, line, fmt, …)`**: hidden/internal implementation path used by **`VERSION` / `WARN` / `INFO` / `TRACE`** macros (Linux: **`visibility("hidden")`**). - **`ncclDebugLog`**: default-visible; **`legacy`** third argument semantics (**WARN** → path/file string; **TRACE** → function name string). Implemented via shared **`ncclDebugLogV`**. ### Host allocation diagnostics (**`ALLOC_HOST`**) - Verbose host **`malloc` / `calloc` / realloc**, CUDA host alloc helpers, and related **`INFO`** lines use subsystem **`ALLOC_HOST`**; **`NCCL_DEBUG_SUBSYS`** must include it - **`ncclCallocQuiet` / `ncclReallocQuiet`** suppress **`ALLOC_HOST`** **`INFO`** on high-churn paths (e.g. RAS). - **`ncclCalloc` / `ncclRealloc`** **`unique_ptr`-style overloads** take **`bool logHostAlloc`** so **`ncclCalloc`** macros stay consistent. ### **`INFO_LOC`** helpers - **`INFO`** still passes **`nullptr`** **`file`/`func`** into **`ncclDebugLogInternal`**; messages embed location only via **`INFO_LOC`** / **`INFO_LOC_FN`** (wrappers pass **`__FILE__`/`__LINE__`/`__func__`** or explicit **`file`/`line`/`caller`** from **`alloc.h`** debug wrappers). - **`TRACE_LOC` / `TRACE_LOC_FN`** mirror **`INFO_LOC`** when **`ENABLE_TRACE`** is defined. Signed-off-by: David Addison <daddison@nvidia.com> Signed-off-by: Xiaofan Li <xiaofanl@nvidia.com> Mirrored-from: f03e1b354e71062bb456b959fc3a33d5af04786d

… (Please DO NOT squash) ## Summary * Extend the allocator interface per customer request * Address C99 issues spotted during python integration (see !2611) Signed-off-by: Artem Y. Polyakov <artemp@nvidia.com> Signed-off-by: Ke Wen <kwen@nvidia.com> Mirrored-from: 4eb8d7ec9f1e89152e30881e3ea44a4ad301a903

- prefer path with higher bandwidth over shorter hop count, - refactor ncclGetLocalCpu function to use ncclTopoGetLocal Signed-off-by: Thomas Gillis <tgillis@nvidia.com> Mirrored-from: 6a5b60617f6dfb5f098633c6500631a3e22887e5

Refactor the multirank GPU detection to not trigger when MloPart is being used in the communicator. We still silently disable NVLS when two ranks are using the same NVML dev. Signed-off-by: Thomas Gillis <tgillis@nvidia.com> Mirrored-from: 83f0005568223da5405bffe0ff7f3f65610c0432

When MloPart is enabled, NCCL will split the device into two sub DEV nodes, one for each of the uGPU. This will lead to 50% NVLink bw for each sub DEV to the NVS and a LOC link between both sub DEVs. Further, each sub-dev is able to saturate 100% of the C2C bw. Signed-off-by: Thomas Gillis <tgillis@nvidia.com> Mirrored-from: f5a945fa7680fefa20669160d0844c5868d15fe6

gdrSupport is disabled if unsupported. Signed-off-by: Thomas Gillis <tgillis@nvidia.com> Mirrored-from: feef28ea1b47202a9c4ac9a0ea837ed38b857ff1

Disable the UB registration path if the mlopart buffer doesn't have RDMA. At enqueue, if the buffer hasn't been registered, NCCL will use the internal buffering. Signed-off-by: Thomas Gillis <tgillis@nvidia.com> Mirrored-from: 3f2473766411b7ff2cadf610b95fa7773b921f24

## Summary Adds a Python device API that lets users write NCCL device-side kernels in CuTeDSL (`cutlass.cute`). Adds an example (`examples/cute/main.py`): 2-rank, 1 MiB GIN put with signal/wait. Two separate registered windows for clarity; one-step `@cute.jit` direct call with `cutlass.Int64`-annotated args; host-side element-wise validation after `device.sync()`. Signed-off-by: Xiakun Lu <xiakunl@nvidia.com> Mirrored-from: 0ad86ca37281a2acb888725146992e5c4f9ce072

Signed-off-by: Bharath Ramesh <bhramesh@nvidia.com> Mirrored-from: 3d9d627c3facb45a2c4308aafc6cde20c7f66a09

Signed-off-by: Bharath Ramesh <bhramesh@nvidia.com> Mirrored-from: 33a689942a57801bd217626d078397a750cc0d38

Pritiks23

Appreciate your review and valuable insights @justus-nv, Please see my changes.

justus-nv · 2026-05-18T16:42:30Z

Appreciate your review and valuable insights @justus-nv, Please see my changes.

Thank you for incorporating my feedback. It is in much better shape. There is one possible memory leak for page (noted in comments). Once the leak is properly handled. The change should be good to go.

Pritiks23

@justus-nv Thank you, Implemented. In the page-allocation path, I converted both cudaMallocFromPoolAsync and cudaMemsetAsync checks to CUDACHECKGOTO. I also added cleanup in the error path so if either call fails, any half-created page is properly freed. While checking this logic, I fixed the same kind of leak in the non-page path and initialized the page pointer to prevent undefined behavior if execution jumps to the fail label before page is assigned.

justus-nv

Thanks for the quick follow up. The changes look good.

There are two minor issues that need to be addressed.

All commits need sign-off (use -s git option). I checked your commit messages, but only one commit has a sign-off. The easiest solution may be to squash and rebase with a new signed-off commit.
pool->count is an int, but the formatting is %lu in updated warning log in ncclShadowPoolFree.

Once those two issues are dealt with, we should be good to move forward.

Pritiks23 · 2026-05-18T19:38:01Z

Appreciate you pointing these out. I’ve fixed the Signed-off-by handling and updated the format specifier to match pool->count. @justus-nv

justus-nv · 2026-05-18T23:48:08Z

/mirror v2.31

jskrobola · 2026-05-19T03:30:28Z

/mirror v2.31

NCCL-Bot · 2026-05-19T03:32:46Z

Mirroring to the internal repository failed.

The automated mirror did not complete. This is likely due to a conflict. Please ensure the PR is targeting the proper branch, and is rebased to include recent changes.

Casting FLAGS to unsigned long was unnecessary and was in fact hiding errors if somebody forgot to specify the debug subsystem. Also fix a compiler warning about mismatched argument types when compiling with TRACE enabled. Signed-off-by: Kamil Iskra <kiskra@nvidia.com> Mirrored-from: 99144ee3beeb73ef0f57a687b65c660a4964f0dd

Signed-off-by: Bharath Ramesh <bhramesh@nvidia.com> Mirrored-from: 57d9f7ae7a4396e5232d1472797a56aff3d8c231

…acros that was incorrectly reverted during merge Signed-off-by: Bharath Ramesh <bhramesh@nvidia.com> Mirrored-from: d048f213dea659ad4634c76aba4a1a1c8fc3dbe2

justus-nv · 2026-05-21T00:16:48Z

/mirror v2.31

justus-nv · 2026-05-21T00:23:01Z

@Pritiks23 Can you rebase your branch onto dev. This will enable us to pull your branch into our internal rep.

Signed-off-by: Pritika Vipin <65793273+Pritiks23@users.noreply.github.com>

Pritiks23 · 2026-05-21T03:50:28Z

Sure @justus-nv

#2188

pakmarkthub and others added 30 commits April 11, 2026 06:18

Merge remote-tracking branch 'origin/master' into v2.30u1

49827ef

Mirrored-from: 95f7222a565c69972bb7cc65b1e5105444ec93ff

net: Add new ibverbs API to query speed

3970dd7

Signed-off-by: Abhilash Kolluri <akolluri@nvidia.com> Mirrored-from: 986da69af53986859ad84384631e6ad54dfd0358

Use new query speed API at init to query speed

7edbd40

Signed-off-by: Abhilash Kolluri <akolluri@nvidia.com> Mirrored-from: baae513fdbd309cd2012cbeb074ef267a02d6b30

Add async speed change event

fdf50c5

Signed-off-by: Abhilash Kolluri <akolluri@nvidia.com> Mirrored-from: 1b9e5172ba033638fcca26c2e3fd69f459628914

Add example for AllGather using Copy Engine

31cea02

Signed-off-by: Rami Benayed <rbenayed@nvidia.com> Mirrored-from: a7818c7952acf2d3222e4dcbe28efd5982617ede

Fix .gitignore paths after examples moved to docs/examples

709108e

Signed-off-by: Rami Benayed <rbenayed@nvidia.com> Mirrored-from: 86836e195224482b51651d556479410c6d993471

Fix buffer overflow in parseStringList

cb04f02

Fixes: NVIDIA#2026 Signed-off-by: Ronald Eytchison <ronald.eytchison@trailofbits.com> Mirrored-from: c1cbee3d05ab3acac075abf6f44922cc27a150b2

Merge remote-tracking branch 'origin/master' into v2.30u1

a59132e

Signed-off-by: Mark Santesson <msantesson@nvidia.com> Mirrored-from: ad34e21727f315056b3c32d2c5b3227c79f40578

[NCCL EP] replace IPC handle exchange with NCCL window registrtaion i…

ef8047a

…n HT mode Signed-off-by: Subhadeep Bhattacharya <subhadeepb@nvidia.com> Mirrored-from: 2110fd0b94d180dade732404e601419cb1cfb542

[NCCL EP] use bitmask to compute dst offset in N2N

719d37b

Signed-off-by: Katie Gioioso <kgioioso@nvidia.com> Mirrored-from: 4f0a5727cf123c158aad0897af0eb2e84945a3a8

Add NVML wrapper functions required for Windows topology detection

1414424

Add thin wrappers in nvmlwrap.h/nvmlwrap.cc for NVML APIs used by the Windows topology discovery code paths Signed-off-by: Timothy Lo <tilo@nvidia.com> Mirrored-from: 144b16eec3cb13c1c6f7fe8521efe3b77e0c95f0

Bring in GPUNetIO v2.0.2-rc4

9d6a907

Signed-off-by: Pak Markthub <pmarkthub@nvidia.com> Mirrored-from: a622fa1430843d91c739c6258a59c8ff4edb2925

reverting to correct (but slower) num_recv_tokens

2679c65

Signed-off-by: Ludwig Schneider <lschneider@nvidia.com> Mirrored-from: 36a79161b86ab70e343a1efd43c423f9fd0923a3

[NCCL EP] Use pre-calculated receive token count for cudaMemCpy after…

354ab57

… HT dispatch Signed-off-by: Subhadeep Bhattacharya <subhadeepb@nvidia.com> Mirrored-from: 228c3aa7405a5462c2311af3710a9cc88deed531

sjeaugey and others added 11 commits May 14, 2026 00:28

paths: fix BFS path selection to prefer higher bandwidth

b2c225c

- prefer path with higher bandwidth over shorter hop count, - refactor ncclGetLocalCpu function to use ncclTopoGetLocal Signed-off-by: Thomas Gillis <tgillis@nvidia.com> Mirrored-from: 6a5b60617f6dfb5f098633c6500631a3e22887e5

topo: add mlopart logic to gdrSupport detection

bf65d3e

gdrSupport is disabled if unsupported. Signed-off-by: Thomas Gillis <tgillis@nvidia.com> Mirrored-from: feef28ea1b47202a9c4ac9a0ea837ed38b857ff1

Remove trailing whitespace

47c9ecc

Signed-off-by: Bharath Ramesh <bhramesh@nvidia.com> Mirrored-from: 3d9d627c3facb45a2c4308aafc6cde20c7f66a09

NCCL 2.30.5-1

d16e064

Signed-off-by: Bharath Ramesh <bhramesh@nvidia.com> Mirrored-from: 33a689942a57801bd217626d078397a750cc0d38

Pritiks23 commented May 17, 2026

View reviewed changes

Pritiks23 commented May 18, 2026

View reviewed changes

justus-nv reviewed May 18, 2026

View reviewed changes

Pritiks23 force-pushed the combine-prs branch 2 times, most recently from 83fd6ea to 6b1c0a9 Compare May 18, 2026 19:28

kiskra-nvidia and others added 3 commits May 20, 2026 17:06

Merge remote-tracking branch 'origin/master' into v2.31

3abe56b

Signed-off-by: Bharath Ramesh <bhramesh@nvidia.com> Mirrored-from: 57d9f7ae7a4396e5232d1472797a56aff3d8c231

Bring back change for removing spurious casts in the INFO and TRACE m…

c41f343

…acros that was incorrectly reverted during merge Signed-off-by: Bharath Ramesh <bhramesh@nvidia.com> Mirrored-from: d048f213dea659ad4634c76aba4a1a1c8fc3dbe2

Pritiks23 force-pushed the combine-prs branch from 6b1c0a9 to c41f343 Compare May 21, 2026 03:39

allocator: harden shadow pool error handling and diagnostics

95974c0

Signed-off-by: Pritika Vipin <65793273+Pritiks23@users.noreply.github.com>

Pritiks23 mentioned this pull request May 21, 2026

Add NULL checks after malloc and improve diagnostic logging in shadow pool functions #2188

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add NULL checks after malloc and improve diagnostic logging in shadow pool functions#2175

Add NULL checks after malloc and improve diagnostic logging in shadow pool functions#2175
Pritiks23 wants to merge 218 commits into
NVIDIA:masterfrom
Pritiks23:combine-prs

Pritiks23 commented May 13, 2026

Uh oh!

Pritiks23 left a comment

Uh oh!

justus-nv commented May 18, 2026

Uh oh!

Pritiks23 left a comment

Uh oh!

justus-nv left a comment

Uh oh!

Pritiks23 commented May 18, 2026

Uh oh!

justus-nv commented May 18, 2026

Uh oh!

jskrobola commented May 19, 2026

Uh oh!

NCCL-Bot commented May 19, 2026

Uh oh!

justus-nv commented May 21, 2026

Uh oh!

justus-nv commented May 21, 2026

Uh oh!

Pritiks23 commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

Pritiks23 commented May 13, 2026

Description

Related Issues

Changes & Impact

Performance Impact

Uh oh!

Pritiks23 left a comment

Choose a reason for hiding this comment

Uh oh!

justus-nv commented May 18, 2026

Uh oh!

Pritiks23 left a comment

Choose a reason for hiding this comment

Uh oh!

justus-nv left a comment

Choose a reason for hiding this comment

Uh oh!

Pritiks23 commented May 18, 2026

Uh oh!

justus-nv commented May 18, 2026

Uh oh!

jskrobola commented May 19, 2026

Uh oh!

NCCL-Bot commented May 19, 2026

Uh oh!

justus-nv commented May 21, 2026

Uh oh!

justus-nv commented May 21, 2026

Uh oh!

Pritiks23 commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants