[Question]: bootstrap persistence & multi-module coexistence #68

@Enigmatisms

Question

NVSHMEM 3.x | MoE training (DeepSeek DeepEP + custom modules in one process)

Background

We're building a training framework where multiple independent modules (e.g., DeepEP for MoE all-to-all, a custom overlap module) each call nvshmemx_init_attr(NVSHMEMX_INIT_WITH_UNIQUEID, ...) with potentially different rank/nranks. We ran into some issues and did a source audit — hoping to confirm our understanding and get advice.

Q1: Is it possible to change nranks via finalize + re-init?

From reading the source, it appears that nvshmem_finalize() resets the initialized flag but not the bootstrapped flag. So after finalize + re-init, the new rank/nranks/uid parameters are silently ignored — bootstrap is skipped entirely and the old boot_handle (pg_rank, pg_size, node topology, etc.) is reused.
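To make the suspected behavior concrete, here is a toy model of what we think happens. It is not NVSHMEM source code: `ToyRuntime` and its members are hypothetical names that only mirror the two flags and the `pg_rank`/`pg_size` fields mentioned above.

```cpp
#include <cassert>

// Toy model of the behavior we suspect; NOT taken from NVSHMEM.
// ToyRuntime, init() and finalize() are hypothetical names.
struct ToyRuntime {
    bool bootstrapped = false;  // set on first init, never cleared
    bool initialized = false;   // cleared by finalize()
    int pg_rank = -1;
    int pg_size = -1;

    void init(int rank, int nranks) {
        assert(rank >= 0 && rank < nranks);
        if (!bootstrapped) {
            // First init: bootstrap runs and records the process group.
            pg_rank = rank;
            pg_size = nranks;
            bootstrapped = true;
        }
        // Any later init skips bootstrap, so rank/nranks are ignored.
        initialized = true;
    }

    void finalize() {
        initialized = false;  // bootstrapped is left untouched
    }
};
```

In this model, `init(3, 8)`, then `finalize()`, then `init(3, 32)` leaves `pg_size` at 8, which matches what we observe.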

Is this intended? Is there any supported way to fully reset bootstrap state within a process so that a subsequent init can join a different-sized communication world?

Q2: Multiple modules calling init with different nranks — is this safe?

For example:

  • Module A inits with nranks=8 (intra-node only)
  • Module B inits with nranks=32 (all expert-parallel ranks)

Since NVSHMEM is a process-global singleton, the second init just bumps the refcount and its parameters are discarded. Module B ends up in Module A's 8-PE world without knowing it. This seems fundamentally unsafe: wrong PE numbering, nvshmem_malloc blocking on a global barrier that the mismatched PE set can't satisfy, etc.
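As a stopgap, each module could at least detect the mismatch instead of running silently in the wrong world. The sketch below is a hypothetical helper (`check_world` is our own name, not an NVSHMEM API). We assume the `actual_pe` and `actual_npes` arguments would be filled from `nvshmem_my_pe()` and `nvshmem_n_pes()` right after the module's init call; they are plain integers here so the logic stands alone.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of NVSHMEM. Returns an empty string when the
// world a module asked for matches the world it actually got, otherwise a
// diagnostic message. In real code, actual_pe / actual_npes would come from
// nvshmem_my_pe() and nvshmem_n_pes().
std::string check_world(int want_rank, int want_nranks,
                        int actual_pe, int actual_npes) {
    assert(want_nranks > 0 && actual_npes > 0);
    if (want_nranks != actual_npes) {
        return "world size mismatch: wanted " + std::to_string(want_nranks) +
               " PEs, got " + std::to_string(actual_npes);
    }
    if (want_rank != actual_pe) {
        return "rank mismatch: wanted PE " + std::to_string(want_rank) +
               ", got PE " + std::to_string(actual_pe);
    }
    return "";
}
```

For the example above, a Module B process that asked for 32 PEs but finds 8 would get a non-empty message and could abort early rather than hang later in a collective.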

What's the recommended practice here? Our best guess is:

  1. Coordinate a single init with the superset of all PEs
  2. Use nvshmem_team_split_strided() for per-module sub-groups
  3. Use team-based collectives + nvshmem_team_translate_pe() for RMA
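To check our understanding of steps 2 and 3, here is the PE arithmetic we would expect from a strided split, written as plain C++ so it runs without NVSHMEM. `StridedTeam`, `team_to_world` and `world_to_team` are hypothetical names. The mapping (member i is parent PE `start + i * stride`, and -1 for non-members) is our reading of how `nvshmem_team_split_strided()` and `nvshmem_team_translate_pe()` behave when the parent is the world team.

```cpp
#include <cassert>

// Hypothetical description of a strided sub-team of the world team:
// members are world PEs start, start + stride, ..., start + (size-1)*stride.
struct StridedTeam {
    int start;
    int stride;
    int size;
};

// Team-local PE index -> world PE number.
int team_to_world(const StridedTeam& t, int team_pe) {
    assert(t.stride > 0);
    assert(team_pe >= 0 && team_pe < t.size);
    return t.start + team_pe * t.stride;
}

// World PE number -> team-local PE index, or -1 if the PE is not a member.
int world_to_team(const StridedTeam& t, int world_pe) {
    assert(t.stride > 0);
    const int offset = world_pe - t.start;
    if (offset < 0 || offset % t.stride != 0) {
        return -1;
    }
    const int idx = offset / t.stride;
    return idx < t.size ? idx : -1;
}
```

With a single 32-PE init and 8 PEs per node, Module A on the second node would use `StridedTeam{8, 1, 8}` and Module B would use `StridedTeam{0, 1, 32}`. Module A's team-local PE 3 is then world PE 11, which is the number we assume RMA calls would need.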

Is this the right approach? Any guidance on coordinating NVSHMEM across independently-developed modules would be very helpful.

Context

This came up while integrating DeepEP (which has its own init/finalize lifecycle) into PaddlePaddle alongside other NVSHMEM-based modules. Thanks in advance!
