feat(dspark): add Ascend NPU support for Qwen3.5-4B DSpark training #617

Open
curnane-lab wants to merge 3 commits into sgl-project:main from curnane-lab:dspark_npu
@curnane-lab curnane-lab commented Jun 29, 2026

Summary

This PR adds Ascend NPU training support for DSpark on Qwen3.5-4B.

Note on scope: The branch currently contains two commits. The first commit (a2f18ea, "feat: DSpark trainer") is borrowed from the preceding DSpark trainer PR and provides the base DSpark implementation. This PR's own incremental change is the second commit (5776d51), which adds the NPU example script and the flex_attention -> SDPA fallback in the trainer.

What is added (incremental)

1. NPU training launcher

  • examples/run_qwen3.5_4b_dspark_online_npu.sh
    • Sets ASCEND_RT_VISIBLE_DEVICES and PYTORCH_NPU_ALLOC_CONF.
    • Uses --attention-backend sdpa and --target-model-backend hf (the HF backend always surfaces last_hidden_states, which DSpark's L1 and confidence losses require).
    • Uses HCCL via torchrun --standalone.

2. Trainer NPU fallback

  • scripts/train_dspark.py
    • Auto-detects Ascend NPU and falls back from flex_attention to sdpa when the default backend would fail on NPU.
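
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/train_dspark.py: the helper names `npu_available` and `resolve_attention_backend` are hypothetical, and the detection assumes the `torch_npu` plugin is what exposes `torch.npu` on Ascend hosts.

```python
import importlib.util


def npu_available() -> bool:
    """Best-effort Ascend detection (hypothetical helper, not from the PR)."""
    # Assumption: an Ascend environment ships the torch_npu plugin.
    if importlib.util.find_spec("torch_npu") is None:
        return False
    try:
        import torch
        import torch_npu  # noqa: F401  (importing it registers torch.npu)

        return bool(torch.npu.is_available())
    except Exception:
        # Any import or runtime failure is treated as "no NPU".
        return False


def resolve_attention_backend(requested: str, on_npu: bool) -> str:
    """Swap flex_attention for sdpa on NPU; leave every other choice alone."""
    if on_npu and requested == "flex_attention":
        return "sdpa"
    return requested


if __name__ == "__main__":
    backend = resolve_attention_backend("flex_attention", npu_available())
    print(f"attention backend: {backend}")
```

Keeping the decision in a pure function that takes `on_npu` as an argument makes the fallback testable on machines without Ascend hardware.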

DSpark background (for context)

DSpark = SpecForge's DFlash block-diffusion drafter + EAGLE-style Markov & confidence heads, trained with:

  • Cross-entropy against ground-truth next tokens.
  • L1 distribution distillation using the target model's final hidden state.
  • Confidence-head BCE against the empirical per-token accept rate.
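
For readers new to the objective, the three terms above can be illustrated on a single token position in plain Python. This is a toy sketch under stated assumptions, not the SpecForge implementation: the function names and the unit weights are hypothetical, and it assumes the target distribution is available as logits (for example, obtained from the target model's final hidden state).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(draft_logits, gold_idx):
    """CE of the draft distribution against the ground-truth next token."""
    return -math.log(softmax(draft_logits)[gold_idx])


def l1_distill(draft_logits, target_logits):
    """L1 distance between the draft and target token distributions."""
    p = softmax(draft_logits)
    q = softmax(target_logits)
    return sum(abs(a - b) for a, b in zip(p, q))


def confidence_bce(pred_prob, accept_rate, eps=1e-7):
    """BCE of the confidence head against a soft accept-rate label in [0, 1]."""
    p = min(max(pred_prob, eps), 1.0 - eps)
    return -(accept_rate * math.log(p) + (1.0 - accept_rate) * math.log(1.0 - p))


def combined_loss(draft_logits, target_logits, gold_idx, pred_prob, accept_rate,
                  w_ce=1.0, w_l1=1.0, w_conf=1.0):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return (w_ce * cross_entropy(draft_logits, gold_idx)
            + w_l1 * l1_distill(draft_logits, target_logits)
            + w_conf * confidence_bce(pred_prob, accept_rate))


if __name__ == "__main__":
    print(combined_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8], 0, 0.8, 0.75))
```

The L1 term vanishes when the draft and target distributions match, which is why it needs the target model's output at every position.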

The base trainer implementation is in the preceding commit (a2f18ea). This PR only layers the NPU enablement on top.

Usage

export TARGET_MODEL_PATH=/path/to/Qwen3.5-4B
export TRAIN_DATA_PATH=/path/to/train.jsonl
bash examples/run_qwen3.5_4b_dspark_online_npu.sh 0,1,2,3,4,5,6,7
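
As a rough picture of what the launcher does with the device list, the sketch below builds the environment and the torchrun command in Python. It is an assumption-laden illustration, not a transcription of the shell script: `build_launch` is a hypothetical helper, the one-process-per-device mapping is assumed, and the model path, data path, and PYTORCH_NPU_ALLOC_CONF value are omitted because the description does not state how they are passed.

```python
def build_launch(devices: str, script: str = "scripts/train_dspark.py"):
    """Return (env overrides, argv) for a single-node torchrun launch."""
    ids = [d.strip() for d in devices.split(",") if d.strip()]
    if not ids:
        raise ValueError("expected a comma-separated device list such as '0,1'")

    # Restrict the Ascend runtime to the requested devices.
    env = {"ASCEND_RT_VISIBLE_DEVICES": ",".join(ids)}

    # Assumption: one worker process per visible device.
    argv = [
        "torchrun",
        "--standalone",
        f"--nproc_per_node={len(ids)}",
        script,
        "--attention-backend", "sdpa",
        "--target-model-backend", "hf",
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = build_launch("0,1,2,3,4,5,6,7")
    print(env)
    print(" ".join(argv))
```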

maocheng23 and others added 2 commits June 29, 2026 10:41
…ion)

Port of TorchSpec PR sgl-project#129 to SpecForge. Adds:
- specforge/modeling/draft/dspark.py: DSparkConfig, VanillaMarkov,
  AcceptRatePredictor, DSparkDraftModel (subclass of DFlashDraftModel)
- specforge/core/dspark.py: OnlineDSparkModel (subclass of OnlineDFlashModel)
  with Markov-biased logits + CE + L1 distribution distillation + confidence BCE
  and a pooled global-mean loss
- scripts/train_dspark.py: training driver (clone of train_dflash.py)
- configs/qwen3-8b-dspark.json, examples/run_qwen3_8b_dspark_online.sh
- last_hidden_states surfaced from the DFlash target backends (HF + sglang)
- tests/test_utils/test_dspark.py: 11 CPU unit tests

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@curnane-lab curnane-lab changed the title from "Dspark npu" to "add npu support for Dspark" Jun 29, 2026
@curnane-lab curnane-lab changed the title from "add npu support for Dspark" to "feat(dspark): add Ascend NPU support for Qwen3.5-4B DSpark training" Jun 29, 2026
@curnane-lab curnane-lab force-pushed the dspark_npu branch 3 times, most recently from f6ed937 to d72ded8 Compare June 29, 2026 08:13