fix: resolve NPU OOM with default training config by curnane-lab · Pull Request #620 · sgl-project/SpecForge

curnane-lab · 2026-06-29T10:47:22Z

Motivation

The default NPU training examples for Qwen3.5-4B DFlash use a --num-anchors value (512) that causes out-of-memory (OOM) errors on common 64GB Ascend NPU cards such as the 910B (A2 node) and 910C (A3 node). This PR lowers the default to a value that fits within the available device memory while keeping the examples runnable out of the box.

Modifications

examples/run_qwen3.5_4b_dflash_online_npu.sh
- Changed --num-anchors from 512 to 186
examples/run_qwen3.5_4b_domino_online_npu.sh
- Changed --num-anchors from 16 to 186

Both scripts now use the same --num-anchors 186 default, which avoids OOM on 64GB NPU devices.
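For reviewers who want to confirm that both launch scripts agree on the flag, a minimal sketch of such a check follows. The helper name `find_num_anchors` and the sample script text are hypothetical and are not part of this PR; only the `--num-anchors` flag and the value 186 come from the description above.

```python
import re


def find_num_anchors(script_text):
    """Return the integer passed to --num-anchors, or None if the flag is absent.

    Accepts both `--num-anchors 186` and `--num-anchors=186` spellings.
    """
    match = re.search(r"--num-anchors(?:\s+|=)(\d+)", script_text)
    return int(match.group(1)) if match else None


# Hypothetical excerpt standing in for the contents of a launch script.
sample_script = "python train.py \\\n    --num-anchors 186 \\\n    --learning-rate 1e-4\n"
num_anchors = find_num_anchors(sample_script)
```

Running the helper over the text of both example scripts and comparing the two results would catch a future edit that changes one default but not the other.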

Related Issues

N/A

Accuracy Test

Not applicable — this change only adjusts a training hyper-parameter default in example launch scripts. No model architecture or kernel code is modified.

Benchmark & Profiling

Not applicable — the change reduces memory usage for the default NPU example configuration.

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://sgl-fru7574.slack.com/archives/C09784E3EN6 to discuss your PR.

Read mask_token_id from draft_config.dflash_config before falling back to tokenizer.mask_token_id or adding a new special token. Apply the same fallback in both train_dflash.py and train_domino.py for consistency. Closes sgl-project#500
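The lookup order described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in train_dflash.py or train_domino.py: the function name `resolve_mask_token_id`, the treatment of `dflash_config` as a dict, the `<|MASK|>` token string, and the stub tokenizer are all hypothetical. The stub mimics the `add_special_tokens` method of Hugging Face tokenizers only closely enough to run the example.

```python
from types import SimpleNamespace


class StubTokenizer:
    """Hypothetical stand-in for a tokenizer; not a real library class."""

    def __init__(self, mask_token_id=None, vocab_size=100):
        self.mask_token_id = mask_token_id
        self.vocab_size = vocab_size

    def add_special_tokens(self, mapping):
        # Assign the next free id to the new mask token and grow the vocab.
        self.mask_token_id = self.vocab_size
        self.vocab_size += 1
        return 1


def resolve_mask_token_id(draft_config, tokenizer, new_token="<|MASK|>"):
    """Pick a mask token id using the three-step fallback from the commit message."""
    # 1. Prefer the value stored in draft_config.dflash_config (assumed dict-like).
    dflash_config = getattr(draft_config, "dflash_config", None) or {}
    mask_id = dflash_config.get("mask_token_id")
    if mask_id is not None:
        return mask_id
    # 2. Otherwise use the tokenizer's existing mask token, if it has one.
    mask_id = getattr(tokenizer, "mask_token_id", None)
    if mask_id is not None:
        return mask_id
    # 3. As a last resort, register a new special token and use its id.
    tokenizer.add_special_tokens({"mask_token": new_token})
    return tokenizer.mask_token_id


config_with_id = SimpleNamespace(dflash_config={"mask_token_id": 7})
config_without_id = SimpleNamespace(dflash_config={})
```

Checking `is not None` rather than truthiness matters here, since a mask token id of 0 would otherwise be skipped.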

gemini-code-assist

Code Review

This pull request updates the --num-anchors parameter to 186 in both the run_qwen3.5_4b_dflash_online_npu.sh and run_qwen3.5_4b_domino_online_npu.sh example scripts. There are no review comments, so I have no feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

mingliangfu and others added 5 commits June 26, 2026 15:56

Merge branch 'main' into domino_npu

2e7ee2a

Merge branch 'sgl-project:main' into domino_npu

9aaafe5

fix: resolve OOM with default training config

57b7226

update default domino training config

5132dd9

curnane-lab requested review from FlamingoPg, shuaills and sleepcoo as code owners June 29, 2026 10:47

gemini-code-assist Bot reviewed Jun 29, 2026

curnane-lab wants to merge 5 commits into sgl-project:main from curnane-lab:domino_npu

2 participants
