fix: resolve NPU OOM with default training config #620
Open
curnane-lab wants to merge 5 commits into
Read mask_token_id from draft_config.dflash_config before falling back to tokenizer.mask_token_id or adding a new special token. Apply the same fallback in both train_dflash.py and train_domino.py for consistency. Closes sgl-project#500
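The fallback order described in this commit message could look roughly like the sketch below. It is not the code from train_dflash.py or train_domino.py: the helper name `resolve_mask_token_id`, the `"<|MASK|>"` literal, and the assumption that `dflash_config` is either a dict or an attribute-style object are all hypothetical. The tokenizer is assumed to follow the Hugging Face interface (`mask_token_id`, `add_special_tokens`).

```python
# Hypothetical placeholder; the real scripts may use a different token string.
MASK_TOKEN = "<|MASK|>"


def resolve_mask_token_id(draft_config, tokenizer, mask_token=MASK_TOKEN):
    """Return a mask token id using the three-step fallback from the commit message."""
    # 1. Prefer an explicit value in draft_config.dflash_config.
    dflash_config = getattr(draft_config, "dflash_config", None) or {}
    if isinstance(dflash_config, dict):
        mask_token_id = dflash_config.get("mask_token_id")
    else:
        mask_token_id = getattr(dflash_config, "mask_token_id", None)
    if mask_token_id is not None:
        return int(mask_token_id)

    # 2. Otherwise use the tokenizer's own mask token, if it defines one.
    if getattr(tokenizer, "mask_token_id", None) is not None:
        return int(tokenizer.mask_token_id)

    # 3. As a last resort, register a new special token and use its id.
    tokenizer.add_special_tokens({"mask_token": mask_token})
    return int(tokenizer.mask_token_id)
```

Keeping the lookup in one helper is one way to apply identical behavior in both training scripts. If step 3 is reached, the vocabulary grows by one token, so the model's embedding table may also need resizing; whether the PR handles that is not stated here.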
Contributor
Code Review
This pull request updates the --num-anchors parameter to 186 in both the run_qwen3.5_4b_dflash_online_npu.sh and run_qwen3.5_4b_domino_online_npu.sh example scripts. There are no review comments, so I have no feedback to provide.
Motivation
The default NPU training examples for Qwen3.5-4B DFlash use --num-anchors values (512) that cause out-of-memory errors on common 64GB Ascend NPU cards such as 910B (A2 node) and 910C (A3 node). This PR lowers the default to a value that fits within the available device memory while keeping the examples runnable out-of-the-box.
Modifications
- examples/run_qwen3.5_4b_dflash_online_npu.sh: --num-anchors from 512 to 186
- examples/run_qwen3.5_4b_domino_online_npu.sh: --num-anchors from 16 to 186
Both scripts now use the same --num-anchors 186 default, which avoids OOM on 64GB NPU devices.
Related Issues
N/A
Accuracy Test
Not applicable — this change only adjusts a training hyper-parameter default in example launch scripts. No model architecture or kernel code is modified.
Benchmark & Profiling
Not applicable — the change reduces memory usage for the default NPU example configuration.
Checklist