[Question] torch_memory_saver raises "only hook_mode=preload supports" error #2018

@qq1243196045

Your Question

In slime's build_conda.sh:

TMS_CUDA_MAJOR="${TMS_CUDA_MAJOR:-$(python -c 'import torch; print(torch.version.cuda.split(".")[0])')}"
export TMS_CUDA_MAJOR
# --no-build-isolation: TMS's setup.py needs to find nvcc + headers + the
# installed torch to build its cu${TMS_CUDA_MAJOR} native hook; pip's default
# PEP 517 build venv hides them, so the wheel comes out python-only (~46KB)
# and sglang trips `Only hook_mode=preload supports pauseable CUDA Graph`
# because the preload .so was never compiled in.
pip install -v git+https://github.com/fzyzcjy/torch_memory_saver.git@a193d9dd1b877d33c64a41cfb3db9f867df2d926 \
  --no-cache-dir --force-reinstall --no-build-isolation
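
One way to check whether the install described in the comment above actually produced a native hook library is to list the compiled files recorded for the installed package. This is only a minimal sketch: the distribution name `torch_memory_saver` and the suffix list are assumptions, and `native_files` is a hypothetical helper, not part of slime or torch_memory_saver.

```python
from importlib import metadata

# Assumed suffixes for compiled extension files; Linux builds normally use ".so".
NATIVE_SUFFIXES = (".so", ".pyd", ".dylib")


def native_files(dist_name: str) -> list[str]:
    """Return the compiled files recorded for an installed distribution."""
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return []
    # metadata.files() returns None when the installer recorded no file list.
    return [str(f) for f in (files or []) if str(f).endswith(NATIVE_SUFFIXES)]


if __name__ == "__main__":
    # "torch_memory_saver" is the assumed distribution name.
    found = native_files("torch_memory_saver")
    print("\n".join(found) if found else "no native hook libraries recorded")
```

If nothing is listed, that would be consistent with the python-only wheel the comment describes; if libraries are listed, the build step is probably not the cause.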

In principle, a torch_memory_saver build compiled against the existing torch environment should support torch mode, but when slime actually runs, torch_memory_saver still fails this assertion:

    def cuda_graph(self, cuda_graph, pool, stream, capture_error_mode, tag: str, enable_cpu_backup: bool):
        assert self._hook_mode == "preload", "Only hook_mode=preload supports pauseable CUDA Graph currently"
        with torch.cuda.graph(cuda_graph, pool=pool, stream=stream, capture_error_mode=capture_error_mode):
            with self._with_region_config(tag=tag, enable_cpu_backup=enable_cpu_backup):
                yield

The code in question is at:
https://github.com/fzyzcjy/torch_memory_saver/blob/a193d9dd1b877d33c64a41cfb3db9f867df2d926/torch_memory_saver/entrypoint.py#L121-L125
How is this supposed to be worked around? Does it require patching the torch_memory_saver source?
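
For reference, the guard quoted above can be isolated in a small stand-in. This sketch only models the assertion; `SaverSketch` is a hypothetical class, not the real torch_memory_saver one, and the real method also wraps `torch.cuda.graph`, which is omitted here. Judging from the quoted snippet alone, the check reads the `_hook_mode` value held by the instance at call time, so an instance in torch mode would presumably trip it however the package was built.

```python
from contextlib import contextmanager


class SaverSketch:
    """Hypothetical stand-in that models only the hook_mode guard."""

    def __init__(self, hook_mode: str):
        self._hook_mode = hook_mode

    @contextmanager
    def cuda_graph(self):
        # Same condition and message as the assertion quoted from entrypoint.py.
        assert self._hook_mode == "preload", (
            "Only hook_mode=preload supports pauseable CUDA Graph currently"
        )
        yield


def enters_cleanly(hook_mode: str) -> bool:
    """Report whether the guarded context can be entered in the given mode."""
    try:
        with SaverSketch(hook_mode).cuda_graph():
            return True
    except AssertionError:
        return False
```

In this sketch `enters_cleanly("preload")` succeeds and `enters_cleanly("torch")` does not, which mirrors the behaviour reported above.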

What I've Tried

I tried running run-qwen3-4B.sh, but hit the torch_memory_saver error described above.

Environment (if relevant)

  • slime version: 0.3.0
  • Python version: 3.12
  • PyTorch version: 2.8
  • CUDA/ROCm version: 11.6
  • GPU type and count: 8
  • OS: ubuntu

Additional Context

No response
