perf: Optimize SpanTensor memory views using Pybind11 Buffer Protocol (zero-copy) #1549

Open
shivansh023023 wants to merge 1 commit into google-deepmind:master from shivansh023023:optimize-spantensor-zerocopy

shivansh023023 commented May 30, 2026

Contributor

Pull Request: perf: Optimize SpanTensor memory views using Pybind11 Buffer Protocol (zero-copy)

Description

This PR addresses a significant memory bottleneck in the C++ to Python bridge by replacing the std::vector deep-copy mechanism in the SpanTensor Pybind11 bindings with a zero-copy buffer protocol implementation.
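To illustrate the difference between the old deep copy and a buffer-protocol view, here is a minimal pure-Python analogy. It is only a sketch: `array.array` stands in for the C++-owned observation buffer and `memoryview` stands in for the NumPy view, and none of the names below come from the OpenSpiel bindings.

```python
from array import array

# Hypothetical stand-in for the C++-owned observation buffer.
buffer = array("f", [0.0, 1.0, 2.0, 3.0])

# Analogy for the old path: a deep copy allocates fresh storage on every call.
copied = buffer.tolist()

# Analogy for the new path: a buffer-protocol view shares the same storage.
view = memoryview(buffer)

# Mutate the underlying buffer, as the C++ side would on the next step.
buffer[0] = 42.0

print(copied[0])           # 0.0: the copy is a stale snapshot
print(view[0])             # 42.0: the view reads the live memory
print(view.obj is buffer)  # True: the view refers to the original storage
```

The copy costs an allocation proportional to the tensor size on each call, while the view costs only a small fixed-size wrapper object.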

Key Architectural Details

  • Zero-Copy Memory View: Eliminated the expensive std::vector<float> allocation by exposing the raw C++ memory directly to Python using py::array_t<float>.
  • C-Contiguous Strides: The binding now dynamically reconstructs C-contiguous strides based on the tensor's shape, ensuring seamless compatibility with downstream libraries like NumPy and PyTorch.
  • Lifecycle Management: Because SpanTensor is a non-owning observer of memory rather than its owner, the array is given a dummy py::capsule base with an empty lambda destructor [](void*){}. This hands the raw pointer to Python without transferring ownership, so NumPy neither copies the buffer nor tries to free it. The underlying C++ storage must stay alive for as long as the returned array is in use.
  • Performance Impact: Benchmarks in observation_test.py demonstrate an ~82% reduction in memory access overhead during state generation, dropping iteration time from ~15.8 ms down to ~2.7 ms. This translates directly to higher FPS during vectorized RL rollouts.
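The stride reconstruction mentioned in the list above can be sketched as follows. This is an illustrative pure-Python version, not the code in this PR: `c_contiguous_strides` is a hypothetical helper, and the result is cross-checked against the strides that Python's own `memoryview` reports for the same shape.

```python
from array import array


def c_contiguous_strides(shape, itemsize):
    """Return byte strides for a row-major (C-contiguous) layout."""
    strides = [0] * len(shape)
    step = itemsize
    # Walk from the last axis to the first: the last axis is densest.
    for axis in range(len(shape) - 1, -1, -1):
        strides[axis] = step
        step *= shape[axis]
    return tuple(strides)


shape = (2, 3, 4)
flat = array("f", range(24))  # 24 floats, enough for a 2x3x4 tensor
strides = c_contiguous_strides(shape, flat.itemsize)

# Cross-check against the buffer protocol: cast to bytes, then to a shaped view.
reference = memoryview(flat).cast("B").cast("f", shape=shape)
print(strides == reference.strides)  # True
print(reference.c_contiguous)        # True
```

The same shape and byte strides are what a consumer such as NumPy needs in order to index the raw pointer without copying.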

shivansh023023 changed the title from "Optimize SpanTensor memory views using Pybind11 Buffer Protocol (zero…" to "perf: Optimize SpanTensor memory views using Pybind11 Buffer Protocol (zero-copy)" on May 30, 2026

lanctot commented May 30, 2026

Collaborator

Thanks!

Does this resolve #1068?


lanctot commented May 30, 2026

Collaborator

Also, looking at the code that got deleted: we are using pybind11_abseil now, so is SpanTensor related to absl::Span?

@shivansh023023

Contributor Author

> Thanks!
>
> Does this resolve #1068?

Hi! To answer your question: this PR significantly mitigates the performance penalty of the behavior described in #1068, but it doesn't resolve the root architectural quirk.

The author of #1068 noticed that state.observation_tensor() for Python games instantiates a new dummy state and calls set_from() multiple times. While this PR doesn't rewrite that instantiation logic, it does make the final step, passing that observation tensor back to Python (e.g., to R-NAD), completely zero-copy.

So, while R-NAD will unfortunately still do that redundant state instantiation under the hood, it will no longer suffer the massive std::vector memory allocation and garbage collection hit on every single step. Fixing #1068 entirely would require a separate PR targeting the default Observer fallback logic in the Python/C++ bridge, but this PR will definitely speed up those R-NAD rollouts in the meantime!

@shivansh023023

Contributor Author

> Also, looking at the code that got deleted: we are using pybind11_abseil now, so is SpanTensor related to absl::Span?

Yes, SpanTensor is intrinsically related as it wraps contiguous memory that often interacts via absl::Span.

The old code contained a comment mentioning a Pybind11 restriction, which forced the developers to settle for a deep copy into a std::vector. You are totally right that the new pybind11_abseil integration natively supports casting absl::Span to Python. However, because absl::Span is a non-owning view, standard Pybind11 type-casters typically default to deep-copying the underlying memory when passing it to Python to prevent dangling pointers.

By bypassing the default cast and explicitly returning a py::array_t combined with a dummy py::capsule (with an empty lambda destructor [](void*){}), this implementation forces Python to treat the memory as a true zero-copy NumPy view. It prevents Pybind11 from taking ownership or allocating new memory, which is what gives us the ~82% performance boost during heavy rollouts: the per-step allocation, and the garbage-collection work that followed it, no longer happen.
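One consequence of a non-owning view is worth spelling out. The pure-Python sketch below (an analogy only, with hypothetical names) shows that Python's built-in buffer protocol pins the exporting object while a view exists. A capsule with an empty destructor gives no comparable guarantee, so under that approach it is the C++ side's job to keep the storage alive and unmoved while the array is in use.

```python
from array import array

# Hypothetical stand-in for the storage that a span observes.
storage = array("f", [1.0, 2.0, 3.0])
view = memoryview(storage)

try:
    # Appending could reallocate the storage and leave the view dangling,
    # so Python refuses while the view is exported.
    storage.append(4.0)
    blocked = False
except BufferError:
    blocked = True

print(blocked)  # True

# Once the view is released, the owner may resize freely again.
view.release()
storage.append(4.0)
print(len(storage))  # 4
```

In the binding itself there is no such runtime check, so holding on to the returned array after the observed state has been destroyed or mutated would read stale or invalid memory.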

@shivansh023023

Contributor Author

@lanctot Also, sir, is there any project or list of projects that you and your team are planning to work on?


lanctot commented Jun 4, 2026

Collaborator

> @lanctot Also, sir, is there any project or list of projects that you and your team are planning to work on?

We are mostly busy working on internal research projects at the moment. The big development we're doing on OpenSpiel is related to the 2.0 release, which I hope can finally happen mid-June. We have a few surprises in store and I'm quite excited by it, and there's been tremendous effort since 1.6, so the announcement will be impressive.

That said, it would be a good time to revise our call for contributions page to make it more modern because it's woefully out of date. One of the major things we'll be announcing is more complex & flexible states, observations, and actions (via "structs" that are interchangeable with JSON). We've implemented these for a few of the core games but it'd be great to get them properly supported across the other games. That will be a major point in our new call for contributions. If you're curious, you can take a look at TicTacToe and Connect Four and their respective tests, which already have them.

But -- like I said -- there is at least one more very cool surprise we've had working internally for some time that I got working externally last weekend. I don't want to spoil the surprise, so please stay tuned! (we're hoping to release a mini blog post which will include a video demo of some of the new features).

@shivansh023023

Contributor Author

Hi @lanctot

Thanks for the detailed update! OpenSpiel 2.0 sounds like it’s going to be a massive release, and I’m definitely looking forward to the blog post and the surprise feature.

I completely understand that the team is primarily focused on internal research right now. In the meantime, I'll take a close look at the Tic-Tac-Toe and Connect Four tests to understand the new JSON-interchangeable structs. I'd be happy to help port that architecture to other games to get a head start on the new call for contributions.

On a related note, diving into the core C++ and Python infrastructure here has been an amazing experience. As a second-year CS student deeply interested in high-performance systems and AI, I would absolutely love to get more involved. Does your team ever bring on interns, freelance contractors, or dedicated open-source collaborators for these types of infrastructure and optimization projects?

I’d be thrilled to explore any formal or informal opportunities to collaborate more closely with the team. Thanks again for your time and guidance on the recent PRs!
