Port Clifford Ocean environment to 4.0 by y-richie-y · Pull Request #581 · PufferAI/PufferLib

y-richie-y · 2026-06-06T06:15:24Z

This PR adds a native Clifford synthesis env to pufferlib.ocean, plus correctness tests that check the native behavior against the Clifford transformations expected on the Python side. It is a cleaned-up version of #506 for 4.0.

Clifford synthesis is a task in quantum computing: given a target Clifford tableau, find a sequence of Clifford gates that implements it.

This can be framed as a reinforcement learning problem where the agent learns to transform a tableau to the identity tableau.

         Initial tableau                          Identity tableau
        x0 x1 x2 | z0 z1 z2                     x0 x1 x2 | z0 z1 z2
      +-------------------+                    +-------------------+
  r0  | 1  0  0 | 0  0  1 |                    | 1  0  0 | 0  0  0 |
  r1  | 0  0  0 | 0  1  0 |                    | 0  1  0 | 0  0  0 |
  r2  | 0  0  1 | 1  0  1 |  --CZ(0,2),        | 0  0  1 | 0  0  0 |
      |---------+---------|      S(2), H(1)->  |---------+---------|
  r3  | 0  0  0 | 1  0  0 |                    | 0  0  0 | 1  0  0 |
  r4  | 0  1  0 | 0  0  0 |                    | 0  0  0 | 0  1  0 |
  r5  | 0  0  0 | 0  0  1 |                    | 0  0  0 | 0  0  1 |
      +-------------------+                    +-------------------+
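The diagram above can be reproduced in a few lines of NumPy. This is only an illustrative sketch, not the native implementation in this PR: the helper names (apply_h, apply_s, apply_cz) are hypothetical, and the column-update convention (each row is a Pauli string [x | z], gates act on columns, phases ignored) is an assumption chosen to match the diagram.

```python
import numpy as np

# Each row is a Pauli string [x0..x(n-1) | z0..z(n-1)] over GF(2); signs are ignored.
# Helper names and the column-update convention are assumptions for illustration.

def apply_h(tab, q):
    """H on qubit q swaps X and Z, i.e. swaps the x_q and z_q columns."""
    n = tab.shape[1] // 2
    tab[:, [q, n + q]] = tab[:, [n + q, q]]

def apply_s(tab, q):
    """S on qubit q maps X -> Y (= XZ up to phase), so z_q ^= x_q."""
    n = tab.shape[1] // 2
    tab[:, n + q] ^= tab[:, q]

def apply_cz(tab, a, b):
    """CZ(a, b) maps X_a -> X_a Z_b and X_b -> Z_a X_b."""
    n = tab.shape[1] // 2
    tab[:, n + a] ^= tab[:, b]
    tab[:, n + b] ^= tab[:, a]

# Initial tableau from the diagram (rows r0..r5).
initial = np.array([
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
], dtype=np.uint8)

tab = initial.copy()
apply_cz(tab, 0, 2)
apply_s(tab, 2)
apply_h(tab, 1)

# True when the three gates have reduced the tableau to the identity.
solved = np.array_equal(tab, np.eye(6, dtype=np.uint8))
```

Under these update rules, CZ(0,2), S(2), H(1) take the initial tableau to the identity, which is the terminal state the agent is rewarded for reaching.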

For related RL-based Clifford synthesis work, see:

Yeung, Kissinger, and Cornish, Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis (arXiv:2605.10910, 2026): https://arxiv.org/abs/2605.10910
Kremer et al., Practical and efficient quantum circuit synthesis and transpiling with Reinforcement Learning (arXiv:2405.13196, 2024): https://arxiv.org/abs/2405.13196

Environment

The env models synthesis over binary symplectic matrices:

observation: flattened 2n x 2n binary residual matrix
action space: H, S, V, HS, HV on each qubit when shortcut gates are enabled, plus CZ(i, j) for each unordered qubit pair. Without shortcut gates, the single-qubit action set is H, S.
reward: -single_qubit_cost for single-qubit gates, -cz_cost for CZ, optional goal_bonus on reaching identity, and failure_penalty on max-step truncation
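To make the sizes concrete, here is a small sketch that counts actions and observation entries from the description above. The function names are hypothetical and this does not reflect how the native env orders or indexes its actions; it only does the arithmetic.

```python
def action_count(n_qubits, shortcut_gates=True):
    """Number of discrete actions implied by the description above.

    With shortcut gates: 5 single-qubit gates (H, S, V, HS, HV) per qubit.
    Without: 2 single-qubit gates (H, S) per qubit.
    Plus one CZ per unordered qubit pair in both cases.
    """
    single = (5 if shortcut_gates else 2) * n_qubits
    pairs = n_qubits * (n_qubits - 1) // 2
    return single + pairs

def observation_size(n_qubits):
    """Length of the flattened 2n x 2n binary residual matrix."""
    return (2 * n_qubits) ** 2

sizes_n6 = (action_count(6), action_count(6, shortcut_gates=False), observation_size(6))
```

For n_qubits=6 this gives 45 actions with shortcut gates, 27 without, and a 144-entry observation.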

Difficulty

difficulty sets the scramble depth, i.e. the number of random-walk steps used to generate the reset state. It is part of the env config because curriculum learning is useful for this task, and scramble depth is the practical knob to adjust during training and evaluation. Fractional difficulties are supported by sampling between the adjacent integer difficulty levels.

For example, difficulty=3.25 resets each episode from either a 3-step or 4-step random walk, using the fractional part as the probability of sampling the higher difficulty. This gives the curriculum a smoother progression than jumping only between integer scramble depths.
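The sampling rule described above can be sketched as follows. The function name is hypothetical and the native env may implement this differently; the sketch only shows the rule "use the fractional part as the probability of the higher depth".

```python
import math
import random

def sample_scramble_depth(difficulty, rng):
    """Pick an integer scramble depth for one episode (hypothetical helper).

    The fractional part of `difficulty` is the probability of using
    floor(difficulty) + 1 instead of floor(difficulty).
    """
    base = math.floor(difficulty)
    frac = difficulty - base
    return base + (1 if rng.random() < frac else 0)

rng = random.Random(0)
depths = [sample_scramble_depth(3.25, rng) for _ in range(10_000)]
share_of_depth_4 = depths.count(4) / len(depths)  # close to 0.25 in expectation
```

The expected depth equals the difficulty value itself (0.75 * 3 + 0.25 * 4 = 3.25), so a curriculum can raise difficulty in small increments and the average scramble depth follows linearly.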

Curriculum Training

This PR also adds a curriculum training script with two profiles. Problems with N_QUBITS <= 3 are easy and can be solved with the fast curriculum. For N_QUBITS >= 4, training is less stable and uses the steady curriculum.

For 3 qubits:

N_QUBITS=3 scripts/train_clifford_curriculum.sh

Performance

Reset states are produced by random walks, so higher difficulty means more work per reset.
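As a rough illustration of why reset cost grows with difficulty, here is a sketch of a random-walk reset. It is an assumption-laden stand-in for the native code: the function name is hypothetical, only H, S and CZ are used, gates are drawn uniformly by gate kind, and the column-update convention is the usual one for phase-free tableaus.

```python
import numpy as np

def random_walk_reset(n, depth, rng):
    """Scramble the identity tableau with `depth` random gates (hypothetical helper).

    Work is linear in `depth`, which is why high difficulty makes resets expensive.
    Requires n >= 2 so that a CZ pair can be drawn.
    """
    tab = np.eye(2 * n, dtype=np.uint8)
    for _ in range(depth):
        kind = rng.integers(3)
        if kind == 0:  # H: swap x_q and z_q columns
            q = rng.integers(n)
            tab[:, [q, n + q]] = tab[:, [n + q, q]]
        elif kind == 1:  # S: z_q ^= x_q
            q = rng.integers(n)
            tab[:, n + q] ^= tab[:, q]
        else:  # CZ on a random unordered pair
            a, b = rng.choice(n, size=2, replace=False)
            tab[:, n + a] ^= tab[:, b]
            tab[:, n + b] ^= tab[:, a]
    return tab

def is_symplectic(tab):
    """Check T @ Omega @ T.T == Omega (mod 2), with Omega = [[0, I], [I, 0]]."""
    n = tab.shape[0] // 2
    eye, zero = np.eye(n, dtype=int), np.zeros((n, n), dtype=int)
    omega = np.block([[zero, eye], [eye, zero]])
    t = tab.astype(int)
    return np.array_equal((t @ omega @ t.T) % 2, omega)

scrambled = random_walk_reset(3, 10, np.random.default_rng(0))
```

Every state reached this way remains a valid binary symplectic matrix, and depth 0 returns the identity, which is consistent with the difficulty=0 benchmark below having essentially no reset cost.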

Measured on my machine with n_qubits=6, shortcut gates enabled, and the current native CPU path:

rollout SPS at num_envs=128, difficulty=0, max_steps=200: median 6.44M SPS over 5 trials
rollout SPS at num_envs=128, difficulty=10, max_steps=200: median 6.48M SPS over 5 trials
pure reset throughput at num_envs=2048, difficulty=1000: median 44.5 vec resets/sec over 3 trials
rollout SPS at num_envs=2048, difficulty=1000, max_steps=200: median 19.57M SPS over 3 trials
rollout SPS at num_envs=2048, difficulty=1000, max_steps=16: median 4.14M SPS over 3 trials
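For anyone reproducing numbers like these, a median-over-trials SPS measurement can be sketched as below. This is not the benchmark script used for the figures above: median_sps and its step_fn argument are hypothetical, and step_fn stands in for one vectorized env step over num_envs environments.

```python
import statistics
import time

def median_sps(step_fn, num_envs, steps, trials, clock=time.perf_counter):
    """Median steps-per-second over several timed trials (hypothetical helper).

    Each call to step_fn is assumed to advance all num_envs environments by one
    step, so one trial performs num_envs * steps environment steps in total.
    """
    samples = []
    for _ in range(trials):
        start = clock()
        for _ in range(steps):
            step_fn()
        elapsed = clock() - start
        samples.append(num_envs * steps / elapsed)
    return statistics.median(samples)

# Example with a no-op step; replace the lambda with a real vectorized step call.
example = median_sps(lambda: None, num_envs=128, steps=1000, trials=5)
```

Reporting the median rather than the mean keeps a single slow trial (for example, one hit by background load) from skewing the result.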

Port Clifford Ocean environment to 4.0

da7b4ea

y-richie-y wants to merge 1 commit into PufferAI:4.0 from y-richie-y:clifford-ocean-4.0
