Port Clifford Ocean environment to 4.0 #581

Open
y-richie-y wants to merge 1 commit into PufferAI:4.0 from y-richie-y:clifford-ocean-4.0

y-richie-y commented Jun 6, 2026

This PR adds a native Clifford synthesis env to pufferlib.ocean, plus correctness tests that check the native behavior against Python-side expected Clifford transformations. It is a cleaned-up version of #506 for 4.0.

Clifford synthesis is a task in quantum computing: given a target Clifford tableau, find a sequence of Clifford gates that implements it.

This can be framed as a reinforcement learning problem where the agent learns to transform a tableau to the identity tableau.

         Initial tableau                          Identity tableau
        x0 x1 x2 | z0 z1 z2                     x0 x1 x2 | z0 z1 z2
      +-------------------+                    +-------------------+
  r0  | 1  0  0 | 0  0  1 |                    | 1  0  0 | 0  0  0 |
  r1  | 0  0  0 | 0  1  0 |                    | 0  1  0 | 0  0  0 |
  r2  | 0  0  1 | 1  0  1 |  --CZ(0,2),        | 0  0  1 | 0  0  0 |
      |---------+---------|      S(2), H(1)->  |---------+---------|
  r3  | 0  0  0 | 1  0  0 |                    | 0  0  0 | 1  0  0 |
  r4  | 0  1  0 | 0  0  0 |                    | 0  0  0 | 0  1  0 |
  r5  | 0  0  0 | 0  0  1 |                    | 0  0  0 | 0  0  1 |
      +-------------------+                    +-------------------+
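The figure can be reproduced with a few lines of pure Python. This is a minimal sketch, not the native implementation: it assumes the row/column layout shown in the figure and treats each gate as a column operation over GF(2) (H swaps the x and z columns of a qubit, S adds the x column into the z column, CZ adds each qubit's x column into the other qubit's z column). The helper names are hypothetical.

```python
# Layout follows the figure: 2n rows, columns [x0..x(n-1) | z0..z(n-1)].
# Assumption: gates act as column operations mod 2. This convention was
# chosen to match the figure; the native env may use a different one.

def identity_tableau(n):
    return [[1 if r == c else 0 for c in range(2 * n)] for r in range(2 * n)]

def apply_h(tab, q):
    n = len(tab) // 2
    for row in tab:
        row[q], row[n + q] = row[n + q], row[q]  # swap x_q and z_q

def apply_s(tab, q):
    n = len(tab) // 2
    for row in tab:
        row[n + q] ^= row[q]  # z_q += x_q (mod 2)

def apply_cz(tab, i, j):
    n = len(tab) // 2
    for row in tab:
        row[n + j] ^= row[i]  # z_j += x_i (mod 2)
        row[n + i] ^= row[j]  # z_i += x_j (mod 2)

# Initial tableau copied from the figure (rows r0..r5).
initial = [
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]

tab = [row[:] for row in initial]
apply_cz(tab, 0, 2)
apply_s(tab, 2)
apply_h(tab, 1)
solved = tab == identity_tableau(3)
print(solved)  # True: the three gates reduce the tableau to the identity
```

Signs (Pauli phases) are not tracked here, which is consistent with the env description of working over binary symplectic matrices.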

For related RL-based Clifford synthesis work, see:

  • Yeung, Kissinger, and Cornish, Equivariant Reinforcement Learning for Clifford Quantum Circuit Synthesis (arXiv:2605.10910, 2026): https://arxiv.org/abs/2605.10910
  • Kremer et al., Practical and efficient quantum circuit synthesis and transpiling with Reinforcement Learning (arXiv:2405.13196, 2024): https://arxiv.org/abs/2405.13196

Environment

The env models synthesis over binary symplectic matrices:

  • observation: flattened 2n x 2n binary residual matrix
  • action space: H, S, V, HS, HV on each qubit when shortcut gates are enabled, plus CZ(i, j) for each unordered qubit pair. Without shortcut gates, the single-qubit action set is H, S.
  • reward: -single_qubit_cost for single-qubit gates, -cz_cost for CZ, optional goal_bonus on reaching identity, and failure_penalty on max-step truncation
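To make the action-space size and reward terms concrete, here is a hedged sketch. The action ordering, the default cost values, and the treatment of failure_penalty as a non-negative magnitude that is subtracted are all assumptions for illustration; only the gate names and config field names come from the description above.

```python
from itertools import combinations

def build_actions(n_qubits, shortcut_gates=True):
    # Hypothetical ordering: single-qubit gates first, then CZ pairs.
    single = ["H", "S", "V", "HS", "HV"] if shortcut_gates else ["H", "S"]
    actions = [(gate, (q,)) for q in range(n_qubits) for gate in single]
    actions += [("CZ", pair) for pair in combinations(range(n_qubits), 2)]
    return actions

def step_reward(gate, solved, truncated, single_qubit_cost=1.0, cz_cost=1.0,
                goal_bonus=0.0, failure_penalty=0.0):
    # Assumption: every step pays its gate cost, the bonus is added on
    # reaching identity, and the penalty is subtracted on truncation.
    reward = -cz_cost if gate == "CZ" else -single_qubit_cost
    if solved:
        reward += goal_bonus
    elif truncated:
        reward -= failure_penalty
    return reward

print(len(build_actions(6)))                        # 5*6 + 15 = 45
print(len(build_actions(6, shortcut_gates=False)))  # 2*6 + 15 = 27
```

In general the count is 5n + n(n-1)/2 with shortcut gates and 2n + n(n-1)/2 without.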

Difficulty

difficulty sets the scramble depth, i.e. the length of the random walk used to produce each reset state. It is part of the env config because curriculum learning is useful for this task, and adjusting scramble depth during training/evaluation is a practical control point. Fractional difficulties are supported by sampling between the two adjacent integer difficulty levels.

For example, difficulty=3.25 resets each episode from either a 3-step or a 4-step random walk, using the fractional part as the probability of sampling the higher difficulty: 4 steps with probability 0.25, 3 steps with probability 0.75. This gives the curriculum a smoother progression than jumping only between integer scramble depths.
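The sampling rule can be sketched as follows. The function name and the use of Python's random module are illustrative assumptions; the native env does this on the C side.

```python
import math
import random

def sample_scramble_depth(difficulty, rng=random):
    # Round up with probability equal to the fractional part,
    # otherwise round down. Integer difficulties are returned unchanged.
    low = math.floor(difficulty)
    frac = difficulty - low
    return low + (1 if rng.random() < frac else 0)

rng = random.Random(0)
depths = [sample_scramble_depth(3.25, rng) for _ in range(10_000)]
mean_depth = sum(depths) / len(depths)
print(sorted(set(depths)), mean_depth)  # depths are 3 or 4; mean is close to 3.25
```

The expected scramble depth equals the difficulty value, so the mean episode difficulty varies continuously as the curriculum moves.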

Curriculum Training

This PR also adds a curriculum training script with two profiles, fast and steady. Problems with N_QUBITS <= 3 are easy and can be solved with the fast curriculum. For N_QUBITS >= 4, training is less stable, so the steady curriculum is used instead.

For 3 qubits:

N_QUBITS=3 scripts/train_clifford_curriculum.sh

Performance

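Reset states are produced by random walks whose length is set by difficulty, so higher difficulty means more work per reset. That cost is amortized over the episode, which is consistent with the last two measurements below: at difficulty=1000, shortening max_steps from 200 to 16 lowers rollout SPS.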

Measured on my machine with n_qubits=6, shortcut gates enabled, and the current native CPU path:

  • rollout SPS at num_envs=128, difficulty=0, max_steps=200: median 6.44M SPS over 5 trials
  • rollout SPS at num_envs=128, difficulty=10, max_steps=200: median 6.48M SPS over 5 trials
  • pure reset throughput at num_envs=2048, difficulty=1000: median 44.5 vec resets/sec over 3 trials
  • rollout SPS at num_envs=2048, difficulty=1000, max_steps=200: median 19.57M SPS over 3 trials
  • rollout SPS at num_envs=2048, difficulty=1000, max_steps=16: median 4.14M SPS over 3 trials
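For anyone reproducing numbers of this kind, a median-over-trials measurement can be sketched as below. This is not the benchmark used for the figures above; step_fn is a hypothetical stand-in for one vectorized env step, and the harness only shows how a median steps-per-second value over several trials might be computed.

```python
import statistics
import time

def median_sps(step_fn, num_envs, steps_per_trial, trials=5):
    # Each call to step_fn is assumed to advance all num_envs envs by one step.
    rates = []
    for _ in range(trials):
        start = time.perf_counter()
        for _ in range(steps_per_trial):
            step_fn()
        elapsed = max(time.perf_counter() - start, 1e-12)  # guard against zero
        rates.append(num_envs * steps_per_trial / elapsed)
    return statistics.median(rates)

# Usage with a no-op placeholder instead of a real env step.
sps = median_sps(lambda: None, num_envs=128, steps_per_trial=1000, trials=5)
print(f"{sps:.3e} steps/sec")
```

Using the median rather than the mean keeps a single slow trial (for example, one hit by background load) from skewing the reported figure.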
