Port Clifford Ocean environment to 4.0 #581
Open
y-richie-y wants to merge 1 commit into
This PR adds a native Clifford synthesis env to `pufferlib.ocean`, plus correctness tests that check native behavior against Python-side expected Clifford transformations. It is a cleaned-up version of #506 for 4.0.

Clifford synthesis is a task in quantum computing: given a target Clifford tableau, find a sequence of Clifford gates that implements it.
This can be framed as a reinforcement learning problem where the agent learns to transform a tableau to the identity tableau.
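To make the framing concrete, here is a minimal pure-Python sketch of a tableau as a `2n x 2n` matrix over GF(2) and of gates that move it toward or away from the identity. It is not the PR's native implementation: the function names, the `(x | z)` column ordering, the row-wise update convention, and the decision to ignore Pauli signs are all assumptions made for illustration.

```python
def identity_tableau(n):
    """2n x 2n identity over GF(2); each row is a Pauli stored as bits (x | z)."""
    size = 2 * n
    return [[int(r == c) for c in range(size)] for r in range(size)]


def apply_gate(tab, gate):
    """Apply one gate in place to every row, ignoring Pauli signs (assumption)."""
    n = len(tab) // 2
    name, qubits = gate[0], gate[1:]
    for row in tab:
        if name == "H":        # H swaps the x and z bits of the qubit
            (q,) = qubits
            row[q], row[n + q] = row[n + q], row[q]
        elif name == "S":      # S maps X -> Y, so z ^= x
            (q,) = qubits
            row[n + q] ^= row[q]
        elif name == "CZ":     # CZ maps X_i -> X_i Z_j and X_j -> Z_i X_j
            i, j = qubits
            row[n + i] ^= row[j]
            row[n + j] ^= row[i]
        else:
            raise ValueError(f"unknown gate: {name}")


def is_symplectic(tab):
    """Check that rows r and r+n anticommute and all other pairs commute."""
    n = len(tab) // 2

    def product(a, b):
        return sum(a[k] * b[n + k] + a[n + k] * b[k] for k in range(n)) % 2

    return all(
        product(tab[r], tab[c]) == int(abs(r - c) == n)
        for r in range(2 * n)
        for c in range(2 * n)
    )


# Scramble the identity, then undo it. Ignoring signs, H, S, and CZ are
# each their own inverse, so replaying the scramble in reverse order
# is a valid synthesis for this tableau.
scramble = [("H", 0), ("CZ", 0, 1), ("S", 1)]
tab = identity_tableau(2)
for gate in scramble:
    apply_gate(tab, gate)
scrambled = [row[:] for row in tab]

for gate in reversed(scramble):
    apply_gate(tab, gate)
solved = tab == identity_tableau(2)
```

An agent does not know the scramble sequence, so it has to learn which gate to apply from the current matrix alone; the reversed replay here only shows that a solution exists.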
For related RL-based Clifford synthesis work, see:
Environment
The env models synthesis over binary symplectic matrices:
- State: a `2n x 2n` binary residual matrix.
- Actions: `H`, `S`, `V`, `HS`, `HV` on each qubit when shortcut gates are enabled, plus `CZ(i, j)` for each unordered qubit pair. Without shortcut gates, the single-qubit action set is `H`, `S`.
- Rewards: `-single_qubit_cost` for single-qubit gates, `-cz_cost` for `CZ`, an optional `goal_bonus` on reaching identity, and `failure_penalty` on max-step truncation.

Difficulty
`difficulty` is part of the env config because curriculum learning is useful for this task, and adjusting scramble depth during training/evaluation is a practical control point. Fractional difficulties are supported by sampling between the adjacent integer difficulty levels.

For example, `difficulty=3.25` resets each episode from either a 3-step or 4-step random walk, using the fractional part as the probability of sampling the higher difficulty (here, 4 steps with probability 0.25 and 3 steps with probability 0.75). This gives the curriculum a smoother progression than jumping only between integer scramble depths.

Curriculum Training
This PR also adds a curriculum training script with two profiles. Problems with `N_QUBITS <= 3` are easy and can be solved with the `fast` curriculum. For `N_QUBITS >= 4`, training is less stable and uses the `steady` curriculum.

For 3 qubits:
Performance
Reset states are produced by random walks, so higher difficulty means more work per reset.
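As a rough illustration of why reset cost grows with difficulty, the sketch below draws a scramble depth from a fractional `difficulty` and then picks that many random action indices. The helper names (`sample_scramble_depth`, `sample_walk`) and the flat action indexing are hypothetical and are not taken from the PR's code.

```python
import math
import random


def sample_scramble_depth(difficulty, rng):
    """Round a fractional difficulty up with probability equal to its fractional part."""
    low = math.floor(difficulty)
    frac = difficulty - low
    return low + (1 if rng.random() < frac else 0)


def sample_walk(difficulty, n_actions, rng):
    """Pick one random action index per walk step (hypothetical flat action indexing)."""
    depth = sample_scramble_depth(difficulty, rng)
    return [rng.randrange(n_actions) for _ in range(depth)]


rng = random.Random(0)
depths = [sample_scramble_depth(3.25, rng) for _ in range(20000)]
mean_depth = sum(depths) / len(depths)  # should be close to 3.25
walk = sample_walk(3.25, n_actions=45, rng=rng)
```

The expected walk length equals `difficulty` itself, so the work per reset scales linearly with it in this sketch.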
Measured on my machine with `n_qubits=6`, shortcut gates enabled, and the current native CPU path:
- `num_envs=128`, `difficulty=0`, `max_steps=200`: median 6.44 MSPS over 5 trials
- `num_envs=128`, `difficulty=10`, `max_steps=200`: median 6.48 MSPS over 5 trials
- `num_envs=2048`, `difficulty=1000`: median 44.5 vec resets/sec over 3 trials
- `num_envs=2048`, `difficulty=1000`, `max_steps=200`: median 19.57 MSPS over 3 trials
- `num_envs=2048`, `difficulty=1000`, `max_steps=16`: median 4.14 MSPS over 3 trials
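For readers who want to reproduce numbers of this kind, the following is one possible way to compute a median throughput over several trials. It assumes MSPS means millions of environment steps per second counted across all envs in the batch; `measure_msps` and `step_fn` are hypothetical names, and this is not the benchmark script used for the figures above.

```python
import statistics
import time


def measure_msps(step_fn, num_envs, n_steps, trials, timer=time.perf_counter):
    """Median throughput in millions of env steps per second over `trials` runs.

    `step_fn` is a hypothetical callable that advances all `num_envs` envs by one step.
    """
    rates = []
    for _ in range(trials):
        start = timer()
        for _ in range(n_steps):
            step_fn()
        elapsed = timer() - start
        rates.append(num_envs * n_steps / elapsed / 1e6)
    return statistics.median(rates)


# Usage with a dummy step function standing in for a vectorized env step.
demo_rate = measure_msps(lambda: None, num_envs=128, n_steps=10_000, trials=5)
print(f"{demo_rate:.2f} MSPS (dummy step, not a real env)")
```

Taking the median rather than the mean keeps a single slow trial (for example, one disturbed by other processes) from skewing the reported figure.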