
feat(stt/parakeet): decode-loop fusion campaign — TDT v3 + EOU (LSTM gate + B1 fusion)#68

Open
Alex-Wengg wants to merge 1 commit into main from feat/parakeet-decode-fusion

@Alex-Wengg

Member

Research record for the Parakeet decode-loop campaign:

  • LSTM gate: ios17.lstm in both prediction networks (TDT v3 ×2, EOU ×1) → ANE placement categorically closed for all Parakeet RNN-T decoders
  • TDT v3 fused decoder+joint: 1.26–1.29× per step, 1.11× per utterance (~5% E2E ceiling); parity is bit-identical in fp32. There is no fp16 WER gate yet, and the EOU campaign proved that fp32 parity doesn't survive fp16 deployment, so gate before shipping (pattern: mobius feat/eou-decode-ane)
  • The fp32 fused model regresses to 0.64× when the GPU is in the compute-unit (CU) set, so export fp16 only
  • This branch's EOU traced-fusion artifact is superseded by the MIL-lean build (feat/eou-decode-ane; identical outputs, faster)
  • Writeup: models/stt/parakeet-tdt-v3-0.6b/coreml/OPTIMIZATION.md
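
A minimal sketch of what an fp16 WER gate could look like, assuming transcripts from the fp32 and fp16 builds are already available as plain strings. The function names, the corpus-level WER definition, and the `max_abs_regression` threshold are illustrative assumptions, not taken from this branch or from feat/eou-decode-ane.

```python
def word_errors(ref, hyp):
    """Levenshtein distance between two word lists
    (substitutions, insertions and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match / substitution
        prev = curr
    return prev[-1]


def corpus_wer(refs, hyps):
    """Corpus-level WER: total word errors over total reference words."""
    errors = sum(word_errors(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def fp16_gate(refs, fp32_hyps, fp16_hyps, max_abs_regression=0.001):
    """Hypothetical gate: pass only if fp16 WER is within
    max_abs_regression (an assumed threshold) of the fp32 WER."""
    wer32 = corpus_wer(refs, fp32_hyps)
    wer16 = corpus_wer(refs, fp16_hyps)
    return (wer16 - wer32) <= max_abs_regression, wer32, wer16
```

The gate compares fp16 against the fp32 baseline on the same reference set rather than against an absolute WER target, so it isolates the precision change from everything else in the pipeline.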

🤖 Generated with Claude Code

LSTM gate: ios17.lstm in both prediction networks -> ANE categorically
blocked (Kokoro PostAlbert finding); campaign is fusion-only.

Fused decoder+joint_decision into one CoreML dispatch per step:
- TDT v3: 1.27x/step vs shipped pair, 1.11x utterance decode loop
  (~2 ms/utt; fused pays LSTM on blank steps production skips)
- EOU: 1.23x/step, 1.21x utterance decode loop (~7-10 ms/utt;
  458 -> 229 dispatches)
Parity: token/duration/state bit-identical vs two-model chain (fp32);
top_k_logits 7e-4/1e-3 reassociation noise only. fp16 export only --
fp32 fused regresses 0.64x when GPU is in the compute-unit set.

See models/stt/parakeet-tdt-v3-0.6b/coreml/OPTIMIZATION.md.
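
The parity criterion above (exact match on token, duration and state, tolerance only on top_k_logits) could be checked per step roughly as follows. This is a sketch under assumptions: the dict layout, the key names other than `top_k_logits`, and the default `logit_atol` are hypothetical, and the state is treated as a flat list of floats.

```python
import struct


def same_bits(a, b):
    """True if two float sequences have identical length and identical
    IEEE-754 bit patterns (so 0.0 and -0.0 count as different)."""
    if len(a) != len(b):
        return False
    return all(struct.pack("<d", x) == struct.pack("<d", y) for x, y in zip(a, b))


def check_parity(fused, chained, logit_atol=2e-3):
    """Compare one decode step of the fused model against the two-model
    chain. Returns a list of problems; an empty list means parity holds.
    logit_atol is an assumed tolerance, not a value from the writeup."""
    problems = []
    for key in ("token", "duration"):
        if fused[key] != chained[key]:
            problems.append(f"{key}: {fused[key]} != {chained[key]}")
    if not same_bits(fused["state"], chained["state"]):
        problems.append("state: not bit-identical")
    a, b = fused["top_k_logits"], chained["top_k_logits"]
    if len(a) != len(b):
        problems.append("top_k_logits: length mismatch")
    else:
        worst = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
        if worst > logit_atol:
            problems.append(f"top_k_logits: max abs diff {worst:.2e}")
    return problems
```

Keeping the discrete outputs and recurrent state on an exact check, and relaxing only the logits, means reassociation noise cannot hide a real divergence in the decoded sequence.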