Canary streamatt by azziko · Pull Request #34 · hlt-mt/simulstream

azziko · 2026-04-29T07:33:09Z

Changes:

Add flag to the base streamatt, which determines whether the audio history is stored raw or in features
Implement canary with streamatt

Resolves: #28

azziko · 2026-04-29T07:38:34Z

I'll fix the checks and run the unit tests. I forgot about them, to be honest.

Let me know if the overall idea is fine

mgaido91

Thank you very much for your contribution @azziko! The approach looks great to me and the code is very clean, thanks. I am only concerned by the leading EOS, which I do not understand.

Only a couple of last points:

Can we please add a couple of unit tests for the audio history management? Only to ensure everything works as we expect and that future changes won't break things.
This code relies on recent contribs to NeMo (thanks for them as well!), but currently we have in our dependencies nemo_toolkit[asr]==2.4.0 for canary. I think we have to update that.

Thanks!
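As an illustration of the kind of unit test requested above, here is a minimal sketch. `trim_audio_history` is a hypothetical stand-in, not the function in this PR; it only assumes that a raw audio history is a sequence of samples capped at a maximum duration, keeping the most recent audio.

```python
from typing import List


def trim_audio_history(
    samples: List[float], max_duration_s: float, sample_rate: int = 16_000
) -> List[float]:
    """Hypothetical helper: keep only the most recent `max_duration_s` seconds."""
    max_samples = int(max_duration_s * sample_rate)
    if max_samples <= 0:
        # Guard explicitly: samples[-0:] would return the whole list.
        return []
    return samples[-max_samples:]


def test_trim_keeps_most_recent_audio():
    history = [float(i) for i in range(10)]
    # 0.5 s at 8 samples per second leaves room for 4 samples.
    trimmed = trim_audio_history(history, max_duration_s=0.5, sample_rate=8)
    assert trimmed == [6.0, 7.0, 8.0, 9.0]


def test_trim_is_noop_when_under_limit():
    history = [0.1, 0.2, 0.3]
    assert trim_audio_history(history, max_duration_s=10.0, sample_rate=8) == history


def test_trim_with_zero_budget_returns_empty():
    assert trim_audio_history([1.0, 2.0], max_duration_s=0.0) == []
```

The real tests would exercise the processor's own history update instead of this helper; the point is only to pin down the boundary cases (over the limit, under the limit, zero budget).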

mgaido91 · 2026-04-30T15:45:06Z

+
+        return replace(self.transcription_cfg, prompt={"turns": turns})
+
+    def _remove_eos_tokens(self, token_ids: List[int]) -> List[int]:


I do not understand, how and when can this happen? isn't it a problem for the attention to have these extra tokens?

When we were testing our system with Canary for IWSLT, there were EOS tokens occasionally in the beginning of the hypothesis. While we haven't traced the exact reason why, I speculate it's because of the forced prefix. In our system we solved it this way. The fix should probably be better done on the NeMo side, though. I will look into that

> isn't it a problem for the attention to have these extra tokens?

In our tests they were output together with the other predictions, so again I assume that they don't disrupt the attention scores.

if you have a repro, I can also try to debug this, thanks. I would like to make sure here we do not have issues.
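For readers following this thread, a minimal sketch of what trimming leading EOS tokens could look like. This is not the body of `_remove_eos_tokens` from the PR (which is not shown here and was later removed); the standalone function name and the `eos_id` parameter are assumptions made for illustration.

```python
from typing import List


def strip_leading_eos(token_ids: List[int], eos_id: int) -> List[int]:
    """Drop EOS tokens at the start of a hypothesis, leaving later ones untouched."""
    start = 0
    # Advance past every EOS token that appears before the first real token.
    while start < len(token_ids) and token_ids[start] == eos_id:
        start += 1
    return token_ids[start:]
```

Note that any per-token attention scores would have to be sliced with the same offset, otherwise tokens and scores would no longer line up.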

azziko · 2026-05-02T14:09:45Z

thanks for the review @mgaido91,

I pushed the quick fixes for most of the points, I will add some unit tests later too.

Regarding the EOS, I replied in the related conversation.

> This code relies on recent contribs to NeMo (thanks for them as well!), but currently we have in our dependencies nemo_toolkit[asr]==2.4.0 for canary. I think we have to update that.

It does not seem like the contributions have been included in any release yet. I'm using the latest commit from the repo when installing the NeMo toolkit, like so:

pip install "nemo_toolkit[asr] @ git+https://github.com/NVIDIA/NeMo.git"

mgaido91

mostly LGTM, thanks, just a few minor comments. The main thing that worries me is the EOS stripping, which I would like to investigate more.

Regarding the version, the next release will be 2.8.0, so we can put that as a dependency. This might also mean we have to wait for that release to merge this, but that may be fine if they stick with their scheduled release (June, so ~1 month from now). Otherwise we can put "@ git+https://github.com/NVIDIA/NeMo.git@main" as a dependency in the pyproject (actually it would be better to use a commit hash than main, to ensure we do not have flaky issues with newer commits coming in). Then we will need another PR once they do the release to use that.

mgaido91 · 2026-05-04T07:48:09Z

           - **audio_subsampling_factor (int)**: Subsampling factor of the model, if any.
             Defaults to 1.
+           - **mel_hop_samples (int)**: Number of raw waveform samples per mel frame.
+             Defaults to 1.


Suggested change

-             Defaults to 1.
+             Defaults to 160, i.e. 10ms at 16kHz.
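The suggested default follows from simple arithmetic: 160 samples at 16 kHz is 10 ms per mel frame. A small sketch of that conversion, also using the subsampling factor of 8 that appears in this PR; the helper names are illustrative and not part of simulstream.

```python
SAMPLE_RATE_HZ = 16_000  # sampling rate assumed in the suggested docstring


def mel_frame_ms(mel_hop_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration of one mel frame in milliseconds."""
    return 1000.0 * mel_hop_samples / sample_rate_hz


def samples_per_encoder_frame(mel_hop_samples: int, subsampling_factor: int) -> int:
    """Raw waveform samples covered by one encoder frame after subsampling."""
    return mel_hop_samples * subsampling_factor


hop_ms = mel_frame_ms(160)                        # 10.0 ms per mel frame
enc_samples = samples_per_encoder_frame(160, 8)   # 1280 samples per encoder frame
enc_ms = mel_frame_ms(enc_samples)                # 80.0 ms per encoder frame
```

This mapping is what allows a history kept as raw audio to be cut at a position expressed in frames: a frame index is multiplied by the number of samples per frame.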

mgaido91 · 2026-05-04T07:53:25Z

+        self.use_raw_audio_history = True
+        self.mel_hop_samples = getattr(self.config, "mel_hop_samples", 160)
+        self.audio_subsampling_factor = getattr(self.config, "audio_subsampling_factor", 8)


all these things are already set in the parent, no need to have them here.
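A minimal sketch of the point being made, with hypothetical class names standing in for the base and Canary processors: when the parent constructor already resolves these attributes from the config, the subclass only needs to call `super().__init__`. The default for `use_raw_audio_history` is an assumption; the other two defaults follow the docstring discussed in this review.

```python
from types import SimpleNamespace


class BaseProcessor:
    """Hypothetical stand-in for the base StreamAtt processor."""

    def __init__(self, config):
        self.config = config
        # Defaults are resolved once, here in the parent.
        self.use_raw_audio_history = getattr(config, "use_raw_audio_history", False)
        self.mel_hop_samples = getattr(config, "mel_hop_samples", 160)
        self.audio_subsampling_factor = getattr(config, "audio_subsampling_factor", 1)


class CanaryProcessor(BaseProcessor):
    """Hypothetical subclass: no need to re-assign the inherited attributes."""

    def __init__(self, config):
        super().__init__(config)
        self.model_name = getattr(config, "model_name", None)


cfg = SimpleNamespace(
    use_raw_audio_history=True,
    audio_subsampling_factor=8,
    model_name="nvidia/canary-1b-v2",
)
proc = CanaryProcessor(cfg)
```

Model-specific values such as the subsampling factor of 8 then live in the config file rather than being hard-coded in the subclass.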

mgaido91 · 2026-05-04T08:00:16Z

+
+        return replace(self.transcription_cfg, prompt={"turns": turns})
+
+    def _remove_eos_tokens(self, token_ids: List[int]) -> List[int]:


if you have a repro, I can also try to debug this, thanks. I would like to make sure here we do not have issues.

azziko · 2026-05-05T21:15:04Z

I agree on the version, I changed it to 2.8.0

Regarding the EOS problem, I looked into the logs I had: it was a problem specific to our system, so I removed the EOS trimming in the latest commit. It's still probably a good idea to run the processor on a small test set. I will try it out when I have time.

mgaido91

LGTM, only one comment regarding the UT. I agree on testing this more thoroughly, I'll also do that when I find the time.

Since we have to wait for nemo 2.8.0 to be out, please ping me if I do not notice it, so when nemo 2.8.0 is out we merge this.

Thanks!

Co-authored-by: Marco Gaido <marcogaido91@gmail.com>

mgaido91 · 2026-06-09T08:37:44Z

I have been trying this on must-c and got weird results:

Lang	Frame	COMET	SacreBLEU	Ideal Latency (s)	Comp. Latency (s)	RTF
de	2	0.7104	16.6518	2.9936	3.7225	0.5860
de	4	0.7091	16.5604	3.0939	3.7810	0.5566
de	6	0.7191	17.2589	3.5756	4.5110	0.7091
de	8	0.7228	17.1645	4.1756	4.9336	0.5139
es	2	0.7402	20.6929	2.8251	3.5067	0.5992
es	4	0.7395	20.7426	2.8670	3.4297	0.5112
es	6	0.7452	21.0897	3.0740	3.7319	0.5961
es	8	0.7462	21.3364	3.0958	3.6534	0.4638
fr	2	0.7219	23.2862	2.8991	3.6615	0.6582
fr	4	0.7257	23.7139	2.9642	3.6991	0.6417
fr	6	0.7326	24.4975	3.0756	3.8335	0.6594
fr	8	0.7394	25.0632	3.3128	3.8698	0.4516
it	2	0.7394	17.5279	2.7474	3.4357	0.6084
it	4	0.7390	17.6090	2.6898	3.2818	0.5201
it	6	0.7409	18.2894	3.0193	3.5827	—
it	8	0.7507	17.1477	3.1959	3.8328	—
it	10	0.7602	18.2203	3.5713	4.3174	—
nl	2	0.7514	18.2894	3.0193	3.5827	0.4866
nl	4	0.7520	18.5538	3.1366	3.8632	0.6211
nl	6	0.7546	18.8852	2.9528	3.6708	0.6003
nl	8	0.7584	18.5149	3.1959	3.8328	0.5269
pt	2	0.7518	17.1165	3.4080	3.9666	0.4671
pt	4	0.7507	17.1477	3.3887	4.0860	0.5956
pt	6	0.7552	17.5002	3.4613	4.1795	0.5977
pt	8	0.7602	18.2203	3.5713	4.3174	0.6283

Does it make sense to you? Is it a behavior you have noticed as well?

azziko · 2026-06-09T08:53:18Z

The quality seems quite low; I would play with the frame threshold a bit. What I found in my tests is that with chunks >2s it's better to also increase the frame threshold, to 16 or even 20. On the MCIF dev subset I got 0.90 xcomet-xl with chunk 3 and frame 16 for the en -> de direction. I will run a grid search on must-c and let you know what I get.

mgaido91 · 2026-06-09T09:09:42Z

Thanks, I was using 1 second as chunk size. This is the config file I used:

type: "simulstream.server.speech_processors.canary_streamatt.CanaryStreamAtt"
model_name: "nvidia/canary-1b-v2"
text_history:
  type: "simulstream.server.speech_processors.base_streamatt.FixedWordsTextHistory"
  history_words: 10
speech_chunk_size: 1.0  # seconds
detokenizer_type: "canary"
cross_attn_layer: -2
cutoff_frame_num: __FRAME__
num_beams: 5
audio_subsampling_factor: 8
audio_history_max_duration: 360  # Maximum length for the audio buffer, in seconds
mel_hop_samples: 160  # Number of audio samples between adjacent mel frames
text_history_max_len: 128
word_level_postprocess: True  # Disable if character-level language
use_raw_audio_history: True

I can also test different values if you think it makes sense. I just want to double-check that we do not have issues with the code. Thanks.
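The `__FRAME__` placeholder in the config above suggests one config is rendered per frame value. A minimal sketch of such a sweep, assuming plain string substitution; the actual script used for these runs is not shown in this thread, and the template below is abbreviated to two keys.

```python
from typing import Dict, Iterable

# Abbreviated template; the full config is the one posted above.
TEMPLATE = """speech_chunk_size: 1.0  # seconds
cutoff_frame_num: __FRAME__
"""


def render_configs(template: str, frames: Iterable[int]) -> Dict[int, str]:
    """Return one config text per frame value, keyed by that value."""
    return {frame: template.replace("__FRAME__", str(frame)) for frame in frames}


configs = render_configs(TEMPLATE, [2, 4, 6, 8])
for frame, text in configs.items():
    # Each rendered text could be written to its own YAML file before a run.
    print(f"--- cutoff_frame_num={frame} ---")
    print(text)
```

The same loop extends naturally to a grid over both the chunk size and the frame threshold by adding a second placeholder.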

azziko · 2026-06-09T09:22:01Z

I see, thanks. Does it mean "Chunk" in the table you shared actually represents the FRAME? If so, the results are more or less OK for 1 second.

mgaido91 · 2026-06-09T09:23:57Z

> Does it mean "Chunk" in the table you shared actually represent the FRAME?

Yes, sorry, I made a bit of a mess with the naming.

mgaido91 · 2026-06-15T16:02:56Z

Looks like there will be no 2.8.0. They just bumped the version to 3.0 and then 3.1 without any release. In addition, the current repo is a huge refactor, so everything should be re-tested as soon as they release something, sigh.

azziko added 2 commits April 29, 2026 07:30

Add canary streamatt

15b6a00

Add audio history type flag to the base streamatt

4279681

Add stylistic fixes addressing the linter

53afb67

mgaido91 reviewed Apr 30, 2026

azziko added 2 commits May 2, 2026 13:56

Add minor fixes

b9ec9ba

Fix linter issues

056ec4e

mgaido91 reviewed May 4, 2026

azziko added 4 commits May 4, 2026 14:07

Add minor fixes

39f1380

Delete removing eos in the beginning

3f9a6eb

Add unit test for audio trimming in update history

6b23ddf

Change the canary dependency version

52803f4

mgaido91 reviewed May 6, 2026

Comment thread uts/speech_processors/test_streamatt.py Outdated

mgaido91 reviewed May 6, 2026

Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated

azziko and others added 3 commits May 6, 2026 10:36

Update simulstream/server/speech_processors/canary_streamatt.py

cbc895e

Co-authored-by: Marco Gaido <marcogaido91@gmail.com>

Update uts/speech_processors/test_streamatt.py

59bf6f3

Co-authored-by: Marco Gaido <marcogaido91@gmail.com>

Fix linter

076ed37

mgaido91 reviewed May 6, 2026

Comment thread uts/speech_processors/test_streamatt.py Outdated

mgaido91 reviewed May 6, 2026

Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated

Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated

Add minor fixes

a1fea18

