Fix decoding error (issue 168) #169

Open
pevogam wants to merge 1 commit into avocado-framework:main from pevogam:multi-byte-decoding-fix

Conversation

pevogam commented Jun 26, 2026

Decoding incomplete byte input risks losing multi-byte characters whose byte representation is not aligned with the end of our buffer.

Fix this by concatenating bytes first, and only decode when we have to.

For Tail this is a little more complicated, because we need to return text in between read() calls. That logic is outsourced to a thoroughly documented helper function to clarify it and reduce complexity.
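
To illustrate the failure mode described above, here is a minimal sketch. It is not the aexpect code: the function names, sample text, and chunk size are hypothetical, and only `bytes.decode` with the "ignore" error handler is taken from the PR.

```python
def decode_per_chunk(chunks, encoding="utf-8"):
    """Decode each chunk on its own (the lossy approach)."""
    return "".join(chunk.decode(encoding, "ignore") for chunk in chunks)


def decode_after_join(chunks, encoding="utf-8"):
    """Concatenate the raw bytes first and decode once at the end."""
    return b"".join(chunks).decode(encoding, "ignore")


text = "na\u00efve caf\u00e9 \u20ac"  # contains two- and three-byte characters
raw = text.encode("utf-8")
# A chunk size of 3 cuts the two-byte character U+00EF across two chunks.
chunks = [raw[i:i + 3] for i in range(0, len(raw), 3)]

print(repr(decode_per_chunk(chunks)))   # characters cut by a chunk boundary vanish
print(repr(decode_after_join(chunks)))  # round-trips to the original text
```

Because "ignore" drops the dangling bytes on both sides of a boundary, the per-chunk result is shorter than the input and no error is ever raised.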

Clarify error policy for decode: why not switch from "ignore" to "replace" to notify callers of decoding problems instead of hiding them?
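
As a small illustration of the two policies (the sample bytes are made up for this example), "ignore" silently drops an undecodable byte while "replace" leaves a visible U+FFFD marker in the output:

```python
raw = b"ok \xff end"  # 0xFF can never appear in valid UTF-8

ignored = raw.decode("utf-8", "ignore")
replaced = raw.decode("utf-8", "replace")

print(repr(ignored))   # the bad byte is dropped without a trace
print(repr(replaced))  # the bad byte becomes U+FFFD
```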

Original author: Christian Herdtweck christian.herdtweck@intra2net.com

pevogam commented Jun 26, 2026

@christian-intra2net Thanks for contributing! Before any review though - have you run the unit tests? I believe they also need adaptation.

@christian-intra2net

Oops, sorry, I was not aware of unit tests. I think I found the problem, fixing it now...

By decoding incomplete byte-input, we risk losing multi-byte
characters if their byte-representation is not aligned with the
end of our buffer.

Fix this by concatenating bytes first, and only decode when we
actually have to.

In case of `Tail` that is a little complicated, because we need to
return text in-between `read()` calls. Outsource to a helper function
with lots of documentation to clarify and reduce complexity.

Signed-off-by: Plamen Dimitrov <plamen.dimitrov@intra2net.com>

pevogam force-pushed the multi-byte-decoding-fix branch from 4821842 to 2f9de5d on June 26, 2026 10:30

pevogam commented Jun 26, 2026

Great to see the CI passing. To clarify for everyone else: this PR fixes issue #168.

pevogam left a comment

This is an initial review from me. Overall, this pull request fixes a really non-trivial issue!

Comment thread aexpect/client.py
  raw_data = os.read(expect_pipe, 1024)
  if not raw_data:
-     return read, data
+     return read, data.decode(self.encoding, "ignore")

I assume from your comment in #168 that this could instead be turned into "replace" and thus provide the clarity you mentioned there. Let's first collect some feedback there on the original choices; until then, a "replace" setting here would be a requested change for an additional improvement rather than part of this fix.

Comment thread aexpect/client.py
+     return read, data.decode(self.encoding, "ignore")
  read += len(raw_data)
- data += raw_data.decode(self.encoding, "ignore")
+ data += raw_data

Definitely better not to decode raw data until the very end. I think this change improves clarity and fits the variable naming better.

Comment thread aexpect/client.py
thread.join()


def partial_decode(input_bytes, encoding):

Maybe we could add a few unit tests that pin down the behavior this function's contract should guarantee?
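
A rough sketch of what such tests might look like. Only the name and parameters of `partial_decode(input_bytes, encoding)` appear in the diff, so the stand-in below, including its `(text, pending_bytes)` return value, is purely an assumption for illustration and not the helper from this PR. It relies on the standard library's incremental decoders; reading leftover bytes from `getstate()` works for UTF-8, while other codecs may keep their state differently.

```python
import codecs


def partial_decode_sketch(input_bytes, encoding="utf-8"):
    """Hypothetical stand-in: decode what is complete, return the rest as bytes."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    # final=False tells the decoder that more input may follow, so an
    # incomplete trailing sequence is buffered instead of being replaced.
    text = decoder.decode(input_bytes, final=False)
    pending, _ = decoder.getstate()
    return text, pending


def test_complete_input_leaves_nothing_pending():
    assert partial_decode_sketch("a\u20ac".encode("utf-8")) == ("a\u20ac", b"")


def test_split_character_is_held_back():
    raw = "a\u20ac".encode("utf-8")  # b"a\xe2\x82\xac"
    text, pending = partial_decode_sketch(raw[:2])
    assert (text, pending) == ("a", b"\xe2")
    # Prepending the pending bytes to the next read completes the character.
    assert partial_decode_sketch(pending + raw[2:]) == ("\u20ac", b"")


def test_invalid_byte_is_replaced_not_held_back():
    assert partial_decode_sketch(b"a\xffb") == ("a\ufffdb", b"")
```

The third test assumes a "replace" policy for genuinely invalid bytes; with "ignore" the expected text would be "ab" instead.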

Maybe there is also a better location for it, e.g. in a utils folder or something similar, since the current module purely structures the classes in order of composition.
