fix: strip leading whitespace before encoding detection (fixes #508) by gaoflow · Pull Request #570 · kurtmckee/feedparser

gaoflow · 2026-06-17T21:18:20Z

When an XML feed starts with a newline before the XML declaration
(e.g. \n<?xml version="1.0"...), the encoding detection in
convert_to_utf8() fails to find the encoding attribute of the
<?xml declaration because the declaration is not at byte offset 0.
As a result, a second XML declaration is prepended, which trips the
SAX parser with "XML or text declaration not at start of entity".
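
The failure mode can be reproduced outside feedparser. The sketch below is not taken from the PR. It feeds bytes directly to Python's expat bindings, on the assumption that the SAX parser in question is expat-backed, and the `doubled` payload is a hypothetical stand-in for what the input looks like once a second declaration has been prepended.

```python
import xml.parsers.expat
from xml.parsers.expat import errors


def parse_error(data: bytes):
    """Return expat's error message for `data`, or None if it parses cleanly."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as exc:
        return errors.messages[exc.code]
    return None


# A well-formed feed: the declaration sits at byte offset 0.
clean = b'<?xml version="1.0" encoding="utf-8"?><rss/>'

# The feed as served: a newline precedes the declaration.
leading_newline = b"\n" + clean

# Hypothetical result of prepending a fresh declaration to that input.
doubled = b"<?xml version='1.0' encoding='utf-8'?>\n" + leading_newline

for label, payload in [("clean", clean),
                       ("leading newline", leading_newline),
                       ("doubled", doubled)]:
    print(label, "->", parse_error(payload))
```

Both the newline-prefixed input and the doubled-declaration input are rejected with the same message, which is why the declaration has to be found and handled before anything is prepended.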

Fix: strip leading ASCII whitespace from the data after BOM
detection and before encoding sniffing, so that XML declarations
preceded by whitespace are correctly detected.
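
A minimal sketch of the ordering described above, not the actual patch: `sniff_declared_encoding`, `XML_DECL_ENCODING`, and `ASCII_WHITESPACE` are hypothetical names, and only a UTF-8 BOM and ASCII-compatible encodings are handled here. feedparser's real convert_to_utf8() covers more cases.

```python
import codecs
import re

# Hypothetical helpers for illustration; not feedparser's real internals.
XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']'
)
ASCII_WHITESPACE = b" \t\r\n"


def sniff_declared_encoding(data: bytes):
    """Return (data, bom_encoding, declared_encoding).

    Order matters: detect the BOM first, then strip leading ASCII
    whitespace, and only then look for the XML declaration.
    """
    bom_encoding = None
    if data.startswith(codecs.BOM_UTF8):
        bom_encoding = "utf-8"
        data = data[len(codecs.BOM_UTF8):]

    # The step the fix adds: drop whitespace that precedes the declaration.
    data = data.lstrip(ASCII_WHITESPACE)

    match = XML_DECL_ENCODING.match(data)
    declared = match.group(1).decode("ascii").lower() if match else None
    return data, bom_encoding, declared


feed = b'\n<?xml version="1.0" encoding="ISO-8859-1"?><rss/>'
stripped, bom, declared = sniff_declared_encoding(feed)
print(bom, declared, stripped[:5])
```

Because the regular expression is anchored at the start of the data, it only matches once the leading whitespace has been removed. Without the `lstrip` call, `declared` would be `None` for this input.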


gaoflow wants to merge 1 commit into kurtmckee:main from gaoflow:fix-508-newline-xml
