gh-102555: Fix comment parsing in HTMLParser by serhiy-storchaka · Pull Request #135664 · python/cpython · GitHub
gh-102555: Fix comment parsing in HTMLParser #135664

Merged (8 commits) on Jul 4, 2025

@serhiy-storchaka (Member) commented Jun 18, 2025

  • "--!>" now ends the comment.
  • "-- >" no longer ends the comment.
  • Support abnormally ended empty comments "<!-->" and "<!--->".
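
A minimal sketch of the three rules above, assuming they follow the comment states of the WHATWG tokenizer. The names COMMENT_CLOSE, ABRUPT_CLOSE and split_comment are hypothetical and exist only for this illustration; this is not the code added by the PR.

```python
import re

# Hypothetical names for illustration only; not taken from CPython.
COMMENT_CLOSE = re.compile(r"--!?>")   # "-->" or "--!>" ends a comment
ABRUPT_CLOSE = re.compile(r"-?>")      # ">" or "->" directly after "<!--"


def split_comment(text, start=0):
    """Return (comment_body, end_index), or None if the comment is unterminated."""
    if not text.startswith("<!--", start):
        raise ValueError("no comment starts at this position")
    body_start = start + 4
    # "<!-->" and "<!--->" are abruptly closed empty comments.
    match = ABRUPT_CLOSE.match(text, body_start)
    if match is None:
        match = COMMENT_CLOSE.search(text, body_start)
    if match is None:
        return None
    return text[body_start:match.start()], match.end()


print(split_comment("<!-- a --!> b"))      # "--!>" closes the comment
print(split_comment("<!-- a -- > b -->"))  # "-- >" does not close it
print(split_comment("<!-->"))              # abruptly closed empty comment
print(split_comment("<!--->"))             # abruptly closed empty comment
```

The abrupt-close check runs first because it only applies at the position immediately after "<!--"; everywhere else only "-->" and "--!>" are treated as terminators.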

@serhiy-storchaka changed the title from "gh-135661: Fix comment parsing in HTMLParser" to "gh-102555: Fix comment parsing in HTMLParser" on Jun 25, 2025
@Privat33r-dev (Contributor) left a comment

Solid implementation overall :)

@@ -309,6 +310,21 @@ def parse_html_declaration(self, i):
        else:
            return self.parse_bogus_comment(i)

    # Internal -- parse comment, return length or -1 if not terminated
    # see https://html.spec.whatwg.org/multipage/parsing.html#comment-start-state
    def parse_comment(self, i, report=True):
@Privat33r-dev (Contributor) commented Jun 25, 2025

I wonder if the change should be made in _markupbase instead.

cpython/Lib/_markupbase.py

Lines 165 to 175 in c2f2fd4

    def parse_comment(self, i, report=1):
        rawdata = self.rawdata
        if rawdata[i:i+4] != '<!--':
            raise AssertionError('unexpected call to parse_comment()')
        match = _commentclose.search(rawdata, i+4)
        if not match:
            return -1
        if report:
            j = match.start(0)
            self.handle_comment(rawdata[i+4: j])
        return match.end(0)

If the method is overridden here, the original method has no other callers and becomes dead code.
https://github.com/search?q=repo%3Apython%2Fcpython%20parse_comment&type=code
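
For context, a small sketch of how the legacy terminator behaves. It assumes that _commentclose in the quoted _markupbase code is the pattern r'--\s*>'; the name legacy_close is made up for this example.

```python
import re

# Assumption: this mirrors the pattern behind _commentclose in _markupbase.
legacy_close = re.compile(r"--\s*>")

samples = ["x -->", "x -- >", "x --!>"]
results = {s: bool(legacy_close.search(s)) for s in samples}
print(results)
```

Under that assumption the legacy rule accepts "-- >" as a terminator and does not recognise "--!>", which are the two behaviours this PR reverses.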

Member

I think it's fine to override the method here. Since _markupbase has been made internal/private in Python 3 and it's only used by html.parser, it makes sense to me to add new code directly to html.parser (and possibly even merging _markupbase into html.parser eventually).

Regarding the (now) dead code, we could either let it be, adding a comment noting that the method is unused/overridden, or delete it. The first two options are less destructive, but since the module is private there shouldn't be much concern about breaking backward compatibility (and if anyone is relying on the original implementation, they are probably using it through HTMLParser anyway).

Member, Author

_markupbase may be used by third-party code (for example, as the base for an SGML parser or other parsers), so it is better not to touch it in maintained versions. Changing it could break such code if SGML has different rules for comments. On the development branch, however, we can remove the method once all the other bug fixes are finished.
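
The trade-off discussed in this thread can be sketched as follows. All class names are hypothetical stand-ins, and the two patterns are assumptions about the legacy and HTML5-style rules rather than code from CPython.

```python
import re


class BaseParserSketch:
    """Hypothetical stand-in for a shared base class such as the one in _markupbase."""

    comment_close = re.compile(r"--\s*>")  # assumed legacy rule

    def parse_comment(self, data):
        match = self.comment_close.search(data, 4)
        return None if match is None else data[4:match.start()]


class HtmlParserSketch(BaseParserSketch):
    """Overrides the method; the base class is left untouched."""

    html_comment_close = re.compile(r"--!?>")  # assumed HTML5-style rule

    def parse_comment(self, data):
        match = self.html_comment_close.search(data, 4)
        return None if match is None else data[4:match.start()]


class SgmlParserSketch(BaseParserSketch):
    """Hypothetical third-party parser that still inherits the base behaviour."""


doc = "<!-- a -- > b -->"
print(repr(HtmlParserSketch().parse_comment(doc)))  # body runs to the final "-->"
print(repr(SgmlParserSketch().parse_comment(doc)))  # body stops at "-- >"
```

Overriding in the subclass changes only the HTML parser; any other subclass of the base keeps the old comment rule, which is the compatibility argument for leaving the base method alone in maintained versions.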

@serhiy-storchaka added the labels type-security (a security issue) and needs backport to 3.9, 3.10, 3.11, and 3.12 (only security fixes) on Jul 4, 2025
@serhiy-storchaka enabled auto-merge (squash) on July 4, 2025 06:08
@serhiy-storchaka merged commit 8ac7613 into python:main on Jul 4, 2025
81 of 83 checks passed
@miss-islington-app

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10, 3.11, 3.12, 3.13, 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 4, 2025
…TML5 standard (pythonGH-135664)

* "--!>" now ends the comment.
* "-- >" no longer ends the comment.
* Support abnormally ended empty comments "<!-->" and "<!--->".

---------
(cherry picked from commit 8ac7613)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>

bedevere-app bot commented Jul 4, 2025

GH-136271 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Jul 4, 2025

bedevere-app bot commented Jul 4, 2025

GH-136272 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Jul 4, 2025

bedevere-app bot commented Jul 4, 2025

GH-136273 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Jul 4, 2025

bedevere-app bot commented Jul 4, 2025

GH-136274 is a backport of this pull request to the 3.11 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.11 only security fixes label Jul 4, 2025

bedevere-app bot commented Jul 4, 2025

GH-136275 is a backport of this pull request to the 3.10 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.10 only security fixes label Jul 4, 2025

bedevere-app bot commented Jul 4, 2025

GH-136276 is a backport of this pull request to the 3.9 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.9 only security fixes label Jul 4, 2025
serhiy-storchaka added a commit that referenced this pull request Jul 4, 2025
…HTML5 standard (GH-135664) (GH-136272)

* "--!>" now ends the comment.
* "-- >" no longer ends the comment.
* Support abnormally ended empty comments "<!-->" and "<!--->".

---------
(cherry picked from commit 8ac7613)

Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
serhiy-storchaka added a commit that referenced this pull request Jul 4, 2025
…HTML5 standard (GH-135664) (GH-136271)

* "--!>" now ends the comment.
* "-- >" no longer ends the comment.
* Support abnormally ended empty comments "<!-->" and "<!--->".

---------
(cherry picked from commit 8ac7613)

Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Labels
type-security A security issue
3 participants
