gh-102555: Fix comment parsing in HTMLParser according to the HTML5 s… · python/cpython@8ac7613 · GitHub
Commit 8ac7613

gh-102555: Fix comment parsing in HTMLParser according to the HTML5 standard (GH-135664)
* "--!>" now ends the comment.
* "-- >" no longer ends the comment.
* Support abnormally ended empty comments "<!-->" and "<!--->".

---------

Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
1 parent b582d75 commit 8ac7613

3 files changed: +50 -3 lines changed


Lib/html/parser.py

Lines changed: 17 additions & 1 deletion
@@ -29,7 +29,8 @@
 starttagopen = re.compile('<[a-zA-Z]')
 endtagopen = re.compile('</[a-zA-Z]')
 piclose = re.compile('>')
-commentclose = re.compile(r'--\s*>')
+commentclose = re.compile(r'--!?>')
+commentabruptclose = re.compile(r'-?>')
 # Note:
 # 1) if you change tagfind/attrfind remember to update locatetagend too;
 # 2) if you change tagfind/attrfind and/or locatetagend the parser will
@@ -336,6 +337,21 @@ def parse_html_declaration(self, i):
         else:
             return self.parse_bogus_comment(i)
 
+    # Internal -- parse comment, return length or -1 if not terminated
+    # see https://html.spec.whatwg.org/multipage/parsing.html#comment-start-state
+    def parse_comment(self, i, report=True):
+        rawdata = self.rawdata
+        assert rawdata.startswith('<!--', i), 'unexpected call to parse_comment()'
+        match = commentclose.search(rawdata, i+4)
+        if not match:
+            match = commentabruptclose.match(rawdata, i+4)
+            if not match:
+                return -1
+        if report:
+            j = match.start()
+            self.handle_comment(rawdata[i+4: j])
+        return match.end()
+
     # Internal -- parse bogus comment, return length or -1 if not terminated
     # see https://html.spec.whatwg.org/multipage/parsing.html#bogus-comment-state
     def parse_bogus_comment(self, i, report=1):
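
The effect of the two patterns can be tried outside the parser. The following is a minimal sketch, not the library's code: the regular expressions are copied from the diff above, while `old_commentclose` and `split_comment` are hypothetical names introduced here to mirror the control flow of `parse_comment()` as a plain function.

```python
import re

# Patterns copied from the diff above.
old_commentclose = re.compile(r'--\s*>')   # before the change
commentclose = re.compile(r'--!?>')        # after the change
commentabruptclose = re.compile(r'-?>')    # new: abrupt close of an empty comment


def split_comment(rawdata, i=0):
    """Hypothetical helper: return (comment_text, end_offset), or None if unterminated."""
    if not rawdata.startswith('<!--', i):
        raise ValueError('expected "<!--" at offset %d' % i)
    # Look for a regular "-->" or "--!>" terminator anywhere after the opener.
    match = commentclose.search(rawdata, i + 4)
    if not match:
        # Otherwise accept ">" or "->" only if it directly follows the opener.
        match = commentabruptclose.match(rawdata, i + 4)
        if not match:
            return None
    return rawdata[i + 4:match.start()], match.end()


for source in ('<!--a--!>', '<!--a-- >', '<!--->', '<!-->', '<!--a-- >b-->'):
    print(repr(source), '->', split_comment(source))
```

Each input is checked on its own here. In this sketch the abrupt-close pattern is consulted only when no `-->` or `--!>` occurs anywhere after the opener.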

Lib/test/test_htmlparser.py

Lines changed: 30 additions & 2 deletions
@@ -367,17 +367,45 @@ def test_comments(self):
         html = ("<!-- I'm a valid comment -->"
                 '<!--me too!-->'
                 '<!------>'
+                '<!----->'
                 '<!---->'
+                # abrupt-closing-of-empty-comment
+                '<!--->'
+                '<!-->'
                 '<!----I have many hyphens---->'
                 '<!-- I have a > in the middle -->'
-                '<!-- and I have -- in the middle! -->')
+                '<!-- and I have -- in the middle! -->'
+                '<!--incorrectly-closed-comment--!>'
+                '<!----!>'
+                '<!----!-->'
+                '<!---- >-->'
+                '<!---!>-->'
+                '<!--!>-->'
+                # nested-comment
+                '<!-- <!-- nested --> -->'
+                '<!--<!-->'
+                '<!--<!--!>'
+        )
         expected = [('comment', " I'm a valid comment "),
                     ('comment', 'me too!'),
                     ('comment', '--'),
+                    ('comment', '-'),
+                    ('comment', ''),
+                    ('comment', ''),
                     ('comment', ''),
                     ('comment', '--I have many hyphens--'),
                     ('comment', ' I have a > in the middle '),
-                    ('comment', ' and I have -- in the middle! ')]
+                    ('comment', ' and I have -- in the middle! '),
+                    ('comment', 'incorrectly-closed-comment'),
+                    ('comment', ''),
+                    ('comment', '--!'),
+                    ('comment', '-- >'),
+                    ('comment', '-!>'),
+                    ('comment', '!>'),
+                    ('comment', ' <!-- nested '), ('data', ' -->'),
+                    ('comment', '<!'),
+                    ('comment', '<!'),
+        ]
         self._run_check(html, expected)
 
     def test_condcoms(self):
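
To try inputs like these interactively, a small collector built on the public `html.parser.HTMLParser` API is enough. This is a rough sketch: `CommentCollector` and `collect` are hypothetical names standing in for the test suite's own event collector, which is not part of this diff.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Hypothetical helper: records every comment and text node it sees."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_comment(self, data):
        self.events.append(('comment', data))

    def handle_data(self, data):
        self.events.append(('data', data))


def collect(source):
    """Parse `source` in one go and return the recorded events."""
    parser = CommentCollector()
    parser.feed(source)
    parser.close()
    return parser.events


# Well-formed comments are reported the same way before and after the change.
print(collect("<!-- I'm a valid comment --><!--me too!-->"))

# The handling of these inputs is what the commit changes, so the result
# depends on whether the running interpreter includes the fix.
for source in ('<!--incorrectly-closed-comment--!>', '<!---- >-->', '<!--->'):
    print(repr(source), collect(source))
```

No expected output is given for the last three inputs because it varies with the Python version. On an interpreter that includes this commit, the results should line up with the `expected` list in the test above.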
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+Fix comment parsing in :class:`html.parser.HTMLParser` according to the
+HTML5 standard. ``--!>`` now ends the comment. ``-- >`` no longer ends the
+comment. Support abnormally ended empty comments ``<!-->`` and ``<!--->``.
