Skip to content

markdown2 malformed HTML tokenizer CPU denial of service #707

@acorn421

Description

@acorn421

Describe the bug
python-markdown2 can spend unbounded CPU in its inline HTML tokenizer when rendering attacker-controlled Markdown containing repeated malformed tag fragments. At the pinned commit, the public markdown2.markdown() path can pass that text to _sorta_html_tokenize_re.split() in lib/markdown2.py, and a roughly 60 KB input deterministically reaches the local one-second timeout oracle. Applications that render untrusted Markdown synchronously can therefore have request workers tied up by a single document.

To Reproduce
INT-regex-markdown2-html-tokenizer-redos.zip

See attached file.

bash ./poc/run.sh
timed_out after 1s

Expected behavior
Tag-branch regex fails to match the malformed fragment in linear time.

Debug info

For more details, see README.md of attached file.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions