-
Notifications
You must be signed in to change notification settings - Fork 692
feat: use nh3 for HTML sanitization #10264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ietf/utils/html.py
Outdated
| # Allow the protocols/tags/attributes we specifically want, plus anything that nh3 declares | ||
| # to be safe. | ||
|
|
||
| acceptable_protocols = nh3.ALLOWED_URL_SCHEMES.union( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO check. this is more permissive than bleach's strict superset. however, unsure if this adds extra vulnerabilities since nh3 uses a better maintained parser https://github.com/rust-ammonia/ammonia
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jennifer-richards @rjsparks any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to try to get a few other eyes on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding tel: and ftp: (really?) are fine from a security perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really need ftp, but tel might be useful.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #10264 +/- ##
==========================================
- Coverage 88.57% 88.39% -0.18%
==========================================
Files 317 325 +8
Lines 42544 43544 +1000
==========================================
+ Hits 37682 38492 +810
- Misses 4862 5052 +190 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ietf/utils/html.py
Outdated
| "li", "ol", "p", "pre", "q", "s", "samp", "small", "span", "strike", "style", | ||
| "li", "ol", "p", "pre", "q", "s", "samp", "small", "span", "strike", | ||
| "strong", "sub", "sup", "table", "title", "tbody", "td", "tfoot", "th", "thead", | ||
| "tr", "tt", "u", "ul", "var" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xmp is also good
| "tr", "tt", "u", "ul", "var" | |
| "tr", "tt", "u", "ul", "var", "xmp" |
ietf/utils/html.py
Outdated
| # to be safe. | ||
|
|
||
| acceptable_protocols = nh3.ALLOWED_URL_SCHEMES.union( | ||
| {"http", "https", "mailto", "ftp", "xmpp"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| {"http", "https", "mailto", "ftp", "xmpp"} | |
| {"ftp", "http", "https", "mailto", "tel", "xmpp"} |
Sort. Add "tel".
(If performance is dictated by order, move "https" to the top.)
ietf/utils/html.py
Outdated
| protocols=acceptable_protocols, | ||
| strip=True, | ||
| url_schemes=acceptable_protocols, | ||
| link_rel=None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right to me. The default value is safer. Are there cases where links opened will need window.opener? I can't imagine that being necessary for user-generated content.
Same below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, changed it here as there were no rel attributes before
ietf/utils/html.py
Outdated
| protocols=acceptable_protocols, | ||
| strip=True, | ||
| _liberal_nh3_cleaner = nh3.Cleaner( | ||
| tags=acceptable_tags.union({"mg", "figure", "figcaption"}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo?
| tags=acceptable_tags.union({"mg", "figure", "figcaption"}), | |
| tags=acceptable_tags.union({"img", "figure", "figcaption"}), |
| """Returns the given HTML sanitized, and with the given tags removed.""" | ||
| allowed = acceptable_tags - set(t.lower() for t in tags) | ||
| return bleach.clean(html, tags=allowed, strip=True) | ||
| return nh3.clean(html, tags=allowed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to audit invocations of this function.
821b2a9 to
bf72278
Compare
|
Hi @nouralmaa - Thanks for continuing to work on this. One point on something that's not obvious - commits bf72278 and 9736b74 touch an old data migration (see the filename) - those will never run again, and its better to not touch these migration files as they should show what happened when they actually ran on the production database. If it's easy, please revert those commits. If it's not, we'll do it when this gets towards final review. When we have major changes to django (we may have one coming with django5) we tend to squash the migrations and that one would go away at that time. |
| '<a href="https://www.ietf.org" rel="nofollow">https://www.ietf.org</a>', | ||
| ) | ||
| self.assertEqual( | ||
| linkify("https://mailman3.ietf.org/mailman3/lists/[email protected]/"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixes #10120
fixes #10138