Skip to content

Commit

Permalink
Defang MacOS & iOS crashing text sequences
Browse files Browse the repository at this point in the history
[Manish Goregaokar](https://manishearth.github.io/blog/2018/02/15/picking-apart-the-crashing-ios-string/) says
> So, ultimately, the full set of cases that cause the crash are:
>   Any sequence <consonant1, virama, consonant2, ZWNJ, vowel>
> in Devanagari, Bengali, and Telugu, where: ...

We eliminate the ZWNJ which seems the minimally damaging thing to do to

Telugu rendering per the article above:
> a ZWNJ before a vowel doesn’t really do anything for most Indic scripts.

This is needed as of February 2018, but hopefully not long after that.
  • Loading branch information
mikesamuel committed Feb 19, 2018
1 parent 79c46d2 commit a7760d0
Show file tree
Hide file tree
Showing 2 changed files with 85 additions and 0 deletions.
37 changes: 37 additions & 0 deletions src/main/java/org/owasp/html/Encoding.java
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,43 @@ private static void encodeHtmlOnto(
output.append(plainText, pos, i).append(repl);
pos = i + 1;
}
} else if ((0x93A <= ch && ch <= 0xC4C)
&& (
// Devanagari vowel
ch <= 0x94F
|| 0x93A <= ch && ch <= 0x94F
// Benagli vowels
|| 0x985 <= ch && ch <= 0x994
|| 0x9BE <= ch && ch < 0x9CC // 0x9CC (Bengali AU) is ok
|| 0x9E0 <= ch && ch <= 0x9E3
// Telugu vowels
|| 0xC05 <= ch && ch <= 0xC14
|| 0xC3E <= ch && ch != 0xC48 /* 0xC48 (Telugu AI) is ok */)) {
// https://manishearth.github.io/blog/2018/02/15/picking-apart-the-crashing-ios-string/

This comment has been minimized.

Copy link
@mikesamuel

mikesamuel Feb 19, 2018

Author Contributor

@Manishearth, thanks for your writeup btw.

// > So, ultimately, the full set of cases that cause the crash are:
// > Any sequence <consonant1, virama, consonant2, ZWNJ, vowel>
// > in Devanagari, Bengali, and Telugu, where: ...

// TODO: This is needed as of February 2018, but hopefully not long after that.
// We eliminate the ZWNJ which seems the minimally damaging thing to do to
// Telugu rendering per the article above:
// > a ZWNJ before a vowel doesn’t really do anything for most Indic scripts.

if (pos < i) {
if (plainText.charAt(i - 1) == 0x200C /* ZWNJ */) {
output.append(plainText, pos, i - 1);
// Drop the ZWNJ on the floor.
pos = i;
}
} else if (output instanceof StringBuilder) {
StringBuilder sb = (StringBuilder) output;
int len = sb.length();
if (len != 0) {
if (sb.charAt(len - 1) == 0x200C /* ZWNJ */) {
sb.setLength(len - 1);
}
}
}
} else if (((char) 0xd800) <= ch) {
if (ch <= ((char) 0xdfff)) {
char next;
Expand Down
48 changes: 48 additions & 0 deletions src/test/java/org/owasp/html/HtmlSanitizerTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -366,6 +366,54 @@ public static final void testDuplicateAttributes() {

}

@Test
public static final void testMacOSAndIOSQueryOfDeath() {
// https://manishearth.github.io/blog/2018/02/15/picking-apart-the-crashing-ios-string/
String[][] tests = {
{
"\u0C1C\u0C4D\u0C1E\u200C\u0C3E",
"\u0C1C\u0C4D\u0C1E\u0C3E",
},
{
"\u09B8\u09CD\u09B0<interrupted>\u200C\u09C1",
"\u09B8\u09CD\u09B0\u09C1",
},
{
"\u0C1C\u0C4D\u0C1E\u200C\u0C3E",
"\u0C1C\u0C4D\u0C1E\u0C3E",
},
{
"\u09B8\u09CD\u09B0\u200C<interrupted>\u09C1",
"\u09B8\u09CD\u09B0\u09C1",
},
{
"&#x0C1C;&#x0C4D;&#x0C1E;&#x200C;&#x0C3E;",
"\u0C1C\u0C4D\u0C1E\u0C3E",
},
{
"&#x0C1C;&#x0C4D;&#x0C1E;<interrupted>&#x200C;&#x0C3E;",
"\u0C1C\u0C4D\u0C1E\u0C3E",
},
{
"&#x09B8;&#x09CD;&#x09B0;&#x200C;&#x09C1;",
"\u09B8\u09CD\u09B0\u09C1",
},
{
"&#x09B8;&#x09CD;&#x09B0;&#x200C;<interrupted>&#x09C1;",
"\u09B8\u09CD\u09B0\u09C1",
},
{
"\u0915\u094D\u0930\u200C\u093E",
"\u0915\u094D\u0930\u093E",
},
};

for (int i = 0, n = tests.length; i < n; ++i) {
String[] test = tests[i];
assertEquals(i + " : " + test[0], test[1], sanitize(test[0]));
}
}

private static String sanitize(@Nullable String html) {
StringBuilder sb = new StringBuilder();
HtmlStreamRenderer renderer = HtmlStreamRenderer.create(
Expand Down

1 comment on commit a7760d0

@stephanemoore
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking care of this!

Please sign in to comment.