Infinite Trailing dots in IPv4 addrs at the end
Even though WHATWG only mentions the last empty octet as a warning (https://url.spec.whatwg.org/#ipv4-empty-part), I don't think something like "http://127.0.0.1.../" should be marked as Too Many Parts (https://url.spec.whatwg.org/#ipv4-too-many-parts).

The browser URL parsers that I tested also ignore the trailing dot characters.

I'm not sure whether this should be merged. If I were writing the WHATWG spec, I would either not allow a trailing dot at all or allow multiple dots too; I also wouldn't allow hex/octal IPv4 addresses, so what do I know!!

Thanks.
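
For illustration, the behaviour described in this message can be sketched as a small standalone helper. `strip_trailing_dots` is a hypothetical name and is not part of ada; it only shows the intended effect on a host string, under the assumption that every trailing dot is ignored rather than just the last one.

```cpp
#include <string_view>

// Hypothetical helper (not part of ada): drop every trailing '.' from a host,
// so that "127.0.0.1..." is treated like "127.0.0.1".
constexpr std::string_view strip_trailing_dots(std::string_view host) noexcept {
  // Checking empty() first keeps back() from being called on an empty view.
  while (!host.empty() && host.back() == '.') {
    host.remove_suffix(1);
  }
  return host;
}
```

A host made only of dots comes back empty, so a caller would still have to reject that case before parsing any numbers.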
the-moisrex committed Dec 26, 2023
1 parent d6bca38 commit aae442e
Showing 4 changed files with 70 additions and 7 deletions.
9 changes: 6 additions & 3 deletions src/checkers.cpp
@@ -10,14 +10,17 @@ ada_really_inline ada_constexpr bool is_ipv4(std::string_view view) noexcept {
   // with 'x' or a lowercase hex character.
   // Most of the time, this will be false so this simple check will save a lot
   // of effort.
-  char last_char = view.back();
   // If the address ends with a dot, we need to prune it (special case).
-  if (last_char == '.') {
+  char last_char{};
+  for (;;) {
+    last_char = view.back();
+    if (last_char != '.') {
+      break;
+    }
     view.remove_suffix(1);
     if (view.empty()) {
       return false;
     }
-    last_char = view.back();
   }
   bool possible_ipv4 = (last_char >= '0' && last_char <= '9') ||
                        (last_char >= 'a' && last_char <= 'f') ||
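
As a rough illustration of the pre-check in the hunk above, here is a self-contained sketch. `may_end_like_ipv4` is a hypothetical name, and the condition models only what is visible in the diff (digits and 'a'..'f') plus the 'x' mentioned in the comment; the real `is_ipv4` continues past the shown lines and may test more. Unlike the committed loop, this version also guards against an empty input before calling `back()`.

```cpp
#include <string_view>

// Hypothetical stand-in for the fast pre-check in is_ipv4 (not ada's code).
constexpr bool may_end_like_ipv4(std::string_view view) noexcept {
  // Prune every trailing dot instead of just one.
  while (!view.empty() && view.back() == '.') {
    view.remove_suffix(1);
  }
  // Empty input, or input made only of dots, cannot be an IPv4 address.
  if (view.empty()) {
    return false;
  }
  const char last_char = view.back();
  // Assumption: a candidate ends with a digit, a lowercase hex digit, or 'x'.
  return (last_char >= '0' && last_char <= '9') ||
         (last_char >= 'a' && last_char <= 'f') || last_char == 'x';
}
```

The point of the check is that most hostnames (for example ones ending in "m" or "g") fail it immediately, so the more expensive numeric parsing is skipped.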
2 changes: 1 addition & 1 deletion src/url.cpp
@@ -24,7 +24,7 @@ bool url::parse_opaque_host(std::string_view input) {

 bool url::parse_ipv4(std::string_view input) {
   ada_log("parse_ipv4 ", input, "[", input.size(), " bytes]");
-  if (input.back() == '.') {
+  while (!input.empty() && input.back() == '.') {
     input.remove_suffix(1);
   }
   size_t digit_count{0};
16 changes: 13 additions & 3 deletions src/url_aggregator.cpp
@@ -858,10 +858,20 @@ bool url_aggregator::parse_ipv4(std::string_view input) {
           " bytes], overlaps with buffer: ",
           helpers::overlaps(input, buffer) ? "yes" : "no");
   ADA_ASSERT_TRUE(validate());
-  const bool trailing_dot = (input.back() == '.');
-  if (trailing_dot) {
-    input.remove_suffix(1);
+
+  // remove all of the last empty dots at the end:
+  bool trailing_dot = false;
+  for (;;) {
+    if (input.back() == '.') {
+      trailing_dot = true;
+      input.remove_suffix(1);
+      if (!input.empty()) {
+        continue;
+      }
+    }
+    break;
   }
+
   size_t digit_count{0};
   int pure_decimal_count = 0;  // entries that are decimal
   uint64_t ipv4{0};
50 changes: 50 additions & 0 deletions tests/basic_tests.cpp
@@ -396,3 +396,53 @@ TYPED_TEST(basic_tests, nodejs_50235) {
   ASSERT_EQ(out->get_href(), "http://test.com:5/path?param=1");
   SUCCEED();
 }
+
+TYPED_TEST(basic_tests, LocalIPv4Addr) {
+  constexpr std::array<std::string_view, 17 * 2> strs{
+      "https://127.0.0.1/",
+      "https://0x7F.1/",
+      "https://0x7f000001",
+      "https://0x0000000007F.0X1",
+      "https://127.0.0x0.1",
+      "https://127.0X0.0x0.1",
+      "https://127.0X0.0x0.0x1",
+      "https://127.0.0x0.0x00000000000000000000000000000000000001",
+      "https://0x7F.0x00000000000000000000000001",
+      "https://0x000000000000000007F.0x00000000000000000000000001",
+      "https://0x000000000000000007F.0.0x00000000000000000000000001",
+      "https://0x7f.0.0.0x1",
+      "https://0x7F.0.0x000.0x1",
+      "https://2130706433",
+      "https://127.1",
+      "https://127.0x00.1",
+      "https://127.0x000000000000000.0.1",
+
+      // with dots at the end:
+      "https://127.0.0.1...../",
+      "https://0x7F.1..../",
+      "https://0x7f000001.",
+      "https://0x0000000007F.0X1...",
+      "https://127.0.0x0.1............",
+      "https://127.0X0.0x0.1.............",
+      "https://127.0X0.0x0.0x1..",
+      "https://127.0.0x0.0x00000000000000000000000000000000000000001....",
+      "https://0x7F.0x00000000000000000000000001.",
+      "https://0x000000000000000007F.0x00000000000000000000000001...",
+      "https://0x000000000000000007F.0.0x00000000000000000000000001.....",
+      "https://0x7f.0.0.0x1..........",
+      "https://0x7F.0.0x000.0x1..............................................",
+      "https://2130706433.................",
+      "https://127.1............",
+      "https://127.0x00.1........................",
+      "https://127.0x000000000000000.0.1................",
+  };
+
+  for (auto const str : strs) {
+    auto out = ada::parse<TypeParam>(str);
+    EXPECT_TRUE(out) << str;
+    if (out) {
+      EXPECT_EQ(out->get_href(), "https://127.0.0.1/") << str;
+      EXPECT_EQ(out->get_hostname(), "127.0.0.1") << str;
+    }
+  }
+}
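
To make it easier to see why every host in the test above is expected to resolve to 127.0.0.1, here is a minimal standalone sketch of WHATWG-style IPv4 number parsing with all trailing dots stripped first. It is not ada's implementation: `parse_part` and `parse_ipv4_sketch` are hypothetical names, and the sketch assumes the usual rules (a `0x`/`0X` prefix means hex, a leading `0` means octal, every part but the last is one byte, and the last part fills the remaining bytes).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Parse one dot-separated part as decimal, octal (leading 0) or hex (0x/0X).
// Returns std::nullopt for an empty part, a bad digit, or a value over 32 bits.
inline std::optional<std::uint64_t> parse_part(std::string_view part) {
  if (part.empty()) return std::nullopt;
  std::uint64_t base = 10;
  if (part.size() >= 2 && part[0] == '0' &&
      (part[1] == 'x' || part[1] == 'X')) {
    base = 16;
    part.remove_prefix(2);
  } else if (part.size() >= 2 && part[0] == '0') {
    base = 8;
    part.remove_prefix(1);
  }
  std::uint64_t value = 0;
  for (const char c : part) {
    std::uint64_t digit = 0;
    if (c >= '0' && c <= '9') digit = static_cast<std::uint64_t>(c - '0');
    else if (c >= 'a' && c <= 'f') digit = static_cast<std::uint64_t>(c - 'a') + 10;
    else if (c >= 'A' && c <= 'F') digit = static_cast<std::uint64_t>(c - 'A') + 10;
    else return std::nullopt;
    if (digit >= base) return std::nullopt;
    value = value * base + digit;  // leading zeros keep value at 0, so no overflow
    if (value > 0xFFFFFFFFull) return std::nullopt;
  }
  return value;
}

// Hypothetical sketch: host text -> 32-bit address, ignoring all trailing dots.
inline std::optional<std::uint32_t> parse_ipv4_sketch(std::string_view host) {
  while (!host.empty() && host.back() == '.') host.remove_suffix(1);
  if (host.empty()) return std::nullopt;

  std::uint64_t parts[4];
  std::size_t count = 0;
  for (;;) {
    const std::size_t dot = host.find('.');
    const auto value = parse_part(host.substr(0, dot));
    if (!value || count == 4) return std::nullopt;  // bad part or too many parts
    parts[count++] = *value;
    if (dot == std::string_view::npos) break;
    host.remove_prefix(dot + 1);
  }

  // Every part but the last is a single byte.
  std::uint64_t ipv4 = 0;
  for (std::size_t i = 0; i + 1 < count; ++i) {
    if (parts[i] > 255) return std::nullopt;
    ipv4 = (ipv4 << 8) | parts[i];
  }
  // The last part fills whatever bytes remain (32 bits for one part, 8 for four).
  const unsigned remaining_bits = 8 * static_cast<unsigned>(5 - count);
  if (parts[count - 1] >= (std::uint64_t{1} << remaining_bits)) return std::nullopt;
  ipv4 = (ipv4 << remaining_bits) | parts[count - 1];
  return static_cast<std::uint32_t>(ipv4);
}
```

Under these assumptions, hosts such as "0x7F.1", "2130706433" and "127.0x00.1", with or without trailing dots, all map to the same 32-bit value as "127.0.0.1", while an empty part in the middle or a fifth part is still rejected.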
