-
-
Notifications
You must be signed in to change notification settings - Fork 320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix srcset attribute parsing according to the specifications #399
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Andriy,
thank you for this! I asked for a couple of cosmetic changes, will merge after that for the next release.
src/html.rs
Outdated
@@ -28,6 +28,8 @@ struct SrcSetItem<'a> { | |||
descriptor: &'a str, | |||
} | |||
|
|||
const WHITESPACES: &'static [char] = &['\t', '\n', '\x0c', '\r', ' ']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need \x0b
here by any chance as well? Python's string.whitespace
includes \x0b
and https://doc.rust-lang.org/reference/whitespace.html is even wider, but the reference spec https://infra.spec.whatwg.org/#ascii-whitespace doesn't include it, so we're probably fine.
I also think it needs to be renamed to something like SRCSET_WHITESPACES
since it's only used for srcset, and the file is html.rs (all things related to HTML).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @snshn,
so we're probably fine.
seems yes, \x0b
is out of scope HTML5 spec.
I also think it needs to be renamed to something like SRCSET_WHITESPACES since it's only used for srcset, and the file is html.rs (all things related to HTML).
But on the other hand, while it's currently used ONLY for srcset
, the WHITESPACES
list makes sense for whole HTML5 spec. I would name it HTML5_WHITESPACES
, but as you mentioned it's already defined inside html.rs
, I ignored the prefix. So if you still insist on renaming, let me know I'll do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's part of the HTML5 spec, I guess it can be just WHITESPACES
in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move it below ICON_VALUES
though, to keep it alphabetical?
src/html.rs
Outdated
url_end -= 1; | ||
} | ||
offset = url_end; | ||
// If the URL wasn't terminated by a U+002C COMMA character (,) there may also be a descriptor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think saying just "comma" should be enough here
nit: and let's remove the period at the end, since other comments don't have it
tests/html/embed_srcset.rs
Outdated
} | ||
|
||
#[test] | ||
fn the_latest_without_descriptor() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think last_without_descriptor()
might be a better name
@@ -145,7 +195,7 @@ mod failing { | |||
|
|||
assert_eq!( | |||
embedded_css, | |||
format!("{} 1x, {} 2x,", EMPTY_IMAGE_DATA_URL, EMPTY_IMAGE_DATA_URL), | |||
format!("{} 1x, {} 2x", EMPTY_IMAGE_DATA_URL, EMPTY_IMAGE_DATA_URL), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, now it automatically fixes the format 👏🏻
src/html.rs
Outdated
if url_start >= size { | ||
break; | ||
} | ||
// A valid non-empty URL that does not start or end with a U+002C COMMA character (,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
saying just "comma" should be enough
src/html.rs
Outdated
|
||
while offset < size { | ||
let mut has_descriptor = true; | ||
// Zero or more ASCII whitespace + skip leading U+002C COMMA character (,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just "comma" should be enough. And "ASCII whitespace" is too specific, we have those whitespaces defined in the array, we have that part described in the file already, just "whitespace" would be easier to read.
Something like this:
Zero or more whitespaces + skip leading comma
According to the
srcset
attribute specificationCurrent implementation doesn't process
srcset
attribute properly for URLs separated by,
ONLY.This problem also was mentioned in #339