Add support for Text fragment feature (#1545) #1600

Open
thiru-appitap wants to merge 6 commits into master

thiru-appitap

This pull request implements the Text Fragment feature. The implementation follows the published URL Fragment Text Directives specification (https://wicg.github.io/scroll-to-text-fragment/).

(base) lychee % lychee -vv --include-text-fragments https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments
...
[DEBUG] tdirective: "text=From%20the%20foregoing%20remarks%20we%20may%20gather%20an%20idea%20of%20the%20importance"
[DEBUG] status: Completed
[DEBUG] result: "From the foregoing remarks we may gather an idea of the importance"
[200] https://mdn.github.io/css-examples/target-text/index.html#:~:text=From%20the%20foregoing%20remarks%20we%20may%20gather%20an%20idea%20of%20the%20importance
[DEBUG] tdirective: "text=linked%20URL,-'s%20format"
[DEBUG] status: Completed
[DEBUG] result: "linked URL"
[DEBUG] tdirective: "text=Deprecated-,attributes,attribute"
[DEBUG] status: Completed
[DEBUG] result: "attributes charset Deprecated Hinted at the character encoding of the linked URL. Note:This attribute"
[200] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#:~:text=linked%20URL,-'s%20format&text=Deprecated-,attributes,attribute
[DEBUG] tdirective: "text=downgrade:-,The%20Referer,be%20sent,-to%20origins"
[DEBUG] status: Completed
[DEBUG] result: "The Referer header will not be sent"
[200] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#:~:text=downgrade:-,The%20Referer,be%20sent,-to%20origins
[DEBUG] tdirective: "text=linked%20URL,defining%20a%20value"
[DEBUG] status: Completed
[DEBUG] result: "linked URL as a download. Can be used with or without a filename value: Without a value, the browser will suggest a filename/extension, generated from various sources: The Content-Disposition HTTP header The final segment in the URL path The media type (from the Content-Type header, the start of a data: URL, or Blob.type for a blob: URL) filename: defining a value"
[200] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#:~:text=linked%20URL,defining%20a%20value
...

If the fragment directive is not found, a TextDirectiveNotFound error will be returned.

The following changes are complete:

  1. The fragment directive parser uses fancy-regex
  • this package was added as a dependency
  2. A new flag, include-text-fragments, is added to support the feature
  • this is a deviation from the original feature request (which asked for using the text-fragments flag itself)
  3. The fragment (text) directive feature is tested on LTR sites only
  4. A new UrlExt trait is implemented to extend Url with fragment directive support
  5. Multiple text fragment directives are supported (for example, #:~:text=linked%20URL,-'s%20format&text=Deprecated-,attributes,attribute)
  6. Tests are added to validate the feature
  7. cargo clippy and cargo test were executed
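
To illustrate the extension trait and the multiple-directive handling listed above, here is a minimal sketch that uses only the standard library. It is not the PR's code: the PR's `UrlExt` extends `url::Url`, whereas this sketch extends `str` to stay dependency-free, and the names `FragmentDirectiveExt`, `fragment_directive`, and `text_directives` are assumptions made for illustration.

```rust
/// Delimiter that separates a regular fragment from the fragment directive.
const FRAGMENT_DIRECTIVE_DELIMITER: &str = ":~:";
/// Prefix that marks a text directive inside the fragment directive.
const TEXT_DIRECTIVE_PREFIX: &str = "text=";

/// Hypothetical extension trait (the PR's `UrlExt` extends `url::Url` instead).
trait FragmentDirectiveExt {
    /// Returns the part of the URL after `#...:~:`, if present.
    fn fragment_directive(&self) -> Option<&str>;
    /// Returns the raw (still percent-encoded) value of every `text=` directive.
    fn text_directives(&self) -> Vec<&str>;
}

impl FragmentDirectiveExt for str {
    fn fragment_directive(&self) -> Option<&str> {
        // Everything after the first '#' is the fragment.
        let (_, fragment) = self.split_once('#')?;
        // The directive starts after the first ":~:" inside the fragment.
        let (_, directive) = fragment.split_once(FRAGMENT_DIRECTIVE_DELIMITER)?;
        Some(directive)
    }

    fn text_directives(&self) -> Vec<&str> {
        self.fragment_directive()
            .map(|directive| {
                // Directives are separated by '&'; keep only the `text=` ones.
                directive
                    .split('&')
                    .filter_map(|part| part.strip_prefix(TEXT_DIRECTIVE_PREFIX))
                    .collect()
            })
            .unwrap_or_default()
    }
}
```

Calling `text_directives()` on the example URL from item 5 yields two raw values, `linked%20URL,-'s%20format` and `Deprecated-,attributes,attribute`, which would then be parsed and percent-decoded individually.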

@thiru-appitap
Author

I missed running clippy across the test modules. The related lint failures are now fixed, and the PR is ready for review!

);
match url {
    Ok(url) => {
        eprintln!(
Member

This could be an assertion

Comment on lines +108 to +125
let mut status = Status::new(&response, self.accepted.clone());
if self.validate_text_fragments && has_fragment_directive {
    if let Ok(res) = response.text().await {
        info!("checking fragment directive...");
        if let Some(fd) = req_url.fragment_directive() {
            info!("directive: {:?}", fd.text_directives);
            match fd.check(&res) {
                Ok(stat) => {
                    status = stat;
                }
                Err(e) => {
                    return e.into();
                }
            }
        }
    }
}
status
Member

Maybe we can move that into a function/method?

Member

some tests for that part would also be nice
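
One possible shape for such an extraction, sketched with stand-in types so that it compiles on its own. `Status`, `FragmentDirective`, and `status_with_fragment_check` here are simplified placeholders and not lychee's real definitions; in particular, the real check is async and matches text far more carefully than the `contains` call used below.

```rust
/// Stand-in for lychee's `Status`; only the variants this sketch needs.
#[derive(Debug, PartialEq)]
enum Status {
    Ok,
    Error(String),
}

/// Stand-in for the PR's fragment directive type.
struct FragmentDirective {
    text_directives: Vec<String>,
}

impl FragmentDirective {
    /// Simplified check: every directive must appear verbatim in the body.
    fn check(&self, body: &str) -> Result<Status, String> {
        match self
            .text_directives
            .iter()
            .find(|d| !body.contains(d.as_str()))
        {
            Some(missing) => Err(format!("text directive not found: {missing}")),
            None => Ok(Status::Ok),
        }
    }
}

/// Hypothetical extracted helper: returns `initial` unless validation is
/// enabled and both a directive and a response body are available.
fn status_with_fragment_check(
    initial: Status,
    validate: bool,
    directive: Option<&FragmentDirective>,
    body: Option<&str>,
) -> Status {
    if !validate {
        return initial;
    }
    let (Some(fd), Some(body)) = (directive, body) else {
        return initial;
    };
    match fd.check(body) {
        Ok(status) => status,
        Err(e) => Status::Error(e),
    }
}
```

Because the helper takes plain values instead of a live response, each branch (validation disabled, no directive, directive found, directive missing) can be covered by a small unit test without network access.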

assert!(res.status().is_success());

// start with suffix
println!("\ntesting start with suffix...");
Member

you'll probably remove the println!s right?


use crate::types::TextDirective;

const BLOCK_ELEMENTS: &[&str] = &[
Member

This is a long list. Does that mean we'd have to maintain the HTML keywords here? Maybe we can avoid that as it would be an uphill battle.

Member

I like that this functionality is isolated in its own module. But it's a looot of code. 😅 Not sure what to do here, but at least the ratio of code/tests could be improved.

Member

Perhaps we could move it into a separate crate or use an upstream crate for that? I think it would be a nice library to maintain individually as more applications could profit from it

let mut all_directives_found = false;
let directive = td.directive.borrow();

'directive_loop: while !all_directives_found {
Member

The labels make it quite hard to read. Have you considered any alternatives?
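
One common alternative to labeled loops is to move the inner search into a small function and let `Option` do the short-circuiting. The sketch below shows the idea on a deliberately simplified problem (checking that several parts occur in order in a text); it does not reproduce the PR's matching logic, and `find_after` and `parts_in_order` are hypothetical names.

```rust
/// Finds `needle` in `haystack` at or after byte offset `from` and returns
/// the byte offset just past the match.
fn find_after(haystack: &str, needle: &str, from: usize) -> Option<usize> {
    haystack
        .get(from..)?
        .find(needle)
        .map(|pos| from + pos + needle.len())
}

/// Returns true if every part occurs in `text`, in the given order and
/// without overlapping. `try_fold` stops at the first part that is missing,
/// which replaces a labeled `break` out of nested loops.
fn parts_in_order(text: &str, parts: &[&str]) -> bool {
    parts
        .iter()
        .try_fold(0usize, |cursor, part| find_after(text, part, cursor))
        .is_some()
}
```

The control flow that previously needed a label becomes an early `None`, so the outer loop condition and its flag variable are no longer required.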

pub(crate) const FRAGMENT_DIRECTIVE_DELIMITER: &str = ":~:";
pub(crate) const TEXT_DIRECTIVE_DELIMITER: &str = "text=";

pub(crate) const TEXT_DIRECTIVE_REGEX: &str = r"(?s)^text=(?:\s*(?P<prefix>[^,&-]*)-\s*[,$]?\s*)?(?:\s*(?P<start>[^-&,]*)\s*)(?:\s*,\s*(?P<end>[^,&-]*)\s*)?(?:\s*,\s*-(?P<suffix>[^,&-]*)\s*)?$";
Member

Where does that regex come from? Did you write it yourself? If there's an "official" regex for those text fragments, we could perhaps add a link to the reference.
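
For comparison, the specification describes the directive syntax as `text=[prefix-,]start[,end][,-suffix]` and parses it by splitting on commas instead of using a regex. The following is a minimal sketch that loosely follows those steps with only the standard library. `ParsedDirective`, `percent_decode`, and `parse_text_directive` are hypothetical names, not the PR's API, and the sketch skips some validation the specification requires.

```rust
/// Parsed pieces of one text directive, all percent-decoded.
#[derive(Debug, Default, PartialEq)]
struct ParsedDirective {
    prefix: Option<String>,
    start: String,
    end: Option<String>,
    suffix: Option<String>,
}

/// Converts one ASCII hex digit to its value.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decodes `%XX` escapes; malformed escapes are copied through unchanged.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Parses `text=[prefix-,]start[,end][,-suffix]` by splitting on commas.
fn parse_text_directive(directive: &str) -> Option<ParsedDirective> {
    let value = directive.strip_prefix("text=")?;
    let mut tokens: Vec<&str> = value.split(',').collect();
    if tokens.len() > 4 {
        return None;
    }
    let mut parsed = ParsedDirective::default();
    // A first token ending in '-' is the prefix.
    if let Some(prefix) = tokens.first().and_then(|t| t.strip_suffix('-')) {
        parsed.prefix = Some(percent_decode(prefix));
        tokens.remove(0);
    }
    // A last token starting with '-' is the suffix.
    if let Some(suffix) = tokens.last().and_then(|t| t.strip_prefix('-')) {
        parsed.suffix = Some(percent_decode(suffix));
        tokens.pop();
    }
    // What remains must be `start` or `start,end`.
    match tokens.as_slice() {
        [start] if !start.is_empty() => parsed.start = percent_decode(start),
        [start, end] if !start.is_empty() => {
            parsed.start = percent_decode(start);
            parsed.end = Some(percent_decode(end));
        }
        _ => return None,
    }
    Some(parsed)
}
```

Applied to `text=downgrade:-,The%20Referer,be%20sent,-to%20origins` from the log output in the description, this yields the prefix `downgrade:`, start `The Referer`, end `be sent`, and suffix `to origins`.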

@@ -23,10 +26,61 @@ pub(crate) fn find_links(input: &str) -> impl Iterator<Item = linkify::Link> {
LINK_FINDER.links(input)
}

/// Fragment Directive feature trait
/// we will use the extension trait pattern to extend the Url to support Text Fragment feature
Member

Nice idea!

@@ -23,10 +26,61 @@ pub(crate) fn find_links(input: &str) -> impl Iterator<Item = linkify::Link> {
LINK_FINDER.links(input)
}

/// Fragment Directive feature trait
/// we will use the extension trait pattern to extend the Url to support Text Fragment feature
Member

Suggested change
- /// we will use the extension trait pattern to extend the Url to support Text Fragment feature
+ /// We will use the extension trait pattern to extend [`url::Url`] to support the text fragment feature

/// Fragment Directive feature trait
/// we will use the extension trait pattern to extend the Url to support Text Fragment feature
pub(crate) trait UrlExt {
/// Checks if the url has a fragment and if the fragment is has the fragment directive delimiter embedded
Member

Suggested change
- /// Checks if the url has a fragment and if the fragment is has the fragment directive delimiter embedded
+ /// Checks if the url has a fragment and if the fragment has the fragment directive delimiter embedded

}

impl UrlExt for Url {
/// Returns whether the URL has fragment directive or not
Member

Suggested change
- /// Returns whether the URL has fragment directive or not
+ /// Checks whether the URL has fragment directive or not

Member

@mre left a comment

Left some comments. I like the overall structure. Good work so far!

@thiru-appitap
Author

Left some comments. I like the overall structure. Good work so far!

@mre
I'll go through each of the comments and work to address them. Thank you!
