Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better checking for external links #619

Merged
merged 10 commits into from
Oct 22, 2024

Conversation

maddenp-noaa
Copy link
Collaborator

Synopsis

Fixes #614 by replacing link checking using the built-in Sphinx link checker, which operates on the RST files, with a more capable external link checker that operates on the generated HTML files. This results in raw HTML links in e.g. <a> or, pertinently, <iframe> tags being checked the same as links generated by Sphinx from RST code.

As a test, I introduced a typo into one of the URLs defined in docs/confg.py:

$ git diff -U0
diff --git a/docs/conf.py b/docs/conf.py
index 8e0e67f9..9cb12f18 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -68 +68 @@ extlinks = {
-    "ww3": ("https://polar.ncep.noaa.gov/waves/wavewatch/%s", "%s"),
+    "ww3": ("https://polar.ncep.noaa.gov/waves/wavewitch/%s", "%s"),

The typo is detected and the checker exits with non-zero status after displaying this error:

URL        `https://polar.ncep.noaa.gov/waves/wavewitch/'
Name       `Wave Watch III'
Parent URL file:///home/maddenp/git/uwtools/docs/build/html/sections/user_guide/cli/drivers/ww3/index.html, line 117, col 123
Real URL   https://polar.ncep.noaa.gov/waves/wavewitch/
Check time 0.944 seconds
Size       214B
Result     Error: 404 Not Found

I also updated a couple Makefiles to make generation of CLI example output less noisy, reducing the volume of output from make docs. I haven't found that additional output useful in the past.

Finally, and unfortunately, note that this only partially helps with the core issue reported in #614 -- that links to YouTube videos were not being checked. They will be checked now, but not every incorrect URL is considered an error by YouTube. For example:

$ curl -s --head https://www.youtube.com/embed/foo | grep ^HTTP
HTTP/2 200

Here, although no video named foo is available, youtube.com returns an HTTP 200 success code. Open that link in your browser and you'll see that it returns content.

On the other hand, a typo in an earlier part of a YouTube URL does result in an error:

$ curl -s --head https://www.youtube.com/embedZZZ/foo | grep ^HTTP
HTTP/2 404 

This is down to YouTube's API design: They have decided not to send an HTTP error response even if a requested video does not exist.

Nevertheless, I think the changes in the PR are helpful and worth moving forward with.

Type

  • Documentation
  • Enhancement (adds new functionality)

Impact

  • This is a non-breaking change (existing functionality continues to work as expected)

Checklist

  • I have added myself and any co-authors to the PR's Assignees list.
  • I have reviewed the documentation and have made any updates necessitated by this change.

@maddenp-noaa maddenp-noaa self-assigned this Oct 22, 2024
docs/Makefile Show resolved Hide resolved
docs/environment.yml Outdated Show resolved Hide resolved
docs/index.rst Show resolved Hide resolved
docs/sections/user_guide/cli/Makefile Show resolved Hide resolved
Copy link
Contributor

@WeirAE WeirAE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice & clean, looks good

Copy link
Collaborator

@christinaholtNOAA christinaholtNOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

docs/Makefile Show resolved Hide resolved
docs/environment.yml Outdated Show resolved Hide resolved
docs/index.rst Show resolved Hide resolved
docs/deps Show resolved Hide resolved
docs/environment.yml Show resolved Hide resolved
@maddenp-noaa maddenp-noaa merged commit 406750c into ufs-community:main Oct 22, 2024
2 checks passed
@maddenp-noaa maddenp-noaa deleted the gh614-external-links branch October 22, 2024 23:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enabler] Learn how to check links included in raw html
5 participants