PoorlyDrawnLines: Fix after site redesign #329

Open
wants to merge 1 commit into
base: master

Contributor

@vemek vemek commented Jul 1, 2024

Hi folks. Poorly Drawn Lines seems to have had a recent redesign and is no longer working. This is a WIP fix, but I've hit an issue. Some pages are non-comic, e.g. promos for the TV show of the comic. These currently generate an error and bail out. I haven't found a way to suppress this for known pages without comics. Any advice here is appreciated!

A few other notes:

  • The redesign uses WordPress, but doesn't fit any existing WP scrapers
  • There are some recent pages with 2 comic images, so this now sets multipleImagesPerStrip
  • The comic now loads with dynamic resolution using inline SVG, so I'm checking data-src instead
  • I'd be happy for feedback on specificity vs. robustness. The DOM tree and classes aren't always consistent. I've tried to work around this without pulling in other images. Still, it feels a little flaky.
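
To illustrate the data-src and multi-image points above, here is a minimal standalone sketch. It is not the code from this PR and does not use dosage's scraper classes. It assumes the lazy-loaded images keep their real URL in `data-src` and an inline SVG placeholder in `src`; the class name, function name, and sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class ComicImageParser(HTMLParser):
    """Collect image URLs from <img> tags, preferring data-src over src."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # Assumption: the real image lives in data-src; fall back to src.
        url = attr_map.get("data-src") or attr_map.get("src") or ""
        # Ignore inline placeholders such as data:image/svg+xml URIs.
        if url and not url.startswith("data:"):
            self.images.append(url)


def find_comic_images(html):
    """Return every usable image URL on the page, in document order."""
    parser = ComicImageParser()
    parser.feed(html)
    parser.close()
    return parser.images


# Hypothetical markup for a strip made of two images.
SAMPLE_PAGE = """
<div class="entry-content">
  <img src="data:image/svg+xml,%3Csvg%3E%3C/svg%3E"
       data-src="https://example.com/comic-part-1.png">
  <img src="data:image/svg+xml,%3Csvg%3E%3C/svg%3E"
       data-src="https://example.com/comic-part-2.png">
</div>
"""

if __name__ == "__main__":
    for image_url in find_comic_images(SAMPLE_PAGE):
        print(image_url)
```

Returning a list rather than a single URL is what makes pages with two comic images work. The flakiness concern is still there: this sketch picks up every `<img>` on the page, so a real scraper would need a narrower selector.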


codecov bot commented Jul 1, 2024

Codecov Report

Attention: Patch coverage is 38.88889% with 11 lines in your changes missing coverage. Please review.

Project coverage is 81.77%. Comparing base (f76061e) to head (5c27aab).
Report is 40 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| dosagelib/plugins/p.py | 38.88% | 11 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #329      +/-   ##
==========================================
- Coverage   82.02%   81.77%   -0.26%     
==========================================
  Files          79       79              
  Lines        6609     6573      -36     
  Branches      525      529       +4     
==========================================
- Hits         5421     5375      -46     
- Misses       1069     1079      +10     
  Partials      119      119              


@vemek vemek force-pushed the vemek/poorly-drawn-lines-site-redesign branch from 0cb3794 to 2586aac Compare July 1, 2024 20:33
Contributor Author

vemek commented Jul 1, 2024

Update: The archive on this site is a bit of a mess, but this is working for now and is ready for review. A few dozen comics are no longer available, but we also now correctly pull multi-image pages, so it seems to be a net positive.

I've implemented a skip list with shouldSkipUrl to avoid the issue with non-comic pages. I've also found that the back links are broken at multiple points in the archive. For example, "So Much" used to link to this page, but now redirects to the main page instead, causing a loop. It looks like any comic that had a number instead of a name in the title is now broken, e.g. 8198, 8186, 8177.

I've hard-coded these to skip the broken redirects and go to the next comic that is still available. I assume these will get fixed at some point as only ~280 of ~1500 posts are now accessible using the normal site back button before it breaks. That said, this is working and I've confirmed that it pulls the entire back catalogue as best we can.
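
For reviewers, the skip list and the hard-coded back links can be sketched roughly as below. This is a standalone illustration, not the code in this PR: it does not call dosage's `shouldSkipUrl` hook, every URL is a made-up example.com address, and a plain dict stands in for the HTTP requests.

```python
# All URLs below are hypothetical placeholders, not real site pages.
HOME = "https://example.com/"

# Known pages without a comic image (e.g. promos).
SKIP_URLS = frozenset({
    "https://example.com/comic/tv-show-promo/",
})

# Pages whose back link is broken, mapped to the next comic still available.
PREV_OVERRIDES = {
    "https://example.com/comic/c/": "https://example.com/comic/a/",
}


def should_skip_url(url):
    """Return True for known pages that carry no comic image."""
    return url in SKIP_URLS


def resolve_prev(url, scraped_prev):
    """Prefer a hard-coded previous URL over the one scraped from the page."""
    return PREV_OVERRIDES.get(url, scraped_prev)


def walk_archive(start, prev_links):
    """Follow back links from start and return the comic pages to fetch.

    prev_links maps each page to the back link found on it (None at the
    first comic). The seen set stops the walk if a redirect to the main
    page would otherwise send it round in a loop.
    """
    fetched, seen = [], set()
    url = start
    while url is not None and url not in seen:
        seen.add(url)
        if not should_skip_url(url):
            fetched.append(url)
        url = resolve_prev(url, prev_links.get(url))
    return fetched


# Simulated site: page "c" has a broken back link that lands on the main
# page, and the main page points at the latest comic again.
PREV_LINKS = {
    "https://example.com/comic/d/": "https://example.com/comic/tv-show-promo/",
    "https://example.com/comic/tv-show-promo/": "https://example.com/comic/c/",
    "https://example.com/comic/c/": HOME,
    HOME: "https://example.com/comic/d/",
    "https://example.com/comic/a/": None,
}

if __name__ == "__main__":
    for page in walk_archive("https://example.com/comic/d/", PREV_LINKS):
        print(page)
```

The override table has to be maintained by hand, so it should shrink if the site's back links are repaired. The `seen` guard is an extra safety net in this sketch and is not something the PR description mentions.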

@vemek vemek marked this pull request as ready for review July 1, 2024 20:34
@vemek vemek force-pushed the vemek/poorly-drawn-lines-site-redesign branch from 2586aac to 5c27aab Compare July 1, 2024 21:48