Meta: Improve results on Readability.js test suite #61

vkryukov · 2024-11-12T00:52:31Z

While readability has a reasonable test suite, it pales in comparison to the Readability.js test suite with its 124 test cases.

With the help of a small Elixir program, we're going to discover the main differences in behavior between us and them. This issue will serve as a Meta issue to keep track of all bug fixes and reasonable improvement opportunities.

Valian · 2024-11-12T10:00:32Z

The best approach would be to "steal" the test suite from readability and try to make it pass with an Elixir version 🤔 It would require quite a lot of changes to the underlying implementation, but should be possible.

vkryukov · 2024-11-12T15:27:35Z

I agree - ideally, we would just use their test suite unchanged (or programmatically modified) as part of the readability test suite.

There would be a few challenges, though: for example, they wrap their results in <div id="readability-page-1" class="page">; they keep the full page, including the title, which the current readability logic removes from the page; and h1 in the input sometimes becomes h2 in the output.
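To make the "programmatically modified" idea concrete, here is a rough sketch of how two of these differences could be normalized away before comparing outputs. It is written in Python purely for illustration, not as a proposed implementation. The function names (`unwrap_page_div`, `demote_h1`, `normalize`) are hypothetical, the regex approach assumes the wrapper div appears exactly as quoted above, and the title difference is not handled at all.

```python
import re

# Matches the outer wrapper exactly as quoted in this thread (assumption:
# the attributes always appear in this order and form).
WRAPPER_RE = re.compile(
    r'^\s*<div id="readability-page-1" class="page">(.*)</div>\s*$',
    re.DOTALL,
)
# Matches the start of an opening or closing h1 tag.
H1_RE = re.compile(r"<(/?)h1\b", re.IGNORECASE)


def unwrap_page_div(html: str) -> str:
    """Strip the outer readability-page-1 wrapper if it is present."""
    match = WRAPPER_RE.match(html)
    return match.group(1).strip() if match else html.strip()


def demote_h1(html: str) -> str:
    """Rewrite h1 tags as h2 so heading-level differences are ignored."""
    return H1_RE.sub(r"<\1h2", html)


def normalize(html: str) -> str:
    """Apply both normalizations; meant to run on expected AND actual output."""
    return demote_h1(unwrap_page_div(html))
```

Running `normalize` on both the expected file and our own output would let a test compare the two without failing on the wrapper or on h1 vs. h2 alone.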

My current approach was to use it for inspiration to find new bugs and edge cases, and then manually add (possibly adjusted) tests to improve our own test suite.

Valian mentioned this issue Nov 12, 2024

Comments sections with many URLs is mistaken for the article text #63

Open
