Modify y-016 to properly exclude more than two periods in a row #677

Merged: 1 commit, merged Mar 26, 2024
se/se_epub_lint.py: 2 changes (1 addition, 1 deletion)
@@ -2942,7 +2942,7 @@ def _lint_xhtml_typo_checks(filename: Path, dom: se.easy_xml.EasyXmlTree, file_c
messages.append(LintMessage("y-014", "Possible typo: Unexpected [text].[/] at the end of quotation. Hint: If a dialog tag follows, should this be [text],[/]?", se.MESSAGE_TYPE_WARNING, filename, typos))

# Check for two periods in a row, almost always a typo for one period or a hellip
-typos = [node.to_string() for node in dom.xpath("/html/body//p[re:test(., '\\.\\.[^\\.]')]")]
+typos = [node.to_string() for node in dom.xpath("/html/body//p[re:test(., '[^\\.]\\.\\.[^\\.]')]")]
if typos:
messages.append(LintMessage("y-016", "Possible typo: consecutive periods ([text]..[/]).", se.MESSAGE_TYPE_WARNING, filename, typos))

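To see why the extra leading `[^\\.]` matters, here is a minimal sketch using Python's standard `re` module, assuming it treats this simple pattern the same way as the EXSLT `re:test()` function the XPath calls. The names `OLD_Y016`, `NEW_Y016` and `SAMPLES` are hypothetical, and the sample strings are illustrative snippets modeled on the test fixture, not the linter's own data.

```python
import re

# The y-016 regexes with the Python string escapes resolved
# ('\\.' in the source literal becomes the regex token \.).
OLD_Y016 = re.compile(r"\.\.[^\.]")       # pattern before this PR
NEW_Y016 = re.compile(r"[^\.]\.\.[^\.]")  # pattern after this PR

# Illustrative strings modeled on the test fixture (hypothetical).
SAMPLES = {
    "two periods": "thought of simply as crabs.. Nowhere",
    "three periods": "The ethernet is a hose... To be more specific",
    "five periods": "composed their kitchen..... Then",
}

for label, text in SAMPLES.items():
    old_hit = OLD_Y016.search(text) is not None
    new_hit = NEW_Y016.search(text) is not None
    print(f"{label}: old={old_hit} new={new_hit}")
```

Only the two-period string is flagged by both versions. The old pattern also flags the three- and five-period strings, because the last two periods of the run are followed by a space. The new pattern requires a non-period character on both sides of the pair, so a longer run can never satisfy it. One side effect of that design is that the new pattern will not match exactly two periods at the very start or very end of a paragraph's text (for example `crabs..</p>`).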
tests/lint/typos/y-016/golden/y-016-out.txt: 7 changes (7 additions, 0 deletions)
@@ -0,0 +1,7 @@
y-016 [Manual Review] chapter-1.xhtml Possible typo: consecutive periods (`..`).
<p>We can assume that any instance of a point can be construed as a
themeless fisherman. Framed in a different way, some picked shakes are thought
of simply as crabs.. Nowhere is it disputed that a dinner sees a modem as a
warming customer. The zeitgeist contends that we can assume that any instance of
a kenneth can be construed as an entranced belgian. A rotate is a gaumless
debt.</p>
tests/lint/typos/y-016/in/src/epub/text/chapter-1.xhtml: 19 changes (19 additions, 0 deletions)
@@ -0,0 +1,19 @@
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0" xml:lang="en-GB">
<head>
<title>I</title>
<link href="../css/core.css" rel="stylesheet" type="text/css"/>
<link href="../css/local.css" rel="stylesheet" type="text/css"/>
</head>
<body epub:type="bodymatter z3998:fiction">
<section id="chapter-1" epub:type="chapter">
<h2 epub:type="ordinal z3998:roman">I</h2>
<!-- EXCLUSION 1, three periods in a row -->
<p>The ethernet is a hose... To be more specific, an unpreached violin without hubs is truly a railway of obverse deads. Authors often misinterpret the guatemalan as a flamy conifer, when in actuality it feels more like an unbacked snowman.</p>
<!-- EXCLUSION 2, five periods in a row (anything more than two will be excluded) -->
<p>Few can name a combless cast that isn't a pretend ankle. The zeitgeist contends that the seat of a twilight becomes a linty case. They were lost without the engraved sauce that composed their kitchen.....</p>
<!-- FAIL 1, two periods in a row -->
<p>We can assume that any instance of a point can be construed as a themeless fisherman. Framed in a different way, some picked shakes are thought of simply as crabs.. Nowhere is it disputed that a dinner sees a modem as a warming customer. The zeitgeist contends that we can assume that any instance of a kenneth can be construed as an entranced belgian. A rotate is a gaumless debt.</p>
</section>
</body>
</html>
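A rough end-to-end check against this fixture can be sketched with the standard library's `xml.etree.ElementTree` in place of the `se.easy_xml` XPath call. This is an approximation under stated assumptions, not how the linter actually runs: the fixture is abridged (the `epub:` attributes, comments and some sentences are dropped), `"".join(p.itertext())` stands in for the XPath string value of each `<p>`, and `y016_candidates` and `FIXTURE` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
NEW_Y016 = re.compile(r"[^\.]\.\.[^\.]")  # pattern after this PR

# Abridged, hypothetical copy of chapter-1.xhtml (epub: attributes removed).
FIXTURE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<section id="chapter-1">
<p>The ethernet is a hose... To be more specific, an unpreached violin without hubs is truly a railway of obverse deads.</p>
<p>They were lost without the engraved sauce that composed their kitchen.....</p>
<p>We can assume that any instance of a point can be construed as a themeless fisherman. Framed in a different way, some picked shakes are thought of simply as crabs.. Nowhere is it disputed that a dinner sees a modem as a warming customer.</p>
</section>
</body>
</html>"""


def y016_candidates(xhtml: str) -> list[str]:
    """Return the text of every <p> under <body> that the new y-016 pattern matches."""
    root = ET.fromstring(xhtml)
    body = root.find(f"{{{XHTML_NS}}}body")
    hits = []
    if body is None:
        return hits
    # Approximates /html/body//p: every <p> at any depth below <body>.
    for p in body.iter(f"{{{XHTML_NS}}}p"):
        text = "".join(p.itertext())  # stand-in for the XPath string value
        if NEW_Y016.search(text):
            hits.append(text)
    return hits


for hit in y016_candidates(FIXTURE):
    print(hit)
```

Only the two-period paragraph (FAIL 1) should be printed, which lines up with the single paragraph recorded in the golden `y-016-out.txt`.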