Add lint tests y-017 thru y-020
vr8hub authored and acabal committed May 13, 2024
1 parent 0a104ec commit 87e1830
Showing 12 changed files with 353 additions and 3 deletions.
6 changes: 3 additions & 3 deletions se/se_epub_lint.py
@@ -2995,19 +2995,19 @@ def _lint_xhtml_typo_checks(filename: Path, dom: se.easy_xml.EasyXmlTree, file_c
if typos:
messages.append(LintMessage("y-018", "Possible typo: [text]‘[/] followed by space.", se.MESSAGE_TYPE_WARNING, filename, typos))

-# Check for closing rdquo without opening ldquo. We ignore blockquotes because they usually have unique quote formatting.
+# Check for closing rdquo without opening ldquo.
# Remove tds in case rdquo means "ditto mark"
typos = regex.findall(r"”[^“‘]+?”", regex.sub(r"<td[^>]*?>[”\s]+?(<a .+?epub:type=\"noteref\">.+?</a>)?</td>", "", file_contents), flags=regex.DOTALL)

# We create a filter to try to exclude nested quotations
# Remove tags in case they're enclosing punctuation we want to match against at the end of a sentence.
-typos = [match for match in typos if not regex.search(r"(?:[\.!\?;…—]|”\s)’\s", se.formatting.remove_tags(match))]
+typos = [match for match in typos if not regex.search(r"(?:[.!?;…—]|”\s)’\s", se.formatting.remove_tags(match))]

# Try some additional matches before adding the lint message
# Search for <p> tags that have an ending closing quote but no opening quote; but exclude <p>s that are preceded by a <blockquote>
# or that have a <blockquote> ancestor, because that may indicate that the opening quote is elsewhere in the quotation.
for node in dom.xpath("//p[re:test(., '^[^“]+”') and not(./preceding-sibling::*[1][name() = 'blockquote']) and not(./ancestor::*[re:test(@epub:type, 'z3998:(poem|verse|song|hymn)')]) and not(./ancestor::blockquote)]"):
-typos.append(node.to_string()[-20:])
+typos.append(node.to_string())

if typos:
messages.append(LintMessage("y-019", "Possible typo: [text]”[/] without opening [text]“[/].", se.MESSAGE_TYPE_WARNING, filename, typos))
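The regex half of the y-019 check above can be tried in isolation. The following is a minimal sketch, not the project's code: it swaps in the standard-library `re` module for `regex`, and `strip_tags` and `find_unopened_rdquo` are hypothetical names, with `strip_tags` a crude stand-in for `se.formatting.remove_tags`.

```python
import re


def strip_tags(text: str) -> str:
    # Assumption: a crude stand-in for se.formatting.remove_tags(), for illustration only.
    return re.sub(r"<[^>]+>", "", text)


def find_unopened_rdquo(file_contents: str) -> list[str]:
    # Hypothetical helper name; the patterns are the ones shown in the hunk above.
    # Remove table cells that hold only rdquo/whitespace, since rdquo may be a ditto mark there.
    without_dittos = re.sub(
        r"<td[^>]*?>[”\s]+?(<a .+?epub:type=\"noteref\">.+?</a>)?</td>", "", file_contents
    )

    # An rdquo, then text containing no ldquo or lsquo, then another rdquo.
    candidates = re.findall(r"”[^“‘]+?”", without_dittos, flags=re.DOTALL)

    # Drop candidates that look like the tail of a nested quotation:
    # sentence-ending punctuation (or rdquo plus whitespace), then rsquo, then whitespace.
    return [
        match
        for match in candidates
        if not re.search(r"(?:[.!?;…—]|”\s)’\s", strip_tags(match))
    ]
```

A paragraph with two closing quotes and no opening quote between them is reported, while ditto-mark cells and nested quotations are not.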
9 changes: 9 additions & 0 deletions tests/lint/typos/y-017/golden/y-017-out.txt
@@ -0,0 +1,9 @@
y-017 [Manual Review] chapter-1.xhtml Possible typo: `“` followed by space.
<p>Fail can assume that any instance of a competition can be construed
as a crunchy poppy. We know that an unreined thunderstorm's vinyl comes with it
the thought that the soaring cartoon is a truck. “ The judges could be said to
resemble stoneware distributions. It's an undeniable fact, really; few can name
a carking pencil that isn't a stylish twig.”</p>
<p>“ Fail pimples could be said to resemble grumpy cries. We can assume
that any instance of a banker can be construed as a clitic spruce. Blankets are
unpurged wings.</p>
21 changes: 21 additions & 0 deletions tests/lint/typos/y-017/in/src/epub/text/chapter-1.xhtml
@@ -0,0 +1,21 @@
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0" xml:lang="en-GB">
<head>
<title>I</title>
<link href="../css/core.css" rel="stylesheet" type="text/css"/>
<link href="../css/local.css" rel="stylesheet" type="text/css"/>
</head>
<body epub:type="bodymatter z3998:fiction">
<section id="chapter-1" epub:type="chapter">
<h2 epub:type="ordinal z3998:roman">I</h2>
<!-- EXCLUSION 1, ldquo followed by whitespace (hrsp) followed by lsquo -->
<p>A stream is a stem's sand. Nowhere is it disputed that starters are flowered ex-husbands. This could be, or perhaps a thousandth bibliography is a silica of the mind. “ ‘The hyenas could be said to resemble fickle vermicellis. Before carriages, screwdrivers were only farms.</p>
<!-- EXCLUSION 2, ldquo followed by whitespace followed by an elided word (rsquo) -->
<p>“ ’Tis an undeniable fact, really; the thistles could be said to resemble prunted astronomies.” Far from the truth, a chain is the use of a pyjama. The friend is a break. Authors often misinterpret the cellar as an otic letter, when in actuality it feels more like an unmown basin.</p>
<!-- FAIL 1, ldquo followed by whitespace (space) followed by not a l/rsquo -->
<p>Fail can assume that any instance of a competition can be construed as a crunchy poppy. We know that an unreined thunderstorm's vinyl comes with it the thought that the soaring cartoon is a truck. “ The judges could be said to resemble stoneware distributions. It's an undeniable fact, really; few can name a carking pencil that isn't a stylish twig.”</p>
<!-- FAIL 2, ldquo followed by whitespace (hrsp) followed by not a l/rsquo -->
<p>“ Fail pimples could be said to resemble grumpy cries. We can assume that any instance of a banker can be construed as a clitic spruce. Blankets are unpurged wings.</p>
</section>
</body>
</html>
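The exclusion and fail cases in this test file suggest the shape of the y-017 rule: an ldquo followed by whitespace is flagged unless the next character is an lsquo or rsquo. The sketch below is inferred from the test comments only; the pattern and the name `has_ldquo_followed_by_space` are assumptions, not the expression used in `se_epub_lint.py`.

```python
import re

# Assumption: pattern inferred from the test comments above, not copied from the linter.
# ldquo, one or more whitespace characters, then anything other than whitespace, lsquo, or rsquo.
LDQUO_SPACE = re.compile(r"“\s+(?![\s‘’])")


def has_ldquo_followed_by_space(paragraph: str) -> bool:
    # Hypothetical helper: True when the paragraph would be a y-017 candidate under this sketch.
    return LDQUO_SPACE.search(paragraph) is not None
```

Python's `\s` matches Unicode whitespace for `str` patterns, so the hair space (U+200A) used between nested quotation marks is treated the same as a regular space here.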
7 changes: 7 additions & 0 deletions tests/lint/typos/y-018/golden/y-018-out.txt
@@ -0,0 +1,7 @@
y-018 [Manual Review] chapter-1.xhtml Possible typo: `‘` followed by space.
<p>A temperature sees a level as a shawlless pond. “ ‘ One cannot
separate governments from utile pancakes.’ Some posit the flaunty pimple to be
less than orphan.”</p>
<p>“ ‘ A chord is a nifty office.’ ” A permission is a sense's yacht.
This is not to discredit the idea that one cannot separate burmas from hurtful
kitchens.</p>
21 changes: 21 additions & 0 deletions tests/lint/typos/y-018/in/src/epub/text/chapter-1.xhtml
@@ -0,0 +1,21 @@
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0" xml:lang="en-GB">
<head>
<title>I</title>
<link href="../css/core.css" rel="stylesheet" type="text/css"/>
<link href="../css/local.css" rel="stylesheet" type="text/css"/>
</head>
<body epub:type="bodymatter z3998:fiction">
<section id="chapter-1" epub:type="chapter">
<h2 epub:type="ordinal z3998:roman">I</h2>
<!-- EXCLUSION 1, lsquo followed by whitespace (hrsp) followed by ldquo -->
<p>“ ‘ “A basin is the target of a germany.” We can assume that any instance of a mustard can be construed as an unmarked goat.’ Some posit the boggy thunder to be less than tensive.”</p>
<!-- EXCLUSION 2, lsquo followed by whitespace followed by rsquo -->
<p>“‘ ’ A swamp is an estimate's crush. A rake is a gusty may. The literature would have us believe that a malign ellipse is not but a makeup.” However, an unsure reduction is a speedboat of the mind.</p>
<!-- FAIL 1, lsquo followed by whitespace (space) followed by not a ldquo/rsquo -->
<p>A temperature sees a level as a shawlless pond. “ ‘ One cannot separate governments from utile pancakes.’ Some posit the flaunty pimple to be less than orphan.”</p>
<!-- FAIL 2, lsquo followed by whitespace (hrsp) followed by not a ldquo/rsquo -->
<p>“ ‘ A chord is a nifty office.’ ” A permission is a sense's yacht. This is not to discredit the idea that one cannot separate burmas from hurtful kitchens.</p>
</section>
</body>
</html>
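The y-018 cases mirror y-017 with the roles of the quotation marks swapped: an lsquo followed by whitespace is flagged unless an ldquo or rsquo comes next. As a rough sketch inferred from the test comments (the pattern and the name `has_lsquo_followed_by_space` are assumptions, not the linter's own code):

```python
import re

# Assumption: pattern inferred from the test comments above, not copied from the linter.
# lsquo, one or more whitespace characters, then anything other than whitespace, ldquo, or rsquo.
LSQUO_SPACE = re.compile(r"‘\s+(?![\s“’])")


def has_lsquo_followed_by_space(paragraph: str) -> bool:
    # Hypothetical helper: True when the paragraph would be a y-018 candidate under this sketch.
    return LSQUO_SPACE.search(paragraph) is not None
```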
10 changes: 10 additions & 0 deletions tests/lint/typos/y-019/golden/y-019-out.txt
@@ -0,0 +1,10 @@
y-019 [Manual Review] chapter-1.xhtml Possible typo: `”` without opening `“`.
” In ancient times we can assume that any instance of a town can be
construed as a deprived exclamation. A train sees a hovercraft as a gyral fork.
Dolls are cheerly saws.”
<p>The charleses could be said to resemble nasty waxes. A joking
secretary without myanmars is truly a catsup of yearly thrills.” What we don't
know for sure is whether or not we can assume that any instance of a robin can
be construed as a mannish magician. In ancient times before searches, greeces
were only causes. The raving mice reveals itself as an unforced sock to those
who look.</p>
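The second entry in the expected output is a whole `<p>` element, which is consistent with the paragraph-level check appending the full node rather than its last 20 characters. The sketch below approximates that XPath check with the standard-library `xml.etree.ElementTree` instead of `se.easy_xml`; the function name is hypothetical, and it returns each paragraph's text content rather than the serialized node.

```python
import re
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def paragraphs_with_unopened_rdquo(xhtml: str) -> list[str]:
    # Hypothetical helper approximating the XPath in _lint_xhtml_typo_checks.
    root = ET.fromstring(xhtml)
    hits: list[str] = []

    def walk(node: ET.Element, excluded: bool) -> None:
        # `excluded` is True when an ancestor is a <blockquote> or a poem/verse/song/hymn.
        previous_tag = None
        for child in node:
            if (
                child.tag == XHTML + "p"
                and not excluded
                and previous_tag != XHTML + "blockquote"
            ):
                text = "".join(child.itertext())
                # Text that reaches an rdquo before any ldquo.
                if re.match(r"[^“]+”", text):
                    hits.append(text)

            is_verse = re.search(r"z3998:(poem|verse|song|hymn)", child.get(EPUB_TYPE, ""))
            walk(child, excluded or child.tag == XHTML + "blockquote" or bool(is_verse))
            previous_tag = child.tag

    walk(root, False)
    return hits
```

Paragraphs inside or immediately after a `<blockquote>`, and paragraphs inside verse-like containers, are skipped because their opening quote may be elsewhere in the quotation.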
93 changes: 93 additions & 0 deletions tests/lint/typos/y-019/in/src/epub/content.opf
@@ -0,0 +1,93 @@
<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" dir="ltr" prefix="se: https://standardebooks.org/vocab/1.0" unique-identifier="uid" version="3.0" xml:lang="en-US">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:identifier id="uid">url:https://standardebooks.org/ebooks/jane-austen/unknown-novel</dc:identifier>
<dc:date>1900-01-01T00:00:00Z</dc:date>
<meta property="dcterms:modified">1900-01-01T00:00:00Z</meta>
<dc:rights>The source text and artwork in this ebook are believed to be in the United States public domain; that is, they are believed to be free of copyright restrictions in the United States. They may still be copyrighted in other countries, so users located outside of the United States must check their local laws before using this ebook. The creators of, and contributors to, this ebook dedicate their contributions to the worldwide public domain via the terms in the [CC0 1.0 Universal Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/).</dc:rights>
<dc:publisher id="publisher">Standard Ebooks</dc:publisher>
<meta property="file-as" refines="#publisher">Standard Ebooks</meta>
<meta property="se:url.homepage" refines="#publisher">https://standardebooks.org</meta>
<meta property="role" refines="#publisher" scheme="marc:relators">bkd</meta>
<meta property="role" refines="#publisher" scheme="marc:relators">mdc</meta>
<meta property="role" refines="#publisher" scheme="marc:relators">pbl</meta>
<dc:contributor id="type-designer">The League of Moveable Type</dc:contributor>
<meta property="file-as" refines="#type-designer">League of Moveable Type, The</meta>
<meta property="se:url.homepage" refines="#type-designer">https://www.theleagueofmoveabletype.com</meta>
<meta property="role" refines="#type-designer" scheme="marc:relators">tyd</meta>
<link href="http://www.idpf.org/epub/a11y/accessibility-20170105.html#wcag-aa" rel="dcterms:conformsTo"/>
<meta property="a11y:certifiedBy">Standard Ebooks</meta>
<meta property="schema:accessMode">textual</meta>
<meta property="schema:accessModeSufficient">textual</meta>
<meta property="schema:accessibilityFeature">readingOrder</meta>
<meta property="schema:accessibilityFeature">structuralNavigation</meta>
<meta property="schema:accessibilityFeature">tableOfContents</meta>
<meta property="schema:accessibilityFeature">unlocked</meta>
<meta property="schema:accessibilityHazard">none</meta>
<meta property="schema:accessibilitySummary">This publication conforms to WCAG 2.2 Level AA.</meta>
<link href="onix.xml" media-type="application/xml" properties="onix" rel="record"/>
<dc:title id="title">Unknown Novel</dc:title>
<meta property="file-as" refines="#title">Unknown Novel</meta>
<dc:subject id="subject-1">England--Social life and customs--19th century--Fiction</dc:subject>
<dc:subject id="subject-2">Sisters -- Fiction</dc:subject>
<meta property="authority" refines="#subject-1">LCSH</meta>
<meta property="term" refines="#subject-1">sh2008114941</meta>
<meta property="authority" refines="#subject-2">LCSH</meta>
<meta property="term" refines="#subject-2">sh2008111400</meta>
<meta property="se:subject">Fiction</meta>
<dc:description id="description">A short test novel for lint testing.</dc:description>
<meta id="long-description" property="se:long-description" refines="#description">
&lt;p&gt;A short test novel for lint testing.&lt;/p&gt;
</meta>
<dc:language>en-GB</dc:language>
<dc:source>https://www.gutenberg.org/ebooks/161</dc:source>
<dc:source>https://archive.org/details/bub_gb_RtT0OLKFMHsC</dc:source>
<meta property="se:word-count">WORD_COUNT</meta>
<meta property="se:reading-ease.flesch">READING_EASE</meta>
<meta property="se:url.encyclopedia.wikipedia">https://en.wikipedia.org/wiki/Unknown_Jane_Austen_Novel</meta>
<meta property="se:url.vcs.github">https://github.com/standardebooks/jane-austen_unknown-novel</meta>
<dc:creator id="author">Jane Austen</dc:creator>
<meta property="file-as" refines="#author">Austen, Jane</meta>
<meta property="se:url.encyclopedia.wikipedia" refines="#author">https://en.wikipedia.org/wiki/Jane_Austen</meta>
<meta property="se:url.authority.nacoaf" refines="#author">http://id.loc.gov/authorities/names/n79032879</meta>
<meta property="role" refines="#author" scheme="marc:relators">aut</meta>
<meta property="role" refines="#author" scheme="marc:relators">ann</meta>
<dc:contributor id="artist">Georg Friedrich Kersting</dc:contributor>
<meta property="file-as" refines="#artist">Kersting, Georg Friedrich</meta>
<meta property="se:url.encyclopedia.wikipedia" refines="#artist">https://en.wikipedia.org/wiki/Georg_Friedrich_Kersting</meta>
<meta property="se:url.authority.nacoaf" refines="#artist">http://id.loc.gov/authorities/names/n83319941</meta>
<meta property="role" refines="#artist" scheme="marc:relators">art</meta>
<dc:contributor id="transcriber-1">Anonymous</dc:contributor>
<meta property="file-as" refines="#transcriber-1">Anonymous</meta>
<meta property="role" refines="#transcriber-1" scheme="marc:relators">trc</meta>
<dc:contributor id="producer-1">John Doe</dc:contributor>
<meta property="file-as" refines="#producer-1">Doe, John</meta>
<meta property="role" refines="#producer-1" scheme="marc:relators">bkp</meta>
<meta property="role" refines="#producer-1" scheme="marc:relators">blw</meta>
<meta property="role" refines="#producer-1" scheme="marc:relators">cov</meta>
<meta property="role" refines="#producer-1" scheme="marc:relators">mrk</meta>
<meta property="role" refines="#producer-1" scheme="marc:relators">pfr</meta>
<meta property="role" refines="#producer-1" scheme="marc:relators">tyg</meta>
</metadata>
<manifest>
<item href="css/core.css" id="core.css" media-type="text/css"/>
<item href="css/local.css" id="local.css" media-type="text/css"/>
<item href="css/se.css" id="se.css" media-type="text/css"/>
<item href="images/cover.svg" id="cover.svg" media-type="image/svg+xml" properties="cover-image"/>
<item href="images/logo.svg" id="logo.svg" media-type="image/svg+xml"/>
<item href="images/titlepage.svg" id="titlepage.svg" media-type="image/svg+xml"/>
<item href="text/chapter-1.xhtml" id="chapter-1.xhtml" media-type="application/xhtml+xml"/>
<item href="text/colophon.xhtml" id="colophon.xhtml" media-type="application/xhtml+xml" properties="svg"/>
<item href="text/imprint.xhtml" id="imprint.xhtml" media-type="application/xhtml+xml" properties="svg"/>
<item href="text/titlepage.xhtml" id="titlepage.xhtml" media-type="application/xhtml+xml" properties="svg"/>
<item href="text/uncopyright.xhtml" id="uncopyright.xhtml" media-type="application/xhtml+xml"/>
<item href="toc.xhtml" id="toc.xhtml" media-type="application/xhtml+xml" properties="nav"/>
</manifest>
<spine>
<itemref idref="titlepage.xhtml"/>
<itemref idref="imprint.xhtml"/>
<itemref idref="chapter-1.xhtml"/>
<itemref idref="colophon.xhtml"/>
<itemref idref="uncopyright.xhtml"/>
</spine>
</package>
38 changes: 38 additions & 0 deletions tests/lint/typos/y-019/in/src/epub/css/local.css
@@ -0,0 +1,38 @@
@charset "utf-8";
@namespace epub "http://www.idpf.org/2007/ops";

table{
margin: 1em auto;
}

/* poem/verse/song */
[epub|type~="z3998:hymn"] p,
[epub|type~="z3998:poem"] p,
[epub|type~="z3998:song"] p,
[epub|type~="z3998:verse"] p{
text-align: initial;
text-indent: 0;
}

[epub|type~="z3998:hymn"] p > span,
[epub|type~="z3998:poem"] p > span,
[epub|type~="z3998:song"] p > span,
[epub|type~="z3998:verse"] p > span{
display: block;
padding-left: 1em;
text-indent: -1em;
}

[epub|type~="z3998:hymn"] p > span + br,
[epub|type~="z3998:poem"] p > span + br,
[epub|type~="z3998:song"] p > span + br,
[epub|type~="z3998:verse"] p > span + br{
display: none;
}

[epub|type~="z3998:hymn"] p + p,
[epub|type~="z3998:song"] p + p,
[epub|type~="z3998:verse"] p + p{
margin-top: 1em;
}
/* end of poem/verse/song */