Fix typo in t-074 #688

Closed
wants to merge 1 commit into from

Conversation

vr8hub
Contributor

@vr8hub vr8hub commented May 10, 2024

I was running a lint on the corpus to investigate something with the test I'm working on, and noticed that t-074 wasn't ignoring italics with a language tag like it was supposed to. I looked up what the @ did in xpath, and then added it and tested some of the ones that were failing before, and they don't now, so I'm hopeful this is the correct fix.

Do you want help fixing the t-074's in the corpus?

@acabal
Member

acabal commented May 10, 2024

I think I just fixed this moments ago, I'm cleaning up the corpus right now and testing the new scansion test.

@acabal acabal closed this May 10, 2024
@vr8hub vr8hub deleted the fix_t074 branch May 10, 2024 20:12
@vr8hub
Contributor Author

vr8hub commented May 10, 2024

I've updated the test to include the exclusions, which is what I'm doing with all of the other tests, to catch things like this. I notice the word in the "too long" exclusion is actually not a sound either (Well-a-day); if it wasn't too long, that would actually be a false positive.

Do we want to look for two consecutive single characters between dashes? That seems more indicative of a sound. If so, I'll make the change in the PR with the test update.

@acabal
Member

acabal commented May 10, 2024

Sure, test it out to see how it might go. If it's a better test then you can update it.

@vr8hub
Contributor Author

vr8hub commented May 10, 2024

Well, it's obviously going to exclude single-letter ones, but it still catches plenty. And, just to see, I made another change for the test, to remove the requirement that it be in italics/emphasis. The rule for using non-break hyphens instead of dashes doesn't say anything about sounds, just stretched-out words. In the following examples, some of them probably should be tagged as sounds and aren't, and some are just extended words.

  • The Financier, “H-a-a-a-w!” and Again that “H-a-a-a-w!”
  • Weinbaum's Short Fiction, “O-o-o-o-o-oh!” I groaned. and “O-o-o-h!” I gasped.
  • Tom Sawyer, “Y-o-u-u TOM!” and “Steady, steady-y-y-y!”
  • The Able McLaughlins, “Pr-r-r-r-r!” he articulated proudly. “Pr-r-r!”
  • Andreyev's Short Fiction, “R-r-r-r-apscallion!” (he's "roaring," not stuttering)
  • Such is Life, “H a-a-a-a-a-a-a y!”

And so forth. Plenty of other instances of actual words, not sounds, being extended, which as I understand it from the rule, should be NBH.
It will false-positive on stuttering that extends to two consecutive single letters (most stuttering doesn't, e.g. "Y-you w-want m-me…"), where I assume we don't want NBHs, but I didn't see much of that; a bit in Wodehouse. What it catches (outside of the italics/emphasis) is certainly a lot more than the false positives.

It also catches spelled-out words where the letters aren't tagged as graphemes/phonemes, of which we also have quite a few. If you decide to keep it to just italics/emphasis, that is probably worth a test of its own.

@vr8hub
Contributor Author

vr8hub commented May 10, 2024

I did another XPath test, looking for instances of things that matched -letter- that didn't match \bletter-letter\b. IOW, where the current t-074 would find them but the updated t-074 wouldn't. List of matches below; it looks to me like more false positives than actual catches. Note I did not attempt this without the italic/emphasis test; a single-character match would definitely generate a lot of false positives.

        <i>rub-a-dub-dub</i>
        <i>rub-a-dub-dub</i>
        <i>Cur-ru-u-uck</i>
        <i>cur-ru-u-uck</i>
        <i>cur-u-uck-cock-kick</i>
        <em>Hard-a-lee!</em>
        <em>Ting-a-ling-a-ling!</em>
        <em>Jejunus-a-um</em>
        <em>gra-a-ate thoughts</em>
        <em>gra-a-ate men</em>
        <i>bwoo-ur-r-rr</i>
        <i>put-a-put, put-a-put</i>
        <i>Pee-u-ah!</i>
        <i>brek-e-kek-kex</i>
        <i>Arre! Arre! Hai! Yai! Kya-a-ah!</i>
        <i>tunk-a-tunk</i>
        <em>pit-a-pat</em>
        <i>Cock-el-i-coo</i>
        <i>Cock-el-i-coo</i>
        <i>chuck-a-chuck, chuck-a-chuck, chuck-a-chuck</i>
        <i>chug-a-chug, chug-a-chug</i>
        <i>chick-a-chick, chick-a-chick, chick-a-chick</i>

@vr8hub
Contributor Author

vr8hub commented May 10, 2024

So, I would recommend making it two consecutive single characters, e.g. -[A-Za-z]-[A-Za-z]-. Whether the italics/emphasis test is removed is up to you; it definitely catches things that aren't caught now, and adds some false positives (but fewer of the latter than the former). Here is what the test without italics/emphasis is catching now ("/html/body//*[not(@epub:type) and not(@xml:lang) and re:test(., '-[A-Za-z]-[A-Za-z]-') and string-length(.) < 50]"), so you can see for yourself.

/Users/vrice/src/SE/ebooks/aleksandr-kuprin_short-fiction_s-koteliansky_j-m-murry_stephen-graham_rosa-savory-graham_leo-pasvolsky_douglas-ashby_the-living-age_b-guilbert-guerney_alexander-gagarine_malcolm-w-davis/src/epub/text/the-white-poodle.xhtml
        <i>Ai-yai-yai-ya-a-a-a!</i>
        <i>U-u-u-ukh!</i>
/Users/vrice/src/SE/ebooks/aleksandr-kuprin_yama_bernard-guilbert-guerney/src/epub/text/chapter-1-8.xhtml
        <span>“They fe-e-e-l the tru-u-u-u-uth!</span>
/Users/vrice/src/SE/ebooks/aleksandr-kuprin_yama_bernard-guilbert-guerney/src/epub/text/chapter-1-9.xhtml
        <span>Bro-o-o-o-o-oad land of plenty.”</span>
/Users/vrice/src/SE/ebooks/aleksandr-kuprin_yama_bernard-guilbert-guerney/src/epub/text/chapter-1-10.xhtml
        <p>
            <span>“They fe-e-e-e-l the tru-u-u-u-uth!”</span>
        </p>
        <span>“They fe-e-e-e-l the tru-u-u-u-uth!”</span>
/Users/vrice/src/SE/ebooks/aleksandr-kuprin_yama_bernard-guilbert-guerney/src/epub/text/chapter-1-12.xhtml
        <span>“They fe-e-e-el the tru-u-u-uth,</span>
/Users/vrice/src/SE/ebooks/aleksandr-kuprin_yama_bernard-guilbert-guerney/src/epub/text/chapter-3-4.xhtml
        <i>Ba-a-a-st-a-a!</i>
/Users/vrice/src/SE/ebooks/anthony-trollope_barchester-towers/src/epub/text/chapter-15.xhtml
        <p>“Whew-w-w-w!” whistled Bertie, “a widow!”</p>
/Users/vrice/src/SE/ebooks/anthony-trollope_barchester-towers/src/epub/text/chapter-37.xhtml
        <p>“Oh-h-h-h!” exclaimed the countess.</p>
/Users/vrice/src/SE/ebooks/charles-dickens_a-christmas-carol/src/epub/text/chapter-3.xhtml
        <p>“It’s your uncle Scro-o-o-o-oge.”</p>
/Users/vrice/src/SE/ebooks/compton-mackenzie_sinister-street/src/epub/text/chapter-4-3.xhtml
        <p>“B-a-r-n-e-s. Have you got it?”</p>
/Users/vrice/src/SE/ebooks/d-h-lawrence_sons-and-lovers/src/epub/text/chapter-8.xhtml
        <i>T-t-t-t!</i>
        <p>“<i>T-t-t-t-t!</i>” went her tongue.</p>
        <i>T-t-t-t-t!</i>
/Users/vrice/src/SE/ebooks/d-h-lawrence_sons-and-lovers/src/epub/text/chapter-12.xhtml
        <i>T-t-t-t!</i>
/Users/vrice/src/SE/ebooks/d-h-lawrence_sons-and-lovers/src/epub/text/chapter-14.xhtml
        <i>ah-h-h-h-h!</i>
/Users/vrice/src/SE/ebooks/e-e-cummings_the-enormous-room/src/epub/text/chapter-10.xhtml
        <p>“<i xml:lang="fr">Ah-h-h-h-h-h-h. …</i>”</p>
/Users/vrice/src/SE/ebooks/e-e-smith_first-lensman/src/epub/text/chapter-13.xhtml
        <em>P-s-s-t! Jill!</em>
/Users/vrice/src/SE/ebooks/e-e-smith_triplanetary/src/epub/text/chapter-2.xhtml
        <i>Z-r-r-e-e-k—<b>whap</b>!</i>
        <i>w-h-i-n-g-e-d</i>
/Users/vrice/src/SE/ebooks/e-e-smith_triplanetary/src/epub/text/chapter-7.xhtml
        <i>p-s-s-t</i>
/Users/vrice/src/SE/ebooks/edgar-lee-masters_spoon-river-anthology/src/epub/text/epilogue.xhtml
        <span>Who-o-o-o-o-o!</span>
/Users/vrice/src/SE/ebooks/edgar-wallace_blue-hand/src/epub/text/chapter-49.xhtml
        <p>“<i epub:type="z3998:grapheme">P</i>-<i epub:type="z3998:grapheme">e</i>-<i epub:type="z3998:grapheme">a</i>-<i epub:type="z3998:grapheme">l</i>-<i epub:type="z3998:grapheme">i</i>-<i epub:type="z3998:grapheme">g</i>-<i epub:type="z3998:grapheme">o</i>,” was the reply.</p>
/Users/vrice/src/SE/ebooks/edna-ferber_so-big/src/epub/text/chapter-1.xhtml
        <em>So-o-o-o</em>
/Users/vrice/src/SE/ebooks/f-scott-fitzgerald_the-great-gatsby/src/epub/text/chapter-7.xhtml
        <p>“No, <i epub:type="z3998:grapheme">r</i>—” corrected the man, “<i epub:type="z3998:grapheme">M</i>-<i epub:type="z3998:grapheme">a</i>-<i epub:type="z3998:grapheme">v</i>-<i epub:type="z3998:grapheme">r</i>-<i epub:type="z3998:grapheme">o</i>—”</p>
/Users/vrice/src/SE/ebooks/f-scott-fitzgerald_this-side-of-paradise/src/epub/text/chapter-1-2.xhtml
        <span>“Oh-h-h-h-h</span>
        <span>Oh-h-h-h!”</span>
/Users/vrice/src/SE/ebooks/ford-madox-ford_some-do-not/src/epub/text/chapter-2-4.xhtml
        <i>C-r-r-unch!</i>
/Users/vrice/src/SE/ebooks/fyodor-sologub_the-little-demon_john-cournos_richard-aldington/src/epub/text/chapter-14.xhtml
        <span>“O-o-oh; it’s a-rai-ai-ning ha-a-a-rd on me-e-e!”</span>
/Users/vrice/src/SE/ebooks/h-g-wells_short-fiction/src/epub/text/a-vision-of-judgment.xhtml
        <p>Bru-a-a-a.</p>
/Users/vrice/src/SE/ebooks/h-g-wells_tono-bungay/src/epub/text/chapter-1-1.xhtml
        <p>“Beeee-e-e-a-trice!” fearfully close.</p>
/Users/vrice/src/SE/ebooks/henry-david-thoreau_essays/src/epub/text/chesuncook.xhtml
        <i>oo-o-o-o-o-o-o-o</i>
/Users/vrice/src/SE/ebooks/henry-david-thoreau_walden/src/epub/text/sounds.xhtml
        <i>Oh-o-o-o-o that I never had been bor-r-r-r-n!</i>
        <i>that I never had been bor-r-r-r-n</i>
        <i>bor-r-r-r-n!</i>
        <i>tr-r-r-oonk, tr-r-r—oonk, tr-r-r-oonk!</i>
        <i>tr-r-r-oonk</i>
/Users/vrice/src/SE/ebooks/henry-lawson_while-the-billy-boils/src/epub/text/a-day-on-a-selection.xhtml
        <p>“T-o-o-m-<em>may</em>!”</p>
        <p>“<em>Tom</em>-m-a-a-y!”</p>
        <p>“Y-e-e-a-a-s-s!” very passionately and shrilly.</p>
        <p>“Y-e-e-a-a-s-s-s!—carn’t yer see I’m comin’?”</p>
/Users/vrice/src/SE/ebooks/honore-de-balzac_father-goriot_ellen-marriage/src/epub/text/father-goriot.xhtml
        <p>“Poir-r-r-rette! she had you there!”</p>
        <i>Ca-ro, ca-a-ro, ca-a-a-ro, non du-bi-ta-re</i>
/Users/vrice/src/SE/ebooks/horatio-alger-jr_ragged-dick/src/epub/text/chapter-16.xhtml
        <p>“<i epub:type="z3998:grapheme">T</i>-<i epub:type="z3998:grapheme">h</i>-<i epub:type="z3998:grapheme">r</i>-<i epub:type="z3998:grapheme">u</i>,” said Dick.</p>
/Users/vrice/src/SE/ebooks/jack-london_the-sea-wolf/src/epub/text/chapter-38.xhtml
        <p>“B-O-S-H.”</p>
/Users/vrice/src/SE/ebooks/joseph-furphy_such-is-life/src/epub/text/chapter-4.xhtml
        <p>“Ha-a-a-ay!”</p>
        <p>“Ha-a-a-a-ay!”</p>
        <p>“Ha-a-a-a-a-ay!”</p>
        <p>“H a-a—a-a-a-a-a y!”</p>
        <p>“Ha-a-a-a-a-a-ay!”</p>
        <p>“Ha-a-a-a-a-a-a-a-ay!”</p>
        <p>“Ha-a-a-a-a-a-a-a-a-ay!”</p>
/Users/vrice/src/SE/ebooks/leonid-andreyev_short-fiction_herman-bernstein_alexandra-linden_l-a-magnus_k-walter_w-h-lowe_the-russian-review_archibald-j-wolfe_john-cournos_r-s-townsend_maurice-magnus/src/epub/text/his-excellency-the-governor.xhtml
        <p>“R-r-r-r-apscallion!”</p>
/Users/vrice/src/SE/ebooks/margaret-wilson_the-able-mclaughlins/src/epub/text/chapter-19.xhtml
        <p>“Pr-r-r-r-r!” he articulated proudly. “Pr-r-r!”</p>
/Users/vrice/src/SE/ebooks/marie-belloc-lowndes_the-lodger/src/epub/text/chapter-3.xhtml
        <p>“No,” she shot out, “<i epub:type="z3998:grapheme">S</i>-<i epub:type="z3998:grapheme">l</i>-<i epub:type="z3998:grapheme">e</i>-<i epub:type="z3998:grapheme">u</i>-<i epub:type="z3998:grapheme">t</i>-<i epub:type="z3998:grapheme">h</i>.”</p>
/Users/vrice/src/SE/ebooks/mark-twain_a-connecticut-yankee-in-king-arthurs-court/src/epub/text/chapter-15.xhtml
        <em>s-a-n-d</em>
/Users/vrice/src/SE/ebooks/mark-twain_roughing-it/src/epub/text/chapter-51.xhtml
        <span class="i1">“Labbord!—stabbord!—s-t-e-a-d-y!—so!—</span>
        <span class="i1">Three feet large!—t-h-r-e-e feet!—</span>
/Users/vrice/src/SE/ebooks/mark-twain_the-adventures-of-huckleberry-finn/src/epub/text/chapter-17.xhtml
        <p>“G-e-o-r-g-e J-a-x-o-n—there now,” he says.</p>
/Users/vrice/src/SE/ebooks/mark-twain_the-adventures-of-tom-sawyer/src/epub/text/chapter-1.xhtml
        <p>“Y-o-u-u <strong>tom</strong>!”</p>
/Users/vrice/src/SE/ebooks/mark-twain_the-adventures-of-tom-sawyer/src/epub/text/chapter-13.xhtml
        <p>“Steady, steady-y-y-y!”</p>
/Users/vrice/src/SE/ebooks/martin-andersen-nexo_pelle-the-conqueror_jessie-muir_bernard-miall/src/epub/text/chapter-1-17.xhtml
        <i>Whe-e-e-e-ew!</i>
/Users/vrice/src/SE/ebooks/nikolai-gogol_dead-souls_c-j-hogarth/src/epub/text/chapter-1-2.xhtml
        <i>tur-r-r-ru-ing</i>
/Users/vrice/src/SE/ebooks/nikolai-gogol_dead-souls_d-j-hogarth/src/epub/text/chapter-1-2.xhtml
        <i>tur-r-r-ru-ing</i>
/Users/vrice/src/SE/ebooks/o-henry_short-fiction/src/epub/text/a-harlem-tragedy.xhtml
        <p>“M-m-m-yep,” grunted <abbr epub:type="z3998:name-title">Mr.</abbr> Fink.</p>
/Users/vrice/src/SE/ebooks/p-g-wodehouse_golf-stories/src/epub/text/chester-forgets-himself.xhtml
        <p>“D-d-d-dear me!” said Chester.</p>
/Users/vrice/src/SE/ebooks/p-g-wodehouse_indiscretions-of-archie/src/epub/text/chapter-3.xhtml
        <p>“It’s spelt M-o-f-f-a-m, but pronounced Moom.”</p>
/Users/vrice/src/SE/ebooks/p-g-wodehouse_mr-mulliner-stories/src/epub/text/the-truth-about-george.xhtml
        <p>“N-n-n-n-n-n—” he said.</p>
        <p>“N-n-n-n-n-n-ice d-d-d-d—”</p>
        <p>“I-I-I-I-I-I-I—” said George.</p>
        <p>“S-s-s-s-s-s-s-s—?” said George, puzzled.</p>
        <p>“W-w-w-w-w—?” asked George.</p>
        <p>“N-n-n-n-nice weather,” he said.</p>
        <p>“L-l-l-l-larks?”</p>
/Users/vrice/src/SE/ebooks/p-g-wodehouse_the-pothunters/src/epub/text/chapter-6.xhtml
        <span>We-e dress him up in uniform so ne-e-e-at.”</span>
/Users/vrice/src/SE/ebooks/richard-marsh_the-beetle/src/epub/text/chapter-21.xhtml
        <p>“S-stay, y-y-y-you—” he stuttered.</p>
/Users/vrice/src/SE/ebooks/richmal-crompton_just-william/src/epub/text/chapter-7.xhtml
        <p>“Gr-r-r-r-r!”</p>
/Users/vrice/src/SE/ebooks/robert-frost_new-hampshire/src/epub/text/new-hampshire.xhtml
        <span>You tell her that it’s M-A-P-L-E.</span>
/Users/vrice/src/SE/ebooks/sigfrid-siwertz_downstream_e-classen/src/epub/text/chapter-1-7.xhtml
        <i>I-i-i-i!</i>
/Users/vrice/src/SE/ebooks/sigfrid-siwertz_downstream_macmillan-of-canada/src/epub/text/chapter-1-7.xhtml
        <i>I-i-i-i!</i>
/Users/vrice/src/SE/ebooks/stanley-g-weinbaum_short-fiction/src/epub/text/the-ideal.xhtml
        <p>“O-o-o-o-o-oh!” I groaned.</p>
/Users/vrice/src/SE/ebooks/stanley-g-weinbaum_short-fiction/src/epub/text/the-point-of-view.xhtml
        <p>“O-o-o-h!” I gasped.</p>
/Users/vrice/src/SE/ebooks/theodore-dreiser_an-american-tragedy/src/epub/text/chapter-2-47.xhtml
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
/Users/vrice/src/SE/ebooks/theodore-dreiser_the-financier/src/epub/text/chapter-25.xhtml
        <p>Again that “H-a-a-a-w!”</p>
        <p>“H-a-a-a-w!”</p>
/Users/vrice/src/SE/ebooks/theodore-roosevelt_an-autobiography/src/epub/text/chapter-3.xhtml
        <p><b epub:type="z3998:persona">Brogan.</b> Misther Clu-r-r-k!</p>
/Users/vrice/src/SE/ebooks/voltairine-de-cleyre_poetry/src/epub/text/poetry.xhtml
        <p>
            <span class="i3">Br-r-r-r-r-r-r-r-r-f-f-f-f-f!!!</span>
        </p>
        <span class="i3">Br-r-r-r-r-r-r-r-r-f-f-f-f-f!!!</span>
/Users/vrice/src/SE/ebooks/w-e-b-du-bois_darkwater/src/epub/text/colophon.xhtml
        <a href="https://standardebooks.org/ebooks/w-e-b-du-bois/darkwater">standardebooks.org/ebooks/w-e-b-du-bois/darkwater</a>
/Users/vrice/src/SE/ebooks/walter-noble-burns_tombstone/src/epub/text/chapter-5.xhtml
        <p>“<i>Whee-e-e-e-e!</i>”</p>
        <i>Whee-e-e-e-e!</i>
/Users/vrice/src/SE/ebooks/wilkie-collins_no-name/src/epub/text/chapter-13-1.xhtml
        <span>“His form was of the manliest beau-u-u-uty,</span>
        <span>But now he’s gone alo-o-o-o-oft—</span>
        <span>But now he’s go-o-o-one aloft!”</span>


2 participants