Fix typo in t-074 #688

vr8hub · 2024-05-10T19:54:08Z

I was running a lint on the corpus to investigate something with the test I'm working on, and noticed that t-074 wasn't ignoring italics with a language tag like it was supposed to. I looked up what the @ did in xpath, and then added it and tested some of the ones that were failing before, and they don't now, so I'm hopeful this is the correct fix.
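
For anyone following along, here is a minimal sketch of what the `@` changes, using Python's standard-library `xml.etree.ElementTree` (which supports a small XPath subset). The fragment and variable names are made up for illustration and this is not the actual t-074 code: without `@`, a name inside a predicate is read as a child element; with `@`, it is read as an attribute.

```python
import xml.etree.ElementTree as ET

# Made-up fragment for illustration; not taken from the corpus.
XHTML = """<body>
<p><i>Br-r-r!</i> and <i xml:lang="fr">Ah-h-h!</i></p>
</body>"""

# ElementTree stores xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

root = ET.fromstring(XHTML)

# Without "@", the predicate names a child element called "lang"; no <i> has one.
by_child = root.findall(".//i[lang]")

# With "@", the predicate names an attribute.
by_attribute = root.findall(f".//i[@{XML_LANG}]")

print(len(by_child), [el.text for el in by_attribute])
```

Only the second query finds the language-tagged `<i>`, which is the behavior an exclusion like `not(@xml:lang)` depends on.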

Do you want help fixing the t-074s in the corpus?

acabal · 2024-05-10T20:07:15Z

I think I just fixed this moments ago; I'm cleaning up the corpus right now and testing the new scansion test.

vr8hub · 2024-05-10T20:25:35Z

I've updated the test to include the exclusions, which is what I'm doing with all of the other tests, to catch things like this. I notice the word in the "too long" exclusion is actually not a sound either (Well-a-day); if it wasn't too long, that would actually be a false positive.

Do we want to look for two consecutive single characters between dashes? That seems more indicative of a sound. If so, I'll make the change in the PR with the test update.
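
A rough sketch of what "two consecutive single characters between dashes" could look like as a regular expression, in Python. The names are hypothetical, and the pattern is only one reading of the idea, not necessarily what t-074 would end up using.

```python
import re

# Hypothetical names; one possible reading of "two consecutive single
# characters between dashes".
TWO_SINGLE_LETTERS = re.compile(r"-[A-Za-z]-[A-Za-z]-")

def has_two_single_letters(text: str) -> bool:
    """Return True if text contains two one-letter runs in a row between dashes."""
    return TWO_SINGLE_LETTERS.search(text) is not None

print(has_two_single_letters("Well-a-day"))   # only one lone letter
print(has_two_single_letters("H-a-a-a-w!"))   # several lone letters in a row
```

Under this reading, "Well-a-day" would no longer match regardless of its length, while a stretched-out "H-a-a-a-w!" still would.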

acabal · 2024-05-10T20:28:59Z

Sure, test it out to see how it might go. If it's a better test then you can update it.

vr8hub · 2024-05-10T21:17:24Z

Well, it's obviously going to exclude single-letter ones, but it still catches plenty. And, just to see, I made another change to the test, removing the requirement that the match be in italics/emphasis. The rule for using non-break hyphens instead of dashes doesn't say anything about sounds, just stretched out words. In the following examples, some probably should be tagged as sounds and aren't, and some are just extended words.

The Financier, “H-a-a-a-w!” and Again that “H-a-a-a-w!”
Weinbaum's Short Fiction, “O-o-o-o-o-oh!” I groaned. and “O-o-o-h!” I gasped.
Tom Sawyer, “Y-o-u-u TOM!” and “Steady, steady-y-y-y!”
The Able McLaughlins, “Pr-r-r-r-r!” he articulated proudly. “Pr-r-r!”
Andreyev's Short Fiction, “R-r-r-r-apscallion!” (he's "roaring," not stuttering)
Such is Life, “H a-a-a-a-a-a-a y!”

And so forth. Plenty of other instances of actual words, not sounds, being extended, which as I understand it from the rule, should be NBH.
It will false-positive on stuttering that extends to two consecutive single letters (most stuttering doesn't, e.g. "Y-you w-want m-me…"), which I assume we don't want using NBH's, but I didn't see much of that; a bit in Wodehouse. What it catches (outside of the italics/emphasis) is certainly a lot more than the false positives.

It also catches spelled-out words where the letters aren't tagged as graphemes/phonemes, of which we also have quite a few. If you decide to keep it to just italics/emphasis, that is probably worth a test of its own.

vr8hub · 2024-05-10T21:45:30Z

I did another xpath test, looking for instances of things that matched -letter- that didn't match \bletter-letter\b. IOW, where the current t-074 would find them but the updated t-074 wouldn't. List of matches below; it looks to me like more false positives than actual catches. Note I did not attempt this without the italic/emphasis test; a single-character pattern would definitely generate a lot of false positives.

        <i>rub-a-dub-dub</i>
        <i>rub-a-dub-dub</i>
        <i>Cur-ru-u-uck</i>
        <i>cur-ru-u-uck</i>
        <i>cur-u-uck-cock-kick</i>
        <em>Hard-a-lee!</em>
        <em>Ting-a-ling-a-ling!</em>
        <em>Jejunus-a-um</em>
        <em>gra-a-ate thoughts</em>
        <em>gra-a-ate men</em>
        <i>bwoo-ur-r-rr</i>
        <i>put-a-put, put-a-put</i>
        <i>Pee-u-ah!</i>
        <i>brek-e-kek-kex</i>
        <i>Arre! Arre! Hai! Yai! Kya-a-ah!</i>
        <i>tunk-a-tunk</i>
        <em>pit-a-pat</em>
        <i>Cock-el-i-coo</i>
        <i>Cock-el-i-coo</i>
        <i>chuck-a-chuck, chuck-a-chuck, chuck-a-chuck</i>
        <i>chug-a-chug, chug-a-chug</i>
        <i>chick-a-chick, chick-a-chick, chick-a-chick</i>
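
The comparison described above can be sketched in a few lines of Python. The two patterns are assumptions about the shape of the current and proposed checks (the real test runs inside an XPath expression), and the sample strings are taken from this thread.

```python
import re

OLD = re.compile(r"-[A-Za-z]-")            # assumed shape of the current check
NEW = re.compile(r"-[A-Za-z]-[A-Za-z]-")   # proposed: two single letters in a row

# A few strings taken from this thread.
samples = ["rub-a-dub-dub", "Cur-ru-u-uck", "pit-a-pat", "T-t-t-t!", "Whee-e-e-e-e!"]

# Caught by the old pattern but no longer by the new one.
dropped = [s for s in samples if OLD.search(s) and not NEW.search(s)]
print(dropped)
```

The first three samples fall out of the results, while the repeated-letter ones are still caught by both patterns.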

vr8hub · 2024-05-10T21:47:37Z

So, I would recommend making it two consecutive single characters, e.g. -[A-Za-z]-[A-Za-z]-. Whether the italics/emphasis test is removed is up to you; removing it definitely catches things that aren't caught now, and adds some false positives (but fewer of the latter than the former). Here is what the test without italics/emphasis is catching now, using "/html/body//*[not(@epub:type) and not(@xml:lang) and re:test(., '-[A-Za-z]-[A-Za-z]-') and string-length(.) < 50]", so you can see for yourself.

/Users/vrice/src/SE/ebooks/aleksandr-kuprin_short-fiction_s-koteliansky_j-m-murry_stephen-graham_rosa-savory-graham_leo-pasvolsky_douglas-ashby_the-living-age_b-guilbert-guerney_alexander-gagarine_malcolm-w-davis/src/epub/text/the-white-poodle.xhtml
        <i>Ai-yai-yai-ya-a-a-a!</i>
        <i>U-u-u-ukh!</i>
/Users/vrice/src/SE/ebooks/aleksandr-kuprin_yama_bernard-guilbert-guerney/src/epub/text/chapter-1-8.xhtml
        <span>“They fe-e-e-l the tru-u-u-u-uth!</span>
/Users/vrice/src/SE/ebooks/aleksandr-kuprin_yama_bernard-guilbert-guerney/src/epub/text/chapter-1-9.xhtml
        <span>Bro-o-o-o-o-oad land of plenty.”</span>
/Users/vrice/src/SE/ebooks/aleksandr-kuprin_yama_bernard-guilbert-guerney/src/epub/text/chapter-1-10.xhtml
        <p>
            <span>“They fe-e-e-e-l the tru-u-u-u-uth!”</span>
        </p>
        <span>“They fe-e-e-e-l the tru-u-u-u-uth!”</span>
/Users/vrice/src/SE/ebooks/aleksandr-kuprin_yama_bernard-guilbert-guerney/src/epub/text/chapter-1-12.xhtml
        <span>“They fe-e-e-el the tru-u-u-uth,</span>
/Users/vrice/src/SE/ebooks/aleksandr-kuprin_yama_bernard-guilbert-guerney/src/epub/text/chapter-3-4.xhtml
        <i>Ba-a-a-st-a-a!</i>
/Users/vrice/src/SE/ebooks/anthony-trollope_barchester-towers/src/epub/text/chapter-15.xhtml
        <p>“Whew-w-w-w!” whistled Bertie, “a widow!”</p>
/Users/vrice/src/SE/ebooks/anthony-trollope_barchester-towers/src/epub/text/chapter-37.xhtml
        <p>“Oh-h-h-h!” exclaimed the countess.</p>
/Users/vrice/src/SE/ebooks/charles-dickens_a-christmas-carol/src/epub/text/chapter-3.xhtml
        <p>“It’s your uncle Scro-o-o-o-oge.”</p>
/Users/vrice/src/SE/ebooks/compton-mackenzie_sinister-street/src/epub/text/chapter-4-3.xhtml
        <p>“B-a-r-n-e-s. Have you got it?”</p>
/Users/vrice/src/SE/ebooks/d-h-lawrence_sons-and-lovers/src/epub/text/chapter-8.xhtml
        <i>T-t-t-t!</i>
        <p>“<i>T-t-t-t-t!</i>” went her tongue.</p>
        <i>T-t-t-t-t!</i>
/Users/vrice/src/SE/ebooks/d-h-lawrence_sons-and-lovers/src/epub/text/chapter-12.xhtml
        <i>T-t-t-t!</i>
/Users/vrice/src/SE/ebooks/d-h-lawrence_sons-and-lovers/src/epub/text/chapter-14.xhtml
        <i>ah-h-h-h-h!</i>
/Users/vrice/src/SE/ebooks/e-e-cummings_the-enormous-room/src/epub/text/chapter-10.xhtml
        <p>“<i xml:lang="fr">Ah-h-h-h-h-h-h. …</i>”</p>
/Users/vrice/src/SE/ebooks/e-e-smith_first-lensman/src/epub/text/chapter-13.xhtml
        <em>P-s-s-t! Jill!</em>
/Users/vrice/src/SE/ebooks/e-e-smith_triplanetary/src/epub/text/chapter-2.xhtml
        <i>Z-r-r-e-e-k—<b>whap</b>!</i>
        <i>w-h-i-n-g-e-d</i>
/Users/vrice/src/SE/ebooks/e-e-smith_triplanetary/src/epub/text/chapter-7.xhtml
        <i>p-s-s-t</i>
/Users/vrice/src/SE/ebooks/edgar-lee-masters_spoon-river-anthology/src/epub/text/epilogue.xhtml
        <span>Who-o-o-o-o-o!</span>
/Users/vrice/src/SE/ebooks/edgar-wallace_blue-hand/src/epub/text/chapter-49.xhtml
        <p>“<i epub:type="z3998:grapheme">P</i>-<i epub:type="z3998:grapheme">e</i>-<i epub:type="z3998:grapheme">a</i>-<i epub:type="z3998:grapheme">l</i>-<i epub:type="z3998:grapheme">i</i>-<i epub:type="z3998:grapheme">g</i>-<i epub:type="z3998:grapheme">o</i>,” was the reply.</p>
/Users/vrice/src/SE/ebooks/edna-ferber_so-big/src/epub/text/chapter-1.xhtml
        <em>So-o-o-o</em>
/Users/vrice/src/SE/ebooks/f-scott-fitzgerald_the-great-gatsby/src/epub/text/chapter-7.xhtml
        <p>“No, <i epub:type="z3998:grapheme">r</i>—” corrected the man, “<i epub:type="z3998:grapheme">M</i>-<i epub:type="z3998:grapheme">a</i>-<i epub:type="z3998:grapheme">v</i>-<i epub:type="z3998:grapheme">r</i>-<i epub:type="z3998:grapheme">o</i>—”</p>
/Users/vrice/src/SE/ebooks/f-scott-fitzgerald_this-side-of-paradise/src/epub/text/chapter-1-2.xhtml
        <span>“Oh-h-h-h-h</span>
        <span>Oh-h-h-h!”</span>
/Users/vrice/src/SE/ebooks/ford-madox-ford_some-do-not/src/epub/text/chapter-2-4.xhtml
        <i>C-r-r-unch!</i>
/Users/vrice/src/SE/ebooks/fyodor-sologub_the-little-demon_john-cournos_richard-aldington/src/epub/text/chapter-14.xhtml
        <span>“O-o-oh; it’s a-rai-ai-ning ha-a-a-rd on me-e-e!”</span>
/Users/vrice/src/SE/ebooks/h-g-wells_short-fiction/src/epub/text/a-vision-of-judgment.xhtml
        <p>Bru-a-a-a.</p>
/Users/vrice/src/SE/ebooks/h-g-wells_tono-bungay/src/epub/text/chapter-1-1.xhtml
        <p>“Beeee-e-e-a-trice!” fearfully close.</p>
/Users/vrice/src/SE/ebooks/henry-david-thoreau_essays/src/epub/text/chesuncook.xhtml
        <i>oo-o-o-o-o-o-o-o</i>
/Users/vrice/src/SE/ebooks/henry-david-thoreau_walden/src/epub/text/sounds.xhtml
        <i>Oh-o-o-o-o that I never had been bor-r-r-r-n!</i>
        <i>that I never had been bor-r-r-r-n</i>
        <i>bor-r-r-r-n!</i>
        <i>tr-r-r-oonk, tr-r-r—oonk, tr-r-r-oonk!</i>
        <i>tr-r-r-oonk</i>
/Users/vrice/src/SE/ebooks/henry-lawson_while-the-billy-boils/src/epub/text/a-day-on-a-selection.xhtml
        <p>“T-o-o-m-<em>may</em>!”</p>
        <p>“<em>Tom</em>-m-a-a-y!”</p>
        <p>“Y-e-e-a-a-s-s!” very passionately and shrilly.</p>
        <p>“Y-e-e-a-a-s-s-s!—carn’t yer see I’m comin’?”</p>
/Users/vrice/src/SE/ebooks/honore-de-balzac_father-goriot_ellen-marriage/src/epub/text/father-goriot.xhtml
        <p>“Poir-r-r-rette! she had you there!”</p>
        <i>Ca-ro, ca-a-ro, ca-a-a-ro, non du-bi-ta-re</i>
/Users/vrice/src/SE/ebooks/horatio-alger-jr_ragged-dick/src/epub/text/chapter-16.xhtml
        <p>“<i epub:type="z3998:grapheme">T</i>-<i epub:type="z3998:grapheme">h</i>-<i epub:type="z3998:grapheme">r</i>-<i epub:type="z3998:grapheme">u</i>,” said Dick.</p>
/Users/vrice/src/SE/ebooks/jack-london_the-sea-wolf/src/epub/text/chapter-38.xhtml
        <p>“B-O-S-H.”</p>
/Users/vrice/src/SE/ebooks/joseph-furphy_such-is-life/src/epub/text/chapter-4.xhtml
        <p>“Ha-a-a-ay!”</p>
        <p>“Ha-a-a-a-ay!”</p>
        <p>“Ha-a-a-a-a-ay!”</p>
        <p>“H a-a—a-a-a-a-a y!”</p>
        <p>“Ha-a-a-a-a-a-ay!”</p>
        <p>“Ha-a-a-a-a-a-a-a-ay!”</p>
        <p>“Ha-a-a-a-a-a-a-a-a-ay!”</p>
/Users/vrice/src/SE/ebooks/leonid-andreyev_short-fiction_herman-bernstein_alexandra-linden_l-a-magnus_k-walter_w-h-lowe_the-russian-review_archibald-j-wolfe_john-cournos_r-s-townsend_maurice-magnus/src/epub/text/his-excellency-the-governor.xhtml
        <p>“R-r-r-r-apscallion!”</p>
/Users/vrice/src/SE/ebooks/margaret-wilson_the-able-mclaughlins/src/epub/text/chapter-19.xhtml
        <p>“Pr-r-r-r-r!” he articulated proudly. “Pr-r-r!”</p>
/Users/vrice/src/SE/ebooks/marie-belloc-lowndes_the-lodger/src/epub/text/chapter-3.xhtml
        <p>“No,” she shot out, “<i epub:type="z3998:grapheme">S</i>-<i epub:type="z3998:grapheme">l</i>-<i epub:type="z3998:grapheme">e</i>-<i epub:type="z3998:grapheme">u</i>-<i epub:type="z3998:grapheme">t</i>-<i epub:type="z3998:grapheme">h</i>.”</p>
/Users/vrice/src/SE/ebooks/mark-twain_a-connecticut-yankee-in-king-arthurs-court/src/epub/text/chapter-15.xhtml
        <em>s-a-n-d</em>
/Users/vrice/src/SE/ebooks/mark-twain_roughing-it/src/epub/text/chapter-51.xhtml
        <span class="i1">“Labbord!—stabbord!—s-t-e-a-d-y!—so!—</span>
        <span class="i1">Three feet large!—t-h-r-e-e feet!—</span>
/Users/vrice/src/SE/ebooks/mark-twain_the-adventures-of-huckleberry-finn/src/epub/text/chapter-17.xhtml
        <p>“G-e-o-r-g-e J-a-x-o-n—there now,” he says.</p>
/Users/vrice/src/SE/ebooks/mark-twain_the-adventures-of-tom-sawyer/src/epub/text/chapter-1.xhtml
        <p>“Y-o-u-u <strong>tom</strong>!”</p>
/Users/vrice/src/SE/ebooks/mark-twain_the-adventures-of-tom-sawyer/src/epub/text/chapter-13.xhtml
        <p>“Steady, steady-y-y-y!”</p>
/Users/vrice/src/SE/ebooks/martin-andersen-nexo_pelle-the-conqueror_jessie-muir_bernard-miall/src/epub/text/chapter-1-17.xhtml
        <i>Whe-e-e-e-ew!</i>
/Users/vrice/src/SE/ebooks/nikolai-gogol_dead-souls_c-j-hogarth/src/epub/text/chapter-1-2.xhtml
        <i>tur-r-r-ru-ing</i>
/Users/vrice/src/SE/ebooks/nikolai-gogol_dead-souls_d-j-hogarth/src/epub/text/chapter-1-2.xhtml
        <i>tur-r-r-ru-ing</i>
/Users/vrice/src/SE/ebooks/o-henry_short-fiction/src/epub/text/a-harlem-tragedy.xhtml
        <p>“M-m-m-yep,” grunted <abbr epub:type="z3998:name-title">Mr.</abbr> Fink.</p>
/Users/vrice/src/SE/ebooks/p-g-wodehouse_golf-stories/src/epub/text/chester-forgets-himself.xhtml
        <p>“D-d-d-dear me!” said Chester.</p>
/Users/vrice/src/SE/ebooks/p-g-wodehouse_indiscretions-of-archie/src/epub/text/chapter-3.xhtml
        <p>“It’s spelt M-o-f-f-a-m, but pronounced Moom.”</p>
/Users/vrice/src/SE/ebooks/p-g-wodehouse_mr-mulliner-stories/src/epub/text/the-truth-about-george.xhtml
        <p>“N-n-n-n-n-n—” he said.</p>
        <p>“N-n-n-n-n-n-ice d-d-d-d—”</p>
        <p>“I-I-I-I-I-I-I—” said George.</p>
        <p>“S-s-s-s-s-s-s-s—?” said George, puzzled.</p>
        <p>“W-w-w-w-w—?” asked George.</p>
        <p>“N-n-n-n-nice weather,” he said.</p>
        <p>“L-l-l-l-larks?”</p>
/Users/vrice/src/SE/ebooks/p-g-wodehouse_the-pothunters/src/epub/text/chapter-6.xhtml
        <span>We-e dress him up in uniform so ne-e-e-at.”</span>
/Users/vrice/src/SE/ebooks/richard-marsh_the-beetle/src/epub/text/chapter-21.xhtml
        <p>“S-stay, y-y-y-you—” he stuttered.</p>
/Users/vrice/src/SE/ebooks/richmal-crompton_just-william/src/epub/text/chapter-7.xhtml
        <p>“Gr-r-r-r-r!”</p>
/Users/vrice/src/SE/ebooks/robert-frost_new-hampshire/src/epub/text/new-hampshire.xhtml
        <span>You tell her that it’s M-A-P-L-E.</span>
/Users/vrice/src/SE/ebooks/sigfrid-siwertz_downstream_e-classen/src/epub/text/chapter-1-7.xhtml
        <i>I-i-i-i!</i>
/Users/vrice/src/SE/ebooks/sigfrid-siwertz_downstream_macmillan-of-canada/src/epub/text/chapter-1-7.xhtml
        <i>I-i-i-i!</i>
/Users/vrice/src/SE/ebooks/stanley-g-weinbaum_short-fiction/src/epub/text/the-ideal.xhtml
        <p>“O-o-o-o-o-oh!” I groaned.</p>
/Users/vrice/src/SE/ebooks/stanley-g-weinbaum_short-fiction/src/epub/text/the-point-of-view.xhtml
        <p>“O-o-o-h!” I gasped.</p>
/Users/vrice/src/SE/ebooks/theodore-dreiser_an-american-tragedy/src/epub/text/chapter-2-47.xhtml
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
        <p>
            <i>Kit, kit, kit, Ca-a-a-ah!</i>
        </p>
        <i>Kit, kit, kit, Ca-a-a-ah!</i>
/Users/vrice/src/SE/ebooks/theodore-dreiser_the-financier/src/epub/text/chapter-25.xhtml
        <p>Again that “H-a-a-a-w!”</p>
        <p>“H-a-a-a-w!”</p>
/Users/vrice/src/SE/ebooks/theodore-roosevelt_an-autobiography/src/epub/text/chapter-3.xhtml
        <p><b epub:type="z3998:persona">Brogan.</b> Misther Clu-r-r-k!</p>
/Users/vrice/src/SE/ebooks/voltairine-de-cleyre_poetry/src/epub/text/poetry.xhtml
        <p>
            <span class="i3">Br-r-r-r-r-r-r-r-r-f-f-f-f-f!!!</span>
        </p>
        <span class="i3">Br-r-r-r-r-r-r-r-r-f-f-f-f-f!!!</span>
/Users/vrice/src/SE/ebooks/w-e-b-du-bois_darkwater/src/epub/text/colophon.xhtml
        <a href="https://standardebooks.org/ebooks/w-e-b-du-bois/darkwater">standardebooks.org/ebooks/w-e-b-du-bois/darkwater</a>
/Users/vrice/src/SE/ebooks/walter-noble-burns_tombstone/src/epub/text/chapter-5.xhtml
        <p>“<i>Whee-e-e-e-e!</i>”</p>
        <i>Whee-e-e-e-e!</i>
/Users/vrice/src/SE/ebooks/wilkie-collins_no-name/src/epub/text/chapter-13-1.xhtml
        <span>“His form was of the manliest beau-u-u-uty,</span>
        <span>But now he’s gone alo-o-o-o-oft—</span>
        <span>But now he’s go-o-o-one aloft!”</span>
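
For reference, here is a rough standard-library Python approximation of the XPath quoted above, in case it helps to reason about why a given element shows up in the listing. It is a sketch under assumptions, not the lint's implementation: `candidates` is a hypothetical helper, the fragment is made up (with strings adapted from the listing), and `itertext()` stands in for XPath's string value.

```python
import re
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"
TWO_LETTERS = re.compile(r"-[A-Za-z]-[A-Za-z]-")

# Made-up fragment; the strings are adapted from the listing above.
XHTML = """<body xmlns:epub="http://www.idpf.org/2007/ops">
<p>Gr-r-r-r-r!</p>
<p><i xml:lang="fr">Ah-h-h-h-h-h-h.</i></p>
<p>S-stay, y-you</p>
</body>"""

def candidates(root):
    """Yield the text of elements the quoted XPath would select, approximately."""
    for el in root.iter():
        if el is root:
            continue  # the XPath looks at descendants of <body>, not <body> itself
        if EPUB_TYPE in el.attrib or XML_LANG in el.attrib:
            continue  # not(@epub:type) and not(@xml:lang)
        text = "".join(el.itertext())  # rough stand-in for XPath's string value
        if len(text) < 50 and TWO_LETTERS.search(text):
            yield text

print(list(candidates(ET.fromstring(XHTML))))
```

In this sketch the second `<p>` is still reported even though its `<i>` carries `xml:lang`, because the attribute check applies to each element separately; that appears consistent with the e-e-cummings entry in the listing.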

Fix typo in t-074

a9a9bd4

acabal closed this May 10, 2024

vr8hub deleted the fix_t074 branch May 10, 2024 20:12
