Fix typo in t-074 #688
I think I just fixed this moments ago; I'm cleaning up the corpus right now and testing the new scansion test.
I've updated the test to include the exclusions, which is what I'm doing with all of the other tests, to catch things like this. I notice the word in the "too long" exclusion is actually not a sound either (Well-a-day); if it weren't too long, it would actually be a false positive. Do we want to look for two consecutive single characters between dashes? That seems more indicative of a sound. If so, I'll make the change in the PR with the test update.
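To make the "two consecutive single characters" idea concrete, here is a rough Python sketch. It is not the actual t-074 check: the pattern, the function name `has_stretched_run`, and the choice of the ASCII hyphen as the dash character are all assumptions for illustration. It treats a single letter followed by a hyphen as one segment and requires two such segments in a row, with the first starting at a word boundary.

```python
import re

# Hypothetical pattern, not the real t-074 regex.
# \b          : the first single letter must start at a word boundary
# [^\W\d_]-   : exactly one letter followed by an ASCII hyphen (assumed dash character)
# {2}         : two such single-letter segments in a row
STRETCHED_RUN = re.compile(r"\b(?:[^\W\d_]-){2}")

def has_stretched_run(text: str) -> bool:
    """Return True if text contains two consecutive single-letter hyphenated segments."""
    return STRETCHED_RUN.search(text) is not None

for sample in ["N-n-no", "b-r-r-r", "Well-a-day", "mother-in-law"]:
    print(sample, has_stretched_run(sample))
```

Under these assumptions, a word like Well-a-day no longer matches because it has only one single-character segment, so it would not need a "too long" exclusion to avoid being flagged.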
Sure, test it out to see how it might go. If it's a better test, then you can update it.
Well, it's obviously going to exclude single-letter ones, but it still catches plenty. And, just to see, I made another change for the test, to remove the requirement that it be in italics/emphasis. The rule for using non-break hyphens instead of dashes doesn't say anything about sounds, just stretched out words. In the following examples, some of them probably should be tagged as sounds and aren't, and some are just extended words.
And so forth. Plenty of other instances of actual words, not sounds, being extended, which as I understand it from the rule, should be NBH. It also catches spelled-out words where the letters aren't tagged as graphemes/phonemes, of which we also have quite a few. If you decide to keep it to just italics/emphasis, that is probably worth a test of its own.
I did another xpath test, looking for instances of things that matched
So, I would recommend making it two consecutive single characters, e.g.
I was running a lint on the corpus to investigate something with the test I'm working on, and noticed that t-074 wasn't ignoring italics with a language tag like it was supposed to. I looked up what the @ did in xpath, and then added it and tested some of the ones that were failing before, and they don't now, so I'm hopeful this is the correct fix.

Do you want help fixing the t-074's in the corpus?
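For anyone following along, the language-tag exclusion mentioned above can be sketched in Python. This is only an illustration, not the code in the PR: the function name `untagged_italics` and the sample markup are made up, and it filters in Python rather than in XPath. In XPath 1.0, `@` is shorthand for the attribute axis, so a roughly equivalent expression would be something like `//i[not(@xml:lang)]`, again as an assumption about the intent rather than the actual fix.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def untagged_italics(xhtml_fragment: str) -> list[str]:
    """Return the text of every <i> element that has no xml:lang attribute.

    Hypothetical helper: assumes a well-formed fragment with un-namespaced tags.
    """
    root = ET.fromstring(xhtml_fragment)
    return [
        "".join(el.itertext())
        for el in root.iter("i")
        if el.get(XML_LANG) is None  # skip italics carrying a language tag
    ]

sample = '<p><i>b-r-r-r</i> and <i xml:lang="fr">bonjour</i></p>'
print(untagged_italics(sample))
```

Only the italics without a language tag are returned, so foreign-language italics would be left out of whatever check runs afterward.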