Data::Text can kill ged2site? ("attempt to add consecutive punctuation") #112

Closed
jhannah opened this issue Jun 5, 2024 · 20 comments

@jhannah

jhannah commented Jun 5, 2024

Huh. Weird.

✗ ./ged2site -cFdh 'Jay Weston Hannah' -l ~/Desktop/jay_new.ged
...
Data::Text: attempt to add consecutive punctuation           ]  36% [1529/4182]
	Current = 'The twin brother of Jane E. and the 2nd of 3 children of <a href="?page=people&entry=I2599">James Hamilton</a> and <a href="?page=people&entry=I2600">Myrtle Combs</a>, <b>Harry</b>is  the third cousin once-removed on the father's side of <a href="?page=people&home=1">Jay Hannah</a> and was born on Jul 25, 1933 along with his twin sister Jane E.' added at 3510 of ./ged2site
	Append = '.' at ./ged2site line 3510.
- program exits -

Looks like this is a "feature" of Data::Text?

$ perl -MData::Text -e 'my $t1 = Data::Text->new("Jane E.")->append(".")'
Data::Text: attempt to add consecutive punctuation
	Current = 'Jane E.' added at 70 of /Users/jhannah/perl5/perlbrew/perls/perl-5.38.0/lib/site_perl/5.38.0/Data/Text.pm
	Append = '.' at -e line 1.

Which kills ged2site given some input data? uhh... seems like a bad feature?

@nigelhorne
Owner

That feature is deliberate and correct; it catches bugs in my code that create consecutive punctuation. I'll look and see if I can track down what's happening.

@jhannah
Author

jhannah commented Jun 5, 2024

👍 Jane works fine, but if the phrase happens to end with a period (e.g. Jane E.) it explodes.

If I hack it to this:

$bio_dt->append(conjunction(map { my $tmp = $_->as_string(); $tmp =~ s/\.+$//; $tmp } @phrases))->append('.');

The program keeps going past that error. The output is:

$ ack 'The twin brother of Jane E. and the 2nd of 3 children' static-site
static-site/Harry-Hamilton-1933.html
<p>The twin brother of Jane E. and the 2nd of 3 children of <a href="James-Hamilton-1901.html">James Hamilton</a> and <a href="Myrtle-Combs-1898.html">Myrtle Combs</a>, <b>Harry</b> is the third cousin once-removed on the father's side of <a href="Jay-Hannah-1975.html">Jay Hannah</a>

Which I assume is correct. :)
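
The hack above boils down to stripping any trailing full stops from each phrase before the final '.' is appended. A minimal standalone sketch of that idea, using plain strings rather than Data::Text objects: strip_trailing_periods and join_phrases are hypothetical helpers written for illustration (join_phrases only stands in for the conjunction() call in ged2site), so this is not ged2site's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ged2site): drop any run of trailing
# full stops so that a single closing '.' can be appended safely.
sub strip_trailing_periods {
	my $text = shift;
	$text =~ s/\.+$//;
	return $text;
}

# Hypothetical stand-in for the conjunction() call used in ged2site:
# joins phrases as "a", "a and b" or "a, b and c".
sub join_phrases {
	my @items = @_;
	return '' unless @items;
	return $items[0] if @items == 1;
	my $last = pop @items;
	return join(', ', @items) . " and $last";
}

my @phrases = ('was born on Jul 25, 1933 along with his twin sister Jane E.');

# Strip first, then add exactly one full stop at the end of the sentence.
my $sentence = join_phrases(map { strip_trailing_periods($_) } @phrases) . '.';
print "$sentence\n";
```

Note that the map block ends with a plain expression. A `return` inside a map block would return from the enclosing subroutine rather than from the block.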

@jhannah
Author

jhannah commented Jun 7, 2024

Apparently still broken at ae15cb1. It no longer gets as far as Jane at 36% [1529/4182]; it now dies on Jennifer at 6% [ 291/4176]. 😄

$ ./ged2site -cFdh 'Jay Weston Hannah' -l ~/src/private/genealogy/jay.ged
...
Data::Text: attempt to add consecutive punctuation           ]   6% [ 291/4176]
	Current = 'The husband of <a href="?page=people&entry=I2836">Linda Lee (Perry) Franzman</a> (the third cousin on the father's side of <a href="?page=people&home=1">Jay Hannah</a>) and had 2 children, Jeffrey A. and Jennifer L.' added at 7519 of ./ged2site
	Append = '.' at ./ged2site line 7519.

Here are my ugly hacks to get it to not explode: https://github.com/nigelhorne/ged2site/compare/master...jhannah:ged2site:112-hack?expand=1

@nigelhorne
Owner

nigelhorne commented Jun 8, 2024

Though your hack will certainly work, it's fixing the symptom rather than the problem. If possible (and sometimes it isn't), I'd rather find out why two full stops are being added than add both and then take one away. So I'm going to see if I can reproduce what you're seeing; then I'll be in a position to fix it.
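
For comparison, one possible shape for avoiding the second full stop at the point where it would be added, rather than removing it afterwards. This is only a sketch on plain strings: with_full_stop is a hypothetical helper, not something in ged2site or Data::Text, and it does not address why the phrase already ends in a full stop.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text with a closing full stop,
# adding one only when the text does not already end in '.'.
sub with_full_stop {
	my $text = shift;
	return $text =~ /\.\s*$/ ? $text : "$text.";
}

# A phrase ending in an abbreviation keeps its single full stop.
print with_full_stop('had 2 children, Jeffrey A. and Jennifer L.'), "\n";

# A phrase without a terminator gets one.
print with_full_stop('was born in Randolph Co., Indiana on Dec 3, 1838'), "\n";
```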

@jhannah
Author

jhannah commented Jun 8, 2024

FYI, at master 7ad9ccb the error has now moved to line 7533 (used to be 7519).

Data::Text: attempt to add consecutive punctuation           ]   6% [ 291/4176]
	Current = 'The husband of <a href="?page=people&entry=I2836">Linda Lee (Perry) Franzman</a> (the third cousin on the father's side of <a href="?page=people&home=1">Jay Hannah</a>) and had 2 children, Jeffrey A. and Jennifer L.' added at 7533 of ./ged2site
	Append = '.' at ./ged2site line 7533.
./ged2site -cFdh 'Jay Weston Hannah' -l ~/src/private/genealogy/jay.ged  100.86s user 19.81s system 94% cpu 2:08.17 total

@nigelhorne
Owner

I've been able to reproduce this with a test gedcom that I have.

@jhannah
Author

jhannah commented Jun 13, 2024

FYI at master 85763da

./ged2site -cFdh 'Jay Weston Hannah' -l ~/src/private/genealogy/jay.ged
...
[====   7589 of ./ged2site                                   ]   7% [ 316/4176]
	1085 of ./ged2site
BUG: string not set at ./ged2site line 13904, <GEN0> line 2582.
-- program exits --

nigelhorne added a commit that referenced this issue Jun 13, 2024
@jhannah
Author

jhannah commented Jun 13, 2024

FYI at master f5bcc06

Data::Text: attempt to add consecutive punctuation           ]  29% [1230/4176]
	Current = 'The child of <a href="?page=people&entry=I1140">Rolland Franzman</a> and <a href="?page=people&entry=I2782">Martha Smallberger</a>, <b>Richard</b>was  the second cousin once-removed on the father's side of <a href="?page=people&home=1">Jay Hannah</a> Hewas married twice (to <a href="?page=people&entry=I2830">Isabelle Ware</a> (possibly not married to her) and <a href="?page=people&entry=I2835">Delores Linder</a> (possibly not married to her)). He had 5 children: Linda Lee (Perry), Cheryl Lee (Perry), Richard Harlo , Jr. and Thomas Alan with Delores Irene; and Martha Ann with Isabelle C.' added at 7572 of ./ged2site
	Append = '.' at ./ged2site line 7572.
./ged2site -cFdh 'Jay Weston Hannah' -l ~/src/private/genealogy/jay.ged  473.44s user 81.62s system 94% cpu 9:49.05 total

@jhannah
Author

jhannah commented Jun 13, 2024

Recapping my branches in my fork:

  • 112-hacks-round2 seems to run to completion on my big GEDCOM. It's a 1-liner hack.
  • 115-bday-of-living-is-private does my custom birthday behavior.
  • 112-and-115 is a merge of the above two.

@nigelhorne
Owner

> Recapping my branches in my fork:
>
>   • 112-hacks-round2 seems to run to completion on my big GEDCOM. It's a 1-liner hack.
>   • 115-bday-of-living-is-private does my custom birthday behavior.
>   • 112-and-115 is a merge of the above two.

Could you generate a context diff, please? I'll take a look.

@jhannah
Author

jhannah commented Jun 13, 2024

112-hacks-round2 is just this one line change:

diff --git a/ged2site b/ged2site
index 91130b66..afa8e550 100755
--- a/ged2site
+++ b/ged2site
@@ -7569,7 +7569,7 @@ sub print_person
                push @phrases, $phrase;
        }
        if(scalar(@phrases)) {
-               $bio_dt->append(conjunction(map { $_->as_string() } @phrases))->append('.');
+               $bio_dt->append(conjunction(map { my $tmp = $_->as_string(); $tmp =~ s/\.+$//; $tmp } @phrases))->append('.');
                $phrase = undef;
                @phrases = ();
        }

@jhannah
Author

jhannah commented Jun 13, 2024

FYI hanging off of master e01ee88 I'm getting this:

./ged2site -cFdlh 'Jay Weston Hannah' ~/src/private/genealogy/jay.ged
-- runs for 23m, then dies: --
Data::Text: attempt to add consecutive punctuation==         ]  86% [3596/4176]
	Current = 'The 11th of 12 children of <a href="?page=people&entry=I125">William Stark</a> and <a href="?page=people&entry=I126">Rebecca Ragsdale</a>, <b>Matilda</b>was  the four times great-aunt of <a href="?page=people&home=1">Jay Hannah</a> and was born in Parke Co., Indiana on Nov 9, 1846.<p>She died on Feb 18, 1849 in Parke Co.' added at 7573 of ./ged2site
	Append = '.' at ./ged2site line 7573.

With my hacks in place, here's my new site (work in progress) ❤️ http://jays.net/genealogy/static-site/I1265.html

@jhannah jhannah mentioned this issue Jun 14, 2024
@jhannah
Author

jhannah commented Jun 14, 2024

FYI on master df7ab4a I'm now hitting this:

Data::Text: attempt to add consecutive punctuation           ]  10% [ 444/4176]
	Current = 'The 4th of 9 children of <a href="?page=people&entry=I1011">Robert Bunker</a> and <a href="?page=people&entry=I1012">Mira Dillingham</a>, <b>Jonathan</b>was  the three times great-uncle of <a href="?page=people&home=1">Jay Hannah</a>, was born in Randolph Co., Indiana on Dec 3, 1838 and was married twice (to <a href="?page=people&entry=I1001">Julia Collins</a> (on Mar 24, 1867 in Henry Co., Iowa, USA) and <a href="?page=people&entry=I1703">Mary Seaton</a> (on Feb 29, 1880 in Jackson Twp., Henry Co., IA, following the death of Julia Ann on Oct 4, 1875)). He had 8 children: infant son, Mattie and Robert Henry with Mary Jane; and Katie Ann, George William, James Warren, Laura B and Julia with Julia Ann.<p>He died on May 15, 1889 in Jackson Twp.' added at 7584 of ./ged2site
	Append = '.' at ./ged2site line 7584.
./ged2site -cFdlh 'Jay Weston Hannah' ~/src/private/genealogy/jay.ged  130.93s user 30.80s system 90% cpu 2:59.25 total

@nigelhorne
Owner

Is this still happening with your gedcom file?

@jhannah
Author

jhannah commented Jun 20, 2024

On master 3f471ce looks like the same error, it just moved to line 7571 now.

✗ time ./ged2site -cFdlh 'Jay Weston Hannah' ~/src/private/genealogy/jay.ged
Data::Text: attempt to add consecutive punctuation           ]  10% [ 444/4180]
	Current = 'The 4th of 9 children of <a href="?page=people&entry=I1011">Robert Bunker</a> and <a href="?page=people&entry=I1012">Mira Dillingham</a>, <b>Jonathan</b>was  the three times great-uncle of <a href="?page=people&home=1">Jay Hannah</a>, was born in Randolph Co., Indiana on Dec 3, 1838 and was married twice (to <a href="?page=people&entry=I1001">Julia Collins</a> (on Mar 24, 1867 in Henry Co., Iowa, USA) and <a href="?page=people&entry=I1703">Mary Seaton</a> (on Feb 29, 1880 in Jackson Twp., Henry Co., IA, following the death of Julia Ann on Oct 4, 1875)). He had 8 children: Katie Ann, George William, James Warren, Laura B and Julia with Julia Ann; and infant son, Mattie and Robert Henry with Mary Jane.<p>He died on May 15, 1889 in Jackson Twp.' added at 7571 of ./ged2site
	Append = '.' at ./ged2site line 7571.
./ged2site -cFdlh 'Jay Weston Hannah' ~/src/private/genealogy/jay.ged  142.21s user 29.65s system 90% cpu 3:10.76 total

(I've been using my branch 112-hacks-round2 to get around this one.)

@nigelhorne
Owner

Do you have a Gedcom snippet that I can use to reproduce that?

@jhannah
Author

jhannah commented Jun 20, 2024

Looks like I can recreate it with that dude and his 2 wives:

0 HEAD
1 SOUR GEDitCOM II
0 @I1703@ INDI
1 NAME Mary Jane /Seaton/
1 SEX F
1 BIRT
2 DATE 28 DEC 1852
2 PLAC nr. Cincinnati, Ohio
1 DEAT
2 DATE 31 AUG 1924
2 PLAC Arkansas City, Cowley Co., Kansas
1 NOTE @X269@
1 FAMS @F682@
0 @I1001@ INDI
1 NAME Julia Ann /Collins/
1 SEX F
1 BIRT
2 DATE 18 SEP 1847
2 PLAC Ohio Co. Indiana
1 DEAT
2 DATE 4 OCT 1875
2 PLAC Jackson Twp., Henry Co., IA
1 FAMS @F378@
0 @I1600@ INDI
1 NAME Jonathan Smith /Bunker/
1 SEX M
1 BIRT
2 DATE 3 DEC 1838
2 PLAC Randolph Co., Indiana
1 DEAT
2 DATE 15 MAY 1889
2 PLAC Jackson Twp., Henry Co., IA
1 FAMS @F378@
1 FAMS @F682@
0 @F682@ FAM
1 HUSB @I1600@
1 WIFE @I1703@
1 MARR
2 DATE 29 FEB 1880
2 PLAC Jackson Twp., Henry Co., IA
0 @F378@ FAM
1 HUSB @I1600@
1 WIFE @I1001@
1 MARR
2 DATE 24 MAR 1867
2 PLAC Henry Co., Iowa, USA

./ged2site -cFdlh 'Jonathan Smith Bunker' ~/src/private/genealogy/jay_small.ged
Data::Text: attempt to add consecutive punctuation===========] 100% [3/3]
	Current = 'was born in Randolph Co., Indiana on Dec 3, 1838. Hewas married twice (to <a href="?page=people&entry=I1001">Julia Collins</a> (on Mar 24, 1867 in Henry Co., Iowa, USA) and <a href="?page=people&entry=I1703">Mary Seaton</a> (on Feb 29, 1880 in Jackson Twp., Henry Co., IA, following the death of Julia Ann on Oct 4, 1875)).<p>He died on May 15, 1889 in Jackson Twp.' added at 7571 of ./ged2site
	Append = '.' at ./ged2site line 7571.

@nigelhorne
Owner

Confirmed. That helps, thanks.

$ GMAP_KEY= perl -MDevel::Hide=Geo::libpostal ./ged2site -cFd ~/gedcoms/issue_121.1.ged 
Devel::Hide hides Geo/libpostal.pm
Data::Text: attempt to add consecutive punctuation===========] 100% [3/3]
	Current = '<b>Jonathan Bunker</b> was born in Randolph Co., Indiana on Dec 3, 1838. Hewas married twice (to <a href="?page=people&entry=I1001">Julia Collins</a> (on Mar 24, 1867 in Henry Co., Iowa, USA) and <a href="?page=people&entry=I1703">Mary Seaton</a> (on Feb 29, 1880 in Jackson Twp., Henry Co., IA, following the death of Julia Ann on Oct 4, 1875)).<p>He died on May 15, 1889 in Jackson Twp.' added at 7578 of ./ged2site
	Append = '.' at ./ged2site line 7578.

@jhannah
Author

jhannah commented Jun 21, 2024

You could throw all your publicly sharable ~/gedcoms/ (like that one) into t/gedcoms/ and we could have a t/no_die.t that asserts that ged2site can process all of those without dying? The more tests the better. :)
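
A rough sketch of what such a t/no_die.t might look like, assuming the sample files sit in t/gedcoms/ and the test is run from the repository root. The directory layout and the choice of the -cFd flags (taken from the commands earlier in this thread) are assumptions, and each run would write its generated site output as a side effect.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;

# Assumed layout: sample files live in t/gedcoms/ and ged2site sits in the
# repository root. Both paths are illustrative.
my @gedcoms = glob('t/gedcoms/*.ged');

plan(skip_all => 'no sample GEDCOM files found in t/gedcoms/') unless @gedcoms;

foreach my $gedcom (@gedcoms) {
	# List-form system() avoids shell quoting problems with file names.
	# $^X runs ged2site under the same perl that is running this test.
	my $status = system($^X, './ged2site', '-cFd', $gedcom);
	is($status, 0, "ged2site processes $gedcom without dying");
}

done_testing();
```

Each regression GEDCOM, such as the three-person snippet above, would then become a permanent check simply by being dropped into the directory.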

@jhannah
Author

jhannah commented Jun 21, 2024

master 839a2e4 Woot! 🎉

./ged2site -cFdlh 'Jay Weston Hannah' ~/src/private/genealogy/jay.ged  1507.94s user 299.94s system 92% cpu 32:26.70 total
  • You deprecated my 112-hacks-round2
  • You deprecated my 123-utf-8. Or the ghosts moved on, or my karma improved, or whatever.
  • My 115-bday-of-living-is-private remains.
  • My fork of Gedcom.pm for INDI NOTE ref CONC CONT -> <p> #121 remains.

@jhannah jhannah closed this as completed Jun 21, 2024