chr.diff
272a273
> [Template merge - langs/und] The final move in the old svn infra: change the am-shared reference to point to giella-core parallel to the language dir. After this we can remove am-shared from each language. 2020-05-13T13:33:57+00:00
273a275
> [Template merge - langs/und] Fix mobile speller filename bug. 2020-05-12T17:03:11+00:00
274a277
> [Template merge - langs/und] Fix speller generation bug. 2020-05-09T11:18:14+00:00
275a279
> [Template merge - langs/und] Fix speller analyser reference after the flattening of the tools/spellcheckers/ dir. 2020-05-09T09:47:38+00:00
277a282,283
> [Template merge - langs/und] Final step in flattening the tools/spellcheckers/ dir tree: removing the whole fstbased/ dir, with all subdirs. Finally! 2020-05-09T05:02:11+00:00
> [Template merge - langs/und] Fix automakefile error: no final backslash followed by an empty line. 2020-05-08T20:42:17+00:00
278a285
> [Template merge - langs/und] Step eight in flattening the tools/spellcheckers/ dir tree: flipping the switch. All pieces are in place for building everything in tools/spellcheckers/ only, and everything has been tested with one language, including make check (a few tests are skipped because the fst is not found, but no tests break). The old files are kept for the moment, in case unseen issues and missing data is popping up after the switch, but will be deleted after verification. 2020-05-08T18:54:41+00:00
279a287
> [Template merge - langs/und] Step six in flattening the tools/spellcheckers/ dir tree: copying fstbased/mobile/hfst/index.xml to the new location. 2020-05-08T15:53:18+00:00
280a289
> [Template merge - langs/und] Step six in flattening the tools/spellcheckers/ dir tree: moving TAGWEIGHTS out of the language independent part to the language specific part, so that we can specify different tagweight files for desktop and mobile spellers. 2020-05-08T13:24:42+00:00
282a292,293
> [Template merge - langs/und] Step four in flattening the tools/spellcheckers/ dir tree: modifying another set of build files for the new dir structure, and the consequences of one dir for all speller files. 2020-05-08T09:18:28+00:00
> [Template merge - langs/und] Step four in flattening the tools/spellcheckers/ dir tree: copying all non-make files from spellcheckers/fstbased/desktop/hfst/ to spellcheckers/. 2020-05-07T19:19:47+00:00
283a295
> [Template merge - langs/und] Step three in flattening the tools/spellcheckers/ dir tree: changing the relocated build files to adapt to their new home. 2020-05-07T16:49:36+00:00
284a297
> [Template merge - langs/und] Step two in flattening the tools/spellcheckers/ dir tree: copying the desktop/weighting/ dir as the default one - for most languages the mobile/weighting/ dir is just a copy of the desktop one. 2020-05-07T06:29:17+00:00
285a299
> [Template merge - langs/und] Step one in flattening the tools/spellcheckers/ dir tree: copying all subdir Makefile.am files to *.mod-* files in the top spellcheckers dir, except from the weighting dirs. 2020-05-06T12:22:05+00:00
287a302,303
> [Template merge - langs/und] Added .gitignore file, as a preparatory step. 2020-05-06T10:49:16+00:00
> [Template merge - langs/und] Forgot to remove the entries for configure.ac re listbased spellers. 2020-05-06T08:54:20+00:00
288a305
> [Template merge - langs/und] Removed all list-based spellcheckers. There has not been any serious work in that area since the move to the new infrastructure 8 years ago. If there is a future need, we have it all in the rev history, and removing it simplifies other operations. 2020-05-06T08:43:54+00:00
289a307
> [Template merge - langs/und] Moved the files in tools/data/ to tools/tokenisers/, and removed the dir tools/data/. Part of the tools dir cleanup. 2020-05-06T07:11:30+00:00
290a309
> [Template merge - langs/und] Commented out check for GTLANG_xxx variable, it is not used, and the check output is confusing to users. 2020-05-05T12:56:51+00:00
291a311
> [Template merge - langs/und] Added checks for giella-core and giella-shared, symlinking to them if found, checking out (svn) or cloning (git) if not. Also removed every single reference to __UND__, it is not needed, and will cause merge conflicts. 2020-05-05T11:42:46+00:00
294a315
> [Template merge - langs/und] The last hyphenation build fix: now also works with other than the default fst backend, e.g. with the foma backend. 2020-04-27T08:54:50+00:00
295a317
> [Template merge - langs/und] Removed a double target declaration, one from the old pattern-based build, and one from the fst build. It was a simple copy from fst to pattern, and is not needed anymore. 2020-04-27T08:04:16+00:00
296a319
> [Template merge - langs/und] Updated referenced filename. Old name was not found, and stopped all builds. 2020-04-26T16:15:33+00:00
297a321
> [Template merge - langs/und] Restored file that was accidentally deleted, also renamed it to the correct name after the dir reorg. 2020-04-26T09:06:39+00:00
298a323
> [Template merge - langs/und] One reference to an old filename corrected. Stopped all nightlies. 2020-04-25T21:28:14+00:00
299a325
> [Template merge - langs/und] Removing the last remnants of the old hyphenation directory structure. 2020-04-24T20:58:34+00:00
300a327
> [Template merge - langs/und] Moving the last files from patterns one dir up. 2020-04-24T20:01:16+00:00
301a329
> [Template merge - langs/und] Removed most of the old hyph files not needed anymore. 2020-04-24T17:39:35+00:00
302a331
> [Template merge - langs/und] Switched build to new, shallower build structure. The old files and dirs are still there, but not used. 2020-04-24T16:32:06+00:00
303a333
> [Template merge - langs/und] Forgot one file to be copied up one dir level, now done. 2020-04-24T14:02:29+00:00
304a335
> [Template merge - langs/und] Step one in flattening the tools/hyphenators/ dir tree: copying and renaming make files, copying the filter dir. The files are not yet connected. Also preparing new build instruction file. 2020-04-24T12:43:02+00:00
305a337
> [Template merge - langs/und] Added missing quote mark „ that caused unwanted behaviour in tokenisation. 2020-04-23T07:32:58+00:00
306a339
> [Template merge - langs/und] Updated references to dir names in giella-shared: requires new version of giella-common. Updated some test scripts to refer to the new dir names. 2020-04-23T06:57:49+00:00
308a342,343
> [Template merge - langs/und] The second big renaming: src/morphology/ -> src/fst/. All build, test and config files are updated. `make` and `make check` works for sma. 2020-04-22T15:49:16+00:00
> [Template merge - langs/und] Added dynamic construction of a regex of flag diacritics found in tokeniser fst's. The regex is used to ensure that flag diacritics are considered epsilons at token boundaries. Fixes a number of tokenisation bugs. 2020-04-22T09:31:30+00:00
309a345
> [Template merge - langs/und] A glaring miss stopped all nightly builds. Thanks to Tino for pointing out. 2020-04-22T05:45:52+00:00
311a348,349
> [Template merge - langs/und] Renamed src/syntax/ to src/cg3/, and updated all references to it. Part of the large restructuring, and a test case for more complex renaming. 2020-04-21T16:05:38+00:00
> [Template merge - langs/und] More cleanup after removing src/phonology/*: all references to it have been replaced, and the file am-shared/src-phonology-dir-include.am has been removed. 2020-04-21T07:24:39+00:00
312a351
> [Template merge - langs/und] Forgot to remove src/phonology/Makefile from configure.ac. Duh. 2020-04-20T18:42:24+00:00
313a353
> Deleted src/phonology/ dir after all source files have been moved to src/morphology/. Some files have been renamed. All builds should continue to work as before. 2020-04-20T14:20:52+00:00
314a355
> [Template merge - langs/und] Changed documentation extraction & building to get the source doc in src/morphology/. 2020-04-20T12:09:24+00:00
315a357
> [Template merge - langs/und] The big switch: building phonology files are now changed from src/phonology/ to src/morphology. Documentation is still built in the old location, but will be moved separately due to higher conflict risk. 2020-04-20T11:39:13+00:00
316a359
> [Template merge - langs/und] Update phonology filename in src/morphology/Makefile.modifications-phon.am. 2020-04-20T07:47:35+00:00
318a362,363
> [Template merge - langs/und] Copy src/phonology/Makefile.am to src/morphology/Makefile.modifications-phon.am and src/phonology/xxx-phon.twolc to src/morphology/phonology.twolc as step one in moving the file. Then the build can switch, and finally, the old files can be deleted. 2020-04-18T15:45:08+00:00
> [Template merge - langs/und] Corrected copy-paste bug in the build steps for areal grammar checker analysers. The bug caused SMJ to fail. 2020-04-17T06:38:12+00:00
319a365
> [Template merge - langs/und] Fixed bug with multiple declarations of EXTRA_DIST and noinst_DATA in the previous template merge. 2020-04-17T06:16:29+00:00
320a367
> [Template merge - langs/und] Preparations for moving the phonology files inside morphology/ (later to be renamed fst/). 2020-04-17T06:02:44+00:00
322a370,371
> [Template merge - langs/und] Reorganised mt/apertium make files so that fixed content is in Makefile.am, and user-editable content is in Makefile.modifications.am. 2020-04-07T13:19:31+00:00
> [Template merge - langs/und] Started splitting the local Makefile.am in two, by moving it to a new filename, and then create a new Makefile.am that just includes the moved one. In later commits, some of the content can be moved from one file to the other. 2020-04-06T12:02:08+00:00
323a373
> [Template merge - langs/und] Fixed the remaining cases of improved upper-lower case configurable processing. Removed a variable from configure.ac with comments, turned out it wasn't needed. 2020-04-05T11:22:29+00:00
324a375
> [Template merge - langs/und] First step in fixing default case handling: downcasing of derived proper nouns can now be turned off for the standard fst's by changing a test in configure.ac. 2020-04-03T13:10:58+00:00
325a377
> [Template merge - langs/und] Fixed bug in phonology compilation when there are multiple phonology files: temporary files were deleted before being used due to name overlap. 2020-03-31T07:29:06+00:00
326a379
> [Template merge - langs/und] Added Automake variables to handle demanding or non-default uppercasing, or writing systems with no case distinction at all. 2020-03-30T13:48:43+00:00
328a382,383
> [Template merge - langs/und] Adding |{➤}|{•} to pmscript. 2019-12-16T08:21:40+00:00
> [Template merge - langs/und] Added ‹ and › to the list of possible punctuation marks in the tokenisers. 2019-11-15T12:38:01+00:00
329a385
> [Template merge - langs/und] Added Makefile setting for enabling swaps in error models (ie ab -> ba). Default is no (as this used not to work, and the existing error models are based on this fact). 2019-11-06T17:23:16+00:00
330a387
> [Template merge - langs/und] Replace UNDEFINED with __UNDEFINED__, so that text replacement can take place. 2019-10-24T15:58:00+00:00
332d388
< Updated ignore patterns. 2019-10-23T18:40:46+00:00
333a390
> [Template merge - langs/und] Forgot to add cgbased to the SUBDIRS variable in tools/mt/Makefile.am. 2019-10-22T08:38:49+00:00
334a392
> [Template merge - langs/und] Added basic support for CG-based machine translation. Ongoing work. 2019-10-22T07:38:18+00:00
335a394
> [Template merge - langs/und] Make sure some jspwiki header files for generated documentation are included in the distro. 2019-10-16T06:14:21+00:00
336a396
> [Template merge - langs/und] Made it possible to disable Forrest validation when Forrest is installed. This reduces build time and annoying warnings for people not working on the documentation. Default is still to do Forrest validation. 2019-10-14T11:00:25+00:00
337a398
> [Template merge - langs/und] Wrapped command line tools in double quotes, to protect against spaces in pathnames. Spaces will occur when building on Windows using Windows Subsystem for Linux, as locations such as 'Program Files' are included in the default search path. 2019-10-10T10:54:00+00:00
339,343c400
< ignore *.fomabin. 2019-10-08T06:35:05+00:00
< ign 2019-10-07T21:32:11+00:00
< ign 2019-10-07T21:15:15+00:00
< ign 2019-10-07T21:13:09+00:00
< Force unix line endings, to make sure it works ok also on the Windows subsystem for Linux. 2019-10-07T17:16:53+00:00
---
> [Template merge - langs/und] Improved build process for pattern hyphenators - now patgen config is done programmatically instead of interactively. The values are configured in the Makefile.am. 2019-10-03T05:57:00+00:00
344a402
> [Template merge - langs/und] Added script for testing tag coverage, made by Kevin, and originally for sme. 2019-09-17T08:43:16+00:00
345a404
> [Template merge - langs/und] Added support for multiple whitespace analysers. 2019-09-05T07:14:23+00:00
346a406
> [Template merge - langs/und] Added support for comments in error model text files. Added support for zipped but uncompressed files (required by divvunspell for now). 2019-09-05T04:09:35+00:00
347a408
> [Template merge - langs/und] Added simple shell script to easily run the grammar checker test tool, and considering build directories etc. 2019-08-09T12:17:42+00:00
348a410
> [Template merge - langs/und] Generate and compile the new filter for removing semantic tags in front of derivations. Require new version of the giella-core because of dependencies. 2019-06-14T11:12:43+00:00
349a412
> [Template merge - langs/und] Make sure all generated files have a suffix that will make them be ignored. Added comments to clarify. 2019-06-14T08:26:42+00:00
350a414
> [Template merge - langs/und] Børre updated the documentation url to point to giellalt.uit.no. 2019-06-14T07:26:15+00:00
352d415
< Updating svn ignores for tools/analysers/. 2019-06-14T06:38:51+00:00
354a418
> [Template merge - langs/und] Fixed stupid copy-paste error in the previous commit. Reorganised the code a bit to make a variable definition clearer and more logical. 2019-05-27T11:15:18+00:00
355a420
> [Template merge - langs/und] Make sure that the input to all variants of the mobile speller is weighted. 2019-05-27T07:21:09+00:00
357,358c422
< Updating svn ignores. 2019-05-24T09:55:04+00:00
< Updating svn ignores. 2019-05-24T09:44:55+00:00
---
> [Template merge - langs/und] Fixed fsttype mismatch error for filters when building mobile spellers, by building filters locally of the correct fst type, as we do for desktop spellers. 2019-05-24T09:29:23+00:00
365a430
> [Template merge - langs/und] Added UpCase function to the tokenisers, to handle all-upper variants of the input side. It does almost double the size of the fst, but at least it is just one additional line of code. Also, it does only work in Linux/using glib (for other platforms it is restricted to Latin1 - still, that covers a major portion of the Sámi fst's and running text, so much better than nothing). 2019-03-22T14:47:09+00:00
366a432
> [Template merge - langs/und] Ensure that the correct grammar checker pipeline is the default one, so that it will be executed when no pipeline is specified. 2019-03-13T08:52:36+00:00
367a434
> [Template merge - langs/und] Added the new multichar +Symbol to the multichar definitions. 2019-02-28T08:04:36+00:00
368a436
> [Template merge - langs/und] Changed sub-post tag for symbols from +ABBR to +Symbol. Needs to be declared as multichar in each language. 2019-02-27T13:33:56+00:00
370d437
< Updated svn ignores. 2019-02-27T10:18:02+00:00
371a439,440
> [Template merge - langs/und] Added support for shared Symbol file: build rules, affix file, modifications to root.lexc. Also increased required version of giella-common, to make sure that the shared stem file is actually there. 2019-02-26T13:38:54+00:00
> [Template merge - langs/und] Fixed dir name typo that broke compilation. 2019-02-25T18:21:29+00:00
372a442
> [Template merge - langs/und] Added support for building an analyser tool. This is in practice an xml-specified pipeline identical to what is used in the grammar checker, but where the pipeline does text analysis instead of grammar checking. Also made grammar checkers and mobile spellers part of the --enable-all-tools configuration. 2019-02-25T17:08:10+00:00
374a445
> [Template merge - langs/und] Added filter to remove the +MWE tag from the grammar checker generator. It blocked generation of some word forms (and should not be visible in any case). 2019-02-13T07:50:10+00:00
375a447
> [Template merge - langs/und] Fixed another case of transducer format mismatch for hyphenators, this time regarding pattern-based hyph building. 2019-01-25T08:54:49+00:00
376a449
> [Template merge - langs/und] Corrected an instance of transducer format mismatch when building hyphenators. 2019-01-25T08:11:34+00:00
378a452,453
> [Template merge - langs/und] Make the mobile keyboard layout error model work properly (ie on input longer than one char) by circumfixing it with any-stars. 2019-01-17T19:43:44+00:00
> [Template merge - langs/und] First round of improved handling of compilation errors in shell pipes: instruct make to delete targets when some of the intermediate steps fail. 2019-01-11T13:53:45+00:00
379a455
> [Template merge - langs/und] Added configure.ac conditional to control whether spellers for alternative orthographies are built. The default is 'true'. Set this to 'false' for historical or other orthographies for which a speller is not relevant. 2019-01-09T10:41:54+00:00
380a457
> [Template merge - langs/und] Fix broken hfst builds of xfscript files when there is no final newline in the source file (caused the save command to be shadowed by the final line of text, usually a comment, so no file was saved, and thus there was nothing to work on for the next build step). 2019-01-09T09:00:58+00:00
381a459
> [Template merge - langs/und] Apply alternate orthography conversion after hyphenation marks have been removed, but before the morphology marks are deleted. Especially word boundaries are useful for certain types of conversion, but other borders will likely be useful as well. The conversion scripts need to take the border marks into consideration. 2019-01-08T09:00:11+00:00
383,384c461
< Ignore compiled cg3 files in tools/tokenisers/. 2019-01-08T07:08:34+00:00
< Ignore more files, including files that are automatically added to svn when populating a new language. This is done to avoid them showing up as noise for external languages, in which case these files might not be in our svn (but in the external svn repo instead). 2019-01-08T06:55:51+00:00
---
> [Template merge - langs/und] Replicate the desktop error model for the mobile speller, and generalise the corpus weighting compilation. Now the build code is ready for mobile speller release. 2018-12-17T17:51:19+00:00
385a463
> [Template merge - langs/und] Improved Easter egg generation, using the improved script in giella-core. Increased the required giella-core version correspondingly. 2018-12-14T09:21:49+00:00
386a465
> [Template merge - langs/und] Cleaned the HFST_MINIMIZE_SPELLER macro, and also its use. No need to include push weights anymore, it is done always, for all speller fst's. 2018-12-13T10:23:33+00:00
387a467
> [Template merge - langs/und] Push weights for all final fst's, + optimise error model. 2018-12-13T10:01:59+00:00
388a469
> [Template merge - langs/und] Changed how the att file is produced. From now on it should be built once, and then added to svn. The att file will usually not change, and storing it in svn will avoid rebuilding it every time. Also changed the compression. 2018-12-12T14:57:50+00:00
389a471
> [Template merge - langs/und] Added support for adapting the error model to the mobile keyboard layout for the language in question. 2018-12-11T14:28:15+00:00
390a473
> [Template merge - langs/und] Two more places to remove the Use/-GC and the MWE tags: mt and speller fst's. Now done. 2018-11-06T07:56:14+00:00
391a475
> [Template merge - langs/und] Had forgotten to remove the Use/-GC tag in the core fst's, only from all the others. Now fixed. 2018-11-05T15:58:02+00:00
392a477
> [Template merge - langs/und] Step 2 in blocking dynamic compounds of MWE tagged entries: moved all MWE tag processing away from the *-raw-* targets to the specific *.tmp targets. This way the MWE tags will survive long enough to be available for the blocking done in the tokeniser fst's. Tested in SME, and seems to work as intended. 2018-11-05T09:12:10+00:00
393a479
> [Template merge - langs/und] Added step 1 in blocking dynamic compounds between an MWE and another noun: added new filter that will turn the MWE tag into a flag diacritic. Increased required giella-common version number due to the new filter. 2018-11-02T11:31:21+00:00
394a481
> [Template merge - langs/und] Fixed bug when building the punctuation file - the required subdir was not made. 2018-10-24T08:39:53+00:00
396,397c483
< ignore for bin 2018-10-14T13:31:01+00:00
< added korp.cg3 to svn ignore. 2018-10-14T12:56:20+00:00
---
> [Template merge - langs/und] Moved the whitespace analyser almost to the beginning of the pipeline, directly after the tokeniser+analyser. This is to be able to support sentence boundary detection, as the whitespace analyser will give some valuable tags for that. 2018-10-12T14:09:24+00:00
398a485
> [Template merge - langs/und] Corrected typo in a configuration option - dekstop instead of desktop. Thanks to our friends in Nuuk for noticing. 2018-10-11T15:56:26+00:00
399a487
> [Template merge - langs/und] Corrected a misplaced dependency that caused url.hfst to be rebuilt on every make, and thus trigger other rebuilds. Not anymore. 2018-10-09T14:44:02+00:00
400a489
> [Template merge - langs/und] Moved whitespace tagging after the speller, to avoid that it creates trouble for the speller. That happens when whitespace error tags are applied to the word form that should be spell-checked. 2018-10-09T14:09:19+00:00
401a491
> [Template merge - langs/und] Made it possible to tag something as _only_ for the grammar checker, or _not_ for the grammar checker. Updated required giella-share version, due to new required filters. 2018-10-09T11:51:00+00:00
402a493
> [Template merge - langs/und] Moved whitespace chars to the blank regex, thereby reinstating the old compilation speed. Thanks to Kevin and Tino for noticing and suggesting the improvement. Also added comment to document what incondform is supposed to contain, again thanks to Kevin. 2018-10-09T10:08:39+00:00
403a495
> [Template merge - langs/und] Removed hyphen from the regular unknown alphabet, thereby reverting analysis of -foo as one (unknown) token, and instead back to two tokens. Added hyphen to alphamiddle, so that foo-bar will still be analysed as one big unknown token. 2018-10-09T08:59:36+00:00
404a497
> [Template merge - langs/und] Added the tokenisation disambiguation file to the compiled and installed targets. 2018-10-09T07:36:28+00:00
406a500,501
> [Template merge - langs/und] Better handling of unknowns: defined more whitespace characters, defined a lot more vowels in the alphabet, added recent improvements to flag diacritic like symbols at token boundaries. 2018-10-08T17:23:38+00:00
> [Template merge - langs/und] Fixed two build bugs: abbr.txt was only autogenerated when building with hfst, and the url.?fst file was not properly generated from url.tmp.?fst. 2018-10-04T11:05:54+00:00
407a503
> [Template merge - langs/und] Fixed bug in MT compilation - pattern rules are not used, but new filenames still had them due to copy-paste error. 2018-10-04T08:45:22+00:00
408a505
> [Template merge - langs/und] Added pmatch filtering also to MT and spellcheckers. Now all tools and fst's should be covered. 2018-10-04T08:00:51+00:00
409a507
> [Template merge - langs/und] Forgot to add pmatch filtering to the default targets in src/ - duh. Now done. 2018-10-04T07:38:41+00:00
410a509
> [Template merge - langs/und] Added pmatch filtering to the rest of the build targets in src/. Also added grammar checker filtering. 2018-10-03T10:52:14+00:00
411a511
> [Template merge - langs/und] Major reorganisation to properly handle pmatch preparations, by splitting the disamb-analyser compilation in two: one going to the regular disamb analyser, and the other going to the pmatch variant. We use the two tags +Use/PMatch and +Use/-Pmatch in complementary distribution to specify paths for each, one path containing pmatch backtracking points (used with the --giella format of hfst-tokenise), and one without. The backtracking machinery is used to handle ambiguous tokenisation. Increased required version of giella-shared due to new, required filters. 2018-10-03T07:47:51+00:00
412a513
> [Template merge - langs/und] More improvements to the analysis regression check: undo space->underscore from lookup2cg (to avoid meaningless diffs when comparing to the new hfst-tokenise), and removed weight info. Also changed the dir ref for abbr.txt to ref the build dir, not the source dir, as that is where the file is generated. 2018-10-01T09:58:02+00:00
413a515
> [Template merge - langs/und] Improved regression check script: check that the abbr file is built, for improved traditional tokenisation; and make the patch command silent, for less noise during testing. 2018-09-29T12:13:46+00:00
414a517
> [Template merge - langs/und] Thanks to Børre, the analysis regression script will now remove diffs due to different handling of dynamic compounds when comparing old and new tokenisation. This makes it much easier to spot real differences between the two. 2018-09-25T10:18:52+00:00
415a519
> [Template merge - langs/und] Improved shell script for analysis regression testing, so that in cases of no diffs it will only print a short message and continue. The test for no diff is also much faster than a real diff. Improves processing time a lot for large test corpora. 2018-09-25T06:58:11+00:00
417,418d520
< svn ignore update 2018-09-20T08:44:05+00:00
< updated svn ignore. 2018-09-20T08:28:11+00:00
419a522,523
> [Template merge - langs/und] Moved punctuation definitions from each language to giella-shared/all_langs/. Makes much more sense, and will help in resolving random tokenisation bugs due to « and ». 2018-09-13T09:55:23+00:00
> [Template merge - langs/und] Implemented the option to compile phonology rules directly against the lexicon, for better rule compilation optimisations. Kevin: fixed a bug in xml generation for the grammar checker. 2018-09-11T07:40:05+00:00
420a525
> [Template merge - langs/und] Fixed hyphenation build when there is no phonology file. 2018-09-10T11:54:43+00:00
422c527
< More general ignore pattern for tools/mt/apertium/tagsets/. 2018-09-10T11:16:40+00:00
---
> [Template merge - langs/und] Corrected an error after the Hunspell config section was commented out. 2018-09-10T11:00:00+00:00
423a529
> [Template merge - langs/und] Added --enable-all-tools option to configure.ac, to allow for easier configuration and testing of all common tools. Unstable or experimental tools must still be explicitly enabled. Commented out the Hunspell speller config completely, it is not supported. Corrected a comment. 2018-09-10T10:36:30+00:00
425c531
< Updated svn ignore patterns. 2018-09-08T05:26:27+00:00
---
> [Template merge - langs/und] Improved and completed the code to skip building phonology fst's. Clearer logic and comments. 2018-09-08T05:01:12+00:00
427a534,535
> [Template merge - langs/und] Added a configure.ac setting to skip phonology compilation, typically used when compiling external sources, that provides a full analyser in src/morphology. Also added a configuration option to compile xfscript files with lexicon references in them, so allow for faster and more optimised rule composition. This variable has no effect yet, the rest of the machinery is missing. 2018-09-07T22:33:46+00:00
> [Template merge - langs/und] Remove all tmp files when cleaning. 2018-09-06T11:45:10+00:00
428a537
> [Template merge - langs/und] Remove also url.tmp.lexc when cleaning. 2018-09-06T11:39:28+00:00
429a539
> [Template merge - langs/und] Fixed bug: the url analyser is located elsewhere, and should not be processed here in any case. 2018-09-06T10:09:51+00:00
430a541
> [Template merge - langs/und] Made url analyser compilation open for local adaptations, by going via a tmp file. 2018-09-06T07:41:14+00:00
431a543
> [Template merge - langs/und] Remove also url.lexc when cleaning, it is copied from giella-shared. 2018-09-05T13:53:35+00:00
432a545
> [Template merge - langs/und] Corrected double installation of url analyser bug. It should not be installed at all. 2018-08-31T17:48:34+00:00
433a547
> [Template merge - langs/und] Add missing ‘|’ in analyser-gt-whitespace.hfst goal. 2018-08-31T11:04:43+00:00
435c549
< Updated svn ignores. 2018-08-30T16:00:09+00:00
---
> [Template merge - langs/und] Fixed a bug in the previous commit that surfaced when enabling tokenisers but not grammar checkers. 2018-08-30T14:09:30+00:00
436a551
> [Template merge - langs/und] Massive rewrite of filter codes and automatically generated tag conversions, all done to handle bug #2474 (URL tag not correctly formatted in the tokeniser output). The bug should be fixed now. 2018-08-30T12:52:08+00:00
438c553
< Updated svn ignores. 2018-08-29T05:25:34+00:00
---
> [Template merge - langs/und] Added filter dir and filter compilation to the fst-based hyphenators. Moved filter compilation from src/filters/ to the local filter dir (by copying the regex files and then compiling them), to make the build process mostly fst format independent. 2018-08-28T11:48:14+00:00
440c555
< Updating svn ignores. 2018-08-28T10:47:06+00:00
---
> [Template merge - langs/und] Added support for local modifications of the hyphenator build via a tmp file. Simplified tmp file handling in the src/ dir. 2018-08-27T12:21:35+00:00
441a557
> [Template merge - langs/und] Added dir structure and Autotools data to prepare for adding hyphenation testing. 2018-08-27T11:43:54+00:00
442a559
> [Template merge - langs/und] Downcasing of derived proper nouns was only applied on the input side, not the hyphenated side. This caused such words to be case-shifted: arabialaččat -> A^ra^bi^a^lač^čat. This is now fixed. 2018-08-27T07:55:28+00:00
443a561
> [Template merge - langs/und] Fixed hyphenation bug where the lexicon-based hyphenator missed hyphenation points, mainly in proper nouns, due to flag diacritics. Fixed by telling the fst compiler to treat flags as epsilons. Now the lexicon-based hyphenator is beating the plain rule-based one in most (all?) cases where there are differences. Must be tested better, though. 2018-08-27T06:22:37+00:00
444a563
> [Template merge - langs/und] Added comment to guide placement of local build targets (to avoid future merge conflicts), and a comment reminder about other places to change filenames. 2018-08-22T06:51:27+00:00
447a567,568
> [Template merge - langs/und] Reorganised the source filenames to make it easy to override when needed. Should make it possible to solve the bug where src/syntax/disambiguator.cg3 overrides the same file in tools/grammarcheckers/. 2018-08-20T17:16:38+00:00
> [Template merge - langs/und] Refactored repeating patterns of code with variables, fixes upload link after XServe crash last winter. 2018-08-20T10:02:08+00:00
448a570
> [Template merge - langs/und] Corrected and improved the compilation of the analysers including the URL analysis. This should fix the problem with compiling SMA and other languages, and should in general reduce both compilation time and analyser size. The basic change was to union in the URL analysis as the last step in building the analysers, instead of early - the early injection led to fst blowup during minimisation. Now no blowup appears to take place. 2018-06-05T12:25:17+00:00
449a572
> [Template merge - langs/und] Added the special target .NOTPARALLEL to the hfst speller make file, to work around a make bug that caused a prerequisite to not be built when invoking make with the -j option. Also added some comments. 2018-05-18T13:13:02+00:00
450a574
> [Template merge - langs/und] Updated command in comments to use the correct option. 2018-05-18T06:59:52+00:00
451a576
> [Template merge - langs/und] Reverted the more robust semantic tag reordering, it was just too slow. Now we are back to a less robust and more fragile system (including bugs), but with faster compilation. Ultimately we will abandon _semantic_ tag reordering altogether, and instead rewrite the lexc code to always place the semantic tags where they should be. 2018-05-16T09:10:08+00:00
452a578
> [Template merge - langs/und] Corrected automake (and make?) syntax error that broke compilation. 2018-05-15T11:15:12+00:00
453a580
> [Template merge - langs/und] Simplified semantic tag filtering regex construction. 2018-05-15T07:58:30+00:00
455c582
< More things to ignore. 2018-05-14T10:33:30+00:00
---
> [Template merge - langs/und] Too eager in the previous commit to get rid of semantic tag processing: removed the filter to zero out semantic tags completely, which broke compilation of a number of fst's where semantic tags are not wanted. 2018-05-09T08:15:36+00:00
456a584
> [Template merge - langs/und] Corrected bugs in reordering semantic tags by doing the reordering in two steps: 1) insert the tag in the new and correct position, and 2) remove the tag in the wrong position. There will probably be things to iron out, but initial tests are fine. This should also make the whole semantic tag reordering a bit faster to compile and apply, as the generated regexes are smaller and simpler. 2018-05-08T20:34:56+00:00
458a587,588
> [Template merge - langs/und] Now that the downcasing script works in all cases, remove all the special processing, and get rid of spurious rebuilds of the dependent fst's. Another time-saver:-) 2018-05-02T10:11:01+00:00
> [Template merge - langs/und] Changed the downcasing script to work also with hyperminimised hfst-fst's. Now the downcasing script works both with Xerox, Hfst and Foma, and both with standard and hyperminimised hfst-fst's. Finally! 2018-05-02T09:22:31+00:00
459a590
> [Template merge - langs/und] Added support for filters for grammatical and derivation tags, sorted the generated filter list. 2018-04-23T14:47:36+00:00
460a592
> [Template merge - langs/und] Bugfix: OLang/xxx tags were removed, not made optional, in generators. 2018-04-20T08:33:08+00:00
461a594
> [Template merge - langs/und] Do not delete disambiguator.cg3 and grammarchecker.cg3 when cleaning. 2018-04-19T08:50:10+00:00
462a596
> [Template merge - langs/und] Whether to let the orig-lang tags be visible in the disambiguating analyser or not is dependent on the language and the needs of each language community. Moving the removal of those tags from the general processing to the language specific processing. Step 2: removing it from the general processing. 2018-04-18T13:16:59+00:00
463a598
> [Template merge - langs/und] Added the -p option to the yaml testing command, to remove all passing tests. This should make it easier to spot the actual FAILs. 2018-03-08T12:53:50+00:00
464a600
> [Template merge - langs/und] Corrected path to zhfst file. Also changed the return code when the zhfst file is not found, so that it will be reported as a FAIL. Since this test is only run when configured for building spellers, a missing zhfst file should be fatal. Also changed variable name to avoid confusion with the shell variable. 2018-03-08T11:03:17+00:00
465a602
> [Template merge - langs/und] Added phony target forwarding 'make test' to 'make check'. Required to make 'make check' work on some build systems. 2018-03-08T10:43:52+00:00
466a604
> [Template merge - langs/und] Added a separate disambiguation file for the spell checker output, and a spell-checker-only pipeline (well, still tokenisation and disambiguation, but no proper grammar checking). 2018-03-05T15:41:24+00:00
467a606
> [Template merge - langs/und] Corrected Foma compilation for phonology rules. 2018-03-05T10:24:41+00:00
469,472c608
< Added ignore pattern for in.txt 2018-03-01T07:09:50+00:00
< More ignores 2018-03-01T06:52:33+00:00
< More svn ignores. 2018-03-01T06:25:59+00:00
< Added svnignore pattern for sigma.txt. 2018-02-21T09:49:57+00:00
---
> [Template merge - langs/und] Made symbol alignment default - I can see no cases where we don't want it, but it is still possible to disable it if such a need pops up. Also improved the error message when trying to build a twolc language using Foma. 2018-02-09T08:08:26+00:00
473a610
> [Template merge - langs/und] Added INFO text about switching to Hfst as a fallback when Xerox tools are not found. Also added test and error message when using Foma on a language with a twolc file. 2018-02-09T07:36:52+00:00
475c612
< Two more files to ignore. 2018-02-06T09:44:18+00:00
---
> [Template merge - langs/und] Fixed URL analysis in MT. All URL's and email addresses are now tagged +URL. Although the url analyser itself is small, the resulting analyser quadrupled in size (in sme). 2018-02-05T19:50:53+00:00
476a614
> [Template merge - langs/und] Removed filters for removing morphological borders - they destroy the asymmetry of the fst's, and make yaml testing more complicated. 2018-02-02T08:15:13+00:00
477a616
> [Template merge - langs/und] Added support for Area variants of the grammar checker generator. Should fix nightly build error for SMJ. 2018-02-01T19:35:32+00:00
478a618
> [Template merge - langs/und] Added missing Foma support for dictionary fst's. 2018-02-01T18:40:36+00:00
479a620
> [Template merge - langs/und] Fixed the last bunch of path errors. Now all yaml tests are back to normal. 2018-02-01T17:55:01+00:00
480a622
> [Template merge - langs/und] Cleanup: commented in outcommented test loop, removed exit statement used during development, fixed path for two test scripts. 2018-02-01T16:02:11+00:00
481a624
> [Template merge - langs/und] The last set of test runners for yaml tests changed to the new system. 2018-02-01T15:18:02+00:00
482a626
> [Template merge - langs/und] Three more yaml test runners done, still a few more to go before yaml testing is back in shape. 2018-02-01T14:01:58+00:00
484a629,630
> [Template merge - langs/und] Changed the last yaml testing scripts in the template to follow the new and improved system. No need for autoconf processing anymore. 2018-02-01T12:11:33+00:00
> [Template merge - langs/und] Major rework of the yaml testing framework, to be able to properly support fst type specific yaml testing (ie test only xfst or hfst transducers, or everything but xfst transducers (=foma & hfst)). This change triggered a number of other changes. The user-facing shell scripts are greatly simplified by this change. 2018-02-01T10:03:29+00:00
486c632
< Updated svn ignores. 2018-01-31T12:13:59+00:00
---
> [Template merge - langs/und] Corrected AM errors in the previous merge. Now the build is working again. 2018-01-31T11:43:08+00:00
487a634
> [Template merge - langs/und] Added support for grammar checker generators for alternative orthographies and writing systems. Should fix nightly build issue in CRK. 2018-01-31T11:23:45+00:00
488a636
> [Template merge - langs/und] Added support for a grammar checker specific generator. Should fix various issues re generation of suggestions. 2018-01-25T09:46:31+00:00
490a639
> [Template merge - langs/und] Added test for the presence of divvun-validate-suggest, which is now required to build grammar checkers. Now configure will error out instead of make. 2018-01-23T07:34:32+00:00
491a641
> [Template merge - langs/und] Add note to the errors.xml file that it is generated, and from which file it is generated, to avoid people editing the wrong file. 2018-01-22T12:42:30+00:00
492a643,644
> [Template merge - langs/und] Error messages are now copied from a source file to a build file, after being validated. This allows support for VPATH builds and retains the integrity of the zcheck file. At the same time also replaced hard coded language names with automake variable expansion in the pipespec.xml.in file. 2018-01-22T10:42:27+00:00
> [Template merge - langs/und] Fixed bug in building dictionary analysers for alternative orthographies, introduced in the changes yesterday. 2018-01-18T07:10:39+00:00
493a646
> [Template merge - langs/und] Added option to specify language variant, to allow testing spellers for alternative writing systems, alternative orthographies, different countries etc. 2018-01-18T06:36:58+00:00
494a648
> [Template merge - langs/und] Added support for area / country specific fst's for the specialised dict and oahpa build files. At the same time reorganised the build code so that targets with two variables now consistently use the fst type / suffix as the pattern, and the writing system/alt orth/area/etc as the function parameter. This should make the build system more robust by reducing the risk for accidental pattern similarity. 2018-01-17T11:38:19+00:00
496a651,652
> [Template merge - langs/und] Added support for building area/country specific spellers. The target language for now is SMJ, but the feature is of course language independent and useful in a number of other circumstances. 2018-01-16T19:48:02+00:00
> [Template merge - langs/und] Changed dialect fst filenames to follow existing patterns used for Oahpa fst's. 2018-01-16T14:44:17+00:00
497a654
> [Template merge - langs/und] Added support for building dialect fst's. It is disabled by default, but can be enabled with a configure option. Also changed the disamb analyser to keep the dialect tags. Only normative fst's are filtered against dialect tags. 2018-01-16T12:40:15+00:00
498a656
> [Template merge - langs/und] Added initial support for building Area-specific analysers and generators (norm only). Also restored Area tags in the disamb and grammar checker analysers. Fixed missing support for Foma transducers in the alternative writing system support. 2018-01-16T07:48:28+00:00
499a658
> [Template merge - langs/und] Grammar checker .zcheck file should go into datadir, not libdir. 2018-01-15T11:55:59+00:00
500a660
> [Template merge - langs/und] Now using speller version info from configure.ac, not version.txt, which is removed. New giella-core required. 2018-01-15T10:41:08+00:00
501a662
> [Template merge - langs/und] Fixed a bug in fst format handling for the grammar checker - conflicting formats caused a segfault. Now using openfst-tropical for all fst's being processed in the grammarcheckers/ dir (presently only the speller acceptor analyser). 2018-01-15T08:52:02+00:00
502a664
> [Template merge - langs/und] Fixed OLang tag extraction and filter generation. 2018-01-12T13:20:03+00:00
503a666
> [Template merge - langs/und] Added weights to compounds in the language-independent build steps (languages without compounds will go through the same step, but will not be changed). Applied only to analysers. Also added spellrelax to the language-independent build of the analysers = it is always applied. 2018-01-12T12:01:39+00:00
504a668
> [Template merge - langs/und] Improved the previous fix: make sure it does not crash when the target file does not exist, and use the same test on all autogenerated tag lists. This should save a few more seconds of build time. 2018-01-12T08:34:14+00:00
505a670
> [Template merge - langs/und] Fixed bug #2355 so that the filters for semantic tags will only be rebuilt when there are real changes to the semantic tags. 2018-01-11T17:34:37+00:00
506a672
> [Template merge - langs/und] Corrected a € vs cut incompatibility on Linux, cf bug report #2457. 2018-01-11T08:49:22+00:00
507a674
> [Template merge - langs/und] Updated the pipespec.xml file to comply with the newest version of the grammar checker code, where each argument type is explicitly specified. Makes for a more robust pipeline. 2018-01-10T12:05:59+00:00
508a676
> [Template merge - langs/und] Corrected fileref in m4, added correct autoconf path to errors.xml. 2018-01-08T14:49:25+00:00
509a678
> [Template merge - langs/und] Renamed pipespec.xml to *.in, to allow autoconf processing. This makes it possible to use modes when building using VPATHS/out-of-source builds. 2018-01-08T14:32:42+00:00
511a681
> [Template merge - langs/und] Hard-coded filename in fallback target - that was the only way to work around a loop in make on some systems. 2018-01-08T09:47:12+00:00
513a684
> [Template merge - langs/und] Renamed src/syntax/disambiguation.cg3 to src/syntax/disambiguator.cg3, to keep the file naming consistent (actor noun if possible), and remove discrepancy between the regular disambiguator and the grammar checker disambiguator that caused makefile troubles. 2018-01-08T05:52:54+00:00
516a688
> [Template merge - langs/und] Heavy rewrite of the analysis regression check tool, to support testing the grammar checker pipeline. 2017-12-12T12:20:38+00:00
517a690
> [Template merge - langs/und] Do not remove semantic tags, dialect tags and other tags useful for disambiguation or suggestion generation. The grammar checker speller needs these, and they will anyway disappear when we project the final fst. 2017-12-11T13:08:14+00:00
519d691
< Updated svn ignores. 2017-12-11T12:55:46+00:00
520a693,694
> [Template merge - langs/und] Proper verbosity specification in a few more instances, and added weight pushing for the grammar checker speller now (how could I have missed that?). 2017-12-01T12:31:44+00:00
> [Template merge - langs/und] Fixed a bug in piped hfst-xfst commands: in three cases the -p option was missing, causing strange misbehavior in hfst-xfst on some systems. 2017-12-01T12:09:34+00:00
521a696
> [Template merge - langs/und] Further configure.ac cleanup: moved some variable definitions to other m4 files, moved the language definition on top, deprecated GTLANG* variables for GLANG* variants (ie Giella instead of GiellaTechno). Updated copyright year. 2017-12-01T10:34:48+00:00
522a698
> [Template merge - langs/und] Moved all default AC_CONFIG_FILES into a separate function in a separate m4 file, to clean up configure.ac. Some other cleanup of configure.ac. 2017-12-01T09:34:29+00:00
523a700
> [Template merge - langs/und] Defined variable for separate speller release version string. 2017-12-01T08:24:01+00:00
524a702
> [Template merge - langs/und] Changed package name and version to more clearly be a real name and version number. 2017-12-01T08:10:14+00:00
525a704
> [Template merge - langs/und] Updated comment in preparation for other changes. 2017-12-01T07:53:11+00:00
526a706
> [Template merge - langs/und] Added support for analysing whitespace and thus make it possible to tag whitespace errors (double spaces, extra spaces, etc), and also to more reliably detect sentence and paragraph borders by using whitespace as a delimiter. 2017-11-30T14:24:15+00:00
527a708
> [Template merge - langs/und] Using absolute dir refs to make it possible to call the shell scripts from everywhere. 2017-11-30T12:38:32+00:00
528a710
> [Template merge - langs/und] Fixed a bug: forgot to remove a line. 2017-11-29T13:39:30+00:00
529a712
> [Template merge - langs/und] Rewrote the speller test scripts in devtools/ to be VPATH safe and rely on autotools for paths etc, so that the scripts will work also when only checking out single languages. 2017-11-29T13:22:48+00:00
530a714
> [Template merge - langs/und] Added support for specifying language-specific files to be included in the grammar checker archive file. 2017-11-15T13:28:24+00:00
531a716
> [Template merge - langs/und] Updated grammar checker files and build rules. 2017-11-13T09:50:04+00:00
532a718
> [Template merge - langs/und] Added hfst-push-weights to move transducer weights to the beginning of the strings, to enable proper optimisations of speller lookup in hfst-ospell. Stripped out most lang-specific stuff from grammar checker cg file, and added simple example rules + some explanations. Use gramcheck tokeniser in pre-pipe. 2017-11-07T15:47:41+00:00
534a721,722
> [Template merge - langs/und] Added default rule for speller suggestions, to make the suggestions survive cg treatment. 2017-10-25T09:52:38+00:00
> [Template merge - langs/und] Added spell checking component to the grammar checker pipeline. Now every planned component is working as it should. The spell checking requires first that one builds the latest hfst-ospell code, and then the newest grammar checker code for this to work. 2017-10-24T12:57:42+00:00
535a724
> [Template merge - langs/und] Increased weights for fall-back rule-based hyphenation. Added .hfst suffix to rule fst for consistency. 2017-10-13T07:41:39+00:00
536a726
> [Template merge - langs/und] Replaced the huge sme grammar checker with the more moderate smn grammar checker cg file, as the template file for future grammar checkers. 2017-10-12T08:40:18+00:00
537a728
> [Template merge - langs/und] Added note (readme file) about NOT touching the local am-shared dir, to avoid future unintended changes. 2017-10-12T06:37:15+00:00
540,541c731,732
< Updated svn ignores for tokenisers and grammar checkers + subdirs. 2017-10-11T11:47:18+00:00
< Updated svn ignores for tokenisers and grammar checkers + subdirs. 2017-10-11T11:22:45+00:00
---
> [Template merge - langs/und] Added the missing files for a working grammar checker. Fixed grammar checker build rules to not be dependent upon enabling tokenisers. 2017-10-11T17:47:41+00:00
> [Template merge - langs/und] Added conversion of the analysis tags from the grammar checker speller into CG format. 2017-10-11T06:16:10+00:00
542a734
> [Template merge - langs/und] One misplaced variable caused the grammar checker speller to be built independent of the configuration. This caused a build fail for everyone. Solves bug #2437. Also added $(srcdir) in front of root.lexc, to ensure that the file reference resolves correctly in local build targets. 2017-10-10T09:37:14+00:00
543a736
> [Template merge - langs/und] Moved the target clean-local to the local Makefile, to make it possible to enhance the clean target with locally generated files. 2017-10-10T09:10:43+00:00
544a738
> [Template merge - langs/und] Corrections to the grammar checker speller build: we now build a working zhfst file that can be used as part of the development cycle. Also additions to silent builds. 2017-10-04T07:00:25+00:00
546a741
> [Template merge - langs/und] Major update to the grammar checker template. It still does not work completely as it should, so hold your horses. Update content: ensured that all files needed are copied to the grammar checker build dir, removed option to name files (=irrelevant bloat), now builds an almost proper zip file, and ensured that tokenisers are built before grammarcheckers. Also made it so that when grammar checkers are enabled, spellers are automatically enabled too, as they will be included as part of the grammar checker pipeline. 2017-10-03T06:56:28+00:00
547a743,744
> [Template merge - langs/und] Changed the file exists test for the lemma generation testing so that it will work even in cases where multiple source files are used as input. 2017-09-20T12:00:35+00:00
> [Template merge - langs/und] Made cg3 file compilation more general. 2017-09-19T14:20:12+00:00
548a746
> [Template merge - langs/und] Moved the code to build the apertium relabel script in the apertium directory, so that we can use the actual giella-tagged fst for MT as the tag source. This should fix all issues of missing tags in the relabel script. 2017-09-15T14:15:42+00:00
549a748
> [Template merge - langs/und] GLE requires regex compilation possibilities in src/, no reason why it can't be. 2017-09-14T11:28:42+00:00
551a751,752
> [Template merge - langs/und] Fixed a shortcoming in the build infra uncovered by gle: no explicit support for language-specific build rules that will not end up in lexicon.?fst. 2017-09-14T05:53:36+00:00
> [Template merge - langs/und] Moved tag extraction to a separate am-include file, so that it can be shared between different dirs. Moved generation of regex for turning tags into CG friendly format from src/filters/ to tools/tokenisers/filters/. 2017-08-28T14:22:25+00:00
553c754
< Updating svn ignores. 2017-08-25T10:22:58+00:00
---
> [Template merge - langs/und] After a couple of bug fixes in giella-core, require the new version. 2017-08-25T10:11:56+00:00
555a757,758
> [Template merge - langs/und] Initial support for building tokenisers where the morphological analysis tags are given in CG format directly instead of having to be postprocessed by hfst-tokenise before being printed. The idea is to make the hfst-tokenise code more general, and let everything that is particular to one language or setup go into the fst instead of being hardcoded in the C++ code. There are some issues that must be resolved, but fst-wise the code works. 2017-08-24T11:51:30+00:00
> [Template merge - langs/und] Added support for building a regex that transforms all tags from the format "+Adv" to " Adv" (including space). The idea is to make the tags readily consumable by CG. Both prefix and suffix tags are converted. Newest giella-core required. 2017-08-24T10:09:58+00:00
556a760
> [Template merge - langs/und] Part two of renaming the preprocess dir to tokenisers. Now all refs to it are updated. 2017-08-24T07:30:10+00:00
557a762
> [Template merge - langs/und] Renamed the preprocess dir to tokenisers, to better describe the content of it. 2017-08-24T06:29:58+00:00
558a764
> [Template merge - langs/und] Added support for diffing and merging on Linux. As part of that added checking for diff tools in m4/giella-macros.m4, and added more tests against failures. Also added test for cg-mwesplit, and increased the required vislcg3 version to the 1.0 release. 2017-08-16T10:52:39+00:00
560a767,768
> [Template merge - langs/und] More robust test for the existence of the various vislcg3 files. 2017-08-15T12:21:48+00:00
> [Template merge - langs/und] Added more robust option checking, and a test for the existence of the specified corpus file. Also added some comments. 2017-08-15T07:17:30+00:00
561a770
> [Template merge - langs/und] Actually open the other diff views. And force-add to svn - we don't want error messages in this context. 2017-08-14T14:47:30+00:00
562a772
> [Template merge - langs/und] Corrected glaring variable copy&paste bug. Thanks to Trond for spotting it! 2017-08-14T12:56:47+00:00
564a775,776
> [Template merge - langs/und] Removed from the default build rules the automatic removal of +Comp tags in adverbs. That is definitely not a behavior we want universally. 2017-07-02T01:38:08+00:00
> [Template merge - langs/und] Fixed a bug that caused the check_analysis_regressions.sh script to fail if you hadn't put giella-core/scripts/ in your path - which is not automatically done when you just check out giella-core and your language of interest. 2017-06-30T00:57:46+00:00
565a778
> [Template merge - langs/und] Changed command to extract the specified fst name, the old version was not reliable. 2017-06-29T01:19:43+00:00
567,568c780
< Updated svn ignores. 2017-06-28T23:37:25+00:00
< Updated svn ignores. 2017-06-28T23:08:42+00:00
---
> [Template merge - langs/und] Due to wrong AM conditional, it still built a few mobile speller fst's. Now it should be quiet. 2017-05-23T09:33:02+00:00
569a782
> [Template merge - langs/und] Really do disable mobile spellers by default... 2017-05-23T08:58:00+00:00
570a784
> [Template merge - langs/und] Made mobile spellers not build by default, even when enabling spellers. The mobile spellers must now be explicitly enabled. 2017-05-23T08:40:31+00:00
571a786
> [Template merge - langs/und] Removed Ins() around Unknown. This triggered a bug(?) in hfst-tokenise, that caused wordforms not to be output. Speed and memory consumption should not be noticeably affected. 2017-05-16T17:02:03+00:00
572a788
> [Template merge - langs/und] Improved pmatch scripts - unification by reference instead of full fst unification. Reduces file size by ≈2/3, and runtime memory consumption by 50%. 2017-05-04T10:53:26+00:00
575a792
> [Template merge - langs/und] Now that there is a new version of Hfst out, require it. Should resolve issues with compiling the url.lexc file. 2017-04-18T16:18:49+00:00
581c798
< ign 2017-03-21T19:49:19+00:00
---
> [Template merge - langs/und] Further development of the analysis regression check: added support for diff views of all diff types, and now you can specify which diff view you want to see (and you must specify at least one). You can also override the default corpus, and specify a corpus of your own with the -c/--corpus option. Also corrected the initial description of the script in the help text, and added a diff view comparing the old pipeline using Xerox with the new pipeline using hfst-tokenise. This will help in finding unwanted differences between the two. 2017-03-17T13:38:06+00:00
582a800
> [Template merge - langs/und] Further improvements to the analysis regression check: only do function and dependency analysis if the required cg3 files exist. Also clarified the -d option and silenced the Xerox lookup tool. 2017-03-16T14:34:32+00:00
583a802
> [Template merge - langs/und] Improved analysis regression check script: added a short help text, and added an option to ask for a diff between old-style (preprocess+lookup+lookup2cg) and new-style (hfst-tokenise+mwe-disamb+cg-mwesplit) morphological analysis. Intended to be used to find weak (and strong!) spots in the new-style morphological analysis. 2017-03-16T12:25:01+00:00
584a804
> [Template merge - langs/und] Added the first version of a $LANG/devtools/ script that will process a corpus with the available tools, and compare the result against the previous version in the svn repository. The idea is to be able to easily spot regressions in analyses due to changes in the lexicons or CG rules. There are a number of rough edges, but it works. 2017-03-16T10:23:09+00:00
585a806
> [Template merge - langs/und] Only remove generated lemma files if the lemma generation tests succeed. 2017-03-14T14:45:50+00:00
586a808
> [Template merge - langs/und] Only delete generated dic and tex files if one really wants to start anew. Do not delete the version.txt file, only the generated wordlist file. 2017-03-07T18:46:59+00:00
587a810
> [Template merge - langs/und] Add the url parser also to the grammar checker tokeniser. 2017-03-07T15:01:40+00:00
588a812
> [Template merge - langs/und] Make the url.hfst a dependent of the hfst tokenising analyser. Improved the tokeniser based on recent changes in sme. 2017-03-06T17:09:00+00:00
589a814
> [Template merge - langs/und] Removed automatic inclusion of the url parsing fst. The union with the regular fst blew up the total, in some cases more than 10x! The preferred way of adding it is to add it in the last steps of the *.tmp.fst > *.fst processing by loading it onto the stack (and invert it for hfst) before saving the fst stack, and thus creating a transducer file with two fst's. Applying the input to them both will in effect union them, giving the output we want without blowing up the size of the fst file. 2017-03-03T14:22:03+00:00
590a816
> [Template merge - langs/und] Added support for compiling a lexc file for parsing URL's as such, giving them a separate tag. Only added to the descriptive analysers for now. Requires an updated version of giella-shared, due to the new file needed for the new functionality. 2017-03-02T14:17:37+00:00
591a818
> [Template merge - langs/und] Corrects an inconsistency in the order of tag changing processing, where generators and analysers got their tags changed in different order, which caused different tags in some cases. Fixes bug #2264. Thanks to Heiki-Jaan Kaalep for the new and corrected code. 2017-03-02T06:40:32+00:00
593c820
< Updated svn ignores. 2017-03-01T12:02:48+00:00
---
> [Template merge - langs/und] Updated Python feedback to correctly state that Python 3.5 is required. 2017-02-27T09:33:44+00:00
594a822
> [Template merge - langs/und] Fixed issue with link generation thanks to Heiki-Jaan Kaalep. 2017-02-22T09:03:36+00:00
595a824
> [Template merge - langs/und] Increased required version of Python3, due to the updated speller test bench. 2017-02-15T08:03:00+00:00
596a826
> [Template merge - langs/und] New version of the speller test bench, now with sortable table columns, and optional timing of the suggestions for every input word (hfst-ospell-office only). Not finished, but working quite well. It is also possible now to specify the number of suggestions returned by hfst-ospell-office. 2017-02-14T09:53:07+00:00
597a828
> [Template merge - langs/und] Increased required version of giella-core due to bug fix in the core. 2017-02-03T11:51:25+00:00
598a830
> [Template merge - langs/und] Increased required version of giella-core due to changes in speller building. 2017-02-03T09:51:03+00:00
599a832
> [Template merge - langs/und] One more attempt at fixing the giella-common package bug. 2017-02-02T08:57:53+00:00
600a834
> [Template merge - langs/und] Added final step in building pattern-based hyphenators: now also prepared for Hunspell-like OOo hyphenation. Requires new version of the giella-core. Also corrected bug in checking the version number of giella-common. 2017-02-01T11:11:49+00:00
601a836
> [Template merge - langs/und] Tex pattern based hyphenation generation works. The output must be checked and tested, and the process may have to be rerun several times to get the desired hyphenation behavior. Removed outcommented build code from the old infra - the new build code is essentially just a reformulation of the old one. 2017-01-31T14:44:44+00:00
602a838
> [Template merge - langs/und] Added support for checking the version of the giella-common package (aka giella-shared/). Added two new regexes to the source file list for shared regexes. Updated the required version of Hfst - it has not been updated in ages. 2017-01-31T13:57:18+00:00
603a840
> [Template merge - langs/und] Further work on the pattern based hyphenators: added tra file template, which is used to 'translate' non-ASCII chars to ascii only for the pattern creation process. Initial build steps for the pattern build. 2017-01-31T12:29:27+00:00
604a842
> [Template merge - langs/und] Improved the fst-based hyphenator by removing irrelevant paths from the fst. Started work on the pattern-based hyphenator, based on code from the old infra. 2017-01-31T11:12:55+00:00
605a844
> [Template merge - langs/und] Finished first version of fst-based hyphenator: now includes plain rules as a fall-back solution (including for misspelled words), and Err-tagged forms get a high weight penalty. In general, this seems to give good hyphenation patterns if one picks the first (lowest-weight) one. 2017-01-30T13:51:44+00:00
606a846
> [Template merge - langs/und] First version of lexicon-based and fst-based hyphenation done. Works, but misses capitalised words, and does not give extra weights to Err-tagged word forms. Also no hyphenation of misspelled words yet. Hyphenation builds are off by default. 2017-01-30T12:15:03+00:00
609c849,850
< Updated svn ignores. 2017-01-30T10:04:48+00:00
---
> [Template merge - langs/und] Added template file for weighting tags when the fst is used as a hyphenator. 2017-01-30T10:41:28+00:00
> [Template merge - langs/und] Added check for cg-relabel when enabling apertium. Thanks to Flammie for identifying the issue. 2017-01-30T09:32:11+00:00
610a852
> [Template merge - langs/und] Added basic dir structure for building hyphenators. 2017-01-27T07:35:06+00:00
612a855
> [Template merge - langs/und] Replaced gtcore with giella-core. 2017-01-25T10:01:02+00:00
614a858
> [Template merge - langs/und] Added test dir for hyphenators, to store data from the old infra. 2017-01-23T10:52:55+00:00
615a860
> [Template merge - langs/und] Added test dirs for listbased spellcheckers, if we ever get to that. 2017-01-23T09:09:07+00:00
616a862,863
> [Template merge - langs/und] Fixed logical error in the handling of negated specified fst handling in yaml tests (e.g. ~xfst) - the test didn't work, and the yaml file was run when not intended. 2017-01-18T00:33:00+00:00
> [Template merge - langs/und] Fixed regression introduced in the previous commit: one-sided tests were included when looking for test data, causing a subsequent python fail when no actual test data was found. Fixed by using a stricter file name pattern. 2017-01-17T15:52:24+00:00
618a866,867
> [Template merge - langs/und] Added option to specify in a yaml filename that it should only be tested against a specific technology or not, by specifying one of .foma, .hfst or .xfst before the suffix part (before [.gen].yaml), and prefixed with '~' if negated (i.e. .~xfst for NOT running it against Xerox). 2017-01-17T08:48:15+00:00
> [Template merge - langs/und] Slightly more robust yaml testing code. 2017-01-16T15:15:11+00:00
619a869
> [Template merge - langs/und] Common starting point for both weighted and unweighted parts. 2017-01-16T15:08:00+00:00
621a872,873
> [Template merge - langs/und] Added removal of Area tags also for specialised fst's. Fixes Korp issue reported by Ciprian. 2017-01-10T13:51:04+00:00
> [Template merge - langs/und] Ensure the fastest lookup method is used during hfst yaml generation tests. 2016-12-09T09:43:42+00:00
623a876,877
> [Template merge - langs/und] Removed the bash hack to add a css processing instruction - it is done by the perl script writing the xml file. 2016-11-28T19:51:34+00:00
> [Template merge - langs/und] Removed the removal for dialect and variant tags from the grammar checker analyser, the information can be useful when generating suggestions for corrections. 2016-11-23T14:51:00+00:00
625a880,881
> [Template merge - langs/und] Removed repetition of the frequency weighted fst. The goal was to promote compounds where each part was already seen in the corpus, but it made the speller bigger and slower, and actually decreased suggestion quality slightly. — Also added code to do manual priority union, but it is buggy and outcommented for now. 2016-11-21T11:49:22+00:00
> [Template merge - langs/und] Added info about which file to look in to find a suitable frequency corpus cut-off location (=line number). 2016-11-18T09:44:19+00:00
626a883
> [Template merge - langs/und] Renamed the option --enable-hfst-dekstop-spellers (added plural 's'), and changed the behavior of it so that when disabled, zhfst files are still built (and only those). 2016-11-16T10:40:59+00:00
628a886,887
> [Template merge - langs/und] Cleaner build steps for local speller filters - the regex is now copied in and compiled according to the fst-format of the speller as opposed to earlier, where the binary fst was compiled and then transformed. 2016-11-02T23:02:53+00:00
> [Template merge - langs/und] Move CmpNP processing from general speller processing to each language. 2016-11-02T08:13:19+00:00
630a890,891
> [Template merge - langs/und] Also moved the CmpNP filtering to the relevant languages. 2016-11-02T06:45:12+00:00
> [Template merge - langs/und] Forgot one file in the previous commit - now that filter is completely removed from the core and template, and all language-independent processing. 2016-11-01T10:37:04+00:00
631a893
> [Template merge - langs/und] Moved the remove-norm-comp-tags.regex file from the giella-shared directory to the languages actually using it, and consequently removed it from the language-independent build files. 2016-11-01T10:26:05+00:00
632a895
> [Template merge - langs/und] Updated the speller devtools scripts to obey the new name and location of the giella-core directory. 2016-10-26T13:37:44+00:00
633a897
> [Template merge - langs/und] Added test for available GNU Make, and at least at version 3.82. Error if not found, except on OSX/macOS, where the builtin make is GNU Make 3.81 + patches, which corresponds to the required version or newer. 2016-10-26T12:30:01+00:00
635a900,901
> [Template merge - langs/und] Better support for speller filters using source files from other locations. 2016-10-20T14:31:01+00:00
> [Template merge - langs/und] Added mwe-dis.cg3, to allow disambiguation of multiword expressions and other tokenisation ambiguity. 2016-10-18T09:55:59+00:00
636a903
> [Template merge - langs/und] We build the tokenising analysers directly off the disamb and grammar checker analysers in src/, assuming that they are identical. This is a reasonable assumption now that the hfst tool kit contains all necessary machinery, and we don't need to pay special attention to the requirements of the tokenisation. 2016-10-17T07:30:03+00:00
637a905
> [Template merge - langs/und] Make --with-backend-format work also for the tokenising analysers. 2016-10-17T06:44:58+00:00
638a907
> [Template merge - langs/und] Wrong variable name :-( - now it is correct. 2016-10-10T15:01:56+00:00
639a909
> [Template merge - langs/und] Corrected makefile dependency for the und.timestamp file. 2016-10-10T14:50:42+00:00
641a912
> [Template merge - langs/und] More robustness added to the test scripts: checking several variables, testing whether the found variables are pointing to existing directories, and giving an error message if no directory is found. 2016-10-06T15:25:28+00:00
642a914
> [Template merge - langs/und] Changed variable name and definition to allow overriding the path to the called script, to make it easy to use a locally modified script instead. 2016-10-04T13:49:12+00:00
643a916,917
> [Template merge - langs/und] Changed variable name in devtool scripts, to reflect similar changes elsewhere. Part of fixing bug #2219. 2016-10-04T08:53:42+00:00
> [Template merge - langs/und] Corrected a number of bugs and deficiencies when building spellers when the giella proofing tools libraries must be fetched over the net. Now the spellers build correctly under all intended circumstances, given that there is a network connection. 2016-09-09T16:16:46+00:00
644a919
> [Template merge - langs/und] Corrected path for the test for availability of the giella-common resources. 2016-09-09T11:35:06+00:00
646a922,923
> [Template merge - langs/und] Added support for getting precompiled proofing tools libraries across the net if not found locally. Makes it actually possible to build spellers without checking out the whole of $GIELLA_HOME. Now it is also possible to just check out $GIELLA_LIBS if one still wants to build everything locally. 2016-09-09T10:37:24+00:00
> [Template merge - langs/und] Applied backend format rules to the tools/mt/ap/filters dir. This is not future proof, but does not create problems for sme, and solves a bug in smj. The future problem is that we mix both a specified backend format (for compilation efficiency) with the default/unspecified format fst (for weighting) in the same dir, and we can't automatically say which filters need to be in the specified backend format and which should be in the default format. This needs further consideration. 2016-09-02T08:23:58+00:00
648a926,927
> [Template merge - langs/und] Completely clean src/transcriptions/, and also clean tools/mt/apertium/filters/. 2016-09-01T13:31:23+00:00
> [Template merge - langs/und] Do not use PKG_CHECK_MODULES if you don't really have to - it clutters your code and creates unneeded variables = noise. 2016-08-31T11:22:13+00:00
650a930
> [Template merge - langs/und] Corrected placeholder string for two-letter ISO language code. 2016-08-25T20:54:03+00:00
651a932,933
> [Template merge - langs/und] Changed the path to the css for the xml speller test results in devtools. 2016-08-25T18:59:16+00:00
> [Template merge - langs/und] Added support for building alternate orthography fst's for dictionary and oahpa, and also morphers for alternative orthographies. Slight simplification of defs. 2016-08-24T13:18:35+00:00
653a936,937
> [Template merge - langs/und] One small change to support spellers for alternative orthographies built off of the raw fst instead of the standard fst. 2016-08-23T22:10:18+00:00
> [Template merge - langs/und] Added a possibility to build fst's for alternate orthographies based on the raw fst surface forms, instead of from the default/standard orthography. 2016-08-23T20:41:06+00:00
654a939
> [Template merge - langs/und] Changed all references to $(GIELLA_SHARED)/common into $(GIELLA_SHARED)/all_langs. 2016-08-23T06:28:45+00:00
656a942,943
> [Template merge - langs/und] Rewrote the code for identifying the location of GIELLA_CORE (former GTCORE). The code should be more robust, and is prepared to check against a pkg-config pc file as well. GTCORE is still used throughout the code, but in parallel to GIELLA_CORE, so that one can easily replace the former with the latter without causing bugs or other problems. 2016-08-22T20:20:28+00:00
> [Template merge - langs/und] Added checking for and setting of GIELLA_TEMPLATES, but only if you have defined GIELLA_MAINTAINER (renamed from GTMAINTAINER). Otherwise it is ignored. 2016-08-22T14:59:30+00:00
657a945
> [Template merge - langs/und] Revert experiment with priority union - it doesn't work as expected when weights are involved. Corrected filenames in the .SECONDARY target. 2016-08-19T12:29:12+00:00
658a947
> [Template merge - langs/und] Added download links to the build feedback for 'make upload' in tools/spellcheckers/fstbased/desktop/hfst/. 2016-08-19T10:31:51+00:00
660a950,951
> [Template merge - langs/und] Final step to make the GIELLA_SHARED dir be found in all cases: assign the path from pkg-config to the variable. 2016-08-18T10:36:22+00:00
> [Template merge - langs/und] Removed the separate test for content, instead adding the test to each possible location, moving to the next location if no data is found. 2016-08-18T09:46:12+00:00
661a953
> [Template merge - langs/und] Changed the search order for GIELLA_SHARED data: * using --with-giella-shared=/path/to/giella-shared/data/root/dir * env. variable GIELLA_SHARED * env. variable GIELLA_HOME * env. variable GTHOME * env. variable GTCORE * using pkg-config This way it is always possible to override everything else using the --with option. Added comments. 2016-08-18T09:00:28+00:00
663a956
> [Template merge - langs/und] Added a configure test to check that there is actually data in GIELLA_SHARED. 2016-08-18T08:04:20+00:00
664a958,959
> [Template merge - langs/und] The giella-shared data dir is now found using several techniques in the following order: * env. variable GIELLA_SHARED * env. variable GIELLA_HOME * env. variable GTHOME * env. variable GTCORE * using --with-giella-shared=/dir/to/giella-shared * using pkg-config If all these fail, configure errors out. Since it a.o. uses GTHOME, the change should be of no concern to existing users having checked out everything. And since the svn location is still within GTCORE, it will also work for those checking out only the core and a single or a couple of languages without any action on their part. 2016-08-17T12:59:49+00:00
> [Template merge - langs/und] Second steps in renaming and splitting the gtcore into giella-core, giella-shared and giella-templates: replaced $(GTCORE)/giella-shared with the Automake variable @GIELLA_SHARED@. 2016-08-15T12:38:11+00:00
666a962
> [Template merge - langs/und] First steps in renaming and splitting the gtcore into giella-core, giella-shared and giella-templates: renamed variables. 2016-08-15T11:29:27+00:00
667a964
> [Template merge - langs/und] Generalised the build instructions for the morphological segmenter, aka the morpher. The morpher output can be used as input to a stemmer. 2016-07-01T11:29:37+00:00
670c967,968
< Comment in the header 2016-06-30T22:33:36+00:00
---
> Comment in the header 2022-10-23T12:41:55+02:00
> First time generation of documentation 2016-06-30T22:32:59+00:00
672a971
> Removed double exclamation mark in front of multiple hyphens - they make forrest very unhappy, and serve no purpose - they are intended only as visual cues in the text files, not in the generated forrest pages. 2016-06-23T12:38:07+00:00
675a975
> [Template merge - langs/und] Fixed a bug in speller builds introduced lately - missing hfst target. 2016-06-11T14:58:57+00:00
677a978
> [Template merge - langs/und] Updated filename reference, and added a pmatch setting that fixes the issue where words next to punctuation like "ja." don't get analysed. 2016-06-11T06:16:13+00:00
678a980,981
> [Template merge - langs/und] Removed '+' in front of tag patterns to be extracted from the tag list and used as input to regex generation scripts. This was done to accommodate the use of prefix tags, where the '+' is at the end of the tag, not in the beginning. 2016-06-09T23:00:26+00:00
> [Template merge - langs/und] Added new test to check that the speller accepts all lemmas in the lexicon. Disabled another test that hangs for unknown reasons. 2016-06-09T22:12:15+00:00
680c983
< Updated svn ignores. 2016-06-09T20:11:13+00:00
---
> [Template merge - langs/und] Rewrote the pmatch compilation code to support Kevin's tokenisation hints for MWE-ambiguous entries. Requires Kevin's hfst fork for now. Work in progress. 2016-06-08T17:50:16+00:00
681a985
> [Template merge - langs/und] Small change to support new style, backtracking based tokenisation experiments on space separated compounds in sme. 2016-06-08T07:45:09+00:00
683a988,989
> [Template merge - langs/und] The next batch of changes to support building hfst fst's with a specified backend fst format: desktop spellers are now supported. The speller fst's will be built using the specified backend format up to the point where corpus and tag weights are added, when the fst format will be changed to the default (openfst-tropical) format. That is, even if you specify (the unweighted) sfst as the backend format, the final speller will still be weighted. 2016-06-06T10:11:49+00:00
> [Template merge - langs/und] Better variable name and clearer comment about editing distance in spellers. 2016-06-02T10:44:56+00:00
684a991
> [Template merge - langs/und] Changed the build files for the desktop spellers to allow better user control of which files to include in the error model. 2016-06-01T11:16:37+00:00
686a994
> [Template merge - langs/und] Use priority union to avoid duplication of paths and thus make a much smaller (and hence faster) mobile speller fst. 2016-05-31T19:17:48+00:00
687a996,997
> [Template merge - langs/und] Use priority union to avoid duplication of paths and thus make a much smaller (and hence faster) speller fst. 2016-05-31T17:34:58+00:00
> [Template merge - langs/und] Fixed bug in building Oahpa fst's for alternate orthographies and writing systems. 2016-05-24T18:42:11+00:00
689a1000
> [Template merge - langs/und] Fixed a bug in the default build of grammar checker analysers. Blocked all languages without local overrides. 2016-05-23T12:27:43+00:00
690a1002
> [Template merge - langs/und] Moved removal of word boundaries out of the default, language-independent processing of the grammar checker analyser - we want to be able to do language-depending things with word boundaries, e.g. in freely compounding languages. 2016-05-19T12:45:24+00:00
691a1004,1005
> [Template merge - langs/und] Added provisions for including xfscript files in the src/morphology/ directory. 2016-05-18T13:52:38+00:00
> [Template merge - langs/und] Removed unneeded subtraction that just increased the size of the resulting fst a lot (how much of course depends on the grammar in question). 2016-05-18T12:59:26+00:00
693a1008
> [Template merge - langs/und] Added initial support for doing more targeted regex replacements on multichar sequences in parallel to the regular editdist operations. The idea is that these replacements can be applied more times (since they are few), and thus allow for more corrections of frequent spelling errors. 2016-05-18T06:37:22+00:00
694a1010
> [Template merge - langs/und] Restricted the new spellrelax to only give one tag. The previous version caused out-of-memory issues on a lot of systems. 2016-05-12T20:46:13+00:00
696a1013,1014
> [Template merge - langs/und] Added support for alternative orthographies in spellers. Works nicely in LO, but needs more testing. Also updated the clean target. 2016-05-11T18:01:57+00:00
> [Template merge - langs/und] Added a new spellrelax system that will add an +Err/ tag (or more) to the analysis of words misspelled according to the new spellrelax rules. Can be very costly in terms of size if applied to large lexical fst's, and if many error types are tagged, so initially it is only applied to the transcriptor fst (which are used in Oahpa). Template data is from Plains Cree (crk). 2016-05-11T04:02:01+00:00
699c1017
< Setting svn ignore patterns on tools/spellcheckers/filters/. 2016-05-10T01:00:11+00:00
---
> [Template merge - langs/und] Fixed compilation error: added missing inversion (.i). 2016-05-10T23:51:18+00:00
700a1019
> [Template merge - langs/und] Changed the final file format for hfst transcriptors to the hfstol format. 2016-05-10T00:43:06+00:00
701a1021
> [Template merge - langs/und] Fixed a bug in speller building with Xerox tools enabled. 2016-05-10T00:10:15+00:00
702a1023
> [Template merge - langs/und] Added support for filters for the top-level speller dir, in preparations for needs by the Haida spellers. 2016-05-09T05:26:18+00:00
703a1025
> [Template merge - langs/und] One more bugfix for tag reordering with language-specific additions. 2016-05-05T21:04:31+00:00
704a1027
> [Template merge - langs/und] Fixed bug for tag reordering with language-specific additions. Made building of glossing fst's configurable, and at the same time fixed a build bug for them. 2016-05-05T20:39:06+00:00
705a1029
> [Template merge - langs/und] Added initial support for hfst-based tokenisers, built on generalisations of Kevin's work. They are built using the hfst-tool hfst-pmatch2fst, which is the Hfst implementation of the pmatch tool from Xerox. Supports a regular tokeniser, and one targeted at grammar checking. 2016-05-05T16:20:04+00:00
706a1031,1032
> [Template merge - langs/und] Corrected errors in the makefile that stopped dictionary fst builds for languages with alternative orthographies. 2016-05-04T08:50:29+00:00
> [Template merge - langs/und] Build analyser for grammar checker when grammar checkers are enabled. 2016-05-03T12:37:32+00:00
707a1034
> [Template merge - langs/und] Generalised hack to force make to go via hfst instead of directly to hfstol. 2016-05-02T13:24:34+00:00
709a1037
> [Template merge - langs/und] Added support for specifying backend fst format also for (parts of the) apertium fst's. One step further to speed up compilation by specifying e.g. sfst as the backend format. The implementation is a bit hacky, but will have to do for now. 2016-04-29T15:28:32+00:00
710a1039
> [Template merge - langs/und] Added support for building glossing analysers, where the analysis tags are NOT shifted around to canonical positions. The idea is that one keeps tags and morphs together in the lexc code, and that the analyser output thus will reflect the order of the surface morphs. If one wants to build such analysers, one has to specify the final analyser filename in src/Makefile.am. 2016-04-29T13:46:45+00:00
712a1042
> [Template merge - langs/und] Corrections to make the Oahpa builds work, and also to properly build with foma. 2016-04-21T13:30:51+00:00
713a1044,1045
> [Template merge - langs/und] Corrected an error that made the new option to select fst format (=backend) in hfst non-functional. 2016-04-21T10:35:38+00:00
> [Template merge - langs/und] Now also Oahpa transducers are ready to be built with a specified backend format when building using hfst. Also cleaned up the code, removed 300+ lines of code, and added support for builds using Foma. 2016-04-21T10:09:58+00:00
715a1048,1049
> [Template merge - langs/und] A number of corrections to the previous commit for issues missed during the first round of testing. Now specifying an alternative backend format works correctly for all standard analysers and generators except for Oahpa-fst's. 2016-04-21T07:00:36+00:00
> [Template merge - langs/und] Enabled the new option to specify transducer format when compiling with Hfst, to speed up compilation time by using an unweighted format (ie sfst or foma). Default is still openfst-tropical, until further testing is done. 2016-04-21T05:39:18+00:00
717a1052
> [Template merge - langs/und] Further preparations for enabling the new option to choose the backend format for fst's, for compilation speed improvements in cases where weight is not used: generalisations and corrections of build instructions. 2016-04-20T13:59:10+00:00
718a1054
> [Template merge - langs/und] Bummer: wrong default backend format - only openfst-tropical is stable, the other formats are more or less buggy. 2016-04-20T10:21:47+00:00
719a1056
> [Template merge - langs/und] More preparations for new configure option to specify backend format of compiled fst's. 2016-04-20T10:09:28+00:00
721c1058,1059
< Ignore more preprocessor files = fst’s. 2016-04-14T16:01:04+00:00
---
> [Template merge - langs/und] Preparations for new configure option to specify backend format of compiled fst's. Removed some old code. 2016-04-20T09:33:11+00:00
> [Template merge - langs/und] Further abstractions over parallel patterns, reducing code size. 2016-04-06T08:37:02+00:00
723a1062
> [Template merge - langs/und] Remove generated files also in tools/mt/apertium/tagsets/. 2016-03-17T08:24:49+00:00
725c1064
< Updated svn ignores. 2016-03-15T19:54:49+00:00
---
> [Template merge - langs/und] Updated required versions for Hfst and VislCG3. A number of bug fixes and new features require these versions for many of our tools. 2016-03-17T07:24:56+00:00
726a1066
> [Template merge - langs/und] One pattern rule had for some reason become ambiguous, and caused strange build behavior. Replacing it with the full filename in one stable case solved the issue. 2016-03-10T23:18:50+00:00
728c1068
< Use a more general svn ignore pattern in src/morphology/. 2016-03-07T17:10:12+00:00
---
> [Template merge - langs/und] cut on linux does not like unicode chars as delimiters, use awk instead. 2016-03-08T19:25:46+00:00
729a1070
> [Template merge - langs/und] Code cleanup - moved target variables related to running xfst tools to the xfscript include file, and thereby removing duplicate code. 2016-03-07T08:24:31+00:00
730a1072
> [Template merge - langs/und] Added an option to enable building tokenisers, off by default. 2016-03-04T09:28:58+00:00
731a1074
> [Template merge - langs/und] Do the CG3 tag relabelling in the Giella infra, not in Apertium. 2016-02-29T10:50:45+00:00
732a1076
> [Template merge - langs/und] Forgot to rename the Area variable in the previous commit. 2016-02-26T16:00:40+00:00
733a1078
> [Template merge - langs/und] First iteration of adding support for Area codes (ie countries) based on ISO 3166 codes. Right now does nothing except filtering out the tags, proper support coming in steps. Requires new version of the GTCORE scripts. 2016-02-26T15:47:35+00:00
735a1081
> [Template merge - langs/und] Better handling of hfst/xfst/foma for the top-level speller dir - invert when needed. 2016-02-24T13:36:45+00:00
736a1083
> [Template merge - langs/und] There was still one more automake file with references to the remove-derivation-position-tags.regex filter. Now they are gone. 2016-02-24T10:46:54+00:00
737a1085,1086
> [Template merge - langs/und] A typo made reversed compose&intersect seem buggy, whereas in fact it was not. 2016-02-23T14:42:30+00:00
> [Template merge - langs/und] Small correction to bring the Giella version of reversed comp&intersect closer to what Miikka has: added minimisation to the reversed twolc rules. 2016-02-23T12:25:22+00:00
739a1089,1090
> [Template merge - langs/und] Added configure option to reverse the lexicon and the morph-phon rules during composition and intersection. Reduced the time needed for that operation to ≈1/3 of what it used to be in SMS, and RAM consumption went down from 11Gb to max 400Mb! Speed and RAM gains will vary from language to language. 2016-02-22T12:04:53+00:00
> [Template merge - langs/und] Now the lexical fst is first compiled into a .tmp file, to allow language-specific changes to be applied from .tmp to final file. More support for xfscript compilation. 2016-02-22T08:31:06+00:00
741a1093,1094
> [Template merge - langs/und] Added include to xfscfript-include.am, to let xfscripts be used in lexc compilation. 2016-02-20T21:22:17+00:00
> [Template merge - langs/und] Second part of libdir cleanup: removed the libdir line in the pkgconf file. 2016-02-19T18:14:43+00:00
742a1096
> [Template merge - langs/und] libdir -> datadir for zhfst installations using autotools. 2016-02-19T17:11:18+00:00
744a1099
> [Template merge - langs/und] Removed some references to remove-derivation-position-tags.regex that were forgotten in commit r129657. 2016-02-19T12:10:35+00:00
745a1101
> [Template merge - langs/und] Added analyser-disamb-gt-desc.hfst as a noinst_DATA target, to force make to build it instead of going directly to the *.hfstol file, and thus breaking compilation when local modifications are needed. 2016-02-19T07:19:50+00:00
746a1103
> [Template merge - langs/und] Added INVERT_<FSTTECH> variables to help in improving compilation of analysers and generators for different fst technologies (Xerox, Hfst, Foma). Hfst has the inversed convention for lookup compared to the other two, and by using a variable we can now actually share the same build code irrespective of which one we need to inverse for the final analyser or generator. 2016-02-18T13:19:40+00:00
747a1105
> [Template merge - langs/und] Removed the remove-derivation-position-tags filter from language-independent processing, it is language-specific, and will be added to the languages needing it. This also makes it possible to do further local processing dependent on these tags. 2016-02-18T09:47:36+00:00
749c1107
< Updated the svn ignore property for recent changes in the infrastructure. 2016-02-16T22:36:51+00:00
---
> [Template merge - langs/und] Error out if Hfst is requested but not found or too old. 2016-02-17T12:34:49+00:00
750a1109
> [Template merge - langs/und] LibreOffice-voikko 5.0 support for spellers with alternating writing systems. 2016-02-05T00:43:25+00:00
751a1111
> [Template merge - langs/und] Initial support for building zhfst files for mobile phone keyboards. This version is essentially the same as the desktop one, we'll start from here and adapt as we find better solutions. The zhfst file is compressed using xz for optimal file size (this is presently in violation of the zhfst specification, it must be updated soon). Also changed some of the configure options to error out when requested but without the required software installed - this is better than silently turning the requested feature off. 2016-02-04T12:09:28+00:00
752a1113
> [Template merge - langs/und] Cleaned the speller build and configuration code in preparation for adding support for building mobile spellers. 2016-02-03T17:36:26+00:00
754c1115
< Updating svn:ignore’s. 2016-02-02T15:34:45+00:00
---
> [Template merge - langs/und] Added a missing SUBDIR, and fixed a speller test script that was not working. 2016-02-03T08:43:55+00:00
755a1117
> [Template merge - langs/und] Updating path in test script. 2016-02-02T15:15:57+00:00
756a1119,1120
> [Template merge - langs/und] Moving the hfst speller test dir inside test/tools/spellcheckers/fstbased/desktop/. 2016-02-02T15:02:07+00:00
> [Template merge - langs/und] Preparing to reorganise the speller testing parallel to what has been done in the development dir. 2016-02-02T12:53:46+00:00
759,760c1123
< Updated svn:ignore’s. 2016-02-02T10:33:44+00:00
< Updated svn:ignore’s. 2016-02-02T10:16:28+00:00
---
> [Template merge - langs/und] Updated path to desktop speller files. 2016-02-02T10:58:50+00:00
761a1125
> [Template merge - langs/und] Major reorganisation to support building zhfst files for mobile systems (aka keyboard + speller). These need very different weighting priorities, another error model, and are thus placed in a separate subdirectory from desktop spellers. 2016-02-02T07:20:05+00:00
762a1127
> [Template merge - langs/und] First step in adding support for mobile phone spellers. 2016-01-31T22:33:17+00:00
764c1129
< Updated svn ignores. 2016-01-25T08:11:45+00:00
---
> [Template merge - langs/und] Added support for building LO-voikko 5.0 extensions. Python-based interface to LO, and initial support for specifying unknown speller languages by typing in the language code in the language name field. 2016-01-30T08:08:22+00:00
765a1131
> [Template merge - langs/und] Commented out xz compression, it isn't supported by libvoikko. 2016-01-25T07:59:16+00:00
766a1133,1134
> [Template merge - langs/und] Changed test pair conventions for twolc from !€/!$ to !!€/!!$ to make it follow the conventions in the rest of the infrastructure, and make it possible to include test data in the documentation. 2016-01-13T08:26:29+00:00
> [Template merge - langs/und] Readded the initial-letter edits in the regex - everything else is there for the initial letter machinery, so leaving it out made the build inconsistent. The default is off, with a large warning for those turning it on. 2015-12-08T16:11:41+00:00
767a1136
> [Template merge - langs/und] Added script to run suggestion testing for the hfst-ospell-service (MS Office) speller. Rewrote the speller testing scripts to allow parallel execution. 2015-12-08T14:08:34+00:00
768a1138
> [Template merge - langs/und] Make transitivity tags optional also for the Apertium generator. 2015-12-02T14:02:18+00:00
769a1140
> [Template merge - langs/und] Push weights even when not minimising the speller acceptor. Minimisation is not always the best strategy. 2015-12-01T13:29:19+00:00
771a1143,1144
> [Template merge - langs/und] Removed --Werror from the language-independent automake file. Added a variable to make it possible to add it to the language-specific automake file. 2015-11-27T14:51:10+00:00
> [Template merge - langs/und] Added configure option to enable symbol alignment during lexc compilation for the lexical transducer. Defaults to off for now, we need to test the effect on various languages before making it default to on. Also added --Werror to lexc to make it break on all warnings when compiling the lexical fst. 2015-11-27T12:59:47+00:00
773a1147
> [Template merge - langs/und] Use tar + xz for a 40-50 % reduction in file size for zhfst files. 2015-11-27T09:15:52+00:00
774a1149,1150
> [Template merge - langs/und] Allow longer filenames by using tar-pax for make dist. 2015-11-26T09:49:48+00:00
> [Template merge - langs/und] Added upload target for zhfst files. That will be the only method for spell checking in more than one language for now (for regular users). Not ideal, but have no time for anything else. 2015-11-25T13:00:20+00:00
776c1152
< Updated svn:ignore’s. 2015-11-18T23:05:40+00:00
---
> [Template merge - langs/und] Ensure that all required cg3 files are copied over to the apertium dir. Also make sure that included files are copied before including files are processed. 2015-11-18T20:06:31+00:00
778a1155,1156
> [Template merge - langs/und] Silent build updates for Apertium. 2015-11-18T17:12:56+00:00
> [Template merge - langs/und] No morphology backend for now in our infra. Corrected typo. 2015-11-18T14:00:01+00:00
780a1159
> [Template merge - langs/und] Added support for the vfst fst format for voikko-based spellers, to be used in mobile apps. 2015-11-18T10:11:30+00:00
781a1161
> [Template merge - langs/und] Corrected typo. 2015-11-17T06:53:53+00:00
782a1163,1164
> [Template merge - langs/und] Upload xpi and MacVoikko files, beta versions. 2015-11-16T22:23:59+00:00
> [Template merge - langs/und] Look for saxon in $HOME/lib first. Fixes bug http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=2100. 2015-11-11T07:50:10+00:00
783a1166
> [Template merge - langs/und] Add lexicon version to the speller testing output. 2015-11-11T05:50:09+00:00
785a1169
> [Template merge - langs/und] Added a new variable HAS_FOMA, which will be set independently of the configuration if foma is available. This can be used to circumvent bugs in Hfst if weights are not needed: if foma is available, print as ATT, read in foma, perform transformations, print as ATT, convert, and continue. 2015-11-10T10:15:38+00:00
786a1171,1172
> [Template merge - langs/und] Error out if one tries to build abbr files with generators disabled. 2015-11-09T10:19:00+00:00
> [Template merge - langs/und] Error out if syntax is enabled and no vislcg3 is found or too old. 2015-11-09T09:59:34+00:00
787a1174
> [Template merge - langs/und] Added support for building abbr.txt. Copy of the sme template committed in r111579. 2015-11-09T09:43:12+00:00
789a1177,1178
> [Template merge - langs/und] Added targets for foma spellers, outcommented now due to build issues. Added more silent build strings. 2015-11-04T12:28:22+00:00
> [Template merge - langs/und] Added some general tag cleanup before making the speller fst used as input for the analyser and generator that is the last step before building the acceptor. Makes it easier to write yaml tests for the speller fst's. 2015-10-26T11:52:59+00:00
791c1180
< Updated svn ignores. 2015-10-20T07:50:38+00:00
---
> [Template merge - langs/und] Added filter to remove tags irrelevant to speller builds. Adjusted required version of GTCORE accordingly. 2015-10-19T07:33:28+00:00
792a1182
> [Template merge - langs/und] Corrected a bug with filter compilations for speller filters involving tag conversion to flag diacritics. 2015-10-15T10:15:01+00:00
793a1184
> [Template merge - langs/und] Make sure analyser-raw-gt-desc.hfst is always built, to ensure we have the necessary prerequisites for all targets. Refactored the initial speller fst build to use common build code for all fst technologies. Makes it possible to easier test and compare test results when debugging. 2015-10-14T12:49:35+00:00
794a1186
> [Template merge - langs/und] Changed the response to missing transducers from FAIL to SKIP to avoid problems with lexc tests for fst's not enabled and thus not available. Instead report the missing fst to the user. 2015-10-05T10:42:32+00:00
795a1188
> [Template merge - langs/und] Streamlined descriptive compounding tags to follow a shared tag structure. 2015-10-03T09:26:12+00:00
796a1190
> Replaced +RCmpnd with +Cmp/SplitR (and escaped variants) using the following commands: 2015-10-03T08:33:46+00:00
798a1193
> [Template merge - langs/und] Added a comment about the non-functioning of the initial edit setting. Made the compound-restricted fst a tmp file, to allow for additional local processing. 2015-09-29T08:15:10+00:00
799a1195
> [Template merge - langs/und] Removed all minimization of the error model except for the final build step. Removed also the initial letter handling for now, it blows up the error model, and slows it down correspondingly, making spellers that have turned this on useless. For now we apply the regular error model on the first letter, that seems to work ok. 2015-09-28T17:51:46+00:00
800a1197
> [Template merge - langs/und] Added a very short test script written by Lene to help run a subset of tests frequently needed. 2015-09-25T10:49:54+00:00
801a1199
> [Template merge - langs/und] Fixed a problem running bc on the linux servers, which caused the yaml test summaries to be blank. Fixes bug #2054. 2015-09-24T13:43:36+00:00
802a1201
> [Template merge - langs/und] Added an option to specify how many lines of the frequency corpus to be used in the frequency weighting, to trim the acceptor fst at a point where the weights don't really matter. 2015-09-23T10:42:55+00:00
803a1203
> [Template merge - langs/und] Replaced 'giellatekno' with 'giella' or added Divvun, depending on context. 2015-09-18T18:27:54+00:00
804a1205
> [Template merge - langs/und] Renamed m4/giellatekno.m4 to bring it in line with the switch to 'giella' for all things common to GT and Divvun. 2015-09-18T10:25:37+00:00
805a1207
> [Template merge - langs/und] The previous commit did not solve the issue - the different jars were checked in the wrong order. Now it should be ok. 2015-09-18T10:03:30+00:00
806a1209
> [Template merge - langs/und] Added standard Linux location for Saxon to the paths searched. Fixes bug #2080. 2015-09-18T07:32:26+00:00
807a1211
> [Template merge - langs/und] Corrected path for pkgconfig data and one variable name in MT filters. 2015-09-16T14:19:23+00:00
808a1213
> [Template merge - langs/und] gtdshared has been renamed to giella-shared, all references now updated. 2015-09-16T11:40:54+00:00
809a1215
> [Template merge - langs/und] More robust handling of MWE in speller testing. Now also possible to specify build dir different from source dir. 2015-09-16T08:22:50+00:00
810a1217
> [Template merge - langs/und] Added a Makefile.am variable to turn on or off corpus-based (frequency) weighting of suggestions. Default for the time being is off while we work out the best interactions between the different parts of the spellers. Changed one intermediary filename to ensure proper dependency checks and thus rebuilds. 2015-09-16T07:01:16+00:00
811a1219
> [Template merge - langs/und] Added support for specifying regexes or list of string pairs for initial and final symbols in the error model. Also added a Makefile variable to control whether to allow edits of the initial letter(s), default is ‘no’. 2015-09-15T13:17:52+00:00
812a1221
> [Template merge - langs/und] Guard against -q for lookups that don't support it. 2015-09-09T13:34:26+00:00
813a1223
> [Template merge - langs/und] Small code cleanup that has been lingering since June. 2015-09-09T09:14:53+00:00
814a1225
> [Template merge - langs/und] Made use of a new option for the speller suggestion testing: output an attribute on each test word element containing essential info about the correct suggestion. This will support better styling of the xml file with the test data. Also changed the path to the css from the local filesystem (which will vary from machine to machine) to the svn repository web url. 2015-09-07T13:09:45+00:00
816a1228
> [Template merge - langs/und] Added a variable to hold source files to be included in the distro but not compiled as such. 2015-09-03T20:16:46+00:00
818c1230
< Ignore temporary files generated by the speller suggestion test script. 2015-09-03T04:23:51+00:00
---
> [Template merge - langs/und] Added first version of a shell script to check the suggestions generated by spellers. Requires the file test/data/typos.txt for data input. 2015-09-02T19:44:33+00:00
819a1232
> [Template merge - langs/und] Shortened a filename to make tar happy when building distribution packages. 2015-09-02T13:29:49+00:00
820a1234
> [Template merge - langs/und] Fixed an error in distcheck - one test shell script was not included. 2015-09-02T09:02:17+00:00
821a1236
> [Template merge - langs/und] Made one step in the speller build behave properly wrt silent builds. Removed grammar checker targets, we are far from ready for this, and it breaks 'make distcheck'. 2015-09-02T07:00:32+00:00
822a1238
> [Template merge - langs/und] Added a variable to pass a compilation option to hfst-regexp2fst. Used this variable to compile all filter regexes with the option --xerox-composition=ON. This will ensure that all filters where flag diacritics are used as symbols will be compiled correctly for proper use in later compositions. A.o. this fixes a bug where tags converted to flags to restrict compounding did not work at all. 2015-09-01T17:58:55+00:00
823a1240
> [Template merge - langs/und] Replaced sed expression with double cut - the sed did not work on the xserve for whatever reason, and caused the testing to hang. 2015-08-17T09:56:34+00:00
824a1242
> [Template merge - langs/und] More robust checking of Saxon, now requires that any jar found is at least v8.0. 2015-08-17T08:07:28+00:00
825a1244
> [Template merge - langs/und] Added /usr/share/java/ as a search path for the Saxon jar, this is what is used on the UiT Linux virtual machines, and probably many other Linux systems. 2015-08-14T06:53:02+00:00
826a1246
> [Template merge - langs/und] Initial support for building Mozvoikko spellers for our languages. 2015-08-13T08:10:56+00:00
827a1248
> [Template merge - langs/und] Adding support for specifying one-sided tests (half tests) in the lexc test data, using an optional .gen or .ana "suffix" after the fst name. Simplified source file processing. 2015-08-06T12:08:46+00:00
828a1250
> [Template merge - langs/und] When building with Foma, use the new lexc-align feature. 2015-06-12T23:26:57+00:00
829a1252
> [Template merge - langs/und] Added lexicon filtering when pair-testing twolc rules. 2015-06-11T14:40:01+00:00
831c1254,1256
< [Template merge - langs/und] Corrected e-mail address, changed the template content of the transcription files from SMA to CRK, and at the same time corrected the direction of the code. Also added a default punctuation lexicon. 2015-06-09T21:29:05+00:00
---
> [Template merge - langs/und] 2022-10-23T12:41:07+02:00
> [Template merge - langs/und] 2022-10-23T12:39:51+02:00
> [Template merge - langs/und] Added support for easter eggs specific to alternative writing systems and other variants. Will help in debugging. 2015-06-07T19:54:05+00:00
832a1258
> [Template merge - langs/und] Moved specification of default weight and editing distance to the language specific Makefile. 2015-06-05T01:43:19+00:00
834a1261
> [Template merge - langs/und] After a lot of experimenting, a moderate set of changes to the speller error models. The biggest change is that the alphabet for the edit distance error model is not taken from the acceptor anymore, but must be explicitly listed in the editdist.*.txt file. The suggestion speed is back to normal, but more work is needed re the interaction of the error model and corpus weights. 2015-06-05T01:11:10+00:00
835a1263
> [Template merge - langs/und] Prefixed all silent build strings for Hfst tools with H, for easier identification. 2015-06-04T10:06:22+00:00
836a1265
> [Template merge - langs/und] Commented out the old target for calculating unit weights (default weight for out-of-corpus word forms), and added a new which is basically the highest tropical weight + the ALPHA smoothing value. This is just the first step in further developing the suggestion ordering for the spellers. 2015-05-27T14:55:29+00:00
837a1267
> [Template merge - langs/und] Added a simple test to check a minimum suggestion speed for our test word nuvviDspeller. No speller should be released that does not pass this test. Additional and more elaborate tests should be added as well, this is just the very bare minimum in suggestion speed testing. 2015-05-25T14:08:25+00:00
838a1269
> [Template merge - langs/und] Corrected typo in twolc compilation for foma (using hfst). 2015-05-24T13:50:50+00:00
839a1271
> [Template merge - langs/und] Worked around a bug in hfst-fst2fst by going via att and foma instead. 2015-05-23T09:15:24+00:00
840a1273
> [Template merge - langs/und] Initial support for compiling twolc files for foma by way of hfst, intersect and conversion to foma format. 2015-05-23T05:39:37+00:00
842c1275
< Change e-mail address 2015-05-21T14:03:58+00:00
---
> [Template merge - langs/und] Yaml testing is now working also when building with Foma. 2015-05-21T09:09:38+00:00
843a1277
> [Template merge - langs/und] Fixed downcasing of derived short names. Made yaml testing output a bit more readable (hopefully). 2015-05-21T08:05:30+00:00