est-x-utee.diff
2152 lines (2152 loc) · 347 KB
453,476c453,472
< [Template merge - langs/und] The final mmove in the old svn infra: change the am-shared reference to point to giella-core parallel to the language dir. After this we can remove am-shared from each language. 2020-05-13T12:20:56+00:00
< [Template merge - langs/und] Fix mobile speller filename bug. 2020-05-12T16:55:49+00:00
< second attempt to get speller suggestions similar to those of Vabamorf 2020-05-11T17:03:10+00:00
< added shell scripts to yaml-test the descriptive morph transducer 2020-05-11T14:23:37+00:00
< [Template merge - langs/und] Fix speller generation bug. 2020-05-09T11:09:07+00:00
< [Template merge - langs/und] Fix speller analyser reference after the flattening of the tools/spellcheckers/ dir. 2020-05-09T08:18:53+00:00
< [Template merge - langs/und] Final step in flattening the tools/spellcheckers/ dir tree: removing the whole fstbased/ dir, with all subdirs. Finally! 2020-05-08T21:46:27+00:00
< [Template merge - langs/und] Fix automakefile error: no final backslash followed by an empty line. 2020-05-08T20:23:53+00:00
< [Template merge - langs/und] Step eight in flattening the tools/spellcheckers/ dir tree: flipping the switch. All pieces are in place for building everything in tools/spellcheckers/ only, and everything has been tested with one language, including make check (a few tests are skipped because the fst is not found, but no tests break). The old files are kept for the moment, in case unseen issues and missing data is popping up after the switch, but will be deleted after verification. 2020-05-08T17:19:47+00:00
< Merging in changes from rev 190705 before this file is lost and abandoned: 2020-05-08T16:50:37+00:00
< [Template merge - langs/und] Step six in flattening the tools/spellcheckers/ dir tree: copying fstbased/mobile/hfst/index.xml to the new location. 2020-05-08T15:49:30+00:00
< [Template merge - langs/und] Step six in flattening the tools/spellcheckers/ dir tree: moving TAGWEIGHTS out of the language independent part to the language specific part, so that we can specify different tagweight files for desktop and mobile spellers. 2020-05-08T13:15:07+00:00
< [Template merge - langs/und] Step four in flattening the tools/spellcheckers/ dir tree: modifying another set of build files for the new dir structure, and the consequences of one dir for all speller files. 2020-05-08T08:39:29+00:00
< [Template merge - langs/und] Step four in flattening the tools/spellcheckers/ dir tree: copying all non-make files from spellcheckers/fstbased/desktop/hfst/ to spellcheckers/. 2020-05-07T19:15:00+00:00
< [Template merge - langs/und] Step three in flattening the tools/spellcheckers/ dir tree: changing the relocated build files to adapt to their new home. 2020-05-07T15:54:28+00:00
< [Template merge - langs/und] Step two in flattening the tools/spellcheckers/ dir tree: copying the desktop/weighting/ dir as the default one - for most languages the mobile/weighting/ dir is just a copy of the desktop one. 2020-05-07T06:19:55+00:00
< first attempt to make error model more precise 2020-05-06T17:02:03+00:00
< [Template merge - langs/und] Step one in flattening the tools/spellcheckers/ dir tree: copying all subdir Makefile.am files to *.mod-* files in the top spellcheckers dir, except from the weigthing dirs. 2020-05-06T11:57:33+00:00
< [Template merge - langs/und] Added .gitignore file, as a preparatory step. 2020-05-06T10:43:53+00:00
< [Template merge - langs/und] Forgot to remove the entries for configure.ac re listbased spellers. 2020-05-06T08:49:56+00:00
< [Template merge - langs/und] Removed all list-based spellcheckers. There has not been any serious work in that area since the move to the new infrastructure 8 years ago. If there is a future need, we have it all in the rev history, and removing it simplifies other operations. 2020-05-06T07:43:06+00:00
< [Template merge - langs/und] Moved the files in tools/data/ to tools/tokenisers/, and removed the dir tools/data/. Part of the tools dir cleanup. 2020-05-06T06:49:29+00:00
< [Template merge - langs/und] Commented out check for GTLANG_xxx variable, it is not used, and the check output is confusing to users. 2020-05-05T12:44:03+00:00
< [Template merge - langs/und] Added checks for giella-core and giella-shared, symlinking to them if found, checking out (svn) or cloning (git) if not. Also removed every single reference to __UND__, it is not needed, and will cause merge conflicts. 2020-05-05T11:35:10+00:00
---
> [Template merge - langs/und] The final mmove in the old svn infra: change the am-shared reference to point to giella-core parallel to the language dir. After this we can remove am-shared from each language. 2020-05-13T12:12:25+00:00
> [Template merge - langs/und] Fix mobile speller filename bug. 2020-05-12T16:58:47+00:00
> [Template merge - langs/und] Fix speller generation bug. 2020-05-09T11:12:30+00:00
> [Template merge - langs/und] Fix speller analyser reference after the flattening of the tools/spellcheckers/ dir. 2020-05-09T09:47:27+00:00
> [Template merge - langs/und] Final step in flattening the tools/spellcheckers/ dir tree: removing the whole fstbased/ dir, with all subdirs. Finally! 2020-05-09T05:03:40+00:00
> [Template merge - langs/und] Fix automakefile error: no final backslash followed by an empty line. 2020-05-08T20:41:29+00:00
> [Template merge - langs/und] Step eight in flattening the tools/spellcheckers/ dir tree: flipping the switch. All pieces are in place for building everything in tools/spellcheckers/ only, and everything has been tested with one language, including make check (a few tests are skipped because the fst is not found, but no tests break). The old files are kept for the moment, in case unseen issues and missing data is popping up after the switch, but will be deleted after verification. 2020-05-08T18:28:19+00:00
> [Template merge - langs/und] Step six in flattening the tools/spellcheckers/ dir tree: copying fstbased/mobile/hfst/index.xml to the new location. 2020-05-08T15:52:37+00:00
> [Template merge - langs/und] Step six in flattening the tools/spellcheckers/ dir tree: moving TAGWEIGHTS out of the language independent part to the language specific part, so that we can specify different tagweight files for desktop and mobile spellers. 2020-05-08T13:21:48+00:00
> [Template merge - langs/und] Step four in flattening the tools/spellcheckers/ dir tree: modifying another set of build files for the new dir structure, and the consequences of one dir for all speller files. 2020-05-08T09:57:51+00:00
> [Template merge - langs/und] Step four in flattening the tools/spellcheckers/ dir tree: copying all non-make files from spellcheckers/fstbased/desktop/hfst/ to spellcheckers/. 2020-05-07T19:17:07+00:00
> [Template merge - langs/und] Step three in flattening the tools/spellcheckers/ dir tree: changing the relocated build files to adapt to their new home. 2020-05-07T16:47:55+00:00
> [Template merge - langs/und] Step two in flattening the tools/spellcheckers/ dir tree: copying the desktop/weighting/ dir as the default one - for most languages the mobile/weighting/ dir is just a copy of the desktop one. 2020-05-07T06:28:25+00:00
> [Template merge - langs/und] Step one in flattening the tools/spellcheckers/ dir tree: copying all subdir Makefile.am files to *.mod-* files in the top spellcheckers dir, except from the weigthing dirs. 2020-05-06T12:14:56+00:00
> [Template merge - langs/und] Added .gitignore file, as a preparatory step. 2020-05-06T10:51:00+00:00
> [Template merge - langs/und] Forgot to remove the entries for configure.ac re listbased spellers. 2020-05-06T08:52:49+00:00
> [Template merge - langs/und] Removed all list-based spellcheckers. There has not been any serious work in that area since the move to the new infrastructure 8 years ago. If there is a future need, we have it all in the rev history, and removing it simplifies other operations. 2020-05-06T07:53:52+00:00
> [Template merge - langs/und] Moved the files in tools/data/ to tools/tokenisers/, and removed the dir tools/data/. Part of the tools dir cleanup. 2020-05-06T06:58:26+00:00
> [Template merge - langs/und] Commented out check for GTLANG_xxx variable, it is not used, and the check output is confusing to users. 2020-05-05T12:46:02+00:00
> [Template merge - langs/und] Added checks for giella-core and giella-shared, symlinking to them if found, checking out (svn) or cloning (git) if not. Also removed every single reference to __UND__, it is not needed, and will cause merge conflicts. 2020-05-05T11:37:14+00:00
479,561c475,528
< fixed another wrong file name in Makefile.am which caused build to fail if WITH_OFST_TROPICAL was off 2020-04-28T07:53:04+00:00
< [Template merge - langs/und] The last hyphenation build fix: now also works with other than the default fst backend, e.g. with the foma backend. 2020-04-27T08:48:22+00:00
< [Template merge - langs/und] Removed a double target declaration, one from the old pattern-based build, and one from the fst build. It was a simple copy from fst to pattern, and is not needed anymore. 2020-04-27T07:59:57+00:00
< fixed a file name in src/Makefile.am that caused the build to fail 2020-04-27T07:42:09+00:00
< [Template merge - langs/und] Updated referenced filename. Old name was not found, and stopped all builds. 2020-04-26T16:08:58+00:00
< [Template merge - langs/und] Restored file that was accidentally deleted, also renamed it to the correct name after the dir reorg. 2020-04-26T08:58:42+00:00
< [Template merge - langs/und] One reference to an old filename corrected. Stopped all nightlies. 2020-04-25T21:18:56+00:00
< [Template merge - langs/und] Removing the last remnants of the old hyphenation directory structure. 2020-04-24T20:33:48+00:00
< [Template merge - langs/und] Moving the last files from patterns one dir up. 2020-04-24T19:53:05+00:00
< [Template merge - langs/und] Removed most of the old hyph files not needed anymore. 2020-04-24T17:25:51+00:00
< [Template merge - langs/und] Switched build to new, shallower build structure. The old files and dirs are still there, but not used. 2020-04-24T16:16:54+00:00
< a major re-design of the guesser: introducing flag diacritics to scan the word, and changing the way lexicon-based and guessed words transducers are combined; the guesser is far from finished 2020-04-24T15:23:37+00:00
< X ja ... pattern added to pronoun-verb agreement error patterns 2020-04-24T14:18:12+00:00
< [Template merge - langs/und] Forgot one file to be copied up one dir level, now done. 2020-04-24T13:57:57+00:00
< [Template merge - langs/und] Step one in flattening the tools/hyphenators/ dir tree: copying and renaming make files, copying the filter dir. The files are not yet connected. Also preparing new build instruction file. 2020-04-24T12:28:50+00:00
< [Template merge - langs/und] Added missing quote mark „ that caused unwanted behaviour in tokenisation. 2020-04-23T07:22:11+00:00
< [Template merge - langs/und] Updated references to dir names in giella-shared: requires new version of giella-common. Updated some test scripts to refer to the new dir names. 2020-04-23T06:30:53+00:00
< morphology/ -> fst/. 2020-04-23T05:36:36+00:00
< [Template merge - langs/und] The second big renaming: src/morphology/ -> src/fst/. All build, test and config files are updated. `make` and `make check` works for sma. 2020-04-22T12:40:42+00:00
< [Template merge - langs/und] Added dynamic construction of a regex of flag diacritics found in tokeniser fst's. The regex is used to ensure that flag diacritics are considered epsilons at token boundaries. Fixes a number of tokenisation bugs. 2020-04-22T09:18:14+00:00
< [Template merge - langs/und] A glaring miss stopped all nightly builds. Thanks to Tino for pointing out. 2020-04-22T05:41:20+00:00
< Restored previous content, that was overwritten by a not-so-thought-through conflict resolution command. 2020-04-21T17:04:13+00:00
< [Template merge - langs/und] Renamed src/syntax/ to src/cg3/, and updated all references to it. Part of the large restructuring, and a test case for more complex renaming. 2020-04-21T13:37:09+00:00
< Fixes to language specific build steps to make the phonology build work again. 2020-04-21T07:27:15+00:00
< [Template merge - langs/und] More cleanup after removing src/phonology/*: all references to it have been replacecd, and the file am-shared/src-phonology-dir-include.am has been removed. 2020-04-21T06:55:55+00:00
< [Template merge - langs/und] Forgot to remove src/phonology/Makefile from configure.ac. Duh. 2020-04-20T18:36:24+00:00
< Deleted src/phonology/ dir after all source files have been moved to src/morphology/. Some files have been renamed. All builds should continue to work as before. 2020-04-20T14:19:27+00:00
< Fixing documenation file ref and filename after the source file move. 2020-04-20T12:17:45+00:00
< [Template merge - langs/und] Changed documentation extraction & building to get the source doc in src/morphology/. 2020-04-20T12:02:56+00:00
< [Template merge - langs/und] The big switch: building phonology files are now changed from src/phonology/ to src/morphology. Documentation is still built in the old location, but will be moved separately due to higher conflict risk. 2020-04-20T11:19:18+00:00
< [Template merge - langs/und] Update phonology filename in src/morphology/Makefile.modifications-phon.am. 2020-04-20T07:29:47+00:00
< some trivial rules 2020-04-19T16:31:23+00:00
< [Template merge - langs/und] Copy src/phonology/Makefile.am to src/morphology/Makefile.modifications-phon.am and src/phonology/xxx-phon.twolc to src/morphology/phonology.twolc as step one in moving the file. Then the build can switch, and finally, the old files can be deleted. 2020-04-18T16:07:31+00:00
< [Template merge - langs/und] Corrected copy-paste bug in the build steps for areal grammar checker analysers. The bug caused SMJ to fail. 2020-04-17T06:33:14+00:00
< [Template merge - langs/und] Fixed bug with multiple declarations of EXTRA_DIST and noinst_DATA in the previous template merge. 2020-04-17T06:11:19+00:00
< [Template merge - langs/und] Preparations for moving the phonology files inside morphology/ (later to be renamed fst/). 2020-04-17T05:53:53+00:00
< typo 2020-04-15T08:35:29+00:00
< [Template merge - langs/und] Reorganised mt/apertium make files so that fixed content is in Makefile.am, and userj-editable content is in Makefile.modifications.am. 2020-04-07T12:58:36+00:00
< [Template merge - langs/und] Started splitting the local Makefile.am in two, by moving it to a new filename, and then create a new Makefile.am that just includes the moved one. In later commmits, some of the content can be moved from one file to the other. 2020-04-06T11:56:36+00:00
< [Template merge - langs/und] Fixed the remaining cases of improved upper-lower case configurable processing. Removed a variable from configure.ac with comments, turned out it wasn't needed. 2020-04-05T11:19:22+00:00
< [Template merge - langs/und] First step in fixing default case handling: downcasing of derived proper nouns can now be turned off for the standard fst's by changing a test in configure.ac. 2020-04-03T12:42:48+00:00
< This Estonian analyser uses another system for handling downcasing of derived nouns. We thus set the value to false, to avoid using the default setup. 2020-04-03T12:41:37+00:00
< [Template merge - langs/und] Fixed bug in phonology compilation when there are multiple phonology files: temporary files were deleted before being used due to name overlap. 2020-03-31T07:24:28+00:00
< [Template merge - langs/und] Added Automake variables to handle demanding or non-default uppercasing, or writing systems with no case distinction at all. 2020-03-30T13:43:49+00:00
< removed (for now) guesser from the raw transducer; the first error rules by Kaili for grammar checker, and one error described in errors.xml 2020-02-25T15:53:09+00:00
< guesser now covers also verbs and to some extent words that are declined with an apostrophe (e.g. Braille'st) 2020-02-21T18:24:21+00:00
< forgot to add the skript to upcase guessed proper names 2020-02-20T09:40:34+00:00
< adding a guesser makes the descriptive analyzer 3 times bigger than before; in an (unsuccessful) attempt to diminish it I gave up the previous way of decoding every sound with lowercase|uppercase version, and added a filter instead that up-cases the initial letter of a name. the guesser currently accepts only Estonian letters and a few foreign ones. 2020-02-19T17:08:50+00:00
< guesser-related improvements: fixed a bug in twolc (vowel shortening); modified makefile so that now guessed simplex words do not derive into smth else; slightly improved simplex word guessing 2020-02-18T13:41:04+00:00
< now we have a guesser for simplex words 2020-02-13T16:50:52+00:00
< [Template merge - langs/und] Adding |{➤}|{•} to pmscript. 2019-12-16T08:08:15+00:00
< introduced trigger {E} for -el/er ending words like piksel 2019-12-12T18:26:15+00:00
< docu 2019-12-04T14:38:19+00:00
< added 2600 words to lexicon by importing a newer vabamorf lexicon; added -nd (for -nud) as a NotNorm affix, and classified some verbs so that their NotNorm forms are analysed now (töödata, naasema etc) 2019-11-29T22:07:50+00:00
< [Template merge - langs/und] Added ‹ and › to the list of possible punctuation marks in the tokenisers. 2019-11-15T12:26:48+00:00
< Use # for word boundaries at this point. The next step in the speller fst build process will have the # removed, so that would be the correct place to add the tests as they were. Adjusted Makefile.am correspondingly. 2019-11-12T08:29:32+00:00
< Initial ATT file for a keyboard layout adjusted error model. This layout error model is based on SMA, as we don’t have any layouts for EST yet. SMA is pretty close though, and good enough for initial testing and playing with. 2019-11-12T08:28:00+00:00
< Correct variable syntax and variable name 2019-11-12T08:26:20+00:00
< Added a single yaml test just to make the test pass. More tests can be added if needed. Updated Makefile.am accordingly. 2019-11-12T08:02:08+00:00
< More test scripts accidentally deleted now restored, deleted in svn rev 163954. 2019-11-12T07:51:51+00:00
< Update the call to the test runner. 2019-11-12T07:38:04+00:00
< Undeleted files that were removed in rev. 174629. Even though they may not be used, they should be kept in svn. The tests will only be run if there are corresponding yaml files, and keeping them around makes est/ follow the rest of the languages. And if removed, the corresponding changes must be made also in the Makefile.am. I assume they were deleted by accident. 2019-11-12T07:32:47+00:00
< [Template merge - langs/und] Added Makefile setting for enabling swaps in error models (ie ab -> ba). Default is no (as this used not to work, and the existing error models are based on this fact). 2019-11-06T16:53:03+00:00
< some changes to appease make check 2019-11-05T16:47:37+00:00
< [Template merge - langs/und] Replace UNDEFINED with __UNDEFINED__, so that text replacement can take place. 2019-10-24T08:39:00+00:00
< Updated ignore patterns. 2019-10-23T18:25:34+00:00
< [Template merge - langs/und] tools/mt/Makefile.am needs am-shared/lookup-include.am as well. 2019-10-22T09:17:05+00:00
< [Template merge - langs/und] Forgot to add cgbased to the SUBDIRS variable in tools/mt/Makefile.am. 2019-10-22T08:30:19+00:00
< [Template merge - langs/und] Added basic support for CG-based machine translation. Ongoing work. 2019-10-22T07:20:24+00:00
< [Template merge - langs/und] Make sure some jspwiki header files for generated documentation are included in the distro. 2019-10-16T06:08:49+00:00
< [Template merge - langs/und] Made it possible to disable Forrest validation when Forrest is installed. This reduces build time and annoying warnings for people not working on the documentation. Default is still to do Forrest validation. 2019-10-14T10:47:28+00:00
< [Template merge - langs/und] Wrapped command line tools in double quotes, to protect against spaces in pathnames. Spaces will occur when building on Windows using Windows Subsystem for Linux, as locations such as 'Program Files' are included in the default search path. 2019-10-10T07:24:00+00:00
< Force unix line endings, to make sure it works ok also on the Windows subsystem for Linux. 2019-10-07T17:04:57+00:00
< [Template merge - langs/und] Improved build process for pattern hyphenators - now patgen config is done programmatically instead of interactively. The values are configured in the Makefile.am. 2019-10-02T21:59:56+00:00
< [Template merge - langs/und] Added script for testing tag coverage, made by Kevin, and originally for sme. 2019-09-17T08:35:00+00:00
< [Template merge - langs/und] Added support for multiple whitespace analysers. 2019-09-05T07:12:45+00:00
< [Template merge - langs/und] Added support for comments in error model text files. Added support for zipped but uncompressed files (required by divvunspell for now). 2019-09-04T20:28:04+00:00
< The last traces of Konrad Nielsen removed from our languages. 2019-08-13T07:28:00+00:00
< [Template merge - langs/und] Added simple shell script to easily run the grammar checker test tool, and considering build directories etc. 2019-08-09T11:49:01+00:00
< [Template merge - langs/und] Generate and compile the new filter for removing semantic tags in front of derivations. Require new version of the giella-core because of dependencies. 2019-06-14T11:03:00+00:00
< [Template merge - langs/und] Make sure all generated files have a suffix that will make them be ignored. Added comments to clarify. 2019-06-14T07:45:34+00:00
< [Template merge - langs/und] Børre updated the documentation url to point to giellalt.uit.no. 2019-06-14T07:08:52+00:00
< Updating svn ignores for tools/analysers/. 2019-06-14T06:33:54+00:00
---
> [Template merge - langs/und] The last hyphenation build fix: now also works with other than the default fst backend, e.g. with the foma backend. 2020-04-27T08:53:25+00:00
> [Template merge - langs/und] Removed a double target declaration, one from the old pattern-based build, and one from the fst build. It was a simple copy from fst to pattern, and is not needed anymore. 2020-04-27T08:02:35+00:00
> [Template merge - langs/und] Updated referenced filename. Old name was not found, and stopped all builds. 2020-04-26T16:15:18+00:00
> [Template merge - langs/und] Restored file that was accidentally deleted, also renamed it to the correct name after the dir reorg. 2020-04-26T09:01:14+00:00
> [Template merge - langs/und] One reference to an old filename corrected. Stopped all nightlies. 2020-04-25T21:23:08+00:00
> [Template merge - langs/und] Removing the last remnants of the old hyphenation directory structure. 2020-04-24T20:45:05+00:00
> [Template merge - langs/und] Moving the last files from patterns one dir up. 2020-04-24T19:55:19+00:00
> [Template merge - langs/und] Removed most of the old hyph files not needed anymore. 2020-04-24T17:38:12+00:00
> [Template merge - langs/und] Switched build to new, shallower build structure. The old files and dirs are still there, but not used. 2020-04-24T16:31:34+00:00
> [Template merge - langs/und] Forgot one file to be copied up one dir level, now done. 2020-04-24T13:58:39+00:00
> [Template merge - langs/und] Step one in flattening the tools/hyphenators/ dir tree: copying and renaming make files, copying the filter dir. The files are not yet connected. Also preparing new build instruction file. 2020-04-24T12:37:50+00:00
> [Template merge - langs/und] Added missing quote mark „ that caused unwanted behaviour in tokenisation. 2020-04-23T07:31:30+00:00
> [Template merge - langs/und] Updated references to dir names in giella-shared: requires new version of giella-common. Updated some test scripts to refer to the new dir names. 2020-04-23T06:47:02+00:00
> morphology/ -> fst/. 2020-04-23T05:59:35+00:00
> morphology/ -> fst/. 2020-04-23T05:33:47+00:00
> [Template merge - langs/und] The second big renaming: src/morphology/ -> src/fst/. All build, test and config files are updated. `make` and `make check` works for sma. 2020-04-22T19:33:52+00:00
> [Template merge - langs/und] Added dynamic construction of a regex of flag diacritics found in tokeniser fst's. The regex is used to ensure that flag diacritics are considered epsilons at token boundaries. Fixes a number of tokenisation bugs. 2020-04-22T09:26:18+00:00
> [Template merge - langs/und] A glaring miss stopped all nightly builds. Thanks to Tino for pointing out. 2020-04-22T05:42:35+00:00
> [Template merge - langs/und] Renamed src/syntax/ to src/cg3/, and updated all references to it. Part of the large restructuring, and a test case for more complex renaming. 2020-04-21T18:05:57+00:00
> [Template merge - langs/und] More cleanup after removing src/phonology/*: all references to it have been replacecd, and the file am-shared/src-phonology-dir-include.am has been removed. 2020-04-21T07:10:14+00:00
> Change references to phonology/XXX-phon.yyy in the section for language specific modifications. 2020-04-21T06:58:15+00:00
> [Template merge - langs/und] Forgot to remove src/phonology/Makefile from configure.ac. Duh. 2020-04-20T18:42:11+00:00
> Deleted src/phonology/ dir after all source files have been moved to src/morphology/. Some files have been renamed. All builds should continue to work as before. 2020-04-20T14:20:33+00:00
> [Template merge - langs/und] Changed documentation extraction & building to get the source doc in src/morphology/. 2020-04-20T12:05:17+00:00
> [Template merge - langs/und] The big switch: building phonology files are now changed from src/phonology/ to src/morphology. Documentation is still built in the old location, but will be moved separately due to higher conflict risk. 2020-04-20T11:36:28+00:00
> [Template merge - langs/und] Update phonology filename in src/morphology/Makefile.modifications-phon.am. 2020-04-20T07:29:26+00:00
> [Template merge - langs/und] Copy src/phonology/Makefile.am to src/morphology/Makefile.modifications-phon.am and src/phonology/xxx-phon.twolc to src/morphology/phonology.twolc as step one in moving the file. Then the build can switch, and finally, the old files can be deleted. 2020-04-18T16:05:21+00:00
> [Template merge - langs/und] Corrected copy-paste bug in the build steps for areal grammar checker analysers. The bug caused SMJ to fail. 2020-04-17T06:36:43+00:00
> [Template merge - langs/und] Fixed bug with multiple declarations of EXTRA_DIST and noinst_DATA in the previous template merge. 2020-04-17T06:16:14+00:00
> [Template merge - langs/und] Preparations for moving the phonology files inside morphology/ (later to be renamed fst/). 2020-04-17T05:59:47+00:00
> [Template merge - langs/und] Reorganised mt/apertium make files so that fixed content is in Makefile.am, and userj-editable content is in Makefile.modifications.am. 2020-04-07T13:54:30+00:00
> [Template merge - langs/und] Started splitting the local Makefile.am in two, by moving it to a new filename, and then create a new Makefile.am that just includes the moved one. In later commmits, some of the content can be moved from one file to the other. 2020-04-06T11:57:59+00:00
> [Template merge - langs/und] Fixed the remaining cases of improved upper-lower case configurable processing. Removed a variable from configure.ac with comments, turned out it wasn't needed. 2020-04-05T11:19:59+00:00
> [Template merge - langs/und] First step in fixing default case handling: downcasing of derived proper nouns can now be turned off for the standard fst's by changing a test in configure.ac. 2020-04-03T13:08:27+00:00
> [Template merge - langs/und] Fixed bug in phonology compilation when there are multiple phonology files: temporary files were deleted before being used due to name overlap. 2020-03-31T07:26:54+00:00
> [Template merge - langs/und] Added Automake variables to handle demanding or non-default uppercasing, or writing systems with no case distinction at all. 2020-03-30T13:47:05+00:00
> [Template merge - langs/und] Adding |{➤}|{•} to pmscript. 2019-12-16T08:22:47+00:00
> [Template merge - langs/und] Added ‹ and › to the list of possible punctuation marks in the tokenisers. 2019-11-15T12:37:38+00:00
> [Template merge - langs/und] Added Makefile setting for enabling swaps in error models (ie ab -> ba). Default is no (as this used not to work, and the existing error models are based on this fact). 2019-11-06T17:22:29+00:00
> [Template merge - langs/und] Replace UNDEFINED with __UNDEFINED__, so that text replacement can take place. 2019-10-24T14:20:07+00:00
> [Template merge - langs/und] tools/mt/Makefile.am needs am-shared/lookup-include.am as well. 2019-10-22T09:18:16+00:00
> [Template merge - langs/und] Forgot to add cgbased to the SUBDIRS variable in tools/mt/Makefile.am. 2019-10-22T08:34:12+00:00
> [Template merge - langs/und] Added basic support for CG-based machine translation. Ongoing work. 2019-10-22T07:30:33+00:00
> [Template merge - langs/und] Make sure some jspwiki header files for generated documentation are included in the distro. 2019-10-16T06:13:47+00:00
> [Template merge - langs/und] Made it possible to disable Forrest validation when Forrest is installed. This reduces build time and annoying warnings for people not working on the documentation. Default is still to do Forrest validation. 2019-10-14T10:57:18+00:00
> [Template merge - langs/und] Wrapped command line tools in double quotes, to protect against spaces in pathnames. Spaces will occur when building on Windows using Windows Subsystem for Linux, as locations such as 'Program Files' are included in the default search path. 2019-10-10T09:44:31+00:00
> [Template merge - langs/und] Improved build process for pattern hyphenators - now patgen config is done programmatically instead of interactively. The values are configured in the Makefile.am. 2019-10-02T22:19:52+00:00
> [Template merge - langs/und] Added script for testing tag coverage, made by Kevin, and originally for sme. 2019-09-17T08:42:15+00:00
> [Template merge - langs/und] Added support for multiple whitespace analysers. 2019-09-05T07:12:33+00:00
> [Template merge - langs/und] Added support for comments in error model text files. Added support for zipped but uncompressed files (required by divvunspell for now). 2019-09-05T04:07:21+00:00
> [Template merge - langs/und] Added simple shell script to easily run the grammar checker test tool, and considering build directories etc. 2019-08-09T12:10:05+00:00
> [Template merge - langs/und] Generate and compile the new filter for removing semantic tags in front of derivations. Require new version of the giella-core because of dependencies. 2019-06-14T11:08:42+00:00
> [Template merge - langs/und] Make sure all generated files have a suffix that will make them be ignored. Added comments to clarify. 2019-06-14T08:25:39+00:00
> [Template merge - langs/und] Børre updated the documentation url to point to giellalt.uit.no. 2019-06-14T07:08:18+00:00
564,572c531,537
< [Template merge - langs/und] Fixed stupid copy-paste error in the previous commit. Reorganised the code a bit to make a variable definition clearer and more logical. 2019-05-27T11:01:53+00:00
< [Template merge - langs/und] Make sure that the input to all variants of the mobile speller is weighted. 2019-05-27T07:13:11+00:00
< Updating svn ignores. 2019-05-24T09:52:39+00:00
< Updating svn ignores. 2019-05-24T09:40:09+00:00
< [Template merge - langs/und] Fixed fsttype mismatch error for filters when building mobile spellers, by building filters locally of the correct fst type, as we do for desktop spellers. 2019-05-24T09:11:52+00:00
< converter only, tag away. 2019-04-30T17:59:21+00:00
< turned the converters the correct direction. Some sma at the end remains. 2019-04-30T14:35:22+00:00
< Added literaliser symbol % to suffix boundary > in three different rules, now the file compiles. 2019-04-30T08:55:45+00:00
< |../common -> |/lang/common 2019-04-23T17:24:03+00:00
---
> [Template merge - langs/und] Fixed stupid copy-paste error in the previous commit. Reorganised the code a bit to make a variable definition clearer and more logical. 2019-05-27T11:15:02+00:00
> [Template merge - langs/und] Make sure that the input to all variants of the mobile speller is weighted. 2019-05-27T07:18:59+00:00
> [Template merge - langs/und] Fixed fsttype mismatch error for filters when building mobile spellers, by building filters locally of the correct fst type, as we do for desktop spellers. 2019-05-24T09:23:42+00:00
> Updated docs. 2019-05-01T18:28:20+00:00
> Remove doc/ prefixes 2019-04-23T17:21:28+00:00
> http://giellatekno.uit.no/doc -> https://giellalt.uit.no 2019-04-23T17:20:15+00:00
> http://divvun.no/doc -> https://giellalt.uit.no 2019-04-23T17:18:06+00:00
576,592c541,549
< [Template merge - langs/und] Added UpCase function to the tokenisers, to handle all-upper variants of the input side. It does almost double the size of the fst, but at least it is just one additional line of code. Also, it does only work in Linux/using glib (for other platforms it is restricted to Latin1 - still, that covers a major portion of the Sámi fst's and running text, so much better than nothing). 2019-03-22T14:30:22+00:00
< minor improvements in abbreviations 2019-03-19T09:57:52+00:00
< lexical language script lexlang.xfscript now working for hfst-xfst; guesser-related tag(s) not included. some irrelevant symbol bugs fixed in abbreviations 2019-03-18T15:44:03+00:00
< tried to describe el/er ending words differently: instead of symbol E2 which was used instead of e, now a new complicated 2-level rule deletes e, based on context. so words in stem lexicon look more natural. exceptions to 2-level rule are still with E2, or by putting words in a different cont. lexicon. lexlang script that should define the lexical language is not working at present... 2019-03-15T19:14:03+00:00
< [Template merge - langs/und] Ensure that the correct grammar checker pipeline is the default one, so that it will be executed when no pipeline is specified. 2019-03-13T08:45:12+00:00
< small improvements in compounding; paired words now together in the same transducer with simplex words (earlier they were separate) 2019-03-06T13:02:51+00:00
< added verbal derivation +Der/is, some simplex words and flags to some words in order to increase both recall and precision 2019-03-05T16:43:54+00:00
< pron__u__nciation 2019-03-03T14:38:07+00:00
< more work on compounding; fixed a bug in verbal us-derivation that resulted in output of similar lines (because the lines contained a different flag diacritic) 2019-03-01T20:39:55+00:00
< now the symbols get processed (almost) as intended 2019-02-28T18:42:34+00:00
< made compound combinatorics a little bit better still; some more 3 and 4 letter words prohibited from forming compounds (with various strictness), and more compound words (that became illegal by the new rules) added to the lexicon. proper nouns now classified into geo names, persons and other 2019-02-28T17:39:41+00:00
< [Template merge - langs/und] Added the new multichar +Symbol to the multichar definitions. 2019-02-28T07:21:29+00:00
< [Template merge - langs/und] Changed sub-post tag for symbols from +ABBR to +Symbol. Needs to be declared as multichar in each language. 2019-02-27T13:28:21+00:00
< Updated svn ignores. 2019-02-27T10:21:10+00:00
< [Template merge - langs/und] Added support for shared Symbol file: build rules, affix file, modifications to root.lexc. Also increased required version of giella-common, to make sure that the shared stem file is actually there. 2019-02-26T08:52:43+00:00
< [Template merge - langs/und] Fixed dir name typo that broke compilation. 2019-02-25T18:10:43+00:00
< [Template merge - langs/und] Added support for building an analyser tool. This is in practice an xml-specified pipeline identical to what is used in the grammar checker, but where the pipeline does text analysis instead of grammar checking. Also made grammar checkers and mobile spellers part of the --enable-all-tools configuration. 2019-02-25T15:37:11+00:00
---
> [/doc/ to [/ 2019-04-23T12:03:48+00:00
> We don't need the /doc prefix anymore 2019-04-23T11:58:24+00:00
> [Template merge - langs/und] Added UpCase function to the tokenisers, to handle all-upper variants of the input side. It does almost double the size of the fst, but at least it is just one additional line of code. Also, it does only work in Linux/using glib (for other platforms it is restricted to Latin1 - still, that covers a major portion of the Sámi fst's and running text, so much better than nothing). 2019-03-22T14:44:47+00:00
> [Template merge - langs/und] Ensure that the correct grammar checker pipeline is the default one, so that it will be executed when no pipeline is specified. 2019-03-13T08:46:19+00:00
> [Template merge - langs/und] Added the new multichar +Symbol to the multichar definitions. 2019-02-28T07:40:22+00:00
> [Template merge - langs/und] Changed sub-post tag for symbols from +ABBR to +Symbol. Needs to be declared as multichar in each language. 2019-02-27T13:33:17+00:00
> [Template merge - langs/und] Added support for shared Symbol file: build rules, affix file, modifications to root.lexc. Also increased required version of giella-common, to make sure that the shared stem file is actually there. 2019-02-27T08:28:18+00:00
> [Template merge - langs/und] Fixed dir name typo that broke compilation. 2019-02-25T18:10:10+00:00
> [Template merge - langs/und] Added support for building an analyser tool. This is in practice an xml-specified pipeline identical to what is used in the grammar checker, but where the pipeline does text analysis instead of grammar checking. Also made grammar checkers and mobile spellers part of the --enable-all-tools configuration. 2019-02-25T17:07:57+00:00
594,1019c551,817
< compounding made somewhat more strict and precise, by placing flag diacritics for individual words in lexicons (using scripts in import) 2019-02-21T18:22:53+00:00
< fiddled with flag diacritics to get compounding more restrictive; and added weights for lemmas (based on 15 mio corpus freq. dict), to get frequent words lifted upwards in output; commented guesser out for testing 2019-02-15T20:42:53+00:00
< [Template merge - langs/und] Added filter to remove the +MWE tag from the grammar checker generator. It blocked generation of some word forms (and should not be visible in any case). 2019-02-13T07:44:29+00:00
< guesser: trying to organise lexicons 2019-02-08T20:37:21+00:00
< acronym with hyphen now treated separately in root 2019-02-08T08:31:13+00:00
< corrected an old error in acronym inflection 2019-02-05T16:53:57+00:00
< normative analyser and guesser should not include a guesser, so filter all guessing paths out 2019-02-04T15:53:33+00:00
< some words' diacritic flags changed in lexicons 2019-02-01T19:48:28+00:00
< a temporary commit with a messy guesser 2019-02-01T14:06:42+00:00
< ups, forgot to save the last root.lexc 2019-01-28T19:02:08+00:00
< initial attempt to include a guesser 2019-01-28T19:01:24+00:00
< [Template merge - langs/und] Fixed another case of transducer format mismatch for hyphenators, this time regarding pattern-based hyph building. 2019-01-25T08:45:29+00:00
< [Template merge - langs/und] Corrected an instance of transducer format mismatch when building hyphenators. 2019-01-25T08:05:02+00:00
< a prefix without a trailing hyphen need not be accepted as a word by the speller 2019-01-22T14:39:07+00:00
< the abbreviations are now like in vabamorf by filosoft, i.e. sloppy, but better than nothing; no yaml tests for them yet 2019-01-22T10:32:30+00:00
< [Template merge - langs/und] Make the mobile keyboard layout error model work properly (ie on input longer than one char) by circumfixing it with any-stars. 2019-01-17T19:08:20+00:00
< [Template merge - langs/und] First round of improved handling of compilation errors in shell pipes: instruct make to delete targets when some of the intermediate steps fail. 2019-01-11T12:34:50+00:00
< [Template merge - langs/und] Added configure.ac conditional to control whether spellers for alternative orthographies are built. The default is 'true'. Set this to 'false' for historical or other orthographies for which a speller is not relevant. 2019-01-09T10:34:50+00:00
< dummy abbr file to make estdis work 2019-01-09T09:03:38+00:00
< compiling did not work, copied from fkv 2019-01-09T09:03:09+00:00
< [Template merge - langs/und] Fix broken hfst builds of xfscript files when there is no final newline in the source file (caused the save command to be shadowed by the final line of text, usually a comment, so no file was saved, and thus there was nothing to work on for the next build step). 2019-01-09T08:56:21+00:00
< first attempt to add weights; in src/makefile.am and root.lexc. now the output cohorts are ordered so that more complex ones are down below, and +Use/Rare are down below, and less frequent grammatical categories are in lower lines 2019-01-08T15:40:26+00:00
< added frequency wordform list 2019-01-08T12:36:04+00:00
< [Template merge - langs/und] Apply alternate orthography conversion after hyphenation marks have been removed, but before the morphology marks are deleted. Especially word boundaries are useful for certain types of conversion, but other borders will likely be useful as well. The conversion scripts need to take the border marks into consideration. 2019-01-08T08:55:05+00:00
< Ignore compiled cg3 files in tools/tokenisers/. 2019-01-08T07:06:40+00:00
< Ignore more files, including files that are automatically added to svn when populating a new language. This is done to avoid them showing up as noise for external languages, in which case these files might not be in our svn (but in the external svn repo instead). 2019-01-08T06:56:01+00:00
< Reverted a number of things back to r172686, because my attempts to simplify the morph. description by adding KAVA to 1C (and modifying other 1C words to fit the new system) resulted in more complex twol rules. And crucially, the stem illative of seminar (TAUD): seminari and oleskelu (KAVA): oleskellu are probably impossible to describe in a uniform and simple manner. 2019-01-06T20:20:15+00:00
< a redundant rule from est-phon.twolc deleted 2018-12-20T13:08:41+00:00
< [Template merge - langs/und] Replicate the desktop error model for the mobile speller, and generalise the corpus weighting compilation. Now the build code is ready for mobile speller release. 2018-12-17T17:31:13+00:00
< ehtima, hauduma, uhtuma changed from 2 entries to 1 2018-12-17T16:46:07+00:00
< laiaku now correct 2018-12-17T10:51:55+00:00
< [Template merge - langs/und] Improved Easter egg generation, using the improved script in giella-core. Increased the required giella-core version correspondingly. 2018-12-14T09:11:01+00:00
< [Template merge - langs/und] Cleaned the HFST_MINIMIZE_SPELLER macro, and also its use. No need to include push weights anymore, it is done always, for all speller fst's. 2018-12-13T10:12:43+00:00
< [Template merge - langs/und] Push weights for all final fst's, + optimise error model. 2018-12-13T09:56:41+00:00
< [Template merge - langs/und] Changed how the att file is produced. From now on it should be built once, and then added to svn. The att file will usually not change, and storing it in svn will avoid rebuilding it every time. Also changed the compression. 2018-12-12T14:51:02+00:00
< [Template merge - langs/und] Added support for adapting the error model to the mobile keyboard layout for the language in question. 2018-12-11T14:17:07+00:00
< rename locatives to locatives_plus and other similar style changes 2018-12-04T11:06:45+00:00
< muuseum now like taud (as it should be), but this is achieved by a careful crafting of twol rules 2018-12-02T18:43:16+00:00
< remake of NIMI to get it similar to KAVA 2018-12-02T00:16:31+00:00
< trivial attempt to simplify continuation lexicons 2018-12-01T22:56:27+00:00
< FIAT type now clearly like TAUD. And Suess etc error fixed. 2018-11-27T19:21:25+00:00
< a modified yaml-test that I forgot to commit last time 2018-11-26T10:07:37+00:00
< rules of referral in paradigms (affixes) and 2-level rules changed; a step towards more uniform representation of declinations (and probably even more complex 2-level rules) 2018-11-23T20:38:23+00:00
< [Template merge - langs/und] Two more places to remove the Use/-GC and the MWE tags: mt and speller fst's. Now done. 2018-11-06T07:48:42+00:00
< [Template merge - langs/und] Had forgotten to remove the Use/-GC tag in the core fst's, only from all the others. Now fixed. 2018-11-05T15:51:27+00:00
< [Template merge - langs/und] Step 2 in blocking dynamic compounds of MWE tagged entries: moved all MWE tag processing away from the *-raw-* targets to the specific *.tmp targets. This way the MWE tags will survive long enough to be available for the blocking done in the tokeniser fst's. Tested in SME, and seems to work as intended. 2018-11-05T08:39:46+00:00
< [Template merge - langs/und] Added step 1 in blocking dynamic compounds between an MWE and another noun: added new filter that will turn the MWE tag into a flag diacritic. Increased required giella-common version number due to the new filter. 2018-11-02T11:16:04+00:00
< re-organised and simplified flag diacritics (aim: to make the whole stuff easier to understand); the lexlang.xfscript currently does not express the lexical language (punctuation-related tags are missing) 2018-10-29T15:24:22+00:00
< just some cleanup, no real changes 2018-10-26T17:23:04+00:00
< tokeniser pmscript needs flags explicitly listed; the tokeniser now works the same as for sme and vro 2018-10-24T14:13:43+00:00
< [Template merge - langs/und] Fixed bug when building the punctuation file - the required subdir was not made. 2018-10-24T08:32:02+00:00
< harmonized the lexicon with the generated punctuation; now the tokeniser analyses a separate comma as +CLB, although it doesn't separate it as a token: ja, is tokenised as ja (with , just being lost) 2018-10-22T20:26:08+00:00
< punctuation now in raw analyser, but tokeniser still not recognising it 2018-10-22T13:44:02+00:00
< Rejected the idea that a word containing up to 2 hyphens may be built from individual words, by just inserting hyphens. As a consequence, had to include acronyms on par with other words in the stems directory, and add them to the root.lexc compounding lexicon mess. morphology/Makefile.am needs attention on GENERATED_LEXC_SRCS= 2018-10-19T20:38:27+00:00
< added a lexical language defining xfscript for documentation and testing purposes; probably in a wrong directory (src/morphology) 2018-10-17T13:48:00+00:00
< Copy in the shared punctuation file from giella-shared. 2018-10-16T12:01:38+00:00
< [Template merge - langs/und] Moved the whitespace analyser almost to the beginning of the pipeline, directly after the tokeniser+analyser. This is to be able to support sentence boundary detection, as the whitespace analyser will give some valuable tags for that. 2018-10-12T14:05:29+00:00
< added 2 yaml scripts that were missing from svn (no idea how this came to be...) 2018-10-12T08:25:21+00:00
< yaml tests were meant for a descriptive ana/gene transducer, but they were located and named as -norm-; now they are renamed and relocated to -desc- 2018-10-12T08:01:17+00:00
< [Template merge - langs/und] Corrected typo in a configuration option - dekstop instead of desktop. Thanks to our friends in Nuuk for noticing. 2018-10-11T15:54:45+00:00
< fixed some bugs that had slipped in during re-makes over summer; the hfst version analyzes like in May 2018 2018-10-11T11:18:09+00:00
< [Template merge - langs/und] Corrected a misplaced dependency that caused url.hfst to be rebuilt on every make, and thus trigger other rebuilds. Not anymore. 2018-10-09T14:42:19+00:00
< [Template merge - langs/und] Moved whitespace tagging after the speller, to avoid that it creates trouble for the speller. That happens when whitespace error tags are applied to the word form that should be spell-checked. 2018-10-09T14:07:38+00:00
< [Template merge - langs/und] Made it possible to tag something as _only_ for the grammar checker, or _not_ for the grammar checker. Updated required giella-share version, due to new required filters. 2018-10-09T11:42:04+00:00
< [Template merge - langs/und] Moved whitespace chars to the blank regex, thereby reinstating the old compilation speed. Thanks to Kevin and Tino for noticing and suggesting the improvement. Also added comment to document what incondform is supposed to contain, again thanks to Kevin. 2018-10-09T10:01:28+00:00
< [Template merge - langs/und] Removed hyphen from the regular unknown alphabet, thereby reverting analysis of -foo as one (unknown) token, and instead back to two tokens. Added hyphen to alphamiddle, so that foo-bar will still be analysed as one big unknown token. 2018-10-09T08:51:54+00:00
< [Template merge - langs/und] Added the tokenisation disambiguation file to the compiled and installed targets. 2018-10-09T07:32:09+00:00
< [Template merge - langs/und] Better handling of unknowns: defined more whitespace characters, defined a lot more vowels in the alphabet, added recent improvements to flag diacritic like symbols at token boundaries. 2018-10-08T17:21:50+00:00
< [Template merge - langs/und] Fixed two build bugs: abbr.txt was only autogenerated when building with hfst, and the url.?fst file was not properly generated from url.tmp.?fst. 2018-10-04T10:58:49+00:00
< [Template merge - langs/und] Fixed bug in MT compilation - pattern rules are not used, but new filenames still had them due to copy-paste error. 2018-10-04T08:40:45+00:00
< [Template merge - langs/und] Added pmatch filtering also to MT and spellcheckers. Now all tools and fst's should be covered. 2018-10-04T07:53:29+00:00
< [Template merge - langs/und] Forgot to add pmatch filtering to the default targets in src/ - duh. Now done. 2018-10-04T07:31:29+00:00
< [Template merge - langs/und] Added pmatch filtering to the rest of the build targets in src/. Also added grammar checker filtering. 2018-10-03T09:49:12+00:00
< [Template merge - langs/und] Major reorganisation to properly handle pmatch preparations, by splitting the disamb-analyser compilation in two: one going to the regular disamb analyser, and the other going to the pmatch variant. We use the two tags +Use/PMatch and +Use/-Pmatch in complementary distribution to specify paths for each, one path containing pmatch backtracking points (used with the --giella format of hfst-tokenise), and one without. The backtracking machinery is used to handle ambiguous tokenisation. Increased required version of giella-shared due to new, required filters. 2018-10-03T07:33:30+00:00
< [Template merge - langs/und] More improvements to the analysis regression check: undo space->underscore from lookup2cg (to avoid meaningless diffs when comparing to the new hfst-tokenise), and removed weight info. Also changed the dir ref for abbr.txt to ref the build dir, not the source dir, as that is where the file is generated. 2018-10-01T09:52:30+00:00
< some small bugfixes; but still xerox lookup cannot generate a wordform for +Par 2018-09-29T12:13:33+00:00
< [Template merge - langs/und] Improved regression check script: check that the abbr file is built, for improved traditional tokenisation; and make the patch command silent, for less noise during testing. 2018-09-29T12:05:54+00:00
< [Template merge - langs/und] Thanks to Børre, the analysis regression script will now remove diffs due to different handling of dynamic compounds when comparing old and new tokenisation. This makes it much easier to spot real differences between the two. 2018-09-25T10:10:09+00:00
< Updated svn ignores. 2018-09-25T08:25:04+00:00
< [Template merge - langs/und] Improved shell script for analysis regression testing, so that in cases of no diffs it will only print a short message and continue. The test for no diff is also much faster than a real diff. Improves processing time a lot for large test corpora. 2018-09-25T06:22:01+00:00
< [Template merge - langs/und] Moved punctuation definitions from each language to giella-shared/all_langs/. Makes much more sense, and will help in resolving random tokenisation bugs due to « and ». 2018-09-13T08:33:05+00:00
< [Template merge - langs/und] Implemented the option to compile phonology rules directly against the lexicon, for better rule compilation optimisations. Kevin: fixed a bug in xml generation for the grammar checker. 2018-09-11T07:16:02+00:00
< [Template merge - langs/und] Fixed hyphenation build when there is no phonology file. 2018-09-10T11:49:24+00:00
< More general ignore pattern for tools/mt/apertium/tagsets/. 2018-09-10T11:03:59+00:00
< [Template merge - langs/und] Corrected an error after the Hunspell config section was commented out. 2018-09-10T10:52:48+00:00
< [Template merge - langs/und] Added --enable-all-tools option to configure.ac, to allow for easier configuration and testing of all common tools. Unstable or experimental tools must still be explicitly enabled. Commented out the Hunspell speller config completely, it is not supported. Corrected a comment. 2018-09-10T09:49:35+00:00
< Updated svn ignore patterns. 2018-09-08T05:26:53+00:00
< [Template merge - langs/und] Improved and completed the code to skip building phonology fst's. Clearer logic and comments. 2018-09-08T04:39:57+00:00
< [Template merge - langs/und] Added a configure.ac setting to skip phonology compilation, typically used when compiling external sources, that provides a full analyser in src/morphology. Also added a configuration option to compile xfscript files with lexicon references in them, so allow for faster and more optimised rule composition. This variable has no effect yet, the rest of the machinery is missing. 2018-09-07T22:09:17+00:00
< [Template merge - langs/und] Remove all tmp files when cleaning. 2018-09-06T11:43:30+00:00
< [Template merge - langs/und] Remove also url.tmp.lexc when cleaning. 2018-09-06T11:36:02+00:00
< [Template merge - langs/und] Fixed bug: the url analyser is located elsewhere, and should not be processed here in any case. 2018-09-06T10:08:30+00:00
< [Template merge - langs/und] Made url analyser compilation open for local adaptations, by going via a tmp file. 2018-09-06T07:31:53+00:00
< [Template merge - langs/und] Remove also url.lexc when cleaning, it is copied from giella-shared. 2018-09-05T13:52:15+00:00
< xerox twolc happy with est-phon.twolc now, but analyser-raw-gt-desc.tmp.xfst.invert would not generate anything with +Par 2018-09-03T09:23:52+00:00
< [Template merge - langs/und] Corrected double installation of url analyser bug. It should not be installed at all. 2018-08-31T17:39:32+00:00
< [Template merge - langs/und] Add missing ‘|’ in analyser-gt-whitespace.hfst goal. 2018-08-31T10:49:54+00:00
< Updated svn ignores. 2018-08-30T15:58:31+00:00
< [Template merge - langs/und] Fixed a bug in the previous commit that surfaced when enabling tokenisers but not grammar checkers. 2018-08-30T13:36:49+00:00
< [Template merge - langs/und] Massive rewrite of filter codes and automatically generated tag conversions, all done to handle bug #2474 (URL tag not correctly formatted in the tokeniser output). The bug should be fixed now. 2018-08-30T11:50:04+00:00
< Updated svn ignores. 2018-08-29T05:25:44+00:00
< [Template merge - langs/und] Added filter dir and filter compilation to the fst-based hyphenators. Moved filter compilation from src/filters/ to the local filter dir (by copying the regex files and then compile them), to make the build process mostly fst format independent. 2018-08-28T11:17:10+00:00
< Updating svn ignores. 2018-08-28T10:41:36+00:00
< lD nD rD mB rS changed to phonological assimilation-dissimilation, like the tradition is (from previous non-traditional llD nnD etc that treated them as a special case of orthographical convention); was kand+N:kannD1, now kand+N:kanD1 2018-08-27T13:48:50+00:00
< [Template merge - langs/und] Added support for local modifications of the hyphenator build via a tmp file. Simplified tmp file handling in the src/ dir. 2018-08-27T11:59:27+00:00
< [Template merge - langs/und] Added dir structure and Autotools data to prepare for adding hyphenation testing. 2018-08-27T10:55:19+00:00
< [Template merge - langs/und] Downcasing of derived proper nouns was only applied on the input side, not the hyphenated side. This caused such words to be case-shifted: arabialaččat -> A^ra^bi^a^lač^čat. This is now fixed. 2018-08-27T07:48:39+00:00
< [Template merge - langs/und] Fixed hyphenation bug where the lexicon-based hyphenator missed hyphenation points, mainly in proper nouns, due to flag diacritics. Fixed by telling the fst compiler to treat flags as epsilons. Now the lexicon-based hyphenator is beating the plain rule-based one in most (all?) cases where there are differences. Must be tested better, though. 2018-08-26T16:53:49+00:00
< [Template merge - langs/und] Added comment to guide placement of local build targets (to avoid future merge conflicts), and a comment reminder about other places to change filenames. 2018-08-22T06:46:22+00:00
< [Template merge - langs/und] Reorganised the source filenames to make it easy to override when needed. Should make it possible to solve the bug where src/syntax/disambiguator.cg3 overrides the same file in tools/grammarcheckers/. 2018-08-20T16:39:29+00:00
< [Template merge - langs/und] Refactored repeating patterns of code with variables, fixes upload link after XServe crash last winter. 2018-08-20T09:55:48+00:00
< cleanup net added to xfst commands for Xerox build rules in makefile; attempted to make x_declinations.lexc more readable by re-grouping cont. lexicons 2018-06-15T14:53:27+00:00
< xerox lookup and hfst-lookup now give the same result 2018-06-12T08:58:19+00:00
< [Template merge - langs/und] Corrected and improved the compilation of the analysers including the URL analysis. This should fix the problem with compiling SMA and other languages, and should in general reduce both compilation time and analyser size. The basic change was to union in the URL analysis as the last step in building the analysers, instead of early - the early injection led to fst blowup during minimisation. Now no blowup appears to take place. 2018-06-05T12:07:51+00:00
< acronyms - up to 5 letter allcaps strings, can be inflected according to their pronunciation; likely overgenerating 2018-05-28T17:06:55+00:00
< acronyms.lexc added to svn 2018-05-28T06:58:35+00:00
< first attempt to handle acronyms 2018-05-25T17:58:25+00:00
< now the max number of hyphens that separate words is set to 2; 2018-05-21T16:16:45+00:00
< [Template merge - langs/und] Added the special target .NOTPARALLEL to the hfst speller make file, to work around a make bug that caused a prerequisite to not be built when invoking make with the -j option. Also added some comments. 2018-05-18T13:00:26+00:00
< [Template merge - langs/und] Updated command in comments to use the correct option. 2018-05-18T06:32:57+00:00
< [Template merge - langs/und] Reverted the more robust semantic tag reordering, it was just too slow. Now we are back to a less robust and more fragile system (including bugs), but with faster compilation. Ultimately we will abandon _semantic_ tag reordering altogether, and instead rewrite the lexc code to always place the semantic tags where they should be. 2018-05-16T08:54:41+00:00
< [Template merge - langs/und] Corrected automake (and make?) syntax error that broke compilation. 2018-05-15T11:08:55+00:00
< [Template merge - langs/und] Simplified semantic tag filtering regex construction. 2018-05-15T07:27:42+00:00
< More things to ignore. 2018-05-14T09:52:56+00:00
< [Template merge - langs/und] Too eager in the previous commit to get rid of semantic tag processing: removed the filter to zero out semantic tags completely, which broke compilation of a number of fst's where semantic tags are not wanted. 2018-05-09T08:09:24+00:00
< [Template merge - langs/und] Corrected bugs in reordering semantic tags by doing the reordering in two steps: 1) insert the tag in the new and correct position, and 2) remove the tag in the wrong position. There will probably be things to iron out, but initial tests are fine. This should also make the whole semantic tag reordering a bit faster to compile and apply, as the generated regexes are smaller and simpler. 2018-05-08T17:53:29+00:00
< elamine-olemine corrected 2018-05-04T13:11:40+00:00
< [Template merge - langs/und] Now that the downcasing script works in all cases, remove all the special processing, and get rid of spurious rebuilds of the dependent fst's. Another time-saver:-) 2018-05-02T09:44:57+00:00
< [Template merge - langs/und] Changed the downcasing script to work also with hyperminimised hfst-fst's. Now the downcasing script works both with Xerox, Hfst and Foma, and both with standard and hyperminimised hfst-fst's. Finally! 2018-05-02T09:10:41+00:00
< Applying Tino’s Unicode fix to all other perl scripts in the src/scripts/ dir. 2018-04-26T19:48:05+00:00
< [Template merge - langs/und] Added support for filters for grammatical and derivation tags, sorted the generated filter list. 2018-04-23T14:45:28+00:00
< [Template merge - langs/und] Bugfix: OLang/xxx tags were removed, not made optional, in generators. 2018-04-20T08:20:52+00:00
< [Template merge - langs/und] Do not delete disambiguator.cg3 and grammarchecker.cg3 when cleaning. 2018-04-19T07:27:50+00:00
< [Template merge - langs/und] Whether to let the orig-lang tags be visible in the disambiguating analyser or not is dependent on the language and the needs of each language community. Moving the removal of those tags from the general processing to the language specific processing. Step 2: removing it from the general processing. 2018-04-18T13:09:08+00:00
< tagset modification for apertium now looks over flag diacritics (kind of ...); an unfinished (and not used) filter for compounding 2018-04-17T11:06:59+00:00
< numbers: viieeurone 2018-04-12T12:26:10+00:00
< numbers 2018-04-12T09:40:32+00:00
< numbers and final_components automatically re-made from stems/*.lexc 2018-04-09T13:33:19+00:00
< oops, forgot the numbers.lexc file 2018-04-09T08:51:55+00:00
< now numbers are analyzed, although not as meticulously as a speller would require (or style checker? or grammar checker?); and ʲ is used as a palatalisation symbol in the lexicon, even though ʲ is currently not used for anything in the analysis/generation process 2018-04-09T08:40:29+00:00
< compounding 2018-04-05T12:21:29+00:00
< compounding 2018-04-04T09:58:35+00:00
< compounding 2018-03-29T14:14:37+00:00
< compounding 2018-03-23T18:38:43+00:00
< compounding 2018-03-21T19:13:58+00:00
< compounding 2018-03-20T16:11:42+00:00
< compounding and bug fix 2018-03-19T18:28:43+00:00
< compounding 2018-03-16T18:29:03+00:00
< compounding 2018-03-16T15:11:20+00:00
< compounding and some bug fixes 2018-03-14T18:51:53+00:00
< compounding and small bugfixes 2018-03-13T13:55:28+00:00
< dueti fixed in twolc, some more short words allowed to act as 1st part of compounds 2018-03-09T19:34:28+00:00
< [Template merge - langs/und] Added the -p option to the yaml testing command, to remove all passing tests. This should make it easier to spot the actual FAILs. 2018-03-08T12:46:47+00:00
< [Template merge - langs/und] Corrected path to zhfst file. Also changed the return code when the zhfst file is not found, so that it will be reported as a FAIL. Since this test is only run when configured for building spellers, a missing zhfst file should be fatal. Also changed variable name to avoid confusion with the shell variable. 2018-03-08T10:55:45+00:00
< [Template merge - langs/und] Added phony target forwarding 'make test' to 'make check'. Required to make 'make check' work on some build systems. 2018-03-08T10:36:21+00:00
< comments updated 2018-03-06T15:35:23+00:00
< [Template merge - langs/und] Added a separate disambiguation file for the spell checker output, and a spell-checker-only pipeline (well, still tokenisation and disambiguation, but no proper grammar checking). 2018-03-05T15:20:58+00:00
< [Template merge - langs/und] Corrected Foma compilation for phonology rules. 2018-03-05T10:15:59+00:00
< some small bugfixes 2018-02-23T15:35:12+00:00
< Added svnignore pattern for sigma.txt. 2018-02-21T10:01:06+00:00
< [Template merge - langs/und] Made symbol alignment default - I can see no cases where we don't want it, but it is still possible to disable it if such a need pops up. Also improved the error message when trying to build a twolc language using Foma. 2018-02-09T08:00:22+00:00
< [Template merge - langs/und] Added INFO text about switching to Hfst as a fallback when Xerox tools are not found. Also added test and error message when using Foma on a language with a twolc file. 2018-02-09T07:11:09+00:00
< Two more files to ignore. 2018-02-06T09:34:41+00:00
< [Template merge - langs/und] Fixed URL analysis in MT. All URL's and email addresses are now tagged +URL. Although the url analyser itself is small, the resulting analyser quadrupled in size (in sme). 2018-02-05T19:44:27+00:00
< [Template merge - langs/und] Removed filters for removing morphological borders - they destroy the asymmetry of the fst's, and make yaml testing more complicated. 2018-02-02T08:10:41+00:00
< [Template merge - langs/und] Added support for Area variants of the grammar checker generator. Should fix nightly build error for SMJ. 2018-02-01T19:26:57+00:00
< [Template merge - langs/und] Added missing Foma support for dictionary fst's. 2018-02-01T18:13:16+00:00
< [Template merge - langs/und] Fixed the last bunch of path errors. Now all yaml tests are back to normal. 2018-02-01T17:48:24+00:00
< [Template merge - langs/und] Cleanup: commented in outcommented test loop, removed exit statement used during development, fixed path for two test scripts. 2018-02-01T15:57:03+00:00
< [Template merge - langs/und] The last set of test runners for yaml tests changed to the new system. 2018-02-01T15:11:08+00:00
< [Template merge - langs/und] Three more yaml test runners done, still a few more to go before yaml testing is back in shape. 2018-02-01T13:56:58+00:00
< [Template merge - langs/und] Changed the last yaml testing scripts in the template to follow the new and improved system. No need for autoconf processing anymore. 2018-02-01T10:53:02+00:00
< [Template merge - langs/und] Major rework of the yaml testing framework, to be able to properly support fst type specific yaml testing (ie test only xfst or hfst transducers, or everything but xfst transducers (=foma & hfst)). This change triggered a number of other changes. The user-facing shell scripts are greatly simplified by this change. 2018-02-01T09:56:08+00:00
< Updated svn ignores. 2018-01-31T12:06:31+00:00
< [Template merge - langs/und] Corrected AM errors in the previous merge. Now the build is working again. 2018-01-31T11:40:41+00:00
< [Template merge - langs/und] Added support for grammar checker generators for alternative orthographies and writing systems. Should fix nightly build issue in CRK. 2018-01-31T11:11:48+00:00
< ehttartulik and similar compounds with adjectives derived from proper names 2018-01-26T18:50:24+00:00
< some vowel plural forms should not be created/analysed 2018-01-25T10:50:52+00:00
< [Template merge - langs/und] Added support for a grammar checker specific generator. Should fix various issues re generation of suggestions. 2018-01-25T09:24:02+00:00
< vowel plural less restricted now, and some twol rules fixed 2018-01-24T18:53:23+00:00
< [Template merge - langs/und] Added test for the presence of divvun-validate-suggest, which is now required to build grammar checkers. Now configure will error out instead of make. 2018-01-23T07:19:58+00:00
< compounding: pronouns, adverbs 2018-01-22T18:23:19+00:00
< [Template merge - langs/und] Add note to the errors.xml file that it is generated, and from which file it is generated, to avoid people editing the wrong file. 2018-01-22T12:33:52+00:00
< [Template merge - langs/und] Error messages are now copied from a source file to a build file, after being validated. This allows support for VPATH builds and retains the integrity of the zcheck file. At the same time also replaced hard coded language names with automake variable expansion in the pipespec.xml.in file. 2018-01-22T10:24:51+00:00
< kapitalist etc derivation 2018-01-21T20:22:03+00:00
< võitu fixed 2018-01-21T17:36:48+00:00
< compounding 2018-01-20T18:48:29+00:00
< test files for compounding 2018-01-20T16:54:55+00:00
< compounding a little better now 2018-01-19T19:41:14+00:00
< allcaps script now deployed in analysers 2018-01-18T16:03:01+00:00
< [Template merge - langs/und] Fixed bug in building dictionary analysers for alternative orthographies, introduced in the changes yesterday. 2018-01-18T07:04:50+00:00
< [Template merge - langs/und] Added option to specify language variant, to allow testing spellers for alternative writing systems, alternative orthographies, different countries etc. 2018-01-18T06:26:43+00:00
< [Template merge - langs/und] Added support for area / country specific fst's for the specialised dict and oahpa build files. At the same time reorganised the build code so that targets with two variables now consistently use the fst type / suffix as the pattern, and the writing system/alt orth/area/etc as the function parameter. This should make the build system more robust by reducing the risk for accidental pattern similarity. 2018-01-17T11:30:00+00:00
< [Template merge - langs/und] Added support for building area/country specific spellers. The target language for now is SMJ, but the feature is of course language independent and useful in a number of other circumstances. 2018-01-16T19:31:57+00:00
< [Template merge - langs/und] Changed dialect fst filenames to follow existing patterns used for Oahpa fst's. 2018-01-16T14:38:00+00:00
< [Template merge - langs/und] Added support for building dialect fst's. It is disabled by default, but can be enabled with a configure option. Also changed the disamb analyser to keep the dialect tags. Only normative fst's are filtered against dialect tags. 2018-01-16T12:09:04+00:00
< uppercase for hyphenated words 2018-01-16T07:53:43+00:00
< [Template merge - langs/und] Added initial support for building Area-specific analysers and generators (norm only). Also restored Area tags in the disamb and grammar checker analysers. Fixed missing support for Foma transducers in the alternative writing system support. 2018-01-16T07:33:21+00:00
< [Template merge - langs/und] Grammar checker .zcheck file should go into datadir, not libdir. 2018-01-15T11:33:44+00:00
< [Template merge - langs/und] Now using speller version info from configure.ac, not version.txt, which is removed. New giella-core required. 2018-01-15T09:56:53+00:00
< [Template merge - langs/und] Fixed a bug in fst format handling for the grammar checker - conflicting formats caused a segfault. Now using openfst-tropical for all fst's being processed in the grammarcheckers/ dir (presently only the speller acceptor analyser). 2018-01-15T08:44:53+00:00
< minor changes in compounding 2018-01-12T19:47:37+00:00
< [Template merge - langs/und] Fixed OLang tag extraction and filter generation. 2018-01-12T13:13:04+00:00
< [Template merge - langs/und] Added weights to compounds in the language-independent build steps (languages without compounds will go through the same step, but will not be changed). Applied only to analysers. Also added spellrelax to the language-independent build of the analysers = it is always applied. 2018-01-12T11:53:15+00:00
< [Template merge - langs/und] Improved the previous fix: make sure it does not crash when the target file does not exist, and use the same test on all autogenerated tag lists. This should save a few more seconds of build time. 2018-01-12T08:28:10+00:00
< minor improvements in compounding 2018-01-11T18:47:05+00:00
< [Template merge - langs/und] Fixed bug #2355 so that the filters for semantic tags will only be rebuilt when there are real changes to the semantic tags. 2018-01-11T17:12:17+00:00
< [Template merge - langs/und] Corrected a € vs cut incompatibility on Linux, cf bug report #2457. 2018-01-11T08:40:19+00:00
< compounding hopefully better now (work on Helsinki-Tallinn boat) 2018-01-10T14:50:37+00:00
< [Template merge - langs/und] Updated the pipespec.xml file to comply with the newest version of the grammar checker code, where each argument type is explicitly specified. Makes for a more robust pipeline. 2018-01-10T11:59:03+00:00
< [Template merge - langs/und] Corrected fileref in m4, added correct autoconf path to errors.xml. 2018-01-08T14:42:23+00:00
< [Template merge - langs/und] Renamed pipespec.xml to *.in, to allow autoconf processing. This makes it possible to use modes when building using VPATHS/out-of-source builds. 2018-01-08T14:20:23+00:00
< here, 'ei' got Adv analysis, thus quoting it instead. 2018-01-08T10:33:52+00:00
< [Template merge - langs/und] Hard-coded filename in fallback target - that was the only way to work around a loop in make on some systems. 2018-01-08T09:42:06+00:00
< [Template merge - langs/und] Renamed src/syntax/disambiguation.cg3 to src/syntax/disambiguator.cg3, to keep the file naming consistent (actor noun if possible), and remove discrepancy between the regular disambiguator and the grammar checker disambiguator that caused makefile troubles. 2018-01-08T05:50:37+00:00
< Removed tags not found in Estonian (Acc, VGen, ImprtII) 2018-01-07T08:26:36+00:00
< Using the est dis file (taken from langs/est), not the sma one. 2018-01-07T08:24:53+00:00
< est, not sme, tags. 2018-01-06T20:39:03+00:00
< Corrected BARRIER ... LINK order, and added final ; 2018-01-06T20:26:18+00:00
< docu 2018-01-06T17:30:48+00:00
< Test corpora for courses, all from biggies/ and may just be deleted if no longer needed here. 2018-01-06T16:33:25+00:00
< one grc rule ok. 2018-01-06T15:11:33+00:00
< da-infinitive more relaxed in compounds 2018-01-02T13:42:08+00:00
< compounding a little more strict now 2017-12-19T19:01:32+00:00
< added some words to prefixes.lexc and reclassified a few adverbs to get more words recognised as compounds 2017-12-14T10:28:01+00:00
< added some comments in lexc files, relaxed orthography (minus-sign now allowed at the end of a word), and re-classified some adverbs so that they can participate in compounding 2017-12-13T18:29:13+00:00
< [Template merge - langs/und] Heavy rewrite of the analysis regression check tool, to support testing the grammar checker pipeline. 2017-12-12T11:44:25+00:00
< [Template merge - langs/und] Do not remove semantic tags, dialect tags and other tags useful for disambiguation or suggestion generation. The grammar checker speller needs these, and they will anyway disappear when we project the final fst. 2017-12-11T13:03:01+00:00
< Updated svn ignores. 2017-12-11T12:51:45+00:00
< [Template merge - langs/und] Proper verbosity specification in a few more instances, and added weight pushing for the grammar checker speller now (how could I have missed that?). 2017-12-01T12:23:37+00:00
< [Template merge - langs/und] Fixed a bug in piped hfst-xfst commands: in three cases the -p option was missing, causing strange misbehavior in hfst-xfst on some systems. 2017-12-01T11:58:58+00:00
< [Template merge - langs/und] Further configure.ac cleanup: moved some variable definitions to other m4 files, moved the language definition on top, deprecated GTLANG* variables for GLANG* variants (ie Giella instead of GiellaTechno). Updated copyright year. 2017-12-01T10:10:31+00:00
< [Template merge - langs/und] Moved all default AC_CONFIG_FILES into a separate function in a separate m4 file, to clean up configure.ac. Some other cleanup of configure.ac. 2017-12-01T09:18:28+00:00
< [Template merge - langs/und] Defined variable for separate speller release version string. 2017-12-01T08:18:29+00:00
< [Template merge - langs/und] Changed package name and version to more clearly be a real name and version number. 2017-12-01T08:09:02+00:00
< [Template merge - langs/und] Updated comment in preparation for other changes. 2017-12-01T07:47:21+00:00
< [Template merge - langs/und] Added support for analysing whitespace and thus make it possible to tag whitespace errors (double spaces, extra spaces, etc), and also to more reliably detect sentence and paragraph borders by using whitespace as a delimiter. 2017-11-30T14:07:36+00:00
< [Template merge - langs/und] Using absolute dir refs to make it possible to call the shell scripts from everywhere. 2017-11-30T12:31:01+00:00
< [Template merge - langs/und] Fixed a bug: forgot to remove a line. 2017-11-29T13:32:44+00:00
< [Template merge - langs/und] Rewrote the speller test scripts in devtools/ to be VPATH safe and rely on autotools for paths etc, so that the scripts will work also when only checking out single languages. 2017-11-29T12:09:07+00:00
< [Template merge - langs/und] Added support for specifying language-specific files to be included in the grammar checker archive file. 2017-11-15T13:17:02+00:00
< [Template merge - langs/und] Updated grammar checker files and build rules. 2017-11-13T09:38:05+00:00
< compounding is more restrictive now, and it fails to recognize many legitimate compounds also... 2017-11-09T17:21:45+00:00
< [Template merge - langs/und] Added hfst-push-weights to move transducer weights to the beginning of the strings, to enable proper optimisations of speller lookup in hfst-ospell. Stripped out most lang-specific stuff from grammar checker cg file, and added simple example rules + some explanations. Use gramcheck tokeniser in pre-pipe. 2017-11-07T15:44:23+00:00
< compounding must forget capitalisation in-between; gi/ki do not change the consonant cluster in front of them 2017-10-27T13:27:11+00:00
< some improvements in compounding 2017-10-26T13:54:53+00:00
< minor update wrt language norm 2017-10-25T14:21:20+00:00
< [Template merge - langs/und] Added default rule for speller suggestions, to make the suggestions survive cg treatment. 2017-10-24T17:26:25+00:00
< [Template merge - langs/und] Added spell checking component to the grammar checker pipeline. Now every planned component is working as it should. The spell checking requires first that one builds the latest hfst-ospell code, and then the newest grammar checker code for this to work. 2017-10-24T12:48:15+00:00
< docu 2017-10-16T15:33:39+00:00
< [Template merge - langs/und] Increased weights for fall-back rule-based hyphenation. Added .hfst suffix to rule fst for consistency. 2017-10-13T07:35:48+00:00
< added some forms for ohtlik-type words; these get a Use/NotNorm analysis 2017-10-12T17:32:28+00:00
< [Template merge - langs/und] Replaced the huge sme grammar checker with the more moderate smn grammar checker cg file, as the template file for future grammar checkers. 2017-10-12T08:30:44+00:00
< [Template merge - langs/und] Added note (readme file) about NOT touching the local am-shared dir, to avoid future unintended changes. 2017-10-12T06:32:03+00:00
< some corrections for words that have parallel forms 2017-10-11T18:03:06+00:00
< [Template merge - langs/und] Added the missing files for a working grammar checker. Fixed grammar checker build rules to not be dependent upon enabling tokenisers. 2017-10-11T15:15:25+00:00
< reorganised the way transducers are combined (and had to re-implement derived word downcasing), which resulted in cutting the size of the final transducer to half what was before 2017-10-11T12:52:55+00:00
< Updated svn ignores for tokenisers and grammar checkers + subdirs. 2017-10-11T11:35:46+00:00
< Updated svn ignores for tokenisers and grammar checkers + subdirs. 2017-10-11T11:03:08+00:00
< [Template merge - langs/und] Added conversion of the analysis tags from the grammar checker speller into CG format. 2017-10-11T05:16:05+00:00
< [Template merge - langs/und] One misplaced variable caused the grammar checker speller to be built independent of the configuration. This caused a build fail for everyone. Solves bug #2437. Also added $(srcdir) in front of root.lexc, to ensure that the file reference resolves correctly in local build targets. 2017-10-10T09:23:22+00:00
< Made the local build code VPATH compatible, and added silencer macro. Also added a local clean target. Now the code builds for me. 2017-10-10T08:57:50+00:00
< [Template merge - langs/und] Moved the target clean-local to the local Makefile, to make it possible to enhance the clean target with locally generated files. 2017-10-10T08:53:25+00:00
< changed the way transducers are combined in makefile (in order to get less parallel paths) 2017-10-09T07:49:15+00:00
< some small improvements in compound words 2017-10-06T18:02:10+00:00
< fixed some bugs in files that I considered finished long ago (/affixes/*, est-phon.twolc, stems/*); the bugs showed up in test runs on a large corpus, and the word(form)s are not part of yaml-tests at the moment 2017-10-04T13:29:56+00:00
< [Template merge - langs/und] Corrections to the grammar checker speller build: we now build a working zhfst file that can be used as part of the development cycle. Also additions to silent builds. 2017-10-04T04:06:57+00:00
< Fixed a jspwiki syntax error in the generated documentation. 2017-10-04T03:41:41+00:00
< script 2017-10-03T15:17:44+00:00
< docu 2017-10-03T15:17:07+00:00
< [Template merge - langs/und] Major update to the grammar checker template. It still does not work completely as it should, so hold your horses. Update content: ensured that all files needed are copied to the grammar checker build dir, removed option to name files (=irrelevant bloat), now builds an almost proper zip file, and ensured that tokenisers are built before grammarcheckers. Also made it so that when grammar checkers are enabled, spellers are automatically enabled too, as they will be included as part of the grammar checker pipeline. 2017-10-03T06:50:59+00:00
< xerox version does not generate +Par forms 2017-09-29T15:08:28+00:00
< allowing many - or / signs inside a word 2017-09-29T14:49:32+00:00
< prefixes added 2017-09-29T07:40:00+00:00
< first version of compounding with flag diacritics 2017-09-28T09:45:45+00:00
< [Template merge - langs/und] Changed the file exists test for the lemma generation testing so that it will work even in cases where multiple source files are used as input. 2017-09-20T11:48:42+00:00
< [Template merge - langs/und] Made cg3 file compilation more general. 2017-09-19T14:13:29+00:00
< [Template merge - langs/und] Moved the code to build the apertium relabel script in the apertium directory, so that we can use the actual giella-tagged fst for MT as the tag source. This should fix all issues of missing tags in the relabel script. 2017-09-15T14:03:38+00:00
< [Template merge - langs/und] GLE requires regex compilation possibilities in src/, no reason why it can't be. 2017-09-14T11:22:55+00:00
< [Template merge - langs/und] Fixed a shortcoming in the build infra uncovered by gle: no explicit support for language-specific build rules that will not end up in lexicon.?fst. 2017-09-14T05:44:15+00:00
< now the rule context never crosses a word boundary (in compounds) 2017-09-12T07:25:36+00:00
< docu 2017-09-08T21:54:52+00:00
< modified 2-level rules to bypass a xerox twolc feature 2017-09-04T13:00:10+00:00
< added non-compounding components (noncomp.lexc and nonfcomp.lexc) to svn 2017-08-29T06:50:28+00:00
< some errors in lexicons corrected; de, re, taas introduced as verb prefixes 2017-08-28T16:41:43+00:00
< [Template merge - langs/und] Moved tag extraction to a separate am-include file, so that it can be shared between different dirs. Moved generation of regex for turning tags into CG friendly format from src/filters/ to tools/tokenisers/filters/. 2017-08-28T13:12:39+00:00
< Updating svn ignores. 2017-08-25T10:16:46+00:00
< [Template merge - langs/und] After a couple of bug fixes in giella-core, require the new version. 2017-08-25T09:58:41+00:00
< [Template merge - langs/und] Initial support for building tokenisers where the morphological analysis tags are given in CG format directly instead of having to be postprocessed by hfst-tokenise before being printed. The idea is to make the hfst-tokenise code more general, and move everything that is particular to one language or setup into the fst instead of having it hardcoded in the C++ code. There are some issues that must be resolved, but fst-wise the code works. 2017-08-24T11:33:54+00:00
< [Template merge - langs/und] Added support for building a regex that transforms all tags from the format "+Adv" to " Adv" (including space). The idea is to make the tags readily consumable by CG. Both prefix and suffix tags are converted. Newest giella-core required. 2017-08-24T10:02:20+00:00
< [Template merge - langs/und] Part two of renaming the preprocess dir to tokenisers. Now all refs to it are updated. 2017-08-24T06:44:58+00:00
< [Template merge - langs/und] Renamed the preprocess dir to tokenisers, to better describe the content of it. 2017-08-24T06:10:51+00:00
< aed as an exception 2017-08-23T14:58:32+00:00
< trivial change in 2-level rules 2017-08-23T13:22:51+00:00
< small simplifications in 2-level rules 2017-08-22T13:42:39+00:00
< small changes in 2-level rules to make them look more logical 2017-08-21T14:37:02+00:00
< some irregular stem illatives added 2017-08-17T07:46:54+00:00
< [Template merge - langs/und] Added support for diffing and merging on Linux. As part of that added checking for diff tools in m4/giella-macros.m4, and added more tests against failures. Also added test for cg-mwesplit, and increased the required vislcg3 version to the 1.0 release. 2017-08-16T10:34:09+00:00
< [Template merge - langs/und] More robust test for the existence of the various vislcg3 files. 2017-08-15T12:06:28+00:00
< [Template merge - langs/und] Added more robust option checking, and a test for the existence of the specified corpus file. Also added some comments. 2017-08-15T07:08:43+00:00
< [Template merge - langs/und] Actually open the other diff views. And force-add to svn - we don't want error messages in this context. 2017-08-14T14:39:00+00:00
< [Template merge - langs/und] Corrected glaring variable copy&paste bug. Thanks to Trond for spotting it! 2017-08-14T12:34:53+00:00
< anonymising 2017-07-21T16:20:23+00:00
< some errors corrected 2017-07-19T16:59:53+00:00
< [Template merge - langs/und] Removed from the default build rules the automatic removal of +Comp tags in adverbs. That is definitely not a behavior we want universally. 2017-07-02T00:25:29+00:00
< [Template merge - langs/und] Fixed a bug that caused the check_analysis_regressions.sh script to fail if you hadn't put giella-core/scripts/ in your path - which is not automatically done when you just check out giella-core and your language of interest. 2017-06-30T00:31:02+00:00
< [Template merge - langs/und] Changed command to extract the specified fst name, the old version was not reliable. 2017-06-29T01:06:15+00:00
< Updating svn ignores. 2017-06-28T23:38:04+00:00
< compounding more restricted 2017-06-28T19:03:24+00:00
< Updated svn ignores. 2017-06-28T17:12:58+00:00
< verb derivations in compounds slightly better 2017-06-27T18:55:59+00:00
< punctuation.lexc moved from stems to morphology; now some shortened stems like astraal-, vaatamis- are ok 2017-06-26T18:03:25+00:00
< first step into compounding: overgenerating with rudimentary filtering 2017-06-15T19:00:59+00:00
< yaml tests for pron and num where both parts inflect; need, nood now part of pl paradigm of see, too 2017-06-12T17:45:31+00:00
< paired words again 2017-06-06T15:00:13+00:00
< paired words 2017-06-06T13:53:02+00:00
< incorrect compound numerals removed 2017-05-31T12:31:04+00:00
< compound numerals (viissada) with two nets; their wrong paradigms still remain in numerals.lexc 2017-05-30T12:18:17+00:00
< paired words containing a hyphen 2017-05-26T16:09:59+00:00
< [Template merge - langs/und] Due to wrong AM conditional, it still built a few mobile speller fst's. Now it should be quiet. 2017-05-23T09:19:48+00:00
< [Template merge - langs/und] Really do disable mobile spellers by default... 2017-05-23T08:51:24+00:00
< [Template merge - langs/und] Made mobile spellers not build by default, even when enabling spellers. The mobile spellers must now be explicitly enabled. 2017-05-23T08:32:45+00:00
< some yaml tests added 2017-05-18T17:19:55+00:00
< emb-kumb, kihin-kahin ok 2017-05-18T16:19:02+00:00
< some build bugs fixed 2017-05-18T09:31:05+00:00
< second try for emb-kumb et al; makefile.am still not ok 2017-05-17T17:59:11+00:00
< [Template merge - langs/und] Removed Ins() around Unknown. This triggered a bug(?) in hfst-tokenise, that caused wordforms not to be output. Speed and memory consumption should not be noticeably affected. 2017-05-16T16:59:19+00:00
< first version that handles paired words (emb-kumb, kihin-kahin) that inflect both (emma-kumma, kihina-kahina); it is done by concatenating nets and applying filters 2017-05-12T18:04:34+00:00
< slightly tightened control of derivational suffix sequences 2017-05-05T07:58:54+00:00
< [Template merge - langs/und] Improved pmatch scripts - unification by reference instead of full fst unification. Reduces file size by ≈2/3, and runtime memory consumption by 50%. 2017-05-04T10:18:16+00:00
< slightly more constraining derivation filter 2017-05-02T12:28:52+00:00
< helveetslane OK 2017-05-02T08:17:25+00:00
< firenzelane (lower cased) is now accepted 2017-04-28T18:11:59+00:00
< [Template merge - langs/und] Now that there is a new version of Hfst out, require it. Should resolve issues with compiling the url.lexc file. 2017-04-18T15:19:10+00:00
< derivations improved, and added proper nouns to deriv. bases, but downcasing them not done yet. I.e. Pariislane is accepted... 2017-04-07T18:06:02+00:00
< removed the gi-particle possibility from some adverbs that end with gi/ki already 2017-04-05T16:11:40+00:00
< [Template merge - langs/und] Further development of the analysis regression check: added support for diff views of all diff types, and now you can specify which diff view you want to see (and you must specify at least one). You can also override the default corpus, and specify a corpus of your own with the -c/--corpus option. Also corrected the initial description of the script in the help text, and added a diff view comparing the old pipeline using Xerox with the new pipeline using hfst-tokenise. This will help in finding unwanted differences between the two. 2017-03-17T12:44:23+00:00
< [Template merge - langs/und] Further improvements to the analysis regression check: only do function and dependency analysis if the required cg3 files exist. Also clarified the -d option and silenced the Xerox lookup tool. 2017-03-16T14:33:20+00:00
< [Template merge - langs/und] Improved analysis regression check script: added a short help text, and added an option to ask for a diff between old-style (preprocess+lookup+lookup2cg) and new-style (hfst-tokenise+mwe-disamb+cg-mwesplit) morphological analysis. Intended to be used to find weak (and strong!) spots in the new-style morphological analysis. 2017-03-16T12:17:12+00:00
< [Template merge - langs/und] Added the first version of a $LANG/devtools/ script that will process a corpus with the available tools, and compare the result against the previous version in the svn repository. The idea is to be able to easily spot regressions in analyses due to changes in the lexicons or CG rules. There are a number of rough edges, but it works. 2017-03-16T10:03:42+00:00
< [Template merge - langs/und] Only remove generated lemma files if the lemma generation tests succeeds. 2017-03-14T14:05:45+00:00
< undo jaska's accidental last commit 2017-03-13T13:00:43+00:00
< Work with CnsInZero, VowInZero to improve xfst. 2017-03-13T07:53:18+00:00
< a couple pronoun forms corrected 2017-03-10T20:42:09+00:00
< tale (pro talle) corrected; adjectives that form comparative degree irregularly (soe, pikk etc) now not allowed to form it via derivation; derived comp and superl adjectives are now inflected in fin->est also (thanks to tag conversion) 2017-03-08T18:48:37+00:00
< [Template merge - langs/und] Only delete generated dic and tex files if one really wants to start anew. Do not delete the version.txt file, only the generated wordlist file. 2017-03-07T18:45:30+00:00
< [Template merge - langs/und] Add the url parser also to the grammar checker tokeniser. 2017-03-07T15:00:06+00:00
< [Template merge - langs/und] Make the url.hfst a dependent of the hfst tokenising analyser. Improved the tokeniser based on recent changes in sme. 2017-03-06T17:06:10+00:00
< [Template merge - langs/und] Removed automatic inclusion of the url parsing fst. The union with the regular fst blew up the total size, in some cases more than 10x! The preferred way of adding it is to add it in the last steps of the *.tmp.fst > *.fst processing by loading it onto the stack (and inverting it for hfst) before saving the fst stack, thus creating a transducer file with two fst's. Applying the input to both of them will in effect union them, giving the output we want without blowing up the size of the fst file. 2017-03-03T14:13:08+00:00
< [Template merge - langs/und] Added support for compiling a lexc file for parsing URLs as such, giving them a separate tag. Only added to the descriptive analysers for now. Requires an updated version of giella-shared, due to the new file needed for the new functionality. 2017-03-02T14:04:15+00:00
< [Template merge - langs/und] Corrects an inconsistency in the order of tag changing processing, where generators and analysers got their tags changed in different order, which caused different tags in some cases. Fixes bug #2264. Thanks to Heiki-Jaan Kaalep for the new and corrected code. 2017-03-02T06:35:58+00:00
< deleted (for now) punctuation tag conversion in a filter so that now punctuation passes through Apertium without errors 2017-03-01T14:47:38+00:00
< Updated svn ignores. 2017-03-01T11:26:04+00:00
< twolc documentation (not sure it is worth doing...) and a change in apertium lexicon building script: changed the order in which modify-tags.regex and apertium.postproc.relabel are applied in analyser. as a result, modify-tags behaves in a more symmetrical way. 2017-02-28T19:29:05+00:00
< The yaml tests are passing for Heiki-Jaan and me (at least), so remove the test runner from the XFAIL list to make regressions easier to detect. 2017-02-28T07:13:21+00:00
< adding modify-tags.regex 2017-02-27T11:55:31+00:00
< script for testing 2017-02-27T09:37:27+00:00
< [Template merge - langs/und] Updated Python feedback to correctly state that Python 3.5 is required. 2017-02-27T09:21:02+00:00
< [Template merge - langs/und] Fixed issue with link generation thanks to Heiki-Jaan Kaalep. 2017-02-22T08:58:22+00:00
< generated documentation: root-morphology 2017-02-21T20:28:33+00:00
< Correcting the link. 2017-02-18T17:18:26+00:00
< derivations (except -lane); > for morpheme boundary instead of + 2017-02-17T20:28:42+00:00
< [Template merge - langs/und] Increased required version of Python3, due to the updated speller test bench. 2017-02-15T07:57:46+00:00
< [Template merge - langs/und] New version of the speller test bench, now with sortable table columns, and optional timing of the suggestions for every input word (hfst-ospell-office only). Not finished, but working quite well. It is also possible now to specify the number of suggestions returned by hfst-ospell-office. 2017-02-14T09:37:59+00:00
< Der/lik 2017-02-13T20:50:17+00:00
< derivations still not filtered enough... 2017-02-09T20:21:18+00:00
< derivations do not generate multiple paths any more, but the filters to weed out incorrect ones are still not working properly 2017-02-07T18:54:21+00:00
< [Template merge - langs/und] Increased required version of giella-core due to bug fix in the core. 2017-02-03T11:46:31+00:00
< [Template merge - langs/und] Increased required version of giella-core due to changes in speller building. 2017-02-03T09:45:31+00:00
< derivation filter that disallows some derivations, based on word class, works now 2017-02-02T17:21:03+00:00
< [Template merge - langs/und] One more attempt at fixing the giella-common package bug. 2017-02-02T08:46:50+00:00
< [Template merge - langs/und] Added final step in building pattern-based hyphenators: now also prepared for Hunspell-like OOo hyphenation. Requires new version of the giella-core. Also corrected bug in checking the version number of giella-common. 2017-02-01T11:03:20+00:00
< first version of derivation 2017-01-31T20:01:40+00:00
< [Template merge - langs/und] Tex pattern based hyphenation generation works. The output must be checked and tested, and the process may have to be rerun several times to get the desired hyphenation behavior. Removed outcommented build code from the old infra - the new build code is essentially just a reformulation of the old one. 2017-01-31T14:37:09+00:00
< [Template merge - langs/und] Added support for checking the version of the giella-common package (aka giella-shared/). Added two new regexes to the source file list for shared regexes. Updated the required version of Hfst - it has not been updated in ages. 2017-01-31T13:46:02+00:00
< [Template merge - langs/und] Further work on the pattern based hyphenators: added tra file template, which is used to 'translate' non-ASCII chars to ascii only for the pattern creation process. Initial build steps for the pattern build. 2017-01-31T12:15:05+00:00
< [Template merge - langs/und] Improved the fst-based hyphenator by removing irrelevant paths from the fst. Started work on the pattern-based hyphenator, based on code from the old infra. 2017-01-31T10:17:49+00:00
< [Template merge - langs/und] Finished first version of fst-based hyphenator: now includes plain rules as a fall-back solution (including for misspelled words), and Err-tagged forms get a high weight penalty. In general, this seems to give good hyphenation patterns if one picks the first (lowest-weight) one. 2017-01-30T13:47:32+00:00
< [Template merge - langs/und] First version of lexicon-based and fst-based hyphenation done. Works, but misses capitalised words, and does not give extra weights to Err-tagged word forms. Also no hyphenation of misspelled words yet. Hyphenation builds are off by default. 2017-01-30T12:08:14+00:00
< [Template merge - langs/und] Added template file for weighting tags when the fst is used as a hyphenator. 2017-01-30T10:28:43+00:00
< Updated svn ignores. 2017-01-30T09:54:28+00:00
< [Template merge - langs/und] Added check for cg-relabel when enabling apertium. Thanks to Flammie for identifying the issue. 2017-01-30T09:19:29+00:00
< [Template merge - langs/und] Added basic dir structure for building hyphenators. 2017-01-27T07:14:28+00:00
< Replaced gtcore with giella-core. 2017-01-25T10:39:33+00:00
< [Template merge - langs/und] Replaced gtcore with giella-core. 2017-01-25T09:38:25+00:00
< [Template merge - langs/und] Added test dir for hyphenators, to store data from the old infra. 2017-01-23T10:48:15+00:00
< [Template merge - langs/und] Added test dirs for listbased spellcheckers, if we ever get to that. 2017-01-23T09:03:14+00:00
< [Template merge - langs/und] Fixed a logic error in the handling of negated fst specifications in yaml tests (e.g. ~xfst) - the test didn't work, and the yaml file was run when not intended. 2017-01-18T00:26:16+00:00
< [Template merge - langs/und] Fixed regression introduced in the previous commit: one-sided tests were included when looking for test data, causing a subsequent python fail when no actual test data was found. Fixed by using a stricter file name pattern. 2017-01-17T15:51:18+00:00
< [Template merge - langs/und] Added option to specify in a yaml filename that it should only be tested against a specific technology or not, by specifying one of .foma, .hfst or .xfst before the suffix part (before [.gen].yaml), and prefixed with '~' if negated (i.e. .~xfst for NOT running it against Xerox). 2017-01-17T08:25:53+00:00
< Corrected the Links.jspwiki file - for some reason, Heiki-Jaan’s system generates a garbled Links file (cf commit 146727). 2017-01-16T21:23:26+00:00
< added a script skeleton for Heli to remove usage tags from certain wordforms 2017-01-16T19:23:27+00:00
< [Template merge - langs/und] Slightly more robust yaml testing code. 2017-01-16T15:14:00+00:00
< [Template merge - langs/und] Common starting point for both weighted and unweighted parts. 2017-01-16T15:06:42+00:00
< generated, although not as intended. 2017-01-14T09:58:59+00:00
< OK, found the error: one file had not been checked in. 2017-01-14T08:50:14+00:00
< [Template merge - langs/und] Added removal of Area tags also for specialised fst's. Fixes Korp issue reported by Ciprian. 2017-01-10T13:39:38+00:00
< no essential changes 2017-01-06T12:31:43+00:00
< minor changes 2017-01-05T11:26:45+00:00
< yet another version of verb morphology and phonology 2017-01-05T10:27:02+00:00
< some minor simplifications in 2-level rules 2017-01-04T15:53:52+00:00
< verbs finished 2016-12-29T19:14:14+00:00
< verbs a little bit better still... 2016-12-28T18:12:43+00:00
< verb affixes and cont. lexicons a little bit more generalised now 2016-12-27T20:54:55+00:00
< verb morphology partly re-written; re-making not finished yet 2016-12-22T13:02:30+00:00
< verb paradigms simplified 2016-12-13T21:06:52+00:00
< vette, kätte 2016-12-09T19:47:26+00:00
< [Template merge - langs/und] Ensure the fastest lookup method is used during hfst yaml generation tests. 2016-12-09T09:40:16+00:00
< declinations cont lexicons better organised; tütar+de now like kotkas+te thanks to d:t 2016-12-08T13:59:22+00:00
< declinations a little more compact 2016-12-07T18:48:01+00:00
< note: no links in documentation, with [...]. 2016-12-07T12:54:06+00:00
< lääs-lääne now better 2016-12-05T18:32:30+00:00
< Worked with Heiki-Jaan to solve an issue with generating documentation: if a document is listed as a target, but the corresponding lexc file has _no_ line containing documenting comments, AWK will fail, and the build will error out. 2016-12-02T09:11:57+00:00
< peenike like oluline now also formally 2016-11-30T16:10:32+00:00
< muuseum ill usage info reversed 2016-11-30T15:16:58+00:00
< üks, kaks a little better now 2016-11-29T19:59:22+00:00
< stem illative and sse illative for PERE type corrected 2016-11-28T21:20:20+00:00
< [Template merge - langs/und] Removed the bash hack to add a css processing instruction - it is done by the perl script writing the xml file. 2016-11-28T19:45:47+00:00
< vowel lowering is now handled in a more general way in twolc; and plural partitive for sai and other i-ending words of type PIIM works now 2016-11-24T18:22:28+00:00
< [Template merge - langs/und] Removed the removal for dialect and variant tags from the grammar checker analyser, the information can be useful when generating suggestions for corrections. 2016-11-23T14:45:29+00:00
< voodeid now ok, but other cases of vowel lowering need re-thinking to make est-phon look nice 2016-11-23T09:37:27+00:00
< regular_declinations a bit simplified now 2016-11-23T08:33:35+00:00
< [Template merge - langs/und] Removed repetition of the frequency weighted fst. The goal was to promote compounds where each part was already seen in the corpus, but it made the speller bigger and slower, and actually decreased suggestion quality slightly. — Also added code to do manual priority union, but it is buggy and outcommented for now. 2016-11-21T08:14:21+00:00
< updated docu. 2016-11-19T10:55:31+00:00
< [Template merge - langs/und] Added info about which file to look in to find a suitable frequency corpus cut-off location (=line number). 2016-11-18T09:26:11+00:00
< created cont.lexicons for case names in ..._declinations.lexc; they should be useful for compounding 2016-11-17T19:48:43+00:00
---
> [Template merge - langs/und] Added filter to remove the +MWE tag from the grammar checker generator. It blocked generation of some word forms (and should not be visible in any case). 2019-02-13T07:47:37+00:00
> [Template merge - langs/und] Fixed another case of transducer format mismatch for hyphenators, this time regarding pattern-based hyph building. 2019-01-25T08:54:07+00:00
> [Template merge - langs/und] Corrected an instance of transducer format mismatch when building hyphenators. 2019-01-25T08:08:55+00:00
> [Template merge - langs/und] Make the mobile keyboard layout error model work properly (ie on input longer than one char) by circumfixing it with any-stars. 2019-01-17T20:23:10+00:00
> [Template merge - langs/und] First round of improved handling of compilation errors in shell pipes: instruct make to delete targets when some of the intermediate steps fail. 2019-01-11T13:53:26+00:00
> [Template merge - langs/und] Added configure.ac conditional to control whether spellers for alternative orthographies are built. The default is 'true'. Set this to 'false' for historical or other orthographies for which a speller is not relevant. 2019-01-09T10:41:17+00:00
> [Template merge - langs/und] Fix broken hfst builds of xfscript files when there is no final newline in the source file (caused the save command to be shadowed by the final line of text, usually a comment, so no file was saved, and thus there was nothing to work on for the next build step). 2019-01-09T08:59:21+00:00
> [Template merge - langs/und] Apply alternate orthography conversion after hyphenation marks have been removed, but before the morphology marks are deleted. Especially word boundaries are useful for certain types of conversion, but other borders will likely be useful as well. The conversion scripts need to take the border marks into consideration. 2019-01-08T08:59:35+00:00
> [Template merge - langs/und] Replicate the desktop error model for the mobile speller, and generalise the corpus weighting compilation. Now the build code is ready for mobile speller release. 2018-12-17T17:45:37+00:00
> [Template merge - langs/und] Improved Easter egg generation, using the improved script in giella-core. Increased the required giella-core version correspondingly. 2018-12-14T09:21:24+00:00
> [Template merge - langs/und] Cleaned the HFST_MINIMIZE_SPELLER macro, and also its use. No need to include push weights anymore, it is done always, for all speller fst's. 2018-12-13T10:22:14+00:00
> [Template merge - langs/und] Push weights for all final fst's, + optimise error model. 2018-12-13T09:57:44+00:00
> [Template merge - langs/und] Changed how the att file is produced. From now on it should be built once, and then added to svn. The att file will usually not change, and storing it in svn will avoid rebuilding it every time. Also changed the compression. 2018-12-12T14:55:54+00:00
> [Template merge - langs/und] Added support for adapting the error model to the mobile keyboard layout for the language in question. 2018-12-11T14:27:30+00:00
> [Template merge - langs/und] Two more places to remove the Use/-GC and the MWE tags: mt and speller fst's. Now done. 2018-11-06T07:54:39+00:00
> [Template merge - langs/und] Had forgotten to remove the Use/-GC tag in the core fst's, only from all the others. Now fixed. 2018-11-05T15:57:42+00:00
> [Template merge - langs/und] Step 2 in blocking dynamic compounds of MWE tagged entries: moved all MWE tag processing away from the *-raw-* targets to the specific *.tmp targets. This way the MWE tags will survive long enough to be available for the blocking done in the tokeniser fst's. Tested in SME, and seems to work as intended. 2018-11-05T09:10:48+00:00
> [Template merge - langs/und] Added step 1 in blocking dynamic compounds between an MWE and another noun: added new filter that will turn the MWE tag into a flag diacritic. Increased required giella-common version number due to the new filter. 2018-11-02T11:16:52+00:00
> [Template merge - langs/und] Fixed bug when building the punctuation file - the required subdir was not made. 2018-10-24T08:39:39+00:00
> [Template merge - langs/und] Moved the whitespace analyser almost to the beginning of the pipeline, directly after the tokeniser+analyser. This is to be able to support sentence boundary detection, as the whitespace analyser will give some valuable tags for that. 2018-10-12T14:07:22+00:00
> [Template merge - langs/und] Corrected typo in a configuration option - dekstop instead of desktop. Thanks to our friends in Nuuk for noticing. 2018-10-11T15:55:10+00:00
> [Template merge - langs/und] Corrected a misplaced dependency that caused url.hfst to be rebuilt on every make, and thus trigger other rebuilds. Not anymore. 2018-10-09T14:42:03+00:00
> [Template merge - langs/und] Moved whitespace tagging after the speller, to avoid that it creates trouble for the speller. That happens when whitespace error tags are applied to the word form that should be spell-checked. 2018-10-09T14:08:58+00:00
> [Template merge - langs/und] Made it possible to tag something as _only_ for the grammar checker, or _not_ for the grammar checker. Updated required giella-share version, due to new required filters. 2018-10-09T11:50:21+00:00
> [Template merge - langs/und] Moved whitespace chars to the blank regex, thereby reinstating the old compilation speed. Thanks to Kevin and Tino for noticing and suggesting the improvement. Also added comment to document what incondform is supposed to contain, again thanks to Kevin. 2018-10-09T10:08:23+00:00
> [Template merge - langs/und] Removed hyphen from the regular unknown alphabet, thereby reverting analysis of -foo as one (unknown) token, and instead back to two tokens. Added hyphen to alphamiddle, so that foo-bar will still be analysed as one big unknown token. 2018-10-09T08:58:59+00:00
> [Template merge - langs/und] Added the tokenisation disambiguation file to the compiled and installed targets. 2018-10-09T07:31:51+00:00
> [Template merge - langs/und] Better handling of unknowns: defined more whitespace characters, defined a lot more vowels in the alphabet, added recent improvements to flag diacritic like symbols at token boundaries. 2018-10-08T20:49:56+00:00
> [Template merge - langs/und] Fixed two build bugs: abbr.txt was only autogenerated when building with hfst, and the url.?fst file was not properly generated from url.tmp.?fst. 2018-10-04T11:04:14+00:00
> [Template merge - langs/und] Fixed bug in MT compilation - pattern rules are not used, but new filenames still had them due to copy-paste error. 2018-10-04T08:43:53+00:00
> [Template merge - langs/und] Added pmatch filtering also to MT and spellcheckers. Now all tools and fst's should be covered. 2018-10-04T07:59:17+00:00
> [Template merge - langs/und] Forgot to add pmatch filtering to the default targets in src/ - duh. Now done. 2018-10-04T07:32:33+00:00
> [Template merge - langs/und] Added pmatch filtering to the rest of the build targets in src/. Also added grammar checker filtering. 2018-10-03T10:42:04+00:00
> [Template merge - langs/und] Major reorganisation to properly handle pmatch preparations, by splitting the disamb-analyser compilation in two: one going to the regular disamb analyser, and the other going to the pmatch variant. We use the two tags +Use/PMatch and +Use/-Pmatch in complementary distribution to specify paths for each, one path containing pmatch backtracking points (used with the --giella format of hfst-tokenise), and one without. The backtracking machinery is used to handle ambiguous tokenisation. Increased required version of giella-shared due to new, required filters. 2018-10-03T07:47:18+00:00
> [Template merge - langs/und] More improvements to the analysis regression check: undo space->underscore from lookup2cg (to avoid meaningless diffs when comparing to the new hfst-tokenise), and removed weight info. Also changed the dir ref for abbr.txt to ref the build dir, not the source dir, as that is where the file is generated. 2018-10-01T09:57:18+00:00
> [Template merge - langs/und] Improved regression check script: check that the abbr file is built, for improved traditional tokenisation; and make the patch command silent, for less noise during testing. 2018-09-29T12:13:33+00:00
> [Template merge - langs/und] Thanks to Børre, the analysis regression script will now remove diffs due to different handling of dynamic compounds when comparing old and new tokenisation. This makes it much easier to spot real differences between the two. 2018-09-25T10:10:13+00:00
> [Template merge - langs/und] Improved shell script for analysis regression testing, so that in cases of no diffs it will only print a short message and continue. The test for no diff is also much faster than a real diff. Improves processing time a lot for large test corpora. 2018-09-25T06:57:58+00:00
> [Template merge - langs/und] Moved punctuation definitions from each language to giella-shared/all_langs/. Makes much more sense, and will help in resolving random tokenisation bugs due to « and ». 2018-09-13T11:01:57+00:00
> [Template merge - langs/und] Implemented the option to compile phonology rules directly against the lexicon, for better rule compilation optimisations. Kevin: fixed a bug in xml generation for the grammar checker. 2018-09-11T07:15:37+00:00
> [Template merge - langs/und] Fixed hyphenation build when there is no phonology file. 2018-09-10T11:52:22+00:00
> [Template merge - langs/und] Corrected an error after the Hunspell config section was commented out. 2018-09-10T10:56:33+00:00
> [Template merge - langs/und] Added --enable-all-tools option to configure.ac, to allow for easier configuration and testing of all common tools. Unstable or experimental tools must still be explicitly enabled. Commented out the Hunspell speller config completely, it is not supported. Corrected a comment. 2018-09-10T10:35:59+00:00
> [Template merge - langs/und] Improved and completed the code to skip building phonology fst's. Clearer logic and comments. 2018-09-08T04:50:27+00:00
> [Template merge - langs/und] Added a configure.ac setting to skip phonology compilation, typically used when compiling external sources that provide a full analyser in src/morphology. Also added a configuration option to compile xfscript files with lexicon references in them, to allow for faster and more optimised rule composition. This variable has no effect yet, the rest of the machinery is missing. 2018-09-07T22:39:32+00:00
> [Template merge - langs/und] Remove all tmp files when cleaning. 2018-09-06T11:43:44+00:00
> [Template merge - langs/und] Remove also url.tmp.lexc when cleaning. 2018-09-06T11:36:46+00:00
> [Template merge - langs/und] Fixed bug: the url analyser is located elsewhere, and should not be processed here in any case. 2018-09-06T10:09:11+00:00
> [Template merge - langs/und] Made url analyser compilation open for local adaptations, by going via a tmp file. 2018-09-06T07:32:50+00:00
> [Template merge - langs/und] Remove also url.lexc when cleaning, it is copied from giella-shared. 2018-09-05T13:52:50+00:00
> [Template merge - langs/und] Corrected double installation of url analyser bug. It should not be installed at all. 2018-08-31T17:48:19+00:00
> [Template merge - langs/und] Add missing ‘|’ in analyser-gt-whitespace.hfst goal. 2018-08-31T11:04:24+00:00
> [Template merge - langs/und] Fixed a bug in the previous commit that surfaced when enabling tokenisers but not grammar checkers. 2018-08-30T14:09:22+00:00
> [Template merge - langs/und] Massive rewrite of filter codes and automatically generated tag conversions, all done to handle bug #2474 (URL tag not correctly formatted in the tokeniser output). The bug should be fixed now. 2018-08-30T12:47:02+00:00
> [Template merge - langs/und] Added filter dir and filter compilation to the fst-based hyphenators. Moved filter compilation from src/filters/ to the local filter dir (by copying the regex files and then compile them), to make the build process mostly fst format independent. 2018-08-28T11:48:12+00:00
> [Template merge - langs/und] Added support for local modifications of the hyphenator build via a tmp file. Simplified tmp file handling in the src/ dir. 2018-08-27T12:21:01+00:00
> [Template merge - langs/und] Added dir structure and Autotools data to prepare for adding hyphenation testing. 2018-08-27T10:57:05+00:00
> [Template merge - langs/und] Downcasing of derived proper nouns was only applied on the input side, not the hyphenated side. This caused such words to be case-shifted: arabialaččat -> A^ra^bi^a^lač^čat. This is now fixed. 2018-08-27T07:54:04+00:00
> [Template merge - langs/und] Fixed hyphenation bug where the lexicon-based hyphenator missed hyphenation points, mainly in proper nouns, due to flag diacritics. Fixed by telling the fst compiler to treat flags as epsilons. Now the lexicon-based hyphenator is beating the plain rule-based one in most (all?) cases where there are differences. Must be tested better, though. 2018-08-26T17:13:32+00:00
> [Template merge - langs/und] Added comment to guide placement of local build targets (to avoid future merge conflicts), and a comment reminder about other places to change filenames. 2018-08-22T06:50:55+00:00
> [Template merge - langs/und] Reorganised the source filenames to make it easy to override when needed. Should make it possible to solve the bug where src/syntax/disambiguator.cg3 overrides the same file in tools/grammarcheckers/. 2018-08-21T12:45:04+00:00
> Reverting all */tools/grammarcheckers/Makefile.am to rev 160158 by using this command in langs/: 2018-08-21T12:28:37+00:00
> [Template merge - langs/und] Refactored repeating patterns of code with variables, fixes upload link after XServe crash last winter. 2018-08-20T10:01:02+00:00
> [Template merge - langs/und] Corrected and improved the compilation of the analysers including the URL analysis. This should fix the problem with compiling SMA and other languages, and should in general reduce both compilation time and analyser size. The basic change was to union in the URL analysis as the last step in building the analysers, instead of early - the early injection led to fst blowup during minimisation. Now no blowup appears to take place. 2018-06-05T12:25:12+00:00
> [Template merge - langs/und] Added the special target .NOTPARALLEL to the hfst speller make file, to work around a make bug that caused a prerequisite to not be built when invoking make with the -j option. Also added some comments. 2018-05-18T13:00:28+00:00
> [Template merge - langs/und] Updated command in comments to use the correct option. 2018-05-18T06:43:53+00:00
> Output in a more readable format 2018-05-16T11:21:04+00:00
> Cleaned up taglistings 2018-05-16T10:55:53+00:00
> [Template merge - langs/und] Reverted the more robust semantic tag reordering, it was just too slow. Now we are back to a less robust and more fragile system (including bugs), but with faster compilation. Ultimately we will abandon _semantic_ tag reordering altogether, and instead rewrite the lexc code to always place the semantic tags where they should be. 2018-05-16T09:08:46+00:00
> Skip outcommented lines in .lexc and the resulting tags 2018-05-15T17:46:36+00:00
> First iteration of tags found 2018-05-15T17:31:36+00:00
> [Template merge - langs/und] Corrected automake (and make?) syntax error that broke compilation. 2018-05-15T11:09:28+00:00
> [Template merge - langs/und] Simplified semantic tag filtering regex construction. 2018-05-15T07:32:58+00:00
> [Template merge - langs/und] Too eager in the previous commit to get rid of semantic tag processing: removed the filter to zero out semantic tags completely, which broke compilation of a number of fst's where semantic tags are not wanted. 2018-05-09T08:15:02+00:00
> [Template merge - langs/und] Corrected bugs in reordering semantic tags by doing the reordering in two steps: 1) insert the tag in the new and correct position, and 2) remove the tag in the wrong position. There will probably be things to iron out, but initial tests are fine. This should also make the whole semantic tag reordering a bit faster to compile and apply, as the generated regexes are smaller and simpler. 2018-05-08T18:26:25+00:00
> [Template merge - langs/und] Now that the downcasing script works in all cases, remove all the special processing, and get rid of spurious rebuilds of the dependent fst's. Another time-saver:-) 2018-05-02T10:13:28+00:00
> [Template merge - langs/und] Changed the downcasing script to work also with hyperminimised hfst-fst's. Now the downcasing script works both with Xerox, Hfst and Foma, and both with standard and hyperminimised hfst-fst's. Finally! 2018-05-02T09:13:57+00:00
> [Template merge - langs/und] Added support for filters for grammatical and derivation tags, sorted the generated filter list. 2018-04-23T14:46:22+00:00
> [Template merge - langs/und] Bugfix: OLang/xxx tags were removed, not made optional, in generators. 2018-04-20T08:32:55+00:00
> [Template merge - langs/und] Do not delete disambiguator.cg3 and grammarchecker.cg3 when cleaning. 2018-04-19T08:49:44+00:00
> [Template merge - langs/und] Whether to let the orig-lang tags be visible in the disambiguating analyser or not is dependent on the language and the needs of each language community. Moving the removal of those tags from the general processing to the language specific processing. Step 2: removing it from the general processing. 2018-04-18T13:16:04+00:00
> [Template merge - langs/und] Added the -p option to the yaml testing command, to remove all passing tests. This should make it easier to spot the actual FAILs. 2018-03-08T12:52:16+00:00
> [Template merge - langs/und] Corrected path to zhfst file. Also changed the return code when the zhfst file is not found, so that it will be reported as a FAIL. Since this test is only run when configured for building spellers, a missing zhfst file should be fatal. Also changed variable name to avoid confusion with the shell variable. 2018-03-08T11:02:54+00:00
> [Template merge - langs/und] Added phony target forwarding 'make test' to 'make check'. Required to make 'make check' work on some build systems. 2018-03-08T10:41:42+00:00
> [Template merge - langs/und] Added a separate disambiguation file for the spell checker output, and a spell-checker-only pipeline (well, still tokenisation and disambiguation, but no proper grammar checking). 2018-03-05T15:40:34+00:00
> [Template merge - langs/und] Corrected Foma compilation for phonology rules. 2018-03-05T10:23:30+00:00
> [Template merge - langs/und] Made symbol alignment default - I can see no cases where we don't want it, but it is still possible to disable it if such a need pops up. Also improved the error message when trying to build a twolc language using Foma. 2018-02-09T08:08:15+00:00
> [Template merge - langs/und] Added INFO text about switching to Hfst as a fallback when Xerox tools are not found. Also added test and error message when using Foma on a language with a twolc file. 2018-02-09T07:36:31+00:00
> Just for completeness and symmetry: added foma target. 2018-02-06T07:19:25+00:00
> Another attempt at getting rid of the nightly build fails: reverting back to exactly as it was before my first change, with one modification (and one modification only): added $(HFST_FORMAT) to allow speed optimisations using the foma backend. 2018-02-06T07:16:42+00:00
> [Template merge - langs/und] Fixed URL analysis in MT. All URL's and email addresses are now tagged +URL. Although the url analyser itself is small, the resulting analyser quadrupled in size (in sme). 2018-02-05T19:49:56+00:00
> [Template merge - langs/und] Removed filters for removing morphological borders - they destroy the asymmetry of the fst's, and make yaml testing more complicated. 2018-02-02T08:12:06+00:00
> [Template merge - langs/und] Added support for Area variants of the grammar checker generator. Should fix nightly build error for SMJ. 2018-02-01T19:32:30+00:00
> [Template merge - langs/und] Added missing Foma support for dictionary fst's. 2018-02-01T18:40:23+00:00
> [Template merge - langs/und] Fixed the last bunch of path errors. Now all yaml tests are back to normal. 2018-02-01T17:50:32+00:00
> [Template merge - langs/und] Cleanup: commented in outcommented test loop, removed exit statement used during development, fixed path for two test scripts. 2018-02-01T15:59:06+00:00
> [Template merge - langs/und] The last set of test runners for yaml tests changed to the new system. 2018-02-01T15:15:22+00:00
> [Template merge - langs/und] Three more yaml test runners done, still a few more to go before yaml testing is back in shape. 2018-02-01T13:58:57+00:00
> [Template merge - langs/und] Changed the last yaml testing scripts in the template to follow the new and improved system. No need for autoconf processing anymore. 2018-02-01T12:11:53+00:00
> [Template merge - langs/und] Major rework of the yaml testing framework, to be able to properly support fst type specific yaml testing (ie test only xfst or hfst transducers, or everything but xfst transducers (=foma & hfst)). This change triggered a number of other changes. The user-facing shell scripts are greatly simplified by this change. 2018-02-01T09:56:53+00:00
> [Template merge - langs/und] Corrected AM errors in the previous merge. Now the build is working again. 2018-01-31T11:42:51+00:00
> [Template merge - langs/und] Added support for grammar checker generators for alternative orthographies and writing systems. Should fix nightly build issue in CRK. 2018-01-31T11:14:39+00:00
> [Template merge - langs/und] Added support for a grammar checker specific generator. Should fix various issues re generation of suggestions. 2018-01-25T09:40:03+00:00
> Thanks to Heiki-Jaan for spotting a copy-paste error on my part. 2018-01-25T08:57:42+00:00
> Still an error in the nightly build. I am not able to reproduce the error, but I am changing back to having the regex on one line only, with no escaped linebreaks in the Makefile. This is how it was earlier, and that seemed to work. No idea why that would make a difference, though. 2018-01-25T07:49:03+00:00
> Fixed jspwiki error in the documentation. 2018-01-24T09:11:20+00:00
> Updated docu. 2018-01-24T08:14:44+00:00
> More and better documentation. 2018-01-24T08:14:13+00:00
> Removed the '7 definition, no need for it. Defined all escaped quotes and brackets as realised as themselves, to fit with the general handling of morphological boundaries, and to get rid of warnings of multichar only found on one fst during composition-intersection. 2018-01-24T08:08:01+00:00
> Removed the '7 definition, no need for it. Defined all escaped quotes and brackets, to get rid of warnings of multichar only found on one fst during composition-intersection. 2018-01-24T08:05:51+00:00
> Giella style whitespace changes. '7 was changed to plain ', as there is no need for the special use of it in est (and it is in fact not used). 2018-01-24T08:03:39+00:00
> [Template merge - langs/und] Added test for the presence of divvun-validate-suggest, which is now required to build grammar checkers. Now configure will error out instead of make. 2018-01-23T07:34:53+00:00
> [Template merge - langs/und] Add note to the errors.xml file that it is generated, and from which file it is generated, to avoid people editing the wrong file. 2018-01-22T12:42:48+00:00
> [Template merge - langs/und] Error messages are now copied from a source file to a build file, after being validated. This allows support for VPATH builds and retains the integrity of the zcheck file. At the same time also replaced hard coded language names with automake variable expansion in the pipespec.xml.in file. 2018-01-22T10:59:59+00:00
> [Template merge - langs/und] Fixed bug in building dictionary analysers for alternative orthographies, introduced in the changes yesterday. 2018-01-18T07:10:31+00:00
> Trying to fix broken nightly build by switching to semicolon separated regex, after I added whitespace to make the regexes more readable. 2018-01-18T06:47:26+00:00
> [Template merge - langs/und] Added option to specify language variant, to allow testing spellers for alternative writing systems, alternative orthographies, different countries etc. 2018-01-18T06:35:48+00:00
> [Template merge - langs/und] Added support for area / country specific fst's for the specialised dict and oahpa build files. At the same time reorganised the build code so that targets with two variables now consistently use the fst type / suffix as the pattern, and the writing system/alt orth/area/etc as the function parameter. This should make the build system more robust by reducing the risk for accidental pattern similarity. 2018-01-17T11:37:42+00:00
> Fixed build errors due to fst format mismatch. Now builds properly with Hfst again. The morpher takes ages, and should probably not be built for est. 2018-01-17T07:56:55+00:00
> [Template merge - langs/und] Added support for building area/country specific spellers. The target language for now is SMJ, but the feature is of course language independent and useful in a number of other circumstances. 2018-01-16T19:48:02+00:00
> [Template merge - langs/und] Changed dialect fst filenames to follow existing patterns used for Oahpa fst's. 2018-01-16T14:42:57+00:00
> [Template merge - langs/und] Added support for building dialect fst's. It is disabled by default, but can be enabled with a configure option. Also changed the disamb analyser to keep the dialect tags. Only normative fst's are filtered against dialect tags. 2018-01-16T12:39:01+00:00
> [Template merge - langs/und] Added initial support for building Area-specific analysers and generators (norm only). Also restored Area tags in the disamb and grammar checker analysers. Fixed missing support for Foma transducers in the alternative writing system support. 2018-01-16T07:44:07+00:00
> [Template merge - langs/und] Grammar checker .zcheck file should go into datadir, not libdir. 2018-01-15T11:55:49+00:00
> [Template merge - langs/und] Now using speller version info from configure.ac, not version.txt, which is removed. New giella-core required. 2018-01-15T10:40:45+00:00
> [Template merge - langs/und] Fixed a bug in fst format handling for the grammar checker - conflicting formats caused a segfault. Now using openfst-tropical for all fst's being processed in the grammarcheckers/ dir (presently only the speller acceptor analyser). 2018-01-15T08:51:33+00:00
> [Template merge - langs/und] Fixed OLang tag extraction and filter generation. 2018-01-12T13:19:58+00:00
> [Template merge - langs/und] Added weights to compounds in the language-independent build steps (languages without compounds will go through the same step, but will not be changed). Applied only to analysers. Also added spellrelax to the language-independent build of the analysers = it is always applied. 2018-01-12T11:58:01+00:00
> [Template merge - langs/und] Improved the previous fix: make sure it does not crash when the target file does not exist, and use the same test on all autogenerated tag lists. This should save a few more seconds of build time. 2018-01-12T08:33:08+00:00
> [Template merge - langs/und] Fixed bug #2355 so that the filters for semantic tags will only be rebuilt when there are real changes to the semantic tags. 2018-01-11T17:28:56+00:00
> [Template merge - langs/und] Corrected a € vs cut incompatibility on Linux, cf bug report #2457. 2018-01-11T08:49:04+00:00
> [Template merge - langs/und] Updated the pipespec.xml file to comply with the newest version of the grammar checker code, where each argument type is explicitly specified. Makes for a more robust pipeline. 2018-01-10T12:05:36+00:00
> [Template merge - langs/und] Corrected fileref in m4, added correct autoconf path to errors.xml. 2018-01-08T14:48:18+00:00
> [Template merge - langs/und] Renamed pipespec.xml to *.in, to allow autoconf processing. This makes it possible to use modes when building using VPATHS/out-of-source builds. 2018-01-08T14:23:41+00:00
> [Template merge - langs/und] Hard-coded filename in fallback target - that was the only way to work around a loop in make on some systems. 2018-01-08T09:46:56+00:00
> em dash (sigh). Thanks to Sjur for spotting the obvious. 2018-01-08T07:37:22+00:00
> [Template merge - langs/und] Renamed src/syntax/disambiguation.cg3 to src/syntax/disambiguator.cg3, to keep the file naming consistent (actor noun if possible), and remove discrepancy between the regular disambiguator and the grammar checker disambiguator that caused makefile troubles. 2018-01-07T16:41:02+00:00
> Minor updates. 2018-01-06T10:04:45+00:00
> For now: commenting out ref to src/syntax, manually copying disambiguation.cg3 to disambiguator.cg3 2018-01-06T10:01:35+00:00
> Emacs tab problem, now real TAB 2018-01-05T13:14:16+00:00
> Addition for using ordinary dis-file 2018-01-05T13:10:08+00:00
> [Template merge - langs/und] Heavy rewrite of the analysis regression check tool, to support testing the grammar checker pipeline. 2017-12-12T12:20:30+00:00
> [Template merge - langs/und] Do not remove semantic tags, dialect tags and other tags useful for disambiguation or suggestion generation. The grammar checker speller needs these, and they will anyway disappear when we project the final fst. 2017-12-11T13:07:19+00:00
> [Template merge - langs/und] Proper verbosity specification in a few more instances, and added weight pushing for the grammar checker speller now (how could I have missed that?). 2017-12-01T12:31:44+00:00
> [Template merge - langs/und] Fixed a bug in piped hfst-xfst commands: in three cases the -p option was missing, causing strange misbehavior in hfst-xfst on some systems. 2017-12-01T12:09:04+00:00
> [Template merge - langs/und] Further configure.ac cleanup: moved some variable definitions to other m4 files, moved the language definition on top, deprecated GTLANG* variables for GLANG* variants (ie Giella instead of GiellaTechno). Updated copyright year. 2017-12-01T10:27:06+00:00
> [Template merge - langs/und] Moved all default AC_CONFIG_FILES into a separate function in a separate m4 file, to clean up configure.ac. Some other cleanup of configure.ac. 2017-12-01T09:32:03+00:00
> [Template merge - langs/und] Defined variable for separate speller release version string. 2017-12-01T08:23:56+00:00
> [Template merge - langs/und] Changed package name and version to more clearly be a real name and version number. 2017-12-01T08:07:13+00:00
> [Template merge - langs/und] Updated comment in preparation for other changes. 2017-12-01T07:53:01+00:00
> [Template merge - langs/und] Added support for analysing whitespace and thus make it possible to tag whitespace errors (double spaces, extra spaces, etc), and also to more reliably detect sentence and paragraph borders by using whitespace as a delimiter. 2017-11-30T14:23:26+00:00
> [Template merge - langs/und] Using absolute dir refs to make it possible to call the shell scripts from everywhere. 2017-11-30T12:36:00+00:00
> [Template merge - langs/und] Fixed a bug: forgot to remove a line. 2017-11-29T13:37:02+00:00
> [Template merge - langs/und] Rewrote the speller test scripts in devtools/ to be VPATH safe and rely on autotools for paths etc, so that the scripts will work also when only checking out single languages. 2017-11-29T13:00:15+00:00
> [Template merge - langs/und] Added support for specifying language-specific files to be included in the grammar checker archive file. 2017-11-15T13:19:51+00:00
> [Template merge - langs/und] Updated grammar checker files and build rules. 2017-11-13T09:47:19+00:00
> [Template merge - langs/und] Added hfst-push-weights to move transducer weights to the beginning of the strings, to enable proper optimisations of speller lookup in hfst-ospell. Stripped out most lang-specific stuff from grammar checker cg file, and added simple example rules + some explanations. Use gramcheck tokeniser in pre-pipe. 2017-11-07T15:46:35+00:00
> [Template merge - langs/und] Added default rule for speller suggestions, to make the suggestions survive cg treatment. 2017-10-25T09:54:16+00:00
> [Template merge - langs/und] Added spell checking component to the grammar checker pipeline. Now every planned component is working as it should. The spell checking requires first that one builds the latest hfst-ospell code, and then the newest grammar checker code for this to work. 2017-10-24T12:53:13+00:00
> [Template merge - langs/und] Increased weights for fall-back rule-based hyphenation. Added .hfst suffix to rule fst for consistency. 2017-10-13T07:41:24+00:00
> [Template merge - langs/und] Replaced the huge sme grammar checker with the more moderate smn grammar checker cg file, as the template file for future grammar checkers. 2017-10-12T08:39:54+00:00
> [Template merge - langs/und] Added note (readme file) about NOT touching the local am-shared dir, to avoid future unintended changes. 2017-10-12T06:36:44+00:00
> [Template merge - langs/und] Added the missing files for a working grammar checker. Fixed grammar checker build rules to not be dependent upon enabling tokenisers. 2017-10-11T18:23:32+00:00
> [Template merge - langs/und] Added conversion of the analysis tags from the grammar checker speller into CG format. 2017-10-11T05:53:04+00:00
> [Template merge - langs/und] One misplaced variable caused the grammar checker speller to be built independent of the configuration. This caused a build fail for everyone. Solves bug #2437. Also added $(srcdir) in front of root.lexc, to ensure that the file reference resolves correctly in local build targets. 2017-10-10T09:30:21+00:00
> [Template merge - langs/und] Moved the target clean-local to the local Makefile, to make it possible to enhance the clean target with locally generated files. 2017-10-10T09:01:09+00:00
> [Template merge - langs/und] Corrections to the grammar checker speller build: we now build a working zhfst file that can be used as part of the development cycle. Also additions to silent builds. 2017-10-04T07:00:03+00:00
> [Template merge - langs/und] Major update to the grammar checker template. It still does not work completely as it should, so hold your horses. Update content: ensured that all files needed are copied to the grammar checker build dir, removed option to name files (=irrelevant bloat), now builds an almost proper zip file, and ensured that tokenisers are built before grammarcheckers. Also made it so that when grammar checkers are enabled, spellers are automatically enabled too, as they will be included as part of the grammar checker pipeline. 2017-10-03T07:01:12+00:00
> [Template merge - langs/und] Changed the file exists test for the lemma generation testing so that it will work even in cases where multiple source files are used as input. 2017-09-20T12:10:07+00:00
> [Template merge - langs/und] Made cg3 file compilation more general. 2017-09-19T14:19:51+00:00
> [Template merge - langs/und] Moved the code to build the apertium relabel script in the apertium directory, so that we can use the actual giella-tagged fst for MT as the tag source. This should fix all issues of missing tags in the relabel script. 2017-09-15T14:15:22+00:00
> [Template merge - langs/und] GLE requires regex compilation possibilities in src/, no reason why it can't be. 2017-09-14T11:27:39+00:00
> [Template merge - langs/und] Fixed a shortcoming in the build infra uncovered by gle: no explicit support for language-specific build rules that will not end up in lexicon.?fst. 2017-09-14T06:20:58+00:00
> Formatting. 2017-09-13T12:20:12+00:00
> today 2017-09-13T12:11:22+00:00
> Post-Samest meeting 2017-09-08T14:47:11+00:00
> [Template merge - langs/und] Moved tag extraction to a separate am-include file, so that it can be shared between different dirs. Moved generation of regex for turning tags into CG friendly format from src/filters/ to tools/tokenisers/filters/. 2017-08-28T14:22:07+00:00
> [Template merge - langs/und] After a couple of bug fixes in giella-core, require the new version. 2017-08-25T10:11:28+00:00
> [Template merge - langs/und] Initial support for building tokenisers where the morphological analysis tags are given in CG format directly instead of having to be postprocessed by hfst-tokenise before being printed. The idea is to make the hfst-tokenise code more general, and let everything that is particular to one language or setup go into the fst instead of being hardcoded in the C++ code. There are some issues that must be resolved, but fst-wise the code works. 2017-08-24T11:54:29+00:00
> [Template merge - langs/und] Added support for building a regex that transforms all tags from the format "+Adv" to " Adv" (including space). The idea is to make the tags readily consumable by CG. Both prefix and suffix tags are converted. Newest giella-core required. 2017-08-24T10:09:48+00:00
> [Template merge - langs/und] Part two of renaming the preprocess dir to tokenisers. Now all refs to it are updated. 2017-08-24T07:29:47+00:00
> [Template merge - langs/und] Renamed the preprocess dir to tokenisers, to better describe the content of it. 2017-08-24T06:24:29+00:00
> added sent to set of DELIMITERS. 2017-08-16T18:19:12+00:00
> [Template merge - langs/und] Added support for diffing and merging on Linux. As part of that added checking for diff tools in m4/giella-macros.m4, and added more tests against failures. Also added test for cg-mwesplit, and increased the required vislcg3 version to the 1.0 release. 2017-08-16T10:52:11+00:00
> [Template merge - langs/und] More robust test for the existence of the various vislcg3 files. 2017-08-15T12:22:08+00:00
> [Template merge - langs/und] Added more robust option checking, and a test for the existence of the specified corpus file. Also added some comments. 2017-08-15T07:17:16+00:00
> [Template merge - langs/und] Actually open the other diff views. And force-add to svn - we don't want error messages in this context. 2017-08-14T14:47:01+00:00
> [Template merge - langs/und] Corrected glaring variable copy&paste bug. Thanks to Trond for spotting it! 2017-08-14T12:56:13+00:00
> Made a relative link absolute. Should fix the linking errors in Forrest during the last days. 2017-07-12T11:15:55+00:00
> docu 2017-07-10T09:06:20+00:00
> [Template merge - langs/und] Removed from the default build rules the automatic removal of +Comp tags in adverbs. That is definitely not a behavior we want universally. 2017-07-02T01:40:00+00:00
> [Template merge - langs/und] Fixed a bug that caused the check_analysis_regressions.sh script to fail if you hadn't put giella-core/scripts/ in your path - which is not automatically done when you just check out giella-core and your language of interest. 2017-06-30T00:57:44+00:00
> [Template merge - langs/und] Changed command to extract the specified fst name, the old version was not reliable. 2017-06-29T01:18:11+00:00
> Yesterday's Samest meeting. 2017-06-10T21:01:27+00:00
> [Template merge - langs/und] Due to wrong AM conditional, it still built a few mobile speller fst's. Now it should be quiet. 2017-05-23T09:32:25+00:00
> [Template merge - langs/und] Really do disable mobile spellers by default... 2017-05-23T08:57:05+00:00
> [Template merge - langs/und] Made mobile spellers not build by default, even when enabling spellers. The mobile spellers must now be explicitly enabled. 2017-05-23T08:39:53+00:00
> Last Samest meeting notes. 2017-05-19T19:55:02+00:00
> [Template merge - langs/und] Removed Ins() around Unknown. This triggered a bug(?) in hfst-tokenise that caused wordforms not to be output. Speed and memory consumption should not be noticeably affected. 2017-05-16T17:01:39+00:00
> [Template merge - langs/und] Improved pmatch scripts - unification by reference instead of full fst unification. Reduces file size by ≈2/3, and runtime memory consumption by 50%. 2017-05-04T10:22:09+00:00
> One more link to fix 2017-04-19T14:06:59+00:00
> Fix linking error when building xtdoc/gtuit and xtdoc/techdoc 2017-04-19T08:58:22+00:00
> [Template merge - langs/und] Now that there is a new version of Hfst out, require it. Should resolve issues with compiling the url.lexc file. 2017-04-18T16:22:44+00:00
> Samest meeting notes x 2. 2017-04-12T14:36:44+00:00
> Better pronouns and better tests 2017-04-10T09:17:01+00:00
> Rewrote links to be relative to the site root instead of absolute. Command used: 2017-03-27T10:30:52+00:00
> Vowel plural for words like jalg; fixed some pronoun filters 2017-03-27T08:33:44+00:00
> Last meeting memo that I have forgotten to check in. 2017-03-27T07:40:31+00:00
> [Template merge - langs/und] Further development of the analysis regression check: added support for diff views of all diff types, and now you can specify which diff view you want to see (and you must specify at least one). You can also override the default corpus, and specify a corpus of your own with the -c/--corpus option. Also corrected the initial description of the script in the help text, and added a diff view comparing the old pipeline using Xerox with the new pipeline using hfst-tokenise. This will help in finding unwanted differences between the two. 2017-03-17T12:48:36+00:00
> [Template merge - langs/und] Further improvements to the analysis regression check: only do function and dependency analysis if the required cg3 files exist. Also clarified the -d option and silenced the Xerox lookup tool. 2017-03-16T14:34:03+00:00
> [Template merge - langs/und] Improved analysis regression check script: added a short help text, and added an option to ask for a diff between old-style (preprocess+lookup+lookup2cg) and new-style (hfst-tokenise+mwe-disamb+cg-mwesplit) morphological analysis. Intended to be used to find weak (and strong!) spots in the new-style morphological analysis. 2017-03-16T12:21:56+00:00
> [Template merge - langs/und] Added the first version of a $LANG/devtools/ script that will process a corpus with the available tools, and compare the result against the previous version in the svn repository. The idea is to be able to easily spot regressions in analyses due to changes in the lexicons or CG rules. There are a number of rough edges, but it works. 2017-03-16T10:12:06+00:00
> [Template merge - langs/und] Only remove generated lemma files if the lemma generation tests succeed. 2017-03-14T14:42:06+00:00
> The XFAIL_TESTS variable was commented out. That causes the tests to FAIL, even in cases where we are aware of the errors but want to postpone fixing them. Commented the variable in again, so that 'make check' runs all the way through. 2017-03-13T12:56:39+00:00
> [Template merge - langs/und] Only delete generated dic and tex files if one really wants to start anew. Do not delete the version.txt file, only the generated wordlist file. 2017-03-07T18:46:22+00:00
> [Template merge - langs/und] Add the url parser also to the grammar checker tokeniser. 2017-03-07T15:01:20+00:00
> [Template merge - langs/und] Make the url.hfst a dependent of the hfst tokenising analyser. Improved the tokeniser based on recent changes in sme. 2017-03-06T17:08:41+00:00
> [Template merge - langs/und] Removed automatic inclusion of the url parsing fst. The union with the regular fst blew up the total, in some cases more than 10x! The preferred way of adding it is to add it in the last steps of the *.tmp.fst > *.fst processing by loading it onto the stack (and inverse it for hfst) before saving the fst stack, and thus creating a transducer file with two fst's. Applying the input to them both will in effect union them, giving the output we want without blowing up the size of the fst file. 2017-03-03T14:19:52+00:00
> [Template merge - langs/und] Added support for compiling a lexc file for parsing URL's as such, giving them a separate tag. Only added to the descriptive analysers for now. Requires an updated version of giella-shared, due to the new file needed for the new functionality. 2017-03-02T14:17:12+00:00
> [Template merge - langs/und] Corrects an inconsistency in the order of tag changing processing, where generators and analysers got their tags changed in different order, which caused different tags in some cases. Fixes bug #2264. Thanks to Heiki-Jaan Kaalep for the new and corrected code. 2017-03-02T06:40:00+00:00
> Small fix to the generation of base forms for type 29. There also seems to be some confusion in the use of the negative forms of olema and of +ConNeg and +Neg. 2017-02-28T22:45:12+00:00
> script for testing 2017-02-27T09:37:57+00:00
> [Template merge - langs/und] Updated Python feedback to correctly state that Python 3.5 is required. 2017-02-27T09:33:35+00:00
> names for the test 2017-02-26T20:00:22+00:00
> Make some tests pass, fix some forms 2017-02-26T02:01:32+00:00
> [Template merge - langs/und] Fixed issue with link generation thanks to Heiki-Jaan Kaalep. 2017-02-22T09:03:27+00:00
> update 2017-02-21T10:08:42+00:00
> Last Samest meeting notes 2017-02-20T18:17:07+00:00
> [Template merge - langs/und] Increased required version of Python3, due to the updated speller test bench. 2017-02-15T08:02:20+00:00
> [Template merge - langs/und] New version of the speller test bench, now with sortable table columns, and optional timing of the suggestions for every input word (hfst-ospell-office only). Not finished, but working quite well. It is also possible now to specify the number of suggestions returned by hfst-ospell-office. 2017-02-14T09:38:50+00:00
> [Template merge - langs/und] Increased required version of giella-core due to bug fix in the core. 2017-02-03T11:51:18+00:00
> [Template merge - langs/und] Increased required version of giella-core due to changes in speller building. 2017-02-03T09:50:59+00:00
> [Template merge - langs/und] One more attempt at fixing the giella-common package bug. 2017-02-02T08:57:48+00:00
> [Template merge - langs/und] Added final step in building pattern-based hyphenators: now also prepared for Hunspell-like OOo hyphenation. Requires new version of the giella-core. Also corrected bug in checking the version number of giella-common. 2017-02-01T11:11:30+00:00
> [Template merge - langs/und] Tex pattern based hyphenation generation works. The output must be checked and tested, and the process may have to be rerun several times to get the desired hyphenation behavior. Removed outcommented build code from the old infra - the new build code is essentially just a reformulation of the old one. 2017-01-31T14:44:34+00:00
> [Template merge - langs/und] Added support for checking the version of the giella-common package (aka giella-shared/). Added two new regexes to the source file list for shared regexes. Updated the required version of Hfst - it has not been updated in ages. 2017-01-31T13:56:33+00:00
> [Template merge - langs/und] Further work on the pattern based hyphenators: added tra file template, which is used to 'translate' non-ASCII chars to ascii only for the pattern creation process. Initial build steps for the pattern build. 2017-01-31T12:26:09+00:00
> [Template merge - langs/und] Improved the fst-based hyphenator by removing irrelevant paths from the fst. Started work on the pattern-based hyphenator, based on code from the old infra. 2017-01-31T11:12:44+00:00
> [Template merge - langs/und] Finished first version of fst-based hyphenator: now includes plain rules as a fall-back solution (including for misspelled words), and Err-tagged forms get a high weight penalty. In general, this seems to give good hyphenation patterns if one picks the first (lowest-weight) one. 2017-01-30T13:51:38+00:00
> [Template merge - langs/und] First version of lexicon-based and fst-based hyphenation done. Works, but misses capitalised words, and does not give extra weights to Err-tagged word forms. Also no hyphenation of misspelled words yet. Hyphenation builds are off by default. 2017-01-30T12:14:37+00:00
> [Template merge - langs/und] Added template file for weighting tags when the fst is used as a hyphenator. 2017-01-30T10:42:47+00:00
> [Template merge - langs/und] Added check for cg-relabel when enabling apertium. Thanks to Flammie for identifying the issue. 2017-01-30T09:31:50+00:00
> [Template merge - langs/und] Added basic dir structure for building hyphenators. 2017-01-27T07:35:00+00:00
> [Template merge - langs/und] Replaced gtcore with giella-core. 2017-01-25T09:59:45+00:00
> [Template merge - langs/und] Added test dir for hyphenators, to store data from the old infra. 2017-01-23T10:54:58+00:00
> [Template merge - langs/und] Added test dirs for listbased spellcheckers, if we ever get to that. 2017-01-23T09:11:43+00:00
> Bidictionary entries generation from previous incomplete translations. 2017-01-22T19:42:36+00:00
> [Template merge - langs/und] Fixed logical error in the handling of negated specified fst handling in yaml tests (e.g. ~xfst) - the test didn't work, and the yaml file was run when not intended. 2017-01-18T00:34:52+00:00
> [Template merge - langs/und] Fixed regression introduced in the previous commit: one-sided tests were included when looking for test data, causing a subsequent python fail when no actual test data was found. Fixed by using a stricter file name pattern. 2017-01-17T15:52:04+00:00
> [Template merge - langs/und] Added option to specify in a yaml filename that it should only be tested against a specific technology or not, by specifying one of .foma, .hfst or .xfst before the suffix part (before [.gen].yaml), and prefixed with '~' if negated (i.e. .~xfst for NOT running it against Xerox). 2017-01-17T08:48:41+00:00
> [Template merge - langs/und] Slightly more robust yaml testing code. 2017-01-16T15:14:39+00:00
> [Template merge - langs/und] Common starting point for both weighted and unweighted parts. 2017-01-16T15:07:32+00:00
> Friday's Samest meeting 2017-01-16T12:57:43+00:00
> docu also for this est version 2017-01-13T12:55:56+00:00
> A lot of old corrections to tame overgeneration 2017-01-13T12:19:47+00:00
> [Template merge - langs/und] Added removal of Area tags also for specialised fst's. Fixes Korp issue reported by Ciprian. 2017-01-10T13:56:03+00:00
> Updates for cg conversion rules 2016-12-27T21:56:34+00:00
> Updates for dis-cg 2016-12-27T21:55:05+00:00
> Yesterday's Samest meeting notes. 2016-12-20T12:07:34+00:00
> [Template merge - langs/und] Ensure the fastest lookup method is used during hfst yaml generation tests. 2016-12-09T09:42:34+00:00
> today 2016-12-07T13:42:00+00:00
> [Template merge - langs/und] Removed the bash hack to add a css processing instruction - it is done by the perl script writing the xml file. 2016-11-28T20:13:53+00:00
> Tuesday's Samest meeting 2016-11-24T12:48:14+00:00
> [Template merge - langs/und] Removed the removal for dialect and variant tags from the grammar checker analyser, the information can be useful when generating suggestions for corrections. 2016-11-23T14:49:21+00:00
> [Template merge - langs/und] Removed repetition of the frequency weighted fst. The goal was to promote compounds where each part was already seen in the corpus, but it made the speller bigger and slower, and actually decreased suggestion quality slightly. — Also added code to do manual priority union, but it is buggy and outcommented for now. 2016-11-21T11:49:44+00:00
> Corrected typo. 2016-11-19T10:55:01+00:00
> [Template merge - langs/und] Added info about which file to look in to find a suitable frequency corpus cut-off location (=line number). 2016-11-18T09:41:58+00:00
1021c819
< now also estonian 2016-11-17T11:57:37+00:00
---
> estonian 2016-11-17T11:58:10+00:00
1023,1097c821,869
< manually synchronizing with Trond's input for numerals 2016-11-17T09:48:23+00:00
< no real change, just re-phrasing in twolc 2016-11-17T08:44:49+00:00
< [Template merge - langs/und] Renamed the option --enable-hfst-dekstop-spellers (added plural 's'), and changed the behavior of it so that when disabled, zhfst files are still built (and only those). 2016-11-16T09:14:37+00:00
< some checks added to root.lexc and an error in verb generation corrected 2016-11-12T19:27:18+00:00
< still 1st attempt to convert the documentation for forrest 2016-11-09T13:34:34+00:00
< 1st attempt to convert the documentation for forrest 2016-11-09T13:28:04+00:00
< ; missing, sigh. 2016-11-08T11:25:44+00:00
< improving 2016-11-08T11:22:53+00:00
< Ad hoc to make it compile, return to this. 2016-11-08T10:28:11+00:00
< [Template merge - langs/und] Cleaner build steps for local speller filters - the regex is now copied in and compiled according to the fst-format of the speller as opposed to earlier, where the binary fst was compiled and then transformed. 2016-11-02T22:47:28+00:00
< yaml tests pass, meaning that the Filosoft lexicon has been successfully converted to cover simplex word inflection 2016-11-02T16:42:54+00:00
< [Template merge - langs/und] Move CmpNP processing from general speller processing to each language. 2016-11-02T08:12:34+00:00
< [Template merge - langs/und] Also moved the CmpNP filtering to the relevant languages. 2016-11-02T04:32:23+00:00
< for hfst-lookup, all yaml tests pass; for xfst lookup, raudset and viskoosset do not pass 2016-11-01T19:59:06+00:00
< [Template merge - langs/und] Forgot one file in the previous commit - now that filter is completely removed from the core and template, and all language-independent processing. 2016-11-01T10:35:43+00:00
< [Template merge - langs/und] Moved the remove-norm-comp-tags.regex file from the giella-shared directory to the languages actually using it, and consequently removed it from the language-independent build files. 2016-11-01T10:17:09+00:00
< yaml tests still not passing... 2016-10-28T18:05:09+00:00
< generated yaml tests to cover all the wordforms that were used for testing during the previous development; not all the tests pass... 2016-10-28T13:51:48+00:00
< twol is without d:j and g:j pairs now 2016-10-27T14:29:23+00:00
< [Template merge - langs/und] Updated the speller devtools scripts to obey the new name and location of the giella-core directory. 2016-10-26T13:34:35+00:00
< [Template merge - langs/und] Added test for available GNU Make, and at least at version 3.82. Error if not found, except on OSX/macOS, where the builtin make is GNU Make 3.81 + patches, which corresponds to the required version or newer. 2016-10-26T12:25:37+00:00
< aed now without ad hoc E3 2016-10-25T19:10:12+00:00
< Removed superfluous things 2016-10-25T13:50:24+00:00
< "on" is special for YAML, or for Python's YAML library. 2016-10-25T13:28:14+00:00
< yaml test for V as well 2016-10-25T13:22:20+00:00
< words from the Filosoft lexicon (except those whose components inflect, like kakssada) are now in lexc, and the simplest yaml tests do not fail 2016-10-25T11:46:07+00:00
< twol rules now without except clause to keep it compilable by xerox twolc 2016-10-21T17:03:19+00:00
< [Template merge - langs/und] Better support for speller filters using source files from other locations. 2016-10-20T14:25:41+00:00
< Removed tests from XFAIL_TESTS - the tests that are there do PASS. 2016-10-18T13:52:06+00:00
< Updated documentation. 2016-10-18T13:43:22+00:00
< Added underlying form to the negative test. All tests must be in the form of string pairs, i.e. both underlying and surface forms in parallel. 2016-10-18T13:42:36+00:00
< two checks in est-phon.twolc 2016-10-18T13:36:13+00:00
< [Template merge - langs/und] Added mwe-dis.cg3, to allow disambiguation of multiword expressions and other tokenisation ambiguity. 2016-10-18T08:36:24+00:00
< [Template merge - langs/und] We build the tokenising analysers directly off the disamb and grammar checker analysers in src/, assuming that they are identical. This is a reasonable assumption now that the hfst tool kit contains all necessary machinery, and we don't need to pay special attention to the requirements of the tokenisation. 2016-10-17T07:25:22+00:00
< [Template merge - langs/und] Make --with-backend-format work also for the tokenising analysers. 2016-10-17T06:40:32+00:00
< still some errors in lexicons (but xfst lookup fails much more often than hfst-lookup) 2016-10-14T19:49:13+00:00
< yaml tests still report missing lemmas 2016-10-13T18:29:47+00:00
< missing lexc files added to svn 2016-10-13T14:13:20+00:00
< the whole lexicon (minus words whose both parts inflect) in lexc format and usable 2016-10-12T19:19:27+00:00
< [Template merge - langs/und] Wrong variable name :-( - now it is correct. 2016-10-10T15:01:20+00:00
< [Template merge - langs/und] Corrected makefile dependency for the und.timestamp file. 2016-10-10T14:49:22+00:00
< [Template merge - langs/und] More robustness added to the test scripts: checking several variables, testing whether the found variables are pointing to existing directories, and giving an error message if no directory is found. 2016-10-06T15:29:04+00:00
< ...and the filters for pronouns also 2016-10-05T17:03:22+00:00
< ne-words without N1 now, and simple pronouns described 2016-10-05T16:58:04+00:00
< [Template merge - langs/und] Changed variable name and definition to allow overriding the path to the called script, to make it easy to use a locally modified script instead. 2016-10-04T09:34:48+00:00
< [Template merge - langs/und] Changed variable name in devtool scripts, to reflect similar changes elsewhere. Part of fixing bug #2219. 2016-10-04T08:44:42+00:00
< minor corrections in twol comments 2016-09-22T08:03:25+00:00
< some minor clean-up 2016-09-21T18:43:22+00:00
< rewrote these 2-level rules that govern parallel form generation; they should be easier to understand now 2016-09-21T17:44:10+00:00
< a major update of twol, lexc files and conversion scripts 2016-09-13T09:22:47+00:00
< Line-wrap. Removed superfluous file listing. 2016-09-13T08:52:41+00:00
< [Template merge - langs/und] Corrected a number of bugs and deficiencies when building spellers when the giella proofing tools libraries must be fetched over the net. Now the spellers build correctly under all intended circumstances, given that there is a network connection. 2016-09-09T16:16:09+00:00
< [Template merge - langs/und] Corrected path for the test for availability of the giella-common resources. 2016-09-09T11:31:19+00:00
< [Template merge - langs/und] Added support for getting precompiled proofing tools libraries across the net if not found locally. Makes it actually possible to build spellers without checking out the whole of $GIELLA_HOME. Now it is also possible to just check out $GIELLA_LIBS if one still wants to build everything locally. 2016-09-09T10:27:02+00:00
< [Template merge - langs/und] Applied backend format rules to the tools/mt/ap/filters dir. This is not future proof, but does not create problems for sme, and solves a bug in smj. The future problem is that we mix both a specified backend format (for compilation efficiency) with the default/unspecified format fst (for weighting) in the same dir, and we can't automatically say which filters need to be in the specified backend format and which should be in the default format. This needs further consideration. 2016-09-02T08:20:21+00:00
< [Template merge - langs/und] Completely clean src/transcriptions/, and also clean tools/mt/apertium/filters/. 2016-09-01T13:12:29+00:00
< [Template merge - langs/und] Do not use PKG_CHECK_MODULES if you don't really have to - it clutters your code and creates unneeded variables = noise. 2016-08-31T11:17:39+00:00
< [Template merge - langs/und] Corrected placeholder string for two-letter ISO language code. 2016-08-25T20:22:15+00:00
< [Template merge - langs/und] Changed the path to the css for the xml speller test results in devtools. 2016-08-25T18:48:37+00:00
< [Template merge - langs/und] Added support for building alternate orthography fst's for dictionary and oahpa, and also morphers for alternative orthographies. Slight simplification of defs. 2016-08-24T13:15:31+00:00
< [Template merge - langs/und] One small change to support spellers for alternative orthographies built off of the raw fst instead of the standard fst. 2016-08-23T22:05:53+00:00
< [Template merge - langs/und] Added a possibility to build fst's for alternate orthographies based on the raw fst surface forms, instead of from the default/standard orthography. 2016-08-23T20:30:51+00:00
< [Template merge - langs/und] Changed all references to $(GIELLA_SHARED)/common into $(GIELLA_SHARED)/all_langs. 2016-08-23T05:19:01+00:00
< [Template merge - langs/und] Rewrote the code for identifying the location of GIELLA_CORE (former GTCORE). The code should be more robust, and is prepared to check against a pkg-config pc file as well. GTCORE is still used throughout the code, but in parallel to GIELLA_CORE, so that one can easily replace the former with the latter without causing bugs or other problems. 2016-08-22T20:14:43+00:00
< [Template merge - langs/und] Added checking for and setting of GIELLA_TEMPLATES, but only if you have defined GIELLA_MAINTAINER (renamed from GTMAINTAINER). Otherwise it is ignored. 2016-08-22T14:58:53+00:00
< [Template merge - langs/und] Revert experiment with priority union - it doesn't work as expected when weights are involved. Corrected filenames in the .SECONDARY target. 2016-08-19T12:21:39+00:00
< [Template merge - langs/und] Added download links to the build feedback for 'make upload' in tools/spellcheckers/fstbased/desktop/hfst/. 2016-08-19T10:24:36+00:00
< [Template merge - langs/und] Final step to make the GIELLA_SHARED dir be found in all cases: assign the path from pkg-config to the variable. 2016-08-18T10:33:29+00:00
< [Template merge - langs/und] Removed the separate test for content, instead adding the test to each possible location, moving to the next location if no data is found. 2016-08-18T09:46:12+00:00