feat: evaluate CRF on patches too (surprising results)
dhdaines committed Jul 19, 2024
1 parent aeb9beb commit f090aeb
Showing 9 changed files with 67 additions and 1 deletion.
Binary file modified alexi/models/crf.vl.joblib.gz
Binary file not shown.
Binary file modified alexi/models/crfseq.joblib.gz
Binary file not shown.
Binary file modified alexi/models/rnn.pt
Binary file not shown.
Binary file modified alexi/models/rnn_crf.pt
Binary file not shown.
3 changes: 3 additions & 0 deletions results/.gitignore
@@ -0,0 +1,3 @@
*.gz
*.pt
*.json
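The new `results/.gitignore` keeps `*.gz`, `*.pt` and `*.json` files under `results/` out of version control, while the plain-text `*-patches.txt` reports added in this commit are not matched. A rough Python sketch of that matching, assuming it is enough to compare each slash-free pattern against the file's basename (which is how git treats such patterns). It ignores negation, directory-only patterns and other `.gitignore` files, so `git check-ignore` remains the authoritative check. The file names other than the report are hypothetical:

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from results/.gitignore in this commit.
IGNORE_PATTERNS = ["*.gz", "*.pt", "*.json"]


def is_ignored(path: str) -> bool:
    """Approximate git's rule for slash-free patterns: match the basename only.

    Simplified on purpose: no negation (!), no directory-only patterns and no
    nested .gitignore files. Matching is case-sensitive, like git by default.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in IGNORE_PATTERNS)


for path in [
    "results/text+layout-patches.txt",  # report added in this commit
    "results/example.joblib.gz",        # hypothetical file name
    "results/example.pt",               # hypothetical file name
    "results/example.json",             # hypothetical file name
]:
    print(path, "ignored" if is_ignored(path) else "kept")
```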
5 changes: 4 additions & 1 deletion results/run.sh
@@ -1,5 +1,8 @@
 #!/bin/sh
 
 for features in text+ text+layout text+layout+structure; do
-python scripts/train_crf.py --features $features --labels bonly -x 4 data/*.csv -s results/$features-x4.csv
+python scripts/train_crf.py --features $features --labels bonly \
+-x 4 data/*.csv -s results/$features-x4.csv -o results/cnn_$features
+python scripts/test_crf_voting.py -m results/cnn_$features data/patches/*.csv \
+> results/$features-patches.txt
 done
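The evaluation step calls `scripts/test_crf_voting.py` with the models written by `-o`. That script is not part of this diff; judging only from its name and from `-x 4` (presumably four cross-validation folds), one plausible reading is that each fold's model labels the patches and the labels are then combined by a per-token majority vote. A minimal sketch of that combination step under exactly that assumption; `majority_vote` is a hypothetical helper, not the repository's code:

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine several models' per-token labels by majority vote.

    Hypothetical helper: assumes every model labelled the same tokens in the
    same order. On a tie, Counter.most_common returns the tied label that was
    seen first, i.e. the one coming from the lowest-indexed model.
    """
    if not predictions:
        return []
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all models must label the same tokens")
    voted = []
    # zip(*predictions) yields, for each token, the tuple of labels from all models
    for token_labels in zip(*predictions):
        label, _count = Counter(token_labels).most_common(1)[0]
        voted.append(label)
    return voted


# Four hypothetical fold models labelling the same three tokens.
fold_predictions = [
    ["B-Alinea", "O", "B-Liste"],
    ["B-Alinea", "B-Liste", "B-Liste"],
    ["B-Titre", "O", "B-Liste"],
    ["B-Alinea", "O", "O"],
]
print(majority_vote(fold_predictions))  # ['B-Alinea', 'O', 'B-Liste']
```

A token-wise vote like this can in principle yield label sequences that no single model produced; with B-only labels (`--labels bonly`) that is less of a concern than it would be for full BIO tagging.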
20 changes: 20 additions & 0 deletions results/text+-patches.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
               precision    recall  f1-score   support

            O       1.00      0.23      0.38        26
     B-Alinea       0.75      0.39      0.51       279
     B-Annexe       1.00      1.00      1.00         4
    B-Article       0.90      0.42      0.58       111
   B-Chapitre       0.71      1.00      0.83         5
     B-Figure       0.00      0.00      0.00         3
      B-Liste       0.71      0.75      0.73       238
       B-Pied       1.00      0.56      0.71        45
    B-Section       1.00      1.00      1.00        15
B-SousSection       0.75      0.40      0.52        15
        B-TOC       0.60      0.75      0.67         4
       B-Tete       0.96      0.98      0.97        44
      B-Titre       0.40      0.33      0.36        18

    micro avg       0.77      0.55      0.65       807
    macro avg       0.75      0.60      0.64       807
 weighted avg       0.79      0.55      0.63       807
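The macro and weighted rows can be cross-checked from the per-class rows. Macro averaging is the plain mean over the 13 classes, so B-Figure (support 3, all scores 0.00) weighs as much as B-Alinea (support 279); weighted averaging scales each class by its support. Micro-averaged recall is total correct over total support, which is the same quantity as weighted recall (0.55 in both rows); micro precision cannot be rebuilt here because the report does not print predicted counts. A small Python check using the rounded values from `results/text+-patches.txt`, so agreement is only expected to about 0.01:

```python
# (precision, recall, f1, support) per class, copied from results/text+-patches.txt
ROWS = {
    "O": (1.00, 0.23, 0.38, 26),
    "B-Alinea": (0.75, 0.39, 0.51, 279),
    "B-Annexe": (1.00, 1.00, 1.00, 4),
    "B-Article": (0.90, 0.42, 0.58, 111),
    "B-Chapitre": (0.71, 1.00, 0.83, 5),
    "B-Figure": (0.00, 0.00, 0.00, 3),
    "B-Liste": (0.71, 0.75, 0.73, 238),
    "B-Pied": (1.00, 0.56, 0.71, 45),
    "B-Section": (1.00, 1.00, 1.00, 15),
    "B-SousSection": (0.75, 0.40, 0.52, 15),
    "B-TOC": (0.60, 0.75, 0.67, 4),
    "B-Tete": (0.96, 0.98, 0.97, 44),
    "B-Titre": (0.40, 0.33, 0.36, 18),
}
SUPPORT = 3


def macro(column: int) -> float:
    """Unweighted mean of one metric column over all classes."""
    return sum(row[column] for row in ROWS.values()) / len(ROWS)


def weighted(column: int) -> float:
    """Support-weighted mean of one metric column."""
    total = sum(row[SUPPORT] for row in ROWS.values())
    return sum(row[column] * row[SUPPORT] for row in ROWS.values()) / total


for name, column in [("precision", 0), ("recall", 1), ("f1-score", 2)]:
    print(f"{name:<9}  macro {macro(column):.2f}  weighted {weighted(column):.2f}")
# Matches the report: macro avg 0.75 0.60 0.64, weighted avg 0.79 0.55 0.63
```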

20 changes: 20 additions & 0 deletions results/text+layout+structure-patches.txt
@@ -0,0 +1,20 @@
               precision    recall  f1-score   support

            O       0.02      0.23      0.04        26
     B-Alinea       0.86      0.76      0.80       279
     B-Annexe       1.00      1.00      1.00         4
    B-Article       0.90      0.42      0.58       111
   B-Chapitre       0.44      0.80      0.57         5
     B-Figure       0.00      0.00      0.00         3
      B-Liste       0.52      0.75      0.61       238
       B-Pied       1.00      0.56      0.71        45
    B-Section       0.43      0.67      0.53        15
B-SousSection       0.44      0.27      0.33        15
        B-TOC       0.00      0.00      0.00         4
       B-Tete       1.00      0.80      0.89        44
      B-Titre       0.17      0.28      0.21        18

    micro avg       0.50      0.66      0.57       807
    macro avg       0.52      0.50      0.48       807
 weighted avg       0.71      0.66      0.66       807
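Each f1-score is the harmonic mean of its row's precision and recall, which is why the O row, with 0.23 recall but only 0.02 precision, collapses to 0.04: the harmonic mean is pulled toward the smaller value. Recomputing from the printed two-decimal values reproduces most rows, but can be off by 0.01 because each column is rounded independently (B-Alinea gives 0.81 here against the reported 0.80). A minimal Python check on rows copied from `results/text+layout+structure-patches.txt`:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) from results/text+layout+structure-patches.txt
rows = {
    "O": (0.02, 0.23, 0.04),
    "B-Liste": (0.52, 0.75, 0.61),
    "B-Tete": (1.00, 0.80, 0.89),
    "B-TOC": (0.00, 0.00, 0.00),
    "B-Alinea": (0.86, 0.76, 0.80),  # rounding makes this one come out 0.81
    "micro avg": (0.50, 0.66, 0.57),
}
for label, (p, r, reported) in rows.items():
    print(f"{label:<10} recomputed {f1(p, r):.2f}  reported {reported:.2f}")
```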

20 changes: 20 additions & 0 deletions results/text+layout-patches.txt
@@ -0,0 +1,20 @@
               precision    recall  f1-score   support

            O       0.35      0.23      0.28        26
     B-Alinea       0.87      0.84      0.86       279
     B-Annexe       1.00      1.00      1.00         4
    B-Article       0.94      0.42      0.58       111
   B-Chapitre       0.71      1.00      0.83         5
     B-Figure       0.00      0.00      0.00         3
      B-Liste       0.71      0.75      0.73       238
       B-Pied       0.94      0.67      0.78        45
    B-Section       0.88      1.00      0.94        15
B-SousSection       0.71      0.33      0.45        15
        B-TOC       0.60      0.75      0.67         4
       B-Tete       1.00      1.00      1.00        44
      B-Titre       0.40      0.33      0.36        18

    micro avg       0.81      0.72      0.76       807
    macro avg       0.70      0.64      0.65       807
 weighted avg       0.81      0.72      0.75       807
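The commit title flags these results as surprising. One visible effect: on the patches, adding structure features to text+layout lowers micro-averaged F1 from 0.76 to 0.57, and text+layout is also ahead of text+ (0.65). A short Python sketch that tabulates the per-class change in F1 between the two layout-based reports; the numbers are copied from the result files in this commit and nothing is re-run:

```python
# Per-class f1-scores copied from the two layout-based reports in this commit.
F1_TEXT_LAYOUT = {
    "O": 0.28, "B-Alinea": 0.86, "B-Annexe": 1.00, "B-Article": 0.58,
    "B-Chapitre": 0.83, "B-Figure": 0.00, "B-Liste": 0.73, "B-Pied": 0.78,
    "B-Section": 0.94, "B-SousSection": 0.45, "B-TOC": 0.67, "B-Tete": 1.00,
    "B-Titre": 0.36,
}
F1_TEXT_LAYOUT_STRUCTURE = {
    "O": 0.04, "B-Alinea": 0.80, "B-Annexe": 1.00, "B-Article": 0.58,
    "B-Chapitre": 0.57, "B-Figure": 0.00, "B-Liste": 0.61, "B-Pied": 0.71,
    "B-Section": 0.53, "B-SousSection": 0.33, "B-TOC": 0.00, "B-Tete": 0.89,
    "B-Titre": 0.21,
}
# Micro-averaged f1 of each feature set on the patches.
MICRO_F1 = {"text+": 0.65, "text+layout": 0.76, "text+layout+structure": 0.57}

best = max(MICRO_F1, key=MICRO_F1.get)
print("best micro f1:", best, MICRO_F1[best])  # best micro f1: text+layout 0.76

# Change in f1 per class when structure features are added, largest drop first.
deltas = sorted(
    (
        (label, F1_TEXT_LAYOUT_STRUCTURE[label] - F1_TEXT_LAYOUT[label])
        for label in F1_TEXT_LAYOUT
    ),
    key=lambda item: item[1],
)
for label, delta in deltas:
    print(f"{label:<14} {delta:+.2f}")
```

With these values no class improves when structure features are added; the largest drops are B-TOC (-0.67) and B-Section (-0.41).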
