Commit

fix: remove tag from sequence features
dhdaines committed Feb 13, 2024
1 parent bb03198 commit b93530a
Showing 4 changed files with 421 additions and 10 deletions.
1 change: 0 additions & 1 deletion alexi/label.py
@@ -26,7 +26,6 @@ def features(page: Sequence[T_obj]) -> Iterator[list[str]]:
 features.append("alpha=%s" % bool(word["text"].isalpha()))
 features.append("numdash=%s" % bool(NUMDASH.match(word["text"])))
 features.append("bold=%s" % bool("bold" in word["fontname"].lower()))
-features.append("tag=%s" % word["segment"].partition("-")[2])
 features.append("size=%d" % (int(word["bottom"]) - int(word["top"])))
 yield features
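
For illustration only, here is a minimal sketch of the per-word features that remain once the "tag=" line is dropped. It is not the code from alexi/label.py: the helper name word_features, the NUMDASH pattern, and the sample word (including its "B-Titre" segment label) are assumptions, since the hunk shows neither the regex definition nor any input data.

```python
import re

# Hypothetical stand-in: the real NUMDASH pattern is not visible in this diff.
NUMDASH = re.compile(r"[\d-]+$")


def word_features(word: dict) -> list[str]:
    """Build the features visible in the hunk for one word, without "tag="."""
    return [
        "alpha=%s" % bool(word["text"].isalpha()),
        "numdash=%s" % bool(NUMDASH.match(word["text"])),
        "bold=%s" % bool("bold" in word["fontname"].lower()),
        "size=%d" % (int(word["bottom"]) - int(word["top"])),
    ]


# Invented sample word; the field names follow the keys used in the hunk.
word = {
    "text": "12-3",
    "fontname": "Arial-BoldMT",
    "top": 100.2,
    "bottom": 112.9,
    "segment": "B-Titre",
}
feats = word_features(word)

# What the removed line would have contributed for this sample word.
removed = "tag=%s" % word["segment"].partition("-")[2]
```

With this sample, feats no longer depends on word["segment"] at all, while the removed line would have emitted the part of the segment label after the first "-".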
Binary file modified alexi/models/crfseq.joblib.gz
Binary file not shown.