v0.8.3
0.8.3 - 2020-09-11
This is a big release with a lot of changes. These changes are summarized here. Check the Changelog for more details.
Added
- @YasushiMiyata: Add `get_max_row_num` to `fonduer.utils.data_model_utils.tabular`. (#469) (#480)
- @HiromuHota: Add `get_bbox()` to `Sentence` and `SpanMention`. (#429)
- @HiromuHota: Add a custom MLflow model that allows you to package a Fonduer model. See here for how to use it. (#259) (#407)
- @HiromuHota: Support spaCy v2.2. (#384) (#432)
- @wajdikhattel: Add multinary candidates. (#455) (#456)
- @HiromuHota: Add `nullables` to `candidate_subclass()` to allow NULL mention in a candidate. (#496) (#497)
- @HiromuHota: Copy textual functions in `data_model_utils.tabular` to `data_model_utils.textual`. (#503) (#505)
Changed
- @YasushiMiyata: Enable RegexMatchSpan to concatenate words with a separator via the sep="(separator)" option. (#270) (#492)
- @HiromuHota: Enabled "Type hints (PEP 484) support for the Sphinx autodoc extension." (#421)
- @HiromuHota: Switched the Cython wrapper for Mecab from mecab-python3 to fugashi. Since the Japanese tokenizer remains the same, there should be no impact on users. (#384) (#432)
- @HiromuHota: Log a stack trace on parsing error for better debug experience. (#478) (#479)
- @HiromuHota: `get_cell_ngrams` and `get_neighbor_cell_ngrams` yield nothing when the mention is not tabular. (#471) (#504)
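
To illustrate what concatenating words with a separator means for regex matching, here is a minimal pure-Python sketch. It does not use Fonduer and is not its implementation; the function name `matches_joined` and the sample tokens are hypothetical, chosen only to show why the choice of separator changes whether a pattern matches.

```python
import re


def matches_joined(tokens, pattern, sep=" "):
    """Join tokens with `sep`, then test the regex against the whole joined string.

    Hypothetical helper for illustration only; not part of Fonduer.
    """
    joined = sep.join(tokens)
    return re.fullmatch(pattern, joined) is not None


tokens = ["BC", "546"]
# With an empty separator the tokens become "BC546", which the pattern matches.
print(matches_joined(tokens, r"BC\d+", sep=""))
# With a space separator the tokens become "BC 546", which the pattern rejects.
print(matches_joined(tokens, r"BC\d+", sep=" "))
```

The same tokens can therefore match or fail to match depending on the separator, which is why exposing it as an option is useful.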
Deprecated
- @HiromuHota: Deprecated `bbox_from_span` and `bbox_from_sentence`. (#429)
- @HiromuHota: Deprecated `visualizer.get_box` in favor of `span.get_bbox()`. (#445) (#446)
- @HiromuHota: Deprecate textual functions in `data_model_utils.tabular`. (#503) (#505)
Fixed
- @senwu: Fix an issue where pdf_path could not be given without a trailing slash. (#442) (#459)
- @kaikun213: Fix bug in table range difference calculations. (#420)
- @HiromuHota: mention_extractor.apply with clear=True now works even if it's not the first run. (#424)
- @HiromuHota: Fix `get_horz_ngrams` and `get_vert_ngrams` so that they work even when the input mention is not tabular. (#425) (#426)
- @HiromuHota: Fix the order of args to Bbox. (#443) (#444)
- @HiromuHota: Fix the non-deterministic behavior in VisualLinker. (#412) (#458)
- @HiromuHota: Fix an issue where the progress bar showed no progress during preprocessing, by executing preprocessing and parsing in parallel. (#439)
- @HiromuHota: Adapt to mlflow>=1.9.0. (#461) (#463)
- @HiromuHota: Correct the entity type for NumberMatcher from "NUMBER" to "CARDINAL". (#473) (#477)
- @HiromuHota: Fix `_get_axis_ngrams` not to return `None` when the input is not tabular. (#481)
- @HiromuHota: Fix `Visualizer.display_candidates` not to draw rectangles on wrong pages. (#488)
- @HiromuHota: Persist doc only when no error happens during parsing. (#489) (#490)
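
Several entries above concern n-gram helpers that should produce nothing, rather than `None`, for non-tabular input. The sketch below shows the general Python pattern in isolation. It is an assumption-laden illustration, not Fonduer's code: `axis_ngrams` and its `cell_words` argument are hypothetical, with `None` standing in for "the mention is not tabular".

```python
from typing import Iterator, Optional, Sequence


def axis_ngrams(cell_words: Optional[Sequence[str]]) -> Iterator[str]:
    """Yield lowercased words from a cell, or nothing when there is no cell.

    Hypothetical stand-in for illustration only; not part of Fonduer.
    """
    if cell_words is None:  # stand-in for "the mention is not tabular"
        return  # ends the generator without yielding anything
    for word in cell_words:
        yield word.lower()


# Callers can always iterate, whether or not the input is tabular.
print(list(axis_ngrams(None)))
print(list(axis_ngrams(["Max", "Voltage"])))
```

Because the function is a generator, a bare `return` gives callers an empty iterator, so a `for` loop over the result never has to guard against `None`.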