0.9.0 - 2021-06-22
This long-awaited release brings performance improvements and some breaking changes. See the changelog for details.
Added
- @HiromuHota: Support spaCy v2.3. (#506)
- @HiromuHota: Add HOCRDocPreprocessor and HocrVisualLinker to support hOCR as input file. (#476) (#519)
- @YasushiMiyata: Add multiline Japanese strings support to fonduer.parser.visual_parser.hocr_visual_parser. (#534) (#542)
- @YasushiMiyata: Add commit process immediately after add to fonduer.parser.Parser. (#494) (#544)
Changed
- @HiromuHota: Renamed VisualLinker to PdfVisualParser, which assumes the following: (#518)
  - pdf_path should be a directory path, where PDF files exist, and cannot be a file path.
  - The PDF file should have the same basename (os.path.basename) as the document. E.g., the PDF file should be either "123.pdf" or "123.PDF" for "123.html".
- @HiromuHota: Changed Parser's signature as follows: (#518)
  - Renamed vizlink to visual_parser.
  - Removed pdf_path. Now this is required only by PdfVisualParser.
  - Removed visual. Provide visual_parser if visual information is to be parsed.
- @YasushiMiyata: Changed UDFRunner's and UDF's data commit process as follows: (#545)
  - Removed add process on single-thread in _apply in UDFRunner.
  - Added UDFRunner._add of y on multi-threads to Parser, Labeler and Featurizer.
  - Removed y of document parsed result from out_queue in UDF.
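The PDF lookup rule described for PdfVisualParser above (pdf_path is a directory, and the PDF shares the document's basename with a ".pdf" or ".PDF" extension) can be illustrated with a small standalone sketch. This is not Fonduer's implementation: the helper name find_pdf_for_document and its arguments are hypothetical, chosen only to show the rule with the standard library.

```python
import os
from typing import Optional


def find_pdf_for_document(pdf_path: str, document_name: str) -> Optional[str]:
    """Hypothetical helper (not part of Fonduer) illustrating the lookup rule.

    pdf_path must be a directory; the PDF must share the document's basename.
    """
    if not os.path.isdir(pdf_path):
        raise ValueError(f"pdf_path must be a directory, got: {pdf_path}")

    # Drop any directory part and the extension: "docs/123.html" -> "123".
    stem = os.path.splitext(os.path.basename(document_name))[0]

    # Compare against the actual directory entries so the match is exact
    # even on case-insensitive file systems.
    entries = set(os.listdir(pdf_path))
    for ext in (".pdf", ".PDF"):
        if stem + ext in entries:
            return os.path.join(pdf_path, stem + ext)
    return None
```

With this sketch, a document named "123.html" resolves to "123.pdf" or "123.PDF" inside the given directory, and passing a file path instead of a directory raises an error, mirroring the two assumptions listed above.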
Fixed
- @YasushiMiyata: Fix test code test_postgres.py::test_cand_gen_cascading_delete. (#538) (#539)
- @HiromuHota: Process the tail text only after child elements. (#333) (#520)