Esl 137 added boxes into table #333
Merged
oksidgy (Collaborator) commented Sep 21, 2023
- added BBox to TableTree
- added a mapping between fastocr boxes and cell boxes
- added CellWithMeta
- changed the output table structure, removed CellProperies from the output
- changed the bbox extraction logic for image tables after debugging
- changed the output of the CSV, HTML, TABBY, PDF, and SCAN readers
- updated all tests that involve tables
- fixed code style
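
To picture what a mapping between OCR word boxes and table cell boxes might look like, here is a minimal sketch. It is not the code from this PR: the `Box` dataclass, the `assign_words_to_cells` function, and the "word center falls inside the cell" rule are all hypothetical stand-ins chosen for illustration, and dedoc's own `BBox` and matching logic may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box (not dedoc's BBox): top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int

    @property
    def center(self) -> Tuple[float, float]:
        return self.x + self.width / 2, self.y + self.height / 2

    def contains_point(self, px: float, py: float) -> bool:
        # Half-open intervals, so a point on a shared border belongs to one cell only.
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


def assign_words_to_cells(cells: List[Box], words: List[Box]) -> Dict[int, List[Box]]:
    """Map each cell index to the word boxes whose center lies inside that cell."""
    mapping: Dict[int, List[Box]] = {i: [] for i in range(len(cells))}
    for word in words:
        cx, cy = word.center
        for i, cell in enumerate(cells):
            if cell.contains_point(cx, cy):
                mapping[i].append(word)
                break  # assumption: a word belongs to at most one cell
    return mapping


# Two adjacent cells and three OCR word boxes; the last word lies outside the table.
cells = [Box(0, 0, 100, 50), Box(100, 0, 100, 50)]
words = [Box(10, 10, 30, 12), Box(120, 15, 40, 12), Box(300, 300, 20, 10)]
mapping = assign_words_to_cells(cells, words)
```

Matching on the word center rather than on full containment is one simple way to tolerate OCR boxes that slightly overlap a cell border; words that fall outside every cell are simply left unassigned.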
NastyBoget reviewed Sep 21, 2023
NastyBoget reviewed Sep 21, 2023
dedoc/readers/pdf_reader/pdf_image_reader/ocr/ocr_cell_extractor.py (outdated)
dedoc/readers/pdf_reader/pdf_image_reader/ocr/ocr_cell_extractor.py (outdated)
NastyBoget reviewed Sep 21, 2023
dedoc/readers/pdf_reader/pdf_image_reader/ocr/ocr_cell_extractor.py (outdated)
dedoc/readers/pdf_reader/pdf_image_reader/ocr/ocr_line_extractor.py (outdated)
dedoc/readers/pdf_reader/pdf_image_reader/ocr/ocr_line_extractor.py (outdated)
NastyBoget reviewed Sep 21, 2023
oksidgy force-pushed the esl-137-added_boxes_into_table branch 3 times, most recently from 18a2993 to 315f02a (September 22, 2023 16:20)
dedoc/readers/pdf_reader/pdf_txtlayer_reader/pdf_tabby_reader.py (outdated)
NastyBoget reviewed Sep 25, 2023
NastyBoget reviewed Sep 25, 2023
NastyBoget reviewed Sep 25, 2023
NastyBoget reviewed Sep 25, 2023
NastyBoget reviewed Sep 25, 2023
NastyBoget reviewed Sep 25, 2023
dedoc/readers/pdf_reader/pdf_image_reader/ocr/ocr_cell_extractor.py (outdated)
NastyBoget reviewed Sep 25, 2023
NastyBoget reviewed Sep 25, 2023
...mage_reader/table_recognizer/table_extractors/concrete_extractors/onepage_table_extractor.py (outdated)
NastyBoget reviewed Sep 25, 2023
oksidgy force-pushed the esl-137-added_boxes_into_table branch from 315f02a to 89147f0 (September 25, 2023 15:37)
NastyBoget reviewed Sep 26, 2023
NastyBoget reviewed Sep 26, 2023
NastyBoget reviewed Sep 26, 2023
- fixed after review
- removing some unused functions
oksidgy force-pushed the esl-137-added_boxes_into_table branch 2 times, most recently from 1f2b6aa to f597eaf (September 26, 2023 11:30)
oksidgy force-pushed the esl-137-added_boxes_into_table branch from f597eaf to 97e835c (September 26, 2023 12:01)
NastyBoget added a commit that referenced this pull request Oct 10, 2023
* TLDR-405 remove is_one_column_document_list (#332)
  * TLDR-405 remove is_one_column_document_list
  * TLDR-405 fix tests
  * TLDR-405 review fix
* TLDR-448-Fix draw coordinates bug (#330)
  * Fix draw coordinates bug
  * Fix draw coordinates conversion
* TLDR-451 tutorial new doc type (#331)
  * docs added
  * add code testing
  * some fixes
  * some fixes
  * add tabula and some fixes
  * add python-djvulibre
  * delete python-djvulibre and add djvulibre-bin
  * add poppler-utils
  * add tesseract
  * some fixes
  * flake8 stylefix
  * fix docs after flake8
  * update last part of adding_new_doc_type_tutorial
  * rewrite dedoc_add_new_doc_type_tutorial
  * minor fixes
  * minor fixes
  * minor fixes
  * some fixes
  * add more code examples
  * some fixes
  ---------
  Co-authored-by: Nikita Shevtsov <shevtsov@ispras.ru>
  Co-authored-by: Nasty <bogatenkova.anastasiya@mail.ru>
* updated txt layer correctness classifier (#334)
  Co-authored-by: Alexander Golodkov <golodkov@ispras.ru>
* Esl 137 added boxes into table (#333)
  * ESL-137 added box extraction skeleton into scan table extraction
  * ESL-138 ESL-137 a lot of table changes
    - added CellWithMeta
    - change output table structure, remove CellProperies in output
    - change logic bbox extraction from image tables after debugging
    - change output in CSV, HTML, TABBY, PDF, SCAN readers
    - change all tests with tables
    - fixed styles
  * ESL-137 changed draw table script
  * ESL-148 added script of table word boxes drawing
  * TLDR-471 added angle rotation from PdfImageReader and Tables
  * ESL-137 fixed unit-tests
  * ESL-137 fixed after review; removing some unused functions
    - fixed after review
    - removing some unused functions
  * ESL-137 update docs
  * ESL-137 after review
* Updated columns orientation classifier (#335)
  * updated txt columns orientation classifier
  * deleted "no_lines" parameter
  ---------
  Co-authored-by: Alexander Golodkov <golodkov@ispras.ru>
* fix pdf reader (#337)
  Co-authored-by: Nikita Shevtsov <shevtsov@ispras.ru>
* TLDR-472 add flake8-fill-one-line and flake8-multiline-containers and fix lint (#336)
  * add flake8-fill-one-line and flake8-multiline-containers and fix lint
  * update precommit hook
* TLDR-475 fix table documentation (#338)
  * TLDR-475 fix table documentation
  * Small fixes
* TLDR-474 remove insert_table parameter (#339)
  * TLDR-474 remove insert_table parameter
  * TLDR-474 remove is_inserted attribute
* ESL-470 fixed rotation operation of table word boxes (#341)
  rotates a table image while preserving image.shape during rotation. This is important for word bounding box extraction
* TLDR-478 docx table refactoring (#342)
  * TLDR-478 docx table refactoring
  * Small fixes
* TLDR-483 fixed box extraction from cropped cells (#343)
* TLDR-473 add dedoc utils (#340)
  * use dedoc utils BBox class
  * use AdaptiveBinarizer from dedoc-utils
  * use SkewCorrector from dedoc-utils
  * fix style
  * fix rotated angle error
  * delete BBox from docs
  * fix angles
  * delete print
  * fix dedocutils
  * dedocutils set ver. 0.3.5
  * fix mistakes and names
  ---------
  Co-authored-by: Nikita Shevtsov <shevtsov@ispras.ru>
* TLDR-481 html refactoring (#344)
  * delete unused files
  * Delete unused files, refactor html
  * Refactor query parameters
  * Fix tests
  * Refactor train dataset api
  * Fix style
  * Change python version in tests
  * Review fixes
* TLDR-490 changed uuid1 on uuid4; fixed bug in tabby's table uuid (#345)
  * TLDR-490 changed uuid1 on uuid4; fixed bug in tabby's table uuid
  * TLDR-490 fixes after review
* Added running API examples instruction (#346)
* added linewithmeta comparison operator (#347)
  Co-authored-by: Alexander Golodkov <golodkov@ispras.ru>
* ESL-156 fix pdfminer boxes output (#348)
  * ESL-156 fix pdfminer boxes output
  * ESL-156 after review
* ESL-159 fixed extract boxes from pdfminer reader (#350)
* new version 1.0 (#351)
---------
Co-authored-by: Andrey Mikhailov <mikhailov@icc.ru>
Co-authored-by: Nikita Shevtsov <61932814+Travvy88@users.noreply.github.com>
Co-authored-by: Nikita Shevtsov <shevtsov@ispras.ru>
Co-authored-by: Alexander Golodkov <55749660+alexander1999-hub@users.noreply.github.com>
Co-authored-by: Alexander Golodkov <golodkov@ispras.ru>
Co-authored-by: Oksana Belyaeva <belyaeva@ispras.ru>
Co-authored-by: Andrew Perminov <perminov@ispras.ru>
sunveil pushed a commit that referenced this pull request Oct 11, 2023
* ESL-137 added box extraction skeleton into scan table extraction
* ESL-138 ESL-137 a lot of table changes
  - added CellWithMeta
  - change output table structure, remove CellProperies in output
  - change logic bbox extraction from image tables after debugging
  - change output in CSV, HTML, TABBY, PDF, SCAN readers
  - change all tests with tables
  - fixed styles
* ESL-137 changed draw table script
* ESL-148 added script of table word boxes drawing
* TLDR-471 added angle rotation from PdfImageReader and Tables
* ESL-137 fixed unit-tests
* ESL-137 fixed after review; removing some unused functions
  - fixed after review
  - removing some unused functions
* ESL-137 update docs
* ESL-137 after review