Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ocr showcase #866

Closed
wants to merge 33 commits into from
Closed

Ocr showcase #866

wants to merge 33 commits into from

Conversation

hepengfe
Copy link
Collaborator

@hepengfe hepengfe commented Jul 8, 2022

This PR fixes #867

Description of changes

Implement a notebook tutorial for OCR to show forte's capability on managing related data such as images containing text and recognized text, texts' bounding boxes and their relations..

Possible influences of this PR.

Describe what are the possible side-effects of the code change.

Test Conducted

Describe what test cases are included for the PR.

@hepengfe hepengfe added topic: examples Issue about examples topic:cv labels Jul 8, 2022
@hepengfe hepengfe self-assigned this Jul 8, 2022
Copy link
Member

@hunterhector hunterhector left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Let's always put the images to the asset branch, besides the size problem, there are two more reasons: 1. git is also not good at managing binary data 2. the asset branch won't subject to frequent change like master (so the url will be more stable)
  2. I have attempted to clean up the payload structure in another branch. It is not finished and I deliberately make a wrong test case, but some changes there may help you.
  3. It feels like there are a lot of payload basic not ready to finish up this OCR showcase.

@@ -47,13 +46,13 @@
from forte.data.ontology.core import EntryType
from forte.data.ontology.top import (
Annotation,
Grids,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grids is still an entry? I thought we discuss that it should be a data structure. Or is that change not merged?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not merged yet and it's in #827

@@ -244,7 +243,7 @@ def text(self) -> str:
@property
def audio(self) -> Optional[np.ndarray]:
r"""Return the audio of the data pack"""
return self.get_payload_data_at(Modality.Audio, 0)
return cast(np.ndarray, self.get_payload_data_at(Modality.Audio, 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this cast is not safe, not everything can be cast this way

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a typing issue mypy forte, and I haven't find a better way to solve it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't think we should assume audio to be ndarray

@@ -569,7 +565,7 @@ def set_text(
# temporary solution for backward compatibility
# past API use this method to add a single text in the datapack
if len(self.text_payloads) == 0 and text_payload_index == 0:
from ft.onto.base_ontology import ( # pylint: disable=import-outside-toplevel
from ft.onto.payload_ontology import ( # pylint: disable=import-outside-toplevel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we importing locally here, these should be imported top level

"description": "A payload that caches image data",
"attributes":[]
},
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make the indent aligned.

"type": "str"
},
{"name": "mime",
"type": "str"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use a json formatter

@@ -435,7 +463,12 @@ class CrossDocEventRelation(MultiPackLink):
ParentType = EventMention
ChildType = EventMention

def __init__(self, pack: MultiPack, parent: Optional[Entry] = None, child: Optional[Entry] = None):
def __init__(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this code generated? I feel like there are some manual or tool modifications to this file.

img_meta = JpegMeta(datapack)
img_meta.source_type = "local"

self.f.register(img_meta)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of a register is to associate something together. For example, here we should associate the meta data info with the function to load the meta data, and the function to serialize the meta data. But if you only provide one value to register, it cannot achieve what we want.

@@ -0,0 +1,181 @@
{
"cells": [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you link the notebook to the tutorial section next time? So that I can also review what it looks like in the tutorial section

" pack.set_text(ocr_text)\n",
" pack.pack_name = data_source\n",
" \n",
" yield pack\n",
Copy link
Member

@hunterhector hunterhector Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loading part is not what we desired, for example, it requires one to explicitly set the cache. And I don't even see metadata being used here.

pack: DataPack = DataPack()
jpegMeta = JpegMeta()
ImagePayload(pack, url="some_url", meta=jpegMeta) # payload_index default to 0

Now the user's job is done, the rest should all be handled by Forte (well, the user could also provide a method to load jpegMeta)

This is what we call lazy_loading, it doesn't happen anywhere near the reader, but behind the scene.

When the Payload is used, it should load the data to cache based on URL and jpegMeta, there doesn't need to be a step where the user set the cache.

" yield pack\n",
"\n",
"\n",
"# TODO: split ocr part into a processor"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah please do

@hepengfe
Copy link
Collaborator Author

#875

@hepengfe hepengfe closed this Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:cv topic: examples Issue about examples
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OCR example tutorial
2 participants