
Can inference be done without boxes and transcripts using PICK-pytorch? #105

Open
pushpalatha1405 opened this issue Sep 30, 2021 · 12 comments


@pushpalatha1405

Hi wenwenyu,

I prepared my custom dataset in the PICK-pytorch format and trained it with the models used in PICK-pytorch. The training score after 100 epochs is around 69%, and the test score is around mEF 0.7150, using the test.py script below (from PICK-pytorch):
```
python test.py --checkpoint /datadrive/PICK-pytorch/saved/models/PICK_Default/test_0924_145754/checkpoint-epoch100.pth --boxes_transcripts /datadrive/PICK-pytorch/predictions/boxes_and_transcripts --images_path /datadrive/PICK-pytorch/predictions/images --output_folder /datadrive/PICK-pytorch/output_pred --batch_size 1 --gpu 0
```

Now my question: when I build an end-to-end inference pipeline, I want to provide only the image and the checkpoint-epoch100.pth file and get the corresponding entity extractions as a JSON/txt file together with the bounding box coordinates.

So why do I need to provide the boxes_and_transcripts annotations again during inference?

Is there any way to use PICK-pytorch for inference by giving just the checkpoint file and the image path, and getting the predictions as text and images with bounding boxes?

Please let me know if any solution exists. I want to use the PICK-pytorch model in our product (after all this progress training and testing it on my custom dataset), but the barrier is passing boxes_and_transcripts to test.py.

Hoping for a reply at the earliest.

regards,
Pushpalatha M

@ziodos

ziodos commented Sep 30, 2021

The model takes the image together with the bounding boxes and the corresponding transcripts as input; you can't rely on the image alone.

@pushpalatha1405
Author

Hi ziodos,

Thanks for replying.

I partially agree with your input: if a user wants to extract a new field that the PICK-pytorch model has not been trained on, its boxes_and_transcripts would indeed have to be passed to test.py.

But how do I get box predictions out of the trained PICK-pytorch model? Please let me know whether there is any way I can extend the script to obtain the predicted bounding boxes after training the model.

Hoping for your reply at the earliest.

@pushpalatha1405
Author

hi ziodos,

I did not get your input on the question I asked above. Please let me know if any possibility exists; I will even write the code myself if it does.

Pushpa.

@mrtranducdung

Hi authors,
I have a similar question. If we already have the boxes and transcripts, why do we need to run the PICK model? All the necessary information is in the boxes and transcripts (box, text, class).
So could you please explain whether there is any way to run prediction with only an input image?

@tengerye
Collaborator

For OCR (including detection and recognition):
image -> boxes and transcripts;

For layout analysis (e.g., PICK):
image, boxes, and transcripts -> a label for each box.
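
In other words, an end-to-end pipeline just chains the two stages. A minimal sketch, assuming hypothetical run_ocr/run_pick wrappers (these are placeholder names, not functions provided by PICK-pytorch):

```python
# Hypothetical composition of the two stages above; run_ocr and run_pick
# are placeholders, not actual PICK-pytorch APIs.
def run_ocr(image_path):
    """Stage 1 (detection + recognition): image -> boxes and transcripts."""
    raise NotImplementedError("plug in pytesseract, EasyOCR, etc.")

def run_pick(image_path, boxes, transcripts):
    """Stage 2 (PICK): image, boxes, transcripts -> one label per box."""
    raise NotImplementedError("wrap the trained PICK model / test.py here")

def end_to_end(image_path):
    boxes, transcripts = run_ocr(image_path)
    labels = run_pick(image_path, boxes, transcripts)
    return list(zip(boxes, transcripts, labels))
```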

@mrtranducdung

Hi tengerye,
I got it, thank you for your explanation. I thought we needed the image, boxes, transcripts, and class (such as company, address, ...) for testing the PICK model. If the classes are not required, then it is fine now.
Thank you very much.

@pushpalatha1405
Author

hi mrtranducdung,

The PICK model does not run inference automatically from just an image as input; the corresponding bbox and text transcript annotations must be provided along with the image at prediction time. There is no need to give the class label, though: the model predicts the class label automatically.

The only way to predict from just an input image is to apply OCR techniques first. I have implemented inference from just the input image this way.

Please refer to the notebook below for implementing automatic inference; see its inference code section.
layoutlm_preprocess.txt
layoutlm_inference.txt

https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb#scrollTo=vm3sGnBsL64o
OR, if you cannot access the notebook, search for Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb.

I am also sending the code I implemented; it follows most of the logic from the notebook above to perform automatic inference with the LayoutLM transformer model, so you can adapt the logic similarly for the PICK transformer model. A minimal sketch of the inference step is below.
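
For reference, a minimal sketch of that inference step using the Hugging Face transformers LayoutLM classes; the checkpoint path is a placeholder, and the example words/boxes stand in for output from an OCR step:

```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMForTokenClassification

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
# "path/to/finetuned-checkpoint" is a placeholder for your own fine-tuned model
model = LayoutLMForTokenClassification.from_pretrained("path/to/finetuned-checkpoint")
model.eval()

# words and boxes come from an OCR step; boxes must be normalized to 0-1000
words = ["Invoice", "No.", "12345"]  # example values
boxes = [[70, 60, 160, 82], [165, 60, 200, 82], [205, 60, 280, 82]]

# align one box to every sub-word token
tokens, token_boxes = [], []
for word, box in zip(words, boxes):
    word_tokens = tokenizer.tokenize(word)
    tokens.extend(word_tokens)
    token_boxes.extend([box] * len(word_tokens))

# add [CLS]/[SEP] with their conventional boxes, as in the notebook
input_ids = tokenizer.convert_tokens_to_ids(
    [tokenizer.cls_token] + tokens + [tokenizer.sep_token])
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

with torch.no_grad():
    outputs = model(input_ids=torch.tensor([input_ids]),
                    bbox=torch.tensor([token_boxes]))
predicted_label_ids = outputs.logits.argmax(-1).squeeze().tolist()
# map ids back to label names via the id2label mapping saved with the checkpoint
```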

regards,
Pushpalatha M

@mrtranducdung

Hi pushpalatha1405,
Thank you very much for your reply. I thought we needed the class labels for testing the PICK model. If only box and text annotations are required, I can create them with my OCR model.
Thank you very much.

@pushpalatha1405
Author

hi mrtranducdung,

Could you give a bit more detail on the OCR model you are using to get the (bbox, transcripts) at the prediction stage, if you are okay with sharing?

regards,
Pushpalatha M

@mrtranducdung

Hi pushpalatha,
You can use pytesseract to get the bounding boxes and transcripts, then re-arrange the boxes and transcripts according to the test example of PICK.
Here are some pytesseract commands you may need (a conversion sketch follows the list):

  • pytesseract.image_to_data(Image.open('your image')) --> returns word-level boxes, confidences, and line/word numbers
  • pytesseract.image_to_string(Image.open('your image')) --> returns the transcript
  • pytesseract.image_to_boxes(Image.open('your image')) --> returns character-level boxes
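
For example, a minimal conversion sketch using the word-level output of image_to_data; double-check the exact column order against the boxes_and_transcripts examples in the PICK-pytorch repo:

```python
import csv

import pytesseract
from PIL import Image
from pytesseract import Output


def image_to_pick_annotation(image_path, out_path):
    """Write one line per OCR word, roughly in the
    index,x1,y1,x2,y2,x3,y3,x4,y4,transcript layout that PICK's
    boxes_and_transcripts files use for inference; verify the exact
    column order against the PICK-pytorch data examples."""
    data = pytesseract.image_to_data(Image.open(image_path),
                                     output_type=Output.DICT)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        index = 1
        for text, left, top, width, height in zip(
                data["text"], data["left"], data["top"],
                data["width"], data["height"]):
            if not text.strip():
                continue  # pytesseract emits empty tokens for layout elements
            x1, y1, x2, y2 = left, top, left + width, top + height
            # expand the axis-aligned box to the 4-point quad PICK expects
            writer.writerow([index, x1, y1, x2, y1, x2, y2, x1, y2, text])
            index += 1
```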

@pushpalatha1405
Author

Got it! Thanks very much, mrtranducdung.

regards,
Pushpalatha M

@pushpalatha1405
Author

hi mrtranducdung,

I am revisiting this issue, where you use the Tesseract OCR model to extract bboxes and transcripts and then convert them to the PICK annotation format, which can then be used for automatic model inference.

My questions are:

a) Document complexity: what if the document has a very complex structure, like a utility bill? Can PICK really predict the fields appropriately if the above logic is used? If annotations are created in some form for these n utility bills (word-wise or sentence-wise, given the complex document structure and the huge number of documents), would you still recommend the above solution of applying the OCR model, getting the boxes and transcripts, converting them to the PICK annotation format, and performing automatic model inference? How well does it work?

Please share your experience on this; I need your input because I am building a fully robust automatic inference pipeline using pytesseract or the EasyOCR model, but I have ended up in a dilemma because of the complex document structures.

If any other solution exists for building automatic inference with PICK, please share it.

Awaiting your reply.

regards,
Pushpalatha M
