/app $ python3 docker_test/langchain_unstructured_test_pdf_to_text.py
INFO: pikepdf C++ to Python logger bridge initialized
INFO: Reading PDF for file: /app/example-docs/pdf/2409.12431v1.pdf ...
Traceback (most recent call last):
File "/app/docker_test/langchain_unstructured_test_pdf_to_text.py", line 19, in <module>
docs = load_and_process_pdf_structured(pdf_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/docker_test/langchain_unstructured_test_pdf_to_text.py", line 11, in load_and_process_pdf_structured
for doc in loader.lazy_load():
File "/home/notebook-user/.local/lib/python3.11/site-packages/langchain_unstructured/document_loaders.py", line 178, in lazy_load
yield from load_file(f=self.file, f_path=self.file_path)
File "/home/notebook-user/.local/lib/python3.11/site-packages/langchain_unstructured/document_loaders.py", line 212, in lazy_load
else self._elements_json
^^^^^^^^^^^^^^^^^^^
File "/home/notebook-user/.local/lib/python3.11/site-packages/langchain_unstructured/document_loaders.py", line 231, in _elements_json
return self._convert_elements_to_dicts(self._elements_via_local)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/notebook-user/.local/lib/python3.11/site-packages/langchain_unstructured/document_loaders.py", line 249, in _elements_via_local
return partition(
^^^^^^^^^^
File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured/partition/auto.py", line 342, in partition
elements = partition_pdf(
^^^^^^^^^^^^^^
File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured/documents/elements.py", line 605, in wrapper
elements = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured/file_utils/filetype.py", line 731, in wrapper
elements = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured/file_utils/filetype.py", line 687, in wrapper
elements = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured/chunking/dispatch.py", line 74, in wrapper
elements = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 205, in partition_pdf
return partition_pdf_or_image(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 307, in partition_pdf_or_image
elements = _partition_pdf_or_image_local(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured/utils.py", line 217, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 720, in _partition_pdf_or_image_local
elements = document_to_element_list(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured/partition/common.py", line 657, in document_to_element_list
add_element_metadata(
TypeError: unstructured.partition.common.add_element_metadata() got multiple values for keyword argument 'coordinates'
OK, I figured it out. You have to remove coordinates=False from the code. The UnstructuredLoader constructor has no coordinates parameter in its API.
I was following LangChain's Unstructured docs, and it seems they haven't been updated. They pass a coordinates=True argument to the UnstructuredLoader constructor, while the LangChain Python API reference for the UnstructuredLoader constructor makes no mention of a coordinates parameter.
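For anyone wondering why an extra constructor argument ends up as "got multiple values for keyword argument", here is a minimal sketch of the mechanism in plain Python. The two functions below are hypothetical stand-ins that only borrow names from the traceback; they are not the actual unstructured source, and the assumption is simply that the library sets coordinates itself while also forwarding the caller's keyword arguments.

```python
def add_element_metadata(element, coordinates=None, **kwargs):
    """Hypothetical stand-in for the function named in the traceback."""
    return {"element": element, "coordinates": coordinates, **kwargs}


def document_to_element_list(elements, **kwargs):
    """Hypothetical caller: passes coordinates itself AND forwards user kwargs."""
    return [
        add_element_metadata(el, coordinates=(0, 0), **kwargs)
        for el in elements
    ]


def try_partition(**user_kwargs):
    """Return the TypeError message, or None if the call succeeds."""
    try:
        document_to_element_list(["Title"], **user_kwargs)
    except TypeError as exc:
        return str(exc)
    return None


# A user-supplied coordinates collides with the one set internally.
error_with_flag = try_partition(coordinates=False)
# Without the extra argument the call goes through.
error_without_flag = try_partition()

print(error_with_flag)
print(error_without_flag)
```

If the real code path works along these lines, the value of the flag does not matter: passing coordinates at all, True or False, would trigger the collision, which is consistent with removing the argument making the error go away.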
Describe the bug
To Reproduce
Expected behavior
To run the code without error
Environment Info
I used the official unstructured Docker image on Windows 11 with WSL 2.