This repo demonstrates how to use the Unstructured library with Weaviate. Unstructured parses a variety of data sources and extracts structured text from them, including, but not limited to, PDFs, PowerPoint presentations, and JPEG images.
The included dataset consists of two publicly available research papers: one uses a single-column layout and the other a two-column layout. The notebook starts with a basic approach to using Unstructured and ends with an end-to-end example that connects to your Weaviate instance, defines a schema, uploads the data, and runs two queries.
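As a rough orientation, the flow described above might look like the sketch below. It is not the notebook's code. It assumes Unstructured's `partition_pdf` function and the v4 Weaviate Python client (`weaviate.connect_to_local`) with a local instance; the notebook may use a different client version. The helper names (`parse_pdf`, `elements_to_objects`, `upload`), the `Document` collection name, and the property names are hypothetical. The stand-in elements at the bottom only mimic the `category` and `text` attributes of parsed elements so the conversion step can be tried without either library installed.

```python
from types import SimpleNamespace


def parse_pdf(path):
    """Partition a PDF into Unstructured elements (titles, narrative text, ...)."""
    # Imported lazily so the rest of this sketch runs without the library.
    from unstructured.partition.pdf import partition_pdf

    return partition_pdf(filename=path)


def elements_to_objects(elements, source):
    """Turn parsed elements into plain dicts ready for upload (hypothetical helper)."""
    objects = []
    for element in elements:
        text = (element.text or "").strip()
        if not text:
            continue  # skip elements that carry no text
        objects.append(
            {"source": source, "category": element.category, "text": text}
        )
    return objects


def upload(objects, collection_name="Document"):
    """Create a collection and insert the objects (assumes the v4 Python client)."""
    import weaviate

    client = weaviate.connect_to_local()  # assumes Weaviate is running locally
    try:
        collection = client.collections.create(collection_name)
        collection.data.insert_many(objects)
    finally:
        client.close()


# Stand-in elements that mimic what a parser returns, for a dry run.
fake_elements = [
    SimpleNamespace(category="Title", text="A Sample Paper"),
    SimpleNamespace(category="NarrativeText", text="   "),
    SimpleNamespace(category="NarrativeText", text="We study parsing."),
]
demo_objects = elements_to_objects(fake_elements, source="sample.pdf")
```

With both libraries installed and Weaviate running, `upload(elements_to_objects(parse_pdf("paper.pdf"), "paper.pdf"))` would chain the three steps, where `paper.pdf` is a placeholder path.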
Read the blog post for more information!