The graphrag-toolkit is a Python toolkit for building GraphRAG applications. It provides a framework for automating the construction of a graph from unstructured data, and for composing question-answering strategies that query this graph to answer user questions.
The toolkit uses low-level LlamaIndex components – data connectors, metadata extractors, and transforms – to implement much of the graph construction process. By default, the toolkit uses Amazon Neptune Analytics or Amazon Neptune Database for its graph store, and Neptune Analytics or Amazon OpenSearch Serverless for its vector store, but it also provides extensibility points for adding alternative graph stores and vector stores. The default backend for LLMs and embedding models is Amazon Bedrock. As with the stores, the toolkit can be configured for other LLM and embedding model backends using LlamaIndex abstractions.
If you're running on AWS, there's a quick start AWS CloudFormation template in the examples directory. Note that you must run your application in an AWS region containing the Amazon Bedrock foundation models used by the toolkit (see the configuration section in the documentation for details on the default models used), and must enable access to these models before running any part of the solution.
Installing the graphrag-toolkit requires Python and pip. Install the toolkit with pip:
$ pip install https://github.com/awslabs/graphrag-toolkit/archive/refs/tags/v1.1.3.zip
The graphrag-toolkit requires Python 3.10 or greater.
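The version requirement can be checked before installing or running anything. The following is a minimal sketch using only the standard library; the names `MINIMUM_PYTHON`, `meets_minimum_python`, and `require_minimum_python` are hypothetical helpers for illustration and are not part of the graphrag-toolkit.

```python
import sys

# Minimum interpreter version stated in this README.
MINIMUM_PYTHON = (3, 10)


def meets_minimum_python(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if version_info is at least `minimum`.

    Hypothetical helper, not part of graphrag-toolkit. When version_info
    is None, the running interpreter's version is used.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= tuple(minimum)


def require_minimum_python(version_info=None, minimum=MINIMUM_PYTHON):
    """Raise RuntimeError if the interpreter is older than `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    if not meets_minimum_python(version_info, minimum):
        found = '.'.join(str(part) for part in version_info[:3])
        wanted = '.'.join(str(part) for part in minimum)
        raise RuntimeError(
            f'graphrag-toolkit requires Python {wanted} or greater; found {found}'
        )
```

Calling `require_minimum_python()` at the top of a script fails early with a readable message instead of a later import or syntax error.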
# Indexing example: extract a lexical graph from web pages and build the graph and vector indexes.
from graphrag_toolkit import LexicalGraphIndex
from graphrag_toolkit.storage import GraphStoreFactory
from graphrag_toolkit.storage import VectorStoreFactory

from llama_index.readers.web import SimpleWebPageReader

import nest_asyncio
nest_asyncio.apply()  # allow nested asyncio event loops (for example, in a notebook)

def run_extract_and_build():

    # Replace the example endpoints below with your own graph store and vector store.
    graph_store = GraphStoreFactory.for_graph_store(
        'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
    )

    vector_store = VectorStoreFactory.for_vector_store(
        'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'
    )

    graph_index = LexicalGraphIndex(
        graph_store,
        vector_store
    )

    doc_urls = [
        'https://docs.aws.amazon.com/neptune/latest/userguide/intro.html',
        'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html',
        'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-features.html',
        'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-vs-neptune-database.html'
    ]

    # Load each page as text and record its source URL as metadata.
    docs = SimpleWebPageReader(
        html_to_text=True,
        metadata_fn=lambda url: {'url': url}
    ).load_data(doc_urls)

    graph_index.extract_and_build(docs, show_progress=True)

if __name__ == '__main__':
    run_extract_and_build()
# Querying example: answer a question using the graph and vector indexes built above.
from graphrag_toolkit import LexicalGraphQueryEngine
from graphrag_toolkit.storage import GraphStoreFactory
from graphrag_toolkit.storage import VectorStoreFactory

import nest_asyncio
nest_asyncio.apply()  # allow nested asyncio event loops (for example, in a notebook)

def run_query():

    # Use the same graph store and vector store that were populated during indexing.
    graph_store = GraphStoreFactory.for_graph_store(
        'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
    )

    vector_store = VectorStoreFactory.for_vector_store(
        'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'
    )

    query_engine = LexicalGraphQueryEngine.for_traversal_based_search(
        graph_store,
        vector_store
    )

    response = query_engine.query(
        'What are the differences between Neptune Database and Neptune Analytics?'
    )

    print(response.response)

if __name__ == '__main__':
    run_query()
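The store connection strings in the examples above share a visible shape: a store-type prefix (`neptune-db` or `aoss`), then `://`, then the endpoint. The sketch below only illustrates that shape so a malformed string can be caught before it reaches a factory. `split_store_connection` is a hypothetical helper, and how the toolkit itself parses these strings is not described here, so treat this as an assumption rather than the library's behavior.

```python
def split_store_connection(connection_info):
    """Split a 'prefix://endpoint' string into (prefix, endpoint).

    Hypothetical helper for illustration only; not the toolkit's parser.
    Only the first '://' is treated as the separator, so an endpoint that
    is itself a URL (as in the 'aoss://https://...' example) stays intact.
    """
    prefix, separator, endpoint = connection_info.partition('://')
    if not separator or not prefix or not endpoint:
        raise ValueError(
            f"expected '<store-type>://<endpoint>', got {connection_info!r}"
        )
    return prefix, endpoint


# Connection strings copied from the examples in this README.
graph_prefix, graph_endpoint = split_store_connection(
    'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
)
vector_prefix, vector_endpoint = split_store_connection(
    'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'
)
```

Splitting on the first separator only is the design choice that matters here: the OpenSearch Serverless example embeds a full `https://` URL after the prefix, and a naive split on every `://` would break it apart.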
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.