Producing structured results (e.g., to populate a database or spreadsheet) from open-ended research (e.g., web research) is a common use case that LLM-powered agents are well-suited to handle. Here, we provide a general template for a "data enrichment agent" and customize it to produce structured outputs on companies. It contains an example graph, exported from `src/enrichment_agent/graph.py`, that implements a research assistant capable of automatically gathering information on various topics from the web and structuring the results into a user-defined JSON format.
The enrichment agent defined in `src/enrichment_agent/graph.py` performs the following steps:
- Takes a research topic (here, a list of companies) and a requested `extraction_schema` as input
- Searches the web for relevant information about these companies in parallel
- Reads and extracts key details from websites
- Organizes the findings into the requested structured format
- Validates the gathered information for completeness and accuracy
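For example, a minimal programmatic invocation might look like the sketch below. It assumes the package is installed so the compiled graph is importable as `enrichment_agent.graph.graph`, and that the input state uses the `topic` and `extraction_schema` keys described above; the `info` key used to read the result is an assumption for illustration.

```python
# Minimal invocation sketch. The import path and the "info" result key are
# assumptions for illustration; adjust them to match your checkout.
import asyncio

from enrichment_agent.graph import graph

schema = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string", "description": "Official name of the company"},
        "main_product": {"type": "string"},
    },
}


async def main() -> None:
    result = await graph.ainvoke(
        {"topic": "Top 5 chip providers for LLM Training", "extraction_schema": schema}
    )
    print(result.get("info"))  # structured findings, if stored under "info"


asyncio.run(main())
```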
Assuming you have already installed LangGraph Studio, to set up:
- Create a `.env` file: `cp .env.example .env`
- Define required API keys in your `.env` file.
The primary search tool used is Tavily. Create an API key here.
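If you keep the default Tavily search tool, it typically reads its key from the `TAVILY_API_KEY` environment variable (variable name assumed here), so add it to your `.env` file as well:
TAVILY_API_KEY=your-api-key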
The default value for `model` is `anthropic/claude-3-5-sonnet-20240620`.
Follow the instructions below to get set up, or pick one of the additional options.
To use Anthropic's chat models:
- Sign up for an Anthropic API key if you haven't already.
- Once you have your API key, add it to your `.env` file:
ANTHROPIC_API_KEY=your-api-key
To use OpenAI's chat models:
- Sign up for an OpenAI API key.
- Once you have your API key, add it to your `.env` file:
OPENAI_API_KEY=your-api-key
- Consider a research topic and desired extraction schema.
As an example, here is a research topic we can consider:
"Top 5 chip providers for LLM Training"
And here is a desired extraction schema (pasted in as `extraction_schema`):
```json
{
"type": "object",
"properties": {
"company_name": {
"type": "string",
"description": "Official name of the company"
},
"founding": {
"type": "object",
"properties": {
"year": {"type": "integer"},
"founders": {
"type": "array",
"items": {"type": "string"},
"description": "Names of the founding team members"
}
}
},
"product_info": {
"type": "object",
"properties": {
"main_product": {"type": "string"},
"description": {"type": "string"},
"key_features": {
"type": "array",
"items": {"type": "string"}
},
"target_users": {
"type": "array",
"items": {"type": "string"},
"description": "Primary user groups (e.g., 'ML Engineers', 'Data Scientists')"
}
}
},
"technical_details": {
"type": "object",
"properties": {
"programming_languages": {
"type": "array",
"items": {"type": "string"}
},
"open_source": {"type": "boolean"},
"github_repo": {"type": "string"},
"license_type": {"type": "string"}
}
},
"business_info": {
"type": "object",
"properties": {
"business_model": {"type": "string"},
"funding_status": {"type": "string"},
"competitors": {
"type": "array",
"items": {"type": "string"}
},
"key_partnerships": {
"type": "array",
"items": {"type": "string"}
}
}
}
}
}
```
- Open the folder in LangGraph Studio, and input `topic` and `extraction_schema`.
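Once a run completes, you can also check the structured output against the schema you provided, mirroring the agent's own validation step. Below is a minimal sketch using the third-party `jsonschema` package; the package, the tiny schema, and the `result_info` stand-in are illustrative, not part of this template.

```python
# Sketch: validate the agent's structured output against an extraction schema.
# Assumes `jsonschema` is installed (pip install jsonschema) and that the graph
# returns its findings under an "info" key (an assumption for illustration).
from jsonschema import ValidationError, validate

extraction_schema = {
    "type": "object",
    "properties": {"company_name": {"type": "string"}},
    "required": ["company_name"],
}

result_info = {"company_name": "ExampleCorp"}  # stand-in for result["info"]

try:
    validate(instance=result_info, schema=extraction_schema)
    print("Output conforms to the extraction schema.")
except ValidationError as err:
    print(f"Schema validation failed: {err.message}")
```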
- Customize research targets: Provide a custom JSON `extraction_schema` when calling the graph to gather different types of information.
- Select a different model: We default to Anthropic (Claude 3.5 Sonnet). You can select a compatible chat model using `provider/model-name` via configuration. Example: `openai/gpt-4o-mini`.
- Customize the prompt: We provide a default prompt in prompts.py. You can easily update this via configuration.
For quick prototyping, these configurations can be set in the Studio UI; a sketch of the programmatic route is shown below.
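Configuration values can typically be passed through the `configurable` field of the run config. In the sketch below, the `model` and `prompt` field names are assumptions based on the defaults shown above; adjust them to match the template's configuration class.

```python
# Configuration sketch: field names are assumptions based on the defaults above.
import asyncio

from enrichment_agent.graph import graph

config = {
    "configurable": {
        "model": "openai/gpt-4o-mini",  # provider/model-name
        # "prompt": "...",              # optionally override the default prompt
    }
}

result = asyncio.run(
    graph.ainvoke(
        {
            "topic": "Top 5 chip providers for LLM Training",
            "extraction_schema": {"type": "object", "properties": {}},
        },
        config=config,
    )
)
print(result)
```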
You can also quickly extend this template by:
- Adding new tools and API connections in `tools.py`. These can be any Python functions (a hypothetical example is sketched after this list).
- Adding additional steps in `graph.py`.
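For instance, a new tool might look like the sketch below. The function name, return format, and the use of `httpx` are hypothetical choices for illustration, not part of the template.

```python
# Hypothetical extra tool for src/enrichment_agent/tools.py.
# Assumes `httpx` is available; the GitHub endpoint used here is public.
import httpx


async def fetch_github_repo_stats(repo: str) -> dict:
    """Return basic public stats for a GitHub repository given "owner/name"."""
    async with httpx.AsyncClient() as client:
        response = await client.get(f"https://api.github.com/repos/{repo}")
        response.raise_for_status()
        data = response.json()
    return {
        "stars": data.get("stargazers_count"),
        "forks": data.get("forks_count"),
        "open_issues": data.get("open_issues_count"),
    }
```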
While iterating on your graph, you can edit past state and rerun your app from past states to debug specific nodes. Local changes will be automatically applied via hot reload. Try adding an interrupt before the agent calls tools (see the sketch below), updating the default system message in `src/enrichment_agent/utils.py` to take on a persona, or adding additional nodes and edges!
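As a self-contained sketch of the interrupt idea, a graph can be compiled with `interrupt_before` on its tool-calling node. The node name `"tools"` and the toy graph below are assumptions; in this template you would apply `interrupt_before` to the existing builder in `src/enrichment_agent/graph.py`.

```python
# Sketch: pause execution before a "tools" node. A checkpointer is required for
# interrupts when running outside LangGraph Studio / LangGraph Platform.
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph


class State(TypedDict):
    topic: str


def agent(state: State) -> State:
    return state  # placeholder for the model-calling node


def tools(state: State) -> State:
    return state  # placeholder for the tool-executing node


builder = StateGraph(State)
builder.add_node("agent", agent)
builder.add_node("tools", tools)
builder.add_edge(START, "agent")
builder.add_edge("agent", "tools")
builder.add_edge("tools", END)

graph = builder.compile(interrupt_before=["tools"], checkpointer=MemorySaver())
```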
Follow-up requests will be appended to the same thread. You can create an entirely new thread, clearing previous history, using the + button in the top right.
You can find the latest (under construction) docs on LangGraph here, including examples and other references. Using those guides can help you pick the right patterns to adapt here for your use case.
LangGraph Studio also integrates with LangSmith for more in-depth tracing and collaboration with teammates.
We can also interact with the graph using the LangGraph API.
See `ntbk/testing.ipynb` for an example of how to do this.
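A minimal sketch using the `langgraph_sdk` client is shown below. The server URL and the assistant name (`"agent"`) are assumptions; check `langgraph.json` for the graph name registered in your deployment.

```python
# Sketch: call a locally running LangGraph server via the LangGraph API.
# The URL and assistant name are assumptions for illustration.
import asyncio

from langgraph_sdk import get_client


async def main() -> None:
    client = get_client(url="http://localhost:2024")
    thread = await client.threads.create()
    result = await client.runs.wait(
        thread["thread_id"],
        "agent",  # assistant / graph name from langgraph.json
        input={
            "topic": "Top 5 chip providers for LLM Training",
            "extraction_schema": {"type": "object", "properties": {}},
        },
    )
    print(result)


asyncio.run(main())
```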
LangGraph Cloud (see here) makes it possible to deploy the agent.