feat: support kb-scope index building #366
Merged
ref #327
This PR adds support for multiple knowledge bases (KBs) to tidb.ai.
Background
Because each knowledge base can be configured with a different embedding model, TiDBVectorStore and TiDBGraphStore may need to store vectors of varying dimensions generated by these models.
However, the vector index feature requires a fixed vector dimension for the vector column, so vectors of different dimensions cannot be stored in a single table.
These two requirements conflict.
To address this, we plan to physically partition the index data for each knowledge base. Specifically, each KB will have its own tables, such as chunks_{kb_id}, entities_{kb_id}, and relationships_{kb_id}.
Implementation
Define table schema
For the index-related tables, we continue to use SQLModel to define the table schema and perform queries.
The main challenge is:
The SQLModel classes need to be defined dynamically based on each KB's configuration and mapped to different tables.
Some solutions:
❌ Create a base class (e.g., ChunkBase) for the common table schema, then dynamically create subclasses using type() and override the __tablename__ and embedding fields based on the KB configuration.
(This approach may cause errors during SQLModel initialization due to non-standard SQLAlchemy types, and classes with the same name are not allowed in the same SQLAlchemy registry.)
✅ Create a separate registry for each KB (like a namespace) to avoid class name conflicts. When dynamically generating the SQLModel class for a KB, pass in the appropriate registry.
(In SQLModel, passing the registry parameter marks the class as __abstract__, allowing SQLModel to skip registration.)
You can obtain the dynamically created SQLModel classes with the get_kb_chunk_model, get_kb_entity_model, and get_kb_relationship_model helper methods.
Perform Queries
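The models passed to the stores come from per-KB registries. That namespace idea can be illustrated without any ORM. The sketch below is library-free Python, not SQLModel or SQLAlchemy code: ToyRegistry, build_chunk_model, table_name, and embedding_dimension are hypothetical names chosen for illustration, and the real helpers in this PR may be structured differently. It only shows why one registry per KB lets every KB own a class with the same name but a different table and vector dimension.

```python
from typing import Dict, Type


class ToyRegistry:
    """Hypothetical stand-in for an ORM registry: class names must be unique in it."""

    def __init__(self) -> None:
        self.classes: Dict[str, type] = {}

    def register(self, cls: type) -> type:
        if cls.__name__ in self.classes:
            raise ValueError(f"class {cls.__name__!r} is already registered")
        self.classes[cls.__name__] = cls
        return cls


class ChunkBase:
    """Common schema shared by every KB (illustrative attributes only)."""

    table_name: str = "chunks"
    embedding_dimension: int = 0


# One registry and one cached model per KB, keyed by kb_id.
_registries: Dict[int, ToyRegistry] = {}
_chunk_models: Dict[int, Type[ChunkBase]] = {}


def build_chunk_model(kb_id: int, embedding_dimension: int) -> Type[ChunkBase]:
    """Create (or reuse) the chunk model for one KB inside that KB's own registry."""
    if kb_id in _chunk_models:
        return _chunk_models[kb_id]
    registry = _registries.setdefault(kb_id, ToyRegistry())
    model = type(
        "Chunk",  # the same class name is used for every KB
        (ChunkBase,),
        {
            "table_name": f"chunks_{kb_id}",
            "embedding_dimension": embedding_dimension,
        },
    )
    registry.register(model)
    _chunk_models[kb_id] = model
    return model


chunk_kb1 = build_chunk_model(1, 1536)
chunk_kb2 = build_chunk_model(2, 768)
```

Both classes are named Chunk, yet they do not collide because each lives in its own registry. Registering both in a single ToyRegistry raises ValueError, which mirrors the name conflict that ruled out the first approach. Caching by kb_id assumes a KB's embedding dimension does not change after its tables are created.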
When initializing a *Store class, you can pass in the dynamically created SQLModel class; otherwise the legacy SQLModel class is used by default. Subsequent queries run against the table that corresponds to that class.
TiDBVectorStore
TiDBGraphStore
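The "pass a model, fall back to the legacy one" pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual TiDBVectorStore or TiDBGraphStore API: ChunkStoreSketch, LegacyChunk, KB1Chunk, and the table_name attribute are hypothetical, and an in-memory SQLite database stands in for TiDB so the example runs with the standard library alone.

```python
import sqlite3
from typing import List, Type


class LegacyChunk:
    """Stand-in for the legacy model that maps to the shared table."""

    table_name = "chunks"


class KB1Chunk:
    """Stand-in for a dynamically created model of the KB with id 1."""

    table_name = "chunks_1"


class ChunkStoreSketch:
    """Hypothetical store: every query targets the table of the model it was given."""

    def __init__(self, conn: sqlite3.Connection, chunk_model: Type = LegacyChunk) -> None:
        self._conn = conn
        self._table = chunk_model.table_name

    def add(self, text: str) -> None:
        # The table name comes from trusted model classes, never from user input.
        self._conn.execute(f"INSERT INTO {self._table} (text) VALUES (?)", (text,))

    def all_texts(self) -> List[str]:
        rows = self._conn.execute(f"SELECT text FROM {self._table} ORDER BY id")
        return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
for model in (LegacyChunk, KB1Chunk):
    conn.execute(f"CREATE TABLE {model.table_name} (id INTEGER PRIMARY KEY, text TEXT)")

legacy_store = ChunkStoreSketch(conn)         # no model passed: legacy table
kb1_store = ChunkStoreSketch(conn, KB1Chunk)  # per-KB table, isolated from the rest
legacy_store.add("legacy chunk")
kb1_store.add("kb 1 chunk")
```

Because the store resolves its table once from the model it receives, existing callers that pass nothing keep reading and writing the legacy table, while KB-aware callers are isolated in their own tables.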
Retrieve
ChatEngineConfig (first stage; retrieval across multiple KBs will be supported in the future)
UI