feat: support kb-scope index building #366

Merged on Nov 20, 2024 (52 commits)

@Mini256 Mini256 commented Nov 7, 2024

ref #327

This PR adds support for multiple knowledge bases to tidb.ai.

Background

Because each knowledge base can be configured with a different embedding model, TiDBVectorStore and TiDBGraphStore may need to store vectors of varying dimensions generated by these models.

However, the vector index feature requires a fixed vector dimension for the vector column, so vectors of different dimensions cannot be stored in a single table.

These two requirements conflict.

To address this, we plan to physically partition index data for each knowledge base. Specifically, each KB will have its own tables such as chunks_{kb_id}, entities_{kb_id}, and relationships_{kb_id}.
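As a minimal sketch of this per-KB layout, the helper below derives the three table names from a KB id and pairs them with a fixed-dimension vector column type. The names `KBIndexTables` and `kb_index_tables`, the integer KB id, and the `VECTOR(n)` column type string are assumptions made for illustration; they are not taken from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KBIndexTables:
    """Physical index tables for one knowledge base (hypothetical type)."""
    chunks: str
    entities: str
    relationships: str
    vector_column_type: str


def kb_index_tables(kb_id: int, embedding_dims: int) -> KBIndexTables:
    # Hypothetical helper: each KB gets its own tables, so each table can
    # declare the single vector dimension of that KB's embedding model.
    if kb_id <= 0 or embedding_dims <= 0:
        raise ValueError("kb_id and embedding_dims must be positive")
    return KBIndexTables(
        chunks=f"chunks_{kb_id}",
        entities=f"entities_{kb_id}",
        relationships=f"relationships_{kb_id}",
        vector_column_type=f"VECTOR({embedding_dims})",  # assumed type syntax
    )
```

With this shape, two KBs that use 1536- and 768-dimensional embeddings end up in separate tables, each with one fixed dimension.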

Implementation

Define table schema

For the index-related tables, we continue to use SQLModel to define the schema and perform queries.

The main challenge is:

The SQLModel classes need to be defined dynamically based on each KB's configuration and mapped to different tables.

Some solutions:

  1. ❌ Create a base class (e.g., ChunkBase) for the common table schema, then dynamically create subclasses using type() and override the __tablename__ and embedding fields based on the KB configuration.

    (This approach may cause errors during SQLModel initialization because of non-standard SQLAlchemy types, and classes with the same name are not allowed in the same SQLAlchemy registry.)

  2. ✅ Create a separate registry for each KB (like a namespace) to avoid class name conflicts. When dynamically generating the SQLModel class for a KB, pass in the appropriate registry.

    (In SQLModel, passing the registry parameter marks the class as __abstract__, which lets SQLModel skip registration.)

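    from typing import Dict

    from sqlalchemy.orm import registry as RegistryType  # assumed import
    from sqlmodel.main import default_registry  # assumed import

    class KBSQLModelContext:
        """Holds the registry used for one KB's dynamically created models."""

        def __init__(self, registry: RegistryType = default_registry):
            self.registry = registry

    # One context per knowledge base; defined after the class so the annotation resolves.
    kb_sql_model_contexts: Dict[str, KBSQLModelContext] = {}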

You can obtain the dynamically created SQLModel classes through the get_kb_chunk_model / get_kb_entity_model / get_kb_relationship_model helper methods.
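The registry-per-KB idea can be illustrated with a toy model in plain Python. This is not SQLAlchemy's registry or the PR's implementation: `ToyRegistry`, the `ChunkBase` fields, the integer KB ids, and the `get_kb_chunk_model(kb_id, vector_dims)` signature are all assumptions. The sketch only shows why one namespace per KB lets every KB own a class with the same name, and how a helper might cache the generated class.

```python
from typing import Dict, Type


class ToyRegistry:
    """Stand-in for a registry: class names must be unique within it."""

    def __init__(self) -> None:
        self.classes: Dict[str, type] = {}

    def register(self, cls: type) -> None:
        if cls.__name__ in self.classes:
            raise ValueError(f"class {cls.__name__!r} already registered")
        self.classes[cls.__name__] = cls


class ChunkBase:
    """Shared attributes for every per-KB chunk model (toy version)."""
    __tablename__ = "chunks"
    vector_dims = 0


_registries: Dict[int, ToyRegistry] = {}
_chunk_models: Dict[int, Type[ChunkBase]] = {}


def get_kb_chunk_model(kb_id: int, vector_dims: int) -> Type[ChunkBase]:
    # Return the cached model if this KB already has one.
    if kb_id in _chunk_models:
        return _chunk_models[kb_id]
    # One registry per KB, so every KB can own a class named "Chunk".
    registry = _registries.setdefault(kb_id, ToyRegistry())
    model = type(
        "Chunk",
        (ChunkBase,),
        {"__tablename__": f"chunks_{kb_id}", "vector_dims": vector_dims},
    )
    registry.register(model)
    _chunk_models[kb_id] = model
    return model
```

Registering the two generated `Chunk` classes in one shared `ToyRegistry` raises `ValueError`, which mirrors the name conflict described for the first approach.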

Perform Queries

When initializing a *Store class, you can pass in a dynamically created SQLModel class; otherwise the legacy SQLModel class is used by default. Subsequent queries run against the table that the given model maps to.

TiDBVectorStore

class TiDBVectorStore(BasePydanticVectorStore):
    def __init__(
        ...
        chunk_db_model: SQLModel = Chunk
    ) -> None:

TiDBGraphStore

class TiDBGraphStore(KnowledgeGraphStore):
    def __init__(
        ...
        entity_db_model: SQLModel = DBEntity,
        relationship_db_model: SQLModel = DBRelationship,
    ):
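The constructor pattern in the two signatures above can be sketched without any database. The classes below are stand-ins, not the real TiDBVectorStore or the real models: `LegacyChunk`, `KB1Chunk`, `ToyVectorStore`, and `count_sql` are hypothetical names used only to show that the injected model class decides which table a query targets.

```python
from typing import Type


class LegacyChunk:
    """Stand-in for the legacy Chunk model."""
    __tablename__ = "chunks"


class KB1Chunk:
    """Stand-in for a dynamically created per-KB model."""
    __tablename__ = "chunks_1"


class ToyVectorStore:
    def __init__(self, chunk_db_model: Type = LegacyChunk) -> None:
        # The model class decides which table later queries touch.
        self._chunk_db_model = chunk_db_model

    def count_sql(self) -> str:
        # Illustrative query only; a real store would query through SQLModel.
        return f"SELECT COUNT(*) FROM {self._chunk_db_model.__tablename__}"
```

Callers that pass nothing keep hitting the legacy table, while callers that pass a per-KB model are routed to that KB's table, which matches the default-argument design in the signatures above.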

Retrieve

  • Support linking one knowledge base to a chat engine through ChatEngineConfig (first stage; retrieval across multiple KBs will be supported in the future)
  • Keep the old API temporarily

UI

[Three screenshots]
