[RFC] 075 - Migrating client mode from Dexie DB to PGlite #4868

arvinxx · 2024-12-02T12:49:36Z

arvinxx
Dec 2, 2024
Maintainer

arvinxx · 2024-12-03T04:02:01Z

arvinxx
Dec 3, 2024
Maintainer Author

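Sharing schemas between PGlite and Postgres

Can the schema definitions be used directly with PGlite?

Yes. The existing schema definition code can be used with PGlite as is, for the following reasons:

The import source stays the same

// ✅ Keep the existing imports as they are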
import { integer, jsonb, pgTable, text, uuid, varchar, vector } from 'drizzle-orm/pg-core';

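Full compatibility
- PGlite is a WASM build of PostgreSQL
- It uses the same type system as PostgreSQL
- The pg-core type definitions in Drizzle ORM apply to PGlite unchanged

Usage

// Schema definitions stay unchanged
export const chunks = pgTable('chunks', { ... });

// Only use the PGlite-specific import when connecting to the database
import { drizzle } from 'drizzle-orm/pglite';

So the existing schema definition code needs no changes and can be reused directly.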

0 replies

arvinxx · 2024-12-04T01:32:44Z

arvinxx
Dec 4, 2024
Maintainer Author

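PGlite migration mechanism in the browser

Unlike server-side migrations, PGlite migrations in the browser can only run when the user opens the site, so a runtime migration scheme is needed. It can be implemented by following an official discussion: drizzle-team/drizzle-orm#2532.

The core idea has two steps:

Step1. Compile all the SQL files into a single migrations.json file ahead of time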

import { readMigrationFiles } from 'drizzle-orm/migrator';
import { writeFileSync } from 'node:fs';
import { join } from 'node:path';

const dbBase = join(__dirname, '../../src/database');
const migrationsFolder = join(dbBase, './migrations');
const migrations = readMigrationFiles({ migrationsFolder: migrationsFolder });

writeFileSync(
  join(dbBase, './client/migrations.json'),
  JSON.stringify(migrations, null, 2), // null, 2 adds indentation for better readability
);

console.log('🏁 client migrations.json compiled!');

Example SQL:

ALTER TABLE "messages" ADD COLUMN "client_id" text;--> statement-breakpoint
ALTER TABLE "session_groups" ADD COLUMN "client_id" text;--> statement-breakpoint
ALTER TABLE "sessions" ADD COLUMN "client_id" text;--> statement-breakpoint
ALTER TABLE "topics" ADD COLUMN "client_id" text;--> statement-breakpoint
CREATE INDEX IF NOT EXISTS "messages_client_id_idx" ON "messages" ("client_id");--> statement-breakpoint
ALTER TABLE "messages" ADD CONSTRAINT "messages_client_id_unique" UNIQUE("client_id");--> statement-breakpoint
ALTER TABLE "session_groups" ADD CONSTRAINT "session_groups_client_id_unique" UNIQUE("client_id");--> statement-breakpoint
ALTER TABLE "sessions" ADD CONSTRAINT "sessions_client_id_unique" UNIQUE("client_id");--> statement-breakpoint
ALTER TABLE "topics" ADD CONSTRAINT "topics_client_id_unique" UNIQUE("client_id");

Example JSON output:

{
    "sql": [
      "ALTER TABLE \"messages\" ADD COLUMN \"client_id\" text;",
      "\nALTER TABLE \"session_groups\" ADD COLUMN \"client_id\" text;",
      "\nALTER TABLE \"sessions\" ADD COLUMN \"client_id\" text;",
      "\nALTER TABLE \"topics\" ADD COLUMN \"client_id\" text;",
      "\nCREATE INDEX IF NOT EXISTS \"messages_client_id_idx\" ON \"messages\" (\"client_id\");",
      "\nALTER TABLE \"messages\" ADD CONSTRAINT \"messages_client_id_unique\" UNIQUE(\"client_id\");",
      "\nALTER TABLE \"session_groups\" ADD CONSTRAINT \"session_groups_client_id_unique\" UNIQUE(\"client_id\");",
      "\nALTER TABLE \"sessions\" ADD CONSTRAINT \"sessions_client_id_unique\" UNIQUE(\"client_id\");",
      "\nALTER TABLE \"topics\" ADD CONSTRAINT \"topics_client_id_unique\" UNIQUE(\"client_id\");\n"
    ],
    "bps": true,
    "folderMillis": 1717153686544,
    "hash": "ddb29ee7e7a675c12b44996e4be061b1736e8f785052242801f4cdfb2a94f258"
}

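What these fields mean:

sql: array
- Contains the sequence of SQL statements to execute
- Each array item is one standalone SQL statement
- The statements are executed in order
- The \n line breaks here are only formatting carried over from the source file
bps: boolean
- Short for "breakpoints"
- true means this migration file uses statement breakpoints
- Tells the migrator whether the statements are separated by breakpoints
- When true, the file is expected to separate its statements with --> statement-breakpoint markers
folderMillis: number
- Timestamp of the migration (Unix time, in milliseconds)
- Used to determine the order of migrations
- Also tracks when the migration file was created
hash: string
- SHA-256 hash of the migration content
- Used to verify the integrity of the migration file
- Guards against accidental modification of the migration file
- Can be checked when migrating to detect a changed file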
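As a rough illustration of how the sql array relates to the source file, the sketch below splits raw SQL on the --> statement-breakpoint marker, which matches the shape of the example output above (note the leading \n on later items). The helper name splitStatements and the MigrationEntry type are assumptions made for illustration, not drizzle APIs; the real readMigrationFiles also fills in folderMillis and hash.

```typescript
// Shape of one entry in migrations.json, as shown in the example above.
interface MigrationEntry {
  sql: string[];
  bps: boolean;
  folderMillis: number;
  hash: string;
}

const BREAKPOINT = '--> statement-breakpoint';

// Hypothetical helper: split one migration file into individual statements.
function splitStatements(rawSql: string): string[] {
  return rawSql.split(BREAKPOINT);
}

const rawMigration = [
  'ALTER TABLE "messages" ADD COLUMN "client_id" text;--> statement-breakpoint',
  'ALTER TABLE "topics" ADD COLUMN "client_id" text;',
].join('\n');

// Only the fields this sketch can derive are filled in.
const partialEntry: Pick<MigrationEntry, 'sql' | 'bps'> = {
  sql: splitStatements(rawMigration),
  bps: true,
};

console.log(partialEntry.sql.length); // 2
```

Without any marker the whole file stays a single array item, which is exactly the situation described in the caveat further down.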

Step2. Call the migrate method at runtime to trigger the migration

import { PgDialect } from 'drizzle-orm/pg-core';

import { clientDB } from './db';
import migrations from './migrations.json';

export const migrate = async () => {
  // refs: https://github.com/drizzle-team/drizzle-orm/discussions/2532
  // @ts-ignore
  await clientDB.dialect.migrate(migrations, clientDB.session, {});

  return clientDB;
};

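Caveat: breakpoint statement

There is one big pitfall to watch out for: the migration SQL must contain breakpoint statements.

The first problem I ran into was the error "cannot insert multiple commands into a prepared statement", and I spent hours on it without getting anywhere.

It turned out to be caused by some custom SQL that had been added earlier (SQL generated entirely by drizzle does not hit this problem), shown below: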

-- step 1: create a temporary table to store the rows we want to keep
CREATE TEMP TABLE embeddings_temp AS
SELECT DISTINCT ON (chunk_id) *
FROM embeddings
ORDER BY chunk_id, random();

-- step 2: delete all rows from the original table
DELETE FROM embeddings;

-- step 3: insert the rows we want to keep back into the original table
INSERT INTO embeddings
SELECT * FROM embeddings_temp;

-- step 4: drop the temporary table
DROP TABLE embeddings_temp;

-- step 5: now it's safe to add the unique constraint
ALTER TABLE "embeddings" ADD CONSTRAINT "embeddings_chunk_id_unique" UNIQUE("chunk_id");

If these steps have no --> statement-breakpoint markers between them, drizzle writes the whole SQL block into a single statement when generating migrations.json, which produces:

  {
    "sql": [
      "-- step 1: create a temporary table to store the rows we want to keep\nCREATE TEMP TABLE embeddings_temp AS\nSELECT DISTINCT ON (chunk_id) *\nFROM embeddings\nORDER BY chunk_id, random();\n\n-- step 2: delete all rows from the original table\nDELETE FROM embeddings;\n\n-- step 3: insert the rows we want to keep back into the original table\nINSERT INTO embeddings\nSELECT * FROM embeddings_temp;\n\n-- step 4: drop the temporary table\nDROP TABLE embeddings_temp;\n\n-- step 5: now it's safe to add the unique constraint\nALTER TABLE \"embeddings\" ADD CONSTRAINT \"embeddings_chunk_id_unique\" UNIQUE(\"chunk_id\");\n"
    ],
    "bps": true,
    "folderMillis": 1724254147447,
    "hash": "6aa3e7a9ff9dcd0541ade5471ceec758bc741ee4a3045b4b848e46faedeae7af"
  }

That single multi-command statement is what triggers the error, so --> statement-breakpoint markers have to be added.

-- step 1: create a temporary table to store the rows we want to keep
CREATE TEMP TABLE embeddings_temp AS
SELECT DISTINCT ON (chunk_id) *
FROM embeddings
ORDER BY chunk_id, random();
+ --> statement-breakpoint

-- step 2: delete all rows from the original table
DELETE FROM embeddings;
+ --> statement-breakpoint

-- step 3: insert the rows we want to keep back into the original table
INSERT INTO embeddings
SELECT * FROM embeddings_temp;
+ --> statement-breakpoint

-- step 4: drop the temporary table
DROP TABLE embeddings_temp;
+ --> statement-breakpoint

-- step 5: now it's safe to add the unique constraint
ALTER TABLE "embeddings" ADD CONSTRAINT "embeddings_chunk_id_unique" UNIQUE("chunk_id");

With that, the generated JSON is correct:

{
    "sql": [
      "-- step 1: create a temporary table to store the rows we want to keep\nCREATE TEMP TABLE embeddings_temp AS\nSELECT DISTINCT ON (chunk_id) *\nFROM embeddings\nORDER BY chunk_id, random();\n",
      "\n\n-- step 2: delete all rows from the original table\nDELETE FROM embeddings;\n",
      "\n\n-- step 3: insert the rows we want to keep back into the original table\nINSERT INTO embeddings\nSELECT * FROM embeddings_temp;\n",
      "\n\n-- step 4: drop the temporary table\nDROP TABLE embeddings_temp;\n",
      "\n\n-- step 5: now it's safe to add the unique constraint\nALTER TABLE \"embeddings\" ADD CONSTRAINT \"embeddings_chunk_id_unique\" UNIQUE(\"chunk_id\");\n"
    ],
    "bps": true,
    "folderMillis": 1724254147447,
    "hash": "e99840848ffbb33ca4d7ead6158f02b8d12cb4ff5706d4529d7fa586afa4c2a9"
  },
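Since this mistake is easy to repeat with hand-written SQL, a build-time guard over migrations.json could catch it early. The following is only a sketch: findMultiCommandStatements and countCommands are hypothetical helpers, and the check is a naive heuristic that strips -- line comments and counts semicolon-terminated chunks, so it will misjudge semicolons inside string literals or dollar-quoted function bodies.

```typescript
// Minimal subset of a migrations.json entry needed for the check.
interface CompiledMigration {
  sql: string[];
  folderMillis: number;
}

// Naive heuristic: drop `--` line comments, then count non-empty chunks between semicolons.
function countCommands(statement: string): number {
  const withoutComments = statement.replace(/--.*$/gm, '');
  return withoutComments
    .split(';')
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0).length;
}

// Report every array item that still looks like it holds more than one command.
function findMultiCommandStatements(entries: CompiledMigration[]): string[] {
  const problems: string[] = [];
  for (const entry of entries) {
    entry.sql.forEach((statement, index) => {
      const count = countCommands(statement);
      if (count > 1) {
        problems.push(`migration ${entry.folderMillis}, statement #${index}: ${count} commands`);
      }
    });
  }
  return problems;
}

const suspicious = findMultiCommandStatements([
  { folderMillis: 1724254147447, sql: ['DELETE FROM embeddings;\n\nDROP TABLE embeddings_temp;\n'] },
]);
console.log(suspicious);
```

Running such a check right after migrations.json is compiled would turn the runtime "multiple commands" error into a build failure.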

0 replies

arvinxx · 2024-12-04T15:57:57Z

arvinxx
Dec 4, 2024
Maintainer Author

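Client Service layer refactor

Since most of the logic has already been consolidated into the model layer, refactoring the client services is functionally much easier; it mostly comes down to swapping the instance objects.

It is also a chance to clean up redundant code caused by some Dexie limitations (for example, booleans cannot be used as an index, so they were stored in the db as 0 and 1).

One fairly labor-intensive part, though, is replacing all the existing client service unit tests with PGlite-based implementations. It is not hard, but it is somewhat tedious. There are 7 existing services to replace, and a complex service takes roughly 1~2 hours to refactor its unit tests.

Service initialization

To make testing easier, each Service can take a userId when it is constructed with new. In the browser, however, initializing userId is asynchronous and the value is only available once the page has finished loading. So each Service gets a userId getter that reads the live value from the app at call time.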

export class ClientService implements ITopicService {

  private readonly fallbackUserId: string;

  private get userId(): string {
    return getClientDBUserId() || this.fallbackUserId;
  }

  constructor(userId?: string) {
    this.fallbackUserId = userId || FALLBACK_CLIENT_DB_USER_ID;
  }
}

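Likewise, the original this.topicModel needs to become a getter that initializes the model at call time. I also measured the initialization cost: between 0.01ms and 2ms, averaging around 0.x ms, which is acceptable without caching.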

  private get topicModel(): TopicModel {
    return new TopicModel(clientDB as any, this.userId);
  }

With that, the ClientService refactor is basically done. A further optimization is to extract a BaseClientService base class, so the userId logic becomes reusable across the business services.

const getClientDBUserId = () => {
  if (typeof window === 'undefined') return undefined;

  return window.__lobeClientUserId;
};

const FALLBACK_CLIENT_DB_USER_ID = 'DEFAULT_LOBE_CHAT_USER';

export class BaseClientService {
  private readonly fallbackUserId: string;

  protected get userId(): string {
    return getClientDBUserId() || this.fallbackUserId;
  }

  constructor(userId?: string) {
    this.fallbackUserId = userId || FALLBACK_CLIENT_DB_USER_ID;
  }
}
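A business service could then extend the base class roughly as follows. This is a self-contained sketch: TopicModelStub and TopicClientService are hypothetical stand-ins for the real model and service, and the user id is read from globalThis instead of window so the snippet also runs outside a browser.

```typescript
// Stand-in for window.__lobeClientUserId; globalThis keeps the sketch runnable outside a browser.
const userIdSlot = globalThis as unknown as { __lobeClientUserId?: string };

const readClientDBUserId = (): string | undefined => userIdSlot.__lobeClientUserId;

const DEFAULT_USER_ID = 'DEFAULT_LOBE_CHAT_USER';

class BaseClientServiceSketch {
  private readonly fallbackUserId: string;

  // Resolved on every access, so a user id set after construction is picked up.
  protected get userId(): string {
    return readClientDBUserId() || this.fallbackUserId;
  }

  constructor(userId?: string) {
    this.fallbackUserId = userId || DEFAULT_USER_ID;
  }
}

// Hypothetical model: only records which user it was created for.
class TopicModelStub {
  readonly ownerId: string;

  constructor(ownerId: string) {
    this.ownerId = ownerId;
  }
}

class TopicClientService extends BaseClientServiceSketch {
  // Created per access, like the topicModel getter above.
  private get topicModel(): TopicModelStub {
    return new TopicModelStub(this.userId);
  }

  currentOwner(): string {
    return this.topicModel.ownerId;
  }
}

const topicService = new TopicClientService('test-user');
console.log(topicService.currentOwner()); // falls back to the constructor value until the global is set
```

In unit tests the constructor argument is enough; in the browser the global value takes over as soon as it has been initialized.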

2 replies

arvinxx Dec 11, 2024
Maintainer Author

So far it looks like sessionGroup could be split out into its own service, to reduce the complexity of the session part.

arvinxx Dec 11, 2024
Maintainer Author

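File service

I don't think this part needs a globalFiles table like the server side to verify file consistency. It only needs an IndexedDB store acting as the browser-side S3; everything else can go through the same server model.

It does need one method, though, similar to the server-side getFullFileUrl, that builds a base64 string from the data in IndexedDB and returns it.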
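The conversion step of such a method might look like the sketch below. It assumes the file bytes have already been read out of IndexedDB as a Uint8Array along with a MIME type; toDataUrl is a hypothetical name, and the IndexedDB lookup itself is left out.

```typescript
// Hypothetical helper: turn raw file bytes into a base64 data URL usable as an image or file src.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  // btoa expects a "binary string" where each character code is one byte.
  let binary = '';
  for (let i = 0; i < bytes.length; i += 1) {
    binary += String.fromCharCode(bytes[i]);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// The two bytes of the ASCII text "hi".
const sampleUrl = toDataUrl(new Uint8Array([104, 105]), 'text/plain');
console.log(sampleUrl); // data:text/plain;base64,aGk=
```

For large files, an object URL created from a Blob would avoid the base64 size overhead, but that changes the return contract compared with the server-side method.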

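Although globalFiles is not needed for checking here, the model layer already has the constraint between fileHash and globalFiles, so the feature still has to exist. The create implementation of fileModel was therefore refactored to support the constraint (which also fixed the server side not using a transaction when creating a file).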

arvinxx · 2024-12-13T07:05:48Z

arvinxx
Dec 13, 2024
Maintainer Author

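PGlite initialization

App initialization

We need a place to initialize and store userId.

It is currently stored in IndexedDB -> LOBE_CHAT_DB -> users table, and the initialization logic lives in useInitUserState in GlobalProvider/StoreInitialization.tsx. The original UserModel logic is: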

class _UserModel extends BaseModel {
  getUser = async (): Promise<DB_User & { id: number }> => {
    const noUser = !(await this.table.count());

    if (noUser) await this.table.put({ uuid: uuid() });

    const list = (await this.table.toArray()) as (DB_User & { id: number })[];

    return list[0];
  };
}

Could we simply follow this approach and do the initialization in useInitUserState?

Went with that for now, and added a makeSureUserExist implementation:

private async makeSureUserExist() {
  const existUsers = await clientDB.query.users.findMany();

  let user: { id: string };
  if (existUsers.length === 0) {
    const result = await clientDB.insert(users).values({ id: uuid() }).returning();
    user = result[0];
  } else {
    user = existUsers[0];
  }

  if (typeof window !== 'undefined') {
    window.__lobeClientUserId = user.id;
  }
}

DB initialization

Claude is really strong; after a few rounds of conversation it produced a very good implementation.

0 replies

arvinxx · 2024-12-14T16:41:19Z

arvinxx
Dec 14, 2024
Maintainer Author

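PGlite database manager design

Core design ideas

Singleton pattern for managing the database instance
- Ensures the whole app has only one database connection
- Manages the database lifecycle in one place
- Avoids repeated initialization and wasted resources
Asynchronous loading flow
- The WASM module is loaded asynchronously from a CDN
- Dependency modules are imported dynamically
- Database initialization and migration are decoupled
- Complete state feedback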
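One common way to get the "only one instance, no repeated initialization" behavior is to cache the initialization promise, so concurrent callers share a single run. The sketch below shows that idea only; DbHandle, doInitialize and initializeOnce are hypothetical names, and the real setup work (WASM load, PGlite boot, migration) is replaced by a stub.

```typescript
// Hypothetical handle type standing in for the real database instance.
interface DbHandle {
  name: string;
}

let initCount = 0;

// Stub for the expensive setup; it only counts how often it runs.
async function doInitialize(): Promise<DbHandle> {
  initCount += 1;
  return { name: 'client-db' };
}

let initPromise: Promise<DbHandle> | undefined;

// Every caller gets the same promise, so the setup runs once even under concurrent calls.
function initializeOnce(): Promise<DbHandle> {
  if (!initPromise) {
    initPromise = doInitialize().catch((error) => {
      // Clear the cached promise so a later call can retry after a failure.
      initPromise = undefined;
      throw error;
    });
  }
  return initPromise;
}

const sharedInit = Promise.all([initializeOnce(), initializeOnce(), initializeOnce()]);
sharedInit.then(() => console.log('initialize calls:', initCount));
```

Caching the promise rather than the resolved instance is what closes the window in which two callers could both see "not initialized yet".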

State management

enum DatabaseLoadingState {
  Idle = 'idle',           // initial state
  LoadingWasm = 'loading_wasm',    // loading the WASM module
  LoadingDependencies = 'loading_dependencies',  // loading dependencies
  Initializing = 'initializing',   // initializing the database
  Migrating = 'migrating',         // running migrations
  Ready = 'ready',         // ready
  Error = 'error',         // error
}

Progress tracking

interface LoadingProgress {
  phase: 'wasm' | 'dependencies';  // loading phase
  progress: number;                // loading progress (0-100)
}

Proxy-based access
- Wraps the database instance in a Proxy
- Keeps access safe before the database has been initialized
- Keeps the API consistent
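The Proxy idea can be sketched as below: callers import one stable object, and every property access is forwarded to the real instance once it exists. SketchDb, setDbInstance and guardedDb are hypothetical names chosen for this example, and the error message is an assumption rather than the project's actual wording.

```typescript
// Hypothetical minimal database surface used only for this sketch.
interface SketchDb {
  query(sql: string): string;
}

let realInstance: SketchDb | undefined;

// Called once initialization has finished.
function setDbInstance(db: SketchDb): void {
  realInstance = db;
}

// Stable export: forwards every property access to the real instance, or fails loudly.
const guardedDb = new Proxy({} as SketchDb, {
  get(_target, prop) {
    if (!realInstance) {
      throw new Error('Database not initialized. Call the initializer first.');
    }
    const value = Reflect.get(realInstance, prop);
    // Bind methods so `this` still points at the real instance.
    return typeof value === 'function' ? value.bind(realInstance) : value;
  },
});

setDbInstance({ query: (sql) => `ran: ${sql}` });
console.log(guardedDb.query('select 1')); // ran: select 1
```

Because importing modules never hold the real instance directly, they can be loaded before initialization without capturing an undefined reference.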

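Key implementation features

WASM module loading
- Streamed loading with progress tracking
- Compiles the module with WebAssembly.compile
- Supports loading resources from a CDN
Dependency management
- Core dependencies are imported dynamically
- Loaded in parallel
- Organized as modules
Database migration
- Integrated automatic migration flow
- Supports triggering migration manually
- Protection against running the same migration twice
Error handling
- Complete error state management
- Clear error feedback
- Recovery from failures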
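The "stream the WASM with progress" step might look roughly like this. readWithProgress is a hypothetical helper, not part of PGlite; Response, stream readers and WebAssembly.compile are standard web APIs. Progress is only reported per chunk when the server sends a Content-Length header.

```typescript
// Hypothetical helper: read a response body chunk by chunk and report percentage progress.
async function readWithProgress(
  response: Response,
  onProgress: (percent: number) => void,
): Promise<Uint8Array> {
  if (!response.body) throw new Error('Response has no body');

  // Without Content-Length the total is unknown, so only the final 100% is reported.
  const total = Number(response.headers.get('content-length')) || 0;
  const reader = response.body.getReader();
  const chunks: Uint8Array[] = [];
  let received = 0;

  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    received += value.length;
    if (total > 0) onProgress(Math.min(100, Math.round((received / total) * 100)));
  }

  // Concatenate the chunks into one buffer.
  const bytes = new Uint8Array(received);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.length;
  }
  onProgress(100);
  return bytes;
}

// Demo with an in-memory response instead of a real CDN request.
const reportedProgress: number[] = [];
const demoRead = readWithProgress(
  new Response(new Uint8Array([1, 2, 3, 4]), { headers: { 'content-length': '4' } }),
  (percent) => reportedProgress.push(percent),
);
```

In the real flow the response would come from fetch on the CDN URL, and the returned bytes would then be passed to WebAssembly.compile. The final onProgress(100) call would also explain the duplicated 100% line visible in the run log further down, if the actual implementation works the same way.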

Usage example

// Initialize the database
await initializeDB({
  onStateChange: (state) => {
    console.log('Database state:', state);
  },
  onProgress: ({ phase, progress }) => {
    console.log(`Loading ${phase}: ${progress}%`);
  },
});

// Database operations
await clientDB.query(...);

// Manual migration (if needed)
await migrate(true);

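Design advantages

Maintainability
- Clear state management
- Modular code organization
- Extensible architecture
Performance
- Avoids repeated initialization
- Parallel loading
- Resource reuse
User experience
- Complete loading state feedback
- Accurate progress display
- Clear error messages
Developer experience
- Simple, intuitive API
- Full type support
- Consistent interface design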

4 replies

arvinxx Dec 14, 2024
Maintainer Author

Result of a run:

state: 40:30:704 loading_dependencies
Loading dependencies: 25% 
Loading dependencies: 50% 
Loading dependencies: 75% 
Loading dependencies: 100% 
Loading dependencies: 100% , used:1879ms
state: 40:32:583 loading_wasm
Loading wasm: 2% 
Loading wasm: 3% 
Loading wasm: 4% 
Loading wasm: 23% 
Loading wasm: 26% 
Loading wasm: 30% 
Loading wasm: 32% 
Loading wasm: 33% 
Loading wasm: 35% 
Loading wasm: 37% 
Loading wasm: 38% 
Loading wasm: 40% 
Loading wasm: 42% 
Loading wasm: 44% 
Loading wasm: 45% 
Loading wasm: 47% 
Loading wasm: 49% 
Loading wasm: 50% 
Loading wasm: 52% 
Loading wasm: 54% 
Loading wasm: 57% 
Loading wasm: 59% 
Loading wasm: 61% 
Loading wasm: 62% 
Loading wasm: 64% 
Loading wasm: 66% 
Loading wasm: 67% 
Loading wasm: 69% 
Loading wasm: 76% 
Loading wasm: 79% 
Loading wasm: 80% 
Loading wasm: 84% 
Loading wasm: 89% 
Loading wasm: 93% 
Loading wasm: 96% 
Loading wasm: 97% 
Loading wasm: 98% 
Loading wasm: 100% 
Loading wasm: 100% , used:2963ms
state: 40:35:557 initializing
state: 40:35:559 migrating
state: 40:35:560 migrating
✅ Local database ready in 1215ms
state: 40:36:776 ready

arvinxx Dec 14, 2024
Maintainer Author

Production build test:

First load:

Loading dependencies: 100% , used:2078ms
Loading wasm: 100% , used:1792ms
migration database ready in 2266ms

Second load:

Loading dependencies: 100% , used:495ms
Loading wasm: 100% , used:2909ms
migration database ready in 858ms

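The dependency loading time is acceptable, but the wasm being requested again and taking 2~3s each time is not; it looks like a local caching scheme is needed.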

arvinxx Dec 18, 2024
Maintainer Author

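After switching the wasm to a local cache, loading is actually fast. The migration, however, may keep costing 500ms~2s on every start, so a hash check is added for it: if the hash already exists locally, the migration is skipped.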
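A minimal sketch of such a hash check, under the assumption that the hash of the last entry in migrations.json identifies the current schema version (reasonable only if migrations are append-only; editing an older migration would go unnoticed). The storage key and the function names are hypothetical, and the store interface is shaped so that localStorage could be passed in directly.

```typescript
// Only the field this check needs from a migrations.json entry.
interface HashedMigration {
  hash: string;
}

// Subset of the Web Storage API; localStorage satisfies it in the browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical storage key.
const MIGRATION_HASH_KEY = 'CLIENT_DB_MIGRATION_HASH';

function latestMigrationHash(migrations: HashedMigration[]): string | undefined {
  return migrations.length > 0 ? migrations[migrations.length - 1].hash : undefined;
}

// True when the newest migration was already applied on this device.
function shouldSkipMigration(migrations: HashedMigration[], store: KeyValueStore): boolean {
  const hash = latestMigrationHash(migrations);
  return hash !== undefined && store.getItem(MIGRATION_HASH_KEY) === hash;
}

// Call after a successful migration run.
function markMigrated(migrations: HashedMigration[], store: KeyValueStore): void {
  const hash = latestMigrationHash(migrations);
  if (hash !== undefined) store.setItem(MIGRATION_HASH_KEY, hash);
}

// In-memory store so the sketch runs outside a browser.
const memory = new Map<string, string>();
const memoryStore: KeyValueStore = {
  getItem: (key) => memory.get(key) ?? null,
  setItem: (key, value) => {
    memory.set(key, value);
  },
};

const knownMigrations: HashedMigration[] = [{ hash: 'aaa' }, { hash: 'bbb' }];
const skipBefore = shouldSkipMigration(knownMigrations, memoryStore);
markMigrated(knownMigrations, memoryStore);
const skipAfter = shouldSkipMigration(knownMigrations, memoryStore);
console.log(skipBefore, skipAfter); // false true
```

The stored hash should only be written after the migration has finished without errors, otherwise a failed run would be skipped on the next start.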

arvinxx Dec 18, 2024
Maintainer Author

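Still to be improved:

- Error handling: show an error dialog when something is thrown
- Later on, add a loading indicator for the knowledge base as well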

arvinxx · 2024-12-16T18:54:52Z

arvinxx
Dec 16, 2024
Maintainer Author

Some existing bugs:

Uncaught (in promise) Error: ❌ Attempted to access a server-side environment variable on the client

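This error is thrown when changing settings. The cause is that userModel contains logic for storing data encrypted, and that logic needs to be reworked.

- Assistant settings cannot be changed; the cause is probably a problem in the query logic of updateAgentConfig
- The inbox currently cannot chat (probably the same issue: the slug=inbox handling in the session part is not done properly)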

0 replies

arvinxx · 2024-12-18T18:47:59Z

arvinxx
Dec 18, 2024
Maintainer Author

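Data migration

Existing Dexie data

In theory, running it through importService is enough to import it.

In testing, all messages came through without problems, but importing files has not been implemented.

User-exported config data

Old versions

Run a Migration to bring the data up to the latest version, then use importService

New versions

Go straight to the SQL data migration scheme driven by schema changes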

0 replies
