Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIChat full-page storage #25876

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

AIChat full-page storage #25876

wants to merge 4 commits into from

Conversation

petemill
Copy link
Member

@petemill petemill commented Oct 8, 2024

  • Adds AIChatDatabase class which purely looks at persisting and retrieving conversation and conversation entry items.
  • Modified AIChatService to manage data going in and out of the database
  • Adds relevant uuid fields and modifies relevant constructors and unit tests do deal with them

TODO:

  • iOS isn't compiling due to not supporting optional int "ID" field for SiteInfo struct. I'll wrap this with another optional struct or convert to string.
  • For browser data deletion, when "browsing history" is selected, also delete AI Chat conversations which were created or modified within the specified timeframe and have associated content.
  • Implement a browsing_data::BrowsingDataCounter to show how many conversations will be deleted, possibly how many megabytes total the DB is (follow-up, nice to have).

@petemill petemill self-assigned this Oct 8, 2024
@github-actions github-actions bot added the CI/storybook-url Deploy storybook and provide a unique URL for each build label Oct 8, 2024
@petemill petemill changed the base branch from master to aichat-fullpage October 8, 2024 07:02
@petemill petemill force-pushed the aichat-fullpage-storage branch 3 times, most recently from d881677 to ff534cf Compare October 9, 2024 07:12
@brave-builds
Copy link
Collaborator

A Storybook has been deployed to preview UI for the latest push

Copy link
Contributor

@fallaciousreasoning fallaciousreasoning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is looking great @petemill! Lets hold off merging this till #25800 lands so we don't have to get the code reviewed twice.

// It is prepared for future implementation.
// Delete all Leo data
ai_chat::AIChatServiceFactory::GetForBrowserContext(profile_)
->ClearAllHistory();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to do a null check here? GetForBrowserContext can return null.

Maybe a CHECK if we expect it to never happen, for better crash logs.

@@ -6,7 +6,9 @@
#include "brave/browser/ui/webui/ai_chat/ai_chat_ui_page_handler.h"

#include <memory>
#include <string>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: either revert the file, or move the <string> include into the header file (as it depends on it too)

profile_->GetPrefs()->ClearPref(ai_chat::prefs::kLastAcceptedDisclaimer);
g_brave_browser_process->process_misc_metrics()
->ai_chat_metrics()
->RecordReset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for other reviewers: This call still happens in ClearAllHistory

#include "base/check.h"
#include "base/strings/string_split.h"
#include "base/time/time.h"
#include "brave/components/ai_chat/core/common/mojom/ai_chat.mojom-shared.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove

@@ -873,10 +933,13 @@ void ConversationHandler::AddToConversationHistory(
return;
}

if (!turn->uuid.has_value()) {
turn->uuid = base::Uuid::GenerateRandomV4().AsLowercaseString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: could be worth a NewId function in the anonymous namespace as we call this a few times. Pretty simple though so up to you

ai_chat_db_.Reset();
return;
}
DVLOG(0) << "AIChat DB initialized";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DVLOG(0)? this doesn't seem that different in importance to the other logs

bool AddConversationEntry(
std::string_view conversation_uuid,
mojom::ConversationTurnPtr entry,
std::optional<std::string> editing_id = std::nullopt);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already the default value

Comment on lines +268 to +280
ai_chat_db_.AsyncCall(&AIChatDatabase::IsInitialized)
.Then(base::BindOnce(&AIChatService::OnDBInit,
weak_ptr_factory_.GetWeakPtr()));
}

void AIChatService::OnDBInit(bool success) {
CHECK(ai_chat_db_);
if (!success) {
LOG(ERROR) << "Failed to initialize AIChat DB";
// Ensure any pending tasks are cancelled. Sometimes, especially during unit
// tests that are very quick to run, initilization might fail because of
// shutdown but the next task in the sequence still executes, causing a
// crash because of bad database state.
ai_chat_db_.Reset();
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better we do

ai_chat_db_.AsyncCall(&AIChatDatabase::Init)
  .WithArgs(storage_dir.Append(kDBFileName))
  .Then(base::BindOnce(&AIChatService::OnDBInit,
                           weak_ptr_factory_.GetWeakPtr()));
}

instead of doing it in constructor to eliminate the unnecessary intermediate state is_initialized_ which cause us to possibly check initialized in the middle of initializing process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did start out by having a separate public AIChatDatabase::Init function that consumers would call, but it seemed redundant because what's the point of constructing the class without Init being run, and also I didn't want to have to check that it was init'ed on every other function call.

I could change it back, but just thinking about the reasoning - is it really possible for is_initialized_ to be checked in the middle of the initializing process when all the other functions will be called on the sequence and I assume that guarantees Init will be finished, since it's called in the constructor? I know I'm probably wrong about this so please let me know!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I read your comments in a wrong way that makes me think that is possible to interrupt it. But it seems impossible due to the same reason you mentioned, constructing is posted to the sequenced task runner we chose and then when we call other functions, it will be posted to the same sequence as next task.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But even with current code, we still rely on consumer to check if it is successfully inited and it is not safe to assume that it would successfully initialized upon calling other public functions. So we still miss checking is_initialized_ in all public functions because init might fail.

}

bool AIChatDatabase::CreateConversationTable() {
static const std::string kCreateConversationTableQuery =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static constexpr char kCreateConversationTableQuery[] and should apply to all the static queries

}

std::optional<std::string> AIChatDatabase::DecryptOptionalColumnToString(
sql::Statement& statement,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be const sql::Statement&

Comment on lines +785 to +787
auto column_type = statement.GetColumnType(index);
// Don't allow non-BLOB types
if (column_type != sql::ColumnType::kBlob) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (statement.GetColumnType(index) != sql::ColumnType::kBlob) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a style nit, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it doesn't need named variable for one time use unless it is something like callback that would increase level of indention which makes it harder to read.

std::nullopt, std::nullopt, 0, false, false);
mojom::ConversationPtr conversation =
mojom::Conversation::New(conversation_uuid, "", base::Time::Now(),
false, std::move(non_content));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline mojom::SiteInfo::New

std::move(callback).Run(nullptr);
return;
}
CHECK(ai_chat_db_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed

<< " with data: " << data->entries.size() << " entries and "
<< data->associated_content.size() << " contents";
mojom::Conversation* conversation =
conversations_.at(conversation_uuid.data()).get();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will crash when the conversation is no longer in the map. We can't guarantee that won't happen between AIChatDatabase::GetConversationData and getting the result back

Base automatically changed from aichat-fullpage to master October 30, 2024 16:19
Copy link
Contributor

[puLL-Merge] - brave/brave-core@25876

Description

This PR implements persistent storage for AI Chat conversations using a new AIChatDatabase class. It modifies the AIChatService and ConversationHandler to use this database for storing and retrieving conversation data. The changes also include updates to the mojom definitions, test utilities, and various unit tests to support the new functionality.

Changes

Changes

  1. components/ai_chat/core/browser/ai_chat_database.cc and .h:

    • New files implementing the AIChatDatabase class for persistent storage of conversations.
  2. components/ai_chat/core/browser/ai_chat_service.cc and .h:

    • Modified to use the new AIChatDatabase for storing and retrieving conversations.
    • Added methods for database initialization and conversation data management.
  3. components/ai_chat/core/browser/conversation_handler.cc and .h:

    • Updated to support persistent storage of conversation entries.
    • Modified event handling for conversation changes.
  4. components/ai_chat/core/common/mojom/ai_chat.mojom:

    • Updated definitions for Conversation, ConversationTurn, and related structures to support persistent storage.
  5. components/ai_chat/core/browser/test_utils.cc and .h:

    • New files with utility functions for testing conversation-related functionality.
  6. Various unit test files:

    • Updated to account for the new persistent storage functionality.
  7. browser/ai_chat/ai_chat_service_factory.cc:

    • Modified to include the new OSCryptAsync and profile path parameters in AIChatService construction.

Possible Issues

  1. The PR introduces a significant amount of new code, which may increase the complexity of the AI Chat system and potentially introduce new bugs.
  2. The changes to the database schema and conversation structure may require careful migration handling for existing users.

Security Hotspots

  1. components/ai_chat/core/browser/ai_chat_database.cc:

    • The database operations involve encryption and decryption of sensitive data. Ensure that the encryption methods are properly implemented and that no unencrypted data is accidentally stored or transmitted.
  2. components/ai_chat/core/browser/ai_chat_service.cc:

    • The ClearAllHistory method deletes all conversation data. Ensure that this operation is properly secured and can only be triggered by authorized users or processes.
  3. components/ai_chat/core/browser/conversation_handler.cc:

    • The SetArchiveContent method stores content from associated web pages. Ensure that this content is properly sanitized and does not introduce XSS or other injection vulnerabilities.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI/storybook-url Deploy storybook and provide a unique URL for each build needs-security-review puLL-Merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants