-
Notifications
You must be signed in to change notification settings - Fork 868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Native re-implementation of Web Discovery (part 1) #24421
Conversation
kMaxRequestInterval, | ||
kMaxRetries, | ||
base::BindRepeating(&DoubleFetcher::OnFetchTimer, | ||
base::Unretained(this))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] base::Unretained is most of the time unrequited, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:
- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated
Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml
Cc @thypon @goodov @iefremov
std::move(resource_request), kFetchNetworkTrafficAnnotation); | ||
url_loader_->DownloadToString( | ||
shared_url_loader_factory_.get(), | ||
base::BindOnce(&DoubleFetcher::OnRequestComplete, base::Unretained(this), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] base::Unretained is most of the time unrequited, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:
- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated
Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml
Cc @thypon @goodov @iefremov
kMaxRequestInterval, | ||
kMaxRetries, | ||
base::BindRepeating(&Reporter::PrepareRequest, | ||
base::Unretained(this))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] base::Unretained is most of the time unrequited, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:
- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated
Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml
Cc @thypon @goodov @iefremov
credential_signer_->Sign( | ||
std::vector<const uint8_t>(payload_hash.begin(), payload_hash.end()), | ||
basename_result->basename, | ||
base::BindOnce(&Reporter::OnRequestSigned, base::Unretained(this), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] base::Unretained is most of the time unrequited, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:
- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated
Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml
Cc @thypon @goodov @iefremov
content_scraper_->ScrapePage(url, false, | ||
document_extractor_remotes_.Get(remote_id), | ||
base::BindOnce(&WDPService::OnContentScraped, | ||
base::Unretained(this), false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] base::Unretained is most of the time unrequited, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:
- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated
Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml
Cc @thypon @goodov @iefremov
alive_message_timer_.Start( | ||
FROM_HERE, kAliveCheckInterval, | ||
base::BindRepeating(&WDPService::MaybeSendAliveMessage, | ||
base::Unretained(this))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] base::Unretained is most of the time unrequited, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:
- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated
Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml
Cc @thypon @goodov @iefremov
server_config_loader_ = std::make_unique<ServerConfigLoader>( | ||
local_state_, user_data_dir_, shared_url_loader_factory_.get(), | ||
base::BindRepeating(&WDPService::OnConfigChange, | ||
base::Unretained(this)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] base::Unretained is most of the time unrequited, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:
- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated
Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml
Cc @thypon @goodov @iefremov
base::BindRepeating(&WDPService::OnConfigChange, | ||
base::Unretained(this)), | ||
base::BindRepeating(&WDPService::OnPatternsLoaded, | ||
base::Unretained(this))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] base::Unretained is most of the time unrequited, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:
- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated
Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml
Cc @thypon @goodov @iefremov
service_manager::BinderRegistry* registry) | ||
: content::RenderFrameObserver(render_frame), render_frame_(render_frame) { | ||
registry->AddInterface<mojom::DocumentExtractor>(base::BindRepeating( | ||
&BlinkDocumentExtractor::BindReceiver, base::Unretained(this))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] base::Unretained is most of the time unrequited, and a weak reference is better suited for secure coding.
Consider swapping Unretained for a weak reference.
base::Unretained usage may be acceptable when a callback owner is guaranteed
to be destroyed with the object base::Unretained is pointing to, for example:
- PrefChangeRegistrar
- base::*Timer
- mojo::Receiver
- any other class member destroyed when the class is deallocated
Source: https://github.com/brave/security-action/blob/main/assets/semgrep_rules/client/chromium-uaf.yaml
Cc @thypon @goodov @iefremov
A Storybook has been deployed to preview UI for the latest push |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
script/brave_license_helper.py
changes look good.
* This Source Code Form is subject to the terms of the Mozilla Public | ||
* License, v. 2.0. If a copy of the MPL was not distributed with this file, | ||
* You can obtain one at https://mozilla.org/MPL/2.0/. */ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall my concern with this content scraper is it seems to be doing parsing of web content in the browser process, which would violate the Rule of 2: https://chromium.googlesource.com/chromium/src/+/master/docs/security/rule-of-2.md. In general C++ code in the browsre process should not process untrustworthy inputs unless they are very trivial or first passed through a sanitizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment above the class declaration provides info on the two modes of operation:
The first mode involves gathering contents from the current page in the renderer process. The untrusted content is processed by the sandboxed renderer process.
The second mode retrieves the untrusted content via the sandboxed network service process, and processes the HTML/DOM in the browser process using Rust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sounds fine to me as the processing that does happen in C++ seems to be relatively minimal but would ike @bridiver to sign off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are several existing methods for extracting html content from the page, why are you introducing something new here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am unaware of those existing methods, could you please provide a couple pointers?
auto final_payload = GenerateFinalPayload(*payload_dict); | ||
|
||
std::string final_payload_json; | ||
if (!base::JSONWriter::Write(final_payload, &final_payload_json)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the payload could contain any untrustworthy content (like any properties which can be controlled by a webpage, for instance) this should follow the rule of 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The payload is generated from the output of the ContentScraper, which follows Rule of 2.
|
A Storybook has been deployed to preview UI for the latest push |
@@ -86,6 +86,7 @@ public class BravePrivacySettings extends PrivacySettings implements ConnectionE | |||
public static final String PREF_FINGERPRINTING_PROTECTION2 = "fingerprinting_protection2"; | |||
private static final String PREF_CLOSE_TABS_ON_EXIT = "close_tabs_on_exit"; | |||
private static final String PREF_SEND_P3A = "send_p3a_analytics"; | |||
private static final String PREF_SEND_WEB_DISCOVERY = "send_web_discovery"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a mechanism to expose cpp prefs to java, but I don't recall what it is off-hand. @SergeyZhukovsky ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefs from cpp to java are exposed via that section https://github.com/brave/brave-core/blob/master/browser/android/preferences/BUILD.gn#L37. You basically add files with prefs that you want to expose and the script does. The prefs could be found in an auto generated BrsavePref.java file that is made from a template https://github.com/brave/brave-core/blob/master/browser/preferences/android/java_templates/BravePref.java.tmpl
That's an example on how a pref would look like in Java
public static final String BRAVE_CHAT_PURCHASE_TOKEN_ANDROID = "brave.ai_chat.purchase_token_android"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we already do this:
brave-core/android/java/org/chromium/chrome/browser/privacy/settings/BravePrivacySettings.java
Lines 334 to 340 in 072fdfc
if (BraveConfig.WEB_DISCOVERY_ENABLED | |
&& ChromeFeatureList.isEnabled(BraveFeatureList.BRAVE_WEB_DISCOVERY_NATIVE)) { | |
mSendWebDiscovery = (ChromeSwitchPreference) findPreference(PREF_SEND_WEB_DISCOVERY); | |
mSendWebDiscovery.setOnPreferenceChangeListener(this); | |
} else { | |
removePreferenceIfPresent(PREF_SEND_WEB_DISCOVERY); | |
} |
brave-core/browser/android/preferences/BUILD.gn
Lines 37 to 57 in 072fdfc
java_cpp_strings("java_pref_names_srcjar") { | |
sources = [ | |
"//brave/components/brave_adaptive_captcha/pref_names.h", | |
"//brave/components/brave_rewards/common/pref_names.h", | |
"//brave/components/brave_shields/core/common/pref_names.h", | |
"//brave/components/brave_vpn/common/pref_names.h", | |
"//brave/components/constants/pref_names.h", | |
"//brave/components/de_amp/common/pref_names.cc", | |
"//brave/components/de_amp/common/pref_names.h", | |
"//brave/components/debounce/core/common/pref_names.h", | |
"//brave/components/decentralized_dns/core/pref_names.h", | |
"//brave/components/ipfs/pref_names.h", | |
"//brave/components/ntp_background_images/common/pref_names.h", | |
"//brave/components/omnibox/browser/brave_omnibox_prefs.cc", | |
"//brave/components/omnibox/browser/brave_omnibox_prefs.h", | |
"//brave/components/p3a/pref_names.h", | |
"//brave/components/request_otr/common/pref_names.h", | |
"//brave/components/speedreader/speedreader_pref_names.h", | |
"//brave/components/web_discovery/browser/pref_names.h", | |
"//components/translate/core/browser/translate_pref_names.h", | |
] |
constexpr size_t kMinPageCountForAliveMessage = 2; | ||
} // namespace | ||
|
||
WDPService::WDPService( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use consistent naming, WebDiscovery appears to be used everywhere else.
void WebDiscoveryTabHelper::DidFinishLoad( | ||
content::RenderFrameHost* render_frame_host, | ||
const GURL& url) { | ||
if (!render_frame_host->IsInPrimaryMainFrame()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DocumentOnLoadCompletedInPrimaryMainFrame
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that method does not pass a url
|
||
const TEXT_CONTENT_ATTRIBUTE_NAME: &str = "textContent"; | ||
|
||
fn extract_attributes_from_nodes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we use the axtree which would also take the place of the document extrator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The crate is used to process the page contents from the anonymous request made in the browser process. If there is a way to construct an AxTree from a string, would it be easy to do this in a sandboxed process? if not, wouldn't that violate rule of 2?
|
||
namespace web_discovery { | ||
|
||
class WebDiscoveryContentScraperTest : public PlatformBrowserTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the only browser test? I think we need some higher level tests for the full process and also tests to ensure that it's properly enabled/disabled with pref/flag
RSAKeyInfo::RSAKeyInfo() = default; | ||
RSAKeyInfo::~RSAKeyInfo() = default; | ||
|
||
std::unique_ptr<RSAKeyInfo> GenerateRSAKeyPair() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use RsaHashedAlgorithm
from webcrypto
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. i meant to swap this code out with RsaHashedAlgorithm
before making the PR (i learned about its existence later on during implementation), but i think we can use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or crypto::RSAPrivateKey?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also what thread does this run on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all callers use a thread pool
std::optional<AESEncryptResult> DeriveAESKeyAndEncrypt( | ||
const std::string& server_pub_key_b64, | ||
const std::vector<uint8_t>& data) { | ||
auto server_pub_key_data = base::Base64Decode(server_pub_key_b64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems strange to me that we're doing this low level encryption stuff ourselves. Are there really not any utilities in chromium for this? There must be something in crypto? Aead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for our specific use case of ecdh, i couldn't find anything useful from webcrypto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AesGcmImplementation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or crypt::Aead
with `AES_256_GCM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crypto::ECPrivateKey
/EcdhImplementation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've tried using stuff from components/webcrypto/algorithms
previously, but it ended up being a no-go since it required a few renderer specific blink dependencies
crypt::Aead
should be an option, however. there may have been a reason why i chose not to use it but i can't recall. i'll revisit nevertheless.
the ecdh stuff would still remain, however
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, i remembered why crypt::Aead
wasn't an option. we need the aes-128-gcm algo, and the class does not expose that algo. i suppose we could patch it to support it, but it probably wouldn't be worth it especially considering the simplicity of the aes portion of the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ECPrivateKey
is also not useful since only EC points are exchanged between the server and client, and not PKCS#8/X.509 blocks which the ECPrivateKey
uses for serialization.
start_join_result.gsk.end()); | ||
pool_sequenced_task_runner_->PostTaskAndReplyWithResult( | ||
FROM_HERE, | ||
base::BindOnce(&GenerateJoinRequest, &**anonymous_credential_manager_, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not look safe, anonymous_credential_manager_
is accessed on more than one thread dca6a55#diff-41b339c44a2b795ab1bfc9a8a48f5a1a1a1ce380252a5cde46a18338ff5052baR388
also why do you have a rust::Box inside a unique_ptr? You can pass the rust::Box with &*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the unique_ptr is just for OnTaskRunnerDeleter
I think it makes things a bit confusing with &**
and you can just std::move the rust::Box and PostTask
pool_sequenced_task_runner_->PostTask(
FROM_HERE, base::BindOnce([](::rust::Box<anonymous_credentials::CredentialManager> credential_manager) {},
std::move(anonymous_credential_manager_)));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be possible, but then we wouldn't be able to perform multiple join and sign requests concurrently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not look safe, anonymous_credential_manager_ is accessed on more than one thread
the issue in the diff that you linked was fixed in a later commit. all calls to the manager are through the task runner
@@ -71,15 +85,18 @@ class CredentialManager { | |||
net::BackoffEntry backoff_entry_; | |||
base::WallClockTimer retry_timer_; | |||
|
|||
base::RepeatingClosure credentials_loaded_callback_; | |||
scoped_refptr<base::SequencedTaskRunner> pool_sequenced_task_runner_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this pool_*
?
This PR has a nice history of commits, but they are not very usable as-is. Some significant changes occur midway, making a per-commit review obsolete. At this point, the PR should be split into multiple smaller ones to gradually introduce these changes. There are too many modifications affecting performance, stability, and security. The new PRs can use the existing commit history as a base, but some commits should be reordered and squash-fixed to create the "final" version of each logical change. |
6c85c0b
to
0eb2038
Compare
Resolves brave/brave-browser#39439
Submitter Checklist:
QA/Yes
orQA/No
;release-notes/include
orrelease-notes/exclude
;OS/...
) to the associated issuenpm run test -- brave_browser_tests
,npm run test -- brave_unit_tests
wikinpm run presubmit
wiki,npm run gn_check
,npm run tslint
git rebase master
(if needed)Reviewer Checklist:
gn
After-merge Checklist:
changes has landed on
Test Plan:
TBD