Private Logistic Regression in PAM
Model training on private client data is a well-motivated use case for ads. There are several approaches that could be compatible with PAM. The original PRIO paper (Section 5.3) describes how to implement linear regression using the aggregations supported in PRIO. Upon reflection, it seems that logistic regression could be implemented just as easily, perhaps with an approach like this one. This seems somewhat similar to a paper Apple published on Federated Learning, in that PRIO is used to privately aggregate updates from many devices.
Clarifying the problem setup
The training dataset consists of $N$ $k$-dimensional feature vectors
$X^{(1)}, X^{(2)}, …,X^{(N)} \in \mathbb{R}^k$
where each component of $X^{(i)}$ is in the range [0,1].
There are also $N$ corresponding binary labels indicating whether each user had a conversion or not
$y^{(1)},y^{(2)},...,y^{(N)} \in \{0,1\}.$
We assume that the features are all known in the clear to one party and that the piece we are trying to keep private is the label. In particular, we want a label-DP guarantee for the labels. For more details see the above writeup, but the core idea we can focus on here is that the part of the computation that needs to be computed privately is the following term
$$\sum_{i=1}^N y^{(i)}X^{(i)},$$
which is a linear combination of $k$-dimensional feature vectors.
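To spell out why this is the only piece that needs to be computed privately, here is one standard derivation, assuming the usual negative log-likelihood loss for logistic regression with sigmoid $\sigma$. The gradient with respect to the model parameters $\theta$ is
$$\nabla_\theta L(\theta) = \sum_{i=1}^N \left(\sigma(\theta^\top X^{(i)}) - y^{(i)}\right) X^{(i)} = \sum_{i=1}^N \sigma(\theta^\top X^{(i)})\, X^{(i)} \;-\; \sum_{i=1}^N y^{(i)} X^{(i)}.$$
The first sum depends only on the features and the current $\theta$, both known in the clear to the party doing the training, so it can be computed outside the MPC. The second sum is the label-dependent term above; since it does not depend on $\theta$, it only needs to be computed (and noised) once and can be reused at every gradient step.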
The key parameters of the problem are:
The number of dimensions, $k$, in the feature vectors (e.g. 32–256 is probably a good range to evaluate).
The number of bits, $b$, of precision used to represent the floating-point numbers in the range [0,1] (e.g. 8 bits is probably a good benchmark).
The number of rows, $N$, in the training dataset.
Implementing in PAM
There are two main ways this could be implemented in PAM and they likely vary considerably in performance:
PAM with an impression nonce (see discussion here about doing this in PAM)
PAM without an impression nonce
If impressions can be identified by a nonce, we don’t need to send large feature vectors to and from the device but can join the label to the features before submitting to the MPC.
Using reports with impression nonce
To implement this logistic regression in PAM for publisher queries we can maintain a low client-server bandwidth by implementing it in the following way:
When a site registers an impression, the device will generate a random nonce, which is returned to the caller.
The publisher creates secret shares of the feature vector $X^{(i)}$ which they associate with this nonce.
The device will compute the attribution result for this impression, which is the label $y^{(i)}.$
Later, the publisher report will be sent as (nonce, EncryptedSecretShares($y^{(i)}$), DP parameters).
Upon receipt of the attributed impression report, the publisher joins the EncryptedSecretShares($y^{(i)}$) to the stored secret-shared feature vector $X^{(i)}$ having the same nonce.
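To make the join concrete, here is a minimal publisher-side sketch in Python. The share arithmetic, field size, and function names are illustrative assumptions, not part of PAM.

```python
import secrets

FIELD = 2**31 - 1  # illustrative modulus for additive secret sharing (an assumption)

def additive_shares(value, n_helpers=2):
    """Split an integer into n_helpers additive shares that sum to value mod FIELD."""
    shares = [secrets.randbelow(FIELD) for _ in range(n_helpers - 1)]
    shares.append((value - sum(shares)) % FIELD)
    return shares

# Publisher-side store: nonce -> secret-shared feature vector X^(i)
feature_store = {}

def on_impression_registered(nonce, features_fixed_point):
    """Called with the nonce the device returned for a registered impression.
    features_fixed_point is X^(i) already encoded as b-bit integers."""
    feature_store[nonce] = [additive_shares(x) for x in features_fixed_point]

def on_report_received(nonce, encrypted_label_shares, dp_params):
    """Join the encrypted label shares from the report to the stored feature
    shares with the same nonce; the joined record goes to the MPC helpers."""
    feature_shares = feature_store.pop(nonce, None)
    if feature_shares is None:
        return None  # no impression with this nonce; drop the report
    return feature_shares, encrypted_label_shares, dp_params
```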
MPC steps
We need a way to prove to the MPC that the values in $X^{(i)}$ are appropriately bounded (i.e. each component of $X^{(i)}$ in the range [0,1]).
Option 1: The publisher computes a zero-knowledge proof that the components of $X^{(i)}$ are in the range [0,1] and submits this along with a secret-shared $X^{(i)}$ to the MPC. The MPC Helpers check the zero-knowledge proofs.
Option 2: The publisher creates an XOR sharing of each of the $b$ bits of each feature (e.g. 8), such that each secret-shared bit can only be a {0, 1} value. The MPC Helpers then interpret these as $b$ bits of precision for a number in the range [0,1]; a plaintext sketch of this encoding appears after these steps.
The MPC then multiplies every component of the secret-shared $X^{(i)}$ by the secret-shared label $y^{(i)}$.
Next the MPC sums these across all records, to compute $$\sum_{i=1}^N y^{(i)}X^{(i)}$$
MPC adds DP noise to the $k$-dimensional output vector
Reveal the $k$-dimensional output vector
Scaling by $1/N$ can be computed outside the MPC.
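As a sanity check on the steps above, here is a plaintext reference computation in Python of what the MPC would produce: $b$-bit fixed-point encoding as in Option 2, multiply by the label, sum over records, add noise, reveal, then scale by $1/N$ outside. It is not an MPC implementation, and the Laplace noise_scale is an illustrative placeholder rather than a calibrated label-DP parameter.

```python
import numpy as np

def aggregate_in_the_clear(X, y, b=8, noise_scale=1.0, rng=None):
    """Plaintext reference for the MPC steps: sum_i y_i * X_i plus DP noise.

    X: (N, k) array of features in [0, 1]; y: (N,) array of 0/1 labels.
    In the real protocol X and y stay secret-shared; this only shows the math.
    noise_scale is a placeholder, not a calibrated label-DP parameter."""
    rng = rng or np.random.default_rng()
    scale = 2**b - 1
    X_enc = np.round(X * scale).astype(np.int64)   # b-bit fixed-point encoding (Option 2)
    total = (y[:, None] * X_enc).sum(axis=0)       # multiply by label, sum over all records
    noised = total + rng.laplace(scale=noise_scale, size=total.shape)  # DP noise on the k-vector
    revealed = noised / scale                      # revealed output, back in [0, 1] units
    return revealed / len(y)                       # scaling by 1/N, done outside the MPC
```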
Using reports without impression nonce
To implement this logistic regression without an impression nonce, the feature vector would need to be sent down to the device. The multiplication of the feature vector by the label would happen on the device. The resulting $y^{(i)}X^{(i)}$ vector would then need to be secret shared and sent off the device for aggregation. The device would also need to generate a zero-knowledge proof that the values in $y^{(i)}X^{(i)}$ are appropriately bounded. The MPC would then check these proofs, aggregate, and add DP noise. This approach will considerably increase the report size and client-server bandwidth. The MPC will be somewhat cheaper, since it no longer needs to compute the product $y^{(i)}X^{(i)}$ itself.
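For comparison with the nonce-based flow, a minimal sketch of the device-side step under this variant (omitting the zero-knowledge range proof; the additive-sharing details are illustrative assumptions, as in the earlier sketch):

```python
import secrets

FIELD = 2**31 - 1  # same illustrative modulus as in the earlier publisher-side sketch

def device_report(features_fixed_point, label, n_helpers=2):
    """On-device: multiply the downloaded feature vector X^(i) by the attributed
    label y^(i), then additively secret-share the product for the MPC helpers.
    A zero-knowledge range proof over the product would accompany these shares."""
    product = [label * x for x in features_fixed_point]   # y^(i) * X^(i)
    shares_per_helper = [[] for _ in range(n_helpers)]
    for value in product:
        parts = [secrets.randbelow(FIELD) for _ in range(n_helpers - 1)]
        parts.append((value - sum(parts)) % FIELD)
        for helper_shares, part in zip(shares_per_helper, parts):
            helper_shares.append(part)
    return shares_per_helper  # one k-dimensional share vector per helper
```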
Questions:
Would Apple consider offering private logistic regression in an ads measurement API, using any of the above approaches or others?
What are your thoughts on the tradeoffs of these two approaches and do either seem viable?