This is per the discussion in AdRoll/privacy#1, but is spelled out more clearly (hopefully).
A malicious actor opts to abuse the Aggregate Reporting API for deanonymization. They start by selecting a subset of users to track, for example by flat bidding on each user, with each user receiving a unique flat bid. The bids could be drawn from a superincreasing knapsack of the form {1, 2, 4, 8, 16, ..., 2^(N-1)}, which permits tracking up to N distinct users, since the bids are effectively user IDs. Each user is frequency-capped to a single impression.
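As a concrete sketch of the bid assignment (the function and names are illustrative, not part of any API):

```python
# Illustrative sketch: give each tracked user a unique flat bid drawn from
# a superincreasing sequence (powers of two), so that any subset sum can
# later be decoded back into the exact set of users.

def assign_bids(user_ids):
    """Map each user to a distinct power-of-two flat bid, i.e. a user ID."""
    return {user: 2 ** i for i, user in enumerate(user_ids)}

bids = assign_bids(["alice", "bob", "carol", "dave"])
# bids == {"alice": 1, "bob": 2, "carol": 4, "dave": 8}
```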
On impression delivery, the malicious actor calls:
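A minimal sketch of such a call, where `ToyAggregateStore` and its `record`/`query` methods are a stand-in for the Aggregate Reporting API (the real API surface differs):

```python
# Hypothetical sketch only: ToyAggregateStore models the essential property
# of an aggregate reporting system, namely that it exposes sums, not rows.

class ToyAggregateStore:
    def __init__(self):
        self.totals = {}

    def record(self, key, value):
        # Values are only ever accumulated into per-key sums.
        self.totals[key] = self.totals.get(key, 0) + value

    def query(self, key):
        return self.totals.get(key, 0)

def on_impression(store, user_bid, publisher="publisher.example"):
    """On impression delivery, record this user's unique bid as 'spend'."""
    store.record(key=publisher, value=user_bid)
```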
Spend here is just an example; any summable quantity will work. With no noise added, the malicious actor can call the API and get back sum(spend) across all users. This sum, because each user is effectively represented as a bit in an N-bit binary number, perfectly identifies which users have been to publisher.example. (Note that being a binary representation isn't necessary for the attack; any superincreasing knapsack will work.)
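The read-out step can be sketched as a toy decoder (power-of-two bids make the reported sum a bitmask over users):

```python
def decode_users(total_spend, bids):
    """Recover exactly which users were seen: with power-of-two bids, the
    aggregate sum has one bit per tracked user."""
    return {user for user, bid in bids.items() if total_spend & bid}

bids = {"alice": 1, "bob": 2, "carol": 4, "dave": 8}
# A reported sum of 5 = 1 + 4 means exactly alice and carol visited.
```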
In theory, differential privacy adds noise to prevent any revelation that a particular user was included in the dataset. However, the malicious actor attempts to circumvent this. On impression delivery, the malicious actor does:
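A sketch of this repeated, hash-obfuscated recording (the hashing scheme and the dict-based store are assumptions for illustration):

```python
import hashlib
from collections import defaultdict

totals = defaultdict(int)  # toy stand-in for the aggregate store

def record_m_copies(user_bid, M, publisher="publisher.example"):
    """Record the same spend under M distinct hash keys. Each key looks like
    an unrelated report, hiding that publisher.example repeats M times."""
    keys = []
    for j in range(M):
        key = hashlib.sha256(f"{publisher}/{j}".encode()).hexdigest()
        totals[key] += user_bid
        keys.append(key)
    return keys
```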
The attacker then queries the API for each of these M reports and receives s1 = sum(spend1), s2 = sum(spend2), ..., sM = sum(spendM). They then compute int((s1 + s2 + ... + sM) / M), effectively canceling out the mean-0 noise added by differential privacy. They convert the resulting integer to binary and read off which users have been to publisher.example.
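A simulation of the averaging step (Laplace noise is an assumption about the mechanism; the point is only that independent mean-0 noise shrinks under averaging):

```python
import random

def noisy_query(true_sum, scale):
    """Simulate one DP-protected answer: true sum plus mean-0 Laplace noise.
    (The difference of two independent Exp(1) draws is Laplace(0, 1).)"""
    return true_sum + scale * (random.expovariate(1.0) - random.expovariate(1.0))

def recover_sum(true_sum, M, scale):
    """Average M independent noisy answers; the noise cancels as M grows,
    with the standard error of the mean falling off as 1/sqrt(M)."""
    return round(sum(noisy_query(true_sum, scale) for _ in range(M)) / M)
```

Because the standard error of the mean falls as 1/sqrt(M), raising the noise scale only forces the attacker to raise M quadratically.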
Note that it is not sufficient to limit the number of queries from the malicious actor: they could use multiple identities with different access policies to the reports, or collude with other malicious actors. Any increase in the noise added by differential privacy can be countered by increasing M. Nor does M need to be large enough to completely recover the least-significant bit; even revealing only the most-significant bit results in an epsilon of infinity.
Instead, it seems the browser needs some mechanism to detect that an attack like this is occurring. That could be difficult, given that the distinct hashes obscure the fact that a record for publisher.example is being repeated M times. Moreover, such hashes could serve a legitimate use case, for example tracking spend per (creative, interest group, campaign, advertiser, publisher, etc.), each of which would likely appear as a hash on its own.