Provide Topics API for not adding current page's topics #54
I'm surprised to hear you say that. During previous discussions of the API we have very consistently heard that it needs to be "pay to play", based on the risk that sites would always choose to request a user's topics without contributing their own visits to the model, if that choice were possible.
I should clarify: I think we should still combine 2 and 3 in my description of the API for this reason. If the API caller decides to set such a flag, the API caller only gets topics from calls where they contributed. If a caller never contributes, it never receives topics.
I understand, but the risk we've heard about is of a particular publisher site saying "it's OK to use topics for targeting on my site but not to have my site contribute to the topics profile".
Very large sites, such as those that carry user-generated content from large numbers of users, are already effectively opted out of contributing, just by having some content about each topic. (They could opt back in by splitting their content into multiple sections: #17.)
This is something that SSPs can manage in their relationships with publishers, similar to other agreements around audience targeting that are common today.
I agree that the contribution risk is mitigated now that a caller only receives topics for sites that they run (2) on. I think teasing apart (1) and (2+3) into separate spaces makes sense. The fact that 1+2+3 are tied together has caused problems for #7, where sending the topic via a request header would also incur behaviors (2+3) which the server may not have wanted or agreed to. I think the way forward there is to split (1) into a request header and (2+3) into a response header. Your proposal of adding an argument to `browsingTopics()` could address the JS side of this as well.
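To make the request/response split concrete, here is a minimal sketch. The header names (`Sec-Browsing-Topics` for (1), `Observe-Browsing-Topics` for (2+3)) and the helper functions are illustrative assumptions, not anything specified in this thread:

```js
// Sketch of the header split discussed above (names illustrative).
// (1) rides on the request; (2+3) only happen if the server opts in
// via its response.
const express = require('express');
const app = express();

// Hypothetical helpers standing in for real ad-tech logic.
const shouldContribute = (host) => host !== 'big-ugc-platform.example';
const selectAd = (topics) => ({ ad: 'creative-123', basedOn: topics ?? [] });

app.get('/ad-request', (req, res) => {
  // (1) Read-only use of the topics the browser attached to the request.
  const topics = req.get('Sec-Browsing-Topics');

  // (2+3) Opt in to observation/contribution only when the server wants
  // this page to feed the user's topics profile.
  if (shouldContribute(req.hostname)) {
    res.set('Observe-Browsing-Topics', '?1');
  }
  res.json(selectAd(topics));
});

app.listen(3000);
```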
@stguav thinking more on this, it seems difficult for a server to figure out when it ought to call (2+3) in order to filter out particular topics. The server would (a) need to know what Chrome thinks the topics are for the given site, and (b) risk collateral damage if the site has other topics that the server is interested in. Is the use case here purely to filter out overreported topics from the user's top 5? A better solution to that problem might be weighting the topics by frequency.
I'm not sure I agree that this is difficult for a server to figure out. I believe the Chrome classification will be available as discussed in #64 (comment), and this seems similar to other kinds of optimizations that ad tech providers (ATPs) are used to making.
Yes, the primary use case is to have more control over the kinds of topics the API would return, and frequency weighting would help. However, "overreported" is not precisely the same issue. ATPs are interested in commercial relevance. (Just because a topic is rare does not mean that it is commercially relevant.) I expect that Chrome may not want to be deciding the commercial relevance of different topics, since different parts of the ads ecosystem are likely to have different opinions, and creating a broad consensus seems difficult. So it seems preferable to me to provide more control and flexibility in the API, as long as it doesn't compromise user privacy or ecosystem health.
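As a concrete illustration of the kind of optimization described here, a caller with access to the per-site classification (as discussed in #64) could gate its contribution on its own relevance list. All names, hostnames, and topic IDs below are hypothetical:

```js
// Hypothetical: an ATP's own notion of commercially relevant topic IDs.
const COMMERCIALLY_RELEVANT = new Set([215, 333, 420]);

// Hypothetical stand-in for the per-site classification discussed in #64.
const SITE_TOPICS = new Map([
  ['big-generic-publisher.example', [1, 2]], // generic, low-value topics
  ['niche-autos-site.example', [215]],       // commercially relevant
]);

// Contribute (2+3) only when the site's topics are worth adding to the
// user's future top 5 from this ATP's point of view; otherwise just do (1).
function shouldObserve(hostname) {
  const topics = SITE_TOPICS.get(hostname) ?? [];
  return topics.some((t) => COMMERCIALLY_RELEVANT.has(t));
}

console.log(shouldObserve('big-generic-publisher.example')); // false
console.log(shouldObserve('niche-autos-site.example'));      // true
```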
I'd need to think further on the impact it might have on user privacy. What if different third parties have different ideas of what topics are commercially relevant, and the top 5 topics wind up being super noisy? I guess the idea is that a little bit of influence over the top topics is better than none?
One important use case for a separate set/get API (either in the form of JS or the headers proposed in #7) is regulatory compliance. Unlike user consent signals, the publisher's control settings might only exist on the server side. The client is not able to decide whether it is OK to use the API until it has talked to the server. Most of the pressure comes from the setting part of the API (2+3). If we can separate the API into a getter and a setter, it will largely mitigate the concern.
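A sketch of the compliance flow described here, assuming a hypothetical split API. The names `getBrowsingTopics`, `observeBrowsingTopics`, and the `/publisher-controls` endpoint are illustrative, not proposed spellings:

```js
// With today's combined API, the client must wait for the server before it
// can safely call the API at all, delaying (1) along with (2+3):
async function combinedFlow() {
  const { mayContribute } = await (await fetch('/publisher-controls')).json();
  if (!mayContribute) return null;  // (1) is blocked together with (2+3)
  return document.browsingTopics(); // (1) + (2) + (3) in one call
}

// With a split API, (1) can proceed immediately; only the setter waits on
// the server-side publisher-control check:
async function splitFlow() {
  const topicsPromise = document.getBrowsingTopics?.(); // (1) only (hypothetical)
  const { mayContribute } = await (await fetch('/publisher-controls')).json();
  if (mayContribute) {
    document.observeBrowsingTopics?.(); // (2+3) only (hypothetical)
  }
  return topicsPromise;
}
```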
@zhengweiwithoutthei Not all users are going to be in a consent-based jurisdiction. We have to be able to handle both GDPR and similar jurisdictions (where you need a basis for processing, generally consent in the case of marketing) and opt-out-based jurisdictions like California, where you don't need consent in advance, but if an opt-out is in effect for a given user/site pair you can't share their info. (You can't "sell" their info today, including any exchange of data for something of value; "share" takes effect next year.) So there are regulatory issues around (1) as well.
@dmarti Yes, I agree there are regulatory issues around (1) as well. My view of this issue is as follows: the processing of (1) (use of the topics signal for targeting) is usually done server side, where you have a more complete set of signals for consent, configuration, opt-out, etc. So even if all of the client-side consent checks pass and we call the API to get the topics, the server side still has the choice to nullify or ignore the signal if an opt-out is in effect. This should satisfy both consent-based and opt-out-based jurisdictions. However, for (2) and (3), if the action of setting the topics profile is done client side together with (1), we might not yet have all of the legal basis needed in advance, and there is no way to revert the topic observation after talking to the server side, so it can be a bigger issue.
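A minimal sketch of the server-side handling of (1) described above; the registry, key format, and sample data are assumptions:

```js
// Hypothetical in-memory stand-in for an opt-out registry keyed by
// user/site pair (in practice this would live in a consent-management DB).
const optOutStore = new Set(['user-123:news.example']);

// Server side: even when all client-side checks passed and topics arrived
// with the ad request, nullify the signal if an opt-out is in effect.
function effectiveTopics(userId, siteOrigin, topicsFromClient) {
  if (optOutStore.has(`${userId}:${siteOrigin}`)) {
    return null; // ignore the topics signal for this user/site pair
  }
  return topicsFromClient;
}

console.log(effectiveTopics('user-123', 'news.example', [215])); // null
console.log(effectiveTopics('user-456', 'news.example', [215])); // [215]
```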
I do think it's reasonable to separate (1) and (2+3); the fact that they were combined was due to legacy FLoC reasons that are no longer relevant. So the plan is (sometime in the next few milestones) to create an option for calling the API without contributing the current page.
I am supportive of this proposal. Please also consider a request/response header version of the implementation (#7) for the same proposal.
Thanks for the update, @jkarlin!
Hey all. Quick update on thoughts in this space. Pros:
Cons:
Another item for the "cons" section is that excluding the current site's topics could help callers monetize scraper sites (and other low-value, low-engagement sites). If the same widely used caller is on legit topic-focused sites and on scraper sites, it can choose to call the Topics API on the scraper sites in order to get valuable topics leaked from the legit sites -- but under the current API it also ends up contributing (2+3) whatever topics the scraper site carries, obfuscating its own view of the user. Blocking collection of topics for the current site makes it easy for a caller to run a one-way data flow from legit sites to scraper sites.
Typically the browser doesn't make judgement calls as to what sites are high or low value. If the user is going to the site, then presumably it has value to the user at that time. Therefore I'm not sure I see that as a con here. |
People visit low-quality sites all the time, generally because of some kind of deceptive link. They bounce, but usually after the ad impressions count as viewed. Depending on the rewards for running a scraper or other crappy site, those sites will have more or less incentive to try to get users to click on deceptive links (using black-hat SEO, social and forum spam, malware, whatever). The more that the Topics API gives an advantage to low-quality sites over high-quality ones, the more deceptive links users will likely be exposed to.
The Topics API provides one zero-argument function, `document.browsingTopics()`, which serves three logically distinct purposes:

1. returning the user's top topics to the caller,
2. marking the caller as having observed the user on the current page (so the caller is eligible to receive this page's topics in future calls), and
3. contributing the current page to the computation of the user's top topics.

It would be useful to provide a little more control over these three different aspects of the API. In particular, there is some tension between the first two purposes and the third. For the first two use cases, there is no downside (aside potentially from some latency) to calling the API. Each ad tech is incentivized to call the API whenever possible, either to get useful signals or to enable nonempty responses for future calls to the API.

On the other hand, there are potential downsides to calling the API when it comes to the third purpose. Consider, for example, a very large publisher site whose topics at the domain/subdomain level are generic and not commercially relevant. An ad tech might like to call the API there to get useful signals, but with the current API it may not be worth the risk of contaminating the user's future top 5 topics with those generic, commercially irrelevant topics.

It would be beneficial to provide an argument that controls this behavior, something like `browsingTopics(add_current_topics=true)`. Since eligibility is determined per API caller, there should be no ecosystem concern about "freeloaders" getting other callers' topics without contributing. There also does not seem to be any detrimental effect on user privacy. While the concern mentioned above might be partially mitigated by improved topics ranking and commercially focused taxonomy changes, it seems best to give API callers this flexibility in how they use the API.
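For illustration, here is how a caller might use the proposed argument. The options-object spelling (the proposal's `add_current_topics=true` notation is pseudocode) and the host heuristic are assumptions on top of the proposal, not a shipped API:

```js
// Hypothetical caller logic using the proposed flag: skip contributing on
// large generic sites whose domain-level topics would pollute the user's
// future top 5, but still read the user's topics for targeting (1).
const GENERIC_HOSTS = new Set(['big-ugc-platform.example']); // assumption

async function getTopicsForAd() {
  const addCurrent = !GENERIC_HOSTS.has(location.hostname);
  // Proposed shape from this issue; today's API takes no such argument.
  return document.browsingTopics({ add_current_topics: addCurrent });
}
```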