
Enable cross-domain API requests in API's JSON responses
Closed, Resolved, Public

Description

I was hoping that the response from a GET request to Wikipedia's API[1] would include a CORS "Access-Control-Allow-Origin: *" header, so that it could be accessed by a client-side script running on any domain.

I ended up using the JSONP response as a workaround, but this is less secure than cross-origin JSON, and shouldn't really be necessary now that browsers support CORS headers.

Would it be possible to add an "Access-Control-Allow-Origin: *" header to the API's JSON responses?

[1] https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmlimit=max&cmtype=subcat&format=json&cmtitle=Category:Set_theory


Version: unspecified
Severity: enhancement

Event Timeline


For information, on every public wiki hosted in the WMF cluster, $wgCrossSiteAJAXdomains contains the following domains:

'*.wikipedia.org',
'*.wikinews.org',
'*.wiktionary.org',
'*.wikibooks.org',
'*.wikiversity.org',
'*.wikisource.org',
'wikisource.org',
'*.wikiquote.org',
'*.wikidata.org',
'*.wikivoyage.org',
'www.mediawiki.org',
'm.mediawiki.org',
'wikimediafoundation.org',
'advisory.wikimedia.org',
'auditcom.wikimedia.org',
'boardgovcom.wikimedia.org',
'board.wikimedia.org',
'chair.wikimedia.org',
'chapcom.wikimedia.org',
'collab.wikimedia.org',
'commons.wikimedia.org',
'donate.wikimedia.org',
'exec.wikimedia.org',
'grants.wikimedia.org',
'incubator.wikimedia.org',
'internal.wikimedia.org',
'login.wikimedia.org',
'meta.wikimedia.org',
'movementroles.wikimedia.org',
'office.wikimedia.org',
'otrs-wiki.wikimedia.org',
'outreach.wikimedia.org',
'quality.wikimedia.org',
'searchcom.wikimedia.org',
'spcom.wikimedia.org',
'species.wikimedia.org',
'steward.wikimedia.org',
'strategy.wikimedia.org',
'checkuser.wikimedia.org',
'usability.wikimedia.org',
'wikimania????.wikimedia.org',
'wikimaniateam.wikimedia.org'

This allows requests from one wiki to another.

This is disabled on private wikis hosted in the WMF cluster.
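As a rough illustration of how a wildcard list like the one above could be matched against a request's origin, here is a minimal Python sketch. It assumes that `*` matches any run of characters and `?` matches exactly one, which is how Python's `fnmatch` treats them; MediaWiki's own matcher may differ in detail. The function name `origin_is_allowed` and the shortened pattern list are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Abbreviated, illustrative subset of the patterns listed above.
ALLOWED_DOMAIN_PATTERNS = [
    "*.wikipedia.org",
    "wikisource.org",
    "commons.wikimedia.org",
    "wikimania????.wikimedia.org",
]


def origin_is_allowed(origin: str, patterns=ALLOWED_DOMAIN_PATTERNS) -> bool:
    """Return True if the host part of an Origin value matches any pattern."""
    host = urlsplit(origin).hostname  # None for values such as "null"
    if host is None:
        return False
    # Assumption: '*' = any run of characters, '?' = exactly one character.
    return any(fnmatchcase(host, pattern) for pattern in patterns)
```

Under these assumptions, `https://en.wikipedia.org` and `https://wikimania2015.wikimedia.org` would be accepted, while an arbitrary third-party origin would not.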

Regarding the clarification of intent provided in T88532, I could see the addition of support to the API for specifying "origin=*" meaning to return Access-Control-Allow-Origin: * (without any other CORS-related headers, particularly without Access-Control-Allow-Credentials) while also performing the same forcing of anonymous responses and refusal to provide tokens that are done for JSONP (see also Gerrit change 180430).

@csteipp, what do you think about that idea?

My attempt to more clearly phrase this task:

To allow client-side JavaScript applications to fetch information from MediaWiki APIs, add the following header to API responses, allowing the response to be read by an application running on a different domain:

Access-Control-Allow-Origin: *

In the current documentation for CORS usage in cross-site requests, it states:

"If the CORS origin check passes, MediaWiki will include the Access-Control-Allow-Credentials: true header in the response, so authentication cookies may be sent."

What it should also say -- once this is implemented -- is that if the CORS origin check doesn't pass, MediaWiki will not include the Access-Control-Allow-Credentials: true header in the response, so authentication cookies may not be sent, but it will still include the Access-Control-Allow-Origin: * header so that responses to unauthenticated requests can be read from any origin.

Notes:

eaton.alf set Security to None.

> I could see the addition of support to the API for specifying "origin=*" meaning to return Access-Control-Allow-Origin: *

This could be good, except it makes clients add an extra parameter -- it would be most straightforward to return Access-Control-Allow-Origin: * when no origin parameter is specified.

> > I could see the addition of support to the API for specifying "origin=*" meaning to return Access-Control-Allow-Origin: *
>
> This could be good, except it makes clients add an extra parameter -- it would be most straightforward to return Access-Control-Allow-Origin: * when no origin parameter is specified.

We need to have some trigger for "force anonymous response and no-tokens", we can't just blindly allow all origins because of the no-token part.

We might be able to use the presence of an Origin header without the origin parameter as such a trigger, but RFC 6454 § 7.3 explicitly allows the agent to give an Origin header even for same-origin requests so I'd be wary of doing that.

> We need to have some trigger for "force anonymous response and no-tokens", we can't just blindly allow all origins because of the no-token part.

Could you elaborate on why that would be a problem?

Anomie, you're wrong. Unless the domains in question are behind a firewall (e.g. intranet or home network) adding Access-Control-Allow-Origin: * has absolutely no negative consequences. It enables the same kind of thing possible with curl. No need to worry about cookies or HTTP authentication.

> Anomie, you're wrong. Unless the domains in question are behind a firewall (e.g. intranet or home network) adding Access-Control-Allow-Origin: * has absolutely no negative consequences. It enables the same kind of thing possible with curl. No need to worry about cookies or HTTP authentication.

Sorry, but that is not correct. Allowing browser access from arbitrary external sites would basically nullify our protection against CSRF attacks.

As I understand it, the CSRF protection involves sending a token to authenticated users, who must be sending requests from origins that are in the whitelist (i.e. Wikimedia sites that send credentials and are allowed to make edits) as those are the only origins for which the Access-Control-Allow-Credentials: true header is added to responses.

What I don't understand is how that relates to requests from any other origin, which are guaranteed to be anonymous. What's the harm in adding Access-Control-Allow-Origin: * to those responses?

Anomie, no it would not. I recommend studying e.g. https://annevankesteren.nl/2015/02/same-origin-policy or https://annevankesteren.nl/2012/12/cors-101 or maybe even reading the specification at https://fetch.spec.whatwg.org/ itself.

CSRF is about a request vulnerability. Adding a CORS header on a response is neither going to make you vulnerable, nor protected from such a vulnerability. CORS is about sharing the data in a response (if we ignore CORS preflights, which are not relevant here).

Specifying * only allows the data to be read and only from requests that include neither cookies nor HTTP authentication information associated with the user. This is essentially the same as if you did curl from a public server.

I've run into this today. I inferred from the documentation on MediaWiki that I would get Access-Control-Allow-Origin: <value of origin header>, since allowing all domains seems like a reasonably sensible option when doing GET queries, but, as discussed here, I didn't. If we are really at the stage of not trusting the browsers to implement the standard correctly (as far as I know they all do), it would be possible to reject requests sent with a Cookie header.

While we're on the subject, what's the point in the origin GET parameter anyway? Why not just use the value of the Origin header? They're checked to make sure they're identical anyway so why do they both have to be present?

Another point to make here is that JSONP is less secure, since anyone with control over the MediaWiki site can then make the users of my site execute arbitrary JavaScript.

TheDJ subscribed.

I had informally asked Chris to review the security aspects of this after Anne and Frankie left their comments, but I don't think it was ever properly on the radar. So I'm adding Application Security Reviews to this, in hopes that at least the request is tracked.

So hereby the review request:
With the current CORS implementation that we have in core (significantly different from 2-3 years ago when this was last investigated), plus the comments of Anne, is there any reason why we should not remove the origin restriction on anonymous wildcard access?

Second, with the changed implementation of the origin checks (basically denying multiple origins in the origin header), do we still actually need that origin param on the API request?

Allowing read-only access for XMLHttpRequests to the API by using "Access-Control-Allow-Origin: *" (and "Access-Control-Allow-Credentials: false" as defense in depth) makes sense as long as our infrastructure can handle the additional load from increased API use and pre-flighting.

When considering exposure of user credentials as the primary threat, users using older browsers that don't implement CORS are protected by the same-origin policy. Users using browsers which fully and correctly implement CORS are protected by nature of their adherence to the spec. Our main concern is users using browsers with broken CORS implementations which, specifically, violate the CORS spec by passing cookies when "Access-Control-Allow-Origin: *" is set by the server, and for which the CORS behavior supersedes same origin behavior. I did not encounter any such implementations in my testing, which included the two most recent versions of each major browser listed at http://caniuse.com/#search=CORS. Further automated testing using BrowserStack may be done to achieve coverage closer to that reflected in our actual user statistics (https://stats.wikimedia.org/wikimedia/squids/SquidReportClients.htm), but this initial data is heartening.

Additionally, browsers which support CORS should automatically send the "Origin:" header, so I believe that the separate "origin" request parameter can be removed from, or at least deprecated in, the API (re. https://phabricator.wikimedia.org/T62835#1122434 and https://phabricator.wikimedia.org/T62835#1454829).

The reason for the "origin" request parameter was concern that the impact of varying all API responses on the Origin header would be disastrous for caching, see T22814#248552. @tstarling or someone else familiar with the caching should be asked if that concern still applies to our current varnish caching setup.

Also, are there any interesting browsers that violate the spec by using a cache for "Access-Control-Allow-Origin: *" if the same URL is hit from different origins?

For more defense in depth, BTW, we'd probably want to add the CORS check to ApiBase::lacksSameOriginSecurity() to additionally force an anonymous user and prevent certain other actions (login, account creation, token fetch) if we're returning "Access-Control-Allow-Credentials: false".

If there's a way we can log a request that is coming from a non-whitelisted domain, and includes any MediaWiki session cookies, that would be helpful.

Ran into this again at Jerusalem hackathon, trying to do a wikidata demo. Can't XHR from off-domain JS, have to use JSONP still. :( Anything still blocking this?

Krenair renamed this task from Enable cross-domain Wikipedia API requests in API's JSON responses to Enable cross-domain API requests in API's JSON responses. Apr 1 2016, 2:12 PM
In T62835#2168690, @brion wrote:

> Ran into this again at Jerusalem hackathon, trying to do a wikidata demo. Can't XHR from off-domain JS, have to use JSONP still. :( Anything still blocking this?

I'm still waiting on a reply to T62835#1794676, mainly the first paragraph.

> I'm still waiting on a reply to T62835#1794676, mainly the first paragraph.

Isn't caching of API requests pretty much completely broken to begin with? There's no way to purge API requests when their data is updated, so response bodies are going to be either uncached or cached too long.

It does rely on user-requested time-based caching rather than purging like index.php endpoints do, but it's not "completely broken". OTOH, I believe it would be very possible to write an "action=pagedata" endpoint that would be purgeable where action=query isn't. It might even be possible for action=query in the future, see T122867 for details.

But I believe the concern in T22814#248552 was that varying all API requests on Origin would fragment the cache so severely that it would negatively impact the whole caching system.

Looks like API requests by default are not cached; the caller must opt in to caching by setting maxage and/or smaxage on the URL. So, 'Origin' would only have to be added to the 'Vary' header -- and would only affect caching -- for GET/HEAD requests where maxage/smaxage are included in the URL. I have no idea how much of our API traffic is included in that subset, or how much of that subset is web browser traffic (which would include an Origin header) versus bot traffic (which would probably not).
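The opt-in rule described in this comment can be sketched as a small predicate. This is only an illustration of the commenter's reading (GET/HEAD requests that set `maxage` or `smaxage` are the ones that become cacheable, and so the only ones where `Origin` would need to join `Vary`); it is not taken from MediaWiki's code, and the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def needs_vary_on_origin(method: str, url: str) -> bool:
    """True if, under the rule described above, the response could be cached
    and would therefore have to list Origin in its Vary header."""
    if method.upper() not in ("GET", "HEAD"):
        return False  # other methods are assumed never to be cached
    params = parse_qs(urlsplit(url).query)
    # Caching is opt-in: the caller must pass maxage and/or smaxage.
    return "maxage" in params or "smaxage" in params
```

For example, a GET to a hypothetical `https://example.org/w/api.php?action=query&smaxage=300` would be affected, while the same request without `smaxage` would not.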

For requests which we want to make available from any domain (I guess those would be the requests which don't need write permissions?), we would set Access-Control-Allow-Origin: * anyway, so there is no need to vary on Origin, whether it's cached or not.

It would vary on the session cookie though, since some read modules return private data if the user is logged in and has sufficient permissions. The same is already true for JSONP so as far as I can see that would have no adverse effect on caching performance.

In T62835#2189281, @Tgr wrote:

> For requests which we want to make available from any domain (I guess those would be the requests which don't need write permissions?), we would set Access-Control-Allow-Origin: * anyway, so there is no need to vary on Origin, whether it's cached or not.

...except that's only true for non-authenticated requests since for authenticated the standard requires Access-Control-Allow-Origin: <actual origin>. But the cache is already varied on the session cookie, so even if it is not varied on origin, that would work out, right? (Also, varnish could be hacked to replace * with the actual origin.)

In T62835#2168690, @brion wrote:

> Ran into this again at Jerusalem hackathon, trying to do a wikidata demo. Can't XHR from off-domain JS, have to use JSONP still. :( Anything still blocking this?

I also ran into this recently and would love to see it closed!

In T62835#2189299, @Tgr wrote:
In T62835#2189281, @Tgr wrote:

> > For requests which we want to make available from any domain (I guess those would be the requests which don't need write permissions?), we would set Access-Control-Allow-Origin: * anyway, so there is no need to vary on Origin, whether it's cached or not.
>
> ...except that's only true for non-authenticated requests since for authenticated the standard requires Access-Control-Allow-Origin: <actual origin>. But the cache is already varied on the session cookie, so even if it is not varied on origin, that would work out, right?

No, it wouldn't. Consider if someone does the same CORS request to Commons from enwiki and dewiki, the Access-Control-Allow-Origin must differ.

> (Also, varnish could be hacked to replace * with the actual origin.)

That would be a horrible idea, IMO.

> No, it wouldn't. Consider if someone does the same CORS request to Commons from enwiki and dewiki, the Access-Control-Allow-Origin must differ.

Why? If it's an unauthenticated request, just set *. If it's an authenticated request, it won't be cached anyway (downstream it might be, but still varied on the session cookie, which is never the same for two different domains).

Reviewing this whole bug, we seem to have two different requests that seem to be being conflated. Some of that confusion may be my fault in the recent revival of attention to this task.

  1. Allow for CORS requests from any domain, returning Access-Control-Allow-Origin: * and Access-Control-Allow-Credentials: false and internally making ApiBase::lacksSameOriginSecurity() return true.
  2. Remove the need for clients to specify the origin URL parameter when intending to do a CORS request.

Number 1 I think could be done now, I don't see objections raised to it. And that seems to be what @brion needs. Number 2 is what is blocked on making sure it won't blow up our caching infrastructure to vary every request on the Origin header.

So perhaps we should move forward on #1 here and someone can file a separate task for #2.
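To make item 1 concrete, the following Python sketch models the header decision as described in this thread. It is a simplification under stated assumptions: the function name and its three inputs are hypothetical, error handling for a mismatched origin is omitted, and the real logic lives in MediaWiki's PHP code, which this does not reproduce.

```python
def cors_response_headers(origin_param, origin_header, origin_whitelisted):
    """Return (CORS headers, lacks_same_origin_security) for one request.

    origin_param       -- value of the `origin` URL parameter, or None
    origin_header      -- value of the Origin request header, or None
    origin_whitelisted -- True if origin_header matched the allowed domains
    """
    if origin_param is None:
        # Not declared as a CORS request: no CORS headers at all.
        return {}, False
    if origin_param == "*":
        # Item 1: readable from anywhere, but forced anonymous and token-free.
        return {
            "Access-Control-Allow-Origin": "*",
            "Access-Control-Allow-Credentials": "false",
        }, True
    if origin_param == origin_header and origin_whitelisted:
        # Existing behaviour for whitelisted sites: credentials allowed.
        return {
            "Access-Control-Allow-Origin": origin_header,
            "Access-Control-Allow-Credentials": "true",
        }, False
    # Mismatch or non-whitelisted origin: no CORS headers in this sketch.
    return {}, False
```

The second element of the returned tuple stands in for what ApiBase::lacksSameOriginSecurity() would report, so callers can refuse logins, account creation, and token fetches whenever it is true.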

Change 282391 had a related patch set uploaded (by Anomie):
API: Allow anonymous CORS from anywhere, when specifically requested

https://gerrit.wikimedia.org/r/282391

In T62835#2191089, @Tgr wrote:

> > No, it wouldn't. Consider if someone does the same CORS request to Commons from enwiki and dewiki, the Access-Control-Allow-Origin must differ.
>
> Why? If it's an unauthenticated request, just set *. If it's an authenticated request, it won't be cached anyway (downstream it might be, but still varied on the session cookie, which is never the same for two different domains).

The session cookie for the two CORS requests to Commons will be the same, despite the different Origin headers. And it's not guaranteed that it won't be marked as cacheable in our varnish, depending on just what the request is.
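The failure mode described here can be reproduced with a toy cache. The sketch below is purely illustrative: the cache, the URL, and the cookie value are hypothetical and say nothing about how Varnish is actually configured. It only shows that keying on URL plus session cookie is not enough when the same cookie arrives with two different Origin headers.

```python
def origin_response(origin: str) -> dict:
    """Headers an authenticated CORS response would need for this origin."""
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
    }


def fetch_via_cache(cache: dict, url: str, session_cookie: str,
                    origin: str, vary_on_origin: bool) -> dict:
    """Toy shared cache keyed on URL and cookie, plus Origin only if asked."""
    key = (url, session_cookie, origin) if vary_on_origin else (url, session_cookie)
    if key not in cache:
        cache[key] = origin_response(origin)  # cache miss: ask the backend
    return cache[key]


URL = "https://commons.wikimedia.org/w/api.php?action=query&smaxage=300"  # hypothetical
COOKIE = "commons-session-abc"  # hypothetical; identical for both requests

unvaried = {}
first = fetch_via_cache(unvaried, URL, COOKIE, "https://en.wikipedia.org", False)
second = fetch_via_cache(unvaried, URL, COOKIE, "https://de.wikipedia.org", False)
# `second` is the cached entry generated for en.wikipedia.org, so the browser
# on de.wikipedia.org would see a non-matching Access-Control-Allow-Origin.

varied = {}
fetch_via_cache(varied, URL, COOKIE, "https://en.wikipedia.org", True)
third = fetch_via_cache(varied, URL, COOKIE, "https://de.wikipedia.org", True)
# With Origin in the cache key, each origin gets its own entry.
```

The cost of the second variant is the cache fragmentation that the earlier comments worry about: one entry per distinct origin.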

Any progress here? I would love to see this issue get fixed. Thanks!

See T62835#2191138 for some clarification of the task. A patch has been submitted based on that clarification, but no one has been brave enough to merge it.

This question hasn't been answered:

> If there's a way we can log a request that is coming from a non-whitelisted domain, and includes any MediaWiki session cookies, that would be helpful.

> This question hasn't been answered:
>
> > If there's a way we can log a request that is coming from a non-whitelisted domain, and includes any MediaWiki session cookies, that would be helpful.

Yes, such a thing is possible, although it's a bit ugly. https://gerrit.wikimedia.org/r/294348

Change 282391 merged by jenkins-bot:
API: Allow anonymous CORS from anywhere, when specifically requested

https://gerrit.wikimedia.org/r/282391

Anomie claimed this task.

Marking this bug as resolved, since unauthenticated cross-domain API requests are now possible. This should be deployed to WMF wikis with 1.28.0-wmf.10, see https://www.mediawiki.org/wiki/MediaWiki_1.28/Roadmap for the schedule.
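For readers wondering what such a request looks like, here is a minimal Python sketch that builds the URL from the task description with the `origin=*` URL parameter added. The helper name `anonymous_cors_url` is hypothetical; only the endpoint, the query parameters, and `origin=*` come from this task.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def anonymous_cors_url(params: dict) -> str:
    """Append origin=* so the API answers with Access-Control-Allow-Origin: *."""
    query = dict(params, origin="*")
    # Keep '*' and ':' literal for readability; percent-encoding them
    # would be equally valid.
    return API_ENDPOINT + "?" + urlencode(query, safe="*:")


url = anonymous_cors_url({
    "action": "query",
    "list": "categorymembers",
    "cmlimit": "max",
    "cmtype": "subcat",
    "format": "json",
    "cmtitle": "Category:Set_theory",
})
```

A browser script on any domain should then be able to read the JSON response for this URL with an ordinary cross-origin request, provided it does not send credentials; the response is always the anonymous one.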

Again, if someone wants to follow up on the tangent about the need for the 'origin' URL parameter, file a separate task for that.

Note, we should document this on the mw.org CORS page

Hello,

		if ( $request->getVal( 'origin' ) === '*' ) {
			$this->lacksSameOriginSecurity = true;
			return true;
		}

This doesn't seem to solve the problem for client applications running in a web browser, where the application has no control over the 'Origin' header that the browser includes in the request. Why not just check whether the request has credentials and, if it does NOT, include Access-Control-Allow-Origin: *?

That line checks the origin URL parameter, not the header.

My bad! I understood from reading this thread that Access-Control-Allow-Origin: * gets included only for requests with header Origin: * and didn't verify it before posting that comment.

After your comment and re-reading https://www.mediawiki.org/wiki/Manual:CORS#Description I understood that it talks about the query string parameter. I edited that page to state even more clearly that the parameter has nothing to do with the Origin header of the HTTP request.
https://www.mediawiki.org/w/index.php?title=Manual:CORS&diff=prev&oldid=2289620