"MediaWiki:Copyright" message allows raw HTML
Closed, Resolved, Public

Description

[[MediaWiki:Copyright]] still allows raw HTML input, which a rogue admin can abuse by adding <img src="http://my_host/index.php?title=Special:UserLogout"/> to
[[MediaWiki:Copyright]] so that everyone is forcibly logged out.

I talked to the person responsible for security a long time ago (about a year ago), but nothing seems to have been done to address this issue, nor has any bug been filed.

Event Timeline

What happened here with https://gerrit.wikimedia.org/r/c/449689 ?

Someone fixed this but then it got conflicted and notfixed?

Someone proposed a fix, however nobody reviewed it so it never got merged.

Change 449689 abandoned by SBassett:

[mediawiki/core@master] Replace raw HTML copyright footer message with wikitext one

Reason:

This is so out-of-date at this point (the Skin.php code isn't remotely similar these days) that it's likely worth a complete revisit.

https://gerrit.wikimedia.org/r/449689

Is <a rel="license" really the only reason to use raw HTML? If it is, is there really a reason to keep this raw HTML functionality?

Is <a rel="license" actually useful? It's nice to follow guidelines¹ for using semantic HTML attributes, but do they really do anything useful from the point of view of metadata analysis, search engines, language models, etc.? If not, then maybe it's just dead weight? In the last few days, I've asked several people who I thought would know how this rel="license" thing works, and no one really knows.

If <link rel="license" is enough and <a rel="license" is redundant, then the raw HTML feature on the copyright messages should be just removed. (I am not saying that it's enough. It's just a wild guess and it might be wrong.)

If <a rel="license" is really necessary, it shouldn't be written as raw HTML in messages. Ideally, it should not be in messages at all, but inserted automatically by the software.

The main current reason I care about this is T360497. By itself, that issue directly affects only translatewiki staff. However, it would be just great to get rid of this raw HTML usage to make life easier for all the translatewiki volunteers, who currently have to copy lots of markup in this message, as well as in all its variants in WikimediaMessages.


¹ Which guidelines, actually? Perhaps these? I was a bit surprised that they mention only <a> and not <link>. Maybe we don't need <link rel="license"> either?

The authoritative guideline is https://html.spec.whatwg.org/dev/links.html#link-type-license; the most thorough guideline (but quite dated) is https://microformats.org/wiki/rel-license I think. Based on these I'd say the <a rel="license" is entirely superfluous and should be removed.

Is <a rel="license" really the only reason to use raw HTML? If it is, is there really a reason to keep this raw HTML functionality?

We do need this functionality at https://wiki.documentfoundation.org/Main_Page to get a working footer. At the moment there is no way to get the footer right: it is broken either for logged-in or for logged-out visitors.

Looks like only wikitext formatting is needed?

Raw HTML:

Please note that all contributions to The Document Foundation Wiki are considered to be released under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>, unless otherwise specified. This does not include the source code of LibreOffice, which is licensed under the <a href="https://www.libreoffice.org/download/license/">Mozilla Public License v2.0</a>. "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our <a href="/TradeMark Policy">trademark policy</a> (see <a href="/Project:Copyrights">Project:Copyrights</a> for details). LibreOffice was based on OpenOffice.org.<br />If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.

Wikitext:

Please note that all contributions to The Document Foundation Wiki are considered to be released under the [https://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-ShareAlike 3.0 Unported License], unless otherwise specified. This does not include the source code of LibreOffice, which is licensed under the GNU Lesser General Public License ([https://www.libreoffice.org/download/license/ LGPLv3]). "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our [[TradeMark Policy|trademark policy]] (see [[Project:Copyrights]] for details). LibreOffice was based on OpenOffice.org.<br />If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.

Then it is broken the other way round: either we use wikitext and it is broken for logged-in users, or vice versa. Thanks, I had tested this already. ;-)

If it is always wikitext, is there a way to make it OK? What functionality do you need there exactly?

If it is always wikitext, is there a way to make it OK?

Only-wikitext would work (hence no bug in mediawiki), but not for us, sadly.

What functionality do you need there exactly?

External links! We need to link to our legal disclaimer and postal address (the "Impressum" in German) so that it lives on one page for all services. (Otherwise we end up changing 100+ pages just to get the Impressum updated.)

Basically, at the moment it reads:

https://wiki.documentfoundation.org/Main_Page?uselang=de

Der Inhalt ist verfügbar unter der Lizenz the <a href="https://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>, unless otherwise specified. This does not include the source code of LibreOffice, which is licensed under the <a href="https://www.libreoffice.org/download/license/">Mozilla Public License v2.0</a>. "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our <a href="/TradeMark Policy">trademark policy</a> (see <a href="/Project:Copyrights">Project:Copyrights</a> for details). LibreOffice was based on OpenOffice.org.<br/>If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here., sofern nicht anders angegeben.

or

https://wiki.documentfoundation.org/Main_Page?uselang=en

Please note that all contributions to The Document Foundation Wiki are considered to be released under the Creative Commons Attribution-ShareAlike 3.0 Unported License, unless otherwise specified. This does not include the source code of LibreOffice, which is licensed under the GNU Lesser General Public License (LGPLv3). "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our trademark policy (see Project:Copyrights for details). LibreOffice was based on OpenOffice.org.
If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.

Please check both links yourself: the only difference between them is the user interface language of MediaWiki (while logged out!), yet the markup is visible in one and not in the other. There is at least one bug! Unfortunately this MediaWiki instance, although part of a global project, uses German as its default language.

Again: I believe this is a corner case that hits us. We do need to provide $wgRightsPage and/or $wgRightsUrl, but we also need some special/individual text! Most wikis work fine with the current system.

If custom text is used, then please simply allow changing the whole text, ignoring all parts before and after it, and allowing external links and HTML/CSS. It should be a new server setting ($wgRightsText or the like) so that it can only be changed by the sysadmin, ideally with i18n support (within the server config).
If the sysadmin wants to fuck up the reader/editor, then he can already add stuff via JS, HTML, PHP, whatever, so there are no security concerns this way! (And I really do not like the MediaWiki namespace for this, as too many people can blindly mess it up without communicating.)

Two ideas for making progress on this:

  • Rather than convert the HTML to wikitext, one avenue here might be to pass the "raw HTML" through the sanitizer. Sanitizer::removeSomeTags() will prevent many bad things, while still allowing the external links that German wiki wants. It would be able to block the <img> tag of the original bug report. If third-party wikis wanted to embed images in the footer, they could do that with CSS rather than embedded <img> tags.
  • We could also make a site-wide configuration variable for "copyright is raw HTML", default it to false, and set it to true only for German Wikipedia. That would allow us to incrementally improve our security footing without necessarily breaking German wiki or third parties which might rely on this.
    • A variant of this would be to add a new message: MediaWiki:CopyrightWikitext. If that message is non-empty, it is used in place of MediaWiki:Copyright and rendered from wikitext. (Or vice versa, maybe we'd want to use MediaWiki:CopyrightWikitext only if MediaWiki:Copyright was empty?) That would also allow wiki-by-wiki conversions so that only those wikis which actually need raw HTML use it.

So maybe something like:

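// Per-wiki values, keyed by database name
'wmgCopyrightRawHtml' => [
    'default' => false,
    'dewiki' => true,
],

// Assumes the array above is available here as $wmgCopyrightRawHtml
global $wgDBname, $wgRawHtmlMessages;
foreach ( $wmgCopyrightRawHtml as $k => $v ) {
    // Only the entry matching the current wiki's database name applies
    if ( $v === true && $k === $wgDBname ) {
        $wgRawHtmlMessages[] = 'copyright';
    }
}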

in IS.php and then remove 'copyright' as a default value for $wgRawHtmlMessages within various config-schema files?

  • We could also make a site-wide configuration variable for "copyright is raw html", and default it to false, and set it to true only for german wikipedia.

Why does dewiki need this? What in https://de.wikipedia.org/w/index.php?title=MediaWiki:Copyright&action=edit or https://de.wikipedia.org/w/index.php?title=MediaWiki:Wikimedia-copyright&action=edit requires raw HTML?

  • A variant of this would be to add a new message: MediaWiki:CopyrightWikitext. If that message is non-empty, it is used in place of MediaWiki:Copyright and rendered from wikitext. (Or vice-versa, maybe we'd want to use MediaWiki:CopyrightWikitext only if MediaWiki:Copyright was empty?) That would also allow wiki-by-wiki conversions so that only those wikis which actually need raw html use it.

This seems like the best idea to me. I think it will make for the easiest way to migrate them, and less potential to accidentally treat the message the wrong way.
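
For illustration only, the fallback described in the quoted proposal could be sketched roughly as follows. This is a hypothetical, standalone PHP sketch, not MediaWiki code and not what was eventually merged: the function name chooseCopyrightMessage, the message key copyrightwikitext, and the plain array standing in for the message store are all assumptions made for this example.

```php
<?php
declare( strict_types=1 );

/**
 * Hypothetical helper (not part of MediaWiki): decide which footer message
 * to show and how it should be rendered.
 *
 * @param array<string,string> $messages Message key => text (stand-in for the real message store)
 * @return array{key:string,format:string,text:string}
 */
function chooseCopyrightMessage( array $messages ): array {
	$wikitext = trim( $messages['copyrightwikitext'] ?? '' );
	if ( $wikitext !== '' ) {
		// A non-empty wikitext message wins and is to be parsed as wikitext.
		return [ 'key' => 'copyrightwikitext', 'format' => 'wikitext', 'text' => $wikitext ];
	}
	// Otherwise fall back to the legacy message, which is output as raw HTML.
	return [ 'key' => 'copyright', 'format' => 'rawhtml', 'text' => $messages['copyright'] ?? '' ];
}

// Usage: a wiki that has created only the wikitext message.
$choice = chooseCopyrightMessage( [ 'copyrightwikitext' => '[[Project:Copyrights]]' ] );
echo $choice['format'], "\n";
```

With this ordering, a wiki would switch over simply by creating the wikitext message; the reverse ordering mentioned in the proposal would only need the two checks swapped.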

Apparently it was already proposed back in 2018: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/449689

Rather than convert the html to wikitext, one avenue here might be to pass the "raw html" through the sanitizer.

That's simple to do but we'd end up with a custom MediaWiki page that behaves differently from any other MediaWiki page. Not great IMO.

We could also make a site-wide configuration variable for "copyright is raw html", and default it to false, and set it to true only for german wikipedia. That would allow us to incrementally improve our security footing...

Security-wise we are OK I think since the message is listed in $wgRawHtmlMessages. It would be a usability improvement (would allow more people to edit the message on other wikis). IMO not worth the complexity.

A variant of this would be to add a new message: MediaWiki:CopyrightWikitext. If that message is non-empty, it is used in place of MediaWiki:Copyright and rendered from wikitext.

Apparently it was already proposed back in 2018: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/449689

Yeah it just didn't get any reviews.

It's slightly more complicated because a hook is also involved, but only slightly.

Per the discussion with Security that happened as part of T367995: Security Preview for shared login domain, we should probably fix this before making the copyright message from all wikis show up on the same shared login domain.

I started working on this. I'm planning to basically revive https://gerrit.wikimedia.org/r/c/mediawiki/core/+/449689 – adding a new message and a new hook, and deprecating the old ones. To ensure that we don't accidentally reintroduce the raw HTML ones in WMF production (particularly on the shared login domain), I'll also add a config option that disables them completely.

In addition to updating the code in MediaWiki core, we'll also need to update WMF overrides for these messages in WikimediaMessages, and you wouldn't believe how many there are. I worked on that code a couple of years ago and somehow it has at least doubled in size since then. I have some cleanup patches up for review already: https://gerrit.wikimedia.org/r/q/topic:T45646-copyright-nohtml

We'll probably also need to update on-wiki overrides, if we don't want them to be lost when disabling the raw HTML option. I'll review how many of those exist, and either update them manually or write a bot to run from my WMF wiki account. That will be the last step after all code changes are merged and deployed.

Thank you! I'm not sure that I understand, however: when it's done, will there still be any raw HTML left in those messages in core, in the WikimediaMessages extension, or in the locally overridden messages on wiki? I think that raw HTML is not necessary anywhere, but I might be missing something.

If the raw HTML functionality is removed, the locally-overridden messages will have to be updated (or deleted), for example at https://fr.wikipedia.org/w/index.php?title=MediaWiki:Copyright&action=edit .

There won't be any HTML left, if all goes well. Core will have an option to continue using raw HTML (it may be removed in the future, but that's out of scope for now), but we won't be using it on Wikimedia wikis. I am planning to update those locally-overridden messages.

Great, thank you. Be bold and recommend that the communities delete them instead of overriding locally, unless they have a very strong legal or linguistic reason to override :)

Change #1075597 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Replace raw HTML copyright footer message with wikitext one

https://gerrit.wikimedia.org/r/1075597

Change #1075620 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/JsonConfig@master] Update to use SkinCopyrightFooterMessage hook, avoiding raw HTML

https://gerrit.wikimedia.org/r/1075620

Change #1075631 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/WikimediaMessages@master] Update to use SkinCopyrightFooterMessage hook, avoiding raw HTML

https://gerrit.wikimedia.org/r/1075631

Change #1076257 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/WikimediaMessages@master] Update translations of copyright messages

https://gerrit.wikimedia.org/r/1076257

Change #1076259 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/MobileFrontend@master] "mobile-frontend-copyright" is no longer a raw HTML message

https://gerrit.wikimedia.org/r/1076259

Change #1075597 merged by jenkins-bot:

[mediawiki/core@master] Replace raw HTML copyright footer message with wikitext one

https://gerrit.wikimedia.org/r/1075597

Change #1075620 merged by jenkins-bot:

[mediawiki/extensions/JsonConfig@master] Update to use SkinCopyrightFooterMessage hook, avoiding raw HTML

https://gerrit.wikimedia.org/r/1075620

The copyright message has been removed from languages/i18n/qqq.json by the localization updates in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1076561/2/languages/i18n/qqq.json . That got force merged since the banana checker failed with:

Running "banana:core" (banana) task
>> 1 message lacks documentation in qqq.json:
>> * copyright

The copyright message is deprecated by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1075597 and replaced by copyright-footer, history_copyright and copyright-footer-history. I guess the qqq message should have been kept.

Change #1076707 had a related patch set uploaded (by Hashar; author: Hashar):

[mediawiki/core@master] Restore "copyright" qqq message

https://gerrit.wikimedia.org/r/1076707

I am merging your patch seeing it as a partial revert of https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1076561/2/languages/i18n/qqq.json since this currently causes CI to fail for all core patches.

Change #1076707 merged by jenkins-bot:

[mediawiki/core@master] Restore "copyright" qqq message

https://gerrit.wikimedia.org/r/1076707

Change #1075631 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMessages@master] Update to use SkinCopyrightFooterMessage hook, avoiding raw HTML

https://gerrit.wikimedia.org/r/1075631

Change #1076257 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMessages@master] Update translations of copyright messages

https://gerrit.wikimedia.org/r/1076257

matmarex removed a project: Patch-For-Review.

I filed T376293 for finishing the cleanup in WikimediaMessages (later this month), and T376295 for finishing the removal in MediaWiki core (in an unspecified future MediaWiki release).

matmarex moved this task from Soon to Current Sprint on the MediaWiki-Platform-Team board.

I've noticed a problem as I was preparing to use this on Wikimedia wikis: the copyright-footer-history message is not being used unless history_copyright is also created.

Change #1080095 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] SkinComponentCopyright: Fix message existence check for history-copyright

https://gerrit.wikimedia.org/r/1080095

Change #1080279 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@wmf/1.43.0-wmf.27] SkinComponentCopyright: Fix message existence check for history-copyright

https://gerrit.wikimedia.org/r/1080279

Change #1080279 merged by jenkins-bot:

[mediawiki/core@wmf/1.43.0-wmf.27] SkinComponentCopyright: Fix message existence check for history-copyright

https://gerrit.wikimedia.org/r/1080279

Mentioned in SAL (#wikimedia-operations) [2024-10-15T13:50:50Z] <urbanecm@deploy2002> Started scap sync-world: Backport for [[gerrit:1080279|SkinComponentCopyright: Fix message existence check for history-copyright (T45646)]]

Mentioned in SAL (#wikimedia-operations) [2024-10-15T13:54:03Z] <urbanecm@deploy2002> urbanecm, matmarex: Backport for [[gerrit:1080279|SkinComponentCopyright: Fix message existence check for history-copyright (T45646)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Change #1080095 merged by jenkins-bot:

[mediawiki/core@master] SkinComponentCopyright: Fix message existence check for history-copyright

https://gerrit.wikimedia.org/r/1080095

Mentioned in SAL (#wikimedia-operations) [2024-10-15T14:24:13Z] <urbanecm@deploy2002> Finished scap sync-world: Backport for [[gerrit:1080279|SkinComponentCopyright: Fix message existence check for history-copyright (T45646)]] (duration: 33m 23s)

Change #1080369 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@wmf/1.43.0-wmf.26] SkinComponentCopyright: Fix message existence check for history-copyright

https://gerrit.wikimedia.org/r/1080369

Change #1080369 merged by jenkins-bot:

[mediawiki/core@wmf/1.43.0-wmf.26] SkinComponentCopyright: Fix message existence check for history-copyright

https://gerrit.wikimedia.org/r/1080369

Mentioned in SAL (#wikimedia-operations) [2024-10-15T20:57:10Z] <cjming@deploy2002> Started scap sync-world: Backport for [[gerrit:1080369|SkinComponentCopyright: Fix message existence check for history-copyright (T45646)]]

Mentioned in SAL (#wikimedia-operations) [2024-10-15T20:59:19Z] <cjming@deploy2002> cjming, matmarex: Backport for [[gerrit:1080369|SkinComponentCopyright: Fix message existence check for history-copyright (T45646)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-10-15T21:04:00Z] <cjming@deploy2002> Finished scap sync-world: Backport for [[gerrit:1080369|SkinComponentCopyright: Fix message existence check for history-copyright (T45646)]] (duration: 06m 51s)

Thanks to all who have worked on fixing this.

I hope this isn't going off-topic too much...

The other strange legacy element of the basic UI is [[MediaWiki:Sidebar]], whose format is just plain weird. There's a plan to form a committee to review the sidebar at T318435, but it seems to be "death by committee", i.e. a committee is formed, and then nothing gets done. ¯\_(ツ)_/¯

I mean, the code looks like this -- it's like its own weird markup language:

* navigation
** mainpage|mainpage-description
** Wikipedia:Contents|contents
** currentevents-url|currentevents
** randompage-url|randompage
** Wikipedia:About|aboutsite
** contact-url|contactpage
** sitesupport-url|sitesupport
* SEARCH
* interaction
** helppage|help
**Help:Introduction|introduction
** portal-url|portal
** recentchanges-url|recentchanges
** Wikipedia:File upload wizard|upload
* TOOLBOX

It seems to me that the best fix for that is also to convert it to wikitext, and that a fix similar to the one deployed here should be instituted. I've reported this as T326471.

@Bugreporter2: Please bring this up in T318435 (I don't see any "committee" mentioned there). Thanks.