Add raw HTML messages in WMF-deployed extensions to $wgRawHtmlMessages
Closed, Resolved, Public

Description

Following up on the work being done for T45646, I've identified the following raw HTML messages in WMF-deployed extensions:

  • WikimediaMessages
    • wikimedia-mobile-license-links
    • wikimedia-copyright
    • wikidata-copyright
    • wikimedia-feedback-termsofuse
    • Others?
  • JsonConfig
    • jsonconfig-license
    • Others?
  • TimedMediaHandler
    • some raw <a> tags can be seen in the i18n file; I haven't gone through it to check which other messages are raw HTML
    • Others?
  • WikiEditor
    • all help messages (wikieditor-toolbar-help-*)
    • all titles of jQuery UI dialogs (wikieditor-toolbar-tool-*-title)
    • Others?
  • Gadgets
    • MediaWiki:Gadgets-definition (and possibly others?)
    • Others?

These messages should be added to the raw HTML messages list (RawHtmlMessages) in extension.json, support for which is being added by @Tgr in his patch.

Needless to say, this list is not exhaustive. There are probably many other raw HTML messages, and a proper audit should be done. Perhaps @Bawolff, who wrote phan-taint-check-plugin, might have thoughts on this?

Event Timeline

TTO triaged this task as Medium priority. Aug 2 2018, 10:42 AM
TTO created this task.

Change 450445 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/WikimediaMessages@master] Mark some raw HTML messages

https://gerrit.wikimedia.org/r/450445

Change 450449 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/JsonConfig@master] Mark some raw HTML messages

https://gerrit.wikimedia.org/r/450449

Change 455603 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/extensions/Gadgets@master] Mark MediaWiki:Gadgets-definition as a raw HTML message

https://gerrit.wikimedia.org/r/455603

Change 450445 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMessages@master] Mark some raw HTML messages

https://gerrit.wikimedia.org/r/450445

Change 450449 merged by jenkins-bot:
[mediawiki/extensions/JsonConfig@master] Mark some raw HTML messages

https://gerrit.wikimedia.org/r/450449

Change 456030 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/extensions/WikiEditor@master] Mark some messages as raw HTML

https://gerrit.wikimedia.org/r/456030

Change 455603 merged by jenkins-bot:
[mediawiki/extensions/Gadgets@master] Mark MediaWiki:Gadgets-definition as a raw HTML message

https://gerrit.wikimedia.org/r/455603

I am happy with the current task.

However, I would like to see a deprecation process initiated now.

  • WMF projects that use unsanitizable raw HTML today should be informed and asked to convert it to content that obeys wikitext limitations.
  • Third-party projects should be informed that raw HTML is deprecated and will be phased out within two releases.
  • If it does not exist yet, a $wgHtmlHead setting for <meta> tags and the like could be introduced for deliberate configuration, available only to site sysops rather than through any wiki page.
  • As far as I know, TimedMediaHandler and WikiEditor will be phased out in the long run, so there is no need to hurry with them.

The final target should be to sanitize every system message, but it might take some years to reach that point. Still, things are clearer when there are no exceptions, backdoors, or hidden sophisticated bypasses.

Change 456030 merged by jenkins-bot:
[mediawiki/extensions/WikiEditor@master] Mark some messages as raw HTML

https://gerrit.wikimedia.org/r/456030

Just stating for the record that raw HTML messages have generally been considered "deprecated" for ages. No new code should introduce new raw HTML messages, and raw HTML messages are grounds for an extension to fail the security review that new extensions are required to go through.

It may take a while to phase out some legacy stuff (some people apparently rely on the copyright message being raw HTML), but any messages that just happen to be raw HTML, with nobody depending on them being that way, should be phased out aggressively.

AFAIK there is no equivalent of rawParam in JavaScript, so if your message takes an HTML parameter (which is pretty common), you have to make it a raw HTML message, even if the message itself is not intended to contain any HTML.

Change 467003 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/TimedMediaHandler@master] Fix message escaping or mark as raw

https://gerrit.wikimedia.org/r/467003

Change 467003 merged by jenkins-bot:
[mediawiki/extensions/TimedMediaHandler@master] Fix message escaping or mark as raw

https://gerrit.wikimedia.org/r/467003

I don't see messages with HTML parameters as common, but it does happen sometimes. However, I've never seen anyone turn such a message into a raw HTML message for that reason. There are at least two safe alternatives that I see commonly used:

  1. The JS equivalent of a raw parameter is to first process the message safely (e.g. parse or escape it), and then replace() the $-placeholder. This is exactly what the PHP code does, just abstracted behind a method call.
  2. If you have a DOM node (not an unparsed raw HTML string), you can also use mediawiki.jqueryMsg, which can replace a $-placeholder within an otherwise safely text-transformed or wikitext-parsed interface message.

Both of these approaches are used in core and various extensions.
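As a rough illustration of route 1, here is a minimal JavaScript sketch that assumes the message text is already available as a plain string. escapeHtml and escapedWithRawParams are hypothetical helpers written for this example, not MediaWiki APIs; real code would start from MediaWiki's own escaped or parsed output rather than hand-rolled escaping.

```javascript
// Hypothetical helpers illustrating "escape first, then substitute";
// not part of MediaWiki.

/** Escape HTML-significant characters so the text is inert inside .html(). */
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

/**
 * Escape the whole message, then replace $1, $2, ... with trusted HTML.
 * Escaping leaves "$" and digits alone, so the placeholders survive it.
 * Substituted fragments are not re-scanned, because replace() makes one pass.
 */
function escapedWithRawParams(messageText, rawHtmlParams) {
  return escapeHtml(messageText).replace(/\$(\d+)/g, (match, n) => {
    const param = rawHtmlParams[Number(n) - 1];
    // Leave placeholders without a matching parameter untouched.
    return param === undefined ? match : param;
  });
}

// Example:
// escapedWithRawParams("See $1 <b>now</b>", ['<a href="/wiki/Foo">Foo</a>'])
// → 'See <a href="/wiki/Foo">Foo</a> &lt;b&gt;now&lt;/b&gt;'
```

Escaping before substituting is what keeps this safe: any markup in the (possibly on-wiki editable) message is neutralized, while only the trusted HTML passed in as a parameter survives.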

Fair enough.

This task should be done unless someone has an idea what to do about all the "Others?" fields.

I tried some grepping to see if I could find more. Here's what I got:

for i in `find . -name '*.js' -type f`
	do if tr $'\n'$'\t' ' ' < "$i" | grep -o -e '[^ ]*\.html(\s*\(mw\.msg\s*([^()]*\([^()]*([^()]*)[^()]*\)*)*\|mw\.[mM]essage\s*([^()]*\([^()]*([^()]*)[^()]*\)*)\s*\.\s*\(params(\([^()]*([^()]*)[^()]*\)*)\.\)*\(plain\|text\)\s*(\s*)\)\s*)*'
	then echo $'\t'--^ Raw html message in file "$i";fi;
done
).html(      mw.msg( 'mwe-embedplayer-credit-title',       // get the link       $( '<div>' ).append
	--^ Raw html message in file ./extensions/TimedMediaHandler/resources/mw.MediaWikiPlayerSupport.js
.html( mw.msg( 'cx-entrypoint-dialog-page-doesnt-exist-yet', targetAutonym ) )
	--^ Raw html message in file ./extensions/ContentTranslation/modules/entrypoint/ext.cx.entrypoint.js

[Note: I don't have all extensions downloaded]

Change 467255 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/TimedMediaHandler@master] Escape one more message

https://gerrit.wikimedia.org/r/467255

Change 467256 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/ContentTranslation@master] Fix escaping of cx-entrypoint-dialog-page-doesnt-exist-yet

https://gerrit.wikimedia.org/r/467256

My hope in adding the "Others?" entries was that someone would ride to the rescue with phan, to give us confidence that we've spotted most, if not all, of them.

Your bash magic has some issues with files that have spaces in their names, but here's the rest of the output, run against everything in Gerrit.

).html( mw.message( 'bs-extendedsearch-search-center-result-no-results' ).plain() )
	--^ Raw html message in file ./BlueSpiceExtendedSearch/resources/bs.extendedSearch/panel/Results.js
.html( mw.message( 'bs-extendedsearch-autocomplete-result-primary-no-results-label' ).plain() )
.html( mw.message( 'bs-extendedsearch-autocomplete-result-secondary-results-label' ).plain() )
.html( mw.message( 'bs-extendedsearch-autocomplete-modified-time-label', this.mtime ).plain() )
	--^ Raw html message in file ./BlueSpiceExtendedSearch/resources/bs.extendedSearch/mixin/Autocomplete.js
).html( mw.msg( 'poll-js-action-complete' ) )
).html( mw.msg( 'poll-js-action-complete' ) )
	--^ Raw html message in file ./PollNY/resources/js/Poll.js
).html(      mw.msg( 'mwe-embedplayer-credit-title',       // get the link       $( '<div>' ).append
	--^ Raw html message in file ./TimedMediaHandler/resources/mw.MediaWikiPlayerSupport.js
.html( mw.msg( 'imagerating-category', category ) )
	--^ Raw html message in file ./ImageRating/resources/js/ImageRating.js
.html( mw.msg( 'cx-entrypoint-dialog-page-doesnt-exist-yet', targetAutonym ) )
	--^ Raw html message in file ./ContentTranslation/modules/entrypoint/ext.cx.entrypoint.js
.html( mw.msg(       'smitespam-created-by',       users[groupCreator] ? users[groupCreator].link : groupCreator      )     )
	--^ Raw html message in file ./SmiteSpam/static/js/ext.smitespam.js

Change 467255 merged by jenkins-bot:
[mediawiki/extensions/TimedMediaHandler@master] Escape one more message

https://gerrit.wikimedia.org/r/467255

Change 467256 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Fix escaping of cx-entrypoint-dialog-page-doesnt-exist-yet

https://gerrit.wikimedia.org/r/467256

Can this task be closed, since the task title talks about "WMF-deployed extensions", with dedicated subtasks of T2212 created for BlueSpice, ImageRating, MediaWiki-extensions-SmiteSpam and PollNY (which don't have a RawHtmlMessages string in their extension.json files), as per T200997#4667849?

It's currently waiting for an expert to audit and confirm that all such messages have been done, so no.

matmarex claimed this task.
matmarex subscribed.

As a self-proclaimed expert, I declare that the listed messages are all I know about, and that nobody is going to do an audit of all of our code. I don't think it makes sense to keep this task open. It was already suggested to close it in 2018 (T200997#4663076).

Some of the messages listed in the task have already been fixed to not use raw HTML, or are in the process of being fixed (e.g. https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiEditor/+/934416, https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1075597), so keeping this open is a bit confusing.