Show warning about privacy/security issues on PDF file pages
Closed, Resolved | Public | 2 Estimated Story Points

Description

There are various privacy and security issues with some PDF clients (Adobe Reader, Adobe Acrobat) which can be used to expose the IP address and other private data of the reader to the author of the document. These are not new but probably still not widely known.

We should display a small warning on PDF description pages and link to a page with information about how to set secure defaults in mainstream PDF readers.

Event Timeline

Tgr raised the priority of this task from to Needs Triage.
Tgr updated the task description. (Show Details)
Tgr added projects: Multimedia, acl*security.
Tgr subscribed.

Isn't that something that could be filtered/sanitized at the upload stage? That's how we usually deal with file formats that can contain code/phone home features.

Apparently PDF has so many ways to hide such content that detecting such files is a hard problem. See https://blog.avast.com/2011/04/22/another-nasty-trick-in-malicious-pdf/ for an example.

T89745 has some details, but in general there doesn't seem to be any library claiming to reliably identify such files, just various experimental tools and research projects. There are apparently several ways user data could be leaked by a PDF reader (JavaScript, Flash, remote images, reference XObjects, DRM), and several content obfuscation techniques on top of that.
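To illustrate why upload-time filtering is hard, here is a minimal sketch of the kind of naive scan one might try first. It is not what any existing tool does; the token list and the function names (`SUSPICIOUS_NAMES`, `decode_name_escapes`, `naive_scan`) are made up for this example. It only looks for a few PDF name tokens in the raw bytes, after undoing the `#xx` hex escapes that PDF allows inside names.

```python
from __future__ import annotations

import re

# Hypothetical, non-exhaustive list of PDF name tokens that are often
# associated with active or remote content. Chosen for illustration only.
SUSPICIOUS_NAMES = (
    b"/JavaScript",
    b"/JS",
    b"/Launch",
    b"/RichMedia",
    b"/SubmitForm",
    b"/URI",
)

def decode_name_escapes(data: bytes) -> bytes:
    """Replace #xx hex escapes (allowed inside PDF names) with the raw byte."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        data,
    )

def naive_scan(data: bytes) -> set[str]:
    """Return the suspicious tokens found by a plain substring search."""
    decoded = decode_name_escapes(data)
    return {name.decode("ascii") for name in SUSPICIOUS_NAMES if name in decoded}

if __name__ == "__main__":
    plain = b"1 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj"
    escaped = b"1 0 obj << /S /J#61vaScript >> endobj"
    print(naive_scan(plain))
    print(naive_scan(escaped))
```

A scan like this sees only what is stored uncompressed and unencrypted. Anything inside compressed object streams, encrypted documents, or deliberately malformed structures passes straight through, which is consistent with the point above that reliable detection remains an open problem.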

Gilles edited a custom field.

From what I understood of the problem I considered the following:

  • The warning is about the format, so it may make sense to place it closer to where the format is displayed.
  • A subtle but visible indication is preferred since (a) it will appear for all PDFs regardless of the user being already aware, and (b) we want to avoid yet another warning box users tend to ignore.
  • A compact initial presentation that the user can expand for more details when needed would accommodate the different needs.

Based on that, I think the following could work:

  • For the original file section where the format information is provided, make the "application/pdf" text use a warning color. I picked orange (#FF5D00) in the example below, but we can go with yellow (#FFB50D) or red (#D11813), depending on the severity we consider for the issue.
  • An additional warning icon is shown next to the file format using the same color.
  • When hovering over or clicking on the file format text or the icon, a tooltip will describe the issue, including a link to a description page on how to configure your PDF viewer properly. I also included a remark to clarify that the issue is not specific to the file but general to the format.
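The proposal above ties the warning to the format rather than to the individual file, which can be sketched as a lookup keyed by MIME type. This is only an illustration of the idea, not the code that was later written for MediaWiki; `FormatWarning`, `FORMAT_WARNINGS`, `warning_for`, the message wording and the help page name are all placeholders invented for this example. Only the orange color value comes from the proposal.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FormatWarning:
    color: str      # color for the MIME type text and the warning icon
    message: str    # tooltip text shown on hover or click
    help_page: str  # page explaining how to configure the viewer

# Hypothetical registry: one entry per format that needs a warning.
FORMAT_WARNINGS = {
    "application/pdf": FormatWarning(
        color="#FF5D00",  # the orange suggested in the proposal
        message="Placeholder wording: this format may expose private data.",
        help_page="Placeholder:PDF_viewer_settings",  # target not decided in the thread
    ),
}

def warning_for(mime_type: str) -> Optional[FormatWarning]:
    """Return the warning for a MIME type, or None if the format has none."""
    return FORMAT_WARNINGS.get(mime_type.strip().lower())

if __name__ == "__main__":
    for mime in ("application/pdf", "image/png"):
        warning = warning_for(mime)
        print(mime, "->", warning.color if warning else "no warning")
```

Keeping the data in a registry means every PDF gets the same indication regardless of its content, and another format could be given a warning by adding one entry.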

A mockup with the above idea:

pdf-warning.png (731×1 px, 247 KB)

The icon used:

I really like the mockup! @Gilles, if this is something multimedia can schedule to get developed, I think we can safely call this addressed.

Yeah, we'll schedule it for our next sprint, which starts later today.

Gilles added a subscriber: Pginer-WMF.
MarkTraceur triaged this task as High priority.

@Pginer-WMF your mockup is on a PDF that only has one page. On multi-page PDFs, the page count comes after the MIME type:

pdf-pages.png (438×1 px, 102 KB)

Any changes with that in mind? (I will assume no for now, we can make them later)

@Pginer-WMF sorry to hit you with concerns after the fact, but you also have, in the mockup, only part of the second message linked. Would it be terribly ugly for me to add a "learn more" link as a separate paragraph under the main warning?

And...what is that link supposed to be to?

I think the same approach should work on multi-page PDFs, but it is a good catch, we can check how it looks in case we may want to adjust something (e.g., some spacing).

Change 194564 had a related patch set uploaded (by MarkTraceur):
Add warning about PDF files on the file page.

https://gerrit.wikimedia.org/r/194564

Change 194565 had a related patch set uploaded (by MarkTraceur):
Add framework for file warnings

https://gerrit.wikimedia.org/r/194565

Change 194569 had a related patch set uploaded (by MarkTraceur):
[WIP] Add warning about PDF files on the file page.

https://gerrit.wikimedia.org/r/194569

Change 194564 abandoned by MarkTraceur:
[WIP] Add warning about PDF files on the file page.

Reason:
Sorry! See https://gerrit.wikimedia.org/r/194569

https://gerrit.wikimedia.org/r/194564

Bit stalled as we deal with insanity in oojs-ui to make this possible.

Trivial OOjs UI portion of this task now complete.

Change 194565 merged by jenkins-bot:
Add framework for file warnings

https://gerrit.wikimedia.org/r/194565

Change 194569 merged by jenkins-bot:
Add warning about PDF files on the file page.

https://gerrit.wikimedia.org/r/194569

Change 199972 had a related patch set uploaded (by MarkTraceur):
Add message documentation for file warning

https://gerrit.wikimedia.org/r/199972

csteipp asked me yesterday, on IRC, whether this could be closed, but the i18n patch still needs to be merged. One small fix to do. I'll do it first thing.

Change 199972 merged by jenkins-bot:
Add message documentation for file warning

https://gerrit.wikimedia.org/r/199972

Thanks for the merge, @Tgr

Maybe we could force uploaded PDFs to be converted to DjVu and stop supporting PDFs natively.

However, this would require a lengthy conversion process to render them (and if they are prerendered as bitmaps, their storage size would increase considerably: this could be a problem for PDFs stored in Commons containing scanned facsimiles of books for Wikisource).

Can't we use some safe XML format to render them, such as "paged SVG" or an open document format like eBook?

IMHO, the eBook format should be supported by default in Commons (like DjVu) to deprecate PDF uploads (and avoid all server-side conversions: these conversions should be performed by a utility on a ToolLabs server, or by users themselves), but I have not checked whether it is fully safe (can it embed scripts?). The ODT format is definitely not safe (just like Microsoft document formats that embed scripts in several languages).

The plain old RTF format could be enabled as well (without the scriptable extensions added by Microsoft), as could standard TeX in a standard profile.

We could also support PostScript, with Ghostscript running on the server in its "safe" minimum profile, including for generating DjVu files (or just a single image in PNG format for previewing a single page of the PostScript file).

Maybe we could force uploaded PDFs to be converted to DjVu and stop supporting PDFs natively.

That does not sound reasonable - PDF is a popular format.