Maniphest T119494

Don't "strip tags" from DOIs
Closed, Resolved · Public · 0 Estimated Story Points

Assigned To

Authored By

	Josve05a
	Nov 24 2015, 11:20 AM

Description

Paste:

10.1002/1096-8628(20000612)96:3<302::aid-ajmg13>3.0.co;2-i

in Citoid and it will fill in the citation; however, in the DOI field it will add:

10.1002/1096-8628(20000612)96:33.0.CO;2-I

Which is a non-existent DOI.

Checked on sv.wiki production.

Related Objects

Mentioned In: T175632: Citoid for DOI has problems with < >
T140990: Citing url with urlencoded string for < > fails
Mentioned Here: T228575: Decrease number of open tickets with assignee field set for more than two years (aka cookie licking) (March-June 2020 edition)

Event Timeline

Josve05a created this task. Nov 24 2015, 11:20 AM

Josve05a raised the priority of this task from to Needs Triage.

Josve05a updated the task description. (Show Details)

Josve05a added a project: Citoid.

Josve05a subscribed.

Restricted Application added subscribers: StudiesWorld, Aklapper. · View Herald Transcript · Nov 24 2015, 11:20 AM

Hmm. It's getting caught by a fix that strips html tags out of any fields using a node library called stripTags. This is a validation measure to make sure we aren't accidentally sending html to wikis... not sure how to get around this except not to use it on the doi fields. I'll have to consult with security :).
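To illustrate the failure mode described above, here is a minimal sketch of what a generic tag stripper does to this DOI. The `naiveStripTags` function is a hypothetical stand-in written for this example; it is not the code of the library Citoid uses, which may behave differently in detail.

```javascript
// Hypothetical stand-in for a generic tag stripper (an assumption for
// illustration, NOT the actual library code used by Citoid).
function naiveStripTags(text) {
  // Remove anything that looks like <...>, as a simple stripper would.
  return text.replace(/<[^>]*>/g, '');
}

// DOI from the task description; the <302::aid-ajmg13> part is valid DOI
// content, but it looks like an HTML tag to the stripper.
const doi = '10.1002/1096-8628(20000612)96:3<302::aid-ajmg13>3.0.co;2-i';
const stripped = naiveStripTags(doi);

console.log(stripped);
```

Apart from letter case, the stripped value matches the broken DOI reported in the description: the whole `<302::aid-ajmg13>` segment is gone.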

Mvolz added a project: deprecated-security-team-reviews. Dec 7 2015, 8:34 PM

Mvolz set Security to None.

Mvolz edited projects, added acl*security; removed deprecated-security-team-reviews.

Hmm. It's getting caught by a fix that strips html tags out of any fields using a node library called stripTags

Use a better library?

Stripping HTML tags does not sound like the correct solution here. Unless you're worried about dirty data that has extra HTML tags you don't want in it, I would expect that you would simply want to escape angle brackets (i.e. turn >, < into &gt; and &lt; respectively).
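A minimal sketch of the escaping approach suggested above, assuming plain string replacement is acceptable. The function names `escapeAngleBrackets` and `unescapeAngleBrackets` are hypothetical and exist only for this example. The point is that escaping is reversible, whereas stripping loses part of the DOI.

```javascript
// Escape the characters that are significant in HTML. The ampersand is
// escaped first so that existing "&" characters are not double-handled.
function escapeAngleBrackets(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Reverse the escaping. The ampersand is restored last so that a literal
// "&lt;" in the original text survives the round trip.
function unescapeAngleBrackets(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}

const doi = '10.1002/1096-8628(20000612)96:3<302::aid-ajmg13>3.0.co;2-i';
const escaped = escapeAngleBrackets(doi);
const restored = unescapeAngleBrackets(escaped);

console.log(escaped);
console.log(restored === doi);
```

The escaped form is only safe for consumers that decode it again (for example, an HTML renderer); a consumer that treats the field as raw text would see the entities literally.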

@Bawolff, That's exactly what we're worried about. It was originally a fix for some Zotero data that was coming in with div and i tags.

Fittingly their wikipedia translator, hah :).

Also, changing < into &lt; in a DOI changes its meaning.

Also, changing < into &lt; in a DOI changes its meaning

These should be different layers: turning < into &lt; when it is fed to the wiki means it will eventually get turned back into a '<' when read by the user.

That's exactly what we're worried about. It was originally a fix for some Zotero data that was coming in with div and i tags.

I'm not exactly sure about the syntax of DOIs, but that means you'd probably have to strip HTML-type tags, but only those that are div, span, i, etc., and then escape angle brackets for the other uses of angle brackets.
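The selective approach described above could look roughly like the following sketch. The tag list, the regular expression, and the name `stripKnownTagsThenEscape` are all assumptions made for illustration; a real implementation would need a considered list of tags and probably a proper HTML parser.

```javascript
// Hypothetical list of formatting tags to remove (assumption: taken from the
// examples named in the comment above, not from any real configuration).
const FORMATTING_TAGS = ['div', 'span', 'i'];

function stripKnownTagsThenEscape(text) {
  // Match opening, closing, and self-closing forms of the listed tags only,
  // with optional attributes. Anything else in angle brackets is left alone.
  const tagPattern = new RegExp(
    '</?(?:' + FORMATTING_TAGS.join('|') + ')(?:\\s[^<>]*)?/?>',
    'gi'
  );
  const withoutTags = text.replace(tagPattern, '');

  // Escape whatever angle brackets remain so they are kept, not dropped.
  return withoutTags
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const title = stripKnownTagsThenEscape('<i>Homo sapiens</i>');
const doi = stripKnownTagsThenEscape(
  '10.1002/1096-8628(20000612)96:3<302::aid-ajmg13>3.0.co;2-i'
);

console.log(title);
console.log(doi);
```

One limitation of this sketch: a DOI that happened to contain a literal `<i>` or `<div>` segment would still be mangled, so it narrows the problem rather than eliminating it.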

Bawolff claimed this task. Dec 8 2015, 10:08 PM

Bawolff triaged this task as Medium priority.

Bawolff edited projects, added Security-Team; removed acl*security.

That's true, but we can't guarantee that every consumer of the doi field is going to be html, so that makes me concerned. I think I'd be happier not using striptags just on the doi field. The issue we had with polluting html tags was not in the doi field and I think is unlikely to be. @mobrovac what do you think?
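A rough sketch of the per-field alternative proposed above: strip tags from every string field except the DOI. The field names (`DOI`, `title`), the `UNSTRIPPED_FIELDS` set, and both functions are hypothetical and written only for this example; they do not reflect Citoid's actual data model or code.

```javascript
// Fields that should pass through untouched (assumption: the DOI field is
// named "DOI" in the citation object).
const UNSTRIPPED_FIELDS = new Set(['DOI']);

// Hypothetical stand-in for the tag stripper, for illustration only.
function stripTags(text) {
  return text.replace(/<[^>]*>/g, '');
}

function sanitizeCitation(citation) {
  const result = {};
  for (const [field, value] of Object.entries(citation)) {
    if (typeof value !== 'string' || UNSTRIPPED_FIELDS.has(field)) {
      // Leave non-string values and exempt fields exactly as received.
      result[field] = value;
    } else {
      result[field] = stripTags(value);
    }
  }
  return result;
}

const sanitized = sanitizeCitation({
  title: '<i>Some title</i>',
  DOI: '10.1002/1096-8628(20000612)96:3<302::aid-ajmg13>3.0.co;2-i'
});

console.log(sanitized);
```

With this approach the DOI reaches consumers unescaped, so any consumer that renders it as HTML has to escape it at output time.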

I'm with @Mvolz that the best solution might be not to enforce tag stripping on all of the fields. But, before we do that, I'd like to see some research conducted into what happens if malicious tags (primarily <script>) are inserted in the DOI field by the client. How would Zotero process this? What about DOI indexers?

Personally, I'm sad to see such abuses of the DOI spec.

Mvolz moved this task from Backlog to IO Tasks on the Citoid board. Jan 12 2016, 10:14 AM

Josve05a mentioned this in T140990: Citing url with urlencoded string for < > fails. Jul 21 2016, 3:19 PM

Mvolz moved this task from IO Tasks to Service on the Citoid board. Oct 28 2016, 3:13 PM

Restricted Application added a project: VisualEditor. · View Herald Transcript · Oct 28 2016, 3:13 PM

Mvolz moved this task from Service to Service: Scraper & Validation on the Citoid board. Oct 28 2016, 3:16 PM

Jdforrester-WMF moved this task from To Triage to External and Administrivia on the VisualEditor board. Nov 8 2016, 8:05 PM

Jdforrester-WMF set the point value for this task to 0. Feb 9 2017, 6:18 PM

• Deskana mentioned this in T175632: Citoid for DOI has problems with < >. Sep 12 2017, 7:48 PM

• Deskana merged a task: T175632: Citoid for DOI has problems with < >.

• Deskana added subscribers: PerfektesChaos, • Deskana.

Mvolz claimed this task. Sep 13 2017, 9:21 AM

Josve05a added a project: User-Josve05a. Sep 13 2017, 1:12 PM

Josve05a moved this task from Backlog to Tasks to follow on the User-Josve05a board.

Bawolff removed a project: Security-Team. Sep 4 2018, 3:00 PM

Mvolz moved this task from Service: Scraper & Validation to Service on the Citoid board. Dec 11 2018, 11:30 AM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Mvolz renamed this task from Citoid converts ignores <302::aid-ajmg13> to Don't "strip tags" from DOIs. May 19 2021, 10:34 AM

Mvolz claimed this task.

Mvolz merged a task: T283101: Parts of DOIs between < and > are lost in URL expansion.

Mvolz added a subscriber: AManWithNoPlan.

Mvolz closed this task as Resolved. Jun 10 2021, 10:31 AM
