User talk:Citation bot/Archive 18

This is an archive of past discussions with User:Citation bot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Dropping parameter "access-date" from other templates without URLs like cite episode, cite AV media notes & cite ODNB

Status: new bug
Reported by: — Chris Capoccia 💬 12:48, 25 July 2019 (UTC)

What happens: Citation bot strips accessdate from many kinds of citation templates that lack URLs, but other templates still show an error in Category:Pages using citations with accessdate and no URL, and Citation bot does not try to strip the accessdate from those.
We can't proceed until: Feedback from maintainers

{{wontfix}} since often the url is in the website parameter or someplace else and we won’t remove access dates unless template is something we work with in general. AManWithNoPlan (talk) 11:52, 11 August 2019 (UTC)

Web site changed to book

Status: {{fixed}} by flagging archive.org as not a reliable definer of books
Reported by: Hawkeye7 (discuss) 03:30, 9 August 2019 (UTC)

What happens: {{cite web}} changed to {{cite book}}. It is not a book, and I don't know why it thinks it is. A heuristic may need to be tweaked.
What should happen: No change
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Operation_Hurricane&type=revision&diff=909929609&oldid=903079362
We can't proceed until: Feedback from maintainers

archive.org has revamped the website. Will look more. AManWithNoPlan (talk) 18:34, 9 August 2019 (UTC)

URL structure the same. The media type can be determined with an API call. -- GreenC 20:45, 9 August 2019 (UTC)

what do you mean by api call? AManWithNoPlan (talk) 21:02, 9 August 2019 (UTC)

"Advanced Search returning JSON, XML, and more", enter the ID in the search field (BritishNuclearTestOperationHurricaneDeclassifiedReportsToWinston), choose "mediatype" in the fields to return box, choose a format (JSON etc): return. This work is unusual because it is a multi-file so it gives a media type for each one (all the same: "texts") but there may be cases where it is mixed (texts and audio). A more typical book eg. raven01poegoog has a single mediatype on return. -- GreenC 22:31, 9 August 2019 (UTC)

we might eventually do that. Depending upon free time and the need level. AManWithNoPlan (talk) 23:14, 9 August 2019 (UTC)

The media type is pretty generic:

https://archive.org/advancedsearch.php?fl[]=mediatype&output=xml&rows=5000&page=1&q=random

AManWithNoPlan (talk) 02:10, 10 August 2019 (UTC)

Hmm it might not work to distinguish between books and other printed non-book texts. -- GreenC 05:15, 10 August 2019 (UTC)

Closing as {{fixed}} as best we can, since the types archive.org uses (audio, web, account, movies, collection, texts, image, software) are pretty useless. AManWithNoPlan (talk) 18:10, 11 August 2019 (UTC)
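For readers following the "API call" suggestion in this thread, here is a minimal sketch of what such a lookup could look like. The endpoint and the `fl[]`, `rows`, `page` and `output` parameters come from the URL quoted above; the `identifier:` query prefix, the JSON response shape and the function names are assumptions made for illustration, and the sample payload is hand-written rather than real API output.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def mediatype_query_url(identifier: str) -> str:
    """Build an Advanced Search URL that asks only for the mediatype field."""
    params = [
        ("q", f"identifier:{identifier}"),  # assumed query syntax
        ("fl[]", "mediatype"),
        ("rows", "50"),
        ("page", "1"),
        ("output", "json"),
    ]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def mediatypes_from_response(body: str) -> set[str]:
    """Collect the distinct mediatype values from a JSON search response."""
    docs = json.loads(body).get("response", {}).get("docs", [])
    return {doc["mediatype"] for doc in docs if "mediatype" in doc}


# Hand-written sample in the assumed response shape; not real API output.
SAMPLE_BODY = '{"response": {"numFound": 1, "docs": [{"mediatype": "texts"}]}}'

if __name__ == "__main__":
    print(mediatype_query_url("raven01poegoog"))
    print(mediatypes_from_response(SAMPLE_BODY))
```

As noted in the closing comment, a result of "texts" would still not separate books from other printed material, so a check like this could at most rule out audio, movies and similar types.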

JSTOR books

<ref>https://www.jstor.org/stable/j.ctv5rf6vf.6</ref>

expands to

<ref>{{Cite journal | url=https://www.jstor.org/stable/j.ctv5rf6vf.6 | title=Stories and Storytelling in the Era of Graphic Narrative| journal=Stories| pages=27–44| last1=Baetens| first1=Jan| year=2018}}</ref>

when it should expand to

<ref>{{Cite book |last=Baetens |first=Jan |year=2018 |chapter=Stories and Storytelling in the Era of Graphic Narrative |editor1-last=Christie |editor1-first=Ian |editor2-last=Van den Oever |editor2-first= Annie |title=Stories |location=Amsterdam |publisher=Amsterdam University Press |pages=27–44 |jstor=j.ctv5rf6vf.6}}</ref>

You can use the fact that the JSTOR ID starts with j. to know it's a book. Headbomb {t · c · p · b} 17:12, 12 August 2019 (UTC)

The j. is irrelevant. We just need to parse the RIS data better. Which I am working on. https://github.com/ms609/citation-bot/pull/2079 AManWithNoPlan (talk) 23:45, 12 August 2019 (UTC)
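To illustrate the "parse the RIS data" approach, the sketch below reads the record type from the `TY` tag and picks a template from it. This is not the bot's code (the bot is a PHP project and its implementation is in the linked pull request); the function names, the type-to-template table and the sample record are assumptions, with the sample hand-written from the citation shown above rather than copied from a JSTOR export.

```python
from __future__ import annotations


def parse_ris(text: str) -> dict[str, list[str]]:
    """Parse RIS lines of the form 'TY  - CHAP' into a tag -> values mapping."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        # A RIS line is a two-character tag, two spaces, a hyphen and a space.
        if len(line) >= 6 and line[2:6] == "  - ":
            fields.setdefault(line[:2], []).append(line[6:].strip())
    return fields


# Assumed mapping from RIS reference types to citation templates.
TEMPLATE_BY_RIS_TYPE = {
    "JOUR": "cite journal",
    "BOOK": "cite book",
    "CHAP": "cite book",
}


def template_for(fields: dict[str, list[str]]) -> str:
    """Choose a template from the TY tag, falling back to the generic one."""
    ris_type = (fields.get("TY") or [""])[0].upper()
    return TEMPLATE_BY_RIS_TYPE.get(ris_type, "citation")


# Hand-written sample record for illustration only.
SAMPLE_RIS = "\n".join([
    "TY  - CHAP",
    "AU  - Baetens, Jan",
    "TI  - Stories and Storytelling in the Era of Graphic Narrative",
    "T2  - Stories",
    "PY  - 2018",
    "SP  - 27",
    "EP  - 44",
    "ER  - ",
])

if __name__ == "__main__":
    print(template_for(parse_ris(SAMPLE_RIS)))
```

Which tag carries the chapter title and which carries the book title differs between exporters, so a real mapping would need per-source handling.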

Please fix titles with volumes, issue, etc in them

Status: {{fixed}} mostly
Reported by: Headbomb {t · c · p · b} 04:13, 23 June 2019 (UTC)

What should happen: [1]
We can't proceed until: Feedback from maintainers

This is probably tricky to implement, but if a pattern can be generalized, e.g. (untested pseudocode, lacking some punctuation)

Journal Name + ((Volume|Vol\.?|V\.?|Number|Num\.?|No\.?|Pages|Page|Pp\.?|P\.?)+(\d+)?)* + (Special Issue.*)?

that could be worth it. A more limited scope could also be easier to implement. Headbomb {t · c · p · b} 04:13, 23 June 2019 (UTC)

Some care necessary; V. might show up in regard to law cases and any variation of pages in the context of book reviews. --Izno (talk) 13:45, 23 June 2019 (UTC)

Well, it's not just V. alone, but rather <Journal Name> + V.# + nothing that isn't issue/pages/"Special Issue...". So unless you have something like Journal of Physics v. 1993 Special Issue Ford Mustang, that shouldn't happen. Headbomb {t · c · p · b} 15:48, 23 June 2019 (UTC)

there is always JSTOR 41335348 which includes that in the title AManWithNoPlan (talk) 17:36, 8 August 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2075 AManWithNoPlan (talk) 18:28, 11 August 2019 (UTC)
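The pseudocode pattern proposed in this thread could be turned into a runnable regular expression along the following lines. This is only a sketch of the idea, not what the linked pull request implements; the separator characters, the requirement that every keyword be followed by a number, and the function name are assumptions.

```python
import re

# Trailing "Vol. 12 No. 3 Pp. 4-9 Special Issue ..." style residue.
TRAILING_LOCATOR = re.compile(
    r"""
    (?:[\s,;:]+                      # separator before each locator
       (?:Volume|Vol\.?|V\.?|Number|Num\.?|No\.?|Pages|Page|Pp\.?|P\.?)
       \s*\d+(?:\s*[-–]\s*\d+)?      # a number or a number range is required
    )+
    (?:[\s,;:]+Special\s+Issue.*)?   # optional trailing special-issue text
    \s*$
    """,
    re.IGNORECASE | re.VERBOSE,
)


def strip_locators(journal: str) -> str:
    """Remove trailing volume/issue/page residue from a journal name."""
    return TRAILING_LOCATOR.sub("", journal).strip()


if __name__ == "__main__":
    for name in (
        "Journal of Physics Vol. 12 No. 3",
        "Roe v. Wade",
        "Journal of Physics v. 1993 Special Issue Ford Mustang",
    ):
        print(repr(name), "->", repr(strip_locators(name)))
```

Because a number must follow each keyword and the match is anchored at the end of the string, a law case such as "Roe v. Wade" is left alone, while the contrived "v. 1993 Special Issue Ford Mustang" example from the discussion is still stripped, which is the kind of false positive raised above.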

Remove via= if it's the same as journal= or publisher=

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 17:14, 12 August 2019 (UTC)

What should happen: [2]
We can't proceed until: Feedback from maintainers

With the usual 'The Foobar' = 'Foobar' and other similar variations. Headbomb {t · c · p · b} 17:15, 12 August 2019 (UTC)
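A rough sketch of the comparison described here, treating 'The Foobar' and 'Foobar' as equal: the normalization rules (case folding, dropping a leading "The", ignoring punctuation) and the function names are assumptions for illustration and may differ from what the bot actually does.

```python
import re


def normalize_name(name: str) -> str:
    """Reduce a journal or publisher name to a loose comparison key."""
    name = name.casefold().strip()
    name = re.sub(r"^the\s+", "", name)   # 'The Foobar' -> 'foobar'
    name = re.sub(r"[^\w\s]", "", name)   # ignore punctuation
    return re.sub(r"\s+", " ", name).strip()


def drop_redundant_via(params: dict) -> dict:
    """Return a copy of params without via= when it repeats journal= or publisher=."""
    via = params.get("via")
    if not via:
        return dict(params)
    others = {
        normalize_name(params[key])
        for key in ("journal", "publisher")
        if params.get(key)
    }
    if normalize_name(via) in others:
        return {key: value for key, value in params.items() if key != "via"}
    return dict(params)


if __name__ == "__main__":
    print(drop_redundant_via({"journal": "The Foobar", "via": "Foobar"}))
    print(drop_redundant_via({"journal": "The Foobar", "via": "JSTOR"}))
```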

Bot down

The bot is just down in general. Gadget doesn't work. API doesn't work. Nothing works. Headbomb {t · c · p · b} 23:13, 13 August 2019 (UTC)

Kaldari, any ideas? I have a password-protected restart.php that I ran and the bot went away for a while, but when it came back it was still dead. AManWithNoPlan (talk) 00:28, 14 August 2019 (UTC)

@AManWithNoPlan: Looks like it won't execute any PHP files. Not sure why. Kaldari (talk) 08:01, 14 August 2019 (UTC)

@AManWithNoPlan: I rolled back the most recent change and it seems to work again. (The head is now at 3f21e6a.) Kaldari (talk) 08:29, 14 August 2019 (UTC)

Going forward, can we test new changes at citations-dev first? Kaldari (talk) 08:49, 14 August 2019 (UTC)

@Smith609: or someone else would need to get that up to date. Also, the branch that works seems to be newer than the bot dying. Lastly, the bot was alive enough that html files, gitpull, and restart all worked. I suspect something wrong with Authenticator. AManWithNoPlan (talk) 11:06, 14 August 2019 (UTC)

Kaldari can we get back on the git master branch for now. AManWithNoPlan (talk) 18:05, 14 August 2019 (UTC)

@AManWithNoPlan: The master branch doesn't work. If I set the repo to the current master branch or anything after 3f21e6a, PHP pages won't execute. Kaldari (talk) 10:09, 15 August 2019 (UTC)

I imagine it's an issue with the webservice configuration, i.e. lighttpd.conf, but I'm not sure. Kaldari (talk) 10:12, 15 August 2019 (UTC)

could it be a missing chmod +x ? AManWithNoPlan (talk) 11:19, 15 August 2019 (UTC)

Kaldari try master now. AManWithNoPlan (talk) 18:56, 15 August 2019 (UTC)

No... not master! Headbomb {t · c · p · b} 20:12, 15 August 2019 (UTC)

@AManWithNoPlan: Seems to be fixed now! Kaldari (talk) 14:23, 16 August 2019 (UTC)

{{fixed}} ALL php files need to start with magic php keyword, even unused ones. AManWithNoPlan (talk) 17:27, 16 August 2019 (UTC)
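A pre-deployment check for the problem described in this thread might look like the sketch below. It assumes the "magic php keyword" means the `<?php` opening tag at the very first byte of the file, which the thread does not spell out; the function name and the directory layout are also hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def php_files_missing_open_tag(root: str) -> list[str]:
    """List .php files under root whose first five bytes are not '<?php'."""
    missing = []
    for path in sorted(Path(root).rglob("*.php")):
        with path.open("rb") as handle:
            head = handle.read(5)
        if head != b"<?php":
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    # Scan the current directory; an empty list means every file passed.
    for offender in php_files_missing_open_tag("."):
        print("missing opening tag:", offender)
```

Running something like this before switching the live checkout to a new commit would flag unused files as well, since the check does not depend on whether a file is ever included.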

JSTOR book chapters

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 15:12, 14 August 2019 (UTC)

What happens: [3]
What should happen: [4]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2086 RIS is not a well-standardized format. AManWithNoPlan (talk) 18:05, 14 August 2019 (UTC)

More garbage volume/issue cleanup

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 15:25, 14 August 2019 (UTC)

What should happen: [5]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2091 AManWithNoPlan (talk) 17:46, 16 August 2019 (UTC)

If DOI = JSTOR, and DOI = Inactive, remove DOI / DOI-BROKEN-DATE, remove URL/CHAPTER-URL

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 16:43, 14 August 2019 (UTC)

What should happen: [6]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2089

Caps: La Trobe

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 17:04, 14 August 2019 (UTC)

What should happen: [7]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2088

CAPS: Nyt Tidsskrift

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 18:31, 14 August 2019 (UTC)

What should happen: [8]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2088

Carnegie Institute Washington D.c. Publication

Status: {{fixed}}
Reported by: Nemo 14:28, 16 August 2019 (UTC)

What happens: special:diff/911094964
We can't proceed until: Feedback from maintainers

As a side note, it should just capitalize "D.c." across the board. Headbomb {t · c · p · b} 14:48, 16 August 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2090 AManWithNoPlan (talk) 17:45, 16 August 2019 (UTC)

If you have a broken DOI, try it as a JSTOR

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 00:29, 18 August 2019 (UTC)

What should happen: [9]
We can't proceed until: Feedback from maintainers

@AManWithNoPlan: could have waited until this one was fixed before running on Category:Pages with DOIs inactive as of 2019...

Need to run twice

Status: {{wontfix}} database was down
Reported by: Headbomb {t · c · p · b} 06:15, 23 August 2019 (UTC)

What happens: [10] + [11]
What should happen: One diff. Possibly remove the url too.
We can't proceed until: Feedback from maintainers

Capitalize after spaced bracket

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 17:21, 23 August 2019 (UTC)

What should happen: [12]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2105 AManWithNoPlan (talk) 19:52, 23 August 2019 (UTC)

Caps: eCrypt

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 19:22, 23 August 2019 (UTC)

What should happen: [13]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2105 AManWithNoPlan (talk) 19:52, 23 August 2019 (UTC)

Do not add titles with | (or its HTML representation)

Status: {{wontfix}}
Reported by: Jonatan Svensson Glad (talk) 18:50, 9 August 2019 (UTC)

What happens: When converting from a bare URL or when adding a title, if |title= includes a &#124;, then nuke it or strip what is behind it.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=User:Josve05a/cite-sandbox&diff=910110345&oldid=910110119
We can't proceed until: Feedback from maintainers

It's often part of the title. AManWithNoPlan (talk) 18:53, 9 August 2019 (UTC)

I've never seen a single place where this was part of a title (and worth keeping; I always strip this). Jonatan Svensson Glad (talk) 18:55, 9 August 2019 (UTC)

Titles with pipes are better than no titles. Headbomb {t · c · p · b} 19:37, 9 August 2019 (UTC)

If done properly, yes. But I'd rather the bot not make any changes, or only "correct" changes. Not 'better than nothing' changes, since a human still needs to clean it up. Jonatan Svensson Glad (talk) 21:15, 9 August 2019 (UTC)

Pipes should convert to {{!}}. A literal pipe is a reserved character in CS1|2 and shouldn't exist at all. HTML pipes could ideally be converted to {{!}}. -- GreenC 20:31, 9 August 2019 (UTC)

I would prefer not to add “fixing this” to the bots tasks. AManWithNoPlan (talk) 21:33, 9 August 2019 (UTC)

The website cannot seem to make up its mind about what the better title is. AManWithNoPlan (talk) 21:50, 9 August 2019 (UTC)

<title>One Direction Tour Tickets Sell Out In Minutes | MTV UK</title>
<meta property="twitter:title" content="One Direction Tour Tickets Sell Out In Minutes | MTV UK" /> 
<script type="application/ld+json">{"@context":"http:\/\/schema.org","@type":"NewsArticle","headline":"One Direction Tour Tickets Sell Out In Minutes","url":"http:\/\/www.mtv.co.uk\/one-direction\/news\/one-direction-tour-tickets-sell-out-in-minutes","keywords":["one direction"],"dateCreated":"2013-05-25T12:32:44+01:00","articleSection":"One Direction"}</script>
<meta property="og:title" content="One Direction Tour Tickets Sell Out In Minutes | MTV UK" />

reFill and Citoid give the same title we do. AManWithNoPlan (talk) 21:52, 9 August 2019 (UTC)

there is no reliable way to determine if the after the pipe stuff is part of the title or not. It is actually more of a philosophical question than a factual question. AManWithNoPlan (talk) 18:04, 11 August 2019 (UTC)

I would return {{!}} or the HTML string &#124; rather than avoiding fixing this, whether it's part of a title or elsewhere (except in URLs). --Izno (talk) 18:10, 11 August 2019 (UTC)

Just for the record, Citoid also blindly adds titles with pipes, resulting in stray text without a parameter. At least escaping the pipes as Izno says should be uncontroversial. I have no opinions on stripping them (I often do so manually as Josve says). Nemo 09:18, 12 August 2019 (UTC)

one problem we run into is websites that pipe parts the opposite direction host|section|title AManWithNoPlan (talk) 10:53, 12 August 2019 (UTC)

Converting to {{!}} is all that should happen here. Headbomb {t · c · p · b} 11:44, 12 August 2019 (UTC)

is there any reason to prefer the pseudo template over html? AManWithNoPlan (talk) 11:18, 20 August 2019 (UTC)

It's more wikitextish, but beyond that, no. --Izno (talk) 12:22, 20 August 2019 (UTC)

It's more recognizable in edit window. It's a rather cosmetic and not really a critical issue though. Headbomb {t · c · p · b} 17:00, 20 August 2019 (UTC)
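The escaping that several commenters suggest in this thread (convert a literal pipe or its numeric HTML reference to {{!}}) can be sketched as follows. The request itself was closed as wontfix, so this illustrates the proposal rather than bot behaviour; the function name is an assumption, and only numeric character references are handled because a named reference such as `&Vert;` denotes a different character.

```python
import re

# Numeric character references for U+007C: decimal &#124; and hex &#x7C;.
HTML_PIPE = re.compile(r"&#0*124;|&#[xX]0*7[cC];")


def escape_pipes(title: str) -> str:
    """Replace literal and HTML-encoded pipes in a title with {{!}}."""
    title = HTML_PIPE.sub("|", title)
    # Split on existing {{!}} so already-escaped pipes are left untouched.
    parts = title.split("{{!}}")
    return "{{!}}".join(part.replace("|", "{{!}}") for part in parts)


if __name__ == "__main__":
    print(escape_pipes("One Direction Tour Tickets Sell Out In Minutes | MTV UK"))
    print(escape_pipes("A &#124; B"))
```

This only makes the value safe inside a template parameter; it does not settle whether the text after the pipe belongs in the title at all, which is the open question in the discussion.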

more details needed on "! CrossRef title did not match existing title"

Hello Martin. Thanks for this amazing power tool, which makes life so much easier!

I wonder if you might add details to the debug messages for "! CrossRef title did not match existing title", which cropped up thrice today on Mefloquine. I suspect that the existing title was an all-caps version of the correctly capitalised title, however, it is not immediately apparent to the casual user. It would be nice if you added a printout of the prior title and the new title, so that users might compare the two on the debug-screen page and, if necessary, act to remedy the inconsistencies.

I doubt that this link will show you what I mean because it is likely to vaporise, but here goes: Mefloquine title did not match

Have I explained myself well? Please contact me if not. And thanks again!

Magnoffiq (talk) 15:46, 23 August 2019 (UTC)

We cannot do that without confusing people more. The number of incoming data pieces and existing data pieces is quite large and we would have to print them all. You can click on the DOI link and see what CrossRef shows and search for it on the existing page and see what the page already has. AManWithNoPlan (talk) 18:54, 23 August 2019 (UTC)

gamma vs the greek character. Extra ": a review" at the end. AManWithNoPlan (talk) 19:12, 23 August 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2104 AManWithNoPlan (talk) 19:53, 23 August 2019 (UTC)

{{fixed}} with a few more pulls. AManWithNoPlan (talk) 22:10, 24 August 2019 (UTC)

Redundant JSTOR url

Status: mostly {{fixed}}
Reported by: Headbomb {t · c · p · b} 02:02, 24 August 2019 (UTC)

What should happen: [14]
We can't proceed until: Feedback from maintainers

Books and their reviews

Status: rarely not {{fixed}}
Reported by: Catfish Jim and the soapdish 08:39, 22 August 2019 (UTC)

What happens: filling reference for a book with a book review
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Carnoustie&diff=911879057&oldid=911667160
We can't proceed until: Feedback from maintainers

No idea how widespread this is... only noticed as it is on a page I watch. Catfish Jim and the soapdish 08:39, 22 August 2019 (UTC)

Thank you. This is a problem in less than 0.5% of cases according to studies of Unpaywall data, but it can happen. The best solution is to report any bug to Unpaywall; the worst solution is to try second-guessing ourselves. Nemo 10:10, 22 August 2019 (UTC)

One can reduce the likelihood of this by using {{cite book}} instead of {{citation}} and by adding things that are “bookish” like isbn, olcn, etc since we use all those to guess whether the reference is a book or a review when querying AdsAbs. Lastly, add a comment in the |bibcode= area to stop it. This bug is very rare since I added about a dozen lines of code to guess if it's a book or a review. AManWithNoPlan (talk) 11:10, 22 August 2019 (UTC)

location vs publication-place

Status: {{notabug}}
Reported by: Johannes Schade (talk) 14:51, 10 August 2019 (UTC)

We can't proceed until: Feedback from maintainers

Dear Programmer. Nice tool. I tried your bot on the page Antoine Hamilton. I verified the doi very well. However, it also replaced all the '|publication-place=' parameters in the citation template with '|location=' parameters. My understanding was that location is now old and should be replaced by publication-place. Finally it said it could not find the isbn 9780198613741, which is however the 13-digit version of the isbn 0-19-861374-1 marked in the book. Johannes Schade (talk) 14:51, 10 August 2019 (UTC)
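On the ISBN point, a worked conversion may help. Converting a 10-digit ISBN to 13 digits means prefixing 978 to the first nine digits and recomputing the check digit; the old check digit is not carried over. The sketch below applies the standard ISBN-13 checksum (weights alternating 1 and 3) to the number quoted above; the function names are made up for illustration. Assuming the ISBN-10 is as printed, the result ends in 9 rather than 1, which may be why the lookup failed, though the thread does not confirm this.

```python
def isbn13_check_digit(first12: str) -> str:
    """Compute the ISBN-13 check digit from the first twelve digits."""
    total = sum(
        int(digit) * (1 if index % 2 == 0 else 3)
        for index, digit in enumerate(first12)
    )
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 (hyphens allowed) to its unhyphenated ISBN-13 form."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError("expected a 10-character ISBN")
    body = "978" + digits[:9]  # the ISBN-10 check digit is discarded
    return body + isbn13_check_digit(body)


def is_valid_isbn13(isbn13: str) -> bool:
    """Check the length, digits and checksum of an ISBN-13."""
    digits = isbn13.replace("-", "").replace(" ", "")
    return (
        len(digits) == 13
        and digits.isdigit()
        and isbn13_check_digit(digits[:12]) == digits[12]
    )


if __name__ == "__main__":
    print(isbn10_to_isbn13("0-19-861374-1"))   # prints 9780198613749
    print(is_valid_isbn13("9780198613741"))    # prints False
```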

Better series handling

Status: {{fixed}} with lists of journals that identify as book series. This way we can add more over time, rather than figuring it out with heuristics
Reported by: Headbomb {t · c · p · b} 00:46, 25 August 2019 (UTC)

What should happen: [15]
We can't proceed until: Feedback from maintainers

Please explain. AManWithNoPlan (talk) 13:54, 26 August 2019 (UTC)

What's to explain? Methods of Molecular Biology is a book series, and it should be handled as such? Convert to {{cite book}}, use |series= over |journal= and remove duplications, and use |chapter=+|title=? Headbomb {t · c · p · b} 14:08, 26 August 2019 (UTC)

Uses cite book for web site about a journal

Status: {{fixed}}
Reported by: David Eppstein (talk) 20:36, 22 August 2019 (UTC)

What happens: Special:Diff/912033008
What should happen: The added title is correct but the template should have been left as cite web, not changed to cite book
We can't proceed until: Feedback from maintainers

That is a very interesting problem. Are we referencing the book-like object itself, of which the website is a copy, or are we referencing the website itself? AManWithNoPlan (talk) 19:55, 23 August 2019 (UTC)

AManWithNoPlan What book-like object? What copy? That's a link to the publisher's web site about a journal that they publish. It's a web site. Not a book. Not even a journal. Just a web site. —David Eppstein (talk) 07:13, 25 August 2019 (UTC)

My point is that a journal series is a book-like object. We convert amazon links and google books links to books. If you are referencing the journal, then one can debate if book/journal/web is best. Unfortunately that website in its metadata presents itself more like a book than a website. We query citoid, so we can’t really fix that because it happens outside of our code. AManWithNoPlan (talk) 11:11, 25 August 2019 (UTC)

I can probably black list that domain. AManWithNoPlan (talk) 11:21, 25 August 2019 (UTC)

It's not a reference to a journal. It's a reference to a web site about a journal. It's used to source some information about the journal, not used to source information published in the journal. You are misinterpreting the metadata about the journal as being metadata about the web site about the journal. And much as preventing the creation of links to Elsevier might amuse me, I think it would be a bad idea. Or does "blacklist" merely mean to prevent the bot from touching that link? —David Eppstein (talk) 01:52, 26 August 2019 (UTC)

Blacklisted in that any url with ‘journal’ in the hostname will not be changed from web to book. AManWithNoPlan (talk) 02:23, 26 August 2019 (UTC)

@AManWithNoPlan: Journal in the hostname is a bad blacklisting, too many journal articles will use it. Blacklisting journals.elsevier.com however would be fine. Headbomb {t · c · p · b} 03:50, 26 August 2019 (UTC)

Indeed, for instance journals.cambridge.org and tons of university-run journals, some of which often act as books. Nemo 05:15, 26 August 2019 (UTC)

It’s only the Zotero-based changing. Chapters/isbn/etc will still trigger it. AManWithNoPlan (talk) 13:54, 26 August 2019 (UTC)
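The narrower alternative raised in this thread, blocking a specific host instead of any hostname containing "journal", could be sketched like this. The set contents follow the journals.elsevier.com suggestion above; the function name, the subdomain matching rule and the example URLs are assumptions, and the check would sit only in front of the Zotero-based web-to-book conversion.

```python
from urllib.parse import urlsplit

# Hosts for which a cite web should never be turned into a cite book.
NO_WEB_TO_BOOK_HOSTS = {"journals.elsevier.com"}


def may_convert_web_to_book(url: str) -> bool:
    """Return False when the URL's host is a blocked host or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return not any(
        host == blocked or host.endswith("." + blocked)
        for blocked in NO_WEB_TO_BOOK_HOSTS
    )


if __name__ == "__main__":
    print(may_convert_web_to_book("https://www.journals.elsevier.com/example-journal"))
    print(may_convert_web_to_book("https://journals.cambridge.org/example"))
```

An exact host match leaves hosts such as journals.cambridge.org unaffected, which a substring test on "journal" would not.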

Capitalization of journal titles

Status: {{fixed}} for journals mentioned
Reported by: Renata (talk) 18:31, 26 August 2019 (UTC)

What happens: Lithuanian language is weird in that it does not capitalize every word in book/journal titles. Bot ignores it and capitalizes all words in journal titles.
Relevant diffs/links: [16] and [17]
We can't proceed until: Feedback from maintainers

I will add some Lithuanian words to the list of foreign words. By the way whether what the bot did is wrong or right depends upon the style. Many styles specify capitalization independently of the native language. AManWithNoPlan (talk) 18:47, 26 August 2019 (UTC)

Would it be possible to ignore title capitalization if the language parameter says Lithuanian? That would seem to be a more efficient solution. Renata (talk) 19:11, 26 August 2019 (UTC)

not really. It’s rarely set and often journal titles are English even when the articles are not. AManWithNoPlan (talk) 22:54, 26 August 2019 (UTC)

"Removed URL that duplicated unique identifier"

[18] I'm getting these all the time now and I think they arguably make the citation sections worse. There's no way that [edit: general readers] know to click on the linked "doi" when the citation's title itself is unlinked. I'll note that the {{cite journal}} documentation examples keep the url parameter even when a doi is provided.

Where is the consensus to make this edit en masse? czar 13:26, 20 July 2019 (UTC)

I'm going out now but I'll leave a quick answer to one of your points: a lot of people do, in fact, know to click the DOI. We know for sure from CrossRef data: https://www.crossref.org/blog/https-and-wikipedia/ https://www.crossref.org/blog/real-time-stream-of-dois-being-cited-in-wikipedia/ Nemo 13:36, 20 July 2019 (UTC)

unless the url is free to download without logging in, you should not add them unless there is no other links out. AManWithNoPlan (talk) 13:39, 20 July 2019 (UTC)

there is even movement afoot to remove the automatic linking of titles when a PMC is present. AManWithNoPlan (talk) 13:47, 20 July 2019 (UTC)

My question was where this consensus has been established, or if this is just a practice localized to editors using this bot/tool. czar 15:17, 20 July 2019 (UTC)

I don’t have time to look it up, but the links are in the talk archives somewhere—hopefully someone not in an auto parts store can respond better. AManWithNoPlan (talk) 15:31, 20 July 2019 (UTC)

The general idea is that these links are redundant with the DOI/other identifiers, who are clear about where they take you (doi: version of record, jstor = jstor repository, etc... If you don't know what those are, we have the wikilinks). |url= is then freed up to be used for freely-available full text versions-of-record of the paper hosted on an author's website, or similar. If the DOI version is free, you can use |doi-access=free to mark it as free, etc. Headbomb {t · c · p · b} 17:29, 20 July 2019 (UTC)

Please see the usage page for why {{notabug}} AManWithNoPlan (talk) 15:50, 21 July 2019 (UTC)

@AManWithNoPlan, sorry, where on the usage page is the consensus/discussion to remove url parameters when a doi is provided? czar 23:21, 21 July 2019 (UTC)

I thought someone added it. Weird. AManWithNoPlan (talk) 23:51, 21 July 2019 (UTC)

It's long standing practice to do this, for the reasons outlined above. Many bots have been approved for this sort of cleanup too, e.g. User:CitationCleanerBot. If you want the title always linked, go to Help talk:CS1 and request that |url= is automatically set to https://doi.org/10.1234/1234567890 whenever a DOI is present. Likewise for other identifiers of record. Headbomb {t · c · p · b} 00:10, 22 July 2019 (UTC)

So is the answer to my question that there is no documented discussion of consensus? czar 12:44, 27 July 2019 (UTC)

The answer is that people are too busy to dig it up. AManWithNoPlan (talk) 14:35, 27 July 2019 (UTC)

Another answer might just be that there has been no 'formal' discussion because formal discussion is not a requirement for something that, it would appear, has silent consensus. I would guess that thousands of edits of this type have been made by the bot and by individual editors (I am one). As far as I know, there has been little to no discussion about removing urls that duplicate the named identifiers. I've done it a lot and have seen quite a few where the url had rotted on the vine while the named-identifier link worked properly.

—Trappist the monk (talk) 15:04, 27 July 2019 (UTC)

And bots like User:CitationCleanerBot which has explicit approval for such things. Headbomb {t · c · p · b} 20:18, 27 July 2019 (UTC)

Per Trappist, I would ask Czar to find any past discussion (with a few users who argued) against this established practice. I'm sure you can find some and it would help focus the discussion, because there are various ways to look at it.

I've searched a bit at the village pump and I couldn't find any, although I did find a rather surreal discussion of 2010 on the relationship between DOI and promotion to publishers (you can presumably find many variants of that argument in discussions on Wikipedia:Credo and other similar schemes) plus a few discussions with relevant comments in passing such as "urls to dois should generally not be placed in |url= when there is |doi= because that constitutes overlinking and because most most dois are behind paywalls"

In general, in my opinion policies and guidelines contain two signs that the removal of URL redundant with DOI is desirable.

The very fact that there is consensus on adding a parameter for a certain identifier in {{cite journal}} or others proves that there is a desire to have that identifier presented in a structured way (see Citation templates now support more identifiers, 2011). It follows logically that there is a desire for the identifier information/link to be moved to the structured parameter rather than left lingering in N other ways it can be inserted (the |id= and |url= parameter, free text after the citation template, other templates after the citation templates etc. etc.). Nobody ever complained of people removing links to PubMed or CiteSeerX to use the corresponding identifier parameters instead, after they were introduced: it was the logical expectation. The same for the DOI, especially when doi-access was introduced to give more granular information about it and its target.
At Help:Citation Style 1#Online sources elsewhere you can see that the URL parameter is generally expected to point to a full text of the cited document, open for everyone to see. So strong is the assumption, that in a few places you find a note that yes, a paywalled URL is acceptable if necessary for verifiability: it's clearly considered an exception, because nowhere you will find a general statement that paywalled and commercial copies are preferred over the others. (Such notes were added relatively late in the life cycle of the citation guidelines, around 2009; see also 2010, 2011, 2014 discussions.) The official publisher URL (to which the DOI leads when resolved with doi.org) is generally paywalled so it would by default not be the ideal content of an URL parameter even if the DOI parameter didn't exist.

Nemo 09:09, 28 July 2019 (UTC)

As I was pointed here after I had reverted a "Dup URL" edit, I think if there is consensus, then another change should be made to the templates, particularly cite journal, that both doi= and url= should not be present, that doi= takes precedence and should automagically populate the URL field with the correct DOI URL, and that this can be flagged in red text in the reflist as other errors. You can still have the bot go around cleaning it up, too, but this helps users to clean it faster (those red errors are easy to spot). I do note that even for paywalled URLs, you still get that the cited journal article exists, its abstract, and sufficient citation details to meet WP:V, but ideally the DOI URL should get you there too. --Masem (t) 17:26, 2 August 2019 (UTC)

Found this thread from 2015:
I have also seen it argued that readers are more likely to click on a linked article title (which |url= provides) than on an obscure series of letters and numbers and symbols following a cryptic initialism. I haven't done A/B usability testing with readers to find out if this is true, but it seems reasonable to me. Jonesey95 (talk) 05:49, 21 March 2015 (UTC)
— Help talk:Citation Style 1/Archive 7 § Additional link to doi, bibcode, arxiv, etc. via the url parameter
But yes, not an easy discussion topic to query, hence why I thought I'd have better luck with meatspace. My concern is essentially the same one I'm quoting (and as Masem alludes). I have no strong opinion on removing |url= when a total duplicate for the |doi= but from my experience watching people use Wikipedia, when the citation's title is unlinked, readers with no knowledge of DOIs aren't going to click through the links unless they're interested in figuring out what a DOI is (same for ISSNs, ISBNs, and similar identifiers). Maybe that makes this more of a CS1 discussion now? There is also a separate discussion to be had re: the edit I first cited above, which removed a |url= that linked to the full text but no |doi-access=free was placed in its stead. czar 21:47, 3 August 2019 (UTC)
- Currently |doi-access=free does not turn the title into a link, so I'm personally not especially motivated to add it. Nemo 19:02, 4 August 2019 (UTC)
  - @Czar and Nemo bis: you may be interested in this RFC which would make that option a reality. Headbomb {t · c · p · b} 21:20, 4 August 2019 (UTC)
- Another discussion was Wikipedia:Bots/Requests for approval/DOI bot 2#Adding URLs to nonfree articles? "the usual style in articles I edit is that url= is reserved for articles where the entire text is freely readable, and that url= is not used for articles where just the abstract is readable (for that, you can just live with the DOI or PMID or whatever)". Nemo 13:12, 6 August 2019 (UTC)

If that is the common sentiment, shouldn't it be added to the CS1 documentation? It's hard to have a discussion for/against the practice because the current standard isn't documented in a central location. czar 20:40, 11 August 2019 (UTC)

{{notabug}} Headbomb {t · c · p · b} 23:13, 29 August 2019 (UTC)
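For reference, the simplest form of the "URL duplicates the identifier" test discussed in this thread is a URL that is nothing more than the DOI resolver link. The sketch below covers only that case; the list of resolver hosts and the function name are assumptions, and the bot's actual check may be broader (for example, publisher landing pages that a DOI resolves to).

```python
from urllib.parse import unquote, urlsplit

# Hosts assumed to act as plain DOI resolvers.
DOI_RESOLVER_HOSTS = {"doi.org", "dx.doi.org", "www.doi.org"}


def url_duplicates_doi(url: str, doi: str) -> bool:
    """Return True when url is just a resolver link for the given DOI."""
    parts = urlsplit(url)
    if (parts.hostname or "").lower() not in DOI_RESOLVER_HOSTS:
        return False
    # DOIs are case-insensitive, so compare in lower case.
    return unquote(parts.path).lstrip("/").lower() == doi.strip().lower()


if __name__ == "__main__":
    print(url_duplicates_doi("https://doi.org/10.1234/1234567890", "10.1234/1234567890"))
    print(url_duplicates_doi("https://example.org/paper.pdf", "10.1234/1234567890"))
```

A URL pointing at a freely readable copy hosted elsewhere would fail this test and be kept, which matches the stated purpose of freeing |url= for such links.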

Remove via=domain when adding work for BBC News

Status: {{fixed}} I think.
Reported by: Jonatan Svensson Glad (talk) 21:27, 18 August 2019 (UTC)

What should happen: Remove |via=www.bbc.co.uk when adding |work=BBC News
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Pentecostalism&diff=prev&oldid=911441255
We can't proceed until: Feedback from maintainers

Title field and cite encyclopedia

Status: {{fixed}}
Reported by: Umimmak (talk) 23:05, 29 August 2019 (UTC)

What happens: When Template:Cite encyclopedia is used, even when the title of the dictionary is added in |dictionary= or its aliases, the bot redundantly adds the title of the entire work to |title=.
What should happen: The title should not be repeated in two separate fields.
Relevant diffs/links: [19]
We can't proceed until: Feedback from maintainers

Surgeon General of the United States

Status: {{fixed}} quite a bit and no agreement yet
Reported by: QuackGuru (talk) 23:03, 28 July 2019 (UTC)

What happens: Added "National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health". The url goes to the CDC website but it is a copy of the Surgeon General of the United States report.
What should happen: The bot should remove "National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health".[20]
We can't proceed until: Feedback from maintainers

The bot should undo this edit. QuackGuru (talk) 18:56, 5 August 2019 (UTC)

According to the website, the preferred citation includes that information: U.S. Department of Health and Human Services. The Health Consequences of Smoking: 50 Years of Progress. A Report of the Surgeon General. Atlanta, GA: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health, 2014. AManWithNoPlan (talk) 18:18, 11 August 2019 (UTC)

The bot is listing "National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health" as an "author". QuackGuru (talk) 18:34, 11 August 2019 (UTC)

if you do not like who the publisher lists as the author, then block the bot with author1=  AManWithNoPlan (talk) 17:34, 12 August 2019 (UTC)

I thought "author" parameters were reserved for a person's name. Is there a publisher1 and publisher2 and so on for co-publishers or could this be created for co-publishers when other authors are not a person's name? QuackGuru (talk) 18:29, 12 August 2019 (UTC)

Well, that's an interesting question. There is a loose hierarchy publisher > editor > author, similar to series > title > chapter in books. But that is still pretty vague, and non-humans can be authors. AManWithNoPlan (talk) 23:50, 12 August 2019 (UTC)

See Template:Cite book. I could not find where it mentions "author" for non-humans. There is no solution for when there are multiple co-publishers. Just let it be or someone could propose creating new parameters for co-publishers. QuackGuru (talk) 00:35, 13 August 2019 (UTC)

The use of |author= for organizational authors is permitted. --Izno (talk) 23:25, 13 August 2019 (UTC)

I prefer the creation of new parameters for |publisher1= and |publisher2= and so on. QuackGuru (talk) 23:38, 13 August 2019 (UTC)

I have seen cases of co-publishers, but that is not the case here. Looking at the edit QG links to, it appears to me that the essential problem is citing the "Surgeon General of the United States" as the publisher. It is quite unlikely that the Surgeon General has personally published that item. (I can conceive of the office "of the Surgeon General" doing so, but unlikely.) Someone should take a closer look at the source to sort out who the (possibly "corporate" or institutional) author is, and who actually managed the publication. At any rate, this case does not warrant citing multiple publishers when the real issue is who is the (singular) publisher. And certainly does not warrant multiple publisher parameters. ♦ J. Johnson (JJ) (talk) 23:55, 13 August 2019 (UTC)

It is a report of the SG. Adding "National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health" is fine. But what is the best way to add it? How can I list it as a co-publisher? QuackGuru (talk) 16:19, 14 August 2019 (UTC)

"Report of the SG" is ambiguous as to authorship (responsibility), publisher, etc. You want the "best way to add" something, where I would say it is not clear as to exactly what should be added. (And unless the document says "co-published" I rather doubt that is the case.) What you need to do is examine the document closely, perhaps with the help of a medical librarian. Or look at how other publications cite it. But be cautious. E.g., I would not go with citoid's identification of the author as "General". ♦ J. Johnson (JJ) (talk) 23:22, 14 August 2019 (UTC)

if there is only a single non-human author from citoid, we reject it. This author is from the pubmed API based upon the PMID. AManWithNoPlan (talk) 00:00, 15 August 2019 (UTC)

So is the Surgeon General a non-human author? [Caution! lots of sharp edges in that question; handle with care.] ♦ J. Johnson (JJ) (talk) 19:04, 15 August 2019 (UTC)

See how other publications cite it. For example, see "While the most recent Surgeon General's Report on the "Health Consequences of Smoking"..."[21] QuackGuru (talk) 19:38, 15 August 2019 (UTC)

Isn't that just what I said? ("Or look at how other publications cite it.")

Note that what you just quoted is not a citation. A citation – more precisely, a full citation – has bibliographic details, etc. Which medical journals tend to pare down to what is minimally sufficient (such as leaving off the publisher), but if you search for this report on Google Scholar you should find lots of hits, and quite likely some useful examples.

There is no bot issue here, so I think we're done. ♦ J. Johnson (JJ) (talk) 21:36, 16 August 2019 (UTC)

It is not a bot issue unless there is a new way to format it for organizational authors in the future. For now this is the way to cite it. QuackGuru (talk) 21:42, 16 August 2019 (UTC)

Not quite; cs1|2 has |chapter= and |chapter-url=; use them:

{{cite book |chapter-url=https://stacks.cdc.gov/view/cdc/21569/Share |title=The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General |chapter=Nicotine |date=2014 |pages=107–138 |publisher=[[Surgeon General of the United States]] |pmid=24455788 |archive-url=https://web.archive.org/web/20150915172434/http://www.surgeongeneral.gov/library/reports/50-years-of-progress/sgr50-chap-5.pdf |archive-date=15 September 2015 |author1=National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health}}

National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health (2014). "Nicotine" (PDF). The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General. Surgeon General of the United States. pp. 107–138. PMID 24455788. Archived from the original on 15 September 2015.

I left |pages=107–138 but do your readers a favor: for in-line citations like this one, use an appropriate in-source location parameter and value to identify where in the source the supporting information is; don't make readers search through 32ish pages to find the supporting information.

—Trappist the monk (talk) 22:06, 16 August 2019 (UTC)

For the page numbers I had to re-format it. QuackGuru (talk) 23:12, 16 August 2019 (UTC)

Four things about that:

SGUS is not an author listed in Safety of electronic cigarettes § Bibliography so readers who might read a printed copy of the article won't be able to find it without a special decoder-ring that tells them that SGUS = National Center for Chronic Disease ...
items in §Bibliography should be listed in alpha order by author
clicking this title link The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General I don't expect to land at "Nicotine". Don't astonish readers.
why is it |url=https://stacks.cdc.gov/view/cdc/21569/Share (not a dead link) but |archive-url=https://web.archive.org/web/20150915172434/http://www.surgeongeneral.gov/library/reports/50-years-of-progress/sgr50-chap-5.pdf? The root url should be the same in both.

—Trappist the monk (talk) 23:59, 16 August 2019 (UTC)

SGUS stands for Surgeon General of the United States. They are the publisher. The National Center for Chronic Disease... is a co-publisher/author. I listed it by year. I removed the archived link. The other link has a PDF file. QuackGuru (talk) 00:29, 17 August 2019 (UTC)

Date formats

Status: {{notabug}}
Reported by: PamD 23:35, 30 August 2019 (UTC)

What happens: bot is adding dates formatted as 2019-08-29 although existing dates are all formatted as 28 August 2019
What should happen: should use existing format
Relevant diffs/links: Sandra Appiah
We can't proceed until: Feedback from maintainers

I would say that this is not a bug. Add either of the {{use dmy dates}} or {{use mdy dates}} to an article and the cs1|2 templates will render dates in the chosen format; see the {{use xxx dates}} documentation.

—Trappist the monk (talk) 23:42, 30 August 2019 (UTC)

definitely not a bug. AManWithNoPlan (talk) 01:17, 31 August 2019 (UTC)

Convert &ndash; to –

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 11:54, 24 August 2019 (UTC)

What should happen: [22]
Relevant diffs/links: Likewise for whatever the code for mdash is.
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2130 AManWithNoPlan (talk) 18:56, 31 August 2019 (UTC)
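
A minimal sketch of the kind of cleanup requested here, assuming it amounts to replacing the named dash entities (ndash, and likewise mdash) with the literal characters. The function and table names are hypothetical and this is not the code from the pull request above.

```python
# Hypothetical helper, not taken from the Citation bot source.
DASH_ENTITIES = {
    "&ndash;": "\u2013",  # en dash
    "&mdash;": "\u2014",  # em dash
}


def replace_dash_entities(value: str) -> str:
    """Replace named dash entities in a parameter value with literal dashes."""
    for entity, char in DASH_ENTITIES.items():
        value = value.replace(entity, char)
    return value
```

Other entities, such as the one for an ampersand, are deliberately left alone in this sketch.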

Caps: B/gcvs instead of B/GCVS or whatever other variations (B/Gcvs, b/Gcvs...)

For sources like [23]. Headbomb {t · c · p · b} 19:53, 2 September 2019 (UTC)

Archive-url & associated parameters stripped out.

Status: {{notabug}}
Reported by: Quuux (talk) 05:53, 3 September 2019 (UTC)

What happens: strips out archive-url, archive-date, dead-url parameters when sites are dead (maybe malformed 404?)
Replication instructions: run it on e.g. Mickey Rooney, or for cite: "Mickey Rooney Claims Elder Abuse, Testifies Before Senate Committee". AARP Bulletin. 2011. Retrieved 2019-09-03. {{cite web}}: Check |archive-url= value (help)
We can't proceed until: Feedback from maintainers

That’s a good thing. Archive URLs are copies stored on remote archive websites, not the original URLs. AManWithNoPlan (talk) 10:52, 3 September 2019 (UTC)

Yes, but shouldn't they stay when the original is dead? Quuux (talk) 11:28, 3 September 2019 (UTC)

The uses that AMWNP removed with this edit are incorrect. The intent of |archive-url= is to hold a web archiving webpage, such as the same page hosted at Internet Archive. --Izno (talk) 13:55, 3 September 2019 (UTC)

Request: add "subscription required" tag

Status: we {{wontfix}} it since we can’t
Reported by: TrottieTrue (talk) 23:02, 31 August 2019 (UTC)

What should happen: [24]
We can't proceed until: Feedback from maintainers

How would we reliably know? AManWithNoPlan (talk) 23:50, 31 August 2019 (UTC)

Titles where the original title has quotes

Status: so rare that we cannot fix it beyond existing comment blocks. {{wontfix}}
Reported by: Nick Number (talk) 14:37, 4 September 2019 (UTC)

What happens: Single quotes are removed from a citation title when they are present (as double quotes) in that title.
What should happen: Double quotes should be retained as single quotes per MOS:QWQ.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Yren_Rotela&curid=60225218&diff=913954345&oldid=889188188
We can't proceed until: Feedback from maintainers

A more precise style guide https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Titles#Typographic_effects AManWithNoPlan (talk) 21:27, 4 September 2019 (UTC)

It is up to the editor to take special care in these extraordinarily limited cases and flag the title with a comment to this effect. AManWithNoPlan (talk) 21:30, 4 September 2019 (UTC)

I have now done that. AManWithNoPlan (talk) 22:18, 4 September 2019 (UTC)

New handle

Status: {{fixed}}
Reported by: Jonatan Svensson Glad (talk) 21:01, 7 September 2019 (UTC)

What should happen: |url=http://hdl.cqu.edu.au/10018/1029016 --> |hdl=10018/1029016
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Myricetin&diff=914506213&oldid=914505747
We can't proceed until: Feedback from maintainers
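
The requested conversion could be sketched roughly as follows. This assumes that any host whose first label is "hdl" is a handle resolver and that the handle is the URL path; both are assumptions made for illustration, and the function name is hypothetical rather than the bot's actual code.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def handle_from_url(url: str) -> Optional[str]:
    """Return the handle embedded in a resolver URL, or None if none is found."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Assumption: hosts such as hdl.cqu.edu.au or hdl.handle.net resolve handles.
    if not host.startswith("hdl."):
        return None
    handle = parsed.path.lstrip("/")
    # A handle looks like prefix/suffix, with a numeric (possibly dotted) prefix.
    if re.fullmatch(r"[0-9.]+/\S+", handle):
        return handle
    return None
```

With the URL from this report, the function returns 10018/1029016, which is the value wanted in |hdl=.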

Better handling of Google Books URLS

Status: {{fixed}}
Reported by: SpinningSpark 10:26, 8 September 2019 (UTC)

What happens: This bot edit changed the book link of Suetonius, Twelve Caesars from this to this. The first link takes the reader directly to the relevant page with the relevant passage highlighted. The bot version goes to the book home page, requiring the reader to search manually. The bot has preserved the parameter "dq=signals", but this does not actually execute a search; "q=signals" is required to do that. "dq=" is the highlighting applied to search results, so it is useless without a "q=" or a page parameter.
What should happen: If the editor has constructed the url to link to a specific page or highlight specific text then this should be preserved.
Replication instructions: The bot may possibly have been fooled by the unusual page parameter of "jtp=296". Page parameters are normally "pg=PA<n>" (arabic numerals), "pg=PR<n>" (roman numerals), or "pg=PT<n>" (used on unpaginated books).
We can't proceed until: Feedback from maintainers
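
One way to check whether an editor has pointed a Google Books URL at a specific place, sketched under the assumptions in this report: "pg" and "jtp" select a page and "q" runs a search, while "dq" alone does neither. The function name and the example book id are hypothetical.

```python
from urllib.parse import parse_qs, urlparse

# Per the report above: page selectors and the search parameter.
PAGE_PARAMS = ("pg", "jtp")
SEARCH_PARAMS = ("q",)


def targets_specific_location(url: str) -> bool:
    """True if the URL carries a page or search parameter worth preserving."""
    query = parse_qs(urlparse(url).query)
    return any(name in query for name in PAGE_PARAMS + SEARCH_PARAMS)
```

A URL for which this returns True would be left as the editor wrote it; "dq=signals" on its own does not count, because parameter names are matched exactly.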

WebCite query strings

Status: feature disabled now {{fixed}}
Reported by: JJMC89 18:57, 8 September 2019 (UTC)

What happens: The url parameter of WebCite query strings is removed.
What should happen: Citation bot should follow Wikipedia:Using WebCite#Use within Wikipedia.
Relevant diffs/links: Special:Diff/914645096
We can't proceed until: Feedback from maintainers

This statement on that page, “This is followed by the original URL which helps protect against malicious code that is hiding an inappropriate link, such as spam.”, is blatantly wrong. The truth is the opposite: the URL at the end is meaningless and allows people to append anything they want and lie to people. AManWithNoPlan (talk) 00:51, 9 September 2019 (UTC)

No. The reason we do this is to avoid malicious URLs from getting past Wikipedia edit list filters during page-save. This is done per an RfC as a solution to the WikiCite problem since web shortening is otherwise disallowed on Wikipedia because it avoids the edit filters. Thus we tack on the URL at the end so the Wikipedia edit filters can process it. The URL is not a "lie" it is checked and rechecked by bots using the WebCite API to ensure they match, our bots are continually checking. Please do not remove this URL otherwise we are in a bot war, our bots will just re-add it per the RfC requirement and policy about web shortening and edit filters. -- GreenC 03:51, 9 September 2019 (UTC)

If the URL parameter needs to be checked/enforced by bots anyway, why can't that job just be performed on the actual url parameter of the template? That's what AbuseFilter rules should be targeting. Nemo 06:20, 9 September 2019 (UTC)

AbuseFilters can not target templates as far as I know. Archive URLs often exist outside templates (external links, bare links etc) -- GreenC 06:34, 9 September 2019 (UTC)

so the ones I found with the URL set to the wrong thing would have gotten caught eventually. Interesting. Those bots should probably change the URL to the other non-time stamp format to fix this problem for good. AManWithNoPlan (talk) 10:53, 9 September 2019 (UTC)

Yes. When people inject a false ?url=http:// into the WebCite URL it will get caught eventually. I run into them and it's always a clever way to avoid a blacklist filter. Not super common but they do exist. Or they don't have a ?url=http:// at all, in which case the bot will try to add it (based on data from the WebCite API not the |url= field) and the bot gets blocked on page save due to the blacklist filter which is a flag of this problem. -- GreenC 14:45, 9 September 2019 (UTC)
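
The check described in this thread can be sketched as below. The original URL that WebCite recorded is assumed to be supplied by the caller (the API lookup itself is not shown), and the function names and example snapshot id are hypothetical.

```python
from typing import Optional

URL_MARKER = "?url="


def appended_url(webcite_url: str) -> Optional[str]:
    """Return whatever follows '?url=' in a WebCite link, or None if absent."""
    _, found, rest = webcite_url.partition(URL_MARKER)
    return rest if found else None


def appended_url_is_honest(webcite_url: str, recorded_original: str) -> bool:
    """Compare the appended URL with the original recorded for the snapshot."""
    return appended_url(webcite_url) == recorded_original
```

Taking everything after the marker, rather than parsing the query string, keeps an unencoded "&" or "?" in the original URL from truncating the comparison.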

Access-date removal

Status: {{wontfix}} for now....
Reported by: Marchjuly (talk) 05:44, 8 September 2019 (UTC)

What happens: bot removed |access-date= from citation because "there was no url"; in this particular case, there was a |url= provided, but the citation template syntax was malformed (the template was missing a vertical bar) and this is what was causing the error. Not sure if this is a bug, but it might be worth seeing if the bot could be set up to recognize/fix this error instead.
Relevant diffs/links: Special:diff/Citation bot/914542884
We can't proceed until: Feedback from maintainers

Citation bot completes two journal citations but leaves two journal citations from the same source uncompleted

Status: {{fixed}}
Reported by: Jo-Jo Eumerus (talk, contributions) 10:19, 10 September 2019 (UTC)

What happens: Before this edit, the reference list contained four incomplete "cite journal" templates with http://adsabs.harvard.edu/abs/ URLs. Of these four templates, two had journal, volume, author and page number information filled in, the other two didn't. It's not clear why they didn't get completed as well.
Relevant diffs/links: [25] + [26]
We can't proceed until: Feedback from maintainers

Temporary Block

A 1-hour block has been placed related to Wikipedia:Administrators'_noticeboard#Temporary_block_of_User:Citation_bot. — xaosflux ^Talk 18:15, 10 September 2019 (UTC)

{{fixed}} for now...

Ignore title=none when looking for journals/dois/etc...

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 19:09, 10 September 2019 (UTC)

What should happen: [27]
We can't proceed until: Feedback from maintainers

Caps: Pis'ma v Astronomicheskii Zhurnal

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 20:36, 10 September 2019 (UTC)

What should happen: [28]
We can't proceed until: Feedback from maintainers

Regular expression failure

Status: {{wontfix}} bugs intrinsic to PHP Regex mechanisms
Reported by: Jonatan Svensson Glad (talk) 21:41, 11 September 2019 (UTC)

What happens: Regular expression failure in Chemical process of decomposition when extracting Templates
Relevant diffs/links: https://tools.wmflabs.org/citations/process_page.php?edit=toolbar&slow=1&page=Chemical_process_of_decomposition
We can't proceed until: Feedback from maintainers

Also "Regular expression failure in Palermo when extracting Templates" Jonatan Svensson Glad (talk) 08:04, 12 September 2019 (UTC)

Request: more citation templates

Status: {{notabug}}
Reported by: TrottieTrue (talk) 20:27, 12 September 2019 (UTC)

What should happen: we need additional templates, for citing magazines, broadcasts, YouTube, Twitter etc.
We can't proceed until: Feedback from maintainers

Why? What is the need that is not adequately handled with the existing tools? ♦ J. Johnson (JJ) (talk) 20:51, 12 September 2019 (UTC)

If you just want more templates, this is the wrong place to ask for those. Try Help:CS1. However, many of those already exist, see {{Citation Style 1}} and Category:Citation Style 1 specific-source templates, which will contain things like {{cite tweet}} and {{cite Youtube}} (an alias of {{cite AV media}}). Headbomb {t · c · p · b} 21:09, 12 September 2019 (UTC)

I think that the user is referring to the options provided in the dropdown menu when you edit a page via a web browser. Cite web, news, book, and journal are the only available options. These provide a user-friendly field form that is much easier to fill out than the wikitext templates. Still not the right place to request, though, I'm guessing. -2pou (talk) 05:06, 13 September 2019 (UTC)

Yes. I guess I shouldn't have added it to the Archive page (I didn't think I had), but I don't know where else such a request would go. -- TrottieTrue (talk) 23:06, 14 September 2019 (UTC)

Missing space in title

Status: {{notabug}}
Reported by: 94.13.116.91 (talk) 16:02, 14 September 2019 (UTC)

What happens: The bot parses a title containing an italicised first word followed by a narrow space (?), and it writes a title with no italics (OK) and no space between the first and second words (not OK).
Relevant diffs/links: See K2-18, specifically this edit.
Replication instructions: You can play with this citation: doi:10.3847/1538-4357/834/2/187.
We can't proceed until: Feedback from maintainers

There is nothing we can do about GIGO. The meta data in the crossref database does not have the space. So, the authoritative answer is missing the space. AManWithNoPlan (talk) 17:09, 14 September 2019 (UTC)

The Satanic Bible

It cannot possibly be intended behavior for this bot to be doing this... right? Pinging Chris Capoccia since that is the user who has "activated" the majority of these edits. Happy to try to address the DOI issue when I get a second, but I can't see any reason why the date as of which the DOI is broken needs to be updated approximately daily. Is there no logic to avoid this kind of watchlist/page history-clogging, unhelpful edit? GorillaWarfare (talk) 02:08, 14 September 2019 (UTC)

Perhaps if the date is within the last month then do not update. I will look into that. Generally speaking, running the bot on the same page over and over again should be pointless. We consider running the bot again and getting more results to be a bug. AManWithNoPlan (talk) 02:16, 14 September 2019 (UTC)

Chris should also not be asking the same categories to be pointlessly processed over and over. Headbomb {t · c · p · b} 02:21, 14 September 2019 (UTC)

and the bot should not be giving people a false sense of usefulness by changing dates by less than a month. AManWithNoPlan (talk) 02:35, 14 September 2019 (UTC)

For sure. If a DOI is marked as broken, there's no real need to update the date all that often. Through the gadget, sure, since people choose to save or not. But through batch runs? Once a year should be enough. Headbomb {t · c · p · b} 04:05, 14 September 2019 (UTC)

fixed the trigger. Perlmutter ref had an access date with no URL. — Chris Capoccia 💬 12:59, 14 September 2019 (UTC)

also fixed the doi. — Chris Capoccia 💬 13:02, 14 September 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2145 AManWithNoPlan (talk) 20:16, 15 September 2019 (UTC)

{{fixed}} minimum of a month's change in date. AManWithNoPlan (talk) 23:15, 15 September 2019 (UTC)
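
The rule agreed on in this thread might look something like the sketch below. It approximates "a month" as 30 days, which is an assumption, and the function name is hypothetical; the actual change is in the pull request linked above.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "a month" is approximated as 30 days.
MIN_CHANGE = timedelta(days=30)


def should_update_broken_date(existing: Optional[date], today: date) -> bool:
    """Only rewrite doi-broken-date if it is missing or at least a month old."""
    if existing is None:
        return True
    return today - existing >= MIN_CHANGE
```

Under this rule a page processed on consecutive days keeps its existing doi-broken-date, so repeated runs no longer produce an edit just to bump the date.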

most likely bad meta data used

Status: {{fixed}}
Reported by: Strebe (talk) 05:00, 14 September 2019 (UTC)

What happens: This edit is a mess.
We can't proceed until: Feedback from maintainers

The meta data is “interesting”, I will think about filtering. AManWithNoPlan (talk) 17:21, 14 September 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2144 and https://github.com/ms609/citation-bot/pull/2143 AManWithNoPlan (talk) 19:40, 15 September 2019 (UTC)

Chapter url

Status: {{fixed}}
Reported by: DuncanHill (talk) 12:16, 15 September 2019 (UTC)

What happens: changes url to chapter url when the link isn't to the chapter
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=David_Lloyd_George&curid=46836&diff=915796543&oldid=915711272
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2142 AManWithNoPlan (talk) 19:21, 15 September 2019 (UTC)

Thank you. DuncanHill (talk) 20:31, 15 September 2019 (UTC)

If you remove firstn/lastn, also remove author-linkn/authorn-link

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 12:45, 24 July 2019 (UTC)

What happens: [29]
What should happen: [30]
We can't proceed until: Feedback from maintainers

That's more annoying than it sounds since we have to check a lot of name parameters. AManWithNoPlan (talk) 18:25, 9 August 2019 (UTC)

Purple background in this page makes it hard to read

I use the blackscreen gadget (it gives green text on a black screen) as it makes Wikipedia much more readable for me. The purpleish background on parts of this page makes it almost impossible to read the text. DuncanHill (talk) 12:21, 15 September 2019 (UTC)

Feel free to suggest coding changes to the bot bug template. Do ANY other templates detect your non-standard style? AManWithNoPlan (talk) 18:11, 15 September 2019 (UTC)

also, please point to information about this gadget, I have no knowledge of it. AManWithNoPlan (talk) 19:22, 15 September 2019 (UTC)

1) I am not a coder, 2) NONE cause any problems that I am aware of at the moment, there have been a few in the past but people have been very helpful in making their templates compliant when asked, and 3) it's called blackskin, it's one of the gadgets available in preferences ("Use a black background with green text" in Appearance), and is listed at Wikipedia:Gadget#Currently installed gadgets. DuncanHill (talk) 20:30, 15 September 2019 (UTC)

Do you remember any of the template names? I am curious what they did. AManWithNoPlan (talk) 23:19, 15 September 2019 (UTC)

How is this? AManWithNoPlan (talk) 14:22, 16 September 2019 (UTC)

It does look clearer now, thank you. DuncanHill (talk) 14:24, 16 September 2019 (UTC)

Convert Template:PMID when in ref tags with eight digits

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 05:46, 16 September 2019 (UTC)

What should happen: [31]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2146 AManWithNoPlan (talk) 20:53, 16 September 2019 (UTC)

Bot dies on page without error message

Status: {{fixed}} with code that detects AdsAbs Java stack trace
Reported by: Jonatan Svensson Glad (talk) 18:55, 17 September 2019 (UTC)

What happens: The bot "stops loading" midway when processing Isabella Clara Eugenia.

[18:53:50] Processing page 'Isabella Clara Eugenia' — edit—history 
 
> Remedial work to prepare citations
   - Dropping parameter "access-date"
   - Dropping parameter "url"
   - Dropping parameter "access-date"
   - Dropping parameter "url"
 
> Consult APIs to expand templates
   > Checking that DOI 10.1093/ml/gcl154 is operational... DOI ok.
 > Querying CrossRef: doi:10.1093/ml/gcl154
   > Checking that DOI 10.30827/cn.v0i40.2562 is operational... It's not...
   + Adding doi-broken-date: 2019-09-17
   > Checking that DOI 10.30827/cn.v0i40.2562 is operational... It's not...
 > Using Zotero translation server to retrieve details from URLs.
   > Retrieved info from https://www.royalcollection.org.uk/collection/407377/the-infanta-isabella-clara-eugenia-1566-1633-archduchess-of-austria
   > Internal server error with URL https://www.findagrave.com/memorial/19232
 
> Expand individual templates by API calls
 > Checking CrossRef database for doi. 
 > Searching PubMed...  nothing found.
 > Checking AdsAbs database
 > Checking CrossRef database for doi. 
 > Searching PubMed...  nothing found.
 > Checking AdsAbs database
 > Checking CrossRef database for doi. 
 > Searching PubMed...  nothing found.
 > Checking AdsAbs database
 > Searching PubMed...  nothing found.
 > Checking AdsAbs database
   > AdsAbs search 2515/25000:
       identifier:"10.1093/ml/gcl154"
   > AdsAbs search 2516/25000:
       title:"Biagio Marini, Sonate Sinfonie: Canzoni, Passemezzi, Balletti, Correnti, Gagliarde, 
        Ritornelli, a 1, 2, 3, 4, 5 
        6 voci per ogni sorte di stromento, Opera VIII, ed. Maura Zoni."
 > Checking CrossRef database for doi. 
 > Searching PubMed...  nothing found.
 > Checking AdsAbs database
 > Searching PubMed... 
   ! Unable to do PMID search nothing found.
 > Checking AdsAbs database
   > AdsAbs search 2517/25000:
       identifier:"10.30827/cn.v0i40.2562"

Relevant diffs/links: https://tools.wmflabs.org/citations/process_page.php?edit=toolbar&slow=1&page=Isabella_Clara_Eugenia
We can't proceed until: Feedback from maintainers

Strip semicolons

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 00:13, 18 June 2019 (UTC)

What should happen: [32], [33]
We can't proceed until: Feedback from maintainers

This should perhaps not apply to |title= however. Also might not be safe to do in some identifiers. Headbomb {t · c · p · b} 00:14, 18 June 2019 (UTC)

And, as always, title's good friend |chapter= too. AManWithNoPlan (talk) 13:57, 23 June 2019 (UTC)

And contribution and other aliases. Headbomb {t · c · p · b} 15:49, 23 June 2019 (UTC)

And NOT & a m p ; and his friends. AManWithNoPlan (talk) 01:25, 30 June 2019 (UTC)

Rather than a blacklist, we would want a white list of parameters. AManWithNoPlan (talk) 15:17, 6 July 2019 (UTC)

Get the list of all parameters and remove those then. Headbomb {t · c · p · b} 20:07, 6 July 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2130 AManWithNoPlan (talk) 19:06, 31 August 2019 (UTC)

Should also handle author/editors/contributors/others (and their variants) Headbomb {t · c · p · b} 19:55, 31 August 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2153 AManWithNoPlan (talk) 18:37, 18 September 2019 (UTC)
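
The constraints raised in this thread (a whitelist of parameters, no touching |title= and its relatives, and leaving HTML entities intact) can be sketched as follows. This assumes the request is about a trailing semicolon in a parameter value; the whitelist contents and the function name are hypothetical and are not the lists used in the pull requests above.

```python
import re

# Hypothetical whitelist; the discussion does not give the real list.
SAFE_PARAMS = {"journal", "publisher", "volume", "issue", "pages"}

# A value ending in an HTML entity such as &amp; or &#38; must keep its semicolon.
ENDS_WITH_ENTITY = re.compile(r"&#?\w+;$")


def strip_trailing_semicolon(param: str, value: str) -> str:
    """Drop one trailing semicolon from whitelisted parameters only."""
    if param not in SAFE_PARAMS:
        return value
    stripped = value.rstrip()
    if stripped.endswith(";") and not ENDS_WITH_ENTITY.search(stripped):
        return stripped[:-1].rstrip()
    return value
```

Using a whitelist means a parameter nobody has thought about, such as an identifier where a semicolon is meaningful, is left untouched by default.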

Garbage archive-url cleanup

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 15:26, 14 August 2019 (UTC)

What should happen: [34]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2154 AManWithNoPlan (talk) 18:56, 18 September 2019 (UTC)

Bot dies

Status: {{fixed}}
Reported by: Jonatan Svensson Glad (talk) 20:43, 18 September 2019 (UTC)

What happens: Bot stops loading on Zakir Naik. Nothing happens.
Relevant diffs/links: https://tools.wmflabs.org/citations/process_page.php?edit=toolbar&slow=1&page=Zakir_Naik
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2156 AManWithNoPlan (talk) 21:24, 18 September 2019 (UTC)

Request: Capitalize linked journals

Status: {{fixed}} https://github.com/ms609/citation-bot/pull/2157
Reported by: Headbomb {t · c · p · b} 17:09, 30 April 2019 (UTC)

What should happen: [35]
We can't proceed until: Feedback from maintainers

That is very dangerous territory. We would have to verify that the old page did not exist at all and that the new page did exist. We really have not ever got in the business of fixing red links. AManWithNoPlan (talk) 15:01, 2 May 2019 (UTC)

It's not a matter of fixing redlinks, it's a matter of capitalization. E.g. Journal of physics vs Journal of Physics or INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY vs International Journal of Systematic and Evolutionary Microbiology. Or Developmental neuroscience vs Developmental Neuroscience. Headbomb {t · c · p · b} 15:43, 2 May 2019 (UTC)

And in the rare case that the capitalized version links to a different page, it will link to the correct page instead of the wrong one. Headbomb {t · c · p · b} 15:48, 2 May 2019 (UTC)

Unless it’s a foreign-language title — the bot sometimes gets a little overzealous capitalizing words, and a redirect from a title with extra capitalization might not exist yet for articles about some publications. Umimmak (talk) 14:23, 4 September 2019 (UTC)

That's mostly taken care of through this + a custom list of foreign titles. This is just bringing the bot in line with what it would do to an unlinked title, so if there's an issue with capitalization, it wouldn't be specific to the linked version. Headbomb {t · c · p · b} 14:33, 4 September 2019 (UTC)

Incorrectly capitalized word in piped link

Status: {{fixed}}
Reported by: Jonatan Svensson Glad (talk) 22:42, 22 September 2019 (UTC)

What happens: Changed [[Science (journal)|Science]] to [[Science (Journal)|Science]]
(Capitalized the J in journal)
What should happen: Nothing
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Swarm_behaviour&diff=next&oldid=917212784
We can't proceed until: Feedback from maintainers

This should apply to all disambiguators e.g. (Hindawi journal), (magazine), (website), (musicology journal), ... Headbomb {t · c · p · b} 23:07, 22 September 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2158 AManWithNoPlan (talk) 23:18, 22 September 2019 (UTC)
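
A rough sketch of the behaviour asked for here: apply capitalization fixes to the main part of a link target but leave any trailing parenthetical disambiguator alone. The function name is hypothetical, and fix_caps stands in for whatever capitalization routine the bot really uses; this is not the code from the pull request above.

```python
import re
from typing import Callable

# Splits "Science (journal)" into "Science" and " (journal)".
DISAMBIGUATOR = re.compile(r"^(.*?)(\s*\([^()]*\))$")


def capitalize_target(target: str, fix_caps: Callable[[str], str]) -> str:
    """Apply fix_caps to the title part only, preserving the disambiguator."""
    match = DISAMBIGUATOR.match(target)
    if match:
        base, suffix = match.groups()
        return fix_caps(base) + suffix
    return fix_caps(target)
```

Because the suffix is matched by shape rather than by a fixed word list, the same rule covers (Hindawi journal), (magazine), (website) and similar disambiguators.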

Unexpected conversion from cite web to cite journal

The Stanford Encyclopedia of Philosophy entry looked more like a website than a journal, and I was surprised by this conversion by Citation bot (diff). Is it supposed to work this way? Final output is OK, but not sure what was wrong with cite web. — Chris Capoccia 💬 12:14, 23 September 2019 (UTC)

This one should probably go to {{cite encyclopedia}} instead. --Izno (talk) 16:21, 23 September 2019 (UTC)

Caps: The De Paulia; dell'

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 16:27, 23 September 2019 (UTC)

What should happen: [36] + [37]
We can't proceed until: Feedback from maintainers

De Paulia done. Waiting on this for dell words: Italian dell'xxx words AManWithNoPlan (talk) 17:41, 24 September 2019 (UTC)

Why are we waiting on this for dell'? Headbomb {t · c · p · b} 19:02, 24 September 2019 (UTC)

Wiktionary is not aware of any other language using wikt:dell'. Of all the preposizioni articolate, degli and delle should also be rather safe. Nemo 19:25, 24 September 2019 (UTC)

Caps: N.Y.)

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 20:02, 24 September 2019 (UTC)

What should happen: [38]
We can't proceed until: Feedback from maintainers

Ignore orbit.dtu.dk

Status: {{fixed}}
Reported by: Nemo 12:08, 25 September 2019 (UTC)

What happens: An Elsevier Pure website misleads clients into believing it has a PDF and Citation bot added a link.
What should happen: Do not add a URL of the form https://orbit.dtu.dk/en/publications/, and remove it if you see it. URLs under the path http://orbit.dtu.dk/ws/files/ are actual PDFs.
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Esophagus&diff=883170792&oldid=881966247
We can't proceed until: Feedback from maintainers
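The rule requested above can be sketched as a small URL classifier. This is only an illustration under the assumptions stated in the report (landing pages live under /en/publications/, real PDFs under /ws/files/); the function name `classify_orbit_url` and the example file path in the usage note are hypothetical.

```python
from urllib.parse import urlparse


def classify_orbit_url(url):
    """Return 'pdf', 'landing' or 'other' for a URL (hypothetical helper)."""
    parts = urlparse(url)
    if parts.hostname != "orbit.dtu.dk":
        return "other"  # not an orbit.dtu.dk URL; this rule has no opinion
    if parts.path.startswith("/ws/files/"):
        return "pdf"  # per the report, these are actual PDFs
    if parts.path.startswith("/en/publications/"):
        return "landing"  # per the report, should not be added as a PDF link
    return "other"
```

With this, `classify_orbit_url("http://orbit.dtu.dk/ws/files/123/example.pdf")` returns `"pdf"`, while a URL under `/en/publications/` is classified as `"landing"` and could be skipped or removed.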

Hep Lib.web and other arXiv-mirrors are not journals

Status: {{fixed}} — this seriously looked oh so wrong
Reported by: David Eppstein (talk) 17:46, 25 September 2019 (UTC)

What should happen: Leave arXiv cite alone rather than turning it into bogus cite journal
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=ArXiv&type=revision&diff=917820799&oldid=916119433
We can't proceed until: Feedback from maintainers

HEP Lib.Web. does seem to be a journal or work of some kind, e.g. [39]. Full name is High Energy Physics Libraries Webzine. Headbomb {t · c · p · b} 19:30, 25 September 2019 (UTC)

🙄 AManWithNoPlan (talk) 21:35, 25 September 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2165 AManWithNoPlan (talk) 21:45, 25 September 2019 (UTC)

Ok, thanks. So the only action needed is to spell out the full name in such a way as to make it look like a name of a journal and not a web site to avoid the same confusion befalling others. —David Eppstein (talk) 22:28, 25 September 2019 (UTC)

Διδακτορική Διατριβή

Status: {{fixed}}
Reported by: Jonatan Svensson Glad (talk) 19:57, 25 September 2019 (UTC)

What happens: |type=Διδακτορική Διατριβή
What should happen: Add something in English or nothing at all
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Operation_Animals&diff=prev&oldid=917851499
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2164 AManWithNoPlan (talk) 21:34, 25 September 2019 (UTC)

Should not overwrite title=none

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 03:19, 26 September 2019 (UTC)

Relevant diffs/links: [40]
We can't proceed until: Feedback from maintainers

More caps: ACS, ISME

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 16:41, 26 September 2019 (UTC)

What should happen: Capitalize Acs, Isme
We can't proceed until: Feedback from maintainers
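One way such acronym fixes are often handled is a small lookup table applied word by word. The sketch below assumes that approach; `UPPERCASE_WORDS` and `fix_acronyms` are hypothetical names, and the table holds only the two acronyms named in this report.

```python
import re

# Hypothetical lookup table: lowercase form -> required capitalization.
UPPERCASE_WORDS = {"acs": "ACS", "isme": "ISME"}


def fix_acronyms(title):
    """Uppercase whole words found in UPPERCASE_WORDS; leave the rest alone."""
    def repl(match):
        word = match.group(0)
        return UPPERCASE_WORDS.get(word.lower(), word)

    # Match whole alphabetic words so that e.g. "Facs" is not touched.
    return re.sub(r"[A-Za-z]+", repl, title)
```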

Talk pages

Is the bot supposed to work on talk pages such as Talk:Lockheed SR-71 Blackbird ? Stepho talk 10:39, 22 September 2019 (UTC)

Only if requested. Here User:GeneralPoxter asked the bot to make an edit. Headbomb {t · c · p · b} 22:14, 22 September 2019 (UTC)

If a category includes it then yes also. AManWithNoPlan (talk) 11:07, 23 September 2019 (UTC)

{{notabug}}, I guess....

I think that the "main" function when running the bot on a category should be to only run on pages in article namespace (ns 0), and require a manual opt-in (such as &allNS=true) or something, since maintenance categories mistakenly have talk pages and template documentation pages in them. Jonatan Svensson Glad (talk) 20:12, 26 September 2019 (UTC)

It wouldn't be a bad idea to have category runs only allow for Main/Draft spaces by default. Headbomb {t · c · p · b} 22:24, 26 September 2019 (UTC)

{{fixed}} no longer in category mode. AManWithNoPlan (talk) 14:51, 27 September 2019 (UTC)
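The namespace restriction discussed above could look roughly like the following sketch. It assumes each category member is a dict with `ns` and `title` keys (the shape returned by the MediaWiki API for category members) and that Draft is namespace 118, as on English Wikipedia; `select_pages` and the `all_namespaces` opt-in are hypothetical names standing in for something like the suggested &allNS=true.

```python
# Namespace numbers on English Wikipedia: 0 = main (articles), 118 = Draft.
DEFAULT_NAMESPACES = frozenset({0, 118})


def select_pages(members, all_namespaces=False):
    """Return the titles a category run should process (hypothetical helper)."""
    if all_namespaces:
        # Explicit opt-in: process every member regardless of namespace.
        return [m["title"] for m in members]
    return [m["title"] for m in members if m["ns"] in DEFAULT_NAMESPACES]
```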

For purpose of title matching, strip sub/sup markup

If you have

Multiple metal-carbon bonds. 16. Tungsten-oxo alkylidene complexes as olefins metathesis catalysts and the crystal structure of W(O)(CHCMe<sub>3</sub>(PEt<sub>3</sub>)Cl<sub>2</sub>

This should be treated as equivalent to

Multiple metal-carbon bonds. 16. Tungsten-oxo alkylidene complexes as olefins metathesis catalysts and the crystal structure of W(O)(CHCMe3(PEt3)Cl2

For purpose of title-matching. Headbomb {t · c · p · b} 22:44, 26 September 2019 (UTC)
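A minimal sketch of this kind of normalization, assuming titles are compared after removing sub/sup tags, collapsing whitespace and ignoring case. The helper names `normalize_title` and `titles_match` are hypothetical, and the extra whitespace and case handling is an assumption beyond what was requested.

```python
import re

# Opening or closing <sub>/<sup> tags, case-insensitive.
SUB_SUP = re.compile(r"</?su[bp]\s*>", re.IGNORECASE)


def normalize_title(title):
    """Strip sub/sup markup, collapse whitespace and fold case."""
    stripped = SUB_SUP.sub("", title)
    return " ".join(stripped.split()).casefold()


def titles_match(a, b):
    """True when two titles are equal after normalization."""
    return normalize_title(a) == normalize_title(b)
```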

Author-link vs Agency

Status: {{fixed}}
Reported by: Jonatan Svensson Glad (talk) 03:40, 29 September 2019 (UTC)

What happens: In edit 1: Changes |author=[[Associated Press]] to |author=Associated Press|author-link=Associated Press
In edit 2: Changes |author=Associated Press|author-link=Associated Press to |agency=Associated Press|author-link=Associated Press
This causes a stray |author-link=Associated Press to be left in the article.
What should happen: Change from author to journal in first edit
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Carl_Levin&diff=prev&oldid=918529543
https://en.wikipedia.org/w/index.php?title=Carl_Levin&diff=prev&oldid=918530007
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2171 AManWithNoPlan (talk) 17:44, 29 September 2019 (UTC)

A question on removal of my edits

You removed my edits from Proximity space with the log message "Removing self promotion".

Please explain which Wikipedia policy (the page and the exact quote, please) strictly disallows self-promotion.

I do realize that self-promotion should be restricted, but I see no rule in Wikipedia policy that would completely disallow it and thus would justify your removal.

If you don't explain it soon and do not restore it back, I will dispute it with the Wikipedia authorities.

--VictorPorton (talk) 21:22, 1 October 2019 (UTC)

You are mistaken. As you can see from this diff, Citation bot did not revert your edits.

—Trappist the monk (talk) 21:28, 1 October 2019 (UTC)

{{notabug}}, just a user who cannot read edit logs. AManWithNoPlan (talk) 22:21, 1 October 2019 (UTC)

Idea: Usage stats

Not really anything pressing, but now that we have OAuth in, it would be neat to have usage statistics. Who makes use of the bot. If the bot is activated via the web interface, scripts, etc... Or whatever else is trackable. Headbomb {t · c · p · b} 06:15, 27 June 2019 (UTC)

I guess one could sort the bot contributions based on if the edit summary said “category” and one could query Wikipedia and search for edit summaries with the “use this tool” text in them. AManWithNoPlan (talk) 13:40, 27 June 2019 (UTC)

Having a &via=... in the API would likely be a better way of tracking things, but right now I'm mostly thinking about something very non-critical. I'll take any bug fix and things that actually affect the edits of the bot over usage stats though. Just figured if one of the talk page stalkers felt like compiling stats, or building a sub-module that would export information into an external database after every edit, well that's a nice little project. Headbomb {t · c · p · b} 17:17, 27 June 2019 (UTC)

We currently have no logging, so any logging would have to be done in the edit summaries. AManWithNoPlan (talk) 14:32, 28 June 2019 (UTC)

"Currently", yup. But if there was logging, we could have graphs/stats like [41], except for citation bot usage, instead of pageviews.

Anyway, it's an idea more than anything. Not critical by far, and I'd rather have someone else work on that if that ever gets done (unless we suddenly run out of edit-related bug fixes and feature requests). Headbomb {t · c · p · b} 15:15, 28 June 2019 (UTC)

Could it be enough to add a hashtag and rely on toolforge:hashtags/? Nemo 15:37, 28 June 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2170 AManWithNoPlan (talk) 17:27, 29 September 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2173 AManWithNoPlan (talk) 19:58, 30 September 2019 (UTC)

{{fixed}} using hashtags. AManWithNoPlan (talk) 11:15, 5 October 2019 (UTC)

OAuth every time I use Citebot?

Why does Wikipedia citation bot (WP:UCB) require WP:OAuth every time I use it? I know there was a stir regarding the tool a while back but...? {{ping|waddie96}} {talk} 18:49, 18 September 2019 (UTC)

I don't know. Your granting permission should stick. It might be related to the fact that we require permission to edit pages "as the user". We never actually use that, and the keeper of the OAuth setting could probably fix that. AManWithNoPlan (talk) 18:58, 18 September 2019 (UTC)

Waddie96, do you have cookies enabled for tools.wmflabs.org? Nemo 19:38, 18 September 2019 (UTC)

I agree this is something rather annoying. It didn't do that initially, but started to require it every time a few weeks/month ish after the rollout. Headbomb {t · c · p · b} 20:20, 18 September 2019 (UTC)

Every time there is a bug fix applied, ALL cookies are lost on the server side. 20:24, 18 September 2019 (UTC)

I think it is mostly fixed now. AManWithNoPlan (talk) 21:04, 5 October 2019 (UTC)

It certainly seems better. Headbomb {t · c · p · b} 21:13, 5 October 2019 (UTC)

{{fixed}} for the better. AManWithNoPlan (talk) 01:42, 7 October 2019 (UTC)

Redundant JSTOR chapter-url

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 21:16, 5 October 2019 (UTC)

What should happen: [42]
We can't proceed until: Feedback from maintainers

Caps:NDT & e International → NDT & E International

Self explanatory Headbomb {t · c · p · b} 02:52, 6 October 2019 (UTC)

{{fixed}} AManWithNoPlan (talk) 01:41, 7 October 2019 (UTC)

Cleanup: Remove empty journal/issue from cite book, remove empty ISBNs from cite journals

Those parameters should not be present on a cite book, likewise isbn should not (normally) be present in a cite journal. Only remove the empty ones though. Headbomb {t · c · p · b} 19:26, 5 October 2019 (UTC)

{{fixed}} AManWithNoPlan (talk) 17:17, 7 October 2019 (UTC)
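The cleanup requested here can be sketched as a filter that drops a parameter only when it is both out of place for the template and empty. This is an illustration, not the bot's code; the table `MISPLACED_IF_EMPTY` and the function `drop_empty_misplaced` are hypothetical names, and parameters are modelled as a plain dict.

```python
# Parameters that should not normally appear on a given template.
MISPLACED_IF_EMPTY = {
    "cite book": {"journal", "issue"},
    "cite journal": {"isbn"},
}


def drop_empty_misplaced(template, params):
    """Remove misplaced parameters, but only when their value is empty."""
    unwanted = MISPLACED_IF_EMPTY.get(template.strip().lower(), set())
    return {
        name: value
        for name, value in params.items()
        if not (name in unwanted and value.strip() == "")
    }
```

A non-empty `journal` on a cite book is deliberately kept, since the request was to remove only the empty ones.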

500 error

Citation bot server keeps throwing a 500 error. Headbomb {t · c · p · b} 15:46, 7 October 2019 (UTC)

I don’t see it. You might need to delete your tools.wikimedoa.org cookies AManWithNoPlan (talk) 17:20, 7 October 2019 (UTC)

Probably a temporary hiccup. Things are fine now. Headbomb {t · c · p · b} 17:24, 7 October 2019 (UTC)

{{fixed}} now. AManWithNoPlan (talk) 18:06, 7 October 2019 (UTC)

Need to run twice to decapitalize

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 01:08, 6 October 2019 (UTC)

What happens: [43] + [44]
What should happen: Do it all in one edit
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2186 JSTOR's were not on the list of reliable information. AManWithNoPlan (talk) 21:40, 7 October 2019 (UTC)

Category/batch whitelist

Category/batch runs are being abused. Possibilities on dealing with this are:

a) a whitelist of people allowed to ask for unlimited category/batch runs
- This could just be something like extended confirmed. Edit: Template editors might be a better idea. Headbomb {t · c · p · b} 03:25, 11 September 2019 (UTC)
  - That is not what the permission is for. Please do not attach things to random permissions. --Izno (talk) 13:25, 11 September 2019 (UTC)
b) a whitelist for limited category/batch runs (say ~250 pages at once, tops)
- This could just be something like autoconfirmed/confirmed.
c) a way to kill inappropriate category/batch runs

And have category/batch runs disabled/greatly limited (~25 articles) for non-confirmed/whitelisted users. Headbomb {t · c · p · b} 17:48, 7 August 2019 (UTC)

A/B may also prevent sock puppets and "suspicious" new users that may intend to use the bot in ways that are undesired from doing so. Users without edits or very few edits might not check their edits or won't see possible mistakes by the bot and as such won't report them. Proposal B seems like a good one to go forward with in any case in my opinion. For proposal C it might be good to define who could use that option, only maintainers and the operator or also some "trusted" users; we would also need to define what is considered inappropriate. For option A it might also be an idea to let extended confirmed users run up to 1000 pages, and then have a further whitelist of users who can do unlimited runs, i.e. bureaucrats, administrators and "normal" users who have proven to understand what the bot does and the impact of extremely large runs (i.e. don't run during high usage times), and possibly are also actively reporting bugs and joining in discussion here. Just a few things to think about. -Redalert2fan (talk) 20:05, 16 August 2019 (UTC)

The bot has been effectively disabled for the last week or so due to Chris Capoccia's insanely large-category requests (e.g. Category:Pages with citations having bare URLs) that hog all the resources. Please implement better parallelism à la first "Extended content" box in the see also link above, or something similar enough that one or two large requests don't disable the bot for everyone else. Headbomb {t · c · p · b} 15:40, 1 September 2019 (UTC)

maybe some simple intermediate steps would be good. currently the bot is still churning on something from a couple days ago. it doesn't even appear in any of my browser windows and there's no way for me to stop it. maybe the bot could refuse to do large requests. or even eliminate the category box altogether. — Chris Capoccia 💬 14:12, 2 September 2019 (UTC)

For intermediate steps, see #Category/batch whitelist. Headbomb {t · c · p · b} 19:55, 2 September 2019 (UTC)

it would be nice to have multiple bot/zotero accounts AManWithNoPlan (talk) 20:36, 2 September 2019 (UTC)

And again, because of the massive run against Category:CS1 errors: missing periodical, with over 300K articles in it. Please kill this! Headbomb {t · c · p · b} 14:54, 10 September 2019 (UTC)

I requested a bot block at WP:AN for this. This is way too large an unsupervised bot run. Headbomb {t · c · p · b} 17:43, 10 September 2019 (UTC)

Made another such request. Headbomb {t · c · p · b} 01:18, 14 September 2019 (UTC)

500 is a typical number for API request blocks, and the size of a page of diffs. Checking 500 diffs is some work, a common max size with the option to request another 500 etc.. -- GreenC

I suggest a soft limit of 100 to 250 save for a handful of users. That will usually take over an hour to process, and quite long to review as well. Headbomb {t · c · p · b} 01:44, 14 September 2019 (UTC)

When folks were doing large runs, I routinely checked thousands of edits a day out of curiosity. I'm not sure that a limit of few hundreds would be appropriate. Nemo 13:25, 14 September 2019 (UTC)

The issue here isn't so much the lack of checking as the fact that asking for runs larger than 100 or so disables the bot for everyone else for several hours. Headbomb {t · c · p · b} 16:11, 14 September 2019 (UTC)

Changes made. Should be mostly {{fixed}} AManWithNoPlan (talk) 00:27, 10 October 2019 (UTC)

What was done, exactly? Headbomb {t · c · p · b} 00:54, 10 October 2019 (UTC)

will refuse to run on large categories AManWithNoPlan (talk) 02:40, 10 October 2019 (UTC)

Yes, but what is the definition of a 'large category' here? Headbomb {t · c · p · b} 02:58, 10 October 2019 (UTC)

According to https://github.com/ms609/citation-bot/pull/2189 > 10000 pages. --Redalert2fan (talk) 04:23, 10 October 2019 (UTC)

That's way too big a limit. Headbomb {t · c · p · b} 17:47, 11 October 2019 (UTC)

Cut to 1000 AManWithNoPlan (talk) 19:48, 11 October 2019 (UTC)

Call that {{fixed}} for now AManWithNoPlan (talk) 20:44, 11 October 2019 (UTC)
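As a rough illustration of the size limit described above, a category run could be gated on the member count before any page is processed. The threshold of 1000 comes from this thread; the names `MAX_CATEGORY_PAGES` and `category_run_allowed`, and the choice to treat exactly 1000 pages as allowed, are assumptions of this sketch.

```python
# Limit mentioned in the discussion; whether it is inclusive is an assumption.
MAX_CATEGORY_PAGES = 1000


def category_run_allowed(page_count, limit=MAX_CATEGORY_PAGES):
    """Return True when a category is small enough to run on."""
    return page_count <= limit
```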

url jstor cleanup

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 17:46, 11 October 2019 (UTC)

What should happen: [45]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2193 AManWithNoPlan (talk) 19:51, 11 October 2019 (UTC)

HTML vs real character and weird interaction with ref tags

Status: ref tags generate conflicting formatting. {{fixed}} regex to eliminate us adding to the problem
Reported by: Jonatan Svensson Glad (talk) 00:50, 29 September 2019 (UTC)

What happens: Bot converted the HTML entity &rsquo; to ', causing the text next to it to become bold
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=30_Rock_(season_7)&diff=918507025&oldid=916082230
We can't proceed until: Feedback from maintainers

I don’t see the bold. AManWithNoPlan (talk) 17:48, 29 September 2019 (UTC)

See ref 17 in that diff above. Jonatan Svensson Glad (talk) 19:37, 29 September 2019 (UTC)

Only when immediately preceded with a ref tag on the same line. AManWithNoPlan (talk) 20:05, 29 September 2019 (UTC)

Looks like a bug in the handling of references in general in wikiland. AManWithNoPlan (talk) 18:49, 30 September 2019 (UTC)

^[1]

^[2]

'^ 30 Rocks
^ 30 Rock's

Discussing elsewhere https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Weird_ref,_citation,_white_space_interaction_bug AManWithNoPlan (talk) 01:49, 1 October 2019 (UTC)

In the diff in question, the bot converted the valid markup (two-single-quote)(italicized text)(entity for apostrophe)(two-single-quote) to the invalid markup (two-single-quote)(text with italic delimiter on one side and bold delimiter on the other side)(three-single-quote). How is that not a bot bug, regardless of the inconsistent rendering of the result? —David Eppstein (talk) 02:30, 1 October 2019 (UTC)

the resulting text renders properly unless it has ref tag right before it. AManWithNoPlan (talk) 02:49, 1 October 2019 (UTC)

Yes, so the bot bug is sometimes being masked by accidentally ok rendering. But it's still a bot bug. —David Eppstein (talk) 05:14, 1 October 2019 (UTC)

There is a bug in ref tags. Is this a bug in the bot? I am not sure, but I can say that wiki-text is not a 100% robust formatting system. This should avoid the above issue https://github.com/ms609/citation-bot/pull/2202 AManWithNoPlan (talk) 18:18, 13 October 2019 (UTC)
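To make the failure mode concrete: replacing the entity with a plain apostrophe directly next to the two apostrophes of italic markup produces three in a row, which wikitext reads as bold. One possible guard, sketched below, is to skip the replacement whenever the entity touches another apostrophe. This is an assumption about a reasonable fix, not a description of what the linked pull request does; `replace_rsquo` is a hypothetical name.

```python
import re

# Replace &rsquo; only when it is not adjacent to an apostrophe, so the
# result can never merge with ''italic'' or '''bold''' delimiters.
SAFE_RSQUO = re.compile(r"(?<!')&rsquo;(?!')")


def replace_rsquo(text):
    """Convert isolated &rsquo; entities to a plain apostrophe."""
    return SAFE_RSQUO.sub("'", text)
```

With this guard, `''30 Rock&rsquo;''` is left unchanged, while `30 Rock&rsquo;s` becomes `30 Rock's`.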

URL is not a type

Status: {{fixed}}
Reported by: Jonatan Svensson Glad (talk) 12:35, 12 October 2019 (UTC)

What happens: |type=http://purl.org/dc/dcmitype/Text
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Rajiva_Wijesinha&diff=920872034&oldid=917626120
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2200 for URL being returned for the data type. AManWithNoPlan (talk) 14:12, 12 October 2019 (UTC)
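A check of this kind can be sketched as a small validator that rejects any candidate |type= value that looks like a URL. The names `URL_LIKE` and `usable_type` are hypothetical, and the pattern (an optional scheme followed by `//`) is an assumption about what counts as URL-like.

```python
import re

# Optional scheme such as "http:" followed by "//" at the start of the value.
URL_LIKE = re.compile(r"^\s*(?:[a-z][a-z0-9+.-]*:)?//", re.IGNORECASE)


def usable_type(value):
    """Return a cleaned type string, or None if it is empty or URL-like."""
    if value is None or URL_LIKE.match(value):
        return None
    cleaned = value.strip()
    return cleaned or None
```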

Volume cleanup

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 01:42, 13 October 2019 (UTC)

What should happen: [46]
We can't proceed until: Feedback from maintainers

Make sure not to remove simple |volume=V for the roman numeral 5. Headbomb {t · c · p · b} 01:42, 13 October 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2203 AManWithNoPlan (talk) 18:26, 13 October 2019 (UTC)

caps: Algebra i Analiz

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 19:02, 14 October 2019 (UTC)

What should happen: [47]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2204 AManWithNoPlan (talk) 23:42, 14 October 2019 (UTC)

Publisher vs newspaper

https://en.wikipedia.org/w/index.php?title=Rajiva_Wijesinha&diff=920872034&oldid=917626120

As part of that same edit, Citation bot removed wikimarkup from |publisher=. If that is all that the bot is doing, please stop. For example, this template has wiki markup in |publisher=:

{{cite news| author = Hiran H. Senewiratne| title = SCOPP to close down| publisher = ''Daily News Online''| date = 25 July 2009| url = http://www.dailynews.lk/2009/07/25/news33.asp| accessdate = 13 March 2010| archive-url = https://web.archive.org/web/20100130201226/http://www.dailynews.lk/2009/07/25/news33.asp| archive-date = 30 January 2010| url-status = dead| df = dmy-all}}

Daily News is a newspaper so its online presence should be treated as such. 'Fixing' the wiki markup error by the simple expedient of stripping the markup without also ensuring that the parameters in the template are used correctly only masks the underlying problem: newspaper name in |publisher=. The other wiki markup fixes are probably correct but Citation bot should not just strip markup as it appears that it does.

—Trappist the monk (talk) 12:52, 12 October 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2201 adds more newspapers to the publisher confusion list. AManWithNoPlan (talk) 14:12, 12 October 2019 (UTC)

Such a short list is PUBLISHERS_ARE_WORKS. In the code for Monkbot/task 14 I have a list of 1800+ newspapers (canonical names and redirects) all of which I have found in use at en.wiki. Perhaps it would be best to not attempt to make these kinds of fixes.

—Trappist the monk (talk) 15:46, 12 October 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2205 AManWithNoPlan (talk) 14:54, 15 October 2019 (UTC)

Missed a jstor cleanup

Status: {{notabug}} — not going to code for every typo
Reported by: Headbomb {t · c · p · b} 17:43, 15 October 2019 (UTC)

What should happen: [48]
We can't proceed until: Feedback from maintainers

we don’t handle jstor.or AManWithNoPlan (talk) 21:53, 15 October 2019 (UTC)

Well look at that. I completely missed the typo. Headbomb {t · c · p · b} 02:03, 16 October 2019 (UTC)

Question about handles

I'm building a list of various handle links, e.g.

What do you need to know to implement hdl conversion? Do you need to know all root paths

domains? Or just

http://digilib.gmu.edu/handle/...

or even just

http://digilib.gmu.edu/...

? Headbomb {t · c · p · b} 07:24, 27 June 2019 (UTC)

Also does knowing http vs https matter? Headbomb {t · c · p · b} 07:25, 27 June 2019 (UTC)

http and https is irrelevant. Right now, each and every URL path is specific. I should change it to be hosts and paths separate. Hosts is probably enough, unless you find a new file path beyond the usual suspects. Please verify each host actually works though; http://oasis.postech.ac.kr/handle/2014.oak/9965 is not a handle 🙄. AManWithNoPlan (talk) 13:45, 27 June 2019 (UTC)

@AManWithNoPlan:, well, I'm building a massive list with the help of others (e.g. [49]), so I want to know what's the most useful format. Right now, if I have something like

I'll eliminate things that only differ after the /handle/ part, and have something like

and currently have 2169 such paths. Which I could reduce to (after checking that they indeed work inside a {{hdl}})

But I was wondering if there was a way to trim that down further to something more manageable/less redundant. Headbomb {t · c · p · b} 17:05, 27 June 2019 (UTC)

While it is true that some of them probably do not have all these possibilities, I doubt that we would run into a case where http://digilib.gmu.edu/dspace/handle/ works, but http://digilib.gmu.edu/bitstream/handle/ is not a handle but something else. So, what I need are three lists:

Protocol: http and https (short list)
Host names (HUGE list)
Suffix list (/handle/, /bitstream/handle/, ....) (medium sized list).

The code can then accept and convert any combination. AManWithNoPlan (talk) 17:22, 27 June 2019 (UTC)

That works. Headbomb {t · c · p · b} 17:29, 27 June 2019 (UTC)

The easy stuff

Protocols: https*

Suffix:\/(dspace|dspace-law|jspui|repository|xmlui)?(\/?bitstream\/)?handle\/

Going to build the host names list. It's in the ballpark of 1228 domains. Headbomb {t · c · p · b} 17:55, 27 June 2019 (UTC)
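The three-list approach described above (protocol, host names, path suffix) can be sketched as follows. This is not the bot's implementation: the host set contains only the one example host from this thread, the suffix pattern is a restructured variant of the one posted above (written so that a path like /dspace/handle/ also matches), and the handle in the usage note is a made-up value. `KNOWN_HOSTS`, `HANDLE_URL` and `extract_handle` are hypothetical names.

```python
import re

# Placeholder host list; the real list discussed here is far longer.
KNOWN_HOSTS = {"digilib.gmu.edu"}

HANDLE_URL = re.compile(
    r"^https?://(?P<host>[^/]+)"                          # protocol and host
    r"(?:/(?:dspace|dspace-law|jspui|repository|xmlui))?"  # optional app prefix
    r"(?:/bitstream)?/handle/"                             # optional bitstream
    r"(?P<handle>[^/\s]+/[^/\s?#]+)"                       # prefix/suffix pair
    r"(?:[/?#].*)?$",                                      # trailing file/query
    re.IGNORECASE,
)


def extract_handle(url, hosts=KNOWN_HOSTS):
    """Return the handle for a recognized URL, else None."""
    match = HANDLE_URL.match(url)
    if match is None or match.group("host").lower() not in hosts:
        return None
    return match.group("handle")
```

For instance, `extract_handle("http://digilib.gmu.edu/dspace/handle/1920/12345")` returns `"1920/12345"`, which could then be placed in a {{hdl}} template. The sketch assumes a handle has exactly one slash, so anything after the second path segment is treated as a file name.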

currently we use a single Regex. I will need to change that. I already have a plan for some simple fast code. AManWithNoPlan (talk) 18:58, 27 June 2019 (UTC)

Code written, now for testing. AManWithNoPlan (talk) 21:19, 27 June 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1856 AManWithNoPlan (talk) 21:20, 27 June 2019 (UTC)

More https://github.com/ms609/citation-bot/pull/1857 AManWithNoPlan (talk) 23:37, 27 June 2019 (UTC)

when you have a host list post the link. AManWithNoPlan (talk) 03:49, 28 June 2019 (UTC)

A preview is in User:Headbomb/Sandbox. User:Betacommand will run a script to see which handle links resolve when put into a {{hdl}}. I'll then be able to give you a list of domains that could be converted. It likely won't cover everything, but it'll probably cover 95%+ of cases. Headbomb {t · c · p · b} 03:56, 28 June 2019 (UTC)

Headbomb Got a final list yet? AManWithNoPlan (talk) 15:31, 3 July 2019 (UTC)

Still chugging at it. The list of HDL urls that didn't work needs manual review still, because some of the servers were only temporarily down and was not in the most convenient of formats. Should have it by the end of the week though. Headbomb {t · c · p · b} 15:47, 3 July 2019 (UTC)

Headbomb Got a final list yet? AManWithNoPlan (talk) 14:27, 19 July 2019 (UTC)

Still working on it. Not forgotten though. I was travelling for a while, then had computer issues (dead PSU) which prevented me from working on it. Hoping to have it done this weekend. Headbomb {t · c · p · b} 16:08, 19 July 2019 (UTC)

Headbomb any progress AManWithNoPlan (talk) 17:28, 16 August 2019 (UTC)

It's still on the to-do list. Headbomb {t · c · p · b} 22:30, 24 August 2019 (UTC)

the year (fiscal) is almost done. AManWithNoPlan (talk) 22:07, 29 September 2019 (UTC)

Headbomb Cough Cough.... AManWithNoPlan (talk) 19:52, 11 October 2019 (UTC)

Flag as {{notabug}} to archive discussion and split off from discussion of getting data. AManWithNoPlan (talk) 19:03, 16 October 2019 (UTC)

Don't change case if linked name is correct/not a redirect

Status: {{fixed}}
Reported by: Jonatan Svensson Glad (talk) 22:18, 16 October 2019 (UTC)

What happens: Bot replaced |journal=[[Montana The Magazine of Western History]] with |journal=[[Montana the Magazine of Western History]]
What should happen: Do not change case of journal name if it is linked to an article with correct case (if not a redirect)
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Flathead_Lake&diff=prev&oldid=921636148
We can't proceed until: Feedback from maintainers

The behaviour is correct, we use title case, regardless of whatever style the publication uses. The article should be moved to either Montana: The Magazine of Western History or Montana (magazine), per the usual convention for magazine titles. Headbomb {t · c · p · b} 23:23, 16 October 2019 (UTC)

No matter our naming standards, if a magazine is linked and is named a certain way on Wikipedia (there may have been discussions about specific articles), we should not change the actual link. We could change to |journal=[[Montana The Magazine of Western History|Montana the Magazine of Western History]] (adding a pipe to the link) in order to not break links, in case there isn't a redirect for that new capitalization. Jonatan Svensson Glad (talk) 10:35, 17 October 2019 (UTC)

The link wasn't broken. Headbomb {t · c · p · b} 13:33, 17 October 2019 (UTC)

In this case no, but we really should have to do a patch-work to see if something breaks or not. Jonatan Svensson Glad (talk) 13:58, 17 October 2019 (UTC)

This doesn't break things any more than changing an unlinked 'Journal Of Foobar' to 'Journal of Foobar' breaks things. If it did, then a redirect is missing somewhere / a page is located at the wrong title, and it would get picked up by WP:JCW/Miscapitalisations. Headbomb {t · c · p · b} 18:19, 17 October 2019 (UTC)

While that is true, in revision 1 the link works, and in revision 2 a bot has changed the link to a red link, breaking it. That is neither what the bot is intended to do (break things) nor accepted bot behavior (even if common sense would accept it). I just feel it is not OK for a bot to change a working link to a possibly non-working link. Jonatan Svensson Glad (talk) 18:33, 17 October 2019 (UTC)

The second revision had a working link, not a broken one. Headbomb {t · c · p · b} 18:43, 17 October 2019 (UTC)

This time. Jonatan Svensson Glad (talk) 20:24, 17 October 2019 (UTC)

Don't get title from dead URLs

Status: {{fixed}}
Reported by: Jonatan Svensson Glad (talk) 21:04, 17 October 2019 (UTC)

What happens: The bot fetched the title of http://www.arkive.org/sandy-dogfish/scyliorhinus-canicula/ to be |title=Arkive closure. However, that page is dead and has both an |archive-url= and |url-status=dead
What should happen: The bot should not check the original URL for a title, if that URL has been marked as dead
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Small-spotted_catshark&diff=921779637&oldid=917930836
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2210 AManWithNoPlan (talk) 22:18, 17 October 2019 (UTC)
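The requested behaviour amounts to a guard before any title lookup. A minimal sketch, assuming template parameters are available as a dict and that only |url-status=dead needs checking; `may_fetch_title` is a hypothetical name and the linked pull request may work differently.

```python
def may_fetch_title(params):
    """Return True if the original URL may be queried for a title."""
    status = params.get("url-status", "").strip().lower()
    if status == "dead":
        # The live page is gone or replaced; its title would be misleading.
        return False
    return params.get("url", "").strip() != ""
```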

handle Methods in Molecular Biology better

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 11:52, 14 October 2019 (UTC)

What happens: See bot edit, which has both |journal=Methods in Molecular Biology (Clifton, N.J.) and |series=Methods in Molecular Biology
We can't proceed until: Feedback from maintainers

This is a book series, so should be converted to a cite book, with |series=, and drop |journal=. See whatever you are doing with Methods in Enzymology for reference. Headbomb {t · c · p · b} 11:52, 14 October 2019 (UTC)

It works better after stripping (Clifton, N.J.) from |journal=, but the conversion to cite book isn't complete [50]. Running again converts to cite book [51]. At this point however, it doesn't add chapter/title correctly [52]. Headbomb {t · c · p · b} 11:57, 14 October 2019 (UTC)

I will have to write special code. AManWithNoPlan (talk) 23:29, 17 October 2019 (UTC)

https://github.com/ms609/citation-bot/pull/2212 AManWithNoPlan (talk) 14:45, 18 October 2019 (UTC)

please look and flag as {{fixed}} or point out more issues. AManWithNoPlan (talk) 16:01, 18 October 2019 (UTC)

caps: eGEMs

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 02:48, 23 October 2019 (UTC)

What should happen: [53]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2213 AManWithNoPlan (talk) 14:44, 25 October 2019 (UTC)

Also remove empty location/place from cite journal

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 15:24, 23 October 2019 (UTC)

What happens: [54]
What should happen: [55]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2214 AManWithNoPlan (talk) 14:50, 25 October 2019 (UTC)

Crossref search truncates name

Status: {{notabug}} -- bad crossref data, which is now fixed
Reported by: Jonatan Svensson Glad (talk) 23:52, 25 October 2019 (UTC)

What happens: It changed from |title=Haunted Travelogue: Hometowns, Ghost Towns, and Memories of War on JSTOR to |title=Haunted Travelogue: Hometowns, Ghost Towns, and Memories of
What should happen: the word War should not have been removed
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Shigeru_Mizuki&diff=prev&oldid=923042670
We can't proceed until: Feedback from maintainers

Actually it is bad crossref data for doi:10.1353/mec.0.0026 or perhaps bad formatted response AManWithNoPlan (talk) 01:27, 26 October 2019 (UTC)

Hyphens and dashes and accents

I notice lots of edits like this one assisted by Citation bot have created date ranges in titles (e.g. title=Arthur Erdelyi. 2 October 1908-12 December 1977) with unspaced hyphens where the source has a spaced hyphen and Wikipedia style would be to use a spaced en dash. If this hasn't been fixed in recent years, maybe we can work on it. I have no idea what it takes, but will help as I can; I've been fixing a ton of these by hand. That particular example also dropped the accent from Erdélyi; is that expected? Dicklyon (talk) 03:20, 26 October 2019 (UTC)

Those were, I believe, simple imports of the various {{cite doi}} subtemplates. A more recent diff would be better here. Headbomb {t · c · p · b} 03:30, 26 October 2019 (UTC)

If you're saying this is a thing of the past only, I'm happy. If I find a newer one, I'll be back. Dicklyon (talk) 04:33, 26 October 2019 (UTC)

{{notabug}} or {{fixed}}. Impossible to tell which since the meta-data gets better over time and the bot gets better over time. AManWithNoPlan (talk) 11:14, 26 October 2019 (UTC)

Caps: Off

Status: {{fixed}}
Reported by: Headbomb {t · c · p · b} 06:02, 26 October 2019 (UTC)

What should happen: [56]
We can't proceed until: Feedback from maintainers

https://github.com/ms609/citation-bot/pull/2217 AManWithNoPlan (talk) 11:20, 26 October 2019 (UTC)
