Wikipedia:Bots/Requests for approval/KadaneBot 3
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Kadane (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 16:10, Tuesday, March 19, 2019 (UTC)
Automatic, Supervised, or Manual: automatic
Programming language(s): Python
Source code available: Not published yet
Function overview: Tags redirects with {{R to disambiguation page}}, {{R from unnecessary disambiguation}}, and {{R from incomplete disambiguation}} if they meet the criteria described in the function details.
Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Tag_with_Template:R_from_unnecessary_disambiguation
Edit period(s): Monthly
Estimated number of pages affected: ~56,417 first run
Exclusion compliant (Yes/No): No
Already has a bot flag (Yes/No): Yes
Function details:
Note: This BRFA only covers the functionality mentioned in Case 2. Case 1 and Case 3 have been stricken.
Case 1:
If a redirect exists
Foo (bar) -> Foo
where bar does not equal disambiguation AND Foo is NOT a disambiguation page, then tag Foo (bar) with {{R from unnecessary disambiguation}}
Currently 39,963 articles fit this case
Case 2:
If a redirect exists
Foo (bar) -> Foo
where bar does not equal disambiguation AND Foo IS a disambiguation page, then tag Foo (bar) with {{R from incomplete disambiguation}}.
Currently 16,427 articles fit this case
Case 3:
If a redirect exists
Foo (disambiguation) -> Foo
AND Foo is a disambiguation page AND Foo (disambiguation) is NOT malformed, then tag Foo (disambiguation) with {{R to disambiguation page}}
Currently 27 articles fit this case
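The three cases above can be sketched as a single decision function. This is only an illustration under stated assumptions, not the bot's actual code: the function name `choose_template`, the regular expression, and the idea that the caller already knows whether the target is a disambiguation page are all hypothetical. Malformed-title checks are deliberately left out here.

```python
import re
from typing import Optional

# Hypothetical pattern: splits "Foo (bar)" into base "Foo" and qualifier "bar".
TITLE_RE = re.compile(r"^(?P<base>.+) \((?P<qualifier>[^()]+)\)$")


def choose_template(redirect_title: str, target_title: str,
                    target_is_dab: bool) -> Optional[str]:
    """Return the redirect template name for Cases 1-3, or None if no case applies."""
    match = TITLE_RE.match(redirect_title)
    # Only redirects of the form "Foo (bar) -> Foo" are considered.
    if match is None or match.group("base") != target_title:
        return None
    if match.group("qualifier") == "disambiguation":
        # Case 3: Foo (disambiguation) -> Foo, where Foo is a disambiguation page.
        return "R to disambiguation page" if target_is_dab else None
    if target_is_dab:
        # Case 2: the qualifier does not fully disambiguate.
        return "R from incomplete disambiguation"
    # Case 1: the qualifier was not needed at all.
    return "R from unnecessary disambiguation"
```

For example, `choose_template("Foo (bar)", "Foo", True)` selects the Case 2 template, while the same call with `False` selects the Case 1 template.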
The following functionality/logic exists for all 3 cases:
- If the redirect page is already tagged {{R with possibilities}}, {{R to disambiguation page}}, {{R from unnecessary disambiguation}}, or {{R from incomplete disambiguation}} skip
- If the redirect page is in Category:Printworthy redirects skip
- For Case 2: If these templates are present replace with {{R from incomplete disambiguation}}.
- If a redirect exists
Foo (disambiguation) -> Foo
and disambiguation is malformed, log to User:KadaneBot/Task3/Malformed disambiguations
- In any case that results in adding a redirect template to a page, if there will be 2 or more redirect templates, nest the tags in {{Redirect category shell}}.
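The skip rules and the shell-nesting rule above could look roughly like the following. This is a sketch that uses plain string handling rather than a wikitext parser; the helper names `should_skip` and `add_rcat` are hypothetical, template aliases are not resolved, and `add_rcat` assumes the page holds a redirect line followed only by flat redirect templates (no existing shell).

```python
import re

# Templates that cause the page to be skipped (compared in lowercase).
SKIP_TEMPLATES = {
    "r with possibilities",
    "r to disambiguation page",
    "r from unnecessary disambiguation",
    "r from incomplete disambiguation",
}

# Matches a simple, non-nested template call and captures its name.
TEMPLATE_RE = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\|[^{}]*)?\}\}")


def should_skip(wikitext: str, categories: set) -> bool:
    """True if the page is printworthy or already carries one of the listed tags."""
    if "Category:Printworthy redirects" in categories:
        return True
    names = [name.lower() for name in TEMPLATE_RE.findall(wikitext)]
    return any(name in SKIP_TEMPLATES for name in names)


def add_rcat(wikitext: str, new_template: str) -> str:
    """Append a redirect template, nesting in a shell when there are 2 or more."""
    lines = wikitext.strip().splitlines()
    redirect_line = lines[0]
    tags = [line.strip() for line in lines[1:] if line.strip()]
    tags.append("{{" + new_template + "}}")
    if len(tags) >= 2:
        body = "{{Redirect category shell|\n" + "\n".join(tags) + "\n}}"
    else:
        body = tags[0]
    return redirect_line + "\n\n" + body
```

A real run would more likely use a wikitext parser for this, since regular expressions do not cope with nested or aliased templates.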
Discussion
- A sample of 1000 edits the bot would make (under current functional details) along with the template it would add to the page is listed at User:KadaneBot/Sandbox Kadane (talk) 16:11, 19 March 2019 (UTC)[reply]
Comment @Kadane: The following should be tagged as {{R from incomplete disambiguation}} instead of {{R from unnecessary disambiguation}}
Those can be identified by the landing page being a disambiguation page.
This one should be skipped, or tagged with something else (investigating)
These ones should be skipped as malformed DAB pages (missing space, capital D), but collecting them so they can be RFD'd would be good.
- 212th Division(disambiguation) → 212th Division
- 2nd Avenue (Disambiguation) → 2nd Avenue
- A&B (Disambiguation) → A&B
Headbomb {t · c · p · b} 17:11, 19 March 2019 (UTC)[reply]
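One way to flag the malformed titles described in the comment above (missing space, capital D, doubled closing parenthesis) is to compare a loose "looks like a disambiguation suffix" test against a strict well-formed test. The function name and both patterns are hypothetical, and the sketch does not catch misspellings of "disambiguation".

```python
import re

# Strict form: "<title> (disambiguation)" with one space and lowercase "d".
WELL_FORMED_RE = re.compile(r"^.*\S \(disambiguation\)$")

# Loose form: anything that ends in a parenthesised "disambiguation",
# regardless of case, spacing, or extra closing parentheses.
LOOKS_LIKE_DAB_RE = re.compile(r"\(\s*disambiguation\s*\)+\s*$", re.IGNORECASE)


def is_malformed_dab_title(title: str) -> bool:
    """True if the title resembles a disambiguation title but is not well formed."""
    looks_like_dab = LOOKS_LIKE_DAB_RE.search(title) is not None
    well_formed = WELL_FORMED_RE.match(title) is not None
    return looks_like_dab and not well_formed
```

With these patterns, "212th Division(disambiguation)" and "2nd Avenue (Disambiguation)" are flagged, while a title such as "Foo (bar)" is not.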
- Okay I have updated the functional details of the bot to fix the cases you brought up. I will update the table of edits when I make it home. Kadane (talk) 19:23, 19 March 2019 (UTC)[reply]
- @Headbomb: I have uploaded new edits to User:KadaneBot/Sandbox. It contains 100 edits of each of the cases, with the exception of {{R to disambiguation page}} which only has 22 edits total. I have also included all of the malformed disambiguation pages (these will not be modified by the bot, just included in the log). Kadane (talk) 05:48, 20 March 2019 (UTC)[reply]
Better, although
- 02 (album) → 02
- 03 (album) → 03
- 1. Liga (football) → 1. Liga
- 118th Regiment of Foot (1761) → 118th Regiment of Foot
Should be tagged with {{R from incomplete disambiguation}} instead of {{R from unnecessary disambiguation}}. Headbomb {t · c · p · b} 09:31, 20 March 2019 (UTC)[reply]
- @Headbomb: There was an error in my CSV parsing of the database dump. I forgot to set the parameter quoting=csv.QUOTE_NONE, which resulted in some lines being skipped when the database query results were scanned. Because of this, some articles and disambiguation pages were being ignored. This is fixed now. I clicked through most of the cases and I can't find any errors. User:KadaneBot/Sandbox is updated. Kadane (talk) 15:17, 20 March 2019 (UTC)[reply]
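The effect of the missing parameter can be reproduced with a small, made-up dump. The sketch assumes a tab-separated file with id, redirect title, and target columns; the rows are invented for illustration. With the default quoting, a title that starts with an unmatched double quote opens a quoted field that swallows the following lines, so fewer rows come back than were written.

```python
import csv
import io

# Hypothetical tab-separated dump; the second title starts with a stray quote.
DUMP = (
    "1\tFoo (bar)\tFoo\n"
    "2\t\"Quoted (song)\tQuoted\n"
    "3\tBaz (film)\tBaz\n"
)


def read_rows(text: str, **reader_options) -> list:
    """Parse the dump with csv.reader and return all rows as lists of fields."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", **reader_options))


# Default quoting: the stray quote starts a quoted field that runs to end of input.
default_rows = read_rows(DUMP)

# QUOTE_NONE: quote characters are treated as ordinary data, one row per line.
raw_rows = read_rows(DUMP, quoting=csv.QUOTE_NONE)

print(len(default_rows), len(raw_rows))
```

Page titles can legitimately contain double quotes, which is why treating them as plain data is the safer choice for this kind of dump.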
- Of all cases, the following aren't really disambiguation pages.
- .hack//G.U. (Volume 1: Rebirth) → .hack//G.U.
- 112th Special Operations Signal Battalion (Airborne) → 112th Special Operations Signal Battalion
- 104th Regiment Royal Artillery (Volunteers) → 104th Regiment Royal Artillery
- 105th Regiment Royal Artillery (Volunteers) → 105th Regiment Royal Artillery
Maybe a full list should be created so we can purge all cases that shouldn't be tagged. Everything else looks fine though. Headbomb {t · c · p · b} 18:03, 20 March 2019 (UTC)[reply]
- To save time, that full list to review could exclude things that end in
\s\(.* (album|song|single|EP|soundtrack|network|channel|episode|series|film|journal|magazine|website|company|publisher|newspaper|company|station|decade|numeral|number|game|novel|book|gene)\)
since those are safe. Headbomb {t · c · p · b} 21:02, 20 March 2019 (UTC)[reply]
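A sketch of how that exclusion could be applied to a review list, using the pattern exactly as quoted with an end-of-string anchor added (the anchor and the function name are assumptions):

```python
import re

# Pattern as quoted in the comment above, anchored to the end of the title.
SAFE_ENDING_RE = re.compile(
    r"\s\(.* (album|song|single|EP|soundtrack|network|channel|episode|series"
    r"|film|journal|magazine|website|company|publisher|newspaper|company"
    r"|station|decade|numeral|number|game|novel|book|gene)\)$"
)


def needs_review(titles: list) -> list:
    """Keep only the titles that do not end in one of the 'safe' qualifiers."""
    return [title for title in titles if not SAFE_ENDING_RE.search(title)]
```

Note that, as written, the pattern requires a space before the keyword, so a title like "Foo (2001 film)" is excluded from review but a bare qualifier such as "02 (album)" is not.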
- See User:KadaneBot/Task3/Case 1 for {{R from unnecessary disambiguation}}
- See User:KadaneBot/Task3/Case 2 for {{R from incomplete disambiguation}}
- See User:KadaneBot/Task3/Case 3 for {{R to disambiguation}}
Kadane (talk) 21:52, 20 March 2019 (UTC)[reply]
- Case 3 are all fine, I'll review Case 1 and 2. Headbomb {t · c · p · b} 22:09, 20 March 2019 (UTC)[reply]
- Actually Always(song)) and a few others with )) are malformed. Headbomb {t · c · p · b} 22:12, 20 March 2019 (UTC)[reply]
So are
Extended content
Headbomb {t · c · p · b} 22:19, 20 March 2019 (UTC)[reply]
- Ah I was under the impression that we only checked malformed disambig on case 3 (when name ends with (disambiguation)). Updated the logic to check for malformed disambigs for all cases. Kadane (talk) 22:37, 20 March 2019 (UTC)[reply]
There are actually a few more, which I've sent to RFD.
Headbomb {t · c · p · b} 22:49, 20 March 2019 (UTC)[reply]
@Kadane:, actually could you break User:KadaneBot/Task3/Case 1 in sections of 100 KB tops? Those pages are pretty slow to load/edit (I have scripts that classify type of links, which slow down these pages considerably). Headbomb {t · c · p · b} 23:06, 20 March 2019 (UTC)[reply]
- Done @Headbomb: Also, I am catching misspellings of "disambiguation" as well as other words appearing next to "disambiguation" inside the parentheses. If there are any other misspellings, they should probably be excluded manually unless there is a pattern. Kadane (talk) 23:15, 20 March 2019 (UTC)[reply]
Could you also break down redirects into 'species', e.g. all those ending with \s\(.*album\) into a subpage (or section), all those ending with \s\(.*song\) into another, and so on (and everything else considered "Other")? At least for endings in
- \d (i.e. ends with digits, like Typhoon Haikui (2012)); album; AM; band; book; channel; comics; company; company; cricketer; decade; district; EP; episode; film; FM; footballer; game; gene; Germany; German Empire; journal; magazine; name; network; newspaper; novel; number; numeral; politician; publisher; series; show; single; song; soundtrack; station; United States; video; website
All case insensitive. Headbomb {t · c · p · b} 23:18, 20 March 2019 (UTC)[reply]
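The grouping requested above might be sketched as follows. The function name and the bucket labels "digits" and "other" are hypothetical. Unlike the looser patterns in the request, this sketch only matches a keyword when it is the whole qualifier or its last word, an added assumption that avoids cases like "AM" matching "(program)".

```python
import re

# Endings listed in the request; matching is case insensitive.
SPECIES = [
    "album", "AM", "band", "book", "channel", "comics", "company", "cricketer",
    "decade", "district", "EP", "episode", "film", "FM", "footballer", "game",
    "gene", "Germany", "German Empire", "journal", "magazine", "name",
    "network", "newspaper", "novel", "number", "numeral", "politician",
    "publisher", "series", "show", "single", "song", "soundtrack", "station",
    "United States", "video", "website",
]

# Captures the text inside the final pair of parentheses.
QUALIFIER_RE = re.compile(r"\(([^()]*)\)$")


def species_of(title: str) -> str:
    """Return the 'species' bucket for a redirect title."""
    match = QUALIFIER_RE.search(title)
    if match is None:
        return "other"
    qualifier = match.group(1).lower()
    if re.search(r"\d$", qualifier):
        return "digits"  # e.g. "Typhoon Haikui (2012)"
    # Try longer keywords first so multi-word endings win over shorter ones.
    for species in sorted(SPECIES, key=len, reverse=True):
        keyword = species.lower()
        if qualifier == keyword or qualifier.endswith(" " + keyword):
            return species
    return "other"
```

Grouping a full list is then a matter of building a dictionary keyed by `species_of(title)`.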
- @Kadane: and could you also put the target page in those lists? Headbomb {t · c · p · b} 23:21, 20 March 2019 (UTC)[reply]
- I am on my way to class but I can do that in a couple hours. Kadane (talk) 23:23, 20 March 2019 (UTC)[reply]
Okay all edits have been sorted by 'species' and a list of all pages can be found here. @Headbomb: Kadane (talk) 00:09, 23 March 2019 (UTC)[reply]
Approved for trial. Please provide a link to the relevant contributions and/or diffs when the trial is complete. Let's start with everything in User:KadaneBot/Task3/Edits/other/Case_3. This is something that could safely be automated. Make sure to run on the most recent version of the pages, since things may have been updated. Headbomb {t · c · p · b} 00:11, 23 March 2019 (UTC)[reply]
- Headbomb - It turns out Case 3 is already taken care of by RussBot, which ran through and tagged every article in Case 3 with {{R to disambiguation}}. I could run another database query to see if there are any cases that RussBot has missed, but a task for Case 3 seems redundant. What do you think?
- Also I made 1 trial edit[1] which resulted in an error because of a misplaced quotation mark in my code. Going forward it will check (correctly) to see if the category has been added since the last database scan. Kadane (talk) 01:20, 23 March 2019 (UTC)[reply]
- If Case 3 is taken care of by RussBot, then let's leave it to RussBot. We can revisit this if RussBot goes dead. Let's trial case 2 on everything in User:KadaneBot/Task3/Edits/newspaper/Case 2 then. Headbomb {t · c · p · b} 01:23, 23 March 2019 (UTC)[reply]
I have completed the trial edits [2] [3] [4]. The rest were false positives. I am hesitant to mark the trial as done with only 3 edits.
May I suggest trialing either User:KadaneBot/Task3/Edits/cricketer/Case 2 (135 edits), User:KadaneBot/Task3/Edits/footballer/Case 2 (60 edits), or User:KadaneBot/Task3/Edits/politician/Case 2 (40 edits)? Kadane (talk) 01:47, 23 March 2019 (UTC)[reply]
- I picked that category on purpose to see how it would handle those cases and not blow everything up. Side note [5]/[6]/[7] this is a much much better format. And while you don't have to do this, when making edits, you might as well add [8] if you find a #Whatever in the redirect. Headbomb {t · c · p · b} 01:51, 23 March 2019 (UTC)[reply]
- For a follow up trial, you can do 25 edits in User:KadaneBot/Task3/Edits/other/Case_2/1. Headbomb {t · c · p · b} 01:59, 23 March 2019 (UTC)[reply]
- Trial complete. All edits are here [9]. There was one error [10], which added {{R from section}} when it shouldn't have. I fixed this and subsequently tested it [11]. The whitespace looks off, but that is because the template {{Redirect category shell}} already existed and the whitespace was already malformed from my removal. The bot also edited from another 'species' [12], [13], [14], [15], and [16]. This was operator error. My database isn't structured by species, and the view and edit code are separate. I had to introduce new code to edit only the 'other' species, since there is no specific regex for an article that fits into other. Kadane (talk) 03:10, 23 March 2019 (UTC)[reply]
- You can do the rest of User:KadaneBot/Task3/Edits/other/Case_2/1 to see if all the kinks are worked out. Headbomb {t · c · p · b} 03:14, 23 March 2019 (UTC)[reply]
Small whitespace issues: [17], [18]. Headbomb {t · c · p · b} 04:55, 23 March 2019 (UTC)[reply]
- Dupe disambiguation category: [19], [20]. Also [21]. Headbomb {t · c · p · b} 05:00, 23 March 2019 (UTC)[reply]
- Okay, I have implemented logic to fix everything you have put here so far except for the whitespace issue. I am not quite sure how to fix that using MWParserFromHell. It only affects a small number of pages. If this is something that needs to be fixed, I will figure something out in the coming days. Kadane (talk) 05:21, 23 March 2019 (UTC)[reply]
- One more: [26] (see all aliases) Headbomb {t · c · p · b} 05:23, 23 March 2019 (UTC)[reply]
For the whitespace issue, I think you can have something similar to \}\}\n+\{\{ → }}\n{{ and \n\n+ → \n\n. Headbomb {t · c · p · b} 05:29, 23 March 2019 (UTC)[reply]
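Applied with Python's `re` module, the two suggested substitutions could look like this. The function name is hypothetical, and the sketch runs on the raw page text after any template edits have been made.

```python
import re


def normalize_whitespace(wikitext: str) -> str:
    """Apply the two substitutions suggested above, in order."""
    # Collapse any run of newlines between adjacent templates to a single one.
    text = re.sub(r"\}\}\n+\{\{", "}}\n{{", wikitext)
    # Collapse three or more consecutive newlines elsewhere down to two.
    text = re.sub(r"\n\n+", "\n\n", text)
    return text


example = (
    "#REDIRECT [[Foo]]\n\n\n\n"
    "{{Redirect category shell|\n"
    "{{R from move}}\n\n\n"
    "{{R from incomplete disambiguation}}\n}}"
)
print(normalize_whitespace(example))
```

The order matters only slightly: running the template rule first means the general rule never has to deal with blank lines between templates.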
- @Kadane: if you're ready to continue trial, you can tackle User:KadaneBot/Task3/Edits/other/Case_2/3. Headbomb {t · c · p · b} 23:43, 27 March 2019 (UTC)[reply]
- Okay everything is ready. I have several deadlines in the coming days and will run the trial when real life permits. Should be no later than Saturday 6th and I am hoping that it's much earlier than that. Kadane (talk) 01:16, 28 March 2019 (UTC)[reply]
- Trial complete. @Headbomb: Here are the edits from the bot trial. I started the trial off on an old version of the source, which resulted in an error in the first 5 edits. I reverted these edits, restarted, and the bot worked as expected ([27]). Also, during the trial I realized that there may be an issue with [28] and [29]. The bot will now skip pages in Category:Printworthy redirects or containing the template {{R with possibilities}}. I have updated the functional details. Kadane (talk) 00:08, 15 April 2019 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.