Wiktionary:Todo

This page is for cleanup jobs. Request jobs are at Wiktionary:Task lists.

This page lists cleanup requests affecting multiple entries. These may include updating templates, categories or generic entry structure, but not specific terms, which should be tagged with {{rfc}} and listed at WT:RFC. This way, tasks that were previously scattered across discussion and user pages are grouped in one place where they are easier to find.

Frequently updated todo lists

Todo Lists project
Please see WT:Todo/Lists for a set of regularly updated cleanup lists.
JeffDoozan's cleanup lists
See User:JeffDoozan for lists of entries with formatting or layout errors.

Regular tasks

In this section, you will find relatively easy cleanup tasks.

Categories

Updated live.

Special pages

Updated a couple of times a week.

Todo lists

Updated weekly.

Semi-regular tasks

Usually dump-analyzed:

  • Unhelpful abbreviations — These should use the full term.
  • Occasionally, soft hyphens or other invisible/zero-width characters (­|​|‌|‍) sneak into the content of entries or even the pagenames; the soft hyphens should be removed; the other characters should be discussed.
  • People sometimes type {[, }] etc when they mean {{ / }}. It is useful to periodically scan dumps for instances of this. Here is some regex: ([^\[\{]\[\{[^\[\{]|[^\[\{]\{\[[^\[\{]|[^\]\}]\]\}[^\]\}]|[^\]\}]\}\][^\]\}]). Simply searching for ]} will not work, because there are many valid instances of it, e.g. {{m|en|a [[link]]}}.
  • Every few months, check for instances of the common but nonstandard headers "Alternative form", "Alternative spelling" and "Alternative spellings" (which should be "Alternative forms") and "Usage note" (which should be "Usage notes"). Many other nonstandard headers exist, but none are as common as those. Also, no L1 headers should exist in the main namespace (language headers should always be L2, and all other headers should always be L3 or more). See User:Erutuon/mainspace headers for a full list of non-language headers and User:Erutuon/mainspace headers/possibly incorrect for a list of possibly incorrect headers.
  • Check for entries using modifier letters or deprecated IPA characters.
  • Search for (using the site search function) and fix "Etymology 2" -"Etymology 1" and other cases of higher-number etymologies without the full complement of lower-number etymologies.
  • Check for misindented quotations (pages with a line containing {{quote- but not starting with #* or ##*)
  • Check for entries that use Template:sense, Template:a, manual formatting, etc. instead of {{lb}}
  • People, and some other online dictionaries, write /e/ where the actual IPA symbol is /ɛ/, e.g. [1], [2], [3]
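
Several of the dump checks listed above are mechanical enough to script. The following is a minimal Python sketch, not the tooling actually used to build these lists. The function names are hypothetical, the invisible-character set assumes the four characters meant are U+00AD, U+200B, U+200C and U+200D, and the brace pattern is the regex quoted in the list.

```python
import re

# Assumed set: soft hyphen, zero-width space, zero-width non-joiner, zero-width joiner.
INVISIBLE = re.compile("[\u00ad\u200b\u200c\u200d]")

# The regex quoted in the list: "[{", "{[", "]}" or "}]" that is not part of a
# longer run of brackets or braces (so "]]}}" in a valid template is skipped).
BRACE_TYPO = re.compile(
    r"([^\[\{]\[\{[^\[\{]|[^\[\{]\{\[[^\[\{]|[^\]\}]\]\}[^\]\}]|[^\]\}]\}\][^\]\}])"
)


def find_invisible(text):
    """Return (offset, code point) pairs for each invisible character found."""
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in INVISIBLE.finditer(text)]


def find_brace_typos(text):
    """Return each suspicious bracket/brace pair with one character of context."""
    return [m.group() for m in BRACE_TYPO.finditer(text)]


def misindented_quotes(text):
    """Return lines that contain {{quote- but do not start with #* or ##*."""
    return [
        line
        for line in text.splitlines()
        if "{{quote-" in line and not line.startswith(("#*", "##*"))
    ]
```

Because the brace pattern needs one character of context on each side, a typo at the very start or end of the scanned text is not reported; scanning whole pages rather than fragments avoids most of that.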

To be monitored manually:

Also:

Useful search queries

If the search gives a warning (and even if it doesn't!), see Help:CirrusSearch for ways of making the search much less demanding on the servers and much more likely to provide a complete list of problem entries.

All subpages

Subpages of Wiktionary:Todo :

2013

> This is the list of entries, as of the last database dump, that contain Slovene translations with the gender m ("masculine"). They should most likely be changed to use either m-an (+ "animate") or m-in (+ "inanimate"), since that distinction has grammatical consequences in Slovene. (?)

RuakhTALK 14:34, 11 September 2013 (UTC)

2015

In many cases, these are unnecessary and cause problems. - -sche (discuss) 18:16, 21 January 2015 (UTC)

What are LTR marks and how should one improve the entry? --A230rjfowe (talk) 21:00, 15 July 2015 (UTC)
What are RTL marks and how should one improve the entry? --A230rjfowe (talk) 21:00, 15 July 2015 (UTC)
They are invisible characters that otherwise behave like strongly left-to-right characters (such as Latin letters) or strongly right-to-left characters (such as Arabic letters), in that they influence the direction of surrounding characters that do not have a defined text direction. So they are sometimes used to change the direction of characters in text. For instance, on Wiktionary, where text direction is generally left-to-right, punctuation characters can be forced to render right-to-left by sandwiching them between Arabic letters and a right-to-left mark.
But CSS should be used to change text direction instead, whenever possible. On Wiktionary, we do this by adding classes that have the correct CSS properties: for instance, enclosing Arabic text in class="Arab", which has the CSS direction: rtl; unicode-bidi: embed; applied to it in MediaWiki:Common.css. This is done automatically by most linking templates.
You can read more in w:Left-to-right mark and w:Right-to-left mark and w:Bidirectional text. — Eru·tuon 16:50, 16 October 2019 (UTC)
Regenerated. - -sche (discuss) 14:48, 19 February 2018 (UTC)
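
For anyone checking an entry by hand, the marks discussed above are U+200E (left-to-right mark) and U+200F (right-to-left mark). Here is a small Python sketch that reports where they occur in a piece of wikitext. The function name is hypothetical, and whether a given mark should be removed still needs a human decision.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def find_direction_marks(text):
    """Return (offset, Unicode name) for every LRM or RLM in the text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in (LRM, RLM)
    ]


# Both marks are invisible but carry a strong direction, which is why they
# influence neighbouring characters that have no direction of their own.
assert unicodedata.bidirectional(LRM) == "L"
assert unicodedata.bidirectional(RLM) == "R"
```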

A partial list of pages where at least one language section simply states, in plain text, without using {{etyl}}, that it derives from German, French, Latin, Greek, Ancient Greek, Chinese or Spanish. - -sche (discuss) 17:43, 25 January 2015 (UTC)

Regenerated (1469 entries). - -sche (discuss) 14:44, 19 February 2018 (UTC)

A list of entries which are labelled as being Canadian, or American, but not both. It is likely that many should in fact have both labels. See Wiktionary:Beer_parlour/2015/March#North_American_English_vs_Canadian_and_American_English for a bit of background. - -sche (discuss) 05:00, 7 March 2015 (UTC)

Erroneous Greek characters

Any place that the character ϕ is used in place of φ or ϑ in place of θ in a string that is marked as being grc or el should be listed so that an editor can look them over and fix mistakes. I just found one lying around in a {{term}}, which made me think that these shouldn't be overly hard to find. —Μετάknowledgediscuss/deeds 21:01, 12 May 2015 (UTC)

@Metaknowledge: Never knew this page existed. Ironically I came across this while searching for incorrect uses of ϕ. For future reference, here is the search for ϕ and here is the search for ϑ (other incorrect characters are ϖ ϛ ϰ ϱ ϐ ϵ ϲ ϗ ȣ; there may be more). --WikiTiki89 13:20, 21 April 2017 (UTC)
If nothing has been done about this, I can make Module:script utilities search for these characters when it tags text, and add a tracking template or a category. — Eru·tuon 23:50, 20 May 2017 (UTC)
@Metaknowledge, Wikitiki89: Done. Eru·tuon 00:02, 21 May 2017 (UTC)
@Erutuon: It's never done, people will keep adding them. --WikiTiki89 15:03, 22 May 2017 (UTC)
Oh sorry, you were referring to having Module:script utilities search for them. It's not that nothing has been done, I went through and removed over a hundred of these. But again, people will keep adding them. --WikiTiki89 15:05, 22 May 2017 (UTC)
Right. I just found one in polypharmacy... 🙄 — Eru·tuon 18:14, 22 May 2017 (UTC)
User:Erutuon, should we add an actual cleanup category to entries using these? I just cleaned up hypophora, which had no indication on the page itself (that I noticed) of the problem (though someone who knew where the tracking template was could find the page). I'm going to ask at WT:GP if we could catch these with an edit filter. - -sche (discuss) 11:13, 12 November 2022 (UTC)
@-sche: Unfortunately that isn't a good idea because adding a category will cause changes in parsing in certain cases. We don't do that at all when language-tagging text at the moment and so language-tagged text can be used in cases where a link wouldn't be allowed. For instance, if language-tagged text is inside the text of a page link ([[some page|{{lang|grc|ϑ}}]]), adding a category link (the equivalent of [[some page|{{lang|grc|ϑ}}[[Category:Bad Ancient Greek text]]]]) next to it would break the page link. — Eru·tuon 21:28, 13 November 2022 (UTC)
As of a while ago, I implemented (with an IP's help) one filter which warns against a few of the most-wrong of the characters above, which has already helped some users to replace them before saving their edit, and another filter which silently tracks all of the characters. - -sche (discuss) 02:38, 2 August 2023 (UTC)
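
The character check discussed in this thread can be sketched in a few lines of Python. This is only an illustration, not the logic of Module:script utilities or of the edit filters mentioned above. The function names are hypothetical, and the caller is assumed to pass in only text already known to be tagged grc or el. Replacements are given for ϕ and ϑ alone, since those are the only two mappings stated in the thread.

```python
# Symbol variants and the ordinary letters the thread says they stand in for.
REPLACEMENTS = {
    "\u03d5": "\u03c6",  # ϕ (phi symbol) -> φ
    "\u03d1": "\u03b8",  # ϑ (theta symbol) -> θ
}

# The other characters listed in the thread: ϖ ϛ ϰ ϱ ϐ ϵ ϲ ϗ ȣ.
# No replacement is assumed for these; they are only reported.
OTHER_SUSPECTS = "\u03d6\u03db\u03f0\u03f1\u03d0\u03f5\u03f2\u03d7\u0223"

SUSPECTS = set(REPLACEMENTS) | set(OTHER_SUSPECTS)


def find_suspect_greek(text):
    """Return (offset, character) for each suspect character in Greek text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in SUSPECTS]


def fix_phi_theta(text):
    """Replace the phi and theta symbol variants with the ordinary letters."""
    return text.translate(str.maketrans(REPLACEMENTS))
```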

Not click characters

All over the dictionary, e.g. in the name and content of !nawas and in this translation, ! turns up for ǃ, and I wouldn't be surprised to find other substitutions for click consonants. The best way I can think of to find such uses is: create a list of all languages that use clicks (or, as a presumably easier-to-make approximation, a list of all Khoisan languages), then search a database dump for all translations, language sections, and {{m}}/{{l}}s of those languages that contain !. I've just cleaned up the few pages which misused ! in their pagenames (only 31 pages on Wiktionary used ! in their pagenames at all). - -sche (discuss) 18:42, 25 August 2015 (UTC)
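
The {{m}}/{{l}} part of the search described above might look roughly like the following Python sketch. The function name is hypothetical, the set of language codes has to be supplied by whoever compiles the list of click languages, and the code "naq" in the usage note is only illustrative. Translations and language sections would need separate handling.

```python
import re

CLICK = "\u01c3"  # ǃ LATIN LETTER RETROFLEX CLICK, which "!" is standing in for

# First two positional parameters of {{m|...}} or {{l|...}}: language code, term.
LINK_TEMPLATE = re.compile(r"\{\{(?:m|l)\|([^|}]+)\|([^|}]*)")


def exclamation_in_links(wikitext, click_langs):
    """Return (language code, term as written, suggested term) for each link
    in one of the given languages whose term contains an ASCII "!"."""
    hits = []
    for match in LINK_TEMPLATE.finditer(wikitext):
        lang, term = match.group(1), match.group(2)
        if lang in click_langs and "!" in term:
            hits.append((lang, term, term.replace("!", CLICK)))
    return hits
```

Called as exclamation_in_links(page_text, {"naq"}), it lists only links in the chosen languages, so an exclamation mark in an English link is ignored. The suggested term is a candidate for review, not an automatic fix.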

2017

Check IDs

As discussed at Wiktionary:Grease pit/2017/May § Adding ids to enable linking to headwords, we need to check for sense ids in {{senseid}} and the |id= parameter of headword templates that are on the same page and have the same language and have the same id string: that is, those that would create the exact same link when input into an entry linking template. Each sense id for a given language on a given page should be unique. — Eru·tuon 16:57, 19 May 2017 (UTC)
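
A minimal Python sketch of this uniqueness check follows. It makes a simplifying assumption: headword ids are only picked up from {{head|lang|...|id=...}}, so language-specific headword templates would need their own patterns. The function name is hypothetical.

```python
import re
from collections import Counter

# {{senseid|lang|id}}: language code, then the id string.
SENSEID = re.compile(r"\{\{senseid\|([^|}]+)\|([^|}]+)")

# {{head|lang|...|id=...}}: language code, any further parameters, then id=.
HEAD_ID = re.compile(r"\{\{head\|([^|}]+)(?:\|[^{}|]*)*?\|id=([^|}]+)")


def duplicate_ids(wikitext):
    """Return the (language, id) pairs that occur more than once on a page."""
    pairs = SENSEID.findall(wikitext) + HEAD_ID.findall(wikitext)
    counts = Counter((lang.strip(), ident.strip()) for lang, ident in pairs)
    return sorted(pair for pair, n in counts.items() if n > 1)
```

Pairs are keyed on language as well as id, so the same id string used in two different language sections is not reported.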

Usage note template naming

User:-sche/Usage note templates lists some usage-note templates which could be moved to fit our usual naming scheme, as described on the page and [5]. - -sche (discuss) 22:01, 26 May 2017 (UTC)

Possibly mislabeled affixes

Wiktionary:Todo/interfixes: These look like interfixes, but are labelled "prefixes" or "suffixes". - -sche (discuss) 19:57, 8 June 2017 (UTC)

Regenerated (per request on my talk page). Note that some, e.g. for Navajo, may be fine as they are. - -sche (discuss) 03:34, 15 February 2020 (UTC)

Pronunciation audio files

User:DerbethBot/Add manually: DerbethBot adds pronunciation files to entries, but some audio files need to be added manually. (See also User:DerbethBot for more info.) -- Curious (talk) 12:00, 11 June 2017 (UTC)

2018

Entries where label language does not match entry language. – Jberkel 00:01, 28 February 2018 (UTC)

Quite a few entries with usage notes like this are labelled {{lb|en|law}}, but are in fact in general use and not at all restricted to legal jargon (so the label should be removed). - -sche (discuss) 00:10, 23 December 2018 (UTC)

2022

You can help repair the broken links to Wikipedia, Wikispecies, Wikimedia Commons and Wikisource at the subpages of User:This, that and the other/broken interwiki links. For each page listed, one of the following three things should be done: (1) correct the spelling, pluralisation, lowercase/uppercase of the link, add a |lang= parameter etc., (2) remove the link template altogether if not appropriate, or (3) create a redirect on the other wiki (many redirects on other projects were valid but have since been deleted). This, that and the other (talk) 03:14, 2 February 2022 (UTC)

See above. 70.172.194.25 00:59, 1 April 2022 (UTC)

See the description on the subpage itself. 70.172.194.25 19:55, 10 April 2022 (UTC)

Invocations of templates where the first parameter is a language code, but it does not match the language header. Similar to the above, but captures a wider range of templates. This, that and the other (talk) 10:20, 7 May 2022 (UTC)
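
As a rough illustration of this kind of check (not the script that generated the list), the Python sketch below walks a page line by line, remembers the current L2 language header, and reports templates whose first parameter disagrees with it. The function name, the name-to-code mapping and the set of templates to inspect are all hypothetical inputs that the caller would have to supply.

```python
import re

# An L2 header such as "==English=="; L3 and deeper headers do not match.
L2_HEADER = re.compile(r"^==([^=].*?)==\s*$")

# Template name and first positional parameter, e.g. {{lb|en|...}}.
TEMPLATE = re.compile(r"\{\{([^|{}]+)\|([^|{}]+)")


def mismatched_language_codes(wikitext, name_to_code, templates):
    """Return (template, code used, code expected) for each mismatch."""
    current = None  # language code of the section being read
    hits = []
    for line in wikitext.splitlines():
        header = L2_HEADER.match(line)
        if header:
            current = name_to_code.get(header.group(1).strip())
            continue
        if current is None:
            continue  # before the first header, or language not in the mapping
        for name, code in TEMPLATE.findall(line):
            if name in templates and code != current:
                hits.append((name, code, current))
    return hits
```

Restricting the check to a chosen set of templates matters because linking templates such as {{m}} legitimately name other languages, for example in etymologies.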

To find compound terms not linked Dunderdool (talk) 21:39, 24 July 2022 (UTC)

Terms from Webster's 1913 dictionary

Thousands of them are at Category:Webster 1913 (and have been around since almost the beginning of Wiktionary!). Often only one or two terms in Webster's dictionary have not been assimilated and modernized into Wiktionary, sometimes more. GreyishWorm (talk) 17:51, 22 October 2022 (UTC)

>21,000 as of today. GreyishWorm (talk) 15:11, 12 November 2022 (UTC)
<17,000 Ñobody Elz (talk) 08:25, 5 June 2023 (UTC)
<16,000 Creeps like you (talk) 12:29, 2 July 2023 (UTC)
<15,000 Worm spail (talk) 17:09, 28 August 2023 (UTC)
<14,000 Denazz (talk) 08:07, 20 December 2023 (UTC)
<13,000 Denazz (talk) 21:11, 23 February 2024 (UTC)
<12,000 P. Sovjunk (talk) 07:11, 29 April 2024 (UTC)
<11,000 Denazz (talk) 13:32, 21 July 2024 (UTC)
<10,000 Denazz (talk) 13:05, 10 September 2024 (UTC)
<9,500 P. Sovjunk (talk) 21:51, 30 September 2024 (UTC)
<9,000 P. Sovjunk (talk) 13:21, 15 October 2024 (UTC)

2023

Shorten them, or convert to quotations. This, that and the other (talk) 01:24, 19 June 2023 (UTC)

Many undated quotes. Chioshio (talk) 03:10, 19 June 2023 (UTC)

"Raw" inflection tables in entries

Numerous entries contain hard-coded, non-templated inflection tables. Languages especially affected include Hunsrik, Pennsylvania German, Albanian, Old Marathi, and Sanskrit. Some of the tables have probably been subst'ed by accident; in other cases no inflection template exists yet, and a new one will need to be developed.

See the search, which currently returns 198 pages. This, that and the other (talk) 05:45, 14 August 2023 (UTC)

In the English Wikipedia I instituted a system whereby certain accidentally substituted templates (cleanup tags) were easily de-substituted. I think someone else improved it so that they de-substituted themselves, though this required some magic somewhere. Rich Farmbrough, 15:43, 13 December 2023 (UTC).

[6] This, that and the other (talk) 11:23, 3 October 2023 (UTC)