Context:
- Wikibase Core
Problem:
There are various lists defining language codes for Wikibase. Some values in these lists are not suitable for the termbox on Wikidata.org (codes for labels/descriptions/aliases). See T44396 for the general problem. Despite efforts to clean this up by bot, the number of affected values currently stands at 500,000 (June 2021, see T44396#7150919).
Suggested solution:
The idea is to add a configuration variable that allows disabling language codes for the termbox that are not suitable in this context.
This doesn't touch the use of these language codes for other purposes (like lexemes or monolingual strings). For the latter we already have a similar implementation, see DifferenceContentLanguages with DefaultMonolingualTextLanguages.
Example:
- Sample: "no" isn't used on Wikidata, but is in the domain name for no.wikipedia.org. Wikidata uses just "nb".
- Other codes for an initial version of the variable: 'bat-smg' (→ 'sgs'), 'bh' (→ 'bho'), 'fiu-vro' (→ 'vro'), 'roa-rup' (→ 'rup'), 'simple' (→ 'en'), 'zh-classical' (→ 'lzh'), 'zh-min-nan' (→ 'nan'), 'zh-yue' (→ 'yue'), 'be-x-old' (→ 'be-tarask'), 'shy' (→ 'shy-latn'), 'de-formal', 'es-formal', 'hu-formal', 'nl-informal'
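To illustrate the suggested solution, here is a minimal Python sketch of how such a variable could be applied to a list of language codes. The name `DISABLED_TERM_LANGUAGES` and both helper functions are hypothetical and are not existing Wikibase settings or APIs; the mapping values are the preferred codes listed above, with `None` where the task names no replacement.

```python
# Hypothetical setting: codes disabled for labels, descriptions and aliases.
# Values are the preferred codes given in the task; None means none was given.
DISABLED_TERM_LANGUAGES = {
    "no": "nb",
    "bat-smg": "sgs",
    "bh": "bho",
    "fiu-vro": "vro",
    "roa-rup": "rup",
    "simple": "en",
    "zh-classical": "lzh",
    "zh-min-nan": "nan",
    "zh-yue": "yue",
    "be-x-old": "be-tarask",
    "shy": "shy-latn",
    "de-formal": None,
    "es-formal": None,
    "hu-formal": None,
    "nl-informal": None,
}


def term_languages(all_codes):
    """Return the codes that remain usable for terms, preserving input order."""
    return [code for code in all_codes if code not in DISABLED_TERM_LANGUAGES]


def preferred_code(code):
    """Return the preferred code for a disabled one, or None if none is listed."""
    return DISABLED_TERM_LANGUAGES.get(code)
```

Lists used for other purposes (lexemes, monolingual strings) would simply not be passed through `term_languages`, so they stay unaffected.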
Acceptance criteria:
- there is a configuration variable that allows disabling language codes for "labels, descriptions, and aliases" (everywhere including in the API, Special:SetLabel, etc.)
- there should be a default configuration that makes sense for Wikibase instances in general (e.g. including "simple")
- edge cases are handled
- Deletion of existing disabled language code values should still be possible
- Reverts should still be possible, even if a disabled language code was used in the old revision.
- When a disallowed code is used as the UI language, item labels, the page title, and the termbox all use the correct language instead (e.g. UI language de-formal should behave the same as de) (see discussion below)
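The acceptance criteria above could be sketched roughly as follows. This is only an illustration in Python, not Wikibase code: the names `DISABLED`, `UI_FALLBACK`, `is_term_change_allowed`, and `term_language_for_ui` are hypothetical, the disabled set is shortened, and only the de-formal to de mapping is taken from the task.

```python
# Hypothetical, shortened configuration for illustration only.
DISABLED = frozenset({"simple", "no", "de-formal"})
# Only "de-formal" -> "de" is stated in the task; other entries would be assumptions.
UI_FALLBACK = {"de-formal": "de"}


def is_term_change_allowed(language, new_value, is_revert=False):
    """Decide whether a label/description/alias change may be saved.

    new_value is None when the term is being deleted.
    """
    if language not in DISABLED:
        return True
    if is_revert:
        # Reverts stay possible even if the old revision used a disabled code.
        return True
    # Deleting an existing value stays possible; adding or changing one does not.
    return new_value is None


def term_language_for_ui(ui_language):
    """Pick the language used for labels, page title and termbox."""
    return UI_FALLBACK.get(ui_language, ui_language)
```

The same check would have to sit behind every entry point (API, Special:SetLabel, etc.) so that no interface can bypass it.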
Original:
There are various lists defining language codes for Wikibase. Some values in these lists are not suitable for the termbox on Wikidata.org. See T44396 for the general problem.
The idea is to add a configuration variable that allows disabling such language codes. Deletion of existing values should still be possible.
- Sample: "no" isn't used on Wikidata, but is the domain name for no.wikipedia.org. Wikidata uses just "nb".
- Other codes for an initial version of the variable: "bat-smg", "bh", "fiu-vro", "roa-rup", "simple", "zh-classical", "zh-min-nan", "zh-yue", 'de-formal', 'es-formal', 'hu-formal', 'nl-informal'
Despite efforts to clean this up by bot, the numbers are currently at 500,000 (June 2021, see T44396#7150919).
This doesn't touch their use for lexemes or monolingual strings. For the latter, see DifferenceContentLanguages with DefaultMonolingualTextLanguages.