
Suffixed keys like "name:sr-Latn", specific to one language, are used to latinize other languages
Closed, ResolvedPublic

Description

If the OSM key "name:sr-Latn" is used on an object outside Serbia, its value is usually applicable only in Serbian. On Wikimedia maps, however, it is currently used to latinize labels for all other languages as well. The same applies to keys like "name:ja_rm" and "name:ko-Latn", which are sometimes used outside Japan and Korea, respectively.

The local "name" key value would be expected instead, especially if it is already in Latin script. As a workaround to avoid this latinization attempt, every place name could be entered in OSM in every language, but this shouldn't be necessary, and it is generally unwanted when the name variant used in a particular language is identical to the "name" key value.

An example of the internationalized map failing:
For the Hungarian city Szeged (https://en.wikipedia.org/wiki/Szeged), the internationalized maps in English as well as Norwegian show the name of the town as Segedin. This appears to come from OpenStreetMap (see https://www.openstreetmap.org/node/30453579), where it is the "name:sr-Latn" value.
Note that there is no name:en tag for that node.

The map in Norwegian:
https://maps.wikimedia.org/?lang=nb#13/46.2472/20.1855
This is also how it appears without any "lang" setting in no.wikipedia

Without any "lang" setting the name is presented as Szeged for maps.wikimedia.org.

Event Timeline

Restricted Application added subscribers: jeblad, Danmichaelo, jhsoby, Aklapper.
Vvjjkkii renamed this task from Map internationalization: Szeged shown as Segedin to thcaaaaaaa. Jul 1 2018, 1:08 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from thcaaaaaaa to Map internationalization: Szeged shown as Segedin. Jul 2 2018, 3:58 PM
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
Jhernandez edited projects, added Maps (Kartotherian); removed Maps.
Jhernandez subscribed.

Moving to Kartotherian. If it belongs on Tilerator or maps-style instead, please move it.

This task needs a check on whether this is a bug or just the complex language fallback chain doing its thing.

For lang=en, or with no lang set, it currently shows Szeged on maps.wikimedia.org. I tried setting lang= to several other languages, with or without fallbacks listed in fallbacks.json, and with no localized name defined at n30453579. They all showed "Segedin" (the "name:sr-Latn" key value).

Lately I came across another example: before I added the name:et key to OSM object n424314631, it showed up as "Columbia" for lang=et (it still does for zoom levels 0–9, which haven't been updated for a while). There are no fallbacks for "et", and the OSM object has "Columbia" as the key value for four different languages.

It seems that if the name is not translated for an OSM object, the localized Wikimedia tile shows the name in a more or less random translated language, whose key value may or may not match the "name" key. Instead, it should pick up the "name" key value if there are no fallbacks.

Edit: Another significant example: for several languages that are currently not set at n107775 and have no fallbacks (like da, et, hr, id, sl), London displays as "Londra".
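The resolution order this comment expects can be sketched as follows. This is a minimal illustration and not Kartotherian's actual code: `pick_name` and the shape of the `fallbacks` mapping are assumptions, and the tags are limited to the two values for the Szeged node that are mentioned in this task.

```python
def pick_name(tags, lang, fallbacks):
    """Expected order: exact language, then explicit fallbacks, then plain `name`.

    `fallbacks` is a hypothetical mapping of language code -> list of codes.
    """
    for code in [lang, *fallbacks.get(lang, [])]:
        value = tags.get(f"name:{code}")
        if value:
            return value
    # No translation and no usable fallback: use the local default name.
    return tags.get("name")


# Only the two tag values mentioned in this task; the real node has more tags.
szeged = {"name": "Szeged", "name:sr-Latn": "Segedin"}

# Norwegian has no name:nb here and no fallbacks are configured in this sketch.
print(pick_name(szeged, "nb", {}))
```

Under this order a language-specific key such as "name:sr-Latn" is only consulted when Serbian (or a language that explicitly falls back to it) is requested.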

Note that there is no name:en tag for that relation.
It seems if name is not translated for OSM object then localized Wikimedia tile shows name in more or less random translated language

So this seems a "content" (or at best l10n) issue, not an i18n issue, no? Meaning, you have to fix the data not how the software uses it. If neither the desired language nor any of the fallback languages is available, I'm not sure what else we could do. Use the autonym/most common language of the place's country (even if it's not in the desired script)?

If neither the desired language nor any of the fallback languages is available, I'm not sure what else we could do. Use the autonym/most common language of the place's country (even if it's not in the desired script)?

The software shouldn't try to latinize the name in this manner and it should use the default "name" key value which already is in Latin script. I've tried to outline underlying language picker issues in T230013.

Pikne renamed this task from Map internationalization: Szeged shown as Segedin to Suffixed keys like "name:sr-Latn", specific to one language, are used to latinize other languages. Sep 14 2021, 7:35 AM
Pikne updated the task description.
Pikne added subscribers: GoEThe, Izno.

If anyone wants to fix this, this logic is in LanguagePicker.js of https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/kartotherian/+/refs/heads/master/packages/babel/#language-resolution

Probably rule 3 should be adapted to not fall back to another script if the script of xx is Latn? Or maybe we should simply remove the entire script fallback logic as being too error-prone?

It has quite a few test cases, so this should be an easy one to fix.

Another example: https://maps.wikimedia.org/#15/40.7379/-73.9819 displays the default English names for places in New York

https://maps.wikimedia.org/?lang=en#15/40.7379/-73.9819 displays many names in Serbian instead of using the default value.

Probably rule 3 should be adapted to not fall back to another script if the script of xx is Latn? Or maybe we should simply remove the entire script fallback logic as being too error-prone?

I would vote for removing Rule 3 entirely. Skipping it for Latin-script languages would solve the problem for users of Latin-script languages, but not for users of non-Latin-script languages. I would think that, except for the fallbacks explicitly laid out in fallbacks.json, showing the native name is preferable to a random name that happens to use the same script.
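To make the effect of the proposal concrete, here is a rough sketch of a picker with and without a script-based step. It is a simplification based only on how rule 3 is described in this thread, not a port of LanguagePicker.js: the function name, the `use_rule_3` flag, and the suffix list are assumptions, and the explicit fallbacks step is left out for brevity.

```python
# Assumption: suffixes that mark a Latin-script variant of some language.
LATIN_SUFFIXES = ("-Latn", "_rm")


def pick_name(tags, lang, lang_is_latin, use_rule_3):
    """Exact match first, optionally any Latin-suffixed key, then plain `name`."""
    exact = tags.get(f"name:{lang}")
    if exact:
        return exact
    if use_rule_3 and lang_is_latin:
        # Script-based step: take any key that looks Latin, whatever its language.
        for key in sorted(tags):
            if key.startswith("name:") and key.endswith(LATIN_SUFFIXES):
                return tags[key]
    return tags.get("name")


# Only the two tag values mentioned in this task.
szeged = {"name": "Szeged", "name:sr-Latn": "Segedin"}

print(pick_name(szeged, "nb", lang_is_latin=True, use_rule_3=True))
print(pick_name(szeged, "nb", lang_is_latin=True, use_rule_3=False))
```

With the script-based step enabled, the Serbian Latin value wins even though the default name is already in Latin script; with it disabled, the default name is used.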

Pinging @Mooeypoo and @Yurik, whose names are on this commit

Change #1030307 had a related patch set uploaded (by Ahecht; author: Ahecht):

[mediawiki/services/kartotherian@master] LanguagePicker.js: Remove fallback Rule 3

https://gerrit.wikimedia.org/r/1030307

I would vote for removing Rule 3 entirely. Skipping it for Latin-script languages would solve the problem for users of Latin-script languages, but not for users of non-Latin-script languages. I would think that, except for the fallbacks explicitly laid out in fallbacks.json, showing the native name is preferable to a random name that happens to use the same script.

Is it clear that removing rule 3 is a better option than modifying it? If rule 3 is removed, what would be the result for English users for locations with a non-Latin name field, no name:en field, and a "Latn" name? That is certainly common in Serbia where a lot of places have Cyrillic names (random example, a military base: https://www.openstreetmap.org/way/43108890).

The only other alternative that would behave as intended for cases like that Cyrillic military base would be to write an algorithm that analyzes the default name, tries to determine its predominant character set, and then applies Rule 3 only if that script doesn't match the script of the requested language. There are all sorts of corner cases we'd have to deal with, like https://www.openstreetmap.org/relation/4610904, where the default name includes both Latin and non-Latin versions, which may be preferable to a hypothetical Serbian version.

While that may be a good long-term goal, I can't see that sort of solution happening quickly given the pace of WMF software development, and in the meantime we have maps across the wikis that are displaying names in Serbian for no good reason.
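The script-detection idea mentioned in this comment could start from something like the sketch below. It is only an illustration of the approach, not something that exists in Kartotherian: `predominant_script` is a hypothetical helper, and it approximates the script by the first word of each character's Unicode name, which is a rough heuristic.

```python
import unicodedata
from collections import Counter


def predominant_script(text):
    """Return the most frequent script label among the letters of `text`.

    The label is the first word of the Unicode character name, e.g.
    'LATIN' or 'CYRILLIC'. Returns None if `text` contains no letters.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(predominant_script("Szeged"))
print(predominant_script("Сегедин"))
```

Names that mix scripts, like the relation linked above, are exactly where a simple majority count stops being reliable, so a real implementation would need an explicit policy for ties and near-ties.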

How about special-casing Serbian to use sr-Latn?

If rule 3 is removed, what would be the result for English users for locations with a non-Latin name field, no name:en field, and a "Latn" name? That is certainly common in Serbia...

Then these places would be treated the same as places in almost all other countries where the main name key values are in a non-Latin script. A Cyrillic name would be expected if name:en isn't present. That said, of the place=* nodes in Serbia (the ones relevant to Wikimedia localization) that have a name:sr-Latn key, about 47% currently also have a name:en key. So English users would probably still see the names of the more prominent places in Serbia in English.

Similarly, name:ko-Latn and name:ja-Latn wouldn't be used for places in Korea and Japan, and the non-Latin name would be displayed for a Latin-script language if an explicit name:<lang> key isn't present for the given language. That said, I agree it is still more appropriate to skip or remove the current rule 3, as long as it isn't limited to places in the relevant countries and/or doesn't take the script of the main name key value into account.

How about special-casing Serbian to use sr-Latn?

This probably can be done via fallbacks.json. Current rule 3 of the language picker apparently messes things up for Serbian language as well, e.g. currently this map in sr.wikipedia using Cyrillic language variant (sr-ec) displays Latin names.

How about special-casing Serbian to use sr-Latn?

This probably can be done via fallbacks.json. Current rule 3 of the language picker apparently messes things up for Serbian language as well, e.g. currently this map in sr.wikipedia using Cyrillic language variant (sr-ec) displays Latin names.

I think this is the better approach indeed. It seems much simpler and future proof.

How about special-casing Serbian to use sr-Latn?

This probably can be done via fallbacks.json...

I think this is the better approach indeed. It seems much simpler and future proof.

What I meant is that the language variant code sr-el used in Serbian Wikipedia should probably fall back to sr-Latn. As I understand it, this is a side issue that wouldn't concern English and other languages. It could be done in addition to removing rule 3 of the language picker.
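The sr-el to sr-Latn fallback suggested here could look roughly like the following. The actual structure of fallbacks.json is not shown in this thread, so the mapping below assumes a plain "language code to ordered list of codes" shape, and `candidate_keys` is a hypothetical helper that only lists the OSM keys that would be tried in order.

```python
# Assumed shape of an explicit fallback table; only this one entry is
# taken from the suggestion in this comment.
FALLBACKS = {"sr-el": ["sr-Latn"]}


def candidate_keys(lang, fallbacks):
    """List the OSM name keys to try for `lang`, ending with the default `name`."""
    codes = [lang, *fallbacks.get(lang, [])]
    return [f"name:{code}" for code in codes] + ["name"]


print(candidate_keys("sr-el", FALLBACKS))
print(candidate_keys("en", FALLBACKS))
```

Because the fallback is declared per language, a request for English never reaches "name:sr-Latn" in this sketch, which is the separation the comment is asking for.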

For the record: I reviewed https://gerrit.wikimedia.org/r/1030307 and left a longer comment explaining why I think it should be merged. My conclusion is effectively the same as @Ahecht's:

[…] showing the native name is preferable to a random name that happens to use the same script.

I don't know if this is the same issue: T366136

For a few days I’ve been noticing that all of the neighborhood labels are rendered in Serbian on nearly every map of Manhattan on English Wikipedia, which seems like a pretty high impact.

For example the infobox on https://en.wikipedia.org/wiki/One_World_Trade_Center currently renders like this:

Screenshot 2024-06-05 at 4.40.16 PM.png (638×636 px, 252 KB)

Change #1030307 merged by jenkins-bot:

[mediawiki/services/kartotherian@master] LanguagePicker.js: Remove fallback Rule 3

https://gerrit.wikimedia.org/r/1030307

This still needs to be deployed. The most recent deploy of Kartotherian code was done in August 2023 per T329924

After the last deployment, here is the output of the image that has the wrong variant in production:

image.png (931×1 px, 656 KB)

If we GET the same URL directly from the kartotherian server without any caching the output looks fixed:

image.png (844×1 px, 626 KB)

Let's wait for the cache to expire and see whether the problem gets fixed.

Jgiannelos claimed this task.