Wikidata:Property proposal/word with diacritical signs

From Wikidata

representation with diacritical signs


Originally proposed at Wikidata:Property proposal/Lexemes

   Not done
Description: In some languages whose texts are usually written without diacritical signs, some words have a form with diacritics (usually indicating pronunciation).
Data type: Monolingual text
Domain: form
Allowed values: diacriticked word
Example 1: liber (L6601) → līber (Note: there is another Latin word "liber" whose diacriticked form is "liber"; they are different words.)
Example 2: كِتَاب (L2233) → كِتَاب (Note: the representation currently uses the word with diacritical signs. Perhaps we need another property, "word without diacritical signs"?)
Example 3: guossi (L22017) → guosˈsi

Motivation

This is important in some languages, like Latin and Arabic. GZWDer (talk) 13:02, 14 July 2018 (UTC)

Note: I found that there is already Wikidata:Property proposal/Vocalized form. But in Latin, diacritical marks indicate the length of a vowel, not its type.--GZWDer (talk) 13:05, 14 July 2018 (UTC)

Discussion

 Comment: Shouldn't this be named "representation with diacritical signs" rather than "word", in line with the naming convention of the interface? KaMan (talk) 13:13, 14 July 2018 (UTC)
 Comment: I'm not convinced this is really generalizable; however, having it as monolingual text rather than form-valued may be a better solution than what was proposed in Wikidata:Property proposal/Vocalized form. Have you tried doing this with the representation system that forms have now? I.e. add the diacritic form as one representation in the language, and then the non-diacritic form as another in a variant of the language (or vice versa)? It might require creating an item for the language variant with/without diacritics, if we don't have that already, but that seems a reasonable way to do it. ArthurPSmith (talk) 14:05, 16 July 2018 (UTC)
 Support. Adding this as a form instead doesn't seem suitable.
--- Jura 06:34, 26 July 2018 (UTC)[reply]
Nobody suggested representing this as a separate form. The suggestion is to represent this as a spelling variant of a form's representation (or of a lexeme's lemma). -- Duesentrieb (talk) 14:12, 30 July 2018 (UTC)
OK. Isn't this closer to the use case for IPA (which has a separate property)?
--- Jura 14:15, 30 July 2018 (UTC)[reply]
You are right that there is a gray area here. I think that, pragmatically, IPA is not treated as a "spelling" (i.e. there are no books written in it), while Hebrew-with-vowels is a spelling. Transliterations may or may not be "spellings", depending on how common they are, and how frequently and in what context they are used. I guess the distinction in my mind is: if it's used to write text, it's a spelling. If it's only used as an aid in dictionaries and such, then it's not. But I agree that this is not a very clear distinction. -- Duesentrieb (talk) 14:38, 1 August 2018 (UTC)
The samples above only include Latin and Arabic and I had in mind the first one only. It's not clear if yours is covered at all.
--- Jura 14:49, 1 August 2018 (UTC)[reply]
I think for Latin, the suggested alternative would be fine. The exception might be when we want to reference it explicitly. @Duesentrieb: is there a non-QID language code available?
--- Jura 06:40, 24 August 2018 (UTC)[reply]
 Comment: I agree with KaMan's comment that it shouldn't have "word". Moreover, it doesn't have to be diacritics; some languages also use other signs. Look at wikt:guossi for example, where ˈ is added. en.Wiktionary calls this the "display form", but that's rather vague for a Wikidata property. Perhaps "pronunciation respelling"? Then it can also be used for w:enPR and the like. Or maybe this can just be included as "pronunciation" with a qualifier indicating which scheme is being used. Rua (talk) 11:58, 14 September 2018 (UTC)
I've added a Northern Sami example, and I've taken the liberty of changing the name of the property to "representation with diacritical signs", to indicate that it's really a variant of the form's representation, and that it doesn't apply only to single words. I'm still not sure about the use of "diacritical sign", given that the Sami example does not involve a diacritic. —Rua (mew) 17:41, 18 September 2018 (UTC)
@Rua: What about the name "alternative representation", with the qualifier of (P642) set to diacritic (Q162940) or any other suitable value? KaMan (talk) 08:27, 19 September 2018 (UTC)
 Oppose as currently proposed for two reasons: Firstly, I think most of the things mentioned here should be separate spelling variants (like شُبَاط (L8661) does it) because they are actually used in some types of writing. Secondly, diacritical marks are used in a variety of ways in different languages, so the property would not have a clear meaning. Where spelling variants are not appropriate, I think it's better to have properties which are defined based on the purpose (e.g. full representation of vowels) rather than the way it's done (e.g. with diacritics). - Nikki (talk) 09:45, 19 September 2018 (UTC)
What solution would you suggest for the case of Latin and Northern Sami, which are given as examples here? —Rua (mew) 10:30, 19 September 2018 (UTC)
@Nikki: I experimentally added inflexion to uranium (L22579). Is that how it should look, in your opinion? KaMan (talk) 11:36, 19 September 2018 (UTC)
Here is one possibility for Northern Sami: guossi (L22017). The second spelling uses pronunciation respelling (Q7249970). This method can work for other languages too. —Rua (mew) 18:24, 19 September 2018 (UTC)
I just created normalised spelling (Q56669831), which could also be used. —Rua (mew) 19:47, 19 September 2018 (UTC)

 Not done: no support.--Micru (talk) 09:48, 22 December 2018 (UTC)