User talk:Atitarev
Deleted entries
I noticed you deleted the pages 빛의 and 빛에 관한. Is there a particular reason, or was it an accident? Spacestationtrustfund (talk) 07:00, 26 July 2022 (UTC)
- @Spacestationtrustfund: No, not an accident. Both are listed at WT:RFDCJK, in case you missed it, and are clear SoPs. I may have rushed (there should be a longer waiting time), though I'm not sure. These cases are too obvious and will be deleted anyway. If you want to translate photic, you should use non-SoP links, like 빛의 (ko) (bichui), which will link to the lemma 빛 (ko) (bit). 의 (-ui) is a particle and not part of the word.
- User:Fish bowl voted "Delete swiftly" in both cases. I agree: Wiktionary:Requests_for_deletion/CJK#빛의 and Wiktionary:Requests_for_deletion/CJK#빛에_관한 Anatoli T. (обсудить/вклад) 07:11, 26 July 2022 (UTC)
Talkpage deletion
Hi, don't you think it might be better to archive your talkpage instead of deleting it every once in a while? People might want to revisit some older conversations from time to time, and not everyone has the possibility to view the deleted edits. Thadh (talk) 15:12, 5 September 2022 (UTC)
- I want to say that I support deleting the talk page from time to time, but more importantly, I support a qualified right of the editor who controls the account to delete things from their talk page. An editor should have free rein in this area; they should delete things without fear that people will worry about it if they delete things on their own talk page. --Geographyinitiative (talk) 19:54, 5 September 2022 (UTC)
- This isn't about the right to delete things. It's a polite request to consider doing otherwise. If Anatoli declines, nothing bad will happen. Chuck Entz (talk) 20:03, 5 September 2022 (UTC)
- A simple request is fine, but I completely disagree with such a request. I am also against peer pressure in this matter. Sometimes you can have conversations that you can't bear seeing all the time on your talk page. If you delete them, they don't go away, but at least you don't have to see them. Sometimes you can have other legitimate reasons. Therefore, I support complete freedom in this area, to the extent allowed within the rules of Wiktionary. --Geographyinitiative (talk) 20:12, 5 September 2022 (UTC)
- It's one thing to delete things from your talkpage - it's another thing entirely to do so in a way that makes the edit history inaccessible. Theknightwho (talk) 14:06, 6 September 2022 (UTC)
- (cf. "Although archiving is preferred, users may freely remove comments from their own talk pages." Wikipedia:OWNTALK) --Geographyinitiative (talk) 15:53, 6 September 2022 (UTC)
- Yes, but that doesn't make it inaccessible. In any event, it's just a request. Theknightwho (talk) 15:54, 6 September 2022 (UTC)
- In the context of the sentence on Wikipedia I quoted, you may be right on some level. But from the literal words of the sentence, making comments 'inaccessible' seems to fall within the limit of "freely remove". I interpret these words in favor of a maximal freedom of action because we don't know everyone's situation and needs. Thanks for the discussion; since I have made four comments I will not post any further replies. --Geographyinitiative (talk) 16:17, 6 September 2022 (UTC)
- It's important to remember that we are not Wikipedia; their rules and customs don't necessarily apply on Wiktionary. That said, Theknightwho is right, this was only a polite request. That said, maybe it's a good idea to stop spamming Atitarev's talkpage with meaningless discussions. Thadh (talk) 16:24, 6 September 2022 (UTC)
Uyghurjin
Hi - please don't amend "Mongolian" to "Uyghurjin" to refer to the Mongolian script, as you did here. Uyghurjin refers to an early form of the script, derived from the Old Uyghur script. The normal word for the traditional script is bichig (or just Mongolian). Theknightwho (talk) 14:01, 6 September 2022 (UTC)
- That was a long time ago. I am reverting back to use “Mongolian”, which is also supported by modules. Anatoli T. (обсудить/вклад) 20:45, 6 September 2022 (UTC)
u russian?/??
r u???? well???? huuh??! Shumkichi (talk) 23:52, 11 September 2022 (UTC)
- @Shumkichi: Why do you want to know? Anatoli T. (обсудить/вклад) 23:54, 11 September 2022 (UTC)
- are you russian or not? Shumkichi (talk) 23:55, 11 September 2022 (UTC)
- @Shumkichi: What's this? Ethnically, I am half Ukrainian, half Russian. Anatoli T. (обсудить/вклад) 23:58, 11 September 2022 (UTC)
- aha ok, where do you live? do you consider yourself Ukrainian or Russian? what do you think about the war? Shumkichi (talk) 00:02, 12 September 2022 (UTC)
- also, i don't believe you speak all of those languages that fluently. Shumkichi (talk) 00:03, 12 September 2022 (UTC)
- As a normal person, I am, of course, against the war and I hope Ukraine will win.
- If you have any particular issue with a particular edit, you can point it out. --Anatoli T. (обсудить/вклад) 00:06, 12 September 2022 (UTC)
- ufff, I thought you were pro-Putin for a minute. wow, there are normal Russians after all :O Shumkichi (talk) 00:09, 12 September 2022 (UTC)
Mongolian scripts
Hiya - I'm wondering how best to handle the two Mongolian scripts when a term is only used in one script or the other. There are ambiguous situations going in both directions (more when it's traditional to Cyrillic), so it's not just a case of working out what the theoretical form would be. Quite a lot of the Russian borrowings have made it across the border to Inner Mongolia, but not all of them, which is why I was thinking. Do you have any thoughts? Theknightwho (talk) 12:45, 19 September 2022 (UTC)
- @Theknightwho: You’re talking about descendants, right? You just provide the one script that the term exists in. What seems to be the problem? Multiple forms in one script can be separated by a pipe “|”. Anatoli T. (обсудить/вклад) 13:05, 19 September 2022 (UTC)
- It came from that, but it's also a wider question about the best way to handle Mongolian going forward, as there will inevitably be entries that can only be created in one script or the other. It's fine to do that, but the elephant in the room is what that implies for the entries used in both. I don't want to be in a permanent situation of trying to keep two sets of entries in sync with each other, and I also don't want to be making judgment-calls about which term deserves to be lemmatised at which. Theknightwho (talk) 13:15, 19 September 2022 (UTC)
- @Theknightwho: Nobody is expected to do more than they know or feel comfortable with. If you can access the other script and want to add it, go ahead and add it; otherwise don't worry. Also, you don't have to bother about a language more than native speakers do. Anatoli T. (обсудить/вклад) 23:58, 19 September 2022 (UTC)
Revert - положба
Hello, why did you revert my edit положба on the page положение? It's also Slavic, so it redirects them to another same word (but similar writing)? I've also put a redirection from Положба to Положение. All Slavic words are interconnected. Andrew012p (talk) 21:25, 20 October 2022 (UTC)
- @Andrew012p If these are cognates, the best place to put them is in the relevant etymology sections. For example, "Cognate with {{cog|mk|положба}}" gives "Cognate with Macedonian положба (položba)". The {{also}} template is for words that are visually similar - usually alternative lettercase forms or involving diacritics - and doesn't have anything to do with whether they're related. That's because it's not tied to a specific language and just exists for convenience, whereas cognates are actual linguistic information. Theknightwho (talk) 21:59, 20 October 2022 (UTC)
- @Theknightwho Okay, thank you. Andrew012p (talk) 22:18, 20 October 2022 (UTC)
- Yes, what Theknightwho said. Anatoli T. (обсудить/вклад) 22:31, 20 October 2022 (UTC)
Pronunciation revert
Why did you cancel the second pronunciation for доброе утро? The variant with fully pronounced "eje" is uncommon. Vziel (talk) 06:45, 2 November 2022 (UTC)
- Because I don't agree with that pronunciation to be put there. You can experiment with this on the Russian Wiktionary, though.
- [ˈdobrə(j)ə ˈutrə], [ˈdobrəɪ ˈutrə] or [ˈdobrəɛ ˈutrə] would be more accurate alternatives. --Anatoli T. (обсудить/вклад)
- @Benwing2: Hi. Does Ivanova say anything about this type of ending? I'd hate adding original research on standard terms. --Anatoli T. (обсудить/вклад) 07:01, 2 November 2022 (UTC)
- I think these ones you mentioned are indeed more accurate. Vziel (talk) 09:58, 2 November 2022 (UTC)
- @Vziel Hi Anatoli, sorry for the delay. I think you are asking what Ivanova says about endings like -ое? The closest she has to talking about this is on p. 888, where she says (in reference to "Гласные [а], [е] (на письме буквы я, е)"):
- В остальных предударных и заударных слогах произносится очень короткий неясный звук, средний между [и] и [э], но более близкий к [и], обозначаемый в транскрипций как [ь]: нестерпи́мо [ньсьтиэр], деревя́нный [дьриэвь], перезво́н [пьриэз], репертуа́р [рьпьр], челове́к [чьла].
- Unfortunately it's clear from the examples that she's talking about syllables before the stress rather than syllables at the end of the word. There are examples elsewhere where she indicates e.g. the ending of до́брого and си́него as [въ], and similarly the ending of рабо́та as [тъ] and во́рсом as [съм], but no examples indicating the ending of words in -е. Benwing2 (talk) 18:12, 5 November 2022 (UTC)
- @Benwing2: Thanks. Unfortunately, it doesn't cover that situation. The final, unstressed pronunciation of "е" and "э" is not so well described (where it's also [ə], not [ɪ]). The optional dropping of [j] is also of interest. Anatoli T. (обсудить/вклад) 07:47, 6 November 2022 (UTC)
Isn’t this SOP, with sense 14 of the verb tirer at the French Wiktionary? (tirer quelqu’un d’un problème → tirer soi même de ce problème → s’en tirer.) Compare Bill Clinton ... se tira du piège [1] and son père l’en tira de nouveau.[2] --Lambiam 06:29, 4 November 2022 (UTC)
- @Lambiam: My impression is that it is idiomatic, though I am not 100% sure any more. Anatoli T. (обсудить/вклад) 07:32, 4 November 2022 (UTC)
- I think the idiomaticity is vested in sense 14 of the verb, “to set free”, “to deliver” (as in, “deliver us from evil”). --Lambiam 08:30, 4 November 2022 (UTC)
- @Lambiam: OK, I will delete the entry and move the citation to tirer. Are you willing to add more definitions? Anatoli T. (обсудить/вклад) 08:43, 4 November 2022 (UTC)
Hi. I recently edited the etymology section of this English entry so that the Hebrew would appear romanized in a consistent way, but then four hours later you edited the section leaving romanizations inconsistent again. For example, you changed the romanization of יְהוֹשֻׁעַ to "yehoshúa'" in the first instance, but left the second and third instances as Yĕhōšúaʿ (I see now that I forgot to add the ayin mark to the 2nd instance, but no matter...). Why is this? I admit I'm pretty unfamiliar with Wiktionary's Hebrew conventions, but does the inconsistency not annoy you? --Ser be être 是talk/stalk 01:57, 26 November 2022 (UTC)
- @Ser be etre shi: The inconsistency with Wiktionary:About_Hebrew#Romanizations (column #3 - the regular Wiktionary romanisation) does annoy me; that's why I made the edit, which doesn't include any capitalisation or (scholarly) symbols like "š". I may have missed some places, but the idea is to comply with the agreed method. If the method is not to your liking, you should challenge the rule, not individual entries. You're not the first to complain, but the rules haven't changed. If they change, the romanisations can be consistently changed with a bot. I don't work much with Hebrew entries but I am all for consistency. Anatoli T. (обсудить/вклад) 09:47, 26 November 2022 (UTC)
Finally was able to address your RFE. 😄 ‑‑ Eiríkr Útlendi │Tala við mig 21:26, 23 January 2023 (UTC)
- @Eirikr: Oh, thank you! Does it also (by extension) mean duce (a fascist authoritarian leader)? Anatoli T. (обсудить/вклад) 22:34, 23 January 2023 (UTC)
- It looks like 総統 was used to refer to Mussolini and/or his title, as we can see some over at w:ja:総統#イタリア -- presumably as part of the same shade of meaning at play when using 総統 to refer to any dictator, I think. ‑‑ Eiríkr Útlendi │Tala við mig 23:19, 23 January 2023 (UTC)
Overlordnat1
Regarding that Tea Room comment: on second thought, it is way over the line. I’ll participate in whatever process can ensure such speech remains clearly unacceptable in discussions. Thanks for speaking up. —Michael Z. 01:16, 24 January 2023 (UTC)
- @Mzajac: Thanks, you can voice your opinion here wt:BP#Review the anti-Ukrainian propaganda edit by User:Overlordnat1. Anatoli T. (обсудить/вклад) 01:22, 24 January 2023 (UTC)
Arabic loanwords
Hi Atitarev. Not sure I talked about it with you, but I have been wondering whether you are actually taught to pronounce loanwords in Arabic as in the example of Barbados. In Arabic, loanwords don't normally take the expected Classical Arabic three vowels indicated in many cases by their matres lectionis. The damma diacritic is also incorrectly added to indicate a close /u(ː)/, though the Arabic practice outside of Wiktionary is to omit it if the word has another vowel, /o(ː)/. Thanks. --Esperfulmo (talk) 13:02, 6 February 2023 (UTC)
- @Esperfulmo: Hey. I am aware of the complexity, unpredictability and lack of regulations for pronunciations of loanwords in Arabic, specifically about the /o(ː)/ vs /u(ː)/ or /e(ː)/ vs /o(ː)/. When dictionaries or online resources don't provide anything definite, I have given the default, predictable transliteration in some cases. It's an old edit from 2010. I have changed many since. Thanks for addressing. Surely another native speaker will come and add their own version. Anatoli T. (обсудить/вклад) 05:28, 9 February 2023 (UTC)
- OK, thanks for taking care. --Esperfulmo (talk) 13:19, 9 February 2023 (UTC)
- @Esperfulmo Several years ago when I wrote a bot script to clean up Arabic translits and vocalize the Arabic script form, I ran into this issue of loanwords. As Anatoli points out, there are often no dictionary resources that clearly indicate the pronunciation of vowels in loanwords. The existing loanword translits are years old and totally inconsistent; it's a mess and really needs a native speaker to clean things up. My vocalization script added diacritics in all cases including when e.g. a و was indicated as u, ō or o in translit; I didn't realize that the normal practice is to omit the diacritic in such cases (although I'd argue this isn't necessarily the most helpful practice). Benwing2 (talk) 02:29, 15 March 2023 (UTC)
- @Benwing2, Esperfulmo: There is a lot of evidence that loanwords do use diacritics, even if the vowels don't correspond to standard Arabic o(ː)/ or /e(ː)/ or if they are shortened.
- Diacritics are helpful, but a transliteration, or various versions of it, helps disambiguate.
- So بَرْبَادُوس (barbādūs) can be transliterated as various readings like barbādus, barbados or barbādōs but the vocalisation "بَرْبَادُوس" is still correct and applicable. It is problematic when it is barbēdos, for that, a different Arabic spelling would apply, e.g. بَرْبِيدُوس (barbēdos) or بَارْبِيدُوس (barbēdos). This is a rare exception where a spelling doesn't match the transliteration and pronunciation altogether. Anatoli T. (обсудить/вклад) 02:41, 15 March 2023 (UTC)
- Thanks guys. I already wrote before that Arabic speakers never add diacritics preceding matres lectionis representing the vowels /eː, oː/. If outsiders follow this practice, that doesn't make it valid. I believe Persians also don't use diacritics for this particular case. The only times I've witnessed the mentioned practice was online by non-Arabic speakers and outsiders. An addition to your code so that, within vocalized words, unvocalized ي and و are interpreted as ē and ō, respectively, is worth considering. This will produce a lot fewer faulty results. --Esperfulmo (talk) 01:59, 22 March 2023 (UTC)
- @Esperfulmo, Benwing2: Diacritics offer convenience, regardless of the origin. If a loanword is written with diacritics بَلْجِيكَا (baljīkā), as in this article, it offers an alternative to a native or foreigner alike on how to read a word. Regardless of what is considered "correct" or "standard". The question on what is "correct" often depends on who you ask and how one treats loanwords. One will say "we accommodate to the original sound", recognising the foreign words, the others prefer to almost or totally adopt them. In this revision I used دِيكْتَاتُورِيَّة بْرُولِيتَارِيَا ― diktātōriyyat broletāriyā ― dictatorship of the proletariat. Both readings are attested and links are provided, which was then discarded by @Fenakhay with a "No..." edit summary and now we have a fully adopted reading: دِيكْتَاتُورِيَّة بْرُولِيتَارِيَا ― dīktātūriyya brūlītāriyā ― dictatorship of the proletariat. (@Fenakhay: My preference is to provide both ō/o and ū/u readings, one is definitely attested AND diacritics - I didn't get around to respond earlier). There are many examples, check مُوسْكُو (mūskū). I consulted educated Arabic speakers. They all claimed different pronunciations and I heard Al-Jazeera to use various. Does the vocalisation make anything harder? I don't think so.
- I think we can further discuss policy decisions. Providing both diacritics and transliterations doesn't hurt anyone, just makes it possible both technically and for the user experience (you can use either method or both). Anatoli T. (обсудить/вклад) 02:59, 22 March 2023 (UTC)
- @Esperfulmo, Benwing2: BTW, I like the idea to add invisible codes to allow for ē, ō. Also, any consonants not available by default - č, g, ž, etc.
- It is possible e.g. to replace Arabic ج (j) with the Persian گ here, to get the desired reading:
- إِنْجْلِيزِيّ ― ʔinglīziyy ― English
- Note that the display still uses the ج (j) but it transliterates it as "g". Anatoli T. (обсудить/вклад) 03:11, 22 March 2023 (UTC)
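One possible reading of the proposal above, sketched in Python. This is only an illustration, not the actual bot or module code: the function name `mater_reading` and its return values are hypothetical, and it handles only the narrow case discussed here (a و or ي that is either preceded by its matching short-vowel diacritic or left unvocalized).

```python
import unicodedata

FATHA, DAMMA, KASRA, SHADDA = "\u064E", "\u064F", "\u0650", "\u0651"
SHORT_VOWELS = {FATHA, DAMMA, KASRA}
WAW, YA = "\u0648", "\u064A"


def mater_reading(word, i):
    """Guess the long-vowel reading of the waw or ya at index i.

    Returns "ū"/"ī" when the letter follows damma/kasra, "ō"/"ē" when it
    directly follows a bare consonant (no diacritic), and None when the
    letter looks consonantal or the case is not covered by this sketch.
    """
    letter = word[i]
    if letter not in (WAW, YA):
        raise ValueError("index i must point at waw or ya")
    nxt = word[i + 1] if i + 1 < len(word) else ""
    # Word-initial, or carrying its own vowel/shadda: treat as consonant w/y.
    if i == 0 or nxt in SHORT_VOWELS or nxt == SHADDA:
        return None
    prev = word[i - 1]
    matching = DAMMA if letter == WAW else KASRA
    if prev == matching:
        return "ū" if letter == WAW else "ī"
    # Any other diacritic before the letter (fatha, sukun, ...) is not handled.
    if unicodedata.combining(prev) != 0:
        return None
    # Bare consonant before the letter: read it as the mid vowel.
    return "ō" if letter == WAW else "ē"
```

With this assumption, the و in a spelling with damma (as in the Barbados example) reads as ū, while the same spelling without the damma reads as ō. Whether that default is desirable is exactly the policy question under discussion.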
Support for Łacinka entry names
Hiya - I've added support for Łacinka entry names in Belarusian, so (e.g.) Biéraście only removes the accent from the "e" (not the "s"). Theknightwho (talk) 20:36, 1 March 2023 (UTC)
- @Theknightwho: Thanks. It's the right thing to do but I don't know how useful it's going to be specifically for Belarusian. Anatoli T. (обсудить/вклад) 22:36, 1 March 2023 (UTC)
- @Atitarev I've noticed we don't actually have any entries, but if they're in translation sections then it makes sense to be able to support accents properly, I think. Theknightwho (talk) 22:38, 1 March 2023 (UTC)
- @Theknightwho: Not just that. It's normal to use stress signs over Cyrillic vowels, e.g. in Belarusian but I am not aware of such a tradition when it's written in łacinka. Łacinka is originally based on Polish spellings, where letter ó is used and it's pronounced "u". Now it looks more like a mixture of Polish. In Czech, acute accents signify long vowels. It's good to have, anyway. In case there is a conversion from Cyrillic to łacinka, which can be done from Taraškievica orthography. Anatoli T. (обсудить/вклад) 22:45, 1 March 2023 (UTC)
- Seems like stress signs were at least sometimes used in Łacinka back in 1846, as can be seen here. "Хрысто́с васкрэ́с Сын Божы" was written as "Chrystōs waskrēs Syn Božy". It's also interesting that we can see a remark about the technical limitations of the used typography equipment (at the bottom of the scanned page). But the intention to indicate the stressed vowels was clearly there. Ssvb (talk) 17:58, 1 June 2023 (UTC)
- @Ssvb, @Theknightwho: That's a different (older) version of łacinka and was meant to help Polish speakers to get the accents right. With the same success you can probably find some accented German (or other Latin-based) texts. My point is that accented łacinka is very uncommon and can't be sourced. That doesn't mean it can't be used in transliterated Belarusian texts at Wiktionary (we already have accents) over Belarusian terms, and if łacinka is also used as the main or alternative transliteration, I can't see why it can't have accents. Anatoli T. (обсудить/вклад) 02:06, 2 June 2023 (UTC)
- I only wanted to show an example of accented Łacinka just in case you or @Theknightwho haven't seen it yet. This particular song lyrics book was a very recent addition on wikisource after all. As for the common use or lack thereof, accented Cyrillic texts are not very common either. Their usage is mostly limited to specialized books, such as dictionaries or language learning materials. Ssvb (talk) 11:20, 2 June 2023 (UTC)
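The entry-name behaviour described at the top of this thread (Biéraście loses the stress mark on "e" but keeps "ś") can be illustrated with a short Python sketch. It is not the module code that was actually deployed; the function name `strip_stress` and the vowel set are assumptions made for this example.

```python
import unicodedata

ACUTE = "\u0301"
# Assumed set of Latin vowel letters that can carry a stress mark.
VOWELS = set("aeiouyAEIOUY")


def strip_stress(text):
    """Remove a combining acute only when it sits directly on a vowel.

    Consonant letters such as ć, ń, ś, ź keep their acute, because there
    it is part of the letter rather than a stress mark.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = []
    for ch in decomposed:
        if ch == ACUTE and kept and kept[-1] in VOWELS:
            continue  # stress mark on a vowel: drop it
        kept.append(ch)
    return unicodedata.normalize("NFC", "".join(kept))
```

Decomposing first means the same loop handles both precomposed and combining input. A vowel carrying another combining mark before the acute would not be matched by this simple check.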
Persian questions
Hi. I am making good progress in canonicalizing Persian translit but I have a lot of questions, not knowing Persian much. Here is the first batch:
1. Latin o against Arabic و especially in loanwords: Is this OK or a mistake in translit?
{{t|fa|کروآت|tr=koroât}} "Croatian"
{{t|fa|نوردراین-وستفالن|sc=fa-Arab|tr=nordrâyn-vestfâlen}} "North Rhine-Westphalia"
{{tt+|fa|هورن|tr=horn}} "horn"
{{t|fa|آنتلوپ|tr=ântelop}} "antelope"
It doesn't always occur:
{{t+|fa|هلند|tr=holand}} "Holland"
- AT: و may stand for both ô and o in loanwords, unpredictable but based on the source language, usually quite close to the source pronunciation. Sometimes no written vowel is used, as in case of {{t+|fa|هلند|tr=holand}}, which is quite adapted.
2. Sometimes there's a silent or untransliterated و in خو: {{t+|fa|خوابیده|sc=fa-Arab|tr=xâbide}} "asleep". Is this correct or should we be writing xwâbide?
- AT: xâbide is correct, there are few cases where "consonant + وا" = "consonant + â". Good pick-up, this should be reflected in WT:FA TR. I only remember words where وا follows خ
3. Missing ' against ع, especially word-initially: {{t+|fa|عطر|sc=fa-Arab|tr=atr}} "scent". I assume we should be adding the apostrophe?
- AT: 'atr. For ع it is best to use ' in any position, regardless of the actual pronunciation. Tajik uses ъ after or between vowels, or nothing at the beginning.
(no 4 or 5)
6. Occasional short Latin a against long Arabic ا: {{t+|fa|چمدان|tr=chamedan}} "portmanteau". Is this always a mistake that should be corrected to â?
- AT: Mistake. Correct = čamedân. (There are possibly some loanwords where non-initial ا after consonants is short but I am not aware)
7. No Latin hyphen against ZWNJ: {{t|fa|بی‌معنی|tr=bima'ni}} "nonsense". Should we be auto-adding a hyphen?
- AT: bi-ma'ni
8. Final h or no h against Arabic ه. Under what circumstances is this transliterated as h? Always, never or sometimes, and if sometimes, when? Currently my script deletes final h but I assume this is wrong in general. With final -eh, there appears to be no consistency in whether h appears:
{{t+|fa|چهارشنبه|sc=fa-Arab|tr=čahâr-šanbe}} "Wednesday" vs. {{t|fa|پنجشنبه|sc=fa-Arab|tr=panj-šanbeh}} "Thursday"
In this case should we be removing the h? If so, what about single-syllable -eh? {{t+|fa|زه|sc=fa-Arab|tr=zeh}} "string", {{tt+|fa|مه|tr=meh}} "fog", {{t+|fa|ده|tr=deh}} "village" etc. In other cases, I'm also removing the h but I assume this is wrong:
{{t+|fa|ماه|tr=mâh}} "month" -> mâ (WRONG?)
{{t|fa|راه پیمایی|tr=râh-peymâyi}} "march" -> râ peymâyi (WRONG?)
{{t+|fa|کشتارگاه|sc=fa-Arab|tr=koštârgâh}} "abattoir" -> koštârgâ (WRONG?)
{{t+check|fa|نُه|tr=noh}} -> no (WRONG?)
{{tt+|fa|انبوه|sc=fa-Arab|tr=anbuh}} -> anbu (WRONG?)
- AT:
- panj-šanbe, remove "h" where ه is in the final position after consonants but there may be some compound words where a vowel is different, not "e" (or "a" in classical or Dari).
- meh - don't remove h in single-syllable words
- mâh, koštârgâh, noh, anbuh - don't remove h where the vowel is not "e" (never after long vowels, as in gâh, koštârgâh or anbuh). Don't forget that the final "e" pronunciation applies to modern Iranian Persian, not Dari or classical.
- END AT:
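The three rules in the answer above can be summarised in a small Python sketch. It is not the conversion script discussed here; the function name `strip_final_h` is hypothetical, and counting vowel letters is only a rough stand-in for counting syllables.

```python
# Vowel letters assumed to occur in the transliteration scheme.
VOWELS = set("aâeiouôê")


def strip_final_h(translit):
    """Drop a final "h" only from polysyllabic words ending in "-eh".

    - "panj-šanbeh" -> "panj-šanbe"  (final -eh, more than one syllable)
    - "meh", "zeh", "deh" stay as-is (single syllable)
    - "mâh", "noh", "anbuh" stay as-is (vowel before h is not "e")
    """
    if not translit.endswith("eh"):
        return translit
    # Rough syllable count: one syllable per vowel letter.
    syllables = sum(1 for ch in translit.lower() if ch in VOWELS)
    if syllables <= 1:
        return translit
    return translit[:-1]
```

This sketch cannot tell apart polysyllabic words whose final ه is a real consonant, nor compounds with a different vowel, so a list of exceptions would still need manual review.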
9. Latin hyphen against Arabic space: Is it correct to change hyphen to space here?
{{t+|fa|کدو حلوایی|tr=kadu-halvâyi}} "pumpkin" -> kadu halvâyi (RIGHT OR WRONG?)
{{t+|fa|سلاخ خانه|sc=fa-Arab|tr=sallâx-xâne}} "abattoir" -> sallâx xâne (RIGHT OR WRONG?)
{{t|fa|اس ام اس|sc=fa-Arab|tr=es-em-es}} "SMS" -> es em es (RIGHT OR WRONG?)
{{t+|fa|هرج و مرج|tr=harj-o-marj}} "anarchy" -> harj o marj (RIGHT OR WRONG?)
- AT: There are some inconsistencies in using "-" there, also with Urdu. Some words have alternative spellings with ZWNJ (only between joining types of letters). E.g. سلاخ خانه (sallâx xâne) can also be written as سلاخ‌خانه (sallâx-xâne) (with ZWNJ), کدو حلوایی (kadu halvâyi) or کدوحلوایی (kadu halvâyi) (no ZWNJ, these are non-joining letters). ZWNJ is not always used by all writers; the letters can be joined or have a space between them. The use of "-" is just to show that it is considered one word, or for readability. Perhaps اس ام اس (es em es) or هرج و مرج (harj o marj) shouldn't have hyphens, but IMO "es-em-es" might be easier to read if it's used in a longer sentence. (I transliterated the words in #9 according to my preference, depending on the Perso-Arabic spelling.) I am less sure about the last two examples.
10. Canonicalizing ō to ô and ē to ê: Is this OK?
{{tt|fa|خوروران|tr=xōrvarân}} "west" -> xôrvarân (RIGHT OR WRONG?)
{{tt+|fa|میوه|tr=mēva}} "fruit" -> mêva (RIGHT OR WRONG?)
{{tt+|fa|بارو|tr=bârō}} "wall" -> bârô (RIGHT OR WRONG?)
- AT: Yes, xôrvarân, mêva, bârô.
Benwing2 (talk) 02:36, 12 March 2023 (UTC)
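For question 10, the macron-to-circumflex canonicalization confirmed above could look roughly like the following Python sketch. The function name `canonicalize_mid_vowels` is hypothetical and the mapping covers only the two vowels asked about (ō and ē), not other macron letters.

```python
import unicodedata

# Map macron mid vowels to their circumflex counterparts (both cases).
MACRON_TO_CIRCUMFLEX = str.maketrans({
    "\u014D": "\u00F4",  # ō -> ô
    "\u0113": "\u00EA",  # ē -> ê
    "\u014C": "\u00D4",  # Ō -> Ô
    "\u0112": "\u00CA",  # Ē -> Ê
})


def canonicalize_mid_vowels(translit):
    """Rewrite ō/ē as ô/ê; all other characters are left untouched."""
    # NFC first, so "o" + combining macron is also caught.
    composed = unicodedata.normalize("NFC", translit)
    return composed.translate(MACRON_TO_CIRCUMFLEX)
```

Normalizing to NFC before translating is a design choice made here so that precomposed and decomposed input behave the same way.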
- @Benwing2: I tried to answer your questions. Pls let me know if my answers make sense. --Anatoli T. (обсудить/вклад) 04:53, 12 March 2023 (UTC)
- Thanks! They do make sense. A few more examples of #7:
{{t+|fa|قهوه‌ای|tr=qahvei}} or {{t+|fa|قهوه‌ای|tr=qahve'i}} "brown"
{{t|fa|یک‌شنبه|tr=yekšanbe}} "Sunday"
{{t|fa|شعله‌ور|sc=fa-Arab|tr=šo'levar}} "ablaze"
{{t+|fa|گاه‌شماری|tr=gâhšomâri}} "calendar"
{{t+|fa|باستان‌اخترشناسی|tr=bāstānaxtaršenāsī}} "archaeastronomy"
- AT: I would use قهوهای (qahve-i) because of the ZWNJ (but ' seems OK too (?)). "yek-šanbe", "šo'le-var", "gâh-šomâri", "bâstân-axtaršenâsi".
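Cases like the #7 examples above can be flagged mechanically before deciding where the hyphen goes. The following Python sketch only detects candidates; it does not place hyphens, and the function name `missing_hyphens` is hypothetical.

```python
ZWNJ = "\u200c"  # zero-width non-joiner


def missing_hyphens(persian, translit):
    """Return how many ZWNJs in the Persian spelling have no hyphen
    to pair with in the transliteration (0 means nothing to flag)."""
    return max(0, persian.count(ZWNJ) - translit.count("-"))
```

Because hyphens in the transliteration can also come from other sources (for example the ezafe in "elm-e"), a result of 0 does not prove the ZWNJ is covered; the count is a cheap first filter rather than an alignment.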
- A few more examples of #3:
{{t|fa|علم اشتقاق|tr=elm-e ešteqâq}} "etymology"
{{t+|fa|غیر عادی|tr=qeyre âddi}} "abnormal"
{{t|fa|طبیعتاّ|sc=fa-Arab|tr=tabiatan}} "accordingly"
- AT: " 'elm-e ešteqâq", "ğeyr-e 'âddi", "tabi'atan"
- A few more examples of #6:
- AT: "afgâne", "dowrân talâyi" (the 2nd was totally wrong).
- A couple more issues:
- about assimilation of 'nb' -> 'mb': {{t+|fa|کدو تنبل|tr=kadu tambal}} "pumpkin"; is this a mistake in translit or correct?
- glottal stop in the middle of a word: {{t|fa|کوئیز|tr=kuiz}} "quiz"; should there be an apostrophe here?
- AT: IMO "kadu tanbal" (unless specifically described in WT:FA TR). "ku'iz".
- Benwing2 (talk) 05:18, 12 March 2023 (UTC)
- Answered above with ":::: AT:" --Anatoli T. (обсудить/вклад) 05:34, 12 March 2023 (UTC)
- @Benwing2: On monosyllabic words - there are words where ه can be a vowel or a consonant به (be, “to, for”) or به (beh, “quince”).
- Also when ه is "e", it can precede a ZWNJ + ا or ی or و. Anatoli T. (обсудить/вклад) 04:10, 13 March 2023 (UTC)
A few more questions ... I am focusing on getting the existing conversions mistake-free then I will expand coverage.
1. There are lots of cases of existing a corresponding to Arabic ا. Most of them are clearly mistakes that do need to be converted, but some of them are loanwords, and I want to see whether all of these should have a -> â. (If not, I can make the conversion script output a list of all such conversions, so that we can undo the ones that are wrong.) The following are the questionable cases among the first 5000 pages processed (out of about 35,000 or so pages with Persian translit on them); the three arguments to test() are the Arabic script, the original translit and the translit as converted by my script:
- test("اسپانیا", "espâniya", "espâniyâ") "Spain"
- test("کویت", "koveit", "koveyt") "Kuwait"
- test("ترینیداد و توباگو", "trinidad ve tobago", "trinidâd ve tobâgo") "Trinidad and Tobago"
- test("زبان هاوایی", "zabân havayi", "zabân hâvâyi") "Hawaiian"
- test("عبرانی", "ebrani", "'ebrâni") "Hebrew"
- test("ولاپوک", "volapuk", "volâpuk") "Volapük"
- test("تایلندی", "tailandi", "tâylandi") "Thai"
- test("توندرا", "tundra", "tundrâ") "tundra"
- test("زبان کانارا", "zabân kanara", "zabân kânârâ") "Kannada"
- test("رادیوم", "radiyom", "râdiyom") "radium"
- test("برتاین", "bretayn", "bretâyn") "Brittany"
- test("استالدهید", "asetaldehid", "asetâldehid") "acetaldehyde"
- test("فیلادلفیا", "filadelfia", "filâdelfiyâ") "Philadelphia"
- test("کولا", "kola", "kolâ") "cola"
- test("ماراتی", "marâti", "mârâti") "Marathi"
- test("کامیون", "kamyon", "kâmiyon") "truck"
- test("مالزیایی", "maleziyayi", "mâleziyâyi") "Malay"
- test("براتیسلاوا", "beratislâvâ", "berâtislâvâ") "Bratislava"
- test("کوارت", "kwart", "kwârt") "quart"
- test("پاستا", "pâsta", "pâstâ") "pasta"
- test("ماکروفاژ", "makrofāž", "mâkrofâž") "macrophage"
- AT: Yes, to "â" in all cases.
2. I also have questions about the following three, which may be wrong:
- test("پیشافکند", "piš-afkand", "piš-âfkand") "project"
- test("رایج", "raij", "râyj") "common"
- test("بیادب", "bi-adab", "bi-âdab") "rude"
- AT: Only "râyj" is a correct conversion; in the other cases ا is the first letter of a word (a stem), so it's short, and an initial ا (alef) can also be "e" or "o". Cf. افکند (afkand), ادب (adab). #1 and #3 are compound words.
3. There is this one, should it be "kwârk"? (If so it needs to be done manually.)
- test("کوارک", "kuārk", "kuârk") "quark"
- AT: "kvârk" (per https://forvo.com/word/کوارک/ even if it sounds like /w/). This case is not in [[WT:FA TR]], though, so it could be "w", "v" or "u". "v" matches the current policy more closely (?). Let's decide on "kwârt" and "kwârk" vs "kvârt" and "kvârk".
4. In this one, should it be "nega dâštan"?
- test("نگه داشتن", "negah dāštan", "negah dâštan") "stop"
- AT: "negah dâštan"
5. Finally, I found a couple of cases with final ZWNJ. Are these errors?
- test("فهرست پیگیری", "fehrest-e peygiri", "fehrest-e pey-giri")
- test("گم", "gom", "gom")
- AT: "gom" and "fehrest-e pey-giri" are correct
Thanks. Benwing2 (talk) 04:55, 13 March 2023 (UTC)
- Oh, one more question: Arabic و, should it always be v (except in diphthongs), always be w, or be left alone whatever it currently is (this is what is currently happening)? Also, I am currently leaving final h alone whatever it is, so the things you mentioned above about monosyllables ending in ه won't be an issue. I will fix things so that polysyllabic words ending in -eh are changed to end in -e, if that makes sense to you. Also, you said this:
- AT: it's "v" in front of vowels, "ow" in a diphthong. The rest is not described properly in the policy. E.g. واو (vâv). It's "vâv" but maybe it should be "vâw"?
- AT: With the final ه, it should be based on pronunciation but unfortunately some people write "-eh", even knowing there is no "h" as in "panj-šanbeh" instead of the correct پنجشنبه (panj-šanbe). Need to also pay attention to compound words. I can imagine there could be a small number of words where the last "ه" after a consonant is actually "eh", not "e"
- Also when ه is "e", it can precede a ZWNJ + ا or ی or و.
- Can you give me a few examples of this? Benwing2 (talk) 04:58, 13 March 2023 (UTC)
- @Benwing2: Your example قهوهای (qahve-i) above. [EDITED with answers above].--Anatoli T. (обсудить/вклад) 05:47, 13 March 2023 (UTC)
- Thanks, this is helpful. I see you corrected فیلادلفیا filâdelfiâ to filâdelfiyâ, is this always the case that there should be a y between i and a following vowel? If so I can have my script add it. As for کامیون kâmiyon, it looks like you added an i here. I don't think I can automatically add the i unless it's never possible to have a y following a consonant (which seems doubtful). As for v vs. w, should the policy be that kw- should be written with a w (and presumably also xw-, gw-, qw-, ğw-?) and all other cases of و outside of diphthongs aw ow and when not representing u should be written as v? Are there words with word-internal -kw-, -xw-, -gw-, etc.? I seem to have found some:
{{t+|fa|دادخواست|tr=dâdxwâst}} (a compound?), {{t+|fa|کیفرخواست|tr=keyfarxwâst}} (a compound?), {{t+|fa|ناخواهری|tr=nâxwâhari}}, {{t+|fa|درخواست کردن|tr=darxwâst kardan}}, {{t|fa|نانخواه|tr=nânaxwâh}} "ajwain", {{l|fa|دستر خوان|t=table-cloth; a meal setting|tr=dastarxwân}}. There are also words with -xv- in them: {{t+|fa|گیاهخوار|tr=giyâhxvar}} "herbivore", {{l|fa|مخور|tr=moxvar}}, {{der|tr|fa|ترخوانه|tr=tarxvâne}}, {{cog|fa|دادخواه||complainant, plaintiff|tr=dâdxvâh}}, {{fa-noun|tr=naxvāz, noxāz}}, {{l|fa|چرخوار|tr=čarxvâr}} etc. With -kv- I find {{t|fa|تکواژ|tr=takvâž}}, {{t|fa|اکوادور|tr=ekvâdor}}, {{t+|fa|تکواژه|tr=takvâže}}, {{desc|fa|تکواندو|bor=1|tr=tekvândo}}, {{desc|fa|تکور|bor=1|tr=takvor}}. Benwing2 (talk) 06:29, 13 March 2023 (UTC)
- @Benwing2: Yes, it's filâdelfiyâ, râdiyom, kâmiyon. ـیا and "-iyâ" and ـيه "-iye" are common with geographical names.
- Some of the cases with "خوا" should actually be "xâ", not "xwâ" with a silent "و", at least if we use modern Iranian as the default.
- کیفرخواست - keyfarxâst (yes, it's a compound but ر is a non-joining letter, so no ZWNJ is used here)
- خواستن -xâstan
- ناخواهری - nâxâhari
- دادخواه - dâdxâh
- درخواست - darxâst
- etc.
- But
- چرخوار - čarx-vâr (compound, čarx + vâr)
- Otherwise, the preference seems to be kv-, gv-. I am neutral on this
- اکوادور - ekvâdor (preferred by users) but "ekwâdor" probably makes more sense and is in line with many other cases?
- Anatoli T. (обсудить/вклад) 06:58, 13 March 2023 (UTC)
- It seems "خوا" is always "xâ" - there are too many words (unless خ + و belong to different parts of a word). I wonder if editors add "w" knowing it's silent because of the Perso-Arabic spelling or for etymological reasons? Anatoli T. (обсудить/вклад) 07:01, 13 March 2023 (UTC)
- OK, it sounds like I should (a) allow missing w across from و if followed by خ; (b) leave w alone if following k, x, g??; (c) otherwise convert to v (except in a diphthong, as mentioned above). Benwing2 (talk) 07:02, 13 March 2023 (UTC)
- @Benwing2: Yeah, sounds good. I'd like to add the missing bits to WT:FA TR for clarity regarding these cases. "kv" or "gv" would be weird, knowing that Persian dislikes initial consonant clusters, so "kw" or "gw", etc. would be more acceptable. So "v" only if it's the consonant at the beginning of a syllable, and "w" in a cluster or as the final element of a diphthong.
- > "allow missing w across from و if followed by خ". Surely you meant preceded by خ? :) Anatoli T. (обсудить/вклад) 07:09, 13 March 2023 (UTC)
- @Benwing2: The use of "i" before "y" is indeed confusing and inconsistent.
- These are pronounced without any "i" before /j/ to my ear:
- https://forvo.com/word/روسیه/#fa: روسیه - rusye
- https://forvo.com/word/مریم/#fa: مریم - maryam
- But there is a light "i" here:
- https://forvo.com/word/اسپانیا/#fa: اسپانیا - espâniya
- What do you think? Anatoli T. (обсудить/вклад) 07:34, 13 March 2023 (UTC)
- فیلادلفیا (Philadelphia) has an "i" in forvo but "رادیوم" (radium) hasn't. Anatoli T. (обсудить/вклад) 07:37, 13 March 2023 (UTC)
- Hard to say in those Forvo pronuns. I hear a light /i/ in at least 2 of 3 pronuns of 'rusye' but not in 'maryam' and maybe in 'espâniya'; but keep in mind that English doesn't normally have /sj/ sequences, which assimilate to /ʃ/ so it may be just my untrained ear. As for خو, yes, preceded by خ. BTW what about û? I am finding lots of existing translits with û in them opposite و:
{{t|fa|تولبره|tr=tūlbare}}, {{t+|fa|ساروج|sc=fa-Arab|tr=sârûj}}, {{t+|fa|فزونی|tr=fozūnī}}, {{t+|fa|ققنوس|tr=qoqnūs}}, etc. (and also lots with unmatched u that I assume should be o: {{t+|fa|بردوان|tr=burdvân}}, {{desc|fa|رخش|t=light|tr=ruxš}}, {{noncog|fa|بیشمار|tr=baySHumâr}}, {{l|fa|پرتگالی|tr=purtagâlî}}, etc.). Is it OK to make the conversion û/ū -> u, î/ī -> i, unmatched u -> o, unmatched i -> e or do we have to worry e.g. about Dari/Classical or something where these might be legit? Benwing2 (talk) 08:20, 13 March 2023 (UTC)
- @Benwing2: Your assumptions are correct about the conversion. Unless there is a qualifier with Dari (or Afghanistan)/classical, then treat them as modern Iranian. They look like they are.
- 'espânya' is also preferred to 'espâniya', 'rusye' to 'rusiye' if we assume that modern Iranian doesn't even have a short "i".
- Or, have both 'rusiye' and 'rusye' vocalised as روسیِه, روسیَه.
- Or, leave it for later to discuss with native speakers on a vote, etc. Anatoli T. (обсудить/вклад) 08:33, 13 March 2023 (UTC)
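The long-vowel conversions agreed so far (û/ū -> u, î/ī -> i, ō -> ô, ē -> ê, and ā -> â as in the test() outputs above) can be sketched as a plain character mapping. This is only an illustrative sketch with a hypothetical function name, not the actual conversion script; the "unmatched u -> o" and "unmatched i -> e" rules need the Persian spelling for matching and are not covered here.

```python
import unicodedata

# Hypothetical helper (not the real script): map long-vowel notations
# to the forms agreed in the discussion above.
LONG_VOWEL_MAP = str.maketrans({
    "\u016b": "u",       # ū -> u
    "\u00fb": "u",       # û -> u
    "\u012b": "i",       # ī -> i
    "\u00ee": "i",       # î -> i
    "\u0101": "\u00e2",  # ā -> â
    "\u014d": "\u00f4",  # ō -> ô
    "\u0113": "\u00ea",  # ē -> ê
})


def canon_long_vowels(translit: str) -> str:
    """Canonicalize long-vowel letters in a Latin transliteration."""
    # Normalize to NFC first so that e.g. "u" + combining macron is
    # seen as the single precomposed character before mapping.
    return unicodedata.normalize("NFC", translit).translate(LONG_VOWEL_MAP)


if __name__ == "__main__":
    for tr in ["fozūnī", "qoqnūs", "sârûj", "xōrvarân", "mēva"]:
        print(tr, "->", canon_long_vowels(tr))
```

A mapping like this is safe to apply without looking at the Persian script, which is presumably why it fits the "self-canonicalization" fallback better than the unmatched-vowel rules.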
I'm getting pretty close. Latest failure rate trying to match Persian spelling with Latin translit is 2890/51759 = 5.6% which is pretty good given how messed up so many of the transliterations are. My script is able to vocalize the Persian spelling in the process; this is what I did as well in the original version of the script, which handled Arabic. Is this something we want to do or should we put it off till later? Benwing2 (talk) 03:39, 14 March 2023 (UTC)
- @Benwing2: Thanks for the update. Take your time. Apologies for the many pings - there's always something that comes to mind when I think I missed something. Anatoli T. (обсудить/вклад) 03:41, 14 March 2023 (UTC)
- No problem with the pings. I have a question about unmatched â, which is frequent. It's one of the following:
- Long â when it seemingly should be short. Are these all just mistakes? (but note these which are not a mistake: tr_matching(الهه, elâhe); tr_matching(الهیات, elâhiyât))
- tr_matching(جارو برقی, jâru bârqi); tr_matching(راک اند رول, râk ând rol); tr_matching(سیبرنتیک, sâybernetik); tr_matching(تایلند, tâylând); tr_matching(نشان دادن, nešân dadân); tr_matching(متکلم به چند زبان, motekâllem be čand zabân); tr_matching(بلروسی, belârusi); tr_matching(شمارههای طبیعی, šomārehā-ye tabī'ī) [this is a weird one]; tr_matching(مایکروویو, mâykrovâyv); tr_matching(غلط املایی, ğalaat-e emlâyi); tr_matching(روشنگری, rowšangâri)
- AT: "â" -> "a" for all. When you see a long "â" for an unwritten vowel, as in برقی, or for a plain alif at the beginning of a word or after ZWNJ, you can substitute a plain "a".
- Long â against alif madda in the middle of a word not after ZWNJ (but usually after a non-joining letter); should we write an apostrophe or hyphen here?
- tr_matching(درآمدن, darâmadan); tr_matching(درآشامنده, dārāšāmane); tr_matching(سود سوزآور, sud-e suzâvar); tr_matching(پاپوآ گینهٔ نو, pâpuâ gine-ye now); tr_matching(فرآویز, farâviz); tr_matching(کوآلا, koâlâ); tr_matching(رآکتور, reâktor); tr_matching(جنوآ, jenoâ); tr_matching(کروآت, koroât); tr_matching(زردآلو, zardâlu); tr_matching(برآمدن, barâmadan); tr_matching(پدیدآورنده, padidâvarande); tr_matching(مآبانه, -meâbâne); tr_matching(گاوآهن, gâvâhan); tr_matching(میوآن, myuân)
- AT: Let's use a hyphen. It matches what the German-Persian Langenscheidt does (it's imperfect but good).
- Long â against FARSI YEH in final position where it should be alif maqsuura; is this a spelling mistake in the Persian or should we allow the â?
- tr_matching(مصلی, mosallâ); tr_matching(خنثی, xonsā); tr_matching(موسی, Musâ); tr_matching(عیسی, 'isâ); tr_matching(حتی, hattâ)
- AT: There are cases where ی is read as "â" in Arabic loanwords, e.g. عیسی, 'isâ. Also in Urdu.
- Long â against regular alif that should be alif madda; is this a spelling mistake in the Persian or should we allow it?
- tr_matching(اما, âmma)
- AT: A mistake.
- Long â against fatha + alif; is this vocalization correct for Persian?
- tr_matching(زُرنَاپَا, zurnāpā); tr_matching(زُرنَا, zurnā); tr_matching(آسَامِی, âsâmi)
- AT: A fathe (zebar) is uncommon before an alef but it's not incorrect and doesn't affect anything negatively. (On the other hand, a kasre (zir) before a "yeh" or a zamme (pish) before a vâw would result in "ey" and "ow"). I would use زُرناپا, zornâpâ, زُرنا, zornâ, آسامی âsâmi (note: I have removed the zir on the last one, otherwise it would be âsâmey).
- A weird case:
- tr_matching(علیرغم, alâraqm)
- Benwing2 (talk) 04:11, 14 March 2023 (UTC)
- Answered above. --Anatoli T. (обсудить/вклад) 04:43, 14 March 2023 (UTC)
- Thanks! BTW you didn't answer my question above about vocalizing the Persian script. I assume we should hold off on this? Benwing2 (talk) 04:53, 14 March 2023 (UTC)
- @Benwing2: I personally support vocalisation. Wiktionary:Beer_parlour/2021/December#Persian_automated_transliteration showed support, even if native speakers were cold about it. From the perspective of learners, it's definitely better. It's also untrue that Persian vocalisation is non-existent.
- The pronunciation sections could also display alternative vocalisations where they differ. My example غِیرَت (ğeyrat) - modern Iranian vs غَیرَت (ğayrat) - Dari or classical in Wiktionary:Grease_pit/2023/March#Bot_request_for_Persian_transliterations_(fa)
- Module:fa-translit works pretty well based on vocalisations but it needs to cover more test cases and there are exceptions. Anatoli T. (обсудить/вклад) 05:06, 14 March 2023 (UTC)
- @Benwing2: Urdu vocalisation has been already happening full-on but it didn't get the same coverage as Arabic and modules/templates are behind. Anatoli T. (обсудить/вклад) 05:08, 14 March 2023 (UTC)
- Thanks! I have a question about the character sequence هٔ (U+0647 U+0654) that frequently occurs opposite Latin -ye. Examples:
- tr_matching(امارات متحدهٔ عربی, emârât-e mottahede-ye 'arabi)
- tr_matching(پاپوآ گینهٔ نو, pâpuâ gine-ye now)
- tr_matching(آرایهٔ ادبی, ârâye-ye adabi)
- tr_matching(چفتهٔ زانو, čafte-ye zānū)
- There is also a similar single-char version ۀ (U+06C0) that my script correctly handles.
- Is the two-char version acceptable or a mistake? I ask because if it's a mistake I'll canonicalize it to the one-char version and correct the Persian script in the examples I find. Benwing2 (talk) 05:51, 14 March 2023 (UTC)
- BTW as for vocalization it sounds like we should wait but try to push for consensus. Who are the currently active native-speaker editors? Benwing2 (talk) 05:53, 14 March 2023 (UTC)
- @Benwing2: They are visually identical, so I couldn't tell, which one is right.
- The Persian keyboard offers هٔ (U+0647 U+0654). So this must be right one.
- You can see who has been active by calling {{subst:wgping|fa}} but only Sameerhameedy is active lately, less so ZxxZxxZ, who is a long time contributor (opposes vocalization). Anatoli T. (обсудить/вклад) 06:18, 14 March 2023 (UTC)
- OK thanks. ZxxZxxZ's name appears in the comments of some modules I have written because they were an early contributor. I checked the current Persian lemmas and the single-char ۀ doesn't occur in any lemmas but the two-char version هٔ (which looks different in my font; the hamza above is separated by significantly more whitespace from the heh) occurs in two lemmas: چلهٔ تابستان and چلهٔ زمستان. More evidence that the two-char version is right. Another weird char that shows up sometimes is ہ, which is HEH GOAL. Wiktionary calls this choṭī he and says it's an Urdu char. I take it this is a mistake and should be replaced with regular ه? Benwing2 (talk) 06:44, 14 March 2023 (UTC)
- @Benwing2: Yes, to both questions - bring to two char version and "choṭī he" is a mistake, not used in Persian. Anatoli T. (обсудить/вклад) 06:52, 14 March 2023 (UTC)
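The two Persian-script fixes agreed here (single-char ۀ U+06C0 to the two-char هٔ U+0647 U+0654, and Urdu ہ HEH GOAL U+06C1 to regular ه U+0647) could look roughly like this. It is a minimal sketch with a hypothetical function name, assuming plain string replacement is enough; it is not taken from the actual script.

```python
# Hypothetical helper (not the real script): canonicalize heh variants
# in Persian-script text, per the decisions in the discussion above.
HEH = "\u0647"                  # ARABIC LETTER HEH
HAMZA_ABOVE = "\u0654"          # ARABIC HAMZA ABOVE (combining)
HEH_WITH_YEH_ABOVE = "\u06c0"   # single-char form
HEH_GOAL = "\u06c1"             # Urdu "choti he", not used in Persian


def canon_heh(persian: str) -> str:
    """Replace the one-char form with heh + hamza above, and HEH GOAL with heh."""
    persian = persian.replace(HEH_WITH_YEH_ABOVE, HEH + HAMZA_ABOVE)
    return persian.replace(HEH_GOAL, HEH)


if __name__ == "__main__":
    sample = "\u06af\u06cc\u0646\u06c0"  # "gine" written with the one-char form
    fixed = canon_heh(sample)
    print([f"U+{ord(ch):04X}" for ch in fixed])
```

Explicit replacement is used rather than Unicode normalization so that only these two characters are touched.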
I dealt with a bunch of other little issues with weird Arabic chars as well as Latin chars in the Persian script e.g. <br> and {{...}}. Going to sleep now but I'm basically ready to run it; I just have to ensure that existing short vowel diacritics in the Persian script don't cause problems and add an option to turn off auto-vocalization since we're apparently not doing it now (?). Latest failure rate is 2332/53890 = 4.3%. This still leaves 2000+ cases to check manually but I am hitting the point of diminishing returns, and the manual checking can be sped up by doing it in a text file and pushing all the changes at once. Benwing2 (talk) 08:54, 14 March 2023 (UTC)
- @Benwing2: Thank you very much! I'll try to fix any issues if they occur (over time). --Anatoli T. (обсудить/вклад) 10:53, 14 March 2023 (UTC)
- @Benwing2 Hi,
- Another case where a Persian "ye" is "â" قرون وسطی (fa) (qorun-e vostâ) (Middle Ages).
- German-Persian dictionary:
- https://en.langenscheidt.com/german-persian/mittelalter
- It's interesting that in Urdu, they want to vocalise it with یٰ as in قُرُونِ وُسْطیٰ (qurūn-e vust̤á) (Middle Ages), cf. مَشْرِقِ وُسْطیٰ (maśriq-e-vust̤á) (Middle East).
- It's probably a good alternative to zebar (fatha) but I have yet to find such usage in Persian. Anatoli T. (обсудить/вклад) 00:16, 15 March 2023 (UTC)
- Thanks! Failure rate is now 2248/73918 = 3.0% after implementing support for Persian-specific templates; multiple translits (comma-separated or tilde-separated, etc.) opposite a single Persian-script spelling; fatha+alif opposite â; and a few other things. I need a few more changes in the driver, then I will run it. Benwing2 (talk) 01:33, 15 March 2023 (UTC)
- BTW that is a dagger alif (alif xanjariyya) over the alif maqsuura, an old Koranic usage. Benwing2 (talk) 01:34, 15 March 2023 (UTC)
- @Benwing2: Thanks.
- Yes, I know about alif xanjariyya as in رَحْمٰن (raḥmān) but is it supposed to be used over the alif maqsuura? In Arabic alif maqsuura is vocalised with a fatha: الْعُصُور الْوُسْطَى (al-ʕuṣūr al-wusṭā) or قُصْوَى (quṣwā).
- I guess a serious difference between Persian/Urdu and Arabic is that Arabic has both letters yaa' and alif maqsuura but in Urdu and Persian "ye" is used for both. A fatha before a "ye" would mean "ay" but an alif xanjariyya before "ye" makes it a long "â". Anatoli T. (обсудить/вклад) 01:56, 15 March 2023 (UTC)
- Yes exactly. I have occasionally seen dagger alif over alif maqsuura in vocalized Arabic as well but it's not standard. Benwing2 (talk) 02:02, 15 March 2023 (UTC)
- @Benwing2: Thanks. If we start considering vocalisations, need to address some of the corner cases. There may not be a definite way, since vocalised Persian is much less common.
- Almost predictably, a silent و is not vocalised in خوانْدَن (xândan) when a vocalisation is needed. Anatoli T. (обсудить/вклад) 02:10, 15 March 2023 (UTC)
- Yeah, the non-usage of sukuun in Persian will potentially cause issues here. Maybe we should always assume و is silent in خوا. BTW the current Persian templates are kind of messy and should be cleaned up. E.g. {{fa-adv}}, {{fa-conjunction}}, {{fa-interjection}}, {{fa-phrase}}, {{fa-preposition}}, {{fa-pronoun}} don't really accomplish anything and IMO should be eliminated in favor of directly calling {{head}}, and some of the other templates have weird and non-standard param usages that could stand to be cleaned up and standardized. There's also things like {{fa-verb/new}}, {{fa-IPA/old}} etc. that are in a halfway state. Maybe after the translit cleanup I will try to clean these templates up, any objections? Do we need to bring in other Persian editors (if they still exist)? Benwing2 (talk) 02:23, 15 March 2023 (UTC)
- BTW I didn't run my test script on terms with manual translit in headwords (Special:WhatLinksHere/Template:tracking/headword/has-manual-translit/fa) but only on terms with manual translit in links (Special:WhatLinksHere/Template:tracking/links/manual-tr/fa). This adds a bunch more cases, with some overlap with the previous set of cases. Failure rate here is 1514/38603 = 3.9% but I'm not yet handling all the conjugation/declension templates. Benwing2 (talk) 02:44, 15 March 2023 (UTC)
- @Benwing2: sukuun is best to use to avoid confusions in most cases, "خوا" is a special case but it's used in خوانْدَن over nuun. This reminds me more of Arabic sun letters where there is nothing (unmarked) over laam in الشَّمْس (aš-šams).
- No objections to cleaning up from me. I am converting some from {{fa-IPA/old}} to {{fa-IPA}}. You can always wgping:fa the group in WT:BP. Anatoli T. (обсудить/вклад) 02:47, 15 March 2023 (UTC)
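For reference, the failure rates quoted in this thread can be rechecked with a few lines of arithmetic. The counts are copied from the posts above; the labels and the function name are only descriptive and hypothetical.

```python
# Recompute the failure rates quoted above (failed matches / total items).
def failure_rate(failed: int, total: int) -> str:
    """Return the failure rate as a percentage with one decimal place."""
    return f"{failed / total:.1%}"


runs = [
    ("links, 14 March (first figure)", 2890, 51759),
    ("links, 14 March (second figure)", 2332, 53890),
    ("links, 15 March", 2248, 73918),
    ("headwords, 15 March", 1514, 38603),
]

if __name__ == "__main__":
    for label, failed, total in runs:
        print(f"{label}: {failed}/{total} = {failure_rate(failed, total)}")
```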
One more question, about stress in Persian. The {{fa-decl-c}} and {{fa-decl-c-unc}} templates have a param 4= that is a translit with stress mark added. You can see an example of this in شهادت. (Some uses of this template do not specify 4=, e.g. گوز.) Is the stress predictable and is it worth keeping the existing stress marks in uses of this template? My script normally removes all acute and grave accents from the Latin translit. Benwing2 (talk) 01:22, 16 March 2023 (UTC)
- @Benwing2: Yes, the stress is predictable but there are a few exceptions. Probably not worth keeping. The inflection templates will need to be reviewed so that they are automatically added if someone objects. Anatoli T. (обсудить/вклад) 01:32, 16 March 2023 (UTC)
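Removing acute and grave accents while keeping the other diacritics used in the translit (â, š, ž, ğ, č) can be done by decomposing, dropping the two combining marks and recomposing. The following is a minimal sketch under that assumption, with a hypothetical function name; it is not the actual script, and the stress placement in the sample is made up purely for illustration.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
GRAVE = "\u0300"  # COMBINING GRAVE ACCENT


def strip_stress(translit: str) -> str:
    """Remove acute/grave stress marks but keep circumflex, caron, breve etc."""
    # NFD splits each accented letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", translit)
    stripped = decomposed.replace(ACUTE, "").replace(GRAVE, "")
    # NFC recomposes what is left, e.g. "a" + circumflex back into one char.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    # Illustrative input only; the stress mark position here is invented.
    print(strip_stress("\u0161ah\u00e2d\u00e1t"))
```

Working on the decomposed form also handles a vowel that carries both a circumflex and a stress mark.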
Oh yeah one other thing, my script currently converts e.g. bâğ-e-vahš to bâğ-e vahš opposite باغ وحش. I assume this is fine but just want to make sure. Benwing2 (talk) 01:27, 16 March 2023 (UTC)
- @Benwing2: Yep. That's fine. I have sometimes added redundant hyphens in Persian and Urdu translations for etymological reasons or to match whole words in source or equivalent languages (Hindi, English or Persian). E.g. चिड़ियाघर (ciṛiyāghar) = چڑیا گھر (ciṛiyā-ghar) -> "ciṛiyā ghar". Or because some dictionaries I used, have them.
- Fine to convert when it's unjustified. Anatoli T. (обсудить/вклад) 01:34, 16 March 2023 (UTC)
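The bâğ-e-vahš -> bâğ-e vahš change can be sketched as follows. This is a guess at one safe way to do it, not the actual script: the function name is hypothetical, and the check that the Persian spelling has more space-separated words than the translit is an assumption added here so that hyphens are only touched when the Persian script really has a space.

```python
import re


def fix_ezafe_hyphen(persian: str, translit: str) -> str:
    """Turn "X-e-Y" into "X-e Y" when the Persian spelling has a space there."""
    # Assumption: only rewrite when the Persian script has more
    # space-separated words than the transliteration does.
    if len(persian.split()) > len(translit.split()):
        return re.sub(r"-e-", "-e ", translit)
    return translit


if __name__ == "__main__":
    print(fix_ezafe_hyphen("باغ وحش", "bâğ-e-vahš"))
```

Cases like "es-em-es" are left alone by this pattern, since they contain no bare "-e-" sequence.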
- OK thanks. I found a bunch more cases of stressed vowels in the translit of verbal conjugations, e.g. تبریک گفتن, انزال کردن, مردن. All of these will have the stress marks removed from the translit; just want to make sure that's OK. Benwing2 (talk) 01:45, 16 March 2023 (UTC)
- BTW do you have a good reference for Persian verbs? Benwing2 (talk) 01:48, 16 March 2023 (UTC)
- @Benwing2: The grammar is not too complex, usually there are changes inside the stem, as in Arabic, but finding info on irregular noun plurals may be sometimes difficult, especially for loanwords from Arabic. Our current verb templates are not bad.
- I used "Complete ..."(aka "Teach Yourself ...") and "Beginners ..." series. I have just emailed you "Persian: A Comprehensive Grammar". Anatoli T. (обсудить/вклад) 02:16, 16 March 2023 (UTC)
- A weird case: درود which tries to list both Classical and modern Iranian translits. Benwing2 (talk) 02:40, 16 March 2023 (UTC)
- @Benwing2: Ha-ha. Far out! Multiversion transliteration is not supported (yet). Use modern Iranian "dorud" on درود and "kud" on کود. We need to pick just one standard (for now, at least).
- Do you want me to manually change it? Anatoli T. (обсудить/вклад) 03:20, 16 March 2023 (UTC)
- This one too: کود. A case which my script gets wrong: گلئون which gets canonicalized from glu'on to glo'on. Not sure how to avoid this as the word does not have a و corresponding to the long ū in the first syllable. Benwing2 (talk) 02:58, 16 March 2023 (UTC)
- Yeah it should probably be changed. Other cases: دروازه, پودنه, اسپرزه. Benwing2 (talk) 03:30, 16 March 2023 (UTC)
- Another thing: I am removing final -h in -eh in a multisyllabic word. I changed it to also do this before a hyphen, but only hyphen + consonant, so farmânde-hâ but farmândeh-e; is this correct? Benwing2 (talk) 03:32, 16 March 2023 (UTC)
- @Benwing2: I've just fixed دروازه.
- No, it should be "farmânde-ye" Anatoli T. (обсудить/вклад) 03:38, 16 March 2023 (UTC)
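The final -eh rule, including the correction to "farmânde-ye", might be sketched like this. The names are hypothetical, and counting vowel letters is only a rough stand-in for "multisyllabic"; as noted earlier in the thread, a small number of words may genuinely end in "eh", so the output would still need review.

```python
import unicodedata

VOWELS = "a\u00e2eiou"  # a, â, e, i, o, u


def count_vowels(part: str) -> int:
    """Rough syllable count: number of vowel letters in the part."""
    return sum(ch in VOWELS for ch in part)


def drop_final_h(translit: str) -> str:
    """Change final -eh to -e in multisyllabic parts; a following ezafe -e becomes -ye."""
    words = []
    for word in unicodedata.normalize("NFC", translit).split(" "):
        parts = word.split("-")
        for i, part in enumerate(parts):
            if part.endswith("eh") and count_vowels(part) >= 2:
                parts[i] = part[:-1]
                # After a vowel-final stem the ezafe is written "ye".
                if i + 1 < len(parts) and parts[i + 1] == "e":
                    parts[i + 1] = "ye"
        words.append("-".join(parts))
    return " ".join(words)


if __name__ == "__main__":
    for tr in ["farmândeh-hâ", "farmândeh-e", "panj-šanbeh", "beh"]:
        print(tr, "->", drop_final_h(tr))
```

Monosyllables such as "beh" are left untouched, which matches the decision to leave final h alone in those words.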
- OK, thanks. The remaining cases with multiple translit (I think this is all): سنبوسه, شوکران, پودینه, مردار سنگ, اسفرزه, اسپغول, اسبغول. Benwing2 (talk) 03:43, 16 March 2023 (UTC)
- @Benwing2: Done. The classical transliterations were added by User:Fay Freak. I am not sure if multiple transliterations are sustainable. Anatoli T. (обсудить/вклад) 03:58, 16 March 2023 (UTC)
- Thanks! IMO multiple translits should only be used with multiple lemmas (which are now supported by User:Theknightwho's changes, but more work is required). It's very common to have multiple translits giving alternative vocalizations, e.g. {{alter|fa|زفان|زوان|زوون|زبون|tr=zofân, zafân|tr2=zovân, zavân|tr3=zoun, zowun|tr4=zabun}}, {{alter|fa|آینه|tr=āyina ~ āyēna}}, {{m|fa|محاربه||war|tr=muhâraba or mohârabe}}, {{head|fa|adjective|tr=gawd/gōd}}, variously separated by comma, tilde, slash, semicolon, italicized or, etc. My script tries to recognize this and match-canonicalize the various translits individually and then join them together again with a comma. Some of these translits represent dialectal or Classical pronunciations which IMO shouldn't be present in most cases. Benwing2 (talk) 04:13, 16 March 2023 (UTC)
- @Benwing2: Yeah, I know about commas, I meant separating them by variety, sense, etc. with a label inside.
- muhâraba/mohârabe are obvious Dari/modern Iranian pairs and some others Anatoli T. (обсудить/вклад) 04:43, 16 March 2023 (UTC)
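The split-then-rejoin handling of multiple translits described above can be illustrated with a small sketch. The separator list follows the post (comma, tilde, slash, semicolon, "or"); the function names and the regular expression are assumptions, wikitext italics around "or" are not handled, and the per-item canonicalizer is passed in as a placeholder.

```python
import re

# Separators named in the discussion: comma, tilde, slash, semicolon, "or".
SEPARATORS = re.compile(r"\s*[,~/;]\s*|\s+or\s+")


def split_translits(tr: str) -> list:
    """Split a tr= value into its individual transliterations."""
    return [part for part in SEPARATORS.split(tr) if part]


def canonicalize_multi(tr: str, canon=lambda s: s) -> str:
    """Canonicalize each transliteration separately, then rejoin with a comma."""
    return ", ".join(canon(part) for part in split_translits(tr))


if __name__ == "__main__":
    for tr in ["zofân, zafân", "āyina ~ āyēna", "muhâraba or mohârabe", "gawd/gōd"]:
        print(repr(tr), "->", repr(canonicalize_multi(tr)))
```

Keeping the splitter separate from the canonicalizer makes it easy to plug in whatever per-item conversion is being tested.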
One more question ... I have been canonicalizing ey + vowel to iy + vowel. Is it correct to do the same thing to eyy + vowel? There are some places where it seems to work well, e.g. فعالیت variously transliterated as fa'âliyyat or fa'âleyyat on the same page, but there's also the case of سید which is given the IPA of sayyid and the current transliteration of seyyed, which my script would canonicalize to siyyed, which may not be correct. (Another such case: عیاشی with current translit 'eyyâši.) Benwing2 (talk) 05:13, 16 March 2023 (UTC)
- BTW if my script can't match the Persian and translit, it falls back to "self-canonicalization" of each individually, which only does changes that are relatively safe. Most of the time this fixes problems, but occasionally it makes mistaken translits worse (so many existing translits are all messed up). Once I run the script I will publish a list of cases where it couldn't match and self-canonicalization made a change to the translit or the Persian, so we can review them and fix the mistakes. Benwing2 (talk) 05:18, 16 March 2023 (UTC)
- @Benwing2: It's the same problem as with 'espâniya'. You correctly observed that a short "i" happens before "y" in some words but it's more like an Arabic nisba (sg or pl) but there are words where "e" before "y" is legit as well, like "seyyed" or "'eyyâši". It has to do with the etymology, which the script won't be able to pick up. "seyyed" is from "sayyid" with a common vowel change, same with 'eyyâši, which is from ʕayyāšī. Anatoli T. (обсудить/вклад) 06:26, 16 March 2023 (UTC)
- Hmm, what about single ey + vowel -> iy + vowel, is this also problematic? Benwing2 (talk) 06:30, 16 March 2023 (UTC)
- @Benwing2: Can you give an example or two of that? Anatoli T. (обсудить/вклад) 06:32, 16 March 2023 (UTC)
- It's to correct e.g. Ligureyâ (= Liguria) -> liguriyâ and ریاض reyâz -> riyâz. I think it may be wrong though as it corrects eyâlât -> iyâlât and سیک siyak, seyek -> siyak, siyek. Benwing2 (talk) 06:40, 16 March 2023 (UTC)
- @Benwing2: It's my guess here. It should be for modern Iranian لیگوریا "liguriyâ" (espâniyâ case) and the rest - ریاض "reyâz" and سیک "seyek" (= سهیک se-yek). Anatoli T. (обсудить/вклад) 07:56, 16 March 2023 (UTC)
- OK, thanks, I'll make it so that -eyâ -> -iyâ but otherwise i or e before y stays as-is. Benwing2 (talk) 07:58, 16 March 2023 (UTC)
- @Benwing2: The transliteration for سیاحت as "seyâhat" seems attestable. Anatoli T. (обсудить/вклад) 08:07, 16 March 2023 (UTC)
- I am making the change -eyâ -> -iyâ only word finally; hopefully that is safe. Benwing2 (talk) 08:10, 16 March 2023 (UTC)
- @Benwing2: Yep, hopefully. It may include plurals, other word forms and derivatives. Anatoli T. (обсудить/вклад) 08:13, 16 March 2023 (UTC)
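The rule settled on above (rewrite -eyâ to -iyâ only at the end of a word, and otherwise leave i or e before y alone) can be illustrated with a minimal sketch. This is not Benwing2's actual script, which is not shown in this discussion; the function name `canon_final_eya` and the regular expression are assumptions made purely for illustration.

```python
import re
import unicodedata

# Hypothetical pattern: "eyâ" (â written as \u00e2) immediately before a word
# boundary, i.e. only at the very end of a word.
FINAL_EYA = re.compile(r"ey\u00e2\b")


def canon_final_eya(translit: str) -> str:
    """Rewrite word-final -eyâ to -iyâ; leave every other e/i before y alone."""
    # Normalize first so that "a" + combining circumflex is treated like "â".
    text = unicodedata.normalize("NFC", translit)
    return FINAL_EYA.sub("iy\u00e2", text)
```

Under this sketch a word-final case such as ligureyâ is rewritten, while seyâhat, reyâz and eyâlât are left as they are. Plurals and derivatives, where the sequence is no longer word-final, are deliberately not touched and would still need manual review.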
- I am doing (yet another) run on all 38,151 pages with manual Persian translit. I will see which words get changed -eyâ -> -iyâ to verify that it's safe. This is probably the last change I'll make, and I'll probably run the script with --save tomorrow. Benwing2 (talk) 08:17, 16 March 2023 (UTC)
- BTW I mentioned above the need for manual review in the cases where a match couldn't be made and self-canonicalization made a change. This will be on the order of ~750 cases to review and correct if necessary. The reviewing will be done in a single text file to speed things up. A few examples:
Page 936 jump: WARNING: Unable to match-canon پایین پریدن (paein paridan): Unable to match Arabic character ی at index 3, Latin character n at index 4 [self-canon Latin paein paridan -> paeyn paridan]: <from> {{t|fa|پایین پریدن|tr=paeyn paridan}} <to> {{t|fa|پایین پریدن|tr=paeyn paridan}} <end>
Page 967 fuchsia: WARNING: Unable to match-canon فوکسیه (fūksīye): Unable to match trailing Latin character y at index 5 [self-canon Latin fūksīye -> fûksîye]: <from> {{t+|fa|فوکسیه|tr=fûksîye}} <to> {{t+|fa|فوکسیه|tr=fûksîye}} <end>
Page 1244 Briton: WARNING: Unable to match-canon بریتانیایی (berīṭāniyāyī): Unable to match Arabic character ت at index 3, Latin character ṭ at index 4 [self-canon Latin berīṭāniyāyī -> berîtâniyâyî]: <from> {{t|fa|بریتانیایی|tr=berîtâniyâyî}} <to> {{t|fa|بریتانیایی|tr=berîtâniyâyî}} <end>
Page 1506 common: WARNING: Unable to match-canon شایع (šâyee): Unable to match trailing Latin character ê at index 3 [self-canon Latin šâyee -> šâyê]: <from> {{t+|fa|شایع|tr=šâyê}} <to> {{t+|fa|شایع|tr=šâyê}} <end>
Page 1528 condition: WARNING: Unable to match-canon شرط (wart): Unable to match Arabic character ش at index 0, Latin character w at index 0 [self-canon Latin wart -> vart]: <from> {{t+|fa|شرط|tr=vart}} <to> {{t+|fa|شرط|tr=vart}} <end>
- In these examples, the first one has paein -> paeyn for پایین; I don't know what's right here. The second one should have i instead of î. The third one may be correct but the matching got confused by ṭ opposite ت. The fourth one has a weird translit šâyee -> šâyê for شایع, which maybe should be šâyi'. The fifth one seems to have wart in place of šart. Benwing2 (talk) 08:27, 16 March 2023 (UTC)
- @Benwing2: Thank you very much!
- It's obviously hard to convert incorrect transliterations when there is no pattern. It makes a bit of sense to do fūksīye->fûksîye->fuksiye and berīṭāniyāyī->berîtâniyâyî->beritâniyâyi
- From the above, the correct translit is on the right:
- پایین - pâyin
- فوکسیه - fuksiye
- بریتانیایی - beritâniyâyi
- شایع - šâye'
- شرط - šart
- Any unmatched conversions I can go through and fix manually. If you dump them into a page with links to the pages where they occur, I will go through it over time. Anatoli T. (обсудить/вклад) 22:47, 16 March 2023 (UTC)
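The two conversion chains mentioned above (fūksīye to fuksiye, berīṭāniyāyī to beritâniyâyi) amount to a small character mapping. The sketch below is inferred only from those two examples and is not the canonicalization script itself; the names `SIMPLIFY_MAP` and `simplify_translit` are hypothetical, and the mapping is certainly incomplete.

```python
import unicodedata

# Assumed mapping, inferred only from the two examples in this thread:
# long i/u lose their diacritic, long a becomes â, dotted ṭ becomes t.
SIMPLIFY_MAP = str.maketrans({
    "\u0101": "\u00e2",  # ā -> â
    "\u012b": "i",       # ī -> i
    "\u00ee": "i",       # î -> i
    "\u016b": "u",       # ū -> u
    "\u00fb": "u",       # û -> u
    "\u1e6d": "t",       # ṭ -> t
})


def simplify_translit(translit: str) -> str:
    """Map classical-style diacritics to the modern Iranian scheme."""
    # NFC makes sure letter + combining mark is seen as one character.
    return unicodedata.normalize("NFC", translit).translate(SIMPLIFY_MAP)
```

Cases with no regular pattern, such as wart for šart or šâyee for šâye', cannot be handled by a mapping like this and still need manual correction.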
See User:Benwing2/canon-persian-warning-self-or-cross-canon-1 (350 lines) and User:Benwing2/canon-persian-warning-self-or-cross-canon-2 (365 lines). Total of 715 lines. I have split them into two to make them easier to manage. All you need to do is edit the part AFTER <to> and before <end>, and I can run a script to update the actual entries. There are lots of other issues as well but these are the most important ones. Benwing2 (talk) 23:05, 16 March 2023 (UTC)
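To make the review format concrete, here is a minimal sketch of how a line in the <from> ... <to> ... <end> layout could be read back and applied to a page. The real push-changes script is not shown in this discussion, so the names `WARNING_LINE`, `parse_warning_line` and `apply_edit`, as well as the exact line pattern, are assumptions based only on the sample warnings quoted above.

```python
import re

# Assumed layout, taken from the sample warnings quoted in this thread:
#   Page <n> <title>: WARNING: <message>: <from> <old> <to> <new> <end>
WARNING_LINE = re.compile(
    r"^Page (?P<index>\d+) (?P<title>.+?): WARNING: (?P<msg>.*): "
    r"<from> (?P<old>.*?) <to> (?P<new>.*?) <end>$"
)


def parse_warning_line(line):
    """Return (page title, old wikitext, new wikitext), or None if no match."""
    m = WARNING_LINE.match(line.strip())
    if m is None:
        return None
    return m.group("title"), m.group("old"), m.group("new")


def apply_edit(page_text, old, new):
    """Replace the old template call with the reviewed one, if anything changed."""
    if old == new or old not in page_text:
        return page_text  # unreviewed line, or the page changed in the meantime
    return page_text.replace(old, new)
```

A reviewer only changes the text between <to> and <end>; lines where both halves remain identical are treated as unreviewed and leave the page untouched.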
- @Benwing2: Hi. Thank you for producing the list of issues. I have started working directly with these files. You probably noticed my edits on the first one but it may be easier for me to handle the entries directly. Anatoli T. (обсудить/вклад) 23:43, 19 March 2023 (UTC)
- Sounds good, feel free to do what works for you. If you want I'll run the push-changes script on what you've already edited in the file. BTW User:Saranamd answered some questions about iyV vs. eyV in Module talk:fa-IPA; apparently 'riyâz' is correct. Benwing2 (talk) 23:47, 19 March 2023 (UTC)
- @Benwing2: Thanks. Yes, you can push the changes. Some I have deleted or corrected altogether, or changed the Persian script as well. I saw some edits. I have corrected 'riyâz' in the entry and translation. Anatoli T. (обсудить/вклад) 23:49, 19 March 2023 (UTC)
- @Benwing2: Thanks for providing the files. I've gone through and corrected where I could and where I thought it was necessary. I didn't change many usage examples and long quotes, especially where a classical transliteration was used. Some were too hard, so I didn't attempt them.
- Overall, the quality of Persian transliterations has improved and matches the policies and common practice. Thank you!
- I would still appreciate your development input on Persian translit and pronunciation modules, if you're still interested :) If you want to address Czech first, it's fine too. You'll get more help and resources there and it probably makes much more sense as well. Anatoli T. (обсудить/вклад) 23:44, 26 March 2023 (UTC)
- @Atitarev Hi, I got distracted by Czech but Czech nouns are turning out a lot more complicated than I thought. It's a huge mess of exceptions and exceptions to exceptions, etc. I'm close to finishing the changes to the Persian IPA module and should just finish that before trying to solve all the Czech edge cases. (BTW sometimes I feel like I'm behaving like Leonardo da Vinci, who was notorious for starting things and not finishing them :) ... I also started a generic headword library to help simplify the 200 or so existing headword modules, but that's another big project.) Thanks for working on the Persian warnings; note that there was actually a lot more output than the 715 lines that I dumped here, but I didn't want to overwhelm you and chose the ones that I thought were highest priority. If you want the remainder I can dump them as well. Benwing2 (talk) 00:07, 27 March 2023 (UTC)
- @Benwing2: Thanks. If you're able to limit warnings to templates or where those occurrences happen, pls send results only from {{t+}} (and the like) and headword transliterations, such as {{fa-noun}}, etc. Depending on the number, maybe also {{m}}, {{l}} and {{alter}}. {{usex}} or {{quote}} are too long, and boldfaced terms were incorrectly reported as suspicious. Anatoli T. (обсудить/вклад) 00:16, 27 March 2023 (UTC)
- @Benwing2: Re the Persian IPA, not all questions were answered, unfortunately, and there are not enough resources, e.g. "espâniyâ" (long or short "i"? And how to force a short one in the module? What is the rule here?) Anatoli T. (обсудить/вклад) 00:18, 27 March 2023 (UTC)
- Here are the remaining warnings involving translation templates: User:Benwing2/canon-persian-warning-translation-templates-not-self-or-cross-canon There are about 1800 warnings left of all sorts but it looks like only 374 involve translation templates. This is not in the from-to-end format because I assume you'll make the edits directly. As for the module, I am working on adding auto-stress and incorporating the Tehrani and Kabuli pronunciation changes, so we can convert the remaining uses of {{fa-IPA/old}}. I will undoubtedly have more questions. Not sure what we can do about cases like espâniyâ, maybe we can ping one of the Persian editors again. Benwing2 (talk) 00:31, 27 March 2023 (UTC)
Please help
I've got no clue what to do with the Ukrainian declension code on курінь (kurinʹ). What kind of sound change is it? A shift of yat to 'е' everywhere except the nominative??? Do you have any advice?— This unsigned comment was added by Tollef Salemann (talk • contribs).
- @Tollef Salemann: Done. It's not "io" but "ie" alteration, b-accent pattern. --Anatoli T. (обсудить/вклад) 01:52, 4 April 2023 (UTC)
- Wow, thanks! I tried 'ie' in many positions but it failed as well, because I forgot that some extra signs were not necessary. Tollef Salemann (talk) 01:59, 4 April 2023 (UTC)
Dari Transliterations
@Atitarev @Benwing2 the module is changing the transliteration of Dari-specific words to Iranian Persian. For example, if you look at this revision, all of the Dari names for the Gregorian months have been changed to Iranian transliteration, despite Iranian Persian not using any of these spellings. Earlier one of you was discussing the possibility of linking multiple transliterations to the same entry; perhaps we could look into that? I feel like it would prevent these kinds of issues. Also, on pages where both the Dari and Iranian transliterations were listed, the Dari is converted to Iranian Persian, causing the header to list the same transliteration twice. Maybe the module can avoid changing words if it causes a word to be listed twice? Sameerhameedy (talk) 00:58, 9 April 2023 (UTC)
- @Sameerhameedy, Atitarev There are several things going on here. One is in reference to the bot run that I did to clean up transliterations. I had it skip cases where it could figure out that it was a Dari or Classical word; it did this by looking for the words 'Dari' or 'Classical' in the nearby text, and for whatever reason it didn't see the word 'Dari' stashed away in a template parameter. Another issue has to do with transliteration modules. In this case the transliterations are manual so the transliteration module wouldn't apply, but in general we now have support (courtesy of User:Theknightwho) for different transliteration schemes for different etymology languages. In order for this to work, we'd have to (a) use 'prs' instead of 'fa' as the language code, and (b) have all the right diacritics in place. A big issue is that the Arabic script doesn't natively have diacritics to distinguish ē from ī and ō from ū; we'd have to do some sort of hack. (Support for this now exists as well but we'd have to figure out a reasonable scheme.) Finally, in reference to avoiding changing words if it causes a word to be listed twice, my bot run (but probably not the transliteration module) could potentially have done that, although in most cases there should have been the word 'Dari' nearby. In any case, I probably won't be running that bot script again for a while. Benwing2 (talk) 01:46, 9 April 2023 (UTC)
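As a rough illustration of the "nearby text" heuristic described above, the sketch below skips a transliteration when the word 'Dari' or 'Classical' occurs within a fixed window around it. It is not the bot's actual code; the name `should_skip`, the regular expression and the default window of 200 characters are all assumptions. It also shows why such a check is fragile: whether a label is seen depends entirely on how far away it sits in the wikitext.

```python
import re

# Hypothetical detector for the variety labels the bot run looked for.
VARIETY_RE = re.compile(r"\b(Dari|Classical)\b")


def should_skip(wikitext, start, end, window=200):
    """Return True if a variety label occurs near wikitext[start:end].

    `window` (an assumed value) is how many characters on either side of
    the transliteration are searched, template parameters included.
    """
    lo = max(0, start - window)
    hi = min(len(wikitext), end + window)
    return VARIETY_RE.search(wikitext[lo:hi]) is not None
```

A stricter alternative would be to parse the templates on the page and read the label parameters directly instead of relying on proximity.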
About calques of country names
There might already be an unwritten community approval regarding country names. If you look at the English entries for official country names, many of them are noted as calques. See Democratic People's Republic of Korea, Islamic Republic of Iran, People's Republic of China, United Mexican States.
Moreover, through this pattern of thinking I just yesterday was able to discern calques of country names.
In the GDR, the name of the Democratic People's Republic of Korea (<= the direct translation from Korean) was Koreanische Volksdemokratische Republik instead of Demokratische Volksrepublik Korea. It might seem trivial but there is a real semantic difference. I'm pretty sure the GDR directly translated the name from the Warsaw Pact's lingua franca: Корейская Народно-Демократическая Республика. Just like Polish Koreańska Republika Ludowo-Demokratyczna, which features that same semantic nuance.
Oh well, where were we: I just wanted to showcase to you the potential such inclusions would have, as through discerning the semantic differences in country names one can trace how diplomacy and the spreading of information worked across times and areas. Synotia (talk) 07:54, 7 May 2023 (UTC)
The term improvised explosive device was coined during the Northern Ireland conflict. While equivalents in other languages like German don't exactly match the English term, the Russian term does. When did it start being used in Russia? I find no indication that its usage predates the English term. Synotia (talk) 09:38, 7 May 2023 (UTC)
The definition was a string of male-centric slang words. In current modern Russian as of the 2020s this word is in general use by girls and women, as well as significantly by males as applied to girls and women, and is hence not male-centric, and aside from still being vulgar is otherwise the default unmarked (but vulgar) word for to masturbate. Even in males (and in females, by both or all sexes) it is widely used in reference to things such as anal masturbation to which the English words as currently on the page are not applicable and are defining the word incorrectly. So I updated the definition to primarily feature "to masturbate" and secondarily "to jerk off" as an echo of what it perhaps was, and removed the slang label, but I kept it as vulgar. That is, I propose the word in modern Russian above all primarily means to masturbate -- various genitals or erogenous zones variously, and not any male-genitals centric "whack off" to the exclusion of everything else that is not about masturbating the penis, but unlike English "to masturbate" the word cannot be used in formal language, where primarily мастурбировать replaces it. Why do you disagree with my edit? 2A00:6020:5013:6700:B6A2:FB34:6518:BD9F 07:19, 9 May 2023 (UTC)
- We would usually not delete older meanings, but perhaps add newer ones. DCDuring (talk) 12:18, 9 May 2023 (UTC)
- I doubt that your edits brought over this information, while your two claims were correct. Note that we recently disputed the application of the label “slang”, which you couldn’t know. Fay Freak (talk) 15:54, 9 May 2023 (UTC)
- POG Gnosandes ❀ (talk) 11:06, 10 May 2023 (UTC)
- Aren't you ashamed to write about such details! hahaha Gnosandes ❀ (talk) 11:08, 10 May 2023 (UTC)
Should we include the diaeresis in page names for terms with ѣ̈?
ѣ̈ (jǒ) is obviously very rare, as we only have entries for it in one pre-reform lemma - гнѣ̈здышко (gnjǒ́zdyško) - as well as the inflections of four others: звѣзда́ (zvězdá), гнѣздо́ (gnězdó), лѣса́ (lěsá) and сѣдло́ (sědló).
At the moment, we don't include it in page titles (e.g. гнѣздышко), but this seems inconsistent with our treatment of ё (jo). It also leads to confusion at гнѣзда, where we have гнѣ̈зда (gnjǒ́zda) and гнѣзда́ (gnězdá), which are different inflections of the same term.
My gut feeling is that we should take the same approach as ё (jo), by including it as part of the page title. Though it's unlikely to happen, there could plausibly be a multi-word term with ё (jo) and ѣ̈ (jǒ), and it would be very weird to remove the diaeresis from one but not the other. Theknightwho (talk) 15:00, 9 May 2023 (UTC)
- @Theknightwho: Both options have their amenities, I have no strong opinion. The analogy may recommend its presence in the page title. It is weird though and hence somewhat unexpected to retroactively apply a standard to an orthography that was out of use when the standard was created. Note also the forms обрѣ̈лъ (obrjǒ́l), пріобрѣ̈лъ (priobrjǒ́l) and so on. Fay Freak (talk) 15:49, 9 May 2023 (UTC)
- @Theknightwho: Hi. I’d leave as is. Diaeresis over ё is not considered as a diacritic, it is a letter of the alphabet (with reduced usage) but ѣ̈ is totally artificial, only a means to tweak the transliteration, perhaps it should even be invisible to users. @Benwing2? Anatoli T. (обсудить/вклад) 21:45, 9 May 2023 (UTC)
- @Theknightwho: Actually, Talk:гнѣзда has interesting points, so let me think about it. BTW, I don’t consider myself an expert on old Russian orthographies. Anatoli T. (обсудить/вклад) 21:54, 9 May 2023 (UTC)
- @Atitarev Yeah - I am curious if it saw historical use in texts, because the evidence from dictionaries suggests it probably did. Btw, it didn’t work properly in transliterations until yesterday, so I don’t think that’s why we have it. We also have a usage note template for it: {{U:ru:ѣ like ё}}. Theknightwho (talk) 22:26, 9 May 2023 (UTC)
I just saw you added this as a Slovene translation for understandable, but that entry only has a Serbo-Croatian section. Did you use the wrong code or does this word exist in both languages? Acolyte of Ice (talk) 10:48, 11 May 2023 (UTC)
- @Acolyte of Ice: I seldom mix up language codes. The word exists in both languages. Anatoli T. (обсудить/вклад) 11:28, 11 May 2023 (UTC)
Dari Layouts
I made rough drafts of all the layouts we were discussing. Which one do you prefer?? Let me know if you have any feedback. @Anatoli T. @Benwing2
سَمِیر | sameer (talk) 05:12, 17 May 2023 (UTC)
- @Sameerhameedy: Wow! As for the pronunciation sections, I like both!
- As for the headwords, I prefer multiple headers/when there are more. The labels, like (Iranian Transliteration) should be coming from parameters but no translit is also OK. Anatoli T. (обсудить/вклад) 05:31, 17 May 2023 (UTC)
- @Anatoli T. Perhaps the vote (I think we need a vote) for how to format the pronunciation section can be separate from how to format the entry? Also, how do we set up a vote? Do you think it needs minor adjustments still?? سَمِیر | sameer (talk) 20:10, 17 May 2023 (UTC)
- @Sameerhameedy: I also think with the multiple header solution the vocalisation (harakat) can possibly be placed on the header (as an alternative to or on top of the pronunciation section). Anatoli T. (обсудить/вклад) 06:06, 17 May 2023 (UTC)
- @Anatoli T. Okay I tried that rn, Is this what you mean?? If not can you edit it and tag me so I can see?? سَمِیر | sameer (talk) 20:10, 17 May 2023 (UTC)
- @Sameerhameedy: That's fine, it looks good. It was just a generic comment about how, since you just presented a desired display. The module and template changes are yet to be done, so we can't demonstrate the use of parameters yet, e.g. |cls=1 in the headword, e.g. {{fa-noun}} to get the label, e.g. [Class. Pers.] and the appropriate subcategory, e.g. Category:Classical Persian terms (or similar). Anatoli T. (обсудить/вклад) 23:07, 17 May 2023 (UTC)
- @Anatoli T. Oh okay. I was under the impression that a vote was needed before module changes would be made. Or, are changes to the modules made prior so they can be demonstrated? سَمِیر | sameer (talk) 21:29, 19 May 2023 (UTC)
- @Sameerhameedy: Thanks. As long as there is an agreement, the work can start but as you can see, apart from ZxxZxxZ, there was little active participation. I hope the skilled guys can start some work when they get freer, anyway. Anatoli T. (обсудить/вклад) 23:22, 19 May 2023 (UTC)
- @Anatoli T., So, just to be sure, there doesn't need to be a request or something? I know to wait until the skilled contributors are available, but how will they know that their help is even needed? سَمِیر | sameer (talk) 18:23, 31 May 2023 (UTC)
- @Sameerhameedy: Modularisation of Persian has been delayed and is way overdue. Not sure when it will happen. I think the options agreed on are mostly good to get started. I hope it will be @Benwing2 who will make improvements to the Persian headword when he has time. @Benwing2, please tell us if you plan to work on Persian in the near future, if you need anything or if there are still serious unanswered questions or lack of agreement on something. Anatoli T. (обсудить/вклад) 01:41, 2 June 2023 (UTC)
- @Atitarev, Sameerhameedy Apologies, the Persian work has gotten delayed by a combination of work on Czech, more recently Catalan, and getting COVID, which wiped me out for most of a month. It would help if one or both of you would enumerate what work needs to be done and in approximately what order (i.e. what is most important); I'm not sure at this point where to start. Benwing2 (talk) 06:50, 2 June 2023 (UTC)
- @Benwing2, @Sameerhameedy:
- Thanks. I personally think working on the modularisation would be good as a start, allowing the unnamed first parameter as a vocalised form and variety parameters per the suggestions above (default to Iranian Persian but allow others, with labels and related transliterations). Please see the agreed Wiktionary_talk:Persian_transliteration#New_System,_Examples:, scroll down to Option2, headwords on multiple lines.
- Reworking the IPA module and fixing the transliteration module are probably the priorities. The pronunciation template should probably look similar to the Korean or Chinese options offered above by Sameerhameedy at User_talk:Atitarev#Dari_Layouts, whatever is best. We will give you more details and answer the questions, hopefully, if you get the chance to do it. Anatoli T. (обсудить/вклад) 13:48, 3 June 2023 (UTC)
- @Anatoli T. speaking of, can I put the Classical-Dari transliteration into Persian transliteration?? I can also put a disclaimer saying that the Classical-Dari transliteration isn't ready for general use. There have been 3 conversations with no opposition; I also held a vote to make sure there wasn't opposition and there was no participation, so I think it might be safe to add it. Also wouldn't it be best if both transliteration styles were standardized before work begins on the modules? Let me know what you think. سَمِیر | sameer (talk) 04:25, 6 June 2023 (UTC)
- @Sameerhameedy: Of course, it's a good idea. What if you make Wiktionary:Persian_transliteration/Dari page? Similar to how say Wiktionary:About Chinese/Hakka Anatoli T. (обсудить/вклад) 04:37, 6 June 2023 (UTC)
- Hi, just to let you know I moved all examples to one page instead of having it copied on multiple pages, for consistency. سَمِیر | sameer (talk) 02:01, 30 July 2023 (UTC)
Спасибо and neo-pagans
I don't say that спасибо has any Christian connotations, but that many neo-pagans do see these connotations because of the etymology. I am not a neo-pagan by any means and I'm tired of their folk-etymology stuff, but the neo-pagan use of благодарю may be considered a sociolect (kinda New Age slang). See also благодарствую (blagodarstvuju). Tollef Salemann (talk) 09:17, 21 May 2023 (UTC)
- The problem is that this is all your speculation. The majority of speakers don’t know or don’t care about the (perceived) etymology. Спасибо is the most common, благодарю/благодарим is somewhat more formal. Anatoli T. (обсудить/вклад) 12:51, 21 May 2023 (UTC)
- Well, I agree that it would be good to have some decent sources/research on neo-pagan slang (родноверский новояз?). But I strongly disagree with the orientation towards the majority. In fact, many people do care about the "right" accents and the "real" meaning of words. For example, if I don't like the use of endings like -логиня (-loginja), it doesn't mean they don't exist. Folk etymology does have an impact on the re-combination of words and their popularity. But in the case of the neo-pagan slang, it is probably hard to find good statistics on word use and draw any objective conclusions. Tollef Salemann (talk) 15:24, 21 May 2023 (UTC)
- Anyway, if you insist, I will wait with the inclusion of the neo-pagan newspeak until I get some good sources on it. In this case I maybe should also ignore politicized stuff like хлопок (xlopok, “explosion”). Tollef Salemann (talk) 15:35, 21 May 2023 (UTC)
participles like переведясь
Hi. Hope you are feeling better. I am almost through deprecating {{ru-participle of}} in favor of generic {{participle of|ru}} and I found a few weird cases. The participles переведясь, взбредя and отведясь are given as both present and past adverbial imperfective, in both the entry for these participles and in the table for the corresponding verbs. It seems that the module also generates these. Are they correct? Benwing2 (talk) 01:30, 27 June 2023 (UTC)
- @Benwing2: Getting better, thanks. They are all past tense perfective adverbials. The conjugation tables used the wrong aspect, hence the errors. Anatoli T. (обсудить/вклад) 05:47, 27 June 2023 (UTC)
Template:R:zh:zbgycxdb name
The Hanyu Pinyin of the 重編國語辭典修訂本 (Revised Mandarin Chinese Dictionary) should be Chóngbiān guóyǔ cídiǎn xiūdìngběn. When 重 means "again, repeat", its Hanyu Pinyin is chóng. (You can find it here: [3], [4].) I suggest changing the title to Template:R:zh:cbgycdxdb. Kethyga (talk) 21:47, 28 June 2023 (UTC)
- @Kethyga: Thanks, I moved it, leaving a redirect behind. The referenced entries can be fixed manually; there are not many. Anatoli T. (обсудить/вклад) 23:44, 28 June 2023 (UTC)
non-free images
Hello. I'm interested in the situation with non-free images on the English Wiktionary.
On the one hand, the community has adopted NFCC. On the other, there is exactly one such image here, even though there are a large number of entries (for example, Pokémon) that can only be properly illustrated with non-free files. And somewhere alongside that, there is a restriction of local uploads to administrators (I couldn't find any mention of this at all). But even so, the admin team would surely have uploaded at least a few images in 10 years.
Do you know anything about this? — Ирука13 04:32, 5 July 2023 (UTC)
- @Iruka13: I am away from home skiing till mid-July. Try asking at WT:ID. Anatoli T. (обсудить/вклад) 07:03, 5 July 2023 (UTC)
Hello. In 2014, you added ころあい (“suitable, reasonable”) here at 格好. This has since been moved to 恰好, where it is now. When I searched 恰好 and ころあい in Sakura Paris (which is, admittedly, one of the only Japanese dictionary websites I know), I only got the pronunciation かっこう and the kanji spelling 頃合(い) respectively. When I googled ‘“ころあい” “恰好”’, I couldn’t find any relevant examples. Could you please provide the sources or quotations you based this addition on, if you still have them? Thank you. Mcph2 (talk) 14:56, 15 July 2023 (UTC)
- @Mcph2: Hello. I am just back from leave. I don't remember what my sources were. I can see the two readings in Goo 辞書. Anatoli T. (обсудить/вклад) 23:41, 16 July 2023 (UTC)
- @Atitarev: I see that the entry says the following:
- 〔姿,形,様子〕a figure;〔形〕(a) shape
- 〔ころあい〕格好な 〔適当な〕suitable ((for));〔妥当な〕reasonable
- Comparing the two definitions, I think ころあい is a synonym of 格好, like 姿,形,様子 are synonyms of 格好 when it is a noun. In searches like 一寸 and 風車, Goo 辞書 separates the two readings into different entries. Mcph2 (talk) 01:18, 17 July 2023 (UTC)
- @Mcph2: You must be right. Please change accordingly. Anatoli T. (обсудить/вклад) 01:37, 17 July 2023 (UTC)
Question
In entry wikt:en:葉連娜, how should I correctly add three different transliterations of a Russian word into one sentence through {{name translit}}, or is there a better way? Kethyga (talk) 22:27, 22 August 2023 (UTC)
- @Kethyga: I am not sure. I tried. You can ask on WT:GP. Anatoli T. (обсудить/вклад) 23:28, 22 August 2023 (UTC)
fa-IPA
check out Template:fa-IPA/sandbox/documentation and let me know what you think! Classical Persian needs to be changed to output in phonemic brackets // rather than [], but I can't figure out how to do that so I had to ask Ben and Fenakhay about that. Hopefully one of them will be able to do it, whenever they are available. سَمِیر | sameer (مشارکتها • با مرا گپ بزن) 07:40, 25 August 2023 (UTC)
- @Sameerhameedy: Looks amazing!
- BTW, I noticed that Tajik on "eron" or "afġoniston" is in lower case. It's OK if it's automatically generated but if you use Эрон (Eron) or Афғонистон (Afġoniston) you'll get the right case (or |cap=y, as in Korean, which will only capitalise Tajik).
- I like that you're embracing vocalisation. Are you planning to use sokun, e.g. /کِشوَر/ or /کِشْوَر/? Also, will vocalisations be part of the headword?
- You can ping me on the project's or template's talk page, so that we don't spread discussion all over the place. I'm adding them to watchlist.
- @Benwing2, @Fenakhay. Anatoli T. (обсудить/вклад) 07:55, 25 August 2023 (UTC)
миротворец
this is certified, look at this link http://rusyndictionary.com/ Stríðsdrengur (talk) 02:38, 22 September 2023 (UTC)
- @Stríðsdrengur: Thanks. I have changed to use a regular template. Anatoli T. (обсудить/вклад) 04:12, 22 September 2023 (UTC)
translations
Thanks for fixing the diacritic thing, I noticed two other issues though.
1) Dari translations don't redirect to the Persian entry, they just redirect to the page itself?? e.g. {{lang|prs|تیرم}} should be (and it used to be) equivalent to [[تیرم#Persian]] but now it isn't. Also I have the gadget that shows yellow links for pages that exist but don't have an entry for the corresponding language. It shows Dari in yellow even if there is a Persian entry. Any idea why that's happening?
2) when using t+ for Dari it doesn't redirect to Persian wiktionary. الماری (fa) forwards to the page prs:الماری for example. I suspect this has something to do with namespaces because if you search up "fa:<insert word>" it'll redirect to Persian wiktionary. سَمِیر | Sameer (مشارکتها • کتی من گپ بزن) 04:18, 13 October 2023 (UTC)
- also @Benwing2 سَمِیر | Sameer (مشارکتها • کتی من گپ بزن) 04:46, 13 October 2023 (UTC)
- @Sameerhameedy I have fixed the data modules for the translation adder gadget and Module:wikimedia languages/data but I suspect there are some code changes needed for the translation modules themselves to fix these issues. Benwing2 (talk) 05:08, 13 October 2023 (UTC)
- @Benwing2 thank you for fixing it 🙏🙏. But yellow link issue (see above) is still happening for me though, any idea why? سَمِیر | Sameer (مشارکتها • کتی من گپ بزن) 05:23, 13 October 2023 (UTC)
- @Sameerhameedy It should be fixed now; User:Atitarev accidentally made Dari a full language. Anatoli can you clarify what you were trying to accomplish by this change? Benwing2 (talk) 05:26, 13 October 2023 (UTC)
- yes all Dari links are functional again tysm! سَمِیر | Sameer (مشارکتها • کتی من گپ بزن) 05:38, 13 October 2023 (UTC)
- Hi @Sameerhameedy, @Benwing2:
- It would be possible to fix the interwiki links, although I think it may only work for one selected language. You will find that 中國/中国 (zh) (Zhōngguó) (language code "cmn") links to zh:中國 ("zh", not "cmn") remotely (external wiki) but to 中國#Mandarin locally.
- Thanks for fixing, @Benwing2. Anatoli T. (обсудить/вклад) 05:26, 13 October 2023 (UTC)
- Hi, what is the diacritic thing that Sameer is referring to? The fix for it needs to go into Module:etymology languages/data. Benwing2 (talk) 05:28, 13 October 2023 (UTC)
- The diacritic is fixed. "prs" got the same handling (copy) as "fa" in the Module:language modules.
- Should all Persian language codes (prs, fa-ira, fa, fa-cls) link to "fa" entries? Anatoli T. (обсудить/вклад) 05:30, 13 October 2023 (UTC)
- @Atitarev Yes, probably as a general rule all etymology languages should link to the correct full-language Wiktionary unless there's an entry in an exceptions table. Let me see about implementing that. Benwing2 (talk) 05:32, 13 October 2023 (UTC)
- Thanks. @Benwing2, I am not sure I agree with diff. The idea of adding "prs" was to nest Persian\Dari. If it's redirected to "fa", it won't work as intended. Anatoli T. (обсудить/вклад) 05:40, 13 October 2023 (UTC)
- @Atitarev That diff only affects the interwiki linking, as far as I know. This was requested by Sameer above. I don't think it affects the actual translation module used. Benwing2 (talk) 05:42, 13 October 2023 (UTC)
- @Benwing2. OK, no redirects now but attempts to add translations with "prs" with a translation adder gives the error "The parameter "<strong class" is not used by this template." Anatoli T. (обсудить/вклад) 05:49, 13 October 2023 (UTC)
- @Atitarev Hmm, did it work before? I added something to the translation adder for prs that might be wrong. Benwing2 (talk) 05:50, 13 October 2023 (UTC)
- @Benwing2: Yes, it did. I made a few edits on modules today. It may be hacky but it produced the desired outcome for Dari (prs), except for links to "fa".
- I'm not sure what to do with longer codes like "fa-ira" and "fa-cls". They don't fit into Module: language methods, and I'm not able to add translit modules to language modules. Anatoli T. (обсудить/вклад) 05:54, 13 October 2023 (UTC)
- @Atitarev Can you clarify what you mean by "not sure what to do with longer codes 'fa-ira'"? It shouldn't matter whether the codes are short or long. Benwing2 (talk) 05:55, 13 October 2023 (UTC)
- BTW what we need to do is fix the translation adder to know about etymology-only languages, which is what 'prs', 'fa-ira' and 'fa-cls' are. Benwing2 (talk) 05:56, 13 October 2023 (UTC)
- I will take a look. Benwing2 (talk) 05:57, 13 October 2023 (UTC)
- Module:languages/data/2 has "fa"
- Module:languages/data/3/p has "prs"
- Adding the translit module Module:fa-ira-translit to Module:languages/data/2 (for "fa") will generate errors. It doesn't like the module name or its categorisations.
- Both I and, as I understand it, @Sameerhameedy want to permanently separate Iranian Persian/Dari/Classical Persian into different transliteration modules (vocalisations and transliterations differ), which should make the translation adder work with separate language codes (linking to "fa") but allow automated nesting, e.g. "prs" should nest translations as Persian\Dari, and "fa-ira" should nest them as "Persian\Iranian Persian".
- "fa" should still work, probably defaulting everything to "Iranian Persian". Anatoli T. (обсудить/вклад) 06:06, 13 October 2023 (UTC)
- @Atitarev Let me look into how it's implemented; I think you need to add the translit module for fa-ira to Module:etymology languages/data. Since 'prs' is an etymology-only language, it should not go into Module:languages/data/3/p (which is for full languages), which is why I removed it. Benwing2 (talk) 06:08, 13 October 2023 (UTC)
- Balti uses Module:fa-ira-translit (and it seems to work on Balti entries), so I don't believe the name is the issue. fa-ira should also be connected to Module:fa-ira-translit, but the etymology codes "fa-cls", "prs" and "haz" should all use Module:fa-cls-translit. (Since Tajik terms in the Arabic script are usually treated as "fa", I'm not sure if this would be needed for Tajik. But maybe we can add it for "tg" as well, just as a precaution?) سَمِیر | Sameer (مشارکتها • کتی من گپ بزن) 06:18, 13 October 2023 (UTC)
- @Sameerhameedy What should plain 'fa' use as translit? Benwing2 (talk) 06:20, 13 October 2023 (UTC)
- @Benwing2 I personally would prefer if the code "fa" was dialect neutral (kinda like "zh"). I think if "fa" wasn't dialect neutral it would make writing headers for Classical and Dari terms messy... But @Atitarev seemed to (somewhat) disagree with that when I suggested it. سَمِیر | Sameer (مشارکتها • کتی من گپ بزن) 06:29, 13 October 2023 (UTC)
- @Sameerhameedy What do you mean by "dialect neutral"? Can you spell out how things would look in that scenario? Maybe we can find something that works for everyone. Benwing2 (talk) 06:31, 13 October 2023 (UTC)
- @Benwing2 I guess I was thinking of some variation of this User:Sameerhameedy/example entry. سَمِیر | Sameer (مشارکتها • کتی من گپ بزن) 07:03, 13 October 2023 (UTC)
- @Sameerhameedy: “dialect neutral” is meaningless if we have translations and entries using this code. If we choose one vocal./translit. method as the default, then fa can use fa-ira. The reality is that 99% of Wiktionary's Persian content is in modern Iranian.
- To change that, a large amount of bot work would be required to separate, e.g., Iranian from Classical, etc. Think about what should happen with thousands of translations. “zh” is intentionally not used at Wiktionary, but it defaults to Mandarin handling for translit purposes. Anatoli T. (обсудить/вклад) 06:55, 13 October 2023 (UTC)
- I literally have no idea what catastrophic event you are alluding to, considering all Persian links thus far have manual transliterations. There are also other separate languages with fewer differences on Wiktionary. To be honest, if Iranian Persian and Dari were written phonemically (with all vowels) they would probably be treated as separate. I think being able to combine them is beneficial; one large dictionary is definitely the best solution for Persian. But I don't think it makes any sense to prioritize Iranian Persian just because Dari is an LDL. Also, Iranian Persian has the most phonemic mergers of all modern dialects, so it's harder to convert Iranian Persian to other dialects than vice versa. With that in mind, I was planning on making fa-ira more specific to Iranian Persian's phonology (transliterating ق as ğ, for example). سَمِیر | Sameer (مشارکتها • کتی من گپ بزن) 07:16, 13 October 2023 (UTC)
- @Sameerhameedy @Atitarev One possibility is to have no transliteration in place for 'fa' and have auto-translit for 'fa-ira' (which IMO should be called 'fa-IR'), 'fa-cls' (which probably should be called 'fa-cla', for consistency e.g. with 'la-cla' = Classical Latin and the general way we do such codes), and 'prs', each according to their own rules. The other possibility, as Anatoli suggests, is to have 'fa' use the same translit as 'fa-ira' and require that one of the other codes be used to get a different translit. Anatoli is right that the vast majority of terms currently on Wiktionary are effectively Iranian Persian. But since there's no current auto-translit defined, we can define the vocalization however we want; one reasonable thing to do is to use Classical vocalization, so the module can generate pronunciations for all the different varieties without too much trouble. BTW did we ever agree on a way of distinguishing /eː/ from /iː/ and /oː/ from /uː/ in vocalization? Benwing2 (talk) 07:39, 13 October 2023 (UTC)
- @Benwing2 The modules were named after the language codes which I didn't make, we can change them though if you want.
- Yes, the classical vocalization is detailed here (note that Dari diacritics are actually not standardized and this is just one of many ways Dari is vocalized, but this way is in line with Arabic, Urdu, Punjabi, Uzbek, etc.). For the Iranian-style vocalization we use on Wiktionary, there is probably no way to distinguish those vowels. But it seems there are also multiple ways to vocalize Iranian Persian. So I suppose Iranian Persian could use the classical vocalization, but it would be very different from all Iranian vocalizations so far. سَمِیر | Sameer (مشارکتها • کتی من گپ بزن) 07:49, 13 October 2023 (UTC)
- @Sameerhameedy Hmm, it looks like the way of indicating the ē and ō vowels is simply to leave out the diacritic. This seems less than optimal, as it doesn't make it possible to distinguish vocalized from unvocalized text. Is there any way of having some specific diacritics to indicate the ē and ō? Benwing2 (talk) 18:25, 13 October 2023 (UTC)
- @Benwing2 afaik, no. Unless we pick a random diacritic and use that. But that might be confusing to readers. سَمِیر | Sameer (مشارکتها • کتی من گپ بزن) 21:44, 13 October 2023 (UTC)
- @Sameerhameedy: No catastrophic event will happen but what do you wish to do with all those manual transliterations, where the Iranian schema is dominant? Having one-to-many relationships from Iranian to other varieties doesn’t make the conversion easier. If they are identified as matching one of the schemas, they could all be converted to the new code and appropriate nesting on translations. If most contributors prefer to use Iranian, defaulting fa to Iranian would make sense. Anatoli T. (обсудить/вклад) 08:13, 13 October 2023 (UTC)
- @Atitarev I checked and the translation table just converts "zh" to "cmn" (at least that's what it did when I typed it in). If that's what you're suggesting, that's fine by me. سَمِیر | Sameer (مشارکتها • کتی من گپ بزن) 08:37, 13 October 2023 (UTC)
радейка
Hi, there is a Telegram video of an interview with Yuri Shcherbakov on getting the Hero of Russia award (https://t.me/Slavyangrad/73145 at 02:09). He spoke of information he received в радейке. This is not in Wiktionary. Is this likely to mean радио or рация? A diminutive of radio, or of walkie-talkie? The subtitles on Telegram say it means "radio", but pictures of it on the Internet look more like walkie-talkies. Maybe it could go in Wiktionary? 2A00:23C8:A785:3B01:E26F:656B:C914:883D 10:42, 11 November 2023 (UTC)
- Hi. I have never heard this word before. You can always request it on Wiktionary:Requested entries (Russian). Anatoli T. (обсудить/вклад) 22:01, 12 November 2023 (UTC)
Kyrgyz transliteration
Greetings, I see you have reverted my edit of Kyrgyz transliteration. What is the particular reason? Bababashqort (talk) 01:13, 7 December 2023 (UTC)
- I assume it was the deletion of 2 rows, so I returned those. Sorry for any inconvenience. Bababashqort (talk) 01:26, 7 December 2023 (UTC)
- @Bababashqort: Hi. Yes, that was the reason. Just wondering if you discussed with anyone the language policy? You should, at least, announce what you're planning to change on WT:BP. The transliteration policy should also match the Module:ky-translit.
- I can see you also added a column "Wiktionary standart" to w:Romanization_of_Kyrgyz, which you just added today. Anatoli T. (обсудить/вклад) 03:02, 7 December 2023 (UTC)
This isn't an adverb, it's just a constituent element of the whole phrase, lol. Why don't you mark every noun or other elements of adverbial phrases as adverbs as well? SujkaNiewydymka (talk) 06:35, 15 February 2024 (UTC)
- @Atitarev Lol, where did I insult YOU? I merely said that the idea of marking every constituent element as an adverb is quote "idiotic". Stop lying. SujkaNiewydymka (talk) 06:41, 15 February 2024 (UTC)
- And at least add if you really need to do this. But then you would have to mark e.g. angielsku, niemiecku etc. as adverbs as well. SujkaNiewydymka (talk) 06:44, 15 February 2024 (UTC)
oops
Sorry about that, I didn't read it carefully enough when I was editing it, I thought the only relevant part was the dative argument. Anarhistička Maca (talk) 19:18, 27 April 2024 (UTC)
They seem to know a thing or two about Slavic languages, but I left a welcome template on their talk page a month ago and they're still making basic mistakes like omitting headword templates. Perhaps you can look through their edits and give them advice tailored to the subject matter and their particular problems. For all I know, there may be language-comprehension problems as well. You would know better than I would. Thanks! Chuck Entz (talk) 19:41, 4 May 2024 (UTC)
- @Chuck Entz, @Ssvb, @Benwing2: Thanks for the message and apologies for not responding earlier. I am quite busy at work now.
- I agree there are a few problems with @Валёк Наталья's edits - formatting, spellings (and misspellings).
- I recommend that the user stay away from manually creating inflected forms with missing stresses and misspellings. Anatoli T. (обсудить/вклад) 02:10, 8 May 2024 (UTC)
- @Atitarev @Chuck Entz If this user doesn't respond to talk messages, they may need to be blocked, unfortunately. Benwing2 (talk) 03:01, 8 May 2024 (UTC)
What's SoP? Sorry for the question, I'm not that knowledgeable about acronyms and initialisms on here. Insaneguy1083 (talk) 06:02, 31 May 2024 (UTC)
- @Insaneguy1083: SoP stands for "sum of parts". We don't include SoP entries, where you can tell what the term means just by knowing its components: любля́нскі (ljubljánski) + універсітэ́т (univjersitét).
- It's best not to link to terms for which entries are unlikely. You can, of course, use a usage example (usex):
- Любля́нскі ўніверсітэ́т ― Ljubljánski ŭnivjersitét ― Ljubljana university
- Which would belong either to любля́нскі (ljubljánski) or універсітэ́т (univjersitét) entries.
- The term won't pass Wiktionary:Criteria_for_inclusion, probably under Wiktionary:Criteria_for_inclusion#Place_names and some other checks. Anatoli T. (обсудить/вклад) 06:16, 31 May 2024 (UTC)
- Thanks! Will try to keep this in mind in future. Insaneguy1083 (talk) 06:54, 31 May 2024 (UTC)