Module talk:ur-translit

Latest comment: 9 months ago by Exarchus in topic و as short u in خوش

Food for thought

  1. Is there a difference in pronunciation between final small prolonged ye and final short 'i' ye?
    (i.e. 'i' vs 'ī' in the final form)
  2. Is there any need for ō rather than just 'o' (i.e. is there a shortened o in Urdu)?
    (Note to self: Research more on Majhool)
  3. What's the difference between [ئے & ۓ] and [ئی & ئ]?
  4. ɛ̄ + ê or just ɛ + ê?
    (ɛ = prolonged, ê = shortened )

(This will be updated over time)


-Taimoor Ahmed(گل بات؟) 08:16, 31 December 2020 (UTC)Reply

1. I have seen a few such words in Hindi (शक्ति, मति, मिति) but not a single word in Urdu. Final short 'i's appear only in Sanskrit loanwords. They are not different in pronunciation; they neutralize to the same phone (prolonged or not).
3. I have found a word پیدائش (pɛ̄.dɑ.iʃ); I believe ئ is used to transcribe post-vocalic short i (इ). The इ in Devanāgarī is used to transcribe English words: साइड. Kushalpok01 (talk) 10:09, 3 January 2021 (UTC)Reply
@Kushalpok01: - Hi Kushal, I didn't get your notification, please use the template:reply.
1. Do you think that it's better to transliterate ye in final form as 'i' over 'ī'? I think I would support this too because there isn't a distinction in Urdu unlike Hindi.
3. I wonder whether something should be done to lower the chances of user error? Also, do you think we should remove the diacritic from ɛ̄ and replace it with ɛ for prolonged ai sounds?
-Taimoor Ahmed(گل بات؟) 02:55, 10 January 2021 (UTC)Reply
@Taimoorahmed11: Butting in here, I think -ī is more representative of pronunciation so I'd suggest using that. —AryamanA (मुझसे बात करेंयोगदान) 05:48, 10 January 2021 (UTC)Reply
@AryamanA: Noted!
-Taimoor Ahmed(گل بات؟) 17:37, 12 January 2021 (UTC)Reply

Module


@Sameerhameedy Urdu is using Module:pa-Arab-translit instead of this module. To be honest, Urdu infrastructure isn't good with many hacky solutions. — Fenakhay (حيطي · مساهماتي) 05:37, 6 September 2023 (UTC)Reply

@Fenakhay Thank you, but since Urdu and Punjabi have very different transliteration policies, I don't know if having them use the same module is a good idea. Urdu policy treats the hamza as a zero consonant (a silent consonant that only exists to bear a vowel) but Punjabi policy treats it as a glottal stop, they transcribe nasals differently, and the letter correspondences are different.
Plus, the Punjabi module is actually very buggy in its current state. Maybe I'll look into fixing it in the future. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 06:53, 6 September 2023 (UTC)Reply
Hola @Sameerhameedy, Fenakhay, I requested Module:pa-Arab-translit to be used for Urdu, because of how closely the languages are related (speaker-wise). I'm not sure why pronunciation is related to this, because transliteration doesn't necessarily represent pronunciation, no? Anyways, for me when it comes to transliteration, the representation of the individual letters matters the most, so IMO, we shouldn't just transliterate, for instance, ز (z), ض (z), ظ (z) as merely 'z'. I'm assuming there never was a policy set for Urdu, and the Transliteration policy for Hindi was adopted for Urdu. نعم البدل (talk) 19:28, 6 September 2023 (UTC)Reply
@نعم البدل To be clear, I have no objections to whatever transliteration policy is implemented. I am only saying that you should start a discussion about changing the policy. In fact, if you do start a discussion and get a new transliteration policy implemented, I will change Module:ur-translit to conform to whatever transliteration that is (since the Urdu translit module is currently less buggy than the Punjabi one). I am only asking that you discuss the changes instead of unilaterally implementing them. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 19:29, 6 September 2023 (UTC)Reply
And not with me (because I will follow whatever the community decides): go to the Beer parlour, tag all active Urdu editors and propose your new transliteration. If they agree, I will change this module to match whatever policy you put in place. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 19:34, 6 September 2023 (UTC)Reply
@Sameerhameedy: No you have a perfectly valid objection/point, a discussion is warranted. I'm not really sure who to invite to the discussion though, perhaps you could help me in this case. I'd love to get your opinion as well, as a Persian speaker. نعم البدل (talk) 19:35, 6 September 2023 (UTC)Reply
Okay, I'll find the active Urdu editors and invite them for a discussion at the beer parlour. نعم البدل (talk) 19:35, 6 September 2023 (UTC)Reply
Okay. Since I will edit Module:ur-translit to match whatever transliteration policy is in place, @Benwing2, could you change the transliteration module for Urdu to Module:ur-translit?
If the transliteration is changed, I will change the character mappings and will begin cleaning up transliterations in Urdu entries to conform to the new standard. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 19:48, 6 September 2023 (UTC)Reply

Cases without vowels should fail


@Sameerhameedy: Hi. Pinging you as the only active editor on the module but maybe I should post in WT:GP or WT:BP.

Words like مشعہ, currently showing the transliteration "mśʻa" (found at radio#Translations), should fail for lack of sufficient information. Either vowels should be provided, or sukoons where there is no vowel. Otherwise, we'll get a lot of wrong transliterations.

I don't know if the word is valid and how to read it.

It looks like the module is enabled in the main namespace? Anatoli T. (обсудить/вклад) 01:00, 7 September 2023 (UTC)Reply

Urdu transliterations currently still use Module:pa-Arab-translit (Ben hasn't made the switch yet). I can add a provision that words without any diacritics should not be transliterated. The issue is that some vowels in Urdu don't need diacritics, as in سے. I'm not sure what to do in that situation, but maybe in that case we can add a spurious sukoon, e.g. سےْ? سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 01:15, 7 September 2023 (UTC)Reply
@Sameerhameedy: The form like سے (se) is sufficient, just like سا (). --Anatoli T. (обсудить/вклад) 01:50, 7 September 2023 (UTC)Reply
@Sameerhameedy: @Benwing2 added a provision for Arabic that fails the transliteration if a word looks "suspicious", i.e. has various consonant clusters and no vocalisation. I guess Urdu and Persian can use a similar approach, but the rules will be somewhat different. Errors like کُرُوز vs کُروز, read as "korowz" instead of "koruz", are unavoidable if someone uses incorrect vocalisation modelled on e.g. Arabic. Anatoli T. (обсудить/вклад) 01:55, 7 September 2023 (UTC)Reply
Hmm, I actually might not be able to implement this, but I'll try. I was going to just copy and paste what Ben put in the Arabic transliteration module, but I don't know what kind of magic he used. I can't really understand what the code he wrote does, but if I figure it out I'll implement it here. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 03:21, 7 September 2023 (UTC)Reply
@Sameerhameedy Hi Sameer. The basic approach I used is to start with the vocalized source and remove sequences that are considered vocalized or are allowed to be unvocalized. If any are left at the end, you fail the translit. Benwing2 (talk) 03:31, 7 September 2023 (UTC)Reply
@Benwing2 Thank you! I'm still learning Lua, which is why I couldn't understand it. When previewing one attempt, the test cases said something like "ur-translit does not exist". Any idea what would cause that? سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 03:48, 7 September 2023 (UTC)Reply
@Sameerhameedy I am not sure. Can you tell me exactly what you did and what message you saw? Benwing2 (talk) 03:53, 7 September 2023 (UTC)Reply
@Benwing2 Okay, sorry, I fixed it. It was because it called some functions that didn't exist (a function I copied referred to the letter waaw, but it's called vao in the module). Usually, though, the error says that a value that was called was null, not that the module didn't exist, so I'm still not sure why it did that.
Anyways, I got it working in this revision, but I had to remove it because it caused 2/3 of the test cases to fail instead of the 2 that I wanted. I might be able to get it working, but I'm worried about the aspirated consonants and whether they'd mess things up. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 06:14, 7 September 2023 (UTC)Reply
@Sameerhameedy The aspirated consonants shouldn't mess things up. Basically, you have a series of pattern subs to remove sequences that are allowable. You can see the patterns used in Arabic near the bottom of Module:ar-translit, starting at line 319. You'd just need some extra patterns to handle the aspirated consonants, which go before the corresponding regular-consonant patterns. I don't know whether the fatha/kasra/damma goes on the first or second consonant in the aspirated cluster, but e.g. if it goes on the first one, you just remove the consonant for /h/ in the sequence of consonant + fatha/kasra/damma + /h/, then you can treat everything following as if it were the corresponding unaspirated consonant. Benwing2 (talk) 06:33, 7 September 2023 (UTC)Reply
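For readers following along, here is a minimal sketch of the "remove what is allowed, fail if anything is left" idea described above. It is written in Python purely for illustration (the real module is Lua), and every name in it is hypothetical. It assumes that the diacritic sits on the first letter of an aspirated digraph, that alif, alif madda, vao and both forms of ye may stand for a vowel without a diacritic, and that one bare consonant is tolerated word-finally. It ignores shadda, ghunna and hamza subtleties, so it is not what Module:ar-translit or Module:ur-translit actually does.

```python
import re

# Short-vowel marks plus sukoon: zabar, zer, pesh, sukoon.
MARKS = "\u064E\u0650\u064F\u0652"
# Letters assumed able to stand for a vowel without a diacritic:
# alif, alif madda, vao, choti ye, bari ye.
VOWEL_LETTERS = "\u0627\u0622\u0648\u06CC\u06D2"
# Do-chashmi he, the second half of aspirated digraphs such as kh, gh.
ASPIRATION = "\u06BE"

LETTER = f"[^{MARKS}{ASPIRATION}]"
CONSONANT = f"[^{MARKS}{ASPIRATION}{VOWEL_LETTERS}]"

# A word is accepted only if it can be split entirely into allowed units.
ALLOWED = re.compile(
    "(?:"
    f"{LETTER}[{MARKS}]{ASPIRATION}?"              # letter carrying a mark (+ aspiration)
    f"|{CONSONANT}{ASPIRATION}?[{VOWEL_LETTERS}]"  # bare consonant + vowel letter
    f"|[{VOWEL_LETTERS}]"                          # vowel letter on its own
    ")*"
    f"(?:{CONSONANT}{ASPIRATION}?)?"               # one bare consonant word-finally
)


def is_transliterable(word: str) -> bool:
    """Return True if no unvocalised consonant cluster is left over."""
    return ALLOWED.fullmatch(word) is not None
```

Under these assumptions the unvocalised سنسکرت, جنوبی and کھلائی are rejected, while سَنْسْکْرِت, جُنُوبی and کِھلائی pass. دیدار also passes, because each consonant is followed by a vowel letter, which matches the limitation noted further down in this thread.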
@Benwing2, Sameerhameedy: Has the switch to Module:ur-translit been made? In any case, lines 108-11 of Module:pa-Arab-translit handle vocalised aspirated consonants, if that's the issue? Copying them over should suffice here as well. نعم البدل (talk) 15:28, 7 September 2023 (UTC)Reply
@نعم البدل No, the module already handles vocalized aspirated consonants. What we're trying to do is have the module do a "count" to see how many syllables don't have any vowels (or a sukoon). If too many syllables don't have any vowels, it won't transliterate. So if I wrote سنسکرت the module would return blank, but if I wrote something like سَنْسْکْرِت (sanskrit), it would transliterate. Unfortunately, words like دیدار will transliterate as "dēdār" instead of going blank, since Urdu has some vowels that don't need diacritics.
Ben has already implemented a feature like that for Arabic, but porting it over here is a bit difficult. I should be able to get it working though. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 16:54, 7 September 2023 (UTC)Reply
@Sameerhameedy: - ah right yeah I was trying to see how to add it in the module. Went right past me, too confusing. Tried to understand how it worked for Module:ar-translit, but that module is too cryptic for me lol, efficient but cryptic. نعم البدل (talk) 17:33, 7 September 2023 (UTC)Reply
@Atitarev @Benwing2 This now works, at least the vast majority of the time. But there are some false positives (so far I've only seen one, but that means there are a lot more out there). I tried making it more strict to prevent that, but it caused a lot of false negatives. I'll try to figure out what's going on, but since Urdu vowels don't all need diacritics as in Arabic, it can't be as strict as the Arabic module is. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 00:20, 8 September 2023 (UTC)Reply
@Sameerhameedy, @Benwing2: Thank you.
For the purposes of good transliterations, perhaps it needs to be more strict than the usual casual vocalisation, e.g. requiring sukoons in the middle of words, but there should be rules regarding what clusters are allowed and at what position, which is not easy.
E.g. I've changed from کھلائی to کِھلائی (khilāī). The former should fail, since I don't think any word can start with "khl" but if it does, it needs to have sukuns. It's further complicated by digraphs. "کھ" is "kh", and the diacritic is set on the first consonant.
Also, a minor question. How do we set "(nil)" or "" when it is expected to fail (i.e. produce (nil))? Anatoli T. (обсудить/вклад) 00:32, 8 September 2023 (UTC)Reply
@Sameerhameedy What is the false positive you've seen? I'll take a look at what you've done and see if I can fix it. Benwing2 (talk) 05:56, 8 September 2023 (UTC)Reply
If you look at the test cases {{l|ur|کھلائی}} transliterates but the cluster "khl" shouldn't be allowed, at least without a sukoon. The module could bar adjacent consonants without a sukoon (i tried to do that but did something wrong) but unlike Arabic (afaik) it can't require all letters to have a diacritic since semi vowels without diacritics are distinguished (e.g. ایـ "e", اِیـ "ī", and اَیـ "ai" ). سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 06:22, 8 September 2023 (UTC)Reply
Yes, it should bar adjacent consonants without sukuun IMO. Are all your changes in the production module? Benwing2 (talk) 06:24, 8 September 2023 (UTC)Reply
Yes I haven't uploaded any changes outside of the module, I did some unsaved attempts in preview but didn't upload any of them because they broke some of the test cases. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 06:27, 8 September 2023 (UTC)Reply
@Benwing2: Thanks. Please bear in mind Urdu digraphs w:Urdu_alphabet#Digraphs. As in my example کِھلائی (khilāī) is okay or a common word گَھر (ghar), the diacritic is on the first letter of the digraph but کھلائی is not okay (the cluster is kh+l, which is ambiguous). Anatoli T. (обсудить/вклад) 06:39, 8 September 2023 (UTC)Reply
@Atitarev Thanks! Benwing2 (talk) 06:51, 8 September 2023 (UTC)Reply
@Benwing2, @Sameerhameedy: No worries.
جنوبی, on the other hand, should produce NIL (currently "jnobī"), since جن ("jn") is an impossible cluster, especially word-initial, the correct vocalisation is جُنُوبی (junūbī) or جَنُوبی (janūbī). Anatoli T. (обсудить/вклад) 08:28, 8 September 2023 (UTC)Reply
@Atitarev I assume جن can occur word-medially in words like Arabic مَجْنُون (majnūn), which appears to become مَجْنُوں (majnū̃) in Urdu. Either we need to have a full understanding of Urdu phonotactics, or (better) we need to require sukuuns to mark consonant clusters. I believe requiring sukuuns is the right thing. Benwing2 (talk) 08:38, 8 September 2023 (UTC)Reply
@Benwing2: Sorry, I didn't express myself clearly. "jn" is not a digraph, so it requires either a sukun or a vowel. Yes, we need a better understanding. Anatoli T. (обсудить/вклад) 08:43, 8 September 2023 (UTC)Reply
Requiring Jazm would be necessary in my opinion. Without it, there's just too many ambiguities. نعم البدل (talk) 08:45, 8 September 2023 (UTC)Reply
@نعم البدل By jazm you mean sukuun? sukuun is a symbol, whereas from what I can tell, jazm is a phonological concept referring to a consonant followed directly by another consonant. Benwing2 (talk) 08:57, 8 September 2023 (UTC)Reply
@Benwing2: yes we use the terms Sukoon and Jazm interchangeably in Urdu. نعم البدل (talk) 09:15, 8 September 2023 (UTC)Reply
@نعم البدل Let's standardize on sukuun (or sukoon, whatever). sukuun has only one meaning, which is consistent everywhere, but jazm has multiple meanings and is inconsistent: in Arabic, jazm means only the jussive mood (which happens to be marked by a null ending in Arabic); its extension of use to mean both a consonant cluster and the symbol marking a consonant cluster shows a general confusion between morphology, phonology and orthography. Benwing2 (talk) 09:27, 8 September 2023 (UTC)Reply
Yeah, sorry. I'll use sukoon henceforth. نعم البدل (talk) 09:29, 8 September 2023 (UTC)Reply
@Sameerhameedy – The module is returning nil in cases like رَہنا (to live) and بَہنا (to flow). It essentially happens in words with two-letter stems, which have only one diacritic, plus the infinitive ending (na). نعم البدل (talk) 01:18, 10 September 2023 (UTC)Reply
@نعم البدل Hi, consonant clusters now require a sukoon in order to prevent the large amount of blank transliterations that have been happening. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 01:21, 10 September 2023 (UTC)Reply
@Sameerhameedy: Is it not possible to add a clause for verbs specifically, because it's grammatically incorrect and it will mess up Module:ur-hi-convert generations as well? نعم البدل (talk) 01:23, 10 September 2023 (UTC)Reply
grammatically incorrect for verbs*, might I add. I think it's something that we didn't consider in the discussions about consonant clusters. نعم البدل (talk) 01:25, 10 September 2023 (UTC)Reply
@نعم البدل Not sure I understand, why do you think we need a special case for verbs? I'd rather not do that. What is messing up? Let's fix that instead. Benwing2 (talk) 01:44, 10 September 2023 (UTC)Reply
@نعم البدل Are you sure? Urdu lughat does use sukoon for رہنا (though the font uses the Quranic style of sukoon, which resembles a ح cut in half). IMO this is more likely an issue with hi-translit, unless there is a reason why he + sukoon = ā instead of "h"?
But if you think ur-translit is the problem then you'd have to ask @Benwing2 to look into it, because I genuinely have no idea how I would make transliteration exceptions for certain lemma types.
There are still other issues with ur-translit that I will be working on later, so I'll check whether anything can be done on ur-translit's side. But I don't think there is. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 01:42, 10 September 2023 (UTC)Reply
@Sameerhameedy, Benwing2:
  • Are you sure? Urdu lughat does use sukoon for رہنا – Huh, I never noticed that, I'd always thought that when it came to verbs, the Sukoon was never needed, since the stem is separate to the infinitive. If so, I guess it can stay as it is.
  • Because I genuinely have no idea how I would make transliteration exceptions for certain lemma types. – I just assumed that some botch with Module:ur-headword could create an exception of some kind. I'm not great at coding lmao, but if not then it's not the end of the world.
The issue with Module:ur-hi-convert is that the sukoon will combine the stem with the infinitive, so for instance رَہْنا (rahnā) would become रह्ना (rahnā) instead of रहना (rahnā), and this would have to change for all Urdu verbs, so I'm wondering if some sort of exception can be added in either module or whether this will cause the ur-hi convert mod to be redundant for Urdu verbs. نعم البدل (talk) 02:22, 10 September 2023 (UTC)Reply
Unless we use the generations of Module:ur-hi-convert, and put them in Module:hi-translit and use that as the TR, but only for Urdu verbs – but the TR will be different to the current TR policy? نعم البدل (talk) 02:31, 10 September 2023 (UTC)Reply
@نعم البدل But this is a more general issue, isn't it? Consonant clusters can be rendered two ways in Devanagari (either through a conjunct consonant or two consonants placed next to each other) and AFAIK there's no way to predict which one will be used in any particular circumstance. Benwing2 (talk) 03:05, 10 September 2023 (UTC)Reply
@Benwing2: But in this case the pattern would be that the sukoon before the infinitive in verbs just needs to be ignored, ie. if the mod is being used for Template:ur-verb the sukoon before the نا – last two characters needs to just return "" or continue (or something similar)? نعم البدل (talk) 03:14, 10 September 2023 (UTC)Reply
@نعم البدل We can make a special hack for this case but in general Module:ur-hi-convert cannot work correctly in all circumstances as it will get clusters wrong much of the time. Benwing2 (talk) 03:43, 10 September 2023 (UTC)Reply
@Benwing2 Module:ur-hi-convert is so botched that I'm considering making a new module from scratch in the sandbox, in any case, in foresight, if we could get that exception for verbs, it would be quite convenient for it to work with the convert mod. نعم البدل (talk) 14:20, 10 September 2023 (UTC)Reply
@نعم البدل If all verbs end in -nā, you can just make Module:ur-hi-convert recognize and handle this specially. I think this is a better solution than hacking the translit to have a special no-sukuun exception for this case. Benwing2 (talk) 02:26, 11 September 2023 (UTC)Reply
@Benwing2: How can I make it specific to Template:ur-verb? نعم البدل (talk) 17:53, 11 September 2023 (UTC)Reply
@نعم البدل It should not be specific to Template:ur-verb. Benwing2 (talk) 18:59, 11 September 2023 (UTC)Reply
It won't work without it. Urdu verbs tend to end in -nā, but that's not to say every Urdu word that ends in -nā is a verb. نعم البدل (talk) 21:40, 12 September 2023 (UTC)Reply
@نعم البدل IMO having a hack for verbs is the wrong approach. Why can't you just make ALL clusters with sukuun be rendered using Module:ur-hi-convert using two consonants next to each other rather than using a conjunct? Conjuncts aren't so common in modern Hindi. Benwing2 (talk) 23:18, 12 September 2023 (UTC)Reply
@Benwing2, @نعم البدل, @Sameerhameedy:
पाकिस्तानी (pākistānī) uses a conjunct "st", ट्रैक्टर (ṭraikṭar) uses a conjunct "ṭr". Conjuncts are very common in Hindi. Anatoli T. (обсудить/вклад) 05:52, 20 September 2023 (UTC)Reply
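To make the two options in this thread concrete, here is a small sketch in Python (illustration only, not the code of Module:ur-hi-convert). The mapping covers just the letters of رَہْنا, and `to_devanagari` and its `use_conjuncts` flag are hypothetical names. With the flag on, a sukoon emits a virama and so produces a conjunct; with it off, the sukoon is dropped and the consonants are simply written next to each other.

```python
ZABAR = "\u064E"   # short a; inherent in a Devanagari consonant, so not written
SUKOON = "\u0652"  # marks a consonant with no following vowel
VIRAMA = "\u094D"  # Devanagari sign that joins consonants into a conjunct

# Tiny illustrative mapping, just enough for the example word.
CONSONANTS = {
    "\u0631": "\u0930",  # re        -> ra
    "\u06C1": "\u0939",  # he (goal) -> ha
    "\u0646": "\u0928",  # noon      -> na
}
VOWEL_SIGNS = {
    "\u0627": "\u093E",  # alif after a consonant -> aa matra
}


def to_devanagari(word: str, use_conjuncts: bool) -> str:
    """Convert a fully vocalised Urdu word using the toy mapping above."""
    out = []
    for ch in word:
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch == ZABAR:
            continue  # inherent vowel, nothing to emit
        elif ch == SUKOON:
            if use_conjuncts:
                out.append(VIRAMA)  # conjunct spelling
            # otherwise emit nothing: consonants sit side by side
        else:
            raise ValueError(f"unmapped character: {ch!r}")
    return "".join(out)
```

With this toy mapping, رَہْنا comes out as रह्ना when `use_conjuncts` is true and as रहना when it is false. Neither setting is right for every word, which is the point made above: verbs want the second form, while words like पाकिस्तानी want the first.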

New test cases


@Sameerhameedy, @Benwing2:

Hi,

Thinking of adding these test cases based on w:Urdu_alphabet#Iẓāfat:

  1. شیرِ پَنْجاب (śer-i panjāb, the lion of Punjab) śer-i or śer-e?
  2. مَلِکَۂ دُنْیا (malika-yi dunyā, the queen of the world) malikā-yi or malikā-ye?
  3. وَلِئِ کامِل (vali-i kāmil, perfect saint) vali-i, vali-e, vali-yi or vali-ye?
  4. مَۓ عِشْق (ma-ye 'iśq, the wine of love) Is "ma-ye" correct?
  5. رُوئے زَمِین (rūe zamīn, the surface of the Earth) as above
  6. صَدائے بُلَنْد (sadāe buland, a high voice) as above

You see, they are not all working as expected but I want to check if I vocalised them correctly first. Anatoli T. (обсудить/вклад) 08:15, 8 September 2023 (UTC)Reply

For the 6th example, is that really written with ghunna instead of sukoon? I have set ghunna to be a nasal vowel unless it's in front of specifically listed characters, namely گکجچخغٹڈڑحہ. In front of those characters, ghunna becomes an assimilated nasal consonant. Should I add all the dentals as well? I didn't think ghunna was used before them. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 08:44, 8 September 2023 (UTC)Reply
There isn't a guideline for how ghunna assimilates so I based the assimilation on how Hindi uses ं. And Hindi does not use ं before dentals. But obviously the scripts are different, so that might not apply to Urdu. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 08:47, 8 September 2023 (UTC)Reply
@Sameerhameedy: It may be a sukun. I’ll try to find out. Pls fix for now. Anatoli T. (обсудить/вклад) 08:51, 8 September 2023 (UTC)Reply
Fixed. Since there is no guideline for Urdu's noon ghunna, I made noon ghunna mirror noon ghunna in Punjabi and how Hindi uses ं. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 21:17, 8 September 2023 (UTC)Reply
@Atitarev Actually, I will undo this change. According to this official dictionary from the Pakistan government [1] (at least judging by which words had a sukoon vs a ghunna), an assimilated noon is represented with a sukoon, and a noon ghunna is usually a nasal vowel. By the looks of it, it seems ان٘گ = aŋ, انْگ = ang. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 22:03, 8 September 2023 (UTC)Reply
@Sameerhameedy: Apologies for not responding earlier. I am flat out at work with a new project. Thanks for checking.
I think vowel + ن٘ (̃) is almost like Hindi vowel + (̃) as in दांत (dānt) but our transliteration is phonetic, so it's transliterated as "dānt", just like दान्त (dānt) and I think there is no difference to दाँत (dā̃t). Feel free to edit and add new transliteration cases when you're more certain.
I am more interested in iẓāfat in this case and how it should be transliterated: "e" or "i", with or without "y", and at what position. @نعم البدل: please also comment on vocalisation and "expected" transliterations. Anatoli T. (обсудить/вклад) 06:35, 12 September 2023 (UTC)Reply
@Atitarev Historically, the izafat in Urdu has been transliterated as -i, but I'm still not sure what should be used for the TR. I likely would have leaned towards -i if we were making a distinction between individual letters, and it would have made sense since the zer is a 'short i', but since we're evidently not so strict with the TR, -e might be on the table. I believe, I've already expressed my opinion regarding nasalisation. نعم البدل (talk) 21:37, 12 September 2023 (UTC)Reply
@نعم البدل, Sameerhameedy, Benwing2:: Thank you! If you agree, we will leave iẓāfat with a zer (kasra) as "-i" or "-yi". ئے (e) seems to always produce "e". Please take another look at the iẓāfat endings in all examples. I added some comments. Are there any changes required to the vocalisations and the currently produced (automatic) transliterations? --Anatoli T. (обсудить/вклад) 00:18, 13 September 2023 (UTC)Reply
@نعم البدل: Just to clarify, I saw your message about "bulãd" vs "buland" in another discussion to which I agree. So I'll make test cases considering this. It may be hard to follow and remember all discussions but if a test case is created, it will be addressed sooner or later. E.g. for صَدائے بُلَن٘د (sadāe bulãd), it should be "sadāe buland", not "sadāe bulãd". I will ping you when I make cases after your response. (And as User:Sameerhameedy said in the same discussion, بُلَنْد (buland) produces "buland", so is بُلَن٘د (bulãd) still a correct vocalisation and transliteration?) (repeating the ping, since I included Sameerhameedy as well). --Anatoli T. (обсудить/вклад) 00:49, 13 September 2023 (UTC)Reply
@Atitarev Urdu lughat uses ghunna only for a nasal vowel; the only place it uses ghunna for consonant assimilation is before kaaf and gaaf. In this case, UR-L would only use the spelling بُلَنْد for "buland", but would use دان٘ت for dā̃t. My usage of ghunna in fa-IPA was wrong and I'll remove it soon. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 18:26, 14 September 2023 (UTC)Reply
@Sameerhameedy: Thanks. I've changed to sukuun.
Re: izāfat endings, it seems "-i" (not "-e") is correct for "zer" diacritic, so the automated transliterations should now be correct?
What about before qaaf (if this happens)? Do you have a copy of Urdu Lughat? Anatoli T. (обсудить/вклад) 23:56, 14 September 2023 (UTC)Reply
@Atitarev So after hours of research I found [2] and a book by Manjari Ohala (I can't access the book directly, but it's quoted on Wikipedia and on that site). These are the only phonetic guides I could find that focus on Urdu. According to both of them, nasalized vowels tend to assimilate before certain voiced stops (b, j, g) but (almost) never before voiceless stops (k, c, p). Nasal vowel assimilation before voiceless stops is a rarity and only really occurs in loanwords. This seems to be the exact same case with Hindi: hi-translit removes nasal vowels before b, bh, j, jh, g, gh, Dh, dh but leaves them in front of t, th, T, Th, c, ch, k, kh, d, D, even if they use the strict nasalization marker ँ. So for بَین٘ک, the assimilation most likely has to be inputted manually. It seems this is a rare case anyways, so it probably isn't worth creating a hack for it. There's really no other option. At least, I cannot think of a way that wouldn't break entries like سان٘پ آن٘کھ پان٘چ. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 05:10, 15 September 2023 (UTC)Reply

ah -> ā


@Sameerhameedy Hi, final 'a'/'ah' words are being returned as ā. It shouldn't be the case. Otherwise, we'll have alternative forms with redundant transliterations. نعم البدل (talk) 22:37, 8 September 2023 (UTC)Reply

it changes back to -> "ah" if you put a sukoon on it. Since, according to wikipedia, it's not consistently pronounced as a short vowel in Urdu. If you want, I can change it so that a final -he becomes -a, but a final zabar + he = -ā. Or vice versa? سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 22:54, 8 September 2023 (UTC)Reply
Final-a words like کھاتا (khātā), کَھاتَہ (khāta) (alt-form of the first lemma) and جَگَہْ (jagah). The first and last are fine, it's just the middle one. Words which end in zabar he, shouldn't be prolonged in this case, despite the pronunciation. نعم البدل (talk) 01:01, 9 September 2023 (UTC)Reply
Any update on this? نعم البدل (talk) 17:54, 11 September 2023 (UTC)Reply
I was looking into this, this seems like it could be problematic due to how many entries list a final he as an -ā. Are you sure the hamza trick isn't good enough? It allows a final he to be transliterated as -ā and as -ah. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 06:45, 17 September 2023 (UTC)Reply
@Sameerhameedy:
  • how many entries list a final he as an -ā – this is because final he usually becomes -ā in Hindi (with exceptions like जगह (jagah) / جَگَہْ (jagah)), and the TR used to be just copied over, even though they're not strictly the same form. As I said, it would treat lemmas like کھاتا (khātā) and کھاتَہ (khāta) the same. Is the issue strictly that the TR in older lemmas won't match the newer TR, or is it that the code might become too ambiguous? Because many of the TRs in the old lemmas need correcting anyways.
  • Are you sure the hamza trick isn't good enough? Sorry, did you mean sukoon? نعم البدل (talk) 10:41, 17 September 2023 (UTC)Reply
@نعم البدل, Yes I meant sukoon sorry.
There's no technical reason why we can't change a final -he. I was just worried about changing it since transliterating a final he as -ā seems to be common practice. But if you're 100% sure it should be changed, I can do it. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 19:19, 17 September 2023 (UTC)Reply
@Sameerhameedy: It would be preferable if you could change it, thanks. نعم البدل (talk) 00:16, 19 September 2023 (UTC)Reply
@نعم البدل Okay, I'll trust your judgement; the change has been made. It looks like it might've caused small issues for words with الله or ۂ. I'll fix those soon. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 04:00, 19 September 2023 (UTC)Reply
Thank you! نعم البدل (talk) 16:55, 20 September 2023 (UTC)Reply
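For reference, the outcome of this thread can be sketched as a tiny rule. This is an illustrative Python sketch under the assumptions stated in the discussion, not the module's actual Lua code, and `final_he_reading` is a hypothetical name: a word-final ہ carrying a sukoon is read as "h", and a word-final ہ without one is silent, so the preceding zabar alone supplies the short "a".

```python
HE = "\u06C1"      # gol he
SUKOON = "\u0652"


def final_he_reading(word: str):
    """Return the Latin value of a word-final he, or None if there is none.

    Assumed rule from the discussion above:
      he + sukoon -> "h"  (jagah)
      bare he     -> ""   (the preceding zabar alone gives -a, as in khāta)
    """
    if word.endswith(HE + SUKOON):
        return "h"
    if word.endswith(HE):
        return ""
    return None
```

So جَگَہْ keeps its "h", کھاتَہ ends in a short "a", and کھاتا (which ends in alif, not he) is not affected by the rule at all.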

Diphthongs اَی and اَو, also اَئ


@Sameerhameedy:

Hi,

How should diphthongs be presented?

  1. ٹَرَیکْٹَر (ṭaraikṭar) or ٹَرَیْکْٹَر (ṭaraykṭar)
  2. اَور (aur) or اَوْر (avr)

Also:

  1. Is فائِر اِنْجَن (fāir injan) a correct vocalisation?

Anatoli T. (обсудить/вклад) 00:44, 15 September 2023 (UTC)Reply

@Atitarev "ai" and "au" should never have a sukoon because they are actually independent vowels, not a vowel + consonant.
ai = /ɛ/
au = /ɔ/
so اَور is /ɔɾ/, not /əʋɾ/ or /əwɾ/
yes that vocalization looks correct. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 01:01, 15 September 2023 (UTC)Reply
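The readings mentioned on this page for ye and vao can be collected into a small lookup table. The sketch below is Python for illustration only, with hypothetical names. The ye rows follow the values given earlier on this page (bare "e", zer "ī", zabar "ai"); the vao rows are inferred from examples on this page such as "jnobī", "junūbī" and "aur", so treat them as assumptions rather than a statement of policy.

```python
ZABAR, ZER, PESH = "\u064E", "\u0650", "\u064F"
YE, VAO = "\u06CC", "\u0648"

# (mark on the preceding consonant, vowel letter) -> transliteration.
# None means the preceding consonant carries no diacritic.
READINGS = {
    (None, YE): "e",
    (ZER, YE): "ī",
    (ZABAR, YE): "ai",   # a single vowel, not a + y
    (None, VAO): "o",
    (PESH, VAO): "ū",
    (ZABAR, VAO): "au",  # a single vowel, not a + v
}


def read_vowel_letter(mark, letter):
    """Look up how an unmarked ye or vao is read after the given mark."""
    return READINGS.get((mark, letter))
```

Because "ai" and "au" are treated as single vowels, the table has no entry in which ye or vao carries a sukoon; a sukoon on the letter would instead signal a consonantal reading, as in the "ṭaraykṭar" and "avr" alternatives above.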

variants of 'n'


Hi @Sameerhameedy. About the various transliterations of 'n', as in ہونْٹ (honṭ) and پَنْجاب (panjāb): please normalise them as 'n', as these sounds (the retroflex 'n' and, to a lesser extent, the palatal 'n') aren't found in Urdu (neither in the alphabet nor in the phonetic inventory) and are specific to Hindi. نعم البدل (talk) 13:29, 17 September 2023 (UTC)Reply

Arabic tāʾ marbūṭa + الْ with an unmarked vowel


@Sameerhameedy, @نعم البدل, @Benwing2: Hi. Here's an example دائِرَۃُ الْمَعارِف‎ (encyclopedia) with both tā marbūṭa + الْ (al-) with an unmarked vowel.

Will the module be able to automatically transliterate it as "dāiratu l-ma'ārif"? Is it a good and correct test case, or should we just transliterate such words manually? Should the الْ be spelled as ٱلْ to show that it's silent? Anatoli T. (обсудить/вклад) 23:23, 24 September 2023 (UTC)Reply

@Atitarev It should be possible to make it handle this correctly as the Arabic module can do it. Benwing2 (talk) 02:43, 25 September 2023 (UTC)Reply
@Atitarev It has to use the letter "te marbūta goal" ۃ (which is encoded differently than the Arabic one ة "ye marbūta") but it should work. In دائِرَۃُ ُالْمَعَارِف (dāiratu ulma'ārif) the ۃ is correct: when it has a vowel it becomes "atV", but otherwise it is "a". However, I'm not sure how to handle the Arabic al-, because I'm not sure how the module would distinguish Arabic words starting with al- from Urdu words (which shouldn't transliterate like that). Theoretically we could get it to work with ٱلْ, but that diacritic is not on Urdu keyboards. It's not type-able on either my phone or laptop. Not sure how I feel about using diacritics that are not utilized by Urdu at all and that most Urdu speakers don't normally have access to. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 20:01, 29 September 2023 (UTC)Reply
I might be able to make it so that an initial alif with a sukoon is deleted though, could that work? سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 20:02, 29 September 2023 (UTC)Reply
@Sameerhameedy Are there any native Urdu words that end in "te marbūta goal" + vowel + ال in the next word? I'd think that the sequence of ‎"te marbūta goal" + vowel indicates a phrase borrowed from Arabic. Benwing2 (talk) 20:09, 29 September 2023 (UTC)Reply
Thanks, I suppose I could get that sequence to work. The only issue is that an initial alif has to be paired with a vowel or else the module will return nil. I would have to have the alif paired with a sukoon. I could have it so that "alif + sukoon + laam + sukoon" always returns "l-". سَمِیر | Sameer (مشارکت‌ها · کتی من گپ بزن) 20:29, 29 September 2023 (UTC)
@Sameerhameedy, @Benwing2: Thanks for addressing this. You can use the same methods as in the Arabic module, even if such a combination is much less common. I also think "dāiratu ul-ma'ārif" with a hyphen would be more accurate but I don't have a strong opinion about it. It may not be so necessary to insert hyphens for etymological reasons only.
What do you think of کِتابُ الْمَعارِف (kitābu l-ma'ārif) (or kitābu lma'ārif - with no hyphen)? I think the indication that it is an Arabic borrowing and "ا" should be read as "ٱ", is the lack of a diacritic over the alif and a vowel on a preceding word. Would that work? Anatoli T. (обсудить/вклад) 00:24, 30 September 2023 (UTC)Reply
@Sameerhameedy I agree with User:Atitarev, it would be better to not require the sukoon over the alif (since it's a nonstandard use of sukoon, which is normally only placed over consonants, and is likely to confuse users). That would probably mean adding a special case for this situation to allow it. Benwing2 (talk) 00:47, 30 September 2023 (UTC)Reply
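
The two rules discussed in this thread can be sketched roughly as follows. This is a Python illustration only, not the module's Lua code: the function names are hypothetical, the first function follows the "atV, otherwise a" description given above, and the second follows the suggestion that a bare alif + laam + sukoon after a word ending in a short vowel marks the Arabic article.

```python
from typing import Optional

FATHA, KASRA, DAMMA, SUKUN = "\u064e", "\u0650", "\u064f", "\u0652"
ALIF, LAM = "\u0627", "\u0644"
SHORT_VOWELS = {FATHA: "a", KASRA: "i", DAMMA: "u"}

def translit_teh_marbuta_goal(mark: Optional[str]) -> str:
    """Transliterate U+06C3 given the diacritic written on it (or None)."""
    if mark in SHORT_VOWELS:
        return "at" + SHORT_VOWELS[mark]  # vocalised: "atV"
    return "a"                            # unvocalised: plain "a"

def is_elided_article(prev_word: str, word: str) -> bool:
    """True if `word` starts with bare alif + laam + sukoon and the
    preceding word ends in a short vowel (assumed Arabic phrase)."""
    starts_with_al = word.startswith(ALIF + LAM + SUKUN)
    prev_ends_in_vowel = prev_word[-1:] in SHORT_VOWELS
    return starts_with_al and prev_ends_in_vowel

kitabu = "\u06a9\u0650\u062a\u0627\u0628\u064f"                      # kitābu
almaarif = "\u0627\u0644\u0652\u0645\u064e\u0639\u0627\u0631\u0650\u0641"  # al-ma'ārif
print(translit_teh_marbuta_goal(DAMMA))     # atu
print(is_elided_article(kitabu, almaarif))  # True
```

A check like `is_elided_article` needs no sukoon on the alif, which is the point raised in the last two comments; whether it misfires on native Urdu words is exactly the open question in the thread.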

Final ہ after vowels

@Sameerhameedy:

Should these گْرَہ (gra), گِرَہ (gira) be exceptions, with "h" pronounced? Anatoli T. (обсудить/вклад) 10:15, 30 September 2023 (UTC)Reply

@Atitarev if you put a sukoon on the he it'll show the "h" سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 14:17, 30 September 2023 (UTC)Reply
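
The behaviour described in the reply (final ہ is silent unless it carries a sukoon) can be illustrated with a small Python sketch. The function name and return convention are assumptions made for the example; the module's real Lua logic may differ.

```python
from typing import Optional

HEH_GOAL = "\u06c1"  # ہ
SUKUN = "\u0652"

def final_heh(word: str) -> Optional[str]:
    """Return the transliteration of a word-final heh goal:
    'h' if it carries a sukoon, '' if it is bare (silent),
    None if the word does not end in heh goal at all."""
    if word.endswith(HEH_GOAL + SUKUN):
        return "h"
    if word.endswith(HEH_GOAL):
        return ""
    return None

girah = "\u06af\u0650\u0631\u064e\u06c1"  # گِرَہ
print(repr(final_heh(girah)))          # ''  -> gira
print(repr(final_heh(girah + SUKUN)))  # 'h' -> girah
```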

Handling Arabic al-

@Sameerhameedy, @Benwing2: Hi, the Arabic ال article is not (yet) handled but is it preferred to do:

  1. عِید اُلْفِطْر ('īd ulfitr) (automated): not correct from the Arabic point of view, but it produces a correct transliteration; or
  2. عِیْدُ الْفِطْر ('īd ul-fitr) (manual) (possibly just "īd ulfitr" with no hyphens?)

Benwing2 suggested the Arabic article can be handled just as in the Arabic module, but Sameerhameedy has previously removed my test case. So I just want to know your views, also regarding assimilation of L before Arabic sun letters, e.g. عیدُ الصَّوم ('īd us-saum), and tāʾ marbūṭa ۃ in the previous topic (above). Anatoli T. (обсудить/вклад) 00:08, 10 October 2023 (UTC)

@Atitarev I removed it because I didn't think it was possible, but I'll use ben's suggestion so you can add the test case back. Or perhaps I will when I need to test it.
It'll probably be vocalized the second way سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 00:11, 10 October 2023 (UTC)Reply
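
The sun-letter assimilation mentioned above could look something like this. It is a Python sketch under stated assumptions, not the module's code: the function name is hypothetical, and the Latin values in the table (for example s for ث and ص, z for ذ, ض and ظ, ś for ش) are assumed from the transliterations quoted on this page.

```python
# Arabic sun letters mapped to an assumed Urdu transliteration.
SUN_LETTERS = {
    "\u062a": "t", "\u062b": "s", "\u062f": "d", "\u0630": "z",
    "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "ś",
    "\u0635": "s", "\u0636": "z", "\u0637": "t", "\u0638": "z",
    "\u0644": "l", "\u0646": "n",
}

def article_translit(first_letter: str, linking_vowel: str = "u") -> str:
    """Transliterate the Arabic article before `first_letter`:
    the l assimilates to a sun letter, otherwise it stays 'l'."""
    return linking_vowel + SUN_LETTERS.get(first_letter, "l") + "-"

print(article_translit("\u0635"))  # us-  (as in 'īd us-saum)
print(article_translit("\u0641"))  # ul-  (as in 'īd ul-fitr)
```

The caller would still have to avoid doubling the consonant when the following letter carries a shadda, which this sketch does not attempt.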

Word of Gratitude

The module works great by the way – I've not had a single issue with it so far, so thank you. @Sameerhameedy نعم البدل (talk) 17:35, 2 November 2023 (UTC)Reply

و as short u in خوش

@Sameerhameedy Hi, is there a way to indicate و is a short 'u' like in خوش or should the transliteration be done manually? Thanks. Exarchus (talk) 12:36, 10 January 2024 (UTC)Reply

@Atitarev Any ideas? Maybe just writing a hack that converts خوش and خود to 'xuś' and 'xud'? Exarchus (talk) 14:03, 14 January 2024 (UTC)Reply
@Exarchus: It would be worth adding a hack if خوش (xuś) and خود (xud) are consistently pronounced so, otherwise a manual transliteration is required. Anatoli T. (обсудить/вклад) 21:39, 14 January 2024 (UTC)Reply
Also, I think ؤُ (ū) should always give a short "u", not "ū" as in ساؤُتھ اَفْرِیقَہ (sāuth afrīqa).
@نعم البدل, @Exarchus, @Sameerhameedy. I made a case in Module:ur-translit/testcases a while ago. Anatoli T. (обсудить/вклад) 21:49, 14 January 2024 (UTC)Reply
@Atitarev I think I fixed that issue Exarchus (talk) 23:09, 14 January 2024 (UTC)Reply
@Exarchus: Yes, you did, thank you! Anatoli T. (обсудить/вклад) 23:16, 14 January 2024 (UTC)Reply
@Atitarev There appears to be a complication: sometimes ؤُ is intended to mean 'ū', like in طاؤس and ساؤ. I know this doesn't conform to the rules in Arabic, but in Urdu ؤُ can mean both 'u' and 'ū'. At the end of a word (from what I'm seeing in the Platts dictionary), it's apparently always long, so a rule can be introduced for that, and then there's only طاؤس left to do manually. Exarchus (talk) 09:34, 15 January 2024 (UTC)Reply
Btw, the title for the Arabic entry طاووس might be wrong, although ṭāwūs apparently exists too. Exarchus (talk) 09:48, 15 January 2024 (UTC)Reply
Unrelated to Urdu, but several dictionaries connect طاووس to a root ط-و-س, connecting it to طاس. A spurious etymology? Exarchus (talk) 10:20, 15 January 2024 (UTC)Reply
@Exarchus: Yes, that complicates things. Perhaps it's worth finding out which is more common, a long or a short "u", and manually transliterating the less common one. Anatoli T. (обсудить/вклад) 03:06, 16 January 2024 (UTC)
@Atitarev Looking at Platts, I find:
- word-finally: sāʼū, lāʼū, nayaʼū, wahaʼū (none from Arabic)
- word-medially: tafāʼul, raʼuf vs. tāʼūs(ī) (all from Arabic)
So if you don't want separate rules for word-finally vs. medially, then long 'ū' should be preferred. Exarchus (talk) 09:44, 16 January 2024 (UTC)Reply
@Exarchus: Thanks, I am OK, if you revert the change.
I'm not sure how hard it would be to implement the module logic.
If it's going to default to "ū", I will remove the test case. Anatoli T. (обсудить/вклад) 21:41, 16 January 2024 (UTC)Reply
@Atitarev Well, I had already changed it to: word-finally = 'ū', word-medially = 'u', nothing difficult to implement, and you can rationalise this by saying that word-final 'u' becomes long anyway. Exarchus (talk) 21:48, 16 January 2024 (UTC)Reply
@Exarchus: Thanks, I missed that edit. Anatoli T. (обсудить/вклад) 22:12, 16 January 2024 (UTC)Reply
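
The positional rule settled on above (word-final ؤُ gives 'ū', word-medial gives 'u') can be sketched as below. This is an illustrative Python version with a hypothetical function name, not the edit that was actually made to the Lua module; words like طاؤس, with a medial long vowel, would still need manual transliteration.

```python
WAW_HAMZA = "\u0624"  # ؤ
DAMMA = "\u064f"

def waw_hamza_damma_vowel(word: str, index: int) -> str:
    """Vowel for the waw-hamza + damma sequence starting at `index`:
    long at the end of the word, short elsewhere."""
    if word[index:index + 2] != WAW_HAMZA + DAMMA:
        raise ValueError("no waw-hamza + damma at this index")
    return "ū" if index + 2 == len(word) else "u"

sau = "\u0633\u0627\u0624\u064f"                # ساؤُ
sauth = "\u0633\u0627\u0624\u064f\u062a\u06be"  # ساؤُتھ
print(waw_hamza_damma_vowel(sau, 2))    # ū
print(waw_hamza_damma_vowel(sauth, 2))  # u
```
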
@Exarchus, @Sameerhameedy, @نعم البدل:
Hi. What are the length rules for ئِ (-i) as in لائِسَنْس (lāisans), are they similar to ؤُ (ū) or is it always short? Anatoli T. (обсудить/вклад) 23:46, 21 January 2024 (UTC)Reply
@Atitarev From what I've seen, the regular Arabic spelling is followed there, so to have a long ī, ئِي is used. Exarchus (talk) 10:11, 22 January 2024 (UTC)Reply
@Atitarev There are a few exceptions like Persian خود (classical: 'xōd', meaning: 'steel helmet') and خوشه, but those are pretty rare words (they don't have Urdu entries on Wiktionary). Exarchus (talk) 21:53, 14 January 2024 (UTC)
@Exarchus: In Persian, these are handled separately, Sameerhameedy has added a handling, which works nicely for Classical Persian and Dari. The Iranian Persian is manually transliterated. Anatoli T. (обсудить/вклад) 21:56, 14 January 2024 (UTC)Reply
@Exarchus, Atitarev: This is what's called a Silent Vav (or Vā'o-i Ma'dūla). I don't see how you can identify a silent vao in code, considering there's no diacritic or rule that you can follow to recognise it. It would almost certainly always need to be a manual translit, and I'll be honest, I don't really think there is any need to highlight a silent vao. Transliterating خود as xod, even though it's pronounced as xud, is fine, really. نعم البدل (talk) 22:28, 14 January 2024 (UTC)
@Exarchus Hi I did create a hack that dealt with this for Module:fa-cls-translit, and I can add it to the urdu module. Though I probably will not be able to work on it anytime soon. سَمِیر | Sameer (مشارکت‌ها · بحث) 23:04, 14 January 2024 (UTC)Reply
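
The kind of hack proposed in this thread amounts to a whole-word override checked before the general rules. A minimal Python sketch, assuming a hypothetical `fallback` function that stands in for the normal transliteration; it is not the code used in Module:fa-cls-translit or in this module.

```python
from typing import Callable

# Hypothetical override table for words with a silent waw.
SILENT_WAW_OVERRIDES = {
    "\u062e\u0648\u0634": "xuś",  # خوش
    "\u062e\u0648\u062f": "xud",  # خود
}

def translit_with_overrides(word: str, fallback: Callable[[str], str]) -> str:
    """Use the override if the whole word matches, else the general rules."""
    if word in SILENT_WAW_OVERRIDES:
        return SILENT_WAW_OVERRIDES[word]
    return fallback(word)

# `fallback` here is a placeholder, not a real transliterator.
print(translit_with_overrides("\u062e\u0648\u0634", lambda w: "?"))  # xuś
```

Because it matches whole words only, compounds and vocalised spellings would not be caught, and the rare homographs mentioned above would still need a manual transliteration.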