Wiktionary:Beer parlour/2021/February

From Wiktionary, the free dictionary

Splitting WT:RFVN


This page is getting close to unworkable due to its size. It takes a long time to make any edits to it. I think we need to split it by month, or maybe by language family. Benwing2 (talk) 06:26, 1 February 2021 (UTC)

Latin script and non-Latin script?--Karaeng Matoaya (talk) 06:31, 1 February 2021 (UTC)
I think language family is a good idea, or even just geographical area, e.g. /European_languages, /African_languages, /Asian_languages, /Oceanian_languages, /indigenous_American_languages. —Mahāgaja · talk 10:43, 1 February 2021 (UTC)
Support by language family, absolutely not by geography. DTLHS (talk) 18:04, 1 February 2021 (UTC)
OK, except there really shouldn't be 150 different subpages. I'd support making subpages for 4 or 5 of the largest families (e.g. Indo-European, Afroasiatic, Sino-Tibetan, Austronesian) and then having one subpage for everything else. —Mahāgaja · talk 20:01, 1 February 2021 (UTC)
I suggested splitting RFD last year: Wiktionary:Beer_parlour/2020/October#Splitting_RFD_non-English. Vox Sciurorum (talk) 17:57, 1 February 2021 (UTC)
The real solution is to close old RFVs, which is a task that often needs to be left to specialists. Let's make an effort to clean it out, by doing what we can and pinging knowledgeable people for what we can't, and then reassess. —Μετάknowledgediscuss/deeds 19:22, 1 February 2021 (UTC)
I don't think language family is a good way to do it. We should look at the natural divisions in the community, regardless of genetic relationships. For instance, the CJKV languages are all unrelated to each other, but there's lots of overlap (at least for the CJK part). What's more, the communities for those languages have their own way of doing things and it's very hard for anyone who doesn't have a background in those languages to contribute anything useful to the discussions and workflow. If you think about it, the CJKV part of RFV is already separate for all practical purposes because almost no one outside of the CJKV communities can even understand what the discussion is about, let alone contribute to it. I think splitting off CJKV would make a substantial dent in the overload.
There are probably a couple of other natural divisions: the Turkic, Iranian and Arabic languages are also unrelated and also have a considerable amount of overlap, and then there's South Asia. I'm not sure if those have enough volume to make much of a difference, though. Of course, there are languages that have ties to more than one area: Urdu is both a Middle Eastern and a South Asian language, for instance. Even so, that kind of thing happens with just about any criteria you might use. Chuck Entz (talk) 04:18, 2 February 2021 (UTC)
And what about threaded discussions, as on User talk:Rua? Doesn't en.Wikipedia use something similar for frequently edited project pages, apart from cutting them into multiple pages? Fay Freak (talk) 18:38, 2 February 2021 (UTC)
Splitting is unavoidable. But maybe we could also start closing old entries that have not reached a consensus while leaving the RFV notice in the actual entry, so that readers know the validity of the word has been contested? This is somewhat similar to the system some wikis employ that shows the last verified version of an article. Dixtosa (talk) 17:57, 4 February 2021 (UTC)
Related discussion: Wiktionary:Grease pit/2021/February § Out of memory!. J3133 (talk) 06:13, 5 February 2021 (UTC)
I agree with User:Chuck Entz here. From looking at the recent entries I think we could make things a lot better just by making two splits: (1) CJK languages, (2) Italic (i.e. Latin + Romance languages). There's no reason we need to split everything at once; we can proceed in several stages as needed. Splitting in a way that lines up with communities helps minimize the number of different pages that need monitoring. Benwing2 (talk) 23:33, 7 February 2021 (UTC)
@Benwing2, Chuck Entz, Fay Freak, Mahagaja, Karaeng Matoaya, DTLHS, Metaknowledge, Dixtosa: So, how many groups are proposed?
Perhaps we could put just three groups to a vote for now: 1. CJK (minus Roman-based Vietnamese, just CJK) + the whole of Sino-Tibetan; 2. everything Roman-based; 3. everything in non-Roman scripts. Potentially, Cyrillic-, Greek-, Armenian- and Georgian-script languages could be grouped together with the Roman-script languages. We need to make sure that each group has enough languages and users, though. What do you think? --Anatoli T. (обсудить/вклад) 00:06, 8 February 2021 (UTC)
@Atitarev: I think this is so arbitrary and yet unambiguous (save Serbo-Croatian) that nobody will want to object to it as POV. It is a bit unnatural, though, to separate Polish from Ukrainian etc., and to split up the Turkic languages (I guess you mean Turkmen would go on the Roman side while Kazakh would go on the Cyrillic side). Fay Freak (talk) 00:36, 8 February 2021 (UTC)
@Fay Freak: Yes, it is arbitrary, and this is just a discussion. We're open to other suggestions. Do you have any? I also suggested possibly grouping easy (?) scripts like Cyrillic and Greek together with the Roman-based languages (+ Armenian and Georgian, but I'm less sure about this part).
For many users and editors, a foreign script is a hurdle they won't even try to overcome, so they skip or ignore such words, even though texts in Roman scripts can be full of diacritics. So yes, for them Polish would be OK but not Ukrainian. This is not my own opinion. --Anatoli T. (обсудить/вклад) 00:54, 8 February 2021 (UTC)
(edit conflict) Turkish is in Roman script, Ottoman Turkish is in Arabic script, and most of the rest of Turkic is in Cyrillic. Similar problems with Javanese. Then there are the Gothic and the Italic scripts alongside all the Roman-script German and Latin/Romance languages. Greek really shouldn't be separated from all the European Roman-script languages, nor should the Slavic languages be separated by script (what would you do with Serbian?), so including Greek and Cyrillic with Roman scripts is not optional. Southeast Asia is likely to be a train wreck: Burmese and Tibetan would be lumped with CJK, while Vietnamese would be lumped with Spanish and Lithuanian, and Thai and Khmer would be lumped with Arabic and Ethiopian. Chuck Entz (talk) 01:15, 8 February 2021 (UTC)
An interesting project might be to look at the revision history of RFVN and figure out who contributes to discussions on which languages, then which groupings of languages have the largest numbers of contributors in common. IMO it's all about overlapping expertise.
I think that any global criterion is going to fail spectacularly in some cases. There's no way we can split all the languages of the world cleanly at one go. Let's find a large grouping that seems natural and split it off. Later, after discussion, we can find another one and split it off, etc. Whatever we do, it should be split by language codes so a module can generate a link to the correct one without some IP or first-time logged-in user having to read up on the descriptions of all the different choices. Chuck Entz (talk) 02:11, 8 February 2021 (UTC)
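Chuck Entz's point that any split should be keyed on language codes, so that a module rather than the person tagging the entry picks the page, can be illustrated with a minimal sketch. It is written in Python purely for illustration (Wiktionary's own modules are written in Lua), and the subpage names and the sample codes in each group are hypothetical, not an agreed split.

```python
# Minimal sketch: route a verification request to a discussion page by language code.
# The subpage names and the code lists are hypothetical examples, not an agreed split.
SPLIT_PAGES: dict[str, frozenset[str]] = {
    "WT:RFVN/CJK": frozenset({"zh", "ja", "ko"}),
    "WT:RFVN/Latin and Romance": frozenset({"la", "es", "fr", "it", "pt", "ro"}),
}

# Every language not explicitly split off stays on the existing page.
DEFAULT_PAGE = "WT:RFVN"


def rfv_page_for(lang_code: str) -> str:
    """Return the discussion page that a request for lang_code should link to."""
    for page, codes in SPLIT_PAGES.items():
        if lang_code in codes:
            return page
    return DEFAULT_PAGE


if __name__ == "__main__":
    for code in ("ja", "la", "de"):
        print(code, "->", rfv_page_for(code))
```

Under this design a later split only adds one entry to the table, and nobody tagging an entry ever has to read descriptions of the different pages.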
Just looking at the existing entries in WT:RFVN, I bet it would be sufficient, at least at first, to split off CJK (with "C" construed broadly to contain all Sinitic languages but not the rest of Sino-Tibetan) and leave everything else in the main group. If that turns out not to be enough, we can discuss another split later. —Mahāgaja · talk 07:35, 8 February 2021 (UTC)
I agree that grouping non-Sinitic Sino-Tibetan with CJK would not be of benefit, since the contributors to these languages (save perhaps Tibetan and Burmese) are mostly not CJK editors.
Perhaps another, or a complementary, solution would be to split LDLs from WDLs; any thoughts on that? LDL RFVs generally get little to no input, so the difference in the number of languages shouldn't be an issue. Thadh (talk) 21:57, 8 February 2021 (UTC)
@Thadh: That might be hard to implement given that Chinese (among other languages) is only partially well-documented (only Standard Chinese is considered well-documented). — justin(r)leung (t...) | c=› } 02:26, 11 February 2021 (UTC)
@Mahagaja, Chuck Entz, Dixtosa, Metaknowledge, Vox Sciurorum, Fay Freak, Atitarev: This discussion seems to be petering out. I would like to poll people on the following two splits: (1) CJK, i.e. all varieties of Chinese, Japonic (= Japanese + Ryukyuan) and Korean only, not including other Sino-Tibetan languages; (2) Latin + Romance. Please respond to the following ("support" means you would like to see the relevant split happen now; "oppose" means you would not like it to happen now, without committing yourself one way or another to a later split). Benwing2 (talk) 03:13, 14 February 2021 (UTC)
@Benwing2: Thanks, I have voted for (1). Please give more detail on (2) Latin + Romance: which languages or groups are included, and are there any exceptions for languages written in multiple scripts? --Anatoli T. (обсудить/вклад) 03:20, 14 February 2021 (UTC)
@Atitarev To me, Latin + Romance simply means Latin + all Romance languages. It's rare to have Romance languages written in any script but Latin letters, although it occasionally happened with Mozarabic and maybe Ladino. I don't think we should make an exception for such cases; they are very rare in any case. I don't have any opinion as to whether we should include other Italic languages (Oscan, Umbrian, Faliscan). Benwing2 (talk) 03:46, 14 February 2021 (UTC)
@Benwing2: Yes, we already have over 200 Ladino entries written in the Hebrew script. The other non-Latin-script Romance language that occurs to me is Romanian in Cyrillic, which was common first in the 19th century, then in the Moldavian SSR and up to today in Transnistria. —Mahāgaja · talk 07:13, 14 February 2021 (UTC)

Option 1: Split off CJK from WT:RFVN now (support/oppose)

@Metaknowledge Are you now satisfied that there is enough support to split? Benwing2 (talk) 02:32, 21 February 2021 (UTC)
@Benwing2: Of course. But you didn't need to satisfy me — my point was that you needed to satisfy the people who will actually be doing the work. —Μετάknowledgediscuss/deeds 02:51, 21 February 2021 (UTC)

Option 2: Split off Latin+Romance from WT:RFVN now (support/oppose)


Hyphens for Korean affixes on Wiktionary.


Hello!
I would like to propose adding hyphens to Korean affix entries. The removal of such hyphens was discussed in the Beer parlour in 2011, but I think not using them, for Korean at least, is kind of wrong. The first issue is consistency. For other languages on Wiktionary, marking affix pages with a hyphen is the norm, and nothing about Korean prevents using them. Secondly, the main dictionary used for Korean terms, pyojun'gugeodaesajeon (Dictionary of Standard Korean), uses hyphens to distinguish suffixes from words. As such, a form like 어서 (eoseo) has two entries in that dictionary: 어서 (quickly) and -어서 (because). As for the issue of redirecting, I think it's possible to create hyphenated versions of Korean affixes to cause less trouble for learners and searchers, which sche already mentioned in the previous discussion:

"We also have the option (if our Japanese and Korean editors prefer to include the hyphens in the page titles) of creating unhyphenated pages as redirects, and asking the Japanese and Korean Wiktionaries to create hyphenated versions as redirects. This is how en.Wikt and de.Wikt (which use l') link to and from fr.Wikt (which uses l’)."

I personally believe this is a great idea. It would keep things consistent and avoid overcrowding single pages when that is completely unnecessary. I do not know how Japanese works, but for Korean, I think the two main reasons, consistency and the source of terms, are enough to bring the hyphens back. Yes, Korean may not use hyphens in running text, but in many Korean learning resources suffixes are still marked off by either a hyphen or a tilde; the latter, however, isn't really used for other languages. Please let me know what you think. I have already discussed this with @Karaeng Matoaya and, as of now, he agrees with the proposal. -Solarkoid (talk) 18:04, 3 February 2021 (UTC)

@LoutK, what do you think? I think this will help remove clutter in entries like 이 (i), which currently has twenty etymologies; creating -이 (-i) and moving the particles and verbal suffixes there would reduce that to a "mere" thirteen, once we get rid of the useless "Hangul syllable" entry. On the other hand, this would be less convenient for suffixes with clear etymological connections to free morphemes, e.g. Sino-Korean ones. A solution (one I might personally prefer, though I'll have to think a bit more) might be to move all the verbal suffixes and case-marking noun particles to hyphenated lemmas while keeping noun suffixes with clear equivalent free morphemes together with their free forms in non-hyphenated lemmas.--Karaeng Matoaya (talk) 01:23, 4 February 2021 (UTC)
  • For JA entries, I'm not a fan of hyphenating -- no other resource that I'm aware of uses hyphens, and I don't think they're necessary.
That said, I have no opposition to hyphenation for KO entries -- written Korean has many more homographs than Japanese, and thus much more potential for huge polysemic entries like Karaeng's 이 (i) example above. (FWIW, I like the idea of keeping standalone entries and affixes together under hyphen-less spellings, and moving those affix entries that are etymologically distinct off to hyphenated spellings.)
‑‑ Eiríkr Útlendi │Tala við mig 01:44, 4 February 2021 (UTC)
I wouldn't mind the creation of hyphenated lemmas. But I really like Karaeng's solution. If we were to unconditionally move everything, including Sino-Korean, it would actually complicate things more than necessary. For example, would be split among (Han, historical dynasties) and (-han, man; person), a suffix not used in isolation. And this would be the same for with (mu) and (mu-), which is already neatly organized under (mu). I would much prefer both definitions be under one entry as both share the same etymology. — LoutK (talk) 18:05, 4 February 2021 (UTC)
@LoutK Ah... Okay, now that's another obstacle to be tackled. What do you think of this: if a Sino-Korean word (한자어) exists both on its own and as an affix, let's keep it under the same etymology and create a redirect from the hyphenated affix to the page for the word's Korean reading, as in: create (mu-) and redirect it to (mu)'s second etymology. If that is not the case, then why not just make the hyphenated page the main one? Thank you :D (Actually, to add: in Georgian we have არა- (ara-, negative prefix) and არა (ara, no), though they have the same etymology.) -Solarkoid (talk) 19:17, 4 February 2021 (UTC)
I'd prefer the status quo. Apart from the reasons stated above, it doesn't seem to be a Korean convention (not used in Korean dictionaries), and the headword display and the automated transliteration already handle affixes, e.g. the suffix 이시여 (isiyeo) is displayed as -이시여 (-isiyeo) and is transliterated as "-isiyeo".
We had similar discussions regarding Arabic somewhere. If I am not mistaken, the agreement was not to include hyphens in prefixes, articles and suffixes but to use a taṭwīl (an elongation symbol, ـ) instead, e.g. the definite article ال (al-) (always spelled together with the word it attaches to) is optionally displayed as الـ (al-), and a hyphen can be used in transliterations. --23:52, 7 February 2021 (UTC)

@LoutK, Solarkoid, Atitarev, Eirikr, Suzukaze-c, I think this should be revisited. 이 (i) currently has twenty-one (!) etymologies, and more dialectal suffixes and forms could still be added. As a result, the critical "two" and "tooth" senses have been relegated to Etymologies 11 and 12, which is simply not ideal. The other option is to sort the etymologies based on some nefarious notion of importance or frequency, so that "two" comes after Etymology 2 (the subject-marking particle) but before Etymology 8 (a verb-deriving suffix for certain ideophones). But this is tricky business and makes the sequence of POS headings look random.

Incidentally, as Solarkoid pointed out, it's not actually true that Korean dictionaries do not use hyphens. The Standard Korean Language Dictionary uses hyphens to mark verbal suffixes, e.g. -더라.--Tibidibi (talk) 15:24, 2 March 2021 (UTC)

@Tibidibi From a practical standpoint, I think it would be more beneficial to sort etymologies based on importance or frequency. But yes, this would inevitably be based on a somewhat arbitrary standard.
Honestly, I'm still not sure about creating hyphenated pages. Perhaps we can make exceptions for entries like 이 (i), where it might be more beneficial to create a separate hyphenated page. However, I don't think this would be practical for most other entries. — LoutK (talk) 23:10, 2 March 2021 (UTC)
No opinion; you are the experts. —Suzukaze-c (talk) 23:43, 2 March 2021 (UTC)
@LoutK, having played with the order of Etymologies in 이, it seems I was wrong and it works fine without separating the suffixes ("two" is now Etymology 3 and "tooth" is Etymology 4, which seems broadly appropriate). So I'm back to neutral, maybe even oppose; there doesn't seem to be a practical benefit to making hyphenated entries after all, and the workload (fixing usex links, etc.) would be very annoying.--Tibidibi (talk) 00:18, 3 March 2021 (UTC)

Word of the Day theme for April Fools' Day 2021


Hello, all. For April Fools' Day 2021, I'm thinking of featuring six pairs of interesting words which are anagrams of each other. We already have three pairs at "Wiktionary:Word of the day/Nominations#Other"; please nominate three more. Bonus points if you can find anagrams of words already in the nomination list! Longer words are probably more interesting (suggestions like bat and tab are too dull). — SGconlaw (talk) 22:00, 3 February 2021 (UTC)

Derogatory/pejorative


Some time ago, these two labels were merged, because it was argued that there isn't a clear distinction between them. Maybe, but I don't think it was a good move.

Imo, derogatory carries a nuance of belittling or insulting that pejorative doesn't necessarily have. When I say that someone has a mémoire sélective (selective memory) (maybe not the best example), I'm pointing out an attribute that is generally disliked and frowned upon, but I'm not "belittling" or "insulting" the person for it. PUC 22:53, 3 February 2021 (UTC)

I think the distinction is too subtle to make having separate labels worth it. If it’s really necessary to make this distinction for a particular entry, put it in a usage note. — SGconlaw (talk) 18:37, 4 February 2021 (UTC)
Prior discussion was WT:BP/2018/July. I thought "derogatory" was stronger, harsher than "pejorative", like "rare" vs "uncommon", but this doesn't seem to be reflected in how other dictionaries define the two terms; they do not seem to bear out the idea that "pejorative" is weaker, and in this wordreference thread someone says they thought "pejorative" was stronger and "derogatory" was euphemistic! It seems like there is not a consistent / maintainable difference. - -sche (discuss) 05:37, 8 February 2021 (UTC)
The Collins and AHD thesauruses show these terms as synonymous. Whatever subtle differences in connotation there may be or have been, they are not something we could rely on now. DCDuring (talk) 22:15, 8 February 2021 (UTC)

Separate entries for reflexive verbs


Suppose a verb X carries some distinct meaning(s) when used reflexively. Right now, this seems to be handled inconsistently in English, with a mix of the following options:

  1. Add a definition for the reflexive sense at X
    Examples: pride, acquit, trouble, lower
  2. Create a separate entry at X oneself
    Examples: help oneself, occupy oneself, kick oneself, shit oneself, piss oneself, express oneself, sun oneself (though, in the last two cases, there are generic transitive senses at express and sun which arguably subsume the reflexive configurations)
  3. Both
    Examples: top / top oneself, soil / soil oneself, carry / carry oneself

For more examples, see Category:English reflexive verbs, and intitle:oneself.

I found that this has been discussed before. That thread itself links to 7 earlier discussions of the topic. It seems like there was at least tentative support for this policy described by Mahagaja: "regardless of semantics, we only have separate entries for reflexive verbs in languages where the reflexive particle is written together".

But, to quote another participant, it continues to be unfortunate that "rules and decisions are lost and forgotten from one generation of contributors to the next". I'm wondering how we could go about codifying a consensus on this question (if not globally, at least for English). I noticed that Mahagaja's rule is reflected at Wiktionary:About French, and Wiktionary:About Czech, but we have nothing about reflexive verbs at Wiktionary:English entry guidelines. I'm not familiar with the processes around policy here. Would it be acceptable in this case for me to BOLDly add this to WT:AEN and see if it sticks? Or would this need to go through some RfC process? Also, are there other policy pages where this could be codified?

I'd be eager to work on cleaning up the inconsistency around these entries, but it's unclear to me at the moment which direction to standardize on. Colin M (talk) 22:28, 4 February 2021 (UTC)

I'd be in favour of not having separate entries for reflexive forms, which is currently the case for most languages. The one potential problem with this is that a lot of English speakers probably don't know what "reflexive" means in a label, since it's not as common in English as in, say, Romance languages. The solution, I guess, would be to make sure they all have usexes. Andrew Sheedy (talk) 04:18, 5 February 2021 (UTC)
You can’t decide it globally, because of polysynthetic languages and because of different lexicographic traditions. The Macedonians and Bulgarians consider it consistent with usual practice to have reflexive verbs as page titles with separately written reflexive pronouns, whereas for Serbo-Croatian this is unacceptable; for Russian, the reflexive verbs are considered more fused, being written together, and deserve entries as derivations. For German it is generally unexpected to have entry titles of reflexive verbs, barring longer idioms. I don’t know what you have in English. The meaning of “reflexive” not being understood in labels is at any rate a poor reason for any decision; there is no need to cater to a lack of basic education. Fay Freak (talk) 20:09, 6 February 2021 (UTC)
We already have long-established conventions for languages where the reflexive particle is separate from the word (and can often be written far away from the verb), but what I find missing is information telling users what that particle IS. E.g. I don't find befinden: (reflexive) to occupy a place a very helpful definition. Where is the reflexive particle sich? The actual term is sich befinden, not befinden.
Now, compare with the Bulgarian ка́звам (kázvam) (sense #2). Displaying it as (reflexive) (~ се) to be called is much more useful, IMO. It's achieved with the template {{bg-reflexive}}. --Anatoli T. (обсудить/вклад) 01:24, 8 February 2021 (UTC)

Wiki Loves Folklore 2021 is back!


Please help translate to your language

You are humbly invited to participate in Wiki Loves Folklore 2021, an international photography contest organized on Wikimedia Commons to document folklore and intangible cultural heritage from different regions, including folk creative activities and much more. It is held every year from the 1st till the 28th of February.

You can help enrich the folklore documentation on Commons from your region by taking photos, audio recordings and videos and submitting them to this Commons contest.

Please support us in translating the project page and a banner message to help us spread the word in your native language.

Kind regards,

Wiki loves Folklore International Team

MediaWiki message delivery (talk) 13:25, 6 February 2021 (UTC)

If you look at the history for the individual pages (A-F, G-P, Q-Z) you will see that there was really very little activity in 2020 (despite everybody being stuck at home). I think the likes of Urban Dictionary are now popular enough for people inventing words to go there instead. The protologisms don't really serve any purpose for our project and are mostly not even interesting or funny. Equinox 18:21, 6 February 2021 (UTC)

Support. As I understand it, LOP was originally intended to be a kind of shunt for people who would otherwise make a mess in the main namespace. Honestly, I don't see the need for that, and I don't like the idea that the appendix is a dumping ground for crap nobody actually wants in a dictionary. It seems worth one last scan for anything that might genuinely have become attestable, and then it can be deleted. —Μετάknowledgediscuss/deeds 19:14, 6 February 2021 (UTC)
Vote created: Wiktionary:Votes/2021-02/Retire_the_Protologisms_appendix. Equinox 05:36, 14 February 2021 (UTC)
Vote moved to Wiktionary:Requests for deletion/Others#Wiktionary:List of protologisms. —Μετάknowledgediscuss/deeds 21:42, 15 February 2021 (UTC)

Citing the crowdsourced website Wiktionary, they argued "the 2000s" could refer to "the period from 2000 to 2999," and that Maxwell couldn't possibly see into the future.

The 2000s page could be improved. DTLHS (talk) 15:13, 7 February 2021 (UTC)

I've RfVed the century and millennium senses. For all we know, the anon edits could have been by parties to the lawsuit (or their agents). DCDuring (talk) 17:51, 7 February 2021 (UTC)
Quite funny, but sad that legal representatives are permitted to make such obviously bad-faith claims. Equinox 17:53, 7 February 2021 (UTC)
They can make the claims, but they are likely to be laughed out of court. It is embarrassing that we have had both of these definitions since 2014 with no citations. DCDuring (talk) 17:57, 7 February 2021 (UTC)
Making citations mandatory would improve the quality and reliability of Wiktionary very much. A start could be making it mandatory to give sources in the edit summary, or requiring them for new entries. As it is now, en.wt will never be reliable or trustworthy. (Even with WT:RFV, hoaxes can lie inside en.wt without being noticed by anyone.) --幽霊四 (talk) 02:34, 8 February 2021 (UTC)
It would also be unworkable in general and quite pointless for a lot of senses. ←₰-→ Lingo Bingo Dingo (talk) 12:48, 14 February 2021 (UTC)
Oh wow?!
Many AncientPages have received no attention. Is anybody interested in a project to clean up the AncientPages by reviewing them against current policy and standards? 119.56.103.124 13:50, 31 March 2021 (UTC)

Limits of Old Spanish


I am trying to do a mass cleanup of {{etyl}} for Spanish entries. Essentially, I load the 2100 or so pages using this tag into a single file, then make all the edits needed, then push the results back using my bot. I have done this a lot in the past; for edits of this sort, I always add "(manually assisted)" in the bot changelog message. I'm running into a few issues, however:

  1. What is the limit of Spanish vs. Old Spanish? For example, can Spanish directly borrow a term from Andalusian Arabic (which must have happened pre-1492), or is there always an Old Spanish intermediary? Similarly with Classical Nahuatl (I think these borrowings generally happened in the 1500s). Wikipedia says Old Spanish goes up to the early 15th century, but it also says the boundary of Old Spanish occurred "before a consonantal readjustment gave rise to the evolution of modern Spanish"; this sound change occurred c. 1550-1600.
  2. Does anyone know of an Old Spanish dictionary? I'm having a hard time even finding a reference to one, much less an online source.
  3. What are the best sources of Spanish etymology? There don't seem to be very many good online sources.

Thanks! Benwing2 (talk) 23:21, 7 February 2021 (UTC)
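The single-file workflow Benwing2 describes (load the tagged pages into one file, edit it, push the results back) could look roughly like the following minimal sketch. It is not his actual tool: the header-line format is a hypothetical choice, and fetching and saving pages with the bot is left out entirely.

```python
import re

# Hypothetical header line separating pages in the combined file.
HEADER = "==== PAGE: {title} ===="
HEADER_RE = re.compile(r"^==== PAGE: (.+?) ====$", re.MULTILINE)


def bundle(pages: dict[str, str]) -> str:
    """Join page texts into one editable file, each preceded by a header line."""
    return "\n".join(HEADER.format(title=title) + "\n" + text for title, text in pages.items())


def unbundle(bundled: str) -> dict[str, str]:
    """Split an (edited) combined file back into {title: text}."""
    matches = list(HEADER_RE.finditer(bundled))
    pages: dict[str, str] = {}
    for i, match in enumerate(matches):
        start = match.end() + 1  # skip the newline that ends the header line
        # Stop just before the newline preceding the next header, or at end of file.
        end = matches[i + 1].start() - 1 if i + 1 < len(matches) else len(bundled)
        pages[match.group(1)] = bundled[start:end]
    return pages


def changed_pages(before: dict[str, str], after: dict[str, str]) -> dict[str, str]:
    """Keep only pages whose text differs, so the bot saves nothing unnecessarily."""
    return {title: text for title, text in after.items() if before.get(title) != text}
```

One caveat of this design: a page whose own text contained a line matching the header format would be split incorrectly, so a real tool would need a delimiter that is vanishingly unlikely to appear in wikitext, or a check for it before bundling.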

@Benwing2:
First question: I have generally set the line at 1492, to avoid difficulties with borrowings from the Americas' indigenous languages, and because this is similar to the 1500 line used for other European languages. Going by the mutual intelligibility of the chronolects, the line might be placed one or two generations earlier. Moroccan Arabic كابوس (kābūs) was borrowed right at the line. It is true that most borrowings from Andalusian Arabic into Iberian Romance must have happened in Old Spanish in any case; note, however, that Andalusian Arabic was spoken in Spain until the early 17th century, as the speakers didn't just vanish with one expulsion. In the early 1600s in Spain, court interpreters were still needed for Arabic, until the language was completely eradicated through intolerance. Sound changes are overrated for dividing languages.
Third question: I am not sure about the best or all-encompassing sources in particular, since I deal with Spanish only incidentally, coming to it from specific onomastic topics or related terms (an approach that also works, after all). There is Corominas, and more recently Edward A. Roberts, but all have flaws. Spanish, like English though less severely, has obtained its loans from so many different directions that it is difficult for any individual to be good at all of them. For Arabisms there is comprehensive coverage in Corriente, Federico (2008) Dictionary of Arabic and Allied Loanwords. Spanish, Portuguese, Catalan, Galician and Kindred Dialects (Handbook of Oriental Studies; 97), Leiden: Brill, and more recently in Corriente, Federico, Pereira, Christophe, Vicente, Angeles, editors (2019), Dictionnaire des emprunts ibéro-romans. Emprunts à l’arabe et aux langues du Monde Islamique (in French), Berlin: De Gruyter, for all of Ibero-Romance. For Americanisms there are some, but little-known, works. I have Diccionario de Americanismos by Marcos A. Morínigo on my shelf, as an example of what there is. Fay Freak (talk) 01:00, 8 February 2021 (UTC)
It looks like this SP/OSP distinction is often not made. The DRAE includes some terms which are clearly OSP, and then they end up here as Spanish. – Jberkel 21:45, 31 March 2021 (UTC)

Durable archiving


If we take screenshots of e.g. websites, and upload to Wiktionary/Wikimedia, does that count as "durably archived"? (Yes, I am aware that screenshots can be faked fairly easily.) Mihia (talk)

In practice, "durably archived" seems to refer to the type of source rather than anything to do with how it is archived. I think it is high time that this language was relitigated. Perhaps we should have a more positive criterion; for example, a list of sources that are deemed acceptable that can be easily amended. DTLHS (talk) 01:35, 8 February 2021 (UTC)
Pictures on Wikimedia can be altered or deleted as well, can't they? (Although alterations are recorded in a version history.) And the caveat you mentioned in brackets also means they're unreliable and not trustworthy. One could easily provide three fakes and claim they're real pictures whose original source is gone. --幽霊四 (talk) 02:34, 8 February 2021 (UTC)
Is that really that hard to do now? I could easily make up three books and claim that none of them are listed in Google Books. I know I've read books that are nowhere to be found on the Internet. Would anyone really check? (The trick might not work in RFV, but it wouldn't be hard to create an entry like this that no one ever caught.) Andrew Sheedy (talk) 05:48, 8 February 2021 (UTC)
Indeed, fakes aren't a high concern. They're easy to accomplish, if you know what you're doing, but if you know what you're doing around here, you clearly care enough about the dictionary that you're very unlikely to add fake cites. (I'm sure WF will claim he's added fake cites, but he cares more than he lets on...) —Μετάknowledgediscuss/deeds 06:30, 8 February 2021 (UTC)
Sure, I've added fake cites before, because I'm a freaking anarchist. Sadly, I don't keep a list of them, but occasionally some fake stuff I added gets picked up - a recent one found was cuntbutt. MM0898 (talk) 14:53, 8 February 2021 (UTC)
Screenshots can be rather awkward to work with, and may be rendered wrongly by various browsers etc. It seems it would be preferable to extract the actual text or HTML in some way. There are copyright issues in either case. Equinox 02:45, 8 February 2021 (UTC)
One side effect of the "durably archived" rule is to give heavier weight to professionally edited material than to a typo-laden rant somebody banged out on a keyboard before rushing out the door. We shouldn't try to systematically preserve stuff we like without considering how to change the CFI rules. This also applies to, for example, comments sections preserved by search engines and archive.org. Vox Sciurorum (talk) 09:41, 8 February 2021 (UTC)
For some days I have had a rule formulation floating in my head, and I am finally expressing it, since derision is growing, the old rules are increasingly viewed as antiquated, the virus is keeping the libraries closed, etc.
As a third point after “Attested” means verified through in WT:CFI:
“consistently appearing on the internet.”
It means, as some have realized before (though I do not want to dig it out of the archives now), that if a hundred pages use a term, it does not matter that those pages could all vanish, provided that in some years other pages have replaced the vanished ones because the term has become recurring internet vocabulary. It is a dynamic concept of durability. It excludes protologisms because they do not appear consistently, and it excludes typo-laden rants because typos are not that consistent. And most importantly, we can thus keep without further ado 🦀 used to convey joy or excitement, we can have Anitwitter, we can have glownigger. What we can’t have is what just X and his friends use (→ protologisms); it is not carte blanche for private language. Procedurally, if during a request for verification it is determined that such a term appears consistently, then it does not matter either if its uses later vanish from the internet “completely” because they have become outdated etc., for Wiktionary then archives traces of the past in its files; since only the ex ante view matters, there is no danger of being contradicted by reality. Still, linkrot in the mainspace is not fine, so one would avoid links there that are not intended to be durable in principle (this linkrot is, correct me if I am wrong, the main reason for the durability barrier in the first place); but there is agreement that the Citations namespace can document low-durability quotes, showing the basis the editors have worked on (heck, our best people have even quoted Twitter utterances by philologists in etymologies). Fay Freak (talk) 13:02, 8 February 2021 (UTC)
  • There is already provision in the CFI for attestation through "clearly widespread use", which would apparently include the case where the term is used on "enough" websites, even if the web pages are not individually durable. However, I would like us to be able to attest usage by reference to an individual web page, provided the content is "clearly sensible", however we can best define this, e.g. not random gibberish, non-native gibberish, highly ephemeral/casual content, etc. One way to bolster the evidence of usage, make it slightly harder for people to fake citations, and provide some degree of permanence would be to take a screenshot and upload it. I note the copyright concern raised by Equinox. Would we be able to claim some sort of "legitimate use" exemption for these purposes? Another thing has also occurred to me. Do we consider content on Internet archiving services to be "permanently archived"? I'm not very familiar with using these. Are they sufficiently reliable and complete in their coverage for our purposes? If we do allow them, do we have any particular preferences for one over another? Mihia (talk) 18:24, 8 February 2021 (UTC)
    If only this were how people interpreted that line. That's what I always thought it meant, but then I've frequently seen people saying it doesn't apply to non-durably archived material. If so, then what on earth is the point? I think clearly widespread use should be good enough, regardless of whether it's archived or not... Andrew Sheedy (talk) 23:54, 8 February 2021 (UTC)
Then what people frequently say is certainly not what the text of the CFI actually says. Mihia (talk) 10:12, 9 February 2021 (UTC)
  • An archive.org backup of a website is superior to a screenshot in just about every way. It's much quicker than taking a screenshot and uploading to Wikimedia Commons, it's not susceptible to forgery, and it can capture the full context (whereas, for a page with lots of content, you're only going to be able to capture a screen's worth of it in a screenshot). You can proactively force their crawler to take a snapshot by going to https://archive.org/web/ and entering the URL under "Save Page Now". Also, templates like Template:quote-web include an "archiveurl" parameter so you can provide the original url and an archive link. The only case where this won't work is pages that aren't accessible to the crawler, for example a forum post that can only be read if you register an account. I think such sources should be avoided if at all possible. Colin M (talk) 20:25, 8 February 2021 (UTC)
So (question to all) is it, then, generally accepted that an archive.org link to "sensible" web content counts for attestation purposes? If it does, somehow I have never known that. Mihia (talk) 20:41, 8 February 2021 (UTC)
No! Equinox 20:43, 8 February 2021 (UTC)
So why not? Mihia (talk) 20:44, 8 February 2021 (UTC)
In addition to the quality problem I mentioned earlier, archive.org could go away at any time. For example, they were sued over massive copyright violations last year. That could put them out of business. Or a change to copyright law or online liability law ("Section 230" in the USA) could cause content to disappear. Vox Sciurorum (talk) 21:12, 8 February 2021 (UTC)
There would need to be some regulation about the quality of content, I agree, but I don't see why the fact that there is a lot of crap on the web should prevent us from using the "sensible" stuff. We allow Usenet, after all, which is unedited/unregulated, and also contains a lot of crap (in fact, to my eye, the CFI gives Usenet a peculiarly high prominence, even ahead of printed books). In respect of quality, why would we allow Usenet and not "sensible" web content? Mihia (talk) 21:57, 8 February 2021 (UTC)
Seems like an argument that could be used to disqualify any online source. WT:ATTEST approvingly mentions "Usenet groups, which are durably archived by Google". And yet, Google could shut down its Google Groups service at any time. The whole company could be sued out of existence. But I think it's very unlikely either Google Groups or Archive.org is going to disappear overnight. Colin M (talk) 21:58, 8 February 2021 (UTC)
Indeed. I think it's a horrible argument, to be perfectly frank. There are plenty of books, movies, etc. out there that could theoretically be completely destroyed. The Internet could be destroyed and then nothing on the Internet would count as durably archived (obviously, this would negate the need for any Wiktionary CFI, but hopefully my point is clear). There are books I have that are not fully searchable anywhere online and would be very hard to find in a library. Are they not durably archived? It seems that CFI is a bit behind the times, intended for a time when the Internet was new and its future not certain. It looks like it's here to stay, short of a global catastrophe, and I don't think we should be so restrictive about our CFI.
Here's another thought: an RFV discussion that passes is usually a good indication that the cites were findable at the time of the discussion. Could a word not be kept on the basis of, say, 5 tweets, and then the discussion archived? The discussion would serve as much for attestation purposes as the cites themselves. If necessary, the RFV could be renewed if the tweets no longer existed, and new cites found. But if a word or emoji never makes it into a book, it seems really strange that we'd exclude it on those grounds alone. Andrew Sheedy (talk) 23:54, 8 February 2021 (UTC)
The CFI in this regard seems at once up with the times in its comment that "Wiktionary is an online dictionary" and behind the times by saying that "this naturally favors media such as Usenet groups". Mihia (talk) 00:22, 9 February 2021 (UTC)
As we move away from the concept of "durable archiving" (whatever that means), it means the obligation of editors to faithfully record what they see is higher, since Wiktionary itself is now the "durable archive". What does it mean when we have a citation from a website from 2020 that no longer exists in any form in 2030? Do we have to go back to the editor that originally added that citation and make some judgement on their reliability? IMO there now needs to be a higher emphasis on reviewing citations as they are added. DTLHS (talk) 20:27, 9 February 2021 (UTC)
What I had in mind is that non-durably archived cites could still be challenged in RFV. So, if I create an entry with two cites from Twitter and one from a book, it could still be brought to RFV. If those cites no longer existed, they could be removed or made less prominent. If three new cites were found on, say, Twitter and Reddit, they could be added, and the RFV would serve as proof that the cites were at one point authenticated, even if they disappear later. So the only thing that would change from the way things are now (aside from allowing non-durably archived cites in the first place) would be that an entry with three cites could still be challenged once in RFV, and the original cites wouldn't count for anything unless they still existed. Andrew Sheedy (talk) 20:57, 9 February 2021 (UTC)
We'd probably prefer to know when a word first came into use, if possible. Equinox 21:02, 9 February 2021 (UTC)
That's a good point, but I think that just makes the outdatedness of the current CFI even more apparent, given that so many words take off on the Internet before becoming common in print. I think that if one cite from a given year was deleted, it wouldn't be hard to find another from the same time anyway, so I don't think this would be much of an issue in practice. Andrew Sheedy (talk) 05:36, 10 February 2021 (UTC)
I wasn't active when the language approving of Google Groups was written. In the old days Usenet was highly redundant and it wasn't hard to find a new source of news. Over time it became more dependent on a few large servers. Google Groups shows why we can't trust one company: several editors here have complained that it can no longer be searched easily, or at all. The NNTP server I used to use shut down a decade or more ago and I never bothered to find another. The Usenet services I found last year did not offer free public searching. (Perhaps there is one that does; I didn't look hard.) I wouldn't object to saying modern Usenet posts, say after 2000 or 2010, don't count as durably archived. Vox Sciurorum (talk) 13:50, 9 February 2021 (UTC)
Today I went to look up bear cat in the OED. Their most recent quote for the slang sense is actually a tweet!

2016 @prettyoddissy 2 Feb. in twitter.com (accessed 13 Aug. 2019) Nothing better than coming home to pizza after braving that bearcat of a snow storm.

Apparently they've been looking to Twitter for quotes since at least 2014, and have been really ramping up their use in recent years, with Twitter being the second most cited source among quotes added to the OED in 2019.
So Wiktionary is officially fuddy-duddier than the Oxford English Dictionary. Let that sink in. Colin M (talk) 18:02, 31 March 2021 (UTC)
The OED has the capability to do their own archiving. Wiktionary does not. You should think about the distinction. DTLHS (talk) 18:08, 31 March 2021 (UTC)
As a non-admin outsider who has done some citations/quotations, I would like to comment on my feeling here. When I quote three resources that happen to be on archive.org or a website, I add the ISBNs etc. and assume that the discussion is "not over" yet: I'm just providing a preliminary basis for concluding that "someone should look closer" if an issue about the existence of a given word is ever seriously raised. I am no expert; I'm just trying to show what may be out there. I assume it will always shock the conscience of the admins and editors to delete words that have a substantial evidentiary basis, regardless of durability or theoretical issues. I would challenge more of the words I cite, to get decisions that confirm their status, but some admins would think I am wasting the site's time. When I have tried to confirm some words beyond doubt, I have gotten mixed messages about the validity of various citations and their durability etc., so I just add as many as I think a good dictionary might want to see, covering various time periods, usages, etc., well beyond "three", a puny number. Yes, there is a possibility that some authority will come in and wantonly wreck it all one day, but that's the risk with the whole Wikimedia project. Wiktionary is duplicated on other websites, so the words are still documented elsewhere even if crazy rules are applied. --Geographyinitiative (talk) 19:45, 31 March 2021 (UTC)
New Twitter: namespace, anyone? Jokes aside, it'd be great if we had the resources to do this kind of work, to build our own archiving and indexing tools. But as it stands we even have to haggle over a lousy 50 megs of Lua memory. – Jberkel 21:33, 31 March 2021 (UTC)
Reflection: I have just had my first positive experience with the 'durably archived'/'permanently recorded media' requirement. The words do seem fuddy-duddy and ridiculous at first, but the underlying concept seems to be very correct. The idea is that whatever is being cited/quoted should be findable by other people, long-term. As long as some printed media is found in OCLC WorldCat, you can use Google Books or Internet Archive (archive.org) to look at what's in the book, and you can cite pages in that book. I plan to slowly go back through my citations/quotations and check whether they are found in WorldCat libraries, adding the OCLC number if they are. Just because something is on Google Books or archive.org does not mean it's durably archived or permanently recorded. Those sources just help us see what's in the book or source. The question arises with words that seem to be rarely used anywhere but online. In my experience, I would say that those words are provisionally allowable, but if the sources where they exist are no longer accessible at some point, then inclusion becomes untenable. --Geographyinitiative (talk) 14:50, 8 April 2021 (UTC)
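The archive.org workflow Colin M describes above (Save Page Now, plus the archiveurl parameter of Template:quote-web) can be sketched in a few lines of Python. This is a minimal sketch, not an established tool: it assumes the Wayback Machine's public URL patterns https://web.archive.org/save/<url> and https://web.archive.org/web/<timestamp>/<url>, and that {{quote-web}} takes the language code as its first positional parameter. The function names are hypothetical, and any extra template fields passed in are not checked against the template documentation.

```python
WAYBACK = "https://web.archive.org"


def save_page_now_url(page_url: str) -> str:
    """Address that asks the Wayback Machine to capture page_url now (Save Page Now)."""
    return f"{WAYBACK}/save/{page_url}"


def snapshot_url(page_url: str, timestamp: str) -> str:
    """Address of the capture of page_url made at timestamp.

    The Wayback Machine also accepts partial timestamps, but this sketch
    insists on the full 14-digit YYYYMMDDhhmmss form so that a citation
    points at one specific capture.
    """
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("timestamp must be 14 digits: YYYYMMDDhhmmss")
    return f"{WAYBACK}/web/{timestamp}/{page_url}"


def _escape(value: str) -> str:
    # A bare "|" would start a new template parameter in wikitext,
    # so replace it with the {{!}} magic word.
    return value.replace("|", "{{!}}")


def quote_web_call(lang: str, url: str, archiveurl: str, **fields: str) -> str:
    """Wikitext for a {{quote-web}} citation giving both the live and the archived URL.

    lang, url and archiveurl follow the discussion above; any extra keyword
    fields (hypothetical here) are passed through unchecked.
    """
    params = {"url": url, "archiveurl": archiveurl, **fields}
    parts = [lang] + [f"{name}={_escape(value)}" for name, value in params.items()]
    return "{{quote-web|" + "|".join(parts) + "}}"


if __name__ == "__main__":
    page = "https://example.com/forum/post/123"
    print(save_page_now_url(page))
    archived = snapshot_url(page, "20210208202500")
    print(quote_web_call("en", page, archived))
```

The sketch only builds addresses and wikitext; it does not contact archive.org. Keeping the live url next to a dated archiveurl lets later readers check a cite even after linkrot, though whether such a cite counts toward attestation is exactly the policy question under discussion here.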

Translingual

It's obvious that there's much inconsistency in en.wt:

  • Taxonomic adjectives are sometimes entered as Translingual (mul) and sometimes as Latin (la), sometimes even when they were never used in Latin.
  • Terms derived from Latin, like law terms, are sometimes entered as Latin (ex turpi causa non oritur actio), English (hostis humani generis) or Translingual (ius cogens).
  • Species names are sometimes mentioned with an internal Wiktionary link, sometimes with an external Wikispecies link, although there's not necessarily a Wikispecies entry ({{taxlink}}).
  • Constellations: Translingual Andromedae is given as genitive of English Andromeda. And in English Andromeda Translingual Andromedae is given as a derived term. Translingual And links to both Translingual Andromedae and English Andromeda. Translingual Piscium is given as genitive of Translingual Pisces (no entry), while English Pisces gives English Piscium (no entry) as derived term.

It's also obvious that to some extent the handling isn't based on attestation and usage, and is hence contrary to WT:CFI.

  • Latin pneumophilus even states: "Used exclusively as a taxonomic epithet and thus not inflected except in the nominative singular; other inflections are theoretical." A note which would be incorrect for Latin, even for scientific taxonomic Latin.
  • Latin albifrons and Latin iudex non calculat failed WT:RFVN (i.e. were created without Latin attestation), while Translingual albifrons and English iudex non calculat do exist.

It's also obvious that to some extent people do what they like or prefer:

  • Tyrannosaurus: Uses italics for the head and has a † before the name. WT:Taxonomic names however still has it as a question, as an undecided matter.
  • English jus cogens was moved to Translingual ius cogens, and as far as I can see without any discussion or community approval.
  • Translingual Homo sapiens contains information about inflection, including a Latin inflection template totally unfitting for it, and as far as I can see too without any discussion or community approval.

Things which have to be considered:

  • Pronunciation, inflection, gender - cp. WT:About Translingual#Rejected.
    • Gender:
      • Translingual Nix Olympica, albeit feminine in origin as can still be seen in Olympica, is used as masculine in German in "der Nix Olympica" (maybe because of Berg m, Krater m, Vulkan m).
      • Translingual ius cogens, albeit neuter in origin, is used as masculine in French (French has no neuter).
      • (German) uses Felis with articles and as feminine (because of felis f or Katze f). (German, without proper noun capitalisation) has masculine or neuter "namen des felis catus", which is from Latin cattus (catus) m (male cat) in apposition and not from an adjective *catus (-a, -um). (same) is similar. (German) has masculine/neuter "des Felis pardus" (genitive). (French) has "le Felis spelaea", from spelaeus (-a, -um), that is: the Latinate/internal gender expressed through the adjective is feminine, but the French real/external gender expressed through French articles/pronouns is masculine. And the German examples hint that the German real gender is feminine and masculine/neuter (from the examples above it can't be decided).
    Was the issue of Latinate/internal vs. real/external gender ever discussed anywhere, or did English speakers, whose language lacks grammatical gender, never consider this issue?
    • Latinate/internal gender could also be given in the etymology section.
    • Lack of real/external gender can make entries somewhat useless.
    • Inflection:
    Basically there are three ways: Inflect it as Latin, as somewhat native or mixed.
    • Latin inflection (6 cases, 2 numbers) was common in German too, and can also be seen in English homo sapiens (Citations:homo sapiens) which probably is an exception.
    • In English and modern German both a mixture and a somewhat native inflection are common, like Homo sapiens in the singular regardless of case, Homines sapientes (from Latin nom. pl.) or Homo sapiens (unchanged plural, which is common in German and still present in English) in the plural regardless of case.
    • Macra (not talking about possible attested spelling variants):
      • Translingual Homo sapiens currently gives Latin inflection, with macra (Homō sapiēns etc.). The German pronunciation [ˈhoːmo ˈzaːpi̯ɛns] (not: [ˈhomoː ˈzapi̯ɛːns]) shows that the macra don't make sense translingually, and a Google Books search for Homō sapiēns suggests this is not a spelling in common use (if it's even attestable).
  • Alternative forms, spelling conventions:
    • Is a capitalised German term like Jus cogens a translingual alternative form of Translingual ius cogens?
    • There's English ie/i.e./i. e., German i. e.: for German, dots and a space are prescribed (Duden), so even if German i.e. exists, it's proscribed (for example by Duden). For English there's i.e.#Usage notes regarding the use of comma and italics in English.
  • Constellations and entries' correctness:
    • Latinate genitives (Andromedae, whether English, Translingual or even Latin) are not derived from English.
    • Does Translingual And really abbreviate English Andromeda? If And is used in multiple languages, then probably also the full form, though possibly in different registers (scientifically using the Latinate term, commonly using a native term similar to Big Dipper).

Things which should be voted upon:

  • (Fancy Style)
    Should some entries have special styling, like italics or † in the head?
    • Are taxonomic terms always used in italics? No. Should a non-italic taxonomic term be considered as an italic taxonomic term which by default is placed in italics (e.g. Homo sapiens = italics of Homo sapiens)? That's complicated and moreover ridiculous, isn't it?
    • † in head is ambiguous: Does it mean the species is extinct or that the term is obsolete? Both are better explained in the usual way: "(obsolete) a species" or "an extinct species" or combined "(obsolete) an extinct species".
    • If † is used for species names, why not use 卐 for nazi terms (卐Führer), ✡︎ for Jewish things (✡︎Torah)? It's ridiculous too, isn't it?
  • ({{taxlink}})
    Should the template be used inside of entries in sections for hypernyms/hyponyms (..regna/../genera/species/..), or should it be limited to Further reading?
  • (Translations)
    Should translingual terms have translation sections?
    As of WT:About Translingual#Under discussion it's undecided;
    WT:Entry layout#Translations however also permits it for taxonomic terms;
    Translingual ius cogens, as a law and not as a taxonomic term, has a translation section too.
    • The awkwardness and inconsistency of constellation terms might be caused by some (old) rule like "only English entries can have translations".
  • (Macra)
    Should macra be added to Translingual (mul) terms based on the Latin origin, even though the terms are not spelled this way and it makes no sense? (This is not about actual spellings with macra, if they exist.)
  • (Attestation)
    Should terms really have to be attestable (WT:CFI)?
    To begin with, asking more specifically:
    • Should taxonomic terms have to be attested?
      If a taxonomic term is only mentioned once ("We discovered a new subspecies and called it Fish and chips"), does it deserve an entry? Or does it need more, like usages, a certain number of usages, usages in multiple languages?
      Currently "Translingual" is not a WT:WDL. However, it's also not really a language, and it can be argued that, to be translingual, a term must be attested in at least two languages.
      If taxonomic terms have to be attested, what's sufficient? Regular attestation in at least two languages; three usages (in one or multiple languages?); ...?
      • Nix Olympica passed RFV with three usages in two languages being added to the entry.
  • (Gender)
    Which gender should be given? The real/external, the Latinate/internal or both?

--幽霊四 (talk) 02:39, 8 February 2021 (UTC)

Answers: One thing is that terms are both Latin and translingual. The statement that “ius cogens” is Latin and the statement that “ius cogens” is translingual are both true. Due to the nature of translinguality, it is also true to say it is Ukrainian and Romanian, but not on the same level – it does not mean we should have Danish duplicates of such phrases. Your de-Latinizing of translingual entries is therefore a spectacular failure to realize this close connection between translingual and Latin. The translingual is an ideal entity, therefore we want macrons and inflection tables (the Latin inflection table templates should account for this placement) and the gender should be the Latin one, irrespective of how it is used in French. If ius cogens has nominal class 35 in some Bantu language, this does not mean the head template should include it; the usage notes can, however, if the nominal class is hard to guess, which is not the case for the French masculine, so you see why I gave only the neuter.
Then, you are the fanciful one, in wanting it all so exact; the style is not fancy. Italics and † before a taxon are standard under certain circumstances in the taxonomic sciences. If you don’t know the circumstances and find it special, then why do you bloviate about it? Here again, nobody but you would see a problem. Because it is clear in those sciences how things should be italicized, it is not plausible that anyone would see a need to vote on it, as we shan’t have any vote contrary to science.
Attestation: Strange question. Apparently not three times, since Translingual is not listed in WT:WDL. Which is reasonable, because if somebody reclassifies something this year it is likely it will be used by others, or if not, then it is still inclusionworthy because somebody might stumble upon it and try to look up here what it means or what synonyms there are.
Alternative spelling conventions: Please avoid capitalised German terms like Jus cogens, because the capitalization is based not on a translingual rule but on a German one, which even differs between the various orthographic frameworks (and is often not adhered to, e.g. by authors who use the Neue Rechtschreibung because they think German rules have nothing to say about writing foreign terms).
“Should translingual terms have translation sections?” Why not, if it is the best place? It is specifically mentioned under Wiktionary:About Translingual § Under discussion, and not even by me; I added the caveat that Wikispecies also allows translations, so for the specific matter of taxa it is avoidable on Wiktionary. But ius cogens is a translingual term with no corresponding English term, though there are native terms in other languages, so we have translation sections; this is unavoidable unless one argues, just to keep translation sections only under English, that the translingual entry should be duplicated so that there is also an English section for it. But no, I reasoned that a constellation in which English uses only a term that we should treat as translingual, while other languages use native formations, is possible. And yes, ius cogens was moved to Translingual because, given that it serves all languages, this appears to serve users best and to depict the usage most accurately (regarding the question “which language is it?”); you can still add local pronunciations under the translingual pronunciation section (native pronunciations are always an argument people proffer).
BTW, theoretically, to settle the ever-recurring question whether something is Latin or translingual, it might be possible to merge Translingual into Latin and present Latin, even the Roman Latin, before English; but I think that, apart from the loss of distinction in that case, it is easier just to categorize by practicality as I do. If a term is devised as translingual then it is translingual, and it is irrelevant in which languages the term is used. Radical and consistent, as well as unintuitive to monolinguals; but intuitiveness shall not prevail, since translingual terms are presented at the top of pages, nor shall it prevail against objectivity. See, it all has a system. Fay Freak (talk) 12:08, 8 February 2021 (UTC)
So if you really want votes to formally amend WT:CFI or WT:EL or other acts, abstract-general formulations of the rules may be the following:
1. “Without prejudice to the requirement of having been used, a term is translingual if devised as translingual.”
2. “If a lexical unit is not used in an individual language but as translingual then it only belongs to the latter.” (this solves the cases when something is formally Latin but is not used in Latin but as translingual; it also excludes Danish entries for translingual bons mots.)
3. “A translingual term may present diacritics, inflection tables, and similar grammatical information particular to an individual language if the term is manifestly closely connected to it.” (then it is also no undue hardship to not have Latin entries for certain words because the translingual term can have all the macrons and inflections)
This leaves open the question of capitalization within Latin. Fay Freak (talk) 12:32, 8 February 2021 (UTC)
Do we want to be the best English/German/French/etc. dictionary we can be? Then any system that prevents us from pointing out that "jus cogens" is far more common in English, and whatever forms are usual in those languages are usual in those languages, is a bad one. Certainly anyone searching for jus cogens in an English text is ill-served by being told it's an alternative spelling of iūs cōgēns.--Prosfilaes (talk) 07:47, 10 February 2021 (UTC)[reply]
It’s not any preventing system. 幽霊四 added something about gender in individual languages in the usage notes, and like that it is possible to add something about preferred spellings – though, in English there seems to be a regular adaption towards ⟨j⟩ spellings; however capitalization of such terms in German is also regular and I would not see a need of it being mentioned. Poor argument in any case to devise a system where one cannot see the forest for the trees because of individual language information.
The question always comes up under which headings content has to be sorted. If the editors only want to state something is used in a particular language, then this fact alone may not be enough to warrant a whole language section, if in the same fashion it can be stated for many languages (after all it is the very idea of “Translingual” sections); if it is about pronunciation, then it still can be avoided if the pronunciation of “Latin” terms is after the usual measures (it is still within the concept of “Translingual”); it is similar to why the existence of {{ar-IPA}} does not at at all compel Arabic editors to sort everything under Pronunciation N sections, for it would be disproportionately more cluttered than when one just ignores the pronunciations – which are obvious enough from the transcriptions and the spellings for anyone who repeatedly deals with the language. Fay Freak (talk) 10:54, 10 February 2021 (UTC)[reply]
Gender matters for Translingual terms when they are names of genera that might be combined with a species name treated as a Latin adjective. Sciurus niger but Polietina nigra. Offhand I can't think of any other cases likely to come up in modern writing. Often one finds phrasing like die Gattung Sciurus allowing a native word to control the grammar. Once upon a time people wrote species descriptions in Latin and there is still a rule of zoological nomenclature allowing names to be corrected to the nominative singular when they were first mentioned in a different case. Vox Sciurorum (talk) 15:51, 9 February 2021 (UTC)[reply]
@幽霊四 Please do NOT move Latin taxonomic etc. terms to Translingual, as there is no consensus for doing this. Benwing2 (talk) 05:10, 10 February 2021 (UTC)[reply]
Wiktionary includes far too much stuff under 'Translingual' simply out of a distaste for having multiple language entries for 'the same thing'. I think a radical reduction of allowed content for Translingual, that for instance includes taxonomic names but excludes legal, grammatical and musical jargon, would be a vast improvement. Language-specific comments about gender, semantic relations, variants and language-specific senses are better explained in the sections of the individual languages. The current setup is also immensely Eurocentric because the majority of Translingual cruft is Latinate. ←₰-→ Lingo Bingo Dingo (talk) 13:04, 14 February 2021 (UTC)[reply]
@Lingo Bingo Dingo: It is a vast improvement when we have moved the Latinate legal, grammatical and musical jargon to Translingual. The distaste is not wrong but grounded, and it is sophistry on your part to pretend by this wale of wordhoard that I have appealed to emotion, which you alone do. Your contradistinction of taxonomic names is completely arbitrary and particular to your own taste, which particularity you fail to admit. Anatomical terms are standardized just like taxonomic names. From there we come to other medical terms: names of pathological conditions with the body areas they occur in (a lot of names of muscles and bones would have to be moved), as well as the organisms which are their agents. It is only a small but consequential step to declare the grammatical terms that linguists and philologists made up for use in all languages translingual as well. Music also distinguishes itself by its international character. The legal terms are translingual insofar as they usually do not refer to anything specific to one legal system but are supranational topoi. Obviously you know nothing about comparative law and private international law. The Eurocentrism is a false claim. If Japan is adopting German legal dogmatics, then it is a distortion to make it look somehow less European. Only, ironically, thou, not knowing any languages but a few Latin-written ones, art Eurocentrist, failing to consider the absurdity of creating lots of Latin phrases like genitivus absolutus as Russian or Hindi entries.
A professor consciously uses international terms independently of whether he is a botany or medicine or law professor. Making the categorization depend on some code of nomenclature is an irrational appeal to authority.
It is a misconception that terms are not translingual by default and are only described so out of convenience. On the contrary, the claim that any word belongs to a particular language requires evidence and, even if accepted at first glance, can also be disproven. By default, utterances do not belong to languages. Fay Freak (talk) 13:53, 17 February 2021 (UTC)[reply]
@Fay Freak It is ironic that you lie that I use appeals to emotion, whereas there isn't a single emotional statement in my previous comment and most of your comment is an unpleasant screed. Your claim that I only have knowledge of Latin-script languages is mendacious, for one I mention Ancient Greek, Biblical Hebrew, Coptic, Yiddish and Syriac on my user page; there are also others but I don't expect you to read minds. Perhaps you should learn to read carefully before you write. Dutch genitivus absolutus, which I made, is a good example of something that should never be considered Translingual; but thank you for demonstrating that you never took the effort to check for actual attestation because it is only widely attested in a limited number of languages. ←₰-→ Lingo Bingo Dingo (talk) 17:53, 17 February 2021 (UTC)[reply]
@Lingo Bingo Dingo: It is ironic that you pretend that I use appeals to emotion, whereas there isn't a single emotional judgment in my previous post; I was concerned to debunk all emotions. There is no such thing as “emotional statements”. Sharpness and harshness do not correspond to emotionality, whereas behind your displayed impartiality there is nothing but emotion driving your decision, not information and careful thought.
You should learn to read carefully before you write. I already argued that the number of languages a term has been used in is irrelevant. There are translingual terms attested in only one language, including many taxa. The attestations of Dutch genitivus absolutus are really attestations of a translingual term. You are completely missing the point. That “it is attested in Dutch” does not serve to distinguish whether a word is translingual or Dutch, nor does any arbitrary number of languages a word is attested in. You do not shed light on any criteria to determine whether something is translingual. Fay Freak (talk) 22:06, 17 February 2021 (UTC)[reply]
@Fay Freak Maybe making false claims about other people without bothering to check basic details seems a rational undertaking to you; to me it does not quite suggest dispassionate reasoning. I also note that you have not even acknowledged your blunder, instead you opt for personal attacks.
The crux of genitivus absolutus is that it appears to be assimilated into very few languages, with many European languages preferring a more native counterpart similar to absolute genitive or using the Latin phrase in italics. Even in German, where it seems to be used more frequently than in French and English and where it often is not italicised, there is the vacillation in capitalisation that marks it as foreign in contrast to the standard spelling of nouns. Only the inconsistency in italicisation exists in contemporary Dutch and there are plenty of unitalicised attestations, generally considered an indication of nativisation on here. You seem inclined to already consider multi-word phrases borrowed by as few as two languages multilingual. I challenge you to find unitalicised uses of "genitivus absolutus" in the native script in languages other than Dutch and German; I doubt that you will find many languages with dozens of such uses. With one donor language and maybe two borrowing languages, there is little risk of a glut of L2s. ←₰-→ Lingo Bingo Dingo (talk) 17:58, 21 March 2021 (UTC)[reply]
@Lingo Bingo Dingo: You still haven’t got the point and make false claims about what other people believe without any undertaking to find rational or consistent criteria. I contended that it is completely irrelevant in how many languages it is used, and how it is integrated in native sentences (which varies by the grammar and script of the main language; endings you would argue to show something don’t exist in the other, and the script influences the decision whether any endings are attached at all, while only some scripts even have italicisation and uppercase-lowercase-distinction). A translingual term may have been ever used in only one language. What makes it translingual is that its users did not intend it to be considered part of the native language but translingual. Whether an educated person would be surprised to hear the claim of something being a borrowing instead of liberal use of foreign elements. How can one even save oneself from dictionary authors expanding the lexica by use of one’s barbarisms? Fay Freak (talk) 18:14, 21 March 2021 (UTC)[reply]
@Fay Freak How is "[y]ou seem inclined to already consider multi-word phrases borrowed by as few as two languages multilingual" a false claim if you "contended that it is completely irrelevant in how many languages it is used, and how it is integrated in native sentences"? Anyway, your position is at odds with the usual practices at Wiktionary and quite anti-empirical. I also doubt that it finds a lot of support here. Good luck, I guess. ←₰-→ Lingo Bingo Dingo (talk) 18:42, 21 March 2021 (UTC)[reply]
Related discussion: Wiktionary:Tea room/2021/February § Homo sapiens. J3133 (talk) 05:54, 27 February 2021 (UTC)[reply]
  • I found this post very difficult to address. In addition, some of the comments and questions seem ill-informed. Some specific points:
    "Re: [1.] Are taxonomic terms always used in italics? No. [2.] Should a non-italic taxonomic term be considered as an italic taxonomic term which by default is placed in italics (e.g. Homo sapiens = italics of Homo sapiens)? [3.] That's complicated and moreover ridiculous, isn't it?"
    To answer the three questions:
    1. No, but there are rules for italicization in the various taxonomy codes, which are followed in scholarly works and many others.
    2. Hunh?
    3. If I understood the question, I suspect I would agree.
    Re: use of † to mark extinct species. It's a common convention, especially where extinct and extant species are both being discussed, which is potentially the case in all Wiktionary taxonomic name entries since all extinct taxa have some ancestor in common with extant taxa.
    Re: {{taxlink}}, see the documentation. It has little use in further reading. Its main purpose is to enable counting of uses of taxonomic names, which in turn helps prioritize efforts to add taxonomic name entries. We have many taxonomic names in entries which are unlikely to get entries in this decade, possibly not even in this century. The link to Wikispecies does provide a means to get more information, sometimes enough to suggest that a new entry might be warranted.
    Re: Attestation for Translingual terms: Ideally all lemmas would be attested in all their senses and variants. But, as any user of Wiktionary should have noticed, not all lemmas have citations. Lemmas are challenged one at a time via RfV, not en masse. As any practical person would appreciate, we do not usually waste too much time attesting orthographic variants, e.g., italics, nor alternative forms, nor do we waste time on challenging items that are highly likely to prove to be attestable. Occasionally, we have made entries for taxa that were not attestable. Sometimes they are detected, challenged and removed. More often a challenged entry is found to be either a misspelling or attestable. DCDuring (talk) 22:41, 21 March 2021 (UTC)[reply]

Saterland Frisian orthography

See also: WT:RFVN, and User talk:Leasnam

@Leasnam, Apisite: I think it's time to make this official: which orthography do we want on en.wiktionary? I propose the one handled by {{R:stq:SW}} (which matches the one portrayed by en.wikipedia). Other attested spellings could be given as "alternative spelling of". Any objections? If not, I'll update WT:ASTQ accordingly. Thadh (talk) 11:50, 9 February 2021 (UTC)[reply]

Moldovan vs. Moldavian varieties of Romanian

In the Republic of Moldova there are some words that are not in use in Romania, for instance rutieră instead of microbuz (meaning minibus) or bătută instead of șnițel (meaning schnitzel).

We currently have a category Category:Moldovan Romanian that includes words from both the Republic of Moldova and the Moldavia region of Romania, which is making things a bit confusing. There is no way to see the words in use in that country, like we have for instance for Category:Australian English.

Since these varieties of Romanian are separate, should we divide them into Category:Moldovan Romanian (for the words from the Republic of Moldova) and Category:Moldavian Romanian (for the words from the Romanian Moldavia)? Bogdan (talk) 20:13, 9 February 2021 (UTC)[reply]

Seems sensible to me. Due to the political border, this is useful for navigation and in terms of actual linguistic shift, political borders (particularly international ones) will inevitably lead to some changes in the language itself. —Justin (koavf)TCM 21:16, 9 February 2021 (UTC)[reply]
@Bogdan The main risk I see is that most people won't notice the subtle difference in spelling between the two, and we'll end up with the same uninformative hodgepodge, but now randomly distributed in two different categories. Worse, it may not even be obvious that there are two categories. I think we need to make the names longer and more distinct, perhaps something like "Moldova Republic Romanian" and "Moldavian Region Romanian". I realize those names are long and awkward, but it's no good to have short and sweet names that end up meaning nothing. Chuck Entz (talk) 03:30, 10 February 2021 (UTC)[reply]
@Bogdan, Chuck Entz: Therefore, and in view of the diachronic dimension, it may be advisable to keep it ambiguous. If one deals with a term that was used in Moldavia two hundred years ago (and one will only roughly know which Moldavia), one cannot work with the current political distinction. For the time being, to preserve any regiolect information that you have, you can employ an internal distinction, Bogdan: you can label terms explicitly with something like “Moldavian region of Romania” while categorizing them the same way as Republic of Moldova terms. In the same way, the labels {{lb|ar|al-Andalus}} and {{lb|es|Andalusia}} categorize into Category:Andalusian Arabic and Category:Andalusian Spanish while displaying differently (which is also designed with potential Aramaic or Hebrew usages in the region in mind). Then you can see whether you have enough sufficiently unambiguous label uses to enable separate categories. Fay Freak (talk) 11:17, 10 February 2021 (UTC)[reply]
My problem is not just with the ambiguity of what's displayed, but also that we cannot see a list of all the words that are in use only in the Republic of Moldova. Bogdan (talk) 11:29, 10 February 2021 (UTC)[reply]

My fear is that we yet again start moving towards recognising Romanian spoken in the Republic of Moldova as a separate language. It goes against what we voted for all those years ago. To the best of my knowledge, the Moldovan regiolect does not have rigid borders; hence, keeping to one category isn't wrong. If there are in fact words only used in the Republic of Moldova, can't we just add usage notes? That seems more reasonable. --Robbie SWE (talk) 13:33, 10 February 2021 (UTC)[reply]

Regardless of the merits of a split, I am not a fan of the two names chosen. These two terms are too similar and not very clear in what they indicate. Something like "Republic of Moldova Romanian" for the former category would be much clearer. —Rua (mew) 13:40, 10 February 2021 (UTC)[reply]
I guess "Republic of Moldova Romanian" is the least ambiguous version. Bogdan (talk) 07:09, 11 February 2021 (UTC)[reply]
I agree with Chuck and Rua that "Moldovan" / "Moldovian" / "Moldavian" are too similar, so people are unlikely to understand, make, or maintain the desired distinction, and we will just end up with two categories of mixed words from each region instead of one. I suppose the categories could be split if this is really necessary. If they are split, and we use more distinguishable names like Chuck proposes, the existing "Moldova" category could be retained (under some spelling...) as a parent category, containing both subcategories as well as terms used in both regions, terms where it is not possible to determine exactly which of the two subregions they are used in, and terms which predate the split; this would address the issue Fay Freak mentions. (On some level this reminds me of the discussion of whether the "Canadian English" category should allow an overview of terms used only in Canada, with terms that are also used in Scotland or Maine or wherever split off or removed.) - -sche (discuss) 01:15, 12 February 2021 (UTC)[reply]

Cannot open collapsible sections (translations, etc) in Kiwix

(Initial discussion was at sv:Wiktionary:BB#Translations_missing_in_the_official_.zim_dumps)

A bug in the MWoffliner .zim creator makes it impossible to view the content of collapsible sections in the official .zim dumps for Wikimedia's offline project w:Kiwix. This is not specific to English Wiktionary and has been reported with respect to other wiki languages. For example, translations and inflections are not displayed.

The issue seems to be entirely on MWoffliner's side: Wiktionary's CSS collapses the sections iff JavaScript is available, and MWoffliner adds its own JavaScript, which interferes with the ability to expand them.

Should the Wiktionary/ies do anything to work around this until MWoffliner is fixed? Is there a wiki page for a real cross-wiki report/discussion? --62.98.98.150 07:30, 10 February 2021 (UTC)[reply]

You speak as if people know (/ care about) what MWoffliner is.
Wiktionary (-ies) is (are) the data origin, and it is the responsibility of 3rd party parsers to ensure that their parsers work (and you already say that it "seems to be entirely on MWoffliner's side"). :s —Suzukaze-c (talk) 11:28, 10 February 2021 (UTC)[reply]

CFI for place names

I've drafted a vote at Wiktionary:Votes/2021-02/Expanding CFI for place names for expanding and clarifying CFI for place names. I appreciate any input left on the vote's talk page! Ultimateria (talk) 22:09, 10 February 2021 (UTC)[reply]

Guanche terms in Spanish etymology sections

I have been encountering several cases of confidently cited Guanche terms in Spanish etymologies, written in Tifinagh (a script used in writing Berber languages). Examples: Beneharo, Derque, Echedey, Firjas, gofio, Hañagua, Itahisa, Kebehi, Meagens, and several others listed in Category:Spanish terms derived from Guanche. Some of these use an asterisk by the Tifinagh original and/or the transliteration. (None of the above-cited terms star the original but many of them put an asterisk on the transliteration; see tajinaste for a term with an asterisk by the original.) I am highly skeptical of the accuracy of these terms. In general they are cited to a certain Ignacio Reyes Garcia, who is not in Wikipedia and who seems to have written an obscure book called "Nombres Personales de las Islas Canarias" that is out of print. They also cite a blog that I am guessing copied the book, but the blog no longer exists. (Some of the pages are archived in the Wayback Machine, but I tried opening the link on gofio and just get a blank page: [1].) Per Wikipedia, although Tifinagh did in fact exist in certain inscriptions in the Canary Islands, the variant that was used is not well deciphered, and it isn't even known for certain that Guanche is a Berber language (as Wiktionary claims). I suspect that Reyes Garcia's transliterations and definitions are largely fanciful (similar to supposed "decipherments" of the Indus Valley and Rongorongo scripts), and he may have even taken reconstructed Guanche terms and transliterated them into Neo-Tifinagh in order to generate the supposed originals. The etymology on gofio was added by User:Jberkel, while many of the others were added by User:JaS, who I am not familiar with and who appears no longer active. The one on tajinaste, which is not cited, was added by User:Smettems, another user I don't recognize. Benwing2 (talk) 02:12, 11 February 2021 (UTC)[reply]

Yes, they are fanciful and should be removed. No Guanche entries should use Tifinagh script, and without highly compelling evidence, no reconstructed entries should be created. The existing entries (by @Tibidibi) also have major problems, but as they haven't fixed them, I guess I'll have to get around to it myself. —Μετάknowledgediscuss/deeds 18:45, 11 February 2021 (UTC)[reply]
@Metaknowledge, sorry about that! I'd kind of forgotten about cleaning up after myself :( You can delete them all if it's necessary.--Tibidibi (talk) 05:40, 12 February 2021 (UTC)[reply]
[edit]

Google Groups has made its interface less useful, as you now need to be logged in to search all groups, and many have been blacklisted for containing spam. Usenet Archives doesn't allow searching and doesn't appear to provide metadata (and is mostly incomplete to boot). Is there a better Usenet archive? Even if Google Groups is ideal for permalinks, is there any other place to search, especially if Google decides to stop providing this service? grendel|khan 02:45, 12 February 2021 (UTC)[reply]

How should we handle Late Common Slavic loans into Romanian?

Romanian has a large number of Slavic words that were borrowed from the local Slavic population during the 8th to 10th centuries. They spoke a language that was still mutually intelligible with that of the other Slavs, and innovations were still spreading throughout the area, but some dialectal differences had already begun to arise.

From what language should we say they were borrowed? If I say "Proto-Slavic", in 95% of the times the form of the word is identical, but there are exceptions.

Romanian linguists typically use "Old Slavic", but that term is ambiguous because it covers everything: the actual Proto-Slavic (6th–7th century), Old Church Slavonic, and these words as well.

I see that Slavicists often use "Late Common Slavic", which is good in setting the timespan, but even that is a bit ambiguous geography-wise, as South Slavic already began to be separate from the rest. Bogdan (talk) 11:04, 12 February 2021 (UTC)[reply]

I assume that's why we just give Common Slavic as the source – since we don't actually know when and where the loan was made, due to a complete lack of sources, it's better not to guess. --Robbie SWE (talk) 20:23, 12 February 2021 (UTC)[reply]

Relocating Japanese historical hiragana

Currently historical hiragana are displayed like this:

える • (kaeru) transitive ichidan (stem え (kae), past えた (kaeta), historical kana かへる)

But seemingly that bracket is intended for inflected forms. This looks somehow illogical. How about this?

える • (kaeru)←かへる (kaferu, hist.) transitive ichidan (stem え (kae), past えた (kaeta))

-- Huhu9001 (talk) 11:54, 12 February 2021 (UTC)[reply]

No objection. I definitely agree that the old location is awkward. —Suzukaze-c (talk) 12:08, 14 February 2021 (UTC)[reply]
Nitpicking on presentation: I don't think that <small> is necessary, and it also makes clicking on the question mark (enclosed in sup→small→sup) hellish. —Suzukaze-c (talk) 12:11, 14 February 2021 (UTC)[reply]
<small> removed. -- Huhu9001 (talk) 14:02, 14 February 2021 (UTC)[reply]

Should "Middle Korean adjectives" be abolished?

@LoutK, Suzukaze-c,

Currently, Wiktionary distinguishes adjectives from verbs in Middle Korean. However, the distinction between the two is much less clear in MK. Examples:

  • 性이 서르 갓가오나 ᄇᆡ호ᄆᆞ로 서르 머ᄂᆞ니 (their nature is close to each other, but by learning they become distant)
    • The ModK equivalent is 性이 서로 가까우나 배움으로 서로 멀어지니
  • 피리 부로매 셴 머리 도로 검ᄂᆞ니 (phili pwulwomay syeyn meli twolwo kemnoni, in playing the flute, the white hair becomes black again)
    • The ModK equivalent is 피리 부르니 센 머리가 다시 검어지니
  • 愛水 흐르디 아니케 ᄒᆞ면 湛性이 ᄆᆞᆰᄂᆞᆫ 젼ᄎᆞ로 (because the tranquil mind becomes clear if one makes the sexual fluids not flow)
    • The ModK equivalent is 愛水가 흐르지 않게 하면 湛性이 맑아지기 때문에
  • 蓮ㅅ 고지 븕고져 ᄒᆞ놋다 (How the lotus flower seeks to be red!)
    • The ModK equivalent is 연꽃이 붉어지려 하는구나
  • 君子이 싁싁ᄒᆞ고 공경ᄒᆞ면 나날 어디러 가고 (the junzi who is strict and reverent becomes more benevolent day after day)
    • The ModK equivalent is 군자가 엄하고 공경하면 나날로 어질어져 가고
  • ᄀᆞ장 모딘 罪 다 업서 버서나니라 (their most grave sins shall all vanish and they shall shed [their agony])
    • The ModK equivalent is 가장 심한 罪가 모두 없어져서 [이를] 벗어나리라; an adjectival interpretation doesn't make any sense.

In effect, most adjectives could also be considered verbs with the meaning of "to become [STATE]", with e.g. 블근 곳 (pulkun kwos, red flower) literally being "flower that has become red".

There are also very rare cases of transitive use of adjectives:

  • 갈 바ᄅᆞᆯ 아득ᄒᆞ야 머리ᄅᆞᆯ 돌아보니 (kal palol atukhoya melilol twolapwoni, turning the head to look back, considering as distant the way that I must take)

There are still convincing morphological arguments for differentiating verbs from adjectives, and scholars do usually still consider "adjectives" as a discrete category. For example, only adjectives in hota can be adverbialized with hi. The distinction is especially clear in the semantic field of emotion. 젛다 (cehta) is specifically "to fear", i.e. "to become scared", and the derived 저프다 (cephuta) is "to be scared"; 슳다 (sulhta) is "to grieve" and the derived 슬프다 (sulphuta) means the same thing it does now. These derived adjectives of emotion almost never show verbal usage.

Still, it's my opinion that Wiktionary would be better served if we abolished Category:Middle Korean adjectives entirely, because as it stands most of the definitions are incomplete (they're missing the verbal "to become [STATE]" definition), there is effectively no difference for the purposes of conjugation templates, and abolishing it would better show the ways in which MK differed grammatically from ModK.

Thoughts?--Tibidibi (talk) 10:28, 13 February 2021 (UTC)[reply]

No opinion. But I note that Chinese has the same feature, where some adjectives can mean 'become [adjective]', as in . I think we only record the adjective sense. —Suzukaze-c (talk) 23:04, 13 February 2021 (UTC)[reply]
Support If you think the dictionary is better served by this change, then I'm definitely on board. — LoutK (talk) 01:38, 14 February 2021 (UTC)[reply]

alternative pronunciations vs forms

Halp, am of doubt as to whether or not to list alternative pronunciations on the lemma page for languages that pretend to be phonemic (with a beast like English, it's a no-brainer). Seems intuitively obvious, but then in many cases there's still no corresponding spelling; it would also imply that people who pronounce it alternatively also use the alternative spelling, and pronounce the standard spelling with a spelling pronunciation, which is rarely true.

Secondary question: would it not be better to give a truncated transcription of only the segment that differs instead of the whole word, where no brackets are possible (e.g. presence/absence of nasalisation: [õ])? Can this be easily done with the pronunciation template, or is it better to use manual IPA?

Tertiary question: do or don't list alternative forms on other alternative forms' pages? Brutal Russian (talk) 20:06, 15 February 2021 (UTC)[reply]

Here's my thoughts; they don't necessarily correspond to community consensus:
  • The first question doesn't have an easy answer; things are done differently by different editors and in different contexts. However, for languages where the spelling doesn't correspond well to pronunciation (e.g. English), I prefer to centralise most pronunciations at the lemma, because speakers often like to use standard spellings even when it doesn't correspond to their own pronunciation (for example, I say /ˈɛk(t͡)ʃɫi/ and /pɹəˌnæɘ̜nsiˈæɪʃən/, but write actually and pronunciation, not akshly and pronounciation).
  • However, some (particularly prominent) alternative forms can have their own pronunciation sections. Alternative forms that are particularly divergent or exotic (in meaning etc.) maybe shouldn't go on the lemma.
  • For the second question, practice varies here as well, but I generally prefer to have each alternate pronunciation written out in full unless there are too many alternate pronunciations to make that practical. I don't edit languages that have their own pronunciation templates, so I wouldn't know how well
  • For your third question, it's better to avoid listing alternative forms on other alternate forms, as that increases the amount of maintenance to do (e.g. if somebody wants to add a new alternate form).
Hazarasp (parlement · werkis) 06:05, 17 February 2021 (UTC)[reply]
  • For #3, I'll ditto Hazarasp's comment. Japanese alternative form entries are kept as simple stubs acting as soft-redirects back to the lemma form. We've been able to use templates and modules to keep these very streamlined. See, for example, lemma entry あなた (anata), and alternative-form entry 貴方. The latter is complicated due to the oddities of Japanese orthography -- the pronunciation anata for that spelling is the alt form, pointing the user back to the lemma at あなた (anata), while the pronunciation kihō is an entirely separate term with its own separate etymology and other details.
For a simpler example, see also lemma (sakura) and alternative (hiragana) form at さくら (sakura). There's a lot of detail visible at さくら (sakura), but if you open the page in edit view, you'll see that this is all pulled from the lemma entry using the template and module. This approach allows us to keep all the important lexicographic detail in one place -- in the lemma entry. This keeps maintenance much simpler, since we don't have to update the same information in multiple places. ‑‑ Eiríkr Útlendi │Tala við mig 19:32, 19 February 2021 (UTC)[reply]

Monthly Community Project to Import a Public Domain Dictionary or Glossary to Wiktionary

I've noticed that Project Gutenberg has multiple dictionaries [2] [3] whose addition would greatly benefit Wiktionary. Could we set up a monthly contest to have their data imported into Wiktionary? Perhaps at the start of the month, we could set up a list of words contained in the dictionary and then have individuals check them off once they have imported them? These public domain dictionaries are such a rich source of knowledge and would greatly enhance Wiktionary. Languageseeker (talk) 20:36, 15 February 2021 (UTC)[reply]

etymology: bor template doubts

Halp, more confusenings. Special:Diff/60429477, which User:Ultimateria made a while ago, warns against using {bor} unless there are no previous links in the etymological chain. But my intention there was precisely to categorize the word as an Occitan term borrowed from Latin, even if the borrowing first happened in Old Occitan. Firstly, the time of borrowing, or the more appropriate periodisation of the language, is often unclear (Basque marti). Second, if one wants to find all the Latin borrowings in Occitan, the no-previous-link approach only makes this possible by going to the earlier stages of the language, and in some cases by having to check two or more language stages, because the "Derived from Latin" category will show inherited and borrowed terms promiscuously. I specifically wanted the word to show up as a borrowing.

Additionally, there are many European words of Latin origin that have spread via the mediation of some prestige language, most often French. And at times it's impossible to tell whether the word was borrowed directly from Latin or rather the-French-borrowed-it-and-so-did-we, i.e. technically the borrowing is from Latin, but it could be treated as a relatinisation of a French word borrowed from Latin (a calque-borrowing? :pensive_face:). Case in point: Dutch unie, which can't be directly from either French or Latin, but is a Romance/Medieval Latin-style adaptation into unia, also visible in many Slavic languages: unia/уния/унија. Dutch dictionaries don't know what to make of it: some say from French, others say from Latin. In my opinion both are justified and should be templated as such. Brutal Russian (talk) 20:37, 15 February 2021 (UTC)[reply]

Exclusively focusing on the Dutch example: It can be a direct borrowing even if the ending has been changed, -ion < -io endings are practically never borrowed as -io(n) in Dutch because the type is considered foreign. In this case the more recent etymological dictionaries agree that it is from French and they date it to Middle French, so considering it a borrowing from Middle French should be uncontroversial. ←₰-→ Lingo Bingo Dingo (talk) 18:54, 16 February 2021 (UTC)[reply]

learned borrowings template

This template is severely underused even for languages like Dutch, where a clear separation between learned and non-learned Latin borrowings is desirable. In Romance languages, where basically any term borrowed from Latin is almost by definition learned (exceptions would mostly look like this - check it out, entertaining)... in these languages it's unused presumably because nobody feels the need for it. What do?

Corollary: this question extends to the {desc} template's lbor=1 parameter. Are there clear guidelines on when to use it? A whole tree of borrowings with (learned) added looks quite ugly and unnecessary. My current thinking is to use it when learned and non-learned borrowings appear in the same {top2} field; when I separate inherited and borrowed descendants, I use bor=1 instead to get the nicely-looking arrow. Brutal Russian (talk) 21:00, 15 February 2021 (UTC)[reply]

Because the template isn't used much, if you want to massively expand its use, you're in the (un)enviable position of setting the standard here - do whatever you feel comfortable with, as long as it doesn't contradict existing practice and makes some kind of sense. Hazarasp (parlement · werkis) 06:09, 17 February 2021 (UTC)
I only recently became familiar with the template, but I will use it from now on where I think it is appropriate. ←₰-→ Lingo Bingo Dingo (talk) 17:31, 17 February 2021 (UTC)
@Hazarasp, Lingo Bingo Dingo: So given what I said above about Romance languages, I've been thinking: is it possible or desirable to automatically convert all borrowings from a parent language (English borrowing from Old English etc.) into learned borrowings? With an override parameter for the sicutera's? Do we even have an appropriate template/category for such folksy corruptions/Chinese telephone items? Brutal Russian (talk) 02:49, 20 February 2021 (UTC)
@Brutal Russian I'm not really sure; a large proportion of borrowings from Latin will be learned borrowings, but I can't really judge the extent that it is possible that terms in some registers were e.g. picked up by members of the general public when in contact with Latin jargon (not necessarily with irregular secondary changes). ←₰-→ Lingo Bingo Dingo (talk) 08:20, 20 February 2021 (UTC)

Special:Diff/61781951&oldid=61781606 by User:Rua warns against listing as cognates in the Etymology section those terms that already appear under Related. This raises some further questions for me: which section should be preferred for listing these words? Do we reserve Etymology for outside cognates and list internal ones in Related? What if mentioning some or all of these while discussing the etymology is beneficial - do we then forgo listing them in Related? Any insights appreciated. Brutal Russian (talk) 21:07, 15 February 2021 (UTC)

Related terms is definitely the preferred section. "Do we reserve Etymology for outside cognates and list internal ones in Related?" Yes. If for example a word is inherited from a word in the parent language but is influenced by another word in the child language, it should be mentioned in the etymology, but I think it's fine to also link to it in related terms in this case. In Romance languages at least there's not much reason for same-language links in etymologies. The kind of redundancy I often remove is when a word is simply "foo" + "-bar" and "foo" is listed under related terms. Ultimateria (talk) 21:55, 15 February 2021 (UTC)
I wonder whether it might not be a good idea to have Related terms be an L4 heading under ===Etymology=== rather than under the part of speech, since Related terms is for etymologically related words, regardless of the POS of the lemma. —Mahāgaja · talk 12:22, 17 February 2021 (UTC)
I wouldn't like that because it would push the start of the main content down even more. --{{victar|talk}} 06:33, 18 February 2021 (UTC)
Ignoring the likelihood that most monolingual users and many English learners want definitions more than any other content, we often have ridiculously long alternative forms, pronunciation, and etymology sections. Alternative forms can be shown in a comma-separated, horizontal list. Regional pronunciation variations can appear under show-hide bars, as can long lists of cognates and alternative speculations about etymology. We could probably go further by hiding IPA. Registered users can set preferences to display hidden content. Related terms seem likely to be helpful for reasons other than interest in etymology. DCDuring (talk) 14:30, 18 February 2021 (UTC)
Man, some etymology sections for English words read like novels (see: thou, plat). I don't know what the solution is aside from maybe separating the omnipresent lists of 25+ cognates into another section/line. (Or...just not listing so many damn cognates) DJ K-Çel (contribs ~ talk) 20:17, 18 February 2021 (UTC)
See thou. I didn't change plat because the longest etymology section was for obsolete definitions. The entire content of etymology section 3 is of solely historical interest. DCDuring (talk) 13:46, 19 February 2021 (UTC)
Perhaps we need some kind of weasel-proof guidelines for which cognates to use and which ones to leave out. The problem is all the IP editors that insist on adding their languages whenever they see someone else's language being used: you can't have just a Swedish cognate; that would be unfair to Danish, Norwegian, Faroese and Icelandic. Spanish? What about Asturian? Portuguese? What about Galician and Mirandese? Bulgarian? What about Macedonian? The languages I'm mentioning may not be the actual ones that get the most of this, but it should give an idea. The worst part is that the ones added by the "me too" folks are usually closely related to the ones already there, and thus mostly useless. Then there are the cases where there are a few representative cognates illustrating some finer point regarding the local development of, say, West Germanic, and someone decides to add Albanian or Persian instead of letting the Proto-Indo-European entry tell that part of the story. Chuck Entz (talk)
I'm for showing a few (3 or 4, preferably from different branches of the language family) cognates even when a parent entry exists because users can make some helpful comparison without having to chase links around in truncated etymology sections. But I agree that more than that is undesirable. I'd also support showing cognates under a collapsible box if it bothers people too much about cluttering the page (we have that already on some entries). Mahagaja's idea of having Related terms under Etymology is a good one. Altforms, if shown using {{alter}}, are already listed horizontally instead of vertically. Maybe we can have the POS header first followed by pronunciation, etymology and descendants... -- 𝓑𝓱𝓪𝓰𝓪𝓭𝓪𝓽𝓽𝓪(𝓽𝓪𝓵𝓴) 11:02, 19 February 2021 (UTC)
Just hide them. DCDuring (talk) 13:46, 19 February 2021 (UTC)

Side note: I've been too lazy to put this together, but really the cognates list should have the option to be automated, much like we do with {{suffixsee}}, perhaps in a template called {{cognatesee|lang|root|family}} -- working in conjunction with {{root}} -- and thrown under a ====Cognates==== header. @Erutuon --{{victar|talk}} 21:33, 19 February 2021 (UTC)

@Victar: This would be truly useful and relieve people from feeling the need to exhaustively list cognates in the etymology entries of related words - or even of the same word in different languages, as often happens in Romance. I find it bothersome that they all need to be synchronised every time a correction is made to one of them. It would be especially handy if these automated lists could be transcluded under a collapsible header in the etymology section itself. Brutal Russian (talk) 12:07, 6 March 2021 (UTC)
I would die in a war to not have them in the etymology section. Absolutely never. {{victar|talk}} 16:50, 6 March 2021 (UTC)

WordHippo potential violation

Where do we report potential copyright violations? Compare this revision of spew to Word Hippo's definitions, including "(Can we verify(+) this sense?)". I've saved a screenshot as well. There doesn't seem to be any attribution on the page. DAVilla 11:57, 16 February 2021 (UTC)

@DAVilla: Wikipedia:Mirrors_and_forks#Non-compliance_process. —Suzukaze-c (talk) 12:01, 16 February 2021 (UTC)
Ah, in that case I can't proceed, at least not with that word. I wasn't an editor for that page. DAVilla 14:03, 16 February 2021 (UTC)
How does that matter? The standard letters just point out the copyright violation.  --Lambiam 08:55, 17 February 2021 (UTC)
He wouldn't have standing to press the matter legally. I don't understand why WMF wouldn't accumulate complaints against offenders and then make the license-violation/copyright complaint. It will never be efficient for the responsibility to remain with each individual copyright holder. DCDuring (talk) 16:14, 17 February 2021 (UTC)
I'm tempted to suggest that we set up a Wall of Shame calling out blatant unattributed plagiarism like this, but it probably would be more trouble and bad publicity than it's worth. Chuck Entz (talk) 04:26, 19 February 2021 (UTC)
definitiondb.com is another one. There are tons. Equinox 16:47, 19 February 2021 (UTC)
It is highly unlikely that these are one-offs. I expect that each such site took content, probably everything available in a dump that met some criteria. If so, it would seem that WMF should be made aware, take whatever action they deem appropriate, and let us know what the upshot is. We could ask them what they are likely to do before we waste time on documenting the problem for the various sites that are guilty of this unacknowledged downloading. DCDuring (talk) 17:01, 19 February 2021 (UTC)
WMF will never win. Spammers always win. That's why the beautiful pioneering Internet of the '90s is now trash TV. Equinox 22:27, 19 February 2021 (UTC)

Possible case of admin abuse

See Infodesk. Can we get some more input on this? __Gamren (talk) 13:58, 19 February 2021 (UTC)

The long block of the user seems wrong on its face. The user seems to have provided evidence showing usage that fits the labels and definitions involved. The admin's position seems at best prescriptivist and possibly PoV. DCDuring (talk) 14:34, 19 February 2021 (UTC)
On Wikipedia some policies allow action only by "uninvolved" admins. I don't think admin participants in a dispute should be handing out long blocks to their unarmed adversaries. A few hours to a day is plenty long enough for an uninvolved admin to make a long-term decision. Vox Sciurorum (talk) 15:14, 19 February 2021 (UTC)
I fully agree with allow action only by "uninvolved" admins and don't think admin participants in a dispute should be handing out long blocks to their unarmed adversaries. Taylor 49 (talk) 20:30, 30 March 2021 (UTC)
Yes, there is something slightly off in this part of the scheme of authority and organization of the website. --Geographyinitiative (talk) 21:17, 30 March 2021 (UTC)

Cognates for borrowings

Wiktionary:Etymology#Cognates says that cognates should be listed only for inherited words. This is generally a sensible policy, as the fact that a Romanian word borrowed from Ottoman Turkish has a Swahili cognate is irrelevant trivia.

However, there is the case in which the identification of the source language is not straightforward.

For instance, Romanian borrowed a few thousand words from a plethora of Slavic languages: Proto-Slavic, Old Church Slavonic, Old Bulgarian, modern Bulgarian, Serbian, Russian, Ukrainian and Polish, etc. and many words look very similar in all these languages.

Identifying which is the source language is not easy and comparing the forms in the possible candidates is part of this process.

Any thoughts about this? Bogdan (talk) 20:32, 19 February 2021 (UTC)

@Bogdan: I can give you the definite answer that the section you refer to is blatantly false, apart from the document being an old compilation that hasn’t seen our recent best practices and architecture, and so naturally contradicts itself and later established schemes. Obviously, with borrowed terms and wanderworts whose origin is dubious, we list cognates. That statement is only valid if one restricts the meaning of “cognate” to “related as sister languages from a common ancestor”. But it is even less true for inherited words than for any other words, because for those we can regularly reconstruct an ancestor where we list cognates in a descendants list, making the cognate lists superfluous.
The document reads like an attempt at an introduction to historical linguistics. But the matter you describe, the information you have, already suggests to you how to write the etymology, and hopefully you know yourself from outside how etymologies should be written! Fay Freak (talk) 21:21, 19 February 2021 (UTC)
I would say "from a {{der|ro|sla}} language. Compare " followed by Slavic terms that seem to be related to whatever the donor term was. If appropriate, I might add "Ultimately from {{der|ro|sla-pro|[...]}}" with the Proto-Slavic term replacing the [...] in this example. Another option would be to substitute the "Ultimately from {{der|ro|sla-pro|[...]}}" part for the "Compare" part; that is, let them find all the presumably related Slavic forms in the Descendants section of the Proto-Slavic entry. Chuck Entz (talk) 21:51, 19 February 2021 (UTC)
In most cases, it is possible to tell which language was the source due to phonetic differences.
The problem I still have is which name to use for the language from which the bulk of the 8th-10th-century borrowings were made.
  • Late Common Slavic -- basically it was still a late stage of Proto-Slavic, still acting like one language and mutually comprehensible with the other parts of the Slavic dialect continuum
  • Old Church Slavonic -- the differences between the language from which the words were borrowed and the dialect which was standardized as Old Church Slavonic were very small, basically the same language. (for this reason, Bulgarians call Old Church Slavonic "Old Bulgarian")
  • Old Bulgarian -- the surviving similar dialects ended up being Bulgarian
Bogdan (talk) 22:05, 19 February 2021 (UTC)
@Bogdan: If you found yourself more confused, misinformed or disappointed after reading Wiktionary:Etymology in earnest, you might vote pro on the newly created motion for deletion of that page. Fay Freak (talk) 22:22, 19 February 2021 (UTC)

Taking into consideration that we have a substantial number of high-quality reconstruction pages for several proto-languages, I personally see no point in adding cognates in an etymology section, unless they provide an interesting aspect that the main etymology does not convey. If we can keep etymology sections short and sweet, then we should definitely put in the effort. --Robbie SWE (talk) 18:50, 20 February 2021 (UTC)

Descendants: Inherited vs Borrowings

So I've been busying myself with these - separating the raisins from the flies, as the Russian saying goes - for a while now, and the best approprioach (made this word up accidentally but it works) I've discovered through practice is roughly this: have two separate sections if the ratio of inherited to borrowed terms is comparable and they don't fit into two columns of 5. Alanus is already pushing it and I'd rather separate them. minimus absolutely needs two separate fields in my opinion. I also separate ancient/natural borrowings from learned ones, especially those from the early-modern period onwards, by listing the former together with inherited words, as I did in fabrica. I haven't felt the need to use all three sections for a single word so far. Apart from this solution seemingly not being adopted by anyone else (I may have seen it once), there's at least one other problem: it breaks {{desctree}}. So firstly, how do other editors feel about the issue, and secondly, if the feeling is mutual, I propose introducing the practice and modifying desctree with an option to choose between listing items from the Descendants, Borrowings or Learned borrowings section. This will also benefit the template and make it usable in cases which currently would result in a monstrous list of descendants, with many unwanted learned borrowings. Alternatively, make a {{bortree}}.

Also, what are your thoughts on collapsible lists? IMO the website's styling would benefit a lot from hiding long lists like that, and I would also add that I find the approach to collapsible lists that the Hungarian Wiktionary takes to be the bestest: uncollapse by default, use smallcaps for language names, separate the columns with background colour - and honestly, we could add links to other-language Wiktionaries to descendants as well. Brutal Russian (talk) 03:06, 20 February 2021 (UTC)

I agree with you that separating inherited terms from borrowings is a good idea if there's a whole bunch of each. I might start doing that myself - the inability to {{desctree}} is a loss I can take, especially as its flaws already often make it unusable (e.g. it can't handle terms with multiple etymologies).
As for collapsible lists, I'm kinda nonplussed by them, though I don't have any cogent objections to them other than the fact that I think the hu.wikt translation box looks ugly (though there are other styles that could be used, such as that used by Template:col4). Hazarasp (parlement · werkis) 09:05, 20 February 2021 (UTC)
@Brutal Russian: You can't create headers that aren't in WT:EL. I would support a ===Borrowed terms=== header if you make a vote for its inclusion, but until then, please use |bor= and |lbor= where appropriate. --{{victar|talk}} 21:09, 20 February 2021 (UTC)
@Brutal Russian, |1= should always be lang. Please place |bor=, |lbor=, etc. at the end of {{desc}}. Thanks. --{{victar|talk}} 22:58, 20 February 2021 (UTC)
I'm not sure that a ====Learned borrowings==== header is actually the right way to do things. A more elegant method to organise the morass of descendants that some words have would be to allow users to divide up ====Descendants==== sections using subheadings (e.g. =====Inherited=====, =====Borrowed=====, but freedom would be given to define arbitrary subheadings within reason). Hazarasp (parlement · werkis) 01:30, 23 February 2021 (UTC)
I disagree, but draft a vote. --{{victar|talk}} 07:42, 23 February 2021 (UTC)

Alternative forms - diacritics

While listing these for Latin, I've discovered that in many cases diacritics stand in the way. Most alternative forms come from periods of vowel-length breakdown such as Late and Medieval Latin, so one might as well mark all of these with macron-breve. When these ā̆'s agglomerate, the result can be plain ugly. For the lulz, a representative example: deorsum. Outside of these, the macrons are not only rather redundant, they also encourage the editor to list variant prosody as alternate forms. While this is clearly the right way to go for languages that include diacritics in the page name (Latvian, Spanish), in Latin the difference will simply be noted on the same page. I expect the same arguments will hold for Ancient Greek etc. - but especially with reference to ancient languages with multiple periodisations/pronunciation traditions, where specifying prosody in alternative forms would lead to innumerable alternatives. I propose adopting a policy of forgoing prosody in alternative forms for such languages, as I did on ieiunus. If someone feels this to be somewhat inconsistent with listing ie- and ia- forms while also giving both pronunciations in the lemma (based on the discussion just above), I also welcome your thoughts. Brutal Russian (talk) 03:48, 20 February 2021 (UTC)

Derivatives/Descendants of alternative forms vs the lemma

Example: ientō, alt form of ieientō, or zizania, collective singularisation of zizanium. Where do the descendants go in such cases? I suppose this can be resolved using {{desctree}}, but is this suggested? And what about derivatives? Also, what's the best/currently adopted way to specify that something comes from an alternative form instead of the lemma in the Descendants section, while avoiding making a Latin word a descendant of another Latin word (which I take it is discouraged)? For instance, some of the descendants continue one form of the word, and the rest clearly continue another, perhaps unattested? Does this call for making a new (reconstruction) page and including the desctree from that in the lemma? Brutal Russian (talk) 04:02, 20 February 2021 (UTC)

Again, multiple approaches have been used here, depending on the scenario, personal preference, etc. This is really an area where personal judgement supersedes hard-and-fast rules, but here are some broad guidelines:
If only one or two descendants come from an alternative form, I would add a qualifier next to those descendants, e.g. at bolster:
  • English: bolster
  • Scots: bowster, bouster, boster (bowstur)
If more descendants come from an alternative form, then you can put them under a subheading (e.g. at sabbatum):
From the variant *sambatum:
If most or all descendants come from an alternate form, then I believe it's best to centralise them on the main page, maybe with a note beforehand explaining the situation (compare tabula; the forms at tabla probably should be replaced with a note telling readers to go to tabula). However, other editors do disagree with me (as indicated by the existence of forms at tabula). Some might even think it's best to have forms at both the main form and the alternate form.
As for your comment about making Latin words descendants of other Latin words, yes, this is generally avoided, but some still do it (see the current situation at sabatum). Hazarasp (parlement · werkis) 08:57, 20 February 2021 (UTC)
If the alternative form has a lot of descendants, why aren't they on that form's own page, possibly with a cf. or something to draw attention to it? DCDuring (talk) 23:23, 22 February 2021 (UTC)
Generally, I think everything should be centralised on the main lemma as much as possible (within reason). This is because it's what people are more likely to look for (e.g. more people will search for tabula than tabla). Hazarasp (parlement · werkis) 01:35, 23 February 2021 (UTC)

Google Groups no longer provides message IDs.

See this early use of netiquette. The original message headers are no longer available, which means I can't include the Message-ID, which uniquely identifies a Usenet message. I also can't revert to the old-school flavor of the original Google Groups, which let me do that. Is this a regression? Is there anything we can do? grendel|khan 08:35, 20 February 2021 (UTC)

Google Groups is not the only place for Usenet messages. J3133 (talk) 08:51, 20 February 2021 (UTC)
Do you know of another searchable archive of Usenet messages?  --Lambiam 14:50, 20 February 2021 (UTC)
@J3133: I asked about that earlier this month, but didn't hear anything back. {{quote-newsgroup}}'s docs and discussion don't suggest anything else. Do you have another link to the message used there? Or its Message-ID and instructions on how you got it? grendel|khan 16:41, 20 February 2021 (UTC)
You can also no longer forward these messages, which would (presumably) have included the message-ID.  --Lambiam 14:50, 20 February 2021 (UTC)
Google was only nice when being nice helped build the brand. Now they seem to be cutting costs, limiting financial/legal liability, and reducing political risk. DCDuring (talk) 16:52, 20 February 2021 (UTC)
Archive.org has a Usenet Historical Collection. According to the description, it spans "more than 30 years", though it's not clear which years, or how their coverage compares to what's indexed by Google Groups. However, as an experiment, I went to their news subcollection and downloaded news.misc.mbox.zip (23.5 MB). After unzipping, I grepped the mbox file for the string 'CORPARASHUN' and found the netiquette message linked in the original post. The Message-Id header looks like: Message-ID: <7805@BIT.NET>. Now, if you wanted to do a search against all groups, it would require some level of technical proficiency and a fair bit of disk space (eyeballing it, all the files in the usenet-news subcollection look like they'd add up to around 8GB uncompressed, and that's just one out of 1,019 subcollections). But it's at least comforting to know that this redundancy exists.
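For anyone who wants to repeat that grep step on other archive.org files and pull out the header directly, here is a minimal Python sketch using the standard-library mailbox module. It assumes the .zip has already been extracted to a plain mbox file; the file name news.misc.mbox and the script name find_msgid.py in the usage comment are assumptions, not names confirmed from the collection. It searches each message's raw bytes (so odd legacy encodings don't abort the scan) and prints the Message-ID header of every hit.

```python
import mailbox
import sys


def find_message_ids(mbox_path: str, needle: str) -> list[str]:
    """Return the Message-ID of every message whose raw bytes contain `needle`."""
    box = mailbox.mbox(mbox_path, create=False)  # fail if the file is missing
    needle_bytes = needle.encode("utf-8")
    found = []
    try:
        for key in box.keys():
            # Search raw bytes rather than decoded text, so old 8-bit posts
            # with undeclared encodings cannot break the scan.
            if needle_bytes in box.get_bytes(key):
                msg = box.get_message(key)
                # Header lookup is case-insensitive, so "Message-Id" matches too.
                found.append(msg.get("Message-ID", "<no Message-ID header>"))
    finally:
        box.close()
    return found


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python find_msgid.py news.misc.mbox CORPARASHUN
    for message_id in find_message_ids(sys.argv[1], sys.argv[2]):
        print(message_id)
```

Searching the whole multi-gigabyte collection would still mean downloading and unzipping each subcollection first; the script only removes the manual step of reading the headers out of a grep hit.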
Apparently there was also another Usenet archive on archive.org, The UTZOO Wiseman Usenet Archive, but the author took it down recently as a result of some legal threats. Unfortunate.
I also stumbled on UsenetArchives.com, another online archive of usenet posts, but it doesn't look very promising. It doesn't have a search feature, and currently doesn't seem to be indexed by Google, and coverage seems spotty. But it's a recent-ish project, and maybe it will get better as development continues.
As a final note, I would personally not be too concerned about adding quotes without message ids. I think having the group name, plus year, plus subject line, plus author, plus text excerpt should be more than enough bits of entropy to uniquely identify the message. Colin M (talk) 21:38, 20 February 2021 (UTC)
Would it be possible to add a URL as well? That would add to verifiability. — SGconlaw (talk) 14:04, 21 February 2021 (UTC)
For a message found in the archive.org collection? No, unfortunately the only way to view a particular message from that collection is to download the corresponding zip file to your computer, unzip it, and find the message inside the mbox file. I suppose you could include the url for the zipped mbox file, but I doubt many readers would want to go through the legwork to deal with it. Colin M (talk) 20:40, 21 February 2021 (UTC)
Ah, I meant at Google Groups. Even if a message ID can no longer be provided, at least a URL can be added. — SGconlaw (talk) 21:15, 21 February 2021 (UTC)
Oh, in that case then yes, that's definitely possible. See the first URL in this thread for example. You can get a permalink for a given message by clicking the triple-dot button in the top-right of the message and clicking "Link" from the dropdown menu. Colin M (talk) 20:56, 22 February 2021 (UTC)
Thanks for those. The hoster of the Usenet Archives gives some background here; it is a mirror of the UTZOO Wiseman that has been taken down [fake news: it contains the UTZOO Wiseman archive but it also contains other messages]. The site does have a "search posts" function on my end, though. ←₰-→ Lingo Bingo Dingo (talk) 09:50, 21 February 2021 (UTC)
Whoops, you're right, I just missed the 'search posts' toggle in the UI. Colin M (talk) 20:43, 21 February 2021 (UTC)
I've edited my previous comment, because the Usenet Archives are clearly not limited to the UTZOO Wiseman archive. ←₰-→ Lingo Bingo Dingo (talk) 15:31, 23 February 2021 (UTC)
Thanks! I fetched the mbox file for news.misc, and the message ID for that particular message is 14434@goofy.megatest.UUCP; the one you listed is for a (wacky) reply from a BIFF. This is usable for a one-off, but (obviously) much more time-consuming than I'd prefer. I'll send some feedback, for whatever good that might do. Note also that you can still search by message ID in the interface. grendel|khan 19:17, 23 February 2021 (UTC)
I don't know what the total storage requirements are, but in theory something like that could also be hosted on WMF lab infrastructure, or at least a tool to extract message ids. – Jberkel 11:48, 22 February 2021 (UTC)
That would be amazing! There are a lot of annoying limitations with the Google Groups UI. The biggest one for me is the inability to sort from oldest-to-newest, which would be so useful for antedating. It would also be great to be able to search by regex. Also, it would be nice if you could, say, select some text from a message and click a button to get a pre-filled quote-newsgroup template. Colin M (talk) 21:03, 22 February 2021 (UTC)
Rethinking this, the content itself cannot be stored/served from WMF servers, because of the fuzzy copyright status of usenet messages. – Jberkel 22:23, 22 February 2021 (UTC)

Category:Matter and its subcategories vs. Category:Chemistry

I recently added Category:Matter to Category:Chemistry (aside from it being included in "Nature") because most if not all of its subcategories are closely related to chemistry: Acids, Chemical elements, Drugs, Dyes, Explosives, Gases, Inorganic compounds, Ions, Liquids, Metals, Minerals, Natural resources, Organic compounds, and Poisons. (The only exception seems to be Subatomic particles, which belongs more to physics.) Also, compounds may get categorized broadly under "chemistry", but if someone wanted to move them into the right subcategory, it was not easy to find, as "(in)organic compounds" was/were not part of "chemistry". On the other hand, @Benwing suggested to me: "if there are subcategories of 'matter' that relate to chemistry, IMO you should add those subcategories directly to 'chemistry'."

I replied that in that case I'm afraid "Matter" itself would become partly redundant and somewhat pointless, and the distinction between the categories directly included in "chemistry" and those that are not may be fairly arbitrary. On the whole, treating "Matter" as a meaningful unit on its own still seems more feasible. In fact, I'd suggest that we delineate the category "Matter" more in accordance with its current content, which has a great deal of overlap with chemistry.

Benwing wrote that "Matter" on the whole sounds very vague and it seems very strange e.g. to put "Drugs" under "Matter". Maybe getting rid of it and moving those categories above to "Chemistry" and putting "Subatomic particles" directly under "Physics" is the right thing to do. – What do you all think about this? Adam78 (talk) 17:00, 21 February 2021 (UTC)

"pronominal" vs. "reflexive" in Spanish

(Notifying Ungoliant MMDCCLXIV, Metaknowledge, Ultimateria, Gibraltar Rocks): Supposedly there is a distinction between "pronominal" and "reflexive" verbs in Spanish. See https://www.spanishdict.com/answers/208148/how-do-you-distinguish-between-pronominal-and-reflexive-verbs. Correspondingly, some senses in some verbs are labeled "pronominal" and some "reflexive". I know of no other language with reflexive verbs (e.g. Russian, French, Italian) that makes such a distinction, and Wiktionary doesn't recognize the label "pronominal" or categorize it in any way. I don't understand the distinction, really, and I doubt it's necessary to make it. Does anyone object if I replace "pronominal" with "reflexive"? Benwing2 (talk) 06:05, 22 February 2021 (UTC)

Reflexive forms are a special case of pronominal forms, see reflexive. I think it's worth keeping the distinction, and not just in Spanish. Marking all as "reflexive" would certainly be wrong in some cases. – Jberkel 10:59, 22 February 2021 (UTC)
A verb that indicates an action that is necessarily done by the subject unto themself is different from a verb that requires a meaningless reflexive pronoun as part of its lexeme. If you want to remove the label reflexive (which is recognised), this information should be preserved some other way. — Ungoliant (falai) 11:08, 22 February 2021 (UTC)
  • I'm inclined to merge them because I've never seen the distinction in any reference materials, and I don't feel that it's very important. It's worth noting that Spanish dictionaries call these "pronominal" verbs, and English-Spanish dictionaries and learning/teaching materials almost always call them "reflexive". There's normally little reason to define true reflexive verbs (e.g. bañarse which should be a soft redirect) because their meaning is usually exactly what you'd guess, so most of the entries in our Spanish reflexive verb category would ideally be those Ungoliant describes as "verb that requires a meaningless reflexive pronoun", in which case it would be more correct to call them "pronominal", a term that's still correct for the true reflexive verbs. (Pinging also @Froaringus.) Ultimateria (talk) 18:16, 22 February 2021 (UTC)
Lots of languages use reflexive morphology on verbs without reflexive semantics; I've never known a dictionary to label the two types differently. Even English has a small number of verbs with reflexive morphology but no reflexive semantics, such as avail (oneself of something). I don't think we need to make the distinction here either: as far as learning inflected forms goes, there's no difference, and the glosses of the entries tell us what they mean. And the difference isn't always clear-cut, anyway; certainly in German there are some morphologically reflexive verbs whose semantics are not immediately obviously reflexive, but could be considered reflexive with a bit of imagination, like sich trauen. —Mahāgaja · talk 18:47, 22 February 2021 (UTC)

Replacing all uses of {{etyl}} with new templates {{uder}} and {{uety}}

I notice several users, e.g. User:Apisite, User:Vivaelcelta, User:Embryomystic, User:Donnanz, "helpfully" replacing {{etyl}} with {{der}} in a more or less mechanical fashion instead of making the correct distinctions between {{bor}}, {{inh}} and {{der}}. User:Mahagaja has been trying to clean up {{etyl}} for a long time now, and these mechanical replacements destroy the use of {{etyl}} as a signal that manual cleanup is needed. I understand these users may be doing this because {{der}} looks nicer than {{etyl}}, but these replacements aren't helpful. To forestall further such changes, I propose to replace *all* uses of {{etyl}} by bot with one of two new templates, both of which indicate that further cleanup is needed:

  1. {{uder}} (undefined derivation) works like {{der}} but indicates that cleanup is needed, and will place the page in a cleanup category, similarly to {{etyl}}. It will be used whenever a construction like {{etyl|FOO|BAR}} {{m|FOO|...}} currently occurs, as well as in cases like {{etyl|ML.|BAR}} {{m|la|...}}, where {{etyl}} occurs with an etymology language whose parent is used in {{m}}. It will also be used in corresponding constructions where {{l}} occurs instead of {{m}}.
  2. {{uety}} (undefined etymology) replaces all remaining occurrences of {{etyl}}, like this: {{etyl|FOO|BAR}} -> {{uety|BAR|FOO}}. The idea is to use the standard language ordering (the entry's language first, then the source language, as in {{der}}, {{bor}} and {{inh}}), making it easier to later replace e.g. {{uety|es|ML.}} with something like {{bor|es|ML.|-}} or {{bor|es|ML.|term}} (as the case may be).
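The two rules above could be approximated by something like the following Python sketch. It is purely illustrative and not the actual bot: it assumes plain regular-expression matching over raw wikitext, handles only {{etyl}} calls with exactly two unnamed parameters and no nested templates, assumes {{uder}} takes the same argument order as {{der}} (entry language, source language, term...), and uses a made-up two-entry ETYM_PARENT table in place of Wiktionary's real etymology-language data. The names convert_etyl and ETYM_PARENT are hypothetical.

```python
import re

# HYPOTHETICAL stand-in for Wiktionary's etymology-language data: maps an
# etymology-language code to the full language it belongs to.
ETYM_PARENT = {"ML.": "la", "LL.": "la"}

# {{etyl|SRC|DEST}}, optionally followed by whitespace and a {{m|...}} or
# {{l|...}} link. Only simple calls without nested templates are matched.
ETYL_RE = re.compile(
    r"\{\{etyl\|(?P<src>[^|{}]+)\|(?P<dest>[^|{}]+)\}\}"
    r"(?P<link>\s*\{\{[ml]\|(?P<lang>[^|{}]+)(?P<rest>(?:\|[^{}]*)?)\}\})?"
)


def _rewrite(match: re.Match) -> str:
    src, dest = match["src"], match["dest"]
    lang = match["lang"]
    if lang is not None and (lang == src or ETYM_PARENT.get(src) == lang):
        # Rule 1: the source term is linked right after {{etyl}}, so merge both
        # into {{uder}}, assumed here to take {{der}}'s arguments: DEST|SRC|term...
        return "{{uder|" + dest + "|" + src + match["rest"] + "}}"
    # Rule 2: anything else becomes {{uety|DEST|SRC}}; a following link whose
    # language does not match the source is kept unchanged.
    return "{{uety|" + dest + "|" + src + "}}" + (match["link"] or "")


def convert_etyl(wikitext: str) -> str:
    """Apply both proposed rewrite rules to every {{etyl}} call in a page."""
    return ETYL_RE.sub(_rewrite, wikitext)


print(convert_etyl("From {{etyl|ML.|es}} {{m|la|foo}}."))  # From {{uder|es|ML.|foo}}.
print(convert_etyl("From {{etyl|ML.|es}}."))  # From {{uety|es|ML.}}.
```

A real run would also have to cope with named parameters, nested templates and HTML comments, which this sketch ignores.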

Potentially, an edit filter could flag instances of adding {{uder}} and {{uety}} by hand and maybe even prevent them, and the templates could throw errors for languages that have already been completely cleaned up, similarly to what {{etyl}} currently does. (On the other hand, it might be useful to allow people to add them by hand in cases where it's not clear which etymology template is correct.)

Thoughts? Benwing2 (talk) 06:28, 22 February 2021 (UTC)

Support 🔥शब्दशोधक🔥 08:32, 27 February 2021 (UTC)
How is {{der}} different from the templates you propose? --Vahag (talk) 07:41, 22 February 2021 (UTC)
Trackability. {{uder}} means "no one has checked what kind of derivation this is yet", while {{der}} ideally ought to mean "someone has determined that {{der}} is correct here rather than {{inh}} or {{bor}}" (but of course up to now it doesn't necessarily actually mean that). I support this suggestion. —Mahāgaja · talk 07:53, 22 February 2021 (UTC)
All this may be rather futile, as you can't stop {{der}} being used in newly added etymologies or newly created entries. The only way you can prevent that is by deleting {{der}} itself. DonnanZ (talk) 09:33, 22 February 2021 (UTC)
No, we can't completely stop {{der}} from being misused, but we can reduce the frequency of such misuse. —Mahāgaja · talk 09:53, 22 February 2021 (UTC)
I don't necessarily agree that {{der}} is being misused, but I do suspect {{etyl}} is still being added by a tiny minority of users in languages where they are still able to, so the sooner the etyl cleanup is completely finished, the better. DonnanZ (talk) 11:21, 22 February 2021 (UTC)
What is definitely still happening is that {{etyl|xyz}} is being used to generate the name of a language in etymology sections, in cases where {{cog}} or {{noncog}} – or simply writing the language's name – should be used instead. I know this is still happening because every couple of weeks I find new pages in Category:Language code missing/etyl. —Mahāgaja · talk 12:16, 22 February 2021 (UTC)
Great idea. Fay Freak (talk) 12:52, 22 February 2021 (UTC)
I totally support this, but I don't think adding {{uder}} should be flagged. I add a lot of etymologies, and I'm much more interested in finding the etymon and adding descendants to it than in determining the method of derivation. I'm always a little uncomfortable just adding {{der}} when it needs to be fixed later. Ultimateria (talk) 19:05, 22 February 2021 (UTC)
I've been bothered by users replacing {{etyl}} with {{der}} for a while now, but I haven't been able to convince them to stop. Now we have a new mess that's much harder to clean up. :/ So I'm in favour of doing this. But at least for {{uder}} we can just make it a redirect to {{der}}. The transclusions will act as tracking; I don't think anything more would be needed. —Rua (mew) 20:30, 22 February 2021 (UTC)
@Rua: Now that you have spoken, maybe you'd like to deal with the four instances in User:Rua/ja. DonnanZ (talk) 22:33, 22 February 2021 (UTC)
Great idea. I too find these indiscriminate changes less than helpful. —Μετάknowledgediscuss/deeds 20:36, 22 February 2021 (UTC)
@Benwing2 Long time. What do you think of this now? —Svārtava [tcur] 15:54, 18 January 2022 (UTC)

I saw someone mass-removing this template from all 一段 verb pages. Should the classical conjugation of Classical Japanese 教ふ be given on the page 教える? -- Huhu9001 (talk) 22:04, 22 February 2021 (UTC)

Many other online dictionaries, such as weblio.jp, give the classical conjugations for 一段 verbs:
おし・える〔をしへる〕【教える】
[動ア下一][文]をし・ふ[ハ下二]

https://www.weblio.jp/content/%E3%81%8A%E3%81%97%E3%81%88%E3%82%8B

  • That may have been me.
Regarding what other dictionaries do, the Daijisen example from Weblio shows a common hyper-abbreviated notation that includes several pieces of information.
おし・える〔をしへる〕【教える】
modern kana 〔historical kana〕 【kanji + okurigana】
[動ア下一][文]をし・ふ[ハ下二]
[verb vowel-stem lower monograde] [literary-form] historical kana  ["H"-stem lower bigrade]
The two lines parallel each other: modern info first, then historical. The historical part here also gives us some etymological information, by showing us the earlier Classical / Middle Japanese (sometimes also Old Japanese) lemma from which the modern form derives.
We include derivational information in our ===Etymology=== sections. We provide fuller information about the older forms, such as full inflection tables, in the relevant lemma entries for those older forms.
I am fully supportive of including this inflection information in the lemma entries for verb forms that are actually included in that paradigm.
My concern is that the modern lemma verb forms ending in -iru and -eru for so-called 一段活用 (ichidan katsuyō, monograde conjugation) verbs are entirely absent from the Classical / Middle Japanese 二段活用 (nidan katsuyō, bigrade conjugation) verb paradigm.
For those unfamiliar, here are the basic conjugation stems for modern verbs ending in -iru and their Classical Japanese counterparts, and modern verbs ending in -eru and their Classical counterparts:
Conjugation | Modern -iru verbs | Classical counterpart | Modern -eru verbs | Classical counterpart
未然形 (mizenkei, irrealis form or negative stem) | -i | -i | -e | -e
連用形 (ren'yōkei, continuative or positive stem) | -i | -i | -e | -e
終止形 (shūshikei, terminal or predicative form; also known informally as the "dictionary form": this is the lemma) | -iru | -u | -eru | -u
連体形 (rentaikei, attributive form) | -iru | -uru | -eru | -uru
已然形 (izenkei, realis or hypothetical form) | -ire | -ure | -ere | -ure
命令形 (meireikei, imperative or command form) | -iro / -iyo | -iyo | -ero / -eyo | -eyo
The key point I'd like to emphasize here is that the Classical counterparts to our modern -iru verbs have no forms ending in -iru. Likewise for our modern -eru verbs. It strikes me as problematic for a modern verb entry to include a conjugation table for Classical Japanese, where that table does not -- and cannot -- include the headword of the entry.
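That key point can be checked mechanically from the table. The Python sketch below is only an illustration under stated assumptions: it encodes the endings from the table, attaches them to a bare romanized stem using the assumed pairs 起きる okiru / 起く oku and 受ける ukeru / 受く uku (whose modern and Classical stems coincide), and tests whether the modern lemma appears anywhere in the Classical paradigm. The helpers forms and lemma are hypothetical, romanization is simplified, and the naive stem-plus-ending model does not hold for verbs whose stem changes, such as 教える / 教ふ.

```python
from __future__ import annotations

# Endings from the table, in the order mizenkei, ren'yōkei, shūshikei,
# rentaikei, izenkei, meireikei. Alternative imperatives are given as tuples.
ENDINGS = {
    ("i", "modern"): ("i", "i", "iru", "iru", "ire", ("iro", "iyo")),
    ("i", "classical"): ("i", "i", "u", "uru", "ure", ("iyo",)),
    ("e", "modern"): ("e", "e", "eru", "eru", "ere", ("ero", "eyo")),
    ("e", "classical"): ("e", "e", "u", "uru", "ure", ("eyo",)),
}


def forms(stem: str, vowel: str, period: str) -> set[str]:
    """Every form in one paradigm, modelled naively as stem + ending."""
    result = set()
    for ending in ENDINGS[(vowel, period)]:
        for variant in (ending if isinstance(ending, tuple) else (ending,)):
            result.add(stem + variant)
    return result


def lemma(stem: str, vowel: str, period: str) -> str:
    """The terminal form (shūshikei), which serves as the lemma."""
    return stem + ENDINGS[(vowel, period)][2]


# Assumed romanized pairs: 起きる okiru / 起く oku and 受ける ukeru / 受く uku.
for stem, vowel in [("ok", "i"), ("uk", "e")]:
    modern_lemma = lemma(stem, vowel, "modern")
    print(modern_lemma, modern_lemma in forms(stem, vowel, "classical"))
# okiru False
# ukeru False
```

In other words, a Classical table placed on the modern entry would list oku, okuru, okure and so on, but never the headword okiru itself.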
By way of loose analogy, our modern English do entry does not include Middle English conjugation forms like dide or dost -- rather, the Middle English conjugation tables are located at the lemma form for the Middle English verb, at don. Likewise, the Classical Japanese conjugation tables should presumably be located at the lemma forms for the Classical verbs -- the terminal or "dictionary" forms ending in just -u.
Where the lemma forms for Classical and modern align, I am perfectly happy for the shared lemma forms to include both Classical and modern inflection tables -- most notably, for the modern so-called 五段活用 (godan katsuyō, quintigrade conjugation) verbs and Classical 四段活用 (yodan katsuyō, quadrigrade conjugation) verbs. But where the Classical inflection paradigm doesn't include the modern lemma form, I do not think the modern lemma entry should include the Classical inflection table, nor should the Classical lemma entry include the modern inflection table. ‑‑ Eiríkr Útlendi │Tala við mig 01:29, 23 February 2021 (UTC)
Abstain ~ weak Support providing classical conjugation on the entries for modern verbs, as the entries for classical verbs do not presently exist. —Suzukaze-c (talk) 01:43, 23 February 2021 (UTC)

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Suzukaze-c, Poketalker, Cnilep, Marlin Setia1, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233): -- Huhu9001 (talk) 10:15, 23 February 2021 (UTC)

It's unnatural to put a classical conjugation on the lemma of the modern form. The classical forms of verbs should have separate lemmas and should be linked from the ===Etymology=== section. --荒巻モロゾフ (talk) 11:02, 23 February 2021 (UTC)
I also support putting the classical form in the etymology section. Onionbar (talk) 21:01, 23 February 2021 (UTC)

Stock ticker symbols

Should we keep them or toss them?

We currently have Category:en:Stock symbols for companies, which isn't fitted into the category tree. I think these either fail CFI as written (they are not terms that you "would run across ... and want to know what it means"; these symbols are tightly bound to a financial context, where it is obvious that you should consult a list of stock symbols rather than a generalist dictionary) or should be expunged (by vote?) so our time isn't wasted with entries for every stock symbol that ever lived.

See Wiktionary:Beer_parlour/2009/October#Stock_symbols and Talk:A#RFD_discussion:_March–June_2014 for a little bit of past discussion on this. This, that and the other (talk) 09:08, 23 February 2021 (UTC)