User talk:Dan Polansky/2013
Usefulness of phrasebook
I find some of the phrasebook entries rather useful, as their translation is not grammatically straightforward.
- I'm hungry
- Czech: mám hlad, as if "I have hunger"
- German: ich habe Hunger, as if "I have hunger"
- I'm thirsty
- Czech: mám žízeň, as if "I have thirst"
- German: ich habe Durst, as if "I have thirst"
- do you speak English
- Czech: mluvíte anglicky?, as if "do you speak in English" or even *"do you speak Englishly"?
- Polish: czy mówisz po angielsku?, which I do not know how to render in an English analogue, maybe "do you speak in an English manner or way"
- I'm cold
- Czech: je mi zima, rather than *"jsem studený"; in English, perhaps "it is cold to me"?
- German: es ist mir kalt, rather than *"ich bin kalt"; in English, perhaps "it is cold to me"?
- how are you?
- Czech: jak se máš, as if "how are you having yourself?"
- German: wie geht es dir?, as if "how does it go with you?" or the like
- have a seat
- Czech: posaďte se, as if sit down
- how do you say...in English?
- Czech: jak se řekne...anglicky?, as if "how does ... get said Englishly"?
- German: wie sagt man...auf Englisch, as if "how does one say ... on English"?
- how much does it cost?
- German: was kostet das?, as if "what does it cost", which happens to be idiomatic English
Category:English phrasebook has 357 entries. --Dan Polansky (talk) 18:06, 5 January 2013 (UTC)
- Exactly. If only we could get rid of all the rubbish, the phrasebook would be a useful part of the project. SemperBlotto (talk) 18:09, 5 January 2013 (UTC)
Some more:
- I have a cold
- Czech: jsem nachlazený, as if "I am got cold" or the like
- German: ich bin erkältet, just like Czech
- Russian: ja prostudílsja, as if "I have colded myself through" or the like
- I'm twenty years old
- Czech: je mi dvacet let, as if "it is twenty years to me"
- Polish: mam dwadzieścia lat, as if "I have twenty years"
--Dan Polansky (talk) 11:52, 6 January 2013 (UTC)
some random words
Just a few random words from the beginning of the Czech Wikipedia article on Hydrogen. (Not a full, sorted list of red links, as my software doesn't understand the funny accents over letters.) Don't feel under any obligation to add them! SemperBlotto (talk) 16:11, 6 January 2013 (UTC)
Vodík chemická latinsky nejjednodušší tvořící převážnou hmoty vesmíru Má široké praktické redukční činidlo chemické syntéze metalurgii meteorologických pouťových balonů vzducholodí Obsah Základní fyzikálně-chemické vlastnosti Historický Výskyt přírodě Tvorba průmyslová Využití Sloučeniny Anorganické sloučeniny Hydridy Další Organické sloučeniny Izotopy vodíku Odkazy Související články Literatura Základní fyzikálně-chemické vlastnosti Molekula chuti zápachu hoří namodralým plamenem nepodporuje - some capitalisation will be wrong
- Thanks. We have the lemma or base forms of many of these, although not all of them: vodík, chemický, latinský (adjective rather than the redlinked adverb), jednoduchý, tvořit, převážný, hmota, vesmír, mít, široký, praktický, redukční, činidlo, chemický, syntéza, metalurgie, etc.
- For your method to work for me, I would need to enter inflected forms of Czech words into Wiktionary, which I don't feel like doing. I actually have a list of Czech words to add, working offline on their verification, from time to time. --Dan Polansky (talk) 19:37, 7 January 2013 (UTC)
ttbc
Hi,
When adding ttbc's, please check for qualifiers; they sometimes explain the sense, as in trio#Translations. --Anatoli (обсудить/вклад) 00:47, 9 January 2013 (UTC)
Harry Potter
I don't really understand the difference between Category:Harry Potter and Category:Harry Potter derivations. Where do metloboj or bezjak or smrtožder belong? Zabadu (talk)
Also, I am very worried about anti-Serb bias here. For example there is no Serbia category but there is Croatia category. Why is that?
Can you please help me add flag of Serbia to Category:Serbia?
- No comment. --Dan Polansky (talk) 13:37, 12 January 2013 (UTC)
chargemaster
Please see talk:chargemaster. Please can we discuss this more before removing this material, as it is integral to the definition. Thank you, -- Cirt (talk) 22:06, 9 March 2013 (UTC)
- Thanks very much for your polite response on the talk page, I really appreciate it! :) I've responded there, -- Cirt (talk) 04:47, 10 March 2013 (UTC)
Request about "chargemaster"
Request: Please, Dan Polansky (talk • contribs), I ask of you to read this article:
- Lua error in Module:quote at line 2956: Parameter 1 is required.
I think that will give you some clarity about the term chargemaster. Thank you for your time, -- Cirt (talk) 18:18, 10 March 2013 (UTC)
- I responded at WT:RFD. --Dan Polansky (talk) 19:30, 10 March 2013 (UTC)
DONE: Trimmed the definition to that suggested by Dan Polansky (talk • contribs), above, please see DIFF. Hopefully this is now satisfactory to Dan Polansky (talk • contribs). Thank you, -- Cirt (talk) 23:40, 10 March 2013 (UTC)
Please read this article
- Lua error in Module:quote at line 2956: Parameter 1 is required.
I strongly recommend you read this article, as a good faith gesture, it would help inform our discussion. Can you please read it? It is most informative. Thank you, -- Cirt (talk) 23:43, 10 March 2013 (UTC)
- Let me note that, in the discussion about the definition of "chargemaster", I am acting in the capacity of a dictionary maker, trying to figure out what is and what is not a part of the definition of "chargemaster". I am not defending whatever despicable practices exist in relation to chargemasters. --Dan Polansky (talk) 19:18, 11 March 2013 (UTC)
- Sure, sure, I agree with you and I don't doubt your good faith intentions. :) I'm just respectfully asking you to read this article, please? -- Cirt (talk) 20:29, 11 March 2013 (UTC)
Re: KYPark
Well said. I owe you one. —Μετάknowledgediscuss/deeds 02:17, 14 March 2013 (UTC)
- I second. --Anatoli (обсудить/вклад) 02:43, 14 March 2013 (UTC)
drug
A note to myself and whoever cares to read: I am dissatisfied with the "drug" entry, currently having four senses. Recent related events:
- Wiktionary:RFC#drug, August 2012, originally at RFV
- A conversation at User_talk:Msh210#drug, 17 March 2013
As a consequence, I have done this:
- Sent the 1st sense to WT:RFD, with the intention of making it more narrow by removing part of the definition.
- Sent the 2nd sense to WT:RFV with the intention of getting it removed.
--Dan Polansky (talk) 15:32, 27 April 2013 (UTC)
Key definition edits to "drug" entry:
- diff, March 2003: 1st def entered of "Substance used to treat an illness, relieve a symptom or modify a chemical process in the body for a specific purpose."
- diff, May 2003: A 2nd def entered: "Addictive substance used to alter the level of consciousness"
- diff, August 2004: 2nd def tweak: "A substance, often addictive, used to alter the level of consciousness"
- diff, July 2005: 2nd def tweak: "A substance, often addictive, which affects the central nervous system"
- diff, March 2006: 3rd def added: "A chemical or substance, not necessarily for medical purposes, that alters the way the mind or body works", with the summary "Added definition(noun) that encomasses non-medicinal drugs)", by an anon
- diff, December 2006: 4th def added: "An illegal drug", by an anon
- diff, March 2007: 4th def tweaked: "A drug, especially illegal, taken for recreational use"
- diff, July 2008: 4th def tweaked: "A substance, especially one which is illegal, ingested for recreational use."
- diff, May 2013: 4th def tweaked: "A psychoactive substance, especially one which is illegal and addictive, ingested for recreational use, such as cocaine"
--Dan Polansky (talk) 12:20, 4 May 2013 (UTC)
Two senses removed by me in diff, failing WT:RFV.
I have reverted this revision by an anon, one of interest:
- (pharmacology) A substance intended for use in the diagnosis, cure, mitigation, treatment, or prevention of disease in man or other animals; a medicine.
- (pharmacology) A substance (other than food) intended to affect the structure or any function of the body of man or other animals.
- A narcotic substance.
- (figuratively) Anything that has the effect of a narcotic substance.
The 1st definition is subject to objections raised in a recent RFD: contraceptives do not meet the definition. This may be dealt with by placing contraceptives under the 2nd definition, but it is not obvious why the 2nd definition should be separate from the 1st one.
The "narcotic substance" definition is unhelpful, IMHO; it relies on narcotic entry featuring three definitions, failing to select the intended definition from "narcotic". Furthermore, the "narcotic substance" definition may be wrong, depending on what the definitions at "narcotic" are intended to cover; the 1st one seems to entail "induces sleep" as a condition necessary, so it does not cover all illicit drugs; the 2nd one entails "numbing", so ditto; the 3rd one "certain illegal drugs" is massively unspecific, failing to tell us which illegal drugs it selects, but if it selects some and not all, then it cannot be covering all illicit drugs.
When I checked “drug” in OneLook Dictionary Search, no dictionary equated illicit drugs with narcotics. Collins, for instance, has "chemical substance, esp a narcotic, ...", which makes it clear "drug" and "narcotic" are not synonymous.
The 4th figurative sense is one that we are possibly missing, and was mentioned by msh210 on his talk page. However, it should be added only together with citations supporting it, IMHO. --Dan Polansky (talk) 09:59, 30 June 2013 (UTC)
Hi there. In the UK a "tax office" is a place where you can go (or phone) to discuss your tax affairs. The organization used to be called the "inland revenue" and is now called "HM Revenue and Customs". See [1] as an example of use. SemperBlotto (talk) 10:10, 4 May 2013 (UTC)
- Oops! So it is not like "post office", which can refer both to a particular place and to the organization itself. The tax-collecting organizations have various specific names across the world, as per W:Revenue_service: "HM Revenue and Customs" (U.K.), "Internal Revenue Service (IRS)" (U.S.), "Australian Taxation Office" and "Canada Revenue Agency". Would "revenue service" be the generic term for tax-collecting agency or organization I am looking for? What about tax agency, or tax authority? --Dan Polansky (talk) 10:19, 4 May 2013 (UTC)
- Yes, I think that "revenue/tax service/agency/authority" combinations are used in the UK and elsewhere as a generic term for the organization. In the UK, there are several other "offices" that are organizations rather than places (normally capitalised) - Office for National Statistics is one that springs to mind. SemperBlotto (talk) 10:26, 4 May 2013 (UTC)
- I have fixed the entry. Feel free to edit it further. --Dan Polansky (talk) 10:33, 4 May 2013 (UTC)
Wiktionary popularity among online dictionaries per Alexa rank
I can't even believe the following statistics:
Rank of dictionary web sites per number of visitors per Alexa.com, ordered by global rank:
Web | Alexa Global Rank | Alexa U.S. Rank | Alexa U.K. Rank | Note |
---|---|---|---|---|
wikipedia.org | 6 | 8 | 10 | Listed despite not being a dictionary, as a super successful Mediawiki project |
reference.com | 207 | 77 | 144 | dictionary.reference.com - 54% visitors of the domain go here |
thefreedictionary.com | 265 | 223 | 205 | Multi-lingual; has a definition dictionary for several languages; by Farlex |
wordreference.com | 306 | 1,024 | 325 | |
wiktionary.org | 641 | 1,313 | 867 | This is for all Wiktionaries, not just the English one. en.wiktionary.org - 40% of visitors of the domain go to this subdomain |
urbandictionary.com | 836 | 378 | 429 | Note the U.S. and U.K. ranks |
merriam-webster.com | 867 | 315 | 1,817 | Note the U.S. rank |
yourdictionary.com | 3,440 | 1,775 | 3,294 | |
cambridge.org | 3,509 | 5,781 | 982 | |
oxforddictionaries.com | 5,635 | 7,898 | 1,540 | |
infoplease.com | 7,936 | 2,742 | 7,682 | |
uchicago.edu | 7,983 | 3,084 | 7,711 | Hosts Webster 1913 and Roget 1911, but naturally also many other things; machaut.uchicago.edu - 1.2% of domain visitors go here; this is the subdomain that hosts the dictionaries. |
macmillandictionary.com | 8,232 | 6,731 | 4,798 | |
rhymezone.com | 13,207 | 3,548 | 6,301 | |
collinsdictionary.com | 19,187 | 19,487 | 5,175 | |
wordnik.com | 19,976 | 11,546 | 13,955 | |
onelook.com | 20,022 | 8,228 | 17,626 | |
vocabulary.com | 20,588 | 11,113 | 15,779 | |
dicts.info | 124,942 | 144,782 | 81,619 | |
wordsmyth.net | 147,063 | 64,960 | N/A | |
allwords.com | 166,231 | 77,270 | 172,247 | |
freedictionary.org | 576,552 | N/A | N/A | |
freedict.org | 8,006,370 | N/A | N/A | It may be that most downloaders download the complete dictionary files; I don't know. |
It follows that there are five dictionary web sites significantly competing with Wiktionary in terms of number of visitors: reference.com, wordreference.com, thefreedictionary.com, urbandictionary.com, and merriam-webster.com. All the other dictionaries perform worse than Wiktionary even in access from the U.S. and U.K., no matter how good the definitions they offer. --Dan Polansky (talk) 10:58, 8 May 2013 (UTC) Updated. --Dan Polansky (talk) 15:16, 8 May 2013 (UTC)
Alexa rank for dictionaries selected with the focus on the Czech Republic aka Czechia:
Web Site | Alexa Rank for CR | Note |
---|---|---|
seznam.cz | 1 | slovnik.seznam.cz - 6% go to this subdomain, so Seznam dictionary is really popular; features data from Lingea and Macmillan Dictionary |
centrum.cz | 10 | slovniky.centrum.cz - 0.56% go to this subdomain |
abz.cz | 240 | slovnik-cizich-slov.abz.cz - 68% go to this subdomain |
slovnik.cz | 423 | Features LangSoft vocabulary + GNU/FDL dictionary |
online-slovnik.cz | 783 | En<-->cs + synonym dictionary; unclear owner and licensing terms |
wiktionary.org | 827 | cs.wiktionary.org - 0.4% go to this subdomain, so chances are the visitors from Czechia actually go somewhere else, like to en.wikt, fr.wikt or de.wikt. |
zcu.cz | 838 | slovnik.zcu.cz - the subdomain is not listed; why? |
slovnik-synonym.cz | 1,072 | Seems to belong to abz.cz |
lingea.cz | 8,016 | slovniky.lingea.cz - 36% go here |
--Dan Polansky (talk) 13:14, 8 May 2013 (UTC)
See also http://www.alexa.com/topsites/category/Top/Reference/Dictionaries, a list of top Alexa sites in the Dictionaries category. There, Wiktionary is 3rd, probably based on the global Alexa rank. --Dan Polansky (talk) 15:16, 8 May 2013 (UTC)
The popularity of Wiktionary can be corroborated from other sources, rather than relying on Alexa only.
According to Google Ad Planner at http://www.google.com/adplanner/static/top1000, Wiktionary had rank of 574 by the number of unique visitors in July 2011; it had 8,200,000 unique visitors and 26,000,000 page views. As for some other dictionaries, thefreedictionary.com had rank 167 and 21,000,000 unique visitors, while merriam-webster.com had rank 700 and 7,400,000 unique visitors. Again, these are 2011 data. To find other dictionaries there, search for "dictionaries", as there is "Dictionaries & Encyclopedias" category shown in their table.
The website quantcast.com is another source. By going to http://www.quantcast.com/wiktionary.org and using the "Compare Site" button, you can compare Wiktionary's popularity to that of other dictionaries, including "merriam-webster.com". The comparison is shown as a time-dependent graph. For March 31 through April 29 2013, the graph shows around 1.8 million "people" in the United States per month for Wiktionary, around 14 million "people" for merriam-webster.com, and around 0.2 million "people" for cambridge.org. Presumably, "people" refers to unique visitors. --Dan Polansky (talk) 18:26, 17 May 2013 (UTC)
Page views of Wiktionary and some other stats per Wikimedia statistics[2]:
Language | Page Views in March 2013 | Very Active Editors in March 2013 | Speakers |
---|---|---|---|
English | 95,761,090 | 80 | 1,500,000,000 |
French | 36,264,524 | 32 | 200,000,000 |
Russian | 21,857,366 | 12 | 278,000,000 |
German | 14,115,657 | 17 | 185,000,000 |
Portuguese | 8,935,120 | 3 | 290,000,000 |
Polish | 8,399,996 | 11 | 43,000,000 |
Greek | 5,636,774 | 5 | 15,000,000 |
Chinese | 5,506,369 | 1 | 1,300,000,000 |
Spanish | 5,385,766 | 7 | 500,000,000 |
Italian | 5,181,025 | 4 | 70,000,000 |
Dutch | 4,243,709 | 6 | 27,000,000 |
Japanese | 3,502,241 | 3 | 132,000,000 |
Swedish | 3,118,274 | 3 | 10,000,000 |
Korean | 3,022,935 | 1 | 78,000,000 |
Vietnamese | 2,663,255 | 0 | 80,000,000 |
Turkish | 2,480,022 | 2 | 70,000,000 |
Finnish | 2,127,353 | 7 | 6,000,000 |
Malagasy | 1,606,076 | 0 | 20,000,000 |
Lithuanian | 1,503,229 | 0 | 3,500,000 |
Czech | 1,491,805 | 8 | 12,000,000 |
--Dan Polansky (talk) 17:39, 19 May 2013 (UTC)
Maximum records of Alexa ranks of Wiktionary[3], haphazardly collected:
Date | Global | U.S. | U.K. | France | Poland | Germany | Russia |
---|---|---|---|---|---|---|---|
26 Oct 2013 | 570 | 954 | 743 | 179 | 275 | 307 | 388 |
Czech rhymes
I always thought Czech had stress on the first syllable in most words. Am I mistaken, or do words rhyme differently in Czech? —CodeCat 19:03, 14 May 2013 (UTC)
- I don't think consciously of stress in Czech; I just speak it. Whatever the case, is there any impact on the pages that I am creating? Is there anything I have entered that you think in fact does not rhyme? --Dan Polansky (talk) 19:05, 14 May 2013 (UTC)
- If stress is word-initial, I'd expect hrana and obrana to not rhyme. You are probably better at judging what rhymes and what doesn't, but if those words do rhyme, I am curious why that is. —CodeCat 19:11, 14 May 2013 (UTC)
- I can't think of a rhyme with "hrana" and "obrana", but consider this: "Mariana byla panna¶ než vrazila do klokana". There, "panna" has two syllables while "klokana" has three syllables. It can be that the stress shifts to the preposition do before klokana; I do not really know. What I do know is that the words I am entering generally can be paired to create rhymes. --Dan Polansky (talk) 19:18, 14 May 2013 (UTC)
- I think other Slavic languages have similar stress shifts with prepositions. It has something to do with the original Proto-Slavic pitch-based accent I think. —CodeCat 19:24, 14 May 2013 (UTC)
Deprecated Czech templates
There are some Czech entries listed in Category:Pages using deprecated templates. Could you have a look and fix them if possible? —CodeCat 19:23, 19 May 2013 (UTC)
- I have removed the deprecation from {{cs-conj-it}}. After the server catches up, Category:Pages using deprecated templates should get emptied. The template was marked as deprecated in diff, on 3 March 2009. The template seems to produce correct results. I am not really much into Czech inflection templates, so I am unenthusiastic about implementing replacement proposals invented but not executed on by other editors. --Dan Polansky (talk) 20:11, 19 May 2013 (UTC)
Phonosemantic interpretations
Thank you for calling my attention to the new Beer Parlour thread, Dan. I await the community's decision, and will of course be adding no entries for the time being. Lawrence J. Howell (talk) 22:48, 9 June 2013 (UTC)
Your View?
Hello, Dan. My watchlist tells me that user 75.71.64.241 reverted data I uploaded for the character 身, writing "Very little evidence to support those claims." As I'm abiding by the community's request to refrain from doing anything until the matter under debate has been settled, I believe it's only fair that the hands-off policy cut both ways. What's your take? Lawrence J. Howell (talk) 08:23, 13 June 2013 (UTC)
- I don't really know. I don't think Wiktionary can keep "Phonosemantic interpretations" backed by a single source. The anon had better wait for the discussion to proceed, though. However, many view such waiting as too bureaucratic and proceed via a fast track. Under the fast track, etymological content that comes from a single source, has no obvious other sources, and for which no sources are in the process of being added can be removed.
- Links: Wiktionary:Beer_parlour/2013/June#Phonosemantic_interpretation, 75.71.64.241 (talk). --Dan Polansky (talk) 15:42, 13 June 2013 (UTC)
What is a misspelling
What is a misspelling may be a hard question but let us have a look, in a hasty sketch.
A misspelling can be understood as a transmission error, in terms of sending messages over a noisy communication channel. A message--a sequence of letters--sent over a noisy communication channel is subject to random changes to the letters. The intended received message is the one that was sent; the criterion of correctness is identity: the received message has correct spellings if they are identical to the spellings used in the sent message. As a consequence, misspellings resulting from the noise of a low-noise channel tend to be of much lower frequency in the corpus of received messages than "correct" spellings.
What is the noise in the case of man-made misspellings? For one thing, each person makes misspellings in individual written utterances; these tend to have lower frequency in all writings of the person than the "correct" spellings. For another thing, a person can store an uncommon spelling as the standard one in the mind and consistently reproduce the spelling that has low frequency in the corpus of the language community but high frequency in the writing of that single person.
There may be an authority declaring what is and what is not a misspelling, such as a dictionary published by a successful commercial publisher or a dictionary published by a regulatory government-funded organization established in one of the countries in which the language prevails. The decision made by the dictionary may be arbitrary, disregarding current frequency. The point of making an arbitrary decision about "correct" spelling and sticking to it is enabling uniformity of spelling in the corpus, coupled with compactness of spelling patterns if the spelling decision is made according to implied spelling patterns and regularities rather than by individual frequencies.
As a practical frequency criterion, misspellings tend to have vanishingly lower frequency than their "correct" alternatives, whereas alternative spellings have much more favorable frequency ratio to the "correct" or mainstream alternatives. In English, it is worthwhile to have a look at frequency ratios of U.S. vs. British spellings, such as "color" vs "colour". From what I can see in Google Ngram Viewer, their frequency ratio tends to be 2 to 4, meaning the U.S. spelling is twice to four times more common in the whole corpus than the British spelling. By contrast, looking at "conceive" vs. "concieve", the frequency ratio is 1000.
As per the frequency criterion, a misspelling can never have a higher frequency than a "correct" spelling. Nonetheless, there are probably etymology aficionados claiming about one mainstream spelling or another that it is "incorrect". If these are allowed to run authoritative dictionaries, their preferences can end up being codified as "correct". --Dan Polansky (talk)
Policies and would-be policies:
Discussions:
Categories:
- Category:English misspellings - currently 1,477 entries; only for common misspellings
--Dan Polansky (talk) 10:05, 5 July 2013 (UTC)
- Yes, I'll go along with most of that. I had always assumed that spelling mistakes were honest errors (-ie- instead of -ei- etc.), the results of typing too fast (that's where most of mine come from) and simple ignorance (I can never remember how to spell manoeuvre). But when is a spelling mistake "common" (as the ones we include)? Maybe when the "frequency ratio" is greater than hundreds but less than thousands? SemperBlotto (talk) 10:25, 5 July 2013 (UTC)
Re: 'Maybe when the "frequency ratio" is greater than hundreds but less than thousands?' Sounds okay to me as a criterion for "common misspelling"; what has lower frequency ratio is "alternative spelling". However, the lower bound could be even lower, like 20 or 50. In RFV, I have posted a table that gives an impression:
Short Term | Long Term | Ngram | Frequency Ratio in Year 2000 |
---|---|---|---|
referencable | referenceable | Ngram | 8 |
experiencable | experienceable | Ngram | 10 |
influencable | influenceable | Ngram | 16.5 |
sequencable | sequenceable | Ngram | 6 |
servicable | serviceable | Ngram | 156 |
enforcable | enforceable | Ngram | 860 |
replacable | replaceable | Ngram | 190 |
colour | color | Ngram | 3.4 |
behaviour | behavior | Ngram | 2.8 |
rigour | rigor | Ngram | 2 |
concievable | conceivable | Ngram | 3867 |
idiosyncracy | idiosyncrasy | Ngram | 6 |
supercede | supersede | Ngram | 15 |
--Dan Polansky (talk) 10:42, 5 July 2013 (UTC)
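The criterion floated above can be sketched in a few lines of Python. This is only an illustration: the lower bound of 20 comes from the "20 or 50" remark, while the upper bound of 5000 and the label names are assumptions made for the example, not anything agreed in the discussion. The ratios are copied from the table.

```python
# Hypothetical thresholds for illustration only; the discussion fixes neither of them.
ALTERNATIVE_MAX = 20   # below this ratio, treat the rarer form as an alternative spelling
COMMON_MAX = 5000      # above this ratio, treat the form as too rare to be "common"


def classify(ratio, alternative_max=ALTERNATIVE_MAX, common_max=COMMON_MAX):
    """Label a rarer spelling by its frequency ratio (frequent form / rarer form)."""
    if ratio < alternative_max:
        return "alternative spelling"
    if ratio <= common_max:
        return "common misspelling"
    return "rare misspelling"


# Year-2000 Ngram ratios taken from the table above.
sample = {
    "colour": 3.4,
    "supercede": 15,
    "servicable": 156,
    "enforcable": 860,
    "concievable": 3867,
}

for word, ratio in sample.items():
    print(f"{word}: {classify(ratio)}")
```

With these particular cutoffs "supercede" lands among the alternative spellings, which shows how much the outcome depends on where the lower bound is put.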
I was not paying attention. You asked when is a spelling mistake common enough to be includable. For this, not only frequency ratio can be considered but also absolute frequency. Let me think some more and have a look. --Dan Polansky (talk) 10:45, 5 July 2013 (UTC)
Currently, Wiktionary is not overflooded with misspellings, having 1477 English misspellings. To decide what misspellings to exclude based on frequency ratio, we would need to choose a fairly arbitrary threshold. I would choose such threshold that prevents overflooding of Wiktionary with misspellings while allowing a fair amount of them. As I cannot determine the number of acceptable misspellings per various frequency ratio thresholds, I have not much of an opinion on that threshold. From the table that follows, I would guess the threshold should be higher than 2000. With the use of the data that Google has published for download at Google Ngram Viewer, the number of misspellings per threshold could be determined, but that would require fairly heavy number crunching, it seems.
One could object that frequency ratio should not be used alone. I don't have much of an opinion on that other than that using it alone seems okay, not too bad.
Term 1 | Term 2 | Ngram | Ratio in Year 2000 |
---|---|---|---|
beleive | believe | Ngram | 3349 |
beleiver | believer | Ngram | 22913 |
aquitted | acquitted | Ngram | 433 |
aquire | acquire | Ngram | 1075 |
arithmatically | arithmetically | Ngram | 441 |
concieve | conceive | Ngram | 1494 |
recieve | receive | Ngram | 1874 |
bibiliography | bibliography | Ngram | 2920 |
assidious | assiduous | Ngram | 1084 |
bizzare | bizarre | Ngram | 396 |
athiest | atheist | Ngram | 561 |
condensor | condenser | Ngram | 99 |
concensus | consensus | Ngram | 341 |
accross | across | Ngram | 5097 |
--Dan Polansky (talk) 12:06, 5 July 2013 (UTC)
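On a small scale, counting the misspellings admitted per threshold is easy to sketch. The Python below uses only the fourteen ratios from the table above as a stand-in for the full Google Ngram data; the candidate thresholds and the function name are assumptions for illustration.

```python
# Year-2000 frequency ratios copied from the table above (correct form / misspelling).
RATIOS = {
    "beleive": 3349, "beleiver": 22913, "aquitted": 433, "aquire": 1075,
    "arithmatically": 441, "concieve": 1494, "recieve": 1874,
    "bibiliography": 2920, "assidious": 1084, "bizzare": 396,
    "athiest": 561, "condensor": 99, "concensus": 341, "accross": 5097,
}


def count_admitted(ratios, threshold):
    """Count misspellings whose frequency ratio does not exceed the threshold."""
    return sum(1 for ratio in ratios.values() if ratio <= threshold)


# Hypothetical candidate thresholds; none of them is proposed policy.
for threshold in (500, 1000, 2000, 5000):
    print(threshold, count_admitted(RATIOS, threshold))
```

On this sample a threshold of 2000 would admit 10 of the 14 entries and a threshold of 5000 would admit 12, which gives a rough feel for how quickly the admitted set grows.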
To get an idea of how selective the predicate "common misspelling" is as opposed to mere "misspelling", I had a little look at imaginable misspellings of "conceive", and their frequency ratio as per Google Ngram Viewer:
Spelling | Corpus Frequency in Y2000 in % | Freq Ratio to Base Spelling | Ngram |
---|---|---|---|
conceive | 0.0006574282 | 1 | Ngram |
concieve | 0.0000004472 | 1470 | Ngram |
coceive | Not found | N/A | Ngram |
cocneive | Not found | N/A | Ngram |
cnceive | Not found | N/A | Ngram |
concive | 0.0000000197 | 33372 | Ngram |
conceie | Not found | N/A | Ngram |
conceibe | Not found | N/A | Ngram |
conceice | Not found | N/A | Ngram |
Notice that, using Google Ngram Viewer, we are looking at Google books, which is a corpus of copyedited works, as contrasted to world wide web. --Dan Polansky (talk) 09:02, 6 July 2013 (UTC)
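The ratios in the table follow directly from the corpus frequencies: divide the frequency of the base spelling by that of the variant. A minimal Python sketch, assuming a variant that is not found is represented as `None` (the function name is made up for the example):

```python
def freq_ratio(base_pct, variant_pct):
    """Frequency ratio of the base spelling to a variant; None if the variant is not found."""
    if not variant_pct:
        return None
    return base_pct / variant_pct


BASE = 0.0006574282  # corpus frequency of "conceive" in 2000, in percent (from the table)

variants = {
    "concieve": 0.0000004472,
    "concive": 0.0000000197,
    "coceive": None,  # "Not found" in the table
}

for word, pct in variants.items():
    ratio = freq_ratio(BASE, pct)
    print(word, "N/A" if ratio is None else round(ratio))
```

Rounded to whole numbers, this gives the 1470 and 33372 shown in the table.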
To broaden the impression, here comes a comparison of a couple of -ize/-ise forms:
Term 1 | Term 2 | Ngram | Frequency Ratio in Year 2000 |
---|---|---|---|
analyse | analyze | Ngram | 2.6 |
crystalise | crystalize | Ngram | 6.8 |
revitalise | revitalize | Ngram | 6.5 |
popularise | popularize | Ngram | 3.7 |
formalise | formalize | Ngram | 4.4 |
pluralise | pluralize | Ngram | 7.5 |
criticise | criticize | Ngram | 5.1 |
realise | realize | Ngram | 6.7 |
organise | organize | Ngram | 5.9 |
equalise | equalize | Ngram | 7.8 |
neutralise | neutralize | Ngram | 6.8 |
socialise | socialize | Ngram | 9.8 |
--Dan Polansky (talk) 18:29, 9 July 2013 (UTC)
Hypothesis: Copyediting massively impacts frequency ratio. Verification:
Term 1 | Term 2 | Ngram | Ngram Freq Ratio in Year 2000 | Freq Ratio in English Web | Ratio of Ratios | Hits 1 | Hits 2 |
---|---|---|---|---|---|---|---|
beleive | believe | Ngram | 3349 | 127 | 26 | 22900000 | 2900000000 |
beleiver | believer | Ngram | 22913 | 417 | 55 | 220000 | 91700000 |
aquitted | acquitted | Ngram | 433 | 243 | 2 | 188000 | 45600000 |
aquire | acquire | Ngram | 1075 | 72 | 15 | 5080000 | 366000000 |
arithmatically | arithmetically | Ngram | 441 | 50 | 9 | 9640 | 484000 |
concieve | conceive | Ngram | 1494 | 2 | 612 | 25400000 | 62000000 |
recieve | receive | Ngram | 1874 | 40 | 46 | 56000000 | 2260000000 |
bibiliography | bibliography | Ngram | 2920 | 2118 | 1 | 68000 | 144000000 |
assidious | assiduous | Ngram | 1084 | 93 | 12 | 25600 | 2390000 |
bizzare | bizarre | Ngram | 396 | 27 | 15 | 16300000 | 444000000 |
athiest | atheist | Ngram | 561 | 67 | 8 | 1710000 | 115000000 |
condensor | condenser | Ngram | 99 | 9 | 11 | 4130000 | 37200000 |
concensus | consensus | Ngram | 341 | 91 | 4 | 1990000 | 181000000 |
accross | across | Ngram | 5097 | 187 | 27 | 16800000 | 3140000000 |
Anomalies or outliers: acquitted, conceive, bibliography.
--Dan Polansky (talk) 17:19, 12 July 2013 (UTC)
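The "Ratio of Ratios" column can be recomputed from the other columns: the web ratio is Hits 2 divided by Hits 1, and the ratio of ratios is the Ngram ratio divided by that unrounded web ratio. The Python sketch below does this for the rows of the table; the two cutoffs used to flag outliers are assumptions chosen for illustration, not values stated in the discussion.

```python
# (misspelling, Ngram ratio in 2000, web hits for misspelling, web hits for correct form)
ROWS = [
    ("beleive", 3349, 22_900_000, 2_900_000_000),
    ("beleiver", 22913, 220_000, 91_700_000),
    ("aquitted", 433, 188_000, 45_600_000),
    ("aquire", 1075, 5_080_000, 366_000_000),
    ("arithmatically", 441, 9_640, 484_000),
    ("concieve", 1494, 25_400_000, 62_000_000),
    ("recieve", 1874, 56_000_000, 2_260_000_000),
    ("bibiliography", 2920, 68_000, 144_000_000),
    ("assidious", 1084, 25_600, 2_390_000),
    ("bizzare", 396, 16_300_000, 444_000_000),
    ("athiest", 561, 1_710_000, 115_000_000),
    ("condensor", 99, 4_130_000, 37_200_000),
    ("concensus", 341, 1_990_000, 181_000_000),
    ("accross", 5097, 16_800_000, 3_140_000_000),
]

# Hypothetical cutoffs for calling a row an outlier.
LOW_CUTOFF = 3
HIGH_CUTOFF = 100


def ratio_of_ratios(ngram_ratio, hits_misspelled, hits_correct):
    """Ngram frequency ratio divided by the (unrounded) web frequency ratio."""
    web_ratio = hits_correct / hits_misspelled
    return ngram_ratio / web_ratio


outliers = []
for word, ngram_ratio, hits_bad, hits_good in ROWS:
    value = ratio_of_ratios(ngram_ratio, hits_bad, hits_good)
    if value < LOW_CUTOFF or value > HIGH_CUTOFF:
        outliers.append(word)

print(outliers)
```

With these cutoffs the flagged rows are the same three listed as anomalies: the misspellings of acquitted, conceive and bibliography.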
This currently has a chemistry definition. But given that it has a Proto-Slavic origin, it's almost certainly missing senses. Can you help? —CodeCat 16:14, 19 July 2013 (UTC)
Also, are -ný and -ní the same suffix or is there a difference? —CodeCat 16:25, 19 July 2013 (UTC)
- I have added a def to -ný. -ný does not seem to be the same suffix as -ní. --Dan Polansky (talk) 16:47, 22 July 2013 (UTC)
Personal attack
Why did you have to personally attack me on my own user talk page? If anyone is being shoddy, you are by attacking me personally on my own talk page. Don't do it again. Razorflame 19:34, 28 July 2013 (UTC)
- Evidence to the claims I have made on your talk page is in the archives of your talk page, in your editing history and in your block log. If you find any inaccuracy in what I write, let me know. --Dan Polansky (talk) 18:26, 29 July 2013 (UTC)
- It is a personal attack. Don't add it back to my talk page. Razorflame 20:18, 29 July 2013 (UTC)
- @Razorflame: You're bandying about "personal attack" with abandon. Don't. What he wrote is not what I, or I believe most editors, would consider a personal attack.
- @Dan: That said, I quote the following from WT:BLOCK: "[A reasonable cause for blocking is causing] ... our editors distress by directly insulting them or by being continually impolite towards them." I'm not going to block you, but it is true that you are arguably being "continually impolite". Please be civil. —Μετάknowledgediscuss/deeds 21:08, 29 July 2013 (UTC)
- You are misrepresenting WT:BLOCK. The complete WT:BLOCK policy is this: "The block tool should only be used to prevent edits that will, directly or indirectly, hinder or harm the progress of the English Wiktionary. It should not be used unless less drastic means of stopping these edits are, by the assessment of the blocking administrator, highly unlikely to succeed." I am calling Razorflame to responsibility for what he does. If you find a particular sentence that I posted uncivil, be specific about it. For the record, given your history of misrepresentations and poor understanding, I am not at all enthusiastic about seeing you on my talk page or in the talk between me and Razorflame. --Dan Polansky (talk) 21:13, 29 July 2013 (UTC)
- Yes, I know you dislike me as well. I am here because you two are at each other's throats, and I am (apparently ineffectively) trying to make sure that neither does something actually blockworthy. —Μετάknowledgediscuss/deeds 21:20, 29 July 2013 (UTC)
- Be specific. --Dan Polansky (talk) 21:20, 29 July 2013 (UTC)
- About what? If you mean for me to be specific about "something actually blockworthy", I essentially mean harassment. Whether or not harassment has occurred could easily be argued; I think not, but Razorflame certainly feels harassed, judging by his defensive reaction. —Μετάknowledgediscuss/deeds 21:41, 29 July 2013 (UTC)
- Which sentence that I have posted is incivil, blockworthy or borderline blockworthy? --Dan Polansky (talk) 21:42, 29 July 2013 (UTC)
- Nothing except reposting material that Razorflame read, removed, and (relatively civilly) asked you not to repost. You have a right to notify him of his errors on his talkpage, but reposting reverted material like that is basically edit warring. I don't think you could be reasonably blocked for it, but if you continue, perhaps someone would block you (as I said, I myself would not). —Μετάknowledgediscuss/deeds 21:51, 29 July 2013 (UTC)
- Do you believe users have the right to remove posts to their talk pages that are critical of their editing? I am not notifying Razorflame about his errors; I am notifying other editors of Razorflame's dubious editing by providing direct evidence in the form of diff hyperlinks from which editors can figure things out for themselves, without taking my word for it. My posts on Razorflame's talk page are not for Razorflame, and he knows it very well. This is why he is removing my posts. I would have blocked him for those removals, but I am not an admin. --Dan Polansky (talk) 21:55, 29 July 2013 (UTC)
- Yes, I do believe he has that right. If you truly wished to post them for the community, you should do so in the BP. —Μετάknowledgediscuss/deeds 22:02, 29 July 2013 (UTC)
- A user's talk page is the most natural location for finding out about him. If I posted to Beer parlour, and months later people come to Razorflame's talk page, they would not find much there. But because most of discussions about Razorflame took place on his talk page, any editor with a sincere wish to know about Razorflame can conveniently find out. Furthermore, there is no need to bring his editing to a broad community attention when the user talk page suffices. Thus, I see not a single benefit of posting to Beer parlour, while I see two benefits of posting to his talk page. --Dan Polansky (talk) 22:07, 29 July 2013 (UTC)
trreq
I want {{trreq}} deleted or constrained as much as possible. Other editors disagree.
Discussions:
- Template_talk:trreq#RfD_deletion_debate, October 2012
- Wiktionary:Beer_parlour/2012/October#Translation requests
- Wiktionary:Beer_parlour/2013/August#trreq.27s
I am equally okay with dumping {{rfe}}. --Dan Polansky (talk) 07:42, 3 August 2013 (UTC)
Razorflame
Razorflame (talk • contribs) is incompetent and untrustworthy. Evidence of his lack of dictionary making skill and lack of ability to make and keep promises is available in the archives of User talk:Razorflame: User_talk:Razorflame/Archive_1, User_talk:Razorflame/Archive_2, User_talk:Razorflame/Archive_3, and more in future. As a consequence of a protracted series of broken promises and mistakes resulting from his contributing in languages in which he has close to no knowledge, he was blocked in July 2010 for one year; see his block log.
Editors expressing serious misgivings about his editing include EncycloPetey, Equinox, msh210, Ruakh, opiaterein AKA Dick Laurent, Yair rand, Atelaes, and Dan Polansky (me).
Some editors hold hope for his somehow maturing over the years. However, I find it unlikely.
Arbitrarily selected incidents:
- Performing various arbitrary unjustified reverts
- Contributing in a variety of languages not understood by him with unclear error rate; fairly many errors have been identified by the editors versed in the language in question
- Adding copyrighted non-free definitions, November 2009
- Adding Czech translations from a copyrighted dictionary, June 2010
- Creating Kannada entries with extremely numerous etymology sections based on nothing at all; see ತಡೆ and Talk:ತಡೆ, August 2013
--Dan Polansky (talk) 19:40, 6 August 2013 (UTC)
Remove the above section
Please either remove the above section or move it into a discussion page because as it sits right now, you're continuing to harass me by posting this on your talk page. Razorflame 04:33, 7 August 2013 (UTC)
On Wikipedia, w:WP:HOUND states that "If 'following another user around' is accompanied by tendentiousness, personal attacks, or other disruptive behavior, it may become a very serious matter and could result in blocks and other editing restrictions." Wiktionary doesn't have a formal, acronymized counterpart to that policy, but it does have WT:AGF and WT:BLOCK, which, as Metaknowledge already pointed out, warns against insulting other editors. I second the suggestion that you delete the above section and discontinue your practice of keeping tendentious notes on other users. - -sche (discuss) 05:08, 7 August 2013 (UTC)
- I have posted no personal attack. I have only posted accurate information. If you question the accuracy of any sentence that I have posted, do so, and be specific. I surmise that what I have posted is not tendentious; again, be specific in your allegations. I object to my removing the Razorflame section from my talk page. In the past, various opposers of blocking editors have requested that a case is made in favor of blocking them. Now when I am in the process of making a case, you are threatening me with a block. You are expressly citing a Wikipedia policy while knowing that references to Wikipedia policies made by newbies are quickly dismissed in Wiktionary; I cry foul. --Dan Polansky (talk) 18:40, 7 August 2013 (UTC)
- Let me put it on record that I feel harassed by your post, which I see as an unjust threat of an abuse of admin tools. I am glad that I am not an admin, as I cannot harass anyone with a gun, only with a word. --Dan Polansky (talk) 20:02, 7 August 2013 (UTC)
- You have hounded and attacked Razorflame as "incompetent and untrustworthy" despite previously being warned by Metaknowledge that such behavior was unacceptably uncivil and blockable per WT:BLOCK. You have refused to remove the attack and have shown that you have no intention of changing your behavior. Therefore, I have issued a short block pursuant to WT:BLOCK, which explicitly states—as Metaknowledge specifically warned you—that "[A cause for blocking is] causing our editors distress by directly insulting them or by being continually impolite towards them. There are few other means of protecting Wiktionary; the most obvious is by discussion on the users’ talk pages. Some effort should be made to explain to people why their edits are considered incorrect, however a short block can be given if they clearly won’t listen. In cases where a user has had something explained to them, an explicit warning should be given to them before blocking them; this can show that they have no intention of mending their ways." - -sche (discuss) 21:31, 7 August 2013 (UTC)
- Razorflame is incompetent and untrustworthy; there is plenty of evidence of this, and I can provide countless links to specific examples. Even Anatoli, who opposed Razorflame's block, seems to agree with me: "I know it sounds harsh but you seem not to have the right attitude or skills. --Anatoli (обсудить/вклад) 00:29, 7 August 2013 (UTC)".
- Re: "... despite previously being warned by Metaknowledge ...": You are wrong: I asked: "Which sentence that I have posted is incivil, blockworthy or borderline blockworthy? --Dan Polansky (talk) 21:42, 29 July 2013 (UTC)"; Metaknowledge answered: "Nothing except ...".
- As for WT:BLOCK, you are misrepresenting it in exactly the same way as Metaknowledge did; you should pay more attention and read my responses to Metaknowledge above.
- You are inaccurate and unjust. I have pointed your misrepresentations and misunderstanding to you above, but you do not acknowledge any mistake in a single point. By this unjust action, you are doing damage to your reputation. --Dan Polansky (talk) 22:03, 7 August 2013 (UTC)
- @Dan: I said "nothing except..." when that was true. Your sentence that simply declares Razorflame to be "incompetent" is opinion, not fact, and is certainly both incivil and borderline blockworthy, if not more. —Μετάknowledgediscuss/deeds 22:10, 7 August 2013 (UTC)
- Is this blockworthy: "I know it sounds harsh but you seem not to have the right attitude or skills."? --Dan Polansky (talk) 22:16, 7 August 2013 (UTC)
- You are wrong; Razorflame's incompetence is a verifiable fact. You seem not to understand what a fact is. --Dan Polansky (talk) 22:17, 7 August 2013 (UTC)
- Your mom is a verifiable fact. I propose to close this discussion immediately - and in fact I'm just gonna do that. -- Liliana • 22:20, 7 August 2013 (UTC)
Sort order in Czech
How are Czech words normally sorted? Are all diacritics ignored or are there some special rules? —CodeCat 19:20, 7 August 2013 (UTC)
- Czech sorting is sketched here: Index_talk:Czech#Sorting_or_ordering. Thus, a and á are equivalent as regards sorting order, while r and ř are not. Another tricky thing is "ch", which is treated as a single letter rather than "c" followed by "h". If you want to sort Czech properly in a programming language, there should be a library that takes care of locale and collation. --Dan Polansky (talk) 19:26, 7 August 2013 (UTC)
- I'm asking because I want to make our own software approximate Czech sorting reasonably well. If I understand it correctly:
- Vowels with an acute accent should be equivalent to the basic vowel, and ů is also equivalent to u.
- ch should be equivalent to h
- Letters with haček are distinct from the basic letter. (Unfortunately we can't make these appear in the correct order, so they'll go at the end)
- What happens to w? It's not a native letter of Czech, but when it does occur, is it considered equivalent to v or distinct? And I suppose that the rules for Slovak are similar, but Slovak also has ä and ô, are those considered equivalent to a and o? —CodeCat 19:33, 7 August 2013 (UTC)
- Your 1st and 3rd bullets are right, but the 2nd is wrong: ch comes after h rather than being equivalent. w comes after v; nothing special going on there. Both points should follow from Index_talk:Czech#Sorting_or_ordering. I don't know about Slovak. If you install gsort from GnuWin32, you should be able to figure these things out empirically, by playing with locale. gsort or some GNU library might have documentation or a specification that you might want to see. --Dan Polansky (talk) 19:40, 7 August 2013 (UTC)
- We can't make ch sort after h on Wiktionary. We can't change the order of letters, only make certain letters equivalent to others. So either ch would be sorted under c+h, or considered equivalent to h. We could make ch sort at the end of h, so that the order would be hy, hz, cha, chb... but they would still appear under the H section. —CodeCat 19:43, 7 August 2013 (UTC)
- A Czech sorting that places ě,š,č,ř,ž after all other diacritic-free letters seems so fundamentally broken that I would not bother fixing the rest. --Dan Polansky (talk) 19:49, 7 August 2013 (UTC)
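The rules discussed in this thread can be illustrated with a small sort key. This is only a minimal sketch, not the software CodeCat refers to and not a full collation implementation: the names `ALPHABET`, `RANK`, `FOLD` and `czech_sort_key` are made up for illustration, and folding ě, ď, ň, ť into e, d, n, t is an assumption that the thread does not settle. It treats vowels with an acute accent and ů as equivalent to the base vowel, treats "ch" as a single letter after "h", and places č, ř, š, ž directly after c, r, s, z.

```python
import unicodedata

# Primary alphabet for this sketch: "ch" is one letter placed after "h",
# and č, ř, š, ž are distinct letters placed right after c, r, s, z.
ALPHABET = [
    "a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
    "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t", "u", "v", "w", "x",
    "y", "z", "ž",
]
RANK = {letter: index for index, letter in enumerate(ALPHABET)}

# Letters folded into their base letter for the primary comparison.
# Acute vowels and ů follow the thread; ě, ď, ň, ť are an assumption.
FOLD = str.maketrans("áéíóúýůěďňť", "aeiouyuednt")


def czech_sort_key(word):
    """Return a key that approximates Czech alphabetical order."""
    # NFC makes sure accented letters are single code points before folding.
    folded = unicodedata.normalize("NFC", word).lower().translate(FOLD)
    ranks = []
    i = 0
    while i < len(folded):
        if folded.startswith("ch", i):
            # The digraph counts as one letter sorted after "h".
            ranks.append(RANK["ch"])
            i += 2
        else:
            # Characters outside the alphabet sort after all letters.
            ranks.append(RANK.get(folded[i], len(ALPHABET)))
            i += 1
    # The original word breaks ties between words that fold identically.
    return (ranks, word)


words = ["chata", "hrad", "cena", "čaj", "had", "řeka",
         "ruka", "rýma", "voda", "watt", "ibis"]
print(sorted(words, key=czech_sort_key))
# ['cena', 'čaj', 'had', 'hrad', 'chata', 'ibis', 'ruka', 'rýma', 'řeka', 'voda', 'watt']
```

For real use, a locale-aware collation library is probably the better choice, as suggested earlier in the thread; Python's standard `locale.strxfrm` is one option, provided a Czech locale is installed on the system.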
Block
For reference, here is a block summary: 7 August 2013 -sche (Talk | contribs) blocked Dan Polansky (Talk | contribs) with an expiry time of 1 week (account creation disabled) (Intimidating behavior/harassment: Violating WT:AGF+WT:BLOCK. Hounding+attacking editor despite being warned by MK such behavior was unacceptable+blockable. Refusing to remove attack or acknowledge such behaviour was unacceptable despite being w...).
I cry foul. --Dan Polansky (talk) 22:10, 7 August 2013 (UTC)
- So let me get this straight: you were blocked for telling Razorflame that he is incompetent in languages that he's editing, and in the meantime Razorflame has requested deletion of several of his Kannada entries because he cannot read the script and cannot provide a single quotation for any of the meanings that were RfV-ed? It looks like we're slowly turning to Wikipedia. --Ivan Štambuk (talk) 19:34, 17 August 2013 (UTC)
- AFAIK, the block was mainly for diff, which ascribes two negative qualities to an editor, and provides detail and partial substantiation of the negative qualities. See also #Remove the above section, Aug 2013. --Dan Polansky (talk) 11:27, 18 August 2013 (UTC)
DPMaid
I have created User:DPMaid for menial, semiautomated edits. --Dan Polansky (talk) 17:47, 16 August 2013 (UTC)
Old Czech
In Этимологический словарь славянских языков there are many Old Czech (ст.-чеш.) words listed (with a Russian gloss), which contain some important properties with respect to etymologies. I'd like to add some in the main namespace. In your opinion, do they merit a separate language code/L2, or should they be just marked as ==Czech== with a context label Old Czech? There is also Old Polish (zlw-opl), so Old Czech would be zlw-ocs. --Ivan Štambuk (talk) 19:21, 17 August 2013 (UTC)
- If you choose to enter Old Czech entries, using "Czech" as L2-heading and using "Old Czech" as a context should do as a temporary measure, to be updated later when people with strong opinions on the subject arrive. This should be okay given that such entries can be identified by humans by using wiki online search and using grepping on the dump. Old Czech was written in an orthography and spelling radically different from the modern one. Per Building a Corpus of Old Czech, 2012, "The term Old Czech (OC) usually refers to the language as used roughly between 1150 and 1500." The article further distinguishes "Humanistic Czech (1500-1650), Baroque Czech (1650-1780) and then Czech of the so-called National Revival."
- ISO 639-2 does not distinguish historical varieties of Czech, as per loc.gov. Nor does ISO 639-1 per W:List of ISO 639-1 codes.
- For Old Czech entries, attestation using sources other than dictionaries is key, I think. Etymologies of Russian and other languages cannot be reliably used to attest Old Czech words, I believe. An Old Czech corpus, which could be used for attestation, seems to be available at http://vokabular.ujc.cas.cz/banka.aspx. However, the corpus uses a modernized orthography, more of which in the following paragraph.
- The orthography under which to record Old Czech entries is rather unclear to me. According to W:Czech_orthography#History, using diacritics was first suggested by Jan Hus in "De orthographia Bohemica" around 1406; if Old Czech ranges from 1150 to 1500 as suggested above, then an orthography with diacritics is not the one in which Old Czech was contemporaneously recorded. A further orthographic difference is in "g" vs. "j" and "w" vs. "v", as in "mág" vs. "máj". The Old Czech corpus mentioned in a paragraph above uses a modernized orthography, with diacritics and "máj"; the corpus contains e.g. "když ten měsiec máj rozličným kvietím polí rovnost bieše ozdobil". A worthy read in Czech is Textová opora k vývoji českého jazyka by Jaroslav David, 2007-2008, which contains a scan from 1668 on page 12 showing "gest" instead of "jest" and "chowati" instead of "chovati", and examples of various orthographies on page 22. Another worthy read in Czech seems to be Vývoj pravopisu ve staré češtině by Pleskalová, 2008. --Dan Polansky (talk) 11:27, 18 August 2013 (UTC)
- Seems to be more complicated than I thought it would be. Perhaps it would be necessary for Czech (and many other Slavic languages in the pre-standardization period) to devise some kind of normalization scheme that would ease looking up words. ESSJa is not a Russian etymological dictionary but a dictionary of the Proto-Slavic (Common Slavic) lexicon, and it is compiled on the basis of many dictionaries of dialectal and obsolete/extinct words, so it is pretty much reliable; it's just the spelling scheme that they utilize that is inconsistent at times. Personally, given all the problems that pre-standardized spellings entail, I'd support mandatory citations for every mainspace entry. --Ivan Štambuk (talk) 08:13, 21 August 2013 (UTC)
- If I were under the compulsion to create Old Czech entries right now (which I am not), I would probably go with the modernized spelling used by the only online Old Czech corpus that I know of, despite some reservations that I have. The corpus makes it easy to do some sort of attestation on the spellings, and the designers of the corpus hopefully gave some thought to why they go with the modernized spellings. I don't think we absolutely need mandatory citations in the mainspace, since the corpus makes verification of attestation fairly easy. --Dan Polansky (talk) 18:10, 21 August 2013 (UTC)
- I did some Google searches of some excerpts from the corpus and they have no hits, or have a single hit (some PDF or DOC file from the same website). If that corpus site goes down for whatever reason, all of the entries created on the basis of it suddenly become very problematic. The site is great, but the "single point of failure" is what bothers me. For normal languages attestation is not a problem, but for ancient or pre-standardized forms of languages spellings will vary according to the normalization scheme. It's always best to include some short attestation, ideally with an original spelling. Even if these old works are available on Google Books or Internet Archive, they are difficult to search for because of different scripts or special diacritics which OCR can't recognize.
- In any case, I don't have any immediate plans to create OC entries, but it's an issue to keep in mind because it will definitely arise in the future.
- PS: I've found that Czech WS has some OC works such as Dalimilova kronika (with a German translation, yay) - I'll use those, if anything. I've opened a random part and, shockingly, the initial verses speak about Croats and Serbs.. --Ivan Štambuk (talk) 15:12, 22 August 2013 (UTC)
Romanization
Some links on romanization and transliteration in Wiktionary:
Votes:
- Wiktionary:Votes/pl-2009-12/Treatment_of_toneless_pinyin_syllables
- Wiktionary:Votes/2011-07/Pinyin entries
- Wiktionary:Votes/pl-2011-08/Romanization_of_languages_in_ancient_scripts
- Wiktionary:Votes/pl-2011-09/Romanization_of_languages_in_ancient_scripts_2
- Wiktionary:Votes/pl-2011-10/Romanization of Gothic
- Wiktionary:Votes/pl-2013-03/Japanese_Romaji_romanization_-_format_and_content
- Wiktionary:Votes/pl-2013-03/Romanization and definition line
- Search votes for "romanization"
Categories:
- Category:Chinese_romanizations, incl. Category:Mandarin pinyin
- Category:Japanese_kanji
- Category:Gothic romanizations
Other:
Romanization and transliteration guides:
- Ancient Greek: Wiktionary:Ancient Greek romanization and pronunciation
--Dan Polansky (talk) 16:01, 20 August 2013 (UTC)
- One vote added. --Dan Polansky (talk) 06:50, 14 June 2014 (UTC)
Tit-for-tat discussion closing request.
Greetings, Dan. I have recently proposed in the Beer parlour that since WT:RFD and WT:RFV are perpetually backlogged with discussions that should have been closed long ago, it would be nice if editors adding a new section to one of these pages would help to move some old sections towards closure/archiving. Since you have added some new RfV sections, please consider closing or archiving some old ones, or otherwise moving discussions toward closure. Cheers! bd2412 T 20:40, 20 August 2013 (UTC)
- I've struck some RFVs. Great initiative; kudos! --Dan Polansky (talk) 18:04, 21 August 2013 (UTC)
American or British spelling
See Wiktionary_talk:About_English#Entry_created_first_to_be_made_.22lemma.22 for my opposition to this policy:
- "If a word is spelled differently in different standard varieties of English, the spelling (that is, the entry) which was created first is made the lemma; to avoid unmaintainable duplication of content, other spellings soft-redirect to it."
Some links:
- Wiktionary:American_or_British_Spelling
- Wiktionary:Beer_parlour_archive/October-December_05#Especially_British_?, August 2005
- Wiktionary:Beer_parlour_archive/2006/March#First_quarter_2006_US_vs._UK_flamewar, March 2006
- Wiktionary:Beer_parlour/2012/January#.22color.2Fcolour.22_etc., January 2012
- Wiktionary:Beer_parlour/2013/August#-yse.2F-yze, August 2013
--Dan Polansky (talk) 07:31, 31 August 2013 (UTC)
If the new policy gains support, the test for its application is going to be "defense", which was created later than "defence", is predominant in American corpus and fairly common in British corpus. Some seem to imply that there is an exception or constraint somehow already built into the policy:
- "I support the policy while also noting that it only applies to spellings that are exclusive to one area or the other" --Codecat; "defense" is not exclusive to one area.
- "... a word that has an acceptable common spelling used in all varieties of English needn’t consult this fall-back rule": --Mzajac; so again, "defense", "defence" pair has an acceptable common spelling: "defense".
--Dan Polansky (talk) 14:04, 1 September 2013 (UTC)
lorem ipsum
While you are at it, you may want to destroy this Latin translation (and hopefully destroy this project also). --Æ&Œ (talk) 02:55, 10 September 2013 (UTC)
- This post is intentionally left without an answer. --Dan Polansky (talk) 19:37, 2 October 2013 (UTC)
Looks like you’re taking my advice! Good job, man! --Æ&Œ (talk) 22:20, 19 October 2013 (UTC)
- I've removed Latin and Sanskrit translations as they seem to be made up. Many of the Sanskrit and Latin terms that appear in translations (some even as entries) for modern terms are calques or "modernized reconstructions", usually stemming from the la and sa wikiprojects, which in turn serve as some kind of revival effort/intellectual playground. Since we cannot use Wikipedia for attestation, we can't list them. --Ivan Štambuk (talk) 22:42, 19 October 2013 (UTC)
- I wouldn't remove them without checking first. Many Modern Latin terms are surprisingly attested in Vatican publications, recent Latin translations, and on Usenet. —Μετάknowledgediscuss/deeds 00:03, 20 October 2013 (UTC)
- Well, a Google search yielded nothing on either of them, so it's better to remove them. --Ivan Štambuk (talk) 00:27, 20 October 2013 (UTC)
- I meant that as a general guideline. On these specific words, 'twould seem you're justified. —Μετάknowledgediscuss/deeds 00:34, 20 October 2013 (UTC)
- But Wiktionary is a playground. Hardly anybody takes this project seriously. It is better to view Wiktionary as an outlet for enjoyment, not some crusade to inform or educate people. --Æ&Œ (talk) 22:46, 19 October 2013 (UTC)
- WMF's mission is to educate the planet. This is serius biznis. --Ivan Štambuk (talk) 00:27, 20 October 2013 (UTC)
Requests
Stop removing requests you don’t want to serve. Leave them there. — Ungoliant (Falai) 01:15, 17 September 2013 (UTC)
- This post by a user with an inappropriate user name is intentionally left without an answer. --Dan Polansky (talk) 19:35, 2 October 2013 (UTC)
Original research
Wiktionary seems to practice original research in creation of definitions. Some editors want to practice it in etymologies too.
Links:
- Wiktionary:Wiktionary is a secondary source
- Wiktionary:Beer_parlour/2013/September#Etymology_policy.2C_original_research.2C_aliaque_Wiktionarii_conturbata
- Wiktionary:Beer_parlour/2013/September#Original_research_at_Wiktionary
- W:Wikipedia:No original research
- W:Wikipedia:No_original_research#Primary.2C_secondary_and_tertiary_sources
- Search in Beer parlour for "original research"
--Dan Polansky (talk) 09:06, 22 September 2013 (UTC)
ReidAA
Hi there. I think your criticism aimed at ReidAA is somewhat harsh. I can't see those diffs as cause-promoting at all. Also, I believe that the quotations illustrate very well the words in question. As for the message about me creating trouble, well, that's not entirely true ... One more thing, I corrected the URLs. -WF
- This post by a widely recognized troublemaker Wonderfool is intentionally left without an answer. --Dan Polansky (talk) 19:35, 2 October 2013 (UTC)
Revert
I reverted your edit because you removed lots of valid templates without any apparent explanation. I don't treat anyone differently, anyone who makes an edit I think is bad gets the same treatment, "random anon" or not. —CodeCat 19:40, 2 October 2013 (UTC)
- I removed an edit with which I disagreed, with a summary explaining my edit. Thus, status quo ante prevails in the absence of consensus. --Dan Polansky (talk) 19:42, 2 October 2013 (UTC)
- Furthermore, if you think an edit by a long-term editor is bad, you should explain that at least in the edit summary, or even on the talk page of the editor. Manual revert might be okay; rollback with no explanation not so. I find your general pattern of editing without edit summaries troublesome anyway. --Dan Polansky (talk) 19:44, 2 October 2013 (UTC)
From Beer Parlour discussion
(I decided to move this conversation here, as it was off-topic and disrespectful to the original intent and poster of the thread.)
On yet another note, "E | talk" is a horrible signature. For one thing, signatures should match user names. For another thing, signatures should create the appearance of being names, which "E" is not. --Dan Polansky (talk) 09:38, 25 October 2013 (UTC)
- For all I know E might be a valid name in China or South East Asia. Otherwise, I agree with your points (that's a first!) -- Liliana • 09:49, 25 October 2013 (UTC)
- E has been the name I've gone by online for nearly two years. (Also, I'm pretty white.) — E | talk 10:39, 25 October 2013 (UTC)
- Re: "E has been the name I've gone by online ...": Which is why you have registered the user name User:Casicastiel, right? --Dan Polansky (talk) 12:29, 25 October 2013 (UTC)
I don't understand why you are questioning me, or what you don't understand about this. Casicastiel is my username. I use it on many sites. E is my name. It is what I call myself and what people call me, which is why it is in my signature. I've been on this site a matter of days and I was not prepared to have my own name, of all things, objected to. Please, leave me alone? — E | talk 15:03, 25 October 2013 (UTC)
- As I said, I find having a signature different from the user name objectionable, especially if consisting of a single letter. You report to be a native speaker of English, so your claim that "E is my name" is implausible to me. You now know that I disapprove of your signature and I know that you don't care and enjoy your signature, so there seems nothing more to clarify, right? --Dan Polansky (talk) 15:08, 25 October 2013 (UTC)
- Furthermore, leave the Beer parlour posts alone. Other people's posts should be neither edited nor commented out. --Dan Polansky (talk) 15:11, 25 October 2013 (UTC)
- If your question is whether E is the name I was given at birth: no. Neither is it a name by which I am referred to in "real life". It is a name I chose for myself and that many people have referred to me with. Are you asking me to change my signature? Is it unacceptable for a user on this site? If so, I will change it. All I want to do is help contribute to a project. I don't like conflict. This has been a very personally upsetting series of discussions for me and if that is to be the norm then I can't stay here.
- Additionally, I'm truly sorry for removing the posts in the Beer Parlour without asking you or Liliana-60. That was inappropriate on my part. Please forgive me. — E | talk 15:29, 25 October 2013 (UTC)
- I do not know of any policy prohibiting user names like "E". I am not an admin, and I cannot block you, so your disagreement with me can at worst lead to a quarrel. My views stand: a user registered as "User:Casicastiel" should not have "E" as a signature; I should not have to remember two string identities of a person, one used in a signature, another one in revision history of wiki pages. --Dan Polansky (talk) 18:37, 25 October 2013 (UTC)
Wikisaurus talk:unhappy: "pointless template"
I fail to see why {{talk header}} at the top of a talk page is ever pointless. Could you please explain to me why you feel this way? Thanks. —TeragR (talk) 17:16, 3 November 2013 (UTC)
- I don't think there is much to explain beyond the obvious: The template {{talk header}} clutters display while adding close to nothing of value. The overwhelming majority of editors have managed to properly use talk pages without being given an instruction along the lines of "post only true sentences"; Wiktionary is not a kindergarten. Given the template is included in less than 40 pages (Special:WhatLinksHere/Template:talk_header), its use is not a common practice in English Wiktionary. I oppose the use of the template. --Dan Polansky (talk) 20:53, 4 November 2013 (UTC)
- Understood. Thank you for the clarification. —TeragR (talk) 20:57, 4 November 2013 (UTC)
Simile
I think similes should be kept, at least for the encoding direction. See also Category:English similes, and Talk:dumb as a bag of hammers, Talk:flat as a pancake, Talk:like the back end of a bus, Talk:watch like a hawk. --Dan Polansky (talk) 20:48, 5 December 2013 (UTC)
Two more RFD discussion links, from 2014: Talk:fat as a cow, Talk:fat as a pig.
Rationale for keeping high-frequency transparent similes (intransparent ones meet WT:CFI#Idiomaticity anyway):
- For the encoding direction: How do I say e.g. "very fat" using a simile?
- For simile-to-simile translation: How do I render e.g. "fat as a pig" using a Spanish simile?
--Dan Polansky (talk) 11:00, 26 July 2014 (UTC)
free variable
The page talk:free variable is the first one where I provided a certain kind of argument for keeping some pages of the form <adjective> <noun>.
This is what I wrote:
- With the current definition from programming, the term "free variable" appears to be SoP only because its meaning is explicitly listed in the entry "free" -- "(programming) Of identifiers, not bound". The same applies to "free variable" in logic, which is currently undefined.
- If "free variable" gets deleted, other terms may follow. They include algebraic number, per the definition of algebraic -- "(Of a number) which is a root of some polynomial", which makes "algebraic number" technically a sum of parts. Likewise transcendental number and even complex number, as complex has the definition "(mathematics) Of a number, of the form a + bi, where a and b are real numbers and i is the square root of −1."
- I fear that these cases provide a method of how to artificially make a lot of two-word technical terms of the form <adjective> <noun> appear sum-of-parts, by providing their definition at the adjective, of the form "Of <noun>, definition". Imagine I get rid of red dwarf by adding to red the definition "Of a dwarf star, small and relatively cool one of the main sequence".
- I do not know what WT:CFI says to these cases, but to me all these sum-of-parts seem somehow artificial or odd. I would like to see free variable, algebraic number, transcendental number and complex number included.
- Some of the concerned entries: algebraic number, algebraic integer, bound variable, cardinal number, complex number, free variable, imaginary number, rational number, real number, transcendental number, free software, open set, closed set, complete graph, normal distribution.
Talk pages of similar cases: talk:nominative case, talk:prime number, Talk:free software. --Dan Polansky (talk) 08:58, 7 December 2013 (UTC)
- yellow press is another case in point since we have yellow: Characterized by sensationalism, lurid content, and doubtful accuracy. --Dan Polansky (talk) 18:43, 10 September 2016 (UTC)
Define
Can you define what an "English native speaker" is for me please? Pass a Method (talk) 17:15, 8 December 2013 (UTC)
- native -- belonging to one by birth
- After my birth, I was brought up to speak Czech and no other language, so I am a native Czech speaker, while I have acquired English later in my life, becoming a non-native English speaker. There are people who have more than one language as the native one, and there are various degrees and shades, depending on how early in life one acquires the 2nd language. --Dan Polansky (talk) 17:24, 8 December 2013 (UTC)
Thank you for your Wikisaurus work
Thank you for starting Wikisaurus:stick. I wanted to look up synonyms for beam (building component) and this list was a good starting point. – b_jonas 14:05, 19 December 2013 (UTC)
Sourced
Why did you make this revert? Pass a Method (talk) 20:15, 25 December 2013 (UTC)
- I reverted undue Islamization. --Dan Polansky (talk) 17:14, 29 December 2013 (UTC)