
Module:accel: difference between revisions

From Wiktionary, the free dictionary
separate generation of JSON so that error message can be properly sequestered, stylistic stuff (merged from Module:accel/sandbox)
Undo revision 50731192 by Surjection (talk) - Since when are comparatives and superlatives supposed to be treated as full lemmas??
Tag: Undo
Line 44:
  }
- if params.form == "comparative" or params.form == "superlative" then
+ if params.form == "comparative" or params.form == "superlative" or params.form == "equative" then
- entry.head =
- "{{head|" .. params.lang .. "|" .. params.form .. " " .. params.pos ..
- (params.target ~= params.target_pagename and "|head=" .. params.target or "") ..
- (params.gender and "|g=" .. params.gender or "") ..
- "}}"
- entry.def =
- "{{" .. params.form .. " of" ..
- "|" .. params.origin ..
- (params.origin_transliteration and "|tr=" .. params.origin_transliteration or "") ..
- (params.pos ~= "adjective" and "|POS=" .. params.pos or "") ..
- "|lang=" .. params.lang ..
- "|nocat=1}}"
- elseif params.form == "equative" then
  entry.head =
  "{{head|" .. params.lang .. "|" .. params.pos .. " " .. params.form .. " form" ..

Revision as of 23:22, 10 January 2019


This module supports the accelerated entry creation gadget, WT:ACCEL. It automatically creates entries according to a set of language-specific rules, located in submodules.

The module will automatically try to merge multiple generated entries into one, if everything but the definitions is the same. Moreover, if the definitions use {{inflection of}}, then the inflection tags will be combined into a single {{inflection of}} definition line, separated by a semicolon ;. In addition, the module will attempt to group multiple semicolon-separated tag sets in a single {{inflection of}} call that differ in only one dimension, using multipart tags. For example, the following initially-generated entries

==Latin==

===Adjective===
{{head|la|adjective form|head=bonīs}}

# {{inflection of|la|bonus||dat|m|p}}

===Adjective===
{{head|la|adjective form|head=bonīs}}

# {{inflection of|la|bonus||dat|f|p}}

===Adjective===
{{head|la|adjective form|head=bonīs}}

# {{inflection of|la|bonus||dat|n|p}}

===Adjective===
{{head|la|adjective form|head=bonīs}}

# {{inflection of|la|bonus||abl|m|p}}

===Adjective===
{{head|la|adjective form|head=bonīs}}

# {{inflection of|la|bonus||abl|f|p}}

===Adjective===
{{head|la|adjective form|head=bonīs}}

# {{inflection of|la|bonus||abl|n|p}}

will first be grouped into one entry as follows:

==Latin==

===Adjective===
{{head|la|adjective form|head=bonīs}}

# {{inflection of|la|bonus||dat|m|p}}
# {{inflection of|la|bonus||dat|f|p}}
# {{inflection of|la|bonus||dat|n|p}}
# {{inflection of|la|bonus||abl|m|p}}
# {{inflection of|la|bonus||abl|f|p}}
# {{inflection of|la|bonus||abl|n|p}}

Then, the several inflection lines will be combined together into one:

==Latin==

===Adjective===
{{head|la|adjective form|head=bonīs}}

# {{inflection of|la|bonus||dat|m|p|;|dat|f|p|;|dat|n|p|;|abl|m|p|;|abl|f|p|;|abl|n|p}}

Finally, the several tag sets in the single {{inflection of}} call will be grouped into one tag set with multipart tags, like this:

==Latin==

===Adjective===
{{head|la|adjective form|head=bonīs}}

# {{inflection of|la|bonus||dat//abl|m//f//n|p}}

The use of multipart tags like this helps by indicating where syncretism occurs and reduces the amount of information that must be processed. The grouping algorithm only groups tag sets when doing so does not change the semantics of the inflections. When several possible groupings yield the same number of tag sets, the one with the fewest multipart tags is preferred. As an example of the latter, an entry like this:

==Latin==

===Adjective===
{{head|la|adjective form|head=bonum}}

# {{inflection of|la|bonus||acc|m|s|;|nom|n|s|;|acc|n|s|;|voc|n|s}}

will be converted to the following:

==Latin==

===Adjective===
{{head|la|adjective form|head=bonum}}

# {{inflection of|la|bonus||acc|m|s|;|nom//acc//voc|n|s}}

It could equally well be converted to the following, which also contains two tag sets:

==Latin==

===Adjective===
{{head|la|adjective form|head=bonum}}

# {{inflection of|la|bonus||acc|m//n|s|;|nom//voc|n|s}}

However, this grouping is dispreferred because it results in two multipart tags, while the preferred grouping has only one.
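
The grouping step can be illustrated with a small standalone sketch. This is not the module's algorithm: it is a naive greedy pass, written here only for illustration, that repeatedly merges the first two tag sets that differ in exactly one position. The function names are hypothetical, and the sketch makes no attempt to prefer the grouping with the fewest multipart tags.

```lua
-- Hypothetical helper: split "dat|m|p" into {"dat", "m", "p"}.
local function split_tags(tag_set)
	local tags = {}
	for tag in string.gmatch(tag_set, "[^|]+") do
		tags[#tags + 1] = tag
	end
	return tags
end

-- Return the single position where two tag lists differ, or nil if they
-- differ in length, in no position, or in more than one position.
local function single_difference(a, b)
	if #a ~= #b then
		return nil
	end
	local pos
	for i = 1, #a do
		if a[i] ~= b[i] then
			if pos then
				return nil
			end
			pos = i
		end
	end
	return pos
end

-- Naive greedy grouping (an assumption for illustration, not the real code).
local function group_tag_sets(tag_sets)
	local sets = {}
	for i, tag_set in ipairs(tag_sets) do
		sets[i] = split_tags(tag_set)
	end
	local merged = true
	while merged do
		merged = false
		for i = 1, #sets do
			for j = i + 1, #sets do
				local pos = single_difference(sets[i], sets[j])
				if pos then
					-- Fold the differing tag into a multipart tag and drop set j.
					sets[i][pos] = sets[i][pos] .. "//" .. sets[j][pos]
					table.remove(sets, j)
					merged = true
					break
				end
			end
			if merged then
				break
			end
		end
	end
	local out = {}
	for i, tags in ipairs(sets) do
		out[i] = table.concat(tags, "|")
	end
	return table.concat(out, "|;|")
end
```

Given the six bonīs tag sets above, this sketch produces dat//abl|m//f//n|p. Given the four bonum tag sets, it produces acc|m//n|s|;|nom//voc|n|s, which is the dispreferred grouping. That difference is why the module needs a tie-breaking rule rather than a simple first-match merge.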

Language-specific submodules


Default rules

The module uses a set of default rules which generate entries that should be acceptable in most cases:

  • The headword is formatted using {{head}}, using the part-of-speech of the lemma plus "form", e.g. noun → noun form, adjective → adjective form, etc.
    • For the tags comparative and superlative, "comparative" and "superlative" are added before the part of speech instead.
  • The definition is formatted using {{inflection of}}, and the form tag is used directly as the form tag of the template. Thus, gen|s → {{inflection of|lang|...||gen|s}}.
  • For some tags, a special-purpose template is used in the definition instead:

These defaults may change in the future as Wiktionary's needs change. Don't rely on particular default values. If in doubt, assume that everything will use {{inflection of}}, and override anything you want to be different.
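
As a rough illustration of the default definition rule, the sketch below picks a special-purpose template for a few form tags and falls back to {{inflection of}} otherwise. The lookup entries mirror the `templates` table in the module source on this page. The function name and the positional placement of the language code are assumptions made for this example and may not match what the module emits at any given revision.

```lua
-- Form tags that get a special-purpose template (subset copied from the
-- `templates` table in the module source; treat as illustrative only).
local special_templates = {
	["p"] = "plural of",
	["f"] = "feminine of",
	["n"] = "neuter of",
	["f|s"] = "feminine singular of",
	["m|p"] = "masculine plural of",
	["f|p"] = "feminine plural of",
}

-- Hypothetical helper: build a definition line for a lemma and form tag.
local function default_definition(lang, lemma, form)
	local template = special_templates[form]
	if template then
		return "{{" .. template .. "|" .. lang .. "|" .. lemma .. "}}"
	end
	-- Fallback: pass the form tag straight through to {{inflection of}}.
	return "{{inflection of|" .. lang .. "|" .. lemma .. "||" .. form .. "}}"
end
```

With these assumptions, the form tag gen|s falls through to {{inflection of}}, while p selects {{plural of}}.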

Requesting new rules

First, consider whether new rules are needed at all. The default rules suffice for many cases, especially if you make sure to provide a value for the form tag that can be directly inserted into {{inflection of}}. If you really need language-specific rules, and are not able to edit the module yourself, please file requests for new features at the Grease Pit. Specify:

  1. What you want to generate the links for. That includes at least a link to the template whose links you want to make green.
  2. What the generated entries should look like. In particular, which headword-line template it should use, and which form-of template, which parameters they should receive in which situations, and so on. A link to a word that has blue links to all the forms in the template would work best, as an example.
  3. Ideally, a link to a word that has red links to all the forms. This is useful for testing to see if the generated entries are correct.

Adding new rules

Generation rules are used to create the entry's contents. The general parts are defined in this module, while the language-specific rules are handled by submodules. Each submodule must return a table containing one function named generate. This function has two parameters, params and entry, and it does not return any value.
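
A minimal sketch of such a submodule might look like the following. The rule shown (handling a "diminutive" form tag) is invented for illustration, and the parameter layout of the definition template is an assumption; only the overall shape (a table with a generate function that modifies entry in place and returns nothing) comes from the description above.

```lua
local export = {}

-- Called once per generated entry. `entry` already holds the defaults,
-- so only the fields that need to differ are overwritten.
function export.generate(params, entry)
	if params.form == "diminutive" then
		-- Hypothetical override: use a dedicated form-of template.
		entry.def = "{{diminutive of|" .. params.lang .. "|" .. params.origin .. "}}"
	end
	-- No return value: the caller reads the modified `entry` table.
end

-- A real submodule would end with: return export
```

Forms that the function does not recognise are left untouched, so the default rules still apply to them.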

Params

The params parameter is a table that contains the information about the lemma, the form-of entry to be created, and the acceleration tags. It contains the following values:

lang
The language code of the language in question.
pos
The part of speech that the new entry is created for, e.g. "noun" or "verb". This is taken from whatever part-of-speech header preceded the template in the lemma entry. The default rule for creating the headword adds " form" onto this, resulting in e.g. {{head|hi|noun form}}.
target
The non-lemma form that the new entry is created for. This is taken automatically from the display form (alt) in the template's link to the form. It is used to give the head= parameter of the headword, if necessary (i.e. if different from target_pagename). This is the same as target_pagename in most cases, but can be different if the display form contains additional diacritics, as in languages such as Russian or Ukrainian (where the target will contain an acute accent marking the stress, if the word is more than one syllable) and Latin or Old English (where the target will contain macrons marking long vowels).
target_pagename
The page name of the entry to be created.
form
The name of the form. Normally this is an inflection code with the individual tags separated by pipe symbols, e.g. "1|s|pres|ind". Occasionally it may be something else like "comparative". This comes from the .accel.form field in a module invocation; the |accel-form= parameter in a call to {{l}}, {{m}} or similar; or the |fNaccel-form= parameter in a call to {{head}}. See WT:ACCEL for more information.
gender
The gender, or nil if no gender was explicitly given. This comes from the .accel.gender field in a module invocation; the |accel-gender= parameter in a call to {{l}}, {{m}} or similar; or the |fNaccel-gender= parameter in a call to {{head}}. See WT:ACCEL for more information.
transliteration
The transliteration of the non-lemma form, or nil if no transliteration was explicitly given. This comes from the .accel.translit field in a module invocation; the |accel-translit= parameter in a call to {{l}}, {{m}} or similar; or the |fNaccel-translit= parameter in a call to {{head}}. See WT:ACCEL for more information. Note that this will only be specified for languages that use a non-Latin script, and only when the auto-generated transliteration is insufficient, incorrect or nonexistent.
origin
The lemma that the new entry should link back to. This comes from the .accel.lemma field in a module invocation; the |accel-lemma= parameter in a call to {{l}}, {{m}} or similar; or the |fNaccel-lemma= parameter in a call to {{head}}. Under normal circumstances, none of these parameters are explicitly given, in which case the value of this field is the same as origin_pagename. (It will only differ from origin_pagename when the lemma contains additional diacritics that are stripped in order to generate the pagename, as in Latin, Russian, Ancient Greek or Old English.) See WT:ACCEL for more information.
origin_pagename
The page name of the lemma to link back to.
origin_transliteration
The transliteration of the lemma to link back to, or nil. This comes from the .accel.lemma_translit field in a module invocation; the |accel-lemma-translit= parameter in a call to {{l}}, {{m}} or similar; or the |fNaccel-lemma-translit= parameter in a call to {{head}}. The same considerations apply here as for transliteration above. See WT:ACCEL for more information.
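
To make the relationship between these fields concrete, here is a sketch of a params table for the Latin form bonīs, together with a hypothetical helper that derives the optional headword parameters in the same way as the default rule in the module source. The helper name is made up for this example.

```lua
-- Example params table, as it might look for the dative masculine plural
-- of Latin "bonus" (values chosen for illustration).
local params = {
	lang = "la",
	pos = "adjective",
	target = "bonīs",           -- display form, with macron
	target_pagename = "bonis",  -- page to be created, macron stripped
	form = "dat|m|p",
	gender = nil,
	transliteration = nil,
	origin = "bonus",
	origin_pagename = "bonus",
	origin_transliteration = nil,
}

-- Hypothetical helper: build the optional |head=, |tr= and |g= parameters.
local function extra_head_params(p)
	return (p.target ~= p.target_pagename and "|head=" .. p.target or "") ..
		(p.transliteration and "|tr=" .. p.transliteration or "") ..
		(p.gender and "|g=" .. p.gender or "")
end

local head = "{{head|" .. params.lang .. "|" .. params.pos .. " form" ..
	extra_head_params(params) .. "}}"
```

Because target differs from target_pagename here, the headword receives a head= parameter. If the two were identical and no gender or transliteration were given, the helper would return an empty string.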

Entry

The entry parameter is essentially the return value of the function. It is a table that contains the different parts of the entry that is being created. Some of them will already have a default value when the language-specific function is run, while others are nil by default. The purpose of the generation function for each language is to fill in these values, or override the defaults, so that the entry is generated according to what is needed for the language. The entry table contains the following values:

pronunc
The contents of the "Pronunciation" section, if any. Empty by default.
pos_header
The name of the level 3 part-of-speech header for the new entry. This does not usually need to be changed, as it automatically matches the part of speech of the main entry. But you can change it if, for example, you are generating a participle entry and you want to show "Participle" instead of "Verb".
head
The headword template code and all its parameters. By default, it uses {{head|(lang)|(pos) form}}, with head= and tr= as necessary. You need to override this if you need something else.
def
The definition line, without the initial # . By default, it uses {{inflection of|(lang)|(origin)||(form)}}, with tr= as necessary. You need to override this if you need something else.
inflection, declension, conjugation
The contents of the "Inflection", "Declension" and "Conjugation" sections respectively, if any. Empty by default. This can be used if the new entry is a sub-lemma with its own inflection, such as participles or comparative/superlative forms that inflect themselves.
mutation
The contents of the "Mutation" section. Empty by default. This appears at level 3 rather than level 4.
altforms
The contents of the "Alternative forms" section. Empty by default. This appears after the definitions (rather than before, which is more common) and after the sections above (as per WT:EL).
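
As an illustration of overriding several entry fields at once, the sketch below turns a participle form into a sub-lemma with its own header and declension table. Everything language-specific in it is hypothetical: the form tag "pres|part", the language code "xx" used in the test data, and in particular the template name xx-decl-part, which is made up for this example.

```lua
-- Hypothetical generate function for an imaginary language.
local function generate(params, entry)
	if params.form == "pres|part" then
		-- Show "Participle" instead of the lemma's "Verb" header.
		entry.pos_header = "Participle"
		-- Simplified headword; head= and tr= handling is omitted for brevity.
		entry.head = "{{head|" .. params.lang .. "|present participle}}"
		entry.def = "{{present participle of|" .. params.lang .. "|" .. params.origin .. "}}"
		-- "xx-decl-part" is a made-up template name used only in this sketch.
		entry.declension = "{{xx-decl-part|" .. params.target .. "}}"
	end
end
```

Fields that are left as nil, such as pronunc or mutation in this sketch, simply produce no section in the generated entry.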

local export = {}


function no_rule_error(params) -- Intentionally global; better way to do this?
	return error(('No rule for "%s" in language "%s".')
		:format(params.form, params.lang), 2)
end


messages = require("Module:array")() -- intentionally global


local function default_entry(params)
	local entry = {
		pronunc = nil,
		pos_header = mw.getContentLanguage():ucfirst(params.pos),
		head =
			"{{head|" .. params.lang .. "|" .. params.pos .. " form" ..
			(params.target ~= params.target_pagename and '|head=' .. params.target or "") ..
			(params.transliteration and "|tr=" .. params.transliteration or "") ..
			(params.gender and "|g=" .. params.gender or "") ..
			"}}",
		def =
			"{{inflection of|" ..
			params.origin ..
			(params.origin_transliteration and "|tr=" .. params.origin_transliteration or "") ..
			"||" .. params.form ..
			"|lang=" .. params.lang ..
			"}}",
		inflection = nil,
		declension = nil,
		conjugation = nil,
		mutation = nil,
	}
	
	-- Exceptions for some forms
	local templates = {
		["p"] = "plural of",
		["f"] = "feminine of",
		["n"] = "neuter of",
		["f|s"] = "feminine singular of",
		["m|p"] = "masculine plural of",
		["f|p"] = "feminine plural of",
	}
	
	if params.form == "comparative" or params.form == "superlative" or params.form == "equative" then
		entry.head =
			"{{head|" .. params.lang .. "|" .. params.pos .. " " .. params.form .. " form" ..
			(params.target ~= params.target_pagename and "|head=" .. params.target or "") ..
			(params.gender and "|g=" .. params.gender or "") ..
			"}}"
		entry.def =
			"{{" .. params.form .. " of" ..
			"|" .. params.origin ..
			(params.origin_transliteration and "|tr=" .. params.origin_transliteration or "") ..
			(params.pos ~= "adjective" and "|POS=" .. params.pos or "") ..
			"|lang=" .. params.lang ..
			"|nocat=1}}"
	elseif templates[params.form] then
		entry.def =
			"{{" .. templates[params.form] ..
			"|" .. params.origin ..
			(params.origin_transliteration and "|tr=" .. params.origin_transliteration or "") ..
			"|lang=" .. params.lang ..
			"}}"
	end
	
	return entry
end

-- Merges multiple entries into one if they differ only in the definition
local function merge_entries(entries)
	local entries_new = {}
	
	for i, entry in ipairs(entries) do
		local last_entry = entries_new[#entries_new]
		
		if last_entry and
			entry.pronunc == last_entry.pronunc and
			entry.pos_header == last_entry.pos_header and
			entry.head == last_entry.head and
			entry.inflection == last_entry.inflection and
			entry.declension == last_entry.declension and
			entry.conjugation == last_entry.conjugation and
			entry.mutation == last_entry.mutation then
			
			local params1 = mw.ustring.match(last_entry.def, "^{{inflection of|([^{}]+)}}$")
			local params2 = mw.ustring.match(entry.def, "^{{inflection of|([^{}]+)}}$")
			
			last_entry.def = last_entry.def .. "\n# " .. entry.def
			
			-- Do some extra-special merging with "inflection of"
			if params1 and params2 then
				-- Find the last unnamed parameter of the first template
				params1 = mw.text.split(params1, "|", true)
				local last_numbered_index
				
				for j, param in ipairs(params1) do
					if not mw.ustring.find(param, "=", nil, true) then
						last_numbered_index = j
					end
				end
				
				-- Add grammar tags of the second template
				params2 = mw.text.split(params2, "|", true)
				local tags = {}
				local n = 0
				
				for k, param in ipairs(params2) do
					if not mw.ustring.find(param, "=", nil, true) then
						n = n + 1
						
						-- Skip the first two unnamed parameters,
						-- which don't indicate grammar tags
						if n >= 3 then
							-- Now append the tags
							table.insert(tags, param)
						end
					end
				end
				
				-- Add the new parameters after the existing ones
				params1[last_numbered_index] = params1[last_numbered_index] .. "|;|" .. table.concat(tags, "|")
				last_entry.def = "{{inflection of|" .. table.concat(params1, "|") .. "}}"
			end
		else
			table.insert(entries_new, entry)
		end
	end
	
	return entries_new
end

local function entries_to_text(entries, lang)
	lang = require("Module:languages").getByCode(lang) or require("Module:languages").err(lang, "lang")
	
	for i, entry in ipairs(entries) do
		entry =
			(entry.pronunc and "===Pronunciation===\n" .. entry.pronunc .. "\n\n" or "") ..
			"===" .. entry.pos_header .. "===\n" ..
			entry.head .. "\n\n" ..
			"# " .. entry.def ..
			(entry.inflection and "\n\n====Inflection====\n" .. entry.inflection or "") ..
			(entry.declension and "\n\n====Declension====\n" .. entry.declension or "") ..
			(entry.conjugation and "\n\n====Conjugation====\n" .. entry.conjugation or "") ..
			(entry.mutation and "\n\n===Mutation===\n" .. entry.mutation or "")
		
		entries[i] = entry
	end
	
	return "==" .. lang:getCanonicalName() .. "==\n\n" .. table.concat(entries, "\n\n")
end

function export.generate(frame)
	local fparams = {
		lang            = {required = true},
		origin_pagename = {required = true},
		target_pagename = {required = true},
		num             = {required = true, type = "number"},
		
		pos                    = {list = true, allow_holes = true},
		form                   = {list = true, allow_holes = true},
		gender                 = {list = true, allow_holes = true},
		transliteration        = {list = true, allow_holes = true},
		origin                 = {list = true, allow_holes = true},
		origin_transliteration = {list = true, allow_holes = true},
		target                 = {list = true, allow_holes = true},
	}
	
	local args = require("Module:parameters").process(frame.args, fparams)
	
	local entries = {}
	
	-- Generate each entry
	for i = 1, args.num do
		local params = {
			lang = args.lang,
			origin_pagename = args.origin_pagename,
			target_pagename = args.target_pagename,
			
			pos = args.pos[i] or error("The argument \"pos\" is missing for entry " .. i),
			form = args.form[i] or error("The argument \"form\" is missing for entry " .. i),
			gender = args.gender[i],
			transliteration = args.transliteration[i],
			origin = args.origin[i] or error("The argument \"origin\" is missing for entry " .. i),
			origin_transliteration = args.origin_transliteration[i],
			target = args.target[i],
		}
		
		-- Convert HTML-escaped pipes (&#124;) in the form back to literal pipes
		params.form = mw.ustring.gsub(params.form, "&#124;", "|")
		
		-- Make a default entry
		local entry = default_entry(params)
		
		-- Try to use a language-specific module, if one exists
		local success, lang_module = pcall(require, "Module:accel/" .. args.lang)
		
		if success then
			lang_module.generate(params, entry)
		end
		
		-- Add it to the list
		table.insert(entries, entry)
	end
	
	-- Merge entries if possible
	entries = merge_entries(entries)
	entries = entries_to_text(entries, args.lang)
	
	return entries
end


function export.generate_JSON(frame)
	local success, entries = pcall(export.generate, frame)
	
	-- If success is false, entries is an error message.
	local ret = { [success and "entries" or "error"] = entries, messages = messages }
	
	return require("Module:JSON").toJSON(ret)
end


return export