Upgrading Wiktionary

From Meta, a Wikimedia project coordination wiki

Since the inception of Wiktionary, there have been many proposals (links) for adding metadata about terms, or adding structure to input screens to help unify format and content across the project. As the project split into separate language editions (links), this dilemma became more pronounced; each project developed its own style guidelines which were not necessarily compatible with those of other projects. (The ongoing debates about whether to enforce first-letter capitalization across all articles (links), and whether this should be a decision made separately by each language, are a fine example.)

Many Wiktionaries share data that is localised by means of templates. This is problematic because a known error is hard to correct in every wiktionary that shares the same content. Particularly painful are the names of languages in the many languages that do not capitalise them; these are wrong in many wiktionaries, and there is no easy way to fix them.

The intensive collaboration between the Italian and the Dutch wiktionaries in particular fed the need for a solution to the lack of coherence and cooperation. Without the hard work of Sabine Cretella in particular, we would not be contemplating the implementation of Ultimate Wiktionary.

The inception of Wiktionary marked another milestone in the history of Wikimedia -- it was the first fork of verifiable, neutral content away from Wikipedia (links). Since then, there have been many less-focused discussions about how and when to separate content into its own project or database; when this is a duplication of effort and when it makes good sense (links).

Some people maintain that all projects should be in a single database with an advanced set of namespaces and viewing options; more specific suggestions for reuniting the projects include a unified login scheme, unified or aggregate recentchanges, improved interproject/interlanguage links, and a special interface for translations (links).
The diametrically opposed position, that all projects should exist completely independently in all languages, without encouraging translation, without special interproject links, and without any sort of policy coordination, is not popular. However, one can find community members who maintain each of those positions to some degree.

Reducing repetition

One problem with maintaining content in multiple languages is that updates have to be repeated in many languages. For Wikipedia and most other Wikimedia projects, this is presently a secondary concern, because the human effort required to translate new changes exceeds the effort required to notify people that a change has occurred.

Exceptions to this rule are interwiki links, which should be [with rare exceptions] the same in all languages, for all articles linked together; there, adding a single link can require updating dozens of pages in different languages. Wiktionary is a far more striking example: if the English entry for a word includes its translation and definition in a dozen languages, the entries in those other languages will contain exactly the same content (save for the names of the languages involved, and terms like "noun", "obsolete" -- the metadata about parts of that entry).

As Wiktionary has become more multilingual, the pressure to recombine its databases to avoid such repetition has grown stronger.
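
The repetition described above can be illustrated with a minimal sketch, assuming a single shared entry whose language names and terms such as "noun" are stored as language-neutral keys and localised only when the entry is displayed. The `Entry` class, the `LABELS` table and the `render` function are hypothetical names chosen for this illustration; they are not part of MediaWiki or of any Wiktionary design.

```python
from dataclasses import dataclass, field

# Hypothetical label tables: the only data that differs per interface language.
LABELS = {
    "en": {"noun": "noun", "nl": "Dutch", "it": "Italian"},
    "nl": {"noun": "zelfstandig naamwoord", "nl": "Nederlands", "it": "Italiaans"},
}


@dataclass
class Entry:
    word: str
    language: str        # language code of the headword
    part_of_speech: str  # language-neutral key, e.g. "noun"
    translations: dict = field(default_factory=dict)  # language code -> word


def render(entry, ui_language):
    """Render one shared entry with labels in the reader's language."""
    labels = LABELS[ui_language]
    lines = [f"{entry.word} ({labels[entry.language]}, {labels[entry.part_of_speech]})"]
    for code, word in sorted(entry.translations.items()):
        lines.append(f"  {labels[code]}: {word}")
    return "\n".join(lines)


# The entry is stored once; only the labels change per interface language.
wine = Entry("wijn", "nl", "noun", {"it": "vino"})
print(render(wine, "en"))
print(render(wine, "nl"))
```

In this sketch, correcting the translation in `wine` corrects it for every interface language at once, and a wrongly capitalised language name is fixed in one place in `LABELS`.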

Metadata and structured data

MediaWiki, like all information platforms, supports metadata about its contents. It has long been recognized (links) that it could usefully support a much cleaner distinction between data and metadata, a richer set of metadata categories, even a finer granularity of content addressing to allow metadata to be associated with individual paragraphs, sections or revisions.

A few metadata projects have been proposed and worked on. The metadata in the current wiktionaries is used mainly for localisation; it does not really function as metadata at all.

Eloquence proposed adding support for structured data to the MediaWiki core codebase in 2003 (known as the Wikidata proposal). It was not originally conceived with Wiktionary as a target project; however some people (notably GerardM) saw this as a key step towards recombining the different-language Wiktionary projects in a single database.

External language organizations have also taken an interest in Wiktionary, and many are willing to donate their wordlists, thesauri and spelling dictionaries to Wiktionary or to synchronize them with it. Some want to keep their data synchronised as changes are made. The use of TBX, an open standard for lexicological content, is the method of choice to effect such a synchronisation.

Kennisnet, an organisation that supports Internet use in Dutch education, liked the idea of the Ultimate Wiktionary and provided a grant for the initial development of relational data in a MediaWiki environment.

GerardM was the contact for that grant, and the project manager for the related development; getting this grant, and finding receptive audiences in other language institutions, encouraged him to work in earnest on the many plans for the "Ultimate Wiktionary".

Ultimate Wiktionary design

The data model is intended to be general enough to cover all of the different potential uses of an online dictionary: as a content source for translation tools, as a spelling or pronunciation dictionary, as an etymological dictionary, etc. Then suggestions began to come in regarding which data formats to support and what kinds of content to include -- sign language entries, exports compatible with a variety of different dictionary standards.

As of this draft, the current data model, based on GerardM's original design with input from many others, contains 39 tables.


Wiktionary and ReVo

en I am adding this discussion, which is running on the Esperanto Wikipedia (eo-WP). In Esperanto we have ReVo, which provides thousands of words with definitions, translations into several languages and so on. Importing the content of ReVo into Wiktionary would be a good thing, but there are some problems.

My suggestion is this: each wiktionary should contain only the articles for words in its own language, with interwiki links to the corresponding articles in the other languages. The words of other languages should be in a Wiktionary commons that has only entries like vino {{es}} --> [[:es:vino]], so that the same information is not repeated in the wiktionary of each language. Arno Lagrange  08:00, 10 January 2006 (UTC)

eo (translated) The advantage of importing the content of ReVo into Wiktionary is that it enriches the Esperanto part of Wiktionary, which is generally used by non-Esperantists, whereas ReVo is in a way a purely Esperanto project. As with Wikipedia, it is a clever way to show the non-Esperanto world the value and richness of Esperanto. In fact the Ido Wiktionary has 30,000 articles, but more than 90% of them are simply articles about words in the most diverse national languages, because Wiktionary has such a stupid structure that every single national-language wiktionary repeats the words of all foreign languages. I have tried to fight this, because I think it wastes a terrible amount of time and simply keeps producing a mass of thoroughly incomplete pages. Surely this will have to be improved some day. This folly stems from the fact that the first creators of Wiktionary imagined it as an English dictionary that would make it possible to view the other languages from English, and at first they did not even imagine that there could be a wiktionary with a main language other than English. Yet such wiktionaries appeared and aped this folly of viewing all other languages from the standpoint of one language. The result is that, for the same sequence of letters, every single national-language wiktionary has a page that lists, more or less incompletely, what word that sequence of letters forms in the most diverse languages (I investigated this with the examples vin and vino). In my opinion each wiktionary should have only articles in its own national language, linked by interwiki to the corresponding articles in other languages, and the foreign-language words should be in a Wiktionary commons from which one could find the meaning of the various letter sequences in the most diverse languages.
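
The commons suggestion made in this section can be sketched roughly as follows, assuming each wiktionary contributes only the headwords of its own language and the commons stores nothing but pointers of the form vino {{es}} --> [[:es:vino]]. The function name `build_commons_index` and the sample headword list are assumptions made for this illustration.

```python
from collections import defaultdict


def build_commons_index(headwords):
    """Group (language_code, spelling) pairs into commons pointer lines.

    Each pointer only names the language and links to the home
    wiktionary, so no definition is repeated in the commons.
    """
    index = defaultdict(list)
    for code, spelling in headwords:
        pointer = spelling + " {{" + code + "}} --> [[:" + code + ":" + spelling + "]]"
        index[spelling].append(pointer)
    # Sort the pointers so that each commons page has a stable order.
    return {spelling: sorted(lines) for spelling, lines in index.items()}


# Illustrative headwords, each owned by the wiktionary of its own language.
headwords = [("es", "vino"), ("it", "vino"), ("eo", "vino"), ("fr", "vin")]
commons = build_commons_index(headwords)
for line in commons["vino"]:
    print(line)
```

Under this assumption the commons page for a spelling grows by one line per language, while the definition itself is maintained only in the wiktionary of that language.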

Questions

  • There is a problem in uk.wiktionary (sorry if I am reporting it in the wrong place). The special page uk:wikt:Special:Allpages erroneously lists many articles from the MediaWiki namespace in the main namespace. Ilya K 13:20, 29 October 2005 (UTC)
    Odd, that's because all the interface messages are in the main namespace. For example, uk:wikt:Usercssjs is actually the location of the interface message, and uk:wikt:MediaWiki:Usercssjs simply redirects there. Perhaps this was done to allow everyone to edit the interface?! – Minh Nguyễn (talk, contribs) 23:18, 7 February 2006 (UTC)
  • What is the relationship between completing the software development funded by Kennisnet, and launching a new project? There seems to be no delay between having a "third milestone" and declaring the project complete and open for business.
  • Where is there time for community feedback and discussion?
    I would like to see time given to a formal feedback phase, a period during which a unified database exists only for demonstration purposes, and feedback at all levels of detail is encouraged. Sj
    Feedback is encouraged. Feedback led to the many changes that can be observed in successive versions of the [Image:ERD.jpg data design].
    What is a formal feedback phase? We have never had one. The point of Ultimate Wiktionary is that it is used. What is the point of a "demonstration" if it means that the data cannot be used or changed? Using it is what people need to do to make up their minds about whether they want to work with UW or with the wiktionaries. GerardM
  • Are there provisions for writing user manuals? For teaching other developers how to use and update the software?
  • Is a unified wiktionary project, before it has attained consensus support in the Wiktionary community, considered a 'new' Wikimedia project? A prototype of an interface upgrade to a current project?
    Currently we need 5 people to start a project. These can be found. If people want to work on UW they can; if a community moves over to UW it can. There are, however, many Wiktionary communities, and many of them, particularly those for smaller languages, will have a bigger impact in a shared database than separately in a wiktionary.
    How might upgrades be rolled out?
    Upgrades to UW will be implemented like any other Mediawiki upgrade.
    Should the idea of upgrading Wiktionary go through the same process that ideas for completely new projects do?
  • How will the new Wiktionary/Wikidata code be merged into MediaWiki? Will it be part of MW 1.7? Will it be part of the release before it is used on an active Wikimedia project? Will it be used in prototypes before it is merged with the rest of MediaWiki?
    If the development is subsumed by normal MediaWiki development, this will happen automatically. If it is separate, how will updates to MW and this code be coordinated?
    Wikidata has always been intended as part of MW. GerardM
    • I cannot say which version of MW Wikidata will be included in; this obviously depends on the timetable. Gerard and I will meet in the near future to finalize some decisions in this regard. Inclusion in MW also obviously depends on the Release Manager (Brion), and other developers' opinions. However, that is very much the goal.--Eloquence 23:11, 9 September 2005 (UTC)
  • Do the procedures for vetting a new Wikimedia project apply to vetting a new face on an old project? It is currently sometimes possible for individual projects to choose what software features are turned on for them; does this apply to large-scale data schema changes as well?
    Ultimate Wiktionary is not a new face on an old project. It is essentially different and the change is disruptive. Content of all wiktionaries has to be converted to the UW. This will take a long time to accomplish. After UW is live and the Wiktionary data has been converted to UW, it will be possible to do the smooth installs and conversions that we favour. GerardM 00:54, 10 September 2005 (UTC)
  • The current design seems to remove some familiar software features, such as providing Recentchanges and talk pages separated by the editor's language, something users fought hard for when asking for separate language domains in the first place. How can these features be included? If they absolutely cannot be, what other workarounds are there? How important is this to current community members?
    • Filtering RC et al. by language is a desirable goal for UW and feasible, given that the language information is available. However, it is likely not going to be part of the first project release.--23:11, 9 September 2005 (UTC)
  • Will it be possible for an end-user to choose to view only, say, a Finnish-French-Finnish dictionary on the Finnish wikipedia? That is: a) No Finnish explanations in articles of Finnish words; b) Translation to only French in articles of Finnish words; c) No articles of Finnish words that don't have a French translation; d) Articles of French words only show the French meaning (see eg. wikt:train); e) Translation to only Finnish in articles of French words. -Samulili 14:04, 4 November 2005 (UTC)
    This sounds like an excellent use-case. A-B-A dictionaries; A->B translation dictionaries; A-A dictionaries... I am pretty sure the answer is "yes, the data structure will allow one to select such a view" -- but that no interface has yet been designed to filter out just such data and provide A-B-A dictionaries for any pair {A,B}. +sj | Translate the Quarto |+ 01:06, 5 November 2005 (UTC)
    The idea is that a user can select the language he is interested in. This means that you will find both the Finnish and the French words. If Finnish is chosen for the user interface, you will see the Finnish definitions. Showing the French definition (in translation, the fi and fr should have the same content!) is something that we want as well. When there is no definition at all, we are considering showing the definition in the current lingua franca. Creating selections like "fi with no fr translation" is something that we want, as it will help editors. It is however not likely to be in the first release of UW. GerardM 16:12, 5 November 2005 (UTC)
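
As a rough illustration of the A-B-A use-case discussed above, here is a minimal sketch of such a filter over an in-memory list of entries. The entry layout (plain dictionaries with `word`, `language` and `translations` keys) and the function name `bilingual_view` are assumptions made for this example; they do not reflect the actual UW data model or any existing interface. Condition (c) is applied symmetrically to both languages here, which is also an assumption.

```python
def bilingual_view(entries, a, b):
    """Select an A-B-A view from a shared list of entries.

    Keeps only headwords in language a or b, and for each one shows
    only its translation into the other language of the pair.
    """
    view = []
    for entry in entries:
        if entry["language"] == a:
            other = b
        elif entry["language"] == b:
            other = a
        else:
            continue  # headwords in any third language are left out
        translation = entry["translations"].get(other)
        if translation is None:
            continue  # no translation into the other language: skip the entry
        # Definitions are deliberately not copied into the view.
        view.append((entry["language"], entry["word"], translation))
    return view


# Illustrative data only.
entries = [
    {"word": "juna", "language": "fi", "translations": {"fr": "train", "en": "train"}},
    {"word": "train", "language": "fr", "translations": {"fi": "juna"}},
    {"word": "train", "language": "en", "translations": {"fi": "juna", "fr": "train"}},
    {"word": "sisu", "language": "fi", "translations": {"en": "grit"}},
]
finnish_french = bilingual_view(entries, "fi", "fr")
for language, word, translation in finnish_french:
    print(language, word, "->", translation)
```

The English headword and the Finnish word without a French translation drop out of the view, and the same function yields any other pair {A,B} by changing the two language codes.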

  • How does the Ultimate Wiktionary data structure compare to the English-only w:WordNet? Will it support semantic relations between "meanings", e.g. hypernym-hyponym relations?