Ontology Based Word Sense Disambiguation
Selected Papers, pp. 134-141. INCOMA Ltd., Shoumen, Bulgaria, December 2006. ISBN 978-954-91743-6-6.
an XML document (XCES compliant format4), which is the standard input for most of our tools.

The word alignment process is preceded by a coarser-grained alignment, namely the sentence alignment, which transforms a parallel text <TL1 TL2> into a sequence of pairs of one or more sentences in language L1 (SL11 SL12...SL1k) and one or more sentences in language L2 (SL21 SL22...SL2m) so that the two ordered sets of sentences represent reciprocal translations. Such a pair is called a translation alignment unit (or translation unit). In the vast majority of cases a translation unit contains one sentence per language (this is called a 1-1 translation unit).

We developed a sentence aligner [3] inspired by Moore's aligner (Moore, 2002) which, unlike it, is able to detect sentence alignments that are not necessarily 1-1 and can process arbitrarily large parallel data. It has a comparable precision but a better recall than Moore's aligner. Our aligner does not need a priori language-specific information, its parameters being set by a training phase on a small amount of human-checked alignment data (about 1000 sentences).

The sentence aligner consists of a hypothesis generator, which creates a list of plausible sentence alignments from the parallel corpus, and a filter, which removes the improbable alignments. The filter is an SVM binary classifier (Fan et al., 2005) initially trained on a Gold Standard. The features of the initial SVM model are: the word sentence length, the non-word sentence length, and the rank correlation for the first 25% of the most frequent words in the two parts of the training bitext. This model is used to preliminarily filter the alignment hypotheses generated from the parallel corpus. The set of the remaining aligned sentences is used as the input for an EM algorithm which builds a word translation equivalence table by an approach similar to the IBM Model-1 procedure. The SVM model is rebuilt (from the Gold Standard), this time including, as an additional feature, the number of word translation equivalents existing in the sentences of a candidate alignment pair. This new model is used by the SVM classifier for the final sentence alignment of the parallel corpus.

2.2 Two Aligners and Their Combination

The word alignment of a bitext is an explicit representation of the pairs of words <wL1 wL2> (called translation equivalence pairs) co-occurring in the same translation units and representing mutual translations. The general word alignment problem includes the cases where words in one part of the bitext are not translated in the other part (these are called null alignments) and the cases where multiple words in one part of the bitext are translated as one or more words in the other part (these are called expression alignments).

We developed two quite different word aligners, motivated by two distinct objectives: the first one, called YAWA (Tufiş et al., 2005), was motivated by a project aiming at the development of an interlingually aligned set of wordnets, while the other one, called MEBA (Tufiş et al., 2005), was developed within an ongoing SMT project. The first one was used for validating, against a multilingual corpus, the interlingual synset equivalences and also for WSD experiments. Although initially it was concerned only with open-class words recorded in a wordnet, turning it into an "all words" aligner was not a difficult task.

YAWA is a three-stage lexical aligner that uses bilingual translation lexicons and phrase boundary detection to align the words of a given bitext. The translation lexicons are generated by a different module, TREQ (Tufiş, 2002; Tufiş et al., 2003), which generates translation equivalence hypotheses for the pairs of words (one from each language of the parallel corpus) which have been observed to occur in aligned sentences more often than expected by chance. The hypotheses are filtered by a log-likelihood score threshold. Several heuristics (string similarity/cognates, POS affinities and alignment locality5) are used in a competitive linking manner (Melamed, 2001) to extract the most likely translation equivalents.

YAWA generates a bitext alignment by incrementally adding new links to those created at the end of the previous stage. The existing links act as contextual restrictors for the newly added links. From one phase to the other, new links are added without deleting anything. This monotonic process requires a very high precision (at the price of a modest recall) for the first step. The next two steps are responsible for significantly improving the recall and ensuring an increased F-measure.

A quite different approach from the one used by YAWA is implemented in our second word aligner, called MEBA. It is a multiple-parameter and multiple-step algorithm using relevance thresholds specific to each parameter, but different from one step to the other.

4 http://www.cs.vassar.edu/XCES/
5 The alignment locality heuristic exploits the observation made by several researchers that adjacent words of a text in the source language tend to align to adjacent words in the target language. A stricter alignment locality constraint requires that all alignment links starting from a chunk in one language end in a chunk in the other language.
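The competitive linking step used to extract translation equivalents can be sketched as follows. This is a minimal illustration of the greedy one-to-one selection idea (Melamed, 2001), not the TREQ/YAWA implementation; the association scores and word pairs are invented stand-ins for the log-likelihood filtered hypotheses.

```python
# Competitive linking sketch: repeatedly accept the highest-scoring
# candidate pair whose source and target words are both still unlinked.
# Scores and pairs below are illustrative, not real TREQ output.

def competitive_linking(candidates):
    """candidates: list of (score, src_word, tgt_word) hypotheses."""
    links, used_src, used_tgt = [], set(), set()
    for score, src, tgt in sorted(candidates, reverse=True):
        if src not in used_src and tgt not in used_tgt:
            links.append((src, tgt, score))
            used_src.add(src)
            used_tgt.add(tgt)
    return links

pairs = [(12.4, "house", "casa"), (9.1, "house", "verde"),
         (11.0, "green", "verde"), (3.2, "green", "casa")]
print(competitive_linking(pairs))
```

The greedy discipline ensures each word participates in at most one link, so a strong association ("house"/"casa") blocks the weaker competing hypothesis ("house"/"verde").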
The implementation of MEBA was strongly influenced by the famous five IBM models described in the seminal paper of Brown et al. (1993). We used GIZA++ (Och & Ney, 2000; Och & Ney, 2003) to estimate different parameters of the MEBA aligner.

MEBA is an iterative algorithm that takes advantage of all the pre-processing phases mentioned at the beginning of Section 2.

The alignment model considers a link between two candidate words as an object described by a feature-value structure (with values in the [0,1] interval) which we call the reification of the link. We differentiate between context-independent features that refer only to the tokens of the current link (translation equivalency, part-of-speech affinity, cognates, etc.) and context-dependent features that refer to the properties of the current link with respect to the rest of the links in a bitext (locality, number of traversed links, token index displacement, collocation). Also, we distinguish between bidirectional features (translation equivalency, part-of-speech affinity) and non-directional features (cognates, locality, number of traversed links, collocation, index displacement).

2.3 COWAL: The Combined Aligner

The Combined Word Aligner, COWAL, is a wrapper of the two aligners (YAWA and MEBA), merging the individual alignments and filtering the result. At the Shared Task on Word Alignment organized by the ACL2005 Workshop on "Building and Using Parallel Corpora: Data-driven Machine Translation and Beyond" (Martin et al., 2005), we participated (on the Romanian-English track) with the two aligners and the combined one (COWAL). Out of 37 competing systems, COWAL (Tufiş et al., 2005) was rated the first, MEBA the 20th, and TREQ-AL (Tufiş, 2002; Tufiş et al., 2003), the former version of YAWA, was rated the 21st. The usefulness of the aligner combination was convincingly demonstrated. Meanwhile, both the individual aligners and their combination were significantly improved.

One very simple but very effective method of alignment combination is a heuristic procedure which merges the alignments produced by two or more word aligners and filters out the links that are likely to be wrong. For the purpose of filtering, a link is characterized by its type, defined by the pair of indexes (i,j) and the POS of the tokens of the respective link. The likelihood of a link is proportional to the POS affinity of the tokens of the link and inversely proportional to the bounded relative position (BRP) of the respective tokens: BRP = 1 + ||i − j| − avg|, where avg is the average displacement, in a Gold Standard, of the aligned tokens with the same POSes as the tokens of the current link. From the same Gold Standard we estimated a threshold below which a link is removed from the final alignment.

A more elaborate alignment combination (with better results than the previous one) is modelled as a binary statistical classification problem (good/bad) and, as in the case of the previous method, the net result is the removal of the links which are likely to be wrong. We used the SVM training and classification toolkit LIBSVM (Fan et al., 2005) with the default parameters (C-SVC classification and radial basis kernel function). The classifier was trained with positive and negative examples of links. A subset of the Gold Standard alignment links was used as the set of positive examples. The same number of negative examples was extracted from the alignments produced by COWAL and MEBA where they differ from the Gold Standard.

The result of the SVM-based combination (COWAL), compared with the individual aligners, is shown in Table 1.

Aligner   P        R        F-measure
YAWA      88.80%   74.83%   81.22%
MEBA      92.15%   73.40%   81.71%
COWAL     87.26%   80.94%   83.98%
Table 1: Combined alignment

COWAL is now embedded into a larger platform (called MTkit) that incorporates the tools for bitext pre-processing, a graphical interface that allows for comparing and editing different alignments, as well as a word sense disambiguation module. A snapshot of the COWAL graphical interface is shown in Figure 1.

Figure 1: COWAL Graphical User Interface
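The heuristic link filtering described above can be sketched as follows, assuming an illustrative POS-affinity table and average-displacement values; the real values are estimated from the Gold Standard and are not reproduced here.

```python
# Reified-link scoring sketch: the likelihood of a link is proportional
# to the POS affinity of its tokens and inversely proportional to the
# bounded relative position BRP = 1 + ||i - j| - avg|, where avg is the
# average displacement observed for that POS pair in a Gold Standard.
# The affinity table and avg values below are invented for illustration.

POS_AFFINITY = {("noun", "noun"): 0.80, ("noun", "verb"): 0.05}
AVG_DISPLACEMENT = {("noun", "noun"): 1.5}

def brp(i, j, avg):
    # bounded relative position of the two token indexes
    return 1 + abs(abs(i - j) - avg)

def link_score(i, pos_src, j, pos_tgt):
    affinity = POS_AFFINITY.get((pos_src, pos_tgt), 0.01)
    avg = AVG_DISPLACEMENT.get((pos_src, pos_tgt), 0.0)
    return affinity / brp(i, j, avg)

# a noun pair close to its expected displacement scores high:
print(link_score(4, "noun", 5, "noun"))   # 0.8 / (1 + |1 - 1.5|)
# a distant noun-verb link is heavily penalized:
print(link_score(4, "noun", 20, "verb"))  # 0.05 / (1 + |16 - 0|)
```

Links whose score falls below the threshold estimated from the Gold Standard would then be removed from the merged alignment.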
The left pane in Figure 1 is the alignment viewer and editor area. The user can edit the alignments (delete and add one or multiple links). By double-clicking a word in this pane, its properties are automatically displayed in the right-hand windows. The upper-right window shows the lexico-syntactic properties of the selected word: the morphological analysis of the orthographic form, its lemma, and the syntactic chunk to which it belongs. Currently this pane is not editable. The bottom-right window displays the semantic properties of the selected word: its sense in the current context, the gloss for this sense, synonyms, hyperonyms, derivatives, etc. These properties are extracted from the wordnet of the language to which the selected word belongs. This pane is editable, but only the sense number is subject to user modifications.

Although far from being perfect, the accuracy of word alignment technology and of the translation lexicons extracted from parallel corpora is rapidly improving. In the shared task evaluations of different word aligners, organized on the occasion of the 2003 NAACL Conference and the 2005 ACL Conference, our winning systems TREQ-AL (Tufiş et al., 2003) and COWAL (Tufiş et al., 2006) produced wordnet-relevant lexicons6 with F-measures as high as 84.26% and 89.92%7.

6 Wordnet-relevant lexicons are restricted to translation pairs of the same major POS (nouns, verbs, adjectives and adverbs).
7 Currently, with the most recent improvements, COWAL's F-measure is 92.08%.

3 WN-based Sense Disambiguation

The task of word sense disambiguation (WSD) requires a reference sense inventory in terms of which the senses of the target words will be labeled. We argued at length elsewhere (Tufiş & Ion, 2004) that a meaningful discussion of the performance of a WSD system cannot dispense with clearly specifying the sense inventory it uses, and that the comparison between two WSD systems that use different sense inventories is frequently more confusing than illuminating. Essentially, this is because the differences in the semantic distinctions (sense granularities) used by different semantic dictionaries (sense repositories) make the difficulty of the WSD task range over a large spectrum.
For instance, the discrimination of homographs (more often than not having different parts of speech, e.g. "(to) bottle" as storing liquids or gases in bottles, versus "bottle" as the recipient) is much simpler than metonymic distinctions (e.g. "bottle" as container, versus "bottle" as content).

In our research, we used the Princeton WordNet 2.0 (PWN) as the major sense inventory and the BalkaNet multilingual lexical ontology. By observing the interlingual synset mapping principle and incorporating most of the conceptual extensions proposed by EuroWordNet, the BalkaNet wordnets can be easily combined with any of the other semantic networks of EuroWordNet and, thus, one may speak about a truly pan-European multilingual lexical ontology, covering at least 15 languages8.

The BalkaNet multilingual environment took advantage of the latest developments in the PWN, which was itself adopted as the interlingual index. This is a major difference with respect to EuroWordNet's ILI. As the SUMO/MILO and DOMAINS classifications have both been aligned with PWN, they automatically became available in each monolingual wordnet of BalkaNet. To allow the representation of language-idiosyncratic properties, structural knowledge present in the monolingual wordnets has precedence over the structural knowledge imported from the ILI. As the Romanian wordnet (Tufiş et al., 2006) imported the SUMO/MILO and DOMAINS labels and its synset unique identifiers are the same as in the PWN, it is self-contained but at the same time it can be directly plugged into a PWN-centered multilingual wordnet infrastructure.

Once the translation equivalents are identified, it is reasonable to expect that the words of a translation pair <wiL1, wjL2> share at least one conceptual meaning stored in an interlingual sense inventory. When interlingually aligned wordnets are available (as in our case), obtaining the sense labels for the words in a translation pair is straightforward: one has to identify for wiL1 the synset SiL1 and for wjL2 the synset SjL2 so that SiL1 and SjL2 are projected over the same interlingual concept. The index of this common interlingual concept (ILI) is the sense label of the two words wiL1 and wjL2. However, it is possible that no common interlingual projection will be found for the synsets to which wiL1 and wjL2 belong. In this case, the senses of the two words will be given by the indexes of the most similar interlingual concepts corresponding to the synsets of the two words. Our measure of semantic similarity between interlingual concepts is based on the PWN structure. We compute the semantic similarity score by the formula SYM(ILI1, ILI2) = 1/(1+k), where k is the number of links from ILI1 to ILI2 or from both ILI1 and ILI2 to the nearest common ancestor9.

After the WSD process has finished, the sense information is inserted into the XML encoding of the corpus. Which sense inventory (ILI, SUMO or DOMAINS) should be used in the encoding is a user-set parameter, which by default includes all of them.

<tu id="Ozz20">
 <seg lang="en">
  <s id="Oen.1.1.4.9">
   <w lemma="the" ana="Dd">The</w>
   <w lemma="patrol" ana="Ncnp" sn="3" oc="Group" dom="military">patrols</w>
   <w lemma="do" ana="Vais">did</w>
   <w lemma="not" ana="Rmp" sn="1" oc="not" dom="factotum">not</w>
   <w lemma="matter" ana="Vmn" sn="1" oc="SubjAssesAttr" dom="factotum">matter</w>
   <c>,</c>
   <w lemma="however" ana="Rmp" sn="1" oc="SubjAssesAttr|PastFn" dom="factotum">however</w>
   <c>.</c>
  </s>
 </seg>
 <seg lang="ro">
  <s id="Oro.1.2.5.9">
   <w lemma="şi" ana="Crssp">Şi</w>
   <w lemma="totuşi" ana="Rgp" sn="1" oc="SubjAssesAttr|PastFn" dom="factotum">totuşi</w>
   <c>,</c>
   <w lemma="patrulă" ana="Ncfpry" sn="1.1.x" oc="Group" dom="military">patrulele</w>
   <w lemma="nu" ana="Qz" sn="1.x" oc="not" dom="factotum">nu</w>
   <w lemma="conta" ana="Vmii3p" sn="2.x" oc="SubjAssesAttr" dom="factotum">contau</w>
   <c>.</c>
  </s>
 </seg>
 …
</tu>
Figure 2: The final corpus encoding

Figure 2 shows the final encoding of one translation unit of the "1984" parallel corpus. The "sn" attribute represents the Princeton WordNet 2.0 unique synset identifier (ILI code), the "oc" attribute represents the SUMO ontology concept, and the "dom" attribute represents the DOMAINS label.

8 Basque, Bulgarian, Catalan, Dutch, Czech, English, Estonian, French, German, Greek, Italian, Romanian, Serbian, Spanish, and Turkish.
9 For a detailed discussion and an in-depth analysis of several other measures see: Budanitsky, A., Hirst, G., Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Proceedings of the Workshop on WordNet and Other Lexical Resources, NAACL, Pittsburgh, June 2001, pp. 29-34.
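The SYM score defined in Section 3 can be sketched over a toy hypernym hierarchy. The hierarchy and concept names below are invented for illustration; a real implementation would walk the interlingual relations of the aligned wordnets rather than this hard-coded table.

```python
# Interlingual-concept similarity sketch: SYM(ILI1, ILI2) = 1 / (1 + k),
# where k is the number of links separating the two concepts via their
# nearest common ancestor. The toy is-a hierarchy below is invented and
# assumes both concepts sit under a common root.

HYPERNYM = {  # child -> parent links of a tiny is-a hierarchy
    "bottle": "vessel", "vessel": "container",
    "jar": "vessel", "box": "container",
}

def ancestors(ili):
    """Map each concept on the path to the root to its distance from ili."""
    path, dist = {ili: 0}, 0
    while ili in HYPERNYM:
        ili, dist = HYPERNYM[ili], dist + 1
        path[ili] = dist
    return path

def sym(ili1, ili2):
    up1, up2 = ancestors(ili1), ancestors(ili2)
    # k = fewest links to a shared ancestor, counted from one or both sides
    k = min(up1[a] + up2[a] for a in up1 if a in up2)
    return 1 / (1 + k)

print(sym("bottle", "bottle"))  # identical concepts: k = 0, SYM = 1.0
print(sym("bottle", "jar"))     # siblings under "vessel": k = 2, SYM = 1/3
```

When no common interlingual projection exists, the WSD procedure picks the synset pair maximizing this score, as described above.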
4 WSD Evaluation

The BalkaNet version of the "1984" corpus is encoded as a sequence of uniquely identified translation units. For the evaluation purposes, we selected a set of frequent English words (123 nouns and 88 verbs) the meanings of which were also encoded in the Romanian wordnet. The selection considered only polysemous words (at least two senses per part of speech), since POS-ambiguous words are irrelevant, this distinction being solved with high accuracy (more than 99%) by our present tiered tagger (Ceauşu, 2006). All the occurrences of the target words were disambiguated by three independent experts who negotiated the disagreements and thus created a gold-standard annotation for the evaluation of the precision and recall of the WSD algorithm. Table 2 summarizes the results.

Precision   Recall    F-measure
78.21%      78.21%    78.21%
Table 2: WSD precision, recall and F-measure

With the PWN senses identified (synset unique identifiers), sense labeling with either the SUMO and/or the IRST DOMAINS inventories is trivial, as described before, because the synset unique identifiers of PWN are already mapped (clustered) onto these two sense inventories. Table 3 shows a great variation in terms of precision, recall and F-measure when sense inventories of different granularities are considered for the WSD problem. Thus, it is important to make the right choice of the sense inventory to be used with respect to a given application.

Sense Inventory       Precision   Recall    F-measure
PWN (115424 cat.)     78.21%      78.21%    78.21%
SUMO (2066 cat.)      85.08%      85.08%    85.08%
DOMAINS (163 cat.)    93.30%      93.30%    93.30%
Table 3: Evaluation of the WSD in terms of three different sense inventories

In the case of a document classification problem, it is very likely that the IRST DOMAINS labels (or a sense inventory of similar granularity) would suffice. The rationale is that the IRST domains are directly derived from the Universal Decimal Classification as used by most libraries and librarians. The SUMO sense labeling will definitely be more useful in an ontology-based intelligent system interacting through a natural language interface. Finally, the most refined sense inventory, that of PWN, will be extremely useful in Natural Language Understanding systems, which require deep processing. Such a fine inventory would also be highly beneficial in lexicographic and lexicological studies.

Similar findings on sense granularity for the WSD task are discussed in (Stevenson & Wilks, 1998), where even higher precisions are reported for some coarser-grained inventories. However, we are not aware of better results in WSD exercises where the PWN sense inventory was used. The major explanation for this is that, unlike the majority of work in WSD, which is based on monolingual environments, we use the cross-lingual translations of the occurrences of the target words for the definition of sense contexts. The way one word in context is translated into one or more other languages is a very accurate and highly discriminative knowledge source for the decision-making.

5 Conclusions

Word alignment is a highly promising technology with real prospects of soon reaching the full maturity and reliability needed by commercial applications. Among them, one could mention multilingual computational lexicography and terminology, multilingual document indexing and retrieval, open-domain natural language question answering and, obviously, machine translation. We described another application, WSD, which is not an end in itself, but is necessary at one level or another to accomplish most natural language processing tasks.

Neither YAWA nor MEBA needs an a priori bilingual dictionary, as this is automatically extracted by TREQ or GIZA++. We evaluated the individual alignments in both experimental settings: without a startup bilingual lexicon and with an initial mid-sized bilingual lexicon. Surprisingly enough, we found that while the performance of YAWA increases a little (an increase of approx. 1% in F-measure), MEBA does better without an additional lexicon. So, in the evaluation presented in the previous section, MEBA uses only the training data vocabulary. The automatically extracted lexicons can be almost 100% accurate (with a sufficiently high occurrence threshold), which is obviously a very good starting point in compiling bilingual dictionaries for language pairs where such electronic resources are not easily available.
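Schematically, the effect of inventory granularity on the scores can be seen by rescoring the same fine-grained system output under a coarser label mapping: errors that confuse two fine senses sharing a coarse label disappear. The sense identifiers, the sense-to-domain mapping and the answer sets below are invented for illustration and are not the paper's data.

```python
# Why coarser sense inventories yield higher scores: fine-grained sense
# confusions may collapse to the same coarse label. All mappings and
# answers below are invented examples, not real PWN/DOMAINS data.

DOMAIN_OF = {"bottle#1": "factotum", "bottle#2": "factotum",
             "bottle#3": "gastronomy"}

def precision(system, gold, mapping=None):
    """Fraction of answers matching the gold, optionally after mapping
    both to a coarser inventory."""
    if mapping:
        system = [mapping[s] for s in system]
        gold = [mapping[g] for g in gold]
    hits = sum(s == g for s, g in zip(system, gold))
    return hits / len(gold)

gold   = ["bottle#1", "bottle#2", "bottle#3"]
system = ["bottle#2", "bottle#2", "bottle#3"]  # one fine-grained error

print(precision(system, gold))             # 2/3 on fine-grained senses
print(precision(system, gold, DOMAIN_OF))  # 3/3 on coarse domain labels
```

Since every occurrence receives a label here, precision equals recall and F-measure, mirroring the identical columns of Tables 2 and 3.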
The results in Table 3 show that although we used the same WSD algorithm on the same text, the performance scores (precision, recall, F-measure) varied significantly, with more than 15% difference between the best (DOMAINS) and the worst (PWN) F-measures. This is not surprising, but it shows that it is extremely difficult to objectively compare and rate WSD systems working with different sense inventories.

The potential drawback of this approach is that it relies on the existence of parallel data and of at least two aligned wordnets, which might not be available yet. Nevertheless, parallel resources are becoming increasingly available, in particular on the World Wide Web, and aligned wordnets are being produced for more and more languages (currently there are more than 40 ongoing wordnet projects for 37 languages). In the near future it should be possible to apply our and similar methods to large amounts of parallel data and a wide spectrum of languages.

Acknowledgements. The reported work is the result of several years of intensive research at our institute. Many people deserve acknowledgements here, but special mention is due to Radu Ion, Alin Ceauşu, Dan Ştefănescu, Verginica Barbu-Mititelu and Elena Irimia, currently preparing their PhD theses on topics directly or closely related to those discussed in this paper.

References

Brown, P. F., Della Pietra, S. A., Della Pietra, V. J., Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), pp. 263-311.

Ceauşu, Al. (2006). Maximum Entropy Tiered Tagging. In Janneke Huitink & Sophia Katrenko (eds.), Proceedings of the Eleventh ESSLLI Student Session, June 20, 2006, Malaga, Spain, pp. 173-179.

Ceauşu, Al., Ştefănescu, D., Tufiş, D. (2006). Acquis Communautaire sentence alignment using Support Vector Machines. In Proceedings of the 5th LREC Conference, Genoa, Italy, 22-28 May 2006, pp. 2134-2137.

Dimitrova, L., Erjavec, T., Ide, N., Kaalep, H., Petkevic, V., Tufiş, D. (1998). Multext-East: Parallel and Comparable Corpora and Lexicons for Six Central and East European Languages. In Proceedings ACL-COLING'1998, Montreal, Canada, pp. 315-319.

Fan, R., Chen, P. H., Lin, C. J. (2005). Working set selection using the second order information for training SVM. Technical report, Department of Computer Science, National Taiwan University (www.csie.ntu.edu.tw/~cjlin/papers/quadworkset.pdf).

Fellbaum, Ch. (ed.) (1998). WordNet: An Electronic Lexical Database. MIT Press.

Ide, N., Veronis, J. (2001). Introduction to the special issue on word sense disambiguation: The state of the art. Computational Linguistics, 27(3), pp. 1-40.

Ion, R. (2006). Methods for Automatic Disambiguation. Applications for Romanian and English. PhD Thesis, Romanian Academy, Bucharest, Romania, 145 p. (in Romanian).

Magnini, B., Cavaglià, G. (2000). Integrating Subject Field Codes into WordNet. In Proceedings of LREC2000, Athens, Greece, pp. 1413-1418.

Martin, J., Mihalcea, R., Pedersen, T. (2005). Word Alignment for Languages with Scarce Resources. In Proceedings of the ACL2005 Workshop on "Building and Using Parallel Corpora: Data-driven Machine Translation and Beyond", Ann Arbor, Michigan, June 2005. Association for Computational Linguistics, pp. 65-74.

Melamed, D. (2001). Empirical Methods for Exploiting Parallel Texts. Cambridge, MA: MIT Press.

Mihalcea, R., Pedersen, T. (2003). An Evaluation Exercise for Word Alignment. In Proceedings of the HLT/NAACL Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond, Edmonton, Canada, May 2003.

Moore, R. (2002). Fast and Accurate Sentence Alignment of Bilingual Corpora. In Machine Translation: From Research to Real Users, Proceedings of the 5th Conference of the Association for Machine Translation in the Americas, Tiburon, California. Springer-Verlag, Heidelberg, Germany, pp. 135-144.

Niles, I., Pease, A. (2001). Towards a Standard Upper Ontology. In Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Ogunquit, Maine, 17-19.

Och, F. J., Ney, H. (2000). Improved Statistical Alignment Models. In Proceedings of ACL2000, Hong Kong, China, pp. 440-447.

Och, F. J., Ney, H. (2003). A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1), pp. 19-51.

Stevenson, M., Wilks, Y. (1998). The Interaction of Knowledge Sources in Word Sense Disambiguation. Computational Linguistics, 24(1), pp. 321-350.

Tufiş, D., Ide, N., Erjavec, T. (1998). Standardized Specifications, Development and Assessment of Large Morpho-Lexical Resources for Six Central and Eastern European Languages. In Proceedings LREC'1998, Granada, Spain, pp. 233-240.

Tufiş, D. (1999). Tiered Tagging and Combined Classifiers. In F. Jelinek, E. Nöth (eds.), Text, Speech and Dialogue, Lecture Notes in Artificial Intelligence, Vol. 1692. Springer-Verlag, Berlin Heidelberg New York, pp. 28-33.

Tufiş, D. (2002). A cheap and fast way to build useful translation lexicons. In Proceedings of the 19th International Conference on Computational Linguistics, COLING2002, Taipei, 25-30 August 2002, pp. 1030-1036.

Tufiş, D., Barbu, A. M., Ion, R. (2003). A word-alignment system with limited language resources. In Proceedings of the NAACL 2003 Workshop on Building and Using Parallel Texts: Romanian-English Shared Task, Edmonton, pp. 36-39.

Tufiş, D. (ed.) (2004). Special Issue on BalkaNet. Romanian Journal on Science and Technology of Information, Vol. 7, no. 3-4, pp. 9-44.

Tufiş, D., Ion, R., Ceauşu, Al., Ştefănescu, D. (2005). Combined Aligners. In Proceedings of the ACL2005 Workshop on "Building and Using Parallel Corpora: Data-driven Machine Translation and Beyond", Ann Arbor, Michigan, June 2005. Association for Computational Linguistics, pp. 107-110.

Tufiş, D., Ion, R. (2005). Evaluating the word sense disambiguation accuracy with three different sense inventories. In Proceedings of the Natural Language Understanding and Cognitive Systems Symposium, Miami, Florida, May 2005, pp. 118-127.

Tufiş, D., Barbu-Mititelu, V., Bozianu, L., Mihăilă, C. (2006). Romanian WordNet: New Developments and Applications. In Proceedings of the 3rd Conference of the Global WordNet Association, Seogwipo, Jeju, Republic of Korea, January 22-26, 2006, pp. 337-344.

Tufiş, D., Ion, R., Ceauşu, Al., Ştefănescu, D. (2006). Improved Lexical Alignment by Combining Multiple Reified Alignments. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL2006), Trento, Italy, 3-7 April 2006, pp. 153-160.

Vossen, P. (ed.) (1998). A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers, Dordrecht.