V4i4 0338
net/publication/319090129
All content following this page was uploaded by Bhuvaneshwari chndrashekharayya Melinamath on 13 August 2017.
Abstract— A morphological generator produces the morphological forms of nouns and verbs from a given root word.
This paper describes a method for generating these word forms. Nouns in Kannada are inflected for gender,
number and vibhakti (case). There are 2 numbers (singular and plural), 3 genders (feminine, masculine and neuter),
6 cases, another set of 6 extended case-like suffixes, and 4 clitics. A single noun root has around 250 morphological
forms in Kannada, whereas a noun root in English has only 2 forms, for example boy and boys. This shows the complexity
of the Kannada language. Similarly, a single verb root in Kannada has around 30000 different morphological forms,
whereas a verb root in English has 5 forms. Morphological generators are useful components of Machine
Translation (MT) applications. The input to the generator is a root word followed by its category and either the
required inflection types or a request for all inflections. The generator consults the morph dictionary for any
morphological information stored for the root, such as 'real u', the past participle form of a verb, kinship terms,
and countable and uncountable features. This information keeps the generator from over-generalizing, which would
otherwise produce invalid word forms. Accuracy is above 95% for nouns and pronouns and around 85% for verbs.
We have also handled derivational morphology.
Keywords— Natural Language Processing (NLP), Part of Speech (POS), Expert Advisory Group on Language
Engineering Standards (EAGLES), Machine Translation (MT).
I. INTRODUCTION
Morphology is the study of the internal structure and transformational processes of words. Words are formed from a
combination of one or more free morphemes and zero or more bound morphemes. Free morphemes are units of meaning
which can stand on their own as words. Bound morphemes are also units of meaning; however, they cannot occur as words on
their own: they can only occur in combination with free morphemes. The English word jumped is comprised of two
morphemes, jump+ed. Since jump is an individual unit of meaning which cannot be broken down further into smaller
units of meaning, it is a morpheme. And, since jump can occur on its own as a word in the language, it is a free
morpheme. The unit +ed can be added to a large number of English verbs to create the past tense. Since +ed has meaning,
and since it cannot be segmented into smaller units, it is a morpheme. However, +ed can only occur as a part of another
word, not as a word on its own; therefore, it is a bound morpheme. The process by which bound morphemes are added to
free morphemes can often be described using a word formation rule.
Both analysis and generation rely on two sources of information: a dictionary of valid lemmas of the language and a
set of inflectional paradigms. The basic principle of morphological generation is to derive forms from a root and a set of
features (lexical category and morphological properties). Generally, there are two categories of approaches to developing
a morphological generator: approaches that use finite-state transducers (FSTs), such as the Xerox Arabic analyzer
(Beesley, 2003), and approaches that use rule-based transformations (Cavalli-Sforza, 2000).
We have developed a morphological generator for Kannada nouns and verbs. Nouns in Kannada are inflected for
gender, number and vibhakti (case). There are 2 numbers (singular and plural), 3 genders (feminine, masculine and
neuter), 6 cases, another set of 6 extended case-like suffixes, and 4 clitics. A single noun root has around 250
morphological forms in Kannada. Similarly, a single verb root in Kannada has around 30000 different morphological
forms. Morphological generators are useful components of Machine Translation (MT) applications. The input to the
generator is a root word followed by its category and either the required inflection types or a request for all
inflections. The generator consults the morph dictionary for any morphological information stored for the root. We have
developed a dictionary of more than 30000 words for this purpose. The dictionary is tagged with a hierarchical tag set.
We have also handled derivational morphology.
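
The generator input and dictionary lookup described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `MorphEntry`, `GenerationRequest` and `allowed_inflections`, the feature strings, the sample words, and the rule that an uncountable noun gets no plural are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MorphEntry:
    """One morph-dictionary record (hypothetical layout)."""
    root: str
    category: str                       # e.g. "noun" or "verb"
    features: frozenset = frozenset()   # e.g. {"real_u", "kinship", "uncountable"}


@dataclass(frozen=True)
class GenerationRequest:
    """Generator input: root word, category, and the inflections wanted."""
    root: str
    category: str
    inflections: tuple = ()             # empty tuple means "all inflections"


def allowed_inflections(request, dictionary, all_inflections):
    """Return the inflection types to generate after consulting the dictionary."""
    entry = dictionary.get((request.root, request.category))
    if entry is None:
        raise KeyError(f"{request.root!r} ({request.category}) is not in the morph dictionary")
    wanted = request.inflections or tuple(all_inflections)
    # Assumed rule: an uncountable noun has no plural, so that form is skipped.
    if "uncountable" in entry.features:
        wanted = tuple(i for i in wanted if i != "plural")
    return wanted


# Illustrative data, not taken from the paper's dictionary.
DICTIONARY = {
    ("mara", "noun"): MorphEntry("mara", "noun"),
    ("nIru", "noun"): MorphEntry("nIru", "noun", frozenset({"uncountable"})),
}
NOUN_INFLECTIONS = ("singular", "plural")

print(allowed_inflections(GenerationRequest("mara", "noun"), DICTIONARY, NOUN_INFLECTIONS))
print(allowed_inflections(GenerationRequest("nIru", "noun"), DICTIONARY, NOUN_INFLECTIONS))
```

Keeping the restriction in the dictionary entry rather than in the suffixation rules is one way to stop the generator from over-generating without special-casing individual roots in code.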
The remainder of the paper comprises 5 sections. Section 2 surveys related work in this area, section 3 describes
Kannada morphology, section 4 describes the proposed method, section 5 reports performance, and section 6
concludes.
II. LITERATURE SURVEY
A lot of work has been carried out on the development of morphological analyzers and generators for foreign languages
and also for a few Indian languages. The following is the gist of the works cited in the literature. One of the most
efficient approaches to morphological analysis and generation uses finite-state transducers (FSTs), as addressed by
(Mohri, 1997). There are a number of tools for the construction of FST-based morphological analyzers, the best known
being those developed at
This Kannada word form is derived by adding 9 suffixes to the root verb tappu (wrong). The formation of this verb
form is Verb-intransitive-causative-Verbal participle-reflexive + infinitive +modal (CAP) +PRO.avaru+accusative+clitic
indefinite.
The noun boy in English has at most 2 inflected forms, boy and boys, but the same word in Kannada has around
250 forms. All these morphosyntactic aspects of the complex morphology have to be accounted for during generation.
Hence designing a morphological generator for the Kannada language is quite challenging.
A. Noun Morphology
Nouns in Kannada may be distinguished for gender, number, case and clitics.
Gender:
Nouns referring to biologically female beings are feminine in gender, those referring to biologically male beings are
masculine, and nouns that are not thought to be 'rational' (capable of thought) are treated as neuter. Sometimes young
children are treated as non-rational.
Number:
Kannada nouns are distinguished by two numbers, singular and plural. The singular has no distinguishing
marker. The plural marker 'gaLu' is used for neuter nouns and 'ru' is used for others. As an exception, 'gaLu' is
also used as the plural marker for some human nouns whose gender is not specific, as in
prajegaLu 'citizens'. Usually all masculine and feminine nouns ending in 'a', 'i', 'e', and consonant followed by enunciate
Case system:
The case system indicates the different relationships between the noun and the other constituents of the sentence,
for example whether the noun is the 'object' of a verb (in which case it is marked for accusative case), the 'goal' of
a verb of motion (dative case), the possessor of something (genitive case), or the means by which something takes place
(ablative case), etc.
DERIVATIVE STEMS: These are formed by adding various kinds of derivative suffixes to the verbal or nonverbal
stem. Consider the following examples.
Derived Stem (1): tinnu 'eat' + /alu/ = tinnalu (to eat); tinnu is the basic stem
COMPLEX VERBS: These are formed by adding various kinds of modals to the primitive and derived stems. These
can be further subdivided into compound verbs, conjunct verbs, modal verbs and aspectual verbs. Kannada has at least
four nonfinite forms of verbs. Like the participial constructions, verbal participles, relative participles, they are
aspectually distinguished.
TABLE 3 VERB FEATURES AND CHARACTERISTICS SUFFIXES
A. Transliteration Module: This module converts raw Kannada text into roman form using a converter
program. The converter accepts input in either ISCII or Unicode and transliterates the
text as per our lerc notation. The input file or word should be an uninflected root.
B. RESOURCES: To develop a generator we need the following resources.
Dictionary: A dictionary contains roots/stems, categories, exceptions, morph-relevant information, etc.
Orthographic Rule Set: Morphological generators use concatenation when they generate a word form,
so we must create a set of concatenation rules for the generator according to the language. The rules
change with the word ending, as shown in table 1, and capture the alternation forms (spelling changes) of
morphemes according to the context in which they appear.
Stem ending                     a    E    I    o    U (real)    U (enunciative)    Consonant
Glide insertion for rational    n    Y    Y    n    v           u dropped          N
Glide for non rational          v    Y    Y    v    v           u dropped          -
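
As a rough illustration of how the glide rules of table 1 could drive concatenation, the sketch below encodes the vowel-ending columns as a lookup and joins a vowel-initial suffix to a stem. It is a sketch under stated assumptions, not the paper's code: the names `GLIDES` and `attach_suffix`, the class labels `U_real` and `U_enunciative`, the lowercase spelling of the glide y, and the sample stems and suffix are introduced here for the example. The consonant column is left out because its entries are not unambiguous in the table.

```python
# Glide consonants per stem-ending class, transcribed from table 1.
# Tuple order: (rational noun, non-rational noun).
# Assumption: the glide printed as "Y" in the table is written "y" here.
GLIDES = {
    "a": ("n", "v"),
    "E": ("y", "y"),
    "I": ("y", "y"),
    "o": ("n", "v"),
    "U_real": ("v", "v"),
}


def attach_suffix(stem, ending_class, rational, suffix):
    """Join a vowel-initial suffix to a stem using the rule for its ending class."""
    if ending_class == "U_enunciative":
        # Enunciative u is dropped before the suffix; no glide is inserted.
        if not stem.endswith("u"):
            raise ValueError(f"{stem!r} does not end in u")
        return stem[:-1] + suffix
    if ending_class not in GLIDES:
        raise ValueError(f"no rule encoded for ending class {ending_class!r}")
    glide = GLIDES[ending_class][0 if rational else 1]
    return stem + glide + suffix


# Illustrative stems and suffix, chosen for the example rather than taken from the paper.
print(attach_suffix("huDuga", "a", True, "annu"))
print(attach_suffix("mara", "a", False, "annu"))
print(attach_suffix("kADu", "U_enunciative", False, "annu"))
```

The ending class is passed in explicitly rather than inferred from the spelling, since telling a real u from an enunciative u requires the dictionary information mentioned earlier.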
REFERENCES
[1] Martin Kay. 1987. Nonconcatenative finite-state morphology. In Proceedings of the Third Conference of the
European Chapter of the Association for Computational Linguistics, pp. 2–10.
[2] L. Kataja and K. Koskenniemi. 1988. Finite state description of Semitic morphology. In COLING-88: Papers
Presented to the 12th International Conference on Computational Linguistics, volume 1, pp. 313–15.
[3] Kimmo Koskenniemi. 1983. Two-level Morphology: A General Computational Model for Word-Form
Recognition and Production. Ph.D. Thesis, Department of General Linguistics, University of Helsinki, Helsinki,
Finland.
[4] Jean-Pierre Chanod. 1994. Finite state composition of French verb morphology. Technical Report MLTT-005,
Xerox Research Centre Europe, Meylan, France.
[5] Lauri Karttunen. 1993. Finite state lexicon compiler. Technical Report ISTL-NLTT-1993-04-02, Xerox Palo Alto
Research Center, Palo Alto, California.
[6] K. Beesley. 1996. Arabic finite-state morphological analysis and generation. In Proceedings of the 16th
International Conference on Computational Linguistics (COLING-96), volume 1, pp. 89–94, Copenhagen, Denmark.
[7] T. Buckwalter. 2002. Arabic Morphological Analyzer Version 1.0. Linguistic Data Consortium,
University of Pennsylvania. LDC Catalog No.: LDC2002L49.
[8] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Reading, MA:
Addison-Wesley.
[9] Mehryar Mohri. 1997. Finite-state transducers in language and speech processing. Computational Linguistics
23(2):269–311.
[10] Jose Oncina, Pedro García, and Enrique Vidal. 1993. Learning subsequential transducers for pattern recognition
interpretation tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence 15:448–458.
[11] John McCarthy. 1981. A prosodic theory of nonconcatenative morphology. Linguistic Inquiry, 12(3):373–418.
[12] G. Kiraz. 1994. Multi-tape Two-level Morphology: A Case Study in Semitic Non-Linear Morphology. In Proceedings of
COLING '94, Vol. 1, pp. 180–186.
[13] V. Cavalli-Sforza, A. Soudi, and T. Mitamura. 2000. Arabic Morphology Generation Using a Concatenative Strategy.
In Proceedings of NAACL 2000, Seattle, WA.
[14] Nizar Habash. 2004. Large scale lexeme based Arabic morphological generation. In Proceedings of Traitement
Automatique du Langage Naturel (TALN-04), Fez, Morocco.