Morphological Process
Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology (SIGMORPHON2012), pages 10–16,
Montréal, Canada, June 7, 2012. © 2012 Association for Computational Linguistics
by Niraj and Robert extracts a set of suffix replacement rules from a
corpus and a dictionary. The rules are applied to an inflected

of this language have been developed. There are derivational
analyzers for other Indian languages like Marathi (Ashwini Vaidya,
2009) and Kannada (Bhuvaneshwari C Melinamath et al., 2011). The
Marathi morphological analyzer was built using a paradigm-based
approach, whereas the Kannada analyzer was built using an FST-based
approach. For English, there are some important works on
derivational morphological analysis (Woods, 2000; Hoeppner, 1982).
However, both of these are lexicon-based.
For our work, we employed a set of suffix replacement rules and a
dictionary in our derivational analyzer, drawing on Porter's stemmer
(Porter, 1980) and the K-stemmer (Krovetz, 1993), which are among
the most cited stemmers in the literature. The primary goal of
Porter's stemmer is suffix stripping: given a word as input, it
strips all the suffixes in the word to produce a stem. It does so in
five steps, applying rules at each step. The Krovetz stemmer removes
the inflectional suffixes of a word in three steps: it first converts
the plural form to the singular, then converts past tense to present
tense, and finally removes -ing. As a last step, it checks the
dictionary for any recoding and returns the stem. Our algorithm uses
the main principles of both stemmers: its suffix replacement rules
resemble those of Porter's stemmer, and a segment of the algorithm is
analogous to the dictionary-based approach of the Krovetz stemmer.
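The combination described above, suffix replacement rules in the manner of Porter's stemmer followed by a Krovetz-style dictionary check, can be sketched as follows. This is only a minimal illustration and not the analyzer's actual implementation: the rules, the toy English lexicon, and the function name derive_root are all assumptions made for this example.

```python
# Hypothetical suffix replacement rules: (suffix to strip, replacement).
# Longer suffixes are listed first so that they are tried first.
RULES = [
    ("iness", "y"),  # happiness -> happy
    ("ness", ""),    # kindness  -> kind
    ("ion", "e"),    # creation  -> create
    ("ly", ""),      # quickly   -> quick
]

# Toy dictionary of known roots (an assumption for this sketch).
LEXICON = {"happy", "kind", "create", "quick"}


def derive_root(word, rules=RULES, lexicon=LEXICON):
    """Return (root, suffix) for the first rule whose result is in the
    lexicon, or None when no rule yields a known root."""
    for suffix, replacement in rules:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Dictionary check: accept the candidate only if it is a known root.
            if candidate in lexicon:
                return candidate, suffix
    return None


if __name__ == "__main__":
    for w in ["happiness", "kindness", "creation", "business"]:
        print(w, "->", derive_root(w))
```

In this sketch "business" yields no analysis, because neither "busy" nor "busi" is in the toy lexicon; the dictionary check is what keeps a rule from firing on a word that merely ends in a listed suffix.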
3 Existing Inflectional Hindi Morphological Analyzers
analysis (by old morph) of 50 words is perfectly equal. Let there be
10 words which belong to Type-6. It means that the old morphological
analyzer could not analyze 10 words but there is gold analysis of
those words. In this way, each type forms an important part of the
evaluation process. Similarly, we evaluate the analysis of the 100
words by the derivational analyzer. Finally, we compare the
evaluations of the old morphological analyzer and our derivational
analyzer. This is the evaluation methodology.

A gold data set consisting of the analyses of 5000 words was used. It
was built by linguistic experts at IIIT Hyderabad and acquired from
that institution. Both the derivational analyzer and the inflectional
analyzer were tested on these 5000 words. Table 6 shows that the new
derivational analyzer performs better than the old morphological
analyzer. The old analyzer could not provide any output for 288 words
(Type-6), whereas that number is only 31 in the case of the
derivational analyzer. As a result of this improvement, the overall
Type-1 figure (perfect output, completely matching the gold output)
of our derivational analyzer is nearly 5% higher than that of the old
morphological analyzer. The data size is small (only 5000 words).
Testing on a larger gold data set would give a better picture of the
improvement that can be achieved by the derivational analyzer.

Table 5: Output analysis of old morph analyzer
Type    Number of instances    % of Type
Type1
Type2
Type3
Type4
Type5
Type6

Table 6: Output analysis of derivational analyzer
Type    Number of instances    % of Type
Type1
Type2
Type3
Type4
Type5
Type6

6 Conclusions
We presented an algorithm which uses an existing inflectional
analyzer for performing derivational analysis. The algorithm uses the
main principles of both Porter's stemmer and the Krovetz stemmer to
achieve the task. It achieves decent precision and recall, and it
also expands the coverage of the inflectional analyzer. However, it
must be incorporated in applications that use derivational analysis,
such as machine translation systems, before its real strengths and
limitations can be understood.

References
Claudia Gdaniec, Esmé Manandise, Michael C. McCord. 2001. Derivational morphology to the rescue: how it can help resolve unfound words in MT, pp. 129–131. Summit VIII: Machine Translation in the Information Age, Proceedings, Santiago de
Compostela, Spain.
Jesus Vilares, David Cabrero and Miguel A. Alonso. 2001. Applying Productive Derivational Morphology to Term Indexing of Spanish Texts. In Proceedings of CICLing.
Vishal Goyal, Gurpreet Singh Lehal. 2008. Hindi Morphological Analyzer and Generator, pp. 1156–1159. IEEE Computer Society Press, California, USA.
Niraj Aswani, Robert Gaizauskas. 2010. Developing Morphological Analysers for South Asian Languages: Experimenting with the Hindi and Gujarati Languages. In Proceedings of LREC.
Ashwini Vaidya. 2009. Using paradigms for certain morphological phenomena in Marathi. In Proceedings of ICON.
Bhuvaneshwari C Melinamath, Shubhagini D. 2011. A robust Morphological analyzer to capture Kannada noun Morphology, VOL 13. IPCSIT.
William A. Woods. 2000. Aggressive Morphology for Robust Lexical Coverage. In Proceedings of ANLC.
Wolfgang Hoeppner. 1982. A multilayered approach to the handling of word formation. In Proceedings of COLING.
R. Krovetz. 1993. Viewing morphology as an inference process. In Proceedings of COLING.
M. F. Porter. 1980. An algorithm for suffix stripping. Originally published in Program, 14 no. 3, pp 130-137.
Bharati Akshar, Vineet Chaitanya, Rajeev Sangal. 1995. Natural Language Processing: A Paninian Perspective. Prentice-Hall of India.
Amba P Kulkarni. 2010. A Report on Evaluation of Sanskrit Tools.