Automatic acquisition of semantic collocation from corpora

S Sekine, J Tsujii - Machine Translation, 1995 - Springer
Abstract
The real difficulty in developing practical NLP systems is that we do not know in advance what actual instances of knowledge should be used in the application system, even though we know in advance what types of knowledge are required. An effective method for extracting linguistic knowledge from corpora is needed. We propose automatic linguistic knowledge acquisition from sublanguage corpora. The system combines existing linguistic knowledge and human intervention with corpus-based techniques. The algorithm involves a “Gradual Approximation”, which works to converge linguistic knowledge gradually towards desirable results. We conducted three experiments. The first experiment revealed the characteristic of this algorithm, and the others proved its effectiveness on a real corpus. The results show the algorithm is promising, though some problems remain: the practical problem of the parameters, the problem of extending the formalism to include more linguistic features, and the combination with other linguistic clues for further development. We would like to continue the research to perform further experiments and to improve the algorithm.