A Comparative Study On Text Summarization Methods
Abstract:
With the advent of the Internet, the amount of data added online is increasing at an enormous rate. Businesses are looking for models that can extract useful information from this large volume of data, which is what gives our research its significance. Various statistical and NLP models are in use, and each of them is efficient in its own way. Here we make a comparative study to bring out their parameters and efficiency, and thus deduce when and how a particular model can best be used. Text summarization is the technique of automatically creating an abstract, or summary, of a text. The technique has been under development for many years.
Summarization is one of the research areas in NLP that concentrates on producing a meaningful summary using various NLP tools and techniques. Since a huge amount of information is used across the digital world, automatic summarization techniques are highly essential. Extractive and abstractive summarization are the two summarization techniques available. A lot of research work is being carried out in this area, especially in extractive summarization. The techniques compared here are text summarization with statistical scoring, the linguistic method, the graph-based method, and artificial intelligence.
…for text summarization. The biggest challenge of abstractive summarization is the representation problem. Systems' capabilities are constrained by the richness of their representations and their ability to generate such structures: systems cannot summarize what their representations cannot capture.

II. LITERATURE SURVEY

Automatic text summarization arose in the fifties and became important; early work suggested weighting the sentences of a document as a function of high-frequency words [13], disregarding the very high-frequency common words. One well-known and widely used statistical model of text is latent Dirichlet allocation (LDA) [5], which is a latent-variable mixture model where a document is modeled as a mixture over T clusters known as topics. Informally, a topic is a semantically focused set of words. Formally, LDA represents a topic as a probability vector, or distribution, over the words in a vocabulary. Thus, a topic about "football" would give high probability to words such as "football", "quarterback", "touchdown", etc. and low (or zero) probability to all other non-football-related words. Similarly, a topic about "traveling" would give high probability to traveling-related words, and low (or zero) probability to non-traveling-related words (a minimal LDA sketch is given after Section C below). The following methods are also used to determine sentence weights.

A. Supervised classification

The vast majority of existing work on sentence classification employs a supervised learning approach. Common classifiers include conditional random fields, naive Bayes classifiers, support vector machines, hidden Markov models and maximum entropy models.

The scope of the task refers to whether classification is performed on the abstract sentences only, which is thought to be an easier task since fewer sentence types occur in the abstract, or on the entire text of the article. Alternatively, other past work has focused on a specific section within the article [2]. The second aspect in which past work differs is the annotation scheme, i.e. the set of labels used for classification. The most basic annotation scheme is modelled after the scientific method: aim, method, results, conclusion [1].

B. Semi-supervised and unsupervised classification

Guo et al. [3] use four semi-supervised classifiers for sentence classification: three variants of the support vector machine and a conditional random field model. The semi-supervised classifiers either (1) start with a small set of labeled data and choose, at each iteration, additional unlabeled data to be labeled and added to the training set (known as active learning), or (2) include the unlabeled data in the classifier formulation with an estimate of, or distribution over, the unknown labels. They perform sentence classification on biomedical abstracts using a version of the Argumentative Zones annotation scheme developed specifically for biology articles. They present experiments using only 100 labeled abstracts (approximately 700 sentences) to train the different classifiers.

Wu et al. [4] use a hidden Markov model to label sentences in scientific abstracts. They first label a set of 106 abstracts (709 sentences). They use the labeled data to extract pairs of words from sentences that are strong indicators of a particular label. They then use these word pairs and the labeled sentences to train a hidden Markov model. Again, we use less labeled data than Wu et al. Also, the annotation scheme used by Wu et al. (based on the scientific method) differs from the annotation scheme used in this paper.

C. Annotation scheme

We use an annotation scheme that is derived from Argumentative Zones (AZ) [5]. There are five labels in our annotation scheme: own, contrast, basis, aim and miscellaneous. The AZ annotation scheme includes one additional label, textual, which describes sentences that discuss the structure of the article, e.g. "In Section 3, we show that...". We removed the label textual because it was not of obvious use for other applications. We also collapsed two of the labels in AZ, neutral and other, into one label, miscellaneous. The label neutral describes sentences that refer to past work in a neutral way. The label other describes…
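To make the supervised setting of Sections A-C concrete, the following is a minimal sketch of a naive Bayes sentence classifier over the five labels of the annotation scheme, using scikit-learn. The toy sentences and their labels are invented placeholders for illustration, not data or results from this study or the cited works.

# Hedged sketch: bag-of-words naive Bayes sentence classification.
# The training sentences and labels below are illustrative placeholders only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_sentences = [
    "In this paper we propose a new summarization model.",      # aim
    "Unlike previous work, our method needs no parse trees.",   # contrast
    "We build on the lexical chain algorithm of earlier work.", # basis
    "Our system scores each sentence with tf-idf weights.",     # own
    "Summarization has been studied for several decades.",      # miscellaneous
]
train_labels = ["aim", "contrast", "basis", "own", "miscellaneous"]

# Tf-idf features feed a multinomial naive Bayes model, i.e. a classifier of
# the form P(label | features), as in the Bayesian scoring shown later.
classifier = make_pipeline(TfidfVectorizer(stop_words="english"),
                           MultinomialNB())
classifier.fit(train_sentences, train_labels)
print(classifier.predict(["We aim to compare extractive summarization methods."]))

In practice such a classifier would be trained on the annotated abstracts described above rather than on a handful of hand-written sentences.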
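Similarly, the LDA model described at the start of this survey, in which each topic is a probability distribution over the vocabulary, can be sketched with scikit-learn. The toy documents and the choice of T = 2 topics are assumptions made only for this illustration.

# Hedged sketch: fitting an LDA topic model; each row of lda.components_
# is one topic's (unnormalized) distribution over the vocabulary.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the quarterback threw a touchdown in the football game",
    "football fans cheered the quarterback after the touchdown",
    "we booked flights and hotels for traveling around europe",
    "traveling by train is a relaxed way to see new cities",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)              # document-term counts
lda = LatentDirichletAllocation(n_components=2,      # T = 2 topics
                                random_state=0).fit(counts)

vocab = vectorizer.get_feature_names_out()
for topic_id, weights in enumerate(lda.components_):
    top_words = [vocab[i] for i in weights.argsort()[::-1][:5]]
    print("topic", topic_id, ":", top_words)

With such a corpus, one topic would be expected to place its high-probability mass on football-related words and the other on traveling-related words, mirroring the informal description above.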
2) Title method
This method states that sentences containing words that appear in the title are considered to be more important and are more likely to be included in the summary. The score of a sentence is calculated from how many words it has in common with the title. The title method cannot be effective if the document does not include any title information.
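A minimal sketch of this title-overlap score follows; the lowercasing and whitespace tokenization are assumptions of the illustration rather than details prescribed by the method.

# Hedged sketch: score a sentence by counting the words it shares with the title.
def title_score(sentence, title):
    title_words = set(title.lower().split())
    sentence_words = set(sentence.lower().split())
    return len(sentence_words & title_words)

# Example: two overlapping words ("summarization", "methods") give a score of 2.
print(title_score("Extractive summarization methods are compared here",
                  "A Comparative Study On Text Summarization Methods"))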
3) tf-idf method
The term frequency-inverse document
frequency is a numerical statistic which reflects
how important a word is to a document. It is often
used as a weighting factor in information retrieval
and text mining. tf-idf is mainly used for stop-word filtering in text summarization and categorization applications. The tf-idf value
increases proportionally to the number of times a
word appears in the document. The tf-idf weighting scheme is often used by search engines as a central tool in scoring and ranking a document's relevance given a user query.

Fig. 1: Text Summarization
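As a rough illustration of this weighting (the underlying term-frequency and document-frequency definitions are spelled out just below), the sketch ranks the sentences of a toy document by the sum of their tf-idf term weights using scikit-learn. Treating each sentence as its own document for the idf statistic is a simplifying assumption of this example, not a requirement of tf-idf.

# Hedged sketch: score each sentence by the sum of the tf-idf weights of its terms.
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "Text summarization condenses a document into a short summary.",
    "The cat sat quietly on the warm windowsill.",
    "Extractive summarization selects important sentences from the document.",
]

vectorizer = TfidfVectorizer(stop_words="english")
weights = vectorizer.fit_transform(sentences)   # sentence-by-term tf-idf matrix
scores = weights.sum(axis=1).A1                 # one aggregate score per sentence

for score, sentence in sorted(zip(scores, sentences), reverse=True):
    print(round(score, 2), sentence)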
The term frequency f(t, d) is the raw frequency of a term in a document, that is, the number of times that term t occurs in document d. The inverse document frequency is a measure of whether the term is common or rare across all documents. It is obtained by dividing the total number of documents by the number of documents containing the term.

4) Cue word method
Weight is assigned to text based on its significance, with positive weights for cue phrases such as "verified", "significant", "best" and "this paper", and negative weights for phrases such as "hardly" and "impossible". Cue phrases are usually genre dependent. Sentences containing such cue phrases can be included in the summary. The cue phrase method is based on the assumption that such phrases provide a "rhetorical" context for identifying important sentences. The source abstraction in this case is a set of cue phrases and the sentences that contain them. All of the above statistical features are used by extractive text summarization.

Bayesian Classifier:

P(s ∈ S | F1, F2, ..., Fk) = P(F1, F2, ..., Fk | s ∈ S) P(s ∈ S) / P(F1, F2, ..., Fk)

where F1, ..., Fk are the features of a sentence s and s ∈ S denotes that s is included in the summary S.

Another class of techniques is the linguistic method, which involves semantic processing for summarization. Linguistic approaches have some difficulties in using high-quality linguistic analysis tools (a discourse parser, etc.) and linguistic resources (WordNet, Lexical Chain, Context Vector Space, etc.).

C) Lexical chain
The concept of lexical chains was first introduced by Morris and Hirst. Basically, lexical chains exploit the cohesion among an arbitrary number of related words. Lexical chains can be computed in a source document by grouping (chaining) sets of words that are semantically related. Identities, synonyms, and hypernyms/hyponyms are the relations among words that might cause them to be grouped into the same lexical chain.

Lexical chains are used for IR and for grammatical error correction. In computing lexical chains, the noun instances must be grouped according to the above relations, but each noun instance must belong to exactly one lexical chain. There are several difficulties in determining which lexical chain a particular word instance should join. Words must be grouped so as to create the strongest and longest lexical chains.
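As a rough sketch of this chaining idea, the snippet below greedily groups nouns whose first WordNet synsets are identical, synonymous, or in a direct hypernym/hyponym relation. The greedy single-pass strategy and the use of only the first synset are simplifying assumptions of the illustration, not the Morris and Hirst algorithm itself.

# Hedged sketch: greedy lexical chaining over WordNet relations
# (identity/synonymy and direct hypernymy/hyponymy).
# Requires the WordNet data, e.g. nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def related(word_a, word_b):
    # True if the first noun synsets of the two words are the same
    # or stand in a direct hypernym/hyponym relation.
    syns_a = wn.synsets(word_a, pos=wn.NOUN)
    syns_b = wn.synsets(word_b, pos=wn.NOUN)
    if not syns_a or not syns_b:
        return False
    a, b = syns_a[0], syns_b[0]
    return a == b or b in a.hypernyms() or a in b.hypernyms()

def build_chains(nouns):
    # Assign each noun to the first chain that already holds a related noun.
    chains = []
    for noun in nouns:
        for chain in chains:
            if any(related(noun, member) for member in chain):
                chain.append(noun)
                break
        else:
            chains.append([noun])   # no related chain found: start a new one
    return chains

print(build_chains(["car", "automobile", "vehicle", "summary", "document"]))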
WordNet is a lexical network that contains lists of words grouped together according to similarity of meaning. Semantic relations between the words are represented by synonym sets and hyponym trees. WordNet is used for building lexical chains according to these relations. WordNet contains more than 118,000 different word forms. LexSum is a summarization system which uses WordNet.

D) Graph-based method
• Edge weights w(u, v) define a measure of pairwise similarity between nodes u, v
• Based on similarity
• Based on Corpus

Fig. 2: Example of Graph-based Representations

Relevance factors: Keywords/Alert-Words, Word Density Ratio, Document Structure, Individual Word count (excluding stop-words), Homogeneity index
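The graph-based representation in Fig. 2 can be illustrated with a small TextRank-style sketch: sentences become nodes, the cosine similarity of their tf-idf vectors gives the edge weights w(u, v), and PageRank scores the nodes. The use of networkx and tf-idf cosine similarity here is an assumption of this sketch, not a detail taken from the figure or the cited systems.

# Hedged sketch: graph-based sentence ranking. Nodes are sentences, edge
# weights are pairwise tf-idf cosine similarities, and PageRank scores the
# nodes; the top-scoring sentences would form an extractive summary.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "Text summarization produces a short version of a document.",
    "Graph based methods rank sentences with PageRank style scores.",
    "Edge weights measure the similarity between pairs of sentences.",
    "Completely unrelated filler text about cooking pasta at home.",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
similarity = cosine_similarity(tfidf)            # w(u, v) for every sentence pair

graph = nx.Graph()
graph.add_nodes_from(range(len(sentences)))
for u in range(len(sentences)):
    for v in range(u + 1, len(sentences)):
        if similarity[u, v] > 0:                 # keep only non-zero-weight edges
            graph.add_edge(u, v, weight=similarity[u, v])

scores = nx.pagerank(graph, weight="weight")
for node in sorted(scores, key=scores.get, reverse=True):
    print(round(scores[node], 3), sentences[node])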