Seminar ON: Natural Language Processing
Prepared by
NEHA S. SHAH
Roll No. 53
BE-IV (7th Sem) [CO]
Guide: Mita Parikh
Co-Guide: Mayuri Mehta
Natural Language Processing (NLP) is the process of designing and building software
that can analyze, understand and generate the languages that humans use.
INTRODUCTION
• DEFINITION
• ORIGIN
• GOAL
IMPLEMENTATION
PARSER
PROLOG
APPLICATIONS
LIMITATIONS
FUTURE
CONCLUSION
BIBLIOGRAPHY
INTRODUCTION
1. DEFINITION:
Natural Language Processing is a theoretically motivated range of computational
techniques for analyzing and representing naturally occurring texts at one or more levels
of linguistic analysis for the purpose of achieving human-like language processing for a
range of tasks or applications.
'Natural Language Processing' (NLP) is a convenient description for all attempts to use
computers to process Natural Language. NLP includes:
• Speech Synthesis
• Speech Recognition
• Natural Language Understanding
• Natural Language Generation
• Machine Translation (MT)
2. ORIGINS:
Research in natural language processing has been going on for several decades
dating back to the late 1940s. Machine translation (MT) was the first computer-based
application related to natural language.
3. GOAL:
• The choice of the word ‘processing’ is very deliberate, and it should not be replaced
with ‘understanding’. Although the field of NLP was originally referred to as Natural
Language Understanding (NLU) in the early days of AI, it is well agreed today that
while the goal of NLP is true NLU, that goal has not yet been accomplished.
• There are more practical goals for NLP, many related to the particular application
for which it is being utilized.
• For example, an NLP-based information retrieval (IR) system has the goal of providing
more precise, complete information in response to a user’s real information need.
• The goal of the NLP system here is to represent the true meaning and intent of the
user’s query, which can be expressed as naturally in everyday language as if the user
were speaking to a reference librarian.
• Also, the contents of the documents being searched will be represented at all their
levels of meaning, so that a true match between need and response can be found no
matter how either is expressed in its surface form.
[1] PHONOLOGICAL:
This level deals with the interpretation of speech sounds within and across words.
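There are, in fact, three types of rules used in phonological analysis: phonetic rules
for sounds within words, phonemic rules for variations of pronunciation when words are
spoken together, and prosodic rules for fluctuation in stress and intonation across a
sentence.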
In an NLP system that accepts spoken input, the sound waves are analyzed and
encoded into a digitized signal for interpretation by various rules or by comparison to the
particular language model being utilized.
[2] MORPHOLOGICAL:
It concerns how words are constructed from more basic meaning units called
morphemes. A morpheme is the primitive unit of meaning in a language.
E.g. the word "unhappiness" is built from the morphemes un-, happy and -ness.
Morphological analysis is relatively simple for English, but it is more difficult for
some languages such as Hindi and Turkish.
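The idea of splitting a word into morphemes can be illustrated with a minimal affix-stripping sketch in Python. It is not a real morphological analyzer: the affix lists PREFIXES and SUFFIXES and the function split_morphemes are hypothetical names chosen for this illustration, at most one prefix and one suffix are removed, and spelling changes (such as "happy" becoming "happi") are not undone.

```python
# Hypothetical affix lists, chosen only for illustration.
# "ness" is listed before "s" so that the longer suffix is tried first.
PREFIXES = ("un", "re", "pre")
SUFFIXES = ("ness", "ing", "ed", "s")


def split_morphemes(word):
    """Return (prefix, stem, suffix), stripping at most one known affix per side.

    An affix is only stripped if at least three characters of stem remain.
    """
    word = word.lower()
    prefix = next(
        (p for p in PREFIXES if word.startswith(p) and len(word) > len(p) + 2), ""
    )
    rest = word[len(prefix):]
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s) + 2), ""
    )
    stem = rest[: len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix


print(split_morphemes("unhappiness"))  # ('un', 'happi', 'ness')
print(split_morphemes("walked"))       # ('', 'walk', 'ed')
```

A language with richer morphology would need far more than two short affix lists, which is one way to see why the task is harder there than for English.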
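[3] LEXICAL:
A lexical analyzer reads the source program character by character and returns the
tokens of the source program. A token describes a pattern of characters that have the
same meaning in the source program.
For example, the statement newval := oldval + 12 contains the tokens:
newval    identifier
:=        assignment operator
oldval    identifier
+         addition operator
12        number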
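A lexical analyzer of this kind can be sketched with regular expressions. The following Python example is a minimal illustration, assuming a statement such as newval := oldval + 12 (the assignment that the semantic level refers to); the token names and the helper tokenize are hypothetical and not taken from any particular compiler.

```python
import re

# Token patterns, tried in order; the names are illustrative assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Scan the source left to right and return a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for kind, text in tokenize("newval := oldval + 12"):
    print(f"{text:8} {kind}")
```

Each pattern describes a class of character sequences, which is the sense in which a token "describes a pattern of characters".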
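[4] SYNTACTIC:
A syntax analyzer creates the syntactic structure (generally a parse tree) of the
given program. A syntax analyzer is also called a parser. A parse tree describes a
syntactic structure. For example, the statement newval := oldval + 12 has the parse tree:

              assignment statement
             /         |          \
     identifier        :=        expression
         |                      /     |     \
      newval          expression      +      expression
                          |                      |
                     identifier               number
                          |                      |
                       oldval                   12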
In a parse tree, all terminals are at the leaves, and all inner nodes are non-terminals
of a context-free grammar. The syntax of a language is specified by a context-free
grammar (CFG). The rules in a CFG are mostly recursive. A syntax analyzer checks whether
a given program satisfies the rules implied by a CFG. If it does, the syntax analyzer
creates a parse tree for the given program. The syntax analyzer works on the smallest
meaningful units (tokens) in a source program to recognize meaningful structures in the
programming language.
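How a syntax analyzer checks tokens against a CFG and builds a parse tree can be sketched with a small recursive-descent parser. The grammar below is an assumption made for this illustration (assignment → identifier := expression; expression → operand { + operand }; operand → identifier | number), and parse_assignment and its nested-tuple tree format are hypothetical names, not part of any existing tool.

```python
import re


def parse_assignment(source):
    """Parse `identifier := operand + operand ...` into a nested-tuple parse tree.

    Raises SyntaxError if the input does not satisfy the assumed grammar.
    """
    # The final \S alternative turns any stray character into its own token.
    tokens = re.findall(r":=|\+|\d+|[A-Za-z_]\w*|\S", source)
    pos = 0

    def classify(tok):
        if re.fullmatch(r"\d+", tok):
            return "number"
        if re.fullmatch(r"[A-Za-z_]\w*", tok):
            return "identifier"
        return tok  # operators are their own kind

    def take(kind):
        nonlocal pos
        tok = tokens[pos] if pos < len(tokens) else None
        if tok is None or classify(tok) != kind:
            raise SyntaxError(f"expected {kind}, found {tok!r}")
        pos += 1
        return tok

    def operand():
        if pos < len(tokens) and classify(tokens[pos]) == "number":
            return ("number", take("number"))
        return ("identifier", take("identifier"))

    def expression():
        tree = ("expression", operand())
        while pos < len(tokens) and tokens[pos] == "+":
            take("+")
            tree = ("expression", tree, "+", ("expression", operand()))
        return tree

    target = ("identifier", take("identifier"))
    take(":=")
    tree = ("assignment", target, ":=", expression())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_assignment("newval := oldval + 12"))
```

In the returned tree the terminals (newval, :=, oldval, +, 12) appear only at the leaves, while the tuple labels play the role of the non-terminals.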
[5] SEMANTIC:
A semantic analyzer checks the source program for semantic errors and collects the
type information needed for code generation. Type checking is an important part of
semantic analysis. The context-free grammars used in syntax analysis normally cannot
represent semantic information, so they are integrated with attributes (semantic
rules); the result is syntax-directed translation, as in attribute grammars.
For example, in the assignment newval := oldval + 12, the type of the identifier newval
must match the type of the expression (oldval+12).
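The type check described here can be sketched in a few lines of Python. This is only an illustration under simple assumptions: declared types are kept in a plain dictionary (a hypothetical symbol table), integer literals have type "integer", and the names operand_type and check_assignment are invented for the example.

```python
def operand_type(operand, symbols):
    """Return the type of an integer literal or of a declared identifier."""
    if isinstance(operand, int):
        return "integer"
    if operand not in symbols:
        raise NameError(f"undeclared identifier {operand!r}")
    return symbols[operand]


def check_assignment(target, operands, symbols):
    """Type-check `target := operand + operand + ...` and return the common type."""
    types = {operand_type(op, symbols) for op in operands}
    if len(types) != 1:
        raise TypeError(f"operands of '+' have mixed types: {sorted(types)}")
    expression_type = types.pop()
    target_type = operand_type(target, symbols)
    if target_type != expression_type:
        raise TypeError(
            f"cannot assign {expression_type} expression to {target_type} {target!r}"
        )
    return expression_type


# Hypothetical declarations for the statement newval := oldval + 12.
symbols = {"newval": "integer", "oldval": "integer"}
print(check_assignment("newval", ["oldval", 12], symbols))  # integer
```

If newval were declared with a different type than oldval, the same call would raise a TypeError, which is the kind of semantic error a context-free grammar alone cannot detect.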
[6] DISCOURSE:
The next level of analysis is called "discourse theory". This is about the higher-
level relations that hold among sequences of sentences in a discourse or a narrative. It
merges sometimes with literary theory, but also with pragmatics.
One important idea in discourse theory is the idea that much language is
performed in the context of some mutual activity.
Requests such as "Can you pass the salt?" are known as "indirect speech acts" because
they look like questions but are not really questions. One way to think about sentences
like this is that the hearer understands that this is probably not a question, but a
conventionalized means of asking for the salt.
Another analysis of this sort of sentence is that the speaker is trying to avoid
rejection by considering ways in which the plan might fail. A direct request may be
refused, leaving the speaker to answer:
Oh, sorry.
So the speaker asks about potential problems first, by asking about ability. If there
is a problem, no direct request has been made and nothing has been rejected.
[7] PRAGMATIC:
This level is concerned with the purposeful use of language in situations and
utilizes context over and above the contents of the text for understanding.
The goal is to explain how extra meaning is read into texts without actually being
encoded in them. This requires much world knowledge, including the understanding of
intentions, plans and goals. Some NLP applications may utilize knowledge bases and
inferencing modules.
For example, the following two sentences require resolution of the anaphoric
term ‘they’, but this resolution requires pragmatic or world knowledge:
1] The city councilors refused the demonstrators a permit because they feared violence.
2] The city councilors refused the demonstrators a permit because they advocated
revolution.
IMPLEMENTATION
1. PARSER
The parser takes the output of the speech recognizer and returns a semantic frame
representation for use by the rest of the speech understanding system.
• During the parsing process, edges are added between the vertices of the chart by the
helper functions.
• A complete edge is an edge whose marker occurs at the end of the string of symbols on
the right-hand side.
• The following four functions are used in both the nondeterministic and deterministic
versions of the chart parsing algorithm.
[1] Initializer:
Once S' has been obtained the parse is finished, provided that it spans the entire
input; an incomplete parse is not wanted.
[2] Predictor:
[3] Completer:
[4] Scanner:
Figure 4 presents the trace of this chart parse in tabular format, to aid in
understanding the example. There is another example presented on page 700 of the.
Figure 4: Trace of chart parse of "Artificial Intelligence is smelly."
Since this is only an overview, the details of how this algorithm works are not
covered here. The algorithm can be worked out from its pseudo-code with a few hours of
experimentation, but this is not critical for an overview of NLP.
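The roles of the initializer, predictor, completer and scanner named above can be illustrated with a minimal Earley-style chart recognizer in Python. This is a sketch under stated assumptions, not the algorithm or the trace from the original figures: the toy grammar and lexicon for "Artificial Intelligence is smelly" are hypothetical, as are the names Edge, GRAMMAR, LEXICON and chart_parse, and the code only reports whether a complete S' edge spans the whole input rather than building a parse tree.

```python
from collections import namedtuple

# An edge is a dotted rule lhs -> rhs with the marker at index `dot`,
# spanning the chart vertices start..end.
Edge = namedtuple("Edge", "lhs rhs dot start end")

# Hypothetical toy grammar and lexicon, for illustration only.
GRAMMAR = {
    "S'": [("S",)],
    "S": [("NP", "VP")],
    "NP": [("Adj", "Noun"), ("Noun",)],
    "VP": [("Verb", "Adj")],
}
LEXICON = {"artificial": "Adj", "intelligence": "Noun", "is": "Verb", "smelly": "Adj"}


def next_symbol(edge):
    """Symbol right after the marker, or None if the edge is complete."""
    return edge.rhs[edge.dot] if edge.dot < len(edge.rhs) else None


def chart_parse(words):
    """Return True if the words form a sentence of the toy grammar."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]  # chart[i]: edges ending at vertex i

    def add(edge):
        if edge not in chart[edge.end]:
            chart[edge.end].append(edge)

    # Initializer: start with S' -> . S at vertex 0.
    add(Edge("S'", ("S",), 0, 0, 0))

    for i in range(n + 1):
        for edge in chart[i]:  # the list grows while we iterate over it
            symbol = next_symbol(edge)
            if symbol is None:
                # Completer: advance every edge that was waiting for edge.lhs.
                for waiting in chart[edge.start]:
                    if next_symbol(waiting) == edge.lhs:
                        add(Edge(waiting.lhs, waiting.rhs, waiting.dot + 1, waiting.start, i))
            elif symbol in GRAMMAR:
                # Predictor: propose every rule that could produce the symbol.
                for rhs in GRAMMAR[symbol]:
                    add(Edge(symbol, rhs, 0, i, i))
            elif i < n and LEXICON.get(words[i].lower()) == symbol:
                # Scanner: the next word has the expected category.
                add(Edge(edge.lhs, edge.rhs, edge.dot + 1, edge.start, i + 1))

    # Finished only if a complete S' edge spans the entire input.
    return any(
        e.lhs == "S'" and next_symbol(e) is None and e.start == 0 for e in chart[n]
    )


print(chart_parse("Artificial Intelligence is smelly".split()))  # True
print(chart_parse("is smelly".split()))                          # False
```

The final check mirrors the requirement that the parse must span the entire input: an S' edge covering only part of the sentence does not count.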
A left-corner parsing algorithm only adds edges that will serve to extend the parse
tree. One of the advantages of this algorithm is that it avoids adding some edges that
could not possibly be part of a sentence spanning the input string.
For example, a left-corner parsing algorithm will correctly avoid interpreting "ride
the horse" as a verb phrase (VP), but will still correctly interpret the phrase "the horse
gave" as a relative clause. This can be seen in Figure 6.
When designing a parsing algorithm there are three things we should keep in
mind. These will help ensure that the algorithm is efficient.
2. PROLOG:
Prolog is a logic programming language which has been popular for developing
natural language parsers and feature-based grammars, given the inbuilt support for search
and the unification operation which combines two feature structures into one.
Unfortunately Prolog is not easy to use for string processing or input/output, as the
following program code demonstrates:
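% Print every word read from standard input that ends in "ing".
% Relies on built-ins such as read_stream_to_codes/2, char_type/2 and
% sub_string/5 as provided by SWI-Prolog.
main :-
    current_input(InputStream),
    read_stream_to_codes(InputStream, Codes),
    codesToWords(Codes, Words),
    maplist(codesToString, Words, Strings),
    filter(endsWithIng, Strings, MatchingStrings),
    writeMany(MatchingStrings),
    halt.

% codesToString(+Codes, -String): convert a code list to a string, so that
% write/1 prints the word as text rather than as a list of numbers.
codesToString(Codes, String) :-
    string_codes(String, Codes).

% codesToWords(+Codes, -Words): split character codes into words,
% skipping the whitespace between them.
codesToWords([], []).
codesToWords([Head|Tail], Words) :-
    (   char_type(Head, space)
    ->  codesToWords(Tail, Words)
    ;   getWord([Head|Tail], Word, Rest),
        codesToWords(Rest, Words0),
        Words = [Word|Words0]
    ).

% getWord(+Codes, -Word, -Rest): Word is the prefix of Codes up to the next
% space or punctuation character; Rest is what follows that character.
getWord([], [], []).
getWord([Head|Tail], Word, Rest) :-
    (   (   char_type(Head, space)
        ;   char_type(Head, punct)
        )
    ->  Word = [],
        Tail = Rest
    ;   getWord(Tail, Word0, Rest),
        Word = [Head|Word0]
    ).

% filter(+Predicate, +List0, -List): keep the elements that satisfy Predicate.
filter(Predicate, List0, List) :-
    (   List0 = []
    ->  List = []
    ;   List0 = [Head|Tail],
        (   call(Predicate, Head)
        ->  filter(Predicate, Tail, List1),
            List = [Head|List1]
        ;   filter(Predicate, Tail, List)
        )
    ).

% endsWithIng(+String): true if String ends in "ing".
endsWithIng(String) :-
    sub_string(String, _Start, _Len, 0, 'ing').

writeMany([]).
writeMany([Head|Tail]) :-
    write(Head), nl,
    writeMany(Tail).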
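For comparison, roughly the same task (listing the words of a text that end in "ing") can be sketched in a general-purpose scripting language. The Python version below is an illustration only: the function name words_ending_in_ing is hypothetical, and its notion of a word (a run of letters, digits or underscores) differs slightly from the space-and-punctuation splitting of the Prolog program.

```python
import re


def words_ending_in_ing(text):
    """Return the words of `text` that end in 'ing', in order of appearance."""
    return [word for word in re.findall(r"\w+", text) if word.endswith("ing")]


sample = "She was singing, walking and then talked."
for word in words_ending_in_ing(sample):
    print(word)
```

To process standard input as the Prolog program does, the function could be called on sys.stdin.read() after importing sys.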
APPLICATION
• Machine Translation:
• E.g. http://babelfish.altavista.com/translate.dvn
It translates words, sentences or whole letters from one language into another.
LIMITATION
• Physical limitations:
NLP suffers from the lack of a unifying ontology that addresses semantic as well
as syntactic representation. The various competing ontologies serve only to slow the
advancement of knowledge management.
NLP lacks an accessible and complete knowledge base that describes the world in
the detail necessary for practical use. The most successful commercial knowledge bases
are limited to licensed use and have little chance of wide adoption. Even those with the
most academic intentions develop at an unacceptable pace.
FUTURE
Computer Scientists and Information Professionals have been working on the idea
of natural language processing for as long as there have been computers. In the decades
of research, amazing progress has been made in this field and there are now many natural
language applications on the market and in general use.
• Some forty years later, however, active research is still taking place in Natural
Language Processing. One reason is that there are so many different kinds of
applications where NLP will be able to help.
• From translating texts or even websites to transcribing speech for the hearing
impaired, natural language processing can improve information access in many different
ways, alone or in conjunction with other non-NLP technologies.
• Robots that understand and follow spoken instructions, or cars that are driven by
talking to them as in some science fiction movies, may all become real one day.
CONCLUSION
Natural Language Processing plays a very important role in new human-machine
interfaces. It is very difficult to design an NLP system that is 100% accurate. These
problems get more complicated when we consider different people speaking the same
language in different styles. Therefore most research on speech recognition is
concentrated on these areas. Information retrieval can be improved to give very accurate
results for various searches; this will involve intelligence to find and sort all the
results. Such intelligent systems are being experimented with right now, and we will be
able to see improved applications of NLP in the near future.
BIBLIOGRAPHY
1. http://www2.sims.berkeley.edu/courses/is290-2/f04/lectures
2. http://ai-depot.com/ska/paper/node21.html
3. http://sern.ucalgary.ca/courses/cpsc/533/W99/presentations/L2_23A_Curry_Lee/
4. http://www.ec-gis.org/Workshops/7ec-gis/papers/html/rachev2/7thGISRachevPaper.htm
5. http://www.csc.villanova.edu/~nlp/intro.html
6. http://www.cogs.susx.ac.uk/research/nlp/gazdar/nlp-in-prolog/ch01
7. http://www.mind.ilstu.edu/published/Phillips/natlangproc.php#fn3
8. http://debra.dgbt.doc.ca/chat/chat.html
9. http://www.cpsc.ucalgary.ca/Courses/461/notes/Signatures/Signatures1.html