Final Thesis

ACLC College of Butuan

COMPUTER EDUCATION DEPARTMENT
HDS Building, 999 J.C. Aquino Avenue, Butuan City, Agusan del Norte
Philippines 8600
INFORMATION RETRIEVAL CHATBOT FOR SCHOOL ADMISSION USING NATURAL LANGUAGE PROCESSING
A Thesis
Presented to the Faculty of
COMPUTER EDUCATION DEPARTMENT
In Partial Fulfillment
of the Requirements for the Degree in
Bachelor of Science in Computer Science
Submitted by:
IAN BILL JUSTINE P. CABARLES
JOHN PAUL M. GETONGO
JULY 2022
APPROVAL SHEET
The thesis attached hereto, entitled "INFORMATION RETRIEVAL CHATBOT FOR SCHOOL ADMISSION USING NATURAL LANGUAGE PROCESSING," prepared and submitted by IAN BILL JUSTINE P. CABARLES and JOHN PAUL M. GETONGO in partial fulfillment of the requirements for the degree BACHELOR OF SCIENCE IN COMPUTER SCIENCE, is hereby recommended for approval.
JOEL C. TRILLO
Thesis Adviser
_________
Date

CHRISTOPHER C. ABALORIO, MIT
Thesis Instructor
_________
Date

JUNELL T. BOJOCAN, MIT
Panel Member
_________
Date

JAMES CLOYD M. BUSTILLO, MSIT
Panel Member
_________
Date

CHRISTOPHER C. ABALORIO, MIT
Panel Chairman
_________
Date
This THESIS is approved in partial fulfillment of the requirements for the degree BACHELOR OF SCIENCE IN COMPUTER SCIENCE.

JAMES CLOYD M. BUSTILLO, MSIT
Research Innovation Coordinator
_________
Date

JUNELL T. BOJOCAN, MIT
Dean, Computer Education Department
_________
Date
ACKNOWLEDGMENT
The researchers would like to offer their deepest gratitude and appreciation to
all who assisted and contributed significantly to the study's completion. Making it to
this point has never been easy. Many obstacles were met along the way, but with
determination and perseverance, everything is possible.
The proponents would like to express their gratitude to the Almighty Father for giving them the chance to conduct this research study and complete it successfully, and for blessing them with knowledge and understanding. Without the strength, talent, and guidance He provided to reach and surpass the goal, this success would have been unattainable.
We would like to thank Mr. Joel Trillo, thesis adviser, for his immense
patience, untiring guidance, and inspiration in motivating us.
We would like to thank Mr. Christopher Abalorio, our OLC Instructor of CS Design Project 2/Thesis 2, for sharing his additional knowledge and help despite his busy schedule.
We also thank the panelists, Mr. Christopher Abalorio, Mr. Junell Bojocan, and Mr. James Cloyd Bustillo, for helping the proponents improve the study through their constructive comments and suggestions.

To the researchers' cherished families, particularly their parents, for their unwavering financial and moral support and for providing for the researchers' needs during the development of the system.
Last but not least, to our Almighty God for His guidance throughout the development of this study.

ABSTRACT
In this time of the pandemic, most daily tasks that were traditionally done in person have shifted to digital and online platforms. In large businesses, answering customer queries has long been the norm, and in most cases it is handled by people hired specifically for that role. In this paper, the researchers propose a chatbot prototype using Natural Language Processing and Information Retrieval that can answer customers' questions for a specific sub-organization of an educational institution, namely the Admission Department of ACLC College of Butuan. The majority of today's chatbots serve a single purpose: information retrieval. This type of bot is intended to offer human-like responses without the need for human intervention. The system tries to determine which question the user is asking or, more realistically, which question in its bank is closest to it, and responds with the answer it was trained to give. Our proposed design 1.) processes user questions through a natural language processing pipeline, 2.) identifies keywords within the processed query, 3.) finds those keywords in the tagged data it was trained on and retrieves the corresponding response, and 4.) returns that response to the user. The process seemed easy at first, but we ran into unexpected issues that delayed the work beyond our planned timeline. Nonetheless, the researchers showed that the prototype chatbot is sufficient and that the approach is viable for real-world application.
KEYWORDS: ACLC College of Butuan, Chatbot, Information Retrieval, Term Frequency-Inverse Document Frequency (TF-IDF), and Natural Language Processing.
TABLE OF CONTENTS
TITLE PAGE
APPROVAL SHEET
ACKNOWLEDGMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Background of the Problem
1.2 Statement of the Problem
1.3 Objective of the Study
1.4 Scope and Limitation
Scope
Limitation
1.5 Definition of Terms
CHAPTER 2 REVIEW OF RELATED LITERATURE
2.1 Related Literature
2.1.1 Natural Language Processing (NLP)
2.1.2 An Information Retrieval-based Approach for Building Intuitive Chatbots for Large Knowledge Bases
2.1.3 Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents
2.1.4 Chatbots for Customer Service: User Experience and Motivation
2.1.5 Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbot
2.1.6 Formation of SQL from Natural Language Query using NLP
2.1.7 Doly: Bengali Chatbot for Bengali Education
2.1.8 Evaluation of Information Retrieval Systems
2.1.9 Online transactions in higher education during lockdown period of COVID-19 pandemic
CHAPTER 3 DESIGN AND METHODOLOGY
3.1 Conceptual Framework
3.2 System Model and Design
3.2.1 Natural Language Processing
3.2.2 Information Retrieval
3.2.3 Calculation of TF-IDF
3.3 Evaluation Method
3.4 Programming Language
3.4.1 Python
3.4.2 Libraries
3.5 Integrated Development Environment (IDE)
3.5.1 Python (IDLE SHELL)
3.5.2 Sublime Text Editor
3.6 Web Framework
3.6.1 Flask (Python)
3.7 Data Gathering
3.7.1 School Admission
3.7.1.1 Admission Officer
3.8 Relevant Technology
3.8.1 Hardware Requirements
3.8.2 Software Requirements
CHAPTER 4 RESULTS AND DISCUSSIONS
4.1 The Chatbot
4.1.1 Tokenize
4.1.2 POS Tagging
4.1.3 Lemmatization
4.1.4 Information Retrieval
4.2 Initiate Readable Data
4.2.1 Dataset File
4.2.2 Training the Datasets
4.3 Testing the Chatbot Prototype
4.3.1 Testing Commonly Asked Questions
4.3.2 Testing Short, Unintelligible Questions
4.4 Response Time Graph
4.5 Evaluation
CHAPTER 5 SUMMARY, CONCLUSION, AND RECOMMENDATION
5.1 Summary
5.2 Conclusion
5.3 Recommendation
REFERENCES
APPENDIX A Gathered Datasets
APPENDIX B Training the Dataset
DOCUMENTATION
CURRICULUM VITAE
LIST OF FIGURES
No. Description
1 Conceptual Framework
2 NLTK POS tags example
3 Lemmatizer example
4 Sample result of Calculation
5 Comparing responses using spaCy
6 Python IDLE Shell 3.9.5
7 Main File Running the Information Retrieval Chatbot
8 Tokenization sample result
9 Output of POS tagging
10 Training files for the main application
11 Raw data extracted formatted in .txt file
12 Graph of the result of training the datasets
13 Asking the offered courses
14 Asking enrollment requirements for new students
15 Asking enrollment requirements for the transferee
16 Asking about the start of enrollment
17 Giving insufficient information
18 Graph of the response time of the Chatbot
19 Raw data extracted formatted in .txt file
20 Sample data training
21 Gathering Data from ACLC Butuan
22 Gathering Data from ACLC Butuan Facebook page
23 Developing the Chatbot

LIST OF TABLES
No. Description
1 Sample Lemmatization result
2 Sample result of the Calculation
3 Sample questions and response of the Chatbot compared to the response of the Admission Officer

Chapter 1
INTRODUCTION
1.1 Background of the Problem
The COVID-19 (Coronavirus Disease 2019) pandemic has had a significant negative impact on economies and on people of all ages and socioeconomic backgrounds. Online business, educational, and economic activities have become the new standard. The entire education system, from the primary to the tertiary level, was severely disrupted during the COVID-19 shutdown, not just in India but around the globe. Face-to-face classes had to be suspended, so classes could only be delivered online, using both synchronous and asynchronous modes of instruction. Asynchronous online instruction included annotated PowerPoint slides and voice-over narration from the teacher, in addition to readings and session materials distributed and stored on the learning management system (LMS). Session activities and discussion assignments were turned into individual tasks with clearer directions and feedback. Social media and virtual gatherings have become the "new normal" at the national and international levels as individuals attempt to preserve normalcy in their lives despite pandemic constraints. Because of this, almost all establishments, such as schools, hospitals, banks, and other companies, have adopted or are considering online transactions.

Due to rising concerns over the spread of the COVID-19 virus and calls to contain it, an increasing number of higher educational institutions around the world have stopped providing traditional in-person classes. The coronavirus has exposed previously unrecognized weaknesses in educational institutions all over the world. As humanity faces an unpredictable future, it is clear that society needs flexible and robust educational systems and teaching practices now more than ever. Universities throughout the world are undergoing substantial changes in how they function and connect with their constituents, because students and their families are seeking more attention through numerous channels as well as quick response and service. According to studies, the younger generation prefers chat services like WhatsApp, SMS, and Facebook Messenger to phone calls or other direct person-to-person engagement. We now communicate continuously, at great volume and intensity, using a range of platforms, tools, and techniques. As technology advances, it has become feasible to create user-friendly systems that converse with a variety of user populations in the same way that humans would (Lala Olusegun Gbenga, 2020).
On any given day, the admissions officers may receive more inquiries than they can answer. The Admissions Office answers customers' questions regarding the school. Because of COVID-19, the school was unable to add staff to address this issue, and on-site work is limited, particularly when the office operates with a skeleton workforce. The researchers sought a solution to this problem, one that addresses the lack of attention given to inquiries. Now that almost all transactions are online, the researchers proposed a chatbot that answers inquiries online, 24/7. This study (the chatbot) aims to address the problem: because the chatbot can operate on its own, it can cater to customers 24/7 without the need for additional staff at the Admission Office.
Chatbots are a viable approach for automating customer service, especially as online chat is increasingly used for customer support. Chatbots are not a novel technology; they date back to ELIZA, which Joseph Weizenbaum developed in 1966. However, recent advances in machine learning and artificial intelligence, together with the growing use of messaging platforms, have prompted organizations to consider chatbots as a supplement to customer service (Følstad A., Nordheim C.B., Bjørkli C.A., 2018). In this age of the internet and the COVID-19 pandemic, communication is more necessary than ever, and everyday use of internet-based communication services keeps increasing. Chatbots have regularly been used to speed up replies to customer inquiries. Chatbots can comprehend users' messages and reply properly thanks to a process called natural language processing (NLP), which helps AI deliver the best answer by providing context and meaning to text-based user inputs (Khrystyna Sarakhman, Roman Kempnyk, and Vladyslav Chyhura, 2020).
AI-driven chatbots can understand natural human language, discern meaning and emotion, and deliver smart responses as if a real human had delivered them. For example, customers can easily obtain responses to their concerns without being forced to wait in phone lines or send numerous emails. Chatbots can lower the volume of client calls, the average handling time, and the cost of providing customer service (Mohammad Nuruzzaman, Omar Khadeer Hussain, 2018).
György Molnár and Zoltán Szűts (2018) likewise note that AI-based chatbots comprehend context as well as meaning and emotion in natural language, and that they reduce phone volume, average handling times, and customer service costs.
At the start of the current decade, chatbots began to appear in great numbers. Interactive technology, frequently integrated with artificial intelligence, has swiftly taken over online conversation. Businesses, governments, and other organizations use chatbots to advertise goods, services, and ideas on websites, in apps, and on instant messaging systems; chatbots are not merely components of virtual assistants. Molnár and Szűts begin their paper by providing a theoretical and historical framework, then discuss the issues with using chatbots as teaching aids, and finally detail the core methods and obstacles of chatbot construction (György Molnár, Zoltán Szűts, 2018).
When it comes to responding to clients' regularly asked questions, chatbots employed in customer service save businesses a large amount of time and resources. Such a technique is particularly relevant in higher education settings, where students frequently ask staff members about organizational and administrative matters. This is clearly visible during the demanding admissions season, when the admissions team must reply to inquiries from countless curious high school students. Conversational agents are software programs that use natural language to hold conversations resembling those between humans. In the 1960s, early conversational agents attempted to pass the Turing Test by withholding information from users so that users would believe they were talking to people. It was observed that allowing users to express their questions and interests naturally, by speaking, typing, or pointing, improves the overall user experience (W. El Hefny et al., 2021). This suggests that a chatbot is a good solution for answering the questions of a school's customers, especially questions directed to the Admissions Office.
There are multiple approaches to building a chatbot, and one of them is natural language processing. NLP, the backbone of chatbots, has undergone numerous changes and evolved into many techniques for processing and interpreting human language. It provides context and meaning to text-based user inputs so that the AI can deliver the best possible result (Khrystyna Sarakhman, Roman Kempnyk, Vladyslav Chyhura, 2020).
NLP is what allows a chatbot to respond appropriately to a message. For example, when a user starts a message with "Hello," NLP tells the chatbot that this is a typical greeting, so the chatbot can use its AI capabilities to give a suitable answer, most likely a greeting of its own. Without the logic of natural language processing, a chatbot cannot distinguish between "Hello" and "Goodbye"; to such a chatbot, both are simply text-based user inputs. NLP helps AI deliver the best answer by providing context and meaning to those inputs (Casey Phillips, 2018).
In addition, the researchers will use the Information Retrieval (IR) method in building the chatbot. Once the user's input has been processed through NLP, this method retrieves the data needed to answer it; it serves as the final stage of processing before the user's question is answered. In IR terms, a database is an organized storage system that enables searching for objects within it based on predetermined criteria. The technology that searches a database to retrieve the information stored within it is known as the search mechanism, and the complexity of the query techniques used varies with the user's technical proficiency in accessing the database. The third element of an information retrieval system is the query language, which can be either a controlled vocabulary or natural language (Chu, 2005, p. 16).
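The four-step design summarized in the abstract (process the question, identify keywords, look them up in tagged data, return the stored response) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the stop-word list, the `TAGGED_DATA` entries, and the simple keyword-overlap score are hypothetical illustrations, not the thesis's actual dataset or scoring method, and the bracketed responses are placeholders rather than real admission information.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical stop-word list (a real system might use NLTK's list instead).
STOP_WORDS: Set[str] = {
    "what", "which", "when", "how", "are", "is", "the", "a", "an", "for",
    "of", "do", "does", "you", "i", "to", "can", "in", "my",
}

# Hypothetical tagged data: each entry links a set of keywords to one answer.
# The bracketed responses are placeholders, not ACLC's actual answers.
TAGGED_DATA: List[Dict] = [
    {"tag": "courses",
     "keywords": {"course", "courses", "program", "programs", "offer", "offered"},
     "response": "[Placeholder: list of courses offered]"},
    {"tag": "requirements",
     "keywords": {"requirement", "requirements", "documents", "need", "needed"},
     "response": "[Placeholder: enrollment requirements]"},
    {"tag": "schedule",
     "keywords": {"start", "starts", "schedule", "date", "open", "opens"},
     "response": "[Placeholder: enrollment schedule]"},
]

FALLBACK = "Sorry, I could not find an answer. Please contact the Admission Office."


def process(question: str) -> List[str]:
    """Step 1: normalize (lowercase) and tokenize the question."""
    return re.findall(r"[a-z0-9]+", question.lower())


def extract_keywords(tokens: List[str]) -> Set[str]:
    """Step 2: keep only content words by dropping stop words."""
    return {tok for tok in tokens if tok not in STOP_WORDS}


def retrieve(keywords: Set[str]) -> Optional[Dict]:
    """Step 3: find the tagged entry that shares the most keywords."""
    best_entry, best_overlap = None, 0
    for entry in TAGGED_DATA:
        overlap = len(keywords & entry["keywords"])
        if overlap > best_overlap:  # ties keep the earlier entry
            best_entry, best_overlap = entry, overlap
    return best_entry


def answer(question: str) -> str:
    """Step 4: return the stored response, or a fallback if nothing matched."""
    entry = retrieve(extract_keywords(process(question)))
    return entry["response"] if entry else FALLBACK
```

For example, `answer("What courses do you offer?")` reduces the question to the keywords `{"courses", "offer"}` and returns the courses placeholder, while `answer("Hello")` matches nothing and returns the fallback. Every keyword counts equally here and ties go to the earlier entry; weighting keywords by TF-IDF is one way to rank candidates more sensibly.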
In conclusion, because classes must continue online, all students now go through online enrollment and transactions. ACLC College of Butuan has adopted online transactions for tuition payment, enrollment, classes, meetings, and inquiries about the school. With almost all transactions online, the researchers propose a chatbot that operates on its own and answers admission inquiries 24/7 without requiring additional staff at the Admission Office.
1.2 Statement of the Problem
The researchers focus on the design of an information retrieval chatbot that can
properly respond to queries regarding school admission matters. Specifically, it aims
to answer the following questions:
1.) Can an information retrieval-based chatbot cater to frequently asked questions from customers pertaining to school admission?
2.) Will an information retrieval-based chatbot model achieve 60% to 80% accuracy in generating informative responses?
3.) Can it offer faster and more efficient responses to inquiries from the customers/students of ACLC College of Butuan?
1.3 Objective of the Study
To create a chatbot system that can cater to students or customers, save time and improve efficiency in answering questions in place of the Admission staff, and demonstrate that an information retrieval-based chatbot is viable for that role.
1.4 Scope and Limitation
Scope:
● The Chatbot accepts inquiries from students or customers of ACLC College of
Butuan and answers them.

● The Chatbot will process the questions and provide specific
answers/information.
Limitation:
● It only answers school admission questions.
● It only caters to ACLC College of Butuan admission queries.
● It only accepts and responds to English text.
● It only caters to one question at a time.
1.5 Definition of Terms
Chatbot – is a computer program created to mimic conversations with real people,
particularly online.
Epoch - in machine learning, one complete pass of the algorithm through the full training dataset; the number of epochs is the number of such passes made during training.
COVID-19 - (Coronavirus disease 2019) is an infectious illness caused by a newly identified coronavirus, SARS-CoV-2. Most COVID-19 patients experience mild to moderate symptoms and recover without special treatment.
Natural Language Processing (NLP) - is a subfield of AI that enables machines to comprehend, interpret, and manipulate human language.
Natural Language Toolkit (NLTK) - is a suite of Python libraries and programs for symbolic and statistical natural language processing (NLP) of human language data.
Information Retrieval (IR) - is the process, usually carried out by software, of organizing, storing, retrieving, and evaluating data from document repositories, particularly textual data.
ACLC College of Butuan - the institution from which the proponents gathered questions and datasets.

Chapter 2
REVIEW OF RELATED LITERATURE
2.1 Related Literature
2.1.1 Natural Language Processing (NLP)
According to Raina and Krishnamurthy (2022), "natural language processing" is the collection of techniques used to make human language understandable to computers. Over the past ten years, natural language processing has become increasingly integrated into our daily lives. For example, automatic machine translation is widely used on the internet and in social media, text classification keeps our email inboxes free of spam, search engines have moved beyond string matching and network analysis to a high level of language sophistication, and dialog systems are becoming an increasingly common and effective way of exchanging information.
These various applications, which incorporate elements of algorithms, linguistics, logic, statistics, and more, are founded on a shared set of concepts. The authors' text aims to give an overview of these foundations; its opening chapter discusses several high-level topics in contemporary natural language processing, situates natural language processing in relation to other academic fields, and offers readers guidance on how to approach the subject. Based on their work, NLP studies how language is understood and used so that appropriate tools and strategies can be created to help computers comprehend and manipulate natural languages in order to carry out their assigned tasks. The researchers can therefore use this work in developing a chatbot that interprets users' questions or queries.
2.1.2 An Information Retrieval-based Approach for Building Intuitive
Chatbots for Large Knowledge Bases
In 2019, Andreas Lommatzsch and Jonas Katins implemented a conversational bot and deployed it on the official web portals of two major German cities. The chatbot was launched as an additional channel for residents seeking information about the services provided by the administration, without any formal announcement. The authors observed steadily rising interest in the service during the first month. After the chatbot had served over 2,500 dialogs per month on one municipal site for six months, the authors reported insights into user preferences and behavior. They present an architecture that combines pre-existing databases, dialog-handling tools, and components for translating user inquiries into knowledge base entries.
Their chatbot responds to inquiries about a public administration's services, much as the chatbot in this study answers queries about a specific topic. The researchers can therefore draw on this work in building their chatbot.
2.1.3 Text Mining: Use of TF-IDF to Examine the Relevance of Words to
Documents
In Shahzad Qaiser and Ramsha Ali's study (2018), TF-IDF combines two measures: term frequency (TF) and inverse document frequency (IDF). Term frequency is discussed first. TF measures how often a term occurs within a document. Because documents range from extremely short to very long, any term may occur more often in longer documents than in shorter ones. Consider a document "T1" that contains five thousand words, with exactly ten instances of the term "Alpha." To correct for document length, term frequency is calculated by dividing the number of occurrences of a term in a document by the total number of terms in that document.
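Written as an equation, the normalization for document T1 works out as follows. The symbols tf(t, d) and f_{t,d} (the raw count of term t in document d) are our notation, not necessarily Qaiser and Ali's:

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}}
\qquad\Longrightarrow\qquad
\mathrm{tf}(\text{Alpha}, T1) = \frac{10}{5000} = 0.002
```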
Next is inverse document frequency. Different keywords carry different weights, yet when only the term frequency of a document is calculated, the algorithm treats all keywords equally, including stop words such as "of." Imagine a document that uses the stop word "of" two thousand times even though the word is irrelevant to its content. IDF is helpful in such a situation: it gives more weight to terms that occur in few documents than to those that occur in many. For instance, if there are ten documents and the word "technology" appears in five of them, the inverse document frequency is IDF = log(10/5) = 0.3010, using a base-10 logarithm.
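A minimal Python sketch of both measures follows. It assumes documents are already tokenized into lists of words and uses the base-10 logarithm of the textbook form IDF = log(N / df). The ten-document corpus is hypothetical and built only to reproduce the "technology" example (ten documents, five containing the word). The function names are illustrative and do not come from Qaiser and Ali's paper.

```python
import math
from collections import Counter
from typing import Dict, List

Document = List[str]


def term_frequency(doc: Document) -> Dict[str, float]:
    """TF: occurrences of each term divided by the total number of terms."""
    counts = Counter(doc)
    total = len(doc)
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequency(corpus: List[Document]) -> Dict[str, float]:
    """IDF: log10(N / df), where df is the number of documents containing the term."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log10(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(corpus: List[Document]) -> List[Dict[str, float]]:
    """Weight each term in each document by TF * IDF."""
    idf = inverse_document_frequency(corpus)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(doc).items()}
        for doc in corpus
    ]


# Hypothetical corpus: ten documents, "technology" in five of them,
# "school" in all ten.
CORPUS: List[Document] = (
    [["technology", "school"] for _ in range(5)]
    + [["admission", "school"] for _ in range(5)]
)

# Document T1: 5,000 words, exactly ten of which are "alpha".
T1: Document = ["alpha"] * 10 + ["other"] * 4990

if __name__ == "__main__":
    idf = inverse_document_frequency(CORPUS)
    print(round(idf["technology"], 4))   # 0.301, i.e. log10(10/5)
    print(idf["school"])                 # 0.0, since it appears in every document
    print(term_frequency(T1)["alpha"])   # 0.002, i.e. 10 / 5000
```

Because "school" appears in every document, its IDF, and therefore its TF-IDF weight, is zero. This is how IDF suppresses stop-word-like terms. Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed, natural-log IDF by default, so their numbers will differ from this textbook form.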
2.1.4 Chatbots for Customer Service: User Experience and Motivation
Asbjorn Folstad and Marita Skjuve (2019) studied customer service chatbots from two different service providers. Both chatbots give information and support in answer to questions about the services offered by the providers. Both greet the user with a succinct welcome message and background information before taking the user's inquiry. The inquiry is then evaluated to see whether it fits one of the thousands of intents the chatbots support, and an appropriate response is supplied. To receive an answer, the user frequently needs to respond to a series of follow-up questions by choosing among the options of a branching conversation tree. Typically, responses come in the form of text with links to other information or to self-service on the business website. The chatbot can hand the conversation over to a live customer service agent when it cannot respond to the user's question or the user is not satisfied with the response.
The study's findings show how crucial it is for customer service chatbots to respond quickly and appropriately to straightforward inquiries. The findings also suggest that as long as the chatbot provides a simple route to further communication with human customer support staff, the occasional absence of a sufficient response does not necessarily result in a negative experience.
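The branching conversation tree and the handoff to a live agent described above can be sketched as a small state machine. The tree's nodes, prompts, and placeholder answers below are hypothetical and are not taken from either chatbot in Folstad and Skjuve's study. The handoff rule is likewise an assumption: the bot hands off on a reply that matches no option, or when the user answers "no" after receiving an answer.

```python
from typing import Dict, List

# Hypothetical conversation tree: inner nodes ask a follow-up question and
# list the options the user may pick; leaf nodes hold a (placeholder) answer.
TREE: Dict[str, dict] = {
    "start": {"prompt": "Is your question about enrollment or payment?",
              "options": {"enrollment": "enrollment", "payment": "payment"}},
    "enrollment": {"prompt": "Are you a new student or a transferee?",
                   "options": {"new": "answer_new",
                               "transferee": "answer_transferee"}},
    "payment": {"answer": "[Placeholder: payment information]"},
    "answer_new": {"answer": "[Placeholder: requirements for new students]"},
    "answer_transferee": {"answer": "[Placeholder: requirements for transferees]"},
}

HANDOFF = "Let me connect you with a live customer service agent."


def run_dialog(replies: List[str]) -> List[str]:
    """Walk the tree with the user's replies and return the bot's turns.

    The bot hands off to a human when a reply matches no option, or when
    the user answers "no" after receiving an answer (i.e., is not satisfied).
    """
    bot_turns: List[str] = []
    user = iter(replies)
    node_id = "start"
    while True:
        node = TREE[node_id]
        if "answer" in node:
            bot_turns.append(node["answer"])
            # A missing reply is treated as "satisfied".
            if next(user, "yes").strip().lower() == "no":
                bot_turns.append(HANDOFF)
            return bot_turns
        bot_turns.append(node["prompt"])
        choice = next(user, "").strip().lower()
        if choice not in node["options"]:
            bot_turns.append(HANDOFF)  # unrecognized reply: escalate
            return bot_turns
        node_id = node["options"][choice]
```

For example, `run_dialog(["enrollment", "new"])` asks both follow-up questions and then returns the new-student placeholder answer. `run_dialog(["scholarship"])` escalates immediately because "scholarship" is not one of the offered options.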
2.1.5 Sequential Matching Network: A New Architecture for Multi-turn
Response Selection in Retrieval-Based Chatbot
There are many ways to approach problem-solving, and the most effective approach depends on the specific problem at hand. In a 2017 study, researchers analyzed how response selection works in retrieval-based chatbots. They found that this process can be difficult, especially as the conversation grows longer. The challenge in matching a response to a conversation context is to identify the important information in the context and use it to match candidate responses. Existing matching methods may lose relevant information, because they can be interpreted under a unified framework in which the context is turned into a fixed-length vector without any interaction with the response before matching. To address this, the authors propose a new framework, the sequential matching framework (SMF), which carries the important information in the context into matching and models the relationships among utterances. In the first step, SMF matches the response with each utterance in the context and converts each pair into a matching vector. Next, the matching vectors are accumulated in the order of the utterances with the help of a recurrent neural network (RNN). In the final step, the matching score between the context and the response is calculated. The approach was evaluated on two open datasets, and the findings show that both models proposed under the framework outperform state-of-the-art matching techniques.
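The sketch below makes the three SMF steps concrete but reproduces only their control flow. It is a toy built on heavy assumptions. Word-identity similarity matrices with max/mean pooling stand in for the learned embeddings, convolution, and pooling of the real models, and a hand-weighted scalar recurrence stands in for the trained RNN. The weights are arbitrary, so the scores only show that a response sharing words with the context ranks above one that does not. They are not comparable to the published models' outputs.

```python
import math
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matching_vector(utterance: str, response: str) -> List[float]:
    """Step 1: match the response with ONE utterance and distill a vector.

    Word-identity similarity (1.0 for identical words, else 0.0) stands in
    for learned embedding similarity, and max/mean pooling stands in for
    the convolution and pooling layers of the real models.
    """
    u, r = tokenize(utterance), tokenize(response)
    if not u or not r:
        return [0.0, 0.0, 0.0]
    matrix = [[1.0 if uw == rw else 0.0 for rw in r] for uw in u]
    row_max = [max(row) for row in matrix]        # utterance words matched
    col_max = [max(col) for col in zip(*matrix)]  # response words matched
    return [max(row_max), sum(row_max) / len(u), sum(col_max) / len(r)]


def match_score(context: List[str], response: str) -> float:
    """Steps 2 and 3: accumulate matching vectors in order, then score."""
    weights = [0.5, 1.0, 1.0]  # hand-set for illustration, NOT learned
    h = 0.0
    for utterance in context:  # chronological order, oldest first
        v = matching_vector(utterance, response)
        # Scalar recurrence standing in for the RNN over matching vectors.
        h = math.tanh(0.5 * h + sum(w * x for w, x in zip(weights, v)))
    # Map the final hidden state to a score in (0, 1) with a sigmoid.
    return 1.0 / (1.0 + math.exp(-(4.0 * h - 2.0)))


if __name__ == "__main__":
    context = ["What courses do you offer?", "I want to enroll in computer science."]
    for candidate in ["Computer science courses are offered this semester.",
                      "The canteen opens at seven."]:
        print(f"{match_score(context, candidate):.3f}  {candidate}")
```

The key design point carried over from SMF is the order of operations. The response interacts with each utterance before anything is summarized, rather than the whole context being compressed into one vector first.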
2.1.6 Formation of SQL from Natural Language Query using NLP
M Uma, V Sneha, G Sneha, 2019, suggested system is made up of a
number of modules which are used to extract keywords and discard
unnecessary information. This is crucial since redundant data will
unquestionably lower the system's overall performance. The first processing of
the incoming data is followed by a mapping phase. Tokenization,
lemmatization, POS tagging, and parsing are some of the NLP steps in the
translation process. Following the determination of the attributes in the input
that has been processed, the mapping step creates the SQL query using the
pertinent data. The workflow of their proposed work is the following:
1) Tokenization: This is the initial stage, in which a phrase is
broken into more manageable tokens, typically words. When the user
provides the input in text form, tokenization is applied and the results
are saved as a list. The authors used a Python word-tokenize function
for this step.
2) Lemmatization: The root word (lemma) of each token from the
previous phase is derived and added to another list. Lemmatization is
preferred over stemming, since stemming only removes a word's prefix
or suffix, which does not always produce an accurate result.
3) Syntactic Analysis: Every lemmatized token is examined, and
each token is assigned a POS tag based on the context in which it
appears. Each word and its tag are combined into a tuple, and a list of
these tuples is generated.
4) Semantic Analysis: In this stage, the tokens are interpreted
so that the system can proceed to create the SQL query. This is
accomplished through parsing (or chunking).
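The workflow above ends with a mapping step that turns the extracted keywords into SQL. The sketch below is a deliberately minimal illustration of that idea, not the authors' implementation: the stop-word list, the keyword dictionaries `TABLE_WORDS` and `COLUMN_WORDS`, and the `students`/`courses` schema are hypothetical, and lemmatization and POS tagging are approximated by listing plural forms directly in the lookup tables.

```python
import re

# Hypothetical vocabulary and schema, for illustration only.
STOP_WORDS = {"show", "me", "all", "the", "of", "list", "what", "are", "is", "a"}
TABLE_WORDS = {"student": "students", "students": "students",
               "course": "courses", "courses": "courses"}
COLUMN_WORDS = {"name": "name", "names": "name", "units": "units"}


def tokenize(text):
    """Stage 1: split the input into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_sql(text):
    """Discard stop words, then map the remaining keywords to a SELECT query."""
    keywords = [t for t in tokenize(text) if t not in STOP_WORDS]
    table = next((TABLE_WORDS[k] for k in keywords if k in TABLE_WORDS), None)
    if table is None:
        raise ValueError("no table keyword found in the query")
    # Keep each column once, in the order it was mentioned
    columns = list(dict.fromkeys(COLUMN_WORDS[k] for k in keywords if k in COLUMN_WORDS))
    column_list = ", ".join(columns) if columns else "*"
    return f"SELECT {column_list} FROM {table};"


print(to_sql("Show me the names of all students"))  # SELECT name FROM students;
```

A real system would rely on the POS tags and chunks produced in steps 3 and 4 rather than on a flat keyword lookup, which cannot distinguish, for example, a column name used as a filter value.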
Their study uses NLP to process the user's input through the same stages
that will be used in this study: tokenization, lemmatization, syntactic analysis,
and semantic analysis.

2.1.7 Doly: Bengali Chatbot for Bengali Education
Mir Md. Moheuddin Khan and Md. Kowsher (2019) used natural language
processing methods and the Natural Language Toolkit for Python to develop
tools for language learning, intelligent explanations, and human-like
responses. They used Python version 3.7.0 to build the Chatbot. Python is an
interpreted, high-level, general-purpose programming language whose design
emphasizes code readability through significant indentation, which makes it
well suited to research, especially in scientific fields. Additionally, they used
the Anaconda distribution of Python; Anaconda provides a popular open-source,
Python-based data science platform. They used the Natural Language Toolkit
(NLTK) for natural language processing (NLP) and also installed ChatterBot,
which comes with built-in adapter classes for connecting to different types of
databases. Python 3.7.0 was used to implement the Chatbot to avoid the
'Unicode decoding error.' Unicode decoding errors are runtime errors caused
by languages such as Bengali, which have many characters in their alphabet.
Bengali text occupies a Unicode range that includes vowels, consonants,
consonant conjuncts, modifiers, and other elements; therefore, the Chatbot's
text cannot be decoded with an ASCII decoding system.
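The decoding problem described above can be reproduced in a few lines of standard Python 3. This is an illustration only, not code from the Doly study: it checks that a Bengali word lies in the Bengali Unicode block (U+0980 to U+09FF) and shows that its UTF-8 bytes fail to decode as ASCII but decode correctly as UTF-8. The helper names `is_bengali` and `try_decode` are hypothetical.

```python
# The Bengali Unicode block spans U+0980 to U+09FF.
BENGALI_START, BENGALI_END = 0x0980, 0x09FF


def is_bengali(text):
    """True if every non-space character lies in the Bengali block."""
    return all(BENGALI_START <= ord(ch) <= BENGALI_END
               for ch in text if not ch.isspace())


def try_decode(raw, encoding):
    """Decode bytes, returning None instead of raising UnicodeDecodeError."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


word = "\u09ac\u09be\u0982\u09b2\u09be"  # the word "Bangla" in Bengali script
raw = word.encode("utf-8")               # 3 bytes per character in this block

print(is_bengali(word))                  # True
print(len(raw))                          # 15
print(try_decode(raw, "ascii"))          # None: bytes above 0x7F are not ASCII
print(try_decode(raw, "utf-8") == word)  # True
```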
In summary, they used natural language processing methods and the
Natural Language Toolkit for Python, which this study will also use in creating
the Chatbot: the researchers will use Python and import the NLTK (Natural
Language Toolkit) library.
2.1.8 Evaluation of Information Retrieval Systems
Keneilwe Zuva and Tranos Zuva (2018) conducted a study on the
evaluation of information retrieval systems. Evaluation is a very important and
tedious task in information retrieval. The literature offers a wide variety of
search models, techniques, and systems, so these must be evaluated in order
to choose the best of them and to improve them. One method of evaluation is
to measure the effectiveness of the system. The difficulty in measuring
effectiveness lies in the relevance of the items retrieved; relevance is thus the
basis for evaluating information retrieval, and understanding relevance is
important. In early research, relevance was viewed as topical relevance, that
is, the topical match between items and queries, which supported laboratory
experiments. Relevance is the relation between a document, item, or piece of
information and a problem, information need, request, or demand. From a
human perspective, relevance can be situational (relating to the user's present
needs), subjective (based on a particular user's judgment), cognitive
(dependent on human perception), and dynamic (changing over time).
Because of these concerns with relevance, user-oriented system evaluation is
particularly resource-intensive to implement. Both textual and non-textual
contexts have been used to research this relevance problem. Information
retrieval evaluation experiments therefore focus on system evaluation, in
which an impartial expert is consulted to judge whether each document or
item is relevant to an information need.
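Once an expert has judged which items are relevant, those judgments can be turned into effectiveness scores. Precision and recall are two standard measures for this; the study summarized above is not cited here as prescribing them, so the sketch below is a general illustration. The answer identifiers and relevance judgments are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: an expert marked answers a1, a2, and a5 as relevant
relevant = {"a1", "a2", "a5"}
retrieved = ["a1", "a3", "a5", "a7"]  # answers the system returned
p, r = precision_recall(retrieved, relevant)
print(round(p, 2), round(r, 2))  # 0.5 0.67
```

Precision rewards returning few irrelevant items, while recall rewards missing few relevant ones; a system is usually judged on both.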
2.1.9 Online transactions in higher education during lockdown period of COVID-19 pandemic
A study on online learning and transactions was conducted by
Lokanath Mishra, Tushar Gupta, and Abha Shree. Considering current
demands, the federal and state governments have approved the national
rollout of online education. Numerous student and teacher organizations at
the national, state, and university levels have endorsed online teaching with
varying degrees of enthusiasm and reluctance. Despite a lack of training,
orientation, and motivation among participants to use online teaching, there is
a natural inclination to experiment with new technologies and new ways of
teaching in the educational system. The authors' readiness for this
pandemic-driven change, their preparedness for online education, and the
availability of resources to apply online teaching strategies all went into
creating an action plan. As part of the action plan, teachers trained and
prepared themselves to become familiar with the technology needed for online
teaching. System administrators and information and communication
technology (ICT) specialists supported stakeholders as needed and managed
the change process at the university level. The success of online teaching and
learning has been the subject of a great deal of research, but none of it was
conducted during the COVID-19 lockdown. This gap motivated the authors to
carry out their research.

Chapter 3
DESIGN AND METHODOLOGY
This chapter introduces the tools and methods used to complete this
research. The following discussions cover the tools and the steps in building
the Chatbot that address the problem of the study.
3.1 Conceptual Framework
Figure 1: Conceptual Framework.

3.2 System Model and Design
3.2.1 Natural Language Processing
Natural Language Processing (NLP) is a branch of computer science and
Artificial Intelligence (AI) that studies how computers can interpret and
understand human language. The researchers will use NLP to process each
question in three steps: tokenization, POS tagging, and lemmatization. In this
way, the Chatbot can understand the question and find the right answer to it.
3.2.1.1 Tokenizer
Tokenization is the process of dividing a string of written
language into its component words. A tokenizer identifies the individual
tokens, or words, and a language model is then built on the identified
tokens (Zhao & Zhang, 2016).
A text is tokenized when it is divided into smaller units, such as
sentences and words. These units are called tokens: a sentence is a
token in a paragraph, just as a word is a token in a sentence. In this
stage, the researchers will use the Python NLTK tokenizer. NLTK has a
module named tokenize, which offers two kinds of tokenization:
● Word tokenization: The researchers used this method to
split a sentence into words, also called tokens.
● Sentence tokenization: The researchers used this method
to split a paragraph into sentences.
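The two kinds of tokenization above can be illustrated with the standard library alone. The sketch below is a simplified stand-in for NLTK's `sent_tokenize` and `word_tokenize`, written with regular expressions so it runs without downloading NLTK data; unlike NLTK, it does not handle abbreviations such as "Dr." or split contractions such as "don't". The function names are hypothetical.

```python
import re


def sent_tokenize_simple(paragraph):
    """Split after '.', '!' or '?' followed by whitespace (no abbreviation handling)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def word_tokenize_simple(sentence):
    """Words (letters, digits, apostrophes) and single punctuation marks become tokens."""
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", sentence)


paragraph = "What are the courses offered? How much is the tuition fee?"
for sentence in sent_tokenize_simple(paragraph):
    print(word_tokenize_simple(sentence))
```

Here the paragraph is split into two sentences, and the first becomes `['What', 'are', 'the', 'courses', 'offered', '?']`, the list form that the next step (POS tagging) consumes.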
3.2.1.2 POS Tagging (Parts of Speech)
Tagging is a classification process in which each token is
automatically given a descriptor. The descriptor is known as a tag, and
it can indicate various things, such as semantic information or the part
of speech.
Part-of-speech (POS) tagging is the process of assigning to each
word one of the parts of speech, according to the role the word plays
in the sentence. In layman's terms, POS tagging is the process of
assigning the correct part of speech to each word in a phrase.
In this stage, the researchers will use the Python NLTK library.
The steps in the POS tagging example are:
● Tokenize the text (split the text into words).
● Assign a part-of-speech tag to each token.
Figure 2: NLTK POS tags example.
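The output of this step is a list of (word, tag) pairs. NLTK's `pos_tag` produces such pairs with a trained tagger that takes context into account; the sketch below only imitates the shape of that output using a hypothetical lookup table and a default tag (the idea behind a unigram tagger that backs off to a default), with Penn Treebank tags such as `WP`, `VBP`, `DT`, `NNS`, and `VBN`.

```python
# Hypothetical lexicon using Penn Treebank tags (the tag set NLTK's pos_tag uses).
LEXICON = {
    "what": "WP",      # wh-pronoun
    "are": "VBP",      # verb, non-3rd person singular present
    "the": "DT",       # determiner
    "courses": "NNS",  # noun, plural
    "offered": "VBN",  # verb, past participle
    "?": ".",          # sentence-final punctuation
}


def pos_tag_simple(tokens, lexicon=LEXICON, default="NN"):
    """Return (token, tag) tuples; unknown words fall back to the default tag NN."""
    return [(tok, lexicon.get(tok.lower(), default)) for tok in tokens]


tokens = ["What", "are", "the", "courses", "offered", "?"]
print(pos_tag_simple(tokens))
# [('What', 'WP'), ('are', 'VBP'), ('the', 'DT'), ('courses', 'NNS'), ('offered', 'VBN'), ('?', '.')]
```

Because a lookup ignores context, a word like "offer" would always receive the same tag whether it is used as a noun or a verb; that ambiguity is what context-aware taggers resolve.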
3.2.1.3 Lemmatization
Lemmatization is the process of converting a word to its base
form. Unlike stemming, lemmatization analyzes the context and
converts the word to its proper base form. Stemming only removes the
last few characters, which often leads to incorrect spellings and
meanings.
For example, lemmatization correctly determines that the base
form of 'eating' is 'eat' ('eating' -> lemmatization -> 'eat').
Lemmatization thus reduces words to a normalized form. It uses
a dictionary to trace a word's various inflected forms back to its root
form; therefore, this method can return the root 'be' of non-trivial
inflections such as 'is,' 'was,' and 'were.'
Figure 3: Lemmatizer example.
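The contrast between the two approaches can be shown with a toy example. The sketch below is illustrative only: `LEMMA_DICT` is a hypothetical six-entry dictionary standing in for the lexical database a real lemmatizer consults, and `naive_stem` is a deliberately crude suffix stripper rather than a real stemming algorithm such as Porter's.

```python
# Hypothetical mini-dictionary (inflected form -> lemma). A real lemmatizer,
# such as NLTK's WordNetLemmatizer, consults a full lexical database instead.
LEMMA_DICT = {
    "is": "be", "was": "be", "were": "be",
    "eating": "eat", "courses": "course", "offered": "offer",
}


def lemmatize(word):
    """Look the word up in the dictionary; unknown words are returned unchanged."""
    w = word.lower()
    return LEMMA_DICT.get(w, w)


def naive_stem(word):
    """Strip the first matching suffix, as a crude rule-based stemmer would."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 1:
            return w[: -len(suffix)]
    return w


for w in ["eating", "was", "courses"]:
    print(w, naive_stem(w), lemmatize(w))
# eating eat eat
# was wa be
# courses cours course
```

Both methods handle "eating", but only the dictionary-based approach recovers "be" from "was" and a correctly spelled "course" from "courses".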
3.2.2 Information Retrieval
Information retrieval is the process of selecting, from a collection of
information system resources, those that are pertinent to a particular
information need. Searches may use full-text indexing or another type of
content-based indexing. Information retrieval covers searching for information
within documents, searching for the documents themselves, searching the
metadata that describes data, and searching databases of texts, images, or
sounds. This approach helps users find the information they need; however, it
does not by itself provide complete answers to their queries. In information
retrieval (IR), tagging is recognized as an effective approach to improving
relevance matching, particularly when items lack extensive textual
descriptions.
The information retrieval method will be used to retrieve the most
suitable answer to the user's question once the Chatbot understands the
question.
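Full-text indexing, mentioned above, is commonly implemented with an inverted index that maps each term to the documents containing it. The following is a minimal sketch with hypothetical FAQ entries; it only finds candidate answers that contain every query term and does not rank them.

```python
import re
from collections import defaultdict


def terms(text):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in terms(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Boolean AND search: ids of documents that contain every query term."""
    query_terms = terms(query)
    if not query_terms:
        return set()
    result = set(index.get(query_terms[0], set()))
    for term in query_terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical FAQ answers keyed by id
faq = {
    "tuition": "Tuition fee per unit for new students",
    "requirements": "Requirements for new students and transferee students",
    "courses": "Courses offered in college and senior high school",
}
index = build_index(faq)
print(sorted(search(index, "new students")))  # ['requirements', 'tuition']
```

Using `index.get` rather than `index[term]` avoids adding empty entries to the `defaultdict` when a query term is unknown.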
3.2.3 Calculation of TF-IDF
Search engines are a popular example of how term frequency-inverse
document frequency (TF-IDF) is applied to information retrieval. Search
results can be ranked by a search engine using TF-IDF, with higher TF-IDF
scores indicating results that are more pertinent to the user. This works
because TF-IDF measures how significant a word is to a particular document
relative to the rest of the collection.
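a. Term frequency: TF(t, d) = (number of times term t appears in
document d) / (total number of terms in document d). Dividing by the length of
the document normalizes the term frequency.
b. Inverse document frequency: IDF(t) = log(N / n(t)), where N is the total
number of documents and n(t) is the number of documents that contain term t.
The logarithm is base 10, so a term found in one of two documents has
IDF = log(2/1) ≈ 0.3, and a term found in both has IDF = log(2/2) = 0.
c. TF-IDF(t, d) = TF(t, d) × IDF(t).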

Figure 4: Sample result of Calculation.
Figure 4 shows an example of a TF-IDF calculation. The researchers used this
formula to calculate the most relevant response the Chatbot can give.
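The TF-IDF formulas can be written directly in Python. The sketch below is an illustration under stated assumptions, not the researchers' code: the two six-word documents are hypothetical, the IDF uses a base-10 logarithm to match the log(2/1) ≈ 0.3 values used in this study, and a document's score for a query is taken to be the sum of the TF-IDF weights of the query terms, which is one common choice among several.

```python
import math
from collections import Counter


def term_frequency(term, doc):
    """TF(t, d) = occurrences of t in d / total number of terms in d."""
    return Counter(doc)[term] / len(doc)


def inverse_document_frequency(term, corpus):
    """IDF(t) = log10(N / n(t)), with n(t) = number of documents containing t."""
    n_t = sum(1 for doc in corpus if term in doc)
    return math.log10(len(corpus) / n_t) if n_t else 0.0


def tf_idf(term, doc, corpus):
    return term_frequency(term, doc) * inverse_document_frequency(term, corpus)


def rank_documents(query_terms, corpus):
    """Score each document by summing the TF-IDF weights of the query terms."""
    scores = [sum(tf_idf(t, doc, corpus) for t in query_terms) for doc in corpus]
    order = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Two hypothetical, already-tokenized six-word documents
doc1 = "enrollment is open for new freshmen".split()
doc2 = "enrollment is the same for transferee".split()
corpus = [doc1, doc2]

order, scores = rank_documents(["freshmen", "enrollment"], corpus)
print(order)                          # [0, 1]
print([round(s, 3) for s in scores])  # [0.05, 0.0]
```

Here "enrollment" appears in both documents, so its IDF is log(2/2) = 0 and it does not help rank them, while "freshmen" appears only in the first document and receives a weight of (1/6) × log(2) ≈ 0.05 there.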
3.3 Evaluation Method
spaCy is a free, open-source Python library that performs natural language
processing (NLP) quickly and efficiently on large amounts of text. It supports
building models and production systems for chatbots, document analysis, and
other kinds of text analysis. The proponents will use spaCy to compare the
Chatbot's responses with the admission officer's responses.
Figure 5: Comparing responses using spaCy.
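spaCy's `Doc.similarity` method returns, by default, the cosine similarity of the two documents' averaged word vectors, so it needs a pipeline that includes word vectors (for example, a medium or large English model). As a dependency-free illustration of the same cosine formula, the sketch below compares two responses using word-count vectors instead of word vectors. This substitution means synonyms score zero here, whereas spaCy's vectors would treat them as close. The function names and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """cos(a, b) = (a . b) / (|a| |b|) over word-count vectors."""
    a, b = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


chatbot = "Class schedule will be posted at the Facebook page"
officer = "The official start of classes will be announced at the Facebook page"
print(f"{100 * cosine_similarity(chatbot, officer):.2f}%")
```

A score of 1.0 means the two responses use the same words in the same proportions, and 0.0 means they share no words.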

3.4 Programming Language
3.4.1 Python
Python is an interpreted, high-level, general-purpose programming
language. Its design emphasizes code readability through significant
indentation: source statements are indented to show the structure of the
code. Its language features and object-oriented approach help programmers
write clear, understandable code for small and large projects.
3.4.2 Libraries
Python and its libraries will be useful for thresholding, the
segmentation process, the creation of neural network models, and in training
and testing the Dataset that will be used in giving responses to the user.
3.4.2.1 NLTK
Natural Language Toolkit (NLTK) is a Python library for
natural language processing. It is a collection of software tools for
statistical language processing. Its tools, which make it one of the most
widely used NLP libraries, help computers process human language
and respond appropriately.

3.4.2.2 TensorFlow
TensorFlow is a free, open-source software library for machine
learning. It will be used to train the model on the datasets. Its
programming model is adaptable to various applications and machine
learning techniques, such as neural networks.
3.5 Integrated Development Environment (IDE)
3.5.1 Python (IDLE SHELL)
IDLE is Python's Integrated Development and Learning Environment. It
provides an interactive Python interpreter window with colorized input, output,
warnings, and error messages, and a multi-window editor with multiple undo,
smart indentation, autocompletion, code hints, and other features.
Figure 6: Python IDLE Shell 3.9.5.

3.5.2 Sublime Text Editor
Sublime Text is a cross-platform source code editor with a Python
application programming interface (API). It natively supports numerous
programming and markup languages, and users can add features through
plugins, which are often community-developed and maintained under
free-software licenses.
3.6 Web Framework
3.6.1 Flask (Python)
Flask is a Python web framework with which a fully functional web
application can be created easily. It is called a microframework because it
does not require particular tools or libraries. It has no database abstraction
layer, form validation, or other components for which existing third-party
libraries already provide common functionality. However, Flask supports
extensions that can add application features as if they were implemented in
Flask itself. The researchers used Flask to deploy their chatbot prototype.
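A chatbot backend in Flask can be as small as one route that accepts a message and returns a reply. The sketch below is a minimal example, not the researchers' application: the `/chat` route, the JSON field names `message` and `response`, and the keyword lookup in `retrieve_answer` are hypothetical placeholders for the NLP and TF-IDF retrieval steps described earlier in this chapter. It requires the Flask package.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical placeholder data; a real deployment would call the
# NLP + TF-IDF retrieval pipeline instead of this keyword lookup.
FAQ = {"tuition": "Placeholder answer about the tuition fee."}
FALLBACK = "Sorry, could you rephrase your question?"


def retrieve_answer(message):
    """Return the first FAQ answer whose keyword appears in the message."""
    words = message.lower().split()
    for keyword, answer in FAQ.items():
        if keyword in words:
            return answer
    return FALLBACK


@app.route("/chat", methods=["POST"])
def chat():
    # silent=True returns None instead of raising on a missing or invalid JSON body
    payload = request.get_json(silent=True) or {}
    return jsonify({"response": retrieve_answer(payload.get("message", ""))})
```

Calling `app.run()` starts Flask's development server; the route can also be exercised without a server through `app.test_client()`, for example `app.test_client().post("/chat", json={"message": "How much is the tuition"})`.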
3.7 Data Gathering
3.7.1 School Admission
3.7.1.1 Admission Officer
According to Mr. Reggie Consigna, the admission officer who
mostly caters to customers and students, the most frequently asked
questions at the Admission Office concern the following:
● Courses offered (College/SHS)
● Requirements for enrollment
● Enrollment/admission process
● Tuition fee
Students who ask questions outside the scope of the Admission
Office are simply directed to the corresponding department.
3.7.1.2 Facebook Page (ACLC College of Butuan)
The researchers gathered the raw data, the actual inquiries of
customers and students, from the official Facebook page of ACLC
College of Butuan, with the school admission officer's permission and
under his supervision, as a reference for the Chatbot's responses.

3.8 Relevant Technology
3.8.1 Hardware Requirements
● Minimum Requirements - Intel Core i3 (processor), 2GB RAM,
250GB storage.
3.8.2 Software Requirements
● Operating System
o Windows - Windows 7 and above, 64 Bit
● Required Software
o Flask - to run the web application Chatbot.
o Python 3.7 and above

Chapter 4
RESULTS AND DISCUSSIONS
This chapter presents the testing and results obtained from the method
proposed in this study. A thorough evaluation was made to determine whether the
specific objectives stated in the study were achieved.
4.1 The Chatbot
Figure 7: Main File Running the Information Retrieval Chatbot.
The proponents built the program that runs the Chatbot; the program file
contains the code that combines NLP with an information retrieval program. The
following describes what happened to the data as it passed through each method:
4.1.1 Tokenize
The user's input or query was split into words, keywords, symbols,
phrases, and other elements, each of which became a token. The tokens serve
as the basis, or keys, for the later steps. The input is tokenized so that it can
be processed by the next step, POS tagging.
Figure 8: Tokenization sample result.
As the result shows, tokenization was successful, and the proponents
were able to pass the tokenized data to the next step.
4.1.2 POS Tagging
Tagging takes place after tokenization: the tokenized words are
processed to assign each one its specific part-of-speech tag. These tags
describe the grammatical role of each token so that the Chatbot can interpret it.
The desired output of this phase is that every token is successfully tagged
with a POS tag.
Figure 9: Output of POS tagging.
Figure 9 shows that the proponents obtained the expected output from
tagging.
4.1.3 Lemmatization
Table 1: Sample Lemmatization result.
Word List | Lemmatized Result
What | what
are | is
the | the
courses | course
offered | offer
requirements | requirement
needed | need
In this process, a lemmatized result was created for the chunk of words.
Words take different forms depending on the context of the sentence, for
example whether an action happened in the past, present, or future. To speed up
information retrieval, the number of distinct words was reduced by replacing the
variations of each word with a single base form.
4.1.4 Information Retrieval
After natural language processing (NLP), the Chatbot used the output
as a reference to retrieve a response that can satisfy the query.
4.1.4.1 Calculation of TF-IDF
Table 2: Sample result of the calculation.
Words | TF (Document 1) | TF (Document 2) | IDF | TF-IDF (Document 1) | TF-IDF (Document 2)
enrollment | 1/6 | 1/6 | log(2/2) = 0 | 0 | 0
is | 1/6 | 1/6 | log(2/2) = 0 | 0 | 0
the | 0 | 1/6 | log(2/1) = 0.3 | 0.043 | 0
transferee | 1/6 | 1/6 | log(2/2) = 0 | 0 | 0
new | 1/6 | 1/6 | log(2/2) = 0 | 0 | 0
freshmen | 1/6 | 0 | log(2/1) = 0.3 | 0 | 0.043
Table 2 shows a sample result of the TF-IDF calculation, displaying the
frequency of each word in each document of the dataset file.
4.1.4.2 Tagging
After the calculation, the most frequent words are tagged to the
most suitable response retrieved from the document. The quality of
the tagging can only be judged from the final result, that is, from the
Chatbot's response itself.

4.2 Initiate Readable Data
This is the separate training file in which the data is trained before being fed
to the main application. It contains all of the processed data derived from the raw dataset.
Figure 10: Training file for the main application.
4.2.1 Dataset File
4.2.1.1 Raw Dataset
Figure 11: Raw data extracted formatted in .txt file.

Figure 11 shows the raw dataset taken from the responses on the ACLC
College of Butuan Facebook page. Because no available mining method could
extract private conversations, the dataset was gathered manually from the
conversations between students and the admission officer. The proponents
gathered over 1000 queries and responses. This dataset will be used to train
the Chatbot.
4.2.2 Training the Datasets
Figure 12: Graph of the result of training the datasets.

Figure 12 shows a graph of the results of training on the dataset. The
graph shows that the accuracy increases as the number of training iterations
increases.
4.3 Testing the Chatbot Prototype
The proponents took sample questions from the data gathered from the ACLC
school admission's official Facebook page. In the testing phase, the proponents
asked questions with the same content but different sentence constructions.
Chatbot starting message.
Figure 13: Asking about the offered courses.
Figure 14: Asking enrollment requirements for new students.
Figure 15: Asking enrollment requirements for the transferee.
Figures 13 to 15 show a sample conversation between the Chatbot and the
user. The proponents tested the Chatbot themselves and conducted at least
five trials. As the figures show, the Chatbot responds appropriately to the
user's queries with sufficient information.
4.3.1 Testing Commonly Asked Questions
Figures 13 to 15 show the responses of the Information Retrieval Chatbot
to the questions most commonly asked by users, which were gathered by
manually mining the text from the school's official Facebook page. On the first
trial, the Chatbot took a little over five seconds to output a response. The
proponents expected this, since it was the first run after the Chatbot was fed
and trained on the data. After at least ten further runs of messaging the
Chatbot, it was able to retrieve the information it needed to respond to the
user with an acceptable success rate.
Figure 16 shows that the Chatbot was also tested with a question that,
according to the school admission officer, is commonly asked near the end of a
semester or about a month after enrollment has officially closed. This was one
of a few exceptions, since the schedule varies depending on external or internal
factors that may affect the exact date and time. For this question, the Chatbot
did not give a direct answer; instead, it referred the user to the official Facebook
page, where the admission officer provides the answer personally.
4.3.2 Testing Short, Unintelligible Questions
In the following test, the proponents tried short and insufficient
questions, which made up a sizable part of the gathered dataset. Although these
questions were part of the training data, the Chatbot was not able to understand
them. The Chatbot automatically detects short or incomplete queries and asks
the user to rephrase the question.
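One simple way to detect such queries is to count the content words that remain after common function words are removed. The sketch below is an assumption about how a check like this could work, not the Chatbot's actual rule: the stop-word list, the threshold `MIN_CONTENT_WORDS = 2`, the rephrase message, and the `answer_fn` parameter are all hypothetical.

```python
import re

# Hypothetical settings; the thesis does not specify these values.
MIN_CONTENT_WORDS = 2
STOP_WORDS = {"a", "an", "the", "is", "are", "what", "how", "when",
              "about", "for", "of", "please"}
REPHRASE_PROMPT = "Sorry, I did not understand. Could you rephrase your question?"


def content_words(query):
    """Lowercase word tokens with common function words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", query.lower()) if w not in STOP_WORDS]


def needs_rephrasing(query, min_words=MIN_CONTENT_WORDS):
    """Flag queries with too few content words to retrieve an answer from."""
    return len(content_words(query)) < min_words


def reply(query, answer_fn):
    """Ask the user to rephrase short queries; otherwise call the retrieval function."""
    if needs_rephrasing(query):
        return REPHRASE_PROMPT
    return answer_fn(query)
```

For example, "fee?" leaves a single content word and triggers the rephrase prompt, while "How much is the tuition fee?" leaves three and is passed on to retrieval.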
4.4 Response Time Graph
Figure 18: Graph of the response time of the Chatbot (Test #1 to Test #20).
Figure 18 shows a graph of the response time of the Chatbot over 20 tests.
The x-axis shows the test number, and the y-axis shows the response time in seconds.
Across the 20 tests, the average response time was 1.4 seconds, which is fast
considering that the Chatbot was tested on a low-spec computer. The proponents thus
achieved the objective of creating a quick-response chatbot.
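Response times like these can be collected by timing each call to the Chatbot's answer function with a high-resolution clock and averaging the results. The sketch below assumes a hypothetical `respond(query)` function and uses a trivial stand-in for it, so the printed average says nothing about the real Chatbot.

```python
import statistics
import time


def measure_response_times(respond, queries):
    """Call respond(query) for each query and return each duration in seconds."""
    durations = []
    for query in queries:
        start = time.perf_counter()
        respond(query)
        durations.append(time.perf_counter() - start)
    return durations


def dummy_respond(query):
    """Hypothetical stand-in for the Chatbot's answer function."""
    return query.lower()


times = measure_response_times(dummy_respond, ["When will classes start?"] * 20)
print(f"{len(times)} tests, average {statistics.mean(times):.6f} s")
```

`time.perf_counter` is preferred over `time.time` for this purpose because it is monotonic and has the highest available resolution.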

4.5 Evaluation
Table 3: Sample questions and responses of the Chatbot compared with the responses of the Admission Officer.
No. | Question | Chatbot | Admission Officer | Result
1. | When would enrollment start? | Enrollment dates will be communicated through our Facebook page due to the pandemic and other uncontrollable factors | The enrollment date only depends on when will be announced | 84.04%
2. | What are the requirements for transferee students? | Transferee students must submit two photocopies of their birth certificate, original good moral certificate, original transcript of records (TOR), and original Honorable Dismissal (HD), all of which must be included in a long, brown envelope. | Requirement for Transferee: Transcript of Records (TOR), Honorable dismissal, Certificate of Good moral, two photocopies of Birth Cert. (PSA), and 1 long brown envelope. | 83.85%
3. | Requirements for new students? | Requirements for new students are as follows: 2 Photocopies of your Birth Certificate, original Good Moral Certificate, and original Senior HighSchool Card (Form 138). Enclose it all in a long, brown envelope. | Requirement for Freshmen: SHS Card, Certificate of Good moral, 2 photocopies of Birth Cert. (PSA), and 1 long brown envelope. | 90.21%
4. | When will the classes start? | Class schedule will be posted at the Facebook Page, ACLC College of Butuan. | The start of classes is still tentative, the official start of classes will be announced at the Facebook page. | 92.65%
5. | How much is the Tuition fee? | Tuition ranges from Php 16,000.00 (minimum load of units) to Php 25,000.00 (maximum load of units). Cost of per unit is Php 250 for new/transferee students, then Php 350.00 for students who are returning/old. | Tuition fee is around 16k to 23k per semester. New/trasferee is Php 250 and php 350 for old students. | 82.33%
Table 3 shows the sample comparison of the responses. The proponents tested five
frequently asked questions and their responses. The average similarity percentage is
86.62%. The result shows that the Chatbot's responses are acceptable and exceed the
expected accuracy percentage stated in the objectives of the study. The admission
officer also reviewed the Chatbot's responses and agreed that they are acceptable.
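The reported average can be checked directly from the five results in Table 3, together with the at-least-80% similarity target stated in the conclusion of this study. The short check below only recomputes the arithmetic; it does not reproduce the similarity scores themselves.

```python
import statistics

# Similarity results reported in Table 3 (percent), questions 1 to 5
results = [84.04, 83.85, 90.21, 92.65, 82.33]

average = statistics.mean(results)
print(round(average, 2))     # 86.62
print(min(results) >= 80)    # True: every answer meets the 80% target
```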

Chapter 5
SUMMARY, CONCLUSION, AND RECOMMENDATION
5.1 Summary
Due to COVID-19, the whole country went into lockdown, so online transactions
became the primary solution for almost everything that is usually done face to face,
such as inquiries to companies or schools. Asking questions online was difficult
because the staff assigned to answer questions about the school were not available all
the time. Because of that, the researchers decided to create a Chatbot that would
answer frequently asked questions regarding school matters.
When creating a Chatbot, Natural Language Processing (NLP) is one of the
most effective approaches. The researchers therefore combined NLP with
information retrieval to make the Chatbot accurate and quick to respond. The
researchers focused on the design of an information retrieval chatbot that can
properly respond to queries regarding school admission matters.
Combining NLP and information retrieval to develop the Chatbot was
difficult. Throughout the process, the proponents continuously improved the Chatbot
to achieve the objectives. There were only a few credible studies and resources that
could help in developing an information retrieval chatbot, which makes this study
unique. Conventional chatbots most commonly employ Naïve Bayes, Decision Trees,
Support Vector Machines, Recurrent Neural Networks (RNN), and similar techniques,
none of which relate to information retrieval. The proponents therefore focused on
gathering data extensively and made do with the few related studies that implemented
a form of information retrieval in their chatbots. Objectives were then set to
strategize and meet the requirements of the study, which resulted in a working
prototype of the Information Retrieval Chatbot for ACLC College of Butuan school
admission.
To gather the data needed, the proponents approached the ACLC College of
Butuan school admission officer and asked permission to access the school's public
official Facebook page and gather customer queries under his supervision. This was
the primary source of data, from which the proponents acquired a sizable amount for
training the Chatbot.
During the testing phase, the proponents faced problems when feeding the
dataset into the Chatbot. When the data lacked sufficient information, the Chatbot
could not achieve the desired goal and would only respond by asking for more
information or words to work with. Adjustments were made to address these issues.
Once the proponents fed sufficient data to the Chatbot, it was able to provide
relevant responses similar to the Admission Officer's credible responses.

5.2 Conclusion
The proponents were able to achieve the stated objectives of the study upon
creating a working prototype chatbot system applied with information retrieval. The
proponents were able to achieve at least 80% similarity between the response of the
Chatbot and the Admission Officer. Upon completing the study, the proponents found
that even with the filtered and processed dataset, the Chatbot could not respond
appropriately to a few questions. It should be emphasized that these queries did not
provide enough context for the Chatbot to answer appropriately, even though they
were part of the retrieved dataset.
Furthermore, the research showed that even with few related credible studies,
the proponents were still able to produce a prototype information retrieval chatbot.
Through additional development and research integration, the prototype may be
altered and improved. Although it may not be able to reply to inquiries outside its
narrow scope, the proponents believe there will always be room for improvement.
This research will be beneficial to 1) future proponents' research on chatbots,
2) other related studies in natural language processing, and 3) the development of
information-retrieval-based systems.

5.3 Recommendation
The following recommendations were made by the study's proponents for
future improvement:
● Improve the Natural Language Processing pipeline, since its limitations may
be why the Chatbot sometimes does not understand the user's query.
● Optimize the training runtime to improve the Chatbot's overall performance and
efficiency.
● Increase the size of the training dataset to a minimum of 1000 queries to
maximize the NLP potential of the Chatbot and help it retrieve the proper
information efficiently.

REFERENCES
A. Følstad, C. B. Nordheim, C. A. Bjørkli (2018). What makes users trust a chatbot for customer service? International Conference on Internet Science, Springer. https://scholar.google.com.ph/scholar
Khrystyna Sarakhman, Roman Kempnyk, Vladyslav Chyhura (2020). ChatBot using NLP (Natural Language Processing). http://ena.lp.edu.ua:8080/handle/ntb/52117
Mohammad Nuruzzaman, Omar Khadeer Hussain (2018). A survey on chatbot implementation in customer service industry through deep neural networks. https://scholar.google.com.ph/scholar? 2018&hl=en&as
György Molnár, Zoltán Szűts (2018, September). The Role of Chatbots in Formal Education. https://www.researchgate.net /327670400
W. El Hefny et al. (2021). Jooka: A Bilingual Chatbot for University Admission. https://www.researchgate.net/publication/350450767_Jooka_A_Bilingual_Chatbot_for_University_Admission
Khrystyna Sarakhman, Roman Kempnyk, Vladyslav Chyhura (2020). ChatBot using NLP. http://ena.lp.edu.ua:8080/bitstream/ntb/52117/2/2020v2_Sarakhman_K-ChatBot_using_NLP_429-432.pdf
Casey Phillips (2018). What is Natural Language Processing (NLP) & Why Chatbots Need it. https://medium.com/support-automation-magazine/what-is-natural-language-processing-nlp-why-chatbots-need-it-1316d4d120e6
Chu (2005, p. 16). Information Retrieval Methods Report. https://ivypanda.com/essays/information-retrieval-methods/
Gobinda G. Chowdhury (2011). Natural Language Processing (NLP). https://www.researchgate.net/profile/Rajani-Kamath/publication
Andreas Lommatzsch and Jonas Katins (2019). An Information Retrieval-based Approach for Building Intuitive Chatbots for Large Knowledge Bases. http://ceur-ws.org/Vol-2454/paper_60.pdf
Shahzad Qaiser and Ramsha Ali (2018). Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents. https://ijcaonline.org/archives/volume181/number1/29681-2018917395
Asbjørn Følstad and Marita Skjuve (2019). Chatbots for Customer Service: User Experience and Motivation. https://www.researchgate.net/publication/335079257
Wu et al. (2017). Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots. https://www.semanticscholar.org/paper/Sequential-Matching-Network
M. Uma, V. Sneha, G. Sneha (2019). Formation of SQL from Natural Language Query using NLP. https://www.researchgate.net/publication/336440228
Mir Md. Moheuddin Khan, Md. Kowsher (2019). Doly: Bengali Chatbot for Bengali Education. https://www.semanticscholar.org/paper/Doly/Chatbot ; https://www.semanticscholar.org/paper/Doly%3A-Bengali-Chatbot-for-Bengali-Education-Kowsher-Tithi/bd4771299d49ba2bc022383d0671d57c09687f1f
Lokanath Mishra, Tushar Gupta, and Abha Shree. Online transactions in higher education during lockdown period of COVID-19 pandemic.
66
APPENDIX A
Gathered Dataset
Figure 19: Raw extracted data formatted as a .txt file.

APPENDIX B
Training the Dataset
Figure 20: Sample of the dataset training.

Documentation
Figure 21: Gathering Data from ACLC Butuan
Figure 22: Gathering Data from ACLC Butuan Facebook page
Figures 21 and 22 show the proponents gathering data from the Facebook page of ACLC College of Butuan.
Figure 23: Developing the Chatbot.
Figure 23 shows the proponents developing the Chatbot.

CURRICULUM VITAE
JOHN PAUL M. GETONGO

Purok 2-A Village II, Brgy. Libertad, Butuan City
09999394031
PERSONAL PROFILE
Age: 22 years old
Date of Birth: November 08, 1999
Place of Birth: Libertad, Butuan City
Civil Status: Single
Citizenship: Filipino
Gender: Female
Religion: Roman Catholic
Father's Name: Paulo G. Getongo II
Mother's Name: Grace M. Getongo
EDUCATIONAL ATTAINMENT
PRIMARY:
Butuan Central Elementary School
A.D. Curato Street, Butuan City
June 2007 – March 2012
SECONDARY:
Good Shepherd Christian Academy
Guinggona Subdivision, J.P. Rizal, Butuan City
SENIOR HIGH SCHOOL:
June 2016 – March 2018
TERTIARY:
August 2016 – July 2022
IAN BILL JUSTINE P. CABARLES

Purok 1, Salvacion, San Agustin, Surigao del Sur
09079166198
PERSONAL PROFILE
Age: 22 years old

Date of Birth: November 08, 1999
Place of Birth: Libertad, Butuan City
Civil Status: Single
Citizenship: Filipino
Gender: Female
Religion: Roman Catholic
Father's Name: Paulo G. Getongo II
Mother's Name: Grace M. Getongo
EDUCATIONAL ATTAINMENT
PRIMARY:
Salvacion Elementary School
Salvacion, San Agustin, Surigao del Sur
SECONDARY:
Salvacion National High School
Salvacion, San Agustin, Surigao del Sur
SENIOR HIGH SCHOOL:
June 2016 – March 2018
TERTIARY:
August 2016 – July 2022

Final Thesis

Uploaded by

Copyright:

Available Formats

Final Thesis

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Final Thesis

Uploaded by

Copyright:

Available Formats
