
ACLC College of Butuan


COMPUTER EDUCATION DEPARTMENT
HDS Building, 999 J.C. Aquino Avenue, Butuan City, Agusan del Norte
Philippines, 8600

INFORMATION RETRIEVAL CHATBOT FOR SCHOOL ADMISSION USING NATURAL LANGUAGE PROCESSING

A Thesis

Presented to the Faculty of

COMPUTER EDUCATION DEPARTMENT

In Partial Fulfillment

of the Requirements for the Degree in

Bachelor of Science in Computer Science

Submitted by:

IAN BILL JUSTINE P. CABARLES

JOHN PAUL M. GETONGO

JULY 2022

APPROVAL SHEET

The thesis attached hereto, entitled "INFORMATION RETRIEVAL CHATBOT FOR SCHOOL ADMISSION USING NATURAL LANGUAGE PROCESSING," prepared and submitted by IAN BILL JUSTINE P. CABARLES and JOHN PAUL M. GETONGO in partial fulfillment of the requirements for the degree BACHELOR OF SCIENCE IN COMPUTER SCIENCE, is hereby recommended for approval.

JOEL C. TRILLO
Thesis Adviser
_________
Date

CHRISTOPHER C. ABALORIO, MIT
Thesis Instructor
_________
Date

JUNELL T. BOJOCAN, MIT
Panel Member
_________
Date

JAMES CLOYD M. BUSTILLO, MSIT
Panel Member
_________
Date

CHRISTOPHER C. ABALORIO, MIT
Panel Chairman
_________
Date

This THESIS is approved in partial fulfillment of the requirements for the degree BACHELOR OF SCIENCE IN COMPUTER SCIENCE.

JAMES CLOYD M. BUSTILLO, MSIT
Research Innovation Coordinator
_________
Date

JUNELL T. BOJOCAN, MIT
Dean, Computer Education Department
_________
Date

ACKNOWLEDGMENT

The researchers would like to offer their deepest gratitude and appreciation to all who assisted and contributed significantly to the completion of this study. Reaching this point was never easy. Many obstacles arose along the way, but with determination and perseverance, everything is possible.

The proponents would like to express their gratitude to the Almighty Father for giving them the chance to conduct this research study and complete it successfully, and for blessing them with knowledge and understanding. Without the strength, talent, and guidance He provided to reach and surpass the goal, this success would have been unattainable.

We would like to thank Mr. Joel Trillo, our thesis adviser, for his immense patience, untiring guidance, and inspiration in motivating us.

We would like to thank Mr. Christopher Abalorio, our OLC Instructor of CS Design Project 2/Thesis 2, for sharing his knowledge and extending his help despite his busy schedule.

To the panelists, Mr. Christopher Abalorio, Mr. Junell Bojocan, and Mr. James Cloyd Bustillo, for helping the proponents improve the study through their constructive comments and suggestions.

To the researchers' cherished families, particularly their parents, whose unwavering support never failed to provide financial and moral assistance to the proponents, and for providing for the researchers' needs during the development of the system.

Last but not least, to our Almighty God for His guidance throughout the development of this study.

ABSTRACT

During the pandemic, many daily tasks that were traditionally done in person have moved to digital and online platforms. In large businesses, answering queries from customers is routine and, in most cases, is handled by people hired specifically for that role. In this paper, the researchers propose a chatbot prototype that uses Natural Language Processing and Information Retrieval to answer customers' questions for a specific sub-organization of an educational institution, namely the Admission Department of ACLC College of Butuan. Many of today's chatbots serve a single purpose: information retrieval. This type of bot is intended to offer human-like responses without the need for human intervention. The system tries to determine which question the user is asking or, more realistically, which question in its bank is closest to it, and responds with the answer it was trained to give. The proposed design 1.) processes user questions through a natural language processing pipeline, 2.) identifies keywords within the processed query, 3.) finds those keywords in the tagged data it was trained on and retrieves the corresponding response, and 4.) returns that response to the user. The process seemed easy at first, but the researchers ran into unexpected issues that delayed the work beyond the planned timeline. Nonetheless, the researchers showed that the prototype chatbot performs adequately and that the approach is viable for real-world application.

KEYWORDS: ACLC College of Butuan, Chatbot, Information Retrieval, Term Frequency-Inverse Document Frequency (TF-IDF), and Natural Language Processing.

TABLE OF CONTENTS

TITLE PAGE
APPROVAL SHEET
ACKNOWLEDGMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Background of the Problem
1.2 Statement of the Problem
1.3 Objective of the Study
1.4 Scope and Limitation
    Scope
    Limitation
1.5 Definition of Terms
CHAPTER 2 REVIEW OF RELATED LITERATURE
2.1 Related Literature
    2.1.1 Natural Language Processing (NLP)
    2.1.2 An Information Retrieval-based Approach for Building Intuitive Chatbots for Large Knowledge Bases
    2.1.3 Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents
    2.1.4 Chatbots for Customer Service: User Experience and Motivation
    2.1.5 Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbot
    2.1.6 Formation of SQL from Natural Language Query using NLP
    2.1.7 Doly: Bengali Chatbot for Bengali Education
    2.1.8 Evaluation of Information Retrieval Systems
    2.1.9 Online transactions in higher education during lockdown period of COVID-19 pandemic
CHAPTER 3 DESIGN AND METHODOLOGY
3.1 Conceptual Framework
3.2 System Model and Design
    3.2.1 Natural Language Processing
    3.2.2 Information Retrieval
    3.2.3 Calculation of TF-IDF
3.3 Evaluation Method
3.4 Programming Language
    3.4.1 Python
    3.4.2 Libraries
3.5 Integrated Development Environment (IDE)
    3.5.1 Python (IDLE SHELL)
    3.5.2 Sublime Text Editor
3.6 Web Framework
    3.6.1 Flask (Python)
3.7 Data Gathering
    3.7.1 School Admission
        3.7.1.1 Admission Officer
3.8 Relevant Technology
    3.8.1 Hardware Requirements
    3.8.2 Software Requirements
CHAPTER 4 RESULTS AND DISCUSSIONS
4.1 The Chatbot
    4.1.1 Tokenize
    4.1.2 POS Tagging
    4.1.3 Lemmatization
    4.1.4 Information Retrieval
4.2 Initiate Readable Data
    4.2.1 Dataset File
    4.2.2 Training the Datasets
4.3 Testing the Chatbot Prototype
    4.3.1 Testing Commonly Asked Questions
    4.3.2 Testing Short, Unintelligible Questions
4.4 Response Time Graph
4.5 Evaluation
CHAPTER 5 SUMMARY, CONCLUSION, AND RECOMMENDATION
5.1 Summary
5.2 Conclusion
5.3 Recommendation
REFERENCES
APPENDIX A Gathered Datasets
APPENDIX B Training the Dataset
DOCUMENTATION
CURRICULUM VITAE

LIST OF FIGURES

No. Description

1 Conceptual Framework
2 NLTK POS tags example
3 Lemmatizer example
4 Sample result of the calculation
5 Comparing responses using spaCy
6 Python IDLE Shell 3.9.5
7 Main file running the Information Retrieval Chatbot
8 Tokenization sample result
9 Output of POS tagging
10 Training files for the main application
11 Raw data extracted formatted in .txt file
12 Graph of the result of training the datasets
13 Asking about the offered courses
14 Asking about enrollment requirements for new students
15 Asking about enrollment requirements for transferees
16 Asking about the start of enrollment
17 Giving insufficient information
18 Graph of the response time of the Chatbot
19 Raw data extracted formatted in .txt file
20 Sample data training
21 Gathering data from ACLC Butuan
22 Gathering data from the ACLC Butuan Facebook page
23 Developing the Chatbot

LIST OF TABLES

No. Description

1 Sample lemmatization result
2 Sample result of the calculation
3 Sample questions and responses of the Chatbot compared to the responses of the Admission Officer

Chapter 1

INTRODUCTION

1.1 Background of the Problem

The COVID-19 (Coronavirus Disease 2019) pandemic has had a significant negative impact on economies and on people of all ages and socioeconomic backgrounds. Increased online business, educational, and economic activity has become the new standard. The entire education system, from the primary to the tertiary level, was severely disrupted during the COVID-19 shutdown, not only in India but around the globe. Face-to-face classes had to be suspended, so classes could only be delivered online, using both synchronous and asynchronous instruction. Asynchronous online instruction included annotated PowerPoint slides with the teacher's voice-over narration, in addition to readings and session materials distributed and stored on the learning management system (LMS). Individual tasks were developed from session activities and discussion assignments, with clearer directions and feedback. Social media and virtual gatherings have become the "new normal" at the national and international level as individuals try to preserve normalcy in their lives despite pandemic constraints. In response, almost all establishments, such as schools, hospitals, banks, and other companies, have adopted or are considering online transactions.

Due to rising concerns over the spread of the COVID-19 virus and calls to contain it, an increasing number of higher education institutions around the world have stopped offering traditional in-person classes. The coronavirus has exposed previously unrecognized weaknesses in educational institutions worldwide. As humanity faces an unpredictable future, society clearly needs flexible and robust educational systems and teaching practices now more than ever. Universities throughout the world are undergoing substantial changes in how they function and connect with their constituents, because students and their families seek attention through numerous channels and expect quick responses and service. According to studies, the younger generation prefers chat services such as WhatsApp, SMS, and Facebook Messenger to phone calls or other direct person-to-person methods of engagement. People now communicate continuously, at great volume and intensity, using a range of platforms, tools, and techniques. As technology advances, it has become feasible to create user-friendly systems that converse with a variety of user populations the way humans would (Lala Olusegun Gbenga, 2020).

Every day, the admissions officers may receive more inquiries than they can answer. The Admissions Office answers all customer questions regarding the school. Because of COVID-19, the school was unable to add staff to address this issue, and the number of staff allowed to work on campus was limited, particularly when the school operated with a skeleton workforce. The researchers sought a solution to this problem, one that addresses the lack of attention given to inquiries. Now that almost all transactions are online, the researchers proposed a chatbot that answers inquiries online, 24/7. Because the chatbot operates on its own, it can serve customers around the clock without the need for additional staff at the Admission Office.

Chatbots are a viable approach to automating customer service, especially as online chat is increasingly used for customer service. Chatbots are not a novel technology; they date back to ELIZA, which Joseph Weizenbaum developed in 1966. However, recent advances in machine learning and artificial intelligence, together with the growing use of messaging platforms, have prompted organizations to consider chatbots as a supplement to customer service (Følstad A., Nordheim C.B., Bjørkli C.A., 2018). In this age of the internet and the COVID-19 pandemic, communication is more necessary than ever, and the everyday use of internet-based communication services is increasing. Chatbots have regularly been used to speed up replies to customer inquiries. Chatbots can understand users' messages and reply appropriately thanks to a process called natural language processing (NLP), which helps the AI deliver the best answer by providing context and meaning to text-based user inputs (Khrystyna Sarakhman, Roman Kempnyk, and Vladyslav Chyhura, 2020).

AI-driven chatbots can understand natural human language, recognize meaning and emotion, comprehend context, and deliver thoughtful responses as if a real human had written them. For example, customers can quickly get answers to their concerns without waiting in long phone queues or sending numerous emails. Such chatbots can reduce call volume, average handling time, and the cost of providing customer service (Mohammad Nuruzzaman, Omar Khadeer Hussain, 2018; György Molnár, Zoltán Szűts, 2018).

At the start of the 2010s, chatbots began to appear in great numbers. Interactive technology, frequently integrated with artificial intelligence, has swiftly taken over online conversation. Businesses, governments, and other organizations use chatbots to advertise goods, services, and ideas on websites, in apps, and on instant messaging systems; chatbots are not merely components of virtual assistants. Molnár and Szűts begin by providing a theoretical and historical framework, then highlight the issues with using chatbots as teaching aids, and finally detail the core methods and obstacles of chatbot construction (György Molnár, Zoltán Szűts, 2018).

When it comes to responding to clients' frequently asked questions, chatbots employed in customer service save businesses a large amount of time and resources. The technique is widespread in higher education settings, where students frequently ask staff members about organizational and administrative matters. This is especially visible during the busy admissions season, when the admissions team must reply to inquiries from countless curious high school students. Conversational agents are software programs that use natural language to hold conversations resembling those between humans. In the 1960s, early conversational agents tried to pass the Turing Test by withholding information from users, leading users to believe they were talking with people. It was observed that allowing users to express their questions and interests naturally, by speaking, typing, or pointing, would improve the overall user experience (W. El Hefny et al., 2021). This suggests that a chatbot is a good solution for answering the questions of a school's customers, especially questions directed to the Admissions Office.

There are multiple approaches to building a chatbot, and one of them is natural language processing. NLP, the backbone of chatbots, has undergone numerous changes and evolved into many techniques for processing and interpreting human language. By providing context and meaning to text-based user inputs, NLP allows the AI to deliver the best possible result (Khrystyna Sarakhman, Roman Kempnyk, Vladyslav Chyhura, 2020).

NLP is what allows a chatbot to understand a message and reply appropriately. When a message starts with "Hello," NLP tells the chatbot that the user has sent a typical greeting, so the chatbot can use its AI capabilities to provide a suitable answer, most likely a greeting of its own. Without the logic of natural language processing, a chatbot cannot distinguish between "Hello" and "Goodbye"; to a chatbot that lacks NLP, both are merely text-based user inputs (Casey Phillips, 2018).
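To make the "Hello" versus "Goodbye" distinction concrete, the following is a deliberately simple, rule-based Python sketch. The intent names, cue phrases, and reply texts are hypothetical and are not taken from the researchers' chatbot; a real NLP pipeline would learn such cues from data rather than from a hand-written list.

```python
import re

# Hypothetical intent cues; a trained NLP model would learn these from data.
INTENT_PHRASES = {
    "greeting": ["hello", "hi", "hey", "good morning"],
    "farewell": ["goodbye", "bye", "see you"],
}

# Hypothetical canned replies, keyed by intent (None = not understood).
REPLIES = {
    "greeting": "Hello! How can I help you with your admission inquiry?",
    "farewell": "Goodbye! Feel free to come back anytime.",
    None: "Sorry, I did not understand that.",
}

def detect_intent(message):
    """Return the first intent whose cue appears as whole words, else None."""
    # Normalize: lowercase and keep only alphabetic words.
    text = " ".join(re.findall(r"[a-z]+", message.lower()))
    for intent, phrases in INTENT_PHRASES.items():
        if any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases):
            return intent
    return None

def reply(message):
    """Pick the canned reply for the detected intent."""
    return REPLIES[detect_intent(message)]

print(reply("Hello there!"))    # Hello! How can I help you with your admission inquiry?
print(reply("Okay, goodbye."))  # Goodbye! Feel free to come back anytime.
```

Without the intent step, both messages would simply be strings; with it, each maps to a different reply.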

In addition, the researchers will use the Information Retrieval method in building the chatbot. This method takes the user's input after it has been processed by NLP and retrieves the data needed to answer it, serving as the final stage in processing the user's question. In information retrieval, a database is an organized storage system that enables searching for objects within it based on predetermined criteria. The technology that enables searching a database to retrieve the information stored within it is known as a search mechanism. The complexity of the query techniques used varies with the user's technical proficiency in accessing the database. The third element of an information retrieval system is language: either a controlled vocabulary or "natural language" can be used (Chu, 2005, p. 16).
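The retrieval stage can be illustrated with a minimal keyword-overlap sketch in Python, assuming a small tagged question bank. The `FAQ_BANK` entries, the stop-word list, and the bracketed answers are hypothetical placeholders rather than ACLC College of Butuan data, and simple keyword overlap stands in for whatever scoring the final system uses.

```python
import re

# Hypothetical tagged question bank; the bracketed answers are placeholders.
FAQ_BANK = [
    {"keywords": {"course", "courses", "offer", "offered", "program"},
     "response": "[answer listing the offered courses]"},
    {"keywords": {"requirement", "requirements", "enroll", "enrollment"},
     "response": "[answer listing the enrollment requirements]"},
    {"keywords": {"start", "schedule", "when", "date"},
     "response": "[answer giving the enrollment schedule]"},
]
STOP_WORDS = {"what", "are", "the", "is", "a", "an", "of", "for", "do", "you", "i"}
FALLBACK = "Sorry, I could not find an answer. Please rephrase your question."

def extract_keywords(question):
    """Lowercase and tokenize the question, then drop stop words."""
    return {t for t in re.findall(r"[a-z]+", question.lower()) if t not in STOP_WORDS}

def retrieve(question):
    """Return the stored response whose keywords overlap most with the query."""
    query = extract_keywords(question)
    best = max(FAQ_BANK, key=lambda entry: len(query & entry["keywords"]))
    # If even the best entry shares no keyword, fall back instead of guessing.
    return best["response"] if query & best["keywords"] else FALLBACK

print(retrieve("What courses do you offer?"))   # [answer listing the offered courses]
print(retrieve("When does enrollment start?"))  # [answer giving the enrollment schedule]
```

"When does enrollment start?" shares two keywords ("when", "start") with the third entry and one ("enrollment") with the second, so the third entry wins; a question that shares no keywords with any entry gets the fallback message.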

In conclusion, because classes must be conducted and continued online, all students go through online enrollment and transactions. ACLC College of Butuan has adopted online transactions for tuition payment, enrollment, classes, meetings, and inquiries about the school. With almost all transactions now online, a chatbot that answers inquiries on its own, 24/7, offers a practical way to serve customers without additional staff at the Admission Office.

1.2 Statement of the Problem

The researchers focus on the design of an information retrieval chatbot that can properly respond to queries regarding school admission matters. Specifically, the study aims to answer the following questions:

1.) Can an information retrieval-based chatbot cater to frequently asked questions from customers about school admission?

2.) Will an information retrieval-based chatbot model achieve 60% to 80% accuracy in generating informative responses?

3.) Can it offer more efficient and quicker responses to inquiries from the customers and students of ACLC College of Butuan?

1.3 Objective of the Study

To create a chatbot system that can serve students and other customers, save time and improve efficiency by answering questions on behalf of the Admission staff, and show that an information retrieval-based chatbot is viable for that role.

1.4 Scope and Limitation

Scope:

● The Chatbot accepts inquiries from students or customers of ACLC College of Butuan and answers them.

● The Chatbot processes the questions and provides specific answers or information.

Limitation:

● It only answers school admission questions.

● It only caters to ACLC College of Butuan admission queries.

● It only accepts and responds to English text.

● It only caters to one question at a time.

1.5 Definition of Terms

Chatbot - is a computer program created to simulate conversations with real people, particularly online.

Epoch - is a term used in machine learning for one complete pass of the algorithm over the full training dataset; the number of epochs is how many such passes have been made.

COVID-19 - (Coronavirus disease 2019) is an infectious disease caused by a newly identified coronavirus. Most COVID-19 patients experience mild to moderate symptoms and recover without special treatment.

Natural Language Processing (NLP) - is a subfield of AI that enables machines to understand, interpret, and manipulate human language.

Natural Language Toolkit (NLTK) - is a platform for building Python programs that work with human language data, including programs for statistical natural language processing.

Information Retrieval (IR) - an IR system is a software program that handles the organization, storage, retrieval, and evaluation of information from document repositories, particularly textual data.

ACLC College of Butuan - the institution from which the proponents gathered the questions and datasets.

Chapter 2

REVIEW OF RELATED LITERATURE

2.1 Related Literature

2.1.1 Natural Language Processing (NLP)

Raina, V., Krishnamurthy, S. (2022) describe natural language processing as the collection of techniques used to make human language understandable to computers. Over the past ten years, natural language processing has become increasingly integrated into daily life. For example, automatic machine translation is widely used on the internet and in social media, text classification keeps email inboxes free of spam, search engines have moved beyond string matching and network analysis to a high level of linguistic sophistication, and dialog systems are becoming an increasingly common and effective way of exchanging information.

These various applications, which draw on algorithms, linguistics, logic, statistics, and more, are founded on a shared set of concepts, and the authors aim to give an overview of these foundations. They discuss several high-level topics in contemporary natural language processing, situate it in relation to other academic fields, and offer readers guidance on how to approach the subject. Based on their work, NLP studies how language can be understood and manipulated so that appropriate tools and techniques can be developed to help computers comprehend and manipulate natural language and carry out their assigned tasks. As a result, the researchers can use this work to help develop a chatbot that interprets users' questions or queries.

2.1.2 An Information Retrieval-based Approach for Building Intuitive

Chatbots for Large Knowledge Bases

In 2019, Andreas Lommatzsch and Jonas Katins implemented a conversational bot and deployed it on the official web portals of two major German cities. The chatbot began as an additional channel for residents looking for information about the services provided by the administration and was launched without a formal announcement. The authors observed steadily rising interest in the service over the first month. After six months of serving over 2,500 dialogs per month on one municipal site, the deployment gave them insights into user preferences and behaviors. They present an architecture that combines pre-existing databases, dialog-handling tools, and components for mapping user inquiries to knowledge base entries.

In their study, they created a chatbot that answers inquiries about the public administration's services. This is similar to the present study, which answers queries about a specific topic, so the researchers can draw on this work in building their chatbot.

2.1.3 Text Mining: Use of TF-IDF to Examine the Relevance of Words to

Documents

In Shahzad Qaiser and Ramsha Ali's study (2018), TF-IDF combines two measures: term frequency (TF) and inverse document frequency (IDF). Term frequency is discussed first. TF measures how often a term occurs within a document. Document lengths can range from extremely short to very long, so any term may occur more often in longer documents than in shorter ones. Consider a document "T1" that contains five thousand words and exactly ten instances of the term "Alpha." To account for document length, term frequency is calculated by dividing the number of times a term occurs in a document by the total number of terms in that document, so the term frequency of "Alpha" in T1 is $\mathrm{TF} = 10/5000 = 0.002$.
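The term frequency calculation can be sketched in a few lines of Python. The token list below is a synthetic stand-in for document "T1" (ten occurrences of "alpha" padded with a filler token to 5,000 words), not real data.

```python
from collections import Counter

def term_frequencies(tokens):
    """TF(term) = occurrences of the term / total number of terms in the document."""
    total = len(tokens)
    return {term: count / total for term, count in Counter(tokens).items()}

# Synthetic stand-in for "T1": 10 occurrences of "alpha" among 5,000 words.
t1 = ["alpha"] * 10 + ["filler"] * 4990
tf = term_frequencies(t1)
print(tf["alpha"])  # 0.002
```

Dividing by the document length is what keeps a long document from getting a higher score for "alpha" just because it has more words.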

Next is inverse document frequency. Different keywords carry different weights, yet when the term frequency of a document is calculated, the algorithm treats all keywords equally, including stop words such as "of." Imagine a document in which the stop word "of" appears two thousand times even though it is irrelevant to the content. IDF helps in such a situation: inverse document frequency gives terms that occur in few documents more weight than terms that occur in many. For instance, if there are ten documents and the word "technology" appears in five of them, the inverse document frequency is $\mathrm{IDF} = \log_{10}(10/5) \approx 0.3010$.
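The IDF example, and the way IDF suppresses stop words, can be checked with a short Python sketch. It assumes base-10 logarithms, so that $\log_{10}(10/5) \approx 0.3010$, and uses a tiny hypothetical corpus of tokenized phrases; real systems often use smoothed IDF variants instead of this plain form.

```python
import math
from collections import Counter

def inverse_document_frequency(total_docs, docs_with_term):
    """IDF = log10(N / df): terms found in fewer documents get larger weights."""
    return math.log10(total_docs / docs_with_term)

def tf_idf_scores(documents):
    """Return one {term: TF-IDF} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * inverse_document_frequency(n_docs, doc_freq[term])
            for term, count in counts.items()
        })
    return scores

print(round(inverse_document_frequency(10, 5), 4))  # 0.301

# Hypothetical corpus: "of" appears in every document, so its IDF is log10(1) = 0.
corpus = [
    ["schedule", "of", "enrollment"],
    ["list", "of", "courses"],
    ["requirements", "of", "transferees"],
]
scores = tf_idf_scores(corpus)
print(scores[0]["of"])                  # 0.0
print(round(scores[0]["schedule"], 4))  # 0.159
```

Because "of" occurs in every document, its TF-IDF is zero no matter how often it appears, while the rarer content word "schedule" keeps a positive weight.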

2.1.4 Chatbots for Customer Service: User Experience and Motivation

Asbjørn Følstad and Marita Skjuve (2019) studied the user experience and motivation of people using two customer service chatbots from different service providers. Both chatbots give information and support in answer to questions about the services offered by their providers. Both greet the user with a brief welcome message and background information before receiving the user's inquiry. The inquiry is then evaluated to see whether it matches one of the thousands of intents the chatbots support, and an appropriate response is supplied. To receive an answer, the user often has to respond to a series of follow-up questions by choosing among the options of a branching conversation tree. Responses typically take the form of text with links to further information or to self-service on the company website. When the chatbot cannot answer the user's question, or the user is not satisfied with the response, the chatbot can hand the conversation over to a live customer service agent.

The findings of Skjuve and Følstad show how crucial it is for customer service chatbots to respond quickly and appropriately to straightforward inquiries. Their findings also suggest that an occasional lack of sufficient responses does not necessarily lead to a negative experience, as long as the chatbot provides a simple route to further communication with human customer support staff.
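The intent matching and human handover described for these chatbots can be sketched as follows. This is not the providers' implementation: the intents, example phrasings, and the 0.6 threshold are hypothetical, and the string similarity from Python's standard `difflib.SequenceMatcher` stands in for a trained intent classifier.

```python
from difflib import SequenceMatcher

# Hypothetical intents: example phrasings plus a placeholder answer.
INTENTS = {
    "opening_hours": (["what are your opening hours", "when are you open"],
                      "[answer about opening hours]"),
    "reset_password": (["how do i reset my password", "i forgot my password"],
                       "[answer about resetting a password]"),
}
HANDOVER = "Let me connect you to a customer service agent."

def similarity(a, b):
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def answer(question, threshold=0.6):
    """Reply with the best-matching intent, or hand over to a human agent."""
    best_score, best_reply = 0.0, None
    for examples, reply in INTENTS.values():
        score = max(similarity(question, example) for example in examples)
        if score > best_score:
            best_score, best_reply = score, reply
    # Escalate when no intent matches confidently enough.
    return best_reply if best_score >= threshold else HANDOVER

print(answer("When are you open?"))  # [answer about opening hours]
print(answer("zzzz"))                # Let me connect you to a customer service agent.
```

The threshold is the design knob: raising it sends more questions to human agents, lowering it risks confident but wrong answers.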

2.1.5 Sequential Matching Network: A New Architecture for Multi-turn

Response Selection in Retrieval-Based Chatbot

There are many ways to approach problem-solving, and the most effective approach depends on the specific problem at hand. In 2017, the authors of this study analyzed how response selection works in retrieval-based chatbots and found that it can be difficult, especially as the conversation grows longer. The challenge in matching a response to a conversation context is to identify the important pieces of information in the context and use them to match the response. Existing matching methods may fail to capture all the relevant information because they treat the context as a fixed-length vector that does not interact with the response before matching. The authors instead propose a unified framework, the sequential matching framework (SMF), which can effectively extract important information from the context and model the relationships between utterances. In SMF, the response is first matched with each utterance in the context, and each pair is converted into a matching vector. The matching vectors are then accumulated with a recurrent neural network (RNN). Finally, the matching score between the context and the response is calculated. The models' effectiveness was evaluated on two public datasets, and the findings show that both proposed models outperform state-of-the-art matching techniques.
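A heavily simplified, dependency-free Python sketch of the three SMF steps follows: match each utterance with the candidate response, accumulate the matching vectors in order with a recurrent network, and compute a final score. It is an illustration under stated assumptions, not the published model: embeddings and weights are random rather than trained, simple max/mean pooling replaces the original convolutional matching layers, and a plain tanh RNN replaces the GRU, so the printed scores only demonstrate the data flow.

```python
import math
import random

# Hypothetical, untrained parameters: a real model learns embeddings and weights.
rng = random.Random(0)
EMB_DIM, HIDDEN, FEATURES = 16, 8, 3
_vocab = {}

def embed(word):
    """Return a cached random embedding for a word."""
    if word not in _vocab:
        _vocab[word] = [rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]
    return _vocab[word]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def matching_vector(utterance, response):
    """Step 1: match one utterance with the response and pool into a vector."""
    u_words, r_words = utterance.lower().split(), response.lower().split()
    # Word-by-word similarity matrix: rows = utterance words, cols = response words.
    sim = [[cosine(embed(u), embed(r)) for r in r_words] for u in u_words]
    row_best = [max(row) for row in sim]        # best response match per utterance word
    col_best = [max(col) for col in zip(*sim)]  # best utterance match per response word
    return [max(row_best), sum(row_best) / len(row_best), sum(col_best) / len(col_best)]

def random_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

W_in, W_hid = random_matrix(HIDDEN, FEATURES), random_matrix(HIDDEN, HIDDEN)
w_out = random_matrix(1, HIDDEN)[0]

def match_score(context, response):
    """Steps 2-3: accumulate matching vectors in order with an RNN, then score."""
    h = [0.0] * HIDDEN
    for utterance in context:  # utterances in chronological order
        v = matching_vector(utterance, response)
        h = [math.tanh(a + b) for a, b in zip(matvec(W_in, v), matvec(W_hid, h))]
    logit = sum(w * h_i for w, h_i in zip(w_out, h))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid gives a score in (0, 1)

context = ["hi i want to enroll", "what documents do i need"]
for candidate in ["you need your report card", "the canteen opens at noon"]:
    print(candidate, round(match_score(context, candidate), 3))
```

The key structural point is that the response is compared with every utterance before anything is summarized, rather than first squeezing the whole context into one vector.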

2.1.6 Formation of SQL from Natural Language Query using NLP

The system suggested by M. Uma, V. Sneha, and G. Sneha (2019) consists of a number of modules that extract keywords and discard unnecessary information. This is crucial because redundant data would lower the system's overall performance. The initial processing of the input is followed by a mapping phase. Tokenization, lemmatization, POS tagging, and parsing are among the NLP steps in the translation process. Once the attributes in the processed input have been determined, the mapping step builds the SQL query from the pertinent data. The workflow of their proposed work is as follows:

1) Tokenization: the first stage, in which a phrase is broken into smaller, more manageable tokens, typically words. When the user provides input in text form, it is tokenized and the results are saved as a list. The authors used a word-tokenize function from a Python tokenization package.

2) Lemmatization: the root word (lemma) of each token produced by the previous phase is derived and added to another list. Lemmatization is preferred over stemming because stemming merely removes a word's prefix or suffix, which does not always produce accurate results.

3) Syntactic Analysis: each lemmatized token is examined and assigned a part-of-speech (POS) tag based on the context in which it appears. Each word and its tag are combined into a tuple, and a list of these tuples is generated.

4) Semantic Analysis: the tokens are interpreted so that the system can proceed to creating the SQL query. This is accomplished through parsing (or chunking).
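The final mapping step, from processed keywords to an SQL query, can be illustrated with Python's built-in sqlite3 module. The table, its columns, the sample rows, and the keyword-to-column mapping are all hypothetical; Uma et al.'s actual mapping rules are not reproduced here.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical keyword-to-column map for a hypothetical "courses" table.
KEYWORD_TO_COLUMN = {"course": "name", "unit": "units", "fee": "fee_per_unit"}


def build_query(lemmas: List[str]) -> str:
    """Map recognized keywords to column names and build a SELECT statement."""
    columns = [KEYWORD_TO_COLUMN[w] for w in lemmas if w in KEYWORD_TO_COLUMN]
    # Column names come only from the whitelist above, never from raw user text.
    return f"SELECT {', '.join(columns) or '*'} FROM courses"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (name TEXT, units INTEGER, fee_per_unit REAL)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [("Course A", 21, 250.0), ("Course B", 24, 250.0)],  # made-up sample rows
)

sql = build_query(["what", "course", "offer"])
print(sql)  # SELECT name FROM courses
rows: List[Tuple] = conn.execute(sql).fetchall()
print(rows)
```

Words with no mapping (here "what" and "offer") are simply ignored, which reflects the point above that discarding unnecessary information keeps the generated query focused.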

Their study uses NLP to translate user queries, and it processes the user's input through the same stages used in this study: tokenization, lemmatization, syntactic analysis, and semantic analysis.



2.1.7 Doly: Bengali Chatbot for Bengali Education

Mir Md. Moheuddin Khan and Md. Kowsher (2019) used natural language processing methods and the Natural Language Toolkit (NLTK) for Python to develop a chatbot for language learning that provides intelligent explanations and human-like responses. They used Python 3.7.0 to build the Chatbot. Python is an interpreted, high-level, general-purpose programming language whose design emphasizes code readability through significant indentation, which makes it well suited to research, especially in the sciences. They also used Anaconda, which they describe as the best Python-based open-source data science platform, to manage their Python distribution. For natural language processing they used NLTK, and they also installed ChatterBot, which comes with built-in adapter classes for connecting to different types of databases. Python 3.7.0 was chosen to avoid "Unicode decoding errors", runtime errors that arise with languages such as Bengali that have many characters in their alphabet. Bengali occupies a Unicode range containing vowels, consonants, consonant conjuncts, modifiers, and other elements, so their Chatbot's text cannot be processed with an ASCII decoding system.
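The decoding problem described above can be reproduced in a few lines of Python 3. The sample word (the Bengali spelling of "Bangla", written with escape sequences) is an illustrative choice: its code points lie outside ASCII, so decoding its UTF-8 bytes as ASCII fails, while UTF-8 round-trips cleanly.

```python
# "Bangla" written with escapes so the source file itself stays ASCII.
text = "\u09ac\u09be\u0982\u09b2\u09be"

# Every code point lies in the Bengali block (U+0980 to U+09FF), outside ASCII (0 to 127).
assert all(0x0980 <= ord(ch) <= 0x09FF for ch in text)

utf8_bytes = text.encode("utf-8")          # UTF-8 can represent any code point
assert utf8_bytes.decode("utf-8") == text  # and round-trips losslessly

try:
    utf8_bytes.decode("ascii")             # ASCII cannot interpret bytes >= 128
except UnicodeDecodeError as err:
    print("ASCII decoding failed:", err.reason)
```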

In summary, they used natural language processing with the Natural Language Toolkit for Python, which is also one of the methods used to create this study's Chatbot: the Chatbot is written in Python and imports the NLTK (Natural Language Toolkit) library.

2.1.8 Evaluation of Information Retrieval Systems

Keneilwe Zuva and Tranos Zuva (2018) conducted a study on evaluating systems that use information retrieval. Evaluation is a very important but tedious task for information retrieval systems. The literature contains a wide variety of search models, techniques, and systems, so researchers must choose among them and evaluate them in order to improve them. One method of evaluation is to measure the effectiveness of the system. The difficulty in measuring effectiveness lies in the relevance of the items retrieved; relevance is thus the basis for evaluating information retrieval, and understanding it is important. In early research, relevance was treated as thematic relevance, that is, the topical match between items and queries, in order to support laboratory experiments. Relevance is the relationship between a document, object, or piece of information and a problem, information need, request, or demand. From a human perspective, relevance can be situational (relating to the user's present needs), subjective (based on a particular user's judgment), cognitive (dependent on human perception), and dynamic (changing over time). Because of these issues, user-oriented system evaluation is particularly resource-intensive to implement. This relevance problem has been studied in both textual and non-textual contexts. Information retrieval evaluation experiments therefore focus solely on system evaluation, in which an impartial expert is consulted to determine whether each document or item is relevant to the information need.
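Once an impartial expert has judged which items are relevant, system effectiveness is commonly summarized with precision (the share of retrieved items that are relevant) and recall (the share of relevant items that were retrieved). The sketch below computes both for one query; the item IDs are hypothetical, and the specific measures discussed by Zuva and Zuva are not reproduced here.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / |retrieved|; recall = hits / |relevant| (0.0 for empty sets)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical run: four answers retrieved; the expert judged three items relevant.
retrieved = {"faq-01", "faq-04", "faq-07", "faq-09"}
relevant = {"faq-01", "faq-07", "faq-12"}
print(precision_recall(retrieved, relevant))  # (0.5, 0.6666666666666666)
```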

2.1.9 Online transactions in higher education during lockdown period of COVID-19 pandemic

Lokanath Mishra, Tushar Gupta, and Abha Shree conducted a study on online teaching and learning. Considering current demands, the federal and state governments approved the national rollout of online education. Numerous student and teacher organizations at the national, state, and university levels endorsed online teaching with varying degrees of enthusiasm and reluctance. Because participants lacked training, orientation, and motivation for online teaching, the educational system naturally had to experiment with new technologies and new ways of teaching. The authors' action plan took into account readiness for this pandemic-driven change, preparedness for online education, and the availability of resources for applying online teaching strategies. As part of the action plan, teachers trained and prepared themselves to become familiar with the technology needed for online teaching, while system administrators and information and communication technology (ICT) specialists supported stakeholders as needed and managed the change process at the university level. Although the success of online teaching and learning has been the subject of a great deal of research, none of it was conducted during the COVID-19 lockdown. This gap motivated the authors to carry out their study.



Chapter 3

DESIGN AND METHODOLOGY

This chapter introduces the tools and methods used to complete this research, including the tools and steps for building the Chatbot, which together address the problem of the study.

3.1 Conceptual Framework


Figure 1: Conceptual Framework.



3.2 System Model and Design

3.2.1 Natural Language Processing

Natural Language Processing (NLP) is a branch of computer science and artificial intelligence (AI) that studies how computers can interpret and process human language. The researchers use NLP to process each question in three steps: tokenization, POS tagging, and lemmatization. In this way, the Chatbot can understand the question and find the right answer to it.

3.2.1.1 Tokenizer

Tokenization is the process of dividing written language into its component words. A tokenizer identifies the individual tokens, or words, and a language model is produced from the identified tokens (Zhao & Zhang, 2016).

A text is tokenized when it is divided into smaller units, such as phrases and words; these pieces are called tokens. A sentence is a token in a paragraph, just as a word is a token in a sentence. In this stage, the researchers use the Python NLTK tokenizer. NLTK has a module named tokenize, which provides two kinds of tokenization:

● Word tokenization (word_tokenize): the researchers used this method to split a sentence into words, also called tokens.

● Sentence tokenization (sent_tokenize): the researchers used this method to split a paragraph into sentences.
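The two kinds of tokenization can be sketched without any external downloads. The regular expressions below are simplified, assumed stand-ins, not NLTK's rules: nltk.sent_tokenize relies on a trained Punkt model and nltk.word_tokenize on Treebank-style rules.

```python
import re
from typing import List

# Simplified stand-ins for NLTK's sent_tokenize / word_tokenize (assumed rules).
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")     # whitespace after . ! or ?
WORD_OR_PUNCT = re.compile(r"\w+(?:'\w+)?|[^\w\s]")  # a word (inner ' allowed) or one punctuation mark


def tokenize_sentences(paragraph: str) -> List[str]:
    """Split a paragraph into sentences at ., ! or ? followed by whitespace."""
    return [s for s in SENTENCE_BOUNDARY.split(paragraph.strip()) if s]


def tokenize_words(sentence: str) -> List[str]:
    """Split a sentence into word tokens and punctuation tokens."""
    return WORD_OR_PUNCT.findall(sentence)


text = "What are the courses offered? I am a transferee."
for sentence in tokenize_sentences(text):
    print(tokenize_words(sentence))
# ['What', 'are', 'the', 'courses', 'offered', '?']
# ['I', 'am', 'a', 'transferee', '.']
```

Swapping in nltk.sent_tokenize and nltk.word_tokenize keeps the same list-of-strings interface, although NLTK splits some cases differently; for example, it separates contractions such as "don't" into "do" and "n't".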

3.2.1.2 POS Tagging (Parts of Speech)

Tagging is the classification process in which each token is automatically assigned a descriptor. The descriptor is called a tag, and it can indicate various kinds of information, such as semantic data or parts of speech.

Part-of-speech (POS) tagging is the process of assigning each word one of the parts of speech according to the role it plays in the sentence. In layman's terms, POS tagging assigns the correct part of speech to each word in a phrase.

In this stage, the researchers will use the Python NLTK library. The steps in the POS tagging example are:

● Tokenize the text (split it into words).

● Assign a part-of-speech tag to each token.

Figure 2: NLTK POS tags example.
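The output of this step, a list of (word, tag) tuples using Penn Treebank tags, can be illustrated with a toy lookup tagger. The lexicon below is hypothetical and far cruder than nltk.pos_tag, which uses a trained averaged-perceptron model that also considers neighbouring words; the sketch only shows what the tagger hands to the next step.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon of Penn Treebank tags (the tag set nltk.pos_tag
# returns by default). A trained tagger also uses the neighbouring words.
LEXICON: Dict[str, str] = {
    "what": "WP", "are": "VBP", "is": "VBZ", "the": "DT",
    "courses": "NNS", "offered": "VBN", "requirements": "NNS",
    "for": "IN", "?": ".",
}


def lookup_pos_tag(tokens: List[str], default: str = "NN") -> List[Tuple[str, str]]:
    """Pair each token with its lexicon tag; unknown tokens get `default`."""
    return [(token, LEXICON.get(token.lower(), default)) for token in tokens]


print(lookup_pos_tag(["What", "are", "the", "courses", "offered", "?"]))
# [('What', 'WP'), ('are', 'VBP'), ('the', 'DT'), ('courses', 'NNS'), ('offered', 'VBN'), ('?', '.')]
```

Unknown words such as "transferee" fall back to NN regardless of context, which is one reason a trained model is preferred in practice.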

3.2.1.3 Lemmatization

Lemmatization is the process of converting a word to its base form. Unlike stemming, lemmatization analyzes the context before converting the term to its proper base form. Stemming only removes the last few characters, which often leads to incorrect spellings and interpretations.

For example, lemmatization correctly reduces "eating" to its base form "eat": "eating" -> lemmatization -> "eat".

Lemmatization is thus a method of reducing words to a normalized form in which a dictionary is used to trace a word's inflected forms back to its root form. With this method, it is possible to return the root "be" from irregular inflections such as "is", "was", and "were".

Figure 3: Lemmatizer example.
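The contrast between stemming and dictionary-based lemmatization can be sketched with a toy vocabulary. The word lists and suffix rules are hypothetical; an NLTK lemmatizer such as WordNetLemmatizer performs the real lookup against the WordNet database.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical miniature dictionary standing in for a full lexical database.
IRREGULAR: Dict[str, str] = {"is": "be", "are": "be", "was": "be", "were": "be"}
VOCABULARY: Set[str] = {"be", "eat", "course", "offer", "need", "requirement", "study"}

# Suffix rules tried in order: (suffix to remove, text to append).
SUFFIX_RULES: List[Tuple[str, str]] = [("ies", "y"), ("ing", ""), ("ed", ""), ("es", ""), ("s", "")]


def naive_stem(word: str) -> str:
    """Stemming-style: strip the first matching suffix, with no dictionary check."""
    word = word.lower()
    for suffix, _ in SUFFIX_RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Dictionary-based: return a base form only if it is a known word."""
    word = word.lower()
    if word in IRREGULAR:              # irregular forms come from an exception list
        return IRREGULAR[word]
    if word in VOCABULARY:             # already a base form
        return word
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in VOCABULARY:  # accept only real dictionary words
                return candidate
    return word                        # unknown word: leave unchanged


for w in ["eating", "courses", "studies", "was"]:
    print(w, naive_stem(w), lemmatize(w))
```

The stemmer produces non-words such as "cours" and "wa", while the dictionary check rejects "cours" and keeps trying rules until it reaches "course", and the exception list maps "was" to "be".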

3.2.2 Information Retrieval

Information retrieval is the process of obtaining, from a collection of information system resources, those resources that are relevant to a particular information need. Searches may use full-text indexing or another type of content-based indexing. Information retrieval covers searching for information within a document, searching for documents themselves, and searching within databases of texts, images, or sounds, as well as within the metadata that describes data. This approach helps users find the information they need; however, it does not by itself provide complete answers to queries. In information retrieval (IR), tagging is recognized as a successful approach to improving relevance matching, particularly when items lack extensive textual descriptions.

Once the Chatbot understands the user's question, the information retrieval method is used to retrieve the most suitable answer to it.
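Full-text indexing, mentioned above, is usually implemented with an inverted index that maps each term to the entries containing it. The sketch below builds one over a few hypothetical FAQ entries written as lemmatized keywords and ranks matches by the number of shared query terms; this simple overlap count is an assumption for illustration, not the TF-IDF weighting described in the next subsection.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical FAQ keys written as lemmatized keywords.
FAQ_KEYS = [
    "course offer",
    "enrollment requirement new student",
    "enrollment requirement transferee",
    "tuition fee",
]


def build_index(keys: List[str]) -> Dict[str, Set[str]]:
    """Inverted index: map each term to the set of entries that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for key in keys:
        for term in key.split():
            index[term].add(key)
    return index


def retrieve(query_terms: List[str], index: Dict[str, Set[str]]) -> List[str]:
    """Return matching entries, most shared query terms first (ties alphabetical)."""
    hits: Dict[str, int] = defaultdict(int)
    for term in query_terms:
        for key in index.get(term, set()):
            hits[key] += 1
    return sorted(hits, key=lambda key: (-hits[key], key))


index = build_index(FAQ_KEYS)
print(retrieve(["what", "requirement", "new", "student"], index))
# ['enrollment requirement new student', 'enrollment requirement transferee']
```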

3.2.3 Calculation of TF-IDF

Search engines are a common example of how term frequency-inverse document frequency (TF-IDF) is applied in information retrieval. A search engine can rank search results using TF-IDF, with higher TF-IDF scores indicating results that are more relevant to the user. This works because TF-IDF reflects how important a word is to a particular document relative to the rest of the collection.

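a. $\mathrm{TF}(t, d) = f_{t,d}$, the number of times term $t$ appears in document $d$.

b. $\mathrm{IDF}(t) = \log_{10}\left(\frac{N}{\mathrm{df}(t)}\right)$, where $N$ is the number of documents and $\mathrm{df}(t)$ is the number of documents that contain $t$.

c. $\mathrm{TF\text{-}IDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t)$.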

To normalize the term frequency, it is commonly divided by the length of the document (the total number of terms in the document): TF(t) = (number of times term t appears in the document) / (total number of terms in the document).



Figure 4: Sample result of calculation.

Figure 4 shows an example of the TF-IDF calculation. The researchers used this formula to determine the most relevant response the Chatbot can give.
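The formulas above can be turned into a short scoring routine. This is a sketch under stated assumptions: it uses base-10 logarithms and length-normalized term frequency as in the sample calculation in Table 2, the two documents are hypothetical, and cosine similarity between query and document vectors is assumed as the ranking rule, since the thesis does not state how scores are compared.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tf(tokens: List[str]) -> Vector:
    """Normalized term frequency: occurrences of each term / number of terms."""
    counts = Counter(tokens)
    return {term: n / len(tokens) for term, n in counts.items()}


def idf(docs: List[List[str]]) -> Vector:
    """IDF(t) = log10(N / df(t)), with N documents and df(t) documents containing t."""
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log10(len(docs) / n) for term, n in df.items()}


def tf_idf(tokens: List[str], weights: Vector) -> Vector:
    """Weight each known term's TF by its IDF; terms not in the corpus are dropped."""
    return {t: v * weights[t] for t, v in tf(tokens).items() if t in weights}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query: List[str], docs: List[List[str]]) -> List[int]:
    """Indices of docs, most similar to the query first."""
    weights = idf(docs)
    q = tf_idf(query, weights)
    scores = [cosine(q, tf_idf(doc, weights)) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Two hypothetical, already-lemmatized six-word documents.
docs = [
    "enrollment requirement for new freshmen student".split(),
    "enrollment requirement for the transferee student".split(),
]
weights = idf(docs)
print(round(tf_idf(docs[0], weights)["freshmen"], 3))  # 0.05, i.e. (1/6) * log10(2/1)
print(rank(["requirement", "transferee"], docs))       # [1, 0]
```

Terms that appear in every document ("enrollment", "requirement") receive an IDF of log10(2/2) = 0, so only distinguishing words such as "transferee" influence which document is returned.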

3.3 Evaluation Method

spaCy is a free, open-source Python library for performing natural language processing (NLP) on large volumes of text quickly and efficiently. It supports building models and production systems for chatbots, document analysis, and other kinds of text analysis. The proponents will use spaCy to compare the admission officer's responses with the Chatbot's responses by computing the similarity between each pair of responses.

Figure 5: Comparing responses using spaCy.



3.4 Programming Language

3.4.1 Python

Python is an interpreted, high-level, general-purpose programming language. Its design philosophy emphasizes code readability, notably through significant indentation of source statements, which makes code simpler to understand. Its language features and object-oriented approach help programmers write clear, understandable code for both small and large projects.

3.4.2 Libraries

Python and its libraries are used for thresholding, segmentation, creating neural network models, and training and testing the dataset used to give responses to the user.

3.4.2.1 NLTK

The Natural Language Toolkit (NLTK) is a Python library for natural language processing. It is a collection of software tools for symbolic and statistical language processing. These tools, which make it one of the most effective NLP libraries, help computers process human language and respond appropriately.



3.4.2.2 TensorFlow

TensorFlow is a free and open-source software library for machine learning, and it is used in this study to train the model on the datasets. Its programming model is adaptable to various applications and machine learning techniques, such as neural networks.

3.5 Integrated Development Environment (IDE)

3.5.1 Python (IDLE SHELL)

IDLE is Python's Integrated Development and Learning Environment. It provides an interactive Python interpreter (shell) window that colorizes code input, output, and error messages, and a multi-window editor with multiple undo, smart indentation, autocompletion, call tips, and other features.


Figure 6: Python IDLE Shell 3.9.5.



3.5.2 Sublime Text Editor

Sublime Text is a cross-platform source code editor with a Python application programming interface (API). It natively supports numerous programming and markup languages, and users can add features through plugins, which are often community-developed and maintained under free-software licenses.

3.6 Web Framework

3.6.1 Flask (Python)

Flask is a Python web framework with which a fully functional web application can be created easily. It is classified as a microframework because it does not require particular tools or libraries. It has no database abstraction layer, form validation, or other components for which existing third-party libraries already provide common functions. However, Flask supports extensions that can add application features as if they were implemented in Flask itself. The researchers used this framework to deploy their chatbot prototype.
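A minimal sketch of how a chatbot prototype can be served through Flask. The /chat route, the JSON field names, and get_response are hypothetical placeholders, since the thesis does not document its routes; get_response stands in for the NLP and information retrieval pipeline.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def get_response(message: str) -> str:
    """Hypothetical placeholder for the NLP + information retrieval pipeline."""
    if "tuition" in message.lower():
        return "Tuition depends on the number of units enrolled."
    return "Could you rephrase your question?"


@app.route("/chat", methods=["POST"])
def chat():
    # Read the user's message from a JSON body such as {"message": "..."}.
    payload = request.get_json(silent=True) or {}
    message = payload.get("message", "")
    return jsonify({"response": get_response(message)})
```

During development the server can be started with app.run(); a POST to /chat with {"message": "How much is the tuition fee?"} then returns the response as JSON.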

3.7 Data Gathering

3.7.1 School Admission

3.7.1.1 Admission Officer

According to Mr. Reggie Consigna, the admission officer, who mostly attends to clients and students, the questions most frequently asked at the Admission Office concern the following:

● Courses offered (College/SHS)

● Requirements for enrollment

● Enrollment/admission process

● Tuition fee

Students who ask questions outside the knowledge or scope of the Admission Office are simply directed to the corresponding department.

3.7.1.2 Facebook Page (ACLC College of Butuan)

The researchers gathered the raw data, that is, the actual inquiries of clients and students, from the official Facebook page of ACLC College of Butuan, with the school admission officer's permission and under his supervision, to serve as the reference for the Chatbot's responses.



3.8 Relevant Technology

3.8.1 Hardware Requirements

● Minimum requirements: Intel Core i3 processor, 2 GB RAM, 250 GB storage.

3.8.2 Software Requirements

● Operating System

o Windows - Windows 7 and above, 64 Bit

● Required Software

o Flask - to run the web application Chatbot.

o Python 3.7 and above



Chapter 4

RESULTS AND DISCUSSIONS

This chapter presents the testing and results obtained from the method proposed in this study. A thorough evaluation was carried out against the objectives specifically stated in the study.

4.1 The Chatbot

Figure 7: Main File Running the Information Retrieval Chatbot.

The proponents built the main program file that runs the Chatbot; it combines NLP with an information retrieval program. The following describes what happened to the data as it passed through each method:

4.1.1 Tokenize

The user's input or query was split into words, keywords, symbols, phrases, and other elements, each of which was represented as a token. These tokens become the basis, or keys, for the later steps. The input is tokenized so that it can be processed by the next step, POS tagging.

Figure 8: Tokenization sample result.

As the result shows, tokenization was successful, and the proponents were able to pass the tokenized data on to the next step.

4.1.2 POS Tagging

Tagging takes place after tokenization: the tokenized words are processed here and assigned their specific part-of-speech tags. These tags indicate the role of each token so that the Chatbot can interpret it. The desired output of this phase is that all tokens are successfully tagged with POS tags.

Figure 9: Output of POS tagging.

Figure 9 shows that the proponents obtained the expected output from tagging.

4.1.3 Lemmatization

Table 1: Sample lemmatization result.

| Word List | Lemmatized Result |
|---|---|
| What | what |
| are | is |
| the | the |
| courses | course |
| offered | offer |
| requirements | requirement |
| needed | need |

In this process, we lemmatized the chunk of words. Words take different forms depending on the context of the sentence, for example whether an event happened in the past, present, or future. To speed up information retrieval, we reduced the number of distinct words by eliminating the different variations of each word.

4.1.4 Information Retrieval

After natural language processing (NLP), the Chatbot used the output as a reference to retrieve a response that satisfies the query.

4.1.4.1 Calculation of TF-IDF

Table 2. Sample result of the calculation.

| Word | TF (Doc 1) | TF (Doc 2) | IDF | TF-IDF (Doc 1) | TF-IDF (Doc 2) |
|---|---|---|---|---|---|
| enrollment | 1/6 | 1/6 | log(2/2) = 0 | 0 | 0 |
| is | 1/6 | 1/6 | log(2/2) = 0 | 0 | 0 |
| the | 0 | 1/6 | log(2/1) = 0.3 | 0.043 | 0 |
| transferee | 1/6 | 1/6 | log(2/2) = 0 | 0 | 0 |
| new | 1/6 | 1/6 | log(2/2) = 0 | 0 | 0 |
| freshmen | 1/6 | 0 | log(2/1) = 0.3 | 0 | 0.043 |

Table 2 shows a sample result of the TF-IDF calculation, displaying the frequency of each word in each document of the dataset file.

4.1.4.2 Tagging

After the calculation, the most frequent words are tagged to the most suitable response retrieved from the document. The result of tagging can only be judged from the final output, that is, from the Chatbot's response itself.



4.2 Initiate Readable Data

This is the separate training file, in which the data is trained and then fed to the main application. It contains the complete processed dataset derived from the raw dataset.

Figure 10: Training file for the main application.

4.2.1 Dataset File

4.2.1.1 Raw Dataset

Figure 11: Raw data extracted formatted in .txt file.



Figure 11 shows the raw dataset taken from the responses on the ACLC College of Butuan Facebook page. Because no available mining method can extract private conversations, the dataset was gathered manually from the conversations between students and the admission officer. The proponents gathered over 1,000 queries and responses, which were used to train the Chatbot.

4.2.2 Training the Datasets

Figure 12: Graph of the result of training the datasets.



Figure 12 shows the results of training on the dataset. The accuracy increases as the number of training passes over the data increases.

4.3 Testing the Chatbot Prototype

The proponents used sample questions from the data gathered from the ACLC school admission's official Facebook page. In the testing phase, the proponents asked questions with the same content but different sentence constructions.

Figure: Chatbot starting message.

Figure 13: Asking about the offered courses.

Figure 14: Asking enrollment requirements for new students.

Figure 15: Asking enrollment requirements for the transferee.

Figures 13 to 15 show sample conversations between the Chatbot and the user. The proponents tested the Chatbot themselves, with at least five tries. As shown, the Chatbot responded appropriately to the user's queries with sufficient information.

4.3.1 Testing Commonly Asked Questions

Figures 13 to 15 show the responses of the Information Retrieval Chatbot to the questions most commonly asked by users, which were gathered by manually mining text from the school's official Facebook page. On the first trial, the Chatbot took a little over five seconds to output a response. The proponents expected this, since it was the Chatbot's first run after being fed and trained on the data. After at least ten extended runs of messaging the Chatbot, it was able to retrieve the information it needed to respond to the user with an acceptable rate of success.

Figure 16 shows the Chatbot tested with a question that, according to the school admission officer, is commonly asked near the end of a semester or about a month after enrollment has officially closed. This was one of a few exceptions, since this schedule varies with external and internal factors that may affect the exact date and time. Accordingly, the Chatbot did not give a definitive answer but instead referred the user to the official Facebook page, where the admission officer provides the answer personally.

4.3.2 Testing Short, Unintelligible Questions

In the following image, the proponents tested short and incomplete questions, which made up a sizable portion of the gathered dataset. Although these questions were part of the training data, the Chatbot was not able to understand them. The Chatbot automatically detects short or incomplete queries and asks the user to rephrase the question.
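The short-query check can be sketched as a guard that runs before retrieval. Both the stop-word list and the two-content-word threshold are assumptions for illustration; the thesis does not state the exact rule its Chatbot uses.

```python
from typing import List, Optional

MIN_CONTENT_WORDS = 2  # assumed threshold; the thesis does not define "short"
# Hypothetical stop-word list: words that carry no admission-specific meaning.
STOP_WORDS = {"what", "when", "how", "is", "are", "the", "a", "an", "?", "."}
REPHRASE_PROMPT = "Sorry, I did not understand. Could you rephrase your question?"


def check_query(tokens: List[str]) -> Optional[str]:
    """Return a rephrase prompt when too few content words remain, else None."""
    content = [t for t in tokens if t.lower() not in STOP_WORDS]
    if len(content) < MIN_CONTENT_WORDS:
        return REPHRASE_PROMPT
    return None


print(check_query(["fee", "?"]))                                          # prints the rephrase prompt
print(check_query(["How", "much", "is", "the", "tuition", "fee", "?"]))  # None
```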

4.4 Response Time Graph


Figure 18: Graph of the response time of the Chatbot.

Figure 18 shows the response time of the Chatbot, in seconds, for each of 20 tests. The average response time across the 20 tests is 1.4 seconds, which is fast considering that the Chatbot was tested on a low-specification computer. The proponents thus achieved the objective of creating a quick-response chatbot.
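A 20-run average like the one above can be collected with a small timing harness. This is a sketch: placeholder_respond is a hypothetical stand-in for the real pipeline, and the thesis does not describe how its timings were taken.

```python
import statistics
import time
from typing import Callable, List


def measure_response_times(respond: Callable[[str], str], queries: List[str]) -> List[float]:
    """Return the wall-clock time, in seconds, of each call to respond()."""
    times: List[float] = []
    for query in queries:
        start = time.perf_counter()  # monotonic, high-resolution clock
        respond(query)
        times.append(time.perf_counter() - start)
    return times


def placeholder_respond(query: str) -> str:
    """Hypothetical stand-in for the Chatbot's NLP + retrieval pipeline."""
    return "Tuition depends on the number of units enrolled."


durations = measure_response_times(placeholder_respond, ["How much is the tuition fee?"] * 20)
print(f"{len(durations)} tests, average {statistics.mean(durations):.3f} s")
```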



4.5 Evaluation

Table 3. Sample questions and responses of the Chatbot compared to the response of
the Admission Officer.

| No. | Question | Chatbot | Admission Officer | Result |
|---|---|---|---|---|
| 1 | When would enrollment start? | Enrollment dates will be communicated through our Facebook page due to the pandemic and other uncontrollable factors | The enrollment date only depends on when will be announced | 84.04% |
| 2 | What are the requirements for transferee students? | Transferee students must submit two photocopies of their birth certificate, original good moral certificate, original transcript of records (TOR), and original Honorable Dismissal (HD), all of which must be included in a long, brown envelope. | Requirement for Transferee: Transcript of Records (TOR), Honorable dismissal, Certificate of Good moral, two photocopies of Birth Cert. (PSA), and 1 long brown envelope. | 83.85% |
| 3 | Requirements for new students? | Requirements for new students are as follows: 2 Photocopies of your Birth Certificate, original Good Moral Certificate, and original Senior HighSchool Card (Form 138). Enclose it all in a long, brown envelope. | Requirement for Freshmen: SHS Card, Certificate of Good moral, 2 photocopies of Birth Cert. (PSA), and 1 long brown envelope. | 90.21% |
| 4 | When will the classes start? | Class schedule will be posted at the Facebook Page, ACLC College of Butuan. | The start of classes is still tentative, the official start of classes will be announced at the Facebook page. | 92.65% |
| 5 | How much is the Tuition fee? | Tuition ranges from Php 16,000.00 (minimum load of units) to Php 25,000.00 (maximum load of units). Cost of per unit is Php 250 for new/transferee students, then Php 350.00 for students who are returning/old. | Tuition fee is around 16k to 23k per semester. New/trasferee is Php 250 and php 350 for old students. | 82.33% |

Table 3 shows a sample comparison of the responses. We tested five frequently asked questions and their responses. The average similarity is 86.62%. This result shows that the Chatbot's responses are acceptable and exceed the expected accuracy stated in the objectives of the study. The admission officer also reviewed the Chatbot's responses and agreed that they are acceptable.
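The reported average follows directly from the five similarity scores in Table 3:

$$\bar{s} = \frac{84.04 + 83.85 + 90.21 + 92.65 + 82.33}{5} = \frac{433.08}{5} = 86.616 \approx 86.62\%$$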



Chapter 5

SUMMARY, CONCLUSION, AND RECOMMENDATION

5.1 Summary

Due to COVID-19, the whole country went into lockdown, so online transactions became the primary means of doing almost everything that is usually done face to face, such as making inquiries to companies or schools. Asking questions online was difficult because the staff assigned to answer questions about the school were not available at all times. For this reason, the researchers created a Chatbot that answers frequently asked questions regarding school matters.

Natural Language Processing (NLP) is one of the best approaches to building a chatbot. The researchers therefore combined NLP with information retrieval to make the Chatbot accurate and quick to respond. The researchers focused on designing an information retrieval chatbot that can properly respond to queries about school admission.

Combining NLP and information retrieval to develop the Chatbot was difficult, and throughout the process the proponents continuously improved the Chatbot to achieve the objectives. There were only a few credible studies and resources to help in developing an information retrieval chatbot, which makes this study unique. Conventional chatbots most commonly employ Naive Bayes, decision trees, support vector machines, recurrent neural networks (RNNs), and similar techniques, none of which relate to information retrieval. Nevertheless, the proponents focused on extensively gathering data and made do with the few related studies that implemented a form of information retrieval in their chatbots. Objectives were then set to plan and meet the requirements of the study, which resulted in a working prototype of the Information Retrieval Chatbot for the ACLC College of Butuan school admission.

To gather the data needed, the proponents approached the ACLC College of Butuan school admission officer and asked permission to access the school's public official Facebook page and gather customer queries under his supervision. This was the primary data source, from which the proponents acquired a sizable amount of data for training the Chatbot.

During the testing phase, the proponents faced problems when feeding the dataset into the Chatbot: when the data lacked information, the Chatbot could not achieve the desired goal and would only respond by asking for more information or words to work with. Adjustments were made to address these issues. Once sufficient data had been fed to the Chatbot, it was able to provide relevant responses similar to the admission officer's credible responses.



5.2 Conclusion

The proponents achieved the stated objectives of the study by creating a working prototype chatbot system that applies information retrieval. They achieved at least 80% similarity between the responses of the Chatbot and those of the admission officer. Upon completing the study, the proponents found that even with the filtered and processed dataset, the Chatbot could not appropriately answer a few questions. It should be emphasized that these queries did not provide enough context for the Chatbot to answer appropriately, even though they were part of the retrieved dataset.

Furthermore, the research showed that even with few related credible studies available, the proponents were still able to produce a prototype information retrieval chatbot. Through further development and research integration, the prototype may be modified and improved. It may not be able to reply to inquiries outside its narrow scope, but the proponents believe there will always be room for improvement.

This research will be beneficial to 1) future researchers working with chatbots, 2) other related studies of natural language processing, and 3) the development of information-retrieval-based systems.



5.3 Recommendation

The following recommendations were made by the study's proponents for future improvement:

● Improve the natural language processing pipeline, which may be why the Chatbot sometimes did not understand the user's query.

● Optimize the training runtime to improve the Chatbot's overall performance and efficiency.

● Increase the size of the dataset used in the training phase to a minimum of 1,000 queries, to maximize the NLP potential of the Chatbot and retrieve the proper information efficiently.



REFERENCES

A Folstad, CB Nordheim, CA Bjorkli - International conference on internet 2018


Springer. What makes users trust a chatbot for customer service?
https://scholar.google.com.ph/scholar

Khrystyna Sarakhman, Roman Kempnyk, Vladyslav Chyhura, 2020. ChatBot using


NLP (Natural Language Processing).
http://ena.lp.edu.ua:8080/handle/ntb/52117

Mohammad Nuruzzaman, Omar Khadeer Hussain, 2018. A survey on chatbot


implementation in customer service industry through deep neural networks
https://scholar.google.com.ph/scholar? 2018&hl=en&as

György Molnár, Zoltán Szűts, September 2018. The Role of Chatbots in Formal
Education https://www.researchgate.net /327670400

W. El Hefny et al. 2021, Jooka: A Bilingual Chatbot for University Admission


https://www.researchgate.net/publication/
350450767_Jooka_A_Bilingual_Chatbot_for_University_Admission

Khrystyna Sarakhman, Roman Kempnyk, Vladyslav Chyhura, 2020). ChatBot using


NLP http://ena.lp.edu.ua:8080/bitstream/ntb/52117/2/2020v2_Sarakhman_K-
ChatBot_using_NLP_429-432.pdf

Casey Phillips, 2018, What is Natural Language Processing (NLP) & Why Chatbots
Need it https://medium.com/support-automation-magazine/what-is-natural-
language-processing-nlp-why-chatbots-need-it-1316d4d120e6

Chu, 2005, p.16, Information Retrieval Methods Report


https://ivypanda.com/essays/information-retrieval-methods/

Gobinda G. Chowdhury (2011), Natural Language Processing (NLP)


https://www.researchgate.net/profile/Rajani-Kamath/publication

Andreas Lommatzsch and Jonas Katins (2019). An Information Retrieval-based Approach for Building Intuitive Chatbots for Large Knowledge Bases. http://ceur-ws.org/Vol-2454/paper_60.pdf

Shahzad Qaiser and Ramsha Ali (2018). Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents. https://ijcaonline.org/archives/volume181/number1/29681-2018917395

Asbjørn Følstad and Marita Skjuve (2019). Chatbots for Customer Service: User Experience and Motivation. https://www.researchgate.net/publication/335079257

Wu et al. (2017). Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots. https://www.semanticscholar.org/paper/Sequential-Matching-Network

M. Uma, V. Sneha, and G. Sneha (2019). Formation of SQL from Natural Language Query using NLP. https://www.researchgate.net/publication/336440228

Mir Md. Moheuddin Khan and Md. Kowsher (2019). Doly: Bengali Chatbot for Bengali Education. https://www.semanticscholar.org/paper/Doly%3A-Bengali-Chatbot-for-Bengali-Education-Kowsher-Tithi/bd4771299d49ba2bc022383d0671d57c09687f1f

Lokanath Mishra, Tushar Gupta, and Abha Shree (2020). Online teaching-learning in higher education during lockdown period of COVID-19 pandemic.

APPENDIX A

Gathered Dataset

Figure 19: Raw extracted data formatted as a .txt file.



APPENDIX B

Training the Dataset

Figure 20: Sample training of the dataset.



Documentation

Figure 21: Gathering Data from ACLC Butuan

Figure 22: Gathering Data from ACLC Butuan Facebook page.

Figures 21 and 22 show the proponents gathering data from the Facebook page of ACLC College of Butuan.

Figure 23: Developing the Chatbot.

Figure 23 shows the proponents developing the Chatbot.



CURRICULUM VITAE

JOHN PAUL M. GETONGO


Purok 2-A Village II, Brgy. Libertad, Butuan City
09999394031

PERSONAL PROFILE
Age: 22 years old
Date of Birth: November 08, 1999
Place of Birth: Libertad, Butuan City
Civil Status: Single
Citizenship: Filipino
Gender: Female
Religion: Roman Catholic
Father's Name: Paulo G. Getongo II
Mother's Name: Grace M. Getongo

EDUCATIONAL ATTAINMENT

PRIMARY:
Butuan Central Elementary School
A.D. Curato Street, Butuan City
June 2007 – March 2012
SECONDARY:
Good Shepherd Christian Academy
Guinggona Subdivision, J.P. Rizal, Butuan City
June 2012 – March 2016
SENIOR HIGH SCHOOL:
ACLC College of Butuan
June 2016 – March 2018
TERTIARY:
ACLC College of Butuan
August 2016 – July 2022

IAN BILL JUSTINE P. CABARLES


Purok 1, Salvacion, San Agustin, Surigao del Sur
09079166198

PERSONAL PROFILE
Age: 22 years old
Date of Birth: November 08, 1999
Place of Birth: Libertad, Butuan City
Civil Status: Single
Citizenship: Filipino
Gender: Female
Religion: Roman Catholic
Father's Name: Paulo G. Getongo II
Mother's Name: Grace M. Getongo

EDUCATIONAL ATTAINMENT
PRIMARY:
Salvacion Elementary School
Salvacion, San Agustin, Surigao del Sur
June 2007 – March 2012
SECONDARY:
Salvacion National High School
Salvacion, San Agustin, Surigao del Sur
June 2012 – March 2016
SENIOR HIGH SCHOOL:
ACLC College of Butuan
June 2016 – March 2018
TERTIARY:
ACLC College of Butuan
August 2016 – July 2022
