OpenBook Whitepaper V1.0


OpenBook: A knowledge-based method to develop Question-Answering chatbots

Botpress Technologies Inc.

Abstract

Many internet users have encountered talking with a virtual digital assistant at least once. These interactions could be in the form of voice conversations with popular assistants like Siri, Alexa, or Google Assistant, or interactions with text-based chatbots. All chatbots involve text processing, and one of the most prominent approaches for building chatbots is the intent-based approach. But these methods are challenging to build, monotonous for users, and do not scale well. This whitepaper introduces a new solution developed by Botpress called OpenBook, which uses a knowledge-based approach for building question-answering chatbots. This paper provides comparison results against several popular intent-based chatbot building platforms to demonstrate the performance and ease of implementing chatbots using OpenBook.

Introduction to Chatbots

A chatbot (or conversational agent) is a software application that simulates a conversation with a user in a natural language through the exchange of messages. The conversational exchange can be text- or voice-based, hosted through a web application, mobile application, or consumable service.

One of the primary advantages of chatbots is that they streamline user interactions and help businesses and individuals emulate human-to-human interactions. As a result, chatbots enable companies to enjoy greater operational efficiencies and an improved customer experience while also reducing costs and interaction times for employees and customers alike. Chatbots also provide an automated way of gathering data that helps organizations better understand their customers, leading to better and more personalized solutions.

With the proliferation of chatbots and the technology supporting their development, there is a stark increase in the number of organizations utilizing them. Insider Intelligence predicts that consumer retail spending through chatbots will increase to $142 billion worldwide by 2024, compared to the $2.8 billion spent in 2019 [1]. Several industries are adopting chatbots, including prominent sectors like online retail, customer service, telecommunications, and banking. Juniper Research estimates $7.3 billion in operational cost savings globally by 2023 in the banking industry alone through chatbots [2].

How are Chatbots built?

Chatbots are considered one of the most prominent emerging technologies. As a result, there is a huge demand for tools to develop such chatbots. Conversational AI platforms have played an important role in democratizing the development of chatbots across various industries. These platforms provide a technology stack for users to build and customize chatbots according to their needs.

Typically, conversational AI platforms allow developers to create chatbots using a flow-based approach or an intent-based approach. The flow-based approach uses a predefined set of routes (mapped like a flowchart) for the chatbot to interact with users. The intent-based approach, on the other hand, enables the understanding of free-text input by using Natural Language Understanding (NLU) methods to identify a user's intention (hence 'intent-based') and provide responses based on the identified intent. While an improvement over flow-based models, intent-based approaches come with a list of shortcomings.

Intent-based methods' challenges

Intent-based approaches require a predetermined list of intents that the chatbot should be able to handle. However, creating an exhaustive list of intents requires intensive building time and resources, making it error-prone. The intents have to be predetermined, and for each intent, the user has to provide a set of utterances

Copyrighted material. © 2022 Botpress Technologies, Inc.


to learn the relevance of potential questions. As a result, any errors made while providing the utterances will produce an undesirable outcome. The intent-based approach also has to be retrained using machine learning models every time there is a change in an intent or its related utterances. Further, because this approach generates responses based on the relevant context, it must be programmed explicitly.

OpenBook solution

As an alternative to intent-based chatbots, OpenBook uses a knowledge-driven approach to build question-answering chatbots. The only input required for OpenBook is the knowledge, provided as a book. A book is a collection of short facts in a markdown format.

Unlike intent-based approaches, chatbots built on OpenBook do not require the input of questions, their variations, and various answers to function. Instead, chatbots built on OpenBook can be deployed as soon as a book is provided to the platform. By going intentless, OpenBook requires much less data and is robust to unanticipated questions, facilitating a faster and more reliable way of creating intelligent chatbots.

OpenBook also allows developers to select the level of creativity of generated responses. They can be as strict as quoting verbatim from the book of facts or as loose as generating creative, human-like responses. Unlike intent-based approaches, OpenBook is scalable without affecting the system's performance. Also, OpenBook does not involve training or fine-tuning of machine learning models when the knowledge is updated, making it one of the first commercial products with a zero-shot learning method for creating chatbots.

Methodology to compare platforms

OpenBook was compared with three other popular intent-based chatbot development platforms: Rasa, IBM's Watson Assistant, and Google's Dialogflow. Three experts, each having at least one year of experience in developing chatbots using the respective platform, were given a specification/factsheet [3] to build a chatbot for a hotel.

Two high-level activities were conducted to compare the different chatbot development platforms: chatbot building and chatbot user interaction. The first activity, chatbot building, focused on identifying the 'ease-of-development' and the 'input size' consumed by each platform. The 'ease-of-development' was evaluated by measuring the time taken to build chatbots by the respective experts of each platform. The 'input size' for each platform was measured as the total number of non-whitespace characters provided as input for building the chatbot.

The second activity was to measure the chatbot's performance using various parameters through user interactions. The same set of questions was fed as input to each chatbot, and the chatbot's responses to each question were recorded. This question-response set and the specification sheet were provided to two groups of independent evaluators, who then evaluated and rated it against 11 parameters, as shown in Table 1. Each of these 11 parameters captures unique information about the response and its correctness.

1. The response contained the complete answer to the question asked
2. The response contained only partial information relevant to the question asked
3. The response had at least one piece of additional unrelated information
4. More than half of the response contained information that is not relevant to the question
5. The response was an unaltered repetition of facts
6. The response contained misleading/untrue information
7. The response was completely wrong
8. The chatbot responded that it did not know the answer when it should have answered
9. The response showed that the chatbot did not understand the question
10. The chatbot correctly responded that it did not know the answer to an out-of-scope question
11. The question is invalid

Table 1. Parameters used for scoring the response.

The evaluators were unaware of the platform used for response generation, and the order of the responses was also randomized to avoid any bias.
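As an illustration, the "book" input and the input-size metric described above might look like the following sketch. The hotel facts and the `input_size` helper are hypothetical inventions for this example, not Botpress's actual book schema or tooling.

```python
# A hypothetical "book" of short hotel facts written in markdown,
# following the description of OpenBook's input above (the exact
# schema Botpress uses is not specified in this paper).
HOTEL_BOOK = """\
# Hotel facts

- Check-in starts at 3 PM and check-out ends at 11 AM.
- The pool is open from 8 AM to 10 PM.
- Breakfast is served daily from 7 AM to 10 AM.
"""


def input_size(text: str) -> int:
    """'Input size' as defined in the methodology: the total
    number of non-whitespace characters in the build-time input."""
    return sum(1 for ch in text if not ch.isspace())


print(input_size(HOTEL_BOOK))
```

The same `input_size` measure would be applied to whatever build-time artifacts each platform consumes (intents, utterances, flows, or a book of facts), which is what makes the character counts comparable across platforms.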

Preliminary Results and Analysis

The following section provides a brief analysis of the preliminary results. Fig. 1 measures the 'ease-of-development' by comparing the time taken to build each of the chatbots, given the same specification. The results show that the time to develop a chatbot using OpenBook is considerably less than the next best time, achieved by both Rasa and Dialogflow.

Fig. 1. Time to build the chatbots by experts on different development platforms.

The second analysis of chatbot building compared the size of the input by measuring the total number of non-whitespace characters used as input for building a chatbot on each of the platforms. It is evident from Fig. 2 that OpenBook requires the fewest characters as input.

Fig. 2. Comparison of input sizes on different platforms.

A preliminary analysis was also done to measure the performance of the chatbots created using the different platforms. The results shown are from the initial full dataset (after the removal of erroneous instances). This paper will be updated with further analysis as and when new datasets, results, errors, and insights are identified. The raw dataset, encoding, and analysis work are available in [4] and [5]. The dataset can also be downloaded from Kaggle [6]. There are a total of 5019 question-response pairs on which results have been reported. The metrics shown are an average of scores between two sets of evaluations.

Fig. 3 shows a comparison of the number of times the responses from each chatbot were completely accurate, as measured by the evaluators.

Fig. 3. Comparison of performance of chatbots by measuring the number of accurate responses (average of 2 rounds of evaluation).

A comparison was also made to measure the number of times the generated response was not correct, as shown in Fig. 4. Inaccurate responses are responses that are irrelevant to the question asked but relevant to the information provided to the chatbot while building it. From Fig. 4, it can be seen that OpenBook performs better than the intent-based methods.

Another parameter that determines the creativity of a chatbot in providing more human-like responses is the number of times the chatbot repeated information verbatim from the information provided to build it, shown in Fig. 5. This shows that OpenBook can provide responses that do not sound monotonous.
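The reported metrics are described as an average of scores between two rounds of evaluation. Exactly how the scores are aggregated is not specified in this paper, so the following is only a minimal sketch under the assumption of binary per-response codes; the variable names and data are invented for illustration.

```python
# Hypothetical per-round codings: for each question-response pair,
# 1 if the evaluator marked it "completely accurate", else 0.
round_1 = [1, 0, 1, 1, 0]
round_2 = [1, 1, 1, 0, 0]

# Count accurate responses in each round, then average the two
# round-level counts, as "average of 2 rounds" suggests.
count_1 = sum(round_1)
count_2 = sum(round_2)
average_accurate = (count_1 + count_2) / 2
print(average_accurate)  # (3 + 3) / 2 = 3.0
```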

Fig. 4. Comparison of performance of chatbots by measuring the number of incorrect responses (average of 2 rounds of evaluation).

Fig. 5. Comparison of verbatim responses by different chatbots (average of 2 rounds of evaluation).

Another analysis was done to see whether the chatbots provided information that was completely false, as opposed to merely inaccurate, as shown in Fig. 6. Providing false responses is an area where OpenBook is below par against the intent-based chatbots, and it suggests clear opportunities to improve the intentless chatbot platform further.

Fig. 6. The number of instances where the chatbots created fake responses irrelevant to the information used for the bots (average of 2 rounds of evaluation).

There were also two rounds of evaluation to determine the best and worst responses for each question given by the four chatbots. The chatbots' best and worst responses are shown in Fig. 7a and Fig. 7b, respectively. An evaluator could rate more than one response as the best or worst for each question, given that the responses from different chatbots might be similar. Both numbers show that OpenBook fared well compared to the other chatbots, having the most best responses and the fewest worst answers.

Fig. 7a. The number of times (in percentage) the response from each chatbot was considered to be the best (average of 2 rounds of evaluation).
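Since an evaluator may mark several responses as best (or worst) for the same question, the percentages in Fig. 7a and Fig. 7b need not sum to 100. A tally along these lines would produce such figures; the vote data and names below are invented for illustration, not the study's actual data.

```python
# Hypothetical best-response votes: for each question, the set of
# chatbots whose response an evaluator marked as "best" (ties allowed).
best_votes = [
    {"OpenBook"},
    {"OpenBook", "Rasa"},
    {"Dialogflow"},
    {"OpenBook"},
]

# Percentage of questions on which each chatbot had a best response.
bots = ["OpenBook", "Rasa", "Watson Assistant", "Dialogflow"]
totals = {
    bot: 100 * sum(bot in votes for votes in best_votes) / len(best_votes)
    for bot in bots
}
print(totals)  # OpenBook: 75.0, Rasa: 25.0, Watson Assistant: 0.0, Dialogflow: 25.0
```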

Fig. 7b. The number of times (in percentage) the response from each chatbot was considered to be the worst (average of 2 rounds of evaluation).

Study Parameters

While the benchmarking exercise was not completely objective, it has value in highlighting the progress made with the use of a knowledge-based approach over intent-based approaches. An effort has been made to mitigate bias from obvious sources in the results, and any identified erroneous data has been removed from the analysis. However, there are multiple constraints and fairness measures in the methodology and analysis. Tables 2, 3, and 4 provide a non-exhaustive list of the constraints, fairness measures, and identified opportunities for improvement in the methodology. Despite these issues, the results indicate the advantages of the knowledge-driven approach, which can be considered the future of chatbot development, with OpenBook's first version being the first step toward it. Multiple improvements can be made to make knowledge-based approaches easier to build and better performing.

1. The chatbots were built by experts with a minimum of one year of experience in building bots using their respective platforms, but they may not have an equal amount of experience.
2. Only one attempt was provided to each expert to build the chatbot.
3. The performance of the different platforms heavily reflects the approach of the developers of the chatbots. Even though the developers were experts, the choices they made influenced the results.
4. Each response was rated by two different evaluators.
5. The development time for building a chatbot was not actively tracked using an automated system.
6. The subjectivity and bias of independent evaluators in understanding specifications, questions, and responses by different bots are not quantified.
7. The level of expertise of independent evaluators for coding responses might differ.

Table 2. Constraints of the study.

1. The order of the responses from different chatbots presented to evaluators was randomized.
2. Evaluators were unaware of the platform used for development while evaluating the responses.
3. Data will be shared so that independent analysis can be conducted.

Table 3. Fairness measures.

1. Chatbots developed using other platforms fare well in certain parameters. This is an active area that Botpress is working to improve.
2. Data cleaning and sanity checks on the encoded answers can be further improved.

Table 4. Identified opportunities for improvement in OpenBook.

Conclusion

OpenBook is a knowledge-based approach to building question-answering chatbots without training or fine-tuning models. These benchmarks show that OpenBook beats its intent-based counterparts in both categories: the ease of building a chatbot for developers and the satisfaction scores from chatbot users. From the preliminary results and analysis, one can infer that knowledge-driven approaches outperform intent-based approaches in providing correct and relevant responses, and that OpenBook makes building chatbots easier.

References

[1] Chatbot market in 2022: Stats, trends, and companies in the growing AI chatbot industry, Business Insider, 2022 (link)

[2] Bank Cost Savings via Chatbots to reach $7.3 billion by 2023, as automated customer experience evolves (link)

[3] Specification/Factsheet given to chatbot builders (link)

[4] Raw Data and Analysis Spreadsheet - Round 1 (link)

[5] Raw Data and Analysis Spreadsheet - Round 2 (link)

[6] Justin Watson, Sylvain Perron. (2022). Botpress Question and Answering Chatbot Data, Kaggle. https://doi.org/10.34740/KAGGLE/DSV/3594757 (link)
