In this section, we propose our methodological framework and then describe each of its parts in detail.
3.1. Methodological Framework
This section introduces a methodological framework for modeling tourist satisfaction from online travel reviews, as shown in
Figure 2. To introduce the framework clearly, we first define some basic concepts used in this article.
Online travel reviews: An online travel review is text generated by a tourist, including the tourist’s opinions on the tourism services he/she experienced;
Tourist satisfaction: Tourist satisfaction is a kind of psychological state, resulting from tourists’ overall subjective evaluation of tourism services based on their expectations and actual performance [
7]. In previous studies, online ratings of customers are usually used to represent customers’ satisfaction with products and services [
8,
16,
28,
29]. Following these studies, this paper will use online ratings of tourists to express their satisfaction levels with the tourism services they experienced;
Tourist satisfaction dimension (TSD): Tourists usually evaluate tourism services based on their perception of some important attributes of the tourism services they experienced. Similar to the research of Guo et al. [
8], this paper defines these important attributes of tourism services as TSDs;
Classification of TSDs: In this paper, under the classification framework of the Kano model, TSDs are divided into five categories, namely, performance TSD, excitement TSD, must-be TSD, reverse TSD, and indifferent TSD, whose meanings correspond to the five basic attributes of the Kano model.
Based on
Figure 2 and the basic concepts defined above, the three main parts of the framework are as follows.
3.1.1. Mining Tourists’ Sentiments toward TSDs from Online Reviews
The collected online review data exist in the form of free text, which cannot be directly analyzed. In this part, through various text processing technologies, tourists’ positive or negative sentiments on different TSDs are mined from online text reviews, so as to transform unstructured online text reviews into a structured data matrix for modeling tourist satisfaction. Specifically, this part consists of two stages: (1) mine TSDs from online reviews using the LDA model; (2) identify the sentiment orientations of the review data for each TSD based on the sentiment dictionary.
3.1.2. Measuring the Influence of Each TSD Sentiment on Tourism Service Satisfaction
Tourist satisfaction is expressed by the online rating given by tourists. In giving an overall rating, a tourist comprehensively evaluates the performance of the tourism services in various aspects. Thus, tourist satisfaction can be viewed as a complex combination of tourist sentiments regarding the multiple TSDs covered in their reviews. Based on the structured review data obtained in the previous part, as well as overall tourist satisfaction (online ratings), this paper uses a BPNN to depict the influence of tourists’ positive and negative sentiments toward TSDs on their satisfaction, and then puts forward a method for calculating these effects from the weight parameters obtained by model training.
3.1.3. Identifying the Category of Each TSD
Identifying the attribute category of each TSD under the Kano framework helps to improve tourism services more effectively and thus to improve tourist satisfaction. According to the effects calculated in the previous step, we use the effect-based Kano model (EKM) to classify the extracted TSDs into five categories (performance TSD, excitement TSD, must-be TSD, reverse TSD, and indifferent TSD). Finally, the identified TSD categories inform the improvement strategy for tourism services.
Each of these parts is described in detail below, following the overall framework in
Figure 2.
3.2. Mining Tourists’ Sentiments on TSDs
3.2.1. Extracting TSDs Based on LDA
Previous studies have shown that LDA is an effective topic extraction method for online reviews [8,30]. LDA is a three-level Bayesian model, which assumes that each item in a collection contains a finite number of topics. In the LDA model, words, topics, and documents are three important concepts. The word is the basic unit. Each document consists of multiple words and contains one or more topics. In this context, each online review can be viewed as a document, so the words in the document are the words in the review. From the frequency of each word in each review, we can obtain the topic distribution of each review and the word distribution of each topic by training the LDA model.
Extracting TSDs from travel reviews using LDA mainly includes two steps: preprocessing the text reviews and extracting TSDs from the review data.
1. Preprocessing text reviews
The Chinese travel text review data contain not only words related to the required TSDs, but also a large number of noise words and irrelevant words, which will not become target TSDs and will aggravate the data sparsity problem. Therefore, it is necessary to improve the effect of the LDA model through preprocessing. First, we segment the Chinese text review data into words. Then, we filter out the words in each sentence that appear in the stop word dictionary (HIT Stop Word List, Chinese Stop Word List), the negation dictionary, the degree adverb dictionary, and the sentiment dictionary. Let $R = \{r_1, r_2, \ldots, r_M\}$ denote the online review dataset, where $r_m$ denotes the $m$-th text review, and $M$ is the number of reviews in the dataset. By counting the occurrence frequency of each word in each preprocessed text review, we can obtain the review-word matrix, which has one row per review and one column per distinct word appearing in the preprocessed review data.
2. TSDs extraction based on LDA
By using the obtained review-word matrix as input, the LDA model can be trained. The output of the trained LDA model has three parts: the review-topic matrix, the topic-word matrix, and the topic list. Because there are noise words in the obtained topics, and some topic words may have similar meanings, we manually filter the noise words and merge topic words with similar meanings to obtain more reasonable results. Then, we select the appropriate topics from the results and assign a label to each topic. As in some existing studies, each topic can be regarded as a TSD [7,16]. Let $N$ denote the number of TSDs (i.e., topics), each consisting of multiple frequent words, and let $K_i$ denote the number of frequent words under the $i$-th topic, so that the $i$-th TSD can be denoted as $t_i = \{f_{i1}, f_{i2}, \ldots, f_{iK_i}\}$, where $f_{ik}$ denotes the $k$-th frequent word in the $i$-th TSD.
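As a minimal sketch of the preprocessing step described above, the following Python code builds a review-word count matrix from already segmented reviews. The English tokens, the `STOP_WORDS` set, and the function name `build_review_word_matrix` are illustrative assumptions; the single filter set stands in for the stop word, negation, degree adverb, and sentiment dictionaries, and word segmentation is assumed to have been done beforehand.

```python
from collections import Counter

# Hypothetical filter list standing in for the dictionaries used in preprocessing.
STOP_WORDS = {"the", "was", "and", "a", "very"}


def build_review_word_matrix(tokenized_reviews, stop_words=STOP_WORDS):
    """Return (vocabulary, matrix); matrix[m][v] counts vocabulary[v] in review m."""
    # Drop filtered words from every review.
    cleaned = [[w for w in review if w not in stop_words]
               for review in tokenized_reviews]
    # One column per distinct remaining word, in sorted order.
    vocabulary = sorted({w for review in cleaned for w in review})
    column = {w: v for v, w in enumerate(vocabulary)}
    matrix = []
    for review in cleaned:
        row = [0] * len(vocabulary)
        for word, count in Counter(review).items():
            row[column[word]] = count
        matrix.append(row)
    return vocabulary, matrix


# Two toy reviews, already split into words.
reviews = [
    ["the", "scenery", "was", "beautiful", "and", "the", "ticket", "was", "cheap"],
    ["ticket", "queue", "was", "very", "long", "ticket", "expensive"],
]
vocabulary, matrix = build_review_word_matrix(reviews)
```

A matrix of this shape (one row per review, one column per word) is the kind of input that an LDA implementation typically expects.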
3.2.2. Dictionary-Based TSD Granularity Sentiment Recognition
Typically, a single online review may contain several sentences related to different TSDs.
Table 1 shows several examples of online travel review text. At the same time, the TSDs mentioned in different reviews may be different. To identify tourists’ emotional attitudes towards different attributes, it is necessary to extract the sentences containing each attribute from the original reviews, that is, decompose the original reviews into components under different TSDs. First, we divide the online reviews in
$R$ into clauses according to punctuation marks.
Then, for each obtained TSD $t_i$, we extract from $R$ the sentences that contain frequent words of $t_i$, and obtain the review set related to the topic $t_i$. In particular, if more than one sentence in a review is related to a TSD, the sentences are merged into one. If a review contains no sentence related to a TSD, the corresponding component is null.
Let $R_i$ denote the review set related to the $i$-th TSD in the total review data $R$, $r_{mi}$ denote the $m$-th review in $R_i$, and $M$ denote the number of reviews in the dataset. To show this process more clearly, we take the reviews in
Table 1 as an example. Through the above processing, we can extract the clauses related to TSDs from the three reviews. The specific results are shown in
Table 2.
- 2.
Review text decomposition
In order to identify the sentiment orientation of each review in $R_i$, the review set related to the $i$-th TSD, this paper uses a Chinese sentiment dictionary for sentiment analysis. The Dalian University of Technology Chinese Emotion Vocabulary Ontology Database (referred to as EMO_DIC) covers commonly used sentiment words in the Chinese context and is a widely used sentiment dictionary in the field of Chinese sentiment analysis [31,32]. Following these studies, this paper uses EMO_DIC as the basic sentiment dictionary, which contains a total of 27,466 Chinese sentiment words. It divides Chinese sentiment into 7 categories and 21 sub-categories; the seven categories are “joy”, “good”, “anger”, “sorrow”, “fear”, “evil”, and “shock”. As shown in
Table 3, the dictionary classifies each sentiment word into a specific sentiment category and provides the sentiment polarity of the word (1 for positive, 2 for negative). The parts of speech include adjectives, nouns, etc.
By identifying the sentiment orientation of each review in $R_i$, we obtain the sentiment orientation of each review $r_m$ about $t_i$ (that is, the $i$-th TSD), $m = 1, 2, \ldots, M$, $i = 1, 2, \ldots, N$, and convert the results into nominal encoded data, as shown in
Table 4. The missing values in
Table 4 indicate that the review $r_m$ contains no clause about $t_i$, or that its clauses about $t_i$ show no sentiment orientation. Let $s_{mi}$ denote the sentiment orientation of review $r_m$ about $t_i$. Through Formula (1), we can convert the nominal encoded data into structured data, as shown in
Table 5. As can be seen from
Table 5, Formula (1) assigns a distinct code to each of three cases: the sentiment orientation of $r_m$ about $t_i$ is positive, it is negative, or it is missing.
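The decomposition and dictionary-based recognition steps of this subsection can be sketched as follows. Everything concrete here is an assumption made for illustration: the keyword lists in `TSD_KEYWORDS`, the tiny `POLARITY` lexicon (which only borrows the 1 = positive, 2 = negative convention), the majority rule in `orientation`, and in particular the two-indicator encoding in `encode`, which is one common choice and is not necessarily the coding defined by Formula (1). Negation words and degree adverbs are ignored in this sketch.

```python
import re

# Hypothetical frequent words for two TSDs.
TSD_KEYWORDS = {"scenery": ["scenery", "view"], "price": ["ticket", "price"]}
# Hypothetical polarity lexicon: 1 = positive, 2 = negative.
POLARITY = {"beautiful": 1, "cheap": 1, "expensive": 2, "crowded": 2}


def split_clauses(review):
    """Split a review into clauses at common Western and Chinese punctuation."""
    return [c.strip() for c in re.split(r"[,.;!?，。；！？]", review) if c.strip()]


def decompose(review):
    """Map each TSD to its merged matching clauses, or None if none match."""
    parts = {}
    for tsd, keywords in TSD_KEYWORDS.items():
        hits = [c for c in split_clauses(review) if any(k in c for k in keywords)]
        parts[tsd] = " ".join(hits) if hits else None
    return parts


def orientation(text):
    """Return 'pos', 'neg', or None (missing), using an assumed majority rule."""
    if text is None:
        return None
    words = text.lower().split()
    pos = sum(1 for w in words if POLARITY.get(w) == 1)
    neg = sum(1 for w in words if POLARITY.get(w) == 2)
    if pos == neg:
        return None
    return "pos" if pos > neg else "neg"


def encode(label):
    """Assumed encoding: (positive indicator, negative indicator)."""
    return {"pos": (1, 0), "neg": (0, 1), None: (0, 0)}[label]


review = "The scenery is beautiful, but the ticket is expensive. Staff were fine."
labels = {tsd: orientation(text) for tsd, text in decompose(review).items()}
# One structured row: two indicators per TSD, in the order of TSD_KEYWORDS.
row = [flag for tsd in TSD_KEYWORDS for flag in encode(labels[tsd])]
```

Using two indicators per TSD keeps positive and negative sentiment as separate inputs, which matches the separate positive and negative input nodes used later in the BPNN.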
3.3. Evaluating the Impact of Each TSD’s Sentiment Orientation on Tourist Satisfaction
In this section, we propose a method based on a backpropagation neural network (BPNN) to measure how tourists’ sentiments towards TSDs affect their satisfaction.
Most current studies that model tourist satisfaction from online review data rely on the following assumptions [28,29]: (1) tourist satisfaction (i.e., online rating) follows a Gaussian distribution; (2) tourist satisfaction is a linear combination of tourists’ emotional attitudes towards TSDs; (3) the multicollinearity between different TSDs is low. However, in many practical problems, these assumptions cannot be satisfied. In practice, the TSDs mined from online reviews and the online ratings of tourists usually have the following characteristics: (1) tourist satisfaction (online ratings) usually follows a positively skewed, asymmetric, bimodal (or J-shaped) distribution [23]; (2) tourist satisfaction may be a nonlinear combination of sentiments toward TSDs; (3) there may be multicollinearity among the different attributes automatically mined from the reviews, and there may be a complex nonlinear relationship between the attributes and tourist satisfaction. In fact, tourist satisfaction is a complex combination of tourists’ emotional attitudes towards the full range of TSDs involved in the reviews. Therefore, considering the above characteristics, this section proposes a new method to measure the impact of tourists’ TSD sentiments on tourist satisfaction.
The neural network (NN) is an effective prediction method. In some complex data environments (such as non-normal data, nonlinear relationships, and multicollinearity), NNs perform significantly better than the multiple regression model because they are not affected by collinear independent variables and do not require a linear relationship between the input variables and the dependent variable [7,33]. Although NNs were originally proposed for prediction tasks, some studies have shown that they can also be used to determine the weight information of input variables [34]. For example, artificial NNs were used in [35] to evaluate the relative importance of the factors influencing consumer acceptance of behavioral-targeted advertising services. Therefore, the NN is a competitive alternative method for measuring the influence of tourists’ positive and negative TSD sentiments on their satisfaction. BPNN is one of the most popular NN models and is used as the importance measurement technique in this paper. Specifically, this paper builds a BPNN model with a three-layer network structure, comprising an input layer, a hidden layer, and an output layer, as shown in
Figure 3. Generally, BPNN training includes two processes: forward information propagation and reverse error propagation. In the forward process, the input nodes transmit the tourists’ sentiment information about each TSD to the hidden nodes, and the hidden nodes then transmit the corresponding information to the output layer through the activation function. In the reverse process, the gradient descent method is used to minimize the error between the model outputs and the real results, and the model parameters (namely, the weights) are updated [36].
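The forward and reverse processes described above can be illustrated with a small, dependency-free sketch of a three-layer network. The class name `TinyBPNN`, the sigmoid hidden activation, the linear output node, the squared-error loss, the learning rate, and the toy samples are all assumptions chosen for illustration; they are not the configuration used in this paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyBPNN:
    """Three layers: inputs, one sigmoid hidden layer, one linear output node."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # w[i][j]: weight from input node i to hidden node j.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                  for _ in range(n_inputs)]
        self.b_hidden = [0.0] * n_hidden
        # v[j]: weight from hidden node j to the output node.
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        """Forward propagation: return hidden activations and the output."""
        hidden = []
        for j in range(len(self.v)):
            z = sum(x[i] * self.w[i][j] for i in range(len(x))) + self.b_hidden[j]
            hidden.append(sigmoid(z))
        output = sum(h * vj for h, vj in zip(hidden, self.v)) + self.b_out
        return hidden, output

    def loss(self, samples):
        """Mean squared error over (x, y) samples."""
        return sum((self.forward(x)[1] - y) ** 2 for x, y in samples) / len(samples)

    def train_epoch(self, samples, lr=0.05):
        """Reverse error propagation with a gradient step after each sample."""
        for x, y in samples:
            hidden, output = self.forward(x)
            err = output - y  # gradient of 0.5 * (output - y)^2 w.r.t. output
            for j, h in enumerate(hidden):
                # Error signal at hidden node j, computed before v[j] changes.
                grad_hidden = err * self.v[j] * h * (1.0 - h)
                self.v[j] -= lr * err * h
                self.b_hidden[j] -= lr * grad_hidden
                for i, xi in enumerate(x):
                    self.w[i][j] -= lr * grad_hidden * xi
            self.b_out -= lr * err


# Toy samples: (pos, neg) indicators for two TSDs, paired with a 1-5 rating.
samples = [
    ((1, 0, 1, 0), 5.0),
    ((0, 1, 0, 1), 1.0),
    ((1, 0, 0, 1), 3.0),
    ((1, 0, 0, 0), 4.0),
    ((0, 1, 0, 0), 2.0),
]
net = TinyBPNN(n_inputs=4, n_hidden=3)
loss_before = net.loss(samples)
for _ in range(200):
    net.train_epoch(samples)
loss_after = net.loss(samples)
```

After training, `net.w` and `net.v` hold the input-to-hidden and hidden-to-output weights from which input importance can be derived.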
Let $x_m$ denote the structured data of $r_m$, that is, the emotional attitudes of the tourist towards each TSD $t_i$ (a row of data in
Table 5), where $m = 1, 2, \ldots, M$ and $i = 1, 2, \ldots, N$. In addition, let $y_m$ denote the tourist satisfaction (online rating) corresponding to the review $r_m$. Then, each training sample is composed of $x_m$ and $y_m$ and can be denoted as $(x_m, y_m)$. The following describes in detail the method of calculating the impact of tourists’ TSD sentiments on tourist satisfaction based on BPNN training.
(1) Let $B$ denote the trained BPNN model. Let $w_{ij}^{pos}$ and $w_{ij}^{neg}$ denote the weights between the input nodes $x_i^{pos}$ and $x_i^{neg}$ in $B$ and the $j$-th hidden node, respectively (the blue lines in
Figure 3), where $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, J$. Let $v_j$ denote the weight between the $j$-th hidden node in $B$ and the output node (the yellow lines in
Figure 3), $j = 1, 2, \ldots, J$. Let $E_i^{pos}$ and $E_i^{neg}$, respectively, denote the influence of positive and negative sentiments of tourists towards the $i$-th TSD on tourist satisfaction. $E_i^{pos}$ and $E_i^{neg}$ can be obtained from Formulas (2) and (3), respectively.
(2) In order to reduce overfitting and enhance the reliability of the results, we conduct 10-fold cross-validation on the dataset. According to Formulas (2) and (3), we calculate the sentiment weight information of each of the 10 trained BPNN models and take the average values as the final results, denoted as $\bar{E}_i^{pos}$ and $\bar{E}_i^{neg}$.
(3) Based on the calculated $\bar{E}_i^{pos}$ and $\bar{E}_i^{neg}$, we can evaluate the total impact (relative importance weight) of the $i$-th TSD on tourist satisfaction. Let $D_i$ denote the range of influence of the $i$-th TSD on tourist satisfaction. $D_i$ can be calculated by Formula (4).
(4) Let $\omega_i$ denote the total impact (importance weight) of the $i$-th TSD on tourist satisfaction, $i = 1, 2, \ldots, N$. Then, following the studies [37,38], $\omega_i$ is calculated by Formula (5).
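To make steps (1) to (4) concrete, the sketch below shows one way such quantities could be computed from trained weights. It is an illustration under stated assumptions, not a restatement of Formulas (2) to (5): the effect of an input node is taken to be the sum over hidden nodes of the input-to-hidden weight times the hidden-to-output weight (a connection-weight heuristic), the range of a TSD is taken to be the absolute difference between its positive and negative effects, and the importance weight is the range normalized over all TSDs. All function names and numbers are hypothetical.

```python
def connection_weight_effect(w_in, v_out):
    """Assumed effect: sum over hidden nodes of input weight * output weight."""
    return sum(w * v for w, v in zip(w_in, v_out))


def average(values):
    return sum(values) / len(values)


def importance_weights(effects):
    """effects maps a TSD to (pos_effect, neg_effect); returns normalized ranges."""
    # Assumed range: absolute gap between positive and negative effects.
    ranges = {tsd: abs(pos - neg) for tsd, (pos, neg) in effects.items()}
    total = sum(ranges.values())
    return {tsd: r / total for tsd, r in ranges.items()}


# Hypothetical weights of one TSD in two trained models (stand-ins for 10 folds).
folds = [
    {"w_pos": [0.5, 0.25], "w_neg": [-1.0, -0.5], "v": [1.0, 2.0]},
    {"w_pos": [1.0, 0.5], "w_neg": [-0.5, -0.25], "v": [1.0, 1.0]},
]
# Average the per-model effects, as in the cross-validation step.
pos_effect = average([connection_weight_effect(f["w_pos"], f["v"]) for f in folds])
neg_effect = average([connection_weight_effect(f["w_neg"], f["v"]) for f in folds])

# Importance weights for two TSDs; the second pair of effects is made up.
weights = importance_weights({
    "scenery": (pos_effect, neg_effect),
    "price": (0.25, -0.625),
})
```

Averaging over folds before computing ranges keeps a single unusual model from dominating the importance ranking.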
3.4. TSD Category Recognition Based on Kano Model
According to the obtained positive and negative effects $\bar{E}_i^{pos}$ and $\bar{E}_i^{neg}$, as well as the basic principle of the Kano model, we propose an effect-based Kano model (EKM), which can identify the category of each TSD from the perspective of tourists. The core idea of EKM is shown in
Figure 4.
(1) In
Figure 4a, positive sentiment is taken to mean that the performance of the TSD meets the requirements of tourists (the green rectangle in
Figure 4a); in contrast, negative sentiment is taken to mean that the performance of the TSD does not meet the requirements of tourists (the red rectangle in
Figure 4a). In addition, the online ratings of tourists indicate their overall satisfaction with the tourism services they experienced.
(2) In
Figure 4b, with the introduction of $\bar{E}_i^{pos}$ and $\bar{E}_i^{neg}$, the traditional Kano model framework is divided into two parts. The right side is the part related to positive sentiments, that is, the requirements for the TSD are fulfilled; here, $\bar{E}_i^{pos}$ can be regarded as the influence of $t_i$ on the overall satisfaction of tourists when TSD $t_i$ is satisfied. Accordingly, the left side of
Figure 4b is the part related to negative sentiments, that is, the requirements for the TSD are not fulfilled; here, $\bar{E}_i^{neg}$ can be regarded as the influence of $t_i$ on the overall satisfaction of tourists when TSD $t_i$ is not satisfied. The detailed meanings of $\bar{E}_i^{pos}$ and $\bar{E}_i^{neg}$ are as follows:
(i) $\bar{E}_i^{pos} > 0$ indicates that the overall satisfaction of tourists will increase when their requirements for $t_i$ are satisfied;
(ii) $\bar{E}_i^{pos} \le 0$ indicates that the overall satisfaction of tourists will not increase when their requirements for $t_i$ are satisfied;
(iii) $\bar{E}_i^{neg} \ge 0$ indicates that the overall satisfaction of tourists will not decrease when their requirements for $t_i$ are not satisfied;
(iv) $\bar{E}_i^{neg} < 0$ indicates that the overall satisfaction of tourists will decrease when their requirements for $t_i$ are not satisfied.
(3) In
Figure 4c, $\bar{E}_i^{pos}$ and $\bar{E}_i^{neg}$ serve as the two coordinate axes. Therefore, the TSD represented as a curve in
Figure 4b can be converted into a point in
Figure 4c. Thus, according to the basic principles of the Kano model, combined with $\bar{E}_i^{pos}$ and $\bar{E}_i^{neg}$, TSDs can be divided into five types in
Figure 4c. The detailed explanation is as follows:
(i) If both $|\bar{E}_i^{pos}| < \theta$ and $|\bar{E}_i^{neg}| < \theta$, this indicates that $t_i$ has a very small effect on the overall satisfaction of tourists, and $t_i$ is an indifferent attribute. Here, $\theta$ is the threshold that determines whether a TSD is an indifferent attribute; the classification conditions for the other cases are given in (ii)–(v);
(ii) If $\bar{E}_i^{pos} \le 0$ and $\bar{E}_i^{neg} < 0$, then $t_i$ is a must-be attribute, that is, when the requirements of tourists for $t_i$ are satisfied, the overall satisfaction of tourists will not increase. When they are not satisfied, the overall satisfaction of tourists will decrease;
(iii) If $\bar{E}_i^{pos} \le 0$ and $\bar{E}_i^{neg} \ge 0$, then $t_i$ is a reverse attribute, that is, when the requirements of tourists for $t_i$ are satisfied, the overall satisfaction of tourists will not increase. When they are not satisfied, the overall satisfaction of tourists will not decrease;
(iv) If $\bar{E}_i^{pos} > 0$ and $\bar{E}_i^{neg} < 0$, then $t_i$ is a performance attribute, that is, when the requirements of tourists for $t_i$ are satisfied, the overall satisfaction of tourists will increase. When they are not satisfied, the overall satisfaction of tourists will decrease;
(v) If $\bar{E}_i^{pos} > 0$ and $\bar{E}_i^{neg} \ge 0$, then $t_i$ is an excitement attribute, that is, when the requirements of tourists for $t_i$ are satisfied, the overall satisfaction of tourists will increase. When they are not satisfied, the overall satisfaction of tourists will not decrease.
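The five classification rules above can be written as a short decision function. This is a sketch: the function name `classify_tsd`, the parameter names, and the use of absolute values in the indifference test are assumptions, and the threshold value is left to the caller because the text does not fix one here.

```python
def classify_tsd(pos_effect, neg_effect, threshold):
    """Assign a Kano category from the positive and negative effects of a TSD."""
    # Rule (i): both effects are negligible (assumed to mean small in magnitude).
    if abs(pos_effect) < threshold and abs(neg_effect) < threshold:
        return "indifferent"
    if pos_effect > 0:
        # Satisfaction rises when the requirement is met: rules (iv) and (v).
        return "performance" if neg_effect < 0 else "excitement"
    # Satisfaction does not rise when the requirement is met: rules (ii) and (iii).
    return "must-be" if neg_effect < 0 else "reverse"


# Hypothetical effects, classified with an illustrative threshold of 0.05.
examples = {
    "parking": classify_tsd(0.01, -0.02, 0.05),
    "scenery": classify_tsd(0.30, -0.40, 0.05),
    "souvenirs": classify_tsd(0.30, 0.00, 0.05),
    "hygiene": classify_tsd(0.00, -0.40, 0.05),
}
```

Checking the indifference rule first mirrors the ordering in the text, where rules (ii) to (v) apply only to TSDs that are not indifferent.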