1. Introduction
The analysis of the financial health and performance of companies, and the prognosis of their potential financial failure or even bankruptcy, are currently much debated and examined topics, both among academics and entrepreneurs. Global problems, the pandemic, and internal problems within countries and companies create the preconditions for possible corporate failure. Companies must therefore face many challenges in this regard. To avoid bankruptcy, they must apply increasingly advanced and sophisticated methods of diagnosing their financial health. By applying these methods, company managers obtain the necessary information about financial health, which can be a useful tool for managing and improving company performance.
In economic sciences, as well as in other leading fields, nonparametric methods are increasingly used in research instead of multi-criteria parametric methods. This is primarily due to the quality and usability of economic and financial data, which often do not meet the requirements for the application of parametric methods. Another important factor in the change of applied methods is that the amount of processed data is growing, which increases the demands on the tools and methods used to process and analyze these data. This is also confirmed by Brezigar-Masten and Masten [
1], who point out that large datasets are available and have grown significantly over the last 20 years. Equally important is the speed of development in the field of applicable technologies and statistical methods. These facts help to apply increasingly sophisticated methods in various scientific fields. The limitations of parametric methods were already pointed out in 1999 by Dimitras et al. [
2], who applied a method that eliminates the limitations of parametric methods and thus achieved better results in their research compared to the results of parametric methods. Several nonparametric methods may be used to solve this problem. To illustrate, Gepp et al. [
3] suggest that decision trees can achieve better classification accuracy than discriminant analysis (DA), and their application does not need to meet the assumptions required when applying DA. Olson et al. [
4] argue that logistic regression (LR) provides less exact results than decision trees when predicting bankruptcy. Durica et al. [
5] applied decision trees (CART, CHAID) to classify prosperous and non-prosperous businesses and achieved 98% classification accuracy. Liu and Wu [
6] pointed out the simplicity of interpreting the results of decision trees. Lee et al. [
7] highlighted the decision tree as an important tool for predictive and classification analysis. The authors argue that although predictive analytics has existed for decades, it gained much attention especially during the last decade of the 20th century, together with the application of new nonparametric methods. This approach mainly includes data-mining methods and big data analysis.
Ensemble methods have been developed and studied to create models that predict outcomes with higher precision. The aim of these methods is to achieve better classification accuracy than single models can achieve individually, by combining a set of simple or weak models [
8]. Well-known combination methods include bagging or boosting [
9]. Several studies comparing the classification accuracy of these methods with single models can be mentioned. For example, Karas and Režňáková [
10] achieved much better classification accuracy results when using the nonparametric method of boosted trees compared to linear discriminant analysis (LDA). Ben Jabeur et al. [
11] proposed an improved XGBoost algorithm derived from a selection of important features (FS-XGBoost) and confirmed that this model, combined with traditional feature selection methods, outperformed discriminant analysis, partial least squares discriminant analysis, logistic regression, support vector machines, the Multilayer Perceptron, and the Restricted Boltzmann Machine in terms of AUC.
Despite significant achievements in knowledge acquisition, traditional machine-learning methods (LDA, LR) may not achieve satisfactory performance when handling complex information, e.g., unbalanced, high-dimensional, or noisy data. This is due to their limited ability to capture additional characteristics and underlying data structures. Under these circumstances, the question of how to build an effective mining model for knowledge discovery comes to the fore. The concept of ensemble learning revolves around combining data fusion, data modeling, and data mining to create a single connected and comprehensive system.
Another of the possible applications in the given area, which falls under the issue of machine learning, is artificial neural networks. Results of their application are presented in studies [
4,
12,
13,
14].
When predicting the financial health of companies, methods from a special field of machine learning—deep learning—have recently come to the fore. The application of these methods has greatly advanced the state of the art in speech recognition and in the recognition and detection of visual objects, as well as in many other areas, such as drug discovery and the prognosis of corporate bankruptcies. Deep convolutional networks brought a breakthrough in the processing of images, video, speech, and sound, and subsequently also in corporate insolvency forecasting [
15].
Nevertheless, discriminant analysis and logistic regression are methodologies most often used in predicting a company’s financial health [
3].
In this paper, an analysis of the financial health and performance of businesses in the construction industry was performed. This industry is unique because of its high capital intensity, uniqueness of projects, and long-term project periods [
16,
17]. Companies doing business within it achieve high values of liquidity and have different capital structures [
18] compared to businesses from other industries. For this reason, simple bankruptcy prediction methods are not suitable for companies doing business within this industry, and ensemble methods appear to be an appropriate alternative. The application of these methods in the construction industry of transition economies is rare. This paper strives to fill this gap in scientific research by presenting insolvency prediction models for the construction industry built using ensemble algorithms. We suppose that these models will achieve higher performance than simple ones. Based on this, we asked the following research question: Will ensemble bankruptcy prediction models for the construction industry achieve higher performance than simple models?
Ensemble models such as boosted trees are often used for bankruptcy prediction due to the high variance of the financial indicators that express the financial health of companies. The reason for this variance is the relatively limited sample size: most values of the selected indicators are concentrated in a narrow range, but some companies have extreme values. This creates problems for gradient-based models such as neural networks or logistic regression and can lead to less effective predictions. Even after normalizing or standardizing the data, this problem is difficult to overcome. Boosted tree models, on the other hand, consider the order of feature values rather than the values themselves, so their results are not affected by extreme values and require no pre-processing effort.
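This order-dependence can be illustrated with a small sketch (synthetic data; scikit-learn assumed): a decision tree fitted to a heavily skewed feature yields identical predictions whether it sees the raw values or a rank-preserving log transform, because tree splits depend only on the ordering of values.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic feature with a heavy right tail, mimicking a skewed financial ratio.
X = rng.lognormal(mean=0.0, sigma=2.0, size=(500, 1))
y = (X[:, 0] > np.median(X)).astype(int)

# Fit one tree on the raw values and one on a rank-preserving log transform.
tree_raw = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree_log = DecisionTreeClassifier(max_depth=3, random_state=0).fit(np.log(X), y)

# Splits depend only on the ordering of feature values, so predictions agree.
same = np.array_equal(tree_raw.predict(X), tree_log.predict(np.log(X)))
print(same)  # True
```

The same monotone transform would change the coefficients and predictions of a logistic regression, which is why tree ensembles are more robust to extreme indicator values.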
The rest of the paper is organized in the following manner:
Section 2 lists selected machine-learning methods and studies dealing with bankruptcy prediction.
Section 3 specifies the sample used in the research and the initial set of financial features and describes Lasso regression and selected simple and ensemble classifiers from the methodological point of view.
Section 4 offers results of bankruptcy prediction achieved by neural networks, decision trees, AdaBoost, and Gradient-boosting algorithms.
Section 5 discusses the results achieved in this study in terms of similar studies and conclusions.
2. Literature Review
Currently, artificial intelligence (AI) techniques are coming to the fore as alternative methods to established approaches or as parts of combined procedures. They are increasingly used to find answers to complex economic problems, including predicting the failure of businesses. Artificial intelligence techniques manage to overcome the shortcomings of nonlinear programs and deal with corrupted and incoherent data. Their advantage is the ability to learn from examples; once trained, they can predict and generalize easily and quickly [
19]. The subfield of AI is machine learning, which allows computers to learn even if they are not directly programmed [
20]. There are two basic types of learning methodologies used in machine learning—supervised learning and unsupervised learning. The difference between them is that supervised learning provides a target value to guide training with the data, while the unsupervised method works without a target value. The second difference lies in their application to different problems. While supervised learning is applied in regression or classification settings, unsupervised learning is rather dedicated to solving association and clustering problems [
21]. Often used machine-learning algorithms include Multiple Discriminant Analysis, Logistic Regression, Naïve Bayes, Bayesian Network, Support Vector Machines, Decision Tree, Random Forest, Bootstrapped aggregation (Bagging), AdaBoost, Gradient Boosting, K-Nearest Neighbor and Artificial Neural Network (ANN) [
22].
Artificial neural networks are a category of parallel information processing models inspired by biological neural networks, incorporating various significant simplifications [
23]. They are able to learn through previous exposure by generalizing acquired knowledge to form new conclusions, and in this way, they can make useful decisions [
9]. Recent studies show that ANNs have nonlinear, nonparametric adaptive learning properties that enable them to recognize and classify patterns [
12] (p. 16). During the recent period, they have been used successfully to solve and predict numerous financial difficulties, including the prognosis of insolvency [
12].
The pioneering attempts at artificial intelligence models imitating the organic nervous system can be traced back to the 1920s. It took only two decades for the foundations of the scientific field of artificial neural networks to be laid. The seminal theory devoted to this cause was authored by McCulloch and Pitts [
24]. In their work, they highlighted the possibility of constructing an artificial neural network able to perform arithmetic and logical operations. This work influenced further scholars, who began to examine the practical application of neural networks [
25]. In 1958, Rosenblatt proposed a definition of a neural network structure termed the perceptron. It was possibly the earliest “honest-to-goodness neural network tool”, since it ran as a detailed simulation on an IBM 704 computer [
26]. The perceptron was developed in line with biological fundamentals and demonstrated an ability to learn [
27]. These attempts were stopped because it was assumed that the linear neural network method was insufficient for dealing with more complex problems [
25]. A breakthrough concerning the development of neural networks occurred during the 1970s and 1980s when Rumelhart et al. [
28] rediscovered the backpropagation procedure previously developed by Werbos in 1974 and established it as a widely accepted tool for training multilayer perceptrons. The aim is to find the minimum of the error function with respect to the connection weights [
27,
29]. Multilayer perceptrons (MLPs) with backpropagation learning algorithms, also known as multilayer feedforward neural networks, can handle a large number of problems that a single-layer perceptron is unable to resolve. Therefore, they have been used more than other types of neural networks for a wide variety of problems [
30,
31].
Another widely used machine-learning method is the decision tree (DT). Decision trees create a hierarchical tree structure that divides data into leaves [
9]. Branch nodes in decision trees store classification rules, which are used to group comparable samples into the same leaf nodes. As a result, decision trees are used in both classification and regression tasks [
32]. Commonly used decision tree methods encompass ID3, C4.5, C5, CHAID (Chi-squared Automatic Interaction Detection), CART (Classification and Regression Trees), Best First Decision Tree or AD Decision Tree [
4,
33]. Decision trees were first used to predict business failure by Frydman et al. [
34]. These authors found that DT outperforms discriminant analysis in predicting financial failure. DTs were also used in corporate health prediction by Nazemi Ardakani et al. [
35]. Olson et al. [
4] used DT and confirmed its significant classification accuracy. The excellent prediction and classification accuracy of DT models was also confirmed by Gepp and Kumar [
36].
Among well-known machine-learning algorithms applicable to both classification problems and regression tasks, the Support Vector Machine (SVM) is highly acknowledged [
It is among the most widely used classification models because of its good results in high-dimensional feature spaces [
32]. This method was first introduced by Vapnik [
38]. SVM attempts to identify the hyperplane with the largest margin separating the inputs into two classes, which is called the optimal separating hyperplane. For linearly inseparable data, SVM maps the data into a higher-dimensional feature space by means of a kernel function [
32].
Over the past few decades, several significant advances in sophisticated learning procedures and powerful pre-processing methods have been made within the area of machine learning. Among others, ANNs were further developed towards deeper neural network architectures combined with advanced learning capability, which is summarized as deep learning [
39]. A deep ANN is based on the application of a nonlinear model with multiple hidden layers, allowing the capture of the complex relationship between input and output [
40]. The benefit deep learning provides over classical machine learning is that manually extracted or hand-crafted features are no longer needed. Deep learning derives features automatically from the raw input, processes them, and determines further actions based on them [
40].
In the discipline of pattern recognition and machine learning, combinations of numerous classifiers have frequently been utilized. These combined approaches are known as ensemble classifiers [
9]. They have repeatedly delivered better results than a single classifier [
6]. The easiest way to combine different classifiers is majority voting. The outputs of the k separate base classifiers are merged, and the class with the highest number of votes is chosen as the final classification outcome. Another way of combining classifiers is bagging. When using this method, a number of classifiers are trained separately on distinct training sets obtained by the bootstrap method [
9]. Another well-known method of combining classifiers is boosting. Using this method, the importance of incorrectly predicted training instances in subsequent iterations is increased, and thus, the classifiers learn from the errors of previous iterations [
41]. Examples of boosting ensembles are AdaBoost or Gradient Boosting [
41]. These algorithms have been used in previous studies, and their performance was compared with that of base learners. Kim and Upneja [
33] used decision trees and AdaBoosted decision trees to determine the critical features of financial problems of publicly traded US restaurants in the period 1988–2010. AdaBoosted decision trees achieved superior prediction results and the lowest Type I error. The outcome of this model showed that financially distressed restaurants were heavily dependent on debt, had worse current ratios, slower asset growth, and lower net profit margins than non-distressed restaurants. The authors suggested applying the AdaBoosted decision tree as an early warning system for predicting restaurants’ financial distress. Wyrobek and Kluza [
42] compared the performance of Gradient-boosted trees with other methods. They found that Gradient-boosted trees performed much better than Linear Discriminant Analysis and Logistic Regression and slightly better than Random Forest. According to these authors, the main advantages of Gradient-boosted trees are that they deal very well with outliers, missing data, and non-normally distributed variables, and that they automatically expose nonlinear interactions among features and adjust to them. Alfaro et al. [
43] conducted an empirical study comparing the performance of AdaBoost and neural networks for predicting bankruptcy. Their findings based on a set of European firms indicate that the AdaBoost method they proposed can significantly lower the generalization error by approximately 30% when compared to the error generated by a neural network.
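As a rough illustration of the gains such ensembles can deliver, the following sketch (synthetic, class-imbalanced data; scikit-learn assumed, not the datasets of the cited studies) compares a single decision stump with bagged and boosted stumps by cross-validated AUC:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data standing in for financial features.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=1)

stump = DecisionTreeClassifier(max_depth=1, random_state=1)
models = {
    "single stump": stump,
    "bagged stumps": BaggingClassifier(stump, n_estimators=100, random_state=1),
    "AdaBoost": AdaBoostClassifier(stump, n_estimators=100, random_state=1),
}
# Mean AUC over 5-fold cross-validation for each model.
scores = {name: cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
          for name, m in models.items()}
for name, auc in scores.items():
    print(f"{name}: AUC = {auc:.3f}")
```

On data like this, the boosted ensemble typically outscores the single weak learner, mirroring the pattern reported in the studies above.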
An overview of selected AI methods applied to predict corporate financial collapse is listed in
Table 1.
3. Methodology
The sample under consideration included businesses from the Slovak construction industry, SK NACE 41—Construction of buildings (NACE means the classification of economic activities in the European Union). Financial data were obtained from the CRIF Agency [
68]. When preparing the research sample for the analysis, enterprises with incomplete records and zero sales were removed from the set of all enterprises doing business within SK NACE 41. In the next step, it was necessary to identify and remove outliers from the analyzed sample. For this purpose, kernel density estimates were created for all analyzed indicators. After removing enterprises with incomplete records, zero sales, and outliers, we continued to work with a set of 1349 enterprises. The analysis was performed using 24 financial features from all areas of financial health assessment: profitability, asset management, liquidity, and capital structure. The selected financial features and formulas for their calculation are listed in
Table 2.
Three criteria [
69] were used to determine the assumption of prosperity for the analyzed sample of companies (see
Figure 1). Businesses were classified as non-prosperous if they met all these criteria. In this paper, we supposed that non-prosperity is a prerequisite for bankruptcy. The research sample consisted of 1282 prosperous and 67 non-prosperous businesses.
In this paper, Lasso regression was used to find the most appropriate financial features in terms of bankruptcy prediction. Lasso regression was introduced by Tibshirani [
70]. It identifies the variables and their associated regression coefficients that lead to building a model with the minimum prediction error. This is achieved by constraining the parameters of the model, which “shrinks” the regression coefficients towards zero [
71]. Lasso was performed in the software Statistica 14.1.0.8, where Lasso logistic regression was selected. The penalized log-likelihood function of Lasso logistic regression that needs to be maximized can be written as (1) [
72]:

$$ \ell_\lambda(\beta) = \sum_{i=1}^{n} \left[ y_i \, x_i^{T}\beta - \ln\!\left(1 + e^{x_i^{T}\beta}\right) \right] - \lambda \sum_{j=1}^{p} |\beta_j| $$

where $\lambda$ is the penalty parameter, $\beta$ is the column vector of the regression coefficients, $x_i$ are the independent variables, $y_i$ is the binomial dependent variable, $n$ is the number of observations, and $p$ is the number of variables.
The penalty parameter λ was determined based on the minimum prediction error in cross-validation (λmin).
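A minimal sketch of this feature-selection step (synthetic data; scikit-learn's `LogisticRegressionCV` standing in for Statistica's Lasso logistic regression): the L1 penalty strength is chosen by cross-validation, analogous to selecting λmin.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 24 financial features.
X, y = make_classification(n_samples=600, n_features=24, n_informative=6,
                           random_state=0)
X = StandardScaler().fit_transform(X)

# L1-penalized (Lasso) logistic regression; the penalty strength is chosen
# by cross-validation, analogous to selecting lambda_min.
lasso_lr = LogisticRegressionCV(Cs=20, cv=10, penalty="l1",
                                solver="liblinear",
                                scoring="neg_log_loss").fit(X, y)

# Features whose coefficients were not shrunk to zero.
selected = np.flatnonzero(lasso_lr.coef_[0])
print(f"{selected.size} of {X.shape[1]} features kept")
```

Note that scikit-learn parameterizes the penalty as `C` (the inverse of λ), so a small `C` corresponds to a large λ and stronger shrinkage.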
Two nonparametric methods were applied to predict bankruptcy—neural networks and decision trees.
Neural networks are currently considered one of the best machine-learning algorithms. The theory of neural networks is based on neurophysiological knowledge. It tries to explain behavior based on the principle of processing information in nerve cells. ANNs are sometimes called brain-without-mind models [
73]. Recently, machine-learning techniques, especially ANNs, have been widely investigated with respect to bankruptcy prediction, as they have been confirmed as good predictors and classifiers, especially when classifying companies according to their risk of possible bankruptcy [
17]. In this research, we applied a feedforward ANN with multiple hidden layers.
The basic structure of neural networks consists of a directed graph with vertices (neurons), which are arranged in layers and are connected by edges (synapses). The input layer consists of all the inputs in separate neurons, and the output layer consists of dependent variables. The function of the simplest multilayer perceptron (MLP) can be written with the use of the following Formula (2) [
74]:

$$ o(\mathbf{x}) = f\!\left( w_0 + \mathbf{w}^{T}\mathbf{x} \right) $$

where $w_0$ is the intercept, $\mathbf{w}$ is the vector of all synaptic weights except for the intercept, and $\mathbf{x}$ is the vector of all inputs. The flexibility of the modeling can be improved by adding hidden layers. The MLP function with a hidden layer comprising $J$ neurons can be written as follows (3):

$$ o(\mathbf{x}) = f\!\left( w_0 + \sum_{j=1}^{J} w_j \, f\!\left( w_{0j} + \mathbf{w}_j^{T}\mathbf{x} \right) \right) $$

where $w_0$ is the intercept of the output neuron, $w_j$ is the synaptic weight corresponding to the $j$th hidden neuron, $w_{0j}$ is the intercept of the $j$th hidden neuron, and $\mathbf{w}_j$ is the vector of all synaptic weights leading to the $j$th hidden neuron.
All hidden and output neurons calculate an output $f(g(z_0, z_1, \ldots, z_k))$ from the outputs of all preceding neurons $z_0, z_1, \ldots, z_k$, where the integration function $g: \mathbb{R}^{k+1} \to \mathbb{R}$ and the activation function $f: \mathbb{R} \to \mathbb{R}$. The integration function $g$ can be defined as (4):

$$ g(\mathbf{z}) = w_0 + \sum_{i=1}^{k} w_i z_i $$
As the output activation function, SoftMax was used, which generalizes the sigmoid function to multiple classes. Unlike the sigmoid functions used in binary classification, the SoftMax function can be applied to multiclass classification problems. Each output of the SoftMax function is a value between 0 and 1; the difference compared to the sigmoid functions is that the outputs of all output neurons sum to 1. The formula for the activation function is as follows (5) [
75]:

$$ \mathrm{softmax}(z)_c = \frac{e^{z_c}}{\sum_{k=1}^{C} e^{z_k}}, \qquad c = 1, \ldots, C $$
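A minimal numerical sketch of the SoftMax output (plain NumPy, not tied to the network above):

```python
import numpy as np

def softmax(z):
    """Numerically stable SoftMax: shift scores, exponentiate, normalize."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, -1.0])
probs = softmax(scores)
print(probs.round(3), probs.sum())  # each output in (0, 1), summing to 1
```

Subtracting the maximum score before exponentiating does not change the result but avoids numerical overflow for large inputs.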
Decision trees are nonparametric discrimination algorithms that are increasingly popular because of their intuitive interpretability characteristics [
4]. When building a decision tree, recursive splitting is used to fit a tree to the training sample. During this process, the training sample is gradually divided into increasingly homogenous subsets using specific criteria [
76]. Various decision tree algorithms can be used. One of the most successful ones is the CART algorithm [
77] applied in this study. The CART algorithm introduced by Breiman [
78] is used to build both classification and regression trees. The construction of the classification tree using CART is based on binary partitioning of attributes [
79].
Let us denote the training vectors $x_i \in \mathbb{R}^{n}$, $i = 1, \ldots, l$, as well as a label vector $y \in \mathbb{R}^{l}$. A decision tree recursively splits the feature space in such a way that it groups samples with similar or identical features and target values. Let the data at node $m$ be represented by $Q_m$ with $n_m$ samples. Each candidate split $\theta = (j, t_m)$, formed by a feature $j$ and threshold $t_m$, splits the data into the subsets $Q_m^{\mathrm{left}}(\theta)$ and $Q_m^{\mathrm{right}}(\theta)$ as follows (6) [
80]:

$$ Q_m^{\mathrm{left}}(\theta) = \{ (x, y) \mid x_j \le t_m \}, \qquad Q_m^{\mathrm{right}}(\theta) = Q_m \setminus Q_m^{\mathrm{left}}(\theta) $$
A crucial decision when creating a classification tree is the selection of the feature on which to perform further partitioning. For this, a specific measure of the “impurity” of a set of cases can be used, i.e., the extent to which training cases from several classes are mixed within a node [
76]. In this paper, the Gini index (GI) was used as the impurity measure.
When a node splits into two child nodes, the Gini index is calculated for each child node $i$ as $GI(i)$, as follows (7):

$$ GI(t) = 1 - \sum_{c} p(c \mid t)^{2} $$

where $GI(t)$ is the Gini index at node $t$ and $p(c \mid t)$ is the share of cases from class $c$ at node $t$. The total value of the Gini index for a given split, $GI_{\mathrm{total}}$ (8), is equal to the weighted sum of the GI indices of the individual child nodes, where the weights are set according to the size of the child nodes. Therefore, we calculate $GI_{\mathrm{total}}$ as the sum of the $GI(k)$ values of the child nodes, each multiplied by the corresponding share of observations in the given child node out of the total number of observations in the original node [76,81].

$$ GI_{\mathrm{total}} = \sum_{k=1}^{K} \frac{n_k}{n_t}\, GI(k) $$

where $K$ is the number of child nodes (for a binary tree, $K = 2$), $n_t$ is the number of observations at node $t$, and $n_k$ is the number of observations at child node $k$.
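The Gini computation just described can be sketched directly (plain Python/NumPy; the example node sizes are illustrative, not from the paper's data):

```python
import numpy as np

def gini(labels):
    """Gini index GI(t) = 1 - sum_c p(c|t)^2 for the labels at a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(left, right):
    """Weighted Gini of a binary split: child GIs weighted by node shares."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = [0] * 6 + [1] * 4           # 6 prosperous, 4 non-prosperous cases
left, right = [0] * 5, [0] + [1] * 4
print(gini(parent))                  # 0.48
print(split_gini(left, right))       # 0.16; the split reduces impurity
```

The split is chosen so that the weighted child impurity (0.16 here) is as far below the parent impurity (0.48) as possible.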
To enhance the classification accuracy of weak learners (in our case, decision tree), one of the ensemble techniques—boosting—was used in this paper. The first and still the most widely used boosting algorithm today is AdaBoost [
82], which was presented in the study of Freund and Schapire [
83].
The AdaBoost algorithm is designed for two-class classification tasks. In this algorithm, trees “grow” gradually. Each tree applies the information from the previous tree, so instead of random selections with repetition, a modification of the original data file is used. The algorithm works by gradually adjusting the weights of the learning examples based on previous results. In the first iteration, the learning algorithm typically formulates a hypothesis from learning examples with equal weights; if the total number of observations is n, the starting weight of each observation is 1/n. In the subsequent steps, the model is created using a weight vector that increases the weights of incorrectly classified observations and decreases the weights of correctly classified ones. The classification method thus increasingly “focuses attention” on “difficult” observations that cannot be assigned to the correct class. When creating trees, residuals are used as the dependent variables [
84,
85].
Let us denote the indicator function $I_m(i)$, which takes on the value 1 if observation $i$ is misclassified in step $m$ and 0 otherwise. Next, denote the weight of observation $i$ in step $m$ as $w_m(i)$. The weights of the observations are always non-negative, and their sum is equal to 1. Subsequently, let us calculate $e_m$, which represents the weighted classification error in step $m$. This error can be calculated as follows (9):

$$ e_m = \sum_{i=1}^{n} w_m(i)\, I_m(i) $$
In the next step, the weights of correctly classified observations are multiplied by a constant, which is calculated based on the following Formula (10):

$$ \beta_m = \frac{e_m}{1 - e_m} $$
All weights then need to be multiplied by a suitable normalizing constant to maintain the relationship (11):

$$ \sum_{i=1}^{n} w_{m+1}(i) = 1 $$
Usually, the algorithm is stopped when there is a sufficient number of iterations, e.g., 100, or if $e_m \ge 0.5$.
To obtain the final classification of the possible failure of enterprises, the classifications of the individual trees are combined, with each tree assigned a weight. Let us denote the individual trees as $T_1, T_2, \ldots, T_M$. The weight assigned to each tree is as follows (12):

$$ \alpha_m = \nu \cdot \ln\!\left(\frac{1 - e_m}{e_m}\right) $$
The learning rate is a parameter specified by the user. Subsequently, the individual tree weights are calculated, and the candidate classes are evaluated. The class that achieves the highest sum of weights is selected.
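The procedure above corresponds to scikit-learn's `AdaBoostClassifier`; a small illustrative sketch (synthetic data, hypothetical parameter values) exposing the per-tree weights from Formula (12):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=2)

ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner (decision stump)
    n_estimators=50,
    learning_rate=0.5,  # the user-specified learning rate from Formula (12)
    random_state=2,
).fit(X, y)

# Each fitted tree carries a weight derived from its weighted error; the
# final class is the one with the highest sum of tree weights.
print(len(ada.estimators_), ada.estimator_weights_[:3])
```

Inspecting `estimator_weights_` shows that trees with lower weighted error receive larger voting weights, exactly as Formula (12) prescribes.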
Gradient boosting, introduced by Friedman [
86], is a numerical optimization algorithm that aims to find a model that eliminates the errors of the previous models. At each step, it iteratively adds a new DT that best reduces the loss function [
87]. The methodology of applying Gradient boosting on a decision tree as a base learner was inspired by Friedman [
86] and Bentéjac et al. [
88].
Given a training dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$, Gradient boosting aims to look for an approximation $\hat{F}(x)$ of the function $F^{*}(x)$ mapping instances $x$ to their output values $y$ that minimizes the expected value of a chosen loss function $L(y, F(x))$. These functions are the models of the ensemble (e.g., DT). The approximation is built iteratively. Let us initialize the model with a constant value prediction $F_0(x)$ (13):

$$ F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma) $$

where $L$ is the loss function and $\gamma$ is the predicted value; $\arg\min$ means that we search for the value of $\gamma$ that minimizes the sum. For $m = 1, \ldots, M$, where $M$ is the number of decision trees, the aim is to:

Compute the pseudo-residuals and fit a decision tree $h_m(x)$ to them (14):

$$ r_{im} = -\left[ \frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)} \right]_{F = F_{m-1}} $$

where $F_{m-1}$ is the forecast of the base model from the previous iteration.

- 3.
Find the output value $\gamma_{jm}$ of each leaf $R_{jm}$ of the decision tree using Formula (15):

$$ \gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\big(y_i, F_{m-1}(x_i) + \gamma\big) $$

where $h_m$ is the decision tree fitted to the pseudo-residuals.

- 4.
Update the model as follows (16):

$$ F_m(x) = F_{m-1}(x) + \nu \sum_{j} \gamma_{jm}\, \mathbf{1}\left(x \in R_{jm}\right) $$

where $\nu$ is the weight of the $h_m$ function.
If the iteration process is not regularized, the above algorithm may suffer from overfitting. With some loss functions, the model can perfectly fit the pseudo-residuals; in that case, the pseudo-residuals in the next iteration are equal to zero, and the process terminates prematurely. Several regularization hyperparameters are used to control the additive Gradient boosting process. A natural way to regularize Gradient boosting is shrinkage, which scales each gradient step by a factor $\nu$. The value of $\nu$ is most often set to 0.1 [
88].
Furthermore, regularization can also take place by reducing the complexity of the trained models. In the case of decision trees, it is possible to regulate the depth of the tree as well as the minimum number of instances needed to split a node [
88].
When building bankruptcy prediction models, 20% of the data were used for testing and the remaining 80% for training. Stratified k-fold cross-validation (k = 5) was applied to ensure that observations in each fold are equally distributed between prosperous and non-prosperous businesses. The following classification accuracy measures were used to measure the performance of the models: Accuracy, which measures the percentage of correctly classified cases [
89]; Precision (also called confidence), which expresses the proportion of predicted positive cases that are truly positive; Recall (also called sensitivity), which expresses the proportion of true positive cases that are correctly predicted positive [
90]; and F1-score, which is defined as the harmonic mean of precision and recall [
91]. In addition, AUC was used, as it is not affected by the prior class distributions [
92] and is considered a better measure than accuracy [
93]. Models were built in the Python module Scikit-learn.
4. Results
The analyzed sample of companies from the construction industry achieves acceptable liquidity results, which has been confirmed by several authors. The median of the current ratio, as a representative of financial risk, reaches a value of 1.26 (see
Table 3). This result is higher than 1.2, which rating agencies consider to be the threshold in financial risk assessment. Extreme values were also recorded for this indicator, with a maximum of 32.91 and a minimum of 0.15. Such extreme values are a shortcoming of the sample; therefore, the application of methods such as DT, which are not affected by extreme values, is very suitable. The safety of the analyzed sample of companies is low, as can be seen in the values of the NWCTA and NWCCA indicators, whose averages do not reach the required values. Profitability indicators, as important predictors of possible bankruptcy, achieve positive results on average. However, even for these indicators, some companies reach extremely low values and represent outliers. The ROE and ROS
EBITDA indicators achieve the best results. The problematic area of the analyzed sample of businesses is the capital structure since, in this industry, debt exceeds equity. A significant shortcoming affecting the financial health of the sample under investigation is the long receivables turnover period.
The most relevant features in terms of bankruptcy prediction were identified by Lasso logistic regression performed in the software Statistica 14.1.0.8. These features were selected based on the optimal value of λ, determined by the minimum prediction error of the model under 10-fold cross-validation (λmin). The most relevant features at this optimal value, and their coefficients, are as follows: TDTA (4.321), ROC (−4.175), ROE (−1.729), SLTA (1.812), NWCTA (−1.449), NCFTA (−1.310), AT (−0.203), FL (0.008), ELLFAR (−0.004). A graphical representation of these results is shown in
Figure 2. The coefficients of the other features were shrunk to 0.
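The paper performed this step in Statistica; as a rough sketch of an analogous workflow in scikit-learn (synthetic data and the study's feature names used only as labels, so the retained features and coefficients will not match the reported ones), an L1-penalized logistic regression with the penalty strength chosen by 10-fold cross-validation could look like this:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the financial-ratio dataset
X, y = make_classification(n_samples=500, n_features=9, n_informative=4,
                           random_state=0)
features = ["TDTA", "ROC", "ROE", "SLTA", "NWCTA",
            "NCFTA", "AT", "FL", "ELLFAR"]  # names from the study; data synthetic

# L1 (Lasso) penalty; C (the inverse of lambda) is chosen by 10-fold
# cross-validation, analogous to selecting lambda_min
lasso = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(penalty="l1", solver="liblinear", cv=10, Cs=20,
                         random_state=0),
)
lasso.fit(X, y)

coefs = lasso.named_steps["logisticregressioncv"].coef_.ravel()
retained = [(f, round(c, 3)) for f, c in zip(features, coefs) if c != 0]
print("retained features:", retained)  # coefficients of the rest shrink to 0
```

Standardizing the ratios first matters here, because the L1 penalty acts on the coefficient magnitudes and would otherwise penalize features on larger scales more heavily.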
A frequently applied AI method is the neural network. In this paper, a five-layer feedforward ANN was built. Its structure was chosen with two conditions in mind: to make the network robust enough to extract features and to prevent overfitting. The Adam (Adaptive Moment Estimation) optimization algorithm was applied. The architecture of this network, with three hidden layers, is shown in
Figure 3.
The neural network takes 7 inputs, the financial features selected by the Lasso logistic regression. The hidden layers comprise 16, 32, and 16 neurons, plus bias terms. The output layer represents the dependent variable, bankruptcy, and contains two neurons giving the final decision. As a result, there are two groups of businesses: those threatened with bankruptcy and those that are not. The ANN parameters are listed in
Table 4.
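A comparable architecture can be sketched with scikit-learn's MLPClassifier (synthetic data; note that, unlike the two-neuron output layer described above, scikit-learn uses a single logistic output unit for binary targets, so this is an approximation rather than a reproduction of the study's network):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 7 input features, binary target (bankrupt / not)
X, y = make_classification(n_samples=1000, n_features=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Three hidden layers of 16, 32, and 16 neurons, trained with Adam
ann = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 32, 16), solver="adam",
                  max_iter=1000, random_state=0),
)
ann.fit(X_train, y_train)
print("test accuracy:", ann.score(X_test, y_test))
```

Scaling the inputs before training is important in practice, since gradient-based optimizers such as Adam converge poorly when the financial ratios differ by orders of magnitude.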
The classification accuracy of the neural network was evaluated using several performance measures. The overall accuracy of the network reached 97.04% (see
Table 5). The network achieved higher precision (98.74%) than recall (69.23%). The F1-score of the neural network was 0.69, while according to the AUC results, the classification accuracy of the network was good.
Another artificial intelligence method applied to predict bankruptcy in this study is a decision tree built with the CART algorithm. It used the features selected by the Lasso logistic regression and ranked them according to their importance for bankruptcy prediction. The most important feature was Return on costs (ROC), followed by the capital structure indicators TDTA and SLTA (see
Figure 4).
The results of the DT are graphically illustrated by the diagram in
Figure 5. The decomposition of the DT proceeds hierarchically. The left branch of the tree is limited by a threshold value of TDTA, and 1006 businesses were assessed by that rule, i.e., most of the prosperous businesses (990 of 1025) and a part of the non-prosperous businesses (16 of 54). To increase the accuracy of the model, node 2 was further split and another branch was created. Businesses that satisfied the next splitting rule were classified as not at risk of bankruptcy; there were 917 such businesses. The remaining 89 businesses were further classified using other variables: they are at risk of bankruptcy only when two further threshold conditions on these variables hold at the same time (the exact thresholds are shown in Figure 5).
The right branch of the tree covers the 73 businesses for which the TDTA inequality did not hold, 35 of them prosperous and 38 non-prosperous. Node 3 was further split, and another branch was created. Businesses that reached a positive value of ROC were classified as not at risk of bankruptcy; there were 31 such businesses. Businesses with a negative ROC value were further classified based on the value of NWCTA; those not meeting its threshold condition are classified as at risk of bankruptcy (again, the threshold is given in Figure 5).
The performance metrics of the DT are listed in
Table 6. The overall accuracy of the model was 98.89%, with higher precision (92.05%) than recall (85.27%). The F1-score of the model was 0.88, while according to its AUC, the model achieved excellent classification accuracy.
Two ensemble techniques were used to enhance the decision trees' classification accuracy: AdaBoosted trees (ABT) and Gradient-boosted trees (GBT). The feature importance results for ABT identified the three most important features at about the same level: the capital structure indicator TDTA, the profitability indicator ROC, and the safety indicator NWCTA (see
Figure 6).
When applying ABT, there was a slight improvement in overall accuracy compared to DT (see
Table 7). Moreover, precision increased significantly (from 92.05% to 100%), as did recall (from 85.27% to 94.07%) and thus the F1-score (from 0.88 to 0.97). The AUC value of 0.97 indicates excellent classification accuracy of the model.
In the case of the GBT application, profitability (ROC) appears to be the most important factor in predicting bankruptcy (see
Figure 7). It is followed by the capital structure indicators TDTA and SLTA. The safety indicator NWCTA is also significant.
When using GBT, an increase in classification accuracy compared to DT was also demonstrated, while this model achieved the same performance measures as ABT (see
Table 8).
5. Discussion
Empirical research focused on the verification of decision trees, ensemble methods, and artificial neural networks in predicting the financial failure of businesses provides interesting results. The summary of the results is listed in
Table 9. The table also shows the results of MDA and LR, which are not part of this empirical study but arise from our earlier research studies.
The ranking of the applied methods according to the results of the individual evaluation criteria is shown in
Table 10.
Table 10 shows that the ensemble models achieved the best results on all performance criteria. The ANN achieved significantly worse results, ranking just behind DT in classification accuracy. Applying the boosting methods to DT clearly increased its classification accuracy. Compared to the ensemble models, MDA achieved the worst results, while LR ranked slightly better. The ensemble models achieved the best classification accuracy for both prosperous and non-prosperous businesses: 100% for prosperous and 94.07% for non-prosperous businesses. Their overall classification accuracy was also high, reaching 99.7%.
However, some problems can occur when training DTs: determining the depth of the tree, choosing an appropriate method of attribute selection, and dealing with training data that contain missing values. Overfitting can occur as well, and DT is at a disadvantage when built with many features on a small research sample. On the other hand, DT also has advantages, such as generating understandable rules, not requiring difficult calculations, working with both continuous and categorical variables, and, last but not least, achieving excellent classification and prediction accuracy [
94].
The results of this study can be compared with the study by Tsai et al. [
9], who applied DT ensembles on three datasets and achieved 88.36% accuracy, while ANN achieved 86.6% accuracy. They confirmed that ensemble models achieve better classification accuracy than single models, with DT ensembles using boosting methods performing best. Kim and Upneja [
33] found that AdaBoosted decision trees constructed to address financial problems achieved superior prediction results and the lowest type I error compared to plain decision trees. Alfaro et al. [
43] concluded that the ensemble model based on AdaBoost achieved higher classification accuracy than ANN. The better classification accuracy of ensemble-learning models for credit scoring compared to traditional ones was also confirmed by Li et al. [
95]. Hung and Chen [
32] proposed a selective ensemble of three classifiers, i.e., the decision tree, the backpropagation neural network, and the support vector machines, and concluded that they perform better than other weighting or voting ensembles for bankruptcy prediction.
The results of this study can be supported mainly by the study of Heo and Yang [
18], who pointed out that many studies focus on applying different models for the classification of bankrupt companies. However, according to the authors, these are mainly applications to general companies. They focused their study on construction companies, which differ from general companies in some financial characteristics, especially in liquidity and capital structure. The authors highlighted the significant benefit of applying AdaBoost to construction companies. Sun et al. [
96] used a backpropagation neural network as a base learner and constructed two ensemble models based on AdaBoost and bagging. These models were constructed to predict the financial distress of Chinese construction and real estate companies. They also confirmed that ensemble models significantly outperform single ones.
The higher classification accuracy of decision trees compared to NN was also confirmed by Olson et al. [
4]. They also pointed to the fact that DT results are more understandable for users than ANN results. A shortcoming of DT can be the large number of rules contained in the tree. The high classification and prediction accuracy of DT, which exceeded the classification accuracy of NN, was confirmed by Golbayani et al. [
97]. Alam et al. [
98] focused their study on predicting corporate bankruptcy and confirmed the high classification accuracy of the random forest of decision trees (98.7%). Although we did not apply random forests in this study, their results further support the strong classification accuracy of decision forests and trees. The significant classification accuracy of AdaBoost was confirmed by Lahmiri et al. [
60]. According to the authors, it is a significant classification tool from the point of view of its limited complexity, lower classification error, and short data processing time. Furthermore, the results show that this classification system outperforms models that have been validated on the same data in the recent period. Papíková and Papík [
99] state that individual resampling and feature selection methods do not enhance the performance of the model compared to the results of the original unbalanced sample. Even if the sample is unbalanced with a minority of failed businesses, many classification algorithms can cope with this imbalance and bring significant results. These findings are also applied in practice, helping stakeholders to detect real failing companies.
A limitation of this research is the data, which contain deficiencies such as missing values, extreme values, and incorrect records; adapting the data for a given analysis is therefore quite time-consuming. In future research, we may focus on resampling the dataset to achieve better classification accuracy, even though the classification accuracy of the models built in this study was excellent. We will also focus on applying non-numeric input data so that deep learning methods can be used.
A significant contribution of ensembled models is the fact that in their results, they also provide a ranking of the important features that determine the prosperity and failure of the analyzed enterprises (see
Table 11).
These are the most important features that should be applied in the prediction of bankruptcy and the classification of businesses into prosperous and non-prosperous.
All entities cooperating with the analyzed businesses can check the results achieved by selected features and thus prevent possible losses. On the other hand, business managers can monitor the development of the mentioned indicators and thus manage and improve the financial health and performance of companies.
Research into the application of decision trees in bankruptcy prediction is still ongoing. It is primarily focused on improving the accuracy of the models without worsening their interpretability. An interesting challenge for the future is, therefore, the effort to improve the classification accuracy of decision tree models without reducing the quality of their interpretation, which tends to be considered their main advantage. The solution to this research problem is twofold: to build traditional decision trees with higher classification accuracy without significantly changing their structure or to propose new interpretation options for non-traditional tree structures [
100].