Leandros A. Maglaras · Sonali Das · Naliniprava Tripathy · Srikanta Patnaik
Editors

Machine Learning Approaches in Financial Analytics
Intelligent Systems Reference Library
Volume 254
Series Editors
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The aim of this series is to publish a Reference Library, including novel advances and developments in all aspects of Intelligent Systems in an easily accessible and well-structured form. The series includes reference works, handbooks, compendia, textbooks, well-structured monographs, dictionaries, and encyclopedias. It contains well-integrated knowledge and current information in the field of Intelligent Systems. The series covers the theory, applications, and design methods of Intelligent Systems. Virtually all disciplines such as engineering, computer science, avionics, business, e-commerce, environment, healthcare, physics and life science are included. The list of topics spans all the areas of modern intelligent systems such as: Ambient intelligence, Computational intelligence, Social intelligence, Computational neuroscience, Artificial life, Virtual society, Cognitive systems, DNA and immunity-based systems, e-Learning and teaching, Human-centred computing and Machine ethics, Intelligent control, Intelligent data analysis, Knowledge-based paradigms, Knowledge management, Intelligent agents, Intelligent decision making, Intelligent network security, Interactive entertainment, Learning paradigms, Recommender systems, Robotics and Mechatronics including human-machine teaming, Self-organizing and adaptive systems, Soft computing including Neural systems, Fuzzy systems, Evolutionary computing and the Fusion of these paradigms, Perception and Vision, Web intelligence and Multimedia.
Indexed by SCOPUS, DBLP, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
Editors

Leandros A. Maglaras
Edinburgh Napier University
Edinburgh, UK

Sonali Das
Department of Business Management
University of Pretoria
Hatfield, South Africa

Naliniprava Tripathy
Indian Institute of Management
Shillong, India

Srikanta Patnaik
Institute of Management and Technology
Bhubaneswar, Odisha, India
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
In today’s fast-paced, data-driven world, the realms of finance and technology are converging like never before. Machine learning, a subset of artificial intelligence, has emerged as a game-changer in the world of financial analytics. The integration of advanced algorithms and predictive models has revolutionized the way financial institutions, investors, and professionals analyse and predict market trends, manage risk, and make critical decisions. In this edited volume, Machine Learning Approaches in Financial Analytics, we dive deep into this exciting and rapidly evolving field, providing a comprehensive guide for individuals looking to leverage the power of machine learning in the finance industry.

The financial industry is no stranger to technological innovation, and machine learning is the latest breakthrough in this ongoing transformation. From algorithmic trading and portfolio management to credit risk assessment and fraud detection, machine learning techniques are being harnessed to enhance efficiency, reduce costs, and improve accuracy in decision-making. Our book unravels this financial revolution, making complex concepts accessible to both novices and experts, as we explore the intersection of finance and artificial intelligence.
Key Features
Contents
Part I Foundations
1 Introduction to Optimal Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Makoto Shimoshimizu
1.1 Overview: Financial Market and Execution Problem . . . . . . . . . 3
1.1.1 Electronic Market and System Transition . . . . . . . . . . . . 3
1.1.2 Large Trader and Market (Price) Impact . . . . . . . . . . . . . 5
1.1.3 Structure of This Chapter . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2 Notations and Some Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Almgren-Chriss Model for Optimal Execution . . . . . . . . . . . . . . . 11
1.3.1 Market Model and Optimal Execution Strategy . . . . . . 11
1.3.2 Efficient Frontier of Optimal Execution:
A Mean-Variance Perspective . . . . . . . . . . . . . . . . . . . . . 15
1.4 A Continuous-time Analog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4.2 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.5 Transient Impact Model with Small Traders’ Orders [35] . . . . . . 20
1.5.1 Market Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5.2 Formulation as a Markov Decision Process . . . . . . . . . . 24
1.5.3 Dynamics of the Optimal Execution . . . . . . . . . . . . . . . . 25
1.5.4 In the Case with Target Close Order . . . . . . . . . . . . . . . . 33
1.5.5 Computation Method for Optimal Execution . . . . . . . . . 34
1.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Appendix 1: Lagrange Multiplier Method . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4 Fully Homomorphic Encrypted Wavelet Neural Network
for Privacy-Preserving Bankruptcy Prediction in Banks . . . . . . . . . . 97
Syed Imtiaz Ahamed, Vadlamani Ravi, and Pranay Gopi
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.2 Overview of Bankruptcy Prediction and Problem
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.3 Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.4 Proposed Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.4.1 Homomorphic Encryption . . . . . . . . . . . . . . . . . . . . . . . . 101
4.4.2 CKKS Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.4.3 Overview of the Original Unencrypted WNN . . . . . . . . 104
4.4.4 Proposed Privacy-Preserving Wavelet Neural
Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.5 Datasets Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.5.1 Qualitative Bankruptcy Dataset . . . . . . . . . . . . . . . . . . . . 108
4.5.2 Spanish Banks Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.5.3 Turkish Banks Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.5.4 UK Banks Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.6 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.7 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Appendix: Datasets Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5 Tools and Measurement Criteria of Ethical Finance Through
Computational Finance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Marco Piccolo and Francesco Vigliarolo
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.2 Ethical Finance, Principles and Operating Criteria . . . . . . . . . . . 119
5.3 Computational Finance Critic: Limits and Challenge
with Respect to Ethic Finance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.3.1 Some Definition Aspects Considered in This
Paragraph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.3.2 The Background Vice: Economic Positivism . . . . . . . . . 126
5.4 Measurement Criteria of Computational Finance
with the Principles of Ethical Finance . . . . . . . . . . . . . . . . . . . . . . 127
5.5 Some Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
1 Introduction to Optimal Execution

Makoto Shimoshimizu
M. Shimoshimizu (B)
Department of Industrial and Systems Engineering, Tokyo University of Science, Tokyo, Japan
e-mail: [email protected]
The last three decades have witnessed a huge (and worldwide) change in the trading systems of stock exchanges. For example, as stated in [33], the regulatory development around HFT accelerated over the 1990s so that the financial market would become more competitive among market participants. The related regulation, Regulation ATS (alternative trading systems; Reg ATS) of 2000, was enforced in the U.S. so that various sorts of non-exchange competitors could enter the marketplace.¹

In light of the emergence of MiFID in 2007, considerable concerns about so-called dark pools have arisen among practitioners and researchers. A dark pool is a (private) securities trading venue where traders can use an uninformed order book and matching engine. Since MiFID was enforced in Europe, institutional traders such as pension fund managers have rapidly adopted dark pools, where the trading of large blocks of orders is not disclosed to the other market participants.
According to [24], although traders did not often use high-frequency trading (HFT) around 2000, HFT has accounted for 20 percent of the total trading volume in the market since the mid-2000s (until 2019). The volume-weighted average price (VWAP) or time-weighted average price (TWAP) strategy was the mainstream of algorithmic trading in the early 2000s. The VWAP, denoted by $P_{\mathrm{VWAP}}$, is a benchmark defined as the average price weighted by the relative volume over the trading time window:

$$P_{\mathrm{VWAP}} := \frac{\sum_{i=1}^{n} P_i V_i}{\sum_{i=1}^{n} V_i}, \tag{1.1}$$

where $P_i$ is the asset price and $V_i$ the trading volume of the $i$-th trade. The VWAP strategy aims to keep the price dynamics close to the VWAP via one's own trading activity. The TWAP, denoted by $P_{\mathrm{TWAP}}$, is a benchmark defined as the average price of a given number of trades, say $n$, over the trading time window:

$$P_{\mathrm{TWAP}} := \frac{1}{n}\sum_{i=1}^{n} P_i. \tag{1.2}$$
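Computationally, the two benchmarks in (1.1) and (1.2) are just a volume-weighted and an unweighted average over the trade tape. The following minimal Python sketch illustrates the computation; the price and volume arrays are hypothetical values, not data from the chapter.

```python
import numpy as np

# Hypothetical trade tape over one trading window (illustrative values).
prices = np.array([100.0, 100.5, 99.8, 100.2, 100.1])   # P_i: trade prices
volumes = np.array([3000, 1500, 2200, 800, 1700])        # V_i: trade volumes

# Eq. (1.1): volume-weighted average price.
p_vwap = np.sum(prices * volumes) / np.sum(volumes)

# Eq. (1.2): time-weighted average price (plain average of the n prices).
p_twap = prices.mean()

print(f"VWAP = {p_vwap:.4f}, TWAP = {p_twap:.4f}")
```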
The VWAP and TWAP are not realized until the end of the trading horizon, so traders generally take the historical VWAP and TWAP as the benchmark. Moreover, the use of liquidity-seeking algorithms has become more common since the mid-2000s (until 2019). These facts underscore the importance of analyzing algorithmic trading and HFT, on which financial market traders have relied heavily for more than a decade. The development of the trading system has facilitated an increasing number of studies in fields such as market impact modeling and the (optimal) execution problem.

¹ The Regulation National Market System (Reg NMS) of 2007 and the Markets in Financial Instruments Directive (MiFID) of 2007, enforced in the U.S. and Europe, respectively, brought about a negative outcome. Even though these regulations were designed to encourage new competition and trading venues, equity markets in the U.S. and Europe became fragmented, since trading spread out among various exchanges and financial markets.
The mechanism through which the market impact occurs is captured via the information obtained from the so-called limit order book. A limit order book (commonly abbreviated as LOB) is a collection of information comprising the volumes of buy/sell orders and the prices at which those volumes are submitted by traders. All market participants can access the information in the LOB.
A trader can choose between the following two ways of submitting orders: market orders and limit orders.³ A market order (MO) is one used by traders to execute buy/sell orders immediately after the order submission. A limit order (LO), on the contrary, is aimed at executing buy/sell orders at the price at which the trader prefers to trade. One of the main differences between the two order types is that MOs are executed with certainty, whereas LOs are not. The features of MOs and LOs are illustrated in Figs. 1.1, 1.2, 1.3 and 1.4.⁴

² As a current stream, the impact caused by large traders as well as other traders is called the market impact rather than the price impact. Therefore, the author consistently uses the term market impact in what follows.
³ The types of orders a trader can use are categorized into more classes (e.g., cancellation, dark-pool orders, and so on). Here our aim is to illustrate the basic concept of how the market works and how a market impact can arise. Readers can refer to [10, 22, 47] for more details.

Fig. 1.1 Example of submitting a sell market order. Assume that the volume at the best bid at time $t$ is 3000 (units) at the best bid price 299$. When a trader submits 2500 (units of) sell market orders, the LOs at the best bid price are executed first. Since the market orders submitted by the trader are fewer than the volume placed at the best bid price, the best bid price does not change. The best bid price at time $t + \Delta t$ thus remains 299$, and the execution price also remains unchanged for the market order. No market impact occurs
Some terminology for describing the LOB comes first. The term bid (ask, respectively) is applied to each buy (sell) LO. The price at which each buy (sell) order is placed is referred to as the bid price (ask price). In particular, the term best bid price (best ask price) denotes the highest (lowest) of the bid (ask) prices. A trader submits an LO by designating the order type (buy/sell), the volume, and the price at which the trader wants the orders to be executed. The term LOB is generally understood to mean all the information about these features, as well as the time at which each order is placed. If an opposite LO arrives at a price lower than the best bid price or higher than the best ask price, the buy/sell transaction is matched and the LO vanishes from the LOB. Let us denote the best bid price and best ask price at time $t$ by $P_t^{b}$ and $P_t^{a}$, respectively. Then, the mid-price, expressed as $P_t^{\mathrm{mid}}$, is defined as

$$P_t^{\mathrm{mid}} := \frac{P_t^{b} + P_t^{a}}{2}. \tag{1.3}$$
The minimum price increment in which all traders can submit orders is called the tick size. A change in the tick size influences the trading activity of market participants. (For details, see, for example, [10, 22, 47].)

⁴ The tick size is assumed to be 1$ in Figs. 1.1, 1.2, 1.3 and 1.4. The horizontal axis denotes the volume of orders in the LOB and the vertical axis the price at which each order is placed (as LOs).
Fig. 1.2 Example of submitting a sell market order. Assume that the volume at the best bid at time $t$ is 3000 (units) at the best bid price 299$. When a trader submits 3500 (units of) sell market orders, the LOs at the best bid price are executed first. Since the market orders submitted by the trader exceed the volume placed at the best bid price, the best bid price changes. The best bid price at time $t + \Delta t$ thus moves to the next bid price (i.e., decreases) and becomes 298$. The execution price also changes for the market order, and a market impact occurs

Fig. 1.3 Example of submitting a buy market order. Assume that the volume at the best ask at time $t$ is 3000 (units) at the best ask price 301$. When a trader submits 2500 (units of) buy market orders, the LOs at the best ask price are executed first. Since the market orders submitted by the trader are fewer than the volume placed at the best ask price, the best ask price does not change. The best ask price at time $t + \Delta t$ thus remains 301$, and the execution price also remains unchanged for the market order. No market impact occurs.
Fig. 1.4 Example of submitting a buy market order. Assume that the volume at the best ask at time $t$ is 3000 (units) at the best ask price 301$. When a trader submits 3500 (units of) buy market orders, the LOs at the best ask price are executed first. Since the market orders submitted by the trader exceed the volume placed at the best ask price, the best ask price changes. The best ask price at time $t + \Delta t$ thus moves to the next ask price (i.e., increases) and becomes 302$. The execution price also changes for the market order, and a market impact occurs

We can categorize the market impact mentioned above into three types: the temporary, permanent, and transient market impact. The temporary (market) impact is defined as the part that vanishes before the next trading time due to the recovery of (limited) market liquidity. On the other hand, the part of the market impact that remains at the next trading time is referred to as the permanent (market) impact. Moreover, when the temporary impact dissipates gradually over the course of the trading horizon, we call the market impact the transient (market) impact.⁵ Figure 1.5 illustrates the basic concepts explained above.
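The matching mechanics illustrated in Figs. 1.1–1.4 can be mimicked with a toy order book. The sketch below (a simplified model with hypothetical numbers, not a real exchange implementation) walks a sell market order down the bid side of the book and shows when the best bid, and hence the execution price, moves.

```python
# Toy bid side of a limit order book: (price, volume), best bid first.
# Numbers mirror the examples in Figs. 1.1 and 1.2 (tick size 1$).
bids = [(299, 3000), (298, 4000), (297, 5000)]

def execute_sell_market_order(book, qty):
    """Match a sell market order against the bid side; return fills and the new book."""
    book = list(book)
    fills = []
    while qty > 0 and book:
        price, available = book[0]
        traded = min(qty, available)
        fills.append((price, traded))
        qty -= traded
        if traded == available:
            book.pop(0)          # level exhausted: the best bid moves down
        else:
            book[0] = (price, available - traded)
    return fills, book

for mo_size in (2500, 3500):
    fills, new_book = execute_sell_market_order(bids, mo_size)
    print(f"sell MO of {mo_size}: fills={fills}, new best bid={new_book[0][0]}$")
# 2500 < 3000: best bid stays at 299$ (no impact, as in Fig. 1.1).
# 3500 > 3000: best bid drops to 298$ (impact, as in Fig. 1.2).
```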
Assume that the market (or quoted) price is given by $P_t$. Since a large trader executes a large number of orders, denoted by $q_t$, the execution price (which corresponds to the real trading price) rises to some degree and becomes $P_t + \lambda_t q_t$. Here $\lambda_t q_t$ denotes the market impact caused by the submission of the large trader (under the assumption of a linear market impact model). After the execution, the market price declines to some degree due to the liquidity provision of the market participants (e.g., market makers, noise traders, and so on).⁶ The impact that does not affect the (market) price at the next trading time (corresponding to $\lambda_t q_t \alpha_t$ in Fig. 1.5) is the temporary impact. The impact that affects the (market) price at the next trading time (corresponding to $\lambda_t q_t (1 - \alpha_t)$ in Fig. 1.5) is the permanent impact. Moreover, the transient impact describes the residual effect of past temporary impacts caused by the large trader (and other market participants). The formulation of temporary, permanent, and transient impacts thus becomes a vital factor in analyzing optimal execution problems.

Fig. 1.5 Illustration of temporary, permanent, and transient market impact (in the case of a buy MO)

⁵ The definition of each kind of market impact may differ from that in other literature. The above definition stems from the assumption that the market impact is decomposed into a temporary part and a permanent one. Some literature, such as [8, 17], empirically shows that the market impact has transient properties. In the following, each market impact is abbreviated as temporary impact, permanent impact, and transient impact, respectively.
⁶ We can classify the types of traders as follows:
where $W_0$ and $W_T$ represent the wealth at time $0$ and $T$, respectively.

Remark 1 (Permanent Impact) Let us define the permanent impact function caused by a large trader by $f: \mathbb{R} \to \mathbb{R}$. Then, as [17, 22, 23] show, the market excludes dynamic arbitrage if the permanent impact function takes the form $f(v) = kv$ for all $v \in \mathbb{R}$, for some $k \in \mathbb{R}_{++}$, that is, a linear function [20]. In addition, they theoretically demonstrate that a nonlinear permanent impact can lead to dynamic arbitrage.

In light of this fact (as well as some empirical results), much of the existing research, including the work explained below, analyzes optimal execution strategies under a linear permanent impact model.
The rest of this chapter proceeds as follows. Section 1.3 introduces the so-called Almgren-Chriss (AC) model [2] in a discrete-time framework, a fundamental model for seminal theoretical papers on optimal execution problems. Section 1.4 then analyzes a continuous-time analog of the AC model, which captures the significant role that the market impact and the risk-averse feature of a large trader play. In Sect. 1.5, we present the model examined by Ohnishi and Shimoshimizu [35], in which the market impact caused by small traders as well as a large trader exists. The model also considers a transient feature of the market impact, which Almgren and Chriss [2] do not incorporate. All of these models derive the optimal execution strategy explicitly, so that the strategies can be used as a backtest from a practitioner's point of view. A bibliographic note (Sect. 1.6) and some appendices are placed at the end of this chapter.
1.2 Notations and Some Remarks

In this chapter, $\mathbb{Z}_{++}$ stands for the set of all positive integers, i.e., $\mathbb{Z}_{++} := \{1, 2, \ldots\}$. Likewise, we define $\mathbb{R}_{+} := [0, \infty)$ and $\mathbb{R}_{++} := (0, \infty)$. $\mathbb{R}^d$ represents the set of $d$-dimensional real-valued vectors; any vectors are defined as row vectors, and $\mathbb{R}^{m \times n}$ denotes the set of $m \times n$ real matrices. We write $\mathbb{E}[\cdot]$ for the expectation and $\mathbb{V}[\cdot]$ for the variance, each defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. As for the differentiation of a function: for any twice-differentiable function $f: X \subset \mathbb{R} \to \mathbb{R}$, denoted by $f_t := f(t)$ for $t \in X$, $\dot f_t$ expresses the derivative of $f$ evaluated at $t$ and $\ddot f_t$ the second derivative. Also, for any vector-valued function
This chapter aims to explain some introductory notions and models concerning optimal execution problems for those who have some knowledge of stochastic analysis. To this end, readers are assumed to have a basic knowledge of real analysis and stochastic analysis. Readers who are not familiar with these topics may refer to some basic materials, for example, [9, 40]. The other materials needed for this chapter, some optimization methods in particular, are given in the appendices.
1.3 Almgren-Chriss Model for Optimal Execution

1.3.1 Market Model and Optimal Execution Strategy

We first introduce the fundamental model proposed by Almgren and Chriss [2]. The model searches for an optimal execution strategy from the viewpoints of a cost-minimization criterion and a mean-variance criterion.⁷
Assume that a large trader (e.g., a life insurance company, trust company, or a company that manages pension funds) holds $Q \in \mathbb{R}_{++}$ units of one risky asset. The large trader needs to liquidate all of the assets by the maturity $T \in \mathbb{Z}_{++}$. The set of trading times, denoted by $\mathcal{T}$, consists of $n$ times as follows:

$$\mathcal{T} := \left\{\frac{T}{n}, \frac{2T}{n}, \ldots, T\right\}. \tag{1.7}$$

In the following, we will use the notation $\tau$ to express $T/n$ and $t_k$ to express $kT/n$ for $k \in \{1, \ldots, n\}$. We also define by $q_k$, for $k \in \{1, \ldots, n\}$, the number of shares that the large trader submits at time $t_k \in \mathcal{T}$, and by $Q_k$ the number of shares that the large trader still holds at time $t_k \in \mathcal{T} \cup \{t_{n+1}\}$.⁸ By definition, the following relationships hold:
⁷ Bertsimas and Lo [7] propelled investigations in this field to the forefront, addressing the optimization problem of minimizing the expected execution cost in a discrete-time framework via a dynamic programming approach. Their analysis identifies the optimal execution volume as the equally divided volume throughout the trading epochs. Notwithstanding its valuable insight into the execution problem, the model disregards any attitude toward the large trader's risk.
⁸ The definition of some variables differs slightly from that in the AC model [2] in order to unify the notation throughout this chapter, but the difference does not significantly affect the content compared with the original paper.
• $Q_1 = Q$ (initial condition).
• $Q_{n+1} = 0$ (terminal condition).
• $Q_{k+1} = Q_k - q_k = Q - \sum_{j=1}^{k} q_j$ for $k \in \{1, \ldots, n-1\}$.

Also, the execution strategy of the large trader, i.e., the set $\{q_1, \ldots, q_n\}$, is denoted by $q$. The execution trajectory, i.e., the set $\{Q_1, \ldots, Q_n, Q_{n+1}\}$, is similarly denoted by $Q$.
Let us next consider the dynamics of the risky asset. Assume that $P_t$ for $t \in \mathcal{T}$ represents the fundamental price or unaffected price, which expresses the risky asset price without the market impact caused by the large trader.
In the Almgren-Chriss model, the market impact is divided into two parts: a temporary impact and a permanent impact. We assume that the dynamics of the risky asset price, denoted by $P_k := P_{t_k}$ for $t_k \in \mathcal{T}$, evolve as a discrete arithmetic random walk:

$$P_{k+1} = P_k + \sigma \tau^{1/2} \epsilon_k - \tau g\!\left(\frac{q_k}{\tau}\right), \tag{1.9}$$

where $\sigma$ represents the volatility of the asset, and $\epsilon_t$ for $t \in \mathcal{T}$ follows a standard normal distribution, i.e.,

$$\epsilon_t \sim N(0, 1), \tag{1.10}$$

⁹ To be precise, the permanent impact $g\!\left(\frac{q_k}{\tau}\right)$ is a function of the average rate of the trading volume $q_{t_k}$ during the interval $(t_{k-1}, t_k)$. Moreover, $\epsilon_t$ is independent across times $t \in \mathcal{T}$ by definition.
captured value of a trading strategy (or trajectory) as the total revenue after the trading transaction. The captured value is calculated, via a simple computation, as follows:

$$\sum_{k=1}^{n} \hat P_k q_k = P_1 Q + \sum_{k=1}^{n}\left(\sigma\tau^{1/2}\epsilon_k - \tau g\!\left(\frac{q_k}{\tau}\right)\right) Q_{k+1} - \sum_{k=1}^{n} q_k\, h\!\left(\frac{q_k}{\tau}\right). \tag{1.12}$$

Using the notation $C(Q)$ to represent the total cost, we can express the total cost as follows:

$$C(Q) := P_1 Q - \sum_{k=1}^{n} \hat P_k q_k = -\sum_{k=1}^{n}\left(\sigma\tau^{1/2}\epsilon_k - \tau g\!\left(\frac{q_k}{\tau}\right)\right) Q_{k+1} + \sum_{k=1}^{n} q_k\, h\!\left(\frac{q_k}{\tau}\right). \tag{1.13}$$
Remark 2 The transaction cost defined above is the standard measure in trading performance evaluation, which [38] calls the implementation shortfall.
By the above arguments, the expected total cost and the variance of the total cost are readily computed as follows:

$$\mathbb{E}\left[C(Q)\right] = \sum_{k=1}^{n} \tau g\!\left(\frac{q_k}{\tau}\right) Q_{k+1} + \sum_{k=1}^{n} q_k\, h\!\left(\frac{q_k}{\tau}\right); \tag{1.14}$$

$$\mathbb{V}\left[C(Q)\right] = \sigma^2\tau \sum_{k=1}^{n} Q_{k+1}^2. \tag{1.15}$$
Theorem 1 (Optimal execution strategy for the AC model) The optimal execution strategy at time $t_k \in \mathcal{T}$ is the equally divided trading strategy:

$$q_k^{*} = \frac{Q}{n}, \quad k = 1, \ldots, n. \tag{1.18}$$
Proof By the summation-by-parts formula (see, e.g., [40, Theorem 3.41]), we have

$$\psi \sum_{k=1}^{n} q_k Q_{k+1} = \frac{\psi}{2} Q^2 - \frac{\psi}{2} \sum_{k=1}^{n} q_k^2. \tag{1.19}$$

Substituting this into the objective function of the above minimization problem then results in the following minimization problem:

$$\min_{q \in \mathcal{A}} \left(\eta - \frac{\psi}{2}\right) \sum_{k=1}^{n} q_k^2 + \frac{\psi}{2} Q^2, \qquad \text{s.t.} \quad \sum_{k=1}^{n} q_k = Q. \tag{1.20}$$

This is a convex optimization problem with an equality constraint, since we assume that $\eta - \frac{\psi}{2} > 0$. Using the Lagrange multiplier method, we obtain

$$q_k^{*} = \frac{Q}{n} \tag{1.21}$$

for $k \in \{1, \ldots, n\}$. For the Lagrange multiplier method, see Appendix 1. □
$$Q_k^{*} = \frac{n - (k-1)}{n}\, Q, \quad k = 1, \ldots, n, n+1. \tag{1.22}$$
One can easily check that the expected total cost of the optimal execution strategy obtained above is less than that of the strategy of initially executing the whole trading volume, denoted by $Q^0$:

$$C(Q^{*}) = \left(\eta - \frac{\psi}{2}\right)\frac{Q^2}{n} + \frac{\psi}{2} Q^2 \;<\; C(Q^{0}) = \left(\eta - \frac{\psi}{2}\right) Q^2 + \frac{\psi}{2} Q^2. \tag{1.23}$$

This shows that dividing large orders into small ones can reduce execution costs.
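Theorem 1 and the comparison (1.23) are easy to verify numerically. The sketch below evaluates the deterministic objective of (1.20), $(\eta - \psi/2)\sum_k q_k^2 + (\psi/2)Q^2$, for the equally split strategy and for immediate execution; all parameter values are illustrative assumptions, not values from the chapter.

```python
import numpy as np

Q, n = 10_000.0, 20          # total volume and number of trading epochs (assumed)
eta, psi = 2.5e-4, 2.0e-4    # impact coefficients with eta - psi/2 > 0 (assumed)

def expected_cost(q):
    """Objective of Eq. (1.20): (eta - psi/2) * sum q_k^2 + (psi/2) * Q^2."""
    return (eta - psi / 2) * np.sum(q ** 2) + (psi / 2) * Q ** 2

q_split = np.full(n, Q / n)       # optimal strategy of Theorem 1: q_k* = Q/n
q_immediate = np.zeros(n)
q_immediate[0] = Q                # execute everything at the first epoch

print("equal split :", expected_cost(q_split))
print("immediate   :", expected_cost(q_immediate))
# The equal split shrinks the q-dependent cost term by a factor of n, as in (1.23).
```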
This strategy is closely related to the time-weighted average price (TWAP) strategy, which practitioners use as a standard trading strategy. For a continuous-time execution on an interval $[0, T]$ for some $T \in \mathbb{R}$ with initial trading volume $Q$, the TWAP strategy is defined as

$$q_t^{\mathrm{TWAP}} := \frac{Q}{T}. \tag{1.24}$$
1.3.2 Efficient Frontier of Optimal Execution: A Mean-Variance Perspective

The above analysis shows that an optimal execution strategy under a cost-minimization criterion exists without a risk-aversion term. In a manner analogous to Markowitz's mean-variance analysis [2, 31], we can define the so-called efficient execution strategy. An execution strategy is said to be efficient if there exists no other strategy that attains a lower expected shortfall (variance, respectively) among the strategies with the same or lower variance (expected shortfall, respectively).

The efficient execution strategy is defined as the solution of the following constrained optimization problem:

$$\min_{Q \in \mathcal{A}} \mathbb{E}\left[C(Q)\right] \tag{1.25}$$
$$\text{s.t.} \quad \mathbb{V}\left[C(Q)\right] \le V^{*}. \tag{1.26}$$
$$\min_{Q \in \mathcal{A}} U(Q) = \eta \sum_{k=1}^{n} \left(Q_{k+1} - Q_k\right)^2 + \gamma\sigma^2 \sum_{k=1}^{n} Q_{k+1}^2 \tag{1.29}$$
Note that the function $U$ is a convex quadratic function with respect to $Q_k$ for $k \in \{1, \ldots, n+1\}$. The optimality conditions for Eq. (1.27) result in

$$\frac{\partial U}{\partial Q_k}(Q) = 0 \iff \eta\left(Q_k - Q_{k-1}\right) - \eta\left(Q_{k+1} - Q_k\right) + \gamma\sigma^2 Q_k = 0. \tag{1.30}$$

Solving this system yields

$$Q_k^{*} = Q\,\frac{\sinh\left(\kappa(T - t_k)\right)}{\sinh\left(\kappa T\right)}, \quad k = 1, \ldots, n, n+1, \tag{1.31}$$
1.4 A Continuous-time Analog

1.4.1 Model
We can extend the above model to a continuous-time setting. In this case, the execution is conducted continuously over a trading horizon $[0, T]$. In the following, we consider only the temporary impact and assume that the temporary impact is linear with respect to the orders posed by the large trader.

Let $P_0$ be the risky asset price at time $0$. The dynamics of the market price are then described, using a standard Brownian motion, as follows:

$$\mathrm{d}P_t = \sigma\,\mathrm{d}B_t, \tag{1.33}$$

where $\{B_t\}_{0 \le t \le T}$ is a standard Brownian motion with $B_0 = 0$ and $\sigma$ is the risk of the market price.¹¹ The execution price at time $t \in [0, T]$ is given by

$$\hat P_t = P_t - \eta \dot Q_t, \tag{1.35}$$

where

$$\dot Q_t := \frac{\mathrm{d}Q_t}{\mathrm{d}t}, \tag{1.36}$$

and $\eta \dot Q_t$ at time $t \in [0, T]$ stands for the temporary impact with $\eta \in \mathbb{R}_{++}$.

¹⁰ The original model includes an approximation for $\kappa$, although the model explained in this subsection does not.
¹¹ If we add a linear permanent impact in this model, the dynamics for the market price becomes
Let us define $Q$ to be the trading trajectory and $\mathcal{A}$ to be the set of admissible strategies in a class of deterministic strategies, as in the discrete-time setting. That is,
Using the stochastic integration-by-parts formula, we can calculate the trading cost (or the implementation shortfall) and obtain the following expression:

$$\begin{aligned}
C(Q) &:= P_0 Q - \int_0^T \hat P_t \dot Q_t\,\mathrm{d}t \\
&= P_0 Q - \int_0^T \left(P_t - \eta\dot Q_t\right)\dot Q_t\,\mathrm{d}t \\
&= P_0 Q - \int_0^T P_t \dot Q_t\,\mathrm{d}t + \eta \int_0^T \left(\dot Q_t\right)^2 \mathrm{d}t \\
&= -\sigma \int_0^T Q_t\,\mathrm{d}B_t + \eta \int_0^T \left(\dot Q_t\right)^2 \mathrm{d}t.
\end{aligned} \tag{1.38}$$
From the properties of the stochastic integral and Itô's isometry, we obtain the expectation of the trading cost (or the expected implementation shortfall) and the variance of the trading cost:

$$\mathbb{E}\left[C(Q)\right] = \eta \int_0^T \left(\dot Q_t\right)^2 \mathrm{d}t; \tag{1.39}$$

$$\mathbb{V}\left[C(Q)\right] = \sigma^2 \int_0^T Q_t^2\,\mathrm{d}t. \tag{1.40}$$
$$\min_{Q \in \mathcal{A}} \mathbb{E}\left[C(Q)\right] + \gamma \mathbb{V}\left[C(Q)\right] = \int_0^T \left(\eta\left(\dot Q_t\right)^2 + \gamma\sigma^2 Q_t^2\right)\mathrm{d}t, \tag{1.41}$$
$$\text{s.t.} \quad Q_0 = Q; \quad Q_T = 0. \tag{1.42}$$
Theorem 2 The optimal trading strategy at time $t \in [0, T]$, denoted by $Q^{*} := \{Q_t^{*} \mid 0 \le t \le T\}$, is characterized by the following remaining execution volume:

$$Q_t^{*} = Q\,\frac{\sinh\left(\kappa(T - t)\right)}{\sinh\left(\kappa T\right)}, \qquad \kappa := \sqrt{\frac{\gamma\sigma^2}{\eta}}. \tag{1.43}$$

Proof The Euler-Lagrange equation for the variational problem (1.41) is

$$\ddot Q_t = \frac{\gamma\sigma^2}{\eta} Q_t, \tag{1.44}$$

with boundary conditions $Q_0 = Q$ and $Q_T = 0$. (For the detail, see Appendix 3.) This is a second-order linear ordinary differential equation (ODE) with constant coefficients. Solving it yields Eq. (1.43). (For the detail, see Appendix 4.) □
Remark 3 As Eq. (1.43) shows, if $Q = 0$, then

$$Q_t^{*} = 0 \tag{1.45}$$

for all $t \in [0, T]$. This means that round-trip trading in this model with zero initial trading volume excludes dynamic arbitrage opportunities.
1.4.2 Numerical Example

We now examine some numerical examples through comparative statics. The benchmark values are set as follows:
The first example shows how the level of volatility influences the optimal execution strategy. Figure 1.6 illustrates the remaining execution volume of the optimal execution strategy for different values of $\sigma$: $\sigma = 0.5$, $1$, and $5$. As Fig. 1.6 shows, the larger the volatility, the faster the large trader executes. This is compatible with the intuition that a large trader tends to avoid the risk of future price fluctuations.

We next examine how the risk-aversion parameter influences the trading strategy. Figure 1.7 shows the optimal trading strategy for different risk-aversion parameters: $\gamma = 0.001$, $0.01$, and $0.1$. The figure shows that the more risk-averse the large trader is, the faster he/she executes the trading volume. This is compatible with the intuitive understanding that a risk-averse large trader executes orders quickly to avoid the effects of market impact (and price fluctuation).
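The comparative statics in Figs. 1.6 and 1.7 follow directly from the closed form (1.43); substituting (1.43) into the ODE (1.44) forces $\kappa = \sqrt{\gamma\sigma^2/\eta}$. The sketch below evaluates the remaining-volume curve for illustrative parameter values (the chapter's benchmark values are not reproduced here, so $Q$, $T$, and $\eta$ are assumptions).

```python
import numpy as np

def q_remaining(t, Q, T, sigma, gamma, eta):
    """Optimal remaining volume, Eq. (1.43), with kappa^2 = gamma*sigma^2/eta."""
    kappa = np.sqrt(gamma * sigma ** 2 / eta)
    return Q * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)

Q, T, eta = 1000.0, 1.0, 0.01          # illustrative values (assumed)
t = np.linspace(0.0, 1.0, 5)

for sigma in (0.5, 1.0, 5.0):          # volatility comparative statics (cf. Fig. 1.6)
    print(f"sigma={sigma}:", np.round(q_remaining(t, Q, T, sigma, gamma=0.01, eta=eta), 1))

for gamma in (0.001, 0.01, 0.1):       # risk-aversion comparative statics (cf. Fig. 1.7)
    print(f"gamma={gamma}:", np.round(q_remaining(t, Q, T, sigma=1.0, gamma=gamma, eta=eta), 1))
```

Larger $\sigma$ or $\gamma$ increases $\kappa$ and bends the trajectory toward faster early execution, matching the figures' description.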
with an absolute risk-aversion parameter $\gamma \in \mathbb{R}_{++}$. Let $q_t \in \mathbb{R}$ represent the large amount of orders submitted by the large trader at time $t \in \{1, \ldots, T\}$. We denote by $Q_t$ the remaining execution volume, that is, the number of shares remaining to be purchased by the large trader at time $t \in \{1, \ldots, T, T+1\}$. This assumption yields $Q_1 = Q$ and

¹² As for [26, 27], they construct models with the residual effect of the market impact, i.e., the transient impact, which dissipates over the trading time window. These papers solve an optimization problem of maximizing an expected utility payoff from the final wealth at maturity, deriving an optimal execution strategy.
$$Q_{t+1} = Q_t - q_t, \quad t = 1, \ldots, T. \tag{1.48}$$
The market price (or quoted price) of the risky asset at time $t \in \{1, \ldots, T, T+1\}$ is expressed as $P_t$. Since the large trader greatly influences the risky asset's price through his/her submission of a large amount of orders, the execution price at time $t \in \{1, \ldots, T\}$ becomes not $P_t$ but $\hat P_t$, with an additive execution cost. In the rest of this chapter, we assume that submitting one unit of a (large) order at time $t \in \{1, \ldots, T\}$ causes an instantaneous linear market impact whose coefficient is denoted by $\lambda_t \in \mathbb{R}_{++}$. We also assume that the aggregate trading volume posed by small traders has some impact on the execution price,¹³ and $\kappa_t \in \mathbb{R}_{++}$ represents the market impact coefficient per unit at time $t \in \{1, \ldots, T\}$ caused by small traders. The aggregate trading volume submitted by small traders at time $t \in \{1, \ldots, T\}$ is assumed to be a sequence of random variables $v_t$, each following a normal distribution with mean $\mu_t^{v}$ and variance $(\sigma_t^{v})^2$, that is,

$$v_t \sim N\left(\mu_t^{v}, (\sigma_t^{v})^2\right), \quad t = 1, \ldots, T. \tag{1.49}$$

In the sequel, the buy- and sell-trades of the large trader are supposed to induce the same (instantaneous) market impact. This assumption may seem inconsistent with the situation observed in a real market; however, it can be justified by the statistical analysis of market data in [11, 12].¹⁴ We assume that the execution price takes the form of a linear market impact model:

$$\hat P_t = P_t + (\lambda_t q_t + \kappa_t v_t), \quad t = 1, \ldots, T. \tag{1.50}$$
¹³ Existing research concerned with execution problems has thoroughly investigated market impact models with small traders. As [39] show, small trades have statistically far larger impacts on the market than large trades in a relative sense. These results suggest that one should take into account the market impact caused by small traders when constructing a market impact model. Cartea et al. [11] incorporate the market impact caused by other traders into the construction of the midprice process by describing the market order-flow through a general Markov process and derive a closed-form strategy for a large trader. They show that the optimal execution strategies differ from [2] when small traders cause a market impact and coincide with [2] when small traders do not affect the midprice. Their analysis is based on the assumption that the market impact is decomposed into temporary and permanent parts, not a transient one. The model explained here considers the transient impact through the residual effect of past executions (caused by both the large trader and small traders) on the risky asset market. This setting enables us to analyze how the residual effect of the past market impact influences the execution strategy of the large trader. The effect of the market impact caused by small traders on the execution price constitutes the generalized market impact in this model.
¹⁴ Their works estimate the permanent and temporary impacts by conducting a linear regression of price changes on net order-flow using trading data obtained from Nasdaq. This estimation and the relevant statistics show that the linear assumption of the market impact is compatible with the stock market and that the market impacts caused by buy and sell trades can be regarded as the same from the viewpoint of statistical analysis.
We next define the residual effect of the past market impact (the temporary impact, to be precise) at time $t \in \{1, \ldots, T, T+1\}$, represented by $R_t$, by means of the following exponential decay kernel function $G(t)$ of time $t$:

$$G(t) := e^{-\rho t}. \tag{1.51}$$

Using a deterministic price reversion rate $\alpha_t \in [0, 1]$ and a deterministic resilience speed $\rho \in [0, \infty)$, the dynamics of the residual effect of the past market impact are given by

$$R_1 = 0; \qquad
\begin{aligned}
R_{t+1} &:= \sum_{k=1}^{t} \left(\lambda_k q_k + \kappa_k v_k\right)\alpha_k\, e^{-\rho((t+1)-k)} \\
&= e^{-\rho}\sum_{k=1}^{t-1} \left(\lambda_k q_k + \kappa_k v_k\right)\alpha_k\, e^{-\rho(t-k)} + \left(\lambda_t q_t + \kappa_t v_t\right)\alpha_t\, e^{-\rho} \\
&= \left[R_t + \left(\lambda_t q_t + \kappa_t v_t\right)\alpha_t\right] e^{-\rho}, \quad t = 1, \ldots, T.
\end{aligned} \tag{1.52}$$
Note that (1.52) has a Markov property, which arises from the assumption of the exponential decay kernel. We suppose here that the two stochastic processes $v_t$ and $\epsilon_t$ for $t \in \{1, \ldots, T\}$ are mutually independent for convenience, and we hereafter conduct our analysis under this independence assumption.
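Because (1.52) is Markovian, the residual effect can be updated one step at a time instead of re-summing the whole decay kernel at each date. A minimal simulation sketch (all parameters and order flows below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

T, rho = 10, 0.5
alpha = np.full(T, 0.6)          # price reversion rates alpha_t (assumed constant)
lam, kappa = 1e-4, 5e-5          # impact coefficients lambda_t, kappa_t (assumed)
q = np.full(T, 100.0)            # large trader's orders q_t (assumed)
v = rng.normal(50.0, 20.0, T)    # small traders' aggregate volume, v_t ~ N(mu, sigma^2)

R = np.zeros(T + 1)              # R_1 = 0 (index 0 here corresponds to t = 1)
for t in range(T):
    # Eq. (1.52): R_{t+1} = [R_t + (lambda_t q_t + kappa_t v_t) * alpha_t] * e^{-rho}
    R[t + 1] = (R[t] + (lam * q[t] + kappa * v[t]) * alpha[t]) * np.exp(-rho)

print(np.round(R, 6))
```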
The construction of the fundamental price at time $t \in \{1, \ldots, T\}$, denoted by $P_t^{f}$, must be considered carefully. Since the residual effect of past executions dissipates over the course of the trading horizon, we define $P_t - R_t$ as the fundamental price of the risky asset. By the definition of $\epsilon_t$ and the assumption that the permanent impact at time $t \in \{1, \ldots, T\}$ is represented by $(\lambda_t q_t + \kappa_t v_t)\beta_t$, we can set the fundamental price $P_t^{f} := P_t - R_t$ with a permanent impact as follows:

$$P_{t+1}^{f} = P_{t+1} - R_{t+1} := P_t - R_t + \beta_t\left(\lambda_t q_t + \kappa_t v_t\right) + \epsilon_t = P_t^{f} + \beta_t\left(\lambda_t q_t + \kappa_t v_t\right) + \epsilon_t, \quad t = 1, \ldots, T. \tag{1.54}$$

This relation indicates that (i) the permanent impact caused by the large trader and small traders and (ii) public news or information about the economic situation are assumed to affect the fundamental price. This assumption also reveals that the permanent impact may give a non-zero trend to the fundamental price, even if the mean of $\epsilon_t$ is zero for all $t \in \{1, \ldots, T\}$. According to Eqs. (1.50), (1.52), and (1.54), the dynamics of the market price, i.e., the relation between $P_{t+1}$ and $P_t$, are described as

$$P_{t+1} = P_t - (1 - e^{-\rho})R_t + \left(\lambda_t q_t + \kappa_t v_t\right)\left(\alpha_t e^{-\rho} + \beta_t\right) + \epsilon_t,$$

and therefore,

Thus, in this case, we have a permanent impact model. Also, if $\alpha_t = 1$, the model reduces to a transient impact model, and if $\kappa_t = 0$ or $\sigma_t^{v} = 0$, the model reduces to [27].
From the definition of the execution price, the wealth process $W_t$ evolves as

$$W_{t+1} = W_t - \hat P_t q_t = W_t - \left\{P_t + \left(\lambda_t q_t + \kappa_t v_t\right)\right\} q_t, \quad t = 1, \ldots, T. \tag{1.58}$$
1.5.2 Formulation as a Markov Decision Process

The state of the decision process at time $t \in \{1, \ldots, T, T+1\}$ is

$$s_t = (W_t, P_t, Q_t, R_t) \in \mathbb{R}\times\mathbb{R}\times\mathbb{R}\times\mathbb{R} =: \mathcal{S}. \tag{1.59}$$

The transition to the next state

$$s_{t+1} = (W_{t+1}, P_{t+1}, Q_{t+1}, R_{t+1}) \in \mathcal{S} \tag{1.60}$$

occurs according to the law of motion that we have precisely described in the previous subsection. We symbolically describe the transition by a (Borel-measurable) system dynamics function $h_t\ (: \mathcal{S}\times\mathcal{A}\times(\mathbb{R}\times\mathbb{R}) \longrightarrow \mathcal{S})$:

$$s_{t+1} = h_t\left(s_t, q_t, (\epsilon_t, v_t)\right), \quad t = 1, \ldots, T. \tag{1.61}$$
The term $-\infty$ means a hard constraint enforcing the large trader to execute all of the remaining volume $Q_T$ at the maturity $T$, that is, $q_T = Q_T$.

If we define a (history-independent) one-stage decision rule $f_t$ at time $t \in \{1, \ldots, T\}$ as a Borel-measurable map from a state $s_t \in \mathcal{S} = \mathbb{R}^4$ to an action

$$q_t = f_t(s_t) \in \mathcal{A} = \mathbb{R}, \tag{1.63}$$

then a Markov execution strategy is defined as

$$\pi := (f_1, \ldots, f_t, \ldots, f_T). \tag{1.64}$$

We denote the set of all Markov execution strategies by $\Pi_M$. Further, for $t \in \{1, \ldots, T\}$, we define the sub-execution strategy after time $t$ of a Markov execution strategy $\pi = (f_1, \ldots, f_t, \ldots, f_T) \in \Pi_M$ as
$$\pi_t := (f_t, \ldots, f_T), \tag{1.65}$$

be the expected utility payoff at time $t$ under the strategy $\pi$. It should be noted that the expected utility payoff $V_t^{\pi}\left[s_t\right]$ depends on the Markov execution policy $\pi = (f_1, \ldots, f_t, \ldots, f_T)$ only through the sub-execution policy $\pi_t := (f_t, \ldots, f_T)$ after time $t$.

Now, we define the optimal value function as follows:

$$V_t\left[s_t\right] = \sup_{\pi \in \Pi_M} V_t^{\pi}\left[s_t\right], \quad s_t \in \mathcal{S},\ t = 1, \ldots, T, T+1. \tag{1.68}$$
Proof We derive the optimal execution volume $q_t^{*}$ at time $t \in \{1, \ldots, T\}$ by the backward induction method of dynamic programming from the maturity $T$.

[Step 1] From the assumption that the large trader must unwind all the remainder of his/her position at time $t = T$,

$$Q_{T+1} = Q_T - q_T = 0 \tag{1.72}$$

must hold, which yields $q_T^{*} = Q_T$. Then, for $t = T$, with the relation of the moment-generating function of $v_T$:

$$\mathbb{E}\left[\exp\left\{\gamma\kappa_T q_T v_T\right\}\right] = \exp\left\{\gamma\kappa_T q_T \mu_T^{v} + \frac{1}{2}\gamma^2\kappa_T^2 q_T^2 (\sigma_T^{v})^2\right\}, \tag{1.73}$$

where

$$G_T := -\left(\lambda_T + \frac{1}{2}\gamma\kappa_T^2(\sigma_T^{v})^2\right)\ (< 0); \qquad H_T := -\kappa_T\mu_T^{v}.$$
[Step 2]

$$\begin{aligned}
V_{T-1}\left[s_{T-1}\right]
&= \sup_{q_{T-1}\in\mathbb{R}} \mathbb{E}\left[\left. V_T\left[s_T\right] \right| s_{T-1}\right] \\
&= \sup_{q_{T-1}\in\mathbb{R}} \mathbb{E}\left[\left. -\exp\left\{-\gamma\left[W_T - P_T Q_T + G_T Q_T^2 + H_T Q_T\right]\right\} \right| W_{T-1}, P_{T-1}, Q_{T-1}, R_{T-1}\right] \\
&= \sup_{q_{T-1}\in\mathbb{R}} \mathbb{E}\Big[ -\exp\Big\{-\gamma\Big[ W_{T-1} - \left(P_{T-1} + \lambda_{T-1}q_{T-1} + \kappa_{T-1}v_{T-1}\right)q_{T-1} \\
&\qquad - \left\{P_{T-1} - (1 - e^{-\rho})R_{T-1} + \left(\lambda_{T-1}q_{T-1} + \kappa_{T-1}v_{T-1}\right)\left(\alpha_{T-1}e^{-\rho} + \beta_{T-1}\right) + \epsilon_{T-1}\right\}\left(Q_{T-1} - q_{T-1}\right) \\
&\qquad + G_T\left(Q_{T-1} - q_{T-1}\right)^2 + H_T\left(Q_{T-1} - q_{T-1}\right) \Big]\Big\} \,\Big|\, W_{T-1}, P_{T-1}, Q_{T-1}, R_{T-1}\Big] \\
&= \sup_{q_{T-1}\in\mathbb{R}} \Big\{ -\exp\Big[ -\gamma\Big( -A_{T-1}q_{T-1}^2 + \left(B_{T-1}Q_{T-1} + C_{T-1}R_{T-1} + D_{T-1}\right)q_{T-1} \\
&\qquad + W_{T-1} - P_{T-1}Q_{T-1} + \left\{G_T - \frac{1}{2}\gamma\left(\bar\alpha_{T-1}\right)^2\kappa_{T-1}^2(\sigma_{T-1}^{v})^2 - \frac{1}{2}\gamma(\sigma_{T-1}^{\epsilon})^2\right\}Q_{T-1}^2 \\
&\qquad + \left(H_T - \bar\alpha_{T-1}\kappa_{T-1}\mu_{T-1}^{v} - \mu_{T-1}^{\epsilon}\right)Q_{T-1} + (1 - e^{-\rho})Q_{T-1}R_{T-1} \Big)\Big]\Big\},
\end{aligned} \tag{1.75}$$

where

$$\bar\alpha_{T-1} := \alpha_{T-1}e^{-\rho} + \beta_{T-1}; \tag{1.76}$$
$$A_{T-1} := (1 - \bar\alpha_{T-1})\lambda_{T-1} - G_T + \frac{1}{2}\gamma(1 - \bar\alpha_{T-1})^2\kappa_{T-1}^2(\sigma_{T-1}^{v})^2 + \frac{1}{2}\gamma(\sigma_{T-1}^{\epsilon})^2; \tag{1.77}$$
$$B_{T-1} := -\bar\alpha_{T-1}\lambda_{T-1} - 2G_T - \gamma\bar\alpha_{T-1}(1 - \bar\alpha_{T-1})\kappa_{T-1}^2(\sigma_{T-1}^{v})^2 + \gamma(\sigma_{T-1}^{\epsilon})^2; \tag{1.78}$$
$$C_{T-1} := -(1 - e^{-\rho}); \tag{1.79}$$
$$D_{T-1} := -H_T - (1 - \bar\alpha_{T-1})\kappa_{T-1}\mu_{T-1}^{v} + \mu_{T-1}^{\epsilon}. \tag{1.80}$$
Finding the optimal execution volume $q_{T-1}^{*}$, which attains the supremum in Eq. (1.75), is equivalent to finding the one that yields the maximum of

$$K_{T-1}(q_{T-1}) := -A_{T-1}q_{T-1}^2 + \left(B_{T-1}Q_{T-1} + C_{T-1}R_{T-1} + D_{T-1}\right)q_{T-1}, \tag{1.81}$$

since Eqs. (1.75) and (1.81) are concave functions with respect to $q_{T-1}$. Thus, by completing the square of $K_{T-1}(q_{T-1})$ with respect to $q_{T-1}$, we obtain the optimal execution volume $q_{T-1}^{*}$ as

$$q_{T-1}^{*} = \frac{B_{T-1}Q_{T-1} + C_{T-1}R_{T-1} + D_{T-1}}{2A_{T-1}} =: a_{T-1} + b_{T-1}Q_{T-1} + c_{T-1}R_{T-1}. \tag{1.82}$$

Substituting this back into Eq. (1.75), the optimal value function at time $T-1$ becomes

$$\begin{aligned}
V_{T-1}\left[s_{T-1}\right]
&= -\exp\Big\{-\gamma\Big[ W_{T-1} - P_{T-1}Q_{T-1} + \left\{G_T - \frac{1}{2}\gamma\left(\bar\alpha_{T-1}\right)^2\kappa_{T-1}^2(\sigma_{T-1}^{v})^2 - \frac{1}{2}\gamma(\sigma_{T-1}^{\epsilon})^2\right\}Q_{T-1}^2 \\
&\qquad + \left(H_T - \bar\alpha_{T-1}\kappa_{T-1}\mu_{T-1}^{v} - \mu_{T-1}^{\epsilon}\right)Q_{T-1} + (1 - e^{-\rho})Q_{T-1}R_{T-1} + \frac{\left(B_{T-1}Q_{T-1} + C_{T-1}R_{T-1} + D_{T-1}\right)^2}{4A_{T-1}} \Big]\Big\} \\
&= -\exp\Big\{-\gamma\Big[ W_{T-1} - P_{T-1}Q_{T-1} + G_{T-1}Q_{T-1}^2 + H_{T-1}Q_{T-1} + I_{T-1}Q_{T-1}R_{T-1} + J_{T-1}R_{T-1}^2 + L_{T-1}R_{T-1} + Z_{T-1} \Big]\Big\},
\end{aligned} \tag{1.83}$$

where

$$\begin{aligned}
G_{T-1} &:= G_T - \frac{1}{2}\gamma\left(\bar\alpha_{T-1}\right)^2\kappa_{T-1}^2(\sigma_{T-1}^{v})^2 - \frac{1}{2}\gamma(\sigma_{T-1}^{\epsilon})^2 + \frac{B_{T-1}^2}{4A_{T-1}}; \\
H_{T-1} &:= H_T - \bar\alpha_{T-1}\kappa_{T-1}\mu_{T-1}^{v} - \mu_{T-1}^{\epsilon} + \frac{B_{T-1}D_{T-1}}{2A_{T-1}}; \\
I_{T-1} &:= (1 - e^{-\rho}) + \frac{B_{T-1}C_{T-1}}{2A_{T-1}}; \qquad J_{T-1} := \frac{C_{T-1}^2}{4A_{T-1}}; \\
L_{T-1} &:= \frac{C_{T-1}D_{T-1}}{2A_{T-1}}; \qquad Z_{T-1} := \frac{D_{T-1}^2}{4A_{T-1}}.
\end{aligned} \tag{1.84}$$
[Step 3] For $t \in \{T-2, \ldots, 1\}$, we can assume from the above results that the optimal value function has the following functional form at time $t+1$:

$$V_{t+1}\left[s_{t+1}\right] = -\exp\left\{-\gamma\left[W_{t+1} - P_{t+1}Q_{t+1} + G_{t+1}Q_{t+1}^2 + H_{t+1}Q_{t+1} + I_{t+1}Q_{t+1}R_{t+1} + J_{t+1}R_{t+1}^2 + L_{t+1}R_{t+1} + Z_{t+1}\right]\right\}. \tag{1.85}$$
$$\begin{aligned}
V_t\left[s_t\right]
&= \sup_{q_t\in\mathbb{R}} \mathbb{E}\Big[ -\exp\Big\{-\gamma\Big[ W_{t+1} - P_{t+1}Q_{t+1} + G_{t+1}Q_{t+1}^2 + H_{t+1}Q_{t+1} + I_{t+1}Q_{t+1}R_{t+1} \\
&\qquad\qquad + J_{t+1}R_{t+1}^2 + L_{t+1}R_{t+1} + Z_{t+1} \Big]\Big\} \,\Big|\, W_t, P_t, Q_t, R_t \Big] \\
&= \sup_{q_t\in\mathbb{R}} \Big\{ -\exp\Big[ -\gamma\Big( -A_t q_t^2 + \left(B_t Q_t + C_t R_t + D_t\right)q_t + W_t - P_t Q_t \\
&\qquad + \Big[ G_{t+1} - \frac{\gamma\eta_t^2(\sigma_t^{v})^2}{2\left\{1 + 2\gamma\zeta_t(\sigma_t^{v})^2\right\}} - \frac{1}{2}\gamma(\sigma_t^{\epsilon})^2 \Big] Q_t^2 \\
&\qquad + \Big[ H_{t+1} + \frac{\eta_t\mu_t^{v}}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} - \frac{\gamma\eta_t\varphi_t(\sigma_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} - \mu_t^{\epsilon} \Big] Q_t \\
&\qquad + \Big[ (1 - e^{-\rho}) + e^{-\rho}I_{t+1} - \frac{\gamma\eta_t\theta_t(\sigma_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} \Big] Q_t R_t \\
&\qquad + \Big[ e^{-2\rho}J_{t+1} - \frac{\gamma\theta_t^2(\sigma_t^{v})^2}{2\left\{1 + 2\gamma\zeta_t(\sigma_t^{v})^2\right\}} \Big] R_t^2 \\
&\qquad + \Big[ e^{-\rho}L_{t+1} + \frac{\theta_t\mu_t^{v}}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} - \frac{\gamma\theta_t\varphi_t(\sigma_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} \Big] R_t \\
&\qquad + \Big[ Z_{t+1} + \frac{\varphi_t\mu_t^{v}}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} - \frac{\gamma\varphi_t^2(\sigma_t^{v})^2}{2\left\{1 + 2\gamma\zeta_t(\sigma_t^{v})^2\right\}} + \frac{\zeta_t(\mu_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} + x_t \Big] \Big)\Big]\Big\},
\end{aligned} \tag{1.86}$$

where

$$\begin{aligned}
\delta_t &:= (\bar\alpha_t - 1)\kappa_t - \kappa_t\alpha_t e^{-\rho}I_{t+1} + 2\lambda_t\kappa_t\alpha_t^2 e^{-2\rho}J_{t+1}; \qquad \eta_t := -\kappa_t\bar\alpha_t + \kappa_t\alpha_t e^{-\rho}I_{t+1}; \\
\theta_t &:= 2\kappa_t\alpha_t e^{-\rho}J_{t+1}; \qquad \varphi_t := \kappa_t\alpha_t e^{-\rho}L_{t+1}; \qquad x_t := -\frac{1}{\gamma}\log\frac{1}{\sqrt{1 + 2\gamma\zeta_t(\sigma_t^{v})^2}},
\end{aligned}$$

and

$$\begin{aligned}
C_t &:= -(1 - e^{-\rho}) - e^{-\rho}I_{t+1} + 2\lambda_t\alpha_t e^{-2\rho}J_{t+1} - \frac{\gamma\delta_t\theta_t(\sigma_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2}; \\
D_t &:= -H_{t+1} + \lambda_t\alpha_t e^{-\rho}L_{t+1} + \frac{\gamma\delta_t\mu_t^{v}}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} - \frac{\gamma\delta_t\varphi_t(\sigma_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} + \mu_t^{\epsilon}.
\end{aligned} \tag{1.87}$$
where we use the following Gaussian integral identity (for $X \sim N(\mu, \sigma^2)$ and $1 - 2b\sigma^2 > 0$):

$$\mathbb{E}\left[e^{aX + bX^2}\right] = \frac{1}{\sqrt{1 - 2b\sigma^2}}\exp\left\{\frac{2a\mu + a^2\sigma^2 + 2b\mu^2}{2(1 - 2b\sigma^2)}\right\}\underbrace{\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2/(1 - 2b\sigma^2)}}\exp\left\{-\frac{1}{2\sigma^2/(1 - 2b\sigma^2)}\left(x - \frac{\mu + a\sigma^2}{1 - 2b\sigma^2}\right)^2\right\}\mathrm{d}x}_{=1}$$
$$= \frac{1}{\sqrt{1 - 2b\sigma^2}}\exp\left\{\frac{2a\mu + a^2\sigma^2 + 2b\mu^2}{2(1 - 2b\sigma^2)}\right\}. \tag{1.89}$$
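Identity (1.89), which underlies each conditional expectation over the small traders' order flow $v_t$, can be sanity-checked by Monte Carlo. A sketch, assuming $X \sim N(\mu, \sigma^2)$ with illustrative parameter values satisfying $1 - 2b\sigma^2 > 0$:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, a, b = 0.3, 0.8, 0.5, 0.2          # requires 1 - 2*b*sigma^2 > 0
assert 1 - 2 * b * sigma ** 2 > 0

# Closed form of Eq. (1.89).
closed = np.exp((2*a*mu + a**2*sigma**2 + 2*b*mu**2) / (2*(1 - 2*b*sigma**2))) \
         / np.sqrt(1 - 2*b*sigma**2)

# Monte Carlo estimate of E[exp(aX + bX^2)].
x = rng.normal(mu, sigma, 2_000_000)
mc = np.exp(a * x + b * x ** 2).mean()

print(f"closed form = {closed:.6f}, Monte Carlo = {mc:.6f}")
```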
Thus, the optimal execution volume becomes

$$q_t^{*} := f_t(s_t) = \frac{B_t Q_t + C_t R_t + D_t}{2A_t} = a_t + b_t Q_t + c_t R_t, \quad t = T-2, \ldots, 1, \tag{1.91}$$

where

$$a_t := \frac{D_t}{2A_t}, \qquad b_t := \frac{B_t}{2A_t}, \qquad c_t := \frac{C_t}{2A_t}. \tag{1.92}$$

By inserting this into Eq. (1.86), the optimal value function at time $t \in \{T-2, \ldots, 1\}$ has the following functional form:
$$\begin{aligned}
V_t\left[s_t\right]
&= -\exp\Big\{-\gamma\Big[ W_t - P_t Q_t \\
&\qquad + \Big[ G_{t+1} - \frac{\gamma\eta_t^2(\sigma_t^{v})^2}{2\left\{1 + 2\gamma\zeta_t(\sigma_t^{v})^2\right\}} - \frac{1}{2}\gamma(\sigma_t^{\epsilon})^2 \Big] Q_t^2 \\
&\qquad + \Big[ H_{t+1} + \frac{\eta_t\mu_t^{v}}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} - \frac{\gamma\eta_t\varphi_t(\sigma_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} - \mu_t^{\epsilon} \Big] Q_t \\
&\qquad + \Big[ (1 - e^{-\rho}) + e^{-\rho}I_{t+1} - \frac{\gamma\eta_t\theta_t(\sigma_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} \Big] Q_t R_t
+ \Big[ e^{-2\rho}J_{t+1} - \frac{\gamma\theta_t^2(\sigma_t^{v})^2}{2\left\{1 + 2\gamma\zeta_t(\sigma_t^{v})^2\right\}} \Big] R_t^2 \\
&\qquad + \Big[ e^{-\rho}L_{t+1} + \frac{\theta_t\mu_t^{v}}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} - \frac{\gamma\theta_t\varphi_t(\sigma_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} \Big] R_t \\
&\qquad + \Big[ Z_{t+1} + \frac{\varphi_t\mu_t^{v}}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} - \frac{\gamma\varphi_t^2(\sigma_t^{v})^2}{2\left\{1 + 2\gamma\zeta_t(\sigma_t^{v})^2\right\}} + \frac{\zeta_t(\mu_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} + x_t \Big]
+ \frac{\left(B_t Q_t + C_t R_t + D_t\right)^2}{4A_t} \Big]\Big\} \\
&= -\exp\left\{-\gamma\left[ W_t - P_t Q_t + G_t Q_t^2 + H_t Q_t + I_t Q_t R_t + J_t R_t^2 + L_t R_t + Z_t \right]\right\},
\end{aligned} \tag{1.93}$$

where

$$\begin{aligned}
G_t &:= G_{t+1} - \frac{\gamma\eta_t^2(\sigma_t^{v})^2}{2\left\{1 + 2\gamma\zeta_t(\sigma_t^{v})^2\right\}} - \frac{1}{2}\gamma(\sigma_t^{\epsilon})^2 + \frac{B_t^2}{4A_t}; \\
H_t &:= H_{t+1} + \frac{\eta_t\mu_t^{v}}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} - \frac{\gamma\eta_t\varphi_t(\sigma_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} - \mu_t^{\epsilon} + \frac{B_t D_t}{2A_t}; \\
I_t &:= (1 - e^{-\rho}) + e^{-\rho}I_{t+1} - \frac{\gamma\eta_t\theta_t(\sigma_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} + \frac{B_t C_t}{2A_t}; \\
J_t &:= e^{-2\rho}J_{t+1} - \frac{\gamma\theta_t^2(\sigma_t^{v})^2}{2\left\{1 + 2\gamma\zeta_t(\sigma_t^{v})^2\right\}} + \frac{C_t^2}{4A_t}; \\
L_t &:= e^{-\rho}L_{t+1} + \frac{\theta_t\mu_t^{v}}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} - \frac{\gamma\theta_t\varphi_t(\sigma_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} + \frac{C_t D_t}{2A_t}; \\
Z_t &:= Z_{t+1} + \frac{\varphi_t\mu_t^{v}}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} - \frac{\gamma\varphi_t^2(\sigma_t^{v})^2}{2\left\{1 + 2\gamma\zeta_t(\sigma_t^{v})^2\right\}} + \frac{\zeta_t(\mu_t^{v})^2}{1 + 2\gamma\zeta_t(\sigma_t^{v})^2} + x_t + \frac{D_t^2}{4A_t}.
\end{aligned} \tag{1.94}$$
□
From the above theorem, we find that the optimal execution volume $q_t^{*}$ for $t \in \{1, \ldots, T\}$ depends on the state $s_t = (W_t, P_t, Q_t, R_t) \in \mathcal{S}$ of the decision process only through the remaining execution volume $Q_t$ and the cumulative residual effect $R_t$, and not through the wealth $W_t$ or the market price $P_t$. Not only does our analysis show that the optimal execution strategy becomes a stochastic one, but it also reveals that the orders posed by small traders (indirectly) affect the execution strategy of the large trader (through the residual effect). A great deal of research focuses on the execution problem of a single large trader and yields an optimal execution strategy in a deterministic class, which differs from our results.
Corollary 1 If the aggregate trading volumes submitted by small traders, $v_t$ for $t \in \{1, \ldots, T\}$, are deterministic, then the optimal execution volumes $q_t^{*}$ at time $t \in \{1, \ldots, T\}$ also become deterministic functions of time. Thus, the optimal execution strategy lies in the class of static (and non-randomized) execution strategies.
1.5.4 In the Case with Target Close Order

We here consider a model with a closing price. The time framework $t \in \{1, \ldots, T, T+1\}$ is the same as in the model above. However, we add the assumption that the large trader can execute his/her remaining execution volume at time $T+1$, i.e., $Q_{T+1}$, at the closing price $P_{T+1}$. We further assume that trading at time $T+1$ requires the large trader to pay an additive cost $\chi_{T+1}$ per unit of the remaining volume.

According to the above settings, the value function at maturity becomes

$$g_{T+1}(s_{T+1}) = V_{T+1}\left[s_{T+1}\right] = -\exp\left\{-\gamma\left[W_{T+1} - \left(P_{T+1} + \chi_{T+1}Q_{T+1}\right)Q_{T+1}\right]\right\}. \tag{1.95}$$
where $a_t^{*}, b_t^{*}, c_t^{*}$ for $t \in \{1, \ldots, T, T+1\}$ are deterministic functions of time $t$ that depend on the problem parameters and can be computed backward from the maturity $T+1$.
2. The optimal value function $V_t\left[s_t\right]$ at time $t \in \{1, \ldots, T, T+1\}$ takes the following form:

$$V_t\left[W_t, P_t, Q_t, R_t\right] = -\exp\left\{-\gamma\left[W_t - P_t Q_t + G_t^{*}Q_t^2 + H_t^{*}Q_t + I_t^{*}Q_t R_t + J_t^{*}R_t^2 + L_t^{*}R_t + Z_t^{*}\right]\right\}, \tag{1.97}$$
We can consider $\chi_{T+1}$ as the cost of the dark pool, which large traders make use of in a real marketplace.
1.5.5 Computation Method for Optimal Execution

We finally illustrate how to compute the optimal execution strategy given by Eq. (1.70). The algorithm shows that if we have the information about $\mu_t^{v}, \sigma_t^{v}, \mu_t^{\epsilon}, \sigma_t^{\epsilon}, \alpha_t, \beta_t, \lambda_t, \rho, \gamma$, then a practitioner working as a large trader can use it to execute a large amount of orders or to backtest his/her trading performance.
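Although the algorithm box itself is not reproduced here, Eq. (1.70)/(1.91) suggests a two-pass scheme: a backward pass computing the deterministic coefficients $(a_t, b_t, c_t)$ from recursions such as (1.76)-(1.84) and (1.94), followed by a forward pass that applies the feedback rule $q_t^{*} = a_t + b_t Q_t + c_t R_t$ along a realized path of small traders' orders. The sketch below illustrates only the forward pass under assumed placeholder coefficients and parameters; it is not the chapter's algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(1)

T = 5
# Feedback coefficients (a_t, b_t, c_t) from the backward recursion; placeholder
# values here -- in practice they follow from (1.76)-(1.84) and (1.94).
a = np.zeros(T)
b = 1.0 / np.arange(T, 0, -1)            # e.g. b_t = 1/(T - t + 1)-style splitting
c = np.full(T, -0.05)

lam, kappa, alpha, rho = 1e-4, 5e-5, 0.6, 0.5   # assumed impact/resilience parameters
mu_v, sigma_v = 50.0, 20.0                       # assumed small-trader order flow

Q, R = 10_000.0, 0.0                     # Q_1 = Q, R_1 = 0
for t in range(T - 1):
    q = a[t] + b[t] * Q + c[t] * R       # optimal feedback rule (1.70)/(1.91)
    v = rng.normal(mu_v, sigma_v)        # realized small traders' aggregate order
    R = (R + (lam * q + kappa * v) * alpha) * np.exp(-rho)   # Eq. (1.52)
    Q -= q                               # Eq. (1.48)
    print(f"t={t+1}: q*={q:.1f}, remaining Q={Q:.1f}, R={R:.6f}")
print(f"t={T}: q*={Q:.1f} (hard constraint q_T = Q_T)")
```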
1.6 Bibliographic Notes

The pioneering theoretical study of optimal execution strategies is [7], which addresses the optimization problem of minimizing the expected execution cost in a discrete-time framework via a dynamic programming approach and shows that, in the presence of a temporary impact, the optimal execution strategy is the one equally split over the (finite) time horizon. Subsequently, [2] derive an optimal execution strategy by considering both the execution cost and the volatility risk, which entails an analysis based on a mean-variance approach. [11, 12] incorporate the market impact caused by other traders into the construction of the midprice process, showing that the optimal execution strategies differ from the one obtained in [2] when small traders cause price impacts and coincide with the one obtained in [2] when small traders are assumed not to influence the midprice.
The modeling of the market impact plays an indispensable role in the research on optimal and equilibrium execution problems. Some studies (e.g., [8]) empirically show that part of the market impact in a real market consists of a transient impact. A number of empirical and theoretical studies have since investigated transient impact modeling, which is empirically compatible with the real situation. [17, 44] consider the so-called no-arbitrage condition under a transient impact model. [32] show that the resilience effect of the limit order book does affect the optimal execution strategies. Seminal papers such as [26, 27, 46] then theoretically consider a market model under a transient impact and show that the transient impact does affect the optimal execution strategy of a large trader. In addition, [14, 15, 35, 36] show that the aggregate orders posed by small traders influence the optimal execution strategy of a large trader under the assumption that the market impact has temporary, permanent, and transient parts. For the equilibrium execution problems of multiple large traders, [29, 43] derive equilibrium execution strategies under a transient price impact model; these execution strategies are in a deterministic and static class. [34, 35, 37] derive equilibrium execution strategies in a randomized and dynamic class.
As in the analysis of [7], much research applies dynamic programming methods. For example, [12] study optimal execution strategies considering the VWAP as well as the market order-flow and provide the optimal execution speed in explicit form. On the other hand, [21] focuses on constructing a model that explains a guaranteed VWAP strategy with risk mitigation and finds that the optimal trading speed for the strategy is characterized by a Hamiltonian system (through a Legendre transform). [13] consider the correlated multi-asset liquidation problem with the information of untraded assets incorporated into the price dynamics. [26, 27] construct models in which an investor maximizes an expected utility payoff from the final wealth at maturity via a dynamic programming approach.
A series of recent studies focuses on the optimal execution of multiple (correlated) assets. As an extension of [11], [13] study the multi-asset execution problem. The paper considers an optimal execution strategy of a single large trader for multiple risky assets using the information of both the assets that he/she trades and those he/she does not. Their research concerns a market in which large orders impose both temporary and permanent impacts, but not transient ones. Another study [46] investigates the optimal execution strategy for multiple risky assets under the assumption that a single large trader exists in a financial market and that the orders posed by the large trader cause a transient impact. Following this research, a number of studies concern the optimal execution of multiple correlated risky assets.

The cross-impact has received much attention in recent years. [44] investigate a condition for a market to admit no arbitrage opportunities and show that the cross-impact of asset $i$ on asset $j$ must be identical to that of asset $j$ on asset $i$. This condition is equivalent to the symmetry of the market impact matrix representing all of the market impacts in the order execution of multiple assets. From a theoretical point of view, [1] examine the properties of the so-called decay kernel, a matrix representing the resilience speed of temporary impacts of multiple assets with cross-impact. They show that the decay kernel must be (i) nonnegative, (ii) nonincreasing with respect to trading time, (iii) convex with respect to trading time, and (iv) commuting.
A few papers extend the Ohnishi-Shimoshimizu model. [14, 36] investigate the case where the aggregate orders posed by small traders have a Markovian dependence as follows:

$$v_0 = 0; \qquad v_{t+1} = \left(a_{t+1}^{v} - b_{t+1}^{v} v_t\right) + \sigma_{t+1}^{v}\omega_{t+1}, \quad t = 0, \ldots, T-1, \tag{1.98}$$

where $\omega_t \sim N_{\mathbb{R}^d}(0, 1)$ for all $t \in \{1, \ldots, T\}$. (The dimension is set as $d = 1$ for [14] and $d = 2$ for [36].) In this case, the optimal execution becomes

$$q_t^{*} := a_t + b_t Q_t + c_t R_t + d_t v_{t-1}, \quad t = 0, \ldots, T. \tag{1.99}$$

Thus, the previous aggregate orders posed by small traders directly and indirectly affect the optimal execution strategy. [15] further investigates a continuous-time analog of [14].
The questions ‘How do traders act in the dark pool?’ and ‘To what extent does the dark pool affect market quality and market efficiency?’ have attracted both empirical and theoretical researchers in the last decade. For more detail, see, e.g., [25, 28].
Some papers delve into the interaction between more than one large trader; examples are [29, 43, 45], to mention only a few related papers. [45] analyzes the interaction of two large traders through their execution strategies, which inspired the following two works. In [43], the authors formulate what they call a market impact game model (as a static strategic game model). This study discovers some features of a Nash equilibrium strategy, proving that a unique Nash equilibrium exists, in explicit form, in a class of static and deterministic strategies. They also prove, via a rather direct method, that this equilibrium is also a Nash equilibrium in a broader class of dynamic strategies. Subsequently, [29] extend the above model to an n-large-trader model and construct cost minimization problems as mean–variance and expected utility maximization problems. A significant result of their analysis is that a Nash equilibrium exists in each problem, which is also in explicit form and is unique for the former. They also reveal that under the Bachelier price model, in which the price contains a Brownian motion as a term expressing the volatility of the stock price, the Nash equilibria obtained from the two problems coincide. These studies are noteworthy since they theoretically highlight the interaction of execution strategies among multiple large traders.
Much research has been conducted to search for optimal trading performance under trading (transaction) costs. [18, 19] theoretically consider portfolio selection problems with transaction costs (which can be seen as market impact) by assuming quadratic trading costs for the traded shares. They show that the optimal trading strategy, in the problem of maximizing the sum of all future expected returns with penalties for risk and transaction costs, becomes a weighted average of the existing portfolio and the aim portfolio, which is itself a weighted average of the current Markowitz portfolio and the expected Markowitz portfolios over the remaining infinite horizon. Another study [30] further investigates these models in the case where a CARA investor executes a large amount of orders in a finite time horizon, and shows that the CARA investor is sensitive to the risk caused by the return-predicting factor, which is not the case in the above models.
Acknowledgements The author is partly supported by the JSPS Grant-in-aid for Early-Career
Scientists #21K13325.
Competing Interests The author has no conflicts of interest to declare that are relevant to the
content of this chapter.
Many problems arising in economics or finance result in a maximization (or minimization) problem with equality constraints. To be precise, we often face the following type of optimization problem:

\max_{x \in \mathbb{R}^n} \ (\text{or} \ \min_{x \in \mathbb{R}^n}) \ f(x)   (1.100a)

subject to g(x) = 0,   (1.100b)

where 0 := (0, \ldots, 0)^T \in \mathbb{R}^m. The function g indicates that the optimization problem is subject to m equality constraints. Here we assume that n \geq m holds. Under some regularity conditions, the following theorem provides a necessary condition that the optimal solution must satisfy.
Theorem 5 (The Theorem of Lagrange) Assume that we have the maximization (or minimization) problem given by Eqs. (1.100a) and (1.100b). We also suppose that a local maximum or minimum, denoted by x^*, exists, and that the rank of the Jacobian

\frac{\partial g}{\partial x}(x^*) := \begin{pmatrix} \frac{\partial g_1(x^*)}{\partial x_1} & \cdots & \frac{\partial g_1(x^*)}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_m(x^*)}{\partial x_1} & \cdots & \frac{\partial g_m(x^*)}{\partial x_n} \end{pmatrix} \in \mathbb{R}^{m \times n}   (1.102)

is m. (That is, the Jacobian matrix is of full rank.) Then, there exists a vector \lambda^* := (\lambda_1^*, \ldots, \lambda_m^*) \in \mathbb{R}^m such that

\left. \frac{\partial f(x)}{\partial x} \right|_{x = x^*} + \sum_{i=1}^{m} \lambda_i^* \left. \frac{\partial g_i(x)}{\partial x} \right|_{x = x^*} = 0.   (1.103)
Equation (1.103) implies a first-order condition for the above problem. The
method that narrows down the candidates is referred to as the Lagrange Multiplier
Method.
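As a minimal illustration (the objective and constraint here are hypothetical, not drawn from this chapter), the first-order condition (1.103) can be assembled and solved symbolically with SymPy:

import sympy as sp

# Illustrative problem: minimize f = x1^2 + x2^2 subject to g = x1 + x2 - 1 = 0.
x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1**2 + x2**2
g = x1 + x2 - 1

# First-order condition (1.103): grad f + lam * grad g = 0, together with g = 0.
L = f + lam * g
eqs = [sp.diff(L, v) for v in (x1, x2)] + [g]
print(sp.solve(eqs, (x1, x2, lam), dict=True))  # x1 = x2 = 1/2, lam = -1

Solving the stationarity conditions of the Lagrangian together with the constraint is exactly the candidate-narrowing step described above.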
The reader can confirm that the condition is satisfied for the expected cost minimization problem of the AC model, since in the model we have

\left(\eta - \frac{\psi}{2}\right) \begin{pmatrix} 2q_1^* \\ \vdots \\ 2q_n^* \end{pmatrix} = \left(\eta - \frac{\psi}{2}\right) \begin{pmatrix} 2Q/n \\ \vdots \\ 2Q/n \end{pmatrix} \neq \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}.   (1.104)
We next review linear difference equations. A first-order linear difference equation in vector form is written as

x_{n+1} = A x_n,   (1.105)

where x_n \in \mathbb{R}^d for all n \in \mathbb{Z}_{++} and A \in \mathbb{R}^{d \times d}, with the initial condition x_0 = x \in \mathbb{R}^d. The following one-dimensional example illustrates the basic concept of the first-order linear difference equation.
x_{n+1} = a x_n,   (1.106)

where a \in \mathbb{R}_{++}, with x_0 = 1. Then the explicit solution for x_n for n \in \mathbb{Z}_{++} is

x_n = a^n x_0 = a^n.   (1.107)
A second-order linear difference equation takes the form

x_{n+2} = a x_{n+1} + b x_n.   (1.109)

The general idea for solving the second-order linear difference equation is captured by the following example: the Fibonacci sequence. Setting x_n := (F_{n+1}, F_n)^T, the Fibonacci recursion can be written in matrix form as follows:

x_{n+1} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} x_n =: A x_n.   (1.113)
The eigenvalues of A, denoted by \lambda_1 and \lambda_2 with \lambda_1 > \lambda_2, are given by

|A - \lambda I| = \lambda^2 - \lambda - 1 = 0 \iff \lambda_1 = \frac{1 + \sqrt{5}}{2}, \quad \lambda_2 = \frac{1 - \sqrt{5}}{2}.   (1.114)
The (typical) corresponding eigenvectors, v_1 and v_2, are given by

v_1 = \begin{pmatrix} \lambda_1 \\ 1 \end{pmatrix}, \quad v_2 = \begin{pmatrix} \lambda_2 \\ 1 \end{pmatrix}.   (1.115)
Combining the above equation with the fact that x_0 = (v_1 - v_2)/(\lambda_1 - \lambda_2), we obtain

F_n = \frac{\lambda_1^n}{\lambda_1 - \lambda_2} - \frac{\lambda_2^n}{\lambda_1 - \lambda_2} = \frac{1}{\sqrt{5}} \left( \left( \frac{1 + \sqrt{5}}{2} \right)^n - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right).   (1.117)
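The closed form is easy to verify numerically; the following NumPy check (illustrative) compares the matrix iteration x_n = A^n x_0 with Eq. (1.117):

import numpy as np

A = np.array([[1, 1], [1, 0]], dtype=float)
x0 = np.array([1.0, 0.0])                 # x_0 = (F_1, F_0)^T
lam1 = (1 + np.sqrt(5)) / 2
lam2 = (1 - np.sqrt(5)) / 2

for n in range(1, 11):
    fn_matrix = (np.linalg.matrix_power(A, n) @ x0)[1]  # F_n from x_n = A^n x_0
    fn_closed = (lam1**n - lam2**n) / np.sqrt(5)        # closed form (1.117)
    assert abs(fn_matrix - fn_closed) < 1e-9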
Here we show how one can derive Eq. (1.31) and why the equation is only an approximation of the solution. Consider the equation

\eta (Q_k - Q_{k-1}) - \eta (Q_{k+1} - Q_k) + \gamma \sigma^2 Q_k = 0.   (1.118)

The approximate solution obtained from this recursion is not the true solution of the above equation, although it is similar to the one obtained in a continuous-time model.
Consider a minimization problem of the form

\min_{\{x_t\}_{t \in [a,b]}} \int_a^b f(x_t, \dot{x}_t, t) \, dt,   (1.122)

subject to various kinds of constraints over a set of functions defined on [a, b] (which, in the following, is symbolically denoted by S). Here we review a necessary condition for a minimization problem of the form (1.122) with initial and terminal conditions (the so-called boundary conditions). The necessary condition takes the form of a differential equation that the minimizer satisfies.
f : (\xi, \theta, \tau) \mapsto f(\xi, \theta, \tau),   (1.123)

3. F : S \to \mathbb{R} is given by

F(x) := \int_a^b f(x_t, \dot{x}_t, t) \, dt   (1.124)

for all x \in S.
Then, we have:
1. If x^* \in S is a minimizer of F, the minimizer satisfies the Euler–Lagrange equation:

\frac{\partial f}{\partial \xi}(x_t^*, \dot{x}_t^*, t) - \frac{d}{dt}\left( \frac{\partial f}{\partial \theta}(x_t^*, \dot{x}_t^*, t) \right) = 0.   (1.125)
This section reviews how to solve an ordinary differential equation (ODE), in par-
ticular, a second-order linear ODE with constant coefficients.
This subsection is a quick review of linear ODEs, with two examples illuminating their essence.15
Consider the ODE

\dot{x}_t - 2t = 0.   (1.127)

Integrating gives x_t = t^2 + C for a constant C; with the initial condition x_0 = 2, the solution is

x_t = t^2 + 2.   (1.129)

A solution of an ODE that contains no undetermined constant is called a particular solution. A condition such as x_0 = 2 is called an initial condition.
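The example can be checked symbolically; a short SymPy verification (illustrative) is:

import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')

# Solve (1.127) with the initial condition x(0) = 2.
sol = sp.dsolve(sp.Eq(x(t).diff(t) - 2*t, 0), x(t), ics={x(0): 2})
print(sol)  # Eq(x(t), t**2 + 2), matching (1.129)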
Next, consider the ODE

\dot{f}_t = f_t,   (1.130)

and, more generally,

\dot{f}_t = \beta f_t,   (1.131)

whose general solution is

f_t = C e^{\beta t}.   (1.132)
An equation of the form

a_1 \ddot{x}_t + a_2 \dot{x}_t + a_3 x_t = y_t,   (1.133)

with a_1 \neq 0 is called a second-order linear ordinary differential equation (ODE) with constant coefficients.16 The ODE is said to be homogeneous if y_t \equiv 0 for all t \in T, and non-homogeneous otherwise.

16 Note that not all ODEs have an explicit solution.
We assume that y_t \equiv 0 in the rest of this section. The following explanation reveals some features of Eq. (1.133). Let x_t be an exponential function, i.e., x_t = e^{\beta t}. Then, substituting this into Eq. (1.133) yields

(a_1 \beta^2 + a_2 \beta + a_3) e^{\beta t} = 0.   (1.134)

Since e^{\beta t} \neq 0, this holds if and only if

a_1 \beta^2 + a_2 \beta + a_3 = 0   (1.135)

holds. This equation is called the characteristic equation of Eq. (1.133). Assume that the characteristic equation has two real solutions, say \beta_1 and \beta_2. Then, Eq. (1.133) has solutions of the form17

x_t^* := A_1 e^{\beta_1 t} + A_2 e^{\beta_2 t}.   (1.137)
17 A rigorous explanation from a linear algebraic point of view shows that the set .V :
Proposition 1 Assume that the characteristic equation has two real solutions, denoted by \beta_1 and \beta_2.18 Then, the general solution to the second-order linear ODE (1.133) is given by

x_t = A_1 e^{\beta_1 t} + A_2 e^{\beta_2 t}.   (1.140)
As an example, the continuous-time AC model leads to the ODE

\ddot{x}_t - \frac{\lambda \sigma^2}{\eta} x_t = 0.   (1.141)

From the fact that the solutions of the quadratic equation

\alpha^2 - \frac{\lambda \sigma^2}{\eta} = 0   (1.142)

are given by \alpha = \pm\kappa (where \kappa := \sqrt{\lambda \sigma^2 / \eta}), the general solution to Eq. (1.141) becomes

x_t = A_1 e^{\kappa t} + A_2 e^{-\kappa t}.   (1.143)
18 To be precise, a solution of the ODE (1.133) exists regardless of the nature of the solutions of the characteristic equation.
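The general solution (1.143) can also be checked numerically; the following SciPy sketch (with hypothetical parameter values and constants A_1, A_2) integrates Eq. (1.141) as a first-order system and compares the result with the closed form:

import numpy as np
from scipy.integrate import solve_ivp

lam, sigma, eta = 1e-6, 0.3, 2.5e-6   # hypothetical parameter values
kappa = np.sqrt(lam * sigma**2 / eta)
A1, A2 = 0.3, 0.7                     # constants fixed by boundary conditions

def closed(t):
    return A1 * np.exp(kappa * t) + A2 * np.exp(-kappa * t)

# Rewrite the second-order ODE (1.141) as a first-order system u = (x, x').
rhs = lambda t, u: [u[1], kappa**2 * u[0]]
sol = solve_ivp(rhs, (0.0, 1.0), [A1 + A2, kappa * (A1 - A2)],
                t_eval=np.linspace(0.0, 1.0, 11), rtol=1e-10, atol=1e-12)
print(np.max(np.abs(sol.y[0] - closed(sol.t))))  # close to machine precision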
State Process
3. A decision rule is the action that the decision maker takes at each time t \in \{1, 2, \ldots, T\}. The action at time t affects the state at the next time t + 1 \in \{2, \ldots, T + 1\}. The action of the decision maker at time t is denoted as q_t.19 The set of all actions the decision maker can take at time t is expressed as A_t.20
4. The dynamics of the state variables, the so-called law of motion, are determined once the decision maker takes an action q_t in a state s_t at time t \in \{1, \ldots, T\}. We often describe the relationship between s_t and s_{t+1} via a (Borel measurable) function h : S \times A_t \times \mathbb{R}^l \to S as follows:

s_{t+1} = h(s_t, q_t, \epsilon_t).   (1.147)
For each time t \in \{1, 2, \ldots, T\}, a payoff arises depending on the state at time t and the action that the decision maker takes at time t. Here we denote the payoff at time t as g_t(s_t, q_t). The decision maker aims to maximize the expected sum of payoffs for all t \in \{1, 2, \ldots, T\}, defined as follows:

E_1\left[ \sum_{t=1}^{T} g_t(s_t, q_t, \epsilon_t) + g_{T+1}(s_{T+1}) \,\middle|\, s_1 \right].   (1.148)
Similarly, let us define the expected sum of payoffs over times \{t, t+1, \ldots, T\} as V_t[s_t]:

V_t[s_t] := \max_{(q_t, \ldots, q_T) \in A_t \times \cdots \times A_T} E_t\left[ \sum_{n=t}^{T} g_n(s_n, q_n, \epsilon_n) + g_{T+1}(s_{T+1}) \,\middle|\, s_t \right].   (1.150)
Equation (1.150) corresponds to the tail problem starting from time t \in \{1, \ldots, T\}. Under the above setting, Bellman's principle of optimality is expressed as follows:

V_t[s_t] = \max_{q_t \in A_t} E_t\left[ g_t(s_t, q_t, \epsilon_t) + V_{t+1}\left[ h(s_t, q_t, \epsilon_t) \right] \,\middle|\, s_t \right]   (1.151)
         = \max_{q_t \in A_t} E_t\left[ g_t(s_t, q_t, \epsilon_t) + V_{t+1}\left[ s_{t+1} \right] \,\middle|\, s_t \right].   (1.152)
The principle that this recursive mechanism holds for all times t \in \{1, \ldots, T\} is called Bellman's principle of optimality.
Dynamic programming originated with [4, 5]. The application of Bellman's principle of optimality to financial problems is well explained in [3]. Readers can also find a proof of a deterministic version of Bellman's principle of optimality, with economic applications, in [41].
Also, their model uses the Markov decision process approach. This approach defines the action of the decision maker at time t as a map from the state at time t to the action at time t, f_t : S \to A_t (that is, q_t = f_t(s_t)). Then, the Bellman equation takes the form of Eq. (1.69).
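To make the backward recursion concrete, the following minimal sketch applies Bellman's principle to a toy deterministic liquidation problem; the quadratic cost, the integer grid, and all parameter values are illustrative assumptions, not the models discussed in this chapter:

import numpy as np

T, Q, eta = 4, 20, 0.1                 # horizon, shares to sell, impact cost
states = np.arange(Q + 1)              # s = remaining inventory

# Terminal condition: leftover inventory is forbidden (large penalty).
V = np.where(states == 0, 0.0, -1e9)
policy = np.zeros((T, Q + 1), dtype=int)

for t in reversed(range(T)):           # backward induction, cf. Eq. (1.151)
    V_new = np.full(Q + 1, -np.inf)
    for s in range(Q + 1):
        for q in range(s + 1):         # action: shares sold now
            val = -eta * q**2 + V[s - q]
            if val > V_new[s]:
                V_new[s], policy[t, s] = val, q
    V = V_new

# Following the optimal policy from s = Q spreads the trade evenly over time.
s = Q
for t in range(T):
    print(policy[t, s], end=' ')       # prints: 5 5 5 5
    s -= policy[t, s]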
References
1. Alfonsi, A., Klöck, F., Schied, A.: Multivariate transient price impact and matrix-valued positive
definite functions. Math. Oper. Res. 41, 914–934 (2016)
2. Almgren, R., Chriss, N.: Optimal execution of portfolio transactions. J. Risk 3, 5–39 (2000)
3. Bäuerle, N., Rieder, U.: Markov Decision Processes with Applications to Finance. Springer,
Berlin, Heidelberg (2011)
4. Bellman, R.: The theory of dynamic programming. Bull. Am. Math. Soc. 60, 503–515 (1954)
5. Bellman, R.: Dynamic Programming. Princeton University Press (1957)
6. Bertsekas, D.: Dynamic Programming and Optimal Control, Vol. I. Athena Scientific (2012)
7. Bertsimas, D., Lo, A.W.: Optimal control of execution costs. J. Financ. Mark. 1, 1–50 (1998)
8. Bouchaud, J.P., Gefen, Y., Potters, M., Wyart, M.: Fluctuations and response in financial mar-
kets: the subtle nature of ‘random’ price changes. Quant. Financ. 4, 176–190 (2004)
9. Capiński, M., Kopp, P.E.: Measure, Integral and Probability. Springer-Verlag (2004)
10. Cartea, Á., Jaimungal, S., Penalva, J.: Algorithmic and High-frequency Trading. Cambridge
University Press (2015)
11. Cartea, Á., Jaimungal, S.: Incorporating order-flow into optimal execution. Math. and Financ.
Econ. 10, 339–364 (2016)
12. Cartea, Á., Jaimungal, S.: A closed-form execution strategy to target volume weighted average
price. SIAM J. Financ. Math. 7, 760–785 (2016)
13. Cartea, Á., Gan, L., Jaimungal, S.: Trading co-integrated assets with price impact. Math. Financ.
29, 542–567 (2019)
14. Fukasawa, M., Ohnishi, M., Shimoshimizu, M.: Discrete-time optimal execution under a gen-
eralized price impact model with Markov exogenous orders. Int. J. Theor. Appl. Financ. 24,
2150025 (2021)
15. Fukasawa, M., Ohnishi, M., Shimoshimizu, M.: Optimal execution under a generalized price
impact model with Markovian exogenous orders in a continuous-time setting. RIMS Kokyuroku
2207, 1–22 (2022)
16. Fruth, A., Schöneborn, T., Urusov, M.: Optimal trade execution and price manipulation in order
books with time-varying liquidity. Math. Financ. 24, 651–695 (2014)
17. Gatheral, J.: No-dynamic-arbitrage and market impact. Quant. Financ. 10, 749–759 (2010)
18. Gârleanu, N., Pedersen, L.H.: Dynamic trading with predictable returns and transaction costs.
J. Financ. 68, 2309–2340 (2013)
19. Gârleanu, N., Pedersen, L.H.: Dynamic portfolio choice with frictions. J. Econ. Theory 165,
487–516 (2016)
20. Guéant, O.: Permanent market impact can be nonlinear. Available at arXiv:1305.0413 (2013)
21. Guéant, O., Royer, G.: VWAP execution and guaranteed VWAP. SIAM J. Financ. Math. 5,
445–471 (2014)
22. Guéant, O.: The Financial Mathematics of Market Liquidity: From Optimal Execution to Mar-
ket Making. CRC Press (2016)
23. Huberman, G., Stanzl, W.: Price manipulation and quasi-arbitrage. Econometrica 72, 1247–
1275 (2004)
24. Kissell, R.: Algorithmic Trading Methods: Applications Using Advanced Statistics, Optimiza-
tion, and Machine Learning Techniques. Academic Press (2021)
25. Kratz, P., Schöneborn, T.: Portfolio liquidation in dark pools in continuous time. Math. Financ.
25, 496–544 (2015)
26. Kuno, S., Ohnishi, M.: Optimal execution in illiquid market with the absence of price manip-
ulation. J. Math. Financ. 5, 1–14 (2015)
27. Kuno, S., Ohnishi, M., Shimizu, P.: Optimal off-exchange execution with closing price. J. Math.
Financ. 7, 54–64 (2017)
28. Laruelle, S., Lehalle, C.A.: Market Microstructure in Practice, 2nd edn. World Scientific (2018)
29. Luo, X., Schied, A.: Nash equilibrium for risk-averse investors in a market impact game with
transient price impact. Market Microstruct. Liquidity 5, 2050001 (2019)
30. Ma, G., Siu, C.C., Zhu, S.-P.: Dynamic portfolio choice with return predictability and transac-
tion costs. Eur. J. Oper. Res. 278, 976–988 (2019)
31. Markowitz, H.: Portfolio selection. J. Financ. 7(1), 77–91 (1952)
32. Obizhaeva, A.A., Wang, J.: Optimal trading strategy and supply/demand dynamics. J. Financ.
Mark. 16, 1–32 (2013)
33. O’Hara, M.: High frequency market microstructure. J. Financ. Econ. 116, 257–270 (2015)
34. Ohnishi, M., Shimoshimizu, M.: Equilibrium execution strategy with generalized price impacts.
RIMS Kokyuroku 2111, 84–106 (2019)
35. Ohnishi, M., Shimoshimizu, M.: Optimal and equilibrium execution strategies with generalized
price impact. Quant. Financ. 20, 1625–1644 (2020)
36. Ohnishi, M., Shimoshimizu, M.: Optimal pair-trade execution with generalized cross-impact.
Asia-Pac. Financ. Mark. 29, 253–289 (2022)
37. Ohnishi, M., Shimoshimizu, M.: Trade execution game in a Markovian environment. Available
at SSRN (2023)
38. Perold, A.F.: The implementation shortfall: paper versus reality. J. Portfolio Manage. 14, 4–9
(1988)
39. Potters, M., Bouchaud, J.P.: More statistical properties of order books and price impact. Physica
A 324, 133–140 (2003)
40. Rudin, W.: Principles of Mathematical Analysis. McGraw-Hill, New York (1976)
41. Sundaram, R. K.: A First Course in Optimization Theory. Cambridge University Press (1996)
42. Sasane, A.: Optimization in Function Spaces. Courier Dover Publications (2016)
43. Schied, A., Zhang, T.: A market impact game under transient price impact. Math. Oper. Res.
44, 102–121 (2019)
44. Schneider, M., Lillo, F.: Cross-impact and no-dynamic-arbitrage. Quant. Financ. 19, 137–154
(2019)
45. Schöneborn, T.: Trade Execution in Illiquid Markets: Optimal Stochastic Control and Multi-
agent Equilibria. Doctoral dissertation, Technische Universität Berlin (2008)
46. Tsoukalas, G., Wang, J., Giesecke, K.: Dynamic portfolio execution. Manage. Sci. 65, 2015–
2040 (2019)
47. Velu, R., Hardy, M., Nehren, D.: Algorithmic Trading and Quantitative Strategies. Chapman
and Hall/CRC (2020)
Part II
Tools and Techniques
Chapter 2
Python Stack for Design
and Visualization in Financial
Engineering
2.1 Introduction
at a pre-defined strike price on a future date and make a profit if the price of the
Facebook stock falls below the strike price. Financial engineering involves taking
such options and other existing securities like stocks, indices, government bonds,
currencies (or even bitcoins) to create tailor-made insurance like products for a wide
variety of investors and corporations.
It turns out that designing and pricing of such products requires as much facility
with applied mathematics, statistics and programming as with finance. Reflecting
the industry’s need and students’ demand for such skills, an increasing number of
schools of engineering have started offering Master’s and certificate programs in
financial engineering or mathematical/computational finance, with many of them
being jointly offered with the departments of applied mathematics and economics.
While high-paying job opportunities in large banks and hedge funds definitely explain part of the attraction, the fact that the field draws its toolkit from disciplines as varied as the theory of stochastic processes and partial differential equations to the practicalities of Monte Carlo simulation and finite difference methods makes it particularly exciting for students aiming for a career requiring expertise in applied mathematics and computational methods [1]. Also, given that implementing practical financial engineering applications routinely involves programming, many engineering and computer science students find the coding part of the job equally fascinating.
It helps that programming jobs in Goldman Sachs are often far more lucrative than
those in Facebook. The fact that the most famous formula in the field [2] got two
of its inventors (Fischer Black, Myron Scholes and Robert Merton) a Nobel Prize in
Economic Sciences attracts even academically minded students to the field.
Even though there exist many open-source libraries in many languages today to
help solve practical financial engineering problems, ranging from those built in C++
to downloadable Excel add-ins, over time use of Python has become mainstream
at hedge funds and quantitative trading firms. Solving mathematically challenging
pricing problems and writing efficient computer programs to implement them is
just one part of the puzzle, however. Most of the action (and money) in financial
engineering lies in structuring—designing products suited to the exact needs of the
clients and investors, and here the powers of visualization and interactivity are more
important than speed and efficiency of computations.
In this note we highlight the power of the Python stack for designing graphical
user interfaces (GUIs) for engineering structured product solutions by visualizing
their payoffs and prices in a web browser.
The plan of the paper is as follows. After reviewing the literature on design of
such applications for practical and pedagogical use in Sect. 2.2, in Sect. 2.3, we
briefly describe the nature of structured products to understand the importance of
interactivity in visualization for the task at hand. In Sect. 2.4, we describe the Python
modules and the associated classes for designing interactive Python based GUIs.
Section 2.5 describes our main Python application in detail and Sect. 2.6 concludes.
When it comes to designing interactive applications for building prototypes and for pedagogical needs, Microsoft Excel remains one of the most popular tools [3]. The fact that there are many websites, blogs and journal articles published on the use of Excel understandably makes it a highly attractive and convenient choice for beginners as well as for classroom use in a variety of contexts [4–7].
Some of the biggest concerns with using spreadsheets, however, are the compulsion to keep the data, inputs and outputs within the same software, and the lack of a professional interactive graphics library important for designing and visualizing financial engineering applications [8]. Excel has also often been blamed for making it easy to make ‘silly mistakes’, often leading to embarrassing reputational and financial consequences for large organizations [9, 10].
Browser-based applications built using high level languages like R and Python
provide an attractive alternative. In this article we introduce one such browser-based
alternative to Excel called Jupyter based on Python. The use of Jupyter to create a
front end on the web browser is a game changer. Today all modern browsers are equally capable of rendering text, tables and figures, making such a Python-based application well-suited for prototypes as well as professional applications.
To our knowledge, while there is a large literature on using Python in the natural sciences or in engineering applications [11] and on designing domain-specific Python libraries [12, 13], the literature on designing Python-based financial engineering applications is still at a nascent stage, with only a few studies on using Python for blockchain and crypto [14] and as a front-end when business needs require using sophisticated computational finance libraries built in C++ for efficient implementations [15].
Python for scientific computing and visualization has the advantage of being open-source and free [16], making it ideal for universities and schools with limited budgets that teach courses in option pricing and financial engineering, as well as for budding trading and boutique consulting firms.
The category of structured products sits somewhere between regular call and put options and the over-the-counter derivatives available only to select large financial institutions and hedge funds. The domain of financial engineering involves designing such structured products. In that sense, the field has all the flavor of the boutique tailoring shops of Savile Row, London: it involves designing customized solutions using otherwise familiar ingredients. The need for such products arises from two kinds of clients (typically high net worth individuals and corporations): those looking to express out-of-consensus and asymmetric views, and those looking to hedge their financial risks to very specific kinds of exposures.
• Maturity: Although not often critical, the duration of the product is also up for tweaking. It is more a matter of the investor's (or sometimes the seller's) preference for how long the lock-in period should be. For example, designers often introduce autocallable features to reduce the expected life of the product. Other than for pricing, maturity is relatively less important.
• Barrier level: Upside (or downside) from a structured product is often contingent
on a barrier being hit prior to (or at) maturity of the product. A barrier represents
a pre-specified level for the underlying variable, say, foreign exchange rate. For
example, the payoff from a standard ‘Up-and-In’ call option on EUR/USD gives
the payoff from a call option at maturity if the value of EUR/USD rate has hit
a barrier above its beginning value at some time before maturity. In comparison,
an ‘Up-and-out’ option gives a payoff only if the barrier has not been hit. (‘Up’
and ‘Down’ are defined with respect to the current value of the variable defining
the barrier. If the barrier level is above/below the current value, it is designated as
‘Up/Down’.)
• Caps and Floors: Including caps and floors allows further tweaking of the payoff by introducing a maximum possible performance (called a cap) and/or a minimum guaranteed performance (called a floor) from the embedded option in the product. So, if a buyer is not comfortable with the price of a product, issuers may either tweak the barrier level or introduce a cap. This limits the upside performance to the level of the cap, but also leads to a reduction in price. A floor serves a similar purpose.
In the early years of the IPython project (co-founded by Fernando Perez and Brian Granger in 2001), the IPython environment integrated a terminal, a Python kernel (console and qtconsole), distributed computing, support for other languages and the browser-based notebook [17]. In the ten years since, the notebook part of the project has been split into a more general-purpose browser-based environment called Jupyter (post IPython 3.x), with the ability to integrate other popular languages used in the field of data science like Julia and R, as well as many more through unofficial community-maintained kernels (listed at https://github.com/jupyter/jupyter/wiki/Jupyter-kernels).
The modern Jupyter environment is further enhanced by a set of magic commands designed to simplify and speed up commonly used operations like debugging, timing code, copying and pasting commands from external sources, as well as plotting. With a browser-based interface to the IPython shell and integration of scripting, formatted text and dynamic display capabilities, Jupyter provides an ideal setting for building a GUI for financial engineering applications. Interactive controls are provided by the widgets of the ipywidgets module (see its documentation for the full list). Third-party APIs like Seaborn and Plotly allow additional high-level
interactivity. While the context decides the usability of one or another kind of widget in a GUI, for the task at hand we find sliders and check buttons to be the most useful. In the next section we describe building a GUI for designing a product called a “3-way collar”.
A 3-way collar has all the features one finds in a typical structured product designed
for hedging risks while at the same time not being overly complicated. For producers
of commodities like oil and copper, the biggest risk is the fluctuation in the price of
their output. An oilfield worries about the price of crude oil, a copper mine worries
about the copper price, and a farmer worries about the price of wheat or corn.
There are a variety of ways to hedge the exposure to the sale price, but it is useful to
begin by graphically visualizing producer’s revenue assuming she does not hedge the
exposure at all and then slowly adding elements towards the design of a 3-way collar.
The first plot in Fig. 2.1 shows that the revenue varies one-for-one with fluctuations in the price. The danger is clear: at low prices, the producer might not even cover her costs. At high prices, of course, she gains accordingly.
The next plot (titled “Forward”) shows how the risk is completely eliminated by using a forward contract to lock in the price of 100. Regardless of the output price, the revenues are fixed at the forward price. This makes sense from a pure risk management perspective, but can be very unattractive if the producer has a view that prices are more likely to rise than to fall. In this case, all the upside has been given up to eliminate the downside risk.
The plot titled “Put” shows the power of option contracts. By buying a put option
with a strike price of 100, the producer has the option to sell at 100 while retaining
the ability to sell at the market price if that is higher. The producer keeps most
of the upside from the expected rise in output prices while largely eliminating the
downside risk from falling prices. The problem is that the put option costs money,
with a premium of about 8% in this case (given the assumed Black–Scholes option
pricing model). In the plot, the option premium is the gap between the “Revenue”
and “Net Revenue” lines. The Net Revenue is less than that obtained by the forward
contract unless the output price rises to 108.
The plot titled “Collar” shows a slightly more complex strategy that makes sense
if the producer believes that the price is unlikely to rise above 115. Based on this
view, the producer sells a call option with a strike of 115 in addition to buying a
put option with a strike of 100. The sold call option obliges the producer to sell the
output at 115 even if the market price is higher, while the bought put entitles her to
sell at 100 even if the market price is lower. The sold call option earns a part of the
premium expended on the bought put option, and the net total premium to be paid
is only around 5% (the gap between the “Revenue” and “Net Revenue” lines in the
plot is smaller than in the previous case).
Finally, the last plot titled “3-way Collar” is a strategy which might be adopted by
a producer who has a view similar to that in the “Collar” case, and also believes that a
significant drop in the output price is unlikely. Specifically, she believes that there is a
very low chance that the price will drop below 90. In this case, she might supplement
the “Collar” with the sale of a put option at 90. The 3-way collar therefore consists
of a put bought at 100 (K1), a call sold at 115 (K2), and a put sold at 90 (K3). This strategy costs much less (about 1.3%) to set up.
Given the plot for the 3-way collar in Fig. 2.1, it is clear that a 3-way collar has both a floor and a cap built in. Mathematically, ignoring the option premiums, the revenue may be written as:

Revenue = K1 + (S − K1) I[S > K1] − (S − K2) I[S > K2] + (S − K3) I[S < K3],

where I[S > K1] denotes an indicator function which takes the value 1 when S > K1 and 0 otherwise. From a design point of view, ignoring time to maturity, the 3-way collar has three strikes (K1, K2 and K3) to tweak, each of which can be moved up or
down to match the views of the producer, the available budget for option premiums
and the willingness to take risks.
The interactive plot whose screenshot is shown in Fig. 2.2 builds on top of the Black_Scholes Python module for pricing described later. Having developed the main module, however, the design of the 3-way collar itself requires only about half a dozen lines of Python code, with the combination of call and put options defining the product (three_way) built up piece by piece in four lines towards the end of the code in the Appendix.
• The line combo.exposure() denotes the producer’s exposure to the commodity
price as an output price. Had we been considering a consumer of the commodity
worried about the commodity price as an input price, we would have a minus sign
in front −combo.exposure().
• combo.put(K[0]) denotes the put option that was bought with a strike of K[0] =
100.
• −combo.put(K[2]) denotes the sold put option with a strike of K[2] = 90.
• −combo.call(K[1]) is the sold call option with a strike of K[1] = 115.
• .set_name(‘3-way Collar’) gives the combo an easy to recognize name. Other-
wise, the software would give it a name reflecting its component pieces
(Exposure+Put@100−Call@115−Put@90).
The interactive plot is generated by calling the appropriate method of the three_way object (a consolidated sketch follows this list):
• three_way.interactive_plot([combo.payoff, combo.profit])
• The two lines that are plotted are what option traders call the payoff and the profit. For ease of understanding, they are relabelled as Revenue and Net Revenue: combo.name_mapping = dict(payoff = ‘Revenue’, profit = ‘Net Revenue’).
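Putting these pieces together, a minimal sketch of the construction (mirroring the method names described above; the import path is illustrative, and the authoritative listing is in the Appendix) looks as follows:

from Black_Scholes import combo  # the authors' module, described below

K = [100, 115, 90]  # strikes: put bought, call sold, put sold

three_way = (combo.exposure()    # producer's exposure to the output price
             + combo.put(K[0])   # put bought at 100: the floor
             - combo.call(K[1])  # call sold at 115: the cap
             - combo.put(K[2]))  # put sold at 90: cheapens the structure
three_way.set_name('3-way Collar')

combo.name_mapping = dict(payoff='Revenue', profit='Net Revenue')
three_way.interactive_plot([combo.payoff, combo.profit])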
Had we been analyzing the 3-way collar from the perspective of an options dealer, we might have chosen to plot the option value or one of its partial derivatives (Delta, Gamma or Vega) instead of the payoff and the profit. We now turn to the details of the Black_Scholes Python module that does all the heavy lifting.
As would be clear from the few lines of code in the listing in the Appendix, the Black_Scholes Python module is entirely object oriented and is built on a series of classes derived from the basic GBS class that implements the famous Black–Scholes formulas [2]. The main derived classes include (the full source code is available on our GitHub page at https://github.com/Computational-Finance/Black_Scholes):
• GBSx
• option_portfolio
• combos
The Black–Scholes theory was important enough to earn a Nobel prize in economics, but its main practical utility is that it provides analytical formulas for all quantities of interest: the option price as well as all its partial derivatives, which are often referred to as Greeks. An acronym for ‘Generalized Black Scholes’, the GBS class is an implementation of the Black–Scholes formulas for option values, implied volatility and the important Greeks (the term Generalized indicating that underlyings with dividend-like features, such as dividend-paying stocks, currencies, futures and commodities, are also supported).
The constructor of the GBS class takes as parameters all the inputs to the Black
Scholes formulas: price of the underlying, strike price, volatility, time to maturity,
interest rate and the dividend yield. All of these can be NumPy arrays and so an array
of options can be analysed simultaneously. In this case, the class methods return
NumPy arrays. All the functions needed for the Black–Scholes formulas, including the cumulative normal distribution function, are readily available in NumPy and SciPy, and so most of this is a faithful transcription of the formulas into Python [19].
The only tricky part is that some calculations can lead to expressions of the form ‘0/0’, which would normally lead NumPy to return a nan (not a number). Wherever possible, a careful analysis of the limiting behaviour is used to replace this with zero or infinity.
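As an illustration of this approach (a simplified sketch, not the authors' exact implementation, which is available on their GitHub page), a vectorized call-value function with the 0/0 guard might look like:

import numpy as np
from scipy.stats import norm

def gbs_call_value(S, K, sigma, T, r, q):
    # Vectorized Black-Scholes call value with continuous dividend yield q.
    S, K, sigma, T = map(np.asarray, (S, K, sigma, T))
    vol = sigma * np.sqrt(T)
    # Suppress 0/0 warnings; the vol = 0 corner is patched below.
    with np.errstate(divide='ignore', invalid='ignore'):
        d1 = (np.log(S / K) + (r - q + 0.5 * sigma**2) * T) / vol
        d2 = d1 - vol
        value = (S * np.exp(-q * T) * norm.cdf(d1)
                 - K * np.exp(-r * T) * norm.cdf(d2))
    # Limiting behaviour as vol -> 0: the discounted intrinsic value.
    intrinsic = np.maximum(S * np.exp(-q * T) - K * np.exp(-r * T), 0.0)
    return np.where(vol > 0, value, intrinsic)

print(gbs_call_value(100.0, np.array([90.0, 100.0, 110.0]), 0.2, 1.0, 0.05, 0.0))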
The GBSx class is a convenience class derived from the GBS class that includes
forward contracts, the underlying and zero-coupon bonds as ‘first class’ objects
without resorting to artificial constructions. It is motivated by the fact that call and
put options can be used to replicate other common instruments. For example, a call
option with a strike of zero is the same as the underlying asset. A long position in a
call option combined with a short position in a put option with the same strike is a
forward contract at that price.
The option_portfolio class represents option portfolios containing long and short
positions in several different options. This is a relatively easy extension because
GBS already allows for an array of options. The only new thing in option_portfolio
is an additional array weight which can be positive (purchased options) or negative
(sold options). The methods of this class return the dot product of the weight with
the array returned by the corresponding method of the GBS class. For example, the
value method of this class calls the value method of the underlying GBS class to get
the values of all the options in the portfolio. It then computes the dot product of this
value array with the weight array to return the value of the entire portfolio as a scalar
quantity. Similarly, the Delta method returns the portfolio delta.
Finally, the combo class is a wrapper around the option_portfolio class focusing on simplicity and ease of use (‘combo’ is a common practitioner term for a combination or portfolio of options). Instead of using NumPy arrays, this class uses operator
overloading to allow option combos to be built piece by piece as in the earlier example
of a three way collar. Further simplification is possible because combos typically have
the same underlying and maturity, and these common parameters can be represented
by static variables of the combo class. From the user point of view, combos can often
be constructed by specifying only one parameter—the strike. This is facilitated by
a number of static methods of the combo class that construct and return a combo
without the user having to worry about all the arrays and Pandas DataFrames that
the constructor uses internally.
The addition and subtraction operators and the unary minus operator are all overloaded to make it easy to combine multiple options by simple addition or subtraction. The multiplication operator is also overloaded in the special case where the left operand is a number: 5 * call(110) means 5 call options with a strike of 110. All this is largely syntactic sugar: for example, all that 5 * call(110) does is to set the appropriate element of the weight array of the underlying option_portfolio to 5. Most of the addition and subtraction is implemented by merging the two Pandas DataFrames to avoid reinventing the wheel.
The other important methods in the combos class plot the values or Greeks (partial
derivatives) of the combo for various values of the underlying price. There is also
“memory” feature: if the coupon was not paid in some years (because the index level
was below the barrier), but in a later year, the index level was above the barrier, then
all the missed coupons would be paid along with the coupon for that year. Second
was the autocallable feature: if at any coupon date, the index level is above the initial
value, the bond would be called back prematurely by paying the coupon for the year
and the full principal.
When the issuing bank is designing such an instrument (often in consultation with one or more potential investors), it has a number of elements to choose from: the level of the coupon, the level of the barrier, and whether or not to include the two wrinkles (autocall and memory) and with what specifications. With the outcome uncertain for both parties, neither may be in a position, or willing, to commit without visualizing the entire probability distribution of outcomes. For example, when the investor pays 100 for this bond, the bank might receive around 97 (after marketing and distribution costs) and must ensure that the discounted present value of its expected payments amounts to only 97. Otherwise, it makes a loss. At the same time, the investor would have a view of how the stock market is likely to behave, and would analyze the expected profit from the product based on that view.
This requires both a sophisticated pricing library like QuantLib and a visualization platform. An object-oriented Python-based module again perfectly fits the bill. QuantLib-Python could be used to analyze different pricing scenarios using Monte Carlo simulation [15], and the power of NumPy, Jupyter and Matplotlib could be leveraged to visualize the results.
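As a rough illustration of the visualization step (with simple GBM dynamics standing in for a calibrated QuantLib model, and hypothetical coupon, barrier and maturity values), the payoff distribution of such a note can be simulated and plotted as follows:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_paths, years, coupon, barrier = 20_000, 5, 0.08, 0.7
mu, sigma = 0.05, 0.2

# Annual observations of a GBM index normalized to an initial level of 1.
z = rng.standard_normal((n_paths, years))
paths = np.exp(np.cumsum((mu - 0.5 * sigma**2) + sigma * z, axis=1))

payoff = np.zeros(n_paths)
for i in range(n_paths):
    missed = 0.0
    for t in range(years):
        if paths[i, t] >= barrier:
            payoff[i] += coupon + missed  # memory: missed coupons paid too
            missed = 0.0
        else:
            missed += coupon              # coupon missed but remembered
        if paths[i, t] >= 1.0:            # autocall: redeem at par early
            payoff[i] += 1.0
            break
    else:
        # At maturity: principal repaid in full only above the barrier.
        payoff[i] += 1.0 if paths[i, -1] >= barrier else paths[i, -1]

plt.hist(payoff, bins=60)
plt.xlabel('Total undiscounted payoff per unit notional')
plt.show()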
As an illustration, the two plots in Figs. 2.3 and 2.4 show the visualization of the
actual instrument described above alongside that of an alternative design which yields
roughly the same value but a different probability distribution and risk profile. In the
alternative design (Fig. 2.4), the memory feature has been switched OFF (making it
worse for the investor), but this has been offset by reducing the barrier level to 50%
thereby reducing the risk of principal loss.
2.6 Conclusion
The market for financial structured products runs into hundreds of billions of dollars worldwide, with the variety ranging from payoffs of plain vanilla call and put options to complex autocallable products with barrier and memory features. Given the number of design elements that can be tweaked in creating such products, there is often no way to determine their behaviour other than visualizing how sensitive their payoff and price would be to changes in the contractual features, market variables and statistical proxies for risk and dependence.
In this note we have highlighted the power of the Python stack for designing
graphical user interfaces for engineering structured product solutions by visualizing
their payoffs and prices in a web browser. Object-oriented programming in Python
combined with the power of NumPy, Matplotlib and Jupyter fits the bill perfectly
for design and visualization in financial engineering. Given the compatibility of
Fig. 2.3 Distribution of payoff from a Phoenix Memory Autocallable Note with the memory feature
switched ON
Fig. 2.4 Distribution of payoff from a Phoenix Memory Autocallable Note with the memory feature
switched OFF
References
1. Higham, D.J.: Black-Scholes for scientific computing students. Comput. Sci. Eng. 6(6), 72–79
(2004)
2. Black, F., Scholes, M.S.: The pricing of options and corporate liabilities. J. Polit. Econ. 81(3),
637–654 (1973)
3. Barreto, H.: Why excel? J. Econ. Educ. 46(3), 300–309 (2015)
4. Barreto, H., Widdows, K.: Introductory economics labs. J. Econ. Educ. 43(1), 109 (2012)
5. Briand, G., Hill, R.C.: Teaching basic econometric concepts using Monte Carlo simulations in
excel. Int. Rev. Econ. Educ. (2013)
6. Engelhardt, L.M.: Simulating price-taking. J. Econ. Educ. 46(4), 107–113 (2015)
7. Zhang, C.: Incorporating powerful excel tools into finance teaching. J. Finan. Educ. 40(3 & 4),
87–113 (2015)
8. Varma, J.R., Virmani, V.: Web applications for teaching portfolio analysis and option pricing.
Adv. Finan. Educ. (2021)
9. Powell, S.G., Baker, K.R., Lawson, B.: Impact of errors in operational spreadsheets. Decis.
Support Syst. 47, 126–132 (2009)
10. JPMorgan Chase & Co.: Report of JPMorgan Chase & Co. Management Task Force Regarding 2012 CIO Losses (2013)
11. Mandanici, A., Alessandro Sarà, S., Fiumara, G., Mandaglio, G.: Studying physics, getting to
know Python: RC circuit, simple experiments, coding, and data analysis with Raspberry Pi.
Comput. Sci. Eng. 23(1), 93–96 (2021)
12. Bauer, M., Lee, W., Papadakis, W., Zalewski, M., Garland, M.: Supercomputing in python with
legate. Comput. Sci. Eng. 23(4), 73–79 (2021)
13. Mandanici, A., Mandaglio, G., Pirrotta, G., Nibali, V.C., Fiumara, G.: Simple physics with
python: a workbook on introductory physics with open-source software. Comput. Sci. Eng.
24(2), 1–5 (2022)
14. Zhang, L., Wu, T., Lahrichi, S., Salas-Flores, C.-G., Li, J.: A data science pipeline for algo-
rithmic trading: a comparative study of applications for finance and cryptoeconomics. In: 2022
IEEE International Conference on Blockchain (Blockchain), Espoo, Finland, pp. 298–303
(2022)
15. Varma, J.R., Virmani, V.: Computational finance using QuantLib-Python. Comput. Sci. Eng.
18, 78–88 (2016)
16. Oliphant, T.E.: Python for scientific computing. Comput. Sci. Eng. 9(3), 10–20 (2007)
17. Perez, F., Granger, B.E.: IPython: a system for interactive scientific computing. Comput. Sci.
Eng. 9(3), 21–29 (2007)
18. Hunter, J.D.: Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9(3), 90–95 (2007)
19. van der Walt, S., Colbert, S.C., Varoquaux, G.: The NumPy array: a structure for efficient
numerical computation. Comput. Sci. Eng. 13(2), 22–30 (2011)
Chapter 3
Neurodynamic Approaches
to Cardinality-Constrained Portfolio
Optimization
Abstract The field of portfolio optimization holds significant interest for both
academic researchers and financial practitioners. Markowitz’s seminal mean–vari-
ance analysis laid the groundwork for optimizing portfolios by balancing returns and
risks, marking a pivotal advancement in investment strategy formulation. However,
despite its foundational role, mean–variance theory is not without its limitations,
notably its reliance on assumptions that do not always hold in real-world scenarios
and its use of variance as a risk measure, which may not fully capture the complexities
of risk behaviors. The pursuit of alternative risk measures introduces mathematical
and computational challenges due to nonconvexity and discontinuities. Concurrently,
the field of neural networks has seen vigorous activity, particularly with the advance-
ments in deep learning, offering novel approaches to a variety of optimization prob-
lems. Within this stream, neurodynamic optimization emerges as a method that lever-
ages the parallel and distributed computing capabilities of neural networks, proving to
be effective for tackling global optimization, multi-period, and multi-objective prob-
lems, and is now expanding into bi-level and combinatorial optimization domains.
Given these developments, applying neurodynamic optimization to portfolio opti-
mization is a promising avenue, especially considering the unique challenges posed
by the financial domain in terms of complexity and scale. This chapter delves into
the application of neurodynamic optimization to portfolio optimization, specifically
focusing on cardinality-constrained problems. Through experimental analysis across
several global stock market datasets, neurodynamic systems have demonstrated their
efficacy in achieving superior performance based on key metrics.
3.1 Introduction
for solving various optimization problems. For example, neurodynamic models have
been devised to address nonlinear optimization problems with nonlinear inequality
constraints [23], in addition to tackling constrained convex problems through the use
of projection operators [24]. By leveraging differential inclusion theory, a single-
layer neurodynamic approach has been proposed to tackle non-smooth optimization
problems, demonstrating convergence to solutions within a finite time [25]. To over-
come the challenge of multiple local optima, collaborative neurodynamic approaches
have been introduced [26, 27]. These approaches utilize multiple neural networks
to conduct precise local searches and incorporate metaheuristics like particle swarm
optimization for information exchange [28]. In the case of optimization problems
with multiple objectives, a scalarization technique is often employed to convert the
problem into a set of subproblems. Multiple neural networks are then employed to
solve each subproblem, generating a Pareto front of solutions [29, 30]. Neurodynamic
approaches have found applications in various fields, including engineering, due to
their superior performance [31–33], and have been proven to globally converge with
guaranteed optimality. Additionally, neurodynamic strategies that utilize recurrent
neural networks (RNNs) are particularly effective for real-time optimal asset allo-
cation when executed on specialized hardware, including various GPUs and CPUs.
The parallelism inherent in neurodynamic approaches is a significant advantage. By
leveraging parallel computing environments, neurodynamic models can efficiently
explore large solution spaces and accelerate the optimization process, leading to
faster and more effective solutions.
In particular, an approach involving collaborative neurodynamic optimization
(CNO) has been introduced for selecting portfolios [34]. This approach, grounded
in a minimax and bi-objective framework, leverages neural networks to map out the
Pareto front effectively. Following this, the approach tackles a decentralized robust
portfolio optimization challenge within the mean–variance (MV) framework using
a neurodynamic model [35]. Furthermore, this technique is applied to address port-
folio selection problems with specific constraints, reconceptualizing them as mixed-
integer optimization issues [36]. Through the utilization of RNNs, this strategy seeks
Pareto-optimal solutions by fine-tuning a weighted objective function, alongside
employing a meta-heuristic algorithm to adjust the weights, proving its efficacy in
yielding favorable Pareto-optimal outcomes. In another instance [37], the method is
applied to portfolio selection focusing on precise performance objectives, success-
fully addressing and solving five distinct optimization challenges, thereby show-
casing exceptional performance backed by thorough experimentation. This chapter
presents a two-timescale duplex neurodynamic approach to asset allocation. Here, the
challenge of MV asset allocation is redefined as a biconvex optimization problem,
incorporating the aspect of conditional value at risk. It utilizes two RNNs func-
tioning on separate timescales to identify the optimal solutions. Simultaneously, a
meta-heuristic approach is adopted to update the neural states, thereby avoiding the
pitfalls of potential local minima.
This chapter is organized into five sections. Section 3.2 outlines the reformulation
of the portfolio optimization problem, considering both scenarios with and without
cardinality limitations. Section 3.3 highlights some existing neurodynamic models.
Section 3.4 presents two neurodynamic models for portfolio optimization problem
with cardinality constraints. Section 3.5 presents the experimental results. Finally,
Sect. 3.6 concludes the chapter.
3.2 Preliminaries
Consider a biconvex optimization problem of the form min_{x ∈ X, y ∈ Y} f(x, y), where f(x, y) is biconvex with respect to both x and y over the domain X × Y. The MV framework suggests that investors should evaluate the risk and expected return of every asset, then allocate their funds across these assets to find the ideal equilibrium between risk and return. This allocation of funds is represented by the portfolio proportions y ∈ Y = [0, 1]^n, with n representing the total number of assets. For simplicity, no short-selling will be allowed. The expected portfolio return and its variance are denoted by μ^T y and y^T V y, respectively, where μ ∈ R^n is the vector of mean returns and V is the covariance matrix. Markowitz's MV portfolio selection model can be articulated through two distinct optimization problems:
min_y  y^T V y
s.t.   μ^T y ≥ μ_min,
       e^T y = 1,
       y ≥ 0,   (3.2)
or
max_y  μ^T y
s.t.   y^T V y ≤ σ_max,
       e^T y = 1,
       y ≥ 0,   (3.3)
where μmin is the minimum required return in (3.2), σmax is the cap on portfolio
variance in (3.3), the vector e consists entirely of ones, and eT y = 1 acts as the budget
restriction. However, these formulations are prone to inaccuracies due to estimation
errors. To circumvent this, a more resilient strategy within the MV framework, like
minimax portfolio selection, is suggested [39, 40]. The strategy outlined in [34, 41]
aims to optimize the portfolio against the lowest expected returns, specified as:
s.t.   e^T y = 1,
       y ≥ 0,   (3.4)
Variance is not always a suitable measure of market volatility, and value-at-risk (VaR) is an alternative. Let ξ ∈ R^n denote the random returns. VaR is defined as the lowest possible ρ ∈ R such that the probability of −ξ^T y ≤ ρ is greater than or equal to a given threshold 0 < θ < 1 [42]. That is,

VaR_θ(y) = min{ ρ ∈ R : P(−ξ^T y ≤ ρ) ≥ θ }.   (3.5)
CVaR_θ(y) ≈ ρ + (1/(N(1 − θ))) Σ_{j=1}^{N} max(0, −ξ_j^T y − ρ).   (3.7)
min_y  −μ^T y
min_{σ,ρ}  ρ + (1/(N(1 − θ))) Σ_{j=1}^{N} σ_j
s.t.   σ_j ≥ −ξ_j^T y − ρ,  σ_j ≥ 0,  j = 1, 2, . . . , N,
       e^T y = 1,
       y ≥ 0,   (3.8)

where σ_j = max(0, −ξ_j^T y − ρ) for all j.
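For intuition, the sample approximation (3.7) is straightforward to compute; in the sketch below (an illustration, not the chapter's algorithm), ρ is set to the empirical θ-quantile of losses, which is its minimizing value in the Rockafellar–Uryasev representation:

import numpy as np

def cvar(y, xi, theta=0.95):
    # Sample CVaR of portfolio losses -xi @ y at level theta, per (3.7).
    losses = -xi @ y
    rho = np.quantile(losses, theta)  # empirical VaR: the minimizing rho
    return rho + np.mean(np.maximum(losses - rho, 0.0)) / (1.0 - theta)

rng = np.random.default_rng(1)
xi = rng.normal(0.001, 0.02, size=(10_000, 5))  # hypothetical return scenarios
y = np.full(5, 0.2)                             # equally weighted portfolio
print(cvar(y, xi))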
Introduced by Nobel laureate Sharpe [45], the Sharpe Ratio (SR) serves as a widely
recognized metric for assessing the risk-adjusted performance of investment portfo-
lios. This metric calculates the average return that exceeds the risk-free rate for each
unit of volatility, using standard deviation as a measure of risk [46]. Additionally,
the SR has been employed as a target metric for optimizing portfolio allocations [47,
48] as follows:
max_y  (μ^T y − r_f) / √(y^T V y)
s.t.   e^T y = 1,
       y ≥ 0,   (3.9)

Similarly, the CVaR-based Sharpe ratio (CSR) can be optimized:

max_y  (μ^T y − r_f) / CVaR_θ(y)
s.t.   e^T y = 1,
       y ≥ 0.   (3.10)
Within the MV framework, portfolios are typically chosen from an unrestricted pool
of assets in an ideal market setting. However, real-world market imperfections often
limit investors to selecting only a subset of available assets, necessitating the inclu-
sion of cardinality constraints into the portfolio optimization equation. This, in turn,
increases the complexity of the problem significantly [50]. Thus, the optimization
scenario such as (3.2) incorporates these constraints as follows:
min_y  y^T V y
s.t.   μ^T y ≥ μ_min,
       e^T y = 1,
       ||y||_0 ≤ k,
       y ≥ 0,   (3.11)
min_{y,z,σ,ρ}  −μ^T y
min_{y,z,σ,ρ}  ρ + (1/(N(1 − θ))) Σ_{j=1}^{N} σ_j
s.t.   σ_j ≥ −ξ_j^T y − ρ,  σ_j ≥ 0,  j = 1, 2, . . . , N,
       e^T y = 1,
       e^T z ≤ k,
       0 ≤ y ≤ z,
       z ∈ {0, 1}^n,   (3.12)
where z ∈ {0, 1}^n is a binary vector and e^T z ≤ k is the cardinality constraint, which limits the total number of selected assets to the integer k. This formulation results in a bi-objective mixed-integer programming problem due to the binary nature of z, known for its computational intractability or NP-hardness [52].
To mitigate these challenges, the problem is reformulated as a constrained global optimization problem by incorporating the additional equality constraint z ◦ (z − e) = 0, with ◦ denoting the Hadamard product. Componentwise, z_i(z_i − 1) = 0 holds if and only if z_i ∈ {0, 1}, so the binary constraint can be replaced by this equality over z ∈ [0, 1]^n.
Let the functions be defined as:
f_1(y) = −μ^T y,
f_2(σ, ρ) = ρ + (1/(N(1 − θ))) Σ_{j=1}^{N} σ_j,
g(y, z, σ, ρ) = (−ξ_j^T y − ρ − σ_j, y − z, e^T z − k)^T,
h(y, z) = (z ◦ (z − e), e^T y − 1)^T.
Let f_λ = max{λ(f_1(y) + μ_max), (1 − λ) f_2(σ, ρ)}. A constrained global optimization problem is then formulated in the following epigraph form:

min_{f_λ, y, z, σ, ρ}  f_λ
s.t.   λ(f_1(y) + μ_max) ≤ f_λ,
       (1 − λ) f_2(σ, ρ) ≤ f_λ,
       g(y, z, σ, ρ) ≤ 0,
       h(y, z) = 0,
       y, z ∈ [0, 1]^n,
       σ ≥ 0.   (3.14)
s.t.   σ_j ≥ −ξ_j^T y − ρ,  σ_j ≥ 0,  j = 1, 2, . . . , N,
       e^T y = 1,
       e^T z ≤ k,
       0 ≤ y ≤ z,
       z ◦ ζ = 0,
       z + ζ − e = 0,   (3.15)
Consider the problem

min_{y ∈ Y}  ψ(y).

A one-layer projection neurodynamic model for this problem takes the form ε dy/dt = −y + (y − ∇ψ(y))^+, where ε stands for a positive time constant, ∇ψ(·) signifies the gradient of ψ, and (·)^+ is defined componentwise as

(y_i)^+ = 0 if y_i < 0;  (y_i)^+ = y_i if y_i ≥ 0.
ε dy/dt ∈ −∇ψ(y) − λ Σ_i ∂ max{0, g_i(y)},   (3.19)

where λ acts as a penalty parameter, and ∂(·) represents Clarke's generalized gradient [57]. The generalized gradient of the max function can be given as

∂ max{0, g_i(y)} = ∇g_i(y) if g_i(y) > 0;  [0, 1] ∇g_i(y) if g_i(y) = 0;  0 if g_i(y) < 0.
ε dy/dt = φ(∇ψ(y), Y),   (3.20)

where φ(·) is a function dependent on the gradient ∇ψ(y) and the domain Y.
In recent developments, CNO approaches incorporating multiple neurodynamic models have been proposed to tackle the difficulty of finding global optimal solutions for nonconvex objective functions (e.g., [26, 27, 58–60]). In these approaches, metaheuristics such as particle swarm optimization (PSO) [28] are employed to dynamically adjust the initial states of the models. The update rule for PSO is given by Eqs. (3.21) and (3.22):

v_i(j + 1) = c_0 v_i(j) + c_1 r_1 (ỹ_i(j) − y_i(j)) + c_2 r_2 (ŷ − y_i(j)),   (3.21)
y_i(j + 1) = y_i(j) + v_i(j + 1),   (3.22)
where y_i(j) = (y_{i1}(j), . . . , y_{in}(j))^T and v_i(j) = (v_{i1}(j), . . . , v_{in}(j))^T represent the position and velocity of the i-th particle at iteration j, respectively, with c_0 as the inertia coefficient, c_1 and c_2 as acceleration factors, and r_1 and r_2 as random numbers within [0, 1]. ỹ_i(j) = (ỹ_{i1}(j), . . . , ỹ_{in}(j))^T is the best previous position of the i-th particle, while ŷ = (ŷ_1, . . . , ŷ_n)^T denotes the swarm's overall best position found.
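For concreteness, one PSO iteration per Eqs. (3.21) and (3.22) can be sketched in a few lines of NumPy (the inertia coefficient here is an illustrative choice; the experiments below use c_1 = c_2 = 1.49):

import numpy as np

rng = np.random.default_rng(0)

def pso_step(y, v, y_best, y_swarm, c0=0.7, c1=1.49, c2=1.49):
    # One velocity/position update for all particles (rows of y).
    r1, r2 = rng.random(y.shape), rng.random(y.shape)
    v_new = c0 * v + c1 * r1 * (y_best - y) + c2 * r2 * (y_swarm - y)
    return y + v_new, v_new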
To enhance exploration capabilities, wavelet mutation is employed [61, 62], defined by the function

η = (1/√a) exp(−e/(2a)) cos(5e/a),

where a = exp(10 j/j_max), with j_max as the maximum iteration count, and e is a uniformly distributed random number in the range (−2.5a, 2.5a) [61]. Subsequently, wavelet mutation is performed according to the following equation:

y_i(j + 1) = y_i(j) + η(1 − y_i(j))  if η > 0;
y_i(j + 1) = y_i(j) + η y_i(j)       if η < 0.   (3.23)
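An illustrative implementation of the mutation, assuming η exactly as defined above:

import numpy as np

rng = np.random.default_rng(2)

def wavelet_mutate(y, j, jmax):
    a = np.exp(10.0 * j / jmax)
    e = rng.uniform(-2.5 * a, 2.5 * a)
    eta = np.exp(-e / (2.0 * a)) * np.cos(5.0 * e / a) / np.sqrt(a)
    # Mutation rule (3.23): push towards 1 if eta > 0, towards 0 otherwise.
    return y + eta * (1.0 - y) if eta > 0 else y + eta * y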
The diversity of the particles in the swarm is measured by

δ = (1/M) Σ_{i=1}^{M} ||y_i(j + 1) − ŷ||_2.
In the PSO-based weight update, λ̃_j and λ^* represent the historical best individual solution and the overall best group solution for the j-th weight, respectively. The collaborative neurodynamic approach, aimed at cardinality-constrained bi-objective portfolio optimization, employs a hierarchical structure detailed in Algorithm 1, where w = (y^T, z^T, σ^T, ρ)^T. Notably, the foundational tier comprises an assembly of neurodynamic models (3.26) that are periodically reset with PSO to discover Pareto-optimal solutions by addressing the scalarized optimization problem depicted in (3.14). Concurrently, the upper tier adjusts the weights using PSO to enhance the hypervolume (HV), promoting a diverse solution set, as elaborated in [30].
3.5.1 Setups
In alignment with prior research [34, 36], our experiments utilized data from four
major stock exchanges: HDAX, FTSE, HSCI, and SP500. The datasets comprised
938 weekly adjusted closing prices of stocks spanning from January 3, 2000, to
December 29, 2017, excluding stocks that were suspended or newly listed during
this timeframe [50, 51, 67]. Consequently, the datasets for HDAX, FTSE, HSCI, and
SP500 included 49, 56, 77, and 356 stocks, respectively. The data were segmented
into two portions for each experiment in one of two ways: one-third for in-sample pre-training and two-thirds for out-of-sample testing, or half and half. Out-of-sample testing involved
continuous updating of problem parameters using historical return data up to the week
prior to each portfolio rebalancing, ensuring portfolio optimizations were based on
the most current data within a sequentially prolonged time window.
Additionally, following [36], the cardinality constraint k varies across datasets to
test different portfolio sizes: for HDAX, k values of 44, 34, 24, 14, and 4; for FTSE,
50, 39, 28, 16, and 5; for HSCI, 69, 53, 38, 23, and 7; and for SP500, 320, 249, 178,
106, and 35 are examined. The risk-free rate rf is calculated based on the annualized
returns of US Treasury three-month T-bills ryearly , converting these to weekly rates
using the formula $(1 + r_{\text{weekly}})^{938/18} - 1 = r_{\text{yearly}}$, i.e., $r_f = r_{\text{weekly}} = (1 + r_{\text{yearly}})^{18/938} - 1$ [68]. All experiments use simple return rates for $r_f$.
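In code, the conversion is a one-liner; the exponent 18/938 reflects 938 weekly observations over the 18-year window.

```python
# Sketch of the weekly risk-free rate conversion described above.
def weekly_risk_free(r_yearly: float) -> float:
    return (1.0 + r_yearly) ** (18.0 / 938.0) - 1.0

# Example: a 2% annualized T-bill rate gives roughly 0.00038 per week.
```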
Four approaches are used in performance comparison: (1) DNO, a duplex neurody-
namic optimization approach, (2) CNO, a collaborative neurodynamic optimization
approach with 20 neurodynamic models [36], (3) EW, an equally-weighted approach
for portfolio selection [69], and (4) MI, the market index.
For estimating CVaR, the threshold θ is set to 0.95, with N matching the total
available historical data at decision time. The two-timescale duplex neurodynamic
model employs $\epsilon_1/\epsilon_2 = 10$ in RNN1 and $\epsilon_2/\epsilon_1 = 0.1$ in RNN2. In the PSO rule,
the constants c1 and c2 are set to 1.49. Moreover, the algorithm terminates when the
threshold ε is equal to $10^{-3}$, and the diversity threshold τ is set to 0.1 as suggested in [62]. In addition, $j_{\max}$ is set to 50, and $r_1$, $r_2$, and the initial states of γ, ρ, σ, y, z, and ζ are randomly generated within the range [0, 1].
3.5.2 Results
Tables 3.1, 3.2 and 3.3 present the annualized SR, CSR, and returns across four
datasets. The portfolios developed using the DNO approach show superior annualized
SR, CSR, and return metrics at cardinality levels of 24, 34, and 44. However, in the
HDAX dataset, the DNO approach did not perform as well as the EW approach when
cardinality constraints are set at 4 and 14, where the EW portfolio is not limited by
cardinality. For the FTSE dataset, DNO portfolios surpass the three benchmarks
in terms of annualized SR, CSR, and returns. In the HSCI dataset, DNO portfolios
exhibit higher annualized SR for cardinality values of 38, 53, 69, and 77 but fall short
against the EW method in terms of both annualized SR and returns at cardinality
levels of 7 and 23. Similarly, DNO portfolios demonstrate greater annualized CSR
values than the benchmark methods for cardinality values of 38, 53, 69, and 77,
although they underperform compared to EW at cardinality settings of 7 and 23. It
is observed that both SR and CSR metrics for DNO portfolios tend to improve with
an increase in the value of k. Furthermore, Tables 3.4, 3.5 and 3.6, which use data
from the first half of the period for in-sample learning, show enhanced outcomes
compared to Tables 3.1, 3.2 and 3.3, attributable to the utilization of a larger sample
size for in-sample training, indicating improved performance across all metrics.
Figures 3.1 and 3.2 illustrate the growth of cumulative returns for portfolios that
are rebalanced weekly, utilizing datasets divided into thirds (1/3 for in-sample and
2/3 for out-of-sample) and halves (1/2 for in-sample and 1/2 for out-of-sample),
respectively. A general trend observed is that cumulative returns tend to rise with an
increase in the cardinality value k. More precisely, Fig. 3.1 highlights that portfolios
optimized using DNO exhibit superior cumulative returns at cardinality values of 44,
50, 69, and 320 across the respective datasets. Yet, for specific settings such as DNO-
4 and DNO-14 in the HDAX dataset, DNO-5 through DNO-39 in the FTSE dataset,
and DNO-7 and DNO-23 in the HSCI dataset, the performance falls short of the
EW portfolios, which are not limited by cardinality constraints. Figure 3.2 presents
analogous findings, with the first half of the period allocated for in-sample pre-
training, showing the highest cumulative returns for the DNO strategy at cardinality
values of 44, 69, and 320 in the HDAX, HSCI, and SP500 datasets, respectively.
However, in certain cases like DNO-4, DNO-14, DNO-24 on HDAX, DNO-5 to
DNO-50 on FTSE, and DNO-7, DNO-23 on HSCI, the DNO portfolios did not
perform as well as the EW portfolios, which are free from cardinality restrictions.
Fig. 3.1 Cumulative returns for four distinct portfolios, derived from datasets divided into 1/3 for
training and 2/3 for testing phases, from HDAX (in the first subplot), FTSE (in the second subplot),
HSCI (in the third subplot), and SP500 (in the final subplot)
Fig. 3.2 Cumulative returns for four distinct portfolios, derived from datasets divided into 1/2 for
training and 1/2 for testing phases, from HDAX (in the first subplot), FTSE (in the second subplot),
HSCI (in the third subplot), and SP500 (in the final subplot)
References
10. Lai, Z.-R., Dai, D.-Q., Ren, C.-X., Huang, K.-K.: Radial basis functions with adaptive input and
composite trend representation for portfolio selection. IEEE Trans. Neural Networks Learn.
Syst. 29(12), 6214–6226 (2018)
11. Josa-Fombellida, R., Rincón-Zapatero, J.P.: Equilibrium strategies in a defined benefit pension
plan game. Eur. J. Oper. Res. 275(1), 374–386 (2019)
12. Ponsich, A., Jaimes, A.L., Coello, C.A.C.: A survey on multiobjective evolutionary algo-
rithms for the solution of the portfolio optimization problem and other finance and economics
applications. IEEE Trans. Evol. Comput. 17(3), 321–344 (2013)
13. Kroll, Y., Levy, H., Markowitz, H.M.: Mean-variance versus direct utility maximization. J.
Financ. 39(1), 47–61 (1984)
14. Sharpe, W.F.: Expected utility asset allocation. Financ. Anal. J. 63(5), 18–30 (2007)
15. Morgenstern, O., Von Neumann, J.: Theory of Games and Economic Behavior. Princeton
University Press (1953)
16. Steuer, R.E.: Multiple Criteria Optimization: Theory, Computation, and Applications. Wiley (1986)
17. Mansini, R., Ogryczak, W., Speranza, M.G.: Twenty years of linear programming based
portfolio optimization. Eur. J. Oper. Res. 234(2), 518–535 (2014)
18. Brandhofer, S., Braun, D., Dehn, V., Hellstern, G., Hüls, M., Ji, Y., Polian, I., Bhatia, A.S., Wellens, T.: Benchmarking the performance of portfolio optimization with QAOA. Quantum Inf. Process. 22(1), 25 (2022)
19. Ertenlice, O., Kalayci, C.B.: A survey of swarm intelligence for portfolio optimization:
algorithms and applications. Swarm Evol. Comput. 39, 36–52 (2018)
20. Gunjan, A., Bhattacharyya, S.: A brief review of portfolio optimization techniques. Artif. Intell.
Rev. 56(5), 3847–3886 (2023)
21. Tank, D., Hopfield, J.: Simple ‘neural’ optimization networks: an A/D converter, signal decision
circuit, and a linear programming circuit. IEEE Trans. Circ. Syst. 33(5), 533–541 (1986)
22. Hopfield, J.J., Tank, D.W.: Computing with neural circuits—a model. Science 233(4764),
625–633 (1986)
23. Xia, Y., Wang, J.: A recurrent neural network for nonlinear convex optimization subject to
nonlinear inequality constraints. IEEE Trans. Circ. Syst. I Regul. Pap. 51(7), 1385–1394 (2004)
24. Xia, Y., Wang, J.: A recurrent neural network for solving nonlinear convex programs subject
to linear constraints. IEEE Trans. Neural Networks 16(2), 379–386 (2005)
25. Li, G., Yan, Z., Wang, J.: A one-layer recurrent neural network for constrained nonsmooth
invex optimization. Neural Netw. 50, 79–89 (2014)
26. Yan, Z., Wang, J., Li, G.: A collective neurodynamic optimization approach to bound-
constrained nonconvex optimization. Neural Netw. 55, 20–29 (2014)
27. Yan, Z., Fan, J., Wang, J.: A collective neurodynamic approach to constrained global
optimization. IEEE Trans. Neural Networks Learn. Syst. 28(5), 1206–1215 (2017)
28. Clerc, M., Kennedy, J.: The particle swarm-explosion, stability, and convergence in a
multidimensional complex space. IEEE Trans. Evol. Comput. 6(1), 58–73 (2002)
29. Yang, S., Liu, Q., Wang, J.: A collaborative neurodynamic approach to multiple-objective
distributed optimization. IEEE Trans. Neural Networks Learn. Syst. 29(4), 981–992 (2018)
30. Leung, M.-F., Wang, J.: A collaborative neurodynamic approach to multiobjective optimization.
IEEE Trans. Neural Networks Learn. Syst. 29(11), 5738–5748 (2018)
31. Che, H., Wang, J.: A nonnegative matrix factorization algorithm based on a discrete-time
projection neural network. Neural Netw. 103, 63–71 (2018)
32. Wang, J., Wang, J., Che, H.: Task assignment for multivehicle systems based on collaborative
neurodynamic optimization. IEEE Trans. Neural Networks Learn. Syst. 31(4), 1145–1154
(2019)
33. Wang, J., Wang, J., Han, Q.-L.: Neurodynamics-based model predictive control of continuous-
time under-actuated mechatronic systems. IEEE/ASME Trans. Mechatron. 26(1), 311–322
(2021)
34. Leung, M.-F., Wang, J.: Minimax and biobjective portfolio selection based on collaborative
neurodynamic optimization. IEEE Trans. Neural Networks Learn. Syst. 32(7), 2825–2836
(2021)
35. Leung, M.-F., Wang, J., Li, D.: Decentralized robust portfolio optimization based on
cooperative-competitive multiagent systems. IEEE Trans. Cybern. 52(12), 12785–12794
(2022)
36. Leung, M.-F., Wang, J.: Cardinality-constrained portfolio selection based on collaborative
neurodynamic optimization. Neural Netw. 145, 68–79 (2022)
37. Wang, J., Gan, X.: Neurodynamics-driven portfolio optimization with targeted performance
criteria. Neural Netw. 157, 404–421 (2023)
38. Gorski, J., Pfeuffer, F., Klamroth, K.: Biconvex sets and optimization with biconvex functions:
a survey and extensions. Math. Meth. Oper. Res. 66(3), 373–407 (2007)
39. Young, M.R.: A minimax portfolio selection rule with linear programming solution. Manag.
Sci. 44(5), 673–683 (1998)
40. Polak, G.G., Rogers, D.F., Sweeney, D.J.: Risk management strategies via minimax portfolio
optimization. Eur. J. Oper. Res. 207(1), 409–419 (2010)
41. Deng, X.-T., Li, Z.-F., Wang, S.-Y.: A minimax portfolio selection strategy with equilibrium.
Eur. J. Oper. Res. 166(1), 278–292 (2005)
42. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42
(2000)
43. Artzner, P., Delbaen, F., Eber, J.-M., Heath, D.: Coherent measures of risk. Math. Financ. 9(3),
203–228 (1999)
44. Gaivoronski, A.A., Pflug, G.: Value-at-risk in portfolio optimization: properties and computa-
tional approach. J. Risk 7(2), 1–31 (2005)
45. Sharpe, W.F.: The sharpe ratio. J. Portfolio Manag. 21(1), 49–58 (1994)
46. Christiansen, C., Joensen, J.S., Nielsen, H.S.: The risk-return trade-off in human capital
investment. Labour Econ. 14(6), 971–986 (2007)
47. Liu, Q., Guo, Z., Wang, J.: A one-layer recurrent neural network for constrained pseudoconvex
optimization and its application for dynamic portfolio optimization. Neural Netw. 26(1), 99–
109 (2012)
48. Liu, Q., Dang, C., Huang, T.: A one-layer recurrent neural network for real-time portfolio
optimization with probability criterion. IEEE Trans. Cybern. 43(1), 14–23 (2013)
49. Eling, M., Schuhmacher, F.: Does the choice of performance measure influence the evaluation
of hedge funds? J. Bank. Finan. 31(9), 2632–2647 (2007)
50. Chang, T.-J., Meade, N., Beasley, J.E., Sharaiha, Y.M.: Heuristics for cardinality constrained
portfolio optimization. Comput. Oper. Res. 27(13), 1271–1302 (2000)
51. Woodside-Oriakhi, M., Lucas, C., Beasley, J.E.: Heuristic algorithms for the cardinality
constrained efficient frontier. Eur. J. Oper. Res. 213(3), 538–550 (2011)
52. Gao, J., Li, D.: Optimal cardinality constrained portfolio selection. Oper. Res. 61(3), 745–761
(2013)
53. Hardoroudi, N.D., Keshvari, A., Kallio, M., Korhonen, P.: Solving cardinality constrained
mean-variance portfolio problems via MILP. Ann. Oper. Res. 254, 47–59 (2017)
54. Kalayci, C.B., Polat, O., Akbay, M.A.: An efficient hybrid metaheuristic algorithm for
cardinality constrained portfolio optimization. Swarm Evol. Comput. 54, 100662 (2020)
55. Xia, Y., Feng, G., Wang, J.: A novel neural network for solving nonlinear optimization problems
with inequality constraints. IEEE Trans. Neural Networks 19(8), 1340–1353 (2008)
56. Li, G., Yan, Z., Wang, J.: A one-layer recurrent neural network for constrained nonconvex
optimization. Neural Netw. 61, 10–21 (2015)
57. Liu, Q., Wang, J.: A one-layer recurrent neural network for constrained nonsmooth optimiza-
tion. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 40(5), 1323–1333 (2011)
58. Che, H., Wang, J.: A collaborative neurodynamic approach to global and combinatorial
optimization. Neural Netw. 114, 15–27 (2019)
59. Che, H., Wang, J.: A two-timescale duplex neurodynamic approach to biconvex optimization.
IEEE Trans. Neural Networks Learn. Syst. 30(8), 2503–2514 (2019)
60. Che, H., Wang, J.: A two-timescale duplex neurodynamic approach to mixed-integer optimiza-
tion. IEEE Trans. Neural Networks Learn. Syst. 32(1), 36–48 (2021)
61. Ling, S.-H., Iu, H.H., Chan, K.Y., Lam, H.-K., Yeung, B.C., Leung, F.H.: Hybrid particle
swarm optimization with wavelet mutation and its industrial applications. IEEE Trans. Syst.
Man Cybern. Part B (Cybern.) 38(3), 743–763 (2008)
62. Fan, J., Wang, J.: A collective neurodynamic optimization approach to nonnegative matrix
factorization. IEEE Trans. Neural Networks Learn. Syst. 28(10), 2344–2356 (2017)
63. Juang, C.-F.: A hybrid of genetic algorithm and particle swarm optimization for recurrent
network design. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 34(2), 997–1006 (2004)
64. Leung, M.-F., Wang, J.: A collaborative neurodynamic optimization approach to bicriteria portfolio selection. In: Lu, H., Tang, H., Wang, Z. (eds.) Advances in Neural Networks—ISNN 2019, pp. 318–327. Springer International Publishing, Cham (2019)
65. Leung, M.-F., Wang, J., Che, H.: Cardinality-constrained portfolio selection via two-timescale
duplex neurodynamic optimization. Neural Netw. 153, 399–410 (2022)
66. Che, H., Wang, J., Cichocki, A.: Bicriteria sparse nonnegative matrix factorization via two-
timescale duplex neurodynamic optimization. IEEE Trans. Neural Networks Learn. Syst. 34(8),
4881–4891 (2023)
67. Guastaroba, G., Speranza, M.G.: Kernel search: an application to the index tracking problem.
Eur. J. Oper. Res. 217(1), 54–68 (2012)
68. Hodoshima, J.: Stock performance by utility indifference pricing and the Sharpe ratio. Quant.
Finan. 19, 1–12 (2018)
69. DeMiguel, V., Garlappi, L., Uppal, R.: Optimal versus naive diversification: how inefficient is
the 1/N portfolio strategy? Rev. Finan. Stud. 22(5), 1915–1953 (2009)
Chapter 4
Fully Homomorphic Encrypted Wavelet
Neural Network for Privacy-Preserving
Bankruptcy Prediction in Banks
4.1 Introduction
Machine Learning is being extensively used in almost every field such as healthcare,
finance, education, intrusion detection, and even recommendation systems [1]. A lot of private data is stored in databases and is openly utilized by ML algorithms to build models. One of the major concerns in the application of
ML models is the privacy and security of such private data. Organizations cannot
simply ignore the privacy concerns of the data such as customers’ Personal Identifi-
able Information (PII) and at the same time cannot stop analyzing such data because
it would reap immense business and operational benefits to the organization.
On May 25, 2018, the European Union (EU) brought into effect the toughest privacy and security law in the world, called the General Data Protection Regulation (GDPR) [2]. The law states that organizations that violate its privacy and security standards will face heavy fines running into millions of euros. One more such law, namely, the
California Consumer Privacy Act (CCPA), allows the consumers in California the
right to know about everything that a business collects about them, the right to delete
the collected information, and the right to opt out of the sale of their information
[3]. Similarly, the Personal Data Protection Act (PDPA) enacted in Singapore protects personal data [4].
With such strict privacy laws, organizations are precluded from using private data
freely. To overcome this problem, PPML provides different ways that will assure the
customers that their data privacy will be protected and at the same time organizations
can work on the private data and build better and more responsible ML Models.
In the financial domain too, privacy preservation has a high priority because the PII of a customer should not be made available or shared with other organizations without the consent of the individual concerned. Privacy preservation is not only concerned with customers' data but is also of utmost importance to the organizations themselves. If an analysis of an organization's data were to reveal that the organization is on the verge of bankruptcy, it would create chaos among the employees as well as the customers. So PPML helps to perform the analysis without revealing the actual output of the analysis to everyone, but only to the important stakeholders of the company.
Bankruptcy can be explained as the phenomenon in which a bank/firm is unable to meet its financial commitments or to return the due credit amount to its creditors. In simpler terms, we can say that the bank/firm is unable to generate the wealth to clear off its debt. This information is highly confidential and should not be revealed to everyone unless the bank/firm is completely bankrupt. PPML allows organizations or banks to perform bankruptcy prediction analysis without revealing any sensitive information.
There are different approaches in PPML and there is no single proven approach
that is considered to be the best among all the approaches. For example, one of
the approaches is Differential Privacy (DP), where researchers can work on people's personal information without disclosing their identity. But the drawback of
DP is that it might lead to a loss in model accuracy. Similarly, another technique is
called Secure Multi-Party Computation where multiple data owners can collabora-
tively train the model but this might result in high communication overhead or high
computation overhead [5].
One more approach is to secure the data using Homomorphic Encryption. It
allows the computation to be performed on the encrypted data without the need
for decryption. Partial Homomorphic Encryption (PHE), Somewhat Homomorphic Encryption (SHE), and Fully Homomorphic Encryption (FHE) are the variations of Homomorphic Encryption. PHE allows an unlimited number of either additions or multiplications, SHE allows a limited number of arithmetic operations, and FHE allows an unlimited number of additions and multiplications on the encrypted data.
In this chapter, we focus on FHE, which is considered to be the most secure technique compared to the others. Here, we propose an FHE-based privacy-preserving Wavelet Neural Network (WNN). We designed and implemented the secure WNN by ensuring that the data and all the trainable parameters in the network are fully homomorphically encrypted, and the results are also obtained in an encrypted format.
The remaining part of the chapter is structured as follows: in Sect. 4.2, we overview
the bankruptcy prediction and state the problem. Section 4.3 presents the related work
regarding homomorphic encryption. Section 4.4 explains the proposed methodology, and Sect. 4.5 briefly describes the datasets analyzed. The results are discussed in Sect. 4.6 and, finally, Sect. 4.7 presents the conclusion and future directions. The Appendix consists of tables presenting the features of the datasets analyzed.
The prediction of bankruptcy for financial firms and banks has been an extensively researched area since the late 1960s [6]. Creditors, auditors, stockholders and senior
management are all equally interested in bankruptcy prediction because it affects all
of them alike [7]. The health of a bank in a highly competitive business environment
depends on (i) how financially solvent it is at the inception, (ii) its ability, relative
flexibility and efficiency in creating cash from its continuous operations, (iii) its
access to capital markets and (iv) its financial capacity and staying power when
faced with unplanned cash short-falls. As a bank becomes more and more insolvent,
it gradually enters a danger zone. Then, changes to its operations and capital structure
must be made in order to keep it solvent [8].
The most precise way of monitoring banks is by conducting on-site examina-
tions. These examinations are conducted on a bank’s premises by regulators every
12–18 months, as mandated by the Federal Deposit Insurance Corporation Improve-
ment Act of 1991. Regulators utilize a six-part rating system to indicate the safety
and soundness of the institution. This rating, referred to as the CAMELS rating,
evaluates banks according to their basic functional areas: capital adequacy, asset
quality, management expertise, earnings strength, liquidity and sensitivity to market
risk. While CAMELS ratings clearly provide regulators with important informa-
tion, Cole and Gunther [9] reported that these CAMELS ratings decay rapidly. This
awakening opened the floodgates of research activity in the bankruptcy prediction area, whereby the entire gamut of statistical and machine learning techniques was applied in a flurry of publications spread across two decades from the 1990s to the 2010s. However,
with the GDPR and other privacy laws in force, the financial statement data including
balance sheet data of banks cannot be shared with a third party for rigorous predictive
analytics purpose.
This stringent constraint calls for the application of privacy preserving machine
learning in the area of bankruptcy prediction as well. Toward that direction, this
chapter proposes the privacy preserved, fully homomorphic encrypted WNN to
bankruptcy prediction in banks.
The protocol consists of Paillier homomorphic encryption and a data masking tech-
nique. Bonte and Vercauteren [20] implemented Privacy-Preserving Logistic Regres-
sion where somewhat homomorphic encryption based on the scheme of Fan and
Vercauteren [21] was used.
In a homomorphic scheme, $E(m_1) \oplus E(m_2) = E(m_1 \oplus m_2)$ for the supported operation $\oplus$ (addition and/or multiplication), where $m_1$ and $m_2$ are plain texts and $E$ is the encryption scheme. This implies that the homomorphic encryption of the sum or multiplication of two numbers is equivalent to the sum or multiplication of the two individually homomorphically encrypted numbers.
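As a concrete illustration of the additive case, the following sketch uses the open-source python-paillier library (an implementation choice of ours, not one made in this chapter):

```python
# The sum of two Paillier ciphertexts decrypts to the sum of the plaintexts.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()
c1, c2 = public_key.encrypt(3), public_key.encrypt(4)
assert private_key.decrypt(c1 + c2) == 7   # E(m1) + E(m2) -> m1 + m2
```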
The homomorphic encryption scheme is mainly divided into three categories
based on the number of operations that can be performed on the encrypted data:
The PHE scheme allows only one type of operation, either addition or multiplication, an unlimited number of times on the encrypted data. Some examples of partially
homomorphic encryption are RSA (multiplicative homomorphism) [24], ElGamal
(multiplicative homomorphism) [25], and Paillier (additive homomorphism) [26].
The PHE scheme is generally used in applications like Private Information Retrieval
(PIR) and E-Voting.
The SHE scheme allows both addition and multiplication operations but only to a
limited number of times on the encrypted data. Boneh-Goh-Nissim (BGN) and Polly
Cracker Scheme are some examples of the SHE scheme.
The FHE scheme allows all the operations like addition and multiplication an unlim-
ited number of times on the encrypted data but it has high computational complexity
and requires high-end resources for efficient implementation [27]. Gentry [28] was
the first one to propose the concept of FHE along with a general framework to
obtain an FHE scheme. There are mainly three FHE families: ideal lattice-based over the integers [29], Ring Learning With Errors (RLWE) [30], and NTRU-like [31]. We implemented the Cheon-Kim-Kim-Song (CKKS) scheme, whose security is based on the hardness assumption of RLWE.
Fig. 4.1 Block diagram of the encryption and decryption in CKKS scheme
The encryption process happens in two steps in CKKS Scheme. In the first operation,
the vector of real numbers is encoded into a plain-text polynomial. This plain text
polynomial is then encrypted into a ciphertext.
Similar to the encryption process, the decryption also happens in two steps. In the first operation, the ciphertext is decrypted into a plain-text polynomial. This plain-text polynomial is then decoded into a vector of real numbers. Figure 4.1 depicts the
encryption and decryption process in the CKKS scheme.
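This roundtrip can be sketched with the TenSEAL library [34]; the parameters below are small illustrative values, not the ones used in our experiments (those are given in Sect. 4.6).

```python
import tenseal as ts

# Minimal CKKS encode/encrypt and decrypt/decode roundtrip with TenSEAL.
context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40

plain = [1.5, -2.0, 3.25]
enc = ts.ckks_vector(context, plain)   # encode + encrypt
dec = enc.decrypt()                    # decrypt + decode (approximate)
# dec is close to plain up to CKKS's approximation error
```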
The parameters in CKKS decide the privacy level and computational complexity of
the model. These are as follows:
1. Scaling Factor: This defines the encoding precision for the binary representation
of the number.
2. Polynomial modulus degree: This parameter is responsible for the number
of coefficients in plain text polynomials, size of ciphertext, computational
complexity, and security level. The degree should always be a power of 2, e.g., 1024, 2048, 4096, and so on. The higher the polynomial modulus degree, the higher the security level achieved, but it also increases the computational time.
3. Coefficient Modulus sizes: This parameter is a list of bit sizes of the prime moduli used by the scheme, called the coefficient modulus. The length of the list indicates the number of multiplications possible; the longer the list, the lower the level of security of the scheme. The prime numbers in the coefficient modulus must be congruent to 1 modulo 2 × polynomial modulus degree.
The scheme generates different types of keys which are handled by a single object
called context. The keys are as follows:
1. Secret Key: This key is used for decryption and should not be shared with anyone.
2. Public Encryption Key: This key is used for the encryption of the data.
3. Relinearization Keys: In general, the size of a fresh ciphertext is 2. If there are two ciphertexts with sizes X and Y, then their multiplication will result in a ciphertext of size as big as X + Y − 1. The increase in size increases noise and also reduces the speed of multiplication. Therefore, relinearization reduces the size of the ciphertexts back to 2, and this is done by dedicated public keys which are created by the secret key owner.
The WNN [32] has a simple architecture with just three layers, namely the input
layer, hidden layer, and output layer. The input layer consists of the feature values
or the explanatory variables that are introduced to the WNN and the hidden layer
consists of hidden nodes which are generally referred to as Wavelons. These wavelons
transform the input values into translated and dilated forms of the Mother Wavelet.
The approximate target values are estimated in the output layer. All the nodes in
each layer are fully connected with the nodes in the next layer. We implemented the
WNN with Gaussian wavelet function as an activation function, which is defined as
follows (Fig. 4.2):

$$f(t) = e^{-t^2} \qquad (4.1)$$
The algorithm to train the WNN is as follows. It is simpler than the backprop-
agation algorithm because here only the gradient descent is applied to update the
parameters without backpropagating the errors [33]:
1. Select the number of hidden nodes and initialize all the weights, translation and
dilation parameters, randomly using uniform distribution in (0, 1).
2. The output value $\hat{y}$ of each sample is predicted as follows:

$$\hat{y} = \sum_{j=1}^{n_{hn}} W_j\, f\!\left(\frac{\sum_{i=1}^{n_{in}} w_{ij} x_{ki} - b_j}{a_j}\right) \qquad (4.2)$$
where nhn and nin are the numbers of hidden and input nodes respectively, Wj
and wij are the weights between hidden to output nodes and the weights between
the input to hidden nodes respectively, bj and aj are the translation and dilation
parameters respectively.
3. Update the weights (Wj and wij ), translation (bj ), and dilation (aj ) parameters.
The parameters of a WNN are updated by using the following formulas:
$$\Delta W_j(t+1) = -\eta \frac{\partial E}{\partial W_j(t)} + \alpha \Delta W_j(t) \qquad (4.3)$$

$$\Delta w_{ij}(t+1) = -\eta \frac{\partial E}{\partial w_{ij}(t)} + \alpha \Delta w_{ij}(t) \qquad (4.4)$$

$$\Delta a_j(t+1) = -\eta \frac{\partial E}{\partial a_j(t)} + \alpha \Delta a_j(t) \qquad (4.5)$$

$$\Delta b_j(t+1) = -\eta \frac{\partial E}{\partial b_j(t)} + \alpha \Delta b_j(t) \qquad (4.6)$$

$$E = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \qquad (4.7)$$
where y is the actual output value, N is the number of training samples, η and α
are the learning rate and momentum rate respectively.
4. Steps 2 and 3 are repeated until the error E reaches the specified convergence criterion (a sketch of this training loop is given after this list).
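The following NumPy sketch mirrors the above procedure on unencrypted data (momentum omitted and constant factors absorbed into the learning rate; all names are illustrative). The chapter's secure model carries out the same operations on CKKS-encrypted tensors.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(t):                                     # Gaussian wavelet, Eq. (4.1)
    return np.exp(-t ** 2)

def train_wnn(X, y, n_hidden, lr=0.01, epochs=200):
    """Plain (unencrypted) WNN trained by gradient descent on Eq. (4.7)."""
    n_in = X.shape[1]
    w = rng.uniform(0, 1, (n_in, n_hidden))   # input-to-hidden weights
    W = rng.uniform(0, 1, n_hidden)           # hidden-to-output weights
    b = rng.uniform(0, 1, n_hidden)           # translation parameters
    a = rng.uniform(0, 1, n_hidden) + 0.5     # dilations, kept away from 0
    for _ in range(epochs):
        t = (X @ w - b) / a                   # (N, n_hidden)
        h = f(t)
        y_hat = h @ W                         # Eq. (4.2)
        err = y_hat - y                       # error signal
        dh = err[:, None] * W * (-2.0 * t * h)        # backprop through f
        W -= lr * (h.T @ err) / len(y)                # cf. Eq. (4.3)
        w -= lr * (X.T @ (dh / a)) / len(y)           # cf. Eq. (4.4)
        a -= lr * (dh * (-t / a)).mean(axis=0)        # cf. Eq. (4.5)
        b -= lr * (dh * (-1.0 / a)).mean(axis=0)      # cf. Eq. (4.6)
    return w, W, b, a
```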
Algorithms 1 and 2 explain the training and testing procedures of the encrypted WNN. In the above algorithm, ΔE is the change in the mean squared error between the current batch and the previous batch, and max_accuracy is the maximum accuracy obtained by the unencrypted model.
In this dataset, there are 250 instances and 7 features including the target variable,
namely, whether a bank is bankrupt or non-bankrupt [36]. Out of the 250 instances,
143 instances are non-bankrupt banks and 107 are bankrupt. The description of the
features is provided in Table 4.3.
In this dataset, there are 66 instances and a total of 10 features including the target
variable [37]. Out of the 66 instances, 37 are the bankrupt banks and 29 are healthy
ones.
The Turkish Bank Dataset consists of 40 instances and 9 features including the target
variable [38]. Out of the 40 instances, 22 are banks which went bankrupt and 18 are banks which were healthy.
The UK banks dataset has 60 instances with 10 features [39], where 30 banks were
bankrupt and 30 were healthy.
Traditional neural networks often involve numerous parameters and complex archi-
tectures, resulting in high time complexity for both training and inference. When
combined with FHE, where all parameters are encrypted, this time complexity is
further increased. However, WNNs offer a distinct advantage in terms of param-
eter reduction through multiresolution analysis. This reduction in network parame-
ters compared to other neural networks leads to faster training and inference times,
making WNNs computationally efficient even when the parameters are encrypted
using FHE.
By leveraging the multiresolution analysis inherent in WNNs, our research takes
advantage of the computational efficiency of the network architecture. This allows
us to mitigate the time complexity challenges associated with using FHE in neural
networks. The reduction in the number of parameters, combined with the unique
capabilities of WNNs, enables us to effectively apply FHE in the training and infer-
ence processes of WNNs, opening new possibilities for privacy-preserving machine
learning applications.
All the experiments are carried out on a system with the following configura-
tion: HP Z8 workstation with Intel Xeon (R) Gold 6235R CPU processor, Ubuntu
20.04 LTS, and 376.6 GB of RAM. The number of hidden nodes is kept the
same as the number of input nodes. Accuracy and Area Under the Receiver Operating
Characteristics Curve (AUC) are taken as the performance metrics.
The polynomial modulus degree and the coefficient modulus sizes were taken as
16384 and [42, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36], respectively. The global scale was taken as $2^{20}$. The same parameters were used in the encryption for all the datasets.
The polynomial modulus degree 16384 provides a max bit count of 438 bits for
the coefficient modulus which means that the sum of all the values in coefficient
modulus must be less than or equal to 438. The values at indices 1–10 are called the intermediate primes, which are responsible for rescaling the ciphertext and also
indicate the number of multiplications supported by the scheme. Rescaling keeps the
scale constant and also reduces the noise present in the cipher text. The intermediate
primes should be greater than or equal to the bit size of the global scale. In our case, we have chosen 20 bits for the global scale, and the intermediate primes are selected as
36. The size of the plain text will be bounded by the first value in the coefficient
modulus which is taken as 42 in our scenario. The last prime should be as large as
the other primes in the coefficient modulus.
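For concreteness, a CKKS context matching these reported parameters could be created with TenSEAL [34] as sketched below (the library choice is our assumption); note that the bit sizes sum exactly to the 438-bit budget.

```python
import tenseal as ts

coeff_mod_bit_sizes = [42] + [36] * 11     # 42 + 11 * 36 = 438 bits
assert sum(coeff_mod_bit_sizes) == 438

context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=16384,
                     coeff_mod_bit_sizes=coeff_mod_bit_sizes)
context.global_scale = 2 ** 20             # 20-bit global scale
```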
In the Qualitative Bankruptcy dataset, all the features are categorical (in a textual
format). We converted the labels of all the features into a numeric form. In the UK and Turkish datasets, the predictor variable was in textual format and was converted to numeric form. The hyperparameters for the datasets are presented in Table 4.1.
In the datasets, both the unencrypted and encrypted models performed almost
identically because the Accuracy and AUC yielded by them turned out to be nearly
equal. The results of the Datasets are presented in Table 4.2. It turns out that the
PPWNN resulted in higher AUC compared to the unencrypted version of the WNN.
This is a significant result of the study.
This section provides details of the features of the datasets analyzed during the research (Tables 4.3, 4.4, 4.5 and 4.6).
References
1. Al-Rubaie, M., Chang, J.M.: Privacy-preserving machine learning: threats and solutions. IEEE
Secur. Priv. 17(2), 49–58 (2019)
2. Truong, N., Sun, K., Wang, S., Guitton, F., Guo, Y.K.: Privacy preservation in federated learning:
an insightful survey from the GDPR perspective, Comput. Secur. 110, 102402 (2021). ISSN
0167-4048
3. Stallings, W.: Handling of personal information and deidentified, aggregated, and
pseudonymized information under the California consumer privacy act. IEEE Secur. Priv.
18(1), 61–64 (2020)
4. Chik, W.: The Singapore Personal Data Protection Act and an assessment of future trends in
data privacy reform. Comput. Law Secur. Rev. 29, 554–575 (2013)
5. Xu, R., Baracaldo, N., Joshi, J.: Privacy-preserving machine learning: methods, challenges and
directions (2021). arXiv preprint arXiv:2108.04417
6. Altman, E.I.: Financial ratios, discriminant analysis and the prediction of corporate bankruptcy.
J. Finance 23, 589–609 (1968)
7. Wilson, R.L., Sharda, R.: Bankruptcy prediction using neural networks. Decis. Support Syst.
11, 545–557 (1994)
8. Kumar, P.R., Ravi, V.: Bankruptcy prediction in banks and firms via statistical and intelligent
techniques—a review. Eur. J. Oper. Res. 180(1), 1–28 (2007)
9. Cole, R., Gunther, J.: A CAMEL rating’s shelf life. Federal Reserve Bank of Dallas Review,
pp. 13–20 (1995)
10. Nikolaenko, V., Weinsberg, U., Ioannidis, S., Joye, M., Boneh, D., Taft, N.: Privacy-preserving
ridge regression on hundreds of millions of records. In: 2013 IEEE Symposium on Security
and Privacy, pp. 334–348 (2013)
11. Chabanne H., De Wargny, A., Milgram, J., Morel, C., Prouff, E.: Privacy-preserving
classification on deep neural network. Cryptology ePrint Archive (2017)
12. Xie, P., Bilenko, M., Finley, T., Gilad-Bachrach, R., Lauter, K., Naehrig, M.: Crypto-nets:
neural networks over encrypted data (2014). arXiv preprint arXiv:1412.6181
13. Chen, H., Gilad-Bachrach, R., Han, K., et al.: Logistic regression over encrypted data from
fully homomorphic encryption. BMC Med. Genomics 11, 81 (2018)
14. Cheon, J.H., Kim, D., Kim, Y., Song, Y.: Ensemble method for privacy-preserving logistic
regression based on homomorphic encryption. IEEE Access 6, 46938–46948 (2018)
15. Bellafqira, R., Coatrieux, G., Genin, E., Cozic, M.: Secure multilayer perceptron based on
homomorphic encryption. In: Yoo, C., Shi, Y.Q., Kim, H., Piva, A., Kim, G. (eds.) Digital
Forensics and Watermarking. IWDW. Lecture Notes in Computer Science, vol. 11378. Springer,
Cham (2019)
16. Nandakumar, K., Ratha, N., Pankanti, S., Halevi, S.: Towards deep neural network training on
encrypted data. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), pp. 40–48 (2019)
17. Halevi, S., Shoup, V.: Design and implementation of HElib: a homomorphic encryption library.
Cryptology ePrint Archive (2020)
18. Sun, X., Zhang, P., Liu, J.K., Yu, J., Xie, W.: Private machine learning classification based on
fully homomorphic encryption. IEEE Trans. Emerg. Top. Comput. 8(2), 352–364 (2020)
19. Qiu, G., Gui, X., Zhao, Y.: Privacy-preserving linear regression on distributed data by
homomorphic encryption and data masking. IEEE Access 8, 107601–107613 (2020)
20. Bonte, C., Vercauteren, F.: Privacy-preserving logistic regression training. BMC Med.
Genomics 11, 86 (2018)
21. Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. Cryptology ePrint
Archive (2012)
22. Cheon, J.H., Kim, A., Kim, M., Song, Y.: Homomorphic encryption for arithmetic of approxi-
mate numbers. In: International Conference on the Theory and Application of Cryptology and
Information Security, pp. 409–437. Springer, Cham (2017)
23. Acar, A., Aksu, H., Uluagac, A.S., Conti, M.: A survey on homomorphic encryption schemes: theory and implementation. ACM Comput. Surv. 51(4), Article 79 (2018)
24. Nisha, S., Farik, M.: RSA public key cryptography algorithm—a review. Int. J. Sci. Technol.
Res. 6, 187–191 (2017)
25. Haraty, R.A., Otrok, H., El-Kassar, A.N.: A comparative study of Elgamal based cryptographic
algorithms. In: ICEIS 2004-Proceedings of the Sixth International Conference on Enterprise
Information Systems, pp. 79–84 (2004)
26. Nassar, M., Erradi A., Malluhi, Q.M.: Paillier’s encryption: implementation and cloud appli-
cations. In: 2015 International Conference on Applied Research in Computer Science and
Engineering (ICAR), pp. 1–5 (2015)
27. Chialva, D., Dooms, A.: Conditionals in homomorphic encryption and machine learning
applications (2018). arXiv preprint arXiv:1810.12380
28. Gentry, C.: A fully homomorphic encryption scheme. Stanford University (2009). https://cry
pto.stanford.edu/craig/craig-thesis.pdf
29. van Dijk, M., Gentry, C., Halevi, S., Vaikuntanathan, V.: Fully homomorphic encryption over
the integers. In: Gilbert, H. (eds.) Advances in Cryptology—EUROCRYPT 2010. Lecture
Notes in Computer Science, vol. 6110. Springer, Berlin (2010)
30. Brakerski, Z., Vaikuntanathan, V.: Fully homomorphic encryption from ring-LWE and security
for key dependent messages. In: Proceedings of the 31st Annual Conference on Advances in
Cryptology (CRYPTO’11), pp. 505–524. Springer, Berlin (2011)
31. López-Alt, A., Tromer, E., Vaikuntanathan, V.: On-the-fly multiparty computation on the cloud
via multikey fully homomorphic encryption. In: Proceedings of the Forty-Fourth Annual ACM
Symposium on Theory of computing (STOC’12). Association for Computing Machinery, New
York, NY, USA, pp. 1219–1234 (2012)
32. Zhang, Q., Benveniste, A.: Wavelet networks. IEEE Trans. Neural Networks 3(6), 889–898
(1992)
33. Kumar, K.V., Ravi, V., Carr, M., Kiran, N.R.: Software development cost estimation using
wavelet neural networks. J. Syst. Software 81(11), 1853–1867 (2008). ISSN 0164-1212
34. Benaissa, A., Retiat, B., Cebere, B., Belfedhal, A.E.: Tenseal: a library for encrypted tensor
operations using homomorphic encryption (2021). arXiv preprint arXiv:2104.03152
35. Qian, X., Klabjan, D.: The impact of the mini-batch size on the variance of gradients in
stochastic gradient descent (2020). arXiv preprint arXiv:2004.13146.
36. Kim, M.J., Ingoo, H.: The discovery of experts’ decision rules from qualitative bankruptcy data
using genetic algorithms. Expert Syst. Appl. 25(4), 637–646 (2003). ISSN 0957-4174
37. Olmeda, I., Fernández, E.: Hybrid classifiers for financial multicriteria decision making: The
case of bankruptcy prediction. Comput. Econ. 10, 317–335 (1997)
38. Canbas, S., Cabuk, A., Kilic, S.B.: Prediction of commercial bank failure via multivariate
statistical analysis of financial structures: The Turkish case. Eur. J. Operat. Res. 166(2), 528–546
(2005)
39. Beynon, M.J., Peel, M.J.: Variable precision rough set theory and data discretisation: an
application to corporate failure prediction. Omega 29, 561–576 (2001)
Chapter 5
Tools and Measurement Criteria
of Ethical Finance Through
Computational Finance
Abstract This chapter aims to offer the reader a critical reflection on computa-
tional finance starting from the principles of ethical finance. With this term, we
refer to those principles that arose from the 1970s onwards and propose to implement socio-environmental values in financial activities, from savings to employment, also in response to the process of financialization of the economy that has removed finance itself from the real life of local populations. Starting from a critical analysis of
economic positivism that introduced the massive use of mathematics in economics,
a reflection is proposed on the concept of financial accounting and on the role of the real purchasing power of wages, in order to create a financial system that determines anew a socio-environmental horizon toward which the economy must strive. With these
assumptions, financial tools are proposed based on the principles of ethical finance
and how they can promote a process that we call economic socialization, that is
to allow finance to carry forward again the social and environmental values neces-
sary for the life of local communities. With these assumptions, the first paragraph
introduces the concept and problems of economic positivism and how the process
of financialization of the world economy and its impact on the financial system has
been produced since 1970. In this context, some theoretical concepts are proposed
such as that of the purchasing power of wages to determine a finance linked to the
workforce. The second, introduces the principles of ethical finance, from birth to the
present day. The third introduces some financial and socio-environmental measure-
ment tools and models of ethical finance that could be introduced in computational
finance. Finally, the fourth proposes some conclusions starting from the arguments
set out, including the process of economic socialization, or how finance is called to
carry forward the socio-environmental values of local communities, under penalty
of losing the conditions of real well-being for our societies. In this scenario, it is
proposed that finance must respond to a demand for peoples’ rights.
M. Piccolo
Ethic Bank Foundation, London, United Kingdom
F. Vigliarolo (B)
UNESCO CHAIR, National University of La Plata; UCALP; UBA, La Plata, Argentina
e-mail: [email protected]
5.1 Introduction
The objective of the theoretical research presented in this article is to define criteria
that re-establish an ethical dimension in the economy and in particular in finance
that can be applied to computational finance. The relationship between ethics and computational finance allows us to restore to economics, and to computational finance itself, a vision that allows for the incorporation of a social intelligibility, getting out of the positivist straitjacket that defines them only in terms of reasoning and quantitative mathematical dimensions.
With these assumptions, this chapter aims to offer the reader a critical reflection on
computational finance starting from the principles of ethical finance. With this term,
we refer to those principles that arose from the 1970s onwards and propose to implement socio-environmental values in financial activities, from savings to employment, also in response to the process of financialization of the economy that has removed finance itself from the real life of local populations. Starting from a critical analysis of
economic positivism that introduced the massive use of mathematics in economics,
a reflection is proposed on the concept of financial accounting and on the role of the real purchasing power of wages, in order to create a financial system that determines anew a socio-environmental horizon toward which the economy must strive. With these
assumptions, financial tools are proposed based on the principles of ethical finance
and how they can promote a process that we call economic socialization, that is to
allow finance to carry forward again the social and environmental values necessary
for the life of local communities.
With these assumptions, the first paragraph introduces the concept and problems
of economic positivism and how the process of financialization of the world economy
and its impact on the financial system has been produced since 1970. In this context,
some theoretical concepts are proposed such as that of the purchasing power of
wages to determine a finance linked to the workforce. The second, introduces the
principles of ethical finance, from birth to the present day. The third introduces some
financial and socio-environmental measurement tools and models of ethical finance
that could be introduced in computational finance. Finally, the fourth proposes some
conclusions starting from the arguments set out, including the process of economic
socialization, or how finance is called to carry forward the socio-environmental values
of local communities, under penalty of losing the conditions of real well-being for
our societies. In this scenario, it is proposed that finance must respond to a demand
for peoples’ rights.
If today we can talk about ethical finance, it is thanks to the diffusion of hundreds and hundreds of initiatives around the world over the last fifty years, carried forward by a movement characterized by a vision of development in which "money" is closely linked to human and social growth even before economic growth, that is, by a use of money consistent with the values of a community ethic. In other words, this does not mean that the responsible use of money and its implications is being asked about only today; this has always been the case, more or less explicitly, probably since the times in which human communities began to use this medium of exchange which is money (we can find it in Aristotle too1 ).
This is an important premise because it allows us to understand how the ideas that
generate change, and we are convinced that this also applies to ethical finance, are
not born only on the basis of a particularly brilliant invention, but rather respond to questions of meaning that people ask when they feel they are an active and responsible part of their community. Hence the political value of ethical finance, which takes nothing away from the ideal value of individual experiences but which, as a movement that networks many experiences on a global scale, lays the foundations for a review
of economic and financial instruments according to a new idea of “development”,
thus breaking the molds of a dated and outdated reading of the world (north against
south, developed countries against less developed countries, etc.) and opening up
new scenarios of cooperation between communities.
Until a few years ago, those who became aware of ethical finance struggled to
see its financial implications, tending more to treat it as an activity falling within
the categories of the spirit rather than economic ones. In grasping above all the
social, solidarity and ecological aspects, there was the risk that this initiative would
be interpreted more as a sophisticated form of philanthropy and/or charity than a
proposal capable of also guaranteeing economic value. On the other hand, there
were those who, imbued with a mainstream financial culture, criticized this proposal
of ethical finance because, having to respond also to non-economic values, it was
not effective in its ability to generate value. Unfortunately, this “rational” approach,
albeit legitimate in its assumptions, has not been able to prevent a substantial part of
finance from changing its “skin”, becoming increasingly autonomous in its ability to
generate wealth independently of its support for the actual economic activity; it is no
coincidence that numerous authors expressly speak of the “financialisation” of the
economy, of detachment from the real economy and of the prevalence of income from
capital over income from workforce. However, these “external” critical readings have
not blocked a movement which in various parts of the world is instead taking hold
and above all is changing a culture of participation as it relates the classic instruments
of democracy (voting, delegation, choice of one’s own representatives, adhesion to
intermediate bodies, etc.) with choices in the field of consumption, savings, mobility,
work.
1 Aristotle [1].
In this part of the chapter we will try to analyze principles of ethical finance
from the movement of the 1970s onwards, even if we find interesting experiences, in the sense that they reached a considerable size, already in the first half of the last
century, such as some American investment funds promoted by some institutions
of Protestant faith (see Pioneer Fund of the Mennonites 1928…), or those used
by the movements against the war in Vietnam and against nuclear power (civil and
military). All these investment funds, whether determined by religious or secular and
social motivations, were however characterized by negative criteria (no weapons, no
alcohol, no prostitution, no politics, etc.) and not positive ones. Furthermore, when
we speak of ethical finance, we cannot fail to mention other banking experiences
with a strong social vocation born towards the end of the nineteenth century, such as
rural and artisan banks, an expression of a Christian social culture (see Raiffeisen2 ),
and popular banks, secular and socialist-inspired.
From the 1970s onwards, things began to change, and above all experiences were
no longer concentrated in Anglo-Saxon countries, where the culture of financial
investment was more widespread among the population. The first major oil crisis, the
rejection of the logic of the cold war, the dissatisfaction with a neo-liberal production
model governed by multinationals, the increasingly serious gap between the north and
the south of the world, etc., gave new stimuli to the movement of ethical finance, and people began to wonder not only about what not to finance but also about what to finance; if
you don’t agree with a certain way of producing and distributing wealth, the time has
come to combine a protest action, typical of movements, with a proposal, that is, to
offer the citizen the opportunity to make economic choices (produce, consume, save,
move) in a manner consistent with one’s ethics. It is for this reason that the movement
of ethical finance can only be understood by recognizing its complementarity with
other initiatives, always on a global scale, such as, for example, fair trade, organic
agriculture, the solidarity and civil economy (in its various expressions). In fact, a
careful observer does not escape how these experiences arose from an awareness and
consequent change of strategy of many of the social movements that characterized
the second half of the last century: from the movement for peace and nonviolence to
the environmentalist one, from that of human and social rights to that of North/South
cooperation, and, more generally, to a worldwide ramification of the cooperative
movement. Years of struggles, campaigns and actions, if they have been important
for changing the sensitivity of public opinion, have changed very little in the balance
of power between those who manage economic and financial power and those who
suffer from it. Hence the importance of articulating these non-economic values precisely in economic activities as well, starting from the assumption that there
cannot be schizophrenia in our way of thinking and acting, almost a double morality
that is used in different ways depending on the context in which we find ourselves:
ethical/solidarity in social life, oriented towards maximizing profit in economic life.
The common thread of these movements is the awareness that only by putting a
2 Raiffeisen is committed to a strong local economy with cultural, sporting and tourism initia-
tives from which Raiffeisen members and YoungMemberPlus clients can benefit in the form of
MemberPlus supplementary services. See https://www.raiffeisen.ch/rch/it.html.
It does not discriminate between job recipients on the basis of sex, ethnicity or religion, or
even on the basis of assets, thus caring for the rights of the poor and marginalized. It therefore
finances human, social and environmental promotion activities, evaluating projects with the dual
criteria of economic viability and social utility. Loan guarantees are another way for partners to
take responsibility for financed projects. Ethical finance considers equally valid, like patrimonial
guarantees, those forms of personal, category or community guarantees that allow access to credit
even to the weakest sections of the population.
2. Consider efficiency a component of ethical responsibility.
It is not a form of charity: it is an economically viable activity that intends to be socially useful.
Assuming responsibility, both in making one’s savings available and in using them to preserve their
value, is the foundation of a partnership between subjects of equal dignity.
3. It does not consider enrichment based solely on the possession and exchange of money to be
legitimate.
The interest rate, in this context, is a measure of efficiency in the use of savings, a measure of
the commitment to safeguard the resources made available by savers and to make them bear fruit in
vital projects. Consequently, the interest rate, the return on savings, is different from zero but must
be kept as low as possible, on the basis of both economic and social and ethical evaluations.
4. It is transparent.
The ethical financial intermediary has the duty to treat with confidentiality the information
on savers that it comes into possession of in the course of its activity, however the transparent
relationship with the customer imposes the naming of the savings. Depositors have the right to
know the financial institution’s operating processes and its employment and investment decisions.
It will be the responsibility of the ethically oriented intermediary to make the appropriate information
channels available to ensure transparency about its activity.
5. It provides for the participation in the important decisions of the company not only by the
shareholders but also by savers.
The forms can include both direct mechanisms of indication of preferences in the destination
of funds, and democratic mechanisms of participation in decisions. In this way, ethical finance
promotes economic democracy.
6. It has social and environmental responsibility as its reference criteria for employment.
It identifies the fields of employment, and possibly some preferential fields, by introducing
reference criteria based on the promotion of human development and social and environmental
responsibility in the economic investigation. In principle, it excludes financial relationships with
those economic activities that hinder human development and contribute to violating fundamental
human rights, such as the production and trade in arms, production seriously harmful to health and
the environment, activities based on exploitation of minors or the repression of civil liberties.
7. It requires global and coherent adherence by the manager who directs all of its activities.
If, on the other hand, the ethical finance activity is only partial, it is necessary to explain, in a
transparent way, the reasons for the limitation adopted. In any case, the intermediary declares its
willingness to be ‘monitored’ by institutions guaranteeing savers.
Product approach versus systemic approach: Ethical finance operators put the
assessment of social and environmental impacts at the heart of all proposed financial
products and all corporate practices, including, for example, manager remuneration
policies; the incentives; etc. Environmental and social impact assessments are a full
part of the internal control system on all activities.
Governance Models: The intermediary that practices ethical finance must have transparent and
participatory governance.
Weight of ESG Parameters: Ethical finance evaluates with specific criteria and indicators
every aspect (environmental, social, and governance) of the activities it finances with credit
and investments, as well as their interrelationships. Exclusion criteria are adopted in different
sectors, with low tolerance thresholds. It has its own methodology that draws on national and
international databases, integrating them with those of non-governmental organizations and
using them actively rather than passively applying scores provided by third parties.
Lobby versus Advocacy: Ethical finance invests in critical finance-education projects that
make people aware of the social and environmental risks of 'casino finance' and calls on
institutions to regulate and tax finance so that it can contribute to healthy and inclusive
development across the globe. Other requests include the separation of commercial and
investment banks, the fight against tax havens (for example, through the universal adoption
of country-by-country reporting), and limits on the use of derivatives, among others. These
initiatives are carried out in a widespread way thanks to the active involvement of the
members (participation).
Engagement and critical shareholding: Ethical finance seeks dialogue with the
companies in which it invests to stimulate them to constantly improve their social
and environmental performance.
The distinction between positive and normative economics dates to 1891, when it was proposed
by John Neville Keynes, father of the famous John Maynard Keynes. Positive economics is
understood as "the description of the functioning of an economic system as it is", and normative
economics as "the evaluation of what is desirable, its costs and benefits". For Amartya Sen, Nobel
laureate in Economics in 1998, this distinction poses the crucial problem of the contemporary era,
the one that demands answers: how can utilitarian rationalism be overcome and the distance
between ethics and the principles of classical economics be reduced? It must now be said that with
utilitarianism, whose main exponents were Jeremy Bentham and John Stuart Mill, a concept of
well-being (including community well-being) is affirmed that is based solely on personal interest
in terms of pleasure and pain, and that fully denies those elements of the process this may entail,
whether they are considered fair or unfair. All this culminated in the shift from a community ethic
to a utilitarian ethic, exemplified by the distinction between the bonum honestum and the bonum
utile (the latter calculable in mathematical terms, mainly as the sum of the benefits and pleasures
obtained, giving rise to the famous "table of calculations of pleasures"), which shows and further
confirms how society becomes almost a by-product of an economy subordinated to the utilitarian
laws of maximizing monetary profit.
In this conceptual framework, the following section offers a critique of computational finance
as rooted in a positivist matrix and then proposes how to overcome some of the limits highlighted,
through the application of the principles of ethical finance set out above. To do this, we first
reconstruct some principles of computational finance, highlighting critical points, and then propose
some indices that could be incorporated.
Before proposing indices and measurement tools based on ethical finance, it is worth analyzing
what we regard as the limits of computational finance.
Computational finance is a direct extension of economic positivism. By this term we refer to
"the science that studies economic systems as they are", through mathematical rationality and the
maximization of individual interest, leaving out considerations of a normative type, those we could
capture with the question: what kind of society do we want to build in terms of values, ethical
principles, rights, or identity? In this sense, economic positivism is interpreted as the attempt to
transform "social behavior" into mathematical reasoning based on rational individual interest that
can be materially quantified, to the detriment of a subjective identity that is more complex to define,
and that leaves in the background the "cultural implications" (values, principles, meanings, etc.)
that contribute to defining individual and community identity. In other words, the economy ended
up dealing only with factors that can be mathematically quantified, to the detriment of a social
identity also made up of elements that are not mathematically quantifiable. For these reasons, by
positivization of the economy we also understand a systematization of its functioning that dispenses
entirely with the transcendental dimension in order to apply mainly the laws of physics, statistics,
and (natural) mathematics, leaving the other factors in the background, and that concentrates on
describing the construction of wealth only as material facts. In other words, it could be said that
economic positivism is based on an unlimited faith in mathematics and in the ability to transform
the world in the name of progress and productive growth driven by technological innovation (the
use of the steam engine, electricity, and the expansion of railways). For these reasons, economic
positivism can be considered an attempt to establish the foundation for rational intervention in
society and the economy or, better said, the use of empirical reason to modify and direct social
behavior, which involves eliminating its metaphysical and transcendental implications. In
economics, in our view and for the reasons set out in this work, this begins with mercantilist
practices and, in theoretical terms, with the physiocrats first and with Smith later [16]. In addition,
it rests on the assumption that man is a rational being who acts only in his own interest.
Starting from the three areas of intervention of computational finance set out above, we now try
to propose some indices that can unite the principles of computational finance and the criteria of
ethical finance. With this objective, we propose three formulas as a general methodology, into
which coefficients reflecting the principles of ethical finance considered important can be
incorporated.
• Investment Banking Index with social and environmental implications (IBSEI)
When we evaluate investment banking, we have to consider the entire money supply chain:
where the money comes from, how it is managed, and what it generates for society and the
environment.
If the money comes from negative circuits, according to the principles set forth in ethical
finance, a negative coefficient is set. If the money comes from positive circuits, a positive
coefficient is set.
If the money is invested in negative or positive circuits, the same considerations apply to the
coefficients.
The values of the coefficients are decided on the basis of the weight that a principle has in
reality, which is determined through a subjective process at the social level. In formula terms:
%R = f(QI · (± coef)) · T

where
R is the risk;
QI is the amount of money invested;
coef is the positive or negative coefficient applied; it can range from −10 to +10 and cannot be
equal to zero;
T is the coefficient of transparency of the money cycle (expressed as a percentage).
• Business Management Index with social and environmental implications (BuSEI)
In the case of a company, we can consider an index based on its final profit G:

%R = f(G · (± coef)) · T
Finally, we propose some indicators used in ethical finance that can be incorporated to calculate
the positive/negative coefficient (a computational sketch follows the list). They are:
• Environmental
– % of CO2 emissions of the entire value chain
– % use of renewable energy across the value chain
– % Use of biodegradable products
– % of packaging with natural products
– …
• Socio-economic
– % of vulnerable people working
– Equality of wages between men and women for the same tasks throughout the
value chain
– Compliance with the official employment contracts required by the countries
– Respect for Human Rights in general throughout the value chain
– % of investments in strategic sectors for the local community
– % of investment in real and non-speculative productive sectors
– Existence of methodologies for involving all the stakeholders of a community
in decisions where to invest money
– …
• Financial management
– Transparency (existence of information on the origin of the money, use,
investment)
– Internal democracy of financial institutions (one person, one vote)
– …
The results presented in this article demonstrate how computational finance itself
can be treated in terms of social reasoning, i.e. meanings that guide decisions, and
risks that are defined not only in terms of monetary quantities but also on the basis of
subjectively perceived priorities concerning the vision of life of citizens in general
terms. That is, they also concern general environmental and social values that are
part of the real well-being of citizens.
For these indices to work, there is a need for a systemic approach and for an
external certification model that issues the coefficient grades at an international
level for shareholders, banks, and companies. We could say that a change in the
world order is needed, even in Stiglitz's [14] terms: in his Freefall, he states
that without a change in the international order it is not possible to overcome the
problems induced by today's financial system, which has lost its link with reality
and even with the economy.
This could mean generating an evaluation system for shareholders, banks and
companies that pays attention to the socio-environmental impact and which must
be implemented by every country. This would allow citizens, businesses, and shareholders
to decide on the basis of a more complete information system than mere calculation of
monetary returns. This socio-economic financial order would allow
economic activities to be directed again towards a system of socio-environmental
well-being and to get out of the logic of utilitarian rationalism, which is, as Amartya
Sen states, one of the problems of our age. In other words, it would allow those who
invest to introduce ethical elements into their decisions too.
Finally, it must be said that these indices can undoubtedly capture only partial aspects of
computational finance, or be reductive. But the goal of indices that integrate the two sets of
criteria is to bring a normative, and not just positivist, dimension into computational finance,
one capable of making clear that, despite an increase in monetary results, an investment can
sometimes have a negative impact on the environment or society that affects the real quality
of life of populations in the long term. In other words, an attempt is made to "calculate", if
this is the right term, also the social and environmental risks that the management of financial
values entails.
All this brings us back to the fact that the economy is a tool for managing
resources for the well-being of citizens and not a science for pursuing monetary
interests (chrematistics), since the two things do not always go together.
References
1. Aristotle: Nicomachean Ethics, Politics. Spanish version and introduction by Antonio
Gomez Robledo, 19th edn. Porrúa, Mexico (2000)
2. Arrighi, G.: Adam Smith en Pekín. Orígenes y fundamentos del siglo XXI, Akal, Madrid (2007)
3. Bee, M., Santio, F.: Finanza quantitativa con R. Apogeo (2013)
4. Kumar, B., Kumar, L., Sassanelli, C.: Green Finance in Circular Economy: A Literature Review.
Springer (2023)
5. Curci, G.: Finanza quantitativa e modelli matematici, Plus (2007)
6. Krippner, G.: The financialization of the American economy. Socio-Economic Review 3, 173–
208 (2005)
7. Lapavitsas, C.: Financialization and capitalist accumulation: structural accounts of the crisis
of 2007–9. Discussion Paper Series 16, 1–10 (2010)
8. Oliva, I., Renò, R.: Principi di finanza quantitativa. Apogeo (2021)
9. Perna, T.: Fair trade. La sfida etica al mercato mondiale. Bollati Boringhieri (1998)
10. Piketty, T.: Le Capital au XXIe siècle. Éditions du Seuil, Paris (2013)
11. Pope Francis: Encyclical Letter Laudato si' of the Holy Father Francis on care for our common
home (May 24, 2015)
12. Sen, A.: Lo sviluppo è libertà. Perché non c'è crescita senza democrazia. Arnoldo Mondadori,
Milan (2000)
13. Sen, A.: Etica ed economia. Laterza, Bari (2003)
14. Stiglitz, J.E.: Freefall: America, Free Markets, and the Sinking of the World Economy. W. W.
Norton (2010)
15. Vigliarolo, F.: Le imprese recuperate. Argentina, dal crac finanziario alla socializzazione
dell'economia. Città del Sole e Altreconomia Edizioni, Reggio Calabria (2011)
16. Vigliarolo, F.: La economía es un fenómeno social. Principios de fenomenología económica.
EUDEBA, Buenos Aires (2019)
Abstract Banks in India are facing many challenges and witnessing many changes
in recent times. Managing Non-Performing Assets (NPAs) has emerged as a major
challenge for banks. This chapter presents the findings of a formal attempt to explain
NPA variations from 2005 to 2017. The findings are based on the application of various
data mining techniques, such as random forest, elastic net regression, and the k-NN
algorithm, to understand the NPAs of banks in India. The study uses gross NPA
as the dependent variable and other bank-specific and macroeconomic variables as
independent variables. The experimental results show that elastic net regression is
the best data mining technique for modelling NPAs in the given context. Also, the
empirical results in all the models provide strong evidence that certain variables,
such as the previous year's NPA and the loan amount disbursed, have an impact on
NPAs. The findings of the study will provide policy directions to the banking sector
and the government to control the quantum of NPAs in the financial system.
6.1 Introduction
Banks in India are key contributors to the country's economic growth, providing financial
services to the government, businesses, and individuals. Indian banks have been
resilient and have withstood crises such as the 2008 financial crisis and the Covid-19 pandemic.
G. Kumar
Dr. B. R. Ambedkar National Institute of Technology (NIT) Jalandhar, Jalandhar, India
e-mail: [email protected]
A. K. Misra
Vinod Gupta School of Management, Indian Institute of Technology Kharagpur, Kharagpur, India
e-mail: [email protected]
The study revealed that high market power, or low competition, measured
using the Lerner Index, lowers financial distress and is positively related to financial
stability in banks in India.
Chawla and Rani [4] developed a structured questionnaire to collect data from
officers working in the credit department in different banks in India. The research
findings reveal the bankers' perspective and bring some practical insights into the
factors behind specific NPA resolution strategies. The study identified seven significant
management dimensions out of 21 based on exploratory factor
analysis (EFA). The study provides suggestions on effective credit management and
improving the asset quality of banks in India.
Gupta and Jain [7] examined the moral hazard behavior of Indian banks by
observing the impact of the level of Net Non-Performing Loans (NNPL) on lending
behavior. The study makes use of the lagged value of NNPL to determine distress
levels. The analysis shows that NPLs increase as the loan growth ratio increases
when banks have experienced sizable prior loan losses, but not when banks are
relatively safe, indicating moral hazard behavior.
A recent study by [3] examines the impact of the Covid-19 pandemic on
Bangladesh’s banking sector. The findings suggest that large banks are more vulner-
able to the risk posed by the pandemic. The study found that all banks are likely to
see a fall in risk-weighted asset (RWA) values, capital adequacy ratios (CAR), and
interest income at the individual bank and sectoral levels. The effects across these three
dimensions increase disproportionately after an NPL shock of a higher degree.
The study of [14] highlights the need for both better utilization of resources and
scale expansion. In conclusion, the discussion of the literature on NPAs in banks
highlights the causes and effects of this problem. The literature on NPAs in banks
in India discusses the complexity of the problem and the need for a multi-pronged
approach to address it. While measures such as the Insolvency and Bankruptcy Code
(IBC) and the Asset Quality Review (AQR) have been effective in addressing the
problem to a certain extent, there is a need for greater focus on understanding the
drivers of NPA. This will help to design preventive measures to avoid the build-
up of NPAs in the first place. The literature also emphasizes the need for banks to
improve their governance and risk management practices to prevent the recurrence
of the problem in the future. Overall, the literature provides valuable insights into
the problem of NPAs in banks in India and highlights the need for sustained efforts
to address the problem.
Quarterly data are collected from the Reserve Bank of India (RBI) portal and then
aggregated at the yearly level. The period of the study is from 2005 to 2017.
The definitions of the variables are provided in Table 6.1. The output variable of the
study is gross non-performing assets (GNPA). As per Fig. 6.1, the GNPA data
closely resembles a normal distribution. Following studies that evaluated NPAs in
banking through their exogenous determinants, this article uses EPU, the Lerner index,
ROA, NIM, operational risk, leverage, liquidity ratio, GDP growth rate, interbank
rate, G-sec 10-year yield, asset diversification, income diversification, regulatory capital,
an ownership dummy, and loan size.
The study builds the ML models on the training set and evaluates their performance
on the test set. To achieve the optimal prediction of NPAs, we applied various
data mining techniques using the caret and glmnet packages in R. These libraries
provide functions to implement machine learning algorithms such as random forest
(RF), elastic net regression, and k-NN. The varImp function is used to explain
variable (feature) importance; it automatically assigns importance scores to the
variables in the range between 0 and 100.
Random Forest is a well-liked machine learning method utilized for both
classification and regression problems. It is an ensemble learning technique that
outputs the mode of the classes (classification) or the mean prediction (regression)
of the individual decision trees built during training. Each decision tree
in a Random Forest is built using a randomly selected subset of the features from the
input data. This increases the diversity of the individual trees in the forest and reduces
overfitting. Additionally, each tree is trained on a random subset of the training data,
using a process called bootstrapping. The algorithm predicts each input
data point by passing it through each decision tree in the forest. The final prediction
is then determined by taking the majority vote (for classification) or the average (for
regression) of the predictions made by all the trees.
Random Forest (RF) has been used extensively in finance. In order to create fraud
detection models, [11] used four statistical approaches, including parametric and non-
parametric models, and concluded that Random Forest has the highest accuracy and
that non-parametric models are more accurate than parametric ones. Khaidem et al. [9]
and Kumar and Thenmozhi [10] predict trends in stock market prices using the RF
algorithm.
Random Forest has several advantages over other machine learning algorithms.
It can handle large datasets with a high number of features, and it is less prone to
overfitting than other methods. Additionally, it is highly scalable and can be easily
parallelized, making it ideal for big data applications.
To illustrate, consider a simple linear regression model:

Y = a + b X_1 + c X_2 \qquad (6.1)

Ridge regression shrinks the coefficients by adding to the least-squares loss a penalty on their squared magnitudes:

\lambda b^2 + \lambda c^2 = \lambda \|w\|^2 \qquad (6.2)

Elastic net regression combines this L2 (ridge) penalty with an L1 (lasso) penalty on the absolute values of the coefficients, thereby performing both coefficient shrinkage and variable selection.
Elastic net regression is now finding applications in finance. Identifying the
variables most closely associated with credit risk is a key challenge for online
financing; [5] creates a new multiple structural interacting elastic net model for
feature selection in order to efficiently discover the most important features for
credit risk assessment in online financing. On the other hand, [12] predicts house
prices using elastic net regression and other algorithms, for those who do not own
homes, based on their financial resources and goals.
6.3.3.3 k-NN
This study applied repeated training and testing across five resampling procedures. The
data set was split 60:40, which provides a larger and more reliable validation set. After
training each model, we use it to generate predictions and present the evaluation
results for both the training and test datasets.
The study acknowledges that a single metric for evaluating model performance may
not be sufficient to draw a definitive conclusion about the superiority of one model
over another. Therefore, the study makes use of Root Mean Square Error (RMSE),
Mean Absolute Error (MAE), and R-squared values to evaluate the performance of
the models. The RMSE and MAE values represent the average difference between
the predicted and actual values, where lower values indicate better performance. The
R-squared value represents the proportion of the variance in the dependent variable
that is explained by the independent variables, where higher values indicate better
performance.
Table 6.2 presents results from the random forest model with different values of
mtry (the number of variables randomly selected at each split) and their corre-
sponding performance metrics: RMSE (Root Mean Squared Error), R-squared, and
MAE (Mean Absolute Error).
Based on the results, the model performance is best when mtry is equal to 6. This
value of mtry corresponds to an RMSE of 0.654, an R-squared of 0.821, and an
MAE of 0.415. As mtry increases or decreases from this optimal value, the model’s
performance tends to worsen slightly but remains relatively stable. Table 6.3 shows
the importance of the nineteen variables when predicting the GNPA of the banks. As per
the random forest model results, the lag of gross NPA emerged as the most important
variable, followed by loan advances and net interest margin respectively.
The interbank interest rate, G-sec rate, and lag of GDP emerged as the least important
variables. Figure 6.2 presents the robustness of the random forest model by testing
it on the test data set and regressing the predicted against the measured values. The RMSE
between predicted and measured test set values is 0.644.
Fig. 6.2 Validation of the random forest model. Notes This figure shows the values predicted by the
random forest model on the Y-axis and the values in the test data set on the X-axis
The results of the elastic net regression are presented in Table 6.4, which reports the
performance of the model for different tuning parameters. The analysis used cross-
validated resampling: the data were divided into five folds, and the analysis was
repeated five times. The results show that the optimal model has an alpha value of
0.601 and a lambda value of 0.014.
The model’s performance was evaluated using three metrics: RMSE, R-squared,
and MAE. The RMSE value was used to select the optimal model because it had
the smallest value. The optimal model had an RMSE value of 0.616 and an R-
squared value of 0.826. The MAE value was 0.416. The R-squared value indicates
that the model explains 82.6% of the variance in the dependent variable, which
is a good fit. The RMSE value indicates that the average difference between the
predicted and observed values is 0.616. The MAE value indicates that, on average, the
model’s predictions are off by 0.416 units. Overall, the elastic net regression analysis
suggests that there is a significant relationship between the dependent variable and
the independent variables, and the model’s performance is good.
Table 6.5 shows the importance of the nineteen variables when predicting the GNPA of
the banks. As per the elastic net regression model results, the lag of gross NPA emerged as
the most important variable, followed by loan advances and return on assets
(ROA) respectively. Regulatory capital and the credit-to-deposit ratio emerged as the
least important variables. Leverage and the interbank rate are not found to impact
GNPA. Figure 6.3 presents the robustness of the elastic net regression model by
testing it on the test data set and regressing the predicted against the measured values. The
RMSE between predicted and measured test set values is found to be 0.633.
The results of the k-nearest neighbor (k-NN) algorithm, shown in Table 6.6,
present the performance of the model for different values of k. The algorithm was
applied to a pre-processed dataset that was centered and scaled, and the performance
Fig. 6.3 Validation of the elastic net regression model. Notes This figure shows the values predicted
by the elastic net regression model on the Y-axis and the values in the test data set on the X-axis
was evaluated using cross-validation with a fivefold split. The summary of sample
sizes indicates that the dataset was divided into 5 sets with varying sample sizes
ranging from 183 to 187. The results table shows that as the value of k increases, the
performance of the model decreases, as indicated by the increase in the root mean
squared error (RMSE) and mean absolute error (MAE) and the decrease in the R-squared
value.
The results show that the k-NN algorithm performs best with a k value of 5, as
it has the lowest RMSE, highest R-squared value, and lowest MAE. However, it’s
important to note that the performance of the model is not significantly different
between k = 5 and k = 7, as the differences in the RMSE, R-squared, and MAE
values are relatively small. As the value of k continues to increase beyond k = 7, the
performance of the model decreases significantly.
Table 6.7 shows the importance of the nineteen variables when predicting the GNPA
of the banks. As per the k-NN model results, the lag of gross NPA emerged as the
most important variable, followed by loan advances and net interest margin
(NIM) respectively. Income diversification and the interbank rate emerged as the least
important variables. The lag of economic policy uncertainty (EPU) is not found to
impact GNPA. Figure 6.4 presents the robustness of the k-NN model
by testing it on the test data set and regressing the predicted against the measured values. The
RMSE between predicted and measured test set values is found to be 0.679.
Based on the summary of the performance metrics provided in Table 6.8, the
Elastic Net Regression model has the lowest RMSE (0.616) and the highest R-
squared value (0.826), indicating that it has the best overall predictive performance.
The Random Forest model has a slightly higher RMSE (0.654) and a slightly lower
R-squared value (0.821) than the Elastic Net Regression model, but a slightly lower
MAE (0.415). The k-NN Algorithm has the highest RMSE (0.750) and the lowest
R-squared value (0.776), indicating the worst overall predictive performance among
the three models.
The variable importance results in all models indicate that there is a relationship
between the Gross Non-performing assets of a bank in a given year and its Gross
Non-performing assets in the previous year. This suggests that if a bank has a high
level of non-performing assets in one year, it is more likely to have a high level of
Fig. 6.4 Validation of the k-NN model. Notes This figure shows the values predicted by the k-NN
model on the Y-axis and the values in the test data set on the X-axis
non-performing assets in the following year as well. This finding has implications for
banks and regulators, as it suggests that efforts to reduce non-performing assets may
need to focus not only on the current year’s performance but also on addressing the
root causes of non-performing assets in previous years. The result also indicates that
if a bank disburses a large number of loans each year, it is more likely to experience
a higher level of GNPAs in that year or subsequent years. This finding has important
implications for banks and regulators, as it highlights the need to carefully manage
and monitor loan disbursal activities to minimize the risk of non-performing assets.
6.5 Conclusion
Banks in India have been undergoing major changes in the dynamic environment
over the past few years and NPAs continue to be a major concern for the banks. The
rise in NPAs has had a significant impact on the profitability of banks and has posed
a threat to the stability of the financial system. Machine learning has the potential to
transform the banking and finance industry by enabling financial institutions to make
more informed decisions and mitigate risks. This study examines the bank-specific
and other macroeconomic determinants of NPAs in Indian banks using machine
learning methodology. The experimental results demonstrated that, based on the
lowest-RMSE and highest-R-squared criteria, elastic net regression has the best
predictive accuracy for modelling NPAs in the Indian banking system. The
performance of the elastic net regression is closely followed by that of random forest.
The k-NN model is the least preferred model. Consistently, across all the models,
the lag of gross NPA and the loan amount figure among the top features.
Based on these results, this study suggests that regulators and banks should focus
not only on controlling the non-performing assets of the current year but also on
addressing the root causes of non-performing assets in previous years. The study also emphasizes
that banks may need to develop effective risk management strategies to ensure that
loans are disbursed only to creditworthy borrowers and that the bank’s exposure to
default risk is appropriately managed. This article has its scope limited to the models
tested, studied period, and banks operating in India. The impact of the COVID-19
pandemic on banks’ NPAs is an interesting area of research that can be taken for
further exploration.
Appendix
1 Allahabad Bank
2 Andhra Bank
3 Axis Bank
4 Bank of Baroda
5 Bank of India
6 Bank of Maharashtra
7 Canara Bank
8 Central Bank of India
9 Citibank
10 Corporation Bank
11 Dena Bank
12 Deutsche Bank AG
13 Federal Bank
14 HDFC Bank
15 HSBC Ltd
16 ICICI Bank
17 IDBI Bank Ltd.
18 Indian Bank
19 Indian Overseas Bank
20 IndusInd Bank
21 Oriental Bank of Commerce
22 Punjab And Sind Bank
23 Punjab National Bank
24 State Bank of India
25 Syndicate Bank
26 UCO Bank
27 Union Bank of India
28 United Bank of India
29 Vijaya Bank
30 Yes Bank Ltd.
References
1. Arrawatia, R., Dawar, V., Maitra, D., Dash, S.R.: Asset quality determinants of Indian banks:
empirical evidence and policy issues. J. Public Aff. 19(4), e1937 (2019)
2. Ban, T., Zhang, R., Pang, S., Sarrafzadeh, A., Inoue, D.: Referential k-NN regression for finan-
cial time series forecasting. In: Neural Information Processing: 20th International Conference,
ICONIP 2013, Daegu, Korea, November 3–7, 2013. Proceedings, Part I 20, pp. 601–608.
Springer Berlin Heidelberg (2013)
3. Barua, B., Barua, S.: COVID-19 implications for banks: evidence from an emerging economy.
SN Bus. Econ. 1, 19 (2021). https://doi.org/10.1007/s43546-020-00013-w
4. Chawla, S., Rani, S.: Resolution of non-performing assets of commercial banks: the evidence
from banker’s perspective in Indian banking sector. Ind. Econ. J. 70(4), 635–654 (2022). https://
doi.org/10.1177/00194662221118318
5. Cui, L., Bai, L., Wang, Y., Jin, X., Hancock, E.R.: Internet financing credit risk evaluation
using multiple structural interacting elastic net feature selection. Pattern Recogn. 114, 107835
(2021)
6. Garg, N.: Factors affecting NPAs in Indian banking sector. Paradigm 25(2), 181–193 (2021).
https://doi.org/10.1177/09718907211035594
7. Gupta, C.P., Jain, A.: A study of banks’ systemic importance and moral hazard behaviour: a
panel threshold regression approach. J. Risk Fin. Manage. 15(11), 537 (2022)
8. Kanoujiya, J., Rastogi, S., Bhimavarapu, V.M.: Competition and distress in banks in India: an
application of panel data. Cogent Econ. Financ. 10(1), 2122177 (2022)
9. Khaidem, L., Saha, S., Dey, S.R.: Predicting the direction of stock market prices using random
forest (2016). arXiv preprint arXiv:1605.00003
10. Kumar, M., Thenmozhi, M.: Forecasting stock index movement: a comparison of support
vector machines and random forest. In: Indian Institute of Capital Markets 9th Capital Markets
Conference Paper (2006)
11. Liu, C., Chan, Y., Alam Kazmi, S.H., Fu, H.: Financial fraud detection model: based on random
forest. Int. J. Econ. Fin. 7(7) (2015)
12. Madhuri, C.R., Anuradha, G., Pujitha, M.V.: House price prediction using regression tech-
niques: a comparative study. In: 2019 International Conference on Smart Structures and Systems
(ICSSS), pp. 1–5. IEEE (2019)
13. Maiti, A., Jana, S.: Determinants of profitability of banks in India: a panel data analysis. Schol.
J. Econ. Bus. Manage. (SJEBM) 4, 436–445 (2017)
14. Maity, S., Sahu, T.N.: How far the Indian banking sectors are efficient? An empirical
investigation. Asian J. Econ. Banking 6(3), 413–431 (2022)
15. Olekar, R., Talawar, C.: Non-performing assets management in Karnatak Central Co-operative
Bank Ltd. Dharawad. Int. J. Res. Commerce Manage. 3(12), 126–130 (2012)
16. Raina, D., Sharma, S.K., Bhat, A.: Commercial banks performance and causality analysis.
Glob. Bus. Rev. 20(3), 769–794 (2019). https://doi.org/10.1177/0972150919837077
17. Rajaraman, I., Bhaumik, S., Bhatia, N.: NPA variations across Indian commercial banks: some
findings. Econ. Polit. Weekly, 161–168 (1999)
18. Rao, M., Patel, A.: A study on non-performing assets management with reference to public
sector banks, private sector banks and foreign banks in India. J. Manage. Sci. 5(1), 30–43
(2015). https://doi.org/10.26524/jms.2015.4
19. Subha, M.V., Nambi, S.T.: Classification of stock index movement using k-Nearest Neighbours
(k-NN) algorithm. WSEAS Trans. Inf. Sci. Appl. 9(9), 261–270 (2012)
20. Swami, O.S., Nethaji, B., Sharma, J.P.: Determining risk factors that diminish asset quality of
Indian commercial banks. Glob. Bus. Rev. 23(2), 372–384 (2022). https://doi.org/10.1177/097
2150919861470
Dr. Gaurav Kumar is Assistant Professor at NIT Jalandhar. He has done post-doctoral research
in the area of Financial Data Analytics at University College Dublin (UCD), Ireland. He has
obtained a Ph.D. degree from the Indian Institute of Technology (IIT), Kharagpur, and an MBA
degree from the Indian Institute of Foreign Trade (IIFT), Delhi. He has studied the liquidity of
midcap stocks as an area of research for his doctoral thesis. He holds an engineering degree in
Computer science from the National Institute of Technology (NIT), Allahabad. He has industry
experience in SAP ERP consulting while working for Tata Consultancy Services (TCS). He has
received many awards and grants, including the UGC Junior Research Fellowship (JRF). His research
work has been presented at international conferences hosted by the American Economic Association
(USA), the ICMA Centre (London), Corvinus University (Hungary), and La Trobe University
(Melbourne, Australia). Recently, he has published in top-tier journals, viz. the European Journal of
Finance, the Journal of Behavioural and Experimental Finance, and the Asian Journal of Economics.
His broad research interests include stock markets, corporate finance, and financial analytics.
Dr. Arun Kumar Misra is Associate Professor at IIT Kharagpur. He received his MPhil and
Ph.D. from the Indian Institute of Technology (IIT), Bombay. He has more than 20 years of
industry and teaching experience. He worked in a leading PSU Bank in various areas of banking
like Credit Planning, Basel implementation, ALM, Capital Planning, Profit Planning, CRM, and
Market Risk, etc. As a senior banker, he has completed the required certifications related to
Management Accounting, Foreign Exchange, Risk Management, Banking Laws, and Banking IT
services. Under the guidance of Dr. Misra, nine Ph.D. students of IIT Kharagpur have obtained
their degrees. He has conducted a significant number of MDPs for banks, manufacturing
companies, and government departments. He has completed consulting assignments for the Ministry
of Statistics and Programme Implementation (MoSPI), ICSSR, IRDA, and Banks. Dr. Misra has
several publications in national and international journals. His research interests are in the areas of
financial markets, market micro-structure, corporate finance, risk management, banking, and asset
pricing.
Chapter 7
Multiobjective Optimization of Mean–Variance-Downside-Risk Portfolio Selection Models
7.1 Introduction
Modern portfolio theory has its foundation in the seminal work of Markowitz [23].
Harry Markowitz proposed the first mathematical model for portfolio selection. His
model was the first mean-risk model and introduced variance as the risk measure,
forming the so-called Mean–Variance model. Since the Mean–Variance model was
introduced, other researchers have applied different risk measures, such as semi-variance,
Value-at-Risk, and absolute deviation (see e.g., [14, 16, 25–28, 30]).
However, the Mean–Variance model is adequate only if (i) the distribution of
the rates of return is multivariate normal or (ii) the utility function of the investor
is quadratic. If the normality assumption on the rates of return is not met, tail
returns may occur more frequently than a Mean–Variance model predicts. For this
reason, many researchers have proposed Mean-Downside-Risk portfolio optimization
models, replacing variance with another (downside) measure of risk. Variance,
however, remains the most widely used risk measure in the practice of portfolio
optimization. Moreover, many investors may consider a portfolio obtained with an
alternative mean-risk model unacceptable, since it may have large variance and
consequently a small Sharpe ratio [18]. Conversely, a portfolio with minimum
variance might carry unacceptable tail risk (extremely unfavorable outcomes).
One way to tackle these issues is by considering the higher moments of return
distributions in the portfolio selection process [22]. Skewness accounts for downside
risk: if a return distribution is "skewed to the left," the cubed extreme negative values
dominate, making skewness negative, and such negative skewness signals downside
risk to be avoided. Moreover, skewness accounts for favorable tail events: if the
distribution of the rates of return is "skewed to the right," the extreme positive values,
when cubed, dominate the skewness measure, resulting in a positive skewness, which
should be maximized [4]. Thus, in this study skewness is considered as an additional
objective function beyond mean and variance, to be maximized in order to account
for the tail returns of the return distribution.
An alternative way to address the issues of non-normality has been proposed by
Roman et al. [29], who introduced the Mean–Variance-Downside-Risk portfolio
optimization models. In their study they propose a Mean–Variance-CVaR multiobjective
portfolio optimization model with CVaR as a third criterion in order to account for
tail risk. However, there are other equally important downside risk measures, identified
in Bodie et al. [4], that have not been used by researchers in the context of
Mean–Variance-Downside-Risk multiobjective portfolio optimization. These are
VaR (Value at Risk) and the lower partial standard deviation (LPSD). LPSD is computed
by considering only deviations of portfolio returns below a given threshold, which is
usually the return of a risk-free asset. Specifically, it takes only the negative deviations
from the risk-free rate of return, squares those deviations, averages them, and then
takes the square root to obtain a "left-tail standard deviation". Value-at-Risk (VaR)
describes the maximum loss (the negative of the portfolio's rate of return) of a portfolio
that will not be surpassed during a specified period, with a given probability, for
example, the 5th or 1st percentile rate of return. In this study we analyze for the first
time these multi(three)objective portfolio optimization models by means of a
multiobjective evolutionary algorithm.
The introduction of these objective functions into the Mean–Variance model
results in portfolio optimization problems that are very difficult to solve. The
problems are non-linear multiobjective, in fact tri-objective, optimization
problems. Portfolios are evaluated, described, and compared using three statistics: the
traditional expected return and variance, plus a downside risk measure (in this study,
VaR or LPSD) or skewness. By introducing these statistics into the Mean–Variance
model, the efficient frontier becomes a surface in a higher-dimensional space. Traditional
mathematical programming algorithms have difficulty solving these multiobjective
optimization problems, at least within reasonable computational effort.
Consequently, alternative solution techniques are required for computing the
efficient frontier. Evolutionary computation [24] is a family of population-based,
random search heuristics that imitate the principles of Darwin's theory of evolution
and are appropriate for tackling optimization problems with tough search landscapes
(e.g., large and multimodal search spaces, complex constraints, nonlinear and
non-differentiable objective functions, multiple objective functions). A branch of
evolutionary computation is the so-called multiobjective evolutionary algorithms
(MOEAs), which are especially designed to approximately solve optimization problems
with two or more objective functions. The main advantage of MOEAs is that they
produce a good approximation of the efficient frontier in a single run and within
little computing time [2]. In this study, instead of transforming the multiobjective
portfolio optimization problems into single-objective ones, we approximately solve
and analyze the problems in their multiobjective nature. Thus, we compute and
analyze-evaluate a set of efficient portfolios. We show that taking downside risk into
account as a third criterion in the classical Mean–Variance model is a profitable
investment decision.
Some studies explore an additional criterion in the portfolio selection problem.
Garcia-Bernabeu et al. [15] introduced sustainability as a third criterion within the
Mean–Variance portfolio model, catering to ethical or green investors who seek to
integrate financial gains with social benefits. They employed a modern multiobjec-
tive genetic algorithm called ev-MOGA, emphasizing ε-dominance. Additionally,
Chen et al. [8] suggested a blended method for multiobjective portfolio optimization
problems involving higher moments. Moreover, Mamanis [19] conducted a computa-
tional analysis comparing various MOEAs in another three-objective portfolio opti-
mization scenario. More recently, Mamanis [20] proposed and empirically exam-
ined a multi(three)objective portfolio optimization model, incorporating two-tailed
performance measures and a utility function to assess financial performance.
In this research, we experiment with a well-known and popular MOEA, namely
SPEA2 [32] for analyzing and evaluating the multiobjective portfolio optimization
problems. SPEA2 is a well-tested MOEA that has been applied in various real-
world optimization problems. Especially in portfolio optimization problems, Anag-
nostopoulos and Mamanis [1, 19] compared a variety of state-of-the-art MOEAs on
multiobjective portfolio optimization problems with binary variables and found that
SPEA2 was a very effective algorithm for providing an approximation of the effi-
cient frontier in reasonable computational effort. Here, in contrast with these studies,
our focus is on the models. Our goal is to show that solving these models approx-
imately can provide the investor with a variety of portfolios with very good return
characteristics. Furthermore, unlike most studies that consider portfolio optimiza-
tion problems with MOEAs and try to improve the in-sample results, our focus is
on the out-of-sample performance of the algorithm and, accordingly, of the portfolio
selection models. To our knowledge, no paper has been devoted to solving these
multiobjective portfolio optimization problems, with the exception of a few papers that
solve the Mean–Variance-Skewness portfolio model [8, 21]. Furthermore, our goal
is to compare the out-of-sample performance of the portfolio selection models, since
such a study is, to the best of our knowledge, absent from the literature.
The conventional portfolio selection model assumes a single investment horizon and
a finite set of n available financial assets. The investor’s task is to build a portfolio by
determining the allocation of capital across these assets to maximize profits at the end
of the investment period. Each decision variable x_i represents the proportion of the
available funds invested in risky asset i = 1, …, n. The return on each financial asset
(denoted by the random variable R_i) is initially unknown. The portfolio's return,
being a weighted sum of these random variables, is itself a random variable, as
expressed by the following equation:

R(x) = \sum_{i=1}^{n} x_i R_i

The investor aims to construct a portfolio that maximizes the return at the end of the
investment period, subject to the constraint that the sum of the proportions assigned
to all assets equals 1.
Many approaches have been proposed for choosing among different random vari-
ables [9]. A fundamental answer was given by Harry Markowitz in 1952 [23] who
proposed the Mean–Variance model. The mean is used to define the profit of the
investment and should be maximized while variance defines the risk of the invest-
ment and ought to be minimized. Since Markowitz’s work, many alternative risk
measures have been proposed.
In this spirit the bi-objective mean-risk portfolio optimization problem that must
be solved is given below.
max μ(x)
min ρ(x)
\text{s.t.} \quad x \in X = \left\{ x \in \mathbb{R}^n \;\middle|\; \sum_{i=1}^{n} x_i = 1, \; x_i \ge 0 \right\} \qquad (7.1)
Apart from the mean and risk objectives, these models incorporate a set of
constraints, forming a feasible collection of decision vectors denoted X. The
simplest way to delineate this feasible set is to stipulate that the weights sum to 1
and to prohibit short-selling, hence ensuring non-negative proportions
x_i ≥ 0, i = 1, …, n. In this study, the model with only the budget and short-sale
constraints is referred to as the simple model.
An extension of this simple model includes the introduction of additional real-
world constraints. Among the constraints commonly employed are the cardinality
constraint, which restricts the number of assets held to within specified lower (K_min)
and upper (K_max) limits, and quantity constraints, which restrict the capital invested
in each held security to fall within designated lower (l_i, for i = 1, …, n) and upper
(u_i, i = 1, …, n) bounds.
These constraints define the so-called cardinality-constrained portfolio optimization
problem. The additional real-world constraints (additional with respect to the
simple model) are described by Eqs. (7.3)–(7.5). Equations (7.3) and (7.4) describe
the cardinality and quantity constraints respectively.
\min \rho(x), \quad \max \mu(x)

\text{s.t.} \quad \sum_{i=1}^{n} x_i = 1 \qquad (7.2)

K_{\min} \le \sum_{i=1}^{n} \delta_i \le K_{\max} \qquad (7.3)

l_i \delta_i \le x_i \le u_i \delta_i, \quad i = 1, \dots, n \qquad (7.4)

\delta_i \in \{0, 1\}, \quad i = 1, \dots, n \qquad (7.5)
Moreover, two additional models are formed that use the skewness of the portfolio
as the additional objective function.
For computing the expected return, variance, and the other objective functions, the
following process is utilized in this paper. Let r_it be the observed historical return
of asset i in period t. Assuming that each period defines a different scenario that may
occur in the future with an associated probability p_t, all scenarios are considered
equally likely, thus p_t = 1/T, where T is the total number of scenarios.
For a portfolio x, its realization under period t is given by the following equation:

z_t(x) = \sum_{i=1}^{n} r_{it} x_i, \quad t = 1, \dots, T.
The expected return of the portfolio is calculated using the following formula:

\mu(x) = \sum_{t=1}^{T} z_t(x) \, p_t

and its variance by:

V(x) = \frac{1}{T} \sum_{t=1}^{T} \left[ z_t(x) - \mu(x) \right]^2
The Value-at-Risk (VaR) at a given confidence level α is the maximum loss (or the
minimum return) that a portfolio will not exceed with a probability α. Probability α
is a parameter of the risk function which is usually fixed at a very small number (e.g.,
0.01, 0.05 or 0.1) in order to account only for extreme losses or extreme minimum
returns. In this study a value of α = 0.1 is used. In the following equation of the
VaR function, the negative sign is used in order to describe loss since zt (x) describes
return. For example, a return of − 3% corresponds to a 3% loss.
\mathrm{VaR}_{\alpha}(x) = -\inf \left\{ z_{(t_{\alpha})}(x) \;\middle|\; \sum_{j=1}^{t_{\alpha}} p_{(j)} \ge \alpha \right\}
where z_{(j)} are the ordered returns such that z_{(1)}(x) \le z_{(2)}(x) \le \dots \le z_{(T)}(x), and p_{(j)} are their corresponding probabilities of occurrence.
Lower partial standard deviation is computed from the equation below:

\mathrm{LPSD}(x) = \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left[ \min\left(0, \; z_t(x) - r_f \right) \right]^2 }
The skewness of the portfolio is the third central moment:

S(x) = \frac{1}{T} \sum_{t=1}^{T} \left[ z_t(x) - \mu(x) \right]^3
For the sake of completeness, kurtosis is also a moment (the fourth central
moment) that accounts for the tail risk of the return distribution. The kurtosis of a
portfolio x = (x_1, …, x_n) is calculated by:
K(x) = \frac{1}{T} \sum_{t=1}^{T} \left[ z_t(x) - \mu(x) \right]^4
In this study, however, we concentrate on the first three central moments because
these are most commonly used in the specialized literature.
The above models are multi(three)objective optimization problems. There are
three conflicting objective functions: the mean, which should be maximized; the
variance, which should be minimized; and a third objective, either VaR or LPSD,
which should be minimized, or skewness, which should be maximized, over x.
A portfolio that simultaneously optimizes all three objectives hardly exists.
Thus, the aim in multiobjective portfolio optimization is to find all (or a discrete
set) of the optimal trade-off portfolios among the three objectives. These trade-off
portfolios form a special solution set which is called efficient in modern portfolio
theory parlance. The image of the efficient set in the objective space defines the
efficient frontier [12]. The intention of multiobjective portfolio optimization is to find
the efficient frontier and the set of efficient solutions, i.e., every solution (portfolio
structure) that is nondominated with respect to the three objective functions.
In the particular problems at hand, a feasible portfolio x1 is said to dominate
another feasible portfolio x2 iff μ(x1) ≥ μ(x2), V(x1) ≤ V(x2), and, depending
on the portfolio selection model, VaR(x1) ≤ VaR(x2) or LPSD(x1) ≤ LPSD(x2)
or S(x1) ≥ S(x2), with at least one strict inequality. This is the so-called
Pareto dominance relation in multiobjective optimization parlance.
Introducing a third objective function (beyond mean and variance) into the
portfolio selection model results in an efficient frontier that is a surface in
three-dimensional space. Computing the exact efficient surface for the resulting
multiobjective portfolio optimization problems is very difficult, if not impossible.
Furthermore, an additional difficulty arises from the introduction of the cardinality
and quantity constraints, which yield a mixed-integer nonlinear multiobjective
optimization problem. Usually, however, a discrete approximation of the efficient
surface is acceptable and, as we will show, sufficient. For computing the efficient
frontier for difficult optimization problems, i.e., optimization problems with large
solution spaces,
SV(s_i) = \left| \left\{ s_j \mid s_j \in A \cup B \,\wedge\, s_j \succ s_i \right\} \right|
where |·| denotes the cardinality of a set and the symbol ≻ denotes the Pareto
dominance relation defined in Sect. 7.2 for the multiobjective portfolio optimization
models considered in this study.
Thereafter, the fitness of every solution in both archive A and population B is
calculated, determined by the sum of the strengths of its dominators:
F(s_i) = \sum_{s_j \in A \cup B \,\wedge\, s_j \succ s_i} SV(s_j).
Following this procedure, all non-dominated solutions are assigned a fitness value
of zero. Solutions with lower fitness values are deemed superior to those with higher
fitness values, indicating a focus on minimizing fitness. Subsequently, the evaluate
operator enhances the fitness of each individual by incorporating a crowding value,
aiming to maintain diversity within the population and guide the search across the
entire efficient frontier. Density information is integrated by adjusting the fitness
value of each solution in both the archive and population, based on the inverse of the
k-th smallest Euclidean distance (measured in objective space) plus two. Following
evaluation, the update and truncate operators select the top individuals from both
archive A and population B based on their assigned fitness values. Then, the external
population A undergoes a reproduction scheme similar to single-objective evolu-
tionary algorithms, resulting in the offspring population for the next generation. This
process iterates until a stopping criterion is met. Finally, the algorithm returns the best
solutions found, offering an approximation of the global efficient frontier of the
underlying multi-objective optimization problem.
When MOEAs are applied to real-world multiobjective optimization problems,
several issues must be addressed, such as solution representation and variation
operators. In this study, a problem-specific data structure for representing a solution
and specialized variation operators that yield a feasible portfolio most of the time
are implemented.
Each individual contains the following vectors for representing a solution:
S = \{ s_1, \dots, s_k \}, \quad k \in \{ K_{\min}, \dots, K_{\max} \},

W = \{ w_{s_1}, \dots, w_{s_k} \}, \quad 0 \le w_{s_i} \le 1, \; i = 1, \dots, k.
Vector S includes k ∈ {K_min, …, K_max} integers that represent the assets in the
portfolio, while vector W includes k real numbers between 0 and 1, one associated
with each selected asset. In order to satisfy the quantity constraints, the following
procedure is followed. To satisfy the lower bounds, the following normalization
equation is applied:
x_{s_i} = l_{s_i} + \frac{w_{s_i}}{\sum_{s \in S} w_s} \left( 1 - \sum_{s \in S} l_s \right), \quad i = 1, \dots, k
To meet the upper bound constraint, if a particular asset within the portfolio
surpasses its upper limit following the application of the aforementioned equation,
it is adjusted to adhere to its upper bound. Any surplus weight is then redistributed
among the remaining assets in the portfolio according to W.
For the multiobjective portfolio optimization problems with only the budget and short-sales constraints, we simply set Kmin = 1, Kmax = n, li = 0, and ui = 1 for every i. This solution representation and constraint-handling technique were proposed by Chang et al. [7].
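A minimal sketch of this repair procedure is given below, assuming a feasible instance (Σ li ≤ 1 ≤ Σ ui over the selected assets and strictly positive raw weights); the function name is illustrative:

```python
import numpy as np

def repair_weights(w, lower, upper):
    """Map raw genotype weights W to feasible portfolio weights x.

    w, lower, upper: arrays of length k for the selected assets only.
    Feasibility (sum(lower) <= 1 <= sum(upper), w > 0) guarantees that
    the redistribution loop terminates.
    """
    x = lower + (w / w.sum()) * (1.0 - lower.sum())   # lower-bound step
    while np.any(x > upper + 1e-12):                  # upper-bound step
        over = x > upper
        surplus = (x[over] - upper[over]).sum()
        x[over] = upper[over]                         # cap violators
        free = ~over
        x[free] += surplus * w[free] / w[free].sum()  # redistribute per W
    return x
```

The lower-bound step implements the normalization equation above; the loop then caps any asset exceeding its upper limit and redistributes the surplus among the remaining assets in proportion to W.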
SPEA2 was executed 10 times for each portfolio selection problem on a laptop computer equipped with an Intel(R) Core(TM) i5-7200U processor running at 2.5 GHz and 4.00 GB of RAM. The implementation was carried out in Microsoft Visual C++. Across the ten runs of the algorithm, 10 different efficient frontiers were generated for each portfolio optimization model. The parameters for running SPEA2 were set as follows: a population and archive size of 300 individuals, a crossover probability of 0.9, and mutation probabilities of 0.01 for the S vector and 1.0 for the W vector. The algorithm was terminated after generating 150,000 solutions. On average, it took approximately 650 s to obtain the efficient frontiers.
The next figures (Figs. 7.2, 7.3 and 7.4) show the efficient solutions depicted in Mean–Variance space for the three models for a single execution of the algorithm. From the first two graphs, it is seen that the algorithm generated a diverse set of efficient portfolios, ranging from an expected rate of return of approximately 1.2% per month to 2.5%. There are also solutions that are not Mean–Variance efficient (they have lower expected return and higher variance) but have less downside risk, measured either by LPSD or VaR. However, from the graph for the Mean–Variance-Skewness portfolio selection model (Fig. 7.4) we see that the efficient solutions are far more diverse and diverge much more from the Mean–Variance efficient solutions. This is because these portfolios have excessively large skewness: they offer considerably less expected return and large variance, but large skewness as well. These portfolios perform poorly out-of-sample, degrading the performance of the Mean–Variance-Skewness portfolio optimization model, as we will see next.
The approximate efficient portfolios in three dimensions are shown in the next figures (Figs. 7.5, 7.6 and 7.7). It is seen that the algorithm generates a very diverse set of efficient portfolios for the decision maker to select from.
Now an out-of-sample evaluation of the described portfolio models will be
presented. For each efficient frontier of a particular portfolio selection model, we
[Figs. 7.2, 7.3 and 7.4: scatter plots of the efficient solutions in Mean–Variance space (Variance on the horizontal axis, Mean on the vertical axis) for the Mean–Variance-LPSD, Mean–Variance-VaR and Mean–Variance-Skewness models, respectively.]
Fig. 7.5 The efficient portfolios in the three-dimensional space for the Mean–Variance-LPSD
portfolio model
Fig. 7.6 The efficient portfolios in the three-dimensional space for the Mean–Variance-VaR
portfolio model
Fig. 7.7 The efficient portfolios in the three-dimensional space for the Mean–Variance-Skewness
portfolio model
Mean–Variance-VaR, respectively. The average value of the Sortino ratio was 0.40 and 0.39, respectively, for the two portfolio models.
The efficient solutions for each portfolio model are shown in the next figures (Figs. 7.8, 7.9 and 7.10). The same conclusions as for the portfolio models without cardinality constraints can be drawn.
Furthermore, Figs. 7.11, 7.12 and 7.13 show the three-dimensional plots for the cardinality constrained Mean–Variance-Downside-Risk, Mean–Variance-VaR and Mean–Variance-Skewness portfolio models.
Table 7.2 Results for the cardinality constrained portfolio selection models

                                        VaR     LPSD    Skewness
  % final wealth better than S&P 500    91.7    100     26.4
  % SR better than S&P 500              97.9    100     28.3
  % SoR better than S&P 500             99.1    97.7    27.9

  S&P 500 results: FW = 1.21, SR = 0.13, SoR = 0.18

                        VaR                  LPSD                 Skewness
                  FW    SR    SoR      FW    SR    SoR      FW     SR      SoR
  Average max     1.51  0.38  0.71     1.5   0.38  0.72     1.51   0.37    0.68
  Average min     1.17  0.11  0.16     1.19  0.13  0.2      0.94   −0.07   −0.07
  Average median  1.28  0.23  0.39     1.29  0.24  0.4      1.12   0.057   0.07
[Figs. 7.8, 7.9 and 7.10: scatter plots of the efficient solutions in Mean–Variance space (Variance on the horizontal axis, Mean on the vertical axis) for the three cardinality constrained portfolio models.]
It is seen that the algorithm generates a diverse set of portfolios that trade off among the three objectives.
Fig. 7.11 The efficient portfolios in the three-dimensional space for the cardinality constrained
Mean–Variance-LPSD portfolio model
Fig. 7.12 The efficient portfolios in the three-dimensional space for the cardinality constrained
Mean–Variance-VaR portfolio model
Fig. 7.13 The efficient portfolios in the three-dimensional space for the cardinality constrained
Mean–Variance-Skewness portfolio model
The second set of experiments uses the data set provided by Bruni et al. [5]. It includes 595 weekly linear returns for 442 stocks of the S&P 500 index. For these experiments, we take the first 520 returns for the in-sample optimization and the remaining 75 (approximately one and a half years) for the out-of-sample analysis.
Due to space limitations, we do not provide the figures of the efficient portfolios. Similar conclusions can be drawn as with the first set of experiments. We provide only the out-of-sample comparison of the models.
From Table 7.3, it is observed that the majority of efficient portfolios generated by the algorithm perform better than the index for all three performance measures, although the number of portfolios beating the index does not approach 100% as in the previous set of experiments. An important point is the improvement of the Mean–Variance-Skewness portfolio selection model, which indicates a lack of stability of this model depending on the data set and the constraint parameters. However, considering the median of the performance measures across the ten replicates of the algorithm, it is observed that the best models are the Mean–Variance-VaR and Mean–Variance-LPSD portfolio models, which beat the index on all performance measures.
On the other hand, the imposition of cardinality constraints again improves the out-of-sample performance in terms of the percentage of generated solutions that are better than the S&P 500 index. All efficient portfolios generated by SPEA2 for the Mean–Variance-VaR portfolio model outperform the market on all performance measures. The average median performance values increase as well. The same is observed for the Mean–Variance-LPSD portfolio selection model.
Table 7.3 Results for the portfolio selection models on S&P 500

                                        VaR     LPSD    Skewness
  % final wealth better than S&P 500    84.1    69.5    68.2
  % SR better than S&P 500              85.7    72      73
  % SoR better than S&P 500             85.7    72      72.7

  S&P 500 results: FW = 1.053, SR = 0.02, SoR = 0.026

                        VaR                   LPSD                    Skewness
                  FW    SR     SoR      FW    SR      SoR       FW    SR      SoR
  Average max     1.36  0.18   0.27     1.35  0.19    0.28      1.72  0.18    0.28
  Average min     0.98  0.003  0.004    0.95  −0.004  −0.005    0.83  −0.055  −0.07
  Average median  1.18  0.07   0.096    1.17  0.065   0.089     1.1   0.043   0.055
Besides the stock market index, the results of the proposed portfolio selection models are compared against competing portfolios. The global minimum-variance portfolio without short sales (MV) and a minimum stochastic dominance portfolio (SD) [17] are used as benchmarks. There is evidence that all these strategies produce good out-of-sample results [3, 6, 10, 11].
The next table (Table 7.5) shows the percentage of portfolios that generate greater final wealth (FW), Sharpe ratio (SR) and Sortino ratio (SoR) than investing in the minimum second-order stochastic dominance portfolio with short sales not allowed (SD), for each portfolio model. Note that only the proposed portfolio models with cardinality constraints are presented, since they gave the best results in the above analysis.
The Sharpe ratio of the second-order stochastic dominance portfolio with short sales not allowed (SD) is 0.086, its Sortino ratio is 0.12, and its final wealth is 1.22. As can be seen from Table 7.4, the average median value over all portfolios generated using LPSD is almost equal to this portfolio (the second-order stochastic dominance portfolio) on all performance measures. However, as can be seen from Table 7.5, the majority of portfolios (more than 50%) generated by the proposed model beat the values of the SD model on all three performance measures. Furthermore, it is worth pointing out that the exact optimization algorithm for computing the optimal SD portfolio takes approximately 30 min to generate a single optimal portfolio. The benefit of using heuristics is obvious, as SPEA2 generates a number of optimal portfolios in a timely manner (approximately 500 s for the entire efficient frontier, on average).
Table 7.4 Results for the cardinality constrained portfolio selection models on S&P 500

                                        VaR     LPSD    Skewness
  % final wealth better than S&P 500    100     100     26.4
  % SR better than S&P 500              100     100     28.3
  % SoR better than S&P 500             100     100     27.9

  S&P 500 results: FW = 1.053, SR = 0.02, SoR = 0.026

                        VaR                  LPSD                 Skewness
                  FW    SR     SoR     FW    SR     SoR      FW    SR     SoR
  Average max     1.37  0.18   0.27    1.38  0.2    0.29     1.51  0.37   0.68
  Average min     1.09  0.037  0.05    1.12  0.047  0.064    0.94  −0.07  −0.07
  Average median  1.21  0.08   0.12    1.23  0.09   0.13     1.12  0.057  0.07
Fig. 7.14 Out of sample performance (FW—final wealth) of Mean–Variance-LPSD portfolio model
The next table (Table 7.6) shows the percentage of portfolios that generate greater final wealth (FW), Sharpe ratio (SR) and Sortino ratio (SoR) than investing in the global minimum-variance portfolio with short sales not allowed (MV), for each portfolio model. The Sharpe ratio of the global minimum-variance portfolio with short sales not allowed is 0.137, its Sortino ratio is 0.194 and its final wealth is 1.23.
It is seen that only a small fraction of the portfolios of the proposed models generate better Sharpe and Sortino ratios than the global minimum-variance portfolio. These results do not imply that the proposed models are not good, except perhaps for the Mean–Variance-Skewness model, which consistently provides worse results than the other two. Comparing the results of the global minimum-variance portfolio with short sales not allowed against the best portfolio generated under each of the three models, we see that the three models provide better results. Of course, it would be unrealistic to expect all portfolios of the proposed models to be better than the global minimum-variance portfolio with short sales not allowed. It must be noted that the global minimum-variance portfolio is an efficient portfolio under all three models (with the constraints imposed, of course).
We can see from the graphs below that an investor who concentrates on the minimum-LPSD portfolios can obtain better results than the Markowitz Mean–Variance portfolio on all performance measures (Figs. 7.14, 7.15 and 7.16).
Fig. 7.15 Out of sample performance (SR—Sharpe ratio) of Mean–Variance-LPSD portfolio model
Fig. 7.16 Out of sample performance (SoR—Sortino ratio) of Mean–Variance-LPSD portfolio model
7.5 Conclusion
The considered multiobjective portfolio selection models are hard to solve exactly with mathematical programming techniques. In this study we have solved the three models using a very popular multiobjective evolutionary algorithm, SPEA2. Our goal was to show that approximating the efficient frontiers can provide useful portfolios for the investor. The results showed that the majority of the generated portfolios, despite being approximate, have better out-of-sample performance than the S&P 500 index, with the exception of the Mean–Variance-Skewness portfolio selection model. This outperformance was obtained using three performance measures: final wealth, Sharpe ratio and Sortino ratio.
Comparison against competing portfolios shows that the portfolios of the proposed models, except Mean–Variance-Skewness, provide competitive results. In particular, the efficient portfolios of the proposed models that concentrate on the minimum-risk area of the efficient frontier provide better results than the competing portfolios.
As future research, a rolling-window out-of-sample analysis may be considered in order to test the predictive ability of the proposed portfolio selection models. In addition, a transaction cost constraint may be imposed on the portfolio selection models.
References
1. Anagnostopoulos, K.P., Mamanis, G.: A portfolio optimization model with three objectives
and discrete variables. Comput. Oper. Res. 37, 1285–1297 (2010)
2. Bechikh, S., Datta, R., Gupta, A., (eds.).: Recent Advances in Evolutionary Multi-objective
Optimization. Springer International Publishing (2017)
3. Board, J.L.G., Sutcliffe, C.M.S.: Estimation methods in portfolio selection and the effectiveness
of short sales restrictions: UK evidence. Manag. Sci. 40(4), 516–534 (1994)
4. Bodie, Z., Kane, A., Marcus, A.J.: Investments, 10th edn. McGraw-Hill (2014)
5. Bruni, R., Cesarone, F., Scozzari, A., Tardella, F.: Real-world datasets for portfolio selection
and solutions of some stochastic dominance portfolio models. Data Brief 8, 858–862 (2016)
6. Chan, L.K.C., Karceski, J., Lakonishok, J.: On portfolio optimization: forecasting covariances
and choosing the risk model. Rev. Fin. Stud. 12(5), 937–974 (1999)
7. Chang, T.J., Meade, N., Beasley, J.E., Sharaiha, Y.M.: Heuristics for cardinality constrained
portfolio optimization. Comput. Oper. Res. 27, 1271–1302 (2000)
8. Chen, B., Zhong, J., Chen, Y.: A hybrid approach for portfolio selection with higher-order
moments: empirical evidence from shanghai stock exchange. Exp. Syst. Appl. 145(1), 1–11
(2020)
9. De Giorgi, E.: Reward-risk portfolio selection and stochastic dominance. J. Bank Fin. 29,
895–926 (2005)
10. DeMiguel, V., Garlappi, L., Uppal, R.: Optimal versus naive diversification: how inefficient is
the 1/N portfolio strategy? Rev. Fin. Stud. 22(5), 1915–1953 (2007)
11. DiBartolomeo, D.: The Equity Risk Premium, CAPM and Minimum Variance Portfolios.
Northfield News (2007)
12. Elton, E.J., Gruber, M.J., Brown, S.J.: Modern Portfolio Theory and Investment Analysis, 9th
edn. Wiley (2014)
13. Emmerich, M.T.M., Deutz, A.H.: A tutorial on multiobjective optimization: fundamentals and
evolutionary methods. Nat. Comp. 17, 585–609 (2018)
14. Fishburn, P.C.: Mean-risk analysis with risk associated with below-target returns. Am. Econ. Rev. 67, 116–126 (1977)
15. Garcia-Bernabeu, A., Salcedo, J.V., Hilario, A., Pla-Santamaria, D., Herrero, J.M.: Computing
the mean-variance-sustainability nondominated surface by Ev-MOGA. Complexity (2019).
https://doi.org/10.1155/2019/6095712
16. Konno, H., Yamazaki, H.: Mean absolute deviation portfolio optimization model and its
applications to Tokyo stock market. Manag. Sci. 37, 519–531 (1991)
17. Kuosmanen, T.: Efficient diversification according to stochastic dominance criteria. Manag.
Sci. 50, 1390–1406 (2004)
18. Luenberger, D.G.: Investment Science. Oxford University Press, New York (1998)
19. Mamanis, G.: A comparative study on multi-objective evolutionary algorithms for tri-objective
mean-risk-cardinality portfolio optimization problems. In: Patnaik, S., Tajeddini, K., Jain, V.
(eds.), Computational Management. Modeling and Optimization in Science and Technologies,
pp. 277–303 (2021)
20. Mamanis, G.: Analyzing the performance of a two-tail-measures-utility multi-objective portfolio optimization model. Oper. Res. Forum 2(58) (2021)
21. Mamanis, G., Anagnostopoulos, K.P.: Multiobjective optimization of a discrete mean-variance-
skewness portfolio selection model using SPEA2. J. Fin. Decis. Mak. 7(2), 75–86 (2011)
22. Maringer, D., Parpas, P.: Global optimization of higher order moments in portfolio selection.
J. Glob. Opt. 43, 219–230 (2009)
23. Markowitz, H.M.: Portfolio selection. J. Fin. 7, 77–91 (1952)
24. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs, 3rd edn.
Springer (2013)
25. Ogryczak, W., Ruszczynski, A.: From stochastic dominance to mean-risk models: semidevia-
tions as risk measures. Eur. J. Oper. Res. 116, 33–50 (1999)
26. Ogryczak, W., Ruszczynski, A.: On consistency of stochastic dominance and mean-
semideviations models. Math. Prog. 89, 217–232 (2001)
27. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)
28. Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. J. Bank Fin. 26(7), 1443–1471 (2002)
29. Roman, D., Darby-Dowman, K., Mitra, G.: Mean-risk models using two risk measures: a
multi-objective approach. Quant. Fin. 7(4), 443–458 (2007)
30. Yitzhaki, S.: Stochastic dominance, mean variance and Gini’s mean difference. Am. Econ. Rev.
72, 178–185 (1982)
31. Zitzler, E., Thiele, L.: Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Trans. Evol. Comp. 3(4), 257–271 (1999)
32. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the Strength Pareto Evolu-
tionary Algorithm. TIK-103, Department of Electrical Engineering, Swiss Federal Institute
of Technology, Zurich, Switzerland (2001)
Part III
Risk Assessment and Ethical
Considerations
Chapter 8
Bankruptcy Forecasting of Indian
Manufacturing Companies Post
the Insolvency and Bankruptcy Code
2016 Using Machine Learning
Techniques
S. Kaur (B)
Amity University, Noida, Uttar Pradesh 201313, India
e-mail: [email protected]
A. Munde
Southampton Malaysia Business School, University of Southampton, Iskandar Puteri, Malaysia
e-mail: [email protected]
forecast bankruptcy. This research, on the other hand, contributes to decision tree-based studies, which show more accurate results compared to ANN or logistic regression.
Limitations: One major limitation of this paper is that it mainly considers financial variables. Recent research has considered not just financial variables but also corporate governance indicators and macroeconomic variables. Another limitation is that this study focuses primarily on the manufacturing industry; thus, bankruptcy research in other industries is required.
8.1 Introduction
• Winding up provisions of the Companies Act, 1956, Companies Act, 2013, and
LLP Act, 2013
• The Presidency Towns Insolvency Act, 1909
• Provincial Insolvency Act, 1920
In June 2017, the Reserve Bank of India (RBI) ordered that the 12 major loan
defaulters be taken before the National Company Law Tribunal (NCLT) and held
responsible under the IBC [8]. There has been relatively little study on forecasting bankruptcy following the implementation of the IBC. The primary goal of the IBC is to aid distressed corporate defaulters [9].
A few research gaps were identified while working on this topic. First, because the IBC was implemented only in 2016, research on bankruptcy data post-IBC 2016 is very limited; this paper focuses on predicting bankruptcy using data post-IBC 2016. Second, research on bankruptcy prediction using machine learning techniques in India is very limited. This paper demonstrates the significance and use of machine learning in today's environment. The goal of this study is to show how machine learning has grown for the benefit of society by demonstrating how it can be used to anticipate bankruptcy. First, PCA is used to investigate the impact of the identified variables on bankruptcy, and then several methodologies, namely ANN, logistic regression, decision tree, and random forest, are used to build financial distress prediction models and compare their performance using bankruptcy data from India.
This can assist in analyzing and predicting a company's financial health and in preventing it from going bankrupt. Through the use of financial indicators, it can provide a more in-depth understanding of the implementation of the four models mentioned
above. The models created in this study could be used to predict corporate failure
by investors, creditors, auditors, and others associated with a company. The primary
objectives of this research are:
R1: To establish financial variables for predicting the bankruptcy of Indian
manufacturing companies.
R2: To investigate the impact of financial variables on the bankruptcy prediction of
Indian manufacturing companies using PCA.
R3: To perform a comparative analysis of several machine learning approaches in
Indian manufacturing companies.
Since the early work of FitzPatrick in the 1930s, there has been extensive research into the ability to predict financial distress of companies [10]. Beaver began by claiming that financial ratios can be used in models that predict bankruptcy, financial difficulties, and individual firm failure [11]. In 1968, Altman developed the first multivariate bankruptcy prediction model, the Z-score, incorporating five variables [12]. The model's short-term accuracy was 95%, according to Altman, but when applied two or more years prior to the bankruptcy, that figure drops to 72%. Ohlson and Zmijewski investigated the possibility of bankruptcy using logit and probit models, respectively [13, 14].
and probit models [13, 14]. A logit analysis is utilized in a different model created
by Zavgren, to determine the likelihood that a solution specified by a dichotomous
(or polytomous) dependent variable will occur [15].
Neural networks (NNs) came to dominate Artificial Intelligence (AI) research in the mid-1980s. Ever since, academics have widely utilized NNs, specifically back-propagation neural networks (BPNN), to solve classification problems such as bankruptcy prediction [16]. In a 1991 study, Hertz stated that algorithm-based computer networks called ANNs might be constructed to mimic the internal workings of the human brain [17]. Odom and Sharda were the first to apply NNs to the bankruptcy prediction problem [18]. Among others, Altman conducted another investigation with NNs, which in particular highlighted the problems of "black-box" NN algorithms, including illogical weightings of the indicators and overfitting in the training stage, both of which severely impact prediction accuracy [19].
Bhunia and Sarkar forecasted financial distress in Indian firms using financial ratios and multiple discriminant analysis. Profitability and liquidity ratios performed exceptionally well in predicting distress, according to the findings [20]. The debt ratio, total asset turnover ratio, working capital ratio, and net income to total assets ratio are all significant financial measures [21].
To more accurately recognize and separate bankrupt companies from non-bankrupt companies, novel and innovative forecasting models for bankruptcy have been researched [22]. Fedorova et al. predict bankruptcy using financial information from Russian businesses, combining Multiple Discriminant Analysis (MDA), Logistic Regression (LR), Classification and Regression Trees (CRT), and ANNs [23]. Although MDA is the most commonly used approach for predictive modeling, logistic analysis (LA) techniques are also used to manage various MDA-related challenges, as stated in [24]. The t-statistics feature selection method was used to assess a number of intelligent techniques, including Random Forest, Regression Trees, Support Vector Machines (SVM), Logistic Regression, and Multilayer Perceptron (MLP) [25]. PCA is another technique used for feature selection; Pearson initially introduced the idea of PCA [26]. With PCA, a new set of variables called principal components is created, each being a linear combination of the original variables [16].
The major goal of the study by Kim et al. was to systematically evaluate machine learning techniques for forecasting company failure [27]. In corporate Financial Distress Prediction (FDP) investigations, logistic regression is extensively utilized. Decision Trees (DTs) for FDP are used in a variety of studies, including Chen [28]. Similar methods, like Random Forests, are described in the paper by Breiman; they have the advantage of separating the data using multiple decision trees. A large number of independent, unpruned decision trees are used by Random Forest (RF), also known as Random Subspace, for training and classification [29]. Creamer and Freund were among the first researchers to employ random forests for bankruptcy prediction problems [30]. Only a few studies have looked into the usage of random forest in corporate financial distress prediction [25]. According to one study, machine learning algorithms such as random forest, bagging, boosting, and SVM outperform statistical techniques such as discriminant analysis and logistic regression by about 10% [31].
The IBBI website was used to obtain information about bankrupt companies, and 77 listed companies were found to be bankrupt, 49 of which were in the manufacturing sector. During data collection, data for 14 companies was found to be missing, leaving the remaining 35 companies for this study. Non-bankrupt companies were selected based on the sector of the bankrupt company and the total worth of the bankrupt company's assets [32].
This study therefore involves data from 70 companies. Data from the past three years is obtained from both non-bankrupt and bankrupt corporations' annual reports. The 35 bankrupt companies span 18 sectors of the manufacturing industry. Four companies are in the cable sector and four in the textile industry. Three companies are in the steel industry, two in the mining industry, and two in the auto ancillaries sector. One company is from the automobile sector, and two each are from the chemical and paper sectors. Four companies are in the gas and petroleum industry. One company each comes from the non-ferrous metal, electronics, fast-moving consumer goods (FMCG), and glass sectors. Three companies are in the agro-processing sector, and one company each is from the pharmaceuticals, alcoholic beverages, consumer durables, and plastics sectors.
8.3.2 Methodology
8.3.2.1 Variables
For the purpose of predicting bankruptcy, 15 financial variables are selected on the basis of earlier studies (Table 8.1).
According to Fig. 8.1, the cumulative sum of variance explained is 89.5%. After applying PCA (Principal Component Analysis), only 7 variables were found to be relevant within this explained variance; they are listed in Table 8.2.
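A hedged sketch of this selection step, using scikit-learn with a random placeholder matrix in place of the 70-firm, 15-ratio data set (which is not reproduced here), could look as follows:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in for the 70 firms x 15 financial ratios
rng = np.random.default_rng(0)
X = rng.normal(size=(70, 15))

X_std = StandardScaler().fit_transform(X)   # PCA is scale-sensitive
pca = PCA().fit(X_std)

cum_var = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(cum_var, 0.895) + 1)   # ~89.5% threshold
print(f"{n_keep} components explain {cum_var[n_keep - 1]:.1%} of the variance")
```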
Before PCA
As seen in Table 8.3, the random forest technique outperformed all other prediction algorithms in the comparative analysis before applying PCA. Decision trees had the second-highest accuracy in predicting bankruptcy, at 90.47%. Following that, ANN had an accuracy of 87.76% and logistic regression an accuracy of 80.95%.
After PCA
After applying PCA, the accuracy of the random forest technique dropped to 91.67%. The accuracy of ANN did not change after PCA and remained at 87.76%. The accuracy of the decision tree reduced to 85.71% after PCA. Logistic regression had the lowest accuracy yet again, but it increased to 85.71% after PCA (Table 8.4).
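A comparative analysis of this kind can be sketched with scikit-learn as below; the synthetic data, the MLP stand-in for the ANN, and all hyperparameters are assumptions of the sketch, so the printed accuracies will not reproduce Tables 8.3 and 8.4:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 70-company, 15-ratio bankruptcy data set
X, y = make_classification(n_samples=70, n_features=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "ANN (MLP)": MLPClassifier(max_iter=2000, random_state=0),
}
for use_pca in (False, True):
    steps = [StandardScaler()] + ([PCA(n_components=7)] if use_pca else [])
    for name, clf in models.items():
        pipe = make_pipeline(*steps, clf)   # optional PCA step before the model
        acc = pipe.fit(X_tr, y_tr).score(X_te, y_te)
        print(f"{'after' if use_pca else 'before'} PCA  {name}: {acc:.2%}")
```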
8.6 Conclusion
Bankruptcy prediction is one of the most important and rapidly growing areas of finance. The accuracy of forecasting approaches is critical: accurate predictive algorithms allow companies to respond to an early warning of bankruptcy. If a reliable method for forecasting bankruptcy is established, investors will be able to decide whether or not to invest in a company.
8.6.2 Conclusion
Working capital to total assets, retained earnings to total assets, EBIT to total assets, sales to total assets, EBT to current liabilities, debt to asset ratio, and current ratio were found to be useful in predicting bankruptcy in India's manufacturing industry after employing PCA. After using PCA, the accuracy of the decision tree and random forest declined. Similarly, in the paper by Chen it was found that when PCA was used, the results for the decision tree and logistic regression approaches were less accurate [28]. In the present study, by contrast, the accuracy of logistic regression increased after using PCA, while the accuracy of the ANN approach did not change.
Random forest outperformed ANN, decision trees, and logistic regression in predicting bankruptcy, demonstrating that it beats other forecasting techniques, including the most widely used one, logistic regression. In this study, ANN surpassed decision tree techniques, although in earlier studies [37] the decision tree was considered superior to ANN. Logistic regression had the lowest accuracy, which was also reported in earlier research such as [28]. Because of their extensive use and high accuracy, logistic regression and artificial neural networks were used in the majority of studies on bankruptcy prediction in India. Few studies in India used decision tree-based approaches to forecast bankruptcy. This study, on the other hand, contributes to decision tree-based investigations, which produce more accurate results than ANN or logistic regression.
Most previous bankruptcy prediction research, particularly in India, was undertaken in the banking industry. As the second-largest contributor to India's GDP, the manufacturing sector needs extensive research on bankruptcy prediction.
After the implementation of IBC 2016, it is clear that the amount of freely available data in India is limited. As data is limited in this study, future studies with larger data sets and longer time frames might better forecast bankruptcy. This study also motivates future research on bankruptcy prediction in other Indian industries. Macroeconomic variables and corporate governance measures can also be included as independent variables when forecasting bankruptcy in India.
References
1. Di Donato, F., Nieddu, L.: A new proposal to predict corporate bankruptcy in Italy during the
2008 economic crisis. In: Causal Inference in Econometrics, 213–223 (2016). https://doi.org/
10.1007/978-3-319-27284-9_13
2. Farooq, U., Jibran Qamar, M.A., Haque, A.: A three-stage dynamic model of financial distress.
Manag. Financ. 44(9), 1101–1116 (2018). https://doi.org/10.1108/MF-07-2017-0244
3. Yu, Q., Miche, Y., Séverin, E., Lendasse, A.: Bankruptcy prediction using extreme learning
machine and financial expertise. Neurocomputing 128, 296–302 (2014). https://doi.org/10.
1016/j.neucom.2013.01.063
4. Ahn, H., Kim, K.: Bankruptcy prediction modeling with hybrid case-based reasoning and
genetic algorithms approach. Appl. Soft Comput. 9(2), 599–607 (2009). https://doi.org/10.
1016/j.asoc.2008.08.002
5. Roychoudhury, A.: Rajya Sabha Passes Bankruptcy Code. Business Standard (2016). https://
www.business-standard.com/article/economy-policy/rajya-sabha-passes-bankruptcy-code-
116051200075_1.html
6. Laws, I.: Short Note on Insolvency and Bankruptcy Code, 2016. IBC Law (2019). https://ibc
law.in/short-note-on-insolvency-and-bankruptcy-code-2016/
7. BCAS.: Insolvency and Bankruptcy Code, 2016 (IBC). BCAS Referencer (2022). https://www.
bcasonline.org/Referencer2018-19/part5/insolvency-and-bankruptcy-code-2016-ibc.html
8. John, N.: Bankruptcy Doubles to 3,774 in FY20; Manufacturing, Construction Worst-Hit. Business Today (2020)
9. Kaushik, A.: Is IBC 2016 Effective?|NITI Aayog. NITI Aayog (2020). https://www.niti.gov.
in/ibc-2016-effective
10. FitzPatrick, P.J.: A comparison of ratios of successful industrial enterprises with those of failed
firms. In: The Certified Public Accountant, 598–605 (1932)
11. Beaver, W.H.: Financial ratios as predictors of failure. J. Account. Res. 4, 71–111 (1966)
12. Altman, E.I.: Financial ratios, discriminant analysis and the prediction of corporate bankruptcy.
J. Financ. 23(4), 589–609 (1968)
13. Ohlson, J.A.: Financial ratios and the probabilistic prediction of bankruptcy. J. Account. Res.
18(1), 109–131 (1980). https://doi.org/10.2307/2490395
14. Zmijewski, M.E.: Methodological issues related to the estimation of financial distress prediction
models. J. Account. Res. 22, 59–82 (1984)
15. Zavgren, V.: Assessing the vulnerability to failure of American industrial firms: a logistic analysis. J. Bus. Fin. Account. 12 (1985)
16. Ying, S., Shiwei, Z., Tao, Z.: Predicting financial distress of Chinese listed corporate by a hybrid
PCA-RBFNN model. In: 2008 Fourth International Conference on Natural Computation, 3,
277–281 (2008). https://doi.org/10.1109/ICNC.2008.778
17. Hertz, J., Krogh, A., Palmer, R.G., Horner, H.: Introduction to the theory of neural computation.
Phys. Today 44(12), 70 (1991). https://doi.org/10.1063/1.2810360
18. Odom, M.D., Sharda, R.: A neural network model for bankruptcy prediction. In: 1990 IJCNN
International Joint Conference on Neural Networks, 163–168 (1990). https://doi.org/10.1109/
ijcnn.1990.137710
19. Altman, E.I., Marco, G., Varetto, F.: Corporate distress diagnosis: comparisons using linear
discriminant analysis and neural networks (the Italian experience). J. Banking Fin. 18(3),
505–529 (1994). http://linkinghub.elsevier.com/retrieve/pii/0378426694900078
20. Bhunia, A., Sarkar, R.: A study of financial distress based on MDA. J. Manag. Res. 3(2), 1–11
(2011). https://doi.org/10.5296/jmr.v3i2.549
21. Alifiah, M.N.: Prediction of financial distress companies in the trading and services sector in
Malaysia using macroeconomic variables. Procedia Soc. Behav. Sci. 129, 90–98 (2014). https://
doi.org/10.1016/j.sbspro.2014.03.652
22. Smith, M., Alvarez, F.: Predicting firm-level bankruptcy in the Spanish economy using extreme
gradient boosting. Comput. Econ. 59(1), 263–295 (2022). https://doi.org/10.1007/s10614-020-
10078-2
23. Fedorova, E., Gilenko, E., Dovzhenko, S.: Bankruptcy prediction for Russian companies: appli-
cation of combined classifiers. Expert Syst. Appl. 40(18), 7285–7293 (2013). https://doi.org/
10.1016/j.eswa.2013.07.032
24. Grice, J.S., Dugan, M.T.: The limitations of bankruptcy prediction models: some cautions for
the researcher. Rev. Quant. Financ. Acc. 17(2), 151–166 (2001). https://doi.org/10.1023/A:101
7973604789
25. Chandra, D.K., Ravi, V., Bose, I.: Failure prediction of dotcom companies using hybrid intelli-
gent techniques. Expert Syst. Appl. 36(3), 4830–4837 (2009). https://doi.org/10.1016/j.eswa.
2008.05.047
26. Pearson, K.: LIII. On lines and planes of closest fit to systems of points in space. Lond.
Edinburgh Dublin Philos. Mag. J. Sci. 2(11), 559–572 (1901). https://doi.org/10.1080/147864
40109462720
27. Kim, H., Cho, H., Ryu, D.: Corporate default predictions using machine learning: literature
review. Sustainability 12(16), 6325 (2020). https://doi.org/10.3390/SU12166325
28. Chen, M.Y.: Predicting corporate financial distress based on the integration of decision tree
classification and logistic regression. Expert Syst. Appl. 38(9), 11261–11272 (2011). https://
doi.org/10.1016/j.eswa.2011.02.173
29. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
30. Creamer, G., Freund, Y.: Predicting performance and quantifying corporate governance risk for
Latin American Adrs and Banks. In: Financial Engineering and Applications. MIT, Cambridge
(2004)
31. Barboza, F., Kimura, H., Altman, E.: Machine learning models and bankruptcy prediction.
Expert Syst. Appl. 83, 405–417 (2017). https://doi.org/10.1016/j.eswa.2017.04.006
32. Lakshan, A.M.I., Wijekoon, W.M.H.N.: Predicting corporate failure of listed companies in Sri
Lanka. GSTF Bus. Rev. (GBR) 2(1), 180–185 (2012)
33. Springate, G.L.V.: Predicting the Possibility of Failure in a Canadian Firm: Unpublished MBA
Research Project/Simon Fraser University (1978)
34. Mselmi, N., Lahiani, A., Hamza, T.: Financial distress prediction: the case of French small and
medium-sized firms. Int. Rev. Financ. Anal. 50, 67–80 (2017). https://doi.org/10.1016/j.irfa.
2017.02.004
35. Ong, S.W., Choong Yap, V., Khong, R.W.L.: Corporate failure prediction: a study of public
listed companies in Malaysia. Manag. Financ. 37(6), 553–564 (2011). https://doi.org/10.1108/
03074351111134745
36. Hu, Y.C., Ansell, J.: Measuring retail company performance using credit scoring techniques.
Eur. J. Oper. Res. 183(3), 1595–1606 (2007). https://doi.org/10.1016/j.ejor.2006.09.101
37. Olson, D.L., Delen, D., Meng, Y.: Comparative analysis of data mining methods for bankruptcy
prediction. Decis. Support. Syst. 52(2), 464–473 (2012). https://doi.org/10.1016/j.dss.2011.
10.007
Chapter 9
Ensemble Deep Reinforcement Learning
for Financial Trading
9.1 Introduction
Special Notations
Notation Meaning
Gt Cumulative reward at timestep t
Rt Reward earned by the agent at timestep t after performing action at timestep t − 1 in
the environment
D Number of stocks
Z+ Non-negative integers
R+ Non-negative real numbers
bt Balance available in the portfolio at time step t
pt Adjusted close price of each stock
Mt Calculated MACD value using close price
Ct Calculated CCI value using high, low, close prices
Xt Calculated ADX value using high, low, close prices
ht Number of shares of each stock
rt Calculated RSI value using close prices
Rp Expected return of the portfolio
Rf Risk-free return of the portfolio
StdDev Standard deviation of the portfolio
A represents the action of the agent, S represents the agent's state in the environment, and the policy π(A|S) specifies the probability distribution over actions A to be taken in a given state S.
The reward is a scalar feedback signal (either positive or negative) that indicates
how well an agent is performing by taking action at the right time. The agent’s job
is to maximize cumulative reward.
To address the dynamic nature of the stock market, we model it as an MDP as follows: the state s = [b, p, h] is a vector that includes the balance b ∈ R+, the stock prices p ∈ R+^D, and the share holdings h ∈ Z+^D.
Recent deep reinforcement learning applications in financial markets use actor-critic, critic-only, or actor-only learning approaches in discrete or continuous state and action spaces.
Recently, the actor-critic method has been used in finance [6–9]. The critic network, which represents the value function, and the actor network, which represents the policy, are updated simultaneously. The critic network estimates the value function while the actor modifies the policy probability distribution, guided by the critic network, using policy gradient methods. Both the actor and critic networks thereby become more skilled at executing better actions and at evaluating those actions.
An actor-only approach [10–12] has also been used; the idea is that the agent itself directly learns what to do, with a neural network learning the policy rather than a Q-value. In the most popular learning technique, critic-only learning, an agent is trained on a single stock and employs, for instance, Deep Q-learning (DQN) and its upgrades to solve a discrete action-space problem [13–16].
The authors of [17] developed and implemented Temporal Difference and kernel-based reinforcement learning techniques for financial trading systems. [18] introduces a model-free convolutional neural network that takes the historical price data of a collection of financial assets as input and outputs the group's portfolio weights; 0.7 years' worth of price information from a bitcoin exchange is used to train the network, and training is carried out via reinforcement techniques to maximize the cumulative return. A survey is provided by [19].
Several studies simplified trading activities to purchasing, selling, or holding a single asset using RL with a discrete action space. Trading with a small number of positions has also been studied, but it is not easy to extend this strategy to big portfolios because adding assets causes the action space to grow exponentially. Policy-based RL with deep learning as its approximation function is used to handle the continuous action-space problem [20–23].
The authors of [24] presented the differential Sharpe ratio, Sterling ratio, Calmar ratio, and optimal variable-weight portfolio allocation as objective functions. In [25], three-layer neural networks with ReLU neurons are used to train RL agents with the Q-learning algorithm. To determine the effects of earlier states and actions on policy optimization in non-Markov decision processes, [26] employed deep recurrent neural network (RNN) models (GRU), and [27] experimented with several RL approaches to integrate with DL techniques for the policy optimization problem.
In the price trailing approach, the trading agent closely tracks the price of an asset [28]. Instead of forecasting the future price within a specified margin (direction) [5], the deep reinforcement learning approach effectively trains an intelligent automated trader; it incorporates historical stock price information as well as market sentiment [29] for a stock portfolio made up of Dow Jones businesses [18, 26] and cryptocurrency [10].
Trading strategies have also been created for markets with a changing number of assets, where unseen assets can be integrated without the network having to be altered or retrained. For markets with transaction costs, optimal transactions are calculated [6], and there is also a study on asset variability and correlation [36].
The following assumptions and constraints are made during stock trading.
• Non-negative balance b ≥ 0: the actions of the RL agent should not result in a deficit in the account. Based on the action at time t, the stocks are separated into sets for selling (S), buying (B) and holding (H), where S ∪ B ∪ H = {1, 2, . . . , D} and the sets are pairwise disjoint. Let ptB = [pti : i ∈ B] and ktB = [kti : i ∈ B] be the vectors of stock prices and of the number of shares to buy for each stock in B; ptS and ktS are defined similarly for the selling stocks, and ptH and ktH for the holding stocks. The balance then updates as
bt+1 = bt + (ptS)T ktS − (ptB)T ktB ≥ 0.
• Transaction cost: a cost that the investor incurs when trading in the stock market, which is deducted from the account balance. It covers the SEBI fee, stamp duty, securities transaction tax (STT), exchange fee, and brokerage GST; we assume our transaction cost to be 0.15% of the value of every trade:
ct = ptT kt × 0.15%.
• Market liquidity: orders are executed at the close price. We assume that our RL agent does not have an impact on the market.
We used a vector of seven components to represent the state space: [bt, pt, Mt, Ct, Xt, ht, rt].
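A minimal sketch of assembling this observation vector follows; the helper name and the flat layout are assumptions, since the chapter does not prescribe an encoding:

```python
import numpy as np

def build_state(balance, prices, macd, cci, adx, holdings, rsi):
    """Flatten [bt, pt, Mt, Ct, Xt, ht, rt] into one observation of
    length 1 + 6 * D for a portfolio of D stocks."""
    return np.concatenate([[balance], prices, macd, cci, adx, holdings, rsi])

D = 30  # Sensex / Dow Jones constituents
state = build_state(balance=1e6,
                    prices=np.ones(D), macd=np.zeros(D), cci=np.zeros(D),
                    adx=np.zeros(D), holdings=np.zeros(D),
                    rsi=np.full(D, 50.0))
assert state.shape == (1 + 6 * D,)
```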
To develop a highly reliable trading strategy, we used an ensemble technique that chooses one of the three RL algorithms based on the Sharpe ratio. This is because each trading agent is sensitive to different market movements: one algorithm is good at trading an upward (bullish) trend, another is better adjusted to an unstable market. Whichever algorithm achieves the highest Sharpe ratio is selected as the trading agent. We downloaded the data from the Yahoo Finance website for the period 01-01-2008 to 01-09-2021; the data is split into 01-01-2008 to 01-01-2017 for training and 02-01-2017 to 01-01-2018 for validation and hyperparameter tuning. We tested the performance of the model over the trading period 02-01-2018 to 01-09-2021.
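The selection rule can be sketched as follows; the annualization by √252 trading days and all names are assumptions of the sketch rather than details given in the chapter:

```python
import numpy as np

def sharpe(returns, risk_free=0.0):
    """Annualized Sharpe ratio from a series of daily portfolio returns."""
    excess = np.asarray(returns) - risk_free
    return np.sqrt(252) * excess.mean() / excess.std()

def pick_agent(validation_returns):
    """Select the agent with the highest Sharpe ratio on the validation
    window (02-01-2017 to 01-01-2018 in the text)."""
    scores = {name: sharpe(r) for name, r in validation_returns.items()}
    return max(scores, key=scores.get), scores

# e.g. pick_agent({"TD3": td3_rets, "PPO": ppo_rets, "DDPG": ddpg_rets})
```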
In the ensemble strategy, we used the traditional actor-critic algorithm (A2C) [10]. A2C improves policy gradient updates and can serve as a replacement for Trust Region Policy Optimization [40]. To lessen the variance of the policy gradient technique, A2C uses an advantage function, which, along with the value function, is estimated by the critic network. As a result, both how good an action is now and how much better it can be in the future are considered in the evaluation, improving the model and reducing the policy network's high variance.
To update the gradients with different data samples, A2C uses duplicates of the same agent, each functioning autonomously while interacting with the same environment. After all agents finish calculating their gradients in each iteration, a coordinator passes the average gradients across all agents to a global network, which updates the actor and critic networks. A2C's objective function is described as follows:
∇θ J(θ) = E[ Σ_{t=1}^{T} ∇θ log πθ(at|st) A(st, at) ]  (9.1)
where πθ(at|st) is the policy network and A(st, at) is the advantage function, which is expressed as

A(st, at) = Q(st, at) − V(st) = r(st, at) + γ V(st+1) − V(st).  (9.2)
To encourage the highest possible investment return, DDPG [30] is utilized. DDPG uses neural networks as function approximators and integrates the frameworks of policy gradient [41] and Q-learning [26]. In contrast to DQN, which derives its knowledge indirectly from Q-value tables and is plagued by the curse of dimensionality [42], DDPG derives its knowledge directly from observations through the policy gradient. States are mapped deterministically to actions to better match the continuous action-space environment.
At every time step, the DDPG agent performs action at at state st, receives a reward rt, and arrives at st+1. The transitions (st, at, rt, st+1) are stored in a replay buffer. The buffer is used to draw a batch of N transitions, and the target Q-value yi is computed as follows:
yi = ri + γ Q̂(si+1, μ̂(si+1 | θμ̂) | θQ̂), i = 1, . . . , N.  (9.3)
The critic network Q is then updated by minimizing the loss function L(θQ), which represents the expected squared difference between the targets produced with the target critic network Q̂ and the outputs of the critic network Q, i.e.,
L(θQ) = E_{st,at,rt,st+1 ∼ buffer}[ (yi − Q(st, at | θQ))2 ]  (9.4)
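Equations (9.3) and (9.4) can be illustrated framework-agnostically; the callables below stand in for the (target) actor and critic networks and are assumptions of the sketch:

```python
import numpy as np

def ddpg_targets(rewards, next_states, target_actor, target_critic, gamma=0.99):
    """Eq. (9.3): y_i = r_i + gamma * Q^(s_{i+1}, mu^(s_{i+1}))."""
    next_actions = target_actor(next_states)
    return rewards + gamma * target_critic(next_states, next_actions)

def critic_loss(states, actions, targets, critic):
    """Eq. (9.4): mean squared error between targets y_i and Q(s_i, a_i)."""
    return np.mean((targets - critic(states, actions)) ** 2)
```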
As part of the ensemble technique, we investigate and employ PPO. PPO [40] was developed to ensure that the new policy does not diverge too far from the previous one when updating the policy gradient. By adding a clipping term to the objective function, PPO simplifies Trust Region Policy Optimization (TRPO) [5, 40, 43].
The likelihood ratio of the new policy versus the old one is expressed as

rt(θ) = πθ(at | st) / πθold(at | st).  (9.5)

The clipped surrogate objective that PPO maximizes is then

JCLIP(θ) = Et[ min( rt(θ) Ât, clip(rt(θ), 1 − ε, 1 + ε) Ât ) ],  (9.6)

where Ât is the estimated advantage and ε the clipping parameter.
TD3 is the successor to DDPG, which is unstable and strongly dependent on finding the proper hyperparameters [44]. This instability causes the algorithm to overestimate the Q-values, and as time goes on these inaccuracies drive the algorithm into local optima; in TD3 this problem is solved. The deterministic policy is updated via

∇φ J(φ) = N−1 Σ ∇a Qθ1(s, a)|a=πφ(s) ∇φ πφ(s)  (9.7)
SAC is defined for RL with continuous action spaces; it maximizes not only the total reward but also the entropy of the policy, which helps improve exploration:

J(π) = Σ_{t=0}^{T} E_{(st,at)∼ρπ}[ r(st, at) + α H(π(·|st)) ]  (9.8)

Q̂(st, at) = r(st, at) + γ E_{st+1∼p}[ Vψ(st+1) ]  (9.9)

Equation (9.8) is an objective function with a reward term and an entropy term H, both weighted by α; Eq. (9.9) uses a state value function V parameterized by ψ.
We considered the data of the 30 stocks listed under the Sensex index and the 30 stocks of the Dow Jones index, obtained from Yahoo Finance. Before training, we construct the DRL agent's environment to resemble a real-world trading system so that the agent can interact and learn. Practical trading requires consideration of several variables, including historical stock prices, current shareholdings, and technical indicators. Our trading agent must observe the environment for information and perform the appropriate actions outlined in the preceding section. To create our environment and train the agent, we used OpenAI Gym [39]. All experiments were performed on Google Colaboratory, which is free for all Google account users (premium features are also available).
The following performance metrics are used in the current research work.
• Cumulative return is the percentage return of the portfolio from its initial value to its final value.
• The annualized return is the geometric average of the money an investment earned each year over a specific period.
• The max drawdown is the largest percentage loss that occurred during the trading period.
• The Sharpe ratio measures a portfolio's risk-adjusted return:

S = (Rp − Rf) / StdDev
We also used the technical indicators MACD, CCI, ADX, and RSI, described below.
• Moving Average Convergence Divergence (MACD) is a trend-following indicator that displays the relationship between two moving averages of a stock's price.
• The Commodity Channel Index (CCI) compares the current price to the average price over a certain period of time.
• The Average Directional Movement Index (ADX) is a trend indicator that helps decide whether a trend is worth following.
• The Relative Strength Index (RSI) is a gauge of momentum that assesses the size of recent price fluctuations to identify overbought or oversold positions.
• Annual volatility is the measure of the variance of returns over a year.
• The Calmar ratio is the ratio of the average annual return to the max drawdown.
• The omega ratio is a risk-return performance measure of the investment portfolio.
• The Sortino ratio is the downside risk-adjusted return of an investment asset. The tail ratio reflects the likelihood that the investment deviates more than three standard deviations from the mean of a normal distribution.
• Alpha is the excess return of an investment relative to the benchmark index.
• Beta is a metric used to compare a portfolio's volatility to that of the entire market.
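The sketch below shows how the headline metrics can be computed from a series of daily portfolio returns; the 252-day annualization convention and the downside-deviation form of the Sortino ratio are common-practice assumptions rather than definitions taken from the chapter:

```python
import numpy as np

def performance_report(daily_returns, risk_free=0.0):
    """Headline metrics from daily portfolio returns (assumes at least
    one negative return for the downside deviation)."""
    r = np.asarray(daily_returns)
    wealth = np.cumprod(1.0 + r)                   # growth of one unit
    cumulative = wealth[-1] - 1.0
    annual = wealth[-1] ** (252.0 / len(r)) - 1.0  # geometric annualization
    volatility = r.std() * np.sqrt(252)
    drawdown = (wealth / np.maximum.accumulate(wealth) - 1.0).min()
    downside = r[r < 0].std() * np.sqrt(252)
    return {
        "cumulative return": cumulative,
        "annual return": annual,
        "annual volatility": volatility,
        "sharpe ratio": (annual - risk_free) / volatility,
        "max drawdown": drawdown,
        "calmar ratio": annual / abs(drawdown),
        "sortino ratio": (annual - risk_free) / downside,
    }
```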
The ensemble strategy with TD3, PPO, and DDPG outperformed all other ensemble strategies presented in Table 9.1; an upward arrow indicates that a larger value is more desirable, whereas a downward arrow indicates that a lower value is more desirable.
Table 9.2 presents the results of all three ensemble strategies on the 30 Sensex index stocks; the ensemble of TD3, PPO, and DDPG outperformed the other two ensemble strategies.
Table 9.3 presents the corresponding results for the 30 Dow Jones index stocks; again, the ensemble of TD3, PPO, and DDPG outperformed the other two ensembles.
The results presented in column 1 of each table are based upon the combination of algorithms explained in [37] and are compared with our other combinations of algorithms.
TD3 learns two Q-functions instead of one and uses the smaller of the two Q-values to form the targets in the loss functions; it also updates the policy less frequently than the Q-functions, and it adds noise to the target action so that the policy cannot exploit Q-function errors, smoothing out Q along changes in action. Due to these special features of TD3, we observed that it performs better and gives good results.
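A sketch of the clipped double-Q target with target-policy smoothing described above follows; the callables and hyperparameter values are illustrative assumptions:

```python
import numpy as np

def td3_targets(rewards, next_states, target_actor, q1_target, q2_target,
                gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Clipped double-Q targets with target-policy smoothing."""
    a_next = target_actor(next_states)
    noise = np.clip(np.random.normal(0.0, noise_std, a_next.shape),
                    -noise_clip, noise_clip)            # smoothing noise
    a_next = np.clip(a_next + noise, -act_limit, act_limit)
    q_min = np.minimum(q1_target(next_states, a_next),
                       q2_target(next_states, a_next))  # smaller Q-value
    return rewards + gamma * q_min
```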
SAC, by contrast, uses entropy regularization, where the policy is trained to maximize a trade-off between expected return and entropy (randomness in the policy).
Table 9.2 Results of three ensemble strategies applied on 30 Sensex index stocks

  Algorithms used in ensemble strategy   A2C, PPO, DDPG   SAC, PPO, DDPG   TD3, PPO, DDPG
  Annual return ↑ (%)                    15.573           12.433           15.922
  Cumulative returns ↑ (%)               84.985           64.552           87.372
  Annual volatility ↓ (%)                21.202           22.645           21.541
  Sharpe ratio ↑                         0.79             0.63             0.79
  Calmar ratio ↑                         0.42             0.33             0.44
  Max drawdown ↓ (%)                     −36.878          −37.382          −36.411
  Omega ratio ↑                          1.17             1.31             1.16
  Sortino ratio ↑                        1.11             0.91             1.13
  Tail ratio ↑                           1.08             1.24             1.40
  Daily value at risk ↓ (%)              −2.605           −2.796           −2.646
  Alpha ↑                                0.09             0.15             0.19
  Beta ↑                                 0.18             0.02             0.02
  Stability ↑                            0.81             0.80             0.91
  Time to run ensemble strategy (min)    138              213              163
DDPG uses off-policy data and the Bellman equation to learn the Q-function, and uses the Q-function to learn the policy. The Q-function is sometimes overestimated due to the learned policy, which is rectified in TD3. PPO is a policy gradient method where the policy is updated explicitly.
Figure 9.4 shows a sample of the output of the ensemble strategy TD3, PPO, DDPG on the 30 Sensex index stocks. It indicates the actions taken by the TD3 agent in the ensemble strategy between 2018 and 2021, where index 2, index 3, etc., denote the stocks arranged in alphabetical order.
9.7 Conclusions
In this paper, we explored the possibility of using TD3, SAC, DDPG, PPO, and A2C agents, which are actor-critic based algorithms, to develop a stock trading strategy. We employ an ensemble technique that automatically chooses the best performing agent to trade, based on the Sharpe ratio, in order to adapt to various market conditions. The results demonstrate that the TD3, PPO, DDPG ensemble approach outperforms the other two ensemble strategies.
Fig. 9.4 Actions taken by the TD3 agent in the ensemble strategy
Acknowledgements The authors are thankful to the senior domain expert, Mr. Rajiv Ramachan-
dran for helping us in understanding the concepts of the stock market and guiding us in the project
during his tenure at IDRBT.
References
1. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (2018)
2. Rubinstein, M.: Markowitz’s “portfolio selection”: a fifty-year retrospective. J. Finance 57(3),
1041–1045 (2002)
3. Betancourt, C., Chen, W.H.: Deep reinforcement learning for portfolio management of markets
with a dynamic number of assets. Exp. Syst. Appl. 164, 114002 (2021)
4. Bertsekas, D.: Dynamic Programming and Optimal Control, vol. 1. Athena Scientific (2012)
5. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization.
In: International Conference on Machine Learning, pp. 1889–1897. PMLR (2015)
6. Zhang, Z., Zohren, S., Roberts, S.: Deep reinforcement learning for trading. J. Finan. Data Sci.
2(2), 25–40 (2020)
7. Xiong, Z., Liu, X.Y., Zhong, S., Yang, H., Walid, A.: Practical deep reinforcement learning
approach for stock trading (2018). arXiv preprint arXiv:1811.07522
8. Bekiros, S.D.: Heterogeneous trading strategies with adaptive fuzzy actor–critic reinforcement
learning: a behavioral approach. J. Econ. Dyn. Control 34(6), 1153–1170 (2010)
9. Li, J., Rao, R., Shi, J.: Learning to trade with deep actor critic methods. In: 2018 11th Interna-
tional Symposium on Computational Intelligence and Design (ISCID), vol. 2, pp. 66–71. IEEE
(2018)
10. Jiang, Z., Liang, J.: Cryptocurrency portfolio management with deep reinforcement learning.
In: 2017 Intelligent Systems Conference (IntelliSys), pp. 905–913. IEEE (2017)
11. Moody, J., Saffell, M.: Learning to trade via direct reinforcement. IEEE Trans. Neural Netw.
12(4), 875–889 (2001)
12. Deng, Y., Bao, F., Kong, Y., Ren, Z., Dai, Q.: Deep direct reinforcement learning for financial
signal representation and trading. IEEE Trans. Neural Netw. Learn. Syst. 28(3), 653–664 (2016)
13. Chen, L., Gao, Q.: Application of deep reinforcement learning on automated stock trading.
In: 2019 IEEE 10th International Conference on Software Engineering and Service Science
(ICSESS), pp. 29–33. IEEE (2019)
14. Dang, Q.V.: Reinforcement learning in stock trading. In: International Conference on Computer
Science, Applied Mathematics and Applications, pp. 311–322. Springer, Cham (2019)
15. Jeong, G., Kim, H.Y.: Improving financial trading decisions using deep Q-learning: predicting
the number of shares, action strategies, and transfer learning. Exp. Syst. Appl. 117, 125–138
(2019)
16. Wang, X., Gu, Y., Cheng, Y., Liu, A., Chen, C.P.: Approximate policy-based accelerated deep
reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 31(6), 1820–1830 (2019)
17. Bertoluzzo, F., Corazza, M.: Testing different reinforcement learning configurations for
financial trading: Introduction and applications. Proc. Econ. Finance 3, 68–77 (2012)
18. Pendharkar, P.C., Cusatis, P.: Trading financial indices with reinforcement learning agents.
Exp. Syst. Appl. 103, 1–13 (2018)
19. Fischer, T.G.: Reinforcement Learning in Financial Markets—A Survey. No. 12/2018. FAU
Discussion Papers in Economics (2018)
20. Weng, L., Sun, X., Xia, M., Liu, J., Xu, Y.: Portfolio trading system of digital currencies: a deep
reinforcement learning with multidimensional attention gating mechanism. Neurocomputing
402, 171–182 (2020)
21. García-Galicia, M., Carsteanu, A.A., Clempner, J.B.: Continuous-time reinforcement learning
approach for portfolio management with time penalization. Exp. Syst. Appl. 129, 27–36 (2019)
22. Raffin, A., Hill, A., Ernestus, M., Gleave, A., Kanervisto, A., Dormann, N.: Stable baselines3
(2019)
23. Gold, C.: FX trading via recurrent reinforcement learning. In: 2003 IEEE International Confer-
ence on Computational Intelligence for Financial Engineering, 2003. Proceedings, pp. 363–370.
IEEE (2003)
24. Almahdi, S., Yang, S.Y.: An adaptive portfolio trading system: a risk-return portfolio opti-
mization using recurrent reinforcement learning with expected maximum drawdown. Exp.
Syst. Appl. 87, 267–279 (2017)
25. Carapuço, J., Neves, R., Horta, N.: Reinforcement learning applied to Forex trading. Appl. Soft
Comput. 73, 783–794 (2018)
26. Hu, Y.J., Lin, S.J.: Deep reinforcement learning for optimizing finance portfolio management.
In: 2019 Amity International Conference on Artificial Intelligence (AICAI), pp. 14–20. IEEE
(2019)
27. Kim, T.W., Khushi, M.: Portfolio optimization with 2D relative-attentional gated transformer.
In: 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE),
pp. 1–6. IEEE (2020)
28. Katongo, M., Bhattacharyya, R.: The use of deep reinforcement learning in tactical asset allo-
cation. Available at SSRN 3812609 (2021)
29. Koratamaddi, P., Wadhwani, K., Gupta, M., Sanjeevi, S.G.: Market sentiment-aware deep
reinforcement learning approach for stock portfolio allocation. Eng. Sci. Technol. Int. J. 24(4),
848–859 (2021)
9 Ensemble Deep Reinforcement Learning for Financial Trading 207
30. Zarkias, K.S., Passalis, N., Tsantekidis, A., Tefas, A.: Deep reinforcement learning for finan-
cial trading using price trailing. In: ICASSP 2019–2019 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pp. 3067–3071. IEEE (2019)
31. Mabu, S., Chen, Y., Hirasawa, K., Hu, J.: Stock trading rules using genetic network program-
ming with actor-critic. In: 2007 IEEE Congress on Evolutionary Computation, pp. 508–515.
IEEE (2007)
32. Ponomarev, E.S., Oseledets, I.V., Cichocki, A.S.: Using reinforcement learning in the
algorithmic trading problem. J. Commun. Technol. Electron. 64(12), 1450–1457 (2019)
33. Liu, Y., Liu, Q., Zhao, H., Pan, Z., & Liu, C.: Adaptive quantitative trading: an imitative
deep reinforcement learning approach. In: Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 34, no. 02, pp. 2128–2135 (2020)
34. Briola, A., Turiel, J., Marcaccioli, R., Aste, T.: Deep reinforcement learning for active high
frequency trading (2021). arXiv preprint arXiv:2101.07107
35. Li, Y., Zheng, W., Zheng, Z.: Deep robust reinforcement learning for practical algorithmic
trading. IEEE Access 7, 108014–108022 (2019)
36. Olschewski, S., Diao, L., Rieskamp, J.: Reinforcement learning about asset variability and
correlation in repeated portfolio decisions. J. Behav. Exp. Finance 32, 100559 (2021)
37. Yang, H., Liu, X.-Y., Zhong, S., Walid, A.: Deep Reinforcement Learning for Automated Stock
Trading: An Ensemble Strategy. SSRN (2020)
38. Aroussi, R.: yfinance. PyPI (2019). Retrieved July 15, 2022. https://pypi.org/project/yfinance
39. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., Zaremba, W.:
Openai gym (2016). arXiv preprint arXiv:1606.01540
40. Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement
learning with function approximation. Adv. Neural Inf. Process. Syst. 12 (1999)
41. Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., et al.: Asynchronous
methods for deep reinforcement learning. In: International Conference on Machine Learning,
pp. 1928–1937. PMLR (2016)
42. Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., et al.: Continuous control
with deep reinforcement learning (2015). arXiv preprint arXiv:1509.02971
43. Sewak, M., Sahay, S.K., Rathore, H.: Policy-approximation based deep reinforcement learning
techniques: an overview. In: Information and Communication Technology for Competitive
Strategies (ICTCS 2020), pp. 493–507 (2022)
44. Buşoniu, L., de Bruin, T., Tolić, D., Kober, J., Palunko, I.: Reinforcement learning for control:
performance, stability, and deep approximators. Annu. Rev. Control. 46, 8–28 (2018)
Part IV
Real-World Applications
Chapter 10
Bibliometric Analysis of Digital Financial
Reporting
Abstract The way people live and conduct business has altered as a result of the fastest-growing technologies of recent years. The spread of the internet and mobile devices has driven a significant shift in many industries, including banking and finance, from manual activity to automation and from offline to online transactions. This study's goal is to examine the literature on digital financial reporting published between 2011 and 2022. The approach used is descriptive research, based on document analysis of earlier studies and literature on digitalization and financial reporting that were accessible for free or without requiring registration from online journals. Articles were gathered from a research database, i.e. Scopus, and 879 articles relevant to this topic were collected and examined. The study looked at a number of factors, including the volume of articles published and citation analysis, using a bibliometric tool. The overall outcome of this study shows that the majority of earlier studies focused on how digitalization has benefitted financial reporting.
10.1 Introduction
The digital revolution has changed entire industries across the world. Industrial Revolution 4.0 stands apart in the era of technology, and new strategies for business are being developed by governments to support superior business practice. The digital revolution has had a tremendous effect on all sectors of the world economy. Industrial Revolution 4.0 is not yet widely spread, but it provides immense opportunities for people to change aspects of their lives [5]. Information flows have become fast with the rapid growth of digital technology, and digital transformation has brought transparency to the financial reporting system. Financial reporting refers to the details of a company's financial statements and is essential information for the better growth of the industry. Through digitalization, the exchange of financial information throughout the industry has become easy and transparent. Thanks to technology, the whole financial reporting process has not only become more convenient, but the quality of information has also improved. Digitalization has created new and innovative prospects for inventors and organizations [14]. The adoption of technologies such as cloud computing, cyber security, and big data has helped in the collection and processing of data and helps organizations simplify their financial and corporate reporting. Emerging digital technology has completely changed industry practice, and the digital financial reporting process has been widely researched [6, 17]. The format of presenting accounting data has changed under the influence of digitalization. There is a saying attributed to Henry Ford: "If you always do what you have always done, you will always get what you have always got." In other words, if you want something different and extraordinary, you have to shift from the traditional method to a new one. Nowadays everyone is short of time, and in this situation experts want information that is short, crisp, and relevant.
Financial reports are one of the main components of any kind of business. A set of technologies helps to provide good-quality reporting at a lower cost. The use of digital tools such as automation, machine learning, and advanced analytics helps to prepare reports faster. It is expected that in the coming future reporting tools will themselves become interactive, and there will be no need for paper-based reports for communication. Digitalization is still a new practice throughout industry, and it is sometimes felt that change will take time. Financial reporting is not only about technology; it is a process of understanding information in a more effective way. The use of such tools and techniques has put an end to repetitive processes. Moudud-Ul-Huq [12] stated in his study that digitalization is also helpful for auditors in audit planning, risk assessment, internal evaluation, and going-concern decisions. Many large companies have automated their accounting work and find it easy. Most companies are investing in software to facilitate accounting work. An accounting information system, a computer-based system, is used for performing accounting analysis in a company. The integration of digitalization into accounting systems has become an essential need in this era. Innovation in technology has not only made working processes easier but has also enhanced productivity, which leads to fast economic growth. The entire
flow of the business has developed due to the impact of digital technology. The presentation of economic performance through a company's website is called internet financial reporting (IFR). Stakeholders and investors prefer internet-based financial reporting to the traditional reporting system, as it saves cost and maintains time efficiency. In this era of globalization, companies are motivated to publish accurate information, and both the financial and non-financial information of a company should be transparent for stakeholders. Internet financial reporting helps companies deal with stakeholders in a better way and reduces problems. Addressing all dimensions of a company's future and maintaining its sustainability can be achieved through internet financial reporting.
This chapter is a bibliometric analysis of the existing literature on the impact of digitalization on financial reporting. It provides a comprehensive overview by analyzing the findings of previous literature on digital financial reporting.
Damayanti et al. [2] conducted a study on MSME owners who now use digital platforms for keeping financial records. The researchers found that the owners are very positive about using digital technology for maintaining financial records, as it has helped them run their businesses smoothly. Efimova and Rozhnova [4] examined the effect of technology on the development of the financial reporting process: what role technology plays in the delivery of financial information and what transformations have occurred in the overall financial recording process. Kulikova and Mukhametzyanov [9] studied the development of the financial reporting system, presenting the changes in the economy and how technology is significantly impacting the whole financial system in the world. The study conducted by Nurunnabi and Hossain [15] in an emerging economy, Bangladesh, identified the effect of big data on company characteristics. The research was conducted on 83 listed companies, and the results showed that only 28 companies provided financial information on their websites. Zaidi et al. [21] studied the shift from the Bahi-khata to annual statements on company websites, which has occurred due to technical changes in the economy. With the help of the Beneish model, the financial statements of 100 companies listed on the NIFTY, and the window dressing in them, were investigated. The researchers identified that construction companies do much window dressing in their financial statements; a detailed and in-depth audit report is needed to identify flaws in such financial statements. A rule was approved by the SEC in 2008 requiring companies to submit their financial statements in XBRL format. The study by Efendi et al. [3] emphasized the global development of XBRL and its benefits: initially reporting was low, but it increased over time, and although adopters unused to the format face difficulties at first, with learning it becomes easy and cost saving for companies.
A study was conducted by PWC [16] on 76 companies in Germany about their current and future status of digital development. When asked in the survey how they were planning for the use of new technology, the companies responded that nearly 20% of them would share their data with suppliers and customers, 19% would use online transfer of payments, and 14% would replace Excel worksheets. Lasi et al. [10] noted that the first industrial revolution is linked to mechanization, the second to the extensive use of power/electricity, and Industry 4.0 to digitalization. Salaudeen and Alayemi [18] explain why companies choose to publish their financial reports on the internet and how it is convenient for stakeholders to collect information easily through an online platform. Internet financial reporting is overall a very smooth process for both companies and investors; not only stakeholders but also the public can obtain financial information about any company. Development in IFR has been so significant that regulators have now made it mandatory for companies to disclose their corporate information on the companies' websites [8]. The content of IFR includes profit and loss statements, the balance sheet, cash flows, the sustainability report and the CSR report [20].
10.3 Methodology
The methodology section covers the objectives of the research and the process of obtaining the data set. The primary objective of this study is to conduct a bibliometric analysis of papers on digital financial reporting found in the Scopus database, and specific research questions have been formulated to achieve this goal (Table 10.1). It is crucial to select the appropriate search engine for data extraction. In this study, Scopus was chosen for this purpose due to its reputation as a prominent index and the fact that it publishes peer-reviewed, high-quality work. Additionally, Scopus measures the quality of each title using metrics such as the h-index, CiteScore, SCImago Journal Rank, and Source Normalized Impact per Paper. Using Scopus, 879 research papers on the topic of digital financial reporting, published between 2010 and 2022, were found in the index. The search string used on Scopus included parameters for identifying annual trends, leading authors and journals, subject areas, document types, affiliations, and top countries. The search was conducted on September 21, 2022.
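As a rough illustration of how such an extraction can be processed, the sketch below assumes the records were exported from Scopus as a CSV file (here "scopus.csv") containing "Year" and "Authors" columns; the file name, the column names, the semicolon author separator, and the commented query are assumptions rather than the authors' exact setup.

import pandas as pd

# e.g. TITLE-ABS-KEY("digital financial reporting") AND PUBYEAR > 2009
df = pd.read_csv("scopus.csv")

# Annual publication trend underlying Fig. 10.1
per_year = df["Year"].value_counts().sort_index()
print(per_year.loc[2010:2022])

# Most prolific authors (cf. Table 10.2): split the author field and keep
# authors appearing on at least four papers
authors = df["Authors"].str.split(";").explode().str.strip()
prolific = authors.value_counts()
print(prolific[prolific >= 4])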
10.5 Results
In this research work, the prevalence of digital financial reporting in current research
is analyzed using bibliometric tools. The trends revealed that digital financial
reporting has become increasingly common. As per the study by Al-Sakran and
Al-Sakran [1], the use of mobile and online banking has grown significantly, with
more customers using these platforms to access financial services and products. This
has resulted in increased convenience and efficiency for customers and increased
Fig. 10.1 Number of publications per year, 2010–2022 (x-axis: years; y-axis: number of publications)
revenue for financial institutions [1]. Another area of emphasis in the literature on digital finance is the utilization of blockchain technology, a decentralized and secure digital ledger that is being considered as a potential solution to various issues in the financial industry, such as fraud and security [11]. In all, the 879 research papers retrieved were published between 2010 and 2022. From Fig. 10.1 it is evident that there was a steady increase in publications from 2010 to 2012, with 28, 33 and 34 publications respectively, and that growth resumed from 2017 onwards; sudden increases were noted in 2020 with 136 articles, in 2021 with 163 articles, and again in 2022, which reached the peak of 183 articles.
Table 10.2 and Fig. 10.2 describe the authors who have published at least four papers, 18 authors in total. Vasarhelyi M. A., who published 25 papers and received 871 citations, held the top spot on this list. Following closely was Wang T., who authored eight papers and received 265 citations. Mithas S. also made the list with 7 articles and 434 citations. Additionally, Zhang Y., Dwivedi Y. K., Dai J. and Zhang I. each contributed 5 articles on digital financial reporting, and the remaining authors on the list contributed four publications each to the field.
In this section, the countries with the highest number of publications are listed.
Table 10.6 illustrates that the United States had the most publications at 317, followed
by the United Kingdom with 101, and then Australia and China with 72 each. Other
countries on the list include India with 51, Italy with 47, Germany with 44, Spain
with 38, Canada with 31, Malaysia with 30, Hong Kong with 25, France with 24,
Taiwan with 24, Indonesia with 23, and Brazil with 19. Figure 10.5 demonstrates the
countries’ collaboration network.
A total of 4399 keywords were extracted, of which 99 met the minimum occurrence threshold of 5. The final network was then developed and 186 keywords were selected for further analysis. The selected keywords are classified into 8 clusters with different colors, as shown in Fig. 10.6. Cluster one is the largest, consisting of 38 items. It was observed that the most frequently occurring keyword was xbrl, followed by blockchain technology and finance.
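To make the occurrence-threshold step concrete, a minimal sketch follows, assuming each record's "Author Keywords" field in the same hypothetical "scopus.csv" export is a semicolon-separated string; the field name and separator are assumptions.

import pandas as pd
from collections import Counter
from itertools import combinations

df = pd.read_csv("scopus.csv")
kw_lists = (df["Author Keywords"].dropna().str.lower().str.split(";")
              .apply(lambda ks: sorted({k.strip() for k in ks if k.strip()})))

# Keep keywords occurring at least 5 times, the threshold used in the text
counts = Counter(k for ks in kw_lists for k in ks)
kept = {k for k, c in counts.items() if c >= 5}

# Co-occurrence edges between retained keywords, as fed to a clustering tool
edges = Counter()
for ks in kw_lists:
    for a, b in combinations([k for k in ks if k in kept], 2):
        edges[(a, b)] += 1
print(len(kept), "keywords retained;", len(edges), "co-occurrence edges")
print(counts.most_common(3))  # expected to surface xbrl, blockchain, finance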
10.13 Discussions
The literature on digital financial reporting is vast and multifaceted. Digital financial
reporting refers to the use of digital technologies to create, process, and disseminate
financial information. This includes the use of digital platforms and software to
create financial statements, as well as the use of digital tools for data analysis and
visualization.
One of the main areas of focus in the literature on digital financial reporting is the use of XBRL (eXtensible Business Reporting Language) technology. XBRL is an open standard for digital financial reporting that allows for the creation of structured and machine-readable financial statements [7]. Researchers have examined the benefits and challenges of using XBRL, including its potential to improve the accuracy, timeliness, and comparability of financial information [7, 19].
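To illustrate what "structured and machine-readable" means in practice, the toy example below parses an XBRL-style instance with Python's standard library; the element names, namespace and values are invented for illustration and do not correspond to a real taxonomy.

import xml.etree.ElementTree as ET

instance = """<xbrl xmlns:ifrs="http://example.org/ifrs">
  <ifrs:Revenue contextRef="FY2022" unitRef="USD" decimals="0">5000000</ifrs:Revenue>
  <ifrs:ProfitLoss contextRef="FY2022" unitRef="USD" decimals="0">750000</ifrs:ProfitLoss>
</xbrl>"""

root = ET.fromstring(instance)
for fact in root:                     # each child element is one tagged fact
    concept = fact.tag.split("}")[1]  # strip the namespace prefix
    print(concept, fact.attrib["contextRef"], fact.attrib["unitRef"], fact.text)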
Another area of focus in the literature on digital financial reporting is the use of
digital platforms and software for financial reporting. This includes the use of cloud
computing, blockchain technology, and artificial intelligence in financial reporting.
Researchers have examined the potential benefits and drawbacks of these technologies, such as improved efficiency and transparency, as well as security and data privacy concerns.
10.14 Conclusion
There is also a growing body of literature on the impact of digital financial reporting
on various stakeholders, such as investors, regulators, and accounting profes-
sionals. Researchers have examined how digital financial reporting can improve the
decision-making process for these stakeholders and how it can affect their roles and
responsibilities.
The literature on digital financial reporting is also expanding geographically, with
researchers from different countries studying the adoption and implementation of
digital financial reporting in their respective countries.
Overall, the literature on digital financial reporting highlights the potential benefits
and challenges of using digital technologies in financial reporting, and the need for
further research to fully understand the implications of these technologies for various
stakeholders.
Finally, our bibliometric study has offered useful insights into the landscape of digital financial reporting research. Through a comprehensive analysis of the relevant literature, we discovered numerous major trends, themes, and patterns that shed light on the evolution of this discipline. Our findings are summarised in the following key points:
Emerging Research Topics: Our research uncovered new research topics in digital financial reporting, such as the impact of blockchain technology, the role of artificial intelligence and machine learning, and the incorporation of sustainability reporting.
Collaborations and Authors: We identified notable authors and research organisations making significant contributions to the field. Collaboration networks among scholars have been significant in promoting the development of knowledge.
Publication Trends: The number of publications associated with digital financial reporting has grown steadily over the years, demonstrating the field's rising importance.
Citation Patterns: As evidenced by their substantial citation counts, several key books and publications have received considerable interest; these works have shaped subsequent research.
Geographical Distribution: Research on digital financial reporting is a global
effort, with contributions from scholars and institutions all around the world. This
indicates the topic’s international importance and application.
Future Directions: Our analysis has identified several promising avenues for future research, such as investigating the ethical and regulatory implications of digital financial reporting, examining the adoption challenges faced by organisations, and assessing the impact of emerging technologies on financial reporting quality.
The findings derived from this analysis have practical consequences for policy-
makers, practitioners, and researchers. Policymakers may utilize this information to
guide regulatory choices, while practitioners can learn about the newest innovations
in digital financial reporting.
Limitations and Future Scope: This research suggests several potential areas for future work on digital financial reporting, such as exploring highly cited research papers and examining technical aspects using different literature databases, such as WoS (Web of Science) and other internet databases, to compare with the findings of this study.
References
1. Al-Sakran, W.A., Al-Sakran, W.: The impact of digital technology on banking services: a literature review. J. Internet Bank. Commer. 25(2), 1–13 (2020)
2. Damayanti, F.N., Kusmawati, P., Navia, V., Luckyardi, S.: Readiness the owner of small medium
enterprises for digital financial records in Society 5.0 era. ASEAN J. Econ. Econ. Educ. 1(1),
1–8 (2022)
3. Efendi, J., Smith, L.M., Wong, J.: Longitudinal analysis of voluntary adoption of XBRL on
financial reporting. Int. J. Econ. Account. 2(2), 173–189 (2011)
4. Efimova, O., Rozhnova, O.: The corporate reporting development in the digital economy. In:
Digital Science, pp. 71–80. Springer (2019)
5. Fauzan, R.: Karakteristik model dan analisa peluang-tantangan industri 4.0. Phasti: Jurnal
Teknik Informatika Politeknik Hasnur 4(01), 1–11 (2018)
6. Guthrie, J., Manes-Rossi, F., Orelli, R.L.: Integrated reporting and integrated thinking in Italian
public sector organisations. Meditari Acc. Res. 25(4), 553–573 (2017)
7. Jørgensen, B.: Digital financial reporting: opportunities and challenges. J. Appl. Account. Res.
17(3), 285–298 (2016)
8. Keliwon, K.B., Abdul Shukor, Z., Hassan, M.S.: Measuring internet financial reporting (IFR)
disclosure strategy. Asian J. Account. Govern. 8, 7–24 (2017)
9. Kulikova, L.I., Mukhametzyanov, R.Z.: Formation of financial reporting in the conditions of
digital economy. J. Environ. Treat. Tech. 7(Special Issue), 1125 (2019)
10. Lasi, H., Fettke, P., Kemper, H.G., Feld, T., Hoffmann, M.: Industry 4.0. Bus. Inf. Syst. Eng.
6, 239–242 (2014)
11. Li, H., Wang, H., Zhang, Y.: Blockchain in finance: a literature review. J. Financ. Stab. 34, 196–213 (2018)
12. Moudud-Ul-Huq, S.: The role of artificial intelligence in the development of accounting
systems: a review. IUP J. Account. Res. Audit Pract. 13(2) (2014)
13. Murdayanti, Y., Khan, M.N.A.A.: The development of internet financial reporting publications:
a concise of bibliometric analysis. Heliyon 7(12), e08551 (2021)
14. Nambisan, S., Lyytinen, K., Majchrzak, A., Song, M.: Digital innovation management:
reinventing innovation management research in a digital world. MIS Q. 41(1), 223–238 (2017)
15. Nurunnabi, M., Alam Hossain, M.: The voluntary disclosure of internet financial reporting
(IFR) in an emerging economy: a case of digital Bangladesh. J. Asia Bus. Stud. 6(1), 17–42
(2012)
16. PWC: Digitalisation in finance and accounting and what it means for financial statement
audit (2018). Available at: https://www.pwc.de/de/im-fokus/digitaleabschlusspruefung/pwc-
digitalisation-in-finance-2018.pdf. 13 Apr 2019
17. Rinaldi, L., Unerman, J., de Villiers, C.: Evaluating the integrated reporting journey: insights,
gaps and agendas for future research. Account. Audit. Account. J. 31(5), 1294–1318 (2018)
18. Salaudeen, H., Alayemi, S.A.: The level of internet adoption in business reporting: the Nigerian
perspectives. Int. J. Appl. Bus. Res. 107–121 (2020)
19. Sun, H., Li, X., Wang, Q.: Digital financial reporting: a literature review and research agenda.
J. Account. Data Sci. 2(2), 1–19 (2018)
20. Suryanto, T., Komalasari, A.: Effect of mandatory adoption of international financial reporting
standard (IFRS) on supply chain management: a case of Indonesian dairy industry. Uncertain
Supply Chain Manage. 7(2), 169–178 (2019)
21. Zaidi, U.K., Akhter, J., Akhtar, A.: Window dressing of financial statements in the era of digital
finance: a study of small cap Indian companies. Metamorphosis 17(2), 67–75 (2018)
Chapter 11
The Quest for Financing Environmental
Sustainability in Emerging Nations: Can
Internet Access and Financial
Technology Be Crucial?
E. P. Mesagan (B)
School of Management and Social Sciences, Pan-Atlantic University, Lagos, Nigeria
e-mail: [email protected]
P. M. Emmanuel
MGIG Global Research Institute, Lagos, Nigeria
e-mail: [email protected]
M. B. Salaudeen
Nungu Business School, Lagos, Nigeria
e-mail: [email protected]
11.1 Background
The climate change problem is now a global phenomenon attracting the attention of global leaders, institutions, and policy analysts. This is due to the devastating impact of climate change, especially in developing continents like Africa. Olaoye [26] and Mesagan et al. [22] argued that GHG emissions arising from energy combustion and anthropogenic activities are the drivers of climate change. The climate change impact due to GHG emissions is damaging, with substantial economic, health, and environmental consequences for current and future generations. For instance, Africa contributes about 2–3% of emissions, the lowest globally, compared to Asia, which accounts for half of global CO2 emissions, and North America, which emitted about 15% of total emissions between 2000 and 2020 [30]. Despite African countries' minimal contribution to GHGs, the negative effects on the region are dehumanising, especially among rural inhabitants.
African countries face dire environmental problems, including flooding, desertification, and air pollution. The intensity of the annual flooding further exacerbates rural poverty and acute food insecurity, as flooding continues to wash away farmlands and displace millions of farmers. The World Meteorological Organisation [35] noted that by 2030, 700 million people could be forced out of their homes due to high water stress, which is anticipated to affect roughly 250 million people in Africa owing to climate problems. Aside from this, UNEP [31] noted that environmental pollution caused by CO2 emissions results in 1.96 billion IQ point losses annually in Africa. Also, Fisher et al. [10], Olunkwa et al. [27], and Evans and Mesagan [9] affirmed that over a million deaths on the African continent were attributed to air pollution in 2019: 697,000 were attributable to household pollution, while 394,000 were due to ambient pollution. These negative consequences of climate change require massive investment that promotes adaptation and mitigation. As a result, the financial system becomes indispensable in financing environmentally friendly projects to enhance sustainability.
However, without considerable sustainability financing, the target of achieving the sustainable development goals (SDGs) by 2030 and the global commitment to carbon neutrality by 2050 will be far from view for developing economies. This is because massive green investment financing is required to promote carbon efficiency and environmental quality. Therefore, the interface of the financial system, technology, and internet access is vital in ensuring that the financing of environmentally sustainable projects is commensurate with the level of greenhouse depletion. Arguing along this line, Nassiry [25] contends that to accomplish the SDGs and the core goal of the Paris Agreement of keeping the increase in global temperature below 2 °C, trillions of dollars of investment will be required to finance projects that enhance environmental quality. Specifically, Muganyi et al. [24] posited that Fintech encourages green agriculture practices in China by assuring financial availability, reducing the information gap, and increasing trust among farming communities. This indicates that Fintech could lower unsustainable agricultural practices in Africa and thus encourage sustainable agriculture.
There is a need to mention that environmental financing in Africa is driven mainly by the government. However, the level of public climate financing is not commensurate with the green investment needed for a sustainable environment, owing to the peculiar pressure on public revenue. Although evidence from the Climate Policy Initiative [6] affirmed that Africa's public share of climate financing, at 86% of the total, exceeds those of other regions (i.e., South Asia 64%, East Asia 62%, Western Europe 41%), its private sector contribution is appalling. This evidence indicates that the private sector accounts for only 14% of climate adaptation and resilience projects, the lowest compared to other regions. Moreover, between 2020 and 2030, Africa will need almost $2.8 trillion worth of green investment to fulfil its Nationally Determined Contributions (NDCs) [12]. The international public sector, as well as the domestic and international private sectors, must contribute $2.5 trillion of this total [12]. This shows that private green equities and bonds for African countries are crucial to accelerating climate resilience financing and investment towards achieving the SDGs by 2030 and net-zero targets by 2050. Therefore, mobilising and ensuring efficient allocation of these green funds places much onus on internet accessibility and financial technology, since Fintech can make this funding accessible cheaply and efficiently.
Therefore, this chapter explores the inherent opportunities of internet accessibility and fintech adoption in promoting environmental sustainability. Structurally, after this background, Sect. 11.2 presents the schematic analysis of the study, Sect. 11.3 shows the situational analysis, Sect. 11.4 discusses the implications of the findings, and lastly, Sect. 11.5 provides a policy outlook.
11.2 Schematic Analysis of the Study
Figure 11.1 presents the framework that connects Fintech, internet access and environmental sustainability. It depicts how the financial system can drive sustainable development through financial innovation and access to the internet. In the schematic analysis in Fig. 11.1, the financial system is the central hub that coordinates private funds and green bonds towards investment in environmentally friendly projects. According to Muganyi et al. [24], the financial system is indispensable in mobilising financial resources towards sustainable investments. As Fig. 11.1 shows, private funds, which include both households' and firms' surplus savings, are coordinated by the financial structure, which makes such funds available to fund the projects of financially deficit units. Also, like private funds, a green bond is an environmental financing instrument that the government floats via the financial system to motivate investment that considers environmental impact.
Fig. 11.1 Schematic linkage between financial technology, internet access and environmental sustainability. Source Authors' Design
11.3 Situational Analysis
This segment of the study presents the data analysis concerning the depth of environmental financing, the internet access situation and financial technology adoption in emerging nations, with a special focus on Africa. The situational analysis of these countries is illustrated using charts to give a clear picture of the environmental financing scenario and the possibility of scaling up financing through the adoption of Fintech supported by internet access.
The analysis presented in Fig. 11.2 shows climate financing in Africa. The countries represented are chosen from the five sub-regions of the African continent: in North Africa, we selected Algeria and Egypt; for East Africa, the analysis chose Rwanda and Kenya; for Central Africa, Angola and Cameroon were selected; in Southern Africa, we picked South Africa and Namibia; and for West Africa, the analysis selected Nigeria and Ghana. This approach provides a solid basis for the policy actions that emanate from the study. Based on the data analysis in Fig. 11.2, in North Africa, Egypt spent about $2600 million on environmental sustainability financing, while Algeria spent only $53 million on climate financing. For East Africa, climate financing is $1919 million in Kenya, while Rwanda spends about $601 million on environmental sustainability. In the West African region, Nigeria and Ghana spent $1923 million and $830 million, respectively. In the Southern African region, South Africa's climate financing stood at about $1660 million, and Namibia's spending on climate mitigation and adaptation is $202 million. Lastly, for Central Africa, Angola's climate financing is $307 million, while Cameroon's is $390 million. The evidence illustrates that African countries are making an effort towards environmental sustainability. Comparing sustainability financing among the African regions, North and East Africa tend to perform better than the West, Southern and Central African regions, with the Central African countries performing most poorly. Supporting these findings, an IEA report indicates that sustainable financing in Northern Africa has accelerated the clean energy transition agenda. As a result, the North African countries have increased their renewable energy production by 40%, adding 4.5 GW of wind, solar PV, and solar thermal capacity to their renewable power grid in the last decade; moreover, renewable generation capacity has increased by 80% over the same period [16].
236 E. P. Mesagan et al.
Cameroon
Angola
Namibia
African countries
South Africa
Ghana
Nigeria
Rwanda
Kenya
Algeria
Egypt
Climate Finance $ Million
Fig. 11.2 Climate financing in Africa. Source Authors’ Sketch using Data from [6]
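As a cross-check, the comparison in Fig. 11.2 can be reproduced directly from the figures quoted in the paragraph above (US$ millions); the matplotlib sketch below uses those quoted values rather than the authors' underlying dataset.

import matplotlib.pyplot as plt

climate_finance = {"Egypt": 2600, "Algeria": 53, "Kenya": 1919, "Rwanda": 601,
                   "Nigeria": 1923, "Ghana": 830, "South Africa": 1660,
                   "Namibia": 202, "Angola": 307, "Cameroon": 390}

plt.barh(list(climate_finance), list(climate_finance.values()))
plt.xlabel("Climate finance ($ million)")
plt.ylabel("African countries")
plt.title("Climate financing in Africa")
plt.tight_layout()
plt.show()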
Fig. 11.3 Sources of climate financing. Source Authors’ Sketch using Data from [6]
Fig. 11.4 Proportion of domestic versus international source of green financing. Source Authors’
Sketch using Data from Climate Policy Initiative (2022)
Fig. 11.5 Climate financing for selected East Asian nations. Source Authors’ Sketch using Data
Retrieved from [20]
This subdivision of the study examines the internet access situation globally, by region, to assess its potential to scale up fintech adoption and environmentally sustainable financing in emerging nations. The situation analysis is illustrated in Fig. 11.6.
Figure 11.6 showcases the trend of internet access by region over the decade between 2010 and 2020. The evidence indicates that access to the internet has continued to rise over the period across the regions of the world. However, the African continent has the lowest access rate, as Fig. 11.6 shows. Strikingly, for sub-Saharan Africa, access to the internet rose sharply from about 6.13% in 2010 to 29.34% in 2020. The implication is that in 2010 only about 6% of the African population could access the internet, but over this period the rate of accessibility has significantly improved. Moreover, the accessibility of the internet in Africa has triggered an increase in mobile phone penetration in the region. The analysis of mobile phone penetration is illustrated in Fig. 11.7.
The mobile phone penetration data are presented in Fig. 11.7, and the trend line is similar to that of Fig. 11.6. This is not surprising, because the internet access rate can determine the mobile phone penetration rate. Assessing the situation of Africa, the mobile phone penetration rate in sub-Saharan Africa was 44.09% in 2010 but increased to 81.9% in 2020, implying that the penetration rate almost doubled in 10 years. Moreover, this corroborates the surge in internet access over the period. In this regard, internet access provides a huge opportunity to drive fintech adoption, since Al-Okaily et al. [1] and Crouhy et al. [7] emphasised that the internet is the enabler of Fintech.
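A quick arithmetic check of the sub-Saharan figures quoted above: growth from 6.13% to 29.34% penetration over the ten years 2010–2020 implies a compound annual growth rate of roughly 17%; the two-line computation below simply evaluates this from the endpoints given in the text.

start, end, years = 6.13, 29.34, 10
print(f"Implied CAGR of internet access: {(end / start) ** (1 / years) - 1:.1%}")  # about 17% per year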
Fig. 11.6 Internet access by regions. Source Authors’ Sketch using Data Sourced from World
Development Indicator (2021)
Fig. 11.7 Mobile phone penetration by region. Source Authors’ Sketch using Data Sourced from
World Bank World Development Indicator (2021)
Fig. 11.8 Fintech situational analysis. Source Authors’ Sketch using Data Sourced from The World
Bank Global Findex Database (2021)
Furthermore, the analysis reveals that the public sector in Africa bears the burden of climate financing while the private sector's contribution is minimal. The statistics from the Climate Policy Initiative [6] show that the public sector provides 86% of climate financing needs in Africa while the private sector accounts for only 14%. Similarly, our analysis discovered that of this 86% of public climate financing, about 82% of the funding is internationally sourced while 18% is domestic. The minimal private sector involvement in Africa possibly accounts for the wide disparity in climate project funding in Africa compared with other regions globally. Therefore, the role of the financial sector in mobilising private funds for sustainable investment is essential for African countries and emerging nations to close the wide gap in climate finance needs.
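Combining the shares quoted above shows how little of Africa's climate finance is both public and domestically raised; the short computation below multiplies the 86% public share by its 18% domestic portion (figures from this paragraph, not an independent dataset).

public_share, domestic_of_public = 0.86, 0.18
domestic_public = public_share * domestic_of_public
print(f"Domestically raised public funds: {domestic_public:.1%} of all climate finance")  # 15.5%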
Since the financial sector is essential to stimulate climate financing and narrow the green funding gap among African nations, we evaluate the potential of internet access and fintech adoption to accelerate environmental sustainability financing. This analysis finds that access to the internet in African countries has substantially improved. The implication is that improved internet access constitutes an enabler for the financial industry to advance financial services through financial technology adoption. Also, through access to the internet, individuals and organisations can enjoy the financial services provided by the financial industry. On this basis, the situational analysis of fintech penetration shows that digital transactions and access to credit through mobile money apps are increasing in African nations. Therefore, with this progress in internet access and financial technology adoption, the financial sector can drive green financing by extending green bond opportunities and credit services to the general public, which boosts environmental sustainability financing.
whose adequacy is largely questioned. Therefore, we suggest that the financial system should actively get involved in mobilising financial resources to support investment in green projects, such as supporting renewable mini-grids for electricity generation, providing funding for sustainable agriculture, creating long-term financial avenues for electric car projects in Africa to reduce the carbon footprint of the transport sector via green bonds, debt, and equity, and supporting general energy-efficient investment in Africa through financial technology adoption. Such environmentally sustainable financing will help African countries develop a climate-change-resilient ecosystem to promote sustainability even as the continent's population doubles by 2060.
References
1. Al-Okaily, M., Al Natour, A.R., Shishan, F., Al-Dmour, A., Alghazzawi, R., Alsharairi, M.:
Sustainable FinTech innovation orientation: a moderated model. Sustainability 13(24), 13591
(2021)
2. Arner, D.W., Buckley, R.P., Zetzsche, D.A., Veidt, R.: Sustainability, FinTech and financial
inclusion. Eur. Bus. Organ. Law Rev. 21(1), 7–35 (2020)
3. Ayompe, L.M., Davis, S.J., Egoh, B.N.: Trends and drivers of African fossil fuel CO2 emissions
1990–2017. Environ. Res. Lett. 15(12), 124039 (2021)
4. Cen, T., He, R.: Fintech, green finance and sustainable development. In: 2018 International
Conference on Management, Economics, Education, Arts and Humanities (MEEAH 2018),
pp. 222–225. Atlantis Press (2018)
5. Chen, L., Ma, R., Li, J., Zhou, F.: Revolutionizing sustainable economic growth in China:
harnessing natural resources, green development, and fintech for a greener future. Res. Policy
92, 104944 (2024)
6. Climate Policy Initiative: Landscape of Climate Finance in Africa (2020). Avail-
able at: https://www.climatepolicyinitiative.org/publication/landscape-of-climate-finance-in-
africa/. Accessed 3 Jan 2023
7. Crouhy, M., Galai, D., Wiener, Z.: The impact of Fintechs on financial intermediation: a
functional approach. J. FinTech 1(01), 2031001 (2021)
8. Deng, X., Huang, Z., Cheng, X.: FinTech and sustainable development: evidence from China
based on P2P data. Sustainability 11(22), 6434 (2019)
9. Evans, O., Mesagan, E.P.: ICT-trade and pollution in Africa: do governance and regulation
matter? J. Policy Model. 44(3), 511–531 (2022)
10. Fisher, S., Bellinger, D.C., Cropper, M.L., Kumar, P., Binagwaho, A., Koudenoukpo, J.B., Park,
Y., Taghian, G., Landrigan, P.J.: Air pollution and development in Africa: impacts on health,
the economy, and human capital. Lancet Planet. Health 5(10), e681–e688 (2021)
11. Friedline, T., Naraharisetti, S., Weaver, A.: Digital redlining: poor rural communities’ access
to Fintech and implications for financial inclusion. J. Poverty 24(5–6), 517–541 (2020)
12. FsdAfrica: Current Levels of Climate Finance in Africa Falling Drastically Short of Needs
(2021). Retrieved from: https://www.fsdafrica.org/news/current-levels-of-climate-finance-in-
africa-falling-drastically-short-of-needs/. Accessed 9 Oct 2022
13. Giglio, S., Kelly, B., Stroebel, J.: Climate finance. Ann. Rev. Financ. Econ. 13, 15–36 (2021)
14. Gorton, G., Winton, A.: Financial intermediation. In: Handbook of the Economics of Finance,
vol. 1, pp. 431–552. Elsevier (2003)
15. Guang-Wen, Z., Siddik, A.B.: The effect of Fintech adoption on green finance and environ-
mental performance of banking institutions during the COVID-19 pandemic: the role of green
innovation. Environ. Sci. Pollut. Res. 1–13 (2022)
16. IEA: North Africa’s Pathways to Clean Energy Transitions (2020). Available at: https://
www.iea.org/commentaries/north-africa-s-pathways-to-clean-energy-transitions. Accessed 3
Jan 2023
17. Joya, B.: South Africa’s Green Fund (2014). Available at: https://www.greenfinancepla
tform.org/sites/default/files/downloads/best-practices/GGBP%20Case%20Study%20Series_
South%20Africa_Green%20Fund.pdf. Accessed 3 Jan 2023
18. Khan, I.S., Ahmad, M.O., Majava, J.: Industry 4.0 and sustainable development: a system-
atic mapping of triple bottom line. Circular Economy and Sustainable Business Models
perspectives. J. Clean. Prod. 297, 126655 (2021)
19. Lewan, M.: The internet as an enabler of FinTech. In: The Rise and Development of FinTech,
pp. 190–204. Routledge (2018)
20. Melinda, M., Qiu, J.: Climate Finance in Southeast Asia: Trends and Opportunities (2022).
Available at: https://www.iseas.edu.sg/articles-commentaries/iseas-perspective/2022-9-cli
mate-finance-in-southeast-asia-trends-and-opportunities-by-melinda-martinus-and-qiu-jia
hui/. Accessed 1 Jan 2023
21. Mesagan, E.P., Akinsola, F., Akinsola, M., Emmanuel, P.M.: Pollution control in Africa: the
interplay between financial integration and industrialisation. Environ. Sci. Pollut. Res. 29(20),
29938–29948 (2022)
22. Mesagan, E.P., Charles, A.O., Vo, X.V.: The relevance of resource wealth in output growth and
industrial development in Africa. Resour. Policy 82, 103517 (2023)
23. Mesagan, E.P., Vo, X.V., Emmanuel, P.M.: The technological role in the growth-enhancing
financial development: evidence from African nations. Econ. Change Restruct. 1–24 (2022).
https://doi.org/10.1007/s10644-022-09442-z
24. Muganyi, T., Yan, L., Sun, H.P.: Green finance, Fintech and environmental protection: evidence
from China. Environ. Sci. Ecotechnol. 7, 100107 (2021)
25. Nassiry, D.: The Role of Fintech in Unlocking Green Finance: Policy Insights for Developing
Countries (No. 883). ADBI Working Paper (2018)
26. Olaoye, O.: Environmental quality, energy consumption and economic growth: evidence from
selected African countries. Green Low-Carbon Econ. 1–9 (2023)
27. Olunkwa, C.N., Adenuga, J.I., Salaudeen, M.B., Mesagan, E.P.: The demographic effects of
Covid-19: any hope for working populations. BizEcons Q. 15(1), 3–12 (2021)
28. Piñeiro, V., Arias, J., Elverdin, P., Ibáñez, A.M., Morales Opazo, C., Prager, S., Torero, M.:
Achieving Sustainable Agricultural Practices: From Incentives to Adoption and Outcomes. Intl
Food Policy Res Inst (2021)
29. Sebestyén, V.: Renewable and Sustainable Energy Reviews: environmental impact networks
of renewable energy power plants. Renew. Sustain. Energy Rev. 151, 111626 (2021)
30. Tiseo, I.: Breakdown of Carbon Dioxide Emissions Worldwide 2000–2050, by Region (2022).
Retrieved from: https://www.statista.com/statistics/1257801/global-emission-shares-worldw
ide-region-outlook/. Accessed 8 Oct 2022
31. UNEP Report: Aligning Africa’s Financial System with Sustainable Development (2015).
Available at: https://www.greengrowthknowledge.org/sites/default/files/downloads/resource/
Aligning%20Africa%27s%20Financial%20System%20with%20Sustainable%20Develop
ment.pdf. Accessed 3 Jan 2023
32. Wang, J., Chen, X., Li, X., Yu, J., Zhong, R.: The market reaction to green bond issuance:
evidence from China. Pac. Basin Financ. J. 60, 101294 (2020)
33. World Bank: The Global Findex Database 2021: Financial Inclusion, Digital Payments, and
Resilience in the Age of COVID-19 (2021). Retrieved from: https://www.worldbank.org/en/
publication/globalfindex. Accessed 4 Jan 2023
34. World Bank: World Development Indicators (2021). Retrieved from: https://databank.worldb
ank.org/source/world-development-indicators. Accessed 4 Jan 2023
35. World Meteorological Organisation: State of Climate in Africa Highlights Water Stress and
Hazards (2022). Retrieved from: https://public.wmo.int/en/media/press-release/state-of-cli
mate-africa-highlights-water-stress-and-hazards. Accessed 4 Jan 2023
Chapter 12
A Comprehensive Review of Bitcoin’s
Energy Consumption and Its
Environmental Implications
Abstract Bitcoin, which has the highest net worth among cryptocurrencies and the most significant transaction volume, has immense prospects in terms of economic cost, rapid processing, and minimal risk, even while driving significant worldwide transformation and disruption. Since cryptocurrencies and Bitcoin are new, have an uncertain legal status, and bear the possibility of being engaged in illicit behaviour, there is the danger of their use as a highly unpredictable and impulsive investment instrument with environmental consequences. This research studies Bitcoin mining and blockchain technologies, as well as Bitcoin's high energy consumption and its environmental impacts, and describes the processes used to calculate Bitcoin's energy consumption. We discuss the two prominent models used to determine the energy consumed in mining bitcoins: (a) Model 1, by Christian Stoll, Lena Klaaßen, and Ulrich Gallersdörfer, and (b) Model 2, the CBECI model. The findings suggest that the power needed for bitcoin mining has detrimental ecological and social effects, contributing to global warming and climate change. Further, this study also forecasts the future of bitcoin mining and its influence on sustainability.
S. Harichandan
Institute of Management Technology, Nagpur, India
S. K. Kar (B) · A. Kumar
Department of Management Studies, Rajiv Gandhi Institute of Petroleum Technology, Amethi,
Uttar Pradesh 229304, India
e-mail: [email protected]
12.1 Introduction
Over the past few years, there has been an increase in interest in bitcoin's usage and potential as an investment vehicle. When people lost trust during the global financial crisis, bitcoin emerged as a viable answer owing to its mathematical certainty based on blockchain technology [1]. Bitcoin is a decentralised electronic payment system that operates via a direct, anonymous, and secure web. By differentiating itself from typical bank transactions, bitcoin, the most valuable cryptocurrency in terms of market capitalization and transaction volume, delivers considerable advantages to its users. While bitcoin has received much attention in recent years, its essential mechanism, most notably blockchain technology, has risen in popularity at a breakneck speed [2]. Due to its key characteristics of decentralisation, auditability, and anonymity, blockchain is widely regarded as one of the most promising and attractive technologies for a variety of industries, including supply chain finance, manufacturing operations management, logistics management, and the Internet of Things (IoT). When bitcoin and other cryptocurrencies are used in conjunction with blockchain technology, they are not controlled by any organisation or government, as printed cash is [3]. Bitcoin is produced, updated, and examined via the use of cryptographic concepts and computer algorithms. With the birth of bitcoin, the market saw the emergence of hundreds of alternative cryptocurrencies nicknamed altcoins [4]. On the surface, cryptocurrencies seem to be exceedingly volatile and ripe for speculation. Despite its promise and attractiveness, the first application of the current consensus method in the bitcoin network's real functioning demonstrates that it has a substantial energy and carbon emission cost. As a consequence, resolving this problem expeditiously is crucial.
When employing blockchain technology, bitcoin and other cryptocurrencies are not controlled by any organisation or government, as fiat money is. Bitcoin is created, modified, and inspected using an established technical infrastructure built on cryptographic principles and a software algorithm [5]. With the advent of bitcoin, dozens more cryptocurrencies, dubbed altcoins, entered the market. Evaluated on a surface level, cryptocurrencies seem to be very volatile and ripe for speculation. Although cryptocurrencies attract users and investors, and their autonomy means they may be used for money laundering and unlawful activity, they remain reliant on government decisions concerning bitcoin and its environmental impact. Against this background, this review article aims to study the detrimental effects of bitcoin mining on society. The annual carbon footprint of bitcoin is estimated to be 44.56 Mt CO2 equivalent, and it consumes 79.89 TWh of electricity, primarily generated from fossil fuels [6]. Additionally, a single bitcoin transaction releases electronic waste equivalent to 406.50 g, with 44.59 kt of electronic waste being released into Earth's ecosystem annually. As growing economies like the USA, China, Japan, the United Kingdom, and India take longer strides towards making their economies sustainable and emission-free (net-zero) by 2070, it is the need of the hour to look at this emerging issue [3]. Bitcoin mining is on the rise, and
the trend is estimated to grow further in the future, thus making sustainable mining a must for bitcoin miners.
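The estimation logic behind hashrate-based models such as CBECI can be illustrated with back-of-the-envelope arithmetic: network power equals network hashrate multiplied by average miner energy efficiency. In the sketch below, the hashrate, efficiency and grid carbon intensity are illustrative assumptions chosen to land near the figures quoted above, not values taken from the cited sources.

HASHRATE_TH_S = 150e6        # assumed network hashrate: 150 EH/s, expressed in TH/s
EFFICIENCY_J_PER_TH = 60.0   # assumed fleet-average miner efficiency, joules per terahash
CARBON_G_PER_KWH = 560.0     # assumed grid carbon intensity, grams of CO2 per kWh

power_w = HASHRATE_TH_S * EFFICIENCY_J_PER_TH               # J/s, i.e. watts
annual_twh = power_w * 8760 / 1e12                          # W x hours/year -> TWh
annual_mt_co2 = annual_twh * 1e9 * CARBON_G_PER_KWH / 1e12  # kWh x g/kWh -> Mt

print(f"{annual_twh:.2f} TWh/yr, {annual_mt_co2:.2f} Mt CO2/yr")
# prints 78.84 TWh/yr and 44.15 Mt CO2/yr, close to the 79.89 TWh and 44.56 Mt cited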
The objective of this article centres around the environmental implications of
Bitcoin’s energy consumption, particularly in the context of its mining process.
This issue stems from the significant amount of electricity consumed by the Bitcoin
network, primarily through its mining operations. This energy consumption is a
concern due to its association with carbon emissions and electronic waste, both
of which have negative environmental effects. The increasing adoption and growth
of Bitcoin mining exacerbate these concerns, making it imperative to address the
environmental impact and seek sustainable alternatives. In light of this, the present
article aims to explore and analyse the detrimental effects of Bitcoin mining on
society, present data regarding its carbon footprint and electronic waste generation
and propose potential sustainable solutions to mitigate these adverse impacts. The
research objectives are:
(a) To analyse the negative environmental consequences of Bitcoin mining,
including its carbon footprint and electronic waste generation, in order to
understand the extent of the problem.
(b) To quantify the energy consumption of the Bitcoin network and its mining
operations, particularly in terms of electricity usage, and to highlight the reliance
on fossil fuels for energy generation.
(c) To provide insight into the annual carbon emissions resulting from Bitcoin
mining activities, emphasizing the ecological consequences of the energy-
intensive process.
(d) To explore and suggest sustainable alternatives and solutions that could mitigate
the environmental impact of Bitcoin mining, with a focus on reducing energy
consumption and carbon emissions.
The novelty of this article stems from its comprehensive coverage of Bitcoin’s
energy consumption and environmental implications, its focus on global economies,
its provision of specific data, its proposal of sustainable alternatives, and its implica-
tions for the future. These aspects collectively set the article apart and contribute to the
ongoing discourse around the environmental challenges posed by cryptocurrencies.
The study is organised into seven sections. Section 12.1 introduces the concept,
while Sect. 12.2 focuses on the literature review. Section 12.3 discusses bitcoin
mining and its implications, while Sect. 12.4 highlights the economies of bitcoin
mining. Section 12.5 discusses the findings, Sect. 12.6 suggests implications
for future sustainable mining of bitcoin, and Sect. 12.7 concludes the article.
12.2 Literature Review

The fundamental argument of those who take a more hopeful view of bitcoin is that it is
built on a solid technological/cryptographic foundation and cannot be manipulated.
Pessimists claim that since it lacks a definite core, it risks financial ’balloon-lunacy,’
resulting in environmental devastation owing to the energy it requires. The first
group believes that cryptocurrencies should be subject to certain inspections and
regulations owing to their favourable attitude toward digital currencies, particularly
bitcoin, and that the system should be more legally based. Additionally, they argue
that bitcoin exchanges, regardless of their size, should be formed to monitor, examine,
and guarantee that procedures take place within a legal framework. Another key point
is that, as long as the system is legally sound and operates correctly, governments
may benefit from the market via taxes. The second group believes that bitcoin and
other digital currencies should be avoided and that no integration with this system
should occur due to its usage for money laundering.
Following a study of the literature, debates often centre on whether bitcoin is a
bubble or a commodity, a currency or a financial investment instrument, and, more
recently, on bitcoin’s energy usage. Numerous studies have been conducted on the
proper use of bitcoin to diversify portfolios, its utility as a hedge against the dollar
[7, 8], and a preference for it as an investing instrument rather than an alternative
payment method [9]. Another article describes bitcoin as both a speculative and a
conventional financial entity because of its unique structure [10]. Researchers have
compared bitcoin's volatility to that of other financial instruments and concluded
that the bitcoin market is very speculative [11]. The primary objection levelled
towards cryptocurrencies is that they are often utilised for criminal activities such as
money laundering. Additionally, it is noted that cryptocurrencies, particularly bitcoin,
provide potential for tax evasion and may eventually replace tax havens [12]. In this
view, the bulk of publications consider bitcoin as a speculative investment instru-
ment rather than an alternative currency [13]. Due to the scarcity of research on the
consequences of bitcoin’s energy consumption and environmental impact, this study
intends to contribute to an increase in research on this subject.
Bitcoin has lately been in the headlines for a variety of reasons, including its value
and energy usage. The increasing levels of its energy consumption, and the likelihood
that this consumption will continue to rise, entail a slew of negative consequences.
The fact that approximately 80% of the world’s energy consumption is derived from
fossil fuels and that this status is unlikely to alter in the future creates major environ-
mental challenges. The huge amounts of energy consumption that bitcoin will achieve
are considered a trigger for the depletion of limited fossil fuels. Bitcoin mining
is expanding in locations like China and India, where energy is generated primarily by
burning coal, resulting in poor air quality in their prominent economic hubs and
cities. To cover the existing research gap evident from the literature review,
and cities. To cover the existing research gap as evident from the literature review,
the energy consumption and environmental impact of bitcoin have been highlighted
in this research, which examines bitcoin mining and blockchain technology. The
energy used by increasing bitcoin mining is seen as one of the primary impediments
to Bitcoin’s growth.
12.3 Bitcoin Mining and Its Implications

About every ten minutes, so-called miners connect new sets of transactions (blocks)
to Bitcoin’s blockchain. These miners are not expected to trust each other when
operating on the blockchain. The code that controls Bitcoin is the only thing that
miners can rely on. Several rules are used in the code to ensure all transactions
are legitimate. For instance, a transaction is valid only if the sender already
possesses the amount transferred. Any miner independently verifies that transactions
follow these rules, removing the need to rely on other miners.
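A minimal sketch of this rule check in Python, using a simplified account-balance model (Bitcoin itself tracks unspent transaction outputs rather than account balances, and all names here are illustrative):

# Simplified validity rule: a transfer is legal only if the sender
# already possesses the amount transferred. This is an account-based
# sketch, not Bitcoin's actual UTXO bookkeeping.
def is_valid_transaction(balances: dict, sender: str, amount: float) -> bool:
    return amount > 0 and balances.get(sender, 0.0) >= amount

balances = {"alice": 1.5, "bob": 0.2}
print(is_valid_transaction(balances, "alice", 1.0))  # True
print(is_valid_transaction(balances, "bob", 1.0))    # False: insufficient funds

Because every miner can run the same check independently, no trust in other miners is required.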
The key is to convince all miners involved in mining to agree on the same trans-
action history. Each miner in the network is continuously charged with planning
the blockchain’s next set of transactions. Just one of these blocks will be randomly
chosen to become the chain’s new block. Since random selection in a distributed
network is difficult, ‘Proof-of-Work (PoW)’ is used [14]. In PoW, the next block
is appended by the first miner to produce a valid one. This is easier said than done,
since the Bitcoin algorithm makes mining very complicated [15]. Indeed, the protocol
adjusts the difficulty on a regular basis to ensure that all miners in the network
output only one legitimate block every ten minutes on average. When one of the miners
eventually succeeds in producing a valid block, the rest of the network is notified.
Other miners will approve this block once they verify that it complies with all rules
and will then discard the block they were already working on. The fortunate miner is
credited with a set number of coins in addition to the transaction costs associated with
the current block’s processed transactions. The loop then repeats itself. The energy
efficiency of the hardware used for mining is measured in joules per terahash (J/TH)
[16]. Energy-efficient mining rigs operate at around 30 J/TH, while less efficient rigs
can consume over 100 J/TH [17].
The method of creating a legitimate block is largely trial and error, with miners
making several attempts per second to find the correct value for a block element
called the “nonce” and hoping that the finished block meets the specifications. As a
result, mining is often likened to a lottery in which you can choose your own numbers.
The hash rate of your mining equipment determines the number of attempts (hashes)
per second, usually quoted in gigahashes per second (GH/s).
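A minimal sketch of this trial-and-error search (the header bytes and the difficulty are illustrative; real Bitcoin blocks use an 80-byte header and a vastly higher difficulty):

import hashlib

def mine(header: bytes, difficulty_bits: int) -> int:
    """Search for a nonce whose double-SHA-256 hash falls below a target."""
    target = 2 ** (256 - difficulty_bits)  # smaller target = harder puzzle
    nonce = 0
    while True:
        digest = hashlib.sha256(
            hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
        ).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce  # each loop iteration is one "hash" of the hash rate
        nonce += 1

print(mine(b"example block header", difficulty_bits=20))  # ~2**20 tries on average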
For years, determining the precise carbon footprint of the bitcoin network has been a
concern. Not only does one need to understand the bitcoin network’s power require-
ments, but also where this power comes from. The location of miners is critical in
determining how dirty or clean the electricity they use for mining is. The envi-
ronmental impact of blockchain energy consumption is often measured in terms
of carbon emissions [18]. This can be estimated based on the energy consumption
and the carbon intensity of the electricity used for mining [19]. Carbon intensity
measures the amount of CO2 emissions produced per unit of energy (gCO2 /kWh)
[20]. The global average carbon intensity varies by region and energy sources used
[21]. Assuming a carbon intensity of 500 gCO2 /kWh (a typical value for regions with
a mix of energy sources), and using the calculated daily energy consumption from
above (840,000 MJ = 233,333 kWh), the daily carbon emissions can be estimated as:

Carbon Emissions = 233,333 kWh × 500 gCO2/kWh = 116,666,500 gCO2 ≈ 116.67 tonnes CO2
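The same back-of-envelope arithmetic as a short script (the 840,000 MJ/day and 500 gCO2/kWh inputs come from the text above and are rough assumptions, not measurements):

# Daily carbon emissions = daily energy consumption x carbon intensity.
daily_energy_kwh = 840_000 / 3.6       # 840,000 MJ -> kWh (1 kWh = 3.6 MJ)
carbon_intensity_g_per_kwh = 500       # assumed mixed-grid carbon intensity
daily_emissions_g = daily_energy_kwh * carbon_intensity_g_per_kwh
print(f"{daily_energy_kwh:,.0f} kWh/day -> {daily_emissions_g / 1e6:.2f} t CO2/day")
# ~233,333 kWh/day -> ~116.67 t CO2/day, matching the estimate above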
Just as identifying the computers operating on the bitcoin network is difficult,
determining their location is similarly difficult. The bulk of these miners are
assumed to be based in mainland China [22, 23]. The average pollution factor of the
Chinese grid is approximately 700 g of CO2 equivalent per kWh [24]. This average
pollution factor can be used to calculate, as a rough estimate, the carbon intensity
of the electricity used for mining. If 70% of bitcoin mining occurs in China and 30% of mining
is totally clean, this results in a weighted average carbon intensity of 490 g CO2 eq/
kWh. This figure will then be applied to an estimate of the bitcoin network’s power
consumption to calculate the network’s carbon footprint.
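A short sketch of that weighting, applied to the 79.89 TWh annual consumption figure cited earlier (the resulting footprint differs from the Digiconomist estimate quoted in the introduction because the intensity assumptions differ):

# Weighted-average carbon intensity under the rough split described above:
# 70% of mining on the Chinese grid (~700 gCO2eq/kWh), 30% fully clean.
shares      = [0.70, 0.30]     # assumed fraction of hash rate per region
intensities = [700.0, 0.0]     # gCO2eq per kWh in each region
weighted = sum(s * i for s, i in zip(shares, intensities))
print(weighted)                # 490.0 gCO2eq/kWh, as stated in the text

annual_twh = 79.89             # assumed annual network consumption
footprint_mt = annual_twh * 1e9 * weighted / 1e12  # kWh x g/kWh -> Mt CO2eq
print(round(footprint_mt, 2))  # ~39.15 Mt CO2eq under these assumptions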
Approximately 65% of the global hash power belongs to China [25]. The
autonomous Xinjiang province, which accounts for 35.76% of the global total,
produces nearly half of the nation’s hash power [26]. China’s hash rate exceeds that of
the United States by nine times, with the US accounting for 7.24% of the world’s hash
rate, a long way below than that of China. The justification for such extensive mining
in Xinjiang in comparison to the global hash rate is the availability of cheap coal.
Though wind turbines surround the peaks of Urumqi in Xinjiang, they still compen-
sate for less than a quarter of the power generated last year [27]. Coal is what makes
up the reminder.
Bitcoin mining is a method of making new coins that entails the use of computers to
resolve complicated mathematical equations or crypto puzzles. Cryptocurrencies are
built on a shared network and require mining to operate. The Bitcoin protocol is
designed so that, on average, the network takes about ten minutes to solve the
puzzle and add a block. The method consumes a significant
amount of energy as miners use large and efficient systems to mine blocks and
validate transactions. The mining method consumes the biggest share of bitcoin’s
resources. Miners are compensated for their services with newly generated bitcoins
and data processing fees. Mining cryptocurrency often needs electricity provided
using fossil fuels. As the price of bitcoin increases, so does energy demand. The
Fig. 12.1 The cycle of bitcoin mining and its environmental implications
rising price provides miners with an additional opportunity to mine coins and attracts
new users to the bitcoin network. This is like a constant, never-ending process or
cycle (Fig. 12.1), which would eventually get bigger and bigger with time.
The constant block mining loop incentivizes bitcoin miners worldwide. Since
mining could provide a reliable source of income, people are willing to run energy-
intensive machinery in order to have a portion of it. This has resulted in the bitcoin
network’s overall energy consumption growing to enormous proportions over the
years, as the currency’s value scales new peaks. A single bitcoin transaction’s carbon
footprint is equivalent to 406.28 kg CO2 , equal to the carbon footprint of approx-
imately 1million VISA transactions. In terms of electricity energy consumption, a
single bitcoin transaction consumes 729 kWh of energy. Electronic waste released
from single bitcoin transactions is around 406.5 g, and annually, this waste accounts
for 44.59 kt [6]. Such is the drastic impacts of bitcoin mining on the environment.
The bitcoin network as a whole now uses more electricity than most countries. If
bitcoin were a nation, it would consume 117.11 TWh per year, more than countries
like the Philippines, Kazakhstan, and even the Netherlands [28]. Similar sources
estimate that, had bitcoin been a country, it would rank among the top 30 energy
users globally. The bulk of this energy comes from conventional sources, which are
largely products of fossil fuels. The prime concern lies not only in their non-
renewability and scarcity but also in the massive carbon emissions they generate.
It is no exaggeration to say that, in the days to come, bitcoin's energy consumption
may place it among the top ten global energy consumers.
Due to the growing number of miners and the continuous advancement of mining
technology, crypto mining has become much more competitive, requiring a rising
amount of computational power to succeed, and hence requires more investment than
in the early days. As such, potential miners must carefully analyse questions of cost
and profitability, which will be covered in further detail below.
12.4 The Economies of Bitcoin Mining

China has recently cracked down on cryptocurrency mining operations throughout
the nation. China's State Council stated the necessity to limit financial
risk in this respect, while local governments in the country’s most active mining areas
cited the abuse of power or the use of electricity from highly polluting sources as
justifications for the crackdown.
On the other hand, mining-friendly nations such as Kazakhstan and several states
in the United States, such as Texas, have regulations that see crypto mining enterprises
as a potential boost to their economies. Kazakhstan, which now has the world’s second
largest bitcoin mining industry due to the influx of Chinese crypto miners following
the recent crackdown, formally legalised crypto mining in 2020, confirming its legal
status and amending the tax code to allow for crypto mining to be taxed based on the
miner’s electricity consumption. This hospitable attitude may change, however, as the
recent surge of Chinese cryptocurrency miners has pushed the demand for energy
to an all-time high, leading the government to push for new draft legislation
rationing power to new crypto mines.
This seems to be the trend in nations where cryptocurrency mining enterprises are
establishing a foothold. Even if these nations’ governments retain a favourable or at
least neutral posture toward crypto mining, the high energy consumption associated
with mining may necessitate government involvement to regulate the amount of
power given to crypto miners. This is the situation in Iran, where a licence structure
has been developed for the crypto mining industry; yet the government was obliged
to impose a four-month ban on all mining activities after the summer blackouts.
12.5 Discussion
To determine how much electricity it takes to generate a bitcoin, one must first
understand a few basic concepts. To begin, what is the price of electricity in the
mining area? Second, how much electricity will be consumed by the mining or
decrypting processors? More efficient computing technology consumes less energy,
resulting in lower utility bills. The lower the energy price, the lower the cost to
miners. This raises the value of bitcoin for miners in areas with lower production
costs. Bitcoins are created by computer-based miners who consume enormous
quantities of electricity. Some scientists believe bitcoin is harmful to the ecosystem
because of its energy-intensive nature. Knowing how bitcoin is created is important
for a greater understanding of how the electrical resources used to operate the
bitcoin network function.
The first step is to determine the number of sums performed per second to solve
the puzzles, and then to figure out how much energy each sum needs. “Hashes” is the
term for these sums [29]. There are a lot of them, and they are usually measured in
millions (megahashes), billions (gigahashes), or quintillions (exahashes). It is
estimated that the bitcoin network's processors were producing up to 120 exahashes
every second in early 2020 [30]. Many firms have concentrated on Application-
Specific Integrated Circuit (ASIC) mining computers, although there are various
bitcoin-mining computers available. ASICs need less energy to perform calculations
[31].
While the overall network hash rate can be easily measured, it is difficult to
determine what it reflects in terms of energy usage due to the lack of a central registry
of all operating devices. To arrive at a given number of watts used per gigahash per
second (GH/s), energy utilisation calculations rely on assumptions about
which devices were operating and how they were distributed. The calculation
of bitcoin's energy consumption revolves around miners' income and
expenses, as shown in Fig. 12.2. Since energy prices are a large part of ongoing
costs, the bitcoin network's gross electricity usage is often linked to miner profits.
Simply stated, the greater the mining income, the more energy-intensive machinery
will be funded. The revenue derived from the coin is measured after it is produced
and decrypted by the miner, and a proportion of that revenue is attributed to
electricity costs. The point to note here is that
electricity prices vary from place to place. After determining the rate, it is translated
to a consumption price, and thus the energy consumption is calculated. Though there
are numerous bitcoin mining computers available, research has mostly concentrated
on ASIC mining computers [31, 32], owing to their speed and efficiency. Mining
companies that rely heavily on ASICs state that they consume only one watt of electricity
per gigahash per second of computation while mining bitcoin [32]. Below are a few
comparisons relating bitcoin's power consumption to other entities [33], followed
by a short sketch of the arithmetic.
In early 2020, the bitcoin network's hash rate reached roughly 120 exahashes per
second (EH/s) [30].
• At the quoted ASIC efficiency of one watt per GH/s, 120 EH/s corresponds to a
continuous power draw of about 120 GW, while contemporary estimates put the
network's annual electricity consumption at around 63 TWh/year.
• 120 GW is the power needed to light 13,200 million LEDs (note, 1 GW = 110
million LEDs) or
• the peak output of 375 million photovoltaic (PV) cells (1 GW = 3.125 million
PV cells), given that they are generating power at peak production.
• Time taken to mine 1 bitcoin = 10 min (irrespective of the number of miners
involved), using average power derived from ASIC miners, all other factors
remaining constant.
• 10 min = 600 s.
• Energy to mine 1 bitcoin ≈ 120 GW × 600 s = 72,000 GJ, or 72 TJ.
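The sketch below redoes this arithmetic under the list's own assumptions (one watt per GH/s and a 120 EH/s network); note that this efficiency figure is far more pessimistic than the 30 J/TH quoted earlier for modern rigs, so the results are order-of-magnitude illustrations only:

# Energy arithmetic under the list's assumptions (illustrative only).
watts_per_ghps = 1.0                      # assumed ASIC efficiency: 1 W per GH/s
network_hashrate_ghps = 120e9             # 120 EH/s expressed in GH/s
network_power_w = watts_per_ghps * network_hashrate_ghps
print(network_power_w / 1e9, "GW")        # 120 GW of continuous draw

block_seconds = 600                       # one block every ~10 minutes
energy_per_block_j = network_power_w * block_seconds
print(energy_per_block_j / 1e12, "TJ")    # 72 TJ (= 72,000 GJ) per block interval
print(energy_per_block_j / 3.6e9, "MWh")  # equivalently 20,000 MWh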
The comparisons above are a few pointers to the vast amount of energy consumed
in mining bitcoins. The energy consumed by mining in 2020 could have illuminated
millions of households in energy-impoverished countries; it is also equivalent to the
energy that 375 million PV cells would have generated. These are just a few of the
stigmatic effects of bitcoin's energy consumption, leaving aside its greater ecological
implications.
In an analysis titled “The Carbon Footprint of Bitcoin” [34], the authors prop-
erly account for these geographical disparities (while also adding a novel approach
for localising miners based on IP addresses), but nevertheless discover a weighted
average carbon intensity of 480–500 gCO2 eq per kWh for the entire Bitcoin network
(in line with previous, rougher estimations).
b. Lower Threshold

A situation whereby all miners use the most energy-efficient computational hardware
defines the lower threshold. The lower threshold of the range is determined by
multiplying the necessary computing power (as shown by the hash rate) by the
energy consumption of the most efficient hardware.
c. Optimising Threshold
The optimal threshold is based on the lower limit but considers the network’s
projected energy efficiency and additional loss from cooling the processor and
auxiliary units.
The network’s realistic energy efficiency can be calculated utilizing the pricing
power of mining hardware manufacturers and the energy efficiency of the hardware
in use:
EH = Σ(i=1..n) SAi × EFAi + [1 − Σ(i=1..n) SAi] × EZ        (12.5)
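A sketch of Eq. (12.5) as reconstructed above, reading SAi as each hardware type's market share, EFAi as its energy efficiency, and EZ as a default efficiency for the unidentified remainder; the shares and J/TH figures below are illustrative assumptions:

# Realistic network efficiency: share-weighted efficiency of known
# hardware plus a default efficiency for the unidentified remainder.
def realistic_efficiency(shares, efficiencies, default_efficiency):
    known = sum(s * e for s, e in zip(shares, efficiencies))
    residual_share = 1 - sum(shares)
    return known + residual_share * default_efficiency

shares = [0.4, 0.3]          # S_Ai: assumed market share of each hardware type
efficiencies = [29.5, 45.0]  # EF_Ai: efficiency in J/TH (Antminer S19 Pro ~29.5)
print(realistic_efficiency(shares, efficiencies, default_efficiency=60.0))
# 0.4*29.5 + 0.3*45.0 + 0.3*60.0 = 43.3 J/TH under these assumptions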
that appears nearest to bitcoin’s actual annual energy usage. The first figure, calcu-
lated in GigaWatts (GW), corresponds to the overall electrical power used by the
bitcoin network. This number is changed every 30 s and represents the amount of
energy used by bitcoin.
The second figure is calculated in TeraWatt-hours (TWh) and corresponds to
the Bitcoin network’s cumulative annual energy usage. Thus, annualizing bitcoin’s
energy use over a year, assuming constant power demand at the above cost. Appli-
cation of 7-day moving average on the resulting data point is made (as evident in
Fig. 12.3). The performance value less reliant on short-term hash rate fluctuations,
making it more appropriate for comparisons with alternative energy sources. The
CBECI model is based on the concept that miners operate their machines as
long as doing so remains profitable in terms of energy cost. To calculate the profitability of a given
hardware type, we consider the total miner revenues, the total network hash rate, the
electricity efficiency of the hardware in question, and the average electricity price
miners have to pay per kWh.
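A hedged sketch of that profitability criterion (all inputs are placeholder values, not CBECI's published parameters):

# A hardware type counts as profitable while its daily revenue per TH/s
# covers its daily electricity cost per TH/s.
def is_profitable(daily_revenue_usd, network_ths, efficiency_j_per_th,
                  price_usd_per_kwh):
    revenue_per_ths = daily_revenue_usd / network_ths
    kwh_per_ths_day = efficiency_j_per_th * 86_400 / 3.6e6  # J/TH x s/day -> kWh
    cost_per_ths = kwh_per_ths_day * price_usd_per_kwh
    return revenue_per_ths >= cost_per_ths

# Illustrative: $30M/day miner revenue, a 400M TH/s network, $0.05/kWh.
for eff in (30, 60, 100):  # J/TH, from efficient to inefficient rigs
    print(eff, is_profitable(30e6, 400e6, eff, 0.05))  # True, True, False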
Fig. 12.3 The estimated and minimum energy consumption of Bitcoin from January 2017 to Feb
2023*. Source Created by authors with data from [6]. *It is assumed that the processor used in the
mining is Antminer S19 pro (2020 onwards), Antminer S9 (2019–20), Antminer S15 (2018–19)
and Antminer S17e (2017–18)
θ = MR / EC        (12.7)

c. Optimising Threshold

Eoptimising (EC) = [Σ(i=1..n) Efi / n] × HR × PU × 3.16 × 10⁷        (12.10)
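A sketch of the two reconstructed formulas, reading Efi as the efficiency of each hardware type in use, HR as the network hash rate, PU as a cooling and auxiliary overhead factor, and 3.16 × 10⁷ as approximately the number of seconds in a year; every input below is an illustrative assumption:

# Optimising-threshold energy estimate, Eq. (12.10):
# mean efficiency (J/TH) x hash rate (TH/s) x overhead x seconds per year.
efficiencies_j_per_th = [29.5, 45.0, 60.0]  # Ef_i for an assumed hardware mix
mean_eff = sum(efficiencies_j_per_th) / len(efficiencies_j_per_th)
hashrate_ths = 120e6   # HR: 120 EH/s expressed in TH/s
pu = 1.10              # PU: +10% for cooling and auxiliary units
annual_energy_j = mean_eff * hashrate_ths * pu * 3.16e7
print(round(annual_energy_j / 3.6e15, 1), "TWh/year")  # joules -> TWh, ~51.9

# Profitability ratio, Eq. (12.7): miner revenue over energy cost.
miner_revenue_usd = 5.0e9  # MR (assumed annual mining revenue)
energy_cost_usd = 2.0e9    # EC (assumed annual electricity cost)
print(round(miner_revenue_usd / energy_cost_usd, 2))   # theta = 2.5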
As more people take up bitcoin mining, carbon emissions will rise further in the
future. Several analysts believe that all 21 million bitcoins will have been mined by
2140 [36–38]. The energy used for this mining, and its environmental
consequences, cannot be overlooked. Currently, there are 18.7 million bitcoins in
circulation [39]; this amount changes roughly every 10 min as new blocks are mined.
Currently, each new block contributes 6.25 bitcoins to the system [40].
Today, blockchain is seen by many as more than just cryptocurrency. Bitcoin has
succeeded in establishing a global, transparent monetary structure, but it falls short as
a general-purpose blockchain network. Smart contracts, for example, are expected to
challenge existing market models of banking, commerce, and logistics. Blockchain,
like all previous transformative developments, is merely the framework and enabler
of innovative apps. Nonetheless, its environmental consequences highlight the need
for further studies on externalities to assist policymakers in establishing the
appropriate rules for the implementation of these innovations.
Because the most cost-effective computers earn the highest income,
miners are motivated not only to use the most reliable hardware but also to seek
out the cheapest source of power. A common location for such low-cost electricity
is China's Sichuan province [41]. It is estimated that 48% of the world's mining
capacity is located there [42]. The southwest of China is capable of generating
vast volumes of hydropower, despite somewhat lower local demand. It is worth
mentioning that China's grid infrastructure is a barrier to clean energy production
at the moment [43]. The region’s power export capability is also constrained due to
inadequate grid penetration and a shortage of high-quality grid infrastructure. This
leaves the provinces of Sichuan and Yunnan with a surplus of hydropower, which
attracts energy-hungry and polluting factories looking to take advantage of the low
prices. One such industry is bitcoin mining. Unlike the power consumption of Bitcoin
mining equipment, which is constant throughout the year, hydropower generation is
seasonal. It is critical to understand that, while renewable energy sources are sporadic,
bitcoin miners have a persistent energy demand. When turned on, a bitcoin ASIC
miner may remain on until it either fails or becomes incapable of mining bitcoin
profitably [44]. As a result, bitcoin miners boost the grid’s baseload demand. They do
not need energy only when renewables are abundant; they also need electricity during
periods of supply scarcity. This makes it clear why renewables can only
act as a secondary power option for mining, and why conventional hydrocarbon-based
sources remain miners' favourite.
12.6 Implications for Future Sustainable Mining of Bitcoin

Though proof-of-work was the first consensus algorithm to demonstrate its validity,
it is not the only one. Recent years have seen the emergence of more energy-efficient
algorithms, such as proof-of-stake. Proof-of-stake coins are created by coin owners
rather than miners, eliminating the need for power-hungry machines that generate as
many hashes per second as possible [45]. As a result, proof-of-stake consumes much
fewer resources than proof-of-work. Ethereum, the second-biggest cryptocurrency,
completed its transition from proof of work to proof of stake in 2022. If Ethereum
can transition to proof of stake, bitcoin, technically, can as well.
Bitcoin will eventually have to adopt such a consensus algorithm to improve envi-
ronmental sustainability dramatically. The only disadvantage is that there are several
proof-of-stake implementations, and none of them has been completely validated
yet. Nonetheless, the research on these algorithms provides a reason for optimism
for the future.
12.7 Conclusion
Cryptocurrencies have grown in popularity over the last several years, capturing
the interest of both consumers and investors. At the moment, Bitcoin and other
altcoins are utilised mainly as a prospective investment tool rather than a medium
of commercial exchange. This research investigated the environmental impacts of
mining Bitcoin, the world's first cryptocurrency and the largest in terms of market
capitalization and transaction volume. It is widely accepted that fossil fuels account
for about 80% of worldwide energy consumption, and that this is unlikely
to alter in the near future. Despite significant advancements in alternative energy
sources, it is clear that the hydrocarbon industry mostly fulfils the world economy’s
energy demands. With this in mind, the energy consumed by miners throughout the
processes of verifying, recording, and creating Bitcoin has been investigated here.
It has been stressed that the enormous amount of energy required to mine Bitcoin is
not sustainable because of the high demand for computing power.
Bitcoin’s daily growth in energy consumption has resulted in consuming more
energy than several countries and posing numerous risks to the cryptocurrency’s
future. It is well known that in order to minimise the high energy expenses associ-
ated with Bitcoin mining, individuals and businesses have conducted these activities
in nations with cheap electricity. However, bitcoin transactions and mining require
References
1. Jiang, S., Li, Y., Lu, Q., Hong, Y., Guan, D., Xiong, Y., Wang, S.: Policy assessments for
the carbon emission flows and sustainability of bitcoin blockchain operation in China. Nat.
Commun. 12, 1–10 (2021)
2. De Vries, A.: Cryptocurrencies on the road to sustainability: ethereum paving the way for
Bitcoin. Patterns, 100633 (2022)
3. Erdogan, S., Ahmed, M.Y., Sarkodie, S.A.: Analyzing asymmetric effects of cryptocurrency
demand on environmental sustainability. Environ. Sci. Pollut. Res., 1–11 (2022)
4. Alshahrani, H., Islam, N., Syed, D., Sulaiman, A., Reshan, A., Saleh, M., Rajab, K., Shaikh,
A., Shuja-Uddin, J., Soomro, A.: Sustainability in blockchain: a systematic literature review
on scalability and power consumption issues. Energies 16, 1510 (2023)
5. Mustafa, F., Lodh, S., Nandy, M., Kumar, V.: Coupling of cryptocurrency trading with the
sustainable environmental goals: is it on the cards? Bus. Strateg. Environ. 31, 1152–1168
(2022)
6. Digiconomist: Bitcoin Energy Consumption Index (2023). Digiconomist.com. https://digico
nomist.net/bitcoin-energy-consumption/
7. Bao, H., Li, J., Peng, Y., Qu, Q.: Can bitcoin help money cross the border: international evidence.
Financ. Res. Lett. 49, 103127 (2022)
8. Cole, B.M., Dyhrberg, A.H., Foley, S., Svec, J.: Can bitcoin be trusted? Quantifying the
economic value of blockchain transactions. J. Int. Financ. Mark. Inst. Money 79, 101577
(2022)
9. Yavuz, M.S., Bozkurt, G., Boğa, S.: Investigating the market linkages between cryptocurrencies
and conventional assets. EMAJ Emerg. Mark. J. 12, 36–45 (2022)
10. Kubal, J., Kristoufek, L.: Exploring the relationship between Bitcoin price and network’s
hashrate within endogenous system. Int. Rev. Financ. Anal. 84, 102375 (2022)
11. Murty, S., Victor, V., Fekete-Farkas, M.: Is bitcoin a safe haven for Indian investors? A GARCH
volatility analysis. J. Risk Financ. Manag. 15, 317 (2022)
12. Mariani, F., Polinesi, G., Recchioni, M.C.: A tail-revisited Markowitz mean-variance approach
and a portfolio network centrality. Comput. Manag. Sci. 19, 425–455 (2022)
13. Baur, D.G., Oll, J.: Bitcoin investments and climate change: a financial and carbon intensity
perspective. Financ. Res. Lett. 47, 102575 (2022)
14. Frankenfield, J.: Proof of Work (PoW). Investopedia.com (2021). https://www.investopedia.
com/terms/p/proof-work.asp
15. Aste, T.: The fair cost of bitcoin proof of work. SSRN Electron. J., 0–2 (2016). https://doi.org/
10.2139/ssrn.2801048
16. Hallinan, K.P., Hao, L., Mulford, R., Bower, L., Russell, K., Mitchell, A., Schroeder, A.: Review
and demonstration of the potential of bitcoin mining as a productive use of energy (PUE) to
aid equitable investment in solar micro-and mini-grids worldwide. Energies 16, 1200 (2023)
17. Kumari, P., Mamidala, V., Chavali, K., Behl, A.: The changing dynamics of crypto mining and
environmental impact. Int. Rev. Econ. Financ. (2023)
18. Sibande, X., Demirer, R., Balcilar, M., Gupta, R.: On the pricing effects of bitcoin mining in
the fossil fuel market: the case of coal. Resour. Policy 85, 103539 (2023)
19. Bruno, A., Weber, P., Yates, A.J.: Can Bitcoin mining increase renewable electricity capacity.
Resour. Energy Econ., 101376 (2023)
20. Asgari, N., McDonald, M.T., Pearce, J.M.: Energy modeling and techno-economic feasibility
analysis of greenhouses for tomato cultivation utilizing the waste heat of cryptocurrency miners.
Energies 16, 1331 (2023)
21. Sapra, N., Shaikh, I.: Impact of bitcoin mining and crypto market determinants on bitcoin-based
energy consumption. Manag. Financ. (2023)
22. Chow, S., Peck, M.E.: The bitcoin mines of China. IEEE Spectr. 54, 46–53 (2017). https://doi.
org/10.1109/MSPEC.2017.8048840
23. Jiang, S., Li, Y., Lu, Q., Hong, Y., Guan, D., Xiong, Y., Wang, S.: Policy assessments for
the carbon emission flows and sustainability of Bitcoin blockchain operation in China. Nat.
Commun. 12, 1–10 (2021). https://doi.org/10.1038/s41467-021-22256-3
24. Mittal, M.L.: Estimates of emissions from coal fired thermal power plants in India 39, 1–22
(2010)
25. Gogo, J.: 65% of Global Bitcoin Hashrate Concentrated in China. Bitcoin.com (2020)
26. Benetton, M., Compiani, G., Morse, A.: CryptoMining: pollution, government incentives and
energy crowding out. (2019)
27. Murtaugh, D.: The possible Xinjiang coal link in Tesla’s bitcoin binge. Bloom (2021)
28. EIA: International electricity consumption. US Energy Inf. (2019)
29. Narayanan, A.: Hearing on Energy Efficiency of Blockchain and Similar Technologies (2018)
30. Redman, J.: BTC’s Hashrate Touches 120 Exahash, But the Price Has Not Followed (2020).
Bitcoin.com. https://news.bitcoin.com/btcs-hashrate-touches-120-exahash-but-the-price-has-
not-followed/
31. Li, J., Li, N., Peng, J., Cui, H., Wu, Z.: Energy consumption of cryptocurrency mining: a study
of electricity consumption in mining cryptocurrencies. Energy 168, 160–168 (2019). https://
doi.org/10.1016/j.energy.2018.11.046
32. Küfeoğlu, S., Özkuran, M.: Bitcoin mining: a global review of energy and power demand.
Energy Res. Soc. Sci. 58, 101273 (2019). https://doi.org/10.1016/j.erss.2019.101273
33. OEERE: How Much Power is 1 Gigawatt? (2019). energy.gov. https://www.energy.gov/eere/
articles/how-much-power-1-gigawatt
34. Stoll, C., Klaaßen, L., Gallersdörfer, U.: The carbon footprint of bitcoin. Joule 3(7), 1647–1661
(2019). https://www.cell.com/joule/abstract/S2542-4351(19)30255-7&lang=en
35. CCAF: Cambridge Bitcoin Electricity Consumption Index (2017)
36. Hayes, A.: What Happens to Bitcoin After All 21 Million Are Mined? (2021). Investo-
pedia.com. https://www.investopedia.com/tech/what-happens-bitcoin-after-21-million-
mined/
37. Kim, C.: With 18 million bitcoins mined, how hard is that 21 million limit? (2019). coindesk
indices. https://www.coindesk.com/with-18-million-bitcoins-mined-how-hard-is-that-21-mil
lion-limit
38. Yermack, D.: Is bitcoin a real currency? SSRN Electron. J. (2013). https://doi.org/10.2139/
ssrn.2361599
39. de Best, R.: Number of bitcoins in circulation worldwide from October 2009 to April 13,
2021(in millions) (2021). Statista.com. https://www.statista.com/statistics/247280/number-of-
bitcoins-in-circulation/
40. Song, Y.D., Aste, T.: The cost of bitcoin mining has never really increased. arXiv 3, 1–8 (2020).
https://doi.org/10.3389/fbloc.2020.565497
41. Cocco, L., Tonelli, R., Marchesi, M.: An agent based model to analyze the bitcoin mining
activity and a comparison with the gold mining industry. Futur. Internet 11, 1–12 (2019).
https://doi.org/10.3390/fi11010008
42. de Vries, A.: Bitcoin’s growing energy problem. Joule 2, 801–805 (2018). https://doi.org/10.
1016/j.joule.2018.04.016
43. Leyman, P., Vanhoucke, M., Althusser, L., Foucault, M.: The soul is the prison of the body.pdf.
Int. J. Prod. Res. (2018). ISBN: 978-92-9260-061-7
44. de Vries, A.: Renewable energy will not solve bitcoin’s sustainability problem. Joule 3, 893–898
(2019). https://doi.org/10.1016/j.joule.2019.02.007
45. Ismail, L., Materwala, H.: A review of blockchain architecture and consensus protocols: use
cases, challenges, and solutions. Symmetry 11 (2019). https://doi.org/10.3390/sym11101198
46. Williamson, S.: Is bitcoin a waste of resources? Fed. Reserv. Bank St. Louis Rev. 100, 107–115
(2018). https://doi.org/10.20955/R.2018.107-15
Chapter 13
Emerging Economies: Volatility
Prediction in the Metal Futures Markets
Using GARCH Model
Abstract This paper aims to study the volatility and its prediction using the GARCH
(1,1) model in the metal futures of two emerging economies, India and China. The
Metals considered for the study are aluminium, copper, lead, nickel, zinc, gold,
and silver. This study uses daily data from January 2016 to May 2021 from the
Shanghai Futures Exchange (SHFE) and Multi Commodity Exchange (MCX). The
study’s findings suggest the presence of short-run, long-run, and overall persistence
of shocks for all the metals.
13.1 Introduction
Trading in commodities has a much longer history than today’s frequently traded
asset classes like stocks, mutual funds, and even real-estate. It dates to the era when
people had no common currency, and the barter system prevailed. In modern times,
trading in commodities still takes place, now with more complex contracts
like futures and options and with dedicated national institutions, regulators,
and other important stakeholders. Trading in a commodity is as essential as anything
for economic development, for farmers' growth and financial safety, and for
lifting other economic parameters like GDP and per capita income. It has a significant
role in bringing price stability across the market and in hedging price risks, which is
beneficial for agriculturists and for manufacturing industries using agricultural products
as raw materials. For net importing countries, price movements predominantly affect
the economy [12].
Moreover, [7] highlight two critical roles of the futures market: hedging risks and
price discovery processes. Pavabutr and Chaihetphon [12] underline the futures market's
importance, as this market responds to new information faster than the spot market
owing to lower transaction costs and a higher degree of leverage. The commodity market
provides a new asset class with the benefit of active participation in the commodity
market and helps disperse risk concentration. In the Chinese commodity futures
market, commodity futures have also been found to provide an effective tool for
the diversification of assets and combatting expected and unexpected inflation in the
economy [17]. However, the growth of commodity derivatives in such a globalized
and liberalized economy is not up to the expectations of investors and economists
because many others believe that speculation in commodities, especially in food
commodities, would cause malfunctioning of the spot market, and the prices could
be badly manipulated. This may result in an inflationary effect on the essential
commodities.
Moreover, in the absence of active participation by small farmers, this market often
fails to accommodate farmers [6]. Contrary to the theory of benefits to hedgers,
derivative markets have usually been found to be more favorable to speculators than
hedgers; possible reasons include rigid contract specifications, big lot sizes,
high transaction costs, taxes, and government intervention in free play. It is noticeable
that the development of, and the studies on, the commodity market are undoubtedly
unmatched by the potential of this market [10, 18, 19]. The issue of risk is universal
in any asset class. Although the volatility of some commodities is frequently
studied, little attention has been given to base metals and precious metals in these
two emerging markets. To address this gap, the present study aims to extend the
literature in this area by investigating the extent to which ARCH and GARCH effects
have been present in base metal and precious metal price returns.
Sound investment decisions are based on the risk-return trade-off, and increased
investment activity in the commodity market needs careful analysis and estimation
of future expected returns. This paper therefore aims to study the characteristics
of volatility and its prediction in the base metals (aluminium, copper, lead, nickel,
and zinc) and precious metals (gold and silver) futures markets of India and China,
representing the most prominent emerging economies, using the GARCH(1,1) model
to deliver more accurate forecasts of future variance compared to the unconditional
variance.
The rest of the paper is structured as follows. Section 13.2 presents a brief
review of the literature on stock and commodity market volatility in various
economies. The data and methodology used are described in
Sect. 13.3. The results and discussion are elaborated in Sect. 13.4, and the conclusion
is presented in Sect. 13.5.
13.2 Literature Review

An investor needs to assess not only the return of a financial asset but also take care
of its risk. Risk is measured by modeling the volatility in the returns of a security.
Higher volatility represents a higher risk, and lower volatility indicates a lower risk.
Therefore, modeling and predicting volatility with greater accuracy is very important
in assessing a financial asset. Various authors have used the parsimonious model,
GARCH (1,1), to model and predict the volatility of a financial asset.
Karmakar [8] used standard and asymmetric GARCH models to test the
predictability and asymmetry of volatility in Indian stock market returns.
The author reports the persistence and predictability of volatility in the market.
Regarding asymmetry in volatility, it is reported that the market showed higher
volatility in times of market decline. Bahadur [2] found GARCH (1,1) to be the
most appropriate model for volatility forecasting in the Nepalese stock market and
reported clustering of high and low volatility periods, persistency, and the possibility
of prediction of volatility in the market. Similarly, [16] analyzed the Indian stock
market index (Sensex) using symmetric and asymmetric models of GARCH. They
found that despite the leverage effect, the symmetric GARCH model had a better
forecast of market volatility. Agnolucci [1] compared the predictive ability of the GARCH
(1,1) model and the implied volatility obtained from inverting the Black equation.
The author modeled the volatility of WTI futures contracts traded at the NYMEX
and found the results of GARCH models to be more accurate [1]. Mahalakshmi
et al. [11] also used GARCH (1,1) on the MCX commodity index data from 2006
to 2011 and found the significant impact of its past price movements. Kumar and
Singh [9] applied various models of GARCH on the risk and returns of Indian stock
and commodity markets from 1990 to 2007. The results showed the presence of
volatility clustering, persistence, and asymmetric behavior of volatility. The author
also reported that the risk-return relationship was insignificant for the NIFTY index
and agricultural commodity (soybean).
On the contrary, the risk-return relationship for gold was found to be positive
and significant. Using daily returns data from 1996 to 2010 from the US stock
market, Srinivasan [15] applied simple GARCH, exponential GARCH, and threshold
GARCH models to forecast volatility and found that the symmetric GARCH is better
than the asymmetric models in forecasting the volatility of S&P 500 index returns.
Singhania and Anchalia [14] studied the volatility in the stock market returns of Asian countries using
the exponential GARCH model and found clustering of volatility, persistence,
asymmetry, and leverage effects in the returns of the stock markets of India, China,
Japan, and Hong Kong. The authors also reported the positive impact of the subprime
crisis on the volatility of returns in India, China, and Japan. On the other hand, the
period of the Eurozone debt crisis showed a negative impact on the stock returns of
India and China.
Volatility spillover and transmission have played an important role in interna-
tional economic decisions [13]. Forecasting volatilities in any financial asset class
is of prime importance for risk management, asset pricing, and asset allocation [3].
Volatility spillover in commodities has been weaker than in other asset classes but has
also been increasing over time. Moreover, agricultural commodities contribute less
than metal and energy in spillovers [4]. Metal markets of LME are found to be highly
integrated across the market [5]. Compared to the agricultural futures market, the
metal futures market in China is also more efficient and less risky. However, overall,
the Chinese commodity futures market lags behind the US market in terms of liquidity
and volatility [10]. China and the US agriculture commodity futures market show
significant positive correlation and high upside and downside risk spillover during
high uncertainty [19]. Risk spillover is extreme between the Shanghai and London gold
futures markets in the pre- and post-crisis periods [18].
13.3 Data and Methodology

Daily data for the analysis have been retrieved from the websites of the respective
exchanges (MCX and SHFE). The period of study runs from January
2016 to May 2021. Aluminium, copper, lead, nickel, and zinc have been taken from
the base metals segment; from the bullion category, gold and silver have been
considered. For the commodities from the Shanghai Futures Exchange, the contract
with the highest trade (open interest) has been used to prepare a continuous price
series for each date. MCX provides individual metal indices that are designed using
a well-defined methodology. The analysis is done using RStudio software.
Innovation (adaptability), persistence, and mean reversion are the three main char-
acteristics of volatility. Researchers and academicians have employed simple moving
averages, exponentially weighted moving averages, and GARCH models to model
and forecast volatility. The simple moving average method captures the mean rever-
sion property, and its adaptability depends upon the window size considered in the
model. The exponentially weighted moving average emphasizes the innovation and
persistence factors. It is mathematically represented as

σ²n = α · r²n−1 + β · σ²n−1

where r²n−1 represents the innovation and α is the innovation factor. Similarly, σ²n−1
is the previous period's variance estimate and β is the persistence factor; in this
equation, the sum of α and β is equal to 1. The exponentially weighted moving
average is based on exponentially decreasing weights as the lag value increases.
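The chapter's estimations are carried out in RStudio; a minimal Python analogue using the arch package, run here on a simulated placeholder series since the MCX/SHFE data are not reproduced, would look as follows:

import numpy as np
from arch import arch_model

# Placeholder daily percentage returns standing in for a metal futures series.
rng = np.random.default_rng(0)
returns = rng.standard_normal(1291)  # 1291 observations, like the study's samples

# GARCH(1,1) with a constant mean, the specification used throughout the chapter.
am = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
res = am.fit(disp="off")
alpha, beta = res.params["alpha[1]"], res.params["beta[1]"]
print(res.params)                    # omega, alpha (ARCH term), beta (GARCH term)
print("persistence:", alpha + beta)  # close to 1 => shocks decay slowly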
13.4 Results and Discussion

Descriptive statistics for the returns are presented in Table 13.4 in the appendix.
The negative skewness of most return series and the positive kurtosis above 3 (excess
kurtosis) indicate that the time series are leptokurtic relative to the normal
distribution. The excess kurtosis is positive and significantly different from zero,
indicating that the series are leptokurtic, non-normal, and heteroscedastic.
Before applying the GARCH (1,1) model, we test for the presence of ARCH effects
in all the variables using the ARCH LM test; the results are presented in
Table 13.1. The P-values of the ARCH test for most of the variables are found to be
significant at 1 and 5%. Only nickel futures, at both exchanges, are significant at 10%.
Figure 13.1 in the appendix presents the time-series graphs of the log returns of the
variables, which show the clustering of volatility in the variables.
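A hedged sketch of this pre-test with statsmodels, again on a placeholder series (the chapter's actual inputs are the MCX and SHFE return series):

import numpy as np
from statsmodels.stats.diagnostic import het_arch

rng = np.random.default_rng(1)
returns = rng.standard_normal(1291) * 0.01  # stand-in for a log-return series

# ARCH LM test on demeaned returns; a small p-value signals ARCH effects.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_arch(returns - returns.mean(), nlags=5)
print(f"LM = {lm_stat:.2f}, p = {lm_pvalue:.4f}")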
ADF tests are used in the study to examine the stationarity properties of the time
series. The test statistics reject the null hypothesis that the variables are non-
stationary, so the stationarity of the time series is confirmed by accepting the
alternative hypothesis. The results of the ADF test confirming the stationarity of the
data are presented in Table 13.2. They show that all the return series are stationary
at level.
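The corresponding stationarity check in Python (placeholder data again; the ADF null hypothesis is a unit root, i.e., non-stationarity):

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
log_returns = rng.standard_normal(1291) * 0.01  # stand-in return series

adf_stat, pvalue, *_ = adfuller(log_returns)
print(f"ADF = {adf_stat:.2f}, p = {pvalue:.4f}")  # p < 0.05 rejects a unit root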
The GARCH results are presented in Table 13.3. The sum of the ARCH
term (α) and the GARCH term (β) is less than 1 for all the variables, which indicates
the presence of the mean reversion property and ensures the stability of the model;
ω is positive in all cases, as seen in Table 13.3. The innovation char-
acteristics are shown by the innovation factor (α). In the metals segment (aluminium,
copper, lead, nickel, and zinc) of both the exchanges, α is found to be significant (at
1 percent) for all the variables except aluminium (SHFE) and lead (MCX). This indi-
cates that for most of the metal futures of MCX and SHFE, a short-run persistence
of shocks exists. For bullions, including gold and silver futures, the ARCH term is
significant in all cases, indicating the presence of short-term volatility persistence.
Moreover, it is found that for all the metals except silver, short-term persistency
is higher at SHFE. Long-run persistence is depicted by β, which is significant at 1%
for all the variables, including metals and bullions. Since the ARCH terms (except
aluminium at SHFE and lead at MCX) and the GARCH terms are significant for
all the metals at both exchanges, it is inferred that the volatility can be forecasted
for the metals and bullions futures at MCX and SHFE. The ARCH and GARCH
terms contain information from one previous period return and conditional variance,
respectively. Further, the one-period lagged conditional variance term can be said
to contain information from past returns (multiple lags). Therefore, the value of the
GARCH term (β) is supposed to be much higher than the ARCH term (α). We find
that for all the variables, the weightage of the ARCH term (α) is much lesser than the
GARCH term (β). The sum of the ARCH and GARCH terms is close to 1 in all the
cases, which shows the overall persistence of volatility. The closeness of the sum of α
and β to 1 shows the degree of persistency of volatility. For aluminium, nickel, silver,
and zinc, the overall persistency is higher at SHFE, and for the other metals (copper,
lead, and gold), it is higher at MCX. These results are consistent with the literature
on volatility modeling and prediction [9, 14]. In addition, [11] have also reported
the significant impact of past price movements in the MCX commodity index. The
findings of this paper have various important implications for investors and portfolio
managers. Also, the study is believed to enrich the literature on commodity market
volatility in emerging economies.
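One way to read the closeness of α + β to 1 is through the half-life of a volatility shock, ln(0.5)/ln(α + β); the coefficients below are hypothetical examples, not the chapter's estimates:

import math

# The closer alpha + beta is to 1, the longer a shock takes to halve.
for name, alpha, beta in [("metal A", 0.05, 0.93), ("metal B", 0.08, 0.88)]:
    persistence = alpha + beta
    half_life = math.log(0.5) / math.log(persistence)
    print(f"{name}: persistence = {persistence:.2f}, half-life = {half_life:.1f} days")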
13.5 Conclusion

The study investigates volatility prediction for the metals segment, including
base metals (aluminium, copper, lead, nickel, and zinc) and precious metals (gold
and silver) from MCX and SHFE, using daily returns data from January 2016 to
May 2021 and applying the GARCH (1,1) model. The results of the study show that
the ARCH terms (except aluminium at SHFE and lead at MCX) and GARCH terms
are highly significant for all the metals at both exchanges. This implies that volatility
can be predicted for all the variables. Furthermore, it is found that there is an overall
persistence of volatility in the metal futures traded at the exchanges.
The present study’s findings will provide valuable insights to the researchers,
investors/ institutional investors to understand the volatility characteristics and
13 Emerging Economies: Volatility Prediction in the Metal Futures … 275
prediction methods within the base metals (aluminium, copper, lead, nickel, and zinc)
and precious metals (gold and silver) futures markets of India and China to make
better investment decisions by diversifying their risk. The study’s results signifi-
cantly influence shaping the portfolio risk management strategies for individuals
and institutional investors participating in commodity markets. The analysis and the
information from the market movement can be instrumental to the investors/traders in
making better investment decisions and portfolio diversifications. The results of our
analysis is crucial for investors, portfolio managers, researchers, and practitioners
interested in effectively managing risk and optimizing investment strategies within
these emerging economies. It will also be helpful to policymakers, regulators, and
commodity traders to formulate better strategies to capture market share in the global
arena.
Volatility has become an essential topic in studying the risk associated with a
financial asset. The commodity markets have various categories of commodities
and can accommodate various stakeholders. In emerging economies, the commodity
market has enormous development, research, and returns potential. Therefore, in
the future, the research on the volatility and connections in volatility among the
markets can be extended in various ways with multiple other econometric tools
to accommodate a wide range of stakeholders, including farmers (providing the
commodity as raw material), industrialists (demanding raw material), speculators,
policymakers and governments, etc.
Appendix
Fig. 13.1 Time plot of metal futures returns. Note The letter ‘I’ before the metals name denote India (MCX), and similarly, the letter ‘C’ before the metals
name denotes China (SHFE)
Table 13.4 Descriptive statistics of returns

MCX          Alum.      Copper     Lead       Nickel      Zinc      Gold      Silver
N            1291       1291       1291       1291        1291      1291      1291
Mean         1.83E−04   4.08E−04   4.43E−06   3.76E−04    5.87E−04  3.60E−04  2.12E−04
Median       −3.80E−04  3.31E−04   4.38E−04   4.44E−04    0.00108   3.01E−04  3.56E−04
Minimum      −0.0905    −0.0667    −0.0564    −0.08       −0.0674   −0.0565   −0.119
Maximum      0.0741     0.044      0.0712     0.0726      0.0733    0.0485    0.068
Skewness     0.342      −0.302     0.0305     −5.07E−04   −0.138    −0.378    −0.934
Std. error   0.0681     0.0681     0.0681     0.0681      0.0681    0.0681    0.0681
Kurtosis     7.39       2.6        2.33       1.89        1.76      5.17      9.9
Std. error   0.136      0.136      0.136      0.136       0.136     0.136     0.136

SHFE         Alum.      Copper     Lead       Nickel      Zinc      Gold      Silver
N            1291       1291       1291       1291        1291      1291      1291
Mean         4.36E−04   5.52E−04   1.33E−04   5.05E−04    4.34E−04  4.18E−04  4.26E−04
Median       3.60E−04   3.97E−04   2.98E−04   0.00131     2.14E−04  3.51E−04  2.52E−04
Minimum      −0.067     −0.0647    −0.0741    −0.0722     −0.0709   −0.0481   −0.103
Maximum      0.0498     0.0616     0.0696     0.0695      0.0575    0.054     0.0806
Skewness     −0.289     0.1        −0.156     −0.155      −0.136    −0.0438   −0.379
Std. error   0.0681     0.0681     0.0681     0.0681      0.0681    0.0681    0.0681
Kurtosis     3.72       5.06       2.73       1.67        1.82      5.31      6.66
Std. error   0.136      0.136      0.136      0.136       0.136     0.136     0.136

Source Authors' calculation
References
1. Agnolucci, P.: Volatility in crude oil futures: a comparison of the predictive ability of GARCH
and implied volatility models. Energy Econ. 31(2), 316–321 (2009). https://doi.org/10.1016/j.
eneco.2008.11.001
2. Bahadur, G.C.S.: Volatility analysis of Nepalese stock market. J. Nepalese Bus. Stud. 5(1),
76–84 (2009). https://doi.org/10.3126/jnbs.v5i1.2085
3. Chen, R., Xu, J.: Forecasting volatility and correlation between oil and gold prices using a
novel multivariate GAS model. Energy Econ. 78, 379–391 (2019). https://doi.org/10.1016/j.
eneco.2018.11.011
4. Chevallier, J., Ielpo, F.: Volatility spillovers in commodity markets. Appl. Econ. Lett. 20(13),
1211–1227 (2013). https://doi.org/10.1080/13504851.2013.799748
5. Ciner, C., Lucey, B., Yarovaya, L.: Spillovers, integration and causality in LME non-ferrous
metal markets. J. Commod. Mark. 17 (2020). https://doi.org/10.1016/j.jcomm.2018.10.001
6. Dey, K., Maitra, D.: Can commodity futures accommodate India’s farmers? J. Agribus. Dev.
Emerg. Econ. 6(2), 150–172 (2016)
7. Hua, R., Chen, B.: International linkages of the Chinese futures markets. Appl. Fin. Econ.
17(16), 1275 (2007). https://doi.org/10.1080/09603100600735302
8. Karmakar, M.: Asymmetric volatility and risk-return relationship in the Indian stock market.
South Asia Econ. J. 8(1), 99–116 (2007). https://doi.org/10.1177/139156140600800106
9. Kumar, B., Singh, P.: Volatility modeling, seasonality and risk-return relationship in GARCH-
in-mean framework: the case of Indian stock and commodity markets. SSRN Electron. J.
(2011). https://doi.org/10.2139/ssrn.1140264
10. Liu, Q., Luo, Q., Tse, Y., Xie, Y.: The market quality of commodity futures markets. J. Futures
Mark., 1–16 (2020). https://doi.org/10.1002/fut.22115
11. Mahalakshmi, S., Thiyagarajan, S., Naresh, G.: Commodity derivatives behaviour in Indian
market using ARCH/GARCH. JIMS8M J. Indian Manage. Strat. 17(2), 60–64 (2012)
12. Pavabutr, P., Chaihetphon, P.: Price discovery in the Indian gold futures market. J. Econ. Fin.
34(4), 455–467 (2010). https://doi.org/10.1007/s12197-008-9068-9
13. Seth, N., Panda, L.: Financial contagion: review of empirical literature. Qual. Res. Fin. Mark.
10(1), 15–70 (2018). https://doi.org/10.1108/QRFM-06-2017-0056
14. Singhania, M., Anchalia, J.: Volatility in Asian stock markets and global financial crisis. J.
Adv. Manage. Res. 10(3), 333–351 (2013). https://doi.org/10.1108/JAMR-01-2013-0010
15. Srinivasan, P.: Modeling and forecasting the stock market volatility of S&P 500 index using
GARCH models. IUP J. Behav. Fin. 1, 51–69 (2011)
16. Srinivasan, P., Ibrahim, P.: Forecasting stock market volatility of Bse-30 index using GARCH
models. Asia Pac. Bus. Rev. 6(3), 47–60 (2010). https://doi.org/10.1177/097324701000600304
17. Tu, Z., Song, M., Zhang, L.: Emerging impact of Chinese commodity futures market on
domestic and global economy. Chin. World. Econ. 21(6), 79–99 (2013). https://doi.org/10.
1111/j.1749-124X.2013.12047.x
18. Wang, G.J., Xie, C., Jiang, Z.Q., Stanley, H.E.: Extreme risk spillover effects in world gold
markets and the global financial crisis. Int. Rev. Econ. Financ. 46, 55–77 (2016). https://doi.
org/10.1016/j.iref.2016.08.004
19. Zhu, Q., Tansuchat, R.: The extreme risk spillovers between the US and China’s agricultural
commodity futures markets. J. Phys. Conf. Ser. 1324(1) (2019). https://doi.org/10.1088/1742-
6596/1324/1/012085
Chapter 14
Constructing a Broad View of Tax
Compliance Intentions Based on Big Data
Abstract Taxpayer compliance is currently one of the problems faced by the government of any country, especially developing countries. Taxpayer compliance can be assessed from the taxpayer's intention to comply with the tax itself. The Directorate General of Taxes, as an extension of the government in tax matters, has the obligation to implement policies and technical standardization in the field of taxation, including efforts to build an understanding of the importance of taxation in developing a country. Tax compliance is therefore something the government should seek to improve through tax compliance intentions. The theoretical basis for compliance intentions is the Theory of Planned Behavior (Ajzen in Organ Behav Hum Decis Process 50:179–211, 1991 [1]), which is used as the foundation for this discussion. To determine the factors driving the intention to comply, it is necessary to search for the variables that influence it. By using big data, the variables obtained are more in line with current reality, because they are taken directly from the virtual world, more specifically from online media. After the data is mined from social media, the raw data is extracted and analyzed using Discourse Network Analysis (DNA) to be compiled into variables, which are then modeled using Structural Equation Modeling (SEM).
M. S. Utama (B)
Directorate of International Tax, Directorate General of Taxes in Indonesia, Jakarta, Indonesia
e-mail: [email protected]
Solimun · A. A. R. Fernandes
Department of Statistics, University of Brawijaya, Malang, Indonesia
Taxes are compulsory contributions to the state owed by individuals or entities, enforceable under the Law, with no direct reward, and used for state purposes for the greatest prosperity of the people. According to Prof. Dr. Rachmat Sumitro, SH (1990), taxes are people's contributions to the state treasury (the transfer of wealth from the people's treasury to the government sector) based on the law, used to finance routine expenses, with the surplus used for public saving, which is the main source of financing for public investment.
Tax Compliance refers to complying with all tax obligations as prescribed by law in a voluntary and complete manner, or the extent to which taxpayers comply or fail to comply with their country's tax regulations. Tax Compliance is the extent to which taxpayers comply with tax law, including full payment of all taxes owed. It is also defined as the process by which a taxpayer files all required tax returns, accurately declaring all income and paying tax obligations in accordance with applicable tax laws and regulations. Theoretically, it can be defined by considering three different types of compliance: payment compliance, filing compliance, and reporting compliance (Braithwaite 2009 in [2]). For a long time, tax compliance has been associated with fiscal policies based on penalties, such as tax audits, fines, or other sanctions. Attempts to explain taxpayer behavior that center on threats and deterrence cannot offer a realistic and comprehensive image of Tax Compliance. Moreover, managing Tax Compliance through such traditional factors alone is an expensive way to try to improve compliance.
Therefore, in the equation explaining taxpayers' compliance and non-compliance behavior, many researchers have introduced, in addition to the classical economic factors (the broader factors related to economic conditions: actual level of income, tax rates, tax benefits, tax audits, audit probabilities, fines, and penalties), several non-economic factors. The latter, also called socio-psychological factors, are taken into account to explain Tax Compliance behavior from a deeper and more realistic perspective, thereby shaping modern fiscal policy, which is centered on the typology and needs of citizens. In the category of non-economic factors we can include, for example, public education and tax morale [3].
Tax compliance intention can also be defined in terms of Tax Compliance itself. James and Alley, in [4], define Tax Compliance as "the willingness of taxpayers to act by the 'spirit' and 'letter' of law and tax administration without implementing law enforcement activities". Cuccia, in [4], conducting a study in Brazil, defines Tax Compliance as filing all required tax returns promptly and accurately reporting tax obligations in accordance with the tax laws in effect at the time the returns are filed. It can be concluded that Tax Compliance is the willingness to pay taxes within the specified time.
Based on theoretical study, tax compliance intention can be associated with the theory of planned behavior. The theory defines concepts in predictable ways and helps to understand certain behaviors in certain contexts. Attitudes toward behavior, subjective norms concerning behavior, and perceived control over behavior (Perceived Behavioral Control) are usually found to predict behavioral intentions with a high degree of accuracy. Furthermore, intention, in combination with perceived behavioral control, can explain most of the variation in behavior [1].
This chapter discusses how attitudes, subjective norms, and behavior control affect
tax compliance intentions. The influence of Attitudes on Tax Compliance Intentions
is supported by the Attitude theory which was developed from Attitude Toward
the Behavior in the Theory of Planned Behavior [5, 6]. The effect of subjective
norms on tax compliance intentions is supported by the theory of subjective norms
developed from subjective norms in the theory of planned behavior [5, 7, 8]. The
effect of perceived Behavioral Control on Tax Compliance Intentions is supported
by the theory of perceived Behavioral Control which was developed from Perceived
Behavioral Control in the Theory of Planned Behavior [5, 9, 10].
Attitude is a driving factor for Tax Compliance Intentions in complying with tax payments in Indonesia. Any change in Attitude will have a significant positive effect on Tax Compliance Intentions. Attitudes toward tax compliance are formed by taxpayers' beliefs regarding tax compliance, which include everything that is known, believed, and experienced by taxpayers regarding the implementation of tax regulations. Taxpayers' beliefs about tax compliance behavior will generate positive or negative attitudes toward tax compliance, which will further shape the taxpayer's intention to comply or not comply with applicable laws and regulations. Utama et al. [25] stated that the better the attitude, the higher the intention to comply with taxes, and conversely, the worse the attitude, the lower the intention to comply with taxes. Likewise, the higher the attitude indicator, which consists of two items, behavioral belief (Y1.1) and evaluation of behavioral belief (Y1.2), the higher the application of tax compliance intentions.
Subjective norms are one of the variables that play an important role in forming the intention of tax compliance. This is in accordance with the TRA, from which the TPB was later developed, which states that subjective norms are one of the additive components of behavioral intention, with behavioral intention largely determining actual behavior. Utama et al. [25] found empirically that the Subjective Norm is a driving factor for Tax Compliance Intentions. Any change in Subjective Norms will have a positive effect on the Intention to Comply with Taxes, which means that the better the Subjective Norms, the higher the Intention to Comply with Taxes, and vice versa. Likewise, the higher the Subjective Norm, which consists of three items, namely the role of self-confidence in work, self-confidence to consider what is important, and trust in the support of friends in business, the higher the application of Tax Compliance Intentions.
Perceived behavioral control influences behavior directly or indirectly (through intention) [5]. Direct influence can occur if there is actual control beyond the will of the individual that influences behavior. The more positive the attitude toward the behavior and the subjective norms, and the greater the control one perceives, the stronger one's intention to perform the behavior. Finally, in accordance with actual control conditions in the field (actual behavioral control), the intention will be realized if the opportunity arises. Conversely, the behavior that appears may be contrary to the individual's intentions. This happens when conditions in the field make it impossible to perform the intended behavior, which in turn quickly affects the individual's perceived behavioral control. Perceived behavioral control that has changed will affect the behavior displayed, so that it is no longer the same as what was intended.
Humans are social creatures. Every human being living in this world cannot be separated from the help of other people and always lives side by side with other humans. As a consequence, one person's behavior will influence the behavior of others.
The theory of planned behavior (TPB) was developed from the theory of reasoned action (TRA). TPB emerged because the previous theory focused only on the rationality of behavior and on actions within individual conscious control. Ajzen notes that TPB has been widely accepted as a tool for analyzing the difference between attitudes and intentions, and between intentions and behavior. In this respect, attempts to use TPB as an approach to explaining whistleblowing can help overcome some of the limitations of previous research and provide a means of understanding the widely observed gap between attitudes and behavior [11], even though, in reality, some individual behaviors are not entirely under individual conscious control. Schematically, the TPB model is shown in Fig. 14.1.
Ajzen and Fishbein [12] refined the Theory of Reasoned Action (TRA) and gave it the name TPB. TPB explains that the behavior carried out by individuals arises because of the individual's intention to behave, and this intention is caused by several factors internal and external to the individual. Individual attitudes towards behavior include beliefs about a behavior, evaluation of the results of behavior, subjective norms, normative beliefs, and motivation to obey [13].
Fig. 14.1 The TPB model. Source Ajzen and Fishbein [12]
The Theory of Planned Behavior (TPB) seems to be very suitable for explaining the intention to disclose fraud (whistleblowing), since in this case the action taken is based on a very complex psychological process [14].
The Theory of Planned Behavior explains that the behavior carried out by individuals arises because of the intention to behave. Based on this theory, intention is formed from the attitude toward the behavior, the subjective norms, and the perceived behavioral control held by individuals. Thus, an individual's intention to perform a behavior is determined by three factors, namely:
1. Attitude toward the behavior. Attitude toward a behavior is a positive or negative evaluation of an object, person, institution, event, behavior, or intention [5]. The theory of planned behavior specifies the nature of the relationship between beliefs and attitudes. According to this theory, an individual's evaluation of, or attitude toward, a behavior is determined by his or her beliefs about that behavior. The term belief in this theory refers to the subjective probability that a behavior will produce a certain result. Specifically, each outcome contributes to the attitude in proportion to the person's subjective probability that the behavior produces the outcome in question. A belief is accessible when it is available from long-term memory.
The concept of expected results comes from the expected value model.
Outcome expectancy can be in the form of beliefs, attitudes, opinions, or expec-
tations. According to the theory of planned behavior, an individual’s positive
evaluation of his performance on a particular behavior is similar to the concept
of perceived benefit. Positive evaluation refers to beliefs about the effective-
ness of the proposed behavior in reducing susceptibility to negative outcomes.
In contrast, negative self-evaluation refers to beliefs about the detrimental
consequences that can result from enacting a behavior.
2. Subjective norms. Subjective norms are factors outside the individual that reflect one's perception of whether others think the behavior should be performed.
3. Perceived behavioral control. Perceived behavioral control is the individual's perception of his or her ability to control a behavior.
From the several definitions of the Theory of Planned Behavior given by the researchers above, it can be concluded that the Theory of Planned Behavior concerns the intention that arises from the individual to behave, an intention caused by several factors internal and external to the individual. The intention to perform a behavior is influenced by three variables, namely attitude toward the behavior, subjective norms, and perceived behavioral control. TPB covers the non-volitional behavior of people that cannot be explained by TRA: individual behavioral intention cannot be the exclusive determinant of behavior where the individual's control over the behavior is incomplete. By adding "perceived behavioral control," the TPB can explain the relationship between behavioral intention and actual behavior.
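In the additive formulation commonly used for the TPB (a sketch of the standard notation, not taken verbatim from this chapter), behavioral intention can be written as a weighted sum of the three antecedents:

```latex
BI = w_1 \, A_B + w_2 \, SN + w_3 \, PBC
```

where $BI$ is behavioral intention, $A_B$ the attitude toward the behavior, $SN$ the subjective norm, $PBC$ perceived behavioral control, and $w_1, w_2, w_3$ empirically estimated weights.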
Several studies have found that, compared to TRA, TPB is better at predicting
health-related behavioral intentions. TPB has improved the predictability of inten-
tion in various health-related areas, including condom use, recreation, exercise, diet,
etc. In addition, TPB (and TRA) have helped explain individual social behavior by
incorporating social norms as an important contribution.
More recently, some researchers have criticized the theory for ignoring individuals' needs prior to committing to certain actions, needs that influence behavior regardless of the attitudes expressed. For example, a person may have a very positive attitude toward steak and yet not order one because he is not hungry. Or, a person may have a negative attitude toward drinking and little intention of drinking, but engage in drinking because he or she wants to belong to a group. Another limitation is that the TPB does not integrate into the theory the role that individual emotions play in the development of intentions and during decision-making. In addition, most of the research on the TPB is correlational; more evidence from randomized experiments would be helpful.
Several experimental studies challenge the assumption that intentions and
behavior are consequences of attitudes, social norms, and perceived behavioral
control. As an illustration, Sussman et al. [15] encouraged participants to form an
intention to support a particular environmental organization, for example signing a
petition. Once these intentions were formed, attitudes, social norms, and perceived
behavioral control shifted. Participants became more likely to report positive atti-
tudes toward these organizations and more likely to perceive that members of their
social group shared comparable attitudes. These findings imply that the relationship
between the three key elements—attitudes, social norms, and perceived behavioral
control—and intentions may be bidirectional.
Efforts that can be made to improve the prediction of behavior from traits include the aggregation of specific behaviors across occasions, situations, and forms of action [1, 16, 17]. A well-known framework for predicting human behavior is the Theory of Planned Behavior (TPB). This theory is an extension of the Theory of Reasoned Action (TRA). The difference between TPB and TRA is the addition of one further construct in TPB, namely perceived behavioral control, which is held to influence a person's intentions and behavior towards something.
The concepts of intention and behavior in general were first studied in the Theory of Reasoned Action (TRA), introduced by Fishbein and Ajzen in 1975. Within the TRA framework, behavioral intention, which largely determines actual behavior, is an additive function of two variables, namely attitudes and subjective norms. Attitudes are favorable or unfavorable individual feelings about performing a particular behavior. Attitudes include positive or negative evaluations of performing
the behavior. An individual will intend to perform a certain behavior when he eval-
uates it positively. Attitudes are determined by an individual’s beliefs about the
consequences of performing a behavior (behavioral beliefs), which are weighted by
their evaluation of these consequences (outcome evaluation). Thus, attitude is an
individual’s prominent belief, whether the result of his behavior will be positive or
negative.
Subjective norms are assumed to be a function of beliefs about whether individuals' significant others approve or disapprove of the behavior. The beliefs that underlie subjective norms are normative. Normative social influence is defined as the influence of others that directs us to adjust ourselves in order to be liked and accepted [18]. Even though an action may not be accepted or approved of by an individual, normative social influence puts pressure on a person to comply with the social norms of the group. Normative social influence has been shown to exert a high degree of persuasive influence on individuals. An individual will intend to perform a behavior when he feels that others who are important to him think he should do so.
In 1991, TRA was developed into the Theory of Planned Behavior (TPB) by Ajzen. In his article, Ajzen shows that TPB provides a useful conceptual framework for dealing with the complexities of human social behavior. The theory incorporates several central concepts from the social and behavioral sciences. In addition, it defines concepts in predictable ways and helps to understand certain behaviors in certain contexts. Attitudes toward behavior, subjective norms concerning behavior, and perceived control over behavior are usually found to predict behavioral intention with a high degree of accuracy. Furthermore, intention, in combination with perceived behavioral control, can explain most of the variation in behavior [1].
In order to better understand the measurement of attitudes, subjective norms, and
behavioral control, the concepts or factors forming them are first reviewed in the
Theory of Planned Behavior, as presented in Fig. 14.2.
Figure 14.2 is a schematic of the relationship between the variables involved in the
Theory of Planned Behavior. The figure explains that perceived behavioral control
together with behavioral intentions can be used directly to predict final behavior. In
addition, in predicting behavioral intentions, there is a role for behavioral attitudes
and subjective norms.
Religiosity is the extent of one's religious knowledge, the strength of one's belief, the diligence of one's worship and observance of rules, and the depth of one's appreciation of the religion one adheres to [20]. Religiosity is the strength of the relationship with, or the individual's belief in, their religion [21]. Religiosity is a complex integration of religious knowledge, feelings, and religious actions within a person.
In addition, Utama et al. [25] stated that there is a significant influence of the use of e-Filing on perceived Behavior Control. Beyond the direct effect, there is also an indirect effect of the e-Filing Utilization variable on Tax Compliance Intentions through perceived Behavior Control. The analysis of this indirect effect concludes that e-Filing Utilization has a significant positive effect in this relationship, indicating that perceived Behavioral Control is a mediating variable between e-Filing Utilization and Tax Compliance Intentions. The coefficient is positive, meaning that the better or greater the use of e-Filing, followed by an improvement or increase in perceived Behavior Control, the better the intention to comply with taxes. This significant influence is in line with the results of the measurement model of the e-Filing Utilization variable, which show that the strongest factor determining the level of e-Filing Utilization is System Simplicity, so that the perceived Behavior Control variable tends to have high potential to channel the effect of e-Filing Utilization on Tax Compliance Intentions. This indirect effect is also in line with the Technology Acceptance Model (TAM), which holds that perceived behavior in the use of technology is shaped by the ease of the system and the benefits provided.
This can be interpreted to mean that every change in the e-Filing Utilization variable produces a significant influence or change in the perceived Behavior Control variable. The results of this study support the concept of e-Filing, which was developed based on the Regulation of the Director General of Taxes Number KEP-05/PJ/2005 dated 12 January 2005 and Article 6 paragraph (2) of Law No. 16 of 2000. In addition, the e-Filing system has high accuracy and can reduce errors in tax reporting, because e-Filing applications generally provide a double-checking feature: if an error occurs, the Taxpayer receives an error message and cannot save and send the report until it is corrected. By using e-Filing, Taxpayers are also able to be environmentally friendly by reducing the use of paper in tax reporting.
From year to year, the development and use of the internet globally is increasing. This is inseparable from internet access that is increasingly easy to obtain and spreads even to remote areas, as well as access costs that are getting cheaper. One implication of this is the growth in the use of online media, which more people use day by day.
News can now be obtained not only through print media such as newspapers and magazines, and electronic media such as television and radio. Online media, which is seen as interactive media, can also function as a medium that provides various kinds of information, including news. The existence of the internet as a medium has given online media several distinctive characteristics, described below.
1. Information Speed
Journalism that uses the internet as its medium has an advantage over traditional media: it is faster in distributing information. Generally, people have to wait until the next day to find out what happened today; through online media, however, information can be distributed as events or issues occur. Even though reporting through electronic media is also getting faster, this immediacy is not possible for print media. Because online media is easily accessible, the delivery of information tends to be brief. This also supports one of the core news values, namely actuality.
2. Information Update
Because the internet is unlimited and can be accessed anytime and anywhere, online media can update previously published information with more complete information. Updates and publication have no time limit and continue as long as they remain relevant to the core information, in contrast to television programs broadcast during prime time or the breaking news available in electronic media.
3. Reciprocal
Compared to print and electronic media, where communication goes in one direction, online media gives audiences the flexibility to provide feedback in a relatively short time. One example of online media with a high level of interactivity is a discussion group or forum. Internet users from various regions can write down their thoughts on a topic being discussed. Online media such as news portals also typically provide a column at the bottom of the news for comments from readers and complaints addressed to the editorial team.
4. Personalization
Online media users have self-control, meaning the audience is free to consume whichever information is deemed important or interesting. This differs from print and especially electronic media, where all information is presented directly to the public without any control over choosing and filtering it. In online media, users can search for the desired information through the search engines that websites routinely provide. For this reason, many online media outlets, especially news portals, provide categories for the news they publish.
5. Unlimited Capacity
A superior characteristic of online media is that there is no capacity limit on producing and distributing information. Online media generally have a data bank or database that can accommodate massive amounts of information of various kinds, so that audiences can access even old information.
6. Link
Information published through online media can be connected with other related information, either on the same site or on different sites, much like a citation in the literature.
7. Multimedia Capability
Online media makes it possible for communicators to include text, sound, images,
even videos, and other multimedia-based components in the news pages that are
presented.
Online media, if used wisely, lets the public find and fulfill their need for information. Basically, online media is not only a medium for communication between individuals, groups, and the masses; the public also uses it as an educational medium, for example for disseminating news or information about important events, discovering new things, and various other educational purposes. People therefore find it easy and fast to dig up educational information from online media.
Information regarding the intention to comply with taxes is widespread in cyberspace, and this information is complex enough that data-processing techniques capable of extracting it are needed. Through the DNA method, a discourse structure can be systematically identified in various textual documents, so that political, social, cultural, health, and other discourses can be mapped and visualized as a network. DNA analysis captures public responses regarding the intention to comply with taxes from cyberspace. Based on the DNA results, issues and issue concepts are obtained which can be used as indicators and variables.
Data is one of the important things for human life; it cannot only be interpreted through a dictionary definition, because data has its own essence. The essence in question is that data can influence concepts, theories, and methods that are directly or indirectly related to it. Data is the main component of an information system, because all information for decision-making comes from data. When data is processed, it produces information. Information itself can be found in many places, known as information spaces, in both print and online copies; for example, books exist in two forms, as documents published in print and as documents published online. Information derived from data can thus, in turn, serve as a source of new data.
Humans themselves have fairly accurate abilities to recognize units of information as encoded data, but humans cannot do so quickly once the volume and complexity of the information exceed the capacity of the human brain, especially when it involves a large enough pile of information, commonly called big data. According to [27], Big Data is a trend that covers a broad area in the world of business and technology, where Big Data refers to technologies and initiatives involving data so diverse, rapidly changing, and large that it is difficult to handle effectively using conventional database management tools or other data-processing applications.
The Gartner IT Glossary (n.d.) defines Big Data as follows: Big data is high-volume, high-velocity, and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation. Referring to this definition, the characteristics of Big Data are volume, velocity, and variety. Volume is the very large amount of data that must be managed; velocity is the speed of data processing, which must keep pace with the growth in the amount of data; and variety is the characteristic of very diverse data sources, ranging from structured databases to unstructured data.
According to one account, three things have brought about the development of Big Data technology, namely:
(1) The rapid growth of data storage capabilities
(2) The rapid increase in data processing engine capabilities
(3) The availability of abundant data
The Big Data process aims to let every business, organization, or individual capable of processing data obtain in-depth information (insights) that will trigger decision-making and reliable business actions based on those insights. Big Data technology can handle various kinds of data, which are grouped into two types: structured data and unstructured data. Structured data is a group of data with a defined data type, format, and structure. Unstructured data, in turn, is a group of textual data with an erratic format and no inherent structure; it can be made into structured data, but this requires more effort and time.
Big Data management has several stages, each of which requires tool support; the following are the stages of Big Data management [28].
(1) Acquired
(2) Accessed
With regard to data access, the data that has been collected requires governance, integration, storage, and computing so that it can be managed for the next stage. Processing tools include Hadoop, Nvidia CUDA, Twitter Storm, and GraphLab, while storage tools include Neo4J, Titan, and HDFS.
(3) Analytic
This stage concerns the information to be obtained from the data that has been processed. The analytics performed can be descriptive (describing the data), diagnostic (looking for causes and effects based on the data), predictive (predicting future events), or prescriptive (recommending choices and the implications of each option); a compact sketch of this stage follows the list. Tools for the analytic stage include MLPACK and Mahout.
(4) Application
This stage concerns the visualization and reporting of the analytics results. Tools for this stage use RStudio.
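As a concrete, minimal sketch of the analytic stage, the snippet below runs a descriptive and a simple predictive analysis in Python with pandas and scikit-learn (a more common small-scale stack than the cluster tools listed above); the file name and column names are hypothetical.

```python
# Descriptive and predictive analytics on an acquired dataset.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("compliance_posts.csv")   # data already acquired/accessed

print(df.describe())                        # descriptive: summary statistics

# Predictive: estimate compliance intent from two hypothetical features
X = df[["sentiment_score", "engagement"]]
y = df["compliance_intent"]
model = LinearRegression().fit(X, y)
print(dict(zip(X.columns, model.coef_)))    # fitted coefficients
```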
Data mining is the process of extracting important information from large bodies of data. The data mining process often uses statistical and mathematical methods and makes use of artificial intelligence technology. Data mining has several functions, namely:
1. Descriptive: understanding the data in more detail. This process aims to find the patterns and characteristics of the data. By using the descriptive function, patterns that were originally hidden in the data can be found; if a pattern is repetitive and has value, the characteristics of the data become known.
2. Predictive: a process that reveals special patterns in the data, found among the variables the data contains. Once a pattern has been found, it can be used to estimate the values of other variables that are still unknown. This is why the predictive function is considered equivalent to predictive analysis; it can also be used to estimate a particular variable that is not in the data.
3. Association: identifying the relationships between data items, in both past and current data; a small illustration follows.
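As a self-contained illustration of the association function, the sketch below counts how often pairs of (invented) items co-occur across records and reports pairs whose support exceeds a threshold; dedicated tools would use algorithms such as Apriori for the same task.

```python
# Toy association mining: co-occurrence counts and support.
from collections import Counter
from itertools import combinations

transactions = [                      # hypothetical mined records
    {"e-filing", "on-time"},
    {"e-filing", "on-time", "refund"},
    {"paper", "late"},
    {"e-filing", "refund"},
]

pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

# Pairs present in at least half the records suggest an association
for pair, n in pair_counts.items():
    support = n / len(transactions)
    if support >= 0.5:
        print(pair, f"support={support:.2f}")
```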
Structural Equation Modeling (SEM) analysis was first developed in the early 1950s. The development of covariance analysis by Jöreskog, Keesling, and Wiley in 1973 marked the beginning of dedicated software for SEM analysis. The main purpose of SEM software is to provide an analysis tool more powerful than its predecessors, able to solve more substantive and comprehensive problems. SEM analysis continues to grow in importance because researchers need it to solve increasingly complex problems. Much software has now been developed for SEM analysis, including AMOS, LISREL, PLS, GSCA, and TETRAD.
This analysis is a development of path analysis and multiple regression analysis, all of which are forms of multivariate analysis. Ghozali [29] describes Structural Equation Modeling as a second generation of multivariate analysis techniques that allows researchers to examine relationships between complex variables, both recursive and non-recursive, to obtain a comprehensive picture of the whole model. Solimun et al. [30] explain that SEM is statistical modeling that can simultaneously involve the relationships between research variables and their indicator models. The advantages of SEM analysis are as follows [31].
1. It can test causality, validity, and reliability at the same time.
2. It can be used to examine direct and indirect effects between variables.
3. It can test several dependent variables at once against several independent variables.
4. It can measure how much the indicator variables influence their respective factor variables.
5. It can measure factor variables that cannot be measured directly, through their indicator variables.
When using SEM analysis, several kinds of variables must be understood, namely latent variables and manifest variables [32]. Latent variables are variables that cannot be measured directly; in diagrams they are depicted with a circle or ellipse. There are two kinds of latent variables, namely exogenous and endogenous latent variables. Exogenous latent variables affect the values of other variables in the model, while endogenous latent variables are influenced directly or indirectly by exogenous variables. Examples of latent variables are motivation, satisfaction, and attachment.
In addition to latent variables, there are manifest variables, which can be measured directly. The value of such a variable can be found by direct research such as surveys. Manifest variables are depicted with a rectangle. One of the advantages of SEM analysis is that it can accommodate a study involving intervening variables, that is, variables that are dependent in one equation and simultaneously independent in another [33]. For example, job satisfaction is influenced by the environment, and at the same time job satisfaction also affects work performance. A minimal code sketch of such a model is given below.
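A minimal sketch of such a model in Python, using the open-source semopy package (the variable and indicator names are hypothetical, echoing the job-satisfaction example above):

```python
# SEM with latent variables and a mediating (intervening) variable.
import pandas as pd
import semopy

# '=~' defines a latent variable by its indicators (measurement model);
# '~' defines structural (regression) relationships.
desc = """
Environment  =~ env1 + env2 + env3
Satisfaction =~ sat1 + sat2 + sat3
Performance  =~ perf1 + perf2 + perf3
Satisfaction ~ Environment
Performance  ~ Satisfaction + Environment
"""

data = pd.read_csv("survey.csv")   # hypothetical indicator-level data
model = semopy.Model(desc)
model.fit(data)
print(model.inspect())             # estimates, std. errors, p-values
```

Here Satisfaction is dependent in one equation and independent in the other, which is exactly the intervening-variable situation described above.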
In the SEM model, the minimum sample size that must be used is still debated. According to [34], structural models require a minimum sample of 200 observations. Hair et al. [35] argue that the required minimum sample size is 100–150 observations. On the other hand, [36] suggest that the sample size for SEM analysis should be 5 times the number of parameters to be estimated. Finally, Byrne [37] suggests a minimum sample of 100 observations. These expert opinions fall within a relatively narrow range, with a minimum of roughly 100 observations most commonly cited.
There are two equation models in SEM analysis, namely [38]:
1. Structural models
The structural model describes the relationships between latent variables. The parameter used to denote the relationship between an exogenous latent variable and an endogenous latent variable is gamma, while the parameter denoting the relationship between one endogenous latent variable and another is beta.
2. Measurement models
The measurement model describes the relationship between the latent variables and the observed variables. Lambda is used to denote the factor loading that relates a latent variable to its observed variables.
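In the standard LISREL-style notation (a sketch consistent with the gamma, beta, and lambda parameters named above, not notation taken from this chapter), the two sub-models can be written as:

```latex
\eta = B\eta + \Gamma\xi + \zeta            % structural model
y = \Lambda_y \eta + \varepsilon,\qquad
x = \Lambda_x \xi + \delta                  % measurement models
```

where $\xi$ and $\eta$ are the exogenous and endogenous latent variables, $x$ and $y$ their observed indicators, and $\zeta$, $\varepsilon$, $\delta$ the structural and measurement errors.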
If the researcher combines structural model testing with measurement model testing, factor analysis can be carried out together with hypothesis testing, and measurement error can be tested as an inseparable part of the SEM. Two kinds of error arise in SEM.
1. Structural error
Structural error occurs when the latent variables cannot perfectly predict the dependent variable, so a structural error component appears in the structural model.
2. Measurement error
Measurement error occurs because the observed variables cannot perfectly describe the latent variables, so a measurement error component needs to be added.
In SEM analysis it is necessary to test the goodness of the model, or goodness of fit; there are several ways to measure it.
The results of information mining from online media using DNA can be integrated with SEM modeling; this is known as a mixed-method approach. The researcher first explores the views that exist in online media, namely views on the intention to comply with taxes. The data is then analyzed, and the information is used to construct the instrument most suitable for the sample studied, that is, to identify the appropriate instrument, which is then used to determine the variables (the mining yields) to be included in the model.
The sources used to obtain data in this study are content in cyberspace or social media (data scraping). Data scraping refers to a technique in which a computer program extracts data from the output produced by another program. One form of data scraping is text mining, which is the process of using an application to extract important information from websites [40]; a minimal sketch of this step is given below. After the data is scraped, DNA (Discourse Network Analysis) is carried out to find out the actors involved and the issues raised. These issues are then grouped so that the issue concepts representing tax compliance intentions are obtained. An illustration of DNA output can be seen in Fig. 14.3.
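A minimal sketch of this scraping step in Python (the URL and HTML tags are hypothetical placeholders; real portals require their own selectors and permission checks):

```python
# Scrape headline text and links from a (hypothetical) news portal.
import requests
from bs4 import BeautifulSoup

url = "https://news-portal.example/tag/tax-compliance"
html = requests.get(url, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

articles = []
for item in soup.select("article"):        # one tag per news item
    title = item.find("h2")
    link = item.find("a")
    if title and link:
        articles.append({"title": title.get_text(strip=True),
                         "url": link.get("href")})

print(f"Scraped {len(articles)} articles for DNA coding")
```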
From Fig. 14.3 it can be seen that, based on the results of the DNA analysis, the data consist of two categories, namely issues and actors. These issues and actors relate to the analysis of information obtained from cyberspace regarding the topic under study. Actors are depicted as black circles and issues as black squares. In addition, the DNA results capture the sentiment of statements/issues, visualized with green lines representing positive sentiment, red lines representing negative sentiment, and blue lines indicating discourses that contain both positive and negative sentiment. A toy illustration of such a network follows.
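The sketch below builds such an actor-issue network in Python with networkx; the actors, issues, and sentiments are invented for illustration.

```python
# Toy discourse network: bipartite actor-issue graph with signed edges.
import networkx as nx

G = nx.Graph()
statements = [                      # (actor, issue, sentiment)
    ("Tax Office",    "e-filing simplicity", +1),
    ("Taxpayer A",    "e-filing simplicity", +1),
    ("Taxpayer B",    "audit risk",          -1),
    ("News Portal X", "audit risk",          +1),
]
for actor, issue, sentiment in statements:
    G.add_node(actor, kind="actor")                 # drawn as circles
    G.add_node(issue, kind="issue")                 # drawn as squares
    G.add_edge(actor, issue, sentiment=sentiment)   # +1 green, -1 red

# Issues raised by several actors are candidate indicator variables
for n, d in G.nodes(data=True):
    if d["kind"] == "issue":
        print(n, "raised by", G.degree(n), "actors")
```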
After obtaining the issue concepts that describe the intention to comply with taxes, a research model is created. Apart from being based on the literature review, the research model is extended with the research variables obtained from the results of the DNA analysis. The resulting model is illustrated in Fig. 14.4.
Fig. 14.4 Illustration of the research model from the integration of DNA and SEM outputs (Religiosity (X1) and Use of e-Filing (X2) linked through Attitude (Y1) and Subjective Norms (Y2) to Tax Compliance Intention (Y4))
Based on the conceptual model framework derived from empirical studies and the literature review, and in accordance with the problem formulation and research objectives stated previously, the research hypotheses can be formulated as follows:
H1: Religiosity (X1) influences Attitude (Y1).
H2: Religiosity (X1) influences Subjective Norms (Y2).
H3: Religiosity (X1) influences Behavior Control (Y3).
H4: e-Filing (X2) influences Attitude (Y1).
H5: e-Filing (X2) influences Behavior Control (Y3).
H6: Perceived Risk (X3) influences Attitude (Y1).
H7: Perceived Risk (X3) influences Subjective Norms (Y2).
H8: Perceived Risk (X3) influences Behavior Control (Y3).
H9: Attitude (Y1) influences Tax Compliance Intention (Y4).
H10: Subjective Norms (Y2) influence Tax Compliance Intention (Y4).
H11: Behavior Control (Y3) influences Tax Compliance Intention (Y4).
The hypothetical model designed in this study is presented in Fig. 14.6 (Table 14.1); its structural equations are sketched below.
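Using the gamma/beta notation introduced earlier, hypotheses H1–H11 correspond to the following structural equations (a sketch; the $\zeta_j$ are structural errors):

```latex
Y_1 = \gamma_{11} X_1 + \gamma_{12} X_2 + \gamma_{13} X_3 + \zeta_1 \\
Y_2 = \gamma_{21} X_1 + \gamma_{23} X_3 + \zeta_2 \\
Y_3 = \gamma_{31} X_1 + \gamma_{32} X_2 + \gamma_{33} X_3 + \zeta_3 \\
Y_4 = \beta_{41} Y_1 + \beta_{42} Y_2 + \beta_{43} Y_3 + \zeta_4
```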
The results of the hypothesis testing presented in Table 14.1 show 11 (eleven) estimated effects between the research variables, each of which is significant, so that this study accepts all 11 (eleven) hypotheses.
As previously explained, in SEM there are two types of effects, namely direct effects and indirect effects. In Table 14.2, the indirect effects obtained from the WarpPLS analysis are presented.
Fig. 14.6 Hypothetical model: paths H1–H3 from Religiosity, H4–H5 from e-Filing, and H6–H8 from Perceived Risk to Attitude, Subjective Norms, and Behavior Control, and paths H9–H11 from these three variables to Tax Compliance Intention
Based on the results of the hypothesis testing presented in Table 14.2, there are 9 (nine) indirect effects between the research variables. Of these 9 (nine) indirect effects, 7 (seven) are significant and 2 (two) are not significant.
In addition, the feasibility of the model can be analyzed by calculating the multivariate coefficient of determination, expressed as Q-Square (Q2). Q-Square is a measure of how well the research model can explain the behavior of the research object (system) studied; Q2 > 0 indicates the model has predictive relevance. Q2 is used to find out how much of the diversity of the data can be explained by the model. Table 14.3 summarizes the results of the coefficient of determination.
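In the PLS literature this predictive-relevance measure is commonly computed from the R² values of the model's endogenous variables (a standard formula, stated here as a sketch rather than quoted from this chapter):

```latex
Q^2 = 1 - (1 - R_1^2)(1 - R_2^2)\cdots(1 - R_p^2)
```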
The predictive relevance value is 0.4629, indicating that the diversity of data that can be explained by the model is 46.3%; in other words, 46.3% of the information contained in the data can be explained by the model, while the remaining 53.7% is explained by other variables (not included in the model) and by error. Thus the structural model that has been formed is appropriate.
Graphically the results of hypothesis testing in the SEM structural model with the
WarpPLS approach can be seen in Fig. 14.7.
Fig. 14.7 Conceptual framework for hypothesis testing results, showing the estimated path coefficients (ranging from 0.108 to 0.263) with their p-values (from <0.001 to 0.005). Source Primary Data Processed (2020). Remarks black right arrow = Significant, red right arrow = Not Significant
14.10 Conclusion
Based on the results of the research and discussion of each of the variables previously
described, the following research conclusions can be drawn:
(1) Religiosity has a significant effect on Attitude in a positive direction. This result explains that if a taxpayer holds religious beliefs, religious practices, appreciation of religion, religious knowledge, and a high level of religious observance in what he or she does, the taxpayer has a good attitude toward complying with tax payments.
(2) Religiosity has a significant effect on Subjective Norms in a positive direction.
The results of this study explain that the higher the religiosity (X1), the higher
the subjective norm (Y2).
(3) Religiosity has a significant effect on Behavior Control in a positive direction.
The results showed that the higher the religiosity (X1), the higher the behavior
control (Y3).
(4) E-Filing has a significant effect on Attitude with a positive direction of influence. The results indicate that any increase in e-Filing causes attitudes to improve, which means that the higher the e-Filing utilization, the higher the attitude. The coefficient is positive, meaning the effect is unidirectional.
(5) E-Filing has no significant effect on Behavior Control, although the direction of influence is positive. The results explain that changes in the e-Filing variable do not produce a significant change in the Behavior Control variable.
(6) Perceived risk has no significant effect on Attitude, although the direction is positive. The results indicate that changes in the perceived risk variable do not produce a significant change in the Attitude variable.
References
1. Ajzen, I.: The theory of planned behavior. Organ. Behav. Hum. Decis. Process. 50(2), 179–211
(1991)
2. Musimenta, D., Nkundabanyanga, S.K., Muhwezi, M., Akankunda, B., Nalukenge, I.: Tax
compliance of small and medium enterprises: a developing country perspective. J. Fin. Regul.
Compl. 25(2), 149–175 (2017)
3. Mitu, N.E.: A basic necessity of a modern fiscal policy: voluntary compliance. Revista de
Științe Politice Revue des Sciences Politiques 57, 118–130 (2018)
4. Newman, W., Mwandambira, N., Charity, M., Ongayi, W.: Literature review on the impact of
tax knowledge on tax compliance among small medium enterprises in a developing country.
Int. J. Entrepreneursh. 22(4), 1–15 (2018)
5. Ajzen, I.: Attitudes, Personality and Behavior. Open University Press, Milton-Keynes, England
(2005)
6. Azwar, S.: Human Attitudes, Theories, and Measurements. Student Library, Jogjakarta (2010)
7. Haus, I., Steinmetz, H., Isidor, R., Kabst, R.: Gender effects on entrepreneurial intention: a
meta-analytical structural equation model. Int. J. Gend. Entrep. 5(2), 130–156 (2013)
8. Dharmamesta, B.S.: Theory of planned behavior in consumer attitude, intention, and behavior
research. Manage 7 (1998)
9. Rotter, J.B.: Internal vs external control of reinforcement: a case history of a variable. Am.
Psychol. 45(4), 489–493 (1990)
10. Tjahjono, A., Muhammad, F.H.: Taxation (3rd edn). Yogyakarta YKPN Company Academy
(2005)
11. Park, H., Blenkinsopp, J.: Whistleblowing as planned behavior—a survey of South Korean
police officers. J. Bus. Ethics 85, 545–556 (2009)
12. Ajzen, I., Fishbein, M.: Attitude-behavior relations: a theoretical analysis and review of empirical research. Psychol. Bull. 84(5), 888 (1977)
13. Sulistomo, A., Prastiwi, A.: Accounting Students’ Perceptions of Fraud Disclosure (Empirical
Study on UNDIP and UGM Accounting Students) (Doctoral dissertation, Faculty of Economics
and Business) (2011)
14. Gundlach, M.J., Douglas, S.C., Martinko, M.J.: The decision to blow the whistle: a social
information processing framework. Acad. Manag. Rev. 28(1), 107–123 (2003)
15. Sussman, R., Gifford, R.: Causality in the theory of planned behavior. Pers. Soc. Psychol. Bull.
45(6), 920–933 (2019)
16. Epstein, S.: Aggregation and beyond: some basic issues on the prediction of behavior. J. Pers.
51, 360–392 (1983)
17. Fishbein, M., Ajzen, I.: Attitudes towards objects as predictors of single and multiple behavioral
criteria. Psychol. Rev. 81(1), 59 (1974)
18. Fishbein, M., Yzer, M.C.: Using theory to design effective health behavior interventions.
Commun. Theory 13(2), 164–183 (2003)
19. Jogiyanto: Behavioral Information System. Andi Offset, Yogyakarta (2007)
20. Sulistyo, H.: The role of religious values on employee performance in the organization. Media
Res. Bus. Manage. 11(3), 252–270 (2011)
21. Susanti, R.: A description of the future orientation of adolescents in the field of work in terms
of religiosity and achievement motivation in the youth of Sei Banyak Ikan Kelayang Village.
J. Psychol. 12(2), 109–116 (2016)
22. Elci, M.: Effect of manifest needs, religiosity and selected demographics on hard working: an
empirical investigation in Turkey. J. Int. Bus. Res. 6(2), 97 (2007)
23. Mohdali, R., Pope, J.: The role of religiosity in tax morale and tax compliance. In: Australian
Tax Forum, vol. 25, no. 4, pp. 565–596 (2010)
24. Purnamasari, P., Amaliah, I.: Fraud prevention: relevance to religion and spirituality in the
workplace. Procedia Soc. Behav. Sci. 211, 827–835 (2015)
25. Utama, M.S., Nimran, U., Hidayat, K., Prasetya, A.: Effect of religiosity, perceived risk, and
attitude on tax compliant intention moderated by e-filing. Int. J. Fin. Stud. 10(1), 8 (2022)
26. Soemitro, R.: Tax Theory and Cases. Gramedia, Jakarta (2013)
27. Pujianto, A., Mulyati, A., Novaria, R.: Utilization of big data and consumer privacy protection
in the digital economy era. BIJAK Sci. Mag. 15(2), 127–137 (2018)
28. Kominfo, T.P.: Big Data Pocket Book. Ministry of Communication and Informatics (2015)
29. Ghozali, I.: Structural Equation Modeling, Alternative Method with Partial Least Square. Undip
Publishing Agency, Semarang
30. Solimun, S., Fernandes, A.A.R., Nurjannah, N.: Multivariate Statistical Method Structural
Equation Modeling (SEM) WarpPLS Approach. UB Press, Malang (2017)
31. Aji, A.S., Harahab, N.: Analysis of the effect of product price, product image, and customer
satisfaction as a mediation on brand loyalty of canned fish products from ABC brands.
ECSOFiM (Econ. Soc. Fish. Marine J.) 6(1), 83–92 (2018)
32. Ginting, D.B.: Structural equation model (SEM). Media Inform. 8(3), 121–134 (2009)
33. Chalil, D., Barus, R.: Qualitative Data Analysis: Theory and Applications in SWOT Anal-
ysis, Logit Models, and Structural Equation Modeling (Supplemented with SPSS and Amos
Manuals) (2014)
34. Hoelter, J.W.: The analysis of covariance structures: goodness-of-fit indices. Sociol. Methods
Res. 11(3), 325–344 (1983)
35. Hair, J., Anderson, R., Tatham, R., Black, W.: Multivariate Data Analysis, 5th edn. Prentice
Hall, Upper Saddle River, New Jersey (1998)
36. Bentler, P.M., Chou, C.P.: Practical issues in structural modeling. Sociol. Methods Res. 16(1),
78–117 (1987)
37. Byrne, B.M.: Structural Equation Modeling with AMOS: Basic Concepts, Applications, and
Programming. Lawrence Erlbaum Associates Inc., Mahwah, NJ (2001)
38. Ullman, J.B.: Structural equation modeling: reviewing the basics and moving forward. J. Pers. Assess. 87(1), 35–50 (2006)
39. Schumacker, R.E., Lomax, R.G.: A Beginner’s Guide to Structural Equation Modeling, 3rd
edn. Routledge, New York (2010)
40. Riyadi: REST Web Service Design for Comparison of Shipping Prices with Web Scrapping
Methods and Utilization of API. College of Informatics and Computer Management Amikom
Yogyakarta, Yogyakarta (2013)
Chapter 15
Influence of Firm-Specific Variables
on Capital Structure Decisions:
An Evidence from the Fintech Industry
Abstract Capital structure (Capstr) decisions have always been a concern for firms: it is crucial for any organisation to decide the right proportion of borrowed funds. This study examines the influence of the firm-specific variables that determine the Capstr decisions of firms in the fintech industry. The data for this study is sourced from the Refinitiv Database covering the worldwide fintech industry. The selected sample consists of 186 firms from across the global fintech industry. Through a quantitative approach, we use panel regression, supported by descriptive statistics and correlation, on the annual published financial data of the selected firms for the period 2011 to 2021. The data comprises an unbalanced panel of 1000 firm-year observations drawn from the 186 selected firms. The findings conclude that firm size (FS), profitability (Prft), tangibility ratio (Tr) and volatility (Vol) have a significant impact on the total debt ratio (TDr) and the short-term debt ratio (SDr); however, only profitability (Prft) has a significant impact on the long-term debt ratio (LDr). The study is partially in agreement with the pecking order theory for the studied industry.
S. Dsouza (B)
College of Business Administration, American University of the Middle East, Egaila, Kuwait
e-mail: [email protected]
A. K. Jain
Department of Finance, Westminster International University of Tashkent, Tashkent,
Uzbekistan 100047
e-mail: [email protected]
15.1 Introduction
The capital structure (Capstr) decision is one of the most essential decisions for any business. A major issue faced by firms when taking Capstr decisions is understanding and determining the optimal Capstr. Corporate finance offers many theories that focus on firm-specific factors as determinants of capital-structure decision-making. Recently, researchers have been focusing on firm-specific factors in the Fintech industry, especially in developed countries. The Capstr decision has to be taken in advance, at the formation of the company or whenever additional funds are required to meet capital investment decisions.
Since the new era began, new digital technologies such as artificial intelligence, cloud computing, blockchain, and large-scale data have been introduced and implemented in the finance industry. Fintech is transforming the traditional financial industry all over the world and significantly impacts the capital structure of companies. The term Fintech was first proposed in 1990, introduced in the US by Citibank through the "Financial Services Technology Alliance". Fintech supports the financial industry by applying advanced technologies and helps in its development [20]. A detailed definition of Fintech, given by the "Financial Stability Board in 2016" [59], is that Fintech is a new technological invention in the finance industry that influences and collides with financial markets, institutions and services, resulting in new business models, products and services. Fintech technology involves digital technologies, blockchain, cloud computing, and artificial intelligence [69].
According to Ding et al. [27], in recent years the application of Fintech has grown and emerged at very high speed around the globe. KPMG (2021) reported that global Fintech investment reached around 210 billion dollars across approximately 5684 deals in 2021 [1]. The rules of the traditional financial system's operational environment are changing with the emergence of fintech, as suggested by [12, 49, 62]. With the implementation of Fintech in the financial industry, a customer-centric business style and efficiency in providing customer service can be effectively improved, as operating costs are reduced and risk-control systems are strengthened [69]. Fintech is revolutionizing the financial industry, with many benefits adding productive value for industrial growth as a whole [14]. Previous research has shown that Fintech enhances the total productivity of firms [71]. Lv and Xiong [50] discuss how the development of Fintech and corporate investment efficiency move together with a positive correlation. Despite the many studies on Fintech and its impact on the financial sector, its influence on firms' specific variables in Capstr decisions has remained unexplored. As Fintech technologies are a mix of critical data and information, it is quite difficult to comprehend the Capstr decisions of firms applying such technologies.
The previous literature on Capstr financing decisions focuses on two options: debt and equity. The topic has also been researched for private and public fixed-income securities and for the possible determinants that can affect the Capstr of organizations in any sector. A large number of studies build on the Modigliani and Miller theory, which suggests that the value of a firm does not depend on its Capstr in theoretical markets with no taxes, no agency costs, and full information disclosure [8]. Determinants of Capstr have been identified through research on the pecking order theory [36, 51, 63] and the trade-off theory [2, 55, 64]. The pecking order theory, developed by Donaldson in 1961 and later altered by Myers in 1984, states that an organization first prioritizes internal funds for sourcing its finances and then moves to debt and equity capital, in that order. Under this theory, the most readily available option is used first, with debt and equity adopted for further financing [9]. On the contrary, the trade-off theory holds that Capstr decisions are established on an understanding of trade-offs, comparing the cost of debt with its benefits [25].
Several studies of the Fintech industry have investigated the transformation and understanding of Fintech in recent years around the globe and the determinants of debt financing for Fintech startups [4, 39, 47], but such research is very limited. Moreover, no specific research has studied the influence of firm-specific variables on Capstr decisions with evidence from the Fintech industry. Understanding this influence in the Fintech industry is particularly valuable because the industry is highly data-driven. Barsotti [6] holds that, to study the optimal capital structure, it is crucial to understand firm-specific variables across different organizations; however, the Fintech industry has never been explored in capital structure studies. The nature of its funding sources has a different dimension from traditional ones, as the industry relies heavily on VC funding and M&A activity. This untraditional pattern of fund sourcing leaves a gap for further research into the relationship between sales growth (SG), firm size (FS), profitability of the firm (Prft), tangibility ratio (Tr) and volatility (Vol) on the one hand, and the total debt ratio (TDr), long-term debt ratio (LDr) and short-term debt ratio (SDr) on the other.
Given the distinctive characteristics of the Fintech industry and the mixed evidence on the determinants of Capstr decisions, a structured study of the industry is warranted.
Our study adds significant answers and conclusions to the presently limited literature. The objective is to identify the impact of SG, FS, Prft, Tr and Vol on the TDr, LDr and SDr of firms from across the global Fintech industry. We chose the Fintech industry because it has long been booming yet carries limited research, with very little global evidence so far.
First, this study reviews the Capstr decision literature and theories developed in previous years and their generalization to the Fintech industry. Second, it draws strong evidence from the literature to understand the specific variables and their influence on Capstr decision making. The research methodology is discussed in the third section, the sample and descriptive statistics are presented in the fourth, and the results are discussed in the fifth. Finally, a conclusion sums up the research in the last section.
This section focuses on empirical evidence from previous literature on the factors and determinants of Capstr decisions. Matias and Serrasqueiro [52] studied the reliable determinant factors of Capstr decisions and identified profitability, size, age, asset structure, and growth opportunities as reliable determinants. Their results also showed that decisions were closer to the pecking order theory than to the trade-off theory. In emerging markets, the Capstr determinants differ: employing a GMM estimator to control for endogeneity, Vo [68] found that the factors of capital structure are quite different for long-term indicators than for short-term indicators.
Güner [41] studied the factors considered in the Capstr decisions of Turkish companies and exploited differences in capital-structure decisions across companies with different degrees of free float rate in the financial market, foreign paid-in capital, and market values. The results showed that although the pecking order theory is the principle that best describes Capstr, some determinants are better suited to the trade-off theory. The research observed that companies with free float rates have low leverage levels, and that this varies across companies with different market values. Alipour et al. [3] collected evidence from companies in Iran to study the factors of Capstr and found that variables such as firm size, financial flexibility, asset structure, profitability, liquidity, growth, risk, and state ownership influence the Capstr measures of corporations in Iran. The evidence shows that short-term debt is the most important financing option for companies in Iran, a result supported by many previous theories.
An empirical study using data from Chinese non-financial firms showed that the average leverage ratio in the sample is similar to those found in emerging nations. The study also suggests that tangibility, size, volatility and firm age are firmly correlated with leverage and are quite robust determinants, whereas the firm's profitability negatively impacts the leverage position (Vijayakumaran and Vijayakumaran [67]). Correia [19] and Chakrabarti and Chakrabarti [10] examined and analyzed profitability as a factor in capital-structure decisions. Their analyses show a negative correlation between the profitability of the organization and debt: firms with a high level of profit generally carry a low level of debt funding, the reason being the cost of financial distress assumed by the trade-off theory. This evidence is also consistent with the pecking order theory, and previous literature reports that profitability has a negative relationship with debt, with profit-making SMEs preferring retained profits over debt [53].
A study by Sofat and Singh [65] explained different conditional theories of Capstr and reviewed the literature to analyze manufacturing firms. The results suggested that variables such as asset composition, business risk and return on assets (ROA) have a positive correlation with the debt ratio and are optimal determinants for Capstr decision making. However, firm size and debt-servicing capacity were not found to be influential determinants. Ullah et al. [66] described a positive correlation between the debt-equity ratio and return on equity at the 10% confidence level. In contrast, the asset turnover ratio has an inverse relationship with return on equity, and there is also a negative relationship between firm size and return on equity.
Rahman et al. [60] discussed the profits of listed manufacturing companies in Bangladesh and the impact of capital-structure decisions on them. Considering around 50 observations of companies listed on the Dhaka stock exchange between 2013 and 2017, the results showed a critical positive impact of the equity ratio on return on equity, whereas the debt-to-equity ratio behaves negatively with return on equity. Chang et al. [13] assessed the impact of capital-structure decisions on company profitability. The study examined data from Asian economies and applied regression analysis, observing a negative correlation between leverage and profitability but a positive relationship between growth factors and leverage levels. Nguyen and Nguyen [56] studied the relationship between the profitability of non-financial firms and capital structure. Around 488 listed companies were selected from the Vietnam stock exchange for the period 2013–2018. The results showed a negative relation between profit position and capital structure.
Putri and Rahyuda [58] researched the influence of capital structure, through the debt-equity ratio and sales growth, on the profitability matrix. The study showed that the debt-equity ratio negatively influences the profit position of the company, whereas the growth factor positively influences profits. Orlova et al. [57] concluded that the complexity of Capstr decision making depends on the requirement for external funding, access to the fixed-income securities market, and the capability of the borrowing firm to handle additional leverage. These determinants affect the complexity of the decisions: a firm with a financing deficit can take advantage of market access, which mitigates the complexity of its capital structure. Dimitropoulos and Koronios [26] examined the Capstr determinants during the Greek debt crisis. The results showed that asset tangibility is directly related to the total and long-term leverage position, especially when the debt crisis hit Greece, whereas the non-debt tax shield (NDTS) and tax payments negatively impacted the firm's total leverage position. Moreover, organizations with high growth opportunities in the market tend to be associated with lower long-term debt, resulting in low debt exposure.
This section reviews the literature on the Fintech sector that specifically concerns the determinants of Capstr decisions. Gastaud et al. [37] found that owner risk tolerance, the characteristics of the promoters, and the goods and services produced by the firm [61] are among the factors that influence financing decisions. Fintech is a fast-growing mechanism in the rapidly changing financial services sector [45], yet it has not been elaborated and understood in detail [5]. The literature on Fintech is lagging, with few key topics covered on the subject [54]. Zavolokina et al. [70] described Fintech as a living entity rather than treating it as a stable idea. Many case studies and research works have examined Capstr decision making and fund-raising factors such as past funding obtained, performance, and human resource characteristics [11].
Kachlami and Yazdanfar [46] examined the SME sector and reported that SMEs are prone to utilizing their profits and applying internal sources of funding instead of depending on external funding such as debt. On the other hand, research on startups using the "Kauffman" firm survey shows that owners with a high net worth tend to use more of their own equity in the firm rather than depending on debt [16]. Fintech is well established in countries rich in venture capital, where funding requirements are fulfilled through diversified approaches such as the "in-residence incubator program" generally applied by financial institutions working in the Fintech sector [44].
Giaquinto [40] explained that a country's business environment affects its Fintech industry and observed a positive correlation between business venture capital and seed-round capital. These studies, however, have not focused on the Capstr stages of the FinTech industry, and previous research has many shortcomings.
Bui [7] discussed the funding of startups such as Fintechs: where firms source less traditional funding, they are more likely to depend on equity-based funds, although this cannot be generalized. Evidence collected by Langevin [48] shows that fintech is transforming the capital markets and has the capability of mitigating information asymmetry [35]; moreover, it increases stock liquidity, which opens access to low-cost equity finance options. Evidence from Comeig et al. [17] shows that accessing external financing is a major issue for fintech companies, especially new startups, given their informational opacity and lack of collateral. Cole and Sokolyk [15] therefore explained that debt can be used by such firms, as it also supports better performance and makes them more likely to survive in the long run with fast revenue growth.
In this research, we utilize unbalanced panel data from 186 firms for the period 2011–2021, making up a total of 1000 firm/year observations. The dependent variable is the firm's Capstr, proxied by the total debt ratio (TDr), long-term debt ratio (LDr) and short-term debt ratio (SDr), while the independent variables are sales growth (SG), firm size (FS), profitability of the firm (Prft), tangibility ratio (Tr) and volatility (Vol). The time series data on all variables were obtained from the financial data available on the Refinitiv website. All the selected firms are listed on a stock exchange and belong to the global fintech industry (Table 15.1). Each debt ratio is regressed on the five firm-specific variables,

$$ y_{it} = \beta_0 + \beta_1\,SG_{it} + \beta_2\,FS_{it} + \beta_3\,Prft_{it} + \beta_4\,Tr_{it} + \beta_5\,Vol_{it} + \varepsilon_{it}, $$

where y_it denotes TDr, LDr or SDr, the dependent and independent variables are as defined above, fixed effects proxied by year dummies are included in the model, and ε_it represents the error term.
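As an illustration only, the sketch below estimates the TDr variant of this model with year effects using Python's linearmodels package; the chapter's own estimations were produced in STATA, and the file name, index levels and column names here are hypothetical stand-ins for the Refinitiv extract.

```python
# Panel regression of TDr on the five firm-specific variables with year (time)
# effects, on a hypothetical firm/year panel. A sketch, not the chapter's
# actual STATA estimation.
import pandas as pd
from linearmodels.panel import PanelOLS

df = pd.read_csv("fintech_panel.csv")        # hypothetical file name
df = df.set_index(["firm", "year"])          # entity/time MultiIndex

model = PanelOLS.from_formula(
    "TDr ~ 1 + SG + FS + Prft + Tr + Vol + TimeEffects", data=df
)
result = model.fit(cov_type="clustered", cluster_entity=True)
print(result.summary)
```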
The selected sample comprises firms listed on a stock exchange over the 2011–2021 period, from the global fintech industry, as available on the Refinitiv database. In selecting the period we aimed to include as many of the most recent years as possible. We pooled the firm/year data from all listed firms globally and excluded firm/year observations with missing data or insufficient financial information for any of the selected variables. After all possible data reductions, a cross-sectional, unbalanced panel was obtained, comprising 1000 firm/year observations from the selected 186 firms. Outliers were not removed from the panel; instead, the data were winsorized at the 2% (p2–p98) level. The data were further processed using STATA software. Table 15.2 shows the descriptive statistics, skewness, and kurtosis results for the data.
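A 2% winsorization of the kind described above can be reproduced with SciPy; the series below is simulated purely for illustration, since the chapter's processing was done in STATA.

```python
# Winsorize a series at its 2nd and 98th percentiles: extreme values are
# clipped to the percentile bounds rather than deleted.
import numpy as np
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(0)
tdr = rng.normal(0.38, 1.20, size=1000)      # simulated stand-in for TDr
tdr_w = winsorize(tdr, limits=[0.02, 0.02])  # clip 2% from each tail
print(tdr.min(), float(tdr_w.min()))         # the minimum is pulled inwards
```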
As shown in Table 15.2, the mean of TDr is 0.38, that of LDr 0.08 and that of SDr 0.20, while the standard deviation of TDr is 1.20, that of LDr 0.19 and that of SDr 0.72. Means this close to zero indicate that the firms barely use debt (neither short-term nor long-term) as a source of finance for the business, and the low standard deviations indicate that firms across the fintech sample follow the same debt practice. The mean and standard deviation of SG are 0.51 and 1.76, respectively: sales growth has been positive on average for the sample, and the low standard deviation points to similar behaviour across the sample. FS has a mean of 17.84 and a standard deviation of 3.47; considering the maximum and minimum values of FS, the mean indicates a balanced distribution of firms across the sample with respect to their investments in assets. Prft has a mean of −0.99 and a standard deviation of 3.33; a negative mean, though close to zero, indicates that the majority of the fintech firms either make losses or barely reach their breakeven point. Tr has a mean of 0.13 and a standard deviation of 0.21; a mean as low as 0.13 indicates that fixed assets comprise on average 13% of total assets, with consistent behaviour across the sample. Vol has a mean of 1.50 and a standard deviation of 5.04; ROA, measured through its three-year standard deviation, thus shows a low mean and reasonably consistent behaviour across the fintech sample. As the skewness values for the whole sample are almost equal to zero, the data in the complete sample are fairly symmetrical, and the low kurtosis values indicate that the sample lacks extreme outliers.
Table 15.3 reflects the correlations between the independent and dependent variables. SG has a positive correlation with TDr, LDr and SDr. FS and Prft have a negative correlation with TDr, LDr and SDr, while Tr has a positive correlation with TDr, LDr and SDr; the correlations of FS, Prft and Tr with TDr and SDr are statistically significant at the 5% level. The correlation matrix provides only a general, preliminary association among the variables; regression analysis is needed to identify the impact of the independent variables on the dependent variables.
Table 15.4 shows the variance inflation factor (VIF) results. The variables in the model are free from multicollinearity.

Table 15.4 VIF results: SG, FS, Prft, Tr, Vol and TDr/LDr/SDr

Variables   SG     FS     Prft   Tr     Vol    Mean VIF
VIF         1.00   1.44   2.31   1.06   2.17   1.60
1/VIF       1.00   0.69   0.43   0.94   0.46
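The VIF row of Table 15.4 can be reproduced for any design matrix with statsmodels; the sketch assumes the same hypothetical DataFrame `df` as the earlier regression sketch.

```python
# VIF_j = 1 / (1 - R^2_j), where R^2_j is from regressing regressor j on the
# others; values near 1 indicate little multicollinearity.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["SG", "FS", "Prft", "Tr", "Vol"]].dropna())
vifs = pd.Series(
    [variance_inflation_factor(X.values, j) for j in range(1, X.shape[1])],
    index=X.columns[1:],
)
print(vifs, "mean VIF:", round(vifs.mean(), 2))
```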
Fig. 15.1 Box plot of Total debt ratio (TDr), 2011–2021
Figures 15.1, 15.2 and 15.3 display the distributions of the total debt ratio (TDr), short-term debt ratio (SDr) and long-term debt ratio (LDr) over the period from 2011 to 2021 using box plots. The box plot technique displays the five-number summary as a central box with whiskers that extend to the non-outlying values. In all three figures, for every individual year the median is not roughly centered between the quartiles and the whiskers are not of similar length; we therefore conclude that the per-year distributions of TDr, SDr and LDr over the period of the study are skewed.
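For readers reproducing such figures, a per-year box plot takes a few lines with pandas and matplotlib, again assuming the hypothetical `df` panel with a `year` index level.

```python
# One box per year for TDr: the box spans the quartiles, the inner line marks
# the median, and the whiskers extend to the non-outlying values.
import matplotlib.pyplot as plt

df.reset_index().boxplot(column="TDr", by="year")
plt.ylabel("Total debt ratio (TDr)")
plt.suptitle("")                             # drop pandas' automatic suptitle
plt.show()
```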
Panel regression analysis is used to observe the impact of the independent variables on the dependent variables. For each dependent variable, the more significant of the no-dummy and year-dummy specifications is used to analyse the panel regression model, and the Hausman test determines whether fixed or random effects are selected for the analysis.
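To our knowledge linearmodels does not expose a one-call Hausman test, so a hand-rolled version contrasting fixed- and random-effects fits might look as follows; the formula and `df` are the same hypothetical setup as before.

```python
# Hausman test: H = d' [Var(b_FE) - Var(b_RE)]^{-1} d with d = b_FE - b_RE,
# asymptotically chi-squared; a small p-value favours the fixed-effects model.
import numpy as np
from scipy import stats
from linearmodels.panel import PanelOLS, RandomEffects

formula = "TDr ~ 1 + SG + FS + Prft + Tr + Vol"
fe = PanelOLS.from_formula(formula + " + EntityEffects", data=df).fit()
re = RandomEffects.from_formula(formula, data=df).fit()

common = [c for c in fe.params.index if c in re.params.index and c != "Intercept"]
d = (fe.params[common] - re.params[common]).values
v = (fe.cov.loc[common, common] - re.cov.loc[common, common]).values
H = float(d @ np.linalg.inv(v) @ d)
print(H, stats.chi2.sf(H, df=len(common)))   # small p -> prefer fixed effects
```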
Fig. 15.2 Box plot of Short term debt ratio (SDr), 2011–2021

Fig. 15.3 Box plot of Long term debt ratio (LDr), 2011–2021

Table 15.5 presents the regression results. Based on the Hausman test statistic (p = 0.0018), the fixed-effect results were analysed for the dependent variable TDr. The firm's sales growth (SG) has no significant impact on TDr. However, firm size (FS) in the no-dummy specification has a negative and significant (p < 0.05) impact on TDr, which accepts hypothesis (H2). The profitability of the firm (Prft) in the year-dummy specification has a negative and significant (p < 0.01) impact on TDr, and the tangibility ratio (Tr) in the no-dummy specification has a positive and significant (p < 0.01) impact on TDr. Volatility (Vol) in the no-dummy specification has a positive and significant (p < 0.05) impact on TDr, which accepts hypothesis (H5). For the dependent variable SDr, given the Hausman test statistic (p = 0.0049), the fixed-effect results were analysed. The firm's sales growth (SG) has no significant impact on SDr. However, firm size (FS) in the no-dummy specification has a negative and significant (p < 0.01) impact on SDr, which accepts hypothesis (H2). The profitability of the firm (Prft) in the year-dummy specification has a negative and significant (p < 0.01) impact on SDr, and the tangibility ratio (Tr) in the year-dummy specification has a positive and significant (p < 0.01) impact on SDr. Volatility (Vol) in the no-dummy specification has a positive and significant (p < 0.01) impact on SDr, which accepts hypothesis (H5). For the dependent variable LDr, the Hausman test statistic (p = 0.8236) indicates that the random-effect results should be analysed. The firm's sales growth (SG), firm size (FS), tangibility ratio (Tr) and volatility (Vol) have no significant impact on LDr; however, the profitability of the firm (Prft) in the no-dummy specification has a negative and significant (p < 0.05) impact on LDr.
The study contributes to the existing literature on Capstr decisions; its uniqueness lies in covering the global fintech industry, which makes it rare. The study includes variables such as the firm's sales growth (SG), firm size (FS), profitability of the firm (Prft), tangibility ratio (Tr) and volatility (Vol), whose impact is measured on debt ratios. To make the study more robust, the debt ratios are further classified into separate models and the impact is measured separately on each of them (TDr, SDr and LDr). The results indicate that the firm's sales growth (SG) has no significant impact on TDr, SDr or LDr for the fintech industry. Firm size (FS) has a negative and significant impact on TDr and SDr but no significant impact on LDr; the negative impact of FS on TDr and SDr contradicts the literature [53]. This indicates that the fintech industry does not favour debt as a source of funds for its assets but focuses more on VC funding and M&A activity [18]. The profitability of the firm (Prft) has a negative and significant impact on TDr, SDr and LDr, in agreement with the pecking order theory with respect to profitability behaviour. The tangibility ratio (Tr) has a positive and significant impact on TDr and SDr. Volatility (Vol) has a positive and significant impact on TDr and SDr but no significant impact on LDr; the positive impact of Vol on TDr and SDr contradicts the literature [34, 38].
The study can be useful for fintech firms in deciding on their capital structure for existing and future business opportunities, and for investors observing firm behaviour with respect to the fintech industry's funding patterns. Considering the influence of the independent variables on the debt ratios, fintech firms can identify business abnormalities and improve on them. Being a rare industry-level study, it still has some limitations: it is limited to the global fintech industry, and the conclusions may vary across industries. Further studies could add more independent variables from the fintech industry and test their influence on Capstr decisions.
References
1. Agrawal, R.: Role of Fintech companies in increasing financial inclusion. J. Appl. Manage.
14(1), 24–36 (2022)
2. Ai, H., Frank, M.Z., Sanati, A.: The trade-off theory of Corporate Capital Structure. Oxford
Research Encyclopedia of Economics and Finance (2020)
3. Alipour, M., Mohammadi, M.F.S., Derakhshan, H.: Determinants of capital structure: an
empirical study of firms in Iran. Int. J. Law Manage. 57(1), 53–83 (2015)
4. Allen, F., Gu, X., Jagtiani, J.: Fintech, cryptocurrencies, and CBDC: financial structural
transformation in China. J. Int. Money Financ. 124, 102625 (2022)
5. Anagnostopoulos, I.: Fintech and regtech: impact on regulators and banks. J. Econ. Bus. 100,
7–25 (2018)
6. Barsotti, F.: Optimal Capital Structure with Endogenous Bankruptcy: Payouts, Tax Benefits Asymmetry and Volatility Risk (Doctoral dissertation, Université de Toulouse, Université Toulouse III-Paul Sabatier) (2011)
7. Bui, T.P.: Fintech and investment in the Fintech sector at Viettel Telecom Corporation (in Vietnamese) (2019)
8. Chadha, S., Sharma, A.K.: Capital structure and firm performance: empirical evidence from
India. Vis. J. Bus. Perspect. 19(4), 295–302 (2015). https://doi.org/10.1177/097226291561
0852
9. Chaklader, B., Chawla, D.: A study of determinants of capital structure through panel data
analysis of firms listed in NSE CNX 500. Vision 20(4), 267–277 (2016)
10. Chakrabarti, A., Chakrabarti, A.: The capital structure puzzle–evidence from Indian energy
sector. Int. J. Energy Sector Manage. (2019)
11. Chan, E., Fei, Y.: Assessing the startup bandwagon effect: the role of past funding in venture
capital investment. UChicago Undergr. Bus. J. 1(2), 1–18 (2015)
12. Chang, V., Baudier, P., Zhang, H., Xu, Q., Zhang, J., Arami, M.: How blockchain can impact
financial services—the overview, challenges and recommendations from expert interviewees.
Technol. Forecast. Soc. Chang. 158, 120166 (2020)
13. Chang, C.C., Batmunkh, M.U., Wong, W.K., Jargalsaikhan, M.: Relationship between capital
structure and profitability: evidence from four Asian tigers. J. Manage. Inf. Decis. Sci. (2019)
14. Chen, M.A., Wu, Q., Yang, B.: How valuable is FinTech innovation? Rev. Fin. Stud. 32(5),
2062–2106 (2019)
15. Cole, R.A., Sokolyk, T.: Debt financing, survival, and growth of start-up firms. J. Corp. Finan.
50, 609–625 (2018)
16. Coleman, S., Cotei, C., Farhat, J.: The debt-equity financing decisions of US startup firms. J.
Econ. Fin. 40, 105–126 (2016)
17. Comeig, I., Fernández-Blanco, M.O., Ramírez, F.: Information acquisition in SME’s relation-
ship lending and the cost of loans. J. Bus. Res. 68(7), 1650–1652 (2015)
18. Cornelli, G., Doerr, S., Franco, L., Frost, J.: Funding for fintechs: patterns and drivers (2021)
19. Correia, A.M.F.A.: Determinants of corporate capital structure: evidence from non-financial
listed French firms (2015)
20. Darolles, S.: The rise of fintechs and their regulation. Fin. Stabil. Rev. 20, 85–92 (2016)
21. Demiraj, R., Dsouza, S., Abiad, M.: Working capital management impact on profitability:
pre-pandemic and pandemic evidence from the European automotive industry. Risks 10(12)
(2022)
22. Demiraj, R., Demiraj, E., Dsouza, S.: Impact of financial leverage on the performance of tourism
firms in the MENA region. PressAcademia Procedia 16(1), 156–161 (2023)
23. Demiraj, R., Dsouza, S., Demiraj, E.: ESG scores relationship with firm performance: panel
data evidence from the European tourism industry. PressAcademia Procedia 16(1), 116–120
(2023)
24. Demiraj, R., Dsouza, S., Demiraj, E.: Capital structure and profitability: panel data evidence
from the European tourism industry. In: 6th International Scientific Conference ITEMA 2022,
Selected Papers (2023). https://doi.org/10.31410/ITEMA.S.P.2022.1
25. Dierker, M., Lee, I., Seo, S.W.: Risk changes and external financing activities: tests of the
dynamic trade-off theory of capital structure. J. Empir. Financ. 52, 178–200 (2019)
26. Dimitropoulos, P.E., Koronios, K.: Capital structure determinants of Greek hotels: the impact
of the Greek debt crisis. In: Culture and Tourism in a Smart, Globalized, and Sustainable
World: 7th International Conference of IACuDiT, Hydra, Greece, 2020, pp. 387–402. Springer
International Publishing, Cham (2021)
27. Ding, N., Gu, L., Peng, Y.: Fintech, financial constraints and innovation: evidence from China.
J. Corp. Finan. 73, 102194 (2022)
28. Dsouza, S., Pandey, D.: Study of relationship between liquidity and profitability of automobile
companies. In: International Conference on Finance and Economics, 83–93 (2017)
29. Dsouza, S., Rabbani, M.R., Hawaldar, I.T., Jain, A.K.: Impact of bank efficiency on the prof-
itability of the banks in India: an empirical analysis using panel data approach. Int. J. Fin. Stud.
10(4), 93 (2022)
30. Dsouza, S., Demiraj, R., Habibniya, H.: Variable reduction technique to boost financial anal-
ysis: a case study on emerging markets telecommunication industry, BRICS. SCMS J. Indian
Manage. 19(2) (2022)
31. Dsouza, S., Habibniya, H.: The impact of liquidity on the profitability of nifty pharma index
(NSE India). IUP J. Account. Res. Audit Pract. 20(4) (2021)
32. Dsouza, S., Demiraj, R., Habibniya, H.: A Study on the Impact of Liquidity and Leverage
on Performance: Hotels and Entertainment Services Industry–MENA Region: An Empirical
Panel Data Analysis. Available at SSRN 3989995 (2021)
33. Dsouza, S., Demiraj, R., Habibniya, H.: Impact of liquidity and leverage on performance:
panel data evidence of hotels and entertainment services industry in the MENA region. Int. J.
Hospital. Tour. Syst. 16(3), 26–39 (2023)
34. Dudley, E., James, C.M.: Cash flow volatility and capital structure choice (2015). https://doi.
org/10.2139/ssrn.2492152
35. Feyen, E., Frost, J., Gambacorta, L., Natarajan, H., Saal, M.: Fintech and the digital trans-
formation of financial services: implications for market structure and public policy. BIS Pap.
(2021)
36. Frank, M.Z., Goyal, V.K., Shen, T.: The pecking order theory of capital structure: where do we
stand? SSRN Electron. J. (2020). https://doi.org/10.2139/ssrn.3540610
37. Gastaud, C., Carniel, T., Dalle, J.M.: The varying importance of extrinsic factors in the success
of startup fundraising: competition at early-stage and networks at growth-stage (2019). arXiv
preprint arXiv:1906.03210
38. Ghasemzadeh, M., Heydari, M., Mansourfar, G.: Earning volatility, capital structure decisions
and financial distress by SEM. Emerg. Mark. Financ. Trade 57(9), 1–19 (2019)
39. Giaretta, E., Chesini, G.: The determinants of debt financing: the case of fintech start-ups. J.
Innov. Knowl. 6(4), 268–279 (2021)
40. Giaquinto, L.: Angel, seed and founders influence on Fintech funding. Semantic Scholar (1970)
41. Güner, A.: The determinants of capital structure decisions: new evidence from Turkish
companies. Procedia Econ. Fin. 38, 84–89 (2016)
42. Habibniya, H., Dsouza, S.: Impact of performance measurements against market value of shares
in Indian banks an empirical study specific to EVA, EPS, ROA, and ROE. J. Manag. Res. 18(4),
203–210 (2018)
43. Habibniya, H., Dsouza, S., Rabbani, M.R., Nawaz, N., Demiraj, R.: Impact of capital structure
on profitability: panel data evidence of the telecom industry in the United States. Risks 10(8),
157 (2022)
44. Haddad, C., Hornuf, L.: The emergence of the global Fintech market: economic and
technological determinants. Small Bus. Econ. 53(1), 81–105 (2019)
45. Jagtiani, J., Lemieux, C.: Do Fintech lenders penetrate areas that are underserved by traditional
banks? J. Econ. Bus. 100, 43–54 (2018)
46. Kachlami, H., Yazdanfar, D.: Determinants of SME growth: the influence of financing pattern.
An empirical study based on Swedish data. Manage. Res. Rev. 39(9), 966–986 (2016)
47. Knewtson, H.S., Rosenbaum, Z.A.: Toward understanding FinTech and its industry. Manag.
Financ. 46(8), 1043–1060 (2020)
48. Langevin, M.: Big data for (not so) small loans: technological infrastructures and the
massification of fringe finance. Rev. Int. Polit. Econ. 26(5), 790–814 (2019)
49. Lee, I., Shin, Y.J.: Fintech: ecosystem, business models, investment decisions, and challenges.
Bus. Horiz. 61(1), 35–46 (2018)
50. Lv, P., Xiong, H.: Can FinTech improve corporate investment efficiency? Evidence from China.
Res. Int. Bus. Financ. 60, 101571 (2022)
51. Martinez, L.B., Scherger, V., Guercio, M.B.: SMEs capital structure: trade-off or pecking order
theory: a systematic review. J. Small Bus. Enterp. Dev. 26(1), 105–132 (2019)
52. Matias, F., Serrasqueiro, Z.: Are there reliable determinant factors of capital structure decisions?
Empirical study of SMEs in different regions of Portugal. Res. Int. Bus. Financ. 40, 19–33
(2017)
53. Matias, F., Salsa, L., Afonso, C.: Capital structure of Portuguese hotel firms: a structural
equation modelling approach. Tour. Manage. Stud. 14(1), 73–82 (2018)
54. Milian, E.Z., Spinola, M.D.M., de Carvalho, M.M.: Fintechs: a literature review and research
agenda. Electron. Commer. Res. Appl. 34, 100833 (2019)
55. Nicodano, G., Regis, L.: A trade-off theory of ownership and capital structure. J. Financ. Econ.
131(3), 715–735 (2019)
56. Nguyen, T., Nguyen, H.: Capital structure and firm performance of non-financial listed
companies: cross-sector empirical evidences from Vietnam. Accounting 6(2), 137–150 (2020)
57. Orlova, S., Harper, J.T., Sun, L.: Determinants of capital structure complexity. J. Econ. Bus.
110, 105905 (2020). https://doi.org/10.1016/j.jeconbus.2020.105905
58. Putri, I.G.A.P.T., Rahyuda, H.: Effect of capital structure and sales growth on firm value with
profitability as mediation. Int. Res. J. Manage. IT Soc. Sci. 7(1), 145–155 (2020)
59. Ramlall, I.: Understanding Financial Stability. Emerald Group Publishing (2018)
60. Rahman, M.A., Sarker, M.S.I., Uddin, M.J.: The impact of capital structure on the profitability
of publicly traded manufacturing firms in Bangladesh. Appl. Econ. Fin. 6(2), 1–5 (2019)
61. Roeder, J., Cardona, D.R., Palmer, M., Werth, O., Muntermann, J., Breitner, M.H.: Make
or break: business model determinants of FinTech venture success. In: Proceedings of the
Multikonferenz Wirtschaftsinformatik, Lüneburg, Germany, 6–9 (2018)
62. Saksonova, S., Kuzmina-Merlino, I.: Fintech as financial innovation—the possibilities and
problems of implementation (2017)
63. Serrasqueiro, Z., Caetano, A.: Trade-off theory versus pecking order theory: capital structure
decisions in a peripheral region of Portugal. J. Bus. Econ. Manag. 16(2), 445–466 (2015)
64. Shahar, W.S.S., Shahar, W.S.S., Bahari, N.F., Ahmad, N.W., Fisal, S., Rafdi, N.J.: A review of
capital structure theories: trade-off theory, pecking order theory, and market timing theory. In:
Proceeding of the 2nd International Conference on Management and Muamalah, pp. 240–247
(2015)
65. Sofat, R., Singh, S.: Determinants of capital structure: an empirical study of manufacturing
firms in India. Int. J. Law Manage. 59(6), 1029–1045 (2017)
66. Ullah, A., Kashif, M., Ullah, S.: Impact of capital structure on financial performance of textile
sector in Pakistan. KASBIT Bus. J. 10(2), 1–20 (2017)
67. Vijayakumaran, S., Vijayakumaran, R.: The determinants of capital structure decisions:
evidence from Chinese listed companies, 63–81 (2018)
68. Vo, X.V.: Determinants of capital structure in emerging markets: evidence from Vietnam. Res.
Int. Bus. Financ. 40, 105–113 (2017)
69. Yang, Y., Su, X., Yao, S.: Nexus between green finance, Fintech, and high-quality economic
development: empirical evidence from China. Resour. Policy 74, 102445 (2021)
70. Zavolokina, L., Dolata, M., Schwabe, G.: The FinTech phenomenon: antecedents of financial
innovation perceived by the popular press. Fin. Innov. 2(1), 1–16 (2016)
71. Zhou, G., Zhu, J., Luo, S.: The impact of Fintech innovation on green growth in China:
mediating effect of green finance. Ecol. Econ. 193, 107308 (2022)
Chapter 16
A Weights Direct Determination Neural
Network for Credit Card Attrition
Analysis
Abstract Cost reduction is a component that contributes to both the profitability and
longevity of a corporation, especially in the case of a financial institution, and can
be accomplished through greater client retention. Particularly, credit card customers
comprise a volatile subset of a bank’s client base. As such, banks would like to predict
in advance which of those clients are likely to attrite, so as to approach them with
proactive marketing campaigns. Credit card attrition is generally a poorly investi-
gated subtopic with a variety of challenges, like highly imbalanced datasets. This
article utilizes neural networks to address the challenges of credit card attrition since
they have found great application in many classification problems. More particu-
larly, to overcome the shortcomings of traditional back propagation neural networks,
we construct a multi-input trigonometrically activated weights and structure deter-
mination (MTA-WASD) neural network which incorporates structure trimming as
well as other techniques that boost its training speed as well as diminish the danger
and the subsequent detrimental effects of overfitting. When applied to three publicly
available datasets, the MTA-WASD neural network demonstrated either superior or
highly competitive performance across all metrics, compared to some of the best-
performing classification models that MATLAB’s classification learner app offers.
16.1 Introduction
In highly competitive and mature business sectors, one such being the banking sector,
the growth of a company, or rather a bank in this case, greatly depends on the efforts
that the entity makes towards: maintaining and growing its existing customer base,
acquiring/keeping up with new technology, focusing on specific market segments
and enhancing its productivity and efficiency [2, 11]. Of those factors, it is argued
that the first is also the most prominent. Namely, more and more companies become
aware of the fact that their most precious asset is the existing customer base [9, 27].
It comes as no surprise that service providers in the financial industry go through
great efforts to attract clients from their competitors whilst limiting their own losses
[10, 20].
However, it is not only the banks that have become conscious of the importance
of their clientele, but also the clients themselves. The increasing awareness of the
latter party when it comes to quality of service provided, is another element adding
to the already competitive environment. Oftentimes, factors such as accessibility or
even a more attractive interest rate are all it takes for a client, whether long-term or
new, to suddenly stop doing business with a bank and move to a competing firm [8,
15]. Financial institutions have been motivated to gradually shift their focus from
attracting new customers to retaining as many of their current ones as possible, mainly
due to the impact that even a small increase in customer retention can have on the
bank’s income statement but also due to the well-established facts that maintaining
is much cheaper than re-acquiring lost customers and, on a similar note, selling to an
existing customer is several times less expensive than selling to a new customer [19,
26, 29]. In effect, there has been a shift of interest. It has now become important for
banks to know in advance which of their customers, starting from the “high grade”-
high return on investment clients, are likely to leave [10, 14]. To the extent that
a financial institution can obtain this knowledge, it can launch targeted marketing
campaigns, which have been shown to be very effective when it comes to customer
retention [11]. The act of analysing data and developing models so as to predict which clients are likely to "attrite", usually with the ultimate aim of employing adequate proactive counter-measures, is referred to as attrition analysis or, as it is often called, churn analysis.
Artificial neural networks have been successfully applied to a wide spectrum of
fields, including but not limited to medicine, such as in the prediction of breast cancer
[22] and economics and finance, such as in the classification of firm fraud [24], in
portfolio optimization [25], in the analysis of time series [17], in the stabilization
of stochastic exchange rate dynamics [18] as well as in the prediction of various
macroeconomic measures [23]. Furthermore, there is an abundance of applications in
problems stemming from the various engineering disciplines. For example, artificial
neural network models have been applied to feedback control systems stabilization
[16], mobile objects localization [13], performance analysis of solar systems [6, 21],
remote sensing multi-sensor classification [1], prediction of the flow behavior of
alloy [12, 28] and performance analysis of heat pump systems [3–5].
In this paper we will use feed-forward neural networks to classify customers that
are likely to attrite. As was already stated, financial institutions that can acquire that
knowledge for their own client base obtain a competitive advantage in the form of
operating cost reduction. In training a feed-forward neural network, there has been
a long tradition in the use of back-propagation algorithms where the structure of the
neural network is iteratively refined. On the other hand, newly implemented weights
and structure determination (WASD) training algorithms offer a feature that their
predecessors lack. Namely, the weights direct determination (WDD) process, inher-
ent in any WASD algorithm, facilitates the direct computation of the optimal set of
weights, hence allowing one to avoid getting stuck in local minima and all in all
contributing in the achievement of lower computational complexity [24, 31, 33]. We
thus develop a 3-layer feed-forward multi-input trigonometrically activated WASD
(MTA-WASD) neural network for classification. Its activation functions consist of
products of power based trigonometric functions. On testing the MTA-WASD neural
network to three publicly available credit card attrition datasets and comparing its
performance to another WASD neural network as well as a number of popular clas-
sifiers from MATLAB’s classification learner app, the MTA-WASD neural network
demonstrated either superior or equal performance across all metrics, thus suggest-
ing that the trigonometrically activated WASD model is both a competitive as well
as a reliable classifier.
This work’s main points can be summarized as follows:
The following is a breakdown of the paper’s structure. Section 16.2 begins with an
overview of the MTA-WASD neural network’s final structure and the rationale behind
it. It then proceeds into the development of activation functions through the use of
lexicographically ordered power tables and the formulation of the WDD process. The
section ends with the description of the full training process and the presentation of
all related algorithms. In Sect. 16.3, the MTA-WASD neural network is applied to
three publicly available credit card attrition datasets and its performance is compared
to other popular models. Section 16.4 contains some final remarks.
The neural network presented in Fig. 16.1 is a classification neural network that
accepts one or many inputs.
Towards building its structure, the neural network employs a WASD algorithm alongside a post-training structure trimming process. Let n ∈ N denote the number of inputs, with x_j ∈ R^m, j = 1, 2, ..., n, and let y ∈ R^m be the target response corresponding to the input matrix x = [x_1, x_2, ..., x_n] ∈ R^{m×n}. The variable vectors are passed into the next layer, each with a weight of 1. The training process populates the hidden layer which, at the end of the procedure, will have accumulated N ∈ N neurons. The weights column vector w = [w_1, w_2, ..., w_N]^T is computed by use of the WDD process, which ensures that, given the structure, the choice of weights is indeed optimal. Each neuron i = 1, 2, ..., N represents the image of the input matrix under the activation function g_i. The corresponding weight represents the importance of the image's contribution to the collective output ŷ. A weighted combination of all images yields the prediction of the neural network. Finally, the output neuron is activated in the sense that ŷ is considered a valid prediction only after it has been converted to binary form ỹ through the following elementwise function:

$$ f_i(\hat{y}) = \begin{cases} 1, & \hat{y}_i \geq \tilde{p} \\ 0, & \hat{y}_i < \tilde{p} \end{cases} \qquad (16.1) $$

where p̃ = min ŷ + p(max ŷ − min ŷ), with the threshold p ∈ [0, 1] and i = 1, 2, ..., m. Generally, if the threshold p is picked close to 0 then more entries of ŷ will be mapped to 1, and vice versa.
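As a quick illustration, a NumPy transcription of (16.1) might look as follows; the function name and the sample values are ours, not the chapter's.

```python
# Elementwise binarization of the raw network output, as in (16.1): entries of
# y_hat at or above the data-dependent cutoff p_tilde map to 1, the rest to 0.
import numpy as np

def binarize(y_hat: np.ndarray, p: float) -> np.ndarray:
    """Convert raw predictions to {0, 1} using a threshold p in [0, 1]."""
    p_tilde = y_hat.min() + p * (y_hat.max() - y_hat.min())
    return (y_hat >= p_tilde).astype(int)

print(binarize(np.array([0.1, 0.35, 0.4, 0.8]), p=0.5))  # -> [0 0 0 1]
```

Picking p closer to 0 lowers p̃ and maps more entries to 1, matching the remark above.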
As an essential feature in any WASD algorithm, the WDD process allows one to
obtain the optimal weights corresponding to the current hidden layer structure without
having to engage in lengthy iterative computations, where the quality of the outcome
is often uncertain. Evidently, the WDD process contributes in achieving both speed
and lower computational complexity compared to traditional weight determination
approaches, whilst avoiding some of the related pitfalls [31, 33].
The construction of the neural network revolves around linking the training input matrix x to a known target vector y; that is, approximating the underlying relationship between x and y through a combination of activation functions. This is mainly
achieved by the development of a large enough hidden layer that is paired with an
adequate set of weights. The activation functions themselves usually account for a
substantial part of the neural network’s performing ability, both its training as well
as testing components. A key feature in training a neural network through a WASD
algorithm is that the number of hidden layer neurons is not predetermined. Rather, as
the training procedure unfolds, the number of hidden layer neurons fluctuates until
the network settles to a structure that is considered optimal. Each added neuron has to
provide for something previously unavailable in the structure. There is no marginal
benefit for adding neurons that are constant multiples of the pre-existing neurons and,
depending on the algorithm, the training process is likely to terminate prematurely,
should that be the case.
It should now come as no surprise that, as far as WASD training algorithms are
concerned, polynomials and/or other functions that are raised incrementally to some
power are common choices for activation functions, mainly because the terms pro-
duced are inherently linearly independent. Namely, the power, the power sigmoid, the
power inverse exponential and the power softplus activation functions were proposed
in [24]. Furthermore, serving as building blocks for activation functions, polynomi-
als such as Chebyshev, Euler, Hermite, Laguerre, Legendre as well as Bernoulli
polynomials were proposed in [31]. In this paper, we investigate the implementation
of a trigonometrically activated neural network. As a result, two sub-activations, SA1 and SA2, which serve as building blocks for the activation function g(x), are proposed and investigated, with k ∈ N ∪ {0} denoting the power to which each sub-activation is raised.
As for the ability of the neural network to converge, the following Definition 1,
Theorem 1 and Proposition 1 from [31] should be noted.
Definition 1 The polynomials

$$ B^{f}_{n_1 n_2 \ldots n_k}(x_1, x_2, \ldots, x_k) = \sum_{\nu_1=0}^{n_1} \cdots \sum_{\nu_k=0}^{n_k} f\!\left(\frac{\nu_1}{n_1}, \ldots, \frac{\nu_k}{n_k}\right) \prod_{q=1}^{k} C^{\nu_q}_{n_q}\, x_q^{\nu_q} (1 - x_q)^{n_q - \nu_q} $$

are called multivariate Bernstein polynomials of f(x_1, x_2, ..., x_k), where C^{ν_q}_{n_q} denotes a binomial coefficient with n_q = n_1, n_2, ..., n_k and ν_q = 0, 1, ..., n_q.

Theorem 1 Let f(x_1, x_2, ..., x_k) be a continuous function defined over V_k = {(x_1, x_2, ..., x_k) ∈ R^k | 0 ≤ x_q ≤ 1, q = 1, 2, ..., k}. Then the multivariate Bernstein polynomials B^f_{n_1 n_2 ... n_k}(x_1, x_2, ..., x_k) converge uniformly to f(x_1, x_2, ..., x_k) as n_1, n_2, ..., n_k → ∞.

Proposition 1 With a form of products of trigonometric power based functions φ_k employed, we can construct a generalized trigonometric polynomial

$$ g_i(x) = g_i(x_1, x_2, \ldots, x_n) = \phi_{k_{i1}}(x_1)\,\phi_{k_{i2}}(x_2) \cdots \phi_{k_{in}}(x_n) = \prod_{j=1}^{n} \phi_{k_{ij}}(x_j), $$

for i = 1, 2, ..., N.
Given n inputs and N hidden layer neurons, it is suggested in [31] that g_i(x) = g_i(x_1, x_2, ..., x_n), the image of x under the ith neuron, should be computed as a product of n sub-activations, each of them taking as input one of the n variables/columns of x. The power k to which each term φ_k is raised is given by an appropriate r × n power table T_n with entries from N ∪ {0}. For each neuron i, i = 1, 2, ..., N, we compute g_i(x) = g_i(x_1, x_2, ..., x_n) as

$$ g_i(x) = \phi_{k_{i1}}(x_1)\,\phi_{k_{i2}}(x_2) \cdots \phi_{k_{in}}(x_n) = \prod_{j=1}^{n} \phi_{k_{ij}}(x_j), \qquad k_{ij} = T_n(i, j). \qquad (16.4) $$
Before addressing a possible construction mechanism for T_n, a few preliminaries are in order. Each row in T_n represents a unique n-tuple of powers ranging from 0 to an arbitrary positive integer. The total number of rows r is also arbitrary, in the sense that one generates as many rows as one sees fit. The table follows the graded lexicographic order. That is, given two rows T_n^(a) = [k_a1, k_a2, ..., k_an] and T_n^(b) = [k_b1, k_b2, ..., k_bn], a, b ∈ N with a ≠ b, they are sorted by means of the following rule [30, 31]. If either condition is true:

C.I: ∑_{j=1}^{n} k_aj > ∑_{j=1}^{n} k_bj, or
C.II: ∑_{j=1}^{n} k_aj = ∑_{j=1}^{n} k_bj and the first nonzero entry of T_n^(a) − T_n^(b) is positive,

then T_n^(b) precedes T_n^(a). As for the actual entries of the table, a few samples are given in [30, 31] and a different variation is examined in [32]. However, except for the n = 1 case, it is not clear how one should go about acquiring a working version of T_n. Drawing from those samples, we propose the heuristic Algorithm 1.
Particularly, let s denote the sum of the elements of any given row of T and let S denote an upper bound for s. Given S = 2, the resulting tables T_n for n = 1, 2, 3, 4, 5 are given below.

$$
T_1 = \begin{bmatrix} 0\\ 1\\ 2 \end{bmatrix},\quad
T_2 = \begin{bmatrix} 0&0\\ 0&1\\ 1&0\\ 0&2\\ 1&1\\ 2&0 \end{bmatrix},\quad
T_3 = \begin{bmatrix} 0&0&0\\ 0&0&1\\ 0&1&0\\ 1&0&0\\ 0&0&2\\ 0&1&1\\ 0&2&0\\ 1&0&1\\ 1&1&0\\ 2&0&0 \end{bmatrix},\quad
T_4 = \begin{bmatrix} 0&0&0&0\\ 0&0&0&1\\ 0&0&1&0\\ 0&1&0&0\\ 1&0&0&0\\ 0&0&0&2\\ 0&0&1&1\\ 0&0&2&0\\ 0&1&0&1\\ 0&1&1&0\\ 0&2&0&0\\ 1&0&0&1\\ 1&0&1&0\\ 1&1&0&0\\ 2&0&0&0 \end{bmatrix},\quad
T_5 = \begin{bmatrix} 0&0&0&0&0\\ 0&0&0&0&1\\ 0&0&0&1&0\\ 0&0&1&0&0\\ 0&1&0&0&0\\ 1&0&0&0&0\\ 0&0&0&0&2\\ 0&0&0&1&1\\ 0&0&0&2&0\\ 0&0&1&0&1\\ 0&0&1&1&0\\ 0&0&2&0&0\\ 0&1&0&0&1\\ 0&1&0&1&0\\ 0&1&1&0&0\\ 0&2&0&0&0\\ 1&0&0&0&1\\ 1&0&0&1&0\\ 1&0&1&0&0\\ 1&1&0&0&0\\ 2&0&0&0&0 \end{bmatrix}.
$$
Although no two rows of T are identical, two or more rows may share a common s. Thus, T is organized naturally into blocks of rows of equal s. Those blocks are sorted in ascending order, as C.I of the graded lexicographic ordering rule suggests. Each block is then internally organized by means of C.II. The heuristic part of the algorithm lies in the contents of each block. A guess is that, for a given block where s is fixed, we are interested in all possible unique n-tuples whose entries sum to s, starting from [s, 0, ..., 0]. Through this vector, one generates the unique integer partitions of s in the form of rows of length n. For each such row, we subsequently compute all unique permutations, and by that point we have exhausted all vectors of interest. Starting from an empty matrix T and setting s = 0, we repeat those steps, each time sorting the resulting block by C.II, concatenating it to the bottom of T and incrementing s by 1.
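Under the stated structure, the heuristic boils down to enumerating all n-tuples of non-negative integers with sum at most S and sorting them in graded lexicographic order. The brute-force Python sketch below (our stand-in for the chapter's MATLAB Algorithm 1) reproduces the tables T1–T5 shown above; it is exponential in n, so it is only meant for small inputs.

```python
# Generate the power table T_n: all rows of n non-negative integer powers with
# row sum <= S, ordered by row sum first (C.I) and lexicographically within
# each equal-sum block (C.II).
from itertools import product

def power_table(n: int, S: int) -> list[tuple[int, ...]]:
    rows = [t for t in product(range(S + 1), repeat=n) if sum(t) <= S]
    return sorted(rows, key=lambda t: (sum(t), t))

for row in power_table(3, 2):
    print(row)   # (0,0,0), (0,0,1), (0,1,0), (1,0,0), (0,0,2), ... matches T_3
```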
Suppose that at any given point in time the hidden layer of the neural network consists of N neurons. One needs to determine the optimal weights (vector coefficients) w_1, w_2, ..., w_N such that the linear combination w_1 g_1(x) + w_2 g_2(x) + ... + w_N g_N(x) evaluates to a vector ŷ ∈ R^m that is, with as little error as possible, close to the target vector y ∈ R^m. Letting A = [g_1(x), g_2(x), ..., g_N(x)] ∈ R^{m×N}, the least-squares solution to Aw = y is given by the WDD process [33]:

$$ w = A^{\dagger} y, \qquad (16.5) $$

where A† denotes the Moore–Penrose pseudoinverse of A.
The training error is measured by the mean absolute error (MAE) of the binarized prediction:

$$ \mathrm{MAE} = \frac{1}{m} \sum_{j=1}^{m} |\tilde{y}_j - y_j|. \qquad (16.6) $$
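In code, the WDD step (16.5) is a single pseudoinverse call and the MAE (16.6) a one-liner; the sketch below is a self-contained toy in which a fixed cutoff stands in for (16.1).

```python
# WDD process: the least-squares optimal weights for the current hidden layer
# are w = pinv(A) @ y (16.5); the MAE (16.6) then scores the binarized fit.
import numpy as np

def wdd(A: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Direct weight determination via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(A) @ y

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))              # toy hidden-layer output matrix
y = rng.integers(0, 2, size=6)           # toy binary targets
w = wdd(A, y)
y_tilde = (A @ w >= 0.5).astype(int)     # fixed cutoff; the chapter uses (16.1)
print(np.mean(np.abs(y_tilde - y)))      # training MAE
```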
Given x, y and a maximum number of hidden layer neurons N, the first step is to grow the hidden layer structure to size N, essentially building A = [g_1(x), g_2(x), ..., g_N(x)] column by column, as in Sect. 16.2.1. Let n denote the number of inputs. It is essential that the first N rows of T_n (as in Sect. 16.2.1) are available. The procedure, which incorporates numerous MATLAB commands, constructs A through an implementation of (16.4) that takes advantage of the structure of T in order to reduce the total running time. Namely, the entries in T (see examples in Sect. 16.2.1) suggest that for each variable x_j, j = 1, 2, ..., n, and a given power k, the computation of φ_k(x_j) is bound to come up multiple times within the process. Thus, for each k and x_j of interest, φ_k(x_j) is saved the first time it is computed, resulting in the development of a value lexicon (VL) in the form of a three-dimensional array. On calculating g_i(x) = ∏_{j=1}^{n} φ_{k_ij}(x_j), with k_ij = T(i, j), all available terms are drawn from the lexicon and the rest are computed and subsequently saved for future use. Looping through all i from 1 to N and assigning each g_i(x) to the corresponding column of A, the matrix A is sequentially produced. Algorithm 2 describes the aforementioned process of creating A.
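Since Algorithm 2 itself is MATLAB-based, a compact Python rendering of the same caching idea is sketched below. The sub-activation φ_k used here is a hypothetical power-based trigonometric stand-in, because the chapter's exact SA1/SA2 definitions are not reproduced in this excerpt.

```python
# Build the hidden-layer matrix A column by column (cf. Algorithm 2), caching
# each phi_k(x_j) in a value lexicon so repeated (j, k) pairs are computed once.
import numpy as np

def phi(x: np.ndarray, k: int) -> np.ndarray:
    return np.sin(x) ** k                    # placeholder for the chapter's SA1/SA2

def build_A(x: np.ndarray, T: np.ndarray) -> np.ndarray:
    m, n = x.shape
    N = T.shape[0]
    lexicon: dict[tuple[int, int], np.ndarray] = {}  # (column j, power k) -> phi_k(x_j)
    A = np.ones((m, N))
    for i in range(N):                       # one hidden neuron per row of T
        for j in range(n):
            k = int(T[i, j])
            if (j, k) not in lexicon:
                lexicon[(j, k)] = phi(x[:, j], k)
            A[:, i] *= lexicon[(j, k)]       # g_i(x) = prod_j phi_{k_ij}(x_j)
    return A

x = np.random.default_rng(2).uniform(-1, 1, size=(8, 2))
T = np.array([[0, 0], [0, 1], [1, 0], [0, 2], [1, 1], [2, 0]])  # T_2 from above
print(build_A(x, T).shape)                   # (8, 6)
```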
The next step following the successful construction of the hidden layer is to determine the optimal weights vector w by use of (16.5). One then computes ŷ = Aw and, given a threshold p, converts ŷ to binary as in (16.1). Let e = (1/m) ∑_{j=1}^{m} |ỹ_j − y_j| be the current MAE. In order to fine-tune the hidden layer structure, the process progresses into a post-training structure trimming stage. Particularly, each neuron is taken out of the structure in an iterative manner and the MAE is recomputed. Whenever the resulting error is lower than the benchmark, the neuron in question is dropped and the benchmark MAE, as well as the optimal weights vector, are updated. Through indx, an index vector, one keeps track of the indices of all remaining neurons. To each neuron corresponds a unique row in T. Namely, in Algorithm 2, for the computation of g_i(x) = ∏_{j=1}^{n} φ_{k_ij}(x_j) the powers k_ij were drawn from the ith row of T. If the ith neuron is dropped, the same applies to the ith row of T, so that it is not assigned by mistake to one of the remaining neurons. This marks the end of the training process, which upon terminating will have yielded the optimal weights vector w_best and the remaining rows T_best of the starting power table T. Those two elements, along with the sub-activation φ_k used in building the activation functions, fully define the neural network, in the sense that they are the only features needed for the MTA-WASD neural network's structure to be recreated for testing purposes. The aforementioned process is described in Algorithm 3, the MTA-WASD neural network's training process is synopsized in the flowchart of Fig. 16.2, and a roadmap for the application of the trained MTA-WASD neural network is provided in the flowchart of Fig. 16.3.
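A condensed, self-contained Python sketch of this trimming stage (the chapter's Algorithm 3 is MATLAB-based) follows; the fixed 0.5 cutoff again stands in for (16.1), and all names are ours.

```python
# Post-training structure trimming: iteratively drop any neuron whose removal
# lowers the benchmark MAE, tracking surviving indices so the matching rows of
# the power table T can be pruned as well.
import numpy as np

def trim(A: np.ndarray, y: np.ndarray, cutoff: float = 0.5):
    def score(cols):
        w = np.linalg.pinv(A[:, cols]) @ y           # WDD on the trial structure
        e = np.mean(np.abs(((A[:, cols] @ w) >= cutoff).astype(int) - y))
        return w, e

    idx = list(range(A.shape[1]))                    # indices of remaining neurons
    w_best, e_best = score(idx)
    for i in list(idx):
        trial = [j for j in idx if j != i]           # structure without neuron i
        w_t, e_t = score(trial)
        if e_t < e_best:                             # keep the drop only if MAE improves
            idx, w_best, e_best = trial, w_t, e_t
    return w_best, idx                               # w_best and surviving rows of T
```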
16.3 Experiments
In this section, we apply the MTA-WASD neural network under SA1 and SA2 to three
publicly available credit card attrition/churn datasets and compare its performance
to four other well performing classifiers, three of them coming from MATLAB’s
classification learner app and the fourth one being another WASD neural network
that incorporates Bernoulli polynomials in the construction of its activation functions.
It is worth noting that all WASD neural networks will be trained up to 100 neurons
and that is because performance on the testing set deteriorates rapidly, at least in those
three datasets, as the size of the structure increases over that threshold. Furthermore,
all inputs to those models will be normalized to the interval [−1, 1], for similar
reasons. Last but not least, we will refer to each of the three datasets as AD.I, AD.II
and AD.III, respectively, and in the context of figures we will abbreviate “credit card
customer” by CCC. Notice also that the datasets employed in this research can be
acquired from Kaggle (https://www.kaggle.com/).
Fig. 16.4 The MTA-WASD neural network’s results on AD.I under SA1 and SA2
to arrays of zeros and ones, transforming the rest of the columns entails, naturally, a subjective element. Nevertheless, our approach may be summarized in the following points (a minimal Python sketch of this encoding follows the list):
1. Income category: Each of the five listed ranges was mapped to an integer from $-2$ to $2$, with the smallest value being assigned to the lowest income category.
2. Card category: The listed colors (Blue, Gold, Platinum, Silver) were ranked as Blue(0) < Silver(1) < Gold(2) < Platinum(3), where the value in parentheses is the assigned integer.
3. Education level: A similar assignment to that in 1, with the smallest value being assigned to the least educated category.
4. Marital status: Single($-1$) < Divorced(0) < Married(1).
5. Whenever an entry was marked as “unknown”, the whole row containing the
entry was dropped.
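A minimal Python sketch of the above encoding is given below. The column names and category labels are assumptions based on the public Kaggle credit card churn data and may need adjusting to the actual CSV headers; the sketch is illustrative, not the authors' preprocessing code.

```python
import pandas as pd

# Illustrative ordinal encoding of points 1-5 above (column/label names assumed).
income_map = {"Less than $40K": -2, "$40K - $60K": -1, "$60K - $80K": 0,
              "$80K - $120K": 1, "$120K +": 2}
card_map = {"Blue": 0, "Silver": 1, "Gold": 2, "Platinum": 3}
marital_map = {"Single": -1, "Divorced": 0, "Married": 1}

df = pd.read_csv("BankChurners.csv")               # hypothetical file name
df = df[~df.isin(["Unknown"]).any(axis=1)]         # point 5: drop rows with unknowns
df["Income_Category"] = df["Income_Category"].map(income_map)
df["Card_Category"] = df["Card_Category"].map(card_map)
df["Marital_Status"] = df["Marital_Status"].map(marital_map)
# Education level (point 3) is handled analogously to the income category.
```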
One could argue that the rationale behind this set of assignments seems to stem more from a credit card default point of view than from that of credit card attrition (which is different). Although we will not argue that the assignment is optimal, it is worth reporting that upon reversing (one at a time) the orders in the aforementioned points 1, 3 and 4, we were faced with a very steep decline in performance. After all is said and done, out of the 10,127 rows there remain 7081, with only 15.72% of them corresponding to attrited customers. Having tried both randomly deleting samples from the dominating class and synthetically generating more training samples for the dominated class, we found that the choice which facilitated both performance and reproducibility was, surprisingly, to keep the dataset as is.
Figure 16.5 demonstrates the training error paths (number of neurons plotted
against MAE) as well as the classification performances of the SA1 (left column
subfigures) and SA2 (right column subfigures) of the MTA-WASD neural network
when applied to the training and testing set, respectively. Comparing the figures for
the SA1 and SA2, a similar training path is drawn with neurons being trimmed at different positions, eventually resulting in structures of (almost) identical size, namely 77 and 76 neurons for the models based on SA1 and SA2, respectively. When it comes to performance on the training set, the model based on SA1 performs slightly better overall, scoring an MAE of 0.0658, whereas the model based on SA2 evaluates to an
MAE of 0.0687. Nevertheless, performance on the test set (see Sect. 16.3.4) more
than makes up for any deficiencies. This is the only dataset where, at least on the testing set, the MTA-WASD neural network model under SA2 alone seems to come out on top. However, as will be seen in Sect. 16.3.4, there seems to be no evidence
to support the hypothesis that the difference between the predictive accuracies of the
two models is in fact statistically significant.
Fig. 16.5 The MTA-WASD neural network’s results on AD.II under SA1 and SA2
value 0 at all related class columns except for the unknown class column, in which
it will be assigned the value 1. Consequently, at a cost of increasing the number
of columns of the original dataset by half, no rows are discarded and thus all the
available information is preserved. Interestingly enough, this results in the precision
of all investigated models (WASD and MATLAB classifiers alike) increasing by a
non-trivial amount.
All in all, except for two of the columns being dropped (those referring to Client
ID and row index), the dataset is in operational form once the contents of the comma-
separated values (CSV) files have been converted to arrays. Unbalanced though the
dataset may be, with nearly 85% of samples corresponding to non-attrited customers,
we chose not to apply any preprocessing techniques (other than the usual normal-
ization), nor to rearrange the training and testing sets, but rather to use the existing
format. As far as the two versions of the MTA-WASD neural network are concerned,
there is one final element of surprise and this is that they perform identically in all
aspects. Indeed, as is depicted in Fig. 16.6, the MTA-WASD neural networks based
on SA1 and SA2, respectively, converged to the same structure and produced the
same results both in the training as well as in the testing set. The performance of the
models on the testing set is presented in Sect. 16.3.4, while on the training set both evaluate to an MAE of 0.0903. Between AD.II and AD.III, there is a striking difference in precision (≈72% vs. ≈94.33%) which, as was discussed above, may be attributed to the
current structuring of the dataset.
Fig. 16.6 The MTA-WASD neural network’s results on AD.III under SA1 and SA2
rest of the evaluation criteria, the MTA-WASD neural network demonstrates robust and often superior overall performance. Thus, it is a reliable classifier that is very much capable of competing on equal terms with other well-established models, as the results in the following table suggest. For clarity, for each dataset and each metric a bolded value signifies the highest score, whereas the colors blue, red and purple signify the winning model for AD.I, AD.II and AD.III, respectively. Note that we do not take TP, FP, TN and FN directly into consideration; this is done indirectly, by treating the other measures which use them as building blocks. Furthermore, in AD.III the Linear SVM and the Bernoulli WASD each achieve the highest score on two different metrics; picking a winner in that case is fairly subjective.
In order to properly assess whether the two MTA-WASD neural networks based
on SA1 and SA2 differ from each other and, furthermore, in order to address the
equally important question of whether one can single out a model that is better
suited for predicting credit card attrition in those datasets, we add to the previous
Table 16.2 McNemar test on the MTA-WASD neural network based on SA1

SA1 versus    AD.I                           AD.II                            AD.III
              Null hypothesis  p-value       Null hypothesis  p-value         Null hypothesis  p-value
SA2           Not rejected     0.362032      Not rejected     0.401061991     Not rejected     1
KNB           Not rejected     0.687149      Rejected         0.000135809     Rejected         1.79112E−17
Linear SVM    Rejected         0.007284      Not rejected     0.687885344     Not rejected     0.734342033
Fine KNN      Rejected         5.42E−05      Rejected         0.010212663     Rejected         6.21851E−18
Bernoulli     Not rejected     0.917041      Not rejected     0.897421827     Rejected         8.54198E−30
Table 16.3 McNemar test on the MTA-WASD neural network based on SA2

SA2 versus    AD.I                           AD.II                            AD.III
              Null hypothesis  p-value       Null hypothesis  p-value         Null hypothesis  p-value
SA1           Not rejected     0.362032      Not rejected     0.401061991     Not rejected     1
KNB           Not rejected     0.257761      Rejected         1.83849E−05     Rejected         1.79112E−17
Linear SVM    Rejected         0.000774      Not rejected     0.32447878      Not rejected     0.734342033
Fine KNN      Rejected         4.65E−06      Rejected         0.002672984     Rejected         6.21851E−18
Bernoulli     Not rejected     0.267812      Not rejected     0.470879014     Rejected         8.54198E−30
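The McNemar tests in Tables 16.2 and 16.3 compare paired predictions of two classifiers on the same test set. Below is a minimal sketch of such a test in Python, assuming the asymptotic (chi-squared) variant available in statsmodels; the chapter does not restate which variant it used, so the function arguments here are assumptions.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Paired McNemar test between two classifiers on the same test set.
# y_true, pred_a, pred_b: 0/1 arrays of true labels and the two models' predictions.
def mcnemar_pvalue(y_true, pred_a, pred_b):
    a_ok, b_ok = pred_a == y_true, pred_b == y_true
    # 2x2 table of joint correct/incorrect counts for the two classifiers
    table = np.array([[np.sum(a_ok & b_ok),  np.sum(a_ok & ~b_ok)],
                      [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)]])
    return mcnemar(table, exact=False, correction=True).pvalue  # asymptotic variant
```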
16.4 Conclusion
Chapter 17
Stock Market Prediction Using Machine
Learning: Evidence from India
Abstract The literature deciphers the dynamics of the stock market environment across regions. Moreover, an emerging stock market like India has experienced several upturns and downturns due to its continuous economic reforms since the early 1990s, which makes the Indian stock markets exhibit diversified information characteristics. The chapter predicts the movements of the Indian stock markets over 2000–2022, and observes certain dynamism in both the actual and predicted trends of the Indian stock markets. The results reveal that Long Short-Term Memory, holding time-independence characteristics and a greater extent of prediction accuracy, proved to be the best machine learning technique to predict the movement of the Indian stock markets. Moreover, the degree of prediction accuracy of all the machine learning techniques except Long Short-Term Memory varies from one time to another. On the other hand, support vector machines and linear regression models, with their lowest degree of prediction accuracy and highest errors, proved least appropriate in predicting the movements of the Nifty and the Sensex, respectively. The robustness of our method would benefit from testing it on other markets and time periods. The study also discusses the strengths and weaknesses of several machine learning techniques and provides important insights into applying advanced technologies for stock market prediction in an emerging economy like India. Our prediction approach provides a potentially beneficial alternative for investors to identify return opportunities and achieve diversification benefits by mitigating risk while investing in the Indian stock markets.
S. Patra (B)
Goa Institute of Management, Sanquelim, Goa 403505, India
e-mail: [email protected]
T. N. Pandey
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil
Nadu 600127, India
e-mail: [email protected]
B. Bhuyan
Department of Economics, Maharaja Purna Chandra (Autonomous) College, Baripada,
Odisha 757003, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
L. A. Maglaras et al. (eds.), Machine Learning Approaches in Financial Analytics,
Intelligent Systems Reference Library 254,
https://doi.org/10.1007/978-3-031-61037-0_17
17.1 Introduction
In this chapter, we predict the stock price movements of an emerging economy like
India using sophisticated machine learning techniques over the decades. Since the late
2000, the continuous economic and financial market reforms in the Indian economy
has been creating a dynamic information environment in the stock markets. In this
scenario, the prediction of stock markets has become the most challenging tasks
for the investors [12]. Despite the establishment of the efficient market hypothesis
(EMH)1 of Fama [15], the inquiry into different models and profitable system is
still attracting a lot of attention from academia to predict the stock price movements
[45]. Recently, a group of literature [6, 28] argued against the all-or-none condition
of EMH. However, some other studies supported the essence of adaptive market
hypothesis (AMH)2 not only in the case of the emerging stock market like India [5,
22], but also across the regions [36–38]. Moreover, a sophisticated predictive model
with the capability to generate excess returns not only helps the investors to obtain
large profit but also deviates the stock prices from the random-walk benchmark [19].
Continuous dynamic economic situations such as reforms, financial crashes, bubbles, manias, changes in the political environment, and uncertainty in investors' biases make stock prices chaotic and noisy [4, 18, 29], and therefore increase the degree of volatility co-movement between stock market liquidity and informational efficiency [39]. Moreover, an emerging stock market like India has been experiencing several upturns and downturns due to its continuous economic reforms since the early 1990s [5, 22]. According to the S&P fact book (Standard & Poor's 2012), the Indian stock markets have grown with the largest number of listed companies on their stock exchanges, changes in market microstructures, and transparent, advanced trading practices. Moreover, the increased integration of the Indian stock markets with the world economy and the growing ratio of stock market capitalization to GDP explain the phenomenal growth of the Indian stock markets. Therefore, emerging stock markets like India exhibit diversified information characteristics and hold complexities in their price patterns that distinguish them from developed stock markets, and they have emerged as one of the most important destinations for foreign institutional investors [21]. In this aspect, the future of the Indian stock markets is uncertain, and
1 The EMH indicates that the stock market, in order to become informationally efficient, needs to follow the random-walk benchmark, which makes stock price movements unpredictable over time.
2 Lo [32] proposed the AMH, which indicates that the financial markets, rather than being in an all-or-none condition, evolve over time, and therefore remain adaptive to several economic and non-economic events.
thus it is necessary to predict the future price patterns of the Indian stock exchanges
to reap the investment benefits over time.
The present study contributes to the existing literature in the following ways. Departing from previous research, we use several machine learning techniques to predict the historical movements of Indian stock prices. The existing literature has extensively used technical [11, 31] and fundamental analysis [10] to predict future market trends and the factors associated with them. Other studies used moving averages, autoregressive models, discriminant analyses, and correlations to predict financial time series [29, 44]. Recently, the use of artificial intelligence systems in the prediction of chaotic, random, and non-linear financial time series [12, 44, 48] has become a most promising area of research, and therefore demands further empirical investigation. The present research complements the literature on the application of machine learning techniques in financial markets, and extends the existing work to the usage of machine learning models in the context of the Indian stock markets, an issue seldom explored before. In particular, to the best of our knowledge, the present study is the first to investigate and compare the applications of several machine learning techniques in an emerging stock market like India during both the pre- and post-Covid periods. Further, the issue of nonlinearity in Indian stock prices is addressed in this paper, an issue that has seldom received due attention in India in the post-Covid period.
The remainder of the chapter is structured as follows. Section 17.2 provides a brief review of machine learning techniques and their applications. We explain the data and methods used in Sect. 17.3. Section 17.4 discusses the main results and evaluates the machine learning applications in the case of the Indian stock market. Section 17.5 summarizes and concludes the chapter.
17.2 Machine Learning Techniques and Their Applications: A Brief Review

Artificial intelligence systems are designed to efficiently deal with chaotic, random, and non-linear financial time series [12]. Machine learning techniques, which combine artificial intelligence systems, seek to extract patterns learned from historical data3 in order to subsequently make predictions about new data4 [47]. Prediction using machine learning techniques proceeds in two phases. The first phase deals with selecting the relevant variables and models for the prediction, separating portions of the data for training and validation of the models, and then optimizing the models. In the second phase, the optimized models are applied to the data intended for testing, which measures the predictive performance of the model. The existing literature employs different machine learning techniques, such as artificial neural
3 The process of learning from the historical data is known as learning or training the dataset in the machine learning approach.
4 The process of making predictions about the new data is known as testing the dataset in the machine learning approach.
networks (ANN) [49], support vector machines [40], and random forests (RF) [41]
to predict the time series.
In general, neural networks were developed to model biological processes [1], particularly those related to the human system of learning and identifying patterns [43]. The basic units of neural networks, known as neurons, imitate their human equivalents with dendrites to receive the input variables and obtain an output value, which can also serve as the input for other neurons [30]. In this way, the basic processing units of the neural network are interconnected, with certain weights attributed to each connection [31]. These weights are adjusted in each learning process of the network in the first phase [29]. In particular, in the first phase, the model optimizes the interconnections between the layers of neurons by transferring the parameters from one layer to another, and thereby minimizes the errors in the prediction of the subsequent dataset. Accordingly, the last layer of the neural network combines all the signals from the previous layers and converts them into one output signal, which is known as the response of the network to the input data.
Another important machine learning technique, the Support Vector Machine (SVM), considers the training samples and efficiently transforms the training data from their original dimension space to another space with the approximation of a linear separation by a hyperplane [25]. This technique is commonly used to classify the training data based on the input variables in the model. In this technique, the transformation is made with the help of kernel functions from the space of the original dimensions to the space in which the classifications are performed during training [33]. The major difference between ANN and SVM is that the former minimizes the errors of its empirical responses in the first phase of the training stage, whereas the latter minimizes the upper threshold of the error of its classifications [24].
As an alternative to ANN and SVM, the machine learning literature often uses another technique, namely the decision tree (DT), to predict financial time series. This method divides the dataset into various subsets based on the values of the input variables until the basic classification unit is obtained in accordance with the training sample [3]. Moreover, the consistent classifications of the most accurate trees can be efficiently combined into a single one with the RF algorithm [9]. The combination of the DT and RF machine learning techniques can not only be used in regressions or classifications of the training samples, but can also efficiently predict the financial markets [2, 26, 28, 29, 35].
The prediction of a stock market whose price patterns exhibit non-stationary behavior is challenging [12, 42, 50]. Moreover, the dynamism in stock price patterns is influenced by the dynamic trends of the economy, industry, polity, and the psychological behavior of investors [37, 51]. Thus, the prediction of the future behavior of the stock market should be enriched with the use of advanced techniques and their practical application to historical price data for evaluating the profitability of the techniques [20]. Recently, the use of machine learning techniques in the prediction of financial time series with chaotic, noisy, and non-linear dynamics has become a more promising area of research [12]. Machine learning techniques integrating several artificial intelligence systems seek to extract specific patterns learned from historical data to subsequently make predictions about new data [47]. Against this backdrop, the prediction of the chaotic stock prices of an emerging economy like India remains challenging [42, 50], but is intriguing over time.
Overall, the literature has used different machine learning techniques to predict financial time series. However, the application of multiple machine learning techniques to the prediction of stock prices in an emerging economy like India has seldom been explored before. Departing from previous research, we employ several machine learning techniques, namely Artificial Neural Networks (ANN), Long Short-Term Memory (LSTM), Decision Tree Regression (DT), Random Forest (RF), Support Vector Machine (SVM), Linear Regression (LR), Ridge Regression (RR), and K-Nearest Neighbors (KNN) Regression, to predict the movement of Indian stock prices over two decades, which makes the present research unique in its own way.
17.3 Data and Methods

17.3.1 Data
Twenty-two years of data on the daily closing stock prices of both the Sensex and the Nifty, ranging from 3 January 2000 to 11 October 2022, were collected from the websites of the Bombay Stock Exchange (BSE) and the National Stock Exchange (NSE), respectively. Both indices are expressed in Indian Rupees. We divided the full sample into training and testing samples; from the total sample, we considered 80% as the training and 20% as the testing sample.
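A minimal Python sketch of this chronological split is shown below; the file and column names are assumptions for illustration only.

```python
import pandas as pd

# Chronological 80/20 split of a daily closing-price series (names assumed).
prices = (pd.read_csv("nifty_close.csv", parse_dates=["Date"])
            .set_index("Date").sort_index()["Close"])
cut = int(len(prices) * 0.8)                         # first 80% of trading days
train, test = prices.iloc[:cut], prices.iloc[cut:]   # no shuffling: order matters
```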
17.3.2 Methods
The Artificial Neural Network (ANN) is a machine learning algorithm that takes its inspiration from the structure and function of biological neural networks [27]. It consists of interconnected artificial neurons used to process input data in order to generate an output. Each artificial neuron receives input from the previous layer of neurons and processes it using an activation function before passing the result to other neurons in the network [16]. The output of the model is produced by the output layer of neurons, which combines the intermediate outputs of the hidden layers of neurons.
The simplest ANN consists of one neuron that receives input from one or more input nodes and produces an output. The output is calculated based on the weighted sum of the inputs and a bias term, using an activation function. The weights and bias can be adjusted to optimize the performance of the neuron on a particular task. The output of a neuron can be calculated using the following equation:
$$y = f\left(b + \sum_{i=1}^{n} x_i w_i\right) \qquad (17.1)$$
where,
$y$: Output of the neuron.
$f$: Activation function (generally a sigmoid function or a rectified linear unit function).
$b$: Bias term.
$x_1$ to $x_n$: Inputs.
$w_1$ to $w_n$: Corresponding weights.
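As an illustration, Eq. (17.1) for a single neuron can be written directly in Python; the sigmoid activation and the example numbers below are arbitrary choices, not values from the chapter.

```python
import numpy as np

# One neuron, Eq. (17.1): y = f(b + sum_i x_i * w_i), with a sigmoid f.
def neuron(x, w, b):
    return 1.0 / (1.0 + np.exp(-(b + np.dot(x, w))))

y = neuron(np.array([0.5, -1.2, 3.0]), np.array([0.4, 0.1, -0.2]), b=0.05)
```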
ANN can be used for time series regression tasks, in which the goal is to predict the
value of a continuous variable at a future time based on past observations [17]. ANNs
are a good choice for time series regression because they are able to capture complex
and non-linear relationships between the variables and can adapt to changing patterns
in the data over time.
There are many different types of ANN architectures that can be used for time series regression tasks, including feedforward neural networks, convolutional neural networks, and recurrent neural networks. Each type of ANN has its own strengths and weaknesses, and the appropriate choice will depend on the characteristics of the specific time series regression task. ANNs are trained using a variant of stochastic gradient descent together with the back-propagation algorithm [46]. During training, the model iteratively adjusts its weights and biases to reduce the prediction error on the training data.
Long Short-Term Memory (LSTM) is a type of artificial neural network that is designed to remember information for long periods of time [23]. LSTMs have a different structure from traditional ANNs, in the sense that they contain "memory cells" that can retain information for long periods of time, as well as input, output, and forget gates (controlled by sigmoid activation functions) that control the flow of information into and out of the memory cells. The LSTM is particularly useful for tasks that involve sequential data, such as language translation or stock price prediction, because it is able to maintain a record of past events that can influence the present or future.
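The chapter does not report the exact LSTM architecture it trains; the following is a minimal Keras sketch of an LSTM price predictor under assumed settings (a 60-day lookback window, one LSTM layer of 50 units, MAE loss, and a synthetic placeholder series).

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Minimal LSTM sketch (architecture and hyperparameters are assumptions).
def make_windows(series, lookback=60):
    X = np.array([series[i - lookback:i] for i in range(lookback, len(series))])
    return X[..., np.newaxis], series[lookback:]   # shape (samples, timesteps, 1)

series = np.cumsum(np.random.randn(500))           # placeholder for a scaled price series
X, y = make_windows(series)
model = Sequential([LSTM(50, input_shape=(60, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=10, batch_size=32, verbose=0)
```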
A decision tree regressor is a type of model used for regression tasks that works by building a tree-like structure in which the internal nodes represent decision points and the leaf nodes represent the predicted values. The model makes predictions by starting at the root node and following a path down the tree based on the values of the input features. The predicted value is then the value at the leaf node that is reached. As a simple illustration, consider a decision tree that evaluates the smallest of three numbers (a code rendering of this tree follows below).
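The original figure of that tree is not reproduced here; an equivalent rendering in Python follows, where each if-test is an internal node and each return value is a leaf.

```python
# The decision tree for the smallest of three numbers, as nested comparisons:
# each condition is an internal node, each return value is a leaf.
def smallest_of_three(a, b, c):
    if a <= b:
        return a if a <= c else c
    return b if b <= c else c
```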
One way to represent the decision at each leaf node mathematically is with a simple equation:

$$y = \mathrm{mean}(observations) \qquad (17.2)$$

where $y$ is the predicted value for the input observation, $\mathrm{mean}$ is the function that calculates the average of the values of the observations that fall into the leaf node, and $observations$ are the values of the training observations in that leaf node.
Decision tree regressors can be used for time series regression tasks, in which the goal is to predict the value of a continuous variable at a future time based on past observations. However, decision tree regressors may not be the best choice for all time series regression tasks due to their inherent limitations. One potential issue with using decision tree regressors for time series regression is that they do not take into account the time component of the data. Decision tree regressors treat each input observation independently, regardless of when it occurred. This can be a problem in time series regression tasks because the value of the response variable may depend on the order of the observations. Another potential issue is that decision tree regressors are prone to overfitting, especially when the tree becomes deep and has many nodes. This can be a problem in time series regression tasks because the model may not generalize well to future data. Overall, the decision tree regressor may be a simple and easy-to-understand choice for time series regression tasks, but it may not always be the most accurate or robust option. The advantages of this model include: (a) it is simple to understand and interpret; (b) decision tree models do not require the data to be normally distributed or the relationships between variables to be linear; (c) it can handle high-dimensional data. The disadvantages include: (a) it is prone to overfitting; (b) it has a limited ability to model complex relationships; (c) it performs poorly on small datasets; (d) it performs poorly on imbalanced datasets.
A random forest (RF) is an ensemble machine learning method that uses multiple decision trees to make predictions. It works by training multiple decision trees on randomly selected subsets of the training data and then averaging the predictions made by each tree. Random forests are a popular machine learning method because they are able to improve the accuracy of the predictions made by individual decision trees by reducing overfitting and improving the ability to generalize to unseen data. The overall prediction of a random forest can be written as

$$y = \mathrm{average}(DT\ predictions) \qquad (17.3)$$

where
$y$: Overall prediction made by the random forest.
$\mathrm{average}$: Function that calculates the average of the predictions made by each of the individual decision trees in the forest.
$DT\ predictions$: Predictions made by the individual decision trees.
One potential issue with using random forests for time series regression is that they do not take into account the time component of the data. Like individual decision trees, random forests treat each input observation independently, regardless of when it occurred. To address this issue, some researchers have proposed methods for incorporating the time component into the random forest model. For example, one approach is to use lagged variables as input features, which can capture the dependencies between observations at different times (see the sketch below).
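A minimal sketch of this lagged-features approach with a random forest follows; the number of lags, the number of trees, and the synthetic series are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Predict today's close from the previous `lags` closes (all settings assumed).
def lagged_matrix(series, lags=5):
    X = np.array([series[i - lags:i] for i in range(lags, len(series))])
    return X, series[lags:]

series = np.cumsum(np.random.randn(300))        # placeholder for a price series
X, y = lagged_matrix(series)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
next_pred = rf.predict(series[-5:].reshape(1, -1))   # one-step-ahead forecast
```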
A support vector machine (SVM) is a type of model used for classification and
regression tasks. It works by finding the hyperplane in a high-dimensional space
that maximally separates the different classes or values of the response variable.
SVMs are a popular machine learning method because they are able to achieve good
generalization performance and are effective at handling high-dimensional data.
One way to represent the prediction made by an SVM mathematically is with the following equation:

$$y = \mathrm{sign}(w \cdot x + b) \qquad (17.4)$$

where
$w, b$: Model parameters that define the hyperplane.
$x$: Input data.
$\mathrm{sign}$: Sign function, which returns a positive value if the argument is positive and a negative value if the argument is negative.
$y$: The class or value associated with the positive or negative value.
SVMs are a good choice for time series regression because they are able to achieve good generalization performance and are effective at handling high-dimensional data. One potential issue with using SVMs for time series regression is that they do not take into account the time component of the data. To address this issue, lagged variables can be used as input features, which capture the dependencies between observations at different times. SVMs can use different kernel functions, which allows them to model different types of relationships between the variables.
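A minimal sketch of support vector regression on such lagged features is given below; the RBF kernel and the hyperparameters are assumptions, not the chapter's actual configuration.

```python
import numpy as np
from sklearn.svm import SVR

# SVR on lagged features (kernel choice and hyperparameters are assumptions).
series = np.cumsum(np.random.randn(300))        # placeholder for a price series
X = np.array([series[i - 5:i] for i in range(5, len(series))])
y = series[5:]
svr = SVR(kernel="rbf", C=1.0).fit(X, y)
```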
Linear regression (LR) is a type of model used for regression tasks that assumes a
linear relationship between the input features and the response variable. It works by
finding the line of best fit that minimizes the sum of the squared differences between
the predicted values and the true values. Linear regression is a simple and widely-
used method for regression tasks, but it is limited in its ability to model complex,
non-linear relationships between the variables.
$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n \qquad (17.5)$$

where
$y$: Predicted value of the response variable.
$b_0$: Intercept term.
$b_1, b_2, \ldots, b_n$: Coefficients for the input features.
$x_1, x_2, \ldots, x_n$: Input features.
The predicted value is thus a linear combination of the input features and the
coefficients.
In ridge regression (RR), the goal is to minimize the residual sum of squares (RSS) between the predicted output and the true output, subject to a penalty on the size of the coefficients. This constraint helps to prevent overfitting by penalizing models with large coefficients.
The mathematical equation for ridge regression is given by:

$$\text{minimize} \quad RSS + \alpha \sum \theta^2 \qquad (17.6)$$

where $\theta$ is the vector of model parameters (coefficients), $RSS$ is the residual sum of squares, and $\alpha$ is the regularization parameter that controls the strength of the penalty. The regularization term $\sum \theta^2$ is also known as the L2 penalty or the "shrinkage penalty", as it encourages the model parameters to take on smaller values. The
strength of the penalty is controlled by the hyperparameter α, which is chosen by the
user. A larger value of α results in a stronger penalty and a smaller value of α results
in a weaker penalty. Ridge regression is often used to improve the generalization
of linear models by reducing the variance of the estimates. It is particularly useful
when the number of features is large, as it helps to prevent overfitting by penalizing
models with large coefficients. Ridge regression can be used for time series regres-
sion by using lagged variables as features. To use ridge regression for time series
regression, we created a design matrix with lagged variables as the features and the
target variable as the output. Then, we fit a ridge regression model to this design
matrix to make predictions about the target variable at future time points.
A large value of α can result in over-regularization, which can lead to poor perfor-
mance. On the other hand, a small value of α can result in under-regularization, which
can lead to overfitting. Therefore, it is usually necessary to tune the value of α using
cross-validation to find the optimal value.
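A minimal sketch of this lagged-design-matrix approach with a cross-validated α is shown below; the candidate α grid and the use of scikit-learn's TimeSeriesSplit (which keeps the folds in temporal order) are assumptions, not the chapter's exact setup.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import TimeSeriesSplit

# Ridge on lagged features, with alpha tuned by time-ordered cross-validation.
series = np.cumsum(np.random.randn(300))        # placeholder for a price series
X = np.array([series[i - 5:i] for i in range(5, len(series))])
y = series[5:]
ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0],
                cv=TimeSeriesSplit(n_splits=5)).fit(X, y)
print(ridge.alpha_)                             # selected regularization strength
```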
K-nearest neighbors (KNN) is a type of model used for regression tasks that works
by finding the K data points in the training set that are most similar to the input
data point and averaging their target values to make a prediction. KNN is a simple
and easy-to-understand method for regression tasks, but it can be computationally
expensive, as it requires calculating the distances between the input data point and
all the data points in the training set.
One way to represent the prediction made by a KNN model mathematically is with the following equation:

$$y = \mathrm{average}(\text{target values of the } K \text{ nearest neighbors}) \qquad (17.7)$$

where
$y$: Predicted value.
$\mathrm{average}$: Function that calculates the average of the target values of the $K$ nearest neighbors.
$K$: Number of nearest neighbors to consider.
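A minimal sketch of KNN regression on the same kind of lagged design matrix is shown below; K = 5 is an assumption, as the chapter does not report the value used.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# KNN regression: the prediction is the average target of the K nearest neighbours.
series = np.cumsum(np.random.randn(300))        # placeholder for a price series
X = np.array([series[i - 5:i] for i in range(5, len(series))])
y = series[5:]
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
pred = knn.predict(series[-5:].reshape(1, -1))  # one-step-ahead forecast
```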
Lagged variables are used as input features, which can capture the dependencies between observations at different times. The above machine learning tools each have their advantages and disadvantages; therefore, it is better not to depend on a single model. In this study, we estimate all of the above-discussed machine learning tools for a better comparison and more reliable conclusions.
17.4 Results and Discussion

We split the market dataset into two parts: a 20% testing sample (the orange colored portion in Fig. 17.1) and an 80% training sample (the blue colored portion in Fig. 17.1), as follows.
The prices of both the Nifty (Fig. 17.1a) and the Sensex (Fig. 17.1b) decline in 2009, 2012–13, 2016, and 2020. On the other hand, we observe the greatest upturns in the Indian stock markets during 2022. We extensively surveyed national newspapers and published reports of the Indian regulatory authorities, namely the RBI and SEBI, to identify the dynamic socio-economic events associated with the upturns and downturns of the Indian stock prices. Dynamic economic situations such as the global financial crisis (GFC), the Eurozone sovereign debt crisis (ESC), the demonetization of Indian bank notes, and the Covid-19 global pandemic are attributed to the respective downturns, whereas several precautionary macroeconomic reforms5 in the Indian economy in the post-pandemic period helped the stock markets perform better in 2022. Moreover, we observe an adaptable environment in the Indian stock markets with respect to the dynamic socio-economic situations over the period. Hence, our result supports Bhuyan et al. [5] and Patra and Hiremath [36, 37].
The continuous exposure to global shocks and the subsequent financial and economic reforms in an emerging economy like India make the stock prices evolve over time. Further, we observe that the daily ups and downs of the stock prices are intertwined with each other, indicating the presence of uncertainty in the daily movements of stock prices. In this aspect, predicting the movement of stock prices in the post-pandemic world is essential to understand the return opportunities for investors in the Indian stock markets. We employ several sophisticated machine learning (ML) techniques, namely Artificial Neural Networks (ANN), Long Short-Term Memory (LSTM), Decision Tree Regression (DT), Random Forest (RF), Support Vector Machine (SVM), Linear Regression (LR), Ridge Regression (RR), and K-Nearest Neighbors (KNN) Regression, to predict the movements of both the Sensex and the Nifty, as follows.
5 The Government of India mainly focused on stabilizing the monetary policy and creating a sound interaction between monetary and fiscal policy to encourage financial stability in the post-pandemic world. For details refer to https://www.bis.org/publ/bppdf/bispap122_j.pdf, accessed on 25 September 2023 at 8:30 PM.
Fig. 17.1 Splitting of market dataset into training and testing samples. 6
Using ANN (Fig. 17.2) and LSTM (Fig. 17.3), we observe that the trends of both the actual and predicted stock prices are consistent with each other. In other words,
6 Note: a and b report the splitting of the Nifty and Sensex datasets, respectively. Our total sample period (3 January 2000–11 October 2022) in both markets is divided into two parts: the training sample (reported by the blue line, from 3 January 2000 to 11 October 2018) and the testing sample (reported by the orange line, from 12 October 2018 to 11 October 2022).
Fig. 17.2 Prediction of the Indian stock markets using ANN.7 Source Author’s own computation
there exists a minimal gap between the actual and predicted prices of the stock markets, indicating the greater extent of prediction accuracy obtained with the ANN and LSTM techniques. In particular, we find the most accurate predicted prices in the post-pandemic period for both markets, which reflects the suitability of both ANN and LSTM for predicting stock prices in a dynamic information environment. However, LSTM provides the highest degree of prediction accuracy in both the pre- and post-pandemic periods relative to ANN and the alternative ML techniques (Fig. 17.3), indicating the greater insensitivity of the LSTM model to dynamic socio-economic situations when predicting emerging stock prices. The results reveal that the LSTM method is better than all the other ML techniques over time. Our results are in line with
7 Note: This figure reports the prediction of the Nifty and Sensex indices, respectively, using the ANN technique. The red trend line in both figures represents the movement of predicted stock prices, whereas the green trend line represents the movement of actual stock prices. Here, due to better prediction accuracy, particularly in the post-pandemic period (i.e. after 2020), the actual and predicted trend lines overlap with each other.
Fig. 17.3 Prediction of the Indian stock markets with LSTM. 8 Source Author’s own computation
Pang et al. [34], who document that the LSTM with an embedded layer, known as ELSTM, provides more stabilized results than alternative ML techniques.
8 Note: This figure reports the prediction of the Nifty and Sensex markets, respectively, using the LSTM technique. The red and green trend lines in both figures represent the movement of the LSTM-predicted and actual stock prices, respectively. Here, due to the highest prediction accuracy, the actual (green) and predicted (red) trend lines coincide with each other over the period.
Further, the lowest mean absolute error (MAE),9 the lowest root mean square error (RMSE),10 and the highest R2 and adjusted R2 statistics11 observed for LSTM prove it the best fit among all the models (Fig. 17.10).
We employ the alternative ML techniques, namely DT (Fig. 17.4), RF (Fig. 17.5), SVM (Fig. 17.6), LR (Fig. 17.7), RR (Fig. 17.8), and KNN regression (Fig. 17.9), to check the robustness of the superiority of the LSTM model in predicting the movement of emerging stock prices, and observe a lower degree of prediction accuracy for all these models. In other words, although the trends of the predicted prices computed by the alternative ML models look similar to the trends of the actual prices, there exists a certain gap between the actual and the respective12 predicted prices. The gap between the actual and the predicted prices arises due to the presence of higher MAE and RMSE in the respective models during the computation of the predicted values (Fig. 17.10).
Using the DT approach, we observe a higher MAE (8.57), a higher RMSE (0.03), and a lower R2 (0.983) and adjusted R2 (0.987) for the Sensex than for its domestic counterpart, which indicates the model's comparatively weaker predictive performance for that market (Fig. 17.10). The lower degree of prediction accuracy is reflected in a certain gap between the actual and DT-predicted prices of the markets. In the Nifty, we find such a gap over the whole period; but in the Sensex, the gap has been reduced after 2022, signifying the suitability of the DT model in predicting the market movements in the post Russia-Ukraine war period (Fig. 17.4).
The trends of the RF-predicted (Fig. 17.5) and SVM-predicted prices (Fig. 17.6) also differ to some extent from the actual prices in both markets, indicating the lower degree of prediction accuracy of the RF and SVM models relative to the LSTM approach. The differences between the actual prices and the computed RF- and SVM-predicted prices are higher in the pre-Covid period in both markets, but then consistently shrink in the post-Covid period, signifying their increased degree of prediction accuracy in the post-pandemic situation. Between the two approaches, the lower MAE and RMSE and the higher R2 and adjusted R2 prove RF better than SVM in predicting the Indian stock prices, but both models lag behind LSTM, holding comparatively higher MAE and RMSE and lower R2 and adjusted R2 (Fig. 17.10).
9 MAE explains the average distance between the actual price and the predicted price. A value of MAE closer to zero indicates that the model provides accurate predicted prices, and that the trends of the actual and predicted prices coincide with each other.
10 RMSE is a measure of the average deviation between the predicted values from a model and the actual observed values. It calculates the square root of the average of the squared differences between predicted and actual values. Smaller RMSE values indicate better predictive performance, and vice versa.
11 R-squared, often denoted as R2, is a statistical measure used to assess the goodness of fit of a regression model. R2 provides an indication of how well the model fits the data. Adjusted R-squared (Adj R2) adjusts the R2 value to account for the number of predictors in the model. The closer the values of R2 and Adj R2 are to 1, the better the model fits the data.
12 Here, the level of the predicted prices computed by the different ML techniques varies from one technique to another. Among all the techniques, the LSTM model provides the most accurate predicted prices irrespective of the time phase, coinciding with the trends of the actual prices over time (see Fig. 17.3).
Fig. 17.4 Prediction of the Indian stock markets using DT. 13 Source Author’s own computation
Using the LR approach (Fig. 17.7), we observe a higher degree of distance between the actual and predicted prices in the post-Covid period, which indicates the inappropriateness of the LR model for predicting the Indian stock prices in dynamic socio-economic situations. Further, the highest MAE (accounted as 9.74) proves the lower degree of predictive performance of the LR approach for the Sensex relative to its domestic counterpart (Fig. 17.10).
The prices predicted through the RR (Fig. 17.8) and KNN approaches (Fig. 17.9) also hold a certain gap from the actual prices. Such gaps are higher in the pre-Covid period and then consistently shrink after 2022 (i.e. in the post Russia-Ukraine war period), indicating the presence of a certain time effect on the level of prediction accuracy of
13 Note: This figure reports the prediction of the Nifty and Sensex markets, respectively, using the DT model. The red and green trend lines in both figures represent the movement of the DT-predicted and actual stock prices, respectively. Here, the actual (green) and DT-predicted (red) trend lines are similar, but there exists a certain gap between them, indicating a lower degree of prediction accuracy than the LSTM model.
Fig. 17.5 Prediction of the Indian stock markets using RF. 14 Source Author’s own computation
the RR and KNN approaches. We find the highest MAE of KNN in both markets, indicating its lower degree of prediction accuracy relative to the RR approach (Fig. 17.10). Overall, we find that the level of prediction accuracy of DT, RF, SVM, LR, RR, and KNN varies drastically from one point in time to another, indicating the time-dependence characteristics of these models. The greater sensitivity of these ML techniques to time proves them comparatively inappropriate for predicting the Indian stock prices during dynamic socio-economic situations. Similarly, the increased level of prediction accuracy of ANN in the post-Covid period signifies the best fit of that model only in the post-pandemic situation. But the LSTM approach, unlike the other models, provides the most accurate predicted prices over the
14 Note: This figure reports the prediction of the Nifty and Sensex markets, respectively, using the RF approach. The red and green trend lines in both figures represent the movement of the RF-predicted and actual stock prices, respectively. Here, the actual (green) and RF-predicted (red) trend lines are similar, but there exists a certain gap between them, indicating a comparatively lower degree of prediction accuracy than the LSTM model.
Fig. 17.6 Prediction of the Indian stock markets using SVM. 15 Source Author’s own computation
15 Note: This figure reports the prediction of the Nifty and Sensex markets, respectively, using the SVM approach. The red and green trend lines in both figures represent the movement of the SVM-predicted and actual stock prices, respectively. Here, the actual (green) and SVM-predicted (red) trend lines are similar, but there exists a certain gap between them, indicating a comparatively lower degree of prediction accuracy than the LSTM model, particularly in the pre-pandemic period.
Fig. 17.7 Prediction of the Indian stock markets using LR. 16 Source Author’s own computation
other hand, the lowest R2 (0.922 for the Nifty and 0.929 for the Sensex) and adjusted R2 (0.917 for the Nifty and 0.927 for the Sensex) in the LR model prove it the least fit to predict the Indian stock prices (Table 17.1). In other words, we observe the lowest degree of prediction accuracy in the LR approach, indicating its inappropriateness for predicting the Indian stock market over time. Moreover, both the actual and predicted trends of the Indian stock prices remain time-varying, indicating the market's adaptability to dynamic economic situations. In a similar vein, Das and Patra [13, 14] observed variation in Indian banking performance immediately after the global financial crisis.
We report the performance parameters of all the ML techniques in Fig. 17.10 and observe the highest MAE in the SVM (6.05) and LR (9.74) models for the Nifty
16 Note: This figure reports the prediction of the Nifty and Sensex markets, respectively, using the LR approach. The red and green trend lines in both figures represent the movement of the LR-predicted and actual stock prices, respectively.
Fig. 17.8 Prediction of the Indian stock markets using RR. 17 Source Author’s own computation
and Sensex, respectively. The highest MAE, creating the greatest distance between the actual and predicted prices, makes the respective models inappropriate for predicting those markets. On the other hand, the LSTM model, holding the lowest MAE (0.06), has proved to be the most suitable approach for predicting both Indian stock markets (Fig. 17.10).
17 Note: This figure reports the prediction of the Nifty and Sensex markets, respectively, using the RR approach. The red and green trend lines in both figures represent the movement of the RR-predicted and actual stock prices, respectively. Here, the actual (green) and RR-predicted (red) trend lines are similar, but there exists a certain gap between them, indicating a comparatively lower degree of prediction accuracy than the LSTM model.
Fig. 17.9 Prediction of the Indian stock markets using KNN. 18 Source Author’s own computation
18 Note: This figure reports the prediction of the Nifty and Sensex markets, respectively, using the KNN approach. The red and green trend lines in both figures represent the movement of the KNN-predicted and actual stock prices, respectively. Here, the actual (green) and KNN-predicted (red) trend lines are similar, but there exists a certain gap between them, indicating a comparatively lower degree of prediction accuracy than the LSTM model.
Fig. 17.10 Comparison of performance parameters in predicting the Indian stock markets
Table 17.1 Comparison of the alternative machine learning techniques in predicting the Indian stock markets

ML technique    R2                Adj R2            RMSE              MAE
                Nifty    Sensex   Nifty    Sensex   Nifty    Sensex   Nifty    Sensex
ANN             0.980    0.981    0.980    0.980    0.004    0.010    0.164    1.065
LSTM            0.989    0.983    0.989    0.987    0.002    0.002    0.067    0.067
DT              0.975    0.964    0.973    0.973    0.025    0.031    4.576    8.576
RF              0.978    0.972    0.979    0.974    0.018    0.028    1.625    8.290
SVM             0.925    0.936    0.918    0.928    0.084    0.029    6.053    8.281
LR              0.922    0.929    0.917    0.927    0.096    0.038    4.744    9.744
RR              0.983    0.974    0.975    0.975    0.065    0.025    3.877    6.416
KNN             0.964    0.968    0.964    0.964    0.134    0.031    5.968    9.741
Note: R-squared (R2 ) assesses the proportion of variance in the dependent variable explained by the
model. Adjusted R-squared (Adj R2 ) adjusts R2 for the number of predictors to mitigate the risk of
overfitting. Root Mean Squared Error (RMSE) measures the average deviation between predicted
and actual values in the respective models. Mean Absolute Error (MAE) measures the average
absolute deviation between predicted and actual values in the respective models. These metrics are
commonly used to evaluate the performance of predictive models and to compare different models.
Each of them provides valuable insights into different aspects of a model’s performance.
Source Author’s own computation.
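For reference, the four metrics of Table 17.1 can be computed as sketched below; the adjusted R2 formula uses the sample size n and the number of predictors p, and the helper name is our own, not the chapter's code.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Compute R2, adjusted R2, RMSE and MAE for one model's test-set predictions.
def report(y_true, y_pred, p=1):
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)   # penalizes extra predictors
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    return r2, adj_r2, rmse, mae
```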
17.5 Conclusion
In the modern trading platform, technology plays an important role in the financial market. In a similar vein, Bhuyan et al. [7] documented the beneficial impact of the technological shift on the productivity of Indian financial institutions such as banks. Therefore, the prediction of the trends of financial sectors like the stock markets using sophisticated machine learning (ML) techniques has become a pressing issue in the investors' world. Several ML models produce market predictions, but the level of prediction accuracy varies from one method to another. The study considers data on the closing stock prices of both the Nifty and the Sensex from 3 January 2000 to 11 October 2022, and splits the full sample into a 20% testing and an 80% training sample. We observe that, among all the ML techniques, Long Short-Term Memory (LSTM), with the greatest extent of prediction accuracy and the lowest errors, has proved to be the most suitable model to predict the movements of the Indian stock prices over time. Moreover, the LSTM approach, unlike the other ML techniques, holds certain time-independence characteristics in predicting the market movements, which makes its level of prediction accuracy stable over time. On the other hand, the support vector machine (SVM) and linear regression (LR) approaches, with the lowest degree of prediction accuracy and the highest errors, have proved to be inappropriate models to predict the movements of the Nifty and the Sensex, respectively. These strategies may be sample specific and thus could benefit from more back-testing
on other sample markets and time periods, which would further establish the robustness of our method. Further, the significance of climate change for India has been elaborated in detail in [8]. The prediction of high-frequency time series, such as climate variables, using machine-learning techniques is therefore a promising direction for future research.
We discuss the strengths and weaknesses of several ML techniques and provide important insights into applying advanced technologies to stock market prediction. We contribute to the emerging literature on empirical asset pricing in the Indian stock market by building and analyzing a comprehensive set of market prediction factors using several machine learning algorithms. Our prediction approach offers investors a potentially beneficial alternative for identifying return opportunities and achieving diversification benefits by mitigating risk while investing in the Indian stock markets.
References
1. Adya, M., Collopy, F.: How effective are neural networks at forecasting and prediction? A review and evaluation. J. Forecast. 17(1), 481–495 (1998)
2. Ballings, M., den Poel, D.V., Hespeels, N., Gryp, R.: Evaluating multiple classifiers for stock
price direction prediction. Expert Syst. Appl. 42(20), 7046–7056 (2015)
3. Barak, S., Arjmand, A., Ortobelli, S.: Fusion of multiple diverse predictors in stock market.
Inf. Fusion 36(1), 90–102 (2017)
4. Bezerra, P.C.S., Albuquerque, P.H.M.: Volatility forecasting via SVR—GARCH with mixture
of Gaussian kernels. CMS 14(2), 179–196 (2017)
5. Bhuyan, B., Patra, S., Bhuian, R.K.: Market adaptability and evolving predictability of stock
returns: an evidence from India. Asia-Pacific Finan. Mark. 27, 605–619 (2020)
6. Bhuyan, B., Patra, S., Bhuian, R.K.: Do LBMA gold price follow random-walk? Gold Bulletin
54(2), 151–159 (2021)
7. Bhuyan, B., Patra, S., Bhuian, R.K.: Measurement and determinants of total factor productivity:
evidence from Indian banking industry. Int. J. Prod. Perform. Manag. 71(7), 2970–2990 (2022)
8. Bhuyan, B., Mohanty, R.K., Patra, S.: Impact of climate change on food security in India: an
evidence from autoregressive distributed lag model. Environ. Dev. Sustain. 1–21 (2023)
9. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
10. Cavalcante, R.C., Brasileiro, R.C., Souza, V.L., Nobrega, J.P., Oliveira, A.L.: Computational
intelligence and financial markets: a survey and future directions. Expert Syst. Appl. 55(1),
194–211 (2016)
11. Chen, Y.-S., Cheng, C.-H., Tsai, W.-L.: Modeling fitting-function-based fuzzy time series
patterns for evolving stock index forecasting. Appl. Intell. 41(2), 327–347 (2014)
12. Chen, H., Xiao, K., Sun, J., Wu, S.: A double-layer neural network framework for high-
frequency forecasting. ACM Trans. Manage. Inf. Syst. (TMIS) 7(4), 1–17 (2017)
13. Das, M.K., Patra, S.: Productivity and efficiency of public sector banks in India after the global
financial crisis. IUP J. Bank Manag. 15(2) (2016)
14. Das, M.K., Patra, S.: Productivity and efficiency of private sector banks after global financial
crisis: evidence from India. Asian J. Res. Bank. Financ. 6(5), 1–14 (2016)
15. Fama, E.F.: Efficient capital markets: II. J. Financ. 46(5), 1575–1617 (1991)
16. Gupta, N.: Artificial neural network. Netw. Complex Syst. 3(1), 24–28 (2013)
17. Guresen, E., Kayakutlu, G., Daim, T.U.: Using artificial neural network models in stock market
index prediction. Expert Syst. Appl. 38(8), 10389–10397 (2011)
18. Göçken, M., Özçalıcı, M., Boru, A., Dosdogru, A.T.: Integrating metaheuristics and artificial
neural networks for improved stock price prediction. Expert Syst. Appl. 44(1), 320–331 (2016)
19. Henrique, B.M., Sobreiro, V.A., Kimura, H.: Building direct citation networks. Scientometrics
115(2), 817–832 (2018)
20. Henrique, B.M., Sobreiro, V.A., Kimura, H.: Literature review: machine learning techniques
applied to financial market prediction. Expert Syst. Appl. 124, 226–251 (2019)
21. Hiremath, G.S., Kattuman, P.: Foreign portfolio flows and emerging stock market: Is the
midnight bell ringing in India? Res. Int. Bus. Financ. 42, 544–558 (2017)
22. Hiremath, G.S., Narayan, S.: Testing the adaptive market hypothesis and its determinants for
the Indian stock markets. Financ. Res. Lett. 19, 173–180 (2016)
23. Van Houdt, G., Mosquera, C., Nápoles, G.: A review on the long short-term memory model.
Artif. Intell. Rev. 53, 5929–5955 (2020)
24. Huang, W., Nakamori, Y., Wang, S.-Y.: Forecasting stock market movement direction with
support vector machine. Comput. Oper. Res. 32(10), 2513–2522 (2005)
25. Kara, Y., Boyacioglu, M.A., Baykan, Ö.K.: Predicting direction of stock price index movement
using artificial neural networks and support vector machines: the sample of the Istanbul Stock
Exchange. Expert Syst. Appl. 38(5), 5311–5319 (2011)
26. Krauss, C., Do, X.A., Huck, N.: Deep neural networks, gradient-boosted trees, random forests:
statistical arbitrage on the S&P 500. Eur. J. Oper. Res. 259(2), 689–702 (2017)
27. Krogh, A.: What are artificial neural networks? Nat. Biotechnol. 26(2), 195–197 (2008)
28. Kumar, D., Meghwani, S.S., Thakur, M.: Proximal support vector machine based hybrid
prediction models for trend forecasting in financial markets. J. Comput. Sci. 17(1), 1–13 (2016)
29. Kumar, M., Thenmozhi, M.: Forecasting stock index returns using ARIMA-SVM, ARIMA-
ANN, and ARIMA-random forest hybrid models. Int. J. Bank. Account. Financ. 5(3), 284–308
(2014)
30. Laboissiere, L.A., Fernandes, R.A., Lage, G.G.: Maximum and minimum stock price fore-
casting of Brazilian power distribution companies based on artificial neural networks. Appl.
Soft Comput. 35(1), 66–74 (2015)
31. Lahmiri, S.: Improving forecasting accuracy of the S&P500 intra-day price direction using
both wavelet low and high frequency coefficients. Fluct. Noise Lett. 13(01), 1450008 (2014)
32. Lo, A.W.: Reconciling efficient markets with behavioral finance: the adaptive markets
hypothesis. J. Invest. Consult. 7(2), 21–44 (2005)
33. Pai, P.-F., Lin, C.-S.: A hybrid ARIMA and support vector machines model in stock price
forecasting. Omega 33(6), 497–505 (2005)
34. Pang, X., Zhou, Y., Wang, P., Lin, W., Chang, V.: An innovative neural network approach for
stock market prediction. J. Supercomput. 76, 2098–2118 (2020)
35. Patel, J., Shah, S., Thakkar, P., Kotecha, K.: Predicting stock and stock price index movement
using trend deterministic data preparation and machine learning techniques. Expert Syst. Appl.
42(1), 259–268 (2015)
36. Patra, S., Hiremath, G.S.: Are the stock markets adaptive? Evidence from approximate entropy
approach. ASBBS Proc. 26, 408–408 (2019)
37. Patra, S., Hiremath, G.S.: An entropy approach to measure the dynamic stock market efficiency.
J. Quant. Econ. 20(2), 337–377 (2022)
38. Patra, S.: Informational efficiency and adaptive stock markets (Doctoral dissertation, IIT
Kharagpur) (2020)
39. Patra, S., Hiremath, G.S.: Is there a time-varying nexus between stock market liquidity and
informational efficiency?–A cross-regional evidence. Stud. Econ. Financ. (2024)
40. Pisner, D.A., Schnyer, D.M.: Support vector machine. In Machine learning (pp. 101–121).
Academic Press, New York (2020)
41. Schonlau, M., Zou, R.Y.: The random forest algorithm for statistical learning. Stand. Genomic
Sci. 20(1), 3–29 (2020)
42. Tay, F.E., Cao, L.: Application of support vector machines in financial time series forecasting.
Omega 29(4), 309–317 (2001)
43. Tsaih, R., Hsu, Y., Lai, C.C.: Forecasting S&P 500 stock index futures with a hybrid AI system.
Decis. Support Syst. 23(2), 161–174 (1998)
44. Wang, J.-J., Wang, J.-Z., Zhang, Z.-G., Guo, S.-P.: Stock index forecasting based on a hybrid
model. Omega 40(6), 758–766 (2012)
45. Weng, B., Ahmed, M.A., Megahed, F.M.: Stock market one-day ahead movement prediction
using disparate data sources. Expert Syst. Appl. 79(1), 153–163 (2017)
46. Whittington, J.C., Bogacz, R.: Theories of error back-propagation in the brain. Trends Cogn.
Sci. 23(3), 235–250 (2019)
47. Xiao, Y., Xiao, J., Lu, F., Wang, S.: Ensemble ANNs-PSO-GA approach for day-ahead stock
e-exchange prices forecasting. Int. J. Comput. Intell. Syst. 6(1), 96–114 (2013)
48. Yan, D., Zhou, Q., Wang, J., Zhang, N.: Bayesian regularisation neural network based on
artificial intelligence optimisation. Int. J. Prod. Res. 55(8), 2266–2287 (2017)
49. Yang, Y., Yang, M., Shen, C., Wang, F., Yuan, J., Li, J., Liu, Y.: Evaluating the accuracy of
different respiratory specimens in the laboratory diagnosis and monitoring the viral shedding
of 2019-nCoV infections. MedRxiv 78(3), 241 (2020)
50. Zhang, N., Lin, A., Shang, P.: Multidimensional k-nearest neighbor model based on EEMD
for financial time series forecasting. Physica A 477(1), 161–173 (2017)
51. Zhong, X., Enke, D.: Forecasting daily stock market return using dimensionality reduction.
Expert Syst. Appl. 67(1), 126–139 (2017)
Chapter 18
Realized Stock-Market Volatility: Do
Industry Returns Have Predictive Value?
R. Demirer
Department of Economics and Finance, Southern Illinois University Edwardsville, Edwardsville,
IL, USA
e-mail: [email protected]
R. Gupta
Department of Economics, University of Pretoria, Pretoria 0002, South Africa
e-mail: [email protected]
C. Pierdzioch (B)
Department of Economics, Helmut Schmidt University, Holstenhofweg 85, P.O.B. 700822, 22008
Hamburg, Germany
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
L. A. Maglaras et al. (eds.), Machine Learning Approaches in Financial Analytics,
Intelligent Systems Reference Library 254,
https://doi.org/10.1007/978-3-031-61037-0_18
18.1 Introduction
Predicting volatility is a key component of option pricing, hedging, and portfolio opti-
mization applications. Naturally, there exists a large strand of literature that offers a
wide array of univariate and multivariate models to predict and model stock-market volatility (see, e.g., [3, 11, 12, 27, 30, 31, 36] for detailed discussions of research in this area). Despite the multitude of studies in this literature, which use a wide range of predictors including macroeconomic and financial variables, the literature has not yet examined the predictive power of industry-level information for aggregate stock-market volatility. This study adds to this line of research by
investigating the role of lagged industry returns from across the entire economy in
predicting aggregate stock-market volatility. Indeed, we show that incorporating the
information in lagged industry returns can help improve out-of-sample predictions
of aggregate stock-market volatility, rendering significant economic benefits partic-
ularly as the degree of risk aversion increases.
In a well-cited study, Hong et al. [21] present a theoretical framework for the predictive power of industry returns for stock market returns. According to the
so-called gradual diffusion of information hypothesis, the information contained in
industry returns diffuses gradually across markets as a result of the interaction of
boundedly rational investors with access to private information at different points in
time. In this setting, public information gets partially reflected in asset prices such
that certain types of investors, such as those who specialize in trading the broad
market index, experience a lag in receiving industry level information that is already
accessible to investors who specialize in particular industries. This, in turn, forms
the basis for return predictability at the aggregate market level as industry level
dynamics contain predictive information regarding the economic fundamentals that
lead the aggregate stock market. Although later studies including [7, 22, 33, 35,
38] present conflicting evidence regarding the predictive content of industry returns,
interestingly, the literature has not yet extended the analysis to stock-market volatility
predictions. To the best of our knowledge, ours is the first study to examine the predictive power of lagged industry returns for aggregate stock-market volatility.
In our empirical analysis, we use a machine-learning technique known as random
forests [4] to predict realized (good and bad) stock-market volatility. Random forests
have been used in recent applications to study the predictive value of industry returns
for stock market returns [7] and the realized volatility of intraday Bitcoin returns [5].1
In our case, instead of relying on conditional volatility models from the generalized
autoregressive conditional heteroskedasticity (GARCH)-family, we follow [1] and
study monthly realized volatility (RV) as measured by the sum of squared daily log-
returns over a month. The use of realized volatility provides an observable measure
of the latent volatility process that is model-free, unlike conditional estimates of the same.
1For other recent applications of machine-learning techniques to modeling and predicting the
volatility of financial time series, see [24, 28], among others.
Starting at the top level of a regression tree, the algorithm iterates over the predictors, $s$, and the corresponding splitting points, $p$, that can be formed using the data on a predictor. For every combination of a predictor and a splitting point, the algorithm computes two half-planes, $R_1(s,p) = \{x_s \mid x_s \le p\}$ and $R_2(s,p) = \{x_s \mid x_s > p\}$. The search for an optimal combination of a predictor and a splitting point minimizes the standard squared-error loss criterion:
$$\min_{s,\,p}\;\Biggl\{\; \min_{\overline{RV}_1}\sum_{x_s \in R_1(s,p)} \bigl(RV_i - \overline{RV}_1\bigr)^2 \;+\; \min_{\overline{RV}_2}\sum_{x_s \in R_2(s,p)} \bigl(RV_i - \overline{RV}_2\bigr)^2 \Biggr\}, \qquad (18.1)$$
where the index $i$ identifies those data on realized volatility that belong to a half-plane, and $\overline{RV}_k = \operatorname{mean}\{RV_i \mid x_s \in R_k(s,p)\}$, $k = 1, 2$, denotes the half-plane-specific mean of realized volatility. The outer minimization searches over all combinations of $s$ and $p$. Given $s$ and $p$, the inner minimization minimizes the half-plane-specific squared-error loss by an optimal choice of the half-plane-specific means of realized volatility. The solution of the minimization problem given in Eq. (18.1) yields the top-level optimal splitting predictor, the top-level optimal splitting point, and the two region-specific means of realized volatility. Accordingly, the solution yields a first simple regression tree that has two terminal nodes.
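To illustrate the mechanics of Eq. (18.1), a minimal R sketch of the top-level split search is given below; the function and variable names are hypothetical, and the inner minimization is solved, as in the text, by the half-plane-specific means.

```r
# Sketch of the search over predictors s and splitting points p in Eq. (18.1).
# X is a matrix of predictors; rv is the vector of realized volatilities.
best_split <- function(X, rv) {
  best <- list(loss = Inf, s = NA, p = NA)
  for (s in seq_len(ncol(X))) {
    for (p in unique(X[, s])) {
      left  <- rv[X[, s] <= p]   # observations in R1(s, p)
      right <- rv[X[, s] >  p]   # observations in R2(s, p)
      if (length(left) == 0 || length(right) == 0) next
      # Inner minimization: half-plane means minimize the squared-error loss.
      loss <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
      if (loss < best$loss) best <- list(loss = loss, s = s, p = p)
    }
  }
  best  # top-level optimal splitting predictor and splitting point
}
```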
At the next stage, the minimization problem in Eq. (18.1) is solved separately for the two optimal top-level half-planes, $R_1(s,p)$ and $R_2(s,p)$, in order to grow a larger regression tree. The new solution yields up to two second-level optimal splitting
predictors and optimal splitting points, and four second-level region-specific means
of realized volatility. Upon repeating this search-and-split algorithm multiple times,
we are able to grow an increasingly complex regression tree. Finally, the search-
and-split algorithm stops when a regression tree has a preset maximum number of
terminal nodes or every terminal node has a minimum number of observations. We use
a cross-validation approach to identify the optimal minimum number of observations
per terminal node (see Sect. 18.3.1 for further details).
When the search-partition algorithm stops, the regression tree sends the predictors from its top level to the various leaves along the various optimal partitioning points (nodes) and branches such that, for a regression tree made up of $L$ regions, the region-specific means can be used to predict realized volatility as follows ($\mathbf{1}$ denotes the indicator function):

$$T\bigl(x_i, \{R_l\}_1^L\bigr) \;=\; \sum_{l=1}^{L} \overline{RV}_l\, \mathbf{1}(x_i \in R_l). \qquad (18.2)$$
While the search-and-split algorithm can be used in principle to compute finer and
finer granular predictions of realized volatility, the resulting growing complexity of
the hierarchical structure of a regression tree gives rise to an overfitting and data-
sensitivity problem, which, in turn, deteriorates its performance. A random forest
solves this problem as follows. First, a large number of bootstrap samples (sampling
with replacement) is obtained from the data. Second, to each bootstrap sample, a
random regression tree is fitted. A random regression tree differs from a standard
regression tree in that the former uses for every splitting step only a random subset
of the predictors, which mitigates the effect of influential predictors on tree building.
Growing a large number of random trees decorrelates the predictions from individual
trees, and averaging the decorrelated predictions obtained from the individual random
regression trees stabilizes the predictions of realized volatility.
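As a hedged illustration of this bootstrap-and-average logic, the following sketch uses the widely available 'randomForest' package (the authors' own estimations, described below, rely on the 'grf' package); the data and the hyperparameter values are toy placeholders.

```r
# Illustrative only: bagged, decorrelated regression trees via randomForest.
library(randomForest)

set.seed(1)
X  <- matrix(rnorm(500 * 10), ncol = 10)   # toy predictor matrix
rv <- X[, 1]^2 + rnorm(500, sd = 0.1)      # toy realized-volatility target

fit <- randomForest(x = X, y = rv,
                    ntree = 2000,          # trees grown on bootstrap samples
                    mtry  = 3)             # random predictor subset per split

pred <- predict(fit, X)                    # average of the trees' predictions
```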
18.2.2 Data
We use monthly excess returns for 49 value-weighted industry portfolios for the
period January 1946 to December 2019, obtained from Ken French’s online data
library.2 Following the convention, we exclude “others” and end up with 48 indus-
tries defined based on the Standard Industrial Classification (SIC) system. Sepa-
rately, daily and monthly stock market returns are collected as the returns of a value-
weighted market portfolio from the Center for Research in Security Prices (CRSP).
Daily market returns are used to compute the realized market volatility estimates ($RV$) for each month from the daily log-returns ($r_i$) as follows:

$$RV_t \;=\; \sum_{i=1}^{N} r_i^2, \qquad (18.3)$$
where $N$ denotes the number of daily observations available for the month. In addition to realized volatility, we examine "good" and "bad" realized volatility. The categorization of RV into its good and bad components is an important issue, as [16] stresses that financial market participants care not only about the level of volatility but also about its nature, with all traders making the distinction between good and bad volatilities. The "bad" and "good" components of realized volatility are formulated as the downside and upside realized semi-variances ($RV^B$ and $RV^G$), respectively, computed from negative and positive returns (see [2]) as follows:

$$RV_t^B \;=\; \sum_{i=1}^{N} r_i^2\, I_{[r_i < 0]}, \qquad (18.4)$$

$$RV_t^G \;=\; \sum_{i=1}^{N} r_i^2\, I_{[r_i > 0]}. \qquad (18.5)$$
2 Available at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.
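Eqs. (18.3)-(18.5) translate directly into code. A minimal R sketch for a single month follows; the daily log-returns are illustrative values.

```r
# One month of daily log-returns (toy values).
r <- c(0.004, -0.012, 0.007, -0.003, 0.010)

rv      <- sum(r^2)            # RV_t: realized volatility, Eq. (18.3)
rv_bad  <- sum(r^2 * (r < 0))  # RV_t^B: downside semi-variance, Eq. (18.4)
rv_good <- sum(r^2 * (r > 0))  # RV_t^G: upside semi-variance, Eq. (18.5)

# Sanity check: the two components add up to total realized volatility.
all.equal(rv, rv_bad + rv_good)
```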
The heterogeneous autoregressive model of realized volatility (HAR-RV) of Corsi [9] builds on the heterogeneous market hypothesis of [25]. The heterogeneous market hypothesis stipulates that the stock market is populated by different types of traders who differ with respect to their sensitivity to information flows at different time horizons. In this setting, market participants with short- versus long-term investment horizons respond to information flows heterogeneously at different time horizons.
Accordingly, the key idea underlying the HAR-RV model is to use realized volatilities from different time resolutions to model the dynamics of realized volatility. When studying daily realized volatility, it is common practice among researchers to consider daily, weekly, and monthly realized volatilities as predictors of subsequent realized volatility. In our case, because we study monthly data, in line with the strand of the literature that deals with the lead-lag relationship between industry and aggregate level returns, we model the month-$h$-ahead realized volatility, $RV_{t+h}$, using the current realized volatility, $RV_t$, the quarterly realized volatility, $RV_{t,q}$, computed as the average realized volatility from month $t-3$ to month $t-1$, and the yearly realized volatility, $RV_{t,y}$, computed as the average realized volatility from month $t-12$ to month $t-1$. We compute these quarterly and yearly average realized volatilities for the standard measure of realized volatility and for good and bad realized volatility.
Figure 18.1 presents the time-series plots of the computed realized volatility, $RV_t$, along with its low-frequency components, $RV_{t,q}$ and $RV_{t,y}$, in Panel A, and the corresponding counterparts for bad and good realized volatility in Panels B and C. We observe notable spikes in the realized volatility estimates around the stock market crash of 1987 and later during the 2008 global financial crisis period. Comparing Panels B and C, we observe that bad realized volatility was the dominant factor in the case of the 1987 stock market crash, while the 2008 global financial crisis period was equally plagued by both the good and bad components of realized volatility.
18.3.1 Calibration
Because the predictive value of industry returns may have changed over time, we use rolling-estimation windows of length 120, 180, 240, and 360 months to estimate both the baseline HAR-RV model (the model that excludes lagged industry returns) and the HAR-RV model extended to include lagged industry returns. We model realized volatility one month, three months, and one year ahead (that is, we set $h = 1, 3, 12$) by estimating random forests in the statistical computing program R [29] using the add-on package "grf" [37]. While shifting the rolling-estimation windows across the data set, we optimize, by means of cross-validation, the number of predictors randomly selected for splitting, the minimum node size of a tree, and the parameter that governs the maximum imbalance of a node. We optimize these
parameters separately for the baseline HAR-RV model and the extended HAR-RV
model that features lagged industry returns.3 We use 2000 random regression trees
to grow a random forest.
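A hedged sketch of this estimation-and-tuning step with the cited "grf" package is given below; the data are placeholders, and tune.parameters = "all" asks grf to cross-validate, among other parameters, mtry (the number of predictors sampled per split), min.node.size, and alpha (the split-imbalance parameter).

```r
# Fitting a random forest with cross-validated tuning parameters via grf.
library(grf)

set.seed(1)
X <- matrix(rnorm(300 * 5), ncol = 5)    # toy predictor matrix
y <- X[, 1] + rnorm(300, sd = 0.5)       # toy target

forest <- regression_forest(X, y,
                            num.trees = 2000,
                            tune.parameters = "all")  # cross-validated tuning
forest  # the fitted object stores the selected parameter values
```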
In order to set the stage for our empirical analysis, it is useful to go back for the moment to the classic HAR-RV model. In the context of our analysis, the classic HAR-RV model is formulated as $RV_{t+h} = \beta_0 + \beta_1 RV_t + \beta_2 RV_{t,q} + \beta_3 RV_{t,y} + \varepsilon_t$, where $\beta_j$, $j = 0, 1, 2, 3$, are the coefficients to be estimated by means of the ordinary-least-squares (OLS) technique, $\varepsilon_t$ is an error term, and $RV_{t+h}$ denotes the realization of realized volatility in month $t+h$. The classic HAR-RV model extended to include lagged industry returns is then formulated as $RV_{t+h} = \beta_0 + \beta_1 RV_t + \beta_2 RV_{t,q} + \beta_3 RV_{t,y} + \sum_{j=1}^{48} \beta_{j+3}\, r_{t,j} + \varepsilon_t$. The corresponding random-forest models can, thus, be expressed as $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})$ when we exclude lagged industry returns, and as $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$ when we include lagged industry returns in the array of predictors. At this point, it is worth noting that our framework ensures that (i) random forests do not necessarily invoke a linear structure as does the OLS technique, and (ii) random forests go beyond the OLS technique in that they allow the predictors (lagged industry returns in our case) to interact in an arbitrary, data-driven way.4
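The two model classes can be sketched in R as follows; the monthly series, the horizon, and the predictor construction mirror the definitions above, but the data are simulated placeholders.

```r
# Toy monthly realized-volatility series and its HAR-RV predictors.
set.seed(1)
n  <- 240
rv <- abs(rnorm(n))
rv_q <- sapply(seq_len(n), function(t)               # average over t-3..t-1
  if (t > 3)  mean(rv[(t - 3):(t - 1)])  else NA)
rv_y <- sapply(seq_len(n), function(t)               # average over t-12..t-1
  if (t > 12) mean(rv[(t - 12):(t - 1)]) else NA)

h <- 1                                               # investment horizon
d <- na.omit(data.frame(rv_lead = c(rv[-(1:h)], rep(NA, h)),
                        rv = rv, rv_q = rv_q, rv_y = rv_y))

# Classic HAR-RV model estimated by OLS.
har <- lm(rv_lead ~ rv + rv_q + rv_y, data = d)

# Random-forest analogue; lagged industry returns would simply enter as
# additional columns of the predictor matrix X.
library(grf)
X  <- as.matrix(d[, c("rv", "rv_q", "rv_y")])
rf <- regression_forest(X, d$rv_lead, num.trees = 2000)
```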
We compare in Table 18.1, for various rolling-estimation windows and investment
horizons, the out-of-sample predictive performance of the HAR-RV model estimated
by means of the OLS technique with the out-of-sample predictive performance of
random forests in terms of the root-mean-squared prediction error (RMSPE) statistics
implied by these two models. To this end, we estimate both models by excluding
lagged industry returns (that is, the array of predictors includes . RVt , RVt,q , RVt,y
only; the OLS model also features a constant) and then by including lagged industry
returns. We then compute the RMSPE statistics for both models and compute the
corresponding ratios. A ratio larger than unity indicates that the random forests
outperform the corresponding HAR-RV model in terms of the RMSPE statistic.
Finally, we repeat these calculations for good and bad volatility.
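Given two vectors of out-of-sample prediction errors, the RMSPE ratio (and the MAPE ratio used later) reduces to a few lines of R; the error vectors below are toy placeholders.

```r
# Toy out-of-sample prediction errors for the two competing models.
set.seed(1)
err_har <- rnorm(100, sd = 1.1)   # HAR-RV estimated by OLS
err_rf  <- rnorm(100, sd = 1.0)   # random forest

rmspe_ratio <- sqrt(mean(err_har^2)) / sqrt(mean(err_rf^2))
mape_ratio  <- mean(abs(err_har))  / mean(abs(err_rf))

# Values larger than unity favour the random forest.
round(c(RMSPE = rmspe_ratio, MAPE = mape_ratio), 3)
```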
3 The "grf" package also allows different subsamples to be used for constructing a tree and for estimating the predictions within its leaves.
4 A formal statistical comparison of the models by means of statistical tests is complicated by the nonlinear and complex structure of random forests. We, therefore, use various statistics (basic statistics, formal statistical tests, measures of economic benefits, metrics of relative importance of predictors) to evaluate and compare models along different dimensions.

Table 18.1 Comparing OLS and random forests by means of RMSPE ratios. Columns: Window; excluding industry returns (h = 1, 3, 12); including industry returns (h = 1, 3, 12)

Three main results emerge from Table 18.1. First, the RMSPE ratio exceeds unity in all cases when we include lagged industry returns, indicating the superior performance of random forests over the OLS model. Second, when we exclude lagged industry returns from the set of predictors, the results are more balanced, with random forests in several cases outperforming the OLS estimator for realized volatility and realized bad volatility for the short and intermediate rolling-estimation windows.
Third and more importantly, the RMSPE ratios are found to be substantially larger
when we include lagged industry returns in the set of predictors than when we exclude
lagged industry returns. In other words, the results show that random forests system-
atically, and in a quantitatively substantial way, improve out-of-sample predictions
of aggregate stock market realized volatility relative to the standard HAR-RV model
estimated by the OLS technique once we account for the industry level information
embedded in lagged industry returns.
The observed superior performance of the random forest against the OLS model
is not unexpected given that the HAR-RV model extended to include lagged industry
returns from across the entire economy requires the estimation of many parameters.
In case some of these industries have only limited predictive power for realized
volatility, their estimated parameters will add noise to the predictions of realized
volatility. This brings about a trade-off when the OLS technique is used to estimate
the HAR-RV model, as the improvement in performance due to the predictive power of some industries has to be weighed against the noise that the parameters estimated for the remaining industries add to the predictions.
Having established evidence that the random-forest model is better suited than the OLS technique to analyze the predictive value of lagged industry returns for realized volatility, we summarize in Table 18.2, for the four rolling-estimation windows and the three investment horizons studied, the ratios of the RMSPEs of the restricted random-forest model, $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})$, and the full random-forest model that includes lagged industry returns, $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$. Panels B and C present the results for bad and good volatility, respectively. The ratios in Table 18.2 exceed unity for the vast majority of cases, indicating that the full model that incorporates lagged industry returns outperforms the restricted model for realized volatility as well as for its bad and good variants. This means that extending the model to include industry-level information improves the out-of-sample accuracy of its predictions of aggregate stock-market volatility. The magnitude of the RMSPE ratios tends to be larger for the short investment horizon than for the two longer investment horizons, especially for realized volatility and good realized volatility, suggesting that industry-level information can be particularly useful for improving relatively shorter-term stock-market volatility predictions and for bullish market states.
Because large prediction errors have a disproportionately large effect on the RMSPE statistic, we report in Table 18.3 the results obtained from the ratio of the mean-absolute-prediction-error (MAPE) statistics of the restricted model and the full model. Again, a value larger than unity for this ratio indicates that lagged industry returns improve predictive accuracy.

5 We also cross-checked how the random-forest model that features lagged industry returns performs as compared to the classic HAR-RV model (estimated by OLS) that excludes industry returns. The results, reported in Table 18.11 (Appendix), show that the random-forest model outperforms the classic HAR-RV model in terms of the RMSPE ratio, with the long rolling-estimation window, mainly for realized "bad" volatility, being an exception.

Table 18.2 The predictive power of lagged industry returns (RMSPE ratios). Columns: Window; h = 1, 3, 12

The results in Table 18.3 corroborate those for
the RMSPE ratios reported in Table 18.2. We find that the full model that incorpo-
rates lagged industry returns outperforms the restricted model with generally larger
MAPE ratios observed at the short investment horizon, further supporting the pre-
dictive value of industry level information particularly for shorter horizons and for
bullish market states.
In volatility forecasting exercises that involve noisy volatility proxies, Patton
[26] shows that the quasi-likelihood (QLIKE) loss function along with the usual
mean-squared-error loss function allow for an unbiased model ordering. Therefore,
in order to check the robustness of our findings, we report in Table 18.4 the results
for the popular QLIKE loss function. We observe that the QLIKE ratios are smaller than the RMSPE and MAPE ratios, but still larger than unity except in some cases, mainly for $h = 12$.6 Moreover, the QLIKE ratios tend to become larger
when the length of the rolling-estimation window increases. These additional results,
6 It should be noted that the QLIKE loss function studied here implies that the loss from an under-estimation of realized volatility is larger than the loss from a corresponding over-estimation of the same absolute size. Hence, the results imply that an investor who suffers a greater loss from an under-estimation than from a corresponding over-estimation of realized volatility benefits from using lagged industry returns to predict stock-market volatility, a result that is consistent with the results we shall report in Sect. 18.3.7.
Table 18.3 The predictive power of lagged industry returns (MAPE ratios). Columns: Window; h = 1, 3, 12
thus, further confirm the predictive value of lagged industry returns for stock-market
volatility.7
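The exact QLIKE variant used in the chapter is not spelled out in the text; one standard formulation in the spirit of Patton [26], which is zero at a perfect forecast and, as footnote 6 describes, penalizes under-prediction more heavily than over-prediction, is sketched below as an assumption.

```r
# One common QLIKE formulation (an assumption, not necessarily the authors'
# exact variant): rv is realized volatility, f the volatility forecast.
qlike <- function(rv, f) rv / f - log(rv / f) - 1

rv <- 1.0
qlike(rv, 0.5)   # under-prediction: comparatively large loss
qlike(rv, 1.5)   # over-prediction of the same absolute size: smaller loss
```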
Finally, having confirmed the predictive value of lagged industry returns via alternative loss functions, we report in Table 18.5, as another approach, the results of the [8] test of equality of out-of-sample mean-squared prediction errors of the full model that includes lagged industry returns and the restricted model that excludes industry-level information. The test yields significant results at the 5% and, in a few cases, at the 10% level of significance at the two shorter investment horizons for realized volatility and its good and bad variants, confirming that the full model outperforms the restricted model in most cases. While the test statistic takes on smaller values for the long investment horizon, it remains significant in the majority of cases at the 10% level of significance, and in a few cases even at the 5% level of significance.8
7 In a recent study, Reschenhofer et al. [34] propose two alternative likelihood-based loss functions,
one based on a t-distribution (QLIKE-t) and the other based on an F-distribution (QLIKE-F), that
are less sensitive to outliers and thus allow for a more stable ranking of models. Given this evidence,
we also experimented with the QLIKE-t and QLIKE-F distributions and found qualitatively similar
results (for alternative degrees-of-freedom parameters) to those obtained from QLIKE loss function.
8 We also examined, by means of the Clark-West test, how random forests perform relative to a HAR-RV model estimated by means of the OLS technique, where both models feature lagged industry returns as predictors in addition to the standard HAR-RV predictors. Hence, we treated the HAR-RV model as a nested linear version of the nonlinear random-forest model. The results reported in Table 18.13 (Appendix) corroborate that the random-forest model significantly outperforms the OLS model.

Table 18.4 The predictive value of lagged industry returns (QLIKE ratios). Columns: Window; h = 1, 3, 12
Overall, various methods to assess the predictive value of lagged industry returns
yield consistent findings confirming that incorporating industry level information in
prediction models can improve the out-of-sample accuracy of predictions of stock-
market volatility.
Table 18.5 The predictive value of lagged industry returns (Clark-West test). Columns: Window; h = 1, 3, 12

In their widely cited study, Hong et al. [21] show that 14 out of 34 industries, including commercial real estate, petroleum, metal, retail, financial, and services, can predict stock market movements one month ahead, while other industries including petroleum, metal, and financials can predict the market even two months ahead. However, re-examining these results with updated data, Tse [35] shows that only one to seven industries have significant predictive ability for the stock market. These studies, with their focus on stock market return forecasting and in-sample tests, raise an interesting question as to the time-varying importance of industry returns
in aggregate level market dynamics. For this reason, we supplement our analysis by
examining the relative importance of the predictors over time.
We present in Fig. 18.2 the relative importance of the predictors in the full model, $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$, measured in terms of how often a predictor is used for splitting when building a tree. Given the large number of industries in the array of predictors, in order to ease the interpretation of the results, we aggregate the data into an "rv" block that represents the components of the HAR-RV model and eleven broad industry groups (energy, materials, industrials, consumer staples, consumer discretionary, healthcare, financials, IT, communication, utilities, real estate).
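The "grf" package ships a helper that matches this importance measure, a depth-weighted count of how often each predictor is used for splitting; the sketch below uses toy data, with max.depth = 4 corresponding to the note to Fig. 18.2.

```r
# Split-frequency-based predictor importance via grf.
library(grf)

set.seed(1)
X <- matrix(rnorm(300 * 6), ncol = 6)
y <- X[, 1] + 0.5 * X[, 2] + rnorm(300)

forest <- regression_forest(X, y, num.trees = 2000)
imp <- variable_importance(forest, max.depth = 4)  # weighted split counts
round(imp / sum(imp), 3)                           # importance shares
```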
Fig. 18.2 The relative importance of predictors (Panel A: realized volatility). Note Predictor importance is computed for the full model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$) and a rolling-estimation window of length 240 months. Predictor importance is defined as the weighted sum of how often a predictor is used for splitting. Maximum tree depth considered: 4. Numbers are averaged across 10 estimations of the random forests. The investment horizons are h = 1, 3, 12 (from left to right). The random forests are built using 2000 trees

As expected, given the popularity of the HAR-RV model in empirical finance, the three terms of the HAR-RV model (treated in the figure as a single block) always play an important role in the models. Interestingly, however, the role of the "rv" block that represents the components of the HAR-RV model changes over time. We observe that the role of the "rv" block gained momentum during the period preceding the Global Financial Crisis (GFC) of 2008 and then peaked during the GFC, suggesting that the importance of non-industrial information including behavioral factors
and/or changes in investors' risk aversion increased during the run-up to the global crash. However, we also observe, at the short and intermediate investment horizons,
an increasing role of industrials and materials during the aftermath of the global
crisis, highlighting the informational value of real economic activity. Interestingly,
at the long investment horizon, we observe a similar pattern for consumer related
industries with consumer discretionary and consumer staples taking on a greater role
in the predictive models. Overall, our analysis suggests that certain industries play
a more dominant predictive role for aggregate level volatility and that the predic-
tive contribution of industry level returns is not constant over time with a structural
change occurring during the period that precedes the global financial crisis.
In order to further confirm the inferences discussed so far, we report in this section the
findings from a battery of robustness checks. In Table 18.6, we summarize the results
of the RMSPE ratio when we add market returns as a control variable to the array
of predictors of the full model. Specifically, we ask whether a model that features
the standard HAR-RV terms, lagged market returns, and lagged industry returns has
the same predictive performance as a model that features only the standard HAR-RV
terms and lagged market returns. The results, while depending to some extent on the
combination of estimation window and investment horizon being studied, in general
suggest that industry returns indeed capture predictive information for subsequent
realized market volatility over and above lagged market returns.
Table 18.7 reports the results of four additional robustness checks (for the sake of brevity, we focus on realized volatility). First, we replace the rolling-estimation window with a recursively expanding estimation window. Second, we study, for $h > 1$, the average realized volatility, formulated as $\operatorname{mean}(RV_{t+1} + \cdots + RV_{t+h})$. Third, we use the realized standard deviation as the dependent variable in our models. These additional robustness checks lend further support to our conclusion that industry returns capture valuable predictive information for subsequent realized market volatility.
As a fourth robustness check, we consider boosted regression trees [13, 14] as
an alternative to random forests. Boosted regression trees combine regression trees
with elements of statistical boosting. They resemble random forests insofar as the
key idea is to grow a forest of trees by combining simple regression trees. In contrast
to random forests, however, boosted regression trees are estimated by means of a
forward stage-wise iterative algorithm. The specific algorithm that we consider is
known as stochastic gradient-descent boosting. Estimating the stochastic gradient-
descent variant of boosted regression trees using the R add-on package “gbm” [17]
and computing the RMSPE ratio, we observe results that further confirm the pre-
dictive value of lagged industry returns for the intermediate and long investment
horizons.
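A hedged sketch of stochastic gradient-descent boosting with the cited "gbm" package follows; the hyperparameter values are illustrative choices, not the authors' settings.

```r
# Boosted regression trees via stochastic gradient boosting.
library(gbm)

set.seed(1)
d <- data.frame(rv_lead = rnorm(300), rv = rnorm(300),
                rv_q = rnorm(300), rv_y = rnorm(300))   # toy data

boost <- gbm(rv_lead ~ rv + rv_q + rv_y,
             data = d,
             distribution = "gaussian",   # squared-error loss
             n.trees = 2000,              # boosting iterations
             interaction.depth = 3,       # depth of each small tree
             shrinkage = 0.01,            # learning rate
             bag.fraction = 0.5)          # random subsample per stage

pred <- predict(boost, d, n.trees = 2000)
```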
9 We could hold constant the seed when studying the effect of a variation in the number of trees (or
some other parameter like, for example, the length of the rolling-estimation window) on our results.
The results would then reflect the pure effect of a variation in the number of trees conditional on the
fixed seed. We prefer in this study not to fix the seed because the results then give a clearer picture
of the total variation of our results due to a variation of the model configuration (like a variation in
the number of trees) and the random element involved in the estimation of random forests.
Because the estimation of random forests involves a random element, we rerun our estimations multiple times.10 We then compute, across the simulation runs, the average, minimum, and
maximum of the RMSPE and the MAPE ratios. We report the results of our simula-
tion experiment in Table 18.14 (Appendix). The results of our simulation experiment
demonstrate the robustness of our finding that lagged industry returns improve the
accuracy of predictions of realized volatility.
A key feature of random forests is that they capture in a data-driven way any potential nonlinearities in the data as well as interaction effects between the predictor variables. Gu et al. [18], who compare the gains from using various machine-learning techniques for predicting stock returns, argue that the predictive gains from using trees can be attributed to nonlinear predictor interactions that other techniques do not detect. In order to assess whether nonlinearities and predictor interactions also play a role in the context of our prediction experiment, we compare random forests with three popular linear
10Computational time is not a severe binding constraint on our estimations because we run the
simulation experiment and the various other variants of our models in parallel.
shrinkage estimators (see the textbook by Hastie et al. [19]): the Lasso estimator, the Ridge-regression estimator, and an elastic net. While the Lasso estimator uses the L1 norm of the coefficient vector to shrink the dimension of the estimated model, the Ridge-regression estimator uses the L2 norm. The elastic net uses (in the case of our parametrization) an equally weighted combination of the two. We use the R add-on package "glmnet" [15] to estimate the shrinkage models, where the optimal shrinkage parameter minimizes the mean 10-fold cross-validated error.
Table 18.9 reports the RMSPE ratios of the shrinkage estimators and the random-forest model. A ratio larger than unity shows that the random-forest model has a better predictive performance than the respective shrinkage estimator. The random-forest model performs better than the respective shrinkage estimator for the majority of the configurations being studied, albeit for some configurations by a small margin. On balance, the results show that the random-forest model has a competitive performance relative to the linear shrinkage estimators, and that it outperforms the latter for several configurations, indicating that accounting for departures from linearity and predictor interactions can be useful for modeling the link between realized volatility and lagged industry returns.
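The three shrinkage estimators can be sketched with the cited "glmnet" package, whose cv.glmnet function performs 10-fold cross-validation by default; the data below are toy placeholders for the HAR-RV terms plus the 48 industry returns.

```r
# Lasso (alpha = 1), Ridge (alpha = 0), and an equally weighted elastic net.
library(glmnet)

set.seed(1)
x <- matrix(rnorm(300 * 51), ncol = 51)   # 3 HAR terms + 48 industry returns
y <- x[, 1] + rnorm(300)

lasso <- cv.glmnet(x, y, alpha = 1)       # L1 penalty
ridge <- cv.glmnet(x, y, alpha = 0)       # L2 penalty
enet  <- cv.glmnet(x, y, alpha = 0.5)     # combination of the two

predict(lasso, newx = x, s = "lambda.min")  # forecasts at the CV-optimal lambda
```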
The results reported in the preceding sections are based on the assumption that an
investor’s loss is a symmetric function of the squared or absolute prediction error
(with the QLIKE loss function being the exception). That is, an under-prediction
of realized volatility causes the same loss as an over-prediction of the same size.
In practical settings, however, one could easily think of situations in which the loss
function, such as one implied by certain options-trading strategies, is asymmetric in
the prediction error. Therefore, in order to account for such a setting, we study a loss function with a shape parameter $\alpha$ that governs the asymmetry of the loss. An investor with $\alpha > 0.5$ (who suffers a greater loss from an under-prediction than from an over-prediction) adjusts his or her prediction upward. Such an investor, thus, predicts a quantile of the conditional distribution of realized volatility above the median. Conversely, an investor who has a loss function with a shape parameter $\alpha < 0.5$ adjusts his or her prediction downward relative to the median. We compute the resulting loss ratios for the restricted and the full random-forest models (see Fig. 18.3).
Considering alternative loss functions is one way to quantify the economic benefits of predictions, and the discussion in Sect. 18.3.7 indicates that studying lagged industry returns benefits investors who are particularly concerned about under-predicting market volatility. This is certainly an important consideration for the pricing of options contracts, as ignoring industry-level information can potentially lead to under-pricing of these securities. An alternative way to assess the economic implications of our findings is to directly use an investor's utility function to measure the benefits from utilizing industry-level information in predicting realized market volatility. To this end, we compute the certainty equivalent return (CER) attained by an investor who uses the full model that includes lagged industry returns and by an investor who uses the restricted model.
Panel A: Realized volatility; Panel B: Bad realized volatility; Panel C: Good realized volatility (loss ratios for p = 1 and p = 2 at horizons h = 1, 3, 12)
Fig. 18.3 The shape of an investor's loss function and the predictive value of lagged industry returns. Note This figure displays the ratio of the cumulated loss for the restricted random-forest model, $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})$, and the full random-forest model, $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$, that includes lagged industry returns. The loss function is given in Eq. 18.6. A ratio larger than unity signals that the full model performs better than the restricted model. The loss ratios are averaged across the four different rolling-estimation windows (120, 180, 240, and 360 months). The random forests are estimated by setting the minimum node size to 5 and using one-third of the predictors, randomly chosen, for splitting. The random forests are built using 2000 trees. The parameter h denotes the investment horizon (in months)
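As a hedged illustration, a flexible asymmetric loss of the Elliott-Komunjer-Timmermann [10] type, consistent with the shape parameter and the exponents p = 1 and p = 2 shown in Fig. 18.3, can be sketched as follows; this functional form is an assumption rather than necessarily the exact Eq. 18.6.

```r
# Flexible asymmetric loss (assumed form): e is the prediction error
# (realized volatility minus its prediction), alpha the shape parameter,
# p the exponent.
flex_loss <- function(e, alpha, p)
  (alpha + (1 - 2 * alpha) * (e < 0)) * abs(e)^p

e <- 0.3
flex_loss( e, alpha = 0.8, p = 2)  # under-prediction: heavily penalized
flex_loss(-e, alpha = 0.8, p = 2)  # same-sized over-prediction: smaller loss
```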
In Table 18.10, we report the difference (in percent) between the resulting CER
values for the two types of investors. A positive number indicates that an investor
attains a higher CER by incorporating lagged industry returns in the prediction model.
The results for realized volatility in Panel A indicate substantial economic gains from using industry-level information (all but one figure in the table are positive). The same also holds for good realized volatility in Panel C, with the magnitude of the economic benefits from utilizing lagged industry returns increasing as the degree of risk aversion increases. This suggests that more risk-averse investors can reap increasingly
greater economic benefits from using industry level information. In the case of bad
realized volatility, in contrast, the results are mixed. A longer rolling-estimation win-
dow tends to worsen the economic value added of industry returns. This could be due
to the dominance of non-industry related factors such as behavioral and sentiment
related effects over stock-market volatility dynamics, particularly during periods of
market crisis when investors would be more likely to engage in herding behavior.
Nevertheless, our results indicate that an investor who plans to use lagged indus-
try returns to predict bad realized market volatility should choose a relatively short
rolling-estimation window, especially in case he or she is highly risk averse.
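As a hedged sketch of the CER comparison, a standard mean-variance certainty equivalent can be computed as below; the portfolio-return series, the risk-aversion value, and the exact portfolio rule are illustrative assumptions rather than the authors' implementation.

```r
# Mean-variance certainty equivalent return: mean return minus a
# risk-aversion-weighted variance penalty.
cer <- function(r, gamma) mean(r) - 0.5 * gamma * var(r)

set.seed(1)
r_full       <- rnorm(120, mean = 0.006, sd = 0.04)  # strategy using full model
r_restricted <- rnorm(120, mean = 0.005, sd = 0.04)  # strategy using restricted model

gamma <- 3                                           # degree of risk aversion
100 * (cer(r_full, gamma) - cer(r_restricted, gamma))  # CER difference (%)
```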
In a well-cited study, Hong et al. [21] argue that industry portfolios capture predictive information about the aggregate stock market, in line with the so-called gradual diffusion of information hypothesis, which suggests that the information contained in industry returns diffuses gradually across markets. Although later studies provide mixed evidence regarding the predictive power of industry returns over stock market returns, the literature has not yet examined the out-of-sample predictability of stock-market volatility in this context. Given the importance of accurate out-of-sample volatility predictions for a number of financial activities including option pricing, hedging, and portfolio optimization, our study takes a first step in this direction by investigating the role of lagged industry returns from across the entire economy in predicting out-of-sample aggregate stock-market volatility.
Acknowledgements The authors thank an anonymous reviewer for helpful comments. The usual
disclaimer applies.
Appendix
Table 18.11 The full random-forest model versus the classic HAR-RV model (RMSPE ratios)
Window h = 1 h = 3 h = 12
Panel A: realized volatility
120 1.1392 1.0654 1.1268
180 1.1205 1.0336 1.0911
240 1.0255 1.0378 1.0783
360 0.9479 1.0065 1.0168
Panel B: bad realized volatility
120 1.0643 1.1790 1.7693
180 1.0262 1.0539 1.2835
240 0.9086 1.0229 1.0374
360 0.9451 0.9896 0.9889
Panel C: good realized volatility
120 1.1054 1.0282 1.0569
180 1.0912 1.0434 1.0832
240 1.0998 1.0447 1.0721
360 1.0241 1.0255 1.0204
Note This table reports the ratio of the RMSPE statistics of the classic HAR-RV model ($RV_{t+h} = \beta_0 + \beta_1 RV_t + \beta_2 RV_{t,q} + \beta_3 RV_{t,y} + \varepsilon_t$), estimated by OLS, and the full random-forest model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$) that includes lagged industry returns. The column entitled "Window" shows the length of the rolling-estimation window. The parameter h denotes the investment horizon (in months). The random forests are built using 2000 trees
Table 18.12 The full random-forest model versus the HAR-RV model (model averaging)
Window h = 1 h = 3 h = 12
Panel A: realized volatility
120 1.1399 1.0608 1.1206
180 1.1168 1.0304 1.0880
240 1.0230 1.0363 1.0746
360 0.9475 1.0073 1.0124
Panel B: bad realized volatility
120 1.0597 1.1612 1.7968
180 1.0210 1.0598 1.3016
240 0.9142 1.0247 1.0332
360 0.9471 0.9915 0.9832
Panel C: good realized volatility
120 1.1245 1.0230 1.0550
180 1.1001 1.0403 1.0818
240 1.0982 1.0425 1.0708
360 1.0121 1.0210 1.0195
Note This table reports the ratio of the RMSPE statistics of the HAR-RV model estimated by OLS and the full random-forest model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$) that includes lagged industry returns. In order to obtain the forecast for the HAR-RV model, only one of the 48 lagged industry returns at a time is added to the classic HAR-RV model, the resulting 48 models are estimated by the OLS technique, and finally the forecasts from the estimated 48 models are averaged to predict realized volatility. The column entitled "Window" shows the length of the rolling-estimation window. The parameter h denotes the investment horizon (in months). The random forests are built using 2000 trees
Table 18.13 The random-forest model versus the HAR-RV model when both feature industry
returns (Clark-West test)
Window h = 1 h = 3 h = 12
References
1. Andersen, T.G., Bollerslev, T.: Answering the skeptics: yes, standard volatility models do provide accurate forecasts. Int. Econ. Rev. 39(4), 885–905 (1998)
2. Barndorff-Nielsen, O.E., Kinnebrouk, S., Shephard, N.: Measuring downside risk: realised
semivariance. In: Bollerslev, T., Russell, J., Watson, M. (eds.) Volatility and Time Series Econo-
metrics: Essays in Honor of Robert F. Engle, pp. 117–136. Oxford University Press (2010)
3. Ben Nasr, A., Lux, T., Ajmi, A.N., Gupta, R.: Forecasting the volatility of the Dow Jones
Islamic stock market index: long memory vs. regime switching. Int. Rev. Econ. Finance 45(1),
559–571 (2016)
4. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
5. Bouri, E., Gkillas, K., Gupta, R., Pierdzioch, C.: Forecasting realized volatility of Bitcoin: the
role of the trade war. Comput. Econ. (2020, forthcoming)
6. Cenesizoglu, T., Timmermann, A.: Do return prediction models add economic value? J. Bank. Finance 36, 2974–2987 (2012)
7. Ciner, C.: Do industry returns predict the stock market? A reprise using the random forest. Q.
Rev. Econ. Finance 72, 152–158 (2019)
8. Clark, T.E., West, K.D.: Approximately normal tests for equal predictive accuracy in nested models. J. Econom. 138, 291–311 (2007)
9. Corsi, F.: A simple approximate long-memory model of realized volatility. J. Financ. Econom.
7, 174–196 (2009)
10. Elliott, G., Komunjer, I., Timmermann, A.: Estimation and testing of forecasting rationality
under flexible loss. Rev. Econ. Stud. 72, 1107–1125 (2005)
11. Engle, R.F., Rangel, J.G.: The Spline-GARCH model for low-frequency volatility and its global
macroeconomic causes. Rev. Financ. Stud. 21(3), 1187–1222 (2008)
12. Engle, R.F., Ghysels, E., Sohn, B.: Stock market volatility and macroeconomic fundamentals.
Rev. Econ. Stat. 95(3), 776–797 (2013)
13. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29,
1189–1232 (2001)
14. Friedman, J.H.: Stochastic gradient boosting. Comput. Stat. Data Anal. 38, 367–378 (2002)
15. Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for Generalized Linear Models
via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010). https://www.jstatsoft.org/v33/i01/
16. Giot, P., Laurent, S., Petitjean, M.: Trading activity, realized volatility and jumps. J. Empir.
Finance 17(1), 168–175 (2010)
17. Greenwell, B., Boehmke, B., Cunningham, J., GBM Developers: gbm: Generalized
Boosted Regression Models. R package version 2.1.8.1 (2022). https://CRAN.R-project.org/
package=gbm
18. Gu, S., Kelly, B., Xiu, D.: Empirical asset pricing via machine learning. Rev. Financ. Stud. 33,
2223–2273 (2020)
19. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining,
Inference, and Prediction, 2nd edn. Springer, New York, NY (2009)
20. Hong, H., Lim, T., Stein, J.C.: Bad news travels slowly: size, analyst coverage and the prof-
itability of momentum strategies. J. Finance 55, 265–295 (2000)
21. Hong, H., Torous, W., Valkanov, R.: Do industries lead stock markets? J. Financ. Econ. 83,
367–396 (2007)
22. Hong, H., Torous, W., Valkanov, R.: Note on “Do industries lead stock markets?”. http://rady.
ucsd.edu/docs/faculty/valkanov/Note_10282014.pdf (2014)
23. Meinshausen, N.: Quantile regression forests. J. Mach. Learn. Res. 7, 983–999 (2006)
24. Mittnik, S., Robinzonov, N., Spindler, M.: Stock market volatility: identifying major drivers
and the nature of their impact. J. Bank. Finance 58, 1–4 (2015)
25. Müller, U.A., Dacorogna, M.M., Davé, R.D., Olsen, R.B., Pictet, O.V.: Volatilities of different
time resolutions—analyzing the dynamics of market components. J. Empir. Finance 4, 213–239
(1997)
26. Patton, A.J.: Volatility forecast comparison using imperfect volatility proxies. J. Econom. 160,
246–256 (2011)
27. Poon, S.-H., Granger, C.W.J.: Forecasting volatility in financial markets: a review. J. Econ. Lit.
41(2), 478–539 (2003)
28. Pradeepkumar, D., Ravi, V.: Forecasting financial time series volatility using Particle Swarm Optimization trained Quantile Regression Neural Network. Appl. Soft Comput. 58, 35–52 (2017)
29. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for
Statistical Computing, Vienna, Austria (2023). https://www.R-project.org/
30. Rangel, J.G., Engle, R.F.: The Factor-Spline-GARCH model for high and low frequency cor-
relations. J. Bus. Econ. Stat. 30(1), 109–124 (2011)
31. Rapach, D.E., Strauss, J.K., Wohar, M.E.: Forecasting stock return volatility in the presence of
structural breaks. In: Rapach, D.E., Wohar, M.E. (eds.) Forecasting in the presence of structural
breaks and model uncertainty. Frontiers of Economics and Globalization, vol. 3, pp. 381–416.
Emerald, Bingley, United Kingdom (2008)
32. Rapach, D.E., Zhou, G.: Forecasting stock returns. In: Elliott, G., Timmermann, A. (eds.)
Handbook of Economic Forecasting. Volume 2A. Elsevier, Amsterdam, pp. 328–383 (2013)
33. Rapach, D.E., Strauss, J.K., Tu, J., Zhou, G.: Industry return predictability: a machine learning
approach. J. Financ. Data Sci. 1(3), 9–28 (2019)
34. Reschenhofer, E., Mangat, M.K., Stark, T.: Volatility forecasts, proxies and loss functions. J.
Empir. Finance 59, 133–153 (2020)
35. Tse, Y.: Do industries lead stock markets? A reexamination. J. Empir. Finance 34, 195–203
(2015)
36. Salisu, A.A., Gupta, R., Ogbonna, A.E.: A moving average heterogeneous autoregressive model
for forecasting the realized volatility of the US stock market: evidence from over a century of
data. Int. J. Finance Econ. (2020)
37. Tibshirani, J., Athey, S., Wager, S.: grf: Generalized Random Forests. R package version 2.2.1.
https://CRAN.R-project.org/package=grf (2022)
38. Zhang, Y., Tse, Y., Zhang, G.: Return predictability between industries and the stock market
in China. Pac. Econ. Rev. 27(2), 194–220 (2022)
Chapter 19
Machine Learning Techniques for Corporate Governance
Deepika Gupta
Abstract Despite much growth, development, evolution, advancement and contribution of studies on governance mechanisms to firm and market performances, there is no clear consensus on governance issues like CEO duality, board diversity, CSR impact and other parameters. A need is felt to harmonize the various concepts, theories and models of corporate governance to meet the idiosyncratic needs of a firm. New data sources, technologies and research methods are needed as a customized approach to fill the gaps in the existing literature, find better constructs to understand the intricacies of governance mechanisms, and help resolve conflicting or unexplored results. One such trajectory is machine learning, whose techniques can tailor data collection and process and analyze various sources for decision-making. This chapter aims to provide a creative integration of corporate governance mechanisms with machine learning techniques in order to strengthen managerial capabilities and thereby competitive advantage. It looks at the areas wherein technology can provide the required core competencies by offering solutions that enhance accurate and effective managerial decision-making, reduce managers' opportunistic behaviour and thereby improve the firm's ability to handle different uncertainties in business. This move should be towards a more universal and holistic approach through synergistic intelligence to help shape governance mechanisms and decision making in the years to come.
19.1 Introduction
Technology plays an indispensable part in the lives of every human being. It has also made its impact on businesses through various decision-making processes. One such general-purpose technology with considerable impact is artificial intelligence
D. Gupta (B)
Indian Institute of Management, Visakhapatnam, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
L. A. Maglaras et al. (eds.), Machine Learning Approaches in Financial Analytics,
Intelligent Systems Reference Library 254,
https://doi.org/10.1007/978-3-031-61037-0_19
(AI). Though not a new term (it was coined in the 1950s [105]), it is at the top of the agenda for many businesses [34] and a part of corporate board discussions today. Times have changed since 1967, when Peter Drucker called the computer a total 'moron' that only executes commands and does not make decisions [41], a view that limited the perceived relevance of artificial intelligence to corporate management and governance. Today, it is viewed as a 'general solution technology' for almost any managerial, commercial or even societal issue or problem [65].
Corporate governance1 has always generated interest both inside and outside academia due to various scandals like Enron, WorldCom, Satyam Computers and many more. These scandals highlight the failures and shortcomings in governance systems that resulted in significant losses for various stakeholders [124]. Corporate governance systems have been extensively reviewed and revised, in some cases repealed and replaced, and in others supplemented with new regulations, such as the Sarbanes-Oxley Act of 2002 in the United States of America or the Companies Act, 2013 [120], which replaced the earlier 1956 Act [119] in India. These reforms in corporate governance were carried out to ensure and restore investor confidence and to reduce investment risks [124]. Governance depends on both country-level and firm-level mechanisms. The country-level governance mechanisms include a country's laws, its culture and norms, and the institutions that enforce the laws [95, 110]. Firms must adhere to the governance environment not by choice but due to procedural, regulatory, and statutory requirements.
The literature on corporate governance brings out interesting empirical results that remain either mixed or inconclusive. One of the reasons, as highlighted by Vito and Trottier [124], is the application of a one-size-fits-all governance solution to every type of firm. This warrants exploring governance mechanisms at the level of more granular, firm-specific requirements, which in turn influences the choice of research methods in this domain.
Traditionally, a few performance metrics dominated, with Tobin's Q, return on equity (ROE), return on assets (ROA) and economic value added (EVA) the ratios most used in studies of corporate governance [101]. Thereafter came the use of 'composite measures' like commercial rating indices [26, 58, 106], corporate governance quality [15], corporate governance scores [113] and other complex and comprehensive versions of governance mechanisms. There was also a rise of new performance measures extending to efficiency indicators like total sales, sales-per-employee and asset turnover [56], innovation measures like research and development investments [60], cost of capital [122], diversity and corporate social responsibility [45], disclosures [12], real earnings management [107], tax avoidance [70] and many others.
With respect to research methods, regression models with certain sets of variables were largely used to understand the relationships between governance variables and corporate and market performance. Such methods raised various econometric issues like endogeneity [124], and many attempts were
1 Certain parts of the chapter are excerpts from the author's doctoral thesis titled 'Corporate governance and initial public offerings' (2015) at the Indian Institute of Management, Bangalore, India.
made to control for such issues. Qualitative research gained ground for studying the human aspects of governance [90] through techniques like interviews, archival data, observation and surveys. Text analysis, such as 'tone measures' [88] of governance reports and filings, was another research method used to study governance mechanisms; a dictionary-based tone score of the kind sketched below is one simple instance.
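As an illustration, the following is a minimal sketch of such a dictionary-based tone measure. The word lists are tiny, hypothetical stand-ins; empirical work typically relies on a finance-specific dictionary such as Loughran and McDonald's.

```python
# Minimal dictionary-based "tone" measure for a governance report.
# POSITIVE and NEGATIVE are illustrative stand-ins, not a real dictionary.
import re

POSITIVE = {"improve", "strong", "effective", "growth", "achieve"}
NEGATIVE = {"weak", "decline", "failure", "loss", "adverse"}

def tone(text: str) -> float:
    """Return (positive - negative) word counts over total tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

report = "The board achieved strong growth despite adverse market conditions."
print(round(tone(report), 3))  # net-positive tone for this sentence
```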
Advancements in econometrics and other research methodologies helped researchers adopt new data analysis methods like data envelopment analysis (DEA) and the stochastic frontier approach (SFA) [56], and logit/probit and structural equation models, usually with multi-year data [14]. To meet the challenges of endogeneity and the accompanying causality questions, lagged-variable or instrumental-variable approaches were adopted; a minimal sketch of the lagged-variable idea follows this paragraph. As models became larger and more complex, principal component analysis, qualitative comparative analysis [29], and techniques like fuzzy sets, fuzzy logic and governance bundles were explored [124].
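The snippet below is a minimal sketch of the lagged-variable approach, under the assumption of a small firm-year panel; the variable names (roa, board_size, board_independence) and the data are hypothetical.

```python
# Sketch of the lagged-variable approach to soften simultaneity concerns:
# regress current performance on last period's governance characteristics.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "firm": ["A"] * 4 + ["B"] * 4,
    "year": [2018, 2019, 2020, 2021] * 2,
    "roa": [0.05, 0.06, 0.04, 0.07, 0.02, 0.03, 0.05, 0.04],
    "board_size": [8, 8, 9, 9, 11, 11, 10, 10],
    "board_independence": [0.4, 0.5, 0.5, 0.6, 0.3, 0.3, 0.4, 0.4],
})

# Lag the governance variables within each firm so that x(t-1) explains y(t).
df = df.sort_values(["firm", "year"])
for col in ["board_size", "board_independence"]:
    df[f"{col}_lag"] = df.groupby("firm")[col].shift(1)

model = smf.ols("roa ~ board_size_lag + board_independence_lag",
                data=df.dropna()).fit()
print(model.params)
```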
Despite such growth, development, evolution, advancement and contribution of studies on governance mechanisms to firm and market performances, there is no clear consensus on governance issues like CEO duality, board diversity, CSR impact and other parameters. Even today, a need is felt to harmonize the various concepts, theories and models of corporate governance to meet the idiosyncratic needs of a firm. New data sources, technologies and research methods are needed as a customized approach to help academicians and researchers fill the gaps in the existing literature, find better constructs to understand the intricacies of governance mechanisms, and help resolve conflicting or unexplored results. One such trajectory is artificial intelligence, which can tailor data collection and process and analyze various sources for decision-making. The chapter seeks to open up this unexplored path as an avenue for future research in the governance literature, which now needs the intervention of other domains as well for more mature corporate decision-making policies.
The chapter aims to examine the collective governance aspects of political, economic, cultural, social and other changes [83] associated with the new concept of machine learning techniques. Such a relationship, however, also depends on the legal systems, their enforcement, and other formal institutions that in turn rely on the cultural factors of a particular nation [108]. It utilizes multiple perspectives to dissect the macro-level as well as the micro-level aspects of governance, as this is an essential tool in the entrepreneurial process [21] of any enterprise.
The objective of the chapter is to decipher what lies ahead at the intersection of machine learning techniques and corporate governance mechanisms. The chapter aims to provide a creative integration of corporate governance mechanisms with machine learning techniques in order to strengthen managerial capabilities and thereby competitive advantage. It looks at the areas wherein technology can provide the required core competencies by offering solutions that enhance accurate and effective managerial decision-making, reduce managers' opportunistic behaviour and thereby improve the firm's ability to handle different uncertainties in business.
[Figure: Analytics, Artificial Intelligence, Unsupervised Learning (original graphic not reproduced)]
All these approaches assume a separation of the machine from the mind [65]. They are as follows (a brief code sketch contrasting the supervised and unsupervised approaches appears after the list):
(1) Supervised learning: the most commonly used approach to date. It requires well-structured and labelled training data to train the algorithms behind AI-driven applications such as image recognition or translation.
(2) Reinforcement learning: based on the philosophy of trial and error and often used in board game simulations. Its major challenge lies in the large number of trial rounds required to achieve good results.
(3) Unsupervised learning: the most challenging but most promising approach, where the algorithms are designed to 'learn directly from unstructured data coming from their environments' [51].
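A minimal sketch of the contrast, assuming scikit-learn and a synthetic firm-level data set (the features and labels are invented for illustration; reinforcement learning is omitted since it requires an interactive environment):

```python
# Contrast of supervised and unsupervised learning on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # stand-ins for e.g. board size, independence, duality
y = (X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)  # labelled outcome

# Supervised: learn the mapping from labelled examples.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))

# Unsupervised: look for structure without any labels.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))
```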
Machine learning technologies are in vogue today and are being actively researched, though all the approaches fall under the purview of AI. It is also essential to understand how the scale (human intelligence) and the process of intelligence development (learning) compare between the mind and the machine. Both human and machine learning cycles assume that decisions are based on predictions of possible outcomes. Prediction takes information, called 'data', and uses it to generate information that we do not have [3].
So, in human learning, predictions are based entirely on input data; in machine learning, however, three types of data are assumed to exist (input, training and feedback data), and these play different roles in supervised, reinforcement and unsupervised learning [65]. The human and machine cycles are brought out in Fig. 19.2.
Data is an inherent feature of the decision-making on which machine learning techniques are based. Decision making is always about consciously choosing between two or more options. The options can be either binary (for example, yes or no) or multifaceted (for example, options 1, 2 and 3). The choice always depends on the
criteria chosen [65].
[Fig. 19.2 The human and machine learning cycles (labels: decision framing, decision sensing, decision outcome, judgement, input data, feedback, algorithm)]
Still et al. [116] have outlined how an informed decision usually follows a similar pattern, distinguishing between three phases, that is,
usually follows a similar pattern and thereby distinguish between three phases, that is,
conceptualization, information, and prediction as in Fig. 19.2. In order to recommend
use of technology or rather machine learning techniques on business decisions, Hilb
[65] advocates Stacey’s [115] taxonomy of four different types of decisions based
on degree of certainty and agreement as in Fig. 19.3.
Though machine learning techniques also rely on input data, however, training
data is crucial for supervised learning while feedback data is vital for reinforcement
and unsupervised learning. The challenge now is to apply such logic to decision-
making processes by firms by use of machine learning techniques. It is important to
intertwine the decision types with machine learning approaches to understand how
decision-making processes can work corporates. This is explained in Table 19.1.
In a nutshell, machine learning techniques offer potential for transparency, help to distinguish between causality and correlation, and provide an alternative to existing traditional research methods by delivering efficient, valid predictions given the complexity and volume of data in any domain. In recent times, machine learning has been widely used across sectors and fields like finance, accounting, healthcare, logistics, supply chain management and many more, where new data sets, especially big data, are available and a huge literature has grown around these domains.
[Fig. 19.3 Decision types by degree of certainty and agreement: common, complicated, complex and chaotic (after Stacey [115])]
Table 19.1 Machine learning approaches and decision types (adapted from Hilb [65])

| Approach | Decision type | Reason | Remarks |
|---|---|---|---|
| Supervised learning | For common decisions | Much less effective for other decision types like complicated, complex or chaotic | Given the need for relevant training data |
| Reinforcement learning | Effective in automating complicated decisions based on past routines | Relies heavily on trial and error and thus on feedback data | Not effective in handling complex or chaotic decisions |
| Unsupervised learning | For complex decisions | Provides clues | For chaotic decisions, it is difficult to rely on any known machine learning approach |
Machine learning is, in fact, a system of algorithms that can process large data sets, detect patterns, and improve its ability to analyze information over time and with more data, and such models are found to have higher predictive accuracy than statistical models [124]; a stylized out-of-sample comparison of this kind is sketched below. Its usage in corporate governance is limited, and this chapter aims to explore these new avenues of research methods.
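A minimal sketch of such a comparison, assuming scikit-learn and purely synthetic data with a nonlinear signal (so the numbers carry no empirical weight):

```python
# Cross-validated error of a linear model versus a tree ensemble.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = X[:, 0] * X[:, 1] + np.sin(X[:, 2]) + rng.normal(scale=0.3, size=500)

for name, est in [("linear regression", LinearRegression()),
                  ("gradient boosting", GradientBoostingRegressor(random_state=0))]:
    mse = -cross_val_score(est, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{name}: cross-validated MSE = {mse:.3f}")
```

On data with interactions and nonlinearities of this kind, the ensemble typically attains a lower cross-validated error, which is the sense in which such models can be more accurate than classical statistical baselines.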
Several definitions of corporate governance are highlighted by Vito and Trottier [124]. Thomsen [121] defined it as a 'system that is a composite of ownership, boards, incentives, company law, and other mechanisms.' Corporate governance definitions are, thus, closely tied to different paradigms, or ways of conceptualizing the organization or the firm.
Corporate governance rules and norms originate in, and are supported by, the legal institutions of the firm's home country. In prior studies, researchers have debated which factors best explain the diversity of corporate governance across countries. The studies involving international comparisons largely took place after
the 1990s. These included the influential works of La Porta et al. (abbreviated as
LLS) [72–74] and LLS with Vishny (abbreviated as LLSV) [75–79] and LLS with
Djankov [37]. These studies largely focused on the differences among the legal
systems of countries, various shareholders’ and creditors’ rights, and enforcement
systems [10]. These studies also inspired a large body of works on international
comparisons. Additionally, a substantial body of research showed that cross-firm
differences in governance have a substantial effect on firm values and performances
(such as [9, 57]). However, such works largely looked at the effects of governance
mechanisms on valuation and performances of the entire population of the firms.
The important aspect here is that irrespective of the definition used, researchers (see [54], for instance) have viewed corporate governance mechanisms as falling into two groups:
(a) Mechanisms external to firms and
(b) Mechanisms internal to firms.
Acharya et al. [1] argued that external governance (even if crude and uninformed)
and internal governance complement one another, to improve efficiency and to ensure
that firms have substantial values.
Firms have to deal with various entities in the external environment [50, 54] and
are required to operate under the legal and regulatory environments of the particular
country where they are located. National systems of corporate governance differ in
terms of their institutional arrangements and these differences shape the possibilities
for change or the diffusion of practices from one country to another [4]; comparative corporate governance has thereby been defined as 'the study of relationships between parties with a stake in the firm and how their influence on strategic corporate decision making is shaped by institutions in different countries'.
The influential works of LLSV [75–79] documented significant differences in the
levels of investor protection, ownership concentration, dividend policies, creditor
rights, and enforcement abilities across countries, thus, attesting to variations in
several country-level institutional parameters. The similarities and differences among
corporate governance practices at the national level cater to the ‘macro’ question and
the association with particular firm-level outcomes such as firm performances and
stock market returns addresses the ‘micro’ question.
At the national level, the governance environment matters for the size and the
extent of a country’s capital markets because good governance would protect poten-
tial investors from expropriation by entrepreneurs [75]. At the macro level, the corpo-
rate governance mechanisms are the economic and legal institutions that can be
altered through political processes, if required, for the betterment and scope of the
capital market. Additionally, Chen et al. [25] highlighted the criticality of research
from cross-cultural perspectives. At the micro level, corporate governance mecha-
nisms define the power-sharing relationships between investors and the founders of
the firms [57].
The institution theory-based works by Peng [96, 97] and Peng et al. [98] argued
for the institution-based view as the third leg of the strategy tripod, the other two
being industry organization and resource-based views. On the one hand, economists
(such as in the LLSV studies) mostly focused on formal laws, rules, and regulations, while sociologists (for instance, Meyer and Rowan [36, 91]) paid more attention to informal cultures, norms, and values; on the other hand, scholars such as North [95]
and Scott [110] supported a complementary view where the research on the impact
of institutions investigates both formal as well as informal components. The new
institutional perspective [126] also attempts to focus on social and legal norms and
rules that underlie economic activities.
Institutions are commonly known as the ‘rules of the game’ [98]. The more formal
institutions are defined as ‘the humanly devised constraints that structure human
interaction’ and ‘regulative, normative and cognitive structures and activities that
provide stability and meaning to social behaviour’ [36, 91, 95, 110]. Thus, institutions
are broadly divided into formal and informal components that are complementary
to each other. The remarkable consensus here is that ‘institutions matter’ as a core
proposition [98].
Culture is argued to be related to governance because the effectiveness of legal systems, their enforcement, and other formal institutions largely depend on cultural
factors [108]. In recent years, there has been growing recognition that culture affects
both economic exchange and outcomes by affecting expectations and preferences
[11]. Studies now show that perceptions rooted in culture are important determinants of the level of trust and the nature of financial contracting.
Culture is often defined as a system of shared values, beliefs, and attitudes
that influences individual perceptions and behaviours. The level of trust encour-
ages economic exchange, and this trust leads investors to invest in even the IPOs of
totally unknown firms. Trust also affects stock market participation, and these cultural
aspects have a significant impact on a wide variety of cross-border economic trans-
actions. Thus, following Hofstede [66, 67], culture is often conceptualized through
the construction of national averages, which are used to create something akin to the
personality profile of an “average person” in a society. These latent propensities of
individuals are then argued to assert some causal influence on the economic organi-
zation. The emphases on cultural values shape and justify individual and group beliefs, actions, and goals. External governance has an influence on network centralities,
social network structures, ties, strategic alliances and other contractual relations [85] that teach firms to better position themselves and their resources in markets for competitive advantage.
On a macro-level, national cultural practices influence the institutional environ-
ment, which in turn has an influence on corporate governance practices [32]. Culture
plays an indirect role in shaping corporate governance mechanisms. Institutional
arrangements and policies, norms, and everyday practices express the underlying
cultural value emphasis in societies [109].
In a nutshell, the external mechanisms protect all the stakeholders through the legal
system, the market for corporate control, the managerial labour market, monitoring
by institutional investors, and disciplinary measures arising from financial debt [124],
social network structures, network centrality and innovation [85].
Jensen and Meckling [69] suggested that equity ownership aids in aligning the interests of managers with those of the shareholders and in mitigating agency costs. Alignment refers to the effects of insider ownership and control refers to the
effects of outsider ownership [30]. Morck et al. [92] suggested managerial equity
ownership to be beneficial at lower levels but negative at higher levels, indicating
that as insider (retained) ownership increases, it affords managers greater power
that facilitates their entrenchment. Outside ownership in the form of institutional
holdings and block holdings are proposed as the solution to the agency problem [30].
These shareholders are better able to internalize the costs associated with monitoring
management. They have a general interest in profit maximization and enough control
over the assets of firms to have their interests respected [111].
Institutional investors: Institutional investors form one of the important groups
of investors in any firm. These can be either domestic or foreign in nature. It is
generally argued that institutional investors possess a lot of private information about the companies. They are seen as important players in any capital market across
the world. Institutional investors prefer large and liquid stocks with good corporate
governance practices, especially in countries where country-level investor protection
and the quality of institutions are weak [71]. However, the investment patterns of
these shareholders vary across time, between different countries, and also within the
same countries. Studies show that institutional investors have almost doubled their
investments in firms, leading to an increase in the prices of such firms.
Institutional investors can have their representatives sit on the Boards and monitor
the decision-making processes, can have voting rights, and can also monitor executive
compensation contracts [62], depending on the percentage of their shareholding in the company's equity. Institutional investors prefer to invest in firms with superior
past financial performance, lower volatility of share price, high trading liquidity,
larger size, longer listing history, better public funds utilization [22, 24, 59] and
others.
Kurshed et al. [71], in their study of a U.K. sample, found institutional ownership to be negatively related to directors' ownership and positively related to the composition of the Board of directors. They found that U.K. institutional investors prefer smaller firms and firms with smaller boards, shorter listing histories, and low trading liquidity; these findings contrast with the results of
the U.S. studies. Institutions play a monitoring role in mitigating agency problems
between shareholders and managers; they also influence, either positively or nega-
tively, compensation structures through their preferences [62]. Institutional investors
have influence on firm performances, signal firm’s prestige and quality and are one
of the important corporate governance mechanisms who can facilitate in monitoring
the decision-making processes.
Retained ownership: Retained ownership in firms is one of the key indicators
of the control of owners and managers of a firm. High concentration of retained
ownership mitigates various types of agency conflicts. Ownership concentration is,
therefore, an important governance parameter that enhances the firm’s performances
and reduces the chances of funds raising discounts arising from agency conflicts. One
of the key decisions that the firm’s owners and managers control in any initial public
offering is what percentage of the firm to sell [93]. This can act as an important signalling mechanism, allowing investors to assess the reduction in the risks they foresee. Higher retained ownership helps to increase the level of confidence, builds faith among investors, and also helps to reduce agency costs. It also sends signals
about the owners’ confidence in the future prospects of the firm.
Retained ownership pattern is an important governance structure that influences
the firm’s performance in relation to the stated intentions of its public offerings [22].
Since high ownership retention helps in mitigating agency conflicts and investors
view such firms positively [102], promoters seek to have efficient decision-making
processes and also aspire to utilize these funds in order to retain confidence and the
positive signals about the firm in the market.
The Board of directors has been viewed as the heart of corporate governance. The
directors are elected by the shareholders and have a fiduciary obligation towards
them. They are also responsible for providing strategic directions on investments and
financing decisions and for monitoring the management of the firm [30, 54]. Boards
consist of a mix of inside and outside directors. Inside directors are officers of the firm
and possess intimate knowledge about the firm's activities. Outside directors have no substantive relationship with the firm; they owe their position on the Board to the specific expertise they possess in areas significant to the firm, and they are independent in nature. There is no consensus that the differences in
Board independence result in improved corporate performance [30]. Whether the
Board is truly independent in nature is still debated. However, Dalton et al. [30] argued
that resource dependence [100] or a resource-based perspective [7, 125] values the
networking potential and innovation diffusion potential of the Board’s independence
while the agency perspective considers independence to be a threat given that such
directors also serve on the Boards of other firms.
Studies on Board size and composition have produced mixed evidence of the
relationship between Board composition and corporate financial performance [30].
The leadership structure of the Board has been another important element and there is
a large body of literature dedicated to CEO duality, and the nature of the chairperson
of the Board—whether independent or from the founding family of the firm. The
stewardship theory believes the unity of command in CEO duality to be beneficial to the firm; from an agency perspective, however, issues related to the separation of the CEO from the Board chairperson remain unsettled [30].
Family-managed and non-family-managed firms: The composition of the Board,
given the ownership structure of the firm, is an important governance mechanism for
any company. When members of a family or families are the key members on the Board, the firm can be either owned and managed by the family members or managed by non-family members. This separation of ownership and control lies at the core
of agency theory and has been the subject of numerous debates and studies. Family-
managed and non-family-managed firms continue to attract interest as composition
of the Board is an internal corporate governance mechanism in any public fund-
raising context. Family-owned and family-managed firms and non-family-managed
firms have different impacts on the long-run performance of the firms.
McConaughy et al. [89] found that family-controlled firms have greater market
value and operate more efficiently than other firms because the costs of monitoring
are less in such firms. The choice between concentrated or dispersed ownership and votes is determined by the size of private control benefits [42].
In India, the majority of companies continue to have a large number of family
members on their Boards. The owners of family firms generally rely exclusively on
family members because they find it difficult to delegate to outsiders, have insuffi-
cient knowledge of formal management techniques, fear losing control, or believe
that professionalization comes at unnecessary costs. In turn, non-family managers
decide to stay away from family firms because they are likely to offer outsiders
limited potential for professional growth, restrict their roles to that of a tutor, coun-
sellor, or confidant, and exclude them from succession [35]. Studies have shown
that non-family managers play an important role and have a positive impact on firm performance due to their formal business training and experience, their cultural competence, and their freedom from emotional ties to the family and the firm.
In a family-managed firm, the founding family has higher returns and private
benefits than the other large shareholders [42]. In addition to this, the founding
family also has cash flow rights and voting rights that are used in conjunction in
a controlled shareholder structure (as in family-managed firms) but not otherwise
[8]. As the family members managing such firms continue to exercise considerable control over them, they may also engage in opportunistic behaviour, even while trying to operate within the ethical boundaries of social contracts, and thereby influence the monitoring and decision-making processes.
CEO duality: An important internal governance mechanism is CEO duality, where
one person plays two roles—one as the Chief Executive Officer (CEO) of the firm
and the other as the Chairperson of the Board of directors. Non-duality implies
that different individuals serve as the CEO and the Chairperson [6]. CEO duality
[48] is one of the main factors in corporate governance that has been extensively
debated across the world. In a study using samples from the U.S., the U.K., and
Japan, Dalton and Kesner [31] found that 30% of the U.K. companies in the sample
had CEO duality, while 82% of the U.S. firms in the sample had CEO duality. With
subsequent changes in the regulations, the U.S. firms are adopting non-duality, but
these figures are still low when compared to the U.K. In the U.K., the Combined Code
recommends having different individuals as the CEO and the Chairman. In India,
there was no prohibition on CEO duality and the decision was left to the firms; however, SEBI now mandates CEO non-duality for listed firms.
Most of the Anglo-Saxon countries use a one-tier system of Board structure.
Therefore, the results of the studies in these contexts are mixed, with some coun-
tries choosing unitary leadership structures (CEO duality) and some other countries
opting for dual leadership structures with a separation of the two jobs. The agency theory treats CEO duality as undesirable, as it would lead to a lack of independence and vigilance, more agency problems, and thus poor performance. This theory postulates that CEO duality constrains Board independence and promotes CEO entrenchment; a centralized leadership authority leads to management dominance and poor financial performance.
The stewardship theory proposes that CEO duality works towards the unity of
command at top and avoids confusion regarding who the head is (the CEO or the
Chairperson); this enables timely decision making, thus positively impacting the
firm’s performance [99]. This theory postulates that when there is CEO duality, firms
reap a number of benefits since the potential for conflicts between the CEO and the
Chairman is eliminated due to the unified company leadership, thus making way for
smoother, more effective, consistent strategic decision-making and implementation
[23]. Boyd [13] provided partial support for both the agency as well as stewardship
perspectives. Several studies addressed the CEO duality-performance relationship
but reported inconsistent results.
Baliga et al. [6] studied CEO duality and the performance of the firms with Fortune
500 companies as the sample for the period 1980–1991 and found weak support for
a link between the two. Peng et al. [99] studied CEO duality and firm performance
during institutional transitions in China and found strong support for the steward-
ship theory and relatively less support for the agency theory. Elsayed [44] focussed
on a sample of Egyptian listed firms and found that CEO duality had no impact on corporate performance; the results supported both the agency and the stewardship theories when an interaction term between industry type and CEO duality was introduced.
samples from Indonesia, Malaysia, South Korea, and Thailand and found a negative
moderating effect of Board size on the positive relationship between CEO duality
and firm performance. Iyengar and Zampelli [68] did not find evidence to support the
contention that CEO duality is a structure that is purposefully chosen for optimizing
performance. In line with the arguments of the agency theory, CEO duality would
result in the concentration of excess power, which could prevent adequate monitoring
[39] and encourage opportunistic behaviour.
Size, diversity, composition, and Chair of the Board: Board size, diversity and
composition are the most important internal governance mechanisms that send
signals to investors about the firm’s quality and prestige. Certo [20] suggested that
the investors’ perceptions of Board prestige signal organizational legitimacy, thereby
reducing the liability of market newness and improving firm’s stock performance.
Larger board size delays the decision-making processes due to higher levels of uncertainty and coordination problems. This leads to difficulties in reaching consensus speedily, in turn reducing timeliness on various fronts [63], and triggers free-riding issues [104] among the board members. Smaller boards are easier to manage and are capable of taking quick decisions [39].
A diverse board provides diversity of thought and perspective. A commonly studied measure of board diversity is the number or proportion of women directors on boards, and there have been mandatory requirements about the inclusion of women on corporate boards. Various studies, with mixed results, have related board diversity to firm performance [47, 52, 86], board effectiveness [2], firm value [18, 86], earnings reporting quality [114], stock price informativeness [61], agency costs [53], and educational levels and independence [123]; a simple computation of two such diversity measures is sketched below.
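A minimal sketch of two standard operationalizations, assuming a hypothetical eight-member board; the Blau index, 1 - Σ p_i², is a heterogeneity measure that peaks at an even split:

```python
# Proportion of women and Blau heterogeneity index for a (hypothetical) board.
from collections import Counter

def blau(categories) -> float:
    """Blau index: 1 minus the sum of squared category shares."""
    n = len(categories)
    return 1.0 - sum((c / n) ** 2 for c in Counter(categories).values())

board = ["M", "M", "F", "M", "F", "M", "M", "M"]  # 8 directors, 2 women
print("proportion of women:", board.count("F") / len(board))  # 0.25
print("Blau index:", round(blau(board), 3))                   # 0.375
```

The Blau index generalizes naturally to other diversity attributes studied in this literature, such as nationality or educational background.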
Further, the Board composition—in terms of the proportion of inside and outside
(independent) directors—is important in the context of the regulatory requirements.
Independent directors bring with them diverse experience and expertise that also signal the firm's prestige and quality. Li and Naughton [84] used a sample of Chinese firms and argued that the higher the proportion of independent directors on the Board, the better the firm's long-term performance, because these independent directors have more incentives to work in the interests of stockholders, thereby reducing information asymmetry.
The Chair of the Board can be either from the founding family or an indepen-
dent person unconnected with the founding family. An Executive Chair (being from the founding family) would wield more executive power and can exhibit more opportunistic behaviour, even while trying to operate within the ethical boundaries of social contracts, thus influencing the decision-making processes and the decision control of the Board more than a Non-executive Chair would. Various studies
in this context also highlight mixed results with respect to the firm performance and
value to shareholders.
From the definition of corporate governance through its external and internal mechanisms, there has been no consensus, and different studies across the world have shown mixed, though at times significant, results. This creates confusion and anomalies in relying on the outcomes of these studies in the field of corporate governance, thereby warranting the use of technology in the form of machine learning to innovate common solutions for managers.
The extant literature on both machine learning (a part of artificial intelligence) and corporate governance suggests that studies were usually done independently in each of these domains by researchers from economics, entrepreneurship, finance, law, management, operations and accounting backgrounds. The machine learning literature is relatively new and charts a new path towards novel research methods, given the extensive digitalization generating big datasets. It is a challenge to understand the solutions buried in such complex data, and machines extend human intelligence in finding resolutions to such mysteries.
Further, the governance literature highlights the importance and impact of gover-
nance structures and mechanisms that produce values for their shareholders and also
protect their rights. One of the reasons for this could be the changes in the legal
and economic structures of firms especially when they go public [40]. The regula-
tory compliances ensure that the systems of control are embedded in the operations
of the companies and are part of their daily cultures. As long as the company is
privately managed with no outside investors or major stakeholders, corporate gover-
nance requirements are not very strictly followed. However, when the company
seeks to offer its shares to the public and seeks stock market listing, detailed corpo-
rate governance mechanisms come into force for the first time. Existing shareholders
who want to sell their stock and prospective investors who want to buy stock have
the marketplace as the ultimate valuing mechanism to determine the final outcome
[102].
Among the most important contributors to corporate governance structures are the investors, because they provide finance when companies bring out their maiden public issues; therefore, the need to reflect the owners' or shareholders' desires has
typically been the focus of debate regarding corporate governance reforms. The
protection of investors from agency risks resulting from the separation of ownership
and control [69] has been the central preserve of corporate governance recommenda-
tions throughout the world [16]. Following a public issue, in addition to maintaining
the corporate governance requirements, the company has to balance many competing
considerations of and obligations to different classes of stakeholders (such as share-
holders, employees, customers, suppliers, creditors, and others) as well as handle
the wider social responsibilities to the communities in which they operate. Thus,
the firm needs to cater to external (legal and regulatory requirements) and internal
(ownership and board structures) governance mechanisms.
Given that studies on governance mechanisms have been mixed, with no harmony in research across varied concepts, it is time to explore more advanced methodologies for more harmonious results and conclusions. It is at this juncture that advanced technologies like machine learning can prove a great blessing for digging up the nuances in the complexities of external and internal governance mechanisms in organizations, whose interplay makes decision-making activities ever more demanding and complicated.
At such a confluence, intelligent machines can come to the rescue, wherein an integrated perspective can be used to achieve the desirability, feasibility and responsibility tripod [65] in understanding the various decision-making activities arising from corporate governance mechanisms. Some of the essential functions of the board are strategy formulation, policy making, executive supervision, accountability, transparency, disclosure and others [49], or, identified generically, the roles of supervisor, co-creator and supporter [27]. These functions are in fact decision-making processes and require a proper understanding of various input data relevant to the operations of the firm.
As firms grow in size and with digitalization in place, the data generated usually remains in unstructured formats and in freer forms like digitized text, videos, audios, photographs and others. All these data are useful as information and for prediction and decision-making processes. Data from social media can also offer new perspectives on governance, as it includes the views of multiple stakeholders [124]. Researchers also want to go beyond primary survey data by substituting it with archival data [14], behavioural experiments and field studies [19, 81]. Given such large and new data sets and computational power, the importance of machine learning techniques cannot be ignored in the corporate governance literature. Though the intervention of such machine learning techniques is rather belated, as is rightly said, 'better late than never'.
Such solutions can help bring consensus to concepts where the governance literature has produced divided or one-sided results. Thereby, a need arises for competition and complementarity between mind and machine for efficiency and effectiveness. Hilb [64, 65] argues for a superior combination of man and machine, with such synergic intelligence distinguished into five scenarios: assisted, augmented, amplified, autonomous and autopoietic intelligence. These are discussed in Table 19.2.
The use of such synergic intelligence, when applied to corporate governance, can influence board practices such as direction, control and power, and other matters like compensation, compliance, misuse and obligations. This can be achieved in various ways, such as automated reporting processes, real-time data provision, and the use of predictive models for valid scenarios and superior simulations. In the long run, there might be scenarios where machines teach machines and human intervention may no longer be required [65].
To date, there are studies in which machine learning techniques and computational linguistics are used for accounting, tax evasion, fraud detection and
bankruptcy prediction in the finance sector (see, for example, [43, 117, 127]). Such novel methods provide insights into meaningful economic effects. The same can also be extended to the less explored corporate governance domain. Text analyses of information such as annual reports, transcribed meetings and conference calls, and the extraction of board members' names from various public documents, can be used to create directors' network analyses through techniques such as named entity recognition [43, 124]; a minimal sketch of this idea follows. Recent studies also indicate the inclusion of corporate governance variables like sustainability [103], reports and proxy statements [129], and directors' selection [46] in machine learning analyses like text analysis, text mining, semantic networks and decision tree algorithms [28].
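A minimal sketch of the director-network idea, assuming spaCy (with its small English model installed) and networkx; the filing excerpts and names are invented for illustration:

```python
# Extract person names from filing text with off-the-shelf NER and link
# directors who co-appear in the same document.
import itertools
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed to be installed

filings = [  # hypothetical excerpts from annual reports
    "Directors Jane Smith and Rahul Mehta attended all board meetings.",
    "Jane Smith chairs the audit committee alongside Wei Chen.",
]

G = nx.Graph()
for text in filings:
    names = {ent.text for ent in nlp(text).ents if ent.label_ == "PERSON"}
    # Co-appearance in one document creates (or strengthens) a tie.
    for a, b in itertools.combinations(sorted(names), 2):
        weight = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=weight + 1)

print(nx.degree_centrality(G))  # a simple per-director centrality measure
```

In practice, the extracted names would be disambiguated against a director database before network measures such as centrality are related to governance outcomes.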
The recent and growing literature on machine learning and corporate governance mechanisms provides tremendous scope for future academicians, researchers, managers, practitioners and policy makers. The emergence of machine learning techniques provides opportunities and challenges around the adoption of these new avenues, which shall further shape managerial power and network structures and influence firms' strategy, decision-making and operational efficiency. Future
scope of studies can explore creative research methods, study variables and other parameters to integrate insights from internal and external governance mechanisms. Future work should motivate managers and practitioners, with their rich experience and professional knowledge, towards better decision-making in highly complex and competitive environments by understanding and making sense of the results and patterns generated through machine learning techniques. Policy makers and governments must take positive steps to promote reforms by strengthening administrative and regulatory measures around the use of big data while safeguarding corporate interests. Corporate governance is a broad term, and with machine learning techniques there is immense scope for future work, as these technologies help break through the bounded rationality of humans and broaden the scope for understanding managers' true motivations and decision-making behaviours.
References
1. Acharya, V., Myers, S., Rajan, R.: The internal governance of firms. J. Financ. 66(3), 689–720
(2011)
2. Adams, R., Ferreira, D.: Women in the boardroom and their impact on governance and
performance. J. Financ. Econ. 94(2), 291–309 (2009)
3. Agrawal, A., Gans, J., Goldfarb, A.: Prediction machines: the simple economics of artificial
intelligence, vol. Spring. Harvard Business Press, Boston (2018)
4. Aguilera, R., Jackson, G.: Comparative and international corporate governance. Acad. Manag.
Ann. 4(1), 485–556 (2010)
5. Armour, J., Eidenmueller, H.: Self-driving corporations? ECGI Working Paper Series in Law
(2019).
6. Baliga, B., Moyer, R., Rao, R.: CEO duality and firm performance: What’s the fuss? Strateg.
Manag. J. 17(1), 41–53 (1996)
7. Barney, J.: Firm resources and sustained competitive advantage. J. Manag. 17(1), 99–120
(1991)
8. Bebchuk, L.: A rent protection theory of corporate ownership and control. Working Paper,
Harvard University (1999)
9. Bebchuk, L., Cohen, A., Ferrell, A.: What matters in corporate governance? Working Paper,
Harvard Law School (2004)
10. Bebchuk, L., Weisbach, M.: The state of corporate governance research. Rev. Financ. Stud.
23(3), 939–961 (2010)
11. Bell, R., Moore, C., Filatotchev, I.: Strategic and institutional effects of foreign IPO perfor-
mance: examining the impact of country of origin, corporate governance and host country
effects. J. Bus. Ventur. 27(2), 197–216 (2012)
12. Beyer, A., Cohen, D., Lys, T., Walthe, B.: The financial reporting environment: review of the
recent literature. J. Account. Econ. 50(2–3), 296–343 (2010)
13. Boyd, B.: CEO duality and firm performance: a contingency model. Strateg. Manag. J. 16(4),
301–312 (1995)
14. Boyd, B., Adams, R., Gove, S.: Research methodology of governance studies: challenges and
opportunities. Corp. Govern. (Oxford) 25(6), 382–383 (2017)
15. Bozec, R., Bozec, Y.: The use of governance indexes in the governance-performance
relationship literature: International evidence. Can. J. Admin. Sci. 29(1), 79–98 (2012)
16. Burton, B., Helliar, C., Power, D.: The role of corporate governance in IPO process: a note.
Corp. Govern. Int. Rev. 12(3), 353–360 (2004)
17. Cadbury, A.: Code of best practice: report of the committee on the financial aspects of corporate
governance. Gee and Co, London (1992)
18. Campbell, K., Mínguez-Vera, A.: Gender diversity in the boardroom and firm financial
performance. J. Bus. Ethics 83(3), 435–451 (2008)
19. Carcello, J., Hermanson, D., Ye, Z.: Corporate governance research in accounting and auditing:
Insights, practice implications, and future research directions. Audit. J. Pract. Theory 30(3),
1–31 (2011)
20. Certo, S.: Influencing initial public offering investors with prestige: signaling with board
structures. Acad. Manag. Rev. 28(3), 432–446 (2003)
21. Certo, S., Covin, J., Daily, C., Dalton, D.: Wealth and effects of founder management among
IPO-stage new ventures. Strateg. Manag. J. 22(6/7), 641–658 (2001)
22. Certo, S., Holcomb, T., Homes, M.: IPO research in management and entrepreneurship:
moving the agenda forward. J. Manag. 35(6), 1340–1378 (2009)
23. Chahine, S., Tohme, N.: Is CEO duality always negative? An exploration of CEO duality and
ownership structure in the Arab IPO context. Corp. Govern. Int. Rev. 17(2), 123–141 (2009)
24. Chemmanur, T., Hu, G., Huang, J.: The role of institutional investors in initial public offerings.
Rev. Financ. Stud. 23(12), 4496–4540 (2010)
25. Chen, Y., Leung, K., Chen, C.: Bringing national culture to the table: making a difference
with cross-cultural differences and perspectives. Acad. Manag. Ann. 3(1), 217–249 (2009)
26. Cheung, Y., Stouraitis, A., Tan, W.: Does the quality of corporate governance affect firm
valuation and risk? Evidence from a corporate governance scorecard in Hong Kong. Int. Rev.
Financ. 10(4), 403–432 (2010)
27. Cossin, D., Metayer, E.: How strategic is your board? MIT Sloan Business Review, Cambridge
(2014)
28. Creamer, G., Freund, Y.: Learning a board balanced scorecard to improve corporate
performance. Decis. Support Syst. 49(4), 365–385 (2010)
29. Cucari, N.: Qualitative comparative analysis in corporate governance research: a systematic
literature review of applications. Corp. Govern. Int. J. Bus. Soc. 19(4), 717–734 (2019)
30. Dalton, D., Hitt, M., Certo, S., Dalton, C.: The fundamental agency problems and its
mitigation. Acad. Manag. Ann. 1(1), 1–64 (2007)
31. Dalton, D., Kesner, I.: Composition and CEO duality in boards of directors: an international
perspective. J. Int. Bus. Stud. 18(3), 33–42 (1987)
32. Daniel, S., Cieslewicz, J., Pourjalali, H.: The impact of national economic culture and country-
level institutional environment on corporate governance practices: theory and empirical
evidence. Manag. Int. Rev. 52(3), 365–394 (2012)
33. Davis, J., Schoorman, F., Donaldson, L.: Toward a stewardship theory of management. Acad.
Manag. Rev. 22(1), 20–47 (1997)
34. Davenport, T.H., Ronanki, R.: Artificial intelligence for the real world. Harvard Bus. Rev.
96(1), 108–116 (2018)
35. Dawson, A.: Private equity investment decisions in family firms: the role of human resources
and agency costs. J. Bus. Ventur. 26(2), 189–199 (2011)
36. DiMaggio, P., Powell, W.: The iron cage revisited: institutional isomorphism and collective rationality in organizational fields. Am. Sociol. Rev. 48(2), 147–160 (1983)
37. Djankov, S., La Porta, R., Lopez-de-Silanes, F., Shleifer, A.: The law and economics of
self-dealing. J. Financ. Econ. 88(3), 430–465 (2008)
38. Donaldson, T., Dunfee, T.: Toward a unified conception of business ethics: Integrative social
contracts theory. Acad. Manag. Rev. 19(2), 252–284 (1994)
39. Dowell, G., Shackell, M., Stuart, N.: Boards, CEOs, and surviving a financial crisis: evidence
from the internet shakeout. Strateg. Manag. J. 32(10), 1025–1045 (2011)
40. Draho, J.: The IPO decision: why and how companies go public? Edward Elgar Publishing
Limited, Cheltenham (2004)
41. Drucker, P.: The manager and the moron. McKinsey Q 3(4), 42–52 (1967)
42. Ehrhardt, O., Nowak, E.: The effect of IPOs on German family-owned firms: governance
changes, ownership structure, and performance. J. Small Bus. Manage. 41(2), 222–232 (2003)
43. El-Haj, M., Rayson, P., Walker, M., Young, S., Simaki, V.: In search of meaning: lessons,
resources and next steps for computational analysis of financial discourse. J. Bus. Financ.
Acc. 46(3–4), 265–306 (2019)
44. Elsayed, K.: Does CEO duality really affect corporate performance? Corp. Govern. Int. Rev.
15(6), 1203–1214 (2007)
45. Endrikat, J., De Villiers, C., Guenther, T., Guenther, E.: Board characteristics and corporate
social responsibility: a meta-analytic investigation. Bus. Soc. 60(8), 2099–2135 (2021)
46. Erel, I., Stern, L., Tan, C., Weisbach, M.: Selecting directors using machine learning. Rev.
Financ. Stud. 34(7), 3226–3264 (2021)
47. Erhardt, N., Werbel, J., Shrader, C.: Board of director diversity and firm financial performance. Corp. Govern. Int. Rev. 11(2), 102–111 (2003)
48. Fama, E., Jensen, M.: Separation of ownership and control. J. Law Econ. 26(2), 301–325
(1983)
49. Fernando, A., Muraleedharan, K., Satheesh, E.: Corporate governance: principles, policies
and practices. Pearson India Education Service Pvt Ltd., Bengaluru (2017)
50. Filatotchev, I., Nakajima, C.: Internal and external corporate governance: an interface between
an organization and its environment. Br. J. Manag. 21(3), 591–606 (2010)
51. Ford, M.: Architects of intelligence: the truth about AI from the people building it. Packt Publishing, Birmingham (2018)
52. Francoeur, C., Labelle, R., Sinclair-Desgagné, B.: Gender diversity in corporate governance
and top management. J. Bus. Ethics 81(1), 83–95 (2008)
53. Garanina, T., Kaikova, E.: Corporate governance mechanisms and agency costs: cross country
analysis. Corp. Gov. 16(2), 347–360 (2016)
54. Gillan, S.: Recent developments in corporate governance: an overview. J. Corp. Finan. 12(3),
381–402 (2006)
55. Gillan, S., Starks, L.: A survey of shareholder activism: motivation and empirical evidence.
Contemp. Financ. Digest. 2(3), 10–34 (1998)
56. Gitundu, E., Kiprop, S., Kibet, L., Kisaka, S.: Corporate governance and financial perfor-
mance: a literature review of measurements and econometric methods of data analysis in
research. Corp. Govern. 7(14), 116–125 (2016)
57. Gompers, P., Ishii, J., Metrick, A.: Corporate governance and equity prices. Q. J. Econ. 118(1),
107–155 (2003)
58. Gompers, P., Ishii, J., Metrick, A.: Incentives versus control: an analysis of US dual-class
companies. National Bureau of Economic Research, Cambridge, MA (2004)
59. Gompers, P., Metrick, A.: Institutional investors and equity prices. Q. J. Econ. 116(1), 229–259
(2001)
60. Gonzales-Bustos, J., Hernandez-Lara, A.: Corporate governance and innovation: a systematic
literature review. Corp. Ownersh. Control. 13(2), 33–45 (2016)
61. Gul, F., Srinidhi, B., Ng, A.: Does board gender diversity improve the informativeness of stock prices? J. Account. Econ. 51(3), 314–338 (2011)
62. Hartzell, J., Starks, L.: Institutional investors and executive compensation. J. Financ. 58(6),
2351–2374 (2003)
63. Hermalin, B., Weisbach, M.: Boards of directors as an endogenously determined institution: a
survey of the economic literature. NBER Working Paper Series, Working Paper 8161. http://
www.nber.org/papers/w8161 (2001)
64. Hilb, M.: Unlocking the board’s data value challenge. Directorship 60–61 (2019)
65. Hilb, M.: Towards artificial intelligence? The role of artificial intelligence in shaping the
future of corporate governance. J. Manage. Govern. 24, 851–870 (2020)
66. Hofstede, G.: Culture’s consequences: international differences in work-related values. Sage,
Beverly Hills, CA (1980)
67. Hofstede, G.: Culture’s consequences: comparing values, behaviors, institutions, and organi-
zations across nations, 2nd edn. Sage, Beverly Hills, CA (2001)
68. Iyengar, R., Zampelli, E.: Self-selection, endogeneity, and the relationship between CEO
duality and firm performance. Strateg. Manag. J. 30(10), 1092–1112 (2009)
69. Jensen, M., Meckling, W.: The theory of the firm: managerial behavior, agency costs and
ownership structure. J. Financ. Econ. 3(4), 305–360 (1976)
70. Kovermann, J., Velte, P.: The impact of corporate governance on corporate tax avoidance—a
literature review. J. Int. Account. Audit. Tax. 36, 100270 (2019)
71. Kurshed, A., Lin, S., Wang, M.: Institutional block-holdings of UK firms: Do corporate
governance mechanisms matter? Eur. J. Financ. 17(2), 133–152 (2011)
72. La Porta, R., Lopez-de-Silanes, F., Shleifer, A.: Corporate ownership around the world. J.
Financ. 54(2), 471–517 (1999)
73. La Porta, R., Lopez-de-Silanes, F., Shleifer, A.: What works in securities laws? J. Financ.
61(1), 1–32 (2006)
74. La Porta, R., Lopez-de-Silanes, F., Shleifer, A.: The economic consequences of legal origins. J. Econ. Lit. 46(2), 285–332 (2008)
75. La Porta, R., Lopez-de-Silanes, F., Shleifer, A., Vishny, R.: Legal determinants of external
finance. J. Financ. 52(3), 1131–1150 (1997)
76. La Porta, R., Lopez-de-Silanes, F., Shleifer, A., Vishny, R.: Law and finance. J. Polit. Econ.
106(6), 1113–1155 (1998)
77. La Porta, R., Lopez-de-Silanes, F., Shleifer, A., Vishny, R.: Agency problems and dividend
policies around the world. J. Financ. 55(1), 1–33 (2000)
78. La Porta, R., Lopez-de-Silanes, F., Shleifer, A., Vishny, R.: Investor protection and corporate
governance. J. Financ. Econ. 58(1–2), 3–27 (2000)
79. La Porta, R., Lopez-de-Silanes, F., Shleifer, A., Vishny, R.: Investor protection and corporate
valuation. J. Financ. 57(3), 1147–1170 (2002)
19 Machine Learning Techniques for Corporate Governance 429
80. Larcker, D., Richardson, S., Tuna, I.: Corporate governance, accounting outcomes, and
organizational performance. Account. Rev. 82(4), 963–1008 (2007)
81. Larraza-Kintana, M., Wiseman, R., Gomez-Mejia, L., Welbourne, T.: Disentangling compen-
sation and employment risks using the behavioural agency model. Strateg. Manag. J. 28(10),
1001–1019 (2007)
82. Leland, H., Pyle, D.: Informational asymmetries, financial structure and financial intermedi-
ation. J. Financ. 32(2), 371–387 (1977)
83. Leung, K., Bhagat, R., Buchan, N., Erez, M., Gibson, C.: Culture and international business:
recent advances and their implications for future research. J. Int. Bus. Stud. 36(4), 357–378
(2005)
84. Li, L., Naughton, T.: Going public with good governance: evidence from China. Corp. Govern.
Int. Rev. 15(6), 1190–1202 (2007)
85. Lin, R., Xie, Z., Hao, Y., Wang, J.: Improving high-tech enterprise innovation in big data
environment: a combinative view of internal and external governance. Int. J. Inf. Manage. 50,
575–585 (2020)
86. Mahadeo, J., Soobaroyen, T., Hanuman, V.: Board composition and financial performance:
uncovering the effects of diversity in an emerging economy. J. Bus. Ethics 105(3), 375–388
(2012)
87. Marsh, H.: Can man ever build a mind? Financial Times, London (2019)
88. Martikainen, M., Miihkinen, A., Watson, L.: Board characteristics and disclosure tone (2019).
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3410036. Accessed 3 Oct 2023
89. McConaughy, D., Mathews, C., Fialko, A.: Founding family-controlled firms: performance,
risk and value. J. Small Bus. Manage. 39(1), 31–49 (2001)
90. McNulty, T., Zattoni, A., Douglas, T.: Developing corporate governance research through
qualitative methods: a review of previous studies. Corp. Govern. Int. Rev. 21(2), 183–198
(2013)
91. Meyer, J., Rowan, B.: Institutionalized organizations: formal structure as myth and ceremony.
Am. J. Sociol. 83(2), 340–363 (1977)
92. Morck, R., Shleifer, A., Vishny, R.: Management ownership and market valuation: an empirical
analysis. J. Financ. Econ. 20(1–2), 293–315 (1988)
93. Nelson, T.: The persistence of founder influence: management, ownership and performance
effects at initial public offering. Strateg. Manag. J. 24(8), 707–724 (2003)
94. Nilsson, N.: The quest for artificial intelligence—a history of ideas and achievements.
Cambridge University Press, Cambridge (2010)
95. North, D.: Institutions, institutional change, and economic performance. Harvard University
Press, Cambridge, MA (1990)
96. Peng, M.: Towards an institution-based view of business strategy. Asia Pac. J. Manage. 19(2/
3), 251–267 (2002)
97. Peng, M.: Institutional transitions and strategic choices. Acad. Manag. Rev. 28(2), 275–296
(2003)
98. Peng, M., Sun, S., Pinkham, B., Chen, H.: The institution-based view as a third leg for a
strategy tripod. Acad. Manag. Perspect. 23(3), 63–81 (2009)
99. Peng, M., Zhang, S., Li, X.: CEO duality and firm performance during China’s institutional
transitions. Manag. Organ. Rev. 3(2), 205–225 (2007)
100. Pfeffer, J., Salancik, C.: The external control of organizations: a resource dependence
perspective. Harper & Row, New York (1978)
101. Pintea, M., Fulop, M.: Literature review on corporate governance–firm performance relation-
ship. Ann. Facul. Econ. 1(1), 846–854 (2015)
102. Prasad, D., Vozikis, G., Bruton, G., Merikas, A.: “Harvesting” through initial public offerings
(IPOs): the implications of underpricing for the small firm. Entrep. Theory Pract. 20(2), 31–41
(1995)
103. Raghupathi, V., Ren, J., Raghupathi, W.: Identifying corporate sustainability issues by
analyzing shareholder resolutions: a machine-learning text analytics approach. Sustainability
12(11), 4753 (2020)
430 D. Gupta
104. Ramdani, D., Witteloostuijn, A.: The impact of board independence and CEO duality on
firm performance: a quantile regression analysis for Indonesia, Malaysia, South Korea and
Thailand. Br. J. Manag. 21(3), 607–626 (2010)
105. Russell, S., Norvig, P.: Artificial intelligence: a modern approach, 3rd edn. Prentice Hall,
Upper Saddle River (2016)
106. Samanta, N.: Convergence to shareholder holder primacy corporate governance: evidence
from a leximetric analysis of the evolution of corporate governance regulations in 21 countries,
1995–2014. Corp. Gov. 19(5), 849–883 (2019)
107. Sanad, Z., Shiwakoti, R., Kukreja, G.: The role of corporate governance in mitigating real
earnings management: literature review. In: Annual PwR Doctoral Symposium 2018–2019,
pp. 173–87. KnE Social Sciences, Manama (2019)
108. Sarkar, J., Sarkar, S.: Corporate governance in India. Sage Publications, New Delhi (2012)
109. Schwartz, S.: A theory of cultural value orientations: explication and applications. Comp.
Sociol. 5(2–3), 137–182 (2006)
110. Scott, W.: Institutions and organizations. Sage, Thousand Oaks, CA (1995)
111. Shleifer, A., Vishny, R.: Large shareholders and corporate control. J. Polit. Econ. 94(3),
461–488 (1986)
112. Shleifer, A., Vishny, R.: A survey of corporate governance. J. Financ. 52(2), 737–783 (1997)
113. Shukla, H., Limbasiya, N.: Board effectiveness: an evaluation based on corporate governance
score. Int. J. Bus. Ethics Dev. Econ. 4(1), 40–48 (2015)
114. Srinidhi, B., Gul, F., Tsui, J.: Female directors and earnings quality. Contemp. Account. Res.
28(5), 1610–1644 (2011)
115. Stacey, R.: Managing the unknowable: the strategic boundaries between order and chaos.
Jossey-Bass, London (1992)
116. Still, R., Cundiff, E., Govoni, N.: Sales management: decisions, policies, and cases. Prentice-
Hall, Englewood Cliffs (1958)
117. Tang, X., Li, S., Tan, M., Shi, W.: Incorporating textual and management factors into financial
distress prediction: a comparative study of machine learning methods. J. Forecast. 39(5),
769–787 (2020)
118. Tegmark, M.: Life 3.0. Allen Lane, London (2017)
119. The Companies Act, 1956. www.mca.gov.in
120. The Companies Act, 2013. www.mca.gov.in
121. Thomsen, S.: An introduction to corporate governance: mechanisms and systems. Djof
Publishing, Copenhagen (2008)
122. Toksal, A.: The impact of corporate governance on shareholder value. Doctoral dissertation,
Universität zu Köln (2004)
123. Toumi, N., Benkraiem, R., Hamrouni, A.: Board director disciplinary and cognitive influence
on corporate value creation. Corp. Govern. (Bradford) 16(3), 564–578 (2016)
124. Vito, J., Trottier, K.: A literature review on corporate governance mechanisms: past, present
and future. Account. Perspect. 21(2), 207–235 (2022)
125. Wernerfelt, B.: A resource-based view of the firm. Strateg. Manag. J. 5(2), 171–180 (1984)
126. Williamson, O.: The new institutional economics: taking stock, looking ahead. J. Econ. Literat.
38(3), 595–613 (2000)
127. Yousaf, U., Jebran, K., Wan, M.: Can board diversity predict the risk of financial distress?
Corp. Govern. Int. J. Bus. Soc. 21(4), 663–684 (2021)
128. Zingales, L.: Corporate governance. In: Newman, P. (ed.) The new Palgrave dictionary of
economics and the law. Palgrave MacMillan, London (1998)
129. Zheng, Y., Zhou, H., Chen, Z., Ekedebe, N.: Automated analysis and evaluation of SEC
documents. In: 2014 IEEE/ACIS 13th International Conference on Computer and Information
Science (ICIS), pp. 119–124. IEEE, Taiyuan (2014)
Chapter 20
Machine Learning Approaches
for Forecasting Financial Market
Volatility
Abstract Forecasting real estate market volatility is essential for investors, developers, and policymakers in the dynamic landscape of the real estate industry, which can itself be considered a financial market. This chapter extends the discussion of forecasting financial market volatility using machine learning techniques to the real estate market
context. Drawing upon insights from relevant research studies, we delve into the
diverse methodologies, performance evaluation metrics, and case studies specific to
predicting real estate market volatility. Machine learning models, including regres-
sion analysis, time series models, ensemble methods, and deep learning networks,
are applied to capture the intricate patterns and uncertainties in the real estate market.
Economic indicators, investor sentiment, geospatial data, and housing market funda-
mentals enhance forecasting accuracy. Performance evaluation metrics such as Mean Absolute Error (MAE) and Mean Squared Error (MSE) prove indispensable for evaluating the reliability of predictive models in this domain. The studies presented in
this review demonstrate the practical applications of machine learning in forecasting
real estate market volatility across diverse regions and property types. By adapting
methodologies from the broader financial market context, we provide valuable
insights for stakeholders seeking to make informed decisions in the ever-evolving
real estate financial market.
20.1 Introduction
Like any financial market, the real estate market can be characterized by its assets, participants, marketplaces, intermediaries, and regulation:
i. Financial Assets: In the real estate market context, financial assets are represented by tangible, physical properties. These assets encompass a wide range of property types, including residential homes, condominiums, apartments, office buildings, retail spaces, warehouses, industrial facilities, and land parcels. They have intrinsic value due to their physical existence and location, and investors acquire ownership of, or claims on the future cash flows from, these properties.
ii. Participants: The real estate market attracts diverse participants, each with unique financial objectives and strategies. Individual investors purchase properties for reasons such as personal residence, rental income, or long-term investment; they may seek to build wealth, generate rental income, or secure a place to live. Institutional investors, such as Real Estate Investment Trusts (REITs), pension funds, private equity firms, and hedge funds, participate with larger pools of capital, often seeking income, capital appreciation, or portfolio diversification. Real estate developers acquire, develop, and build properties for sale or lease; they play a vital role in adding new inventory and are sensitive to market trends and demand. Real estate agents act as intermediaries between buyers and sellers, assisting in property transactions, market analysis, pricing, and negotiations. Finally, mortgage lenders and financial institutions provide financing to individuals and businesses seeking to purchase real estate, offering mortgage loans and other financial products tailored to the market.
iii. Marketplaces: Real estate transactions occur through various marketplaces, reflecting the evolution of technology and participants' preferences. Traditional physical marketplaces, such as local real estate offices, property auctions, and open houses, have historically facilitated transactions, offering face-to-face interactions between buyers, sellers, and real estate professionals. In recent years, online platforms and listing services have gained prominence: websites and mobile apps allow buyers to search for properties, view listings, and connect with sellers and agents remotely, greatly expanding the reach of real estate transactions.
iv. Intermediaries: Intermediaries are essential in real estate transactions to
ensure the smooth and efficient exchange of properties. Real estate agents
and brokers act as intermediaries between buyers and sellers. They assist
in marketing properties, conducting property tours, negotiating deals, and
handling the complex paperwork involved in real estate transactions. For rental
properties, property managers oversee day-to-day operations, tenant relations,
and property maintenance on behalf of property owners, making real estate
investment more passive.
v. Regulation: Real estate markets are subject to various regulations and legal
requirements. Zoning regulations govern how properties can be used and devel-
oped within specific geographic areas, affecting property values and usage.
Building codes ensure that properties meet safety and construction standards.
Therefore, the real estate market operates as a financial market by offering financial assets in the form of tangible properties, facilitating transactions through various participants
and marketplaces, utilizing intermediaries to ensure efficiency, and adhering to regu-
lations governing property transactions. It provides a range of financial instruments,
investment opportunities, and financing options while carrying its unique set of risks
and considerations. The liquidity of real estate assets can vary, reflecting market
dynamics and location-specific factors. Understanding the real estate market’s multi-
faceted nature is essential for individuals and businesses seeking to participate in this
significant sector of the economy. It is crucial in allocating capital, wealth creation,
and risk management for investors and property owners. However, the real estate
market is distinct from traditional financial markets (like stock and bond markets)
due to its unique characteristics, illiquidity of assets, and the physical nature of
real property. Financial markets are paramount in modern economies, serving as
the conduits through which resources are allocated efficiently, asset prices are deter-
mined, and economic growth is nurtured. These markets offer investors opportunities
to diversify portfolios, manage risk, and earn returns on investments, underpinned
by a foundation of rules and regulations to protect market participants and maintain
transparency. While financial markets present avenues for wealth creation, they also
carry inherent risks, emphasizing the need for participants to be well-informed and
prudent in their financial endeavours.
Real estate market volatility refers to the degree of fluctuation or variability in property prices and market conditions over a specific period [27]. It measures the market's instability and the rapidity of price changes within the real estate sector. As in other financial markets, real estate volatility is influenced by many factors and carries significant implications for property buyers, sellers, investors, and the overall real estate industry. Real estate market volatility, a multifaceted phenomenon
marked by fluctuations in property prices and broader market conditions, is intri-
cately shaped by many interconnected factors. Firstly, economic conditions, encom-
passing variables such as inflation, GDP growth, employment rates, and interest
rates, wield significant influence, with economic downturns diminishing demand,
precipitating price declines, and heightening market instability. Moreover, supply
and demand dynamics, crucial to market fluctuations, see oversupply pushing prices
down and shortages propelling them upward, contributing to volatility. Additionally,
location-specific attributes introduce further nuance, as high-demand urban areas
exhibit greater stability, while less favoured or oversaturated locales may undergo
more pronounced price swings. Furthermore, investor sentiment and behaviour, regu-
latory changes, interest rate shifts, market speculation, global and local events, devel-
opment activity, and market transparency collectively compose the intricate fabric
of volatility within the real estate market [34]. Speculative investments amplify price swings, while regulatory adjustments and interest rate fluctuations add unpredictability. The presence of speculators, coupled with global and local events such as natural disasters and economic crises, can trigger immediate and lasting market shifts. Development activity also influences market stability: rapid construction can lead to oversupply, while limited development causes supply constraints [25]. Market transparency, or the avail-
ability of accurate and timely data, plays a pivotal role in decision-making, with less
information in some markets potentially contributing to price volatility. Grasping
these multifaceted influences is essential for navigating the dynamic real estate land-
scape, facilitating informed decision-making and effective risk management. Real estate market volatility can have significant consequences for market participants. It may present opportunities for buyers to purchase properties at lower prices during downturns but can also introduce uncertainty and risk [23]. Sellers may struggle to price their properties accurately, and investors must weigh risk and return carefully when making real estate investment decisions. Additionally,
industry professionals like real estate agents and developers must adapt to changing
market conditions.
Forecasting market volatility holds immense importance for various market partic-
ipants, including investors, traders, financial institutions, policymakers, and busi-
nesspersons. It empowers individuals and organizations to make informed decisions,
adeptly manage risk, and adapt their strategies to evolving market conditions, driven
by several pivotal reasons [20]. Accurate volatility forecasts serve as a linchpin in risk
management, enabling investors and businesses to assess and mitigate risks effec-
tively. By comprehending the potential magnitude of price fluctuations, they can
implement risk-mitigation strategies like diversification, hedging, or adjustments to
their portfolio allocations. Furthermore, investors and portfolio managers deploy
volatility forecasts to inform asset allocation decisions, tailoring their exposure to
different asset classes in alignment with their risk tolerance and expectations of
market volatility. During periods of anticipated high volatility, they may reduce expo-
sure to riskier assets and bolster allocations to more stable ones. Again, traders and
active investors lean on volatility forecasts for crafting and executing trading strate-
gies, utilizing technical analysis, options strategies, and other tactics that capitalize
on projected price swings, and these forecasts guide the optimal timing for entering
or exiting positions [2]. Long-term investors, such as pension funds and endowments,
draw upon volatility forecasts to make investment choices, shaping asset allocation
and investment strategies that ensure long-term financial goals are achieved while
managing potential downside risks. Also, financial institutions, including banks and
insurance companies, leverage volatility forecasts to evaluate the risk inherent in
their investment and loan portfolios, a practice instrumental in maintaining capital
adequacy and making informed lending and investment decisions. Moreover, these
forecasts are indispensable for pricing financial derivatives like options and futures
contracts, guaranteeing that these instruments represent fair market values while
minimizing mispricing and arbitrage opportunities. Central banks and government
policymakers closely monitor market volatility to uphold financial stability and
spur economic growth. Their understanding of volatility often informs decisions
regarding interest rates, monetary policy, and financial regulations. Companies inte-
grate volatility forecasts into their business planning processes, adapting budgeting,
pricing strategies, and inventory management in response to expected market conditions, thus enhancing their ability to navigate shifting economic environments.
20.2 Literature Review
Machine learning and data mining techniques have gained significant attention in real
estate market analysis and forecasting in recent years. The ability of these methods to
handle vast and complex datasets has provided valuable insights into various aspects
of the real estate market. This literature review synthesizes findings from 25 relevant
studies that explore the application of machine learning and data mining in real estate
research, shedding light on the diverse methodologies and their implications for the
industry. Cotter and Roll [7] conducted a comparative study of residential Real Estate
Investment Trusts (REITs) and private real estate markets, focusing on returns, risks,
and distributional characteristics. Their analysis highlighted the distinctions between
these two investment vehicles and offered insights into risk-return profiles. Yu et al.
[42] delved into real estate pricing methods, leveraging data mining and machine
learning techniques. Their research aimed to enhance pricing accuracy by considering
multiple variables and adopting sophisticated modelling approaches. Rafiei and Adeli
[31] introduced a novel machine-learning model for estimating the sale prices of real
estate units. Their study demonstrated the potential of machine learning in capturing
the intricate relationships between property attributes and market dynamics. Using
geocoding and machine learning, Tchuente and Nyawa [37] explored real estate price
estimation in French cities. Their research harnessed location-based data to improve
price predictions and spatial understanding. Park and Ryu [29] contributed to risk
management in real estate markets by developing a machine learning-based early
warning system for housing and stock markets. Their approach focused on identifying
potential market fluctuations and risks. Kabaivanov and Markovska [17] examined
the role of artificial intelligence in real estate market analysis, highlighting the advan-
tages of AI in handling complex market dynamics. Using machine learning, Hausler
et al. [13] investigated news-based sentiment analysis in real estate. Their work
revealed the influence of sentiment on market trends and dynamics. Gupta et al. [10]
employed machine learning to predict housing market synchronization across US
states, emphasizing the role of uncertainty in market movements. Gude [9] proposed a multi-level modelling approach for forecasting real estate dynamics, capturing the
complexity of the market across different levels. Cepni et al. [5] explored the impact
of investor sentiment on housing returns in China, applying machine learning tech-
niques for sentiment analysis. Prakash et al. [30] demonstrated the application of
machine learning in predicting housing prices, offering insights into price trends
and patterns. Rosenbaum and Zhang [32] investigated the global presence of the
volatility formation process using rough volatility and machine learning techniques,
contributing to our understanding of market volatility. Hu et al. [16] developed a
hybrid deep learning approach for predicting copper price volatility, highlighting
the potential of combining neural networks with traditional models. Lian et al. [18]
applied machine learning and time series models to predict VNQ market trends,
offering investors valuable insights. Habbab and Kampouridis [11] investigated five
machine-learning algorithms for optimizing mixed-asset portfolios, including Real
Estate Investment Trusts (REITs). Ngene and Wang [28] explored shock transmis-
sions between real estate investment trusts and other assets using time–frequency
decomposition and machine-learning techniques. Lee and Park [19] focused on
forecasting trading volume in local housing markets through a time-series model
and a deep learning algorithm, contributing to market analysis. Verma et al. [39]
predicted house prices in India using linear regression and machine learning algo-
rithms, offering valuable insights into the Indian real estate market. Xu and Zhang
[41] employed neural networks to forecast retail property price indices, providing
accurate predictions for market participants. Abdul Salam et al. [1] conducted a
systematic literature review of machine learning algorithms for price and rent predic-
tions in real estate, summarizing the state of the art. Han et al. [12] demonstrated
machine learning methods to predict consumer confidence from search engine data,
providing insights into consumer sentiment and its impact on the real estate market.
Sanyal et al. [33] focused on Boston house price prediction using regression models,
offering localized insights into housing markets. Nagl [26] conducted sentiment
analysis within a deep learning probabilistic framework, offering new evidence from
residential real estate in the United States. Wiradinata et al. [40] performed a post-
pandemic analysis of house price prediction in Surabaya using machine learning,
contributing to our understanding of market resilience in challenging times.
All these studies reflect the evolving landscape of real estate market analysis,
where machine learning and data mining techniques play a vital role in enhancing
prediction accuracy, risk assessment, and decision-making processes. By leveraging
the vast amount of data available in the real estate domain, these methodologies
contribute to a more informed and efficient real estate market.
20.3 Traditional Methods for Forecasting Volatility
Forecasting volatility in the real estate market involves predicting future fluctuations
in property prices and market conditions. Traditional real estate market volatility
forecasting methods often draw from statistical and econometric models, as well
as real estate-specific data and indicators. Here are some traditional methods for
forecasting volatility in the real estate market:
i. Historical Volatility (HV): Historical volatility in the real estate market
involves looking at past changes in property prices to estimate how much they
have varied over time. It is calculated as the standard deviation of historical
property price returns. High historical volatility suggests that property prices
have experienced significant fluctuations in the past, which may continue in the future (a code sketch covering methods i, ii, and iv follows this list).
ii. Moving Averages: Moving averages are used to smooth out fluctuations
in property prices. For example, a 12-month moving average calculates
the average property price over the past year. Investors and analysts use
moving averages to identify trends and assess whether prices are increasing or
decreasing steadily.
iii. Exponential Smoothing: Exponential smoothing models give more weight
to recent property price data while gradually reducing the significance of
older data points. This method is particularly useful for capturing short-term
fluctuations in property prices, as it emphasizes recent trends.
iv. GARCH Models (Generalized Autoregressive Conditional Heteroskedas-
ticity): GARCH models are statistical models that capture the time-varying
volatility of property prices. They estimate the conditional variance of prop-
erty price returns, allowing for the modelling of volatility clustering, where
periods of high volatility tend to follow one another.
v. Time Series Decomposition: Time series decomposition separates property
price data into its main components: trend, seasonality, and residual volatility.
Analyzing the volatility component can provide insights into the potential for
future price fluctuations.
vi. Volatility Index: Similar to stock market volatility indices like the VIX,
some regions or markets have started developing real estate-specific volatility
indices. These indices measure the expected future volatility of property prices
within a specific real estate market or geographic area.
vii. Economic Indicators: Traditional economic indicators, such as GDP growth,
employment rates, and interest rates, can be used to gauge the potential for
volatility in the real estate market. Economic downturns or rising interest rates
can influence property price movements.
viii. Housing Market Data: Real estate-specific data, including housing starts,
building permits, and inventory levels, can provide insights into market condi-
tions and potential volatility. An oversupply of housing relative to demand can
lead to price fluctuations.
ix. Mortgage Market Data: Data related to mortgage rates, loan originations,
and mortgage delinquency rates can offer valuable insights into the health of
the real estate market and its potential for volatility. Rising mortgage rates, for
instance, can impact housing affordability and demand.
x. Local Market Indicators: Real estate markets are highly localized, with condi-
tions varying significantly from one region to another. Local indicators, such
as population growth, job opportunities, and supply–demand imbalances, are
crucial in forecasting volatility within specific markets.
xi. Consumer Confidence Surveys: Consumer sentiment and confidence in the
housing market can be leading indicators of potential volatility. A drop in
consumer confidence may signal uncertainty and price fluctuations.
xii. Real Estate Transaction Data: Historical data on property transactions,
including sales prices and transaction volumes, provide valuable informa-
tion about past price movements. Analyzing transaction data can help forecast
future volatility based on historical patterns.
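As referenced in method i above, the sketch below illustrates how historical volatility (method i), a moving average (method ii), and a GARCH(1,1) fit (method iv) might be computed in Python. It is a minimal illustration, not part of the original chapter: the series name, window lengths, and monthly frequency are assumptions, and the `arch` package is only one of several libraries that implement GARCH.

```python
import numpy as np
import pandas as pd
from arch import arch_model  # third-party package: pip install arch

def historical_volatility(prices: pd.Series, window: int = 12) -> pd.Series:
    """Method i: rolling standard deviation of log returns, annualized assuming monthly data."""
    log_returns = np.log(prices).diff().dropna()
    return log_returns.rolling(window).std() * np.sqrt(12)

def moving_average(prices: pd.Series, window: int = 12) -> pd.Series:
    """Method ii: 12-month moving average to smooth price fluctuations."""
    return prices.rolling(window).mean()

def garch_variance_forecast(prices: pd.Series, horizon: int = 6) -> pd.DataFrame:
    """Method iv: fit a GARCH(1,1) on percentage log returns, forecast conditional variance."""
    returns = 100 * np.log(prices).diff().dropna()
    fitted = arch_model(returns, vol="Garch", p=1, q=1).fit(disp="off")
    return fitted.forecast(horizon=horizon).variance
```

In practice one would apply these functions to a property-price index series for the market of interest and compare the GARCH variance path against the simple rolling estimate.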
Each method discussed above offers a different perspective on the real estate
market and its potential for volatility. Real estate professionals and investors often
use a combination of these methods and data sources to make informed decisions
about buying, selling, or investing in real estate properties. It is important to note
that real estate market volatility can differ significantly depending on factors like
property type (residential, commercial, industrial), location (urban vs. rural), and
regional economic conditions. Therefore, the forecasting method and data sources
should be tailored to the analyzed real estate market. Additionally, as with any fore-
casting model, ongoing monitoring and validation of results are essential to ensure
the accuracy and relevance of the forecasts.
20.4 Machine Learning Techniques for Volatility Forecasting
Machine learning techniques have gained traction in real estate market volatility forecasting due to their ability to handle complex data patterns and improve accuracy. These techniques leverage historical real estate data, economic indicators, and other relevant factors to predict future market volatility. Some machine learning techniques commonly used in real estate market volatility forecasting are listed below, each illustrated in the combined code sketch that follows the list:
1. Regression Analysis: Linear and non-linear regression models can be applied
to real estate data to predict market volatility. Features like historical property
prices, interest rates, GDP growth, and unemployment rates can be used as input
variables. Regression models aim to find relationships between these variables
and the volatility of real estate prices.
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon \quad (20.1)$$
where
$Y_i$ is the dependent variable (volatility),
$\beta_0$ is the intercept,
$\beta_1$ is the slope coefficient for the independent variable $X_i$, and
$\varepsilon$ represents the error term.
2. Time Series Models: Time series forecasting techniques, such as ARIMA (Auto
Regressive Integrated Moving Average) and its variations, are used to analyze
historical property price data. These models capture seasonality, trends, and auto-
correlation in the data to make short-term and long-term predictions about real
estate market volatility. After differencing, these models take the ARMA(p, q) form
$$Y_t = c + \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t \quad (20.2)$$
where
$Y_t$ is the time series at time t,
$c$ is a constant,
$\phi_1, \ldots, \phi_p$ are auto-regressive coefficients,
$\theta_1, \ldots, \theta_q$ are moving average coefficients, and
$\varepsilon_t$ is the white noise error term.
3. Random Forests: Random Forests are ensemble learning models that can capture
complex relationships in real estate data. They work well with numerical and
categorical features, making them suitable for analyzing various factors influ-
encing real estate market volatility, such as location, property type, and economic
indicators.
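As signposted above, the following minimal sketch illustrates techniques 1–3 in Python: an OLS fit of Eq. (20.1), an ARIMA fit of Eq. (20.2), and a Random Forest regressor. All variable and column names (`df`, `vol_series`, `X`, `y`, `interest_rate`, and so on) are illustrative assumptions, not data or code from the chapter.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# --- 1. Regression (Eq. 20.1): volatility regressed on assumed macro features ---
# df is assumed to hold one row per period with features and realized volatility.
X_ols = sm.add_constant(df[["interest_rate", "gdp_growth", "unemployment"]])  # adds beta_0
ols = sm.OLS(df["volatility"], X_ols).fit()
print(ols.params)  # estimated intercept and slope coefficients

# --- 2. Time series (ARIMA): fit and forecast a volatility series ---
# vol_series is assumed to be a pandas Series of, e.g., monthly realized volatility.
arima = ARIMA(vol_series, order=(1, 0, 1)).fit()  # illustrative (p, d, q) choice
print(arima.forecast(steps=12))  # 12-period-ahead volatility forecast

# --- 3. Random Forest: mixed numerical/categorical features ---
# X and y are an assumed feature matrix (encoded location, property type, macro
# indicators) and next-period volatility; shuffle=False preserves time order.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)
rf = RandomForestRegressor(n_estimators=500, random_state=42).fit(X_tr, y_tr)
print("Held-out R^2:", rf.score(X_te, y_te))
```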
20.5 Data Sources and Preprocessing
Forecasting real estate market volatility is a complex task that relies on diverse data
sources and meticulous data preprocessing. Access to historical property price data is
fundamental, providing insights into past market dynamics. Alongside this, economic
indicators such as GDP growth, inflation, and interest rates offer crucial macroeco-
nomic context. Housing market data, including housing starts, building permits, and
inventory levels, helps assess supply and demand dynamics. Mortgage market data, covering rates, loan originations, and delinquencies, completes the financing picture.
viii. News and Social Media Data: Textual data from news articles, social media platforms, and real estate market reports can be processed using natural language processing (NLP) techniques. From this unstructured data, one can extract sentiment, detect events, and monitor public perception, all of which affect market sentiment and volatility; a minimal scoring sketch follows.
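One hedged illustration of such sentiment extraction uses NLTK's VADER analyzer; the headlines below are invented examples, and VADER is merely one off-the-shelf choice among many NLP tools.

```python
# Requires: pip install nltk, then nltk.download("vader_lexicon") once.
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
headlines = [
    "Housing starts surge as mortgage rates ease",       # invented example
    "Regional oversupply pushes apartment prices lower"  # invented example
]
# Compound score lies in [-1, 1]; averaging over a period yields a crude sentiment signal.
scores = [sia.polarity_scores(h)["compound"] for h in headlines]
print(sum(scores) / len(scores))
```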
Data preprocessing is essential for analyzing market volatility in the real estate
sector as it helps clean and refine raw data, ensuring accuracy and consistency.
Removing outliers, handling missing values, and normalizing data can enhance the
quality of information, making it suitable for robust predictive models. Effective data
preprocessing lays the foundation for accurate volatility forecasts, aiding investors
and professionals in making informed decisions in this complex and dynamic market.
Some common data preprocessing steps for real estate market volatility forecasting are discussed below, followed by a short pipeline sketch:
i. Data Cleaning: Data cleaning includes removing or addressing outliers, missing values, and inconsistencies in the data. This step ensures that the data is of high quality and suitable for analysis.
ii. Handling Missing Data: Missing data can be addressed through imputation
(filling missing values with estimated values) or removing rows with missing
values, depending on the extent and nature of missing data.
iii. Time Alignment: To analyze data effectively, ensuring that all data sources
are synchronized regarding time frames and frequencies is crucial. This
may involve aggregating data to a consistent time interval (e.g., monthly or
quarterly) to facilitate analysis.
iv. Feature Engineering: Feature engineering entails creating new variables or
transforming existing ones to capture relevant information. For instance, lagged
property price data, moving averages, and economic indicator transformations
can help create informative features for forecasting.
v. Normalization and Scaling: Numerical features are often normalized or scaled
to a consistent range. Common techniques include Min–Max scaling (rescaling
to a range of [0, 1]) and Z-score normalization (scaling with mean 0 and
standard deviation 1).
vi. Handling Categorical Data: Categorical data, such as property types (e.g.,
residential, commercial) or regions, must be encoded into numerical format.
One-hot encoding is a technique that converts categorical data into a binary
format by creating individual binary variables for each category. In contrast,
label encoding assigns a unique numerical value to each category in the dataset.
vii. Handling Imbalanced Data: If there is an imbalance in the distribution
of volatility periods (e.g., high volatility periods are rare compared to low
volatility periods), techniques like oversampling (increasing the representa-
tion of minority class) or undersampling (reducing the majority class) may be
applied to balance the dataset.
viii. Time Series Decomposition: Time series data can be decomposed into its
primary components, including trend, seasonality, and residual volatility. This
decomposition aids in understanding the underlying patterns in the dataset.
ix. Data Splitting: The data is split into training, validation, and test sets. Time-based splitting is often preferred to mimic real-world forecasting scenarios and ensure that models are evaluated on unseen data.
x. Regularization and Transformation: Regularization techniques, such as L1
or L2, may be applied to prevent overfitting in predictive models. Log or
Box-Cox transformations can also be used for variables with non-normal
distributions.
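The sketch below strings several of the steps above (imputation, feature engineering, scaling, one-hot encoding, and a time-based split) into one pipeline; every column name is an illustrative assumption rather than a prescription from the chapter.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# df is assumed to be a time-indexed DataFrame with price, macro, and category columns.
df = df.sort_index()                                        # step iii: time alignment
df = df.interpolate(limit_direction="forward")              # step ii: impute missing values
df["price_lag_1"] = df["price"].shift(1)                    # step iv: lagged feature

num_cols = ["price", "price_lag_1", "interest_rate"]
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])   # step v: Min-Max scale to [0, 1]
df = pd.get_dummies(df, columns=["property_type"])          # step vi: one-hot encoding
df = df.dropna()

cutoff = int(len(df) * 0.8)                                 # step ix: time-based split
train, test = df.iloc[:cutoff], df.iloc[cutoff:]
```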
Thus, data preprocessing is a critical and iterative step in the real estate
market volatility forecasting process. High-quality, well-processed data enhances
the accuracy and effectiveness of predictive models, allowing for more informed
decision-making in real estate investments and risk management.
20.6 Performance Evaluation Metrics
Performance evaluation metrics are essential for assessing machine learning models' accuracy and effectiveness in forecasting real estate market volatility. Some common performance evaluation metrics related to the machine-learning techniques mentioned earlier are described below:
i. Mean Absolute Error (MAE) measures the average absolute difference
between the actual and predicted volatility values. It provides insight into the
degree of errors made by the model. Mean Absolute Error is useful to under-
stand the average absolute prediction error in the same units as the target variable
(volatility). A lower MAE indicates better model performance.
ii. Mean Squared Error (MSE) quantifies the average squared difference
between predicted and actual values. It gives higher weight to larger errors.
MSE helps identify outliers or cases where the model’s predictions signifi-
cantly deviate from actual values. A lower MSE suggests a model with smaller
errors, but it penalizes larger errors more heavily.
iii. Root Mean Squared Error (RMSE) is the square root of MSE and provides
a more interpretable metric in the same units as the target variable. RMSE is
preferred while expressing prediction errors in the original scale of the target
variable (volatility). A lower RMSE indicates a better fit of the model to the
data.
iv. R-squared (R2 ) represents the proportion of variance in the target variable
explained by the technique. It measures the goodness of fit. R2 helps evaluate
how well the technique captures the variability in volatility values. R2 values
range from 0 to 1, with higher values indicating that the model explains a larger
proportion of variance. A higher R2 suggests a better-fitting model.
v. Mean Absolute Percentage Error (MAPE) measures the percentage difference between predicted and actual values, making it suitable for time series data. MAPE is useful for understanding prediction accuracy in relative terms, which makes it easy to compare models across markets whose volatility levels differ in scale; a lower MAPE indicates better performance. A brief sketch computing these metrics follows.
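Each of these metrics is a one-liner with scikit-learn, as the minimal sketch below shows; `y_true` and `y_pred` are assumed arrays of realized and forecast volatility (MAPE requires scikit-learn 0.24 or later).

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error, r2_score)

mae = mean_absolute_error(y_true, y_pred)               # metric i: MAE
mse = mean_squared_error(y_true, y_pred)                # metric ii: MSE
rmse = np.sqrt(mse)                                     # metric iii: RMSE
r2 = r2_score(y_true, y_pred)                           # metric iv: R-squared
mape = mean_absolute_percentage_error(y_true, y_pred)   # metric v: MAPE
```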
20.8 Conclusion
The real estate market, often regarded as a financial market, holds a pivotal position
in the global economy, significantly impacting wealth creation and capital alloca-
tion. Accurate real estate market volatility forecasts are essential for various stake-
holders, including investors, developers, policymakers, and homeowners. Incorpo-
rating machine learning techniques in the context of real estate market volatility fore-
casting has revolutionized how we analyze and predict market dynamics. The key
takeaways from the preceding discussions in this chapter on methodologies, perfor-
mance metrics, and case studies related to forecasting real estate market volatility
using machine learning are summarized here. The methodologies discussed in this
exploration encompass a wide array of machine-learning techniques tailored to the
unique characteristics of the real estate market. A detailed study of current research work shows that regression analysis, time series models like ARIMA, ensemble methods
such as Random Forests and Gradient Boosting, neural networks, support vector
machines, and deep reinforcement learning have all been applied to model volatility.
Feature engineering techniques have empowered these models to capture the intricate
relationships between economic indicators, geospatial data, investor sentiment, and
housing market fundamentals. These machine learning models have proven effec-
tive in handling numerical and categorical data, a crucial requirement given the
diverse factors influencing real estate market volatility. The assessment of machine
learning models for real estate market volatility forecasting requires comprehensive
performance evaluation metrics. These metrics differ based on the type of model
used and the specific forecasting task. For regression-based models, metrics like
Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared
Error (RMSE), and R-squared (R2 ) provide valuable insights into the explanatory
power and accuracy of the techniques. Classification models, such as support vector
machines, rely on metrics like accuracy, precision, recall, and F1-score to gauge
their effectiveness in identifying high and low volatility periods. Time series models
benefit from metrics such as Akaike Information Criterion (AIC), Mean Absolute
Percentage Error (MAPE), and Bayesian Information Criterion (BIC) for model
selection. These performance metrics are quantitative measures to compare models,
optimize hyperparameters, and enhance forecasting accuracy.
This chapter further explores the real-world applicability of machine learning in real estate market volatility forecasting, which is evident from the existing liter-
ature. These studies span diverse geographical regions and property types, empha-
sizing the versatility of machine learning models in capturing local market dynamics.
References
1. Abdul Salam, M.H., Mohd, T., Masrom, S., Johari, N., Mohamad Saraf, M.H.: Machine learning
algorithms on price and rent predictions in real estate: a systematic literature review (2022)
2. Bhatia, A., Chandani, A., Atiq, R., Mehta, M., Divekar, R.: Artificial intelligence in financial
services: a qualitative research to discover robo-advisory services. Qual. Res. Finan. Mark.
13(5), 632–654 (2021)
3. Boukherouaa, E.B., Shabsigh, M.G., AlAjmi, K., Deodoro, J., Farias, A., Iskender, E.S.,
Mirestean, M.A.T., Ravikumar, R.: Powering the digital economy: opportunities and risks
of artificial intelligence in finance. Int. Monetary Fund (2021)
4. Cavalcante, R.C., Brasileiro, R.C., Souza, V.L., Nobrega, J.P., Oliveira, A.L.: Computational
intelligence and financial markets: a survey and future directions. Expert Syst. Appl. 15(55),
194–211 (2016)
5. Cepni, O., Gupta, R., Onay, Y.: The role of investor sentiment in forecasting housing returns
in China: a machine learning approach. J. Forecast. 41(8), 1725–1740 (2022)
6. Cerutti, E., Dagher, J., Dell’Ariccia, G.: Housing finance and real-estate booms: a cross-country
perspective. J. Hous. Econ. 1(38), 1–3 (2017)
7. Cotter, J., Roll, R.: A comparative anatomy of residential REITs and private real estate markets:
returns, risks and distributional characteristics. Real Estate Econ. 43(1), 209–240 (2015)
8. Durusu-Ciftci, D., Ispir, M.S., Yetkiner, H.: Financial development and economic growth: some
theory and more evidence. J. Policy Model. 39(2), 290–306 (2017)
9. Gude, V.: A multi-level modeling approach for predicting real-estate dynamics. Int. J. Housing
Markets Anal. (2023)
10. Gupta, R., Marfatia, H.A., Pierdzioch, C., Salisu, A.A.: Machine learning predictions of housing
market synchronization across US states: the role of uncertainty. J. Real Estate Finance Econ.,
1–23 (2022)
11. Habbab, F.Z., Kampouridis, M.: An in-depth investigation of five machine learning algorithms
for optimizing mixed-asset portfolios including REITs. Expert Syst. Appl. 235, 121102 (2024)
12. Han, H., Li, Z., Li, Z.: Using machine learning methods to predict consumer confidence from
search engine data. Sustainability 15(4), 3100 (2023)
13. Hausler, J., Ruscheinsky, J., Lang, M.: News-based sentiment analysis in real estate: a machine
learning approach. J. Prop. Res. 35(4), 344–371 (2018)
14. Henrique, B.M., Sobreiro, V.A., Kimura, H.: Literature review: machine learning techniques
applied to financial market prediction. Expert Syst. Appl. 124, 226–251 (2019)
15. Hsiao, Y.J., Tsai, W.C.: Financial literacy and participation in the derivatives markets. J. Bank.
Finance 1(88), 15–29 (2018)
16. Hu, Y., Ni, J., Wen, L.: A hybrid deep learning approach by integrating LSTM-ANN networks
with GARCH model for copper price volatility prediction. Physica A Stat. Mech. Appl. 557,
124907 (2020)
17. Kabaivanov, S., Markovska, V.: Artificial intelligence in real estate market analysis. In: AIP
Conference Proceedings, vol. 2333, no. 1. AIP Publishing (2021)
18. Lian, Y.M., Li, C.H., Wei, Y.H.: Machine learning and time series models for VNQ market predictions. J. Appl. Finance Bank. 11(5), 29–44 (2021)
19. Lee, C., Park, K.K.H.: Forecasting trading volume in local housing markets through a time-
series model and a deep learning algorithm. Eng. Constr. Archit. Manag. 29(1), 165–178
(2022)
20. Liow, K.H., Huang, Y.: The dynamics of volatility connectedness in international real estate
investment trusts. J. Int. Finan. Markets. Inst. Money 1(55), 195–210 (2018)
21. Liow, K.H., Zhou, X., Ye, Q.: Correlation dynamics and determinants in international
securitized real estate markets. Real Estate Econ. 43(3), 537–585 (2015)
22. Liow, K.H., Liao, W.C., Huang, Y.: Dynamics of international spillovers and interaction:
evidence from financial market stress and economic policy uncertainty. Econ. Model. 1(68),
96–116 (2018)
23. Loutskina, E., Strahan, P.E.: Financial integration, housing, and economic volatility. J. Financ.
Econ. 115(1), 25–41 (2015)
24. Mohanta, B., Nanda, P., Patnaik, S.: Management of VUCA (volatility, uncertainty, complexity and ambiguity) using machine learning techniques in Industry 4.0 paradigm. In: New Paradigm of Industry 4.0: IoT, Big Data and Cyber Physical Systems, pp. 1–24 (2020)
25. Munawar, H.S., Qayyum, S., Ullah, F., Sepasgozar, S.: Big data and its applications in smart real
estate and the disaster management life cycle: a systematic analysis. Big Data Cogn. Comput.
4(2), 4 (2020)
26. Nagl, C.: Sentiment analysis within a deep learning probabilistic framework–new evidence
from residential real estate in the United States. J. Housing Res., 1–25 (2023)
27. Nazlioglu, S., Gormus, N.A., Soytas, U.: Oil prices and real estate investment trusts (REITs):
Gradual-shift causality and volatility transmission analysis. Energy Econ. 1(60), 168–175
(2016)
28. Ngene, G.M., Wang, J.: Transitory and permanent shock transmissions between real estate investment trusts and other assets: evidence from time–frequency decomposition and machine learning. Account. Financ. 64(1), 539–573 (2023)
29. Park, D., Ryu, D.: A machine learning-based early warning system for the housing and stock
markets. IEEE Access 9, 85566–85572 (2021)
30. Prakash, H., Kanaujia, K., Juneja, S.: Using machine learning to predict housing prices. In:
2023 International Conference on Artificial Intelligence and Smart Communication (AISC),
pp. 1353–1357. IEEE (2023)
31. Rafiei, M.H., Adeli, H.: A novel machine learning model for estimation of sale prices of real
estate units. J. Constr. Eng. Manag. 142(2), 04015066 (2016)
32. Rosenbaum, M., Zhang, J.: On the universality of the volatility formation process: when
machine learning and rough volatility agree (2022). arXiv preprint arXiv:2206.14114
33. Sanyal, S., Biswas, S.K., Das, D., Chakraborty, M., Purkayastha, B.: Boston house price predic-
tion using regression models. In: 2022 2nd International Conference on Intelligent Technologies
(CONIT), pp. 1–6. IEEE (2022)
34. Shu, H.C., Chang, J.H.: Investor sentiment and financial market volatility. J. Behav. Financ.
16(3), 206–219 (2015)
35. Sonkavde, G., Dharrao, D.S., Bongale, A.M., Deokate, S.T., Doreswamy, D., Bhat, S.K.: Fore-
casting stock market prices using machine learning and deep learning models: a systematic
review, performance analysis and discussion of implications. Int. J. Finan. Stud. 11(3), 94
(2023)
36. Song, Y., Ma, X.: Exploration of intelligent housing price forecasting based on the anchoring
effect. Neural Comput. Appl. 18, 1–4 (2023)
37. Tchuente, D., Nyawa, S.: Real estate price estimation in French cities using geocoding and
machine learning. Ann. Oper. Res. 1–38 (2022)
38. Valickova, P., Havranek, T., Horvath, R.: Financial development and economic growth: a meta-
analysis. J. Econ. Surv. 29(3), 506–526 (2015)
39. Verma, A., Nagar, C., Singhi, N., Dongariya, N., Sethi, N.: Predicting house price in India
using linear regression machine learning algorithms. In: 2022 3rd International Conference on
Intelligent Engineering and Management (ICIEM), pp. 917–924. IEEE (2022)
40. Wiradinata, T., Graciella, F., Tanamal, R., Soekamto, Y.S., Saputri, T.R.D.: Post-pandemic analysis of house price prediction in Surabaya: a machine learning approach (2022)
41. Xu, X., Zhang, Y.: Retail property price index forecasting through neural networks. J. Real
Estate Portfolio Manag. 29(1), 1–28 (2023)
42. Yu, Y., Lu, J., Shen, D., Chen, B.: Research on real estate pricing methods based on data mining
and machine learning. Neural Comput. Appl. 33, 3925–3937 (2021)
Chapter 21
Deep Learning Models in Finance: Past,
Present, and Future
Abstract Over the past few decades, the financial industry has shown a keen interest
in using computational intelligence to improve various financial processes. As a
result, a range of models have been developed and published in numerous studies.
However, in recent years, deep learning (DL) has gained significant attention within
the field of machine learning (ML) due to its superior performance compared to
traditional models. There are now several different DL implementations being used
in finance, particularly in the rapidly growing field of Fintech. DL is being widely
utilized to develop advanced banking services and investment strategies. This chapter
provides a comprehensive overview of the current state of the art in DL models for financial applications. The chapter is organized around specific sub-fields of finance and examines the use of DL models in each area. These include
algorithmic trading, price forecasting, credit assessment, and fraud detection. The
chapter aims to provide a concise overview of the various DL models being used in
these fields and their potential impact on the future of finance.
21.1 Introduction
Over the past decade, advancements in artificial intelligence (AI) have permeated
almost all areas of human endeavor. Its versatility and adaptability have paved the
way for the execution of tasks with increased precision and efficacy, consequently
transforming traditional industries and institutions. The finance industry, renowned
for its intricate systems and prodigious generation of data, has particularly felt the
seismic shift brought about by the advent of AI.
One significant method of AI implementation is through machine learning, a
system that empowers computers to learn from data and improve from experience
without being explicitly programmed. Within machine learning, a particular subset known as deep learning has emerged as a game-changer. Rooted in artifi-
cial neural networks with multiple levels of abstraction, deep learning demonstrates
an exceptional ability to discern and decode complex patterns in large data sets,
mirroring the workings of the human brain in processing data for decision-making.
Deep learning, however, has not yet been fully explored in the context of finance, and this presents a ripe area of investigation. Recognizing this, the core of this chapter
revolves around examining the role and potential of deep learning within the finance
industry. The primary objective of this study is to investigate the efficacy of deep
learning algorithms in diverse financial areas, including algorithmic trading, price
forecasting, fraud detection, and credit assessment. We aim to draw comparisons
between these novel techniques and the traditional statistical approaches, identifying
their advantages, limitations, and areas of application.
Furthermore, we also aspire to amalgamate ongoing research and experiments in
this field, with the ultimate goal of elucidating the process of adopting deep learning
in financial settings. By spotlighting the challenges, we hope to spur further discourse
on the practical implications and to stimulate innovative solutions that may pave the
way for an even more fruitful use of deep learning in finance.
The structure of this chapter is as follows: we commence with an exposition on
the role of AI and deep learning in the finance sector, building the foundation for our
subsequent exploration. This is followed by a deep dive into specific applications
and case studies of deep learning, dissecting its applications in areas like algorithmic
trading, price forecasting, fraud detection, and credit assessment. Ultimately, we draw
conclusions from our analysis, critically reflect on the implications of our research,
and outline potential areas of inquiry for future studies in this rapidly evolving field.
21.2 Algorithmic Trading and Price Forecasting
The financial sector has notably embraced advancements in information and commu-
nication technologies. Investors are usually driven by opportunities where advantages in data gathering, decision-making, and trade-strategy implementation translate into potential gains. The rise of the Internet has facilitated a shift in the financial industry toward electronic, algorithmic trading, which has been argued to provide a better mechanism for price flow than manual trading [3]. Algorithmic trading positively influenced
market quality, as suggested by research conducted by Boehmer et al. [4].
Numerous deep learning-based algorithmic trading systems have been developed
to achieve a variety of trading objectives. Some systems aim to forecast price trajecto-
ries of financial assets (stocks, indices, bonds, currencies, etc.), some execute trades
based on buying and selling signals, and others generate asset returns by simu-
lating real-world financial scenarios. There are also systems designed to facilitate
independent research, such as pair trading, buying and selling signals, and more.
Since algorithmic trading involves buy-sell decisions made exclusively by mathe-
matical algorithms, these decisions can be supported by straightforward principles,
mathematical models, optimal procedures, or even highly sophisticated function
approximation methods typical of machine learning/deep learning.
Over the past two decades, algorithmic trading has significantly transformed the
financial sector, primarily due to the development of electronic online trading plat-
forms and frameworks. As a result, algorithmic trading models based on deep learning
(DL) began receiving significant interest. The majority of algorithmic trading appli-
cations combine price prediction models for market timing purposes. As such, most
price or trend forecasting algorithms that generate buy-sell signals based on their
predictions are referred to as algorithmic trading systems.
However, some studies propose stand-alone algorithmic trading models that focus
on the dynamics of the transaction, optimising trading parameters like bid-ask spread,
limit order book analysis, position sizing, and more. This topic particularly piques
the interest of researchers studying High-Frequency Trading (HFT). DL models have
subsequently begun to appear in HFT studies.
Hu et al. present a comprehensive review of significant evolutionary algorithm implementations in algorithmic trading models [5]. Since algorithmic trading and
financial time series forecasting are closely intertwined, numerous ML survey papers
focus on forecasting-based algorithmic trading models. Those interested in this topic
can refer to [6] for more details.
Most studies on algorithmic trading have concentrated on forecasting stock or
index prices. Long Short-Term Memory (LSTM) has been the most used DL model
in these implementations. In [7], price prediction for algorithmic stock trading was
conducted using Recurrent Neural Networks (RNN) with Graves LSTM, using trade
indicators based on market microstructures as the input. Bao et al. [8] utilised technical indicators, forecasting stock prices with a combination of Wavelet Transforms (WT), LSTM, and Stacked Autoencoders (SAEs).
The research work presented in [9] combined the implementation of Convolutional
Neural Networks (CNN) and LSTM model structures (with CNN used for stock
selection and LSTM for price prediction) (Fig. 21.2).
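As an illustration of the LSTM-based forecasters discussed here, the following minimal Keras sketch trains a next-step price predictor on sliding windows. It is not the code of [7] or [8]; the 30-day lookback, layer sizes, and synthetic input series are assumptions chosen only to show the typical window-then-LSTM structure.

import numpy as np
from tensorflow import keras

def make_windows(series: np.ndarray, lookback: int = 30):
    """Turn a 1-D series into (samples, lookback, 1) windows and next-step targets."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    return X[..., np.newaxis], series[lookback:]

# Synthetic stand-in for a normalised closing-price series.
series = np.cumsum(np.random.default_rng(1).normal(0, 0.01, 2000)).astype("float32")
X, y = make_windows(series)

model = keras.Sequential([
    keras.layers.Input(shape=(30, 1)),
    keras.layers.LSTM(64),   # sequence encoder
    keras.layers.Dense(1),   # next-step price estimate
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.2, verbose=0)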
Zhang et al. [10] presented an innovative method for stock price prediction with a
State Frequency Memory (SFM) recurrent network with different frequency trading
patterns, which improved prediction and trading performances. Tran et al. [11] devel-
oped a DL model that forecasts price changes through midprice prediction using
high-frequency limit order book data with tensor representation for an HFT trading system.

Fig. 21.2 Price prediction using deep learning-based sentiment analysis and the corresponding prediction model process flow

The authors of [12] utilised Fuzzy Deep Direct Reinforcement Learning
(FDDR) to anticipate stock prices and generate trading signals.
Noteworthy research exists for index prediction as well. The implementation of
S&P500 index price prediction using LSTM can be found in [13]. For the Greek Stock
Exchange Index prediction, Mourelatos et al. [14] compared the performance of
LSTM and a Genetic Algorithm with Support Vector Regression (GASVR). A Chinese intraday futures market trading model using Deep Reinforcement Learning (DRL) and LSTM was implemented by Si et al. Yong et al. [15] used the Deep Multilayer Perceptron (DMLP) approach to forecast Singapore Stock Market index data, considering the Open, Close, High, and Low values of the time series.
Certain studies have focused on trading in cryptocurrencies or forex. The research
[16] developed and evaluated agent-inspired trading using deep (recurrent) reinforce-
ment learning and LSTM in the trading of the GBP/USD. DMLP was used in [17]
to predict trading prices in commodities and foreign exchange. Korczak et al. [18]
used a multi-agent-based trading environment to implement a forex trading (GBP/
PLN) model using various input parameters. One of the agents outperformed all
other models when using CNN for prediction. Spilak et al. [19] used LSTM, RNN,
and DMLP algorithms to construct a dynamic portfolio utilising a variety of cryp-
tocurrencies. Jeong et al. [20] implemented a simple Deep Q-Network (DQN) for the
trading of Bitcoin. This is by no means an exhaustive analysis of all the different types
of models and techniques used in price prediction, but it provides a good overview
of some of the more notable examples. Despite these advancements, there remains
ample room for further exploration and development, particularly with the goal of
improving the effectiveness and applicability of deep learning in algorithmic trading
and price forecasting. Sezer et al. provide a systematic literature review covering
deep learning applications in financial time series forecasting from 2005 to 2019,
highlighting broad implementation areas and substantial impacts in academia and
the finance industry [21]. Another study by Sezer et al. proposes a deep neural-
network-based stock trading system optimized with technical analysis parameters
using genetic algorithms [22]. Navon and Keller present an end-to-end deep learning
approach for financial time series prediction, leveraging raw financial data inputs
to predict temporal trends in NYSE and NASDAQ stocks and ETFs [23]. Troiano
et al. explore using LSTM networks to learn trading rules from market indicators
and trading decisions [24]. Sirignano and Cont uncover universal and stationary
relations between order flow history and price move direction using a large-scale
deep learning approach applied to high-frequency market data [25]. Tsantekidis
et al. develop a deep learning model to detect price change indications in finan-
cial markets, addressing the noisy and stochastic nature of markets [26]. Gudelek
et al. propose a novel method for predicting stock price movements using convolu-
tional neural networks (CNN) with ETFs to avoid high market volatility [27]. Sezer
and Ozbayoglu introduce an algorithmic trading model using a 2-D CNN based
on image processing properties, converting financial time series into 2-D images
with various technical indicators [28]. Hu et al. present a deep stock representa-
tion learning approach from candlestick charts to investment decisions, addressing
limitations in existing stock similarity measurements [29]. Tsantekidis et al. propose
forecasting stock prices using CNNs applied to limit order book data, aiming to detect
repeated patterns of price movements [30]. Gunduz et al. use CNN architecture with
an ordered feature set to predict the intraday direction of Borsa Istanbul 100 stocks
[31]. Chen et al. develop an agent-based reinforcement learning system to mimic
professional trading strategies from large trading records [32]. Wang et al. leverage
deep learning to model the stock market structure as a correlation network of stocks,
improving market structure modeling [33]. Day and Lee use deep learning for finan-
cial sentiment analysis on news articles from financial news providers, enhancing
market sentiment understanding [34]. Sirignano applies deep learning techniques to
model the dynamics of limit order books, aiming to uncover patterns for predicting
future price movements [35]. Gao explores the use of deep reinforcement learning
for time series analysis in trading games, developing models to learn optimal trading
strategies through simulated environments [36].
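To give a flavour of the time-series-to-image idea behind [28], the sketch below stacks moving-average deviation rows computed at several window lengths into a 2-D array that a CNN could consume. The indicator choice and window range are our own simplifications; the original work uses a richer set of technical indicators.

import numpy as np

def indicator_image(close: np.ndarray, windows=range(6, 21)) -> np.ndarray:
    """Each row is one window setting, each column one trading day."""
    rows = []
    for w in windows:
        sma = np.convolve(close, np.ones(w) / w, mode="valid")
        rows.append(close[w - 1:] - sma)  # deviation of price from its w-day average
    width = min(len(r) for r in rows)
    return np.stack([r[-width:] for r in rows])  # shape: (n_windows, width)

close = 100 + np.cumsum(np.random.default_rng(2).normal(0, 1, 300))
print(indicator_image(close).shape)  # (15, 281); sliced into patches before the CNN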
Fraud detection, one of the most heavily studied topics in finance for deep learning
research, is an area of increasing importance for governments, authorities, and
financial institutions. Financial fraud manifests in various forms, including credit
card fraud, money laundering, consumer credit fraud, tax evasion, bank fraud, and
insurance claim fraud.
Historically, financial institutions relied on rule-based analysis, devised by domain
experts, to detect fraud. These rules were based on general patterns of fraudulent
transactions or events within finance or banking sectors. However, such rule-based
inference only considers a limited set of attributes, as comprehending all possible
patterns is a challenging task. With the advent of deep learning techniques, we can
now process data and recognize both generalized and complex patterns with higher
efficiency, thus potentially increasing the accuracy of financial fraud detection.
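One common deep learning formulation of this idea, used with autoencoders in, for example, [48, 49], is to train a network to reconstruct normal transactions and flag those it reconstructs poorly. The sketch below illustrates this on synthetic features; the feature dimensionality, architecture, and the top-1% alert threshold are illustrative assumptions.

import numpy as np
from tensorflow import keras

rng = np.random.default_rng(3)
# Synthetic stand-in for scaled transaction features (amount, hour, merchant stats, ...).
X_train = rng.normal(0, 1, (5000, 16)).astype("float32")  # mostly legitimate traffic

autoencoder = keras.Sequential([
    keras.layers.Input(shape=(16,)),
    keras.layers.Dense(8, activation="relu"),  # compress to a bottleneck
    keras.layers.Dense(16),                    # reconstruct the input
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_train, X_train, epochs=10, batch_size=128, verbose=0)

def fraud_scores(X: np.ndarray) -> np.ndarray:
    """Reconstruction error: poorly reconstructed transactions are suspicious."""
    X_hat = autoencoder.predict(X, verbose=0)
    return np.mean((X - X_hat) ** 2, axis=1)

threshold = np.quantile(fraud_scores(X_train), 0.99)  # flag the top 1% for review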
Several investigations into accounting and financial fraud detection, including those by Kirkos et al. [37], Yue et al. [38], Wang et al. [39], Phua et al. [40], Ngai et al. [41], Sharma and Panigrahi [56], and West et al. [42], have employed soft computing and data mining techniques.

Fig. 21.3 Comparison between conventional and deep learning approaches for fraud detection

Much of this literature favours customer data over credit data for fraud detection, largely owing to the differing fundamental dynamics of risk assessment and fraud detection.

Traditionally, credit assessment relied on a credit score model developed while working with borrowers. This credit score model often included a borrower's credit history, income, debt-to-income ratio, and other financial details.
Machine learning and deep learning have increasingly been applied to automate
the credit assessment process. These techniques analyze a wide range of data points
related to potential borrowers, including financial information (income, credit history,
debt obligations) and non-financial information (occupation, education, age). The
predictive power of these algorithms can provide a more accurate assessment of a
borrower’s ability to repay a loan. They can also identify patterns and trends in the
data that further improve credit assessment accuracy (Fig. 21.5).
Today, most credit lending institutions follow a two-phase system. Initially, they
calculate the probability of a person defaulting. If the probability is less than a
certain threshold, the person is classified as a non-defaulter. Given the input parameters of the model, the output also includes the institution's internal rate of return on the credit, derived from the probability of default. Various deep learning techniques used today are trained with a similar output structure.
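The two-phase structure can be sketched as follows: a probability-of-default (PD) model feeds a threshold decision, and an illustrative risk-based price is attached to approved loans. The logistic model, the 10% threshold, and the pricing rule are assumptions made for exposition, not any institution's actual policy.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
# Synthetic stand-in for borrower features (income, history length, DTI, ...).
X = rng.normal(0, 1, (1000, 5))
y = (X[:, 0] - 0.5 * X[:, 2] + rng.normal(0, 1, 1000) > 1.0).astype(int)  # 1 = default

pd_model = LogisticRegression().fit(X, y)  # phase 1: estimate probability of default

def credit_decision(features, threshold=0.10, base_rate=0.05):
    """Phase 2: classify against the threshold and price the approved credit."""
    pd_hat = pd_model.predict_proba(features.reshape(1, -1))[0, 1]
    approved = pd_hat < threshold
    rate = base_rate + 2.0 * pd_hat if approved else None  # illustrative risk spread
    return {"pd": pd_hat, "approved": approved, "rate": rate}

print(credit_decision(X[0]))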
In their study, Baesens et al. [55] examined the effectiveness of several classification algorithms using eight real-world credit score datasets. Their analysis covered classical statistical techniques as well as machine learning classifiers such as neural networks and support vector machines.
In another study, Alaka et al. reviewed and categorised the most recent credit
scoring techniques and datasets. They evaluated classifiers such as decision trees, random forests, and gradient boosting on three publicly available datasets. The study's
findings show that the recent application of machine learning techniques to credit
scoring problems can significantly enhance classification accuracy. A review by
Sharma and Panigrahi examines various data mining techniques used for detecting
financial accounting fraud, providing a comprehensive overview of their strengths
and limitations [56]. Pandey et al. explore the application of machine learning classi-
fiers for credit risk analysis, aiming to improve the accuracy and efficiency of credit
risk prediction [57]. Gunnarsson et al. investigate the use of deep learning for credit
scoring, comparing it with traditional methods and assessing its performance and
reliability [58]. Tripathi et al. present an experimental analysis of various machine
learning methods for credit score classification, evaluating their effectiveness in
accurately classifying credit scores [59].
Overall, machine learning and deep learning techniques can improve the accuracy
and efficiency of credit assessments. They can help financial institutions make more
informed decisions about lending, reduce the risk of bad loans, and provide faster
loan decisions. This can improve customer service and help financial institutions
remain competitive in the rapidly evolving financial industry.
In the scope of this chapter, we have analyzed various applications of deep learning,
a promising subset of artificial intelligence, in the domain of finance. These include
algorithmic trading, price forecasting, fraud detection, and credit assessment.
In terms of our objective, our investigation revealed that deep learning shows
significant promise in all areas examined. Deep learning algorithms demonstrated
higher efficiency and accuracy in detecting fraudulent transactions compared to
conventional rule-based methods. They have also shown effectiveness in automating
the credit assessment process, making it more accurate and efficient. However, while
these applications have shown great promise, it’s crucial to remember that these
algorithms are not a panacea for automated decision-making in finance and come
with their own challenges. Relevance to our stated objective lies in understanding the
pivotal role deep learning plays in enhancing and reshaping crucial financial opera-
tions, leading to improved accuracy, efficiency, and insights. The increasing adoption
of deep learning signals a transformation in the finance industry and indicates future
trends.
This research serves as a comprehensive overview of the current state of deep
learning applications in finance, while also providing insight into its challenges and
potential future directions. The conclusions drawn here underline the need for strong
scientific reasoning skills when adopting deep learning in finance and caution against
an over-reliance on in-sample fitting metrics. A keen understanding of the limitations
of forecasting models, as observed during the financial crisis of 2008, should guide
the adoption of these advanced techniques to avoid pitfalls associated with siloed data
extraction and over-reliance on automation. Ultimately, deep learning offers exciting
potential in finance, but its integration requires careful thought, understanding, and
a measured approach.
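One practical safeguard against the over-reliance on in-sample fitting metrics cautioned above is walk-forward evaluation, sketched below: the model is repeatedly refit on an expanding window and scored only on the subsequent unseen block. The ridge regressor and synthetic data are placeholders for whichever forecaster is being assessed.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def walk_forward_mse(X, y, n_splits=5):
    """Refit on an expanding window; score only on the next unseen block."""
    fold = len(X) // (n_splits + 1)
    errors = []
    for k in range(1, n_splits + 1):
        train_end, test_end = k * fold, (k + 1) * fold
        model = Ridge().fit(X[:train_end], y[:train_end])
        errors.append(mean_squared_error(y[train_end:test_end],
                                         model.predict(X[train_end:test_end])))
    return errors  # every reported error is out-of-sample

rng = np.random.default_rng(8)
X = rng.normal(size=(600, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=600)
print(walk_forward_mse(X, y))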
References
22. Sezer, O.B., Ozbayoglu, M., Dogdu, E.: A deep neural-network based stock trading system
based on evolutionary optimized technical analysis parameters. Procedia Comput. Sci. 114,
473–480 (2017)
23. Navon, A., Keller, Y.: Financial time series prediction using deep learning (2017)
24. Troiano, L., Villa, E.M., Loia, V.: Replicating a trading strategy by means of LSTM for financial industry applications. IEEE Trans. Ind. Inform. 14(7), 3226–3234 (2018)
25. Sirignano, J., Cont, R.: Universal features of price formation in financial markets: perspectives
from deep learning. SSRN Electron. J. (2018)
26. Tsantekidis, A., Passalis, N., Tefas, A., Kanniainen, J., Gabbouj, M., Iosifidis, A.: Using deep learning to detect price change indications in financial markets. In: 2017 25th European Signal Processing Conference (EUSIPCO), IEEE, Aug 2017
27. Gudelek, M.U., Boluk, S.A., Ozbayoglu, A.M.: A deep learning based stock trading model with 2-D CNN trend detection. In: 2017 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, Nov 2017
28. Sezer, O.B., Ozbayoglu, A.M.: Algorithmic financial trading with deep convolutional neural
networks: time series to image conversion approach. Appl. Soft Comput. 70, 525–538 (2018)
29. Hu, G., Hu, Y., Yang, K., Yu, Z., Sung, F., Zhang, Z., Xie, F., Liu, J., Robertson, N., Hospedales,
T., Miemie, Q.: Deep stock representation learning: from candlestick charts to investment
decisions. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP), IEEE, Apr 2018
30. Tsantekidis, A., Passalis, N., Tefas, A., Kanniainen, J., Gabbouj, M., Iosifidis, A.: Forecasting
stock prices from the limit order book using convolutional neural networks. In: 2017 IEEE
19th Conference on Business Informatics (CBI), IEEE, July 2017
31. Gunduz, H., Yaslan, Y., Cataltepe, Z.: Intraday prediction of Borsa Istanbul using convolutional neural networks and feature correlations. Knowl. Based Syst. 137, 138–148 (2017)
32. Chen, C.-T., Chen, A.-P., Huang, S.-H.: Cloning strategies from trading records using agent-
based reinforcement learning algorithm. In: 2018 IEEE International Conference on Agents
(ICA), IEEE, July 2018
33. Wang, Y., Zhang, C., Wang, S., Yu, P.S., Bai, L., Cui, L.: Deep co-investment network learning
for financial assets (2018)
34. Day, M.Y., Lee, C.-C.: Deep learning for financial sentiment analysis on finance news providers.
In: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and
Mining (ASONAM), IEEE, Aug 2016
35. Sirignano, J.: Deep Learning for Limit Order Books (2016)
36. Gao, X.: Deep reinforcement learning for time series: playing idealized trading games (2018)
37. Kirkos, E., Spathis, C., Manolopoulos, Y.: Data mining techniques for the detection of
fraudulent financial statements. Expert Syst. Appl. 32(4), 995–1003 (2007)
38. Yue, D., Wu, X., Wang, Y., Li, Y., Chu, C.-H.: A review of data mining-based financial fraud detection research. In: 2007 International Conference on Wireless Communications, Networking and Mobile Computing, IEEE, Sep 2007
39. Wang, S.: A comprehensive survey of data mining-based accounting-fraud detection research.
In: 2010 International Conference on Intelligent Computation Technology and Automation,
IEEE, May 2010
40. Phua, C., Lee, V.C.S., Smith-Miles, K., Gayler, R.W.: A comprehensive survey of data mining-
based fraud detection research (2010). CoRR, abs/1009.6119
41. Ngai, E.W.T., Hu, Y., Wong, Y.H., Chen, Y., Sun, X.: The application of data mining techniques in financial fraud detection: a classification framework and an academic review of literature. Decis. Support. Syst. 50(3), 559–569 (2011)
42. West, J., Bhattacharya, M.: Intelligent financial fraud detection: a comprehensive review.
Comput. Secur. 57, 47–66 (2016)
43. Heryadi, Y., Warnars, H.L.H.S.: Learning temporal representation of transaction amount for fraudulent transaction recognition using CNN, stacked LSTM, and CNN-LSTM. In: 2017 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom), IEEE, Nov 2017
44. Roy, A., Sun, J., Mahoney, R., Alonzi, L., Adams, S., Beling, P.: Deep learning detecting fraud
in credit card transactions. In: 2018 Systems and Information Engineering Design Symposium
(SIEDS), IEEE, Apr 2018
45. Gomez, J.A., Arévalo, J., Paredes, R., Nin, J.: End-to-end neural network architecture for fraud scoring in card payments. Pattern Recognit. Lett. 105, 175–181 (2018)
46. Sohony, I., Pratap, R., Nambiar, U.: Ensemble learning for credit card fraud detection. In: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (CoDS-COMAD '18), ACM Press (2018)
47. Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P.-E., He-Guelton, L., Caelen,
O.: Sequence classification for credit-card fraud detection. Expert Syst. Appl. 100, 234–245
(2018)
48. Paula, E.L., Ladeira, M., Carvalho, R.N., Marzagao, T.: Deep learning anomaly detection as
support fraud investigation in brazilian exports and anti-money laundering. In: 2016 15th IEEE
International Conference on Machine Learning and Applications (ICMLA), IEEE, Dec 2016
49. Gomes, T.A., Carvalho, R.N., Silva Carvalho, R.: Identifying anomalies in parliamentary
expenditures of brazilian chamber of deputies with deep autoencoders. In: 2017 16th IEEE
International Conference on Machine Learning and Applications (ICMLA), IEEE, Dec 2017
50. Wang, Y., Xu, W.: Leveraging deep learning with lda-based text analytics to detect automobile
insurance fraud. Decis. Support. Syst. 105, 87–95 (2018)
51. Li, L., Zhou, J., Li, X., Chen, T.: Poster: practical fraud transaction prediction. In: ACM Conference on Computer and Communications Security (2017)
52. de Souza Costa, A.I., Silva, L.: Sequence classification of the limit order book using recurrent neural networks (2016)
53. Goumagias, N.D., Hristu-Varsakelis, D., Assael, Y.M.: Using deep Q-learning to understand the tax evasion behavior of risk-averse firms. Expert Syst. Appl. 101, 258–270 (2018)
54. Thomas, L.C., Edelman, D.B., Crook, J.N.: Credit Scoring and Its Applications (2002)
55. Baesens, B., Van Gestel, T., Viaene, S., Stepanova, M., Suykens, J., Vanthienen, J.: Bench-
marking state-of-the-art classification algorithms for credit scoring. J. Oper. Res. Soc. 54(6),
627–635 (2003)
56. Sharma, A., Panigrahi, P.K.: A review of financial accounting fraud detection based on data
mining techniques. Int. J. Comput. Appl. 39(1), 37–47 (2012)
57. Pandey, T.N., Jagadev, A.K., Mohapatra, S.K., Dehuri, S.: Credit risk analysis using machine
learning classifiers. In: 2017 International Conference on Energy, Communication, Data
Analytics and Soft Computing (ICECDS), 2017
58. Gunnarsson, B.R., Vanden Broucke, S., Baesens, B., Óskarsdóttir, M., Lemahieu, W.: Deep
learning for credit scoring: do or don’t? Eur. J. Oper. Res. 295(1), 292–305 (2021)
59. Tripathi, D., Edla, D.R., Bablani, A., Shukla, A.K., Reddy, B.R.: Experimental analysis of
machine learning methods for credit score classification. Prog. Artif. Intell. 10(3), 217–243
(2021)
Chapter 22
New Paradigm in Financial Technology
Using Machine Learning Techniques
and Their Applications
Abstract Due to the inherent risks and challenges associated with financial manage-
ment, researchers have faced a significant obstacle when analyzing financial data.
The necessity for developing innovative models to comprehend financial assets
has become imperative due to the transformation of the foundational principles
underpinning financial markets. In order to provide a precise representation of
data, scholars have introduced various machine learning systems that have shown
promising outcomes. Within the pages of this book chapter, we delve into the progres-
sion of machine learning in the realm of finance over the past decade, with a particular
focus on its applications encompassing Algorithmic Trading, Fraud Detection and
Prevention, Portfolio Management, and Loan Underwriting. Algorithmic Trading
is a methodology that leverages machine learning algorithms to extract knowledge
from data, enabling and enhancing essential investment endeavors. These algorithms
can acquire rules or structures from data in pursuit of objectives such as reducing
prediction errors. In an era where fraudsters continuously evolve and sharpen their
tactics, maintaining constant vigilance is crucial to thwart fraud and stay one step
ahead of malicious actors. It is imperative to be attuned to significant trends that can
distinguish between legitimate and fraudulent transactions. This section compre-
hensively analyzes multiple machine learning algorithms, supported by examples.
Moreover, the chapter delves into the examination of the impact of machine learning
approaches in assessing credit risk and finance. It scrutinizes the limitations of recent
studies and explores emerging research trends in this domain.
D. Patnaik (B)
Kalinga University, Raipur, India
e-mail: [email protected]
S. Patnaik
Interscience Institute of Management and Technology, Bhubaneswar, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
L. A. Maglaras et al. (eds.), Machine Learning Approaches in Financial Analytics,
Intelligent Systems Reference Library 254,
https://doi.org/10.1007/978-3-031-61037-0_22
22.1 Introduction
The field of machine learning (ML), a subset of artificial intelligence (AI), leverages
statistical techniques to imbue computer models with the ability to learn from data,
empowering them to perform specific tasks without explicit programming. A new
era of machine learning and data science is currently unfolding within the realm
of banking, promising to reshape the industry in the years to come. Presently, a
majority of financial institutions, including hedge funds, capital investment firms,
retail banks, and fintech companies, are actively adopting and investing in machine
learning. Consequently, the financial sector is poised to require a growing number
of experts specializing in machine learning and data science [1].
Machine learning has gained prominence in the finance sector, primarily due to the
availability of vast data volumes and increased processing capabilities. Data science
and machine learning have found extensive utility across all facets of finance. While
the primary goal of applying machine learning in finance is to enhance accuracy,
this may not be the sole criterion for evaluating system effectiveness, especially in
the context of financial trading. Profitability and cumulative returns over a defined
trading period emerge as the most crucial metrics for assessing trading strategies [2].
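The sketch below makes the distinction concrete by computing cumulative return alongside directional accuracy for a stream of positions: a strategy whose accuracy is only marginally above 50% can still compound to a worthwhile profit. Returns and signals are synthetic, and transaction costs are deliberately ignored.

import numpy as np

def strategy_metrics(returns: np.ndarray, signals: np.ndarray) -> dict:
    """Cumulative return and hit rate of a long/flat/short (+1/0/-1) strategy."""
    strat = signals * returns                # per-period strategy returns
    cumulative = np.prod(1 + strat) - 1      # compounded over the window
    active = signals != 0
    accuracy = np.mean(np.sign(signals[active]) == np.sign(returns[active]))
    return {"cumulative_return": cumulative, "directional_accuracy": accuracy}

rng = np.random.default_rng(5)
rets = rng.normal(0.0005, 0.01, 250)   # one year of synthetic daily returns
sigs = rng.choice([-1, 0, 1], size=250)
print(strategy_metrics(rets, sigs))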
A series of experiments was conducted to investigate the impact of three vari-
ables: the size of the training dataset, the duration of retraining, and the number of
features in both the training and test datasets. The results revealed relatively low
accuracy, with only a marginal improvement over the 50% mark. However, they
also demonstrated highly promising outcomes in terms of profitability. Multiple
references underscore the existing flaws in the credit lending system for various
reasons [3]. Machine learning can unearth novel relationships that human intuition
might never consider exploring, raising ethical and legal considerations regarding its
application.
To ensure the success of machine learning in the banking industry, it is imperative
to construct robust infrastructure, employ the appropriate toolsets, and implement
suitable algorithms. These factors collectively play a pivotal role in harnessing the
potential of machine learning within the banking sector.
(Figure: application areas of machine learning in finance, including portfolio management, algorithmic trading, and options pricing and risk.)
ML models can learn from the past and identify potential issues. ML is used to backtest and simulate trading strategies using historical data. This helps traders assess the viability and profitability of their algorithms before deploying them in live markets.
• Deep Learning: Neural architectures are increasingly applied to fraud detection tasks. Deep learning models can capture complex patterns in sequential data (e.g., transaction sequences) and unstructured data (e.g., images of documents) to identify fraud [20].
• Real-Time Monitoring: ML models can be deployed in real-time systems to monitor transactions and activities as they occur, enabling immediate detection and prevention of fraudulent transactions and reducing financial losses and customer impact. The trained model is implemented in a real-time or batch processing system to continuously monitor transactions, applications, or activities for potential fraud.
• Behavioral Analysis: ML models can analyze user behavior over time to create profiles of legitimate users. Any deviations from these profiles can trigger alerts for further investigation (a minimal sketch follows at the end of this item).
• Graph Analytics: For detecting network-related fraud, such as identity theft
or collusion, ML models can utilize graph analytics to analyze the rela-
tionships and connections between entities (e.g., individuals or accounts) to
identify suspicious patterns [21].
• Cross-Channel Analysis: Fraud detection systems often analyze data from
multiple channels, such as online transactions, mobile apps, and call centers.
ML helps in identifying fraudulent patterns that span multiple channels [22].
• Model Evaluation and Improvement: Continuous monitoring and evalua-
tion of ML models are essential. Models should be retrained and updated to
adapt to evolving fraud techniques and changes in data patterns.
• Regulatory Compliance: ML models in fraud detection must adhere to regu-
latory and compliance standards, such as GDPR and PCI DSS, to ensure the
privacy and security of customer data.
ML-powered fraud detection systems have the advantage of scalability and adapt-
ability, making them a valuable asset in the ongoing battle against fraud. These
systems can evolve with emerging fraud patterns and provide organizations with the
ability to respond swiftly to new threats [23–25].
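As noted in the behavioral analysis item above, a minimal version of per-user profiling can be sketched as follows: each user's transaction amounts define a profile, and large z-score deviations raise alerts. The column names, the 3-sigma cut-off, and the synthetic data are assumptions; a production profile would also cover time-of-day, merchant, geography, and device features.

import numpy as np
import pandas as pd

def behavioural_alerts(tx: pd.DataFrame, z_cut: float = 3.0) -> pd.DataFrame:
    """Flag transactions whose amount deviates strongly from the user's history."""
    stats = tx.groupby("user_id")["amount"].agg(["mean", "std"])
    joined = tx.join(stats, on="user_id")
    joined["z"] = (joined["amount"] - joined["mean"]) / joined["std"].replace(0, np.nan)
    return joined[joined["z"].abs() > z_cut]

rng = np.random.default_rng(6)
tx = pd.DataFrame({"user_id": rng.integers(0, 50, 2000),
                   "amount": rng.lognormal(3, 0.4, 2000)})
tx.loc[0, "amount"] = 10_000  # inject an obvious outlier into one user's history
print(behavioural_alerts(tx)[["user_id", "amount", "z"]].head())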
4. Portfolio Management: ML can assist portfolio managers in optimizing
asset allocation by analyzing market data and historical performance. It can
recommend investment strategies and adjust portfolios to meet specific risk and
return objectives [26]. ML is revolutionizing the field of portfolio management
by providing advanced tools and techniques to optimize investment strategies,
manage risk, and make data-driven decisions. Here are several ways in which
ML is applied in portfolio management:
• Predictive Analytics: ML models can analyze historical market data,
economic indicators, and company-specific information to make predictions
about asset prices, market trends, and economic conditions. These predictions
can inform portfolio managers when making investment decisions.
• Asset Allocation Optimization: A core task is to allocate the portfolio across different asset classes, such as equities, fixed income, real estate, and alternative investments, based on the investor's risk profile and investment goals (a minimal allocation sketch follows below).
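The sketch below shows one classical formulation of this allocation step: unconstrained minimum-variance weights computed from the sample covariance of asset returns. It is a textbook baseline on assumed synthetic returns, not a recommendation engine; practical allocators add return objectives and constraints such as no-shorting or turnover limits.

import numpy as np

def min_variance_weights(returns: np.ndarray) -> np.ndarray:
    """Minimum-variance weights: w proportional to inv(Cov) @ 1, normalised to sum to 1."""
    cov = np.cov(returns, rowvar=False)    # returns has shape (periods, assets)
    w = np.linalg.inv(cov) @ np.ones(cov.shape[0])
    return w / w.sum()

rng = np.random.default_rng(7)
rets = rng.normal(0.001, 0.02, (500, 4))  # synthetic equity/bond/REIT/alts returns
print(min_variance_weights(rets).round(3))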
22.3 Conclusion
These are just a few examples of how ML is transforming the finance industry. As
technology and data continue to advance, the applications of ML in finance are likely
to expand even further, leading to more efficient and innovative financial services.
ML has become an indispensable tool across diverse areas of the financial sector, encom-
passing asset management, risk evaluation, investment advisory, anti-financial crime
efforts, document verification, and beyond. As ML algorithms handle a myriad of
functions, they continually evolve through data-driven learning, propelling the evolu-
tion of a fully automated financial landscape. This chapter has explored several of
the boundless potentials that machine learning brings to financial technology. In
the years ahead, ongoing research in the field may see ML making substantial contri-
butions to the analysis of the financial industry. These applications represent just
a fraction of the innovative ways machine learning is reshaping the financial tech-
nology landscape. As technology continues to advance, we can expect even more
transformative developments in the intersection of finance and artificial intelligence.
However, it’s important to note that these advancements also raise important ques-
tions regarding data privacy, security, and ethical considerations, which need to be
carefully addressed as the industry evolves.
References
1. Warin, T., Stojkov, A.: Machine learning in finance: a metadata-based systematic review of the
literature. J. Risk Financ. Manag. 14(7), 302 (2021). https://doi.org/10.3390/jrfm14070302
2. Gerlein, E.A., McGinnity, M., Belatreche, A., Coleman, S.: Evaluating machine learning clas-
sification for financial trading: An empirical approach. Expert. Syst. Appl. (2016). https://doi.
org/10.1016/j.eswa.2016.01.018
3. Kodru, S.S.: Machine learning applications in finance (2021). http://hdl.handle.net/1920/12227
4. Jackson, S.: Machine Learning for Algorithmic Trading, 2nd edn (2020)
5. Huang, B., Huan, Y., Xu, L.D., Zheng, L., Zou, Z.: Automated trading systems statistical and
machine learning methods and hardware implementation: a survey (2019). https://doi.org/10.
1080/17517575.2018.1493145
6. Huang, Z., Li, N., Mei, W., Gong, W.: Algorithmic trading using combinational rule vector and
deep reinforcement learning. Appl. Soft Comput., 110802 (2023). ISSN 1568-4946. https://
doi.org/10.1016/j.asoc.2023.110802
7. Dubey, R.K.: Algorithmic trading: the intelligent trading systems and its impact on trade size.
Expert. Syst. Appl. 202, 117279 (2022). https://doi.org/10.1016/j.eswa.2022.117279
8. Majidi, M., Shamsi, M., Marvasti, F.: Algorithmic trading using continuous action space deep
reinforcement learning. Expert. Syst. Appl. 235, 121245 (2024). ISSN 0957-4174. https://doi.
org/10.1016/j.eswa.2023.121245
9. Ning, L.: A Machine Learning Approach to Automated Trading. Boston College Computer
Science Senior, Boston, MA (2016)
10. Markov, A., Seleznyova, Z., Lapshin, V.: Credit scoring methods: latest trends and points to
consider. J. Financ. Data Sci. 8, 180–201 (2022). ISSN 2405-9188. https://doi.org/10.1016/j.
jfds.2022.07.002
11. Bueff, A.C., Cytryński, M., Calabrese, R., Jones, M., Roberts, J., Moore, J., Brown, I.: Machine
learning interpretability for a stress scenario generation in credit scoring based on counterfac-
tuals. Expert. Syst. Appl. 202, 117271 (2022). ISSN 0957-4174. https://doi.org/10.1016/j.eswa.
2022.117271
12. Liu, W., Fan, H., Xia, M., Xia, M.: A focal-aware cost-sensitive boosted tree for imbalanced
credit scoring. Expert. Syst. Appl. 208, 118158 (2022). ISSN 0957-4174. https://doi.org/10.
1016/j.eswa.2022.118158
13. Liu, W., Fan, H., Xia, M.: Tree-based heterogeneous cascade ensemble model for credit scoring.
Int. J. Forecast. (2022). ISSN 0169-2070. https://doi.org/10.1016/j.ijforecast.2022.07.007
14. Albanesi, S., DeGiorgi, G., Nosal, J.: Credit growth and the financial crisis: a new narrative. J.
Monet. Econ. 132, 118–139 (2022). ISSN 0304-3932. https://doi.org/10.1016/j.jmoneco.2022.
09.001
15. Helder, V.G., Filomena, T.P., Ferreira, L., Kirch, G.: Application of the VNS heuristic for feature
selection in credit scoring problems. Mach. Learn. Appl. 9, 100349 (2022). ISSN 2666-8270.
https://doi.org/10.1016/j.mlwa.2022.100349
16. Simumba, N., Okami, S., Kodaka, A., Kohtake, N.: Multiple objective metaheuristics for feature
selection based on stakeholder requirements in credit scoring. Decis. Support. Syst. 155, 113714
(2022). ISSN 0167-9236. https://doi.org/10.1016/j.dss.2021.113714
17. Lee, K., Lee, H., Lee, H., Yoon, Y., Lee, E., Rhee, W.: Assuring explainability on demand
response targeting via credit scoring. Energy 161, 670–679 (2018). ISSN 0360-5442. https://
doi.org/10.1016/j.energy.2018.07.179
18. Rodrigues, V.F., Policarpo, L.M., da Silveira, D.E., da Rosa Righi, R., da Costa, C.A., Victória
Barbosa, J.L., Antunes, R.S., Scorsatto, R., Arcot, T.: Fraud detection and prevention in e-
commerce: a systematic literature review. Electron. Commer. Res. Appl. 56, 101207 (2022).
ISSN 1567-4223. https://doi.org/10.1016/j.elerap.2022.101207
19. Khatri, S., Arora, A., Agrawal, A.P.: Supervised machine learning algorithms for credit card
fraud detection: a comparison (2020). https://doi.org/10.1109/Confluence47617.2020.9057851
20. Raghavan, P., Gayar, N.E.: Fraud detection using machine learning and deep learning (2019).
https://doi.org/10.1109/ICCIKE47802.2019.9004231
21. Sun, H., Li, J., Zhu, X.: Financial fraud detection based on the part-of-speech features of
textual risk disclosures in financial reports. Procedia Comput. Sci. 221, 57–64 (2023). ISSN
1877-0509. https://doi.org/10.1016/j.procs.2023.07.009
22. Fanai, H., Abbasimehr, H.: A novel combined approach based on deep autoencoder and deep
classifiers for credit card fraud detection. Expert. Syst. Appl. 217, 119562 (2023). ISSN 0957-
4174. https://doi.org/10.1016/j.eswa.2023.119562
23. Shirgave, S., Awati, C., More, R., Patil, S.: A review on credit card fraud detection using
machine learning (2019)
24. Yi, Z., Cao, X., Pu, X., Wu, Y., Chen, Z., Khan, A.T., Francis, A., Li, S.: Fraud detection in
capital markets: a novel machine learning approach. Expert. Syst. Appl. 231, 120760 (2023).
ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2023.120760
25. Cao, R., Wang, J., Mao, M., Liu, G., Jiang, C.: Feature-wise attention based boosting ensemble
method for fraud detection. Eng. Appl. Artif. Intell. 126, 106975 (2023). ISSN 0952-1976.
https://doi.org/10.1016/j.engappai.2023.106975
26. Pozen, R.C., Ruane, J.: What machine learning will mean for asset managers (2019). https://
hbr.org/2019/12/what-machine-learning-will-mean-for-asset-managers
27. Soleymani, F., Paquet, E.: Financial portfolio optimization with online deep reinforcement
learning and restricted stacked autoencoder—deep breath (2020). https://doi.org/10.1016/j.
eswa.2020.113456
28. Wang, Z., Huang, B., Tu, S., Zhang, K., Xu, L.: Deep trader: a deep reinforcement learning
approach for risk-return balanced portfolio management with market conditions embedding
(2021). https://doi.org/10.1609/aaai.v35i1.16144
29. Tan, Z., Yan, Z., Zhu, G.: Stock selection with random forest: an exploitation of excess return
in the Chinese stock market (2019). https://doi.org/10.1016/j.heliyon.2019.e02310
30. Chuan, Y., Zhao, C., He, Z., Wu, L.: The success of adaboost and its application in portfolio
management (2021). https://doi.org/10.1142/S2424786321420019
31. Jiang, Z., Ji, R., Chang, K.-C.: A machine learning integrated portfolio rebalance framework
with risk-aversion adjustment (2020). https://doi.org/10.3390/jrfm13070155
32. Jomthanachai, S., Wong, W.-P., Lim, C.-P.: An application of data envelopment analysis and
machine learning approach to risk management. IEEE Access 9, 85978–85994 (2021). https://
doi.org/10.1109/ACCESS.2021.3087623
33. Liu, Y.: Artificial intelligence and machine learning based financial risk network assessment
model. In: 2023 IEEE 12th International Conference on Communication Systems and Network
Technologies (CSNT), Bhopal, India, 2023, pp. 158–163. https://doi.org/10.1109/CSNT57126.
2023.10134653.
34. Dominguez, G.A., Kawaai, K., Maruyama, H.: FAILS: a tool for assessing risk in ML systems.
In: 2021 28th Asia-Pacific Software Engineering Conference Workshops (APSEC Workshops),
Taipei, Taiwan, 2021, pp. 1–4. https://doi.org/10.1109/APSECW53869.2021.00010
35. Aljabhan, B.: Economic strategic plans with supply chain risk management (SCRM) for organi-
zational growth and development. Alex. Eng. J. 79, 411–426 (2023). ISSN 1110-0168. https://
doi.org/10.1016/j.aej.2023.08.020
36. d’Ambrosio, N., Perrone, G., Romano, S.P.: Including insider threats into risk management
through Bayesian threat graph networks. Comput. Secur. 133, 103410 (2023). ISSN 0167-4048.
https://doi.org/10.1016/j.cose.2023.103410
37. Chakabva, O., Tengeh, R.K.: The relationship between SME owner-manager characteristics
and risk management strategies. J. Open Innov. Technol. Mark. Complex. 9(3), 100112 (2023).
ISSN 2199-8531. https://doi.org/10.1016/j.joitmc.2023.100112
38. Yun, J.: The effect of enterprise risk management on corporate risk management. Financ. Res.
Lett. 55, 103950 (2023). ISSN 1544-6123. https://doi.org/10.1016/j.frl.2023.103950
39. Tan, Y., Zhang, G.-J.: The application of machine learning algorithm in underwriting process.
In: 2005 International Conference on Machine Learning and Cybernetics, vol. 6, Guangzhou,
China, 2005, pp. 3523–3527. https://doi.org/10.1109/ICMLC.2005.1527552
40. Vandervorst, F., Verbeke, W., Verdonck, T.: Data misrepresentation detection for insurance
underwriting fraud prevention. Decis. Support. Syst. 159, 113798 (2022). ISSN 0167-9236.
https://doi.org/10.1016/j.dss.2022.113798
41. Linnér, R.K., Koellinger, P.D.: Genetic risk scores in life insurance underwriting. J. Health
Econ. 81, 102556 (2022). ISSN 0167-6296. https://doi.org/10.1016/j.jhealeco.2021.102556
42. Dubey, A., Parida, T., Birajdar, A., Prajapati, A.K., Rane, S.: Smart underwriting system: an
intelligent decision support system for insurance approval & risk assessment. In: 2018 3rd
International Conference for Convergence in Technology (I2CT), Pune, India, 2018, pp. 1–6.
https://doi.org/10.1109/I2CT.2018.8529792
43. Doultani, M., Bhagchandani, J., Lalwani, S., Palsule, M., Sahoo, A.: Smart underwriting—a
personalised virtual agent. In: 2021 5th International Conference on Intelligent Computing and
Control Systems (ICICCS), Madurai, India, 2021, pp. 1762–1767. https://doi.org/10.1109/ICI
CCS51141.2021.9432216
44. Nikolopoulos, C., Duvendack, S.: A hybrid machine learning system and its application to insur-
ance underwriting. In: Proceedings of the First IEEE Conference on Evolutionary Computation.
IEEE World Congress on Computational Intelligence, vol. 2, Orlando, FL, 1994, pp. 692–695.
https://doi.org/10.1109/ICEC.1994.349974