
Intelligent Systems Reference Library 254

Leandros A. Maglaras
Sonali Das
Naliniprava Tripathy
Srikanta Patnaik
Editors

Machine Learning Approaches in Financial Analytics
Intelligent Systems Reference Library

Volume 254

Series Editors
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The aim of this series is to publish a Reference Library, including novel advances and developments in all aspects of Intelligent Systems in an easily accessible and well structured form. The series includes reference works, handbooks, compendia, textbooks, well-structured monographs, dictionaries, and encyclopedias. It contains well integrated knowledge and current information in the field of Intelligent Systems. The series covers the theory, applications, and design methods of Intelligent Systems. Virtually all disciplines such as engineering, computer science, avionics, business, e-commerce, environment, healthcare, physics and life science are included. The list of topics spans all the areas of modern intelligent systems such as: Ambient intelligence, Computational intelligence, Social intelligence, Computational neuroscience, Artificial life, Virtual society, Cognitive systems, DNA and immunity-based systems, e-Learning and teaching, Human-centred computing and Machine ethics, Intelligent control, Intelligent data analysis, Knowledge-based paradigms, Knowledge management, Intelligent agents, Intelligent decision making, Intelligent network security, Interactive entertainment, Learning paradigms, Recommender systems, Robotics and Mechatronics including human-machine teaming, Self-organizing and adaptive systems, Soft computing including Neural systems, Fuzzy systems, Evolutionary computing and the Fusion of these paradigms, Perception and Vision, Web intelligence and Multimedia.
Indexed by SCOPUS, DBLP, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
Leandros A. Maglaras · Sonali Das · Naliniprava Tripathy · Srikanta Patnaik
Editors

Machine Learning Approaches in Financial Analytics
Editors

Leandros A. Maglaras
Edinburgh Napier University
Edinburgh, UK

Sonali Das
Department of Business Management
University of Pretoria
Hatfield, South Africa

Naliniprava Tripathy
Indian Institute of Management
Shillong, India

Srikanta Patnaik
Institute of Management and Technology
Bhubaneswar, Odisha, India

ISSN 1868-4394 (print)  ISSN 1868-4408 (electronic)
Intelligent Systems Reference Library
ISBN 978-3-031-61036-3 (print)  ISBN 978-3-031-61037-0 (eBook)
https://doi.org/10.1007/978-3-031-61037-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

If disposing of this product, please recycle the paper.


Preface

Welcome to Machine Learning Approaches in Financial Analytics. In an era where
technology is rapidly transforming every industry, the fusion of machine learning
and finance has created a powerful synergy that is reshaping how we understand,
predict, and navigate the intricate landscape of financial markets.
This book serves as a comprehensive guide to the intersection of machine learning
and finance. It’s designed for both seasoned finance professionals seeking to integrate
the latest technological advancements into their work and for data scientists eager to
delve into the intricate world of financial analytics.
The financial world has always been a realm of complexity, marked by volatility,
uncertainty, and dynamic interconnectedness. Traditional models and tools have
often struggled to capture the multifaceted nature of this domain. However, machine
learning techniques offer a paradigm shift, providing the capability to process
vast amounts of data, identify patterns, and generate insights that were previously
unimaginable.
Throughout the chapters of this book, we explore the fundamental principles
of machine learning and how they can be applied to tackle a myriad of financial
challenges. From predictive modeling, risk assessment, algorithmic trading, portfolio
optimization, fraud detection, to customer segmentation, the potential applications
are boundless.
Readers will embark on a journey that begins with foundational concepts and grad-
ually progresses to advanced methodologies, allowing for a comprehensive under-
standing of both the financial and technological aspects. Real-world case studies and
practical examples will illustrate how machine learning algorithms are transforming
the way we perceive, analyze, and strategize within financial markets.
This book aims to bridge the gap between the financial and technological realms,
catering to those who seek to harness the power of machine learning in their finan-
cial endeavors. By providing a deeper comprehension of the underlying principles
and methodologies, readers will be equipped to make informed decisions, develop
innovative strategies, and ultimately leverage the potential of machine learning in
the realm of finance.


We sincerely hope that Machine Learning Approaches in Financial Analytics
serves as a valuable resource in your quest to explore the dynamic landscape where
finance and technology converge. May this knowledge empower you to navigate
the complexities of financial markets with confidence, enabling you to unlock new
opportunities and insights in the realm of financial analytics.
Best wishes on your journey into the world of machine learning in financial
analytics.

Bhubaneswar, India
Srikanta Patnaik


Editorial

In today’s fast-paced, data-driven world, the realms of finance and technology are
converging like never before. Machine learning, a subset of artificial intelligence, has
emerged as a game-changer in the world of financial analytics. The integration of
advanced algorithms and predictive models has revolutionized the way financial insti-
tutions, investors, and professionals analyse and predict market trends, manage risk,
and make critical decisions. In this edited volume, Machine Learning Approaches
in Financial Analytics, we dive deep into this exciting and rapidly evolving field,
providing a comprehensive guide for individuals looking to leverage the power of
machine learning in the finance industry.
The financial industry is no stranger to technological innovation, and machine
learning is the latest breakthrough in this ongoing transformation. From algorithmic
trading and portfolio management to credit risk assessment and fraud detection,
machine learning techniques are being harnessed to enhance efficiency, reduce costs,
and improve accuracy in decision-making. Our book unravels this financial revolu-
tion, making complex concepts accessible to both novices and experts, as we explore
the intersection of finance and artificial intelligence.

Key Features

Foundations: We start by building a strong foundation in machine learning, ensuring
that readers are equipped with the knowledge they need to navigate the complexities
of financial analytics. This part provides a gentle introduction to machine learning
concepts, algorithms, and models.
Tools and Techniques: To assist professionals in applying machine learning to their
work, we offer practical guidance, including coding examples, case studies, and
insights from industry experts. The book bridges the gap between theory and practice,
helping readers translate knowledge into action.


Risk Assessment and Ethical Considerations: In an era defined by market volatility
and economic uncertainty, managing risk is paramount. Machine learning offers
powerful tools for assessing, mitigating, and predicting risks in the financial sector.
Our book delves into these risk management strategies, providing insights into stress
testing, credit scoring, and fraud detection.
As with any transformative technology, machine learning in finance comes with
ethical considerations. We address these concerns, discussing topics like bias in
algorithms, data privacy, and regulatory compliance to ensure that readers have a well-
rounded understanding of the impact of machine learning in the financial industry.
Real-World Applications: The heart of our book lies in the exploration of real-
world applications of machine learning in finance. From forecasting stock prices and
managing investment portfolios to optimizing trading strategies, we illustrate how
machine learning is transforming these critical financial processes.
The final part explores the future of machine learning in financial analytics,
predicting upcoming trends, and suggesting areas where further innovation is likely
to occur.
Chapter 1 entitled “Introduction to Optimal Execution” by Makoto Shimoshimizu
aims to overview how the current financial market works and how one can analyse
and build the algorithms for an optimal trading strategy.
Chapter 2 entitled “Python Stack for Design and Visualization in Financial Engi-
neering” authored by Jayanth R. Varma and Vineet Virmani highlights the power of
the Python stack for designing graphical user interfaces for engineering structured
product solutions by visualizing their payoffs and prices in a web browser.
Chapter 3 entitled “Neurodynamic Approaches to Cardinality-Constrained Port-
folio Optimization” authored by Man-Fai Leung and Jun Wang focuses on the
integration of neurodynamic optimization and cardinality-constrained portfolio
optimization with fruitful results and significant breakthroughs.
Chapter 4 entitled “Fully Homomorphic Encrypted Wavelet Neural Network for
Privacy-Preserving Bankruptcy Prediction in Banks” by Syed Imtiaz Ahamed et al.
proposes a fully homomorphic encrypted wavelet neural network to protect privacy
and at the same time not compromise on the efficiency of the model.
Chapter 5 by Marco Piccolo and Francesco Vigliarolo is titled as “Tools and
Measurement Criteria of Ethical Finance Through Computational Finance” which
demonstrates how computational finance itself can be treated in terms of social
reasoning.
Chapter 6 by Gaurav Kumar and Arun Kumar Misra entitled “Data Mining Tech-
niques for Predicting the Non-performing Assets (NPA) of Banks in India” presents
the findings of a formal attempt to explain NPA variations from 2005 to 2017.
Chapter 7, entitled “Multiobjective Optimization of Mean–Variance-Downside-Risk Portfolio Selection Models”, is authored by Georgios Mamanis. He
experimentally investigated the out-of-sample performance of three multiobjective
portfolio optimization models, namely Mean-Variance-VaR, Mean-Variance-LPSD
(LPSD: Lower Partial Standard Deviation) and Mean-Variance-Skewness.
Chapter 8 by Simrat Kaur and Anjali Munde, entitled “Bankruptcy Forecasting of Indian Manufacturing Companies Post the Insolvency and Bankruptcy Code 2016 Using Machine Learning Techniques”, conducts a comparative analysis of numerous bankruptcy prediction models in order to recommend the model with the highest accuracy for bankruptcy prediction.
Chapter 9, entitled “Ensemble Deep Reinforcement Learning for Financial Trading”, is authored by Mendhikar Vishal et al. They propose a couple of ensemble methods that use a few deep reinforcement learning (DRL) architectures to train on dynamic markets and learn complex trading strategies to achieve maximum returns on investments.
Chapter 10 entitled “Bibliometric Analysis of Digital Financial Reporting”
authored by Neha Puri and Vikas Garg examines the literature that has been written
about digital financial reporting between 2011 and 2022 using descriptive research.
Chapter 11 entitled “The Quest for Financing Environmental Sustainability in
Emerging Nations: Can Internet Access and Financial Technology Be Crucial?” by
Ekundayo Peter Mesagan et al. analyses the role of internet access and financial
technology adoption to drive the quest for environmental sustainability financing in
emerging nations with a special focus on African countries.
Chapter 12 by Sidhartha Harichandan et al. entitled “A Comprehensive Review
of Bitcoin’s Energy Consumption and Its Environmental Implications” forecasts the
future of bitcoin mining and its influence on sustainability.
Chapter 13 entitled “Emerging Economies: Volatility Prediction in the Metal
Futures Markets Using GARCH Model” by Ravi Kumar et al. aims to study the
volatility and its prediction using the GARCH (1, 1) model in the metal futures of
two emerging economies, India and China.
Chapter 14 by Mekar Satria Utama et al., entitled “Constructing a Broad View of Tax Compliance Intentions Based on Big Data”, identifies the elements steering tax compliance intentions and explores the various variables that influence them.
Chapter 15 by Suzan Dsouza and Ajay Kumar Jain, titled “Influence of Firm-Specific Variables on Capital Structure Decisions: An Evidence from the Fintech Industry”, examines the influence of the firm-specific variables that determine the capital structure (Capstr) decisions of firms in the fintech industry.
Chapter 16 by Vasilios N. Katsikis et al. entitled “A Weights Direct Determina-
tion Neural Network for Credit Card Attrition Analysis” utilizes neural networks to
address the challenges of credit card attrition since they have found great application
in many classification problems.
Chapter 17 entitled “Stock Market Prediction Using Machine Learning: Evidence
from India” is authored by Subhamitra Patra et al. They predict the movements of
the Indian stock markets over 2000–2022 and observe certain dynamism in both the actual and predicted trends of the Indian stock markets.
Chapter 18 by Riza Demirer et al. is titled “Realized Stock-Market Volatility: Do Industry Returns Have Predictive Value?”. They utilized a machine learning technique known as random forests to compute predictions of realized (good and bad) stock-market volatility, and showed that incorporating the information in lagged industry returns can help improve out-of-sample predictions of aggregate stock-market volatility.
Chapter 19 entitled “Machine Learning Techniques for Corporate Governance”
by Deepika Gupta seeks to find answers and solutions by exploring new thoughts
not only on performance measures and theories of corporate governance but also on
new research methods through machine learning techniques.
Chapter 20 entitled “Machine Learning Approaches for Forecasting Financial
Market Volatility” by Itishree Behera et al. extends the discussion of forecasting
financial market volatility using machine learning techniques to the real estate market
context.
Chapter 21 by Sai Krishna Vishnumolakala et al. entitled “Deep Learning Models
in Finance: Past, Present, and Future” provides a comprehensive overview of the
current state of the art in DL models for financial applications.
Last but not least, Chap. 22, entitled “New Paradigm in Financial Technology Using Machine Learning Techniques and Their Applications”, is authored by Deepti Patnaik and Srikanta Patnaik. It delves into the examination of the impact of machine learning approaches in assessing credit risk and finance, scrutinizes the limitations of recent studies, and explores emerging research trends in this domain.
Machine Learning Approaches in Financial Analytics is a comprehensive guide
for anyone seeking to navigate the dynamic intersection of finance and technology.
The integration of machine learning into financial analytics has the potential to rede-
fine the industry, offering new opportunities for growth, risk management, and finan-
cial well-being. It combines theoretical insights with practical applications, ethical
considerations, and expert perspectives to offer a holistic understanding of the impact
and potential of machine learning in finance. This book will empower its readers to
make informed, data-driven decisions in the dynamic world of financial analytics.
Whether one is a finance professional looking to gain a competitive edge, an
investor seeking better decision-making tools, or a student eager to explore the fore-
front of financial technology, this book provides the knowledge and insights you
need to succeed in this exciting and transformative field. As you embark on your
journey through these pages, you will not only master the tools and techniques but
also gain a profound understanding of how machine learning is reshaping the future
of finance.

Bhubaneswar, India
Srikanta Patnaik


Contents

Part I Foundations
1 Introduction to Optimal Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Makoto Shimoshimizu
1.1 Overview: Financial Market and Execution Problem . . . . . . . . . 3
1.1.1 Electronic Market and System Transition . . . . . . . . . . . . 3
1.1.2 Large Trader and Market (Price) Impact . . . . . . . . . . . . . 5
1.1.3 Structure of This Chapter . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2 Notations and Some Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Almgren-Chriss Model for Optimal Execution . . . . . . . . . . . . . . . 11
1.3.1 Market Model and Optimal Execution Strategy . . . . . . 11
1.3.2 Efficient Frontier of Optimal Execution:
A Mean-Variance Perspective . . . . . . . . . . . . . . . . . . . . . 15
1.4 A Continuous-time Analog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4.2 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.5 Transient Impact Model with Small Traders’ Orders [35] . . . . . . 20
1.5.1 Market Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5.2 Formulation as a Markov Decision Process . . . . . . . . . . 24
1.5.3 Dynamics of the Optimal Execution . . . . . . . . . . . . . . . . 25
1.5.4 In the Case with Target Close Order . . . . . . . . . . . . . . . . 33
1.5.5 Computation Method for Optimal Execution . . . . . . . . . 34
1.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Appendix 1: Lagrange Multiplier Method . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Appendix 2: Second-Order Linear Difference Equation: A Review . . . . 38
Appendix 3: Euler-Lagrange Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Appendix 4: Second-order Linear ODE: A Review . . . . . . . . . . . . . . . . . . 42
Appendix 5: A Review of Discrete-time Stochastic Dynamic
Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

Part II Tools and Techniques


2 Python Stack for Design and Visualization in Financial
Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Jayanth R. Varma and Vineet Virmani
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.2 Design of Interactive Applications: Literature Review . . . . . . . . 55
2.3 Design Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.4 Interactive Python Applications with Jupyter
and Matplotlib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.4.1 Interactive Plotting with Matplotlib . . . . . . . . . . . . . . . . . 58
2.5 Python Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.5.1 The Exemplar Structured Product: A 3-Way
Collar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.5.2 Python Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.5.3 Black_Scholes: An Object-Oriented Python
Module for Designing and Pricing . . . . . . . . . . . . . . . . . . 62
2.5.4 Adding Matplotlib Widgets at Run Time . . . . . . . . . . . . 63
2.5.5 Extensions: An Example with a Barrier Included . . . . . 64
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Appendix: Design of 3-Way Collar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3 Neurodynamic Approaches to Cardinality-Constrained
Portfolio Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Man-Fai Leung and Jun Wang
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.2.1 Biconvex Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.2.2 Mean–Variance Portfolio Selection . . . . . . . . . . . . . . . . . 72
3.2.3 Conditional Value-at-Risk . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.2.4 Sharpe Ratio and Conditional Sharpe Ratio . . . . . . . . . . 74
3.2.5 Cardinality-Constrained Portfolio Selection . . . . . . . . . 75
3.3 Neurodynamic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.4 Neurodynamic Portfolio Selection . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.4.1 Collaborative Neurodynamic Approach . . . . . . . . . . . . . 80
3.4.2 Two-Timescale Duplex Neurodynamic Approach . . . . . 82
3.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.5.1 Setups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

3.5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4 Fully Homomorphic Encrypted Wavelet Neural Network
for Privacy-Preserving Bankruptcy Prediction in Banks . . . . . . . . . . 97
Syed Imtiaz Ahamed, Vadlamani Ravi, and Pranay Gopi
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.2 Overview of Bankruptcy Prediction and Problem
Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.3 Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.4 Proposed Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.4.1 Homomorphic Encryption . . . . . . . . . . . . . . . . . . . . . . . . 101
4.4.2 CKKS Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.4.3 Overview of the Original Unencrypted WNN . . . . . . . . 104
4.4.4 Proposed Privacy-Preserving Wavelet Neural
Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.5 Datasets Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.5.1 Qualitative Bankruptcy Dataset . . . . . . . . . . . . . . . . . . . . 108
4.5.2 Spanish Banks Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.5.3 Turkish Banks Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.5.4 UK Banks Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.6 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.7 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Appendix: Datasets Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5 Tools and Measurement Criteria of Ethical Finance Through
Computational Finance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Marco Piccolo and Francesco Vigliarolo
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.2 Ethical Finance, Principles and Operating Criteria . . . . . . . . . . . 119
5.3 Computational Finance Critic: Limits and Challenge
with Respect to Ethic Finance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.3.1 Some Definition Aspects Considered in This
Paragraph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.3.2 The Background Vice: Economic Positivism . . . . . . . . . 126
5.4 Measurement Criteria of Computational Finance
with the Principles of Ethical Finance . . . . . . . . . . . . . . . . . . . . . . 127
5.5 Some Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6 Data Mining Techniques for Predicting the Non-performing
Assets (NPA) of Banks in India . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Gaurav Kumar and Arun Kumar Misra
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.3 Research Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.3.1 Sample and Data Collection . . . . . . . . . . . . . . . . . . . . . . . 136
6.3.2 Experimental Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.3.3 Data Mining Methodology . . . . . . . . . . . . . . . . . . . . . . . . 138
6.4 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.4.1 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.4.2 Elastic Net Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.4.3 k-NN Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
7 Multiobjective Optimization
of Mean–Variance-Downside-Risk Portfolio Selection
Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Georgios Mamanis and Eftychia Kostarelou
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.2 Multiobjective Portfolio Optimization Models . . . . . . . . . . . . . . . 154
7.3 Multiobjective Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . 158
7.4 Computational Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.4.1 Computational Experiments on S&P 100 Index . . . . . . 160
7.4.2 Computational Experiments on a Large-Scale
Problem Instance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
7.4.3 Comparison with Competing Portfolios . . . . . . . . . . . . . 170
7.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

Part III Risk Assessment and Ethical Considerations


8 Bankruptcy Forecasting of Indian Manufacturing Companies
Post the Insolvency and Bankruptcy Code 2016 Using
Machine Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Simrat Kaur and Anjali Munde
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
8.2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
8.3 Data Collection and Methodology . . . . . . . . . . . . . . . . . . . . . . . . . 183
8.3.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8.3.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8.4 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
8.5 Empirical Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
8.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.6.1 Managerial Implication . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.6.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
8.7 Future Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
9 Ensemble Deep Reinforcement Learning for Financial Trading . . . . 191
Mendhikar Vishal, Vadlamani Ravi, and Ramanuj Lal
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
9.1.1 How Reinforcement Learning Works . . . . . . . . . . . . . . . 193
9.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
9.3 Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
9.4 Proposed Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
9.4.1 Assumptions Made During Stock Trading . . . . . . . . . . . 198
9.4.2 Stock Market Environment . . . . . . . . . . . . . . . . . . . . . . . . 198
9.4.3 RL Trading Agents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
9.5 Dataset Description and Experimental Setup . . . . . . . . . . . . . . . . 201
9.6 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
9.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

Part IV Real-World Applications


10 Bibliometric Analysis of Digital Financial Reporting . . . . . . . . . . . . . . 211
Neha Puri and Vikas Garg
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
10.2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
10.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
10.4 Data Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
10.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
10.6 Distribution of Annual Trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
10.7 Distribution of Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
10.8 Distribution of Top Journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
10.9 Distribution of Articles Based on Citations . . . . . . . . . . . . . . . . . . 218
10.10 Distribution of Different Affiliations . . . . . . . . . . . . . . . . . . . . . . . 220
10.11 Distribution of Publications Among Countries . . . . . . . . . . . . . . . 220
10.12 Distribution of Keyword Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 222
10.13 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
10.14 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
10.15 Theoretical Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
10.16 Practical Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
11 The Quest for Financing Environmental Sustainability
in Emerging Nations: Can Internet Access and Financial
Technology Be Crucial? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Ekundayo Peter Mesagan, Precious Muhammed Emmanuel,
and Mohammed Bashir Salaudeen
11.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
11.2 Schematic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
11.3 Situational Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
11.3.1 Where is African Climate Finance Coming from? . . . . 236
11.3.2 The Glimpse of Environmental Sustainability
Financing of East Asian Countries . . . . . . . . . . . . . . . . . 238
11.3.3 Internet Access Situational Analysis . . . . . . . . . . . . . . . . 239
11.3.4 Fintech Situational Analysis by Region . . . . . . . . . . . . . 240
11.4 Implication of Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
11.5 Policy Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
12 A Comprehensive Review of Bitcoin’s Energy Consumption
and Its Environmental Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Sidhartha Harichandan, Sanjay Kumar Kar, and Abhishek Kumar
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
12.2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
12.3 Bitcoin Mining and Its Implications . . . . . . . . . . . . . . . . . . . . . . . . 251
12.3.1 The Concept of Bitcoin Mining . . . . . . . . . . . . . . . . . . . . 251
12.3.2 Estimating Energy Consumption of Mining
Farms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
12.3.3 Issues with Bitcoin Mining . . . . . . . . . . . . . . . . . . . . . . . . 252
12.4 The Economies of Bitcoin Mining . . . . . . . . . . . . . . . . . . . . . . . . . 254
12.4.1 Profitability of Bitcoin Mining . . . . . . . . . . . . . . . . . . . . . 254
12.4.2 Regulation of Bitcoin Mining . . . . . . . . . . . . . . . . . . . . . . 254
12.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
12.5.1 Calculating Bitcoin’s Energy Consumption . . . . . . . . . . 255
12.5.2 Current Models Used to Calculate Bitcoin’s
Energy Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
12.6 Sustainability and Future of Mining . . . . . . . . . . . . . . . . . . . . . . . . 260
12.6.1 The Discomforts of Switching to Renewable
Electricity for Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
12.6.2 Proof-of-Stake as an Alternative Strategy . . . . . . . . . . . . 261
12.6.3 Limitations on Circuit Applications for Reducing
Electronic Wastages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
12.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
13 Emerging Economies: Volatility Prediction in the Metal
Futures Markets Using GARCH Model . . . . . . . . . . . . . . . . . . . . . . . . . 267
Ravi Kumar, Babli Dhiman, and Naliniprava Tripathy
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
13.2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
13.3 Data and Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
13.3.1 GARCH (1,1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
13.4 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
13.5 Concluding Observation and Managerial Implication . . . . . . . . . 274
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
14 Constructing a Broad View of Tax Compliance Intentions
Based on Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Mekar Satria Utama, Solimun, and Adji Achmad Rinaldo Fernandes
14.1 Introduction to Taxes and Tax Compliance Intentions . . . . . . . . . 280
14.2 Introduction to Theory of Planned Behavior (TPB) . . . . . . . . . . . 282
14.3 The Link Between TPB and Intention to Comply
with Taxes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
14.4 Religiosity and Utilization of e-Filing as a Determinant
Factor in Intention to Comply with Taxes Through TPB . . . . . . . 286
14.5 Extracting Information in Online Media is Related
to Variables that Influence Tax Compliance Intentions . . . . . . . . 289
14.6 Introduction to Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
14.7 Modeling Using SEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
14.8 Integration of Information Mining Results in Online
Media Using DNA with SEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
14.9 Research Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
14.10 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
15 Influence of Firm-Specific Variables on Capital Structure
Decisions: An Evidence from the Fintech Industry . . . . . . . . . . . . . . . 307
Suzan Dsouza and Ajay Kumar Jain
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
15.2 Literature Review and Hypotheses Development . . . . . . . . . . . . . 310
15.2.1 Capital Structure (Capstr) Decisions . . . . . . . . . . . . . . . . 310
15.2.2 Capital Structure (Capstr) Decisions: Fintech
Industry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
15.2.3 Determinants of Capital Structure . . . . . . . . . . . . . . . . . . 313
15.3 Variables and the Research Model . . . . . . . . . . . . . . . . . . . . . . . . . 314
15.3.1 Research Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
15.4 Sample and Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . 314
15.5 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
15.5.1 Distribution of Capstr (Box Plot Technique) . . . . . . . . . 317
15.5.2 Regression Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
15.6 Conclusion and Managerial Contribution . . . . . . . . . . . . . . . . . . . 320
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
16 A Weights Direct Determination Neural Network for Credit
Card Attrition Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
Vasilios N. Katsikis, Spyridon D. Mourtas, Romanos Sahas,
and Dimitris Balios
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
16.2 The MTA-WASD Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
16.2.1 Activation Functions and the WDD Process . . . . . . . . . 329
16.2.2 The Trigonometrically Activated WASD
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
16.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
16.3.1 Attrition Dataset I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
16.3.2 Attrition Dataset II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
16.3.3 Attrition Dataset III . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
16.3.4 Collective Performance Comparison . . . . . . . . . . . . . . . . 340
16.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
17 Stock Market Prediction Using Machine Learning: Evidence
from India . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
Subhamitra Patra, Trilok Nath Pandey, and Biswabhusan Bhuyan
17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
17.2 A Review on Machine Learning Techniques . . . . . . . . . . . . . . . . . 349
17.3 Data and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
17.3.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
17.3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
17.4 Empirical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
17.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
18 Realized Stock-Market Volatility: Do Industry Returns Have
Predictive Value? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
Riza Demirer, Rangan Gupta, and Christian Pierdzioch
18.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
18.2 Methodology and Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
18.2.1 Random Forests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
18.2.2 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
18.3 Empirical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
18.3.1 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
18.3.2 The Classic HAR-RV Model as a Benchmark . . . . . . . . 384
18.3.3 The Predictive Power of Lagged Industry Returns . . . . 386
18.3.4 Time-Varying Importance of Industry Returns . . . . . . . 389
18.3.5 Robustness Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
18.3.6 Random Forests Versus Shrinkage Estimators . . . . . . . . 394
18.3.7 Asymmetric Loss and Quantile-Random Forests . . . . . 395
18.4 Economic Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
18.5 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
19 Machine Learning Techniques for Corporate Governance . . . . . . . . . 407
Deepika Gupta
19.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
19.2 Aims and Objectives of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . 409
19.3 Machine Learning and Its Techniques . . . . . . . . . . . . . . . . . . . . . . 410
19.4 Corporate Governance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
19.4.1 External Governance Mechanisms . . . . . . . . . . . . . . . . . 414
19.4.2 Internal Governance Mechanisms . . . . . . . . . . . . . . . . . . 416
19.5 Why is Machine Learning Required for Corporate
Governance? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
19.5.1 Legal and Ethical Issues . . . . . . . . . . . . . . . . . . . . . . . . . . 423
19.6 Future Scope of Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
19.7 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
20 Machine Learning Approaches for Forecasting Financial
Market Volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
Itishree Behera, Pragyan Nanda, Soma Mitra, and Swapna Kumari
20.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
20.2 Background and Significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
20.3 Traditional Methods for Volatility Forecasting . . . . . . . . . . . . . . . 439
20.4 Machine Learning Techniques in Volatility Forecasting . . . . . . . 441
20.5 Data Sources and Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
20.6 Performance Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
20.7 Challenges and Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . 446
20.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
21 Deep Learning Models in Finance: Past, Present, and Future . . . . . . 453
Sai Krishna Vishnumolakala, Sri Raj Gopu,
Jatindra Kumar Dash, Sasikanta Tripathy, and Shailender Singh
21.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
21.2 Algorithmic Trading and Price Forecasting . . . . . . . . . . . . . . . . . . 454
21.3 Fraud Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
21.4 Credit Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
21.5 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
22 New Paradigm in Financial Technology Using Machine
Learning Techniques and Their Applications . . . . . . . . . . . . . . . . . . . . 467
Deepti Patnaik and Srikanta Patnaik
22.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
22.2 New Paradigm in Financial Technology Using Machine
Learning Techniques and Their Applications . . . . . . . . . . . . . . . . 468
22.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
Part I
Foundations
Chapter 1
Introduction to Optimal Execution

Makoto Shimoshimizu

Abstract The developments in electronic markets have led to the diversification
of trading activity, and traders need to manage the liquidity risk carefully. This
problem is called the optimal execution problem and has become a significant issue
among financial mathematicians, economists, and practitioners. This chapter aims to
overview how the current financial market works and how one can analyze and build
the algorithms for an “optimal execution strategy.” The first section gives a review
of current financial markets, which leads to the basics for constructing a model of
optimal execution from the viewpoints of market microstructure. In particular, I
clarify the system of the “limit order book,” which includes an exposition about how
traders place orders and influence the market. Also, this section presents the basic
concepts of “large trader” and “market impact,” on top of which most execution
models are built. The succeeding sections explain how one can incorporate market
impact in modeling and formulate an execution problem through a fundamental
model posed by Almgren and Chriss (J. Risk 3:5–39, 2000 [2]). I then describe an
extensive model with a moderate change in market impact modeling, discussed in
Ohnishi and Shimoshimizu (Quant. Financ. 20:1625–1644, 2020 [35]). These models
embody the foundation of algorithms for optimal execution strategies.

1.1 Overview: Financial Market and Execution Problem

1.1.1 Electronic Market and System Transition

A wide variety of trading methods are available in today's trading markets, since the structure of trading systems has diverged in different directions. As an example of the wide variety of electronic trading platforms, algorithmic trading has emerged in recent decades, and the so-called high-frequency trading (HFT) with computer systems, which typifies algorithmic trading, significantly influences the financial market.


The last three decades have witnessed a huge (and worldwide) change in the trading system on stock exchanges. For example, as stated in [33], the regulatory development of HFT was accelerated over the 1990s to make the financial market more competitive among market participants. The related regulation, Regulation ATS (alternative trading systems; Reg ATS) of 2000, was enforced in the U.S. so that non-exchange competitors could enter the marketplace.¹
In light of the emergence of MiFID in 2007, considerable concerns about the so-called dark pool have arisen among practitioners and researchers. A dark pool is a (private) securities trading exchange where traders can use a non-displayed order book and matching engine. Since MiFID was enforced in Europe, institutional traders such as pension fund managers have rapidly adopted dark pools, where the trading of a large block of orders is not disclosed to the other market participants.
According to [24], although traders did not often use high-frequency trading (HFT) around 2000, HFTs have accounted for 20 percent of the total trading volume
in the market since the mid-2000s (until 2019). The volume-weighted average price (VWAP) or time-weighted average price (TWAP) strategy was the mainstream of algorithmic trading in the early 2000s. The VWAP, denoted by $P_{\mathrm{VWAP}}$, is a benchmark defined as the average price weighted by the relative volume over the trading time window:

$$P_{\mathrm{VWAP}} := \frac{\sum_{i=1}^{n} P_i V_i}{\sum_{i=1}^{n} V_i}, \qquad (1.1)$$

where $P_i$ and $V_i$ denote the asset price and the traded volume of the $i$-th trade, respectively, and $n$ is the number of trades over the window. The VWAP strategy aims to keep the average price attained by one's own trading activity in line with the VWAP. The TWAP, denoted by $P_{\mathrm{TWAP}}$, is a benchmark defined as the average price of a given number of trades, say $n$, over the trading time window:

$$P_{\mathrm{TWAP}} := \frac{1}{n} \sum_{i=1}^{n} P_i. \qquad (1.2)$$
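To make these benchmarks concrete, the following minimal Python sketch (an illustration with hypothetical trade data, not code from the chapter) computes the historical VWAP and TWAP of Eqs. (1.1) and (1.2):

```python
import numpy as np

# Hypothetical trade records over the trading window: P_i (prices), V_i (volumes).
prices = np.array([299.0, 299.5, 298.5, 300.0, 299.0])
volumes = np.array([3000.0, 1500.0, 2500.0, 1000.0, 2000.0])

vwap = np.sum(prices * volumes) / np.sum(volumes)  # Eq. (1.1): volume-weighted
twap = np.mean(prices)                             # Eq. (1.2): equally weighted

print(f"VWAP = {vwap:.3f}, TWAP = {twap:.3f}")
```

A VWAP strategy would then compare the average price attained by its own executions against this benchmark computed over the same window.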

The VWAP and TWAP are not realized until the end of the trading horizon. Thus, traders generally consider the historical VWAP and TWAP as the benchmark. However, using some liquidity-seeking algorithms has become more common since the mid-2000s (until 2019). These facts underscore the importance of analyzing algorithmic trading and HFT, on which financial market traders have heavily relied for more than a decade.

¹ The Regulation National Market System (Reg NMS) in 2007 and the Markets in Financial Instruments Directive (MiFID) in 2007, enforced in the U.S. and Europe, respectively, brought about a negative outcome. Even though these regulations were designed to encourage new competition and trading venues, equity markets in the U.S. and Europe became fragmented, since trading spread out among various exchanges and financial markets.
The development of the trading system has facilitated an increasing number of studies in fields such as market impact modeling and the (optimal) execution problem.

1.1.2 Large Trader and Market (Price) Impact

There is a growing awareness among academic researchers and practitioners that some
institutional traders called large traders cause the market impact (or so-called price
impact) through their trades.² Life insurance companies, trust companies, and companies that manage pension funds are typical examples of such traders of great importance. Large traders recognize these market impacts as a liquidity risk. They can reduce the liquidity risk by splitting their order into small sizes over the course of the trading epoch. Conversely, submitting small pieces of an order gradually may expose large traders to the risk of future price fluctuations. Thus, every large trader has to pay attention to two distinct facets: the first is the liquidity risk, which arises owing to the large orders he/she submits; the second is the price risk, corresponding to price fluctuations in the future. In the literature, the generally accepted use of the term execution refers to buy/sell orders submitted by large traders. Developments in trading technology for algorithmic trading or HFT have attracted a growing body of research regarding execution problems. The optimal execution problem is the problem of a large trader who aims to minimize an expected trading cost or maximize an expected utility from his/her wealth.
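The order-splitting idea can be illustrated with a toy TWAP-style schedule (a minimal sketch with hypothetical numbers; the function name is illustrative, not from the literature) that divides a parent order into near-equal child orders over the trading epoch:

```python
def equal_split(parent_volume: int, n_slices: int) -> list:
    """Divide a parent order into n_slices child orders of (near-)equal size."""
    base, remainder = divmod(parent_volume, n_slices)
    # Spread any indivisible remainder over the first slices.
    return [base + (1 if i < remainder else 0) for i in range(n_slices)]

# A large trader slicing 100,000 units into 8 child orders.
print(equal_split(100_000, 8))  # [12500, 12500, ..., 12500]
```

Each child order causes a smaller market impact than the parent order would, but the later slices remain exposed to future price fluctuations; balancing these two effects is exactly the trade-off formalized in the optimal execution problem.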

1.1.2.1 Order Type and Limit Order Book

The mechanism of how the market impact occurs is captured via the information obtained from the so-called limit order book. A limit order book (commonly abbreviated as LOB) is a set of information including the volumes of buy/sell orders and the prices at which those volumes are submitted by traders. All market participants can access the information in the LOB.
A trader can select how to submit orders from the following two ways: market
order and limit order.3 A market order (MO) is a one used by traders to execute
buy/sell orders immediately after the order submission. A limit order (LO) is, on the
contrary, aimed at executing buy/sell orders at the price which the trader prefers to
trade. One of the main differences between the two orders is the fact that MOs are

2 As a current stream, the impact caused by large traders as well as other traders is called the market impact rather than the price impact. Therefore, the author consistently uses the term market impact in what follows.
3 The types of orders a trader can use can be categorized into more classes (e.g., cancellation, dark-pool orders, and so on). Here our aim is to illustrate the basic concept of how the market works and how a market impact can arise. Readers can refer to [10, 22, 47] for more details.

executed with certainty, whereas the execution of LOs is not guaranteed. The features of MOs and LOs are illustrated in Figs. 1.1, 1.2, 1.3 and 1.4.4

Fig. 1.1 Example of submitting a sell market order. Assume that the volume at the best bid at time $t$ is 3000 (units) at the best bid price of 299$. When a trader submits 2500 (units of) sell market orders, the LOs at the best bid price are executed first. Since the volume of orders placed at the best bid price is larger than the market order submitted by the trader, the best bid price does not change. The best bid price at time $t + \Delta t$ thus remains 299$ and the execution price also remains unchanged for the market order. In this case, no market impact occurs
We first introduce some terminology for the LOB. The term bid (ask, respectively) applies to each buy (sell) LO. The price at which each buy (sell) order is placed is referred to as the bid price (ask price). In particular, the term best bid price (best ask price) denotes the highest (lowest) bid (ask) price. A trader submits an LO by designating the order type (buy/sell), the volume, and the price at which the trader wants the orders to be executed. The term LOB is generally understood to mean all the information about these features as well as the time each order is placed. If an opposite LO arrives at a price lower than the best bid price or higher than the best ask price, the buy/sell transaction matches and the LO vanishes from the LOB. Let us denote the best bid price and best ask price at time $t$ by $P_t^{b}$ and $P_t^{a}$, respectively. Then, the mid-price, expressed as $P_t^{\mathrm{mid}}$, is defined as

$$P_t^{\mathrm{mid}} := \frac{P_t^{b} + P_t^{a}}{2}. \qquad (1.3)$$

The minimum price increment in which traders can submit orders is called the tick size. A change in the tick size influences the trading activity of market participants. (For details, see, for example, [10, 22, 47].)

4 The tick size is assumed to be 1$ in Figs. 1.1, 1.2, 1.3 and 1.4. The horizontal axis denotes the
volume of orders on the LOB and the vertical axis the price at which each order is placed (as LOs).

Fig. 1.2 Example of submitting a sell market order. Assume that the volume at the best bid at time $t$ is 3000 (units) at the best bid price of 299$. When a trader submits 3500 (units of) sell market orders, the LOs at the best bid price are executed first. Since the volume of orders placed at the best bid price is less than the market order submitted by the trader, the best bid price changes. The best bid price at time $t + \Delta t$ thus moves to the next bid price (i.e., decreases) and becomes 298$. The execution price also changes for the market order and a market impact occurs

Fig. 1.3 Example of submitting a buy market order. Assume that the volume at the best ask at time $t$ is 3000 (units) at the best ask price of 301$. When a trader submits 2500 (units of) buy market orders, the LOs at the best ask price are executed first. Since the volume of orders placed at the best ask price is larger than the market order submitted by the trader, the best ask price does not change. The best ask price at time $t + \Delta t$ thus remains 301$ and the execution price also remains unchanged for the market order. In this case, no market impact occurs.
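To make the mechanism illustrated in Figs. 1.1-1.4 concrete, the following minimal Python sketch (not part of the original chapter; the order sizes and prices are the hypothetical ones used in the figures) walks a market order through one side of a toy LOB and reports the volume-weighted execution price together with the new best price:

# Minimal sketch of a market order walking one side of a limit order book.
# Each level is (price, volume); the side is sorted from best to worse level.
def execute_market_order(book_side, qty):
    """Fill `qty` against `book_side`; return (avg. execution price, new best price)."""
    levels = [list(level) for level in book_side]  # copy so we can mutate
    filled, cost = 0.0, 0.0
    for level in levels:
        price, volume = level
        take = min(qty - filled, volume)
        filled += take
        cost += take * price
        level[1] -= take
        if filled == qty:
            break
    remaining = [lv for lv in levels if lv[1] > 0]
    new_best = remaining[0][0] if remaining else None
    return cost / filled, new_best

# Bid side as in Figs. 1.1 and 1.2: best bid 299$ with 3000 units, next bid 298$.
bids = [(299, 3000), (298, 4000)]
print(execute_market_order(bids, 2500))  # (299.0, 299): depth suffices, no market impact
print(execute_market_order(bids, 3500))  # (~298.86, 298): best bid moves down, impact occurs

The first call reproduces the situation of Fig. 1.1 (the order is smaller than the depth at the best bid), while the second reproduces that of Fig. 1.2; the ask-side cases of Figs. 1.3 and 1.4 are symmetric.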

Fig. 1.4 Example of submitting a buy market order. Assume that the volume at the best ask at time $t$ is 3000 (units) at the best ask price of 301$. When a trader submits 3500 (units of) buy market orders, the LOs at the best ask price are executed first. Since the volume of orders placed at the best ask price is less than the market order submitted by the trader, the best ask price changes. The best ask price at time $t + \Delta t$ thus moves to the next ask price (i.e., increases) and becomes 302$. The execution price also changes for the market order and a market impact occurs

1.1.2.2 Types of Market Impact

We can categorize the market impact mentioned above into three types: the temporary, permanent, and transient market impact. The temporary (market) impact is defined as the part that vanishes before the next trading time due to the recovery of (limited) market liquidity. On the other hand, the part of the market impact that remains at the next trading time is referred to as the permanent (market) impact. Moreover, when the temporary impact dissipates gradually over the course of the trading horizon, we call the market impact the transient (market) impact.5 Figure 1.5 illustrates the basic concept explained above.
Assume that a market (or quoted) price is given by $P_t$. Since a large trader executes a large number of orders denoted by $q_t$, the execution price (which corresponds to the real trading price) goes up to some degree and becomes $P_t + \lambda_t q_t$. Here $\lambda_t q_t$ denotes the market impact caused by the submission of the large trader (under the assumption of a linear market impact model). After the execution, the market price goes down to some degree due to the liquidity provision from the market participants (e.g., market makers, noise traders, and so on).6 The impact that does not affect the (market) price at the next trading time (corresponding to $\lambda_t q_t \alpha_t$ in Fig. 1.5) is the temporary impact. The impact that affects the (market) price at the next trading time (corresponding
5 The definition of each kind of market impact may differ from that in other literature. The above definition stems from the assumption that the market impact is decomposed into a temporary part and a permanent one. Some literature, such as [8, 17], empirically shows that the market impact has transient properties. In the following, each market impact is abbreviated as temporary impact, permanent impact, and transient impact, respectively.
6 We can classify the types of traders as follows:

1. Noise Traders: traders trading on economic fundamentals outside the exchange;
2. Informed Traders: traders seeking profit by leveraging information unreflected in market prices and trading assets in anticipation of future price movements;
3. Market Makers: professional traders seeking profit by facilitating exchange and exploiting their skills.

Fig. 1.5 Illustration of temporary, permanent, and transient market impact (in the case of buy MO)

to $\lambda_t q_t (1 - \alpha_t)$ in Fig. 1.5) is the permanent impact. Moreover, the transient impact describes a residual effect of past temporary impacts caused by the large trader (and other market participants). The formulation of temporary, permanent, and transient impacts is a vital factor in analyzing optimal execution problems.

1.1.2.3 Arbitrage and Market Impact Function

An arbitrage opportunity is a key concern for market participants and attracts considerable interest from both academics and practitioners. Arbitrage trading is a practical method that attempts to obtain profit by focusing on differences in prices, taking a position to buy an undervalued investment and sell an overvalued one. In particular, the existence of a dynamic arbitrage is recognized as one of the important aspects of a financial market. The notion of dynamic arbitrage is defined via the notion of round-trip trading as follows.

Definition 1 (Round-Trip Trading and Dynamic Arbitrage) A round-trip trading is a method of trading on $[0, T]$ (or $0 = t_0 < t_1 < \cdots < t_{n-1} < t_n = T$) that satisfies $\sum_{t=0}^{T} q_t = 0$. A dynamic arbitrage is an opportunity of arbitrage that makes a round trip profitable from an expected cost minimization point of view. To be precise, a dynamic arbitrage exists if and only if there exists a trading strategy, consisting of $q_t$ for time $t \in [0, T]$ (or $0 = t_0 < t_1 < \cdots < t_{n-1} < t_n = T$), that satisfies $\sum_{t=0}^{T} q_t = 0$ and

$$\mathbb{E}\big[ W_T \big] \ge W_0, \qquad (1.4)$$

where $W_0$ and $W_T$ represent the wealth at time $0$ and $T$, respectively.
Remark 1 (Permanent Impact)
Let us define the permanent impact function caused by a large trader by $f : \mathbb{R} \to \mathbb{R}$. Then, as [17, 22, 23] show, the market excludes dynamic arbitrage if the permanent impact function takes the form

$$f(v) = kv, \qquad (1.5)$$

for all $v \in \mathbb{R}$ for some $k \in \mathbb{R}_{++}$, that is, a linear function [20]. In addition, they theoretically demonstrate that a nonlinear permanent impact can give rise to dynamic arbitrage.
In light of this fact (as well as some empirical results), much of the existing research, including the models explained below, analyzes optimal execution strategies under a linear permanent impact model.
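As a numerical illustration (not from the original chapter), the following Python sketch checks the mechanism behind Remark 1 under a simplifying assumption made only for this illustration: with a purely linear permanent impact, the execution price after the trades $q_1, \ldots, q_i$ is $\hat{P}_i = P_0 + k \sum_{j \le i} q_j$, and the cost of any round trip equals $(k/2) \sum_i q_i^2 \ge 0$, so no round trip can be profitable in expectation:

import numpy as np

rng = np.random.default_rng(0)
k = 0.1  # linear permanent impact coefficient (hypothetical value)

for _ in range(5):
    q = rng.normal(size=10)
    q -= q.mean()                 # force a round trip: sum(q) == 0
    S = np.cumsum(q)              # position held after each trade
    cost = k * np.sum(q * S)      # sum_i q_i * (execution price - P_0)
    print(np.isclose(cost, 0.5 * k * np.sum(q**2)), cost >= 0.0)

Every line prints True True: the round-trip cost is exactly $(k/2)\sum_i q_i^2$, which is nonnegative. With a nonlinear permanent impact function this cost can be driven negative by a suitably chosen round trip, which is precisely the dynamic arbitrage mentioned in Remark 1.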

1.1.3 Structure of This Chapter

The rest of this chapter proceeds as follows. Section 1.3 introduces the so-called Almgren-Chriss (AC) model [2] in a discrete-time framework, a fundamental model for seminal theoretical papers on optimal execution problems. Section 1.4 then analyzes a continuous-time analog of the AC model, which captures the significant role that the market impact and the risk-averse feature of a large trader play. In Sect. 1.5, we see the model examined by Ohnishi and Shimoshimizu [35], where the market impact caused by small traders as well as a large trader exists. The model also considers a transient feature of the market impact which Almgren and Chriss [2] do not incorporate. All of these models derive the optimal execution strategy explicitly, so that these strategies can be used as a backtest from a practitioner's point of view. A bibliographic note (Sect. 1.6) and some appendices are placed at the end of this chapter.

1.2 Notations and Some Remarks

In this chapter, $\mathbb{Z}_{++}$ stands for the set of all positive integers, i.e., $\mathbb{Z}_{++} := \{1, 2, \ldots\}$. Likewise, we define $\mathbb{R}_{+} := [0, \infty)$ and $\mathbb{R}_{++} := (0, \infty)$. $\mathbb{R}^d$ represents the set of $d$-dimensional real-valued vectors. Any vectors are defined as row vectors. $\mathbb{R}^{m \times n}$ denotes the set of all $m \times n$ real-valued matrices. For any $A \in \mathbb{R}^{m \times n}$, we denote by $A^{\mathsf{T}}$ the transpose of the matrix (or vector) $A$. $\mathbb{E}[\cdot]$ denotes the expectation (operator), and $\mathbb{V}[\cdot]$ the variance, each defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. As for the differentiation of a function, for any twice-differentiable function $f : X \subset \mathbb{R} \to \mathbb{R}$, denoted by $f_t := f(t)$ for $t \in X$, $\dot{f}_t$ expresses the derivative of $f$ evaluated at $t$ and $\ddot{f}_t$ the second derivative. Also, for any vector-valued function

. f : Rn → Rm ,. ∂∂ xf (a) is the Jacobian of the function. f evaluated at. a ∈ Rn , defined as


⎛ ∂ f1 ∂ fm ⎞
| (x)
∂ x1
··· ∂ x1
(x) |
∂f ∂f | ⎜ .. .. .. ⎟ ||
. (a) := (x)|| =⎝ . . . ⎠| ∈ Rn×m . (1.6)
∂x ∂x x=a ∂ f1 ∂ fm x=a
∂ xn
(x) ··· ∂ xn
(x)

This chapter aims to explain some introductory notions and models concerning optimal execution problems for those who have some knowledge of stochastic analysis. To this end, readers are supposed to have a basic knowledge of real analysis and stochastic analysis. Readers who are not familiar with these topics may refer to some basic materials, for example, [9, 40]. The other materials needed for this chapter, some optimization methods in particular, are given in the appendices.

1.3 Almgren-Chriss Model for Optimal Execution

We first introduce the fundamental model proposed by Almgren and Chriss [2]. The model searches for an optimal execution strategy from the viewpoints of a cost-minimization criterion and a mean-variance one.7

1.3.1 Market Model and Optimal Execution Strategy

Assume that a large trader (e.g., a life insurance company, trust company, or a company that manages pension funds) holds $Q \in \mathbb{R}_{++}$ units of one risky asset. The large trader needs to liquidate all of the assets by the maturity $T \in \mathbb{Z}_{++}$. The set of trading times, denoted by $\mathcal{T}$, consists of $n$ times as follows:

$$\mathcal{T} := \left\{ \frac{T}{n}, \frac{2T}{n}, \ldots, T \right\}. \qquad (1.7)$$

In the following, we will use the notation $\tau$ to express $T/n$ and $t_k$ to express $kT/n$ for $k \in \{1, \ldots, n\}$. We also define by $q_k$ for $k \in \{1, \ldots, n\}$ the number of shares that the large trader submits at time $t_k \in \mathcal{T}$ and by $Q_k$ the number of shares that the large trader still holds at time $t_k \in \mathcal{T} \cup \{t_{n+1}\}$.8 By definition, the following relationships hold:

7 Bertsimas and Lo [7] propelled this field to the forefront of investigation; they address the optimization problem of minimizing the expected execution cost in a discrete-time framework via a dynamic programming approach. Their analysis identifies the optimal execution volume as the volume equally divided throughout the trading epochs. Notwithstanding its valuable insight into the execution problem, the model disregards any attitude toward the large trader's risk.
8 The definition of some variables is slightly different from the one defined in the AC model [2] in order to unify the notation throughout this chapter, but the difference does not significantly affect the contents compared with the original paper.

• $Q_1 = Q$ (initial condition);
• $Q_{n+1} = 0$ (terminal condition);
• $Q_{k+1} = Q_k - q_k = Q - \sum_{j=1}^{k} q_j$ for $k \in \{1, \ldots, n-1\}$.

Also, the execution strategy of the large trader, i.e., the set $\{q_1, \ldots, q_n\}$, is denoted by $q$. The execution trajectory, i.e., the set $\{Q_1, \ldots, Q_n, Q_{n+1}\}$, is similarly denoted by $Q$. In addition, let $\mathcal{A}$ denote the set of admissible trading strategies in a class of deterministic strategies:

$$\mathcal{A} := \{ Q \mid Q \text{ is admissible and deterministic} \}. \qquad (1.8)$$

Let us next consider the dynamics of the risky asset. Assume that $P_t$ for $t \in \mathcal{T}$ represents the fundamental price or unaffected price, which expresses the risky asset price without the market impact caused by the large trader.
In the Almgren-Chriss model, the market impact is divided into two parts: temporary impact and permanent impact. We assume that the dynamics of the risky asset price, denoted by $P_k := P_{t_k}$ for $t_k \in \mathcal{T}$, evolve as a discrete arithmetic random walk:

$$P_{k+1} = P_k + \sigma \tau^{1/2} \epsilon_k - \tau g\left(\frac{q_k}{\tau}\right), \qquad (1.9)$$

where $\sigma$ represents the volatility of the asset, $\epsilon_k$ for each $k$ follows a standard normal distribution, i.e.,

$$\epsilon_k \sim N(0, 1), \qquad (1.10)$$

and the function $g(\cdot)$ stands for the permanent impact.9


The temporary impact affects the risky asset price only through the orders sub-
mitted by the large trader and disappears in the dynamics transition by its nature.
Therefore, the modelling of the temporary impact at time .tk ∈ T consists of mod-
̂k := P
elling the execution price, denoted by . P ̂tk , as follows:
( )
. ̂k = Pk − h qk ,
P (1.11)
τ
(q )
k
where .h for .k ∈ {1, . . . , n} is the temporary impact.
τ
The objective of the large trader is to minimize the expected total cost. In other words, the large trader chooses his/her execution strategy so that it minimizes the expected value of the total cost, defined as the difference between the initial holdings value and the revenue from the trading. We define the so-called
9 To be precise, the permanent impact $g(q_k/\tau)$ is a function of the average rate of trading $q_k/\tau$ during the interval $(t_{k-1}, t_k)$. Moreover, the $\epsilon_k$ are independent across time by definition.

captured value of a trading strategy (or trajectory) as the total revenue after the trading transaction. A simple calculation gives

$$\sum_{k=1}^{n} \hat{P}_k q_k = P_1 Q + \sum_{k=1}^{n} \left( \sigma \tau^{1/2} \epsilon_k - \tau g\left(\frac{q_k}{\tau}\right) \right) Q_{k+1} - \sum_{k=1}^{n} q_k h\left(\frac{q_k}{\tau}\right). \qquad (1.12)$$

Using the notation $C(Q)$ to represent the total cost, we can express the total cost as follows:

$$C(Q) := P_1 Q - \sum_{k=1}^{n} \hat{P}_k q_k = - \sum_{k=1}^{n} \left( \sigma \tau^{1/2} \epsilon_k - \tau g\left(\frac{q_k}{\tau}\right) \right) Q_{k+1} + \sum_{k=1}^{n} q_k h\left(\frac{q_k}{\tau}\right). \qquad (1.13)$$
Remark 2 The transaction cost defined above is the standard measure in trading
performance evaluations, which [38] called the implementation shortfall.
By the above arguments, the expected total cost and the variance of the total cost are readily computed as follows:

$$\mathbb{E}\big[ C(Q) \big] = \sum_{k=1}^{n} \tau g\left(\frac{q_k}{\tau}\right) Q_{k+1} + \sum_{k=1}^{n} q_k h\left(\frac{q_k}{\tau}\right); \qquad (1.14)$$

$$\mathbb{V}\big[ C(Q) \big] = \sigma^2 \tau \sum_{k=1}^{n} Q_{k+1}^2. \qquad (1.15)$$

In the following, we analyze a simple situation under the following assumption.

Assumption 1 Both temporary and permanent impacts take a linear form, that is,

$$h\left(\frac{q_k}{\tau}\right) := \eta\tau \cdot \frac{q_k}{\tau} = \eta q_k; \qquad (1.16)$$

$$g\left(\frac{q_k}{\tau}\right) := \psi \frac{q_k}{\tau}, \qquad (1.17)$$

for all $k \in \{1, \ldots, n\}$. In addition, the parameters satisfy the condition $\eta - \dfrac{\psi}{2} > 0$.
2
Then, the problem that the large trader faces boils down to the following optimization problem:

$$\min_{Q \in \mathcal{A}} \ \mathbb{E}\big[ C(Q) \big] = \psi \sum_{k=1}^{n} q_k Q_{k+1} + \eta \sum_{k=1}^{n} q_k^2, \quad \text{s.t.} \ \sum_{k=1}^{n} q_k = Q.$$

Theorem 1 (Optimal execution strategy for the AC model) The optimal execution strategy at time $t_k \in \mathcal{T}$ is an equally divided trading strategy:

$$q_k^* = \frac{Q}{n}, \quad k = 1, \ldots, n. \qquad (1.18)$$
Proof By the summation by parts formula (see, e.g., [40, Theorem 3.41]), we have

$$\psi \sum_{k=1}^{n} q_k Q_{k+1} = \frac{\psi}{2} Q^2 - \frac{\psi}{2} \sum_{k=1}^{n} q_k^2. \qquad (1.19)$$

Substituting this into the objective function of the above minimization problem results in the following minimization problem:

$$\min_{q \in \mathcal{A}} \ \left( \eta - \frac{\psi}{2} \right) \sum_{k=1}^{n} q_k^2 + \frac{\psi}{2} Q^2, \quad \text{s.t.} \ \sum_{k=1}^{n} q_k = Q. \qquad (1.20)$$

This is a convex optimization problem with an equality constraint since we assume that $\eta - \frac{\psi}{2} > 0$. Using the Lagrange multiplier method, we obtain

$$q_k^* = \frac{Q}{n}, \qquad (1.21)$$

for $k \in \{1, \ldots, n\}$. For the Lagrange multiplier method, see Appendix 1. □

The corresponding execution trajectory then becomes

$$Q_k^* = \frac{n - (k-1)}{n} Q, \quad k = 1, \ldots, n, n+1. \qquad (1.22)$$

One can easily check that the expected total cost of the optimal execution strategy obtained above is less than that of the strategy of initially executing all of the trading volume, denoted by $Q^0$:

$$C(Q^*) = \left( \eta - \frac{\psi}{2} \right) \frac{Q^2}{n} + \frac{\psi}{2} Q^2 \ < \ C(Q^0) = \left( \eta - \frac{\psi}{2} \right) Q^2 + \frac{\psi}{2} Q^2. \qquad (1.23)$$

This shows that dividing large orders into small ones can reduce execution costs.
This strategy is closely related to the Time-Weighted Average Price (TWAP) strategy, which practitioners use as a standard trading strategy. For a continuous-time execution on an interval $[0, T]$ for some $T \in \mathbb{R}$ with initial trading volume $Q$, the TWAP strategy is defined as

$$q_t^{\mathrm{TWAP}} := \frac{Q}{T}. \qquad (1.24)$$
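As an illustration (not part of the original chapter; the parameter values below are hypothetical), the following Python sketch evaluates the expected cost (1.14) under Assumption 1 for the equally divided strategy (1.18) and for immediate liquidation, confirming the ordering in (1.23):

import numpy as np

def expected_cost(q, psi, eta):
    """E[C(Q)] = psi * sum_k q_k Q_{k+1} + eta * sum_k q_k**2 under Assumption 1."""
    Q0 = q.sum()
    Q_next = Q0 - np.cumsum(q)                  # Q_{k+1} = Q - sum_{j<=k} q_j
    return psi * np.sum(q * Q_next) + eta * np.sum(q**2)

n, Q, psi, eta = 10, 100_000.0, 5e-7, 1e-6      # hypothetical values with eta - psi/2 > 0
twap = np.full(n, Q / n)                        # equally divided strategy, Eq. (1.18)
burst = np.concatenate(([Q], np.zeros(n - 1)))  # execute everything at the first time

print(expected_cost(twap, psi, eta))   # (eta - psi/2) Q^2 / n + (psi/2) Q^2 = 3250.0
print(expected_cost(burst, psi, eta))  # (eta - psi/2) Q^2     + (psi/2) Q^2 = 10000.0

Splitting the parent order reduces the expected cost by the factor $n$ in the impact-dependent term, in line with Eq. (1.23).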

1.3.2 Efficient Frontier of Optimal Execution: A Mean-Variance Perspective

The above analysis shows that the optimal execution strategy under a cost-minimization criterion exists without a risk-aversion term. In a manner analogous to Markowitz's mean-variance analysis [2, 31], we define the so-called efficient execution strategy. An execution strategy is said to be efficient if there exists no other strategy that attains a lower expected shortfall (variance, respectively) with the same or lower variance (expected shortfall, respectively).
The efficient execution strategy is defined as the solution of the following constrained optimization problem:

$$\min_{Q \in \mathcal{A}} \ \mathbb{E}\big[ C(Q) \big] \qquad (1.25)$$

$$\text{s.t.} \quad \mathbb{V}\big[ C(Q) \big] \le V^*. \qquad (1.26)$$

Here $V^* \in \mathbb{R}_{+}$ represents a given maximum level of variance. This problem is converted into an unconstrained problem via a Lagrange multiplier $\gamma$ as follows:

$$\min_{Q \in \mathcal{A}} \ \mathbb{E}\big[ C(Q) \big] + \gamma \mathbb{V}\big[ C(Q) \big], \quad \text{s.t.} \quad Q_1 = Q; \ Q_{n+1} = 0, \qquad (1.27)$$

which is computationally tractable. The parameter $\gamma \in \mathbb{R}_{+}$ stands for the risk-aversion parameter of the large trader. Let us define

$$\mathcal{U}(Q) := \mathbb{E}\big[ C(Q) \big] + \gamma \mathbb{V}\big[ C(Q) \big], \qquad (1.28)$$

which can be thought of as a disutility function. Under the assumption of only a temporary impact, a concrete calculation boils down to the following minimization problem:

$$\min_{Q \in \mathcal{A}} \ \mathcal{U}(Q) = \eta \sum_{k=1}^{n} \left( Q_{k+1} - Q_k \right)^2 + \gamma \sigma^2 \sum_{k=1}^{n} Q_{k+1}^2. \qquad (1.29)$$

Note that the function $\mathcal{U}$ is a convex quadratic function with respect to $Q_k$ for $k \in \{1, \ldots, n+1\}$. The optimality conditions for Eq. (1.27) result in

$$\frac{\partial \mathcal{U}}{\partial Q_k}(Q) = 0 \iff \eta \left( Q_k - Q_{k-1} \right) - \eta \left( Q_{k+1} - Q_k \right) + \gamma \sigma^2 Q_k = 0. \qquad (1.30)$$

This is a second-order linear difference equation. The approximate solution10 to the above linear difference equation is

$$Q_k^* = Q \, \frac{\sinh\left( \kappa (T - t_k) \right)}{\sinh\left( \kappa T \right)}, \quad k = 1, \ldots, n, n+1, \qquad (1.31)$$

where $\kappa$ is the urgency parameter defined as follows:

$$\kappa := \sqrt{\frac{\gamma \sigma^2}{\eta}}. \qquad (1.32)$$

For the derivation of Eq. (1.31), see Appendix 2.
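The closed form (1.31)-(1.32) is straightforward to evaluate. The following Python sketch (an illustration using the benchmark values of Sect. 1.4.2, not code from the chapter) computes the trajectory, with the time convention chosen so that the boundary conditions $Q_1 = Q$ and $Q_{n+1} = 0$ hold exactly:

import numpy as np

def ac_trajectory(Q, T, n, gamma, sigma, eta):
    """Discrete AC trajectory, Eq. (1.31): Q_k^* = Q sinh(kappa (T - t)) / sinh(kappa T).

    Times are taken as t = (k - 1) T / n for k = 1, ..., n + 1 so that
    Q_1 = Q and Q_{n+1} = 0 (an assumption on the indexing made here).
    """
    kappa = np.sqrt(gamma * sigma**2 / eta)  # urgency parameter, Eq. (1.32)
    t = np.arange(n + 1) * T / n
    return Q * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)

print(ac_trajectory(Q=100_000, T=10, n=10, gamma=0.001, sigma=0.5, eta=0.001).round(1))

The larger the urgency parameter $\kappa$, the more front-loaded the liquidation becomes.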

1.4 A Continuous-time Analog

1.4.1 Model

We can extend the above model to a continuous-time setting. In this case, the execution is conducted continuously over a trading horizon $[0, T]$. In the following, we consider only the temporary impact and assume that it is linear with respect to the orders posted by the large trader.
Let $P_0$ be the risky asset price at time $0$. The dynamics of the market price are then described, using a standard Brownian motion, as follows:

$$dP_t = \sigma \, dB_t, \qquad (1.33)$$

where $\{B_t\}_{0 \le t \le T}$ is a standard Brownian motion with $B_0 = 0$ and $\sigma$ is the risk of the market price.11 The execution price at time $t \in [0, T]$ is given by

10 The original model includes an approximation for $\kappa$, although the model explained in this subsection does not.
11 If we add a linear permanent impact to this model, the dynamics of the market price become

$$dP_t = -\psi \dot{Q}_t \, dt + \sigma \, dB_t, \qquad (1.34)$$

where $\psi \in \mathbb{R}_{++}$ is the permanent impact coefficient.

$$\hat{P}_t = P_t - \eta \dot{Q}_t, \qquad (1.35)$$

where $\dot{Q}_t$ at time $t \in [0, T]$ is the execution speed, which satisfies

$$\frac{dQ_t}{dt} = -\dot{Q}_t, \qquad (1.36)$$

and $\eta \dot{Q}_t$ at time $t \in [0, T]$ stands for the temporary impact with $\eta \in \mathbb{R}_{++}$.
Let us define $Q$ to be the trading trajectory and $\mathcal{A}$ to be the set of admissible strategies in a class of deterministic strategies, as in the discrete-time setting. That is,

$$Q := \{Q_t \mid 0 \le t \le T\} \quad \text{and} \quad \mathcal{A} := \{ Q \mid Q \text{ is admissible and deterministic} \}. \qquad (1.37)$$

Using the stochastic integration by parts formula, we can calculate the trading cost (or the implementation shortfall) as follows:

$$C(Q) := P_0 Q - \int_0^T \hat{P}_t \dot{Q}_t \, dt = P_0 Q - \int_0^T \left( P_t - \eta \dot{Q}_t \right) \dot{Q}_t \, dt$$
$$= P_0 Q - \int_0^T P_t \dot{Q}_t \, dt + \eta \int_0^T \left( \dot{Q}_t \right)^2 dt = -\sigma \int_0^T Q_t \, dB_t + \eta \int_0^T \left( \dot{Q}_t \right)^2 dt. \qquad (1.38)$$

From the properties of the stochastic integral and Itô's isometry, we obtain the expectation of the trading cost (or the expected implementation shortfall) and the variance of the trading cost:

$$\mathbb{E}\big[ C(Q) \big] = \eta \int_0^T \left( \dot{Q}_t \right)^2 dt; \qquad (1.39)$$

$$\mathbb{V}\big[ C(Q) \big] = \sigma^2 \int_0^T Q_t^2 \, dt. \qquad (1.40)$$

Therefore, the following minimization problem yields an efficient execution strategy in a continuous-time setting:

$$\min_{Q \in \mathcal{A}} \ \mathbb{E}\big[ C(Q) \big] + \gamma \mathbb{V}\big[ C(Q) \big] = \int_0^T \left( \eta \left( \dot{Q}_t \right)^2 + \gamma \sigma^2 Q_t^2 \right) dt, \qquad (1.41)$$

$$\text{s.t.} \quad Q_0 = Q; \quad Q_T = 0. \qquad (1.42)$$
Theorem 2 The optimal trading strategy at time $t \in [0, T]$, denoted by $Q^* := \{Q_t^* \mid 0 \le t \le T\}$, is characterized by the following remaining execution volume:

$$Q_t^* = Q \, \frac{\sinh\left( \kappa (T - t) \right)}{\sinh\left( \kappa T \right)}, \qquad (1.43)$$

where $\kappa$ is the urgency parameter defined in Eq. (1.32).


Proof The minimization problem described by Eqs. (1.41) and (1.42) is one in the calculus of variations. The Euler-Lagrange equation yields the following ordinary differential equation:

$$\ddot{Q}_t = \frac{\gamma \sigma^2}{\eta} Q_t, \qquad (1.44)$$

with boundary conditions $Q_0 = Q$ and $Q_T = 0$. (For the details, see Appendix 3.) This is a second-order linear ordinary differential equation (ODE) with constant coefficients. Solving it yields Eq. (1.43). (For the details, see Appendix 4.) □
Remark 3 As Eq. (1.43) shows, if $Q = 0$, then

$$Q_t^* = 0 \qquad (1.45)$$

for all $t \in [0, T]$. This means that round-trip trading in this model with zero initial trading volume excludes any dynamic arbitrage opportunity.

1.4.2 Numerical Example

We present some numerical examples through comparative statics in the following. The benchmark parameter values are set as follows:

$$\gamma = 0.001; \quad \eta = 0.001; \quad \sigma = 0.5; \quad Q = 100000; \quad T = 10. \qquad (1.46)$$

1.4.2.1 Effect of Volatility

The first example shows how the level of volatility influences the optimal execution strategy. Figure 1.6 illustrates the remaining execution volume of the optimal execution strategy for different values of $\sigma$: $\sigma = 0.5$, $1$, and $5$.

Fig. 1.6 Remaining execution volume for different $\sigma$

As Fig. 1.6 shows, the larger the volatility, the faster the large trader executes. This is compatible with the intuition that a large trader tends to avoid the risk of future price fluctuation.

1.4.2.2 Effect of the Risk-Aversion Parameter

We next examine how the risk-aversion parameter influences the trading strategy. Figure 1.7 shows the optimal trading strategy for different risk-aversion parameters: $\gamma = 0.001$, $0.01$, and $0.1$.
Figure 1.7 shows that the more risk-averse the large trader is, the faster he/she executes the trading volume. This is compatible with the intuitive understanding that a risk-averse large trader executes orders quickly to avoid the effects of market impact (and price fluctuation).

Fig. 1.7 Remaining execution volume for different $\gamma$
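The comparative statics in Figs. 1.6 and 1.7 can be reproduced directly from Eq. (1.43). The following Python sketch (illustrative; it uses the benchmark values (1.46)) evaluates the remaining execution volume for the different values of $\sigma$ and $\gamma$; plotting the resulting curves against $t$ recovers the two figures:

import numpy as np

def remaining_volume(t, Q, T, gamma, sigma, eta):
    """Continuous-time optimal trajectory Q_t^* of Eq. (1.43)."""
    kappa = np.sqrt(gamma * sigma**2 / eta)   # urgency parameter, Eq. (1.32)
    return Q * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)

t = np.linspace(0.0, 10.0, 101)
base = dict(Q=100_000, T=10, gamma=0.001, eta=0.001)

for sigma in (0.5, 1.0, 5.0):       # comparative statics of Fig. 1.6
    print(sigma, remaining_volume(t, sigma=sigma, **base)[::25].round(0))

for gamma in (0.001, 0.01, 0.1):    # comparative statics of Fig. 1.7
    print(gamma, remaining_volume(t, sigma=0.5, **dict(base, gamma=gamma))[::25].round(0))

In both experiments a larger $\kappa$ (higher volatility or stronger risk aversion relative to the temporary impact) makes the execution more front-loaded.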

1.5 Transient Impact Model with Small Traders' Orders [35]

1.5.1 Market Model

We introduce an optimal execution problem in a discrete-time framework based on Ohnishi and Shimoshimizu [35].12 Let us first describe the situation that a risk-averse large trader faces. In a discrete-time framework $t \in \{1, \ldots, T, T+1\}$ $(T \in \mathbb{Z}_{+})$, one large trader in a financial market is obligated to purchase $Q \in \mathbb{R}$ volume of one risky asset by the time $T+1$. The large trader has a Constant Absolute Risk Aversion (CARA) von Neumann-Morgenstern (vN-M) (or negative exponential) utility function:

$$u(x) := -\exp\{-\gamma x\}, \qquad (1.47)$$

with an absolute risk-aversion parameter $\gamma \in \mathbb{R}_{++}$. Let $q_t \in \mathbb{R}$ represent a large amount of orders submitted by the large trader at time $t \in \{1, \ldots, T\}$. We denote by $Q_t$ the remaining execution volume, that is, the number of shares remaining to be purchased by the large trader at time $t \in \{1, \ldots, T, T+1\}$. This assumption yields $Q_1 = Q$ and
12 As for [26, 27], these papers construct models with the residual effect of the market impact, i.e., the transient impact, which dissipates over the trading time window. They solve an optimization problem of maximizing an expected utility payoff from the final wealth at maturity, deriving an optimal execution strategy.

$$Q_{t+1} = Q_t - q_t, \quad t = 1, \ldots, T. \qquad (1.48)$$

The market price (or quoted price) of the risky asset at time $t \in \{1, \ldots, T, T+1\}$ is expressed as $P_t$. Since the large trader has a great influence on the risky asset's price through his/her submission of a large amount of orders, the execution price at time $t \in \{1, \ldots, T\}$ becomes not $P_t$ but $\hat{P}_t$, which includes the additive execution cost. In the rest of this chapter, we assume that submitting one unit of a (large) order at time $t \in \{1, \ldots, T\}$ causes an instantaneous linear market impact whose coefficient is denoted by $\lambda_t \in \mathbb{R}_{++}$. We also assume that the aggregate trading volumes posted by small traders also have some impact on the execution price.13 $\kappa_t \in \mathbb{R}_{++}$ represents the market impact coefficient per unit at time $t \in \{1, \ldots, T\}$ caused by small traders. The aggregate trading volume submitted by small traders at time $t \in \{1, \ldots, T\}$ is assumed to be a sequence of random variables $v_t$, each following a normal distribution with mean $\mu_t^v$ and variance $(\sigma_t^v)^2$, that is,

$$v_t \sim N\left( \mu_t^v, (\sigma_t^v)^2 \right), \quad t = 1, \ldots, T. \qquad (1.49)$$

In the sequel, the buy- and sell-trades of the large trader are supposed to induce the same (instantaneous) market impact. This assumption may appear inconsistent with the situation observed in a real market; however, it can be justified by the statistical analysis of market data in [11, 12].14 We assume that the execution price takes the form of a linear market impact model as follows:

$$\hat{P}_t = P_t + (\lambda_t q_t + \kappa_t v_t), \quad t = 1, \ldots, T. \qquad (1.50)$$

13 Existing research concerned with execution problems has thoroughly investigated the market impact model with small traders. As [39] shows, small trades have statistically far larger impacts on the market than large trades in a relative sense. These results suggest that one should take into account the market impact caused by small traders when constructing a market impact model. Cartea et al. [11] incorporate the market impact caused by other traders into the construction of the midprice process by describing the market order-flow through a general Markov process and derive a closed-form strategy for a large trader. They show that the optimal execution strategies differ from [2] when small traders cause a market impact and coincide with [2] when small traders do not affect the midprice. This analysis is based on the assumption that the market impact is decomposed into temporary and permanent parts, not a transient one. The model explained here considers the transient impact through the residual effect of past executions (caused by both the large trader and small traders) on the risky asset market. This setting enables us to analyze how the residual effect of the past market impact influences the execution strategy of a large trader. The effect of the market impact caused by small traders on the execution price constitutes the generalized market impact in the model.
14 Their works estimate the permanent and temporary impacts by conducting a linear regression of price changes on net order-flow using trading data obtained from Nasdaq. This estimation and the relevant statistics show that the linear assumption of the market impact is compatible with the stock market and that the market impacts caused by buy and sell trades can be regarded as the same from the viewpoint of statistical analysis.

We next define the residual effect of the past market impact (the temporary impact, to be precise) at time $t \in \{1, \ldots, T, T+1\}$, represented by $R_t$, by means of the following exponential decay kernel $G(t)$:

$$G(t) := e^{-\rho t}, \quad t = 1, \ldots, T, T+1. \qquad (1.51)$$

Using a deterministic price reversion rate $\alpha_t \in [0, 1]$ and a deterministic resilience speed $\rho \in [0, \infty)$, the dynamics of the residual effect of the past market impact are given by $R_1 = 0$ and

$$R_{t+1} := \sum_{k=1}^{t} (\lambda_k q_k + \kappa_k v_k)\,\alpha_k\, e^{-\rho((t+1)-k)} = e^{-\rho} \sum_{k=1}^{t-1} (\lambda_k q_k + \kappa_k v_k)\,\alpha_k\, e^{-\rho(t-k)} + (\lambda_t q_t + \kappa_t v_t)\,\alpha_t\, e^{-\rho}$$
$$= \big[ R_t + (\lambda_t q_t + \kappa_t v_t)\,\alpha_t \big] e^{-\rho}, \quad t = 1, \ldots, T. \qquad (1.52)$$

Note that (1.52) exhibits a Markov property, which arises from the assumption of the exponential decay kernel.

Remark 4 (Time-dependence of the deterministic resilience speed)
The (deterministic) resilience speed is generally assumed to be a constant parameter in theoretical analyses, such as in [32, 46]. Many empirical studies, however, demonstrate that liquidity varies over time, which suggests that the resilience speed is time-dependent. Our analysis allows time-dependence of the resilience speed, i.e., $\rho_t$ for all $t \in \{1, \ldots, T\}$, as considered in [16]. Notwithstanding its being a meaningful extension from the viewpoint of real-market analysis, we henceforth formulate the model with $\rho$ (i.e., without the time-dependent parameter $\rho_t$) since this dependence would not offer additional intriguing results in the following analysis.

Here we also define, by a series of independent random variables $\epsilon_t$ at time $t \in \{1, \ldots, T\}$, the effect of public news/information about the economic situation between $t$ and $t+1$, and assume that, for each $t \in \{1, \ldots, T\}$, $\epsilon_t$ follows a normal distribution with mean $\mu_t^{\epsilon}$ and variance $(\sigma_t^{\epsilon})^2$, i.e.,

$$\epsilon_t \sim N\left( \mu_t^{\epsilon}, (\sigma_t^{\epsilon})^2 \right), \quad t = 1, \ldots, T. \qquad (1.53)$$

We suppose here that the two stochastic processes $v_t$ and $\epsilon_t$ for $t \in \{1, \ldots, T\}$ are mutually independent for convenience, and we hereafter conduct our analysis under this independence assumption.
The construction of the fundamental price at time $t \in \{1, \ldots, T\}$, denoted by $P_t^f$, must be carefully considered. Since the residual effect of past executions dissipates over the course of the trading horizon, we define $P_t - R_t$ as the fundamental price of the risky asset. By the definition of $\epsilon_t$ and the assumption that the permanent impact at time $t \in \{1, \ldots, T\}$ is represented by $(\lambda_t q_t + \kappa_t v_t)\beta_t$, we can set the fundamental price $P_t^f := P_t - R_t$ with a permanent impact as follows:

$$P_{t+1}^f = P_{t+1} - R_{t+1} := P_t - R_t + \beta_t (\lambda_t q_t + \kappa_t v_t) + \epsilon_t = P_t^f + \beta_t (\lambda_t q_t + \kappa_t v_t) + \epsilon_t, \quad t = 1, \ldots, T. \qquad (1.54)$$

This relation indicates that (i) the permanent impact caused by the large trader and small traders and (ii) the public news or information about the economic situation are assumed to affect the fundamental price. This assumption also reveals that the permanent impact may give a non-zero trend to the fundamental price, even if the mean of $\epsilon_t$ is zero for all $t \in \{1, \ldots, T\}$. According to Eqs. (1.50), (1.52), and (1.54), the dynamics of the market price, i.e., the relation between $P_{t+1}$ and $P_t$, are described as

$$P_{t+1} = P_t + (R_{t+1} - R_t) + \beta_t (\lambda_t q_t + \kappa_t v_t) + \epsilon_t = P_t - (1 - e^{-\rho}) R_t + (\alpha_t e^{-\rho} + \beta_t)(\lambda_t q_t + \kappa_t v_t) + \epsilon_t, \quad t = 1, \ldots, T. \qquad (1.55)$$

Remark 5 (Permanent, temporary, and transient impact)
In this context, $(\lambda_t q_t + \kappa_t v_t)\beta_t$, $(\lambda_t q_t + \kappa_t v_t)\alpha_t$, and $(\lambda_t q_t + \kappa_t v_t)\alpha_t e^{-\rho}$ represent the permanent impact, temporary impact, and transient impact at time $t \in \{1, \ldots, T\}$, respectively. Moreover, if $\rho \to \infty$, the residual effect of the past price impact becomes zero for all $t \in \{1, \ldots, T\}$, since $R_1 = 0$ and, from Eq. (1.52),

$$\lim_{\rho \to \infty} R_{t+1} = \lim_{\rho \to \infty} \big[ R_t + (\lambda_t q_t + \kappa_t v_t)\alpha_t \big] e^{-\rho} = 0, \quad t = 1, \ldots, T, \qquad (1.56)$$

and therefore

$$P_{t+1} = P_t - (1 - e^{-\rho}) R_t + (\alpha_t e^{-\rho} + \beta_t)(\lambda_t q_t + \kappa_t v_t) + \epsilon_t = P_t + \beta_t (\lambda_t q_t + \kappa_t v_t) + \epsilon_t, \quad t = 1, \ldots, T. \qquad (1.57)$$

Thus, in this case, we have a permanent impact model. If $\alpha_t = 1$, the model reduces to a transient impact model. Also, if $\kappa_t = 0$ or $\sigma_t^v = 0$, the model reduces to [27].

From the definition of the execution price, the wealth process $W_t$ evolves as

$$W_{t+1} = W_t - \hat{P}_t q_t = W_t - \left\{ P_t + (\lambda_t q_t + \kappa_t v_t) \right\} q_t, \quad t = 1, \ldots, T. \qquad (1.58)$$
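To fix ideas before the formal treatment, the following Python sketch (illustrative only; the parameter values and the naive constant execution rule are hypothetical) simulates one path of the state dynamics (1.48), (1.50), (1.52), (1.55), and (1.58) for a buy program:

import numpy as np

rng = np.random.default_rng(1)

T, Q = 20, 10_000.0                             # horizon and target volume (hypothetical)
lam, kap, alpha, beta, rho = 1e-4, 1e-4, 0.6, 0.3, 0.5
mu_v, sig_v, mu_e, sig_e = 0.0, 500.0, 0.0, 0.1

P, R, W, Q_rem = 100.0, 0.0, 0.0, Q
for t in range(1, T + 1):
    q = Q / T                                   # naive equal-split rule (not optimal here)
    v = rng.normal(mu_v, sig_v)                 # small traders' aggregate orders, Eq. (1.49)
    eps = rng.normal(mu_e, sig_e)               # public news, Eq. (1.53)
    impact = lam * q + kap * v
    W -= (P + impact) * q                       # wealth via the execution price, Eqs. (1.50), (1.58)
    P += -(1 - np.exp(-rho)) * R + (alpha * np.exp(-rho) + beta) * impact + eps  # Eq. (1.55)
    R = (R + impact * alpha) * np.exp(-rho)     # residual effect, Eq. (1.52)
    Q_rem -= q                                  # remaining volume, Eq. (1.48)

print(round(Q_rem, 6), round(W, 2))             # Q_rem ends at 0; -W is the total amount paid

Replacing the naive rule with the affine rule $q_t^* = a_t + b_t Q_t + c_t R_t$ of Theorem 3 below turns this simulator into a forward pass of the optimal strategy.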

1.5.2 Formulation as a Markov Decision Process

We formulate the large trader's problem as a discrete-time Markov decision process. In a discrete-time window $t \in \{1, \ldots, T, T+1\}$, we define the state of the decision process at time $t \in \{1, \ldots, T, T+1\}$ as a 4-tuple and denote it as

$$s_t = (W_t, P_t, Q_t, R_t) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \mathbb{R} =: \mathcal{S}. \qquad (1.59)$$

For $t \in \{1, \ldots, T\}$, an allowable action chosen at state $s_t$ is an execution volume $q_t \in \mathbb{R} =: \mathcal{A}$, so that the set $\mathcal{A}$ of admissible actions is independent of the current state $s_t$.
When an action $q_t$ is chosen in a state $s_t$ at time $t \in \{1, \ldots, T\}$, a transition to a next state

$$s_{t+1} = (W_{t+1}, P_{t+1}, Q_{t+1}, R_{t+1}) \in \mathcal{S} \qquad (1.60)$$

occurs according to the law of motion described precisely in the previous subsection. We symbolically describe the transition by a (Borel measurable) system dynamics function $h_t : \mathcal{S} \times \mathcal{A} \times (\mathbb{R} \times \mathbb{R}) \to \mathcal{S}$:

$$s_{t+1} = h_t(s_t, q_t, (\epsilon_t, v_t)), \quad t = 1, \ldots, T. \qquad (1.61)$$

A utility payoff (or reward), defined by a function $g_{T+1} : \mathcal{S} \to \mathbb{R}$, arises only in a terminal state $s_{T+1}$ at the end of horizon $T+1$:

$$g_{T+1}(s_{T+1}) := \begin{cases} -\exp\{-\gamma W_{T+1}\} & \text{if } Q_{T+1} = 0; \\ -\infty & \text{if } Q_{T+1} \ne 0. \end{cases} \qquad (1.62)$$

The term $-\infty$ imposes a hard constraint enforcing the large trader to execute all of the remaining volume $Q_T$ at the maturity $T$, that is, $q_T = Q_T$.
If we define a (history-independent) one-stage decision rule $f_t$ at time $t \in \{1, \ldots, T\}$ as a Borel measurable map from a state $s_t \in \mathcal{S} = \mathbb{R}^4$ to an action

$$q_t = f_t(s_t) \in \mathcal{A} = \mathbb{R}, \qquad (1.63)$$

then a Markov execution strategy, denoted by $\pi$, is defined as a sequence of one-stage decision rules

$$\pi := (f_1, \ldots, f_t, \ldots, f_T). \qquad (1.64)$$

We denote the set of all Markov execution strategies by $\Pi^M$. Further, for $t \in \{1, \ldots, T\}$, we define the sub-execution strategy after time $t$ of a Markov execution strategy $\pi = (f_1, \ldots, f_t, \ldots, f_T) \in \Pi^M$ as
tion strategy .π = ( f 1 , . . . , f t , . . . , f T ) ∈ ∏M as

$$\pi_t := (f_t, \ldots, f_T), \qquad (1.65)$$

and the entire set of $\pi_t$ as $\Pi^{M,t}$.


By definition (1.62), the value function under an execution strategy $\pi$ is the expected utility payoff arising from the terminal wealth $W_{T+1}$ of the large trader with absolute risk-aversion parameter $\gamma$:

$$V_1^{\pi}[s_1] = \mathbb{E}_1^{\pi}\big[ g_{T+1}(s_{T+1}) \,\big|\, s_1 \big] = \mathbb{E}_1^{\pi}\Big[ -\exp\{-\gamma W_{T+1}\} \cdot 1_{\{Q_{T+1} = 0\}} + (-\infty) \cdot 1_{\{Q_{T+1} \ne 0\}} \,\Big|\, s_1 \Big], \qquad (1.66)$$

where $1_A$ is the indicator function of an event $A$ and, for $t \in \{1, \ldots, T\}$, $\mathbb{E}_t^{\pi}$ is the conditional expectation given the information at time $t$ under $\pi$.
Then, for $t \in \{1, \ldots, T, T+1\}$ and $s_t \in \mathcal{S}$, we further let

$$V_t^{\pi}[s_t] = \mathbb{E}_t^{\pi}\big[ g_{T+1}(s_{T+1}) \,\big|\, s_t \big] = \mathbb{E}_t^{\pi}\Big[ -\exp\{-\gamma W_{T+1}\} \cdot 1_{\{Q_{T+1} = 0\}} + (-\infty) \cdot 1_{\{Q_{T+1} \ne 0\}} \,\Big|\, s_t \Big] \qquad (1.67)$$

be the expected utility payoff at time $t$ under the strategy $\pi$. It should be noted that the expected utility payoff $V_t^{\pi}[s_t]$ depends on the Markov execution strategy $\pi = (f_1, \ldots, f_t, \ldots, f_T)$ only through the sub-execution strategy $\pi_t := (f_t, \ldots, f_T)$ after time $t$.
Now, we define the optimal value function as follows:

$$V_t[s_t] = \sup_{\pi \in \Pi^M} V_t^{\pi}[s_t], \quad s_t \in \mathcal{S}, \quad t = 1, \ldots, T, T+1. \qquad (1.68)$$

From the principle of optimality, the optimality equation (the Bellman equation, or dynamic programming equation) becomes

$$V_t[s_t] = \sup_{q_t \in \mathbb{R}} \mathbb{E}\Big[ V_{t+1}\big[ h_t(s_t, q_t, (\epsilon_t, v_t)) \big] \,\Big|\, s_t \Big], \quad s_t \in \mathcal{S}, \quad t = 1, \ldots, T. \qquad (1.69)$$

1.5.3 Dynamics of the Optimal Execution

Theorem 3 (Optimal execution strategy and optimal value function)

1. The optimal execution volume at time $t \in \{1, \ldots, T\}$, denoted by $q_t^*$, is an affine function of the remaining execution volume ($Q_t$) and the cumulative residual effect ($R_t$):

$$q_t^* = f_t(W_t, P_t, Q_t, R_t) = a_t + b_t Q_t + c_t R_t, \quad t = 1, \ldots, T, \qquad (1.70)$$

where $a_t$, $b_t$, $c_t$ for $t \in \{1, \ldots, T\}$ are deterministic functions of time $t$ which depend on the problem parameters and can be computed backwardly from the maturity $T$.

2. The optimal value function $V_t[s_t]$ at time $t \in \{1, \ldots, T, T+1\}$ is represented as follows:

$$V_t[W_t, P_t, Q_t, R_t] = -\exp\left\{ -\gamma \left[ W_t - P_t Q_t + G_t Q_t^2 + H_t Q_t + I_t Q_t R_t + J_t R_t^2 + L_t R_t + Z_t \right] \right\}, \qquad (1.71)$$

where $G_t, H_t, I_t, J_t, L_t, Z_t$ for $t \in \{1, \ldots, T, T+1\}$ are deterministic functions of time $t$ which depend on the problem parameters and can be computed backwardly from the maturity $T$.

Proof We derive the optimal execution volume $q_t^*$ at time $t \in \{1, \ldots, T\}$ by the backward induction method of dynamic programming, starting from the maturity $T$.

[Step 1] From the assumption that the large trader must unwind all the remainder of his/her position at time $t = T$,

$$Q_{T+1} = Q_T - q_T = 0 \qquad (1.72)$$

must hold, which yields $q_T^* = Q_T$. Then, for $t = T$, using the moment-generating function of $v_T$,

$$\mathbb{E}\big[ \exp\{\gamma \kappa_T q_T v_T\} \big] = \exp\Big\{ \gamma \kappa_T q_T \mu_T^v + \frac{1}{2} \gamma^2 \kappa_T^2 q_T^2 (\sigma_T^v)^2 \Big\}, \qquad (1.73)$$

Eq. (1.69) (the Bellman equation) becomes

$$V_T[s_T] = \sup_{q_T \in \mathbb{R}} \mathbb{E}\big[ V_{T+1}[s_{T+1}] \,\big|\, s_T \big] = \mathbb{E}\big[ -\exp\{-\gamma W_{T+1}\} \big] = \mathbb{E}\Big[ -\exp\big\{ -\gamma \big[ W_T - \hat{P}_T q_T^* \big] \big\} \Big]$$
$$= -\exp\Big\{ -\gamma \Big[ W_T - P_T Q_T - \Big( \lambda_T + \frac{1}{2}\gamma \kappa_T^2 (\sigma_T^v)^2 \Big) Q_T^2 - \kappa_T \mu_T^v Q_T \Big] \Big\}$$
$$= -\exp\Big\{ -\gamma \Big[ W_T - P_T Q_T + G_T Q_T^2 + H_T Q_T \Big] \Big\}, \qquad (1.74)$$

where

$$G_T := -\Big( \lambda_T + \frac{1}{2}\gamma \kappa_T^2 (\sigma_T^v)^2 \Big) \ (< 0); \qquad H_T := -\kappa_T \mu_T^v.$$

[Step 2] For $t = T-1$, according to Eq. (1.69) (the Bellman equation), we have

$$V_{T-1}[s_{T-1}] = \sup_{q_{T-1} \in \mathbb{R}} \mathbb{E}\big[ V_T[s_T] \,\big|\, s_{T-1} \big]
= \sup_{q_{T-1} \in \mathbb{R}} \mathbb{E}\Big[ -\exp\big\{ -\gamma \big[ W_T - P_T Q_T + G_T Q_T^2 + H_T Q_T \big] \big\} \,\Big|\, W_{T-1}, P_{T-1}, Q_{T-1}, R_{T-1} \Big]$$
$$= \sup_{q_{T-1} \in \mathbb{R}} \mathbb{E}\Big[ -\exp\Big\{ -\gamma \Big[ W_{T-1} - \big( P_{T-1} + \lambda_{T-1} q_{T-1} + \kappa_{T-1} v_{T-1} \big) q_{T-1}$$
$$- \Big( P_{T-1} - (1 - e^{-\rho}) R_{T-1} + (\lambda_{T-1} q_{T-1} + \kappa_{T-1} v_{T-1}) \big( \alpha_{T-1} e^{-\rho} + \beta_{T-1} \big) + \epsilon_{T-1} \Big) \big( Q_{T-1} - q_{T-1} \big)$$
$$+ G_T \big( Q_{T-1} - q_{T-1} \big)^2 + H_T \big( Q_{T-1} - q_{T-1} \big) \Big] \Big\} \,\Big|\, W_{T-1}, P_{T-1}, Q_{T-1}, R_{T-1} \Big]$$
$$= \sup_{q_{T-1} \in \mathbb{R}} \Big( -\exp\Big\{ -\gamma \Big[ -A_{T-1} q_{T-1}^2 + (B_{T-1} Q_{T-1} + C_{T-1} R_{T-1} + D_{T-1}) q_{T-1} + W_{T-1} - P_{T-1} Q_{T-1}$$
$$+ \Big\{ G_T - \frac{1}{2}\gamma (\bar{\alpha}_{T-1})^2 \kappa_{T-1}^2 (\sigma_{T-1}^v)^2 - \frac{1}{2}\gamma (\sigma_{T-1}^{\epsilon})^2 \Big\} Q_{T-1}^2$$
$$+ \big( H_T - \bar{\alpha}_{T-1} \kappa_{T-1} \mu_{T-1}^v - \mu_{T-1}^{\epsilon} \big) Q_{T-1} + (1 - e^{-\rho}) Q_{T-1} R_{T-1} \Big] \Big\} \Big), \qquad (1.75)$$

where

$$\bar{\alpha}_{T-1} := \alpha_{T-1} e^{-\rho} + \beta_{T-1}; \qquad (1.76)$$
$$A_{T-1} := (1 - \bar{\alpha}_{T-1})\lambda_{T-1} - G_T + \frac{1}{2}\gamma (1 - \bar{\alpha}_{T-1})^2 \kappa_{T-1}^2 (\sigma_{T-1}^v)^2 + \frac{1}{2}\gamma (\sigma_{T-1}^{\epsilon})^2; \qquad (1.77)$$
$$B_{T-1} := -\bar{\alpha}_{T-1}\lambda_{T-1} - 2 G_T - \gamma \bar{\alpha}_{T-1}(1 - \bar{\alpha}_{T-1}) \kappa_{T-1}^2 (\sigma_{T-1}^v)^2 + \gamma (\sigma_{T-1}^{\epsilon})^2; \qquad (1.78)$$
$$C_{T-1} := -(1 - e^{-\rho}); \qquad (1.79)$$
$$D_{T-1} := -H_T - (1 - \bar{\alpha}_{T-1}) \kappa_{T-1} \mu_{T-1}^v + \mu_{T-1}^{\epsilon}. \qquad (1.80)$$

Finding the optimal execution volume $q_{T-1}^*$, which attains the supremum in Eq. (1.75), is equivalent to finding the one that maximizes

$$K_{T-1}(q_{T-1}) := -A_{T-1} q_{T-1}^2 + (B_{T-1} Q_{T-1} + C_{T-1} R_{T-1} + D_{T-1}) q_{T-1} + W_{T-1} - P_{T-1} Q_{T-1}$$
$$+ \Big\{ G_T - \frac{1}{2}\gamma (\bar{\alpha}_{T-1})^2 \kappa_{T-1}^2 (\sigma_{T-1}^v)^2 - \frac{1}{2}\gamma (\sigma_{T-1}^{\epsilon})^2 \Big\} Q_{T-1}^2$$
$$+ \big( H_T - \bar{\alpha}_{T-1} \kappa_{T-1} \mu_{T-1}^v - \mu_{T-1}^{\epsilon} \big) Q_{T-1} + (1 - e^{-\rho}) Q_{T-1} R_{T-1}, \qquad (1.81)$$

since Eqs. (1.75) and (1.81) are concave functions with respect to $q_{T-1}$. Thus, completing the square of $K_{T-1}(q_{T-1})$ with respect to $q_{T-1}$, we obtain the optimal execution volume $q_{T-1}^*$ as

$$q_{T-1}^* = \frac{B_{T-1} Q_{T-1} + C_{T-1} R_{T-1} + D_{T-1}}{2 A_{T-1}} =: a_{T-1} + b_{T-1} Q_{T-1} + c_{T-1} R_{T-1}. \qquad (1.82)$$

Then, the optimal value function at time $T-1$ takes the following functional form:

$$V_{T-1}[s_{T-1}] = -\exp\Big\{ -\gamma \Big[ W_{T-1} - P_{T-1} Q_{T-1} + \Big\{ G_T - \frac{1}{2}\gamma (\bar{\alpha}_{T-1})^2 \kappa_{T-1}^2 (\sigma_{T-1}^v)^2 - \frac{1}{2}\gamma (\sigma_{T-1}^{\epsilon})^2 \Big\} Q_{T-1}^2$$
$$+ \big( H_T - \bar{\alpha}_{T-1} \kappa_{T-1} \mu_{T-1}^v - \mu_{T-1}^{\epsilon} \big) Q_{T-1} + (1 - e^{-\rho}) Q_{T-1} R_{T-1} + \frac{(B_{T-1} Q_{T-1} + C_{T-1} R_{T-1} + D_{T-1})^2}{4 A_{T-1}} \Big] \Big\}$$
$$= -\exp\Big\{ -\gamma \Big[ W_{T-1} - P_{T-1} Q_{T-1} + G_{T-1} Q_{T-1}^2 + H_{T-1} Q_{T-1} + I_{T-1} Q_{T-1} R_{T-1} + J_{T-1} R_{T-1}^2 + L_{T-1} R_{T-1} + Z_{T-1} \Big] \Big\}, \qquad (1.83)$$

where

$$G_{T-1} := G_T - \frac{1}{2}\gamma (\bar{\alpha}_{T-1})^2 \kappa_{T-1}^2 (\sigma_{T-1}^v)^2 - \frac{1}{2}\gamma (\sigma_{T-1}^{\epsilon})^2 + \frac{B_{T-1}^2}{4 A_{T-1}};$$
$$H_{T-1} := H_T - \bar{\alpha}_{T-1} \kappa_{T-1} \mu_{T-1}^v - \mu_{T-1}^{\epsilon} + \frac{B_{T-1} D_{T-1}}{2 A_{T-1}};$$
$$I_{T-1} := (1 - e^{-\rho}) + \frac{B_{T-1} C_{T-1}}{2 A_{T-1}}; \qquad J_{T-1} := \frac{C_{T-1}^2}{4 A_{T-1}};$$
$$L_{T-1} := \frac{C_{T-1} D_{T-1}}{2 A_{T-1}}; \qquad Z_{T-1} := \frac{D_{T-1}^2}{4 A_{T-1}}. \qquad (1.84)$$

[Step 3] For $t \in \{T-2, \ldots, 1\}$, we can assume from the above results that the optimal value function has the following functional form at time $t+1$:

$$V_{t+1}[s_{t+1}] = -\exp\Big\{ -\gamma \Big[ W_{t+1} - P_{t+1} Q_{t+1} + G_{t+1} Q_{t+1}^2 + H_{t+1} Q_{t+1} + I_{t+1} Q_{t+1} R_{t+1} + J_{t+1} R_{t+1}^2 + L_{t+1} R_{t+1} + Z_{t+1} \Big] \Big\}. \qquad (1.85)$$

Then, substituting the dynamics of $W_t$, $P_t$, $Q_t$, and $R_t$ into the equation above and taking the conditional expectation, we obtain:

$$V_t[s_t] = \sup_{q_t \in \mathbb{R}} \mathbb{E}\Big[ -\exp\Big\{ -\gamma \Big[ W_{t+1} - P_{t+1} Q_{t+1} + G_{t+1} Q_{t+1}^2 + H_{t+1} Q_{t+1} + I_{t+1} Q_{t+1} R_{t+1} + J_{t+1} R_{t+1}^2 + L_{t+1} R_{t+1} + Z_{t+1} \Big] \Big\} \,\Big|\, W_t, P_t, Q_t, R_t \Big]$$
$$= \sup_{q_t \in \mathbb{R}} \Big( -\exp\big\{ -\gamma K_t(q_t) \big\} \Big), \qquad (1.86)$$

where $K_t(q_t)$ is the concave quadratic function of $q_t$ written out in Eq. (1.90) below, and where

$$\bar{\alpha}_t := \alpha_t e^{-\rho} + \beta_t; \qquad \zeta_t := \kappa_t^2 \alpha_t^2 e^{-2\rho} J_{t+1};$$
$$\delta_t := (\bar{\alpha}_t - 1)\kappa_t - \kappa_t \alpha_t e^{-\rho} I_{t+1} + 2\lambda_t \kappa_t \alpha_t^2 e^{-2\rho} J_{t+1}; \qquad \eta_t := -\kappa_t \bar{\alpha}_t + \kappa_t \alpha_t e^{-\rho} I_{t+1};$$
$$\theta_t := 2\kappa_t \alpha_t e^{-\rho} J_{t+1}; \qquad \phi_t := \kappa_t \alpha_t e^{-\rho} L_{t+1}; \qquad x_t := -\frac{1}{\gamma} \log \frac{1}{\sqrt{1 + 2\gamma \zeta_t (\sigma_t^v)^2}},$$

and

$$A_t := (1 - \bar{\alpha}_t)\lambda_t - G_{t+1} + \lambda_t \alpha_t e^{-\rho} I_{t+1} - \lambda_t^2 \alpha_t^2 e^{-2\rho} J_{t+1} + \frac{\gamma \delta_t^2 (\sigma_t^v)^2}{2\{1 + 2\gamma \zeta_t (\sigma_t^v)^2\}} + \frac{1}{2}\gamma (\sigma_t^{\epsilon})^2;$$
$$B_t := -\lambda_t \bar{\alpha}_t - 2 G_{t+1} + \lambda_t \alpha_t e^{-\rho} I_{t+1} - \frac{\gamma \delta_t \eta_t (\sigma_t^v)^2}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} + \gamma (\sigma_t^{\epsilon})^2;$$

$$C_t := -(1 - e^{-\rho}) - e^{-\rho} I_{t+1} + 2\lambda_t \alpha_t e^{-2\rho} J_{t+1} - \frac{\gamma \delta_t \theta_t (\sigma_t^v)^2}{1 + 2\gamma \zeta_t (\sigma_t^v)^2};$$
$$D_t := -H_{t+1} + \lambda_t \alpha_t e^{-\rho} L_{t+1} + \frac{\delta_t \mu_t^v}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} - \frac{\gamma \delta_t \phi_t (\sigma_t^v)^2}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} + \mu_t^{\epsilon}. \qquad (1.87)$$
1 + 2γ ζt (σtv )2

Here we have used the following relation:

$$\mathbb{E}\Big[ \exp\big\{ -\gamma \big[ \delta_t q_t + \eta_t Q_t + \theta_t R_t + \phi_t \big] v_t - \gamma \zeta_t v_t^2 \big\} \Big]
= \exp\Big\{ \frac{1}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} \Big[ -\gamma \big[ \delta_t q_t + \eta_t Q_t + \theta_t R_t + \phi_t \big] \mu_t^v$$
$$+ \frac{\gamma^2}{2} \big[ \delta_t q_t + \eta_t Q_t + \theta_t R_t + \phi_t \big]^2 (\sigma_t^v)^2 - \gamma \zeta_t (\mu_t^v)^2 \Big] - \gamma x_t \Big\},$$

which stems from the following lemma.

which stems from the following lemma.

Lemma 1 For a normally distributed random variable $X$ with mean $\mu \in \mathbb{R}$ and variance $\sigma^2 \in \mathbb{R}_{++}$, we have

$$\mathbb{E}\Big[ \exp\big\{ aX + bX^2 \big\} \Big] = \frac{1}{\sqrt{1 - 2b\sigma^2}} \exp\left\{ \frac{2a\mu + a^2\sigma^2 + 2b\mu^2}{2(1 - 2b\sigma^2)} \right\}, \qquad (1.88)$$

where $a \in \mathbb{R}$ and $b \in \mathbb{R}$, provided $1 - 2b\sigma^2 > 0$.


Remark 6 Remind that when .b = 0, the result is consistent with the one obtained
from the moment-generating function for a normally distributed function.
Proof A direct calculation yields

$$\mathbb{E}\Big[ \exp\big\{ aX + bX^2 \big\} \Big] = \int_{-\infty}^{\infty} \exp\big\{ ax + bx^2 \big\} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\} dx$$
$$= \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} \exp\left\{ -\frac{1}{2\sigma^2/(1-2b\sigma^2)} \left( x - \frac{\mu + a\sigma^2}{1 - 2b\sigma^2} \right)^2 \right\} \exp\left\{ -\frac{1}{2\sigma^2} \left[ -\frac{(\mu + a\sigma^2)^2}{1 - 2b\sigma^2} + \mu^2 \right] \right\} dx$$
$$= \exp\left\{ -\frac{-\mu^2 - 2a\mu\sigma^2 - a^2(\sigma^2)^2 + (1 - 2b\sigma^2)\mu^2}{2\sigma^2 (1 - 2b\sigma^2)} \right\} \times \frac{\sqrt{2\pi\sigma^2/(1-2b\sigma^2)}}{\sqrt{2\pi\sigma^2}}$$
$$\times \underbrace{\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2/(1-2b\sigma^2)}} \exp\left\{ -\frac{1}{2\sigma^2/(1-2b\sigma^2)} \left( x - \frac{\mu + a\sigma^2}{1 - 2b\sigma^2} \right)^2 \right\} dx}_{=1}$$
$$= \frac{1}{\sqrt{1 - 2b\sigma^2}} \exp\left\{ \frac{2a\mu + a^2\sigma^2 + 2b\mu^2}{2(1 - 2b\sigma^2)} \right\}. \qquad (1.89)$$

This completes the proof. □
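Lemma 1 is easy to sanity-check numerically. The following Python sketch (an illustration, not from the chapter; the parameter values are arbitrary subject to $1 - 2b\sigma^2 > 0$) compares a Monte Carlo estimate of $\mathbb{E}[\exp\{aX + bX^2\}]$ with the closed form (1.88):

import numpy as np

rng = np.random.default_rng(42)
mu, sigma, a, b = 0.3, 0.8, 0.5, -0.4    # hypothetical values; 1 - 2*b*sigma**2 > 0 holds

x = rng.normal(mu, sigma, size=2_000_000)
mc = np.mean(np.exp(a * x + b * x**2))   # Monte Carlo estimate of the expectation

d = 1.0 - 2.0 * b * sigma**2
closed = np.exp((2*a*mu + a**2 * sigma**2 + 2*b*mu**2) / (2*d)) / np.sqrt(d)

print(mc, closed)                         # the two values agree to Monte Carlo accuracy

This identity is what turns the Gaussian integral over the small traders' order flow $v_t$ into the closed-form coefficients used in Step 3 above.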


To find the optimal execution volume $q_t^*$ at time $t \in \{T-2, \ldots, 1\}$ satisfying Eq. (1.86), we only have to repeat the derivation carried out at time $t = T-1$. Completing the square of

$$K_t(q_t) := -A_t q_t^2 + (B_t Q_t + C_t R_t + D_t) q_t + W_t - P_t Q_t
+ \Big[ G_{t+1} - \frac{\gamma \eta_t^2 (\sigma_t^v)^2}{2\{1 + 2\gamma \zeta_t (\sigma_t^v)^2\}} - \frac{1}{2}\gamma (\sigma_t^{\epsilon})^2 \Big] Q_t^2$$
$$+ \Big[ H_{t+1} + \frac{\eta_t \mu_t^v}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} - \frac{\gamma \eta_t \phi_t (\sigma_t^v)^2}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} - \mu_t^{\epsilon} \Big] Q_t
+ \Big[ (1 - e^{-\rho}) + e^{-\rho} I_{t+1} - \frac{\gamma \eta_t \theta_t (\sigma_t^v)^2}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} \Big] Q_t R_t$$
$$+ \Big[ e^{-2\rho} J_{t+1} - \frac{\gamma \theta_t^2 (\sigma_t^v)^2}{2\{1 + 2\gamma \zeta_t (\sigma_t^v)^2\}} \Big] R_t^2
+ \Big[ e^{-\rho} L_{t+1} + \frac{\theta_t \mu_t^v}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} - \frac{\gamma \theta_t \phi_t (\sigma_t^v)^2}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} \Big] R_t$$
$$+ \Big[ Z_{t+1} + \frac{\phi_t \mu_t^v}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} - \frac{\gamma \phi_t^2 (\sigma_t^v)^2}{2\{1 + 2\gamma \zeta_t (\sigma_t^v)^2\}} + \frac{\zeta_t (\mu_t^v)^2}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} + x_t \Big] \qquad (1.90)$$

then yields the optimal execution volume $q_t^*$ at time $t \in \{T-2, \ldots, 1\}$:

$$q_t^* := f_t(s_t) = \frac{B_t Q_t + C_t R_t + D_t}{2 A_t} = a_t + b_t Q_t + c_t R_t, \quad t = T-2, \ldots, 1, \qquad (1.91)$$

where

$$a_t := \frac{D_t}{2 A_t}, \qquad b_t := \frac{B_t}{2 A_t}, \qquad c_t := \frac{C_t}{2 A_t}. \qquad (1.92)$$

By inserting this into Eq. (1.86), i.e., substituting $q_t^*$ into $K_t(q_t)$ of Eq. (1.90), the optimal value function at time $t \in \{T-2, \ldots, 1\}$ takes the functional form

$$V_t[s_t] = -\exp\big\{ -\gamma K_t(q_t^*) \big\} = -\exp\Big\{ -\gamma \Big[ W_t - P_t Q_t + G_t Q_t^2 + H_t Q_t + I_t Q_t R_t + J_t R_t^2 + L_t R_t + Z_t \Big] \Big\}, \qquad (1.93)$$

in which the completed square contributes the term $(B_t Q_t + C_t R_t + D_t)^2 / (4 A_t)$, absorbed into the coefficients below, where

$$G_t := G_{t+1} - \frac{\gamma \eta_t^2 (\sigma_t^v)^2}{2\{1 + 2\gamma \zeta_t (\sigma_t^v)^2\}} - \frac{1}{2}\gamma (\sigma_t^{\epsilon})^2 + \frac{B_t^2}{4 A_t};$$
$$H_t := H_{t+1} + \frac{\eta_t \mu_t^v}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} - \frac{\gamma \eta_t \phi_t (\sigma_t^v)^2}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} - \mu_t^{\epsilon} + \frac{B_t D_t}{2 A_t};$$
$$I_t := (1 - e^{-\rho}) + e^{-\rho} I_{t+1} - \frac{\gamma \eta_t \theta_t (\sigma_t^v)^2}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} + \frac{B_t C_t}{2 A_t};$$
$$J_t := e^{-2\rho} J_{t+1} - \frac{\gamma \theta_t^2 (\sigma_t^v)^2}{2\{1 + 2\gamma \zeta_t (\sigma_t^v)^2\}} + \frac{C_t^2}{4 A_t};$$
$$L_t := e^{-\rho} L_{t+1} + \frac{\theta_t \mu_t^v}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} - \frac{\gamma \theta_t \phi_t (\sigma_t^v)^2}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} + \frac{C_t D_t}{2 A_t};$$
$$Z_t := Z_{t+1} + \frac{\phi_t \mu_t^v}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} - \frac{\gamma \phi_t^2 (\sigma_t^v)^2}{2\{1 + 2\gamma \zeta_t (\sigma_t^v)^2\}} + \frac{\zeta_t (\mu_t^v)^2}{1 + 2\gamma \zeta_t (\sigma_t^v)^2} + x_t + \frac{D_t^2}{4 A_t}. \qquad (1.94)$$


From the above theorem, we find that the optimal execution volume $q_t^*$ for $t \in \{1, \ldots, T\}$ depends on the state $s_t = (W_t, P_t, Q_t, R_t) \in \mathcal{S}$ of the decision process only through the remaining execution volume $Q_t$ and the cumulative residual effect $R_t$, and not through the wealth $W_t$ or the market price $P_t$. Not only does this analysis show that the optimal execution strategy becomes a stochastic one, but it also reveals that the orders posed by small traders (indirectly) affect the execution strategy of the large trader (through the residual effect). A great number of studies focus on the execution problem of a single large trader and yield an optimal execution strategy in a deterministic class, which differs from our results.

Corollary 1 If the aggregate trading volumes submitted by small traders $v_t$ for $t \in \{1, \ldots, T\}$ are deterministic, the optimal execution volumes $q_t^*$ at each time $t \in \{1, \ldots, T\}$ also become deterministic functions of time. Thus, the optimal execution strategy is one in the class of static (and non-randomized) execution strategies.

1.5.4 In the Case with Target Close Order

We here consider a model with a closing price. The time framework $t \in \{1, \ldots, T, T+1\}$ is the same as in the model described above. However, we add the assumption that the large trader can execute his/her remaining execution volume at time $T+1$, i.e., $Q_{T+1}$, at the closing price $P_{T+1}$. We further assume that the trading at time $T+1$ requires the large trader to pay an additive cost $\chi_{T+1}$ per unit of the remaining volume.
According to the above settings, the value function at maturity becomes

$$g_{T+1}(s_{T+1}) = V_{T+1}[s_{T+1}] = -\exp\Big\{ -\gamma \Big[ W_{T+1} - (P_{T+1} + \chi_{T+1} Q_{T+1}) Q_{T+1} \Big] \Big\}. \qquad (1.95)$$

Then, the following theorem holds.

Theorem 4 (In the case of a target close order)

1. The optimal execution volume at time $t \in \{1, \ldots, T, T+1\}$, denoted by $q_t^{**}$, is an affine function of the remaining execution volume ($Q_t$) and the cumulative residual effect ($R_t$):

$$q_t^{**} = f_t(W_t, P_t, Q_t, R_t) = a_t^* + b_t^* Q_t + c_t^* R_t, \quad t = 1, \ldots, T, T+1, \qquad (1.96)$$

where $a_t^*$, $b_t^*$, $c_t^*$ for $t \in \{1, \ldots, T, T+1\}$ are deterministic functions of time $t$ which depend on the problem parameters and can be computed backwardly from the maturity $T+1$.

2. The optimal value function $V_t[s_t]$ at time $t \in \{1, \ldots, T, T+1\}$ takes the form

$$V_t[W_t, P_t, Q_t, R_t] = -\exp\Big\{ -\gamma \Big[ W_t - P_t Q_t + G_t^* Q_t^2 + H_t^* Q_t + I_t^* Q_t R_t + J_t^* R_t^2 + L_t^* R_t + Z_t^* \Big] \Big\}, \qquad (1.97)$$

where $G_t^*, H_t^*, I_t^*, J_t^*, L_t^*, Z_t^*$ for $t \in \{1, \ldots, T, T+1\}$ are deterministic functions of time $t$ which depend on the problem parameters and can be computed backwardly from the maturity $T+1$.

Proof See [35]. □

We can interpret $\chi_{T+1}$ as the cost of a dark pool, which large traders make use of in real marketplaces.

1.5.5 Computation Method for Optimal Execution

We finally illustrate how to compute the optimal execution strategy given in Eq. (1.70).

Algorithm 1 Algorithm for calculating the optimal execution strategy

Parameters (for t ∈ {1, . . . , T}): μ_t^v, σ_t^v, μ_t^ε, σ_t^ε, α_t, β_t, λ_t, κ_t, ρ, γ ▷ These are all deterministic.
for t ∈ {T, T − 1, . . . , 1} do
    Calculate the coefficients G_t, H_t, I_t, J_t, L_t, Z_t backwardly via (1.94).
end for
for t ∈ {1, . . . , T} do
    Calculate the coefficients a_t, b_t, and c_t arising in the definition of q_t via (1.92).
end for
for t ∈ {1, . . . , T} do
    if t = T then
        Q_T = Q_{T−1} − q_{T−1}
        q_t = Q_T ▷ Terminal condition
    else
        Calculate Q_t, R_t, and q_t forwardly via (1.48), (1.52), and (1.70).
    end if
end for

The algorithm shows that, given the information about $\mu_t^v, \sigma_t^v, \mu_t^{\epsilon}, \sigma_t^{\epsilon}, \alpha_t, \beta_t, \lambda_t, \rho$, and $\gamma$, a practitioner acting as a large trader can use it either to execute a large amount of orders or as a backtest of his/her trading performance.
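As a sketch of the final loop of Algorithm 1 (illustrative only: the coefficient arrays a, b, c are assumed to have been computed backwardly from (1.92) and (1.94); placeholder values stand in for them here), the forward pass generating the executed volumes reads:

import numpy as np

def forward_execution(a, b, c, Q0, lam, kap, alpha, rho, v):
    """Forward pass of Algorithm 1: q_t = a_t + b_t Q_t + c_t R_t, Eqs. (1.70), (1.48), (1.52).

    `a`, `b`, `c` are the backwardly computed coefficients (placeholders below);
    `v` is a realized path of the small traders' aggregate orders.
    """
    T = len(a)
    q = np.empty(T)
    Q, R = Q0, 0.0
    for t in range(T):
        q[t] = Q if t == T - 1 else a[t] + b[t] * Q + c[t] * R      # terminal condition q_T = Q_T
        R = (R + (lam * q[t] + kap * v[t]) * alpha) * np.exp(-rho)  # Eq. (1.52)
        Q -= q[t]                                                   # Eq. (1.48)
    return q

T = 5  # placeholder coefficients standing in for the output of the backward recursion
q = forward_execution(a=np.zeros(T), b=np.full(T, 0.3), c=np.full(T, -0.1),
                      Q0=10_000.0, lam=1e-4, kap=1e-4, alpha=0.6, rho=0.5, v=np.zeros(T))
print(q, q.sum())  # the executed volumes sum to Q0 thanks to the terminal condition

Because $q_t^*$ depends on the realized residual effect $R_t$, which in turn depends on the random flow $v_t$, the strategy generated this way is stochastic, in line with the discussion after Theorem 3.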

1.6 Bibliographic Notes

The pioneering theoretical study of optimal execution strategies is [7], which addresses the optimization problem of minimizing the expected execution cost in a discrete-time framework via a dynamic programming approach and shows that the optimal execution strategy is the one equally split over a (finite) time horizon in the presence of a temporary impact. Subsequently, [2] derive an optimal execution strategy by considering both the execution cost and the volatility risk, which entails an analysis with a mean-variance approach. [11, 12] incorporate the market impact caused by other traders into the construction of the midprice process, showing that the optimal execution strategies differ from the one obtained in [2] when small traders cause price impacts and coincide with the one obtained in [2] when small traders are assumed not to influence the midprice.
The modeling of the market impact plays an indispensable role in the research on optimal and equilibrium execution problems. Some studies (e.g., [8]) empirically show that part of the market impact in a real market consists of a transient impact. A number of empirical and theoretical studies have then investigated transient impact modeling, which is empirically compatible with the real situation. [17, 44] consider the so-called no-arbitrage condition under a transient impact model. [32] show that the resilience effect of the limit order book does affect optimal execution strategies. Seminal papers such as [26, 27, 46] then theoretically consider a market model under a transient impact and show that the transient impact does affect the optimal execution strategy of a large trader. In addition, [14, 15, 35, 36] show that the aggregate orders posed by small traders influence the optimal execution strategy of a large trader under the assumption that the market impact has temporary, permanent, and transient parts. For multiple large traders' equilibrium execution problems, [29, 43] derive equilibrium execution strategies under a transient price impact model. These execution strategies are in a deterministic and static class. [34, 35, 37] derive equilibrium execution strategies in a randomized and dynamic class.
As in the analysis of [7], much research applies a dynamic programming approach. For example, [12] study optimal execution strategies considering the VWAP as well as the market order-flow and provide the optimal execution speed in explicit form. On the other hand, [21] focuses on constructing a model that explains a guaranteed VWAP strategy with risk mitigation and finds that the optimal trading speed for the strategy is characterized by a Hamiltonian system (through a Legendre transform). [13] consider the correlated multi-asset liquidation problem with the information of untraded assets incorporated into the price dynamics. [26, 27] construct models for an investor to maximize an expected utility payoff from the final wealth at maturity via a dynamic programming approach.
A series of recent studies focuses on the optimal execution of multiple (correlated) assets. As an extension of [11], [13] demonstrate the multi-asset execution problem. The paper considers an optimal execution strategy of a single large trader for multiple risky assets using the information of both the assets that he/she trades and those he/she does not. Their research concerns a market in which large orders impose both temporary and permanent impacts, but not a transient one. Another study [46] investigates the optimal execution strategy for multiple risky assets under the assumption that a single large trader exists in a financial market and that the orders posed by the large trader cause a transient impact. Following these studies, a number of works concern the optimal execution of multiple correlated risky assets.
The cross-impact has received much attention in recent years. [44] investigate a condition for a market to admit no arbitrage opportunities and show that the cross-impact of asset $i$ on asset $j$ must be identical to that of asset $j$ on asset $i$. This condition is equivalent to the symmetry of the market impact matrix representing all of the market impacts on the order execution of multiple assets. From a theoretical point of view, [1] examine the properties of the so-called decay kernel, a matrix representing the resilience speed of temporary impacts of multiple assets with cross-impact. They show that the decay kernel must be (i) nonnegative, (ii) nonincreasing with respect to trading time, (iii) convex with respect to trading time, and (iv) commuting.
There are a few papers that extend the Ohnishi and Shimoshimizu model. [14, 36] investigate the case in which the aggregate orders posed by small traders have a Markovian dependence as follows:

$$v_0 = 0; \qquad v_{t+1} = \left(a_{v,t+1} - b_{v,t+1}\, v_t\right) + \sigma_{v,t+1}\, \omega_{t+1}, \quad t = 0, \ldots, T-1, \qquad (1.98)$$

where $\omega_t \sim N_{\mathbb{R}^d}(0, 1)$ for all $t \in \{1, \ldots, T\}$. (The dimension is set as $d = 1$ for [14] and $d = 2$ for [36].) In this case, the optimal execution becomes

$$q_t^* := a_t + b_t Q_t + c_t R_t + d_t v_{t-1}, \quad t = 0, \ldots, T. \qquad (1.99)$$

Thus, the previous aggregate orders posed by small traders directly and indirectly affect the optimal execution strategy. [15] further investigate a continuous-time analog of [14].
The questions ‘How do traders act in the dark pool?’ and ‘To what extent does the dark pool affect market quality and market efficiency?’ have attracted both empirical and theoretical researchers in the last decade. For more detail, see, e.g., [25, 28].
Some papers delve into the interaction between more than one large trader; examples include [29, 43, 45], to mention only a few related papers. [45] analyzes the interaction of two large traders and its effect on their execution strategies, which inspired the following two works. In [43], the authors formulate what they call a market impact game model (as a static strategic game model). This study discovers some features of a Nash equilibrium strategy, proving that a unique Nash equilibrium exists, in explicit form, in a class of static and deterministic strategies. They also prove, via a rather direct method, that this equilibrium is also a Nash equilibrium in a broader class of dynamic strategies. Subsequently, [29] extend the above model to an $n$-large-trader model and construct cost minimization problems in terms of mean-variance and expected-utility maximization problems. A significant result of their analysis is that a Nash equilibrium exists for each problem, again in explicit form, and is unique for the former. They also reveal that, under the Bachelier price model, in which the price contains a Brownian motion term expressing the volatility of the stock price, the Nash equilibria obtained from the two problems coincide. These studies are noteworthy since they theoretically highlight the interaction of execution strategies among multiple large traders.
Much research has also been conducted on optimal trading performance in the presence of trading (transaction) costs. [18, 19] theoretically consider portfolio selection problems with transaction costs (which can be seen as market impact) by assuming quadratic trading costs for the traded shares. They show that the optimal trading strategy, under the problem of maximizing the sum of all future expected returns with penalties for risk and transaction costs, becomes a weighted average of the existing portfolio and an aim portfolio, which is itself a weighted average of the current Markowitz portfolio and the expected Markowitz portfolios over the remaining infinite horizon. A further study [30] extends these works to the case in which a CARA investor executes a large amount of orders over a finite time horizon and shows that the CARA investor is sensitive to the risk caused by the return-predicting factor, which is not the case in the model above.

Acknowledgements The author is partly supported by the JSPS Grant-in-aid for Early-Career
Scientists #21K13325.
Competing Interests The author has no conflicts of interest to declare that are relevant to the
content of this chapter.

Appendix 1: Lagrange Multiplier Method

Many problems arising in economics or finance result in a maximization (or minimization) problem with equality constraints. To be precise, we often face the following type of optimization problem:

$$\max_{x \in \mathbb{R}^n} \ \left(\text{or} \ \min_{x \in \mathbb{R}^n}\right) \ f(x) \qquad (1.100a)$$

$$\text{subject to} \quad g(x) = 0, \qquad (1.100b)$$

where $f: \mathbb{R}^n \to \mathbb{R}$, and $g: \mathbb{R}^n \to \mathbb{R}^m$ is defined by


$$g(x) := \begin{pmatrix} g_1(x) \\ \vdots \\ g_m(x) \end{pmatrix}, \qquad (1.101)$$

and $0 := (0, \ldots, 0)^T \in \mathbb{R}^m$. The function $g$ indicates that the optimization problem is subject to $m$ equality constraints. Here we assume that $n \geq m$ holds. Under some regularity conditions, the following theorem provides a necessary condition that the optimal solution must satisfy.

Theorem 5 (The Theorem of Lagrange) Assume that we have the maximization (or minimization) problem of Eqs. (1.100a) and (1.100b). We also suppose that a local maximum or minimum, denoted by $x^*$, exists, and that the rank of the Jacobian

$$\frac{\partial g}{\partial x}(x^*) := \begin{pmatrix} \frac{\partial g_1(x^*)}{\partial x_1} & \cdots & \frac{\partial g_1(x^*)}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_m(x^*)}{\partial x_1} & \cdots & \frac{\partial g_m(x^*)}{\partial x_n} \end{pmatrix} \in \mathbb{R}^{m \times n} \qquad (1.102)$$

is $m$. (That is, the Jacobian matrix is of full rank.) Then, there exists a vector $\lambda^* := (\lambda_1^*, \ldots, \lambda_m^*) \in \mathbb{R}^m$ such that

$$\left.\frac{\partial f(x)}{\partial x}\right|_{x=x^*} + \sum_{i=1}^{m} \lambda_i^* \left.\frac{\partial g_i(x)}{\partial x}\right|_{x=x^*} = 0. \qquad (1.103)$$

Proof See [41]. □

Equation (1.103) provides a first-order condition for the above problem. The method that uses this condition to narrow down the candidates for the optimum is referred to as the Lagrange Multiplier Method.

Remark 7 (The Rank Condition)


The rank condition in the Theorem of Lagrange is referred to as the constraint qualification under equality constraints. This condition means that the rank of the Jacobian matrix equals the number of constraints. It is a key ingredient in proving the theorem (although we omit the proof here). Readers who want to understand why the constraint qualification is necessary may consult, e.g., [41].

Readers can confirm that the condition is satisfied for the expected cost minimization problem of the AC model, since in that model $\frac{\partial g}{\partial x}(x^*)$ becomes

$$\left(\eta - \frac{\psi}{2}\right) \begin{pmatrix} 2q_1^* \\ \vdots \\ 2q_n^* \end{pmatrix} = \left(\eta - \frac{\psi}{2}\right) \begin{pmatrix} \frac{2Q}{n} \\ \vdots \\ \frac{2Q}{n} \end{pmatrix} \neq \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad (1.104)$$

and the rank of the above vector is one.
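To make the mechanics concrete, the following is a minimal sketch of the Lagrange Multiplier Method applied to a toy problem (minimizing x1**2 + x2**2 subject to x1 + x2 = 1). The problem and all names are illustrative and not taken from this chapter; the snippet solves the first-order condition (1.103) together with the constraint using SymPy.

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lambda', real=True)
f = x1**2 + x2**2          # objective (a toy example, not from the chapter)
g = x1 + x2 - 1            # a single equality constraint, so m = 1

# First-order condition (1.103): grad f + lambda * grad g = 0, together with g(x) = 0
eqs = [sp.diff(f, v) + lam * sp.diff(g, v) for v in (x1, x2)] + [g]
print(sp.solve(eqs, [x1, x2, lam]))   # {x1: 1/2, x2: 1/2, lambda: -1}

Here the constraint qualification holds, since the Jacobian of the constraint is the row vector (1, 1), which has rank one.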

Appendix 2: Second-Order Linear Difference Equation: A Review

This section is devoted to an introduction to the theory of the second-order linear difference equation arising in the AC model.

Linear Difference Equation: A Quick Review

A first-order linear difference equation generally takes the following form:

$$x_{n+1} = A x_n, \qquad (1.105)$$

where $x_n \in \mathbb{R}^d$ for all $n \in \mathbb{Z}_{++}$ and $A \in \mathbb{R}^{d \times d}$, with the initial condition $x_0 = x \in \mathbb{R}^d$. The following one-dimensional example illustrates the basic concept of the first-order linear difference equation.

Example 1 (A simple one-dimensional case)


Let $x_n \in \mathbb{R}$ for all $n \in \mathbb{Z}_{++}$ satisfy the following dynamics:

$$x_{n+1} = a x_n, \qquad (1.106)$$

where $a \in \mathbb{R}_{++}$, with $x_0 = 1$. Then the explicit solution for $x_n$, $n \in \mathbb{Z}_{++}$, is

$$x_n = a^n x_0 = a^n. \qquad (1.107)$$

This solution implies that

$$\lim_{n \to \infty} x_n = \begin{cases} 0 & \text{if } a \in (0, 1); \\ 1 & \text{if } a = 1; \\ +\infty & \text{if } a \in (1, +\infty). \end{cases} \qquad (1.108)$$
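A few lines of Python (our own illustration, not code from the chapter) confirm the three regimes in (1.108):

# Iterate x_{n+1} = a * x_n from x_0 = 1 and inspect the three regimes of (1.108)
for a in (0.5, 1.0, 1.5):
    x = 1.0
    for _ in range(50):
        x = a * x
    print(a, x)    # approximately 0, exactly 1, and a large number, respectively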

Second Order Linear Difference Equation

Definition 2 (Second-order linear difference equation) The following equation:

$$x_{n+2} = a x_{n+1} + b x_n, \qquad (1.109)$$

where $a, b \in \mathbb{R}$, is called the second-order linear difference equation.

The general idea for solving the second-order linear difference equation is cap-
tured by the following example.

Example 2 (Fibonacci Sequence)


Consider the following second-order linear difference equation:

$$F_{n+2} = F_{n+1} + F_n, \qquad (1.110)$$

with $F_0 = 0$ and $F_1 = 1$. This sequence is the so-called Fibonacci sequence. This example is not directly related to a financial problem, but we include the exposition here

since the Fibonacci sequence is well known and intuitively easy to understand. To derive the general form of the solution, let us first define

$$x_n := \begin{pmatrix} F_{n+1} \\ F_n \end{pmatrix} \in \mathbb{R}^2. \qquad (1.111)$$

Using $x_n$, we obtain the matrix representation of the simultaneous equations

$$\begin{cases} F_{n+2} = F_{n+1} + F_n; \\ F_{n+1} = F_{n+1}, \end{cases} \qquad (1.112)$$

as follows:

$$x_{n+1} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} x_n =: A x_n. \qquad (1.113)$$

The eigenvalues of $A$, denoted by $\lambda_1$ and $\lambda_2$ with $\lambda_1 > \lambda_2$, are given by

$$|A - \lambda I| = \lambda^2 - \lambda - 1 = 0 \iff \lambda_1 = \frac{1 + \sqrt{5}}{2}, \quad \lambda_2 = \frac{1 - \sqrt{5}}{2}. \qquad (1.114)$$

The corresponding (typical) eigenvectors, $v_1$ and $v_2$, are given by

$$v_1 = \begin{pmatrix} \lambda_1 \\ 1 \end{pmatrix}, \quad v_2 = \begin{pmatrix} \lambda_2 \\ 1 \end{pmatrix}. \qquad (1.115)$$

Thus, by decomposing the matrix $A$, we obtain

$$\begin{pmatrix} F_{n+1} \\ F_n \end{pmatrix} = x_n = A^n x_0 = \begin{pmatrix} v_1 & v_2 \end{pmatrix} \begin{pmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{pmatrix} \begin{pmatrix} v_1 & v_2 \end{pmatrix}^{-1} \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \qquad (1.116)$$

Combining the above equation with the fact that $x_0 = (v_1 - v_2)/(\lambda_1 - \lambda_2)$, we obtain

$$F_n = \frac{\lambda_1^n}{\lambda_1 - \lambda_2} - \frac{\lambda_2^n}{\lambda_1 - \lambda_2} = \frac{1}{\sqrt{5}} \left( \left( \frac{1 + \sqrt{5}}{2} \right)^n - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right). \qquad (1.117)$$
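As a quick numerical sanity check (an illustrative sketch of ours, not code from the chapter), one can compare the closed form (1.117) with direct iteration of the recursion (1.110):

import numpy as np

lam1 = (1 + np.sqrt(5)) / 2       # the eigenvalues from (1.114)
lam2 = (1 - np.sqrt(5)) / 2

F = [0, 1]                        # F_0 = 0, F_1 = 1
for _ in range(18):
    F.append(F[-1] + F[-2])       # the recursion (1.110)

closed = [(lam1**n - lam2**n) / (lam1 - lam2) for n in range(20)]
print(np.allclose(F, closed))     # True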

Derivation of Eq. (1.31)

Here we show how one can derive Eq. (1.31) and why the equation is an approximation of the solution. Let us assume that the solution of the following equation:

$$\eta \left( Q_k - Q_{k-1} \right) - \eta \left( Q_{k+1} - Q_k \right) + \gamma \sigma^2 Q_k = 0, \qquad (1.118)$$

can be rewritten as follows:

$$Q_k = C_+ e^{\kappa t_k} + C_- e^{-\kappa t_k}, \qquad (1.119)$$

where $C_+$ and $C_-$ are the constants determined by the boundary conditions $Q_0 = Q$ and $Q_T = 0$. Then we have, by the boundary conditions,

$$\begin{cases} C_+ + C_- = Q; \\ C_+ e^{\kappa T} + C_- e^{-\kappa T} = 0, \end{cases} \iff \begin{cases} C_+ = \dfrac{-Q e^{-\kappa T}}{e^{\kappa T} - e^{-\kappa T}}; \\ C_- = \dfrac{Q e^{\kappa T}}{e^{\kappa T} - e^{-\kappa T}}. \end{cases} \qquad (1.120)$$

Therefore, we have

$$Q_k = \frac{-Q e^{-\kappa(T - t_k)} + Q e^{\kappa(T - t_k)}}{e^{\kappa T} - e^{-\kappa T}} = Q\, \frac{\sinh(\kappa(T - t_k))}{\sinh(\kappa T)}. \qquad (1.121)$$
e k −e k sinh (κ T )

Note that this is not the true solution for the above equation, although this approxi-
mation is similar to the one obtained in a continuous-time model.
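The quality of the approximation can also be checked numerically. The sketch below (with illustrative parameter values of our own) solves the boundary-value problem (1.118) exactly as a tridiagonal linear system and compares it with the sinh formula (1.121), taking $t_k = k$ and $\kappa$ as in Appendix 4:

import numpy as np

eta, gamma, sigma = 0.05, 0.01, 0.3    # illustrative parameter values
Q, T = 1.0, 20                         # initial holdings and number of periods

# Exact solution of (1.118): for k = 1, ..., T-1,
#   -eta*Q_{k-1} + (2*eta + gamma*sigma**2)*Q_k - eta*Q_{k+1} = 0,
# with boundary conditions Q_0 = Q and Q_T = 0.
d = 2 * eta + gamma * sigma**2
A = (np.diag([d] * (T - 1))
     + np.diag([-eta] * (T - 2), 1)
     + np.diag([-eta] * (T - 2), -1))
b = np.zeros(T - 1)
b[0] = eta * Q                         # contribution of the boundary Q_0 = Q
Q_exact = np.concatenate(([Q], np.linalg.solve(A, b), [0.0]))

# The sinh approximation (1.121)
kappa = np.sqrt(gamma * sigma**2 / eta)
t = np.arange(T + 1)
Q_approx = Q * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)

print(np.max(np.abs(Q_exact - Q_approx)))   # small but nonzero, as the text notes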

Appendix 3: Euler-Lagrange Equation

As in the continuous-time problem of the Almgren-Chriss model, the following type of optimization problem arises in some financial problems:

$$\min_{\{x_t\}_{t \in [a,b]}} \int_a^b f(x_t, \dot{x}_t, t)\, dt, \qquad (1.122)$$

subject to various kinds of constraints over a set of functions defined on $[a, b]$ (which, in the following, is symbolically denoted by $S$). Here we review a necessary condition for a minimization problem of the form (1.122) with initial and terminal conditions (the so-called boundary conditions). The necessary condition takes the form of a differential equation that the minimizer satisfies.

Theorem 6 (Euler-Lagrange equation [42, Theorem 2.1]) Suppose that

1. $S := \{x \in C^1[a, b] \mid x_a = \alpha,\ x_b = \beta\}$;
2. the function $f: \mathbb{R}^3 \to \mathbb{R}$, defined by

$$(\xi, \theta, \tau) \mapsto f(\xi, \theta, \tau), \qquad (1.123)$$

has continuous partial derivatives of order $2$;
3. $F: S \to \mathbb{R}$ is given by

$$F(x) := \int_a^b f(x_t, \dot{x}_t, t)\, dt \qquad (1.124)$$

for all $x \in S$.

Then, we have:
1. If $x^* \in S$ is a minimizer of $F$, the minimizer satisfies the Euler-Lagrange equation:

$$\frac{\partial f}{\partial \xi}\left(x_t^*, \dot{x}_t^*, t\right) - \frac{d}{dt} \left( \frac{\partial f}{\partial \theta}\left(x_t^*, \dot{x}_t^*, t\right) \right) = 0, \qquad (1.125)$$

for all $t \in [a, b]$.
2. If $F$ is convex, and $x^* \in S$ satisfies the Euler-Lagrange equation, then $x^*$ is a minimizer of $F$.

Proof See [42]. □
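As an illustration (our own sketch; the integrand below is the standard Almgren-Chriss-type running cost, assumed here rather than quoted from the chapter), SymPy can derive the Euler-Lagrange equation symbolically, and the result rearranges to the ODE (1.141) of Appendix 4:

import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
eta, lam, sigma = sp.symbols('eta lambda sigma', positive=True)
x = sp.Function('x')

# Assumed running cost f(x, xdot, t) = eta * xdot**2 + lam * sigma**2 * x**2
f = eta * sp.diff(x(t), t)**2 + lam * sigma**2 * x(t)**2

# Theorem 6 gives 2*lam*sigma**2*x - 2*eta*xddot = 0, i.e. Eq. (1.141)
print(euler_equations(f, [x(t)], [t]))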

Appendix 4: Second-order Linear ODE: A Review

This section reviews how to solve an ordinary differential equation (ODE), in par-
ticular, a second-order linear ODE with constant coefficients.

Linear ODE: A Quick Review

This subsection is a quick review of linear ODEs, with two examples illuminating the essence of ODEs.15

Example 3 Assume that a differentiable function $x: \mathbb{R} \to \mathbb{R}$ satisfies the following ODE:

15 The term linear refers to the following form of ODE:

$$a_1 x_t^{(n)} + a_2 x_t^{(n-1)} + \cdots + a_n x_t^{(1)} + a_{n+1} x_t = y_t, \qquad (1.126)$$

where $x_t^{(i)}$ for $i \in \{1, \ldots, n\}$ is the $i$th derivative with respect to $t$. (We implicitly assume that $a_1 \neq 0$.)

$$\dot{x}_t - 2t = 0. \qquad (1.127)$$

Integrating both sides of the equation $\dot{x}_t = 2t$ results in

$$x_t = \int 2t\, dt = t^2 + C, \qquad (1.128)$$

where $C$ is a constant of integration. A solution with some (undetermined) constant is referred to as a general solution. Deriving a general solution is the first step in analyzing any ODE.16 If there is an additional condition, e.g., $x_0 = 2$, then $C = 2$ and

$$x_t = t^2 + 2. \qquad (1.129)$$

A solution of an ODE without any (undetermined) constant is called a particular solution. A condition like “$x_0 = 2$” is called an initial condition.

Example 4 (Exponential function)

The exponential function $f_t := e^t$ satisfies $\dot{f}_t = f_t$. As might be expected from this fact, the general solution of the following type of ODE:

$$\dot{f}_t = f_t \qquad (1.130)$$

is expressed as $f_t = C e^t$ with a constant $C$ determined by an initial condition (if one exists). In addition, if $f_t$ satisfies

$$\dot{f}_t = \beta f_t, \qquad (1.131)$$

then the general solution is given by

$$f_t = C e^{\beta t}. \qquad (1.132)$$

Second-order Linear Ordinary Differential Equation

Let $x: T (\subset \mathbb{R}) \to \mathbb{R}$ be a twice-differentiable function on $T$. We introduce a definition of a class of ODEs as follows.

Definition 3 (Second-order linear ordinary differential equation) The following type of ODE:

$$a_1 \ddot{x}_t + a_2 \dot{x}_t + a_3 x_t = y_t, \qquad (1.133)$$

16 Note that not all ODEs necessarily have an explicit solution.

with $a_1 \neq 0$ is called a second-order linear ordinary differential equation (ODE) with constant coefficients. The ODE is said to be homogeneous if $y_t \equiv 0$ for all $t \in T$, and non-homogeneous otherwise.

We assume that $y_t \equiv 0$ in the rest of this section. The following explanation reveals some features of Eq. (1.133). Let $x_t$ be an exponential function, i.e., $x_t = e^{\beta t}$. Then, substituting this into Eq. (1.133) yields

$$\left( a_1 \beta^2 + a_2 \beta + a_3 \right) e^{\beta t} = 0. \qquad (1.134)$$

From the fact that $e^{\beta t} \neq 0$ for all $t \in \mathbb{R}$,

$$a_1 \beta^2 + a_2 \beta + a_3 = 0 \qquad (1.135)$$

holds. This equation is called the characteristic equation of Eq. (1.133). Assume that the characteristic equation has two real solutions, say $\beta_1$ and $\beta_2$. Then, Eq. (1.133) has the two solutions

$$x_t^1 = e^{\beta_1 t}, \quad x_t^2 = e^{\beta_2 t}. \qquad (1.136)$$

By taking a linear combination of the two solutions,

$$x_t^* := A_1 e^{\beta_1 t} + A_2 e^{\beta_2 t}, \qquad (1.137)$$

a simple calculation yields

$$a_1 \ddot{x}_t^* + a_2 \dot{x}_t^* + a_3 x_t^* = A_1 \left( a_1 \ddot{x}_t^1 + a_2 \dot{x}_t^1 + a_3 x_t^1 \right) + A_2 \left( a_1 \ddot{x}_t^2 + a_2 \dot{x}_t^2 + a_3 x_t^2 \right) = 0. \qquad (1.138)$$

This fact leads to the following result.17

17 A rigorous explanation from a linear-algebraic point of view shows that the set

$$V := \{ x_t \mid a_1 \ddot{x}_t + a_2 \dot{x}_t + a_3 x_t = 0 \} \qquad (1.139)$$

is in fact a vector space and that the two solutions stemming from the characteristic equation constitute a basis.

Proposition 1 Assume that the characteristic equation has two real solutions, denoted by $\beta_1$ and $\beta_2$.18 Then, the general solution to the second-order linear ODE (1.133) is given by

$$x_t = A_1 e^{\beta_1 t} + A_2 e^{\beta_2 t}, \qquad (1.140)$$

where $A_1$ and $A_2$ are some constants.

Solving Eq. (1.43)

Applying Theorem 6 yields the following ODE:

$$\ddot{x}_t - \frac{\lambda \sigma^2}{\eta} x_t = 0. \qquad (1.141)$$

From the fact that the solution of the quadratic equation

$$\alpha^2 - \frac{\lambda \sigma^2}{\eta} = 0 \qquad (1.142)$$

is given by $\alpha = \pm \kappa$ (where $\kappa := \sqrt{\lambda \sigma^2 / \eta}$), the general solution to Eq. (1.141) becomes

$$x_t = A_1 e^{\kappa t} + A_2 e^{-\kappa t}, \qquad (1.143)$$

where $A_1$ and $A_2$ are constants determined by boundary conditions. Combining with the boundary conditions $x_0 = Q$ and $x_T = 0$, we obtain

$$\begin{cases} x_0 = A_1 + A_2 = Q; \\ x_T = A_1 e^{\kappa T} + A_2 e^{-\kappa T} = 0. \end{cases} \qquad (1.144)$$

Solving the above simultaneous equations results in

$$\begin{cases} A_1 = \dfrac{1}{1 - e^{2\kappa T}}\, Q; \\ A_2 = -\dfrac{e^{2\kappa T}}{1 - e^{2\kappa T}}\, Q. \end{cases} \qquad (1.145)$$

18 To be precise, a solution of the ODE (1.133) exists regardless of the nature of the solutions of the characteristic equation.

Therefore, the solution to Eq. (1.141) is expressed as

$$x_t = \frac{e^{\kappa t}}{1 - e^{2\kappa T}}\, Q - \frac{e^{\kappa(2T - t)}}{1 - e^{2\kappa T}}\, Q = \frac{e^{\kappa(T - t)} - e^{-\kappa(T - t)}}{e^{\kappa T} - e^{-\kappa T}}\, Q = \frac{\sinh(\kappa(T - t))}{\sinh(\kappa T)}\, Q. \qquad (1.146)$$
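A short symbolic check (an illustration of ours, not code from the chapter) confirms that (1.146) satisfies both the ODE (1.141), with $\kappa^2 = \lambda \sigma^2 / \eta$, and the boundary conditions:

import sympy as sp

t, T, Q, kappa = sp.symbols('t T Q kappa', positive=True)
x = Q * sp.sinh(kappa * (T - t)) / sp.sinh(kappa * T)

print(sp.simplify(sp.diff(x, t, 2) - kappa**2 * x))          # 0, the ODE (1.141)
print(sp.simplify(x.subs(t, 0)), sp.simplify(x.subs(t, T)))  # Q and 0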

Appendix 5: A Review of Discrete-time Stochastic Dynamic Programming

This section briefly explains discrete-time stochastic dynamic programming in a Markov decision process formulation. In particular, readers can concisely grasp the dynamic programming principle and Bellman's principle of optimality.

State Process

The fundamental characteristics of dynamic programming are the information about when a decision maker acts and how the state transition occurs in accordance with the action chosen by the decision maker. We first define the notions of stage, state, decision, and the dynamics of the state transition as follows.
1. A stage is the set of times at which a decision maker (or multiple decision makers) acts. In a discrete-time setting, the timing is typically described by $t \in \{1, 2, \ldots, T\}$ ($T \in \mathbb{Z}_{++}$). By contrast, the timing of the actions a decision maker takes generally becomes $[0, T]$ ($T \in [0, \infty]$) when considering a continuous-time setting.
2. A state at time $t \in \{1, 2, \ldots, T\}$ is a set of variables through which the decision maker knows the information available at that time. The variables constituting the state are called state variables. In the following, we let $s_t$ be the state at time $t \in \{1, 2, \ldots, T\}$.
3. A decision rule is the action that the decision maker takes at each time $t \in \{1, 2, \ldots, T\}$. The action at time $t \in \{1, 2, \ldots, T\}$ affects the state at the next time $t + 1 \in \{2, \ldots, T + 1\}$. The action of the decision maker at time $t \in \{1, 2, \ldots, T\}$ is denoted as $q_t$.19 The set of all actions the decision maker can take at time $t \in \{1, 2, \ldots, T\}$ is expressed as $A_t$.20
4. The dynamics of the state variables, the so-called law of motion, are determined after the decision maker takes an action $q_t$ at a state $s_t$ for

19 $q_t$ may be real-valued or vector-valued.
20 When the set of all actions the decision maker can take at time $t \in \{1, 2, \ldots, T\}$ depends on $s_t$, we explicitly express the set as $A_t[s_t]$. When, on the contrary, the set of all actions the decision maker can take at time $t \in \{1, 2, \ldots, T\}$ depends on neither the time nor the state, such as $A_t = \mathbb{R}$, we simply denote the set by $A$.

time $t \in \{1, \ldots, T\}$. We often describe the relationship between $s_t$ and $s_{t+1}$ via a (Borel measurable) function $h: S \times A_t \times \mathbb{R}^l \to S$ as follows:

$$s_{t+1} = h\left(s_t, q_t, \epsilon_t\right). \qquad (1.147)$$

Here $\epsilon_t \in \mathbb{R}^l$ is a random variable expressing a stochastic disturbance term that occurs at time $t \in \{1, \ldots, T\}$. Generally speaking, the law of motion is described explicitly, so we often express the state transition in a stacked form as in (1.147).

For each time $t \in \{1, 2, \ldots, T\}$, a payoff arises depending on the state at time $t$ and the action that the decision maker takes at time $t$. Here we denote the payoff at time $t$ as $g_t(s_t, q_t)$. The decision maker aims to maximize the expected sum of payoffs over all $t \in \{1, 2, \ldots, T\}$, defined as follows:

$$\mathbb{E}_1 \left[ \left. \sum_{t=1}^{T} g_t\left(s_t, q_t, \epsilon_t\right) + g_{T+1}(s_{T+1}) \,\right|\, s_1 \right]. \qquad (1.148)$$

Dynamic Programming Equation

A key concept in dynamic programming is the so-called Bellman's principle of optimality. The decision maker aims to maximize the expected sum of payoffs arising from each time. We denote the expected sum as follows:

$$V_1\left[s_1\right] := \max_{(q_1, \ldots, q_T) \in A_1 \times \cdots \times A_T} \mathbb{E}_1 \left[ \left. \sum_{n=1}^{T} g_n\left(s_n, q_n, \epsilon_n\right) + g_{T+1}(s_{T+1}) \,\right|\, s_1 \right]. \qquad (1.149)$$

Similarly, let us define the expected sum of payoffs over times $\{t, t+1, \ldots, T\}$ as $V_t[s_t]$:

$$V_t\left[s_t\right] := \max_{(q_t, \ldots, q_T) \in A_t \times \cdots \times A_T} \mathbb{E}_t \left[ \left. \sum_{n=t}^{T} g_n\left(s_n, q_n, \epsilon_n\right) + g_{T+1}(s_{T+1}) \,\right|\, s_t \right]. \qquad (1.150)$$

Equation (1.150) corresponds to the tail problem starting from time $t \in \{1, \ldots, T\}$. Under the above setting, Bellman's principle of optimality is as follows.

Theorem 7 (Bellman’s principle of optimality: Bellman equation) Define the opti-


mal[ value
] finction at time .t ∈ {1, 2, . . . , T } as Eq. (1.150). Then, the value function
. Vt s t satisfies the so-called Bellman equation (or dynamic programming equation)

as follows:
48 M. Shimoshimizu

[ ] [ ( ) [ ( ) ]|| ]
. Vt st = max Et gt st , q t , ∈ t + Vt+1 h st , q t , ∈ n | st (1.151)
q t ∈At
( )
[ ( ) [ ]|| ]

. = max Et gt s t , q t , t + Vt+1 s t+1 | s t (1.152)
q t ∈At

The principle that this recursive mechanism holds for all time.t ∈ {1, . . . , T } is called
the Bellman’s principle of optimality.

Proof See, for example, [3] or [6]. □

Dynamic programming originated from [4, 5]. The application of Bellman's principle of optimality to financial problems is well explained in [3]. Readers can also refer to a proof of a deterministic version of Bellman's principle of optimality, with economic applications, in [41].
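A minimal backward-induction sketch makes the recursion (1.151) concrete. The example below is a deterministic special case of our own (not the chapter's model): liquidating N shares over T periods with a quadratic per-period cost eta * q**2, for which even splitting of the order is optimal.

import numpy as np

eta, N, T = 0.1, 10, 5                 # illustrative parameter values

# V[t, s]: minimal remaining cost with s shares left at the start of period t
V = np.zeros((T + 1, N + 1))
V[T, 1:] = np.inf                      # all shares must be executed by time T
policy = np.zeros((T, N + 1), dtype=int)

for t in range(T - 1, -1, -1):         # the Bellman recursion (1.151)
    for s in range(N + 1):
        costs = [eta * q**2 + V[t + 1, s - q] for q in range(s + 1)]
        policy[t, s] = int(np.argmin(costs))
        V[t, s] = costs[policy[t, s]]

print(policy[0, N], V[0, N])           # 2 shares per period; total cost 2.0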

Remark 8 (Ohnishi and Shimoshimizu Model)

As for the Ohnishi and Shimoshimizu model [35], the reward function is set as

$$g_1(s_1) = g_2(s_2) = \cdots = g_T(s_T) = 0; \qquad (1.153)$$

$$g_{T+1}(s_{T+1}) := -\exp\left\{ -\gamma W_{T+1} \right\}. \qquad (1.154)$$

Also, their model uses the Markov decision process approach. This approach defines the action of the decision maker at time $t$ as a map $f_t$ from the state at time $t$ to the action at time $t$ (that is, $q_t = f_t(s_t)$). Then, the Bellman equation takes the form of Eq. (1.69).

References

1. Alfonsi, A., Klöck, F., Schied, A.: Multivariate transient price impact and matrix-valued positive
definite functions. Math. Oper. Res. 41, 914–934 (2016)
2. Almgren, R., Chriss, N.: Optimal execution of portfolio transactions. J. Risk 3, 5–39 (2000)
3. Bäuerle, N., Rieder, U.: Markov Decision Processes with Applications to Finance. Springer,
Berlin, Heidelberg (2011)
4. Bellman, R.: The theory of dynamic programming. B. Am. Math. Soc. 60, 503–515 (1954)
5. Bellman, R.: Dynamic Programming. Princeton University Press (1957)
6. Bertsekas, D.: Dynamic programming and optimal control: Volume I (Vol. 1). Athena Scientific
(2012)
7. Bertsimas, D., Lo, A.W.: Optimal control of execution costs. J. Financ. Mark. 1, 1–50 (1998)
8. Bouchaud, J.P., Gefen, Y., Potters, M., Wyart, M.: Fluctuations and response in financial mar-
kets: the subtle nature of ‘random’ price changes. Quant. Financ. 4, 176–190 (2004)
9. Capiński, M., Kopp, P.E.: Measure, Integral and Probability. Springer-Verlag (2004)
10. Cartea, Á., Jaimungal, S., Penalva, J.: Algorithmic and High-frequency Trading. Cambridge
University Press (2015)
11. Cartea, Á., Jaimungal, S.: Incorporating order-flow into optimal execution. Math. and Financ.
Econ. 10, 339–364 (2016)

12. Cartea, Á., Jaimungal, S.: A closed-form execution strategy to target volume weighted average
price. SIAM J. Financ. Math. 7, 760–785 (2016)
13. Cartea, Á., Gan, L., Jaimungal, S.: Trading co-integrated assets with price impact. Math. Financ.
29, 542–567 (2019)
14. Fukasawa, M., Ohnishi, M., Shimoshimizu, M.: Discrete-time optimal execution under a gen-
eralized price impact model with Markov exogenous orders. Int. J. Theor. Appl. Financ. 24,
2150025 (2021)
15. Fukasawa, M., Ohnishi, M., Shimoshimizu, M.: Optimal execution under a generalized price
impact model with Markovian exogenous orders in a continuous-time setting. RIMS Kokyuroku
2207, 1–22 (2022)
16. Fruth, A., Schöneborn, T., Urusov, M.: Optimal trade execution and price manipulation in order
books with time-varying liquidity. Math. Financ. 24, 651–695 (2014)
17. Gatheral, J.: No-dynamic-arbitrage and market impact. Quant. Financ. 10, 749–759 (2010)
18. Gârleanu, N., Pedersen, L.H.: Dynamic trading with predictable returns and transaction costs.
J. Financ. 68, 2309–2340 (2013)
19. Gârleanu, N., Pedersen, L.H.: Dynamic portfolio choice with frictions. J. Econ. Theory 165,
487–516 (2016)
20. Guéant, O.: Permanent market impact can be nonlinear. Available at arXiv:1305.0413 (2013)
21. Guéant, O., Royer, G.: VWAP execution and guaranteed VWAP. SIAM J. Financ. Math. 5,
445–471 (2014)
22. Guéant, O.: The Financial Mathematics of Market Liquidity: From Optimal Execution to Mar-
ket Making. CRC Press (2016)
23. Huberman, G., Stanzl, W.: Price manipulation and quasi-arbitrage. Econometrica 72, 1247–
1275 (2004)
24. Kissell, R.: Algorithmic Trading Methods: Applications Using Advanced Statistics, Optimiza-
tion, and Machine Learning Techniques. Academic Press (2021)
25. Kratz, P., Schöneborn, T.: Portfolio liquidation in dark pools in continuous time. Math. Financ.
25, 496–544 (2015)
26. Kuno, S., Ohnishi, M.: Optimal execution in illiquid market with the absence of price manip-
ulation. J. Math. Financ. 5, 1–14 (2015)
27. Kuno, S., Ohnishi, M., Shimizu, P.: Optimal off-exchange execution with closing price. J. Math.
Financ. 7, 54–64 (2017)
28. Laruelle, S., Lehalle, C. A.: Market Microstructure in Practice Second Edition. World Scientific
(2018)
29. Luo, X., Schied, A.: Nash equilibrium for risk-averse investors in a market impact game with
transient price impact. Market Microstruct. Liquidity 5, 2050001 (2019)
30. Ma, G., Siu, C.C., Zhu, S.-P.: Dynamic portfolio choice with return predictability and transac-
tion costs. Eur. J. Oper. Res. 278, 976–988 (2019)
31. Markowitz, H.: Portfolio selection. J. Financ. 7(1), 77–91 (1952)
32. Obizhaeva, A.A., Wang, J.: Optimal trading strategy and supply/demand dynamics. J. Financ.
Mark. 16, 1–32 (2013)
33. O’Hara, M.: High frequency market microstructure. J. Financ. Econ. 116, 257–270 (2015)
34. Ohnishi, M., Shimoshimizu, M.: Equilibrium execution strategy with generalized price impacts.
RIMS Kokyuroku 2111, 84–106 (2019)
35. Ohnishi, M., Shimoshimizu, M.: Optimal and equilibrium execution strategies with generalized
price impact. Quant. Financ. 20, 1625–1644 (2020)
36. Ohnishi, M., Shimoshimizu, M.: Optimal pair-trade execution with generalized cross-impact.
Asia-Pac. Financ. Mark. 29, 253–289 (2022)
37. Ohnishi, M., Shimoshimizu, M.: Trade execution game in a Markovian environment. Available
at SSRN (2023)
38. Perold, A.F.: The implementation shortfall: paper versus reality. J. Portfolio Manage. 14, 4–9
(1988)
39. Potters, M., Bouchaud, J.P.: More statistical properties of order books and price impact. Physica
A 324, 133–140 (2003)

40. Rudin, W.: Principles of Mathematical Analysis. McGraw-Hill, New York (1976)
41. Sundaram, R. K.: A First Course in Optimization Theory. Cambridge University Press (1996)
42. Sasane, A.: Optimization in Function Spaces. Courier Dover Publications (2016)
43. Schied, A., Zhang, T.: A market impact game under transient price impact. Math. Oper. Res.
44, 102–121 (2019)
44. Schneider, M., Lillo, F.: Cross-impact and no-dynamic-arbitrage. Quant. Financ. 19, 137–154
(2019)
45. Schöneborn, T.: Trade Execution in Illiquid Markets: Optimal Stochastic Control and Multi-
agent Equilibria. Doctoral dissertation, Technische Universität Berlin (2008)
46. Tsoukalas, G., Wang, J., Giesecke, K.: Dynamic portfolio execution. Manage. Sci. 65, 2015–
2040 (2019)
47. Velu, R., Hardy, M., Nehren, D.: Algorithmic Trading and Quantitative Strategies. Chapman
and Hall/CRC (2020)
Part II
Tools and Techniques
Chapter 2
Python Stack for Design and Visualization in Financial Engineering

Jayanth R. Varma and Vineet Virmani

Abstract The practical problems in financial engineering are highly interdisci-


plinary, requiring as much facility with applied mathematics, statistics and program-
ming as with finance. Solving mathematically challenging problems and writing
efficient computer programs to price complex structured products is just one part
of the puzzle, however. Given the number of design elements involved in creating
such products, a front end that combines visualization and interactivity is as impor-
tant as speed and efficiency of computations. In this note we highlight the power of
the Python stack for designing graphical user interfaces for engineering structured
product solutions by visualizing their payoffs and prices in a web browser. Object-
oriented programming in Python combined with the power of NumPy, Matplotlib
and Jupyter fits the bill perfectly for design and visualization in financial engineering.
We find that Python combined with Jupyter is not only very well suited for designing
and visualizing structured products and examining the impact on pricing as different
design elements are tweaked, but it is also amenable to a variety of extensions and
integration with other open-source computational finance libraries.

Keywords Financial engineering · Jupyter · Matplotlib · Python · Visualization

2.1 Introduction

The field of financial engineering consists of designing and structuring customized


solutions for investments and risk management by combining plain vanilla assets
like stocks and bonds with call and put options. A put option on a Facebook stock,
for example, protects the buyer by giving her the right to sell the Facebook stock

J. R. Varma · V. Virmani (B)


Indian Institute of Management Ahmedabad, Ahmedabad, India
e-mail: [email protected]
J. R. Varma
e-mail: [email protected]


at a pre-defined strike price on a future date and make a profit if the price of the
Facebook stock falls below the strike price. Financial engineering involves taking
such options and other existing securities like stocks, indices, government bonds,
currencies (or even bitcoins) to create tailor-made insurance like products for a wide
variety of investors and corporations.
It turns out that designing and pricing of such products requires as much facility
with applied mathematics, statistics and programming as with finance. Reflecting
the industry’s need and students’ demand for such skills, an increasing number of
schools of engineering have started offering Master’s and certificate programs in
financial engineering or mathematical/computational finance, with many of them
being jointly offered with the departments of applied mathematics and economics.
While high-paying job opportunities in large banks and hedge funds definitely
explain part of the attraction, the fact that the field draws its toolkit from disci-
plines as varied as theory of stochastic processes and partial differential equations
to practicalities of Monte Carlo simulation and finite difference methods makes it
particularly exciting for students aiming for a career requiring expertise in applied
mathematics and computational methods [1]. Also, given that implementing practical
financial engineering applications routinely involve programming, many engineering
and computer science students find the coding part of the job equally fascinating.
It helps that programming jobs in Goldman Sachs are often far more lucrative than
those in Facebook. The fact that the most famous formula in the field [2] got two
of its inventors (Fischer Black, Myron Scholes and Robert Merton) a Nobel Prize in
Economic Sciences attracts even academically minded students to the field.
Even though there exist many open-source libraries in many languages today to
help solve practical financial engineering problems, ranging from those built in C++
to downloadable Excel add-ins, over time use of Python has become mainstream
at hedge funds and quantitative trading firms. Solving mathematically challenging
pricing problems and writing efficient computer programs to implement them is
just one part of the puzzle, however. Most of the action (and money) in financial
engineering lies in structuring—designing products suited to the exact needs of the
clients and investors, and here the powers of visualization and interactivity are more
important than speed and efficiency of computations.
In this note we highlight the power of the Python stack for designing graphical
user interfaces (GUIs) for engineering structured product solutions by visualizing
their payoffs and prices in a web browser.
The plan of the paper is as follows. After reviewing the literature on design of
such applications for practical and pedagogical use in Sect. 2.2, in Sect. 2.3, we
briefly describe the nature of structured products to understand the importance of
interactivity in visualization for the task at hand. In Sect. 2.4, we describe the Python
modules and the associated classes for designing interactive Python based GUIs.
Section 2.5 describes our main Python application in detail and Sect. 2.6 concludes.

2.2 Design of Interactive Applications: Literature Review

When it comes to designing interactive applications for building prototypes and for pedagogical needs, Microsoft Excel remains one of the most popular software tools [3]. The fact that there are many websites, blogs and journal articles published on the use of Excel understandably makes it a highly attractive and convenient choice for beginners as well as for classroom use in a variety of contexts [4–7].
Some of the biggest concerns with using spreadsheets, however, are the compulsion to keep the data, inputs and outputs within the same software and the lack of a professional interactive graphics library important for designing and visualizing financial engineering applications [8]. Spreadsheets have also often been blamed for making it easy to make ‘silly mistakes’, often leading to embarrassing reputational and financial consequences for large organizations [9, 10].
Browser-based applications built using high level languages like R and Python
provide an attractive alternative. In this article we introduce one such browser-based
alternative to Excel called Jupyter based on Python. The use of Jupyter to create a
front end on the web browser is a game changer. Today all modern browsers are
compatible in being able to render text, tables and figures equally well, making such
a Python-based application well-suited for both prototypes as well as professional
applications.
To our knowledge, while there is a large literature on using Python in the natural sciences and in engineering applications [11] and on designing domain-specific Python libraries [12, 13], the literature on designing Python-based financial engineering applications is still at a nascent stage, with only a few studies on using Python for blockchain and crypto [14] and as a front end when business needs require using sophisticated computational finance libraries built in C++ for efficient implementations [15].
Python for scientific computing and visualization has the advantage of being open-source and free [16], and is thus ideal for universities and schools with limited budgets to teach courses in option pricing and financial engineering, as well as for budding trading and boutique consulting firms.

2.3 Design Elements

The category of structured products straddles the market somewhere in between the
market of regular call and put options and over-the-counter derivatives available
only to select large financial institutions and hedge funds. The domain of financial
engineering involves designing such structured products. The field in that sense has all the flavor of the boutique tailoring shops of Savile Row, London. It involves designing
customized solutions but using otherwise familiar ingredients. The need for such
products arises from two kinds of clients—typically high net worth individuals and
corporations—those looking to express out-of-consensus and asymmetric views, and
those looking to hedge their financial risks to very specific kinds of exposures.

Most structured products are designed by embedding option-like features onto a


simple risk-free bond. This ensures that only the buyers of structured products are
exposed to the credit risk, making it easier for issuers to offer such products to a
larger client base. Financial institutions have also been known to resort to issuing
structured products to raise money in difficult times by offering an upside potential
to the lenders in the form of option-like features.
With the range of structured product offerings possible by combining elementary
products, it is neither helpful nor possible to list all possible combinations of struc-
tured products across asset classes. At the same time, they can be broadly classified
into (a) capital guarantee, (b) yield enhancement and (c) participation and leverage
products, in increasing order of risk and expected return, with the category of yield enhancement products being especially popular in Switzerland and Germany (the websites of the Swiss SIX and Frankfurt Börse exchanges provide many
examples for products under each category).
A popular yield enhancement product is what is called an ‘autocallable note’, in
which the principal is preserved but is designed to earn a higher interest than a typical
bond if a pre-specified level (called barrier) has been hit. The name autocallable
comes from the fact that the product can be ‘called back’ by the seller if a certain pre-specified event linked to a ‘reference asset’ occurs. The reference asset could be a
single stock, a basket of stocks, commodities, foreign exchange rates and/or interest
rates. Such products are very popular with large corporations and banks who may
have cash flows matching or directly offsetting the event linked to the reference asset.
Given the variety that is possible when combining a bond with option like features
and barriers, there is often no way to understand a product without visualizing how
sensitive its payoff and price would be to changes in the inputs. These inputs then
become both the design elements that can be tweaked to get the desired behaviour
as well as the factors on which the price and riskiness of the product depends. In a
typical product these input include contractual features, value of market variables as
well as statistical proxies for risk and dependence:
• Underlying/s: This represents the single asset or multiple assets (called the basket)
based on whose values the performance of the product will be measured. With
a basket of assets, often the ‘average’ and the extremes (‘worst’ or ‘best’) are
used as performance measures. Usually, one also needs to go beyond the value of the underlying to visualize how the payoff and price of a product change when the volatility of the underlying/s is changed. For basket products (and when
working with advanced pricing models), in addition to market price and volatility
for each underlying, correlations also become critical.
• Strike: The strike price is one of the most important design tweaks for modifying
the payoff from the option embedded in the product. Depending on the nature of
the product, an option may have more than one strike. For example, a butterfly
payoff is designed by three strikes defining its wings and the body. A related
product is what is called a collar, something we discuss in detail when describing
our interactive Python application.

• Maturity: Although not often critical, the duration of the product is also up for
tweaking. It is more a matter of an investor’s (or sometimes the seller’s) preference
on how long the lock-in period is desired. For example, designers often introduce
autocallable features in the product to reduce the expected life of the product.
Other than for pricing, maturity is relatively less important otherwise.
• Barrier level: Upside (or downside) from a structured product is often contingent
on a barrier being hit prior to (or at) maturity of the product. A barrier represents
a pre-specified level for the underlying variable, say, foreign exchange rate. For
example, the payoff from a standard ‘Up-and-In’ call option on EUR/USD gives
the payoff from a call option at maturity if the value of EUR/USD rate has hit
a barrier above its beginning value at some time before maturity. In comparison,
an ‘Up-and-out’ option gives a payoff only if the barrier has not been hit. (‘Up’
and ‘Down’ are defined with respect to the current value of the variable defining
the barrier. If the barrier level is above/below the current value, it is designated as
‘Up/Down’.)
• Caps and Floors: Including caps and floors allow further tweaking of the payoff by
introducing the maximum possible performance (called a cap) and/or a minimum
guaranteed performance (called a floor) from the embedded option in the product.
So, if a buyer is not comfortable with the price of a product, issuers may either
tweak the barrier level or introduce a cap. This limits the upside performance
defined by the level of the cap, but also leads to a reduction in price. A floor
serves a similar purpose.

2.4 Interactive Python Applications with Jupyter and Matplotlib

In the early years of the development of the IPython project (co-founded by Fernando Perez and Brian Granger in 2001), the IPython environment integrated a terminal, a Python kernel (console and qtconsole), distributed computing, support for other languages and the browser-based notebook [17].
In the 10 years since, the notebook part of the project has been split into a more
general purpose browser-based environment called Jupyter (post IPython 3.x), with
the ability to also integrate other popular languages used in the field of data science
like Julia and R. Jupyter is also known to integrate with other languages (list of
unofficial community maintained kernels is available at https://github.com/jupyter/
jupyter/wiki/Jupyter-kernels).
The modern Jupyter environment is further enhanced by a set of magic commands
designed to simplify and speed up commonly used operations like debugging, timing
codes, copying and pasting commands from external sources as well as plotting. With
a browser-based interface to the IPython shell and integration of scripting, formatted
texts and dynamic display capabilities, Jupyter provides an ideal setting for building
a GUI for financial engineering applications. And given the compatibility of all

modern browsers in rendering figures, a browser-based interface works equally well for both prototypes and professional applications.
Preceding the development of IPython, Matplotlib, led by John Hunter, grew from its early days as a replacement for MATLAB's 2D graphics engine [18].
biggest strengths include its integration with NumPy (it is built on NumPy arrays)
and SciPy stacks, and its support for graphic libraries across operating systems.
Unlike IPython and the Jupyter projects, however, Matplotlib has been relatively
slow to evolve, partly because of pyLab's legacy as a clone of the old MATLAB environment. Only since version 2.0 has Matplotlib seen major changes in terms of its
configuration, default settings, stylesheets and available toolkit.
Recently Matplotlib has also been significantly extended by modern APIs
provided by Seaborn, Bokeh and plotly libraries, which add simpler high-level func-
tionality to Matplotlib for more advanced plotting. They also provide better integra-
tion with Pandas DataFrames than what has historically existed in Matplotlib, which
predates the Pandas project by almost a decade. The latest stable version is 3.0.3.

2.4.1 Interactive Plotting with Matplotlib

Interactivity in Matplotlib goes beyond using mere triggers (e.g.


matplotlib.pyplot.show() and matplotlib.pyplot.ion()), keymapping and writing
Python scripts to dynamically generate plots. Matplotlib contains a full-fledged
Animation class to create a motion picture of data. The three sub-classes, TimedAn-
imation, FuncAnimation and ArtistAnimation, together provide a convenient
interface to come up with quick and presentable animations for most purposes.
Advanced users can further embellish the animations by adding effects like ‘tails’
(trails to visualize) and fade-in and fade-outs.
For financial engineering applications, however, animations are only of limited
use. On the other hand, widgets are game changers when it comes to studying and
demonstrating impact of change in inputs and parameters to payoff and price of
a structured product. In plain-speak, widgets are sort of buttons, allowing users to
move beyond a keyboard to interact with a plot.
Fundamentally designed as events with callbacks, Matplotlib comes with many built-in widgets; the ones most relevant for use in the GUI described here include:
• Sliders: Called as widgets.Slider, a slider represents a bar for gradually/uniformly
changing the value of a variable between a minimum and a maximum.
• Buttons: As the name suggests, widgets.Buttons adds buttons to the plot for
handling tasks like navigating or saving plots.
• Check buttons: widgets.CheckButtons are useful when multiple elements reside
in the same plot and one wants to selectively toggle one or all of the elements at
the same time.
Other commonly used widgets include radio buttons and different shape selectors like lasso, rectangle, etc. (https://matplotlib.org/api/widgets_api.html provides the full list). Third-party APIs like Seaborn and Plotly allow additional high-level interactivity. While the context decides the usability of one kind of widget or another in a GUI, for the task at hand we find sliders and check buttons to be the most useful.
In the next section we describe building a GUI towards designing a product called
“3-way collar”.

2.5 Python Implementation

2.5.1 The Exemplar Structured Product: A 3-Way Collar

A 3-way collar has all the features one finds in a typical structured product designed
for hedging risks while at the same time not being overly complicated. For producers
of commodities like oil and copper, the biggest risk is the fluctuation in the price of
their output. An oilfield worries about the price of crude oil, a copper mine worries
about the copper price, and a farmer worries about the price of wheat or corn.
There are a variety of ways to hedge the exposure to the sale price, but it is useful to begin by graphically visualizing the producer's revenue assuming she does not hedge the exposure at all, and then slowly adding elements towards the design of a 3-way collar. The first plot in Fig. 2.1 shows that the revenue varies one-for-one with fluctuations in the price. The danger is clear: at low prices the producer might not even cover her costs. At high prices, of course, she accordingly gains too.
The next plot (titled “Forward”) shows how the risk is completely eliminated by using a forward contract to lock in the price of 100. Regardless of the output price, the revenues are fixed at the forward price. This makes sense from a pure risk management perspective, but can be very unattractive if the producer has a view that prices are more likely to rise than to fall. In this case, all the upside has been given up to eliminate the downside risk.
The plot titled “Put” shows the power of option contracts. By buying a put option
with a strike price of 100, the producer has the option to sell at 100 while retaining
the ability to sell at the market price if that is higher. The producer keeps most
of the upside from the expected rise in output prices while largely eliminating the
downside risk from falling prices. The problem is that the put option costs money,
with a premium of about 8% in this case (given the assumed Black–Scholes option
pricing model). In the plot, the option premium is the gap between the “Revenue”
and “Net Revenue” lines. The Net Revenue is less than that obtained by the forward
contract unless the output price rises to 108.
The plot titled “Collar” shows a slightly more complex strategy that makes sense
if the producer believes that the price is unlikely to rise above 115. Based on this
view, the producer sells a call option with a strike of 115 in addition to buying a
put option with a strike of 100. The sold call option obliges the producer to sell the
output at 115 even if the market price is higher, while the bought put entitles her to
sell at 100 even if the market price is lower. The sold call option earns a part of the

Fig. 2.1 Designing a 3-way collar step by step

premium expended on the bought put option, and the net total premium to be paid
is only around 5% (the gap between the “Revenue” and “Net Revenue” lines in the
plot is smaller than in the previous case).
Finally, the last plot titled “3-way Collar” is a strategy which might be adopted by
a producer who has a view similar to that in the “Collar” case, and also believes that a
significant drop in the output price is unlikely. Specifically, she believes that there is a
very low chance that the price will drop below 90. In this case, she might supplement
the “Collar” with the sale of a put option at 90. The 3-way collar therefore consists
of a put bought at 100 (K1 ), a call sold at 115 (K2 ), and a put sold at 90 (K3 ). This
strategy costs much less money (about 1.3%) to set up.
Given the plot for the 3-way collar in Fig. 2.1, it is clear that a 3-way collar has both a floor and a cap built in. Mathematically, the revenue may be written as:

Revenue of 3-way collar = min(max(S, K1 × I[S > K3]), K2)

where I[S > K3] denotes an indicator function which takes the value 1 when S > K3 and 0 otherwise. From a design point of view, ignoring time to maturity, the 3-way collar has three strikes (K1, K2 and K3) to tweak, each of which can be moved up or down to match the views of the producer, the available budget for option premiums and the willingness to take risks.
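A standalone sketch of this revenue formula is shown below. It is our own illustration, not the chapter's Black_Scholes module; the function name is hypothetical, and the strike values from the text are used as defaults.

import numpy as np

def three_way_collar_revenue(S, K1=100.0, K2=115.0, K3=90.0):
    """Revenue per unit of output at an output price S, per the formula above."""
    S = np.asarray(S, dtype=float)
    floor = np.where(S > K3, K1, 0.0)            # the floor K1 applies only above K3
    return np.minimum(np.maximum(S, floor), K2)  # capped at K2

print(three_way_collar_revenue([80, 95, 110, 130]))   # [ 80. 100. 110. 115.]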

2.5.2 Python Implementation

The interactive plot whose screenshot is shown in Fig. 2.2 builds on top of the
Black_Scholes python module for pricing described later. Having developed the
main module, however, the design of the 3-way collar itself only requires about a
half dozen lines of python code, with the combination of call and put options defining
the product (three_way) built up piece by piece in four lines towards the end of the
code in Appendix.
• The line combo.exposure() denotes the producer’s exposure to the commodity
price as an output price. Had we been considering a consumer of the commodity
worried about the commodity price as an input price, we would have a minus sign
in front −combo.exposure().
• combo.put(K[0]) denotes the put option that was bought with a strike of K[0] =
100.
• −combo.put(K[2]) denotes the sold put option with a strike of K[2] = 90.
• −combo.call(K[1]) is the sold call option with a strike of K[1] = 115.
• .set_name(‘3-way Collar’) gives the combo an easy to recognize name. Other-
wise, the software would give it a name reflecting its component pieces
(Exposure+Put@100−Call@115−Put@90).

Fig. 2.2 A 3-way collar with widgets



The interactive plot is generated by calling the appropriate method of the three_
way object:
• three_way.interactive_plot([combo.payoff, combo.profit])
• The two lines that are plotted are what option traders call the payoff and the
profit. For ease of understanding, they are relabelled as Revenue and Net Revenue:
combo.name_mapping = dict(payoff = ‘Revenue’, profit = ‘Net Revenue’).
Had we been analyzing the 3-way collar from the perspective of an options dealer,
we might have chosen to plot the option value or one of its partial derivatives (Delta,
Gamma or Vega) instead of the payoff and the profit. We now turn to the details of
the Black_Scholes python module that does all the heavy lifting.

2.5.3 Black_Scholes: An Object-Oriented Python Module for Designing and Pricing

As would be clear from the few lines of code in the listing in Appendix, the
Black_Scholes python module is entirely object oriented and is built on a series of
classes derived from the basic GBS class that implements the famous Black–Scholes
formulas [2]. The main derived classes include (the full source code is available on
our GitHub page at https://github.com/Computational-Finance/Black_Scholes):
• GBSx
• option_portfolio
• combos
The Black Scholes theory was important enough to earn a Nobel prize in
economics, but its main practical utility is that it provides analytical formulas for
all quantities of interest—the option prices as well as all its partial derivatives, which
are often referred to as Greeks. An acronym for ‘Generalized Black Scholes’, the
GBS class is an implementation of the Black Scholes formulas for option values,
implied volatility and the important Greeks (the term Generalized indicating that
underlying with dividend-like features—dividend-paying stocks, currencies, futures,
commodities—are also supported).
The constructor of the GBS class takes as parameters all the inputs to the Black
Scholes formulas: price of the underlying, strike price, volatility, time to maturity,
interest rate and the dividend yield. All of these can be NumPy arrays and so an array
of options can be analysed simultaneously. In this case, the class methods return
NumPy arrays. All the functions needed for the Black–Scholes formulas, including the cumulative normal distribution function, are readily available in NumPy and SciPy, and so most of this is a faithful transcription of the formulas into Python [19].
The only tricky part is that some calculations can lead to expressions of the form ‘0/0’, which would normally lead NumPy to return a nan (not a number). Wherever possible, a careful analysis of the limiting behaviour is used to replace this with zero or infinity.
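For readers who want a feel for what the class computes, the following is a minimal transcription of the generalized Black–Scholes value formula with a continuous dividend yield. This is our own illustrative sketch with an assumed function name and signature, not the module's actual source (which is available at the GitHub link above), and it omits the module's careful 0/0 handling.

import numpy as np
from scipy.stats import norm

def gbs_value(cp, S, K, sigma, t, r, q=0.0):
    """cp = +1 for a call, -1 for a put; the inputs may be NumPy arrays."""
    S, K, sigma, t = map(np.asarray, (S, K, sigma, t))
    d1 = (np.log(S / K) + (r - q + 0.5 * sigma**2) * t) / (sigma * np.sqrt(t))
    d2 = d1 - sigma * np.sqrt(t)
    return cp * (S * np.exp(-q * t) * norm.cdf(cp * d1)
                 - K * np.exp(-r * t) * norm.cdf(cp * d2))

print(gbs_value(+1, 100, 100, 0.2, 1.0, 0.05))   # about 10.45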
The GBSx class is a convenience class derived from the GBS class that includes
forward contracts, the underlying and zero-coupon bonds as ‘first class’ objects
without resorting to artificial constructions. It is motivated by the fact that call and
put options can be used to replicate other common instruments. For example, a call
option with a strike of zero is the same as the underlying asset. A long position in a
call option combined with a short position in a put option with the same strike is a
forward contract at that price.
The option_portfolio class represents option portfolios containing long and short
positions in several different options. This is a relatively easy extension because
GBS already allows for an array of options. The only new thing in option_portfolio
is an additional array weight which can be positive (purchased options) or negative
(sold options). The methods of this class return the dot product of the weight with
the array returned by the corresponding method of the GBS class. For example, the
value method of this class calls the value method of the underlying GBS class to get
the values of all the options in the portfolio. It then computes the dot product of this
value array with the weight array to return the value of the entire portfolio as a scalar
quantity. Similarly, the Delta method returns the portfolio delta.
Finally, the combo class is a wrapper around the option_portfolio class focusing on
simplicity and ease of use (‘combo’ is a common practitioner term for a combination
or portfolio of options). Instead of using NumPy arrays, this class uses operator
overloading to allow option combos to be built piece by piece as in the earlier example
of a 3-way collar. Further simplification is possible because combos typically have
the same underlying and maturity, and these common parameters can be represented
by static variables of the combo class. From the user point of view, combos can often
be constructed by specifying only one parameter—the strike. This is facilitated by
a number of static methods of the combo class that construct and return a combo
without the user having to worry about all the arrays and Pandas DataFrames that
the constructor uses internally.
The addition and subtraction operators and the unary minus operator are all overloaded to make it easy to combine multiple options by simple addition or subtraction. The multiplication operator is also overloaded in the special case where the left operand is a number: 5 * call(110) means 5 call options with a strike of 110. All this is largely syntactic sugar: for example, all that 5 * call(110) does is to set the appropriate element of the weight array of the underlying option_portfolio to 5. Most of the addition and subtraction is implemented by merging the two Pandas DataFrames to avoid reinventing the wheel.

2.5.4 Adding Matplotlib Widgets at Run Time

The other important methods in the combos class plot the values or Greeks (partial derivatives) of the combo for various values of the underlying price. There is also an interactive_plot function that allows the combo to be redesigned by moving the strikes around using sliders. This is implemented using Matplotlib widgets.
The number of widgets (sliders) depends on the number of options (strikes) in
the combo. So the widgets have to be set up programmatically at run time. When
each slider is changed, the callback functions must also be created dynamically at
run time. Python’s lexical scoping proves handy here. The function that creates and
returns the slider callback function is defined inside the interactive_plot method; by
lexical scoping, the generated callback function has access to the combo and all other
variables pertaining to the graphic call. Care has to be taken while defining the range
of each of the sliders. For example, in the 3-way collar, the strike of the sold put
must be less than that of the bought put which must be less than that of the sold call.
The lower and upper limits of the lowest and highest strike respectively are set at
reasonable distances from the spot price.
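A minimal sketch of this closure-based pattern is shown below. The combo object, its strikes attribute and its plot_payoff helper are hypothetical stand-ins for the chapter's classes; Slider and on_changed are the actual Matplotlib widget APIs.

import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

def interactive_plot(combo, spot):
    fig, ax = plt.subplots()
    fig.subplots_adjust(bottom=0.1 + 0.05 * len(combo.strikes))

    def make_callback(i):
        # The returned closure captures i, combo, ax and fig by lexical scoping.
        def on_change(val):
            combo.strikes[i] = val
            ax.clear()
            combo.plot_payoff(ax)        # hypothetical redraw helper
            fig.canvas.draw_idle()
        return on_change

    sliders = []                         # keep references so widgets stay alive
    for i, k in enumerate(combo.strikes):
        s_ax = fig.add_axes([0.15, 0.02 + 0.05 * i, 0.7, 0.03])
        s = Slider(s_ax, 'K%d' % i, 0.5 * spot, 1.5 * spot, valinit=k)
        s.on_changed(make_callback(i))
        sliders.append(s)
    plt.show()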
When the sliders are moved, the strikes of the underlying combo are changed and
the relevant characteristics of the changed combo are then plotted. This would mean
that if the user plays around with the interactive plot for some time and then quits the
plot, the strikes would be equal to the last position of the sliders. This may or may
not be what the user wanted. To avoid this, the interactive_plot method first clones the combo and then runs the plot on the copy. At exit, the method's return value is the altered combo, and the original combo is left unchanged. If the user wishes to change the original combo in place, the corresponding parameter can be set to True.

2.5.5 Extensions: An Example with a Barrier Included

We now discuss a considerably more complex example investment product, which popularly goes by the name of a Phoenix Memory Autocallable Note. An example of such a product in the public domain is available at the website of the US SEC (see https://www.sec.gov/Archives/edgar/data/1114446/000139340115000435/c416152_6901135-424b2.htm).
The particular example considered here is a five-year bond which promised an
annual interest (called coupon) of 10.5% at a time when the prevailing interest rate
was less than 1%. The catch was that the coupon would not be paid at all if at the date
of coupon payment, the Euro Stoxx 50 stock market index level was below a barrier
(set at 60% of the index value at date of issue of the bond). Also, if at maturity, the
Euro Stoxx 50 level was less than the initial level, and the barrier had been hit at any
of the coupon dates during the life of the bond, then the principal would not be repaid
in full: the fraction of the principal to be repaid would be the ratio of the index value
at maturity to the initial index value. Specifically, the product returned the full 100%
of the principal if the index level at maturity was above the initial value (regardless
of whether barrier was hit or not); the product returned 100% also when the barrier
was never hit regardless of the terminal value, but if the barrier was hit in year 2,
and the terminal index level was 80% of the initial value, only 80% of the principal
would be repaid. There were two additional wrinkles in the structure. First was the
“memory” feature: if the coupon was not paid in some years (because the index level
was below the barrier), but in a later year, the index level was above the barrier, then
all the missed coupons would be paid along with the coupon for that year. Second was the autocallable feature: if at any coupon date the index level was above the initial value, the bond would be called back prematurely by paying the coupon for the year and the full principal.
When the issuing bank is designing such an instrument (often in consultation
with one or more potential investors), it has a number of elements to choose from:
the level of the coupon, the level of the barrier, whether or not to include the two
wrinkles (autocall and memory) and with what specifications. With the outcome uncertain for both parties, neither may be able or willing to commit without visualizing the entire probability distribution of outcomes. For example, when the investor pays 100 for this bond, the bank might receive around 97 (after marketing and distribution costs) and must ensure that the discounted present value of its expected payments amounts to only 97. Otherwise, it makes a loss. At the same
time, the investor would have a view of how the stock market is likely to behave, and
would analyze the expected profit from the product based on that view.
This requires both a sophisticated pricing library like QuantLib as well as a visual-
izing platform. An object-oriented Python based module again perfectly fits the bill.
QuantLib-Python could be used to analyze different pricing scenarios using Monte
Carlo simulation [15], and the power of NumPy, Jupyter and Matplotlib could be
leveraged to visualize the results.
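To make the valuation concrete, the following is a minimal NumPy sketch of such a Monte Carlo valuation; it is not the QuantLib implementation used here, and the geometric Brownian motion dynamics, the volatility and discount-rate values, and annual-only barrier observation are illustrative assumptions.

import numpy as np

def price_phoenix_note(n_paths=100_000, s0=1.0, sigma=0.2, r=0.01,
                       coupon=0.105, barrier=0.6, years=5, seed=0):
    rng = np.random.default_rng(seed)
    # Index levels at the annual coupon dates under GBM (assumed dynamics).
    z = rng.standard_normal((n_paths, years))
    s = s0 * np.exp(np.cumsum((r - 0.5 * sigma ** 2) + sigma * z, axis=1))

    pv = np.zeros(n_paths)                  # discounted cash flows per path
    missed = np.zeros(n_paths)              # memory: coupons not yet paid
    alive = np.ones(n_paths, dtype=bool)    # not yet autocalled
    barrier_hit = np.zeros(n_paths, dtype=bool)

    for t in range(years):
        df = np.exp(-r * (t + 1))
        st = s[:, t]
        barrier_hit |= alive & (st < barrier * s0)
        above = alive & (st >= barrier * s0)
        pv += np.where(above, (coupon + missed) * df, 0.0)   # memory feature
        missed = np.where(above, 0.0, missed + coupon * alive)
        called = alive & (st >= s0)                          # autocall feature
        pv += np.where(called, df, 0.0)                      # early principal
        alive &= ~called

    st = s[:, -1]
    frac = np.where((st < s0) & barrier_hit, st / s0, 1.0)   # principal rule
    pv += np.where(alive, frac * np.exp(-r * years), 0.0)
    return pv.mean()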
As an illustration, the two plots in Figs. 2.3 and 2.4 show the visualization of the
actual instrument described above alongside that of an alternative design which yields
roughly the same value but a different probability distribution and risk profile. In the
alternative design (Fig. 2.4), the memory feature has been switched OFF (making it
worse for the investor), but this has been offset by reducing the barrier level to 50%
thereby reducing the risk of principal loss.

2.6 Conclusion

The market for financial structured products runs in hundreds of billions of dollars
worldwide, with the variety ranging from payoffs from plain vanilla call and put
options to complex autocallable products with barrier and memory features. Given
the number of design elements involved which can be tweaked in creating such
products, there is often no way to determine their behaviour other than visualizing
how sensitive their payoff and price would be to changes in the contractual features,
market variables and statistical proxies for risk and dependence.
In this note we have highlighted the power of the Python stack for designing
graphical user interfaces for engineering structured product solutions by visualizing
their payoffs and prices in a web browser. Object-oriented programming in Python
combined with the power of NumPy, Matplotlib and Jupyter fits the bill perfectly
for design and visualization in financial engineering.

Fig. 2.3 Distribution of payoff from a Phoenix Memory Autocallable Note with the memory feature switched ON

Fig. 2.4 Distribution of payoff from a Phoenix Memory Autocallable Note with the memory feature switched OFF

Given the compatibility of all modern browsers in rendering figures, a browser-based interface works equally well for prototypes and professional applications alike. The beauty of using
Python is that one can easily integrate the front end with other computational finance
libraries. Some examples of future scope include integration with QuantLib-Python
[15], adding suites for risk management analytics and algorithmic trading. Being
open source and free, such Python-based applications are also ideal for universities
and schools with limited budgets to teach courses in option pricing and financial
engineering.

Appendix: Design of 3-Way Collar

from Black_Scholes import combo

K = 100, 115, 90
combo.ttm0 = 1
combo.name_mapping = dict(payoff='Revenue', profit='Net Revenue')
three_way = (combo.exposure()
             + combo.put(K[0])
             - combo.put(K[2])
             - combo.call(K[1])).set_name('3-way Collar')
three_way.interactive_plot([combo.payoff, combo.profit])

References

1. Higham, D.J.: Black-Scholes for scientific computing students. Comput. Sci. Eng. 6(6), 72–79
(2004)
2. Black, F., Scholes, M.S.: The pricing of options and corporate liabilities. J. Polit. Econ. 81(3),
637–654 (1973)
3. Barreto, H.: Why Excel? J. Econ. Educ. 46(3), 300–309 (2015)
4. Barreto, H., Widdows, K.: Introductory economics labs. J. Econ. Educ. 43(1), 109 (2012)
5. Briand, G., Hill, R.C.: Teaching basic econometric concepts using Monte Carlo simulations in Excel. Int. Rev. Econ. Educ. (2013)
6. Engelhardt, L.M.: Simulating price-taking. J. Econ. Educ. 46(4), 107–113 (2015)
7. Zhang, C.: Incorporating powerful excel tools into finance teaching. J. Finan. Educ. 40(3 & 4),
87–113 (2015)
8. Varma, J.R., Virmani, V.: Web applications for teaching portfolio analysis and option pricing.
Adv. Finan. Educ. (2021)
9. Powell, S.G., Baker, K.R., Lawson, B.: Impact of errors in operational spreadsheets. Decis.
Support Syst. 47, 126–132 (2009)
10. JPMorgan Chase & Co.: Report of JPMorgan Chase & Co. Management Task Force Regarding 2012 CIO Losses (2013)
11. Mandanici, A., Alessandro Sarà, S., Fiumara, G., Mandaglio, G.: Studying physics, getting to
know Python: RC circuit, simple experiments, coding, and data analysis with Raspberry Pi.
Comput. Sci. Eng. 23(1), 93–96 (2021)
12. Bauer, M., Lee, W., Papadakis, W., Zalewski, M., Garland, M.: Supercomputing in Python with Legate. Comput. Sci. Eng. 23(4), 73–79 (2021)
13. Mandanici, A., Mandaglio, G., Pirrotta, G., Nibali, V.C., Fiumara, G.: Simple physics with Python: a workbook on introductory physics with open-source software. Comput. Sci. Eng. 24(2), 1–5 (2022)
14. Zhang, L., Wu, T., Lahrichi, S., Salas-Flores, C.-G., Li, J.: A data science pipeline for algo-
rithmic trading: a comparative study of applications for finance and cryptoeconomics. In: 2022
IEEE International Conference on Blockchain (Blockchain), Espoo, Finland, pp. 298–303
(2022)
15. Varma, J.R., Virmani, V.: Computational finance using QuantLib-Python. Comput. Sci. Eng.
18, 78–88 (2016)
16. Oliphant, T.E.: Python for scientific computing. Comput. Sci. Eng. 9(3), 10–20 (2007)
17. Perez, F., Granger, B.E.: IPython: a system for interactive scientific computing. Comput. Sci.
Eng. 9(3), 21–29 (2007)
18. Hunter, J.D.: Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9(3), 90–95 (2007)
19. van der Walt, S., Colbert, S.C., Varoquaux, G.: The NumPy array: a structure for efficient
numerical computation. Comput. Sci. Eng. 13(2), 22–30 (2011)
Chapter 3
Neurodynamic Approaches to Cardinality-Constrained Portfolio Optimization

Man-Fai Leung and Jun Wang

Abstract The field of portfolio optimization holds significant interest for both
academic researchers and financial practitioners. Markowitz’s seminal mean–vari-
ance analysis laid the groundwork for optimizing portfolios by balancing returns and
risks, marking a pivotal advancement in investment strategy formulation. However,
despite its foundational role, mean–variance theory is not without its limitations,
notably its reliance on assumptions that do not always hold in real-world scenarios
and its use of variance as a risk measure, which may not fully capture the complexities
of risk behaviors. The pursuit of alternative risk measures introduces mathematical
and computational challenges due to nonconvexity and discontinuities. Concurrently,
the field of neural networks has seen vigorous activity, particularly with the advance-
ments in deep learning, offering novel approaches to a variety of optimization prob-
lems. Within this stream, neurodynamic optimization emerges as a method that lever-
ages the parallel and distributed computing capabilities of neural networks, proving to
be effective for tackling global optimization, multi-period, and multi-objective prob-
lems, and is now expanding into bi-level and combinatorial optimization domains.
Given these developments, applying neurodynamic optimization to portfolio opti-
mization is a promising avenue, especially considering the unique challenges posed
by the financial domain in terms of complexity and scale. This chapter delves into
the application of neurodynamic optimization to portfolio optimization, specifically
focusing on cardinality-constrained problems. Through experimental analysis across
several global stock market datasets, neurodynamic systems have demonstrated their
efficacy in achieving superior performance based on key metrics.

M.-F. Leung (B)
School of Computing and Information Science, Faculty of Science and Engineering, Anglia Ruskin University, Cambridge, UK
e-mail: [email protected]

J. Wang
Department of Computer Science and School of Data Science, City University of Hong Kong, Hong Kong, China
e-mail: [email protected]


Keywords Neurodynamic optimization · Minimax optimization · Mean–variance theory · Asset allocation · Cardinality constraints

3.1 Introduction

H. M. Markowitz's work has been identified as a cornerstone of modern finance, as evidenced by the involvement of at least 5 Nobel Laureates [1–5]. From both
academic and economic perspectives, portfolio selection or optimization has become
an area of great interest [6–11]. Optimizing a portfolio is a strategic approach aimed
at enhancing expected returns and reducing the potential for losses stemming from
poor performance. Markowitz’s pioneering work on mean–variance (MV) optimiza-
tion [1] has become a standard in asset allocation [6]. This approach introduced three
pivotal concepts: risk minimization through diversification, the statistical quantifi-
cation of a portfolio’s risk and return, and the simultaneous evaluation of risk and
return to derive an optimal trade-off.
In the classic work of portfolio selection, two strategies are usually employed to
handle the two objectives: treating one objective as a constraint or amalgamating
the two into a singular metric, such as the Sharpe ratio [3]. Alternatively, one can
optimize both objectives explicitly by scalarizing them [12] or by maximizing utility
functions [13–15], to delineate a set of Pareto-optimal choices for stakeholders.
However, scalarization requires a set of predefined weights and the resulting solutions
may depend on the weights [16]. Maximizing utility functions needs prior preference
information from investors [13].
The literature offers a number of solution methods for portfolio optimiza-
tion, encompassing exact, approximate, and heuristic approaches [6, 12]. Exact
methods, exemplified by interior-point methods [17], ensure optimal solutions
within a finite runtime. Nonetheless, their effectiveness diminishes when faced with
significantly large problem sizes. Approximate methods, such as relaxation techniques [18] and meta-heuristic strategies [19, 20], offer reasonably good solutions within practical time limits, although they lack optimality and consistency assurances.
Neural networks, drawing inspiration from the human brain’s architecture, are
devised as computational constructs capable of intricate mappings between input
and output variables through learning algorithms like back-propagation. They can
be trained using various learning algorithms to optimize their weights and produce
desired outputs. These models facilitate the parallel processing of optimization chal-
lenges [21], aiming to identify optimal decision variable sets that align with prede-
fined objectives and constraints. The key idea is to formulate an optimization problem
and use neural networks to find the equilibrium points or trajectories that correspond
to the optimal solutions.
Neurodynamic approaches have been employed to tackle numerous optimization
problems since their initial utilization in the pioneering work of Hopfield and Tank
[22]. Subsequent research has led to the development of neurodynamic approaches
for solving various optimization problems. For example, neurodynamic models have
been devised to address nonlinear optimization problems with nonlinear inequality
constraints [23], in addition to tackling constrained convex problems through the use
of projection operators [24]. By leveraging differential inclusion theory, a single-
layer neurodynamic approach has been proposed to tackle non-smooth optimization
problems, demonstrating convergence to solutions within a finite time [25]. To over-
come the challenge of multiple local optima, collaborative neurodynamic approaches
have been introduced [26, 27]. These approaches utilize multiple neural networks
to conduct precise local searches and incorporate metaheuristics like particle swarm
optimization for information exchange [28]. In the case of optimization problems
with multiple objectives, a scalarization technique is often employed to convert the
problem into a set of subproblems. Multiple neural networks are then employed to
solve each subproblem, generating a Pareto front of solutions [29, 30]. Neurodynamic
approaches have found applications in various fields, including engineering, due to
their superior performance [31–33], and have been proven to globally converge with
guaranteed optimality. Additionally, neurodynamic strategies that utilize recurrent
neural networks (RNNs) are particularly effective for real-time optimal asset allo-
cation when executed on specialized hardware, including various GPUs and CPUs.
The parallelism inherent in neurodynamic approaches is a significant advantage. By
leveraging parallel computing environments, neurodynamic models can efficiently
explore large solution spaces and accelerate the optimization process, leading to
faster and more effective solutions.
In particular, an approach involving collaborative neurodynamic optimization
(CNO) has been introduced for selecting portfolios [34]. This approach, grounded
in a minimax and bi-objective framework, leverages neural networks to map out the
Pareto front effectively. Following this, the approach tackles a decentralized robust
portfolio optimization challenge within the mean–variance (MV) framework using
a neurodynamic model [35]. Furthermore, this technique is applied to address port-
folio selection problems with specific constraints, reconceptualizing them as mixed-
integer optimization issues [36]. Through the utilization of RNNs, this strategy seeks
Pareto-optimal solutions by fine-tuning a weighted objective function, alongside
employing a meta-heuristic algorithm to adjust the weights, proving its efficacy in
yielding favorable Pareto-optimal outcomes. In another instance [37], the method is
applied to portfolio selection focusing on precise performance objectives, success-
fully addressing and solving five distinct optimization challenges, thereby show-
casing exceptional performance backed by thorough experimentation. This chapter
presents a two-timescale duplex neurodynamic approach to asset allocation. Here, the
challenge of MV asset allocation is redefined as a biconvex optimization problem,
incorporating the aspect of conditional value at risk. It utilizes two RNNs func-
tioning on separate timescales to identify the optimal solutions. Simultaneously, a
meta-heuristic approach is adopted to update the neural states, thereby avoiding the
pitfalls of potential local minima.
The remainder of this chapter is organized as follows. Section 3.2 outlines the reformulation
of the portfolio optimization problem, considering both scenarios with and without
cardinality limitations. Section 3.3 highlights some existing neurodynamic models.
Section 3.4 presents two neurodynamic models for the portfolio optimization problem with cardinality constraints. Section 3.5 presents the experimental results. Finally, Section 3.6 concludes the chapter.

3.2 Preliminaries

3.2.1 Biconvex Optimization

The subsequent definitions elucidate the principles of biconvex optimization.

Definition 1 [38] A set Z ⊂ X × Y is biconvex if Z_x is convex for each x ∈ X and Z_y is convex for each y ∈ Y, where X ⊆ R^m and Y ⊆ R^n are two nonempty convex sets, and Z_x = {y ∈ Y | (x, y) ∈ Z} and Z_y = {x ∈ X | (x, y) ∈ Z} are sections of Z.

Definition 2 [38] A function f(x, y) : Z → R qualifies as biconvex on Z ⊆ X × Y if, for every fixed x ∈ X, the function f(x, ·) : Z_x → R is convex over Z_x and, similarly, for each fixed y ∈ Y, the function f(·, y) : Z_y → R is convex over Z_y.

Definition 3 [38] The biconvex optimization problem is structured as follows:

min_{x∈X, y∈Y}  f(x, y)                      (3.1)

where f(x, y) is biconvex with respect to both x and y over the domain X × Y.

3.2.2 Mean–Variance Portfolio Selection

The MV framework suggests that investors should evaluate the risk and expected
return of every asset, then allocate their funds across these assets to find the ideal
equilibrium between risk and return. This allocation of funds is represented by the
portfolio proportions y ∈ Y = [0, 1]^n, with n representing the total number of assets. For simplicity, no short-selling will be allowed. The expected portfolio return and its variance are denoted by μ^T y and y^T V y, respectively, where μ ∈ R^n is the vector of mean returns and V is the covariance matrix. Markowitz's MV portfolio selection model
can be articulated through two distinct optimization problems:

min_y  y^T V y
s.t.   μ^T y ≥ μ_min,
       e^T y = 1,
       y ≥ 0,                      (3.2)

or

max_y  μ^T y
s.t.   y^T V y ≤ σ_max,
       e^T y = 1,
       y ≥ 0,                      (3.3)

where μ_min is the minimum required return in (3.2), σ_max is the cap on portfolio variance in (3.3), the vector e consists entirely of ones, and e^T y = 1 acts as the budget
restriction. However, these formulations are prone to inaccuracies due to estimation
errors. To circumvent this, a more resilient strategy within the MV framework, like
minimax portfolio selection, is suggested [39, 40]. The strategy outlined in [34, 41]
aims to optimize the portfolio against the lowest expected returns, specified as:

min_x max_y  (1 − β) x^T y − β y^T V y
s.t.   e^T y = 1,
       y ≥ 0,                      (3.4)

where β ∈ [0, 1] is a parameter indicating risk aversion, x ∈ X = [x̲, x̄]^n denotes the anticipated return rates for the n assets, and x̲ and x̄ are historical low and high returns [41], respectively. The goal of this problem is to optimize the portfolio against the worst-case expected returns, leading to a strategy that performs reliably in volatile markets over the short term. However, the chosen solution may be too conservative and hence underperform in the long term in efficient markets. The value of β plays an important role in this approach, as a lower value of β leads to higher risk while a higher value of β results in a more conservative portfolio.

3.2.3 Conditional Value-at-Risk

Variance is not always a suitable measure of market volatility, and value-at-risk (VaR)
is an alternative. Let ξ ∈ R^n be random returns. VaR is defined as the lowest possible ρ ∈ R such that the probability of −ξ^T y ≤ ρ is greater than or equal to a given threshold 0 < θ < 1 [42]. That is,

VaR_θ(y) = min{ρ ∈ R : P(−ξ^T y ≤ ρ) ≥ θ}.                      (3.5)
It is important to acknowledge that VaR_θ(y) characterizes a nonconvex relationship with respect to y [43]. To further capture the volatility of the market, conditional value-at-risk (CVaR) is introduced as the expected loss conditional on the loss exceeding VaR [42], and is formulated as follows:

CVaR_θ(y) = E{−ξ^T y | −ξ^T y ≥ VaR_θ(y)},                      (3.6)

where E(·) denotes the expectation.


There are two principal methodologies for computing CVaR in (3.6): the para-
metric and sampling methods [44]. The parametric method is employed when the
distribution of asset returns is established, whereas the sampling method depends on
historical data of returns. According to [42], the CVaR can be approximated by:

CVaR_θ(y) ≈ ρ + (1/(N(1 − θ))) Σ_{j=1}^{N} max(0, −ξ_j^T y − ρ).                      (3.7)
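As a concrete illustration, the sampling estimate in (3.7) can be computed in a few lines of NumPy. This is a sketch under the assumption that ρ is set to the empirical θ-quantile of losses (the value that minimizes the right-hand side); the function name and interface are ours.

import numpy as np

def cvar_sample(xi, y, theta=0.95):
    # xi: (N, n) array of historical return scenarios; y: portfolio weights.
    losses = -xi @ y                       # loss -xi_j^T y in each scenario
    rho = np.quantile(losses, theta)       # empirical VaR_theta
    n_scen = len(losses)
    return rho + np.maximum(losses - rho, 0.0).sum() / (n_scen * (1 - theta))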

Leveraging this approximation, a bicriteria portfolio optimization problem aiming to minimize mean-CVaR is formulated as:

min_y  −μ^T y
min_{σ,ρ}  ρ + (1/(N(1 − θ))) Σ_{j=1}^{N} σ_j
s.t.   σ_j ≥ −ξ_j^T y − ρ,  σ_j ≥ 0,  j = 1, 2, . . . , N,
       e^T y = 1,
       y ≥ 0,                      (3.8)

where σ_j = max(0, −ξ_j^T y − ρ) for all j.

3.2.4 Sharpe Ratio and Conditional Sharpe Ratio

Introduced by Nobel laureate Sharpe [45], the Sharpe Ratio (SR) serves as a widely
recognized metric for assessing the risk-adjusted performance of investment portfo-
lios. This metric calculates the average return that exceeds the risk-free rate for each
unit of volatility, using standard deviation as a measure of risk [46]. Additionally,
the SR has been employed as a target metric for optimizing portfolio allocations [47,
48] as follows:
max_y  (μ^T y − r_f) / √(y^T V y)
s.t.   e^T y = 1,
       y ≥ 0,                      (3.9)

where r_f is the risk-free rate of return.

The Conditional Sharpe Ratio (CSR) is a modification of the SR (3.9), wherein variance is replaced with the CVaR. It is used for portfolio optimization and is defined by the following equation [49]:

max_y  (μ^T y − r_f) / CVaR_θ(y)
s.t.   e^T y = 1,
       y ≥ 0.                      (3.10)
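For reference, both ratios can be estimated from a realized portfolio return series as sketched below; the chapter does not prescribe a particular ex-post estimator, so this implementation is an assumption.

import numpy as np

def sharpe_ratios(portfolio_returns, rf=0.0, theta=0.95):
    excess = portfolio_returns.mean() - rf
    sr = excess / portfolio_returns.std(ddof=1)        # Sharpe ratio, as in (3.9)
    losses = -portfolio_returns
    rho = np.quantile(losses, theta)
    cvar = rho + np.maximum(losses - rho, 0.0).mean() / (1 - theta)
    csr = excess / cvar                                # conditional SR, as in (3.10)
    return sr, csr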

3.2.5 Cardinality-Constrained Portfolio Selection

Within the MV framework, portfolios are typically chosen from an unrestricted pool
of assets in an ideal market setting. However, real-world market imperfections often
limit investors to selecting only a subset of available assets, necessitating the inclu-
sion of cardinality constraints into the portfolio optimization equation. This, in turn,
increases the complexity of the problem significantly [50]. Thus, the optimization
scenario such as (3.2) incorporates these constraints as follows:

min_y  y^T V y
s.t.   μ^T y ≥ μ_min,
       e^T y = 1,
       ||y||_0 ≤ k,
       y ≥ 0,                      (3.11)

where ||y||_0 ≤ k is the cardinality constraint, leading to the formulation of global or mixed-integer optimization problems [51–54].
Similar to [50], the bi-objective optimization problem (3.8) under cardinality
constraints, building upon the aforementioned framework, is articulated as:

min_{y,z,σ,ρ}  −μ^T y
min_{y,z,σ,ρ}  ρ + (1/(N(1 − θ))) Σ_{j=1}^{N} σ_j
s.t.   σ_j ≥ −ξ_j^T y − ρ,  σ_j ≥ 0,  j = 1, 2, . . . , N,
       e^T y = 1,
       e^T z ≤ k,
       0 ≤ y ≤ z,
       z ∈ {0, 1}^n,                      (3.12)

where z ∈ {0, 1}^n is a binary vector and e^T z ≤ k is the cardinality constraint, which limits the number of assets selected to at most k, with k an integer. This formulation results in a bi-objective mixed-integer programming problem due to the binary nature of z, known for its computational intractability or NP-hardness [52].
To mitigate these challenges, the problem is reformulated as a constrained
global optimization issue, incorporating additional equality constraints signified by
z ◦ (z − e) = 0, with ◦ denoting the Hadamard product.
Let the functions be defined as:

f_1(y) = −μ^T y,
f_2(σ, ρ) = ρ + (1/(N(1 − θ))) Σ_{j=1}^{N} σ_j,
g(y, z, σ, ρ) = (−ξ_j^T y − ρ − σ_j, y − z, e^T z − k)^T,
h(y, z) = (z ◦ (z − e), e^T y − 1)^T.

The problem is then succinctly reformulated as:

min  (f_1(y), f_2(σ, ρ))^T
s.t.   g(y, z, σ, ρ) ≤ 0,
       h(y, z) = 0,
       0 ≤ y, z ≤ e,
       σ ≥ 0.                      (3.13)

Letting f_λ = max{λ(f_1(y) + μ_max), (1 − λ)f_2(σ, ρ)}, a constrained global optimization problem is formulated in the following epigraph form:

min_{f_λ,y,z,σ,ρ}  f_λ
s.t.   λf_1(y) + λμ_max − f_λ ≤ 0,
       (1 − λ)f_2(σ, ρ) − f_λ ≤ 0,
       g(y, z, σ, ρ) ≤ 0,
       h(y, z) = 0,
       y, z ∈ [0, 1]^n,
       σ ≥ 0.                      (3.14)

Utilizing the conditional Sharpe ratio, a cardinality-constrained optimization formulation is expressed as follows:

min_{γ,ρ,σ,y,z,ζ}  (γ²/2) (ρ + (1/(N(1 − θ))) Σ_{j=1}^{N} σ_j)² − γ(μ^T y − r_f)
s.t.   σ_j ≥ −ξ_j^T y − ρ,  σ_j ≥ 0,  j = 1, 2, . . . , N,
       e^T y = 1,
       e^T z ≤ k,
       0 ≤ y ≤ z,
       z ◦ ζ = 0,
       z + ζ − e = 0,                      (3.15)

where γ is a variable weight, z = (z_1, . . . , z_n)^T ∈ R^n and ζ = (ζ_1, . . . , ζ_n)^T ∈ R^n. The last two constraints force z_i ∈ {0, 1} and ζ_i = 1 − z_i for all i, so the cardinality requirement is encoded without an explicit integrality constraint. For fixed γ and ζ, problem (3.15) is convex; similarly, problem (3.15) remains convex when ρ, σ, y and z are fixed. Based on Definitions 1 and 2, problem (3.15) is classified as biconvex.

3.3 Neurodynamic Models

In the context of solving the following optimization problem:

min_{y∈Y}  ψ(y)
s.t.   g(y) ≤ 0,                      (3.16)

where ψ : R^n → R and g : R^n → R^m, with g_i(y) (i = 1, . . . , m) denoting the i-th inequality constraint. It is presupposed that both ψ(y) and g(y) are twice differentiable. The Lagrangian associated with this problem is formulated as:

L(y) = ψ(y) + α^T g(y),                      (3.17)

with α ∈ R^m denoting the Lagrangian multiplier vector. Following this formulation, a neurodynamic model is proposed for solving (3.16) [55], as depicted:

ε dy/dt = −y + (y)^+ − ∇ψ((y)^+) − ∇g((y)^+)(α)^+,
ε dα/dt = −α + (α)^+ + g((y)^+),                      (3.18)

where ε stands for a positive time constant, ∇ψ(·) signifies the gradient of ψ, and (·)^+ is defined elementwise as (y_i)^+ = max{0, y_i}.

For cases where ψ(y) is nonsmooth, a neurodynamic model guaranteeing global convergence is described as follows [56]:

ε dy/dt ∈ −∇ψ(y) − λ Σ_i ∂ max{0, g_i(y)},                      (3.19)

where λ acts as a penalty parameter, and ∂(·) represents Clarke's generalized gradient [57]. The generalized gradient of the max function can be given as

∂ max{0, g_i(y)} =  ∇g_i(y)          if g_i(y) > 0,
                    [0, 1]∇g_i(y)    if g_i(y) = 0,
                    0                if g_i(y) < 0.

In scenarios where the optimization problem includes h(y) = 0, the constraint can be substituted with h(y) ≤ 0 and −h(y) ≤ 0 as suggested by [27]. A general framework for neurodynamic systems aimed at solving constrained optimization problems is outlined as follows:

ε dy/dt ∈ φ(∇ψ(y), Y),                      (3.20)
where φ(·) is a function dependent on the gradient ∇ψ(y) and the domain Y.
In recent developments, CNO approaches incorporating multiple neurodynamic
models have been proposed to tackle the difficulty of finding global optimal solutions
for nonconvex objective functions (e.g., [26, 27, 58–60]). In these approaches, meta-heuristics such as particle swarm optimization (PSO) [28] are employed to dynamically adjust the initial states of the models. The update rule for PSO is given by Eqs. (3.21) and (3.22):

v_i(j + 1) = c_0 v_i(j) + c_1 r_1 (ỹ_i(j) − y_i(j)) + c_2 r_2 (ŷ − y_i(j)),                      (3.21)

y_i(j + 1) = y_i(j) + v_i(j + 1),                      (3.22)

where y_i(j) = (y_{i1}(j), . . . , y_{in}(j))^T and v_i(j) = (v_{i1}(j), . . . , v_{in}(j))^T represent the position and velocity of the i-th particle at iteration j, respectively, with c_0 as the inertia coefficient, c_1 and c_2 as acceleration factors, and r_1 and r_2 as random numbers within [0, 1]. ỹ_i(j) = (ỹ_{i1}(j), . . . , ỹ_{in}(j))^T is the best previous position of the i-th particle, while ŷ = (ŷ_1, . . . , ŷ_n)^T denotes the swarm's overall optimal position found.
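A direct NumPy transcription of the update (3.21)–(3.22) is sketched below. Drawing r_1 and r_2 per dimension, and the inertia value c_0 = 0.7, are common implementation choices rather than values fixed by the text.

import numpy as np

def pso_step(Y, V, Y_best, y_hat, c0=0.7, c1=1.49, c2=1.49, rng=None):
    # Y, V, Y_best: (M, n) arrays of positions, velocities and personal bests;
    # y_hat: (n,) swarm-best position.
    rng = rng or np.random.default_rng()
    r1 = rng.random(Y.shape)
    r2 = rng.random(Y.shape)
    V = c0 * V + c1 * r1 * (Y_best - Y) + c2 * r2 * (y_hat - Y)   # (3.21)
    return Y + V, V                                               # (3.22)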
To enhance exploration capabilities, wavelet mutation is employed [61, 62], defined by the function:

η = (1/√a) exp(−e/(2a)) cos(5e/a),

where a = exp(10 j/j_max), with j_max as the maximum iteration count, and e is a uniformly distributed random number in the range (−2.5a, 2.5a) [61]. Subsequently,
wavelet mutation is performed according to the following equation:

y_i(j + 1) = y_i(j) + η(1 − y_i(j)) if η > 0, and y_i(j + 1) = y_i(j) + η y_i(j) if η < 0,                      (3.23)

where y_i(j + 1) is the offspring at the (j + 1)-th generation.


The mutation is triggered when the diversity measure δ falls below a predefined threshold τ (see Algorithm 2), where δ is computed as [63]:

δ = (1/M) Σ_{i=1}^{M} ||y_i(j + 1) − ŷ||_2,

where M is the particle count within a group.
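The mutation step itself translates directly into code. The sketch below follows the definitions of η and (3.23) above, with j and j_max as the current and maximum iteration counts; it assumes the components of y lie in [0, 1], as in the models of this chapter.

import numpy as np

def wavelet_mutation(y, j, j_max, rng=None):
    rng = rng or np.random.default_rng()
    a = np.exp(10.0 * j / j_max)                 # dilation grows with iterations
    e = rng.uniform(-2.5 * a, 2.5 * a)
    eta = np.exp(-e / (2 * a)) * np.cos(5 * e / a) / np.sqrt(a)
    if eta > 0:
        return y + eta * (1.0 - y)               # move towards the upper bound 1
    return y + eta * y                           # move towards the lower bound 0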


Utilizing repeated reinitializations through meta-heuristic adjustments, it has been established that CNO strategies reliably converge to the global optima of optimization problems [26, 27, 59].
A two-timescale duplex neurodynamic system based on (3.20) [59] for solving
(3.1) is proposed, which is described by the coupled differential equations:
ε_x dx/dt ∈ F(∇f(x, y), x),
ε_y dy/dt ∈ F(∇f(x, y), y),                      (3.24)

where F(·) is a function of ∇f(x, y) and x or y, ∇f(x, y) denotes the gradient of f(x, y), and ε_x and ε_y are two different time constants used to balance the learning speeds of the two RNNs. If Z = {(x, y) | g_i(x, y) ≤ 0, i = 1, . . . , m}, the dynamics of system (3.24) can be represented as:

ε_x dx/dt ∈ −∇_x f(x, y) − λ Σ_i ∂ max{0, g_i(x, y)},
ε_y dy/dt ∈ −∇_y f(x, y) − λ Σ_i ∂ max{0, g_i(x, y)}.                      (3.25)
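To convey the mechanics, a toy Euler discretization of (3.25) is sketched below for the biconvex function f(x, y) = (xy − 1)², with nonnegativity constraints handled by a penalty term. The toy problem, step sizes and penalty treatment are our assumptions, not the chapter's portfolio model.

def duplex_step(x, y, eps_x, eps_y, lam=10.0, dt=1e-3):
    # Partial gradients of f(x, y) = (x*y - 1)**2, convex in each variable alone.
    gx = 2.0 * (x * y - 1.0) * y
    gy = 2.0 * (x * y - 1.0) * x
    # Penalty subgradients of lam * max{0, -x} and lam * max{0, -y}.
    px = -lam if x < 0 else 0.0
    py = -lam if y < 0 else 0.0
    x = x + (dt / eps_x) * (-gx - px)
    y = y + (dt / eps_y) * (-gy - py)
    return x, y

x, y = 0.2, 3.0
for _ in range(20000):
    x, y = duplex_step(x, y, eps_x=1.0, eps_y=0.01)   # y evolves much faster
print(round(x * y, 4))   # approaches 1, a global minimum of f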
Duplex neurodynamic systems with two distinct operational timescales have
been introduced to address biconvex optimization [59] and mixed-integer optimiza-
tion problems [60]. These systems integrate two RNNs functioning on different
timescales, demonstrating assured convergence towards global optimum solutions
[59, 60]. CNO has shown effectiveness in portfolio optimization endeavors, deliv-
ering encouraging outcomes. Specifically, a bi-objective portfolio selection problem,
as described in Eqs. (3.2) and (3.3), has been effectively resolved using a CNO method
[64]. Furthermore, CNO strategies have been crafted for minimax portfolio optimiza-
tion [34] and for tackling decentralized robust portfolio optimization issues through
neurodynamic models [35].

3.4 Neurodynamic Portfolio Selection

3.4.1 Collaborative Neurodynamic Approach

In the study [58], a collaborative neurodynamic approach is introduced for addressing both global and combinatorial optimization challenges, designed to reliably converge
to the global optima. This strategy entails redefining the global optimization problem
and establishing a network of neurodynamic models. These models undergo reinitial-
izations guided by the principles of PSO. To validate the effectiveness of the approach,
both theoretical analyses and simulation-based evidence are provided. Specifically,
the approach is applied to find a spectrum of evenly distributed Pareto-optimal solu-
tions for a bi-objective portfolio optimization issue with cardinality constraints, as
described in problem (3.14). The architecture of this approach is dual-layered: the
lower tier comprises several RNNs running concurrently to identify Pareto-optimal
solutions, while the upper tier leverages PSO to enhance the hypervolume enclosed
by these Pareto-optimal solutions through the adjustment of weights between the
two objectives.
The following dynamic equation of the neurodynamic model in [58] is tailored and customized for the lower level:

ε dx/dt = −x + P(x − ∇_x g̃(x, σ, η)δ − ∇_x h(x)γ − α∇_x g̃(x, σ, η)D_{2n}(δ)g̃(x, σ, η) − β∇_x h(x)h(x)),
ε dσ/dt = −σ + (σ − ∇_σ g̃(x, σ, η)δ − α∇_σ g̃(x, σ, η)D_N(δ)g̃(x, σ, η))^+,
ε dη/dt = −τ − ∇_η g̃(x, σ, η)δ − α∇_η g̃(x, σ, η)D_2(δ)g̃(x, σ, η),
ε dδ/dt = −δ + [δ + g̃(x, σ, η)]^+,
ε dγ/dt = h(x),                      (3.26)

where ε is a positive time constant, x = (y^T, z^T)^T ∈ R^{2n} is the output state vector, σ ∈ R^N, η = (ρ, f_λ)^T ∈ R^2, δ ∈ R^{N+n+3} and γ ∈ R^{n+1} are hidden state vectors, τ = (0, 1)^T, g̃(x, σ, η) = (g(y, z, σ, ρ)^T, λf_1(y) + λμ_max − f_λ, (1 − λ)f_2(σ, ρ) − f_λ)^T, and α and β are two positive penalty parameters. P(·) is defined as:



P(x_i) = min{max{0, x_i}, 1}, that is, P(x_i) = 1 for x_i > 1, P(x_i) = x_i for 0 ≤ x_i ≤ 1, and P(x_i) = 0 for x_i < 0;

[δ_i]^+ = max{δ_i, 0} is a special case of P(·), also known as ReLU, and D_r(δ) = diag(δ_1, δ_2, . . . , δ_r).
In the study [58], the use of quadratic diagonal components within D(δ²) is highlighted for managing both equality and inequality constraints, while the linear variant D_r(δ) is specifically utilized for inequality constraints in the model (3.26) for the sake of simplification, similar to the approach outlined in [60]. The structured model,
as detailed in (3.26), is composed of five distinct layers. The initial three layers are
designed to steer the state variables towards a feasible solution space, aiming to
satisfy both equality and inequality constraints, and to refine the objective function.
Conversely, the fourth and fifth layers are dedicated to addressing inequality and
equality constraints, respectively. The inherent nonconvex nature of the objective
function and constraints may prevent a single RNN from ensuring a globally optimal
solution. Therefore, multiple RNNs are employed to conduct a scatter search, with the
PSO reintroducing these RNNs to the search process post-local convergence. This
strategy enables the system to bypass local optima in pursuit of globally optimal
solutions, as corroborated by the findings in [27, 30, 58].
Due to the inability to directly correlate the HV diversity indicator with the weights
algebraically, PSO is employed at the higher level to refine the weights. This opti-
mization aims at enhancing the HV by considering a variety of potential weight
configurations until a predetermined termination point is met. Let λ_j ∈ R^M and ν_j ∈ R^M be the position and velocity of the j-th weight (j = 1, 2, . . . , q_w), updated using the standard PSO rule (3.21) as follows: for j = 1, 2, . . . , q_w,

λ_j ← λ_j + c_0 ν_j + c_1 r_1 (λ̃_j − λ_j) + c_2 r_2 (λ* − λ_j),                      (3.27)

where λ̃_j and λ* represent the historical best individual solution and the optimal group solution for the j-th weight, respectively. The collaborative neurodynamic approach, aimed at cardinality-constrained bi-objective portfolio optimization, employs a hierarchical structure detailed in Algorithm 1, where w = (y^T, z^T, σ^T, ρ)^T. Notably, the foundational tier comprises an assembly of neurodynamic models (3.26) that are periodically reset with PSO to discover Pareto-optimal solutions by addressing the scalarized optimization problem depicted in (3.14). Concurrently, the upper tier advances weight adjustments using PSO to enhance HV, promoting a diverse solution set, as elaborated in [30].

Algorithm 1 Collaborative Neurodynamic Optimization for Portfolio Selection Problem (3.14)

Input: [w_1(0), w_2(0), . . . , w_M(0)] ∈ [0, 1]^{2n+N+1}.
Output: A.
1  while HV(A*, ζ) < ω do
2    for j = 1 : q_w do
3      m ← 1;
4      while m < M do
5        i ← 1;
6        while i < q_n do
7          Update initial states w_{mi} by PSO;
8          Compute steady states w̄_{mi} by (3.26);
9          w̃_{mi} ← w̄_{mi} if f_{λ_m}(w̄_{mi}) < f_{λ_m}(w̃_{mi});
10         i ← i + 1;
11       end
12       w*_m ← argmin{f_{λ_m}(w̃_{m1}), . . . , f_{λ_m}(w̃_{mq_n})};
13       if ||w*_m − w̄*_m|| < ε then
14         A_j ← A_j ∪ {(f_1(w*_m), f_2(w*_m))^T};
15         m ← m + 1;
16       else if f_{λ_m}(w*_m) < f_{λ_m}(w̄*_m) then
17         w̄*_m ← w*_m;
18       end
19     end
20     if HV(A_j, ζ) > HV(Ã_j, ζ) then
21       λ̃_j ← λ_j, Ã_j ← A_j;
22     end
23   end
24   A* ← argmax{HV(A_1, ζ), . . . , HV(A_{q_w}, ζ)};
25   Obtain λ* from A*;
26   for j = 1 : q_w do
27     Update λ_j using (3.27);
28   end
29 end

3.4.2 Two-Timescale Duplex Neurodynamic Approach

For addressing the biconvex portfolio optimization outlined in (3.15), a specialized neurodynamic model built upon RNN (3.25) is devised as follows [65]:
ε_1 dγ/dt ∈ −∇_γ f_c(γ, ρ, σ, y) − λ Σ_i ∂ max{0, c_i(ρ, σ, y, z, ζ)},
ε_2 dρ/dt ∈ −∇_ρ f_c(γ, ρ, σ, y) − λ Σ_i ∂ max{0, c_i(ρ, σ, y, z, ζ)},
ε_2 dσ/dt ∈ −∇_σ f_c(γ, ρ, σ, y) − λ Σ_i ∂ max{0, c_i(ρ, σ, y, z, ζ)},
ε_2 dy/dt ∈ −∇_y f_c(γ, ρ, σ, y) − λ Σ_i ∂ max{0, c_i(ρ, σ, y, z, ζ)},
ε_2 dz/dt ∈ −λ Σ_i ∂ max{0, c_i(ρ, σ, y, z, ζ)},
ε_1 dζ/dt ∈ −λ Σ_i ∂ max{0, c_i(ρ, σ, y, z, ζ)}.                      (3.28)

In this model, f_c(γ, ρ, σ, y) represents the objective function detailed in (3.15), while c(ρ, σ, y, z, ζ) refers to a set of vector-valued inequality constraints. This six-
layer neural network model is structured to minimize the objective function in (3.15)
under a variety of constraints, guiding the system states towards an admissible region,
particularly focusing on the fifth and sixth layers to enforce the binary constraint on
z through the fulfillment of bilinear and linear equality constraints z ◦ ζ = 0 and
z + ζ − e = 0. In RNN (3.28), there are 2n + N + 2 neurons.
Given the complexity of globally optimizing a biconvex portfolio problem like
(3.15) with a single RNN, a two-timescale duplex neurodynamic approach employing
dual RNNs operating on differing timescales (i.e., ε_1 > ε_2 for RNN1 and ε_1 < ε_2 for RNN2) is used [60, 66]. Additionally, to facilitate convergence, the PSO [28] is
implemented for the periodic reinitialization of the RNN states (3.28). Algorithm
2 delineates the two-timescale duplex neurodynamic approach for the cardinality-
constrained portfolio selection problem (3.15). As the algorithm uses two RNNs,
each containing 2n + N + 2 neurons, the spatial complexity of the algorithm is
4n + 2N + 4.
Due to the biconvex nature of problem (3.15), the two-timescale duplex neuro-
dynamic optimization approach, equipped with distinct initial conditions and
adequately divergent timescales for RNN1 and RNN2, reliably achieves convergence
to the global optimum of the problem [59, 60].

Algorithm 2 Two-Timescale Duplex Neurodynamic Optimization for Portfolio Selection Problem (3.15)

1  Initialize the states of RNN1 and RNN2 randomly: (γ_1(0), ρ_1(0), σ_1(0), y_1(0), z_1(0), ζ_1(0)) and (γ_2(0), ρ_2(0), σ_2(0), y_2(0), z_2(0), ζ_2(0)), and set the error tolerance ε;
2  for I = 1 : 2 do
3    p_I(0) = (γ_I(0), ρ_I(0), σ_I(0), y_I(0), z_I(0), ζ_I(0));
4    p*(0) = argmin(f_c(γ_I(0), ρ_I(0), σ_I(0), y_I(0)));
5  end
6  j ← 1;
7  while ||p*(j + 1) − p*(j)|| ≥ ε do
8    Compute steady states (γ̄_1(j), ρ̄_1(j), σ̄_1(j), ȳ_1(j), z̄_1(j), ζ̄_1(j)) and (γ̄_2(j), ρ̄_2(j), σ̄_2(j), ȳ_2(j), z̄_2(j), ζ̄_2(j)) by (3.28);
9    if f_c(γ̄_I(j), ρ̄_I(j), σ̄_I(j), ȳ_I(j)) < f_c(p_I(j)) then
10     p_I(j + 1) = (γ̄_I(j), ρ̄_I(j), σ̄_I(j), ȳ_I(j), z̄_I(j), ζ̄_I(j));
11   else
12     p_I(j + 1) = p_I(j);
13   end
14   if f_c(p_I(j + 1)) < f_c(p*(j)) then
15     p*(j + 1) = p_I(j + 1);
16   else
17     p*(j + 1) = p*(j);
18   end
19   Compute (γ_I(j + 1), ρ_I(j + 1), σ_I(j + 1), y_I(j + 1), z_I(j + 1), ζ_I(j + 1)) by (3.21) and (3.22);
20   if (τ > δ) then
21     Perform the wavelet mutation using (3.23);
22   end
23   j ← j + 1;
24 end

3.5 Experimental Results

3.5.1 Setups

In alignment with prior research [34, 36], our experiments utilized data from four
major stock exchanges: HDAX, FTSE, HSCI, and SP500. The datasets comprised
938 weekly adjusted closing prices of stocks spanning from January 3, 2000, to
December 29, 2017, excluding stocks that were suspended or newly listed during
this timeframe [50, 51, 67]. Consequently, the datasets for HDAX, FTSE, HSCI, and
SP500 included 49, 56, 77, and 356 stocks, respectively. The data were segmented
into two portions for each experiment: either one-third for in-sample pre-training and two-thirds for out-of-sample testing, or half and half. Out-of-sample testing involved
continuous updating of problem parameters using historical return data up to the week
prior to each portfolio rebalancing, ensuring portfolio optimizations were based on
the most current data within a sequentially prolonged time window.
Additionally, following [36], the cardinality constraint k varies across datasets to
test different portfolio sizes: for HDAX, k values of 44, 34, 24, 14, and 4; for FTSE,
50, 39, 28, 16, and 5; for HSCI, 69, 53, 38, 23, and 7; and for SP500, 320, 249, 178,
and 35 are examined. The risk-free rate r_f is calculated based on the annualized returns of US Treasury three-month T-bills, r_yearly, converting these to weekly rates using the formula (1 + r_weekly)^{938/18} − 1 = r_yearly, i.e., r_f = r_weekly = (1 + r_yearly)^{18/938} − 1 [68]. All experiments use simple return rates for r_f.
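As a quick check of the conversion (938 weekly observations over 18 years; the annual rate value below is illustrative, not from the text):

r_yearly = 0.02                                # illustrative annual T-bill rate
r_weekly = (1 + r_yearly) ** (18 / 938) - 1    # about 3.8e-4 per week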
Four approaches are used in performance comparison: (1) DNO, a duplex neurody-
namic optimization approach, (2) CNO, a collaborative neurodynamic optimization
approach with 20 neurodynamic models [36], (3) EW, an equally-weighted approach
for portfolio selection [69], and (4) MI, the market index.
For estimating CVaR, the threshold θ is set to 0.95, with N matching the total
available historical data at decision time. The two-timescale duplex neurodynamic
model employs ε_1/ε_2 = 10 in RNN1 and ε_1/ε_2 = 0.1 in RNN2. In the PSO rule, the constants c_1 and c_2 are set to 1.49. Moreover, the algorithm terminates when the error tolerance ε is equal to 10^{-3}, and the diversity threshold τ is set to 0.1 as suggested in [62]. In addition, j_max is set to 50, and r_1, r_2, and the initial states of γ, ρ, σ, y, z, and ζ are randomly generated within the range [0, 1].

3.5.2 Results

Tables 3.1, 3.2 and 3.3 present the annualized SR, CSR, and returns across four
datasets. The portfolios developed using the DNO approach show superior annualized
SR, CSR, and return metrics at cardinality levels of 24, 34, and 44. However, in the
HDAX dataset, the DNO approach did not perform as well as the EW approach when
cardinality constraints are set at 4 and 14, where the EW portfolio is not limited by
cardinality. For the FTSE dataset, DNO portfolios surpass the three benchmarks
in terms of annualized SR, CSR, and returns. In the HSCI dataset, DNO portfolios
exhibit higher annualized SR for cardinality values of 38, 53, 69, and 77 but fall short
against the EW method in terms of both annualized SR and returns at cardinality
levels of 7 and 23. Similarly, DNO portfolios demonstrate greater annualized CSR
values than the benchmark methods for cardinality values of 38, 53, 69, and 77,
although they underperform compared to EW at cardinality settings of 7 and 23. It
is observed that both SR and CSR metrics for DNO portfolios tend to improve with
an increase in the value of k. Furthermore, Tables 3.4, 3.5 and 3.6, which use data
from the first half of the period for in-sample learning, show enhanced outcomes
compared to Tables 3.1, 3.2 and 3.3, attributable to the utilization of a larger sample size for in-sample training, indicating improved performance across all metrics.
Figures 3.1 and 3.2 illustrate the growth of cumulative returns for portfolios that
are rebalanced weekly, utilizing datasets divided into thirds (1/3 for in-sample and
2/3 for out-of-sample) and halves (1/2 for in-sample and 1/2 for out-of-sample),
respectively. A general trend observed is that cumulative returns tend to rise with an
increase in the cardinality value k. More precisely, Fig. 3.1 highlights that portfolios
optimized using DNO exhibit superior cumulative returns at cardinality values of 44,
50, 69, and 320 across the respective datasets. Yet, for specific settings such as DNO-
4 and DNO-14 in the HDAX dataset, DNO-5 through DNO-39 in the FTSE dataset,
and DNO-7 and DNO-23 in the HSCI dataset, the performance falls short of the
EW portfolios, which are not limited by cardinality constraints. Figure 3.2 presents
analogous findings, with the first half of the period allocated for in-sample pre-
training, showing the highest cumulative returns for the DNO strategy at cardinality values of 44, 69, and 320 in the HDAX, HSCI, and SP500 datasets, respectively.
Table 3.1 Annualized Sharpe ratio of 1/3–2/3 partitioned datasets


Dataset    n     k     DNO       CNO       EW        MI
HDAX 49 4 0.3463 0.3071 0.4715 0.4087
14 0.4535 0.4244
24 0.4857 0.4627
34 0.4947 0.4600
44 0.5497 0.5396
49 0.5588 0.5415
FTSE 56 5 0.3713 0.3040 0.4470 0.1705
16 0.3774 0.3240
28 0.4002 0.3338
39 0.4075 0.3817
50 0.4319 0.4221
56 0.5906 0.5399
HSCI 77 7 0.7044 0.6347 0.7518 0.3174
23 0.7186 0.6599
38 0.7712 0.6704
53 0.7730 0.6923
69 0.7882 0.7074
77 0.7903 0.7386
SP500 356 35 0.7612 0.6744 0.7229 0.3810
106 0.8109 0.7967
178 0.8304 0.8208
249 0.8913 0.8430
320 0.8990 0.8609
356 0.9003 0.8733

However, in certain cases like DNO-4, DNO-14, DNO-24 on HDAX, DNO-5 to
DNO-50 on FTSE, and DNO-7, DNO-23 on HSCI, the DNO portfolios did not
perform as well as the EW portfolios, which are free from cardinality restrictions.
Table 3.2 Annualized conditional Sharpe ratio of 1/3–2/3 partitioned datasets


Dataset    n     k     DNO       CNO       EW        MI
HDAX 49 4 0.1691 0.1635 0.1964 0.1761
14 0.1890 0.1824
24 0.2010 0.1908
34 0.2093 0.1918
44 0.2321 0.1983
49 0.2379 0.1991
FTSE 56 5 0.1471 0.1154 0.1847 0.0716
16 0.1493 0.1242
28 0.1672 0.1285
39 0.1708 0.1409
50 0.1805 0.1499
56 0.2474 0.2215
HSCI 77 7 0.3029 0.3013 0.3331 0.1445
23 0.3233 0.3091
38 0.3394 0.3180
53 0.3421 0.3260
69 0.3490 0.3339
77 0.3502 0.3498
SP500 356 35 0.3322 0.3318 0.3079 0.1560
106 0.3593 0.3429
178 0.3885 0.3555
249 0.4016 0.3690
320 0.4339 0.3695
356 0.4396 0.3897

3.6 Concluding Remarks

In conclusion, the integration of neurodynamic optimization and cardinality-constrained portfolio optimization has been thoroughly examined in this chapter.
The experimental results have provided compelling evidence for the superior perfor-
mance of neurodynamics-based approaches in portfolio optimization. Specifically,
the neurodynamic models outperformed the baselines in terms of major performance
criteria such as SR, CSR, and cumulative returns. The DNO portfolios achieved the
highest SR, CSR values, and returns at specific values of k on different datasets,
demonstrating their effectiveness in maximizing portfolio performance while consid-
ering cardinality constraints. This research breakthrough signifies the potential of neurodynamic systems as powerful tools for solving complex portfolio optimization problems. Future research endeavors should focus on expanding the application of these proposed methods to address more intricate financial engineering and management challenges.
Table 3.3 Annualized returns of 1/3–2/3 partitioned datasets


Dataset    n     k     DNO (%)   CNO (%)   EW (%)    MI (%)
HDAX 49 4 7.5389 8.3862 9.4248 7.7961
14 9.1808 9.4128
24 9.9109 9.8893
34 10.2469 10.0069
44 11.2796 11.0843
49 11.8719 11.2828
FTSE 56 5 6.5139 6.0002 7.8591 2.4769
16 6.7312 6.6281
28 7.0314 7.1953
39 7.1560 7.4920
50 8.0872 7.6369
56 10.8381 8.3538
HSCI 77 7 15.3946 15.8822 16.3132 5.7223
23 15.8024 15.9351
38 17.2347 16.2815
53 17.2441 16.6426
69 17.4298 17.0163
77 17.5837 17.6290
SP500 356 35 15.9359 18.9221 14.0971 6.2927
106 17.0170 17.5123
178 17.8873 18.9221
249 19.3203 19.5123
320 20.1652 20.2054
356 20.4722 20.9749

Table 3.4 Annualized Sharpe ratio of half-and-half partitioned datasets


Dataset    n     k     DNO       CNO       EW        MI
HDAX 49 4 0.5486 0.5231 0.6873 0.6726
14 0.5975 0.5689
24 0.6687 0.6357
34 0.7271 0.6848
44 0.7574 0.7119
49 0.7582 0.7565
FTSE 56 5 0.4844 0.4349 0.8303 0.4364
16 0.6338 0.6187
28 0.6600 0.6302
39 0.6755 0.6554
50 0.7214 0.7201
56 0.9709 0.9008
HSCI 77 7 0.9315 0.8645 0.9791 0.4782
23 0.9411 0.8790
38 0.9551 0.8941
53 0.9640 0.9074
69 1.9881 0.9186
77 1.0147 0.9369
SP500 356 35 1.1204 1.1139 1.1599 0.8255
106 1.2346 1.1392
178 1.2724 1.1659
249 1.2852 1.1906
320 1.2854 1.2140
356 1.3284 1.2330
Table 3.5 Annualized conditional Sharpe ratio of half-and-half partitioned datasets


Dataset    n     k     DNO       CNO       EW        MI
HDAX 49 4 0.2263 0.2152 0.3110 0.3170
14 0.2514 0.2358
24 0.2818 0.2476
34 0.3053 0.2501
44 0.3281 0.2674
49 0.3360 0.2769
FTSE 56 5 0.1918 0.1764 0.4009 0.2016
16 0.2730 0.2173
28 0.2854 0.2226
39 0.2878 0.2480
50 0.2945 0.2562
56 0.4134 0.4071
HSCI 77 7 0.4540 0.4044 0.3705 0.2052
23 0.4575 0.4114
38 0.4698 0.4186
53 0.4731 0.4247
69 0.4900 0.4291
77 0.4936 0.4398
SP500 356 35 0.4934 0.4045 0.4702 0.3495
106 0.5512 0.4273
178 0.5796 0.5069
249 0.6105 0.5345
320 0.6260 0.5390
356 0.6447 0.6422
Table 3.6 Annualized returns of half-and-half partitioned datasets


Dataset    n     k     DNO (%)   CNO (%)   EW (%)    MI (%)
HDAX 49 4 10.4758 7.9459 12.9857 12.2463
14 11.2641 11.1892
24 12.7997 12.1576
34 13.8972 13.4195
44 14.5806 13.6594
49 15.1259 13.8340
FTSE 56 5 7.7156 7.9548 12.9811 5.9705
16 9.8699 8.2565
28 10.2688 8.4036
39 10.7203 9.1148
50 11.1810 12.4485
56 15.8881 15.6841
HSCI 77 7 16.9461 16.9073 17.8610 7.9394
23 17.6416 17.1799
38 17.8137 17.4809
53 18.0446 17.7833
69 18.0600 18.0823
77 18.7928 18.6832
SP500 356 35 20.9448 19.7990 20.3834 12.4253
106 24.1746 20.3223
178 24.6381 20.9244
249 25.1157 21.5870
320 25.9424 22.3755
356 26.9975 23.2873
Fig. 3.1 Cumulative returns for four distinct portfolios, derived from datasets divided into 1/3 for
training and 2/3 for testing phases, from HDAX (in the first subplot), FTSE (in the second subplot),
HSCI (in the third subplot), and SP500 (in the final subplot)
Fig. 3.2 Cumulative returns for four distinct portfolios, derived from datasets divided into 1/2 for
training and 1/2 for testing phases, from HDAX (in the first subplot), FTSE (in the second subplot),
HSCI (in the third subplot), and SP500 (in the final subplot)

References

1. Markowitz, H.: Portfolio selection. J. Financ. 7(1), 77–91 (1952)


2. Modigliani, F., Miller, M.H.: The cost of capital, corporation finance and the theory of
investment. Am. Econ. Rev. 48(3), 261–297 (1958)
3. Sharpe, W.F.: Capital asset prices: a theory of market equilibrium under conditions of risk. J.
Financ. 19(3), 425–442 (1964)
4. Merton, R.C.: Lifetime portfolio selection under uncertainty: the continuous-time case. Rev.
Econ. Stat. 247–257 (1969)
5. Samuelson, P.A.: Lifetime portfolio selection by dynamic stochastic programming. In:
Stochastic Optimization Models in Finance, pp. 517–524. Elsevier (1975)
6. Kolm, P.N., Tutuncu, R., Fabozzi, F.J.: 60 years of portfolio optimization: practical challenges
and current trends. Eur. J. Oper. Res. 234(2), 356–371 (2014)
7. Pinto, T., Morais, H., Sousa, T.M., Sousa, T., Vale, Z., Praca, I., Faia, R., Pires, E.J.S.: Adap-
tive portfolio optimization for multiple electricity markets participation. IEEE Trans. Neural
Networks Learn. Syst. 27(8), 1720–1733 (2015)
8. Villena, M.J., Reus, L.: On the strategic behavior of large investors: a mean-variance portfolio
approach. Eur. J. Oper. Res. 254(2), 679–688 (2016)
9. Lai, Z.-R., Dai, D.-Q., Ren, C.-X., Huang, K.-K.: A peak price tracking-based learning system
for portfolio selection. IEEE Trans. Neural Networks Learn. Syst. 29(7), 2823–2832 (2018)
10. Lai, Z.-R., Dai, D.-Q., Ren, C.-X., Huang, K.-K.: Radial basis functions with adaptive input and
composite trend representation for portfolio selection. IEEE Trans. Neural Networks Learn.
Syst. 29(12), 6214–6226 (2018)
11. Josa-Fombellida, R., Rincón-Zapatero, J.P.: Equilibrium strategies in a defined benefit pension
plan game. Eur. J. Oper. Res. 275(1), 374–386 (2019)
12. Ponsich, A., Jaimes, A.L., Coello, C.A.C.: A survey on multiobjective evolutionary algo-
rithms for the solution of the portfolio optimization problem and other finance and economics
applications. IEEE Trans. Evol. Comput. 17(3), 321–344 (2013)
13. Kroll, Y., Levy, H., Markowitz, H.M.: Mean-variance versus direct utility maximization. J.
Financ. 39(1), 47–61 (1984)
14. Sharpe, W.F.: Expected utility asset allocation. Financ. Anal. J. 63(5), 18–30 (2007)
15. Morgenstern, O., Von Neumann, J.: Theory of Games and Economic Behavior. Princeton
University Press (1953)
16. Steuer, R.E.: Multiple Criteria Optimization: Theory, Computation, and Applications. Wiley (1986)
17. Mansini, R., Ogryczak, W., Speranza, M.G.: Twenty years of linear programming based
portfolio optimization. Eur. J. Oper. Res. 234(2), 518–535 (2014)
18. Brandhofer, S., Braun, D., Dehn, V., Hellstern, G., Hüls, M., Ji, Y., Polian, I., Bhatia, A.S., Wellens, T.: Benchmarking the performance of portfolio optimization with QAOA. Quantum Inf. Process. 22(1), 25 (2022)
19. Ertenlice, O., Kalayci, C.B.: A survey of swarm intelligence for portfolio optimization:
algorithms and applications. Swarm Evol. Comput. 39, 36–52 (2018)
20. Gunjan, A., Bhattacharyya, S.: A brief review of portfolio optimization techniques. Artif. Intell.
Rev. 56(5), 3847–3886 (2023)
21. Tank, D., Hopfield, J.: Simple ‘neural’ optimization networks: an A/D converter, signal decision
circuit, and a linear programming circuit. IEEE Trans. Circ. Syst. 33(5), 533–541 (1986)
22. Hopfield, J.J., Tank, D.W.: Computing with neural circuits—a model. Science 233(4764),
625–633 (1986)
23. Xia, Y., Wang, J.: A recurrent neural network for nonlinear convex optimization subject to
nonlinear inequality constraints. IEEE Trans. Circ. Syst. I Regul. Pap. 51(7), 1385–1394 (2004)
24. Xia, Y., Wang, J.: A recurrent neural network for solving nonlinear convex programs subject
to linear constraints. IEEE Trans. Neural Networks 16(2), 379–386 (2005)
25. Li, G., Yan, Z., Wang, J.: A one-layer recurrent neural network for constrained nonsmooth
invex optimization. Neural Netw. 50, 79–89 (2014)
26. Yan, Z., Wang, J., Li, G.: A collective neurodynamic optimization approach to bound-
constrained nonconvex optimization. Neural Netw. 55, 20–29 (2014)
27. Yan, Z., Fan, J., Wang, J.: A collective neurodynamic approach to constrained global
optimization. IEEE Trans. Neural Networks Learn. Syst. 28(5), 1206–1215 (2017)
28. Clerc, M., Kennedy, J.: The particle swarm-explosion, stability, and convergence in a
multidimensional complex space. IEEE Trans. Evol. Comput. 6(1), 58–73 (2002)
29. Yang, S., Liu, Q., Wang, J.: A collaborative neurodynamic approach to multiple-objective
distributed optimization. IEEE Trans. Neural Networks Learn. Syst. 29(4), 981–992 (2018)
30. Leung, M.-F., Wang, J.: A collaborative neurodynamic approach to multiobjective optimization.
IEEE Trans. Neural Networks Learn. Syst. 29(11), 5738–5748 (2018)
31. Che, H., Wang, J.: A nonnegative matrix factorization algorithm based on a discrete-time
projection neural network. Neural Netw. 103, 63–71 (2018)
32. Wang, J., Wang, J., Che, H.: Task assignment for multivehicle systems based on collaborative
neurodynamic optimization. IEEE Trans. Neural Networks Learn. Syst. 31(4), 1145–1154
(2019)
33. Wang, J., Wang, J., Han, Q.-L.: Neurodynamics-based model predictive control of continuous-
time under-actuated mechatronic systems. IEEE/ASME Trans. Mechatron. 26(1), 311–322
(2021)
34. Leung, M.-F., Wang, J.: Minimax and biobjective portfolio selection based on collaborative
neurodynamic optimization. IEEE Trans. Neural Networks Learn. Syst. 32(7), 2825–2836
(2021)
3 Neurodynamic Approaches to Cardinality-Constrained Portfolio … 95

35. Leung, M.-F., Wang, J., Li, D.: Decentralized robust portfolio optimization based on
cooperative-competitive multiagent systems. IEEE Trans. Cybern. 52(12), 12785–12794
(2022)
36. Leung, M.-F., Wang, J.: Cardinality-constrained portfolio selection based on collaborative
neurodynamic optimization. Neural Netw. 145, 68–79 (2022)
37. Wang, J., Gan, X.: Neurodynamics-driven portfolio optimization with targeted performance
criteria. Neural Netw. 157, 404–421 (2023)
38. Gorski, J., Pfeuffer, F., Klamroth, K.: Biconvex sets and optimization with biconvex functions:
a survey and extensions. Math. Meth. Oper. Res. 66(3), 373–407 (2007)
39. Young, M.R.: A minimax portfolio selection rule with linear programming solution. Manag.
Sci. 44(5), 673–683 (1998)
40. Polak, G.G., Rogers, D.F., Sweeney, D.J.: Risk management strategies via minimax portfolio
optimization. Eur. J. Oper. Res. 207(1), 409–419 (2010)
41. Deng, X.-T., Li, Z.-F., Wang, S.-Y.: A minimax portfolio selection strategy with equilibrium.
Eur. J. Oper. Res. 166(1), 278–292 (2005)
42. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42
(2000)
43. Artzner, P., Delbaen, F., Eber, J.-M., Heath, D.: Coherent measures of risk. Math. Financ. 9(3),
203–228 (1999)
44. Gaivoronski, A.A., Pflug, G.: Value-at-risk in portfolio optimization: properties and computa-
tional approach. J. Risk 7(2), 1–31 (2005)
45. Sharpe, W.F.: The sharpe ratio. J. Portfolio Manag. 21(1), 49–58 (1994)
46. Christiansen, C., Joensen, J.S., Nielsen, H.S.: The risk-return trade-off in human capital
investment. Labour Econ. 14(6), 971–986 (2007)
47. Liu, Q., Guo, Z., Wang, J.: A one-layer recurrent neural network for constrained pseudoconvex
optimization and its application for dynamic portfolio optimization. Neural Netw. 26(1), 99–
109 (2012)
48. Liu, Q., Dang, C., Huang, T.: A one-layer recurrent neural network for real-time portfolio
optimization with probability criterion. IEEE Trans. Cybern. 43(1), 14–23 (2013)
49. Eling, M., Schuhmacher, F.: Does the choice of performance measure influence the evaluation
of hedge funds? J. Bank. Finan. 31(9), 2632–2647 (2007)
50. Chang, T.-J., Meade, N., Beasley, J.E., Sharaiha, Y.M.: Heuristics for cardinality constrained
portfolio optimization. Comput. Oper. Res. 27(13), 1271–1302 (2000)
51. Woodside-Oriakhi, M., Lucas, C., Beasley, J.E.: Heuristic algorithms for the cardinality
constrained efficient frontier. Eur. J. Oper. Res. 213(3), 538–550 (2011)
52. Gao, J., Li, D.: Optimal cardinality constrained portfolio selection. Oper. Res. 61(3), 745–761
(2013)
53. Hardoroudi, N.D., Keshvari, A., Kallio, M., Korhonen, P.: Solving cardinality constrained
mean-variance portfolio problems via MILP. Ann. Oper. Res. 254, 47–59 (2017)
54. Kalayci, C.B., Polat, O., Akbay, M.A.: An efficient hybrid metaheuristic algorithm for
cardinality constrained portfolio optimization. Swarm Evol. Comput. 54, 100662 (2020)
55. Xia, Y., Feng, G., Wang, J.: A novel neural network for solving nonlinear optimization problems
with inequality constraints. IEEE Trans. Neural Networks 19(8), 1340–1353 (2008)
56. Li, G., Yan, Z., Wang, J.: A one-layer recurrent neural network for constrained nonconvex
optimization. Neural Netw. 61, 10–21 (2015)
57. Liu, Q., Wang, J.: A one-layer recurrent neural network for constrained nonsmooth optimiza-
tion. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 40(5), 1323–1333 (2011)
58. Che, H., Wang, J.: A collaborative neurodynamic approach to global and combinatorial
optimization. Neural Netw. 114, 15–27 (2019)
59. Che, H., Wang, J.: A two-timescale duplex neurodynamic approach to biconvex optimization.
IEEE Trans. Neural Networks Learn. Syst. 30(8), 2503–2514 (2019)
60. Che, H., Wang, J.: A two-timescale duplex neurodynamic approach to mixed-integer optimiza-
tion. IEEE Trans. Neural Networks Learn. Syst. 32(1), 36–48 (2021)
96 M.-F. Leung and J. Wang

61. Ling, S.-H., Iu, H.H., Chan, K.Y., Lam, H.-K., Yeung, B.C., Leung, F.H.: Hybrid particle
swarm optimization with wavelet mutation and its industrial applications. IEEE Trans. Syst.
Man Cybern. Part B (Cybern.) 38(3), 743–763 (2008)
62. Fan, J., Wang, J.: A collective neurodynamic optimization approach to nonnegative matrix
factorization. IEEE Trans. Neural Networks Learn. Syst. 28(10), 2344–2356 (2017)
63. Juang, C.-F.: A hybrid of genetic algorithm and particle swarm optimization for recurrent
network design. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 34(2), 997–1006 (2004)
64. Leung, M.-F., Wang, J.: A collaborative neurodynamic optimization approach to bicriteria
portfolio selection. In: Lu, H., Tang, H., Wang, Z. (eds.) Advances in Neural Networks, pp. 318–
327. Springer International Publishing, Cham. ISNN 2019
65. Leung, M.-F., Wang, J., Che, H.: Cardinality-constrained portfolio selection via two-timescale
duplex neurodynamic optimization. Neural Netw. 153, 399–410 (2022)
66. Che, H., Wang, J., Cichocki, A.: Bicriteria sparse nonnegative matrix factorization via two-
timescale duplex neurodynamic optimization. IEEE Trans. Neural Networks Learn. Syst. 34(8),
4881–4891 (2023)
67. Guastaroba, G., Speranza, M.G.: Kernel search: an application to the index tracking problem.
Eur. J. Oper. Res. 217(1), 54–68 (2012)
68. Hodoshima, J.: Stock performance by utility indifference pricing and the Sharpe ratio. Quant.
Finan. 19, 1–12 (2018)
69. DeMiguel, V., Garlappi, L., Uppal, R.: Optimal versus naive diversification: how inefficient is
the 1/N portfolio strategy? Rev. Finan. Stud. 22(5), 1915–1953 (2009)
Chapter 4
Fully Homomorphic Encrypted Wavelet
Neural Network for Privacy-Preserving
Bankruptcy Prediction in Banks

Syed Imtiaz Ahamed, Vadlamani Ravi, and Pranay Gopi

Abstract The main aim of Privacy-Preserving Machine Learning (PPML) is to protect the privacy of, and provide security to, the data used in building Machine Learning models. PPML comprises various techniques, such as Secure Multi-Party Computation, Differential Privacy, and Homomorphic Encryption (HE). These techniques are combined with Machine Learning models, and even Deep Learning networks, to protect both the privacy of the data and the identity of the user. In this chapter, we propose a fully homomorphic encrypted wavelet neural network that protects privacy without compromising the efficiency of the model. We tested the effectiveness of the proposed method on four bankruptcy prediction datasets from the finance domain. The results show that our proposed model performs similarly to or better than the unencrypted model on these datasets.

Keywords Fully homomorphic encryption · Wavelet neural networks · CKKS scheme · Classification · Stochastic gradient descent · Bankruptcy prediction

An earlier version of this manuscript was submitted to arXiv: 2205.13265.

S. I. Ahamed · V. Ravi (B) · P. Gopi


Centre for AI and ML, Institute for Development and Research in Banking Technology,
Castle Hills, Masab Tank, Hyderabad 500057, India
e-mail: [email protected]
S. I. Ahamed
e-mail: [email protected]
P. Gopi
e-mail: [email protected]
S. I. Ahamed
School of Computer and Information Sciences (SCIS), University of Hyderabad,
Hyderabad 500046, India


4.1 Introduction

Machine Learning is being extensively used in almost every field, such as healthcare, finance, education, intrusion detection, and even recommendation systems [1]. A lot of private data is stored in databases and is openly utilized by ML algorithms to build models. One of the major concerns in the application of ML models is the privacy and security of such private data. Organizations cannot simply ignore privacy concerns around data such as customers' Personally Identifiable Information (PII), yet at the same time they cannot stop analyzing such data, because doing so reaps immense business and operational benefits.
On May 25, 2018, the European Union (EU) brought into effect the toughest privacy and security law in the world, the General Data Protection Regulation (GDPR) [2]. The law states that organizations violating its privacy and security standards will face heavy fines, running into millions of euros. Another such law, the California Consumer Privacy Act (CCPA), grants consumers in California the right to know everything a business collects about them, the right to delete the collected information, and the right to opt out of the sale of their information [3]. Similarly, the Personal Data Protection Act (PDPA) enacted in Singapore protects personal data [4].
With such strict privacy laws, organizations are precluded from using private data freely. To overcome this problem, PPML provides ways of assuring customers that their data privacy will be protected, while organizations can still work on the private data and build better and more responsible ML models.
In the financial domain too, privacy preservation has high priority, because the PII of a customer should not be made available to, or shared with, other organizations without the consent of the individual concerned. Privacy preservation is not only about customers' data; it is also of utmost importance to the organizations themselves. If an analysis of an organization's data reveals that the organization is on the verge of bankruptcy, it would create chaos among employees as well as customers. PPML therefore helps to perform the analysis without revealing its actual output to everyone, but only to the important stakeholders of the company.
Bankruptcy can be described as the phenomenon in which a bank/firm is unable to meet its financial commitments or unable to return the credit due to its creditors. In simpler terms, the bank/firm is unable to generate the wealth to clear off its debt. This information is highly confidential and should not be revealed to everyone unless the bank/firm is completely insolvent. PPML allows organizations or banks to perform bankruptcy prediction analysis without revealing any sensitive information.
There are different approaches to PPML, and no single approach is considered the best among them all. For example, one approach is Differential Privacy (DP), where researchers can work on people's personal information without disclosing their identity; its drawback is that it might lead to a loss in model accuracy. Another technique, Secure Multi-Party Computation, lets multiple data owners collaboratively train a model, but this might result in high communication or computation overhead [5].
Yet another approach is to secure the data using Homomorphic Encryption, which allows computation to be performed on encrypted data without the need for decryption. Partial Homomorphic Encryption (PHE), Somewhat Homomorphic Encryption (SHE), and Fully Homomorphic Encryption (FHE) are the variants of Homomorphic Encryption. PHE allows an unlimited number of either additions or multiplications, SHE allows a limited number of arithmetic operations, and FHE allows an unlimited number of additions and multiplications on the encrypted data. In this chapter, we focus on FHE, which is considered the most secure of these techniques. We propose an FHE-based privacy-preserving Wavelet Neural Network (WNN): we designed and implemented a secure WNN in which the data and all the trainable parameters of the network are fully homomorphically encrypted, and the results are also obtained in encrypted format.
The remaining part of the chapter is structured as follows: Sect. 4.2 overviews bankruptcy prediction and states the problem. Section 4.3 presents the related work on homomorphic encryption. Section 4.4 explains the proposed methodology, and Sect. 4.5 briefly describes the datasets analyzed. The results are discussed in Sect. 4.6, and finally, Sect. 4.7 presents the conclusions and future directions. The Appendix consists of tables presenting the features of the datasets analyzed.

4.2 Overview of Bankruptcy Prediction and Problem Definition

The prediction of bankruptcy for financial firms and banks has been an extensively researched area since the late 1960s [6]. Creditors, auditors, stockholders and senior management are all equally interested in bankruptcy prediction because it affects all of them alike [7]. The health of a bank in a highly competitive business environment depends on (i) how financially solvent it is at inception, (ii) its ability, relative flexibility and efficiency in creating cash from its continuous operations, (iii) its access to capital markets, and (iv) its financial capacity and staying power when faced with unplanned cash shortfalls. As a bank becomes more and more insolvent, it gradually enters a danger zone. Then, changes to its operations and capital structure must be made in order to keep it solvent [8].
The most precise way of monitoring banks is by conducting on-site examinations. These examinations are conducted on a bank's premises by regulators every 12–18 months, as mandated by the Federal Deposit Insurance Corporation Improvement Act of 1991. Regulators utilize a six-part rating system to indicate the safety and soundness of the institution. This rating, referred to as the CAMELS rating, evaluates banks according to their basic functional areas: capital adequacy, asset quality, management expertise, earnings strength, liquidity, and sensitivity to market risk. While CAMELS ratings clearly provide regulators with important information, Cole and Gunther [9] reported that these ratings decay rapidly. This awakening opened the floodgates of research activity in the bankruptcy prediction area, whereby the entire gamut of statistical and machine learning techniques was applied in a flurry of publications spread across two decades, from the 1990s to the 2010s. However, with the GDPR and other privacy laws in force, the financial statement data of banks, including balance sheet data, cannot be shared with a third party for rigorous predictive analytics.
This stringent constraint calls for the application of privacy-preserving machine learning in the area of bankruptcy prediction as well. Toward that end, this chapter applies the privacy-preserving, fully homomorphic encrypted WNN to bankruptcy prediction in banks.

4.3 Literature Survey

Of late, the idea of PPML has enabled privacy preservation in a number of ML techniques. To start with, privacy-preserving ridge regression was proposed by Nikolaenko et al. [10], where the authors used a hybrid approach combining linear homomorphic encryption with Yao garbled circuits. Later, a fully homomorphic encrypted Convolutional Neural Network was proposed by Chabanne et al. [11], combining the Cryptonets [12] solution with batch normalization principles.
In Chen et al. [13], the authors implemented fully homomorphic encrypted Logistic Regression using the Fan-Vercauteren scheme implementation in the SEAL library. Cheon et al. [14] proposed an ensemble gradient descent method for optimizing the coefficients of a homomorphically encrypted logistic regression, which reduced the time complexity of the algorithm.
A secure Multi-Layer Perceptron was implemented by Bellafqira et al. [15]; it trains on homomorphically encrypted data in the cloud using the Paillier cryptosystem and makes use of two non-colluding servers. Later, Nandakumar et al. [16] trained a typical two-layered neural network on encrypted data using fully homomorphic encryption, with the open-source library HElib [17] used for encryption.
encryption.
Sun et al. [18] proposed an improved FHE scheme based on HElib and implemented a private hyperplane decision-based classification and a private Naïve Bayes classification using additive and multiplicative homomorphic encryption. They also implemented a private decision tree classification with the proposed FHE scheme.
A privacy-preserving Linear Regression model was implemented on distributed data by Qiu et al. [19]; it involves multiple clients and two non-colluding servers, and the protocol combines Paillier Homomorphic Encryption with a data-masking technique. Bonte and Vercauteren [20] implemented privacy-preserving Logistic Regression using somewhat homomorphic encryption based on the scheme of Fan and Vercauteren [21].

4.4 Proposed Methodology

Considerable research has been conducted in the domain of privacy-preserving machine learning, employing diverse algorithms and neural networks. In this study, however, we introduce a novel concept of privacy preservation using a Wavelet Neural Network. To the best of our knowledge, this specific approach has not been previously proposed or developed. By combining the power of wavelet analysis with neural networks and encrypting the data along with every parameter, such as the weights, our approach aims to enhance privacy protection in machine learning tasks. This unique integration presents an innovative solution to privacy concerns in the field, offering a fresh perspective and potentially unlocking new avenues for research and application.
In this section, the concepts of homomorphic encryption and its types, along with the CKKS scheme [22] which we employed for implementing FHE, are explained. Later, we explain the original unencrypted WNN and describe our proposed privacy-preserving WNN in detail, along with a block diagram.

4.4.1 Homomorphic Encryption

Homomorphic Encryption is a special type of encryption scheme that allows computations on encrypted data without decrypting it at any point during the computation [23]. In other encryption schemes, the encrypted data needs to be decrypted first before any computation can be performed. Homomorphic encryption supports both additive and multiplicative homomorphism, which means:

E(m_1 + m_2) = E(m_1) + E(m_2), and E(m_1 · m_2) = E(m_1) · E(m_2)

where m_1 and m_2 are plain texts and E is the encryption scheme. This implies that the homomorphic encryption of the sum or product of two numbers is equivalent to the sum or product of the two individually homomorphically encrypted numbers.
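As an illustration, the following minimal sketch exercises these two properties with the TenSEAL library [34] used later in this chapter (the parameter values here are arbitrary illustrative choices, not the ones used in our experiments):

```python
import tenseal as ts

# Build a CKKS context (illustrative parameters only)
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40

enc1 = ts.ckks_vector(ctx, [1.0, 2.0, 3.0])   # E(m1)
enc2 = ts.ckks_vector(ctx, [4.0, 5.0, 6.0])   # E(m2)

print((enc1 + enc2).decrypt())  # ≈ [5.0, 7.0, 9.0]  (additive homomorphism)
print((enc1 * enc2).decrypt())  # ≈ [4.0, 10.0, 18.0] (multiplicative homomorphism)
```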
The homomorphic encryption scheme is mainly divided into three categories
based on the number of operations that can be performed on the encrypted data:
4.4.1.1 Partially Homomorphic Encryption (PHE)

The PHE scheme allows only one type of operation, either addition or multiplication, an unlimited number of times on the encrypted data. Examples of partially homomorphic encryption include RSA (multiplicative homomorphism) [24], ElGamal (multiplicative homomorphism) [25], and Paillier (additive homomorphism) [26]. The PHE scheme is generally used in applications like Private Information Retrieval (PIR) and e-voting.

4.4.1.2 Somewhat Homomorphic Encryption (SHE)

The SHE scheme allows both addition and multiplication operations, but only a limited number of times on the encrypted data. The Boneh-Goh-Nissim (BGN) and Polly Cracker schemes are examples of SHE.

4.4.1.3 Fully Homomorphic Encryption

The FHE scheme allows operations such as addition and multiplication an unlimited number of times on the encrypted data, but it has high computational complexity and requires high-end resources for efficient implementation [27]. Gentry [28] was the first to propose the concept of FHE, along with a general framework for obtaining an FHE scheme. There are mainly three FHE families: ideal lattice-based over the integers [29], Ring Learning With Errors (RLWE) [30], and NTRU-like [31]. We implemented the Cheon-Kim-Kim-Song (CKKS) scheme, whose security is based on the hardness assumption of RLWE.

4.4.2 CKKS Scheme

Cheon-Kim-Kim-Song (CKKS) is a leveled homomorphic encryption scheme that supports approximate arithmetic on real numbers. It is known as leveled homomorphic encryption because there is a limit on the number of multiplications that can be performed on the encrypted data, determined by the choice of parameters. It works only on vectors of real numbers, not on scalars. The scheme is based on the library Homomorphic Encryption for Arithmetic of Approximate Numbers (HEAAN), first introduced in Cheon et al. [22]; HEAAN is an open-source homomorphic encryption library whose algorithms are implemented in C++. We used the CKKS scheme because it lets us encrypt real numbers, perform arithmetic on the ciphertexts, and obtain results that are approximately equal (close) to the original results.
Fig. 4.1 Block diagram of the encryption and decryption in the CKKS scheme: a message m is encoded into a plaintext polynomial P(X), encrypted into a ciphertext c = (c_0(X), c_1(X)), computed on as f(c), then decrypted and decoded back into an approximation of f(m)

4.4.2.1 Encryption in CKKS

The encryption process in the CKKS scheme happens in two steps. First, the vector of real numbers is encoded into a plaintext polynomial. This plaintext polynomial is then encrypted into a ciphertext.

4.4.2.2 Decryption in CKKS

Similar to encryption, decryption also happens in two steps. First, the ciphertext is decrypted into a plaintext polynomial. This plaintext polynomial is then decoded into a vector of real numbers. Figure 4.1 depicts the encryption and decryption process in the CKKS scheme.

4.4.2.3 Parameters in CKKS

The parameters of CKKS determine the privacy level and the computational complexity of the model. They are as follows (a code sketch illustrating the prime-selection constraint follows the list):
1. Scaling factor: defines the encoding precision for the binary representation of numbers.
2. Polynomial modulus degree: determines the number of coefficients in plaintext polynomials, the size of ciphertexts, the computational complexity, and the security level. The degree should always be a power of 2, e.g., 1024, 2048, 4096, … The higher the polynomial modulus degree, the higher the security level achieved, but also the greater the computational time.
3. Coefficient modulus sizes: a list of bit sizes, from which a chain of prime moduli of those sizes, called the coefficient modulus, is generated. The length of the list indicates the number of multiplications possible; the longer the list, the lower the security level of the scheme. The primes in the coefficient modulus must be congruent to 1 modulo 2 × (polynomial modulus degree).
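To make the last constraint concrete, here is a toy Python search for a prime of a given bit size congruent to 1 modulo 2N (the helper name is ours; real encryption libraries perform this selection internally):

```python
from sympy import isprime

def ntt_friendly_prime(bits, N):
    """Smallest prime with the given bit length satisfying q ≡ 1 (mod 2N).

    A toy illustration of the congruence condition on coefficient-modulus
    primes; libraries such as SEAL/TenSEAL do this automatically."""
    step = 2 * N
    q = ((1 << (bits - 1)) // step) * step + 1  # first candidate ≡ 1 (mod 2N)
    while q.bit_length() <= bits:
        if q.bit_length() == bits and isprime(q):
            return q
        q += step
    raise ValueError("no prime of that size found")

q = ntt_friendly_prime(36, 16384)
print(q, q % (2 * 16384) == 1)  # a 36-bit prime congruent to 1 mod 2N
```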

4.4.2.4 Keys in CKKS

The scheme generates different types of keys, which are all handled by a single object called the context (a key-handling sketch follows the list):
1. Secret key: used for decryption; it should not be shared with anyone.
2. Public encryption key: used for encrypting the data.
3. Relinearization keys: in general, the size of a fresh ciphertext is 2. If two ciphertexts have sizes X and Y, their multiplication results in a ciphertext of size as large as X + Y − 1. This growth in size increases noise and also reduces the speed of multiplication. Relinearization therefore reduces the size of ciphertexts back to 2, using special public keys created by the secret key owner.
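A minimal TenSEAL sketch of how these keys might be handled in practice (an assumed deployment pattern for illustration, not a prescribed protocol): the data owner keeps a serialization containing the secret key and shares a public copy of the context for encryption and computation.

```python
import tenseal as ts

ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()                           # rotation keys (for dot products)

private_blob = ctx.serialize(save_secret_key=True)   # kept only by the data owner
ctx.make_context_public()                            # drop the secret key in place
public_blob = ctx.serialize()                        # safe to share: public, relin,
                                                     # and galois keys only
```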

4.4.3 Overview of the Original Unencrypted WNN

The WNN [32] has a simple architecture with just three layers, namely the input layer, the hidden layer, and the output layer. The input layer consists of the feature values, or explanatory variables, introduced to the WNN, and the hidden layer consists of hidden nodes generally referred to as wavelons. These wavelons transform the input values into translated and dilated forms of the mother wavelet. The approximate target values are estimated in the output layer. All the nodes in each layer are fully connected with the nodes in the next layer. We implemented the WNN with the Gaussian wavelet function as the activation function (Fig. 4.2), defined as follows:

$$f(t) = e^{-t^2} \qquad (4.1)$$

The algorithm to train the WNN is as follows. It is simpler than the backpropagation algorithm because only gradient descent is applied to update the parameters, without backpropagating the errors [33]:
1. Select the number of hidden nodes and initialize all the weights, translation and dilation parameters randomly, using the uniform distribution on (0, 1).
2. The output value ŷ of each sample is predicted as follows:

$$\hat{y} = \sum_{j=1}^{nhn} W_j \, f\!\left(\frac{\sum_{i=1}^{nin} w_{ij} x_{ki} - b_j}{a_j}\right) \qquad (4.2)$$
Fig. 4.2 Topology of a wavelet neural network

where nhn and nin are the numbers of hidden and input nodes respectively, W_j and w_ij are the weights between the hidden and output nodes and between the input and hidden nodes respectively, and b_j and a_j are the translation and dilation parameters respectively.
3. Update the weights (W_j and w_ij), translation (b_j), and dilation (a_j) parameters. The parameters of a WNN are updated using the following formulas:

$$\Delta W_j(t+1) = -\eta \frac{\partial E}{\partial W_j(t)} + \alpha \Delta W_j(t) \qquad (4.3)$$

$$\Delta w_{ij}(t+1) = -\eta \frac{\partial E}{\partial w_{ij}(t)} + \alpha \Delta w_{ij}(t) \qquad (4.4)$$

$$\Delta a_j(t+1) = -\eta \frac{\partial E}{\partial a_j(t)} + \alpha \Delta a_j(t) \qquad (4.5)$$

$$\Delta b_j(t+1) = -\eta \frac{\partial E}{\partial b_j(t)} + \alpha \Delta b_j(t) \qquad (4.6)$$

Here the error function E is taken as the Mean Squared Error (MSE),

$$E = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \qquad (4.7)$$

where y is the actual output value, N is the number of training samples, and η and α are the learning rate and momentum rate respectively.
4. Steps 2 and 3 are repeated until the error E reaches the specified convergence criterion (a code sketch of the forward pass follows).
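As an illustration of Eqs. (4.1) and (4.2), here is a minimal NumPy sketch of the unencrypted forward pass (the dimensions and the random initialization on (0, 1) follow step 1; the variable names are ours):

```python
import numpy as np

def gaussian_wavelet(t):
    # Mother wavelet used as activation, Eq. (4.1): f(t) = exp(-t^2)
    return np.exp(-t ** 2)

def wnn_forward(x, w, W, b, a):
    # x: (nin,) one input sample; w: (nin, nhn) input-to-hidden weights
    # W: (nhn,) hidden-to-output weights; b, a: (nhn,) translation/dilation
    t = (x @ w - b) / a                  # translated, dilated input per wavelon
    return W @ gaussian_wavelet(t)       # Eq. (4.2)

rng = np.random.default_rng(0)
nin = nhn = 6                            # hidden nodes kept equal to input nodes
w = rng.uniform(0, 1, (nin, nhn))
W, b, a = (rng.uniform(0, 1, nhn) for _ in range(3))
print(wnn_forward(rng.uniform(0, 1, nin), w, W, b, a))
```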
4.4.4 Proposed Privacy-Preserving Wavelet Neural Network

In this chapter, we propose a novel privacy-preserving neural network architecture in the form of a fully homomorphic encrypted wavelet neural network. We implemented FHE using the TenSEAL library [34] (https://github.com/OpenMined/TenSEAL). It provides a Python API while maintaining efficiency, as most of its operations are implemented in C++. It performs encryption and decryption on vectors of real numbers using the CKKS scheme, and supports operations such as addition, subtraction, multiplication, and dot products on encrypted vectors.

In our architecture, we keep the number of hidden nodes equal to the number of input nodes, because model complexity grows with the number of hidden nodes. Decreasing the number of hidden nodes would degrade model performance; increasing them might improve performance, but at the cost of higher computational and time complexity.
The activation function works well on unencrypted data, but since we want to work with encrypted data, implementing the exponential activation function is computationally expensive. For this reason, we approximate the activation function using a Taylor series expansion, which yields:

$$f(t) = 1 - t^2 + 0.5\,t^4 \qquad (4.8)$$

where $t = \left(\sum_{i=1}^{nin} w_{ij} x_{ki} - b_j\right) / a_j$.
Accordingly, Eq. 4.2 also gets approximated. In this architecture, the weights between the input and hidden nodes w_ij, the weights between the hidden and output nodes W_j, the translation parameter b_j, and the dilation parameter a_j are all encrypted along with the input data (a sketch of the approximated activation on encrypted data follows).
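As a minimal sketch (the parameter choices here are illustrative, not the experimental settings of Sect. 4.6), the degree-4 polynomial of Eq. (4.8) can be evaluated directly on a CKKS-encrypted vector with TenSEAL's polyval, whose coefficients are listed from the constant term upward:

```python
import numpy as np
import tenseal as ts

# Context with enough multiplicative depth for a degree-4 polynomial
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[40, 21, 21, 21, 21, 21, 40])
ctx.global_scale = 2 ** 21

t = np.array([0.0, 0.5, 1.0])
enc_t = ts.ckks_vector(ctx, t.tolist())

approx = enc_t.polyval([1.0, 0.0, -1.0, 0.0, 0.5])  # Eq. (4.8)
print(approx.decrypt())   # ≈ [1.0, 0.78125, 0.5]
print(np.exp(-t ** 2))    # exact: [1.0, 0.7788, 0.3679]
```

The comparison illustrates a standard property of truncated Taylor series: the approximation is tight near t = 0 and drifts as |t| grows.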
The training and test phases are carried out on the encrypted data; only the parameters are decrypted and re-encrypted after every update, to reduce the computational complexity of the model. When the entire encrypted training set is passed to the encrypted model, the computational time grows with the number of samples, and the model would take a very long time to train. To overcome this problem we used an optimization technique called mini-batch Stochastic Gradient Descent (SGD) [35].

Mini-batch SGD divides the encrypted training data into mini-batches of random samples of a specified size in every epoch (see the sketch below). Our encrypted model is trained on these mini-batches instead of the whole encrypted training set, and the parameters are updated after every mini-batch. Using mini-batch SGD thus reduces the time taken by the model immensely. Training stops once the model reaches the convergence criterion, i.e., when the change in the error becomes smaller than 0.0001, or when the model reaches the maximum accuracy, defined as the highest accuracy achieved by the unencrypted model. The block diagram in Fig. 4.3 depicts the process flow of the proposed privacy-preserving WNN.
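A small sketch of the mini-batch index selection (our helper, not part of TenSEAL; it shuffles sample indices once per epoch and yields consecutive slices):

```python
import numpy as np

def minibatch_indices(n_samples, batch_size, rng):
    # Shuffle once per epoch, then yield index slices of size batch_size
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
for batch in minibatch_indices(40, 6, rng):   # e.g., Turkish dataset settings
    pass  # encrypt/select these samples and run one parameter update
```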
Fig. 4.3 Block diagram of the proposed PPWNN

Algorithms 1 and 2 describe the training and testing procedures of the encrypted WNN.

Algorithm 1 Training the Encrypted WNN

Input:
  Encrypted training data: E(X), where X = (X_1, X_2, X_3, …, X_n)
  Weights: w_ij, W_j
  Translation and dilation parameters: b_j, a_j
  Batch_size, learning rate (η) and momentum (α)
Output:
  Encrypted predictions: E(ŷ)
  Updated encrypted weights: E(w_ij), E(W_j)
  Updated encrypted translation and dilation parameters: E(b_j), E(a_j)

Function TrainEncryptedWNN(E(X), w_ij, W_j, b_j, a_j)
1.  While true
2.    if ΔE >= 0.0001 or accuracy <= max_accuracy
3.      Encrypt the parameters w_ij, W_j, b_j, a_j
        Divide E(X) into random batches of the specified Batch_size
4.      for each sample in Batch_size
5.        Generate E(ŷ) with (4.2)
6.        Calculate the MSE using (4.7) and the derivatives for each sample
7.      Add all derivatives
        Update the parameters E(w_ij), E(W_j), E(b_j), E(a_j) with (4.3)–(4.6)
8.      Decrypt the parameters E(w_ij), E(W_j), E(b_j), E(a_j)
9.    else
10.     Break

In the above algorithm, ΔE is the change in the Mean Squared Error between the current batch and the previous batch, and max_accuracy is the maximum accuracy obtained by the unencrypted model.

Algorithm 2 Testing the Encrypted WNN

Input:
  Encrypted test data: E(X′), where X′ = (X′_1, X′_2, X′_3, …, X′_n)
  Updated encrypted weights: E(w′_ij), E(W′_j)
  Updated encrypted translation and dilation parameters: E(b′_j), E(a′_j)
Output:
  Encrypted predictions: E(ŷ)

Function TestEncryptedWNN(E(X′), E(w′_ij), E(W′_j), E(b′_j), E(a′_j))
1. for each sample in Batch_size
2.   Generate E(ŷ) with (4.2)
3. Decrypt the predictions and calculate the accuracy and AUC.
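To make the core step of both algorithms concrete, here is a hedged Python sketch of one encrypted forward pass (Eq. (4.2) with the approximated activation of Eq. (4.8)) using TenSEAL. The function name and the choice to keep the reciprocal dilations 1/a_j in plaintext are our illustrative simplifications; the chapter's actual protocol encrypts the dilation parameters as well:

```python
import tenseal as ts  # assumes a CKKS context set up as in Sect. 4.6

def encrypted_forward(enc_x, enc_w, enc_W, enc_b, inv_a):
    # enc_x   : ts.CKKSVector, encrypted feature vector of one sample
    # enc_w[j]: ts.CKKSVector, encrypted input-to-hidden weights of wavelon j
    # enc_W[j], enc_b[j]: one-slot CKKSVectors (encrypted scalars)
    # inv_a[j]: plain float 1/a_j (illustrative simplification)
    y_hat = None
    for j in range(len(enc_w)):
        t = (enc_x.dot(enc_w[j]) - enc_b[j]) * inv_a[j]
        h = t.polyval([1.0, 0.0, -1.0, 0.0, 0.5])   # f(t) of Eq. (4.8)
        term = h * enc_W[j]
        y_hat = term if y_hat is None else y_hat + term
    return y_hat  # E(ŷ), decrypted only by the secret key owner
```

Note that the ciphertext-ciphertext dot product requires Galois (rotation) keys to have been generated for the context.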

4.5 Datasets Description

The features of all the datasets are presented in the Appendix.

4.5.1 Qualitative Bankruptcy Dataset

This dataset has 250 instances and 7 features including the target variable, namely whether a bank is bankrupt or non-bankrupt [36]. Of the 250 instances, 143 are non-bankrupt banks and 107 are bankrupt. The description of the features is provided in Table 4.3.
4 Fully Homomorphic Encrypted Wavelet Neural Network … 109

4.5.2 Spanish Banks Dataset

This dataset has 66 instances and a total of 10 features including the target variable [37]. Of the 66 instances, 37 are bankrupt banks and 29 are healthy ones.

4.5.3 Turkish Banks Dataset

The Turkish banks dataset consists of 40 instances and 9 features including the target variable [38]. Of the 40 instances, 22 are banks that went bankrupt and 18 are banks that remained healthy.

4.5.4 UK Banks Dataset

The UK banks dataset has 60 instances with 10 features [39]; 30 banks were bankrupt and 30 were healthy.

4.6 Results and Discussion

Traditional neural networks often involve numerous parameters and complex archi-
tectures, resulting in high time complexity for both training and inference. When
combined with FHE, where all parameters are encrypted, this time complexity is
further increased. However, WNNs offer a distinct advantage in terms of param-
eter reduction through multiresolution analysis. This reduction in network parame-
ters compared to other neural networks leads to faster training and inference times,
making WNNs computationally efficient even when the parameters are encrypted
using FHE.
By leveraging the multiresolution analysis inherent in WNNs, our research takes
advantage of the computational efficiency of the network architecture. This allows
us to mitigate the time complexity challenges associated with using FHE in neural
networks. The reduction in the number of parameters, combined with the unique
capabilities of WNNs, enables us to effectively apply FHE in the training and infer-
ence processes of WNNs, opening new possibilities for privacy-preserving machine
learning applications.
All the experiments are carried out on a system with the following configuration: an HP Z8 workstation with an Intel Xeon(R) Gold 6235R CPU, Ubuntu 20.04 LTS, and 376.6 GB of RAM. The number of hidden nodes is kept the same as the number of input nodes. Accuracy and the Area Under the Receiver Operating Characteristic Curve (AUC) are taken as the performance metrics.

Table 4.1 Hyperparameters for the datasets

Datasets                 Momentum   Learning rate   Batch size
Qualitative bankruptcy   0.5        0.05            16
Spanish banks dataset    0.5        0.05            8
Turkish banks dataset    0.5        0.05            6
UK banks dataset         0.5        0.05            5
The polynomial modulus degree and the coefficient modulus sizes were taken as 16384 and [42, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36] respectively. The global scale was taken as 2^20. The same encryption parameters were used for all the datasets. A polynomial modulus degree of 16384 provides a maximum bit count of 438 bits for the coefficient modulus, which means that the sum of all the values in the coefficient modulus must be less than or equal to 438. The values at indices 1–10 are called the intermediate primes; they are responsible for rescaling the ciphertext and also indicate the number of multiplications supported by the scheme. Rescaling keeps the scale constant and reduces the noise present in the ciphertext. The bit size of the intermediate primes should be greater than or equal to that of the global scale; in our case, we chose a global scale of 2^20 (20 bits) and intermediate primes of 36 bits. The size of the plaintext is bounded by the first value in the coefficient modulus, which is 42 in our scenario. The last prime should be as large as the other primes in the coefficient modulus.
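A sketch of this context construction with TenSEAL (the calls are standard TenSEAL usage; whether the original implementation generated Galois keys at this point is our assumption):

```python
import tenseal as ts

# Bit budget check: 42 + 11 * 36 = 438 <= 438 (the maximum for degree 16384)
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=16384,
    coeff_mod_bit_sizes=[42] + [36] * 11,
)
context.global_scale = 2 ** 20
context.generate_galois_keys()   # needed for encrypted dot products
```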
In the Qualitative Bankruptcy dataset, all the features are categorical (in textual format); we converted the labels of all the features into numeric form. In the UK and Turkish datasets, the predictor variable was in textual format and was converted to numeric form. The hyperparameters for the datasets are presented in Table 4.1.
On these datasets, the unencrypted and encrypted models performed almost identically, with nearly equal Accuracy and AUC. The results are presented in Table 4.2. Notably, the PPWNN yielded a higher AUC than the unencrypted version of the WNN, which is a significant result of the study.

4.7 Conclusions and Future Work

A fully homomorphic encrypted Wavelet Neural Network is proposed, with SGD and an approximated activation function. The model provides high security because, along with the input data, the predictions and the parameters (weights, translation, and dilation parameters) are also encrypted. This PPWNN is applied to the bankruptcy prediction problem, where privacy is critical.
Table 4.2 AUC obtained by unencrypted and encrypted WNN

Datasets                 Unencrypted mini-batch SGD WNN (AUC)   PPWNN (AUC)   Average time per epoch (PPWNN)
Qualitative bankruptcy   0.50                                   0.55          3 min 25 s
Spanish banks dataset    0.52                                   0.79          3 min 3 s
Turkish banks dataset    0.50                                   0.62          3 min 8 s
UK banks dataset         0.83                                   0.83          3 min 9 s

Table 4.3 Qualitative bankruptcy dataset

Feature description
1. Industrial risk
2. Management risk
3. Financial flexibility
4. Credibility
5. Competitiveness
6. Operating risk
7. Class

Results on four distinct datasets emphasize the importance of PPML in financial research problems as well.
There is, however, a limitation to the model: the average time per epoch increases with the number of samples and features. Moreover, homomorphically encrypted networks require high computational power for efficient execution. Finally, we can conclude that the data is well protected throughout the process, but at a cost in resources and time.

The idea of a privacy-preserving WNN can be extended to a Federated Learning setup, where the model used on the individual nodes can be our PPWNN, ensuring that the privacy of the data is protected even on the individual machines. Further, this PPWNN can also be employed to solve regression problems, owing to the versatility of the WNN. The main limitation of the proposed PPWNN is that it is very slow, with a long convergence time even for moderate-sized datasets, which is not acceptable in today's world, replete with big, high-dimensional datasets. To overcome this limitation, we plan in future to implement the PPWNN in a cluster environment under Apache Spark.

Appendix: Datasets Description

This section provides details of the features of the datasets analyzed during the research (Tables 4.3, 4.4, 4.5 and 4.6).
Table 4.4 Financial ratios of Spanish banks dataset

S.No.  Predictor variables                 Acronym
1      Current assets/Total assets         CA/TA
2      Current assets-cash/Total assets    CAC/TA
3      Current assets/Loans                CA/L
4      Reserves/Loans                      R/L
5      Net income/Total assets             NI/TA
6      Net income/Total equity capital     NI/TEC
7      Net income/Loans                    NI/L
8      Cost of sales/Sales                 CS/S
9      Cash flow/Loans                     CF/L

Table 4.5 Financial ratios of Turkish banks dataset

S.No.  Predictor variables                                                                   Acronym
1      Interest expenses/Average profitable assets                                           IE/APA
2      Interest expenses/Average non-profitable assets                                       IE/ANA
3      (Share holders equity + Total income)/(Deposits + Non-deposit funds)                  (SE + TI)/(D + NF)
4      Interest income/Interest expenses                                                     II/IE
5      (Share holders equity + Total income)/Total assets                                    (SE + TI)/TA
6      (Share holders equity + Total income)/(Total assets + Contingencies and commitments)  (SE + TI)/(TA + CC)
7      Networking capital/Total assets                                                       NC/TA
8      (Salary and employees benefits + Reserve for retirement)/No. of personnel             (SEB + RR)/P
9      Liquid assets/(Deposits + Non-deposit funds)                                          LA/(D + NF)
10     Interest expenses/Total expenses                                                      IE/TE
11     Liquid assets/Total assets                                                            LA/TA
12     Standard capital ratio                                                                SCR
Table 4.6 Financial ratios of UK banks dataset

S.No.  Predictor variables                                                          Acronym
1      Sales                                                                        Sales
2      Profit before tax/Capital employed (%)                                       PBT/CE
3      Funds flow/Total liabilities                                                 FF/TL
4      (Current liabilities + Long term debts)/Total assets                         (CL + LTD)/TA
5      Current liabilities/Total assets                                             CL/TA
6      Current assets/Current liabilities                                           CA/CL
7      Current assets-stock/Current liabilities                                     CA-S/CL
8      (Current assets − Current liabilities)/Total assets                          (CA − CL)/TA
9      LAG (number of days between account year end and the date of annual report)  LAG
10     Age                                                                          Age

References

1. Al-Rubaie, M., Chang, J.M.: Privacy-preserving machine learning: threats and solutions. IEEE
Secur. Priv. 17(2), 49–58 (2019)
2. Truong, N., Sun, K., Wang, S., Guitton, F., Guo, Y.K.: Privacy preservation in federated learning:
an insightful survey from the GDPR perspective, Comput. Secur. 110, 102402 (2021). ISSN
0167-4048
3. Stallings, W.: Handling of personal information and deidentified, aggregated, and
pseudonymized information under the California consumer privacy act. IEEE Secur. Priv.
18(1), 61–64 (2020)
4. Chik, W.: The Singapore Personal Data Protection Act and an assessment of future trends in
data privacy reform. Comput. Law Secur. Rev. 29, 554–575 (2013)
5. Xu, R., Baracaldo, N., Joshi, J.: Privacy-preserving machine learning: methods, challenges and
directions (2021). arXiv preprint arXiv:2108.04417
6. Altman, E.I.: Financial ratios, discriminant analysis and the prediction of corporate bankruptcy.
J. Finance 23, 589–609 (1968)
7. Wilson, R.L., Sharda, R.: Bankruptcy prediction using neural networks. Decis. Support Syst.
11, 545–557 (1994)
8. Kumar, P.R., Ravi, V.: Bankruptcy prediction in banks and firms via statistical and intelligent
techniques—a review. Eur. J. Oper. Res. 180(1), 1–28 (2007)
9. Cole, R., Gunther, J.: A CAMEL rating’s shelf life. Federal Reserve Bank of Dallas Review,
pp. 13–20 (1995)
10. Nikolaenko, V., Weinsberg, U., Ioannidis, S., Joye, M., Boneh, D., Taft, N.: Privacy-preserving
ridge regression on hundreds of millions of records. In: 2013 IEEE Symposium on Security
and Privacy, pp. 334–348 (2013)
11. Chabanne, H., De Wargny, A., Milgram, J., Morel, C., Prouff, E.: Privacy-preserving
classification on deep neural network. Cryptology ePrint Archive (2017)
12. Xie, P., Bilenko, M., Finley, T., Gilad-Bachrach, R., Lauter, K., Naehrig, M.: Crypto-nets:
neural networks over encrypted data (2014). arXiv preprint arXiv:1412.6181
13. Chen, H., Gilad-Bachrach, R., Han, K., et al.: Logistic regression over encrypted data from
fully homomorphic encryption. BMC Med. Genomics 11, 81 (2018)
14. Cheon, J.H., Kim, D., Kim, Y., Song, Y.: Ensemble method for privacy-preserving logistic
regression based on homomorphic encryption. IEEE Access 6, 46938–46948 (2018)
15. Bellafqira, R., Coatrieux, G., Genin, E., Cozic, M.: Secure multilayer perceptron based on
homomorphic encryption. In: Yoo, C., Shi, Y.Q., Kim, H., Piva, A., Kim, G. (eds.) Digital
Forensics and Watermarking. IWDW. Lecture Notes in Computer Science, vol. 11378. Springer,
Cham (2019)
16. Nandakumar, K., Ratha, N., Pankanti, S., Halevi, S.: Towards deep neural network training on
encrypted data. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), pp. 40–48 (2019)
17. Halevi, S., Shoup, V.: Design and implementation of HElib: a homomorphic encryption library.
Cryptology ePrint Archive (2020)
18. Sun, X., Zhang, P., Liu, J.K., Yu, J., Xie, W.: Private machine learning classification based on
fully homomorphic encryption. IEEE Trans. Emerg. Top. Comput. 8(2), 352–364 (2020)
19. Qiu, G., Gui, X., Zhao, Y.: Privacy-preserving linear regression on distributed data by
homomorphic encryption and data masking. IEEE Access 8, 107601–107613 (2020)
20. Bonte, C., Vercauteren, F.: Privacy-preserving logistic regression training. BMC Med.
Genomics 11, 86 (2018)
21. Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. Cryptology ePrint
Archive (2012)
22. Cheon, J.H., Kim, A., Kim, M., Song, Y.: Homomorphic encryption for arithmetic of approxi-
mate numbers. In: International Conference on the Theory and Application of Cryptology and
Information Security, pp. 409–437. Springer, Cham (2017)
23. Acar, A., Aksu, H., Uluagac, A.S., Conti, M.: A survey on homomorphic encryption schemes:
theory and implementation. ACM Comput. Surv. 51(4), Article 79 (2018)
24. Nisha, S., Farik, M.: RSA public key cryptography algorithm—a review. Int. J. Sci. Technol.
Res. 6, 187–191 (2017)
25. Haraty, R.A., Otrok, H., El-Kassar, A.N.: A comparative study of Elgamal based cryptographic
algorithms. In: ICEIS 2004-Proceedings of the Sixth International Conference on Enterprise
Information Systems, pp. 79–84 (2004)
26. Nassar, M., Erradi A., Malluhi, Q.M.: Paillier’s encryption: implementation and cloud appli-
cations. In: 2015 International Conference on Applied Research in Computer Science and
Engineering (ICAR), pp. 1–5 (2015)
27. Chialva, D., Dooms, A.: Conditionals in homomorphic encryption and machine learning
applications (2018). arXiv preprint arXiv:1810.12380
28. Gentry, C.: A fully homomorphic encryption scheme. Stanford University (2009). https://crypto.stanford.edu/craig/craig-thesis.pdf
29. van Dijk, M., Gentry, C., Halevi, S., Vaikuntanathan, V.: Fully homomorphic encryption over
the integers. In: Gilbert, H. (eds.) Advances in Cryptology—EUROCRYPT 2010. Lecture
Notes in Computer Science, vol. 6110. Springer, Berlin (2010)
30. Brakerski, Z., Vaikuntanathan, V.: Fully homomorphic encryption from ring-LWE and security
for key dependent messages. In: Proceedings of the 31st Annual Conference on Advances in
Cryptology (CRYPTO’11), pp. 505–524. Springer, Berlin (2011)
31. López-Alt, A., Tromer, E., Vaikuntanathan, V.: On-the-fly multiparty computation on the cloud
via multikey fully homomorphic encryption. In: Proceedings of the Forty-Fourth Annual ACM
Symposium on Theory of computing (STOC’12). Association for Computing Machinery, New
York, NY, USA, pp. 1219–1234 (2012)
32. Zhang, Q., Benveniste, A.: Wavelet networks. IEEE Trans. Neural Networks 3(6), 889–898
(1992)
33. Kumar, K.V., Ravi, V., Carr, M., Kiran, N.R.: Software development cost estimation using
wavelet neural networks. J. Syst. Software 81(11), 1853–1867 (2008). ISSN 0164-1212
34. Benaissa, A., Retiat, B., Cebere, B., Belfedhal, A.E.: Tenseal: a library for encrypted tensor
operations using homomorphic encryption (2021). arXiv preprint arXiv:2104.03152
35. Qian, X., Klabjan, D.: The impact of the mini-batch size on the variance of gradients in
stochastic gradient descent (2020). arXiv preprint arXiv:2004.13146.
36. Kim, M.J., Ingoo, H.: The discovery of experts’ decision rules from qualitative bankruptcy data
using genetic algorithms. Expert Syst. Appl. 25(4), 637–646 (2003). ISSN 0957-4174
37. Olmeda, I., Fernández, E.: Hybrid classifiers for financial multicriteria decision making: The
case of bankruptcy prediction. Comput. Econ. 10, 317–335 (1997)
38. Canbas, S., Cabuk, A., Kilic, S.B.: Prediction of commercial bank failure via multivariate
statistical analysis of financial structures: The Turkish case. Eur. J. Operat. Res. 166(2), 528–546
(2005)
39. Beynon, M.J., Peel, M.J.: Variable precision rough set theory and data discretisation: an
application to corporate failure prediction. Omega 29, 561–576 (2001)
Chapter 5
Tools and Measurement Criteria
of Ethical Finance Through
Computational Finance

Marco Piccolo and Francesco Vigliarolo

Abstract This chapter aims to offer the reader a critical reflection on computational finance starting from the principles of ethical finance. By this term, we refer to the principles that arose from the 1970s onwards, which propose to implement socio-environmental values in financial activities, from savings to employment, also in response to the process of financialization of the economy that has removed finance itself from the real life of local populations. Starting from a critical analysis of the economic positivism that introduced the massive use of mathematics in economics, a reflection is proposed on the concept of financial accounting and on the role of the real purchasing power of wages, in order to create a financial system that once again sets a socio-environmental horizon toward which the economy must strive. On these premises, financial tools based on the principles of ethical finance are proposed, together with a discussion of how they can promote a process that we call economic socialization, that is, allowing finance to carry forward again the social and environmental values necessary for the life of local communities. Accordingly, the first section introduces the concept and problems of economic positivism and how the process of financialization of the world economy, with its impact on the financial system, has unfolded since 1970. In this context, some theoretical concepts are proposed, such as the purchasing power of wages as a means to tie finance to the workforce. The second section introduces the principles of ethical finance, from their birth to the present day. The third introduces some financial and socio-environmental measurement tools and models of ethical finance that could be incorporated into computational finance. Finally, the fourth offers some conclusions drawn from the arguments set out, including the process of economic socialization, that is, how finance is called to carry forward the socio-environmental values of local communities, on pain of losing the conditions of real well-being for our societies. In this scenario, it is proposed that finance must respond to a demand for peoples' rights.

M. Piccolo
Ethic Bank Foundation, London, United Kingdom
F. Vigliarolo (B)
UNESCO CHAIR, National University of La Plata; UCALP; UBA, La Plata, Argentina
e-mail: [email protected]


Keywords Finance · Ethic · Computational · Socialization · Rights

5.1 Introduction

The objective of the theoretical research presented in this chapter is to define criteria that re-establish an ethical dimension in the economy, and in particular in finance, which can be applied to computational finance. The relationship between ethics and computational finance allows us to restore to economics, and to computational finance itself, a vision that incorporates social intelligibility, escaping the positivist straitjacket that defines it only in terms of reasoning and quantitative mathematical dimensions.

On these premises, this chapter aims to offer the reader a critical reflection on computational finance starting from the principles of ethical finance. By this term, we refer to the principles that arose from the 1970s onwards, which propose to implement socio-environmental values in financial activities, from savings to employment, also in response to the process of financialization of the economy that has removed finance itself from the real life of local populations. Starting from a critical analysis of the economic positivism that introduced the massive use of mathematics in economics, a reflection is proposed on the concept of financial accounting and on the role of the real purchasing power of wages, in order to create a financial system that once again sets a socio-environmental horizon toward which the economy must strive. On these premises, financial tools based on the principles of ethical finance are proposed, together with a discussion of how they can promote a process that we call economic socialization, that is, allowing finance to carry forward again the social and environmental values necessary for the life of local communities.

Accordingly, the first section introduces the concept and problems of economic positivism and how the process of financialization of the world economy, with its impact on the financial system, has unfolded since 1970. In this context, some theoretical concepts are proposed, such as the purchasing power of wages as a means to tie finance to the workforce. The second section introduces the principles of ethical finance, from their birth to the present day. The third introduces some financial and socio-environmental measurement tools and models of ethical finance that could be incorporated into computational finance. Finally, the fourth offers some conclusions drawn from the arguments set out, including the process of economic socialization, that is, how finance is called to carry forward the socio-environmental values of local communities, on pain of losing the conditions of real well-being for our societies. In this scenario, it is proposed that finance must respond to a demand for peoples' rights.
5.2 Ethical Finance, Principles and Operating Criteria

If today we can talk about ethical finance, it is thanks to the diffusion of hundreds and hundreds of initiatives around the world over the last fifty years, carried forward by a movement characterized by a vision of development in which "money" is closely linked to human and social growth even before economic growth, that is, by a use of money consistent with the values of a community ethic. In other words, this does not mean that a responsible use of money, and its implications, is being demanded only today; this demand has always existed, more or less explicitly, probably since the times in which human communities began to use that medium of exchange which is money (we can find it in Aristotle too¹).

This is an important premise because it allows us to understand how the ideas that generate change, and we are convinced that this also applies to ethical finance, are not born only out of a particularly brilliant invention, but rather respond to the questions of meaning that people ask when they feel they are an active and responsible part of their community. Hence the political value of ethical finance, which takes nothing away from the ideal value of individual experiences but which, as a movement that networks many experiences on a global scale, lays the foundations for a review of economic and financial instruments according to a new idea of "development", thus breaking the molds of a dated and outdated reading of the world (north against south, developed countries against less developed countries, etc.) and opening up new scenarios of cooperation between communities.

Until a few years ago, those who became aware of ethical finance struggled to see its financial implications, tending to treat it as an activity falling within the categories of the spirit rather than economic ones. In grasping above all its social, solidarity and ecological aspects, there was a risk that this initiative would be interpreted as a sophisticated form of philanthropy and/or charity rather than a proposal capable of also guaranteeing economic value. On the other hand, there were those who, imbued with mainstream financial culture, criticized the proposal of ethical finance because, having to respond also to non-economic values, it was not effective in its ability to generate value. Unfortunately, this "rational" approach, albeit legitimate in its assumptions, has not been able to prevent a substantial part of finance from changing its "skin", becoming increasingly autonomous in its ability to generate wealth independently of its support for actual economic activity; it is no coincidence that numerous authors expressly speak of the "financialisation" of the economy, of detachment from the real economy, and of the prevalence of income from capital over income from the workforce. However, these "external" critical readings have not blocked a movement which, in various parts of the world, is instead taking hold and, above all, is changing the culture of participation, as it connects the classic instruments of democracy (voting, delegation, choice of one's own representatives, adhesion to intermediate bodies, etc.) with choices in the field of consumption, savings, mobility, and work.

1 Aristotle [1].

In this part of the chapter we will try to analyze the principles of ethical finance from the movement of the 1970s onwards, even if we find interesting experiences, in the sense that they reached a considerable size, already in the first half of the last century, such as some American investment funds promoted by institutions of Protestant faith (see the Pioneer Fund of the Mennonites, 1928…), or those used by the movements against the war in Vietnam and against nuclear power (civil and military). All these investment funds, whether driven by religious or by secular and social motivations, were however characterized by negative criteria (no weapons, no alcohol, no prostitution, no politics, etc.) and not positive ones. Furthermore, when we speak of ethical finance, we cannot fail to mention other banking experiences with a strong social vocation born towards the end of the nineteenth century, such as rural and artisan banks, an expression of a Christian social culture (see Raiffeisen2), and popular banks, secular and socialist-inspired.
From the 1970s onwards, things began to change, and above all experiences were no longer concentrated in Anglo-Saxon countries, where the culture of financial investment was more widespread among the population. The first major oil crisis, the rejection of the logic of the cold war, dissatisfaction with a neo-liberal production model governed by multinationals, the increasingly serious gap between the north and the south of the world, etc., gave new stimulus to the ethical finance movement, and people began to wonder not only about what not to finance but also about what to finance; if you do not agree with a certain way of producing and distributing wealth, the time has come to combine a protest action, typical of movements, with a proposal, that is, to offer citizens the opportunity to make economic choices (produce, consume, save, move) in a manner consistent with their own ethics. It is for this reason that the movement of ethical finance can only be understood by recognizing its complementarity with other initiatives, likewise on a global scale, such as fair trade, organic agriculture, and the solidarity and civil economy (in its various expressions). In fact, a careful observer will not fail to notice how these experiences arose from an awareness, and a consequent change of strategy, of many of the social movements that characterized the second half of the last century: from the movement for peace and nonviolence to the environmentalist one, from that of human and social rights to that of North/South cooperation, and, more generally, to a worldwide ramification of the cooperative movement. Years of struggles, campaigns and actions, though important for changing the sensitivity of public opinion, have changed very little in the balance of power between those who manage economic and financial power and those who suffer it. Hence the importance of applying these non-economic values also, and precisely, in economic activities, starting from the assumption that there cannot be schizophrenia in our way of thinking and acting, almost a double morality used in different ways depending on the context in which we find ourselves: ethical/solidarity-based in social life, oriented towards maximizing profit in economic life. The common thread of these movements is the awareness that only by putting a

2 Raiffeisen is committed to a strong local economy with cultural, sporting and tourism initia-
tives from which Raiffeisen members and YoungMemberPlus clients can benefit in the form of
MemberPlus supplementary services. See https://www.raiffeisen.ch/rch/it.html.
hand on economic and financial mechanisms for a more equitable distribution of
resources and wealth can we counteract the dynamics that are currently generating exclusion, poverty, exploitation, destruction of the planet, and ever larger migratory flows.
This awareness, however, is not limited to identifying the "guilty" among the great economic, financial and political powers; it also recalls the responsibility of citizens for the non-economic consequences of their choices in consumption, savings management, and work. If the economy is important because it provides answers to our "material" needs, it is important that this does not contradict our vision of values and ethics. It is thus understandable that one of the most important motivations behind ethical finance was the refusal to entrust one's savings to financial intermediaries who, in pursuit of maximum return, did not hesitate to invest them in sectors valued only for the greatest possible profit, with no regard for how much these investments impacted the social and environmental dimension of communities: the protection of rights, respect for nature, the promotion of peace and non-violence, the support of the weakest groups, decent working conditions, and so on. While there is a world movement of ethical finance, there is no single or absolute doctrine behind it, as this movement has developed on the basis of the socio-economic and political contexts of the various territories, trying to respond to the social and ecological aspirations that local opinion movements expressed. In some countries, such as those of northern Europe, characterized by better socio-economic conditions, the orientation was more towards environmental protection; in others, such as Italy, strongly characterized by social sensitivity, ethical finance was instead more oriented towards the support of social and international cooperation, and towards peace and nonviolence. It is no coincidence that the first experiences of alternative finance in Italy, such as the MAGs (mutual self-management cooperatives), and subsequently Banca Etica, have as their main motivation the commitment not to finance either the production and trade of arms or dictatorial and non-democratic regimes.
The adoption of positive criteria has involved, at least for many of the main experiences, a change in the very mission of ethical finance, stimulating it to overcome a limited or residual vision of the economy and to try to understand if and how, holding together the various social and environmental aspects of the movements, it is possible to hypothesize an economic proposal which, albeit in the plurality of its various expressions, is functional to real human, social and environmental growth of society. The goal is therefore to get out of the strongly sectoral, fragmented and often even self-referential approach of the movements, in order to promote an economic culture based on a new concept of value, in which the economic dimension is closely linked to the social and environmental dimensions. This concept is well expressed in the papal encyclical Laudato Si',3 where it is clearly stated that the challenges facing humanity today require finance to support only an economy that integrates with a choice of integral ecology. Hence the importance for those involved in ethical finance

3 Pope Francis [11]. See https://www.vatican.va/content/francesco/en/encyclicals/documents/papa-francesco_20150524_enciclica-laudato-si.html.

to constantly ask themselves about the relationship between economic development and human growth, subordinating their credit and investment choices to economic activities whose effects are not limited to the interest of the individual (be it a person, a family, or a company) but also contemplate a benefit for the community.
So far we have mainly dealt with the vision of ethical finance, which can be considered the common denominator of many experiences, almost all born with the aim not so much of making finance an end in itself but rather of setting in motion processes of sustainable development, of active and responsible citizenship, and of emancipation from situations of poverty and exploitation. These are experiences animated by a desire for change and therefore with a strong political value, in the awareness that every investment and every loan disbursement implies a choice as to what kind of world we want to help build.
Having clarified this, it is also important to understand how the ethical finance movement has made its commitment concrete, and here we find a huge variety of forms and instruments: from the credit union to the guarantee consortium, from the fully fledged bank to the microcredit and microfinance network, from the management of mutual funds to closed-end funds. Almost all are characterized by a cooperative approach, or in any case by popular participation, disconnected from the large financial and economic groups and instead closely connected to civil society organizations, often networked with the actors of a new economic culture which, moving beyond the profit/non-profit dichotomy, proposes to hold economic activity and ethical tension together: we speak of a circular, transformative, civil, solidarity economy, of communion, of the common good, to name the most widespread, and although each adjective corresponds to a specific stream of experiences, we can say that all move in one direction: to generate the future!
It is difficult to give precise numbers on how many actors in ethical finance there are today in the world; however, it is significant that the various networks and federations that connect them are worldwide. Here is a short list in chronological order: Inaise,4 Febea,5 Gabv,6 and the various microcredit networks.
We conclude with a summary table taken from a document of the Banca Etica Group called "Ethical Finance and Sustainable Finance: two models compared", in which the peculiarities of doing ethical finance with respect to sustainable finance are clearly summarized. This clarification is necessary today because, despite the awareness that the relevant European legislation is also the result of 40 years of ethical finance, there is a strong risk of legitimizing a merely formal change in mainstream finance, creating confusion with those who apply these criteria in depth.7 Below is a summary of the principles of ethical finance.8

4 Association Internationale des Investisseurs dans l'Économie sociale. See https://inaise.org.
5 European Federation of Ethical and Alternative Banks and Financiers. See https://febea.org.
6 Global Alliance for Banking on Values (GABV). See https://www.gabv.org.
7 Extract of the complete text at the link Finanza etica e finanza sostenibile due modelli.
8 THE ETHICAL FINANCE MANIFESTO, edited by the Ethical Finance Association, 1998. Ethically oriented finance:
1. Believes that credit, in all its forms, is a human right. It does not discriminate among recipients of funds on the basis of sex, ethnicity or religion, nor on the basis of assets, thus caring for the rights of the poor and the marginalized. It therefore finances human, social and environmental promotion activities, evaluating projects with the dual criteria of economic viability and social utility. Loan guarantees are another way for partners to take responsibility for financed projects: ethical finance considers forms of personal, category or community guarantee that allow access to credit even for the weakest sections of the population to be just as valid as patrimonial guarantees.
2. Considers efficiency a component of ethical responsibility. It is not a form of charity: it is an economically viable activity that intends to be socially useful. Assuming responsibility, both in making one's savings available and in using them to preserve their value, is the foundation of a partnership between subjects of equal dignity.
3. Does not consider enrichment based solely on the possession and exchange of money to be legitimate. The interest rate, in this context, is a measure of efficiency in the use of savings, a measure of the commitment to safeguard the resources made available by savers and to make them bear fruit in vital projects. Consequently, the interest rate, the return on savings, is different from zero but must be kept as low as possible, on the basis of economic as well as social and ethical evaluations.
4. Is transparent. The ethical financial intermediary has the duty to treat with confidentiality the information on savers that it comes to possess in the course of its activity; at the same time, the transparent relationship with the customer requires that savings be nominative. Depositors have the right to know the financial institution's operating processes and its lending and investment decisions. It is the responsibility of the ethically oriented intermediary to make appropriate information channels available to ensure transparency over its activity.
5. Provides for participation in the company's important decisions not only by shareholders but also by savers. The forms can include both direct mechanisms for indicating preferences in the destination of funds and democratic mechanisms of participation in decisions. In this way, ethical finance promotes economic democracy.
6. Has social and environmental responsibility as its reference criteria for the employment of funds. It identifies the fields of employment, and possibly some preferential fields, by introducing into the economic assessment reference criteria based on the promotion of human development and on social and environmental responsibility. In principle, it excludes financial relationships with economic activities that hinder human development and contribute to violating fundamental human rights, such as the production of and trade in arms, production seriously harmful to health and the environment, and activities based on the exploitation of minors or the repression of civil liberties.
7. Requires global and coherent adherence by the manager, who directs all of the institution's activities accordingly. If, on the other hand, the ethical finance activity is only partial, it is necessary to explain, in a transparent way, the reasons for the limitation adopted. In any case, the intermediary declares its willingness to be 'monitored' by institutions guaranteeing savers.

Basic objectives: Provide economic resources to those who have entrepreneurial projects that respect the environment and human rights and are capable of promoting inclusion, also setting themselves social and environmental objectives to achieve. Profit is pursued as an indicator of efficiency and as a tool to increase impact.
Speculation or support of the real economy: Ethical finance operators are inextricably linked to the real economy. The financial instruments are aimed at financing companies that are attentive to the environment and human rights, and at ensuring a balanced return on savings and investments. Ethical finance supports the adoption of measures designed to counter speculation, such as a tax on financial transactions.
Product approach versus systemic approach: Ethical finance operators put the assessment of social and environmental impacts at the heart of all proposed financial products and all corporate practices, including, for example, manager remuneration policies and incentives. Environmental and social impact assessments are a full part of the internal control system over all activities.
Governance models: The intermediary who does ethical finance must have transparent and participatory governance.
Weight of ESG parameters: Ethical finance evaluates with specific criteria and indicators every aspect (environmental, social and governance) of the activities it finances with credit and investments, as well as their respective interrelationships. Exclusion criteria are adopted in different sectors, with low tolerance thresholds. It has its own methodology that uses national and international databases, integrating them with those of non-governmental organizations and using them actively, rather than passively applying scores provided by third parties.
Lobby versus advocacy: Ethical finance invests in critical finance education projects that make people aware of the social and environmental risks of the financial casino, and it calls on institutions to regulate and tax finance so that it can contribute to healthy and inclusive development across the planet. Other requests include the separation of commercial and investment banks, the fight against tax havens (for example through the universal adoption of country-by-country reporting), and limits on the use of derivatives, among others. The initiatives are carried out in a widespread way thanks to the active involvement of the members (participation).
Engagement and critical shareholding: Ethical finance seeks dialogue with the companies in which it invests to stimulate them to constantly improve their social and environmental performance.

5.3 A Critique of Computational Finance: Limits and Challenges with Respect to Ethical Finance

The distinction between positive economics and normative economics arose in 1891, proposed by John Neville Keynes, father of the famous Maynard. Positive economics is understood as "the description of the functioning of an economic system" "as it is"; normative economics is understood as "the evaluation of what is desirable, its costs and benefits". For Amartya Sen, Nobel Prize in Economics in 1998, this distinction represents the crucial problem of the contemporary era to be answered: how to overcome utilitarian rationalism and reduce the distance from ethics and the principles of classical economics? It must be said that with utilitarianism, whose main exponents were Jeremy Bentham and John Stuart Mill, a concept of well-being (including community well-being) was affirmed that is based solely on personal interest in terms of pleasure and pain, and that fully denies those elements of the process this may entail, whether they are considered fair or unfair. All this resulted in the shift from a community ethic to a utilitarian ethic, exemplified by the distinction between the bonum honestum and the bonum utile (the latter calculable in mathematical terms, mainly through the amount of benefits and pleasures obtained, giving life to the famous "table of the calculation of pleasures"), which shows us and further confirms how society becomes almost a by-product of an economy subordinated to the utilitarian laws of maximizing monetary profit.
Within this conceptual framework, the following section first criticizes computational finance as rooted in a positivist matrix, and then proposes how to overcome some of the limits highlighted, through the application of the principles of ethical finance set out above. To do so, we reconstruct some principles of computational finance, pointing out critical issues, and later propose some indices that could be incorporated.

5.3.1 Some Definitional Aspects Considered in This Section

How do we consider computational finance in this context? Computational finance, sometimes also called financial engineering, is conceived as a process through which different factors are applied to reach conclusions about investments in shares, bonds, and futures, and about the hedging of stock market activity. For all this, it uses the tools of mathematics and computer simulations to explore possible risks and results. In this context, quantitative analysis is applied through these instruments to:
• Calculate the risks and results in the field of investment banking
Due to the sheer amount of funds involved in this type of situation, computational
finance comes to the fore as one of the tools used to evaluate any potential invest-
ment, whether it is something as simple as a new start-up or a well-established fund.
Computational finance can help prevent large amounts of finance from being invested
in something that simply doesn’t appear to have much of a future from a financial
point of view.
• The second area could be the world of financial management.
Stockbrokers, shareholders, and anyone who chooses to invest in any type of
investment can benefit from using the basic principles of computational finance as
a way to manage an individual portfolio. Looking at the numbers for individual
investors, just like for larger concerns, can often clarify what risks are associated
with a given investment opportunity. The result is often that an investor avoids a bad opportunity and can decide to invest another day in something that will be worth it in the long run.

• The third area is the strategic planning of an enterprise:
In the business world, the use of computational finance can often come into play when it comes time to engage in some form of corporate strategic planning. For example, reorganizing a company's operating structure in order to maximize profits may seem very good at first glance, but running the data through a computational finance process may in fact uncover some drawbacks to the current plan that were not readily visible before. In other words, computational finance is concerned with calculating the full range of costs across a broad area, costs that sometimes lie hidden, so that a restructuring can prove more expensive than expected.

5.3.2 The Underlying Vice: Economic Positivism

Before proposing some indices and measurement tools based on ethical finance, it is
good to analyze the limits, in our opinion, of computational finance.
Computational finance is a direct extension of economic positivism. By this term we refer to the science that studies economic systems "as they are", through mathematical rationality and the maximization of individual interest, leaving out considerations of a normative type, those we could define by the question: what kind of society do we want to build in terms of values, ethical principles, rights, or identity? In this sense, economic positivism is interpreted as the attempt to transform "social behavior" into mathematical reasoning based on rational individual interest that can be materially quantified, to the detriment of a subjective identity that is more complex to define, and it leaves in the background the "cultural implications" (values, principles, meanings, etc.) that contribute to defining the individual and community identities that can determine behavior. In other words, the economy ended up dealing only with factors that can be mathematically quantified, to the detriment of a social identity that is also made up of elements that are not mathematically quantifiable. For these reasons, by positivization of the economy we also mean a systematization of its functioning which totally dispenses with the transcendental dimension, applies mainly the laws of physics, statistics and (natural) mathematics, leaves the other factors in the background, and concentrates on describing the construction of wealth only as material facts. In other words, it could be said that economic positivism is based on an unlimited faith in mathematics and in the ability to transform the world in the name of progress and productive growth driven by technological innovation (the use of the steam engine, electricity, and the expansion of railways). For these reasons, economic positivism can be considered an attempt to establish the foundation for rational intervention in society and the economy or, better said, the use of empirical reason to modify and direct social behavior, which involves eliminating metaphysical and transcendental implications. In economics, in our view and for the reasons set out in this work, this took shape with mercantilist practices and, in theoretical terms, with the physiocrats first and with Smith later [16]. In addition, it is based on the idea that man is a rational being who acts only
in view of monetary maximization; but as Amartya Sen demonstrates in "Ethics and Economics", this is not always the case. There are other important aspects, such as the social and the environmental; the actor's economic logic is conceived toward the construction of real well-being and not only toward maximizing monetary interest.
fact, all of this led to the conception of economic systems as expenditure/receipt
balances without addressing what we call the worldview behind economic activities.
In other words, it leaves out important questions such as: what kind of society do we
want to build with resource management? For the same reasons, also computational
finance deals with quantifiable mathematical aspects, but “does not calculate” what
is a human truth: it depends on the choices that men and women make regarding
what they consider important. So, returning to the three areas mentioned above, we
can affirm that:
• The positive or negative benefit of an investment cannot be calculated without taking into account the socio-environmental impact, that is, its externalities in positive and negative terms.
• The risk assessed by the investment bank cannot ignore the fact that finance today is often not linked to production, and its actions need to be related to real productive activities rather than speculative ones.
• The strategy of a company cannot be evaluated only in terms of costs/inputs based on mathematical reasoning, because men and women are not only rational beings who act to reach ends but also act towards values.

5.4 Measurement Criteria of Computational Finance with the Principles of Ethical Finance

Starting from the three areas of intervention of computational finance set out above, we now propose some indices that can unite the principles of computational finance and the criteria of ethical finance. With this objective, we propose three formulas as a general methodology, in which coefficients can be incorporated with respect to some of the principles of ethical finance considered important; an illustrative sketch of the three indices follows them below.
• Investment Banking Index with social and environmental implications (IBSEI)
When we calculate the investment banking index, we have to refer to the entire money supply chain: where the money comes from, how it is managed, and what it generates in terms of society and the environment.
If the money comes from negative circuits, according to the principles set forth in ethical finance, a negative coefficient is set. If the money comes from positive circuits, a positive coefficient is set.
If the money is invested in negative or positive circuits, the same considerations set out for the coefficients above apply.
The values of the coefficients are decided on the basis of the weight that a principle has on reality, determined through a subjective process at a social level. In this way we re-appropriate an ontological dimension defined by the relationship between the subject, noesis, and the object, noema [19]. For example, if the money comes from activities related to armaments, a negative coefficient of −5 could be used if one considers it decidedly negative; this must be multiplied by the percentage of monetary risk, which in turn must be multiplied by the positive or negative coefficient corresponding to the intended area of the investment.
The formula to calculate this index could be:

$$\text{IBSEI} = \left[ (\pm\,\text{coef})\, Q_{MO} \cdot \%R \cdot (\pm\,\text{coef})\, Q_{MI} \right] \cdot T$$

coef = coefficient applied; it can go from −10 to +10 and cannot be equal to zero
Q_MO = quantity of money of origin
Q_MI = quantity of money invested
%R = percentage of monetary risk
T = coefficient of transparency of the money cycle (%).
• Risk Impact Index with social and environmental implications (RISEI)
In the field of risk impact, there may be investments with high monetary risk but a low environmental impact, and vice versa. For example, in the case of a company or an individual investing in a fund whose money has not been related to production, or has had a high environmental impact, the risk could be captured through the following formula:

$$\%R = f\left( Q_{MI} \cdot (\pm\,\text{coef}) \right) \cdot T$$

%R = the risk
Q_MI = amount of money invested
coef = positive or negative coefficient applied; it can range from −10 to +10 and cannot be equal to zero
T = coefficient of transparency of the money cycle (%).
• Business Management Index with social and environmental implications (BuSEI)
In the case of a company, we can consider an index based on its final profit:

$$\%R = f\left( G \cdot (\pm\,\text{coef}) \right) \cdot T$$

%R = as before, the risk
G = the profit of the company
coef = as before, the coefficient attributed on the basis of the social and environmental externalities the company presents
T = coefficient of transparency of the money cycle (%).
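To make the three formulas concrete, here is a minimal illustrative sketch in R (chosen because R is the software used elsewhere in this volume); the function names, coefficient values, and the identity form assumed for f(·) are hypothetical and only instantiate the general methodology above:

# Hypothetical sketch of the three indices. Coefficients range from -10 to +10
# and cannot be zero; `transparency` is T, the transparency coefficient of the
# money cycle; f(.) is taken as the identity purely for illustration.
ibsei <- function(coef_origin, q_mo, risk_pct, coef_invest, q_mi, transparency) {
  (coef_origin * q_mo) * risk_pct * (coef_invest * q_mi) * transparency
}

risei <- function(q_mi, coef, transparency) {
  (q_mi * coef) * transparency
}

busei <- function(profit, coef, transparency) {
  (profit * coef) * transparency
}

# Example: money originating from armaments (coefficient -5, as in the text),
# invested in an area judged positive (coefficient +4, hypothetical):
ibsei(coef_origin = -5, q_mo = 1e6, risk_pct = 0.08,
      coef_invest = 4, q_mi = 1e6, transparency = 0.9)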

Finally, we propose some indicators that are used in ethical finance that can be
incorporated to calculate the positive/negative coefficient. They are:
• Environmental
– % of CO2 emissions of the entire value chain
– % use of renewable energy across the value chain
– % Use of biodegradable products
– % of packaging with natural products
– …
• Socio-economic
– % of vulnerable people working
– Equality of wages between men and women for the same tasks throughout the
value chain
– Compliance with the official employment contracts required by the countries
– Respect for Human Rights in general throughout the value chain
– % of investments in strategic sectors for the local community
– % of investment in real and non-speculative productive sectors
– Existence of methodologies for involving all the stakeholders of a community
in decisions where to invest money
– …
• Financial management
– Transparency (existence of information on the origin of the money, use,
investment)
– Internal democracy of financial institutions (one person, one vote)
– …

5.5 Some Conclusions

The results presented in this chapter show how computational finance itself can be treated in terms of social reasoning, i.e. meanings that guide decisions, and risks that are defined not only in terms of monetary quantities but on the basis of subjectively perceived priorities concerning citizens' vision of life in general terms. That is, they also concern general environmental and social values that are part of the real well-being of citizens.
For these indices to work, there is a need for a systemic approach and an external certification model that issues the degrees of the coefficients at an international level for shareholders, banks and companies. We could say that a change in the world order is needed, even in Stiglitz's [14] terms, when in his Freefall he states that, without a change in the international order, it is not possible to overcome the problems induced by today's financial system, detached as it is from reality and even from the economy.

This could mean generating an evaluation system for shareholders, banks and companies that pays attention to the socio-environmental impact and that would have to be implemented by every country. This would allow citizens, businesses and shareholders to decide on the basis of a more complete information system than the mere calculation of monetary return. Such a socio-economic financial order would allow economic activities to be directed once again towards a system of socio-environmental well-being and to escape the logic of utilitarian rationalism which, as Amartya Sen states, is one of the problems of our age. In other words, it would allow those who invest to introduce ethical elements into their decisions too.
Finally, it must be said that these indices can undoubtedly capture only partial aspects of computational finance, or be reductive. But the goal of having indices that integrate the two sets of criteria is to propose a normative, and not just a positivist, dimension in computational finance as well, capable of making it clear that, despite an increase in monetary results, an activity can sometimes have a negative impact on the environment or society that affects the real quality of life of populations in the long term. In other words, an attempt is made to "calculate", if this is the right term, also the social and environmental risks that the management of financial values entails.
All this brings us back to the fact that the economy is a tool for managing resources for the well-being of citizens and not a science for pursuing monetary interests (chrematistics), since the two things do not always go together.

References

1. Aristotle: Nicomachean Ethics; Politics. Spanish version and introduction by Antonio Gómez Robledo, 19th edn. Porrúa, Mexico (2000)
2. Arrighi, G.: Adam Smith en Pekín. Orígenes y fundamentos del siglo XXI. Akal, Madrid (2007)
3. Bee, M., Santio, F.: Finanza quantitativa con R. Apogeo (2013)
4. Kumar, B., Kumar, L., Sassanelli, C.: Green Finance in Circular Economy: A Literature Review. Springer (2023)
5. Curci, G.: Finanza quantitativa e modelli matematici. Plus (2007)
6. Krippner, G.: The financialization of the American economy. Socio-Economic Review 3, 173–208 (2005)
7. Lapavitsas, C.: Financialization and capitalist accumulation: structural accounts of the crisis of 2007–9. Discussion Paper Series 16, 1–10 (2010)
8. Oliva, I., Renò, R.: Principi di finanza quantitativa. Apogeo (2021)
9. Perna, T.: Fair trade. La sfida etica al mercato mondiale. Bollati Boringhieri (1998)
10. Piketty, T.: Le Capital au XXIe siècle. Éditions du Seuil, Paris (2013)
11. Pope Francis: Encyclical Letter Laudato si' of the Holy Father Francis on care for our common home (May 24, 2015)
12. Sen, A.: Libertà è sviluppo. Perché non c'è crescita senza democrazia. Arnoldo Mondadori, Milan (2000)
13. Sen, A.: Etica ed economia. Laterza, Bari (2003)
14. Stiglitz, J.E.: Freefall: America, Free Markets, and the Sinking of the World Economy. W. W. Norton (2010)
15. Vigliarolo, F.: Le imprese recuperate. Argentina, dal crac finanziario alla socializzazione dell'economia. Città del Sole e Altreconomia Edizioni, Reggio Calabria (2011)
16. Vigliarolo, F.: La economía es un fenómeno social. Principios de fenomenología económica. EUDEBA, Buenos Aires (2019)

Websites

17. Finanza etica e finanza sostenibile due modelli
18. http://www.bancaetica.it
19. http://www.eticasgr.it
20. https://febea.org
21. https://inaise.org
22. https://www.gabv.org/
23. https://www.raiffeisen.ch/rch/it.html
Chapter 6
Data Mining Techniques for Predicting
the Non-performing Assets (NPA)
of Banks in India

Gaurav Kumar and Arun Kumar Misra

Abstract Banks in India are facing many challenges and witnessing many changes in recent times. Managing Non-Performing Assets (NPAs) has emerged as a major challenge for banks. This chapter presents the findings of a formal attempt to explain NPA variations from 2005 to 2017. The findings are based on the application of various data mining techniques, such as random forest, elastic net regression, and the k-NN algorithm, to understand the NPAs of banks in India. The study uses gross NPA as the dependent variable and other bank-specific and macroeconomic variables as independent variables. The experimental results show that elastic net regression is the best data mining technique to model the NPAs in the given context. Also, the empirical results in all the models provide strong evidence that certain variables, like the previous year's NPA and the loan amount disbursed, have an impact on the NPAs. The findings of the study will provide policy directions to the banking sector and the government to control the quantum of NPAs in the financial system.

Keywords Indian Banks · NPA · Data mining · Caret · GLMnet · Machine learning

6.1 Introduction

Banks in India are key ingredients of the country's economic growth, providing financial services to the government, businesses, and individuals. Indian banks have been resilient and have withstood crises such as the 2008 financial crisis and the Covid-19
G. Kumar (B)
Dr. B. R. Ambedkar National Institute of Technology (NIT) Jalandhar, Jalandhar, India
e-mail: [email protected]
A. K. Misra
Vinod Gupta School of Management, Indian Institute of Technology Kharagpur, Kharagpur, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
L. A. Maglaras et al. (eds.), Machine Learning Approaches in Financial Analytics,
Intelligent Systems Reference Library 254,
https://doi.org/10.1007/978-3-031-61037-0_6

pandemic. However, in recent times, the banking ecosystem has been marked by technological transformation, financial inclusion, and non-performing assets (NPAs). Non-
Performing Assets (NPAs) are loans and advances given by banks that have stopped
generating income for the bank. In other words, these are the loans that are not being
serviced by the borrower, either in terms of payment of principal or payment of
interest. In India, the level of NPAs has been a cause for concern for the govern-
ment and regulators for quite some time. NPAs are a result of various factors such
as economic downturns, mismanagement by borrowers, fraud, etc. NPAs not only
affect the profitability of banks but also pose a threat to the stability of the financial
system.
The high level of NPAs reduces the profitability of banks as they are unable to
recover the loans given to defaulters. It also reduces the availability of credit for
businesses and individuals. This, in turn, reduces the banks’ capacity to lend further,
which affects economic growth. Therefore, it is imperative to study the drivers of
NPA.
Machine learning (ML) has become increasingly important in the banking and
finance industry in recent years. Machine learning algorithms can analyze large
amounts of data and identify patterns that may indicate fraudulent activity. By
detecting fraud early, banks can prevent financial losses and protect their customers’
accounts. Moreover, ML can help banks and financial institutions manage risk by
identifying potential risks and analyzing data to make more informed decisions.
This can include assessing the creditworthiness of borrowers, evaluating investment
portfolios, and managing market risks.
The contribution of this chapter is to apply data mining techniques to the Indian banking system to explore the determinants of NPAs. The study makes use of three popular machine learning (ML) algorithms: random forest, elastic net regression, and the k-NN algorithm, to examine the determinants of non-performing assets in banks. There are several machine learning R packages available; we have employed the caret package. The objective is to develop three ML models, using a real data-set of Indian banks, to model the drivers of Gross NPA (GNPA).
Finally, the study compares the performance of three models by testing them on a
validation set.
The next section discusses the key literature in the area of non-performing assets. The following section introduces the data and the machine learning methodology, and provides a brief on testing and performance metrics. The subsequent section discusses the estimation results. The concluding section presents a summary of the results, limitations, and policy implications emerging from the study. A list of banks included in the sample is provided in the appendix.

6.2 Literature Review

Non-performing assets (NPAs) or non-performing loans (NPLs) are a major issue in the Indian banking sector, and several studies have explored this problem in detail. NPA is a double whammy for banks: while a bank is required to make provisions for such assets, they do not produce any income [15]. Arrawatia et al. [1] argued that
massive amounts of nonperforming loans have been harming the Indian banking
sector. The quality of the loan portfolio has declined over time. Many Indian banks
have experienced severe capital erosion as a result of this. The literature on NPAs in
India covers various aspects such as the causes of NPAs and their impact on banks.
This literature review aims to provide an overview of the existing literature primarily
on NPAs in banks in India.
Garg [6] highlights the need to understand NPAs, as they are a major driver of bank profitability. The study reports that both bank-specific factors and macroeconomic factors significantly drive the NPAs of banks in India, such as GDP, return on assets, return on equity, lending rates, and exchange rates.
Similarly, [20] emphasize that regulating the high level of non-performing assets
is one of the goals of RBI as it may impact the economy and banking. This study
makes use of pooled and panel logit models to determine the risk factors that led to
a fall in the asset quality of banks in India. According to the study, banks with lower
profitability, lower levels of capital, ineffective operations and management, and a
less diverse portfolio are more likely to have poor asset quality. On the other hand,
a larger degree of NPAs is positively correlated with the size of the bank. Another
study by [18] studied NPA’s of Indian banks using aggregate data from the public
sector, private sector, and foreign banks from 2009 to 2013. According to the results
of the ANOVA test, there is no significant difference between international banks
and the private sector. The study also found that for public banks, the ratio of gross
NPA to gross advances is rising, while for foreign banks, the ratio of loss advances
to gross loans is rising.
Rajaraman et al. [17] evaluate the effect of the operating region on domestically owned banks using the percentage of branches in states. The study reports that banking efficiency and technology significantly influence NPA performance in the Indian context.
Maiti and Jena [13] analyzed the profitability of selected bank groups in India
using panel regression. The study found that major determinants of profitability are
net interest margin, profit per employee, non-interest income, and net NPA ratio.
The study by [16] estimates the technical efficiency (TE) of the banks by developing
a Translog stochastic frontier approach. Additionally, the study employed Granger
causality tests to verify the causal relationship between efficiency and advances.
The study reported that total liabilities are negatively related while the ratio of non-
performing assets to total assets, the ratio of priority sector advances to total advances,
and market share are positively associated with technical inefficiency.
Kanoujiya et al. [8] explored the financial distress in 34 banks using the three
variants of Altman’s Z-score while controlling for bank regulation, size, profitability,

and value. The study revealed that high market power, or low competition, measured using the Lerner Index, lowers financial distress and is positively related to financial stability in banks in India.
Chawla and Rani [4] developed a structured questionnaire to collect data from
officers working in the credit departments of different banks in India. The research findings reveal the bankers' perspective and bring some practical insights into the factors behind specific NPA resolution strategies. The study has identified 7 signif-
icant management dimensions out of 21 dimensions based on exploratory factor
analysis (EFA). The study provides suggestions on effective credit management and
improving the asset quality of banks in India.
Jain and Gupta [7] examined the moral hazard behavior of Indian banks by
observing the impact of the level of Net Non-Performing Loans (NNPL) on lending
behavior. The study makes use of the lagged value of NNPL to determine the distress
levels. The analysis shows that NPLs increase when the loan growth ratio increases; this happens when banks have experienced sizable prior loan losses, as compared with when banks are relatively safe, indicating moral hazard behavior.
A recent study by [3] examines the impact of the Covid-19 pandemic on
Bangladesh’s banking sector. The findings suggest that large banks are more vulner-
able to the risk posed by the pandemic. The study found that all banks are likely to
see a fall in risk-weighted asset (RWA) values, capital adequacy ratios (CAR), and
interest income at the individual bank and sectoral levels. There is a disproportionate
increase in these three dimensions after an NPL shock of a higher degree.
The study of [14] highlights the need for both better utilization of resources and
scale expansion. In conclusion, the discussion of the literature on NPAs in banks
highlights the causes and effects of this problem. The literature on NPAs in banks
in India discusses the complexity of the problem and the need for a multi-pronged
approach to address it. While measures such as the Insolvency and Bankruptcy Code
(IBC) and the Asset Quality Review (AQR) have been effective in addressing the
problem to a certain extent, there is a need for greater focus on understanding the
drivers of NPA. This will help to design preventive measures to avoid the build-
up of NPAs in the first place. The literature also emphasizes the need for banks to
improve their governance and risk management practices to prevent the recurrence
of the problem in the future. Overall, the literature provides valuable insights into
the problem of NPAs in banks in India and highlights the need for sustained efforts
to address the problem.

6.3 Research Methodology

6.3.1 Sample and Data Collection

Quarterly data are collected from the Reserve Bank of India (RBI) portal and then aggregated at the annual level. The period of the study is from 2005 to 2017.

6.3.2 Experimental Variables

The definition of the variables is provided in Table 6.1. The output variable of the study is gross non-performing assets (GNPA). As per Fig. 6.1, the GNPA data closely resembles a normal distribution. Following studies that evaluated NPAs in banking through their exogenous determinants, this chapter uses EPU, the Lerner index, ROA, NIM, operational risk, leverage, liquidity ratio, GDP growth rate, interbank rate, Gsec 10Yr yield, asset diversification, income diversification, regulatory capital, an ownership dummy, and loan size.

Table 6.1 Variable definition

GNPA: the total value of gross non-performing assets
EPU India index (EPU): economic policy uncertainty, an index constructed from newspaper articles regarding policy uncertainty in leading newspapers
Lerner index (LI): used as a measure of a bank's monopoly power; computed as the markup over marginal cost
ROA: a financial ratio that indicates how profitable a bank is relative to its total assets
NIM: net interest margin, a measure of the difference between the interest income earned by a bank and the interest it pays out to its depositors, relative to the amount of its interest-earning assets
Leverage: risk leverage, defined as net worth to risk-weighted assets
Credit-deposit ratio (CD ratio): the ratio of total advances to total deposits; it indicates how much of each rupee of deposits goes toward credit markets
Liquidity ratio: the ratio of short-term deposits to total deposits
GDP growth rate: a ratio that measures the change in the GDP of the country in comparison to an earlier period
Interbank rate: the interest charged on short-term loans made between banks
Gsec 10Yr yield: the market yield on the Indian 10-year government bond
Asset diversification: the ratio of corporate loans to retail loans
Income diversification: the ratio of interest income to non-interest or fee-based income
Regulatory capital: the ratio of bank capital to risk-weighted assets (CRAR)
Ownership dummy: public sector banks (PSB) = 1; private banks = 0
Loan size: gross corporate, retail, and working capital loans disbursed in a particular financial year

Fig. 6.1 Histogram of GNPA data

6.3.3 Data Mining Methodology

The study builds the ML models on the training set and evaluates their performance on the test set. To achieve the optimal prediction for NPA, we have applied various data mining techniques using the caret and GLMnet packages in R. These libraries have functions to implement machine learning algorithms such as random forest (RF), elastic net regression, and k-NN. The varImp function is used to explain the variable or feature importance; it automatically assigns importance scores to the variables in the range between 0 and 100. A minimal sketch of this workflow is given below.
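As a rough illustration of the workflow just described (not the authors' code: the data frame npa_data and its columns are hypothetical placeholders), a caret setup might look like this:

# Illustrative caret setup, assuming a data frame `npa_data` with a GNPA
# column and the predictors of Table 6.1.
library(caret)

set.seed(123)
# 60:40 train/test split on the outcome variable (see Sect. 6.3.3.4)
idx      <- createDataPartition(npa_data$GNPA, p = 0.6, list = FALSE)
train_df <- npa_data[idx, ]
test_df  <- npa_data[-idx, ]

# Five-fold cross-validation, repeated five times, for model tuning
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5)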

6.3.3.1 Random Forest

Random Forest is a well-liked machine learning method used for both classification and regression problems. It is an ensemble learning technique that, at training time, builds many decision trees and outputs the mode of their classes (classification) or their mean prediction (regression). Each decision tree in a Random Forest is built using a randomly selected subset of the features from the input data. This increases the diversity of the individual trees in the forest and reduces overfitting. Additionally, each tree is trained on a random subset of the training data, using a process called bootstrapping. The algorithm predicts each input data point by passing it through each decision tree in the forest. The final prediction is then determined by taking the majority vote (for classification) or the average (for regression) of the predictions made by all the trees.
Random Forest (RF) has been extensively used in finance. In order to create detection models, [11] used four statistical approaches, including parametric and non-parametric models, and concluded that Random Forest has the highest accuracy and that non-parametric models have higher accuracy than parametric models.

Khaidem et al. [9] and Kumar and Thenmozhi [10] predict trends in stock market prices using the RF algorithm.
Random Forest has several advantages over other machine learning algorithms.
It can handle large datasets with a high number of features, and it is less prone to
overfitting than other methods. Additionally, it is highly scalable and can be easily
parallelized, making it ideal for big data applications.
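Continuing the sketch above (same hypothetical objects; the mtry grid simply mirrors the values reported later in Table 6.2), a random forest could be fitted with caret as follows:

# Random forest via caret, tuning mtry (variables tried at each split)
rf_fit <- train(
  GNPA ~ ., data = train_df,
  method    = "rf",
  trControl = ctrl,
  tuneGrid  = expand.grid(mtry = c(2, 4, 6, 10, 15, 20, 100))
)
varImp(rf_fit)  # importance scores, scaled by caret to the 0-100 range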

6.3.3.2 Elastic Net Regression

GLMnet is a package in R that provides a suite of algorithms for fitting generalized linear models (GLMs) with L1 (lasso) and L2 (ridge) regularization, as well as elastic net regularization, which is a combination of both L1 and L2 penalties. Regularization seeks to control variance by adding a penalty governed by a tuning parameter, lambda or alpha. For a linear model

$$Y = a + b X_1 + c X_2 \tag{6.1}$$

the ridge regression penalty is

$$\lambda b^2 + \lambda c^2, \quad \text{i.e.} \quad \lambda \lVert w \rVert_2^2 \tag{6.2}$$

the ridge regression cost function is

$$\min_{\beta} \sum_i \left( y_i - \hat{y}_i \right)^2 + \alpha \sum_j \beta_j^2 \tag{6.3}$$

and the lasso regression penalty is

$$\lambda |b| + \lambda |c|, \quad \text{i.e.} \quad \lambda \lVert w \rVert_1 \tag{6.4}$$

Elastic net regression is a regularized linear regression method that combines the strengths of both L1 and L2 regularization. L1 regularization has a sparsity-inducing effect, meaning it can set some of the coefficients to zero and perform feature selection. L2 regularization, on the other hand, has a shrinkage effect, meaning it reduces the magnitude of the coefficients and prevents overfitting. The elastic net penalty combines both L1 and L2 regularization and has a hyperparameter alpha that controls the balance between the two penalties. A value of alpha = 0 corresponds to ridge regression, while alpha = 1 corresponds to lasso regression.
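For reference (added here for clarity; this is the objective documented for the glmnet package rather than an equation from the original text), the elastic net problem for a Gaussian response can be written as

$$\min_{\beta_0, \beta} \; \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2 + \lambda \left[ \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right]$$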
The GLMnet package implements the elastic net regularization using a coordinate
descent algorithm, which updates the coefficients one at a time while holding the
other coefficients fixed. This makes the algorithm efficient and scalable for high-
dimensional data. One of the strengths of GLMnet elastic net regression is that it
can handle high-dimensional data and perform feature selection, making it useful for
data with many predictors. It can also handle correlated predictors and is less prone
to overfitting than standard linear regression.
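In caret, a sketch of this step might look like the following (again hypothetical code, continuing the earlier objects; random search over alpha and lambda is one plausible way to obtain irregular candidate values like those reported in Table 6.4):

# Elastic net via caret/glmnet, with random search over alpha and lambda
ctrl_rand <- trainControl(method = "repeatedcv", number = 5, repeats = 5,
                          search = "random")
enet_fit <- train(
  GNPA ~ ., data = train_df,
  method     = "glmnet",
  trControl  = ctrl_rand,
  tuneLength = 5   # evaluate five random (alpha, lambda) pairs
)
enet_fit$bestTune  # the pair minimizing cross-validated RMSE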

Elastic net regression is now finding applications in finance. The identification of the most important variables closely associated with credit risk is a key challenge for online financing; [5] creates a new multiple structural interacting elastic net model for feature selection in order to efficiently discover the most important features for credit risk assessment in online financing. On the other hand, [12] predict coherent house prices, using elastic net regression and other algorithms, for those who do not own homes, based on their financial resources and goals.

6.3.3.3 k-NN

K-nearest neighbors (k-NN) is a simple yet effective machine learning algorithm used for both classification and regression tasks. In k-NN, the output for a new data point is determined by its K nearest neighbors in the training data. The algorithm works by calculating the distance between the new data point and each point in the training data. The K nearest points are then selected, and the output is determined by taking the majority class (in classification) or the average (in regression) of their target values. The value of K is an important hyperparameter in k-NN, as it determines the number of nearest neighbors used in the prediction. A higher value of K leads to a smoother decision boundary and reduces the effect of noise in the data, while a lower value of K can lead to overfitting. One of the strengths of k-NN is its simplicity and ease of implementation. It is also a non-parametric algorithm, meaning it makes no assumptions about the underlying data distribution. However, k-NN can be computationally expensive, especially with large datasets, and it requires scaling of the features to ensure equal contribution to the distance calculation.
k-NN is also finding applications in finance. Subha and Nambi [19] make use of the k-NN algorithm to predict the direction of stock markets. Similarly, [2] provide a novel multivariate k-NN approach for predicting financial time series based on information shared among referential nearest neighbours.
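A corresponding caret sketch (hypothetical, continuing the earlier objects; the grid of k values is illustrative) could be:

# k-NN regression via caret; predictors are centered and scaled so that each
# variable contributes equally to the distance calculation
knn_fit <- train(
  GNPA ~ ., data = train_df,
  method     = "knn",
  preProcess = c("center", "scale"),
  trControl  = ctrl,
  tuneGrid   = expand.grid(k = c(3, 5, 7, 9, 11))
)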

6.3.3.4 Testing and Accuracy

This study applied repeated training and testing over five resampling rounds. The data set was split 60:40, which provides a larger and more reliable validation set. After training each model, we use it to generate predictions and present the evaluation results for both the training and test datasets.

6.3.3.5 Performance Metrics

The study acknowledges that a single metric for evaluating model performance may not be sufficient to draw a definitive conclusion about the superiority of one model over another. Therefore, the study makes use of Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R-squared values to evaluate the performance of the models. The RMSE and MAE values represent the average difference between the predicted and actual values, where lower values indicate better performance. The R-squared value represents the proportion of the variance in the dependent variable that is explained by the independent variables, where higher values indicate better performance.
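Under the same hypothetical setup as the earlier sketches, all three metrics can be obtained on the held-out test set in one call per model with caret's postResample():

# Evaluate each fitted model on the 40% test set
pred_rf   <- predict(rf_fit,   newdata = test_df)
pred_enet <- predict(enet_fit, newdata = test_df)
pred_knn  <- predict(knn_fit,  newdata = test_df)

postResample(pred = pred_rf,   obs = test_df$GNPA)  # returns RMSE, R-squared, MAE
postResample(pred = pred_enet, obs = test_df$GNPA)
postResample(pred = pred_knn,  obs = test_df$GNPA)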

6.4 Results and Discussion

The application of machine learning algorithms aims to identify the relationship between gross NPA as the dependent variable and a set of independent variables. The results are discussed in the subsequent subsections.

6.4.1 Random Forest

Table 6.2 presents results from the random forest model with different values of
mtry (the number of variables randomly selected at each split) and their corre-
sponding performance metrics: RMSE (Root Mean Squared Error), R-squared, and
MAE (Mean Absolute Error).
Based on the results, the model performance is best when mtry is equal to 6. This
value of mtry corresponds to an RMSE of 0.654, an R-squared of 0.821, and an
MAE of 0.415. As mtry increases or decreases from this optimal value, the model’s
performance tends to worsen slightly but remains relatively stable. Table 6.3 shows
the importance of nineteen variables when predicting the GNPA of the banks. As per
the random forest model results, the lag of gross NPA emerged as the variable with
utmost importance followed by loan advances and net interest margin respectively.
Interbank interest rate, G. sec. rate, and lag of GDP emerged as the least important

Table 6.2 Results from the


mtry RMSE R-squared MAE
random forest model with
different values of mtry 2 0.686 0.823 0.461
4 0.658 0.822 0.427
6 0.654 0.821 0.415
10 0.656 0.814 0.404
15 0.661 0.807 0.399
20 0.663 0.803 0.395
100 0.667 0.801 0.396
Notes RMSE was used to select the optimal model using the
smallest value. The final value used for the model was mtry =
6

Table 6.3 Variable importance generated through random forest model

    Variable                    Importance
    Lag (GNPA)                  100
    Loan size                   93.332
    NIM                         55.907
    Regulatory capital          24.977
    Internal governance         17.341
    LI                          15.161
    ROA                         14.467
    Asset diversification       12.424
    Liquidity                   11.426
    Lag (regulatory capital)    9.636
    Lag (LI)                    9.013
    Leverage                    5.588
    Income diversification      4.507
    CD ratio                    2.987
    Ownership dummy             1.696
    EPU                         0.639
    Lag (GDP)                   0.402
    Gsec 10Yr yield             0.171
    Interbank rate              0

Source Author's calculations
Notes This table shows the variable importance of all the variables in predicting GNPA using a random forest model

Figure 6.2 presents the robustness of the random forest model by testing it on the test data set and regressing the predicted values on the measured values. The RMSE between the predicted and measured test-set values is 0.644.

Fig. 6.2 Validation of the random forest model. Notes This figure shows the values predicted by the random forest model on the Y-axis and the values in the test data set on the X-axis
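A hedged sketch of the mtry tuning reported in Table 6.2, reusing the hypothetical objects from the earlier split; it is not expected to reproduce the exact figures:

    # mtry values follow Table 6.2; randomForest resets out-of-range
    # mtry (e.g., 100 with 19 predictors) with a warning
    grid   <- data.frame(mtry = c(2, 4, 6, 10, 15, 20, 100))
    rf_fit <- train(GNPA ~ ., data = train_df, method = "rf",
                    trControl = trainControl(method = "cv", number = 5),
                    tuneGrid = grid)
    rf_fit$bestTune                                 # selection by smallest RMSE
    varImp(rf_fit)                                  # importance, as in Table 6.3
    RMSE(predict(rf_fit, test_df), test_df$GNPA)    # test-set RMSE, as in Fig. 6.2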

Table 6.4 Resampling results across tuning parameters

    Alpha    Lambda    RMSE     R-squared    MAE
    0.311    0.004     0.622    0.822        0.425
    0.534    2.636     1.473    NaN          1.111
    0.601    0.014     0.616    0.826        0.416
    0.762    0.234     0.667    0.818        0.453
    0.925    0.003     0.620    0.823        0.423

Notes RMSE was used to select the optimal model using the smallest value. The tuning parameters used for the model are alpha = 0.601 and lambda = 0.014

6.4.2 Elastic Net Regression

The results of the elastic net regression are presented in Table 6.4, which reports the performance of the model for different tuning parameters. The analysis used cross-validated resampling: the data were divided into five folds, and the analysis was repeated five times. The results show that the optimal model has an alpha value of 0.601 and a lambda value of 0.014.
The model's performance was evaluated using three metrics: RMSE, R-squared, and MAE. The optimal model was selected as the one with the smallest RMSE; it has an RMSE of 0.616, an R-squared of 0.826, and an MAE of 0.416. The R-squared value indicates that the model explains 82.6% of the variance in the dependent variable, which is a good fit. The RMSE value indicates that the average difference between the predicted and observed values is 0.616, and the MAE value indicates that, on average, the model's predictions are off by 0.416 units. Overall, the elastic net regression analysis suggests that there is a significant relationship between the dependent and independent variables, and that the model's performance is good.
Table 6.5 shows the importance of the nineteen variables in predicting the GNPA of the banks. As per the elastic net regression results, the lag of gross NPA emerged as the most important variable, followed by loan advances and return on assets (ROA), respectively. The lag of regulatory capital and the credit-to-deposit ratio emerged as the least important variables, while leverage and the interbank rate are not found to impact GNPA. Figure 6.3 presents the robustness of the elastic net regression model by testing it on the test data set and regressing the predicted values on the measured values. The RMSE between the predicted and measured test-set values is found to be 0.633.
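A corresponding sketch for the elastic net via caret and glmnet; the alpha/lambda grid is illustrative, so the selected pair would only approximate the reported optimum (alpha = 0.601, lambda = 0.014):

    enet_grid <- expand.grid(alpha  = seq(0.1, 1, by = 0.1),
                             lambda = 10^seq(-3, 0.5, length.out = 20))
    enet_fit  <- train(GNPA ~ ., data = train_df, method = "glmnet",
                       trControl = trainControl(method = "cv", number = 5),
                       tuneGrid = enet_grid)
    enet_fit$bestTune   # optimal (alpha, lambda) by smallest RMSE
    varImp(enet_fit)    # coefficient-based importance, as in Table 6.5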

6.4.3 k-NN Algorithm

The results of the k-nearest neighbor (k-NN) algorithm, shown in Table 6.6, present the performance of the model for different values of k. The algorithm was applied to a pre-processed dataset that was centered and scaled, and the performance

Table 6.5 Variable importance generated through elastic net regression

    Variable                    Importance
    Lag (GNPA)                  100
    Loan size                   64.905
    ROA                         49.219
    Internal governance         33.671
    Regulatory capital          33.296
    Income diversification      27.083
    LI                          25.808
    Lag (LI)                    25.521
    Liquidity                   20.318
    EPU                         16.768
    Lag (GDP)                   10.268
    Gsec 10Yr yield             8.314
    Asset diversification       7.082
    NIM                         6.272
    Ownership dummy             5.928
    Lag (regulatory capital)    5.649
    CD ratio                    4.262
    Leverage                    0
    Interbank rate              0

Source Author's calculations
Notes This table shows the variable importance of all the variables in predicting GNPA using the elastic net model

Fig. 6.3 Validation of the elastic net regression model. Notes This figure shows the values predicted by the elastic net regression model on the Y-axis and the values in the test data set on the X-axis

was evaluated using cross-validation with a fivefold split. The summary of sample sizes indicates that the dataset was divided into five sets with sample sizes ranging from 183 to 187. The results show that as the value of k increases, the performance of the model decreases, as indicated by the increase in the root mean

Table 6.6 k-NN resampling results across tuning parameters

    k     RMSE     R-squared    MAE
    5     0.750    0.776        0.524
    7     0.760    0.782        0.537
    9     0.780    0.771        0.554
    11    0.792    0.772        0.561
    13    0.796    0.776        0.564
    15    0.813    0.771        0.571
    17    0.828    0.768        0.581
    19    0.837    0.772        0.580
    21    0.849    0.768        0.588
    23    0.870    0.753        0.606

Notes RMSE was used to select the optimal k-NN model using the smallest value. The final value used for the model was k = 5

squared error (RMSE) and mean absolute error (MAE) and the decrease in the R-squared value.
The results show that the k-NN algorithm performs best with a k value of 5, as it has the lowest RMSE and the lowest MAE. However, it is important to note that the performance of the model is not significantly different between k = 5 and k = 7, as the differences in the RMSE, R-squared, and MAE values are relatively small. As the value of k increases beyond k = 7, the performance of the model decreases significantly.
Table 6.7 shows the importance of the nineteen variables in predicting the GNPA of the banks. As per the k-NN model results, the lag of gross NPA emerged as the most important variable, followed by loan advances and net interest margin (NIM), respectively. Income diversification and the interbank rate emerged as the least important variables, while economic policy uncertainty (EPU) is not found to impact GNPA. Figure 6.4 presents the robustness of the k-NN model by testing it on the test data set and regressing the predicted values on the measured values. The RMSE between the predicted and measured test-set values is found to be 0.679.
Based on the summary of the performance metrics provided in Table 6.8, the Elastic Net Regression model has the lowest RMSE (0.616) and the highest R-squared value (0.826), indicating that it has the best overall predictive performance. The Random Forest model has a slightly higher RMSE (0.654) and a slightly lower R-squared value (0.821) than the Elastic Net Regression model, but a marginally lower MAE (0.415). The k-NN Algorithm has the highest RMSE (0.750) and the lowest R-squared value (0.776), indicating that it has the worst overall predictive performance among the three models.
The variable importance results in all models indicate that there is a relationship
between the Gross Non-performing assets of a bank in a given year and its Gross
Non-performing assets in the previous year. This suggests that if a bank has a high
level of non-performing assets in one year, it is more likely to have a high level of

Table 6.7 Variable importance generated through the k-NN algorithm

    Variable                    Importance
    Lag (GNPA)                  100
    Loan size                   96.442
    NIM                         82.387
    Regulatory capital          63.326
    Internal governance         55.450
    Lag (regulatory capital)    46.036
    LI                          44.068
    Lag (LI)                    34.347
    ROA                         32.085
    Asset diversification       26.430
    Liquidity                   23.315
    Ownership dummy             22.740
    Leverage                    7.091
    Lag (GDP)                   6.512
    CD ratio                    6.235
    Gsec 10Yr yield             3.483
    Income diversification      2.621
    Interbank rate              0.685
    EPU                         0

Source Author's calculations
Notes This table shows the variable importance of all the variables in predicting GNPA using the k-NN model

Fig. 6.4 Validation of the k-NN model. Notes This figure shows the values predicted by the k-NN regression model on the Y-axis and the values in the test data set on the X-axis

non-performing assets in the following year as well. This finding has implications for
banks and regulators, as it suggests that efforts to reduce non-performing assets may
need to focus not only on the current year’s performance but also on addressing the
root causes of non-performing assets in previous years. The result also indicates that
if a bank disburses a large number of loans each year, it is more likely to experience

Table 6.8 Summary of the model results

    Model                     RMSE     R-squared    MAE
    Random forest             0.654    0.821        0.415
    Elastic net regression    0.616    0.826        0.416
    k-NN algorithm            0.750    0.776        0.524

Notes Performance results of testing and training on the financial performance of Indian banking data, from caret and GLMnet library outputs in R

a higher level of GNPAs in that year or subsequent years. This finding has important
implications for banks and regulators, as it highlights the need to carefully manage
and monitor loan disbursal activities to minimize the risk of non-performing assets.

6.5 Conclusion

Banks in India have been undergoing major changes in the dynamic environment
over the past few years and NPAs continue to be a major concern for the banks. The
rise in NPAs has had a significant impact on the profitability of banks and has posed
a threat to the stability of the financial system. Machine learning has the potential to
transform the banking and finance industry by enabling financial institutions to make
more informed decisions and mitigate risks. This study examines the bank-specific and macroeconomic determinants of NPAs in Indian banks using a machine learning methodology. The experimental results demonstrate that, based on the lowest-RMSE and highest-R-squared criteria, elastic net regression has the best predictive accuracy for modelling NPAs in the Indian banking system. The performance of the elastic net regression is closely followed by the random forest, while the k-NN model is the least preferred. Consistently, in all the models, the lag of gross NPA and the loan amount figure among the top features.
Based on these results, this study suggests that regulators and banks should focus not only on controlling the current year's non-performing assets but also on addressing the root causes of non-performing assets in previous years. Also, the study emphasizes
that banks may need to develop effective risk management strategies to ensure that
loans are disbursed only to creditworthy borrowers and that the bank’s exposure to
default risk is appropriately managed. This article has its scope limited to the models
tested, studied period, and banks operating in India. The impact of the COVID-19
pandemic on banks’ NPAs is an interesting area of research that can be taken for
further exploration.

Appendix

A. Banks included in the sample

1 Allahabad Bank
2 Andhra Bank
3 Axis Bank
4 Bank of Baroda
5 Bank of India
6 Bank of Maharashtra
7 Canara Bank
8 Central Bank of India
9 Citibank
10 Corporation Bank
11 Dena Bank
12 Deutsche Bank AG
13 Federal Bank
14 HDFC Bank
15 HSBC Ltd
16 ICICI Bank
17 IDBI Bank Ltd.
18 Indian Bank
19 Indian Overseas Bank
20 IndusInd Bank
21 Oriental Bank of Commerce
22 Punjab And Sind Bank
23 Punjab National Bank
24 State Bank of India
25 Syndicate Bank
26 UCO Bank
27 Union Bank of India
28 United Bank of India
29 Vijaya Bank
30 Yes Bank Ltd.

References

1. Arrawatia, R., Dawar, V., Maitra, D., Dash, S.R.: Asset quality determinants of Indian banks: empirical evidence and policy issues. J. Public Aff. 19(4), e1937 (2019)
2. Ban, T., Zhang, R., Pang, S., Sarrafzadeh, A., Inoue, D.: Referential k-NN regression for financial time series forecasting. In: Neural Information Processing: 20th International Conference, ICONIP 2013, Daegu, Korea, November 3–7, 2013, Proceedings, Part I, pp. 601–608. Springer, Berlin, Heidelberg (2013)
3. Barua, B., Barua, S.: COVID-19 implications for banks: evidence from an emerging economy. SN Bus. Econ. 1, 19 (2021). https://doi.org/10.1007/s43546-020-00013-w
4. Chawla, S., Rani, S.: Resolution of non-performing assets of commercial banks: the evidence from banker's perspective in Indian banking sector. Ind. Econ. J. 70(4), 635–654 (2022). https://doi.org/10.1177/00194662221118318
5. Cui, L., Bai, L., Wang, Y., Jin, X., Hancock, E.R.: Internet financing credit risk evaluation using multiple structural interacting elastic net feature selection. Pattern Recogn. 114, 107835 (2021)
6. Garg, N.: Factors affecting NPAs in Indian banking sector. Paradigm 25(2), 181–193 (2021). https://doi.org/10.1177/09718907211035594
7. Gupta, C.P., Jain, A.: A study of banks' systemic importance and moral hazard behaviour: a panel threshold regression approach. J. Risk Fin. Manage. 15(11), 537 (2022)
8. Kanoujiya, J., Rastogi, S., Bhimavarapu, V.M.: Competition and distress in banks in India: an application of panel data. Cogent Econ. Fin. 10(1), 2122177 (2022)
9. Khaidem, L., Saha, S., Dey, S.R.: Predicting the direction of stock market prices using random forest (2016). arXiv preprint arXiv:1605.00003
10. Kumar, M., Thenmozhi, M.: Forecasting stock index movement: a comparison of support vector machines and random forest. In: Indian Institute of Capital Markets 9th Capital Markets Conference Paper (2006)
11. Liu, C., Chan, Y., Alam Kazmi, S.H., Fu, H.: Financial fraud detection model: based on random forest. Int. J. Econ. Fin. 7(7) (2015)
12. Madhuri, C.R., Anuradha, G., Pujitha, M.V.: House price prediction using regression techniques: a comparative study. In: 2019 International Conference on Smart Structures and Systems (ICSSS), pp. 1–5. IEEE (2019)
13. Maiti, A., Jana, S.: Determinants of profitability of banks in India: a panel data analysis. Schol. J. Econ. Bus. Manage. (SJEBM) 4, 436–445 (2017)
14. Maity, S., Sahu, T.N.: How far the Indian banking sectors are efficient? An empirical investigation. Asian J. Econ. Banking 6(3), 413–431 (2022)
15. Olekar, R., Talawar, C.: Non-performing assets management in Karnatak Central Co-operative Bank Ltd., Dharawad. Int. J. Res. Commerce Manage. 3(12), 126–130 (2012)
16. Raina, D., Sharma, S.K., Bhat, A.: Commercial banks performance and causality analysis. Glob. Bus. Rev. 20(3), 769–794 (2019). https://doi.org/10.1177/0972150919837077
17. Rajaraman, I., Bhaumik, S., Bhatia, N.: NPA variations across Indian commercial banks: some findings. Econ. Polit. Weekly, 161–168 (1999)
18. Rao, M., Patel, A.: A study on non-performing assets management with reference to public sector banks, private sector banks and foreign banks in India. J. Manage. Sci. 5(1), 30–43 (2015). https://doi.org/10.26524/jms.2015.4
19. Subha, M.V., Nambi, S.T.: Classification of stock index movement using k-Nearest Neighbours (k-NN) algorithm. WSEAS Trans. Inf. Sci. Appl. 9(9), 261–270 (2012)
20. Swami, O.S., Nethaji, B., Sharma, J.P.: Determining risk factors that diminish asset quality of Indian commercial banks. Glob. Bus. Rev. 23(2), 372–384 (2022). https://doi.org/10.1177/0972150919861470

Dr. Gaurav Kumar is Assistant Professor at NIT Jalandhar. He has done post-doctoral research
in the area of Financial Data Analytics at University College Dublin (UCD), Ireland. He has
obtained a Ph.D. degree from the Indian Institute of Technology (IIT), Kharagpur, and an MBA
degree from the Indian Institute of Foreign Trade (IIFT), Delhi. He has studied the liquidity of
midcap stocks as an area of research for his doctoral thesis. He holds an engineering degree in
Computer science from the National Institute of Technology (NIT), Allahabad. He has industry
experience in SAP ERP consulting while working for Tata Consultancy Services (TCS). He has
received many awards and grants, including the UGC Junior Research Fellowship (JRF). His research work has been presented at various international conferences, including the American Economic Association (USA), ICMA Centre (London), Corvinus University (Hungary), and La Trobe University (Melbourne, Australia). Recently, he has published in top-tier journals, viz. the European Journal of Finance, Journal of Behavioural and Experimental Finance, and Asian Journal of Economics. His broad research interests include stock markets, corporate finance, and financial analytics.

Dr. Arun Kumar Misra is Associate Professor at IIT Kharagpur. He received his MPhil and
Ph.D. from the Indian Institute of Technology (IIT), Bombay. He has more than 20 years of
industry and teaching experience. He worked in a leading PSU Bank in various areas of banking
like Credit Planning, Basel implementation, ALM, Capital Planning, Profit Planning, CRM, and
Market Risk, etc. As a senior banker, he has completed the required certifications related to
Management Accounting, Foreign Exchange, Risk Management, Banking Laws, and Banking IT
services. Under the guidance of Dr. Misra, nine Ph.D. students of IIT Kharagpur have received their Ph.D. degrees. He has conducted a significant number of MDPs for banks, manufacturing compa-
nies, and government departments. He has completed consulting assignments for the Ministry
of Statistics and Programme Implementation (MoSPI), ICSSR, IRDA, and Banks. Dr. Misra has
several publications in national and international journals. His research interests are in the areas of
financial markets, market micro-structure, corporate finance, risk management, banking, and asset
pricing.
Chapter 7
Multiobjective Optimization
of Mean–Variance-Downside-Risk
Portfolio Selection Models

Georgios Mamanis and Eftychia Kostarelou

Abstract In this research paper we experimentally investigate the out-of-sample performance of three multiobjective portfolio optimization models, namely Mean–Variance-VaR, Mean–Variance-LPSD (LPSD: Lower Partial Standard Deviation) and Mean–Variance-Skewness. For solving the optimization problems, we apply a very popular, efficient and effective Multiobjective Evolutionary Algorithm, SPEA2 (SPEA: Strength Pareto Evolutionary Algorithm), since the problems cannot be solved with existing mathematical programming techniques, at least in reasonable computational time. The models are tested on real data drawn from the S&P 100 and S&P 500 indexes. Out-of-sample results show that the efficient portfolios generated by SPEA2 for the Mean–Variance-LPSD portfolio selection model outperform the market portfolio, measured by the S&P 500 index, on three performance measures: final wealth, Sharpe ratio and Sortino ratio. The efficient portfolios generated by SPEA2 for the Mean–Variance-VaR model come next and also beat the S&P 500 index on all performance measures. The efficient portfolios generated by SPEA2 for the Mean–Variance-Skewness portfolio optimization model do not provide satisfactory results and fail to beat the market. Furthermore, a comparison against competing portfolios that have shown good out-of-sample performance in past studies, such as the global minimum variance portfolio and the second order stochastic dominance portfolio, shows that the portfolios of the proposed models, except Mean–Variance-Skewness, provide competitive results.

Keywords Computational finance · Portfolio optimization · Portfolio analysis · Multiobjective evolutionary algorithms

G. Mamanis (B) · E. Kostarelou
Athens, Greece
e-mail: [email protected]
E. Kostarelou
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 151
L. A. Maglaras et al. (eds.), Machine Learning Approaches in Financial Analytics,
Intelligent Systems Reference Library 254,
https://doi.org/10.1007/978-3-031-61037-0_7

7.1 Introduction

Modern portfolio theory has its foundation in the seminal work of Markowitz [23]. Harry Markowitz proposed the first mathematical model for portfolio selection. His model was the first mean-risk model and introduced variance as a risk measure, forming the so-called Mean–Variance model. Since the Mean–Variance model, other researchers have applied different risk measures, such as semi-variance, Value-at-Risk and absolute deviation (see e.g., [14, 16, 25–28, 30]).
However, the Mean–Variance model is adequate only if (i) the distribution of the rates of return is multivariate normal or (ii) the utility function of the investor is quadratic. If the normality assumption of the rates of return is not met, tail returns might occur more frequently than a Mean–Variance model predicts. For this reason, many researchers have proposed Mean-Downside-Risk portfolio optimization models replacing variance with another (downside) measure of risk. Variance, however, remains the most widely used risk measure in the practice of portfolio optimization. Moreover, many investors may consider a portfolio obtained with an alternative mean-risk model unacceptable, since it may have large variance and, as a consequence, a small Sharpe index [18]. Respectively, a portfolio with minimum variance might have unacceptable tail risk (extremely unfavorable outcomes).
One way to tackle these issues is by considering the higher moments of the return distribution in the portfolio selection process [22]. Skewness accounts for downside risk: if a return distribution is "skewed to the left," the cubed extreme negative values dominate and skewness is negative, which is undesirable. Conversely, skewness also accounts for favorable tail events: if the distribution of the rates of return is "skewed to the right," the extreme positive values dominate when cubed, resulting in positive skewness, which should be maximized [4]. Thus, in this study skewness is considered as an additional objective function beyond mean and variance, to be maximized in order to account for tail returns of the return distribution.
An alternative way to address the issues of non-normality has been proposed by
Roman et al. [29] who introduced the Mean–Variance-Downside-Risk portfolio opti-
mization models. In their study they propose a Mean–Variance-CVaR multiobjective
portfolio optimization model with CVaR as a third criterion in order to account for
tail risk. However, there are other equally important downside risk measures, identified in Bodie et al. [4], that have not been used by researchers in the context of Mean–Variance-Downside-Risk multiobjective portfolio optimization. These are VaR (Value at Risk) and lower partial standard deviation (LPSD). LPSD is computed by considering only deviations of portfolio returns from a given threshold, which is usually the return of a risk-free asset. Specifically, it takes only the negative deviations from the risk-free rate of return, squares those deviations, averages them, and then takes the square root to obtain a "left-tail standard deviation". Value-at-Risk (VaR) describes the maximum loss (the negative of the portfolio's rate of return) that will not be surpassed during a specified period with a given probability, for example, the 5th or 1st percentile rate of return. In this study we analyze these multi(three)objective portfolio optimization models for the first time by means of a multiobjective evolutionary algorithm.
The introduction of these objective functions into the Mean–Variance model results in portfolio optimization problems which are very difficult to solve. The problems are non-linear multiobjective (in fact, tri-objective) optimization problems. Portfolios are evaluated, described and compared using three statistics: the traditional expected return and variance, and a downside risk measure (in this study, VaR and LPSD) or skewness. By introducing these statistics into the Mean–Variance model, the efficient frontier becomes a surface in a higher-dimensional space. Traditional mathematical programming algorithms have difficulty providing a solution to these multiobjective optimization problems, at least with reasonable computational effort.
Consequently, alternative solution techniques are required for computing the
efficient frontier. Evolutionary computation [24] is a family of search techniques
which are population-based, random search heuristics that imitate the principles of
Darwin’s theory of evolution, and are appropriate for tackling optimization problems
with tough search landscapes (e.g., large and multimodal search spaces, complex
constraints, nonlinear and non-differentiable objective functions, multiple objec-
tive functions). A branch of evolutionary computation is the so-called multiobjec-
tive evolutionary algorithms (MOEAs) which are especially designed to approxi-
mately solve optimization problems with two or more objective functions. The main advantage of MOEAs is that they produce a good approximation of the efficient frontier in a single run and within little computing time [2]. In this study, instead of transforming the multiobjective portfolio optimization problems into single-objective ones, we approximately solve and analyze the problems in their multiobjective nature. Thus, we compute, analyze and evaluate a set of efficient portfolios. We show that taking downside risk into account as an additional third criterion in the classical Mean–Variance model is a profitable investment decision.
Some studies explore an additional criterion in the portfolio selection problem.
Garcia-Bernabeu et al. [15] introduced sustainability as a third criterion within the
Mean–Variance portfolio model, catering to ethical or green investors who seek to
integrate financial gains with social benefits. They employed a modern multiobjec-
tive genetic algorithm called ev-MOGA, emphasizing ε-dominance. Additionally,
Chen et al. [8] suggested a blended method for multiobjective portfolio optimization
problems involving higher moments. Moreover, Mamanis [19] conducted a computa-
tional analysis comparing various MOEAs in another three-objective portfolio opti-
mization scenario. More recently, Mamanis [20] proposed and empirically exam-
ined a multi(three)objective portfolio optimization model, incorporating two-tailed
performance measures and a utility function to assess financial performance.
In this research, we experiment with a well-known and popular MOEA, namely
SPEA2 [32] for analyzing and evaluating the multiobjective portfolio optimization

problems. SPEA2 is a well-tested MOEA that has been applied in various real-
world optimization problems. Especially in portfolio optimization problems, Anag-
nostopoulos and Mamanis [1, 19] compared a variety of state-of-the-art MOEAs on
multiobjective portfolio optimization problems with binary variables and found that
SPEA2 was a very effective algorithm for providing an approximation of the effi-
cient frontier in reasonable computational effort. Here, in contrast with these studies,
our focus is on the models. Our goal is to show that solving these models approx-
imately can provide the investor with a variety of portfolios with very good return
characteristics. Furthermore, unlike most studies that consider portfolio optimiza-
tion problems with MOEAs and try to improve the in-sample results, our focus is
on the out-of-sample performance of the algorithm and portfolio selection models
accordingly. According to our knowledge no paper has been devoted to solve these
multiobjective portfolio optimization problems with the exception of few papers that
solve the Mean–Variance-Skewness portfolio model [8, 21]. Furthermore, our goal
is to compare the out-of-sample performance of the portfolio selection models as
such a study is, to the best of our knowledge absent from the literature.

7.2 Multiobjective Portfolio Optimization Models

The conventional portfolio selection model assumes a single investment horizon and a finite set of n available financial assets. The investor's task is to build a portfolio by determining the allocation of capital across these assets to maximize profits at the end of the investment period. Each decision variable x_i represents the proportion of the available funds invested in risky asset i = 1, ..., n. The return on each financial asset (denoted by the random variable R_i) at the end of the investment period is initially unknown. The portfolio's return, being a weighted sum of these random variables, is itself a random variable, expressed as $R(x) = \sum_{i=1}^{n} x_i R_i$. The investor aims to construct a portfolio that maximizes the return at the end of the investment period, subject to the constraint that the sum of the proportions assigned to all assets equals 1.
Many approaches have been proposed for choosing among different random vari-
ables [9]. A fundamental answer was given by Harry Markowitz in 1952 [23] who
proposed the Mean–Variance model. The mean is used to define the profit of the
investment and should be maximized while variance defines the risk of the invest-
ment and ought to be minimized. Since Markowitz’s work, many alternative risk
measures have been proposed.
In this spirit the bi-objective mean-risk portfolio optimization problem that must
be solved is given below.

$$\max \ \mu(x)$$
$$\min \ \rho(x)$$
$$\text{s.t.} \quad x \in X = \left\{ x \in \mathbb{R}^n \ \middle|\ \sum_{i=1}^{n} x_i = 1, \ x_i \ge 0 \right\} \tag{7.1}$$

Apart from the mean and risk objectives, these models incorporate a set of constraints forming a feasible collection of decision vectors denoted as X. The simplest way to delineate this feasible set is to stipulate that the weights sum to 1 and to prohibit short-selling, hence ensuring non-negative proportions $x_i \ge 0$, $i = 1, \dots, n$. In this study, the model with only budget and short-sale constraints is called the simple model.
An extension of this simple model includes the introduction of additional real-
world constraints. Among the constraints commonly employed are the cardinality
constraint, which restricts the number of assets held within specified lower (K min )
and upper (K max ) limits, and quantity constraints, which restrict the capital invested
in holding securities to fall within designated lower (li , for i = 1, … n) and upper
(ui , i = 1, … n) bounds.
These constraints define the so-called cardinality constrained portfolio optimiza-
tion problem. The additional real-world constraints (additional with the respect to the
simple model) are described by Eqs. (7.3)–(7.5). Equations (7.3) and (7.4) describe
cardinality and quantity constraints respectively.

$$\min \ \rho(x), \qquad \max \ \mu(x)$$
$$\text{s.t.} \quad \sum_{i=1}^{n} x_i = 1 \tag{7.2}$$
$$K_{\min} \le \sum_{i=1}^{n} \delta_i \le K_{\max} \tag{7.3}$$
$$l_i \delta_i \le x_i \le u_i \delta_i, \quad i = 1, \dots, n \tag{7.4}$$
$$\delta_i \in \{0, 1\}, \quad i = 1, \dots, n \tag{7.5}$$

Both constraints utilize a binary variable $\delta_i$, which is equal to 1 if asset i = 1, ..., n is held in the portfolio and 0 otherwise.
In the above models a further extension may be the introduction of an additional
risk measure or skewness that accounts for tail outcomes. In this study we consider
six multiobjective portfolio optimization models defined by introducing a downside
risk measure or skewness in the above models. The two risk measures that we have
implemented in this study are VaR, and lower partial standard deviation (LPSD).

Moreover, two additional models are formed that use as an additional objective
function the skewness of the portfolio.
For computing the expected return, variance and the other objective functions, the following process is utilized in this paper. Let $r_{it}$ be the observed historical return of asset i at period t. Each period is assumed to define a different scenario that may occur in the future with an associated probability $p_t$; all scenarios are considered equally likely, thus $p_t = 1/T$, where T is the total number of scenarios.
For a portfolio x, its realization under period t is given by the following equation:

$$z_t(x) = \sum_{i=1}^{n} r_{it} \, x_i, \quad t = 1, \dots, T.$$

The expected return of the portfolio is calculated using the following formula:


$$\mu(x) = \sum_{t=1}^{T} z_t(x) \, p_t$$

The variance of the portfolio is given by:

$$V(x) = \frac{1}{T} \sum_{t=1}^{T} \left[ z_t(x) - \mu(x) \right]^2$$

The Value-at-Risk (VaR) at a given confidence level α is the maximum loss (or the
minimum return) that a portfolio will not exceed with a probability α. Probability α
is a parameter of the risk function which is usually fixed at a very small number (e.g.,
0.01, 0.05 or 0.1) in order to account only for extreme losses or extreme minimum
returns. In this study a value of α = 0.1 is used. In the following equation of the
VaR function, the negative sign is used in order to describe loss since zt (x) describes
return. For example, a return of − 3% corresponds to a 3% loss.
$$\mathrm{VaR}_{\alpha}(x) = -\inf \left\{ z_{(t_\alpha)}(x) \ \middle|\ \sum_{j=1}^{t_\alpha} p_{(j)} \ge \alpha \right\}$$

where $z_{(j)}$ are the ordered returns such that $z_{(1)}(x) \le z_{(2)}(x) \le \dots \le z_{(T)}(x)$ and $p_{(j)}$ are their corresponding probabilities of occurrence.
Lower partial standard deviation is computed from the equation below:
$$\mathrm{LPSD}(x) = \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left[ \min\left(0, \ z_t(x) - r_f \right) \right]^2 }$$

where $r_f$ is the risk-free rate. In this study a rate of 0.0005 is used.



Skewness is the third central moment of the return distribution and is calculated by the following formula:

$$S(x) = \frac{1}{T} \sum_{t=1}^{T} \left[ z_t(x) - \mu(x) \right]^3$$

For the sake of completeness, kurtosis is also a moment (the fourth central moment) that accounts for tail risk of the return distribution. The kurtosis of a portfolio $x = (x_1, \dots, x_n)$ is calculated by:

$$K(x) = \frac{1}{T} \sum_{t=1}^{T} \left[ z_t(x) - \mu(x) \right]^4$$

In this study, however, we concentrate on the first three central moments because
these are most commonly used in the specialized literature.
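To make the scenario-based definitions concrete, a minimal R sketch (not the authors' implementation) that computes these statistics from a hypothetical T × n return matrix ret and weight vector x, with equal scenario probabilities p_t = 1/T:

    portfolio_stats <- function(ret, x, alpha = 0.1, rf = 0.0005) {
      z  <- as.vector(ret %*% x)     # z_t(x): portfolio return per scenario
      mu <- mean(z)                  # expected return
      v  <- mean((z - mu)^2)         # variance (1/T convention, as above)
      s  <- mean((z - mu)^3)         # skewness (third central moment)
      # VaR_alpha: negative of the smallest ordered return whose
      # cumulative probability reaches alpha
      var_a <- -sort(z)[ceiling(alpha * length(z))]
      # LPSD: root mean of squared negative deviations from rf
      lpsd <- sqrt(mean(pmin(0, z - rf)^2))
      c(mean = mu, variance = v, skewness = s, VaR = var_a, LPSD = lpsd)
    }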
The above models are multi(three)objective optimization problems with three conflicting objective functions: the mean, which should be maximized; the variance, which should be minimized; and a third objective, either VaR or LPSD, which should be minimized, or skewness, which should be maximized over x.
A portfolio that simultaneously optimizes all three objectives hardly exists. Thus, the aim in multiobjective portfolio optimization is to find all (or a discrete set) of the optimal trade-off portfolios among the three objectives. These trade-off portfolios form a special solution set which is called efficient in modern portfolio theory parlance. The image of the efficient set in the objective space defines the efficient frontier [12]. The intention of multiobjective portfolio optimization is to find the efficient frontier and the set of efficient solutions, i.e., every solution (portfolio structure) that is nondominated with respect to the three objective functions.
In the particular problems at hand, a feasible portfolio $x_1$ is said to dominate another feasible portfolio $x_2$ iff $\mu(x_1) \ge \mu(x_2)$, $V(x_1) \le V(x_2)$ and, depending on the portfolio selection model, either $\mathrm{VaR}(x_1) \le \mathrm{VaR}(x_2)$, $\mathrm{LPSD}(x_1) \le \mathrm{LPSD}(x_2)$ or $S(x_1) \ge S(x_2)$, with at least one strict inequality. This is the so-called Pareto dominance relation in multiobjective optimization parlance.
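A small R sketch of this dominance check for, e.g., the Mean–Variance-VaR variant; a and b are hypothetical objective vectors of the form c(mean, variance, VaR):

    dominates <- function(a, b) {
      # orient all objectives for minimization: negate the mean (maximized)
      am <- c(-a[1], a[2], a[3])
      bm <- c(-b[1], b[2], b[3])
      # weak inequality everywhere plus at least one strict inequality
      all(am <= bm) && any(am < bm)
    }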
Introducing a third objective function (beyond mean and variance) into the portfolio selection model results in an efficient frontier that is a surface in three-dimensional space. Computing the exact efficient surface for the resulting multiobjective portfolio optimization problems is very difficult if not impossible. Furthermore, an additional difficulty arises from the introduction of cardinality and quantity constraints, resulting in a mixed-integer nonlinear multiobjective optimization problem. Usually, however, a discrete approximation of the efficient surface is acceptable and, as we will show, sufficient. For computing the efficient frontier of difficult optimization problems, i.e., problems with large solution spaces, multimodal search landscapes, constraints, nonlinear and non-differentiable functions, and multiple objectives, a very powerful family of techniques is evolutionary algorithms.

7.3 Multiobjective Evolutionary Algorithms

Evolutionary algorithms (EAs) are metaheuristics based on populations, mirroring the principles of Darwin's theory of evolution. Within this category, Multiobjective Evolutionary Algorithms (MOEAs) represent a distinct subset tailored for addressing optimization problems characterized by complex search landscapes and multiple objectives. Contrary to traditional EAs, MOEAs primarily diverge in their selection mechanisms, aiming to guide the population towards the globally optimal efficient frontier. The primary advantage of MOEAs lies in their capability to approximate the efficient frontier effectively within a single run and with minimal computational resources [13].
In this study we aim to analyze, examine, and assess the proposed portfolio optimization models by employing a widely recognized, efficient, and proficient MOEA
known as SPEA2 (Strength Pareto Evolutionary Algorithm 2). SPEA2, initially
introduced by Zitzler et al. in 2001, represents an enhancement of its predecessor,
SPEA [31]. Anagnostopoulos and Mamanis [1] conducted a comparative analysis of
leading MOEAs on a three-objective problem with binary variables, determining that
SPEA2 demonstrated superior performance. This finding underpins the selection of
SPEA2 for addressing the multiobjective portfolio optimization problems outlined
in Sect. 7.2.
SPEA2 generates successive populations of solutions by employing recombina-
tion operators (crossover and mutation) and selection, aiming to steer these popula-
tions towards global optimum regions. It utilizes nondominated ranking and selec-
tion to approach the efficient frontier, integrates diversity-preserving techniques to
prevent convergence to a single solution on the efficient frontier, and incorporates an
elitist archiving strategy to retain the best nondominated solutions discovered during
the search process.
The structure and flow of SPEA2 are given in Fig. 7.1.
SPEA2 employs two solution groups: Population A, which preserves the top
solutions based on both nondominance and diversity throughout the algorithm’s
execution, while Population B represents the conventional set of solutions found in
evolutionary algorithms. Initially, Population B is populated with solutions randomly
generated in the solution space using the initialization operator, while Population A0 ,
often referred to as the archive or external population, begins empty. During each
generation, the evaluate operator assigns fitness to individuals from both archive A
and population B.
First, in the evaluate operator, each individual solution $s_i$ in the archive A and population B is assigned a strength value $SV(s_i)$, equal to the number of individual solutions it dominates:
7 Multiobjective Optimization of Mean–Variance-Downside-Risk … 159

Fig. 7.1 Structure and flow of SPEA2

    t = 0
    (A0, B0) = initialize()
    while (termination = false) do
        evaluate(At, Bt)
        At+1 = truncate(update(At, Bt))
        Bt+1 = variation(sample(At+1))
        t = t + 1
    end while
    return truncate(update(At, Bt))

$$SV(s_i) = \left|\left\{\, s_j \ \middle|\ s_j \in A \cup B \ \wedge\ s_i \succ s_j \,\right\}\right|$$

where $|\cdot|$ denotes the cardinality of a set and the symbol $\succ$ denotes the Pareto dominance relation defined for the multiobjective portfolio optimization models considered in this study in Sect. 7.2.
Thereafter, the fitness of every solution in both archive A and population B is
calculated, determined by the sum of the strengths of its dominators:

$$F(s_i) = \sum_{s_j \in A \cup B \ \wedge\ s_j \succ s_i} SV(s_j).$$

Following this procedure, all non-dominated solutions are assigned a fitness value
of zero. Solutions with lower fitness values are deemed superior to those with higher
fitness values, indicating a focus on minimizing fitness. Subsequently, the evaluate
operator enhances the fitness of each individual by incorporating a crowding value,
aiming to maintain diversity within the population and guide the search across the
entire efficient frontier. Density information is integrated by adjusting the fitness
value of each solution in both the archive and population, based on the inverse of the
k-th smallest Euclidean distance (measured in objective space) plus two. Following
evaluation, the update and truncate operators select the top individuals from both
archive A and population B based on their assigned fitness values. Then, the external
population A undergoes a reproduction scheme similar to single-objective evolu-
tionary algorithms, resulting in the offspring population for the next generation. This
process iterates until a stopping criterion is met. Finally, the algorithm returns the best
solutions, offering the most optimal approximation of the global efficient frontier for
the underlying multi-objective optimization problem.
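As an illustration of the strength and raw-fitness assignment (a sketch, not the authors' C++ implementation), assuming pop is a list of objective vectors and dominates() is defined as in the sketch above:

    spea2_fitness <- function(pop) {
      n <- length(pop)
      # SV(i): number of solutions that individual i dominates
      SV <- sapply(seq_len(n), function(i)
        sum(sapply(seq_len(n), function(j)
          i != j && dominates(pop[[i]], pop[[j]]))))
      # F(i): sum of strengths of i's dominators (0 for nondominated)
      sapply(seq_len(n), function(i)
        sum(SV[sapply(seq_len(n), function(j)
          j != i && dominates(pop[[j]], pop[[i]]))]))
    }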
When MOEAs are applied to real-world multiobjective optimization problems, several issues must be taken care of, such as solution representation and variation operators. In this study, a problem-specific data structure for representing a solution and specialized variation operators that produce a feasible portfolio most of the time are implemented.
Each individual contains the following vectors for representing a solution:

$$S = \{ s_1, \dots, s_k \}, \quad k \in \{ K_{\min}, \dots, K_{\max} \},$$
$$W = \{ w_{s_1}, \dots, w_{s_k} \}, \quad 0 \le w_{s_i} \le 1, \ i = 1, \dots, k.$$

Vector S includes $k \in \{K_{\min}, \dots, K_{\max}\}$ integer numbers that represent the assets in the portfolio, while array W includes k real numbers between 0 and 1 associated with each asset. In order to satisfy the quantity constraints, the following procedure is followed. For satisfying the lower bounds, the following normalization equation is implemented:

$$x_{s_i} = l_{s_i} + \frac{w_{s_i}}{\sum_{s \in S} w_s} \left( 1 - \sum_{s \in S} l_s \right), \quad i = 1, \dots, k$$

To meet the upper bound constraint, if a particular asset within the portfolio
surpasses its upper limit following the application of the aforementioned equation,
it is adjusted to adhere to its upper bound. Any surplus weight is then redistributed
among the remaining assets in the portfolio according to W.
For the multiobjective portfolio optimization problems with only budget and short-sale constraints, we simply set $K_{\min} = 1$, $K_{\max} = n$, $l_i = 0$, and $u_i = 1$ for every i. This solution representation and constraint-handling technique were proposed by Chang et al. [7].
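A sketch of this decoding-and-repair step, assuming S holds the selected asset indices, W the associated weights, and l and u the per-asset lower and upper bounds:

    decode_weights <- function(S, W, l, u) {
      # lower-bound normalization, as in the equation above
      x <- l[S] + W / sum(W) * (1 - sum(l[S]))
      # repair: cap assets above their upper bound and redistribute
      # the surplus among the rest in proportion to W
      while (any(x > u[S] + 1e-12)) {
        over     <- x > u[S]
        surplus  <- sum(x[over] - u[S][over])
        x[over]  <- u[S][over]
        x[!over] <- x[!over] + surplus * W[!over] / sum(W[!over])
      }
      x  # proportions of the k selected assets, summing to 1
    }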

7.4 Computational Results

7.4.1 Computational Experiments on S&P 100 Index

In this section, we present the outcomes of our computational investigation utilizing assets contained within the S&P 100 index. For this analysis, we utilized a dataset sourced from the S&P 100 index, gathering the necessary data from the Yahoo Finance webpage, focusing on closing prices adjusted for dividends and stock splits. We calculated the monthly returns spanning from January 1, 2000, to June 30, 2020, for a total of 78 assets. The monthly returns from January 1, 2000, to December 31, 2018, were utilized to derive the optimal portfolios. Each computed rate of return defined a distinct scenario, resulting in a total of T = 228 scenarios. For the out-of-sample analysis, we evaluated the performance of the portfolios generated by the algorithm over the subsequent eighteen months following the selection date, covering the period from January 1, 2019, to June 30, 2020.

SPEA2 was executed 10 times for each portfolio selection problem using a
laptop computer equipped with an Intel(R) Core(TM) i5-7200U processor running at
2.5 GHz and 4.00 GB of RAM. The implementation was carried out using Microsoft
Visual C++. Across the ten runs of the algorithm, 10 different efficient frontiers
were generated for each portfolio optimization model. The parameters necessary for
running SPEA2 were set as follows: a population and archive size of 300 individuals
were utilized, along with a crossover probability of 0.9. Mutation probabilities of
0.01 for the S array and 1.0 for the W set were employed. The algorithm was termi-
nated after generating 150,000 solutions. On average, it took approximately 650 s to
obtain the efficient frontiers.
The next figures (Figs. 7.2, 7.3 and 7.4) show the efficient solutions depicted
in Mean–Variance space for the three models for a single execution of the algo-
rithm. From the first two graphs, it is seen that the algorithm generated a diverse
set of efficient portfolios ranging from approximately 1.2% expected rate of return
a month to 2.5%. There are also solutions that are not Mean–Variance efficient
(they have lower expected return and bigger variance) but have less downside risk
measured either by LPSD or VaR. However, from the graph for the Mean–Variance-
Skewness portfolio selection model (Fig. 7.4) we see that the efficient solutions are
far more diverse and diverge much more from the Mean–Variance efficient solu-
tions. This is because these portfolios have excessively large skewness. Thus, they
offer considerably less expected return, large variance but large skewness as well.
These portfolios seem to perform badly out-of-sample, degrading the performance of the Mean–Variance-Skewness portfolio optimization model, as we will see next.
In three dimensions, the approximate efficient portfolios are shown in the next figures (Figs. 7.5, 7.6 and 7.7). It is seen that the algorithm generates a very diverse set of efficient portfolios for the decision maker to select from.
Now an out-of-sample evaluation of the described portfolio models will be
presented. For each efficient frontier of a particular portfolio selection model, we

Fig. 7.2 Mean–Variance-LPSD efficient portfolios depicted in Mean–Variance space



Fig. 7.3 Mean–Variance-VaR efficient portfolios depicted in Mean–Variance space

Fig. 7.4 Mean–Variance-Skewness efficient portfolios depicted in Mean–Variance space

computed the percentage of portfolios obtained by the algorithm that produce better final wealth (FW) (compounded every month for the 18 months following the date of selection) than the S&P 500 index. Table 7.1 shows the percentage of portfolios that generate more wealth than investing in the S&P 500 index for each portfolio model, i.e., Mean–Variance-VaR, Mean–Variance-LPSD and Mean–Variance-Skewness. It is seen that the majority of the "optimal" portfolios generate more wealth than the S&P 500 index, except for the Mean–Variance-Skewness portfolio selection model. The same table also shows the percentage of portfolios generated by the algorithm that exceed the Sharpe ratio (SR) and Sortino ratio (SoR) of the S&P 500 index. The results are similar to those for FW.
The Sharpe ratio is computed using the formula {(Mean Portfolio Return − Risk-Free Rate)/Standard Deviation of Portfolio Return}, while the Sortino ratio is given by

Fig. 7.5 The efficient portfolios in the three-dimensional space for the Mean–Variance-LPSD
portfolio model

Fig. 7.6 The efficient portfolios in the three-dimensional space for the Mean–Variance-VaR
portfolio model

Fig. 7.7 The efficient portfolios in the three-dimensional space for the Mean–Variance-Skewness
portfolio model

Table 7.1 Results for the simple portfolio selection models

                                           VaR      LPSD     Skewness
    % final wealth better than S&P 500     76.7     96.5     30.4
    % SR better than S&P 500               78.8     98.7     30.6
    % SoR better than S&P 500              80.4     99       30.8

    S&P 500 results: FW = 1.21, SR = 0.13, SoR = 0.18

                           VaR                     LPSD                   Skewness
                           FW    SR      SoR       FW    SR    SoR       FW     SR      SoR
    Average max value      2.61  0.62    1.18      2.61  0.63  1.17      2.61   0.63    1.17
    Average min value      1.07  0.0008  0.0011    1.14  0.08  0.11      0.77   −0.1    −0.13
    Average median value   1.45  0.3     0.49      1.61  0.4   0.68      1.028  0.007   0.009

{(Mean Portfolio Return − Risk-Free Rate)/Downside Risk Deviation}. Downside risk deviation is the standard deviation of the portfolio's returns that are below the risk-free rate. The risk-free rate was fixed at 0.0005.
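A short R sketch of the two ratios from a hypothetical vector r of monthly portfolio returns, with the risk-free rate fixed at 0.0005 as in the study:

    sharpe_sortino <- function(r, rf = 0.0005) {
      excess  <- mean(r) - rf
      sharpe  <- excess / sd(r)
      # downside deviation: std. dev. of the returns below the risk-free rate
      sortino <- excess / sd(r[r < rf])
      c(Sharpe = sharpe, Sortino = sortino)
    }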
The next lines of Table 7.1 show the average maximum, the average minimum and the average median final wealth for each portfolio model, averaged over the ten replications of the algorithm. The final wealth produced

by investing in the S&P 500 index is 1.21. It is seen that the Mean–Variance-LPSD model produces the best results on this measure compared to the other two portfolio models. Mean–Variance-VaR comes next. The Mean–Variance-Skewness portfolio model does not provide satisfactory results.
However, the better final wealth of the Mean–Variance-Downside-Risk portfolio models may have been realized by simply taking more risk (thus it may be compensation for bearing extra risk). For this reason, we calculated the Sharpe ratio and the Sortino ratio for each multiobjective portfolio optimization model. The same table shows the percentage of portfolios that produce better ratios than the S&P 500 index. In addition, the average maximum, average minimum and average median ratios over the ten replicates of the algorithm are supplied for each multiobjective portfolio model. The most important line of Table 7.1 is the last one, which provides the average median ratios for each performance measure and portfolio model. The results show that the Mean–Variance-LPSD model performs best, beating both the S&P 500 index and the other portfolio models. Mean–Variance-VaR comes next, also being better than the market portfolio. Mean–Variance-Skewness does not perform well; it underperforms the index on all performance measures.
However, the optimal efficient portfolios generated by the algorithm for the portfolio models without the cardinality constraint contain from 4–5 assets (the majority) to 10–11 assets (the minority). A portfolio of this size might be considered inadequately diversified by many investment managers. For this reason, we implemented the algorithm on the multiobjective cardinality constrained portfolio optimization models. We fixed the size of the portfolio to be between 10 and 15 assets; according to Bodie et al. [4], all the nonsystematic risk can be diversified away by holding 10 or more assets. A lower bound of 5% for each asset (if it is in the portfolio) was fixed in order to avoid very small holdings, and an upper bound of 25% on the proportion of each asset was fixed in order to avoid overweighting some assets and to ensure proper diversification.
The results are as expected: they deteriorate a little for all models, since the additional constraints impose a restriction on portfolio formulation. However, for Mean–Variance-VaR the imposition of cardinality and quantity constraints leads the algorithm to produce more robust efficient frontiers, as the percentage of portfolios that exceed the final wealth, Sharpe ratio and Sortino ratio of the S&P 500 increases and approaches 100. It seems that the imposition of cardinality constraints restricts the search space and leads to tighter results.
As seen in Table 7.2, Mean–Variance-LPSD generates almost the same quality of solutions as the Mean–Variance-VaR portfolio model. Mean–Variance-Skewness does not perform well. Both Mean–Variance-LPSD and Mean–Variance-VaR are considered better than the S&P 500. The average median final wealth is 1.29 for Mean–Variance-LPSD and 1.28 for Mean–Variance-VaR, while that of the S&P 500 index is 1.21. The same conclusions are drawn considering the Sharpe and Sortino ratios. The average median Sharpe ratio is 0.24 and 0.23 for Mean–Variance-LPSD and

Mean–Variance-VaR, respectively. The average median Sortino ratio was 0.4 and 0.39, respectively, for the two portfolio models.
The efficient solutions for each portfolio model are shown in the next figures (Figs. 7.8, 7.9 and 7.10). The same conclusions as for the portfolio models without cardinality constraints can be drawn.
Furthermore, the three-dimensional plots for the cardinality constrained Mean–Variance-LPSD, Mean–Variance-VaR and Mean–Variance-Skewness portfolio models are given next (Figs. 7.11, 7.12 and 7.13).

Table 7.2 Results for the cardinality constrained portfolio selection models

                                           VaR      LPSD     Skewness
    % final wealth better than S&P 500     91.7     100      26.4
    % SR better than S&P 500               97.9     100      28.3
    % SoR better than S&P 500              99.1     97.7     27.9

    S&P 500 results: FW = 1.21, SR = 0.13, SoR = 0.18

                           VaR                  LPSD                  Skewness
                           FW    SR    SoR      FW    SR    SoR       FW    SR      SoR
    Average max value      1.51  0.38  0.71     1.5   0.38  0.72      1.51  0.37    0.68
    Average min value      1.17  0.11  0.16     1.19  0.13  0.2       0.94  −0.07   −0.07
    Average median value   1.28  0.23  0.39     1.29  0.24  0.4       1.12  0.057   0.07

Fig. 7.8 Cardinality constrained Mean–Variance-LPSD efficient portfolios depicted in Mean–Variance space

Fig. 7.9 Cardinality constrained Mean–Variance-VaR efficient portfolios depicted in Mean–Variance space

Fig. 7.10 Cardinality constrained Mean–Variance-Skewness efficient portfolios depicted in Mean–Variance space

It is seen that the algorithm generates a diverse set of portfolios that trade off among the three objectives.

7.4.2 Computational Experiments on a Large-Scale Problem Instance

In this section we report the results obtained by running SPEA2 on a publicly available large data set containing returns for the S&P 500 (442 assets). The data set is

Fig. 7.11 The efficient portfolios in the three-dimensional space for the cardinality constrained
Mean–Variance-LPSD portfolio model

Fig. 7.12 The efficient portfolios in the three-dimensional space for the cardinality constrained
Mean–Variance-VaR portfolio model

Fig. 7.13 The efficient portfolios in the three-dimensional space for the cardinality constrained
Mean–Variance-Skewness portfolio model

provided by Bruni et al. [5]. It includes 595 weekly linear returns for 442 stocks included in the S&P 500 index. For conducting the experiments, we take the first 520 returns for the in-sample optimization and the remaining 75 (approximately one and a half years) for the out-of-sample analysis.
Due to space limitation, we do not provide the figures of the efficient portfolios.
Similar conclusions can be drawn as with the first set of experiments. We only provide
the out-of-sample comparison of the models.
From Table 7.3, it is observed that the majority of efficient portfolios generated by the algorithm perform better than the index on all three performance measures, although the percentage of portfolios better than the index does not approach 100 as in the previous set of experiments. An important point is the improvement of the Mean–Variance-Skewness portfolio selection model, which indicates a lack of stability of this portfolio model with respect to the data set and constraint parameters. However, considering the median of the performance measures over the ten replicates of the algorithm, it is observed that the best models are the Mean–Variance-VaR and Mean–Variance-LPSD portfolio models, which are better than the index on all performance measures.
On the other hand, the imposition of cardinality constraints again improves the out-of-sample performance in terms of the percentage of generated solutions that are better than the S&P 500 index. It is seen that all efficient portfolios generated by SPEA2 for the Mean–Variance-VaR portfolio model outperform the market on all performance measures. The average median performance values increase as well. The same is observed for the Mean–Variance-LPSD portfolio selection model.

Table 7.3 Results for the portfolio selection models on S&P 500

                                      VaR     LPSD    Skewness
% final wealth better than S&P 500    84.1    69.5    68.2
% SR better than S&P 500              85.7    72      73
% SoR better than S&P 500             85.7    72      72.7

S&P 500 results: FW = 1.053, SR = 0.02, SoR = 0.026

                       VaR                   LPSD                     Skewness
                       FW    SR     SoR      FW    SR      SoR       FW    SR      SoR
Average max value      1.36  0.18   0.27     1.35  0.19    0.28      1.72  0.18    0.28
Average min value      0.98  0.003  0.004    0.95  −0.004  −0.005    0.83  −0.055  −0.07
Average median value   1.18  0.07   0.096    1.17  0.065   0.089     1.1   0.043   0.055

Furthermore, the Mean–Variance-LPSD model presents slightly better results than the Mean–Variance-VaR portfolio model. The results deteriorate, on the other hand, for the Mean–Variance-Skewness portfolio selection model, which again indicates a lack of stability and robustness for this particular portfolio model.

7.4.3 Comparison with Competing Portfolios

In addition to the stock market index, the results of the proposed portfolio selection models are compared against competing portfolios. The global minimum-variance portfolio without short sales (MV) and a minimum stochastic dominance portfolio (SD) [17] are used as benchmarks. There is evidence that all these strategies produce good out-of-sample results [3, 6, 10, 11].
The next table shows, for each portfolio model, the percentage of portfolios that generate greater final wealth (FW), Sharpe ratio (SR) and Sortino ratio (SoR) than investing in the minimum second-order stochastic dominance portfolio with short sales not allowed (SD). Note that only the proposed portfolio models with cardinality constraints are presented, since they give the best results based on the above analysis.
The Sharpe ratio for the second-order stochastic dominance portfolio with short sales not allowed (SD) is 0.086, the Sortino ratio is 0.12 and the final wealth 1.22. As can be seen from Table 7.4, the average median value for all generated portfolios using LPSD is almost equal to this portfolio (the second-order stochastic dominance portfolio) on all performance measures. However, as can be seen from Table 7.5, the majority of portfolios (more than 50%) generated by the proposed model are better than the SD model on all three performance measures. Furthermore, it is worth pointing out that the exact optimization algorithm for computing the optimal SD portfolio takes approximately 30 min to generate a single optimal portfolio. The benefit of using heuristics is obvious, as SPEA2 generates a set of optimal portfolios in a timely manner (approximately 500 s for the entire efficient frontier, on average).
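The three performance measures used throughout these comparisons can be computed as in the following sketch. The chapter does not state its risk-free-rate, target-return or annualization conventions, so the per-period definitions below (zero risk-free rate and zero target, both assumed) are illustrative only:

import numpy as np

def performance_measures(weekly_returns, rf=0.0, target=0.0):
    """Final wealth, Sharpe ratio and Sortino ratio of a portfolio
    return series (1-D array of linear per-period returns)."""
    r = np.asarray(weekly_returns, dtype=float)
    fw = np.prod(1.0 + r)                     # final wealth of 1 unit invested
    excess = r - rf
    sr = excess.mean() / excess.std(ddof=1)   # per-period Sharpe ratio
    downside = np.minimum(r - target, 0.0)    # only returns below the target
    dd = np.sqrt(np.mean(downside ** 2))      # downside deviation
    sor = (r.mean() - target) / dd            # per-period Sortino ratio
    return fw, sr, sor

# Example on random data:
rng = np.random.default_rng(1)
fw, sr, sor = performance_measures(rng.normal(0.002, 0.02, size=75))
print(round(fw, 3), round(sr, 3), round(sor, 3))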

Table 7.4 Results for the cardinality constrained portfolio selection models on S&P 500

                                      VaR    LPSD   Skewness
% final wealth better than S&P 500    100    100    26.4
% SR better than S&P 500              100    100    28.3
% SoR better than S&P 500             100    100    27.9

S&P 500 results: FW = 1.053, SR = 0.02, SoR = 0.026

                       VaR                  LPSD                  Skewness
                       FW    SR     SoR     FW    SR     SoR      FW    SR     SoR
Average max value      1.37  0.18   0.27    1.38  0.2    0.29     1.51  0.37   0.68
Average min value      1.09  0.037  0.05    1.12  0.047  0.064    0.94  −0.07  −0.07
Average median value   1.21  0.08   0.12    1.23  0.09   0.13     1.12  0.057  0.07

Table 7.5 Results for each portfolio selection model against SD

                                VaR    LPSD   Skewness
% final wealth better than SD   42.5   55.7   2.3
% SR better than SD             46.3   57.9   3.7
% SoR better than SD            46.5   57.5   3.2

Table 7.6 Results for each portfolio selection model against MV

                                VaR    LPSD   Skewness
% final wealth better than MV   35.5   47.3   1.9
% SR better than MV             6.9    14     0.4
% SoR better than MV            7.6    14.3   0.43

Fig. 7.14 Out-of-sample performance (FW—final wealth) of the Mean–Variance-LPSD portfolio model versus the MV portfolio (FW plotted against LPSD)

The next table (Table 7.6) shows, for each portfolio model, the percentage of portfolios that generate greater final wealth (FW), Sharpe ratio (SR) and Sortino ratio (SoR) than investing in the global minimum-variance portfolio with short sales not allowed (MV). The Sharpe ratio for the global minimum-variance portfolio with short sales not allowed is 0.137, the Sortino ratio is 0.194 and the final wealth 1.23.
It is seen that only a small fraction of the proposed portfolio models generate better Sharpe and Sortino ratios than the global minimum-variance portfolio. These results do not imply, however, that the proposed models are not good, except perhaps the Mean–Variance-Skewness model, which consistently provides worse results than the other two. Comparing the global minimum-variance portfolio with short sales not allowed against the best portfolio generated under each of the three models, we see that the three models provide better results. Of course, it would be unrealistic to expect all portfolios of the proposed models to be better than the global minimum-variance portfolio with short sales not allowed. It must be noted that the global minimum-variance portfolio is itself an efficient portfolio under the three models (with the constraints imposed, of course).
We can see from the graphs below that an investor who concentrates on the minimum-LPSD portfolios can obtain better results than the Markowitz Mean–Variance portfolio on all performance measures (Figs. 7.14, 7.15 and 7.16).

Fig. 7.15 Out-of-sample performance (SR—Sharpe ratio) of the Mean–Variance-LPSD portfolio model versus the MV portfolio (SR plotted against LPSD)

Fig. 7.16 Out-of-sample performance (SoR—Sortino ratio) of the Mean–Variance-LPSD portfolio model versus the MV portfolio (SoR plotted against LPSD)

7.5 Conclusion

Multiobjective optimization in financial problems has been gaining momentum in recent years. An especially promising area for multiobjective optimization is the portfolio optimization problem, which is inherently multiobjective from its origin. In this research we considered three multiobjective portfolio optimization problems, namely Mean–Variance-VaR, Mean–Variance-LPSD and Mean–Variance-Skewness. These models have never been solved by any researcher; we believe this is mainly because of the difficulty they present for conventional mathematical programming techniques. In this study we have solved the three models using a very popular Multiobjective Evolutionary Algorithm, SPEA2. Our goal was to show that approximating the efficient frontiers can provide useful portfolios for the investor. The results showed that the majority of the generated portfolios, despite being approximate, have better out-of-sample performance than the S&P 500 index, except for the Mean–Variance-Skewness portfolio selection model. This outperformance was measured using three performance measures: final wealth, Sharpe ratio and Sortino ratio.
Comparison against competing portfolios shows that the portfolios of the proposed models, except Mean–Variance-Skewness, provide competitive results. In particular, the efficient portfolios of the proposed models that concentrate on the minimum-risk area of the efficient frontier provide better results than the competing portfolios.
As future research, a rolling window for the out-of-sample analysis may be considered in order to test the predictive ability of the proposed portfolio selection models. Also, a transaction cost constraint may be imposed on the portfolio selection models.

Declarations Funding: No funding.


Conflicts of interest/Competing interests: There are no conflicts of interest.
Availability of Data and Material: Yes.
Code availability: Yes.

References

1. Anagnostopoulos, K.P., Mamanis, G.: A portfolio optimization model with three objectives
and discrete variables. Comput. Oper. Res. 37, 1285–1297 (2010)
2. Bechikh, S., Datta, R., Gupta, A., (eds.).: Recent Advances in Evolutionary Multi-objective
Optimization. Springer International Publishing (2017)
3. Board, J.L.G., Sutcliffe, C.M.S.: Estimation methods in portfolio selection and the effectiveness
of short sales restrictions: UK evidence. Manag. Sci. 40(4), 516–534 (1994)
4. Bodie, Z., Kane, A., Marcus, A.J.: Investments, 10th edn. McGraw-Hill (2014)
5. Bruni, R., Cesarone, F., Scozzari, A., Tardella, F.: Real-world datasets for portfolio selection
and solutions of some stochastic dominance portfolio models. Data Brief 8, 858–862 (2016)
6. Chan, L.K.C., Karceski, J., Lakonishok, J.: On portfolio optimization: forecasting covariances
and choosing the risk model. Rev. Fin. Stud. 12(5), 937–974 (1999)
7. Chang, T.J., Meade, N., Beasley, J.E., Sharaiha, Y.M.: Heuristics for cardinality constrained
portfolio optimization. Comput. Oper. Res. 27, 1271–1302 (2000)
8. Chen, B., Zhong, J., Chen, Y.: A hybrid approach for portfolio selection with higher-order
moments: empirical evidence from shanghai stock exchange. Exp. Syst. Appl. 145(1), 1–11
(2020)
9. De Giorgi, E.: Reward-risk portfolio selection and stochastic dominance. J. Bank Fin. 29,
895–926 (2005)
10. DeMiguel, V., Garlappi, L., Uppal, R.: Optimal versus naive diversification: how inefficient is
the 1/N portfolio strategy? Rev. Fin. Stud. 22(5), 1915–1953 (2007)
11. DiBartolomeo, D.: The Equity Risk Premium, CAPM and Minimum Variance Portfolios.
Northfield News (2007)
12. Elton, E.J., Gruber, M.J., Brown, S.J.: Modern Portfolio Theory and Investment Analysis, 9th
edn. Wiley (2014)

13. Emmerich, M.T.M., Deutz, A.H.: A tutorial on multiobjective optimization: fundamentals and
evolutionary methods. Nat. Comp. 17, 585–609 (2018)
14. Fishburn, P.C.: Mean-risk analysis with risk associated with below target returns. Am. Econ.
Rev. 67, 116–126 (1977)
15. Garcia-Bernabeu, A., Salcedo, J.V., Hilario, A., Pla-Santamaria, D., Herrero, J.M.: Computing
the mean-variance-sustainability nondominated surface by Ev-MOGA. Complexity (2019).
https://doi.org/10.1155/2019/6095712
16. Konno, H., Yamazaki, H.: Mean absolute deviation portfolio optimization model and its
applications to Tokyo stock market. Manag. Sci. 37, 519–531 (1991)
17. Kuosmanen, T.: Efficient diversification according to stochastic dominance criteria. Manag.
Sci. 50, 1390–1406 (2004)
18. Luenberger, D.G.: Investment Science. Oxford University Press, New York (1998)
19. Mamanis, G.: A comparative study on multi-objective evolutionary algorithms for tri-objective
mean-risk-cardinality portfolio optimization problems. In: Patnaik, S., Tajeddini, K., Jain, V.
(eds.), Computational Management. Modeling and Optimization in Science and Technologies,
pp. 277–303 (2021)
20. Mamanis, G.: Analyzing the performance of a two-tail-measures-utility multi-objective
portfolio optimization model. Oper. Res. Forum 2(58) (2021)
21. Mamanis, G., Anagnostopoulos, K.P.: Multiobjective optimization of a discrete mean-variance-
skewness portfolio selection model using SPEA2. J. Fin. Decis. Mak. 7(2), 75–86 (2011)
22. Maringer, D., Parpas, P.: Global optimization of higher order moments in portfolio selection.
J. Glob. Opt. 43, 219–230 (2009)
23. Markowitz, H.M.: Portfolio selection. J. Fin. 7, 77–91 (1952)
24. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs, 3rd edn.
Springer (2013)
25. Ogryczak, W., Ruszczynski, A.: From stochastic dominance to mean-risk models: semidevia-
tions as risk measures. Eur. J. Oper. Res. 116, 33–50 (1999)
26. Ogryczak, W., Ruszczynski, A.: On consistency of stochastic dominance and mean-
semideviations models. Math. Prog. 89, 217–232 (2001)
27. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42
(2000)
28. Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. J. Bank
Fin. 26(7), 1443–1471 (2002)
29. Roman, D., Darby-Dowman, K., Mitra, G.: Mean-risk models using two risk measures: a
multi-objective approach. Quant. Fin. 7(4), 443–458 (2007)
30. Yitzhaki, S.: Stochastic dominance, mean variance and Gini’s mean difference. Am. Econ. Rev.
72, 178–185 (1982)
31. Zitzler, E., Thiele, L.: Multiobjective evolutionary algorithms: a comparative case study and
the strength pareto approach. IEEE Trans. Evol. Comp. 3(4), 257–271 (1999)
32. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the Strength Pareto Evolu-
tionary Algorithm. TIK-103, Department of Electrical Engineering, Swiss Federal Institute
of Technology, Zurich, Switzerland (2001)
Part III
Risk Assessment and Ethical
Considerations
Chapter 8
Bankruptcy Forecasting of Indian
Manufacturing Companies Post
the Insolvency and Bankruptcy Code
2016 Using Machine Learning
Techniques

Simrat Kaur and Anjali Munde

Abstract Purpose: Bankruptcies have increased dramatically in recent years. The


manufacturing industry is one of the most important contributors to the country’s
Gross Domestic Product (GDP). The GDP of a country reflects its development
and progress. More and more bankruptcies in the manufacturing business will have
a significant influence on the country’s GDP. The primary goal of this study is to
conduct a comparative analysis of numerous bankruptcy predictive models in order to
recommend the optimal model with the highest accuracy for bankruptcy prediction.
Methodology/Approach: This research employs a number of machine-learning
forecasting approaches. Logistic Regression, Decision Tree, Artificial Neural
Networks (ANN), and Random Forest are the machine learning techniques employed
in this paper. A comparison study is conducted with and without Principal Compo-
nent Analysis (PCA). A total of 15 financial factors were identified from prior studies,
and a comparative analysis was conducted with those variables. Information on companies that went bankrupt between 1 April 2017 and 31 March 2020 is collected from the Insolvency and Bankruptcy Board of India (IBBI) database. Data for the previous three years is gathered from the annual reports of 70 enterprises (35 bankrupt, 35 non-bankrupt).
Contribution: This paper adds to the existing research on bankruptcy. There is
relatively limited research on bankruptcy prediction after the implementation of the
Insolvency and Bankruptcy Code (IBC), 2016. Most studies on bankruptcy prediction
in India used logistic regression or ANN because of their widespread use and good
accuracy. In India, very few studies have used decision tree-based methodologies to

S. Kaur (B)
Amity University, Noida, Uttar Pradesh 201313, India
e-mail: [email protected]
A. Munde
Southampton Malaysia Business School, University of Southampton, Iskandar Puteri, Malaysia
e-mail: [email protected]


forecast bankruptcy. This research, on the other hand, contributes to decision tree-based studies, which show more accurate results than ANN or logistic regression.
Limitations: One of the major limitations of this paper is that it mainly considers financial variables. Recent research has considered not just financial variables but also corporate governance indicators and macroeconomic variables. Another limitation is that this study primarily focuses on the manufacturing industry; bankruptcy research in other industries is therefore required.

Keywords Machine learning · PCA · Logistic regression · ANN · Random


forest · Decision tree

JEL Classification C45 · C53 · G33

8.1 Introduction

Timely failure prediction of a business corporation is a crucial issue in today’s


economic system, given the influence of the global financial crisis on the
global economy over the past ten years [1]. Expenses may be incurred by a
distressed company without necessarily resulting in bankruptcy in the end [2]. In
the domains of statistics, accounting, finance, and other business sciences, it is
becoming crucial to be able to predict an enterprise’s financial distress with accuracy
[3]. Predicting corporate financial distress is a subject that has long drawn a great
deal of attention in finance and management research due to its vital significance in
the risk management of financial organizations [4].
An industrial firm (one that has been in business for at least five years) in India is
sent to the Board for Industrial and Financial Reconstruction (BIFR) if it had losses
at the end of any fiscal year equal to or higher than its entire net worth [5]. The
IBC 2016 intends to strengthen the current framework by repealing the Provincial
Insolvency Act of 1920 and the Presidency Towns Insolvency Act of 1909. It also
modifies 11 other legislations, including the Companies Act of 2013, the Recovery
Of Debts And Bankruptcy Act of 1993, and the Securitization and Reconstruction
of Financial Assets and Enforcement of Security Interest (SARFAESI) Act of 2002
[6]. The legal foundation is built up by a series of Rules and Regulations adopted
under this Act. The Insolvency and Bankruptcy Board of India was established as
the regulator on October 1, 2016 [7]. The IBC is a bankruptcy law that intends
to reform and consolidate the legal frameworks associated with the reorganization
and insolvency resolution of corporations, partnership firms, and individuals, among
others:
• Recovery of Debts due to Banks and Financial Institutions Act, 1993
• SARFAESI Act, 2002
• Sick Industrial Companies (Special Provisions) Act, (SICA) 1985 repealed

• Winding up provisions of the Companies Act, 1956, Companies Act, 2013, and
LLP Act, 2013
• The Presidential Towns Insolvency Act, 1909
• Provincial Insolvency Act, 1920
In June 2017, the Reserve Bank of India (RBI) ordered that the 12 major loan
defaulters be taken before the National Company Law Tribunal (NCLT) and held
responsible under the IBC [8]. There has been relatively little study on forecasting
bankruptcy following the implementation of IBC. The primary goal of the IBC is to
aid distressed corporate defaulters [9].
A few research gaps were identified while researching this topic. One gap is that, owing to the recent implementation of the IBC 2016, research on post-IBC 2016 bankruptcy data is very limited; this paper focuses on the prediction of bankruptcy using post-IBC 2016 data. Another gap is that research on bankruptcy prediction using machine learning techniques in India is very limited. The significance and use of machine learning in today's environment
learning has grown for the benefit of society by demonstrating how it can be used
to anticipate bankruptcy. First, PCA is used in this study to investigate the impact
of identified variables on bankruptcy, and then several methodologies such as ANN,
logistic regression, decision tree, and random forest are used to build financial distress
prediction models and compare their performance using bankruptcy data from India.
This can assist in analyzing and predicting a company’s financial health, preventing
it from going bankrupt. Through the use of financial indicators, it can provide a
more in-depth understanding of the implementation of the four models mentioned
above. The models created in this study could be used to predict corporate failure
by investors, creditors, auditors, and others associated with a company. The primary
objectives of this research are:
R1: To establish financial variables for predicting the bankruptcy of Indian
manufacturing companies.
R2: To investigate the impact of financial variables on the bankruptcy prediction of
Indian manufacturing companies using PCA.
R3: To perform a comparative analysis of several machine learning approaches in
Indian manufacturing companies.

8.2 Literature Review

Since the early work of FitzPatrick in the 1930s, there has been extensive research into
the ability to predict financial distress for financial companies [10]. Beaver began by
claiming that financial ratios can be used in models that predict bankruptcy, financial
difficulties, and individual firm failure [11]. In 1968, Altman developed the first
model to predict bankruptcy. Altman developed the Z-score by incorporating five
variables [12]. The model’s short-term accuracy was 95%, according to Altman, but

when applied two or more years prior to the bankruptcy, that figure drops to
72%. Ohlson and Zmijewski investigated the possibility of bankruptcy using logit
and probit models [13, 14]. A logit analysis is utilized in a different model created
by Zavgren, to determine the likelihood that a solution specified by a dichotomous
(or polytomous) dependent variable will occur [15].
Neural networks (NNs) dominated the Artificial Intelligence (AI) research area
in the mid-1980s. Ever since, academics have widely utilized NNs, specifically
back-propagation neural networks (BPNN), to solve classification problems such
as bankruptcy prediction [16]. In a 1991 study, Hertz stated that algorithm-based
computer networks called ANNs might be constructed to mimic the internal work-
ings of the human brain [17]. Odom and Sharda were the first to apply NNs to the bankruptcy prediction problem [18]. Among others, Altman conducted another investigation
with NNs. The problems of “black-box” NN algorithms, including the indicators’
illogical weightings and overfitting in the training stage, which both have a severe
impact on prediction accuracy, were highlighted in particular [19].
Bhunia and Sarkar forecasted financial distress in Indian firms using financial
ratios and multiple discriminant analysis. Profitability and liquidity ratios did excep-
tionally well in predicting distress, according to the findings [20]. Debt ratio, total
asset turnover ratio, working capital ratio, and net income to total assets ratio are all
significant financial measures [21].
To recognize and separate bankrupt companies from non-bankrupt companies more accurately, novel and innovative forecasting models for bankruptcy have been researched [22]. Fedorova et al. predicted bankruptcy using financial information from Russian businesses, combining Multiple Discriminant Analysis (MDA), Logistic Regression (LR), Classification and Regression Trees (CRT), and ANNs [23]. Although MDA is the most commonly used approach for predictive modeling, logistic analysis (LA) techniques are also used to manage various MDA-related challenges, as stated in [24]. The t-statistics feature selection method was used to assess a number of intelligent techniques, including Random Forest, Regression Trees, Support Vector Machines (SVM), Logistic Regression, and Multilayer Perceptron (MLP) [25]. PCA is another technique used for feature selection. Pearson initially introduced the idea of PCA [26]. With PCA, a new set of variables called principal components is created, each one being a linear combination of the original variables [16].
The major goal of the study by Kim was to systematically evaluate machine learning techniques for forecasting company failure [27]. Logistic regression is extensively utilized in business Financial Distress Prediction (FDP) investigations. Decision Trees (DTs) for FDP are used in a variety of studies, including Chen [28]. Similar methods, like Random Forests, are described in the paper by Breiman; they have the advantage of separating the data using multiple decision trees. Random Forest (RF), also known as Random Subspace, uses a large number of independent, unpruned decision trees for training and class creation [29]. Creamer and Freund were among the first researchers to employ random forests for bankruptcy prediction problems [30]. Only a few studies have looked into the usage of random forest in business financial distress prediction [25]. According to the

study, machine learning algorithms such as random forest, bagging, boosting, and
SVM outperform statistical techniques such as discriminant analysis and logistic
regression by 10% [31].

8.3 Data Collection and Methodology

8.3.1 Data Collection

The IBBI website was used to obtain information about bankrupt companies, and 77 listed companies were found to be bankrupt, 49 of which were in the manufacturing sector. During data collection, the data of 14 of these companies was found to be missing, so this study includes the remaining 35 companies. Non-bankrupt companies are selected based on the sector of the bankrupt company and the total worth of the bankrupt company's assets [32].
This study involves data from 70 companies. Data from the past three years is obtained from the annual reports of both non-bankrupt and bankrupt corporations. The 35 bankrupt companies span 18 sectors of the manufacturing industry: four companies are in the cable sector, four in the textile industry, three in the steel industry, two in mining, two in auto ancillaries, one in automobiles, two each in the chemical and paper sectors, four in gas and petroleum, one each in the non-ferrous metal, electronics, fast-moving consumer goods (FMCG), and glass sectors, three in agro-processing, and one each in the pharmaceutical, alcoholic beverages, consumer durables, and plastic sectors.

8.3.2 Methodology

8.3.2.1 Variables

For the purpose of predicting bankruptcy, 15 financial variables are selected on the basis of earlier studies (Table 8.1).

8.3.2.2 Predictive Techniques

Python is used to perform supervised learning for the prediction of bankruptcy. Following the import of the data into Python, missing values were checked. The dataset contained a total of nine missing values, which were filled in using the cumulative average. After the missing data was handled, data duplication was

Table 8.1 Variables


Variables Authors
Working capital to total assets [12, 13, 33]
Retained earnings to total assets [12]
Earnings before interest and tax (EBIT) to total assets [12, 33]
Sales to total assets [12, 33]
Earnings before tax (EBT) to current liabilities [33]
Debt to asset ratio [13, 14]
Current ratio [14]
Return on assets (ROA) [13, 14]
Quick ratio [34]
Cash flow ratio [35]
Interest coverage ratio [36]
Return on equity (ROE) [28]
Earnings per share (EPS) [28]
Debt to equity ratio [34, 36]
Cash ratio [36]

investigated, and no duplicates were found. Outliers were then identified using boxplots and eliminated. The data for all four models were divided into training and testing sets: 70% of the data was employed for training and the remaining 30% for testing the predictions. The accuracies of the selected predictive techniques, namely Logistic Regression, Decision Tree, Artificial Neural Networks (ANN), and Random Forest, are compared in this research.
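A minimal sketch of this pipeline, using scikit-learn, is shown below. The file name bankruptcy_ratios.csv, the column name bankrupt, and the model hyperparameters are placeholders rather than the study's actual settings, and the cumulative-average imputation is approximated with an expanding mean:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Hypothetical input: one row per firm-year, the 15 financial ratios from
# Table 8.1 plus a 0/1 "bankrupt" label.
df = pd.read_csv("bankruptcy_ratios.csv")      # placeholder file name
df = df.fillna(df.expanding().mean())          # fill gaps with a cumulative average
X, y = df.drop(columns="bankrupt"), df["bankrupt"]

# 70/30 train/test split, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=42)

models = {
    "Logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Random forest": RandomForestClassifier(random_state=42),
    "ANN (MLP)": make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=42)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))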

8.4 Data Analysis

According to Fig. 8.1, the cumulative variance explained is 89.5%. After the application of PCA (Principal Component Analysis), only 7 variables were found to be relevant within this explained variance. These variables are listed in Table 8.2.

Fig. 8.1 PCA

Table 8.2 Variables after PCA
Working capital to total assets
Retained earnings to total assets
EBIT to total assets
Sales to total assets
EBT to current liabilities
Debt to asset ratio
Current ratio

8.5 Empirical Findings

Before PCA
As seen in Table 8.3, the random forest technique outperformed all other prediction algorithms in a comparative analysis before adopting PCA. In predicting bankruptcy, decision trees had the second-highest accuracy of 90.47%. Following that, ANN had an accuracy of 87.76% and logistic regression had an accuracy of 80.95%.
After PCA
After applying PCA, the accuracy of the random forest technique dropped to 91.67%. The accuracy of ANN did not change after PCA and remained at 87.76%, while the accuracy of the decision tree was reduced to 85.71%. Logistic regression, which again had the lowest accuracy before PCA, increased to 85.71% after PCA (Table 8.4).

Table 8.3 Results before PCA

Predictive techniques        Accuracy (%)
Logistic regression          80.95
Artificial neural networks   87.76
Decision tree                90.47
Random forest                93.74

Table 8.4 Results after PCA

Predictive techniques        Accuracy (%)
Logistic regression          85.71
Artificial neural networks   87.76
Decision tree                85.71
Random forest                91.67
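Continuing the sketch above, the PCA step can be approximated as follows. Note that PCA retains principal components (linear combinations of the ratios) rather than the original variables themselves; the 89.5% variance threshold is taken from Sect. 8.4, while the scaling step is an assumption:

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Keep enough principal components to explain ~89.5% of the variance,
# then refit the same four models on the transformed features.
scaler = StandardScaler().fit(X_tr)
pca = PCA(n_components=0.895).fit(scaler.transform(X_tr))
X_tr_p = pca.transform(scaler.transform(X_tr))
X_te_p = pca.transform(scaler.transform(X_te))
print("components kept:", pca.n_components_)
print("cumulative explained variance:", pca.explained_variance_ratio_.sum())

for name, model in models.items():
    model.fit(X_tr_p, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te_p)))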

8.6 Conclusion

8.6.1 Managerial Implication

Bankruptcy prediction is one of the most important and rapidly growing areas of finance, and the accuracy of forecasting approaches is critical. Accurate predictive algorithms allow companies to respond to early warnings of bankruptcy. An established method for forecasting bankruptcy also enables investors to decide whether or not to invest in a company.

8.6.2 Conclusion

Working capital to total assets, retained earnings to total assets, EBIT to total assets,
sales to total assets, EBT to current liabilities, debt to asset ratio, and current ratio
were found to be useful in predicting bankruptcy in India’s manufacturing industry
after employing PCA. After using PCA, the accuracy of the decision tree and random forest declined; in the paper by Chen, the results for the decision tree and logistic regression approaches were likewise less accurate when PCA was used [28]. In this study, by contrast, the accuracy of logistic regression increased after using PCA, while the accuracy of the ANN approach did not change.
Random forest outperformed ANN, decision trees, and logistic regression in predicting bankruptcy, demonstrating that it surpasses other forecasting techniques, including the most widely used, logistic regression. In this study, ANN surpassed decision tree techniques after PCA, although in earlier studies [37] the decision tree was considered superior to ANN. Logistic regression had the lowest accuracy, as also reported in earlier research such as [28]. Because of their extensive use and high accuracy, logistic regression and artificial neural networks were used in the majority of studies on bankruptcy prediction in India. Few studies in India used decision tree-based approaches to forecast bankruptcy. This study, on the other hand, contributes to decision tree-based investigations, which produce more accurate results than ANN or logistic regression.

8.7 Future Scope

Most previous bankruptcy prediction research, particularly in India, was undertaken in the banking industry. As the second-largest contributor to India's GDP, the manufacturing sector needs extensive research on bankruptcy prediction.
After the implementation of the IBC 2016, the amount of freely available data in India is clearly limited. As data is limited in this study, future studies might forecast bankruptcy with larger data sets and longer time frames. This study also suggests future research on the prediction of bankruptcy in other Indian industries. Macroeconomic variables and corporate governance measures can also be included as independent variables when forecasting bankruptcy in India.

References

1. Di Donato, F., Nieddu, L.: A new proposal to predict corporate bankruptcy in Italy during the
2008 economic crisis. In: Causal Inference in Econometrics, 213–223 (2016). https://doi.org/
10.1007/978-3-319-27284-9_13
2. Farooq, U., Jibran Qamar, M.A., Haque, A.: A three-stage dynamic model of financial distress.
Manag. Financ. 44(9), 1101–1116 (2018). https://doi.org/10.1108/MF-07-2017-0244
3. Yu, Q., Miche, Y., Séverin, E., Lendasse, A.: Bankruptcy prediction using extreme learning
machine and financial expertise. Neurocomputing 128, 296–302 (2014). https://doi.org/10.
1016/j.neucom.2013.01.063
4. Ahn, H., Kim, K.: Bankruptcy prediction modeling with hybrid case-based reasoning and
genetic algorithms approach. Appl. Soft Comput. 9(2), 599–607 (2009). https://doi.org/10.
1016/j.asoc.2008.08.002
5. Roychoudhury, A.: Rajya Sabha Passes Bankruptcy Code. Business Standard (2016). https://
www.business-standard.com/article/economy-policy/rajya-sabha-passes-bankruptcy-code-
116051200075_1.html
6. Laws, I.: Short Note on Insolvency and Bankruptcy Code, 2016. IBC Law (2019). https://ibc
law.in/short-note-on-insolvency-and-bankruptcy-code-2016/
7. BCAS.: Insolvency and Bankruptcy Code, 2016 (IBC). BCAS Referencer (2022). https://www.
bcasonline.org/Referencer2018-19/part5/insolvency-and-bankruptcy-code-2016-ibc.html
8. John, N.: Bankruptcy Doubles to 3, 774 in FY20; Manufacturing, Construction Worst-Hit.
Business Today (2020)
9. Kaushik, A.: Is IBC 2016 Effective?|NITI Aayog. NITI Aayog (2020). https://www.niti.gov.
in/ibc-2016-effective
10. FitzPatrick, P.J.: A comparison of ratios of successful industrial enterprises with those of failed
firms. In: The Certified Public Accountant, 598–605 (1932)
11. Beaver, W.H.: Financial ratios as predictors of failure. J. Account. Res. 4, 71–111 (1966)
12. Altman, E.I.: Financial ratios, discriminant analysis and the prediction of corporate bankruptcy.
J. Financ. 23(4), 589–609 (1968)
13. Ohlson, J.A.: Financial ratios and the probabilistic prediction of bankruptcy. J. Account. Res.
18(1), 109–131 (1980). https://doi.org/10.2307/2490395
14. Zmijewski, M.E.: Methodological issues related to the estimation of financial distress prediction
models. J. Account. Res. 22, 59–82 (1984)
15. Zavgren, V.: Assessing the vulnerability to failure of American industrial firms: a logistic
analysis. J. Bus. Fin. Account. 12 (1985)

16. Ying, S., Shiwei, Z., Tao, Z.: Predicting financial distress of Chinese listed corporate by a hybrid
PCA-RBFNN model. In: 2008 Fourth International Conference on Natural Computation, 3,
277–281 (2008). https://doi.org/10.1109/ICNC.2008.778
17. Hertz, J., Krogh, A., Palmer, R.G., Horner, H.: Introduction to the theory of neural computation.
Phys. Today 44(12), 70 (1991). https://doi.org/10.1063/1.2810360
18. Odom, M.D., Sharda, R.: A neural network model for bankruptcy prediction. In: 1990 IJCNN
International Joint Conference on Neural Networks, 163–168 (1990). https://doi.org/10.1109/
ijcnn.1990.137710
19. Altman, E.I., Marco, G., Varetto, F.: Corporate distress diagnosis: comparisons using linear
discriminant analysis and neural networks (the Italian experience). J. Banking Fin. 18(3),
505–529 (1994). http://linkinghub.elsevier.com/retrieve/pii/0378426694900078
20. Bhunia, A., Sarkar, R.: A study of financial distress based on MDA. J. Manag. Res. 3(2), 1–11
(2011). https://doi.org/10.5296/jmr.v3i2.549
21. Alifiah, M.N.: Prediction of financial distress companies in the trading and services sector in
Malaysia using macroeconomic variables. Procedia Soc. Behav. Sci. 129, 90–98 (2014). https://
doi.org/10.1016/j.sbspro.2014.03.652
22. Smith, M., Alvarez, F.: Predicting firm-level bankruptcy in the Spanish economy using extreme
gradient boosting. Comput. Econ. 59(1), 263–295 (2022). https://doi.org/10.1007/s10614-020-
10078-2
23. Fedorova, E., Gilenko, E., Dovzhenko, S.: Bankruptcy prediction for Russian companies: appli-
cation of combined classifiers. Expert Syst. Appl. 40(18), 7285–7293 (2013). https://doi.org/
10.1016/j.eswa.2013.07.032
24. Grice, J.S., Dugan, M.T.: The limitations of bankruptcy prediction models: some cautions for
the researcher. Rev. Quant. Financ. Acc. 17(2), 151–166 (2001). https://doi.org/10.1023/A:101
7973604789
25. Chandra, D.K., Ravi, V., Bose, I.: Failure prediction of dotcom companies using hybrid intelli-
gent techniques. Expert Syst. Appl. 36(3), 4830–4837 (2009). https://doi.org/10.1016/j.eswa.
2008.05.047
26. Pearson, K.: LIII. On lines and planes of closest fit to systems of points in space. Lond.
Edinburgh Dublin Philos. Mag. J. Sci. 2(11), 559–572 (1901). https://doi.org/10.1080/147864
40109462720
27. Kim, H., Cho, H., Ryu, D.: Corporate default predictions using machine learning: literature
review. Sustainability 12(16), 6325 (2020). https://doi.org/10.3390/SU12166325
28. Chen, M.Y.: Predicting corporate financial distress based on the integration of decision tree
classification and logistic regression. Expert Syst. Appl. 38(9), 11261–11272 (2011). https://
doi.org/10.1016/j.eswa.2011.02.173
29. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
30. Creamer, G., Freund, Y.: Predicting performance and quantifying corporate governance risk for
Latin American Adrs and Banks. In: Financial Engineering and Applications. MIT, Cambridge
(2004)
31. Barboza, F., Kimura, H., Altman, E.: Machine learning models and bankruptcy prediction.
Expert Syst. Appl. 83, 405–417 (2017). https://doi.org/10.1016/j.eswa.2017.04.006
32. Lakshan, A.M.I., Wijekoon, W.M.H.N.: Predicting corporate failure of listed companies in Sri
Lanka. GSTF Bus. Rev. (GBR) 2(1), 180–185 (2012)
33. Springate, G.L.V.: Predicting the Possibility of Failure in a Canadian Firm: Unpublished MBA
Research Project/Simon Fraser University (1978)
34. Mselmi, N., Lahiani, A., Hamza, T.: Financial distress prediction: the case of French small and
medium-sized firms. Int. Rev. Financ. Anal. 50, 67–80 (2017). https://doi.org/10.1016/j.irfa.
2017.02.004
35. Ong, S.W., Choong Yap, V., Khong, R.W.L.: Corporate failure prediction: a study of public
listed companies in Malaysia. Manag. Financ. 37(6), 553–564 (2011). https://doi.org/10.1108/
03074351111134745

36. Hu, Y.C., Ansell, J.: Measuring retail company performance using credit scoring techniques.
Eur. J. Oper. Res. 183(3), 1595–1606 (2007). https://doi.org/10.1016/j.ejor.2006.09.101
37. Olson, D.L., Delen, D., Meng, Y.: Comparative analysis of data mining methods for bankruptcy
prediction. Decis. Support. Syst. 52(2), 464–473 (2012). https://doi.org/10.1016/j.dss.2011.
10.007
Chapter 9
Ensemble Deep Reinforcement Learning
for Financial Trading

Mendhikar Vishal, Vadlamani Ravi, and Ramanuj Lal

Abstract A stock trading strategy plays an important role in financial investment. However, it is challenging to come up with an optimal profit-making portfolio in a volatile market. In this chapter, we propose ensemble methods that use a few deep reinforcement learning (DRL) architectures to train on dynamic markets and learn complex trading strategies to achieve maximum returns on investments. We propose three ensemble strategies, each with three RL actor-critic algorithms as constituents drawn from Twin Delayed Deep Deterministic Policy Gradient (TD3), Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC). The three ensembles are: (i) PPO, TD3, and DDPG; (ii) SAC, PPO, and TD3; and (iii) DDPG, SAC, and PPO. We compare their performance with that of the state-of-the-art ensemble method comprising Advantage Actor Critic (A2C), PPO, and DDPG. The ensemble techniques adapt to various market conditions by utilizing the best aspects of all three constituent algorithms. The effectiveness of these ensembles is demonstrated on 30 Sensex stocks with sufficient liquidity and 30 Dow Jones Industrial Average (DJIA) indexed stocks. The Sharpe ratio and maximum drawdown are employed to evaluate the performance of the ensemble methods.

Keywords Stock trading · Actor-critic framework · Markov decision process ·


Deep reinforcement learning · Ensemble methods

M. Vishal · V. Ravi (B)


Center of Excellence in Analytics, Institute for Development and Research in Banking
Technology (IDRBT), Castle Hills, Masab Tank, Hyderabad 500057, India
e-mail: [email protected]
M. Vishal
e-mail: [email protected]
M. Vishal
The School of Computer and Information Sciences (SCIS), University of Hyderabad,
Hyderabad 500046, India
R. Lal
Graduate School of Business, Stanford University, Stanford, CA 94305, USA
e-mail: [email protected]


9.1 Introduction

Special Notations

Notation   Meaning
Gt         Cumulative reward at time step t
Rt         Reward earned by the agent at time step t after performing an action at time step t − 1 in the environment
D          Number of stocks
Z+         Non-negative integers
R+         Non-negative real numbers
bt         Balance available in the portfolio at time step t
pt         Adjusted close price of each stock
Mt         MACD value calculated from close prices
Ct         CCI value calculated from high, low, and close prices
Xt         ADX value calculated from high, low, and close prices
ht         Number of shares of each stock
rt         RSI value calculated from close prices
Rp         Expected return of the portfolio
Rf         Risk-free return of the portfolio
StdDev     Standard deviation of the portfolio

“Reinforcement learning problems involve learning what to do—how to map


situations to actions—so as to maximize a numerical reward signal. In an essential
way, they are closed-loop problems because the learning system’s actions influence
its later inputs” [1].
Reinforcement learning problems thus have two components: learning what to do, and understanding how to link circumstances to actions so as to optimize a reward signal. Additionally, unlike many machine learning techniques, the learner is not given clear instructions on how to proceed; instead, it must experiment to determine which actions yield the most significant rewards. The most fascinating and difficult situations are those in which choices affect not only the present rewards but also future circumstances and, consequently, all subsequent rewards [1].
In financial markets, the general goal is to allocate a set of stocks so as to maximize the returns over time while reducing risk at the same time. For financial investors, it is crucial to first build an optimal portfolio that satisfies their goals and subsequently to rebalance it optimally. Portfolio theory begins with the mean–variance optimization of [2], where portfolio selection was proposed as maximizing the expected return while minimizing risk in the form of covariance matrices. Rebalancing a portfolio re-optimizes the weights of the portfolio over a predefined period. The application of dynamic stock allocation using dynamic programming methods was introduced in references [3, 4]. Due to the curse of dimensionality in dynamic programming, automated self-learning algorithms are normally applied by investors and scholars in designing optimal trading strategies; here, instead, we use reinforcement learning to solve this problem.

9.1.1 How Reinforcement Learning Works

Let’s look at a straightforward example to help you understand the reinforcement


learning process.
Task: Think about the possibility of training your cat to perform new tricks.
We can’t directly instruct the cat because she doesn’t speak human or any other
language. We instead use a different approach. The cat tries to react in various ways
as we imitate a situation. We will give the cat a fish if her response is what we want it
to be. Now, whenever the cat encounters the same circumstance, it acts similarly with
even greater fervour in anticipation of receiving an additional reward (food). This is how cats learn "what to do" from satisfying experiences. The cat also picks up lessons on what not to do when faced with unpleasant situations.
In this instance, your cat is an agent in contact with its environment, which here is your home. Your cat sitting is an example of a state, and you might use a particular word to prompt your cat to walk. When our agent reacts, it transitions from one "state" to another, as depicted in Fig. 9.1. For instance, your cat may go from sitting to walking. An agent's response is an action, and a policy is a way to choose an action given a state in the hope of better results. After the transition, the agent might receive a reward or a punishment.
The policy is of two types.

Fig. 9.1 Agent, action, and reward

Deterministic policy: $a = \pi(s)$

Stochastic policy: $\pi(a \mid s) = P(A = a \mid S = s)$

Here $a$ represents the action of the agent, $s$ represents the agent's state in the environment, and $\pi(a \mid s)$ represents the probability of taking action $a$ in state $s$, i.e., the probability distribution over the actions available from that state.
The reward is a scalar feedback signal (either positive or negative) that indicates
how well an agent is performing by taking action at the right time. The agent’s job
is to maximize cumulative reward.

$G_t = R_{t+1} + R_{t+2} + R_{t+3} + \cdots$

The value of a state $s$ is the expected cumulative reward starting from that state:

$V(s) = \mathbb{E}[G_t \mid S_t = s] = \mathbb{E}[R_{t+1} + R_{t+2} + R_{t+3} + \cdots \mid S_t = s]$

It is also possible to condition the value on the action, giving the action-value function:

$Q(s, a) = \mathbb{E}[G_t \mid S_t = s, A_t = a]$
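As a tiny numeric illustration of G_t, consider an episode with four remaining rewards. The discounted variant shown alongside (with an assumed factor of 0.99, not a value stated here) is the form used later in the DDPG update (9.3):

rewards = [1.0, 0.0, 2.0, -1.0]                  # R_{t+1}, R_{t+2}, R_{t+3}, R_{t+4}
G_t = sum(rewards)                               # undiscounted return, as above: 2.0
G_t_disc = sum(r * 0.99**k for k, r in enumerate(rewards))  # discounted variant
print(G_t, round(G_t_disc, 4))                   # 2.0 1.9899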

9.2 Problem Statement

The objective of research on ensemble reinforcement learning for stock trading is to develop and investigate techniques that leverage the power of an ensemble of reinforcement learning algorithms to improve the performance and robustness of automated trading systems in financial markets.
Ensemble reinforcement learning involves the use of multiple reinforcement
learning models, often referred to as “base learners” or “agents,” that work together
collaboratively to make trading decisions. Each base learner can have its own
strengths, weaknesses, and biases, and by combining their decisions, it is possible to
create a more accurate and reliable trading strategy.
This chapter addresses the research gap that, until now, ensembles built from a variety of deep reinforcement learning algorithms have not been proposed to aid decision makers in financial stock market trading. This is an important departure from the extant literature.
We model stock trading as a Markov decision process (MDP) and formulate our
trading objective as maximization of expected return [5].
MDP model for stock trading

To address the dynamic nature of the stock market, the MDP is specified as follows:

1. State s = [b, p, h]: a vector that includes the stock prices p ∈ R_+^D, the stock shares h ∈ Z_+^D, and the remaining balance b ∈ R_+.
2. Action a: a vector of actions over the D stocks. The allowed actions on each stock include selling, buying, or holding, which result in decreasing, increasing, or no change of the stock shares h, respectively.
3. Reward r(s, a, s'): the direct reward of taking action a at state s and arriving at the new state s'.
4. Policy π(s): the trading strategy at state s, which is a probability distribution over the actions at state s.
5. Q-value Q_π(s, a): the expected reward of taking action a at state s and following policy π thereafter.
Each of the three possible actions on a stock results in one of three possible portfolios. Note that even the hold action may lead to different portfolio values, due to the change in stock prices from time to time.
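A minimal sketch of one such MDP transition is given below. The prices, holdings, and action sizes are illustrative, the reward is the change in the portfolio value p^T h + b defined above, and transaction costs are deferred to Sect. 9.4.1:

import numpy as np

def step(b, h, p_now, p_next, a):
    """One transition of the trading MDP for D stocks.
    a[d] > 0 buys, a[d] < 0 sells, a[d] = 0 holds shares of stock d."""
    a = np.asarray(a, dtype=int)
    sell = np.minimum(np.maximum(-a, 0), h)     # cannot sell shares we do not hold
    buy = np.maximum(a, 0)
    b_next = b + p_now @ sell - p_now @ buy     # trade at current prices
    h_next = h - sell + buy
    reward = (p_next @ h_next + b_next) - (p_now @ h + b)  # change in p^T h + b
    return b_next, h_next, reward

b, h = 10_000.0, np.array([5, 0, 2])
p_now, p_next = np.array([100.0, 40.0, 60.0]), np.array([101.0, 39.5, 61.0])
print(step(b, h, p_now, p_next, [1, 0, -2]))    # buy stock 0, hold 1, sell 2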
The rest of the chapter is organized as follows: Sect. 9.3 presents the literature survey. Section 9.4 describes the proposed methodology, which is essential to understand DRL for stock trading. Section 9.5 provides the dataset description and the experimental setup, along with the results obtained for the datasets. Finally, Sect. 9.6 concludes the chapter.

9.3 Literature Survey

Recent deep reinforcement learning applications in financial markets use actor-critic, critic-only, or actor-only learning approaches in discrete or continuous state and action spaces.
Recently, the actor-critic method has been used in finance [6–9]. The critic
network, which represents the value function, and the actor network, which reflects
the policy, are supposed to be updated simultaneously. The critic network estimates
the value function while the actor modifies the policy probability distribution in
accordance with the critic network using policy gradients methods. Both actor and
critic networks become more skilled at executing better actions and evaluating those
performances.
An actor-only approach has also been used [10–12]; the idea is that the agent itself quickly figures out what to do, with a neural network learning the policy instead of the Q-value. In critic-only learning, the most popular technique, an agent is trained on a single stock and employs, for instance, Deep Q-learning (DQN) and its upgrades to resolve a discrete action space problem [13–16].
The authors of [17] developed and implemented Temporal Difference and kernel-based reinforcement learning techniques for financial trading systems. Reference [18] introduces a model-free convolutional neural network that uses the historical price data of a collection of financial assets as input and produces the group's portfolio weights; the network is trained on 0.7 years' worth of price information from a Bitcoin exchange, with training carried out via reinforcement techniques to maximize the cumulative return. A survey is given by [19].
Several studies simplified trading activities to include purchasing, selling, or
holding a single asset using RL with discrete action space. Trading with a small
number of positions has also been utilized. Still, it is not easy to extend this strategy
to big portfolios because adding assets causes the action space to grow exponentially.
Policy-based RL is used with deep learning as its approximation function to handle
the continuous action space problem [20–23].
The authors of [24] presented the differential Sharpe ratio, Sterling ratio, Calmar ratio, and optimal variable-weight portfolio allocation as objective functions. In [25], three-layer neural networks with ReLU neurons are used to train RL agents via the Q-learning algorithm. To determine the effects of earlier states and actions on policy optimization in non-Markov decision processes, [26] employed deep recurrent neural network (RNN) models such as the GRU, and [27] experimented with several RL approaches to integrate with DL techniques while resolving the policy optimization issue.
In the price trailing approach, the trading agent carefully tracks the price of an asset instead of directly forecasting the future price within a specified margin (direction) [28]. The deep reinforcement learning approach of [5] effectively trains an intelligent automated trader; it incorporates historical stock price information as well as market sentiment [29], for a stock portfolio made up of Dow Jones businesses [18, 26] and cryptocurrency [10].
A trading strategy has also been created for markets with a changing number of assets, where unseen assets can simply be integrated without the network having to be altered or retrained. For markets with transaction costs, optimal transactions are calculated [6], and there is also a study on asset variability and correlation [36].

9.4 Proposed Methodology

The authors of [37] proposed an ensemble strategy of three algorithms, Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), and Deep Deterministic Policy Gradient (DDPG), on 30 Dow Jones stocks. In our research, we worked with a different set of algorithms and developed ensemble strategies (whose block diagram is depicted in Fig. 9.2) on the same Dow Jones stocks as well as the 30 Sensex stocks. Data sets are downloaded from Yahoo Finance [38], and for the RL algorithms we used the Stable Baselines framework [39].
The state transition of the stock trading process is depicted in Fig. 9.3. For each stock d (d = 1, 2, …, D) in the portfolio, one of three possible actions is chosen. Selling k[d] ∈ [1, H_t[d]] shares results in H_{t+1}[d] = H_t[d] − k[d], where k[d] ∈ Z_+ and d = 1, 2, …, D.

Fig. 9.2 Block diagram of the ensemble strategy

Holding leaves H_{t+1}[d] = H_t[d], and buying k[d] shares results in H_{t+1}[d] = H_t[d] + k[d]. An action is performed at time t and the stock prices update at t + 1; accordingly, the portfolio value may change from "portfolio value 0" to "portfolio value 1", "portfolio value 2", or "portfolio value 3", where the portfolio value is p^T h + b (Fig. 9.3).

Fig. 9.3 Block diagram for actions in RL agent



9.4.1 Assumptions Made During Stock Trading

The following assumption and constraints are made during stock trading.
• Non-negative balance b ≥ 0: actions of RL shouldn’t result in a deficit in the
account. The stocks are separated into sets for sale (S) based on the behaviour at
time t, buying B, holding H, were
• SU BU H = {1, 2, . . . , D} and they have no common elements in a set. Let pt B =
[pt i : i ∈ B] and kt B = [kt i : i ∈ B] be the vectors of prices of stocks and the number
of shares to buy for a particular stock. Similarly for selling stocks pt S and kt S , for
holding stocks Pt H and Kt H .
– Bt+1 = Bt + (Pt S )T Kt S − (Pt B )T , Kt B ≥ 0
• Transaction cost: A cost that the investor incurs when trading in the stock market
which is deducted from the sum of money. These include SEBI fee, Stamp duty,
securities transaction tax (STT), exchange fee, and brokerage GST, we assume
our transaction cost to be 0.15% of the value of every trade
– Ct = PT kt * 0.15%
• Market liquidity: Orders are carried out at a close price. We anticipate that our
RL agent won’t have an impact on the market.
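The balance and transaction-cost constraints above can be checked as in the following sketch; the prices and trade vectors are illustrative, and the 0.15% rate is the one assumed in the text:

import numpy as np

def feasible_after_costs(b, p, k_buy, k_sell, cost_rate=0.0015):
    """Check the non-negative balance constraint after a candidate trade,
    with the proportional transaction cost C_t = 0.15% * p^T k."""
    traded_value = p @ (k_buy + k_sell)    # total value of all executed trades
    cost = cost_rate * traded_value
    b_next = b + p @ k_sell - p @ k_buy - cost
    return b_next >= 0.0, b_next

p = np.array([120.0, 80.0])
ok, b_next = feasible_after_costs(b=1_000.0, p=p,
                                  k_buy=np.array([5, 0]),
                                  k_sell=np.array([0, 3]))
print(ok, round(b_next, 2))   # buy costs 600, sell brings 240, cost ~1.26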

9.4.2 Stock Market Environment

Environment for multiple stocks
We used a continuous action space to model the trading of several stocks; our portfolio contains 30 stocks.

9.4.2.1 State Space

The state space is represented by a vector with seven components: [b_t, p_t, M_t, C_t, X_t, h_t, r_t] (see the notation table in Sect. 9.1).

9.4.2.2 Action Space

For every stock, the action space is defined as {−x, −(x − 1), …, −1, 0, 1, …, x}, where x and −x represent the maximum number of shares that we can buy and sell, respectively. The entire action space for 30 stocks therefore has size (2x + 1)^30.
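Under the assumption that each of the six per-stock components (prices, MACD, CCI, ADX, holdings, RSI) is a vector of length D = 30, the state vector has dimension 1 + 6 × 30 = 181. The sketch below assembles such a vector with random stand-in values:

import numpy as np

D = 30                                     # number of stocks in the portfolio
rng = np.random.default_rng(0)
b = np.array([1_000_000.0])                # available balance b_t
p = rng.uniform(10, 500, D)                # adjusted close prices p_t
M, C, X_adx, r = (rng.standard_normal(D) for _ in range(4))  # MACD, CCI, ADX, RSI stand-ins
h = np.zeros(D)                            # shares held h_t

state = np.concatenate([b, p, M, C, X_adx, h, r])
print(state.shape)                         # (181,): 1 balance + 6 blocks of 30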

9.4.3 RL Trading Agents

To develop a highly reliable trading strategy, we used an ensemble technique that chooses one of three RL algorithms based on the Sharpe ratio when trading. This is because each trading agent is sensitive to different market movements: one algorithm is good at predicting an upward (bullish) market trend, while another is better adjusted to an unstable market. Whichever algorithm achieves the highest Sharpe ratio is selected as the trading agent. We download the data from the Yahoo Finance website from 01-01-2008 to 01-09-2021. The data is split into 01-01-2008 to 01-01-2017 for training and 02-01-2017 to 01-01-2018 for validation and hyperparameter tuning. We tested the performance of the model on the trading period from 02-01-2018 to 01-09-2021.
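A sketch of this selection rule is given below. The annualization factor of 252 trading days and the 63-day validation window are illustrative assumptions; in practice the agents dictionary would hold the trained Stable Baselines models:

import numpy as np

def sharpe(daily_returns):
    r = np.asarray(daily_returns)
    return np.sqrt(252) * r.mean() / r.std(ddof=1)   # annualized, assumed convention

def pick_agent(agents, validation_returns):
    """Select, for the next trading window, the agent with the highest
    Sharpe ratio on the validation window. validation_returns maps
    agent name -> its daily portfolio returns on the validation data."""
    scores = {name: sharpe(rets) for name, rets in validation_returns.items()}
    best = max(scores, key=scores.get)
    return agents[best], scores

# Hypothetical constituents of ensemble (ii): SAC, PPO and TD3.
rng = np.random.default_rng(7)
val_rets = {name: rng.normal(0.0005, 0.01, 63) for name in ("SAC", "PPO", "TD3")}
agents = {name: object() for name in ("SAC", "PPO", "TD3")}  # stand-ins for trained models
print(pick_agent(agents, val_rets)[1])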

9.4.3.1 Advantage Actor-Critic (A2C)

In the ensemble strategy, we used the traditional advantage actor-critic algorithm (A2C) [10], which improves on policy gradient updates and serves as an alternative to Trust Region Policy Optimization [40]. To lessen the variance of the policy gradient technique, A2C uses an advantage function, which, along with the value function, is estimated by the critic network. As a result, both aspects, i.e., how good an action is now and how much better it can be in the future, are considered in its evaluation; this improves the model and reduces the policy network's high variance.
To update the gradients with different data samples, A2C uses duplicates of the same agent, each of which functions autonomously while interacting with the same environment. In each iteration, after all agents have finished calculating their gradients, a coordinator passes the average gradients over all agents to a global network, which then updates the actor and critic networks. A2C's objective function is given below:
( T )

∇Jθ (θ ) = E ∇θ logπθ (at |st )A(st , at ) . (9.1)
t=1

where $\pi_\theta(a_t \mid s_t)$ is the policy network and $A(s_t, a_t)$ is the
advantage function, expressed as:

$$A(s_t, a_t) = Q(s_t, a_t) - V(s_t) \quad (9.2)$$
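
As a sketch, training an A2C agent with the Stable-Baselines3 library [22] on the
environment of Sect. 9.4.2 might look as follows; `train_env` and `val_env` are
assumed to be instances of that environment, and the timestep budget is illustrative:

```python
from stable_baselines3 import A2C

# train_env: the multi-stock trading environment of Sect. 9.4.2 (assumed built).
model = A2C("MlpPolicy", train_env, verbose=0)
model.learn(total_timesteps=100_000)   # illustrative training budget

# Roll the trained policy over a validation episode.
obs = val_env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = val_env.step(action)
```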

9.4.3.2 Deep Deterministic Policy Gradient (DDPG)

DDPG [30] is utilized to encourage the highest possible investment return. DDPG
uses neural networks as function approximators and integrates the frameworks of
policy gradient [41] and Q-learning [26]. In contrast to DQN, which learns
indirectly from Q-value tables and suffers from the dimensionality problem [42],
DDPG learns directly from observations through the policy gradient. It maps states
to actions deterministically in order to better match the continuous action space
environment.
At each time step, the DDPG agent performs an action a_t at state s_t, receives a
reward r_t, and arrives at s_{t+1}. The transition (s_t, a_t, s_{t+1}, r_t) is
stored in a replay buffer. A batch of N transitions is drawn from the buffer, and
the Q-value target y_i is computed as follows:

$$y_i = r_i + \gamma\, \hat{Q}\big(s_{i+1},\, \hat{\mu}(s_{i+1} \mid \theta^{\hat{\mu}}) \mid \theta^{\hat{Q}}\big), \quad i = 1, \ldots, N \quad (9.3)$$

The critic network is then updated by minimizing the loss function $L(\theta^Q)$,
which represents the expected squared difference between the output of the target
critic network $\hat{Q}$ and that of the critic network $Q$, i.e.,

$$L(\theta^Q) = \mathbb{E}_{(s_t, a_t, r_t, s_{t+1}) \sim \text{buffer}}\left[\big(y_i - Q(s_t, a_t \mid \theta^Q)\big)^2\right] \quad (9.4)$$
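
A PyTorch-style sketch of this critic update, with the actor, critic, and their
target copies assumed to exist as modules:

```python
import torch

def ddpg_critic_loss(batch, gamma=0.99):
    """Eqs. (9.3)-(9.4): TD target from the target networks, then squared error."""
    s, a, r, s_next = batch                    # tensors sampled from the buffer
    with torch.no_grad():
        a_next = target_actor(s_next)          # target policy action
        y = r + gamma * target_critic(s_next, a_next)   # Eq. (9.3)
    q = critic(s, a)                           # Q(s_t, a_t | θ^Q)
    return ((y - q) ** 2).mean()               # Eq. (9.4)
```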

9.4.3.3 Proximal Policy Optimization (PPO)

We investigate and employ PPO as part of the ensemble technique. PPO [40] was
developed to ensure, via a constrained policy gradient update, that the new policy
does not diverge too far from the previous one. By adding a clipping term to the
objective function, PPO simplifies Trust Region Policy Optimization (TRPO) [5, 40, 43].
Let us assume the likelihood ratio of the new policies versus the old ones is
expressed as:

$$r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)} \quad (9.5)$$

The clipped surrogate objective function of PPO is:

$$J^{\text{CLIP}}(\theta) = \mathbb{E}_t\left[\min\big(r_t(\theta)\,\hat{A}(s_t, a_t),\ \text{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}(s_t, a_t)\big)\right] \quad (9.6)$$
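
A compact sketch of Eqs. (9.5)-(9.6) in PyTorch, computing the probability ratio in
log space for numerical stability:

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, eps=0.2):
    ratio = torch.exp(log_prob_new - log_prob_old)      # r_t(θ), Eq. (9.5)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)  # clip(r_t, 1-ε, 1+ε)
    # Negated Eq. (9.6): the optimizer minimizes, so the surrogate is maximized.
    return -torch.min(ratio * advantage, clipped * advantage).mean()
```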

9.4.3.4 Twin Deep Deterministic Policy Gradient (TD3)

TD3 is the successor to DDPG, which is unstable and strongly dependent on finding
the proper hyperparameters [44]. This instability causes DDPG to overestimate the
Q-values; over time, these errors accumulate and drive the algorithm into poor
local optima. TD3 solves this problem. Its deterministic policy gradient is:
$$\nabla_\phi J(\phi) = N^{-1} \sum \nabla_a Q_{\theta_1}(s, a)\Big|_{a = \pi_\phi(s)}\, \nabla_\phi \pi_\phi(s) \quad (9.7)$$

where $Q_{\theta_1}$ is the critic network and $\pi_\phi$ is the actor network,
with randomly initialized parameters $\theta_1$ and $\phi$.

9.4.3.5 Soft Actor-Critic (SAC)

SAC is defined for RL with continuous action spaces; it maximizes not only the
total reward but also the entropy of the policy, which helps to improve exploration:

$$J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\big[r(s_t, a_t) + \alpha\, \mathcal{H}(\pi(\cdot \mid s_t))\big] \quad (9.8)$$

$$\hat{Q}(s_t, a_t) = r(s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1} \sim p}\big[V_\psi(s_{t+1})\big] \quad (9.9)$$

Equation (9.8) is an objective function with a reward term and an entropy term
$\mathcal{H}$, both weighted by the temperature $\alpha$; $V_\psi$ in Eq. (9.9) is
a state value function parameterized by $\psi$.
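
A sketch of the entropy-regularized value target implied by Eqs. (9.8)-(9.9); the
policy and twin critics are assumed to exist as modules:

```python
import torch

def soft_value_target(s, alpha=0.2):
    """Soft state value: expected Q minus α·log π, the regression target for V_ψ."""
    a, log_prob = policy.sample(s)   # reparameterized action and log π(a|s)
    q = torch.min(critic1(s, a), critic2(s, a))
    return q - alpha * log_prob
```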

9.5 Dataset Description and Experimental Setup

We considered the data of all 30 stocks from Yahoo Finance that are listed under
the Sensex index and the Dow Jones index, respectively. Before training, we
construct the DRL agent's environment to resemble a real-world trading system so
that the agent can interact and learn. Practical trading requires consideration of
several variables, including historical stock prices, current shareholdings, and
technical indicators. Our trading agent must observe the environment for this
information and take the appropriate actions outlined in the preceding section. To
create our environment and train the agent, we used OpenAI Gym [39]. All
experiments were performed on Google Colaboratory, which is free for all Google
account users (premium account features are also available).
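
A sketch of the data pipeline with the yfinance package [38]; the tickers shown are
illustrative Dow Jones constituents, and the date splits follow Sect. 9.4.3:

```python
import yfinance as yf

tickers = ["AAPL", "MSFT", "JPM"]   # ... the 30 index constituents in practice
prices = yf.download(tickers, start="2008-01-01", end="2021-09-01")["Close"]

train = prices.loc["2008-01-01":"2017-01-01"]   # training window
valid = prices.loc["2017-01-02":"2018-01-01"]   # validation / tuning window
test  = prices.loc["2018-01-02":"2021-09-01"]   # trading (test) window
```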

9.6 Results and Discussion

The following performance metrics are used in the current research work.
• Cumulative returns are the percentage return given by the portfolio from the initial
value to the final value.
• Annualized return is the geometric average of the return an investment earned
  each year over a specific period.

• The largest percentage of loss that occurred during the trading period is known
as the max drawdown.
• A portfolio’s risk-adjusted return is measured by the Sharpe ratio.
  $S = (R_p - R_f)/\sigma_p$, where $R_p$ is the portfolio return, $R_f$ the
  risk-free rate, and $\sigma_p$ the standard deviation of portfolio returns
  (a sketch computing these metrics follows this list).

And we used some technical indicators like MACD, CCI, ADX, and RSI which
are described in the following paragraphs.
• Moving Average Convergence Divergence (MACD) is a trend-following indicator
that displays the correlation between two stock price moving averages.
• Commodity Channel Index (CCI) compares the price at the moment to the price
average over a certain period of time.
• Average Directional Movement Index (ADX) is a trend indicator that helps us
decide if a trend is worth following or not.
• A gauge of momentum, the Relative Strength Index (RSI), assesses the size of
recent price fluctuations to assess overbought or oversold positions.
• Annual volatility is the measure of the variance of returns over a year.
• Calmar ratio is the ratio of average annual return and max Drawdown.
• The omega ratio is the risk-return performance measure of the investment
portfolio.
• The Sortino ratio is the risk-adjusted return of an investment asset, penalizing
  only downside deviation.
• The tail ratio compares the right tail of the return distribution (extreme gains)
  with the absolute size of the left tail (extreme losses).
• Alpha is the excess return of an investment relative to the benchmark index.
• Beta is a metric used to compare a portfolio’s volatility to that of the entire market.
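
A minimal sketch computing several of the metrics above from a series of daily
portfolio returns (annualization over 252 trading days is assumed):

```python
import numpy as np

def evaluate(daily_returns, risk_free=0.0):
    r = np.asarray(daily_returns)
    equity = np.cumprod(1 + r)                        # portfolio value path
    cumulative = equity[-1] - 1                       # cumulative return
    annual = (1 + cumulative) ** (252 / len(r)) - 1   # annualized (geometric) return
    drawdown = equity / np.maximum.accumulate(equity) - 1
    max_dd = drawdown.min()                           # max drawdown (negative)
    sharpe = np.sqrt(252) * (r - risk_free).mean() / r.std()
    return {"cumulative": cumulative, "annual": annual,
            "max_drawdown": max_dd, "sharpe": sharpe,
            "calmar": annual / abs(max_dd)}
```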
The ensemble strategy with TD3, PPO, and DDPG outperformed all other ensemble
strategies, as presented in Tables 9.2 and 9.3; an upward arrow indicates that a
larger value is more desirable, whereas a downward arrow indicates that a lower
value is more desirable.
Table 9.2 presents the results of the three ensemble strategies on the 30 Sensex
index stocks; the TD3, PPO, and DDPG ensemble outperformed the other two.
Table 9.3 presents the results for the 30 Dow Jones index stocks; here too, the
TD3, PPO, and DDPG ensemble outperformed the other ensembles.
The results presented in the first column of each table are based on the
combination of algorithms explained in [37] and are compared with our other
combinations of algorithms.
TD3 learns two Q-functions instead of one and uses the smaller of the two Q-values
to form the targets in the loss functions; it also updates the policy less
frequently than the Q-functions. In addition, TD3 adds noise to the target action,
smoothing Q along changes in action so that the policy cannot exploit Q-function
errors. Owing to these special features of TD3, we observed that it performs better
and gives good results.
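
A sketch of the TD3 target just described, combining twin target critics, smoothing
noise on the target action, and the minimum of the two Q-values (all networks are
assumed to exist):

```python
import torch

def td3_target(r, s_next, gamma=0.99, noise_std=0.2, noise_clip=0.5):
    a_next = target_actor(s_next)
    noise = (torch.randn_like(a_next) * noise_std).clamp(-noise_clip, noise_clip)
    a_next = (a_next + noise).clamp(-1.0, 1.0)    # smoothed target action
    q_min = torch.min(target_critic1(s_next, a_next),
                      target_critic2(s_next, a_next))
    return r + gamma * q_min                      # pessimistic TD target
```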
SAC, in contrast, uses entropy regularization: the policy is trained to maximize a
trade-off between expected return and entropy (randomness in the policy).

Table 9.1 Literature survey of RL in the stock market


References RL algorithm Financial sector Metrics for evaluation
[11] RRL with NN US futures/US treasury-bill Total return, Sharpe ratio
[30] DDQN with LSTM Forex Total returns on investments, Sharpe ratio, Max. drawdown
[23] DQN with LSTM US stocks and China futures Annual return, Sharpe ratio
[31] DQN with LSTM Forex Average return on investment, Sharpe ratio, Sortino ratio, Max. drawdown
[8] DDPG with GRU U.S., U.K., and Chinese stocks Total return, Sharpe ratio
[9] GNP Japan stock market Total returns
[32] Adaptive fuzzy Various stock markets Total returns, Sharpe ratio
[33] RDPG with GRU China stock market Total returns, Sharpe ratio, Max. drawdown
[34] A3C with LSTM Russian stock market Total returns, Sharpe ratio
[35] RDPG with LSTM Russian stock market Total returns, Sharpe ratio
[6] PPO with NN US stock market Total returns

Table 9.2 Results of three ensemble strategies applied on 30 Sensex index stocks

Algorithms used in ensemble strategy | A2C, PPO, DDPG | SAC, PPO, DDPG | TD3, PPO, DDPG
Annual return ↑ (%)                  | 15.573         | 12.433         | 15.922
Cumulative returns ↑ (%)             | 84.985         | 64.552         | 87.372
Annual volatility ↓ (%)              | 21.202         | 22.645         | 21.541
Sharpe ratio ↑                       | 0.79           | 0.63           | 0.79
Calmar ratio ↑                       | 0.42           | 0.33           | 0.44
Max drawdown ↓ (%)                   | −36.878        | −37.382        | −36.411
Omega ratio ↑                        | 1.17           | 1.31           | 1.16
Sortino ratio ↑                      | 1.11           | 0.91           | 1.13
Tail ratio ↑                         | 1.08           | 1.24           | 1.40
Daily value at risk ↓ (%)            | −2.605         | −2.796         | −2.646
Alpha ↑                              | 0.09           | 0.15           | 0.19
Beta ↑                               | 0.18           | 0.02           | 0.02
Stability ↑                          | 0.81           | 0.80           | 0.91
Time to run ensemble strategy (min)  | 138            | 213            | 163

Table 9.3 Results of 30 Dow Jones index stocks

Algorithms used in ensemble strategy | A2C, PPO, DDPG [37] | SAC, PPO, TD3 | TD3, PPO, DDPG | SAC, PPO, DDPG
Annual return ↑ (%)                  | 9.965   | 10.245  | 14.593  | 11.811
Cumulative returns ↑ (%)             | 51.21   | 52.895  | 80.934  | 62.577
Annual volatility ↓ (%)              | 8.288   | 8.657   | 8.305   | 7.923
Sharpe ratio ↑                       | 1.19    | 1.17    | 1.68    | 1.45
Calmar ratio ↑                       | 1.32    | 1.30    | 2.50    | 2.06
Max drawdown ↓ (%)                   | −7.524  | −7.865  | −5.839  | −5.735
Omega ratio ↑                        | 1.31    | 1.31    | 1.47    | 1.40
Sortino ratio ↑                      | 1.80    | 1.71    | 2.71    | 2.28
Tail ratio ↑                         | 1.24    | 1.24    | 1.40    | 1.44
Daily value at risk ↓ (%)            | −1.005  | −1.05   | −0.991  | −0.953
Alpha ↑                              | 0.09    | 0.09    | 0.13    | 0.11
Beta ↑                               | 0.15    | 0.16    | 0.14    | 0.14
Stability ↑                          | 0.94    | 0.90    | 0.86    | 0.94
Time to run ensemble strategy (min)  | 145     | 261     | 162     | 197

DDPG uses policy data and the Bellman equation to learn the Q-function, and uses
the Q-function to learn the policy. The learned policy can cause this Q-function to
be overestimated, which is rectified in TD3.
PPO is a policy gradient method in which the policy is updated explicitly.
Figure 9.4 shows a sample of the output of the TD3, PPO, and DDPG ensemble
strategy on the 30 Sensex index stocks. It indicates the actions taken by the TD3
agent in the ensemble strategy between 2018 and 2021, where index 2, index 3, etc.,
indicate the stocks arranged in alphabetical order.

9.7 Conclusions

In this paper, we explored the use of TD3, SAC, DDPG, PPO, and A2C agents, all
actor-critic based algorithms, to develop a stock trading strategy. We employ an
ensemble technique that automatically chooses the best-performing agent to trade,
based on the Sharpe ratio, in order to adapt to various market conditions.
Results demonstrate that the TD3, PPO, DDPG ensemble approach outperforms the
other ensemble strategies. We included transaction charges in addition to using
stock data from the Dow Jones and Sensex indices.

Fig. 9.4 Actions taken by the TD3 agent in the ensemble strategy
In future, exploring more complex models, overcoming empirical difficulties, and
working with enormous amounts of data, such as the equities that make up the
S&P 500, will all be intriguing areas for research. Additionally, we can
investigate other features for the state space, such as incorporating advanced
transaction cost and liquidity models, fundamental analytical indicators, and
natural language processing of financial market news into our observations. If we
use the Sharpe ratio directly as the reward function, the agents must examine much
more historical data and the state space will grow exponentially.

Acknowledgements The authors are thankful to the senior domain expert, Mr. Rajiv Ramachan-
dran for helping us in understanding the concepts of the stock market and guiding us in the project
during his tenure at IDRBT.

References

1. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (2018)
2. Rubinstein, M.: Markowitz’s “portfolio selection”: a fifty-year retrospective. J. Finance 57(3),
1041–1045 (2002)
3. Betancourt, C., Chen, W.H.: Deep reinforcement learning for portfolio management of markets
with a dynamic number of assets. Exp. Syst. Appl. 164, 114002 (2021)
4. Bertsekas, D.: Dynamic Programming and Optimal Control, vol. 1. Athena Scientific (2012)
5. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization.
In: International Conference on Machine Learning, pp. 1889–1897. PMLR (2015)
6. Zhang, Z., Zohren, S., Roberts, S.: Deep reinforcement learning for trading. J. Finan. Data Sci.
2(2), 25–40 (2020)
7. Xiong, Z., Liu, X.Y., Zhong, S., Yang, H., Walid, A.: Practical deep reinforcement learning
approach for stock trading (2018). arXiv preprint arXiv:1811.07522

8. Bekiros, S.D.: Heterogeneous trading strategies with adaptive fuzzy actor–critic reinforcement
learning: a behavioral approach. J. Econ. Dyn. Control 34(6), 1153–1170 (2010)
9. Li, J., Rao, R., Shi, J.: Learning to trade with deep actor critic methods. In: 2018 11th Interna-
tional Symposium on Computational Intelligence and Design (ISCID), vol. 2, pp. 66–71. IEEE
(2018)
10. Jiang, Z., Liang, J.: Cryptocurrency portfolio management with deep reinforcement learning.
In: 2017 Intelligent Systems Conference (IntelliSys), pp. 905–913. IEEE (2017)
11. Moody, J., Saffell, M.: Learning to trade via direct reinforcement. IEEE Trans. Neural Netw.
12(4), 875–889 (2001)
12. Deng, Y., Bao, F., Kong, Y., Ren, Z., Dai, Q.: Deep direct reinforcement learning for financial
signal representation and trading. IEEE Trans. Neural Netw. Learn. Syst. 28(3), 653–664 (2016)
13. Chen, L., Gao, Q.: Application of deep reinforcement learning on automated stock trading.
In: 2019 IEEE 10th International Conference on Software Engineering and Service Science
(ICSESS), pp. 29–33. IEEE (2019)
14. Dang, Q.V.: Reinforcement learning in stock trading. In: International Conference on Computer
Science, Applied Mathematics and Applications, pp. 311–322. Springer, Cham (2019)
15. Jeong, G., Kim, H.Y.: Improving financial trading decisions using deep Q-learning: predicting
the number of shares, action strategies, and transfer learning. Exp. Syst. Appl. 117, 125–138
(2019)
16. Wang, X., Gu, Y., Cheng, Y., Liu, A., Chen, C.P.: Approximate policy-based accelerated deep
reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 31(6), 1820–1830 (2019)
17. Bertoluzzo, F., Corazza, M.: Testing different reinforcement learning configurations for
financial trading: Introduction and applications. Proc. Econ. Finance 3, 68–77 (2012)
18. Pendharkar, P.C., Cusatis, P.: Trading financial indices with reinforcement learning agents.
Exp. Syst. Appl. 103, 1–13 (2018)
19. Fischer, T.G.: Reinforcement Learning in Financial Markets—A Survey. No. 12/2018. FAU
Discussion Papers in Economics (2018)
20. Weng, L., Sun, X., Xia, M., Liu, J., Xu, Y.: Portfolio trading system of digital currencies: a deep
reinforcement learning with multidimensional attention gating mechanism. Neurocomputing
402, 171–182 (2020)
21. García-Galicia, M., Carsteanu, A.A., Clempner, J.B.: Continuous-time reinforcement learning
approach for portfolio management with time penalization. Exp. Syst. Appl. 129, 27–36 (2019)
22. Raffin, A., Hill, A., Ernestus, M., Gleave, A., Kanervisto, A., Dormann, N.: Stable baselines3
(2019)
23. Gold, C.: FX trading via recurrent reinforcement learning. In: 2003 IEEE International Confer-
ence on Computational Intelligence for Financial Engineering, 2003. Proceedings, pp. 363–370.
IEEE (2003)
24. Almahdi, S., Yang, S.Y.: An adaptive portfolio trading system: a risk-return portfolio opti-
mization using recurrent reinforcement learning with expected maximum drawdown. Exp.
Syst. Appl. 87, 267–279 (2017)
25. Carapuço, J., Neves, R., Horta, N.: Reinforcement learning applied to Forex trading. Appl. Soft
Comput. 73, 783–794 (2018)
26. Hu, Y.J., Lin, S.J.: Deep reinforcement learning for optimizing finance portfolio management.
In: 2019 Amity International Conference on Artificial Intelligence (AICAI), pp. 14–20. IEEE
(2019)
27. Kim, T.W., Khushi, M.: Portfolio optimization with 2D relative-attentional gated transformer.
In: 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE),
pp. 1–6. IEEE (2020)
28. Katongo, M., Bhattacharyya, R.: The use of deep reinforcement learning in tactical asset allo-
cation. Available at SSRN 3812609 (2021)
29. Koratamaddi, P., Wadhwani, K., Gupta, M., Sanjeevi, S.G.: Market sentiment-aware deep
reinforcement learning approach for stock portfolio allocation. Eng. Sci. Technol. Int. J. 24(4),
848–859 (2021)

30. Zarkias, K.S., Passalis, N., Tsantekidis, A., Tefas, A.: Deep reinforcement learning for finan-
cial trading using price trailing. In: ICASSP 2019–2019 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pp. 3067–3071. IEEE (2019)
31. Mabu, S., Chen, Y., Hirasawa, K., Hu, J.: Stock trading rules using genetic network program-
ming with actor-critic. In: 2007 IEEE Congress on Evolutionary Computation, pp. 508–515.
IEEE (2007)
32. Ponomarev, E.S., Oseledets, I.V., Cichocki, A.S.: Using reinforcement learning in the
algorithmic trading problem. J. Commun. Technol. Electron. 64(12), 1450–1457 (2019)
33. Liu, Y., Liu, Q., Zhao, H., Pan, Z., Liu, C.: Adaptive quantitative trading: an imitative
deep reinforcement learning approach. In: Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 34, no. 02, pp. 2128–2135 (2020)
34. Briola, A., Turiel, J., Marcaccioli, R., Aste, T.: Deep reinforcement learning for active high
frequency trading (2021). arXiv preprint arXiv:2101.07107
35. Li, Y., Zheng, W., Zheng, Z.: Deep robust reinforcement learning for practical algorithmic
trading. IEEE Access 7, 108014–108022 (2019)
36. Olschewski, S., Diao, L., Rieskamp, J.: Reinforcement learning about asset variability and
correlation in repeated portfolio decisions. J. Behav. Exp. Finance 32, 100559 (2021)
37. Yang, H., Liu, X.-Y., Zhong, S., Walid, A.: Deep Reinforcement Learning for Automated Stock
Trading: An Ensemble Strategy. SSRN (2020)
38. Aroussi, R.: yfinance. PyPI (2019). Retrieved July 15, 2022. https://pypi.org/project/yfinance
39. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., Zaremba, W.:
Openai gym (2016). arXiv preprint arXiv:1606.01540
40. Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement
learning with function approximation. Adv. Neural Inf. Process. Syst. 12 (1999)
41. Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., et al.: Asynchronous
methods for deep reinforcement learning. In: International Conference on Machine Learning,
pp. 1928–1937. PMLR (2016)
42. Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., et al.: Continuous control
with deep reinforcement learning (2015). arXiv preprint arXiv:1509.02971
43. Sewak, M., Sahay, S.K., Rathore, H.: Policy-approximation based deep reinforcement learning
techniques: an overview. In: Information and Communication Technology for Competitive
Strategies (ICTCS 2020), pp. 493–507 (2022)
44. Buşoniu, L., de Bruin, T., Tolić, D., Kober, J., Palunko, I.: Reinforcement learning for control:
performance, stability, and deep approximators. Annu. Rev. Control. 46, 8–28 (2018)
Part IV
Real-World Applications
Chapter 10
Bibliometric Analysis of Digital Financial
Reporting

Neha Puri and Vikas Garg

Abstract The way people live and conduct business has altered as a result of the
fastest-growing technologies in recent years. The existence of the internet and mobile
devices has resulted in a significant shift in many industries, including banking and
finance, from manual to automation activity and from offline to online transactions.
This study's goal is to examine the literature written about digital financial
reporting between 2011 and 2022. The approach used is descriptive research, based
on document analysis of earlier studies and literature on digitalization and
financial reporting that was available for free or without registration from
online journals. Articles were gathered from the Scopus research database, and 879
articles relevant to this topic were collected and examined. Using a bibliometric
tool, this study analyzed a number of factors, including the volume of articles
published and citation analysis. The overall outcome shows that the majority of
earlier studies focused on how digitalization has benefitted financial reporting.

Keywords Digital · Financial reporting · Bibliometric · Citation analysis ·


Digitalization

N. Puri · V. Garg (B)


Amity University, Greater Noida, Uttar Pradesh, India
e-mail: [email protected]
N. Puri
e-mail: [email protected]
N. Puri
ACCF, Amity University, Noida, Uttar Pradesh, India

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 211
L. A. Maglaras et al. (eds.), Machine Learning Approaches in Financial Analytics,
Intelligent Systems Reference Library 254,
https://doi.org/10.1007/978-3-031-61037-0_10

10.1 Introduction

The digital revolution has changed entire industries around the world. Industry 4.0
is quite different in the era of technology, and governments are developing new
business strategies for superior business practice. The digital revolution has had
a tremendous effect on all sectors of the world economy. Industry 4.0 is not yet
widespread, but it provides immense opportunities for humans to change aspects of
their lives [5]. Information flow has become fast with the rapid growth of digital
technology, and digital transformation has brought transparency to the financial
reporting system. Financial reporting refers to the details of a company's
financial statements and is essential information for the growth of the industry.
Through digitalization, the exchange of financial information throughout the
industry has become easy and transparent. Due to technology, the whole financial
reporting process has not only become more convenient but the quality of
information has also improved. Digitalization has created new and innovative
prospects for innovators and organizations [14]. The adoption of technologies such
as cloud computing, cyber security, and big data has helped in the collection and
processing of data, helping organizations simplify their financial and corporate
reporting. Emerging digital technology has completely changed industry practice,
and the digital financial reporting process has been widely researched [6, 17]. The
format of presenting accounting data has changed under the influence of
digitalization. There is a proverb by Henry Ford: "If you always do what you have
always done, you will always get what you have always got." In other words, to
achieve something different and extraordinary, one has to shift from traditional
methods to new ones. Nowadays everyone is short of time, and experts want short,
crisp, and relevant information.
Financial reports are one of the main components of any kind of business. A set of
technologies helps to provide good-quality reporting at a lower cost, and digital
tools such as automation, machine learning, and advanced analytics help to prepare
reports faster. In the coming future, it is expected that reporting tools
themselves will become interactive and there will be no need for paper-based
reports for communication. Digitalization is still a new practice in industry, and
it is sometimes felt that the change will take time. Financial reporting is not
only about technology; it is a process of understanding information more
effectively, and tools and techniques have ended repetitive processes.
Digitalization also supports auditors in audit planning, risk assessment, internal
evaluation, and going-concern decisions [12]. Most companies are investing in
software to facilitate accounting work. An accounting information system, a
computer-based system, is used for performing accounting analysis in a company.
The integration of digitalization into accounting systems has become an essential
need in this era. Innovation in technology has not only eased the process of
working but has also enhanced productivity, which leads to fast economic growth.
The entire
flow of business has developed due to the impact of digital technology. The
presentation of economic performance through a company's website is called
internet financial reporting (IFR). Stakeholders and investors prefer
internet-based financial reporting to the traditional financial reporting system,
as it saves cost and maintains time efficiency. In this era of globalization,
companies are motivated to publish accurate information, and both the financial
and non-financial information of a company should be transparent for its
stakeholders. Internet financial reporting helps companies deal with stakeholders
in a better way and reduces problems. All dimensions of a company's future and the
maintenance of sustainability are served by internet financial reporting.
This chapter is a bibliometric analysis of the existing literature on the impact of
digitalization on financial reporting. It provides a comprehensive overview by
analyzing the findings of previous literature on digital financial reporting.

10.2 Literature Review

Damayanti et al. [2] conducted a study of MSME owners who are now using digital
platforms for keeping financial records. The researchers identified that owners are
very positive about using digital technology for maintaining financial records, as
it has helped them run their businesses smoothly. Efimova and Rozhnova [4] examined
the effect of technology on the development of the financial reporting process:
what role technology plays in the delivery of financial information, and what
transformations have occurred in the overall financial recording process. Kulikova
et al. [9] studied the development of the financial reporting system and presented
the changes in the economy and how technology is significantly impacting the whole
financial system in the world. The study conducted by Nurunnabi and Hossain [15] in
an emerging economy, Bangladesh, identified the effect of big data on company
characteristics; of the 83 listed companies studied, only 28 provided financial
information on their websites. Zaidi et al. [21] studied the shift from Bahi-khata
to annual statements on company websites, which has occurred due to technical
changes in the economy. With the help of the Beneish model, the financial
statements of 100 NIFTY-listed companies were investigated for window dressing, and
the researchers identified that construction companies do much window dressing in
their financial statements; a detailed and in-depth audit report is needed to
identify flaws in a financial statement. A rule was approved by the SEC in 2008
requiring companies to submit their financial statements in XBRL format. The study
by Efendi et al. [3] emphasized the global development of XBRL and its benefits:
initially reporting was low but it increased over time, and adopters unused to the
format face difficulties at first, but after learning it becomes easy and
cost-saving for companies. Moudud-Ul-Huq [12] stated that the use of digitalization
is also helpful for auditors in audit planning, risk assessment, internal
evaluation, and going-concern decisions. Many large companies have automated their
accounting work and find it easy. A survey [16] of 76 German companies examined
their current and future status of digital development. Asked how they were
planning to use new technology, nearly 20% of the companies responded that they
would share their data with suppliers and customers, 19% would use online transfer
of payments, and 14% would replace Excel worksheets. Lasi et al. [10] noted that
the first industrial revolution is linked to mechanization, the second to the
extensive use of power/electricity, and Industry 4.0 to digitalization. Salaudeen
and Alayemi [18] explain why companies publish their financial reports on the
internet and how convenient it is for stakeholders to collect information easily
through an online platform. The internet financial reporting system is overall a
very smooth process for both companies and investors; not only stakeholders but
also the public can obtain financial information about any company. Development in
IFR has been so significant that regulators have now mandated that companies
disclose their corporate information on company websites [8]. The content of IFR
includes profit and loss statements, balance sheets, cash flows, sustainability
reports, and CSR reports [20].

10.3 Methodology

The methodology section presents the objectives of the research and the process of
obtaining the data set. The primary objective of this study is to conduct a
bibliometric analysis of papers on digital financial reporting found in the Scopus
database; specific research questions have been formulated to achieve this goal
(Table 10.1).

10.4 Data Extraction

It is crucial to select the appropriate search engine for data extraction. In this
study, Scopus was chosen due to its reputation as a prominent index and the fact
that it publishes peer-reviewed, high-quality work. Additionally, Scopus measures
the quality of each title using metrics such as the h-index, CiteScore, SCImago
Journal Rank, and Source Normalized Impact per Paper. Using Scopus, 879 research
papers on the topic of digital financial reporting were found, published between
2010 and 2022. The search string used on Scopus included parameters for identifying
annual trends, leading authors and journals, subject areas, document types,
affiliations, and top countries. The search was conducted on September 21, 2022.
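
As an illustrative sketch (not part of the original workflow), the annual
publication trend of Sect. 10.6 can be reproduced from a Scopus CSV export with
pandas; the column name "Year" follows the standard Scopus export format, and the
file name is assumed:

```python
import pandas as pd

df = pd.read_csv("scopus_digital_financial_reporting.csv")  # assumed export file
annual_trend = df["Year"].value_counts().sort_index()       # publications per year
print(annual_trend)
```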

Table 10.1 Research questions with their significance


Research questions Significance
Annual Publication Trend of Digital Financial It would help in determining the annual
Reporting publication trend and can aid in predicting
future trends in the field
Leading Authors and Journals in Digital Identifying specific authors and journals can
Financial Reporting Research aid researchers in finding high-quality studies,
methods, and materials related to digital
financial reporting
Research Efforts in Digital Financial Reporting Identifying the areas and document types
by Area and Document Type where the most research is being conducted in
digital financial reporting can help researchers
identify future research directions
Top Publication Affiliations for Digital Understanding the top affiliations for digital
Financial Reporting financial reporting research can aid researchers
in selecting the appropriate conferences,
universities, and journals to publish their work,
which may affect the citations of their papers
in the future
Leading Countries in the Publication of Digital Understanding which countries are producing
Financial Reporting Papers the most research in digital financial reporting
can provide researchers and practitioners with
opportunities to contribute to the field by
publishing their own research in those
countries
Keyword and citation network of digital Easy searching method for future researchers
financial reporting
Source: Compiled by authors

10.5 Results

In this section, a bibliometric analysis of the Scopus database is presented. The
analysis is carried out in such a way that it answers all the formulated research
questions.

10.6 Distribution of Annual Trend

In this research work, the prevalence of digital financial reporting in current research
is analyzed using bibliometric tools. The trends revealed that digital financial
reporting has become increasingly common. As per the study by Al-Sakran and
Al-Sakran [1], the use of mobile and online banking has grown significantly, with
more customers using these platforms to access financial services and products. This
has resulted in increased convenience and efficiency for customers and increased
revenue for financial institutions [1]. Another area of emphasis in the literature
on digital finance is the utilization of blockchain technology, a decentralized and
secure digital ledger being considered as a potential solution to various issues in
the financial industry, such as fraud and security [11]. All 879 research papers
were published between 2010 and 2022. From Fig. 10.1 it is evident that there was a
steady increase in publications from 2010 to 2013, with 28, 33, and 34 publications
respectively, followed by growth from 2017 onward: a sudden increase to 163
articles in 2017, 136 articles in 2020, and a further rise to 183 articles in 2022.

Fig. 10.1 Publication trend. Compiled by Authors

10.7 Distribution of Authors

Table 10.2 and Fig. 10.2 describe the authors who have published at least four
papers, 18 authors in total. Vasarhelyi M.A., who published 25 papers and received
871 citations, holds the top spot on this list, followed closely by Wang T., who
authored eight papers and received 265 citations. Mithas S. also made the list with
7 articles and 434 citations. Additionally, Zhang Y., Dwivedi Y.K., Dai J., and
Zhang L. each contributed 5 articles on digital financial reporting, while the
other authors in the list each contributed four publications in the field.

Table 10.2 Author’s distribution


Author Documents Citations Total link strength Clusters
No W.G 6 225 16 1
Janvrin D.J 4 89 10 1
Alles M 4 87 4 1
Pinsker R.E 4 49 8 1
Zhang Y 5 18 13 1
Kumar S 4 17 2 1
Pavlou P.A 4 2045 11 2
Mithas S 7 434 7 2
Wang T 8 265 12 2
Dwivedi Y.K 5 131 1 2
Li J 4 67 1 2
Jr 4 24 1 2
Vasarhelyi M.A 25 871 37 3
Dai J 5 476 26 3
Zhang L 5 107 1 3
Smith S.S 4 21 8 3
Gray G.L 4 90 4 4
Debreceny R.S 4 80 6 4
Compiled by Authors

Fig. 10.2 Authors' relationships. Compiled by Authors



Fig. 10.3 Top journals

10.8 Distribution of Top Journals

It is observed that a total of 170 journals published articles on digital financial
reporting. In this study, the top 9 journals, having 5 or more publications, are
reported. As Fig. 10.3 and Table 10.3 show, the largest numbers of publications
appeared in the "Journal of Information Systems" (63), "International Journal of
Digital Accounting Research" (44), "International Journal of Accounting
Information Systems" (41), and "Journal of Emerging Technologies in Accounting"
(36).

10.9 Distribution of Articles Based on Citations

We used the VOSviewer software to conduct a citation analysis, which helped us
identify the key organizations and scientists and examine their associations. The
analysis generated a list of the leading authors, ranked by the number of documents
and citations. The thirty most significant authors, by citation strength, are
characterized in Table 10.4 and Fig. 10.4. Based on Table 10.4, Bharadwaj A. had
the highest number of citations, followed by Eccles R.G. (763 citations),
Bélanger F. (762 citations), and Wu S.P. (339 citations). Similar patterns are
visible in the figure. We focused on the top thirty authors with more than 115
citations out of the 2247 authors identified in VOSviewer.

Table 10.3 Top journals


Journal Documents Citations
Journal of Information Systems 63 1630
International Journal of Digital Accounting Research 44 535
International Journal of Accounting Information Systems 41 877
Journal of Emerging Technologies in Accounting 36 540
Critical Perspectives on Accounting 25 387
Cogent Business and Management 22 76
Management Science 21 1272
Management Decision 20 565
Information and Management 19 424
International Journal of Information Management 18 271
MIS Quarterly: Management Information Systems 18 3987
International Journal of Business Information Systems 16 91
Decision Support Systems 15 582
Accounting, Organizations and Society 14 374
Journal of Enterprise Information Management 14 253
Information Systems Research 13 349
Production and Operations Management 12 224
Production Planning and Control 12 140
Vine Journal of Information and Knowledge Management Systems 12 27
International Journal of Production Economics 11 285
Technology Analysis and Strategic Management 11 21
Australasian Journal of Information Systems 10 81
International Journal of Production Research 10 284
International Journal of Supply Chain Management 10 45
Journal of Risk Management in Financial Institutions 10 17
Journal of Information Technology 9 284
Journal of Management Control 9 70
Research Policy 9 247
Information Society 8 282
Managerial and Decision Economics 8 27
Problems and Perspectives in Management 8 34
Industrial Management and Data Systems 7 110
International Journal of Operations and Production Management 7 97
Journal of Science and Technology Policy Management 7 97
Journal of Strategic Information Systems 7 102
Electronic Commerce Research and Applications 6 154
European Journal of Information Systems 6 59
Information and Computer Security 6 29
Information and Organization 6 421
International Journal of Information Systems in the Service Sector 6 10
International Journal of Management Reviews 6 26
Journal of Global Information Management 6 129
Journal of Theoretical and Applied Electronic Commerce Research 6 48
Knowledge Management Research and Practice 6 21
TQM Journal 6 71
Uncertain Supply Chain Management 6 22
Computer Law and Security Review 5 128
Electronic Markets 5 116
Information Technology and Management 5 10
International Journal of Business Intelligence and Data Mining 5 11
International Journal of Management 5 10
Journal of Management Information Systems 5 151
Journal of Telecommunications and the Digital Economy 5 3
Management Accounting Research 5 218
Research in Transportation Business and Management 5 84
Compiled by Authors

10.10 Distribution of Different Affiliations

The Robert H. Smith School of Business, University of Maryland, has the maximum
number of publications, with 7 publications and 439 citations. Rutgers, The State
University of New Jersey, Newark, and Iowa State University are in second position
with 6 publications each, followed by the Carlson School of Management, University
of Minnesota (4 publications), the University of Waterloo (4), San Diego State
University (4), and Qatar University (4). Table 10.5 highlights the organisations'
affiliations along with country details, number of documents published, and
citations.

10.11 Distribution of Publications Among Countries

In this section, the countries with the highest number of publications are listed.
Table 10.6 illustrates that the United States had the most publications at 317, followed
by the United Kingdom with 101, and then Australia and China with 72 each. Other
countries on the list include India with 51, Italy with 47, Germany with 44, Spain

Table 10.4 Distribution of articles based on citations

Rank Author by citation Citations
1 Bharadwaj A. (2013) 1325
2 Eccles R.G. (2014) 763
3 Bélanger F. (2011) 762
4 Wu S.P.-J. (2015) 339
5 Jenkin T.A. (2011) 303
6 Pavlou P.A. (2011) 270
7 Dimoka A. (2012) 262
8 Grabski S.V. (2011) 262
9 Dai J. (2017) 242
10 Chang R.M. (2014) 232
11 Dubey R. (2021) 204
12 Banker R.D. (2011) 188
13 Ananny M. (2016) 185
14 Hofer C. (2012) 185
15 Mithas S. (2016) 174
16 Burtch G. (2018) 159
17 Gao G.G. (2015) 156
18 Arvidsson A. (2012) 155
19 Chen C.S. (2013) 153
20 Haddud A. (2017) 150
21 Gordon L.A. (2010) 148
22 Pondeville S. (2013) 142
23 Skærbæk P. (2010) 142
24 Xie K. (2016) 139
25 Saldanha T.J.V. (2017) 132
26 Abrahams A.S. (2015) 132
27 Pang M.-S. (2014) 127
28 Banker R.D. (2014) 122
29 Milian E.Z. (2019) 117
30 Hu Y. (2015) 117
Compiled by Authors

with 38, Canada with 31, Malaysia with 30, Hong Kong with 25, France with 24,
Taiwan with 24, Indonesia with 23, and Brazil with 19. Figure 10.5 demonstrates the
countries’ collaboration network.

Fig. 10.4 Distribution of articles based on citations

Table 10.5 Organisation’s affiliations


Organization Country Documents Citations
Rutgers, The State University of New Jersey, Newark United States 6 493
Robert H. Smith School of Business, University of Maryland United States 7 439
Iowa State University United States 6 231
Carlson School of Management, University of Minnesota United States 4 229
University of Waterloo Canada 4 160
San Diego State University United States 4 114
Rutgers, The State University of New Jersey United States 5 95
College of Business and Economics, Qatar University Qatar 4 30

10.12 Distribution of Keyword Analysis

A total of 4399 keywords were extracted; 99 keywords met the minimum occurrence
threshold of 5. The final network was developed, and 186 keywords were ultimately
selected for further analysis. The selected keywords are classified into 8 clusters
with different colors, as shown in Fig. 10.6. Cluster one contains the maximum of
38 items. It was observed that the most frequently occurring keyword is XBRL,
followed by blockchain technology and finance.

10.13 Discussions

The literature on digital financial reporting is vast and multifaceted. Digital financial
reporting refers to the use of digital technologies to create, process, and disseminate
financial information. This includes the use of digital platforms and software to

Table 10.6 Distribution of publications among countries


Country Documents Citations Total link strength
United States 317 11,538 450
United Kingdom 101 2836 108
Australia 72 1054 159
China 72 1463 104
India 51 553 33
Italy 47 684 92
Germany 44 822 78
Spain 38 590 26
Canada 33 1386 69
Malaysia 30 127 11
Hong Kong 25 431 36
France 24 550 21
Taiwan 24 673 52
Indonesia 23 57 25
Brazil 19 208 10
Finland 18 275 26
Portugal 17 238 13
Russian Federation 17 64 4
Saudi Arabia 16 130 6
Netherlands 14 201 33
South Korea 14 910 18
Jordan 13 41 23
South Africa 12 104 16
Sweden 12 234 12
United Arab Emirates 12 128 14
Vietnam 11 27 8
Austria 10 499 7
Ireland 10 120 24
New Zealand 10 172 14
Pakistan 10 108 1
Switzerland 10 144 2
Denmark 8 424 8
Ukraine 8 30 2
Egypt 7 22 11
Iran 7 32 4
Qatar 7 62 2
Singapore 7 426 46
Turkey 7 40 5
Bangladesh 5 25 3
Norway 5 49 13
Poland 5 80 4
Thailand 5 45 1
Compiled by Authors

Fig. 10.5 Country’s collaboration network

create financial statements, as well as the use of digital tools for data analysis and
visualization.
One of the main areas of focus in the literature on digital financial reporting is the
use of XBRL (eXtensible Business Reporting Language) technology. XBRL is an
open standard for digital financial reporting that allows for the creation of structured
and machine-readable financial statements [7]. Researchers have examined the bene-
fits and challenges of using XBRL, including its potential to improve the accuracy,
timeliness, and comparability of financial information [7, 19].
Another area of focus in the literature on digital financial reporting is the use of
digital platforms and software for financial reporting. This includes the use of cloud
computing, blockchain technology, and artificial intelligence in financial reporting.
10 Bibliometric Analysis of Digital Financial Reporting 225

Fig. 10.6 Keyword analysis

Researchers have examined the potential benefits and drawbacks of these
technologies, such as improved efficiency and transparency, as well as security and
data privacy concerns.

10.14 Conclusion

There is also a growing body of literature on the impact of digital financial reporting
on various stakeholders, such as investors, regulators, and accounting profes-
sionals. Researchers have examined how digital financial reporting can improve the
decision-making process for these stakeholders and how it can affect their roles and
responsibilities.
The literature on digital financial reporting is also expanding geographically, with
researchers from different countries studying the adoption and implementation of
digital financial reporting in their respective countries.
Overall, the literature on digital financial reporting highlights the potential benefits
and challenges of using digital technologies in financial reporting, and the need for
further research to fully understand the implications of these technologies for various
stakeholders.
Finally, our bibliometric study has offered useful insights into the landscape of
digital financial reporting research. Through a comprehensive analysis of the
relevant literature, we discovered numerous major trends, themes, and patterns that
shed light on the evolution of this discipline. Our findings are summarised in the
following important points:
Emerging Research Topics: Our research uncovered new research topics in digital
financial reporting, such as the impact of blockchain technology, the function of
artificial intelligence and machine learning, and the incorporation of sustainability
reporting.
Collaborations and Authors: We identified notable authors and research
organisations making significant contributions to the field. Collaboration
networks among scholars have been significant in promoting the development of
knowledge.
Publication Trends: The number of publications related to digital financial
reporting has grown steadily over the years, demonstrating the field's rising
importance.
Citation Patterns: As evidenced by their substantial citation counts, several key
books and publications have received considerable interest and have shaped
subsequent research.
Geographical Distribution: Research on digital financial reporting is a global
effort, with contributions from scholars and institutions all around the world. This
indicates the topic’s international importance and application.
Future Directions: Our analysis has identified several promising avenues for future
research, such as investigating the ethical and regulatory implications of digital
financial reporting, investigating the adoption challenges faced by organisations,
and assessing the impact of emerging technologies on financial reporting quality.

10.15 Theoretical Implications

The current study makes several contributions:


1. It is the first study in the Scopus literature to conduct a bibliometric analysis of
digital financial reporting.
2. The results of the analysis provide important implications for future researchers
by giving them a new direction for further investigation in the field.

10.16 Practical Implications

The findings derived from this analysis have practical consequences for
policymakers, practitioners, and researchers. Policymakers may utilize this
information to guide regulatory choices, while practitioners can learn about the
newest innovations in digital financial reporting.
Limitations and Future Scope: This research suggests several potential areas for
future work on digital financial reporting, such as exploring highly cited research
papers and examining technical aspects using different literature databases like
WoS (Web of Science) and other internet databases, to compare with the findings of
this study.

References

1. Al-Sakran, W.A., Al-Sakran, W.: The impact of digital technology on banking services: a
literature review. J. Internet Bank. Commer. 25(2), 1–13 (2020)
2. Damayanti, F.N., Kusmawati, P., Navia, V., Luckyardi, S.: Readiness the owner of small medium
enterprises for digital financial records in Society 5.0 era. ASEAN J. Econ. Econ. Educ. 1(1),
1–8 (2022)
3. Efendi, J., Smith, L.M., Wong, J.: Longitudinal analysis of voluntary adoption of XBRL on
financial reporting. Int. J. Econ. Account. 2(2), 173–189 (2011)
4. Efimova, O., Rozhnova, O.: The corporate reporting development in the digital economy. In:
Digital Science, pp. 71–80. Springer (2019)
5. Fauzan, R.: Karakteristik model dan analisa peluang-tantangan industri 4.0. Phasti: Jurnal
Teknik Informatika Politeknik Hasnur 4(01), 1–11 (2018)
6. Guthrie, J., Manes-Rossi, F., Orelli, R.L.: Integrated reporting and integrated thinking in Italian
public sector organisations. Meditari Acc. Res. 25(4), 553–573 (2017)
7. Jørgensen, B.: Digital financial reporting: opportunities and challenges. J. Appl. Account. Res.
17(3), 285–298 (2016)
8. Keliwon, K.B., Abdul Shukor, Z., Hassan, M.S.: Measuring internet financial reporting (IFR)
disclosure strategy. Asian J. Account. Govern. 8, 7–24 (2017)
9. Kulikova, L.I., Mukhametzyanov, R.Z.: Formation of financial reporting in the conditions of
digital economy. J. Environ. Treat. Tech. 7(Special Issue), 1125 (2019)
10. Lasi, H., Fettke, P., Kemper, H.G., Feld, T., Hoffmann, M.: Industry 4.0. Bus. Inf. Syst. Eng.
6, 239–242 (2014)
11. Li, H., Wang, H., Zhang, Y.: Blockchain in finance: a literature review. J. Financ. Stab.
34, 196–213 (2018)
12. Moudud-Ul-Huq, S.: The role of artificial intelligence in the development of accounting
systems: a review. IUP J. Account. Res. Audit Pract. 13(2) (2014)
13. Murdayanti, Y., Khan, M.N.A.A.: The development of internet financial reporting publications:
a concise of bibliometric analysis. Heliyon 7(12), e08551 (2021)
14. Nambisan, S., Lyytinen, K., Majchrzak, A., Song, M.: Digital innovation management:
reinventing innovation management research in a digital world. MIS Q. 41(1), 223–238 (2017)
15. Nurunnabi, M., Alam Hossain, M.: The voluntary disclosure of internet financial reporting
(IFR) in an emerging economy: a case of digital Bangladesh. J. Asia Bus. Stud. 6(1), 17–42
(2012)
16. PWC: Digitalisation in finance and accounting and what it means for financial statement
audit (2018). Available at: https://www.pwc.de/de/im-fokus/digitaleabschlusspruefung/pwc-
digitalisation-in-finance-2018.pdf. 13 Apr 2019
17. Rinaldi, L., Unerman, J., de Villiers, C.: Evaluating the integrated reporting journey: insights,
gaps and agendas for future research. Account. Audit. Account. J. 31(5), 1294–1318 (2018)
18. Salaudeen, H., Alayemi, S.A.: The level of internet adoption in business reporting: the Nigerian
perspectives. Int. J. Appl. Bus. Res. 107–121 (2020)
19. Sun, H., Li, X., Wang, Q.: Digital financial reporting: a literature review and research agenda.
J. Account. Data Sci. 2(2), 1–19 (2018)
20. Suryanto, T., Komalasari, A.: Effect of mandatory adoption of international financial reporting
standard (IFRS) on supply chain management: a case of Indonesian dairy industry. Uncertain
Supply Chain Manage. 7(2), 169–178 (2019)
21. Zaidi, U.K., Akhter, J., Akhtar, A.: Window dressing of financial statements in the era of digital
finance: a study of small cap Indian companies. Metamorphosis 17(2), 67–75 (2018)
Chapter 11
The Quest for Financing Environmental
Sustainability in Emerging Nations: Can
Internet Access and Financial
Technology Be Crucial?

Ekundayo Peter Mesagan, Precious Muhammed Emmanuel,


and Mohammed Bashir Salaudeen

Abstract Environmental problems have continued to ravage emerging economies,
with numerous negative impacts on human life. The African continent is the worst
hit among emerging nations, with increasing desert encroachment, flooding, and air
pollution. The mitigation and adaptation approach through environmentally
sustainable financing can be the bailout for emerging countries to build a
climate-resilient ecosystem that drives environmental quality. As a result, we
devote this chapter to analysing the role of internet access and financial
technology adoption in driving the quest for environmental sustainability
financing in emerging nations, with a special focus on African countries. We
discover that environmental financing in African nations is far below the needed
climate change finance, thus lowering the stock of green investment projects in
the region. The situational analysis also reveals that the public sector accounts
for 86% of climate funding in Africa, with the private sector contributing only
14%. Considering the role of internet access and Fintech adoption, we discover
that internet access has improved substantially, from 6 to 29.36% over the last
decades, and Fintech adoption has likewise increased. The study concludes that
African nations have the potential, through the financial system, to mobilise
private funds for sustainable investment by leveraging internet access and
Fintech penetration and adoption.

E. P. Mesagan (B)
School of Management and Social Sciences, Pan-Atlantic University, Lagos, Nigeria
e-mail: [email protected]
P. M. Emmanuel
MGIG Global Research Institute, Lagos, Nigeria
e-mail: [email protected]
M. B. Salaudeen
Nungu Business School, Lagos, Nigeria
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 229
L. A. Maglaras et al. (eds.), Machine Learning Approaches in Financial Analytics,
Intelligent Systems Reference Library 254,
https://doi.org/10.1007/978-3-031-61037-0_11

Keywords Environmental sustainability · Finance · Internet access · Financial


technology · Emerging nations

JEL Classification Q56 · G20 · O36 · O16 · N77

11.1 Background

The climate change problem is now a global phenomenon attracting the attention of
global leaders, institutions, and policy analysts, owing to the devastating impact
of climate change, most especially on developing continents like Africa. Olaoye
[26] and Mesagan et al. [22] argued that GHG emissions arising from fossil fuel
combustion and anthropogenic activities are the stimulators of climate change. The
impact of climate change due to GHG emissions is damaging, with substantial
economic, health, and environmental consequences for current and future
generations. For instance, Africa contributes about 2–3% of emissions, the lowest
share globally, compared to Asia, which accounts for half of global CO2 emissions,
and North America, which emitted about 15% of total emissions between 2000 and
2020 [30]. Despite African countries' minimal contribution to GHGs, the negative
effects on the region are dehumanising, especially for rural inhabitants.
African countries face dire environmental problems, including flooding,
desertification, and air pollution. The intensity of the annual flooding further
exacerbates rural poverty and acute food insecurity, as flooding continues to wash
away farmlands and displace millions of farmers. The World Meteorological
Organization [35] noted that by 2030, 700 million people could be forced out of
their homes due to high water stress, which is anticipated to affect roughly 250
million people in Africa as a result of climate problems. Aside from this, UNEP
[31] noted that environmental pollution caused by CO2 emissions results in 1.96
billion IQ point losses annually in Africa. Also, Fisher et al. [10], Olunkwa et
al. [27], and Evans and Mesagan [9] affirmed that over a million deaths on the
African continent were attributed to air pollution in 2019: 697,000 were
attributable to household pollution and 394,000 to ambient pollution. These
negative consequences of climate change require massive investment in adaptation
and mitigation. As a result, the financial system becomes indispensable in
financing environmentally friendly projects to enhance sustainability.
However, without considerable sustainability financing, the achievement of the
sustainable development goals (SDGs) by 2030 and the global commitment toward
carbon neutrality by 2050 in developing economies will be far from view, because
massive green investment financing is required to promote carbon efficiency and
environmental quality. Therefore, the interface of the financial system,
technology, and internet access is vital in ensuring that the financing of
environmentally sustainable projects is commensurate with the level of greenhouse
gas emissions. Arguing along this path, Nassiry [25] contends that to accomplish
the SDGs and the core goal of the Paris Agreement, keeping the increase in global
average temperature below 2 °C, trillions
of dollars in new investment will be needed. These investments involve incremental,
long-term investment in projects such as low-carbon and climate-resilient infrastructure.
Hence, the financial industry has a resource mobilisation and efficient
allocation role in stimulating this investment.
Moreover, environmental financing involves investment decisions in a business
venture and its environmental, social, and governance (ESG) aspects. According to
Khan et al. [18], environmental financing is built on the triple bottom-line framework,
which considers allocating financial resources to sustainable projects that support
people, prosperity, and the environment. Similarly, according to Giglio et al. [13],
climate finance aims to increase greenhouse gas sinks and decrease emissions, thereby
reducing climate vulnerability. This climate change adaptation and mitigation financing
makes the financial system indispensable in the quest for sustainability, especially in
developing countries. Guang-Wen and Siddik [15] argued that the financial system
promotes sustainability by embracing cutting-edge technology like blockchain, green
banking, and online banking.
This adoption of cutting-edge technology by financial institutions is driving Fintech,
which supports various eco-friendly initiatives, such as developing renewable or
alternative energy sources and green industries via efficient allocation of funds.
According to Chen et al. [5], Fintech makes it possible to finance a range of environmentally
friendly programs, including those that promote renewable energy, energy
efficiency, clean technology, and the growth of green industries. Cen and He [4],
Nassiry [25], and Al-Okaily et al. [1] support the argument that Fintech can be a
channel for driving environmental sustainability. Interestingly, while the role of Fintech
in climate financing is well documented in the literature, the role of internet accessibility
in engendering wider adoption remains sparingly considered. The accessibility of
the internet, especially in developing economies, determines the extent of progress
that can be attained in environmental financing through Fintech adoption.
In this respect, internet access refers to a person's or organisation's capacity to
connect to the internet through control panels, computers, and other devices, and their
ability to access services [19]. Access to the internet is an enabler of Fintech. As such,
the effectiveness of Fintech in promoting environmental finance depends on the
availability and accessibility of the internet [2]. Moreover, Friedline et al. [11] pointed out
that digital financial transactions, such as direct paycheck deposits and electronic bill
payments, are made possible by high-speed internet and smartphone applications.
In this regard, access to the internet enables Fintech to promote financial inclusion,
such that the most climate-vulnerable areas can access finance to drive sustainable
agricultural practice and thereby promote sustainability and economic well-being.
Also, the information asymmetry that characterises the financial market concerning
the availability of financial instruments that support green growth can be minimised
by wide Fintech adoption enabled by internet access. This will allow investors to access
green financial instruments from any location without being excluded from the financial
system. For instance, government green bonds and green private equities will be
accessible to all to support carbon-efficient investments that enhance environmental
quality. Specifically, Muganyi et al. [24] posited that Fintech encourages green agri-
culture practices in China by assuring financial availability, reducing the information
gap, and increasing trust among farming communities. This indicates that Fintech
tends to lower unsustainable agricultural practices in Africa and thus encourage
sustainable agriculture.
It is worth mentioning that environmental financing in Africa is driven mainly
by the government. However, the level of public climate financing is not
commensurate with the green investment needed for a sustainable environment, owing
to the peculiar pressure on public revenue. Although evidence from the Climate
Policy Initiative [6] affirms that Africa's public share of climate financing, at 86%,
exceeds that of other regions (i.e., South Asia 64%, East Asia 62%, Western
Europe 41%), its private sector contribution is appalling. This evidence indicates that
the private sector accounts for only 14% of climate adaptation and resilience projects,
the lowest share compared to other regions. Moreover, between 2020 and 2030, Africa
will need almost $2.8 trillion worth of green investment to deliver its Nationally
Determined Contributions (NDCs) [12]. The international public sector, as well as
the domestic and international private sectors, must contribute $2.5 trillion of this
total [12]. This shows that private green equities and bonds are crucial for African
countries to accelerate climate resilience financing and investment towards achieving
the SDGs by 2030 and net-zero targets by 2050. Therefore, mopping up and ensuring
efficient allocation of these green funds places much onus on internet accessibility and
financial technology, since Fintech can make this funding accessible cheaply and
efficiently.
Therefore, this chapter explores the inherent opportunities of internet accessibility
and Fintech adoption in promoting environmental sustainability. For structural
organisation, after the study background, Sect. 11.2 presents the schematic analysis
of the study, Sect. 11.3 shows the situational analysis, Sect. 11.4 discusses the
implications of the findings, and lastly, Sect. 11.5 provides a policy outlook.

11.2 Schematic Analysis

Figure 11.1 presents the framework that connects Fintech, internet access and
environmental sustainability. This framework depicts how the financial system can drive
sustainable development through financial innovation and access to the internet. In
the schematic analysis in Fig. 11.1, the financial system is the central hub that
coordinates private funds and green bonds towards investment in environmentally friendly
projects. According to Muganyi et al. [24], the financial system is indispensable
in mopping up financial resources for sustainable investments. In Fig. 11.1, private
funds, which include both households' and firms' surplus savings, are coordinated by
the financial structure, which makes such funds available to fund the projects of
financial deficit units. Also, like private funds, a green bond is an environmental
financing instrument that the government floats via the financial
system to motivate investment that considers environmental impact. In the view of
Wang et al. [32], investment in sustainable projects needs long-term financial
instruments; as a result, shareholders favour green bonds because of their potential to
support firms' sustainable investment growth in the long run. Therefore, the ultimate
responsibility of the financial industry is to intermediate between the owners and
users of funds to support investment [14, 21, 23].
The penetration of technology and innovation into the financial system has
transformed its intermediation process. As Fig. 11.1 shows, financial
intermediation between the owners of financial resources and their users is expediently
connected through financial technology (Fintech). The use of Fintech enables the
financial system to optimally channel private funds and public green bonds towards
environmental investment. According to Guang-Wen and Siddik [15], Fintech enables
financial organisations to speed up the delivery of their financial services towards
eco-friendly projects. Similarly, Deng et al. [8] emphasised that Fintech platforms
hasten the acquisition and distribution of cash designated for environmental projects.
Also, Muganyi et al. [24] noted that Fintech can expedite the transition to a new
sustainable investment system that promotes cleaner production through smart
manufacturing and other green management procedures.
Therefore, based on the schematic analysis presented in Fig. 11.1, the financial
system can drive environmental sustainability by making finance available for
sustainable agriculture, renewable energy projects, and sustainable businesses. In this
respect, sustainable agriculture involves farming practices that consider their
ecological impact without compromising the food needs of current and future
generations [28]. Also, financing renewable energy projects entails supporting projects
such as renewable mini- and microgrids that enable the decentralisation of power
generation for household and commercial energy consumption. Renewable energy
projects are environmentally friendly and sustainable, and promoting such projects is
key to sustainability [29]. Furthermore, a sustainable business is one that operates
without negatively influencing the environment, people, or society as a whole [18].
Supporting this argument, Muganyi et al. [24] opined that these kinds of businesses
promote and reward green behaviour, such as virtual transactions, eliminating paper
use, encouraging walking and cycling, and encouraging carbon savings. If the
financial system expedites financial services towards these kinds of projects through
Fintech, the industry has the capacity to champion the sustainability transition.
Building on the foregoing, the role of internet access is crucial in enabling the
financial sector to adopt Fintech in dispatching financial services. Internet access is the
conduit through which Fintech interacts with the owners and users of financial services.
Al-Okaily et al. [1] and Crouhy et al. [7] indicate that the internet is the enabler of
Fintech; Fintech can achieve little without adequate access to the internet.
Figure 11.1 depicts that financial systems require internet access to enable the usage
of financial technology. Farmers, renewable energy investors and business owners
also need access to the internet to enjoy the services of the financial system. Moreover,
the internet enables Fintech to promote financial inclusion, making most rural
dwellers bankable and able to access financial resources.
Therefore, internet access can play a crucial role in intermediating access to
green financing for climate resilience investment. Finally, the interconnectedness of
Fintech and internet access within the financial system can empower the financial
sector to drive environmental sustainability. Figure 11.1 depicts the paths through
which the financial system can promote sustainability.

Fig. 11.1 Schematic linkage between financial technology, internet access and environmental sustainability (diagram: private funds and green bonds flow through the financial system, enabled by Fintech and internet access, into sustainable agriculture, renewable energy projects and sustainable businesses). Source Authors' Design

11.3 Situational Analysis

This segment of the study presents the data analysis concerning the depth of
environmental financing, the internet access situation, and financial technology
adoption in emerging nations, with a special focus on Africa. The situational analysis of
these countries is illustrated using charts to give a clear picture of the environmental
financing scenario and the possibility of scaling up financing through the adoption
of Fintech supported by internet access.
The analysis we present in Fig. 11.2 shows climate financing in Africa. The
countries represented are chosen from the five sub-regions of the African continent.
For North Africa, we select Algeria and Egypt; for East Africa, Rwanda and Kenya;
for Central Africa, Angola and Cameroon; for Southern Africa, South Africa and
Namibia; and for West Africa, Nigeria and Ghana. This approach provides a solid
basis for the policy actions that emanate from the study. Based on the data analysis
in Fig. 11.2, in North Africa, Egypt spent about $2600 million on environmental
sustainability financing, while Algeria spent only $53 million on climate financing.
For East Africa, climate financing is $1919 million in Kenya, while Rwanda spends
about $601 million on environmental sustainability. In the West African region,
Nigeria and Ghana spent $1923 million and $830 million, respectively. In the
Southern African region, South Africa's climate financing stood at about $1660
million, and Namibia's spending on climate mitigation and adaptation is $202 million.
Lastly, for Central Africa, Angola's climate financing is $307 million, while
Cameroon's is $390 million. The evidence illustrates that African countries are making
an effort towards environmental sustainability. Comparing sustainability financing
across the African regions, North and East Africa tend to perform better than the
West, Southern and Central African regions, with the Central African countries
performing most poorly. Supporting these findings, the IEA report indicates that
sustainable financing in Northern Africa has accelerated the clean energy transition
agenda. As a result, the North African countries have increased their renewable energy
production by 40%, adding 4.5 GW of wind, solar PV, and solar thermal capacity to
their renewable power grid over the last decade; moreover, renewable generation
capacity has increased by 80% over the same period [16].

Fig. 11.2 Climate financing in Africa (climate finance, $ million, for Cameroon, Angola, Namibia, South Africa, Ghana, Nigeria, Rwanda, Kenya, Algeria and Egypt). Source Authors' Sketch using Data from [6]

11.3.1 Where is African Climate Finance Coming from?

African countries are investing in environmental sustainability financing to mitigate
the environmental impact of climate change vulnerability. Knowing the sources of
African countries' climate change funding is crucial for assessing whether those
sources can sustainably propel the clean technology investment needed to support
adaptation and mitigation plans. Figure 11.3 depicts the sources of climate financing
in selected African countries based on the five regions of Africa.
Figure 11.3 shows the sources of climate financing for African countries. The
analysis shows that funding for a sustainable environment comes from both the public
and private sectors. However, for all the African countries, governments spend more
on green financing and investment, based on the report in Fig. 11.3.

Fig. 11.3 Sources of climate financing (private versus public climate financing, USD million, for the selected African countries). Source Authors' Sketch using Data from [6]

Fig. 11.4 Proportion of domestic versus international sources of green financing (Middle East & North Africa, Sub-Saharan Africa, South Asia, Latin America & Caribbean, US & Canada, Western Europe, East Asia & Pacific). Source Authors' Sketch using Data from Climate Policy Initiative (2022)

The evidence in Fig. 11.3 indicates that private sector financing constitutes a smaller
percentage of green funding in the region. Comparing the selected countries, South
Africa's private sector contributes the most to green financing, about $656 million,
which is higher than the combined private sector contributions of Cameroon, Angola,
Namibia, Ghana, Rwanda and Algeria, about $619 million in total. The strong
participation of the South African private sector is connected to South Africa's Green
Fund, established in 2011 to pave the way for the private sector to access private
equities to drive green investment projects [17]. Additionally, the situational analysis
of the sources of climate financing is extended to find the ratio of domestic to foreign
funds in sustainable financing for Africa. For a solid analysis, the report considered
broad global and regional data to understand the global perspective of this scenario.
The analysis is presented in Fig. 11.4.
Figure 11.4 shows the domestic and international proportions of green financing
across the world's regions. The evidence illustrates that some continents depend
massively on international funds for sustainable investment, while others engender
green investment through domestic funding. For instance, the East Asia and Pacific
region's green finance is 93% domestically driven, while only 7% comes from
international sources. Focusing on Africa, the report indicates that African countries
depend substantially on foreign funding for climate resilience projects and investments.
For instance, about 82% of climate finance in sub-Saharan Africa is internationally
sponsored, while only 18% is domestically sourced. This low proportion of domestic
funding towards climate investment may account for the minimal commitment of
African nations towards climate resilience financing.

11.3.2 A Glimpse of the Environmental Sustainability Financing of East Asian Countries

The environmental problem is global, but relative vulnerability to the climate change
problem varies. Apart from African countries, the World Bank [33, 34] noted that East
Asian countries are severely vulnerable to global warming, with daunting health and
economic challenges. As a result, climate finance to mitigate climate change's effects
has remained encouraging compared with other developing economies. Therefore,
Fig. 11.5 presents the climate financing evidence of selected East Asian nations.
Figure 11.5 illustrates the climate financing levels of the ASEAN nations. Spending
on green environmental projects in these nations outperforms the commitment
of the African region. Besides Malaysia and Lao PDR, which spent about
$141 million and $1536 million, respectively, on sustainable environmental projects,
the remaining economies depicted in Fig. 11.5 generate more finance to stimulate
green investment. Interestingly, Indonesia, the Philippines, and Vietnam each spend
more on climate resilience investment than the total spending of the selected African
nations in Fig. 11.2. In this respect, Indonesia's climate financing is $18,056 million,
the Philippines spends $12,217 million, and Vietnam spends $13,383 million on
climate-sustainable projects, whereas the combined spending of the selected African
economies stands at $10,053 million. This indicates that climate mitigation and
adaptation efforts are more intense on the Asian continent than on the African continent.
This evidence supports Fig. 11.4, which shows that East Asian countries rely mainly
on domestic funding sources for climate mitigation and adaptation projects, which is
more significant than relying on climate finance aid.

Fig. 11.5 Climate financing for selected East Asian nations (climate finance, USD million, for Cambodia, Indonesia, Lao PDR, Malaysia, Myanmar, the Philippines, Thailand and Vietnam). Source Authors' Sketch using Data Retrieved from [20]

11.3.3 Internet Access Situational Analysis

This subdivision of the study examines the global internet access situation by region
to assess its potential to scale up Fintech adoption and environmentally sustainable
financing in emerging nations. The situational analysis is illustrated in Fig. 11.6.
Figure 11.6 showcases the trend of internet access by region for the decade between
2010 and 2020. The evidence indicates that access to the internet has continued to
rise over the period across the regions of the world. However, the African continent
has the lowest access rate, as Fig. 11.6 presents. Strikingly, in sub-Saharan Africa,
access to the internet rose from about 6.13% in 2010 to 29.34% in 2020. The
implication is that in 2010 only about 6% of the African population could access the
internet, but over this period, the rate of accessibility has significantly improved.
Moreover, the accessibility of the internet in Africa has triggered an increase in mobile
phone penetration in the region. The analysis of mobile phone penetration is
illustrated in Fig. 11.7.
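As a rough illustration of this pace of improvement, the implied compound annual growth rate of internet access in sub-Saharan Africa can be computed directly from the two endpoint figures cited above. The sketch below is a minimal, illustrative calculation in Python; the endpoint values are simply the World Development Indicators figures quoted in this paragraph.

```python
# Implied compound annual growth rate (CAGR) of internet access in
# sub-Saharan Africa, from the endpoint figures cited above.
start_pct, end_pct, years = 6.13, 29.34, 10  # % of population, 2010 -> 2020

cagr = (end_pct / start_pct) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # roughly 17% per year
```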
The mobile phone penetration data is presented in Fig. 11.7, and the trend line is
similar to that of Fig. 11.6. This is not surprising, because the internet access rate can
determine the mobile phone penetration rate. Assessing the situation of Africa, the
mobile phone penetration rate in sub-Saharan Africa was 44.09 per 100 people in 2010
but increased to 81.9 by 2020. This implies that the mobile phone penetration rate has
almost doubled in 10 years, corroborating the surge in internet access over the period.
In this regard, internet access provides a huge opportunity to drive Fintech adoption,
since Al-Okaily et al. [1] and Crouhy et al. [7] emphasised that the internet is the
enabler of Fintech.

Fig. 11.6 Internet access by region, 2010–2020 (% of population; Europe & Central Asia, East Asia & Pacific, Middle East & North Africa, South Asia, Sub-Saharan Africa, Latin America & Caribbean). Source Authors' Sketch using Data Sourced from World Development Indicator (2021)

Fig. 11.7 Mobile phone penetration by region, 2010–2020 (subscriptions per 100 people; same regions as Fig. 11.6). Source Authors' Sketch using Data Sourced from World Bank World Development Indicator (2021)

11.3.4 Fintech Situational Analysis by Region

Financial technology has gained momentum globally, with people increasingly
enjoying financial services through computing devices connected to the internet.
This segment considers the level of Fintech adoption by analysing the regional
population that owns credit cards, performs digital transactions, and borrows from
financial institutions using mobile money apps. Based on this analysis, the study
presents Fig. 11.8.
Figure 11.8 presents the Fintech indicators for six regions of the world to depict
the position of financial technology globally and to compare the African region, as
an emerging region, with the rest of the world. From the analysis, East Asia & Pacific
and Europe & Central Asia perform better in financial technology. Specifically, for
these two regions, about 36 and 37% of the population, respectively, borrow money
from a financial institution using a mobile money app. Similarly, 79 and 75% of the
population made or accepted digital payments, while 34 and 38% owned credit cards.
This shows that technology has strongly penetrated these regions' financial systems,
enabling ease of transactions. On the other hand, the analysis shows that technology is
still penetrating other regions. Considering the situation of Africa, about 13% of the
sub-Saharan African population borrow money from a financial institution using a
mobile money app, 53% make or receive digital payments, and only 3% own credit
cards. This indicates that the adoption of Fintech in Africa is still at the initial stage,
requiring further penetration. Nonetheless, financial technology has a huge opportunity
for wider adoption in Africa owing to the increasing internet access rate and mobile
phone penetration.

Fig. 11.8 Fintech situational analysis (share of population aged 15+ in each region who borrowed money from a financial institution using mobile money, made or received digital payments, or owned a credit card). Source Authors' Sketch using Data Sourced from The World Bank Global Findex Database (2021)

11.4 Implication of Findings

The situational analysis section of this chapter is revealing, depicting the sustainability
financing reality in emerging nations. We discover that there is a global push for
sustainability financing to mitigate the negative effects of environmental degradation.
However, the efforts are disproportionate across regions globally. For instance, the
analysis reveals that Egypt spent more on climate financing than other African nations,
about $2600 million on green projects to mitigate the effects of climate change.
However, comparing the environmental sustainability financing of the African
countries to the East Asian countries that are also vulnerable to environmental pollution,
the analysis illustrates that sustainable environmental financing in Indonesia alone
outweighs the combined spending of the 10 selected countries across the 5 sub-regions
of Africa, which include the largest financiers of green projects on the continent. The
analysis depicts that while African countries are among the most vulnerable to climate
change, the financing capacity to build a climate-resilient ecosystem that promotes
mitigation and adaptation measures is lagging because of the weak coordination and
non-alignment of the African financial system in driving green projects that support
environmental quality. UNEP [31] supports this argument and shows that financial
systems need to be aligned with the call to mobilise financial resources to support
green growth. Additionally, African countries are debt-trapped, with obligations to
service bilateral and multilateral debt. This debt obligation creates revenue pressure
on these nations, constraining what can be injected into environmentally friendly
projects across the sectors of the economy.

Furthermore, the analysis reveals that the public sector in Africa bears the burden
of climate financing, while the private sector's contribution is minimal. The statistics
revealed by the Climate Policy Initiative [6] show that the public sector provides 86%
of climate financing needs in Africa while the private sector accounts for only 14%.
Similarly, our analysis finds that of this public climate financing, about 82% of the
funding is internationally sourced while 18% is domestic. The minimal private sector
involvement in Africa therefore possibly accounts for the wide disparity in climate
project funding in Africa compared with other regions globally. The role of the
financial sector in mobilising private funds for sustainable investment is thus essential
for African countries and emerging nations to close the wide gap in climate finance
needs.
Therefore, since the financial sector is essential in stimulating climate financing to
narrow the green funding gap among African nations, we evaluate the potential of
internet access and Fintech adoption to accelerate environmental sustainability
financing. The analysis shows that access to the internet in African countries has
substantially improved. The implication is that improved internet access constitutes
an enabler for the financial industry to advance financial services through financial
technology adoption. Also, through access to the internet, individuals and
organisations can enjoy the financial services provided by the financial industry. On this
basis, the situational analysis of Fintech penetration shows that digital transactions
and access to credit through mobile money apps are increasing in African nations.
Therefore, with this progress in internet access and financial technology adoption,
the financial sector can drive green financing by extending green bond opportunities
and credit services to the general public, which boosts environmental sustainability
financing.

11.5 Policy Outlook

Africa continues to be the continent most susceptible to climate change despite
contributing about 3% of global emissions, the most negligible greenhouse gas
contribution worldwide. This minimal contribution may not persist, owing to Africa's
projected population rise to about 3 billion by 2060. More electricity, transportation,
energy input to agriculture and manufacturing, and housing will be needed as the
population increases. As a result, the carbon footprint per capita will rise drastically,
potentially making Africa a massive contributor to GHG emissions by 2060. Ayompe
et al. [3] have projected that if the current level of emissions and population growth
trends persist, Africa's contribution to global warming will rise to about 30%. This
implies that by 2060 Africa's contribution to global warming could reach 30% if
there are no substantial commitments towards mitigation and adaptation efforts in
the region.
The findings justify the research problem by confirming the value of internet access
and financial technology as important alternative climate financing channels for
emerging nations, rather than sole dependence on the government's annual budget,
whose adequacy is largely questioned. Therefore, we suggest that the financial system
should actively engage in mobilising financial resources, through financial technology
adoption, to support investment in green projects. Such projects include renewable
mini-grids for electricity generation, funding for sustainable agriculture, long-term
financing avenues (via green bonds, debt, and equity) for electric car projects that
reduce the carbon footprint of Africa's transport sector, and general energy-efficient
investment in Africa. This environmentally sustainable financing will enable African
countries to develop a climate-change-resilient ecosystem that promotes sustainability
even as the continent's population doubles by 2060.

References

1. Al-Okaily, M., Al Natour, A.R., Shishan, F., Al-Dmour, A., Alghazzawi, R., Alsharairi, M.:
Sustainable FinTech innovation orientation: a moderated model. Sustainability 13(24), 13591
(2021)
2. Arner, D.W., Buckley, R.P., Zetzsche, D.A., Veidt, R.: Sustainability, FinTech and financial
inclusion. Eur. Bus. Organ. Law Rev. 21(1), 7–35 (2020)
3. Ayompe, L.M., Davis, S.J., Egoh, B.N.: Trends and drivers of African fossil fuel CO2 emissions
1990–2017. Environ. Res. Lett. 15(12), 124039 (2021)
4. Cen, T., He, R.: Fintech, green finance and sustainable development. In: 2018 International
Conference on Management, Economics, Education, Arts and Humanities (MEEAH 2018),
pp. 222–225. Atlantis Press (2018)
5. Chen, L., Ma, R., Li, J., Zhou, F.: Revolutionizing sustainable economic growth in China:
harnessing natural resources, green development, and fintech for a greener future. Res. Policy
92, 104944 (2024)
6. Climate Policy Initiative: Landscape of Climate Finance in Africa (2020). Avail-
able at: https://www.climatepolicyinitiative.org/publication/landscape-of-climate-finance-in-
africa/. Accessed 3 Jan 2023
7. Crouhy, M., Galai, D., Wiener, Z.: The impact of Fintechs on financial intermediation: a
functional approach. J. FinTech 1(01), 2031001 (2021)
8. Deng, X., Huang, Z., Cheng, X.: FinTech and sustainable development: evidence from China
based on P2P data. Sustainability 11(22), 6434 (2019)
9. Evans, O., Mesagan, E.P.: ICT-trade and pollution in Africa: do governance and regulation
matter? J. Policy Model. 44(3), 511–531 (2022)
10. Fisher, S., Bellinger, D.C., Cropper, M.L., Kumar, P., Binagwaho, A., Koudenoukpo, J.B., Park,
Y., Taghian, G., Landrigan, P.J.: Air pollution and development in Africa: impacts on health,
the economy, and human capital. Lancet Planet. Health 5(10), e681–e688 (2021)
11. Friedline, T., Naraharisetti, S., Weaver, A.: Digital redlining: poor rural communities’ access
to Fintech and implications for financial inclusion. J. Poverty 24(5–6), 517–541 (2020)
12. FsdAfrica: Current Levels of Climate Finance in Africa Falling Drastically Short of Needs
(2021). Retrieved from: https://www.fsdafrica.org/news/current-levels-of-climate-finance-in-
africa-falling-drastically-short-of-needs/. Accessed 9 Oct 2022
13. Giglio, S., Kelly, B., Stroebel, J.: Climate finance. Ann. Rev. Financ. Econ. 13, 15–36 (2021)
14. Gorton, G., Winton, A.: Financial intermediation. In: Handbook of the Economics of Finance,
vol. 1, pp. 431–552. Elsevier (2003)
15. Guang-Wen, Z., Siddik, A.B.: The effect of Fintech adoption on green finance and environ-
mental performance of banking institutions during the COVID-19 pandemic: the role of green
innovation. Environ. Sci. Pollut. Res. 1–13 (2022)
16. IEA: North Africa's Pathways to Clean Energy Transitions (2020). Available at: https://www.iea.org/commentaries/north-africa-s-pathways-to-clean-energy-transitions. Accessed 3 Jan 2023
17. Joya, B.: South Africa's Green Fund (2014). Available at: https://www.greenfinanceplatform.org/sites/default/files/downloads/best-practices/GGBP%20Case%20Study%20Series_South%20Africa_Green%20Fund.pdf. Accessed 3 Jan 2023
18. Khan, I.S., Ahmad, M.O., Majava, J.: Industry 4.0 and sustainable development: a systematic mapping of triple bottom line, circular economy and sustainable business models perspectives. J. Clean. Prod. 297, 126655 (2021)
19. Lewan, M.: The internet as an enabler of FinTech. In: The Rise and Development of FinTech,
pp. 190–204. Routledge (2018)
20. Melinda, M., Qiu, J.: Climate Finance in Southeast Asia: Trends and Opportunities (2022). Available at: https://www.iseas.edu.sg/articles-commentaries/iseas-perspective/2022-9-climate-finance-in-southeast-asia-trends-and-opportunities-by-melinda-martinus-and-qiu-jiahui/. Accessed 1 Jan 2023
21. Mesagan, E.P., Akinsola, F., Akinsola, M., Emmanuel, P.M.: Pollution control in Africa: the
interplay between financial integration and industrialisation. Environ. Sci. Pollut. Res. 29(20),
29938–29948 (2022)
22. Mesagan, E.P., Charles, A.O., Vo, X.V.: The relevance of resource wealth in output growth and
industrial development in Africa. Resour. Policy 82, 103517 (2023)
23. Mesagan, E.P., Vo, X.V., Emmanuel, P.M.: The technological role in the growth-enhancing
financial development: evidence from African nations. Econ. Change Restruct. 1–24 (2022).
https://doi.org/10.1007/s10644-022-09442-z
24. Muganyi, T., Yan, L., Sun, H.P.: Green finance, Fintech and environmental protection: evidence
from China. Environ. Sci. Ecotechnol. 7, 100107 (2021)
25. Nassiry, D.: The Role of Fintech in Unlocking Green Finance: Policy Insights for Developing
Countries (No. 883). ADBI Working Paper (2018)
26. Olaoye, O.: Environmental quality, energy consumption and economic growth: evidence from
selected African countries. Green Low-Carbon Econ. 1–9 (2023)
27. Olunkwa, C.N., Adenuga, J.I., Salaudeen, M.B., Mesagan, E.P.: The demographic effects of
Covid-19: any hope for working populations. BizEcons Q. 15(1), 3–12 (2021)
28. Piñeiro, V., Arias, J., Elverdin, P., Ibáñez, A.M., Morales Opazo, C., Prager, S., Torero, M.:
Achieving Sustainable Agricultural Practices: From Incentives to Adoption and Outcomes. Intl
Food Policy Res Inst (2021)
29. Sebestyén, V.: Renewable and Sustainable Energy Reviews: environmental impact networks
of renewable energy power plants. Renew. Sustain. Energy Rev. 151, 111626 (2021)
30. Tiseo, I.: Breakdown of Carbon Dioxide Emissions Worldwide 2000–2050, by Region (2022). Retrieved from: https://www.statista.com/statistics/1257801/global-emission-shares-worldwide-region-outlook/. Accessed 8 Oct 2022
31. UNEP Report: Aligning Africa's Financial System with Sustainable Development (2015). Available at: https://www.greengrowthknowledge.org/sites/default/files/downloads/resource/Aligning%20Africa%27s%20Financial%20System%20with%20Sustainable%20Development.pdf. Accessed 3 Jan 2023
32. Wang, J., Chen, X., Li, X., Yu, J., Zhong, R.: The market reaction to green bond issuance:
evidence from China. Pac. Basin Financ. J. 60, 101294 (2020)
33. World Bank: The Global Findex Database 2021: Financial Inclusion, Digital Payments, and Resilience in the Age of COVID-19 (2021). Retrieved from: https://www.worldbank.org/en/publication/globalfindex. Accessed 4 Jan 2023
34. World Bank: World Development Indicators (2021). Retrieved from: https://databank.worldbank.org/source/world-development-indicators. Accessed 4 Jan 2023
35. World Meteorological Organisation: State of Climate in Africa Highlights Water Stress and Hazards (2022). Retrieved from: https://public.wmo.int/en/media/press-release/state-of-climate-africa-highlights-water-stress-and-hazards. Accessed 4 Jan 2023
Chapter 12
A Comprehensive Review of Bitcoin's Energy Consumption and Its Environmental Implications

Sidhartha Harichandan, Sanjay Kumar Kar, and Abhishek Kumar

Abstract Bitcoin, which has the highest net worth among cryptocurrencies and
the most significant transaction volume, has immense prospects in terms of
economic cost, rapid processing, and minimal risk, even while driving significant
worldwide transformation and disruption. Since cryptocurrencies and Bitcoin are
new, have an uncertain legal status, and bear the possibility of being engaged in
illicit behaviour, there is the opportunity for their usage as a highly unpredictable and
impulsive investment instrument with environmental consequences. This research
studies Bitcoin mining and blockchain technologies, as well as Bitcoin's high energy
consumption and environmental impacts. The study describes the processes used to
estimate Bitcoin's energy consumption, discussing two prominent models: (a) Model 1,
by Christian Stoll, Lena Klaaßen, and Ulrich Gallersdörfer, and (b) Model 2, the
CBECI model, both used to determine the energy consumed in mining bitcoins. The
findings suggest that the power needed for Bitcoin mining has detrimental ecological
and social effects, contributing to global warming and climate change. Further, this
study also forecasts the future of Bitcoin mining and its influence on sustainability.

Keywords Bitcoin mining · Cryptocurrency · Environmental effect · Blockchain technology

S. Harichandan
Institute of Management Technology, Nagpur, India
S. K. Kar (B) · A. Kumar
Department of Management Studies, Rajiv Gandhi Institute of Petroleum Technology, Amethi,
Uttar Pradesh 229304, India
e-mail: [email protected]


12.1 Introduction

Over the past few years, there has been an increase in interest in bitcoin's usage
and potential as an investment vehicle. When people lost trust during the global
financial crisis, bitcoin emerged as a viable answer owing to its mathematical certainty
based on blockchain technology [1]. Bitcoin is a decentralised electronic payment
system that operates via a direct, anonymous, and secure web. By differentiating
itself from typical bank transactions, bitcoin, the most valuable cryptocurrency in
terms of market capitalization and transaction volume, delivers considerable
advantages to its users. While bitcoin has received much attention in recent years, its
essential mechanism, most notably blockchain technology, has risen in popularity at
breakneck speed [2]. Due to its key characteristics of decentralisation, auditability,
and anonymity, blockchain is widely regarded as one of the most promising and
attractive technologies for a variety of industries, including supply chain finance,
manufacturing operations management, logistics management, and the Internet of
Things (IoT). When bitcoin and other cryptocurrencies are used in conjunction with
blockchain technology, they are not controlled by any organisation or government,
unlike printed cash [3]. Bitcoin is produced, updated, and examined via the use of
cryptographic concepts and computer algorithms. With the birth of bitcoin, the market
saw the emergence of hundreds of alternative cryptocurrencies nicknamed altcoins
[4]. On the surface, cryptocurrencies seem to be exceedingly volatile and suitable for
speculation. Despite its promises and attractiveness, the first application of the current
consensus method in the bitcoin network's real functioning demonstrates that it has
a substantial energy and carbon emission cost. As a consequence, resolving this
problem expeditiously is crucial.
As noted above, when employing blockchain technology, bitcoin and other
cryptocurrencies are not controlled by any organisation or government, as fiat money is;
bitcoin is created, modified, and inspected using cryptographic principles and a
software algorithm on an established technical infrastructure [5]. Although
cryptocurrencies are autonomous, in that they may be used for money laundering and
unlawful activity while attracting users and investors, they remain reliant on government
decisions regarding bitcoin and its environmental impact. Based on this, this review
article aims to study the detrimental effects of bitcoin mining on society. The annual
carbon footprint of bitcoin is estimated to be 44.56 Mt CO2 equivalent, and it consumes
79.89 TWh of electricity, primarily generated from fossil fuels [6]. Additionally, a
single bitcoin transaction releases electronic waste equivalent to 406.50 g, amounting
annually to 44.59 kt of electronic waste released into Earth's ecosystem. As major
economies like the USA, China, Japan, the United Kingdom, and India take longer
strides towards making their economies sustainable and emission-free, with net-zero
targets between 2050 and 2070, it is the need of the hour to look at this emerging
issue [3]. Bitcoin mining is on the rise and is estimated to grow further in the future,
thus making sustainable mining a must for bitcoin miners.
The objective of this article centres on the environmental implications of
Bitcoin's energy consumption, particularly in the context of its mining process.
This issue stems from the significant amount of electricity consumed by the Bitcoin
network, primarily through its mining operations. This energy consumption is a
concern due to its association with carbon emissions and electronic waste, both
of which have negative environmental effects. The increasing adoption and growth
of Bitcoin mining exacerbate these concerns, making it imperative to address the
environmental impact and seek sustainable alternatives. In light of this, the present
article aims to explore and analyse the detrimental effects of Bitcoin mining on
society, present data regarding its carbon footprint and electronic waste generation,
and propose potential sustainable solutions to mitigate these adverse impacts. The
research objectives are:
(a) To analyse the negative environmental consequences of Bitcoin mining,
including its carbon footprint and electronic waste generation, in order to
understand the extent of the problem.
(b) To quantify the energy consumption of the Bitcoin network and its mining
operations, particularly in terms of electricity usage, and to highlight the reliance
on fossil fuels for energy generation.
(c) To provide insight into the annual carbon emissions resulting from Bitcoin
mining activities, emphasizing the ecological consequences of the energy-
intensive process.
(d) To explore and suggest sustainable alternatives and solutions that could mitigate
the environmental impact of Bitcoin mining, with a focus on reducing energy
consumption and carbon emissions.
The novelty of this article stems from its comprehensive coverage of Bitcoin’s
energy consumption and environmental implications, its focus on global economies,
its provision of specific data, its proposal of sustainable alternatives, and its implica-
tions for the future. These aspects collectively set the article apart and contribute to the
ongoing discourse around the environmental challenges posed by cryptocurrencies.
The study is organised into seven sections. Section 12.1 introduces the concept,
while Sect. 12.2 focuses on the literature review. Section 12.3 discusses bitcoin
mining and its implications, while Sect. 12.4 highlights the economics of bitcoin
mining. Section 12.5 discusses the findings, Sect. 12.6 suggests implications for
future sustainable mining of bitcoin, and Sect. 12.7 concludes the article.

12.2 Literature Review

There are several viewpoints about bitcoin as a cryptocurrency. These viewpoints
may be classified into two categories: (i) optimistic viewpoints that stress bitcoin's
benefits, and (ii) pessimistic viewpoints that emphasise bitcoin's shortcomings. The
fundamental argument of those with a more hopeful view of bitcoin is that it is
built on a solid technological and cryptographic foundation and cannot be manipulated.
Pessimists claim that since it lacks a definite core, it risks financial 'balloon-lunacy',
resulting in environmental devastation owing to the energy it requires. The first
group believes that cryptocurrencies should be subject to certain inspections and
regulations owing to their favourable attitude toward digital currencies, particularly
bitcoin, and that the system should be more legally based. Additionally, they argue
that bitcoin exchanges, regardless of their size, should be formed to monitor, examine,
and guarantee that procedures take place within a legal framework. Another key point
is that, as long as the system is legally sound and operates correctly, governments
may benefit from the market via taxes. The second group believes that bitcoin and
other digital currencies should be avoided and that no integration with this system
should occur, due to its usage for money laundering.
Following a study of the literature, debates often centre on whether bitcoin is a
bubble or a commodity, a currency or a financial investment instrument, and, more
recently, on bitcoin's energy usage. Numerous studies have been conducted on the
proper use of bitcoin to diversify portfolios, its utility as a hedge against the dollar
[7, 8], and the preference for it as an investing instrument rather than an alternative
payment method [9]. Another article describes bitcoin as both a speculative and a
conventional financial entity because of its unique structure [10]. Researchers have
compared bitcoin's volatility to that of other financial instruments and conclude
that the bitcoin market is very speculative [11]. The primary objection levelled
at cryptocurrencies is that they are often utilised for criminal activities such as
money laundering. Additionally, it is noted that cryptocurrencies, particularly bitcoin,
provide potential for tax evasion and may eventually replace tax havens [12]. In this
view, the bulk of publications consider bitcoin a speculative investment instrument
rather than an alternative currency [13]. Due to the scarcity of research on the
consequences of bitcoin's energy consumption and environmental impact, this study
intends to contribute to an increase in research on this subject.
Bitcoin has lately been in the headlines for a variety of reasons, including its value
and energy usage. The increasing level of its energy consumption, and the likelihood
that this consumption will continue to rise, entail a slew of negative consequences.
The fact that approximately 80% of the world's energy consumption is derived from
fossil fuels, and that this is unlikely to change in the near future, creates major
environmental challenges. The huge amount of energy consumption that bitcoin will reach
is considered a trigger for the depletion of limited fossil fuels. Bitcoin mining
is expanding in locations like China and India, where energy is primarily generated
by burning coal, resulting in a poor air quality index in their prominent economic hubs
and cities. To cover the existing research gap evident from the literature review,
the energy consumption and environmental impact of bitcoin are highlighted
in this research, which examines bitcoin mining and blockchain technology. The
energy used by growing bitcoin mining is seen as one of the primary impediments
to Bitcoin's growth.

12.3 Bitcoin Mining and Its Implications

12.3.1 The Concept of Bitcoin Mining

About every ten minutes, so-called miners connect new sets of transactions (blocks)
to Bitcoin's blockchain. These miners are not expected to trust each other when
operating on the blockchain; the code that controls Bitcoin is the only thing that
miners can rely on. Several rules are used in the code to ensure all transactions
are legitimate. For instance, a transaction is legal (valid) only if the sender actually
possesses the amount transferred. Every miner independently verifies that transactions
follow these rules, removing the need to rely on other miners.
The key is to get all miners involved in mining to agree on the same transaction
history. Each miner in the network is continuously charged with preparing
the blockchain's next set of transactions. Just one of these blocks will be randomly
chosen to become the chain's new block. Since random selection in a distributed
network is difficult, 'Proof-of-Work (PoW)' is used [14]. In PoW, the next block
is generated by the first miner to produce one. This is easier said than done, since the
Bitcoin algorithm makes mining very complicated [15]. Indeed, the protocol adjusts
the difficulty on a regular basis to ensure that all miners in the network produce
only one legitimate block every ten minutes on average. When one of the miners
eventually succeeds in producing a valid block, the rest of the network is notified.
Other miners will accept this block once they verify that it complies with all the rules,
and will then discard the block they were working on. The fortunate miner is
credited with a set number of coins in addition to the transaction fees associated with
the current block's processed transactions. The loop then repeats itself. The energy
efficiency of the hardware used for mining is measured in joules per terahash (J/TH)
[16]. Energy-efficient mining rigs operate at around 30 J/TH, while less efficient rigs
can consume over 100 J/TH [17].
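Since one watt is one joule per second, the J/TH figure converts directly into a rig's power draw and daily energy use. The following minimal Python sketch illustrates this arithmetic for a hypothetical 100 TH/s machine at the two efficiency levels just mentioned; the rig parameters are assumptions chosen purely for illustration.

```python
def rig_power_watts(hashrate_th_s: float, efficiency_j_th: float) -> float:
    """Power draw in watts: (TH/s) * (J/TH) = J/s = W."""
    return hashrate_th_s * efficiency_j_th

def rig_daily_energy_kwh(hashrate_th_s: float, efficiency_j_th: float) -> float:
    """Energy used over 24 hours, converted from watt-hours to kWh."""
    return rig_power_watts(hashrate_th_s, efficiency_j_th) * 24 / 1000

# Hypothetical 100 TH/s rig at the two cited efficiency levels.
print(rig_daily_energy_kwh(100, 30))   # efficient rig:   72.0 kWh/day
print(rig_daily_energy_kwh(100, 100))  # inefficient rig: 240.0 kWh/day
```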
The method of creating a legitimate block is largely trial and error, with miners
making many attempts per second to find the correct value for a block element
called the "nonce" and hoping that the finished block meets the specifications. As a
result, mining is often likened to a lottery in which you can pick your own numbers.
The hash rate of your mining equipment determines the number of attempts (hashes)
per second, usually measured in gigahashes per second.
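The nonce lottery described above can be made concrete with a toy proof-of-work loop. The sketch below uses Bitcoin's double SHA-256 hashing, but the header bytes and the difficulty are simplified stand-ins rather than the real block format; it only illustrates the trial-and-error search.

```python
import hashlib

def mine(header: bytes, difficulty_bits: int, max_nonce: int = 2**32):
    """Try nonces until the double-SHA-256 hash falls below the target."""
    target = 2 ** (256 - difficulty_bits)  # more difficulty bits = smaller target
    for nonce in range(max_nonce):
        payload = header + nonce.to_bytes(4, "little")
        digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()  # a valid proof of work
    return None  # exhausted; a real miner would vary other header fields

result = mine(b"toy-block-header", difficulty_bits=20)
print(result)  # found after roughly 2**20 (about a million) attempts on average
```

Each failed attempt is wasted work, which is exactly why the network's hash rate, and hence its energy draw, scales with competition among miners.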

12.3.2 Estimating Energy Consumption of Mining Farms

For years, determining the precise carbon footprint of the bitcoin network has been a
challenge. One needs to understand not only the bitcoin network's power
requirements but also where this power comes from. The location of miners is critical in
determining how dirty or clean the electricity they use for mining is. The
environmental impact of blockchain energy consumption is often measured in terms

of carbon emissions [18]. This can be estimated based on the energy consumption
and the carbon intensity of the electricity used for mining [19]. Carbon intensity
measures the amount of CO2 emissions produced per unit of energy (gCO2 /kWh)
[20]. The global average carbon intensity varies by region and energy sources used
[21]. Assuming a carbon intensity of 500 gCO2 /kWh (a typical value for regions with
a mix of energy sources), and using the calculated daily energy consumption from
above (840,000 MJ = 233,333 kWh), the daily carbon emissions can be estimated:

Carbon Emissions = Energy Consumption × Carbon Intensity (12.1)

For example, Carbon Emissions = 233,333 kWh × 500 gCO2 /kWh = 116,666,500
gCO2 = 116.67 tonnes CO2
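Expressed in code, Eq. (12.1) is a single multiplication plus a unit conversion. The sketch below simply reproduces the worked example, taking the energy and intensity inputs as given.

```python
def carbon_emissions_tonnes(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Eq. (12.1): emissions = energy consumption * carbon intensity (g -> tonnes)."""
    return energy_kwh * intensity_g_per_kwh / 1e6

# Worked example from the text: 233,333 kWh/day at 500 gCO2/kWh.
print(carbon_emissions_tonnes(233_333, 500))  # ~116.67 tonnes CO2 per day
```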
Just as it is difficult to identify the computers operating on the bitcoin network,
determining their location is similarly difficult. It is assumed that the bulk of these
miners are based in mainland China [22, 23]. The average pollution factor of the Chinese
grid is approximately 700 g of CO2 equivalent per kWh [24]. This average pollution
factor is used to calculate the carbon intensity of the electricity used for mining
(as a rough estimation). If 70% of bitcoin mining occurs in China and 30% of mining
is totally clean, this results in a weighted average carbon intensity of 490 g CO2eq/
kWh. This figure can then be applied to an estimate of the bitcoin network's power
consumption to calculate the network's carbon footprint.
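The weighted-average step can be sketched the same way. The geographic shares below follow the rough split assumed in the text (70% of mining on a roughly 700 gCO2eq/kWh grid, 30% fully clean), and the annual consumption figure is the 79.89 TWh estimate cited earlier in this chapter; the resulting footprint is indicative only and will differ from other published estimates built on different assumptions.

```python
# Weighted-average carbon intensity under the text's rough geographic split.
shares = [(0.70, 700.0),   # share of mining on the Chinese grid, gCO2eq/kWh
          (0.30, 0.0)]     # share assumed fully clean
intensity = sum(share * grid for share, grid in shares)
print(intensity)  # 490.0 gCO2eq/kWh, as stated above

# Applied to the annual consumption estimate cited earlier (79.89 TWh):
annual_kwh = 79.89e9
footprint_mt = annual_kwh * intensity / 1e12  # grams -> megatonnes
print(round(footprint_mt, 2))  # ~39.15 Mt CO2eq under these assumptions
```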
Approximately 65% of the global hash power belongs to China [25]. The
autonomous Xinjiang province alone accounts for 35.76% of the global total,
producing nearly half of the nation's hash power [26]. China's hash rate is roughly
nine times that of the United States, which accounts for only 7.24% of the world's hash
rate. The justification for such extensive mining in Xinjiang is the availability of cheap
coal. Though wind turbines surround the peaks of Urumqi in Xinjiang, they accounted
for less than a quarter of the power generated last year [27]. Coal makes up the
remainder.

12.3.3 Issues with Bitcoin Mining

Bitcoin mining is a method of making new coins that entails the use of computers to
solve complicated mathematical equations or crypto puzzles. Cryptocurrencies are
built on a shared network and require mining to operate. The Bitcoin protocol is
designed so that, on average, it takes those on the network about ten minutes to solve
the puzzle and validate a block. The process consumes a significant amount of energy,
as miners use large and powerful systems to mine blocks and validate transactions.
The mining process consumes the biggest share of bitcoin's resources. Miners are
compensated for their services with newly generated bitcoins and transaction
processing fees. Mining cryptocurrency often requires electricity produced from fossil
fuels. As the price of bitcoin increases, so does energy demand. The rising price
provides miners with an additional opportunity to mine coins and attracts new users
to the bitcoin network. This is a constant, never-ending process or cycle (Fig. 12.1),
which would eventually grow bigger and bigger with time.

Fig. 12.1 The cycle of bitcoin mining and its environmental implications
The constant block mining loop incentivizes bitcoin miners worldwide. Since
mining could provide a reliable source of income, people are willing to run energy-
intensive machinery in order to have a portion of it. This has resulted in the bitcoin
network’s overall energy consumption growing to enormous proportions over the
years, as the currency’s value scales new peaks. A single bitcoin transaction’s carbon
footprint is equivalent to 406.28 kg CO2 , equal to the carbon footprint of approx-
imately 1million VISA transactions. In terms of electricity energy consumption, a
single bitcoin transaction consumes 729 kWh of energy. Electronic waste released
from single bitcoin transactions is around 406.5 g, and annually, this waste accounts
for 44.59 kt [6]. Such is the drastic impacts of bitcoin mining on the environment.
The bitcoin network as a whole now uses more electricity than most countries. If bitcoin were a nation, it would consume 117.11 TWh per year, more than countries like the Philippines, Kazakhstan, and even the Netherlands [28]. Similar sources estimate that bitcoin, had it been a country, would rank among the top 30 energy users globally. The massive chunk of this energy comes from conventional sources, largely fossil fuels. The prime concern lies not only with their non-renewability and scarcity but also with the massive carbon emissions they generate. It would not be amiss to mention that, in the days to come, bitcoin's energy consumption may place it among the top-10 global energy consumers.

12.4 The Economics of Bitcoin Mining

Due to the growing number of miners and the continuous advancement of mining technology, crypto mining has become much more competitive, requiring a rising amount of computational power to succeed and hence more investment than in the early days. As such, potential miners must carefully analyse questions of cost and profitability, which are covered in further detail below.

12.4.1 Profitability of Bitcoin Mining

When calculating the profitability of crypto mining, numerous aspects must be considered. First, the chosen mining technique brings a range of hardware expenses, depending on the miner's selections: a specialised cryptocurrency mining component, such as a GPU (graphics processing unit) or ASIC (application-specific integrated circuit), may cost between US$600 and US$9,000 or more. Joining a mining pool may ease part of the requirement for high-end gear, but this generally comes with a pool fee. Alternatively, in a process known as cloud mining, miners may hire mining components from professional mining operators for a certain period of time. Second, there are the energy costs associated with operating the crypto mining system, which add up over time due to the power-intensive nature of the activity. These vary significantly across nations, which is why some countries become famous as crypto mining locales, as miners rush to the country with the cheapest power. Third, one must evaluate the difficulty of crypto mining. The difficulty is proportional to the aggregate computing power of the cryptocurrency's miners, with more aggregate computing power resulting in a higher difficulty; increased difficulty thus entails greater expenditure on the part of the miner. The difficulty level fluctuates according to the kind of cryptocurrency being mined at any given moment. Fourth, the value of the cryptocurrency that is mined should be greater than the total expenses; that is to say, the profitability of crypto mining is always contingent upon the cryptocurrency's volatility.
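The interplay of these four factors can be made concrete with a small Python sketch; every figure below (hash rate, power draw, fee, revenue rate) is a hypothetical placeholder for illustration, not data from this chapter.

# Illustrative-only profitability check covering the four factors above;
# every figure is a hypothetical placeholder, not data from the chapter.
hardware_cost = 3_000.0     # USD for one ASIC (text range: US$600-9,000)
hashrate_ths = 100.0        # unit hash rate in TH/s (assumed)
power_kw = 3.0              # unit power draw in kW (assumed)
electricity = 0.06          # USD per kWh (assumed)
pool_fee = 0.02             # 2% pool fee (assumed)
usd_per_ths_day = 0.08      # daily revenue per TH/s, set by difficulty and coin price (assumed)

daily_revenue = hashrate_ths * usd_per_ths_day * (1 - pool_fee)
daily_power_cost = power_kw * 24 * electricity
daily_profit = daily_revenue - daily_power_cost

print(f"daily profit: {daily_profit:.2f} USD")
if daily_profit > 0:
    print(f"hardware payback: {hardware_cost / daily_profit:.0f} days")

Under these assumed inputs the unit earns about US$3.52 per day, so the hardware pays for itself only after roughly 850 days, which illustrates how sensitive profitability is to electricity price and difficulty.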

12.4.2 Regulation of Bitcoin Mining

Apart from the inherent risk to cryptocurrency mining profitability, prospective miners should regularly monitor the government's stance on crypto mining and cryptocurrencies in general. Each country's attitude toward crypto mining may range from accommodating to outright prohibition, which may mean the end for small mining enterprises without the requisite cash to migrate. Since May and June of this year, China, which formerly accounted for more than half of the world's bitcoin mining operations, has been clamping down on the crypto mining business

throughout the nation. China’s State Council stated the necessity to limit financial
risk in this respect, while local governments in the country’s most active mining areas
cited the abuse of power or the use of electricity from highly polluting sources as
justifications for the crackdown.
On the other hand, mining-friendly nations such as Kazakhstan and several states
in the United States, such as Texas, have regulations that see crypto mining enterprises
as a potential boost to their economies. Kazakhstan, which now has the world’s second
largest bitcoin mining industry due to the influx of Chinese crypto miners following
the recent crackdown, formally legalised crypto mining in 2020, confirming its legal
status and amending the tax code to allow for crypto mining to be taxed based on the
miner’s electricity consumption. This hospitable attitude may change, however, as the
recent surge of Chinese cryptocurrency miners has pushed the demand for energy
to an all-time high, leading the government to push for a new draught legislation
rationing power to new crypto mines.
This seems to be the trend in nations where cryptocurrency mining enterprises are
establishing a foothold. Even if these nations’ governments retain a favourable or at
least neutral posture toward crypto mining, the high energy consumption associated
with mining may necessitate government involvement to regulate the amount of
power given to crypto miners. This is the situation in Iran, where a licensing structure has been developed for the crypto mining industry; yet the government was obliged to impose a four-month ban on all mining activities after the summer blackouts.

12.5 Discussion

12.5.1 Calculating Bitcoin’s Energy Consumption

To determine how much electricity it takes to generate a bitcoin, one must first understand a few basic concepts. To begin, what is the price of electricity in the mining area? Second, how much electricity will be consumed by the mining or decrypting processors? More efficient computing technology consumes less energy, resulting in lower utility bills. The lower the energy price, the lower the cost to miners; this raises the value of bitcoin for miners in areas with lower production costs. Bitcoins are created by computer-based miners who consume enormous quantities of electricity, which is why some scientists believe bitcoin is harmful to the ecosystem. Knowing how bitcoin is created is important for a greater understanding of how the electrical resources used to operate the bitcoin network function.
The first step is to determine the number of sums performed per second to solve the puzzles, and then to figure out how much energy each sum needs. "Hashes" is the term for these sums [29]. There are a lot of them, and they are usually measured in millions (megahashes), billions (gigahashes), or quintillions (exahashes). It is estimated that the bitcoin network's processors were producing up to 120 exahashes

every second in early 2020 [30]. Many firms have concentrated on Application-
Specific Integrated Circuit (ASIC) mining computers, although there are various
bitcoin-mining computers available. ASICs need less energy to perform calculations
[31].
While the overall network hash rate can be easily measured, it is difficult to determine what it reflects in terms of energy usage, due to the lack of a central registry of all operating devices. To arrive at a figure of watts used per gigahash per second (GH/s), energy-utilisation calculations rely on assumptions about which devices are operating and how they are distributed. The calculation of bitcoin's energy consumption centres on the premise of miners' income and expenses, as shown in Fig. 12.2. Since energy prices are a large part of ongoing costs, the bitcoin network's gross electricity usage is often linked to miner profits: simply stated, the greater the mining income, the more energy-intensive machinery can be funded. The revenue derived from a coin is measured after it is produced and decrypted by the miner, and a proportion of that revenue is assumed to be spent on electricity. The point to note here is that electricity prices vary from place to place. After determining the rate, it is translated into a consumption figure, and thus the energy consumption is calculated. Though there are numerous bitcoin mining computers available, research has mostly concentrated on ASIC mining computers [31, 32], owing to their speed and efficiency. Mining companies that use a lot of ASICs state that they consume only one watt of electricity per gigahash per second of computation while mining bitcoin [32]. Below are a few correlations of bitcoin's power consumption with other entities [33].
In 2020, the bitcoin network drew roughly 120 GW of power (a hash rate of 120 EH/s at about one watt per GH/s, as above).

Fig. 12.2 Process used to calculate bitcoin's energy consumption: (1) calculation of total bitcoin mining revenues; (2) estimation of what proportion of it is spent on energy (electricity) consumption; (3) calculation of how much miners pay per kWh; (4) conversion of cost to consumption



• 120 GW = 63 TWh/year (relating the energy usage to the volume of hash rates)
• 63 TWh/year = 13,200 million LEDs (note: 1 GW = 110 million LEDs), or
• 63 TWh/year = 375 million photo-voltaic (PV) cells (1 GW = 3.125 million PV cells), given that they are generating power at peak production
• Time taken to mine 1 bitcoin = 10 min (irrespective of the number of miners involved), using the average power derived from ASIC miners, all other factors remaining constant
• 10 min = 600 s
• Energy to mine 1 bitcoin = 120 GW × 600 s = 72,000 GJ, or 72 TJ
The above correlations are a few pointers to the vast amount of energy consumed in mining bitcoins. The energy consumed for mining in 2020 could have illuminated millions of families in energy-impoverished countries; it is also the equivalent of the energy that 375 million PV cells would have generated. These are just a few of the striking effects of bitcoin's energy consumption, let alone its greater ecological implications.
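For concreteness, a toy Python walk-through of the four steps of Fig. 12.2 is shown below; the revenue, cost share, and electricity price are invented placeholders chosen only to illustrate the arithmetic, not figures from the text.

# A toy walk-through of the four steps of Fig. 12.2; all inputs are invented
# placeholders chosen only to illustrate the arithmetic.
daily_miner_revenue_usd = 20e6    # step 1: total daily mining revenue (assumed)
energy_cost_share = 0.60          # step 2: share of revenue spent on electricity (assumed)
price_per_kwh = 0.05              # step 3: average price miners pay per kWh (assumed)

daily_energy_spend = daily_miner_revenue_usd * energy_cost_share
daily_consumption_kwh = daily_energy_spend / price_per_kwh   # step 4: cost -> consumption
print(f"{daily_consumption_kwh / 1e6:.0f} GWh/day, about "
      f"{daily_consumption_kwh * 365 / 1e9:.0f} TWh/year")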

12.5.2 Current Models Used to Calculate Bitcoin's Energy Consumption

12.5.2.1 Model-1 by Christian Stoll, Lena Klaaßen, Ulrich Gallersdörfer

In an analysis titled "The Carbon Footprint of Bitcoin" [34], the authors properly account for these geographical disparities (while also adding a novel approach for localising miners based on IP addresses), but nevertheless find a weighted average carbon intensity of 480–500 gCO2eq per kWh for the entire Bitcoin network (in line with previous, rougher estimations).

a. The Lower Threshold

A situation whereby all miners use the most efficient computational hardware defines the lower threshold. The lower threshold of the range is determined by multiplying the necessary computing power (as shown by the hash rate) by the energy consumption of the most efficient hardware:

Power Consumption lower (PCL) = Hash rate (HR) × Energy efficiency of the most efficient computational hardware (EF)

PCL = HR × EF    (12.2)

b. The Upper Threshold

The break-even rate for sales and energy costs determines the upper bound. Miners would disconnect their hardware from the network if their costs exceeded their income from mining and validation, based on rational behaviour and attitude:

Power Consumption upper (PCU) = [{(Block value (BV) + Transaction fees (TF)) × Market value (VM)} / Energy price (EP)] × 1/Timespan (TS)

PCU = [{(BV + TF) × VM} / EP] × 1/TS    (12.3)

c. Optimising Threshold

The optimal threshold is based on the lower limit but considers the network's projected energy efficiency and the additional loss from cooling the processor and auxiliary units:

Power Consumption optimal (PCO) = HR × Realistic efficiency of hardware (EH) × Additional loss (AL)

PCO = HR × EH × AL    (12.4)

The network's realistic energy efficiency can be calculated utilizing the pricing power of mining hardware manufacturers and the energy efficiency of the hardware in use:

EH = [Σ(i=1..n) SA,i × EFA,i] + [1 − Σ(i=1..n) SA,i] × EZ    (12.5)

EH   Realistic energy efficiency
i    Mining manufacturers (1, 2, 3, …, n)
SA   Share of Application Specific Integrated Circuit (ASIC) producer
EZ   Efficiency (energy) for 0 profit
EFA  Efficiency (energy) of ASIC
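To show how Eqs. (12.2)-(12.5) combine, here is a small Python sketch of the three thresholds; all numeric inputs (efficiencies, prices, fees, cooling overhead) are assumptions for illustration, not values from the study.

# Sketch of the three thresholds of Eqs. (12.2)-(12.5); all numeric inputs are
# assumptions for illustration, not values from the study.
hash_rate = 120e18               # HR: network hash rate in hashes/s (early-2020 figure)
best_efficiency = 1.0e-10        # EF: joules per hash of the best hardware (assumed)

pc_lower_w = hash_rate * best_efficiency                     # Eq. (12.2)

block_value = 6.25               # BV: block reward in BTC
tx_fees = 0.5                    # TF: fees per block in BTC (assumed)
market_value = 30_000.0          # VM: USD per BTC (assumed)
energy_price = 0.05              # EP: USD per kWh (assumed)
timespan_h = 10 / 60             # TS: average block interval in hours
pc_upper_kw = ((block_value + tx_fees) * market_value / energy_price) / timespan_h  # Eq. (12.3)

s_asic, e_asic, e_zero = 0.9, 1.1e-10, 2.0e-10               # Eq. (12.5) inputs (assumed)
realistic_efficiency = s_asic * e_asic + (1 - s_asic) * e_zero   # EH
cooling_loss = 1.1               # AL: 10% cooling/auxiliary overhead (assumed)
pc_optimal_w = hash_rate * realistic_efficiency * cooling_loss   # Eq. (12.4)

print(f"lower {pc_lower_w / 1e9:.1f} GW, "
      f"best guess {pc_optimal_w / 1e9:.1f} GW, "
      f"upper {pc_upper_kw / 1e6:.1f} GW")

With these assumed inputs the sketch yields a lower bound of 12 GW, a best guess near 16 GW, and an upper bound of about 24 GW, illustrating how the three thresholds bracket the unknown true consumption.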

12.5.2.2 Model-2—The CBECI Model

The CBECI (Cambridge Bitcoin Electricity Consumption Index) is a real-time calculation of the bitcoin network's total electricity load and consumption. The model is based on a bottom-up method first introduced by Marc Bevand in 2017 and covers various types of accessible mining hardware [35]. Given the inability to determine exact electricity usage, the CBECI offers multiple estimates, including a lower-bound (floor) and an upper-bound (ceiling) calculation. A best-guess approximation is estimated within the limits of this range to provide a more accurate figure

that appears nearest to bitcoin's actual annual energy usage. The first figure, calculated in gigawatts (GW), corresponds to the overall electrical power used by the bitcoin network; this number is updated every 30 s and represents the amount of power bitcoin is drawing. The second figure is calculated in terawatt-hours (TWh) and corresponds to the bitcoin network's cumulative annual energy usage, annualizing bitcoin's energy use under the assumption of constant power demand at the above rate. A 7-day moving average is applied to the resulting data points (as evident in Fig. 12.3), making the value less reliant on short-term hash-rate fluctuations and more appropriate for comparisons with alternative energy sources. The CBECI model is based on the concept that miners operate their machines only as long as they remain profitable given energy costs. To calculate the profitability of a given hardware type, we consider the total miner revenues, the total network hash rate, the electricity efficiency of the hardware in question, and the average electricity price miners have to pay per kWh.

EF (Mining hardware's energy efficiency) × EC (Cost of electricity) ≤ MR (Revenue from mining)

EF × EC ≤ MR    (12.6)

Fig. 12.3 The estimated and minimum energy consumption of Bitcoin from January 2017 to Feb
2023*. Source Created by authors with data from [6]. *It is assumed that the processor used in the
mining is Antminer S19 pro (2020 onwards), Antminer S9 (2019–20), Antminer S15 (2018–19)
and Antminer S17e (2017–18)

The level of profit (θ) is then estimated as:

θ = MR / EC (12.7)

a. The Lower Threshold

Power Consumption lower (PCL) × Cost of electricity (EC) = minimum [effective hardware energy efficiency {(EEFH)(EC)}] × Hash rate (HR) × Effective use of power (PU) × 3.16 × 10⁷

PCL × (EC) = min.[{(EEFH)(EC)}] × HR × PU × 3.16 × 10⁷    (12.8)

where 3.16 × 10⁷ is the number of seconds in a year.

b. The Upper Threshold

Power Consumption upper (PCU) × Cost of electricity (EC) = maximum [least effective hardware efficiency {(EEFH)(EC)}] × Hash rate (HR) × Effective use of power (PU) × 3.16 × 10⁷

PCU × EC = max.[{(EEFH)(EC)}] × HR × PU × 3.16 × 10⁷    (12.9)

c. Optimising Threshold

Eoptimising (EC) = [Σ(i=1..n) EF,i / n] × HR × PU × 3.16 × 10⁷    (12.10)
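A minimal Python sketch of these CBECI-style bounds follows; the hardware-efficiency list and the power-usage factor are hypothetical assumptions, used only to show how Eqs. (12.8)-(12.10) turn a set of still-profitable hardware types into a consumption range.

# Sketch of the CBECI-style bounds (Eqs. 12.8-12.10); the hardware efficiency
# list and the PUE factor are hypothetical assumptions.
SECONDS_PER_YEAR = 3.16e7        # the constant appearing in the equations

hash_rate_ghs = 120e9            # HR in GH/s (about 120 EH/s, assumed)
pue = 1.1                        # PU: effective power-usage factor (assumed)
profitable_j_per_gh = [0.03, 0.05, 0.10]   # J/GH of still-profitable hardware (assumed)

def annual_twh(j_per_gh):
    joules = j_per_gh * hash_rate_ghs * pue * SECONDS_PER_YEAR
    return joules / 3.6e15       # joules -> TWh

lower = annual_twh(min(profitable_j_per_gh))                            # Eq. (12.8)
upper = annual_twh(max(profitable_j_per_gh))                            # Eq. (12.9)
best = annual_twh(sum(profitable_j_per_gh) / len(profitable_j_per_gh))  # Eq. (12.10)
print(f"lower {lower:.0f} TWh, best guess {best:.0f} TWh, upper {upper:.0f} TWh")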

12.6 Sustainability and Future of Mining

As more people attempt bitcoin mining, more carbon will be emitted in the future. Several analysts believe that all 21 million bitcoins will have been mined by 2140 [36–38]. On the other hand, the energy used for this and its environmental consequences cannot be overlooked. Currently, there are 18.7 million bitcoins in circulation [39], a figure that changes roughly every 10 min as new blocks are mined. At present, each new block contributes 6.25 bitcoins to the system [40].
Blockchain today is seen by many as more than just cryptocurrency. Bitcoin has succeeded in establishing a global, transparent monetary structure, but it falls short as a general-purpose blockchain network. Smart contracts, for example, are expected to challenge existing market models of banking, commerce, and logistics. Blockchain, like all previous transformative developments, is merely the framework and enabler of innovative applications. Nonetheless, with regard to the environmental consequences, it highlights the need for further studies on externalities to assist policymakers in establishing appropriate rules for the implementation of these innovations.

12.6.1 The Discomforts of Switching to Renewable Electricity for Mining

Because the most cost-effective computers earn the highest income, miners are motivated not only to use the most reliable hardware but also to seek out the cheapest source of power. A common location for such low-cost electricity is China's Sichuan province [41]; an estimated 48% of the world's mining capacity is located there [42]. The southwest of China is capable of generating vast volumes of hydropower, despite somewhat lower local demand. It is worth mentioning that China's grid infrastructure is currently a barrier to clean-energy delivery [43]: the region's power-export capability is constrained by inadequate grid penetration and a shortage of high-quality grid infrastructure. This leaves the provinces of Sichuan and Yunnan with a surplus of hydropower, which attracts energy-hungry and polluting industries looking to take advantage of the low prices; bitcoin mining is one such industry. Unlike the power consumption of bitcoin mining equipment, which is constant throughout the year, hydropower generation is seasonal. It is critical to understand that, while renewable energy sources are intermittent, bitcoin miners have a persistent energy demand. Once turned on, a bitcoin ASIC miner may remain on until it either fails or becomes incapable of mining bitcoin profitably [44]. As a result, bitcoin miners boost the grid's baseload demand: they need electricity not only when renewables are abundant but also during periods of supply scarcity. This makes it clear why renewables can only act as a secondary power option for mining, and why conventional hydrocarbon-based sources will remain miners' favourite.

12.6.2 Proof-of-Stake as an Alternative Strategy

Though proof-of-work was the first consensus algorithm to demonstrate its validity, it is not the only one. Recent years have seen the emergence of more energy-efficient algorithms, such as proof-of-stake. Proof-of-stake coins are created by coin owners rather than miners, eliminating the need for power-hungry machines that generate as many hashes per second as possible [45]. As a result, proof-of-stake consumes far fewer resources than proof-of-work. Ethereum, the second-biggest cryptocurrency at the time of writing, still runs on proof of work but is preparing to move to proof of stake. If Ethereum can transition to proof of stake, bitcoin, technically, can as well. Bitcoin will eventually have to adopt such a consensus algorithm to improve its environmental sustainability dramatically. The only disadvantage is that there are several proof-of-stake implementations, and none of them has been completely validated yet. Nonetheless, the research on these algorithms provides reason for optimism for the future.
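For intuition, a toy proof-of-work loop in Python is shown below. It is a deliberate simplification (a single SHA-256 pass rather than Bitcoin's double hash, and an arbitrary difficulty) meant only to show why the hash race consumes energy that proof-of-stake avoids.

# Toy proof-of-work: search for a nonce whose SHA-256 digest falls below a
# target. Simplified relative to Bitcoin (single hash, arbitrary difficulty).
import hashlib

def mine(header, difficulty_bits=20):
    target = 2 ** (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).hexdigest()
        if int(digest, 16) < target:
            return nonce, digest        # a valid "block" found
        nonce += 1                      # each failed attempt is pure energy spend

nonce, digest = mine(b"toy block header")
print(nonce, digest[:16])

Raising difficulty_bits by one doubles the expected number of hash attempts, which is exactly the mechanism that drives the network's energy use; a proof-of-stake validator performs no such search.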

12.6.3 Limitations on Circuit Applications for Reducing Electronic Wastages

Apart from consuming enormous quantities of electricity, bitcoin also contributes to an increasing amount of electronic waste. Since bitcoin was first released in 2009, users mined the currency using standard machines. However, in 2013, miners shifted to ASICs capable of performing only mining algorithms. This sparked a computational arms race, in which only the most powerful ASICs are capable of winning the race for new bitcoins. When more contemporary models of the circuits become available (on average about every 18 months), old units become redundant [46]. These pieces of equipment are so operationally sensitive that if they do not assist miners in decrypting profitably, they become obsolete. The Antminer S19 is the most recent ASIC on the market, and the replacement of existing ASICs with this version is expected to produce at least 11,000 tonnes of electronic waste. This equates to 135 g of e-waste per bitcoin transaction [6]. As with proof of stake, more energy-efficient circuits and processors should be designed for mining in the future. Country-level regulatory bodies must also be set up to oversee the legalisation of such energy-consuming circuits, and power limits must be imposed on them in the days to come. Regular monitoring and environmentally friendly disposal of these ASICs must be an ethical obligation for every bitcoin miner and circuit manufacturer.

12.7 Conclusion

Cryptocurrencies have grown in popularity over the last several years, capturing the interest of both consumers and investors. At the moment, bitcoin and other altcoins are used mainly as a prospective investment tool rather than a medium of commercial exchange. This research investigated the environmental impacts of mining Bitcoin, the world's first cryptocurrency, both in terms of market capitalization and transaction volume. It is widely accepted that fossil fuels account for about 80% of worldwide energy consumption, and this picture is unlikely to alter in the near future. Despite significant advancements in alternative energy sources, it is clear that the hydrocarbon industry mostly fulfils the world economy's energy demands. With this in mind, the energy consumed by miners throughout the processes of verifying, recording, and creating Bitcoin has been investigated here. It has been stressed that the enormous amount of energy required to mine Bitcoin is not sustainable because of the high demand for computing power.
Bitcoin's daily growth in energy consumption has resulted in it consuming more energy than several countries and poses numerous risks to the cryptocurrency's future. It is well known that, to minimise the high energy expenses associated with Bitcoin mining, individuals and businesses have moved these activities to nations with cheap electricity. However, bitcoin transactions and mining require

energy, which is supplied from coal and thermal power plants (hydrocarbons), and this increases CO2 emissions and contributes to global warming, air pollution, and even death rates. Environmental sustainability is critical for the world's development and progress, and as outlined in the Paris Climate Agreement, measures against global warming and climate change must be implemented. In this period, Bitcoin's energy-consumption scale has a detrimental effect on the environment and is one of the most important impediments to Bitcoin's development.

References

1. Jiang, S., Li, Y., Lu, Q., Hong, Y., Guan, D., Xiong, Y., Wang, S.: Policy assessments for
the carbon emission flows and sustainability of bitcoin blockchain operation in China. Nat.
Commun. 12, 1–10 (2021)
2. De Vries, A.: Cryptocurrencies on the road to sustainability: ethereum paving the way for
Bitcoin. Patterns, 100633 (2022)
3. Erdogan, S., Ahmed, M.Y., Sarkodie, S.A.: Analyzing asymmetric effects of cryptocurrency
demand on environmental sustainability. Environ. Sci. Pollut. Res., 1–11 (2022)
4. Alshahrani, H., Islam, N., Syed, D., Sulaiman, A., Reshan, A., Saleh, M., Rajab, K., Shaikh,
A., Shuja-Uddin, J., Soomro, A.: Sustainability in blockchain: a systematic literature review
on scalability and power consumption issues. Energies 16, 1510 (2023)
5. Mustafa, F., Lodh, S., Nandy, M., Kumar, V.: Coupling of cryptocurrency trading with the
sustainable environmental goals: is it on the cards? Bus. Strateg. Environ. 31, 1152–1168
(2022)
6. Digiconomist: Bitcoin Energy Consumption Index (2023). Digiconomist.com. https://digico
nomist.net/bitcoin-energy-consumption/
7. Bao, H., Li, J., Peng, Y., Qu, Q.: Can bitcoin help money cross the border: international evidence.
Financ. Res. Lett. 49, 103127 (2022)
8. Cole, B.M., Dyhrberg, A.H., Foley, S., Svec, J.: Can bitcoin be trusted? Quantifying the
economic value of blockchain transactions. J. Int. Financ. Mark. Inst. Money 79, 101577
(2022)
9. Yavuz, M.S., Bozkurt, G., Boğa, S.: Investigating the market linkages between cryptocurrencies
and conventional assets. EMAJ Emerg. Mark. J. 12, 36–45 (2022)
10. Kubal, J., Kristoufek, L.: Exploring the relationship between Bitcoin price and network’s
hashrate within endogenous system. Int. Rev. Financ. Anal. 84, 102375 (2022)
11. Murty, S., Victor, V., Fekete-Farkas, M.: Is bitcoin a safe haven for Indian investors? A GARCH
volatility analysis. J. Risk Financ. Manag. 15, 317 (2022)
12. Mariani, F., Polinesi, G., Recchioni, M.C.: A tail-revisited Markowitz mean-variance approach
and a portfolio network centrality. Comput. Manag. Sci. 19, 425–455 (2022)
13. Baur, D.G., Oll, J.: Bitcoin investments and climate change: a financial and carbon intensity
perspective. Financ. Res. Lett. 47, 102575 (2022)
14. Frankenfield, J.: Proof of Work (PoW). Investopedia.com (2021). https://www.investopedia.
com/terms/p/proof-work.asp
15. Aste, T.: The fair cost of bitcoin proof of work. SSRN Electron. J., 0–2 (2016).https://doi.org/
10.2139/ssrn.2801048
16. Hallinan, K.P., Hao, L., Mulford, R., Bower, L., Russell, K., Mitchell, A., Schroeder, A.: Review
and demonstration of the potential of bitcoin mining as a productive use of energy (PUE) to
aid equitable investment in solar micro-and mini-grids worldwide. Energies 16, 1200 (2023)
17. Kumari, P., Mamidala, V., Chavali, K., Behl, A.: The changing dynamics of crypto mining and
environmental impact. Int. Rev. Econ. Financ. (2023)
264 S. Harichandan et al.

18. Sibande, X., Demirer, R., Balcilar, M., Gupta, R.: On the pricing effects of bitcoin mining in
the fossil fuel market: the case of coal. Resour. Policy 85, 103539 (2023)
19. Bruno, A., Weber, P., Yates, A.J.: Can Bitcoin mining increase renewable electricity capacity.
Resour. Energy Econ., 101376 (2023)
20. Asgari, N., McDonald, M.T., Pearce, J.M.: Energy modeling and techno-economic feasibility
analysis of greenhouses for tomato cultivation utilizing the waste heat of cryptocurrency miners.
Energies 16, 1331 (2023)
21. Sapra, N., Shaikh, I.: Impact of bitcoin mining and crypto market determinants on bitcoin-based
energy consumption. Manag. Financ. (2023)
22. Chow, S., Peck, M.E.: The bitcoin mines of China. IEEE Spectr. 54, 46–53 (2017). https://doi.
org/10.1109/MSPEC.2017.8048840
23. Jiang, S., Li, Y., Lu, Q., Hong, Y., Guan, D., Xiong, Y., Wang, S.: Policy assessments for
the carbon emission flows and sustainability of Bitcoin blockchain operation in China. Nat.
Commun. 12, 1–10 (2021). https://doi.org/10.1038/s41467-021-22256-3
24. Mittal, M.L.: Estimates of emissions from coal fired thermal power plants in India 39, 1–22
(2010)
25. Gogo, J.: 65% of Global Bitcoin Hashrate Concentrated in China. Bitcoin.com (2020)
26. Benetton, M., Compiani, G., Morse, A.: CryptoMining: pollution, government incentives and
energy crowding out. (2019)
27. Murtaugh, D.: The possible Xinjiang coal link in Tesla’s bitcoin binge. Bloom (2021)
28. EIA: International electricity consumption. US Energy Inf. (2019)
29. Narayanan, A.: Hearing on Energy Efficiency of Blockchain and Similar Technologies (2018)
30. Redman, J.: BTC’s Hashrate Touches 120 Exahash, But the Price Has Not Followed (2020).
Bitcoin.com. https://news.bitcoin.com/btcs-hashrate-touches-120-exahash-but-the-price-has-
not-followed/
31. Li, J., Li, N., Peng, J., Cui, H., Wu, Z.: Energy consumption of cryptocurrency mining: a study
of electricity consumption in mining cryptocurrencies. Energy 168, 160–168 (2019). https://
doi.org/10.1016/j.energy.2018.11.046
32. Küfeoğlu, S., Özkuran, M.: Bitcoin mining: a global review of energy and power demand.
Energy Res. Soc. Sci. 58, 101273 (2019). https://doi.org/10.1016/j.erss.2019.101273
33. OEERE: How Much Power is 1 Gigawatt? (2019). energy.gov. https://www.energy.gov/eere/
articles/how-much-power-1-gigawatt
34. Stoll, C., Klaaßen, L., Gallersdörfer, U.: The carbon footprint of bitcoin. Joule 3(7), 1647–1661
(2019). https://www.cell.com/joule/abstract/S2542-4351(19)30255-7&lang=en
35. CCAF: Cambridge Bitcoin Electricity Consumption Index (2017)
36. Hayes, A.: What Happens to Bitcoin After All 21 Million Are Mined? (2021). Investo-
pedia.com. https://www.investopedia.com/tech/what-happens-bitcoin-after-21-million-
mined/
37. Kim, C.: With 18 million bitcoins mined, how hard is that 21 million limit? (2019). coindesk
indices. https://www.coindesk.com/with-18-million-bitcoins-mined-how-hard-is-that-21-mil
lion-limit
38. Yermack, D.: Is bitcoin a real currency? SSRN Electron. J. (2013). https://doi.org/10.2139/
ssrn.2361599
39. de Best, R.: Number of bitcoins in circulation worldwide from October 2009 to April 13,
2021(in millions) (2021). Statista.com. https://www.statista.com/statistics/247280/number-of-
bitcoins-in-circulation/
40. Song, Y.D., Aste, T.: The cost of bitcoin mining has never really increased. arXiv 3, 1–8 (2020).
https://doi.org/10.3389/fbloc.2020.565497
41. Cocco, L., Tonelli, R., Marchesi, M.: An agent based model to analyze the bitcoin mining
activity and a comparison with the gold mining industry. Futur. Internet 11, 1–12 (2019).
https://doi.org/10.3390/fi11010008
42. de Vries, A.: Bitcoin’s growing energy problem. Joule 2, 801–805 (2018). https://doi.org/10.
1016/j.joule.2018.04.016
12 A Comprehensive Review of Bitcoin’s Energy Consumption and Its … 265

43. Leyman, P., Vanhoucke, M., Althusser, L., Foucault, M.: The soul is the prison of the body.pdf.
Int. J. Prod. Res. (2018). ISBN: 978-92-9260-061-7
44. de Vries, A.: Renewable energy will not solve bitcoin’s sustainability problem. Joule 3, 893–898
(2019). https://doi.org/10.1016/j.joule.2019.02.007
45. Ismail, L., Materwala, H.: A review of blockchain architecture and consensus protocols: use
cases, challenges, and solutions. Symmetry 11 (2019). https://doi.org/10.3390/sym11101198
46. Williamson, S.: Is bitcoin a waste of resources? Fed. Reserv. Bank St. Louis Rev. 100, 107–115
(2018). https://doi.org/10.20955/R.2018.107-15
Chapter 13
Emerging Economies: Volatility
Prediction in the Metal Futures Markets
Using GARCH Model

Ravi Kumar, Babli Dhiman, and Naliniprava Tripathy

Abstract This paper aims to study the volatility and its prediction using the GARCH
(1,1) model in the metal futures of two emerging economies, India and China. The
Metals considered for the study are aluminium, copper, lead, nickel, zinc, gold,
and silver. This study uses daily data from January 2016 to May 2021 from the
Shanghai Futures Exchange (SHFE) and Multi Commodity Exchange (MCX). The
study’s findings suggest the presence of short-run, long-run, and overall persistence
of shocks for all the metals.

Keywords Volatility · Commodity market · Futures market · GARCH · Emerging


economy

JEL Classification C22 · G13 · G15

13.1 Introduction

Trading in commodities has a much longer history than today’s frequently traded
asset classes like stocks, mutual funds, and even real-estate. It dates to the era when
people had no common currency, and the barter system prevailed. In modern times,
trading in commodities is still taking place, instead with more complex contracts
like futures and options, with more dedicated nationalized institutions, regulators,
and other important stakeholders. Trading in a commodity is as essential as anything

R. Kumar (B) · B. Dhiman


Mittal School of Business, Lovely Professional University, Phagwara, India
e-mail: [email protected]
B. Dhiman
e-mail: [email protected]
N. Tripathy
Indian Institute of Management Shillong, Meghalaya 793014, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 267
L. A. Maglaras et al. (eds.), Machine Learning Approaches in Financial Analytics,
Intelligent Systems Reference Library 254,
https://doi.org/10.1007/978-3-031-61037-0_13

for economic development, for farmers' growth and financial safety, and for strengthening other economic parameters like GDP and per capita income. It has a significant role in bringing price stability across the market and in hedging price risks, which is beneficial for agriculturists and for manufacturing industries using agricultural products as raw materials. For net-importing countries, price movements predominantly affect the economy [12].
Moreover, [7] highlight two critical roles of the futures market: hedging risks and
price discovery processes. Pavabutr and Chaihetphon [12] underline the futures market's importance, as this market responds to new information faster than the spot market owing to lower transaction costs and a higher degree of leverage. The commodity market
provides a new asset class with the benefit of active participation in the commodity
market and helps disperse risk concentration. In the Chinese commodity futures
market, commodity futures have also been found to provide an effective tool for
the diversification of assets and combatting expected and unexpected inflation in the
economy [17]. However, the growth of commodity derivatives in such a globalized
and liberalized economy is not up to the expectations of investors and economists
because many others believe that speculation in commodities, especially in food
commodities, would cause malfunctioning of the spot market, and the prices could
be badly manipulated. This may result in an inflationary effect on the essential
commodities.
Moreover, in the absence of active participation by small farmers, this market often fails to accommodate farmers [6]. Contrary to the theory of benefits to hedgers,
derivative markets have usually been found to be more favorable to speculators than
hedgers, and the possible reasons could be rigid contract specification, big lot size,
high transaction cost, taxes, and government intervention in free play. It is noticed that the development of, and studies on, the commodity market are undoubtedly unmatched by this market's potential [10, 18, 19]. The issue of risk is universal in any asset class. Although the volatility of some commodities is frequently studied, little attention has been given to base metals and precious metals in these two emerging markets. To address this gap, the present study aims to extend the literature in this area by investigating the extent to which ARCH and GARCH effects have been present in base-metal and precious-metal price returns.
Since sound investment decisions are based on the risk-return trade-off, and since increased investment activity in the commodity market needs careful analysis and estimation of future expected returns, this paper aims to study the characteristics of volatility and its prediction in the base metals (aluminium, copper, lead, nickel, and zinc) and precious metals (gold and silver) futures markets of India and China, representing the most prominent emerging economies, using the GARCH(1,1) model to deliver more accurate forecasts of future variance compared to the unconditional variance.

The rest of the paper is structured as follows. Section 13.2 describes a brief
literature review of the stock and commodity market volatility studies in various
economies. Further, the data and methodology used have been pronounced in
Sect. 13.3. The results and discussion are elaborated in Sect. 13.4, and the conclusion
is deliberated in Sect. 13.5.

13.2 Literature Review

An investor needs to assess not only the return of a financial asset but also take care
of its risk. Risk is measured by modeling the volatility in the returns of a security.
Higher volatility represents a higher risk, and lower volatility indicates a lower risk.
Therefore, modeling and predicting volatility with greater accuracy is very important
in assessing a financial asset. Various authors have used the parsimonious model,
GARCH (1,1), to model and predict the volatility of a financial asset.
Karmakar [8] used standard and asymmetric models of GARCH to test the
predictability and asymmetricity of volatility in the Indian stock market returns.
The author reports the persistence and predictability of volatility in the market.
Regarding asymmetricity in volatility, it is reported that the market showed higher
volatility in times of the declining market. Bahadur [2] found GARCH (1,1) as the
most appropriate model for volatility forecasting in the Nepalese stock market and
reported clustering of high and low volatility periods, persistency, and the possibility
of prediction of volatility in the market. Similarly, [16] analyzed the Indian stock
market index (Sensex) using symmetric and asymmetric models of GARCH. They
found that despite the leverage effect, the symmetric GARCH model had a better
forecast of market volatility. [1] compared the predictive ability of the GARCH (1,1) model and the implied volatility obtained from inverting the Black equation; modeling the volatility of WTI futures contracts traded on the NYMEX, the author found the results of the GARCH models to be more accurate [1]. Mahalakshmi
et al. [11] also used GARCH (1,1) on the MCX commodity index data from 2006
to 2011 and found the significant impact of its past price movements. Kumar and
Singh [9] applied various models of GARCH on the risk and returns of Indian stock
and commodity markets from 1990 to 2007. The results showed the presence of
volatility clustering, persistence, and asymmetric behavior of volatility. The author
also reported that the risk-return relationship was insignificant for the NIFTY index
and agricultural commodity (soybean).
On the contrary, the risk-return relationship for the gold was found to be positive
and significant. Using the daily returns data from 1996 to 2010 from the US stock
market, [15] applied simple GARCH, exponential GARCH, and threshold GARCH
models to forecast the volatility and found that the symmetric GARCH is better
than the asymmetric models in the forecasting of the volatility of S&P 500 index
returns. [14] studied the volatility in the stock market returns of Asian countries using

the Exponential GARCH model and found the clustering of volatility, persistence,
asymmetry, and leverage effect in the return of the stock markets of India, China, and
Japan, and Hong Kong. The authors also reported the positive impact of the subprime
crisis on the volatility of returns in India, China, and Japan. On the other hand, the
period of the Eurozone debt crisis showed a negative impact on the stock returns of
India and China.
Volatility spillover and transmission have played an important role in interna-
tional economic decisions [13]. Forecasting volatilities in any financial asset class
is of prime importance for risk management, asset pricing, and asset allocation [3].
Volatility spillover in commodities has been weaker than other asset classes but has
also been increasing over time. Moreover, agricultural commodities contribute less
than metal and energy in spillovers [4]. Metal markets of LME are found to be highly
integrated across the market [5]. Compared to the agricultural futures market, the
metal futures market in China is also more efficient and less risky. However, overall,
the Chinese commodity futures market lags behind the US market in terms of liquidity
and volatility [10]. China and the US agriculture commodity futures market show
significant positive correlation and high upside and downside risk spillover during
high uncertainty [19]. Risk spillover is extreme between Shanghai and London Gold
futures markets in the pre and post-crisis periods [18].

13.3 Data and Methodology

Daily data for the analysis has been retrieved from the websites of the respective
exchanges (MCX and SHFE). The period of study has been taken from January
2016 to May 2021. Aluminium, copper, lead, nickel, and zinc have been taken from
the base metals segment; from the bullions category, gold and silver have been considered. For the commodities from the Shanghai Futures Exchange, the contract with the highest trading activity (open interest) has been used to prepare a continuous price series for each date. MCX provides individual metal indices that are designed using a well-defined methodology. The analysis is done using RStudio software.
Innovation (adaptability), persistence, and mean reversion are the three main characteristics of volatility. Researchers and academicians have been employing simple moving averages, exponentially weighted moving averages, and GARCH models to model and forecast volatility. The simple moving average method captures the mean reversion property, and its adaptability depends upon the window size considered in the model. The exponentially weighted moving average emphasizes the innovation and persistence factors. It is mathematically represented as

σ_n² = α·r_{n−1}² + β·σ_{n−1}²    (13.1)

where r_{n−1}² represents innovation and α is the innovation factor. Similarly, σ_{n−1}² denotes the lagged variance, depicting persistence, and β is the persistence factor. In this equation, the sum of α and β is equal to 1. The exponentially weighted moving average assigns exponentially decreasing weights as the lag value increases.
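As a short illustration, Eq. (13.1) transcribes directly into a few lines of Python; the value α = 0.06 and the seed variance below are illustrative assumptions (for the EWMA, β = 1 − α).

# Recursive EWMA variance per Eq. (13.1); alpha and the seed variance are
# illustrative assumptions (beta = 1 - alpha for the EWMA).
def ewma_variance(returns, alpha=0.06, var0=1e-4):
    # sigma_n^2 = alpha * r_{n-1}^2 + (1 - alpha) * sigma_{n-1}^2
    var = var0
    for r in returns:
        var = alpha * r ** 2 + (1.0 - alpha) * var
    return var

print(ewma_variance([0.01, -0.02, 0.015]))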

13.3.1 GARCH (1,1)

GARCH models, proposed by Bollerslev in 1986, are widely accepted among researchers and academicians for modelling volatility and studying volatility spillover. The GARCH (1,1) model is mathematically represented as

σ_n² = ω + α·r_{n−1}² + β·σ_{n−1}²    (13.2)

where ω = γ × (long-term unconditional variance).
ω expresses the mean reversion level, and γ is the weight assigned to the mean reversion factor. Using the long-term unconditional variance, volatility is conditioned on innovation and the current variance. In this way, the model incorporates all three characteristics of volatility: innovation (with weight α), persistence (with weight β), and mean reversion (with weight γ), and the sum of the three weights (α, β and γ) is 1.
In the GARCH model, all the characteristics of volatility (mean reversion tendency, persistence, and innovation) are given due weight. Further, when the sum of the ARCH term and the GARCH term (α + β) is less than 1, the mean reversion term (ω) is positive and the model is mean-reverting. On the other hand, if the sum is greater than 1, the model becomes mean-fleeing instead of mean-reverting and ceases to be stable; in that case GARCH is not appropriate, and an exponentially weighted moving average is preferred.
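The chapter's estimation was carried out in RStudio; an equivalent sketch in Python using the widely used "arch" package (assuming it is installed, and with simulated returns standing in for the exchange data) might look like the following.

# Hypothetical Python equivalent of a GARCH(1,1) fit (the study itself used
# RStudio); requires the `arch` package, and uses simulated placeholder data.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
returns = 100 * rng.normal(0, 0.01, 1291)   # placeholder daily log returns, in percent

model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
res = model.fit(disp="off")
alpha, beta = res.params["alpha[1]"], res.params["beta[1]"]
print(res.params)
print("alpha + beta =", alpha + beta)       # < 1 implies a stable, mean-reverting model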

13.4 Results and Discussion

Descriptive statistics for the returns are presented in Table 13.4 in the appendix. The negative skewness of returns and kurtosis above 3 (excess kurtosis) indicate that the time-series data are leptokurtic relative to the normal distribution. The excess kurtosis is positive and significantly different from zero, indicating that the series are leptokurtic, display non-normality, and exhibit heteroscedasticity.
Before applying the GARCH (1,1) model, we test for ARCH effects in all the variables using the ARCH LM test; the results are presented in Table 13.1. The p-values of the ARCH test for most of the variables are significant at 1 and 5%; only nickel futures, at both exchanges, are significant at 10%. Figure 13.1 in the appendix presents the time-series graph of the log returns of the variables, which shows the clustering of volatilities.
ADF tests are used in the study to examine the stationarity properties of the time series. The test statistics reject the null hypothesis that the variables are non-

Table 13.1 ARCH LM test result


MCX SHFE
Chi sq. df P-value Chi sq. df P-value
Alum. 305.52 12 2.20E − 16 82.639 12 1.29E − 12
Copper 23.623 12 0.02288 193.68 12 2.20E − 16
Lead 43.738 12 1.69E − 05 105.92 12 2.20E − 16
Nickel 18.897 12 0.09104 18.658 12 0.09712
Zinc 31.839 12 0.001465 75.13 12 3.47E − 11
Gold 45.673 12 7.90E − 06 105.4 12 2.20E − 16
Silver 131.62 12 2.20E − 16 174.15 12 2.20E − 16
Source Author’s calculation

stationary. Thus, the stationarity of the time series is confirmed by accepting the alternative hypothesis. Results of the ADF test confirming the stationarity of the data are presented in Table 13.2; all the return series are stationary at level.
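The same pre-tests can be reproduced with Python's statsmodels (the study itself used RStudio); "returns" below is a placeholder series, and the 12 and 10 lags mirror Tables 13.1 and 13.2.

# Sketch of the pre-tests of Tables 13.1 and 13.2 with statsmodels;
# `returns` is a simulated placeholder for a daily log-return series.
import numpy as np
from statsmodels.stats.diagnostic import het_arch
from statsmodels.tsa.stattools import adfuller

returns = np.random.default_rng(1).normal(0, 0.01, 1291)   # placeholder log returns

lm_stat, lm_pvalue, _, _ = het_arch(returns, nlags=12)      # ARCH LM test with 12 lags
adf_stat, adf_pvalue = adfuller(returns, maxlag=10)[:2]     # ADF test with up to 10 lags

print(f"ARCH LM: chi-sq = {lm_stat:.2f}, p = {lm_pvalue:.4f}")
print(f"ADF: statistic = {adf_stat:.2f}, p = {adf_pvalue:.4f}")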
The GARCH results have been presented in Table 13.3. The sum of the ARCH
term (α) and GARCH term (β) is less than 1 for all the variables. It confirms that
ω is positive in all the cases, as seen in Table 13.3. This indicates the presence of
mean reversion property and ensures the stability of the model. The innovation char-
acteristics are shown by the innovation factor (α). In the metals segment (aluminium,
copper, lead, nickel, and zinc) of both the exchanges, α is found to be significant (at
1 percent) for all the variables except aluminium (SHFE) and lead (MCX). This indi-
cates that for most of the metal futures of MCX and SHFE, a short-run persistence
of shocks exists. For bullions, including gold and silver futures, the ARCH term is
significant in all cases, indicating the presence of short-term volatility persistence.
Moreover, it is found that for all the metals except silver, short-term persistency
is higher at SHFE. Long-run persistence is depicted by β, which is significant at 1%

Table 13.2 ADF test result


MCX SHFE
Dickey-Fuller Lag p-value Dickey-Fuller Lag p-value
Alum. − 12.078 10 0.01 − 9.7286 10 0.01
Copper − 10.713 10 0.01 − 10.031 10 0.01
Lead − 10.882 10 0.01 − 10.21 10 0.01
Nickel − 10.771 10 0.01 − 11.357 10 0.01
Zinc − 11.139 10 0.01 − 11.104 10 0.01
Gold − 10.533 10 0.01 − 11.092 10 0.01
Silver − 10.572 10 0.01 − 10.33 10 0.01
Source Author’s calculation

Table 13.3 GARCH (1,1) results


Exchange MCX SHFE
METALS Estimates Estimates
Alum. μ 0.000057 μ 0.000101
ω 0.000007*** ω 0.000001
α 0.077832*** α 0.080353
β 0.852884*** β 0.915401***
α+β 0.930716 α+β 0.995754
Copper μ 0.000289 μ 0.000331
ω 0.000004*** ω 0.000008***
α 0.026042*** α 0.110534***
β 0.948097*** β 0.830695***
α+β 0.974139 α+β 0.941229
Lead μ − 0.00004 μ − 0.00011
ω 0.000001 ω 0.000002
α 0.024645 α 0.040866***
β 0.967698*** β 0.944015***
α+β 0.992343 α+β 0.984881
Nickel μ 0.000306 μ 0.000394
ω 0.000008*** ω 0.000006***
α 0.018683*** α 0.027107***
β 0.952177*** β 0.949714***
α+β 0.97086 α+β 0.976821
Zinc μ 0.000481 μ 0.000335
ω 0.000006*** ω 0.000004***
α 0.034084*** α 0.039527***
β 0.937587*** β 0.938896***
α+β 0.971671 α+β 0.978423
Gold μ 0.00022 μ 0.000164
ω 0 ω 0
α 0.019559*** α 0.040115**
β 0.978451*** β 0.957495***
α+β 0.99801 α+β 0.99761
Silver μ − 0.00018 μ 0.000029
ω 0.000001 ω 0.000001
α 0.047829* α 0.046499***
(continued)

Table 13.3 (continued)


Exchange MCX SHFE
METALS Estimates Estimates
β 0.947487*** β 0.95112***
α+β 0.995316 α+β 0.997619
Source Author’s calculation
Notes ***significant at 1% significance level, **significant at 5% significance level, *significant at
10% significance level

for all the variables, including metals and bullions. Since the ARCH terms (except
aluminium at SHFE and lead at MCX) and the GARCH terms are significant for
all the metals at both exchanges, it is inferred that the volatility can be forecasted
for the metals and bullions futures at MCX and SHFE. The ARCH and GARCH
terms contain information from one previous period return and conditional variance,
respectively. Further, the one-period lagged conditional variance term can be said
to contain information from past returns (multiple lags). Therefore, the value of the
GARCH term (β) is supposed to be much higher than the ARCH term (α). We find
that for all the variables, the weightage of the ARCH term (α) is much lesser than the
GARCH term (β). The sum of the ARCH and GARCH terms is close to 1 in all the
cases, which shows the overall persistence of volatility. The closeness of the sum of α
and β to 1 shows the degree of persistency of volatility. For aluminium, nickel, silver,
and zinc, the overall persistency is higher at SHFE, and for the other metals (copper,
lead, and gold), it is higher at MCX. These results are consistent with the literature
on volatility modeling and prediction [9, 14]. In addition, [11] have also reported
the significant impact of past price movements in the MCX commodity index. The
findings of this paper have various important implications for investors and portfolio
managers. Also, the study is believed to enrich the literature on commodity market
volatility in emerging economies.

13.5 Concluding Observation and Managerial Implication

The study aims to investigate the volatility prediction for the metals segment,
including base metals (aluminium, copper, lead, nickel, and zinc) and precious metals
(gold and silver) from MCX and SHFE using daily returns data from January 2016
to May 2021 applying GARCH (1,1) model. The results of the study report that the
ARCH terms (except aluminium at SHFE and lead at MCX) and GARCH terms are
highly significant for all the metals at both exchanges. This inferred that volatility
can be predicted for all the variables. Furthermore, it is found that there is an overall
persistence of volatility in the metal futures traded at the exchanges.
The present study's findings provide valuable insights for researchers and for individual and institutional investors seeking to understand volatility characteristics and prediction methods in the base metals (aluminium, copper, lead, nickel, and zinc) and precious metals (gold and silver) futures markets of India and China, helping them make better investment decisions by diversifying their risk. The study's results significantly inform the shaping of portfolio risk-management strategies for individuals and institutional investors participating in commodity markets. The analysis and the information from market movements can be instrumental to investors and traders in making better investment decisions and portfolio diversification. The results of our analysis are crucial for investors, portfolio managers, researchers, and practitioners interested in effectively managing risk and optimizing investment strategies within these emerging economies. They will also be helpful to policymakers, regulators, and commodity traders in formulating better strategies to capture market share in the global arena.
Volatility has become an essential topic in studying the risk associated with a
financial asset. The commodity markets have various categories of commodities
and can accommodate various stakeholders. In emerging economies, the commodity
market has enormous development, research, and returns potential. Therefore, in
the future, the research on the volatility and connections in volatility among the
markets can be extended in various ways with multiple other econometric tools
to accommodate a wide range of stakeholders, including farmers (providing the
commodity as raw material), industrialists (demanding raw material), speculators,
policymakers and governments, etc.

Appendix

See Fig. 13.1 and Table 13.4.


Fig. 13.1 Time plot of metal futures returns. Note The letter 'I' before the metal's name denotes India (MCX), and similarly, the letter 'C' before the metal's name denotes China (SHFE)
Table 13.4 Descriptive statistics of returns

MCX          Alum.      Copper     Lead       Nickel     Zinc       Gold       Silver
N            1291       1291       1291       1291       1291       1291       1291
Mean         1.83E−04   4.08E−04   4.43E−06   3.76E−04   5.87E−04   3.60E−04   2.12E−04
Median       −3.80E−04  3.31E−04   4.38E−04   4.44E−04   0.00108    3.01E−04   3.56E−04
Minimum      −0.0905    −0.0667    −0.0564    −0.08      −0.0674    −0.0565    −0.119
Maximum      0.0741     0.044      0.0712     0.0726     0.0733     0.0485     0.068
Skewness     0.342      −0.302     0.0305     −0.138     5.07E−04   −0.378     −0.934
Std. error   0.0681     0.0681     0.0681     0.0681     0.0681     0.0681     0.0681
Kurtosis     7.39       2.6        2.33       1.89       1.76       5.17       9.9
Std. error   0.136      0.136      0.136      0.136      0.136      0.136      0.136

SHFE         Alum.      Copper     Lead       Nickel     Zinc       Gold       Silver
N            1291       1291       1291       1291       1291       1291       1291
Mean         4.36E−04   5.52E−04   1.33E−04   5.05E−04   4.34E−04   4.18E−04   4.26E−04
Median       3.60E−04   3.97E−04   −2.98E−04  0.00131    2.14E−04   3.51E−04   2.52E−04
Minimum      −0.067     −0.0647    −0.0741    −0.0722    −0.0709    −0.0481    −0.103
Maximum      0.0498     0.0616     0.0696     0.0695     0.0575     0.054      0.0806
Skewness     −0.289     −0.156     0.1        −0.136     −0.0438    0.155      −0.379
Std. error   0.0681     0.0681     0.0681     0.0681     0.0681     0.0681     0.0681
Kurtosis     3.72       5.06       2.73       1.67       1.82       5.31       6.66
Std. error   0.136      0.136      0.136      0.136      0.136      0.136      0.136

Source Author's calculation

References

1. Agnolucci, P.: Volatility in crude oil futures: a comparison of the predictive ability of GARCH
and implied volatility models. Energy Econ. 31(2), 316–321 (2009). https://doi.org/10.1016/j.
eneco.2008.11.001
2. Bahadur, G.C.S.: Volatility analysis of Nepalese stock market. J. Nepalese Bus. Stud. 5(1),
76–84 (2009). https://doi.org/10.3126/jnbs.v5i1.2085
3. Chen, R., Xu, J.: Forecasting volatility and correlation between oil and gold prices using a
novel multivariate GAS model. Energy Econ. 78, 379–391 (2019). https://doi.org/10.1016/j.
eneco.2018.11.011
4. Chevallier, J., Ielpo, F.: Volatility spillovers in commodity markets. Appl. Econ. Lett. 20(13),
1211–1227 (2013). https://doi.org/10.1080/13504851.2013.799748
5. Ciner, C., Lucey, B., Yarovaya, L.: Spillovers, integration and causality in LME non-ferrous
metal markets. J. Commod. Mark. 17 (2020). https://doi.org/10.1016/j.jcomm.2018.10.001
6. Dey, K., Maitra, D.: Can commodity futures accommodate India’s farmers? J. Agribus. Dev.
Emerg. Econ. 6(2), 150–172 (2016)
7. Hua, R., Chen, B.: International linkages of the Chinese futures markets. Appl. Fin. Econ.
17(16), 1275 (2007). https://doi.org/10.1080/09603100600735302
8. Karmakar, M.: Asymmetric volatility and risk-return relationship in the Indian stock market.
South Asia Econ. J. 8(1), 99–116 (2007). https://doi.org/10.1177/139156140600800106
9. Kumar, B., Singh, P.: Volatility modeling, seasonality and risk-return relationship in GARCH-
in-mean framework: the case of Indian stock and commodity markets. SSRN Electron. J.
(2011). https://doi.org/10.2139/ssrn.1140264
10. Liu, Q., Luo, Q., Tse, Y., Xie, Y.: The market quality of commodity futures markets. J. Futures
Mark., 1–16 (2020). https://doi.org/10.1002/fut.22115
11. Mahalakshmi, S., Thiyagarajan, S., Naresh, G.: Commodity derivatives behaviour in Indian
market using ARCH/GARCH. JIMS8M J. Indian Manage. Strat. 17(2), 60–64 (2012)
12. Pavabutr, P., Chaihetphon, P.: Price discovery in the Indian gold futures market. J. Econ. Fin.
34(4), 455–467 (2010). https://doi.org/10.1007/s12197-008-9068-9
13. Seth, N., Panda, L.: Financial contagion: review of empirical literature. Qual. Res. Fin. Mark.
10(1), 15–70 (2018). https://doi.org/10.1108/QRFM-06-2017-0056
14. Singhania, M., Anchalia, J.: Volatility in Asian stock markets and global financial crisis. J.
Adv. Manage. Res. 10(3), 333–351 (2013). https://doi.org/10.1108/JAMR-01-2013-0010
15. Srinivasan, P.: Modeling and forecasting the stock market volatility of S&P 500 index using
GARCH models. IUP J. Behav. Fin. 1, 51–69 (2011)
16. Srinivasan, P., Ibrahim, P.: Forecasting stock market volatility of Bse-30 index using GARCH
models. Asia Pac. Bus. Rev. 6(3), 47–60 (2010). https://doi.org/10.1177/097324701000600304
17. Tu, Z., Song, M., Zhang, L.: Emerging impact of Chinese commodity futures market on
domestic and global economy. Chin. World. Econ. 21(6), 79–99 (2013). https://doi.org/10.
1111/j.1749-124X.2013.12047.x
18. Wang, G.J., Xie, C., Jiang, Z.Q., Stanley, H.E.: Extreme risk spillover effects in world gold
markets and the global financial crisis. Int. Rev. Econ. Financ. 46, 55–77 (2016). https://doi.
org/10.1016/j.iref.2016.08.004
19. Zhu, Q., Tansuchat, R.: The extreme risk spillovers between the US and China’s agricultural
commodity futures markets. J. Phys. Conf. Ser. 1324(1) (2019). https://doi.org/10.1088/1742-
6596/1324/1/012085
Chapter 14
Constructing a Broad View of Tax
Compliance Intentions Based on Big Data

Mekar Satria Utama, Solimun, and Adji Achmad Rinaldo Fernandes

Abstract Taxpayer compliance is currently one of the problems faced by the govern-
ment in any country, especially in developing countries. Taxpayer compliance can
be assessed from the intention of the taxpayer toward compliance with the tax itself.
The Directorate General of Taxes, as the government's arm in tax matters, is obliged to implement policies and technical standardization in the field of taxation, including efforts to build public understanding of the importance of taxation in developing a country. Tax compliance is therefore something the government seeks to improve through tax compliance intentions. The
theoretical basis for compliance intentions is the Theory of Planned Behavior (Ajzen in Organ Behav Hum Decis Process 50:179–211, 1991 [1]), which serves as the foundation of this discussion. To determine the factors driving the intention
to comply, it is necessary to search for any variables that influence this. By using big
data, the variables obtained are more in line with the current reality, because they are
taken directly from the virtual world, which is more specifically sourced from online
media. After the data is mined from social media, the raw data is extracted and analyzed using Discourse Network Analysis (DNA) and compiled into variables, which are then modeled using Structural Equation Modeling (SEM).

Keywords Theory of planned behavior · Tax compliance · Big data · e-Filing · Structural equation modelling

M. S. Utama (B)
Directorate of International Tax, Directorate General of Taxes in Indonesia, Jakarta, Indonesia
e-mail: [email protected]
Solimun · A. A. R. Fernandes
Department of Statistics, University of Brawijaya, Malang, Indonesia


14.1 Introduction to Taxes and Tax Compliance Intentions

Taxes are compulsory contributions to the state, owed by individuals or entities, that are coercive under the Law, carry no direct reward, and are used for state purposes for the greatest prosperity of the people. According to Prof. Dr. Rachmat Sumitro, SH
in 1990, taxes are people’s contributions to the state treasury (the transfer of wealth
from the people’s treasury to the government sector) based on the law to finance
routine expenses, and the surplus is used for public saving which is the main source
for financing public investment.
Tax Compliance refers to complying with all tax obligations as prescribed by law, voluntarily and completely, or the extent to which taxpayers comply or
fail to comply with their country’s tax regulations. Tax Compliance is the extent
to which taxpayers comply with tax law and full payment of all taxes owed. It
is also defined as the process by which a taxpayer files all required tax returns
by accurately declaring all income and paying tax obligations using applicable tax
laws and regulations. Theoretically, it can be defined by considering three different
types of compliance such as payment compliance, filing compliance, and reporting
compliance (Braithwaite 2009 in [2]). For a long time, tax compliance has been associated with penalty-based fiscal policies, such as tax audits, fines, or other sanctions. Attempts to explain taxpayer behavior that center on threats and coercion cannot offer a realistic and comprehensive image of Tax Compliance; managing these traditional factors alone is also an expensive way to try to improve compliance.
Therefore, in explaining the compliance and non-compliance behavior of taxpayers, many researchers have introduced into the equation, alongside the classical economic factors (factors related to economic conditions: actual level of income, tax rates, tax benefits, tax audits, audit probabilities, fines, and penalties), several non-economic factors. The latter, also called socio-psychological factors, are taken into account to explain Tax Compliance behavior from a deeper and more realistic perspective, thereby shaping modern fiscal policy, which is centered on the typology and needs of citizens. In the category of non-economic factors, we can include, for example, public education and tax morale [3].
Tax compliance intention is closely tied to Tax Compliance itself. James and Alley (in [4]) define Tax Compliance as "the willingness of taxpayers to act by the 'spirit' and 'letter' of law and tax administration without implementing law enforcement activities". Cuccia (in [4]), in a study conducted in Brazil, defines Tax Compliance as filing all required tax returns promptly and accurately reporting tax obligations under the tax laws in effect at the time the returns are filed. It can be concluded that Tax Compliance is the willingness to pay taxes within the specified time.
Theoretically, tax compliance intention can be associated with the theory of planned behavior. The theory defines concepts in predictable ways and helps understand specific behaviors in specific contexts. Attitudes toward behavior, subjective norms concerning behavior, and perceived control over
behavior (Perceived Behavior Control) are usually found to predict behavioral inten-
tions with a high degree of accuracy. Furthermore, intention, in combination with
perceived behavior control, can explain most of the variation in behavior [1].
This chapter discusses how attitudes, subjective norms, and behavior control affect
tax compliance intentions. The influence of Attitudes on Tax Compliance Intentions
is supported by the Attitude theory which was developed from Attitude Toward
the Behavior in the Theory of Planned Behavior [5, 6]. The effect of subjective
norms on tax compliance intentions is supported by the theory of subjective norms
developed from subjective norms in the theory of planned behavior [5, 7, 8]. The
effect of perceived Behavioral Control on Tax Compliance Intentions is supported
by the theory of perceived Behavioral Control which was developed from Perceived
Behavioral Control in the Theory of Planned Behavior [5, 9, 10].
Attitude is a driving factor for Tax Compliance Intentions in complying with tax
payments in Indonesia. Any changes that occur in Attitude will have a positive effect
on a significant change in Tax Compliance Intentions. Attitudes towards tax compli-
ance are formed by the beliefs of taxpayers regarding tax compliance which include
everything that is known, believed, and experienced by taxpayers regarding the imple-
mentation of tax regulations. Taxpayers’ beliefs about tax compliance behavior will
generate positive or negative attitudes toward tax compliance, which will further
shape the taxpayer’s intention to comply or not comply with applicable laws and
regulations. Utama et al. [25] stated that the better the attitude, the higher the
intention to comply with taxes, and conversely, the worse the attitude, the lower the
intention to comply with taxes. Likewise, the higher the attitude indicators, which consist of two items, behavioral belief (Y1.1) and evaluation of behavioral belief (Y1.2), the higher the tax compliance intention.
Subjective norms are one of the variables that have an important role in forming an
intention for tax compliance. This is in accordance with the TRA theory, of which the TPB theory is a development, which states that subjective norms are one of the additive components of behavioral intention, with behavioral
intention largely determining actual behavior. Utama et al. [25] found empirically that the Subjective Norm is a driving factor for Tax Compliance
Intentions. Any changes that occur in the Subjective Norms will have a positive effect
on the Intention to Comply with Taxes, which means that the better the Subjective
Norms are, the higher the Intention to Comply with Taxes, and vice versa. The higher the Subjective Norm, which consists of three items, namely the role of self-confidence at work, self-confidence in considering what is important, and trust in the support of friends in business, the higher the Tax Compliance Intention.
Perceived behavioral control influences behavior directly or indirectly (through intention) [5]. Direct influence can occur if there is actual control beyond the will of
the individual so that it influences behavior. The more positive the attitude toward
behavior and subjective norms, the greater the control one perceives, so the stronger
one’s intention to bring up certain behaviors. Finally, in accordance with the real
control conditions in the field (actual behavioral control), the intention will be realized
if the opportunity arises. Conversely, the behavior that appears may be contrary
to the individual’s intentions. This happened because the conditions in the field
made it impossible to bring out the intended behavior so it would quickly affect
the individual’s perceived behavioral control. Perceived behavioral control that has
changed will affect the behavior displayed so that it is no longer the same as what
was intended.

14.2 Introduction to Theory of Planned Behavior (TPB)

Humans are social creatures: no one lives in this world without the help of others, and people always live side by side with one another. It follows that one person's behavior will influence the behavior of others.
Theory of planned behavior (TPB) is a theory that was developed from the theory
of reasoned action (TRA). TPB emerged because the previous theory only focused
on the rationality of behavior and actions within individual consciousness. Ajzen
says TPB has been widely accepted as a tool for analyzing the gap between attitudes and intentions, and between intentions and behavior. In this respect, attempts to
use TPB as an approach to explaining whistleblowing can help overcome some
of the limitations of previous research, and provide a means of understanding the
widely observed gap between attitudes and behavior [11], although in reality some individual behaviors are not entirely under conscious individual control. Schematically, the TPB model is shown in Fig. 14.1.
Ajzen and Fishbein [12] refined the Theory of Reasoned Action (TRA) and gave
it the name TPB. TPB explains that the behavior carried out by individuals arises
because of the intention of the individual to behave and the individual’s intention
is caused by several internal and external factors from the individual. Individual
attitudes towards behavior include beliefs about a behavior, evaluation of the results

Fig. 14.1 The TPB model. Source Ajzen and Fishbein [12]
of behavior, subjective norms, normative beliefs, and motivation to obey [13]. The
Theory of Planned Behavior (TPB) seems very suitable for explaining the intention to disclose fraud (whistleblowing), since in this case the action taken rests on a very complex psychological process [14].
The theory of Planned Behavior explains that the behavior carried out by individ-
uals arises because of the intention to behave. Based on this theory, it can be seen that
intention is formed from attitude toward behavior, subjective norms, and perceived
behavioral control owned by individuals. The theory of Planned Behavior explains
that an individual’s intention to show behavior is determined by three factors, namely:
1. Attitude toward behavior. An attitude toward a behavior is a positive or negative evaluation of an object, person, institution, event, behavior, or intention [5]. The theory of planned
behavior determines the nature of the relationship between beliefs and attitudes.
According to this theory, an individual’s evaluation or attitude toward a behavior
is determined by his or her beliefs about that behavior. The term belief in this theory refers to the subjective probability that a behavior will produce a certain result. Specifically, each outcome's contribution to the attitude is commensurate with the person's subjective probability that the behavior produces the outcome in question. A belief contributes when it is accessible from long-term memory.
The concept of expected results comes from the expected value model.
Outcome expectancy can be in the form of beliefs, attitudes, opinions, or expec-
tations. According to the theory of planned behavior, an individual’s positive
evaluation of his performance on a particular behavior is similar to the concept
of perceived benefit. Positive evaluation refers to beliefs about the effective-
ness of the proposed behavior in reducing susceptibility to negative outcomes.
In contrast, negative self-evaluation refers to beliefs about the detrimental
consequences that can result from enacting a behavior.
2. Subjective norms. Subjective norms are factors outside the individual that indicate one's perception of social pressure regarding the behavior.
3. Perceived behavioral control. Perceived behavioral control is the individual's perception of his or her control over a behavior.
From the several definitions of the Theory of Planned Behavior given by the researchers above, it can be concluded that, in this theory, behavior arises from the individual's intention to behave, and this intention is caused by several factors internal and external to the individual. The intention to perform a
behavior is influenced by three variables, namely attitude toward the behavior, subjec-
tive norms, and perceptions of behavioral control. TPB covers non-volitional behavior, which cannot be explained by TRA. Individual behavioral intention cannot
be the exclusive determinant of behavior where individual control over behavior
is incomplete. By adding “perceived behavioral control,” the TPB can explain the
relationship between behavioral intention and actual behavior.

Several studies have found that, compared to TRA, TPB is better at predicting
health-related behavioral intentions. TPB has improved the predictability of inten-
tion in various health-related areas, including condom use, recreation, exercise, diet,
etc. In addition, TPB (and TRA) have helped explain individual social behavior by
incorporating social norms as an important contribution.
More recently, some researchers have criticized the theory for ignoring individual needs before committing to certain actions, needs that will influence behavior regardless of the attitudes expressed. For example, a person may have a very positive attitude toward steak and yet not order one because he is not hungry. Or, a person may have a negative attitude toward drinking and little intention of drinking, but engage in drinking because he or she wants to belong to a group. Another limitation is that the TPB does not integrate into the theory the role that individual emotions play in the development of intentions and during decision-making. In addition, most of the research on the TPB is correlational; more evidence from randomized experiments would be helpful.
Several experimental studies challenge the assumption that intentions and
behavior are consequences of attitudes, social norms, and perceived behavioral
control. As an illustration, Sussman et al. [15] encouraged participants to form an
intention to support a particular environmental organization, for example signing a
petition. Once these intentions are formed, attitudes, social norms, and perceived
behavioral controls shift. Participants became more likely to report positive atti-
tudes toward these organizations and more likely to perceive that members of their
social group shared comparable attitudes. These findings imply that the relationship
between the three key elements—attitudes, social norms, and perceived behavioral
control—and intentions may be bidirectional.

14.3 The Link Between TPB and Intention to Comply with Taxes

Prediction of behavior from traits can be improved by aggregating specific behaviors across occasions, situations, and forms of action [1, 16, 17]. A well-known framework for predicting human behavior is the Theory of Planned Behavior (TPB). This theory is an extension of the Theory of Reasoned Action (TRA); the difference is the addition of one further construct in TPB, namely perceived behavioral control, which is held to influence a person's intentions and behavior.
The concept of intention and behavior in general has been studied in the Theory of
Reasoned Action (TRA) which was first introduced by Fishbein and Ajzen in 1975.
Within the TRA framework, behavioral intention which largely determines actual
behavior is an additive function of two variables, namely attitudes and subjective norms. Attitudes are favorable or unfavorable individual feelings about performing a
particular behavior. Attitudes include positive or negative evaluations of performing
the behavior. An individual will intend to perform a certain behavior when he eval-
uates it positively. Attitudes are determined by an individual’s beliefs about the
consequences of performing a behavior (behavioral beliefs), which are weighted by
their evaluation of these consequences (outcome evaluation). Thus, attitude is an
individual’s prominent belief, whether the result of his behavior will be positive or
negative.
Subjective norms are assumed to be a function of beliefs about whether other individuals approve or disapprove of the behavior. The beliefs that underlie subjective norms are normative. Normative social influence is defined as the influence of others that
directs us to adjust ourselves to be liked and accepted [18]. Even though an action
may not be accepted or approved by an individual, normative social influence puts
pressure on a person to comply with the social norms of the group. Normative social
influence has been shown to exert a high degree of persuasive influence on individ-
uals. An individual will intend to behave when he feels that others who are important
to him think he should do so.
In 1991, TRA was developed into the Theory of Planned Behavior (TPB) by
Ajzen. In his article, Ajzen tries to show that TPB provides a useful conceptual
framework for dealing with the complexities of human social behavior. This theory
incorporates several central concepts in the social and behavioral sciences. In addi-
tion, this theory also predictably defines concepts and understands certain behaviors
in certain contexts. Attitudes toward behavior, subjective norms concerning behavior,
and perceived control over behavior are usually found to predict behavioral inten-
tion with a high degree of accuracy. Furthermore, intention, in combination with
perceived behavior control, can explain most of the variation in behavior [1].
In order to better understand the measurement of attitudes, subjective norms, and
behavioral control, the concepts or factors forming them are first reviewed in the
Theory of Planned Behavior, as presented in Fig. 14.2.
Figure 14.2 is a schematic of the relationship between the variables involved in the
Theory of Planned Behavior. The figure explains that perceived behavioral control
together with behavioral intentions can be used directly to predict final behavior. In
addition, in predicting behavioral intentions, there is a role for behavioral attitudes
and subjective norms.

Fig. 14.2 Theory of planned behavior. Source [5]

The theory of Planned Behavior postulates three independent conceptual determinants of intention. The first is the attitude toward behavior and refers to the extent
to which a person has a favorable or unfavorable evaluation or assessment of the
behavior in question. The second predictor is a social factor called subjective norm;
it refers to perceived social pressure to perform or not perform a behavior. The third
antecedent of intention is the degree of perceived behavioral control which, as seen
previously, refers to the perceived ease or difficulty of performing the behavior and is assumed to reflect past experience as well as anticipated impediments and obstacles.
As a general rule, the more favorable the attitudes and subjective norms concerning
the behavior, and the greater the perceived control of the behavior, the stronger
must be the individual’s intention to perform the behavior in question. The relative
importance of attitudes, subjective norms, and perceived behavioral control in the
prediction of intention is expected to vary across behaviors and situations. Thus,
in some applications, it can be found that only attitude has a significant impact on
intention, in others that attitude and perceived behavioral control are sufficient to
account for intention, and in still others that all three predictors make independent
contributions.
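
This additive structure can be written compactly. As an illustrative sketch (the weights are estimated empirically for each behavior and population rather than fixed by the theory, and TRA is the special case without the control term):

BI ≈ w1 · AB + w2 · SN + w3 · PBC

where BI is behavioral intention, AB the attitude toward the behavior, SN the subjective norm, PBC perceived behavioral control, and w1, w2, w3 the empirically estimated weights.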
Tax Compliance Intention is the development of the Intention variable in the
Theory of Planned Behavior. Intention is the desire to perform a behavior [19].
Intentions are not always static, intentions can change over time. Theory of Reasoned
Action (TRA) assumes that humans behave consciously, taking into account the avail-
able information, and also implicitly and explicitly considering the consequences of
their actions. The TRA theory also suggests that a person’s intention to behave is
predicted by his attitude toward the behavior and how he thinks others will judge
him if he does the behavior.
Based on the description above, it can be concluded that, in the context of tax compliance, intention means the desire of taxpayers to carry out tax-compliant or non-compliant behavior. However, not all taxpayer non-compliance is caused by an intention to disobey. The complexity of tax law also contributes to tax non-compliance in many places, so non-compliance can arise from non-intentional factors as well.

14.4 Religiosity and Utilization of e-Filing as a Determinant Factor in Intention to Comply with Taxes Through TPB

Religiosity concerns how far one's religious knowledge goes, how strong one's belief is, how faithfully worship and religious rules are carried out, and how deep one's appreciation is of the religion one adheres to [20]. Religiosity is the strength of an individual's relationship with or belief in their religion
[21]. Religiosity is a complex integration between religious knowledge, feelings, and
religious actions in a person.

Religiosity is generally explained as related to cognition (religious knowledge, religious beliefs) that affects what is done with emotional attachment or emotional
feelings about religion, and/or behavior, such as attendance at places of worship,
reading holy books, and praying [22]. Someone who is said to be religious is someone
who tries to understand life and live more deeply than just the outer limits, who moves
in the vertical dimension of life and transcends this life (Syafiq and Wahyuningsih
in [20]).
The Directorate General of Taxes under the coordination of the Ministry of
Finance of the Republic of Indonesia can attend to individual and organizational religiosity in promoting timely tax payment in accordance with the applicable regulations, by providing tax education through a religiosity approach. The aim is that the public has no doubt that paying taxes to the state is consistent with religious rules, that taxpayer attitudes improve, and that the level of taxpayer compliance increases.
Ajzen [5] says that subjective norms are functions based on beliefs called norma-
tive beliefs, namely beliefs regarding agreement and/or disagreement originating
from referents, that is, people and groups who influence the individual (significant others), such as parents, spouses, close friends, colleagues, or others, regarding a behavior. Subjec-
tive norms are individual perceptions of social pressure to perform or not perform a
behavior [5]. Subjective norms are determined by the existence of normative beliefs
and the desire to follow (motivation to comply). Normative beliefs regarding expec-
tations come from referents or people and groups that influence individuals (signifi-
cant others) such as parents, spouses, close friends, co-workers, or others, depending
on the behavior involved. So subjective norms are formed as a result of individual
perceptions of existing social pressures to manifest or not a behavior. Empirically,
Religiosity is a driving factor for Subjective Norms for Tax Compliance. Any change
that occurs in Religiosity will have a significant effect on the Subjective Norms posi-
tively, which means the better the Religiosity one has, the better the Subjective Norms
and vice versa.
In addition to attitudes and subjective norms, any changes that occur in Religiosity
will have a significant effect on Behavior Control that is perceived positively, which
means that the better the Religiosity one has, the higher the perceived Behavior
Control and vice versa. According to [23], religious values are expected to spur positive behavior and deter negative behavior, thereby encouraging compliant conduct. Religiosity can be a factor that strengthens individual self-control and takes a positive role in preventing
deviant behavior [24]. The research of Utama et al. [25] states that religiosity is the driving
force behind perceived behavior control in tax compliance. Any change that occurs
in Religiosity will have a significant effect on Behavior Control that is perceived positively, which means that the better the Religiosity one has, the higher the Behavior
Control is felt and vice versa.
Ajzen [5] explains perceived behavioral control as a function based on beliefs
known as control beliefs, namely individual beliefs regarding the presence or absence
of factors that support or hinder individuals from eliciting a behavior. This belief is
based on the individual’s previous experience about a behavior, information that
the individual has about a behavior that is obtained by observing the knowledge
possessed by oneself and other people known to the individual, as well as by various
other factors that can increase or decrease the individual’s feelings about the level of
difficulty in performing a behavior.
According to [26], taxes are the transfer of wealth from the people to the state treasury to finance routine expenses, with the surplus used for public saving, which is the main source for financing public investment. Law No. 28 of 2007 states that taxes are mandatory contributions to the state, owed by individuals or entities, that are coercive under the law and carry no direct compensation.
On January 24, 2005, at the Presidential Office, the President of the Republic of
Indonesia together with the Directorate General of Taxes launched the product e-
Filing or Electronic Filing System. e-Filing is a system for reporting or submitting tax returns electronically, carried out through a real-time online system. The Decree of the Director General of Taxes states that the
Submission of Electronic Tax Returns (e-SPT) is carried out through an Application
Service Provider Company appointed by the Director General of Taxes.
For further regulation, the Director General of Taxes issued Regulation No. KEP-
05/PJ/2005 dated January 12, 2005, concerning Procedures for Submitting Electronic
Tax Returns (e-Filing) through Application Service Provider Companies (ASP). With
this system, it will be easier for taxpayers to fulfill their obligations without having
to queue at the Tax Service Office, making the process more effective and efficient. In addition, tax return (SPT) data can be sent anywhere and anytime, both inside and outside the country, independent of office hours, including on holidays and without the presence of a Tax Officer (24 hours a day, 7 days a week). The data is sent directly to the database of the Directorate General of Taxes over the internet (online), channeled through one or several Application Service Provider Companies (ASP).
Implementation of e-Filing makes it easier and more efficient for taxpayers to
make tax payments. This can shift taxpayer attitudes and thereby increase tax compliance intentions. In practice, e-Filing is only a small part of the technology provided by the DGT to raise the compliance intention of Large (corporate) Taxpayers. In tax reporting, e-Filing aims to facilitate tax payments for taxpayers. In addition, e-Billing is
also used in tax reporting as part of the information technology provided by the DGT. E-Billing is a continuation of e-Filing, so it is possible that if all components of the information technology the DGT provides for paying taxes are examined, tax compliance intentions will increase further through attitude.

In addition, Utama et al. [25] stated that there is a significant influence of the use of e-Filing on perceived Behavior Control. Beyond the results of the
analysis of the direct effect, there are results of the indirect effect of the variable
Utilization of e-Filing on Tax Compliance Intentions through perceived Behavior
Control. The results of the analysis of the indirect effect of e-Filing Utilization on
Tax Compliance Intentions through Perceived Behavior Control conclude that the
effect of e-Filing Utilization in this relationship has a significant positive effect. This
indicates that perceived Behavioral Control is a mediating variable between e-Filing
Utilization and Tax Compliance Intentions. The coefficient is positive, meaning that
the better or increase the use of e-Filing, followed by an improvement or increase
in perceived Behavior Control, the better the intention to comply with taxes. The
existence of this significant influence is in line with the measurement model of the e-Filing Utilization variable, in which the strongest factor determining the level of e-Filing Utilization is System Simplicity, so the perceived Behavior Control variable has high potential to channel the effect of e-Filing Utilization on
Tax Compliance Intentions. The results of this indirect effect are also in line with
the Technology Acceptance Model (TAM) Theory, which states that perceived behavior in the use of technology is driven by the ease of the system and the benefits provided.
This can be interpreted to mean that every change in the e-Filing Utilization variable produces a significant change in the perceived Behavior Control variable. The results of this study support the concept of e-Filing, which was devel-
oped based on the Regulation of the Director General of Taxes Number KEP-05/PJ/
2005 dated 12 January 2005 and Article 6 paragraph (2) of Law no. 16 of 2000. In
addition, the e-Filing system has high accuracy and can reduce errors in tax reporting
because generally e-Filing applications provide a double-checking feature, that is, if
an error occurs, the Taxpayer will receive an error message and cannot save and send
the report until it is corrected. By using e-Filing, Taxpayers can also act in an environmentally friendly way by reducing the use of paper in tax reporting.
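
As a rough illustration of the double-checking behavior just described, the sketch below gates submission on validation errors. The field names and rules are hypothetical stand-ins, not actual DGT e-Filing rules.

# Hedged sketch of a "double-checking" gate: the return cannot be saved or
# sent while any validation error remains. Field names and rules are
# hypothetical illustrations, not actual DGT e-Filing rules.
def validate_return(spt: dict) -> list:
    """Collect human-readable error messages for a tax return (SPT)."""
    errors = []
    if not spt.get("taxpayer_id"):
        errors.append("Taxpayer identification number is missing.")
    if spt.get("gross_income", -1) < 0:
        errors.append("Gross income must be a non-negative amount.")
    if spt.get("tax_due", -1) < 0:
        errors.append("Tax due must be a non-negative amount.")
    return errors

def submit_return(spt: dict) -> bool:
    """Refuse submission until every check passes."""
    errors = validate_return(spt)
    if errors:
        for message in errors:
            print("Error:", message)  # the taxpayer sees each error message
        return False                  # cannot save and send yet
    print("Return accepted for electronic submission.")
    return True

submit_return({"taxpayer_id": "", "gross_income": 120000000, "tax_due": 5000000})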

14.5 Extracting Information from Online Media Related to Variables that Influence Tax Compliance Intentions

Year by year, global development and use of the internet keeps increasing. This is inseparable from internet access that is ever easier to obtain, reaching even remote areas, and from access costs that keep falling. One implication is the steadily growing use of online media.
News can be obtained not only through print media such as newspapers and magazines, or electronic media such as television and radio. Online media, seen as interactive media, can also serve as a channel that provides various kinds of information, including news. The existence of the internet
in society is currently used as a channel to convey information with a much more massive reach and capacity. Adequate knowledge and ease of access have made
people more familiar with the Internet, so some people are now starting to consume
information daily via the Internet.
Online media is a new medium whose way of conveying information differs from conventional media, namely print and electronic media. Online media requires a computer-based device and an internet connection to search for and receive information. The internet, with its unlimited character, gives users great freedom in their media use. The term online media is often taken to mean a news site or
written journalistic practice that is published via the internet. However, according to
Ashadi Siregar, online media can be interpreted as a general term for a form of media
based on telecommunications and multimedia (computers and the internet). It encompasses news portals, websites, online radio, online TV, online press, online mail, and so on, each with its own characteristics according to the facilities
that enable users or consumers to use them. In this general sense, online media can
also be interpreted as a means of online communication. From the explanation above,
it can be concluded that online media can also be used as a medium to communicate
with audiences.
Online media has several characteristics that can be used as a comparison with
conventional media, including the following:

1. Information Speed

Journalism that uses the internet as a medium has an advantage over traditional
media, which is faster in distributing information. Generally, people have to wait
for the next day to find out what happened today. However, through online media,
information can be distributed along with events or issues that occur at that time.
Even though reports about events through electronic media are also getting faster now, such immediacy is impossible for print media. Because online media
is easily accessible, the delivery of information tends to be short. This also supports
one of the news values, namely actuality.
2. Information Update

The characteristics of the internet are unlimited and can be accessed anytime and
anywhere, making online media able to update previously published information
with more complete information. Information updates and publications do not have
a time limit and continue as long as they are relevant to the core information, in
contrast to broadcasting television programs during prime time and breaking news
which is available in electronic media.

3. Reciprocal

When compared to print and electronic media where communication goes in one
direction, online media gives communicants the flexibility to provide feedback in a
relatively short time. One example of online media that has a high level of interactivity
is a discussion group or forum. Internet users from various regions can write down
their thoughts on a topic being discussed. Online media such as news portals also
always provide a column at the bottom of the news for readers' comments and complaints to the editorial team.

4. Personalization

Online media users have self-control, meaning that the communicant is given the
freedom to consume whichever information is deemed important or interesting. This
is different from print and especially electronic media, where all information is presented directly to the public without any user control over choosing and filtering
information. In online media, users can search for the desired information through
search engines which are always provided by a website. Because of this, many online
media, especially news portals, provide categories for the news they publish.

5. Unlimited Capacity

The superior characteristic of online media is that there is no capacity limit to produce
and distribute information. Online media generally have a data bank or database that
can accommodate massive amounts of various kinds of information, so that audiences
can access even old information.

6. Link

Information published through online media can be connected with other related information on the same or on different sites, much like a citation in the literature.

7. Multimedia Capability

Online media makes it possible for communicators to include text, sound, images,
even videos, and other multimedia-based components in the news pages that are
presented.
Online media, if used wisely, can be used by the public to find and fulfill their need
for information. Basically, online media is not only used as a medium for communi-
cation between individuals, groups, and the masses, because online media is also used
by the public as an educational medium, such as disseminating news or information
about important events, discovering new things, and various other educational infor-
mation. Therefore, people will find it easy and fast to dig up educational information
from online media.
Information regarding the intention to comply with taxes is widespread in
cyberspace, where the information is complex enough that data processing tech-
niques are needed that are capable of extracting this information. Through the DNA
method, a discourse structure can be systematically identified in various textual docu-
ments so that through DNA, political, social, cultural, health, and other discourse
can be mapped and visualized into a network. The DNA analysis will obtain public
responses regarding the intention to comply with taxes from cyberspace. Based on
the DNA results, issues and issue concepts will be obtained, which can be used as indicators and variables.

14.6 Introduction to Big Data

Data is important to human life in a way that goes beyond its dictionary definition: data has its own essence, in the sense that it influences concepts, theories, and methods that are directly or indirectly related to it. Data is the main component of an information system, because all information for decision-making comes from data. When data is processed, it produces information, and information itself can be found in many places known as information spaces, in both print and online copies; books, for example, exist both as documents published in printed form and as documents published online. Conversely, data can also be derived from information.
Humans can accurately recognize units of information as encoded data, but they cannot do so quickly once the volume and complexity exceed the capacity of the human brain, especially when a large enough pile of information, commonly called big data, is involved. According to [27], Big Data is a trend covering a broad area of business and technology, where Big Data refers to technologies and initiatives involving data so diverse, rapidly changing, and large that it is difficult to handle effectively with conventional database management tools or other data processing applications.
The Gartner IT Glossary (The Gartner IT Glossary, nd) defines Big Data as
follows: Big data is high-volume, high-velocity, and/or high-variety information
assets that demand cost-effective, innovative forms of information processing that
enable enhanced insight, decision making, and process automation. Referring to the
definition above, it is concluded that the characteristics of Big Data are volume,
velocity, and variety. Volume is the very large amount of data that must be managed, velocity is the speed of data processing, which must keep pace with the growth in the amount of data, and variety characterizes the very diverse data sources, which
comes from structured databases and unstructured data.
There are three things that have brought about the development of Big Data technology, namely:
(1) The rapid growth of data storage capabilities
(2) The rapid increase in data processing engine capabilities
(3) Availability of abundant data
Big Data processing aims to let every business, organization, or individual capable of processing data obtain in-depth information (insights) that triggers reliable decision-making and business action. Big
Data technology can handle various variations of data, which are grouped into two,
namely structured data and unstructured data. Structured data is a group of data that
has a defined data type, format, and structure. Unstructured data, in turn, is textual data with an erratic format and no inherent structure; turning it into structured data is possible but requires more effort and time.

Big Data management has several stages, each requiring tool support. The following are the stages of Big Data management [28].

(1) Acquired

This stage concerns how data is obtained and from which sources.

(2) Accessed

With regard to data access, data that has been collected requires governance, inte-
gration, storage, and computing so that it can be managed for the next stage. Devices
for processing (processing tools) use Hadoop, Nvidia CUDA, Twitter Storm, and
GraphLab. Data storage management (storage tools) can use Neo4J, Titan, and HDFS.

(3) Analytic

This stage concerns the information to be obtained from the data that has been processed. The analytics performed can be descriptive (describing data), diag-
nostic (looking for causes and effects based on data), predictive (predicting future
events), or prescriptive analytics (recommending choices and implications of each
option). Tools for the analytical phase use MLPACK and Mahout.

(4) Application

This stage concerns the visualization and reporting of analytics results. Tools for this stage include RStudio.
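
A toy end-to-end sketch of the four stages above in Python, with invented records (pandas stands in for the heavyweight tools named in each stage):

# Toy walk through the four Big Data management stages with invented records;
# in practice the tools named above (Hadoop, HDFS, Mahout, RStudio, ...) fill
# these roles at scale.
import pandas as pd

# (1) Acquired: obtain raw records from a source.
raw = [{"post": "e-filing is easy", "likes": 12},
       {"post": "tax office queue too long", "likes": 40}]

# (2) Accessed: store the records in a managed, queryable structure.
table = pd.DataFrame(raw)

# (3) Analytic: a descriptive summary of the stored data.
summary = table["likes"].describe()

# (4) Application: report the result.
print(summary)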
Data mining is the process of extracting important information from large bodies of data. It often uses statistical and mathematical methods and makes use of artificial intelligence technology. Data mining has many functions, namely:
1. Descriptive; refers to a function in understanding more detailed data. This process
aims to find patterns and characteristics of the data. By utilizing this descriptive function, patterns originally hidden in the data can be found. That is, if a pattern is repetitive and has value, the characteristics of the data can be known.
2. Predictive; a function that reveals particular patterns in the data. Such a pattern can be found across several variables contained in the data.
When you have found a pattern, that pattern can be used to estimate other variables
whose values are still unknown. This is why predictive functions are considered
equivalent to predictive analysis. Predictive can also be used to estimate a special
variable that is not in the data.
3. Association; a data mining function for identifying relationships among data items, both past and current.

4. Classification; serves to summarize the defining characteristics of a group or class of data. An example is data on customers who stop using
a product because they think competitors’ products provide more benefits and
customer value for them.
5. Clusterization; the process of identifying groups of products or goods that have special characteristics (a minimal sketch follows this list).
6. Forecasting; is a data forecasting technique that is useful for getting an idea
of the value of data in the future. Forecasting can be done by gathering large
amounts of information. An example of the application of forecasting is data
related to forecasting the number of requests for seasonal products in certain
seasons (seasonal marketing).
7. Sequencing; the process of identifying how relationships change over a certain period. An example of sequencing is customer data showing repeat purchases of a particular product over the customer lifecycle.
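
A minimal sketch of the clusterization function on short online-media snippets, using scikit-learn as one possible toolkit (the sample texts are invented illustrations):

# Minimal sketch of the clusterization function: group short online-media
# snippets by lexical similarity. The sample texts are invented illustrations.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

texts = [
    "e-filing makes annual tax reporting fast and simple",
    "the e-filing portal saves a trip to the tax office",
    "religious duty motivates honest payment of taxes",
    "paying taxes is part of moral and religious obligation",
]

# Turn the unstructured text into a structured numeric matrix (TF-IDF weights).
matrix = TfidfVectorizer().fit_transform(texts)

# Partition the snippets into two clusters of similar wording.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(matrix)
for label, text in zip(labels, texts):
    print(label, text)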
Discourse is considered an important phenomenon in many disciplines. Discourse
also greatly influences political processes at various levels, both at the public,
political, scientific, and other levels. One approach to understanding discourse is
Discourse Network Analysis (DNA). DNA is a technique for visualizing discourse,
be it political, social, cultural, etc., into a network. DNA is a combination of category-
based content analysis and social network analysis. This approach allows us to
systematically identify a discourse structure in various textual documents such as
newspaper/print media articles, transcripts of parliamentary debates, etc.
DNA is the development of a social network analysis methodology that combines
two method elements, namely qualitative content analysis and social network anal-
ysis. DNA was coined by Professor Philip Leifeld from the Department of Govern-
ment of the University of Essex. Initially, DNA was intended to map problems in government and public-policy studies. As problems have grown more complex, DNA has also come to be used for various problems related to communication and other
social sciences.
The main component in DNA is the mapping of actors, organizations, discourses,
sentiments, and topics being discussed or debated. In his research study, Professor
Leifeld created a Java-based program called Discourse Network Analyzer. The
Discourse Network Analyzer helps researchers to upload online news content,
speeches, or threads about the discourse under review or in debate. After uploading all online news content, speeches, or discourse threads, researchers are required to mark the important statements in each news item, along with actors, organizations, and sentiments, for content analysis. Then the results of the
discourse content analysis can be exported in CSV or graphml form to visualize the
connectedness of discourse, actors, and related sentiments. For visualization, the Visone software can be used.
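
Outside the Java tool, the same actor–concept structure can be assembled with a general network library. A minimal sketch using Python's networkx, with invented statements rather than real coded data:

# Minimal sketch of a discourse network: actors linked to issue concepts,
# with an edge attribute recording agreement (+1) or rejection (-1).
# The statements are invented illustrations, not coded news data.
import networkx as nx

statements = [
    ("Tax Office", "e-filing simplifies reporting", 1),
    ("Taxpayer Forum", "e-filing simplifies reporting", 1),
    ("Taxpayer Forum", "audit risk is too high", -1),
]

graph = nx.Graph()
for actor, concept, stance in statements:
    graph.add_node(actor, kind="actor")    # drawn as a circle in Fig. 14.3
    graph.add_node(concept, kind="issue")  # drawn as a square in Fig. 14.3
    graph.add_edge(actor, concept, sentiment=stance)

# Export for visualization in a network tool such as Visone.
nx.write_graphml(graph, "discourse_network.graphml")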

14.7 Modeling Using SEM

Structural Equation Modeling (SEM) analysis was developed in the early 1950s.
The development of covariance analysis by Joreskog, Keesling, and Wiley in 1973
was the beginning of the emergence of software regarding SEM analysis. The main
purpose of the existence of SEM analysis software is to produce an analysis tool
that is more powerful than before and can solve problems that are more substantive
and comprehensive. The development of SEM analysis continues to gain significance because of researchers' need to solve such problems. A lot of software has now been developed for SEM analysis, namely AMOS, LISREL, PLS,
GSCA, and TETRAD.
This analysis is the development of path analysis and multiple regression analysis,
where all of these methods are a form of multivariate analysis models. Ghozali [29]
describes the structural equation model (Structural Equation Modeling) as the second
generation of multivariate analysis techniques that allow researchers to examine the
relationship between complex variables both recursive and non-recursive to obtain
a comprehensive picture of the whole model. Solimun et al. [30] explained that
SEM is statistical modeling that can simultaneously involve the relationship between
research variables and their indicator models. The advantages of SEM analysis are
as follows [31].
1. Can test the relationship of causality, validity, and reliability as well.
2. Can be used to see the direct and indirect effects between variables.
3. Testing several dependent variables at once with several independent variables.
4. Can measure how much the indicator variables can influence the respective factor
variables.
5. Can measure factor variables that cannot be measured directly through the
indicator variables.
When using SEM analysis several variables must be understood, namely latent
variables and manifest variables [32]. Latent variables are variables that cannot be measured directly; they are depicted using a circle, oval, or elliptical icon. There are two kinds of latent variables, namely exogenous latent variables and
endogenous latent variables. Exogenous latent variables are variables that affect the
values of other variables in the model. Meanwhile, endogenous latent variables are
variables that are influenced directly or indirectly by exogenous variables. Examples
of latent variables, namely motivation, satisfaction, or attachment.
In addition to latent variables, there are manifest variables which are variables that
can be measured directly. The value of this variable can be found by conducting direct
research such as surveys. Manifest variables are drawn with a rectangular icon. One of the advantages of SEM analysis is its ability to accommodate a study
involving intervening variables or variables that are dependent in one equation and
simultaneously become independent variables in other equations [33]. For example,
job satisfaction is influenced by the environment and at the same time job satisfaction
also affects work performance.

In the SEM model, the minimum sample size that must be used is still being
debated. According to [34], structural models require a minimum sample of 200
observations or observations. Meanwhile, according to Hair et al. [35] argue that
the required minimum sample size is 100–150 observations. On the other hand,
[36] suggest that the sample size for SEM analysis is 5 times the parameter to be
estimated from the study. And finally, Byrne [37] suggests a minimum sample of
100 observations. The various expert opinions thus fall in a fairly narrow range, with a minimum sample of roughly 100 observations as the common ground.
There are two equation models in SEM analysis, namely [38].

1. Structural models

The structural model is a model that describes the relationship between latent vari-
ables. The parameter used to show the relationship between exogenous latent vari-
ables and endogenous latent variables is gamma, while the parameter that shows the relationship between endogenous latent variables and other endogenous variables is beta.

2. Measurement models
The measurement model is used to describe the relationship between the latent variables and the observed variables. Lambda is used to denote the factor loadings that relate latent variables to observed variables.
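
In this notation, the two sub-models can be written compactly in the standard LISREL-style form (a textbook formulation, shown here for orientation):

η = Bη + Γξ + ζ
y = Λy·η + ε,  x = Λx·ξ + δ

where ξ and η are the exogenous and endogenous latent variables, x and y their observed indicators, Γ and B the gamma and beta path matrices, Λx and Λy the lambda loading matrices, and ζ, ε, and δ the structural and measurement errors discussed below.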
If the researcher combines structural model testing with measurement model
testing, it allows researchers to carry out factor analysis together with hypothesis
testing and makes it possible to test measurement errors as an inseparable part of
SEM. The following are the error types in SEM.
1. Structural error

Structural errors occur when the latent variable cannot perfectly predict the dependent
variable so the structural error component is displayed in the structural model.

2. Measurement error

Measurement errors occur because the observed variables cannot perfectly describe
the latent variables, so a measurement error component needs to be added.
In SEM analysis it is necessary to test the goodness of the model or Goodness of
fit. There are several ways to measure the goodness of the model, namely:

1. Goodness of Fit Indices (GFI)


GFI is a non-statistical measure whose value ranges from 0 (poor fit) to 1 (perfect
fit). The higher the GFI value, the better the resulting model. According to [39], a
GFI value above 90% is an ideal measure that states that the model is good.

2. Root Mean Square Error of Approximation (RMSEA)

RMSEA values between 0.05 and 0.08 indicate an acceptable model fit.


The following are SEM modeling steps (Ferdinand 2022).
1. Theoretical model development
2. Development of flowcharts
3. Convert flowcharts into SEM equations
4. Selection of input matrices and estimation techniques
5. Assess identification problems
6. Model evaluation
7. Model interpretation and modification.

14.8 Integration of Information Mining Results in Online Media Using DNA with SEM

Information mining results from online media using DNA can be integrated with SEM
modeling; this is known as the mixed method approach. The researcher first begins
by exploring the views that exist in online media about the intention to comply with taxes. The data is then analyzed, and the information is used to construct the instrument most suitable for the sample studied and to determine the variables (mining yields) that need to be included in the model.
The sources used to obtain data in this study are content in cyberspace or social
media (data scraping). Data scraping refers to a technique in which a computer
program extracts data from the output produced by another program. One form of
Data scraping is text mining, which is the process of using an application to extract
important information from websites [40]. After scraping the data, DNA (Discourse
Network Analysis) will be carried out to find out the actors involved and the issues
raised. Furthermore, these issues are grouped so that the concepts of issues that
represent tax compliance intentions are obtained. An illustration of DNA output can
be seen in Fig. 14.3.
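
A minimal sketch of this scraping step in Python (the URL is a placeholder; real collection should respect each site's terms of service and robots.txt):

# Minimal sketch of data scraping for text mining: fetch one news page and
# extract its paragraph text. The URL is a placeholder, not a real source.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/news/tax-compliance-article"
response = requests.get(url, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]

# The collected paragraphs become the textual input for DNA coding.
document_text = "\n".join(paragraphs)
print(document_text[:500])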
From Fig. 14.3 it can be seen that based on the results of DNA analysis, the
data will consist of two categories, namely issues and actors. These issues and actors
relate to the data obtained from cyberspace regarding the topic under study. Actors are shown as black circles and issues as black squares. In addition, the DNA results yield sentiments for the statements/issues, visualized with green lines for positive sentiments, red lines for negative sentiments, and blue lines for discourses containing both positive and negative sentiments.
After the issue concepts describing the intention to comply with taxes are obtained, a research model is created. In addition to being based on the literature review, the research model is augmented with research variables obtained from the results of DNA
Fig. 14.3 Illustration of DNA output

analysis. The mined variable is usually used as an independent variable, that is, one that influences the dependent variable.
Figure 14.4 presents an illustration of the integration of DNA results and SEM modeling.
After the research model was formed, a research questionnaire was created. The
basis used for preparing the questionnaire is the result of the qualitative data analysis, which includes the identification of variables, the conceptualization
of variables, and the concept of variable measurement. After variable mining using
DNA analysis, an exogenous variable was obtained, namely perceived risk (X3). The
research model is presented in Fig. 14.5.

Fig. 14.4 Illustration of the research model from the integration of DNA and SEM outputs (Religiosity (X1), Use of e-Filing (X2), and DNA analysis results (X3) point to Attitude (Y1), Subjective Norms (Y2), and Behavior Control (Y3), which point to Tax Compliance Intention (Y4))

Fig. 14.5 Research model (Religiosity (X1), Use of e-Filing (X2), and Perceived Risk (X3) are linked through Attitude (Y1), Subjective Norms (Y2), and Behavior Control (Y3) to Tax Compliance Intention (Y4))
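Purely as an illustration of how such a structural model can be written out as equations, the structural part of Fig. 14.5 is sketched below in Python with the semopy package. The chapter's own estimation uses WarpPLS (a PLS-SEM tool), so this covariance-based sketch is only an analogue, and the input file and column names are assumptions:

# Illustrative covariance-based SEM specification of the Fig. 14.5 structure
# (the chapter itself estimates the model with WarpPLS; this is an analogue).
import pandas as pd
from semopy import Model

# X1 = religiosity, X2 = use of e-filing, X3 = perceived risk,
# Y1 = attitude, Y2 = subjective norms, Y3 = behavior control,
# Y4 = tax compliance intention.
DESC = """
Y1 ~ X1 + X2 + X3
Y2 ~ X1 + X3
Y3 ~ X1 + X2 + X3
Y4 ~ Y1 + Y2 + Y3
"""

df = pd.read_csv("survey_scores.csv")  # hypothetical file of composite scores
model = Model(DESC)
model.fit(df)
print(model.inspect())  # path coefficients with standard errors and p-values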

14.9 Research Hypothesis

Based on the conceptual model framework, the empirical studies and literature review, and the formulation of the problems and research objectives stated previously, the research hypotheses can be formulated as follows:
H1: Religiosity (X1) influences Attitude (Y1).
H2: Religiosity (X1) has an effect on Subjective Norms (Y2).
H3: Religiosity (X1) influences Behavior Control (Y3).
H4: E-Filing (X2) has an effect on Attitude (Y1).
H5: E-Filing (X2) has an effect on Behavior Control (Y3).
H6: Perceived Risk (X3) influences Attitude (Y1).
H7: Perceived Risk (X3) has an effect on Subjective Norms (Y2).
H8: Perceived Risk (X3) has an effect on Behavior Control (Y3).
H9: Attitude (Y1) influences Tax Compliance Intention.
H10: Subjective Norms (Y2) have an effect on Tax Compliance Intention.
H11: Behavior Control (Y3) influences Tax Compliance Intention.
The hypothetical model designed in this study is presented in Fig. 14.6
(Table 14.1).
Based on the results of the hypothesis testing presented in Table 14.1, all 11 (eleven) estimated effects between the research variables are significant, so this study accepts all 11 (eleven) hypotheses.
As previously explained, in SEM there are two types of effects, namely direct effects and indirect effects. Table 14.2 presents the indirect effects from the WarpPLS analysis.

Fig. 14.6 Hypothetical model (paths H1–H11 linking Religiosity, E-Filing, and Perceived Risk through Attitude, Subjective Norms, and Behavior Control to Tax Compliance Intention)

Table 14.1 SEM analysis results

Predictor                Response                         Path coefficient   P-values   Test result
Religiosity (X1)         Attitude (Y1)                    0.242**            < 0.001    Significant
Religiosity (X1)         Subjective norms (Y2)            0.263**            < 0.001    Significant
Religiosity (X1)         Behavior control (Y3)            0.224**            0.002      Significant
E-filing (X2)            Attitude (Y1)                    0.108**            0.005      Significant
E-filing (X2)            Behavior control (Y3)            0.256**            < 0.001    Significant
Perceived risk (X3)      Attitude (Y1)                    0.216**            0.003      Significant
Perceived risk (X3)      Subjective norms (Y2)            0.224**            0.002      Significant
Perceived risk (X3)      Behavior control (Y3)            0.238**            0.001      Significant
Attitude (Y1)            Tax compliance intention (Y4)    0.217**            0.003      Significant
Subjective norms (Y2)    Tax compliance intention (Y4)    0.253**            < 0.001    Significant
Behavior control (Y3)    Tax compliance intention (Y4)    0.251**            < 0.001    Significant

Source Primary Data Processed, 2020
Description *significant at alpha (α) 5%; **significant at alpha (α) 1%; ns is not significant

Based on the results of the hypothesis testing presented in Table 14.2, 9 (nine) indirect effects were found between the research variables. Of these 9 (nine) indirect effects, 7 (seven) are significant and 2 (two) are not significant.
In addition, the feasibility of the model can be analyzed by calculating the multivariate coefficient of determination, expressed as Q-Square (Q2).

Table 14.2 Hypothesis testing on the indirect effect of WarpPLS analysis

Connection       Path coefficient                    P-values   Conclusion
X1 → Y1 → Y4     0.242 × 0.217 = 0.053    0.175*     0.012      Significant
X1 → Y2 → Y4     0.263 × 0.253 = 0.067
X1 → Y3 → Y4     0.224 × 0.251 = 0.056
X2 → Y1 → Y4     0.108 × 0.217 = 0.023    0.088 ns   0.134      Not significant
X2 → Y3 → Y4     0.256 × 0.251 = 0.064
X3 → Y1 → Y4     0.216 × 0.217 = 0.047    0.163*     0.018      Significant
X3 → Y2 → Y4     0.224 × 0.253 = 0.057
X3 → Y3 → Y4     0.238 × 0.251 = 0.060

Source Primary Data Processed, 2020
Description *significant at alpha (α) 5%; **significant at alpha (α) 1%; ns is not significant. The starred value on the first row of each group is the total indirect effect of that predictor on Y4 (the sum of its individual indirect paths).

Q-Square measures how well the research model can explain the behavior of the research object (system) studied; Q2 > 0 indicates that the model has predictive relevance. To determine how much of the diversity in the data the model explains, this research uses Q2. Table 14.3 summarizes the results of the coefficients of determination.
The predictive relevance value is 0.4629, indicating that 46.29% of the diversity in the data can be explained by the model; in other words, 46.29% of the information contained in the data is explained by the model, while the remaining 53.71% is explained by other variables (not included in the model) and by error. Thus, the structural model that has been formed is appropriate.
Graphically, the results of hypothesis testing in the SEM structural model with the WarpPLS approach can be seen in Fig. 14.7.

Table 14.3 Coefficient of determination

Response variable                 R-squared
Attitude (Y1)                     0.106
Subjective norms (Y2)             0.114
Behavior control (Y3)             0.172
Tax compliance intention (Y4)     0.181
Total determination coefficient   0.4629

Source Primary Data Processed (2020)

Q2 = 1 − [(1 − 0.106)(1 − 0.114)(1 − 0.172)(1 − 0.181)]
Q2 = 1 − (0.894)(0.886)(0.828)(0.819)
Q2 = 1 − 0.5371
Q2 = 0.4629 = 46.29%
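The Q2 computation above is a simple product over the endogenous constructs' R-squared values; as a quick illustrative check in Python (values taken from Table 14.3):

# Q-square (predictive relevance) from the R-squared values in Table 14.3.
import math

r_squared = [0.106, 0.114, 0.172, 0.181]  # Y1, Y2, Y3, Y4
q2 = 1 - math.prod(1 - r2 for r2 in r_squared)
print(f"Q2 = {q2:.4f}")  # 0.4629, i.e. about 46.3% of the data diversity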

Fig. 14.7 Conceptual framework for hypothesis testing results, showing the path coefficients and p-values reported in Table 14.1. Source Primary Data Processed (2020). Remarks black right arrow = Significant, red right arrow = Not Significant

14.10 Conclusion

Based on the results of the research and discussion of each of the variables previously
described, the following research conclusions can be drawn:
(1) Religiosity has a significant effect on Attitude in a positive direction. The results of this study explain that a taxpayer who holds strong religious beliefs and knowledge, practices religion actively, and appreciates religion has a good attitude toward complying in paying taxes.
(2) Religiosity has a significant effect on Subjective Norms in a positive direction.
The results of this study explain that the higher the religiosity (X1), the higher
the subjective norm (Y2).
(3) Religiosity has a significant effect on Behavior Control in a positive direction.
The results showed that the higher the religiosity (X1), the higher the behavior
control (Y3).
(4) E-Filing has a significant effect on Attitude with a positive direction of influence. The results indicate that any increase in E-Filing causes Attitude to increase; the positive coefficient means the effect is unidirectional.
(5) E-Filing has a significant effect on Behavior Control with a positive direction of influence. The results explain that every change in the E-Filing variable produces a significant change in the Behavior Control variable.
(6) Perceived Risk has a significant effect on Attitude in a positive direction. The results indicate that every change in the Perceived Risk variable produces a significant change in the Attitude variable.

(7) Perceived Risk has a significant effect on Subjective Norms in a positive direction. The higher the perceived risk, the higher the subjective norm, and the lower the perceived risk, the lower the subjective norm.
(8) Perceived Risk has a significant effect on Behavior Control with a positive direction of influence. The higher the perceived risk, the higher the behavior control, and conversely, the lower the perceived risk, the lower the behavior control.
(9) Attitude has a significant effect on the Intention to Comply with Taxes in a positive direction. The higher the attitude, the higher the tax compliance intention, and the lower the attitude, the lower the tax compliance intention.
(10) Subjective Norms have a significant effect on the Intention to Comply with Taxes in a positive direction. The higher the Subjective Norms, the higher the Tax Compliance Intention, and conversely, the lower the Subjective Norms, the lower the Tax Compliance Intention.
(11) Behavior Control has a significant effect on the Intention to Comply with Taxes in a positive direction. The positive coefficient means that the effect is unidirectional: the higher the Behavior Control, the higher the Tax Compliance Intention, and conversely, the lower the Behavior Control, the lower the Tax Compliance Intention.

References

1. Ajzen, I.: The theory of planned behavior. Organ. Behav. Hum. Decis. Process. 50(2), 179–211
(1991)
2. Musimenta, D., Nkundabanyanga, S.K., Muhwezi, M., Akankunda, B., Nalukenge, I.: Tax
compliance of small and medium enterprises: a developing country perspective. J. Fin. Regul.
Compl. 25(2), 149–175 (2017)
3. Mitu, N.E.: A basic necessity of a modern fiscal policy: voluntary compliance. Revista de
Științe Politice Revue des Sciences Politiques 57, 118–130 (2018)
4. Newman, W., Mwandambira, N., Charity, M., Ongayi, W.: Literature review on the impact of
tax knowledge on tax compliance among small medium enterprises in a developing country.
Int. J. Entrepreneursh. 22(4), 1–15 (2018)
5. Ajzen, I.: Attitudes, Personality and Behavior. Open University Press, Milton-Keynes, England
(2005)
6. Azwar, S.: Human Attitudes, Theories, and Measurements. Student Library, Jogjakarta (2010)
7. Haus, I., Steinmetz, H., Isidor, R., Kabst, R.: Gender effects on entrepreneurial intention: a
meta-analytical structural equation model. Int. J. Gend. Entrep. 5(2), 130–156 (2013)
8. Dharmamesta, B.S.: Theory of planned behavior in consumer attitude, intention, and behavior
research. Manage 7 (1998)
9. Rotter, J.B.: Internal vs external control of reinforcement: a case history of a variable. Am.
Psychol. 45(4), 489–493 (1990)
10. Tjahjono, A., Muhammad, F.H.: Taxation (3rd edn). Yogyakarta YKPN Company Academy
(2005)
11. Park, H., Blenkinsopp, J.: Whistleblowing as planned behavior—a survey of South Korean
police officers. J. Bus. Ethics 85, 545–556 (2009)

12. Ajzen, I., Fishbein, M.: Attitude-behavior relations: a theoretical analysis and review of
empirical research. Psychol. Bull. 84(5), 888 (1988)
13. Sulistomo, A., Prastiwi, A.: Accounting Students’ Perceptions of Fraud Disclosure (Empirical
Study on UNDIP and UGM Accounting Students) (Doctoral dissertation, Faculty of Economics
and Business) (2011)
14. Gundlach, M.J., Douglas, S.C., Martinko, M.J.: The decision to blow the whistle: a social
information processing framework. Acad. Manag. Rev. 28(1), 107–123 (2003)
15. Sussman, R., Gifford, R.: Causality in the theory of planned behavior. Pers. Soc. Psychol. Bull.
45(6), 920–933 (2019)
16. Epstein, S.: Aggregation and beyond: some basic issues on the prediction of behavior. J. Pers.
51, 360–392 (1983)
17. Fishbein, M., Ajzen, I.: Attitudes towards objects as predictors of single and multiple behavioral
criteria. Psychol. Rev. 81(1), 59 (1974)
18. Fishbein, M., Yzer, M.C.: Using theory to design effective health behavior interventions.
Commun. Theory 13(2), 164–183 (2003)
19. Jogiyanto: Behavioral Information System. Andi Offset, Yogyakarta (2007)
20. Sulistyo, H.: The role of religious values on employee performance in the organization. Media
Res. Bus. Manage. 11(3), 252–270 (2011)
21. Susanti, R.: A description of the future orientation of adolescents in the field of work in terms
of religiosity and achievement motivation in the youth of Sei Banyak Ikan Kelayang Village.
J. Psychol. 12(2), 109–116 (2016)
22. Elci, M.: Effect of manifest needs, religiosity and selected demographics on hard working: an
empirical investigation in Turkey. J. Int. Bus. Res. 6(2), 97 (2007)
23. Mohdali, R., Pope, J.: The role of religiosity in tax morale and tax compliance. In: Australian
Tax Forum, vol. 25, no. 4, pp. 565–596 (2010)
24. Purnamasari, P., Amaliah, I.: Fraud prevention: relevance to religion and spirituality in the
workplace. Procedia Soc. Behav. Sci. 211, 827–835 (2015)
25. Utama, M.S., Nimran, U., Hidayat, K., Prasetya, A.: Effect of religiosity, perceived risk, and
attitude on tax compliant intention moderated by e-filing. Int. J. Fin. Stud. 10(1), 8 (2022)
26. Soemitro, R.: Tax Theory and Cases. Gramedia, Jakarta (2013)
27. Pujianto, A., Mulyati, A., Novaria, R.: Utilization of big data and consumer privacy protection
in the digital economy era. BIJAK Sci. Mag. 15(2), 127–137 (2018)
28. Kominfo, T.P.: Big Data Pocket Book. Ministry of Communication and Informatics (2015)
29. Ghozali, I.: Structural Equation Modeling, Alternative Method with Partial Least Square. Undip
Publishing Agency, Semarang
30. Solimun, S., Fernandes, A.A.R., Nurjannah, N.: Multivariate Statistical Method Structural
Equation Modeling (SEM) WarpPLS Approach. UB Press, Malang (2017)
31. Aji, A.S., Harahab, N.: Analysis of the effect of product price, product image, and customer
satisfaction as a mediation on brand loyalty of canned fish products from ABC brands.
ECSOFiM (Econ. Soc. Fish. Marine J.) 6(1), 83–92 (2018)
32. Ginting, D.B.: Structural equation model (SEM). Media Inform. 8(3), 121–134 (2009)
33. Chalil, D., Barus, R.: Qualitative Data Analysis: Theory and Applications in SWOT Anal-
ysis, Logit Models, and Structural Equation Modeling (Supplemented with SPSS and Amos
Manuals) (2014)
34. Hoelter, J.W.: The analysis of covariance structures: goodness-of-fit indices. Sociol. Methods
Res. 11(3), 325–344 (1983)
35. Hair, J., Anderson, R., Tatham, R., Black, W.: Multivariate Data Analysis, 5th edn. Prentice
Hall, Upper Saddle River, New Jersey (1998)
36. Bentler, P.M., Chou, C.P.: Practical issues in structural modeling. Sociol. Methods Res. 16(1),
78–117 (1987)
37. Byrne, B.M.: Structural Equation Modeling with AMOS: Basic Concepts, Applications, and
Programming. Lawrence Erlbaum Associates Inc., Mahwah, NJ (2001)
38. Ullman, J.B.: Structural equation modeling: reviewing the basics and moving forward. J.
California Stat. Assess. 87, 35–50 (2006)

39. Schumacker, R.E., Lomax, R.G.: A Beginner’s Guide to Structural Equation Modeling, 3rd
edn. Routledge, New York (2010)
40. Riyadi: REST Web Service Design for Comparison of Shipping Prices with Web Scrapping
Methods and Utilization of API. College of Informatics and Computer Management Amikom
Yogyakarta, Yogyakarta (2013)
Chapter 15
Influence of Firm-Specific Variables
on Capital Structure Decisions:
An Evidence from the Fintech Industry

Suzan Dsouza and Ajay Kumar Jain

Abstract Capital structure (Capstr) decisions have always been a concern for firms. It is crucial to decide the right proportion of borrowed funds for any organisation. This study examines the influence of firm-specific variables that determine the Capstr decisions of firms from the fintech industry. The data for this study is sourced from the Refinitiv database and covers the worldwide fintech industry. The selected sample consists of 186 firms from across the global fintech industry. Through a quantitative approach, we use panel regression, supported by descriptive statistics and correlation, considering the annual published financial data of the selected firms for the period from 2011 to 2021. The data used in the study comprise an unbalanced, cross-sectional panel of 1000 firm/year observations drawn from the selected 186 firms. The findings conclude that firm size (FS), profitability (Prft), tangibility ratio (Tr) and volatility (Vol) have a significant impact on the total debt ratio (TDr) and short term debt ratio (SDr), whereas only profitability (Prft) has a significant impact on the long term debt ratio (LDr). The findings are partially consistent with the pecking order theory for the studied industry.

Keywords Fintech industry · Capital structure decisions · Regression models · Debt ratios · Panel data

S. Dsouza (B)
College of Business Administration, American University of the Middle East, Egaila, Kuwait
e-mail: [email protected]
A. K. Jain
Department of Finance, Westminster International University of Tashkent, Tashkent,
Uzbekistan 100047
e-mail: [email protected]


15.1 Introduction

The decision regarding capital structure (Capstr) is among the most essential decisions for any business. A major issue faced by a firm when taking Capstr decisions is understanding and determining the optimal Capstr. Corporate finance offers many different theories that focus on the firm-specific factors that act as determinants of capital-structure decision making. Recently, researchers have been focusing on firm-specific factors in the Fintech industry, especially in developed countries. The Capstr decision has to be taken in advance, at the formation of the company or whenever additional funds are required to meet capital investment decisions.
The new era has brought the introduction and implementation of digital technologies such as artificial intelligence, cloud computing, blockchain, and big data in the finance industry. Fintech is transforming the traditional financial industry all over the world and significantly impacts the capital structure of companies. The term Fintech was proposed in the year 1990, introduced in the US by Citibank through the "Financial Services Technology Alliance". Fintech supports the financial industry by applying advanced technologies and helps in its development [20]. A detailed definition of Fintech, given by the Financial Stability Board in 2016 [59], is that Fintech is a new technological invention in the finance industry that influences the financial markets, institutions and services and results in new business models, products and services. Fintech involves digital technologies, blockchain, cloud computing, and artificial intelligence [69].
According to Ding et al. [27], in recent years the application of Fintech has grown and emerged at very high speed around the globe. KPMG (2021) reported that global Fintech investment reached around USD 210 billion across approximately 5684 deals in 2021 [1]. The rules of the traditional financial system's operating environment are changing with the emergence of fintech, as suggested by [12, 49, 62]. With the implementation of Fintech in the financial industry, the customer-centric business style and the efficiency of customer service can be effectively improved, as operating costs are reduced and risk control systems are strengthened [69]. Fintech is revolutionizing the financial industry with many benefits that add productive value for the entire industry's growth [14]. Previous research has shown that Fintech enhances the total productivity of the firm [71]. Lv and Xiong [50] discussed how the development of Fintech and corporate investment efficiency move together with a positive correlation. Despite the many studies on Fintech and its impact on the financial sector, its influence through firm-specific variables on Capstr decisions has remained unexplored. As Fintech technologies rest on a critical mix of data and information, it is quite difficult to comprehend the Capstr decisions of firms applying such technologies.

Previous literature on Capstr financing decisions focuses on two options, debt and equity. The topic has also been researched with respect to private and public fixed-income securities and the possible determinants that can affect the Capstr of an organization in any sector. A huge number of studies build on the Modigliani and Miller theory, which suggests that the value of a firm does not depend on its Capstr in theoretical markets with no taxes, no agency costs, and full information disclosure [8]. Determinants of Capstr have been identified through research on the pecking order theory [36, 51, 63] and the trade-off theory [2, 55, 64]. The pecking order theory, developed by Donaldson in 1961 and later modified by Myers in 1984, states that an organization first prioritizes internal funds for sourcing its finances and then moves to debt and equity capital, in that order. Under this theory, the most readily available option is used first, and debt and equity are adopted for further financing [9]. On the contrary, the trade-off theory explains that Capstr decisions are based on an understanding of trade-offs, comparing the cost of debt with its benefits [25].
Several studies of the fintech industry have investigated the transformation and understanding of fintech in recent years around the globe and the determinants of debt financing for fintech startups [4, 39, 47], but such research is very limited. Also, no specific research has studied the influence of firm-specific variables on Capstr decisions with evidence from the fintech industry. It is worth understanding this influence in the fintech industry specifically, as the industry is highly data-driven. Barsotti [6] is of the view that, to study the optimal structure of capital, it is crucial to understand firm-specific variables for different organizations; however, the fintech industry has not been explored in capital structure studies, as the sources of funds have a different dimension in this industry. Unlike traditional routes, this industry is very popular with VC funding and M&A activity. This untraditional pattern of fund sourcing leaves a gap for further research to observe the relationship of sales growth (SG), firm size (FS), profitability of firm (Prft), tangibility ratio (Tr) and volatility (Vol) with the total debt ratio (TDr), long term debt ratio (LDr) and short term debt ratio (SDr).
Given the distinctive characteristics of the fintech industry and the assorted evidence on the determinants of Capstr decisions, a structured study of the fintech industry is needed.
Our study adds significant answers and conclusions to the present, limited literature. The objective is to identify the impact of SG, FS, Prft, Tr and Vol on the TDr, LDr and SDr of firms from across the global fintech industry. We chose the fintech industry because it has always been a booming industry with limited research work on it, and there is very little global evidence so far.
First, this study reviews the Capstr decision literature and theories developed in previous years and their generalization to the Fintech industry. Second, the research provides strong evidence collected from the literature to understand the specific variables and their influence on Capstr decision-making. The research methodology is discussed in the third section, with the sample and descriptive

statistics presented in the fourth section and the discussion of the results in the fifth. Finally, a conclusion sums up the research in the last section.

15.2 Literature Review and Hypotheses Development

A large body of literature from previous case studies and journals addresses Capstr decisions and their determinants in general. The review of literature in this section is divided into three parts: first, Capstr decisions in general; second, capital-structure decisions in the Fintech industry; and third, determinants of Capstr with hypotheses.

15.2.1 Capital Structure (Capstr) Decisions

This section focuses on empirical evidence from previous literature on the factors and determinants of Capstr decisions. Matias and Serrasqueiro [52] studied the reliable determinant factors of Capstr decisions and identified profitability, size, age, asset structure, and growth opportunities as reliable determinants. Their results also showed that the decisions were closer to the pecking-order theory than to the trade-off theory. In emerging markets, the Capstr determinants differ. Employing a GMM estimator to control for endogeneity, researchers found that the factors of capital structure are quite different for long-term indicators than for short-term indicators [68].
Güner [41] studied the factors considered in the Capstr decisions of Turkish companies and exploited the differences in capital-structure decisions across companies' degrees of free float rate in the financial market, foreign paid-in capital, and market values. The results showed that although the pecking-order theory is the principle that best describes Capstr, some determinants are better suited to the trade-off theory. The research observed that companies with high free float rates have low leverage levels, and this varies across companies' market values. Alipour et al. [3] collected evidence from companies in Iran to study the factors of Capstr and stated that variables such as firm size, financial flexibility, asset structure, profitability, liquidity, growth, risk, and state ownership influence the measures of Capstr in Iranian corporations. Evidence from the research shows that short-term debt is the most important financing option for companies in Iran, and these results are supported by many previous theories.
An empirical study using data from Chinese non-financial firms showed that the average leverage ratio in the sample is similar to those found in emerging nations. The study also suggests that tangibility, size, volatility and firm age are firmly correlated with leverage and are quite robust determinants, whereas the firm's profitability negatively impacts the leverage position, as shown by Vijayakumaran and

Vijayakumaran [67]. Correia [19] and Chakrabarti and Chakrabarti [10] have examined profitability as a factor in capital-structure decisions. Their analyses show a negative correlation between the profitability of the organization and debt, because firms with high profit levels generally carry low levels of debt funding, owing to the cost of financial distress assumed by the trade-off theory. Further, this evidence is in line with the pecking-order theory, and previous literature has found that profitability has a negative relationship with debt and that profit-making SMEs prefer internal profits over debt [53].
The study conducted by Sofat and Singh [65] explained different conditional theories of Capstr and reviewed the literature to analyze manufacturing firms. The results suggested that variables such as asset composition, business risk and return on assets (ROA) have a positive correlation with the debt ratio and are optimal determinants for Capstr decision making, whereas firm size and debt servicing capacity are not considered influential determinants. Ullah et al. [66] described a positive correlation between the debt-equity ratio and return on equity at the 10% confidence level. In contrast, the asset turnover ratio has an inverse relationship with return on equity, and there is also a negative relationship between firm size and return on equity.
Rahman et al. [60] discussed the profits of listed manufacturing companies in Bangladesh and the impact of capital-structure decisions on them. Considering around 50 observations of firms listed on the Dhaka stock exchange between 2013 and 2017, the results declared that the debt ratio and the equity ratio both have a positive effect: the equity ratio has a critical positive impact on return on equity, whereas the debt-to-equity ratio behaves negatively with return on equity. Chang et al. [13] assessed the impact of capital-structure decisions on the profitability position of companies. The study observed data from Asian economies and implemented regression analysis to ascertain the results, finding a negative correlation between leverage and profitability but a positive relationship between growth factors and leverage levels. Nguyen and Nguyen [56] studied the relationship between the profitability of non-financial firms and capital structure, selecting around 488 listed companies from the Vietnam stock exchange between 2013 and 2018. The results showed a negative relation between profit position and capital structure.
Putri and Rahyuda [58] researched the influence of capital structure, through the debt-equity ratio and sales growth, on the profitability matrix. The study showed that the debt-equity ratio negatively influences the profit position of the company, whereas the growth factor positively influences profits. Orlova et al. [57] concluded that the complexity of Capstr decision making is based on the requirement for external funding, accessibility to the fixed-income securities market, and the capability of the borrowing firm to handle additional leverage; these determinants affect the complexity of the decisions, and a firm with a financing deficit can take advantage of market accessibility, which mitigates the complexity of capital structure. Dimitropoulos and Koronios [26] examined the Capstr determinants during the Greek debt crisis, and the results showed that asset tangibility is

straightaway related to the total and long-term leverage position majorly when the
crisis due to debt hit the Greek. Whereas the “non-debt tax shield (NDTS)” and
payment of taxes negatively impacted the total leverage position in the firm. Also,
an organization that has a high level of growth opportunities in the market tends to
be more associated with lower long-term debt. This results in low debt exposure.

15.2.2 Capital Structure (Capstr) Decisions: Fintech Industry

This section reviews the literature related to the Fintech sector only as it concerns the determination of Capstr decisions. Gastaud et al. [37] found that owner risk and tolerance, the characteristics of the promoters, and the goods and services produced by the firm [61] are a few factors that influence financing decisions.
Fintech is a fast-growing mechanism in the rapidly changing financial services sector [45], yet it is not elaborated and understood in detail [5]. The literature on Fintech is lagging behind, and there are few key topics on the subject [54]. Zavolokina et al. [70] explained Fintech as a living entity rather than treating it as a stable idea. Many case studies have examined Capstr decision-making and fund-raising factors such as past funding obtained, performance, and human resource characteristics [11].
Kachlami [46] examined the SME sector and reported that SMEs are prone to utilizing their profits and applying internal sources of funding instead of depending on external funding such as debt. On the other hand, research conducted on startups using the Kauffman firm survey shows that owners with high net worth tend to use more of their own equity in the firm rather than depending on debt [16]. Fintech is well established in countries rich in venture capital, and funding requirements in such companies are fulfilled through diversified approaches such as the "in-residence incubator program", generally implemented by financial institutions working in the Fintech sector [44].
Giaquinto [40] explained that the business environment of a country affects the Fintech industry, and the study observed a positive correlation between business venture capital and seed round capital. These studies have not focused on the Capstr stages of the FinTech industry, and previous research has many shortcomings.
Bui [7] discussed the funding of startups such as Fintechs: where firms source less traditional funding, they are more likely to depend on equity-based funds, although this cannot be generalized. Evidence collected by Langevin [48] shows that fintech is transforming the capital markets, with the capability of mitigating information asymmetry [35]; moreover, it increases stock liquidity, which allows access to low-cost equity finance options.
Evidence shown by Comeig et al. [17] indicates that accessing external financing is a major issue for fintech companies, as they are informationally opaque, especially new startups that lack collateral. Therefore, [15] explained that debt can be

used by such firms as it also provides better performance, and they are more likely
to survive in long run with fast growth in revenues.

15.2.3 Determinants of Capital Structure

Referring to the literature review in the sections above, the prospective determinants of Capstr decision making are discussed in this section. These factors are sales growth, firm size, profitability of the firm, tangibility ratio, and volatility.
Sales growth is calculated as the percentage change in sales from the previous year's sales. It has been observed that firms with more growth opportunities are less prone to depend on debt funding due to its high cost [9]. It follows that high-growth firms avail themselves of lower leverage levels, and the hypothesis formed is:
H1: Firms’ sales growth has a negative impact on debt ratios.
The size of the firm is taken as the natural logarithm of total assets, and it is observed that firms with a large asset base tend to acquire a high proportion of debt [53]. Therefore, the hypothesis is that debt and firm size are positively correlated.
H2: Size of the firm has a positive impact on debt ratios.
The firm's profitability is measured by ROA [21–24, 28–33, 42, 43]. It is observed that a firm with a high level of profits already has sufficient internal funds, so such firms depend less on debt financing [8]. The next hypothesis, based on profitability as a factor, is defined as:
H3: Profitability of the firm has a negative impact on debt ratios.
Tangibility is one of the critical factors in Capstr decision-making and is calculated as net fixed assets over total assets. Firms with high levels of fixed assets tend to source their finance from external or bank borrowings by using such fixed assets as collateral [19]. Therefore, tangibility positively affects firm debt financing.
H4: Tangibility ratio of firm has a positive impact on debt ratios.
Barsotti [6] explained that volatility can be measured by the three-year standard deviation of ROA. Firms with high volatility levels depend less on debt funding [34, 38], and the next hypothesis based on this is:
H5: Volatility of the firm has a negative impact on debt ratios.

Table 15.1 Explanation of variables

Type of variable        Variables                      Measurement
Dependent variable      Total debt ratio (TDr)         Total debt/total assets
                        Long term debt ratio (LDr)     Long-term debt/total assets
                        Short term debt ratio (SDr)    Short-term debt/total assets
Independent variables   Sales growth (SG)              Sales growth rate
                        Firm size (FS)                 Ln total assets
                        Profitability of firm (Prft)   ROA
                        Tangibility ratio (Tr)         Three-year standard deviation of ROA is used for Vol; Tr is net fixed assets/total assets

15.3 Variables and the Research Model

In this research paper, we utilize an unbalanced panel of data from 186 firms for the period 2011–2021, making up a total of 1000 firm/year observations. The dependent variable is the firm's Capstr, proxied by the total debt ratio (TDr), long term debt ratio (LDr) and short term debt ratio (SDr), while the independent variables are sales growth (SG), firm size (FS), profitability of firm (Prft), tangibility ratio (Tr) and volatility (Vol). The time series data on all variables were obtained from the financial data available on the Refinitiv website. All the selected firms are listed on a stock exchange and belong to the global fintech industry (Table 15.1).

15.3.1 Research Model

The model below has been adopted to test our hypotheses:

TDr_it / LDr_it / SDr_it = β1 + β2 SG_it + β3 FS_it + β4 Prft_it + β5 Tr_it + β6 Vol_it + Fixed effects + ε_it

where the dependent and independent variables are as defined above, the fixed effects are proxied by year included in the model, and ε_it represents the error term.
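As a minimal sketch of how this panel model could be estimated (the chapter's processing was done in STATA; the linearmodels package, the input file, and the column layout below are assumptions):

# Illustrative estimation of the panel model above. Assumes a long-format
# DataFrame with columns TDr, SG, FS, Prft, Tr, Vol and a (firm, year) index.
import pandas as pd
from linearmodels.panel import PanelOLS, RandomEffects

df = pd.read_csv("fintech_panel.csv")  # hypothetical input file
df = df.set_index(["firm", "year"])    # entity/time MultiIndex

# Fixed-effects variant with year (time) effects, as in the model above.
fe_res = PanelOLS.from_formula(
    "TDr ~ 1 + SG + FS + Prft + Tr + Vol + TimeEffects", data=df
).fit(cov_type="clustered", cluster_entity=True)

# Random-effects variant (used for LDr after the Hausman test).
re_res = RandomEffects.from_formula(
    "TDr ~ 1 + SG + FS + Prft + Tr + Vol", data=df
).fit()

print(fe_res.params, re_res.params, sep="\n")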

15.4 Sample and Descriptive Statistics

The selected sample comprises firms listed on a stock exchange over the 2011–2021 period, from the global fintech industry, available on the Refinitiv database. In selecting the period we aimed at including as many of the most recent years as possible. We pooled the firm/year data from all the listed firms globally and excluded firm/year observations with missing or insufficient financial information for any of the selected variables. A cross-sectional, unbalanced panel was obtained after all the possible data reductions. The panel comprises 1000 firm/year observations from the selected 186 firms. The outliers in the sample were not removed from the panel; however, the data was winsorized at the 2% level (i.e., at the 2nd and 98th percentiles). The data was further processed using STATA software. Table 15.2 shows the descriptive statistics, skewness, and kurtosis results for the mentioned data.
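A brief sketch of the winsorizing and descriptive-statistics step in Python (the chapter itself used STATA; scipy's winsorize, the input file, and the column names are assumptions):

# Sketch of the 2% winsorizing (2nd/98th percentiles) and the summary step.
import numpy as np
import pandas as pd
from scipy.stats import kurtosis, skew
from scipy.stats.mstats import winsorize

df = pd.read_csv("fintech_panel.csv")  # hypothetical input file
cols = ["TDr", "LDr", "SDr", "SG", "FS", "Prft", "Tr", "Vol"]

for c in cols:
    # Clip the lowest and highest 2% of each variable.
    df[c] = np.asarray(winsorize(df[c].to_numpy(), limits=[0.02, 0.02]))

summary = df[cols].describe().T[["count", "mean", "std", "min", "max"]]
summary["skewness"] = [skew(df[c]) for c in cols]
summary["kurtosis"] = [kurtosis(df[c]) for c in cols]
print(summary.round(2))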
As shown in Table 15.2, the mean of TDr is 0.38, of LDr 0.08 and of SDr 0.20; the standard deviation of TDr is 1.20, of LDr 0.19 and of SDr 0.72. Means close to zero indicate that the firms barely use debt (neither short term nor long term) as a source of finance for the business. The low standard deviations also indicate that firms across the fintech sample follow the same debt practice. The mean and standard deviation of SG are 0.51 and 1.76: sales growth has been positive on average for the sample, and the low standard deviation indicates similar behaviour across the sample. FS has a mean of 17.84 and a standard deviation of 3.47; considering the maximum and minimum values of FS, the mean indicates a balanced distribution of firms across the sample with respect to their investments in assets. Prft has a mean of − 0.99 and a standard deviation of 3.33; a negative mean, though close to zero, indicates that the majority of the fintech firms are either making losses or are barely able to reach their breakeven point.
Table 15.2 Descriptive statistics

Variables                      Obs.   Mean    Std. dev.   Min.     Max.    Pr(Skewness)   Pr(Kurtosis)
Total debt ratio (TDr)         1000   0.38    1.20        0.00     7.49    0.00           0.00
Long term debt ratio (LDr)     1000   0.08    0.19        0.00     1.03    0.00           0.00
Short term debt ratio (SDr)    1000   0.20    0.72        0.00     4.54    0.00           0.00
Sales growth (SG)              1000   0.51    1.76        −1.00    9.35    0.00           0.00
Firm size (FS)                 1000   17.84   3.47        9.86     24.74   0.41           0.00
Profitability of firm (Prft)   1000   −0.99   3.33        −20.46   0.36    0.00           0.00
Tangibility ratio (Tr)         1000   0.13    0.21        0.00     0.88    0.00           0.00
Volatility (Vol)               1000   1.50    5.04        0.00     29.74   0.00           0.00

Tr has a mean of 0.13 and a standard deviation of 0.21; a mean as low as 0.13 indicates that fixed assets comprise, on average, 13% of total assets, with consistent behaviour across the sample. Vol has a mean of 1.50 and a standard deviation of 5.04; the three-year standard deviation of ROA thus has a low mean and reasonably consistent behaviour across the fintech sample. As the skewness observations for the whole sample are almost equal to zero, the data used in the complete sample is fairly symmetrical. Low kurtosis values across the whole sample indicate that the sample lacks extreme outliers.
Table 15.3 reports the correlations between the independent and dependent variables. SG has a positive correlation with TDr, LDr and SDr. FS and Prft have negative correlations with TDr, LDr and SDr, while Tr has a positive correlation with TDr, LDr and SDr; the correlations of FS, Prft and Tr with TDr and SDr are statistically significant at the 5% level. The correlation matrix provides only a general, preliminary view of the associations amongst the variables; these need to be further tested with regression analysis to identify the impact of the independent variables on the dependent variables.
Table 15.4 shows the variance inflation factor (VIF) results. The variables in the model are free from multicollinearity.

Table 15.3 Correlation amongst the variables

Variables   TDr        LDr       SDr        SG        FS         Prft       Tr       Vol
TDr         1
LDr         0.4057*    1
SDr         0.8738*    0.1207*   1
SG          0.0051     0.0166    0.0156     1
FS          −0.3229*   −0.0051   −0.3581*   −0.0241   1
Prft        −0.5279*   −0.0499   −0.5863*   −0.0444   0.5068*    1
Tr          0.1116*    0.019     0.1518*    −0.0291   −0.2298*   −0.1082*   1
Vol         0.4333*    0.0042    0.4932*    0.0312    −0.4521*   −0.725*    0.0432   1

Note *Statistically significant at 5% level

Table 15.4 VIF results: SG, FS, Prft, Tr, Vol and TDr/LDr/SDr

Variables   SG     FS     Prft   Tr     Vol    Mean VIF
VIF         1.00   1.44   2.31   1.06   2.17   1.60
1/VIF       1.00   0.69   0.43   0.94   0.46
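The VIF diagnostics in Table 15.4 can be reproduced with statsmodels; a sketch, assuming the regressor DataFrame from the earlier sketches:

# Sketch of the VIF computation in Table 15.4 (statsmodels API).
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["SG", "FS", "Prft", "Tr", "Vol"]])
for i, name in enumerate(X.columns):
    if name != "const":  # the constant's VIF is not reported
        print(name, round(variance_inflation_factor(X.to_numpy(), i), 2))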

Fig. 15.1 Box plot of total debt ratio (TDr), by year (2011–2021)

15.5 Results and Discussion

15.5.1 Distribution of Capstr (Box Plot Technique)

Figures 15.1, 15.2 and 15.3 display the distribution of the total debt ratio (TDr), short term debt ratio (SDr) and long term debt ratio (LDr) over the period from 2011 to 2021 using box plots. The box plot technique displays the five-number summary as a central box with whiskers that extend to the non-outlying values. As observed in all three figures, for each individual year the median is not roughly centered between the quartiles and the whiskers are not of similar length; thus we conclude that the per-year distributions of TDr, SDr and LDr over the period of the study are skewed.
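Per-year box plots like those in Figs. 15.1–15.3 can be reproduced with pandas and matplotlib; a sketch, assuming the long-format DataFrame df from the earlier sketches:

# Sketch of the per-year box plot in Fig. 15.1 (pandas/matplotlib).
import matplotlib.pyplot as plt

ax = df.reset_index().boxplot(column="TDr", by="year", grid=False)
ax.set_xlabel("Year")
ax.set_ylabel("Total debt ratio (TDr)")
ax.set_title("Box plot of total debt ratio (TDr)")
plt.suptitle("")  # drop pandas' automatic grouped-by title
plt.show()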

15.5.2 Regression Results

Panel regression analysis is used to observe the impact of the independent variables on the dependent variables. The most significant results, derived from the no-dummy or year-dummy specification, are used to analyse the discussed panel regression model. Further, the Hausman test determines the selection of fixed or random effects for the analysis.
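The Hausman test contrasts the fixed- and random-effects estimates of the coefficients common to both models; a compact sketch of the statistic (assuming fe_res and re_res fitted with linearmodels as in the earlier sketch):

# Compact Hausman test sketch: H = d' [V_FE - V_RE]^(-1) d, where d is the
# difference between the FE and RE coefficient vectors (common regressors).
import numpy as np
from scipy import stats

def hausman(fe_res, re_res):
    common = fe_res.params.index.intersection(re_res.params.index)
    d = (fe_res.params[common] - re_res.params[common]).to_numpy()
    v = (fe_res.cov.loc[common, common] - re_res.cov.loc[common, common]).to_numpy()
    h = float(d @ np.linalg.inv(v) @ d)
    return h, len(common), stats.chi2.sf(h, len(common))  # stat, df, p-value

# A small p-value favours fixed effects; a large one favours random effects.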
Table 15.5 presents the regression results. Based on the Hausman test statistic (p = 0.0018), the fixed effect results have been analysed for the dependent variable TDr. The firm's sales growth (SG) has no significant impact on TDr.

Fig. 15.2 Box plot of short term debt ratio (SDr), by year (2011–2021)


Fig. 15.3 Box plot of long term debt ratio (LDr), by year (2011–2021)

However, firm size (FS) with no dummy has a negative and significant (p < 0.05) impact on TDr, which runs counter to hypothesis (H2). The profitability of the firm (Prft) with year dummy has a negative and significant (p < 0.01) impact on TDr. Tangibility ratio (Tr) with no dummy has a positive and significant (p < 0.01) impact on TDr. However, Volatility (Vol) with no dummy has a positive and significant (p < 0.05) impact on TDr, which runs counter to hypothesis (H5). For the dependent variable SDr, considering the Hausman test statistic (p = 0.0049), the fixed effect results have been analysed. The firm's sales growth (SG) has no significant impact on SDr. However, firm size (FS) with no dummy has a negative and significant (p < 0.01) impact on SDr, again counter to hypothesis (H2). The profitability of the firm (Prft) with year dummy has a negative and significant (p < 0.01) impact on SDr. Tangibility ratio (Tr) with year dummy has a positive and significant (p < 0.01) impact on SDr. However, Volatility (Vol) with no dummy has a positive and significant (p < 0.01) impact on SDr, again counter to hypothesis (H5). For the dependent variable LDr, the Hausman test statistic (p = 0.8236) leads to the random effect results being analysed. The firm's sales growth (SG), firm size (FS), Tangibility ratio (Tr) and Volatility (Vol) have no significant impact on LDr. However, the profitability of the firm (Prft) with no dummy has a negative and significant (p < 0.05) impact on LDr.

Table 15.5 Regression results

                           Fixed effects                                              Random effects
                           Total debt ratio (TDr)       Short term debt ratio (SDr)   Long term debt ratio (LDr)
Independent variables      No dummy      Year dummy     No dummy      Year dummy      No dummy     Year dummy
Sales growth (SG)          0.009         0.01           0.009         0.01            0.002        0.002
                           (0.016)       (0.016)        (0.010)       (0.011)         (0.003)      (0.003)
Firm size (FS)             −0.069**      −0.05          −0.058***     −0.055***       0            −0.003
                           (0.028)       (0.031)        (0.018)       (0.020)         (0.003)      (0.003)
Profitability of firm      −0.143***     −0.147***      −0.093***     −0.094***       −0.005**     −0.004*
(Prft)                     (0.013)       (0.014)        (0.009)       (0.009)         (0.002)      (0.002)
Tangibility ratio (Tr)     0.51***       0.48***        0.388***      0.391***        0.029        0.039
                           (0.177)       (0.182)        (0.117)       (0.120)         (0.029)      (0.029)
Volatility (Vol)           0.019**       0.02**         0.015***      0.015***        −0.001       −0.001
                           (0.008)       (0.008)        (0.005)       (0.005)         (0.001)      (0.001)
Observations               1000          1000           1000          1000            1000         1000
R-squared                  0.266         0.274          0.281         0.29            Pseudo R2: .z   .z
Durbin–Watson              1.068         1.06           1.49          1.502           1.008        1.01
F-statistics (p-value)     58.71*** (0.00)  20.12*** (0.00)  63.67*** (0.00)  21.8*** (0.00)  Wald chi2(15): 8.03 (p = 0.1544), 25.1** (p = 0.0486)
Hausman test (Prob > chi2) 0.0018        0.0018         0.0049        0.0049          0.8236       0.8236

Standard errors are in parentheses
***p < 0.01, **p < 0.05, *p < 0.1

15.6 Conclusion and Managerial Contribution

The study contributes to the existing literature on Capstr decisions; its uniqueness lies in covering the global fintech industry, which makes it rare. The study includes various variables, namely the firm's sales growth (SG), firm size (FS), profitability of firm (Prft), tangibility ratio (Tr) and volatility (Vol), whose impact has been measured on debt ratios. To make the study more robust, the debt ratios are further classified into separate models and the impact is measured separately on each of them (TDr, SDr and LDr). The results indicate that the firm's sales growth (SG) has no significant impact on TDr, SDr and LDr for the fintech industry. Firm size (FS) has a negative and significant impact on TDr and SDr; however, FS has no significant impact on LDr. Firm size (FS) showing a negative impact on TDr and SDr contradicts the literature [53]. This indicates that the fintech industry does not favour debt as a source of funds for its assets but focuses more on VC funding and M&A activity [18]. The profitability of the firm (Prft) has a negative and significant impact on TDr, SDr and LDr, which is in agreement with the pecking order theory with respect to profitability behaviour. Tangibility ratio (Tr) has a positive and significant impact on TDr and SDr. Volatility (Vol) has a positive and significant impact on TDr and SDr, while Vol has no significant impact on LDr. Volatility (Vol) showing a positive impact on TDr and SDr contradicts the literature [34, 38].
The study can be useful for fintech firms in deciding on their capital structure for existing and future business opportunities, and it can be useful for investors observing firm behaviour with respect to the fintech industry's funding patterns. Considering the influence of the independent variables on debt ratios, fintech firms can identify business abnormalities and improve on them. Being a rare industrial study, it still holds some limitations. The study is limited to the global fintech industry, and the conclusions can vary with a change of industry. Further studies could add more independent variables from the fintech industry and test their influence on Capstr decisions.

References

1. Agrawal, R.: Role of Fintech companies in increasing financial inclusion. J. Appl. Manage.
14(1), 24–36 (2022)
2. Ai, H., Frank, M.Z., Sanati, A.: The trade-off theory of Corporate Capital Structure. Oxford
Research Encyclopedia of Economics and Finance (2020)
3. Alipour, M., Mohammadi, M.F.S., Derakhshan, H.: Determinants of capital structure: an
empirical study of firms in Iran. Int. J. Law Manage. 57(1), 53–83 (2015)
4. Allen, F., Gu, X., Jagtiani, J.: Fintech, cryptocurrencies, and CBDC: financial structural
transformation in China. J. Int. Money Financ. 124, 102625 (2022)
5. Anagnostopoulos, I.: Fintech and regtech: impact on regulators and banks. J. Econ. Bus. 100,
7–25 (2018)
6. Barsotti, F.: Optimal Capital Structure with Endogenous Bankruptcy: Payouts, Tax Bene-
fits Asymetry and Volatility Risk (Doctoral dissertation, Université de Toulouse, Université
Toulouse III-Paul Sabatier) (2011)

7. Bui, T.P.: Fintech and investment in the Fintech sector at Viettel Telecom Corporation (in Vietnamese) (2019)
8. Chadha, S., Sharma, A.K.: Capital structure and firm performance: empirical evidence from India. Vis. J. Bus. Perspect. 19(4), 295–302 (2015). https://doi.org/10.1177/0972262915610852
9. Chaklader, B., Chawla, D.: A study of determinants of capital structure through panel data
analysis of firms listed in NSE CNX 500. Vision 20(4), 267–277 (2016)
10. Chakrabarti, A., Chakrabarti, A.: The capital structure puzzle–evidence from Indian energy
sector. Int. J. Energy Sector Manage. (2019)
11. Chan, E., Fei, Y.: Assessing the startup bandwagon effect: the role of past funding in venture
capital investment. UChicago Undergr. Bus. J. 1(2), 1–18 (2015)
12. Chang, V., Baudier, P., Zhang, H., Xu, Q., Zhang, J., Arami, M.: How blockchain can impact
financial services—the overview, challenges and recommendations from expert interviewees.
Technol. Forecast. Soc. Chang. 158, 120166 (2020)
13. Chang, C.C., Batmunkh, M.U., Wong, W.K., Jargalsaikhan, M.: Relationship between capital
structure and profitability: evidence from four Asian tigers. J. Manage. Inf. Decis. Sci. (2019)
14. Chen, M.A., Wu, Q., Yang, B.: How valuable is FinTech innovation? Rev. Fin. Stud. 32(5),
2062–2106 (2019)
15. Cole, R.A., Sokolyk, T.: Debt financing, survival, and growth of start-up firms. J. Corp. Finan.
50, 609–625 (2018)
16. Coleman, S., Cotei, C., Farhat, J.: The debt-equity financing decisions of US startup firms. J.
Econ. Fin. 40, 105–126 (2016)
17. Comeig, I., Fernández-Blanco, M.O., Ramírez, F.: Information acquisition in SME’s relation-
ship lending and the cost of loans. J. Bus. Res. 68(7), 1650–1652 (2015)
18. Cornelli, G., Doerr, S., Franco, L., Frost, J.: Funding for fintechs: patterns and drivers (2021)
19. Correia, A.M.F.A.: Determinants of corporate capital structure: evidence from non-financial
listed French firms (2015)
20. Darolles, S.: The rise of fintechs and their regulation. Fin. Stabil. Rev. 20, 85–92 (2016)
21. Demiraj, R., Dsouza, S., Abiad, M.: Working capital management impact on profitability:
pre-pandemic and pandemic evidence from the European automotive industry. Risks 10(12)
(2022)
22. Demiraj, R., Demiraj, E., Dsouza, S.: Impact of financial leverage on the performance of tourism
firms in the MENA region. PressAcademia Procedia 16(1), 156–161 (2023)
23. Demiraj, R., Dsouza, S., Demiraj, E.: ESG scores relationship with firm performance: panel
data evidence from the European tourism industry. PressAcademia Procedia 16(1), 116–120
(2023)
24. Demiraj, R., Dsouza, S., Demiraj, E.: Capital structure and profitability: panel data evidence
from the European tourism industry. In: 6th International Scientific Conference ITEMA 2022,
Selected Papers (2023). https://doi.org/10.31410/ITEMA.S.P.2022.1
25. Dierker, M., Lee, I., Seo, S.W.: Risk changes and external financing activities: tests of the
dynamic trade-off theory of capital structure. J. Empir. Financ. 52, 178–200 (2019)
26. Dimitropoulos, P.E., Koronios, K.: Capital structure determinants of Greek hotels: the impact
of the Greek debt crisis. In: Culture and Tourism in a Smart, Globalized, and Sustainable
World: 7th International Conference of IACuDiT, Hydra, Greece, 2020, pp. 387–402. Springer
International Publishing, Cham (2021)
27. Ding, N., Gu, L., Peng, Y.: Fintech, financial constraints and innovation: evidence from China.
J. Corp. Finan. 73, 102194 (2022)
28. Dsouza, S., Pandey, D.: Study of relationship between liquidity and profitability of automobile
companies. In: International Conference on Finance and Economics, 83–93 (2017)
29. Dsouza, S., Rabbani, M.R., Hawaldar, I.T., Jain, A.K.: Impact of bank efficiency on the prof-
itability of the banks in India: an empirical analysis using panel data approach. Int. J. Fin. Stud.
10(4), 93 (2022)
30. Dsouza, S., Demiraj, R., Habibniya, H.: Variable reduction technique to boost financial anal-
ysis: a case study on emerging markets telecommunication industry, BRICS. SCMS J. Indian
Manage. 19(2) (2022)

31. Dsouza, S., Habibniya, H.: The impact of liquidity on the profitability of nifty pharma index
(NSE India). IUP J. Account. Res. Audit Pract. 20(4) (2021)
32. Dsouza, S., Demiraj, R., Habibniya, H.: A Study on the Impact of Liquidity and Leverage
on Performance: Hotels and Entertainment Services Industry–MENA Region: An Empirical
Panel Data Analysis. Available at SSRN 3989995 (2021)
33. Dsouza, S., Demiraj, R., Habibniya, H.: Impact of liquidity and leverage on performance:
panel data evidence of hotels and entertainment services industry in the MENA region. Int. J.
Hospital. Tour. Syst. 16(3), 26–39 (2023)
34. Dudley, E., James, C.M.: Cash flow volatility and capital structure choice (2015). https://doi.org/10.2139/ssrn.2492152
35. Feyen, E., Frost, J., Gambacorta, L., Natarajan, H., Saal, M.: Fintech and the digital trans-
formation of financial services: implications for market structure and public policy. BIS Pap.
(2021)
36. Frank, M.Z., Goyal, V.K., Shen, T.: The pecking order theory of capital structure: where do we
stand? SSRN Electron. J. (2020). https://doi.org/10.2139/ssrn.3540610
37. Gastaud, C., Carniel, T., Dalle, J.M.: The varying importance of extrinsic factors in the success
of startup fundraising: competition at early-stage and networks at growth-stage (2019). arXiv
preprint arXiv:1906.03210
38. Ghasemzadeh, M., Heydari, M., Mansourfar, G.: Earning volatility, capital structure decisions
and financial distress by SEM. Emerg. Mark. Financ. Trade 57(9), 1–19 (2019)
39. Giaretta, E., Chesini, G.: The determinants of debt financing: the case of fintech start-ups. J.
Innov. Knowl. 6(4), 268–279 (2021)
40. Giaquinto, L.: Angel, seed and founders influence on Fintech funding: semantic scholar.
Semantic Scholar (1970)
41. Güner, A.: The determinants of capital structure decisions: new evidence from Turkish
companies. Procedia Econ. Fin. 38, 84–89 (2016)
42. Habibniya, H., Dsouza, S.: Impact of performance measurements against market value of shares
in Indian banks an empirical study specific to EVA, EPS, ROA, and ROE. J. Manag. Res. 18(4),
203–210 (2018)
43. Habibniya, H., Dsouza, S., Rabbani, M.R., Nawaz, N., Demiraj, R.: Impact of capital structure
on profitability: panel data evidence of the telecom industry in the United States. Risks 10(8),
157 (2022)
44. Haddad, C., Hornuf, L.: The emergence of the global Fintech market: economic and
technological determinants. Small Bus. Econ. 53(1), 81–105 (2019)
45. Jagtiani, J., Lemieux, C.: Do Fintech lenders penetrate areas that are underserved by traditional
banks? J. Econ. Bus. 100, 43–54 (2018)
46. Kachlami, H., Yazdanfar, D.: Determinants of SME growth: the influence of financing pattern.
An empirical study based on Swedish data. Manage. Res. Rev. 39(9), 966–986 (2016)
47. Knewtson, H.S., Rosenbaum, Z.A.: Toward understanding FinTech and its industry. Manag.
Financ. 46(8), 1043–1060 (2020)
48. Langevin, M.: Big data for (not so) small loans: technological infrastructures and the
massification of fringe finance. Rev. Int. Polit. Econ. 26(5), 790–814 (2019)
49. Lee, I., Shin, Y.J.: Fintech: ecosystem, business models, investment decisions, and challenges.
Bus. Horiz. 61(1), 35–46 (2018)
50. Lv, P., Xiong, H.: Can FinTech improve corporate investment efficiency? Evidence from China.
Res. Int. Bus. Financ. 60, 101571 (2022)
51. Martinez, L.B., Scherger, V., Guercio, M.B.: SMEs capital structure: trade-off or pecking order
theory: a systematic review. J. Small Bus. Enterp. Dev. 26(1), 105–132 (2019)
52. Matias, F., Serrasqueiro, Z.: Are there reliable determinant factors of capital structure decisions?
Empirical study of SMEs in different regions of Portugal. Res. Int. Bus. Financ. 40, 19–33
(2017)
53. Matias, F., Salsa, L., Afonso, C.: Capital structure of Portuguese hotel firms: a structural
equation modelling approach. Tour. Manage. Stud. 14(1), 73–82 (2018)
15 Influence of Firm-Specific Variables on Capital Structure Decisions … 323

54. Milian, E.Z., Spinola, M.D.M., de Carvalho, M.M.: Fintechs: a literature review and research
agenda. Electron. Commer. Res. Appl. 34, 100833 (2019)
55. Nicodano, G., Regis, L.: A trade-off theory of ownership and capital structure. J. Financ. Econ.
131(3), 715–735 (2019)
56. Nguyen, T., Nguyen, H.: Capital structure and firm performance of non-financial listed
companies: cross-sector empirical evidences from Vietnam. Accounting 6(2), 137–150 (2020)
57. Orlova, S., Harper, J.T., Sun, L.: Determinants of capital structure complexity. J. Econ. Bus.
110, 105905 (2020). https://doi.org/10.1016/j.jeconbus.2020.105905
58. Putri, I.G.A.P.T., Rahyuda, H.: Effect of capital structure and sales growth on firm value with
profitability as mediation. Int. Res. J. Manage. IT Soc. Sci. 7(1), 145–155 (2020)
59. Ramlall, I.: Understanding Financial Stability. Emerald Group Publishing (2018)
60. Rahman, M.A., Sarker, M.S.I., Uddin, M.J.: The impact of capital structure on the profitability
of publicly traded manufacturing firms in Bangladesh. Appl. Econ. Fin. 6(2), 1–5 (2019)
61. Roeder, J., Cardona, D.R., Palmer, M., Werth, O., Muntermann, J., Breitner, M.H.: Make
or break: business model determinants of FinTech venture success. In: Proceedings of the
Multikonferenz Wirtschaftsinformatik, Lüneburg, Germany, 6–9 (2018)
62. Saksonova, S., Kuzmina-Merlino, I.: Fintech as financial innovation—the possibilities and
problems of implementation (2017)
63. Serrasqueiro, Z., Caetano, A.: Trade-off theory versus pecking order theory: capital structure
decisions in a peripheral region of Portugal. J. Bus. Econ. Manag. 16(2), 445–466 (2015)
64. Shahar, W.S.S., Shahar, W.S.S., Bahari, N.F., Ahmad, N.W., Fisal, S., Rafdi, N.J.: A review of
capital structure theories: trade-off theory, pecking order theory, and market timing theory. In:
Proceeding of the 2nd International Conference on Management and Muamalah, pp. 240–247
(2015)
65. Sofat, R., Singh, S.: Determinants of capital structure: an empirical study of manufacturing
firms in India. Int. J. Law Manage. 59(6), 1029–1045 (2017)
66. Ullah, A., Kashif, M., Ullah, S.: Impact of capital structure on financial performance of textile
sector in Pakistan. KASBIT Bus. J. 10(2), 1–20 (2017)
67. Vijayakumaran, S., Vijayakumaran, R.: The determinants of capital structure decisions:
evidence from Chinese listed companies, 63–81 (2018)
68. Vo, X.V.: Determinants of capital structure in emerging markets: evidence from Vietnam. Res.
Int. Bus. Financ. 40, 105–113 (2017)
69. Yang, Y., Su, X., Yao, S.: Nexus between green finance, Fintech, and high-quality economic
development: empirical evidence from China. Resour. Policy 74, 102445 (2021)
70. Zavolokina, L., Dolata, M., Schwabe, G.: The FinTech phenomenon: antecedents of financial
innovation perceived by the popular press. Fin. Innov. 2(1), 1–16 (2016)
71. Zhou, G., Zhu, J., Luo, S.: The impact of Fintech innovation on green growth in China:
mediating effect of green finance. Ecol. Econ. 193, 107308 (2022)
Chapter 16
A Weights Direct Determination Neural Network for Credit Card Attrition Analysis

Vasilios N. Katsikis, Spyridon D. Mourtas, Romanos Sahas, and Dimitris Balios

Abstract Cost reduction is a component that contributes to both the profitability and
longevity of a corporation, especially in the case of a financial institution, and can
be accomplished through greater client retention. Particularly, credit card customers
comprise a volatile subset of a bank’s client base. As such, banks would like to predict
in advance which of those clients are likely to attrite, so as to approach them with
proactive marketing campaigns. Credit card attrition is generally a poorly investi-
gated subtopic with a variety of challenges, like highly imbalanced datasets. This
article utilizes neural networks to address the challenges of credit card attrition since
they have found great application in many classification problems. More particularly, to overcome the shortcomings of traditional back-propagation neural networks, we construct a multi-input trigonometrically activated weights and structure determination (MTA-WASD) neural network, which incorporates structure trimming along with other techniques that boost its training speed and diminish the danger, and subsequent detrimental effects, of overfitting. When applied to three publicly
available datasets, the MTA-WASD neural network demonstrated either superior or
highly competitive performance across all metrics, compared to some of the best-
performing classification models that MATLAB’s classification learner app offers.

Keywords Neural networks · Weights and structure determination · Credit card attrition · Credit card churn · Classification · Machine learning

V. N. Katsikis (B) · S. D. Mourtas · R. Sahas
Division of Mathematics-Informatics and Statistics-Econometrics, Department of Economics,
National and Kapodistrian University of Athens, Sofokleous 1 Street, 10559 Athens, Greece
e-mail: [email protected]
S. D. Mourtas
e-mail: [email protected]
R. Sahas
e-mail: [email protected]
D. Balios
Division of Business Economics and Business Administration-Finance, Department of
Economics, National and Kapodistrian University of Athens, Sofokleous 1 Street, 10559 Athens,
Greece
e-mail: [email protected]


16.1 Introduction

In highly competitive and mature business sectors, one such being the banking sector,
the growth of a company, or rather a bank in this case, greatly depends on the efforts
that the entity makes towards: maintaining and growing its existing customer base,
acquiring/keeping up with new technology, focusing on specific market segments
and enhancing its productivity and efficiency [2, 11]. Of those factors, it is argued
that the first is also the most prominent. Namely, more and more companies become
aware of the fact that their most precious asset is the existing customer base [9, 27].
It comes as no surprise that service providers in the financial industry go through
great efforts to attract clients from their competitors whilst limiting their own losses
[10, 20].
However, it is not only the banks that have become conscious of the importance
of their clientele, but also the clients themselves. The increasing awareness of the
latter party when it comes to quality of service provided, is another element adding
to the already competitive environment. Oftentimes, factors such as accessibility or
even a more attractive interest rate are all it takes for a client, whether long-term or
new, to suddenly stop doing business with a bank and move to a competing firm [8,
15]. Financial institutions have been motivated to gradually shift their focus from
attracting new customers to retaining as many of their current ones as possible, mainly
due to the impact that even a small increase in customer retention can have on the
bank’s income statement but also due to the well-established facts that maintaining
is much cheaper than re-acquiring lost customers and, on a similar note, selling to an
existing customer is several times less expensive than selling to a new customer [19,
26, 29]. In effect, there has been a shift of interest. It has now become important for
banks to know in advance which of their customers, starting from the “high grade”-
high return on investment clients, are likely to leave [10, 14]. To the extent that
a financial institution can obtain this knowledge, it can launch targeted marketing
campaigns, which have been shown to be very effective when it comes to customer
retention [11]. The act of analysing data and developing models so as to make a
prediction on the clients that are likely to “attrite”, usually with an utter aim of
employing adequate proactive counter-measures, refers to attrition or churn, as it is
often called, analysis.
Artificial neural networks have been successfully applied to a wide spectrum of
fields, including but not limited to medicine, such as in the prediction of breast cancer
[22] and economics and finance, such as in the classification of firm fraud [24], in
portfolio optimization [25], in the analysis of time series [17], in the stabilization
of stochastic exchange rate dynamics [18] as well as in the prediction of various
macroeconomic measures [23]. Furthermore, there is an abundance of applications in
problems stemming from the various engineering disciplines. For example, artificial
neural network models have been applied to feedback control systems stabilization
[16], mobile objects localization [13], performance analysis of solar systems [6, 21],
remote sensing multi-sensor classification [1], prediction of the flow behavior of
alloy [12, 28] and performance analysis of heat pump systems [3–5].

In this paper we will use feed-forward neural networks to classify customers that
are likely to attrite. As was already stated, financial institutions that can acquire that
knowledge for their own client base obtain a competitive advantage in the form of
operating cost reduction. In training a feed-forward neural network, there has been
a long tradition in the use of back-propagation algorithms where the structure of the
neural network is iteratively refined. On the other hand, newly implemented weights
and structure determination (WASD) training algorithms offer a feature that their
predecessors lack. Namely, the weights direct determination (WDD) process, inher-
ent in any WASD algorithm, facilitates the direct computation of the optimal set of
weights, hence allowing one to avoid getting stuck in local minima and, all in all, contributing to lower computational complexity [24, 31, 33]. We
thus develop a 3-layer feed-forward multi-input trigonometrically activated WASD
(MTA-WASD) neural network for classification. Its activation functions consist of
products of power based trigonometric functions. On testing the MTA-WASD neural
network on three publicly available credit card attrition datasets and comparing its
performance to another WASD neural network as well as a number of popular clas-
sifiers from MATLAB’s classification learner app, the MTA-WASD neural network
demonstrated either superior or equal performance across all metrics, thus suggest-
ing that the trigonometrically activated WASD model is both a competitive as well
as a reliable classifier.
This work’s main points can be summarized as follows:

• A 3-layer feed-forward MTA-WASD neural network that is trained through a WASD algorithm in view of application to classification problems, is presented
along with the relevant algorithms.
• The construction of power based activation functions for multi-input neural net-
works often requires the implementation of lexicographically ordered power
tables. On the construction of said tables, a heuristic algorithm is proposed.
• Faster training is achieved through the implementation of a value lexicon, which
takes advantage of the structuring of those power tables.
• A structure trimming technique is employed so as to lower the risk of overfitting
and facilitate subsequent computations.
• Three publicly available credit card attrition datasets are considered and the MTA-
WASD neural network’s performance is compared to other popular classifiers,
namely support vector machines (SVM), k-nearest neighbors (KNN), kernel naive
Bayes (KNB) as well as another WASD neural network.

The following is a breakdown of the paper’s structure. Section 16.2 begins with an
overview of the MTA-WASD neural network’s final structure and the rationale behind
it. It then proceeds into the development of activation functions through the use of
lexicographically ordered power tables and the formulation of the WDD process. The
section ends with the description of the full training process and the presentation of
all related algorithms. In Sect. 16.3, the MTA-WASD neural network is applied to
three publicly available credit card attrition datasets and its performance is compared
to other popular models. Section 16.4 contains some final remarks.

16.2 The MTA-WASD Model

The neural network presented in Fig. 16.1 is a classification neural network that accepts one or many inputs.

Towards building its structure, the neural network employs a WASD algorithm alongside a post-training structure trimming process. Let $n \in \mathbb{N}$ denote the number of inputs, with $x_j \in \mathbb{R}^m$, $j = 1, 2, \ldots, n$, and let $y \in \mathbb{R}^m$ be the target response corresponding to the input matrix $x = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$. The variable vectors are passed into the next layer, each with a weight of 1. The training process populates the hidden layer which, at the end of the procedure, will have accumulated $N \in \mathbb{N}$ neurons. The weights column vector $w = [w_1, w_2, \ldots, w_N]^T$ is computed by use of the WDD process, which ensures that, given the structure, the choice of weights is indeed optimal. Each neuron $i = 1, 2, \ldots, N$ represents the image of the input matrix under the activation function $g_i$. The corresponding weight represents the importance of the image's contribution to the collective output, $\hat{y}$. A weighted combination of all images yields the prediction of the neural network. Finally, the output neuron is activated in the sense that $\hat{y}$ is considered a valid prediction only after it has been converted to binary form $\tilde{y}$ through the following elementwise function:

$$f_i(\hat{y}) = \begin{cases} 1, & \hat{y}_i \geq \tilde{p} \\ 0, & \hat{y}_i < \tilde{p} \end{cases} \tag{16.1}$$

where $\tilde{p} = \min \hat{y} + p(\max \hat{y} - \min \hat{y})$, with the threshold $p \in [0, 1]$ and $i = 1, 2, \ldots, m$. Generally, if the threshold $p$ is picked close to 0, then more entries of $\hat{y}$ will be mapped to 1, and vice versa.
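For illustration, the conversion in (16.1) can be written as a small MATLAB helper; this is our own sketch (the name binarize is not from the chapter), assuming yh is a column vector of raw predictions:

```matlab
function yt = binarize(yh, p)
% Map raw predictions yh to {0,1} via the data-dependent cut-off of (16.1).
pt = min(yh) + p * (max(yh) - min(yh));  % the threshold value p-tilde
yt = double(yh >= pt);                   % 1 where yh >= p-tilde, 0 otherwise
end
```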

Fig. 16.1 Final structure of the trained neural network, accepting an arbitrary number $n$ of inputs

16.2.1 Activation Functions and the WDD Process

As an essential feature in any WASD algorithm, the WDD process allows one to obtain the optimal weights corresponding to the current hidden layer structure without having to engage in lengthy iterative computations, where the quality of the outcome is often uncertain. Evidently, the WDD process contributes to achieving both speed and lower computational complexity compared to traditional weight determination approaches, whilst avoiding some of the related pitfalls [31, 33].

The construction of the neural network revolves around linking the training input matrix $x$ to a known target vector $y$; that is, approximating the underlying relationship between $x$ and $y$ through a combination of activation functions. This is mainly achieved by the development of a large enough hidden layer that is paired with an adequate set of weights. The activation functions themselves usually account for a substantial part of the neural network's performing ability, in both its training as well as its testing components. A key feature in training a neural network through a WASD algorithm is that the number of hidden layer neurons is not predetermined. Rather, as the training procedure unfolds, the number of hidden layer neurons fluctuates until the network settles on a structure that is considered optimal. Each added neuron has to provide for something previously unavailable in the structure. There is no marginal benefit in adding neurons that are constant multiples of the pre-existing neurons and, depending on the algorithm, the training process is likely to terminate prematurely, should that be the case.
It should now come as no surprise that, as far as WASD training algorithms are concerned, polynomials and/or other functions that are raised incrementally to some power are common choices for activation functions, mainly because the terms produced are inherently linearly independent. Namely, the power, the power sigmoid, the power inverse exponential and the power softplus activation functions were proposed in [24]. Furthermore, serving as building blocks for activation functions, polynomials such as Chebyshev, Euler, Hermite, Laguerre, Legendre as well as Bernoulli polynomials were proposed in [31]. In this paper, we investigate the implementation of a trigonometrically activated neural network. As a result, the following two sub-activations, which will be used as building blocks for the activation function $g(x)$, are proposed and investigated. With $k \in \mathbb{N} \cup \{0\}$, the first sub-activation (SA1) is

$$\phi_k(x) = (\sin(kx) + \cos(kx))^k, \tag{16.2}$$

and the second sub-activation (SA2) is

$$\phi_k(x) = (\sin(x^k) + \cos(x^k))^k. \tag{16.3}$$
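In MATLAB, for instance, the two sub-activations can be expressed as elementwise anonymous functions; the handle names below are our own:

```matlab
SA1 = @(x, k) (sin(k .* x) + cos(k .* x)).^k;   % as in (16.2)
SA2 = @(x, k) (sin(x.^k) + cos(x.^k)).^k;       % as in (16.3)
```

Note that for $k = 0$ both handles evaluate to 1 elementwise, since $(\sin 0 + \cos 0)^0 = 1$.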

As for the ability of the neural network to converge, the following Definition 1,
Theorem 1 and Proposition 1 from [31] should be noted.

Definition 1 Let $f(x_1, x_2, \ldots, x_k)$ be a function of $k$ variables. The polynomials

$$B^f_{n_1 n_2 \ldots n_k}(x_1, x_2, \ldots, x_k) = \sum_{\nu_1=0}^{n_1} \cdots \sum_{\nu_k=0}^{n_k} f\!\left(\frac{\nu_1}{n_1}, \ldots, \frac{\nu_k}{n_k}\right) \prod_{q=1}^{k} C_{n_q}^{\nu_q} x_q^{\nu_q} (1 - x_q)^{n_q - \nu_q}$$

are called multivariate Bernstein polynomials of $f(x_1, x_2, \ldots, x_k)$, where $C_{n_q}^{\nu_q}$ denotes a binomial coefficient, with $n_q = n_1, n_2, \ldots, n_k$ and $\nu_q = 0, 1, \ldots, n_q$.

Theorem 1 Let $f(x_1, x_2, \ldots, x_k)$ be a continuous function defined over $V_k = \{(x_1, x_2, \ldots, x_k) \in \mathbb{R}^k \mid 0 \leq x_q \leq 1,\ q = 1, 2, \ldots, k\}$. Then the multivariate Bernstein polynomials $B^f_{n_1 n_2 \ldots n_k}(x_1, x_2, \ldots, x_k)$ converge uniformly to $f(x_1, x_2, \ldots, x_k)$ as $n_1, n_2, \ldots, n_k \to \infty$.
Proposition 1 With a form of products of trigonometric power based functions $\phi_k$ employed, we can construct a generalized trigonometric polynomial

$$g_i(x) = g_i(x_1, x_2, \ldots, x_n) = \phi_{k_{i1}}(x_1)\phi_{k_{i2}}(x_2)\cdots\phi_{k_{in}}(x_n) = \prod_{j=1}^{n} \phi_{k_{ij}}(x_j),$$

for $i = 1, 2, \ldots, N$.
Given $n$ inputs and $N$ hidden layer neurons, it is suggested in [31] that $g_i(x) = g_i(x_1, x_2, \ldots, x_n)$, the image of $x$ under the $i$th neuron, should be computed as a product of $n$ sub-activations, each of them taking as input one of the $n$ variables/columns of $x$. The power $k$ to which each term $\phi_k$ is raised will be given by an appropriate $r \times n$ power table $T_n$ with entries from $\mathbb{N} \cup \{0\}$. For each neuron $i$, $i = 1, 2, \ldots, N$, we will compute $g_i(x) = g_i(x_1, x_2, \ldots, x_n)$ as

$$g_i(x) = \phi_{k_{i1}}(x_1)\phi_{k_{i2}}(x_2)\cdots\phi_{k_{in}}(x_n) = \prod_{j=1}^{n} \phi_{k_{ij}}(x_j), \quad k_{ij} = T_n(i, j). \tag{16.4}$$

Before addressing a possible construction mechanism for $T_n$, a few preliminaries are in order. Each row in $T_n$ represents a unique $n$-tuple of powers ranging from 0 to an arbitrary positive integer. The total number of rows $r$ is also arbitrary in the sense that one generates as many rows as one sees fit. The table follows the graded lexicographic order. That is, given two rows $T_n^{(a)} = [k_{a1}, k_{a2}, \ldots, k_{an}]$ and $T_n^{(b)} = [k_{b1}, k_{b2}, \ldots, k_{bn}]$, $a, b \in \mathbb{N}$ with $a \neq b$, they are sorted by means of the following rule [30, 31]. If either condition is true:

C.I: $\sum_{j=1}^{n} k_{aj} > \sum_{j=1}^{n} k_{bj}$, or
C.II: $\sum_{j=1}^{n} k_{aj} = \sum_{j=1}^{n} k_{bj}$ and the first nonzero entry of $T_n^{(a)} - T_n^{(b)}$ is positive,

then $T_n^{(b)}$ precedes $T_n^{(a)}$. As for the actual entries of the table, a few samples are given in [30, 31] and a different variation is examined in [32]. However, except for the $n = 1$ case, it is not clear how one should go about acquiring a working version for $T_n$. Drawing from those samples, we propose the heuristic Algorithm 1.

Particularly, let $s$ denote the sum of the elements of any given row of $T$ and let $S$ denote an upper bound for $s$. Given $S = 2$, the resulting tables $T_n$ for $n = 1, 2, 3, 4, 5$ are given below.

$$T_1 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix},\ T_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 2 \\ 1 & 1 \\ 2 & 0 \end{bmatrix},\ T_3 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \\ 0 & 1 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 2 & 0 & 0 \end{bmatrix},\ T_4 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 2 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 2 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 2 & 0 & 0 & 0 \end{bmatrix},\ T_5 = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 2 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Although no two rows of $T$ are identical, two or more rows may share a common $s$. Thus, $T$ is organized naturally into blocks of rows of equal $s$. Those blocks are sorted in ascending order, as C.I of the graded lexicographic ordering rule suggests. Each block is then internally organized by means of C.II. The heuristic part of the algorithm lies in the contents of each block. A guess is that, for a given block where $s$ is fixed, we are interested in all possible unique $n$-tuples where the sum of all $n$ entries equals $s$, starting from $[s, 0, \ldots, 0]$. Through this vector, one generates the unique integer partitions of $s$ in the form of rows of length $n$. For each such row, we subsequently compute all unique permutations, and by that point we should have exhausted all vectors of interest. Starting from an empty matrix $T$ and setting $s = 0$, we repeat those steps, each time sorting the resulting block by C.II, concatenating it to the bottom of $T$ and incrementing $s$ by 1.
Suppose that at any given point in time the hidden layer of the neural network consists of $N$ neurons. One needs to determine the optimal weights (vector coefficients) $w_1, w_2, \ldots, w_N$ such that the linear combination $w_1 g_1(x) + w_2 g_2(x) + \cdots + w_N g_N(x)$ evaluates to a vector $\hat{y} \in \mathbb{R}^m$ that is, with as little error as possible, close to the target vector $y \in \mathbb{R}^m$. Letting $A = [g_1(x), g_2(x), \ldots, g_N(x)] \in \mathbb{R}^{m \times N}$, the least-squares solution to $Aw = y$ is given by the WDD process [33]:

$$w = A^{\dagger} y, \tag{16.5}$$

where $A^{\dagger}$ denotes the pseudo-inverse of $A$.
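In MATLAB terms, (16.5) amounts to a single call to pinv; the two lines below are our own illustration, assuming $A$ and $y$ have already been constructed:

```matlab
w    = pinv(A) * y;   % least-squares optimal weights via the pseudo-inverse
yhat = A * w;         % raw prediction, prior to the thresholding in (16.1)
```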



Algorithm 1 Creation of Power Tables
Input: positive integer S, number of inputs n
1: initialize T as an empty matrix and set s to 0
2: while s ≤ S do
3:   set t to be an empty matrix
4:   compute the m unique integer partitions of s consisting of n integers ≥ 0
5:   for i = 1 to m do
6:     compute in the form of rows the unique permutations of each integer partition
7:     concatenate the resulting permutations to the bottom of t
8:   end for
9:   sort t by C.II of the graded lexicographic order and concatenate t to the bottom of T
10:  s = s + 1
11: end while
Output: T
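One possible MATLAB realization of Algorithm 1 is sketched below; the function names powerTable and partitionsOf are our own, and the permutation step relies on perms, which is practical only for small n:

```matlab
function T = powerTable(S, n)
% Sketch of Algorithm 1: graded lexicographically ordered power table.
T = zeros(0, n);
for s = 0:S
    P = partitionsOf(s, n);                       % unique integer partitions of s
    t = zeros(0, n);
    for i = 1:size(P, 1)
        t = [t; unique(perms(P(i, :)), 'rows')];  % all unique permutations
    end
    T = [T; sortrows(t)];  % within a block, C.II reduces to ascending lex order
end
end

function P = partitionsOf(s, n)
% All non-increasing n-tuples of nonnegative integers summing to s.
if n == 1
    P = s;
    return
end
P = zeros(0, n);
for first = s:-1:ceil(s / n)
    rest = partitionsOf(s - first, n - 1);
    keep = rest(:, 1) <= first;                   % enforce non-increasing order
    P = [P; repmat(first, sum(keep), 1), rest(keep, :)];
end
end
```

With S = 2, a call such as powerTable(2, 3) should reproduce the table $T_3$ listed earlier.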

16.2.2 The Trigonometrically Activated WASD Algorithm

A WASD training algorithm, as the name suggests, is bound to incorporate at least two features: a method to determine the optimal weights, which we described as
the WDD process in the previous subsection, as well as a strategy based on which
the training process will converge to a hidden layer structure that maximizes perfor-
mance. Although the WDD process offers little room for variation, one can adjust
the latter part in a number of ways. Namely, a few viable options are the following:
1. grow the size of the hidden layer until a performance target is met [31];
2. let the number of hidden layer neurons increase up to a number of choice and
then trim the structure in a way that facilitates performance;
3. trim the structure as it grows, adding only those neurons that directly improve the neural network’s performing ability [30, 31];
4. use a combination of the above or even an entirely different strategy.
It is worth noting that the success of any structure determination scheme greatly depends on a number of factors such as the type of the problem, the performance metric to be used, the choice of whether performance is evaluated before or after the output has been converted to binary (applies to classification problems), the activation functions, etc. In view of the above, namely the type of the problem (classification) and the conversion of the output to binary prior to the evaluation of the performance metric which, for performance reasons, we chose to be

$$\text{MAE} = \frac{1}{m} \sum_{j=1}^{m} |\tilde{y}_j - y_j|, \tag{16.6}$$

we based the structure determination part of the training algorithm on option 2.



Algorithm 2 Constructing the Matrix A
Input: The matrix of inputs x, the power table T
1: procedure matrixA(x, T)
2:   set m, n the number of rows and columns, respectively, in x
3:   set N the number of rows in T and set S = sum(T(N, :))
4:   initialize VL as an m × (S + 1) × n NaN array
5:   set A = zeros(m, N)
6:   for i = 1 to N do
7:     set g = zeros(m, n)
8:     for j = 1 to n do
9:       set k = T(i, j)
10:      if sum(isnan(VL(:, k + 1, j))) == m then
11:        set VL(:, k + 1, j) = φ_k(x(:, j)), as in Sect. 16.2.1
12:      end if
13:      set g(:, j) = VL(:, k + 1, j)
14:    end for
15:    set A(:, i) = prod(g, 2)
16:  end for
17: end procedure
Output: The matrix A
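Below is our own MATLAB sketch of Algorithm 2, assuming the sub-activation is supplied as a function handle phi (e.g., the SA1 or SA2 handles sketched in Sect. 16.2.1); the caching logic mirrors the value lexicon described in the text:

```matlab
function A = matrixA(x, T, phi)
% Sketch of Algorithm 2: build A column by column, caching phi_k(x_j) in VL.
[m, n] = size(x);
N = size(T, 1);
S = sum(T(N, :));            % largest power sum appearing in the table
VL = nan(m, S + 1, n);       % value lexicon: VL(:, k+1, j) stores phi_k(x_j)
A = zeros(m, N);
for i = 1:N
    g = zeros(m, n);
    for j = 1:n
        k = T(i, j);
        if all(isnan(VL(:, k + 1, j)))       % phi_k(x_j) not yet computed
            VL(:, k + 1, j) = phi(x(:, j), k);
        end
        g(:, j) = VL(:, k + 1, j);
    end
    A(:, i) = prod(g, 2);    % g_i(x): row-wise product of the sub-activations
end
end
```

A typical usage pattern would be T = powerTable(S, n) followed by A = matrixA(x, T(1:N, :), SA1).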

Given $x$, $y$ and a maximum number of hidden layer neurons $N$, the first step is to grow the hidden layer structure to size $N$, essentially building $A = [g_1(x), g_2(x), \ldots, g_N(x)]$ column by column, as in Sect. 16.2.1. Let $n$ denote the number of inputs. It is essential that the first $N$ rows of $T_n$ (as in Sect. 16.2.1) are available. The procedure above, which incorporates numerous MATLAB commands, constructs $A$ through an implementation of (16.4) that takes advantage of the structuring of $T$ in order to reduce the total running time. Namely, the entries in $T$ (see examples in Sect. 16.2.1) suggest that for each variable $x_j$, $j = 1, 2, \ldots, n$, given a power $k$, the computation of $\phi_k(x_j)$ is bound to come up multiple times within the process. Thus, for each $k$ and $x_j$ of interest, $\phi_k(x_j)$ is saved the first time it is computed, resulting in the development of a value lexicon (VL) in the form of a three dimensional array. On calculating $g_i(x) = \prod_{j=1}^{n} \phi_{k_{ij}}(x_j)$, with $k_{ij} = T(i, j)$, all available terms are drawn from the lexicon and the rest are computed and subsequently saved for future use. Looping through all $i$ from 1 to $N$ and assigning each $g_i(x)$ to the corresponding column of $A$, matrix $A$ is sequentially produced. Algorithm 2 describes the aforementioned process of creating $A$.
The next step following the successful construction of the hidden layer is to determine the optimal weights vector $w$ by use of (16.5). Then, one computes $\hat{y} = Aw$ and, given a threshold $p$, converts $\hat{y}$ to binary as in (16.1). Let $e = \frac{1}{m}\sum_{j=1}^{m} |\tilde{y}_j - y_j|$ be the current MAE. In order to fine-tune the hidden layer structure, the process progresses into a post-training structure trimming stage. Particularly, each neuron is taken out of the structure in an iterative manner and the MAE is recomputed. Whenever the resulting error is lower than the benchmark, the neuron in question is dropped and the benchmark MAE, as well as the optimal weights vector, are updated. Through indx, an index vector, one keeps track of the indices of all remaining neurons. To each neuron corresponds a unique row in $T$. Namely, in Algorithm 2, for the computation
of $g_i(x) = \prod_{j=1}^{n} \phi_{k_{ij}}(x_j)$ the powers $k_{ij}$ were drawn from the $i$th row of $T$. If the $i$th neuron is dropped, the same should apply to the $i$th row of $T$, so as not to be assigned by mistake to one of the remaining neurons. This marks the end of the training process which, upon terminating, will have yielded the optimal weights vector $w_{\text{best}}$ and the remaining rows $T_{\text{best}}$ of the starting power table $T$. Those two elements, along with the sub-activation $\phi_k$ which was used in building the activation functions, fully define the neural network, in the sense that these are the only necessary features that are needed in order for the MTA-WASD neural network's structure to be recreated for testing purposes. The aforementioned process is described in Algorithm 3, whereas the MTA-WASD neural network's training process is synopsized in the flowchart of Fig. 16.2 and a roadmap for the application of the trained MTA-WASD neural network is provided in the flowchart of Fig. 16.3.

Algorithm 3 The TRIM Algorithm
Input: The current weight vector w, the current error e, the matrix A, the power table T, the target vector y, m and N as in Algorithm 2 and a threshold p ∈ [0, 1]
1: wbest = w
2: emin = e
3: indx = array of positive integers from 1 to N
4: for i = 1 to N do
5:   rem = setdiff(indx, i)
6:   B = A(:, rem)
7:   w = B† y
8:   ỹ = f(Bw, p) as in (16.1)
9:   e = (1/m) Σ_{j=1}^{m} |ỹ_j − y_j|
10:  if emin > e then
11:    emin = e
12:    wbest = w
13:    indx = rem
14:  end if
15: end for
16: Tbest = T(indx, :)
Output: The post-trimming optimal weights vector wbest and the remaining rows of T, Tbest
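For completeness, the trimming stage can be sketched in MATLAB as follows; trimStructure and binMAE are our own names, and this variant recomputes the benchmark error internally rather than receiving w and e as inputs, a slight deviation from the listing:

```matlab
function [wbest, Tbest, emin] = trimStructure(A, T, y, p)
% Sketch of Algorithm 3: drop any neuron whose removal lowers the MAE.
[m, N] = size(A);
wbest  = pinv(A) * y;
emin   = binMAE(A * wbest, y, p, m);   % benchmark error of the full structure
indx   = 1:N;
for i = 1:N
    rem = setdiff(indx, i);            % structure without the i-th neuron
    B   = A(:, rem);
    w   = pinv(B) * y;
    e   = binMAE(B * w, y, p, m);
    if emin > e                        % keep the trim only if the MAE improves
        emin  = e;
        wbest = w;
        indx  = rem;
    end
end
Tbest = T(indx, :);                    % rows of T matching surviving neurons
end

function e = binMAE(yh, y, p, m)
% Binarize as in (16.1), then compute the MAE of (16.6).
pt = min(yh) + p * (max(yh) - min(yh));
e  = sum(abs(double(yh >= pt) - y)) / m;
end
```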

16.3 Experiments

In this section, we apply the MTA-WASD neural network under SA1 and SA2 to three
publicly available credit card attrition/churn datasets and compare its performance
to four other well-performing classifiers, three of them coming from MATLAB’s
classification learner app and the fourth one being another WASD neural network
that incorporates Bernoulli polynomials in the construction of its activation functions.
It is worth noting that all WASD neural networks will be trained up to 100 neurons
and that is because performance on the testing set deteriorates rapidly, at least in those

Fig. 16.2 Flowchart of the MTA-WASD neural network’s training

Fig. 16.3 Flowchart of the MTA-WASD neural network’s testing

three datasets, as the size of the structure increases over that threshold. Furthermore,
all inputs to those models will be normalized to the interval $[-1, 1]$, for similar
reasons. Last but not least, we will refer to each of the three datasets as AD.I, AD.II
and AD.III, respectively, and in the context of figures we will abbreviate “credit card
customer” by CCC. Notice also that the datasets employed in this research can be
acquired from Kaggle (https://www.kaggle.com/).
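As an aside, the min-max scaling to $[-1, 1]$ mentioned above can be done columnwise in one line; this is our own sketch, assuming no constant feature columns (implicit expansion requires MATLAB R2016b or later):

```matlab
xn = 2 * (x - min(x)) ./ (max(x) - min(x)) - 1;   % each column mapped to [-1, 1]
```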

16.3.1 Attrition Dataset I

The first dataset, available at https://www.kaggle.com/datasets/rjmanoj/credit-card-customer-churn-prediction, consists of 14 columns (13 variables) and 10,000 rows.
Of them, the first three columns, referring to the index of each row, the Customer
ID and the customer’s surname, were discarded as being trivial. Another column
containing the name of the country of origin of each customer was also set aside.
Finally, a minor conversion took place regarding the values “Female” and “Male” of
the Gender column to 1 and 0, respectively.
The dataset is imbalanced, with only 2037 (or 20.37% of) samples corresponding to customers who attrited. To even the proportions, we randomly sampled 5926 indices from the dominating class and dropped the corresponding rows. The resulting dataset consists of 4074 rows with an even distribution between attrited and non-attrited credit card customers. For training and testing purposes, an 80–20% stratified (yet random) split is implemented through MATLAB’s cvpartition routine. As a result, the balance between the two classes is maintained in both sets. Figure 16.4
demonstrates the training error paths (number of neurons plotted against MAE) as
well as the classification performances of the SA1 (left column subfigures) and SA2
(right column subfigures) of the MTA-WASD neural network when applied to the
training and testing set, respectively.
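A stratified holdout split of this kind can be obtained as follows (our own sketch; x is the feature matrix and y the binary attrition labels):

```matlab
c      = cvpartition(y, 'Holdout', 0.2);   % stratified 80-20% split on the labels
xtrain = x(training(c), :);  ytrain = y(training(c));
xtest  = x(test(c), :);      ytest  = y(test(c));
```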
The two MTA-WASD neural networks seem to have followed a similar training path, although there are slight differences both in the positions of the neurons that have been trimmed and in the direction of the error path, which seems to have stabilized in the model based on SA1, whereas in the model based on SA2 it takes an uphill direction near the late stages. The post-trimming error (green mark) is nevertheless the all-time minimum for both models. Namely, the MTA-WASD neural network based on SA1 settles on a (training) MAE of 0.2399 (the lower the better), whereas the MTA-WASD neural network based on SA2 achieves an even lower MAE of 0.2387, regardless of the slight “misstep” near the end. The results on the testing set,
which are collectively examined for all models (and datasets) in Sect. 16.3.4, follow
the same pattern where no decisive winner emerges among the two models but rather
each one of the two performs slightly better than the other on a subset of metrics, but
not all metrics. The structures are of similar size, 93 and 89 neurons, respectively,
with the two MTA-WASD neural networks achieving uniform performance across
the training and testing sets. This may be interpreted as a positive signal as to the
generalization ability of both models, however it bears mentioning that, for attrited
and non-attrited customers alike, about a fourth (sometimes even a third) of samples
is misclassified as being of the opposite class. Fortunately, all involved classifiers
demonstrated this tendency when trained and tested on this dataset, therefore we
may safely assume that this error does not stem from the training mechanism and/or
choice of activation functions but rather from the dataset itself.

Fig. 16.4 The MTA-WASD neural network’s results on AD.I under SA1 and SA2

16.3.2 Attrition Dataset II

The second dataset, which is available at https://www.kaggle.com/datasets/devkuttan/data-of-bank-customers, consists of 23 columns (22 variables) and 10,127 rows. The
last two columns consist of probabilities coming from a Naive Bayes classification model that apparently has already been trained on the data. Those are of course removed, along with a third column which enumerates the IDs of the various clients. To bring this dataset into operational form, there are multiple conversions that have to take place, mainly because 6 of the remaining 20 columns consist solely of strings. Those columns refer to the income category (given in income ranges), the card category (color), the education level, the gender, the marital status and finally the attrition status of each customer. It is reasonable that one will opt to assign to the instances of each column a numerical value that captures as much qualitative information as possible. Except for the attrition status and gender columns, which were converted

to arrays of zeros and ones, transforming the rest of the columns entails, naturally, a
subjective element. Nevertheless, our approach may be summarized in the following
points:
1. Income category: Each of the five listed ranges was mapped to an integer from −2 to 2, with the smallest value being assigned to the lowest income category.
2. Card category: The listed colors (Blue, Gold, Platinum, Silver) were ranked as Blue(0) < Silver(1) < Gold(2) < Platinum(3), where the value in the parenthesis is the assigned integer.
3. Education level: A similar assignment to that in 1, with the smallest value being assigned to the least educated category.
4. Marital status: Single(−1) < Divorced(0) < Married(1) (a sketch of such an encoding follows this list).
5. Whenever an entry was marked as “unknown”, the whole row containing the
entry was dropped.
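By way of illustration, the ordinal encoding of point 4 could be implemented as below; maritalStr is a hypothetical cell array holding the raw string entries of the column:

```matlab
% Hedged sketch of an ordinal encoding for the marital status column:
maritalMap = containers.Map({'Single', 'Divorced', 'Married'}, {-1, 0, 1});
maritalNum = cellfun(@(s) maritalMap(s), maritalStr);  % ordered integer codes
```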
One could argue that the rationale behind this set of assignments seems to stem
more from a credit card default point of view rather than that of credit card attrition
(which is different). Although we won’t try to argue that the assignment is opti-
mal, it is in fact worth reporting that upon reversing (one at a time) the orders in
the aforementioned points 1, 3 and 4, we were faced with a very steep decline in
performance. After all is said and done, out of the 10,127 rows there will be 7081
remaining with only 15.72% of them corresponding to attrited customers. Having
tried to either randomly delete samples from the dominating class or to synthetically
generate more training samples matching to the dominated class, we found that the
choice which facilitated both performance and reproducibility was, surprisingly, to
keep the dataset as is.
Figure 16.5 demonstrates the training error paths (number of neurons plotted
against MAE) as well as the classification performances of the SA1 (left column
subfigures) and SA2 (right column subfigures) of the MTA-WASD neural network
when applied to the training and testing set, respectively. Comparing the figures for
the SA1 and SA2, a similar training path is drawn with neurons being trimmed at dif-
ferent positions, eventually resulting in a structure of (almost) identical size, namely
77 and 76 neurons for the models based on SA1 and SA2, respectively. When it comes
to performance on the training set the model based on SA1 performs slightly better
overall, scoring a MAE of 0.0658, whereas the model based on SA2 evaluates to a
MAE of 0.0687. Nevertheless, performance on the test set (see Sect. 16.3.4) more
than makes up for any deficiencies. This is the only dataset where, at least on the testing set, the MTA-WASD neural network model under SA2 alone seems to come
out on top. However, as will be seen in Sect. 16.3.4, there seems to be no evidence
to support the hypothesis that the difference between the predictive accuracies of the
two models is in fact statistically significant.

Fig. 16.5 The MTA-WASD neural network’s results on AD.II under SA1 and SA2

16.3.3 Attrition Dataset III

The third dataset, which is available at https://www.kaggle.com/datasets/jaikishankumaraswamy/customer-credit-card-churn, consists of 34 columns (33
variables) and comes already split into a training and testing set of 7595 (75% of
total) and 2532 (25% of total) rows, respectively. It is a modified version of the pre-
vious dataset (AD.II) with the difference being that some feature columns (income
category, education level, card category, marital status), that used to contain a number
of different classes, have been split into an equal number of columns. Each class is
now assigned a unique binary column so that a client which, for example, falls under
a certain income range, will be assigned the value 1 at the appropriate column and the
value 0 at all other columns related to that specific feature; that is, the income range.
Furthermore, for each feature column, unknown values are treated as a class of their
own. Even if a row is missing a value at a specific feature, it will be assigned the

value 0 at all related class columns except for the unknown class column, in which
it will be assigned the value 1. Consequently, at a cost of increasing the number
of columns of the original dataset by half, no rows are discarded and thus all the
available information is preserved. Interestingly enough, this results in the precision
of all investigated models (WASD and MATLAB classifiers alike) increasing by a
non-trivial amount.
All in all, except for two of the columns being dropped (those referring to Client
ID and row index), the dataset is in operational form once the contents of the comma-
separated values (CSV) files have been converted to arrays. Unbalanced though the
dataset may be, with nearly 85% of samples corresponding to non-attrited customers,
we chose not to apply any preprocessing techniques (other than the usual normal-
ization), nor to rearrange the training and testing sets, but rather to use the existing
format. As far as the two versions of the MTA-WASD neural network are concerned,
there is one final element of surprise and this is that they perform identically in all
aspects. Indeed, as is depicted in Fig. 16.6, the MTA-WASD neural networks based
on SA1 and SA2, respectively, converged to the same structure and produced the
same results both in the training as well as in the testing set. The performance of the
models on the testing set is presented in Sect. 16.3.4, while the training set evaluates to a MAE of 0.0903. Between AD.II and AD.III, there is a striking difference in precision (≈ 72% versus 94.33%) which, as was discussed above, may be attributed to the current structuring of the dataset.

16.3.4 Collective Performance Comparison

To be able to extract meaningful conclusions as to the competency of the proposed MTA-WASD neural network, a comparison with other well-performing classifiers is
in order. To that end, we picked three popular models from MATLAB’s classification
learner app, namely, KNB, Linear SVM and Fine KNN, as well as another WASD
neural network whose activation functions are based on Bernoulli polynomials [30].
All models are evaluated on the basis of nine metrics, namely, MAE, TP, FP, TN,
FN, Precision, Recall, Accuracy and F-measure, where TP, FP, TN, FN stand for true
positive, false positive, true negative and false negative rate, respectively. Except for
MAE, the rest of the metrics are calculated by means of those last four quantities.
In Table 16.1, we present the collective testing results on all three datasets. The
two MTA-WASD neural networks based on SA1 and SA2, respectively, definitely
stand out. In AD.I and AD.II, they score the highest F-measure, the highest accuracy,
the highest precision and, along with the Bernoulli neural network, the lowest MAE.
In AD.I, the Bernoulli neural network comes ahead in the recall department, whereas
KNB dominates that metric in AD.II. As far as AD.III is concerned, the two MTA-
WASD neural networks based on SA1 and SA2 score identically (as was mentioned
in Sect. 16.3.3) with an accuracy of 90.67% and precision of 97.06% but only take
2nd or even 3rd place across the listed metrics. Nevertheless, contrary to other models
whose success in one metric comes at a cost of below average performance in the
16 A Weights Direct Determination Neural Network for Credit Card … 341

Fig. 16.6 The MTA-WASD neural network’s results on AD.III under SA1 and SA2

rest of the evaluation criteria, the MTA-WASD neural network demonstrates robust and often superior overall performance. Thus, it is a reliable classifier that is very much capable of competing on equal terms with other well-established models, as the results in the following table suggest. For clarity, for each dataset and each metric a bolded value signifies the highest score, whereas the colors blue, red and purple signify the winning model for AD.I, AD.II and AD.III, respectively. Note that we do not take TP, FP, TN and FN directly into consideration. This is done indirectly, by treating the other measures which use the former as building blocks. Furthermore, in AD.III the Linear SVM and Bernoulli WASD each achieve the highest score at two different metrics. Picking a winner in that case is fairly subjective.
In order to properly assess whether the two MTA-WASD neural networks based on SA1 and SA2 differ from each other and, furthermore, in order to address the equally important question of whether one can settle on a model that is better suited for predicting credit card attrition in those datasets, we add to the previous

Table 16.1 Performance comparison between neural network models

WASD networks: MTA-WASD with SA1 (single pruning, MAE variation), MTA-WASD with SA2 (single pruning, MAE variation) and Bernoulli polynomials (double pruning, MSE variation)

Activation  | SA1                            | SA2                            | Bernoulli polynomials
Dataset     | AD.I     AD.II    AD.III       | AD.I     AD.II    AD.III       | AD.I     AD.II    AD.III
MAE         | 0.248157 0.080508 0.093207     | 0.238329 0.076271 0.093207     | 0.249386 0.079802 0.191548
TP          | 0.847666 0.707207 0.97067      | 0.815725 0.725225 0.97067      | 0.734644 0.689189 0.795624
FP          | 0.152334 0.292793 0.02933      | 0.184275 0.274775 0.02933      | 0.265356 0.310811 0.204376
TN          | 0.65602  0.958961 0.549479     | 0.707617 0.960637 0.549479     | 0.766585 0.963149 0.880208
FN          | 0.34398  0.041039 0.450521     | 0.292383 0.039363 0.450521     | 0.233415 0.036851 0.119792
Precision   | 0.847666 0.707207 0.97067      | 0.815725 0.725225 0.97067      | 0.734644 0.689189 0.795624
Recall      | 0.71134  0.945154 0.682998     | 0.736142 0.948517 0.682998     | 0.758883 0.949244 0.86914
Accuracy    | 0.751843 0.919492 0.906793     | 0.761671 0.923729 0.906793     | 0.750614 0.920198 0.808452
F-measure   | 0.773543 0.809048 0.801812     | 0.773893 0.821976 0.801812     | 0.746567 0.798578 0.830758

Model       | KNB                            | Linear SVM                     | Fine KNN
Dataset     | AD.I     AD.II    AD.III       | AD.I     AD.II    AD.III       | AD.I     AD.II    AD.III
MAE         | 0.2543   0.113701 0.146524     | 0.292383 0.083333 0.092022     | 0.31941  0.103814 0.150079
TP          | 0.742015 0.292793 1            | 0.700246 0.617117 0.96648      | 0.685504 0.608108 0.925512
FP          | 0.257985 0.707207 0            | 0.299754 0.382883 0.03352      | 0.314496 0.391892 0.074488
TN          | 0.749386 0.99665  0.033854     | 0.714988 0.972362 0.580729     | 0.675676 0.949749 0.427083
FN          | 0.250614 0.00335  0.966146     | 0.285012 0.027638 0.419271     | 0.324324 0.050251 0.572917
Precision   | 0.742015 0.292793 1            | 0.700246 0.617117 0.96648      | 0.685504 0.608108 0.925512
Recall      | 0.747525 0.988688 0.508609     | 0.710723 0.957134 0.697441     | 0.678832 0.923672 0.617655
Accuracy    | 0.7457   0.886299 0.853476     | 0.707617 0.916667 0.907978     | 0.68059  0.896186 0.849921
F-measure   | 0.74476  0.451791 0.674276     | 0.705446 0.750406 0.810211     | 0.682152 0.733385 0.740875

analysis a statistical element, in the form of a mid-p-value McNemar test, as in [7, 24]. The purpose of this test is to assert whether the difference between the
predictive accuracies of two binary classification models, that are trained and tested
on the same sets of data, is in fact statistically significant. In other words, one tests
whether two classifiers demonstrate equal (null hypothesis) or unequal (alternative
hypothesis) accuracy in predicting the true class (in this case, the attrited customer).
MATLAB’s statistics and machine learning toolbox comes with a pre-built solution
to that, namely, the testcholdout function.
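A representative call, assuming yhat1 and yhat2 hold the two models' predicted labels on the same test set with true labels ytest, might look as follows:

```matlab
[h, pValue] = testcholdout(yhat1, yhat2, ytest, 'Test', 'midp');
% h = 1 rejects the null hypothesis of equal predictive accuracies at the
% default 5% significance level; pValue is the mid-p-value of the test
```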
For each of the two MTA-WASD neural networks, we took all possible combi-
nations of pairs consisting of the target neural network and one of the other five
models and applied the mid-p-value McNemar test. At the 5% significance level, we
were able to assess that the two MTA-WASD models are not statistically different in
terms of their accuracy, in any of the three datasets. Moreover, regarding the accu-
racy of both models in AD.III, in which they formerly scored second, there is strong
evidence that it is equal to Linear SVM’s, which took first place. Other than that,
the evidence suggests that in AD.I there is no distinction between the accuracies of
the MTA-WASD based on SA1 and SA2, the Bernoulli WASD and the KNB model.
However, although McNemar’s test does not extend to that, the precision of the MTA-
WASD model is quite higher, therefore it is reasonable that one would choose that
neural network over the others if there was ever a need to make predictions on more
data coming from the same source. Last but not least, substituting Linear SVM in
the place of KNB, the same argument can be made in AD.II. Again, although there
seems to be no strong evidence towards rejecting the null hypothesis that the four
models perform equally in terms of accuracy, the scores of the MTA-WASD neural
network in the rest of the metrics suggest that it is again the better choice for use in
this dataset.
The collective results regarding the mid-p-value McNemar test, involving the two
MTA-WASD neural networks based on SA1 and SA2 versus the rest of the models,
are presented in Tables 16.2 and 16.3, respectively.

Table 16.2 McNemar test on the MTA-WASD neural network based on SA1

SA1 versus  | AD.I: null hypothesis / p-value | AD.II: null hypothesis / p-value | AD.III: null hypothesis / p-value
SA2         | Not rejected / 0.362032         | Not rejected / 0.401061991      | Not rejected / 1
KNB         | Not rejected / 0.687149         | Rejected / 0.000135809          | Rejected / 1.79112E−17
Linear SVM  | Rejected / 0.007284             | Not rejected / 0.687885344      | Not rejected / 0.734342033
Fine KNN    | Rejected / 5.42E−05             | Rejected / 0.010212663          | Rejected / 6.21851E−18
Bernoulli   | Not rejected / 0.917041         | Not rejected / 0.897421827      | Rejected / 8.54198E−30

Table 16.3 McNemar test on the MTA-WASD neural network based on SA2

SA2 versus  | AD.I: null hypothesis / p-value | AD.II: null hypothesis / p-value | AD.III: null hypothesis / p-value
SA1         | Not rejected / 0.362032         | Not rejected / 0.401061991      | Not rejected / 1
KNB         | Not rejected / 0.257761         | Rejected / 1.83849E−05          | Rejected / 1.79112E−17
Linear SVM  | Rejected / 0.000774             | Not rejected / 0.32447878       | Not rejected / 0.734342033
Fine KNN    | Rejected / 4.65E−06             | Rejected / 0.002672984          | Rejected / 6.21851E−18
Bernoulli   | Not rejected / 0.267812         | Not rejected / 0.470879014      | Rejected / 8.54198E−30

16.4 Conclusion

This paper introduces a trigonometrically activated classification neural network, called MTA-WASD, that is trained through a WASD algorithm. The activation
functions SA1 and SA2 are defined as products of power based trigonometric sub-
activations, where each term is assigned a power based on a lexicographically ordered
power table. The training process incorporates structure trimming techniques and
employs a value lexicon that allows the MTA-WASD neural network to train at high
speed. Application of the MTA-WASD neural network to three publicly available
credit card attrition datasets and subsequent comparison of its performance with other popular classifiers have demonstrated that the MTA-WASD model constitutes a reliable solution when it comes to classification problems that involve imbalanced datasets.
A limitation of the study was the shortage of data, as only three datasets were considered. Some areas of future research can be pointed out:
1. The development of an ensemble of trigonometrically activated WASD classi-
fiers, potentially through the use of the already established algorithms, may be
investigated.
2. A different implementation of the power table generation algorithm that allows
for a stopping criterion in the number of total rows produced, would enable the
training algorithm to be applied to larger datasets that one would expect to encounter
in the banking industry.

References

1. Bigdeli, B., Pahlavani, P., Amirkolaee, H.A.: An ensemble deep learning method as data fusion
system for remote sensing multisensor classification. Appl. Soft Comput. 110, 107563 (2021)
2. Chitra, K., Subashini, B.: Customer retention in banking sector using predictive data mining
technique. In: ICIT 2011 The 5th International Conference on Information Technology (2011)
3. Esen, H., Esen, M., Ozsolak, O.: Modelling and experimental performance analysis of solar-
assisted ground source heat pump system. J. Exp. Theor. Artif. Intell. 29(1), 1–17 (2017)

4. Esen, H., Inalli, M., Sengur, A., Esen, M.: Artificial neural networks and adaptive neuro-fuzzy
assessments for ground-coupled heat pump system. Energy Build. 40(6), 1074–1083 (2008)
5. Esen, H., Inalli, M., Sengur, A., Esen, M.: Performance prediction of a ground-coupled heat
pump system using artificial neural networks. Expert Syst. Appl. 35(4), 1940–1948 (2008)
6. Esen, H., Ozgen, F., Esen, M., Sengur, A.: Artificial neural network and wavelet neural network
approaches for modelling of a solar air heater. Expert Syst. Appl. 36(8), 11240–11248 (2009)
7. Fagerland, M.W., Lydersen, S., Laake, P.: The McNemar test for binary matched-pairs data:
mid-p and asymptotic are better than exact conditional. BMC Med. Res. Methodol. 13(1), 1–8
(2013)
8. Farquad, M.A.H., Ravi, V., Raju, S.B.: Churn prediction using comprehensible support vector
machine: an analytical CRM application. Appl. Soft Comput. 19, 31–40 (2014)
9. García, D.L., Nebot, À., Vellido, A.: Intelligent data analysis approaches to churn as a business
problem: a survey. Knowl. Inf. Syst. 51(3), 719–774 (2017)
10. He, B., Shi, Y., Wan, Q., Zhao, X.: Prediction of customer attrition of commercial banks based
on SVM model. Procedia Comput. Sci. 31, 423–430 (2014)
11. Hu, X.: A data mining approach for retailing bank customer attrition analysis. Appl. Intell.
22(1), 47–60 (2005)
12. Huang, C., Jia, X., Zhang, Z.: A modified back propagation artificial neural network model
based on genetic algorithm to predict the flow behavior of 5754 aluminum alloy. Materials
11(5), 855 (2018)
13. Katsikis, V.N., Mourtas, S.D., Stanimirović, P.S., Zhang, Y.: Solving complex-valued time-
varying linear matrix equations via QR decomposition with applications to robotic motion
tracking and on angle-of-arrival localization. IEEE Trans. Neural Networks Learn. Syst. 33(8),
3415–3424 (2022)
14. Kim, S., Shin, K.-S., Park, K.: An application of support vector machines for customer churn
analysis: credit card case. In: International Conference on Natural Computation, pp. 636–647.
Springer, Berlin (2005)
15. Kumar, D.A., Ravi, V., et al.: Predicting credit card customer churn in banks using data mining.
Int. J. Data Anal. Tech. Strateg. 1(1), 4–28 (2008)
16. Mourtas, S., Katsikis, V., Kasimis, C.: Feedback control systems stabilization using a bio-
inspired neural network. EAI Endorsed Trans. AI Rob. 1(1), 1–13 (2022)
17. Mourtas, S.D.: A weights direct determination neuronet for time-series with applications in
the industrial indices of the federal reserve bank of St. Louis. J. Forecast. 14(7), 1512–1524
(2022)
18. Mourtas, S.D., Katsikis, V.N., Drakonakis, E., Kotsios, S.: Stabilization of stochastic exchange
rate dynamics under central bank intervention using neuronets. Int. J. Inf. Technol. Decis. Mak.
22(2), 855–883 (2023)
19. Nie, G., Wang, G., Zhang, P., Tian, Y., Shi, Y.: Finding the hidden pattern of credit card holder’s
churn: a case of China. In: International Conference on Computational Science, pp. 561–569.
Springer, Berlin
20. Poveda, R., Alvaro, C.: Forecasting credit card attrition using machine learning models. In:
ICAIW 2020: Workshops at the Third International Conference on Applied Informatics 2020,
29–31 Oct 2020, Ota, Nigeria (2020)
21. Premalatha, N., Valan Arasu, A.: Prediction of solar radiation for solar systems by using ANN
models with different back propagation algorithms. J. Appl. Res. Technol. 14(3), 206–214
(2016)
22. Simos, T.E., Katsikis, V.N., Mourtas, S.D.: A fuzzy WASD neuronet with application in breast
cancer prediction. Neural Comput. Appl. 34, 3019–3031 (2021)
23. Simos, T.E., Katsikis, V.N., Mourtas, S.D.: Multi-input bio-inspired weights and structure deter-
mination neuronet with applications in European Central Bank publications. Math. Comput.
Simul. 193, 451–465 (2022)
24. Simos, T.E., Katsikis, V.N., Mourtas, S.D.: A multi-input with multi-function activated weights
and structure determination neuronet for classification problems and applications in firm fraud
and loan approval. Appl. Soft Comput. 127, 109351 (2022)
346 V. N. Katsikis et al.

25. Simos, T.E., Mourtas, S.D., Katsikis, V.N.: Time-varying Black-Litterman portfolio optimiza-
tion using a bio-inspired approach and neuronets. Appl. Soft Comput. 112, 107767 (2021)
26. Tang, L., Thomas, L., Fletcher, M., Pan, J., Marshall, A.: Assessing the impact of derived
behavior information on customer attrition in the financial service industry. Eur. J. Oper. Res.
236(2), 624–633 (2014)
27. Van den Poel, D., Lariviere, B.: Customer attrition analysis for financial services using propor-
tional hazard models. Eur. J. Oper. Res. 157(1), 196–217 (2004)
28. Vera, R., Ossandón, S.: On the prediction of atmospheric corrosion of metals and alloys in
Chile using artificial neural networks. Int. J. Electrochem. Sci 9(12), 7131–7151 (2014)
29. Wang, G., Liu, L., Peng, Y., Nie, G., Kou, G., Shi, Y.: Predicting credit card holder churn
in banks of china using data mining and MCDM. In: 2010 IEEE/WIC/ACM International
Conference on Web Intelligence and Intelligent Agent Technology, vol. 3, pp. 215–218. IEEE
(2010)
30. Zhang, Y., Chen, D., Jin, L., Wang, Y., Luo, F.: Twice-pruning aided WASD neuronet of
Bernoulli-polynomial type with extension to robust classification. In: 2013 IEEE 11th Inter-
national Conference on Dependable, Autonomic and Secure Computing, pp. 334–339. IEEE
(2013)
31. Zhang, Y., Chen, D., Ye, C.: Deep Neural Networks: WASD Neuronet Models, Algorithms,
and Applications. CRC Press (2019)
32. Zhang, Y., Wang, Y., Li, W., Chou, Y., Zhang, Z.: WASD algorithm with pruning-while-growing
and twice-pruning techniques for multi-input Euler polynomial neural network. Int. J. Artif.
Intell. Tools 25(02), 1650007 (2016)
33. Zhang, Y., Yu, X., Xiao, L., Li, W., Fan, Z., Zhang, W.: Weights and structure determination of
artificial neuronets. In: Self-Organization: Theories and Methods. Nova Science, New York,
NY, USA (2013)
Chapter 17
Stock Market Prediction Using Machine
Learning: Evidence from India

Subhamitra Patra, Trilok Nath Pandey, and Biswabhusan Bhuyan

Abstract The literature deciphers the dynamics of the stock market environment across
regions. Moreover, an emerging stock market like India has experienced several
upturns and downturns owing to its continuous economic reforms since the early
1990s, which make the Indian stock markets exhibit diversified information
characteristics. The chapter predicts the movements of the Indian stock markets over
2000–2022 and observes certain dynamism in both the actual and predicted trends
of the Indian stock markets. The results reveal that Long Short-Term Memory,
holding time-independence characteristics and a greater extent of prediction accuracy,
proves to be the best machine learning technique to predict the movement of the
Indian stock markets. Moreover, the degree of prediction accuracy of all the machine
learning techniques except Long Short-Term Memory varies from one time to another.
On the other hand, support vector machines and linear regression models, with their
lowest degree of prediction accuracy and highest errors, prove least appropriate in
predicting the movements of Nifty and Sensex respectively. The robustness of our
method would benefit from testing it on other markets and time periods. The study
also discusses the strengths and weaknesses of several machine learning techniques
and provides important insights into applying advanced technologies for stock market
prediction in an emerging economy like India. Our prediction approach provides a
potentially beneficial alternative for investors to identify return opportunities
and achieve diversification benefits by mitigating risk while investing in the
Indian stock markets.

S. Patra (B)
Goa Institute of Management, Sanquelim, Goa 403505, India
e-mail: [email protected]
T. N. Pandey
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil
Nadu 600127, India
e-mail: [email protected]
B. Bhuyan
Department of Economics, Maharaja Purna Chandra (Autonomous) College, Baripada,
Odisha 757003, India
e-mail: [email protected]


Keywords Forecasting · Machine learning · Time series · Financial markets · Model validation · India

JEL Classification G17 · C45 · C22 · D53 · C52 · G19

17.1 Introduction

In this chapter, we predict the stock price movements of an emerging economy like
India using sophisticated machine learning techniques over more than two decades. Since
the early 2000s, continuous economic and financial market reforms in the Indian economy
have been creating a dynamic information environment in the stock markets. In this
scenario, the prediction of stock markets has become one of the most challenging tasks
for investors [12]. Despite the establishment of the efficient market hypothesis
(EMH)1 of Fama [15], the inquiry into different models and profitable systems
still attracts considerable attention from academia aiming to predict stock price movements
[45]. Recently, a group of studies [6, 28] argued against the all-or-none condition
of EMH. However, some other studies supported the essence of the adaptive market
hypothesis (AMH)2 not only in the case of an emerging stock market like India [5,
22], but also across regions [36–38]. Moreover, a sophisticated predictive model
capable of generating excess returns not only helps investors obtain
large profits but also deviates the stock prices from the random-walk benchmark [19].
Continuously changing economic situations such as reforms, financial crashes,
bubbles, manias, changes in the political environment, and uncertainty in investors' biases
make the stock prices chaotic and noisy [4, 18, 29], and therefore increase the
degree of volatility co-movement between stock market liquidity and informational
efficiency [39]. Moreover, an emerging stock market like India has experienced
several upturns and downturns owing to its continuous economic reforms since
the early 1990s [5, 22]. According to the S&P fact book (Standard & Poor's 2012),
the Indian stock markets have grown to list the largest number of companies on their
stock exchanges, with changes in market microstructure and transparent, advanced
trading practices. Moreover, the increased integration of the Indian stock markets
with the world economy and the growing ratio of stock market capitalization
to GDP explain the phenomenal growth of the Indian stock markets. Therefore,
emerging stock markets like India exhibit diversified information characteristics
and hold complexities in their price patterns that distinguish them from developed
stock markets, and they have emerged as the most important destination for foreign
institutional investors [21]. In this respect, the future of the Indian stock markets is uncertain, and

1 EMH indicates that, in order to be informationally efficient, the stock market needs to follow
the random-walk benchmark, which makes stock price movements unpredictable over time.
2 Lo [32] proposed AMH, which indicates that financial markets, rather than being in an all-or-none
condition, evolve over time and therefore remain adaptive to several economic and non-economic
events.

thus it is necessary to predict the future price patterns of the Indian stock exchanges
to reap the investment benefits over time.
The present study contributes to the existing literature in the following ways.
Departing from previous research, we use several machine learning techniques
to predict the historical movements of the Indian stock prices. The existing literature
extensively used technical [11, 31] and fundamental analysis [10] to predict
future market trends and the factors associated with them. Some other
studies used moving averages, autoregressive models, discriminant analyses, and
correlations to predict financial time series [29, 44]. Recently, the use of artificial
intelligence systems in the prediction of chaotic, random, and non-linear financial time
series [12, 44, 48] has become the most promising area of research, and therefore
demands further empirical investigation. The present research complements the
literature on the application of machine learning techniques in the financial market,
and extends the existing work to the usage of machine learning models in
the context of the Indian stock markets, an issue seldom explored before.
In particular, to the best of our knowledge, the present study is the first to investigate
and compare the applications of several machine learning techniques in an emerging
stock market like India during both the pre- and post-Covid periods. Further, the issue
of nonlinearity in the Indian stock prices is addressed in this chapter, an issue that has
seldom received due attention in India in the post-Covid period.
The remainder of the chapter is structured as follows. Section 17.2 provides a
brief review of machine learning techniques and their applications. We explain
the data and methods in Sect. 17.3. Section 17.4 discusses the main results and
evaluates the machine learning applications in the case of the Indian stock market.
Section 17.5 summarizes and concludes the chapter.

17.2 A Review on Machine Learning Techniques

Artificial intelligence systems are designed to efficiently deal with chaotic, random, and
non-linear financial time series [12]. Machine learning techniques, which combine
artificial intelligence systems, seek to extract patterns learned from historical
data3 to subsequently make predictions about new data4 [47]. Prediction
using machine learning techniques is done in two phases. The first phase deals
with selecting the relevant variables and models for the prediction, separating a
portion of the data for training and validation of the models, and then optimizing the
models. In the second phase, the optimized models are applied to the data reserved
for testing, which measures the predictive performance of the model. The existing
literature employs different machine learning techniques, such as artificial neural

3 The process of learning from the historical data is known as learning or training the dataset in the
machine learning approach.
4 The process of making predictions about the new data is known as testing the dataset in the
machine learning approach.

networks (ANN) [49], support vector machines [40], and random forests (RF) [41]
to predict the time series.
In general, neural networks were developed to model biological processes [1],
particularly the human system of learning and identifying patterns [43]. The basic
unit of a neural network, known as the neuron, imitates its human equivalent, with
dendrites to receive the input variables and produce an output value, which can also
serve as the input for other neurons [30]. In this way, the basic processing units
of the neural network are interconnected, and certain weights are attributed to each
connection [31]. These weights are adjusted in each learning process of the network in
the first phase [29]. In particular, in the first phase, the model optimizes the
interconnections between the layers of neurons by transferring the parameters from
one layer to the next, thereby minimizing the errors in the prediction of the subsequent
dataset. Accordingly, the last layer of the neural network combines all the signals
from the previous layers and converts them into one output signal, which is known
as the response of the network to the input data.
Another important machine learning technique, the Support Vector Machine (SVM),
considers the training samples and efficiently transforms the training data from their
original dimension space to another space that approximates a linear separation
by a hyperplane [25]. This technique is commonly used to classify the training
data based on the input variables in the model. In this technique, the transformation
is made with the help of kernel functions from the space of the original dimensions
to the space in which the classifications are performed while training the dataset
[33]. The major difference between ANN and SVM is that the former minimizes the
errors of its empirical responses in the first phase of the training stage, whereas
the latter minimizes the upper threshold of the error of its classifications [24].
As an alternative to ANN and SVM, the machine learning literature often uses
another technique, namely the decision tree (DT), to predict financial time series.
This method divides the dataset into various subsets based on the values of the input
variables until the basic classification unit is obtained in accordance with the training
sample [3]. Moreover, the consistent classifications of the most accurate trees can be
efficiently combined into a single one with the RF algorithm [9]. The combination of
DT and RF machine learning techniques can not only be used in the regressions or
classifications of the training samples, but can also efficiently predict the financial
markets [2, 26, 28, 29, 35].
The prediction of the stock market, given the non-stationary behavior of its price
patterns, is challenging [12, 42, 50]. Moreover, the dynamism in the stock price patterns is
influenced by the dynamic trends of the economy, industry, polity, and the psychological
behavior of the investors [37, 51]. Thus, the prediction of the future behavior
of the stock market should be enriched with the use of advanced techniques and their
practical application to the historical price data for evaluating the profitability of the
techniques [20]. Recently, the use of machine learning techniques in the prediction
of financial time series with chaotic, noisy, and non-linear dynamics has become
a more promising area of research [12]. Machine learning techniques integrating
several artificial intelligence systems seek to extract specific patterns learned from

historical data to subsequently make predictions about new data [47]. Against this
backdrop, the prediction of the chaotic stock prices of an emerging economy like India
remains challenging [42, 50], but intriguing over time.
Overall, the literature has used different machine learning techniques to predict
financial time series, but the application of multiple machine learning techniques
to the prediction of the stock prices of an emerging economy like India has
seldom been explored before. Departing from previous research, we employ several
machine learning techniques, namely Artificial Neural Networks (ANN), Long
Short-Term Memory (LSTM), Decision Tree Regression (DT), Random Forest (RF),
Support Vector Machine (SVM), Linear Regression (LR), Ridge Regression (RR),
and K-Nearest Neighbors (KNN) Regression, to predict the movement of the Indian
stock prices over two decades, which makes the present research unique in its own
way.

17.3 Data and Methods

17.3.1 Data

Twenty-two years of data on the daily closing stock prices of both Sensex and Nifty,
ranging from 3 January 2000 to 11 October 2022, were collected from the websites of
the Bombay Stock Exchange (BSE) and the National Stock Exchange (NSE) respectively.
Both indices are expressed in Indian Rupees. We divided the full sample into training
and testing samples, considering 80% of the total sample for training and the
remaining 20% for testing.
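
The split is chronological, so the models are always evaluated on data that lies after their training window. A minimal sketch of such a split, assuming the prices are held in a pandas DataFrame (the variable name prices is a hypothetical placeholder):

import pandas as pd

def chronological_split(prices: pd.DataFrame, train_frac: float = 0.8):
    """Split a time-ordered price series into earlier (training) and later (testing) parts."""
    cut = int(len(prices) * train_frac)  # index of the first test observation
    return prices.iloc[:cut], prices.iloc[cut:]

# train, test = chronological_split(prices)  # 80% train, 20% test, in time order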

17.3.2 Methods

17.3.2.1 Artificial Neural Networks (ANN)

An Artificial Neural Network (ANN) is a machine learning algorithm that takes its
inspiration from the structure and function of biological neural networks [27]. It
consists of interconnected artificial neurons that process input data in order
to generate an output. Each artificial neuron receives input from the previous layer
of neurons and processes it using an activation function before passing the result to
other neurons in the network [16]. The output of the model is produced by the output
layer of neurons, which combines the intermediate outputs of the hidden layers of
neurons.

In its simplest form, an ANN consists of one neuron that receives input from one or more input
nodes and produces an output. The output is calculated from the weighted sum of
the inputs and a bias term, using an activation function. The weights and the bias can be
adjusted to optimize the performance of the neuron on a particular task. The output
of a neuron can be calculated using the following equation:
y = f(b + Σ_{i=1}^{n} x_i w_i) (17.1)

where,
y: Output of the neuron.
f: Activation function (generally a sigmoid function or a rectified linear unit
function).
b: Bias term.
x_1 to x_n: Inputs.
w_1 to w_n: Corresponding weights.
ANNs can be used for time series regression tasks, in which the goal is to predict the
value of a continuous variable at a future time based on past observations [17]. ANNs
are a good choice for time series regression because they are able to capture complex
and non-linear relationships between the variables and can adapt to changing patterns
in the data over time.
There are many different types of ANN architectures that can be used for time
series regression tasks, including feedforward neural networks, convolutional neural
networks, and recurrent neural networks. Each type of ANN has its own strengths
and weaknesses, and the appropriate choice depends on the characteristics
of the specific time series regression task. ANNs are trained using a variant of
stochastic gradient descent called back-propagation [46]. During training, the model
is presented with a sequence of input observations and the corresponding target
values, and the weights of the network are updated based on the prediction error.
Overall, ANNs are a powerful and widely used tool for time series regression tasks
and can achieve good results when used appropriately. The advantages of artificial
neural networks for regression tasks include: (a) the ability to model non-linear
relationships, which is particularly useful in regression tasks where the relationship
is not well understood or not well described by a linear model; (b) ANNs are able to
handle large amounts of data and can perform well even when the number of
observations is much larger than the number of variables; (c) ANNs have the ability
to capture long-term dependencies in the data, which can be useful in regression
tasks where past observations have a significant influence on the outcome. Besides
these advantages, some potential disadvantages include: (a) ANNs can be
computationally intensive to train, especially for large datasets; (b) it can be difficult
to interpret the internal workings of an ANN, which can make it harder to understand
why the model is making certain predictions; (c) ANNs are sensitive to the scaling
and distribution of the input variables and may require careful preprocessing to
achieve good performance; (d) ANNs have the potential to over-fit the training data
if there are too many parameters relative to the number of observations, which can
lead to poor generalization to unseen data.
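
To make Eq. (17.1) concrete, the following minimal sketch computes the output of a single sigmoid-activated neuron; the input values and weights are arbitrary illustrative numbers, not parameters from the models estimated in this chapter.

import numpy as np

def neuron_output(x, w, b):
    """Eq. (17.1): weighted sum of inputs plus bias, passed through a sigmoid f."""
    z = b + np.dot(x, w)
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation

x = np.array([0.5, -1.2, 3.0])   # illustrative inputs x_1..x_3
w = np.array([0.4, 0.1, -0.2])   # illustrative weights w_1..w_3
print(neuron_output(x, w, b=0.05))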

17.3.2.2 Long Short-Term Memory (LSTM)

Long short-term memory (LSTM) is a type of artificial neural network that is designed
to remember information for long periods of time [23]. LSTMs have a different
structure from traditional ANNs, in the sense that they contain "memory cells" that can
retain information for long periods of time, as well as input, output, and forget gates
(controlled by sigmoid activation functions) that control the flow of information into
and out of the memory cells. LSTM is particularly useful for tasks that involve sequential
data, such as language translation or stock price prediction, because it is able to
maintain a record of past events that can influence the present or future.

The figure above displays a block of an LSTM network at any time t.


LSTM networks are particularly well suited for time series regression tasks.
LSTMs are able to capture long-term dependencies in the data, which is important
for time series regression because past observations can have a significant influence
on future predictions. LSTMs are also designed to handle sequential data, which is
a common characteristic of time series data, and they are able to maintain a record
of past events that can influence the present or future. LSTMs can be trained using
a variant of stochastic gradient descent called backpropagation through time (BPTT).
During training, the model is presented with a sequence of input observations and
the corresponding target values, and the weights of the network are updated based
on the prediction error.
Some advantages of long short-term memory networks for regression tasks include:
(a) LSTMs are able to capture long-term dependencies in the data, similar to
ANNs; (b) LSTMs are designed to handle sequential data, which can be useful in
regression tasks where the order of the observations is important; (c) LSTMs can
adapt to changing patterns in the data over time, which is often the case in real-world
time series data; (d) LSTMs have achieved state-of-the-art results on a wide range of
regression tasks. The potential disadvantages of using LSTMs for regression tasks
are similar to those of ANN modeling.
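
As an illustration, a minimal one-step-ahead LSTM forecaster in Keras might look as follows; the window length, layer size, and training settings are illustrative assumptions, not the exact configuration used in this chapter, and scaled_train_prices is a hypothetical placeholder for the scaled training series.

import numpy as np
import tensorflow as tf

def make_windows(series, window=30):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(30, 1)),  # gated memory cells
    tf.keras.layers.Dense(1),                       # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")         # trained via BPTT internally
# X_train, y_train = make_windows(scaled_train_prices)
# model.fit(X_train, y_train, epochs=20, batch_size=64)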

17.3.2.3 Decision Tree Regression (DT)

A decision tree regressor is a type of model used for regression tasks that works by
building a tree-like structure in which the internal nodes represent decision points
and the leaf nodes represent the predicted values. The model makes predictions by
starting at the root node and following the path down the tree based on the values
of the input features. The predicted value is then the value at the leaf node that is
reached. A simple example is a decision tree that evaluates the smallest of three
numbers.

One way to represent the decision at each node mathematically is with a simple
equation:

y = mean(observations) (17.2)

where y is the predicted value for the input observations, mean is the function that
calculates the average of the values of the observations that fall into the leaf node,
and observations are the values of the observations in the leaf node.
Decision tree regressors can be used for time series regression tasks, in which
the goal is to predict the value of a continuous variable at a future time based on
past observations. However, decision tree regressors may not be the best choice for
all time series regression tasks due to their inherent limitations. One potential issue
with using decision tree regressors for time series regression is that they do not take
into account the time component of the data. Decision tree regressors treat each input
observation independently, regardless of when it occurred. This can be a problem in
time series regression tasks because the value of the response variable may depend on
the order of the observations. Another potential issue is that decision tree regressors
are prone to overfitting, especially when the tree becomes deep and has many nodes.
This can be a problem in time series regression tasks because the model may not
generalize well to future data. Overall, decision tree regressors may be a simple and
easy-to-understand choice for time series regression tasks, but they may not always
be the most accurate or robust option. The advantages of this model include: (a) it is
simple to understand and interpret; (b) decision tree models do not require the data
to be normally distributed or the relationships between variables to be linear; (c) they
can handle high-dimensional data. The disadvantages include: (a) proneness to
over-fitting; (b) limited ability to model complex relationships; (c) poor performance
on small datasets; (d) poor performance on imbalanced datasets.
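
One common way to give a tree access to the time component is to use lagged observations as features, as discussed for the other models below. A minimal sketch under that assumption (the lag length, depth cap, and variable names train_prices and X_test are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def lagged_matrix(series, n_lags=5):
    """Stack the previous n_lags observations as the features for each target."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

# X, y = lagged_matrix(train_prices)         # train_prices: placeholder series
# tree = DecisionTreeRegressor(max_depth=8)  # capping depth curbs overfitting
# tree.fit(X, y); y_hat = tree.predict(X_test)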

17.3.2.4 Random Forest (RF)

A random forest (RF) is an ensemble machine learning method that uses multiple
decision trees to make predictions. It works by training multiple decision trees on
randomly selected subsets of the training data and then averaging the predictions
made by each tree. Random forests are a popular machine learning method because
they are able to improve the accuracy of the predictions made by individual decision
trees by reducing overfitting and improving the ability to generalize to unseen data.
The figure below represents the random forest structure.

One way to represent the prediction made by a random forest mathematically is
with the following equation:

y = average(DT_predictions) (17.3)

where,
y: Overall prediction made by the random forest.
average: Function that calculates the average of the predictions made by each of
the individual decision trees in the forest.
DT_predictions: Predictions made by the individual decision trees.
One potential issue with using random forests for time series regression is that
they do not take into account the time component of the data. Like individual deci-
sion trees, random forests treat each input observation independently, regardless of
when it occurred. To address this issue, some researchers have proposed methods
for incorporating the time component into the random forest model. For example,
one approach is to use lagged variables as input features, which can capture the
dependencies between observations at different times.
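
A minimal sketch of this lagged-feature approach with a random forest, reusing the hypothetical lagged_matrix helper sketched in Sect. 17.3.2.3 (the number of trees is an illustrative choice):

from sklearn.ensemble import RandomForestRegressor

# X, y = lagged_matrix(train_prices, n_lags=5)  # lagged prices as features
rf = RandomForestRegressor(n_estimators=300, random_state=0)
# rf.fit(X, y)
# y_hat = rf.predict(X_test)  # the forest averages its trees' outputs, as in Eq. (17.3)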

17.3.2.5 Support Vector Machine (SVM)

A support vector machine (SVM) is a type of model used for classification and
regression tasks. It works by finding the hyperplane in a high-dimensional space
that maximally separates the different classes or values of the response variable.
SVMs are a popular machine learning method because they are able to achieve good
generalization performance and are effective at handling high-dimensional data.

One way to represent the prediction made by an SVM mathematically is with the
following equation:

y = sign(w ∗ x + b) (17.4)

where,
w, b: Model parameters that define the hyperplane.
x: Input data.
sign: sign function; returns a positive value if the argument is positive and a
negative value if the argument is negative.
y: The class or value that is associated with the positive or negative value.
SVMs are a good choice for time series regression because they are able to achieve
good generalization performance and are effective at handling high-dimensional data.
One potential issue with using SVMs for time series regression is that they do not take
into account the time component of the data. To address this issue, lagged variables
can be used as input features, which capture the dependencies between observations
at different times. SVMs can use different kernel functions, which allows them to
model different types of relationships between the variables.
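
A minimal sketch of a support vector regressor on the same lagged features; the RBF kernel and the C and epsilon values are illustrative assumptions, and scaling the inputs first matters because SVMs are sensitive to feature scale.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# The RBF kernel implicitly maps the lagged inputs into a higher-dimensional
# space before fitting the regression function.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
# svr.fit(X, y); y_hat = svr.predict(X_test)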

17.3.2.6 Linear Regression (LR)

Linear regression (LR) is a type of model used for regression tasks that assumes a
linear relationship between the input features and the response variable. It works by
finding the line of best fit that minimizes the sum of the squared differences between
the predicted values and the true values. Linear regression is a simple and widely-
used method for regression tasks, but it is limited in its ability to model complex,
non-linear relationships between the variables.

One way to represent a linear regression model mathematically is with the
following equation:

y = b_0 + b_1 x_1 + b_2 x_2 + · · · + b_n x_n (17.5)

where,
y: Predicted value of the response variable.
b_0: Intercept term.
b_1, b_2, …, b_n: Coefficients for the input features.
x_1, x_2, …, x_n: Input features.
The predicted value is thus a linear combination of the input features and the
coefficients.
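
With lagged prices as the input features, Eq. (17.5) amounts to an autoregressive model fitted by ordinary least squares. A minimal sketch, reusing the hypothetical lagged_matrix helper from Sect. 17.3.2.3:

from sklearn.linear_model import LinearRegression

# X, y = lagged_matrix(train_prices, n_lags=5)  # lagged prices as x_1..x_n
lr = LinearRegression()
# lr.fit(X, y)
# b0, b = lr.intercept_, lr.coef_               # Eq. (17.5) coefficients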

17.3.2.7 Ridge Regression (RR)

In ridge regression (RR), the goal is to minimize the residual sum of squares (RSS)
between the predicted output and the true output, subject to a penalty on the size of
the coefficients. This constraint helps to prevent overfitting by penalizing models
with large coefficients.
The mathematical equation for ridge regression is given by:

minimize RSS + α · Σθ² (17.6)

where θ is the vector of model parameters (coefficients), RSS is the residual sum of
squares, and α is the regularization parameter that controls the strength of the penalty.
The regularization term Σθ² is also known as the L2 penalty or the "shrinkage
penalty", as it encourages the model parameters to take on smaller values. The
strength of the penalty is controlled by the hyperparameter α, which is chosen by the
user. A larger value of α results in a stronger penalty and a smaller value of α results
in a weaker penalty. Ridge regression is often used to improve the generalization
of linear models by reducing the variance of the estimates. It is particularly useful
when the number of features is large, as it helps to prevent overfitting by penalizing
models with large coefficients. Ridge regression can be used for time series regres-
sion by using lagged variables as features. To use ridge regression for time series
regression, we created a design matrix with lagged variables as the features and the
target variable as the output. Then, we fit a ridge regression model to this design
matrix to make predictions about the target variable at future time points.
A large value of α can result in over-regularization, which can lead to poor perfor-
mance. On the other hand, a small value of α can result in under-regularization, which
can lead to overfitting. Therefore, it is usually necessary to tune the value of α using
cross-validation to find the optimal value.
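
A minimal sketch of this cross-validated tuning of α, using chronologically ordered folds so that the validation data always lies after the training data; the alpha grid and the number of splits are illustrative assumptions.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": np.logspace(-3, 3, 13)},  # candidate penalty strengths
    cv=TimeSeriesSplit(n_splits=5),                # time-ordered validation folds
    scoring="neg_root_mean_squared_error",
)
# search.fit(X, y)
# best_alpha = search.best_params_["alpha"]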

17.3.2.8 K-Nearest Neighbors (KNN) Regression

K-nearest neighbors (KNN) is a type of model used for regression tasks that works
by finding the K data points in the training set that are most similar to the input
data point and averaging their target values to make a prediction. KNN is a simple
and easy-to-understand method for regression tasks, but it can be computationally
expensive, as it requires calculating the distances between the input data point and
all the data points in the training set.
One way to represent the prediction made by a KNN model mathematically is
with the following equation:

y = average(target values of the K nearest neighbors) (17.7)

where,
y: Predicted value.
average: Function that calculates the average of the target values of the K nearest
neighbors.
K: Number of nearest neighbors to consider.
Lagged variables are used as input features, which capture the dependencies
between observations at different times. The machine learning tools discussed above
all have their advantages and disadvantages; therefore, it is better not to depend on one
model. In this study, we estimate all of these machine learning tools for better
comparison and conclusions.
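
A minimal KNN sketch on the same lagged features; scaling matters because KNN relies on distances, and K = 5 is an illustrative default that would normally be tuned on a validation set.

from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Predicts by averaging the targets of the K most similar lagged-input
# vectors, as in Eq. (17.7).
knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
# knn.fit(X, y); y_hat = knn.predict(X_test)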

17.4 Empirical Results

We split the market dataset into two parts: a 20% testing sample (i.e. the orange-
colored portion in Fig. 17.1) and an 80% training sample (i.e. the blue-colored portion
in Fig. 17.1), as follows.
The prices of both Nifty (Fig. 17.1a) and Sensex (Fig. 17.1b) decline in 2009,
2012–13, 2016, and 2020. On the other hand, we observe the greatest upturns in the
Indian stock markets during 2022. We extensively surveyed national newspapers and
published reports of the Indian monetary authorities, namely the RBI and SEBI, to
identify the dynamic socio-economic events associated with the upturns and downturns
of the Indian stock prices. Dynamic economic situations such as the global financial
crisis (GFC), the Eurozone sovereign debt crisis (ESC), the demonetization of Indian
bank notes, and the Covid-19 global pandemic account for the respective downturns,
whereas several precautionary macroeconomic reforms5 in the Indian economy in the
post-pandemic period helped the stock markets perform better in 2022. Moreover, we
observe that the Indian stock markets adapt to the dynamic socio-economic situations
over the period. Hence, our result supports Bhuyan et al. [5] and Patra and Hiremath
[36, 37].
The continuous exposure to global shocks and the subsequent financial and
economic reforms in an emerging economy like India make the stock prices evolve
over time. Further, we observe that the daily ups and downs of the stock prices are
intertwined with each other, indicating the presence of uncertainty in the daily
movements of the stock price. In this respect, predicting the movement of stock prices
in the post-pandemic world is essential to understand the return opportunities for
investors in the Indian stock markets. We employ several sophisticated machine
learning (ML) techniques, namely Artificial Neural Networks (ANN), Long Short-Term
Memory (LSTM), Decision Tree Regression (DT), Random Forest (RF), Support
Vector Machine (SVM), Linear Regression (LR), Ridge Regression (RR), and
K-Nearest Neighbors (KNN) Regression, to predict the movements of both Sensex
and Nifty, as follows.

5 The Government of India mainly focused on stabilizing the monetary policy and creating a
sound interaction between monetary and fiscal policy to encourage financial stability in the post-
pandemic world. For details refer to https://www.bis.org/publ/bppdf/bispap122_j.pdf, accessed on
25th September 2023 at 8:30 PM.

Fig. 17.1 Splitting of market dataset into training and testing samples. 6

Using ANN (Fig. 17.2), and LSTM (Fig. 17.3), we observe that the trends of
both actual and predicted stock prices are consistent with each other. In other words,

6 Note: a and b report the splitting of the Nifty and Sensex datasets respectively. Our total sample
period (i.e. 3rd January 2000–11th October 2022) in both markets is divided into two parts: a
training sample (reported in the blue line, i.e. from 3rd January 2000 to 11th October 2018) and a
testing sample (reported in the orange line, i.e. from 12th October 2018 to 11th October 2022).

Fig. 17.2 Prediction of the Indian stock markets using ANN.7 Source Author’s own computation

there exists a minimal gap between the actual and predicted prices of the stock markets,
indicating the greater extent of prediction accuracy obtained with the ANN and
LSTM techniques. In particular, we find the most accurate predicted prices in the
post-pandemic period for both markets, which reflects the suitability of both ANN
and LSTM in predicting the stock prices in a dynamic information environment.
However, LSTM provides a higher degree of prediction accuracy in both the pre-
and post-pandemic periods than ANN and the alternative ML techniques (Fig. 17.3),
indicating the greater insensitivity of the LSTM model to the dynamic socio-economic
situations when predicting the emerging stock prices. The results reveal that the LSTM
method is better than all other ML techniques over time. Our results are in line with

7 Note This figure reports the prediction of the Nifty and Sensex indices respectively using the ANN
technique. The red trend line in both figures represents the movement of the predicted stock prices,
whereas the green trend line represents the movement of the actual stock prices. Here, due to the
better prediction accuracy, particularly in the post-pandemic period (i.e. after 2020), the actual and
predicted trend lines overlap with each other.

Fig. 17.3 Prediction of the Indian stock markets with LSTM. 8 Source Author’s own computation

Pang et al. [34], who document that LSTM with an embedded layer, known as ELSTM,
provides more stabilized results than alternative ML techniques.

8 Note This figure reports the prediction of the Nifty and Sensex markets respectively using the
LSTM technique. The red and green trend lines in both figures represent the movement of the
LSTM-predicted and actual stock prices respectively. Here, due to the highest prediction accuracy,
the predicted (i.e. red-colored) and actual (i.e. green-colored) trend lines coincide with each
other over the period.

Further, the lowest mean absolute error (MAE),9 root mean square error (RMSE),10
and the highest R2 and adjusted R2 statistics11 observed for LSTM prove its better fit
compared with all other models (Fig. 17.10).
We employ the alternative ML techniques, namely DT (Fig. 17.4), RF (Fig. 17.5),
SVM (Fig. 17.6), LR (Fig. 17.7), RR (Fig. 17.8), and KNN regression (Fig. 17.9), to
check the robustness of the superiority of the LSTM model in predicting the accurate
movement of the emerging stock prices, and observe a lower degree of prediction
accuracy with all of these models. In other words, although the trends of the predicted
prices computed by the alternative ML models look similar to the trends of the actual
prices, there exists a certain gap between the actual and the respective12 predicted
prices. This gap arises due to the presence of higher MAE and RMSE in the respective
models during the computation of the predicted values (Fig. 17.10).
Using the DT approach, we observe higher MAE (i.e. 8.57) and RMSE (i.e. 0.03), and
lower R2 (i.e. 0.964) and adjusted R2 (i.e. 0.973), for Sensex than for its domestic
counterpart, which indicates the model's comparatively weaker predictive performance
for that market (Fig. 17.10). The lower degree of prediction accuracy is reflected in a
certain gap between the actual and DT-predicted prices of the markets. In Nifty, we find
such a gap over the whole period; in Sensex, the gap is reduced after 2022, signifying
the suitability of the DT model in predicting the market movements in the post
Russia–Ukraine war period (Fig. 17.4).
The trends of the RF-predicted (Fig. 17.5) and SVM-predicted prices (Fig. 17.6) also
differ somewhat from the actual prices in both markets, indicating the lower degree
of prediction accuracy of the RF and SVM models compared with the LSTM approach.
The differences between the actual and the computed RF- and SVM-predicted prices
are larger in the pre-Covid period in both markets, but then consistently shrink in the
post-Covid period, signifying their increased degree of prediction accuracy in the
post-pandemic situation. Between the two approaches, the lower MAE and RMSE
and the higher R2 and adjusted R2 prove RF better than SVM in predicting the Indian
stock prices, but both models lag behind LSTM by holding comparatively higher
MAE and RMSE and lower R2 and adjusted R2 (Fig. 17.10).

9 MAE explains the average distance between the actual and predicted prices. A value of MAE
closer to zero indicates that the model provides accurate predicted prices and that the trends of
the actual and predicted prices coincide with each other.
10 RMSE is a measure of the average deviation between the values predicted by a model and
the actual observed values. It calculates the square root of the average of the squared differences
between predicted and actual values. Smaller RMSE values indicate better predictive performance,
and vice versa.
11 R-squared, often denoted as R2, is a statistical measure used to assess the goodness of fit of a
regression model. R2 provides an indication of how well the model fits the data. Adjusted R-squared
(Adj R2) adjusts the R2 value to account for the number of predictors in the model. The closer the
values of R2 and Adj R2 are to 1, the better the model fits the data.
12 Here, the level of the predicted prices computed by the different ML techniques varies from one
technique to another. Among all the techniques, the LSTM model provides accurate predicted prices
irrespective of the time phase, and these coincide with the trends of the actual prices over time (see
Fig. 17.3).

Fig. 17.4 Prediction of the Indian stock markets using DT. 13 Source Author’s own computation

Using the LR approach (Fig. 17.7), we observe a greater distance between the actual
and predicted prices in the post-Covid period, which indicates the inappropriateness
of the LR model for predicting the Indian stock prices in dynamic socio-economic
situations. Further, the highest MAE (accounted as 9.74) proves the lower degree of
predictive performance of the LR approach for Sensex compared with its domestic
counterpart (Fig. 17.10).
The prices predicted through the RR (Fig. 17.8) and KNN approaches (Fig. 17.9) also
hold a certain gap from the actual prices. Such gaps are larger in the pre-Covid period
and then consistently shrink after 2022 (i.e. in the post Russia–Ukraine war period),
indicating the presence of a certain time effect on the level of prediction accuracy of
13 Note This figure reports the prediction of the Nifty and Sensex markets respectively using the DT
model. The red and green trend lines in both figures represent the movement of the DT-predicted
and actual stock prices respectively. Here, the predicted (i.e. red-colored) and actual (i.e. green-
colored) trend lines are similar, but there exists a certain gap between them, indicating a lower
degree of prediction accuracy than the LSTM model.

Fig. 17.5 Prediction of the Indian stock markets using RF. 14 Source Author’s own computation

the RR and KNN approaches. We find the highest MAE for KNN in both markets,
indicating its lower degree of prediction accuracy relative to the RR approach (Fig. 17.10).
Overall, we find that the level of prediction accuracy of DT, RF, SVM, LR, RR,
and KNN varies drastically from one point in time to another, indicating the presence
of time-dependence characteristics in these models. The greater sensitivity of these
ML techniques to time proves them comparatively inappropriate for predicting the
Indian stock prices during dynamic socio-economic situations.
Similarly, the increased prediction accuracy of ANN in the post-Covid period
signifies the best fit of the model only in the post-pandemic situation. But the LSTM
approach, unlike the other models, provides the most accurate predicted prices over the

14 Note This figure reports the prediction of the Nifty and Sensex markets respectively using the RF
approach. The red and green trend lines in both figures represent the movement of the RF-predicted
and actual stock prices respectively. Here, the predicted (i.e. red-colored) and actual (i.e. green-
colored) trend lines are similar, but there exists a certain gap between them, indicating a
comparatively lower degree of prediction accuracy than the LSTM model.

Fig. 17.6 Prediction of the Indian stock markets using SVM. 15 Source Author’s own computation

period irrespective of the situation, indicating its time-independence characteristics
in predicting the Indian market movements. In other words, the highest degree of
prediction accuracy, with the highest R2 (i.e. 0.989 for Nifty and 0.983 for Sensex),
adjusted R2 (i.e. 0.989 for Nifty and 0.987 for Sensex), and the lowest MAE (i.e. 0.067)
and RMSE (i.e. 0.002), in the LSTM model proves its better fit compared with the
alternative ML techniques during all the time periods (Table 17.1).
The better extent of prediction accuracy further reduces the deviation between the
actual and LSTM-predicted prices, and therefore proves LSTM to be the best approach
for predicting the movements of the Indian stock prices over time. On the

15 Note This figure reports the prediction of the Nifty and Sensex markets respectively using the
SVM approach. The red and green trend lines in both figures represent the movement of the SVM-
predicted and actual stock prices respectively. Here, the predicted (i.e. red-colored) and actual (i.e.
green-colored) trend lines are similar, but there exists a certain gap between them, indicating a
comparatively lower degree of prediction accuracy than the LSTM model, particularly in the
pre-pandemic period.

Fig. 17.7 Prediction of the Indian stock markets using LR. 16 Source Author’s own computation

other hand, the lowest R2 (i.e. 0.922 for Nifty and 0.929 for Sensex) and adjusted R2
(i.e. 0.917 for Nifty and 0.927 for Sensex) in the LR model prove it the least fit to
predict the Indian stock prices (Table 17.1). In other words, we observe the lowest
degree of prediction accuracy in the LR approach, indicating its inappropriateness for
predicting the Indian stock market over time. Moreover, both the actual and predicted
trends of the Indian stock prices remain time-varying, indicating the market's
adaptability to the dynamic economic situations. In a similar vein, Das and Patra
[13, 14] observed the variation in the Indian banking performance immediately after
the global financial crisis.
We report the performance parameters of all the ML techniques in Fig. 17.10
and observe the highest MAE in the SVM (i.e. 6.05) and LR models (i.e. 9.74) for Nifty

16 Note This figure reports the prediction of the Nifty and Sensex markets respectively using the LR
approach. The red and green trend lines in both figures represent the movement of the LR-predicted
and actual stock prices respectively.

Fig. 17.8 Prediction of the Indian stock markets using RR. 17 Source Author’s own computation

and Sensex respectively. The highest MAE, creating the greatest distance between the
actual and predicted prices, makes the respective models inappropriate for predicting
those stock markets. On the other hand, the LSTM model, holding the lowest MAE
(i.e. 0.06), proves to be the most suitable approach for predicting both of the Indian
stock markets (Fig. 17.10).

17 Note This figure reports the prediction of the Nifty and Sensex markets respectively using the RR
approach. The red and green trend lines in both figures represent the movement of the RR-predicted
and actual stock prices respectively. Here, the predicted (i.e. red-colored) and actual (i.e. green-
colored) trend lines are similar, but there exists a certain gap between them, indicating a
comparatively lower degree of prediction accuracy than the LSTM model.

Fig. 17.9 Prediction of the Indian stock markets using KNN. 18 Source Author’s own computation

18 Note This figure reports the prediction of the Nifty and Sensex markets respectively using the
KNN approach. The red and green trend lines in both figures represent the movement of the KNN-
predicted and actual stock prices respectively. Here, the predicted (i.e. red-colored) and actual (i.e.
green-colored) trend lines are similar, but there exists a certain gap between them, indicating a
comparatively lower degree of prediction accuracy than the LSTM model.

Fig. 17.10 Comparison of performance parameters in predicting the Indian stock markets

Table 17.1 Comparison of the alternative machine learning techniques in predicting the Indian
stock markets
ML techniques Computed statistics
R2 ADJ R2 RMSE MAE
Nifty Sensex Nifty Sensex Nifty Sensex Nifty Sensex
ANN 0.980 0.981 0.980 0.980 0.004 0.010 0.164 1.065
LSTM 0.989 0.983 0.989 0.987 0.002 0.002 0.067 0.067
DT 0.975 0.964 0.973 0.973 0.025 0.031 4.576 8.576
RF 0.978 0.972 0.979 0.974 0.018 0.028 1.625 8.290
SVM 0.925 0.936 0.918 0.928 0.084 0.029 6.053 8.281
LR 0.922 0.929 0.917 0.927 0.096 0.038 4.744 9.744
RR 0.983 0.974 0.975 0.975 0.065 0.025 3.877 6.416
KNN 0.964 0.968 0.964 0.964 0.134 0.031 5.968 9.741
Note: R-squared (R2 ) assesses the proportion of variance in the dependent variable explained by the
model. Adjusted R-squared (Adj R2 ) adjusts R2 for the number of predictors to mitigate the risk of
overfitting. Root Mean Squared Error (RMSE) measures the average deviation between predicted
and actual values in the respective models. Mean Absolute Error (MAE) measures the average
absolute deviation between predicted and actual values in the respective models. These metrics are
commonly used to evaluate the performance of predictive models and to compare different models.
Each of them provides valuable insights into different aspects of a model’s performance.
Source Author’s own computation.
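
For reference, a minimal sketch of how the four comparison statistics in Table 17.1 can be computed from a model's test-set predictions; the adjusted R2 formula uses the number of input features (here assumed to be the number of lags), and the function name is a hypothetical helper, not code from the study.

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def performance_report(y_true, y_pred, n_features):
    """R2, adjusted R2, RMSE, and MAE, as used to compare the models above."""
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    return {"R2": r2, "Adj R2": adj_r2, "RMSE": rmse, "MAE": mae}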

17.5 Conclusion

In the modern trading platform, technology plays an important role in the financial
market. In a similar vein, Bhuyan et al. [7] documented the beneficial impact of the
technological shift on the productivity of Indian financial institutions like banks.
Therefore, the prediction of the trends of financial sectors like the stock markets
using sophisticated machine learning (ML) techniques has become a pressing issue
in the investors' world. Several ML models produce market predictions, but the level
of prediction accuracy varies from one method to another. The study considers the
data on the closing stock prices of both Nifty and Sensex from 3rd January 2000 to
11th October 2022 and splits the full sample into 20% testing and 80% training
samples. We observe that, among all the ML techniques, Long Short-Term Memory
(LSTM), with the greater extent of prediction accuracy and the lowest errors, proves
to be the most suitable model to predict the movements of the Indian stock prices
over time. Moreover, the LSTM approach, unlike the other ML techniques, holds
certain time-independence characteristics in predicting the market movements, which
keeps the level of prediction accuracy stable over time. On the other hand, the support
vector machine (SVM) and linear regression (LR) approaches, with the lowest degree
of prediction accuracy and the highest errors, prove to be inappropriate models for
predicting the movements of Nifty and Sensex respectively. These strategies may be
sample specific; the robustness of our method would therefore benefit from further
back-testing on other markets and time periods. Further, the significance of climate
change has been elaborated in detail for India [8]. Therefore, the prediction of
high-frequency time series, such as climate variables, using machine learning
techniques can be a promising avenue for future research.
We discuss the strengths and weaknesses of several ML techniques and provide
important insights into applying advanced technologies for stock market prediction.
We contribute to the emerging literature on empirical asset pricing in the Indian
stock market by building and analyzing a comprehensive set of market prediction
factors with the usage of several machine learning algorithms. Our prediction
approach provides a potentially beneficial alternative for investors to identify return
opportunities and achieve diversification benefits by mitigating risk while investing
in the Indian stock markets.

References

1. Adya, M., Collopy, F.: How effective are neural networks at forecasting and prediction? A
review and evaluation. J. Forecast. 17(1), 481–495 (1998)
2. Ballings, M., den Poel, D.V., Hespeels, N., Gryp, R.: Evaluating multiple classifiers for stock
price direction prediction. Expert Syst. Appl. 42(20), 7046–7056 (2015)
3. Barak, S., Arjmand, A., Ortobelli, S.: Fusion of multiple diverse predictors in stock market.
Inf. Fusion 36(1), 90–102 (2017)
4. Bezerra, P.C.S., Albuquerque, P.H.M.: Volatility forecasting via SVR—GARCH with mixture
of Gaussian kernels. CMS 14(2), 179–196 (2017)
5. Bhuyan, B., Patra, S., Bhuian, R.K.: Market adaptability and evolving predictability of stock
returns: an evidence from India. Asia-Pacific Finan. Mark. 27, 605–619 (2020)
6. Bhuyan, B., Patra, S., Bhuian, R.K.: Do LBMA gold price follow random-walk? Gold Bulletin
54(2), 151–159 (2021)
7. Bhuyan, B., Patra, S., Bhuian, R.K.: Measurement and determinants of total factor productivity:
evidence from Indian banking industry. Int. J. Prod. Perform. Manag. 71(7), 2970–2990 (2022)
8. Bhuyan, B., Mohanty, R.K., Patra, S.: Impact of climate change on food security in India: an
evidence from autoregressive distributed lag model. Environ. Dev. Sustain. 1–21 (2023)
9. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
10. Cavalcante, R.C., Brasileiro, R.C., Souza, V.L., Nobrega, J.P., Oliveira, A.L.: Computational
intelligence and financial markets: a survey and future directions. Expert Syst. Appl. 55(1),
194–211 (2016)
11. Chen, Y.-S., Cheng, C.-H., Tsai, W.-L.: Modeling fitting-function-based fuzzy time series
patterns for evolving stock index forecasting. Appl. Intell. 41(2), 327–347 (2014)
12. Chen, H., Xiao, K., Sun, J., Wu, S.: A double-layer neural network framework for high-
frequency forecasting. ACM Trans. Manage. Inf. Syst. (TMIS) 7(4), 1–17 (2017)
13. Das, M.K., Patra, S.: Productivity and efficiency of public sector banks in India after the global
financial crisis. IUP J. Bank Manag. 15(2) (2016)
14. Das, M.K., Patra, S.: Productivity and efficiency of private sector banks after global financial
crisis: evidence from India. Asian J. Res. Bank. Financ. 6(5), 1–14 (2016)
15. Fama, E.F.: Efficient capital markets: II. J. Financ. 46(5), 1575–1617 (1991)
16. Gupta, N.: Artificial neural network. Netw. Complex Syst. 3(1), 24–28 (2013)
17. Guresen, E., Kayakutlu, G., Daim, T.U.: Using artificial neural network models in stock market
index prediction. Expert Syst. Appl. 38(8), 10389–10397 (2011)

18. Göçken, M., Özçalıcı, M., Boru, A., Dosdogru, A.T.: Integrating metaheuristics and artificial
neural networks for improved stock price prediction. Expert Syst. Appl. 44(1), 320–331 (2016)
19. Henrique, B.M., Sobreiro, V.A., Kimura, H.: Building direct citation networks. Scientometrics
115(2), 817–832 (2018)
20. Henrique, B.M., Sobreiro, V.A., Kimura, H.: Literature review: machine learning techniques
applied to financial market prediction. Expert Syst. Appl. 124, 226–251 (2019)
21. Hiremath, G.S., Kattuman, P.: Foreign portfolio flows and emerging stock market: Is the
midnight bell ringing in India? Res. Int. Bus. Financ. 42, 544–558 (2017)
22. Hiremath, G.S., Narayan, S.: Testing the adaptive market hypothesis and its determinants for
the Indian stock markets. Financ. Res. Lett. 19, 173–180 (2016)
23. Van Houdt, G., Mosquera, C., Nápoles, G.: A review on the long short-term memory model.
Artif. Intell. Rev. 53, 5929–5955 (2020)
24. Huang, W., Nakamori, Y., Wang, S.-Y.: Forecasting stock market movement direction with
support vector machine. Comput. Oper. Res. 32(10), 2513–2522 (2005)
25. Kara, Y., Boyacioglu, M.A., Baykan, Ö.K.: Predicting direction of stock price index movement
using artificial neural networks and support vector machines: the sample of the Istanbul Stock
Exchange. Expert Syst. Appl. 38(5), 5311–5319 (2011)
26. Krauss, C., Do, X.A., Huck, N.: Deep neural networks, gradient-boosted trees, random forests:
statistical arbitrage on the S&P 500. Eur. J. Oper. Res. 259(2), 689–702 (2017)
27. Krogh, A.: What are artificial neural networks? Nat. Biotechnol. 26(2), 195–197 (2008)
28. Kumar, D., Meghwani, S.S., Thakur, M.: Proximal support vector machine based hybrid
prediction models for trend forecasting in financial markets. J. Comput. Sci. 17(1), 1–13 (2016)
29. Kumar, M., Thenmozhi, M.: Forecasting stock index returns using ARIMA-SVM, ARIMA-
ANN, and ARIMA-random forest hybrid models. Int. J. Bank. Account. Financ. 5(3), 284–308
(2014)
30. Laboissiere, L.A., Fernandes, R.A., Lage, G.G.: Maximum and minimum stock price fore-
casting of Brazilian power distribution companies based on artificial neural networks. Appl.
Soft Comput. 35(1), 66–74 (2015)
31. Lahmiri, S.: Improving forecasting accuracy of the S&P500 intra-day price direction using
both wavelet low and high frequency coefficients. Fluct. Noise Lett. 13(01), 1450008 (2014)
32. Lo, A.W.: Reconciling efficient markets with behavioral finance: the adaptive markets
hypothesis. J. Invest. Consult. 7(2), 21–44 (2005)
33. Pai, P.-F., Lin, C.-S.: A hybrid ARIMA and support vector machines model in stock price
forecasting. Omega 33(6), 497–505 (2005)
34. Pang, X., Zhou, Y., Wang, P., Lin, W., Chang, V.: An innovative neural network approach for
stock market prediction. J. Supercomput. 76, 2098–2118 (2020)
35. Patel, J., Shah, S., Thakkar, P., Kotecha, K.: Predicting stock and stock price index movement
using trend deterministic data preparation and machine learning techniques. Expert Syst. Appl.
42(1), 259–268 (2015)
36. Patra, S., Hiremath, G.S.: Are the stock markets adaptive? Evidence from approximate entropy
approach. ASBBS Proc. 26, 408–408 (2019)
37. Patra, S., Hiremath, G.S.: An entropy approach to measure the dynamic stock market efficiency.
J. Quant. Econ. 20(2), 337–377 (2022)
38. Patra, S.: Informational efficiency and adaptive stock markets (Doctoral dissertation, IIT
Kharagpur) (2020)
39. Patra, S., Hiremath, G.S.: Is there a time-varying nexus between stock market liquidity and
informational efficiency?–A cross-regional evidence. Stud. Econ. Financ. (2024)
40. Pisner, D.A., Schnyer, D.M.: Support vector machine. In Machine learning (pp. 101–121).
Academic Press, New York (2020)
41. Schonlau, M., Zou, R.Y.: The random forest algorithm for statistical learning. Stand. Genomic
Sci. 20(1), 3–29 (2020)
42. Tay, F.E., Cao, L.: Application of support vector machines in financial time series forecasting.
Omega 29(4), 309–317 (2001)
17 Stock Market Prediction Using Machine Learning: Evidence from India 375

43. Tsaih, R., Hsu, Y., Lai, C.C.: Forecasting S&P 500 stock index futures with a hybrid AI system.
Decis. Support Syst. 23(2), 161–174 (1998)
44. Wang, J.-J., Wang, J.-Z., Zhang, Z.-G., Guo, S.-P.: Stock index forecasting based on a hybrid
model. Omega 40(6), 758–766 (2012)
45. Weng, B., Ahmed, M.A., Megahed, F.M.: Stock market one-day ahead movement prediction
using disparate data sources. Expert Syst. Appl. 79(1), 153–163 (2017)
46. Whittington, J.C., Bogacz, R.: Theories of error back-propagation in the brain. Trends Cogn.
Sci. 23(3), 235–250 (2019)
47. Xiao, Y., Xiao, J., Lu, F., Wang, S.: Ensemble ANNs-PSO-GA approach for day-ahead stock
e-exchange prices forecasting. Int. J. Comput. Intell. Syst. 6(1), 96–114 (2013)
48. Yan, D., Zhou, Q., Wang, J., Zhang, N.: Bayesian regularisation neural network based on
artificial intelligence optimisation. Int. J. Prod. Res. 55(8), 2266–2287 (2017)
49. Yang, Y., Yang, M., Shen, C., Wang, F., Yuan, J., Li, J., Liu, Y.: Evaluating the accuracy of
different respiratory specimens in the laboratory diagnosis and monitoring the viral shedding
of 2019-nCoV infections. MedRxiv 78(3), 241 (2020)
50. Zhang, N., Lin, A., Shang, P.: Multidimensional k-nearest neighbor model based on EEMD
for financial time series forecasting. Physica A 477(1), 161–173 (2017)
51. Zhong, X., Enke, D.: Forecasting daily stock market return using dimensionality reduction.
Expert Syst. Appl. 67(1), 126–139 (2017)
Chapter 18
Realized Stock-Market Volatility: Do Industry Returns Have Predictive Value?

Riza Demirer, Rangan Gupta, and Christian Pierdzioch

Abstract Yes, they do. Utilizing a machine-learning technique known as random forests to compute predictions of realized (good and bad) stock-market volatility, we
show that incorporating the information in lagged industry returns can help improve
out-of-sample predictions of aggregate stock-market volatility. While the predictive
contribution of industry level returns is not constant over time, industrials and mate-
rials play a dominant predictive role during the aftermath of the 2008 global financial
crisis, highlighting the informational value of real economic activity on stock-market
volatility dynamics. Finally, we show that incorporating lagged industry returns in
aggregate level volatility predictions is beneficial particularly when under-predicting
market volatility is costly, yielding greater economic benefits as the degree of risk
aversion increases.

Keywords Stock market · Realized volatility · Industry returns · Market efficiency and information

JEL Classification G17 · Q02 · Q47

R. Demirer
Department of Economics and Finance, Southern Illinois University Edwardsville, Edwardsville,
IL, USA
e-mail: [email protected]
R. Gupta
Department of Economics, University of Pretoria, Pretoria 0002, South Africa
e-mail: [email protected]
C. Pierdzioch (B)
Department of Economics, Helmut Schmidt University, Holstenhofweg 85, P.O.B. 700822, 22008
Hamburg, Germany
e-mail: [email protected]


18.1 Introduction

Predicting volatility is a key component of option pricing, hedging, and portfolio opti-
mization applications. Naturally, there exists a large strand of literature that offers a wide array of univariate and multivariate models to model and predict stock-market volatility (see, e.g., [3, 11, 12, 27, 30, 31, 36] for detailed discussions of research in this area). Despite the multitude of studies in this literature, which draw on a wide range of predictors including macroeconomic and financial variables, the literature has not yet examined the predictive power of industry level information over aggregate stock-market volatility. This study adds to this line of research by
investigating the role of lagged industry returns from across the entire economy in
predicting aggregate stock-market volatility. Indeed, we show that incorporating the
information in lagged industry returns can help improve out-of-sample predictions
of aggregate stock-market volatility, rendering significant economic benefits partic-
ularly as the degree of risk aversion increases.
In a well-cited study, Hong et al. [21] present the theoretical framework underlying the predictive power of industry returns for stock market returns. According to the
so-called gradual diffusion of information hypothesis, the information contained in
industry returns diffuses gradually across markets as a result of the interaction of
boundedly rational investors with access to private information at different points in
time. In this setting, public information gets partially reflected in asset prices such
that certain types of investors, such as those who specialize in trading the broad
market index, experience a lag in receiving industry level information that is already
accessible to investors who specialize in particular industries. This, in turn, forms
the basis for return predictability at the aggregate market level as industry level
dynamics contain predictive information regarding the economic fundamentals that
lead the aggregate stock market. Although later studies including [7, 22, 33, 35,
38] present conflicting evidence regarding the predictive content of industry returns,
interestingly, the literature has not yet extended the analysis to stock-market volatility
predictions. To the best of our knowledge, ours is the first to examine the predictive
power of lagged industry returns over aggregate stock-market volatility.
In our empirical analysis, we use a machine-learning technique known as random
forests [4] to predict realized (good and bad) stock-market volatility. Random forests
have been used in recent applications to study the predictive value of industry returns
for stock market returns [7] and the realized volatility of intraday Bitcoin returns [5].1
In our case, instead of relying on conditional volatility models from the generalized
autoregressive conditional heteroskedasticity (GARCH)-family, we follow [1] and
study monthly realized volatility (RV) as measured by the sum of squared daily log-
returns over a month. The use of realized volatility provides an observable measure
of the latent volatility process that is model-free, unlike the conditional estimates
of the same.

1 For other recent applications of machine-learning techniques to modeling and predicting the
volatility of financial time series, see [24, 28], among others.

As far as the econometric framework is concerned, we utilize the popularly employed heterogeneous autoregressive realized-volatility (HAR-RV) model of [9], which allows us to capture stylized facts such as the multi-scaling behavior and long memory of the volatility process in a straightforward and simple way. Although the ordinary-least-squares technique is commonly applied to estimate the HAR-RV model, the use of random forests in our empirical application has several advantages. First, random forests render it possible to analyze the links between realized volatility and
a large number of predictors (in our case, lagged industry returns from across the
entire economy) in a fully data-driven way. Second, random forests automatically
capture potential nonlinear links between realized volatility and its predictors as well
as any interaction effects among the predictors. Finally, unlike the ordinary-least-
squares technique, random forests always yield predictions of realized volatility that
are non-negative.
The empirical findings confirm that, when we account for lagged industry returns,
random forests have a superior predictive performance over the HAR-RV model, and
that lagged industry returns indeed contain valuable predictive information over the
aggregate stock market realized volatility as well as its “good” (upward) and “bad”
(downward) variants, under both the standard symmetric and an asymmetric loss function. Incorporating lagged industry returns in the array of predictors benefits
particularly an investor who suffers more from an under-prediction than an over-
prediction, while these benefits tend to decrease with the length of the prediction
horizon. We further show that the benefits of using the information in lagged industry
returns are economically valuable, with greater economic benefits for an investor who
has a high level of risk aversion. Finally, we show that certain industries including
those that reflect real economic activity play a more dominant role than others, which
is in line with the gradual diffusion of information hypothesis as opposed to an efficient-market setting.
The remainder of this study is structured as follows. In Sect. 18.2, we describe how
a random forest is grown and present our data. In Sect. 18.3, we report the empirical
results, followed by a discussion of the economic implications of our findings in
Sect. 18.4. Finally, in Sect. 18.5, we conclude the study with final remarks.

18.2 Methodology and Data

18.2.1 Random Forests

A random forest is an ensemble machine-learning technique consisting of a large number of individual regression trees (for a textbook exposition, see [19]; our notation follows theirs). A regression tree, $T$, consists of branches that subdivide the space of predictors, $x = (x_1, x_2, \ldots)$, of realized volatility (in our case) into $l$ non-overlapping regions, $R_l$. These regions are formed by applying a search-and-split algorithm in a recursive top-down fashion.

Starting at the top level of a regression tree, the algorithm iterates over the predictors, $s$, and the corresponding splitting points, $p$, that can be formed using the data on a predictor. For every combination of a predictor and a splitting point, the algorithm computes two half-planes, $R_1(s, p) = \{x_s \mid x_s \le p\}$ and $R_2(s, p) = \{x_s \mid x_s > p\}$. The search for an optimal combination of a predictor and a splitting point minimizes the standard squared-error loss criterion:

$$\min_{s,p} \left\{ \min_{\overline{RV}_1} \sum_{x_s \in R_1(s,p)} \left(RV_i - \overline{RV}_1\right)^2 + \min_{\overline{RV}_2} \sum_{x_s \in R_2(s,p)} \left(RV_i - \overline{RV}_2\right)^2 \right\}, \qquad (18.1)$$

where the index $i$ identifies those data on realized volatility that belong to a half-plane, and $\overline{RV}_k = \mathrm{mean}\{RV_i \mid x_s \in R_k(s, p)\}$, $k = 1, 2$, denotes the half-plane-specific mean of realized volatility. The outer minimization searches over all combinations of $s$ and $p$. Given $s$ and $p$, the inner minimization minimizes the half-plane-specific squared error loss by an optimal choice of the half-plane-specific means
of realized volatility. The solution of the minimization problem given in Eq. (18.1)
yields the top-level optimal splitting predictor, the top-level optimal splitting point,
and the two region-specific means of realized volatility. Accordingly, the solution
yields a first simple regression tree that has two terminal nodes.
At the next stage, the minimization problem in Eq. (18.1) is solved separately
for the two optimal top-level half-planes, $R_1(s, p)$ and $R_2(s, p)$, in order to grow a
larger regression tree. The new solution yields up to two second-level optimal splitting
predictors and optimal splitting points, and four second-level region-specific means
of realized volatility. Upon repeating this search-and-split algorithm multiple times,
we are able to grow an increasingly complex regression tree. Finally, the search-
and-split algorithm stops when a regression tree has a preset maximum number of
terminal nodes or every terminal node has a minimum number of observations. We use
a cross-validation approach to identify the optimal minimum number of observations
per terminal node (see Sect. 18.3.1 for further details).
When the search-and-split algorithm stops, the regression tree sends the predictors from its top level to the various leaves along the various optimal partitioning points (nodes) and branches such that, for a regression tree made up of $L$ regions, the region-specific means can be used to predict realized volatility as follows ($1$ denotes the indicator function):

$$T\left(x_i, \{R_l\}_1^L\right) = \sum_{l=1}^{L} \overline{RV}_l \, 1(x_i \in R_l). \qquad (18.2)$$

While the search-and-split algorithm can be used in principle to compute finer and
finer granular predictions of realized volatility, the resulting growing complexity of
the hierarchical structure of a regression tree gives rise to an overfitting and data-
sensitivity problem, which, in turn, deteriorates its performance. A random forest
solves this problem as follows. First, a large number of bootstrap samples (sampling
with replacement) is obtained from the data. Second, to each bootstrap sample, a

random regression tree is fitted. A random regression tree differs from a standard
regression tree in that the former uses for every splitting step only a random subset
of the predictors, which mitigates the effect of influential predictors on tree building.
Growing a large number of random trees decorrelates the predictions from individual
trees, and averaging the decorrelated predictions obtained from the individual random
regression trees stabilizes the predictions of realized volatility.
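To make the search-and-split step concrete, the following minimal R sketch (an illustration of ours, not the authors' code) evaluates the loss criterion of Eq. (18.1) for a single node:

best_split <- function(X, rv) {
  # X: numeric matrix of predictors for the node (columns = predictors);
  # rv: realized volatilities of the observations falling into the node.
  best <- list(loss = Inf, s = NA, p = NA)
  for (s in seq_len(ncol(X))) {            # iterate over predictors s ...
    for (p in unique(X[, s])) {            # ... and candidate splitting points p
      left  <- rv[X[, s] <= p]             # half-plane R1(s, p)
      right <- rv[X[, s] >  p]             # half-plane R2(s, p)
      if (length(left) == 0L || length(right) == 0L) next
      # the half-plane means solve the two inner minimizations of Eq. (18.1)
      loss <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
      if (loss < best$loss) best <- list(loss = loss, s = s, p = p)
    }
  }
  best                                     # optimal splitting predictor and point
}

Growing a tree then amounts to applying this search recursively to the two resulting half-planes until the stopping rule binds.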

18.2.2 Data

We use monthly excess returns for 49 value-weighted industry portfolios for the
period January 1946 to December 2019, obtained from Ken French’s online data
library.2 Following the convention, we exclude “others” and end up with 48 indus-
tries defined based on the Standard Industrial Classification (SIC) system. Sepa-
rately, daily and monthly stock market returns are collected as the returns of a value-
weighted market portfolio from the Center for Research in Security Prices (CRSP).
Daily market returns are used to compute the realized market volatility estimate ($RV$) for each month from daily log returns ($r_i$) as follows:

$$RV_t = \sum_{i=1}^{N} r_i^2, \qquad (18.3)$$

where $N$ denotes the number of daily observations available for the month. In addition to realized volatility, we examine “good” and “bad” realized volatility. The categorization of RV into its good and bad components is an important issue, as [16] stresses that financial market participants care not only about the level of volatility, but also about its nature, with all traders making the distinction between good and bad volatilities. The “good” and “bad” components of realized volatility are formulated as the upside and downside realized semi-variances ($RV^G$ and $RV^B$, respectively), computed from positive and negative returns, respectively (see [2]), as follows:

$$RV_t^B = \sum_{i=1}^{N} r_i^2 \, I_{[r_i < 0]}, \qquad (18.4)$$

$$RV_t^G = \sum_{i=1}^{N} r_i^2 \, I_{[r_i > 0]}. \qquad (18.5)$$

The model of realized volatility follows the widely employed heterogeneous autoregressive realized volatility (HAR-RV) model of [9], which has now become one of the
most popular models in the empirical finance literature on realized volatility. The the-
oretical foundation of the HAR-RV model is laid out by the so-called heterogeneous

2 Available at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.

market hypothesis of [25]. The heterogeneous market hypothesis stipulates that the
stock market is populated by different types of traders who differ with respect to their
sensitivity to information flows at different time horizons. In this setting, market par-
ticipants with short- versus long-term investment horizons respond to information
flows heterogeneously at different time horizons.
Accordingly, the key idea underlying the HAR-RV model is to use realized volatil-
ities from different time resolutions to model the dynamics of realized volatility.
When studying daily realized volatility, it is common practice among researchers to
consider daily, weekly, and monthly realized volatilities as predictors of subsequent
realized volatility. In our case, because we study monthly data, in line with the strand
of the literature that deals with the lead-lag relationship between industry and aggre-
gate level returns, we model the month-$h$-ahead realized volatility, $RV_{t+h}$, using the current realized volatility, $RV_t$, the quarterly realized volatility, $RV_{t,q}$, computed as the average realized volatility from month $t-3$ to month $t-1$, and the yearly realized volatility, $RV_{t,y}$, computed as the average realized volatility from month $t-12$ to month $t-1$. We compute these quarterly and yearly average realized volatilities for the standard measure of realized volatility and for good and bad realized volatility.
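For illustration, the measures in Eqs. (18.3)-(18.5) and the quarterly and yearly HAR-RV terms can be computed in R along the following lines (a sketch assuming a data frame daily with a chronologically ordered month identifier and daily log returns r; all names are ours):

# Monthly realized volatility and its bad/good components, Eqs. (18.3)-(18.5)
rv      <- tapply(daily$r, daily$month, function(x) sum(x^2))
rv_bad  <- tapply(daily$r, daily$month, function(x) sum(x[x < 0]^2))
rv_good <- tapply(daily$r, daily$month, function(x) sum(x[x > 0]^2))

# Quarterly and yearly HAR-RV terms: average RV over months t-3..t-1 and t-12..t-1
lag_mean <- function(x, k)
  sapply(seq_along(x), function(t) if (t > k) mean(x[(t - k):(t - 1)]) else NA)
rv_q <- lag_mean(rv, 3)
rv_y <- lag_mean(rv, 12)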
Figure 18.1 presents the time-series plots of the computed realized volatility, $RV_t$, along with its low-frequency components, $RV_{t,q}$ and $RV_{t,y}$, in Panel A, and the corresponding counterparts for bad and good realized volatility in Panels B and C.
We observe notable spikes in the realized volatility estimates around the stock market
crash of 1987 and later during the 2008 global financial crisis period. Comparing
Panels B and C, we observe that bad realized volatility was the dominant factor in
the case of the 1987 stock market crash, while the 2008 global financial crisis period
was equally plagued by both the good and bad components of realized volatility.

18.3 Empirical Analysis

18.3.1 Calibration

Because the predictive value of industry returns may have changed over time, we
use rolling-estimation windows of length 120, 180, 240, and 360 months to esti-
mate both the baseline HAR-RV model (the model that excludes lagged industry
returns) and the HAR-RV model extended to include lagged industry returns. We
model realized volatility one month, three months, and one year ahead (that is, we set $h = 1, 3, 12$) by estimating random forests in the statistical computing program
R [29] using the add-on package “grf” [37]. While shifting the rolling-estimation
windows across the data set, we optimize, by means of cross validation, the number
of predictors randomly selected for splitting, the minimum node size of a tree, and
the parameter that governs the maximum imbalance of a node. We optimize these

[Figure 18.1 near here: time-series plots of RV together with its quarterly (RVq) and yearly (RVy) components. Panel A: Realized volatility; Panel B: Bad realized volatility; Panel C: Good realized volatility.]

Fig. 18.1 The components of the HAR-RV model



parameters separately for the baseline HAR-RV model and the extended HAR-RV
model that features lagged industry returns.3 We use 2000 random regression trees
to grow a random forest.
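A stylized version of a single rolling-window step with the "grf" package might look as follows (a sketch under variable names of our own; honesty is deactivated as described in footnote 3, and the tuned parameters mirror the ones listed above):

library(grf)

X <- cbind(rv, rv_q, rv_y, industry)  # industry: 48 columns of lagged industry returns
h <- 1; window <- 240; t <- 360       # one illustrative window ending in month t

idx <- (t - window + 1):t             # rolling-estimation window
rf  <- regression_forest(X[idx, ], rv[idx + h],        # target: month-(t+h) RV
                         num.trees = 2000, honesty = FALSE,
                         tune.parameters = c("mtry", "min.node.size", "alpha"))
pred <- predict(rf, X[t, , drop = FALSE])$predictions  # out-of-sample prediction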

18.3.2 The Classic HAR-RV Model as a Benchmark

In order to set the stage for our empirical analysis, it is useful to go back for the
moment to the classic HAR-RV model. In the context of our analysis, the classic
HAR-RV model is formulated as $RV_{t+h} = \beta_0 + \beta_1 RV_t + \beta_2 RV_{t,q} + \beta_3 RV_{t,y} + \varepsilon_t$, where $\beta_j$, $j = 0, 1, 2, 3$, are the coefficients to be estimated by means of the ordinary-least-squares (OLS) technique, $\varepsilon_t$ is an error term, and $RV_{t+h}$ denotes the realization of realized volatility in month $t+h$. The classic HAR-RV model extended to include lagged industry returns is then formulated as $RV_{t+h} = \beta_0 + \beta_1 RV_t + \beta_2 RV_{t,q} + \beta_3 RV_{t,y} + \sum_{j=1}^{48} \beta_{j+3} r_{t,j} + \varepsilon_t$. The corresponding random-forest models can, thus, be expressed as $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})$ when we exclude lagged industry returns, and as $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$ when we include lagged industry returns in the array of predictors. At this point, it
is worth noting that our framework ensures that (i) random forests do not necessar-
ily invoke a linear structure as does the OLS technique, and (ii) random forests go
beyond the OLS technique in that they allow the predictors (lagged industry returns
in our case) to interact in an arbitrary data-driven way.4
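In R, the two pairs of competing specifications can be written down directly (a sketch; har_data is assumed to hold the dependent variable rv_lead = RV_{t+h} together with rv, rv_q, and rv_y, and ind the 48 columns of lagged industry returns):

# Classic HAR-RV model and its industry-extended version, estimated by OLS
ols     <- lm(rv_lead ~ rv + rv_q + rv_y, data = har_data)
ols_ext <- lm(rv_lead ~ ., data = cbind(har_data, ind))

# Random-forest counterparts: no constant and no linear structure imposed
library(grf)
rf     <- regression_forest(as.matrix(har_data[, c("rv", "rv_q", "rv_y")]),
                            har_data$rv_lead, num.trees = 2000)
rf_ext <- regression_forest(as.matrix(cbind(har_data[, c("rv", "rv_q", "rv_y")], ind)),
                            har_data$rv_lead, num.trees = 2000)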
We compare in Table 18.1, for various rolling-estimation windows and investment
horizons, the out-of-sample predictive performance of the HAR-RV model estimated
by means of the OLS technique with the out-of-sample predictive performance of
random forests in terms of the root-mean-squared prediction error (RMSPE) statistics
implied by these two models. To this end, we estimate both models by excluding
lagged industry returns (that is, the array of predictors includes $RV_t$, $RV_{t,q}$, $RV_{t,y}$
only; the OLS model also features a constant) and then by including lagged industry
returns. We then compute the RMSPE statistics for both models and compute the
corresponding ratios. A ratio larger than unity indicates that the random forests
outperform the corresponding HAR-RV model in terms of the RMSPE statistic.
Finally, we repeat these calculations for good and bad volatility.
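The evaluation statistics used here, and the MAPE variant used in Sect. 18.3.3, reduce to a few lines (helper names of our own):

rmspe <- function(y, yhat) sqrt(mean((y - yhat)^2))  # root-mean-squared prediction error
mape  <- function(y, yhat) mean(abs(y - yhat))       # mean absolute prediction error

# A ratio above unity favors the model in the denominator (here: the random forest)
rmspe(rv_test, pred_ols) / rmspe(rv_test, pred_rf)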
Three main results emerge from Table 18.1. First, the RMSPE ratio exceeds unity
in all cases when we include lagged industry returns, indicating the superior perfor-
mance of random forests against the OLS model. Second, when we exclude lagged
industry returns from the set of predictors, the results are more balanced, where ran-
dom forests in several cases outperform the OLS estimator for realized volatility and

3 The “grf” package also allows different subsamples to be used for constructing a tree and for

making predictions. We deactivate this option, as in a classic random forest.


4 It should be noted that a comparison of linear models and random forests in terms of formal statis-

tical tests is complicated by the nonlinear and complex structure of random forests. We, therefore,
use various statistics (basic statistics, formal statistical tests, measures of economic benefits, metrics
of relative importance of predictors) to evaluate and compare models along different dimensions.

Table 18.1 Comparing OLS and random forests by means of RMSPE ratios
Window Excluding industry returns Including industry returns
h = 1    h = 3    h = 12    h = 1    h = 3    h = 12

Panel A: realized volatility


120 1.0353 1.0220 1.0867 1.4397 1.4885 1.5900
180 1.0216 0.9964 1.0426 1.2656 1.2608 1.3294
240 0.9120 1.0062 1.0149 1.1563 1.2166 1.2712
360 0.8888 0.9585 0.9543 1.0142 1.1217 1.0976
Panel B: bad realized volatility
120 1.0329 1.1400 1.6747 1.4310 1.6192 2.1122
180 0.9940 1.0271 1.2201 1.2119 1.3450 1.4977
240 0.8628 0.9934 0.9350 1.1450 1.2595 1.2846
360 0.9186 0.9680 0.9939 1.0650 1.1472 1.1012
Panel C: good realized volatility
120 0.9623 0.9725 1.0078 1.3601 1.4558 1.5745
180 0.9673 1.0084 0.9978 1.2491 1.2729 1.3198
240 0.9572 1.0090 1.0069 1.1840 1.2128 1.2232
360 0.9420 0.9838 0.9945 1.0169 1.1110 1.0799
Note The columns entitled “Excluding industry returns” compare the standard HAR-RV model, $RV_{t+h} = \beta_0 + \beta_1 RV_t + \beta_2 RV_{t,q} + \beta_3 RV_{t,y} + \varepsilon_t$, with the corresponding random-forest model, $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})$. The columns entitled “Including industry returns” compare the extended HAR-RV model, $RV_{t+h} = \beta_0 + \beta_1 RV_t + \beta_2 RV_{t,q} + \beta_3 RV_{t,y} + \sum_{j=1}^{48} \beta_{j+3} r_{t,j} + \varepsilon_t$, with the corresponding random-forest model, $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$. A ratio larger than unity indicates that the random-forest model outperforms the corresponding HAR-RV model estimated by means of the OLS technique in terms of the RMSPE criterion. The column entitled “Window” shows the length of the rolling-estimation window. The parameter $h$ denotes the investment horizon (in months). The random forests are built using 2000 trees

realized bad volatility for the short and intermediate rolling-estimation windows.
Third and more importantly, the RMSPE ratios are found to be substantially larger
when we include lagged industry returns in the set of predictors than when we exclude
lagged industry returns. In other words, the results show that random forests system-
atically, and in a quantitatively substantial way, improve out-of-sample predictions
of aggregate stock market realized volatility relative to the standard HAR-RV model
estimated by the OLS technique once we account for the industry level information
embedded in lagged industry returns.
The observed superior performance of the random forest against the OLS model
is not unexpected given that the HAR-RV model extended to include lagged industry
returns from across the entire economy requires the estimation of many parameters.
In case some of these industries have only limited predictive power for realized
volatility, their estimated parameters will add noise to the predictions of realized
volatility. This brings about a trade-off when the OLS technique is used to estimate
the HAR-RV model as the improvement in performance due to the predictive power

of industry returns has to be weighed against a deteriorated performance due to an overparameterization of the model. Random forests, in contrast, do not suffer
from such an overparameterization problem as the search-and-split algorithm that is
used to grow regression trees automatically discards less informative predictors in
a data-driven way as we recursively subdivide the predictor space into rectangular
non-overlapping regions in a top-down fashion.5
Finally, we follow a slightly more sophisticated modeling approach by adding
only one of the 48 lagged industry returns at a time to the classic HAR-RV model,
estimating the resulting 48 models by the OLS technique and finally averaging the
predictions from these 48 models to predict realized volatility. When we compare
the predictions computed by means of random forests with the predictions obtained
from such a thick-modeling approach, once again, we find that random forests yield
superior performance in terms of the RMSPE statistic, with the long rolling-estimation window being the only exception, mainly for realized “bad” volatility. Results
are reported in Table 18.12 (Appendix).

18.3.3 The Predictive Power of Lagged Industry Returns

Having established evidence that the random-forest model is better suited than the
OLS technique to analyze the predictive value of lagged industry returns for realized
volatility, we summarize in Table 18.2, for the four rolling-estimation windows and
the three investment horizons studied, the ratios of the RMSPEs of the restricted,
$RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})$, and the full random-forest model that includes lagged industry returns, $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$. Panels
B and C present the results for good and bad volatility. The ratios in Table 18.2 exceed
unity for the vast majority of cases, indicating that the full model that incorporates
lagged industry returns outperforms the restricted model for realized volatility as
well as its bad and good variants. This means that extending the model to include
industry level information improves the out-of-sample accuracy of its predictions
of aggregate stock-market volatility. The magnitude of the ratios of the RMSPEs
tends to be larger for the short investment horizon than for the two longer investment
horizons, especially for realized volatility and good realized volatility, suggesting
that industry level information can be particularly useful to improve relatively shorter
term stock-market volatility predictions and for bullish market states.
Because large prediction errors have a disproportionately large effect on the
RMSPE statistic, we report in Table 18.3 the ratio of the mean-absolute-prediction-error (MAPE) statistics of the restricted model and the full
model. Again, a value larger than unity for this ratio indicates that lagged industry

5 We also cross-checked how the random-forest model that features lagged industry returns performs
as compared to the classic HAR-RV model (estimated by OLS) that excludes industry returns. The
results, reported in Table 18.11 (Appendix), show that the random-forest model outperforms the
classic HAR-RV model in terms of the RMSPE ratio, with the long rolling-estimation window,
mainly for realized “bad” volatility, being an exception.

Table 18.2 The predictive power of lagged industry returns (RMSPE ratios)
Window    h = 1    h = 3    h = 12

Panel A: realized volatility


120 1.1004 1.0425 1.0368
180 1.0968 1.0374 1.0466
240 1.1245 1.0315 1.0625
360 1.0665 1.0501 1.0655
Panel B: bad realized volatility
120 1.0304 1.0341 1.0565
180 1.0324 1.0261 1.0520
240 1.0530 1.0297 1.1095
360 1.0288 1.0223 0.9949
Panel C: good realized volatility
120 1.1487 1.0572 1.0486
180 1.1281 1.0347 1.0856
240 1.1490 1.0354 1.0648
360 1.0871 1.0424 1.0260
Note This table reports the ratio of the RMSPE statistics of the restricted random-forest model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})$) and the full random-forest model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$) that includes lagged industry returns. The column entitled “Window” shows the length of the rolling-estimation window. The parameter $h$ denotes the investment horizon (in months). The random forests are built using 2000 trees

returns improve predictive accuracy. The results in Table 18.3 corroborate those for
the RMSPE ratios reported in Table 18.2. We find that the full model that incorpo-
rates lagged industry returns outperforms the restricted model with generally larger
MAPE ratios observed at the short investment horizon, further supporting the pre-
dictive value of industry level information particularly for shorter horizons and for
bullish market states.
In volatility forecasting exercises that involve noisy volatility proxies, Patton
[26] shows that the quasi-likelihood (QLIKE) loss function along with the usual
mean-squared-error loss function allow for an unbiased model ordering. Therefore,
in order to check the robustness of our findings, we report in Table 18.4 the results
for the popular QLIKE loss function. We observe that the QLIKE ratios are smaller
than those for the RMSPE and MAPE statistics, but still larger than unity except in some cases, mainly for $h = 12$.6 Moreover, the QLIKE ratios tend to become larger
when the length of the rolling-estimation window increases. These additional results,

6 It should be noted that the QLIKE loss function studied here implies that the loss from an under-
estimation of realized volatility is larger than the loss from a corresponding over-estimation of
the same absolute size. Hence, the results imply that an investor who suffers a greater loss from
an under-estimation than from a corresponding over-estimation of realized volatility benefits from
using lagged industry returns to predict stock-market volatility, a result that is consistent with the
results we shall report in Sect. 18.3.7.

Table 18.3 The predictive power of lagged industry returns (MAPE ratios)
Window    h = 1    h = 3    h = 12

Panel A: realized volatility


120 1.1095 1.0540 1.0159
180 1.1331 1.0916 1.0169
240 1.1439 1.0854 1.0634
360 1.1428 1.0974 1.0918
Panel B: bad realized volatility
120 1.0493 1.0584 1.0596
180 1.0737 1.0262 1.0642
240 1.0951 1.0266 1.1011
360 1.0858 1.0620 1.0265
Panel C: good realized volatility
120 1.1752 1.1157 1.0220
180 1.1779 1.1080 1.0716
240 1.2563 1.1192 1.0916
360 1.2205 1.1478 1.0381
Note This table reports the ratio of the MAPE statistics of the restricted random-forest model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})$) and the full random-forest model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$) that includes lagged industry returns. The column entitled “Window” shows the length of the rolling-estimation window. The parameter $h$ denotes the investment horizon (in months). The random forests are built using 2000 trees

thus, further confirm the predictive value of lagged industry returns for stock-market
volatility.7
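The chapter does not restate the QLIKE formula; in the standard normalization of Patton [26], the loss reads $RV/F - \log(RV/F) - 1$, where $F$ is the volatility prediction, which translates into a one-line helper (a sketch):

qlike <- function(rv, f) mean(rv / f - log(rv / f) - 1)  # rv: realization, f: prediction

# Ratio of QLIKE losses: restricted versus full random-forest model
qlike(rv_test, pred_restricted) / qlike(rv_test, pred_full)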
Finally, having confirmed the predictive value of lagged industry returns via alter-
native loss functions, as another approach, we report in Table 18.5 the results of the [8] test of equality of out-of-sample mean-squared prediction errors of the full model
that includes lagged industry returns and the restricted model that excludes industry
level information. The test yields significant results at the 5% and, in a few cases,
at the 10% level of significance at the two shorter investment horizons for realized
volatility and its good and bad variants, confirming that the full model outperforms
the restricted model in most cases. While the test statistic takes on smaller values for
the long investment horizon, it remains significant in the majority of cases at the 10%
level of significance, and in a few cases even at the 5% level of significance.8
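The [8] statistic can be computed from the two series of out-of-sample predictions as follows (a sketch using the "sandwich" package for the Newey-West variance; all variable names are ours):

library(sandwich)

# Clark-West adjusted loss differential: restricted (nested) versus full model
f  <- (rv_test - pred_restricted)^2 -
      ((rv_test - pred_full)^2 - (pred_restricted - pred_full)^2)
cw <- lm(f ~ 1)                            # regress f on a constant
coef(cw)[1] / sqrt(NeweyWest(cw)[1, 1])    # compare with 1.28 (10%) / 1.65 (5%)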

7 In a recent study, Reschenhofer et al. [34] propose two alternative likelihood-based loss functions,

one based on a t-distribution (QLIKE-t) and the other based on an F-distribution (QLIKE-F), that
are less sensitive to outliers and thus allow for a more stable ranking of models. Given this evidence,
we also experimented with the QLIKE-t and QLIKE-F distributions and found qualitatively similar
results (for alternative degrees-of-freedom parameters) to those obtained from QLIKE loss function.
8 We also examined, by means of the Clark-West test, how random forests perform relative to a HAR-RV model estimated by means of the OLS technique, where both models feature lagged industry returns as predictors in addition to the standard HAR-RV predictors. Hence, we treated the HAR-RV model as a nested linear version of the nonlinear random-forest model. The results reported in Table 18.13 (Appendix) corroborate that the random-forest model significantly outperforms the OLS model.

Table 18.4 The predictive value of lagged industry returns (QLIKE ratios)
Window    h = 1    h = 3    h = 12

Panel A: realized volatility


120 1.0041 1.0154 0.9870
180 1.0061 1.0238 0.9954
240 1.0053 1.0206 1.0165
360 1.0115 1.0287 1.0178
Panel B: bad realized volatility
120 0.9941 1.0296 0.9887
180 1.0066 1.0169 0.9937
240 1.0099 1.0267 1.0075
360 1.0153 1.0164 1.0033
Panel C: good realized volatility
120 1.0168 1.0164 0.9944
180 1.0179 1.0127 1.0104
240 1.0247 1.0186 1.0321
360 1.0268 1.0271 1.0324
Note This table reports the ratio of the quasi-likelihood (QLIKE) losses of the restricted random-forest model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})$) and the full random-forest model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$) that includes lagged industry returns. The column entitled “Window” shows the length of the rolling-estimation window. The parameter $h$ denotes the investment horizon (in months). The random forests are built using 2000 trees

Overall, various methods to assess the predictive value of lagged industry returns
yield consistent findings confirming that incorporating industry level information in
prediction models can improve the out-of-sample accuracy of predictions of stock-
market volatility.

18.3.4 Time-Varying Importance of Industry Returns

In their widely cited study, Hong et al. [21] show that 14 out of 34 industries,
including commercial real estate, petroleum, metal, retail, financial, and services,
can predict stock market movements by one month, while other industries including
petroleum, metal, and financials can predict the market even two months ahead.
However, re-examining these results with updated data, Tse [35] shows that only
one to seven industries have significant predictive ability for the stock market. These
studies, with a focus on stock market return forecasting and in-sample tests, bring
about an interesting question as to the time-varying importance of industry returns


Table 18.5 The predictive value of lagged industry returns (Clark-West test)
Window    h = 1    h = 3    h = 12

Panel A: realized volatility


120 1.6136 3.1811 1.6123
180 1.4199 3.5650 1.5224
240 1.5290 3.4485 1.4752
360 1.5386 2.8947 1.2107
Panel B: bad realized volatility
120 1.7284 2.3637 1.6449
180 1.8349 2.8107 1.5169
240 1.8133 2.5655 1.3198
360 1.8813 2.1763 0.6721
Panel C: good realized volatility
120 1.8172 2.4927 1.6416
180 1.8407 2.9750 1.6657
240 1.9321 3.3568 2.1779
360 2.2709 2.5821 1.7002
Note This table reports the results of the [8] test of equal mean-squared prediction errors. The null hypothesis is that the restricted random-forest model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})$) has the same performance as the full random-forest model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$) that includes lagged industry returns. The alternative hypothesis is that the full model performs better than the restricted model. Results are based on Newey-West robust standard errors. Critical values are 1.28 and 1.65 at the 10% and 5% levels of significance. The column entitled “Window” shows the length of the rolling-estimation window. The parameter $h$ denotes the investment horizon (in months). The random forests are built using 2000 trees

in aggregate level market dynamics. For this reason, we supplement our analysis by
examining the relative importance of the predictors over time.
We present in Fig. 18.2 the relative importance of the predictors in the full
model, $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$, measured in terms of
how often a predictor is used for splitting when building a tree. Given the large num-
ber of industries used in the array of predictors, in order to ease the interpretation of
the results, we aggregate the data into an “rv” block that represents the components of
the HAR-RV model, and eleven broad industry groups (energy, materials, industrials,
consumer staples, consumer discretionary, healthcare, financials, IT, communication,
utilities, real estate).
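The importance measure underlying Fig. 18.2 is available directly in the "grf" package (a sketch; max.depth mirrors the note to Fig. 18.2, and predictor_group is a mapping of our own from the 51 predictors to the "rv" block and the eleven industry groups):

library(grf)

# Weighted share of splits attributable to each of the 51 predictors
imp <- variable_importance(rf_ext, decay.exponent = 2, max.depth = 4)

# Collapse predictor-level importances into the twelve broad groups
shares <- tapply(imp[, 1] / sum(imp), predictor_group, sum)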
As expected, given the popularity of the HAR-RV model in empirical finance, the
three terms of the HAR-RV model (treated in the figure as a single block) always play
an important role in the models. Interestingly, however, the role of the “rv” block, which represents the components of the HAR-RV model, changes over time. We observe
that the role of the “rv” block has gained momentum during the period preceding the
Global Financial Crisis (GFC) of 2008 and then peaked during the GFC, suggest-
ing that the importance of non-industrial information including behavioral factors
[Figure 18.2 near here: stacked plots of the relative share of splits over time for the investment horizons h = 1, 3, 12 (from left to right), for Panel A: Realized volatility, Panel B: Bad realized volatility, and Panel C: Good realized volatility. Predictor groups: rv, communication, consumer discretionary, consumer staples, energy, financials, healthcare, industrials, IT, materials, real estate, utilities.]

Fig. 18.2 The relative importance of predictors. Note Predictor importance is computed for the full model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$) and a rolling-estimation window of length 240 months. Predictor importance is defined as the weighted sum of how often a predictor is used for splitting. Maximum tree depth considered: 4. Numbers are averaged across 10 estimations of the random forests. The investment horizons are $h = 1, 3, 12$ (from left to right). The random forests are built using 2000 trees

and/or changes in investors’ risk aversion increased during the run up to the global
crash. However, we also observe, at the short and intermediate investment horizons,
an increasing role of industrials and materials during the aftermath of the global
crisis, highlighting the informational value of real economic activity. Interestingly,
at the long investment horizon, we observe a similar pattern for consumer related
industries with consumer discretionary and consumer staples taking on a greater role
in the predictive models. Overall, our analysis suggests that certain industries play
a more dominant predictive role for aggregate level volatility and that the predic-
tive contribution of industry level returns is not constant over time, with a structural
change occurring during the period that precedes the global financial crisis.

18.3.5 Robustness Checks

In order to further confirm the inferences discussed so far, we report in this section the
findings from a battery of robustness checks. In Table 18.6, we summarize the results
of the RMSPE ratio when we add market returns as a control variable to the array
of predictors of the full model. Specifically, we ask whether a model that features
the standard HAR-RV terms, lagged market returns, and lagged industry returns has
the same predictive performance as a model that features only the standard HAR-RV
terms and lagged market returns. The results, while depending to some extent on the
combination of estimation window and investment horizon being studied, in general
suggest that industry returns indeed capture predictive information for subsequent
realized market volatility over and above lagged market returns.
Table 18.7 reports the results of four additional robustness checks (for the sake
of brevity, we focus on realized volatility). First, we replace the rolling-estimation window with a recursively expanding estimation window. Second, we study, for $h > 1$, the average realized volatility formulated as $\mathrm{mean}(RV_{t+1} + \cdots + RV_{t+h})$. Third, we use the realized standard deviation as the dependent variable in our models.
These additional robustness checks lend further support to our conclusion that indus-
try returns capture valuable predictive information for subsequent realized market
volatility.
As a fourth robustness check, we consider boosted regression trees [13, 14] as
an alternative to random forests. Boosted regression trees combine regression trees
with elements of statistical boosting. They resemble random forests insofar as the
key idea is to grow a forest of trees by combining simple regression trees. In contrast
to random forests, however, boosted regression trees are estimated by means of a
forward stage-wise iterative algorithm. The specific algorithm that we consider is
known as stochastic gradient-descent boosting. Estimating the stochastic gradient-
descent variant of boosted regression trees using the R add-on package “gbm” [17]
and computing the RMSPE ratio, we observe results that further confirm the pre-
dictive value of lagged industry returns for the intermediate and long investment
horizons.
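The boosted-tree benchmark can be reproduced along these lines with the "gbm" package, using the parameter values reported in the note to Table 18.7 (a sketch; the data objects are ours):

library(gbm)

brt <- gbm(rv_lead ~ ., data = train_ext, distribution = "gaussian",
           n.trees = 2000, interaction.depth = 5, shrinkage = 0.005,
           n.minobsinnode = 5, bag.fraction = 0.5, cv.folds = 5)
n_opt <- gbm.perf(brt, method = "cv")              # CV-optimal number of trees
pred  <- predict(brt, newdata = test_ext, n.trees = n_opt)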

Table 18.6 Controlling for market returns


Window    h = 1    h = 3    h = 12
Panel A: realized volatility
120 1.0028 1.0394 1.0378
180 1.0070 1.0210 1.0412
240 1.0186 1.0048 1.0531
360 0.9683 1.0239 1.0783
Panel B: bad realized volatility
120 1.0139 1.0272 1.0424
180 1.0285 1.0152 1.0449
240 1.0063 1.0159 1.1189
360 0.9235 1.0386 1.0197
Panel C: good realized volatility
120 1.0007 1.0233 1.0472
180 1.0041 1.0142 1.0763
240 0.9972 1.0033 1.0376
360 0.9423 1.0146 1.0299
Note This table reports the ratio of the RMSPE statistics of the restricted model (the model that
features only the standard HAR-RV terms and market returns) and the full model (the model that
features the standard HAR-RV terms, market returns, and lagged industry returns). The column
entitled “Window” shows the length of the rolling-estimation window. The parameter h denotes the
investment horizon (in months). The random forests are built using 2000 trees

As a further robustness check, considering that there is a tradeoff between training time and the number of trees constituting the random forest, we report in Table 18.8
the results for realized volatility predictions that we obtain when we vary the number
of trees from 1000 to 3000. The results show that variation of the number of trees in
this range leaves our results qualitatively unchanged.
A comparison of the results for 2000 trees in Table 18.8 with the corresponding
results in Table 18.2 shows that, mainly for the short investment horizon ($h = 1$), the results display a certain element of variability due to the random element involved in the estimation of random forests.9 In order to shed some light on the robustness of our results with respect to this random element, we set up a small-scale simulation experiment. Specifically, we focus on a rolling-estimation window of length 240 observations and the short investment horizon ($h = 1$), estimate the random-
forest models with and without lagged industry returns, and repeat this process 30

9 We could hold constant the seed when studying the effect of a variation in the number of trees (or
some other parameter like, for example, the length of the rolling-estimation window) on our results.
The results would then reflect the pure effect of a variation in the number of trees conditional on the
fixed seed. We prefer in this study not to fix the seed because the results then give a clearer picture
of the total variation of our results due to a variation of the model configuration (like a variation in
the number of trees) and the random element involved in the estimation of random forests.

Table 18.7 Further robustness checks


Specification    h = 1    h = 3    h = 12
Recursive estimation 1.0383 1.0262 1.0142
of random forests
Average realized 1.0833 1.0580 1.0896
volatility
Realized standard 1.0569 1.0568 1.0566
deviation
Boosted regression 0.9994 1.0571 1.0595
trees
Note This table reports, for realized volatility, the ratio of the RMSPE statistics of the restricted model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})$) and the full model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$) that includes lagged industry returns. The column entitled “Specification” shows the model variant being studied. Recursive estimation of random forests: a recursive estimation window replaces the rolling-estimation window; the initial training period is 120 months; the random forests are built using 2000 trees. Average $RV$: for $h > 1$, the dependent variable in the model is computed as $\mathrm{mean}(RV_{t+1} + \cdots + RV_{t+h})$; the length of the rolling-estimation window is 120 months; the random forests are built using 2000 trees; for $h = 1$, the result slightly differs from the corresponding result reported in Table 18.2, reflecting random variation due to bootstrapping and random tree building. Realized standard deviation: the dependent variable is the square root of $RV$; the length of the rolling-estimation window is 120 months; the random forests are built using 2000 trees. Boosted regression trees: the parameters used for estimation of boosted regression trees are as follows: tree depth = 5, learning rate = 0.005, minimum number of observations per node = 5, bag fraction = 0.5; the maximum number of trees is 2000; the number of trees used for estimation is determined by five-fold cross-validation. The parameter $h$ denotes the investment horizon (in months)

times.10 We then compute, across the simulation runs, the average, minimum, and
maximum of the RMSPE and the MAPE ratios. We report the results of our simula-
tion experiment in Table 18.14 (Appendix). The results of our simulation experiment
demonstrate the robustness of our finding that lagged industry returns improve the
accuracy of predictions of realized volatility.

18.3.6 Random Forests Versus Shrinkage Estimators

A key feature of random forests is that they capture in a data-driven way any potential
nonlinearities in the data as well as interaction effects between the predictor variables.
[18], who compare the gains of using various machine-learning techniques for pre-
dicting stock returns, argue that predictive gains from using trees can be attributed to
nonlinear predictor interactions that other techniques do not detect. In order to assess
whether nonlinearities and predictor interactions also play a role in the context of
our prediction experiment, we compare random forests with three popular linear

10Computational time is not a severe binding constraint on our estimations because we run the
simulation experiment and the various other variants of our models in parallel.

Table 18.8 Varying the number of trees


Number of trees    h = 1    h = 3    h = 12
1000 1.1472 1.0269 1.0728
1500 1.1911 1.0402 1.0728
2000 1.1961 1.0321 1.0815
2500 1.1839 1.0236 1.0464
3000 1.1669 1.0286 1.0444
Note This table reports, for realized volatility, the ratio of the RMSPE statistics of the restricted model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})$) and the full model ($RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$) that includes lagged industry returns. The length of the rolling-estimation window is 240 months. For 2000 trees, the results are not exactly identical to the corresponding results reported in Table 18.2, reflecting random variation due to bootstrapping and random tree building. The parameter $h$ denotes the investment horizon (in months)

shrinkage estimators (see the textbook by Hastie et al. [19]): the Lasso estimator, the
Ridge-regression estimator, and an elastic net. While the Lasso estimator uses the
L1 norm of the coefficient vector to shrink the dimension of the estimated model,
the Ridge-regression estimator uses the L2 norm. The elastic net uses (in the case
of our parametrization) an equally weighted combination of the two. We use the R
add-on package “glmnet” [15] to estimate the shrinkage models, where the optimal shrinkage parameter minimizes the mean 10-fold cross-validated error.
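All three shrinkage estimators then follow the same "glmnet" template, differing only in the mixing parameter alpha (a sketch; the data objects are ours):

library(glmnet)

# alpha = 1: Lasso (L1 penalty); alpha = 0: Ridge (L2); alpha = 0.5: elastic net
fit  <- cv.glmnet(X_train, y_train, alpha = 0.5, nfolds = 10)
pred <- predict(fit, newx = X_test, s = "lambda.min")  # CV-optimal shrinkage parameter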
Table 18.9 reports the RMSPE ratios of the shrinkage estimators and the random-
forest model. A ratio larger than unity shows that the random-forest model has a better predictive performance than the respective shrinkage estimator, which is the case for the majority of the configurations being studied, albeit for some
configurations by a small margin. On balance, the results show that the random-forest
model has a competitive performance relative to the linear shrinkage estimators, and
that it outperforms the latter for several configurations, indicating that accounting
for departures from linearity and predictor interactions can be useful for modeling
the link between realized volatility and lagged industry returns.

18.3.7 Asymmetric Loss and Quantile-Random Forests

The results reported in the preceding sections are based on the assumption that an
investor’s loss is a symmetric function of the squared or absolute prediction error
(with the QLIKE loss function being the exception). That is, an under-prediction
of realized volatility causes the same loss as an over-prediction of the same size.
In practical settings, however, one could easily think of situations in which the loss
function, such as one implied by certain options-trading strategies, is asymmetric in
the prediction error. Therefore, in order to account for such a setting, we study the

Table 18.9 Comparison with shrinkage estimators


Window    h = 1    h = 3    h = 12
Panel A: Lasso
120 1.0182 1.0014 1.0085
180 1.0324 1.0107 1.0281
240 0.9860 1.0137 1.0438
360 1.1125 1.0359 1.0077
Panel B: ridge regression
120 1.0101 0.9963 0.9921
180 1.0024 1.0136 1.0178
240 0.9600 1.0048 1.0296
360 1.0357 1.0380 1.0071
Panel C: elastic net
120 1.0128 1.0019 1.0087
180 1.0269 1.0128 1.0256
240 0.9800 1.0094 1.0421
360 1.0997 1.0237 1.0072
Note This table reports the ratio of the RMSPE statistics of the shrinkage estimators and the random-
forest model. Both models feature lagged industry returns in the array of potential predictors. The
column entitled “Window” shows the length of the rolling-estimation window. The parameter h
denotes the investment horizon (in months). The random forests are built using 2000 trees

following loss function (e.g., [10]):

$$L(PE_{t+h}, \alpha) = \left[\alpha + (1 - 2\alpha)\,1(PE_{t+h} < 0)\right] |PE_{t+h}|^p, \qquad (18.6)$$

where we compute the prediction error, $PE$, by subtracting the prediction of realized volatility from the actual realization of realized volatility. Special cases of this loss function are the lin-lin loss function ($p = 1$) and the quad-quad loss function ($p = 2$), where the shape parameter $\alpha \in (0, 1)$ determines the asymmetry of the loss function. The loss function is symmetric for the special case $\alpha = 0.5$. Hence, the parameter configuration $\alpha = 0.5$ and $p = 1$ implies that an investor’s loss is a symmetric function of the mean-absolute prediction error, while $\alpha = 0.5$ and $p = 2$ implies that an investor’s loss is a symmetric function of the squared prediction error. Setting the shape parameter to $\alpha > 0.5$, in turn, results in a loss function that attaches a higher loss to an under-prediction of realized volatility than to an over-prediction of the same (absolute) size. In the opposite case, $\alpha < 0.5$, an over-prediction is costlier than a corresponding under-prediction.
We assume that an investor whose loss function is asymmetric focuses on a quan-
tile of the conditional distribution of realized volatility that corresponds to the shape
of the loss function rather than simply the mean (or median) of realized volatility.
Specifically, we assume that an investor who has a loss function with a shape param-
eter $\alpha > 0.5$ (that is, an investor who suffers a higher loss from an under-prediction than from an over-prediction) adjusts his or her prediction upward. Such an investor, thus, predicts a quantile of the conditional distribution of realized volatility above the median. Conversely, an investor who has a loss function with a shape parameter $\alpha < 0.5$ adjusts his or her prediction downward relative to the median. We compute such upward- and downward-adjusted predictions by estimating quantile-random forests [23].
forests [23].
We proceed as follows. We estimate quantile-random forests and predict the
$\alpha$-quantiles of the conditional distribution function of realized volatility, and then compute the prediction errors that correspond to the estimated $\alpha$-quantiles. We use the resulting prediction errors to compute an investor’s loss according to the loss function given in Eq. (18.6). Cumulating the losses over all out-of-sample predictions for both the restricted model, $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})$, and the full model, $RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, \ldots, r_{t,48})$, we finally compute the
ratio of the cumulated losses, where a loss ratio that exceeds unity indicates that the
full model produces a lower cumulated loss than the restricted model.
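In "grf", the quantile predictions and the loss of Eq. (18.6) translate into the following sketch (quantile_forest is the package's quantile-random-forest estimator; the loss helper and variable names are ours):

library(grf)

qf   <- quantile_forest(X_train, y_train, num.trees = 2000)
pred <- predict(qf, X_test, quantiles = 0.75)$predictions  # an alpha = 0.75 investor

# Cumulated asymmetric loss of Eq. (18.6); pe = realization minus prediction
cum_loss <- function(pe, alpha, p)
  sum((alpha + (1 - 2 * alpha) * (pe < 0)) * abs(pe)^p)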
Figure 18.3 presents plots of the loss ratios for p = 1 and p = 2 as a function of the shape parameter, α. We average, for each shape parameter, α, the loss ratios across the four different rolling-estimation windows. For a loss function of the quad-quad type (p = 2), the loss ratio is smaller than unity for some shape parameters smaller than α = 0.5, and for shape parameters close to its upper boundary. In contrast, for a broad range of shape parameters above 0.5, the loss ratio is found to be larger than unity, with the loss ratio attaining a maximum in this range. This indicates that an investor who suffers more from an under-prediction than an over-prediction (and whose loss function has a shape parameter not too close to its upper boundary) reaps relatively larger benefits from studying lagged industry returns. The benefits, however, tend to decrease with the length of the investment horizon, and are particularly large in the case of good realized volatility and h = 1. For a loss function of the lin-lin type, in turn, we observe a loss ratio larger than unity for h = 1 and h = 3, which attains a maximum for α > 0.5. For h = 12, the loss ratio decreases in the magnitude of the shape parameter, α, but remains largely above unity. To sum up, these additional results provide further insight into the potential benefits an investor can achieve by using industry-level information to predict volatility at the aggregate market level.

18.4 Economic Implications

Considering alternative loss functions is one way to quantify the economic benefits of predictions, and the discussion in Sect. 18.3.7 indicates that studying lagged industry returns benefits investors who are particularly concerned about under-predicting market volatility. This is certainly an important consideration for the pricing of options contracts, as ignoring industry-level information can potentially lead to under-pricing of these securities. An alternative way to assess the economic implications of our findings is to directly use an investor's utility function to measure the benefits from utilizing industry-level information in predicting realized market volatility.

[Figure 18.3 here: three panels of loss-ratio plots. Panel A: Realized volatility; Panel B: Bad realized volatility; Panel C: Good realized volatility. Each panel shows, for h = 1, h = 3, and h = 12, the loss ratio (vertical axis, roughly 0.6 to 1.6) as a function of the shape parameter α (horizontal axis, 0.2 to 0.8), with separate curves for p = 1 and p = 2.]
Fig. 18.3 The shape of an investor's loss function and the predictive value of lagged industry returns. Note This figure displays the ratio of the cumulated loss for the restricted random-forest model, RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}), and the full random-forest model, RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, …, r_{t,48}), that includes lagged industry returns. The loss function is given in Eq. (18.6). A ratio larger than unity signals that the full model performs better than the restricted model. The loss ratios are averaged across the four different rolling-estimation windows (120, 180, 240, and 360 months). The random forests are estimated by setting the minimum node size to 5 and using one-third of the predictors, randomly chosen, for splitting. The random forests are built using 2000 trees. The parameter h denotes the investment horizon (in months)

Table 18.10 Economic implications


Window    γ = 3    γ = 5    γ = 10
Panel A: realized volatility
120 2.90 9.49 36.22
180 1.59 6.83 31.57
240 0.29 5.93 35.39
360 3.52 13.29 69.22
Panel B: bad realized volatility
120 −0.81 0.44 8.51
180 −0.77 1.86 8.58
240 −1.85 −1.25 −4.29
360 −6.14 −6.36 −9.40
Panel C: good realized volatility
120 3.64 7.62 39.02
180 4.73 12.50 68.64
240 5.99 12.97 62.38
360 12.44 30.59 455.66
Note This table reports the difference (in percent) between the certainty equivalent returns (CER) that an investor attains upon using lagged industry returns to predict realized market volatility and otherwise. A positive number indicates that an investor attains a higher CER when using lagged industry returns to set up a prediction model (that is, we compute (CER with lagged industry returns − CER without industry returns)/|CER without industry returns|). The utility function is of the constant-relative-risk-aversion (CRRA) type, where the parameter γ captures the degree of risk aversion. The risk-free interest rate is fixed at zero and there are no transaction costs. An investor rebalances his or her portfolio every month, where portfolio weights are restricted to the interval [0, 1.5]. Predictions are based on the full model (RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, …, r_{t,48})) that includes lagged industry returns and the restricted model (RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})). The investment horizon is fixed at h = 1. The maximum number of trees is 2000; the number of trees used for estimation is determined by five-fold cross-validation

To this end, we consider an investor who decides whether to invest in a portfolio consisting of a riskless asset and the stock market. Like [6], we keep things simple in that we abstract from transaction costs, neglect intertemporal hedging considerations, and focus on the short investment horizon (h = 1). We assume that an investor has a constant-relative-risk-aversion (CRRA) utility function, U(W) = W^{1−γ}/(1 − γ), where U denotes utility and W denotes wealth, and we consider three levels of risk aversion (γ = 3, 5, 10), similar to the application by Cenesizoglu and Timmermann [6]. We further fix the riskless interest rate at zero and assume that an investor uses the returns as observed in the period in which a prediction is to be formed to predict returns. We then compute the certainty equivalent returns (CER) for an investor who utilizes lagged industry returns to predict realized market volatility and, alternatively, for an investor who ignores industry-level information.
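A minimal sketch of the CER computation might look as follows (all names, the synthetic inputs, and the mean-variance-style weight rule are assumptions made for exposition, not the chapter's code; it keeps the zero risk-free rate, the [0, 1.5] weight restriction, and CRRA utility over end-of-period wealth):

import numpy as np

def certainty_equivalent_return(r_market, r_hat, rv_hat, gamma):
    # CER of a volatility-timing strategy under CRRA utility.
    # r_market: realized market returns (risk-free rate fixed at zero)
    # r_hat:    return predictions (here: the last observed return)
    # rv_hat:   predictions of realized variance
    # gamma:    coefficient of relative risk aversion
    w = np.clip(r_hat / (gamma * rv_hat), 0.0, 1.5)  # weights in [0, 1.5]
    wealth = 1.0 + w * r_market                      # end-of-period wealth
    utility = wealth ** (1 - gamma) / (1 - gamma)    # CRRA utility, gamma != 1
    # CER solves U(1 + CER) = average realized utility
    return ((1 - gamma) * utility.mean()) ** (1 / (1 - gamma)) - 1

rng = np.random.default_rng(1)
r = rng.normal(0.005, 0.04, size=240)     # hypothetical monthly returns
rv = rng.uniform(0.001, 0.004, size=240)  # hypothetical variance predictions
print(certainty_equivalent_return(r, np.roll(r, 1), rv, gamma=5))

The relative difference reported in Table 18.10 would then be 100 × (CER with lagged industry returns − CER without industry returns)/|CER without industry returns|, with the two CER values obtained from the variance predictions of the full and the restricted model, respectively.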

In Table 18.10, we report the difference (in percent) between the resulting CER values for the two types of investors. A positive number indicates that an investor attains a higher CER by incorporating lagged industry returns in the prediction model. The results for realized volatility in Panel A indicate substantial economic gains from using industry-level information (all figures in the panel are positive). The same also holds for good realized volatility in Panel C, with the magnitude of the economic benefits from utilizing lagged industry returns increasing as the degree of risk aversion increases. This suggests that more risk-averse investors can reap increasingly greater economic benefits from using industry-level information. In the case of bad realized volatility, in contrast, the results are mixed. A longer rolling-estimation window tends to worsen the economic value added of industry returns. This could be due to the dominance of non-industry-related factors, such as behavioral and sentiment-related effects, over stock-market volatility dynamics, particularly during periods of market crisis, when investors would be more likely to engage in herding behavior. Nevertheless, our results indicate that an investor who plans to use lagged industry returns to predict bad realized market volatility should choose a relatively short rolling-estimation window, especially if he or she is highly risk averse.

18.5 Concluding Remarks

In a well-cited study, Hong et al. [21] argue that industry portfolios capture predictive
information over the aggregate stock market, in line with the so-called gradual dif-
fusion of information hypothesis that suggests the information contained in industry
returns diffuses gradually across markets. Although later studies provide mixed evi-
dence regarding the predictive power of industry returns over stock market returns,
the literature has not yet examined the out-of-sample predictability of stock-market
volatility in this context. Given the importance of accurate out-of-sample volatility
predictions for a number of financial activities including option pricing, hedging,
and portfolio optimization, our study takes a first step in this direction by investigating, for the first time, the role of lagged industry returns from across the entire economy in predicting out-of-sample aggregate stock-market volatility.

Utilizing a machine-learning technique known as random forests, we show that incorporating the information in lagged industry returns can indeed help improve the out-of-sample accuracy of predictions of aggregate stock-market volatility. The predictive contribution of industry-level returns, however, is not constant over time, with an increasing role of industrials and materials during the aftermath of the 2008 global financial crisis, highlighting the informational value of real economic activity for stock-market volatility dynamics. We also show that studying lagged industry returns tends to benefit investors who are particularly concerned about under-predicting market volatility. Finally, assuming a constant-relative-risk-aversion utility function, we show that the magnitude of the economic benefits from utilizing lagged industry returns increases as the degree of risk aversion of an investor increases, suggesting that more risk-averse investors can reap increasingly greater economic benefits from studying industry-level information.
As a final note, it is important to point out that the purpose of our empirical
study is not to show that random forests and related tree-based techniques are the
best machine-learning techniques for modeling and predicting realized volatility.
Many different techniques (like the shrinkage estimators we have considered in this study) populate the machine-learning zoo, and their comparative advantages can be used to shed light on various aspects of realized volatility.
The purpose of our empirical study is to shed light on the potential role of industry
returns for modeling and predicting stock-market volatility at the aggregate level, and
random forests turn out to be useful in this regard. For the purpose of our analysis,
random forests have several advantages. They are straightforward to implement and
produce results that are easy to interpret. They can easily manage a large number
of industries from across the entire economy as predictors of realized volatility and
they always produce nonnegative predictions of realized volatility. Random forests
also can be adapted to a quantile framework such that it becomes straightforward
to study whether and, if so, how investors with different loss functions benefit from
using lagged industry returns for predicting realized volatility.
In future research, it will be interesting to build on our empirical results by apply-
ing other machine-learning techniques to examine whether industries lead the stock
market in the context of return and volatility prediction.

Acknowledgements The authors thank an anonymous reviewer for helpful comments. The usual
disclaimer applies.

Appendix

See Tables 18.11, 18.12, 18.13, and 18.14.



Table 18.11 The full random-forest model versus the classic HAR-RV model (RMSPE ratios)
Window    h = 1    h = 3    h = 12
Panel A: realized volatility
120 1.1392 1.0654 1.1268
180 1.1205 1.0336 1.0911
240 1.0255 1.0378 1.0783
360 0.9479 1.0065 1.0168
Panel B: bad realized volatility
120 1.0643 1.1790 1.7693
180 1.0262 1.0539 1.2835
240 0.9086 1.0229 1.0374
360 0.9451 0.9896 0.9889
Panel C: good realized volatility
120 1.1054 1.0282 1.0569
180 1.0912 1.0434 1.0832
240 1.0998 1.0447 1.0721
360 1.0241 1.0255 1.0204
Note This table reports the ratio of the RMSPE statistics of the classic HAR-RV model (RV_{t+h} = β_0 + β_1 RV_t + β_2 RV_{t,q} + β_3 RV_{t,y} + ε_t), estimated by OLS, and the full random-forest model (RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, …, r_{t,48})) that includes lagged industry returns. The column entitled "Window" shows the length of the rolling-estimation window. The parameter h denotes the investment horizon (in months). The random forests are built using 2000 trees
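For readers who want to see the benchmark spelled out, a minimal sketch of a rolling-window OLS forecast from the classic HAR-RV model might look as follows (the function name, inputs, and alignment conventions are assumptions for illustration; the chapter's own estimation details may differ):

import numpy as np

def har_rv_forecasts(rv, rv_q, rv_y, window=240, h=1):
    # Rolling OLS forecasts of the classic HAR-RV model:
    # RV_{t+h} = b0 + b1*RV_t + b2*RV_{t,q} + b3*RV_{t,y} + e_{t+h}
    # rv, rv_q, rv_y: monthly, quarterly, and yearly realized-volatility
    # components (equal-length 1-D arrays); returns out-of-sample forecasts.
    forecasts = []
    for t in range(window, len(rv) - h):
        X = np.column_stack([np.ones(window),
                             rv[t - window:t],
                             rv_q[t - window:t],
                             rv_y[t - window:t]])
        y = rv[t - window + h:t + h]              # targets led by h periods
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        forecasts.append(beta @ [1.0, rv[t], rv_q[t], rv_y[t]])
    return np.array(forecasts)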

Table 18.12 The full random-forest model versus the HAR-RV model (model averaging)
Window    h = 1    h = 3    h = 12
Panel A: realized volatility
120 1.1399 1.0608 1.1206
180 1.1168 1.0304 1.0880
240 1.0230 1.0363 1.0746
360 0.9475 1.0073 1.0124
Panel B: bad realized volatility
120 1.0597 1.1612 1.7968
180 1.0210 1.0598 1.3016
240 0.9142 1.0247 1.0332
360 0.9471 0.9915 0.9832
Panel C: good realized volatility
120 1.1245 1.0230 1.0550
180 1.1001 1.0403 1.0818
240 1.0982 1.0425 1.0708
360 1.0121 1.0210 1.0195
Note This table reports the ratio of the RMSPE statistics of the HAR-RV model estimated by OLS and the full random-forest model (RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, …, r_{t,48})) that includes lagged industry returns. In order to obtain the forecast for the HAR-RV model, only one of the 48 lagged industry returns at a time is added to the classic HAR-RV model, the resulting 48 models are estimated by the OLS technique, and finally the forecasts from the estimated 48 models are averaged to predict realized volatility. The column entitled "Window" shows the length of the rolling-estimation window. The parameter h denotes the investment horizon (in months). The random forests are built using 2000 trees
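The model-averaging step described in the note can be sketched as follows (a hedged, single-period illustration; `industry` is a hypothetical T x 48 matrix of lagged industry returns, and all names are placeholders):

import numpy as np

def har_avg_forecast(rv, rv_q, rv_y, industry, t, window=240, h=1):
    # Each of the 48 industry returns (columns of `industry`) is added to
    # the classic HAR-RV model in turn, the 48 augmented models are
    # estimated by OLS, and the 48 forecasts are averaged.
    base = np.column_stack([np.ones(window), rv[t - window:t],
                            rv_q[t - window:t], rv_y[t - window:t]])
    y = rv[t - window + h:t + h]
    forecasts = []
    for j in range(industry.shape[1]):
        X = np.column_stack([base, industry[t - window:t, j]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        x_now = np.array([1.0, rv[t], rv_q[t], rv_y[t], industry[t, j]])
        forecasts.append(beta @ x_now)
    return np.mean(forecasts)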

Table 18.13 The random-forest model versus the HAR-RV model when both feature industry
returns (Clark-West test)
Window    h = 1    h = 3    h = 12
Panel A: realized volatility
120 3.2539 3.4620 3.1259
180 2.9219 4.2481 3.4697
240 2.1151 3.2946 2.7494
360 3.8998 2.1487 2.6774
Panel B: bad realized volatility
120 3.0884 2.0974 1.5782
180 4.1625 2.9738 2.1935
240 5.3242 6.6560 4.1608
360 4.6712 2.8245 3.1990
Panel C: good realized volatility
120 3.0806 2.7379 2.5748
180 2.3024 2.1048 2.7511
240 2.2268 1.9401 2.1129
360 2.4586 1.8452 1.9595
Note This table reports the results of the Clark and West [8] test of equal mean-squared prediction errors. The null hypothesis is that the extended HAR-RV model that includes lagged industry returns (RV_{t+h} = β_0 + β_1 RV_t + β_2 RV_{t,q} + β_3 RV_{t,y} + Σ_{j=1}^{48} β_{j+3} r_{t,j} + ε_t) has the same performance as the full random-forest model (RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, …, r_{t,48})). The alternative hypothesis is that the full random-forest model performs better than the extended HAR-RV model. Results are based on Newey-West robust standard errors. Critical values are 1.28 and 1.65 at the 10% and 5% levels of significance. The column entitled "Window" shows the length of the rolling-estimation window. The parameter h denotes the investment horizon (in months). The random forests are built using 2000 trees
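As a generic sketch of how a Clark-West statistic of this kind is typically computed (an illustration of the test in [8], not the authors' code; the HAC lag length is an assumption):

import numpy as np
import statsmodels.api as sm

def clark_west(y, f_benchmark, f_alternative, hac_lags=3):
    # Clark-West adjusted test of equal MSPE. y: realizations;
    # f_benchmark / f_alternative: competing out-of-sample forecasts.
    # Returns the t-statistic of the adjusted loss differential
    # (large positive values favor the alternative model).
    e1 = y - f_benchmark
    e2 = y - f_alternative
    f = e1**2 - (e2**2 - (f_benchmark - f_alternative) ** 2)
    res = sm.OLS(f, np.ones_like(f)).fit(cov_type="HAC",
                                         cov_kwds={"maxlags": hac_lags})
    return res.tvalues[0]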

Table 18.14 Simulation experiment


Ratio Mean Min Max
RMSPE 1.1802 1.1203 1.2284
MAPE 1.1999 1.1596 1.2492
Note This table reports, for realized volatility, the results of a small-scale simulation experiment. For every simulation run, the restricted random-forest model (RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y})) and the full random-forest model (RV_{t+h} = RF(RV_t, RV_{t,q}, RV_{t,y}, r_{t,1}, r_{t,2}, …, r_{t,48})) are estimated using a rolling-estimation window of length 240 observations. The investment horizon is set to h = 1. The random forests are built using 2000 trees. A total of 30 simulation runs are considered. For every simulation run, the RMSPE ratio and the MAPE ratio are computed. The results are then averaged (Mean) across the simulation runs, and the minimum (Min) and maximum (Max) of the ratios across the simulation runs are computed

References

1. Andersen, T.G., Bollerslev, T.: Answering the skeptics: yes, standard volatility models do provide accurate forecasts. Int. Econ. Rev. 39(4), 885–905 (1998)
2. Barndorff-Nielsen, O.E., Kinnebrouk, S., Shephard, N.: Measuring downside risk: realised
semivariance. In: Bollerslev, T., Russell, J., Watson, M. (eds.) Volatility and Time Series Econo-
metrics: Essays in Honor of Robert F. Engle, pp. 117–136. Oxford University Press (2010)
3. Ben Nasr, A., Lux, T., Ajmi, A.N., Gupta, R.: Forecasting the volatility of the Dow Jones
Islamic stock market index: long memory vs. regime switching. Int. Rev. Econ. Finance 45(1),
559–571 (2016)
4. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
5. Bouri, E., Gkillas, K., Gupta, R., Pierdzioch, C.: Forecasting realized volatility of Bitcoin: the
role of the trade war. Comput. Econ. (2020, forthcoming)
6. Cenesizoglu, T., Timmermann, S.: Do return prediction models add economic value? J. Bank.
Finance 36, 2974–2987 (2012)
7. Ciner, C.: Do industry returns predict the stock market? A reprise using the random forest. Q.
Rev. Econ. Finance 72, 152–158 (2019)
8. Clark, T.D., West, K.D.: Approximately normal tests for equal predictive accuracy in nested
models. J. Econom. 138, 291–311 (2007)
9. Corsi, F.: A simple approximate long-memory model of realized volatility. J. Financ. Econom.
7, 174–196 (2009)
10. Elliott, G., Komunjer, I., Timmermann, A.: Estimation and testing of forecasting rationality
under flexible loss. Rev. Econ. Stud. 72, 1107–1125 (2005)
11. Engle, R.F., Rangel, J.G.: The Spline-GARCH model for low-frequency volatility and its global
macroeconomic causes. Rev. Financ. Stud. 21(3), 1187–1222 (2008)
12. Engle, R.F., Ghysels, E., Sohn, B.: Stock market volatility and macroeconomic fundamentals.
Rev. Econ. Stat. 95(3), 776–797 (2013)
13. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29,
1189–1232 (2001)
14. Friedman, J.H.: Stochastic gradient boosting. Comput. Stat. Data Anal. 38, 367–378 (2002)
15. Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for Generalized Linear Models
via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010). https://www.jstatsoft.org/v33/i01/
16. Giot, P., Laurent, S., Petitjean, M.: Trading activity, realized volatility and jumps. J. Empir.
Finance 17(1), 168–175 (2010)
17. Greenwell, B., Boehmke, B., Cunningham, J., GBM Developers: gbm: Generalized
Boosted Regression Models. R package version 2.1.8.1 (2022). https://CRAN.R-project.org/
package=gbm
18. Gu, S., Kelly, B., Xiu, D.: Empirical asset pricing via machine learning. Rev. Financ. Stud. 33,
2223–2273 (2020)
19. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining,
Inference, and Prediction, 2nd edn. Springer, New York, NY (2009)
20. Hong, H., Lim, T., Stein, J.C.: Bad news travels slowly: size, analyst coverage and the prof-
itability of momentum strategies. J. Finance 55, 265–295 (2000)
21. Hong, H., Torous, W., Valkanov, R.: Do industries lead stock markets? J. Financ. Econ. 83,
367–396 (2007)
22. Hong, H., Torous, W., Valkanov, R.: Note on “Do industries lead stock markets?”. http://rady.
ucsd.edu/docs/faculty/valkanov/Note_10282014.pdf (2014)
23. Meinshausen, N.: Quantile regression forests. J. Mach. Learn. Res. 7, 983–999 (2006)
24. Mittnik, S., Robinzonov, N., Spindler, M.: Stock market volatility: identifying major drivers
and the nature of their impact. J. Bank. Finance 58, 1–4 (2015)
25. Müller, U.A., Dacorogna, M.M., Davé, R.D., Olsen, R.B., Pictet, O.V.: Volatilities of different
time resolutions—analyzing the dynamics of market components. J. Empir. Finance 4, 213–239
(1997)

26. Patton, A.J.: Volatility forecast comparison using imperfect volatility proxies. J. Econom. 160,
246–256 (2011)
27. Poon, S.-H., Granger, C.W.J.: Forecasting volatility in financial markets: a review. J. Econ. Lit.
41(2), 478–539 (2003)
28. Pradeepkumar, D., Ravi, V.: Forecasting financial time series volatility using Particle Swarm
Optimization trained Quantile Regression Neural Network. Appl. Soft Comput. 58, 35–52
(2017)
29. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for
Statistical Computing, Vienna, Austria (2023). https://www.R-project.org/
30. Rangel, J.G., Engle, R.F.: The Factor-Spline-GARCH model for high and low frequency cor-
relations. J. Bus. Econ. Stat. 30(1), 109–124 (2011)
31. Rapach, D.E., Strauss, J.K., Wohar, M.E.: Forecasting stock return volatility in the presence of
structural breaks. In: Rapach, D.E., Wohar, M.E. (eds.) Forecasting in the presence of structural
breaks and model uncertainty. Frontiers of Economics and Globalization, vol. 3, pp. 381–416.
Emerald, Bingley, United Kingdom (2008)
32. Rapach, D.E., Zhou, G.: Forecasting stock returns. In: Elliott, G., Timmermann, A. (eds.)
Handbook of Economic Forecasting. Volume 2A. Elsevier, Amsterdam, pp. 328–383 (2013)
33. Rapach, D.E., Strauss, J.K., Tu, J., Zhou, G.: Industry return predictability: a machine learning
approach. J. Financ. Data Sci. 1(3), 9–28 (2019)
34. Reschenhofer, E., Mangat, M.K., Stark, T.: Volatility forecasts, proxies and loss functions. J.
Empir. Finance 59, 133–153 (2020)
35. Tse, Y.: Do industries lead stock markets? A reexamination. J. Empir. Finance 34, 195–203
(2015)
36. Salisu, A.A., Gupta, R., Ogbonna, A.E.: A moving average heterogeneous autoregressive model
for forecasting the realized volatility of the US stock market: evidence from over a century of
data. Int. J. Finance Econ. (2020)
37. Tibshirani, J., Athey, S., Wager, S.: grf: Generalized Random Forests. R package version 2.2.1.
https://CRAN.R-project.org/package=grf (2022)
38. Zhang, Y., Tse, Y., Zhang, G.: Return predictability between industries and the stock market
in China. Pac. Econ. Rev. 27(2), 194–220 (2022)
Chapter 19
Machine Learning Techniques for Corporate Governance
Deepika Gupta

Abstract Even with much growth, development, evolution, advancement and contribution of governance-mechanism studies on firm and market performances, there is still no clear consensus on governance issues like CEO duality, board diversity, CSR impact and other parameters. A need is felt to harmonize the various concepts, theories and models of corporate governance to meet the idiosyncratic needs of a firm. There is a need for new data sources, technologies and research methods as a customized approach to address the gaps in the existing literature, find better constructs to understand the intricacies of governance mechanisms, and help resolve conflicting or unexplored results. One such trajectory is machine learning techniques, which can tailor the data collection, and process and analyze various sources for decision-making processes. This chapter aims to provide a creative integration of corporate governance mechanisms with machine learning techniques in order to enhance managerial capabilities resulting in competitive advantages. It looks at the areas wherein technology can provide the required core competencies by providing solutions to enhance accurate and effective managerial decision-making, reduce managers' opportunistic behaviour and thereby improve a firm's ability to handle different uncertainties in business. This move should be towards a more universal and holistic approach through synergistic intelligence to help shape governance mechanisms and decision making in the years to come.

Keywords Corporate governance · Machine learning · Artificial intelligence · Public funds · Ownership · Board · Decision-making

19.1 Introduction

Technology plays an indispensable part in the lives of every human being. It has also made its impact on businesses across various decision-making processes. One such general-purpose technology that has had considerable impact is artificial intelligence

D. Gupta (B)
Indian Institute of Management, Visakhapatnam, India


(AI). Though not a new term, as it was coined in the 1950s [105], it is at the top of the agenda for many businesses [34] and a part of corporate board discussions today. Times have changed since 1967, when Peter Drucker described the computer as a total 'moron' that only executes commands and does not make decisions [41]. This view limited the perceived importance of artificial intelligence for corporate management and governance. Today, AI is viewed as a 'general solution technology' for almost any managerial, commercial or even societal issue or problem [65].
Corporate governance¹ has always generated interest both inside and outside academia due to various scandals like Enron, WorldCom, Satyam Computers and many more. These scandals highlight the failures and shortcomings in governance systems, thereby resulting in significant losses for various stakeholders [124]. Corporate governance systems have been extensively reviewed and revised, in some cases repealed and replaced, and in other cases supplemented with entirely new regulations. These include the Sarbanes-Oxley Act of 2002 in the United States of America and the new Companies Act of 2013 [120] that replaced the earlier 1956 Act [119] in India. The reforms in corporate governance were carried out to ensure and restore investor confidence and reduce investment risks [124]. Governance depends on both country-level as well as firm-level mechanisms. The country-level governance mechanisms include a country's laws, its culture and norms, and the institutions that enforce the laws [95, 110]. Firms must adhere to the governance environment not by choice but due to procedural, regulatory, and statutory requirements.
The literature on corporate governance brings out interesting empirical results that remain either mixed or inconclusive in nature. One of the reasons, as highlighted by [124], is the application of a one-size-fits-all governance solution to every type of firm. This warrants the need to explore governance mechanisms at a more granular, firm-level scale. This influences the choice of research methods in this domain as well.
Traditionally, there had been a few performance metrics, wherein Tobin's Q, return on equity (ROE), return on assets (ROA) and economic value added (EVA) were the most used ratios in various studies on corporate governance [101]. Thereafter came the use of 'composite measures' like commercial rating indices [26, 58, 106], corporate governance quality [15], corporate governance scores [113] and other complex and comprehensive versions of governance mechanisms. There was also a rise of new performance measures that extended to efficiency indicators like total sales, sales-per-employee, asset turnover and others [56], innovation measures like research and development investments [60], cost of capital [122], diversity and corporate social responsibility [45], disclosures [12], real earnings management [107], tax avoidance [70] and many others.
With respect to research methods, regression models with certain sets of variables were largely used to understand various relationships among governance variables and corporate and market performances. Such methods resulted in various econometric issues like endogeneity [124], and many a time approaches were devised to control for such issues. Qualitative research picked up for studying the human aspects of governance [90] through techniques like interviews, archival data, observation, surveys and others. Text analysis, such as 'tone measures' [88] of governance reports and filings, was another research method used to study governance mechanisms.

¹ Certain parts of this chapter are excerpts from the author's doctoral thesis titled 'Corporate governance and initial public offerings' (2015) at the Indian Institute of Management, Bangalore, India.
Advancements in econometrics and other research methodologies helped researchers adapt to new data analysis methods like data envelopment analysis (DEA) and the stochastic frontier approach (SFA) [56], and logit/probit and structural equation models, usually with multi-year data [14]. In order to meet the challenges of endogeneity issues along with causality questions, lagged-variable or instrumental-variable approaches were adopted. As models became larger and more complex, the use of principal component analysis, qualitative comparative analysis [29], and techniques like fuzzy sets, fuzzy logic and governance bundles was explored [124].
Despite such growth, development, evolution, advancement and contribution of governance-mechanism studies on firm and market performances, there is no clear consensus on governance issues like CEO duality, board diversity, CSR impact and other parameters. Even today, a need is felt to harmonize the various concepts, theories and models of corporate governance to meet the idiosyncratic needs of a firm. There is a need for new data sources, technologies and research methods for a customized approach to help academicians and researchers address the gaps in the existing literature, find better constructs to understand the intricacies of governance mechanisms, and help resolve conflicting or unexplored results. One such trajectory is artificial intelligence, which can tailor the data collection, and process and analyze various sources for decision-making. The chapter looks to discover this unexplored path as an avenue for future research in the governance literature, which now needs the intervention of other domains as well for more mature decision-making and corporate policies.

19.2 Aims and Objectives of the Chapter

The chapter aims to examine the collective governance aspects of political, economic, cultural, social, and other changes [83] associated with the new concept of machine learning techniques. However, such a relationship is also dependent on the legal systems, their enforcement, and other formal institutions that rely on the cultural factors of that particular nation [108]. It utilizes multiple perspectives to dissect the macro-level as well as the micro-level aspects of governance, as this is an essential tool in the entrepreneurial process [21] of any enterprise.
The objective of the chapter is to decipher what lies in the future at the intersection of the new technology usage of machine learning techniques and corporate governance mechanisms. This chapter aims to provide a creative integration of corporate governance mechanisms with machine learning techniques in order to enhance managerial capabilities resulting in competitive advantages. It looks at the areas wherein technology can provide the required core competencies by providing solutions to enhance accurate and effective managerial decision-making, reduce managers' opportunistic behaviour

and thereby improve a firm's ability to handle different uncertainties in business. Studies on the use of machine learning tools on corporate governance mechanisms are new and limited in number, and therefore, this chapter broadly explores this intersection, thereby contributing to the strategy, finance, entrepreneurship and information systems literatures.
The chapter is arranged as under. A detailed literature review of machine learning techniques and corporate governance is presented. Various theoretical lenses are then examined in detail. Thereafter, the interplay of machine learning techniques with corporate governance mechanisms is explored to facilitate the trend of 'where we might go next' in comparison with 'how things got started' and 'where we are' [124]. Lastly, the chapter offers some thoughts and recommendations on what lies in the future of this interesting juxtaposition of using digital intervention in the corporate governance literature.

19.3 Machine Learning and Its Techniques

As we are aware, artificial intelligence is a technology in the making. Artificial intelligence is defined as 'the activity devoted to making machines intelligent, and intelligence is that quality that enables an entity to function appropriately and with foresight in its environment' ([94], p. 13). This technology is on a dynamic trajectory and is on the path to further development [5, 118]. Artificial intelligence, at the highest level [65], encompasses:
(a) Rule-based logic—Rule-based logic requires humans to fully understand a given context and define the rules that the machine should execute.
(b) Machine learning—Machine learning enables the machine to learn and derive conclusions based on a set of data and learning algorithms without requiring context understanding. In the end, machine learning could be considered rule-based, if only for the rules underlying the learning algorithms.
(c) Mind machine learning—Mind machine learning is the futuristic category or wave of artificial intelligence that can overcome the separation of mind and machine; it is based on the assumption that machines and minds can be connected. Marsh [87] forecast the emergence of 'neuromorphic chips' or other ways to connect the mind to the machine to unleash new potentials.
This taxonomy of artificial intelligence with analytics is brought out in Fig. 19.1. It can be observed from Fig. 19.1 that machine learning includes traditional machine learning and deep learning as two further major divisions within this wave of artificial intelligence. The traditional approaches to machine learning require the extraction of features by humans. Deep learning is currently the most popular approach, and it draws its name from the architecture of 'deep (or multi-layered) artificial neural networks—software that roughly emulates the way neurons operate in the brain' [51]. Within deep learning, three approaches are usually identified.

[Figure 19.1 here: a nested taxonomy in which Analytics contains Artificial Intelligence, which in turn comprises rule-based logic, machine learning (split into traditional machine learning and deep learning, the latter covering supervised, reinforcement and unsupervised learning) and mind machine learning.]

Fig. 19.1 The taxonomy of artificial intelligence with analytics [65]

All these approaches assume a separation of the machine from the mind [65]. These are as under (a minimal illustrative sketch follows the list):
(1) Supervised learning—Supervised learning is the most commonly used approach as of today, and it requires well-structured and labelled training data to train the algorithms and improve AI-driven applications such as image recognition or translation.
(2) Reinforcement learning—Reinforcement learning is based on the philosophy of trial and error, which is often used in board game simulations. The major challenge of reinforcement learning lies in the large number of trial rounds required to achieve good results.
(3) Unsupervised learning—The most challenging but most promising approach is unsupervised learning, where the algorithms are designed to 'learn directly from unstructured data coming from their environments' [51].
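To make the contrast between these approaches concrete, the following minimal Python sketch (an illustration relying on scikit-learn and synthetic data; it is not part of the chapter's original material, and reinforcement learning is omitted because it requires an interactive environment) fits a supervised classifier on labelled data and an unsupervised clustering model on the same data without labels:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 200 observations, 2 features, 3 latent groups
X, y = make_blobs(n_samples=200, centers=3, random_state=0)

# Supervised learning: uses the labels y as training data
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:5]))  # predicted class labels

# Unsupervised learning: discovers structure without any labels
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:5])      # cluster assignments learned from X alone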
Machine learning technologies are in vogue today and are currently being researched, though all the approaches fall under the purview of AI. It is also essential to understand how the scale (human intelligence) and the process of intelligence development (learning) compare between the mind and the machine. Both human and machine learning cycles assume that decisions are based on predictions of possible outcomes. Prediction takes information, called 'data', and uses it to generate information that we do not have [3].
So, in human learning, predictions are based entirely on input data; in machine learning, however, it is assumed that three types of data—input, training and feedback data—exist, as these data have different roles to play in supervised, reinforcement and unsupervised learning, respectively [65]. The human and machine cycles are brought out in Fig. 19.2.
Data is an inherent feature of decision-making, on which the techniques of machine learning are based. Decision making is always about consciously choosing between two or more options. The options can be either binary, for example, yes or no, or multifaceted, for example, options 1, 2 and 3. The choice always depends on the criteria chosen [65].

[Figure 19.2 here: two parallel learning cycles. The machine learning cycle runs from input, training and feedback data through an algorithm to a decision and its outcome; the human learning cycle runs from decision framing and decision sensing through judgement, input data and feedback.]

Fig. 19.2 Human and machine learning cycles [65]

Still et al. [116] have outlined how an informed decision usually follows a similar pattern and thereby distinguish between three phases, that is, conceptualization, information, and prediction, as in Fig. 19.2. In order to recommend the use of technology, or rather machine learning techniques, in business decisions, Hilb [65] advocates Stacey's [115] taxonomy of four different types of decisions based on the degree of certainty and agreement, as in Fig. 19.3.
Though machine learning techniques also rely on input data, training data is crucial for supervised learning, while feedback data is vital for reinforcement and unsupervised learning. The challenge now is to apply such logic to the decision-making processes of firms through the use of machine learning techniques. It is important to intertwine the decision types with the machine learning approaches to understand how decision-making processes can work in corporates. This is explained in Table 19.1.
In a nutshell, machine learning techniques provide potential for transparency, help to distinguish between causality and correlation, and are also an alternative to existing traditional research methods in terms of providing efficiency for valid predictions given the complexity and volumes of data in any domain. In recent times, machine learning mechanisms are widely used across various sectors and fields like finance, accounting, healthcare, logistics, supply chain management and many more, where new data sets, especially big data, are available and there is a huge literature around these domains.

[Figure 19.3 here: a map with Certainty on the horizontal axis (high to low) and Agreement on the vertical axis (high to low); decisions range from common (high certainty, high agreement) through complicated and complex to chaotic (low certainty, low agreement).]

Fig. 19.3 The four decision types [65]



Table 19.1 Machine learning approaches and decision types (adapted from Hilb [65])

Approaches | Decision type | Reason | Remarks
Supervised learning | For common decisions | Much less effective for other decision types like complicated, complex or chaotic | Given the need for relevant training data
Reinforcement learning | Effective in automating complicated decisions based on past routines | Relies heavily on trial and error and thus on feedback data | Not effective in handling complex or chaotic decisions
Unsupervised learning | For complex decisions | Provides clues | For chaotic decisions, it is difficult to rely on any known machine learning approach

Machine learning, in fact, is a system that uses algorithms that can process large data sets, detect patterns, and improve its ability to analyze information over time and with more data, and these models are found to have higher predictive accuracy than statistical models [124]. Its usage in corporate governance is limited, and this chapter aims to explore these new avenues of research methods.

19.4 Corporate Governance

Defining corporate governance is a challenging mission. There are variations in the definition of corporate governance because of differences in perspectives regarding its ambit. Cadbury [17] defines corporate governance as 'the system by which companies are directed and controlled'. Shleifer and Vishny [112] define it as a 'set of mechanisms relevant to economic efficiency due to its influence over the decision of investors to provide finance, debt or equity, to the firm'. This definition highlights the importance of the governance structure in assuring a significant flow of capital to enable financing for firms. Zingales [128] viewed governance from a broader perspective and defined it as 'the complex set of constraints that shape the ex-post bargaining over the quasi-rents generated by the firm'. Gillan and Starks [55] defined corporate governance as 'the system of laws, rules, and factors that control operations at a company.' Some economists view corporate governance as a nexus of contracts among owners, who essentially pursue private means and run their corporations for their own self-interests without explicit obligations to society. Leung et al. [83] defined corporate governance mechanisms as the collective consequences of political, economic, cultural, social, and other changes, and thus extended the horizon from formal to informal components. Larcker et al. [80] viewed corporate governance as 'the set of mechanisms that influence the decisions made by managers when there is a separation of ownership and control.' A similar view is

highlighted by Vito and Trottier [124]. Thomsen [121] defined it as a 'system that is a composite of ownership, boards, incentives, company law, and other mechanisms.'
Corporate governance definitions are, thus, closely tied to different paradigms or
ways of conceptualizing the organization or the firm.
Corporate governance rules and norms originate and are supported by the legal
institutions of the firm's home country. In prior studies, researchers continued to debate what factors best explain the diversity of corporate governance across countries. The studies involving international comparisons largely took place after
the 1990s. These included the influential works of La Porta et al. (abbreviated as
LLS) [72–74] and LLS with Vishny (abbreviated as LLSV) [75–79] and LLS with
Djankov [37]. These studies largely focused on the differences among the legal
systems of countries, various shareholders’ and creditors’ rights, and enforcement
systems [10]. These studies also inspired a large body of works on international
comparisons. Additionally, a substantial body of research showed that cross-firm
differences in governance have a substantial effect on firm values and performances
(such as [9, 57]). However, such works largely looked at the effects of governance
mechanisms on valuation and performances of the entire population of the firms.
The important aspect here is that, irrespective of the definition used, researchers ([54], for instance) have viewed corporate governance mechanisms as falling into two groups:
(a) Mechanisms external to firms and
(b) Mechanisms internal to firms.
Acharya et al. [1] argued that external governance (even if crude and uninformed)
and internal governance complement one another, to improve efficiency and to ensure
that firms have substantial values.

19.4.1 External Governance Mechanisms

Firms have to deal with various entities in the external environment [50, 54] and
are required to operate under the legal and regulatory environments of the particular
country where they are located. National systems of corporate governance differ in
terms of their institutional arrangements, and these differences shape the possibilities for change or the diffusion of practices from one country to another [4]. Comparative corporate governance was thereby defined as 'the study of relationships between parties with a stake in the firm and how their influence on strategic corporate decision making is shaped by institutions in different countries'.
The influential works of LLSV [75–79] documented significant differences in the
levels of investor protection, ownership concentration, dividend policies, creditor
rights, and enforcement abilities across countries, thus, attesting to variations in
several country-level institutional parameters. The similarities and differences among
corporate governance practices at the national level cater to the ‘macro’ question and

the association with particular firm-level outcomes such as firm performances and
stock market returns addresses the ‘micro’ question.
At the national level, the governance environment matters for the size and the
extent of a country’s capital markets because good governance would protect poten-
tial investors from expropriation by entrepreneurs [75]. At the macro level, the corpo-
rate governance mechanisms are the economic and legal institutions that can be
altered through political processes, if required, for the betterment and scope of the
capital market. Additionally, Chen et al. [25] highlighted the criticality of research
from cross-cultural perspectives. At the micro level, corporate governance mecha-
nisms define the power-sharing relationships between investors and the founders of
the firms [57].
The institution theory-based works by Peng [96, 97] and Peng et al. [98] argued
for the institution-based view as the third leg of the strategy tripod, the other two
being industry organization and resource-based views. On the one hand, economists
(such as in the LLSV studies) mostly focused on formal laws, rules, and regulations, and sociologists (for instance, Meyer and Rowan [36, 91]) paid more attention to informal cultures, norms, and values; on the other hand, scholars such as North [95] and Scott [110] supported a complementary view where the research on the impact
of institutions investigates both formal as well as informal components. The new
institutional perspective [126] also attempts to focus on social and legal norms and
rules that underlie economic activities.
Institutions are commonly known as the ‘rules of the game’ [98]. The more formal
institutions are defined as ‘the humanly devised constraints that structure human
interaction’ and ‘regulative, normative and cognitive structures and activities that
provide stability and meaning to social behaviour’ [36, 91, 95, 110]. Thus, institutions
are broadly divided into formal and informal components that are complementary
to each other. The remarkable consensus here is that ‘institutions matter’ as a core
proposition [98].
Culture is argued to be related to governance because the effectiveness of legal
systems, their enforcement, and other formal institutions largely depend on cultural
factors [108]. In recent years, there has been growing recognition that culture affects
both economic exchange and outcomes by affecting expectations and preferences
[11]. Studies now show that perceptions rooted in culture are important determinants
and thus affect the level of trust and the nature of financial contracting.
Culture is often defined as a system of shared values, beliefs, and attitudes
that influences individual perceptions and behaviours. The level of trust encour-
ages economic exchange, and this trust leads investors to invest in even the IPOs of
totally unknown firms. Trust also affects stock market participation, and these cultural
aspects have a significant impact on a wide variety of cross-border economic trans-
actions. Thus, following Hofstede [66, 67], culture is often conceptualized through
the construction of national averages, which are used to create something akin to the
personality profile of an “average person” in a society. These latent propensities of
individuals are then argued to assert some causal influence on the economic organi-
zation. Cultural value emphases shape and justify individual and group beliefs, actions, and goals. External governance has an influence on network centralities,

social network structures, ties, strategic alliances and other contractual relations [85]
that educate firms to better position themselves and their resources in markets for competitive
advantage.
On a macro-level, national cultural practices influence the institutional environ-
ment, which in turn has an influence on corporate governance practices [32]. Culture
plays an indirect role in shaping corporate governance mechanisms. Institutional
arrangements and policies, norms, and everyday practices express the underlying
cultural value emphasis in societies [109].
In a nutshell, the external mechanisms protect all the stakeholders through the legal
system, the market for corporate control, the managerial labour market, monitoring
by institutional investors, and disciplinary measures arising from financial debt [124],
social network structures, network centrality and innovation [85].

19.4.2 Internal Governance Mechanisms

The internal governance mechanisms are sub-categorized into ownership structures and the Board of directors. The internal governance mechanisms have been largely framed on the basis of agency theory [69], signalling theory [82], resource dependence theory [100], stewardship theory [33], institutional theory [96–98] and integrative social contracts theory [38].

19.4.2.1 Ownership Structures

Jensen and Meckling [69] suggested that equity ownership aids in the alignment
of the interests of managers with those of the shareholders and mitigating agency
costs. Alignment refers to the effects of insider ownership and control refers to the
effects of outsider ownership [30]. Morck et al. [92] suggested managerial equity
ownership to be beneficial at lower levels but negative at higher levels, indicating
that as insider (retained) ownership increases, it affords managers greater power
that facilitates their entrenchment. Outside ownership in the form of institutional
holdings and block holdings are proposed as the solution to the agency problem [30].
These shareholders are better able to internalize the costs associated with monitoring
management. They have a general interest in profit maximization and enough control
over the assets of firms to have their interests respected [111].
Institutional investors: Institutional investors form one of the important groups
of investors in any firm. These can be either domestic or foreign in nature. It is
generally argued that institutional investors possess a lot of private information about
the companies. They are seen as important players in any capital market across
the world. Institutional investors prefer large and liquid stocks with good corporate
governance practices, especially in countries where country-level investor protection
and the quality of institutions are weak [71]. However, the investment patterns of
these shareholders vary across time, between different countries, and also within the

same countries. Studies show that institutional investors have almost doubled their
investments in firms, leading to an increase in the prices of such firms.
Institutional investors can have their representatives sit on the Boards and monitor
the decision-making processes, can have voting rights, and can also monitor executive
compensation contracts [62], dependent on the percentage of their shareholding of the
company’s equity. The institutional investors prefer to invest in firms with superior
past financial performance, lower volatility of share price, high trading liquidity,
larger size, longer listing history, better public funds utilization [22, 24, 59] and
others.
Kurshed et al. [71], in their study of a U.K. sample set, found institutional ownership to be negatively related to directors' ownership and positively related to the composition of the Board of directors. They found that U.K. institutional investors prefer smaller firms and firms with smaller boards, shorter listing histories, and low trading liquidity; these findings were in contrast to the results of the U.S. studies.
between shareholders and managers; they also influence, either positively or nega-
tively, compensation structures through their preferences [62]. Institutional investors
have influence on firm performances, signal firm’s prestige and quality and are one
of the important corporate governance mechanisms who can facilitate in monitoring
the decision-making processes.
Retained ownership: Retained ownership in firms is one of the key indicators
of the control of owners and managers of a firm. High concentration of retained
ownership mitigates various types of agency conflicts. Ownership concentration is,
therefore, an important governance parameter that enhances the firm’s performances
and reduces the chances of funds raising discounts arising from agency conflicts. One
of the key decisions that the firm’s owners and managers control in any initial public
offering is what percentage of the firm to sell [93]. This could act as an important
signalling mechanism to the investors to assess the reduction in the risks as foreseen
by them. Higher retained ownership helps to increase the level of confidence, builds
faith in the investors, and also helps to reduce agency costs. This also sends signals
about the owners’ confidence in the future prospects of the firm.
Retained ownership pattern is an important governance structure that influences
the firm’s performance in relation to the stated intentions of its public offerings [22].
Since high ownership retention helps in mitigating agency conflicts and investors
view such firms positively [102], promoters seek to have efficient decision-making
processes and also aspire to utilize these funds in order to retain confidence and the
positive signals about the firm in the market.

19.4.2.2 Board of Directors

The Board of directors has been viewed as the heart of corporate governance. The
directors are elected by the shareholders and have a fiduciary obligation towards
them. They are also responsible for providing strategic directions on investments and
financing decisions and for monitoring the management of the firm [30, 54]. Boards consist of a mix of inside and outside directors. Inside directors are officers of the firm
and possess intimate knowledge about the firm’s activities. Outside directors have
no substantive relationship with the firm; they owe their position on the Board to the specific expertise they possess in areas that are significant to the firm. Outside
directors are independent in nature. There is no consensus that the differences in
Board independence result in improved corporate performance [30]. Whether the
Board is truly independent in nature is still debated. However, Dalton et al. [30] argued
that resource dependence [100] or a resource-based perspective [7, 125] values the
networking potential and innovation diffusion potential of the Board’s independence
while the agency perspective considers independence to be a threat given that such
directors also serve on the Boards of other firms.
Studies on Board size and composition have produced mixed evidence of the
relationship between Board composition and corporate financial performance [30].
The leadership structure of the Board has been another important element and there is
a large body of literature dedicated to CEO duality, and the nature of the chairperson
of the Board—whether independent or from the founding family of the firm. The stewardship theory believes in the notion of the unity of command and views CEO duality as beneficial to the firm; however, from an agency perspective, issues related to the separation of the CEO from the Board chairperson remain unsettled [30].
Family-managed and non-family-managed firms: The composition of the Board,
given the ownership structure of the firm, is an important governance mechanism for
any company. When a firm has members of a family (or families) as the key members on the Board, it can be either owned and managed by the family members or managed by non-family members.
of agency theory and has been the subject of numerous debates and studies. Family-
managed and non-family-managed firms continue to attract interest as composition
of the Board is an internal corporate governance mechanism in any public fund-
raising context. Family-owned and family-managed firms and non-family-managed
firms have different impacts on the long-run performance of the firms.
McConaughy et al. [89] found that family-controlled firms have greater market
value and operate more efficiently than other firms because the costs of monitoring
are less in such firms. The choice between concentrated or dispersed ownership and
votes is determined by the size of private control benefits [42].
In India, the majority of companies continue to have a large number of family
members on their Boards. The owners of family firms generally rely exclusively on
family members because they find it difficult to delegate to outsiders, have insuffi-
cient knowledge of formal management techniques, fear losing control, or believe
that professionalization comes at unnecessary costs. In turn, non-family managers
decide to stay away from family firms because they are likely to offer outsiders
limited potential for professional growth, restrict their roles to that of a tutor, coun-
sellor, or confidant, and exclude them from succession [35]. Studies have shown
that non-family managers play an important role and have a positive impact on firm
performance due to their formal business training and experience, their cultural
competence, and the fact that they are not tied by emotional connections to the family
and the firm.
In a family-managed firm, the founding family has higher returns and private
benefits than the other large shareholders [42]. In addition to this, the founding
family also has cash flow rights and voting rights that are used in conjunction in
a controlled shareholder structure (as in family-managed firms) but not otherwise
[8]. As the family members managing such firms continue to exercise considerable
control over them, they can also engage in opportunistic behaviour and influence the
monitoring and decision-making processes, even while trying to operate within the
ethical boundaries of social contracts.
CEO duality: An important internal governance mechanism is CEO duality, where
one person plays two roles—one as the Chief Executive Officer (CEO) of the firm
and the other as the Chairperson of the Board of directors. Non-duality implies
that different individuals serve as the CEO and the Chairperson [6]. CEO duality
[48] is one of the main factors in corporate governance that has been extensively
debated across the world. In a study using samples from the U.S., the U.K., and
Japan, Dalton and Kesner [31] found that 30% of the U.K. companies in the sample
had CEO duality, while 82% of the U.S. firms in the sample had CEO duality. With
subsequent changes in the regulations, U.S. firms have been adopting non-duality, but
adoption remains low compared to the U.K., where the Combined Code recommends
having different individuals as the CEO and the Chairman. In India, there was
historically no prohibition on CEO duality and the decision was left to the firms;
however, SEBI now mandates CEO non-duality for listed firms.
Most of the Anglo-Saxon countries use a one-tier system of Board structure.
Even so, the results of studies in these contexts are mixed, with some countries
choosing unitary leadership structures (CEO duality) and others opting for dual
leadership structures that separate the two jobs. The agency
theory treats CEO duality as undesirable, as it leads to a lack of independence and
vigilance, more agency problems, and thus poor performance. This theory postulates
that CEO duality constrains Board independence and promotes CEO entrenchment.
A centralized leadership authority leads to
management dominance and poor financial performance.
The stewardship theory proposes that CEO duality works towards unity of
command at the top and avoids confusion regarding who the head is (the CEO or the
Chairperson); this enables timely decision-making, thus positively impacting the
firm’s performance [99]. This theory postulates that when there is CEO duality, firms
reap a number of benefits since the potential for conflicts between the CEO and the
Chairman is eliminated due to the unified company leadership, thus making way for
smoother, more effective, consistent strategic decision-making and implementation
[23]. Boyd [13] provided partial support for both the agency as well as stewardship
perspectives. Several studies addressed the CEO duality-performance relationship
but reported inconsistent results.
Baliga et al. [6] studied CEO duality and the performance of the firms with Fortune
500 companies as the sample for the period 1980–1991 and found weak support for
a link between the two. Peng et al. [99] studied CEO duality and firm performance
during institutional transitions in China and found strong support for the steward-
ship theory and relatively less support for the agency theory. Elsayed [44] focussed
on a sample of Egyptian listed firms and found that CEO duality had no impact
on corporate performance; the results supported both the agency and the stewardship
theories once an interaction term between industry type and CEO duality was
introduced. Ramdani and Witteloostuijn [104] used quantile regression analysis on
samples from Indonesia, Malaysia, South Korea, and Thailand and found a negative
moderating effect of Board size on the positive relationship between CEO duality
and firm performance. Iyengar and Zampelli [68] did not find evidence to support the
contention that CEO duality is a structure that is purposefully chosen for optimizing
performance. In line with the arguments of the agency theory, CEO duality would
result in the concentration of excess power, which could prevent adequate monitoring
[39] and encourage opportunistic behaviour.
Size, diversity, composition, and Chair of the Board: Board size, diversity and
composition are the most important internal governance mechanisms that send
signals to investors about the firm's quality and prestige. Certo [20] suggested that
investors' perceptions of Board prestige signal organizational legitimacy, thereby
reducing the liability of market newness and improving the firm's stock performance.
A larger board delays decision-making processes due to higher levels of uncertainty
and coordination problems. This makes it difficult to arrive at a consensus speedily,
in turn reducing timeliness on various fronts [63], and triggers free-riding [104]
among board members. Smaller boards are easier to manage and capable of taking
quick decisions [39].
A diverse board provides diversity of thought and perspective. A commonly
studied measure of board diversity is the number or proportion of women directors,
and several jurisdictions now mandate the inclusion of women on corporate boards.
Studies relating board diversity to firm performance [47, 52, 86], board effectiveness
[2], firm value [18, 86], earnings reporting quality [114], stock price informativeness
[61], agency costs [53], and directors' educational levels and independence [123]
have produced mixed results.
Further, the Board composition, in terms of the proportion of inside and outside
(independent) directors, is important in the context of regulatory requirements.
Independent directors bring with them diverse experience and expertise that also
signal the firm's prestige and quality. Li and Naughton [84], using a sample of Chinese
firms, argued that the higher the proportion of independent directors on the Board,
the better the firm's long-term performance, because these independent directors have
stronger incentives to work in the interests of stockholders, thereby reducing
information asymmetry.
The Chair of the Board can be either from the founding family or an indepen-
dent person unconnected with the founding family. An Executive Chair (being from
the founding family) wields more executive power and can exercise more opportunistic
behaviour, even while trying to operate within the ethical boundaries of social
contracts, thus influencing the decision-making processes and decision control of the
Board more than a Non-executive Chair would. Various studies
in this context also highlight mixed results with respect to the firm performance and
value to shareholders.
From the definition of corporate governance through its external and internal
mechanisms, there is no consensus: different studies across the world have shown
mixed, though at times significant, results. This creates confusion and anomaly when
relying on the outcomes of these studies in the field of corporate governance,
warranting the use of technology, in the form of machine learning, to develop
common solutions for managers.

19.5 Why is Machine Learning Required for Corporate Governance?

The extant literature on both machine learning (a part of artificial intelligence) and
corporate governance suggests that studies have usually been done independently in
each of these domains by researchers from economics, entrepreneurship, finance, law,
management, operations and accounting backgrounds. The machine learning literature
is relatively new and opens a path to novel research methods, given the big datasets
generated by extensive digitalization. Understanding the solutions buried in such
complex data is a challenge, and machines extend human intelligence in finding them.
Further, the governance literature highlights the importance and impact of gover-
nance structures and mechanisms that produce value for shareholders and also
protect their rights. One reason for this could be the changes in the legal
and economic structures of firms, especially when they go public [40]. The regula-
tory compliances ensure that the systems of control are embedded in the operations
of the companies and are part of their daily cultures. As long as the company is
privately managed with no outside investors or major stakeholders, corporate gover-
nance requirements are not very strictly followed. However, when the company
seeks to offer its shares to the public and seeks stock market listing, detailed corpo-
rate governance mechanisms come into force for the first time. Existing shareholders
who want to sell their stock and prospective investors who want to buy stock have
the marketplace as the ultimate valuing mechanism to determine the final outcome
[102].
One of the most important contributors to corporate governance structures is
investors, because they provide finance when companies bring out their maiden
public issues; therefore, the need to reflect the owners' or shareholders' desires has
typically been the focus of debate regarding corporate governance reforms. The
protection of investors from agency risks resulting from the separation of ownership
and control [69] has been the central preserve of corporate governance recommenda-
tions throughout the world [16]. Following a public issue, in addition to maintaining
the corporate governance requirements, the company has to balance many competing
considerations of and obligations to different classes of stakeholders (such as share-
holders, employees, customers, suppliers, creditors, and others) as well as handle
the wider social responsibilities to the communities in which they operate. Thus,
the firm needs to cater to external (legal and regulatory requirements) and internal
(ownership and board structures) governance mechanisms.
Given that studies on governance mechanisms have been mixed, with little harmony
across varied concepts, it is time that more advanced methodologies are explored for
more harmonious results and conclusions. The growing complexity of external and
internal governance mechanisms makes decision-making activities ever more
demanding and complicated, and it is at this juncture that advanced technologies like
machine learning can prove a great blessing for digging up the nuances hidden
in them.
At this confluence, intelligent machines can come to the rescue: an integrated
perspective can be used to achieve the tripod of desirability, feasibility and
responsibility [65] in understanding the various decision-making activities arising
from corporate governance mechanisms. The essential functions of the board include
strategy formulation, policy making, executive supervision, accountability,
transparency and disclosure [49], or, identified generically, the roles of supervisor,
co-creator and supporter [27]. These functions are in fact decision-making processes
and require a proper understanding of various input data reflecting the operations
of the firm.
As firms grow in size and digitalization takes hold, much of the data generated
remains unstructured and in freer forms such as digitized text, videos, audio and
photographs. All these data are useful as information and for prediction and
decision-making processes. Data from social media can also offer new perspectives
on governance, as it captures the views of multiple stakeholders [124]. Researchers
also want to go beyond primary survey data by substituting archival data [14],
behavioural experiments and field studies [19, 81]. Given such large, novel data sets
and the available computational power, the importance of machine learning
techniques for the corporate governance literature cannot be ignored. The intervention
of such techniques is rather belated, but, as is rightly said, 'better late than never'.
Such techniques can help build consensus where the governance literature offers
divided or one-sided results. This creates a setting in which mind and machine both
compete and complement each other in the pursuit of efficiency and effectiveness.
Hilb [64, 65] argues for a superior combination of man and machine, distinguishing
such synergic intelligence into five scenarios: assisted, augmented, amplified,
autonomous and autopoietic intelligence.
These scenarios are summarized in Table 19.2.
When applied in corporate governance, such synergic intelligence can influence
board practices such as direction, control and power, and other matters like
compensation, compliance, misuse and obligations. This can be achieved in various
ways, such as automated reporting processes, real-time data provision, and the use of
predictive models offering valid scenarios and superior simulations. In the long run,
there might be scenarios in which machines teach machines and human intervention
is no longer required [65].
Table 19.2 Scenarios of synergic intelligence (Adapted from Hilb [65])

Assisted intelligence: Humans make the decision independently, relying on selective decision support. Typical technologies are translation or speech recognition, with AI-driven applications as examples. This scenario is supported by society and usually well regulated.

Augmented intelligence: Humans make the decision independently, relying on more sophisticated solutions that may surpass human intelligence. Examples include detecting outliers in large data and automated reporting. Calls for regulation are at the top of the agenda.

Amplified intelligence: Humans and machines decide together; the machine makes recommendations that must be approved by humans, who can provide additional inputs. It entails the co-existence of mind and machine, with complex expert recommendations as examples. Social debate is required, as this scenario is neither accepted nor provided for in the legal regime.

Autonomous intelligence: Machines make the decision, operating within a predefined range without constant decision inputs, through self-regulating control mechanisms or highly developed robots. Examples include accountability and liability measures. Social and regulatory debates have begun and are required.

Autopoietic intelligence: An artificial entity makes the decision within a certain area and is also able to develop and expand that area over time, marginalizing the necessity and influence of human decision-making. Examples so far belong to science fiction literature, and substantive societal debates are yet to start.

As of today, machine learning techniques and computational linguistics are already in use in accounting, tax-evasion detection, fraud detection, and bankruptcy prediction in the finance sector (see, e.g., [43, 117, 127]). Such novel methods provide insights into meaningful economic effects, and the same can be extended to the less explored corporate governance domain. Text analyses of information such as annual reports, transcribed meetings and conference calls, together with the extraction of board members' names from various public documents, can be used to build directors' network analyses through techniques such as named entity recognition [43, 124]. Recent studies also indicate the inclusion of corporate governance variables such as sustainability [103], reports and proxy statements [129], and directors' selection [46] in machine learning analyses like text analysis, text mining, semantic networks, and decision tree algorithms [28].
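As a rough illustration of the named-entity-recognition route just mentioned, the sketch below extracts person names from document snippets with spaCy's pretrained English model and links names that co-occur in the same document into a graph with networkx. The snippets, names, and co-occurrence rule are illustrative assumptions, not the procedure of the cited studies.

```python
# A minimal sketch of building a directors' network via named entity
# recognition, assuming spaCy's small English model is installed
# (python -m spacy download en_core_web_sm). The snippets are invented,
# not real public documents.
import itertools

import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

documents = [  # hypothetical excerpts from public documents
    "The board re-elected Asha Rao and appointed Vikram Mehta as director.",
    "Vikram Mehta chairs the audit committee alongside Priya Nair.",
]

graph = nx.Graph()
for text in documents:
    # Keep entities spaCy tags as PERSON; treat them as candidate directors
    persons = {ent.text for ent in nlp(text).ents if ent.label_ == "PERSON"}
    # Connect every pair of names that co-occur in the same document
    for a, b in itertools.combinations(sorted(persons), 2):
        if graph.has_edge(a, b):
            graph[a][b]["weight"] += 1
        else:
            graph.add_edge(a, b, weight=1)

# Degree centrality then proxies how connected each director is
print(nx.degree_centrality(graph))
```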

19.5.1 Legal and Ethical Issues

With such potential contribution and impact of machine learning techniques, it is
also essential to address legal and ethical issues, as these are the roots
of corporate governance mechanisms. Issues like accountability, liability, business
judgement, data protection and regime heterogeneity [65] are significant because
corporate governance mechanisms come into force, owing to the basic characteristics
of limited liability and the separation of ownership and management, the moment the
company gets incorporated under statute. Various decisions are made by the
management (Board)
based on available internal and external information and data with results limited due
to bounded rationality and moral hazards (issues due to agency theory). Furthermore,
the use of machine learning approaches depends heavily on the data availability, data
protection and data access regulations to ensure companies can protect sensitive data
needed for strategic decisions. In cases of legal disputes, it is essential to decipher
the 'black boxes' [65] behind these technological applications; corporates also face
the challenge of deciding whether to automate decision-making processes, and at
which managerial levels.
Further, Hilb [65] also raises serious ethical considerations in this context, such as
values, free will, biases due to technological intervention, wealth distribution and the
monopolization of intelligence. There is a fear of biases creeping in, and of
distortions and prejudices arising, when machines imitate human solution-finding
methods that will influence human behaviour in future. The major challenge is
finding corrections and resolutions for scenarios in which intelligence is threatened
with being monopolized and powered by machines. Corporate governance is
based on two philosophical pillars, values and free will, wherein technology takes an
instrumental rather than a value-based view of morality. It is essential for any type of
learning, whether machine or otherwise, to be moral in nature so as to avoid
unintended costs, conflicts and consequences. In addition, free will is essential for
legitimate decision-making systems, and the autonomous control of machines or
technology may limit the freedom of decision-making processes. The legal and
ethical perspectives
need to serve the roles of desirability, feasibility and responsibility, because corporate
governance characteristics need to be understood in terms of values and free will,
embodied in independent directors on the board, committees formed with unbiased
intentions, and the many other statutory and regulatory compliances to be adhered to.
This reveals the levels of complexity and dynamism in the field, wherein the solution
lies in an integrated perspective of corporate governance and all dimensions of
artificial intelligence [65] for sustainability.

19.6 Future Scope of Study

The growing literature on machine learning and corporate governance mechanisms
provides tremendous scope for future academicians, researchers, managers,
practitioners and policymakers alike. The emergence of machine learning techniques
brings both opportunities and challenges in adopting these new avenues, which will
further shape managerial power and network structures and influence firms' strategy,
decision-making and operational efficiencies. Future
scope of studies can explore creative research methods, study variables and other
parameters to integrate insights from internal and external governance mechanisms.
Future work can motivate managers and practitioners towards better decision-making
in highly complex and competitive environments, combining their rich experience
and professional knowledge with the ability to make sense of the results and patterns
generated through machine learning techniques. Policymakers and governments must
take positive steps to promote reforms by strengthening administrative and regulatory
measures around the use of big data alongside corporate interests. Corporate
governance is a broad term, and with machine learning techniques there is immense
scope for future work, as these technologies help overcome the bounded rationality
of humans and broaden the understanding of managers' true motivations and
decision-making behaviours.

19.7 Summary and Conclusion

Applying machine learning techniques to the field of corporate governance should
encourage research to fill gaps, find consensus among various mixed empirical studies,
and solutions for firms that are complex in nature with different ownership structures
across various economies of the world. The chapter seeks to find answers and solu-
tions by exploring new thoughts not only on performance measures [65], theories
of corporate governance [19, 90] but also on new research methods [65] through
machine learning techniques. This can help expand the boundaries of both domains,
enriching them through knowledge sharing and supporting designs that address the
complexity and dynamism involved in these literatures. The move should be towards
a more universal and holistic approach rather than forcing one-size-fits-all solutions
on each firm [65]. Given the limitations of machine learning techniques as well,
synergistic intelligence can help shape governance and decision-making in the years
to come. Firms should discuss the implications of both technology and governance
in understanding, designing and shaping the future of both domains. The multifaceted
nature of corporate governance provides the freedom to innovate, to disrupt, and to
be interwoven with machine learning techniques, within legal and ethical frameworks,
for more meaningful and sustainable decision-making that benefits businesses,
investors, leadership and policymakers alike.

References

1. Acharya, V., Myers, S., Rajan, R.: The internal governance of firms. J. Financ. 66(3), 689–720
(2011)
2. Adams, R., Ferreira, D.: Women in the boardroom and their impact on governance and
performance. J. Financ. Econ. 94(2), 291–309 (2009)
3. Agrawal, A., Gans, J., Goldfarb, A.: Prediction machines: the simple economics of artificial
intelligence, vol. Spring. Harvard Business Press, Boston (2018)
4. Aguilera, R., Jackson, G.: Comparative and international corporate governance. Acad. Manag.
Ann. 4(1), 485–556 (2010)
5. Armour, J., Eidenmueller, H.: Self-driving corporations? ECGI Working Paper Series in Law
(2019).
6. Baliga, B., Moyer, R., Rao, R.: CEO duality and firm performance: What’s the fuss? Strateg.
Manag. J. 17(1), 41–53 (1996)
7. Barney, J.: Firm resources and sustained competitive advantage. J. Manag. 17(1), 99–120
(1991)
8. Bebchuk, L.: A rent protection theory of corporate ownership and control. Working Paper,
Harvard University (1999)
9. Bebchuk, L., Cohen, A., Ferrell, A.: What matters in corporate governance? Working Paper,
Harvard Law School (2004)
10. Bebchuk, L., Weisbach, M.: The state of corporate governance research. Rev. Financ. Stud.
23(3), 939–961 (2010)
11. Bell, R., Moore, C., Filatotchev, I.: Strategic and institutional effects of foreign IPO perfor-
mance: examining the impact of country of origin, corporate governance and host country
effects. J. Bus. Ventur. 27(2), 197–216 (2012)
12. Beyer, A., Cohen, D., Lys, T., Walther, B.: The financial reporting environment: review of the
recent literature. J. Account. Econ. 50(2–3), 296–343 (2010)
13. Boyd, B.: CEO duality and firm performance: a contingency model. Strateg. Manag. J. 16(4),
301–312 (1995)
14. Boyd, B., Adams, R., Gove, S.: Research methodology of governance studies: challenges and
opportunities. Corp. Govern. (Oxford) 25(6), 382–383 (2017)
15. Bozec, R., Bozec, Y.: The use of governance indexes in the governance-performance
relationship literature: International evidence. Can. J. Admin. Sci. 29(1), 79–98 (2012)
16. Burton, B., Helliar, C., Power, D.: The role of corporate governance in IPO process: a note.
Corp. Govern. Int. Rev. 12(3), 353–360 (2004)
17. Cadbury, A.: Code of best practice: report of the committee on the financial aspects of corporate
governance. Gee and Co, London (1992)
18. Campbell, K., Mínguez-Vera, A.: Gender diversity in the boardroom and firm financial
performance. J. Bus. Ethics 83(3), 435–451 (2008)
19. Carcello, J., Hermanson, D., Ye, Z.: Corporate governance research in accounting and auditing:
Insights, practice implications, and future research directions. Audit. J. Pract. Theory 30(3),
1–31 (2011)
20. Certo, S.: Influencing initial public offering investors with prestige: signaling with board
structures. Acad. Manag. Rev. 28(3), 432–446 (2003)
21. Certo, S., Covin, J., Daily, C., Dalton, D.: Wealth and effects of founder management among
IPO-stage new ventures. Strateg. Manag. J. 22(6/7), 641–658 (2001)
22. Certo, S., Holcomb, T., Homes, M.: IPO research in management and entrepreneurship:
moving the agenda forward. J. Manag. 35(6), 1340–1378 (2009)
23. Chahine, S., Tohme, N.: Is CEO duality always negative? An exploration of CEO duality and
ownership structure in the Arab IPO context. Corp. Govern. Int. Rev. 17(2), 123–141 (2009)
24. Chemmanur, T., Hu, G., Huang, J.: The role of institutional investors in initial public offerings.
Rev. Financ. Stud. 23(12), 4496–4540 (2010)
25. Chen, Y., Leung, K., Chen, C.: Bringing national culture to the table: making a difference
with cross-cultural differences and perspectives. Acad. Manag. Ann. 3(1), 217–249 (2009)
26. Cheung, Y., Stouraitis, A., Tan, W.: Does the quality of corporate governance affect firm
valuation and risk? Evidence from a corporate governance scorecard in Hong Kong. Int. Rev.
Financ. 10(4), 403–432 (2010)
27. Cossin, D., Metayer, E.: How strategic is your board? MIT Sloan Business Review, Cambridge
(2014)
28. Creamer, G., Freund, Y.: Learning a board balanced scorecard to improve corporate
performance. Decis. Support Syst. 49(4), 365–385 (2010)
29. Cucari, N.: Qualitative comparative analysis in corporate governance research: a systematic
literature review of applications. Corp. Govern. Int. J. Bus. Soc. 19(4), 717–734 (2019)
30. Dalton, D., Hitt, M., Certo, S., Dalton, C.: The fundamental agency problems and its
mitigation. Acad. Manag. Ann. 1(1), 1–64 (2007)
31. Dalton, D., Kesner, I.: Composition and CEO duality in boards of directors: an international
perspective. J. Int. Bus. Stud. 18(3), 33–42 (1987)
32. Daniel, S., Cieslewicz, J., Pourjalali, H.: The impact of national economic culture and country-
level institutional environment on corporate governance practices: theory and empirical
evidence. Manag. Int. Rev. 52(3), 365–394 (2012)
33. Davis, J., Schoorman, F., Donaldson, L.: Toward a stewardship theory of management. Acad.
Manag. Rev. 22(1), 20–47 (1997)
34. Davenport, T.H., Ronanki, R.: Artificial intelligence for the real world. Harvard Bus. Rev.
96(1), 108–116 (2018)
35. Dawson, A.: Private equity investment decisions in family firms: the role of human resources
and agency costs. J. Bus. Ventur. 26(2), 189–199 (2011)
36. DiMaggio, P., Powell, W.: The iron cage revisited: institutional isomorphism and collective
rationality in organizational fields. Am. Sociol. Rev. 48(2), 147–160 (1983)
37. Djankov, S., La Porta, R., Lopez-de-Silanes, F., Shleifer, A.: The law and economics of
self-dealing. J. Financ. Econ. 88(3), 430–465 (2008)
38. Donaldson, T., Dunfee, T.: Toward a unified conception of business ethics: Integrative social
contracts theory. Acad. Manag. Rev. 19(2), 252–284 (1994)
39. Dowell, G., Shackell, M., Stuart, N.: Boards, CEOs, and surviving a financial crisis: evidence
from the internet shakeout. Strateg. Manag. J. 32(10), 1025–1045 (2011)
40. Draho, J.: The IPO decision: why and how companies go public? Edward Elgar Publishing
Limited, Cheltenham (2004)
41. Drucker, P.: The manager and the moron. McKinsey Q 3(4), 42–52 (1967)
42. Ehrhardt, O., Nowak, E.: The effect of IPOs on German family-owned firms: governance
changes, ownership structure, and performance. J. Small Bus. Manage. 41(2), 222–232 (2003)
43. El-Haj, M., Rayson, P., Walker, M., Young, S., Simaki, V.: In search of meaning: lessons,
resources and next steps for computational analysis of financial discourse. J. Bus. Financ.
Acc. 46(3–4), 265–306 (2019)
44. Elsayed, K.: Does CEO duality really affect corporate performance? Corp. Govern. Int. Rev.
15(6), 1203–1214 (2007)
45. Endrikat, J., De Villiers, C., Guenther, T., Guenther, E.: Board characteristics and corporate
social responsibility: a meta-analytic investigation. Bus. Soc. 60(8), 2099–2135 (2021)
46. Erel, I., Stern, L., Tan, C., Weisbach, M.: Selecting directors using machine learning. Rev.
Financ. Stud. 34(7), 3226–3264 (2021)
47. Erhardt, N., Werbel, J., Shrader, C.: Board of director diversity and firm financial performance.
Corp. Govern. Int. Rev. 11(2), 102–111 (2003)
48. Fama, E., Jensen, M.: Separation of ownership and control. J. Law Econ. 26(2), 301–325
(1983)
49. Fernando, A., Muraleedharan, K., Satheesh, E.: Corporate governance: principles, policies
and practices. Pearson India Education Service Pvt Ltd., Bengaluru (2017)
50. Filatotchev, I., Nakajima, C.: Internal and external corporate governance: an interface between
an organization and its environment. Br. J. Manag. 21(3), 591–606 (2010)
51. Ford, M.: Architects of intelligence: the truth about AI from the people building it. Pack
Publishing, Birmingham (2018)
52. Francoeur, C., Labelle, R., Sinclair-Desgagné, B.: Gender diversity in corporate governance
and top management. J. Bus. Ethics 81(1), 83–95 (2008)
53. Garanina, T., Kaikova, E.: Corporate governance mechanisms and agency costs: cross country
analysis. Corp. Gov. 16(2), 347–360 (2016)
54. Gillan, S.: Recent developments in corporate governance: an overview. J. Corp. Finan. 12(3),
381–402 (2006)
55. Gillan, S., Starks, L.: A survey of shareholder activism: motivation and empirical evidence.
Contemp. Financ. Digest. 2(3), 10–34 (1998)
56. Gitundu, E., Kiprop, S., Kibet, L., Kisaka, S.: Corporate governance and financial perfor-
mance: a literature review of measurements and econometric methods of data analysis in
research. Corp. Govern. 7(14), 116–125 (2016)
57. Gompers, P., Ishii, J., Metrick, A.: Corporate governance and equity prices. Q. J. Econ. 118(1),
107–155 (2003)
58. Gompers, P., Ishii, J., Metrick, A.: Incentives versus control: an analysis of US dual-class
companies. National Bureau of Economic Research, Cambridge, MA (2004)
59. Gompers, P., Metrick, A.: Institutional investors and equity prices. Q. J. Econ. 116(1), 229–259
(2001)
60. Gonzales-Bustos, J., Hernandez-Lara, A.: Corporate governance and innovation: a systematic
literature review. Corp. Ownersh. Control. 13(2), 33–45 (2016)
61. Gul, F., Srinidhi, B., Ng, A.: Does board gender diversity improve the informativeness of
stock prices?. J. Account. Econ. 51(3), 314–338 (2011)
62. Hartzell, J., Starks, L.: Institutional investors and executive compensation. J. Financ. 58(6),
2351–2374 (2003)
63. Hermalin, B., Weisbach, M.: Boards of directors as an endogenously determined institution: a
survey of the economic literature. NBER Working Paper Series, Working Paper 8161.
http://www.nber.org/papers/w8161 (2001)
64. Hilb, M.: Unlocking the board’s data value challenge. Directorship 60–61 (2019)
65. Hilb, M.: Towards artificial intelligence? The role of artificial intelligence in shaping the
future of corporate governance. J. Manage. Govern. 24, 851–870 (2020)
66. Hofstede, G.: Culture’s consequences: international differences in work-related values. Sage,
Beverly Hills, CA (1980)
67. Hofstede, G.: Culture’s consequences: comparing values, behaviors, institutions, and organi-
zations across nations, 2nd edn. Sage, Beverly Hills, CA (2001)
68. Iyengar, R., Zampelli, E.: Self-selection, endogeneity, and the relationship between CEO
duality and firm performance. Strateg. Manag. J. 30(10), 1092–1112 (2009)
69. Jensen, M., Meckling, W.: The theory of the firm: managerial behavior, agency costs and
ownership structure. J. Financ. Econ. 3(4), 305–360 (1976)
70. Kovermann, J., Velte, P.: The impact of corporate governance on corporate tax avoidance—a
literature review. J. Int. Account. Audit. Tax. 36, 100270 (2019)
71. Khurshed, A., Lin, S., Wang, M.: Institutional block-holdings of UK firms: Do corporate
governance mechanisms matter? Eur. J. Financ. 17(2), 133–152 (2011)
72. La Porta, R., Lopez-de-Silanes, F., Shleifer, A.: Corporate ownership around the world. J.
Financ. 54(2), 471–517 (1999)
73. La Porta, R., Lopez-de-Silanes, F., Shleifer, A.: What works in securities laws? J. Financ.
61(1), 1–32 (2006)
74. La Porta, R., Lopez-de-Silanes, F., Shleifer, A.: The economic consequences of legal origins.
J. Econ. Literat. 46(2), 30–44 (2008)
75. La Porta, R., Lopez-de-Silanes, F., Shleifer, A., Vishny, R.: Legal determinants of external
finance. J. Financ. 52(3), 1131–1150 (1997)
76. La Porta, R., Lopez-de-Silanes, F., Shleifer, A., Vishny, R.: Law and finance. J. Polit. Econ.
106(6), 1113–1155 (1998)
77. La Porta, R., Lopez-de-Silanes, F., Shleifer, A., Vishny, R.: Agency problems and dividend
policies around the world. J. Financ. 55(1), 1–33 (2000)
78. La Porta, R., Lopez-de-Silanes, F., Shleifer, A., Vishny, R.: Investor protection and corporate
governance. J. Financ. Econ. 58(1–2), 3–27 (2000)
79. La Porta, R., Lopez-de-Silanes, F., Shleifer, A., Vishny, R.: Investor protection and corporate
valuation. J. Financ. 57(3), 1147–1170 (2002)
80. Larcker, D., Richardson, S., Tuna, I.: Corporate governance, accounting outcomes, and
organizational performance. Account. Rev. 82(4), 963–1008 (2007)
81. Larraza-Kintana, M., Wiseman, R., Gomez-Mejia, L., Welbourne, T.: Disentangling compen-
sation and employment risks using the behavioural agency model. Strateg. Manag. J. 28(10),
1001–1019 (2007)
82. Leland, H., Pyle, D.: Informational asymmetries, financial structure and financial intermedi-
ation. J. Financ. 32(2), 371–387 (1977)
83. Leung, K., Bhagat, R., Buchan, N., Erez, M., Gibson, C.: Culture and international business:
recent advances and their implications for future research. J. Int. Bus. Stud. 36(4), 357–378
(2005)
84. Li, L., Naughton, T.: Going public with good governance: evidence from China. Corp. Govern.
Int. Rev. 15(6), 1190–1202 (2007)
85. Lin, R., Xie, Z., Hao, Y., Wang, J.: Improving high-tech enterprise innovation in big data
environment: a combinative view of internal and external governance. Int. J. Inf. Manage. 50,
575–585 (2020)
86. Mahadeo, J., Soobaroyen, T., Hanuman, V.: Board composition and financial performance:
uncovering the effects of diversity in an emerging economy. J. Bus. Ethics 105(3), 375–388
(2012)
87. Marsh, H.: Can man ever build a mind? Financial Times, London (2019)
88. Martikainen, M., Miihkinen, A., Watson, L.: Board characteristics and disclosure tone (2019).
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3410036. Accessed 3 Oct 2023
89. McConaughy, D., Matthews, C., Fialko, A.: Founding family-controlled firms: performance,
risk and value. J. Small Bus. Manage. 39(1), 31–49 (2001)
90. McNulty, T., Zattoni, A., Douglas, T.: Developing corporate governance research through
qualitative methods: a review of previous studies. Corp. Govern. Int. Rev. 21(2), 183–198
(2013)
91. Meyer, J., Rowan, B.: Institutionalized organizations: formal structure as myth and ceremony.
Am. J. Sociol. 83(2), 340–363 (1977)
92. Morck, R., Shleifer, A., Vishny, R.: Management ownership and market valuation: an empirical
analysis. J. Financ. Econ. 20(1–2), 293–315 (1988)
93. Nelson, T.: The persistence of founder influence: management, ownership and performance
effects at initial public offering. Strateg. Manag. J. 24(8), 707–724 (2003)
94. Nilsson, N.: The quest for artificial intelligence—a history of ideas and achievements.
Cambridge University Press, Cambridge (2010)
95. North, D.: Institutions, institutional change, and economic performance. Harvard University
Press, Cambridge, MA (1990)
96. Peng, M.: Towards an institution-based view of business strategy. Asia Pac. J. Manage.
19(2/3), 251–267 (2002)
97. Peng, M.: Institutional transitions and strategic choices. Acad. Manag. Rev. 28(2), 275–296
(2003)
98. Peng, M., Sun, S., Pinkham, B., Chen, H.: The institution-based view as a third leg for a
strategy tripod. Acad. Manag. Perspect. 23(3), 63–81 (2009)
99. Peng, M., Zhang, S., Li, X.: CEO duality and firm performance during China’s institutional
transitions. Manag. Organ. Rev. 3(2), 205–225 (2007)
100. Pfeffer, J., Salancik, C.: The external control of organizations: a resource dependence
perspective. Harper & Row, New York (1978)
101. Pintea, M., Fulop, M.: Literature review on corporate governance–firm performance relation-
ship. Ann. Facul. Econ. 1(1), 846–854 (2015)
102. Prasad, D., Vozikis, G., Bruton, G., Merikas, A.: “Harvesting” through initial public offerings
(IPOs): the implications of underpricing for the small firm. Entrep. Theory Pract. 20(2), 31–41
(1995)
103. Raghupathi, V., Ren, J., Raghupathi, W.: Identifying corporate sustainability issues by
analyzing shareholder resolutions: a machine-learning text analytics approach. Sustainability
12(11), 4753 (2020)
104. Ramdani, D., Witteloostuijn, A.: The impact of board independence and CEO duality on
firm performance: a quantile regression analysis for Indonesia, Malaysia, South Korea and
Thailand. Br. J. Manag. 21(3), 607–626 (2010)
105. Russell, S., Norvig, P.: Artificial intelligence: a modern approach, 3rd edn. Prentice Hall,
Upper Saddle River (2016)
106. Samanta, N.: Convergence to shareholder primacy corporate governance: evidence
from a leximetric analysis of the evolution of corporate governance regulations in 21 countries,
1995–2014. Corp. Gov. 19(5), 849–883 (2019)
107. Sanad, Z., Shiwakoti, R., Kukreja, G.: The role of corporate governance in mitigating real
earnings management: literature review. In: Annual PwR Doctoral Symposium 2018–2019,
pp. 173–187. KnE Social Sciences, Manama (2019)
108. Sarkar, J., Sarkar, S.: Corporate governance in India. Sage Publications, New Delhi (2012)
109. Schwartz, S.: A theory of cultural value orientations: explication and applications. Comp.
Sociol. 5(2–3), 137–182 (2006)
110. Scott, W.: Institutions and organizations. Sage, Thousand Oaks, CA (1995)
111. Shleifer, A., Vishny, R.: Large shareholders and corporate control. J. Polit. Econ. 94(3),
461–488 (1986)
112. Shleifer, A., Vishny, R.: A survey of corporate governance. J. Financ. 52(2), 737–783 (1997)
113. Shukla, H., Limbasiya, N.: Board effectiveness: an evaluation based on corporate governance
score. Int. J. Bus. Ethics Dev. Econ. 4(1), 40–48 (2015)
114. Srinidhi, B., Gul, F., Tsui, J.: Female directors and earnings quality. Contemp. Account. Res.
28(5), 1610–1644 (2011)
115. Stacey, R.: Managing the unknowable: the strategic boundaries between order and chaos.
Jossey-Bass, London (1992)
116. Still, R., Cundiff, E., Govoni, N.: Sales management: decisions, policies, and cases. Prentice-
Hall, Englewood Cliffs (1958)
117. Tang, X., Li, S., Tan, M., Shi, W.: Incorporating textual and management factors into financial
distress prediction: a comparative study of machine learning methods. J. Forecast. 39(5),
769–787 (2020)
118. Tegmark, M.: Life 3.0. Allen Lane, London (2017)
119. The Companies Act, 1956. www.mca.gov.in
120. The Companies Act, 2013. www.mca.gov.in
121. Thomsen, S.: An introduction to corporate governance: mechanisms and systems. Djof
Publishing, Copenhagen (2008)
122. Toksal, A.: The impact of corporate governance on shareholder value. Doctoral dissertation,
Universität zu Köln (2004)
123. Toumi, N., Benkraiem, R., Hamrouni, A.: Board director disciplinary and cognitive influence
on corporate value creation. Corp. Govern. (Bradford) 16(3), 564–578 (2016)
124. Vito, J., Trottier, K.: A literature review on corporate governance mechanisms: past, present
and future. Account. Perspect. 21(2), 207–235 (2022)
125. Wernerfelt, B.: A resource-based view of the firm. Strateg. Manag. J. 5(2), 171–180 (1984)
126. Williamson, O.: The new institutional economics: taking stock, looking ahead. J. Econ. Literat.
38(3), 595–613 (2000)
127. Yousaf, U., Jebran, K., Wan, M.: Can board diversity predict the risk of financial distress?
Corp. Govern. Int. J. Bus. Soc. 21(4), 663–684 (2021)
128. Zingales, L.: Corporate governance. In: Newman, P. (ed.) The new Palgrave dictionary of
economics and the law. Palgrave MacMillan, London (1998)
129. Zheng, Y., Zhou, H., Chen, Z., Ekedebe, N.: Automated analysis and evaluation of SEC
documents. In: 2014 IEEE/ACIS 13th International Conference on Computer and Information
Science (ICIS), pp. 119–124. IEEE, Taiyuan (2014)
Chapter 20
Machine Learning Approaches
for Forecasting Financial Market
Volatility

Itishree Behera, Pragyan Nanda, Soma Mitra, and Swapna Kumari

Abstract Forecasting real estate market volatility is essential for investors, devel-
opers, and policymakers in the dynamic real estate industry landscape, which can
be considered a financial market. This paper extends the discussion of forecasting
financial market volatility using machine learning techniques to the real estate market
context. Drawing upon insights from relevant research studies, we delve into the
diverse methodologies, performance evaluation metrics, and case studies specific to
predicting real estate market volatility. Machine learning models, including regres-
sion analysis, time series models, ensemble methods, and deep learning networks,
are applied to capture the intricate patterns and uncertainties in the real estate market.
Economic indicators, investor sentiment, geospatial data, and housing market funda-
mentals enhance forecasting accuracy. Performance evaluation metrics such as
Intersection over Union (IoU) and Mean Squared Error (MSE) prove indispensable for
evaluating the reliability of predictive models in this domain. The studies presented in
this review demonstrate the practical applications of machine learning in forecasting
real estate market volatility across diverse regions and property types. By adapting
methodologies from the broader financial market context, we provide valuable
insights for stakeholders seeking to make informed decisions in the ever-evolving
real estate financial market.

Keywords Market volatility forecasting · Real estate · Machine learning ·
Investor sentiment · Geospatial data · Risk management

I. Behera (B) · P. Nanda · S. Mitra · S. Kumari
Interscience Institute of Management and Technology, Bhubaneswar, Odisha, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
L. A. Maglaras et al. (eds.), Machine Learning Approaches in Financial Analytics,
Intelligent Systems Reference Library 254,
https://doi.org/10.1007/978-3-031-61037-0_20

20.1 Introduction

A financial market is a broad term for a marketplace or platform where individuals,
businesses, and governments can buy and sell various financial assets and instru-
ments. These markets facilitate the allocation of capital and the transfer of finan-
cial risk between participants. Financial markets play a crucial role in the global
economy by providing a mechanism for investors to access funding and invest their
capital while allowing borrowers to raise funds for various purposes [8, 38]. Financial
markets are intricate ecosystems where diverse financial assets are traded, serving as
the lifeblood of global economic activity. These financial assets encompass stocks,
bonds, commodities, currencies, derivatives, and more, each representing a unique
financial value or future cash flow. The participants in these markets form a rich
tapestry, ranging from individual investors and institutional giants like mutual funds
and pension funds to corporations, banks, and governments, each pursuing distinct
financial objectives. These markets can take shape in physical locations, like stock
exchanges and commodity markets, where traders converge to conduct transactions,
or in the virtual realm of electronic marketplaces, where participants trade assets elec-
tronically via computer networks. In many financial markets, intermediaries such as
brokers and dealers play pivotal roles in connecting buyers and sellers and ensuring
the efficiency of trades.
Moreover, regulatory oversight is a linchpin of market integrity, with regulatory
bodies varying by region, establishing rules and guidelines to safeguard investors
and maintain transparency. These regulations protect against fraud, ensure market
stability, foster fair competition, and promote transparency [6]. Various types exist
within financial markets, each with unique characteristics and assets. The stock
market, perhaps the most widely recognized, is where publicly traded company
shares change hands, enabling investors to acquire ownership stakes in these firms.
The bond market, in contrast, revolves around the issuance and trading of debt
securities, as governments and corporations borrow funds from investors through
bonds. Commodity markets form the bedrock of raw material trading, encom-
passing goods like oil, gold, wheat, and coffee, critical for global trade. The colossal
foreign exchange (Forex) market facilitates currency trading, enabling participants
to exchange one currency for another, a cornerstone of international commerce. The
derivatives market is a realm of financial instruments whose values derive from under-
lying assets, encompassing options, futures, swaps, and forwards, which are pivotal
for hedging and speculation. On the other hand, money markets deal in short-term
debt instruments like commercial paper and Treasury bills where institutions engage
in lending and borrowing for brief periods. Finally, the real estate market involves the
buying and selling of residential and commercial properties, offering opportunities
for individuals and businesses to invest in tangible assets [15].
The real estate market can be considered a financial market because it involves
buying and selling financial assets related to real property, and it plays a significant
role in allocating capital and managing financial risk [4]. An overview of various
aspects of the real estate market as a financial market is given below:
i. Financial Assets: In the real estate market context, financial assets are repre-
sented by tangible, physical properties. These real estate assets encompass
many properties, including residential homes, condominiums, apartments,
office buildings, retail spaces, warehouses, industrial facilities, land parcels,
and more. These assets have intrinsic value due to their physical existence and
location. Investors in the real estate market acquire ownership of, or claims on the
future cash flows from, these properties.
ii. Participants: The real estate market attracts diverse participants, each with
unique financial objectives and strategies. Individual investors purchase prop-
erties for various reasons, such as personal residence, rental income, or long-
term investment. They may seek to build wealth, generate rental income, or
secure a living place. Again, institutional investors like Real Estate Investment
Trusts (REITs), pension funds, private equity firms, and hedge funds partici-
pate in the market with larger pools of capital. They often seek income, capital
appreciation, or portfolio diversification through real estate investments. Next,
Real estate developers acquire, develop, and build properties for sale or lease.
They play a vital role in adding new inventory and are sensitive to market trends
and demand. Also, real estate agents act as intermediaries between buyers and
sellers, assisting in property transactions, market analysis, pricing, and
negotiations. Finally, mortgage lenders and financial institutions provide financing
solutions to individuals and businesses seeking to purchase real estate. They
offer mortgage loans and other financial products tailored to the real estate
market.
iii. Marketplaces: Real estate transactions can occur through various market-
places, reflecting the evolution of technology and participants’ preferences.
Physical marketplaces, traditional physical locations such as local real
estate offices, property auctions, and open houses, have historically facili-
tated real estate transactions. These locations offer face-to-face interactions
between buyers, sellers, and real estate professionals. In recent years, online
platforms and listing services have gained prominence within the real estate
market. Websites and mobile apps allow buyers to search for properties, view
listings, and connect with sellers and agents remotely. These platforms have
greatly expanded the reach of real estate transactions.
iv. Intermediaries: Intermediaries are essential in real estate transactions to
ensure the smooth and efficient exchange of properties. Real estate agents
and brokers act as intermediaries between buyers and sellers. They assist
in marketing properties, conducting property tours, negotiating deals, and
handling the complex paperwork involved in real estate transactions. For rental
properties, property managers oversee day-to-day operations, tenant relations,
and property maintenance on behalf of property owners, making real estate
investment more passive.
v. Regulation: Real estate markets are subject to various regulations and legal
requirements. Zoning regulations govern how properties can be used and devel-
oped within specific geographic areas, affecting property values and usage.
Building codes ensure that properties meet safety and construction standards.
Compliance with these codes is essential when constructing or renovating
buildings. Local governments levy property taxes on real estate, which can
impact the cost of property ownership and the overall financial performance
of real estate investments. Regulations related to real estate transactions vary
by jurisdiction and may include laws governing property transfer, disclosure
of property conditions, and contract requirements.
vi. Financial Instruments: Real estate market participants can access various
financial instruments and tools. Mortgages are loans used to purchase real
estate; they allow buyers to acquire properties with a down payment and
repay the loan, typically with interest, over time (a standard payment
computation is sketched after this list). Real Estate
Investment Trusts (REITs) are investment vehicles that enable individuals and
institutions to invest in a diversified portfolio of income-producing real estate
assets. They provide an opportunity for passive real estate investment and often
offer regular dividend payments. Mutual funds focused on real estate invest in
a diversified portfolio of real estate securities, such as REITs and real estate-
related stocks. Mortgage-backed securities (MBS) are financial instruments
backed by pools of mortgage loans. They enable investors to indirectly invest
in the mortgage market and receive income from mortgage payments.
vii. Investment and Financing: Real estate investment involves acquiring prop-
erties for various purposes, including generating income and potential capital
appreciation. Investors may purchase real estate properties to build wealth over
time. Real estate can provide potential returns through rental income and the
appreciation of property values. Financing is essential in real estate transac-
tions. Borrowers obtain mortgage loans to purchase properties, while busi-
nesses and developers secure construction loans and other forms of financing
to fund real estate projects.
viii. Risk Management: Similar to other financial markets, the real estate market
carries risks that investors and property owners must consider. Real estate
involves market risk, as property values can fluctuate due to economic condi-
tions, location, supply and demand dynamics, and market trends. Also, property
owners face operational risks, such as property maintenance, tenant manage-
ment, and property management, which can impact cash flow and property
value. Furthermore, changes in interest rates can affect borrowing costs for
mortgages and financing, impacting property affordability and investment deci-
sions. Location-specific risk occurs when the location of a property signifi-
cantly influences its value and potential for rental income. Lastly, changes in
zoning laws, building codes, or tax regulations can impact property usage and
profitability, showcasing regulatory risks.
ix. Liquidity: The liquidity of real estate assets varies depending on factors such
as property type, location, and economic conditions. Residential properties are
often more liquid than commercial or specialized properties, as there is a larger
pool of potential buyers. Properties in high-demand locations tend to be more
liquid, while those in less desirable areas may take longer to sell. Economic
downturns can reduce demand for real estate, potentially prolonging the time
it takes to sell properties.
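As flagged under item (vi), the sketch below works out the textbook fixed-rate mortgage annuity payment; the loan amount, rate, and term are invented example figures rather than anything taken from the chapter's sources.

```python
# A minimal sketch of the standard fixed-rate mortgage annuity formula:
# M = P * r * (1 + r)^n / ((1 + r)^n - 1), with monthly rate r and
# n monthly payments. The example numbers below are illustrative only.
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    if r == 0:                    # interest-free edge case
        return principal / n
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)

# Example: a 300,000 loan at 7% annual interest over 30 years
print(round(monthly_payment(300_000, 0.07, 30), 2))  # about 1995.91
```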
Therefore, the real estate market operates as a financial market by offering finan-
cial assets in tangible properties, facilitating transactions through various participants
and marketplaces, utilizing intermediaries to ensure efficiency, and adhering to regu-
lations governing property transactions. It provides a range of financial instruments,
investment opportunities, and financing options while carrying its unique set of risks
and considerations. The liquidity of real estate assets can vary, reflecting market
dynamics and location-specific factors. Understanding the real estate market’s multi-
faceted nature is essential for individuals and businesses seeking to participate in this
significant sector of the economy. It is crucial for capital allocation, wealth creation,
and risk management for investors and property owners. However, the real estate
market is distinct from traditional financial markets (like stock and bond markets)
due to its unique characteristics, illiquidity of assets, and the physical nature of
real property. Financial markets are paramount in modern economies, serving as
the conduits through which resources are allocated efficiently, asset prices are deter-
mined, and economic growth is nurtured. These markets offer investors opportunities
to diversify portfolios, manage risk, and earn returns on investments, underpinned
by a foundation of rules and regulations to protect market participants and maintain
transparency. While financial markets present avenues for wealth creation, they also
carry inherent risks, emphasizing the need for participants to be well-informed and
prudent in their financial endeavours.
Again, real estate market volatility refers to the degree of fluctuation or variability
in property prices and market conditions over a specific period [27]. It measures the
market's instability and the rapidity of price changes within the real estate sector. As
in financial markets, real estate market volatility can be influenced by various factors
and have significant implications for property buyers, sellers, investors, and the
overall real estate industry. Real estate market volatility, a multifaceted phenomenon
marked by fluctuations in property prices and broader market conditions, is intri-
cately shaped by many interconnected factors. Firstly, economic conditions, encom-
passing variables such as inflation, GDP growth, employment rates, and interest
rates, wield significant influence, with economic downturns diminishing demand,
precipitating price declines, and heightening market instability. Moreover, supply
and demand dynamics, crucial to market fluctuations, see oversupply pushing prices
down and shortages propelling them upward, contributing to volatility. Additionally,
location-specific attributes introduce further nuance, as high-demand urban areas
exhibit greater stability, while less favoured or oversaturated locales may undergo
more pronounced price swings. Furthermore, investor sentiment and behaviour, regu-
latory changes, interest rate shifts, market speculation, global and local events, devel-
opment activity, and market transparency collectively compose the intricate fabric
of volatility within the real estate market [34]. Furthermore, speculative investments
amplify price swings, while regulatory adjustments and interest rate fluctuations
lead to market unpredictability. In addition, the presence of speculators, coupled
with global and local events like natural disasters and economic crises, can trigger
immediate and lasting market shifts. Consequently, development activity influences
market stability, with rapid construction leading to oversupply and limited develop-
ment, causing supply constraints [25]. Moreover, market transparency, or the avail-
ability of accurate and timely data, plays a pivotal role in decision-making, with less
information in some markets potentially contributing to price volatility. Grasping
these multifaceted influences is essential for navigating the dynamic real estate land-
scape, facilitating informed decision-making and effective risk management. Additionally,
real estate market volatility can have significant consequences for market
participants. It may present opportunities for buyers to purchase properties at lower
prices during downturns but can also introduce uncertainty and risk [23]. Sellers
may struggle to price their properties accurately. Investors may need to assess
risk and return carefully while making real estate investment decisions. Additionally,
industry professionals like real estate agents and developers must adapt to changing
market conditions.
Forecasting market volatility holds immense importance for various market partic-
ipants, including investors, traders, financial institutions, policymakers, and busi-
nesspersons. It empowers individuals and organizations to make informed decisions,
adeptly manage risk, and adapt their strategies to evolving market conditions, driven
by several pivotal reasons [20]. Accurate volatility forecasts serve as a linchpin in risk
management, enabling investors and businesses to assess and mitigate risks effec-
tively. By comprehending the potential magnitude of price fluctuations, they can
implement risk-mitigation strategies like diversification, hedging, or adjustments to
their portfolio allocations. Furthermore, investors and portfolio managers deploy
volatility forecasts to inform asset allocation decisions, tailoring their exposure to
different asset classes in alignment with their risk tolerance and expectations of
market volatility. During periods of anticipated high volatility, they may reduce expo-
sure to riskier assets and bolster allocations to more stable ones. Again, traders and
active investors lean on volatility forecasts for crafting and executing trading strate-
gies, utilizing technical analysis, options strategies, and other tactics that capitalize
on projected price swings, and these forecasts guide the optimal timing for entering
or exiting positions [2]. Long-term investors, such as pension funds and endowments,
draw upon volatility forecasts to make investment choices, shaping asset allocation
and investment strategies that ensure long-term financial goals are achieved while
managing potential downside risks. Also, financial institutions, including banks and
insurance companies, leverage volatility forecasts to evaluate the risk inherent in
their investment and loan portfolios, a practice instrumental in maintaining capital
adequacy and making informed lending and investment decisions. Moreover, these
forecasts are indispensable for pricing financial derivatives like options and futures
contracts, guaranteeing that these instruments represent fair market values while
minimizing mispricing and arbitrage opportunities. Central banks and government
policymakers closely monitor market volatility to uphold financial stability and
spur economic growth. Their understanding of volatility often informs decisions
regarding interest rates, monetary policy, and financial regulations. Companies inte-
grate volatility forecasts into their business planning processes, adapting budgeting,
pricing strategies, and inventory management in response to expected market condi-
tions, thus enhancing their ability to navigate shifting economic environments and
stay competitive [24]. Market participants frequently rely on volatility forecasts to
bolster their confidence in investment choices, as access to reliable forecasts dimin-
ishes uncertainty and enhances investor sentiment, contributing to overall market
stability. Lastly, financial professionals and analysts employ volatility forecasts to
communicate market risks to clients and stakeholders transparently, aiding clients
in making informed decisions while comprehending the potential for market fluc-
tuations. Moreover, these forecasts can also significantly impact the valuation of
financial assets, as investors and analysts often incorporate expected volatility into
their valuation models to ascertain fair asset prices.
A thorough examination and rigorous assessment of market volatility are crucial
prerequisites in real estate investments. Real estate professionals, investors, and
policymakers frequently rely on various metrics and indicators to evaluate market
stability and anticipate potential shifts in property values. Predicting market volatility
plays a fundamental role in financial analysis and decision-making, serving as an
indispensable guide for navigating the complex landscape of uncertain economic
conditions. It empowers market participants to optimize investment strategies and
adeptly manage risk, both essential components of prudent financial management [21,
22]. The accuracy and dependability of volatility forecasts are essential tools, equip-
ping individuals and organizations with the capabilities needed to make informed
financial decisions and remain adaptable in response to the constant fluctuations in
market conditions [36]. Within this context, this chapter explores the current state of
the art in forecasting financial market volatility within the real estate sector, specif-
ically focusing on applying machine learning techniques. The remainder of the
chapter is organized as follows: Sect. 20.2 presents the background and significance;
Sect. 20.3 reviews traditional methods for volatility forecasting; Sect. 20.4 covers
machine learning techniques in volatility forecasting; Sect. 20.5 discusses data
sources and preprocessing; Sect. 20.6 presents performance evaluation metrics;
Sect. 20.7 outlines challenges and future directions; and Sect. 20.8 concludes.

20.2 Background and Significance

Machine learning and data mining techniques have gained significant attention in real
estate market analysis and forecasting in recent years. The ability of these methods to
handle vast and complex datasets has provided valuable insights into various aspects
of the real estate market. This literature review synthesizes findings from 25 relevant
studies that explore the application of machine learning and data mining in real estate
research, shedding light on the diverse methodologies and their implications for the
industry. Cotter and Roll [7] conducted a comparative study of residential Real Estate
Investment Trusts (REITs) and private real estate markets, focusing on returns, risks,
and distributional characteristics. Their analysis highlighted the distinctions between
these two investment vehicles and offered insights into risk-return profiles. Yu et al.
[42] delved into real estate pricing methods, leveraging data mining and machine
learning techniques. Their research aimed to enhance pricing accuracy by considering
multiple variables and adopting sophisticated modelling approaches. Rafiei and Adeli
[31] introduced a novel machine-learning model for estimating the sale prices of real
estate units. Their study demonstrated the potential of machine learning in capturing
the intricate relationships between property attributes and market dynamics. Using
geocoding and machine learning, Tchuente and Nyawa [37] explored real estate price
estimation in French cities. Their research harnessed location-based data to improve
price predictions and spatial understanding. Park and Ryu [29] contributed to risk
management in real estate markets by developing a machine learning-based early
warning system for housing and stock markets. Their approach focused on identifying
potential market fluctuations and risks. Kabaivanov and Markovska [17] examined
the role of artificial intelligence in real estate market analysis, highlighting the advan-
tages of AI in handling complex market dynamics. Using machine learning, Hausler
et al. [13] investigated news-based sentiment analysis in real estate. Their work
revealed the influence of sentiment on market trends and dynamics. Gupta et al. [10]
employed machine learning to predict housing market synchronization across US
states, emphasizing the role of uncertainty in market movements. Gude [9] proposed
a multi-level modelling approach for forecasting real estate dynamics, capturing the
complexity of the market across different levels. Cepni et al. [5] explored the impact
of investor sentiment on housing returns in China, applying machine learning tech-
niques for sentiment analysis. Prakash et al. [30] demonstrated the application of
machine learning in predicting housing prices, offering insights into price trends
and patterns. Rosenbaum and Zhang [32] investigated the global presence of the
volatility formation process using rough volatility and machine learning techniques,
contributing to our understanding of market volatility. Hu et al. [16] developed a
hybrid deep learning approach for predicting copper price volatility, highlighting
the potential of combining neural networks with traditional models. Lian et al. [18]
applied machine learning and time series models to predict VNQ market trends,
offering investors valuable insights. Habbab and Kampouridis [11] investigated five
machine-learning algorithms for optimizing mixed-asset portfolios, including Real
Estate Investment Trusts (REITs). Ngene and Wang [28] explored shock transmis-
sions between real estate investment trusts and other assets using time–frequency
decomposition and machine-learning techniques. Lee and Park [19] focused on
forecasting trading volume in local housing markets through a time-series model
and a deep learning algorithm, contributing to market analysis. Verma et al. [39]
predicted house prices in India using linear regression and machine learning algo-
rithms, offering valuable insights into the Indian real estate market. Xu and Zhang
[41] employed neural networks to forecast retail property price indices, providing
accurate predictions for market participants. Abdul Salam et al. [1] conducted a
systematic literature review of machine learning algorithms for price and rent predic-
tions in real estate, summarizing the state of the art. Han et al. [12] demonstrated
machine learning methods to predict consumer confidence from search engine data,
providing insights into consumer sentiment and its impact on the real estate market.
Sanyal et al. [33] focused on Boston house price prediction using regression models,
offering localized insights into housing markets. Nagl [26] conducted sentiment
analysis within a deep learning probabilistic framework, offering new evidence from
residential real estate in the United States. Wiradinata et al. [40] performed a post-
pandemic analysis of house price prediction in Surabaya using machine learning,
contributing to our understanding of market resilience in challenging times.
All these studies reflect the evolving landscape of real estate market analysis,
where machine learning and data mining techniques play a vital role in enhancing
prediction accuracy, risk assessment, and decision-making processes. By leveraging
the vast amount of data available in the real estate domain, these methodologies
contribute to a more informed and efficient real estate market.

20.3 Traditional Methods for Volatility Forecasting

Forecasting volatility in the real estate market involves predicting future fluctuations
in property prices and market conditions. Traditional real estate market volatility
forecasting methods often draw from statistical and econometric models, as well
as real estate-specific data and indicators. Here are some traditional methods for
forecasting volatility in the real estate market:
i. Historical Volatility (HV): Historical volatility in the real estate market
involves looking at past changes in property prices to estimate how much they
have varied over time. It is calculated as the standard deviation of historical
property price returns. High historical volatility suggests that property prices
have experienced significant fluctuations in the past, which may continue in
the future.
ii. Moving Averages: Moving averages are used to smooth out fluctuations
in property prices. For example, a 12-month moving average calculates
the average property price over the past year. Investors and analysts use
moving averages to identify trends and assess whether prices are increasing or
decreasing steadily.
iii. Exponential Smoothing: Exponential smoothing models give more weight
to recent property price data while gradually reducing the significance of
older data points. This method is particularly useful for capturing short-term
fluctuations in property prices, as it emphasizes recent trends.
iv. GARCH Models (Generalized Autoregressive Conditional Heteroskedas-
ticity): GARCH models are statistical models that capture the time-varying
volatility of property prices. They estimate the conditional variance of prop-
erty price returns, allowing for the modelling of volatility clustering, where
periods of high volatility tend to follow one another (see the sketch after this list).
v. Time Series Decomposition: Time series decomposition separates property
price data into its main components: trend, seasonality, and residual volatility.
Analyzing the volatility component can provide insights into the potential for
future price fluctuations.
vi. Volatility Index: Similar to stock market volatility indices like the VIX,
some regions or markets have started developing real estate-specific volatility
indices. These indices measure the expected future volatility of property prices
within a specific real estate market or geographic area.
vii. Economic Indicators: Traditional economic indicators, such as GDP growth,
employment rates, and interest rates, can be used to gauge the potential for
volatility in the real estate market. Economic downturns or rising interest rates
can influence property price movements.
viii. Housing Market Data: Real estate-specific data, including housing starts,
building permits, and inventory levels, can provide insights into market condi-
tions and potential volatility. An oversupply of housing relative to demand can
lead to price fluctuations.
ix. Mortgage Market Data: Data related to mortgage rates, loan originations,
and mortgage delinquency rates can offer valuable insights into the health of
the real estate market and its potential for volatility. Rising mortgage rates, for
instance, can impact housing affordability and demand.
x. Local Market Indicators: Real estate markets are highly localized, with condi-
tions varying significantly from one region to another. Local indicators, such
as population growth, job opportunities, and supply–demand imbalances, are
crucial in forecasting volatility within specific markets.
xi. Consumer Confidence Surveys: Consumer sentiment and confidence in the
housing market can be leading indicators of potential volatility. A drop in
consumer confidence may signal uncertainty and price fluctuations.
xii. Real Estate Transaction Data: Historical data on property transactions,
including sales prices and transaction volumes, provide valuable informa-
tion about past price movements. Analyzing transaction data can help forecast
future volatility based on historical patterns.
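To make methods i, ii, and iv concrete, the following is a minimal sketch that computes rolling historical volatility and a moving average, then fits a GARCH(1,1) model. It is illustrative only: the index values are invented, real applications need far longer histories, and GARCH estimation is assumed to come from the third-party Python arch package.

import numpy as np
import pandas as pd
from arch import arch_model  # third-party GARCH library (assumed installed)

# Hypothetical monthly property price index (illustrative values only).
prices = pd.Series(
    [100.0, 101.2, 100.7, 102.3, 103.1, 102.5, 104.0, 105.2,
     104.6, 106.1, 107.3, 106.8, 108.4, 109.9, 109.1, 110.6],
    index=pd.date_range("2020-01-31", periods=16, freq="M"),
)

# Log returns of the index.
returns = np.log(prices).diff().dropna()

# (i) Historical volatility: rolling standard deviation of returns,
# annualized for monthly data via sqrt(12).
hist_vol = returns.rolling(window=6).std() * np.sqrt(12)

# (ii) Moving average of the price level (6 months here only because the
# toy sample is so small; 12 months is typical).
moving_avg = prices.rolling(window=6).mean()

# (iv) GARCH(1,1): model the conditional variance of returns. Percentage
# returns are used for numerical stability; a real study would need far
# more than 15 observations.
garch = arch_model(returns * 100, vol="GARCH", p=1, q=1)
result = garch.fit(disp="off")
print(result.params)                                  # mu, omega, alpha[1], beta[1]
print(result.forecast(horizon=3).variance.iloc[-1])   # 3-month-ahead variance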
Each method discussed above offers a different perspective on the real estate
market and its potential for volatility. Real estate professionals and investors often
use a combination of these methods and data sources to make informed decisions
about buying, selling, or investing in real estate properties. It is important to note
that real estate market volatility can differ significantly depending on factors like
property type (residential, commercial, industrial), location (urban vs. rural), and
regional economic conditions. Therefore, the forecasting method and data sources
should be tailored to the analyzed real estate market. Additionally, as with any fore-
casting model, ongoing monitoring and validation of results are essential to ensure
the accuracy and relevance of the forecasts.
20.4 Machine Learning Techniques in Volatility Forecasting

Machine learning techniques have gained traction in real estate market volatility
forecasting due to their ability to handle complex data patterns and improve accuracy.
These techniques leverage historical real estate data, economic indicators, and other
relevant factors to predict future market volatility. Here are some machine learning
techniques commonly used in real estate market volatility forecasting:
1. Regression Analysis: Linear and non-linear regression models can be applied
to real estate data to predict market volatility. Features like historical property
prices, interest rates, GDP growth, and unemployment rates can be used as input
variables. Regression models aim to find relationships between these variables
and the volatility of real estate prices.

Yi = β0 + β1 Xi + ε (20.1)

where
Yi is the dependent variable (volatility)
β0 is the intercept
β1 is the slope coefficient for the independent variable Xi
ε represents the error term
2. Time Series Models: Time series forecasting techniques, such as ARIMA (Auto
Regressive Integrated Moving Average) and its variations, are used to analyze
historical property price data. These models capture seasonality, trends, and auto-
correlation in the data to make short-term and long-term predictions about real
estate market volatility.

Yt = c + ϕ1 Yt−1 + · · · + ϕp Yt−p + θ1 εt−1 + · · · + θq εt−q + εt (20.2)

where
Yt is the time series at time t
c is a constant
ϕ1 , …, ϕp are auto-regressive coefficients
θ1 , ..., θq are moving average coefficients
εt is the white noise error term.
3. Random Forests: Random Forests are ensemble learning models that can capture
complex relationships in real estate data. They work well with numerical and
categorical features, making them suitable for analyzing various factors influ-
encing real estate market volatility, such as location, property type, and economic
indicators (a brief sketch applying this idea follows this list).
4. Gradient Boosting Machines (GBMs): GBMs, including XGBoost and LightGBM,
are powerful ensemble models for real estate market volatility forecasting.
They can handle high-dimensional data and effectively capture non-linear
relationships between input features and volatility.
5. Neural Networks: Deep learning models like artificial neural networks (ANNs),
recurrent neural networks (RNNs), and long short-term memory networks
(LSTMs) can analyze time series data and complex interactions between multiple
variables. ANNs can capture non-linear relationships, while RNNs and LSTMs
excel in handling sequential data.
6. Support Vector Machines (SVMs): SVMs can be used for binary classification
to identify high or low volatility periods in the real estate market. SVMs aim
to find a hyperplane that best separates different volatility regimes based on
historical data and relevant features.
7. Deep Reinforcement Learning: Reinforcement learning techniques can be
applied to real estate market forecasting as a sequential decision-making problem.
Agents learn to take actions (e.g., adjusting real estate portfolios) based on
historical data and economic indicators to maximize returns while managing
risk.
8. Geospatial Analysis: Geospatial analysis incorporates location-based data,
including property location, land use, and proximity to amenities, into machine
learning models to capture spatial patterns and their impact on real estate
market volatility.
9. Ensemble Models: Ensemble models combine the predictions of multiple
machine learning models, through techniques such as bagging and stacking, to
improve forecasting accuracy by reducing model bias and variance.
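As an illustration of item 3 above, the sketch below trains a random forest to predict next-month volatility from the current month's volatility and two macro features. Everything here is hypothetical, the data are simulated and the feature names are placeholders, but the pattern (lagged target, time-ordered split) carries over to real datasets.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Simulated monthly dataset; column names are illustrative placeholders.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "volatility": rng.gamma(2.0, 0.01, n),
    "interest_rate": rng.normal(4.0, 0.5, n),
    "gdp_growth": rng.normal(2.0, 1.0, n),
})

# Predict next month's volatility from this month's conditions.
df["target"] = df["volatility"].shift(-1)
df = df.dropna()

X = df[["volatility", "interest_rate", "gdp_growth"]]
y = df["target"]

# Time-ordered split: the model never sees data from after the test period.
split = int(len(df) * 0.8)
model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X.iloc[:split], y.iloc[:split])
preds = model.predict(X.iloc[split:])
print(dict(zip(X.columns, model.feature_importances_.round(3))))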
Machine learning techniques in real estate market volatility forecasting require
data preparation, feature selection, model training, and evaluation. The choice of tech-
nique depends on the specific forecasting objectives and data availability. Continuous
monitoring and model updates are essential to ensure the models adapt to changing
real estate market conditions.

20.5 Data Sources and Preprocessing

Forecasting real estate market volatility is a complex task that relies on diverse data
sources and meticulous data preprocessing. Access to historical property price data is
fundamental, providing insights into past market dynamics. Alongside this, economic
indicators such as GDP growth, inflation, and interest rates offer crucial macroeco-
nomic context. Housing market data, including housing starts, building permits, and
inventory levels, helps assess supply and demand dynamics. Mortgage market data,
encompassing mortgage rates and lending standards, illuminates financing condi-
tions. Local market data, such as population growth, job opportunities, and infras-
tructure development, is vital for understanding regional market nuances. Geospatial
data aids in analyzing property location attributes and neighbourhood characteris-
tics. Consumer sentiment surveys and news/social media data contribute insights
into market sentiment and events shaping the real estate landscape. To make sense
of this diverse data, rigorous preprocessing steps involve cleaning, time alignment,
feature engineering, normalization, handling categorical data, and more. Ensuring
data quality and relevance is paramount, as it lays the foundation for building accu-
rate predictive models to forecast real estate market volatility. Common data
sources for real estate market volatility forecasting include the following:
i. Historical Property Price Data: This source records property prices for a
specific region or market. The data should cover a sufficiently long period,
often several years or decades, to capture market trends and cycles. This data
serves as the foundation for understanding past market behaviour (a brief
data-assembly sketch follows this list).
ii. Economic Indicators: Economic indicators offer insights into the broader
economic conditions that influence the real estate market. GDP growth, infla-
tion rates, unemployment rates, and interest rates provide context for under-
standing the economic health of a region, which in turn affects property prices
and market volatility.
iii. Housing Market Data: Housing market data includes information on funda-
mental aspects of the real estate market, such as housing starts, building permits,
construction activity, and inventory levels. These indicators help assess the
supply and demand dynamics within the housing market, which can influence
property prices and volatility.
iv. Mortgage Market Data: Data related to the mortgage market, including mort-
gage rates, loan origination volumes, mortgage delinquency rates, and lending
standards, provides insights into the availability of financing and credit condi-
tions. Changes in mortgage rates and lending practices can impact the demand
for properties.
v. Local Market Data: Local market indicators are specific to a particular
geographic area and include population growth, job opportunities, school
quality, crime rates, and infrastructure development. These factors play a
crucial role in determining the attractiveness of a location and, subsequently,
its real estate market dynamics.
vi. Geospatial Data: Geospatial data encompasses information about property
locations, such as proximity to amenities (e.g., schools, parks, shopping
centres), neighbourhood attributes, land use patterns, and geographic features.
This data helps capture the spatial characteristics influencing property values
and market behaviour.
vii. Consumer Sentiment Surveys: Consumer sentiment surveys gauge the confi-
dence and perceptions of individuals regarding the housing market. These
surveys can provide early indications of shifts in market sentiment, which can
influence buying and selling behaviour.
viii. News and Social Media Data: Textual data from news articles, social media
platforms, and real estate market reports can be processed using natural
language processing (NLP) techniques. This unstructured data can extract
sentiment, detect events, and monitor public perception, impacting market
sentiment and volatility.
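A minimal sketch of combining two of these sources, transaction-level prices (source i) and monthly economic indicators (source ii), into one analysis-ready table follows; the values and column names are invented for illustration.

import pandas as pd

# Hypothetical raw inputs; values and column names are illustrative.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-10", "2023-01-25", "2023-02-08",
                            "2023-02-20", "2023-03-05", "2023-03-18"]),
    "region": ["north", "north", "north", "south", "south", "south"],
    "price": [250_000, 262_000, 255_000, 310_000, 305_000, 318_000],
})
macro = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-31", "2023-02-28", "2023-03-31"]),
    "interest_rate": [4.50, 4.60, 4.75],
    "gdp_growth": [2.1, 2.0, 1.9],
})

# Aggregate transactions to a monthly median price per region ...
monthly_prices = (
    prices.set_index("date")
          .groupby("region")["price"]
          .resample("M")
          .median()
          .rename("median_price")
          .reset_index()
)

# ... and attach the matching month-end macro readings.
dataset = monthly_prices.merge(macro, on="date", how="left")
print(dataset)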
Data preprocessing is essential for analyzing market volatility in the real estate
sector as it helps clean and refine raw data, ensuring accuracy and consistency.
Removing outliers, handling missing values, and normalizing data can enhance the
quality of information, making it suitable for robust predictive models. Effective data
preprocessing lays the foundation for accurate volatility forecasts, aiding investors
and professionals in making informed decisions in this complex and dynamic market.
Some common data preprocessing steps for real estate market volatility forecasting
are discussed below:
i. Data Cleaning: Data cleaning includes removing or addressing outliers,
missing values, and inconsistencies in the data. This step ensures that the
data is of high quality and suitable for analysis (several of these steps are
illustrated in the sketch after this list).
ii. Handling Missing Data: Missing data can be addressed through imputation
(filling missing values with estimated values) or removing rows with missing
values, depending on the extent and nature of missing data.
iii. Time Alignment: To analyze data effectively, ensuring that all data sources
are synchronized regarding time frames and frequencies is crucial. This
may involve aggregating data to a consistent time interval (e.g., monthly or
quarterly) to facilitate analysis.
iv. Feature Engineering: Feature engineering entails creating new variables or
transforming existing ones to capture relevant information. For instance, lagged
property price data, moving averages, and economic indicator transformations
can help create informative features for forecasting.
v. Normalization and Scaling: Numerical features are often normalized or scaled
to a consistent range. Common techniques include Min–Max scaling (rescaling
to a range of [0, 1]) and Z-score normalization (scaling with mean 0 and
standard deviation 1).
vi. Handling Categorical Data: Categorical data, such as property types (e.g.,
residential, commercial) or regions, must be encoded into numerical format.
One-hot encoding is a technique that converts categorical data into a binary
format by creating individual binary variables for each category. In contrast,
label encoding assigns a unique numerical value to each category in the dataset.
vii. Handling Imbalanced Data: If there is an imbalance in the distribution
of volatility periods (e.g., high volatility periods are rare compared to low
volatility periods), techniques like oversampling (increasing the representa-
tion of minority class) or undersampling (reducing the majority class) may be
applied to balance the dataset.
viii. Time Series Decomposition: Time series data can be decomposed into its
primary components, including trend, seasonality, and residual volatility. This
decomposition aids in understanding the underlying patterns in the dataset.
ix. Data Splitting: The data is split into training, validation, and test sets. Time-
based splitting is often preferred to mimic real-world forecasting scenarios and
ensure that models are evaluated on unseen data.
x. Regularization and Transformation: Regularization techniques, such as L1
or L2, may be applied to prevent overfitting in predictive models. Log or
Box-Cox transformations can also be used for variables with non-normal
distributions.
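Several of these steps, imputation (ii), Min-Max scaling (v), one-hot encoding (vi), and a time-based split (ix), can be chained with scikit-learn. The sketch below is a minimal example with invented data; note that the transformer is fitted on the training rows only.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical dataset ordered in time; values are illustrative.
df = pd.DataFrame({
    "median_price": [250_000, None, 255_000, 310_000, 305_000, 318_000],
    "interest_rate": [4.50, 4.50, 4.60, 4.60, 4.75, 4.75],
    "property_type": ["residential", "residential", "commercial",
                      "residential", "commercial", "commercial"],
})

numeric = ["median_price", "interest_rate"]
categorical = ["property_type"]

# (ii) impute, (v) Min-Max scale, (vi) one-hot encode, in one transformer.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", MinMaxScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# (ix) time-based split first, then fit on the earlier rows only so no
# information from the test period leaks into the scaler or imputer.
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]
X_train = preprocess.fit_transform(train)
X_test = preprocess.transform(test)
print(X_train.shape, X_test.shape)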
Thus, data preprocessing is a critical and iterative step in the real estate
market volatility forecasting process. High-quality, well-processed data enhances
the accuracy and effectiveness of predictive models, allowing for more informed
decision-making in real estate investments and risk management.

20.6 Performance Evaluation Metrics

Performance evaluation metrics are essential for assessing machine learning models’
accuracy and effectiveness in forecasting real estate market volatility. Some
common performance evaluation metrics related to the machine learning techniques
mentioned earlier are discussed below:
i. Mean Absolute Error (MAE) measures the average absolute difference
between the actual and predicted volatility values. It provides insight into the
degree of errors made by the model. MAE is useful for understanding the average
absolute prediction error in the same units as the target variable (volatility).
A lower MAE indicates better model performance (see the sketch after this list).
ii. Mean Squared Error (MSE) quantifies the average squared difference
between predicted and actual values. It gives higher weight to larger errors.
MSE helps identify outliers or cases where the model’s predictions signifi-
cantly deviate from actual values. A lower MSE suggests a model with smaller
errors, but it penalizes larger errors more heavily.
iii. Root Mean Squared Error (RMSE) is the square root of MSE and provides
a more interpretable metric in the same units as the target variable. RMSE is
preferred while expressing prediction errors in the original scale of the target
variable (volatility). A lower RMSE indicates a better fit of the model to the
data.
iv. R-squared (R2 ) represents the proportion of variance in the target variable
explained by the technique. It measures the goodness of fit. R2 helps evaluate
how well the technique captures the variability in volatility values. R2 values
range from 0 to 1, with higher values indicating that the model explains a larger
proportion of variance. A higher R2 suggests a better-fitting model.
v. Mean Absolute Percentage Error (MAPE) measures the percentage differ-
ence between predicted and actual values, making it suitable for time series data.
MAPE is useful for understanding prediction accuracy in relative terms, which
is important in forecasting. A lower MAPE indicates more accurate predictions
in terms of percentage error.
vi. Akaike Information Criterion (AIC) and Bayesian Information Criterion
(BIC) are model selection criteria used to choose the best-fitting ARIMA model.
AIC and BIC help in selecting the most appropriate ARIMA model among
candidates; lower AIC and BIC values suggest a better trade-off between model
fit and complexity.
vii. Classification metrics (Accuracy, Precision, Recall, and F1-Score) assess the
performance of SVMs in distinguishing between high and low volatility periods.
They are essential when SVMs are used for binary classification tasks. High
accuracy, precision, recall, and F1-score values indicate good classification
performance.
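For the regression metrics above (i-v), scikit-learn provides ready-made implementations (MAPE requires scikit-learn 0.24 or later); a minimal sketch with invented actual and predicted volatility values:

import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error, r2_score)

# Hypothetical actual vs. predicted volatility values.
y_true = np.array([0.042, 0.055, 0.038, 0.061, 0.049])
y_pred = np.array([0.045, 0.050, 0.040, 0.058, 0.052])

mae = mean_absolute_error(y_true, y_pred)               # (i)
mse = mean_squared_error(y_true, y_pred)                # (ii)
rmse = np.sqrt(mse)                                     # (iii)
r2 = r2_score(y_true, y_pred)                           # (iv)
mape = mean_absolute_percentage_error(y_true, y_pred)   # (v)

print(f"MAE={mae:.4f}  MSE={mse:.6f}  RMSE={rmse:.4f}  "
      f"R2={r2:.3f}  MAPE={mape:.2%}")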
These performance evaluation metrics play a crucial role in assessing the effec-
tiveness of machine learning models in real estate market volatility forecasting,
allowing stakeholders to make informed decisions and improve model performance.
The choice of metrics depends on the specific machine learning task, objectives, and
nature of the data.

20.7 Challenges and Future Directions

Financial market volatility is fundamental to global economies, affecting investors,
businesses, and policymakers. The unpredictable nature of market fluctuations makes
it a challenging field to navigate. Over the years, financial professionals have sought
ways to model and predict market volatility, and one of the most promising avenues
for this is the application of machine learning. Predicting financial market volatility
using machine learning is complex and challenging, but it holds immense potential for
enhancing decision-making in the financial industry [14]. Addressing data quality,
non-stationarity, feature selection, model complexity, and volatility clustering are
essential for accurate predictions.
There are various challenges in predicting financial market volatility using
machine learning; a few common ones are listed below:
1. Data Quality and Quantity: One of the primary challenges in applying machine
learning to predict market volatility is obtaining high-quality and sufficient data
[35]. Financial markets generate vast amounts of data, which may be noisy,
incomplete, or unreliable. Ensuring data cleanliness and obtaining
historical data for meaningful analysis can be complex and costly.
2. Non-Stationarity and Feature Selection: Financial time series data often exhibit
non-stationarity [14], where statistical properties change over time. Machine
learning models, particularly traditional ones, assume stationarity. Adapting
machine learning models for non-stationarity is crucial for accurate volatility
prediction. Determining which features or variables are most relevant for
predicting market volatility is non-trivial. Feature engineering and selection
require domain expertise and understanding of financial markets to identify
informative features that capture volatility dynamics [35].
3. Model Complexity and Volatility Clustering: While machine learning models
can capture complex relationships in data, overly complex models may overfit
the training data, leading to poor generalization to unseen data. Balancing model
complexity is critical for predictive accuracy. Financial markets exhibit periods of
high and low volatility, often in clusters. Machine learning models need to account
for these patterns, as they can significantly affect the accuracy of predictions;
only models that capture such dynamics can produce reliable forecasts [35].
Future research directions in predicting financial market volatility using
machine learning include the application of deep learning techniques, such as recur-
rent neural networks (RNNs) [14] and long short-term memory networks (LSTMs),
that can capture complex temporal dependencies in financial data. Future research
may focus on developing deep learning architectures tailored to volatility predic-
tion. Also, financial markets are mostly influenced by news, sentiment, and macroe-
conomic factors. NLP models can analyze textual data from news articles, social
media, and financial reports to incorporate sentiment analysis and news sentiment
into volatility prediction models. Furthermore, combining multiple machine learning
models through ensemble methods can improve predictive accuracy and robustness.
Future research may explore ensemble techniques incorporating diverse models,
including traditional econometric and machine learning models [3]. As machine
learning models become more complex, ensuring interpretability and explainability is
crucial, especially in financial markets where decision-makers require transparency.
Future research should focus on developing methods to interpret and explain machine
learning models' predictions. Additionally, financial markets operate in real
time, and predictions must be updated continuously. Developing online learning
algorithms that can adapt to changing market conditions in real time is an important
direction for research.
The challenges and future directions of using machine learning to predict financial
market volatility present a captivating landscape with promise and complexity [3].
With its intricate web of variables and constant evolution, the financial world demands
innovative approaches to provide better insights into market dynamics. While the
challenges are formidable, they are also opportunities for growth and advance-
ment. The foremost challenge in this domain lies in data. Financial markets generate
massive datasets, but this data’s quality, completeness, and reliability remain critical
issues. Overcoming these hurdles requires technological prowess and stringent data
governance and cleaning procedures. The trade-off between model complexity and
generalization is a challenge that persists. Machine learning models can capture intri-
cate relationships in data, but overly complex models can lead to overfitting. Striking
the right balance is crucial, requiring continuous refinement of model architectures.
Furthermore, volatility clustering is a unique challenge in financial markets. Tradi-
tional models often fail to capture the clustered nature of volatility, leading to unre-
liable forecasts. Developing models that account for these patterns will be essential
for accurate predictions [3]. The journey to harness machine learning for predicting
financial market volatility is dynamic and evolving. While challenges abound, so
too do opportunities for innovation. As technology advances and researchers push
the boundaries of what is possible, the financial industry can look forward to more
accurate, timely, and actionable insights that will drive better decision-making and
ultimately shape the future of finance.

20.8 Conclusion

The real estate market, often regarded as a financial market, holds a pivotal position
in the global economy, significantly impacting wealth creation and capital alloca-
tion. Accurate real estate market volatility forecasts are essential for various stake-
holders, including investors, developers, policymakers, and homeowners. Incorpo-
rating machine learning techniques in the context of real estate market volatility fore-
casting has revolutionized how we analyze and predict market dynamics. The key
takeaways from the preceding discussions in this chapter on methodologies, perfor-
mance metrics, and case studies related to forecasting real estate market volatility
using machine learning are summarized here. The methodologies discussed in this
exploration encompass a wide array of machine-learning techniques tailored to the
unique characteristics of the real estate market. A detailed study of current research
work shows regression analysis, time series models like ARIMA, ensemble methods
such as Random Forests and Gradient Boosting, neural networks, support vector
machines, and deep reinforcement learning have all been applied to model volatility.
Feature engineering techniques have empowered these models to capture the intricate
relationships between economic indicators, geospatial data, investor sentiment, and
housing market fundamentals. These machine learning models have proven effec-
tive in handling numerical and categorical data, a crucial requirement given the
diverse factors influencing real estate market volatility. The assessment of machine
learning models for real estate market volatility forecasting requires comprehensive
performance evaluation metrics. These metrics differ based on the type of model
used and the specific forecasting task. For regression-based models, metrics like
Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared
Error (RMSE), and R-squared (R2 ) provide valuable insights into the explanatory
power and accuracy of the techniques. Classification models, such as support vector
machines, rely on metrics like accuracy, precision, recall, and F1-score to gauge
their effectiveness in identifying high and low volatility periods. Time series models
benefit from metrics such as Akaike Information Criterion (AIC), Mean Absolute
Percentage Error (MAPE), and Bayesian Information Criterion (BIC) for model
selection. These performance metrics are quantitative measures to compare models,
optimize hyperparameters, and enhance forecasting accuracy.
This chapter further explores the real-world applicability of machine learning
in real estate market volatility forecasting which is evident from the existing liter-
ature. These studies span diverse geographical regions and property types, empha-
sizing the versatility of machine learning models in capturing local market dynamics.
Researchers have successfully employed machine learning to predict property prices,
assess market synchronization across states, forecast trading volumes, and optimize
mixed-asset portfolios that include Real Estate Investment Trusts (REITs). Further-
more, studies on geospatial data and sentiment analysis using NLP techniques have
enriched our understanding of how location characteristics and news sentiment influ-
ence real estate market dynamics. The relevance of machine learning is also reflected
in early warning systems for housing and stock markets, emphasizing the importance
of timely information in decision-making processes. The implications of integrating
machine learning in real estate market volatility forecasting extend beyond research
into practical applications for stakeholders. Investors can benefit from more accurate
predictions of property prices and market trends, enabling informed investment deci-
sions and improved risk management. Developers can use machine learning models
to assess supply and demand dynamics in specific regions, optimizing construc-
tion activities and resource allocation. Policymakers gain insights into the effects of
economic indicators on housing markets, supporting data-driven policy formulation.
Homeowners can make informed choices about property investments and mortgage
decisions.
Lastly, this chapter highlights that the convergence of machine learning
techniques with real estate market volatility forecasting represents a significant
advancement in the field. The ability to model complex relationships, adapt to diverse
data types, and provide actionable insights positions machine learning as a valuable
tool in the financial landscape of real estate. However, it is essential to acknowledge
that machine learning models have limitations, including the need for substantial data
and potential overfitting challenges. Therefore, continued research and innovation in
this area are crucial to refine further and expand the applicability of machine learning
in real estate market volatility forecasting. As the real estate market evolves, stake-
holders must remain vigilant in adopting advanced forecasting methodologies. With
its adaptability and power to uncover hidden patterns within the real estate sector,
machine learning is poised to take on a progressively central role in shaping the
future of financial markets. In a world where data is abundant and decision-making
is increasingly data-driven, embracing machine learning is not only prudent but also
necessary for navigating the complexities of the real estate market.

References

1. Abdul Salam, M.H., Mohd, T., Masrom, S., Johari, N., Mohamad Saraf, M.H.: Machine learning
algorithms on price and rent predictions in real estate: a systematic literature review (2022)
2. Bhatia, A., Chandani, A., Atiq, R., Mehta, M., Divekar, R.: Artificial intelligence in financial
services: a qualitative research to discover robo-advisory services. Qual. Res. Finan. Mark.
13(5), 632–654 (2021)
3. Boukherouaa, E.B., Shabsigh, M.G., AlAjmi, K., Deodoro, J., Farias, A., Iskender, E.S.,
Mirestean, M.A.T., Ravikumar, R.: Powering the digital economy: opportunities and risks
of artificial intelligence in finance. Int. Monetary Fund (2021)
4. Cavalcante, R.C., Brasileiro, R.C., Souza, V.L., Nobrega, J.P., Oliveira, A.L.: Computational
intelligence and financial markets: a survey and future directions. Expert Syst. Appl. 15(55),
194–211 (2016)
5. Cepni, O., Gupta, R., Onay, Y.: The role of investor sentiment in forecasting housing returns
in China: a machine learning approach. J. Forecast. 41(8), 1725–1740 (2022)
6. Cerutti, E., Dagher, J., Dell’Ariccia, G.: Housing finance and real-estate booms: a cross-country
perspective. J. Hous. Econ. 1(38), 1–3 (2017)
7. Cotter, J., Roll, R.: A comparative anatomy of residential REITs and private real estate markets:
returns, risks and distributional characteristics. Real Estate Econ. 43(1), 209–240 (2015)
8. Durusu-Ciftci, D., Ispir, M.S., Yetkiner, H.: Financial development and economic growth: some
theory and more evidence. J. Policy Model. 39(2), 290–306 (2017)
9. Gude, V.: A multi-level modeling approach for predicting real-estate dynamics. Int. J. Housing
Markets Anal. (2023)
10. Gupta, R., Marfatia, H.A., Pierdzioch, C., Salisu, A.A.: Machine learning predictions of housing
market synchronization across US states: the role of uncertainty. J. Real Estate Finance Econ,
1–23 (2022)
11. Habbab, F.Z., Kampouridis, M.: An in-depth investigation of five machine learning algorithms
for optimizing mixed-asset portfolios including REITs. Expert Syst. Appl. 235, 121102 (2024)
12. Han, H., Li, Z., Li, Z.: Using machine learning methods to predict consumer confidence from
search engine data. Sustainability 15(4), 3100 (2023)
13. Hausler, J., Ruscheinsky, J., Lang, M.: News-based sentiment analysis in real estate: a machine
learning approach. J. Prop. Res. 35(4), 344–371 (2018)
14. Henrique, B.M., Sobreiro, V.A., Kimura, H.: Literature review: machine learning techniques
applied to financial market prediction. Expert Syst. Appl. 124, 226–251 (2019)
15. Hsiao, Y.J., Tsai, W.C.: Financial literacy and participation in the derivatives markets. J. Bank.
Finance 1(88), 15–29 (2018)
16. Hu, Y., Ni, J., Wen, L.: A hybrid deep learning approach by integrating LSTM-ANN networks
with GARCH model for copper price volatility prediction. Physica A Stat. Mech. Appl. 557,
124907 (2020)
17. Kabaivanov, S., Markovska, V.: Artificial intelligence in real estate market analysis. In: AIP
Conference Proceedings, vol. 2333, no. 1. AIP Publishing (2021)
18. Lian, Y.M., Li, C.H., Wei, Y.H.: Machine learning and time series models for VNQ market
predictions. J. Appl. Finance Bank. 11(5), 29–44 (2021)
19. Lee, C., Park, K.K.H.: Forecasting trading volume in local housing markets through a time-
series model and a deep learning algorithm. Eng. Constr. Archit. Manag. 29(1), 165–178
(2022)
20. Liow, K.H., Huang, Y.: The dynamics of volatility connectedness in international real estate
investment trusts. J. Int. Finan. Markets. Inst. Money 1(55), 195–210 (2018)
21. Liow, K.H., Zhou, X., Ye, Q.: Correlation dynamics and determinants in international
securitized real estate markets. Real Estate Econ. 43(3), 537–585 (2015)
22. Liow, K.H., Liao, W.C., Huang, Y.: Dynamics of international spillovers and interaction:
evidence from financial market stress and economic policy uncertainty. Econ. Model. 1(68),
96–116 (2018)
23. Loutskina, E., Strahan, P.E.: Financial integration, housing, and economic volatility. J. Financ.
Econ. 115(1), 25–41 (2015)
24. Mohanta, B., Nanda, P., Patnaik, S.: Management of VUCA (volatility, uncertainty, complexity
and ambiguity) using machine learning techniques in industry 4.0 paradigm. New Paradigm
Industry 4.0 IoT Big Data Cyber Phys. Syst, 1–24 (2020)
25. Munawar, H.S., Qayyum, S., Ullah, F., Sepasgozar, S.: Big data and its applications in smart real
estate and the disaster management life cycle: a systematic analysis. Big Data Cogn. Comput.
4(2), 4 (2020)
26. Nagl, C.: Sentiment analysis within a deep learning probabilistic framework–new evidence
from residential real estate in the United States. J. Housing Res., 1–25 (2023)
27. Nazlioglu, S., Gormus, N.A., Soytas, U.: Oil prices and real estate investment trusts (REITs):
Gradual-shift causality and volatility transmission analysis. Energy Econ. 1(60), 168–175
(2016)
28. Ngene, G.M., Wang, J.: Transitory and permanent shock transmissions between real estate
investment trusts and other assets: evidence from time-frequency decomposition and machine
learning. Accounting & Finance 64(1), 539–573 (2023)
29. Park, D., Ryu, D.: A machine learning-based early warning system for the housing and stock
markets. IEEE Access 9, 85566–85572 (2021)
30. Prakash, H., Kanaujia, K., Juneja, S.: Using machine learning to predict housing prices. In:
2023 International Conference on Artificial Intelligence and Smart Communication (AISC),
pp. 1353–1357. IEEE (2023)
31. Rafiei, M.H., Adeli, H.: A novel machine learning model for estimation of sale prices of real
estate units. J. Constr. Eng. Manag. 142(2), 04015066 (2016)
32. Rosenbaum, M., Zhang, J.: On the universality of the volatility formation process: when
machine learning and rough volatility agree (2022). arXiv preprint arXiv:2206.14114
33. Sanyal, S., Biswas, S.K., Das, D., Chakraborty, M., Purkayastha, B.: Boston house price predic-
tion using regression models. In: 2022 2nd International Conference on Intelligent Technologies
(CONIT), pp. 1–6. IEEE (2022)
34. Shu, H.C., Chang, J.H.: Investor sentiment and financial market volatility. J. Behav. Financ.
16(3), 206–219 (2015)
35. Sonkavde, G., Dharrao, D.S., Bongale, A.M., Deokate, S.T., Doreswamy, D., Bhat, S.K.: Fore-
casting stock market prices using machine learning and deep learning models: a systematic
review, performance analysis and discussion of implications. Int. J. Finan. Stud. 11(3), 94
(2023)
36. Song, Y., Ma, X.: Exploration of intelligent housing price forecasting based on the anchoring
effect. Neural Comput. Appl. 18, 1–4 (2023)
37. Tchuente, D., Nyawa, S.: Real estate price estimation in French cities using geocoding and
machine learning. Ann. Oper. Res. 1–38 (2022)
38. Valickova, P., Havranek, T., Horvath, R.: Financial development and economic growth: a meta-
analysis. J. Econ. Surv. 29(3), 506–526 (2015)
39. Verma, A., Nagar, C., Singhi, N., Dongariya, N., Sethi, N.: Predicting house price in India
using linear regression machine learning algorithms. In: 2022 3rd International Conference on
Intelligent Engineering and Management (ICIEM), pp. 917–924. IEEE (2022)
40. Wiradinata, T., Graciella, F., Tanamal, R., Soekamto, Y.S., Saputri, T.R.D.: Post-pandemic
analysis of house price prediction in Surabaya: a machine learning approach (2022)
41. Xu, X., Zhang, Y.: Retail property price index forecasting through neural networks. J. Real
Estate Portfolio Manag. 29(1), 1–28 (2023)
42. Yu, Y., Lu, J., Shen, D., Chen, B.: Research on real estate pricing methods based on data mining
and machine learning. Neural Comput. Appl. 33, 3925–3937 (2021)
Chapter 21
Deep Learning Models in Finance: Past,
Present, and Future

Sai Krishna Vishnumolakala, Sri Raj Gopu, Jatindra Kumar Dash,
Sasikanta Tripathy, and Shailender Singh

Abstract Over the past few decades, the financial industry has shown a keen interest
in using computational intelligence to improve various financial processes. As a
result, a range of models have been developed and published in numerous studies.
However, in recent years, deep learning (DL) has gained significant attention within
the field of machine learning (ML) due to its superior performance compared to
traditional models. There are now several different DL implementations being used
in finance, particularly in the rapidly growing field of Fintech. DL is being widely
utilized to develop advanced banking services and investment strategies. This chapter
provides a comprehensive overview of the current state-of-the-art in DL models for
financial applications. The chapter is divided into categories based on the specific
sub-fields of finance, and examines the use of DL models in each area. These include
algorithmic trading, price forecasting, credit assessment, and fraud detection. The
chapter aims to provide a concise overview of the various DL models being used in
these fields and their potential impact on the future of finance.

Keywords Deep learning · Algorithmic trading · Price forecasting · Fraud
detection · Credit assessment

S. K. Vishnumolakala · S. R. Gopu · J. K. Dash (B) · S. Singh
SRM University AP, Amaravati, Andhra Pradesh 522240, India
e-mail: [email protected]
S. K. Vishnumolakala
e-mail: [email protected]
S. Tripathy
University of Bahrain, Zallaq, Kingdom of Bahrain
S. Singh
Symbiosis Centre for Management Studies, Symbiosis International University, Noida, India

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 453
L. A. Maglaras et al. (eds.), Machine Learning Approaches in Financial Analytics,
Intelligent Systems Reference Library 254,
https://doi.org/10.1007/978-3-031-61037-0_21
21.1 Introduction

Over the past decade, advancements in artificial intelligence (AI) have permeated
almost all areas of human endeavor. Its versatility and adaptability have paved the
way for the execution of tasks with increased precision and efficacy, consequently
transforming traditional industries and institutions. The finance industry, renowned
for its intricate systems and prodigious generation of data, has particularly felt the
seismic shift brought about by the advent of AI.
One significant method of AI implementation is through machine learning, a
system that empowers computers to learn from data and improve from experience
without being explicitly programmed. In the echelons of machine learning, a partic-
ular subset known as deep learning has emerged as a game-changer. Rooted in artifi-
cial neural networks with multiple levels of abstraction, deep learning demonstrates
an exceptional ability to discern and decode complex patterns in large data sets,
mirroring the workings of the human brain in processing data for decision-making.
Deep learning, however, has not yet been fully explored in the context of finance,
and this presents a ripe area of investigation. Recognizing this, the core of this chapter
revolves around examining the role and potential of deep learning within the finance
industry. The primary objective of this study is to investigate the efficacy of deep
learning algorithms in diverse financial areas, including algorithmic trading, price
forecasting, fraud detection, and credit assessment. We aim to draw comparisons
between these novel techniques and the traditional statistical approaches, identifying
their advantages, limitations, and areas of application.
Furthermore, we also aspire to amalgamate ongoing research and experiments in
this field, with the ultimate goal of elucidating the process of adopting deep learning
in financial settings. By spotlighting the challenges, we hope to spur further discourse
on the practical implications and to stimulate innovative solutions that may pave the
way for an even more fruitful use of deep learning in finance.
The structure of this chapter is as follows: we commence with an exposition on
the role of AI and deep learning in the finance sector, building the foundation for our
subsequent exploration. This is followed by a deep dive into specific applications
and case studies of deep learning, dissecting its applications in areas like algorithmic
trading, price forecasting, fraud detection, and credit assessment. Ultimately, we draw
conclusions from our analysis, critically reflect on the implications of our research,
and outline potential areas of inquiry for future studies in this rapidly evolving field.

21.2 Algorithmic Trading and Price Forecasting

The financial sector has notably embraced advancements in information and commu-
nication technologies. Investors are usually driven by opportunities where the advan-
tages of data gathering, decision-making, and trade strategies implementation result
in potential gains. The rise of the Internet has facilitated a shift in the financial
market business towards increased automation, culminating in what is now known
as algorithmic trading (AT).

Fig. 21.1 Algorithmic trading process flow
Algorithmic trading broadly refers to various aspects of financial decision-making
and investment carried out electronically via computers and communication networks
[1]. Every investor and investing firm prioritises the acquisition and rapid deployment
of new technology and algorithms. These algorithms are employed in either adap-
tive or predetermined trading strategies, often developed using artificial intelligence
techniques.
Trading strategies can be automated by creating a series of computer-executed
instructions for identifying investment opportunities and executing orders across
different asset classes. The critical components of these trading processes are shown
in Fig. 21.1.
The trading strategy generation process comprises four steps, as depicted in
Fig. 21.1. It starts with the analysis of market data and relevant external news.
Computer tools, such as spreadsheets or charts, frequently support this analysis,
which is vital for generating a trading signal and strategy. The heart of AT lies
in the trading model and decision-making. The execution of the trading strategy,
which can be automated by a computer, marks the final step of the process. The
many advantages of algorithmic trading, particularly those based on deep learning,
have attracted interest from academics and financial investors. Algorithmic strategy
systems exhibit more stability than traditional human-defined strategy rules, as they
remain unaffected by negative human emotions, such as fear and greed. Research by
Treleaven et al. revealed that algorithmic trading constituted over 70% of all trading in
American stocks in 2011 [2]. Moreover, when Hendershott and Riordan investigated
how algorithmic trading impacted price formation, they found it was more beneficial
for price flow than manual trading [3]. Algorithmic trading positively influenced
market quality, as suggested by research conducted by Boehmer et al. [4].
Numerous deep learning-based algorithmic trading systems have been developed
to achieve a variety of trading objectives. Some systems aim to forecast price trajecto-
ries of financial assets (stocks, indices, bonds, currencies, etc.), some execute trades
based on buying and selling signals, and others generate asset returns by simu-
lating real-world financial scenarios. There are also systems designed to facilitate
independent research, such as pair trading, buying and selling signals, and more.
Since algorithmic trading involves buy-sell decisions made exclusively by mathe-
matical algorithms, these decisions can be supported by straightforward principles,
mathematical models, optimal procedures, or even highly sophisticated function
approximation methods typical of machine learning/deep learning.
Over the past two decades, algorithmic trading has significantly transformed the
financial sector, primarily due to the development of electronic online trading plat-
forms and frameworks. As a result, algorithmic trading models based on deep learning
(DL) began receiving significant interest. The majority of algorithmic trading appli-
cations combine price prediction models for market timing purposes. As such, most
price or trend forecasting algorithms that generate buy-sell signals based on their
predictions are referred to as algorithmic trading systems.
However, some studies propose stand-alone algorithmic trading models that focus
on the dynamics of the transaction, optimising trading parameters like bid-ask spread,
limit order book analysis, position sizing, and more. This topic particularly piques
the interest of researchers studying High-Frequency Trading (HFT). DL models have
subsequently begun to appear in HFT studies.
Hu et al. present a comprehensive review of significant evolutionary algorithmic
implementations on algorithmic trading models [5]. Since algorithmic trading and
financial time series forecasting are closely intertwined, numerous ML survey papers
focus on forecasting-based algorithmic trading models. Those interested in this topic
can refer to [6] for more details.
Most studies on algorithmic trading have concentrated on forecasting stock or
index prices. Long Short-Term Memory (LSTM) has been the most used DL model
in these implementations. In [7], price prediction for algorithmic stock trading was
conducted using Recurrent Neural Networks (RNN) with Graves LSTM, using trade
indicators based on market microstructures as the input. Bao et al. [8] utilised
technical indicators in their work, forecasting stock prices using Wavelet Transforms
(WT), LSTM, and Stacked Autoencoders (SAEs).
The research work presented in [9] combined the implementation of Convolutional
Neural Networks (CNN) and LSTM model structures (with CNN used for stock
selection and LSTM for price prediction) (Fig. 21.2).
Zhang et al. [10] presented an innovative method for stock price prediction with a
State Frequency Memory (SFM) recurrent network with different frequency trading
patterns, which improved prediction and trading performances. Tran et al. [11] devel-
oped a DL model that forecasts price changes through midprice prediction using
high-frequency limit order book data with tensor representation for an HFT trading
system. The authors of [12] utilised Fuzzy Deep Direct Reinforcement Learning
(FDDR) to anticipate stock prices and generate trading signals.

Fig. 21.2 Price prediction using deep learning-based sentiment analysis and corresponding prediction models process flow
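
To make this recurrent-modelling pattern concrete, the following minimal sketch (in Python with the Keras API) shows the typical LSTM setup for one-step-ahead price forecasting. The window length, layer sizes, and the synthetic price series are illustrative assumptions, not parameters drawn from the studies cited above:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Illustrative closing-price series; in practice this comes from market data.
prices = np.cumsum(np.random.randn(1000)) + 100.0

# Supervised samples: a sliding window of the last 30 prices predicts the next one.
window = 30
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]
X = X[..., np.newaxis]  # shape: (samples, timesteps, features)

# Chronological split: never train on observations that follow the test period.
split = int(0.8 * len(X))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

model = Sequential([
    LSTM(32, input_shape=(window, 1)),  # recurrent layer summarises the window
    Dense(1),                           # linear head outputs the next price
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0)
print("test MSE:", model.evaluate(X_test, y_test, verbose=0))

The chronological split is the design choice that matters most here: shuffling the samples before splitting would leak future information into training and flatter the in-sample metrics.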
Noteworthy research exists for index prediction as well. The implementation of
S&P500 index price prediction using LSTM can be found in [13]. For the Greek Stock
Exchange Index prediction, Mourelatos et al. [14] compared the performance of
LSTM and Genetic Algorithm with a Support Vector Regression (GASVR). Chinese
intraday futures market trading model using Deep Reinforcement Learning (DRL)
and LSTM was implemented by Si et al. Yong et al. [15] used the DMLP approach to
forecast Singapore Stock Market index data, considering the Open, Close, High, and
Low values of the time series index data.
Certain studies have addressed trading in cryptocurrencies or forex. The research
[16] developed and evaluated agent-inspired trading using deep (recurrent) reinforce-
ment learning and LSTM in the trading of the GBP/USD. DMLP was used in [17]
to predict trading prices in commodities and foreign exchange. Korczak et al. [18]
used a multi-agent-based trading environment to implement a forex trading (GBP/
PLN) model using various input parameters. One of the agents outperformed all
other models when using CNN for prediction. Spilak et al. [19] used LSTM, RNN,
and DMLP algorithms to construct a dynamic portfolio utilising a variety of cryp-
tocurrencies. Jeong et al. [20] implemented a simple Deep Q-Network (DQN) for the
trading of Bitcoin. This is by no means an exhaustive analysis of all the different types
of models and techniques used in price prediction, but it provides a good overview
of some of the more notable examples. Despite these advancements, there remains
ample room for further exploration and development, particularly with the goal of
improving the effectiveness and applicability of deep learning in algorithmic trading
and price forecasting. Sezer et al. provide a systematic literature review covering
deep learning applications in financial time series forecasting from 2005 to 2019,
highlighting broad implementation areas and substantial impacts in academia and
the finance industry [21]. Another study by Sezer et al. proposes a deep neural-
network-based stock trading system optimized with technical analysis parameters
using genetic algorithms [22]. Navon and Keller present an end-to-end deep learning
approach for financial time series prediction, leveraging raw financial data inputs
to predict temporal trends in NYSE and NASDAQ stocks and ETFs [23]. Troiano
et al. explore using LSTM networks to learn trading rules from market indicators
and trading decisions [24]. Sirignano and Cont uncover universal and stationary
relations between order flow history and price move direction using a large-scale
deep learning approach applied to high-frequency market data [25]. Tsantekidis
et al. develop a deep learning model to detect price change indications in finan-
cial markets, addressing the noisy and stochastic nature of markets [26]. Gudelek
et al. propose a novel method for predicting stock price movements using convolu-
tional neural networks (CNN) with ETFs to avoid high market volatility [27]. Sezer
and Ozbayoglu introduce an algorithmic trading model using a 2-D CNN based
on image processing properties, converting financial time series into 2-D images
with various technical indicators [28]. Hu et al. present a deep stock representa-
tion learning approach from candlestick charts to investment decisions, addressing
limitations in existing stock similarity measurements [29]. Tsantekidis et al. propose
forecasting stock prices using CNNs applied to limit order book data, aiming to detect
repeated patterns of price movements [30]. Gunduz et al. use CNN architecture with
an ordered feature set to predict the intraday direction of Borsa Istanbul 100 stocks
[31]. Chen et al. develop an agent-based reinforcement learning system to mimic
professional trading strategies from large trading records [32]. Wang et al. leverage
deep learning to model the stock market structure as a correlation network of stocks,
improving market structure modeling [33]. Day and Lee use deep learning for finan-
cial sentiment analysis on news articles from financial news providers, enhancing
market sentiment understanding [34]. Sirignano applies deep learning techniques to
model the dynamics of limit order books, aiming to uncover patterns for predicting
future price movements [35]. Gao explores the use of deep reinforcement learning
for time series analysis in trading games, developing models to learn optimal trading
strategies through simulated environments [36].

21.3 Fraud Detection

Fraud detection, one of the most heavily studied topics in finance for deep learning
research, is an area of increasing importance for governments, authorities, and
financial institutions. Financial fraud manifests in various forms, including credit
card fraud, money laundering, consumer credit fraud, tax evasion, bank fraud, and
insurance claim fraud.
Historically, financial institutions relied on rule-based analysis, devised by domain
experts, to detect fraud. These rules were based on general patterns of fraudulent
transactions or events within finance or banking sectors. However, such rule-based
inference only considers a limited set of attributes, as comprehending all possible
patterns is a challenging task. With the advent of deep learning techniques, we can
now process data and recognize both generalized and complex patterns with higher
efficiency, thus potentially increasing the accuracy of financial fraud detection.
Several investigations into accounting and financial fraud detection, including
those by Kirkos et al. [37], Yue et al. [38], Wang et al. [39], Phua et al. [40], Ngai et al.
[41], Sharma and Panigrahi [56], and West et al. [42], have employed soft computing
and data mining approaches. These fraud detection issues typically present as
classification problems and can be viewed through the lens of anomaly detection. A wealth of
research has focused on credit card fraud detection. For example, Heryadi et al.
[43] developed several deep learning models for credit card fraud detection within
Indonesian banks and examined the impact of data imbalance between fraud and non-
fraud data. In a more recent investigation, Roy et al. [44] employed a Long Short-
Term Memory (LSTM) model for credit card fraud detection, while [45] utilized
Deep Multi-layer Perceptron (DMLP) networks to determine whether a credit card
transaction was fraudulent or not (Fig. 21.3).
Sohony et al. [46] used an ensemble of Feedforward Neural Networks (FFNN) to
detect card fraud, while Jurgovsky et al. [47] applied LSTM to identify credit card
fraud from transaction sequences, comparing their results with Random Forest (RF)
models. Other researchers, such as Paula et al. [48], used deep autoencoder-based
anomaly detection to identify financial fraud and money laundering in Brazilian
companies’ export tax claims. In a related investigation, Gomes et al. [49] proposed
an anomaly detection model that used deep autoencoders to identify anomalies in
parliamentary spending during Brazilian elections.
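
A minimal sketch of this autoencoder pattern follows (Python/Keras). The feature count, layer widths, and the reconstruction-error threshold are illustrative assumptions: the model is trained only on presumed-normal transactions, and records it reconstructs poorly are flagged for investigation:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Illustrative standardised transaction features (amount, frequency, and so on).
normal = np.random.randn(5000, 10)

# Encoder compresses to a narrow bottleneck; decoder reconstructs the input.
autoencoder = Sequential([
    Dense(6, activation="relu", input_shape=(10,)),
    Dense(3, activation="relu"),       # bottleneck forces a compact representation
    Dense(6, activation="relu"),
    Dense(10, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(normal, normal, epochs=20, batch_size=64, verbose=0)

def anomaly_scores(x):
    # Reconstruction error: patterns absent from training reconstruct poorly.
    recon = autoencoder.predict(x, verbose=0)
    return np.mean((x - recon) ** 2, axis=1)

# Flag anything above the 99th percentile of training-set error (assumed cut-off).
threshold = np.percentile(anomaly_scores(normal), 99)
new_batch = np.random.randn(100, 10) + np.array([3.0] + [0.0] * 9)  # shifted, suspicious batch
print("flagged for review:", int((anomaly_scores(new_batch) > threshold).sum()))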
Other applications of deep learning in fraud detection include the use of text
mining and DMLP models to identify vehicle insurance fraud [50], DMLP models to
identify fraudulent online payment transactions [51], and the application of LSTM
to character sequences in financial transactions to determine if a transaction was
fraudulent [52]. Goumagias et al. [53] used Deep Q-learning, a form of reinforcement
learning, to predict tax evasion behaviors of risk-averse businesses, providing advice
on maximizing tax revenue for states.
Fraud Detection shares many domain characteristics with Risk Assessment, and
the corresponding choices of models, features, and datasets are likewise closely
related. The most significant distinction that could be drawn is the preference for
customer data over credit data for fraud detection, largely owing to the differing
fundamental dynamics of risk assessment and fraud detection.

Fig. 21.3 Comparison between conventional and deep learning approaches for fraud detection

21.4 Credit Assessment

Financial institutions lend money to customers at interest, acting as cornerstones of
the finance sector. Before lending, these institutions must evaluate the creditworthi-
ness of a potential borrower. Credit assessment is an analysis and evaluation of a
borrower’s ability and willingness to repay a loan, and it plays a crucial role in the
loan approval process.
Credit assessment involves evaluating various factors such as a borrower’s credit
score, employment history, income level, debt-to-income ratio, and other financial
obligations. The lender also evaluates the borrower’s ability to repay the loan based
on their current financial situation and future financial prospects, potentially verifying
any collateral to secure the loan and assessing the borrower’s credit report (Fig. 21.4).
Traditional credit assessment, such as the 5C’s approach (Character, Capital,
Collateral, Capacity, Condition), which dates back to the 1950s [54], required manual
processing. Scorecards were developed to treat all borrowers equally and quantify
the risk of lending money to borrowers.
Credit scores were calculated manually before the advent of machine learning and
deep learning in the finance sector. A financial institution would construct a credit
score model based on its requirements, drawing on its knowledge and experience

Fig. 21.4 Measures for


credit risk scoring
21 Deep Learning Models in Finance: Past, Present, and Future 461

Fig. 21.5 Credit risk analysis process flow

working with borrowers. This credit score model often included a borrower’s credit
history, income, debt-to-income ratio, and other financial details.
Machine learning and deep learning have increasingly been applied to automate
the credit assessment process. These techniques analyze a wide range of data points
related to potential borrowers, including financial information (income, credit history,
debt obligations) and non-financial information (occupation, education, age). The
predictive power of these algorithms can provide a more accurate assessment of a
borrower’s ability to repay a loan. They can also identify patterns and trends in the
data that further improve credit assessment accuracy (Fig. 21.5).
Today, most credit lending institutions follow a two-phase system. Initially, they
calculate the probability of a person defaulting. If the probability is less than a
certain threshold, then the person is classified as a non-defaulter. Based on the input
parameters of the model, the output also includes the internal rate of return on the
credit for the institution, derived from the probability of default. Various deep learning
techniques used today are trained with a similar output structure.
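
The two-phase logic described above can be sketched as follows (Python/scikit-learn). The logistic model, the 15% threshold, and the rate schedule are illustrative assumptions rather than any institution's actual policy:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative borrower features: income, debt-to-income ratio, credit-history length.
rng = np.random.default_rng(0)
X = rng.random((2000, 3))
y = (rng.random(2000) < 0.2).astype(int)  # 1 = defaulted (synthetic labels)

# Phase 1: estimate the probability of default (PD).
pd_model = LogisticRegression().fit(X, y)

def credit_decision(borrower, threshold=0.15, base_rate=0.05):
    pd_hat = pd_model.predict_proba(borrower.reshape(1, -1))[0, 1]
    if pd_hat >= threshold:
        return {"approved": False, "pd": round(pd_hat, 3)}
    # Phase 2: price the loan; the offered rate rises with the estimated PD so
    # that the expected internal rate of return on the credit stays positive.
    rate = base_rate + 0.5 * pd_hat
    return {"approved": True, "pd": round(pd_hat, 3), "rate": round(rate, 3)}

print(credit_decision(np.array([0.6, 0.3, 0.8])))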
In their study, Baesens et al. [55] examined the effectiveness of several classifica-
tion algorithms using eight real-world credit scoring datasets. Their benchmark found
that neural networks and support vector machine classifiers performed well, although
simpler linear models such as logistic regression remained competitive.
In another study, Alaka et al. reviewed and categorised the most recent credit
scoring techniques and datasets. They evaluated classifiers, such as decision tree,
random forest, and gradient boosting on three publicly available datasets. The study’s
findings show that the recent application of machine learning techniques to credit
scoring problems can significantly enhance classification accuracy. A review by
Sharma and Panigrahi examines various data mining techniques used for detecting
financial accounting fraud, providing a comprehensive overview of their strengths
and limitations [56]. Pandey et al. explore the application of machine learning classi-
fiers for credit risk analysis, aiming to improve the accuracy and efficiency of credit
risk prediction [57]. Gunnarsson et al. investigate the use of deep learning for credit
scoring, comparing it with traditional methods and assessing its performance and
reliability [58]. Tripathi et al. present an experimental analysis of various machine
learning methods for credit score classification, evaluating their effectiveness in
accurately classifying credit scores [59].
Overall, machine learning and deep learning techniques can improve the accuracy
and efficiency of credit assessments. They can help financial institutions make more
informed decisions about lending, reduce the risk of bad loans, and provide faster
loan decisions. This can improve customer service and help financial institutions
remain competitive in the rapidly evolving financial industry.

21.5 Summary and Conclusions

In the scope of this chapter, we have analyzed various applications of deep learning,
a promising subset of artificial intelligence, in the domain of finance. These include
algorithmic trading, price forecasting, fraud detection, and credit assessment.
In terms of our objective, our investigation revealed that deep learning shows
significant promise in all areas examined. Deep learning algorithms demonstrated
higher efficiency and accuracy in detecting fraudulent transactions compared to
conventional rule-based methods. They have also shown effectiveness in automating
the credit assessment process, making it more accurate and efficient. However, while
these applications have shown great promise, it’s crucial to remember that these
algorithms are not a panacea for automated decision-making in finance and come
with their own challenges. Relevance to our stated objective lies in understanding the
pivotal role deep learning plays in enhancing and reshaping crucial financial opera-
tions, leading to improved accuracy, efficiency, and insights. The increasing adoption
of deep learning signals a transformation in the finance industry and indicates future
trends.
This research serves as a comprehensive overview of the current state of deep
learning applications in finance, while also providing insight into its challenges and
potential future directions. The conclusions drawn here underline the need for strong
scientific reasoning skills when adopting deep learning in finance and caution against
an over-reliance on in-sample fitting metrics. A keen understanding of the limitations
of forecasting models, as observed during the financial crisis of 2008, should guide
the adoption of these advanced techniques to avoid pitfalls associated with siloed data
extraction and over-reliance on automation. Ultimately, deep learning offers exciting
potential in finance, but its integration requires careful thought, understanding, and
a measured approach.

References

1. Eriksson, S., Roding, C.: Algorithmic Trading Uncovered—Impacts on an Electronic Exchange
of Increasing Automation in Futures Trading. Royal Institute of Technology, Stockholm (2007)
2. Treleaven, P., Galas, M., Lalchand, V.: Algorithmic trading review. Commun. ACM 56, 76–85
(2013). https://doi.org/10.1145/2500117
3. Hendershott, T., Riordan, R.: Algorithmic Trading and Information. University of California,
Berkeley (2009)
4. Boehmer, E., Fong, K., Wu, J.J.: International evidence on algorithmic trading. SSRN Electron.
J. (2012). https://doi.org/10.2139/ssrn.2022034
5. Hu, Y., Liu, K., Zhang, X., Su, L., Ngai, E.W.T., Liu, M.: Application of evolutionary compu-
tation for rule discovery in stock algorithmic trading: a literature review. Appl. Soft Comput.
36, 534–551 (2015)
6. Sezer, O.B., Ozbayoglu, A.M.: Financial trading model with stock bar chart image time series
with deep convolutional neural networks (2019). arXiv preprint arXiv:1903.04610
7. Karaoglu, S., Arpaci, U.: A deep learning approach for optimization of systematic signal
detection in financial trading systems with big data. Int. J. Intell. Syst. Appl. Eng., 31–36
(2017)
8. Bao, W., Yue, J., Rao, Y.: A deep learning framework for financial time series using stacked
autoencoders and long-short term memory. PLoS ONE 12(7), e0180944 (2017)
9. Liu, S., Zhang, C., Ma, J.: CNN-LSTM neural network model for quantitative strategy analysis in
stock markets. Neural Information Processing, pp. 198–206. Springer International Publishing
(2017)
10. Zhang, L., Aggarwal, C., Qi, G.-J.: Stock price prediction via discovering multifrequency
trading patterns. In: Proceedings of the 23rd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining—KDD17, ACM Press, 2017
11. Tran, D.T., Magris, M., Kanniainen, J., Gabbouj, M., Iosifidis, A.: Tensor representation in
high-frequency financial data for price change prediction. In: 2017 IEEE Symposium Series
on Computational Intelligence (SSCI), IEEE, Nov 2017
12. Deng, Y., Bao, F., Kong, Y., Ren, Z., Dai, Q.: Deep direct reinforcement learning for financial
signal representation and trading. IEEE Trans. Neural Netw. Learn. Syst. 28(3), 653–664 (2017)
13. Fischer, T., Krauss, C.: Deep learning with long short-term memory networks for financial
market predictions. Eur. J. Oper. Res. 270(2), 654–669 (2018)
14. Mourelatos, M., Alexakos, C., Amorgianiotis, T., Likothanassis, S.: Financial indices modelling
and trading utilizing deep learning techniques: the Athens SE FTSE/ASE Large Cap use case. In:
2018 Innovations in Intelligent Systems and Applications (INISTA), IEEE, July 2018
15. Yong, B.X., Abdul Rahim, M.R., Abdullah, A.S.: A stock market trading system using deep
neural network. Communications in Computer and Information Science, pp. 356–364. Springer
Singapore (2017)
16. Lu, D.W.: Agent inspired trading using recurrent reinforcement learning and LSTM neural
networks (2017)
17. Dixon, M.F., Klabjan, D., Bang, J.H.: Classification-based financial markets prediction using
deep neural networks. SSRN Electron. J. (2016)
18. Korczak, J., Hernes, M.: Deep learning for financial time series forecasting in a-trader system.
In: Proceedings of the 2017 Federated Conference on Computer Science and Information
Systems, IEEE, September 2017
19. Spilak, B.: Deep neural networks for cryptocurrencies price prediction. Master’s thesis,
Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät (2018)
20. Jeong, G., Kim, H.J.: Improving financial trading decisions using deep q-learning: predicting
the number of shares, action strategies, and transfer learning. Expert Syst. Appl. 117, 125–138
(2019)
21. Sezer, O.B., Gudelek, M.U., Ozbayoglu, A.M.: Financial time series forecasting with deep
learning: a systematic literature review: 2005–2019. arXiv preprint arXiv:1911.13288

22. Sezer, O.B., Ozbayoglu, M., Dogdu, E.: A deep neural-network based stock trading system
based on evolutionary optimized technical analysis parameters. Procedia Comput. Sci. 114,
473–480 (2017)
23. Navon, A., Keller, Y.: Financial time series prediction using deep learning (2017)
24. Troiano, L., Villa, E.M., Loia, V.: Replicating a trading strategy by means of LSTM for financial
industry applications. IEEE Trans. Ind. Inform. 14(7), 3226–3234 (2018)
25. Sirignano, J., Cont, R.: Universal features of price formation in financial markets: perspectives
from deep learning. SSRN Electron. J. (2018)
26. Tsantekidis, A., Passalis, N., Tefas, A., Kanniainen, J., Gabbouj, M., Iosifidis, A.: Using deep
learning to detect price change indications in financial markets. In: 2017 25th European Signal
Processing Conference (EUSIPCO), IEEE, Aug 2017
27. Ugur Gudelek, M., Arda Boluk, S., Murat Ozbayoglu, A.: A deep learning based stock trading
model with 2-D CNN trend detection. In: 2017 IEEE Symposium Series on Computational
Intelligence (SSCI), IEEE, Nov 2017
28. Sezer, O.B., Ozbayoglu, A.M.: Algorithmic financial trading with deep convolutional neural
networks: time series to image conversion approach. Appl. Soft Comput. 70, 525–538 (2018)
29. Hu, G., Hu, Y., Yang, K., Yu, Z., Sung, F., Zhang, Z., Xie, F., Liu, J., Robertson, N., Hospedales,
T., Miemie, Q.: Deep stock representation learning: from candlestick charts to investment
decisions. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP), IEEE, Apr 2018
30. Tsantekidis, A., Passalis, N., Tefas, A., Kanniainen, J., Gabbouj, M., Iosifidis, A.: Forecasting
stock prices from the limit order book using convolutional neural networks. In: 2017 IEEE
19th Conference on Business Informatics (CBI), IEEE, July 2017
31. Gunduz, H., Yaslan, Y., Cataltepe, Z.: Intraday prediction of borsa Istanbul using convolutional
neural networks and feature correlations. Knowl. Based Syst. 137, 138–148 (2017)
32. Chen, C.-T., Chen, A.-P., Huang, S.-H.: Cloning strategies from trading records using agent-
based reinforcement learning algorithm. In: 2018 IEEE International Conference on Agents
(ICA), IEEE, July 2018
33. Wang, Y., Zhang, C., Wang, S., Yu, P.S., Bai, L., Cui, L.: Deep co-investment network learning
for financial assets (2018)
34. Day, M.Y., Lee, C.-C.: Deep learning for financial sentiment analysis on finance news providers.
In: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and
Mining (ASONAM), IEEE, Aug 2016
35. Sirignano, J.: Deep Learning for Limit Order Books (2016)
36. Gao, X.: Deep reinforcement learning for time series: playing idealized trading games (2018)
37. Kirkos, E., Spathis, C., Manolopoulos, Y.: Data mining techniques for the detection of
fraudulent financial statements. Expert Syst. Appl. 32(4), 995–1003 (2007)
38. Yue, D., Wu, X., Wang, Y., Li, Y., Chu, C.-H.: A review of data mining-based financial
fraud detection research. In: 2007 International Conference on Wireless Communications,
Networking and Mobile Computing, IEEE, Sep 2007
39. Wang, S.: A comprehensive survey of data mining-based accounting-fraud detection research.
In: 2010 International Conference on Intelligent Computation Technology and Automation,
IEEE, May 2010
40. Phua, C., Lee, V.C.S., Smith-Miles, K., Gayler, R.W.: A comprehensive survey of data mining-
based fraud detection research (2010). CoRR, abs/1009.6119
41. Ngai, E.W.T., Hu, Y., Wong, Y.H., Chen, Y., Sun, X.: The application of data mining techniques
in financial fraud detection: a classification framework and an academic review of literature.
Decis. Support Syst. 50(3), 559–569 (2011)
42. West, J., Bhattacharya, M.: Intelligent financial fraud detection: a comprehensive review.
Comput. Secur. 57, 47–66 (2016)
43. Heryadi, Y., Warnars, H.L.H.S.: Learning temporal representation of transaction amount for
fraudulent transaction recognition using CNN, stacked LSTM, and CNN-LSTM. In: 2017 IEEE Inter-
national Conference on Cybernetics and Computational Intelligence (CyberneticsCom), IEEE,
Nov 2017

44. Roy, A., Sun, J., Mahoney, R., Alonzi, L., Adams, S., Beling, P.: Deep learning detecting fraud
in credit card transactions. In: 2018 Systems and Information Engineering Design Symposium
(SIEDS), IEEE, Apr 2018
45. Gomez, J.A., Arévalo, J., Paredes, R., Nin, J.: End-to-end neural network architecture for fraud
scoring in card payments. Pattern Recognit. Lett. 105, 175–181 (2018)
46. Sohony, I., Pratap, R., Nambiar, U.: Ensemble learning for credit card fraud detection. In:
Proceedings of the ACM India Joint International Conference on Data Science and Management
of Data—CoDS-COMAD18, ACM Press, 2018
47. Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P.-E., He-Guelton, L., Caelen,
O.: Sequence classification for credit-card fraud detection. Expert Syst. Appl. 100, 234–245
(2018)
48. Paula, E.L., Ladeira, M., Carvalho, R.N., Marzagao, T.: Deep learning anomaly detection as
support fraud investigation in Brazilian exports and anti-money laundering. In: 2016 15th IEEE
International Conference on Machine Learning and Applications (ICMLA), IEEE, Dec 2016
49. Gomes, T.A., Carvalho, R.N., Silva Carvalho, R.: Identifying anomalies in parliamentary
expenditures of Brazilian Chamber of Deputies with deep autoencoders. In: 2017 16th IEEE
International Conference on Machine Learning and Applications (ICMLA), IEEE, Dec 2017
50. Wang, Y., Xu, W.: Leveraging deep learning with LDA-based text analytics to detect automobile
insurance fraud. Decis. Support Syst. 105, 87–95 (2018)
51. Li, L., Zhou, J., Li, X., Chen, T.: Poster: practical fraud transaction prediction. In: ACM
Conference on Computer and Communications Security, 2017
52. de Souza Costa, A.I., Silva, L.: Sequence classification of the limit order book using recurrent
neural networks (2016)
53. Goumagias, N.D., Hristu-Varsakelis, D., Assael, Y.M.: Using deep Q-learning to understand
the tax evasion behavior of risk-averse firms. Expert Syst. Appl. 101, 258–270 (2018)
54. Thomas, L.C., Edelman, D.B., Crook, J.N.: Credit Scoring and Its Applications (2002)
55. Baesens, B., Van Gestel, T., Viaene, S., Stepanova, M., Suykens, J., Vanthienen, J.: Bench-
marking state-of-the-art classification algorithms for credit scoring. J. Oper. Res. Soc. 54(6),
627–635 (2003)
56. Sharma, A., Panigrahi, P.K.: A review of financial accounting fraud detection based on data
mining techniques. Int. J. Comput. Appl. 39(1), 37–47 (2012)
57. Pandey, T.N., Jagadev, A.K., Mohapatra, S.K., Dehuri, S.: Credit risk analysis using machine
learning classifiers. In: 2017 International Conference on Energy, Communication, Data
Analytics and Soft Computing (ICECDS), 2017
58. Gunnarsson, B.R., Vanden Broucke, S., Baesens, B., Óskarsdóttir, M., Lemahieu, W.: Deep
learning for credit scoring: do or don’t? Eur. J. Oper. Res. 295(1), 292–305 (2021)
59. Tripathi, D., Edla, D.R., Bablani, A., Shukla, A.K., Reddy, B.R.: Experimental analysis of
machine learning methods for credit score classification. Prog. Artif. Intell. 10(3), 217–243
(2021)
Chapter 22
New Paradigm in Financial Technology
Using Machine Learning Techniques
and Their Applications

Deepti Patnaik and Srikanta Patnaik

Abstract Due to the inherent risks and challenges associated with financial manage-
ment, researchers have faced a significant obstacle when analyzing financial data.
The necessity for developing innovative models to comprehend financial assets
has become imperative due to the transformation of the foundational principles
underpinning financial markets. In order to provide a precise representation of
data, scholars have introduced various machine learning systems that have shown
promising outcomes. Within the pages of this book chapter, we delve into the progres-
sion of machine learning in the realm of finance over the past decade, with a particular
focus on its applications encompassing Algorithmic Trading, Fraud Detection and
Prevention, Portfolio Management, and Loan Underwriting. Algorithmic Trading
is a methodology that leverages machine learning algorithms to extract knowledge
from data, enabling and enhancing essential investment endeavors. These algorithms
can acquire rules or structures from data in pursuit of objectives such as reducing
prediction errors. In an era where fraudsters continuously evolve and sharpen their
tactics, maintaining constant vigilance is crucial to thwart fraud and stay one step
ahead of malicious actors. It is imperative to be attuned to significant trends that can
distinguish between legitimate and fraudulent transactions. This section compre-
hensively analyzes multiple machine learning algorithms, supported by examples.
Moreover, the chapter delves into the examination of the impact of machine learning
approaches in assessing credit risk and finance. It scrutinizes the limitations of recent
studies and explores emerging research trends in this domain.

Keywords Machine learning · Artificial intelligence · Fraud detection · Credit
scoring · Risk management · Algorithmic trading · Underwriting · Portfolio
management

D. Patnaik (B)
Kalinga University, Raipur, India
e-mail: [email protected]
S. Patnaik
Interscience Institute of Management and Technology, Bhubaneswar, India
e-mail: [email protected]


22.1 Introduction

The field of machine learning (ML), a subset of artificial intelligence (AI), leverages
statistical techniques to imbue computer models with the ability to learn from data,
empowering them to perform specific tasks without explicit programming. A new
era of machine learning and data science is currently unfolding within the realm
of banking, promising to reshape the industry in the years to come. Presently, a
majority of financial institutions, including hedge funds, capital investment firms,
retail banks, and fintech companies, are actively adopting and investing in machine
learning. Consequently, the financial sector is poised to require a growing number
of experts specializing in machine learning and data science [1].
Machine learning has gained prominence in the finance sector, primarily due to the
availability of vast data volumes and increased processing capabilities. Data science
and machine learning have found extensive utility across all facets of finance. While
the primary goal of applying machine learning in finance is to enhance accuracy,
this may not be the sole criterion for evaluating system effectiveness, especially in
the context of financial trading. Profitability and cumulative returns over a defined
trading period emerge as the most crucial metrics for assessing trading strategies [2].
A series of experiments were conducted to investigate the impact of three vari-
ables: the size of the training dataset, the duration of retraining, and the number of
features in both the training and test datasets. The results revealed relatively low
accuracy, with only a marginal improvement over the 50% mark. However, they
also demonstrated highly promising outcomes in terms of profitability. Multiple
references underscore existing flaws in the credit lending system [3]. Machine
learning can unearth novel relationships that human intuition
might never consider exploring, raising ethical and legal considerations regarding its
application.
To ensure the success of machine learning in the banking industry, it is imperative
to construct robust infrastructure, employ the appropriate toolsets, and implement
suitable algorithms. These factors collectively play a pivotal role in harnessing the
potential of machine learning within the banking sector.

22.2 New Paradigm in Financial Technology Using Machine Learning Techniques and Their Applications

Financial technology has been undergoing a significant transformation in recent
years, driven in large part by the application of machine learning techniques. These
technologies are enabling innovative solutions that are reshaping various aspects of
the financial sector. Here’s a look at some new paradigms in financial technology
using machine learning and their applications (Fig. 22.1):

Fig. 22.1 Applications of machine learning (algorithmic trading, credit scoring, fraud detection, portfolio management, options pricing and risk, insurance underwriting, risk management, predictive maintenance)

1. Algorithmic Trading: ML models can analyze historical trading data, news
sentiment, and market indicators to make real-time trading decisions. These
algorithms can execute trades at speeds and frequencies impossible for humans,
optimizing trading strategies and risk management [4]. Algorithmic trading and
ML are closely intertwined in the world of finance, particularly in the domain
of quantitative trading. ML techniques have been instrumental in enhancing the
capabilities of algorithmic trading strategies. Here’s how algorithmic trading and
ML are connected:
• Strategy Development: Traders or quantitative analysts (quants) develop
specific trading strategies based on various factors, including technical indi-
cators, fundamental data, statistical models, and market conditions. These
strategies can be designed to exploit arbitrage opportunities, execute large
orders efficiently, or implement various trading styles. ML is used to develop
trading strategies by analyzing historical market data. ML models can identify
complex patterns and relationships that may not be evident to human traders.
These patterns can be used to formulate algorithmic trading strategies that
aim to predict price movements, market trends, and trading opportunities [5].
• Predictive Analytics: ML algorithms, such as regression models, decision
trees, and neural networks, are employed to make predictions about future
market conditions [6]. For example, ML models can predict stock price
movements, volatility, or market sentiment based on various input features
like historical price data, trading volumes, news sentiment, and economic
indicators.

• Risk Management: Effective risk management is critical in algorithmic
trading to prevent significant losses. Traders must define risk limits, posi-
tion sizing rules, and stop-loss mechanisms to protect their capital. ML
plays a crucial role in managing risk in algorithmic trading. ML models can
assess portfolio risk by analyzing the correlations and dependencies between
different assets in a portfolio. They can also provide real-time risk assessment,
helping traders adjust their positions or hedge against potential losses [4].
• Execution Optimization: ML algorithms are used to optimize the execution
of trades. These algorithms consider factors like market liquidity, order book
dynamics, and transaction costs to execute trades efficiently and minimize
market impact [5].
• Market Microstructure Analysis: ML techniques are applied to analyze
market microstructure data, including order flow, bid-ask spreads, and order
book imbalances. This analysis can inform algorithmic strategies by identi-
fying patterns that are indicative of market manipulation or unusual trading
behavior.
• High-Frequency Trading (HFT): Algorithms can execute trades automati-
cally, sending orders to electronic exchanges or trading platforms. The speed
and efficiency of execution are crucial in high-frequency trading (HFT) strate-
gies. HFT firms rely heavily on ML for their trading strategies. ML algorithms
enable HFT systems to make split-second trading decisions based on real-time
data feeds, helping them capitalize on arbitrage opportunities and market
inefficiencies [7].
• Natural Language Processing (NLP): NLP models are used to analyze
news articles, social media, and other textual data to gauge market sentiment
and news sentiment. This sentiment analysis can inform algorithmic trading
decisions [7].
• Reinforcement Learning: Some algorithmic trading strategies are developed
using reinforcement learning, where the algorithm learns optimal trading poli-
cies through trial and error in simulated trading environments. This approach
can lead to the discovery of novel trading strategies [8].
• Portfolio Optimization: ML can be used to optimize portfolio construction
by selecting the best combination of assets to achieve specific risk and return
objectives. These algorithms consider factors like historical asset performance
and correlations.
• Market Anomaly Detection: ML models can identify unusual market
behavior or anomalies that may indicate opportunities or risks. For example,
anomaly detection can be used to flag potential market manipulation or flash
crashes [9].
• Adaptive Strategies: ML-driven trading algorithms can adapt to changing
market conditions in real time. They can adjust their trading parameters and
strategies based on the evolving market environment [7].
• Backtesting and Simulation: Before deploying an algorithm in live trading,
it is common practice to test it on historical data to assess its performance.
Backtesting helps traders understand how the strategy would have performed in
the past and can identify potential issues. ML is used to backtest and simulate
trading strategies using historical data. This helps traders assess the viability
and profitability of their algorithms before deploying them in live markets.

While ML has brought significant advancements to algorithmic trading, it’s impor-
tant to note that the application of ML in finance also comes with challenges, such as
model interpretability, overfitting, and the risk of unexpected market events. Proper
risk management and ongoing monitoring are essential to ensure the effectiveness
and safety of ML-powered algorithmic trading strategies.
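
As a concrete illustration of how strategy development, backtesting, and periodic retraining fit together, the minimal sketch below (Python/scikit-learn) trains a classifier on lagged returns and evaluates it walk-forward, so the model only ever sees data that precedes the period it trades. The feature set, retraining schedule, and synthetic return series are illustrative assumptions:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
returns = rng.normal(0, 0.01, 1500)          # illustrative daily returns

# Features: the five most recent returns; label: sign of the next return.
lags = 5
X = np.array([returns[i:i + lags] for i in range(len(returns) - lags)])
y = (returns[lags:] > 0).astype(int)

train_len, step = 500, 50                    # retrain every 50 days (assumed)
strategy_returns = []
for start in range(train_len, len(X) - step, step):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X[start - train_len:start], y[start - train_len:start])
    signal = model.predict(X[start:start + step])       # 1 = long, 0 = flat
    strategy_returns.extend(signal * returns[lags + start:lags + start + step])

strategy_returns = np.array(strategy_returns)
sharpe = strategy_returns.mean() / strategy_returns.std() * np.sqrt(252)
print("annualised Sharpe (walk-forward):", round(sharpe, 2))

On pure noise, as here, the walk-forward Sharpe ratio should hover around zero; any apparent edge in a backtest of this kind warrants scrutiny for look-ahead bias before it is trusted.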
2. Credit Scoring: ML is used to assess credit risk by analyzing an individual’s
credit history, transaction data, and other factors. It helps lenders make more
accurate decisions about whether to grant loans and at what interest rates. ML
has had a significant impact on the field of credit scoring, revolutionizing the way
lenders assess the creditworthiness of individuals and businesses. ML techniques
have the potential to improve the accuracy and efficiency of credit scoring models
[10–14]. Here’s how ML is integrated into credit scoring:
• Improved Predictive Models: ML algorithms, such as random forests,
gradient boosting, and neural networks, can analyze a wider range of data
points and patterns than traditional credit scoring models. This allows for
more accurate predictions of credit risk.
• Feature Engineering: ML can help identify relevant features or variables
that traditional models may overlook. It can automatically extract valuable
insights from non-traditional data sources, such as social media activity, online
behavior, or transaction history [14]. ML models can consider a wider array
of features (variables) than traditional models. These features may include not
only credit history but also alternative data sources like social media activity,
online behavior, and transaction data.
• Alternative Data: ML models are adept at incorporating alternative data
sources, such as rental payment history, utility bills, and mobile phone usage,
which can be particularly useful for individuals with limited credit histories
or those without traditional bank accounts.
• Credit Scoring Automation: ML algorithms can automate the credit scoring
process, making it faster and more efficient. This can reduce the time it takes
to make lending decisions and improve the overall customer experience.
• Model Adaptation: ML models can adapt to changing market conditions
and evolving consumer behavior more readily than static, rule-based models.
They can continuously learn and update their predictions as new data becomes
available.
• Ensemble Models: ML allows for the creation of ensemble models that
combine the predictions of multiple algorithms. These ensemble models can
provide more robust and accurate credit risk assessments. Ensemble learning
techniques, such as random forests and gradient boosting, can combine
multiple models to improve predictive accuracy and reduce overfitting.

• Fraud Detection: ML can help detect fraudulent activities or unusual patterns
in credit applications by using anomaly detection algorithms. ML can be
integrated into credit scoring systems to help identify and prevent fraud. It
can analyze transaction patterns and detect suspicious activities in real-time.
• Regulatory Compliance: ML models can be designed to meet regulatory
requirements for credit scoring, ensuring that they comply with fair lending
laws and consumer protection regulations.
Despite the many advantages of ML in credit scoring, there are also challenges
to consider, such as model interpretability, data privacy, and the potential for biased
predictions if not carefully managed [15, 16]. Additionally, regulatory oversight and
compliance with fairness and transparency standards are crucial when implementing
ML in credit scoring to ensure ethical and equitable lending practices [17].
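
A hedged sketch of such a scoring pipeline follows (Python/scikit-learn); the applicant features and synthetic default labels are illustrative assumptions. AUC is reported rather than raw accuracy because default data is typically heavily imbalanced:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 5000
# Illustrative applicant features: income, utilisation, delinquencies, tenure.
X = rng.normal(size=(n, 4))
# Synthetic default labels correlated with utilisation and delinquencies.
logits = 0.8 * X[:, 1] + 1.2 * X[:, 2] - 2.0
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
model = GradientBoostingClassifier().fit(X_tr, y_tr)

scores = model.predict_proba(X_te)[:, 1]     # estimated probability of default
print("AUC:", round(roc_auc_score(y_te, scores), 3))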
3. Fraud Detection: ML models can detect fraudulent transactions by analyzing
patterns and anomalies in transaction data. They can flag potentially suspicious
activities and reduce false positives, saving financial institutions money and
protecting customers. Machine Learning (ML) has proven to be highly effec-
tive in the field of fraud detection across various industries, including finance,
e-commerce, healthcare, and telecommunications [18]. ML techniques enable
organizations to detect fraudulent activities in real-time or during data analysis,
reducing financial losses and enhancing security. Here’s how ML is used for
fraud detection:
• Anomaly Detection: Anomaly detection is a common ML technique used
in fraud detection. ML models learn normal behavior patterns from histor-
ical data and flag any deviations from these patterns as potential anomalies.
For example, abnormal transaction amounts, unusual spending patterns, or
atypical login locations can be indicative of fraud.
• Supervised Learning: In supervised ML, fraud detection models are trained
on labeled data, which includes both legitimate and fraudulent examples.
Common supervised algorithms used for fraud detection include decision
trees, random forests, logistic regression, and support vector machines. These
models learn to distinguish between genuine and fraudulent transactions based
on the provided labels [19].
• Unsupervised Learning: Unsupervised ML techniques are employed when
labeled fraud data is scarce. Clustering algorithms like k-means or density-
based methods can identify unusual patterns or clusters in data that may
indicate fraudulent behavior [19].
• Semi-Supervised Learning: In situations where only a small portion of data
is labeled, semi-supervised learning can be used. This approach combines
labeled and unlabeled data to improve model performance.
• Deep Learning: Deep neural networks, particularly recurrent neural networks
(RNNs) and convolutional neural networks (CNNs), can be used for fraud
detection tasks. Deep learning models can capture complex patterns in sequen-
tial data (e.g., transaction sequences) and unstructured data (e.g., images of
documents) to identify fraud [20].
• Real-Time Monitoring: ML models can be deployed in real-time systems
to monitor transactions and activities as they occur. This enables imme-
diate detection and prevention of fraudulent transactions, reducing financial
losses and customer impact. The trained model is deployed in a real-time or
batch processing system to continuously monitor transactions, applications,
or activities for potential fraud.
• Behavioral Analysis: ML models can analyze user behavior over time to
create profiles of legitimate users. Any deviations from these profiles can
trigger alerts for further investigation.
• Graph Analytics: For detecting network-related fraud, such as identity theft
or collusion, ML models can utilize graph analytics to analyze the rela-
tionships and connections between entities (e.g., individuals or accounts) to
identify suspicious patterns [21].
• Cross-Channel Analysis: Fraud detection systems often analyze data from
multiple channels, such as online transactions, mobile apps, and call centers.
ML helps in identifying fraudulent patterns that span multiple channels [22].
• Model Evaluation and Improvement: Continuous monitoring and evalua-
tion of ML models are essential. Models should be retrained and updated to
adapt to evolving fraud techniques and changes in data patterns.
• Regulatory Compliance: ML models in fraud detection must adhere to regu-
latory and compliance standards, such as GDPR and PCI DSS, to ensure the
privacy and security of customer data.
ML-powered fraud detection systems have the advantage of scalability and adapt-
ability, making them a valuable asset in the ongoing battle against fraud. These
systems can evolve with emerging fraud patterns and provide organizations with the
ability to respond swiftly to new threats [23–25].
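
For the unsupervised case, where labelled fraud examples are scarce, a minimal sketch might look like the following (Python/scikit-learn). The transaction features, the injected outliers, and the contamination rate are illustrative assumptions:

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
# Illustrative features: amount, hour of day, distance from home, merchant risk score.
legit = rng.normal(0, 1, size=(10000, 4))
odd = rng.normal(4, 1, size=(20, 4))  # a handful of injected outliers
transactions = np.vstack([legit, odd])

# Fit on the transaction stream; roughly 0.5% of points assumed anomalous.
detector = IsolationForest(contamination=0.005, random_state=2).fit(transactions)

# predict() returns -1 for points the forest isolates quickly, i.e. outliers.
flags = detector.predict(odd)
print("injected outliers flagged:", int((flags == -1).sum()), "of", len(odd))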
4. Portfolio Management: ML can assist portfolio managers in optimizing
asset allocation by analyzing market data and historical performance. It can
recommend investment strategies and adjust portfolios to meet specific risk and
return objectives [26]. ML is revolutionizing the field of portfolio management
by providing advanced tools and techniques to optimize investment strategies,
manage risk, and make data-driven decisions. Here are several ways in which
ML is applied in portfolio management:
• Predictive Analytics: ML models can analyze historical market data,
economic indicators, and company-specific information to make predictions
about asset prices, market trends, and economic conditions. These predictions
can inform portfolio managers when making investment decisions.
• Asset Allocation Optimization: The task here is to allocate the portfolio across
different asset classes, such as equities, fixed income, real estate, and alter-
native investments, based on the investor’s risk profile and investment goals.
Asset allocation is a critical driver of portfolio performance. ML can optimize
asset allocation by considering a broader range of factors, including historical
performance, correlations, volatility, and economic data. Portfolio optimiza-
tion models can generate the most efficient allocation of assets to maximize
returns while managing risk.
• Risk Management: Risk management begins by evaluating and quantifying
the investor’s risk tolerance and capacity for risk; this assessment helps
determine the appropriate asset allocation and investment strategies. ML
helps in identifying and managing risk in
portfolios. ML models can calculate risk metrics, such as Value at Risk (VaR)
and Conditional Value at Risk (CVaR), and provide insights into potential
portfolio losses under various scenarios.
• Factor-Based Investing: ML models can identify and incorporate relevant
factors (such as market, value, size, and momentum) that drive asset returns.
Factor models help portfolio managers construct portfolios that are exposed
to specific risk factors, enhancing performance.
• Algorithmic Trading Strategies: ML can be applied to develop algorithmic
trading strategies that automatically execute buy and sell orders based on real-
time market data. These strategies aim to capture short-term price movements
and market inefficiencies [9].
• Sentiment Analysis: ML and Natural Language Processing (NLP) can
analyze news articles, social media, and other textual data to gauge market
sentiment. Sentiment analysis can inform portfolio managers about market
sentiment trends and potential impacts on asset prices.
• Reinforcement Learning: Some portfolio management strategies are devel-
oped using reinforcement learning, where the algorithm learns optimal trading
policies through trial and error in simulated trading environments. This
approach can lead to the discovery of novel trading strategies [27].
• Dynamic Portfolio Rebalancing: ML models can continuously monitor port-
folio performance and automatically rebalance the portfolio to align with
predefined goals and risk tolerances. This dynamic approach ensures that
portfolios stay in line with changing market conditions [28].
• Alternative Data Integration: ML can incorporate alternative data sources,
such as satellite imagery, social media sentiment, or supply chain data, to gain
insights into specific industries or companies. These data sources can provide
a competitive advantage in portfolio management.
• Fraud and Anomaly Detection: ML models can identify unusual patterns or
anomalies in trading and portfolio data that may indicate fraudulent activities
or errors in trading execution.
• Market Impact Analysis: ML can simulate the potential impact of large
trades on market prices and liquidity. This analysis helps portfolio managers
execute trades more efficiently while minimizing market impact [29].
• Robo-Advisors: Robo-advisory platforms use ML algorithms to provide auto-
mated and personalized investment advice to retail investors. These platforms
can construct and manage portfolios based on individual risk profiles and
financial goals.

• Regulatory Compliance: ML models in portfolio management must adhere
to regulatory and compliance standards, ensuring that they comply with
financial regulations and risk management guidelines.
ML-driven portfolio management has the potential to enhance portfolio perfor-
mance, reduce risk, and improve decision-making [30]. However, it also comes with
challenges such as model interpretability, data quality, and the need for continuous
monitoring and adaptation to changing market conditions. As technology and data
continue to evolve, the integration of ML in portfolio management is expected to
become even more prominent [31].
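
As one small example of the asset-allocation step, the sketch below (Python/NumPy) computes closed-form minimum-variance weights from a sample covariance matrix. The return series, and the absence of long-only or turnover constraints, are illustrative simplifications:

import numpy as np

rng = np.random.default_rng(3)
# Illustrative daily returns for four assets.
returns = rng.normal(0.0004, 0.01, size=(1000, 4))

cov = np.cov(returns, rowvar=False)          # sample covariance matrix
ones = np.ones(cov.shape[0])

# Closed-form minimum-variance weights: w = C^{-1} 1 / (1' C^{-1} 1).
w = np.linalg.solve(cov, ones)
w /= w.sum()

port_vol = np.sqrt(w @ cov @ w) * np.sqrt(252)
print("weights:", np.round(w, 3), "annualised vol:", round(float(port_vol), 3))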
5. Risk Management: ML models are used to assess and manage various types of
risk, including market risk, credit risk, and operational risk. They can provide
early warnings of potential issues and help financial institutions make informed
decisions [32]. ML is playing an increasingly important role in risk management
across various industries, helping organizations identify, assess, and mitigate
risks more effectively [32–34]. Here’s how ML is applied in risk management:
• Credit Risk Assessment: ML models analyze vast amounts of data to assess
the creditworthiness of individuals and businesses. They consider factors such
as credit history, payment behavior, income, and more to predict the likelihood
of loan defaults. This helps financial institutions make more accurate lending
decisions.
• Market Risk Analysis: ML can analyze historical market data, news senti-
ment, and external factors to model and predict market movements and
volatility. Portfolio managers use these predictions to optimize asset allocation
and manage exposure to market risk.
• Operational Risk Identification: ML can detect operational risks by
analyzing transaction data, employee behavior, and network logs. Unusual
patterns or anomalies in these data can signal potential operational issues,
such as fraud, errors, or system failures.
• Credit Fraud Prevention: ML algorithms can analyze credit card transactions
and detect fraudulent activities, such as unauthorized transactions or identity
theft. Real-time alerts and transaction blocking can be implemented to mitigate
fraud risk.
• Risk Scoring and Modeling: ML can develop risk models that assess the like-
lihood and impact of various risks. These models consider historical data and
identify key risk factors, enabling organizations to prioritize risk mitigation
strategies.
• Supply Chain Risk Management: ML can analyze supply chain data to
identify potential disruptions or bottlenecks. Predictive models can help
organizations anticipate and mitigate supply chain risks [35].
• Credit Portfolio Management: In banking and finance, ML helps manage
credit portfolios by optimizing asset allocation, monitoring credit quality, and
identifying potential issues within a portfolio of loans or investments.

• Cybersecurity Risk: ML models can analyze network traffic and detect


abnormal patterns or intrusion attempts, enhancing cybersecurity risk manage-
ment. They can also assess vulnerabilities in software and infrastructure
[36].
• Natural Disaster Prediction and Response: ML can analyze weather and
environmental data to predict natural disasters like hurricanes or earthquakes.
This information can be used to assess the risk to assets and plan disaster
response strategies.
• Regulatory Compliance: ML is used to ensure regulatory compliance by
monitoring and reporting on various aspects of risk management. This
includes anti-money laundering (AML) and know-your-customer (KYC)
procedures.
• Extreme Event Prediction: ML models can predict extreme events, such as
financial market crashes or supply chain disruptions, by analyzing historical
data and identifying leading indicators.
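As a concrete illustration of the credit risk assessment idea above, the minimal sketch below trains a gradient-boosted classifier to estimate default probabilities on synthetic application data. The features (income, utilization, late payments) and the data-generating process are assumptions, not a real scorecard.

```python
# Minimal sketch: default-probability scoring on synthetic application data.
# Feature names and the generating process are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
income = rng.lognormal(mean=10.5, sigma=0.5, size=n)   # annual income
utilization = rng.uniform(0.0, 1.0, size=n)            # credit-line utilization
late_payments = rng.poisson(0.5, size=n)               # past late payments

# Synthetic default flag: utilization and late payments raise risk
logit = -3.0 + 2.5 * utilization + 0.8 * late_payments - 1e-5 * income
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X = np.column_stack([income, utilization, late_payments])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingClassifier().fit(X_tr, y_tr)
pd_scores = model.predict_proba(X_te)[:, 1]            # predicted P(default)
print("Holdout AUC:", round(roc_auc_score(y_te, pd_scores), 3))
```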
While ML offers significant advantages in risk management, it also presents chal-
lenges, such as model interpretability and data privacy concerns. Continuous moni-
toring and model validation are essential to ensure the effectiveness and reliability
of ML-based risk management systems. As ML technologies continue to evolve,
they are expected to play an even more crucial role in helping organizations navigate
complex and dynamic risk landscapes [37, 38].
6. Insurance Underwriting: In the insurance industry, ML can assess risk factors
and help determine insurance premiums based on a customer’s individual char-
acteristics and historical data [39]. Machine Learning (ML) is revolutionizing the
insurance underwriting process by enabling insurers to make more accurate risk
assessments, streamline operations, and enhance customer experiences. Here’s
how ML is transforming insurance underwriting [39–44]:
• Automated Risk Assessment: ML models can analyze large datasets,
including historical claims data, policyholder information, and external factors
(e.g., weather, economic trends) to assess risk. This helps insurers make more
precise risk predictions.
• Predictive Underwriting: ML algorithms can predict future risks by identi-
fying patterns and trends in historical data. For example, they can predict the
likelihood of an insured event based on factors such as location, demographics,
and past claims.
• Personalized Pricing: ML-driven underwriting enables insurers to offer
personalized pricing to policyholders based on their individual risk profiles.
This can lead to more competitive pricing for lower-risk customers and fairer
premiums overall (a minimal pricing sketch follows this list).
• Enhanced Fraud Detection: ML models can identify fraudulent applications
and claims by analyzing data for suspicious patterns and anomalies. This helps
insurers prevent fraudulent activities and reduce financial losses.
• Data Enrichment: ML can leverage external data sources, such as satellite
imagery, IoT devices, and social media, to enhance underwriting decisions.
For example, satellite imagery can assess property conditions, while IoT data
can monitor vehicle usage.
• Natural Language Processing (NLP): NLP techniques can extract valu-
able information from unstructured text, such as medical records or accident
reports, to assess risk and process claims more efficiently.
• Automation of Routine Tasks: ML can automate routine underwriting tasks,
such as data entry and document verification, reducing manual effort and
improving efficiency.
• Real-time Risk Assessment: ML models can provide real-time risk assess-
ments by continuously analyzing incoming data, allowing insurers to adjust
premiums and coverage as conditions change.
• Dynamic Pricing: ML enables dynamic pricing adjustments based on
changing risk factors. For instance, auto insurance premiums can be adjusted
based on driving behavior and real-time telematics data.
• Claims Prediction: ML models can predict the likelihood and severity of
future claims, helping insurers set aside adequate reserves and plan for
potential losses.
• Customer Segmentation: ML-driven underwriting allows insurers to
segment customers more precisely, tailoring policies and pricing to different
customer profiles and needs.
• Telematics and Usage-Based Insurance: ML can analyze telematics data
from connected devices to assess driver behavior and offer usage-based
insurance policies, which can reward safe driving with lower premiums.
• Compliance and Regulatory Reporting: ML can assist insurers in
automating compliance checks and generating regulatory reports to ensure
adherence to insurance regulations.
• Customer Experience: ML-powered underwriting processes can provide
faster and more convenient customer experiences by reducing the time
required for policy approval and claims processing.
• Risk Mitigation Recommendations: ML models can provide policy-
holders with personalized recommendations for risk mitigation, such as
safety improvements for property owners or driving tips for auto insurance
customers.
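To show how a predicted claim frequency can feed personalized pricing, the minimal sketch below fits scikit-learn's Poisson regression to synthetic policy data and converts the prediction into an indicative premium. The rating features, average claim cost, and loading factor are all illustrative assumptions.

```python
# Minimal sketch: premium indication from a predicted claim frequency.
# Rating features, claim cost, and loading are illustrative assumptions.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(1)
n = 4000
driver_age = rng.integers(18, 80, size=n)
vehicle_age = rng.integers(0, 20, size=n)
mileage_k = rng.uniform(2.0, 30.0, size=n)        # annual mileage, thousands

# Synthetic claim counts: young drivers and high mileage claim more often
rate = np.exp(-2.0 + 0.02 * (40 - driver_age).clip(0) + 0.03 * mileage_k)
claims = rng.poisson(rate)

X = np.column_stack([driver_age, vehicle_age, mileage_k])
model = PoissonRegressor(alpha=1e-3, max_iter=300).fit(X, claims)

avg_claim_cost = 1800.0                           # assumed average severity
loading = 1.25                                    # assumed expense/profit loading
new_customer = np.array([[22, 3, 25.0]])          # age 22, 3-yr-old car, 25k/yr
expected_freq = model.predict(new_customer)[0]
premium = expected_freq * avg_claim_cost * loading
print(f"Expected claims/yr: {expected_freq:.3f}, indicative premium: {premium:.0f}")
```

In practice an insurer would model claim frequency and severity separately and apply actuarial and regulatory constraints on top of any such prediction.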
ML-driven underwriting has the potential to improve profitability for insurers,
reduce fraud, enhance customer satisfaction, and provide more accurate risk assess-
ments. However, it’s essential for insurers to address challenges related to data
privacy, model transparency, and regulatory compliance when implementing ML
solutions in underwriting processes [39].
7. Options Pricing and Risk Management: ML can assist in pricing options and
managing the risk associated with complex derivatives by simulating different
market scenarios [33, 34, 37]. Machine Learning (ML) is increasingly used
in options pricing and risk management within the financial industry, offering
advanced techniques to improve accuracy, speed, and risk assessment. Here’s
how ML is transforming options pricing and risk management:
• Volatility Forecasting: ML models can analyze historical market data to
forecast future volatility. Accurate volatility predictions are crucial for options
pricing as they directly impact option premiums (a minimal Monte Carlo pricing
sketch using an estimated volatility follows this list).
• Implied Volatility Prediction: ML algorithms can predict implied volatility
levels based on market prices. This is particularly useful in understanding how
the market perceives future volatility, which can inform trading decisions.
• Options Pricing Models: ML can enhance options pricing models by incor-
porating more complex factors, such as non-linear relationships and market
sentiment, which traditional models may not capture effectively.
• Risk Scenario Analysis: ML-driven risk management systems can perform
extensive scenario analysis by simulating various market conditions and their
impact on options portfolios. This helps assess potential losses and optimize
risk mitigation strategies.
• Portfolio Optimization: ML can optimize options portfolios by finding the
optimal combination of option contracts and underlying assets to achieve
specific risk-return objectives.
• Algorithmic Trading Strategies: ML models can develop and optimize algo-
rithmic trading strategies for options markets. These strategies can be based
on statistical arbitrage, volatility arbitrage, or other quantitative methods.
• Tail Risk Management: ML can identify tail risk events, such as extreme
market moves, and recommend risk-hedging strategies, including the use of
options like protective puts or collars.
• Risk Analytics: ML-driven risk analytics platforms can provide real-time risk
assessments for options portfolios, including Value at Risk (VaR) calculations,
stress testing, and scenario analysis.
• Algorithmic Execution: ML algorithms can optimize the execution of option
trades by considering factors like market liquidity, order book dynamics, and
transaction costs. This helps minimize execution risk.
• Credit Risk Assessment: ML models assess the credit risk associated with
options contracts, taking into account counterparty risk. This is crucial when
trading options with over-the-counter (OTC) counterparties.
• Market Sentiment Analysis: ML and Natural Language Processing (NLP)
techniques analyze news sentiment, social media chatter, and other textual data
to gauge market sentiment. This can inform options traders about potential
market moves.
• Reinforcement Learning for Option Pricing: Some advanced options
pricing models use reinforcement learning to adapt and learn optimal pricing
strategies over time, considering changing market conditions and dynamics.
• Early Warning Systems: ML-driven early warning systems can detect
abnormal market behavior or volatility patterns that may impact options
portfolios, allowing traders to react swiftly.
• Liquidity Risk Management: ML models analyze liquidity conditions in
options markets, helping traders and risk managers make informed decisions
about trade execution and portfolio management.
• Regulatory Compliance: ML-based risk management systems ensure
compliance with regulatory standards, such as those related to margin
requirements and derivatives reporting.
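As a concrete illustration of scenario simulation for option pricing, the minimal sketch below prices a European call by Monte Carlo under geometric Brownian motion. The volatility input is estimated from synthetic historical returns as a stand-in for an ML forecast; all market parameters are illustrative assumptions.

```python
# Minimal sketch: Monte Carlo pricing of a European call under GBM.
# The volatility estimate stands in for an ML forecast; all parameters
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

# Synthetic daily returns standing in for historical data
hist_returns = rng.normal(0.0004, 0.012, size=252)
sigma = hist_returns.std(ddof=1) * np.sqrt(252)    # annualized volatility

S0, K, r, T = 100.0, 105.0, 0.03, 0.5              # spot, strike, rate, maturity
n_paths = 100_000

# Risk-neutral terminal prices and discounted average payoff
z = rng.standard_normal(n_paths)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
payoff = np.maximum(ST - K, 0.0)
price = np.exp(-r * T) * payoff.mean()
stderr = np.exp(-r * T) * payoff.std(ddof=1) / np.sqrt(n_paths)
print(f"MC call price: {price:.3f} +/- {1.96 * stderr:.3f} (95% CI)")
```

Replacing the constant volatility with a model forecast, and adding variance-reduction techniques such as antithetic variates, are natural next steps.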
ML’s ability to analyze vast datasets, uncover non-linear relationships, and adapt
to changing market conditions makes it a valuable tool in options pricing and risk
management. However, it’s essential to address model interpretability, data quality,
and robustness concerns when implementing ML solutions in this domain to ensure
effective risk management and regulatory compliance [27, 32, 33].
8. Predictive Maintenance: In the context of asset management, ML can predict
when equipment and assets require maintenance or replacement, reducing down-
time and maintenance costs. Predictive maintenance, powered by Machine
Learning (ML), is a data-driven approach used in various industries to predict
when equipment or machinery is likely to fail and proactively schedule mainte-
nance before a breakdown occurs. This approach minimizes downtime, reduces
maintenance costs, and improves operational efficiency. Here’s how ML is
transforming predictive maintenance:
• Data Collection: Sensors, IoT devices, and connected equipment collect vast
amounts of data related to equipment performance, temperature, vibration,
pressure, and more. ML models rely on this data to make predictions.
• Anomaly Detection: ML algorithms can detect anomalies in real-time data
streams. When a deviation from normal behavior is identified, it triggers
maintenance alerts. This approach is particularly effective for early fault
detection (a minimal sketch follows this list).
• Failure Prediction: ML models use historical data to predict when specific
components or systems are likely to fail. These predictions are based on
patterns and trends identified in the data.
• Asset Health Monitoring: ML continuously monitors the health of assets and
provides a real-time assessment of their condition. This allows maintenance
teams to prioritize repairs and replacements based on asset criticality and
wear-and-tear.
• Failure Root Cause Analysis: ML can analyze historical data to identify the
root causes of failures. This helps organizations address underlying issues to
prevent recurring failures.
• Optimal Maintenance Scheduling: ML models consider factors like equip-
ment usage patterns, production schedules, and maintenance costs to optimize
maintenance schedules. This minimizes downtime while ensuring equipment
reliability.
• Prescriptive Maintenance: ML-driven systems not only predict failures
but also prescribe specific maintenance actions to address issues. These
recommendations may include adjusting operating parameters or scheduling
maintenance tasks.
• Prognostics: ML models estimate the remaining useful life (RUL) of equip-
ment or components. This information helps organizations plan maintenance
activities more efficiently.
• Cognitive Diagnostics: ML-powered systems can provide detailed diagnostic
information about the condition of equipment and the severity of issues,
helping maintenance teams prioritize tasks.
• Supply Chain Optimization: Predictive maintenance can be integrated with
supply chain management systems to ensure that necessary parts and materials
are available when maintenance is scheduled.
• Cost Reduction: By avoiding unplanned downtime and conducting mainte-
nance only when needed, organizations can significantly reduce maintenance
costs and improve the cost-effectiveness of their operations.
• Asset Performance Optimization: ML can identify opportunities to optimize
asset performance, such as adjusting operating parameters or implementing
energy-efficient practices.
• Integration with Enterprise Systems: Predictive maintenance systems can
be integrated with other enterprise systems, such as Enterprise Resource Plan-
ning (ERP) and Computerized Maintenance Management Systems (CMMS),
for seamless maintenance planning and execution.
• Continuous Learning: ML models continuously learn and adapt as new data
becomes available, improving prediction accuracy over time.
• Safety Improvements: Predictive maintenance helps reduce the risk of acci-
dents and safety incidents by addressing equipment issues before they become
critical.
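To ground the anomaly detection idea above, the minimal sketch below fits an Isolation Forest to synthetic "healthy" sensor telemetry and flags deviating readings. The sensor channels (temperature, vibration) and the values are assumptions, not real equipment data.

```python
# Minimal sketch: flagging anomalous sensor readings with an Isolation Forest.
# Sensor channels and values are illustrative assumptions, not real telemetry.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)

# Synthetic "healthy" training data: temperature (C) and vibration (mm/s)
healthy = np.column_stack([rng.normal(70.0, 3.0, 2000),
                           rng.normal(2.0, 0.3, 2000)])

detector = IsolationForest(contamination=0.01, random_state=0).fit(healthy)

# New readings: three normal samples and one overheating, high-vibration one
new_readings = np.array([[69.5, 2.1], [71.2, 1.8], [70.4, 2.3], [88.0, 5.5]])
labels = detector.predict(new_readings)            # +1 = normal, -1 = anomaly
for reading, label in zip(new_readings, labels):
    print(reading, "ALERT" if label == -1 else "ok")
```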
Predictive maintenance, powered by ML, is an essential tool for industries that
rely on machinery and equipment to maintain smooth operations. It not only extends
the lifespan of assets but also enhances overall operational efficiency and safety.

22.3 Conclusion

These are just a few examples of how ML is transforming the finance industry. As
technology and data continue to advance, the applications of ML in finance are likely
to expand even further, leading to more efficient and innovative financial services.
ML has become an indispensable tool across diverse areas of the financial sector, encom-
passing asset management, risk evaluation, investment advisory, anti-financial crime
efforts, document verification, and beyond. As ML algorithms handle a myriad of
functions, they continually evolve through data-driven learning, propelling the evolu-
tion of a fully automated financial landscape. This chapter has explored several of
the boundless potential that machine learning brings to financial technology. In
the years ahead, ongoing research in the field may see ML making substantial contri-
butions to the analysis of the financial industry. These applications represent just
a fraction of the innovative ways machine learning is reshaping the financial tech-
nology landscape. As technology continues to advance, we can expect even more
transformative developments in the intersection of finance and artificial intelligence.
However, it’s important to note that these advancements also raise important ques-
tions regarding data privacy, security, and ethical considerations, which need to be
carefully addressed as the industry evolves.

References

1. Warin, T., Stojkov, A.: Machine learning in finance: a metadata-based systematic review of the
literature. J. Risk Financ. Manag. 14(7), 302 (2021). https://doi.org/10.3390/jrfm14070302
2. Gerlein, E.A., McGinnity, M., Belatreche, A., Coleman, S.: Evaluating machine learning clas-
sification for financial trading: An empirical approach. Expert. Syst. Appl. (2016). https://doi.
org/10.1016/j.eswa.2016.01.018
3. Kodru, S.S.: Machine learning applications in finance (2021). http://hdl.handle.net/1920/12227
4. Jansen, S.: Machine Learning for Algorithmic Trading, 2nd edn. Packt Publishing (2020)
5. Huang, B., Huan, Y., Xu, L.D., Zheng, L., Zou, Z.: Automated trading systems statistical and
machine learning methods and hardware implementation: a survey (2019). https://doi.org/10.
1080/17517575.2018.1493145
6. Huang, Z., Li, N., Mei, W., Gong, W.: Algorithmic trading using combinational rule vector and
deep reinforcement learning. Appl. Soft Comput., 110802 (2023). ISSN 1568-4946. https://
doi.org/10.1016/j.asoc.2023.110802
7. Dubey, R.K.: Algorithmic trading: the intelligent trading systems and its impact on trade size.
Expert. Syst. Appl. 202, 117279 (2022). https://doi.org/10.1016/j.eswa.2022.117279
8. Majidi, M., Shamsi, M., Marvasti, F.: Algorithmic trading using continuous action space deep
reinforcement learning. Expert. Syst. Appl. 235, 121245 (2024). ISSN 0957-4174. https://doi.
org/10.1016/j.eswa.2023.121245
9. Ning, L.: A Machine Learning Approach to Automated Trading. Boston College Computer
Science Senior, Boston, MA (2016)
10. Markov, A., Seleznyova, Z., Lapshin, V.: Credit scoring methods: latest trends and points to
consider. J. Financ. Data Sci. 8, 180–201 (2022). ISSN 2405-9188. https://doi.org/10.1016/j.
jfds.2022.07.002
11. Bueff, A.C., Cytryński, M., Calabrese, R., Jones, M., Roberts, J., Moore, J., Brown, I.: Machine
learning interpretability for a stress scenario generation in credit scoring based on counterfac-
tuals. Expert. Syst. Appl. 202, 117271 (2022). ISSN 0957-4174. https://doi.org/10.1016/j.eswa.
2022.117271
12. Liu, W., Fan, H., Xia, M., Xia, M.: A focal-aware cost-sensitive boosted tree for imbalanced
credit scoring. Expert. Syst. Appl. 208, 118158 (2022). ISSN 0957-4174. https://doi.org/10.
1016/j.eswa.2022.118158
13. Liu, W., Fan, H., Xia, M.: Tree-based heterogeneous cascade ensemble model for credit scoring.
Int. J. Forecast. (2022). ISSN 0169-2070. https://doi.org/10.1016/j.ijforecast.2022.07.007
14. Albanesi, S., DeGiorgi, G., Nosal, J.: Credit growth and the financial crisis: a new narrative. J.
Monet. Econ. 132, 118–139 (2022). ISSN 0304-3932. https://doi.org/10.1016/j.jmoneco.2022.
09.001
15. Helder, V.G., Filomena, T.P., Ferreira, L., Kirch, G.: Application of the VNS heuristic for feature
selection in credit scoring problems. Mach. Learn. Appl. 9, 100349 (2022). ISSN 2666-8270.
https://doi.org/10.1016/j.mlwa.2022.100349
16. Simumba, N., Okami, S., Kodaka, A., Kohtake, N.: Multiple objective metaheuristics for feature
selection based on stakeholder requirements in credit scoring. Decis. Support. Syst. 155, 113714
(2022). ISSN 0167-9236. https://doi.org/10.1016/j.dss.2021.113714
17. Lee, K., Lee, H., Lee, H., Yoon, Y., Lee, E., Rhee, W.: Assuring explainability on demand
response targeting via credit scoring. Energy 161, 670–679 (2018). ISSN 0360-5442. https://
doi.org/10.1016/j.energy.2018.07.179
18. Rodrigues, V.F., Policarpo, L.M., da Silveira, D.E., da Rosa Righi, R., da Costa, C.A., Victória
Barbosa, J.L., Antunes, R.S., Scorsatto, R., Arcot, T.: Fraud detection and prevention in e-
commerce: a systematic literature review. Electron. Commer. Res. Appl. 56, 101207 (2022).
ISSN 1567-4223. https://doi.org/10.1016/j.elerap.2022.101207
19. Khatri, S., Arora, A., Agrawal, A.P.: Supervised machine learning algorithms for credit card
fraud detection: a comparison (2020). https://doi.org/10.1109/Confluence47617.2020.9057851
20. Raghavan, P., Gayar, N.E.: Fraud detection using machine learning and deep learning (2019).
https://doi.org/10.1109/ICCIKE47802.2019.9004231
21. Sun, H., Li, J., Zhu, X.: Financial fraud detection based on the part-of-speech features of
textual risk disclosures in financial reports. Procedia Comput. Sci. 221, 57–64 (2023). ISSN
1877-0509. https://doi.org/10.1016/j.procs.2023.07.009
22. Fanai, H., Abbasimehr, H.: A novel combined approach based on deep autoencoder and deep
classifiers for credit card fraud detection. Expert. Syst. Appl. 217, 119562 (2023). ISSN 0957-
4174. https://doi.org/10.1016/j.eswa.2023.119562
23. Shirgave, S., Awati, C., More, R., Patil, S.: A review on credit card fraud detection using
machine learning (2019)
24. Yi, Z., Cao, X., Pu, X., Wu, Y., Chen, Z., Khan, A.T., Francis, A., Li, S.: Fraud detection in
capital markets: a novel machine learning approach. Expert. Syst. Appl. 231, 120760 (2023).
ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2023.120760
25. Cao, R., Wang, J., Mao, M., Liu, G., Jiang, C.: Feature-wise attention based boosting ensemble
method for fraud detection. Eng. Appl. Artif. Intell. 126, 106975 (2023). ISSN 0952-1976.
https://doi.org/10.1016/j.engappai.2023.106975
26. Pozen, R.C., Ruane, J.: What machine learning will mean for asset managers (2019). https://
hbr.org/2019/12/what-machine-learning-will-mean-for-asset-managers
27. Soleymani, F., Paquet, E.: Financial portfolio optimization with online deep reinforcement
learning and restricted stacked autoencoder—deep breath (2020). https://doi.org/10.1016/j.
eswa.2020.113456
28. Wang, Z., Huang, B., Tu, S., Zhang, K., Xu, L.: Deep trader: a deep reinforcement learning
approach for risk-return balanced portfolio management with market conditions embedding
(2021). https://doi.org/10.1609/aaai.v35i1.16144
29. Tan, Z., Yan, Z., Zhu, G.: Stock selection with random forest: an exploitation of excess return
in the Chinese stock market (2019). https://doi.org/10.1016/j.heliyon.2019.e02310
30. Chuan, Y., Zhao, C., He, Z., Wu, L.: The success of adaboost and its application in portfolio
management (2021). https://doi.org/10.1142/S2424786321420019
31. Jiang, Z., Ji, R., Chang, K.-C.: A machine learning integrated portfolio rebalance framework
with risk-aversion adjustment (2020). https://doi.org/10.3390/jrfm13070155
32. Jomthanachai, S., Wong, W.-P., Lim, C.-P.: An application of data envelopment analysis and
machine learning approach to risk management. IEEE Access 9, 85978–85994 (2021). https://
doi.org/10.1109/ACCESS.2021.3087623
33. Liu, Y.: Artificial intelligence and machine learning based financial risk network assessment
model. In: 2023 IEEE 12th International Conference on Communication Systems and Network
Technologies (CSNT), Bhopal, India, 2023, pp. 158–163. https://doi.org/10.1109/CSNT57126.
2023.10134653.
34. Dominguez, G.A., Kawaai, K., Maruyama, H.: FAILS: a tool for assessing risk in ML systems.
In: 2021 28th Asia-Pacific Software Engineering Conference Workshops (APSEC Workshops),
Taipei, Taiwan, 2021, pp. 1–4. https://doi.org/10.1109/APSECW53869.2021.00010
35. Aljabhan, B.: Economic strategic plans with supply chain risk management (SCRM) for organi-
zational growth and development. Alex. Eng. J. 79, 411–426 (2023). ISSN 1110-0168. https://
doi.org/10.1016/j.aej.2023.08.020
36. d’Ambrosio, N., Perrone, G., Romano, S.P.: Including insider threats into risk management
through Bayesian threat graph networks. Comput. Secur. 133, 103410 (2023). ISSN 0167-4048.
https://doi.org/10.1016/j.cose.2023.103410
37. Chakabva, O., Tengeh, R.K.: The relationship between SME owner-manager characteristics
and risk management strategies. J. Open Innov. Technol. Mark. Complex. 9(3), 100112 (2023).
ISSN 2199-8531. https://doi.org/10.1016/j.joitmc.2023.100112
38. Yun, J.: The effect of enterprise risk management on corporate risk management. Financ. Res.
Lett. 55, 103950 (2023). ISSN 1544-6123. https://doi.org/10.1016/j.frl.2023.103950
39. Tan, Y., Zhang, G.-J.: The application of machine learning algorithm in underwriting process.
In: 2005 International Conference on Machine Learning and Cybernetics, vol. 6, Guangzhou,
China, 2005, pp. 3523–3527. https://doi.org/10.1109/ICMLC.2005.1527552
40. Vandervorst, F., Verbeke, W., Verdonck, T.: Data misrepresentation detection for insurance
underwriting fraud prevention. Decis. Support. Syst. 159, 113798 (2022). ISSN 0167-9236.
https://doi.org/10.1016/j.dss.2022.113798
41. Linnér, R.K., Koellinger, P.D.: Genetic risk scores in life insurance underwriting. J. Health
Econ. 81, 102556 (2022). ISSN 0167-6296. https://doi.org/10.1016/j.jhealeco.2021.102556
42. Dubey, A., Parida, T., Birajdar, A., Prajapati, A.K., Rane, S.: Smart underwriting system: an
intelligent decision support system for insurance approval & risk assessment. In: 2018 3rd
International Conference for Convergence in Technology (I2CT), Pune, India, 2018, pp. 1–6.
https://doi.org/10.1109/I2CT.2018.8529792
43. Doultani, M., Bhagchandani, J., Lalwani, S., Palsule, M., Sahoo, A.: Smart underwriting—a
personalised virtual agent. In: 2021 5th International Conference on Intelligent Computing and
Control Systems (ICICCS), Madurai, India, 2021, pp. 1762–1767. https://doi.org/10.1109/ICI
CCS51141.2021.9432216
44. Nikolopoulos, C., Duvendack, S.: A hybrid machine learning system and its application to insur-
ance underwriting. In: Proceedings of the First IEEE Conference on Evolutionary Computation.
IEEE World Congress on Computational Intelligence, vol. 2, Orlando, FL, 1994, pp. 692–695.
https://doi.org/10.1109/ICEC.1994.349974
