Generative Adversarial Networks

Seminar paper
European University Viadrina
Department of Finance and Capital Market Theory
Neural Networks in Finance
Dr. Rick Steinert

Submitted by: Shokhrukhsora Askarova - 89812,
Otabek Askarov - 135712,
Farrukh Mirzaev - 91857
Abstract

This seminar paper explores the landscape of Generative Adversarial Networks (GANs)
through a literature review of prominent GAN models. It then focuses on the
development of Wasserstein GAN with Gradient Penalty (WGAN-GP), highlighting its
significance in the evolution of GANs. The paper further introduces the concept of
oversampling with neural networks for loan data and demonstrates the application of
WGAN-GP in generating synthetic loan data for oversampling purposes. Finally, the
paper discusses the issues and choices involved in the application and implementation
of WGAN-GP for this specific task.

Contents

1. Introduction
2. Methodology
2.1 Generative Adversarial Network
2.1.1 The Discriminator
2.1.2 The Generator
2.1.3 Loss function
2.1.4 Optimizers
2.1.5 Challenges
2.2 Wasserstein GAN
2.2.1 The WGAN with Gradient Penalty
2.3 Oversampling with WGAN-GP
3. Dataset
4. Application of WGAN-GP
5. Model evaluation
6. Conclusion
References

1. Introduction

In the domain of finance, neural networks have brought transformative opportunities in
data analysis, prediction, and decision-making. This seminar paper explores the
application of neural networks in finance, with a particular focus on Generative
Adversarial Networks (GANs). GANs are powerful models capable of generating
synthetic data that closely resembles real financial data, enabling various applications in
financial analysis, risk management, and decision support systems.

Obtaining adequate volumes of high-quality, realistic loan data is a key challenge in
finance. Due to privacy restrictions, real-world loan data with details like amount,
income, credit history, etc. is difficult and expensive to collect at scale. Yet robust loan
prediction models require large, representative datasets encompassing diverse economic
conditions. GANs offer a novel solution by learning to generate new synthetic loan
profiles that protect sensitive customer information while capturing the complexity of
real loans. Among the various GAN models, we will extensively examine the
Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP). This
variation addresses limitations of the original GAN model, such as training instability
and mode collapse. By incorporating a gradient penalty term, WGAN-GP enforces the
Lipschitz constraint on the discriminator, leading to more stable training and improved
convergence, which is crucial in finance where robust and accurate data generation is
essential for making informed decisions. WGAN-GP has shown promising results in
generating synthetic financial data, including time series data, stock market data, and
credit risk data. Financial analysts and researchers can leverage these generated datasets
for various applications, such as backtesting trading strategies, simulating economic
scenarios, and stress testing financial systems. Furthermore, WGAN-GP can be used for
anomaly detection and risk assessment in financial data, effectively identifying
fraudulent transactions, unusual market behavior, or potential risks in a financial
portfolio. This study aims to shed light on the applications of WGANs in generating
synthetic loan data and demonstrate their effectiveness as a powerful tool for
data-driven modeling in the finance industry.

2. Methodology

2.1 Generative Adversarial Network


One of the most fascinating fields of artificial intelligence research is the generative
adversarial network (GAN), which has drawn a lot of attention for its exceptional data
generation capabilities. GANs are deep neural network architectures composed of two
competing networks: a generator and a discriminator. The Generator (G) learns to create
samples that resemble real data, while the Discriminator (D) learns to distinguish
between real and fake data; the model is trained by alternately optimizing the two
objective functions.1 Such a framework has enormous potential because, in principle, it
can be trained to reproduce any data distribution.

2.1.1 The Discriminator


The Discriminator is a classifier: it attempts to differentiate between real data and
synthetic data received from G. Depending on the type of data being classified, it can
use a variety of different network architectures. The discriminator network is connected
to two loss functions that are used in different training phases. After classifying data as
real or synthetic, the Discriminator loss penalizes it for incorrect classifications, and its
weights are updated through the network via backpropagation of this loss.2
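
As an illustration only (a minimal PyTorch sketch and an assumed architecture, not the
one used in this paper), a discriminator for tabular loan data with a handful of numeric
features could be a small fully connected classifier:

```python
import torch.nn as nn

# Minimal sketch of a discriminator for tabular data (assumed architecture, 4 numeric
# features as in the dataset used later). It outputs the probability that a sample is real.
D = nn.Sequential(
    nn.Linear(4, 64),
    nn.LeakyReLU(0.2),
    nn.Linear(64, 64),
    nn.LeakyReLU(0.2),
    nn.Linear(64, 1),
    nn.Sigmoid(),        # probability of "real"; a WGAN critic would drop this layer
)
```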

2.1.2 The Generator


The Generator network G learns to produce synthetic data that, in an ideal world,
closely mimics the original data in important ways, using feedback from the
Discriminator D. Its objective is for the Discriminator to classify the generated data as
real. The generator takes some random input, a noise vector, and produces an output
that is then assessed by the discriminator; when that output fails to fool the
discriminator, the resulting Generator loss penalizes the generator. By injecting noise
and sampling from various locations within the target distribution, a GAN is able to
generate a broad range of outputs. The noise is typically drawn from a simple
distribution, such as a uniform distribution.

1 Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y. (2014)
2 Florian Eckerli and Joerg Osterrieder (2021)
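
Similarly, a generator for the same setting could map a uniform noise vector to a
synthetic feature vector. This is a hypothetical sketch with an assumed noise dimension
of 64:

```python
import torch
import torch.nn as nn

NOISE_DIM = 64   # assumed size of the noise vector

# Minimal sketch of a generator mapping noise to a 4-dimensional synthetic loan record.
G = nn.Sequential(
    nn.Linear(NOISE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 4),
)

z = torch.rand(32, NOISE_DIM)   # noise sampled from a uniform distribution, as above
synthetic_batch = G(z)          # 32 synthetic samples with 4 features each
```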

2.1.3 Loss function


The implementation of Generative Adversarial Networks (GANs) may seem
straightforward in theory, but in practice, they often exhibit learning challenges that
deviate from initial expectations. One prominent factor contributing to this phenomenon
is the usage of overly simplistic loss functions in the training process. These loss
functions, which guide the optimization of GANs, might lack the necessary complexity
and sophistication to effectively capture the underlying data distribution and thereby
hinder the desired learning behavior. As a result, the GAN model may struggle to
converge to the desired solution, leading to suboptimal or unexpected outcomes during
training.

The standard GAN loss function, introduced in the seminal paper "Generative
Adversarial Networks" by Goodfellow et al. (2014), operates on a min-max game
framework where the generator aims to minimize the loss while the discriminator
endeavors to maximize it. This formulation was initially perceived as effective, and it
involves two components: the Discriminator loss and the Generator loss:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big] \qquad (1)^{3}$$

The generator attempts to minimize this function while the discriminator attempts to
maximize it. Viewing the loss as a min-max game makes the formulation appear useful;
in practice, however, it saturates for the generator: if the generator cannot keep up with
the discriminator, it receives vanishing gradients and usually stops training. This and
other challenges will be addressed later in this section.
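
To make the alternating optimization concrete, one training step with the original
binary cross-entropy formulation of this loss could look as follows. This is a hedged
sketch only, reusing the MLP generator G and discriminator D from the previous
subsections, with illustrative learning rates and noise dimension; it is not the code used
later for WGAN-GP:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)   # illustrative values
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)

def train_step(real_batch, noise_dim=64):
    batch_size = real_batch.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z)))
    z = torch.rand(batch_size, noise_dim)        # uniform noise, as described above
    fake_batch = G(z).detach()                   # do not backpropagate into G here
    d_loss = bce(D(real_batch), real_labels) + bce(D(fake_batch), fake_labels)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: try to make D classify generated samples as real
    z = torch.rand(batch_size, noise_dim)
    g_loss = bce(D(G(z)), real_labels)           # non-saturating variant used in practice
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```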

3 Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y. (2014)
2.1.4 Optimizers
Optimizers play a crucial role in training neural networks by adjusting model
parameters to minimize the loss function. The learning rate is a key hyperparameter that
determines the step size of parameter updates during training. Gradient descent-based
optimizers, like Adam, are commonly used in deep learning and help find the local
minimum of a differentiable function by iteratively descending along the steepest
direction. Adam is an optimization algorithm widely used in various deep learning
applications, including computer vision and natural language processing. It extends
stochastic gradient descent by maintaining exponentially decaying averages of past
gradients and past squared gradients.4 This adaptive approach allows Adam to
dynamically adjust the learning rate for each parameter, leading to faster convergence
and improved performance. In the realm of GANs, different optimizers are explored,
but Adam remains a popular choice due to its effectiveness in optimizing complex
models like GANs.5
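
For reference, the Adam update from Kingma and Ba (2015) maintains the first-moment
estimate $m_t$ and the second-moment estimate $v_t$ of the gradient $g_t$ and applies a
bias-corrected step:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,$$
$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},$$

where $\alpha$ is the learning rate and $\beta_1, \beta_2$ control the decay of the moment estimates.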

2.1.5 Challenges
Despite their ability to generate highly accurate synthetic samples, GANs are also
known to present challenges in the training process. The simultaneous training of two
networks introduces complexities as updating the parameters of one model changes the
optimization problem. This dynamic system becomes harder to control, leading to
non-convergence, which is a common issue in GAN training. Unlike the standard
optimization process that seeks the lowest point of a loss function for deep models, in
the context of GANs, the gradients of the two networks may conflict, preventing
convergence and failing to reach an optimal minimum.6 To put it simply, if the
Generator becomes too proficient and quickly outsmarts the Discriminator, it may not
receive meaningful feedback to improve further. As a result, the Generator could start
training on incorrect feedback, leading to a decline in the quality of the generated
output, often referred to as a collapse in output quality. This situation undermines the
effectiveness of the GAN model and hinders its ability to produce realistic and diverse
samples.
4 Diederik P. Kingma, Jimmy Lei Ba (2015)
5 Florian Eckerli and Joerg Osterrieder (2021)
6 Florian Eckerli and Joerg Osterrieder (2021)
Mode collapse occurs when GAN training goes wrong and the generator ends up
producing only a limited, repetitive set of samples. Instead of covering the full data
distribution, the generator keeps producing similar-looking outputs; the discriminator
adapts to this and the learning process stalls too early, leaving a lack of variety in the
generated data.7

The generator's training can fail when the discriminator becomes too good at
distinguishing real vs fake data. This happens because an optimal discriminator
provides insufficient feedback for the generator to learn properly. This issue is known as
the vanishing gradient problem - when the gradient propagates back through the
generator during training, it shrinks rapidly due to repeated multiplication. By the time
the gradients reach the initial layers of the generator, they become so small that they do
not modify the weight values. As a result, the early layers of the generator stop learning,
causing its training to slow down dramatically and eventually halt. The core reason is
that an overly strong discriminator gives sparse and uninformative gradients, leading to
the vanishing gradient effect that hinders effective generator training.8

Since the original GAN was introduced, many modified and new loss functions have
been proposed for GAN training. These new loss variants aim to address some of the
major challenges faced in GANs that we discussed earlier. For instance, the FIN-GAN
model explored using the original GAN to generate synthetic financial time series
matching key data properties.9 The cGAN model conditions the GAN on auxiliary
inputs to guide and control the generation process.10 Other notable financial applications
include CorrGAN, which utilizes a DCGAN architecture to generate realistic correlation
matrices, and QuantGAN, which leverages temporal convolutional networks to better
capture long-range dependencies in time series data.11 The recent MAS-GAN
framework demonstrates using a self-attention GAN discriminator to optimize
agent-based market simulator models. This provides an overview of some key GAN
innovations.12 One particularly influential model is WGAN-GP, which we will focus on
in this paper.

7 Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y. (2014)
8 Florian Eckerli and Joerg Osterrieder (2021)
9 Shuntaro Takahashi, Yu Chen, and Kumiko Tanaka-Ishii (2019)
10 Mehdi Mirza and Simon Osindero (2014)
11 Alec Radford, Luke Metz, and Soumith Chintala (2015)

2.2 Wasserstein GAN


Wasserstein GAN is one of the most powerful alternatives to the original GAN loss. It
tackles the problems of mode collapse and vanishing gradients. As an alternative to the
original GAN loss function, which is based on the KL and JS divergences, the first
Wasserstein GAN proposed a new cost function based on the Wasserstein distance.13
The Wasserstein distance, also known as the Earth Mover's distance, quantifies the
distance between two probability distributions. To approximate the Wasserstein
distance, WGAN introduces a critic, a discriminator that outputs a real-valued score
rather than a binary classification. The critic is trained to give higher scores to real
samples and lower scores to fakes, while the generator trains to fool the critic by
pushing the expected score of generated samples towards the score of real samples.
This aligns the generator and critic objectives; the key innovation of WGAN is using
the Wasserstein distance as the loss between the generated and real data distributions.
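
Formally, using the Kantorovich-Rubinstein duality employed by Arjovsky et al.
(2017), the Wasserstein distance between the real distribution $p_r$ and the generated
distribution $p_g$ is

$$W(p_r, p_g) = \sup_{\|f\|_L \le 1} \; \mathbb{E}_{x \sim p_r}\big[f(x)\big] - \mathbb{E}_{\tilde{x} \sim p_g}\big[f(\tilde{x})\big],$$

where the supremum runs over all 1-Lipschitz functions $f$. The critic plays the role of
$f$: it is trained to maximize $\mathbb{E}_{x \sim p_r}[f(x)] - \mathbb{E}_{z}[f(G(z))]$, while the generator is
trained to minimize $-\mathbb{E}_{z}[f(G(z))]$.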

2.2.1 The WGAN with Gradient Penalty


In standard WGAN, the critic, or discriminator, is required to be a 1-Lipschitz function,
ensuring stability during training. To achieve this, weight clipping was employed in the
original WGAN paper. However, weight clipping posed its own set of challenges. It
often caused optimization issues, such as exploding or vanishing gradients, making it
difficult to find an optimal solution. Moreover, weight clipping imposed limitations on
the critic's capacity, restricting its ability to learn complex functions, even when it could
be beneficial for performance. To improve on this, in 2017 Gulrajani et al. published
"Improved Training of Wasserstein GANs", which introduced a new penalty term in the
critic's loss function. This term penalizes the critic when the norm of its gradients with
respect to its inputs deviates from 1, enforcing the Lipschitz constraint smoothly without
weight clipping. The gradient penalty enables stable training of a WGAN with excellent
performance.14
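
In the formulation of Gulrajani et al. (2017), the critic loss with gradient penalty is

$$L = \mathbb{E}_{\tilde{x} \sim p_g}\big[D(\tilde{x})\big] - \mathbb{E}_{x \sim p_r}\big[D(x)\big] + \lambda\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\Big[\big(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\big)^2\Big],$$

where $\hat{x}$ is sampled uniformly along straight lines between pairs of real and
generated samples and $\lambda$ is the penalty coefficient.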

12 Victor Storchan, Tucker Balch, and Svitlana Vyetrenko (2021)
13 Martin Arjovsky, Soumith Chintala, and Léon Bottou (2017)
14 Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville (2017)
With the gradient penalty, the Lipschitz constraint is now enforced softly through
regularization, leading to more stable training and better convergence. This not only
solves the issues caused by weight clipping but also opens up possibilities for using a
wide variety of model architectures without compromising stability.15 WGAN has seen
continued improvements and variations. For example, there are conditional WGANs
that incorporate class information, as well as progressively growing GANs, in which
both the generator and discriminator are grown layer by layer to boost training stability and quality.
WGAN has become one of the most widely used GAN architectures, especially when
image quality and training stability are important. The development of WGAN shows
how iterative research can take an initial idea like the Wasserstein distance and over
time build a robust and influential GAN methodology. Researchers continue to refine
and tweak WGAN to improve results across a variety of applications involving
generative modeling.

2.3 Oversampling with WGAN-GP


Imbalanced datasets are a common challenge in machine learning, especially for risk
modeling in finance. Real-world loan data often has many more examples of low-risk
loans than high-risk loans. This can bias models towards always predicting low risk.
Oversampling is a technique to address class imbalance by generating additional
examples of underrepresented classes. For loan data, we can oversample the minority
high-risk loans. Recent advances in generative modeling with neural networks provide
more powerful oversampling capabilities. The key advantage of models like WGAN-GP
is that they learn the data distribution directly from real samples, preserving the complex
multivariate relationships
between loan features like amount, income, credit history, etc. By generating additional
high-risk loan profiles, we can oversample the minority class and balance out the
original skewed dataset. This enables training more robust, unbiased loan risk models.
Synthetic oversampling is also privacy-preserving compared to collecting more real
examples.
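
A rough sketch of this oversampling workflow is given below. The column name
"high_risk" and the helper `generate_loans` (which draws synthetic rows from a
WGAN-GP trained on the minority class) are hypothetical placeholders, not the paper's
actual pipeline:

```python
import pandas as pd

def oversample(loans: pd.DataFrame, generate_loans) -> pd.DataFrame:
    """Balance an imbalanced loan dataset with GAN-generated minority samples."""
    majority = loans[loans["high_risk"] == 0]
    minority = loans[loans["high_risk"] == 1]
    n_needed = len(majority) - len(minority)        # how many synthetic rows to add

    synthetic = generate_loans(n_needed)            # same feature columns as `minority`
    synthetic["high_risk"] = 1                      # synthetic rows belong to the minority class

    balanced = pd.concat([majority, minority, synthetic], ignore_index=True)
    return balanced.sample(frac=1.0, random_state=0)  # shuffle the balanced dataset
```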

15 Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville (2017)
3. Dataset

The dataset used in this seminar paper contains 254,191 observations of loans given to
individuals by lending clubs. It originally consisted of 37 variables covering loan
attributes, demographic information, and payment status.
However, for the generative modeling we focused on just 4 key variables - loan amount,
interest rate, installment payment, and annual income. Though the full dataset is
substantial in size, these 4 financial features are most relevant for generating realistic
synthetic loan data.
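
As an illustration of this feature-selection step (with hypothetical column names; the
actual dataset's field names may differ), the preparation could look like:

```python
import pandas as pd

# Hypothetical column names for the four features used in the generative model.
FEATURES = ["loan_amnt", "int_rate", "installment", "annual_inc"]

loans = pd.read_csv("loan_data.csv")        # ~254,191 rows, 37 columns in the original data
real_data = loans[FEATURES].dropna().astype("float32")
print(real_data.describe())                 # quick sanity check of the four features
```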
The domain of consumer loans presents unique challenges for generative modeling.
Real loan data is difficult and expensive to obtain at scale due to privacy restrictions.
Yet reliable data is crucial for training robust loan prediction models. Synthetic loan
data offers a solution by providing an unlimited volume of realistic samples that protect
sensitive personal information.

Generating plausible synthetic loans is also difficult due to the complex
multidimensional relationships between variables like loan amount, income, interest
rate, and installments. Simple univariate distributions are insufficient. Effective
generative models must capture these multivariate interactions to produce loans that
look authentic.

In summary, the ability to generate synthetic loan data has enormous value for
expanding limited real-world datasets. It enables robust model development while also
protecting sensitive personal finance data. This research aims to demonstrate that
modern generative techniques can produce realistic multivariate loan samples to
augment sparse, biased, or private real-world loan data.

4. Application of WGAN-GP

Building an effective WGAN-GP model requires careful tuning of key hyperparameters
like batch size, learning rates, optimizer settings, and training epochs. Through iterative
experiments on the 250k row loan dataset, we identified an optimal configuration for
stable training and convergence.

The Adam optimizer was chosen as it adapts the learning rate and handles sparse,
high-dimensional data efficiently. For the discriminator, a small learning rate of 0.00001
enabled gradual improvements in distinguishing real versus generated data. Beta1 and
beta2 values of 0.5 and 0.9 provided the right balance of exploration and exploitation.

The generator used a higher learning rate of 0.0001 to quickly refine generated samples
based on the discriminator's feedback. Lower beta1 and beta2 values of 0.4 and 0.8
focused more on recent gradients to avoid poor local optima early in training.

A large batch size of 2,500 maintained diversity while enabling efficient CPU
computation of the gradients. We found that 50 training epochs achieved convergence
without overfitting: the discriminator and generator reached an equilibrium in their
improvement.

For the gradient penalty, a coefficient of 30 struck the right balance in penalizing
unrealistic gradients without obstructing the training process. Updating the
discriminator 3 times for each generator update provided sufficiently informative
gradients without allowing the discriminator to dominate.
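
A compressed sketch of this configuration follows. PyTorch is assumed, since the paper
does not name its framework; G and D denote the generator and critic networks (here D
produces an unbounded score rather than a sigmoid probability), and NOISE_DIM is an
assumed value. Only the hyperparameters listed above are taken from the paper:

```python
import torch

d_opt = torch.optim.Adam(D.parameters(), lr=1e-5, betas=(0.5, 0.9))   # critic
g_opt = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.4, 0.8))   # generator

BATCH_SIZE, EPOCHS, LAMBDA_GP, N_CRITIC, NOISE_DIM = 2500, 50, 30.0, 3, 64

def gradient_penalty(D, real, fake):
    """Penalize deviations of the critic's gradient norm from 1 (Gulrajani et al., 2017)."""
    eps = torch.rand(real.size(0), 1)                          # interpolation coefficients
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = D(interp)
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

def train_on_batch(real):
    for _ in range(N_CRITIC):                                  # 3 critic updates per generator update
        fake = G(torch.rand(real.size(0), NOISE_DIM)).detach()
        d_loss = (D(fake).mean() - D(real).mean()
                  + LAMBDA_GP * gradient_penalty(D, real, fake))
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    g_loss = -D(G(torch.rand(real.size(0), NOISE_DIM))).mean() # generator step
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```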

5. Model evaluation

The fine-tuned WGAN-GP generated realistic data: loan amounts, interest rates,
installments, and incomes exhibited distributions similar to the real data based on
p-values. We can also evaluate the model using visualizations of the distributions:

Figure 1: Comparison plot of the distributions for variables.

We can observe that visualized distributions indicate the WGAN-GP model succeeded
in generating synthetic data that has very similar distributions to the real data across the
four features. Furthermore, we can compare and study the statistical parameters of the
randomly drawn data of 100 observations and the generated data of 100 observations:

Figure 2: Results of statistical evaluation for variables.

In an initial single-sample evaluation, the model demonstrated its ability to closely
mimic the real data distribution through aligned summary statistics and high p-values
on two-sample t-tests. The p-values ranged from 0.57 to 0.96; however, this is only one
sample of 100 observations, while the full real dataset contains over 250,000
observations.
To provide a more robust quantitative assessment, we utilized an iterative testing
procedure with 2,500 iterations. In each iteration, we draw a random sample of 100
observations from the full real dataset of 250k rows and compare it against an
equally-sized sample of 100 generated observations from the trained WGAN-GP model
using two-sample t-tests. The t-tests allow us to statistically evaluate whether the
distribution of each variable in the real sample is different from the distribution in the
generated sample. Our null hypothesis is that the two distributions are equal, against the
alternative that they differ.
For each of the 2,500 randomly drawn samples, we perform t-tests on each of the four
variables of interest. This results in 2,500 p-values per variable, whose means we
calculate to summarize the results across iterations. The mean p-values ranged from
0.28 to 0.58, exceeding the traditional 0.05 significance level used to reject the null
hypothesis. This indicates we do not have sufficient evidence to reject the idea that the
generated data distributions match the real data distributions based on the iterative t-test
evaluation.
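A sketch of this evaluation loop is shown below, assuming the `real_data` DataFrame
from the dataset sketch and the hypothetical `generate_loans` helper introduced earlier;
the exact sampling code used in the paper may differ:

```python
import numpy as np
from scipy.stats import ttest_ind

N_ITER, SAMPLE_SIZE = 2500, 100
p_values = {col: [] for col in real_data.columns}

for _ in range(N_ITER):
    real_sample = real_data.sample(n=SAMPLE_SIZE)       # random 100 real observations
    fake_sample = generate_loans(SAMPLE_SIZE)           # 100 generated observations
    for col in real_data.columns:
        # H0: both samples come from distributions with equal means
        _, p = ttest_ind(real_sample[col], fake_sample[col])
        p_values[col].append(p)

mean_p = {col: float(np.mean(vals)) for col, vals in p_values.items()}
print(mean_p)   # the paper reports mean p-values between 0.28 and 0.58
```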
Overall, both the initial analysis and iterative evaluation indicate the WGAN-GP model
effectively learned the underlying data distribution, as evidenced by the strong
similarity between the properties of the generated and real data. The iterative sampling
provides a robust assessment that this holds true across thousands of diverse
100-observation samples from the full 250k dataset.

6. Conclusion

This research paper demonstrates that Wasserstein GAN with gradient penalty
(WGAN-GP) is an effective technique for generating realistic synthetic loan data. The
model was trained on a dataset of over 250,000 real consumer loans. Through iterative
tuning of hyperparameters like learning rates, batch size, and penalty weight, the
WGAN-GP model learned to produce synthetic samples that closely matched the
distributions of key features like loan amount, interest rate, installment payment, and
annual income. Statistical analysis showed strong similarity between the real and
synthetically generated data distributions. The p-values comparing the distributions
were consistently high, indicating that we cannot reject the hypothesis that the real and
generated data share similar distributions. The ability to generate unlimited volumes of
realistic synthetic loan data has enormous value for the finance industry. It enables
robust model development on expanded datasets without requiring additional collection
of sensitive private customer data. Overall, this research provides a strong proof of
concept for using WGAN-GP to create privacy-preserving synthetic data. With further
refinement of model training and evaluation methods, we can work toward integration
of high-quality generated financial data in real-world applications. This will enable the
finance industry to continue innovating with data while upholding customer privacy
through reduced reliance on real sensitive data.
While this research demonstrates promising results, there are several areas for further
exploration. One direction is expanding the model to generate synthetic data
conditioned on different customer segments and economic scenarios. This could
improve realism by capturing segment-specific distributions. Another area is increasing
diversity and preventing mode collapse by incorporating techniques like dropout or
minibatch discrimination.

References

Arjovsky, M., Chintala, S., and Bottou, L. (2017). "Wasserstein GAN."

De Meer Pardo, F. (2019). "Enriching Financial Datasets with Generative Adversarial Networks."

Dwivedi, H. (2023). "Understanding GAN Loss Functions."

Eckerli, F., and Osterrieder, J. (2021). "Generative Adversarial Networks in Finance: An Overview." Preprint, School of Engineering, Zurich University of Applied Sciences, Winterthur, Switzerland.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). "Generative Adversarial Nets."

Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. (2017). "Improved Training of Wasserstein GANs."

Kingma, D. P., and Ba, J. L. (2015). "Adam: A Method for Stochastic Optimization."

Mirza, M., and Osindero, S. (2014). "Conditional Generative Adversarial Nets."

Radford, A., Metz, L., and Chintala, S. (2015). "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks."

Shmelkov, K., Schmid, C., and Alahari, K. (2018). "How Good Is My GAN?"

Storchan, V., Balch, T., and Vyetrenko, S. (2021). "MAS-GAN: Adversarial Calibration of Multi-Agent Market Simulators."

Takahashi, S., Chen, Y., and Tanaka-Ishii, K. (2019). "Modeling Financial Time-Series with Generative Adversarial Networks."

