Seminar paper
European University Viadrina
Department of Finance and Capital Market Theory
Neural Networks in Finance
Dr. Rick Steinert
This seminar paper explores the landscape of Generative Adversarial Networks (GANs)
through a literature review of prominent GAN models. It then focuses on the
development of Wasserstein GAN with Gradient Penalty (WGAN-GP), highlighting its
significance in the evolution of GANs. The paper further introduces the concept of
oversampling with neural networks for loan data and demonstrates the application of
WGAN-GP in generating synthetic loan data for oversampling purposes. Finally, the
paper discusses the issues and choices involved in the application and implementation
of WGAN-GP for this specific task.
Contents
1. Introduction
2. Methodology
2.1 Generative Adversarial Network
2.1.1 The Discriminator
2.1.2 The Generator
2.1.3 Loss function
2.1.4 Optimizers
2.1.5 Challenges
2.2 Wasserstein GAN
2.2.1 The WGAN with Gradient Penalty
2.3 Oversampling with WGAN-GP
3. Dataset
4. Application of WGAN-GP
5. Model evaluation
6. Conclusion
References
1. Introduction
2. Methodology
The generator produces some output from a random input, typically a noise vector. This output is then assessed by the discriminator, which yields the generator loss. By sampling noise from various locations within the input distribution, a GAN is able to generate a broad range of outputs; the noise is typically drawn from a uniform distribution.
The standard GAN loss function, introduced in the seminal paper "Generative
Adversarial Networks" by Goodfellow et al. (2014), operates on a min-max game
framework where the generator aims to minimize the loss while the discriminator
endeavors to maximize it. This formulation was initially perceived as effective, and it
involves two components: the Discriminator loss and the Generator loss:
\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}
\]
2.1.4 Optimizers
Optimizers play a crucial role in training neural networks by adjusting model
parameters to minimize the loss function. The learning rate is a key hyperparameter that
determines the step size of parameter updates during training. Gradient descent-based
optimizers, like Adam, are commonly used in deep learning and help find the local
minimum of a differentiable function by iteratively descending along the steepest
direction. Adam is an optimization algorithm widely used in various deep learning
applications, including computer vision and natural language processing. It extends
stochastic gradient descent by maintaining exponentially decaying averages of past
gradients and past squared gradients (Kingma and Ba, 2015). This adaptive approach allows Adam to
dynamically adjust the learning rate for each parameter, leading to faster convergence
and improved performance. In the realm of GANs, different optimizers are explored,
but Adam remains a popular choice due to its effectiveness in optimizing complex
models like GANs (Eckerli and Osterrieder, 2021).
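As an illustration of the update rule described above, the following NumPy sketch implements a single Adam step; the variable names are ours, and the default hyperparameters follow Kingma and Ba (2015).

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update step following Kingma and Ba (2015)."""
    m = beta1 * m + (1 - beta1) * grad        # decaying average of past gradients
    v = beta2 * v + (1 - beta2) * grad**2     # decaying average of past squared gradients
    m_hat = m / (1 - beta1**t)                # bias correction (t counts steps from 1)
    v_hat = v / (1 - beta2**t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return param, m, v
```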
2.1.5 Challenges
Despite their ability to generate highly accurate synthetic samples, GANs are also
known to present challenges in the training process. The simultaneous training of two
networks introduces complexities as updating the parameters of one model changes the
optimization problem. This dynamic system becomes harder to control, leading to
non-convergence, which is a common issue in GAN training. Unlike the standard
optimization process that seeks the lowest point of a loss function for deep models, in
the context of GANs, the gradients of the two networks may conflict, preventing
convergence on an optimum (Eckerli and Osterrieder, 2021). To put it simply, if the Generator becomes too proficient and
quickly outsmarts the Discriminator, it may not receive meaningful feedback to improve
further. As a result, the Generator could start training based on incorrect feedback,
leading to a decline in the quality of the generated output, often referred to as "collapse
in output quality." This situation can undermine the effectiveness of the GAN model and
impair its ability to produce realistic and diverse samples.
Mode collapse in GANs occurs when training goes wrong and the generator ends up
producing only a limited set of similar-looking, repetitive samples instead of covering
the full diversity of the target distribution. Once the discriminator adapts to these
repetitive outputs, learning stalls prematurely, leaving the generated data with
little variety (Goodfellow et al., 2014).
The generator's training can fail when the discriminator becomes too good at
distinguishing real vs fake data. This happens because an optimal discriminator
provides insufficient feedback for the generator to learn properly. This issue is known as
the vanishing gradient problem - when the gradient propagates back through the
generator during training, it shrinks rapidly due to repeated multiplication. By the time
the gradients reach the initial layers of the generator, they become so small that they do
not modify the weight values. As a result, the early layers of the generator stop learning,
causing its training to slow down dramatically and eventually halt. The core reason is
that an overly strong discriminator provides sparse and uninformative gradients, producing
the vanishing gradient effect that hinders effective generator training (Eckerli and Osterrieder, 2021).
Since the original GAN was introduced, many modified and new loss functions have
been proposed for GAN training. These new loss variants aim to address some of the
major challenges faced in GANs that we discussed earlier. For instance, the FIN-GAN
model explored using the original GAN to generate synthetic financial time series
matching key data properties (Takahashi et al., 2019). The cGAN model conditions the GAN on auxiliary
inputs to guide and control the generation process (Mirza and Osindero, 2014). Other notable financial applications
include CorrGAN, which utilizes a DCGAN architecture (Radford et al., 2015) to generate realistic correlation
matrices, and QuantGAN, which leverages temporal convolutional networks to better
capture long-range dependencies in time series data. The recent MAS-GAN
framework uses a self-attention GAN discriminator to optimize
agent-based market simulator models (Storchan et al., 2021). This provides an overview of some key GAN
innovations. One particularly influential model is WGAN-GP, which we focus on
in this paper.
With the gradient penalty, the Lipschitz constraint is now enforced softly through
regularization, leading to more stable training and better convergence. This not only
solves the issues caused by weight clipping but also opens up possibilities for using a
wide variety of model architectures without compromising stability (Gulrajani et al., 2017). WGAN has seen
continued improvements and variations. For example, there are conditional WGANs
that incorporate class information, as well as progressive growing of GANs that grow
both the generator and discriminator progressively to boost training stability and quality.
WGAN has become one of the most widely used GAN architectures, especially when
image quality and training stability are important. The development of WGAN shows
how iterative research can take an initial idea like the Wasserstein distance and over
time build a robust and influential GAN methodology. Researchers continue to refine
and tweak WGAN to improve results across a variety of applications involving
generative modeling.
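The soft enforcement of the Lipschitz constraint described above can be sketched as follows, assuming a PyTorch setup; `discriminator`, `real`, and `fake` are placeholders, and the paper's own implementation may differ in its details.

```python
import torch

def gradient_penalty(discriminator, real, fake, device="cpu"):
    """Gradient penalty term of WGAN-GP (Gulrajani et al., 2017)."""
    # Interpolate randomly between real and generated samples.
    alpha = torch.rand(real.size(0), 1, device=device)
    interpolated = (alpha * real + (1 - alpha) * fake).requires_grad_(True)

    scores = discriminator(interpolated)

    # Gradients of the critic's scores with respect to the interpolated inputs.
    gradients = torch.autograd.grad(
        outputs=scores, inputs=interpolated,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]

    # Softly enforce the Lipschitz constraint: penalize gradient norms far from 1.
    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()
```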
3. Dataset
The dataset used in this seminar paper contains 254,191 observations of loans given to
individuals by lending clubs. It originally consisted of 37 variables covering loan
attributes, demographic information, and payment status. For the generative modeling,
however, we focused on just four key variables: loan amount, interest rate, installment
payment, and annual income. Though the full dataset is substantial in size, these four
financial features are the most relevant for generating realistic synthetic loan data.
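For illustration, the feature-selection step might look like the following pandas sketch; the file name and column names are assumptions (LendingClub-style identifiers), since the paper does not specify them.

```python
import pandas as pd

# Hypothetical file and column names (LendingClub-style identifiers);
# the paper does not specify them.
loans = pd.read_csv("loan_data.csv")

# Keep only the four features used for generative modeling.
features = loans[["loan_amnt", "int_rate", "installment", "annual_inc"]].dropna()
```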
The domain of consumer loans presents unique challenges for generative modeling.
Real loan data is difficult and expensive to obtain at scale due to privacy restrictions.
Yet reliable data is crucial for training robust loan prediction models. Synthetic loan
data offers a solution by providing an unlimited volume of realistic samples that protect
sensitive personal information.
In summary, the ability to generate synthetic loan data has enormous value for
expanding limited real-world datasets. It enables robust model development while also
protecting sensitive personal finance data. This research aims to demonstrate that
modern generative techniques can produce realistic multivariate loan samples to
augment sparse, biased, or private real-world loan data.
4. Application of WGAN-GP
The Adam optimizer was chosen as it adapts the learning rate and handles sparse,
high-dimensional data efficiently. For the discriminator, a small learning rate of 0.00001
enabled gradual improvements in distinguishing real versus generated data. Beta1 and
beta2 values of 0.5 and 0.9 provided the right balance of exploration and exploitation.
The generator used a higher learning rate of 0.0001 to quickly refine generated samples
based on the discriminator's feedback. Lower beta1 and beta2 values of 0.4 and 0.8
focused more on recent gradients to avoid poor local optima early in training.
A large batch size of 2,500 maintained diversity while enabling efficient CPU
computation of the gradients. We found 50 training epochs achieved convergence
without overfitting - the discriminator and generator reached an equilibrium in
improving.
For the gradient penalty, a coefficient of 30 struck the right balance, penalizing
gradient norms that deviate from unity without obstructing the training process. Updating the
discriminator 3 times for each generator update provided sufficiently informative
gradients without allowing the discriminator to dominate.
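For concreteness, the hyperparameter configuration described above could be set up as follows in PyTorch; the model architectures are placeholders, and only the optimizer and training settings reflect the values reported in this section.

```python
import torch

# Placeholder architectures; only the optimizer and training settings below
# reflect the values reported in this section.
generator = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4))
discriminator = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))

disc_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-5, betas=(0.5, 0.9))
gen_opt = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.4, 0.8))

BATCH_SIZE = 2500     # large batches preserve sample diversity
EPOCHS = 50           # sufficient for convergence without overfitting
GP_COEFFICIENT = 30   # weight of the gradient penalty term
N_CRITIC = 3          # discriminator updates per generator update
```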
5. Model evaluation
The fine-tuned WGAN-GP generated realistic data - loan amounts, interest rates,
installments and incomes exhibited distributions similar to the real data based on
P-values. We can also evaluate the model by visualizing the distributions of the real
and generated data. The visualized distributions indicate that the WGAN-GP model
succeeded in generating synthetic data whose distributions closely match those of the
real data across all four features. Furthermore, we can compare the statistical
parameters of 100 randomly drawn real observations against 100 generated observations:
Figure 2: Results of statistical evaluation for variables.
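A distribution comparison of this kind can be sketched as follows; the paper reports p-values without naming the specific test, so a two-sample Kolmogorov-Smirnov test is assumed here, and `real_sample` and `fake_sample` are placeholder arrays.

```python
import numpy as np
from scipy import stats

def compare_feature(real: np.ndarray, fake: np.ndarray) -> dict:
    """Summary statistics and a two-sample KS test for a single feature."""
    ks_stat, p_value = stats.ks_2samp(real, fake)
    return {
        "real_mean": real.mean(), "fake_mean": fake.mean(),
        "real_std": real.std(), "fake_std": fake.std(),
        "ks_statistic": ks_stat, "p_value": p_value,
    }

# Example: compare 100 randomly drawn real observations with 100 generated ones
# for a single feature such as the loan amount (stand-in data shown here).
rng = np.random.default_rng(0)
real_sample = rng.lognormal(9.0, 0.5, size=100)   # stand-in for real loan amounts
fake_sample = rng.lognormal(9.0, 0.5, size=100)   # stand-in for generated loan amounts
print(compare_feature(real_sample, fake_sample))
```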
6. Conclusion
This research paper demonstrates that Wasserstein GAN with gradient penalty
(WGAN-GP) is an effective technique for generating realistic synthetic loan data. The
model was trained on a dataset of over 250,000 real consumer loans. Through iterative
tuning of hyperparameters like learning rates, batch size, and penalty weight, the
WGAN-GP model learned to produce synthetic samples that closely matched the
distributions of key features like loan amount, interest rate, installment payment, and
annual income. Statistical analysis showed strong similarity between the real and
synthetically generated data distributions: the p-values comparing the distributions
were consistently high, so we cannot reject the hypothesis that the real and generated
data follow similar distributions. The ability to generate unlimited volumes of
realistic synthetic loan data has enormous value for the finance industry. It enables
robust model development on expanded datasets without requiring additional collection
of sensitive private customer data. Overall, this research provides a strong proof of
concept for using WGAN-GP to create privacy-preserving synthetic data. With further
refinement of model training and evaluation methods, we can work toward integration
of high-quality generated financial data in real-world applications. This will enable the
finance industry to continue innovating with data while upholding customer privacy
through reduced reliance on real sensitive data.
While this research demonstrates promising results, there are several areas for further
exploration. One direction is expanding the model to generate synthetic data
conditioned on different customer segments and economic scenarios. This could
improve realism by capturing segment-specific distributions. Another area is increasing
diversity and preventing mode collapse by incorporating techniques like dropout or
minibatch discrimination.
References
Arjovsky, M., Chintala, S., Bottou, L. (2017): "Wasserstein GAN"
Eckerli, F., Osterrieder, J. (2021): "Generative Adversarial Networks in Finance: An Overview"
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y. (2014): "Generative Adversarial Nets"
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A. (2017): "Improved Training of Wasserstein GANs"
Kingma, D. P., Ba, J. L. (2015): "Adam: A Method for Stochastic Optimization"
Mirza, M., Osindero, S. (2014): "Conditional Generative Adversarial Nets"
Radford, A., Metz, L., Chintala, S. (2015): "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks"
Shmelkov, K., Schmid, C., Alahari, K. (2018): "How Good Is My GAN?"
Storchan, V., Balch, T., Vyetrenko, S. (2021): "MAS-GAN: Adversarial Calibration of Multi-Agent Market Simulators"
Takahashi, S., Chen, Y., Tanaka-Ishii, K. (2019): "Modeling Financial Time-Series with Generative Adversarial Networks"