ANN and Its Applications
Alexandros Vasileiadis, Eirini Alexandrou, Lydia Paschalidou, Maria Chrysanthou, Maria Hadjichristoforou
Abstract—This paper focuses on Artificial Neural Networks (ANNs) and their applications. Initially, it explores the core concepts of a neural network (NN), including their inspiration, basic structure, and training process, along with an overview of the most commonly used models. Additionally, the paper delves into three fields in which ANNs play an important role: (1) Computer Science, (2) Security, and (3) Health Care. These fields are marked as significant since they have a great impact on various aspects of society. For each field, the paper discusses ways in which NNs have been utilised to solve problems, the architectures employed, notable applications of NNs within the domain, and the challenges faced in implementing them. Lastly, it discusses the future directions of ANNs, exploring potential advancements in architecture, models, and applications across diverse domains.

Index Terms—Artificial Neural Networks, Neural Networks, Core Concepts, Training, Models, Applications, Computer Science, Security, Health Care, Challenges, Architecture, Future Direction

I. INTRODUCTION

An Artificial Neural Network (ANN) is a machine learning model designed to emulate human decision-making processes by simulating how biological neurons work. ANNs consist of interconnected layers of units, through which data flows in an orderly sequence. Specifically, an ANN can be categorised into three neural layers: (1) an input layer, (2) a hidden layer, and (3) an output layer. Even though ANNs are a simplified version of how our brain works, they are adept at learning to solve difficult problems through training, using experiments and observations; that is, they are proficient at comprehending intricate patterns and connections. [1]

ANNs are commonly trained with the backpropagation (BP) training algorithm. Despite the numerous training techniques, establishing an optimal ANN for a particular application remains a notable challenge. This challenge persists because compelling evidence from both biological and technical perspectives suggests that the effectiveness of an ANN in manipulating knowledge is impacted by its design. [1]

This research will focus on neural network applications in computer science, security, and healthcare. It will explore how ANNs can be used in these fields, delve into their impact and challenges, and discuss their potential future.

Neural Networks are used in computer science for problem-solving across various disciplines. Through algorithms, they can execute various tasks, including image recognition, natural language processing (NLP), machine translation, and speech recognition, and they help with developing language translation systems. Building on their success in Computer Science, ANNs have extended their applications into the Security sector. Various types of ANNs, such as Convolutional Neural Networks (CNNs), Graph Neural Networks (GNNs), and Recurrent Neural Networks (RNNs), play an important role in addressing security matters such as fraud detection, cybersecurity threats, and facial recognition. Moreover, ANN architectures have expanded their use into the realm of healthcare. Capitalising on their abilities, they can help analyse medical images such as MRIs, CT scans, X-rays, and ultrasounds, which supports early clinical diagnosis. ANNs can also predict epidemic outbreaks, organise patients' health records, and personalise medicine.
Figure 2. MLP showing input, hidden and output layers and nodes
with feedforward links.
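The layered structure in Figure 2 can be sketched as a single feedforward pass. The code below is an illustrative toy network; its layer sizes, random weights, and function name are assumptions for the example, not taken from the paper:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One feedforward pass through a single-hidden-layer MLP:
    input -> hidden (sigmoid) -> output (sigmoid)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = sigmoid(W1 @ x + b1)      # hidden-layer activations
    return sigmoid(W2 @ hidden + b2)   # output-layer activations

rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # 3 input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # 2 output units
y = mlp_forward(x, W1, b1, W2, b2)
print(y.shape)  # (2,)
```

Each layer simply applies a weighted sum plus bias followed by a non-linear activation, and the result flows forward to the next layer, mirroring the figure's feedforward links.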
The output y of the perceptron indicates whether a weighted sum of the inputs and the bias exceeds a certain threshold. If the output is y = 1, the model predicts that the input belongs to class 1, while if the output is y = 0, the input is predicted to belong to class 0.

Even though the Perceptron represented real progress in the development of artificial neural networks, it had its drawbacks. Perceptrons are capable of learning only linearly separable data, where one class of objects is positioned on one side of a plane and the other class on the opposite side, as shown in Figure 1 below, while data in the real world is usually not linearly separable. As a consequence, perceptrons struggled to solve many of the essential problems that matter to society.

B. Training process of neural networks

To achieve a network that produces accurate outputs, it must first go through a training process. Like a human, an ANN learns from examples, so it is important to provide a large amount of data. There are three main approaches to training an ANN - supervised, unsupervised, and reinforcement.

Supervised training requires well-defined data with corresponding labels, and it is used to build networks capable of making predictions, classifying images, forecasting markets, and more. Linear regression and k-nearest neighbours are two algorithms representative of supervised learning. Linear Regression models the relationship between various independent variables and a dependent variable by fitting a straight line to observed data. k-Nearest Neighbours (kNN), in turn, is a non-parametric method, useful for both classification and regression tasks, that predicts the output of an instance based on the majority class of its k nearest neighbours in the feature space. [4] [5]
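The kNN majority-vote rule just described can be sketched in a few lines. The dataset and function name below are hypothetical, chosen only to illustrate the idea:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the class of `query` as the majority class among its
    k nearest training points (Euclidean distance)."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Two small clusters: class 0 near the origin, class 1 near (5, 5)
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, (0.5, 0.5)))  # → 0
print(knn_predict(X, y, (5.5, 5.5)))  # → 1
```

Because kNN stores the training set and defers all computation to prediction time, no parameters are fitted, which is what makes the method non-parametric.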
Moreover, ANNs not only solve these technical issues but also provide much-needed optimization of current computer systems to improve their performance. For example, they can expedite the effectiveness of search algorithms over large databases of images by displaying only the images that satisfy certain conditions, a task that would be complicated to deal with manually.

In the area of software development, ANNs can be used to develop intelligent user interfaces whose visual inputs are processed and turned into usable information. This is especially significant in cases where a gesture-recognition framework is required, in which the system interprets physical movements into useful commands.

As research progresses, existing ANNs in computer science can be made stronger and more innovative by taking advantage of the fact that artificial systems can be designed to gather information and interact optimally with the visual world. Such a move will not only bring computational power to the next level, but will also generate new methods of making machines intelligent and responsive to human-like perception of situations. [12] [13]

A2. Natural Language Processing
The journey of NLP started with early theoretical models by pioneers such as Warren McCulloch, Walter Pitts, and Frank Rosenblatt. These early researchers laid down the theoretical foundation for the recognition and classification of patterns in text data by using computation. Rosenblatt's invention of the 'Perceptron' in 1958 was revolutionary because the Perceptron could be trained to classify text patterns, which made it indispensable for tasks such as text classification and sentiment analysis.

A further development in the field was Frank Rosenblatt's proposition of the 'back-propagating error correction algorithm' in 1961. It underscored that if accuracy in pattern recognition was to increase, the training of the neural network had to be done in a sophisticated manner. This early work opened the doors to the use of Artificial Neural Networks for supporting complex NLP tasks such as named entity recognition and sentiment analysis.

i. Applications of NLP
A prominent example is the Boltzmann machine, created by Terrence Sejnowski and Geoffrey Hinton, which introduced new learning mechanisms that can handle the complex linguistic patterns of the domain.

With symmetric connections inspired by physical phenomena such as spin-glasses, the Boltzmann machine offered a mechanism for unsupervised learning that could be applied to machines recognizing and replicating relationships between linguistic elements. Topping this development was the Back-Propagation algorithm by David Rumelhart, Geoffrey Hinton, and R. J. Williams, which made it possible for multi-layer perceptrons to solve complex linguistic problems. These networks, through stimulus and response iteration, could also account for the subtle linguistic differences posed by problems such as the exclusive OR and the T/C problem.

Practical impacts of these capabilities can be seen in several real-world applications of NLP. For instance, Sejnowski and Charles Rosenberg trained a network to pronounce English words correctly and thus demonstrated some of the potential that neural networks hold for speech recognition. This opens the promising realm of using ANNs to develop automatic speech recognition systems for virtual assistants, home voice-controlled devices, and other applications.

Moreover, Teuvo Kohonen offered another insight with topographical networks for map reading, which provided further ways of analysing and interpreting linguistic data. Some of the resulting improvements include intelligent chatbots, language translation systems, sentiment analysis tools, and named entity recognition systems, amongst others. In these practical applications of NLP, artificial neural networks have revolutionised how machines interact with human language, allowing them to make sense of and produce text data in ways that would have otherwise been considered impossible. ANNs continue to grow and evolve, creating the potential to harness advances in language understanding and interaction for just about any industry.

ii. NN Architectures in NLP
The advancement of Natural Language Processing (NLP) systems, which allows machines to comprehend and interpret
human language more efficiently, is greatly dependent on the development of neural network architectures. These architectures differ considerably in terms of time, manner, and place, in an attempt to meet the unusual computational needs of neural networks.

Furthermore, computation in these architectures proceeds in a way that distinguishes between digital (or binary) and analogue processing. The most significant difference is that, while conventional computers produce binary answers (true or false), neural net machines take in inputs and give outputs along a range, allowing for the full range of variation in linguistic stimuli. This is what enables the subtleties and nuances of human language to be captured in analogue processing, making learning natural and flexible.

In addition, distributed connectivity appears to be required for information processing in the various classes of neural network architectures. Unlike traditional computers, where information is isolated at unique addresses, neural net machines distribute information connectively between many addresses, both wholly and partly. This distributed architecture allows the representation of complex patterns in linguistic data, hence improving the machine's ability to learn, understand, and interpret language.

Such architectural considerations follow three important characteristics set by Hopfield for neural network computers: large connectivity, analogue response, and reciprocal or re-entrant connections. These characteristics give rise to computations qualitatively different from those performed by Boolean logic.

In practical applications, several neural network architectures have impressively addressed a wide range of natural language problems, from low-level phonology to high-level syntax. For example, Rumelhart conducted experiments on the prediction of English verb morphology, and Sejnowski and Rosenberg developed a model of phonology. Similarly, companies such as Nestor Corporation have manufactured tablets that recognize handwritten input, while Neural Tech has introduced products that can recognize teaching and learning input in more than one natural language.

While these achievements are certainly indicative of what may be promised by neural network architectures within NLP, it has to be understood that they remain experimental and years away from wide usage. However, this ongoing experimentation and innovation within neural network research signals a significant advancement in the understanding of natural language and interaction that may revolutionise teaching, translation, and communication. [14]

The success of RNNs in enhancing algorithmic efficiency and model performance when dealing with sequential data requires careful consideration of several factors. In particular, the architecture of the model, the tuning of hyperparameters, and the quality of training data are extremely important for achieving optimal results.

A significant advancement in RNN-based optimization is the development of the Recurrent Neural Network-Based Optimization Algorithm (RNN-OA). This method enhances processing algorithms through the application of attention mechanisms, regularisation techniques, and improvements in interpretability. By processing input selectively and maintaining model stability, RNN-OA significantly boosts the efficiency and adaptability of algorithms in various problem-solving scenarios.

This algorithmic approach also benefits the overall computational process by incorporating fine-tuning, transfer learning, and other techniques that reduce the computational load and expedite algorithm development. The efficiency, scalability, and robustness of RNN-OA have been rigorously tested, showing that it offers significant benefits and has potential for further improvement.

In practical terms, RNN-OA is applicable to a broad range of computer science functions, including voice recognition, machine translation, and time series forecasting. The evaluation of its efficiency and scalability involves the use of specially developed frameworks and mathematical models. These models take into account dynamic learning, model stability, adaptability to various data sources, and sensitivity to input fluctuations.

The integration of RNNs into computer science marks a significant leap forward, and innovative approaches like RNN-OA have vast potential to expand their application further. Continuous improvements in RNN-based methods are setting high expectations for computational advancements that promise extensive benefits for academia and industry. [15]

A4. The Back-Propagation Algorithm in Computer Science and Applications
Back-Propagation (BP) is an algorithm used to train artificial neural networks through a method of error correction based on previously computed errors. From a computer science perspective, BP modifies the weights of a multilayer network according to the error computed during the previous iteration. The role of BP in computer science applications is mainly to minimise the error rate when predicting outputs, and it is largely used in solving complex problems such as image recognition, autonomous vehicles, and natural language processing.

i. How BP works
The BP algorithm consists of two main phases: the forward pass and the backward pass. During the forward pass, the input data passes through the network one layer after another, starting from the input layer and ending at the output layer, with the weights initialised in matrix and vector form. The resultant values of each layer are passed on to the next layer, and a predicted output is produced by the output layer at the end of the forward pass. The next step is the backward pass, which follows directly from the forward pass: the prediction is compared against the actual output to compute the error, and this error is propagated back through the network. The weights are then fine-tuned so that they minimise the error value; the required change in each weight is computed using calculus, or partial derivatives to be more precise.

Output Layer Neurons: For each output neuron, the error term (δ) is calculated by:

δ_j = ( y_j − ŷ_j ) f′( z_j )

where y_j is the actual output for neuron j, ŷ_j is the predicted output for neuron j, and f′(z_j) is the derivative of the activation function applied at the output of neuron j, z_j.

Hidden Layer Neurons: For neurons in the hidden layers, the error is propagated back from the output layer, and the error term is calculated using:

δ_j = ( Σ_k w_jk δ_k ) f′( z_j )

where w_jk are the weights connecting neuron j in a hidden layer to neuron k in the subsequent layer, δ_k is the error term for neuron k in the layer above, and f′(z_j) is the derivative of the activation function at the output of neuron j, z_j.

Weight Update Rule: The weights are updated by moving against the gradient of the error function, which is computed for each weight as follows:

Δw_ij = η δ_j o_i

where η is the learning rate, δ_j is the error term for neuron j, computed as shown above, and o_i is the output of the previous layer's neuron i, which is connected to neuron j by the weight w_ij.

These formulas are essential because they direct the iterative weight modifications that enable neural networks to learn from their mistakes and gradually increase their accuracy.

The above scenarios are just a few examples of how BP algorithms can be used to model intricate patterns and make informed predictions. Given their history of innovation and the efficiency of their models, BP models are unquestionably going to remain a cornerstone of modern computational sciences. [16]
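The two passes and the update rule described in this section can be sketched for a toy two-layer sigmoid network. This is a minimal illustration with assumed sizes and learning rate, not the paper's implementation; biases are omitted for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(x, target, W1, W2, eta=0.5):
    """One forward + backward pass, applying
    delta_j = (y_j - yhat_j) f'(z_j) at the output layer,
    delta_j = (sum_k w_jk delta_k) f'(z_j) at the hidden layer,
    and the update w_ij += eta * delta_j * o_i."""
    # forward pass
    h = sigmoid(W1 @ x)       # hidden-layer outputs o_j
    yhat = sigmoid(W2 @ h)    # predicted outputs
    # backward pass (for the sigmoid, f'(z) = f(z) * (1 - f(z)))
    delta_out = (target - yhat) * yhat * (1 - yhat)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    # gradient-descent weight updates, in place
    W2 += eta * np.outer(delta_out, h)
    W1 += eta * np.outer(delta_hid, x)
    return float(np.sum((target - yhat) ** 2))

# tiny demo: repeated steps drive the squared error down
rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
x, t = np.array([0.5, -0.2]), np.array([1.0])
errors = [bp_step(x, t, W1, W2) for _ in range(200)]
print(errors[-1] < errors[0])  # True: the error shrinks over iterations
```

Note how the hidden-layer term reuses the output-layer deltas through the transposed weight matrix, which is exactly the "propagated back from the output layer" step in the formulas above.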
A5. Computer Network Routing Optimization Algorithm

The development of Internet technologies has not only changed the way we live, but has also led to the emergence of a particularly urgent problem – the need for high-quality network infrastructure. Due to the steady increase in network requirements, one of the most pressing concerns is the optimization of the network routing process. Traditional technologies rarely cope with network complexities, which forces researchers to seek new approaches, such as Artificial Neural Networks, that optimise routing.

i. The Challenge of Network Routing

Figure 4. Architecture of the Neural Network for Routing Optimization

The ANN employs a probabilistic model to dynamically form network connections, as described by the following equation:

Π(i) = k_i / Σ_j k_j

where Π(i) is the probability that a new node i will connect to an existing node, k_i is the degree of node i, and k_j represents the degree of node j. This formula helps the ANN predict the most efficient pathways by optimising the network topology based on the likelihood of node connections. Additionally, the system delay model used to minimise latency and optimise routing is given by:

T_Ci = T_ti + t_bi + P_ci,j + t_i

where T_Ci is the total system delay, T_ti represents the transmission delay, and t_bi represents the delay experienced due to inadequate bandwidth availability, signifying the waiting time for data transmission. P_ci,j symbolises the delay induced by queuing at the Mobile Edge Computing (MEC) infrastructure, and t_i signifies the delay attributed to task execution by the MEC server. These components are crucial for evaluating the efficiency of different routing paths and are integral to the ANN's decision-making process.

iv. Simulation Results
A comparative assessment against traditional network routing demonstrates the much higher efficacy of the newly proposed model. According to the results, the ANN model reduces packet loss and delay to practically zero and does not require human intervention, which can significantly increase its effectiveness in terms of routing.

v. Applications and Future Work
The ANN-based routing optimization model has extensive prospects in that it can be applied to both small corporate networks and international Internet backbones. Its programmed scalability allows the algorithm to be used in modern dynamic telecommunications. Future work will involve reducing the dependency on human-supplied data and speeding up the network's response to changing conditions. [17]

B. In Security

ANN adaptability and efficacy have rendered them indispensable in safeguarding critical systems, combating fraudulent activities, and enhancing security measures across diverse domains.

This section delves into the multifaceted applications of neural networks in security, focusing on three pivotal areas: fraud detection, anomaly detection in cybersecurity, and facial recognition for security purposes. The integration of neural networks in these realms not only augments traditional security measures but also empowers organisations to proactively mitigate risks and fortify their defences against evolving threats.

Neural networks have especially much to offer in the security field. By focusing on the logic of the input-output relationship and a deep learning process inspired by the human brain, they acquire knowledge by learning and storing it within the connection strengths between neurons, known as synaptic weights. Unlike traditional fit-for-purpose linear models, neural networks can flexibly capture both non-linear and linear correlations without using intermediate variables to model reality. This capability proves vital in cybersecurity, where risks develop at great speed and display nonlinear characteristics. Through the use of networks simulating actual biological systems, security infrastructures can naturally adapt to new menaces, grasping patterns and anomalies in real time to fortify the shield and hedge against risks that might occur. [18]

B1. Fraud Detection
Financial fraud has continued to be an enduring threat in the financial sector, and it greatly assails individuals, institutions, and economies. Deep neural networks are known for autonomously learning complex patterns and representations from raw data, so this technique can be very effective in addressing the issue. The performance of neural networks in fraud detection is not just a representation of their aptitude for detecting sophisticated patterns in large data sets but a profound illustration of their excellence. Neural networks built on transactional data, user behaviour, and historical patterns are capable of spotting anomalous activities that could indicate fraudulent behaviour, and they can adjust to fresh cases of fraud and learn from these new instances. [18] As a result, the anti-fraud mechanism constantly improves its detection algorithms, and therefore the efficacy of fraud prevention technologies goes up. This also eliminates the requirement for labour-intensive manual feature engineering, which can be very time-consuming and domain specific. In addition, deep learning approaches are good at handling multidimensional data and finding hidden relationships, especially complex and concealed ones, which gives the system a unique ability to identify the subtle and covert signs that characterise fraudulent behaviour. Different deep learning architectures, such as convolutional neural networks (CNNs) and graph neural networks (GNNs), have been used to detect financial fraud in recent times. These models are applied in different types of financial systems, for example in detecting credit card fraud, insurance fraud, and money laundering. Most notably, deep learning models have consistently outperformed classic approaches, with success rates of around 99%. [19] [20]

i. CNN in Fraud detection
The Convolutional Neural Network (CNN) is a popular deep learning algorithm that shows good results in finding unobservable features of dubious transactions and helps to avoid overfitting of the model. The CNN algorithm has three main layers: the convolution layer, the pooling layer, and the fully connected layer. Normally, the role of the convolution and pooling layers is to perform feature extraction, while the third layer, known as the fully connected layer, maps the extracted features to the final output, such as a classification. [19] [21]
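The convolution → pooling → fully connected pipeline described above can be reduced to a one-dimensional sketch. All of the data, the kernel, the weights, and the function names below are hypothetical illustrations, not the paper's model:

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1-D convolution: detects local features of the input."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def max_pool(x, size=2):
    """Combine adjacent features into a single higher-level feature."""
    return np.array([x[i:i + size].max()
                     for i in range(0, len(x) - size + 1, size)])

def cnn_score(x, kernel, W_fc):
    """Convolution + pooling extract features; a fully connected layer
    maps them to a single score in (0, 1) via a sigmoid."""
    feats = max_pool(np.maximum(conv1d(x, kernel), 0))  # ReLU then pooling
    return 1.0 / (1.0 + np.exp(-(W_fc @ feats)))

x = np.array([0.1, 0.2, 3.0, 0.1, 0.0, 2.8, 0.2, 0.1])  # toy feature vector
kernel = np.array([-1.0, 2.0, -1.0])  # responds strongly to isolated spikes
W_fc = np.ones(3)                     # toy fully connected weights
score = cnn_score(x, kernel, W_fc)
print(0.0 < score < 1.0)  # True
```

The convolution and pooling stages play the feature-extraction role described in the text, while the final dot product plus sigmoid stands in for the fully connected classification layer.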
The graph neural networks achieve this by using message
passing procedures where it disseminates information across
the network edges thus processing information in a way that
encapsulates the graph topology and relationships of the
nodes. GNN gives fraud scores to the node or transaction as it
does graph embedding operations on the financial transaction
graph and learning its features. These suspicion scores are the
variables that are determined for the accounting of these
Fi systems in order to be exposed to fraud. The GNN was used to
gure 5. Overall Network Structure guess fraud scores and a threshold that separated ordinary and
suspicious transactions was applied. Fraud scores higher than
The design of network structure is intended to make it possible a certain threshold is the sign to put the transactions on stake,
for applying the analytical tool to network transaction data and and they are investigated deeper. The boundary value might be
for the identification of criminal financial activities in a short computed from the data distribution to avoid the occurrence of
time. In essence, we have an input feature sequencing layer, a either false positives or false negatives while optimising for
group of four convolutional layers interlaced with pooling the necessary intervals within the domain knowledge.
layers, and a fully connected layer (Fig. 1). The next task is Cooperation between automated detection from the GNN and
the feature sequencing layer; a layer operated through which the expertise of professional human analysts, will provide any
the input features are processed according to their orders. financial institution with the means to tackle financial fraud
Distinction of effects are accumulated on the model whenever efficiently and effectively by being more proactive.
different order feature input layers are convoluted. The
filtering function of the convolutional layer is to detect the B2. Anomaly Detection in Cybersecurity
local feature of the input data; in this context, developers The cybersecurity domain is nowadays being challenged by
would benefit from the new computed features based on the non-trivial attacks, whose skilling development is advanced.
input features. These new attribute items that are not defined This is why the research in defence mechanisms is now
physically but are certainly useful in the data modelling booming. Traditional detection systems that are designed to
domain, they are. Pooling helps to combine the features from work only with attack templates are not effective enough when
the adjacent areas into a single higher-level feature which is it comes to the development of new threats or changing attack
more efficient and makes use of less of the data. The final strategies, which has already resulted in search for better
layer, which is fully connected, is responsible for dynamic and smart solutions. The fact that machine learning
classification of stocks. The number of nodes in each layer of techniques including the neural networks are used as a good
a neural network varies from one input to another. The trained option to strengthen intrusion detection systems and those
networks model will get the optimised model parameters from systems have the ability of learning and reacting to new
the training data. The optimised model parameters also can be threats in (a) real-time has been a positive sign. Through the
directly applied to the detection of real trading data in a real application of the data science and analytics, cybersecurity
time. [22] experts can obtain more and more useful data from the vast
data set, which will make the defence mechanism more
ii. GNNs applied for financial fraud detection effective and the digital fortress also stronger because of the
Graph neural networks (GNN) is grasping a larger pool of continual cybersecurity threats evolution. Neural networks
users as they discover their utility in learning about graphs. provide an alternative solution by resorting to their ability to
The structure of the graph naturally supports strong problem- observe the smallest disparities with well-established norms.
solving and modelling of complex relationships between Shifting from reactive detection to proactive detection, neural
nodes through message passing and agglomeration. [23] networks automatically process historical information and
The graph applied in the case of financial fraud detection datasets containing malicious behaviour patterns, thus being
scenario is usually made-up of nodes that refer to accounts and more capable of identifying and mitigating cyber threats in
edges which represent transactions. Every node means a real-time.
financial account including the examples of bank account,
credit card account, or any financial institution implicated in a i. RNN in cybersecurity
transaction. Nodes can possess values, namely type of RNN, or recurrent neural network, which is a subset of neural
account, transaction history, current balance, account owner networks, features loops within its nodes, forming a directed
information, and other data applicable to fraud detection. All graph. This structure enhances its status as a network. This
ripples in between correspond to a financial exchange between subject allows us to demonstrate the recognition of the
two accounts. The edge label displays the transaction amount dynamic behaviour that is carried out in the sequence. The
transferred, in relation from account A to account B. Edges internal memory serves as a place where the sequence of
may be linked with weighted attributes representing the activations is processed, that way they can conduct both back
quantities’ transfers or transactions annotations (e.g. and forward transmission by forming feedback loops in the
transactions mechanism in certain occasions or the transferred network. Gradients are more complicated to deal with when
sums). training RNNs, however. Nevertheless, the progress attained
networks, features loops within its nodes, forming a directed graph. This structure is what makes it a network in the full sense, allowing it to exhibit dynamic behaviour over a sequence. The internal memory serves as the place where the sequence of activations is processed; by forming feedback loops in the network, RNNs can conduct both backward and forward transmission. Gradients are more complicated to deal with when training RNNs, however. Nevertheless, the progress attained to date in architecture and training has yielded different RNNs, and the model is now somewhat easier to train. LSTM (long short-term memory), an improved form of the RNN, was proposed in 1997 by Hochreiter and Schmidhuber. LSTM marked the first step of a new revolution in speech recognition and achieved remarkable success over some traditional models in niche applications. It serves to overcome the main drawback of RNNs: short-term memory. In LSTMs, several neurons are connected to the previous time unit, and the configuration of units responsible for collecting information, known as the memory accumulator, forms a memory cell [24] [25]. In Deep Learning Based Multi-Channel Intelligent Attack Detection for Data Security [26], the authors recommend the algorithms seen below:

Algorithm 1: Training Neural Network
-----------------------------------------------------------
Input: Features X extracted from the training dataset
       with labelled information

Initialization:
1. for channel = 1 to N do
2.     Train LSTM-RNN model
3.     Save the LSTM-RNN model as a classifier c
4. end for

Return: c

The detection algorithm is described by pseudocode, given as Algorithm 2.

Algorithm 2: Attack Detection
-----------------------------------------------------------
Input: Features X extracted from the test dataset
       with labelled information

Initialization:
1. for channel = 1 to N do
2.     Load LSTM-RNN model as a classifier
3.     Get the result vector R of the classifier
4. end for

Vote to get the majority element v:
1. for r in R do
2.     Vote to get the majority element v
3. end for

Return: v

Algorithm 1 presents the process for training a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) model. The features X extracted from the labelled training dataset are taken as the input. The algorithm starts by setting up an LSTM-RNN model for each channel in the dataset: it loops over all the channels, trains the LSTM-RNN model, and saves the trained model as a classifier. After that, it returns the classifier c that can make predictions. Algorithm 2 explains the detection scheme using the classifier learned in Algorithm 1. It takes as input the features X, with their labels, extracted from the test dataset. The algorithm begins by loading the specified LSTM-RNN model as the classifier of each channel. It then obtains the vector R of evaluation results by running the classifier on the test dataset. Continuing, it goes through all elements of R, applying a voting method to determine the majority element v. It finishes by returning v as the result of the attack detection process. [26]
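The per-channel classification and majority vote of Algorithm 2 can be sketched as follows. This is only an illustration of the voting step: the real channel classifiers would be the trained LSTM-RNN models from Algorithm 1, whereas here each one is stubbed as a function returning 0 (normal) or 1 (attack), and all names are hypothetical.

```python
from collections import Counter

# Sketch of the per-channel detection and majority vote in Algorithm 2.
# Each "classifier" stands in for a trained LSTM-RNN channel model and
# simply returns 0 (normal) or 1 (attack) for a feature vector.

def detect(classifiers, features):
    """Classify the feature vector on every channel, then vote."""
    # R holds one prediction per channel, as in the pseudocode.
    R = [clf(features) for clf in classifiers]
    # The majority element v of R is the final detection result.
    v, _count = Counter(R).most_common(1)[0]
    return v

# Three hypothetical channel classifiers; two of them flag an attack.
classifiers = [lambda x: 1, lambda x: 1, lambda x: 0]
result = detect(classifiers, features=[0.2, 0.7, 0.1])
```

With two of the three channels voting "attack", the majority element v returned here is 1.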
B3. Facial Recognition for Security Purposes
Facial recognition is the most critical function of video surveillance systems: it makes it possible to determine whether the image of a person appears in a scene, mostly monitored through a network of cameras. Such applications have widespread use in border security, access control systems, monitoring, and law enforcement. This helps in addressing security-related issues while making privacy and accuracy a top priority. The use of people's faces in photos has given rise to increasing interest among scientists, owing both to its practical applications and to the challenge it presents to artificial vision algorithms. Specialists have to be ready to deal with the extremely high diversity of facial features, as well as the many different parameters of the image (angle, lighting, hairstyle, facial expression, background, etc.). Currently, the most widely recognized face recognition methods utilise Convolutional Neural Networks. This subsection describes the architecture of a Deep Learning model that improves on the best existing programs in terms of accuracy and processing time.

i. CNN in Facial Recognition
The network in question is composed of two convolutional layers, followed by a fully connected layer and, lastly, a classification layer. Every convolutional layer is succeeded by an activation layer and a pooling operation. In addition, two regularisation techniques are added after each convolutional layer: batch normalisation and dropout. The fully connected layer is then applied, followed by the dropout technique, which reduces overfitting and improves the performance of the proposed neural network model. [27]

For image processing, or any sort of prediction associated with images, a convolutional neural network is the first choice. A standard convolutional neural network consists of a number of simple layers, which may be repeated n times in the network depending on what is to be predicted [28] [29]. The first layer is a convolutional layer populated with filters that are applied to the pixels of the image.

Usually, the image is larger than the filter applied to it. The filter moves across the image from beginning to end, horizontally and vertically, one step at a time, and the values of the convolutional layer are calculated with a dot product. The generated convolutional layer results are then passed to the next layer, called the pooling layer. Through this process, the dimensions of the values taken from the previous layer are reduced to the features that best describe the image; the pooling step is carried out by a pooling filter that scans the output of the previous layer. Depending on what is to be predicted, a convolutional layer and successive pooling layers are repeatedly applied to produce the desired output. Subsequently, the result is exposed to the compression stage, where, after it is pooled, the final dimension is flattened out. This output goes to the next, fully connected layer, where the prediction is made; finally, at the last layer, the predicted output can be seen. In the present study, an exhaustive search of the image data produces around 68 key points, which are the main asset of the study. The overall CNN model can be seen in the given Fig. 1 to understand the structure of the CNN. The image will be pre-trained in the proposed CNN architecture, which has not been done in the previous stage [30] [31]. The RGB-formatted input image, with colour values in [0,255], is converted to grayscale so that the values change to [0,1]. To maintain the consistency of the original information (it has a resolution of 224×224 pixels), this grayscale data is resampled to the standard pixel size [32] [33] [34]. The task is to apply the appropriate formatting steps; after that, the convolutional model accepts the image. Human figure key point extraction was achieved using the architecture of the CNN model given in Fig. 6.

Figure 6. CNN architecture for Facial Key point Prediction
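The preprocessing step described above, converting an RGB image with values in [0,255] into grayscale values in [0,1], can be sketched as follows. This is a minimal illustration, not the cited pipeline: the 2×2 image and the standard luminance weights are illustrative, and a real pipeline would additionally resample the result to 224×224.

```python
# Sketch of the preprocessing described above: an RGB image with values
# in [0,255] is converted to grayscale and rescaled to [0,1].
# The tiny 2x2 image and the luminance weights are illustrative.

def to_grayscale_unit(rgb_image):
    """rgb_image: nested lists of (r, g, b) tuples with values in 0..255."""
    gray = []
    for row in rgb_image:
        gray_row = []
        for r, g, b in row:
            # Weighted luminance, then rescale from [0,255] to [0,1].
            y = 0.299 * r + 0.587 * g + 0.114 * b
            gray_row.append(y / 255.0)
        gray.append(gray_row)
    return gray

image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 255, 0)]]
gray = to_grayscale_unit(image)  # all values now lie in [0,1]
```

After this normalisation (and resampling to the network's input resolution), the grayscale array is what the convolutional model accepts.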
B4. Challenges
The application of neural networks to security, on the other hand, is fraught with challenges despite its effectiveness. One prominent drawback of neural network models lies in the choice of network architecture. In some studies, researchers have noticed that increasing the number of layers in the model can negatively affect accuracy. [20] This highlights the importance of the model (model class) architecture by demonstrating how it affects accuracy; hence, an appropriate model class architecture and tuning are required. Keeping up with the latest neural network algorithms and solutions is also critical for organisations that are prone to financial abuse. [35] The maliciously changing nature of fraud schemes will continue to pose a challenge for financial institutions, since criminals are always devising new means to carry out their scams. In other words, although neural networks provide very attractive tools for fraud detection, anomaly detection, etc., their incorporation necessitates an in-depth comprehension of their capabilities, defects, and latest developments to make them an effective weapon against crime.

B5. Conclusion
As we navigate an increasingly interconnected and digitised world, the integration of neural networks in security systems promises to fortify defences, thwart malicious activities, and safeguard critical assets. Through an exploration of their applications in fraud detection, anomaly detection in cybersecurity, and facial recognition for security purposes, this section illuminates the transformative potential of neural networks in shaping the future of security paradigms.

C. In Health Care
In recent years, the technological advancements in health systems, and especially the integration of neural networks in healthcare, have revolutionized the world of medicine. In this section, we will focus on the influence neural networks have had on healthcare, emphasizing the various neural network architectures that are commonly used in medicine, discussing the diverse range of their applications across various medical fields, and analysing the challenges of applying deep learning in healthcare.

C1. Architectures
This section describes the various neural network architectures adapted for healthcare applications. While Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are extensively used in healthcare, this section will focus on Autoencoders (AE), Restricted Boltzmann Machines (RBM), and Long Short-Term Memory (LSTM).

i. Autoencoders (AE)
Autoencoders are one of the deep learning models that illustrate the idea of unsupervised representation learning. Initially, they were introduced as an early tool for pre-training supervised deep learning models when labeled data was uncommon. Despite that, they have remained useful for unsupervised procedures such as phenotype discovery [36]. Explicitly, autoencoders are divided into two main parts: the encoder and the decoder. The encoder consists of an input layer, while the decoder comprises an output layer [37]. Moreover, they possess the same number of nodes for input and output, and the number of hidden units is smaller than that of the input or output layers, which achieves the whole purpose of an AE: autoencoders are designed to encode the input data into a lower-dimensional space [38]. By training an AE on a dataset, the input data can be transformed into a format that stores only the most important derived dimensions. In this way, they bear resemblance to standard dimensionality reduction techniques, for instance singular value decomposition (SVD) and principal component analysis (PCA). However, autoencoders have an important advantage for complicated problems on account of the nonlinear transformations performed by each hidden layer's activation functions, although one hidden layer of an autoencoder could be insufficient to represent all the data if the input is of high dimensionality.

Additionally, autoencoders stacked on top of each other are able to construct a Deep Autoencoder (DAE) architecture.
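To make the encoder/decoder split and the bottleneck concrete, here is a minimal structural sketch of an autoencoder. It is untrained, with random weights, and the 4-to-2-to-4 layer sizes are purely illustrative; a real model would learn the weights by minimising reconstruction error.

```python
import math
import random

# Structural sketch of an autoencoder: a 4-unit input is encoded into a
# 2-unit hidden representation (the bottleneck) and decoded back to 4
# units. Weights are random and untrained; the point is the shape of
# the mapping, not its quality.

random.seed(0)
n_in, n_hidden = 4, 2  # hidden layer smaller than input, as described

W_enc = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
W_dec = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def encode(x):
    # The nonlinear activation (tanh here) is what gives autoencoders
    # their advantage over linear methods such as PCA or SVD.
    return [math.tanh(v) for v in matvec(W_enc, x)]

def decode(z):
    return matvec(W_dec, z)

x = [0.5, -0.2, 0.8, 0.1]
z = encode(x)      # compressed, 2-dimensional code
x_hat = decode(z)  # reconstruction, back to 4 dimensions
```

Training would adjust `W_enc` and `W_dec` so that `x_hat` approximates `x`, forcing the 2-dimensional code `z` to retain the most important derived dimensions of the input.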
Numerous variants of AE have been proposed to make the acquired representations more robust and consistent under small changes in the input pattern. One of those variants is the Sparse Autoencoder (SAE), which specializes in learning sparse representations of the input data. Sparse Autoencoders achieve sparsity by activating only a small subset of neurons during encoding, making the classes even more separable. Vincent et al. [39] proposed another variant known as denoising autoencoders. This method reconstructs the input after corrupting the patterns with noise, forcing the model to focus solely on capturing the structure of the input. A similar concept was introduced by Rifai et al. [40] in their proposal of contractive autoencoders; however, instead of corrupting the training set with noise, this variant adds an analytical contractive penalty to the error function. Lastly, in Convolutional Autoencoders (CAE) [41], the weights are shared among all locations in the input to maintain spatial locality and accurately process two-dimensional (2-D) patterns.

ii. Restricted Boltzmann Machine (RBM)
The Restricted Boltzmann Machine is another unsupervised deep learning architecture for learning input data representations. Their aim is similar to that of autoencoders, but RBMs take a stochastic outlook by estimating the probability distribution of the input data. Because of this, they are frequently considered generative models, aiming to model the underlying process responsible for generating the data. Training an RBM usually involves stochastic optimization methods, such as Gibbs sampling, which gradually adjust the weights to minimize the reconstruction error. In an RBM, the visible and hidden units are combined to form a bipartite graph, allowing for the implementation of more effective and thorough training algorithms. Restricted Boltzmann Machines serve as learning models in two main deep learning configurations that have been proposed in the literature: the Deep Belief Network (DBN) and the Deep Boltzmann Machine (DBM).

probability to enhance the lower bound of the probability. Similarly to DBNs, DBMs utilize a greedy layer-wise training method during pretraining. The primary challenge they face lies in their inference time complexity, which is significantly higher than that of a DBN, making optimization impractical for large training sets [44].

iii. Long Short-Term Memory (LSTM)
LSTM is a specialized recurrent neural network (RNN) architecture that was designed to model long-range dependencies in temporal sequences more accurately than conventional RNNs [41]. In the typical architecture of LSTM networks, there is an input layer, a recurrent LSTM layer, and an output layer, with the input layer directly connected to the LSTM layer. The recurrent connections within the LSTM layer extend directly from the cell output units to the cell input units, input gates, output gates, and forget gates [42]. These gates regulate the flow of information within the network: they control how much information is stored in or discarded from the memory cell at each time step, enabling the model to learn long-term dependencies more effectively. One of the main motivations behind LSTM's design is to address the vanishing gradient problem encountered in traditional RNNs. By introducing the memory cell and gating mechanism, LSTM reduces the issue of vanishing gradients, allowing it to carry errors forward over extended sequences without the gradients diminishing to zero.

C2. Applications
This section explores the applications of neural networks in healthcare, focusing on three important areas: Medical Imaging, Medical Informatics, and Disease Diagnosis Prediction.

i. Medical Imaging
In modern medicine, automatic medical imaging analysis holds significant importance, since diagnosis based on the interpretation of images can be extremely subjective.