Deep-Ensemble and Multifaceted Behavioral Malware Variant Detection Model


Received March 17, 2022, accepted April 1, 2022, date of publication April 20, 2022, date of current version April 27, 2022.


Digital Object Identifier 10.1109/ACCESS.2022.3168794

ASMA A. AL-HASHMI 1, FUAD A. GHALEB 2,3, A. AL-MARGHILANI 4,
ABDULSAMAD E. YAHYA 4, SHOUKI A. EBAD 1, MUHAMMAD SAQIB M. S. 1,
AND ABDULBASIT A. DAREM 1, (Member, IEEE)
1 Department of Computer Science, Northern Border University, Arar 91431, Saudi Arabia
2 School of Computing, Universiti Teknologi Malaysia (UTM), Johor Bahru, Johor 81310, Malaysia
3 Department of Computer and Electronic Engineering, Sana’a Community College, Sana’a, Yemen
4 College of Computer Science & Information Technology, Northern Border University, Arar 91431, Saudi Arabia

Corresponding author: Asma A. Al-Hashmi ([email protected])


This work was supported by the Deputyship for Research and Innovation, Ministry of Education, Saudi Arabia, under Project 1385.

ABSTRACT Every day, hundreds of thousands of new malware programs are developed and spread
worldwide in cyberspace. Most of these malware programs are malware variants such as polymorphic
and metamorphic malware, which are created from older versions of malware and able to change their
structures and function flows to circumvent security solutions. The accuracy of malware variant detection
is a crucial challenge. Many existing malware variant detection methods use static features extracted from the
physical structure of the malware file, such as opcodes and function flows. Unfortunately, static features
are subject to obfuscation and code shelling using simple obfuscation techniques. Although a malware
variant can change its structure and function flows, it is widely believed that the malware variant cannot
hide its malicious behavioral patterns during runtime. Accordingly, dynamic (behavioral) analysis-based
features were suggested by many studies to detect malware variants accurately. However, most of
these studies depend solely on application programming interface (API) calls, which are not
enough to accurately distinguish between malware and benign software due to API-based obfuscation techniques.
Therefore, a malware variant detection model that combines different behavioral activities can improve
detection accuracy while reducing the false-negative rate. To this end, this study proposed a Deep-Ensemble
and Multifaceted Behavioral Malware Variant Detection Model using Sequential Deep Learning and Extreme
Gradient Boosting Techniques. Different behavioral features were extracted from the dynamic analysis
environment. Then, a feature extraction algorithm that can automatically extract effective representative
patterns has been designed and developed to extract the hidden representative features of the malware variants
using a sequential deep learning model. These features have been fed into a developed extreme gradient
boosting-based classifier for decision making. Extensive experiments have been carried out to validate the
proposed scheme. The results were compared to the other related techniques in the field. The results show
that the proposed model is reliable, as it improves the detection rate while reducing the false-negative rate.

INDEX TERMS Malware detection, malware variants, multifaceted behavioral features, deep ensemble
learning, sequential deep learning.

The associate editor coordinating the review of this manuscript and approving it for publication was Claudio Zunino.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
42762 VOLUME 10, 2022

I. INTRODUCTION
Malicious software, or malware, has been growing rapidly in recent years. According to the AV-Test Institute, there are more than one billion malware programs worldwide, and 560,000 malware samples are detected every day [1]. Most malware developers do not develop malware from scratch [2]. According to [3]–[5], 50% to 80% of existing malicious software are malware variants. The number of newly detected malware variants in 2020 increased by 74% compared to that identified in 2019. In its 2021 report, Webroot stated that 94% of all malicious executables are polymorphic [6]. Polymorphic malware can frequently change its appearance (e.g., every 20 seconds) in terms of code structure and logic flow, creating massive numbers of malware variants [6], [7]. These vast amounts of malware forced researchers to propose many
methods to detect and neutralize the malware payload before compromising security.

Existing malware detection solutions can be categorized into two groups, namely static and dynamic analysis, based on the type of analysis and how their features are extracted [8], [9]. In static analysis, features are extracted from malware executable files (e.g., .exe and .dll in MS Windows) without the need to execute the malware samples. Examples of static features include strings, imported libraries, and function calls, among many others. Many solutions have been proposed to detect malware variants using static analysis [7], [10]–[15]. Static analysis has been frequently reported for the detection of malware variants [7], [10], [11], [14], [15]. However, static features can be hampered by obfuscation techniques, such as those used by polymorphic and metamorphic malware, that hide the malicious payload and make it indistinguishable [4], [16]. Some obfuscation techniques can prevent feature extraction and hinder static analysis by dynamically loading the code during runtime [13]. Therefore, static features are ineffective for malware variants that change their appearance frequently by modifying or hiding their malicious structure or function flow, or by rewriting themselves from scratch.

In contrast, dynamic analysis aims to extract behavioral features by monitoring the behavior of the executable program during runtime. Examples of behavioral features include API and system calls, log and auditing files, registry access, and network traffic. Because a malware variant is generated from old malware, malware variants usually behave similarly to the original [2]. Therefore, behavioral analysis is key to accurately detecting malware variants. However, most existing behavioral-based malware detection solutions are based solely on API calls [4], [9], [17]–[21]. Although API call traces can represent most malware variants, API calls alone are not enough to accurately distinguish between malware and benign software. This is because most malware writers use the same API functions that are used for developing benign software. Thus, it becomes difficult to differentiate between malware and benign software depending solely on API calls.

Moreover, many malware writers deliberately inject unnecessary API calls to evade detection. In addition, not all malicious or benign software uses API calls to function; many programs are written without the use of the API. In this case, the subject file may be represented by a sparse vector, and thus it is hard to distinguish malicious behavior from legitimate behavior. We argue that the absence of API traces does not mean that the subject file is benign. Accordingly, API call sequences become ineffective for accurate representation and detection.

Some solutions combined different types of features to detect malicious patterns accurately [4], [5], [22], [23]. However, most of these solutions combine different types of static features [4], [9] or combine static features with API call sequences that are extracted from dynamic analysis [4], [22]. Although API-based features from dynamic and static analysis can achieve high detection performance, API calls alone cannot reflect the malicious behavior of a malware sample. Other dynamic behavior, such as file auditing, registry access, and network behavior, can further improve the detection accuracy while reducing the false-negative rate. The hypothesis is that each type of behavioral characteristic can tell a part of the maliciousness or goodness of the investigated executable file. However, to the best of our knowledge, no model was found that combines different dynamic behaviors to detect malware variants. Therefore, it is important to incorporate different behavioral patterns into the malware variant detection model to improve its performance. Designing a model that combines different behavioral features is challenging due to the overlapping nature of the patterns, which may act as noise when constructing the classifier. Therefore, it is essential to effectively extract the representative features that distinguish between benign and malware patterns.

To this end, this study proposed a Multifaceted Deep Ensemble Behavioral-based Malware Variant Detection Scheme using sequential deep learning and the eXtreme Gradient Boosting algorithm (MDEB-MVDS-XGB). The proposed MDEB-MVDS-XGB combines multiple behavioral-based features extracted from dynamic analysis. Different behavioral features were extracted from dynamic analysis, such as API calls, log and file auditing, registry access, and network traffic. The sequential deep learning algorithm was designed and developed to extract the hidden representative malware features automatically. The activation values and weights of the last hidden layer of the trained deep learning model were used to develop an ensemble classifier using the eXtreme Gradient Boosting algorithm. Extensive experiments were carried out to evaluate the proposed model. The results show that the proposed MDEB-MVDS-XGB model can detect unseen malware variants effectively. This study made the following contributions.
1) A Multifaceted and Deep Ensemble Behavioral-based Malware Variant Detection Scheme using sequential deep learning and the eXtreme Gradient Boosting algorithm (MDEB-MVDS-XGB) is designed and developed. Different behavioral features were extracted from dynamic analysis. These features were combined and used for detection to overcome malware variant obfuscation techniques.
2) A feature extraction algorithm based on a sequential deep learning model is designed and developed for automatic extraction of the hidden malware patterns without human intervention. The weights and activation values of the neurons of the last hidden layer of the trained deep learning model are extracted and used as new representative features to train the ensemble model based on the extreme gradient boosting algorithm.
3) The fail-safe security principle is preserved by increasing the classification accuracy while minimizing the false-negative rate. Different ensemble models with single and multiple behavioral features were designed and
developed to evaluate the proposed model. The proposed model was validated and evaluated by conducting extensive experiments.

The rest of this study is organized as follows. The related work is presented in Section 2. A detailed description of the proposed model is given in Section 3. Section 4 presents the performance analysis, including the description of the dataset, performance measures, and evaluation and validation procedures. Section 5 presents the results and a discussion. It also includes the limitations and future work of this study. This study is concluded in Section 6.

II. RELATED WORK
Malware authors constantly innovate ways to create new malware variants that circumvent security solutions, while security analysts and researchers try to improve the security defenses and neutralize such threats. Many obfuscation techniques have been used to create new malware variants that can evade detection. For example, polymorphic malware can modify its appearance in terms of structure and function flow like the chameleon, which can change its color to disguise itself and hide from predators [7]. Another example is metamorphic malware, which can rewrite itself from scratch [10], [12], [24]. Such malware is usually created from previous malware but with new characteristics. Many solutions have been proposed to counter malware variants [2], [4], [5], [7], [16], [25], [26]. Most of these solutions are for detection purposes.

Malware variant detection solutions can be grouped according to the type of analysis into two types: static and dynamic. In static analysis, representative features are extracted from the portable executable file (the exe and dll files on MS Windows platforms) without executing these files. Static features extracted from the file include strings [13], operation codes (opcodes) [12], dynamic link libraries, API calls [4], [14], [19]–[21], function calls [26], and requested permissions and intended correlations (in Android platforms) [27]. Meanwhile, in dynamic analysis, representative features are extracted by monitoring the behavior of malware during runtime in terms of its interactions with the operating system [17]–[19], [22], [28], [29], file systems, Windows registry, and network traffic [15], [30]. Different behavioral features can be extracted, such as system calls, API call sequences, file-related behavior (access, creation, modification, or deletion), registry access (creation or modification), and network traffic behavior.

Liu et al. [31] proposed a malware detection model using an ensemble shared nearest neighbor (SNN) clustering algorithm. Three types of features were extracted through static analysis: opcode, control flow graph, and import functions, which were represented by a grayscale image, directed graph, and term frequency, respectively. Feature selection using information gain of the sequence was applied to extract 500 features among all features extracted using 3-gram. Features were combined, and different machine learning models were trained for decision-making. Fan et al. [32] developed a malware detection based on API call traces. Frequent subgraphs were used to represent the behavior of malware in the same family. The main drawback of this approach is the static features that are used to detect dynamic structure malware. Mahawer and Nagaraju [24] proposed a model for detecting metamorphic malware using a support vector machine with a histogram kernel. Patanaik and Barbhuiya [20] proposed a model using system calls to create a signature to detect malicious obfuscated programs. However, relying solely on interdependent system calls is ineffective for detecting malware variants because such features can be evaded easily using simple obfuscation techniques such as API call reordering and garbage API call insertion. Huang et al. [27] extracted representative features from a user interface that is associated with the top-level API function to detect stealthy behavior. For example, sending an email must be associated with a user interface that allows the user to create the message and press the send button. However, the behavioral models designed based on API correlation with the user interface have many drawbacks. For example, in many automated services of benign programs, an API function does not need to have a corresponding user interface. Therefore, depending on static analysis only makes the solution vulnerable to polymorphic and metamorphic malware types.

Bai et al. [26] developed a model that used a function call graph (FCG) to represent the malware variant. The signatures for the FCGs were created and stored in a database. A portable file with a matching FCG signature in the database is recognized as malware. The final decision of whether a file is malware or not is based on the graph isomorphism algorithm [33]. The main disadvantage of the signature-based approach is its ineffectiveness in detecting new malware variants. Moreover, the graph isomorphism algorithm can be circumvented by polymorphic malware. Xiao et al. [11] proposed a malware variant detection framework based on binary features that were extracted from portable executable file samples using a deep convolutional neural network. The malware binary is represented as an entropy graph, and features were extracted using the convolutional neural network (CNN). Then, a classifier using the support vector machine (SVM) was trained for the final classification. Cui, Xue [16] visualized the opcodes extracted from portable executable files as grayscale images and used CNN to train a model that can detect malware variants. Wang, Gao [13] proposed malware variant detection based on the Ensemble of String and Structural Static Features. Many types of features were extracted, including strings, permissions, hardware and software requirements, intents, API calls, opcodes, and the function call graph. These features have been grouped into two types: string-based and structural-based features. These features were separately used for training. Three machine learning classifiers were used to train the proposed ensemble model: SVM, k-nearest neighbor (KNN), and the random forest (RF) algorithm. The result of each classifier is weighted based on the feature type. The main drawback of these solutions is their dependence on static features, which is
ineffective for detecting malware variants due to the simple obfuscation techniques that malware authors can use to hide malicious patterns in the binary code.

Darem et al. [25] present an adaptive model for detecting malware variants based on API call sequences and incremental deep learning. The API call sequences were extracted using n-gram and represented using term frequency-inverse document frequency (TF-IDF). The main limitation of this approach is the need for human intervention to label the malware variant to update the model. Han, Xue [28] used API call sequences that were extracted from static and dynamic analysis to develop a malware detection framework. Dynamic and static API call sequences are correlated to construct a hybrid feature vector based on semantics mapping. A potential downside of this framework is that a malware author can maintain a correlation between the static API and the dynamic API by calling the injected static API during runtime. Thus, the correlation is preserved while the malicious program is executed.

Kang and Won [5] combined features extracted from static and dynamic analysis to train an ensemble model for detecting malware variants. Opcode-type features were extracted using static analysis, while API call-based features were extracted using dynamic analysis. The opcode-based feature was represented as a grayscale image, while the API calls are represented by their term frequency. Random forest, XGBoost, and different deep learning algorithms were used for classification. XGBoost was reported as the best classifier for the combined features. Sun et al. [2] proposed a malware variant detection model based on both static and dynamic analysis structured features. The suspicious system call set (SSS) and runtime behavior graph (RBG) were used as behavioral features. The static behavior graph (SBG), which is a subgraph of RBG, was used to represent malware static behavior, while the system calls were used to represent its dynamic behavior. Although the model generates the signature from malware runtime behavior, the model is signature-based, where the runtime behavior signatures of known attacks are stored for matching. A new malware variant is detected based on the similarity of its RBG and SSS with the existing signature. Zhang et al. [4] proposed a hybrid malware variant detection system based on the combination of statically extracted features with dynamically extracted features. More particularly, the operation code and API calls were used to construct two models, using CNN for the opcode-based features and an artificial neural network (ANN), the back-propagation neural network (BPNN), for the API call-based features. The hidden features extracted from the hidden layer of BPNN were combined with the SoftMax features extracted from the SoftMax layer of the CNN model to construct the hybrid feature vector. Then, a SoftMax classifier, which uses the cross-entropy loss, was used to train the malware variant classifier. Although such a model has improved the classification accuracy to some extent, there is room for improvement, especially if a single type of behavioral feature was used.

TABLE 1. Related work.

FIGURE 1. Overview of the proposed MDEB-MVDS-SDLXGB model.
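
As a rough, non-authoritative illustration of two stages of the proposed pipeline, n-gram sequence extraction (with n from one to two) and TF-IDF weighting of the form tf(x) × ln(N / (df(x) + 1)), the following Python sketch processes a few made-up, already-cleaned API traces. The function names `ngrams` and `tfidf` and the sample traces are assumptions for illustration only; they are not taken from the authors' implementation, and treating unigrams and bigrams together as the terms of a sample is likewise an assumption.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all contiguous n-grams (space-joined) for n in [n_min, n_max]."""
    terms = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[i:i + n]))
    return terms


def tfidf(samples):
    """samples: list of term lists. Returns one {term: weight} dict per sample."""
    n_samples = len(samples)
    df = Counter()
    for terms in samples:
        df.update(set(terms))  # document frequency: samples containing the term
    vectors = []
    for terms in samples:
        counts = Counter(terms)
        total = len(terms)  # number of terms in this sample
        vectors.append({
            term: (count / total) * math.log(n_samples / (df[term] + 1))
            for term, count in counts.items()
        })
    return vectors


# Hypothetical cleaned, lowercased API traces; not real sandbox output.
traces = [
    "createfile writefile closehandle".split(),
    "createfile readfile closehandle".split(),
    "regopenkey regsetvalue regclosekey".split(),
    "socket connect send".split(),
]
samples = [ngrams(trace) for trace in traces]
vectors = tfidf(samples)
```

In this toy corpus, a term that appears in a single trace (such as `writefile`) receives a higher weight than one shared by two traces (such as `createfile`), which is the behavior the weighting is meant to produce.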

In summary, many solutions were proposed to detect malware variants. As shown in Table 1, these solutions were grouped based on the type of analysis into static, dynamic, or hybrid (static and dynamic). Static analysis was frequently reported for malware variant detection. However, static features can be hampered by obfuscation techniques such as those used by polymorphic and metamorphic malware [4], [16]. Polymorphic malware changes its appearance frequently by modifying its structure or flow, while metamorphic malware rewrites itself from scratch, generating a new malware variant. Some obfuscation techniques can prevent feature extraction and hinder static analysis by dynamically loading the code during runtime [13]. API call sequences from both dynamic and static analysis were commonly used to represent the malware variants. However, depending on API calls is ineffective for many reasons. First, malware authors usually use the API calls that are used to develop benign software. Thus, it becomes difficult to differentiate between malware and benign software depending solely on the API calls. Second, the malware author injects unnecessary API calls to hide the malicious patterns among benign patterns and evade detection. Third, not all malicious or benign software uses API calls to function. In this case, the subject file may be represented by a sparse vector, and thus it is hard to distinguish malicious behavior from legitimate behavior. Many solutions have been suggested to combine different types of features to represent malware. However, most of these solutions combine different types of static features or API call sequences extracted from dynamic analysis. Although API-based features from dynamic and static analysis can achieve high detection performance, other dynamic behavior such as file auditing, registry access, and network behavior can further improve the detection accuracy while reducing the false-negative rate. Unfortunately, combining different dynamic behavioral features to detect malware variants was not considered. This study proposes a Multifaceted Deep Ensemble Behavioral-based Malware Variant Detection Scheme using sequential deep learning and the eXtreme Gradient Boosting algorithm (MDEB-MVDS-XGB). The MDEB-MVDS-SDLXGB combines multiple behavioral-based features extracted from dynamic analysis. A detailed explanation of the proposed model is provided in the subsequent section.

III. THE PROPOSED MODEL
The proposed MDEB-MVDS-SDLXGB model consists of seven main components: raw behavioral data acquisition, data preprocessing, feature extraction, feature representation, feature selection, deep multifaceted hidden feature extraction, and ensemble-based classification. Figure 1 shows an overview of the proposed model. As can be seen in Figure 1, four types of features were extracted, namely the API-, File-, Registry-, and Network-based features. After preprocessing, extraction of feature sequences using n-gram, representation using TF/IDF, and selection of the important features, four types of hidden features are extracted using sequential deep learning. The four sets of hidden features are denoted by $f_1$, $f_2$, $f_3$, and $f_4$ for API-, File-, Registry-, and Network-based features, respectively. The hidden features are merged and used to train a classifier for decision-making using the XGBoost algorithm. The detailed description of each component in Figure 1 is provided in the following subsections.

A. BEHAVIORAL DATA EXTRACTION PHASE
In this step, different types of behavioral features are collected about the subject executable file, such as network traffic, file access (read, write, create, or delete), registry access, and system call sequences (or API call traces). These features are extracted during runtime by submitting the subject file to a dynamic analysis environment that extracts behavioral features automatically. When the subject file is executed (usually in an isolated environment such as Windows Sandbox), different behavioral data can be captured.

B. DATA PREPROCESSING
Data preprocessing plays an essential role in machine learning-based models, especially in malware detection, where the malware can compromise the system in case of misclassification. Data preprocessing helps to eliminate the effect of unnecessary content on classification in order to maximize accuracy. Most of the data collected in the previous step are unstructured text data. It may be dumped from different types of acquisition tools with different formats and structures. Such data usually contain redundant and unnecessary features, have missing values, and contain noise such as symbols, XML or HTML tags, punctuation,
and stop words. Such unnecessary content or inconsistencies should be removed because they can produce misleading results. Therefore, in the preprocessing step, the data is cleaned by removing unnecessary content to help the machine learning algorithm find a correct and representative malware pattern that is distinguishable from the benign pattern. After special characters, stop words, punctuation, and unnecessary symbols are removed, the data are converted into lowercase characters for consistency.

C. FEATURES EXTRACTION
Feature extraction aims to create new informative feature sets in which the malware variant can be represented better than using the original features. In this step, a technique called n-gram is used to extract more features from each sample by concatenating subsequent words (also called terms) into groups of n subsequent words that occurred in the sample. In n-gram, each run of subsequent words of length 1 to n is used as a unique feature. For example, in one-gram, every single word is considered one feature, while in two-gram, every two subsequent words are considered one feature. N-gram has been commonly used in text data mining applications. N-gram is also used by many malware studies [8], [10], [34] to extract features from API sequences, strings, and file auditing. The higher the n value, the more features can be extracted. However, too many features lead to high dimensionality, noise, and overfitting problems. In this study, n is set to the range of one to two, so every two subsequent features are combined into one feature and then added to the extracted single features [25], [38], [39]. The reason for selecting n-gram is that a single feature in malware is not harmful by itself, whereas a feature sequence is more representative [38]. The use of a short sequence consisting of one or two features is found to be better than using a three-gram in terms of performance, as reported in [25], [39].

D. FEATURES REPRESENTATION
In this step, each sample (malware or benign) is represented by sets of unique terms (vocabulary). These unique terms were used to create a corpus. Then, all unique words in the corpus were used as feature vectors that will be used to generate the representative feature of each sample. The aim is to transform the text into numerical values so that machine learning algorithms can deal with it. The Term Frequency-Inverse Document Frequency (TF-IDF) is used to represent each unique term in the sample features. Thus, for each sample, every term in the feature vector is represented by its TF/IDF equivalent value. The TF-IDF is calculated as in the following formulas.

$$ tf(x) = \frac{\text{number of times } x \text{ occurs in a sample}}{\text{number of terms in the sample}} \tag{1} $$

$$ df(x) = \text{number of documents that contain } x \tag{2} $$

$$ x_{tf\text{-}idf} = tf(x) \times \ln\left(\frac{N}{df(x) + 1}\right) \tag{3} $$

where $x$ is the term, $tf(x)$ is the term frequency, $df(x)$ is the document frequency, i.e., the number of documents in which $x$ has occurred, and $N$ denotes the number of samples in the given dataset.

The TF-IDF can make general-purpose terms and specific terms distinguishable. For example, API terms that are frequently called by many samples are given scores lower than API calls that are specific to a particular sample or class. The general-purpose terms, which frequently occur in many samples from different classes, do not add any information about the target class. Therefore, a term is ranked highly when it is frequently used by one class and not frequently used by the other classes.

E. FEATURES SELECTION
One of the challenges of classifying malware is the high dimensionality of the features extracted from the behavioral data of the malware. The large number of features that can be extracted by the n-gram technique can lead to either an overfitting or an underfitting problem. Redundant features are a common problem in API calls due to the use of the same API functions for different functions in the program by both benign and malware authors. In addition, correlated features make the gradient descent algorithm in machine learning-based models oscillate and slow the convergence. Moreover, the correlation between the features and the variance of the loss is high even with a small average value. Thus, the learner is misled and converges to inaccurate coefficients. Furthermore, some features are very specific to a particular sample, and others are very general. Both types of features introduce noise that affects the accuracy of the detection. Therefore, feature selection is an important step in eliminating redundant features and improving detection performance. The eXtreme Gradient Boosting Algorithm (XGBoost) was utilized to select the important features in this study. XGBoost can estimate the importance of the features during training by measuring how useful each feature was in the construction of the boosted decision trees. The feature is ranked based on the number of the split points that contribute to the decision tree. This technique is called the Gini impurity or Gini index. In the Gini index, a feature is more important than the others if its $GI_f$ is lower than that of the other compared features. The Gini index $GI_f$ for a feature $f$ can be calculated as follows.

$$ GI_f = 1 - \sum_{j=1}^{n} p_j^2 \tag{4} $$

$$ p_{j(f \ge t)} = \frac{|F(j) : f \ge t|}{|F : f \ge t|} \tag{5} $$

where $p$ denotes the proportion of samples of each class in a split at point $t$, $n$ denotes the number of classes in each split, $F$ denotes the set of all values in feature $f$ that are in the split $f \ge t$, and $F(j)$ denotes the set of all values belonging to the class $j$ that are in the split $f \ge t$. For example, if we have two classes and a feature $f$ is split at point $t$, then the feature importance ($fi$) is calculated as follows.

$$ fi = 1 - \sum_{j=1}^{2} \left( \frac{|F(j)|}{|F|} \right)^{2} \tag{6} $$

F. DEEP MULTIFACETED HIDDEN FEATURES EXTRACTION
This phase aims to extract the hidden features representing the subject concerning its different behavior in terms of network, file access, API calls, and registry access.

FIGURE 2. Shows multifaceted sequential deep learning hidden features.

hidden layer is the ReLU function, while the sigmoid function is used in the output layer for decision making. To minimize the error and update the weights, the Adam optimizer, which is an extension of the stochastic gradient descent technique, was employed. It is a form of adaptive gradient method that uses an adaptive moment estimation technique to estimate a dynamic learning rate. The model is trained, and then the activation values of the last layer are extracted and used as input features for the XGBoost classifier.
As can be seen in Figure 2, the important features that
These features are extracted from the last layer of the
were selected in the feature selection phase which is donated
trained deep neural sequential model. They are the activation
by f1 , f2 , . . . fn where n is the number of selected features
values of the last hidden layer with the weights of each
from the four extracted features sets. The selected features
neuron of this layer in the deep learning model. In this phase,
are used as input to four SDL classifiers and the outputs are
four feature vectors are extracted, each representing different
the hidden features S1 , S2 , . . . Sm where m is the number of
malware behavior for each subject. These features are used to
the neuorns in the last hidden layer of the SDLs classifiers.
learn hidden behavioral patterns. Figure 2 illustrates the mul-
Figure 3 represents the methodology of the constructed mul-
tifaceted feature vector extracted from the hidden layer of the
tifaceted sequential deep learning model.
trained sequential deep learning model. Two activities were
Let F is the set of input features selected using the features
conducted to develop these multifaceted features vectors, one
importance and f is an element in F, L is the set of all layers
for training and the other for online operation or testing. In the (l)
training phase, the datasets containing features representing in the deep learning model and l is an element in L, ai is the
(l)
each type of behavior have been split into two subsets, 60% activation value of the node i in level l and wij is the weight
of the data is for the training, and the rest is for testing. In the of input node i for node j in level l and g is the activation
(l)
training phase, sequential deep learning (SDL) is constructed, function. The activation score Si of the last hidden layer of
trained, and validated. The constructed sequential model con- the train and the deep sequential model can be calculated as
sists of five dense layers: one input layer consists of the follows.
number of selected features, three hidden layers with size 64,  
32, and 16 neurons in each hidden layer, respectively, and one n
(0) (1) (0)
X
output layer consists of one neuron to evaluate the learning ai = g wij fi + b(0)  (7)
performance. The activation function used in the input and j=1
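The layer structure described above, and the idea of reading the last hidden layer as a feature vector, can be sketched in plain Python as follows. This is an illustration only, not the authors' implementation: the weights are random placeholders instead of weights trained with Adam, the helper names (`build_sdl`, `forward`) are hypothetical, and the input layer is assumed to be a dense ReLU layer with as many units as selected features, which is one reading of "five dense layers".

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """One dense layer: random placeholder weights (NOT trained) and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, layer, activation):
    """Apply activation(w . x + b) for every neuron of the layer."""
    weights, biases = layer
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def build_sdl(n_features, seed=0):
    """Five dense layers: input (n_features), hidden 64/32/16, output 1."""
    rng = random.Random(seed)
    sizes = [n_features, n_features, 64, 32, 16, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(model, features):
    """Return (score, hidden): sigmoid output and last-hidden-layer activations."""
    activations = features
    for layer in model[:-1]:              # input layer and hidden layers use ReLU
        activations = dense(activations, layer, relu)
    score = dense(activations, model[-1], sigmoid)[0]
    return score, activations             # 16 hidden values act as hidden features
```

Calling `forward(build_sdl(8), [0.5] * 8)` returns a score in (0, 1) together with a 16-element vector. In the scheme described here, one such vector would be produced for each of the four behavior types.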

FIGURE 3. The methodology of the proposed MDEB-MVDS-SDLXGB model.

 
$$a_i^{(l)} = g\left( \sum_{j=1}^{k} w_{ij}^{(l)} a_i^{(l-1)} + b^{(l-2)} \right) \tag{8}$$

$$S_i^{(l)} = w_{ij}^{(l)} \sum_{j=1}^{m} w_{ij}^{(l-1)} a_i^{(l)} + b^{(l-1)} \tag{9}$$

where m is the number of nodes in the last hidden layer, k is the number of nodes in the hidden layer before the last, and n is the number of input features. The function g() is the activation function. In this study, the ReLU function was used as the activation function of all nodes in the hidden layers.

G. ENSEMBLE BASED CLASSIFICATION
This phase aims to make the final decision about whether the subject is malware or benign. In this phase, the feature vector obtained from the previous phase is used as input to the Extreme Gradient Boosting (XGBoost) algorithm for decision making. The XGBoost algorithm has been used to train a model based on the scores made by the Multifaceted Sequential Deep Learning model. The gradient boosting method used in the XGBoost algorithm incrementally creates new decision trees that consider the error made by the previous decision trees. The gradient descent algorithm is used to reduce the error when a new tree is added. XGBoost uses Taylor expansion to calculate the cost function. The trees are gradually built and added to the ensemble. A regularization term is used to prevent the trees from becoming too complex. Figure 3 shows the structure of the proposed MDEB-MVDS-SDLXGB model. The hidden features are extracted from the trained model and used for classification in the online operation.

H. ONLINE OPERATION
The subject file is submitted to the sandbox environment for dynamic analysis. The subject is executed, and its behavior in terms of API calls, network traffic, file access, and registry access is logged. The raw text data collected are preprocessed using the aforementioned data preprocessing steps. Then, more features are extracted using the n-gram technique. Next, the representative numerical features are created using the trained TF-IDF vectorization method. Using the trained feature selection model, only important features are selected and used as input to the sequential deep learning model. The sequential deep learning model gives a score between zero and one. For each subject, there are four scores that represent its behavior in terms of API calls, network traffic, file access, and registry access. Finally, these scores are used as input features to the trained XGBoost model to decide whether the subject file is malware or benign. Algorithm 1 and Figure 4 summarize the online operation of the proposed MDEB-MVDS-SDLXGB model.

IV. PERFORMANCE EVALUATION
This section describes the evaluation process of the proposed model. It also describes the setup of the experimental environment, the dataset used, and the performance measures.

A. EXPERIMENTAL SETUP
The four types of behavioral features mentioned above were extracted during runtime from a dynamic analysis environment. The dynamic malware analysis environment is

FIGURE 4. The online operation of the proposed MDEB-MVDS-SDLXGB model.

constructed in an isolated virtual environment on a host computer with an Intel(R) Core i7 CPU @ 3.20 GHz and 16.0 GB of RAM. Cuckoo sandbox tools were used with VirtualBox to build an isolated and controlled virtual environment for malware investigation. The host operating system is Linux Ubuntu 18.04, and the guest operating system is Windows 7. Windows 7 was used as the victim machine. Sandboxes are commonly used by researchers to extract behavioral features [8], [34], [35].

The sandbox was set up following the instructions presented in [29]. The guest Windows 7 operating system was installed in the virtual machine, and a snapshot of the configured, clean state was taken. Many applications were installed, some dummy files and folders were generated, and an internet connection was enabled to make the guest operating system look more realistic to evasive malware samples. The cuckoo agent on the guest operating system runs the provided binary files and hooks their API calls, and it also logs the network traffic, file access activity, and registry access behavior. The cuckoo agent on the virtual machine collects these behavioral features of the submitted file and sends them back to the host machine. The virtual machine is then restarted with the initial clean state restored, allowing the new analysis to begin with a fresh copy of the guest operating system. Finally, the API call sequences were extracted from the cuckoo agent reports folder using Python programming packages.

B. DATASET DESCRIPTION
The malware binary files used in this study were downloaded from the public repository VX Heaven.1 Previous malware detection researchers have already used this dataset [4], [8], [10], [21], [34], [36]. There are numerous distinct types of malware families in the malware dataset, including trojans, adware, backdoors, ransomware, viruses, and worms. A total of 19076 malware samples were chosen at random from the VX Heaven collection. The benign binary files were obtained from a freshly installed Windows operating system. A total of 3994 benign executables and dynamic link libraries were collected. As a result, the dataset utilized in this study has 23070 samples, with 19076 malware samples and 3994 benign ones.

1 https://www.vxheaven.org

C. PERFORMANCE MEASURES
Multiclass performance metrics are commonly used for measuring and evaluating the quality of malware detection [37]. The same metrics have also been used to validate the proposed model in this study. These metrics include detection accuracy, detection rate (or recall), false-positive rate, precision, and F-measure. However, these performance measures are not enough for the evaluation because they do not consider the fail-safe security principle. We argue that malware detection should weigh the false-negative rate more heavily than the false-positive rate. A false positive leads to more investigation and analysis (increased human intervention), while a false negative leads to compromised security (the fail-safe principle is violated).

This study investigated the models based on the above performance measures, including the false-negative rate. Consequently, five main performance evaluation metrics were used to evaluate the effectiveness of the proposed model, namely, detection accuracy (ACC), detection rate (or recall) (DR), false-positive rate (FPR), false-negative rate (FNR), and F-measure (F1). The detection accuracy (ACC) is the percentage of the benign


samples correctly classified to all the classified samples. The detection rate (DR), also called recall, is the fraction of malware samples that are correctly classified. The false-positive rate (FPR) is the percentage of instances that are incorrectly classified as malware samples. The false-negative rate (FNR) is the percentage of instances that are incorrectly classified as benign samples. The F-measure (F1) is the harmonic mean of precision and recall and is calculated as in Equation (10), where TP is the number of malware samples that are correctly classified, FP is the number of benign samples that are wrongly classified, and FN is the number of malware samples that are wrongly classified.

$$F1 = \frac{2 \times TP}{2 \times TP + FN + FP} \tag{10}$$

Algorithm 1. The Proposed MDEB-MVDS-SDLXGB Model
Input: $suspicious_{file}$, $TFIDF_{Transformer}$, S (the set of the trained deep learning models), and $C_{XGBoosting}$
Output: $file_{class}$
Start
1: $suspicious_{file} \xrightarrow{\text{submit}} analysis_{environment}$
2: $analysis_{environment} \xrightarrow{\text{send}} raw_{features}$
3: $raw_{features} \xrightarrow{\text{preprocessing}} \forall f_s \in F_s : F_s = \{f_{s1}, f_{s2}, f_{s3}, f_{s4}\}$
4: $F_s \xrightarrow{\text{N-gram (features extraction)}} F_n : F_n = \{f_{n1}, f_{n2}, f_{n3}, f_{n4}\}$
5: Calculate TF-IDF using $TFIDF_{Transformer}$: $\forall f_i \in F_n$, $x_{tf\text{-}idf} = tf(x) \times \ln\left(\frac{N}{df(x)+1}\right) \xrightarrow{\text{append}} F$
6: $\forall f_i \in F_n$, calculate the feature importance $fi = 1 - \sum_{j=1}^{2} \left(\frac{|F^{(j)}|}{|F|}\right)^2 \xrightarrow{\text{store}} F$
7: $F = get\_topn\_features\_score(F)$
8: $\forall S_j$ in S do
9: $\forall$ neuron in the last layer l in S, find the hidden features
10: Extract the activation value $a_i^{(l)} = g\left(\sum_{j=1}^{k} w_{ij}^{(l)} a_i^{(l-1)} + b^{(l-2)}\right)$
11: Compute the hidden feature $f_i'^{(l)} = w_{ij}^{(l)} \sum_{j=1}^{m} w_{ij}^{(l-1)} a_i^{(l)} + b^{(l-1)} \xrightarrow{\text{append}} F'_j$, then $F'_j \xrightarrow{\text{append}} F'$
12: $C_{XGBoosting}(F') \xrightarrow{\text{classify}} file_{class}$
End

D. PERFORMANCE EVALUATION
The MDEB-MVDS-SDLXGB model proposed in Figure 2 has been evaluated by comparing its performance with the other five designed models as follows. MDEB-MVDS-SDLXGB is compared with the Multifaceted and Deep Ensemble Behavioral-Based Malware Variant Detection Scheme using Sequential Deep Learning Technique with Majority Voting Scheme (MDEB-MVDS-SDLMV). In MDEB-MVDS-SDLMV, the Majority Voting Scheme replaces the XGBoost technique of the proposed MDEB-MVDS-SDLXGB model. Meanwhile, in the Multifaceted Behavioral-Based Malware Variant Detection Scheme using Sequential Deep Learning (MB-MVDS-SDL), all the features from different domains have been combined in one feature vector (see Figure 5). The API call features have been combined with the features extracted from registry access, file access, and network traffic. Then, sequential deep learning with four layers was trained for the classification. The three other tested models were designed like MB-MVDS-SDL, but each was trained using a different machine learning technique: extreme gradient boosting for the MB-MVDS-XGB model, SVM for the MB-MVDS-SVM model, and the random forest algorithm for the MB-MVDS-RF model.

V. RESULTS ANALYSIS AND DISCUSSION
Figure 6(a) and Table 2 show a comparison between the performance of the proposed MDEB-MVDS-SDLXGB and the five designed models. As can be seen in Figure 6(a), the proposed MDEB-MVDS-SDLXGB outperforms all the other designed models. It achieved 99.23% accuracy, which is better than the performance achieved by the other designed models. For example, in terms of accuracy, MDEB-MVDS-SDLXGB achieved 0.8%, 0.5%, 1.2%, and 1.6% better than the other designed models MDEB-MVDS-SDLMV, MB-MVDS-SDL, MB-MVDS-SVM, and MB-MVDS-RF, respectively. Similarly, in terms of recall, the proposed MDEB-MVDS-SDLXGB model achieved the best true positive rate compared to the other tested models, while the SVM-based model achieved the lowest true positive rate. In terms of precision, the majority voting-based ensemble MDEB-MVDS-SDLMV achieved the best precision, followed by the proposed MDEB-MVDS-SDLXGB model. Although the model with the majority voting scheme, MDEB-MVDS-SDLMV, achieved the best precision among all the tested models, such an achievement is not desirable in security and malware detection because it violates the fail-safe principle. When the precision is higher than the recall, that is an indication of a higher false-negative rate, which means more undetected malware, which in turn makes the target vulnerable. Therefore, recall is more important than precision, and thus the MDEB-MVDS-SDLXGB model wins in this case. The results in terms of the F-measure confirm that the MDEB-MVDS-SDLXGB model has the better trade-off between precision and recall. It achieves a 99.48% F-measure, which is better than the other designed models MDEB-MVDS-SDLMV, MB-MVDS-SDL, MB-MVDS-SVM, and MB-MVDS-RF by 0.51%, 0.33%, 0.7%, 1.1%, and 2.9%, respectively. Overall, the proposed MDEB-MVDS-SDLXGB achieved the best performance, followed by the MB-MVDS-SDL model. The MDEB-MVDS-SDLMV model trades recall for precision; because of this, its overall performance is lower than that of MB-MVDS-SDL. Although the random forest-based model with combined features, MB-MVDS-RF, works well with high-dimensional data, it achieves the worst performance among the studied models. There are two interpretations of this behavior. The

FIGURE 5. Combined behavioral features vector.

FIGURE 6. Overall performance comparison between the proposed multifaceted deep ensemble behavioral-based malware variant detection scheme
using sequential deep learning technique with XGBoost (MDEB-MVDS-SDLXGB), and with the majority voting scheme (MDEB-MVDS-SDLMV), and
multifaceted behavioral-based malware variant detection scheme using sequential deep learning (MB-MVDS-SDL), using XGBoost (MB-MVDS-XGB),
using support vector machine (MB-MVDS-SVM), and using random forest (MB-MVDS-RF) in terms of (a) accuracy, recall, precision, and F-measure (b) FPR
and FNR.

TABLE 2. Performance comparison between the proposed model and those evaluated by others.

first is the use of majority voting for decision-making, and the second is the sparsity of the data.

Figure 6(b) and Table 3 present the performance in terms of FPR and FNR. The lowest FPR has been achieved by MDEB-MVDS-SDLMV, which achieved a 0.9% false-positive rate, followed by the proposed MDEB-MVDS-SDLXGB model, which achieves a 1.56% false-positive rate. The worst false-positive rate has been achieved by MB-MVDS-XGB, where the features were combined and the XGBoost algorithm was used for classification. Although reducing the false-positive rate is important, it is not as critical for malware detection as reducing the false-negative rate. Reducing FNR is a critical security requirement in malware detection because a false negative may lead to a successful attack. As can be seen in Figure 7 and Table 2, the proposed MB-MVDS-XGB model achieves the best reduction in terms of FNR, followed by the combined feature vector with sequential deep learning (MB-MVDS-SDL), which achieves a 0.67% false-negative rate. However, the MB-MVDS-SDL model trades the FNR against the FPR, as can be observed in Table 3.
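The measures discussed above can all be derived from the four confusion-matrix counts. The following is a minimal sketch that treats malware as the positive class and assumes the standard definitions of ACC, DR, FPR, and FNR, with F1 taken from Equation (10). The function name `detection_metrics` and the counts used in the example are hypothetical and are not results from this study.

```python
def detection_metrics(tp, fp, tn, fn):
    """Detection measures from confusion-matrix counts (malware = positive class).

    tp: malware correctly classified     fp: benign classified as malware
    tn: benign correctly classified      fn: malware classified as benign
    """
    total = tp + fp + tn + fn
    return {
        "ACC": (tp + tn) / total,            # share of all samples classified correctly
        "DR": tp / (tp + fn),                # detection rate (recall)
        "FPR": fp / (fp + tn),               # benign samples flagged as malware
        "FNR": fn / (fn + tp),               # malware samples that were missed
        "F1": 2 * tp / (2 * tp + fn + fp),   # Equation (10)
    }


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```

Because DR and FNR share the same denominator, they always sum to one, so any gain in recall is exactly a reduction in the false-negative rate that the fail-safe argument prioritizes.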


FIGURE 7. Performance measures using the sequential deep learning technique.

FIGURE 8. The performance measures using extreme gradient boosting algorithm.

TABLE 3. Results in terms of FPR and FNR.

Figures 7 (a) and (b) illustrate the performance of the Sequential Deep Learning Classification for the four studied types of behavioral features. Figure 7 (a) displays the accuracy, recall, precision, and F-measure, while Figure 7 (b) displays the FPR and FNR. As shown in Figure 7 (a), the API call sequence can effectively represent most malware variants. However, API call sequence-type features create a relatively high FPR. A combined features vector with sequential deep learning performs better than a single type of behavioral feature. The results in Figures 7 (a) and (b) show that each type of behavior contributes to a more distinctive representation of a malware variant. They show how the false-negative rate has been reduced using the combined behavioral vector compared to the FNR of the individual behavioral vectors.

Figures 8 (a) and (b) show the performance of the trained models using the Extreme Gradient Boosting Algorithm. Figure 8 (a) presents the results in terms of accuracy, recall, precision, and F-measure, while Figure 8 (b) shows the FPR and FNR results. As can be seen in Figure 8 (a), the model trained using the combined features achieves the best accuracy, detection rate (recall), and F-measure among the models, while the model designed based on the API call sequence features achieves the best performance in terms of precision and FPR. However, the API call sequence type features create a relatively high false-negative rate FNR = 6.2%, which is the main drawback of this model.

Figures 9 (a) and (b) illustrate the performance of the trained models using the SVM technique. Figure 9 (a) shows the performance in terms of accuracy, recall, precision, and F-measure, while Figure 9 (b) shows the FPR and FNR results. Similar to the XGBoost model in Figure 8, the model trained using the combined features achieves the best accuracy, detection rate (recall), and F-measure compared with the other studied models. Meanwhile, the model designed based on the API call sequence features achieves the best performance in terms

FIGURE 9. The performance measures using the support vector machine technique.

FIGURE 10. Performance measures using the random forest technique.

of precision and FPR. However, this model creates a relatively high false-positive rate FPR = 3%, which is the main drawback of this model.

Figures 10 (a) and (b) show the performance of the trained models using the Random Forest Algorithm. Figure 10 (a) shows the performance in terms of accuracy, recall, precision, and F-measure, while Figure 10 (b) displays the FPR and FNR results. It can be observed that the RF-based model trained using the combined features achieves the best performance compared with the other studied models. However, this model generates a high false-positive rate FPR = 2% with FNR = 5.9%, which is the main problem of this model.

From Figures 7, 8, 9, and 10, we can conclude that the models designed using the combined feature sets achieve better accuracy than those designed using individual features. The improvement arises because the behavior of malware variants can be better represented by considering many types of behavior. When different behavioral features are considered, the model can accurately distinguish between malicious and non-malicious behavior. In most of the cases studied, API-based features can represent malware variants well. However, high false alarms are observed when a single type of behavioral feature is used. Combined features are outstanding in terms of reducing the FNR and FPR while achieving a high detection rate (recall). Moreover, a model designed with sequential deep learning achieved the best reduction of false-negative rates with high detection accuracy. The XGBoost algorithm achieved a low false-negative rate, while RF suffered from a high FNR. Meanwhile, SVM achieves the best trade-off between precision and recall; however, both its FPR and FNR are relatively higher than those of the proposed MDEB-MVDS-SDLXGB model.

To give insight into how the proposed MDEB-MVDS-SDLXGB performs with different malware categories, Table 4 illustrates the detection accuracy for each malware category in the dataset. As can be seen in Table 4, there are nine malware categories in the testing dataset, namely, Virus, Worm, Backdoor, Trojan, Downloader, Bot, Dropper, Spyware, Keylogger, and Generic. The majority of the malware samples in the dataset are either Generic or Trojan. These malware samples belong to different malware families. In most cases, the proposed MDEB-MVDS-SDLXGB model achieves higher than 99.2%. However, a deeper investigation needs to be conducted on balanced malware families. Such an investigation has been left for future study.

TABLE 4. Detection accuracy by malware category.

TABLE 5. Performance in terms of detecting evasive malware (%).

TABLE 6. Performance comparison with related work.

TABLE 7. The improvement gained by the proposed model.

To evaluate the performance of the proposed MDEB-MVDS-SDLXGB model in terms of detecting evasive malware behavior, evasive malware samples were created to mimic evasive behavior by injecting API sequences related to benign samples into the malware API sequences. Table 5 shows the performance of detecting such evasive malware behavior. As can be noticed in Table 5, the performance has been slightly degraded as compared to the results on the original dataset before injecting the evasive behavior (see Tables 2 and 3). However, the performance of the proposed model is still higher than that of the other tested models and of the related work, as compared in the subsequent section. The use of ensemble deep learning classifiers with diverse feature sets exposes such an evasive technique.

VI. COMPARISON WITH THE RELATED WORK
The proposed model is compared with state-of-the-art related solutions. As mentioned earlier, most of the related work used API call sequences to construct the malware detection model [2], [4], [5], [25], [26], [31], [32]. Accordingly, the proposed model in this study was compared with the models that utilized the API call features extracted from either dynamic or static analysis. The comparison with the model in [25] was made without providing the labels during the testing (assuming no human intervention), which is the main limitation of the model in [25]. As mentioned in Section 2 (the related work section), the models designed in [26], [31], [32] extracted the API calls from the import address table (IAT) of the PE files using static analysis. Meanwhile, the solutions in [2], [4], [5], [25] used API sequences extracted from dynamic analysis. Accordingly, two models were implemented for the comparisons, each consisting of two classifiers. The first model utilizes the API call sequences that were extracted from the dynamic analysis, and the second model uses the IAT-based API calls to construct the first classifier. The second classifier is constructed using features extracted from the binary sets of the .text section of the PE samples. These features are commonly used in the literature in conjunction with API-based features, as mentioned in [4], [5]. Both models were trained using the XGBoost algorithm due to its effectiveness for API classification compared to other machine learning techniques (as discussed in the previous section, see Figure 6, and as reported in [5]). Figure 11 and Table 6 present the detailed performance comparison of the proposed model with the corresponding state of the art in terms of accuracy, recall, precision, F-score, false-positive rate, and false-negative rate.

Table 6 lists the performance comparison between the proposed model (MDEB-MVDS-SDLXGB) and three related works: API features extracted using dynamic analysis with an adaptive deep learning classifier as in Darem et al. [25], API call sequences extracted from dynamic analysis as proposed in [5], and API calls extracted from static analysis, namely from the Import Address Table (IAT), as in [31]. As shown in Table 7, the proposed MDEB-MVDS-SDLXGB model outperforms the related work concerning all tested performance measures. The improvement gained by the proposed model is listed in Table 7.

As can be seen in Tables 6 and 7, the overall performance in terms of F-measure of the proposed model is 99.48%, which is 1.26% higher than Darem et al. [25], 1.95% higher than the API call sequences extracted from dynamic analysis as proposed by Kang et al. [5], and 3.21% higher than the performance using the API calls extracted from static analysis (IAT) as in Liu et al. [31].

To sum up, the results of the proposed MDEB-MVDS-SDLXGB model support the hypothesis of integrating different behavioral features to extract the hidden patterns that can effectively discriminate between benign and malicious programs. It is clear from the results of the deep learning-based classifier with combined features MB-MVDS-SDLC (see Tables 2 and 3) as compared to the performance

FIGURE 11. The performance comparison with the related work.

of the single feature set used in MB-MVDS-SDL, MB-MVDS-XGB, MB-MVDS-SVM, and MB-MVDS-RF (see Figures 7, 8, 9, and 10). The correlation between features was also considered. The ensemble-based learning contributed by considering different sets of patterns that can represent a wide range of malware behaviors, which explains the effectiveness of the ensemble classifier MDEB-MVDS-SDLMV (see Table 2 and Figures 6 and 7) compared to the non-ensemble-based model MB-MVDS-SDL (see Table 2 and Figure 8).

Although the proposed model attains the highest accuracy, even with the tested evasive malware behavior, as compared with the related works, a deep analysis of obfuscated and evasive malware behavior is needed. Because the main focus of this paper is on variant malware detection, the in-depth investigation of obfuscated and evasive malware behavior is left for future work. However, as shown in Table 5, the use of combined features with ensemble deep learning makes it possible to detect evasive behavior, especially when malware uses benign API sequences to evade detection.

VII. CONCLUSION AND FUTURE WORK
This study proposes a Multifaceted and Deep Ensemble Behavioral-based Malware Variant Detection Scheme using sequential deep learning and the Extreme Gradient Boosting algorithm. The proposed model combines different sets of behavioral features to detect malware variants. The hypothesis is that each type of behavioral feature can tell a part of the maliciousness or goodness of the investigated executable file. A deep multifaceted hidden features vector is extracted automatically from the last hidden layer of a trained deep sequential learning model. Four deep learning models were constructed, each trained on a different set of behavioral features: API call sequences, file access behavior, registry access, and network traffic. The hidden representative features are extracted from the hidden layer of each trained deep learning model and combined into one feature vector. These features are used as input to the XGBoost technique to train a set of ensemble classifiers. Ensemble-based learning creates multiple different patterns that represent different behavioral perspectives. An obfuscated malware variant can be detected and neutralized due to its difficulty in hiding its malicious behavior. The results show that the proposed model improves the detection accuracy while reducing the false-negative rate compared to the related evaluated models.

One challenge that the proposed detection model may face is evasive malware that does not show its malicious behavior during the feature extraction phase. A stealthy malicious program that behaves like a benign one can go undetected until specific conditions have occurred. One can think of including features from static analysis to extract such static features. However, static features are subject to obfuscation by malware authors; thus, they can remain hidden. One should consider continuous monitoring of behavioral activities as a critical, challenging, and open research problem. Future research will extract features from the runtime environment to continuously monitor malware behavior and detect malicious patterns.

REFERENCES
[1] (2020). AV-TEST. Malware Statistics and Trends Report. (Dec. 27, 2020). [Online]. Available: https://www.av-test.org/en/statistics/malware/#:~:text=Every%20day%2C%20the%20AV%2DTEST,potentially%20unwanted%20applications%20(PUA).
[2] M. Sun, X. Li, J. C. Lui, R. T. Ma, and Z. Liang, ‘‘Monet: A user-oriented behavior-based malware variants detection system for android,’’ IEEE Trans. Inf. Forensics Security, vol. 12, no. 5, pp. 1103–1112, May 2017.
[3] W. Zhang, H. Wang, H. He, and P. Liu, ‘‘DAMBA: Detecting Android malware by ORGB analysis,’’ IEEE Trans. Rel., vol. 69, no. 1, pp. 55–69, Mar. 2020.
[4] J. Zhang, Z. Qin, H. Yin, L. Ou, and K. Zhang, ‘‘A feature-hybrid malware variants detection using CNN based opcode embedding and BPNN based API embedding,’’ Comput. Secur., vol. 84, pp. 376–392, Jul. 2019.
[5] J. Kang and Y. Won, ‘‘A study on variant malware detection techniques using static and dynamic features,’’ J. Inf. Process. Syst., vol. 16, no. 4, pp. 882–895, 2020.
[6] Threat Report, in Webroot Smarter Cybersecurity, H. Lonas, Ed., Webroot, Broomfield, CO, USA, 2018.
[7] X. Liu, X. Du, Q. Lei, and K. Liu, ‘‘Multifamily classification of Android malware with a fuzzy strategy to resist polymorphic familial variants,’’ IEEE Access, vol. 8, pp. 156900–156914, 2020.


[8] B. A. S. Al-rimy, M. A. Maarof, M. Alazab, S. Z. M. Shaid, F. A. Ghaleb, [25] A. A. Darem, F. A. Ghaleb, A. A. Al-Hashmi, J. H. Abawajy, S. M. Alanazi,
A. Almalawi, A. M. Ali, and T. Al-Hadhrami, ‘‘Redundancy coefficient and A. Y. Al-Rezami, ‘‘An adaptive behavioral-based incremental batch
gradual up-weighting-based mutual information feature selection tech- learning malware variants detection model using concept drift detection
nique for crypto-ransomware early detection,’’ Future Gener. Comput. and sequential deep learning,’’ IEEE Access, vol. 9, pp. 97180–97196,
Syst., vol. 115, pp. 641–658, Feb. 2021. 2021.
[9] Y. A. Ahmed, B. Koçer, S. Huda, B. A. Saleh Al-rimy, and M. M. Hassan, [26] J. Bai, Q. Shi, and S. Mu, ‘‘A malware and variant detection method
‘‘A system call refinement-based enhanced minimum redundancy maxi- using function call graph isomorphism,’’ Secur. Commun. Netw., vol. 2019,
mum relevance method for ransomware early detection,’’ J. Netw. Comput. pp. 1–12, Sep. 2019.
Appl., vol. 167, Oct. 2020, Art. no. 102753. [27] J. Huang, X. Zhang, L. Tan, P. Wang, and B. Liang, ‘‘AsDroid: Detecting
[10] A. G. Kakisim, M. Nar, and I. Sogukpinar, ‘‘Metamorphic malware identi- stealthy behaviors in Android applications by user interface and program
fication using engine-specific patterns based on co-opcode graphs,’’ Com- behavior contradiction,’’ in Proc. 36th Int. Conf. Softw. Eng., May 2014,
put. Standards Interfaces, vol. 71, Aug. 2020, Art. no. 103443. pp. 1036–1046.
[11] G. Xiao, J. Li, Y. Chen, and K. Li, ‘‘MalFCS: An effective malware [28] W. Han, J. Xue, Y. Wang, L. Huang, Z. Kong, and L. Mao, ‘‘MalDAE:
classification framework with automated feature extraction based on deep Detecting and explaining malware based on correlation and fusion of
convolutional neural networks,’’ J. Parallel Distrib. Comput., vol. 141, static and dynamic characteristics,’’ Comput. Secur., vol. 83, pp. 208–233,
pp. 49–58, Jul. 2020. Jun. 2019.
[12] A. Khalilian, A. Nourazar, M. Vahidi-Asl, and H. Haghighi, ‘‘G3MD: [29] C. Rossow, C. J. Dietrich, C. Grier, C. Kreibich, V. Paxson, N. Pohlmann,
Mining frequent opcode sub-graphs for metamorphic malware detection H. Bos, and M. V. Steen, ‘‘Prudent practices for designing malware exper-
of existing families,’’ Expert Syst. Appl., vol. 112, pp. 15–33, Dec. 2018. iments: Status quo and outlook,’’ in Proc. IEEE Symp. Secur. Privacy,
[13] W. Wang, Z. Gao, M. Zhao, Y. Li, J. Liu, and X. Zhang, ‘‘Droidensemble: May 2012, pp. 65–79.
Detecting Android malicious applications with ensemble of string and [30] S. Hameed, F. I. Khan, and B. Hameed, ‘‘Understanding security require-
structural static features,’’ IEEE Access, vol. 6, pp. 31798–31807, 2018. ments and challenges in Internet of Things (IoT): A review,’’ J. Comput.
[14] X. Liu, Y. Lin, H. Li, and J. Zhang, ‘‘A novel method for malware detection Netw. Commun., vol. 2019, pp. 1–14, Jan. 2019.
on ML-based visualization technique,’’ Comput. Secur., vol. 89, Feb. 2020, [31] L. Liu, B.-S. Wang, B. Yu, and Q.-X. Zhong, ‘‘Automatic malware
Art. no. 101682. classification and new malware detection using machine learning,’’
[15] S.-C. Hsiao, D.-Y. Kao, Z.-Y. Liu, and R. Tso, ‘‘Malware image classifica- Frontiers Inf. Technol. Electron. Eng., vol. 18, no. 9, pp. 1336–1347,
tion using one-shot learning with siamese networks,’’ Proc. Comput. Sci., Sep. 2017.
vol. 159, pp. 1863–1871, Jan. 2019. [32] M. Fan, J. Liu, X. Luo, K. Chen, Z. Tian, Q. Zheng, and T. Liu, ‘‘Android
[16] Z. Cui, X. Fei, X. Cai, C. Yang, G. G. Wang, and J. Chen, ‘‘Detection malware familial classification and representative sample selection via
of malicious code variants based on deep learning,’’ IEEE Trans. Ind. frequent subgraph analysis,’’ IEEE Trans. Inf. Forensics Security, vol. 13,
Informat., vol. 14, no. 7, pp. 3187–3196, Jul. 2018. no. 8, pp. 1890–1905, Aug. 2018.
[17] Q. Qian and M. Tang, ‘‘Dynamic API call sequence visualisation for mal- [33] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento, ‘‘A (sub)graph
ware classification,’’ IET Inf. Secur., vol. 13, no. 4, pp. 367–377, Jul. 2019. isomorphism algorithm for matching large graphs,’’ IEEE Trans. Pattern
[18] S. S. Chakkaravarthy, D. Sangeetha, and V. Vaidehi, ‘‘A survey on malware Anal. Mach. Intell., vol. 26, no. 10, pp. 1367–1372, Oct. 2004.
analysis and mitigation techniques,’’ Comput. Sci. Rev., vol. 32, pp. 1–23, [34] B. A. S. Al-Rimy, M. A. Maarof, M. Alazab, F. Alsolami,
May 2019. S. Z. M. Shaid, F. A. Ghaleb, T. Al-Hadhrami, and A. M. Ali,
[19] Z. Salehi, A. Sami, and M. Ghiasi, ‘‘MAAR: Robust features to detect ‘‘A pseudo feedback-based annotated TF-IDF technique for dynamic
malicious activity based on API calls, their arguments and return values,’’ crypto-ransomware pre-encryption boundary delineation and features
Eng. Appl. Artif. Intell., vol. 59, pp. 93–102, Mar. 2017. extraction,’’ IEEE Access, vol. 8, pp. 140586–140598, 2020.
[20] C. K. Patanaik, F. A. Barbhuiya, and S. Nandi, ‘‘Obfuscated malware [35] Y. Ye, T. Li, D. Adjeroh, and S. S. Iyengar, ‘‘A survey on malware
detection using API call dependency,’’ in Proc. 1st Int. Conf. Secur. Internet detection using data mining techniques,’’ ACM Comput. Surv., vol. 50,
Things (SecurIT), 2012, pp. 185–193. no. 3, pp. 1–40, May 2018.
[21] A. Sami, B. Yadegari, N. Peiravian, S. Hashemi, and A. Hamze, ‘‘Malware [36] M. Sikorski and A. Honig, Practical Malware Analysis the Hands-On
detection based on mining API calls,’’ in Proc. ACM Symp. Appl. Comput. Guide to Dissecting Malicious Software. 2012.
(SAC), 2010, pp. 1020–1025. [37] S. A. Ebad, A. A. Darem, and J. H. Abawajy, ‘‘Measuring software
[22] N. Kumar, S. Mukhopadhyay, M. Gupta, A. Handa, and S. K. Shukla, obfuscation quality—A systematic literature review,’’ IEEE Access, vol. 9,
‘‘Malware classification using early stage behavioral analysis,’’ in Proc. pp. 9903–99024, 2021.
14th Asia Joint Conf. Inf. Secur. (AsiaJCIS), Aug. 2019, pp. 16–23. [38] Zhang, H., Classification of ransomware families with machine learning
[23] R. Sihwail, K. Omar, and K. A. Z. Ariffin, ‘‘A survey on malware analysis based onN-gram of opcodes. Future Generation Computer Systems, 2019.
techniques: Static, dynamic, hybrid and memory analysis,’’ Int. J. Adv. Sci., vol. 90, pp. 211–221.
Eng. Inf. Technol., vol. 8, nos. 4–2, p. 1662, Sep. 2018. [39] D. Gibert, C. Mateu, and J. Planes, ‘‘The rise of machine learning for detec-
[24] D. K. Mahawer and A. Nagaraju, ‘‘Metamorphic malware detection using tion and classification of malware: Research developments, trends and
base malware identification approach,’’ Secur. Commun. Netw., vol. 7, challenges,’’ J. Netw. Comput. Appl., vol. 153, Mar. 2020, Art. no. 102526.
no. 11, pp. 1719–1733, Nov. 2014.
