Authors:
Sanidhya Vijayvargiya 1; Lov Kumar 2; Lalita Murthy 1; Sanjay Misra 3; Aneesh Krishna 4 and Srinivas Padmanabhuni 5
Affiliations:
1 BITS-Pilani Hyderabad, India; 2 NIT Kurukshetra, India; 3 Østfold University College, Halden, Norway; 4 Curtin University, Australia; 5 Testaing.Com, India
Keyword(s):
SMOTE, ANOVA, Genetic Algorithm, Ensemble Learning, Malware Family.
Abstract:
Malware is used to attack computer systems and network infrastructure, so classifying malware is essential for stopping hostile attacks. In the aftermath of COVID-19, the online presence of individuals has greatly increased. Everything from money transactions to personal information is shared and stored in cyberspace, which has led to more frequent and more innovative malware attacks. Malware variants now use advanced packing and obfuscation methods to gain access to private information for profit, and there is an urgent need for better software security. In this paper, we identify the best ML techniques that can be used in combination with various ML and ensemble classifiers for malware classification. The goal of this work is to identify the ideal ML pipeline for detecting the family of malware. Many previous works have been hampered by imbalanced datasets and a lack of feature selection. The best tools for describing malware activity are application programming interfaces (APIs). However, creating API call attributes that allow classification algorithms to achieve high accuracy is challenging. The dataset used to validate the proposed method includes API call count histogram features extracted by dynamic analysis. The experimental results demonstrate that the proposed ML pipeline can effectively and accurately categorize malware, producing state-of-the-art results.