A Survey On Data Mining Techniques For COVID Prediction

Jayshree Pawar1, Urjita Thakar2
1 Research Scholar, Department of Computer Engineering, Shri Govindram Seksaria Institute of Technology and Science, Indore, India
2 Professor, Department of Computer Engineering, Shri Govindram Seksaria Institute of Technology and Science, Indore, India

ABSTRACT causes severe sickness and loss of life in a number of cases if

not treated in early stages. According to a study in which
Corona Virus Disease of 2019 (COVID-19) has emerged as a parametric analysis was carried out, COVID-19 has a growth
serious health emergency worldwide. The symptoms of rate that is roughly twice that of SARS and MERS [5].
COVID-19 are un-detectable at early stage in most of the
patients. It spreads from person to person very rapidly and Health care workers are working over time for more than past
causes severe sickness and loss of life in a number of cases if one year to fight against this deadly virus. Every day, the
not treated early. Data mining techniques are very commonly health-care industry is generating massive amount of data
being used in medical sector for detection and prediction of a about COVID patients and disease. Researchers and
variety of diseases and medical conditions of patients. A physicians across the globe are working together to detect the
number of researchers are also working towards prediction of infection in people at early stages and find a treatment to cure
possibility of infection of COVID-19 among humans using this disease.
machine learning techniques, specifically by applying data
mining methods. In this paper, an extensive survey of Some researchers have analysed severity of COVID infection
available literature in the domain of prediction of COVID-19 based on specific existing illness condition such as cancer,
infection and other diseases has been presented. This also pneumonia, pregnancy, hypertension etc. Also different types
includes survey on data mining techniques, models and of datasets have been used by different researchers. These
various datasets. include images from chest X-ray, CT-Scans, pathological
reports of patients etc. Some researchers are focusing on test
Key words : Data Mining, Machine learning, COVID-19, methods and attempting to minimise testing workload [6].
Prediction, Diagnosis, Feature Selection, Misclassification.
Many researchers are using machine learning methods,
1. INTRODUCTION especially data mining techniques in the healthcare domain.
These are used to discover useful information out of a huge
Coronavirus epidemic has grappled the whole world. amount of data and to present it in an easy-to-understand
Countries on all the continents are fighting to save their format for humans. Classification and clustering are among
citizens from this deadly disease. The World Health the most common data mining techniques. Disease prediction
Organization revealed the official name of the pneumonia is very significant application of these techniques.
transmitted by this virus as "COVID-19" or "Corona Virus
Disease 2019" on February 11, 2020 [1]. Corona virus is an The algorithms of machine learning are very important for
infection transmitted by a novel severe acute respiratory the diagnosis of diseases and they have a significant impact in
syndrome coronavirus 2 (SARS-Cov-2). The virus spreads the medical field. Medical data mining is a term used to
rapidly among people in many different ways which is a major describe a variety of strategies for discovering valuable
concern around the world [2]. Though it spreads primarily patterns that assist in medical diagnosis. This is aimed at
through the air [3]. According to a report, World Health improved disease prediction and early diagnosis. This aids in
Organisation first identified this virus on 31 Dec 2019 in faster and better medical treatment and patient care [16].
Wuhan city, China. Many of the first cases of COVID-19
were related to Huanan seafood wholesale market, implying Despite the fact that several studies have been conducted on
that SARS-CoV-2 was spread from animals to humans [4]. prediction of COVID and other diseases using various
This pandemic is a major public health issue that is affecting machine learning techniques and variety of datasets, very
people all over the world. Also, it is a contagious disease and little literature containing a survey of these is available. In
this work, therefore, a review of numerous studies conducted
on the prediction of COVID and other diseases is presented.

The rest of the paper is organized as follows. Section 2 reviews recent studies on COVID prediction. Section 3 discusses work on predicting various other diseases. Section 4 discusses feature selection techniques presented by different researchers. Section 5 discusses the literature on data imbalancing techniques. Finally, Section 6 presents the conclusion and discussion.

2. RECENT STUDY ON COVID PREDICTION

In the past one and a half years, a large body of literature has been generated on the recognition and prediction of COVID using a number of machine learning techniques and different datasets.

Early in the spread of COVID in 2020, Naoya Itoh et al. studied COVID immune responses, genomics, identification, treatment and management of the disease. They also reviewed prevention and control strategies for the disease, and advised that countries globally need to pay more attention to coronavirus disease monitoring systems and increase their readiness [2].

CT scans, Magnetic Resonance Imaging (MRI) and other imaging techniques are beneficial for detecting coronavirus disease. Medical images assist physicians in determining the impact of the disease on affected persons, and this type of data is extremely useful for accurate and efficient diagnosis [6]. As a result, large amounts of data containing medical images are available.

COVID prediction from X-rays using Artificial Intelligence (AI) techniques can be extremely useful and could help alleviate the shortage of doctors and physicians in rural areas. F. Pan et al. examined the changes in the lungs of COVID patients from the initial stage of diagnosis to recovery using chest CT findings [7]. Radiologists differentiated COVID from other viral pneumonia on chest CT scans with a high degree of specificity [8]. D. Haritha et al. presented a transfer learning method for predicting coronavirus cases from images of patients' chest X-rays [9]. In another work, the authors used Computed Tomography (CT) images and constructed an ensemble model using multivariate logistic regression, combining radiomics and deep learning features to distinguish critical cases from severe cases of COVID [10].

Using chest X-rays, S. Rajaraman et al. demonstrated the detection of COVID-19 pulmonary manifestations with iteratively pruned deep learning model ensembles. To minimise complexity and improve memory efficiency, the best-performing classifiers were pruned iteratively. To boost classification accuracy, the authors combined the predictions of the pruned models using various ensemble strategies. Improved predictions were achieved by combining modality-specific information transfer, iterative model pruning and ensemble learning [11].

R. Kumari et al. examined some established forecasting models in depth and predicted the number of confirmed, recovered and death cases caused by COVID-19 in India. This research studied how COVID-19 spread through India. Future case counts were predicted using multiple linear regression and autoregression. Such predictions may be useful for planning health care resources, so that prompt steps can be taken in advance to minimise the loss of human life [6].

In a report, epidemiologists predicted the rise in confirmed COVID-19 cases in the United States and various other nations. The authors used two classic unsupervised methods, K-means clustering and correlation analysis, to forecast the distribution of the disease. They also reported a correlation of 0.85 between overall deaths and critical patient attributes [12].

V. Bhadana et al. compared five standard machine learning models for forecasting the threatening variables of COVID-19: linear regression (LR), decision tree, least absolute shrinkage and selection operator (LASSO), random forest and SVM. Each model generated three types of forecasts for the next five days: total active cases, total deaths and total recoveries. In the authors' experiments, polynomial LR, LASSO, random forest and decision tree showed better results, while SVM performed poorly [13].

E.V. Robilotti et al. analysed risk factors for severe COVID-19 infection in patients with more than one illness, including cancer. The authors reported that the outcome of coronavirus infection was more severe among people with cancer [14].

Early risk identification of COVID can be done by examining three primary sounds: coughing, breathing and voice [1]. A number of researchers have analysed the features of patients' cough, breathing and voice using Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), Artificial Neural Networks (ANN) and, specifically, the well-known RNN architecture Long Short-Term Memory (LSTM) [15], [16], [17]. Compared with cough and breathing sound samples, the speech test had poor accuracy [15].

Based on recent research, it has been observed that various forecasting models have been developed to predict COVID. Authors have used chest X-ray images and patients' cancer reports to predict the possibility of infection. They have also demonstrated the ability of machine learning models to predict the number of future COVID-19 infected patients, which is now considered a significant challenge for humanity.

3. DISEASE PREDICTION USING DATA MINING

Earlier researchers have done a lot of work in the area of disease prediction. The contributions of various researchers are discussed next.
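Many of the studies reviewed in this section follow the same basic workflow: split a labelled medical dataset into training and test sets, train several classifiers, and compare their accuracy on the held-out records. The following is a minimal sketch of that workflow in Python using only the standard library. It is not taken from any of the surveyed papers: the two-feature dataset is synthetic and produced by a hypothetical helper, make_synthetic_dataset, and the two classifiers (k-nearest neighbours and Gaussian Naive Bayes) are written from scratch purely for illustration.

```python
import math
import random
from collections import Counter


def make_synthetic_dataset(n_per_class=100, seed=0):
    """Hypothetical two-class dataset (not real patient data).

    Each record has two numeric features drawn from a normal distribution
    whose mean depends on the class label (0 = healthy, 1 = diseased).
    """
    rng = random.Random(seed)
    data = []
    for label, (mu1, mu2) in enumerate([(0.0, 0.0), (2.0, 2.0)]):
        for _ in range(n_per_class):
            data.append(([rng.gauss(mu1, 1.0), rng.gauss(mu2, 1.0)], label))
    rng.shuffle(data)
    return data


def train_test_split(data, test_fraction=0.3):
    """Hold out the last test_fraction of the (already shuffled) records."""
    cut = len(data) - round(len(data) * test_fraction)
    return data[:cut], data[cut:]


def knn_predict(train, x, k=5):
    """k-nearest neighbours: majority label among the k closest training records."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_gaussian_nb(train):
    """Gaussian Naive Bayes: per-class prior, feature means and variances."""
    model = {}
    for c in sorted({label for _, label in train}):
        rows = [x for x, label in train if label == c]
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # A tiny constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                     for col, m in zip(columns, means)]
        model[c] = (len(rows) / len(train), means, variances)
    return model


def nb_predict(model, x):
    """Return the class with the highest (unnormalised) log posterior."""
    def log_posterior(c):
        prior, means, variances = model[c]
        score = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return score
    return max(model, key=log_posterior)


def accuracy(predict, test):
    """Fraction of test records whose predicted label matches the true one."""
    return sum(predict(x) == y for x, y in test) / len(test)


data = make_synthetic_dataset()
train, test = train_test_split(data)
nb_model = fit_gaussian_nb(train)
results = {
    "k-NN (k=5)": accuracy(lambda x: knn_predict(train, x), test),
    "Gaussian Naive Bayes": accuracy(lambda x: nb_predict(nb_model, x), test),
}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

On real medical data, such a comparison would normally be repeated with k-fold cross-validation rather than a single hold-out split, because a single split can give a noticeably different accuracy estimate depending on which records land in the test set.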

Disease prediction is a popular application of data mining. Various types of diseases, such as liver disorders, diabetes, breast cancer, thyroid illness and skin cancer, can be predicted using data mining.

S. Vijiyarani et al. presented various machine learning algorithms used in the area of disease prediction. The focus of their survey was the use of data mining techniques and multiple target attributes to predict various types of diseases, namely heart disease, diabetes and breast cancer [18]. Some authors focused on the decision parameters, attributes and features used to predict disease, and highlighted the usefulness of various classification systems for disease detection in medical datasets [19].

A number of researchers investigated and compared different data mining and machine learning techniques, such as hybrid ANN, back propagation, Decision Tree, Random Forest, Naive Bayes, KNN and association rules, for the prediction of heart disease [20]-[23].

Diabetes affects millions of people around the world, and many of these individuals are completely unaware of their condition. Earlier researchers have also proposed machine learning models such as Logistic Regression and AdaBoost for the prediction of diabetes, and highlighted the significance of various classification methods used for disease prediction in medical datasets [24]. N. Nnamoko et al. predicted diabetes onset by exploiting the diversity of heterogeneous base classifiers and the optimisation effect of attribute subset selection in order to improve accuracy [25]. In another work, the authors predicted whether or not a person has diabetes. They applied a new algorithm called the Homogeneity-Based Algorithm (HBA) and combined it with existing classification algorithms such as Support Vector Machine (SVM), Artificial Neural Network (ANN) or Decision Tree (DT) to improve classification accuracy. The experimental outcomes showed that the suggested approach significantly outperforms existing approaches [26].

T. R. Baitharu et al. analysed a liver disorder dataset using six different classifiers, namely J48, Naive Bayes, Multilayer Perceptron, IBK, ZeroR and VFI. The findings indicated that the classifiers had improved predictive performance, although Naive Bayes showed a weak outcome [27].

Breast cancer has surpassed all other cancers as the leading cause of cancer death among women. Shelly Gupta et al. studied the literature on the diagnosis and prognosis of breast cancer using machine learning models such as Multilayer Perceptron, Self-Organizing Map, Radial Basis Function, Artificial Neural Network and Support Vector Machine. The authors found that ANN gave better accuracy than the other classification techniques [28].

Other authors presented a comprehensive literature review of deep learning algorithms for early risk identification and classification of skin cancer. The reviewed studies used different algorithms, such as Artificial Neural Network, Convolutional Neural Network, Kohonen Self-Organizing Neural Network and Generative Adversarial Neural Network, for the classification of lesion images. The studies revealed that CNN performed better than the other algorithms for classifying image data [29].

The analysis of the literature reveals that researchers have used several machine learning methods for diagnosing various types of diseases, with reported classification accuracies ranging from 85% to 98%.

The next section presents the work carried out on feature selection by different researchers.

4. FEATURE SELECTION FOR CLASSIFICATION

In medical datasets, data is sometimes available in raw form and may contain irrelevant features. It is rare that all of the features in a dataset are useful for building a machine learning prediction model. Therefore, dimensionality reduction or feature selection becomes an important step before applying any data mining technique, in order to decrease the model's overall complexity. Many researchers have applied feature selection techniques to identify relevant features and improve the quality of the dataset. Their contributions are discussed below.

A number of methods are available for reducing the number of features; dimensionality reduction is the most widely used approach for removing noisy and redundant elements. Feature extraction and feature selection are the two major types of dimensionality reduction techniques. Feature extraction usually transforms or combines the original features to obtain a new feature space with lower dimensions, while feature selection selects a subset of features from the original feature space without performing any transformation. Both methods have the potential to improve learning efficiency, reduce computational complexity, create more generalizable models and reduce storage requirements [30]. Some authors discussed and compared existing feature selection approaches, namely filter, wrapper and embedded methods, along with their pros and cons. Their findings indicated that the embedded method performed better [31].

To achieve substantial dimensionality reduction in medical datasets, the authors presented an embedded hybrid feature selection model that combines two well-known data mining techniques, clustering and classification. They conducted an experiment using F-score and k-means feature selection techniques with an SVM classifier on Diabetes, Breast Cancer and Heart Disease datasets. Their findings indicated that choosing the most important features from medical data

improves the accuracy of the classifier and also helps the physician to make an accurate diagnosis [32].

Ihsan Abod Khalaf et al. proposed a network intrusion detection system based on Support Vector Machine (SVM) classifiers and two feature selection algorithms, Self-Organizing Map (SOM) and Principal Component Analysis (PCA). A voting technique was used to combine the two feature selection algorithms. The findings revealed that different feature selection algorithms can have different impacts on classification performance. In addition, the authors presented a comparative study of the accuracy obtained with Principal Component Analysis and Self-Organizing Map using an SVM classifier [33].

A. Jovic et al. summarized the application domains of multi-dimensional feature spaces, such as text mining, image processing and computer vision, bioinformatics and industrial applications, and also presented a detailed study of feature selection techniques [34].

In the surveyed literature, feature selection algorithms such as filter, wrapper and embedded methods are discussed along with their pros and cons. Many of these methods could be effective for feature selection in data related to COVID-19.

5. DATA IMBALANCING TECHNIQUES

The available COVID datasets are sometimes highly imbalanced, and using such datasets can be very difficult. Some standard techniques for balancing a dataset are discussed in this section.

The misclassification problem, or class imbalance, occurs when one class has fewer training instances than the other classes. Phung et al. looked at a variety of approaches and strategies for dealing with class imbalance at both the data level and the algorithmic level. The authors argued that sampling is one of the most popular methods for addressing the issue. They described fundamental sampling techniques such as undersampling and oversampling, as well as advanced sampling techniques for minimising the misclassification problem in training data, along with their pros and cons. They also presented a new technique for handling misclassification by integrating supervised and unsupervised learning. These approaches change the majority and minority class distributions in the training data sets to achieve an equal number of instances in each class [35].

G.M. Weiss et al. analysed how class imbalance affects classifier learning. The authors discussed how it influences learning and how it affects the evaluation of learned classifiers, and then presented the results of two experimental studies. In the first experiment, the authors compared the performance of classifiers built from unbalanced datasets against classifiers built from balanced copies of the same datasets. Their study suggested that the original class distribution is frequently unsuitable for learning and that better performance can be achieved by adjusting the class distribution. In the second experiment, the authors determined which distribution is optimal for training based on two performance metrics: classification accuracy and area under the ROC curve [36].

Many researchers have described the most well-known data reduction techniques [37], [38]. J. Laurikkala studied three methods, namely Simple Random Sampling (SRS), One-Sided Selection (OSS) and the Neighborhood Cleaning Rule (NCL), for improving the identification of small classes affected by the misclassification problem. The experiments were conducted on ten datasets, six of which contained medical data, which is our major application area of concern. They indicated that NCL gave better results than simple random sampling and one-sided selection. The findings suggested that NCL can be used to improve the modelling of difficult small classes and to build classifiers that can identify these classes in real-world data [39].

Another approach uses the over-sampling technique SMOTE (Synthetic Minority Over-sampling Technique) to handle data imbalance. Some authors demonstrated that combining over-sampling of the minority class with under-sampling of the majority class improves classifier performance compared with under-sampling the majority class alone [40], [41].

Studies have revealed that real-world datasets often suffer from the misclassification problem, which occurs when the distribution of classes in the training dataset is skewed. The data balancing techniques discussed in the literature could be useful for handling the misclassification issue in the data available on COVID.

6. CONCLUSION AND DISCUSSION

Coronavirus is one of the major causes of death around the world, and its early detection is essential for better medical treatment. In this paper, the available literature on the prediction of various diseases, specifically COVID-19, has been reviewed. According to the findings, data mining plays a significant role in disease prediction, and prediction models built with machine learning yield promising results with better accuracy. Mining the necessary knowledge from medical data assists in making the best possible diagnosis and decisions. Different feature selection and data imbalancing techniques have also been presented in a number of the papers surveyed. Based on these findings, it has been observed that classification accuracy can be increased by using balanced data and reducing the number of attributes. This survey can guide the choice of the best algorithm for COVID prediction in order to improve classification. Since the coronavirus is still prevalent in the world, mutating its form and causing havoc, it is expected that more machine learning and deep learning methods will be applied to make long-term and short-term predictions of infection in the human population.

REFERENCES

1. [Online] WHO Health information and resources, https://www.who.int/emergencies/diseases/novel-coronavirus-2019 [accessed on 1 April 2021].
2. Naoya Itoh et al., "Coronavirus disease 2019 (COVID-19): A literature review", Journal of Infection and Public Health, Volume 13, Issue 5, ISSN 1876-0341, 2020.
3. Lidia Morawska and Junji Cao, "Airborne transmission of SARS-CoV-2: The world should face the reality", Environment International, Volume 139, 105730, ISSN 0160-4120, 2020.
4. Li Q, Guan X, Wu P, Wang X, Zhou L, Tong Y, et al., "Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia", N Engl J Med, Volume 382(13), pp. 1199-1207, 2020.
5. Liang K., "Mathematical model of infection kinetics and its analysis for COVID-19, SARS and MERS", Infection, Genetics and Evolution, Volume 82, 104306, PMID: 32278147, PMCID: PMC7141629, 2020.
6. R. Kumari et al., "Analysis and predictions of spread, recovery, and death caused by COVID-19 in India", Big Data Mining and Analytics, vol. 4, no. 2, pp. 65-75, June 2021.
7. F. Pan, T. Ye, P. Sun, S. Gui, B. Liang, L. Li, D. Zheng, J. Wang, R. L. Hesketh, L. Yang, et al., "Time course of lung changes on chest CT during recovery from 2019 novel coronavirus (COVID-19) pneumonia", Radiology, vol. 295, no. 3, pp. 715-721, 2020.
8. X. Bai et al., "Performance of radiologists in differentiating COVID-19 from viral pneumonia on chest CT", Radiology, pp. 200823, 2020.
9. D. Haritha, N. Swaroop and M. Mounika, "Prediction of COVID-19 Cases Using CNN with X-rays", 2020 5th International Conference on Computing, Communication and Security (ICCCS), Patna, India, pp. 1-6, 2020.
10. C. Li et al., "Classification of Severe and Critical Covid-19 Using Deep Learning and Radiomics", IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 12, pp. 3585-3594, Dec. 2020.
11. S. Rajaraman, J. Siegelman, P. O. Alderson, L. S. Folio, L. R. Folio and S. K. Antani, "Iteratively Pruned Deep Learning Ensembles for COVID-19 Detection in Chest X-Rays", IEEE Access, vol. 8, pp. 115041-115050, 2020.
12. R. Kurniawan, S. N. H. Sheikh Abdullah, F. Lestari, M. Z. A. Nazri, A. Mujahidin and N. Adnan, "Clustering and Correlation Methods for Predicting Coronavirus COVID-19 Risk Analysis in Pandemic Countries", 8th International Conference on Cyber and IT Service Management (CITSM), Pangkal, Indonesia, pp. 1-5, 2020.
13. V. Bhadana, A. S. Jalal and P. Pathak, "A Comparative Study of Machine Learning Models for COVID-19 prediction in India", IEEE 4th Conference on Information & Communication Technology (CICT), Chennai, India, pp. 1-7, 2020.
14. Elizabeth V. Robilotti, N. Esther Babady, Peter A. Mead, Thierry Rolling, "Determinants of COVID-19 disease severity in patients with cancer", Nature Medicine, volume 26, 2020.
15. A. Hassan, I. Shahin and M. B. Alsabek, "COVID-19 Detection System using Recurrent Neural Networks", 2020 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI), Sharjah, United Arab Emirates, pp. 1-5, 2020.
16. G. Deshpande and B. Schuller, "An Overview on Audio, Signal, Speech, & Language Processing for COVID-19", ArXiv, pp. 1-5, May 2020.
17. C. Bales et al., "Can Machine Learning Be Used to Recognize and Diagnose Coughs?", ArXiv, pp. 1-10, May 2020.
18. S. Vijiyarani, S. Sudha, "Disease Prediction in Data Mining Technique – A Survey", International Journal of Computer Applications & Information Technology, Vol. II, Issue I, (ISSN: 2278-7720), January 2013.
19. A. Tikotikar and M. Kodabagi, "A survey on technique for prediction of disease in medical data", International Conference On Smart Technologies For Smart Nation (SmartTechCon), Bengaluru, India, pp. 550-555, 2017.
20. Syed Matheen Pasha, Shilpa Ankalaki, "Diabetes and Heart Disease Prediction Using Machine Learning Algorithms", International Journal of Emerging Trends in Engineering Research, Volume 8, No. 7, 2020.
21. Mangesh Limbitote, Dnyaneshwari Mahajan, Kedar Damkondwar, Pushkar Patil, "A Survey on Prediction Techniques of Heart Disease using Machine Learning", International Journal of Engineering Research & Technology (IJERT), Volume 09, Issue 06, June 2020.
22. Asha Rajkumar, G. Sophia Reena, "Diagnosis Of Heart Disease Using Datamining Algorithm", Global Journal of Computer Science and Technology 38 Vol. 10, Issue 10 Ver. 1.0, September 2010.
23. Jyoti Soni, Ujma Ansari, Dipesh Sharma, Sunita Soni, "Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction", IJCSE, Vol. 3, No. 6, June 2011.
24. Aishwarya Mujumdar, V Vaidehi, "Diabetes Prediction using Machine Learning Algorithms", Procedia Computer Science, Volume 165, Pages 292-299, ISSN 1877-0509, 2019.
25. N. Nnamoko, A. Hussain and D. England, "Predicting Diabetes Onset: An Ensemble Supervised Learning Approach", IEEE Congress on Evolutionary Computation (CEC), Rio de Janeiro, pp. 1-7, 2018.
26. Huy Pham & Evangelos Triantaphyllou, "Prediction of Diabetes by Employing a New Data Mining Approach Which Balances Fitting and Generalization", Computer and Information Science, SCI, Volume 131, pages 11-26, 2008.
27. T. R. Baitharu and S. K. Pani, "Analysis of Data Mining Techniques for Healthcare Decision Support System Using Liver Disorder Dataset", Procedia Computer Sci., vol. 85, pp. 862–870, 2016.
28. Shelly Gupta, Dharminder Kumar and Anand Sharma, "Data mining classification techniques applied for breast cancer diagnosis and prognosis", Indian Journal of Computer Science and Engineering, volume 2, 2011.
29. Mehwish Dildar et al., "Skin Cancer Detection: A Review Using Deep Learning Techniques", International Journal of Environmental Research and Public Health, Volume 18, No. 10, ISSN 1660-4601,
30. J. Tang, S. Alelyani, and H. Liu, "Feature Selection for Classification: A Review", in C. Aggarwal (ed.), Data Classification: Algorithms and Applications, CRC Press, 2014.
31. Y. Dhote, S. Agrawal and A. J. Deen, "A Survey on Feature Selection Techniques for Internet Traffic Classification", International Conference on Computational Intelligence and Communication Networks (CICN), Jabalpur, pp. 1375-1380, 2015.
32. Dr. B. Sarojini, Dr. N. Ramaraj, "Enhancing Medical Prediction using Feature Selection", International Journal of Artificial Intelligence & Expert Systems (IJAE), Volume (1): Issue (3), 2011.
33. Ihsan Abod Khalaf, Abdallah M Abualkishik and Abdulla Amin Aburomman, Mamun Bin IbneReaz, "Two Features Selection Algorithms based On ensemble of SVM Classifier For Intrusion Detection", Australian Journal of Basic and Applied Sciences, Volume 7, pp. 480-485, ISSN 1991-8178,
34. A. Jovic, K. Brkic and N. Bogunovic, "A review of feature selection methods with applications", 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 1200-1205, 2015.
35. Phung, S. L., Bouzerdoum, A., and Nguyen, G. H., "Learning pattern classification tasks with imbalanced data sets", P. Yin (Eds.), Pattern recognition, pp. 193-208, 2009.
36. G.M. Weiss and F. Provost, "The Effect of Class Distribution on Classifier Learning: An Empirical Study", Technical Report MLTR 43, Dept. of Computer Science, Rutgers Univ., 2001.
37. W.G. Cochran, "Sampling Techniques. 3rd edn.", Wiley, New York, 1977.
38. M. Kubat, S. Matwin, "Addressing the Curse of Imbalanced Training Sets: One-Sided Selection", in D.H. Fisher (ed.): Proceedings of the Fourteenth International Conference in Machine Learning, Morgan Kaufmann, San Francisco, pp. 179-186, 1997.
39. J. Laurikkala, "Improving Identification of Difficult Small Classes by Balancing Class Distribution", Proc. Conf. AI in Medicine in Europe: Artificial Intelligence Medicine, pp. 63-66, 2001.
40. N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, "SMOTE: Synthetic Minority Over-sampling Technique", Journal Of Artificial Intelligence Research, Volume 16, pages 321-357, 2002.
41. S. F. Abdoh, M. Abo Rizka and F. A. Maghraby, "Cervical Cancer Diagnosis Using Random Forest Classifier With SMOTE and Feature Reduction Techniques", IEEE Access, vol. 6, pp. 59475-59485, 2018.

