Abstract
Partitional clustering organizes data into non-overlapping groups or clusters. The technique has diverse applications across domains such as image processing, pattern recognition, data mining, rule-based systems, customer segmentation, image segmentation, and anomaly detection. Hence, this survey aims to identify the key concepts and approaches in partitional clustering, and it highlights the technique's widespread applicability along with its major advantages and challenges. Partitional clustering faces challenges such as selecting the optimal number of clusters, local optima, and sensitivity to initial centroids. Therefore, this survey describes the clustering problems as partitional clustering, dynamic clustering, automatic clustering, and fuzzy clustering. The objective of this survey is to identify the meta-heuristic algorithms proposed for these clustering problems. The meta-heuristic algorithms are further categorised into simple, improved, and hybrid meta-heuristic algorithms. This work also focuses on the adoption of new meta-heuristic algorithms, improvements to existing methods, and novel techniques that enhance clustering performance and robustness, making partitional clustering a critical tool for data analysis and machine learning. The survey also highlights the different objective functions and benchmark datasets adopted for measuring the effectiveness of clustering algorithms. Before the literature survey, several research questions are formulated to ensure its effectiveness and efficiency, such as: What are the various meta-heuristic techniques available for clustering problems? How can automatic data clustering be handled? What are the main reasons for hybridizing clustering algorithms?
The survey identifies shortcomings associated with existing algorithms and clustering problems and highlights the active area of research in the clustering field to overcome these limitations and improve performance.
1 Introduction
The process of exploring and analysing large data for new, valid, and profitable patterns is termed knowledge discovery. However, due to the rapid growth in data generation and storage, it is becoming increasingly difficult to retrieve information with traditional analysis methods. Data mining can be employed to retrieve valuable information and patterns from such large data. Data mining techniques are used to scour databases so that new and useful patterns can be discovered with little effort. Data mining tasks are classified as predictive tasks and descriptive tasks (Tan et al. 2016). Predictive tasks determine the value of a particular attribute based on other attributes. Descriptive tasks derive patterns (correlations, trends, clusters) that summarize underlying relationships. Clustering is thus a descriptive task that groups objects based on some similarity measure. Broadly, clustering can be characterized as partitional or hierarchical. Partitional clustering groups objects into non-overlapping clusters based on inter-cluster distances. Hierarchical clustering builds a tree of clusters by either an agglomerative (bottom-up) or a divisive (top-down) approach. Several other clustering methods are reported in the literature: (i) graph clustering, (ii) spectral clustering, (iii) model-based clustering, and (iv) density-based clustering, among others. Graph clustering operates on a collection of vertices and edges (Schaeffer 2007). It groups vertices such that edges are dense within a cluster and relatively sparse between clusters. Spectral clustering is a subset of graph clustering methods that utilize spectral analysis to cluster data points based on their graph representation (Kannan et al. 2004). This clustering method leverages graph theory and spectral analysis (eigenvalue decomposition) to cluster data points based on their similarity or affinity.
Spectral clustering is an efficient technique for data whose clusters are not linearly separable. Model-based clustering uses the concept of finite mixture models (Schaeffer 2007). It is a statistical clustering approach that assumes the data are generated from a mixture of underlying probability distributions; the data can thus be viewed as a combination of different probability distributions, each corresponding to a cluster. The goal of model-based clustering is to find the best-fitting model of the data by estimating the parameters of the underlying probability distributions. Density-based clustering techniques are designed to find clusters of arbitrary shapes. DBSCAN is a popular density-based clustering example (Hahsler and Bolaños 2016). DBSCAN estimates the density around each data point by counting the points in its eps-neighbourhood and, based on user-specified thresholds, identifies core, border, and noise points.
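As an illustration of the eps-neighbourhood idea, the following sketch (with illustrative data and thresholds of our choosing, not taken from any cited work) labels points as core, border, or noise; the full DBSCAN cluster-expansion step is omitted:

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' from its
    eps-neighbourhood, as in DBSCAN's density-estimation step
    (cluster expansion is omitted in this sketch)."""
    def neighbours(i):
        # Indices of all points within distance eps (point i included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    core = {i for i in range(len(points)) if len(neighbours(i)) >= min_pts}
    labels = {}
    for i in range(len(points)):
        if i in core:
            labels[i] = "core"
        elif any(j in core for j in neighbours(i)):
            labels[i] = "border"     # within eps of some core point
        else:
            labels[i] = "noise"
    return labels

# Two groups of differing density and one isolated point (illustrative data).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
labels = classify_points(pts, eps=1.5, min_pts=3)
print(labels)
```

With `min_pts=3`, the tight triangle of points qualifies as core, while the sparse pair and the isolated point do not reach the density threshold.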
However, the literature shows that partitional clustering is a prominent choice among clustering methods for data analysis. Partitional clustering is widely used in data analysis, machine learning, and data mining. It divides a dataset into non-overlapping groups such that each data point belongs to exactly one cluster. The technique aims to minimize within-cluster variance and maximize inter-cluster variance, resulting in clusters that are as distinct and cohesive as possible. While partitional clustering methods such as k-means and k-medoids are popular due to their simplicity and efficiency, these algorithms have limitations, including sensitivity to initial conditions, potential convergence to local optima, and difficulty in determining the optimal number of clusters. To handle these limitations and enhance clustering performance, meta-heuristic algorithms have been proposed as alternatives or enhancements to traditional methods. Meta-heuristic algorithms offer a flexible and adaptive approach to partitional clustering: they employ intelligent search strategies to explore the solution space and optimize clustering assignments. Metaheuristics are optimization algorithms for finding solutions to complex problems, and they provide a powerful approach to optimizing different aspects of the clustering process. This helps to improve cluster quality and to handle complex clustering problems efficiently. Different metaheuristic approaches have been developed and used for optimizing the clustering process, which consists of several steps. First, the clustering problem is defined by fixing the number of clusters and the objective function. Then the population is initialized by randomly generating an initial set of solutions.
The objective function evaluates the quality of each solution, and the fitness value of each solution indicates how well it satisfies the clustering objective. The metaheuristic iterates through the candidate solutions, improving their fitness values and the quality of the clusters. The best solutions found are updated in the current population, and when the convergence criteria are met, the best solution is returned as the set of cluster centroids. The quality of the clusters can then be evaluated using performance measures such as compactness, separation, or clustering stability. Metaheuristic algorithms also help improve clustering quality by modifying the cluster centres iteratively with respect to fitness requirements such as minimum intra-cluster distance. These algorithms are also capable of handling non-convex clusters through the exploration of intricate search spaces and the determination of non-linear cluster boundaries. However, it is also observed that metaheuristic algorithms have limitations of their own, such as getting stuck in local optima, slow convergence, unbalanced search mechanisms, loss of population diversity, and initialization issues (Yao et al. 2018; Bahrololoum et al. 2015; Bijari et al. 2018; Chang et al. 2016).
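The generic loop described above can be sketched as follows. This is a hedged illustration in which a simple random perturbation stands in for the search operators of a real metaheuristic such as GA or PSO, and the objective is within-cluster sum of squared errors (SSE); the dataset and parameters are our own illustrative choices:

```python
import math, random

def sse(centroids, data):
    """Objective: sum of squared distances to the nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in data)

def metaheuristic_cluster(data, k, pop_size=20, iters=500, seed=0):
    rng = random.Random(seed)
    dims = range(len(data[0]))
    lo = [min(p[d] for p in data) for d in dims]
    hi = [max(p[d] for p in data) for d in dims]
    # Steps 1-2: define the problem and randomly initialise a population
    # of candidate solutions (each solution is a set of k centroids).
    pop = [[tuple(rng.uniform(lo[d], hi[d]) for d in dims) for _ in range(k)]
           for _ in range(pop_size)]
    best = min(pop, key=lambda s: sse(s, data))
    # Steps 3-4: iterate, keeping any perturbed candidate that improves
    # fitness (a stand-in for GA/PSO search operators).
    for _ in range(iters):
        cand = [tuple(x + rng.gauss(0, 0.5) for x in c) for c in best]
        if sse(cand, data) < sse(best, data):
            best = cand
    # Step 5: on termination, return the best centroids found.
    return best

data = [(0, 0), (0.5, 0.2), (10, 10), (9.5, 10.2)]
centroids = metaheuristic_cluster(data, k=2)
print(sse(centroids, data))
```

The final SSE is low because one centroid settles near each of the two groups; a real metaheuristic would replace the perturbation step with crossover, velocity updates, or pheromone-guided moves.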
The visualization in Fig. 1a–d illustrates the examination of meta-heuristics in data clustering using VOS Viewer (Abbasi and Choukolaei 2023). This analysis explored various key terms within research articles from 2015 to 2024 in Science Direct that leverage meta-heuristics in data clustering. VOS Viewer is a specialized software tool designed for constructing and visualizing bibliometric networks. Widely embraced in academic circles, VOS Viewer facilitates the analysis and visualization of relationships among scientific publications, authors, keywords, and other entities within a specific research domain (Emrouznejad et al. 2023). These visualizations assist researchers in discerning patterns, clusters, and trends within the literature, providing valuable insights into the structure and dynamics of the field under investigation.
The primary aim of this survey is to identify different metaheuristic algorithms presented in the literature for Partitional clustering, along with their associated shortcomings, methods for mitigating these shortcomings, objective functions, and benchmark datasets for clustering. To achieve this objective, several research questions have been formulated to ensure the accuracy of the survey findings. These research questions are outlined below.
1.1 Research questions (RQ)
The primary survey objective is to find answers to the following Research Questions (RQ):
RQ 1
What are the various meta-heuristic techniques available for clustering problems?
RQ 2
How to handle automatic data clustering?
RQ 3
How to handle high dimensional data (problems) with clustering?
RQ 4
What are the main reasons for hybridizing the clustering algorithms?
RQ 5
What are different objective functions (distance function), different performance measures, and benchmark datasets adopted to evaluate the performance of Partitional clustering algorithms?
1.2 Purpose of this survey
The purpose of this survey paper is to provide a comprehensive review of the field of partitional clustering. This study aims to identify recent advancements in the context of meta-heuristic algorithms, exploring the structure of these algorithms and their strengths and weaknesses for handling partitional clustering problems. The survey also synthesizes knowledge from both classical and contemporary approaches to partitional clustering, including optimization-based methods (meta-heuristic algorithms), improved algorithms, hybrid algorithms, and adaptive control parameters. It highlights the various distance functions adopted as similarity measures for clustering tasks and considers the benchmark datasets that can be used to evaluate the efficacy of clustering algorithms. By examining the strengths, limitations, and potential areas for improvement of these methods, this paper seeks to offer insights into the evolution of partitional clustering and to guide future research directions. The goal is to serve as a valuable resource for researchers in selecting and designing effective meta-heuristic algorithms for complex clustering tasks and in understanding the current state of partitional clustering. To analyse this rich literature, several research questions are designed. The paper is divided into six sections. Section 2 summarizes the methodology adopted for the survey. The different techniques adopted for cluster analysis are discussed in Section 3. Section 4 presents the diverse clustering objective functions, performance metrics, and datasets considered for clustering problems. Section 5 discusses the various open issues and challenges related to clustering. Section 6 concludes the entire article, revisiting the research questions devised in Section 2.
2 Methodology for the survey
This section covers the research questions, the sources of information, and the inclusion and exclusion criteria applied to research articles for an effective and efficient survey. Figure 2 illustrates the process of collecting research articles for this survey.
2.1 Source of information
The following databases are explored for the domain of data clustering.
-
Google Scholar (www.scholar.google.co.in)
-
IEEE (www.ieeexplore.ieee.org)
-
Springer (www.springerlink.com)
-
Science Direct (www.sciencedirect.com)
-
ACM digital library (dl.acm.org)
-
Semantic Scholar (www.semanticscholar.org)
-
Elsevier (www.elsevier.co.in) and others
2.2 Inclusion and search criteria
The objective is to find various meta-heuristic algorithms for effective handling of clustering problems. Figure 3 describes the process of inclusion and exclusion of research articles. The meta-heuristic algorithms considered meet the following criteria:
-
(i)
Related to meta-heuristic algorithms.
-
(ii)
Includes work on high-dimensional clustering, data clustering, and dynamic and automatic clustering.
-
(iii)
Related to single objective and multi-objective clustering.
-
(iv)
Work published between 2015 and 2024.
-
(v)
Published in SCI and SCOPUS-listed journals.
The initial search considered all relevant work with the keywords: (Data clustering) OR (Meta-heuristic algorithms) OR (Single objective clustering) OR (Multi-objective clustering) OR (High dimensional clustering) OR (Dynamic and automatic clustering) OR (Graph clustering). The above query matched the full text of articles rather than only the title or abstract.
2.3 Exclusion criteria
Exclusion criteria are also adopted to filter out non-relevant research papers. Only research articles from journals of high repute (SCI and Scopus) are considered. The excluded material comprises research published in books, national and international conference proceedings, magazines, newsletters, educational courses, symposium workshops, and journals of lesser repute.
2.4 Extraction of articles
Initially, 956 articles were collected from various research databases; the keyword “clustering” returned a large number of articles. The next step was to exclude non-relevant articles as per the criteria, which left 455 research articles. Further, only research articles published in journals of repute were retained, by manually removing articles from non-reputed journals, books, and magazines; this excluded 182 more research articles. During the study, 189 research articles did not fit the predefined search criteria well. Finally, 130 research articles were analysed during the survey. A team of four researchers was formed to manually select articles against the predefined search criteria: two researchers selected the articles, and the selections were then cross-checked by the third and fourth researchers. In case of a conflict, a collective decision was taken by the team. This process was repeated in every phase of study selection. Table 1 and Fig. 3 illustrate the journals considered for the survey.
Figure 3 provides a comprehensive visualization of the distribution of research articles across the journals within the surveyed literature. It presents a tabular representation with four columns: Sr. No., Journal Name, Publisher, and No. of Papers. Each row corresponds to a specific journal and includes the journal name, publisher, and the number of papers within the surveyed literature. This breakdown gives a clear picture of the publication landscape and the relative contribution of each journal to the body of research on clustering algorithms. From large publishers such as Elsevier and Springer to specialized outlets such as the IEEE Transactions, the table encompasses a wide array of publication venues, highlighting the diversity of sources researchers draw on when exploring clustering algorithms and reflecting the interdisciplinary nature of the field. By presenting this information in a structured, easily digestible format, Fig. 3 offers valuable insights into the dissemination of knowledge within the clustering research community, aiding researchers in identifying key journals and publishers within the domain.
2.5 Data classification process
Finally, the articles are classified into five categories and explored thoroughly to find key points for comparative study. The articles are reanalysed and evaluated on the following parameters: (i) algorithm/methodology used, (ii) type of clustering, (iii) datasets used, (iv) performance metrics, and (v) authors.
3 Literature survey
The literature survey is divided into five subsections.
This section analyses various meta-heuristic algorithms reported for clustering problems. Further, clustering problems are divided into Partitional clustering, dynamic and automatic clustering, and fuzzy clustering.
3.1 Meta-heuristic algorithms for partitional clustering
Meta-heuristic algorithms are higher-level procedures and heuristics for optimization problems, often inspired by natural phenomena such as biological evolution and swarm behaviour. They aim to find optimal or near-optimal solutions under assumptions made for the optimization task at hand. These algorithms have been applied to clustering tasks to improve the quality of the clustering process and to overcome challenges such as determining the optimal number of clusters, handling complex data distributions, and dealing with outliers. In this section, we explore meta-heuristic clustering algorithms that have been developed to enhance clustering performance, focusing on novel strategies and recent advancements. Meta-heuristic clustering algorithms, such as Genetic Algorithms (GAs), Particle Swarm Optimization (PSO), and Ant Colony Optimization (ACO), use population-based search strategies to optimize clustering objectives. In partitional clustering, these algorithms aim to find a set of cluster assignments that maximizes intra-cluster similarity while minimizing inter-cluster similarity. The data are partitioned into a fixed number of clusters using some distance measure, where the number of clusters is fixed and known in advance; in most cases, Euclidean distance is applied to determine the optimal set of clusters. Partitional clustering is also known as non-overlapping clustering because each data point belongs to only one cluster. The popular example of partitional clustering is K-means, which is also known as hard clustering. Table 2 illustrates the partitional clustering literature examined during the survey, in terms of meta-heuristic algorithms that can be applied to improve the efficacy of clustering.
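For reference, a minimal sketch of K-means (Lloyd's algorithm), the hard, non-overlapping partitional baseline that the metaheuristics in this section aim to improve upon, is shown below; the dataset and seed are illustrative:

```python
import math, random

def kmeans(data, k, iters=100, seed=42):
    """Minimal K-means: alternate nearest-centroid assignment and
    mean-update until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(data, k)          # naive random initialisation
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in data:
            i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new = [tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

data = [(1, 1), (1.2, 0.8), (8, 8), (8.1, 7.9)]
centroids, clusters = kmeans(data, k=2)
print(sorted(len(c) for c in clusters))
```

On this well-separated toy data the algorithm converges to two clusters of two points each; its sensitivity to the initial `rng.sample` choice on harder data is exactly the weakness the surveyed metaheuristics address.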
3.1.1 Meta-heuristic algorithms for dynamic and automatic partitional clustering
Dynamic and automatic clustering is a sub-branch of partitional clustering that focuses on grouping data points into meaningful clusters in scenarios where the data itself changes over time or new data is constantly being added. This presents a challenge because static clustering techniques, which rely on fixed data sets, may not be suitable for data that evolves. Dynamic clustering techniques aim to adapt to changes in the data set by adjusting cluster structures and numbers as new data is introduced or as the data distribution changes. Automatic clustering involves algorithms that automatically determine the optimal number of clusters and the other parameters required to generate the clusters. When combined, dynamic and automatic clustering can provide an effective approach for evolving data sets without extensive manual intervention. Meta-heuristic algorithms can be used effectively in dynamic clustering because they provide flexible and efficient methods for exploring the search space; they are particularly useful for complex optimization problems and can adapt to changing environments. They can be applied to dynamic and automatic clustering by defining an appropriate objective function, such as minimizing intra-cluster distance or maximizing inter-cluster distance. As the data changes over time, these algorithms can adapt the clusters accordingly, ensuring that the clustering remains relevant and meaningful. This type of clustering covers very large data, data streams, incomplete data, noisy data, unbalanced data, and structured data. In dynamic and automatic clustering, it is important to evaluate model performance regularly, ensuring that the clusters remain meaningful as the data evolves. The choice of a specific algorithm depends on the characteristics of the data set, including its size, dimensionality, and the rate at which it changes over time.
This subsection highlights the recent work reported on dynamic and automatic partitional clustering. Table 3 illustrates the various dynamic and automatic clustering algorithms considered during the survey.
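One simple way automatic clustering can choose the number of clusters is to score several candidate values of k with a penalised objective and keep the best. The sketch below is an illustrative toy: the deterministic farthest-point seeding and the penalty weight are our own assumptions, not taken from the surveyed algorithms:

```python
import math

def sse(centroids, data):
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in data)

def lloyd(data, centroids, iters=50):
    """A few K-means update rounds around the given seed centroids."""
    k = len(centroids)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in data:
            clusters[min(range(k), key=lambda j: math.dist(p, centroids[j]))].append(p)
        centroids = [tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def auto_k(data, k_max=6, penalty=2.0):
    """Automatic cluster-count selection: try each candidate k with
    deterministic farthest-point seeding and pick the k minimising
    SSE + penalty * k. The penalty weight is an illustrative choice."""
    best_k, best_score = None, float("inf")
    for k in range(1, min(k_max, len(data)) + 1):
        centroids = [data[0]]
        while len(centroids) < k:    # farthest-point initialisation
            centroids.append(max(data, key=lambda p: min(math.dist(p, c)
                                                         for c in centroids)))
        score = sse(lloyd(data, centroids), data) + penalty * k
        if score < best_score:
            best_k, best_score = k, score
    return best_k

data = [(0, 0), (0.1, 0.2), (5, 5), (5.1, 4.9), (10, 0), (10.2, 0.1)]
print(auto_k(data))
```

The penalty term plays the role that validity indices (or multi-objective fitness functions) play in the surveyed automatic-clustering metaheuristics: without it, larger k always lowers SSE.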
3.1.2 Meta-heuristic algorithms for fuzzy clustering (generalization of the partitional clustering)
Fuzzy clustering, also known as soft clustering, is a generalization of the partitional clustering method in which each data point can belong to more than one cluster with a certain degree of membership. This is in contrast to traditional (hard) clustering methods, such as k-means, where each data point is assigned to one and only one cluster. Fuzzy clustering is particularly useful when the boundaries between clusters are not clear-cut, or when the data itself is inherently ambiguous or overlapping. The most commonly used fuzzy clustering algorithm is Fuzzy C-Means (FCM), introduced by James Bezdek in 1981 as an extension of the classic k-means algorithm that allows data points to have partial membership in multiple clusters. Fuzzy clustering is widely used in applications such as pattern recognition, data analysis, image segmentation, and bioinformatics, where overlapping or ambiguous groups may exist in the data. Meta-heuristic algorithms can be employed in fuzzy clustering to optimize the clustering process, particularly in finding the optimal number of clusters, the best initial cluster centroids, or the optimal fuzziness parameter (m). FCM can suffer from limitations such as sensitivity to initial conditions and local optima; meta-heuristic algorithms can help improve its performance by exploring a broader search space and finding better solutions. By integrating meta-heuristic algorithms with fuzzy clustering, more robust, flexible, and efficient clustering results can be obtained in complex data environments. Table 4 highlights the recent work reported on fuzzy clustering.
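The degree-of-membership idea at the heart of FCM can be illustrated with its standard membership update; the sketch below computes the memberships of a single point (the data and the fuzzifier m = 2 are illustrative):

```python
import math

def fcm_memberships(point, centroids, m=2.0):
    """Standard FCM membership update for one point:
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    so memberships sum to 1 and closer centroids receive
    higher membership."""
    d = [max(math.dist(point, c), 1e-12) for c in centroids]  # guard /0
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[k]) ** exp for k in range(len(d)))
            for i in range(len(d))]

# A point three times closer to the first centroid (illustrative).
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
print(u)   # memberships ~0.9 and ~0.1
```

The full FCM algorithm alternates this update with a membership-weighted centroid update; the metaheuristics surveyed here typically optimize the centroids (or m) around this same membership formula.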
3.1.3 Improved meta heuristic algorithm for partitional clustering
Meta-heuristic algorithms explore the search space to find solutions to optimization problems, but it is not always possible to explore the entire search space, and these algorithms are not exact. To enhance their performance, amendments can be made that improve their efficiency and effectiveness, such as using neighbourhood concepts, defining new search strategies, and making the algorithmic parameters adaptive. Improved meta-heuristic algorithms are characterized by enhanced efficiency, convergence speed, exploration-exploitation balance, and robustness in solving partitional clustering problems. Improvements include combining different meta-heuristic algorithms according to their strengths so as to offset individual weaknesses, and integrating local search methods with meta-heuristics to refine solutions in promising areas of the search space. Parameters can be adjusted dynamically based on feedback from the search process so that the algorithm adapts more effectively to the clustering problem, or procedures can be designed for the algorithm to self-adapt its parameters automatically during the search. These improvements can be tailored and combined in various ways depending on the specific problem and application. Research and innovation in meta-heuristic algorithms continue to evolve, and new approaches and enhancements are regularly proposed in the academic and research communities. Hence, this section summarizes the improvements reported to original meta-heuristic algorithms for effectively solving clustering problems. Table 5 illustrates various improved metaheuristic algorithms in the literature.
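As a small example of making an algorithmic parameter adaptive, consider the widely used linear decay of PSO's inertia weight; the 0.9 to 0.4 range is a common setting, used here purely for illustration:

```python
def adaptive_inertia(t, t_max, w_start=0.9, w_end=0.4):
    """Linearly decay the inertia weight w from w_start (favouring
    exploration) to w_end (favouring exploitation) as iteration t
    approaches t_max."""
    return w_start - (w_start - w_end) * t / t_max

# Early iterations explore broadly; late iterations exploit locally.
for t in (0, 50, 100):
    print(t, adaptive_inertia(t, 100))
```

The same pattern (schedule a control parameter on search progress, or drive it from feedback such as stagnation counts) underlies many of the improved algorithms collected in Table 5.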
3.1.4 Hybrid metaheuristic algorithm for partitional clustering
Hybridization is an active area of research for improving and enhancing the performance of algorithms. A hybrid meta-heuristic algorithm combines different meta-heuristic approaches, or integrates a meta-heuristic with other optimization techniques, to take advantage of their respective strengths while mitigating their weaknesses. In the context of clustering, a hybrid meta-heuristic algorithm can optimize cluster assignments and centroids while balancing exploration and exploitation in the search process. Hybrid meta-heuristic algorithms for partitional clustering combine the strengths of different optimization techniques to achieve better clustering results: they can enhance the clustering process by improving the selection of initial cluster centres, balancing exploration and exploitation during the search, and increasing the algorithm's robustness and efficiency. Hybrid algorithms can be fine-tuned and adapted to the specific clustering problem and dataset characteristics, which is particularly beneficial for complex clustering problems where traditional methods may struggle. By leveraging the strengths of multiple meta-heuristic approaches, hybrid algorithms can potentially outperform individual methods, offering more robust and effective solutions. Hence, this section presents the various hybrid meta-heuristic algorithms reported for solving clustering problems. Table 6 illustrates the hybrid metaheuristic algorithms for clustering found in the literature.
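A minimal illustration of the hybrid idea: a global random-restart search (standing in for the metaheuristic component) proposes centroid sets, and a few K-means iterations act as the local refinement; all data and parameters are illustrative:

```python
import math, random

def sse(cents, data):
    return sum(min(math.dist(p, c) ** 2 for c in cents) for p in data)

def kmeans_refine(data, cents, iters=10):
    """Local-search component: a few Lloyd updates around a candidate."""
    k = len(cents)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in data:
            clusters[min(range(k), key=lambda j: math.dist(p, cents[j]))].append(p)
        cents = [tuple(sum(x) / len(c) for x in zip(*c)) if c else cents[i]
                 for i, c in enumerate(clusters)]
    return cents

def hybrid_cluster(data, k, restarts=10, seed=3):
    """Hybrid sketch: a global search proposes centroid sets (here,
    random restarts standing in for a metaheuristic) and K-means
    refines each one; the best refined solution wins."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        cand = kmeans_refine(data, rng.sample(data, k))
        if best is None or sse(cand, data) < sse(best, data):
            best = cand
    return best

data = [(0, 0), (0.2, 0.1), (6, 6), (6.1, 5.8)]
best = hybrid_cluster(data, k=2)
print(sse(best, data))
```

In published hybrids the global proposer is typically a GA, PSO, or similar population search rather than blind restarts, but the division of labour (global exploration plus cheap local exploitation) is the same.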
4 Objective function, performance metric and dataset
This section describes various objective functions, performance metrics, and datasets used to solve clustering problems.
4.1 Objective function
Clustering is an unsupervised technique that can be applied for data exploration. It aims to find groups of data, known as clusters, and an objective function is required to find these groups. The objective function is typically a distance-based function that measures the distance between data points and clusters; it thus determines the quality of the clusters, which can be described in terms of cluster compactness. Cluster compactness is defined as the total distance of each cluster's data points to the cluster centroid. Many objective functions are presented in the literature, and clustering cannot be performed without one, so it is necessary to pick an appropriate clustering objective. Table 7 depicts the well-known clustering objectives studied during this survey; Euclidean distance is seen to be the most widely adopted objective function for clustering problems.
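To make the compactness objective concrete, the sketch below totals the distance of each point to the centroid of its assigned cluster under two distance functions of the kind listed in Table 7 (the data and assignment are illustrative):

```python
import math

def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def compactness(data, assignment, centroids, dist=euclidean):
    """Cluster compactness: total distance of every point to the
    centroid of its assigned cluster (lower means tighter clusters)."""
    return sum(dist(p, centroids[assignment[i]]) for i, p in enumerate(data))

data = [(0, 0), (0, 2), (4, 0), (4, 2)]
assignment = [0, 0, 1, 1]          # two clusters of two points each
centroids = [(0, 1), (4, 1)]
print(compactness(data, assignment, centroids))                 # Euclidean
print(compactness(data, assignment, centroids, dist=manhattan)) # Manhattan
```

Here each point sits at distance 1 from its centroid along a single axis, so both objectives total 4; on less axis-aligned data the two distances rank candidate partitions differently, which is why the choice of objective matters.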
4.2 Performance metrics
Performance metrics are used to evaluate the performance of a clustering algorithm. They should be independent and reliable measures that can assess and compare the experimental results of clustering algorithms; based on such comparisons, the validity of a clustering algorithm is established. In general, two kinds of evaluation are used: external evaluation and internal evaluation. External evaluation compares the clustering result against known class information for the dataset, whereas internal evaluation assesses the clustering using the dataset itself. Metrics such as accuracy, f-measure, normalized mutual information, and the Rand index are commonly used for external evaluation, while metrics such as the Davies-Bouldin index, Silhouette index, Dunn index, and entropy are used for internal evaluation. This paper also surveys the different performance metrics reported for assessing clustering algorithms; 42 performance metrics are found in the literature, and Table 8 illustrates them. The most widely adopted metrics are NMI, Rand index, accuracy, entropy, f-measure, and error rate. Figure 5 presents a 3D pie chart offering a visual representation of key aspects of clustering algorithm performance assessment. The chart portrays the interplay of the various metrics, each contributing to the evaluation of clustering algorithms, and shows the distribution and significance of the different performance metrics within the clustering domain. The performance metrics prevalent in the literature shed light on the diversity and breadth of assessment criteria utilized by researchers, among which certain indicators emerge as particularly prominent and widely embraced within the research community.
Noteworthy examples include Normalized Mutual Information (NMI), Rand Index, Accuracy, Entropy, F-measure, and Error Rate. Their prevalence underscores their significance in gauging the effectiveness and efficiency of clustering algorithms across various applications and scenarios.
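As a concrete example of an external evaluation metric, the Rand index can be computed directly from its definition: the fraction of point pairs on which a candidate clustering and the ground-truth labelling agree. The sketch below is an illustrative implementation assuming integer cluster labels; it is not drawn from any specific surveyed paper.

```python
from itertools import combinations

def rand_index(true_labels, pred_labels):
    """Rand index: the fraction of point pairs on which the predicted
    clustering agrees with the ground truth (both labelings put the
    pair together, or both keep it apart)."""
    agree = 0
    pairs = list(combinations(range(len(true_labels)), 2))
    for i, j in pairs:
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        agree += same_true == same_pred
    return agree / len(pairs)

# Identical partitions up to cluster renaming score a perfect 1.0.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
# A poor clustering agrees on only 2 of the 6 pairs.
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Note that the index is invariant to cluster relabelling, which is exactly why external metrics are preferred over raw label matching when ground truth is available.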
4.3 Dataset
The dataset also plays an important role in validating the performance of clustering algorithms. Because clustering is an unsupervised method, no class information is given when a clustering algorithm is run; objects are assigned to clusters based on the objective function alone. External evaluations, however, require class (cluster) information to assess performance. Moreover, some datasets are linearly separable whereas others are non-linearly separable, and the performance of a clustering algorithm may be affected by these properties, as well as by attribute types, the dimensionality of the dataset, data size, and so on. This study highlights the various datasets used to evaluate the performance of clustering algorithms: forty datasets are reported in the literature, listed in Table 9. It is also revealed that the iris, wine, glass, CMC, vowel, cancer, breast cancer, and thyroid datasets are the most widely used.
Figure 6 showcases a 3D pie chart providing a comprehensive overview of the datasets commonly utilized in assessing clustering algorithm performance. Each segment of the pie chart represents a specific dataset, with the size of the segment corresponding to the relative frequency of its usage in clustering algorithm evaluation. Notably, the chart underscores the prevalence of datasets such as iris, wine, glass, CMC, vowel, cancer, breast cancer, and thyroid, which emerge as widely adopted benchmarks for assessing clustering algorithms. This visualization serves as a valuable reference for researchers and practitioners, highlighting key datasets that have become standard benchmarks within the clustering community. By presenting this information in a visually accessible format, Fig. 6 facilitates a deeper understanding of the datasets employed in clustering research and their role in algorithm evaluation.
5 Issues and challenges
This section summarizes the various issues that can be addressed through meta-heuristic algorithms. It is observed that a large number of meta-heuristic algorithms have been considered for solving clustering problems effectively.
5.1 Issues in partitional clustering
In partitional clustering, various meta-heuristic algorithms are applied to solve clustering problems effectively. The main reasons for adopting meta-heuristic algorithms for partitional clustering are listed below:
- (i) To determine near-optimal solutions for partitional clustering problems.
- (ii) To evaluate optimal centroids for effective clustering.
- (iii) To determine similar patterns in categorical data.
- (iv) To handle heterogeneous data.
- (v) To determine subspace clusters in the dataset.
- (vi) To handle multimodal and heterogeneous data for effective clustering.
- (vii) To perform clustering of high-dimensional data.
- (viii) To handle educational data mining.
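Issue (i), the search for near-optimal solutions, stems from the sensitivity of the basic partitional algorithm to its starting point: plain k-means converges to whichever local optimum its random initial centroids lead to. The sketch below is illustrative Python, not any surveyed algorithm; it keeps the best of several random restarts, a crude stand-in for the population-based global search that a meta-heuristic performs.

```python
import random

def kmeans(data, k, rng, iters=20):
    """One run of Lloyd's k-means from random initial centroids;
    returns the sum of squared errors (SSE) of the final partition."""
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in data:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Move each centroid to the mean of its cluster (skip empty clusters).
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = [sum(v) / len(cl) for v in zip(*cl)]
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
               for p in data)

# Three well-separated pairs of points; the optimal 3-cluster SSE is 1.5.
data = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
rng = random.Random(0)
# A single run may stall in a poor local optimum; keeping the best of
# several restarts usually approaches the near-optimal partition.
best_sse = min(kmeans(data, 3, rng) for _ in range(10))
print(best_sse)
```

Meta-heuristics replace the blind restarts with guided exploration of the centroid space, which is why they scale to harder instances than this toy example.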
5.2 Issues in dynamic and automatic clustering
From the extensive literature survey, it is inferred that some meta-heuristic algorithms are also adopted in the field of dynamic and automatic clustering. The main reasons for applying meta-heuristic algorithms are listed below:
- (i) To enhance the convergence rate of algorithms.
- (ii) To avoid stagnation and premature convergence.
- (iii) To develop an optimization strategy for dynamic clustering.
- (iv) To handle dynamic streams automatically.
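Issue (iv), handling dynamic streams, is typically approached by updating cluster summaries incrementally instead of reclustering from scratch. The sketch below is a deliberately simple illustration, not a surveyed algorithm: each arriving point either updates the running mean of its nearest centroid or, if no centroid lies within a distance threshold `radius` (a parameter assumed here), opens a new cluster, so the number of clusters adapts to the stream.

```python
import math

class StreamingCentroids:
    """Incrementally maintain cluster centroids over a data stream."""

    def __init__(self, radius):
        self.radius = radius
        self.centroids = []   # current centre of each cluster
        self.counts = []      # points absorbed by each cluster

    def add(self, point):
        if self.centroids:
            i = min(range(len(self.centroids)),
                    key=lambda c: math.dist(point, self.centroids[c]))
            if math.dist(point, self.centroids[i]) <= self.radius:
                # Running-mean update: centre shifts toward the new point.
                n = self.counts[i] + 1
                self.centroids[i] = tuple(c + (p - c) / n
                                          for c, p in zip(self.centroids[i], point))
                self.counts[i] = n
                return i
        # No centroid is close enough: open a new cluster for this point.
        self.centroids.append(tuple(point))
        self.counts.append(1)
        return len(self.centroids) - 1

stream = StreamingCentroids(radius=2.0)
for p in [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (0.5, 0.5), (11.0, 10.0)]:
    stream.add(p)
print(len(stream.centroids))  # 2 clusters discovered automatically
```

Meta-heuristic approaches layer a search over parameters like `radius` (or over merge/split decisions) on top of this kind of incremental core.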
5.3 Issues in fuzzy clustering
In the field of fuzzy clustering, some meta-heuristic algorithms are also reported. These algorithms aim to improve the quality of solutions, especially for fuzzy clustering problems. The issues handled by these algorithms are listed.
- (i) To generate optimum cluster centres using the fuzzy membership function.
- (ii) To handle high-dimensional datasets.
- (iii) To determine relevant features in high-dimensional data.
- (iv) To develop accurate prediction models.
- (v) To improve the quality of solutions.
- (vi) To handle data streams effectively.
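Issue (i) rests on the fuzzy membership function of fuzzy c-means (FCM), which assigns each point a degree of belonging to every cluster rather than a hard label. For a point with distances d_j to the cluster centres and fuzzifier m > 1, the standard membership is u_j = 1 / sum_k (d_j/d_k)^(2/(m-1)). A minimal 1-D sketch of that formula:

```python
def fuzzy_memberships(point, centres, m=2.0):
    """FCM membership degrees of one point in each cluster centre.

    Memberships sum to 1, and a point sitting exactly on a centre
    belongs to that centre with degree 1 (crisp special case).
    """
    dists = [abs(point - c) for c in centres]
    if 0.0 in dists:                      # point coincides with a centre
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]

# A point midway between two centres belongs equally to both.
print(fuzzy_memberships(5.0, [0.0, 10.0]))  # [0.5, 0.5]
# A point near the first centre leans strongly toward it.
print(fuzzy_memberships(2.0, [0.0, 10.0]))
```

Meta-heuristics for fuzzy clustering search over the centre positions while this membership rule scores each candidate solution.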
5.4 Issues in improved meta-heuristic algorithms for clustering
This subsection discusses issues related to the performance of meta-heuristic algorithms and the need to improve them for efficiently solving clustering problems. Various shortcomings associated with meta-heuristic algorithms have been successfully addressed through improved versions. The main reasons for improving meta-heuristic algorithms are listed below:
- (i) To overcome the slow convergence rate of meta-heuristic algorithms.
- (ii) To avoid the premature convergence problem.
- (iii) To reduce noise effects and improve the quality of solutions.
- (iv) To handle clustering in a hierarchical manner.
- (v) To reduce computational cost.
- (vi) To achieve an effective trade-off between local search and global search.
- (vii) To tackle overlapping and incremental clustering.
- (viii) To handle constraints effectively.
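Items (i), (ii), and (vi) motivate mechanisms such as simulated-annealing-style acceptance: occasionally accepting a worse solution lets the search escape local optima, and the acceptance probability cools over time, shifting from global exploration to local exploitation. The sketch below applies this to a toy one-dimensional objective standing in for a clustering cost landscape; it is illustrative only, not a surveyed method.

```python
import math, random

def simulated_annealing(f, x0, rng, steps=2000, t0=2.0):
    """Minimize f over the reals. Worse moves are accepted with
    probability exp(-delta/t), and the temperature t cools linearly:
    broad exploration early, greedy exploitation late."""
    x, best = x0, x0
    for i in range(steps):
        t = t0 * (1.0 - i / steps) + 1e-9
        cand = x + rng.gauss(0.0, 0.5)          # random neighbour
        delta = f(cand) - f(x)
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x = cand                            # move, possibly uphill
            if f(x) < f(best):
                best = x                        # remember the best seen
    return best

# A double-well objective standing in for a clustering cost landscape:
# a shallow minimum near x = 1.37 and a deeper one near x = -1.46.
f = lambda x: x ** 4 - 4 * x ** 2 + x
best = simulated_annealing(f, 1.3, random.Random(1))
print(round(f(best), 2))
```

Starting inside the shallow basin, a pure hill-climber could never cross the barrier between the wells; the temperature-controlled uphill moves are what give the improved algorithms their chance at the deeper optimum.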
5.5 Issues in hybrid meta-heuristic algorithms for clustering
The issues that can be addressed through hybrid meta-heuristic algorithms are listed.
- (i) To overcome shortcomings of traditional clustering algorithms, such as local optima, and improve the quality of results.
- (ii) To remove infeasible solutions generated during execution.
- (iii) To handle local optima and convergence issues of meta-heuristic algorithms.
- (iv) To improve the search mechanisms of algorithms.
- (v) To effectively handle exploration and exploitation processes.
- (vi) To address the initialization issues of clustering algorithms.
- (vii) To explore more promising solutions for clustering problems.
- (viii) To explore the solution search space effectively and efficiently.
- (ix) To generate neighbourhood solutions.
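A common hybrid pattern behind items (iii), (v), and (vi) is to let a global, meta-heuristic-style search propose candidate centroid sets while k-means performs fast local refinement of each candidate. The sketch below uses plain random sampling as a stand-in for the global component; it illustrates the pattern only and is not any specific surveyed hybrid.

```python
import random

def sse(data, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
               for p in data)

def lloyd_refine(data, centroids, iters=10):
    """Local component: standard k-means iterations from the given start."""
    k = len(centroids)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in data:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        centroids = [tuple(sum(v) / len(cl) for v in zip(*cl)) if cl
                     else tuple(centroids[i]) for i, cl in enumerate(clusters)]
    return centroids

def hybrid_cluster(data, k, rng, candidates=15):
    """Global component: propose random centroid sets, refine each with
    k-means, and keep the refined solution with the lowest SSE."""
    best = None
    for _ in range(candidates):
        refined = lloyd_refine(data, rng.sample(data, k))
        if best is None or sse(data, refined) < sse(data, best):
            best = refined
    return best

data = [(0.0,), (0.2,), (4.0,), (4.2,), (8.0,), (8.2,)]
best = hybrid_cluster(data, 3, random.Random(2))
print(sorted(round(c[0], 1) for c in best))
```

With pairs this well separated, the best refined solution typically recovers centroids near 0.1, 4.1, and 8.1; published hybrids replace the random proposal step with a guided meta-heuristic population.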
6 Conclusion
In this survey, a large number of meta-heuristic algorithms are analysed with respect to clustering applications. It is inferred that clustering problems can be classified into partitional, dynamic, and fuzzy clustering. A diversity of algorithms is reported in the literature to solve clustering problems effectively and efficiently; some address issues related to performance, population diversity, local optima, search strategies, neighbourhood solutions, the number of clusters, and optimized cluster centres. This paper surveys high-repute publications over the period 2015–2024. These articles are categorized into partitional, dynamic & automatic, and fuzzy clustering, and further classified into meta-heuristic, improved meta-heuristic, and hybrid meta-heuristic algorithms. Before the literature survey, several research questions were designed for an effective and efficient survey. The major contributions of this literature survey to the scientific community are as follows.
RQ 1
What are the various meta-heuristic techniques available for clustering problems?
Answer: A large number of meta-heuristic algorithms employed to solve clustering problems are analysed. Several new algorithms have been developed for these problems (CSS, MCSS, the bird flock algorithm, the electromagnetic force based algorithm, the magnetic optimization algorithm, the gravity algorithm, and the Big Bang-Big Crunch algorithm). It is observed that these algorithms provide significant results in contrast to PSO, SA, TS, ACO, GA, K-means, etc. It is also observed that only a small number of algorithms are based on traditional mathematical models; most recently developed algorithms are inspired by natural phenomena (such as the Big Bang-Big Crunch), well-established laws (such as the law of gravity), or swarm behaviour (cuckoo optimization, inspired by the cuckoo's behaviour). Tables 2, 3, 4, 5 and 6 summarize these algorithms.
RQ 2
How to handle automatic data clustering?
Answer: Dynamic & automatic clustering problems are an active area of research due to online, web, and social mining. In these problems, the number of clusters is undefined, and clusters are formed according to the nature of the data. Several single-objective clustering algorithms, again based on natural phenomena (swarm behaviour), have been proposed to address the dynamic clustering problem, whereas only a few multi-objective algorithms have been developed. Hence, it can be concluded that this direction will attract considerable attention in the near future.
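The core of automatic clustering, choosing the number of clusters from the data itself, can be illustrated by scoring partitions for several candidate values of k with an internal validity index and keeping the best. The sketch below uses a deterministic 1-D k-means (quantile initialization, an assumption made here so the example is reproducible) together with the Davies-Bouldin index, where lower values indicate more compact, better-separated clusters; it is an illustration, not a surveyed method.

```python
def kmeans_1d(data, k, iters=25):
    """Deterministic 1-D k-means: centroids start at evenly spaced
    quantiles of the sorted data, then standard Lloyd iterations run."""
    pts = sorted(data)
    centroids = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in pts:
            clusters[min(range(k), key=lambda c: abs(x - centroids[c]))].append(x)
        centroids = [sum(cl) / len(cl) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

def davies_bouldin(centroids, clusters):
    """Davies-Bouldin index; s_i is the mean distance of cluster i's
    points to its centre, penalized against the closest other cluster."""
    s = [sum(abs(x - c) for x in cl) / len(cl) if cl else 0.0
         for c, cl in zip(centroids, clusters)]
    k = len(centroids)
    return sum(max((s[i] + s[j]) / abs(centroids[i] - centroids[j])
                   for j in range(k) if j != i)
               for i in range(k)) / k

# Three obvious groups; scan k = 2..5 and keep the best-scoring value.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
best_k = min(range(2, 6), key=lambda k: davies_bouldin(*kmeans_1d(data, k)))
print(best_k)  # 3
```

Automatic clustering algorithms fold this model-selection step into the search itself, evolving both the partition and k simultaneously instead of scanning k exhaustively.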
RQ 3
How to handle high dimensional data (problems) with clustering?
Answer: At present, a large volume of data is generated, and this volume is increasing exponentially. This data contains meaningful patterns, but it is not an easy task to explore and analyse them. To handle large data problems and extract meaning, several meta-heuristic clustering algorithms have been proposed; a few are integrated with Hadoop (a parallel architecture) to retrieve and process data much faster than traditional approaches, and some ensemble clustering methods can handle high-dimensional data. However, there is a lack of multi-objective clustering methods to handle the aforementioned issues.
RQ 4
What are the main reasons for hybridizing the clustering algorithms?
Answer: Many improved and hybridized versions of algorithms have been proposed. An algorithm is improved or hybridized either due to shortcomings associated with the algorithm itself or to avoid shortcomings related to the problem being solved. The literature survey reveals several such shortcomings of algorithms and clustering problems: local optima, convergence rate, population diversity, boundary constraints, neighbourhood solution structure, the trade-off between local and global search, the solution search mechanism and search equations, and dependence on random functions. It is also observed that hybridization is an active area of research and that hybridizing an algorithm can improve its performance. Hence, to overcome the aforementioned problems, an algorithm can either be improved or hybridized to obtain significant and optimized results. To date, there is no generic algorithm for solving all types of clustering problems and data (categorical, nominal, numeric, text, and binary).
RQ 5
What objective functions, performance measures, and datasets are adopted to evaluate the performance of clustering algorithms?
Answer: A large number of performance measures are employed to evaluate clustering algorithms; Table 8 lists those reported in the literature. It is observed that NMI, the Rand index, accuracy, intra- and inter-cluster distance, and F-measure are the most widely adopted. Table 7 summarizes the objective functions used to measure closeness between data objects: ten objective functions are reported, of which Euclidean distance is the most widely adopted. The datasets used for performance evaluation are summarized in Table 9; Iris, Wine, Glass, Haberman, CMC, Vowel, and Breast cancer emerge as the most significant benchmark datasets. Highlights of the survey are listed below.
- 130 SCI and/or Scopus (free) articles published between 2015 and 2024 are included from 70 journals.
- Euclidean distance is the most significant distance for determining closeness between data objects.
- Partitional clustering is the most widely studied clustering problem.
- Improved and enhanced meta-heuristic algorithms are hybridized for effective and efficient clustering of data.
- Hybrid meta-heuristic algorithms are the most significant approach to handling various clustering problems.
- Fuzzy and automatic data clustering are new and active areas of research.
- Little work is reported on multi-objective data clustering, which leaves scope for research in this direction.
In this survey, we have undertaken a comprehensive analysis of various meta-heuristic algorithms in the context of clustering applications. Our investigation has shed light on the diverse landscape of clustering problems, which can be classified into partitional, dynamic, and fuzzy clustering categories. Through an extensive review of the literature published between 2015 and 2024, we have identified a multitude of algorithms that address key challenges associated with clustering, including performance, population diversity, local optima, and search strategies. Our survey has revealed the emergence of several novel meta-heuristic techniques for solving clustering problems, such as CSS, MCSS, the bird flock algorithm, the electromagnetic force-based algorithm, the magnetic optimization algorithm, the gravity algorithm, and the Big Bang-Big Crunch algorithm. These algorithms have demonstrated promising results compared to traditional methods like PSO, SA, TS, ACO, GA, and K-means, showcasing the effectiveness of leveraging natural phenomena and established laws as inspiration for algorithm design.
Additionally, we have explored the ongoing research efforts in dynamic and automatic clustering, which are driven by the growing demand for real-time data analysis in domains like online, web, and social mining. While single-objective clustering algorithms have made significant strides in addressing dynamic clustering challenges, there remains a need for the development of multi-objective algorithms to handle the complexity of evolving datasets more effectively. Furthermore, our survey has highlighted the importance of addressing the challenges posed by high-dimensional data in clustering. With the exponential growth of data volumes, there is a pressing need for meta-heuristic clustering algorithms capable of handling large-scale datasets efficiently. Integration with parallel architectures like Hadoop and the exploration of ensemble clustering methods represent promising avenues for addressing these challenges in the future. While our survey has provided valuable insights into the state-of-the-art in clustering, it is essential to acknowledge certain limitations inherent in our study. From a theoretical standpoint, the complexity of clustering problems and the diversity of datasets make it challenging to devise a one-size-fits-all solution. Moreover, practical limitations, such as computational resources and algorithm scalability, may impact the applicability of certain clustering techniques in real-world scenarios.
Moving forward, future research in clustering should focus on addressing these limitations and exploring new avenues for improvement. One promising direction is the development of hybrid meta-heuristic algorithms that combine the strengths of different optimization techniques to overcome the shortcomings of individual approaches. Additionally, there is a need for more extensive benchmarking of clustering algorithms using diverse datasets and performance metrics to ensure robustness and generalizability of results. In conclusion, our survey has provided valuable insights into the state-of-the-art meta-heuristic clustering algorithms and identified key areas for future research. By addressing the challenges posed by clustering in the era of big data, we can unlock new opportunities for knowledge discovery and decision-making in various domains.
Data availability
This is a survey paper and data is not associated with the manuscript.
References
Abasi AK, Khader AT, Al-Betar MA, Naim S, Alyasseri ZAA, Makhadmeh SN (2020) A novel hybrid multi-verse optimizer with K-means for text documents clustering. Neural Comput Appl 32:17703–17729
Abbasi S, Choukolaei HA (2023) A systematic review of green supply chain network design literature focusing on carbon policy. Decis Anal J 6:100189
Abualigah LM, Khader AT, Hanandeh ES (2018a) Hybrid clustering analysis using improved krill herd algorithm. Appl Intell 48(11):4047–4071
Abualigah LM, Khader AT, Hanandeh ES (2018b) A hybrid strategy for krill herd algorithm with harmony search algorithm to improve the data clustering. Intell Decis Technol 12(1):3–14
Abualigah L, Elaziz MA, Yousri D, Al-qaness MA, Ewees AA, Zitar RA (2023) Augmented arithmetic optimization algorithm using opposite-based learning and lévy flight distribution for global optimization and data clustering. J Intell Manuf 34(8):3523–3561
Ahmadi R, Ekbatanifard G, Bayat P (2021) A modified grey wolf optimizer based data clustering algorithm. Appl Artif Intell 35(1):63–79
Alam S, Dobbie G, Rehman SU (2015) Analysis of particle swarm optimization-based hierarchical data clustering approaches. Swarm Evol Comput 25:36–51
Aljarah I, Mafarja M, Heidari AA, Faris H, Mirjalili S (2020) Clustering analysis using a novel locality-informed grey wolf-inspired clustering approach. Knowl Inform Syst 62:507–539
Allab K, Labiod L, Nadif M (2017) A semi-NMF-PCA unified framework for data clustering. IEEE Trans Knowl Data Eng 29(1):2–16
Alotaibi Y (2022) A new meta-heuristics data clustering algorithm based on tabu search and adaptive search memory. Symmetry 14(3):623
Alswaitti M, Ishak MK, Isa NAM (2018) Optimized gravitational-based data clustering algorithm. Eng Appl Artif Intell 73:126–148
Amiri E, Mahmoudi S (2016) Efficient protocol for data clustering by fuzzy cuckoo optimization algorithm. Appl Soft Comput 41:15–21
Asadi-Zonouz M, Amin-Naseri MR, Ardjmand E (2022) A modified unconscious search algorithm for data clustering. Evol Intel 15(3):1667–1693
Bahrololoum A, Nezamabadi-pour H, Saryazdi S (2015) A data clustering approach based on the universal gravity rule. Eng Appl Artif Intell 45:415–428
Banharnsakun A (2017) A MapReduce-based artificial bee colony for large-scale data clustering. Pattern Recogn Lett 93:78–84
Barshandeh S, Dana R, Eskandarian P (2022) A learning automata-based hybrid MPA and JS algorithm for numerical optimization problems and its application on data clustering. Knowl-Based Syst 236:107682
Baykasoğlu A, Gölcük İ, Özsoydan FB (2018) Improving fuzzy c-means clustering via quantum-enhanced weighted superposition attraction algorithm. Hacettepe J Math Stat 48(3):859–882
Bijari K, Zare H, Veisi H, Bobarshad H (2018) Memory-enriched big bang–big crunch optimization algorithm for data clustering. Neural Comput Appl 29:111–121
Boushaki SI, Kamel N, Bendjeghaba O (2018) A new quantum chaotic cuckoo search algorithm for data clustering. Expert Syst Appl 96:358–372
Bouyer A, Hatamlou A (2018) An efficient hybrid clustering method based on improved cuckoo optimization and modified particle swarm optimization algorithms. Appl Soft Comput 67:172–182
Chang X, Wang Q, Liu Y, Wang Y (2016) Sparse regularization in fuzzy c-means for high-dimensional data clustering. IEEE Trans Cybern 47(9):2616–2627
Cho PPW, Nyunt TTS (2020) Data clustering based on modified differential evolution and quasi-opposition-based learning. Intell Eng Syst 13(6):168–178
Cruz DPF, Maia RD, da Silva LA, de Castro LN (2016) BeeRBF: a bee-inspired data clustering approach to design RBF neural network classifiers. Neurocomputing 172:427–437
Das P, Das DK, Dey S (2018a) A new class topper optimization algorithm with an application to data clustering. IEEE Trans Emerg Top Comput 8(4):948–959
Das P, Das DK, Dey S (2018b) A modified bee colony optimization (MBCO) and its hybridization with k-means for an application to data clustering. Appl Soft Comput 70:590–603
Deb S, Tian Z, Fong S, Wong R, Millham R, Wong KK (2018) Elephant search algorithm applied to data clustering. Soft Comput 22(18):6035–6046
Demirci H, Yurtay N, Yurtay Y, Zaimoğlu EA (2023) Electrical search algorithm: a new metaheuristic algorithm for clustering problem. Arab J Sci Eng 48(8):10153–10172
dos Santos TR, Zárate LE (2015) Categorical data clustering: What similarity measure to recommend? Expert Syst Appl 42(3):1247–1260
Elyasigomari V, Mirjafari MS, Screen HR, Shaheed MH (2015) Cancer classification using a novel gene selection approach by means of shuffling based on data clustering with optimization. Appl Soft Comput 35:43–51
Emrouznejad A, Abbasi S, Sıcakyüz Ç (2023) Supply chain risk management: a content analysisbased review of existing and emerging topics. Supply Chain Anal 3:100031
Ferrari DG, De Castro LN (2015) Clustering algorithm selection by meta-learning systems: a new distance-based problem characterization and ranking combination methods. Inform Sci 301:181–194
Gebru ID, Alameda-Pineda X, Forbes F, Horaud R (2016) EM algorithms for weighted-data clustering with application to audio-visual scene analysis. IEEE Trans Pattern Anal Mach Intell 38(12):2402–2415
Ghorbanzadeh L, Torshabi AE, Nabipour JS, Arbatan MA (2016) Development of a synthetic adaptive neuro-fuzzy prediction model for tumor motion tracking in external radiotherapy by evaluating various data clustering algorithms. Technol Cancer Res Treat 15(2):334–347
Gupta Y, Saini A (2019) A new swarm-based efficient data clustering approach using KHM and fuzzy logic. Soft Comput 23(1):145–162
Gupta C, Jain A, Tayal DK, Castillo O (2018) ClusFuDE: forecasting low dimensional numerical data using an improved method based on automatic clustering, fuzzy relationships and differential evolution. Eng Appl Artif Intell 71:175–189
Gutierrez-Rodríguez AE, Martínez-Trinidad JF, García-Borroto M, Carrasco-Ochoa JA (2015) Mining patterns for clustering on numerical datasets using unsupervised decision trees. Knowl-Based Syst 82:70–79
Haeri Boroujeni SP, Pashaei E (2023) A hybrid chimp optimization algorithm and generalized normal distribution algorithm with opposition-based learning strategy for solving data clustering problems. Iran J Comput Sci 65:1–37
Hahsler M, Bolaños M (2016) Clustering data streams based on shared density between micro-clusters. IEEE Trans Knowl Data Eng 28(6):1449–1461
Han X, Quan L, Xiong X, Almeter M, Xiang J, Lan Y (2017) A novel data clustering algorithm based on modified gravitational search algorithm. Eng Appl Artif Intell 61:1–7
Harita M, Wong A, Suppi R, Rexachs D, Luque E (2024) A metaheuristic search algorithm based on sampling and clustering. IEEE Access 12:15493
Hashemi SE, Gholian-Jouybari F, Hajiaghaei-Keshteli M (2023) A fuzzy C-means algorithm for optimizing data clustering. Expert Syst Appl 227:120377
Hu H, Liu J, Zhang X, Fang M (2023) An effective and adaptable K-means algorithm for big data cluster analysis. Pattern Recogn 139:109404
Jadhav AN, Gomathi N (2018) WGC: hybridization of exponential grey wolf optimizer with whale optimization for data clustering. Alex Eng J 57(3):1569–1584
Jing L, Tian K, Huang JZ (2015) Stratified feature sampling method for ensemble clustering of high dimensional data. Pattern Recogn 48(11):3688–3702
Kannan R, Vempala S, Vetta A (2004) On clusterings: good, bad and spectral. J ACM (JACM) 51(3):497–515
Kaur A, Datta A (2015) A novel algorithm for fast and scalable subspace clustering of high-dimensional data. J Big Data 2(1):17
Kaur A, Kumar Y (2022) A new metaheuristic algorithm based on water wave optimization for data clustering. Evol Intel 15(1):759–783
Kaur A, Pal SK, Singh AP (2020) Hybridization of chaos and flower pollination algorithm over k-means for data clustering. Appl Soft Comput 97:105523
Kumar Y, Kaur A (2022) Variants of bat algorithm for solving partitional clustering problems. Eng Comput 38(Suppl 3):1973–1999
Kumar Y, Sahoo G (2014) A charged system search approach for data clustering. Progress Artif Intell 2(2–3):153–166
Kumar Y, Sahoo G (2015a) A hybrid data clustering approach based on improved cat swarm optimization and K-harmonic mean algorithm. AI Commun 28(4):751–764
Kumar Y, Sahoo G (2015b) Hybridization of magnetic charge system search and particle swarm optimization for efficient data clustering using neighborhood search strategy. Soft Comput 19(12):3621–3645
Kumar Y, Sahoo G (2016) A hybridise approach for data clustering based on cat swarm optimisation. Int J Inform Commun Technol 9(1):117–141
Kumar Y, Singh PK (2018) Improved cat swarm optimization algorithm for solving global optimization problems and its application to clustering. Appl Intell 48:2681–2697
Kumar D, Bezdek JC, Palaniswami M, Rajasegarar S, Leckie C, Havens TC (2015) A hybrid approach to clustering in big data. IEEE Trans Cybern 46(10):2372–2385
Kumar V, Chhabra JK, Kumar D (2016a) Automatic data clustering using parameter adaptive harmony search algorithm and its application to image segmentation. J Intell Syst 25(4):595–610
Kumar V, Chhabra JK, Kumar D (2016b) Data clustering using differential search algorithm. Pertanika J Sci Technol 24(2):295
Kuo RJ, Lin TC, Zulvia FE, Tsai CY (2018a) A hybrid metaheuristic and kernel intuitionistic fuzzy c-means algorithm for cluster analysis. Appl Soft Comput 67:299–308
Kuo RJ, Rizki M, Zulvia FE, Khasanah AU (2018b) Integration of growing self-organizing map and bee colony optimization algorithm for part clustering. Comput Ind Eng 120:251–265
Kuo RJ, Lin JY, Nguyen TPQ (2021) An application of sine cosine algorithm-based fuzzy possibilistic c-ordered means algorithm to cluster analysis. Soft Comput 25(5):3469–3484
Kushwaha N, Pant M (2018) Fuzzy magnetic optimization clustering algorithm with its application to health care. J Ambient Intell Human Comput 1:1–10
Kushwaha N, Pant M, Kant S, Jain VK (2018) Magnetic optimization algorithm for data clustering. Pattern Recogn Lett 115:59–65
Kuwil FH, Shaar F, Topcu AE, Murtagh F (2019) A new data clustering algorithm based on critical distance methodology. Expert Syst Appl 129:296–310
Lakshmi K, Visalakshi NK, Shanthi S (2018) Data clustering using K-means based on crow search algorithm. Sādhanā 43(11):190
Lee J, Perkins D (2021) A simulated annealing algorithm with a dual perturbation method for clustering. Pattern Recogn 112:107713
Leski JM (2016) Fuzzy c-ordered medoids clustering for interval-valued data. Pattern Recogn 58:49–67
Li Y, Yang G, He H, Jiao L, Shang R (2016) A study of large-scale data clustering based on fuzzy clustering. Soft Comput 20(8):3231–3242
Li T, De la Prieta Pintado F, Corchado JM, Bajo J (2017) Multi-source homogeneous data clustering for multi-target detection from cluttered background with misdetection. Appl Soft Comput 60:436–446
Liu Q, Zhang R, Hu R, Wang G, Wang Z, Zhao Z (2019) An improved path-based clustering algorithm. Knowl-Based Syst 163:69–81
Mageshkumar C, Karthik S, Arunachalam VP (2019) Hybrid metaheuristic algorithm for improving the efficiency of data clustering. Clust Comput 22(1):435–442
Mansueto P, Schoen F (2021) Memetic differential evolution methods for clustering problems. Pattern Recogn 114:107849
Meng L, Tan AH, Wunsch DC (2016) Adaptive scaling of cluster boundaries for large-scale social media data clustering. IEEE Trans Neural Netw Learn Syst 27(12):2656–2669
Mikaeil R, Haghshenas SS, Haghshenas SS, Ataei M (2018) Performance prediction of circular saw machine using imperialist competitive algorithm and fuzzy clustering technique. Neural Comput Appl 29(6):283–292
Moghadam P, Ahmadi A (2023) A novel two-stage bio-inspired method using red deer algorithm for data clustering. Evol Intell 17:1–18
Montgomery D, Addison PS, Borg U (2016) Data clustering methods for the determination of cerebral auto regulation functionality. J Clin Monit Comput 30(5):661–668
Narayana GS, Vasumathi D (2018) An attributes similarity-based K-medoids clustering technique in data mining. Arab J Sci Eng 43(8):3979–3992
Nayak J, Naik B, Kanungo DP, Behera HS (2018) A hybrid elicit teaching learning based optimization with fuzzy c-means (ETLBO-FCM) algorithm for data clustering. Ain Shams Eng J 9(3):379–393
Nazari A, Dehghan A, Nejatian S, Rezaie V, Parvin H (2019) A comprehensive study of clustering ensemble weighting based on cluster quality and diversity. Pattern Anal Appl 22(1):133–145
Nguyen DD, Ngo LT, Pham LT, Pedrycz W (2015) Towards hybrid clustering approach to data classification: multiple kernels based interval-valued fuzzy C-means algorithms. Fuzzy Sets Syst 279:17–39
Noorbehbahani F, Mousavi SR, Mirzaei A (2015) An incremental mixed data clustering method using a new distance measure. Soft Comput 19(3):731–743
Özbakır L, Turna F (2017) Clustering performance comparison of new generation meta-heuristic algorithms. Knowl-Based Syst 130:1–16
Ozturk C, Hancer E, Karaboga D (2015) Dynamic clustering with improved binary artificial bee colony algorithm. Appl Soft Comput 28:69–80
Pacifico LD, Ludermir TB (2021) An evaluation of k-means as a local search operator in hybrid memetic group search optimization for data clustering. Nat Comput 20(3):611–636
Pakrashi A, Chaudhuri BB (2016) A Kalman filtering induced heuristic optimization based partitional data clustering. Inform Sci 369:704–717
Patel VP, Rawat MK, Patel AS (2023) Local neighbour spider monkey optimization algorithm for data clustering. Evol Intel 16(1):133–151
Pimentel BA, de Carvalho AC (2019) A new data characterization for selecting clustering algorithms using meta-learning. Inform Sci 477:203–219
Pohl D, Bouchachia A, Hellwagner H (2016) Online indexing and clustering of social media data for emergency management. Neurocomputing 172:168–179
Premkumar M, Sinha G, Ramasamy MD, Sahu S, Subramanyam CB, Sowmya R, Derebew B (2024) Augmented weighted K-means grey wolf optimizer: an enhanced metaheuristic algorithm for data clustering problems. Sci Rep 14(1):5434
Puschmann D, Barnaghi P, Tafazolli R (2017) Adaptive clustering for dynamic IoT data streams. IEEE Internet Things J 4(1):64–74
Qiao S, Zhou Y, Zhou Y, Wang R (2019) A simple water cycle algorithm with percolation operator for clustering analysis. Soft Comput 23(12):4081–4095
Qtaish A, Braik M, Albashish D, Alshammari MT, Alreshidi A, Alreshidi EJ (2024) Optimization of K-means clustering method using hybrid capuchin search algorithm. J Supercomput 80(2):1728–1787
Queiroga E, Subramanian A, Lucídio dos Anjos FC (2018) Continuous greedy randomized adaptive search procedure for data clustering. Appl Soft Comput 72:43–55
Rahnema N, Gharehchopogh FS (2020) An improved artificial bee colony algorithm based on whale optimization algorithm for data clustering. Multim Tools Appl 79(43):32169–32194
Rathore P, Kumar D, Bezdek JC, Rajasegarar S, Palaniswami M (2018) A rapid hybrid clustering algorithm for large volumes of high dimensional data. IEEE Trans Knowl Data Eng 31(4):641–654
Safarinejadian B, Hasanpour K (2016) Distributed data clustering using mobile agents and EM algorithm. IEEE Syst J 10(1):281–289
Salem SB, Naouali S, Chtourou Z (2018) A fast and effective partitional clustering algorithm for large categorical datasets using a k-means based approach. Comput Electr Eng 68:463–483
Salih SQ, Alsewari AA, Wahab HA, Mohammed MK, Rashid TA, Das D, Basurra SS (2023) Multi-population black hole algorithm for the problem of data clustering. PLoS ONE 18(7):e0288044
Santi É, Aloise D, Blanchard SJ (2016) A model for clustering data from heterogeneous dissimilarities. Eur J Oper Res 253(3):659–672
Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27–64
Senthilnath J, Kulkarni S, Suresh S, Yang XS, Benediktsson JA (2019) FPA clust: evaluation of the flower pollination algorithm for data clustering. Evol Intell 14:1–11
Serapião AB, Corrêa GS, Gonçalves FB, Carvalho VO (2016) Combining K-means and K-harmonic with fish school search algorithm for data clustering task on graphics processing units. Appl Soft Comput 41:290–304
Sharma M, Chhabra JK (2019) An efficient hybrid PSO polygamous crossover based clustering algorithm. Evol Intell 14:1–19
Sheng W, Chen S, Fairhurst M, Xiao G, Mao J (2014) Multilocal search and adaptive niching based memetic algorithm with a consensus criterion for data clustering. IEEE Trans Evol Comput 18(5):721–741
Sheng W, Chen S, Sheng M, Xiao G, Mao J, Zheng Y (2016) Adaptive multisubpopulation competition and multiniche crowding-based memetic algorithm for automatic data clustering. IEEE Trans Evol Comput 20(6):838–858
Shial G, Sahoo S, Panigrahi S (2023) An enhanced GWO algorithm with improved explorative search capability for global optimization and data clustering. Appl Artif Intell 37(1):2166232
Siddiqi UF, Sait SM (2017) A new heuristic for the data clustering problem. IEEE Access 5:6801
Singh T (2020) A chaotic sequence-guided Harris hawks optimizer for data clustering. Neural Comput Appl 32:17789–17803
Singh S, Srivastava S (2022) Kernel fuzzy C-means clustering with teaching learning based optimization algorithm (TLBO-KFCM). J Intell Fuzzy Syst 42(2):1051–1059
Singh H, Rai V, Kumar N, Dadheech P, Kotecha K, Selvachandran G, Abraham A (2023) An enhanced whale optimization algorithm for clustering. Multim Tools Appl 82(3):4599–4618
Su ZG, Denoeux T (2018) BPEC: belief-peaks evidential clustering. IEEE Trans Fuzzy Syst 27(1):111–123
Tan PN, Steinbach M, Kumar V (2016) Introduction to data mining. Pearson Education India
Tang D, Dong S, He L, Jiang Y (2016) Intrusive tumor growth inspired optimization algorithm for data clustering. Neural Comput Appl 27(2):349–374
Tekieh R, Beheshti Z (2024) A MapReduce-based big data clustering using swarm-inspired meta-heuristic algorithms. Sci Iranica 31:737
Tinós R, Zhao L, Chicano F, Whitley D (2018) NK hybrid genetic algorithm for clustering. IEEE Trans Evol Comput 22(5):748–761
Tsai CW, Chang WY, Wang YC, Chen H (2019) A high-performance parallel coral reef optimization for data clustering. Soft Comput 23:9327–9340
Turkoglu B, Uymaz SA, Kaya E (2022) Clustering analysis through artificial algae algorithm. Int J Mach Learn Cybern 13(4):1179–1196
Vo TNC, Nguyen HP, Vo TNT (2016) Making kernel-based vector quantization robust and effective for incomplete educational data clustering. Vietnam J Comput Sci 3(2):93–102
Xiang WL, Zhu N, Ma SF, Meng XL, An MQ (2015) A dynamic shuffled differential evolution algorithm for data clustering. Neurocomputing 158:144–154
Xie H, Zhang L, Lim CP, Yu Y, Liu C, Liu H, Walters J (2019) Improving K-means clustering with enhanced firefly algorithms. Appl Soft Comput 84:105763
Xu S, Liu S, Zhou J, Feng L (2019) Fuzzy rough clustering for categorical data. Int J Mach Learn Cybern 10(11):3213–3322
Yan Y, Nguyen T, Bryant B, Harris FC Jr (2019) Robust fuzzy cluster ensemble on cancer gene expression data. Proc Int Conf 60:120–128
Yang Y, Jiang J (2018) Adaptive Bi-weighting toward automatic initialization and model selection for HMM-based hybrid meta-clustering ensembles. IEEE Trans Cybern 49(5):1657–1668
Yang CL, Kuo RJ, Chien CH, Quyen NTP (2015) Non-dominated sorting genetic algorithm using fuzzy membership chromosome for categorical data clustering. Appl Soft Comput 30:113–122
Yao X, Ge S, Kong H, Ning H (2018) An improved clustering algorithm and its application in wechat sports users analysis. Procedia Comput Sci 129:166–174
Yu H, Zhang C, Wang G (2016) A tree-based incremental overlapping clustering method using the three-way decision theory. Knowl-Based Syst 91:189–203
Yuwono M, Su SW, Moulton BD, Nguyen HT (2014) Data clustering using variants of rapid centroid estimation. IEEE Trans Evol Comput 18(3):366–377
Zhang B, Qin S, Wang W, Wang D, Xue L (2016a) Data stream clustering based on Fuzzy C-Mean algorithm and entropy theory. Signal Process 126:111–116
Zhang H, Raitoharju J, Kiranyaz S, Gabbouj M (2016b) Limited random walk algorithm for big graph data clustering. J Big Data 3(1):26
Zhang QH, Li BL, Liu YJ, Gao L, Liu LJ, Shi XL (2016c) Data clustering using multivariant optimization algorithm. Int J Mach Learn Cybern 7(5):773–782
Zhou Y, Wu H, Luo Q, Abdel-Baset M (2019) Automatic data clustering using nature-inspired symbiotic organism search algorithm. Knowl-Based Syst 163:546–557
Zhu E, Ma R (2018) An effective partitional clustering algorithm based on new clustering validity index. Appl Soft Comput 71:608–621
Funding
This study was not funded by any organization.
Author information
Authors and Affiliations
Contributions
A.B. wrote the main manuscript text, and C. prepared all figures and tables. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Research involving human participants and/or animals
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Kaur, A., Kumar, Y. & Sidhu, J. Exploring meta-heuristics for partitional clustering: methods, metrics, datasets, and challenges. Artif Intell Rev 57, 287 (2024). https://doi.org/10.1007/s10462-024-10920-1