This section presents the experimental results used to determine the optimum number of abandoned particles with the least solutions. It then compares the proposed subswarm-PSO algorithm, using that optimum number, with the standard PSO and K-means algorithms on the text document clustering problem.
8.1. Different Numbers of Abandoned Particles with Least Solutions for Subswarm-PSO
This subsection presents the experimental results for different numbers of abandoned PSO particles in the subswarm-PSO approach, with the aim of finding the optimum number of abandoned particles with the least solutions. We compare 1, , , , and abandoned particles for subswarm-PSO on data set 5 with 10 and 20 particles.
The experiment covers the following settings:
1 particle with the least solution will be abandoned and reinitialized randomly. The remaining particles will move to the next position.
particles with the least solution will be abandoned and reinitialized randomly. The remaining particles will move to the next position.
particles with the least solution will be abandoned and reinitialized randomly. The remaining particles will move to the next position.
particles with the least solution will be abandoned and reinitialized randomly. The remaining particles will move to the next position.
particles with the least solution will be abandoned and reinitialized randomly. The remaining 1 particle will move to the next position.
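The five settings above differ only in how many of the least-fit particles are reset in each iteration. Below is a minimal Python/NumPy sketch of one iteration of this kind of abandon-and-reinitialize scheme. It is not the authors' implementation. It assumes a fitness function to be maximized, box bounds on the search space, and common PSO coefficients (w = 0.72, c1 = c2 = 1.49). It also sets the velocity of each reinitialized particle to zero. All of these choices, and the names `subswarm_pso_step` and `run_demo`, are illustrative assumptions. A toy sphere function stands in for the clustering fitness.

```python
import numpy as np


def subswarm_pso_step(pos, vel, pbest, pbest_fit, fitness, n_abandon,
                      lower, upper, rng, w=0.72, c1=1.49, c2=1.49):
    """Run one iteration in place; return the global best position and fitness.

    Assumptions (hypothetical, not from the paper): fitness is maximized,
    coefficients are common PSO defaults, reset velocities are set to zero.
    n_abandon = 0 gives a plain PSO step; n_abandon = len(pos) // 2 abandons half.
    """
    n, d = pos.shape
    # Evaluate the current positions and update the personal bests.
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved] = pos[improved]
    pbest_fit[improved] = fit[improved]
    best_idx = int(np.argmax(pbest_fit))
    gbest = pbest[best_idx].copy()

    # Sort by current fitness: the first n_abandon indices hold the least solutions.
    order = np.argsort(fit)
    worst, keep = order[:n_abandon], order[n_abandon:]

    # Abandoned particles are reinitialized uniformly at random inside the bounds.
    pos[worst] = rng.uniform(lower, upper, size=(len(worst), d))
    vel[worst] = 0.0

    # The remaining particles move with the standard velocity/position update.
    r1 = rng.random((len(keep), d))
    r2 = rng.random((len(keep), d))
    vel[keep] = (w * vel[keep]
                 + c1 * r1 * (pbest[keep] - pos[keep])
                 + c2 * r2 * (gbest - pos[keep]))
    pos[keep] = np.clip(pos[keep] + vel[keep], lower, upper)
    return gbest, float(pbest_fit[best_idx])


def run_demo(n=10, d=2, iters=100, seed=0):
    """Toy run abandoning half of the swarm on a negated sphere (maximum 0 at the origin)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.0, 5.0, size=(n, d))
    vel = np.zeros((n, d))
    pbest = pos.copy()
    pbest_fit = np.full(n, -np.inf)

    def neg_sphere(x):
        return -float(np.sum(x ** 2))

    best = -np.inf
    for _ in range(iters):
        _, best = subswarm_pso_step(pos, vel, pbest, pbest_fit, neg_sphere,
                                    n // 2, -5.0, 5.0, rng)
    return best, pos
```

Passing `n_abandon = 1` or `n_abandon = n - 1` instead of `n // 2` reproduces the other ends of the range of settings compared here.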
Table 3 and Figure 3 compare the five abandonment settings listed above with the standard PSO algorithm for 10 particles. Table 3 reports the maximum, mean, and standard deviation of the purity values at iterations 1, 10, 20, 30, …, 100 on data set 5, together with the average and rank of each setting. The results show that abandoning half of the particles gives the highest performance, with purity values of 0.806, 0.812, 0.816, 0.836, 0.842, 0.848, and 0.850 at iterations 10, 30, 40, 60, 70, 90, and 100, respectively. The next-best performance is shown by
and
with averages of 0.797 and 0.788, respectively.
Figure 3 shows that the settings with abandoned particles perform better than the standard PSO algorithm, and that abandoning half of the particles performs best.
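Tables 3 and 4 summarize purity at the checkpoint iterations by its maximum, mean, and standard deviation, and also give an average and a rank for each setting. Below is a minimal sketch of how such a summary could be computed. It assumes that the purity of the best solution is recorded after every iteration for several independent runs. The number of runs, the use of the sample standard deviation (`ddof=1`), and taking the "average" as the mean over all checkpoints and runs are assumptions. The helpers `summarize_purity` and `rank_by_average` are hypothetical.

```python
import numpy as np

# Checkpoint iterations reported in the tables: 1, 10, 20, ..., 100.
CHECKPOINTS = np.array([1] + list(range(10, 101, 10)))


def summarize_purity(purity):
    """purity: array (runs, iterations); column i holds purity after iteration i + 1."""
    at_cp = np.asarray(purity)[:, CHECKPOINTS - 1]
    return {
        "max": at_cp.max(axis=0),          # best run at each checkpoint
        "mean": at_cp.mean(axis=0),        # mean over runs at each checkpoint
        "std": at_cp.std(axis=0, ddof=1),  # sample standard deviation over runs
        "average": float(at_cp.mean()),    # one summary value per setting
    }


def rank_by_average(averages):
    """Rank settings by average purity (1 = highest); ties are not handled."""
    ordered = sorted(averages, key=averages.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}
```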
Table 4 compares the same five abandonment settings with the standard PSO algorithm for 20 particles. It reports the maximum, mean, and standard deviation of purity at iterations 1, 10, 20, 30, …, 100 on data set 5, together with the average and rank of each setting. The results show that abandoning half of the particles gives the highest performance, with purity values of 0.862, 0.852, 0.866, and 0.906 at iterations 20, 40, 70, and 90, respectively. The next-best performance is shown by
with an average of 0.832.
Figure 4 shows the results of the different numbers of abandoned particles for 20 particles. The settings with abandoned particles perform better than the standard PSO algorithm, and abandoning half of the particles performs best, with the highest average value of 0.843.
Tables 3 and 4 show that the experiments with 10 and 20 particles follow almost the same pattern, and that abandoning half of the particles gives the best performance in both cases.
8.2. Performance Comparison of the Proposed Subswarm-PSO Algorithm
This subsection compares the performance of the proposed subswarm-PSO algorithm with that of the standard PSO and K-means algorithms. Subswarm-PSO uses the optimal number of abandoned least-solution particles (half of the swarm), as determined by the experiments in the previous subsection.
Table 5 shows the maximum, mean, and standard deviation of the purity values for the text document clustering algorithms; each value is averaged over all iteration numbers. The highest mean purity for each data set is highlighted in bold. The subswarm-PSO algorithm has the highest mean purity on data sets 1, 3, 4, 5, and 6 (0.728, 0.820, 0.650, 0.807, and 0.888, respectively), whereas K-means has the highest mean purity on data set 2 (0.763). K-means also has the lowest mean purity on data sets 1, 3, 4, and 5 (0.527, 0.651, 0.450, and 0.595, respectively).
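Purity is the evaluation measure used in Tables 3 to 5. This section does not restate its formula. The sketch below assumes the standard definition, purity = (1/N) Σ_k max_j |ω_k ∩ c_j|, where N is the number of documents, ω_k is the set of documents in cluster k, and c_j is the set of documents in class j. The function name `purity` and its list-based interface are illustrative assumptions.

```python
from collections import Counter, defaultdict


def purity(true_labels, cluster_labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    if len(true_labels) != len(cluster_labels) or len(true_labels) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    counts = defaultdict(Counter)  # cluster id -> counts of true class labels
    for cls, cluster in zip(true_labels, cluster_labels):
        counts[cluster][cls] += 1
    # Each cluster contributes the size of its largest class.
    majority = sum(max(c.values()) for c in counts.values())
    return majority / len(true_labels)
```

For example, `purity([0, 0, 0, 1, 1, 2], [1, 1, 0, 0, 0, 2])` gives 5/6. Clusters 1, 0, and 2 contribute majority counts of 2, 2, and 1.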
Table 6 ranks the text document clustering algorithms across all data sets by the mean purity values shown in Table 5. The overall ranks of K-means, PSO, and subswarm-PSO are 3, 2, and 1, respectively.
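Table 6 turns the per-data-set mean purities into an overall rank. The section does not state the aggregation rule. The sketch below assumes a common approach: rank the algorithms on each data set, then order them by the sum of their per-data-set ranks. `overall_ranks` is a hypothetical helper.

```python
def overall_ranks(mean_purity):
    """mean_purity: {algorithm: [mean purity on data set 1, 2, ...]}.

    Returns (rank sums, final ranks); rank 1 is best. Ties are not handled.
    """
    algorithms = list(mean_purity)
    n_sets = len(mean_purity[algorithms[0]])
    rank_sum = {a: 0 for a in algorithms}
    for j in range(n_sets):
        # Rank the algorithms on data set j by mean purity, highest first.
        ordered = sorted(algorithms, key=lambda a: mean_purity[a][j], reverse=True)
        for rank, a in enumerate(ordered, start=1):
            rank_sum[a] += rank
    # A lower rank sum means better overall performance.
    final = sorted(algorithms, key=rank_sum.get)
    return rank_sum, {a: r for r, a in enumerate(final, start=1)}
```

With invented toy values `{"A": [0.5, 0.9, 0.4], "B": [0.6, 0.8, 0.6], "C": [0.7, 0.7, 0.65]}`, the rank sums are A = 7, B = 6, and C = 5, so the final ranks are C = 1, B = 2, and A = 3.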
Figures 5 to 16 show the comparative results and average running times of all algorithms on data sets 1 to 6. Because K-means takes very little time to execute, we omit it from the timing comparison and compare the execution times of PSO and subswarm-PSO only.
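The timing figures report, for each data set, how much less time subswarm-PSO takes than PSO on average. Below is a minimal sketch of such a measurement. It assumes wall-clock timing averaged over repeated runs, with the reduction expressed as a percentage of the PSO time. The section does not describe the exact procedure, and `average_runtime`, `percent_reduction`, and the default repeat count are hypothetical.

```python
import time
from statistics import mean


def average_runtime(run, repeats=5):
    """Mean wall-clock time in seconds over `repeats` calls of run()."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        times.append(time.perf_counter() - start)
    return mean(times)


def percent_reduction(t_baseline, t_new):
    """Time saved by t_new relative to t_baseline, in percent."""
    return 100.0 * (t_baseline - t_new) / t_baseline
```

With two zero-argument callables, for example hypothetical `run_pso` and `run_subswarm_pso` wrappers around full clustering runs, the reported saving would be `percent_reduction(average_runtime(run_pso), average_runtime(run_subswarm_pso))`.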
Figure 5 compares the proposed subswarm-PSO algorithm with the other algorithms on data set 1. The proposed algorithm performs best, PSO performs second best, and K-means performs worst.
Figure 6 compares the average execution times of PSO and subswarm-PSO for the runs reported in Figure 5 and Table 5. On average, the proposed subswarm-PSO algorithm takes
less time to execute than PSO.
Figures 7 and 8 show the comparative results and the average execution times, respectively, of the proposed subswarm-PSO algorithm and the other algorithms on data set 2. Figure 7 shows that K-means performs best on this data set, while the proposed subswarm-PSO algorithm performs second best, ahead of PSO. As shown in Figure 8, subswarm-PSO takes on average
less time to execute than PSO.
Figures 9 and 10 show the comparative results and the average running times, respectively, of all algorithms on data set 3. Figure 9 and Table 5 show that the proposed subswarm-PSO algorithm achieves the highest purity, PSO performs second best, and K-means performs worst. Figure 10 shows that subswarm-PSO takes
less execution time than PSO.
Figures 11 and 12 show the comparative results and the average running times, respectively, of all algorithms on data set 4. Figure 11 and Table 5 show that the proposed subswarm-PSO algorithm performs best, PSO performs second best, and K-means performs worst on data set 4. Figure 12 shows a similar result: subswarm-PSO takes
slightly less execution time than PSO.
Figures 13 and 14 show the comparative results and the average running times, respectively, of all algorithms on data set 5. Figure 13 shows that subswarm-PSO, PSO, and K-means rank first, second, and third, respectively. As with the other data sets, subswarm-PSO takes
less execution time than PSO.
Figures 15 and 16 show the comparative results and the average running times, respectively, of all algorithms on data set 6. Figure 15 shows that subswarm-PSO, PSO, and K-means rank first, second, and third, respectively. Figure 16 shows that subswarm-PSO takes
less time to execute than PSO.
8.3. Discussion
Based on the above results, the proposed subswarm-PSO algorithm finds the best solutions for most of the data sets when compared with the standard PSO and K-means algorithms in text document clustering. The average running times of the proposed subswarm-PSO algorithm are , , , , , and lower than those of the standard PSO algorithm for data sets 1, 2, 3, 4, 5, and 6, respectively. K-means is much faster than the other algorithms, but it yields the lowest-quality solutions for all data sets except data set 2.
The standard PSO algorithm can solve complex optimization problems such as text document clustering, but it is often trapped in local optima. In each iteration, the proposed algorithm abandons the half of the particles with the least solutions and reinitializes them randomly. This increases the global search capability of PSO and improves its performance. Only the half of the particles with the best solutions move to their next positions; the others are reinitialized randomly. Random initialization takes less time than updating a particle's position, which may explain the reduced execution time of the proposed algorithm.
In the proposed method, we abandon half of the particles with the least solutions, based on the results in Tables 3 and 4. Compared with the standard PSO, abandoning half of the particles and abandoning nearly half of them,
and
, show better performance. Choosing half of the particles may strike a good balance: the top half of the solutions are retained and move towards the global best solution, while the remaining half of the particles explore new solutions. If the number of abandoned particles is reduced to
, the global search capability may decrease; if it is increased to more than
, the swarm may be distracted from moving toward the global best solution. This may explain why abandoning half of the particles performs well.