In recent years, vast amounts of unstructured data have been, and continue to be, produced by many sources around the world. In data mining, clustering is a widely used and effective technique for grouping such unstructured, unlabeled data. In its basic form, data clustering is an unsupervised method that groups objects so that objects within the same cluster are highly similar to each other, whereas objects in different clusters are clearly distinct. However, because of the exponential growth in the volume of data available across a wide variety of scientific fields, manipulating and analyzing this information has become increasingly difficult. It is also becoming progressively harder to extract hidden features from the resulting huge unstructured, unlabeled datasets. This study reports an improved k-means clustering algorithm obtained by modifying its initial centroid selection process and adding a new outlier detection and filtering algorithm. Most k-means algorithms select initial centroids at random, which often leads to poor initial cluster quality. In addition, most existing outlier detection techniques are inadequate because of their low accuracy, high computational complexity, and failure to identify outliers reliably. Comprehensive experiments comparing our approach against existing techniques and benchmark performance values show that it outperforms existing methods in initial centroid selection, outlier detection, and other related aspects.
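The abstract does not specify the modified seeding rule or the new outlier filter, so the following is only a minimal Python sketch of the baseline being improved on, under stated assumptions. It runs Lloyd's k-means with a pluggable initializer and contrasts uniform random seeding with the well-known k-means++ seeding rule (a swapped-in standard technique, not the authors' method). It also adds a simple distance-based outlier flag that marks points whose distance to their centroid exceeds the mean by more than `z` standard deviations (again a hypothetical stand-in, not the proposed filter). The function names `random_init`, `kmeanspp_init`, `kmeans`, and `flag_outliers` and the default `z=2.0` are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(data: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Baseline seeding: pick k distinct data points uniformly at random."""
    return rng.sample(data, k)


def kmeanspp_init(data: List[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding (standard technique, NOT the paper's rule): each new
    centroid is drawn with probability proportional to its squared distance
    from the nearest centroid chosen so far."""
    centroids = [rng.choice(data)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in data]
        centroids.append(rng.choices(data, weights=weights, k=1)[0])
    return centroids


def kmeans(
    data: List[Point],
    k: int,
    init: Callable[[List[Point], int, random.Random], List[Point]],
    iters: int = 100,
    seed: int = 0,
) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm; converges only to a local optimum of the
    within-cluster sum of squares, so the result depends on `init`."""
    rng = random.Random(seed)
    centroids = [tuple(c) for c in init(data, k, rng)]
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid
        # (ties go to the lowest centroid index).
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in data]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, labels


def flag_outliers(
    data: List[Point], centroids: List[Point], labels: List[int], z: float = 2.0
) -> List[int]:
    """Hypothetical outlier filter: return indices of points whose distance to
    their own centroid exceeds mean + z * (population) standard deviation."""
    dists = [math.sqrt(sq_dist(p, centroids[lab])) for p, lab in zip(data, labels)]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [i for i, d in enumerate(dists) if std > 0 and d > mean + z * std]
```

Because Lloyd's iterations only reach a local optimum, two initial centroids drawn from the same true cluster can leave the final partition split incorrectly, which is the failure mode the abstract attributes to random seeding. k-means++ makes such draws unlikely on well-separated data but not impossible, which is one reason a different, deliberate seeding rule can be worthwhile.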