An improved k-means algorithm based on min-max distance and BWP metrics
List of Authors
  • Lim Eng Aik, Tan Wee Choon

Keywords
  • K-means, algorithm, clustering, maximum-minimum distance, BWP, UCI.

Abstract
  • The k-means algorithm is a conventional unsupervised clustering algorithm that is fast and easy to implement. However, the number of clusters must be specified in advance, and the choice of initial cluster centres is uncertain. To overcome these limitations, a k-means algorithm that combines the maximum-minimum distance with the Between-Within Proportion (BWP) metric is proposed. Simulation experiments on three datasets from the UCI Machine Learning Repository show that the proposed algorithm outperforms both the conventional k-means algorithm and the maximum-minimum distance-based k-means algorithm in accuracy and clustering quality.
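
The abstract does not spell out the procedure, so the following is only a minimal sketch of how the two ingredients are commonly combined, not the authors' implementation. It assumes that the maximum-minimum distance rule means farthest-first selection of initial centres, and that the BWP score of a sample is (b − w)/(b + w), where w is its average distance to the other members of its own cluster and b is the smallest average distance to any other cluster. The function names (`max_min_centres`, `kmeans`, `mean_bwp`, `best_k`), the choice of the first point as the first centre, and the score of 0 for singleton clusters are all illustrative assumptions.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def max_min_centres(points, k):
    """Pick k initial centres: each new centre is the point whose distance
    to its nearest already-chosen centre is largest (max-min rule)."""
    centres = [points[0]]  # assumption: start from the first point
    nearest = [dist(p, centres[0]) for p in points]
    while len(centres) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centres.append(points[idx])
        nearest = [min(d, dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centres

def kmeans(points, centres, iters=100):
    """Standard Lloyd iterations from the given centres; returns labels."""
    centres = list(centres)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(len(centres)), key=lambda c: dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def mean_bwp(points, labels):
    """Average (b - w) / (b + w) over all points; needs >= 2 clusters."""
    total = 0.0
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # assumption: a singleton cluster contributes 0
        w = sum(dist(p, q) for q in own) / len(own)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        if b + w > 0:
            total += (b - w) / (b + w)
    return total / len(points)

def best_k(points, k_max):
    """Run the pipeline for k = 2..k_max and keep the k with the highest mean BWP."""
    scores = {}
    for k in range(2, k_max + 1):
        labels = kmeans(points, max_min_centres(points, k))
        scores[k] = mean_bwp(points, labels)
    return max(scores, key=scores.get), scores

# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
k, scores = best_k(points, 4)
print(k, scores)
```

On this toy data the sketch selects k = 2. Because the initial centres are chosen deterministically, repeated runs give the same result, which is the usual motivation for replacing random initialisation with the max-min rule.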
