Research Article

A Fast Deterministic Kmeans Initialization

by Omar Kettani, Faical Ramdani
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 12 - Number 2
Year of Publication: 2017
Authors: Omar Kettani, Faical Ramdani
10.5120/ijais2017451683

Omar Kettani, Faical Ramdani. A Fast Deterministic Kmeans Initialization. International Journal of Applied Information Systems 12, 2 (May 2017), 6-11. DOI=10.5120/ijais2017451683

@article{ 10.5120/ijais2017451683,
author = { Omar Kettani, Faical Ramdani },
title = { A Fast Deterministic Kmeans Initialization },
journal = { International Journal of Applied Information Systems },
issue_date = { May 2017 },
volume = { 12 },
number = { 2 },
month = { May },
year = { 2017 },
issn = { 2249-0868 },
pages = { 6-11 },
numpages = { 6 },
url = { https://www.ijais.org/archives/volume12/number2/984-2017451683/ },
doi = { 10.5120/ijais2017451683 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Omar Kettani
%A Faical Ramdani
%T A Fast Deterministic Kmeans Initialization
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 12
%N 2
%P 6-11
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The k-means algorithm remains one of the most widely used clustering methods, despite its sensitivity to initial settings. This paper explores a simple, computationally inexpensive, deterministic method that provides k-means with initial seeds for clustering a given data set. The method partitions the data set into k equal parts and uses the mean of each part as a seed. We test and compare this method against the related, well-known KKZ initialization algorithm for k-means on both simulated and real data, and find it to be more efficient in many cases.
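The abstract describes the proposed initialization (means of k equal parts of the data set) and compares it with KKZ. A minimal NumPy sketch of both ideas is below; the exact partitioning order used in the paper is not specified in the abstract, so the contiguous split here is an assumption, and `kkz_init` follows the standard description of KKZ from reference [7] rather than the paper's implementation.

```python
import numpy as np

def deterministic_means_init(X, k):
    """Sketch of the abstract's idea: split the data set into k
    near-equal parts and use the mean of each part as an initial
    k-means seed. (Interpretation of the abstract; the paper may
    order or partition the points differently.)"""
    parts = np.array_split(X, k)  # k near-equal contiguous parts along axis 0
    return np.vstack([p.mean(axis=0) for p in parts])

def kkz_init(X, k):
    """KKZ initialization as commonly described (Katsavounidis et al.,
    1994): start from the point with the largest norm, then repeatedly
    add the point farthest from its nearest already-chosen seed."""
    seeds = [X[np.argmax(np.linalg.norm(X, axis=1))]]
    for _ in range(k - 1):
        # distance from every point to its nearest chosen seed
        d = np.min(
            np.linalg.norm(X[:, None, :] - np.asarray(seeds)[None, :, :], axis=2),
            axis=1,
        )
        seeds.append(X[np.argmax(d)])
    return np.asarray(seeds)

# toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
print(deterministic_means_init(X, 3).shape)  # (3, 2)
print(kkz_init(X, 3).shape)                  # (3, 2)
```

Both routines are deterministic given a fixed data ordering, which is the property the paper emphasizes over randomized seeders such as Forgy or k-means++.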

References
  1. Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75, 245-249 (2009).
  2. Lloyd, S. P.: Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129-137 (1982). doi:10.1109/TIT.1982.1056489.
  3. Peña, J. M., Lozano, J. A., Larrañaga, P.: An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recognition Letters, 20(10), 1027-1040 (1999).
  4. Forgy, E.: Cluster analysis of multivariate data: efficiency vs. interpretability of classifications. Biometrics, 21, 768-769 (1965).
  5. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027-1035 (2007).
  6. Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S.: Scalable k-means++. In: Proceedings of the VLDB Endowment (2012).
  7. Katsavounidis, I., Kuo, C.-C. J., Zhang, Z.: A new initialization technique for generalized Lloyd iteration. IEEE Signal Processing Letters, 1(10), 144-146 (1994).
  8. Asuncion, A., Newman, D. J.: UCI Machine Learning Repository [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science (2007).
  9. Kaufman, L., Rousseeuw, P. J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley (1990).
Index Terms

Computer Science
Information Sciences

Keywords

k-means initialization, KKZ