Research Article

An Advanced Clustering Algorithm (ACA) for Clustering Large Data Set to Achieve High Dimensionality

by Amanpreet Kaur Toor, Amarpreet Singh
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 7 - Number 2
Year of Publication: 2014
DOI: 10.5120/ijais14-451136

Amanpreet Kaur Toor, Amarpreet Singh. An Advanced Clustering Algorithm (ACA) for Clustering Large Data Set to Achieve High Dimensionality. International Journal of Applied Information Systems. 7, 2 (April 2014), 5-9. DOI=10.5120/ijais14-451136

@article{10.5120/ijais14-451136,
author = {Amanpreet Kaur Toor and Amarpreet Singh},
title = {An Advanced Clustering Algorithm (ACA) for Clustering Large Data Set to Achieve High Dimensionality},
journal = {International Journal of Applied Information Systems},
issue_date = {April 2014},
volume = {7},
number = {2},
month = {April},
year = {2014},
issn = {2249-0868},
pages = {5-9},
numpages = {5},
url = {https://www.ijais.org/archives/volume7/number2/618-1136/},
doi = {10.5120/ijais14-451136},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Amanpreet Kaur Toor
%A Amarpreet Singh
%T An Advanced Clustering Algorithm (ACA) for Clustering Large Data Set to Achieve High Dimensionality
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 7
%N 2
%P 5-9
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Cluster analysis is one of the key methods in data mining, and the choice of clustering algorithm directly affects the clustering results. This paper proposes an Advanced Clustering Algorithm (ACA) to address the problems of high dimensionality and large data sets [1]. ACA avoids repeatedly computing the distance from each data object to every cluster in each iteration, which saves execution time. It requires only a simple data structure that stores information from each iteration for use in the next one. Experimental results show that ACA effectively improves both clustering speed and accuracy and reduces computational complexity compared with the traditional Kohonen self-organizing map (SOM). The paper presents ACA together with simulated experimental results on different data sets.
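The abstract does not spell out which information ACA keeps between iterations. The sketch below assumes it is, for every data object, the index of its current cluster and its distance to that cluster's centre, in the spirit of the enhanced k-means listed as [7]. The function names `aca_sketch`, `update_centroids` and `euclidean` are hypothetical; this is a k-means-style illustration of the "store and reuse" idea, not the authors' ACA and not a SOM. It counts distance computations to show where the time saving comes from.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update_centroids(data: List[Point], labels: List[int],
                     old: List[Point], k: int) -> List[Point]:
    """Mean of each cluster's members; an empty cluster keeps its old centroid."""
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(data, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    return [tuple(s / counts[j] for s in sums[j]) if counts[j] else old[j]
            for j in range(k)]


def aca_sketch(data: List[Point], k: int, max_iter: int = 100,
               init: Optional[List[Point]] = None,
               seed: int = 0) -> Tuple[List[int], List[Point], int]:
    """Hypothetical ACA-style clustering; returns (labels, centroids, distance_count)."""
    centroids = list(init) if init is not None else random.Random(seed).sample(data, k)
    n_dist = 0                 # number of point-to-centroid distances computed
    labels: List[int] = []     # stored per point: current cluster index
    best: List[float] = []     # stored per point: distance to that cluster's centroid

    # First pass: compare every point with all k centroids and record the result.
    for p in data:
        ds = [euclidean(p, c) for c in centroids]
        n_dist += k
        j = min(range(k), key=ds.__getitem__)
        labels.append(j)
        best.append(ds[j])

    for _ in range(max_iter):
        centroids = update_centroids(data, labels, centroids, k)
        changed = False
        for i, p in enumerate(data):
            d_own = euclidean(p, centroids[labels[i]])
            n_dist += 1
            if d_own <= best[i]:
                # Heuristic: the point did not move away from its own centroid,
                # so keep it there and skip the other k - 1 distances.
                best[i] = d_own
                continue
            # Otherwise fall back to a full scan over all centroids.
            ds = [euclidean(p, c) for c in centroids]
            n_dist += k
            j = min(range(k), key=ds.__getitem__)
            changed = changed or j != labels[i]
            labels[i], best[i] = j, ds[j]
        if not changed:        # no reassignment, so the centroids are stable
            break
    return labels, centroids, n_dist
```

For example, on the six points (0,0), (0,1), (1,0), (10,10), (10,11), (11,10) with initial centres (0,0) and (10,10), `aca_sketch` separates the two groups using 22 distance computations, whereas a plain k-means that rescans all centroids in both passes would use 24. The saving grows with the number of clusters and iterations. The trade-off is that the shortcut can leave a point in its cluster even when another centroid has become nearer, so results may differ slightly from standard k-means.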

References
  1. Yuan F, Meng Z. H, Zhang H. X and Dong C. R, "A New Algorithm to Get the Initial Centroids," Proc. of the 3rd International Conference on Machine Learning and Cybernetics, pp. 26–29, August 2004.
  2. Sun Jigui, Liu Jie, Zhao Lianyu, "Clustering algorithms Research," Journal of Software, Vol. 19, No. 1, pp. 48–61, January 2008.
  3. Amanpreet Kaur Toor, Amarpreet Singh, "Analysis of Clustering Algorithm based on Number of Clusters, error rate, Computation Time and Map Topology on large Data Set," International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), Volume 2, Issue 6, November–December 2013.
  4. Amanpreet Kaur Toor, Amarpreet Singh, "A Survey paper on recent clustering approaches in data mining," International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 3, Issue 11, November 2013.
  5. Sun Shibao, Qin Keyun, "Research on Modified K-means Data Cluster Algorithm," Computer Engineering, Vol. 33, No. 13, pp. 200–201, July 2007.
  6. Merz C and Murphy P, UCI Repository of Machine Learning Databases, Available: ftp://ftp.ics.uci.edu/pub/machine-learning-databases
  7. Fahim A M, Salem A M, Torkey F A, "An efficient enhanced k-means clustering algorithm," Journal of Zhejiang University Science A, Vol. 10, pp. 1626–1633, July 2006.
  8. Zhao YC, Song J. GDILC: A grid-based density isoline clustering algorithm. In: Zhong YX, Cui S, Yang Y, eds. Proc. of the Internet Conf. on Info-Net. Beijing: IEEE Press, 2001. 140–145. http://ieeexplore.ieee.org/iel5/7719/21161/00982709.pdf
  9. Huang Z, "Extensions to the k-means algorithm for clustering large data sets with categorical values," Data Mining and Knowledge Discovery, Vol. 2, pp. 283–304, 1998.
  10. K. A. Abdul Nazeer, M. P. Sebastian, "Improving the Accuracy and Efficiency of the k-means Clustering Algorithm," Proceedings of the World Congress on Engineering, Vol. 1, London, July 2009.
  11. Fred ALN, Leitão JMN. Partitional vs hierarchical clustering using a minimum grammar complexity approach. In: Proc. of the SSPR & SPR 2000. LNCS 1876, 2000. 193–202. http://www.sigmod.org/dblp/db/conf/sspr/sspr2000.htm
  12. Gelbard R, Spiegler I. Hempel's raven paradox: A positive approach to cluster analysis. Computers and Operations Research, 2000, 27(4): 305–320.
  13. Huang Z. A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proc. of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery. Tucson, 1997. 146–151.
  14. Ding C, He X. K-Nearest-Neighbor in data clustering: Incorporating local information into global optimization. In: Proc. of the ACM Symp. on Applied Computing. Nicosia: ACM Press, 2004. 584–589. http://www.acm.org/conferences/sac/sac2004/
  15. Hinneburg A, Keim D. An efficient approach to clustering in large multimedia databases with noise. In: Agrawal R, Stolorz PE, Piatetsky-Shapiro G, eds. Proc. of the 4th Int'l Conf. on Knowledge Discovery and Data Mining (KDD'98). New York: AAAI Press, 1998. 58–65.
  16. Zhang T, Ramakrishnan R, Livny M. BIRCH: An efficient data clustering method for very large databases. In: Jagadish HV, Mumick IS, eds. Proc. of the 1996 ACM SIGMOD Int'l Conf. on Management of Data. Montreal: ACM Press, 1996. 103–114.
  17. Birant D, Kut A. ST-DBSCAN: An algorithm for clustering spatial-temporal data. Data & Knowledge Engineering, 2007, 60(1): 208–221.
Index Terms

Computer Science
Information Sciences

Keywords

ACA, SOM, Clustering, Large Data Set, High Dimensionality, Cluster Analysis