International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 4 - Number 5
Year of Publication: 2012
Authors: Neepa Shah, Sunita Mahajan
DOI: 10.5120/ijais12-450691
Neepa Shah, Sunita Mahajan. Document Clustering: A Detailed Review. International Journal of Applied Information Systems. 4, 5 (October 2012), 30-38. DOI=10.5120/ijais12-450691
Document clustering is the automatic organization of documents into clusters so that documents within a cluster are more similar to one another than to documents in other clusters. It has been studied intensively because of its wide applicability in areas such as web mining, search engines, and information retrieval. It involves measuring the similarity between documents and grouping similar documents together. It provides an efficient representation and visualization of the documents and thus also helps in easy navigation. In this paper, we give an overview of the document clustering methods studied and researched over the last few years, from basic traditional methods to fuzzy-based, genetic, co-clustering, and heuristic-oriented methods. We also explain the document clustering procedure, including the feature selection process, along with applications, challenges in document clustering, similarity measures, and the evaluation of document clustering algorithms.
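To make the idea of measuring similarity and grouping similar documents concrete, the following is a minimal sketch and not a method taken from the paper. It assumes a plain bag-of-words representation, cosine similarity as the similarity measure, and a simple single-pass grouping rule. The function names and the threshold value of 0.3 are illustrative assumptions.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Lowercase, split on whitespace, and count terms (a bag-of-words vector)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_documents(docs, threshold=0.3):
    """Single-pass grouping (illustrative assumption, not the paper's method):
    put each document in the first cluster whose first member is similar
    enough, otherwise start a new cluster."""
    vectors = [term_frequencies(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "web mining for search engines",
    "search engines and web mining",
    "fuzzy clustering with genetic algorithms",
    "genetic algorithms for fuzzy clustering",
]
clusters = group_documents(docs)
print(clusters)
```

On these four toy documents, the first two share most of their terms and end up in one cluster, and the last two end up in another. Practical systems typically replace raw term counts with weighted features and use one of the clustering algorithms reviewed in the paper rather than this single-pass rule.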