Research Article

Proposing an Improved Semantic and Syntactic Data Quality Mining Method using Clustering and Fuzzy Techniques

by Hamid Reza Khosravani
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 3 - Number 3
Year of Publication: 2012
Authors: Hamid Reza Khosravani
http:/ijais12-450475

Hamid Reza Khosravani. Proposing an Improved Semantic and Syntactic Data Quality Mining Method using Clustering and Fuzzy Techniques. International Journal of Applied Information Systems. 3, 3 (July 2012), 8-12. DOI=http:/ijais12-450475

@article{ http:/ijais12-450475,
author = { Hamid Reza Khosravani },
title = { Proposing an Improved Semantic and Syntactic Data Quality Mining Method using Clustering and Fuzzy Techniques },
journal = { International Journal of Applied Information Systems },
issue_date = { July 2012 },
volume = { 3 },
number = { 3 },
month = { July },
year = { 2012 },
issn = { 2249-0868 },
pages = { 8-12 },
numpages = {9},
url = { https://www.ijais.org/archives/volume3/number3/210-0475/ },
doi = { http:/ijais12-450475 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T10:45:32.394935+05:30
%A Hamid Reza Khosravani
%T Proposing an Improved Semantic and Syntactic Data Quality Mining Method using Clustering and Fuzzy Techniques
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 3
%N 3
%P 8-12
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data quality plays an important role in the knowledge discovery process in databases. Researchers have so far proposed two approaches to data quality evaluation. The first is based on statistical methods; the second uses data mining techniques, which improve evaluation results by relying on extracted knowledge. Our proposed method follows the second approach and focuses on the accuracy dimension of data quality, covering both its syntactic and semantic aspects.

Index Terms

Computer Science
Information Sciences

Keywords

Data Quality Mining, Association Rules, Categorical Feature, Numerical Feature