Wang Li-Na
College of Electronic and Information Engineering, Nanjing University of Information Science and Technology, 210044, Nanjing, China
Liu Qian
College of Electronic and Information Engineering, Nanjing University of Information Science and Technology, 210044, Nanjing, China
Zhou Yuan
College of Electronic and Information Engineering, Nanjing University of Information Science and Technology, 210044, Nanjing, China
ABSTRACT
This study presents a new fuzzy centroids clustering algorithm for categorical data. The objective function of the fuzzy k-modes algorithm is modified by adding between-cluster information, so that within-cluster dispersion is minimized and between-cluster separation is enhanced simultaneously. Because hard centroids can lead to misclassification, the proposed algorithm combines fuzzy centroids with the between-cluster information. Furthermore, the dissimilarity between an object and a centroid at the feature level is defined as 1 minus the frequency of the object's feature value. On several real data sets from UCI, the proposed algorithm is effective and outperforms its counterpart with hard-type centroids.
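The feature-level dissimilarity described in the abstract can be illustrated with a minimal sketch. It assumes that a fuzzy centroid stores, for each feature, the membership-weighted relative frequency of every category value in the cluster, and that the "frequency of the feature value" is read from that table. The function names, the fuzziness exponent `m` and the toy data are hypothetical and do not come from the article. The between-cluster term of the objective function is not reproduced, because the abstract does not give its form.

```python
from collections import defaultdict


def fuzzy_centroid(objects, memberships, m=2.0):
    """Build one cluster's fuzzy centroid (assumed form, not the article's).

    For every feature, each category value receives the share of the
    total weight u**m contributed by the objects that carry that value.
    """
    total = sum(u ** m for u in memberships)
    if total == 0:
        raise ValueError("all memberships are zero for this cluster")
    n_features = len(objects[0])
    centroid = [defaultdict(float) for _ in range(n_features)]
    for obj, u in zip(objects, memberships):
        for j, value in enumerate(obj):
            centroid[j][value] += (u ** m) / total
    return [dict(weights) for weights in centroid]


def dissimilarity(obj, centroid):
    """Sum over features of 1 minus the frequency of the object's value."""
    return sum(1.0 - centroid[j].get(value, 0.0)
               for j, value in enumerate(obj))


# Hypothetical toy data: three objects with two categorical features.
objects = [("red", "small"), ("red", "large"), ("blue", "small")]
memberships = [0.5, 0.5, 0.5]  # membership of each object in one cluster

centroid = fuzzy_centroid(objects, memberships)
d_seen = dissimilarity(("red", "small"), centroid)
d_unseen = dissimilarity(("green", "small"), centroid)
```

With equal memberships each object contributes one third of the weight, so a value carried by two of the three objects has frequency 2/3, and a value absent from the cluster contributes the maximum per-feature dissimilarity of 1.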
How to cite this article
Wang Li-Na, Liu Qian and Zhou Yuan, 2013. A Fuzzy Centroids Clustering Algorithm with Between-cluster Information for Categorical Data. Information Technology Journal, 12: 5482-5486.
DOI: 10.3923/itj.2013.5482.5486
URL: https://scialert.net/abstract/?doi=itj.2013.5482.5486