Li Kairong
College of Information Engineering, Yangzhou University, Yangzhou, Jiangsu, 225127, China
Qu Libing
College of Information Engineering, Yangzhou University, Yangzhou, Jiangsu, 225127, China
Zhu Junwu
College of Information Engineering, Yangzhou University, Yangzhou, Jiangsu, 225127, China
Kong Zhaokun
College of Information Engineering, Yangzhou University, Yangzhou, Jiangsu, 225127, China
Zhao Dongwei
College of Information Engineering, Yangzhou University, Yangzhou, Jiangsu, 225127, China
ABSTRACT
For hierarchical text classification, existing prototype-based classifiers such as k-NN, the kNN Model and the Centroid classifier have achieved their expected function and performance. However, owing to the high dimensionality and complex class structures of document data sets, they usually perform less effectively. We propose a new method for text classification that extracts semantic labels and builds a tree structure for each level of the classification hierarchy. We compare the proposed method with the k-NN method on several multi-hierarchical classification datasets. Our experimental analysis shows that the method takes into account the semantic information linking the levels of the category hierarchy and improves the efficiency of text classification.
How to cite this article
Li Kairong, Qu Libing, Zhu Junwu, Kong Zhaokun and Zhao Dongwei, 2013. Study of Documents Multi-hierarchy Categorization Based on Topic Label and LSI. Information Technology Journal, 12: 5044-5051.
DOI: 10.3923/itj.2013.5044.5051
URL: https://scialert.net/abstract/?doi=itj.2013.5044.5051