Xu Yan
School of Information Science, Beijing Language and Culture University, Beijing, 100083, P.R. China
ABSTRACT
This study, from the perspective of Chinese spam filtering, focuses on efficient feature selection methods. It describes the traditional feature selection algorithms, namely Document Frequency (DF), Information Gain (IG), Mutual Information (MI) and Chi-square (CHI), together with Knowledge Gain (KG), which was proposed in my previous study. When these methods are tested on a public Chinese spam data set, the results show that on this Chinese spam corpus CHI and KG can efficiently extract valid features for spam classification.
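The four traditional scores named in the abstract (DF, IG, MI, CHI) can all be computed from simple document counts. The sketch below is an illustration under stated assumptions, not the author's implementation. It assumes messages are already word-segmented (Chinese text needs a segmenter first), treats a feature as term presence in a message, and aggregates the per-class MI and CHI values by taking the maximum over classes, which is one common choice (averaging is another). KG is omitted because its definition is not given in this abstract. The function names `feature_scores` and `select_top_k` are hypothetical.

```python
import math
from collections import Counter


def _entropy(probs):
    """Shannon entropy in bits, ignoring zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def feature_scores(docs, labels):
    """Score every term with DF, IG, MI and CHI (illustrative sketch).

    docs   -- list of token lists (assumed already word-segmented)
    labels -- class label per document, e.g. 1 = spam, 0 = ham
    Returns {term: {"DF": int, "IG": float, "MI": float, "CHI": float}}.
    """
    n = len(docs)
    class_count = Counter(labels)                 # documents per class
    df = Counter()                                # documents containing t
    df_c = {c: Counter() for c in class_count}    # same count, per class
    for tokens, c in zip(docs, labels):
        for t in set(tokens):                     # presence, not frequency
            df[t] += 1
            df_c[c][t] += 1

    h_class = _entropy(k / n for k in class_count.values())  # H(C)
    scores = {}
    for t, df_t in df.items():
        absent = n - df_t                         # documents lacking t
        # IG(t) = H(C) - P(t) H(C|t) - P(~t) H(C|~t)
        h_present = _entropy(df_c[c][t] / df_t for c in class_count)
        h_absent = (_entropy((class_count[c] - df_c[c][t]) / absent
                             for c in class_count) if absent else 0.0)
        ig = h_class - (df_t / n) * h_present - (absent / n) * h_absent

        mi_best, chi_best = float("-inf"), 0.0
        for c, n_c in class_count.items():
            a = df_c[c][t]            # in class c, contains t
            b = df_t - a              # other classes, contains t
            c_ = n_c - a              # in class c, lacks t
            d = absent - c_           # other classes, lacks t
            # MI(t, c) = log2(P(t, c) / (P(t) P(c))); undefined when a == 0
            if a > 0:
                mi_best = max(mi_best, math.log2(a * n / (df_t * n_c)))
            # CHI(t, c) = n (ad - cb)^2 / ((a+c)(b+d)(a+b)(c+d))
            denom = n_c * (n - n_c) * df_t * absent
            if denom:
                chi_best = max(chi_best, n * (a * d - c_ * b) ** 2 / denom)
        scores[t] = {"DF": df_t, "IG": ig, "MI": mi_best, "CHI": chi_best}
    return scores


def select_top_k(scores, method, k):
    """Keep the k terms with the highest score under one method."""
    return sorted(scores, key=lambda t: scores[t][method], reverse=True)[:k]
```

On a toy corpus such as `docs = [["免费", "发票"], ["免费", "会议"], ["会议", "报告"], ["报告"]]` with `labels = [1, 1, 0, 0]`, the term 免费 (which occurs only in spam) gets IG 1.0 and CHI 4.0, while 会议 (split evenly across classes) gets IG 0.0 and CHI 0.0, so `select_top_k(scores, "CHI", 2)` keeps the two class-exclusive terms 免费 and 报告.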
How to cite this article
Xu Yan, 2013. Efficient Feature Selection Methods in Chinese Spam Filtering. Information Technology Journal, 12: 5492-5496.
DOI: 10.3923/itj.2013.5492.5496
URL: https://scialert.net/abstract/?doi=itj.2013.5492.5496
REFERENCES
- Androutsopoulos, I., J. Koutsias, K.V. Chandrinos and C.D. Spyropoulos, 2000. An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 24-28, 2000, ACM, Athens, Greece, pp: 160-167.
- Androutsopoulos, I., J. Koutsias, K.V. Chandrinos, G. Paliouras and C.D. Spyropoulos, 2000. An evaluation of naive Bayesian anti-spam filtering. Proceedings of the Workshop on Machine Learning in the New Information Age, 11th European Conference on Machine Learning, May 2000, Barcelona, Spain, pp: 9-17.
- Damiani, E., S. De Capitani di Vimercati, S. Paraboschi and P. Samarati, 2004. An open digest-based technique for spam detection. Proceedings of the 2004 International Workshop on Security in Parallel and Distributed Systems (PDCS'04), USA, pp: 15-17.
- Manber, U., 1994. Finding similar files in a large file system. Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference, January 17-21, 1994, San Francisco, California, pp: 2.
- Xu, Y., 2011. Rough set and its application in Chinese spam filtering. Proceedings of the IEEE International Conference on Granular Computing, November 8-10, 2011, Kaohsiung, pp: 750-755.
- Yu, H., Z. Li, H. Tang and Z. Wu, 2003. A rough set approach for analyzing E-mail filtering system. Comput. Eng. Appl., 15: 47-48, 67.