Yuwan Gu
College of Electronic and Information Engineering, Jiang Su University, Jiang Su, Zhenjiang, 212013, China
Guodong Shi
College of Electronic and Information Engineering, Jiang Su University, Jiang Su, Zhenjiang, 212013, China
Huanhuan Cai
International Institute of Ubiquitous Computing, Chang Zhou University, Jiangsu, Changzhou, 213164, China
Yan Chen
International Institute of Ubiquitous Computing, Chang Zhou University, Jiangsu, Changzhou, 213164, China
Yuqiang Sun
International Institute of Ubiquitous Computing, Chang Zhou University, Jiangsu, Changzhou, 213164, China
ABSTRACT
Traditional decision tree algorithms cannot cope with large-scale data mining, so this study presents a parallel decision tree classification algorithm based on MapReduce. The algorithm uses attribute set dependence as the test attribute selection criterion to avoid two shortcomings of the ID3 algorithm: it handles noise poorly, and it does not capture the relationships between attributes closely enough. The MapReduce model is then used to scale the algorithm to large-scale data mining problems. An example verifies that the MapReduce-based decision tree algorithm can handle massive data classification problems, with better scalability and higher classification efficiency.
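The selection step described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the abstract does not define "attribute set dependence", so the code assumes the standard rough-set dependency degree (the fraction of records whose attribute value determines exactly one class label), and the names `map_phase`, `reduce_phase`, `dependency` and `best_attribute` are hypothetical. The map and reduce steps are plain Python functions standing in for MapReduce tasks.

```python
from collections import defaultdict


def map_phase(records, attr_indices, label_index):
    """Map step: emit ((attribute index, attribute value), label) per record."""
    for rec in records:
        for a in attr_indices:
            yield (a, rec[a]), rec[label_index]


def reduce_phase(pairs):
    """Reduce step: count class labels for each (attribute, value) key."""
    grouped = defaultdict(lambda: defaultdict(int))
    for key, label in pairs:
        grouped[key][label] += 1
    return grouped


def dependency(grouped, attr, n_records):
    """Assumed criterion: share of records whose value of `attr`
    is associated with a single class label."""
    consistent = 0
    for (a, _value), label_counts in grouped.items():
        if a == attr and len(label_counts) == 1:
            consistent += sum(label_counts.values())
    return consistent / n_records


def best_attribute(records, attr_indices, label_index):
    """Pick the attribute with the highest dependency as the test attribute."""
    grouped = reduce_phase(map_phase(records, attr_indices, label_index))
    return max(attr_indices, key=lambda a: dependency(grouped, a, len(records)))
```

Because each record is mapped independently and the reducer only sums counts per key, the counting work can be split across nodes, which is the property a MapReduce formulation relies on.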
How to cite this article
Yuwan Gu, Guodong Shi, Huanhuan Cai, Yan Chen and Yuqiang Sun, 2013. Research of Parallel Decision Tree Algorithm Based on Mapreduce. Information Technology Journal, 12: 7345-7352.
DOI: 10.3923/itj.2013.7345.7352
URL: https://scialert.net/abstract/?doi=itj.2013.7345.7352