Jingyu Chen
School of Computer Science and Technology, Xidian University, No. 2 South Taibai Road, Xi'an, 710071, Shaanxi, China
Ping Chen
School of Computer Science and Technology, Xidian University, No. 2 South Taibai Road, Xi'an, 710071, Shaanxi, China
Xian'gang Sheng
College of Information Engineering, Qingdao University, Qingdao, 266071, Shandong, China
ABSTRACT
Because of measurement inaccuracy and noise, uncertainty is inherent in time series data, and it increases the complexity of clustering. Given the massive size of such data, efficient storage is a crucial task. Based on the Hilbert space-filling curve (SFC), a trend sketch is constructed to store the trends of uncertain time series. Based on a divergence and a sketch metric, a sketch-based similarity measure is then defined, and a clustering algorithm is proposed to improve clustering quality. Experimental results are presented in the final section.
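The abstract names the Hilbert SFC but does not say how trends are placed on it. As a hedged illustration, the sketch below shows the standard iterative conversion between cells of an n x n grid (n a power of two) and positions along the Hilbert curve. The helper `trend_keys`, its quantization of consecutive value pairs (v[t], v[t+1]), and the parameters `lo`, `hi` and `n` are hypothetical assumptions for illustration only, not the authors' trend-sketch construction.

```python
"""Hilbert space-filling curve (SFC) indexing on an n x n grid.

Generic SFC sketch; n must be a power of two. The trend_keys helper is a
hypothetical illustration, not the paper's trend-sketch method.
"""


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> tuple[int, int]:
    """Rotate/flip a quadrant so the sub-curve has the expected orientation."""
    if ry == 0:
        if rx == 1:
            # Reflect the cell within the n x n block.
            x = n - 1 - x
            y = n - 1 - y
        # Transpose: swap x and y.
        x, y = y, x
    return x, y


def xy_to_hilbert(n: int, x: int, y: int) -> int:
    """Map grid cell (x, y), 0 <= x, y < n, to its position d on the curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant at this level holds s*s consecutive positions.
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_to_xy(n: int, d: int) -> tuple[int, int]:
    """Inverse mapping: curve position d (0 <= d < n*n) back to (x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def trend_keys(series: list[float], lo: float, hi: float, n: int = 16) -> list[int]:
    """Hypothetical: quantize each consecutive pair (v[t], v[t+1]) onto an
    n x n grid spanning [lo, hi] and return the Hilbert index of each pair."""

    def cell(v: float) -> int:
        # Clamp to [lo, hi], then scale into the integer range 0..n-1.
        v = min(max(v, lo), hi)
        return min(int((v - lo) / (hi - lo) * n), n - 1)

    return [xy_to_hilbert(n, cell(a), cell(b)) for a, b in zip(series, series[1:])]


if __name__ == "__main__":
    print(trend_keys([0.0, 1.0, 2.0, 1.5], lo=0.0, hi=4.0, n=4))
```

Consecutive curve positions always correspond to adjacent grid cells, so a contiguous range of keys covers a compact region of the grid; the converse holds only approximately, since some neighboring cells lie far apart on the curve. This locality is the usual reason for preferring a Hilbert curve over row-major ordering when multidimensional values are stored along one dimension.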
How to cite this article
Jingyu Chen, Ping Chen and Xian'gang Sheng, 2013. Trend Based Sketching for Massive Uncertain Time Series Clustering. Information Technology Journal, 12: 7280-7288.
DOI: 10.3923/itj.2013.7280.7288
URL: https://scialert.net/abstract/?doi=itj.2013.7280.7288