Zhang Yanjie
School of Computer Science and Technology, Yantai University, 264005, Shandong, China
Hu Zhanyi
Institute of Automation, Chinese Academy of Sciences, 100080, Beijing, China
Sun Limin
School of Computer Science and Technology, Yantai University, 264005, Shandong, China
ABSTRACT
With the development of microarray chip technology, more and more gene expression data are becoming available. Biclustering of gene expression data has proven to be an efficient way to discover the characteristic genes associated with specific diseases, and it has wide applications in other areas as well. Because the original data matrix usually contains many biclusters of different sizes, and not all of them play the same role, evaluating the significance of the detected biclusters is very important. Information Entropy (IE), a measure of the uncertainty in a random variable, is a central concept in information theory. In this study we propose a method that applies a self-defined IE as an index for evaluating all detected biclusters, so that the significance of each bicluster can be quantified. The number of biclusters retained as useful can thereby be greatly reduced while maintaining high recognition accuracy. Preliminary experimental results presented at the end of the study demonstrate the feasibility of the method.
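The abstract does not spell out the self-defined IE, so the following is only a minimal sketch of the general idea, assuming the standard Shannon entropy computed over the discretized expression levels inside a bicluster. The function names, the toy matrix, and the choice of discretized levels are all hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    # Sum of p * log2(1/p) over the observed levels, with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def bicluster_entropy(matrix, rows, cols):
    """Entropy of the cells selected by a bicluster (a row subset and a column subset).

    Assumption: `matrix` holds already discretized expression levels.
    This is plain Shannon entropy, not the article's self-defined IE.
    """
    cells = [matrix[r][c] for r in rows for c in cols]
    return shannon_entropy(cells)


# Hypothetical discretized expression matrix (rows: genes, columns: samples).
data = [
    [1, 1, 0, 2],
    [1, 1, 2, 0],
    [0, 2, 1, 1],
]

# A bicluster whose cells all share one level has zero entropy.
homogeneous = bicluster_entropy(data, rows=[0, 1], cols=[0, 1])

# A bicluster split evenly between two levels has an entropy of one bit.
mixed = bicluster_entropy(data, rows=[0, 1], cols=[2, 3])
```

Under this assumed index, a lower value indicates a more uniform bicluster, so detected biclusters could be ranked by it. Whether the article ranks them in this direction, or normalizes by bicluster size, is not stated in the abstract.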
How to cite this article
Zhang Yanjie, Hu Zhanyi and Sun Limin, 2013. Bicluster Significance Evaluation with the Application of Information Entropy. Information Technology Journal, 12: 7898-7901.
DOI: 10.3923/itj.2013.7898.7901
URL: https://scialert.net/abstract/?doi=itj.2013.7898.7901