Feature Selection in Microarray Data Using Entropy Information

Ali Reza Soltanian; Niloofar Rabiei; Fatemeh Bahreini

doi:10.15586/computationalbiology.2019.ch10

Published: Oct 31, 2019

Keywords:

data mining, entropy, genetics, microarray, systems biology

Ali Reza Soltanian

Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran

Niloofar Rabiei

Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran

Fatemeh Bahreini

Department of Molecular Medicine and Genetics, Faculty of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran

ABSTRACT

Researchers in the biological sciences and genetics routinely work with high-dimensional data such as microarray data, and analyzing and interpreting these data correctly is central to bioinformatics and systems biology. In such data, the number of variables (for example, genes) is many times greater than the number of samples. The dimension of the data must therefore be reduced as a first step, after which the analysis (for example, clustering) is performed on the reduced data. This process is called data summarization. There are various ways to summarize high-dimensional data, and the appropriate choice depends on the nature of the data. The aim of data summarization is to remove unnecessary features so that the data can be classified more accurately. Shannon’s entropy is a common tool for clustering genes in microarray data and selecting a set of disease-related genes. This chapter introduces and illustrates the statistical inference concepts of entropy in microarray data clustering, with the goal of selecting the most important genes associated with a disease.
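
To make the idea of entropy-based gene ranking concrete, the following is a minimal sketch and not the chapter's own procedure, which the abstract does not spell out. It assumes each gene's expression values are discretized into equal-width bins and that Shannon's entropy, H = -Σ p_i log2 p_i, is computed over the bin frequencies. The function names (`discretize`, `shannon_entropy`, `rank_genes`), the bin count, the toy matrix, and the choice to rank from highest to lowest entropy are all illustrative assumptions.

```python
import math
from collections import Counter


def discretize(values, n_bins=4):
    """Map expression values to equal-width bin indices (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant gene falls entirely into one bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def shannon_entropy(labels):
    """Shannon entropy in bits of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def rank_genes(expression, n_bins=4):
    """Return (gene indices sorted by decreasing entropy, entropy per gene)."""
    entropies = [shannon_entropy(discretize(row, n_bins)) for row in expression]
    order = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return order, entropies


# Toy data: 3 hypothetical genes (rows) measured on 8 samples (columns).
expression = [
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # constant across samples
    [0.0, 0.0, 0.0, 0.0, 3.0, 3.0, 3.0, 3.0],  # two expression levels
    [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0],  # four expression levels
]
order, entropies = rank_genes(expression)
```

In this toy case the constant gene carries no information, while the gene spread evenly over four levels has the highest entropy. Whether high or low entropy marks a gene as relevant depends on the selection criterion in use, so the ranking direction here should be read as a placeholder.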