An Efficient Semiparametric Approach for Marker Gene Selection and Patient Classification
The advancement of microarray technology has greatly facilitated research on gene-expression-based classification of patient samples. In cancer research, for example, microarray gene expression data have been used to classify cancers and tumors. When a study focuses on only two classes, such as two different cancer types, we propose a two-sample semiparametric model for the distributions of gene expression levels in the two classes. To estimate the parameters, we consider both the maximum semiparametric likelihood estimate (MLE) and the minimum Hellinger distance estimate (MHDE). For each gene, a Wald statistic is constructed from either the MLE or the MHDE, and a significance test is then performed on that gene. We use the idea of a weighted sum of misclassification rates to develop a novel classification model that involves only the previously identified significant genes. To assess the usefulness of the proposed method, we take a predictive approach. We apply it to the acute leukemia data of [1], using a training set to build the classification model and a test set to evaluate its accuracy.
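The abstract does not state the model explicitly. The following is therefore only a minimal Python sketch under stated assumptions, not the authors' method. It assumes the two-sample semiparametric model is the exponential tilt (density ratio) model g(x) = exp(α + βx) f(x) for each gene, where f and g are the class-0 and class-1 expression densities. It estimates β through the known equivalence between this model's maximum semiparametric likelihood estimate and logistic regression on the pooled sample, forms the per-gene Wald statistic β̂/SE(β̂), and classifies a new sample with the rule that minimizes w0·P(error | class 0) + w1·P(error | class 1). That rule additionally assumes the selected genes are independent. The MHDE route is not sketched. The function names `fit_tilt`, `select_genes` and `classify`, the 1.96 cutoff, and the weights `w0`, `w1` are all hypothetical.

```python
import numpy as np


def fit_tilt(x0, x1, n_iter=50, tol=1e-10):
    """Fit the ASSUMED exponential tilt model g(x) = exp(alpha + beta*x) f(x)
    for one gene via logistic regression on the pooled sample.

    x0: expression values from class 0 (density f); x1: from class 1 (density g).
    Returns (alpha, beta, se_beta): alpha is the density-ratio intercept,
    i.e. the logistic intercept minus log(n1/n0).
    """
    x = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])
    X = np.column_stack([np.ones_like(x), x])      # design matrix [1, x]
    theta = np.zeros(2)                             # (logistic intercept, beta)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))        # fitted P(class 1 | x)
        grad = X.T @ (y - p)                        # score vector
        info = X.T @ (X * (p * (1 - p))[:, None])   # Fisher information matrix
        step = np.linalg.solve(info, grad)          # Newton-Raphson step
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    info = X.T @ (X * (p * (1 - p))[:, None])
    se_beta = np.sqrt(np.linalg.inv(info)[1, 1])    # SE of beta from inverse information
    alpha = theta[0] - np.log(len(x1) / len(x0))    # remove the sampling-ratio offset
    return alpha, theta[1], se_beta


def select_genes(X0, X1, z_crit=1.96):
    """Per-gene Wald test of H0: beta = 0 (hypothetical cutoff z_crit).

    X0, X1: arrays of shape (samples, genes) for class 0 and class 1.
    Returns the indices of significant genes and their fitted (alpha, beta).
    """
    selected, params = [], []
    for j in range(X0.shape[1]):
        alpha, beta, se = fit_tilt(X0[:, j], X1[:, j])
        if abs(beta / se) > z_crit:
            selected.append(j)
            params.append((alpha, beta))
    return selected, params


def classify(x_new, selected, params, w0=1.0, w1=1.0):
    """Assign class 1 when the summed log density ratio over the selected genes
    exceeds log(w0 / w1). Under the tilt model and gene independence (both
    assumptions), this minimizes w0*P(error | class 0) + w1*P(error | class 1).
    """
    score = sum(a + b * x_new[j] for j, (a, b) in zip(selected, params))
    return int(score > np.log(w0 / w1))
```

A typical use fits `select_genes` on the training arrays and then calls `classify` on each test sample. Raising `w0` relative to `w1` makes misclassifying class-0 samples more costly, so fewer samples are assigned to class 1. With thousands of genes, a multiplicity-adjusted cutoff would be more appropriate than 1.96. Genes that perfectly separate the two training classes also need separate handling, because the logistic MLE diverges for them.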
The full text is available in Biostatistics and Biometrics Open Access Journal (Juniper Publishers).












