Int J Biochem Mol Biol 2010;1(1):51-68

Original Article
Computing gene expression data with a knowledge-based gene clustering approach

Bruce A. Rosa, Sookyung Oh, Beronda L. Montgomery, Jin Chen, Wensheng Qin

Biorefining Research Initiative and Department of Biology, Lakehead University, 955 Oliver Road, Thunder Bay, ON P7B 5E1 Canada; MSU-DOE
Plant Research Laboratory, Michigan State University, 106 Plant Biology Building, East Lansing, MI 48824 USA; Department of Biochemistry
and Molecular Biology, Michigan State University, 322 Plant Biology Building, East Lansing, MI 48824 USA; Department of Computer Sciences
and Engineering, Michigan State University, S232 Plant Biology Building, East Lansing, MI 48824 USA.

Received April 25, 2010, accepted June 11, 2010, available online June 15, 2010; published August 1, 2010

Abstract: Computational analysis of gene expression data gathered in microarray experiments can be used to identify the functions
of previously unstudied genes. While obtaining the expression data is not difficult, interpreting the datasets and extracting information
from them is challenging. In this study, a knowledge-based approach, which identifies and retains important functional genes before
filtering on variability and fold-change differences, was used to study light regulation. Two clustering methods were applied to the filtered
datasets, and the clusters containing a key light regulatory gene were located. The genes common to both of these clusters were identified,
and the genes in this common cluster were ranked by their coexpression with the key gene. This process was repeated for 11 key genes in 3
treatment combinations. The initial filtering method reduced the dataset size from 22,814 probes to an average of 1,134 genes, and the
resulting common cluster lists contained an average of only 14 genes. These common cluster lists achieved higher gene enrichment scores
than the two individual clustering methods. In addition, the filtering method increased the proportion of light-responsive genes in the
dataset from 1.8% to 15.2%, and the cluster lists increased this proportion to 18.4%. Because these common cluster lists are relatively short
compared with gene groups generated through typical clustering methods or coexpression networks, they narrow the search for novel
functional genes while increasing the likelihood that the candidates are biologically relevant. (IJBMB1004004).
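
The workflow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the thresholds (`min_var`, `min_fold`), the use of Pearson correlation as the coexpression measure, and the toy expression values are all assumptions made for illustration. Because the abstract does not name the two clustering methods, their outputs are represented here by two hypothetical, hand-written label dictionaries.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def filter_genes(expr, protected, min_var, min_fold):
    """Keep knowledge-based (protected) genes unconditionally; keep any other
    gene only if it passes both the variability and the fold-change cutoff.
    Assumes strictly positive expression values (hypothetical thresholds)."""
    kept = {}
    for gene, values in expr.items():
        if gene in protected:
            kept[gene] = values
            continue
        fold = max(values) / min(values)
        if pvariance(values) >= min_var and fold >= min_fold:
            kept[gene] = values
    return kept


def common_cluster(labels_a, labels_b, key):
    """Genes that share the key gene's cluster under both clusterings."""
    in_a = {g for g, c in labels_a.items() if c == labels_a[key]}
    in_b = {g for g, c in labels_b.items() if c == labels_b[key]}
    return (in_a & in_b) - {key}


def rank_by_coexpression(expr, key, genes):
    """Order genes by decreasing correlation with the key gene."""
    return sorted(genes, key=lambda g: pearson(expr[key], expr[g]), reverse=True)


# Toy data: invented values, not taken from the study.
expr = {
    "KEY": [1.0, 2.0, 4.0, 8.0],
    "G1": [1.1, 2.1, 3.9, 8.2],
    "G2": [2.0, 2.5, 5.0, 6.0],
    "G3": [8.0, 4.0, 2.0, 1.0],
    "FLAT": [3.0, 3.1, 3.0, 3.1],
}
filtered = filter_genes(expr, protected={"KEY"}, min_var=0.5, min_fold=2.0)

# Hypothetical cluster labels from two unspecified clustering methods.
labels_a = {"KEY": 0, "G1": 0, "G2": 0, "G3": 1}
labels_b = {"KEY": 1, "G1": 1, "G2": 1, "G3": 1}

shared = common_cluster(labels_a, labels_b, "KEY")
ranked = rank_by_coexpression(filtered, "KEY", shared)
print(ranked)  # ['G1', 'G2']
```

In this toy run, `FLAT` is removed by the filter, `G3` is dropped because it shares the key gene's cluster under only one of the two labelings, and the remaining genes are ordered by their correlation with `KEY`.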

Keywords: Bioinformatics, Arabidopsis, microarray, clustering, data mining, light regulation


Address all correspondence to:
Jin Chen, PhD
MSU-DOE Plant Research Laboratory
Michigan State University
106 Plant Biology Building
East Lansing, MI
48824 USA
jinchen@msu.edu

Or

Wensheng Qin, PhD
Biorefining Research Initiative and Department of Biology
Lakehead University
955 Oliver Road
Thunder Bay, ON
P7B 5E1 Canada