A variety of statistical techniques have been developed for analyzing high-throughput biological data sets by incorporating prior knowledge about the entities being measured. Such techniques are most effective when provided with high-quality "curated" knowledge: knowledge that is, or can be, represented in RDF. In the analysis of DNA microarray data sets, knowledge of the functions of genes and proteins and of their interactions is of particular interest.
An example of such an algorithm is activity center analysis, developed by Joël Pradines. In this method, an experimental microarray data set is used as a probe into a network of protein/protein (and protein/gene) interactions culled from the biological literature. Regions of the interaction network "light up" according to the statistical significance not of individual gene measurements but of the combined measurements of small neighborhoods in the network of prior knowledge. The resulting network-based activity values can shed much more light on the biological processes active in the experimental preparation than the individual data values for single genes would.
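To make the neighborhood idea concrete, here is a minimal Python sketch. It is not Pradines' published procedure. It assumes each gene already has a z-score from the microarray experiment, defines a neighborhood as a gene plus its direct interaction partners, and combines the member z-scores with Stouffer's method, which treats them as independent (an assumption real expression data rarely satisfy). The function names `build_adjacency` and `neighborhood_scores` are hypothetical.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from (gene_a, gene_b) interaction pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def neighborhood_scores(edges, gene_z):
    """Score each gene's one-step neighborhood with Stouffer's combined z.

    gene_z maps gene -> per-gene z-score from the experiment; genes without a
    measurement are left out of the neighborhoods that contain them.
    Returns {center: (combined_z, two_sided_p, n_measured_members)}.
    """
    adj = build_adjacency(edges)
    results = {}
    for center, partners in adj.items():
        # Neighborhood = the center gene plus its direct interaction partners.
        members = [g for g in partners | {center} if g in gene_z]
        if not members:
            continue
        k = len(members)
        # Stouffer's method: sum of z-scores divided by sqrt(k); assumes independence.
        combined = sum(gene_z[g] for g in members) / math.sqrt(k)
        # Two-sided p-value under a standard normal null.
        p = math.erfc(abs(combined) / math.sqrt(2))
        results[center] = (combined, p, k)
    return results


if __name__ == "__main__":
    # Toy data for illustration only.
    edges = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")]
    gene_z = {"A": 1.5, "B": 1.4, "C": 1.6, "D": 1.3, "E": 0.2, "F": -0.1}
    for center, (z, p, k) in sorted(neighborhood_scores(edges, gene_z).items()):
        print(f"{center}: combined z = {z:.2f}, p = {p:.4f}, members = {k}")
```

In this toy data, genes A through D each have a z-score between 1.3 and 1.6, so none is significant on its own (each two-sided p is above 0.1). The neighborhood centered on A, however, reaches a combined z of 2.9 (p ≈ 0.004), which is the kind of "lighting up" the method relies on.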
The method works best when given information combined from multiple interaction sources. Rendering interaction sources in RDF for use in the Neurocommons and the Semantic Web is expected to simplify the preparation of prior knowledge for methods such as activity center analysis. Rendering metadata for microarray data sets from sources such as GEO will also make it easier to find data sets for use in meta-analyses and other kinds of studies.
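As a rough illustration of combining interaction sources in RDF, the sketch below merges undirected gene pairs from several sources and serializes them as N-Triples, recording which source asserted each interaction. The `http://example.org/` namespace, the `participant` and `assertedBy` predicates, the `Interaction` class, and the source and gene names are placeholders invented for this example, not Neurocommons vocabulary or real data; only `rdf:type` is a standard term.

```python
from urllib.parse import quote

# Placeholder namespace for illustration only; not a real Neurocommons vocabulary.
EX = "http://example.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def iri(kind, local):
    """Build an IRI such as <http://example.org/gene/G1>, percent-encoding the local name."""
    return f"<{EX}{kind}/{quote(local, safe='')}>"


# Hypothetical class and predicates.
INTERACTION = iri("vocab", "Interaction")
PARTICIPANT = iri("vocab", "participant")
ASSERTED_BY = iri("vocab", "assertedBy")


def merge_interactions(sources):
    """Merge {source_name: [(gene_a, gene_b), ...]} into {(gene_x, gene_y): {source_names}}.

    Pairs are treated as undirected, so (A, B) from one source and (B, A)
    from another collapse into a single interaction listing both sources.
    """
    merged = {}
    for name, pairs in sources.items():
        for a, b in pairs:
            key = tuple(sorted((a, b)))
            merged.setdefault(key, set()).add(name)
    return merged


def to_ntriples(merged):
    """Serialize merged interactions as N-Triples, one interaction node per gene pair."""
    lines = []
    for (a, b), names in sorted(merged.items()):
        node = iri("interaction", f"{a}_{b}")
        lines.append(f"{node} {RDF_TYPE} {INTERACTION} .")
        for gene in sorted({a, b}):
            lines.append(f"{node} {PARTICIPANT} {iri('gene', gene)} .")
        for name in sorted(names):
            lines.append(f"{node} {ASSERTED_BY} {iri('source', name)} .")
    return "\n".join(lines)


if __name__ == "__main__":
    sources = {
        "source1": [("G1", "G2"), ("G1", "G3")],
        "source2": [("G2", "G1")],
    }
    print(to_ntriples(merge_interactions(sources)))
```

Merging on the unordered gene pair is one possible design choice; a source that distinguishes directed regulation from physical binding would need the direction preserved. Because the output is plain N-Triples, it can be loaded into an RDF store alongside triples from other sources.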
Joël Pradines et al.
Detection of centers of activity in transcript profiling data.
Journal of Biopharmaceutical Statistics 14:710-712, 2004.