## "DENSE: Efficient and Prior Knowledge-driven Discovery of Phenotype-associated Protein Functional Modules"

Abstract: Identifying cellular subsystems that are involved in the expression of a target phenotype has been a very active research area for the past several years. In this paper, a cellular subsystem refers to a group of genes (or proteins) that interact and carry out a common function in the cell. Most studies identify genes associated with a phenotype on the basis of some statistical bias; others have extended these statistical methods to analyze functional modules and biological pathways for phenotype-relatedness. However, a biologist often has a specific question in mind while performing such an analysis, and most of the subsystems obtained by existing methods might be largely irrelevant to the question at hand. Arguably, it would be valuable to incorporate the biologist's knowledge about the phenotype into the algorithm. This way, the resulting subsystems would be expected not only to be related to the target phenotype but also to contain information that the biologist is likely to be interested in. In this paper, we introduce a fast and theoretically guaranteed method called DENSE (Dense and ENriched Subgraph Enumeration) that takes as input a biologist's prior knowledge, in the form of a set of query proteins, and identifies all the dense functional modules in a biological network that contain some of the query proteins. The density (in terms of the number of network edges) and the enrichment (the number of query proteins in the resulting functional module) can be controlled via two parameters, $\gamma$ and $\mu$, respectively. The algorithm has been applied to the protein functional association network of *Clostridium acetobutylicum* ATCC 824, a hydrogen-producing, acid-tolerant organism. It was able to verify relationships known to exist in the literature and also to identify some previously unknown relationships, including those with regulatory and signaling functions. Additionally, we were able to hypothesize that some uncharacterized proteins are likely associated with the target phenotype.
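To make the roles of $\gamma$ and $\mu$ concrete, here is a minimal brute-force sketch in Python. It assumes, as a hypothetical reading of the abstract, that $\gamma$ is a lower bound on edge density (edges present divided by possible vertex pairs) and that $\mu$ is a lower bound on the fraction of module members drawn from the query set. The function names, the adjacency-set graph type, and the toy network are illustrative only and do not come from the paper. Because it checks every vertex subset up to a size cap, it is exponential and is not the fast, theoretically guaranteed DENSE algorithm itself.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation: each protein maps to the set of its neighbours.
Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build a symmetric adjacency-set graph from an undirected edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def edge_density(graph: Graph, module: FrozenSet[str]) -> float:
    """Assumed gamma measure: fraction of vertex pairs in `module` joined by an edge."""
    n = len(module)
    if n < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(sorted(module), 2) if v in graph.get(u, set()))
    return edges / (n * (n - 1) / 2)


def enrichment(module: FrozenSet[str], query: Set[str]) -> float:
    """Assumed mu measure: fraction of module members that are query proteins."""
    return len(module & query) / len(module)


def dense_enriched_modules(
    graph: Graph, query: Set[str], gamma: float, mu: float, max_size: int
) -> List[FrozenSet[str]]:
    """Exhaustively list maximal modules meeting both thresholds (illustration only)."""
    vertices = sorted(graph)
    hits: List[FrozenSet[str]] = []
    for k in range(2, max_size + 1):
        for combo in combinations(vertices, k):
            module = frozenset(combo)
            if edge_density(graph, module) >= gamma and enrichment(module, query) >= mu:
                hits.append(module)
    # Drop any qualifying module that is a strict subset of another qualifying module.
    return [m for m in hits if not any(m < other for other in hits)]


if __name__ == "__main__":
    toy = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
    print(dense_enriched_modules(toy, {"A", "B"}, gamma=1.0, mu=0.5, max_size=4))
```

On this toy network with query proteins {A, B}, $\gamma = 1.0$ and $\mu = 0.5$, the pairs AB, AC and BC all qualify, but only the triangle {A, B, C} survives the maximality filter; raising $\mu$ or $\gamma$ shrinks the result, while lowering them admits sparser or less query-rich modules. A practical method would prune the search space instead of testing every subset.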

Authors: William Hendrix, Andrea M Rocha, Kanchana Padmanabhan, Alok Choudhary, Kathleen Scott, James R Mihelcic, Nagiza F Samatova

Acknowledgement: This work was supported in part by the U.S. Department of Energy, Office of Science, the Office of Advanced Scientific Computing Research (ASCR) and the Office of Biological and Environmental Research (BER), and by the U.S. National Science Foundation (Expeditions in Computing). The work by A.M.R. was supported by the Delores Auzenne Fellowship and the Alfred P. Sloan Minority PhD Scholarship Program. The work of W.H. and A.C. was partially supported by NSF award numbers OCI-0724599, CNS-0830927, CCF-0621443, CCF-0833131, CCF-0938000, CCF-1029166, and CCF-1043085, and in part by DOE grants DE-FC02-07ER25808, DE-FG02-08ER25848, DE-SC0001283, DE-SC0005309, and DE-SC0005340. Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the U.S. D.O.E. under contract no. DE-AC05-00OR22725.

Citation: DENSE: efficient and prior knowledge-driven discovery of phenotype-associated protein functional modules. William Hendrix, Andrea M Rocha, Kanchana Padmanabhan, Alok Choudhary, Kathleen Scott, James R Mihelcic, Nagiza F Samatova. BMC Systems Biology, volume number and article number to be determined.

### Main software

Author: William Hendrix