Using Python to solve problems in bioinformatics

Calculating gene regulatory networks from gene expression data

The C extension module Systems Biology for Python can be used to calculate a gene regulatory network in terms of a linear system of stochastic differential equations. Such a system can be regarded as a generalization of a dynamic Bayesian network, in which unequal time intervals between gene expression measurements are allowed. This software package is described in:

De Hoon, M.; Ott, S.; Imoto, S.; Miyano, S.; "Assessing the biological validity of gene regulatory networks inferred from time course gene expression data". Submitted to Bioinformatics, 2003.

To install this software package, download the source distribution or the Windows installer for Python 2.2 or for Python 2.3. As of version 0.40, this extension module no longer depends on the GNU Scientific Library. A manual is not yet available; the instructions on this page are currently the only documentation.

To infer a gene regulatory network from gene expression data, first import the module with:
    from sysbio import *
Then call the function:
    findnetwork(data, mask=None, time=None, criterion='AIC', maxnparents=None)
Here data is an ngenes x ndata matrix containing the gene expression ratios; mask is an ngenes x ndata matrix of 1's and 0's indicating which elements of data are present (1) or missing (0); time is a vector with ndata elements specifying the time points at which the measurements were made; criterion is the criterion used to determine the number of parents of each gene; and maxnparents is the maximum number of parents a gene can have in the network. The value of maxnparents should be less than ndata-1. The parameter criterion can be 'AIC', 'BIC', or 'ML', for Akaike's Information Criterion, the Bayesian Information Criterion, or the constrained Maximum Likelihood method, respectively. With the AIC and BIC, each gene gets at most maxnparents parents; with the constrained Maximum Likelihood method, each gene gets exactly maxnparents parents.
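The argument layout described above can be sketched as follows. This is only an illustration under stated assumptions: the sample numbers, the helper prepare_inputs, the wrapper infer_network, and the fill value used for missing entries are all hypothetical, and it is assumed (not documented here) that findnetwork accepts nested Python lists and ignores entries whose mask is 0. If the module requires array objects, convert the lists first.

```python
# Hypothetical expression ratios for 3 genes at 5 time points.
# None marks a missing measurement.
raw = [
    [1.0, 1.2, None, 1.9, 2.4],
    [0.8, 0.7, 0.6, None, 0.4],
    [1.1, 1.0, 1.3, 1.2, 1.5],
]
# Measurement times; unequal intervals are allowed by the model.
time = [0.0, 1.0, 2.0, 4.0, 8.0]


def prepare_inputs(raw, fill=0.0):
    """Split raw measurements into a data matrix and a 1/0 mask (hypothetical helper)."""
    # Missing entries get a placeholder value; the mask marks them as absent.
    data = [[fill if x is None else x for x in row] for row in raw]
    mask = [[0 if x is None else 1 for x in row] for row in raw]
    return data, mask


data, mask = prepare_inputs(raw)
ngenes = len(data)
ndata = len(time)

# Both matrices must be ngenes x ndata.
assert all(len(row) == ndata for row in data)
assert all(len(row) == ndata for row in mask)

# Keep maxnparents strictly below ndata - 1, as required above.
maxnparents = min(2, ndata - 2)


def infer_network(criterion='BIC'):
    """Call findnetwork with the prepared inputs (requires sysbio to be installed)."""
    from sysbio import findnetwork
    return findnetwork(data, mask=mask, time=time,
                       criterion=criterion, maxnparents=maxnparents)
```

With the module installed, infer_network() would return the result tuple of findnetwork; without it, the call raises ImportError and the prepared inputs can still be inspected.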

This function returns the tuple (parents, values, score). Here parents is a list with ngenes elements, in which element i is a list containing the indices of the parent genes of gene i in the graph; values has the same shape as parents and contains the regulatory strength with which each parent affects the gene; and score contains the AIC, BIC, or likelihood score. The returned score is an array holding the score for each gene separately; the overall network score is the sum of this array.
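As a minimal sketch of how the returned tuple might be used, the code below flattens parents and values into a list of edges and sums the per-gene scores. The sample values are made up to match the shapes described above and are not output of the module; the helper network_edges is hypothetical.

```python
# Made-up result for a 3-gene network, in the documented shape:
# parents[i] lists the parent genes of gene i, values[i] the matching strengths.
parents = [[1], [0, 2], []]
values = [[0.5], [-0.3, 0.8], []]
score = [10.0, 12.5, 7.25]  # one score per gene


def network_edges(parents, values):
    """Return (parent, child, strength) triples for every edge (hypothetical helper)."""
    edges = []
    for child, (plist, vlist) in enumerate(zip(parents, values)):
        for parent, strength in zip(plist, vlist):
            edges.append((parent, child, strength))
    return edges


edges = network_edges(parents, values)

# The overall network score is the sum of the per-gene scores.
total_score = sum(score)

for parent, child, strength in edges:
    print("gene %d -> gene %d (strength %.2f)" % (parent, child, strength))
print("network score: %.2f" % total_score)
```

A gene with an empty parent list, such as gene 2 here, simply contributes no edges.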

The Systems Biology module was released under the Python License.




