Using Python to solve problems in bioinformatics

Python is an open-source, general-purpose scripting language that has proven well suited to scientific computing. With string-processing capabilities similar to those of Perl, Python together with Numerical Python has been proposed as a replacement for commercial software packages such as Matlab. Not surprisingly, Python is becoming increasingly popular in bioinformatics. The Python in bioinformatics page will get you started with Python in a bioinformatics context and describes some Python extension modules developed at our laboratory.

Open Source Clustering Software

Clustering techniques are widely used in gene expression data analysis. Grouping genes by the similarity of their gene expression profiles can reveal genes with similar functions, which may provide clues to the function of genes whose role is currently unknown. The Open Source Clustering Software developed at our laboratory is built around a C library of the most commonly used clustering algorithms. This library was used to create enhanced versions of Michael Eisen's Cluster/TreeView program for the Windows, Mac OS X, and Linux/Unix platforms. The routines in the C Clustering Library can also be accessed from the Python scripting language, which allows these algorithms to be used more flexibly.
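The grouping step can be illustrated with a small pure-Python k-means sketch. It does not use the C Clustering Library or its Python interface; it only shows the idea of assigning each gene to the nearest cluster centroid. Here the distance is the Pearson correlation distance (1 minus the correlation), which groups genes whose profiles rise and fall together regardless of their absolute expression level. The function names, the deterministic farthest-point initialization (chosen for reproducibility, not necessarily how the library initializes), and the treatment of constant profiles are assumptions made for this example.

```python
import math


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two expression profiles (range 0..2)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption: a constant profile is treated as uncorrelated.
        return 1.0
    return 1.0 - sxy / math.sqrt(sxx * syy)


def kmeans(profiles, k, n_iter=100):
    """Hypothetical k-means sketch; returns a cluster index for each profile."""
    # Deterministic farthest-point initialization (illustrative assumption).
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(
            profiles,
            key=lambda p: min(pearson_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    assignment = None
    for _ in range(n_iter):
        # Assign every gene to the closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: pearson_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        # Recompute each centroid as the mean profile of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

For example, with three steadily rising profiles followed by three steadily falling ones, `kmeans(profiles, 2)` places the rising genes in one cluster and the falling genes in the other, even though their expression levels differ.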

Supplementary information

De Hoon, M.; Ott, S.; Imoto, S.; Miyano, S.: "Assessing the biological validity of gene regulatory networks inferred from time course gene expression data". Submitted to Bioinformatics.
We have developed a C extension module for the Python scripting language to fit a stochastic differential equation model to time course gene expression data. This model is a generalization of dynamic Bayesian networks, allowing arbitrary time intervals between measurements. The C extension module can be downloaded from our Python in bioinformatics page.
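To illustrate why a continuous-time model can handle arbitrary time intervals, the following pure-Python sketch estimates the drift matrix A of a linear stochastic differential equation dx = A x dt + noise from irregularly sampled data. It is a toy under stated assumptions, not the model or estimator implemented in our C extension module: it uses the Euler approximation, in which each observed change x[k+1] - x[k] is compared with A x[k] multiplied by its own interval t[k+1] - t[k], with noise variance proportional to that interval. Under these assumptions the maximum-likelihood estimate is A = (sum of dx x^T)(sum of dt x x^T)^(-1). The function names are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def estimate_drift(times, states):
    """Estimate A in dx = A x dt + noise from samples at arbitrary times.

    Euler approximation: x[k+1] - x[k] ~ A x[k] dt_k, noise variance ~ dt_k.
    Maximizing the likelihood gives A M = R with
    M = sum_k dt_k x_k x_k^T and R = sum_k dx_k x_k^T.
    """
    n = len(states[0])
    m = [[0.0] * n for _ in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]  # each interval may differ
        x = states[k]
        dx = [b - a for a, b in zip(states[k], states[k + 1])]
        for i in range(n):
            for j in range(n):
                m[i][j] += dt * x[i] * x[j]
                r[i][j] += dx[i] * x[j]
    # M is symmetric, so row i of A solves M a_i = R[i].
    return [solve(m, r[i]) for i in range(n)]
```

Applied to a noise-free Euler trajectory generated from a known A with unequal sampling intervals, `estimate_drift` recovers A up to rounding error. With noisy data the estimate is only approximate, and with few time points relative to the number of genes the system can be poorly determined.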

De Hoon, M.J.L.; Imoto, S.; Miyano, S.: "Statistical analysis of a small set of time-ordered gene expression data using linear splines". Bioinformatics, 18: 1477-1485 (2002). The algorithm to fit a linear spline is available as an extension module for Python (source, Windows installer). This software package (patent pending) is free of charge for academic use only.
An older demonstration version of this software is still available. The demonstration version shows an example Excel spreadsheet calling the routine using Visual Basic. To optimize the execution speed of the linear spline calculation, the core numerical routines are written in C/Fortran and stored in the Dynamic-Link Library linspline.dll. Please see the readme file for more information.
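The linear spline fit described above can be illustrated generically: once the knot positions are fixed, a continuous piecewise-linear curve can be fitted by ordinary least squares by expanding each time point in the basis 1, t, and max(0, t - k) for every knot k. The pure-Python sketch below does this. The knot positions are supplied by hand, which is an assumption; the knot placement and statistical analysis described in the paper are not reproduced here, and the function names are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_linear_spline(times, values, knots):
    """Least-squares continuous piecewise-linear fit with fixed interior knots.

    Uses the hinge basis 1, t, max(0, t - k); the fitted curve is linear
    between knots and continuous at every knot. Returns the fitted function.
    """
    def basis(t):
        return [1.0, t] + [max(0.0, t - k) for k in knots]

    rows = [basis(t) for t in times]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(p)]
    beta = solve(xtx, xty)

    def spline(t):
        return sum(b * f for b, f in zip(beta, basis(t)))

    return spline
```

For example, with time points 0 to 6, values rising from 0 to 3 and falling back to 0, and a single knot at t = 3, the fitted spline reproduces the data exactly and evaluates to 1.5 at t = 4.5. The number of time points must exceed the number of basis functions (two plus the number of knots) for the fit to be determined.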
