Using Python to solve problems in bioinformatics
Getting started with Python
Scripting languages such as Perl and Python are commonly used in bioinformatics for tasks such as data management, file parsing, string processing, and interaction with databases.
In addition, Python offers excellent numerical support through the Numerical Python package, which provides functionality similar to the commercial software packages Matlab and S.
The Biopython Project provides freely available Python tools for life science research, focusing on file parsing for bioinformatics, interaction with online resources, interfaces to commonly used bioinformatics programs, and sequence analysis tools. On this page, I will focus on numerical analysis using Python, particularly with respect to gene expression data analysis.
To get started with Python:
- First, of course, you will need to download Python. An installer program is available for Windows, and a source distribution is available for all other platforms.
To be able to use the arrow keys to retrieve previous commands, make sure that the readline library is installed before installing Python. The readline library should show up near the end of the output when running configure:
[...]
checking for rl_pre_input_hook in -lreadline... yes
checking for rl_completion_matches in -lreadline... yes
[...]
- Next, you will need to install Numerical Python (NumPy). This will give you access to mathematical functions and matrix handling. Again, an installer program is provided for Windows, and a source distribution for all other platforms. To download Numerical Python, go to its project page and look for the NumPy package (don't use the "Old Numarray" or "Old Numeric" packages; they are quite old by now). Clicking on "Download" will take you to the project file list, where you will see three files highlighted in pink. Download the file appropriate for your platform.
On Windows computers, you'll just need to run the installer program. For other computers, unpack the source distribution, and run
python setup.py install
- Unfortunately, NumPy does not include special functions. For special functions (including the basic probability density functions used in hypothesis testing in statistics), you can download the transcendental extension module.
- For graphical output and scientific plotting, use matplotlib.
And you're done.
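To check that the steps above worked, a short script along the following lines can report which modules Python is able to find. This is only a sketch: it assumes Python 3, it assumes the packages are importable under the names readline, numpy, and matplotlib, and the helper name check_modules is made up for this illustration. It looks the modules up without importing them, so it also runs on a machine where some of them are missing.

```python
import importlib.util

# Module names assumed from the installation steps above.
# Note: "readline" is usually absent on stock Windows builds of Python,
# so a missing entry there does not necessarily indicate a problem.
MODULES = ["readline", "numpy", "matplotlib"]


def check_modules(names):
    """Return a dict mapping each module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one line per module, marking those that could not be located.
    for name, found in check_modules(MODULES).items():
        status = "found" if found else "MISSING"
        print(f"{name:12s} {status}")
```

If a module is reported as missing, repeat the corresponding installation step above; for readline on non-Windows platforms, that means installing the readline library and then rebuilding Python.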