SOM (Cluster 3.0 for Windows, Mac OS X, Linux, Unix)

4.3 Self-Organizing Maps

Self-Organizing Maps (SOMs) is a method of cluster analysis that are somewhat related to k-means clustering. SOMs were invented in by Teuvo Kohonen in the early 1980s, and have recently been used in genomic analysis (see Chu 1998, Tamayo 1999 and Golub 1999 in references). The Tamayo paper contains a simple explanation of the methods. A more detailed description is available in the book by Kohonen, Self-Organizing Maps, 1997.

The current implementation varies slightly from that of Tamayo et al., in that it restricts the analysis one-dimensional SOMs along each axis, as opposed to a two-dimensional network. The one-dimensional SOM is used to reorder the elements on whichever axes are selected. The result is similar to the result of k-means clustering, except that, unlike in k-means clustering, the nodes in a SOM are ordered. This tends to result in a relatively smooth transition between groups.

The options for SOMs are

whether or not you will organize each axis;
the number of nodes for each axis (the default is n^1/4, where n is the number of items; the total number of clusters is then equal to the square root of the number of items);
the number of iterations to be run.

The output file is of the form JobName_SOM_GXg-Yg_AXa-Ya.txt, where GXg-Yg is included if genes were organized, and AXg-Yg is included if arrays were organized. X and Y represent the dimensions of the corresponding SOM. Up to two additional files (.gnf and .anf) are written containing the vectors for the SOM nodes.

In previous versions of Cluster, only one-dimensional SOMs were supported. The current version of the Cluster introduces two-dimensional SOMs.

SOMs and hierarchical clustering: Our original use of SOMs (see Chu et al., 1998) was motivated by the desire to take advantage of the properties of both SOMs and hierarchical clustering. This was accomplished by first computing a one dimensional SOM, and using the ordering from the SOM to guide the flipping of nodes in the hierarchical tree. In Cluster, after a SOM is run on a dataset, the GORDER and/or EORDER fields are set to the ordering from the SOM so that, for subsequent hierarchical clustering runs, the output ordering will come as close as possible to the ordering in the SOM without violating the structure of the tree.