DataMatrix

DataMatrix is an implementation of R’s data.frame in Python. It lets you load text files and treat them as if they were tables, referring to single columns or rows with a dictionary-like syntax. I wrote this module because I had to deal with microarray expression data and there was nothing available, short of using R, which I don’t like as a programming language.

Download and installation

Current version: 0.9
Requirements: Python 2.5 or later (2.x series)

DataMatrix is a pure Python module, distributed either as a Windows installer or as a source package. Users of the source package can install it by decompressing the archive, moving inside the directory and typing

python setup.py install

as root (or prefix the command with “sudo” if you are on a distribution that does not enable the root account).

You can also get the source code via git, as there is a public repository hosted on gitorious.org. Check out the sources with

git clone git://gitorious.org/datamatrix/datamatrix.git

Lastly, an RPM package for openSUSE 11.2 is available in the home:luca_b Build Service project:

sudo zypper install http://download.opensuse.org/repositories/home:/luca_b/openSUSE_11.2/noarch/python-datamatrix-0.9-5.1.noarch.rpm

Usage

DataMatrix works with files, or file-like objects that are supported by Python’s own csv module. A typical invocation would be:

import datamatrix

# Any file or file-like object accepted by the csv module works here
fh = open("somefile.txt", "r")
data = datamatrix.DataMatrix(fh, row_names=1)

The row_names parameter tells the initializer whether there is a header, and on which line. If it is present, that line number will be used as the identifier (I haven’t tested it with row numbers other than the first, though). You can pass other parameters to describe the format of the text file: these are passed to the underlying csv module, so have a look at its documentation for more information.
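Since the extra parameters end up in the csv module, the options that module understands are the ones that matter. The following is a minimal sketch using only the standard library: it does not import datamatrix, the sample data and the parse helper are made up for illustration, and the exact way DataMatrix forwards its keyword arguments is an assumption.

```python
import csv
import io  # Python 3; on the 2.x series StringIO.StringIO plays this role

# Hypothetical tab-separated sample; the first line holds the column names.
SAMPLE = "probe\tcontrol\ttreated\ng1\t1.5\t2.5\ng2\t0.5\t4.0\n"


def parse(text, **csv_options):
    """Parse text with csv.reader, forwarding csv formatting options."""
    return list(csv.reader(io.StringIO(text), **csv_options))


# delimiter is a standard csv formatting parameter; quotechar and
# skipinitialspace are other common ones.
rows = parse(SAMPLE, delimiter="\t")
header, body = rows[0], rows[1:]
print(header)
print(body)
```

Every value comes back as a string, because the csv module performs no type conversion on its own.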

Once the DataMatrix object has been created, you can see the column names by accessing the “columns” attribute (bear in mind that you cannot access the identifier columns directly), and you can access the columns with a dictionary-like syntax. Should you want to get specific rows, use the getRow method, which optionally lets you limit how many columns to show.
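To make the access pattern concrete, here is a small stand-in class that mimics the behaviour described above. It is not the DataMatrix code: TinyTable, its row method and the max_columns argument are hypothetical names, and it assumes that the first column holds the row identifiers. The real getRow signature may differ, so check the official documentation before relying on it.

```python
import csv
import io  # Python 3; on the 2.x series StringIO.StringIO plays this role


class TinyTable(object):
    """Illustrative stand-in, NOT the real DataMatrix class."""

    def __init__(self, fh, **csv_options):
        reader = csv.reader(fh, **csv_options)
        header = next(reader)
        # Assumption: the first column holds the row identifiers,
        # so it is kept out of the list of accessible columns.
        self.columns = header[1:]
        self.identifiers = []
        self._data = dict((name, []) for name in self.columns)
        for record in reader:
            self.identifiers.append(record[0])
            for name, value in zip(self.columns, record[1:]):
                self._data[name].append(value)

    def __getitem__(self, column):
        # Dictionary-like column access: table["control"]
        return self._data[column]

    def row(self, identifier, max_columns=None):
        # Hypothetical counterpart of getRow; max_columns=None means all.
        index = self.identifiers.index(identifier)
        names = self.columns[:max_columns]
        return [self._data[name][index] for name in names]


fh = io.StringIO("probe\tcontrol\ttreated\ng1\t1.5\t2.5\ng2\t0.5\t4.0\n")
table = TinyTable(fh, delimiter="\t")
print(table.columns)
print(table["treated"])
print(table.row("g2", max_columns=1))
```

Storing the data per column keeps the dictionary-like lookup cheap, at the price of a little extra work when a whole row is requested.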

The module also includes a writeMatrix function to write DataMatrix objects to disk (essentially by re-converting them to text files).
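Converting a table back to text boils down to writing a header line followed by one line per row. The sketch below shows that idea with csv.writer from the standard library. The write_table helper, its arguments and the “id” header label are hypothetical and do not reflect the actual writeMatrix signature.

```python
import csv
import io  # Python 3; on the 2.x series StringIO.StringIO plays this role


def write_table(fh, identifiers, columns, data, **csv_options):
    """Write a header line, then one line per row identifier."""
    writer = csv.writer(fh, **csv_options)
    writer.writerow(["id"] + list(columns))
    for index, identifier in enumerate(identifiers):
        writer.writerow([identifier] + [data[name][index] for name in columns])


out = io.StringIO()
write_table(
    out,
    ["g1", "g2"],
    ["control", "treated"],
    {"control": ["1.5", "0.5"], "treated": ["2.5", "4.0"]},
    delimiter="\t",
    lineterminator="\n",
)
text = out.getvalue()
print(text)
```

Passing the same formatting options when reading and writing keeps the output file loadable with the settings used for the input.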

Documentation

For more information, you can have a look at the official documentation (HTML), which is also available as a PDF file.

License

DataMatrix is licensed under the GNU General Public License (GPL), version 2 only.

6 Responses
  1. 2009 August 28
    Gabriel

    Why is the code under GPLv2 only? I would like to look at your code, but I would want it to be GPLv3 compatible … is there any reason for the restriction?

    thanks,
    Gabriel

  2. 2009 August 28

    I put it GPL v2 only because I didn’t really like the campaign used by the FSF to promote v3. That said, if there are good reasons for an “or later” clause, I can change the license.

  3. 2009 August 29
    Gabriel

    The only good reason I can think of is compatibility: people can then ship/use your code with GPLv3 code. For me it would allow me to steal your good ideas :-) I have written a DataFrame object similar in spirit (before I saw your page) and would like to take some of your ideas. My code can happily be v2 but I would potentially bundle it with v3 code, so it is a no go unless you have the ‘or later’ clause.

  4. 2009 August 31

    Well, since you’re developing a similar module, perhaps it would be wise to join forces (along with changing the licensing). What do you think?

  5. 2009 September 1
    Gabriel

    I would be happy to, though my version seems to be a more literal translation of R’s data.frame. I will get the code together and hosted somewhere in the next week so you can check it out. I would be interested in your thoughts.

  6. 2009 September 1
    Gabriel

    Well heck, I just put everything up at launchpad, I will clean house later, this will give you an idea of what my code is like, and if it interests you. Note that it requires python 2.6 (for now) and is very alpha at the moment.

    send me an email with any thoughts.

    the launchpad site is:
    http://bazaar.launchpad.net/~ggellner/psl/scicollections/files

    it also depends on
    http://bazaar.launchpad.net/~ggellner/psl/sciutils/files
