For a long time I have wanted to handle text files in Python the way R’s data.frame does, that is, with direct access to the columns and rows of a loaded text file. As I don’t like R at all, I looked for a Pythonic equivalent, and since I found none, I decided to eat my own dog food and write an implementation, which is what you’ll find below.
The idea is to store the values of the text file as a dictionary of columns, each of which holds a list of (row name, row value) tuples. This way you can access the columns by name (I still need to see whether it’s workable to also use numbers), or you can view specific rows, including all or a subset of the columns. It’s decently fast, and it allows non-sequential access, which you can’t do when reading a file (or a file-like structure) line by line.
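To make the layout concrete, here is a minimal sketch of that dictionary-of-columns structure. It is not the module's actual code: the function name load_columns, the tab delimiter, and the zero-based row_name_index parameter are assumptions made for illustration, it keeps every value as a string, and it is written for a current Python 3 rather than 2.5.

```python
import csv
import io


def load_columns(handle, delimiter="\t", row_name_index=0):
    """Read delimited text into {column name: [(row name, value), ...]}.

    Hypothetical helper for illustration only; it is not part of datamatrix.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)                      # first line holds the column names
    columns = {name: [] for name in header}
    for row in reader:
        if not row:                            # skip blank lines
            continue
        row_name = row[row_name_index]         # assumed: row names live in one column
        for name, value in zip(header, row):
            columns[name].append((row_name, value))
    return header, columns


# A small in-memory file stands in for a real text file.
sample = io.StringIO(
    "GeneID\tGreat_Exp1\tGreat_Exp2\n"
    "Gene1\t56.34\t4.56\n"
    "Gene2\t2.55\t7.10\n"
)
header, columns = load_columns(sample)
print(columns["Great_Exp1"])  # [('Gene1', '56.34'), ('Gene2', '2.55')]
```

Because every column is a plain list, any row can be reached by index without reading the file again, which is where the non-sequential access comes from.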
I have tested this on Python 2.5.1. Older versions may or may not work. All modules called by this one should be shipped with Python itself.
Download and installation
This module is licensed under the GNU General Public License, version 2.
First of all, import the module:
import datamatrix
Then open a file and instantiate a DataMatrix object:
fh = open("somefile.txt")
data = datamatrix.DataMatrix(fh)
By default no column with row names is specified, so if you have one, you have to specify it:
data = datamatrix.DataMatrix(fh, row_names=1)
More options are in the documentation.
Once the DataMatrix is initialized, you can see which columns it contains, access a column by name, and view rows with the getRow method:
>>> data.columns
["GeneID","Great_Exp1","Great_Exp2"]
>>> data["Great_Exp1"]
[("Gene1",56.34), ... ]
>>> data.getRow(5)
["NOT_EXISTENT","56.545","4.56"]
Sometimes you want only the column values, without the row identifiers, and that’s where getColumn comes in:
>>> data.getColumn("Great_Exp1")
[56.34, 2.55, ...]
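For readers curious how row and column lookups can work on a dictionary of (row name, value) lists, the following is a rough sketch. The helper names get_column and get_row, the separate order list, and the subset argument are hypothetical and only mimic the behaviour described above; the real getColumn and getRow methods may be implemented differently.

```python
# Toy data in the dictionary-of-columns layout; the values are made up.
columns = {
    "GeneID": [("Gene1", "Gene1"), ("Gene2", "Gene2")],
    "Great_Exp1": [("Gene1", 56.34), ("Gene2", 2.55)],
    "Great_Exp2": [("Gene1", 4.56), ("Gene2", 7.10)],
}
# Dictionaries were unordered in old Pythons, so the column order is kept apart.
order = ["GeneID", "Great_Exp1", "Great_Exp2"]


def get_column(columns, name):
    """Return only the values of one column, dropping the row names."""
    return [value for _row_name, value in columns[name]]


def get_row(columns, order, index, subset=None):
    """Return the values of one row, for all columns or a chosen subset."""
    names = subset if subset is not None else order
    return [columns[name][index][1] for name in names]


print(get_column(columns, "Great_Exp1"))                 # [56.34, 2.55]
print(get_row(columns, order, 1))                        # ['Gene2', 2.55, 7.1]
print(get_row(columns, order, 0, subset=["Great_Exp2"]))  # [4.56]
```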
Should you want to save a DataMatrix instance, you can use the writeMatrix function; its arguments are described in the documentation.
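The exact writeMatrix call is not reproduced here, so the sketch below only illustrates the general idea of writing a dictionary of columns back out as delimited text. The function write_columns, its arguments, and the tab delimiter are assumptions for illustration and do not describe the module's real interface.

```python
import csv
import io


def write_columns(columns, order, handle, delimiter="\t"):
    """Write {column name: [(row name, value), ...]} as delimited text.

    Hypothetical stand-in for illustration; not the module's writeMatrix.
    """
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(order)                     # header line with the column names
    n_rows = len(columns[order[0]]) if order else 0
    for i in range(n_rows):
        # Take the value (second tuple element) of row i from every column.
        writer.writerow([columns[name][i][1] for name in order])


table = {
    "GeneID": [("Gene1", "Gene1"), ("Gene2", "Gene2")],
    "Great_Exp1": [("Gene1", 56.34), ("Gene2", 2.55)],
}
buffer = io.StringIO()                         # any writable file object would do
write_columns(table, ["GeneID", "Great_Exp1"], buffer)
print(buffer.getvalue())
```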
That’s all. Questions and suggestions, especially on coding and improvements, are very welcome.