Tag Archives: python

data.frames in Python - DataMatrix

For a long time I have tried to handle text files in Python the way R’s data.frame does, that is, with direct access to the columns and rows of a loaded text file. As I don’t like R at all, I looked hard for a Pythonic equivalent, and since I found none, I decided to eat my own dog food and write an implementation, which is what you’ll find below.
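
To make "direct access to columns and rows" concrete, here is a minimal sketch of the idea. It is not the DataMatrix implementation: the class name `TinyTable`, its methods and the sample data are all made up for illustration, and it assumes a tab-delimited file whose first line holds the column names.

```python
import csv
import io


class TinyTable:
    """Hypothetical data.frame-like table (illustration only, not DataMatrix)."""

    def __init__(self, handle, delimiter="\t"):
        reader = csv.reader(handle, delimiter=delimiter)
        self.header = next(reader)                   # first line: column names
        self.rows = [row for row in reader if row]   # skip blank lines

    def column(self, name):
        """Return every value of the named column, top to bottom."""
        index = self.header.index(name)
        return [row[index] for row in self.rows]

    def row(self, number):
        """Return one row (0-based) as a dict of column name -> value."""
        return dict(zip(self.header, self.rows[number]))


# Made-up sample data standing in for a loaded text file.
text = "gene\tscore\nTP53\t0.9\nBRCA1\t0.7\n"
table = TinyTable(io.StringIO(text))

print(table.column("gene"))   # ['TP53', 'BRCA1']
print(table.row(1))           # {'gene': 'BRCA1', 'score': '0.7'}
```

All values stay strings here; converting columns to numbers is left out to keep the sketch short.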


QSql vs DB-API?

I’ve recently started building GUIs for my Python applications with PyQt, and I can say I’m absolutely loving the toolkit: it is relatively easy to use and feature-rich. While creating a GUI for a module I wrote that deals with databases (using MySQLdb), I also learnt that Qt has its own series of classes for dealing with databases, mainly QSql.

My question, directed to whoever has experience with both QSql and the Python DB-API, is: what are the advantages of one approach over the other? I’m leaning towards DB-API because that way I can write modules that also work in command-line applications.
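
To illustrate the point about reuse, here is a rough sketch of a database helper that only touches DB-API calls, so the same function could be called from a command-line script or from a GUI slot. It uses the standard library's `sqlite3` as a stand-in for MySQLdb; the `genes` table, its columns and the function name are assumptions made up for the example.

```python
import sqlite3


def fetch_gene_names(connection, min_score):
    """Return gene names with score >= min_score, using only DB-API calls."""
    cursor = connection.cursor()
    # sqlite3 uses "?" placeholders; MySQLdb uses "%s" (see the module's paramstyle).
    cursor.execute(
        "SELECT name FROM genes WHERE score >= ? ORDER BY name", (min_score,)
    )
    names = [row[0] for row in cursor.fetchall()]
    cursor.close()
    return names


# Throwaway in-memory database with made-up rows, just to exercise the helper.
connection = sqlite3.connect(":memory:")
setup = connection.cursor()
setup.execute("CREATE TABLE genes (name TEXT, score REAL)")
setup.executemany(
    "INSERT INTO genes VALUES (?, ?)",
    [("TP53", 0.9), ("BRCA1", 0.7), ("MYC", 0.2)],
)
connection.commit()

result = fetch_gene_names(connection, 0.5)
print(result)   # ['BRCA1', 'TP53']
```

Nothing in `fetch_gene_names` knows about Qt, which is the appeal; the price is that the GUI code has to copy the results into its own widgets or models by hand.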

Gene identifiers

While working today on an annotation class in Python I stumbled on a problem. Normally I work with lists of genes that are consistent, i.e. all Entrez Gene IDs (or RefSeq IDs, or Genome Browser IDs…), but today I had a list of mixed identifiers.

The obvious next idea was “let’s implement auto-detection of common identifiers in the class”. The problem is… is there any actual documentation on how these identifiers are formatted? So far, using regular expressions, I’ve tracked down a few:

  • RefSeq
  • GenBank
  • Entrez Gene
  • UCSC Genome Browser
  • Ensembl

However, I have no idea whether I have covered every variant of these IDs. Does anyone know where to look this information up?
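
For the curious, the auto-detection could look roughly like the sketch below. The patterns are my own approximations based on commonly seen examples, not taken from any official specification, so they certainly miss some variants; the function name `detect_id_type` is made up for the example.

```python
import re

# Approximate, assumed patterns; order matters because the first match wins.
PATTERNS = [
    # Two-letter prefix, underscore, digits, optional version (e.g. NM_000546.5)
    ("RefSeq", re.compile(r"[A-Z]{2}_\d+(\.\d+)?")),
    # ENS + optional species code + feature letter + 11 digits
    ("Ensembl", re.compile(r"ENS[A-Z]*[GTPE]\d{11}(\.\d+)?")),
    # uc + 3 digits + 3 lowercase letters + version (e.g. uc002gij.3)
    ("UCSC Genome Browser", re.compile(r"uc\d{3}[a-z]{3}\.\d+")),
    # 1 letter + 5 digits, or 2 letters + 6 digits, optional version
    ("GenBank", re.compile(r"[A-Z]\d{5}(\.\d+)?|[A-Z]{2}\d{6}(\.\d+)?")),
    # Plain integer
    ("Entrez Gene", re.compile(r"\d+")),
]


def detect_id_type(identifier):
    """Return the name of the first pattern matching the whole identifier, or None."""
    cleaned = identifier.strip()
    for name, pattern in PATTERNS:
        if pattern.fullmatch(cleaned):
            return name
    return None


mixed = ["NM_000546", "7157", "ENSG00000141510", "uc002gij.3", "U12345", "not-an-id"]
detected = {identifier: detect_id_type(identifier) for identifier in mixed}
for identifier, kind in detected.items():
    print(identifier, "->", kind)
```

A plain integer is only a guess at an Entrez Gene ID, since other databases use bare numbers too, which is exactly why proper documentation would help.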

(On a related note: my thesis defense will be on January 14th, 2008, so I have to get the printing going)

Data clustering with Python

Following up on my recent post, I’ve been looking for alternatives to TMeV. So far I’ve found the R package pvclust and the Pycluster library, part of BioPython. pvclust also performs bootstrapping (I’m not sure whether it’s similar to what support trees do, but it’s still better than no resampling at all). I’ve found another Python project, but it is still too basic for what I need.
