Tag Archives: bioinformatics

Buggy bioinformatics software

As people who read my science-related posts already know, I’m not a big fan of software made just to support a publication. Recently I stumbled upon software of this kind again: the TIGR Multiexperiment Viewer (TMeV), a Java-based program often used for microarray analysis. It’s not exactly “fit for publication” software, since it reached version 4 last year, but it shows some of the problems I have already mentioned with releasing bioinformatics software.

I use TMeV mostly because I haven’t found any other implementation of hierarchical clustering with support trees. However, I’ve stumbled upon a very annoying bug in the most recent version. I normally use average linkage clustering with Pearson’s correlation as the distance metric, plus gene and sample bootstrapping. With certain files, this makes TMeV report errors at random during the iterations.
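For readers unfamiliar with the setup, here is a minimal pure-Python sketch of the general idea: average linkage clustering on a Pearson correlation distance, with bootstrap resampling to estimate how often each cluster reappears. It is not TMeV's implementation. The function names (`pearson_distance`, `average_linkage`, `bootstrap_support`) and the toy data are made up for illustration, and it clusters rows (genes) while resampling columns (samples); transposing the matrix would give the other tree.

```python
import math
import random


def pearson_distance(x, y):
    """Return 1 - r, where r is the Pearson correlation of x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant profile has no defined correlation,
        # so it is treated as uncorrelated (distance 1).
        return 1.0
    return 1.0 - cov / math.sqrt(var_x * var_y)


def average_linkage(profiles):
    """Cluster rows bottom-up; return every merged cluster as a frozenset of row indices."""
    n = len(profiles)
    dist = {
        (i, j): pearson_distance(profiles[i], profiles[j])
        for i in range(n)
        for j in range(i + 1, n)
    }
    clusters = [frozenset([i]) for i in range(n)]
    merged = set()
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters.
                total = sum(
                    dist[(min(i, j), max(i, j))]
                    for i in clusters[a]
                    for j in clusters[b]
                )
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        new_cluster = clusters[a] | clusters[b]
        merged.add(new_cluster)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(new_cluster)
    return merged


def bootstrap_support(profiles, iterations=100, seed=0):
    """Fraction of bootstrap replicates in which each reference cluster reappears."""
    rng = random.Random(seed)
    reference = average_linkage(profiles)
    counts = dict.fromkeys(reference, 0)
    n_cols = len(profiles[0])
    for _ in range(iterations):
        # Resample the columns (samples) with replacement.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        resampled = [[row[c] for c in cols] for row in profiles]
        found = average_linkage(resampled)
        for cluster in reference:
            if cluster in found:
                counts[cluster] += 1
    return {cluster: count / iterations for cluster, count in counts.items()}


# Toy data: rows 0 and 1 rise together, rows 2 and 3 fall together.
toy_profiles = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12.5],
    [6, 5, 4, 3, 2, 1],
    [12, 10, 8, 6, 4, 2.5],
]
toy_support = bootstrap_support(toy_profiles, iterations=20, seed=1)
```

The support values are simply the share of resampled trees that contain the same cluster, which is the number a support tree attaches to each node.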


SOFT file woes

Today I started working on a data set published on GEO. As the sample data were somewhat inconsistent (23 controls were mentioned, but I found 28), I decided to parse the SOFT file from GEO in order to get the exact sample information.

I made a grave mistake. First of all, Biopython’s SOFT parser is horribly broken (it doesn’t work at all) and largely undocumented: I could work around the lack of documentation (API docs), but not the fact that it wouldn’t work. So I turned to R, which offers a GEO query module through Bioconductor.

That proved to be a terrible mistake as well. For a file containing 183 samples, the analysis has been running for four hours with no sign of completing anytime soon (not to mention a possible memory leak). After this, I gave up. I’m going to get the reduced data sheet and write a small parser in Python myself.
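Such a parser does not need to be big. Below is a minimal sketch, assuming the usual SOFT line conventions: `^` starts an entity such as a sample, `!` carries an attribute, `#` describes a table column, and the data table sits between `!sample_table_begin` and `!sample_table_end`. The function name `parse_soft_samples` and the example accessions are made up for illustration, and the sketch only collects sample attributes, skipping the expression values entirely.

```python
def parse_soft_samples(lines):
    """Map each sample accession to its attributes (attribute name -> list of values)."""
    samples = {}
    current = None      # attribute dict of the sample being read, if any
    in_table = False    # True while inside a *_table_begin / *_table_end block
    for raw in lines:
        line = raw.strip()
        if in_table:
            # Skip data rows; only look for the end of the table.
            if line.lower().endswith("_table_end"):
                in_table = False
            continue
        if line.startswith("^"):
            entity, _, value = line[1:].partition("=")
            if entity.strip().upper() == "SAMPLE":
                current = samples.setdefault(value.strip(), {})
            else:
                current = None  # series, platform, ... are ignored here
        elif line.startswith("!"):
            key, sep, value = line[1:].partition("=")
            key = key.strip()
            if key.lower().endswith("_table_begin"):
                in_table = True
            elif current is not None and sep:
                # Attributes may repeat (e.g. characteristics), so keep a list.
                current.setdefault(key, []).append(value.strip())
        # '#' column descriptions and blank lines are ignored.
    return samples


# Made-up example input; the accessions and values are placeholders.
EXAMPLE = """\
^SERIES = GSE0000
!Series_title = demo series
^SAMPLE = GSM0001
!Sample_title = control 1
!Sample_characteristics_ch1 = status: control
!Sample_characteristics_ch1 = tissue: blood
#ID_REF = probe identifier
!sample_table_begin
ID_REF\tVALUE
probe_1\t7.5
!sample_table_end
^SAMPLE = GSM0002
!Sample_title = tumour 1
!Sample_characteristics_ch1 = status: tumour
"""

example_samples = parse_soft_samples(EXAMPLE.splitlines())
example_controls = [
    accession
    for accession, attrs in example_samples.items()
    if any("control" in v.lower() for v in attrs.get("Sample_characteristics_ch1", []))
]
```

For a real file, passing the open file object instead of `EXAMPLE.splitlines()` reads it line by line, so memory use stays flat regardless of how many samples the file holds.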

What is frustrating is the lack of quality: I could concentrate on my own work rather than reinventing the wheel for the nth time if the existing implementations worked. What’s the point in releasing non-working software? I could understand bugs, but this is one step further.

Easy RMA: RMAExpress

Today I was looking for an easy way to do some calculations on raw expression data from Affymetrix arrays, but I didn’t want to use R: I have already mentioned how I don’t like its design and implementation. While looking for some documentation, I stumbled upon a nifty little program called RMAExpress.
