Tag Archives: Science

Commercial applications, public funding

I wanted to write this earlier, but I couldn’t: I’m in a hotel in Maastricht, Netherlands, waiting to travel back tomorrow. I’ve been attending the 4th NuGO hands-on advanced microarray data analysis course and I even wanted to blog about it… but the hotel’s connection did not resolve any non-European web page until late today.


FOSS and research

I’ve been wondering why FOSS is so often compared to the academic world when, at least in my limited experience, I see few people in research who grasp its concept. At first glance, developing FOSS in a research environment would be very good: not only would you get publicly available results when you publish, but you could also make sure that, in an extreme case, your application will be carried on by someone else should you not be able to continue development.

At least in the life sciences, it’s hard to see such a mentality. I can understand the publish-or-perish frenzy, but at the same time, don’t we all remember published and unmaintained software? For me, such an approach would be optimal: once the paper is out, you can release your software (GPL would be best) and make sure someone will improve or maintain it. Of course you won’t be able to publish for each upgrade you make, but I would generally think of that as a bad policy anyway, one made just to increase the publication count.

Does something like that happen with FOSS in other research areas?

Performance and R

I often wonder why people resort only to R when working with microarrays. I can understand that Bioconductor offers a plethora of different packages and that R’s statistical functions come in handy for many applications, but still, I think people underestimate the impact of performance.

R is not a high-performance language at all: it doesn’t parallelize well on HPC systems (at least judging from the talks I’ve had with people studying the matter), and in general it is a memory and resource hog. For example, it takes much longer to perform RMA via R than with RMAExpress (which is a C++ application), and the latter also does better with regard to memory utilization. I can understand the complexity of some statistical procedures, but what about parsing GEO files?

The surprising aspect is that, aside from a few exceptions (like the aforementioned RMAExpress), no one has tried to write faster implementations of certain algorithms. I for one would welcome a non-R implementation of SAM (the original implementation works in Excel… ugh) or similar algorithms. Otherwise we will be stuck with programs that are interesting, but way too memory hungry (AMDA comes to mind).

Follow up on meta-analysis

Fourteen days since my last post. Quite a while, indeed. Mostly I’ve been swamped with work and some health-related issues. Anyway, I thought I’d follow up on the meta-analysis matter I discussed in my last post.

It turns out that the fault lies with both limma and the data sets. Apparently the raw data files found in the Stanford Microarray Database have different lengths, gene-wise (a result of not all spots on the array being good?), while limma needs equal-length tables to form a single object and does not perform any checking. I stumbled upon the same problem when doing my thesis, but I used a hack to work around it.

According to the documentation, the “merge” command should be used to deal with these cases, but here’s what I get:


>> RG1 = read.maimages(file="file1.txt",source="smd")
Read file1.txt
>> RG2 = read.maimages(file="file2.txt",source="smd")
Read file2.txt
>> merge(RG1,RG2)
Error in merge(RG1,RG2): Need row names to align on
>> rownames(RG1)
NULL
>> rownames(RG2)
NULL

I’m going to ask on the Bioconductor mailing list and see what they tell me.
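In the meantime, the underlying idea (bringing two tables of different lengths onto a common set of rows before combining them) can be sketched in plain base R. This is only an illustration on toy data frames, not on limma objects and not what limma’s merge does internally; the `ID` column, the array names and all the values are made up for the example.

```r
# Two toy intensity tables of different lengths, standing in for two arrays.
# "ID" is a hypothetical spot identifier column, not an actual SMD field name.
array1 <- data.frame(ID = c("g1", "g2", "g3", "g4"),
                     R  = c(100, 200, 300, 400),
                     G  = c(110, 190, 310, 390))
array2 <- data.frame(ID = c("g2", "g3", "g5"),
                     R  = c(210, 290, 500),
                     G  = c(205, 305, 480))

# Keep only the spots present on both arrays, in a common order.
shared <- intersect(array1$ID, array2$ID)
a1 <- array1[match(shared, array1$ID), ]
a2 <- array2[match(shared, array2$ID), ]

# Build equal-length matrices, one column per array, rows named by spot ID.
R <- cbind(array1 = a1$R, array2 = a2$R)
G <- cbind(array1 = a1$G, array2 = a2$G)
rownames(R) <- rownames(G) <- shared
```

Taking the intersection silently drops the spots that are missing from either array; whether that is acceptable, or whether missing spots should be kept and filled with NA, depends on the analysis.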