December 22, 2007 – 12:53
There is always a lot of talk about “brain drain” (fuga di cervelli in Italian) from my country. I keep reading disgruntled comments about low pay and poor research, and claims that going abroad is the only way for an Italian scientist to be successful.
While I believe that research may well be handled better outside my country (though it’s impossible to know for sure: never tar everyone with the same brush), I think that, partly because of the way the media and the scientists themselves portray it, going abroad has come to be seen by almost everyone as a kind of El Dorado. And that, in my opinion, is incorrect.
Read More »
November 15, 2007 – 20:57
While working today on an annotation class in Python I stumbled on a problem. Normally I work with lists of genes that are consistent, i.e. all Entrez Gene IDs (or RefSeq IDs, or Genome Browser IDs…), but today I had a list of mixed identifiers.
My next thought was “let’s implement auto-detection of common identifiers in the class”. The problem is… is there any actual documentation on how these identifiers are structured? So far, using regular expressions, I’ve tracked down a few:
- RefSeq
- GenBank
- Entrez Gene
- UCSC Genome Browser
- Ensembl
However, I have no idea whether I have covered all the variants of these IDs. Does anyone know where I can look this information up?
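To give an idea of what I mean by regex-based auto-detection, here is a minimal sketch. The patterns are assumptions based on the identifier shapes I commonly see, not on any official specification, and they almost certainly miss some variants (the RefSeq prefixes listed are only a subset, for instance). The names `ID_PATTERNS`, `detect_id_type` and `group_by_type` are made up for this illustration and are not the actual code of my class.

```python
import re

# Hypothetical patterns: each one covers only commonly seen shapes and
# is an assumption, not an official specification of the identifier.
# Order matters: the first pattern matching the whole string wins.
ID_PATTERNS = [
    # Two-letter prefix, underscore, digits, optional version (NM_000546.5)
    ("RefSeq", re.compile(r"(?:NM|NR|NP|XM|XR|XP|NC|NG|NT|NW)_\d+(?:\.\d+)?")),
    # ENS, optional species letters, feature letter, 11 digits (ENSG00000139618)
    ("Ensembl", re.compile(r"ENS[A-Z]*[GTPE]\d{11}(?:\.\d+)?")),
    # "uc", three digits, three lowercase letters, optional version (uc002gij.3)
    ("UCSC Genome Browser", re.compile(r"uc\d{3}[a-z]{3}(?:\.\d+)?")),
    # One letter + 5 digits or two letters + 6 digits, roughly (U49845)
    ("GenBank", re.compile(r"[A-Z]{1,2}\d{5,6}(?:\.\d+)?")),
    # Plain integer (7157)
    ("Entrez Gene", re.compile(r"\d+")),
]


def detect_id_type(identifier):
    """Return the name of the first pattern matching the whole string, or None."""
    identifier = identifier.strip()
    for name, pattern in ID_PATTERNS:
        if pattern.fullmatch(identifier):
            return name
    return None


def group_by_type(identifiers):
    """Split a mixed list into a dict mapping detected type -> identifiers."""
    groups = {}
    for identifier in identifiers:
        groups.setdefault(detect_id_type(identifier), []).append(identifier)
    return groups
```

Calling `group_by_type` on a mixed list puts anything unrecognized under the `None` key, so unknown identifiers are kept visible instead of being silently dropped.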
(On a related note: my thesis defense will be on January 14th, 2008, so I have to get the printing going)
Following up on my recent post, I’ve been looking for alternatives to TMeV. So far I’ve found the R package pvclust and the Pycluster library, part of BioPython. The first one also performs bootstrapping (I’m not sure whether it’s similar to what support trees do, but it’s still better than no resampling at all). I’ve found another Python project, but it is still too basic to do what I need.
Read More »
As people who read my science-related posts already know, I’m not a big fan of software made just to support a publication. Recently I’ve stumbled upon similar software again. Namely, I’m talking about the TIGR Multiexperiment Viewer (TMeV), a Java-based program which is often used for microarray analysis. It’s not exactly “fit for publication” software, because it reached version 4 last year, but it shows some of the problems (mentioned already) with releasing bioinformatics software.
I use TMeV mostly because I couldn’t find any other implementation of the hierarchical clustering algorithm with support trees. However, I’ve stumbled upon a very annoying bug in the most recent version. Normally I use average linkage clustering with Pearson’s correlation as the distance metric, together with gene and sample bootstrapping: with certain files this makes TMeV report errors at random during the iterations.
Read More »