Software and biology

I’ve noticed that the journal Source Code for Biology and Medicine has finally launched. While some have said that it would be a journal for people desperate for a publication, I think it fills a gap that is keenly felt in bioinformatics: the availability of source code.

Perhaps I’m being naive, but I think that academic groups, at least, should always release their source code when they develop a new program or algorithm, or at least provide a proof of concept so that others can reproduce their work. This would also help in finding and squashing bugs, which are sadly all too common in much of the biological software described in publications.

One of the most striking examples is CNAG (short for Copy Number Analyzer for GeneChip), which is based on a very interesting algorithm that uses Hidden Markov Models to calculate DNA copy number. However, the implementation is one of the buggiest I’ve ever seen, with frequent crashes and poor documentation (not to mention the English: please, don’t write “draw our paper” in your license agreement!). Availability of the source would help in fixing at least the obvious bugs (such as “crash when a list is empty”). Instead, I guess the authors were happy enough just to get their paper published (though at least they’re still developing the program, unlike others).

In some cases the lack of source is downright outrageous. Affymetrix employees have published an algorithm for copy number detection called CARAT. The paper includes only the formal definition of the algorithm, without any working public implementation. And of course they’re not releasing anything: they want to keep it for future products. Yet the journal let them publish even though there is no guarantee that the results hold (since no one can reproduce them without reimplementing the algorithm from scratch).

What’s the point of making software that, even if free, locks you in to the people who provided it, who may just abandon it after a paper is published? No, the source should always be kept available: that’s why I personally think that the GNU GPL (version 2) is the only acceptable license for academic, non-profit biological software.
