Data handling

As the people who read my science-related posts already know, I’m in the middle of doing a meta-analysis. That brought up a problem, so to speak, and it’s related to annotations.

Probes on microarrays are mapped to genes (to over-simplify), and these mappings are usually made against the latest version of the genome available at the time. Since the map of the genome is not static but a moving target, the annotations tend to become obsolete. Unfortunately, that leads to problems when you compare experiments performed at different times.

To be precise, the papers I’m taking data from date from 2005 to 2006, but the actual experiments were performed earlier. One uses the annotation data from the Affymetrix HG-U133A chip, which (along with the whole HG-U133 family) has been shown to be outdated by Dai and coworkers. The other uses Entrez Gene identifiers, but some IDs are no longer valid or overlap.

How can such a situation be solved? For some experiments there is not much you can do beyond reannotating the IDs with an automated system (I believe this is possible). For others (Affy chips), the paper I linked gives a possible solution, and an effective one (we’ve tested it in our group): it creates new “meta-probes” that reflect the updated annotations.
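
To give an idea of what automated reannotation could look like, here is a minimal Python sketch. Everything in it is an assumption of mine for illustration: the function name `reannotate`, the placeholder IDs, and the `id_history` table mapping each outdated ID to its current one (or to `None` when the record was withdrawn). Averaging the values of old IDs that now point to the same gene is just one simple choice, and this is not the meta-probe approach from the paper.

```python
from statistics import mean


def reannotate(expression, id_history):
    """Remap outdated gene IDs to current ones.

    expression: dict of old gene ID -> measured value
    id_history: dict of old gene ID -> current gene ID,
                or None if the record was withdrawn (hypothetical table)
    Returns (remapped, dropped).
    """
    collected = {}
    dropped = []
    for old_id, value in expression.items():
        # Assumption: an ID missing from the history table is still current.
        new_id = id_history.get(old_id, old_id)
        if new_id is None:
            dropped.append(old_id)  # withdrawn record, nothing to map to
            continue
        collected.setdefault(new_id, []).append(value)
    # Old IDs merged into the same current gene are averaged (one option of many).
    remapped = {gene: mean(values) for gene, values in collected.items()}
    return remapped, sorted(dropped)


# Placeholder IDs, not real Entrez Gene identifiers.
expression = {"OLD1": 2.0, "OLD2": 4.0, "OLD3": 1.5, "CUR9": 0.5}
id_history = {"OLD1": "CUR1", "OLD2": "CUR1", "OLD3": None}

remapped, dropped = reannotate(expression, id_history)
print(remapped)  # {'CUR1': 3.0, 'CUR9': 0.5}
print(dropped)   # ['OLD3']
```

Once both datasets are remapped against the same version of the annotations, the genes they share can be found by intersecting the keys of the two dictionaries.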

In any case, be wary of outdated annotations if you want to compare different microarray datasets.
