UPDATE: Today I found out that J. Brooks (the corresponding author of Zhao et al.'s paper) has agreed to send me the data I needed. Thanks a lot!
When you do bioinformatics, you often test your own procedures not only on your own data, but also on publicly available datasets produced by other people. As I've mentioned before, that's essentially what meta-analysis is. I'm doing a bit of that for my thesis, and recently I noticed that some datasets, despite being public, are far from complete.
Today I was looking at the data published by Zhao et al. It's a rather interesting dataset: 177 renal cell carcinoma samples compared against Human Universal Reference RNA. However, there is little or no information about the samples themselves. I'm running analyses that compare different tumor grades, so this information is essential for me, but neither the paper nor the supplementary materials provide it. This makes the whole dataset much less useful than it could be.
Along the same lines, evaluating the results of Jones et al. posed a different problem: the annotation of the Affymetrix HG-U133A chip is outdated. Dai et al. have proposed an interesting approach to reannotating several Affymetrix chips, so I thought I could use that. However, the supplementary materials only include normalized expression data. There are no CEL files, and such a procedure needs them because reannotation works on the raw probe-level intensities.
Personally, I think all journals should make submission to databases such as ArrayExpress mandatory. MIAME was meant to ensure that enough information about a microarray experiment is made available. It's a shame that there are still so many hurdles when someone wants to make use of someone else's data.
*[MIAME]: Minimum Information About a Microarray Experiment
Luca Beltrame SCIENCE