Over the past two days I’ve been tackling an annotation problem. I’m trying to annotate the genes found in regions with significant DNA copy-number alterations (identified with the STAC method). The plan was to annotate those regions, each spanning one megabase, using the UCSC Table Browser.
However, doing this by hand was impractical, so I decided to automate it a bit. I converted the data into ranges and then used the knownGene annotation file (downloaded from UCSC) to find which genes fell in which region. That last part wasn’t easy at all (at least in Python), as I had to check which ranges overlapped and handle consecutive regions. The code is terribly ugly, so I’ll try to clean it up before posting it.
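This is not the script mentioned above. It is only a minimal sketch of how the overlap step could look. It assumes that regions are `(chrom, start, end)` tuples in 0-based, half-open coordinates (UCSC’s convention). It also assumes that the knownGene file is tab-separated, with name, chrom, strand, txStart and txEnd as its first five columns, and that “handling consecutive regions” means merging regions that touch or overlap before the lookup. The function names (`merge_regions`, `read_known_genes`, `genes_in_regions`) are hypothetical.

```python
from collections import defaultdict


def merge_regions(regions):
    """Merge overlapping or directly consecutive regions per chromosome.

    Assumption: regions are (chrom, start, end) tuples, half-open [start, end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlaps or touches the current run
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def read_known_genes(lines):
    """Parse knownGene-style lines into (chrom, txStart, txEnd, name) tuples.

    Assumption: tab-separated, first five columns are
    name, chrom, strand, txStart, txEnd.
    """
    genes = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        name, chrom, _strand, tx_start, tx_end = fields[:5]
        genes.append((chrom, int(tx_start), int(tx_end), name))
    return genes


def genes_in_regions(regions, genes):
    """Map each region to the names of genes whose transcript overlaps it."""
    genes_by_chrom = defaultdict(list)
    for chrom, start, end, name in genes:
        genes_by_chrom[chrom].append((start, end, name))
    for chrom in genes_by_chrom:
        genes_by_chrom[chrom].sort()  # sort by transcript start
    result = {}
    for region in regions:
        chrom, r_start, r_end = region
        hits = []
        for g_start, g_end, name in genes_by_chrom.get(chrom, []):
            if g_start >= r_end:
                break  # sorted by start, so no later gene can overlap
            if g_end > r_start:  # half-open overlap test
                hits.append(name)
        result[region] = hits
    return result


# Usage with made-up regions and hypothetical gene names:
regions = merge_regions([
    ("chr1", 0, 1_000_000),
    ("chr1", 1_000_000, 2_000_000),  # consecutive, gets merged
    ("chr2", 5_000_000, 6_000_000),
])
genes = read_known_genes([
    "geneA\tchr1\t+\t1115\t4121\n",
    "geneB\tchr1\t-\t2500000\t2600000\n",
    "geneC\tchr2\t+\t5999000\t6100000\n",
])
print(genes_in_regions(regions, genes))
```

Each region is scanned against a start-sorted gene list, and the scan stops early once genes start past the region. That is simple but roughly linear per region. With many regions, an interval tree or a single sweep over both sorted lists would probably scale better.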
If I can, I’ll try to integrate it with the other scripts I’ve written to build a small annotation pipeline.