<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>dennogumi.org &#187; database</title>
	<atom:link href="http://www.dennogumi.org/tag/database/feed" rel="self" type="application/rss+xml" />
	<link>http://www.dennogumi.org</link>
	<description>On the web since 1999</description>
	<lastBuildDate>Tue, 27 Jul 2010 22:41:25 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The plague of cross-database annotations</title>
		<link>http://www.dennogumi.org/2008/11/the-plague-of-cross-database-annotations</link>
		<comments>http://www.dennogumi.org/2008/11/the-plague-of-cross-database-annotations#comments</comments>
		<pubDate>Sun, 02 Nov 2008 14:15:20 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[microarray]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=470</guid>
		<description><![CDATA[Recently I had to annotate a large number (10,000+) of genes identified by Entrez Gene IDs. My goal was to avoid the &#8220;annotation files&#8221; (basically CSV files) that part of the wet lab group likes, because I wanted to stay up to date without having to remember to update them. So the obvious solution was to use a [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I had to annotate a large number (10,000+) of genes identified by Entrez Gene IDs. My goal was to avoid the &#8220;annotation files&#8221; (basically CSV files) that part of the wet lab group likes, because I wanted to stay up to date without having to remember to update them. So the obvious solution was to use a service available on the web, in an automated way. For reference, I only wanted to attach the gene symbol, gene name, chromosome and cytoband.<br />
I tried many services:</p>
<ul>
<li><strong><a href="http://genome.ucsc.edu">UCSC Genome Browser</a></strong>: It has a MySQL server, but it&#8217;s rather slow and I did not want to clog it up. Using their tables and .sql files I managed a first pass at the annotation, but about 2,000 genes were left without annotation!</li>
<li><strong><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene">NCBI&#8217;s own Entrez Gene</a></strong>: This requires EUtils, and Biopython has no parser for Entrez Gene XML entries. I had to scrap the idea because I did not have time.</li>
<li><strong><a href="http://www.ensembl.org">Ensembl</a></strong>: I decided to use the <a href="http://www.biomart.org">Biomart</a> service through Rpy. Some genes were missing, and sometimes the IDs were &#8220;converted&#8221; into something else (I had no time to figure out what was happening). Also, some perfectly valid genes (in Entrez Gene) were not present in Ensembl.</li>
</ul>
<p>In the end I just grabbed <a href="http://www.bioconductor.org/packages/2.3/data/annotation/html/org.Hs.eg.db.html">Bioconductor&#8217;s &#8220;org.Hs.eg.db&#8221; package</a> and used its sqlite gene database (from Entrez Gene) to annotate the list, with only 97 missing IDs (mostly genes that had changed identifiers). However, this effort revealed a problem: <em>the annotations are not consistent between databases</em>. This is a real pain when doing microarray-based analysis, because you often have a large number of genes, and a perceived lack of annotation might lead to a number of them being discarded.</p>
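<p>For illustration, here is a minimal sketch of that kind of lookup: given a list of Entrez Gene IDs, pull symbol, name, chromosome and cytoband out of a sqlite file and keep track of the IDs that come back empty. The table and column names below are an assumed, simplified layout, not necessarily what the real package ships, and the two rows are toy data in an in-memory database so that the example runs on its own. The function names are made up for this sketch.</p>

```python
import sqlite3

# Assumed, simplified schema: every table is keyed by an internal _id.
# The real org.Hs.eg.db sqlite file may be laid out differently.
SCHEMA_AND_TOY_ROWS = """
CREATE TABLE genes (_id INTEGER PRIMARY KEY, gene_id TEXT UNIQUE);
CREATE TABLE gene_info (_id INTEGER, symbol TEXT, gene_name TEXT);
CREATE TABLE chromosomes (_id INTEGER, chromosome TEXT);
CREATE TABLE cytogenetic_locations (_id INTEGER, cytogenetic_location TEXT);
INSERT INTO genes VALUES (1, '672'), (2, '7157');
INSERT INTO gene_info VALUES (1, 'BRCA1', 'BRCA1 DNA repair associated'),
                             (2, 'TP53', 'tumor protein p53');
INSERT INTO chromosomes VALUES (1, '17'), (2, '17');
INSERT INTO cytogenetic_locations VALUES (1, '17q21.31'), (2, '17p13.1');
"""

# LEFT JOINs keep a gene in the result even if one annotation table lacks it.
LOOKUP = """
SELECT i.symbol, i.gene_name, c.chromosome, l.cytogenetic_location
FROM genes AS g
LEFT JOIN gene_info AS i ON i._id = g._id
LEFT JOIN chromosomes AS c ON c._id = g._id
LEFT JOIN cytogenetic_locations AS l ON l._id = g._id
WHERE g.gene_id = ?
"""


def build_toy_db():
    """Create an in-memory database holding the assumed schema and toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_TOY_ROWS)
    return conn


def annotate(conn, entrez_ids):
    """Return (annotated, missing) for the given Entrez Gene IDs."""
    annotated, missing = {}, []
    for gene_id in map(str, entrez_ids):
        # fetchone() takes the first match; a gene with several rows in a
        # joined table would need extra handling.
        row = conn.execute(LOOKUP, (gene_id,)).fetchone()
        if row is None:
            missing.append(gene_id)  # ID unknown to this database
            continue
        symbol, name, chromosome, cytoband = row
        annotated[gene_id] = {
            "symbol": symbol,
            "name": name,
            "chromosome": chromosome,
            "cytoband": cytoband,
        }
    return annotated, missing


if __name__ == "__main__":
    conn = build_toy_db()
    annotated, missing = annotate(conn, [7157, 672, 999999999])
    print(annotated["7157"]["symbol"], missing)
```

<p>Collecting the unmatched IDs in a separate list, rather than silently skipping them, makes it easy to see how many genes a given source fails to annotate before any of them get dropped from an analysis.</p>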
<p>I thought the situation was better than this. If I look up the same ID in different databases, I expect to get identical results. I mean, it&#8217;s not as if Entrez Gene or Ensembl have few resources&#8230; or am I wrong?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2008/11/the-plague-of-cross-database-annotations/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
