<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>dennogumi.org &#187; microarray</title>
	<atom:link href="http://www.dennogumi.org/tag/microarray/feed" rel="self" type="application/rss+xml" />
	<link>http://www.dennogumi.org</link>
	<description>On the web since 1999</description>
	<lastBuildDate>Fri, 06 Jan 2012 14:56:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>The plague of cross-database annotations</title>
		<link>http://www.dennogumi.org/2008/11/the-plague-of-cross-database-annotations</link>
		<comments>http://www.dennogumi.org/2008/11/the-plague-of-cross-database-annotations#comments</comments>
		<pubDate>Sun, 02 Nov 2008 14:15:20 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[microarray]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=470</guid>
		<description><![CDATA[Recently I had to annotate a large (10,000+) number of genes identified by Entrez Gene IDs. My goal was to avoid &#8220;annotation files&#8221; (basically CSV files) that part of the wet lab group likes, because I wanted to stay up-to-date &#8230; <a href="http://www.dennogumi.org/2008/11/the-plague-of-cross-database-annotations">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Recently I had to annotate a large number (10,000+) of genes identified by Entrez Gene IDs. My goal was to avoid the &#8220;annotation files&#8221; (basically CSV files) that part of the wet lab group likes, because I wanted to stay up to date without having to remember to update them. The obvious solution was to use a service available on the web, in an automated way. For reference, I only needed to attach the gene symbol, gene name, chromosome and cytoband.<br />
I tried many services:</p>
<ul>
<li><strong><a href="http://genome.ucsc.edu">UCSC Genome Browser</a></strong>: it has a MySQL server but it&#8217;s rather slow and I did not want to clog it up. Using their tables and .sql files I managed to get a first shot at annotation, but about 2,000 genes were without annotation!</li>
<li><strong><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene">NCBI&#8217;s own Entrez Gene</a></strong>: This needs EUtils, and Biopython has no parser for Entrez Gene XML entries. I had to scrap the idea for lack of time.</li>
<li><strong><a href="http://www.ensembl.org">Ensembl</a></strong>: I decided to use the <a href="http://www.biomart.org">Biomart</a> service, through Rpy. There were missing genes, and sometimes the IDs were &#8220;converted&#8221; into something else (I had no time to figure out what was happening). Also, some genes that are perfectly valid in Entrez Gene were not present in Ensembl.</li>
</ul>
<p>In the end I just grabbed <a href="http://www.bioconductor.org/packages/2.3/data/annotation/html/org.Hs.eg.db.html">Bioconductor&#8217;s &#8220;org.Hs.eg.db&#8221; package</a> and used its sqlite gene database (from Entrez Gene) to annotate the list, with only 97 missing IDs (mostly genes that had changed identifiers). However, this effort revealed a problem: <em>the annotations are not consistent between databases</em>. This is a real pain when doing microarray-based analysis, because you often have a large number of genes, and a perceived lack of annotation might lead to a number of them being discarded.</p>
<p>I thought the situation was better than this. If I annotate genes in different databases with the same ID, I expect to get identical results. I mean, it&#8217;s not like Gene or Ensembl have few resources&#8230; or am I wrong?</p>
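<p>For the curious, the sqlite lookup described above can be sketched roughly as follows. This is only a minimal Python illustration, not the code I actually ran: the table <code>gene_annotation</code>, its columns, and the helper names <code>build_demo_db</code> and <code>annotate</code> are hypothetical stand-ins (the real Bioconductor database has its own schema), and the two rows are demo data.</p>
<pre class="brush: python; title: ; notranslate">

import sqlite3

FIELDS = ("symbol", "gene_name", "chromosome", "cytoband")


def build_demo_db():
    """Create a tiny in-memory stand-in for an Entrez Gene annotation database.

    Table and column names are hypothetical, chosen for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene_annotation ("
        "gene_id TEXT PRIMARY KEY, symbol TEXT, gene_name TEXT, "
        "chromosome TEXT, cytoband TEXT)"
    )
    conn.executemany(
        "INSERT INTO gene_annotation VALUES (?, ?, ?, ?, ?)",
        [
            ("7157", "TP53", "tumor protein p53", "17", "17p13.1"),
            ("672", "BRCA1", "BRCA1 DNA repair associated", "17", "17q21.31"),
        ],
    )
    return conn


def annotate(conn, gene_ids):
    """Return (annotations, missing) for an iterable of Entrez Gene IDs."""
    query = (
        "SELECT symbol, gene_name, chromosome, cytoband "
        "FROM gene_annotation WHERE gene_id = ?"
    )
    annotations = {}
    missing = []
    for gene_id in gene_ids:
        key = str(gene_id)
        row = conn.execute(query, (key,)).fetchone()
        if row is None:
            # Keep track of unannotated IDs instead of silently dropping them.
            missing.append(key)
        else:
            annotations[key] = dict(zip(FIELDS, row))
    return annotations, missing
</pre>
<p>Returning the missing IDs as a separate list is deliberate: it makes the gaps visible, so that genes are not discarded just because one database did not know about them.</p>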
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2008/11/the-plague-of-cross-database-annotations/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performance and R</title>
		<link>http://www.dennogumi.org/2008/04/performance-and-r</link>
		<comments>http://www.dennogumi.org/2008/04/performance-and-r#comments</comments>
		<pubDate>Sat, 05 Apr 2008 13:12:18 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[microarray]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=390</guid>
		<description><![CDATA[I often wonder why people resort only to R when working with microarrays. I can understand that Bioconductor offers a plethora of different packages and that R&#8217;s statistical functions come in handy for many applications, but still, I think people &#8230; <a href="http://www.dennogumi.org/2008/04/performance-and-r">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I often wonder why people resort only to R when working with microarrays. I can understand that <a title="Bioconductor home page" href="http://www.bioconductor.org">Bioconductor</a> offers a plethora of different packages and that R&#8217;s statistical functions come in handy for many applications, but still, I think people underestimate the impact of performance.</p>
<p>R is not a fast language at all, it doesn&#8217;t parallelize well on HPC systems (at least according to the people studying the matter that I&#8217;ve talked to), and in general it is a memory and resource hog. For example, it takes much longer to perform RMA via R than with <a title="RMAExpress" href="http://rmaexpress.bmbolstad.com/">RMAExpress</a> (which is a C++ application), and the latter also makes better use of memory. I can understand the complexity of some statistical procedures, but what about parsing GEO files?</p>
<p>The surprising aspect is that, aside from a few exceptions (like the aforementioned RMAExpress), no one has tried to write faster implementations of certain algorithms. I for one would welcome a non-R implementation of <abbr title="Significance Analysis of Microarrays">SAM</abbr> (the original implementation works in Excel&#8230; ugh) or similar algorithms. Otherwise we will be stuck with programs that are interesting, but way too memory hungry (<a title="AMDA: an R package for the automated microarray data analysis." href="http://www.ncbi.nlm.nih.gov/pubmed/16824223?ordinalpos=4&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum">AMDA</a> comes to mind).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2008/04/performance-and-r/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Follow up on meta-analysis</title>
		<link>http://www.dennogumi.org/2008/02/follow-up-on-meta-analysis</link>
		<comments>http://www.dennogumi.org/2008/02/follow-up-on-meta-analysis#comments</comments>
		<pubDate>Thu, 28 Feb 2008 19:42:15 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[meta-analysis]]></category>
		<category><![CDATA[microarray]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/2008/02/28/follow-up-on-meta-analysis</guid>
		<description><![CDATA[Fourteen days since my last post. Quite a while, indeed. Mostly I&#8217;ve been swamped with work and some health-related issues. Anyway, I thought I&#8217;d follow up on the meta-analysis matter I discussed in my last post. It turns &#8230; <a href="http://www.dennogumi.org/2008/02/follow-up-on-meta-analysis">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Fourteen days since my last post. Quite a while, indeed. Mostly I&#8217;ve been swamped with work and some health-related issues. Anyway, I thought I&#8217;d follow up on the meta-analysis matter I discussed in my last post.</p>
<p>It turns out that both limma and the data sets are at fault. Apparently the raw data files found in the Stanford Microarray Database have different lengths, gene-wise (perhaps because not all spots on the array were good?), while limma needs tables of equal length to form a single object and does not perform any checking itself. I stumbled upon the same problem when doing my thesis, but I used a hack to work around it.</p>
<p>According to the documentation, the &#8220;merge&#8221; command should be used to deal with these cases, but here&#8217;s what I get:</p>
<pre class="brush: cpp; title: ; notranslate">

&gt; RG1 = read.maimages(file=&quot;file1.txt&quot;,source=&quot;smd&quot;)
Read file1.txt
&gt; RG2 = read.maimages(file=&quot;file2.txt&quot;,source=&quot;smd&quot;)
Read file2.txt
&gt; merge(RG1,RG2)
Error in merge(RG1,RG2): Need row names to align on
&gt; rownames(RG1)
NULL
&gt; rownames(RG2)
NULL </pre>
<p>I&#8217;m going to ask the Bioconductor ML and see what they tell me.</p>
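<p>To make clear what &#8220;aligning on row names&#8221; means, here is a minimal sketch of the idea in Python rather than R. It is not limma&#8217;s implementation: the function <code>align_tables</code>, the spot identifiers and the intensities are all made up for illustration, and it assumes that each file can be read into a mapping from a spot identifier to a value.</p>
<pre class="brush: python; title: ; notranslate">

def align_tables(*tables):
    """Align several {spot_id: value} tables on the union of their spot IDs.

    Spots missing from a table are filled with None, so every output row
    has exactly one entry per input table.
    """
    all_ids = sorted(set().union(*tables))
    return {spot: [table.get(spot) for table in tables] for spot in all_ids}


# Hypothetical spot identifiers and intensities, for illustration only.
array1 = {"spot_a": 1.5, "spot_b": 2.0, "spot_c": 0.7}
array2 = {"spot_a": 1.1, "spot_c": 0.9}

merged = align_tables(array1, array2)
</pre>
<p>Without identifiers to use as keys there is nothing to align on, which is essentially what the error message above complains about.</p>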
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2008/02/follow-up-on-meta-analysis/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Meta analysis difficulty increasing</title>
		<link>http://www.dennogumi.org/2008/02/meta-analysis-difficulty-increasing</link>
		<comments>http://www.dennogumi.org/2008/02/meta-analysis-difficulty-increasing#comments</comments>
		<pubDate>Thu, 14 Feb 2008 20:17:09 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[meta-analysis]]></category>
		<category><![CDATA[microarray]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/2008/02/14/meta-analysis-difficulty-increasing</guid>
		<description><![CDATA[Once again, over the past few days I&#8217;ve been banging my head against the wall, because doing meta-analysis with microarray data is more difficult than it seems. The problem sometimes lies in the data, sometimes lies in the analysis software &#8230; <a href="http://www.dennogumi.org/2008/02/meta-analysis-difficulty-increasing">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Once again, over the past few days I&#8217;ve been banging my head against the wall, because doing meta-analysis with microarray data is more difficult than it seems.</p>
<p>The problem sometimes lies in the data, sometimes in the analysis software, and sometimes in a combination of factors. When working on a public data set (Zhao et al., 2005), I had to start the analysis from the raw data. I tried using both the limma and marray Bioconductor packages, but both of them bail out with cryptic error messages. From what I&#8217;ve learnt by googling around, it seems that R doesn&#8217;t like batch loading of tables of different lengths.</p>
<p>I have 177 samples and I <strong>have</strong> to normalize them all together. Apparently this is a quirk of marray and limma (or worse, of R itself) which is preventing me from working properly. And this is not the first time it has happened, either: in the past year I&#8217;ve lost a lot of time dealing with software issues rather than performing real analysis. The problem has already been posted on some R mailing lists (and on BioC, too), but judging from the responses I doubt I&#8217;ll see a solution.</p>
<p>I guess I&#8217;ll have to work around this somehow (and of course, this doesn&#8217;t improve the idea I have of R&#8230;).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2008/02/meta-analysis-difficulty-increasing/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Gene identifiers</title>
		<link>http://www.dennogumi.org/2007/11/gene-identifiers</link>
		<comments>http://www.dennogumi.org/2007/11/gene-identifiers#comments</comments>
		<pubDate>Thu, 15 Nov 2007 19:57:16 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[microarray]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/2007/11/15/gene-identifiers</guid>
		<description><![CDATA[While working today on an annotation class in Python I stumbled on a problem. Normally I work with lists of genes that are consistent, i.e. all Entrez Gene IDs (or RefSeq IDs, or Genome Browser IDs&#8230;), but today I had &#8230; <a href="http://www.dennogumi.org/2007/11/gene-identifiers">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>While working today on an annotation class in Python I stumbled on a problem. Normally I work with lists of genes that are consistent, i.e. all Entrez Gene IDs (or RefSeq IDs, or Genome Browser IDs&#8230;), but today I had a list of mixed identifiers.</p>
<p>The subsequent idea was &#8220;let&#8217;s implement auto-detection of common identifiers in the class&#8221;. The problem is&#8230; is there any actual documentation on how identifiers are made? So far, using regular expressions, I&#8217;ve tracked down a few:</p>
<ul>
<li>RefSeq</li>
<li>GenBank</li>
<li>Entrez Gene</li>
<li>UCSC Genome Browser</li>
<li>Ensembl</li>
</ul>
<p>However, I have no idea whether I have covered all types of these IDs. Does anyone know where to look this information up?</p>
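<p>To give an idea of the approach, here is a minimal Python sketch of regex-based auto-detection. The patterns are my own assumptions about the most common shapes of these identifiers, not an official specification, and they are certainly not exhaustive (RefSeq in particular has more prefixes than those listed); <code>ID_PATTERNS</code> and <code>detect_id_type</code> are names made up for this example.</p>
<pre class="brush: python; title: ; notranslate">

import re

# Illustrative, non-exhaustive patterns. Each one is an assumption about
# the most common form of the identifier, not an official specification.
ID_PATTERNS = [
    ("RefSeq", re.compile(r"^(NM|NR|NP|XM|XR|XP|NC|NG|NT|NW)_\d+(\.\d+)?$")),
    ("Ensembl", re.compile(r"^ENS[A-Z]{0,4}[GTPE]\d{11}(\.\d+)?$")),
    ("UCSC Genome Browser", re.compile(r"^uc\d{3}[a-z]{3}(\.\d+)?$")),
    ("GenBank", re.compile(r"^([A-Z]\d{5}|[A-Z]{2}\d{6})(\.\d+)?$")),
    ("Entrez Gene", re.compile(r"^\d+$")),
]


def detect_id_type(identifier):
    """Return the name of the first matching identifier type, or None."""
    candidate = identifier.strip()
    for name, pattern in ID_PATTERNS:
        if pattern.match(candidate):
            return name
    return None


# Classify a mixed list; anything unrecognized maps to None.
mixed = ["NM_000546.5", "7157", "ENSG00000141510", "uc002gij.3", "AF123456"]
detected = {gene: detect_id_type(gene) for gene in mixed}
</pre>
<p>The order of the patterns matters: the purely numeric Entrez Gene pattern comes last because it is the least specific.</p>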
<p>(On a related note: my thesis defense will be on January 14th, 2008, so I have to get the printing going)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2007/11/gene-identifiers/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Easy RMA: RMAExpress</title>
		<link>http://www.dennogumi.org/2007/10/easy-rma-rmaexpress</link>
		<comments>http://www.dennogumi.org/2007/10/easy-rma-rmaexpress#comments</comments>
		<pubDate>Thu, 04 Oct 2007 21:17:56 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[affymetrix]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[microarray]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/2007/10/04/easy-rma-rmaexpress</guid>
		<description><![CDATA[Today I was looking for an easy way to do some calculations of raw expression data on Affymetrix arrays, but I didn&#8217;t want to use R: I have already mentioned how I don&#8217;t like its design and implementation. While looking &#8230; <a href="http://www.dennogumi.org/2007/10/easy-rma-rmaexpress">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Today I was looking for an easy way to do some calculations of raw expression data on Affymetrix arrays, but I didn&#8217;t want to use <a href="http://www.r-project.org" title="The R programming language">R</a>: I have already mentioned how I don&#8217;t like its design and implementation. While looking for some documentation, I stumbled upon this nifty little program called <a href="http://rmaexpress.bmbolstad.com/" title="RMAExpress">RMAExpress</a>.</p>
<p><span id="more-296"></span>Let me first say what RMA is about: it stands for &#8220;Robust Multi-array Average&#8221; and is a model-based quantification method for Affymetrix arrays, <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=12925520&amp;ordinalpos=3&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum" title="Exploration, normalization, and summaries of high density oligonucleotide array probe level data.">originally developed by Irizarry <em>et al.</em></a> It has a number of advantages over the Microarray Suite 5 (MAS5) algorithm used by Affymetrix software, especially with weakly expressed transcripts. It is commonly made up of three steps: background correction, quantile normalization and summarization by median polish.</p>
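<p>As an illustration of the middle step, here is a minimal Python sketch of quantile normalization. It is not how RMAExpress or any R package implements it: the function name <code>quantile_normalize</code> and the numbers are made up, ties are simply broken by position, and the input is assumed to be one equal-length list of intensities per array.</p>
<pre class="brush: python; title: ; notranslate">

def quantile_normalize(columns):
    """Quantile-normalize equal-length lists of intensities (one per array).

    Ties are broken by position; a fuller implementation would typically
    average the values assigned to tied entries.
    """
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all arrays must have the same number of probes")
    # Mean of the k-th smallest value across arrays, for every rank k.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(vals) / len(columns) for vals in zip(*sorted_cols)]
    normalized = []
    for col in columns:
        # Indices of this array's probes, ordered from lowest to highest.
        order = sorted(range(n_rows), key=lambda i: col[i])
        result = [0.0] * n_rows
        for rank, idx in enumerate(order):
            result[idx] = rank_means[rank]
        normalized.append(result)
    return normalized


# Two made-up arrays with three probes each.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
</pre>
<p>After this step every array has exactly the same distribution of values; only the order of the probes within each array differs.</p>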
<p>RMAExpress is a C++, GUI-based program (using <a href="http://www.wxwidgets.org/" title="wxWidgets">wxWidgets</a>) that performs this process. The main advantage over the various R implementations is speed, as R doesn&#8217;t really excel in this regard. You can adjust the various RMA parameters, and you can also view the model representations, to see if some areas on the array perform differently (e.g., when there are irregularities in the signal intensities).</p>
<p>What I liked best is that you can use custom chip definition files (CDFs). <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=16284200&amp;ordinalpos=8&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum" title="Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data.">Dai <em>et al.</em></a> have already shown that old 3&#8242; GeneChips have outdated annotations, and have proposed new CDFs to compensate. We have already tested their improved CDFs, and they give a nice increase in the number of annotated genes. RMAExpress processes these CDFs just fine.</p>
<p>Finally, you can export data either in log2 format (to use in procedures like <a href="http://www-stat.stanford.edu/~tibs/SAM/" title="Significance Analysis of Microarrays">SAM</a>) or in absolute form (which I need for my work). The program is extremely light and processes a good number of arrays fairly quickly. Windows users have a pre-built binary, while Linux users need to build from source. The instructions on the page are overly complicated, so here&#8217;s how I managed to build it on Kubuntu:</p>
<pre class="brush: cpp; title: ; notranslate">

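# Install the wxWidgets (GTK) development files
sudo aptitude install libwxgtk2.8-dev
# Unpack the sources in a scratch directory and build
mkdir tmp
cd tmp
tar xvzf /path/to/RMAExpress_1.0beta3_src.tar.gz
make all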
</pre>
<p>After that, just run RMAExpress from its directory.</p>
<p>All in all, I&#8217;m quite pleased with the program and I will keep using it in the future.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2007/10/easy-rma-rmaexpress/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

