<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>dennogumi.org &#187; bioinformatics</title>
	<atom:link href="http://www.dennogumi.org/tag/bioinformatics/feed" rel="self" type="application/rss+xml" />
	<link>http://www.dennogumi.org</link>
	<description>On the web since 1999</description>
	<lastBuildDate>Fri, 06 Jan 2012 14:56:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Gene search applet: suggestions and code review needed</title>
		<link>http://www.dennogumi.org/2009/03/gene-search-applet-suggestions-and-code-review-needed</link>
		<comments>http://www.dennogumi.org/2009/03/gene-search-applet-suggestions-and-code-review-needed#comments</comments>
		<pubDate>Tue, 31 Mar 2009 17:33:09 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[KDE]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[bioinformatics]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=594</guid>
		<description><![CDATA[In the past months I&#8217;ve always wanted to write a small Plasma applet to aid me in some boring tasks as a bioinformatician. One example (for the non-scientific crowd out there) is when I find a specific gene out of &#8230; <a href="http://www.dennogumi.org/2009/03/gene-search-applet-suggestions-and-code-review-needed">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In the past months I&#8217;ve always wanted to write a small Plasma applet to aid me in some boring tasks as a bioinformatician. One example (for the non-scientific crowd out there) is when I find a specific gene out of my analysis work which I want to take a look at. I am often lazy, so instead of firing up the browser to look at the online resources, I wanted to write something which could access said resources programmatically.</p>
<p><span id="more-594"></span></p>
<p>I found a way thanks to the <a href="http://biopython.org" title="The Biopython project">Biopython project</a>, which offers a Python module to access the resources of the <a href="http://www.ncbi.nlm.nih.gov" title="NCBI">National Center for Biotechnology Information (NCBI)</a> through an interface to their <a href="http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html" title="EUtils web page">EUtils</a>. Since the back-end was (almost) taken care of, I set out to write a small Plasma applet, which is what I&#8217;m presenting today. It&#8217;s written in Python and uses the Python ScriptEngine. Currently, it searches the &#8220;Gene&#8221; database at NCBI by &#8220;Entrez Gene ID&#8221;, a numerical ID that uniquely identifies a gene record, and returns the name, official symbol, organism, and a description if one is present. It does not support anything else (see below).</p>
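<p>For readers who would rather not install Biopython, the kind of request it wraps can be sketched with the Python standard library alone. This is a rough illustration and not the applet&#8217;s actual code: the function names are made up for the example, and it assumes the public ESummary endpoint of EUtils with its db, id and retmode parameters.</p>
<pre class="brush: python; title: ; notranslate">
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI EUtils services
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def gene_summary_url(gene_id):
    """Build the ESummary URL for a single Entrez Gene ID (hypothetical helper)."""
    query = urlencode({"db": "gene", "id": str(gene_id), "retmode": "xml"})
    return EUTILS_BASE + "esummary.fcgi?" + query


def fetch_gene_summary(gene_id):
    """Download the raw XML summary; this needs network access."""
    with urlopen(gene_summary_url(gene_id)) as response:
        return response.read().decode("utf-8")
</pre>
<p>Calling fetch_gene_summary(10000) should return the raw XML summary for that ID; extracting the name, symbol and organism from it is left out of this sketch.</p>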
<p>The code lives in <a href="http://github.com/cswegger/plasma-genesearch/tree/master" title="Code repository">a git repository at github</a>. <strong>WARNING: </strong>The code may be a complete mess (I&#8217;m not too well versed in GUI stuff, I mostly do text file manipulation). If you are feeling daring, you can obtain and install it quite simply:</p>
<p>
<pre class="brush: bash; title: ; notranslate">git clone git://github.com/cswegger/plasma-genesearch.git
cd plasma-genesearch
zip -r ../plasma-genesearch.plasmoid *
plasmapkg -i ../plasma-genesearch.plasmoid</pre>
</p>
<p>After that you will see an &#8220;Entrez Gene Searcher&#8221; in your add applets dialog. Once added, it&#8217;ll look like this:</p>
<p align="center"><img src="http://www.dennogumi.org/wp-content/uploads/2009/03/plasma-genesearch1.png?cda6c1" title="Gene searcher" alt="Gene searcher image" /></p>
<p align="left">Pretty horrible, isn&#8217;t it? Well, once you get past that, you can input an ID (only IDs will work for now) in the text field (which doesn&#8217;t clear the text: see further on) and push &#8220;Go!&#8221;. The following is an example with ID 10000, which corresponds to the human gene <em>AKT3</em>:</p>
<p align="center"><img src="http://www.dennogumi.org/wp-content/uploads/2009/03/plasma-genesearch2.png?cda6c1" title="Gene search results" alt="Gene search results image" /></p>
<p align="left">&#8220;Search again&#8221; will bring you back to the search form.</p>
<p align="left">Now, what has this to do with Planet KDE? Well, I&#8217;m asking for some code review from the community, if possible, and for suggestions to improve the horrid default look. I am especially interested in advice on layouts, since I do not quite understand how they work: as far as I can tell my code should not work, and yet it <em>does&#8230;.</em> </p>
<p align="left">Other things that need to be improved are:</p>
<ul>
<li align="left">The Plasma.TextEdit is not cleared upon clicking. Is there a signal I can catch for that, so I can connect it to clear()?</li>
<li align="left">Proper searching. Bio.Entrez already does this: what I need is  a way to display the records properly. </li>
<li align="left">A way to link the names to URLs, and have them open in Konqueror. </li>
</ul>
<p>That should be it. I hope to work on it some more next weekend&#8230;.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2009/03/gene-search-applet-suggestions-and-code-review-needed/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>FOSS and research</title>
		<link>http://www.dennogumi.org/2008/05/foss-and-research</link>
		<comments>http://www.dennogumi.org/2008/05/foss-and-research#comments</comments>
		<pubDate>Sat, 10 May 2008 07:30:48 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[free software]]></category>
		<category><![CDATA[publish or perish]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=400</guid>
		<description><![CDATA[I&#8217;ve been wondering about why FOSS is often compared to the academic world, but at least in my limited experience, I see few people that grasp its concept in the world of research. On a quick look, developing FOSS in &#8230; <a href="http://www.dennogumi.org/2008/05/foss-and-research">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been wondering about why FOSS is often compared to the academic world, but at least in my limited experience, I see few people that grasp its concept in the world of research. On a quick look, developing FOSS in a research environment would be very good: not only would you get publicly available results when you publish, but you could also make sure that, in an extreme case, your application will be carried on by someone else should you not be able to continue development.</p>
<p>At least in the life sciences, it&#8217;s hard to see such a mentality. I can understand the publish or perish frenzy, but at the same time, don&#8217;t we all remember published and unmaintained software? For me, such an idea would be optimal. Once the paper is out, you can release your software (GPL would be best) and make sure someone will improve or maintain it. Of course you won&#8217;t be able to publish for each upgrade you do, but I would generally think of that as a bad policy, one made just to increase the publication count.</p>
<p>Does something like that happen with FOSS in other research areas?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2008/05/foss-and-research/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performance and R</title>
		<link>http://www.dennogumi.org/2008/04/performance-and-r</link>
		<comments>http://www.dennogumi.org/2008/04/performance-and-r#comments</comments>
		<pubDate>Sat, 05 Apr 2008 13:12:18 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[microarray]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=390</guid>
		<description><![CDATA[I&#8217;m often wondering why people only resort to R when working with microarrays. I can understand that Bioconductor offers a plethora of different packages and that R&#8217;s statistical functions come in handy for many applications, but still, I think people &#8230; <a href="http://www.dennogumi.org/2008/04/performance-and-r">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m often wondering why people only resort to R when working with microarrays. I can understand that <a title="Bioconductor home page" href="http://www.bioconductor.org">Bioconductor</a> offers a plethora of different packages and that R&#8217;s statistical functions come in handy for many applications, but still, I think people underestimate the impact of performance.</p>
<p>R is not a fast language at all, it doesn&#8217;t parallelize well on HPC systems (at least from the talks I&#8217;ve had with people studying the matter), and in general it is a memory and resource hog. For example, it takes much longer to perform RMA via R than with <a title="RMAExpress" href="http://rmaexpress.bmbolstad.com/">RMAExpress</a> (which is a C++ application); the latter also makes better use of memory. I can understand the complexity of some statistical procedures, but what about parsing GEO files?</p>
<p>The surprising aspect is that, aside from a few exceptions (like the aforementioned RMAExpress), no one has tried to write faster implementations of certain algorithms. I for one would welcome a non-R implementation of <abbr title="Significance Analysis of Microarrays">SAM</abbr> (the original implementation works in Excel&#8230; ugh) or similar algorithms. Otherwise we will be stuck with programs that are interesting, but way too memory hungry (<a title="AMDA: an R package for the automated microarray data analysis." href="http://www.ncbi.nlm.nih.gov/pubmed/16824223?ordinalpos=4&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum">AMDA</a> comes to mind).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2008/04/performance-and-r/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Gene identifiers</title>
		<link>http://www.dennogumi.org/2007/11/gene-identifiers</link>
		<comments>http://www.dennogumi.org/2007/11/gene-identifiers#comments</comments>
		<pubDate>Thu, 15 Nov 2007 19:57:16 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[microarray]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/2007/11/15/gene-identifiers</guid>
		<description><![CDATA[While working today on an annotation class in Python I stumbled on a problem. Normally I work with lists of genes that are consistent, i.e. all Entrez Gene IDs (or RefSeq IDs, or Genome Browser IDs&#8230;), but today I had &#8230; <a href="http://www.dennogumi.org/2007/11/gene-identifiers">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>While working today on an annotation class in Python I stumbled on a problem. Normally I work with lists of genes that are consistent, i.e. all Entrez Gene IDs (or RefSeq IDs, or Genome Browser IDs&#8230;), but today I had a list of mixed identifiers.</p>
<p>The subsequent idea was &#8220;let&#8217;s implement auto-detection of common identifiers in the class&#8221;. The problem is&#8230; is there any actual documentation on how identifiers are made? So far, using regular expressions, I&#8217;ve tracked down a few:</p>
<ul>
<li>RefSeq</li>
<li>GenBank</li>
<li>Entrez Gene</li>
<li>UCSC Genome Browser</li>
<li>Ensembl</li>
</ul>
<p>However, I have no idea whether I have covered all variants of these IDs. Does anyone know where I can look this information up?</p>
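<p>To give an idea of the approach, here is a minimal sketch of regular-expression based detection. It is not the code from my class: the patterns are deliberately simplified assumptions (real accession formats have more variants than shown), and the function name is made up for the example.</p>
<pre class="brush: python; title: ; notranslate">
import re

# Simplified, assumed patterns; checked in order, first match wins
PATTERNS = [
    ("RefSeq", re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")),            # e.g. NM_000546.5
    ("Ensembl", re.compile(r"^ENS[A-Z]*[GTP]\d{11}(\.\d+)?$")),   # e.g. ENSG00000139618
    ("UCSC Genome Browser", re.compile(r"^uc\d{3}[a-z]{3}\.\d+$")),  # e.g. uc002iys.1
    ("GenBank", re.compile(r"^[A-Z]{1,2}\d{5,6}(\.\d+)?$")),      # e.g. U49845
    ("Entrez Gene", re.compile(r"^\d+$")),                        # e.g. 7157
]


def guess_identifier_type(identifier):
    """Return the name of the first matching pattern, or None if nothing matches."""
    identifier = identifier.strip()
    for name, pattern in PATTERNS:
        if pattern.match(identifier):
            return name
    return None
</pre>
<p>The order of the list matters: the purely numeric Entrez Gene pattern is the least specific, so it is tried last.</p>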
<p>(On a related note: my thesis defense will be on January 14th, 2008, so I have to get the printing going)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2007/11/gene-identifiers/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data clustering with Python</title>
		<link>http://www.dennogumi.org/2007/11/data-clustering-with-python</link>
		<comments>http://www.dennogumi.org/2007/11/data-clustering-with-python#comments</comments>
		<pubDate>Wed, 07 Nov 2007 18:15:29 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[cluster]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/2007/11/07/data-clustering-with-python</guid>
		<description><![CDATA[Notice: Just now I realized this has been linked to from a Stack Overflow question. I recently wrote a new post that uses a different technique and a combination of R and Python. Check it out! Following up on my recent post, &#8230; <a href="http://www.dennogumi.org/2007/11/data-clustering-with-python">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>Notice:</strong> Just now I realized this has been linked to from <a href="http://stackoverflow.com/questions/5002783/best-python-clustering-library-to-use-for-product-data-analysis">a Stack Overflow question</a>. I recently wrote a new post that uses a different technique and a combination of R and Python. <a href="http://www.dennogumi.org/2011/05/multiscale-bootstrap-clustering-with-python-and-r">Check it out!</a></p>
<p>Following up on my recent post, I&#8217;ve been looking for alternatives to TMeV. So far I&#8217;ve found the R package pvclust and the <a href="http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm#pycluster" title="Pycluster">Pycluster library</a>, part of <a href="http://biopython.org" title="Biopython">Biopython</a>. The first one also performs bootstrapping (I&#8217;m not sure if it&#8217;s similar to what support trees do, but it&#8217;s still better than no resampling at all). I&#8217;ve found <a href="http://python-cluster.sourceforge.net/" title="python-cluster">another Python project</a> but it is still too basic to do what I need.</p>
<p><span id="more-330"></span><br />
Pvclust would be my first interest, but it only plots dendrograms and not heatmaps, and the clustering must be done twice by transposing the data (it only clusters columns). <a href="http://www.is.titech.ac.jp/~shimo/prog/pvclust/" title="Pvclust page">The package&#8217;s web page</a> shows the various options and what to do with it.</p>
<p>Pycluster, on the other hand, can be used to generate files which can be read by the Java TreeView program, where you can view a heat map of the results and their annotations. Although there&#8217;s documentation available, it is not part of the Biopython documentation (as usual, I&#8217;d say: lack of documentation is a plague for Biopython). In any case, doing a cluster analysis is rather simple, but remember that we need two cluster runs (one for genes, the other for experiments). Here I show an example with hierarchical clustering, but <a href="http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/cluster.pdf" title="Pycluster documentation on chapter 8">the documentation</a> (the Python part is in chapter 8) also has examples with other methods such as SOMs or k-means.</p>
<pre class="brush: python; title: ; notranslate">

from Bio.Cluster import DataFile

#   Load data, in Cluster format
data = DataFile(&quot;somefile.txt&quot;)

#   Cluster genes (rows) using Pearson's correlation and average linkage
gene_clustering = data.treecluster(method=&quot;a&quot;, dist=&quot;c&quot;, transpose=0)

#   Same as above, but clustering samples (columns)
exp_clustering = data.treecluster(method=&quot;a&quot;, dist=&quot;c&quot;, transpose=1)

#   We then save the results to a series of files to view in Java TreeView
data.save(&quot;name&quot;, gene_clustering, exp_clustering)
</pre>
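<p>In case the two options above look cryptic: the &#8220;c&#8221; distance is one minus Pearson&#8217;s correlation, and average linkage takes the mean of all pairwise distances between the members of two clusters. The following plain Python sketch only illustrates those two definitions; it is not how Pycluster implements them, and the function names are made up for the example.</p>
<pre class="brush: python; title: ; notranslate">
from math import sqrt


def pearson_distance(x, y):
    """Distance between two profiles, defined as 1 - Pearson's correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / sqrt(var_x * var_y)


def average_linkage(cluster_a, cluster_b):
    """Mean of all pairwise distances between the members of two clusters."""
    distances = [pearson_distance(x, y) for x in cluster_a for y in cluster_b]
    return sum(distances) / len(distances)
</pre>
<p>Two perfectly correlated profiles are at distance 0, two perfectly anti-correlated ones at distance 2. Profiles with no variation at all (constant values) would cause a division by zero in this sketch.</p>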
<p><a href="http://jtreeview.sourceforge.net/" title="Java Tree View">Java TreeView</a> is a program to view trees and heat maps. Unlike its counterpart TreeView, it&#8217;s truly cross-platform (Java) and GPLed, a nice added bonus. You can load the files directly and display the results like in this picture, taken with the sample data available on the project page.</p>
<p align="center"> <a href="http://www.dennogumi.org/wp-content/uploads/2007/11/treeview.png?cda6c1" class="thickbox" title="Java TreeView"><img src="http://www.dennogumi.org/wp-content/uploads/2007/11/treeview.thumbnail.png?cda6c1" alt="Java TreeView" class="imageframe" height="133" width="200" /></a></p>
<p>It&#8217;s still not perfect (no data shown on the main map page, only with the detailed view) but a good start, nevertheless. I&#8217;ll investigate whether I can complement TMeV usage with these tools.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2007/11/data-clustering-with-python/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Buggy bioinformatics software</title>
		<link>http://www.dennogumi.org/2007/11/buggy-bioinformatics-software</link>
		<comments>http://www.dennogumi.org/2007/11/buggy-bioinformatics-software#comments</comments>
		<pubDate>Wed, 07 Nov 2007 14:43:52 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[publish or perish]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/2007/11/07/buggy-bioinformatics-software</guid>
		<description><![CDATA[As people who read my science-related posts know already, I&#8217;m not a big fan of software made just to support a publication. Recently I&#8217;ve stumbled again into similar software. Namely, I&#8217;m talking about the TIGR Multiexperiment Viewer (TMeV), &#8230; <a href="http://www.dennogumi.org/2007/11/buggy-bioinformatics-software">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As people who read my science-related posts know already, I&#8217;m not a big fan of software made just to support a publication. Recently I&#8217;ve stumbled again into similar software. Namely, I&#8217;m talking about the <a href="http://www.tm4.org/mev.html" title="Multiexperiment Viewer">TIGR Multiexperiment Viewer (TMeV)</a>, a Java-based program which is often used for microarray analysis. It&#8217;s not exactly &#8220;fit for publication&#8221; software, since it reached version 4 last year, but it shows some of the problems (mentioned already) with releasing bioinformatics software.</p>
<p>I use TMeV mostly because I didn&#8217;t find any other implementation of the hierarchical clustering algorithm with support trees. However, I&#8217;ve stumbled upon a very annoying bug in the most recent version. Normally I use average linkage clustering with Pearson&#8217;s correlation as the distance metric, plus gene and sample bootstrapping: with certain files this makes TMeV report errors at random during the iterations.</p>
<p><span id="more-328"></span><br />
<a href="http://www.dennogumi.org/wp-content/uploads/2007/11/tmev.png?cda6c1" class="thickbox" title="What a meaningful error…"><img src="http://www.dennogumi.org/wp-content/uploads/2007/11/tmev.thumbnail.png?cda6c1" alt="What a meaningful error…" class="imageframe" align="left" height="78" width="200" /></a>What you see on the left is the &#8220;error&#8221; that TMeV gives. As you can see, it is anything but informative. Digging a bit shows that Java throws an ArrayIndexOutOfBoundsException, but I wonder why, since this happens with different data files that have nothing in common at all.</p>
<p>Since I don&#8217;t want to pass for someone who just whines, I contacted the MeV developers and offered to provide example files as well, but I got no response at all. Luckily, the older version of MeV (3.1) is still around and works.</p>
<p>Now, I wonder how this was released in the first place: it&#8217;s not the only bug I found, the other being a scaling algorithm that would mistake the files for Affymetrix MAS5 expression values (when they have nothing in common with that). There is absolutely no mention of that in the TMeV page. At least some release notes would be helpful.</p>
<p>Why am I making a big fuss over this? Because this is not the first time that I&#8217;ve wasted my time working around bugs instead of using the software for what it was meant: research. Instead, there seems to be little interest since the publication is already out. Didn&#8217;t someone tell those people that if you release some software (especially if it is widely used) you&#8217;re expected to provide at least a little maintenance?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2007/11/buggy-bioinformatics-software/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SOFT file woes</title>
		<link>http://www.dennogumi.org/2007/10/soft-file-woes</link>
		<comments>http://www.dennogumi.org/2007/10/soft-file-woes#comments</comments>
		<pubDate>Tue, 09 Oct 2007 20:00:23 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/2007/10/09/soft-file-woes</guid>
		<description><![CDATA[Today I started working on a data set published on GEO. As the sample data were somehow inconsistent (they mentioned 23 controls when I found 28), I decided to parse the SOFT file from GEO in order to get the &#8230; <a href="http://www.dennogumi.org/2007/10/soft-file-woes">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Today I started working on a data set published on <a href="http://www.ncbi.nlm.nih.gov/geo/" title="Gene Expression Omnibus">GEO</a>. As the sample data were somehow inconsistent (they mentioned 23 controls when I found 28), I decided to parse the <a href="http://www.ncbi.nlm.nih.gov/projects/geo/info/soft2.html#SOFTformat" title="GEO SOFT Deposit">SOFT</a> file from GEO in order to get the exact sample information.</p>
<p>I made a grave mistake. First of all, <a href="http://www.biopython.org" title="Biopython">Biopython</a>&#8217;s SOFT parser is horribly broken (it doesn&#8217;t work at all) and quite undocumented: I could work around the lack of documentation (API docs) but not the fact that it doesn&#8217;t work. So I turned to <a href="http://www.r-project.org" title="The R Project for Statistical Computing">R</a>, which offers a GEO query module through <a href="http://www.bioconductor.org" title="Bioconductor">Bioconductor</a>.</p>
<p>Again, that proved to be a terrible mistake. For a file containing 183 samples, the analysis has been running for <strong>four hours</strong> with no sign of completing anytime soon (not to mention a possible memory leak). After this, I gave up. I&#8217;m going to get the reduced data sheet and write a small parser in Python myself.</p>
<p>What is frustrating is the lack of quality: I could concentrate on my own work rather than reinventing the wheel for the nth time if the existing implementations worked. What&#8217;s the point in releasing non-working software? I could understand bugs, but this is one step further.</p>
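<p>As for the small parser mentioned above, a first rough sketch could look like the following. It only assumes the basic SOFT layout, where lines starting with a caret introduce an entity (such as a sample) and lines starting with an exclamation mark carry its attributes; data tables are skipped entirely. The function name and the example accessions are made up for illustration.</p>
<pre class="brush: python; title: ; notranslate">
def parse_soft_samples(lines):
    """Collect the attributes of every SAMPLE entity, keyed by sample name."""
    samples = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("^"):
            # Entity indicator line, e.g. "^SAMPLE = GSM0001"
            kind, _, name = line[1:].partition("=")
            if kind.strip().upper() == "SAMPLE":
                current = name.strip()
                samples[current] = {}
            else:
                current = None
        elif line.startswith("!") and current is not None and "=" in line:
            # Attribute line, e.g. "!Sample_title = control 1";
            # attributes may repeat, so values are kept in a list
            key, _, value = line[1:].partition("=")
            samples[current].setdefault(key.strip(), []).append(value.strip())
    return samples


# Hypothetical input, just to show the expected shape of the file
example = [
    "^SERIES = GSE0000",
    "!Series_title = Hypothetical series",
    "^SAMPLE = GSM0001",
    "!Sample_title = control 1",
    "!Sample_characteristics_ch1 = tissue: liver",
    "!sample_table_begin",
    "ID_REF\tVALUE",
    "probe_1\t7.5",
    "!sample_table_end",
    "^SAMPLE = GSM0002",
    "!Sample_title = treated 1",
]
samples = parse_soft_samples(example)
</pre>
<p>With a real file, passing the open file object instead of the list should work the same way, since the function reads its input line by line.</p>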
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2007/10/soft-file-woes/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Easy RMA: RMAExpress</title>
		<link>http://www.dennogumi.org/2007/10/easy-rma-rmaexpress</link>
		<comments>http://www.dennogumi.org/2007/10/easy-rma-rmaexpress#comments</comments>
		<pubDate>Thu, 04 Oct 2007 21:17:56 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[affymetrix]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[microarray]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/2007/10/04/easy-rma-rmaexpress</guid>
		<description><![CDATA[Today I was looking for an easy way to do some calculations of raw expression data on Affymetrix arrays, but I didn&#8217;t want to use R: I have already mentioned how I don&#8217;t like its design and implementation. While looking &#8230; <a href="http://www.dennogumi.org/2007/10/easy-rma-rmaexpress">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Today I was looking for an easy way to do some calculations of raw expression data on Affymetrix arrays, but I didn&#8217;t want to use <a href="http://www.r-project.org" title="The R programming language">R</a>: I have already mentioned how I don&#8217;t like its design and implementation. While looking for some documentation, I stumbled upon this nifty little program called <a href="http://rmaexpress.bmbolstad.com/" title="RMAExpress">RMAExpress</a>.</p>
<p><span id="more-296"></span>Let me first say what RMA is about: it stands for &#8220;Robust Multi-array Average&#8221; and is a model-based quantification method for Affymetrix arrays, <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=12925520&amp;ordinalpos=3&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum" title="Exploration, normalization, and summaries of high density oligonucleotide array probe level data.">originally developed by Irizarry <em>et al.</em></a> It has a number of advantages over the Microarray Analysis Suite 5 (MAS5) algorithm used by Affymetrix software, especially with weakly expressed transcripts. It is commonly made up of three steps: background correction, quantile normalization and median polish.</p>
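<p>To make the middle step a bit more concrete, here is a toy sketch of quantile normalization in plain Python. It is only meant to illustrate the idea (every array ends up with the same distribution of values) and is not how RMAExpress or the R packages implement it; the function name is made up, and ties are resolved arbitrarily rather than averaged.</p>
<pre class="brush: python; title: ; notranslate">
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length intensity lists (one per array)."""
    n_probes = len(arrays[0])
    # Sort each array, then average the values found at each rank
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[i] for s in sorted_arrays) / len(arrays)
        for i in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered by increasing intensity on this array
        order = sorted(range(n_probes), key=lambda i: a[i])
        result = [0.0] * n_probes
        for rank, probe in enumerate(order):
            # Each probe receives the mean of the values sharing its rank
            result[probe] = rank_means[rank]
        normalized.append(result)
    return normalized
</pre>
<p>For two arrays with intensities 5, 2, 3 and 4, 1, 6 the rank means are 1.5, 3.5 and 5.5, so the normalized arrays become 5.5, 1.5, 3.5 and 3.5, 1.5, 5.5.</p>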
<p>RMAExpress is a C++, GUI-based program (using <a href="http://www.wxwidgets.org/" title="wxWidgets">wxWidgets</a>) that performs this process. The main advantage over the various R implementations is speed, as R doesn&#8217;t really excel in this regard. You can adjust the various RMA parameters, and you can also view the model representations, to see if some areas on the array perform differently (e.g., when there are irregularities in the signal intensities).</p>
<p>What I liked best is that you can use custom chip definition files (CDFs). <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=16284200&amp;ordinalpos=8&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum" title="Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data.">Dai <em>et al.</em></a> have already shown that old 3&#8242; GeneChips have outdated annotations, and have proposed new CDFs to compensate. We have already tested their improvement and it gives a nice increase in the number of annotated genes. RMAExpress processes these CDFs just fine.</p>
<p>Finally, you can export data either in log2 format (to use in procedures like <a href="http://www-stat.stanford.edu/~tibs/SAM/" title="Significance Analysis of Microarrays">SAM</a>) or in absolute form (which I need for my work). The program is extremely light and processes a good number of arrays fairly quickly. Windows users have a pre-built binary, while Linux users need to build from source. The instructions on the page are overly complicated: here&#8217;s how I managed to build it on Kubuntu:</p>
<pre class="brush: cpp; title: ; notranslate">

sudo aptitude install libwxgtk2.8-dev
mkdir tmp
cd tmp
tar xvzf  /path/to/RMAExpress_1.0beta3_src.tar.gz
make all
</pre>
<p>After that, just run RMAExpress from its directory.</p>
<p>All in all, I&#8217;m quite pleased with the program and I will keep using it in the future.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2007/10/easy-rma-rmaexpress/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

