<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>dennogumi.org &#187; Science</title>
	<atom:link href="http://www.dennogumi.org/category/science/feed" rel="self" type="application/rss+xml" />
	<link>http://www.dennogumi.org</link>
	<description>On the web since 1999</description>
	<lastBuildDate>Fri, 06 Jan 2012 14:56:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Multiscale bootstrap clustering with Python and R</title>
		<link>http://www.dennogumi.org/2011/05/multiscale-bootstrap-clustering-with-python-and-r</link>
		<comments>http://www.dennogumi.org/2011/05/multiscale-bootstrap-clustering-with-python-and-r#comments</comments>
		<pubDate>Sun, 29 May 2011 12:11:40 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=906</guid>
		<description><![CDATA[While reading the statistics for my blog, I noticed that a number of searches looked for hierarchical clustering with Python, which I covered quite a while ago. Today I&#8217;d like to present an updated version which uses more robust techniques. &#8230; <a href="http://www.dennogumi.org/2011/05/multiscale-bootstrap-clustering-with-python-and-r">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>While reading the statistics for my blog, I noticed that a number of searches looked for hierarchical clustering with Python, which <a href="http://www.dennogumi.org/2007/11/data-clustering-with-python">I covered quite a while ago</a>. Today I&#8217;d like to present an updated version which uses more robust techniques.</p>
<p><span id="more-906"></span></p>
<h2>Defining the problem</h2>
<p>Since Eisen&#8217;s original paper on clustering, this form of analysis has been widely used by researchers. However, such methods are known to be susceptible to an ordering bias: in other words, the order of the samples and/or genes might influence the final result. That&#8217;s why popular software such as <a href="http://www.tm4.org/mev/">TMeV</a> offers alternative approaches, based on <i>bootstrapping</i>.&nbsp;</p>
<p>In this specific form of bootstrapping, the samples and/or genes are randomly resampled a number of times (1000 or more iterations are a good starting point) and the resulting dendrograms are checked for consistency and robustness of partitioning. In other words, a p-value is calculated, our null hypothesis being that the arrangement of samples/genes is merely due to chance. Depending on the software, this value might be expressed either as a p-value or as a percentage (TMeV calls it <i>support</i>).&nbsp;</p>
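<p>To make the idea concrete, here is a minimal pure-Python sketch of plain bootstrap support (not the multiscale variant, and not how TMeV or pvclust are implemented). It assumes a toy data set of three samples, resamples the genes with replacement, and counts how often a given pair of samples ends up as the closest pair under correlation distance. The sample names and helper functions are made up for illustration.</p>
<pre class="brush: python; title: ; notranslate">
import math
import random


def correlation_distance(x, y):
    """Return 1 - Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 1.0  # treat a constant profile as uncorrelated
    return 1.0 - cov / math.sqrt(var_x * var_y)


def closest_pair(samples):
    """Return the two sample names with the smallest correlation distance."""
    names = sorted(samples)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs, key=lambda p: correlation_distance(samples[p[0]], samples[p[1]]))


def bootstrap_support(samples, pair, nboot=1000, seed=0):
    """Fraction of bootstrap replicates in which 'pair' is the closest pair."""
    rng = random.Random(seed)
    n_genes = len(next(iter(samples.values())))
    hits = 0
    for _ in range(nboot):
        # Resample gene indices with replacement, the same ones for every sample
        picked = [rng.randrange(n_genes) for _ in range(n_genes)]
        resampled = {name: [values[i] for i in picked]
                     for name, values in samples.items()}
        if closest_pair(resampled) == pair:
            hits += 1
    return hits / nboot


# Toy expression profiles (hypothetical values): two controls and one treated sample
samples = {
    "ctrl_1": [1, 2, 3, 4, 5, 6],
    "ctrl_2": [1.1, 2.2, 2.9, 4.1, 5.2, 5.8],
    "lps_1": [6, 5, 4, 3, 2, 1],
}
support = bootstrap_support(samples, ("ctrl_1", "ctrl_2"), nboot=200)
print(support)
</pre>
<p>The returned fraction plays the role of the support value: the closer it is to 1, the more often the grouping survives resampling.</p>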
<p>In the past years, I found <a href="http://www.is.titech.ac.jp/~shimo/prog/pvclust/">an interesting method developed by Hidetoshi Shimodaira</a>: the technique, called <i>multiscale bootstrap resampling</i>, aims at determining more accurate p-values out of the bootstrapping. Shimodaira calls the resulting p-value an <i>AU</i>&nbsp;value, where AU stands for &#8220;approximately unbiased&#8221;, a more precise p-value than the one obtained through bootstrapping alone.</p>
<p>In addition to this nice algorithm, an R package was also provided, named <i>pvclust </i>(it&#8217;s available on your favorite CRAN mirror). And that&#8217;s exactly what we&#8217;ll use for this exercise.</p>
<h2>Prerequisites</h2>
<p>Some of the readers of this blog might remember my disdain for R: while I need to use it for Bioconductor, I&#8217;m often annoyed by its weird syntax and hard-to-understand error messages. Luckily, thanks to the hard work of Laurent Gautier and contributors, there&#8217;s <a href="http://rpy.sourceforge.net">rpy2</a>, a nice R-to-Python bridge. All the examples here require this package, version 2.1 or newer (I&#8217;d recommend the release candidate of 2.2, it&#8217;s really nice). Unfortunately, this means that Windows users are out of luck, as there&#8217;s no version of rpy2 2.1 or 2.2 available for that platform.</p>
<p>Also, don&#8217;t forget to have the pvclust package installed in R.</p>
<h2>Loading and preparing the data</h2>
<div>Let&#8217;s start first by importing the necessary bits:</div>
<div></div>
<div>
<pre class="brush: python; title: ; notranslate">
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
</pre>
</div>
<div>The second line is important, because it&#8217;ll let us play with R libraries as if they were Python packages. Case in point, we&#8217;ll get the &#8220;base&#8221; and &#8220;pvclust&#8221; libraries loaded:</div>
<pre class="brush: python; title: ; notranslate">
base = importr(&quot;base&quot;)
pvclust = importr(&quot;pvclust&quot;)
</pre>
<div>Now we can manipulate them as if they were modules, and most of R&#8217;s dotted function names have had their dots converted to underscores, as the dot is the attribute access operator in Python. Example: as.data.frame becomes as_data_frame.&nbsp;</div>
<div>Next, we&#8217;ll load the data in a data.frame. rpy2 conveniently gives us the <i>DataFrame </i>class, which is a no-nonsense wrapper to R&#8217;s data.frames. For this exercise, we&#8217;ll load a set of normalized data from <a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4984">GSE4984</a>, a microarray experiment with dendritic cells exposed to different stimuli. It&#8217;s just a matter of downloading the data from <a href="http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-4984/E-GEOD-4984.processed.1.zip">Array Express</a>&nbsp;(if you ask why from AE and not GEO: the latter doesn&#8217;t have a clearly-identified link for normalized data)&nbsp;and then loading it in a data.frame as:</div>
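<div>As a rough illustration of that renaming (a sketch of the idea only, not rpy2&#8217;s actual implementation), the hypothetical helpers below map R-style names to Python-friendly ones and report names that would collide after the conversion, which is one plausible reason why not every name can be translated automatically.</div>
<pre class="brush: python; title: ; notranslate">
def python_name(r_name):
    """Replace the dots of an R function name with underscores."""
    return r_name.replace(".", "_")


def find_collisions(r_names):
    """Group R names that would map to the same Python name."""
    seen = {}
    for r_name in r_names:
        seen.setdefault(python_name(r_name), []).append(r_name)
    # Keep only the Python names claimed by more than one R name
    return {py: rs for py, rs in seen.items() if len(rs) != 1}


print(python_name("as.data.frame"))
print(find_collisions(["as.matrix", "as_matrix", "nrow"]))
</pre>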
<pre class="brush: python; title: ; notranslate">
dataframe = robjects.DataFrame.from_csvfile(&quot;GSE4984.txt&quot;, sep=&quot;\t&quot;, row_names=1)
</pre>
<div>The resulting Python object has all the attributes of an R data.frame but with added Python goodness. We can use the <i>rownames</i>&nbsp;and <i>colnames</i>&nbsp;attributes to access the row names (if set) and column names of the object, and likewise we can use <i>nrow</i>&nbsp;and <i>ncol</i>&nbsp;to quickly check the number of rows and columns.</p>
<div>Since a full array has a lot of genes, we&#8217;re going to choose only the first 500 genes:</p>
<pre class="brush: python; title: ; notranslate">
rows = robjects.IntVector(range(1,501))
subset = dataframe.rx(rows, True)
</pre>
<div>An <i>IntVector</i>&nbsp;is an rpy2 object which replicates R&#8217;s vectors of integers: there are similar classes for strings, floats, lists (R lists, not the Python type) and factors. rx is an <i>accessor</i>&nbsp;that mimics R&#8217;s item access: in short, it&#8217;s equivalent to</p>
<pre class="brush: r; title: ; notranslate">
subset &lt;- dataframe[rows, ]
</pre>
<div>rpy2 has another accessor, <i>rx2,</i>&nbsp;which mimics the [[ ]] access in data.frames.</div>
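<div>One detail worth spelling out is the indexing: R counts from 1 and includes both ends, which is why the row selection uses range(1, 501) to get rows 1 to 500. The following pure-Python sketch (with made-up gene names, and no rpy2 involved) shows how such R-style indices relate to an ordinary Python slice.</div>
<pre class="brush: python; title: ; notranslate">
def first_n_r_indices(n):
    """Return the 1-based indices R would use for the first n rows."""
    return list(range(1, n + 1))


# Hypothetical gene identifiers standing in for the row names of the array
genes = ["gene_%d" % i for i in range(1, 1001)]

r_indices = first_n_r_indices(500)
# Convert each 1-based R index to a 0-based Python index
picked = [genes[i - 1] for i in r_indices]

print(r_indices[0], r_indices[-1], len(picked))
</pre>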
<h2>Clustering</h2>
<p>Once we have the data, it&#8217;s time to do some serious clustering on it:</p>
<pre class="brush: python; title: ; notranslate">
result = pvclust.pvclust(subset, nboot=100, method_dist=&quot;correlation&quot;, method_hclust=&quot;average&quot;)
</pre>
<p>We&#8217;re using a small number of bootstrap replicates (100) because the computation times are long. You can change the distance metric and the linkage type using <i>method_dist</i>&nbsp;and <i>method_hclust</i>. Internally the data.frame is converted to a matrix, so ensure you have valid data (i.e. numeric) prior to proceeding.</p>
<p>Notice that this will just cluster the columns by default. If we want to cluster genes, we have to transpose the data.frame. In this case we have to first convert it to a matrix, then transpose it:</p>
<pre class="brush: python; title: ; notranslate">
matrix = base.as_matrix(subset)
subset_transposed = matrix.transpose()
result_rows = pvclust.pvclust(subset_transposed, nboot=100, method_dist=&quot;correlation&quot;, method_hclust=&quot;average&quot;)
</pre>
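<p>If the effect of the transposition is not obvious, this small pure-Python sketch (toy numbers, plain lists instead of R matrices) shows what happens: the genes, which were rows, become columns, so a column-wise clustering now groups genes instead of samples.</p>
<pre class="brush: python; title: ; notranslate">
def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*rows)]


# Two genes (rows) measured in three samples (columns); values are made up
expression = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

transposed = transpose(expression)
print(transposed)
</pre>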
<p>Once the computation is done, we have a <i>pvclust</i>&nbsp;object which holds information on the results. What we&#8217;re most interested in is the <i>hclust</i>&nbsp;attribute, as it holds a dendrogram object we can use for plotting (either standalone or via a heat map). We can also manipulate the object with the <i>pvpick</i>&nbsp;function, for example to color the trees of the dendrogram based on their AU values.</p>
<p>To get a fast representation, we can just dump the object as it is to a dendrogram which will show AU and BP values for each element of the cluster:</p>
<pre class="brush: python; title: ; notranslate">
graphics = importr(&quot;graphics&quot;)
graphics.plot(result)
</pre>
<div>Or we can do the same, but to a PDF:</div>
<pre class="brush: python; title: ; notranslate">
graphics = importr(&quot;graphics&quot;)
grdevices = importr(&quot;grDevices&quot;)
grdevices.pdf(&quot;myresult.pdf&quot;, paper=&quot;a4&quot;)
graphics.plot(result)
grdevices.dev_off()</pre>
<div>Of course, we might want a heat map (<b>everyone</b>&nbsp;wants pretty heat maps, right?). In that case we extract both dendrograms and use something like gplots&#8217; <i>heatmap.2 </i>&nbsp;function to represent it (you will need the <i>gplots</i>&nbsp;package installed in order for the following to work):</div>
<pre class="brush: python; title: ; notranslate">
gplots = importr(&quot;gplots&quot;)
row_dendrogram = result_rows.rx2(&quot;hclust&quot;)
column_dendrogram = result.rx2(&quot;hclust&quot;)
gplots.heatmap_2(subset, Rowv=row_dendrogram, Colv=column_dendrogram, col=gplots.greenred(255), density_info=&quot;none&quot;)
</pre>
<div>You can add the <i>grdevices</i>&nbsp;lines like above to make a PDF of the plot. &nbsp;Notice that we used the <i>rx2</i>&nbsp;accessor here, as described above, to access the <i>hclust</i>&nbsp;attribute of the pvclust object.&nbsp;</div>
<h2>Moving further</h2>
<p>pvclust as-is is quite slow. There&#8217;s however a parallelized version, called <i>parPvclust</i>, which uses <i>snow</i>&nbsp;to parallelize the clustering, either across multiple machines or using multiple cores. Setting up snow properly is beyond the scope of this tutorial, but it may be worth the effort if you cluster a lot of data.</p>
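<p>The reason this kind of work parallelizes well is that bootstrap replicates are independent of each other, so they can be divided among workers. The sketch below only shows that bookkeeping step in plain Python; it is an assumption about the general approach, not a description of how parPvclust or snow actually distribute the work, and the function name is hypothetical.</p>
<pre class="brush: python; title: ; notranslate">
def split_replicates(nboot, workers):
    """Split nboot bootstrap replicates into one chunk size per worker."""
    if 0 >= workers:
        raise ValueError("workers must be a positive integer")
    base, extra = divmod(nboot, workers)
    # The first 'extra' workers take one additional replicate each
    return [base + 1 if i in range(extra) else base for i in range(workers)]


chunks = split_replicates(1000, 3)
print(chunks, sum(chunks))
</pre>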
</div>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2011/05/multiscale-bootstrap-clustering-with-python-and-r/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Akademy: my own BoF</title>
		<link>http://www.dennogumi.org/2010/05/akademy-my-own-bof</link>
		<comments>http://www.dennogumi.org/2010/05/akademy-my-own-bof#comments</comments>
		<pubDate>Sat, 29 May 2010 19:55:37 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[KDE]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[akademy2010]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=774</guid>
		<description><![CDATA[My Akademy talk proposal was not accepted, but the organizers were kind enough to offer me the chance to hold a BoF on the same subject. Now I bet you wonder on what I&#8217;m going to discuss, and I think &#8230; <a href="http://www.dennogumi.org/2010/05/akademy-my-own-bof">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p align="center"><a href="http://akademy.kde.org"><img src="http://www.dennogumi.org/wp-content/uploads/2010/05/igta2010.png?cda6c1" title="I'm going to Akademy 2010" alt="I'm going to Akademy 2010 image" /></a></p>
<p>My Akademy talk proposal was not accepted, but the organizers were kind enough to offer me the chance to hold a BoF on the same subject. Now I bet you wonder what I&#8217;m going to discuss, and I think the title already gives you an idea:</p>
<p align="center"><strong>KDE and bioinformatics: the missing link</strong></p>
<p align="left">Although in the KDE community we have our fair share of scientists (hey there, Stuart!), my BoF will focus on the adoption of KDE in the field of <a href="http://en.wikipedia.org/wiki/Bioinformatics" title="General explanation on bioinformatics">bioinformatics</a> (my day job, not-so-by-chance) on the &quot;outsiders&quot; front and how to improve the current situation. To elaborate further, bioinformatics is a rather broad field where biological data are treated with computational methods. The oldest and most famous branch of bioinformatics is sequence analysis and related fields, where sequences of DNA are analyzed, for example, to find common ancestors among several species, or to reconstruct the genetic code of an organism by comparing it to a related species. Another, more recent example is related to <em>high-throughput technologies</em>, technologies which produce huge amounts of data from a very small number of experiments (&quot;<a href="http://en.wikipedia.org/wiki/DNA_sequencing#Large-scale_sequencing_strategies" title="Wikipedia article">ultramassive sequencing</a>&quot; and <a href="http://en.wikipedia.org/wiki/DNA_microarray" title="Wikipedia explanation">DNA microarrays</a> are examples of such technologies). </p>
<p align="left">Either way, bioinformaticians have to deal with large amounts of data all the time, and usually there&#8217;s no &quot;shrink-wrap&quot; solution to the problems they have to face, software-wise. That&#8217;s because we do research, so we need to find something new. So the solution is often to write algorithms, or re-implement existing ones in a form that is suited for the tasks at hand. So, bioinformaticians also write software, although they&#8217;re by no means (usually) professional coders: some have a mathematical or statistical background, others (like me) come from an experience at the lab bench. What kind of programs do bioinformaticians write? Normally scripts and small stuff, but in certain cases even full-blown algorithms and applications. Some become so famous that they are even trend-setters.</p>
<p align="left">Which brings us to the heart of the matter: how does KDE stand in all of this? Sadly, not too well. I&#8217;ve done some research in the published literature, but there&#8217;s just <strong>one</strong> hit returned that&#8217;s proper: <a href="http://www.ncbi.nlm.nih.gov/pubmed/18695948" title="KInNeSS: a modular framework for computational neuroscience.">a KDE application for neuroscience</a> (based on the 3.5.x Development Platform) published in 2008. I know that big research places like CERN use KDE, but to my knowledge smaller entities such as research groups code, in the majority of cases, for Windows or for web-based solutions. Given that at least a significant portion of bioinformaticians use UNIX-like operating systems, the question we need to answer is: why?</p>
<p align="left">The first and foremost problem is related to market share. Research groups don&#8217;t even know that KDE exists, so it&#8217;s unlikely they develop something using the Development Platform (even now that&#8217;s becoming more cross-platform). This is where some promo efforts could help. Secondly, the problem lies in the &quot;difficulty&quot; (notice the quotes!) of developing using the KDE Development platform: most bioinformaticians, as I wrote, are <strong>not</strong> professional coders, and few of them know C++. The most used languages in bioinformatics are Perl and Java (with some Python and Ruby thrown into the mix). Thus, the need for proper bindings. The bindings are there, thanks to the excellent work of the kde-bindings team, but documentation is still lacking (namely in the examples department, but also in tutorials and getting started guides that aren&#8217;t aimed at C++). Some documentation is auto-generated, and while the KDE API docs are usually not too hard to read, they can still scare off newcomers. Of course this is not the fault of the kde-bindings team: namely, more help is needed. </p>
<p align="left">Promo efforts and better bindings are the keys to spreading KDE further in the field of bioinformatics. This is what my BoF is about, plus an informal discussion on the use of FOSS in academia and related matters. </p>
<p align="left">Interested? If you are, you can come to the BoF which will be on <strong>Tuesday, 6th July</strong> at <strong>15.00</strong> in the Area 2 of the main room at Demola. </p>
<p align="left">I&#8217;ll also be around later till the following morning (sadly, two days is the best I can do to attend) in case you&#8217;re interested in a chat.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2010/05/akademy-my-own-bof/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>DataMatrix 0.8 is finally out</title>
		<link>http://www.dennogumi.org/2009/06/datamatrix-08-is-finally-out</link>
		<comments>http://www.dennogumi.org/2009/06/datamatrix-08-is-finally-out#comments</comments>
		<pubDate>Sat, 13 Jun 2009 13:29:40 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[datamatrix]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/2009/06/datamatrix-08-is-finally-out</guid>
		<description><![CDATA[At last, after months of inactivity, I pushed out a new release of DataMatrix. Although the version bump is small (0.8) there are a lot of changes since last releases. The most notable include: Ability to apply functions to elements &#8230; <a href="http://www.dennogumi.org/2009/06/datamatrix-08-is-finally-out">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>At last, after months of inactivity, I pushed out a new release of <a href="http://www.dennogumi.org/projects/datamatrix" title="DataMatrix"><em>DataMatrix</em></a>. Although the version bump is small (0.8) there are a lot of changes since last releases. The most notable include:</p>
<ul>
<li>Ability to apply functions to elements of the matrix</li>
<li>Ability to filter rows by column contents</li>
<li>Ability to transpose rows with columns</li>
<li>An option to load text files produced by R (which are, by design, broken)</li>
<li>Removed the getter for columns, using dictionary-like syntax directly</li>
<li>A lot of bug fixes</li>
</ul>
<p>The download links on <a href="http://www.dennogumi.org/projects/datamatrix" title="Project page">the project page</a> have been updated, along with <a href="http://www.dennogumi.org/doc/datamatrix/" title="Documentation">the documentation</a>.  Also, there is another change, because from now on the official Git repository <a href="http://gitorious.org/datamatrix/datamatrix" title="Web interface on gitorious.org">is hosted on gitorious.org</a>, and no longer on github, because gitorious (the software) is also free, while github.com&#8217;s is not. It&#8217;s mainly a philosophical issue (the same that prompted me to switch from twitter to identi.ca). </p>
<p>Also, as of today <em>DataMatrix</em> is officially hosted on the <a href="http://pypi.python.org/pypi/datamatrix/0.8" title="Page on PyPI">Python Package Index</a> (with the name &#8220;datamatrix&#8221;), meaning that you can use easy_install to quickly install it.</p>
<p>If you use this module, let me know what you think (including bugs, if you find them).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2009/06/datamatrix-08-is-finally-out/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Gene search applet: suggestions and code review needed</title>
		<link>http://www.dennogumi.org/2009/03/gene-search-applet-suggestions-and-code-review-needed</link>
		<comments>http://www.dennogumi.org/2009/03/gene-search-applet-suggestions-and-code-review-needed#comments</comments>
		<pubDate>Tue, 31 Mar 2009 17:33:09 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[KDE]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[bioinformatics]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=594</guid>
		<description><![CDATA[In the past months I&#8217;ve always wanted to write a small Plasma applet to aid me in some boring tasks as a bioinformatician. One example (for the non-scientific crowd out there) is when I find a specific gene out of &#8230; <a href="http://www.dennogumi.org/2009/03/gene-search-applet-suggestions-and-code-review-needed">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In the past months I&#8217;ve always wanted to write a small Plasma applet to aid me in some boring tasks as a bioinformatician. One example (for the non-scientific crowd out there) is when I find a specific gene out of my analysis work which I want to take a look at. I am often lazy, so instead of firing up the browser to look at the online resources, I wanted to write something which could access said resources programmatically.</p>
<p><span id="more-594"></span></p>
<p>I found a way thanks to the <a href="http://biopython.org" title="The Biopython project">Biopython project,</a> which offers a Python module to access the resources of the <a href="http://www.ncbi.nlm.nih.gov" title="NCBI">National Center for Biotechnology Information (NCBI)</a> by providing an interface to their <a href="http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html" title="EUtils web page">EUtils</a>. Since the back-end was already (mostly) taken care of, I set out to write a small Plasma applet, which is what I&#8217;m presenting today. It&#8217;s written in Python, and uses the Python ScriptEngine to work. Currently, it searches the &#8220;Gene&#8221; database at NCBI by &#8220;Entrez Gene ID&#8221;, a numerical ID that uniquely identifies a gene record, and returns the name, official symbol, organism, and a description if it&#8217;s present. It does not support anything else (see below).</p>
<p>The code lives in <a href="http://github.com/cswegger/plasma-genesearch/tree/master" title="Code repository">a git repository at github</a>. <strong>WARNING: </strong>The code may be a complete mess (I&#8217;m not too well versed in GUI stuff, I mostly do text file manipulation). If you are so daring, you can obtain and install it in a very simple manner:</p>
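<p>For the curious, the sketch below shows the kind of request that sits underneath such a lookup. It is not the applet&#8217;s code (the applet goes through Biopython): it is a minimal standard-library example, under the assumption that NCBI&#8217;s ESummary endpoint is queried with the &#8220;gene&#8221; database and a numerical ID. It only validates the ID and builds the URL, without contacting the network; the function name is hypothetical.</p>
<pre class="brush: python; title: ; notranslate">
from urllib.parse import urlencode

# ESummary endpoint of the NCBI E-utilities
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def gene_summary_url(gene_id):
    """Build an ESummary URL for a numerical Entrez Gene ID."""
    gene_id = str(gene_id).strip()
    if not gene_id.isdigit():
        raise ValueError("Entrez Gene IDs are numerical: %r" % gene_id)
    return EUTILS_BASE + "?" + urlencode({"db": "gene", "id": gene_id})


url = gene_summary_url(10000)
print(url)
</pre>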
<p>
<pre class="brush: bash; title: ; notranslate">git clone git://github.com/cswegger/plasma-genesearch.git
cd plasma-genesearch
zip -r ../plasma-genesearch.plasmoid *
plasmapkg -i ../plasma-genesearch.plasmoid</pre>
</p>
<p>After that you will see an &#8220;Entrez Gene Searcher&#8221; in your add applets dialog. Once added, it&#8217;ll look like this:</p>
<p align="center"><img src="http://www.dennogumi.org/wp-content/uploads/2009/03/plasma-genesearch1.png?cda6c1" title="Gene searcher" alt="Gene searcher image" /></p>
<p align="left">Pretty horrible, isn&#8217;t it? Well, once you get past that, you can input an ID (only IDs will work for now) in the text field (which doesn&#8217;t clear the text: see further on) and push &#8220;Go!&#8221;. The following is an example with ID 10000, which corresponds to the human gene <em>AKT3</em>:</p>
<p align="center"><img src="http://www.dennogumi.org/wp-content/uploads/2009/03/plasma-genesearch2.png?cda6c1" title="Gene search results" alt="Gene search results image" /></p>
<p align="left">&#8220;Search again&#8221; will bring you back to the search form.</p>
<p align="left">Now, what has this to do with Planet KDE? Well, I&#8217;m asking for some code review from the community, if it&#8217;s possible, and suggestions to improve the horrid default look. I am especially interested in the layout code, since I did not quite understand how it works: it should not work, and yet it <em>does&#8230;.</em> </p>
<p align="left">Other things that need to be improved are:</p>
<ul>
<li align="left">The Plasma.TextEdit is not cleared upon clicking. Is there a signal I can catch for that, so I can connect it to clear()?</li>
<li align="left">Proper searching. Bio.Entrez already does this: what I need is  a way to display the records properly. </li>
<li align="left">A way to link the names to URLs, and have them open in Konqueror. </li>
</ul>
<p>That should be it. I hope to work on it some more next weekend&#8230;.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2009/03/gene-search-applet-suggestions-and-code-review-needed/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Moving on</title>
		<link>http://www.dennogumi.org/2009/02/moving-on</link>
		<comments>http://www.dennogumi.org/2009/02/moving-on#comments</comments>
		<pubDate>Fri, 27 Feb 2009 16:30:57 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=569</guid>
		<description><![CDATA[Some say that all good things must come to an end. I&#8217;m not entirely sure that this is a universal truth, but I can say that at some point in life there are decisions that need to be taken. In &#8230; <a href="http://www.dennogumi.org/2009/02/moving-on">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Some say that all good things must come to an end. I&#8217;m not entirely sure that this is a universal truth, but I can say that at some point in life there are decisions that need to be taken.</p>
<p>In this case I made my own: today was the last day in <a href="http://www.centro-cisi.com/microarray.htm">Dr. Cristina Battaglia&#8217;s laboratory</a>, a place where I spent my three-year Ph.D. course and one year as a post-doc research fellow.</p>
<p>Those four years were not bad at all. They were interesting, and provided a good learning experience. I think I owe quite a bit to that place, especially because I was able to learn and improve my skills alongside the analysis and research work. So my thanks go to my former supervisor (Dr. Cristina Battaglia) and all my colleagues. It&#8217;s been a fun ride.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2009/02/moving-on/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Science and KDE: kile</title>
		<link>http://www.dennogumi.org/2009/02/science-and-kde-kile</link>
		<comments>http://www.dennogumi.org/2009/02/science-and-kde-kile#comments</comments>
		<pubDate>Sun, 22 Feb 2009 20:49:20 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[KDE]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[latex]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=551</guid>
		<description><![CDATA[During the course of my research work, I may obtain results that are worthy of publication in scientific journals. Since my master&#8217;s thesis I&#8217;ve been using LaTeX as my writing platform, mainly because I can concentrate on content rather than &#8230; <a href="http://www.dennogumi.org/2009/02/science-and-kde-kile">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>During the course of my research work, I may obtain results that are worthy of publication in scientific journals. Since my master&#8217;s thesis I&#8217;ve been using <a title="LaTeX web page" href="http://latex-project.org">LaTeX</a> as my writing platform, mainly because I can concentrate on content rather than presentation (I find it useful also for writing non-scientific stuff as well). Also, I can handle bibliography (essential for a scientific publication) very well without using expensive proprietary applications (such as Endnote).</p>
<p>In my early days I used kLyX first, then <a title="LyX" href="http://www.lyx.org">LyX</a>, but I found the platform to be too limited for my tastes, and also LaTeX errors were difficult to diagnose. I needed a proper editor, and that&#8217;s when I heard of <a title="Kile's web page" href="http://kile.sourceforge.net">kile, a KDE front-end for LaTeX</a>. Kile is currently at version 2.0.2 and is a KDE 3 application. However, in KDE SVN work is ongoing to produce a KDE4 version (2.1) and that&#8217;s what I&#8217;ll look at in this entry.</p>
<p><span id="more-551"></span></p>
<p><strong>Obtaining kile 2.1</strong></p>
<p>First and foremost, a disclaimer. kile 2.1 has not been released yet in any form, and so should be considered unstable and crash-prone. That said, it runs more or less well on my platform.</p>
<p>The first thing to do is to grab the sources from SVN:</p>
<p><code>svn checkout svn://anonsvn.kde.org/home/kde/trunk/extragear/office/kile</code></p>
<p>That will put kile&#8217;s sources in a directory called &#8220;kile&#8221;. The next step is to compile it (as usual, you need KDE4 development packages/files installed):</p>
<p><code>cd kile<br />
mkdir build; cd build<br />
cmake -DCMAKE_INSTALL_PREFIX=`kde4-config --prefix` ../<br />
make</code></p>
<p>Followed by the usual <code>make install</code> as root or using <code>sudo</code>.</p>
<p><strong>kile 2.1 at a glance</strong></p>
<p>This is how kile looks when loaded on my system:</p>
<p style="text-align: center;"><a class="shutterset_" title="Kile at startup" href="http://www.dennogumi.org/wp-content/gallery/screenshots/kile1.png?cda6c1"><img class="ngg-singlepic ngg-none" src="http://www.dennogumi.org/wp-content/gallery/screenshots/thumbs/thumbs_kile1.png?cda6c1" alt="kile1.png" /></a></p>
<p style="text-align: left;">(For the inquisitive people, it&#8217;s not a scientific work, rather a sci-fi like book I&#8217;m writing).</p>
<p style="text-align: left;">Kile uses the katepart for editing, so that means all the goodies that come with Kate can be used, including the recently-added vim input mode. Aside from editing and LaTeX syntax highlighting, kile offers configurable LaTeX command completion, as this screenshot shows:</p>
<p style="text-align: center;"><a class="shutterset_" title="Command completion" href="http://www.dennogumi.org/wp-content/gallery/screenshots/kile4.png?cda6c1"><img class="ngg-singlepic ngg-none" src="http://www.dennogumi.org/wp-content/gallery/screenshots/thumbs/thumbs_kile4.png?cda6c1" alt="kile4.png" /></a></p>
<p style="text-align: left;">From the toolbars and the menus you can insert almost every LaTeX command known to mankind. For the people less familiar with LaTeX, kile offers a series of wizards to ease the creation of figures, tables and even complete documents. The one I&#8217;m showing here is the Quick Start wizard, which enables you to select document classes, add packages, and add information like author and date. As I was saying earlier, kile 2.1 is still a work in progress, and that explains why the dialog is still a little unrefined.</p>
<p style="text-align: center;"><a class="shutterset_" title="Quick start wizard" href="http://www.dennogumi.org/wp-content/gallery/screenshots/kile2.png?cda6c1"><img class="ngg-singlepic ngg-none" src="http://www.dennogumi.org/wp-content/gallery/screenshots/thumbs/thumbs_kile2.png?cda6c1" alt="kile2.png" /></a></p>
<p style="text-align: left;">Like with its KDE3 counterpart, kile offers the possibility of using &#8220;projects&#8221;, which means you can collect LaTeX documents, bib files, and so on, and associate them together. You can also set a master document, so that even if you are editing other files (included in the master document), when you build your LaTeX file the compilation runs on the master document.  Even in this case, a wizard helps in creating a project and the master document.</p>
<p style="text-align: center;"><a class="shutterset_" title="New project" href="http://www.dennogumi.org/wp-content/gallery/screenshots/kile3.png?cda6c1"><img class="ngg-singlepic ngg-none" src="http://www.dennogumi.org/wp-content/gallery/screenshots/thumbs/thumbs_kile3.png?cda6c1" alt="kile3.png" /></a></p>
<p style="text-align: left;">Lastly, kile has a plethora of other options, including customizing what you can use to build LaTeX files and view them (DVI, PS, PDF&#8230;), as shown in this screenshot.</p>
<p style="text-align: center;"><a class="shutterset_" title="Build options" href="http://www.dennogumi.org/wp-content/gallery/screenshots/kile5.png?cda6c1"><img class="ngg-singlepic ngg-none" src="http://www.dennogumi.org/wp-content/gallery/screenshots/thumbs/thumbs_kile5.png?cda6c1" alt="kile5.png" /></a></p>
<p style="text-align: left;"><strong>Conclusions</strong></p>
<p style="text-align: left;">I have merely scratched the surface of this application, which is extremely powerful and can help anyone with their LaTeX needs. While the many options may be confusing, I think that this application is already geared towards a technically-inclined userbase and so it doesn&#8217;t matter much. kile 2.1 is still unstable but extremely promising, and I&#8217;m looking forward to its release.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2009/02/science-and-kde-kile/feed</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Science and KDE: rkward</title>
		<link>http://www.dennogumi.org/2009/02/science-and-kde-rkward</link>
		<comments>http://www.dennogumi.org/2009/02/science-and-kde-rkward#comments</comments>
		<pubDate>Sat, 07 Feb 2009 18:55:53 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[KDE]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[rkward]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=533</guid>
		<description><![CDATA[I try to use FOSS extensively for my scientific work. In fact, when possible, I use only FOSS tools. Among these there is the R programming language. It&#8217;s a Free implementation of the S language, and it&#8217;s mainly aimed at &#8230; <a href="http://www.dennogumi.org/2009/02/science-and-kde-rkward">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I try to use FOSS extensively for my scientific work. In fact, when possible, I use <em>only</em> FOSS tools. Among these there is the R programming language. It&#8217;s a Free implementation of the S language, and it&#8217;s mainly aimed at statistics and mathematics. As the people who read my scientific posts know, I don&#8217;t like R much. But sometimes it&#8217;s the only alternative.</p>
<p>Well, what does R have to do with KDE? With this post I&#8217;d like to start (hopefully) a series of articles that deal with KDE programs used for scientific purposes. In this particular entry, I&#8217;ll focus on rkward, a GUI front-end for R.<br />
<span id="more-533"></span><br />
<strong>Introduction</strong></p>
<p>Although R is a programming language, it&#8217;s mainly used in an interactive session, started from the terminal. The standard installation can be improved by the use of add-on packages, <em>libraries</em> in R-speak, which can be installed from the Internet (Comprehensive R Archive Network or CRAN) or from local files. One of the most famous third party repositories is the Bioconductor project, which hosts a lot of packages used by life scientists who do bioinformatics.</p>
<p>The Windows version of R has a GUI (Rgui) which provides extra functionality, such as package management and loading, and other goodies. Although there were plans for a GTK+ frontend for Linux, the project is (as far as I know) stuck in limbo.</p>
<p>That&#8217;s where rkward comes to the rescue. It&#8217;s a GUI front-end for R for KDE4, which aims to provide a graphical shell for many R commands and environments (especially its publication-quality plotting).</p>
<p><strong>Getting rkward</strong></p>
<p>rkward is available from <a title="rkward main page" href="http://rkward.sourceforge.net/">Sourceforge.net</a>. Unfortunately, if you use a recent (&gt;=2.8) version of R, it won&#8217;t compile, due to changes in R itself. In that case, you need to download the sources directly from SVN with a command like this:</p>
<pre class="brush: cpp; title: ; notranslate">

svn co https://rkward.svn.sourceforge.net/svnroot/rkward/trunk/rkward/
</pre>
<p>Either way, the sources are compiled the usual way, that is:</p>
<pre class="brush: cpp; title: ; notranslate">

cd rkward-xxx # Your rkward source dir
mkdir build; cd build
cmake  -DCMAKE_INSTALL_PREFIX=`kde4-config --prefix` ../
make
</pre>
<p>Followed by <code>make install</code> as root or using sudo, depending on your distribution.</p>
<p><strong>rkward at a glance</strong></p>
<p><strong>
<a href="http://www.dennogumi.org/wp-content/gallery/screenshots/rkward1.png?cda6c1" title="" class="shutterset_singlepic263" >
	<img class="ngg-singlepic ngg-center" src="http://www.dennogumi.org/wp-content/gallery/cache/263__320x240_rkward1.png?cda6c1" alt="rkward1.png" title="rkward1.png" />
</a>
</strong></p>
<p>This is how rkward looks when loading it up (yes, it&#8217;s in Italian because that is my own locale). You have the R console (which I brought up) and then an output window which is used to display results. There is also another tab called &#8220;mio.dataset&#8221; (my.dataset) which keeps data, in a spreadsheet-like form. This is useful when you want to create your own datasets from scratch, or if you want to inspect one you have loaded.</p>
<p>So how do you start coding? You can create a new script using the &#8220;Script File&#8221; button. There you can enter R commands and then execute them all at once, or just the current line. If you prefer interactive work, you can use the R command line (shown in the screenshot).</p>

<a href="http://www.dennogumi.org/wp-content/gallery/screenshots/rkward2.png?cda6c1" title="" class="shutterset_singlepic264" >
	<img class="ngg-singlepic ngg-center" src="http://www.dennogumi.org/wp-content/gallery/cache/264__320x240_rkward2.png?cda6c1" alt="rkward2.png" title="rkward2.png" />
</a>

<p>You can also use rkward to import data: R provides a series of functions (like <code>read.table</code>) to load data sets (usually comma- or tab-delimited text files). rkward provides a complete GUI to those functions, which is shown in the screenshot above. Note that this feature requires PHP (the command-line version) to work.</p>
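<p>If you prefer scripting to dialogs, here is a rough Python analogue of what a <code>read.table</code>-style import does with a tab-delimited file. It is only a sketch: the function name <code>read_table</code> and the sample data are made up for illustration, and this is not what rkward runs internally.</p>
<pre class="brush: python; title: ; notranslate">

import csv
import io


def read_table(handle, sep="\t", header=True):
    """Read delimited text into a list of rows (dicts if header is True)."""
    reader = csv.reader(handle, delimiter=sep)
    rows = [row for row in reader if row]  # skip blank lines
    if not header or not rows:
        return rows
    names = rows[0]
    return [dict(zip(names, row)) for row in rows[1:]]


# Made-up sample data standing in for a real file
sample = io.StringIO("gene\tvalue\nTP53\t1.5\nBRCA1\t2.25\n")
table = read_table(sample)
# Every field is read as text, so numbers must be converted explicitly
values = [float(row["value"]) for row in table]
</pre>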

<a href="http://www.dennogumi.org/wp-content/gallery/screenshots/rkward5.png?cda6c1" title="" class="shutterset_singlepic266" >
	<img class="ngg-singlepic ngg-center" src="http://www.dennogumi.org/wp-content/gallery/cache/266__320x240_rkward5.png?cda6c1" alt="rkward5.png" title="rkward5.png" />
</a>

<p>Ok, we have data loaded. Now we may want to do some operations: rkward provides front-ends to many of R&#8217;s statistical functions. In the screenshot, we can see the GUI for a two-variable t-test. Notice how it also shows the code, so that more experienced R users can see exactly what it does.</p>
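<p>To give an idea of what such a dialog computes, here is a small Python sketch of the statistic behind a two-sample t-test. It assumes the Welch (unequal variances) variant, which is the default of R&#8217;s <code>t.test</code>; the function name <code>welch_t</code> and the numbers are invented for illustration. The p-value is left out because it needs the t distribution, which Python&#8217;s standard library does not provide.</p>
<pre class="brush: python; title: ; notranslate">

import math
import statistics


def welch_t(x, y):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    nx, ny = len(x), len(y)
    # Squared standard error of each sample mean
    sx = statistics.variance(x) / nx
    sy = statistics.variance(y) / ny
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sx + sy)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    return t, df


# Made-up measurements for two groups
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = welch_t(group_a, group_b)
</pre>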
<p>Like with statistics, R has powerful support for graphics, and even in this case rkward offers some frontends, for example histograms, boxplots, and scatter plots. You can also plot all kinds of distributions.</p>

<a href="http://www.dennogumi.org/wp-content/gallery/screenshots/rkward3.png?cda6c1" title="" class="shutterset_singlepic265" >
	<img class="ngg-singlepic ngg-center" src="http://www.dennogumi.org/wp-content/gallery/cache/265__320x240_rkward3.png?cda6c1" alt="rkward3.png" title="rkward3.png" />
</a>

<p>Lastly, rkward can manage your R packages (R package management is akin to that of a Linux distribution), and also your package sources. You can install or upgrade packages, and select where they&#8217;ll get installed.</p>
<p><strong>Conclusions</strong></p>
<p>rkward is a nice frontend for the R programming language, which adds a GUI with the power of KDE to R. Unfortunately the program is still somewhat unstable (as a warning at startup also points out) and its main developer currently has very little time to work on it. If you want to help, you can hop onto the <a title="rkward-devel" href="http://sourceforge.net/mailarchive/forum.php?forum_name=rkward-devel">rkward-devel mailing list</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2009/02/science-and-kde-rkward/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Published! (and it matters more)</title>
		<link>http://www.dennogumi.org/2009/01/published-and-it-matters-more</link>
		<comments>http://www.dennogumi.org/2009/01/published-and-it-matters-more#comments</comments>
		<pubDate>Tue, 06 Jan 2009 17:39:39 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[pathway analysis]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=489</guid>
		<description><![CDATA[Finally I can lift the curtain of silence and tell the reason why I&#8217;ve been very busy before Christmas: it all lies in the publication of a paper, &#8220;Using Pathway Signatures as Means of Identifying Similarities among Microarray Experiments&#8221;, which &#8230; <a href="http://www.dennogumi.org/2009/01/published-and-it-matters-more">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Finally I can lift the curtain of silence and tell the reason why I&#8217;ve been very busy before Christmas: it all lies in the publication of a paper, &#8220;Using Pathway Signatures as Means of Identifying Similarities among Microarray Experiments&#8221;, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0004128">which is finally out on this week&#8217;s issue of <em>PLoS ONE</em></a>. It&#8217;s different from <a href="http://www.dennogumi.org/2008/01/phd">the previous paper I mentioned</a> (which was not my first publication, either), for two main reasons:</p>
<ul>
<li>It&#8217;s a bioinformatics paper;</li>
<li>I am <strong>first author</strong> there.</li>
</ul>
<p>The second point is very important because it is usually more difficult for a person doing bioinformatics to end up as first author of a paper, since most of what we do is &#8220;something in the middle&#8221; like data analysis. Therefore, this paper is quite important for me. Also, it deals with an interest of mine, namely the analysis of biological networks using high-throughput platforms such as microarrays. Actually I&#8217;m also interested in network <em>reconstruction</em>, but I need to study far more than what I&#8217;m doing right now.</p>
<p>In any case, let&#8217;s hope this is the first of a (hopefully long) series!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2009/01/published-and-it-matters-more/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DataMatrix 0.7 has been released</title>
		<link>http://www.dennogumi.org/2008/12/datamatrix-07-has-been-released</link>
		<comments>http://www.dennogumi.org/2008/12/datamatrix-07-has-been-released#comments</comments>
		<pubDate>Sat, 27 Dec 2008 15:33:07 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[datamatrix]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/2008/12/datamatrix-07-has-been-released</guid>
		<description><![CDATA[Finally a new entry! I&#8217;ve been extremely busy with other things, that is why I did not have time to write more. One of the main reasons is related to an important landmark in my professional career, but I&#8217;ll write &#8230; <a href="http://www.dennogumi.org/2008/12/datamatrix-07-has-been-released">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Finally a new entry! I&#8217;ve been <strong>extremely</strong> busy with other things, that is why I did not have time to write more. One of the main reasons is related to an important landmark in my professional career, but I&#8217;ll write more about it after January 1st (hint: those who follow my Twitter updates may have already understood).</p>
<p>As a nice way to break the hiatus, I&#8217;m releasing a new version of DataMatrix, my implementation of R&#8217;s data.frame in Python. Although the version bump is small, there are loads of improvements. First of all, there is proper support for file-like objects, as well as support for appending and inserting both rows and columns. writeMatrix has been substantially improved and now writes files correctly, and I have added (experimental) support for a DataMatrix object that does not require files &#8211; EmptyMatrix. Also, there is now <a href="http://www.dennogumi.org/doc/datamatrix/">proper documentation</a>. Last but not least, unit tests have been added, a good way to watch out for regressions in the code.</p>
<p>Finally, this version marks the entrance of <a href="http://bioinfoblog.it">dalloliogm</a> as a contributor to the code. He gave quite a number of helpful hints, especially with regard to unit tests.</p>
<p>I&#8217;m quite satisfied with how DataMatrix behaves &#8211; as a matter of fact I use it extensively on a number of internal projects.</p>
<p>You can grab DataMatrix 0.7 as a <a href="http://www.dennogumi.org/files/datamatrix-0.7.tar.gz?cda6c1">source package</a> or as <a href="http://www.dennogumi.org/files/datamatrix-0.7.win32.exe?cda6c1">a Windows installer</a>.  Comments are welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2008/12/datamatrix-07-has-been-released/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The plague of cross-database annotations</title>
		<link>http://www.dennogumi.org/2008/11/the-plague-of-cross-database-annotations</link>
		<comments>http://www.dennogumi.org/2008/11/the-plague-of-cross-database-annotations#comments</comments>
		<pubDate>Sun, 02 Nov 2008 14:15:20 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[microarray]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=470</guid>
		<description><![CDATA[Recently I had to annotate a large (10,000+) number of genes identified by Entrez Gene IDs. My goal was to avoid &#8220;annotation files&#8221; (basically CSV files) that part of the wet lab group likes, because I wanted to stay up-to-date &#8230; <a href="http://www.dennogumi.org/2008/11/the-plague-of-cross-database-annotations">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Recently I had to annotate a large (10,000+) number of genes identified by Entrez Gene IDs. My goal was to avoid &#8220;annotation files&#8221; (basically CSV files) that part of the wet lab group likes, because I wanted to stay up-to-date without having to remember to update them. So the obvious solution was to use a service available on the web, and in an automated way. For reference, I just tried to attach gene symbol, gene name, chromosome and cytoband.<br />
I tried many services:</p>
<ul>
<li><strong><a href="http://genome.ucsc.edu">UCSC Genome Browser</a></strong>: it has a MySQL server but it&#8217;s rather slow and I did not want to clog it up. Using their tables and .sql files I managed to get a first shot at annotation, but about 2,000 genes were without annotation!</li>
<li><strong><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene">NCBI&#8217;s own Entrez Gene</a></strong>: This needs EUtils, and Biopython has no parser for Entrez Gene XML entries. I had to scrap the idea because I did not have time.</li>
<li><strong><a href="http://www.ensembl.org">Ensembl</a></strong>: I decided to use the <a href="http://www.biomart.org">Biomart</a> service, through Rpy. There were missing genes, and sometimes the IDs were &#8220;converted&#8221; into something else (I had no time to figure out what was happening). Also some perfectly valid genes (in Entrez Gene) were not present in Ensembl.</li>
</ul>
<p>In the end I just grabbed <a href="http://www.bioconductor.org/packages/2.3/data/annotation/html/org.Hs.eg.db.html">Bioconductor&#8217;s &#8220;org.Hs.eg.db&#8221; package</a> and used its sqlite gene database (from Entrez Gene) to annotate the list, with only 97 missing IDs (mostly genes that had changed identifiers). However, this effort revealed a problem: <em>the annotations are not consistent between databases</em>. This is a real pain when doing microarray-based analysis, because you often have a large number of genes, and a perceived lack of annotation might lead to a number of them getting discarded.</p>
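<p>The general idea can be sketched in Python with the standard <code>sqlite3</code> module. The table and column names below are a simplified, hypothetical schema built in memory (the actual layout of the org.Hs.eg.db database may differ), and the second annotation source is invented just to show how disagreements between databases can be listed.</p>
<pre class="brush: python; title: ; notranslate">

import sqlite3

# Hypothetical, simplified schema: the real org.Hs.eg.db layout may differ
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene_info (gene_id TEXT PRIMARY KEY, symbol TEXT, chromosome TEXT);
    INSERT INTO gene_info VALUES ('7157', 'TP53', '17');
    INSERT INTO gene_info VALUES ('672', 'BRCA1', '17');
""")


def annotate(conn, gene_ids):
    """Split gene_ids into an annotation dict and a list of missing IDs."""
    found, missing = {}, []
    for gene_id in gene_ids:
        row = conn.execute(
            "SELECT symbol, chromosome FROM gene_info WHERE gene_id = ?",
            (gene_id,),
        ).fetchone()
        if row is None:
            missing.append(gene_id)
        else:
            found[gene_id] = row
    return found, missing


def disagreements(first, second):
    """Return IDs present in both annotation dicts but with different values."""
    return sorted(k for k in first if k in second and first[k] != second[k])


found, missing = annotate(conn, ["7157", "672", "0000"])
# A second, made-up source that places the same ID on another chromosome
other = {"7157": ("TP53", "17"), "672": ("BRCA1", "13")}
conflicts = disagreements(found, other)
</pre>
<p>Keeping the missing IDs in a separate list, instead of silently dropping them, makes it easy to see how many genes each source fails to annotate.</p>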
<p>I thought the situation was better than this. If I annotate genes in different databases with the same ID, I expect to get identical results. I mean, it&#8217;s not like Gene or Ensembl have few resources&#8230; or am I wrong?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2008/11/the-plague-of-cross-database-annotations/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

