<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>dennogumi.org &#187; Science</title>
	<atom:link href="http://www.dennogumi.org/tag/science/feed" rel="self" type="application/rss+xml" />
	<link>http://www.dennogumi.org</link>
	<description>On the web since 1999</description>
	<lastBuildDate>Fri, 06 Jan 2012 14:56:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Multiscale bootstrap clustering with Python and R</title>
		<link>http://www.dennogumi.org/2011/05/multiscale-bootstrap-clustering-with-python-and-r</link>
		<comments>http://www.dennogumi.org/2011/05/multiscale-bootstrap-clustering-with-python-and-r#comments</comments>
		<pubDate>Sun, 29 May 2011 12:11:40 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=906</guid>
		<description><![CDATA[While reading the statistics for my blog, I noticed that a number of searches looked for hierarchical clustering with Python, which I covered quite a while ago. Today I&#8217;d like to present an updated version which uses more robust techniques. &#8230; <a href="http://www.dennogumi.org/2011/05/multiscale-bootstrap-clustering-with-python-and-r">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>While reading the statistics for my blog, I noticed that a number of searches looked for hierarchical clustering with Python, which <a href="http://www.dennogumi.org/2007/11/data-clustering-with-python">I covered quite a while ago</a>. Today I&#8217;d like to present an updated version which uses more robust techniques.</p>
<p><span id="more-906"></span></p>
<h2>Defining the problem</h2>
<p>Since Eisen&#8217;s original paper on clustering, this form of analysis has been widely used by researchers. However, it is known to be susceptible to an ordering bias: in other words, the order of the samples and/or genes might influence the final result. That&#8217;s why popular software such as <a href="http://www.tm4.org/mev/">TMeV</a> offers alternative approaches based on <i>bootstrapping</i>.&nbsp;</p>
<p>In this specific form of bootstrapping, the samples and/or genes are randomly shuffled a number of times (1000 or more iterations are a good starting point) and the resulting dendrograms are checked for consistency and robustness of partitioning. In other words, a p-value is calculated, our null hypothesis being that the arrangement of samples/genes is merely due to chance. Depending on the software, this value might be expressed either as a p-value or as a percentage (TMeV calls it <i>support</i>).&nbsp;</p>
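<p>To make the idea of support concrete, here is a minimal pure-Python sketch. It is a deliberately simplified stand-in and not what TMeV or pvclust actually compute: it assumes genes are resampled with replacement and only counts how often one sample&#8217;s nearest neighbour stays the same. The sample names, the values and the <i>bootstrap_support</i> helper are made up for illustration.</p>
<pre class="brush: python; title: ; notranslate">
import random

def distance(a, b, columns):
    # Euclidean distance restricted to the resampled gene indices
    return sum((a[i] - b[i]) ** 2 for i in columns) ** 0.5

def bootstrap_support(samples, pair, n_boot=1000, seed=0):
    """Fraction of replicates in which pair[1] is the nearest neighbour of pair[0]."""
    rng = random.Random(seed)
    first, second = pair
    n_genes = len(samples[first])
    others = [name for name in samples if name != first]
    hits = 0
    for _ in range(n_boot):
        # Resample gene indices with replacement (one bootstrap replicate)
        columns = [rng.randrange(n_genes) for _ in range(n_genes)]
        nearest = min(others, key=lambda name: distance(samples[first], samples[name], columns))
        if nearest == second:
            hits += 1
    return hits / n_boot

# Made-up expression values: two controls and two stimulated samples
samples = {
    "ctrl_1": [1.0, 1.1, 0.9, 1.2, 1.0],
    "ctrl_2": [1.1, 1.0, 1.0, 1.1, 0.9],
    "stim_1": [5.0, 4.8, 5.2, 5.1, 4.9],
    "stim_2": [5.1, 5.0, 4.9, 5.2, 5.0],
}
support = bootstrap_support(samples, ("ctrl_1", "ctrl_2"))
print(support)
</pre>
<p>With groups this well separated the pairing survives every replicate; on real data the fraction drops for unstable groupings, which is what the support value is meant to reveal.</p>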
<p>In the past years, I found <a href="http://www.is.titech.ac.jp/~shimo/prog/pvclust/">an interesting method developed by Hidetoshi Shimodaira</a>: the technique, called <i>multiscale bootstrap resampling</i>, aims at determining more accurate p-values out of the bootstrapping. Shimodaira calls the resulting p-value an <i>AU</i>&nbsp;value, where AU stands for &#8220;approximately unbiased&#8221;, a more precise p-value than the one obtained through bootstrapping alone.</p>
<p>In addition to this nice algorithm, an R package was also provided, named <i>pvclust </i>(it&#8217;s available on your favorite CRAN mirror). And that&#8217;s exactly what we&#8217;ll use for this exercise.</p>
<h2>Prerequisites</h2>
<p>Some of the readers of this blog might remember my disdain of R: while I need to use it for Bioconductor, I&#8217;m often annoyed by its weird syntax and difficult-to-understand error messages. Luckily, thanks to the hard work of Laurent Gautier and contributors, there&#8217;s <a href="http://rpy.sourceforge.net">rpy2</a>, a nice R-to-Python bridge. All the examples here require this package, version 2.1 or newer (I&#8217;d recommend the release candidate of 2.2, it&#8217;s really nice). Unfortunately, this means that Windows users are out of luck as there&#8217;s no version of rpy2 2.1 or 2.2 available for that platform.</p>
<p>Also, don&#8217;t forget to have the pvclust package installed in R.</p>
<h2>Loading and preparing the data</h2>
<div>Let&#8217;s start first by importing the necessary bits:</div>
<div></div>
<div>
<pre class="brush: python; title: ; notranslate">
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
</pre>
</div>
<div>The second line is important, because it&#8217;ll let us play with R libraries as if they were Python packages. Case in point, we&#8217;ll get the &#8220;base&#8221; and &#8220;pvclust&#8221; libraries loaded:</div>
<pre class="brush: python; title: ; notranslate">
base = importr(&quot;base&quot;)
pvclust = importr(&quot;pvclust&quot;)
</pre>
<div>Now we can manipulate them as if they were modules, and most of R&#8217;s dotted function names have been converted to use underscores, as the dot is the attribute-access operator in Python. Example: as.data.frame becomes as_data_frame.&nbsp;</div>
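<div>The renaming rule can be pictured with a tiny sketch. This is only an illustration of the convention: <i>r_name_to_python</i> is a made-up helper, and rpy2&#8217;s own translation does more (it has to deal with name clashes, for instance).</div>
<pre class="brush: python; title: ; notranslate">
def r_name_to_python(r_name):
    # Replace R's dots with underscores so the name is a valid Python attribute
    return r_name.replace(".", "_")

print(r_name_to_python("as.data.frame"))
</pre>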
<div>Next, we&#8217;ll load the data in a data.frame. rpy2 conveniently gives us the <i>DataFrame </i>class, which is a no-nonsense wrapper to R&#8217;s data.frames. For this exercise, we&#8217;ll load a set of normalized data from <a href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4984">GSE4984</a>, a microarray experiment with dendritic cells exposed to different stimuli. It&#8217;s just a matter of downloading the data from <a href="http://www.ebi.ac.uk/arrayexpress/files/E-GEOD-4984/E-GEOD-4984.processed.1.zip">Array Express</a>&nbsp;(if you ask why from AE and not GEO: the latter doesn&#8217;t have a clearly-identified link for normalized data)&nbsp;and then loading it in a data.frame as:</div>
<pre class="brush: python; title: ; notranslate">
dataframe = robjects.DataFrame.from_csvfile(&quot;GSE4984.txt&quot;, sep=&quot;\t&quot;, row_names=1)
</pre>
<div>The resulting Python object has all the attributes of an R data.frame but with added Python goodness. We can use the <i>rownames</i>&nbsp;and <i>colnames</i>&nbsp;attributes to access the row names (if set) and column names of the object, and likewise we can use <i>nrow</i>&nbsp;and <i>ncol</i>&nbsp;to quickly check the number of rows and columns.</p>
<div>Since a full array has a lot of genes, we&#8217;re going to choose only the first 500 genes:</p>
<pre class="brush: python; title: ; notranslate">
rows = robjects.IntVector(range(1,501))
subset = dataframe.rx(rows, True)
</pre>
<div>An <i>IntVector</i>&nbsp;is an rpy2 object which replicates R&#8217;s vectors of integers: there are variants for strings, floats, lists (R lists, not the Python type) and factors. rx is an <i>accessor</i>&nbsp;that mimics R&#8217;s item access: in short, it&#8217;s equivalent to</p>
<pre class="brush: r; title: ; notranslate">
subset &lt;- dataframe[rows, ]
</pre>
<div>rpy2 has another accessor, <i>rx2,</i>&nbsp;which mimics the [[ ]] access in data.frames.</div>
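<div>One detail worth keeping in mind is that R indices start at 1, which is why the range above runs from 1 to 500 inclusive. A minimal pure-Python sketch of the same selection on a list of rows (the <i>rx_rows</i> helper and the table contents are made up for illustration, they are not part of rpy2):</div>
<pre class="brush: python; title: ; notranslate">
def rx_rows(table, rows):
    # R indices start at 1, Python lists at 0, hence the shift
    return [table[i - 1] for i in rows]

# A made-up table of 1000 rows: gene name plus one value
table = [["gene_%d" % n, n * 0.5] for n in range(1, 1001)]
rows = list(range(1, 501))
subset = rx_rows(table, rows)
print(len(subset), subset[0][0], subset[-1][0])
</pre>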
<h2>Clustering</h2>
<p>Once we have the data, it&#8217;s time to do some serious clustering on it:</p>
<pre class="brush: python; title: ; notranslate">
result = pvclust.pvclust(subset, nboot=100, method_dist=&quot;correlation&quot;, method_hclust=&quot;average&quot;)
</pre>
<p>We&#8217;re using a small number of bootstrap replicates (100) because the computation times are long. You can change the distance metric and the linkage type using <i>method_dist</i>&nbsp;and <i>method_hclust</i>. Internally the data.frame is converted to a matrix, so ensure you have valid data (i.e. numeric) prior to proceeding.</p>
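<p>For reference, the &#8220;correlation&#8221; distance is, to my knowledge, one minus the Pearson correlation between two profiles. Here is a minimal pure-Python sketch of that measure; the function names and the sample values are made up, and this is not pvclust&#8217;s own code.</p>
<pre class="brush: python; title: ; notranslate">
import math

def pearson(x, y):
    # Pearson correlation coefficient of two equally long profiles
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(x, y):
    # 0 for perfectly correlated profiles, 2 for perfectly anti-correlated ones
    return 1.0 - pearson(x, y)

sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [2.0, 4.0, 6.0, 8.0]  # same shape as sample_a, different scale
sample_c = [4.0, 3.0, 2.0, 1.0]  # mirror image of sample_a
print(correlation_distance(sample_a, sample_b))
print(correlation_distance(sample_a, sample_c))
</pre>
<p>Because only the shape of the profile matters, two samples that differ just by scale end up at distance zero, which is usually what you want for expression data.</p>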
<p>Notice that this will just cluster the columns by default. If we want to cluster genes, we have to transpose the data.frame. In this case we have to first convert it to a matrix, then transpose it:</p>
<pre class="brush: python; title: ; notranslate">
matrix = base.as_matrix(subset)
subset_transposed = matrix.transpose()
result_rows = pvclust.pvclust(subset_transposed, nboot=100, method_dist=&quot;correlation&quot;, method_hclust=&quot;average&quot;)
</pre>
<p>Once the computation is done, we have a <i>pvclust</i>&nbsp;object which holds information on the results. What we&#8217;re most interested in is the <i>hclust</i>&nbsp;attribute, as it holds a dendrogram object we can use for plotting (either standalone or via a heat map). We can also manipulate the object with the <i>pvpick</i>&nbsp;function, for example to color the branches of the dendrogram based on their AU values.</p>
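<p>The kind of selection <i>pvpick</i> performs can be sketched in plain Python as a threshold on AU values. The cluster names and AU values below are invented, <i>pick_clusters</i> is a hypothetical helper, and the 0.95 cut-off is assumed to mirror pvpick&#8217;s default <i>alpha</i>; the real function also works on the tree structure, which this sketch ignores.</p>
<pre class="brush: python; title: ; notranslate">
def pick_clusters(au_values, alpha=0.95):
    # Keep the clusters whose AU value reaches the threshold
    return [name for name, au in sorted(au_values.items()) if au >= alpha]

# Invented AU values, for illustration only
au_values = {"cluster_1": 0.99, "cluster_2": 0.71, "cluster_3": 0.96}
significant = pick_clusters(au_values)
print(significant)
</pre>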
<p>To get a fast representation, we can just dump the object as it is to a dendrogram which will show AU and BP values for each element of the cluster:</p>
<pre class="brush: python; title: ; notranslate">
graphics = importr(&quot;graphics&quot;)
graphics.plot(result)
</pre>
<div>Or we can do the same, but to a PDF:</div>
<pre class="brush: python; title: ; notranslate">
graphics = importr(&quot;graphics&quot;)
grdevices = importr(&quot;grDevices&quot;)
grdevices.pdf(&quot;myresult.pdf&quot;, paper=&quot;a4&quot;)
graphics.plot(result)
grdevices.dev_off()</pre>
<div>Of course, we might want a heat map (<b>everyone</b>&nbsp;wants pretty heat maps, right?). In that case we extract both dendrograms and use something like gplots&#8217; <i>heatmap.2 </i>&nbsp;function to represent it (you will need the <i>gplots</i>&nbsp;package installed in order for the following to work):</div>
<pre class="brush: python; title: ; notranslate">
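gplots = importr(&quot;gplots&quot;)
stats = importr(&quot;stats&quot;)
# heatmap.2 expects dendrogram objects and a numeric matrix
row_dendrogram = stats.as_dendrogram(result_rows.rx2(&quot;hclust&quot;))
column_dendrogram = stats.as_dendrogram(result.rx2(&quot;hclust&quot;))
gplots.heatmap_2(base.as_matrix(subset), Rowv=row_dendrogram, Colv=column_dendrogram, col=gplots.greenred(255), density_info=&quot;none&quot;)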
</pre>
<div>You can add the <i>grdevices</i>&nbsp;lines like above to make a PDF of the plot. &nbsp;Notice that we have used the <i>rx2</i>&nbsp;accessor here, as mentioned above, to access the <i>hclust</i>&nbsp;attribute of the pvclust object.&nbsp;</div>
<h2>Moving further</h2>
<p>pvclust as-is is quite slow. There&#8217;s however a parallelized version, called <i>parPvclust</i>, which uses <i>snow</i>&nbsp;to parallelize the clustering, either across multiple machines or using multiple cores. Setting up snow properly is beyond the scope of this tutorial, but it may be worth the investment if you cluster a lot of data.</p>
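<p>Bootstrap replicates are independent of each other, which is what makes this kind of work easy to parallelize. As a rough sketch of the idea, and assuming the work is shared by giving each worker its own batch of replicates (I have not checked how parPvclust divides it internally), the split could look like this; <i>split_replicates</i> is a made-up helper.</p>
<pre class="brush: python; title: ; notranslate">
def split_replicates(nboot, workers):
    # Give every worker the same share, handing leftovers to the first ones
    base, extra = divmod(nboot, workers)
    return [base + 1] * extra + [base] * (workers - extra)

print(split_replicates(1000, 4))
print(split_replicates(100, 3))
</pre>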
</div>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2011/05/multiscale-bootstrap-clustering-with-python-and-r/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Akademy: my own BoF</title>
		<link>http://www.dennogumi.org/2010/05/akademy-my-own-bof</link>
		<comments>http://www.dennogumi.org/2010/05/akademy-my-own-bof#comments</comments>
		<pubDate>Sat, 29 May 2010 19:55:37 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[KDE]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[akademy2010]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=774</guid>
		<description><![CDATA[My Akademy talk proposal was not accepted, but the organizers were kind enough to offer me the chance to hold a BoF on the same subject. Now I bet you wonder what I&#8217;m going to discuss, and I think &#8230; <a href="http://www.dennogumi.org/2010/05/akademy-my-own-bof">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p align="center"><a href="http://akademy.kde.org"><img src="http://www.dennogumi.org/wp-content/uploads/2010/05/igta2010.png?cda6c1" title="I'm going to Akademy 2010" alt="I'm going to Akademy 2010 image" /></a></p>
<p>My Akademy talk proposal was not accepted, but the organizers were kind enough to offer me the chance to hold a BoF on the same subject. Now I bet you wonder what I&#8217;m going to discuss, and I think the title already gives you an idea:</p>
<p align="center"><strong>KDE and bioinformatics: the missing link</strong></p>
<p align="left">Although in the KDE community we have our fair share of scientists (hey there, Stuart!), my BoF will focus on the adoption of KDE in the field of <a href="http://en.wikipedia.org/wiki/Bioinformatics" title="General explanation on bioinformatics">bioinformatics</a> (my day job, not-so-by-chance) on the &quot;outsiders&quot; front and how to improve the current situation. To elaborate further, bioinformatics is a rather broad field where biological data are treated with computational methods. The oldest and most famous branch of bioinformatics is sequence analysis and related fields, where sequences of DNA are analyzed, for example, to find common ancestors among several species, or to reconstruct the genetic code of an organism by comparing it to a related species. Another recent example is related to <em>high-throughput technologies</em>, technologies which produce huge amounts of data from a very small number of experiments (&quot;<a href="http://en.wikipedia.org/wiki/DNA_sequencing#Large-scale_sequencing_strategies" title="Wikipedia article">ultramassive sequencing</a>&quot; and <a href="http://en.wikipedia.org/wiki/DNA_microarray" title="Wikipedia explanation">DNA microarrays</a> are examples of such a technology). </p>
<p align="left">Either way, bioinformaticians have to deal with large amounts of data all the time, and usually there&#8217;s no &quot;shrink-wrap&quot; solution to the problems they have to face, software-wise. That&#8217;s because we do research, so we need to find something new. So the solution is often to write algorithms, or re-implement existing ones in a form that is suited for the tasks at hand. So, bioinformaticians also write software, although they&#8217;re by no means (usually) professional coders: some have a mathematical or statistical background, others (like me) come from an experience at the lab bench. What kind of programs do bioinformaticians write? Normally scripts and small stuff, but in certain cases even full-blown algorithms and applications. Some become so famous that they are even trend-setters.</p>
<p align="left">Which brings us to the heart of the matter: how does KDE stand in all of this? Sadly, not too well. I&#8217;ve done some research in the published literature, but there&#8217;s just <strong>one</strong> hit returned that&#8217;s proper: <a href="http://www.ncbi.nlm.nih.gov/pubmed/18695948" title="KInNeSS: a modular framework for computational neuroscience.">a KDE application for neuroscience</a> (based on the 3.5.x Development Platform) published in 2008. I know that big research places like CERN use KDE, but to my knowledge smaller outfits such as research groups code in the majority of cases for Windows or for web-based solutions. Given that at least a significant portion of bioinformaticians uses UNIX-like operating systems, the question we need to answer is: why?</p>
<p align="left">The first and foremost problem is related to market share. Research groups don&#8217;t even know that KDE exists, so it&#8217;s unlikely they develop something using the Development Platform (even now that&#8217;s becoming more cross-platform). This is where some promo efforts could help. Secondly, the problem lies in the &quot;difficulty&quot; (notice the quotes!) of developing using the KDE Development platform: most bioinformaticians, as I wrote, are <strong>not</strong> professional coders, and few of them know C++. The most used languages in bioinformatics are Perl and Java (with some Python and Ruby thrown into the mix). Thus, the need for proper bindings. The bindings are there, thanks to the excellent work of the kde-bindings team, but documentation is still lacking (namely in the examples department, but also in tutorials and getting started guides that aren&#8217;t aimed at C++). Some documentation is auto-generated, and while the KDE API docs are usually not too hard to read, they can still scare off newcomers. Of course this is not the fault of the kde-bindings team: namely, more help is needed. </p>
<p align="left">Promo efforts and better bindings are the keys to spreading KDE further in the field of bioinformatics. This is what my BoF is about, plus an informal discussion on the use of FOSS in academia and related matters. </p>
<p align="left">Interested? If you are, you can come to the BoF which will be on <strong>Tuesday, 6th July</strong> at <strong>15.00</strong> in the Area 2 of the main room at Demola. </p>
<p align="left">I&#8217;ll also be around later till the following morning (sadly, two days is the best I can do to attend) in case you&#8217;re interested for a chat.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2010/05/akademy-my-own-bof/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Gene search applet: suggestions and code review needed</title>
		<link>http://www.dennogumi.org/2009/03/gene-search-applet-suggestions-and-code-review-needed</link>
		<comments>http://www.dennogumi.org/2009/03/gene-search-applet-suggestions-and-code-review-needed#comments</comments>
		<pubDate>Tue, 31 Mar 2009 17:33:09 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[KDE]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[bioinformatics]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=594</guid>
		<description><![CDATA[In the past months I&#8217;ve always wanted to write a small Plasma applet to aid me in some boring tasks as a bioinformatician. One example (for the non-scientific crowd out there) is when I find a specific gene out of &#8230; <a href="http://www.dennogumi.org/2009/03/gene-search-applet-suggestions-and-code-review-needed">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In the past months I&#8217;ve always wanted to write a small Plasma applet to aid me in some boring tasks as a bioinformatician. One example (for the non-scientific crowd out there) is when I find a specific gene out of my analysis work which I want to take a look at. I am often lazy, so instead of firing up the browser to look at the online resources, I wanted to write something which could access said resources programmatically.</p>
<p><span id="more-594"></span></p>
<p>I found a way thanks to the <a href="http://biopython.org" title="The Biopython project">Biopython project</a>, which offers a Python module to access the resources of the <a href="http://www.ncbi.nlm.nih.gov" title="NCBI">National Center for Biotechnology Information (NCBI)</a> by providing an interface to their <a href="http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html" title="EUtils web page">EUtils</a>. Since the back-end was already taken care of (almost, at least), I sought to write a small Plasma applet. Which is what I&#8217;m presenting today. It&#8217;s written in Python, and uses the Python ScriptEngine to work. Currently, it searches the &#8220;Gene&#8221; database at NCBI by inputting the &#8220;Entrez Gene IDs&#8221;, which are numerical IDs that uniquely identify a gene record, and returns name, official symbol, organism, and a description if it&#8217;s present. It does not support anything else (see below).</p>
<p>The code lives in <a href="http://github.com/cswegger/plasma-genesearch/tree/master" title="Code repository">a git repository at github</a>. <strong>WARNING: </strong>The code may be a complete mess (I&#8217;m not too well versed in GUI stuff, I mostly do text file manipulation). If you are so daring, you can obtain and install it in a very simple manner:</p>
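<p>To give an idea of what such a lookup boils down to, here is a minimal sketch that only builds the EUtils request URL for a Gene ID, using nothing but the standard library. It is not the applet&#8217;s code (which goes through Biopython): the <i>gene_summary_url</i> helper is made up, and the choice of the <i>esummary</i> endpoint is my assumption about a reasonable way to fetch a record summary.</p>
<pre class="brush: python; title: ; notranslate">
from urllib.parse import urlencode

# Assumed endpoint: the EUtils "esummary" service
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

def gene_summary_url(gene_id):
    # Entrez Gene IDs are purely numerical, so reject anything else early
    gene_id = str(gene_id).strip()
    if not gene_id.isdigit():
        raise ValueError("Entrez Gene IDs are numeric: %r" % gene_id)
    return EUTILS_BASE + "?" + urlencode({"db": "gene", "id": gene_id})

url = gene_summary_url(10000)
print(url)
</pre>
<p>Checking the ID before building the request keeps obviously wrong input (a gene symbol, for example) from ever reaching the server.</p>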
<p>
<pre class="brush: bash; title: ; notranslate">git clone git://github.com/cswegger/plasma-genesearch.git
cd plasma-genesearch
zip -r ../plasma-genesearch.plasmoid *
plasmapkg -i ../plasma-genesearch.plasmoid</pre>
</p>
<p>After that you will see an &#8220;Entrez Gene Searcher&#8221; in your add applets dialog. Once added, it&#8217;ll look like this:</p>
<p align="center"><img src="http://www.dennogumi.org/wp-content/uploads/2009/03/plasma-genesearch1.png?cda6c1" title="Gene searcher" alt="Gene searcher image" /></p>
<p align="left">Pretty horrible, isn&#8217;t it? Well, once you get past that, you can input an ID (only IDs will work for now) in the text field (which doesn&#8217;t clear the text: see further on) and push &#8220;Go!&#8221;. The following is an example with ID 10000, which corresponds to the human gene <em>AKT3</em>:</p>
<p align="center"><img src="http://www.dennogumi.org/wp-content/uploads/2009/03/plasma-genesearch2.png?cda6c1" title="Gene search results" alt="Gene search results image" /></p>
<p align="left">&#8220;Search again&#8221; will bring you back to the search form.</p>
<p align="left">Now, what has this to do with Planet KDE? Well, I&#8217;m asking for some code review from the community, if it&#8217;s possible, and suggestions to improve the horrid default look. I am especially interested in the layout code, since I did not quite understand how it works: I mean, it should not work and it <em>does&#8230;.</em> </p>
<p align="left">Other things that need to be improved are:</p>
<ul>
<li align="left">The Plasma.TextEdit is not cleared upon clicking. Is there a signal I can catch for that, so I can connect it to clear()?</li>
<li align="left">Proper searching. Bio.Entrez already does this: what I need is  a way to display the records properly. </li>
<li align="left">A way to link the names to URLs, and have them open in Konqueror. </li>
</ul>
<p>That should be it. I hope to work on it some more next weekend&#8230;.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2009/03/gene-search-applet-suggestions-and-code-review-needed/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Moving on</title>
		<link>http://www.dennogumi.org/2009/02/moving-on</link>
		<comments>http://www.dennogumi.org/2009/02/moving-on#comments</comments>
		<pubDate>Fri, 27 Feb 2009 16:30:57 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=569</guid>
		<description><![CDATA[Some say that all good things must come to an end. I&#8217;m not entirely sure that this is a universal truth, but I can say that at some point in life there are decisions that need to be taken. In &#8230; <a href="http://www.dennogumi.org/2009/02/moving-on">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Some say that all good things must come to an end. I&#8217;m not entirely sure that this is a universal truth, but I can say that at some point in life there are decisions that need to be taken.</p>
<p>In this case I made my own: today was the last day in <a href="http://www.centro-cisi.com/microarray.htm">Dr. Cristina Battaglia&#8217;s laboratory</a>, a place where I spent my three-year Ph.D. course and one year as a post-doc research fellow.</p>
<p>Those four years were not bad at all. They were interesting, and provided a good learning experience. I think I owe quite a bit to that place, especially because I was able to learn and improve my skills alongside the analysis and research work. So my thanks go to my former supervisor (Dr. Cristina Battaglia) and all my colleagues. It&#8217;s been a fun ride.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2009/02/moving-on/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Science and KDE: kile</title>
		<link>http://www.dennogumi.org/2009/02/science-and-kde-kile</link>
		<comments>http://www.dennogumi.org/2009/02/science-and-kde-kile#comments</comments>
		<pubDate>Sun, 22 Feb 2009 20:49:20 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[KDE]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[latex]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=551</guid>
		<description><![CDATA[During the course of my research work, I may obtain results that are worthy of publication in scientific journals. Since my master&#8217;s thesis I&#8217;ve been using LaTeX as my writing platform, mainly because I can concentrate on content rather than &#8230; <a href="http://www.dennogumi.org/2009/02/science-and-kde-kile">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>During the course of my research work, I may obtain results that are worthy of publication in scientific journals. Since my master&#8217;s thesis I&#8217;ve been using <a title="LaTeX web page" href="http://latex-project.org">LaTeX</a> as my writing platform, mainly because I can concentrate on content rather than presentation (I also find it useful for writing non-scientific stuff). Also, I can handle bibliography (essential for a scientific publication) very well without using expensive proprietary applications (such as Endnote).</p>
<p>In my early days I used kLyX first, then <a title="LyX" href="http://www.lyx.org">LyX</a>, but I found the platform to be too limited for my tastes, and also LaTeX errors were difficult to diagnose. I needed a proper editor, and that&#8217;s when I heard of <a title="Kile's web page" href="http://kile.sourceforge.net">kile, a KDE front-end for LaTeX</a>. Kile is currently at version 2.0.2 and is a KDE 3 application. However, in KDE SVN work is ongoing to produce a KDE4 version (2.1) and that&#8217;s what I&#8217;ll look at in this entry.</p>
<p><span id="more-551"></span></p>
<p><strong>Obtaining kile 2.1</strong></p>
<p>First and foremost, a disclaimer. kile 2.1 has not been released yet in any form, and so should be considered unstable and crash-prone. That said, it runs more or less well on my platform.</p>
<p>The first thing to do is to grab the sources from SVN:</p>
<p><code>svn checkout svn://anonsvn.kde.org/home/kde/trunk/extragear/office/kile</code></p>
<p>That will put kile&#8217;s sources in a directory called &#8220;kile&#8221;. The next step is to compile it (as usual, you need KDE4 development packages/files installed):</p>
<p><code>cd kile<br />
mkdir build; cd build<br />
cmake -DCMAKE_INSTALL_PREFIX=`kde4-config --prefix` ../<br />
make</code></p>
<p>Followed by the usual <code>make install</code> as root or using <code>sudo</code>.</p>
<p><strong>kile 2.1 at a glance</strong></p>
<p>This is how kile looks when loaded on my system:</p>
<p style="text-align: center;"><a class="shutterset_" title="Kile at startup" href="http://www.dennogumi.org/wp-content/gallery/screenshots/kile1.png?cda6c1"><img class="ngg-singlepic ngg-none" src="http://www.dennogumi.org/wp-content/gallery/screenshots/thumbs/thumbs_kile1.png?cda6c1" alt="kile1.png" /></a></p>
<p style="text-align: left;">(For the inquisitive people, it&#8217;s not a scientific work, rather a sci-fi like book I&#8217;m writing).</p>
<p style="text-align: left;">Kile uses the katepart for editing, so that means all the goodies that come with Kate can be used, including the recently-added vim input mode. Aside from editing and LaTeX syntax highlighting, kile offers a configurable LaTeX command completion, as this screenshot shows:</p>
<p style="text-align: center;"><a class="shutterset_" title="Command completion" href="http://www.dennogumi.org/wp-content/gallery/screenshots/kile4.png?cda6c1"><img class="ngg-singlepic ngg-none" src="http://www.dennogumi.org/wp-content/gallery/screenshots/thumbs/thumbs_kile4.png?cda6c1" alt="kile4.png" /></a></p>
<p style="text-align: left;">From the toolbars and the menus you can insert almost every LaTeX command known to mankind. For the people less apt with LaTeX, kile offers a series of wizards to make the creation of figures, tables and even complete documents easier. The one I&#8217;m showing here is the Quick Start wizard, which enables you to select document classes, add packages, and add information like author and date. As I was saying earlier, kile 2.1 is still a work in progress, and that explains why the dialog is still a little unrefined.</p>
<p style="text-align: center;"><a class="shutterset_" title="Quick start wizard" href="http://www.dennogumi.org/wp-content/gallery/screenshots/kile2.png?cda6c1"><img class="ngg-singlepic ngg-none" src="http://www.dennogumi.org/wp-content/gallery/screenshots/thumbs/thumbs_kile2.png?cda6c1" alt="kile2.png" /></a></p>
<p style="text-align: left;">As with its KDE3 counterpart, kile offers the possibility of using &#8220;projects&#8221;, which means you can collect LaTeX documents, bib files, and so on, and associate them together. You can also set a master document, so that even if you are editing other files (included in the master document), when you build your LaTeX file the compilation runs on the master document. Here too, a wizard helps in creating a project and the master document.</p>
<p style="text-align: center;"><a class="shutterset_" title="New project" href="http://www.dennogumi.org/wp-content/gallery/screenshots/kile3.png?cda6c1"><img class="ngg-singlepic ngg-none" src="http://www.dennogumi.org/wp-content/gallery/screenshots/thumbs/thumbs_kile3.png?cda6c1" alt="kile3.png" /></a></p>
<p style="text-align: left;">Lastly, kile has a plethora of other options, including customizing what you can use to build LaTeX files and view them (DVI, PS, PDF&#8230;), as shown in this screenshot.</p>
<p style="text-align: center;"><a class="shutterset_" title="Build options" href="http://www.dennogumi.org/wp-content/gallery/screenshots/kile5.png?cda6c1"><img class="ngg-singlepic ngg-none" src="http://www.dennogumi.org/wp-content/gallery/screenshots/thumbs/thumbs_kile5.png?cda6c1" alt="kile5.png" /></a></p>
<p style="text-align: left;"><strong>Conclusions</strong></p>
<p style="text-align: left;">I have merely scratched the surface of this application, which is extremely powerful and can help anyone with their LaTeX needs. While the many options may be confusing, I think that this application is already geared towards a technically-inclined userbase and so it doesn&#8217;t matter much. kile 2.1 is still unstable but extremely promising, and I&#8217;m looking forward to its release.</p>
<p style="text-align: left;">
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2009/02/science-and-kde-kile/feed</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Science and KDE: rkward</title>
		<link>http://www.dennogumi.org/2009/02/science-and-kde-rkward</link>
		<comments>http://www.dennogumi.org/2009/02/science-and-kde-rkward#comments</comments>
		<pubDate>Sat, 07 Feb 2009 18:55:53 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[KDE]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[rkward]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=533</guid>
		<description><![CDATA[I try to use FOSS extensively for my scientific work. In fact, when possible, I use only FOSS tools. Among these there is the R programming language. It&#8217;s a Free implementation of the S-plus language, and it&#8217;s mainly aimed at &#8230; <a href="http://www.dennogumi.org/2009/02/science-and-kde-rkward">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I try to use FOSS extensively for my scientific work. In fact, when possible, I use <em>only</em> FOSS tools. Among these is the R programming language. It&#8217;s a Free implementation of the S language, and it&#8217;s mainly aimed at statistics and mathematics. As the people who read my scientific posts know, I don&#8217;t like R much. But sometimes it&#8217;s the only alternative.</p>
<p>Well, what does R have to do with KDE? With this post I&#8217;d like to start a series (hopefully) of articles that deals with KDE programs used for scientific purposes. In this particular entry, I&#8217;ll focus on rkward, a GUI front-end for R.<br />
<span id="more-533"></span><br />
<strong>Introduction</strong></p>
<p>Although R is a programming language, it&#8217;s mainly used in an interactive session, started from the terminal. The standard installation can be improved by the use of add-on packages, <em>libraries</em> in R-speak, which can be installed from the Internet (Comprehensive R Archive Network or CRAN) or from local files. One of the most famous third party repositories is the Bioconductor project, which hosts a lot of packages used by life scientists who do bioinformatics.</p>
<p>The Windows version of R has a GUI (Rgui) which provides extra functionality, such as package management and loading, and other goodies. Although there were plans for a GTK+ frontend for Linux, the project is (as far as I know) stuck in limbo.</p>
<p>That&#8217;s where rkward comes to the rescue. It&#8217;s a GUI front-end for R for KDE4, which aims to provide a graphical shell for many R commands and environments (and especially the publication-quality plotting figures).</p>
<p><strong>Getting rkward</strong></p>
<p>rkward is available from <a title="rkward main page" href="http://rkward.sourceforge.net/">Sourceforge.net</a>. Unfortunately, if you use a recent (&gt;=2.8) version of R, the released version won&#8217;t compile, due to changes in R itself. In that case, you need to download the sources directly from SVN with a command like this:</p>
<pre class="brush: cpp; title: ; notranslate">

svn co https://rkward.svn.sourceforge.net/svnroot/rkward/trunk/rkward/
</pre>
<p>Either way, the sources are compiled the usual way, that is:</p>
<pre class="brush: cpp; title: ; notranslate">

cd rkward-xxx # Your rkward source dir
mkdir build; cd build
cmake  -DCMAKE_INSTALL_PREFIX=`kde4-config --prefix` ../
make
</pre>
<p>Followed by <code>make install</code> as root or using sudo, depending on your distribution.</p>
<p><strong>rkward at a glance</strong></p>
<p><strong>
<a href="http://www.dennogumi.org/wp-content/gallery/screenshots/rkward1.png?cda6c1" title="" class="shutterset_singlepic263" >
	<img class="ngg-singlepic ngg-center" src="http://www.dennogumi.org/wp-content/gallery/cache/263__320x240_rkward1.png?cda6c1" alt="rkward1.png" title="rkward1.png" />
</a>
</strong></p>
<p>This is how rkward looks when loading it up (yes, it&#8217;s in Italian because that is my own locale). You have the R console (which I brought up) and then an output window which is used to display results. There is also another tab called &#8220;mio.dataset&#8221; (my.dataset) which keeps data, in a spreadsheet-like form. This is useful when you want to create your own datasets from scratch, or if you want to inspect one you have loaded.</p>
<p>So how do you start coding? You can create a new script using the &#8220;Script File&#8221; button. There you can type R commands and then execute either the whole script or just the current line. If you prefer interactive work, you can use the R command line (shown in the screenshot).</p>

<a href="http://www.dennogumi.org/wp-content/gallery/screenshots/rkward2.png?cda6c1" title="" class="shutterset_singlepic264" >
	<img class="ngg-singlepic ngg-center" src="http://www.dennogumi.org/wp-content/gallery/cache/264__320x240_rkward2.png?cda6c1" alt="rkward2.png" title="rkward2.png" />
</a>

<p>You can also use rkward to import data: R provides a series of functions (like <code>read.table</code>) to load data sets (usually comma- or tab-delimited text files). rkward provides a complete GUI to those functions, which is shown in the screenshot above. Note that, in order to work, it requires PHP (the command-line version).</p>
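<p>To make clear what &#8220;loading a tab-delimited file&#8221; boils down to, here is a minimal sketch in Python (not R, and not the code rkward generates). The function name <code>read_table</code> and the sample data are made up for illustration, and the sketch covers only a tiny part of what <code>read.table</code> does.</p>
<pre class="brush: python; title: ; notranslate">

import csv
import io


def read_table(handle, sep="\t"):
    """Return (header, rows) from a delimited text stream.

    Cells that look numeric are converted to floats, the rest stay strings.
    """
    reader = csv.reader(handle, delimiter=sep)
    header = next(reader)  # the first line holds the column names
    rows = []
    for record in reader:
        row = []
        for cell in record:
            try:
                row.append(float(cell))
            except ValueError:
                row.append(cell)
        rows.append(row)
    return header, rows


# Hypothetical in-memory data standing in for a file on disk
sample = io.StringIO("gene\tcontrol\ttreated\nabc1\t1.5\t2.5\nxyz2\t0.5\t4.0\n")
header, rows = read_table(sample)
print(header)
print(rows)
</pre>
<p>For a real file, you would pass an open file object instead of the <code>io.StringIO</code> buffer, and <code>sep=","</code> for comma-delimited data.</p>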

<a href="http://www.dennogumi.org/wp-content/gallery/screenshots/rkward5.png?cda6c1" title="" class="shutterset_singlepic266" >
	<img class="ngg-singlepic ngg-center" src="http://www.dennogumi.org/wp-content/gallery/cache/266__320x240_rkward5.png?cda6c1" alt="rkward5.png" title="rkward5.png" />
</a>

<p>Ok, we have data loaded. Now we may want to do some operations: rkward provides front-ends to many of R&#8217;s statistical functions. In the screenshot, we can see the GUI for a two-variable t-test. Notice how it also shows the code, so more experienced R users can see exactly what it does.</p>
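<p>For the curious, this is roughly the arithmetic behind a two-sample t-test, sketched in Python rather than R. I am assuming the unequal-variance (Welch) variant, which is what R&#8217;s <code>t.test</code> uses by default; whether the rkward dialog uses the same settings is something I have not checked. The function name <code>welch_t</code> and the numbers are made up.</p>
<pre class="brush: python; title: ; notranslate">

from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    # Squared standard error of each sample mean
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / (vx + vy) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical measurements
control = [1.0, 2.0, 3.0, 4.0, 5.0]
treated = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(control, treated)
print(t, df)
</pre>
<p>The sketch stops at the statistic and the degrees of freedom: turning them into a p-value needs the t distribution, which the Python standard library does not provide.</p>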
<p>As with statistics, R has powerful support for graphics, and here too rkward offers some frontends, for example for histograms, boxplots, and scatter plots. You can also plot all kinds of distributions.</p>

<a href="http://www.dennogumi.org/wp-content/gallery/screenshots/rkward3.png?cda6c1" title="" class="shutterset_singlepic265" >
	<img class="ngg-singlepic ngg-center" src="http://www.dennogumi.org/wp-content/gallery/cache/265__320x240_rkward3.png?cda6c1" alt="rkward3.png" title="rkward3.png" />
</a>

<p>Lastly, rkward can manage your R packages (R package management is akin to that of a Linux distribution), and also your package sources. You can install or upgrade packages, and select where they&#8217;ll get installed to.</p>
<p><strong>Conclusions</strong></p>
<p>rkward is a nice frontend for the R programming language, which adds a GUI with the power of KDE to R. Unfortunately the program is still somewhat unstable (as a warning tells you when you run it) and its main developer currently has very little time to work on it. If you want to help, you can hop over to the <a title="rkward-devel" href="http://sourceforge.net/mailarchive/forum.php?forum_name=rkward-devel">rkward-devel mailing list</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2009/02/science-and-kde-rkward/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Published! (and it matters more)</title>
		<link>http://www.dennogumi.org/2009/01/published-and-it-matters-more</link>
		<comments>http://www.dennogumi.org/2009/01/published-and-it-matters-more#comments</comments>
		<pubDate>Tue, 06 Jan 2009 17:39:39 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[pathway analysis]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=489</guid>
		<description><![CDATA[Finally I can lift the curtain of silence and tell the reason why I&#8217;ve been very busy before Christmas: it all lies in the publication of a paper, &#8220;Using Pathway Signatures as Means of Identifying Similarities among Microarray Experiments&#8221;, which &#8230; <a href="http://www.dennogumi.org/2009/01/published-and-it-matters-more">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Finally I can lift the curtain of silence and tell the reason why I&#8217;ve been very busy before Christmas: it all lies in the publication of a paper, &#8220;Using Pathway Signatures as Means of Identifying Similarities among Microarray Experiments&#8221;, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0004128">which is finally out in this week&#8217;s issue of <em>PLoS ONE</em></a>. It&#8217;s different from <a href="http://www.dennogumi.org/2008/01/phd">the previous paper I mentioned</a> (which was not my first publication, either), for two main reasons:</p>
<ul>
<li>It&#8217;s a bioinformatics paper;</li>
<li>I am <strong>first author</strong> there.</li>
</ul>
<p>The second point is very important because it is usually more difficult for a person doing bioinformatics to end up as first author of a paper, since most of what we do is &#8220;something in the middle&#8221; like data analysis. Therefore, this paper is quite important for me. Also, it deals with an interest of mine, namely the analysis of biological networks using high-throughput platforms such as microarrays. Actually I&#8217;m also interested in network <em>reconstruction</em>, but I need to study far more than what I&#8217;m doing right now.</p>
<p>In any case, let&#8217;s hope this is the first of a (hopefully long) series!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2009/01/published-and-it-matters-more/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Commercial applications, public funding</title>
		<link>http://www.dennogumi.org/2008/06/commercial-applications-public-funding</link>
		<comments>http://www.dennogumi.org/2008/06/commercial-applications-public-funding#comments</comments>
		<pubDate>Fri, 27 Jun 2008 20:15:10 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=404</guid>
		<description><![CDATA[I wanted to write this earlier, but I couldn&#8217;t: I&#8217;m now in a hotel in Maastricht, Netherlands, and waiting to get back tomorrow. I&#8217;ve been attending the 4th NuGO hands-on advanced microarray data analysis course and I even wanted to &#8230; <a href="http://www.dennogumi.org/2008/06/commercial-applications-public-funding">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I wanted to write this earlier, but I couldn&#8217;t: I&#8217;m now in a hotel in Maastricht, Netherlands, and waiting to get back tomorrow. I&#8217;ve been attending the 4th <a title="The European Nutrigenomics Organization" href="http://www.nugo.org">NuGO</a> hands-on advanced microarray data analysis course and I even wanted to blog about it&#8230; but the hotel&#8217;s connection did not resolve <strong>any</strong> non-European web page until late today.</p>
<p><span id="more-404"></span></p>
<p>Recently it occurred to me that certain organizations, laboratory groups, etc. use commercial software when doing publicly funded research. I&#8217;m speaking about microarray experiments, since that is my field of work; other fields may be different. Personally I see the use of commercial software in microarray data analysis as reasonable only for groups that can&#8217;t afford a dedicated person for data analysis. As for the rest, I don&#8217;t think it&#8217;s a good idea, for a number of reasons:</p>
<ul>
<li>Commercial software may be polished and <em>shiny</em>, but for obvious reasons it always lags behind academically developed software;</li>
<li>Most of the time the <em>same</em> results can be obtained with free alternatives, for example normalization, differential expression and hierarchical clustering;</li>
<li>Most importantly, <a title="More information on the Wikipedia" href="http://en.wikipedia.org/wiki/Vendor_lockin">there is the issue of lock in.</a> What happens if you get a cut in your funds and you can&#8217;t pay your annual license anymore? You get a bunch of unusable data. And again, what if the company goes belly up? Again, you are screwed. This is even more true for web-based applications, where the data resides on a server that is away from you.</li>
</ul>
<p>Of course the latter point also applies to academic software that is neither free nor open source. That is why publishing software under open licenses is important.</p>
<p>It is also worth noting that sometimes the results of algorithm design, data workflows and the like end up in commercial applications. That may perhaps be healthy for business, but if those people got <strong>public</strong> funding, they should not be allowed to profit from citizens&#8217; tax money. If they got their funding by other means, they can do whatever they want in my view. I&#8217;m only concerned that government-funded research would then be used to restrict knowledge (only to paying customers) instead of spreading it.</p>
<p>Of course, it&#8217;s not like all scientific software is great, but if it is open source, at least someone can pick it up and improve it even if the original author is no longer around.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2008/06/commercial-applications-public-funding/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FOSS and research</title>
		<link>http://www.dennogumi.org/2008/05/foss-and-research</link>
		<comments>http://www.dennogumi.org/2008/05/foss-and-research#comments</comments>
		<pubDate>Sat, 10 May 2008 07:30:48 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[free software]]></category>
		<category><![CDATA[publish or perish]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=400</guid>
		<description><![CDATA[I&#8217;ve been wondering why FOSS is often compared to the academic world, when, at least in my limited experience, I see few people who grasp its concept in the world of research. On a quick look, developing FOSS in &#8230; <a href="http://www.dennogumi.org/2008/05/foss-and-research">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been wondering why FOSS is often compared to the academic world, when, at least in my limited experience, I see few people who grasp its concept in the world of research. On a quick look, developing FOSS in a research environment would be very good: not only would you get publicly available results when you publish, but you could also make sure that, in an extreme case, your application will be carried on by someone else should you not be able to continue development.</p>
<p>At least in the life sciences, it&#8217;s hard to see such a mentality. I can understand the publish or perish frenzy, but at the same time, don&#8217;t we all remember published and unmaintained software? For me, such an idea would be optimal. Once the paper is out, you can release your software (GPL would be best) and make sure someone will improve or maintain it. Of course you won&#8217;t be able to publish for each upgrade you do, but I would generally think of that as a bad policy, one made just to increase the publication count.</p>
<p>Does something like that happen with FOSS in other research areas?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2008/05/foss-and-research/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performance and R</title>
		<link>http://www.dennogumi.org/2008/04/performance-and-r</link>
		<comments>http://www.dennogumi.org/2008/04/performance-and-r#comments</comments>
		<pubDate>Sat, 05 Apr 2008 13:12:18 +0000</pubDate>
		<dc:creator>Einar</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[microarray]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.dennogumi.org/?p=390</guid>
		<description><![CDATA[I&#8217;m often wondering why people only resort to R when working with microarrays. I can understand that Bioconductor offers a plethora of different packages and that R&#8217;s statistical functions come in handy for many applications, but still, I think people &#8230; <a href="http://www.dennogumi.org/2008/04/performance-and-r">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m often wondering why people only resort to R when working with microarrays. I can understand that <a title="Bioconductor home page" href="http://www.bioconductor.org">Bioconductor</a> offers a plethora of different packages and that R&#8217;s statistical functions come in handy for many applications, but still, I think people underestimate the impact of performance.</p>
<p>R is not a fast language at all, it doesn&#8217;t parallelize well when using HPC (at least from the talks I&#8217;ve had with people studying the matter), and in general it is a memory and resource hog. For example, it takes much longer to perform RMA via R than with <a title="RMAExpress" href="http://rmaexpress.bmbolstad.com/">RMAExpress</a> (which is a C++ application); the latter also makes better use of memory. I can understand the complexity of some statistical procedures, but what about parsing GEO files?</p>
<p>The surprising aspect is that, aside from a few exceptions (like the aforementioned RMAExpress), no one has tried to write faster implementations of certain algorithms. I for one would welcome a non-R implementation of <abbr title="Significance Analysis of Microarrays">SAM</abbr> (the original implementation works in Excel&#8230; ugh) or similar algorithms. Otherwise we will be stuck with programs that are interesting, but way too memory hungry (<a title="AMDA: an R package for the automated microarray data analysis." href="http://www.ncbi.nlm.nih.gov/pubmed/16824223?ordinalpos=4&amp;itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum">AMDA</a> comes to mind).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dennogumi.org/2008/04/performance-and-r/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

