tag:blogger.com,1999:blog-75568134352242915792024-03-13T16:58:43.362+00:00One R Tip A Day"A big computer, a complex algorithm and a long time does not equal science." -- Robert GentlemanPaolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.comBlogger103125tag:blogger.com,1999:blog-7556813435224291579.post-80506086415867380312013-07-03T08:44:00.000+01:002013-07-03T08:44:14.776+01:00Summer ReadingGet your fresh copy of the R-Journal from <a href="http://journal.r-project.org/archive/2013-1/" target="_blank">here</a>.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com0tag:blogger.com,1999:blog-7556813435224291579.post-73155572149245722362013-04-03T13:01:00.000+01:002013-04-03T13:01:00.448+01:00R 3.0.0 is released!The new R 3.0.0 is out! You know the <a href="http://onertipaday.blogspot.it/search/label/upgrade">drill</a>! Get the source code from <a href="http://cran.r-project.org/src/base/R-3/" target="_blank">here</a>.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com0tag:blogger.com,1999:blog-7556813435224291579.post-77714759093946127312012-12-14T16:26:00.000+00:002012-12-14T16:26:34.243+00:00R Journal Volume 4/2, December 2012The 'Winter edition' of the R Journal is out! 
Get it from <a href="http://journal.r-project.org/archive/2012-2/RJournal_2012-2.pdf" target="_blank">here</a>.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com0tag:blogger.com,1999:blog-7556813435224291579.post-20105977650581887832012-12-03T14:41:00.000+00:002012-12-04T16:09:24.951+00:00Italian Bio R Day 2012 - Slides on Reproducible Research using R and BioconductorThanks to <a href="http://www.tecnoparco.org/" target="_blank">Parco Tecnologico Padano (PTP)</a>, I was invited to speak at the first <a href="http://www.tecnoparco.org/index.php?option=com_content&view=article&id=649:italian-bior-day-at-ptp&catid=94:news-bioinformatica&Itemid=166&lang=en" target="_blank">Italian Bio R Day</a> that was held in <a href="http://goo.gl/bqOA1" target="_blank">Lodi</a> on 30 November 2012. It was a nice opportunity to talk and listen about different aspects of R from practitioners with different backgrounds (epidemiology, chemometrics and bioinformatics).<br />
My <a href="https://github.com/onertipaday/ItalianBioRDay2012/tree/master/Slides" target="_blank">presentation</a> was about Reproducible Research in High-Throughput Biology using R and Bioconductor. The presentation was given in Italian, but the slides and the case study are in English. All the material was created using <a href="http://www.rstudio.org/" target="_blank">RStudio</a>, taking advantage of its amazing integration with both <a href="http://yihui.name/knitr/" target="_blank">knitr</a> and GitHub: knitr converts R Markdown to Markdown and Sweave/knitr sources to LaTeX, and <a href="http://johnmacfarlane.net/pandoc/" target="_blank">pandoc</a> converts Markdown to HTML5. The material is quite basic; nevertheless, I'd like to share it under the <a href="http://creativecommons.org/licenses/by-nc-sa/3.0/" target="_blank">Creative Commons Attribution-NonCommercial-Share Alike 3.0 License</a>. You can access everything from <a href="http://onertipaday.github.com/ItalianBioRDay2012/" target="_blank">here</a>. Feel free to fork it, highlight errors or plagiarism, suggest modifications, etc.: I'll be more than happy to fix bugs and give credit where it is due.<br />
Finally, I'd like to thank Andrea Pedretti for inviting me to this nice meeting, <a href="http://yihui.name/" target="_blank">Yihui Xie</a> for his awesome knitr package and Vince Buffalo for his inspiring <a href="http://vincebuffalo.org/2012/03/08/the-beauty-of-bioconductor.html" target="_blank">The Beauty of Bioconductor</a> blog post.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com2tag:blogger.com,1999:blog-7556813435224291579.post-64340144960382894092012-07-06T08:34:00.002+01:002012-07-06T08:34:59.282+01:00The R Journal Volume 4/1The 'Summer edition' of the R Journal is out! Get it from <a href="http://journal.r-project.org/archive/2012-1/RJournal_2012-1.pdf" target="_blank">here</a>.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com0tag:blogger.com,1999:blog-7556813435224291579.post-33867467365405925652011-12-19T09:35:00.002+00:002011-12-19T09:35:53.176+00:00Christmas Gift to the R Community: The R Journal!The R Journal Volume 3/2 is available!<br />
Get it from <a href="http://journal.r-project.org/current.html" target="_blank">here</a>.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com0tag:blogger.com,1999:blog-7556813435224291579.post-47477612135915874492011-12-05T13:12:00.000+00:002011-12-07T14:12:17.160+00:00The Art of R Programming - my two cents<div class="separator" style="clear: both; text-align: center;">
<a href="http://nostarch.com/artofr.htm" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;" target="_blank"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2K7U6dWgfgP_mmacDAxPJxWTqixUZnHRTt9VEF37CzOva3cSLbfDlgWKUysEl8oF0DqRCEizH_yimDy7j2mIHUZCsFMWtMdBs97xEDT-UOVF3_2-qQJkVA37jlFmdn5fqyWOGHC1F6PU/s320/R_cvr_front_small.png" width="241" /></a></div>
What makes this book different from other books about R is stated clearly by the author Norman Matloff in the introduction:<br />
<blockquote class="tr_bq">
<i>"This book is not a compendium of the myriad types of statistical methods that are available in the wonderful R package. It really is about programming and cover programming-related topics missing from most other books on R".</i></blockquote>
Most books about R present a gentle introduction to the language and then jump to practical applications. Across the 350 pages of this book, Norman Matloff instead guides the reader in developing the skills needed to write software properly, focusing on the characteristics and idiosyncrasies of the R language.<br />
<br />
In each of the first six chapters of the book the author covers a different R data type: vector, matrix, list, data.frame and factor. Starting from basic examples and progressing to more complex ones, each data type is properly introduced and used in the proper context. Furthermore, some extended examples are improved or re-implemented as new types are introduced, in order to show the expressivity of the language. The explanations of small details, such as the use of the <i>drop=FALSE</i> argument in matrix/data.frame subsetting or the <i>stringsAsFactors=FALSE</i> argument when building a data.frame, are the proverbial icing on the cake and can make your day-to-day workflow more productive.<br />
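To make those two details concrete, here is a minimal sketch of my own (not taken from the book); the matrix and the data.frame contents are made up for illustration. Note that since R 4.0.0 <i>stringsAsFactors</i> defaults to FALSE, so the explicit argument mostly matters on older versions.<br />

```r
# A small numeric matrix to subset (illustrative data)
m <- matrix(1:6, nrow = 2)

# Selecting a single row normally collapses the result to a plain vector ...
row_default <- m[1, ]
# ... while drop = FALSE keeps it as a 1 x 3 matrix
row_kept <- m[1, , drop = FALSE]

is.matrix(row_default)  # FALSE
dim(row_kept)           # 1 3

# stringsAsFactors = FALSE keeps character columns as character
# instead of converting them to factors
df <- data.frame(gene = c("BRCA1", "TP53"), score = c(0.9, 0.4),
                 stringsAsFactors = FALSE)
class(df$gene)          # "character"
```

Keeping the matrix shape is handy when the result is passed on to code that expects two dimensions, for example inside a loop over rows.<br />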
Chapters 7, 8 and 9 are the heart of The Art of R Programming, introducing the structures, idioms, peculiarities and idiosyncrasies of R as a programming language.<br />
Chapter 7 presents how the typical programming structures are implemented in R and how to use them correctly: control statements, functions, recursion, etc. are explained through clear and appropriate examples of increasing complexity and usefulness.<br />
Chapter 8, about doing math and simulation in R, is a more 'traditional' chapter describing the mathematical/statistical facilities built into R. Since the main selling point of R is its statistical capabilities, an introduction to their characteristics and use makes perfect sense.<br />
Chapter 9 covers S3 and S4, the two most commonly used paradigms of object-oriented programming (OOP) implemented in R. If you are going to start designing and developing R software in a proper and reusable form, this chapter will provide all the necessary information and a good collection of examples tailored to R's mathematical/statistical peculiarities.<br />
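For readers who have never seen the two systems side by side, the following is a small sketch of my own (not an example from the book). The class names <i>reading</i> and <i>Reading4</i> and the generic <i>describe</i> are hypothetical names chosen only for this illustration.<br />

```r
library(methods)  # S4 machinery; attached by default in interactive sessions

# S3: a class is just an attribute, and methods are found by naming convention
new_reading <- function(value, unit) {
  structure(list(value = value, unit = unit), class = "reading")
}
format.reading <- function(x, ...) paste(x$value, x$unit)

# S4: classes and generics are declared formally, and slots are type-checked
setClass("Reading4", representation(value = "numeric", unit = "character"))
setGeneric("describe", function(object) standardGeneric("describe"))
setMethod("describe", "Reading4", function(object) {
  paste(object@value, object@unit)
})

s3_text <- format(new_reading(14, "C"))
s4_text <- describe(new("Reading4", value = 14, unit = "C"))
```

Both calls build the same text, but only the S4 version refuses to create an object whose <i>value</i> slot is not numeric.<br />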
Chapter 10 is about I/O and provides all the necessary directions needed to parse data in R locally and from the internet.<br />
Chapter 11 is about string manipulation and is less technical than the previous chapters, presenting a sort of cheat-sheet collection of the most common functions to handle strings in R. The author covers the string capabilities built into base R but advises taking a look at Hadley Wickham's <a href="http://cran.r-project.org/web/packages/stringr/index.html" target="_blank">stringr package</a> for a more consistent handling of strings in R.<br />
Chapter 12 introduces graphics in R, providing a gentle overview of R's huge graphics capabilities, but it doesn't present an in-depth discussion. Fortunately there are a lot of other books (for example <a href="http://www.amazon.com/Graphics-Chapman-Hall-CRC/dp/158488486X" target="_blank">Paul Murrell's R Graphics</a>) dedicated to this subject, which is indeed one of R's strong points.<br />
Chapter 13, about debugging, is short but points out almost everything that is important to know about debugging R code; furthermore, it provides a broad view of debugging in general: the author, Norman Matloff, is also the co-author of <a href="http://shop.oreilly.com/product/9781593271749.do" target="_blank">The Art of Debugging with GDB and DDD</a>, and he clearly knows the matter of which he speaks.<br />
Chapter 14 covers strategies to handle the time/space trade-off in order to enhance the performance of R programs. In particular it explains the proper use of vectorization in order to speed up your code.<br />
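A tiny sketch of what vectorization means in practice (my own illustration, not code from the book; the function names <i>sq_loop</i> and <i>sq_vec</i> are made up). The loop version pre-allocates its result, which is the space side of the trade-off; the vectorized version hands the whole vector to a single call.<br />

```r
x <- runif(1e5)

# Loop version: one element at a time, with a pre-allocated result vector
sq_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}

# Vectorized version: a single call that operates on the whole vector
sq_vec <- function(x) x^2

# Both give the same result; compare the timings on your own machine
all.equal(sq_loop(x), sq_vec(x))
system.time(sq_loop(x))
system.time(sq_vec(x))
```

The exact timings depend on the machine and the R version, so they are not shown here.<br />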
Chapters 15 and 16 are a sort of follow-up to chapter 14: they explain how to enhance the performance of your code by integrating R with other languages, such as Python and C/C++ (Chapter 15), and by parallelizing your code (Chapter 16). Both chapters provide an introductory glance at these topics but present sufficient coverage to be useful. <br />
Conclusions:<br />
Is this book worth buying? The short answer is YES. If you are serious about learning R, both to analyze your data in the most appropriate and effective way (e.g. using the appropriate data type according to your specific task) and to develop software, The Art of R Programming will be beneficial to you.<br />
Caveats: given the particular approach and aim of this book, my advice is to buy it together with a more statistically oriented one, for example <a href="http://www.manning.com/kabacoff/" target="_blank">Rob Kabacoff's R in Action</a>, and one or two about graphics in R (e.g. <a href="http://www.amazon.com/R-Graph-Cookbook-Hrishi-Mittal/dp/1849513066/" target="_blank">Hrishi Mittal's R Graph Cookbook</a> or <a href="http://www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis/dp/0387981403" target="_blank">Hadley Wickham's ggplot2 book</a>).<br />
<br />
Disclaimer: <a href="http://nostarch.com/catalog.htm" target="_blank">No Starch Press</a> provided me a free copy for review.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com4tag:blogger.com,1999:blog-7556813435224291579.post-46797676318642971042011-11-16T08:31:00.001+00:002011-11-25T16:13:36.089+00:00Weather forecast and good development practicesInspired by <a href="http://papermashup.com/using-googles-weather-api/" target="_blank">this</a> tutorial, I thought it would be nice to have access to the weather forecast directly from the R command line, for example for a personalized start-up message such as the one below:<br />
<blockquote class="tr_bq">
Weather summary for Trieste, Friuli-Venezia Giulia:<br />
The weather in Trieste is clear. The temperature is currently 14°C (57°F). Humidity: 63%.</blockquote>
Fortunately, thanks to Duncan Temple Lang's always useful <a href="http://cran.r-project.org/web/packages/XML/index.html" target="_blank">XML</a> package (see <a href="http://www.omegahat.org/RSXML/" target="_blank">here</a> for a tutorial about XML programming under R), it is straightforward to write a few lines of R code to invoke the Google weather API for the location of interest, retrieve the XML file, parse it using the <a href="http://www.w3schools.com/xpath/default.asp" target="_blank">XPath</a> paradigm and get the required information:<br />
<br />
<pre class="brush: r">library(XML)  # provides xmlTreeParse(), xpathSApply() and xmlGetAttr()
address = "Trieste"
url = paste( "http://www.google.com/ig/api?weather=", URLencode(address), sep="" )
# Retrieve and parse the reply; print `xml` to take a look at the XML output
xml = xmlTreeParse(url, useInternalNodes=TRUE)
# Get the required information:
condition=xpathSApply(xml,"//xml_api_reply/weather/current_conditions/condition",xmlGetAttr,"data")
temp_c=xpathSApply(xml,"//xml_api_reply/weather/current_conditions/temp_c",xmlGetAttr,"data")
humidity=xpathSApply(xml,"//xml_api_reply/weather/current_conditions/humidity",xmlGetAttr,"data")
# sep="" avoids the stray spaces paste() would otherwise insert around each value
cat( paste("The weather in ", address, " is ", condition, ". The temperature is ", temp_c, "°C. Humidity is ", humidity, "%.", sep="") )
</pre>
<br />
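The XPath part can be tried without any network access. The sketch below parses a small, made-up XML string whose structure is only assumed from the XPath expressions above (it is not a real API reply), and wraps the repeated lookup in a hypothetical helper called <i>get_field</i>, which is not part of the XML package.<br />

```r
library(XML)

# Hypothetical sample reply, shaped like the XPath expressions above expect
reply <- '<xml_api_reply>
  <weather>
    <current_conditions>
      <condition data="Clear"/>
      <temp_c data="14"/>
      <humidity data="63"/>
    </current_conditions>
  </weather>
</xml_api_reply>'

doc <- xmlParse(reply, asText = TRUE)

# Small helper (an assumption of this sketch): fetch the "data" attribute
# of one child node of current_conditions
get_field <- function(doc, field) {
  path <- paste0("//xml_api_reply/weather/current_conditions/", field)
  xpathSApply(doc, path, xmlGetAttr, "data")
}

condition <- get_field(doc, "condition")
temp_c <- get_field(doc, "temp_c")
cat("The weather is ", condition, ". The temperature is ", temp_c, " C.\n", sep = "")
```

Factoring the path into one helper keeps the three lookups consistent if the document structure ever changes.<br />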
Some time ago I came to the conclusion that the best way to organize my R code is to create packages, even for basic tasks. I know it seems like too much effort for such a trivial task (and in the past it was), but fortunately, thanks to Hadley Wickham's <a href="http://cran.r-project.org/web/packages/devtools/index.html" target="_blank">devtools</a> package, package development has become a piece of cake (sort of)!<br />
<br />
Below I present the minimal workflow I used to create this simple package. For a proper introduction to package development using devtools take a look at this <a href="https://github.com/hadley/devtools/wiki" target="_blank">link</a>.<br />
<br />
First create the skeleton for the project using the package.skeleton() function:<br />
<pre class="brush: r">package.skeleton("pkg")</pre>
Read the './pkg/Read-and-delete-me' file, fill in the DESCRIPTION fields according to your needs and delete './pkg/Read-and-delete-me'.<br />
Now the devtools magic: <br />
<pre class="brush: r">library("devtools")
pkg <- as.package("pkg") # pkg is the directory containing the structure created using package.skeleton()</pre>
Create your functions and documentation following the <a href="http://roxygen.org/" target="_blank">roxygen</a> literate programming paradigm: basically, you write each function together with its documentation, using tags such as <i>@param, @example</i>, etc. in the preamble to indicate the different constituents of the function, and devtools will automagically create the functions' documentation (.Rd files).<br />
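As an illustration of that layout, here is a hypothetical function documented in the roxygen style. The function <i>celsius_to_fahrenheit</i> is made up for this sketch and is not part of RWeather; the tags shown (@param, @return, @examples, @export) are standard roxygen2 tags.<br />

```r
#' Convert a temperature from Celsius to Fahrenheit
#'
#' @param temp_c Numeric vector of temperatures in degrees Celsius.
#' @return Numeric vector of temperatures in degrees Fahrenheit.
#' @examples
#' celsius_to_fahrenheit(14)
#' @export
celsius_to_fahrenheit <- function(temp_c) {
  temp_c * 9 / 5 + 32
}
```

The comment block starting with #' is what gets turned into the corresponding .Rd file; the function body itself is ordinary R.<br />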
Then you test your code, try your examples, verify that your package passes the check without errors and warnings, build it and, if you like, you can ftp it directly to CRAN (disclaimer: I didn't check this feature)!<br />
<pre class="brush: r">load_all(pkg, reset=TRUE) # to reload the package without having to restart R
document(pkg) # to be used together with roxygen2 to create the corresponding Rd files
run_examples(pkg) # to check the examples for the different functions
devtools::check(pkg) # to verify whether your package raises errors or warnings
devtools::build(pkg) # to build the package bundle
install(pkg) # install your package
# release()
</pre>
<br />
Final consideration: the devtools package has significantly improved my day-to-day workflow, and I want to thank <a href="http://had.co.nz/" target="_blank">Hadley Wickham</a> for this and all the other valuable packages he has given to the R community! <br />
P.S. If you would like to install the RWeather package I created using devtools, you can do so by typing: <br />
<pre class="brush: r">install.packages("RWeather", repos="http://R-Forge.R-project.org")</pre>
or download the source code from <a href="https://r-forge.r-project.org/src/contrib/RWeather_0.1.tar.gz" target="_blank">here</a>.<br />
P.S.2 I'd like to thank Kay Cichini for <a href="http://thebiobucket.blogspot.com/2011/11/using-syntaxhighlighter-and-r-brush-in.html" target="_blank">this</a> post which explains how to set-up the syntax-highlighting for the R code on Blogger.<br />
<br />
Update: Thanks to the useful info I got from <a href="http://code.google.com/p/python-weather-api/" target="_blank">this</a> Python module, now RWeather can show weather information from <a href="http://developer.yahoo.com/weather/" target="_blank">Yahoo! Weather</a>, Google Weather and <a href="http://graphical.weather.gov/xml/" target="_blank">NOAA APIs</a>.<br />
From now the stable version of the package can be installed directly from CRAN:<br />
<pre class="brush: r">install.packages("RWeather")</pre>Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com0tag:blogger.com,1999:blog-7556813435224291579.post-28944312543053686082011-10-31T14:14:00.001+00:002011-10-31T21:25:38.477+00:00R 2.14.0 is released!<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
The new R 2.14.0 is out! Get the source code from <a href="http://cran.r-project.org/src/base/R-2">here</a>.</div>
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
Take a look at <a href="http://onertipaday.blogspot.com/search/label/upgrade">these</a> posts for some miscellaneous advice to make the upgrade easier.</div>
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
Also <a href="http://stackoverflow.com/questions/1401904/painless-way-to-install-a-new-version-of-r">this</a> thread on stackoverflow and <a href="http://www.r-statistics.com/2011/04/how-to-upgrade-r-on-windows-7/">this</a> post contributed by <a href="http://www.r-statistics.com/about/">Tal Galili</a> can be of some value to make the procedure less painful.</div>
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
Feel free to contribute with suggestions about how to upgrade your R installation.</div>Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com3tag:blogger.com,1999:blog-7556813435224291579.post-20913080311208437752011-07-27T16:30:00.001+01:002016-04-28T09:54:46.586+01:00Word Cloud in RA word cloud (or <a href="http://en.wikipedia.org/wiki/Tag_cloud">tag cloud</a>) can be a handy tool when you need to highlight the most commonly cited words in a text using a quick visualization. Of course, you can use one of the several on-line services, such as <a href="http://www.wordle.net/">wordle</a> or <a href="http://www.tagxedo.com/">tagxedo</a>, which are very feature-rich and have a nice GUI. Being an R enthusiast, I always wanted to produce this kind of image within R and now, thanks to Ian Fellows' recently released <a href="http://cran.r-project.org/web/packages/wordcloud/index.html">wordcloud</a> package, I finally can!<br />
In order to test the package I retrieved the titles of the XKCD web comics included in my <a href="http://cran.r-project.org/web/packages/RXKCD/">RXKCD</a> package and produced a word cloud based on the titles' word frequencies calculated using the powerful <a href="http://cran.r-project.org/web/packages/tm/index.html">tm</a> package for text mining (I know, it is like killing a fly with a bazooka!).<br />
<br />
<pre class="brush: r">library(RXKCD)
library(tm)
library(wordcloud)
library(RColorBrewer)
path <- system.file("xkcd", package = "RXKCD")
datafiles <- list.files(path)
xkcd.df <- read.csv(file.path(path, datafiles))
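# VectorSource treats each title as one document (recent tm versions require
# doc_id/text columns for DataframeSource)
xkcd.corpus <- Corpus(VectorSource(as.character(xkcd.df[, 3])))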
xkcd.corpus <- tm_map(xkcd.corpus, removePunctuation)
xkcd.corpus <- tm_map(xkcd.corpus, content_transformer(tolower))
xkcd.corpus <- tm_map(xkcd.corpus, function(x) removeWords(x, stopwords("english")))
tdm <- TermDocumentMatrix(xkcd.corpus)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
pal <- brewer.pal(9, "BuGn")
pal <- pal[-(1:2)]
png("wordcloud.png", width=1280,height=800)
wordcloud(d$word,d$freq, scale=c(8,.3),min.freq=2,max.words=100, random.order=TRUE, rot.per=.15, colors=pal, vfont=c("sans serif","plain"))
dev.off()</pre>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHbFm7rG62Kcs9dGRJHBEE2awK6a3WOxynJ4br6YLSZhbzNewmJm-hY2SiTrCwWv3J7fzGLH6zh0xXI2xhXk5QL1jJG0PLtceo1AxeT1XHfEip7Y8_taGnTE-vhVylH87kcbpvH90HhNU/s1600/wordcloud.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="377" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHbFm7rG62Kcs9dGRJHBEE2awK6a3WOxynJ4br6YLSZhbzNewmJm-hY2SiTrCwWv3J7fzGLH6zh0xXI2xhXk5QL1jJG0PLtceo1AxeT1XHfEip7Y8_taGnTE-vhVylH87kcbpvH90HhNU/s400/wordcloud.png" width="400" /></a></div>
<br />
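If you are curious about what the term-document matrix and <i>rowSums</i> steps boil down to, the word frequencies can be sketched in base R alone. This is only a rough stand-in for tm: the titles and the short stop-word list below are made up for illustration, and tm's <i>stopwords("english")</i> list is much longer.<br />

```r
# A handful of sample titles standing in for the real data
titles <- c("The Barrel", "Duty Calls", "The Duty of Care", "Barrel Roll")

# Lower-case, strip punctuation, and split on whitespace
words <- unlist(strsplit(gsub("[[:punct:]]", "", tolower(titles)), "[[:space:]]+"))

# Drop a few stop words (illustrative list only)
words <- words[!words %in% c("the", "of", "a", "an")]

# Count occurrences and sort, giving the same word/freq layout used for wordcloud()
freq <- sort(table(words), decreasing = TRUE)
d <- data.frame(word = names(freq), freq = as.integer(freq),
                stringsAsFactors = FALSE)
d
```

The resulting data.frame has one row per distinct word, which is exactly the shape passed to <i>wordcloud()</i> in the examples of this post.<br />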
As a second example, inspired by <a href="http://ekonometrics.blogspot.com/2011/04/painting-picture-of-statistical.html">this</a> post from the <a href="http://ekonometrics.blogspot.com/">eKonometrics</a> blog, I created a word cloud from the description of 3177 available R packages listed at <a href="http://cran.r-project.org/web/packages">http://cran.r-project.org/web/packages</a>.<br />
<pre class="brush: r">require(XML)
require(tm)
require(wordcloud)
require(RColorBrewer)
u = "http://cran.r-project.org/web/packages/available_packages_by_date.html"
t = readHTMLTable(u)[[1]]
ap.corpus <- Corpus(VectorSource(as.character(t[,3])))  # one document per package description
ap.corpus <- tm_map(ap.corpus, removePunctuation)
ap.corpus <- tm_map(ap.corpus, content_transformer(tolower))
ap.corpus <- tm_map(ap.corpus, function(x) removeWords(x, stopwords("english")))</pre>
<pre class="brush: r">ap.corpus <- Corpus(VectorSource(ap.corpus))
ap.tdm <- TermDocumentMatrix(ap.corpus)
ap.m <- as.matrix(ap.tdm)
ap.v <- sort(rowSums(ap.m),decreasing=TRUE)
ap.d <- data.frame(word = names(ap.v),freq=ap.v)
table(ap.d$freq)
pal2 <- brewer.pal(8,"Dark2")
png("wordcloud_packages.png", width=1280,height=800)
wordcloud(ap.d$word,ap.d$freq, scale=c(8,.2),min.freq=3,
max.words=Inf, random.order=FALSE, rot.per=.15, colors=pal2)
dev.off()</pre>
<div class="separator" style="clear: both; text-align: center;">
<span class="Apple-style-span"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiENhBTwPXQozLl51x79HCIv76PCNI-iaUICfoSKnBbmA97ORAbwSar_EXeUkxyppEyfV7arxy15jJUu8UYOUFwVp1bC1R2qNswlgxIYOQFIB1oCbsraFRMwoFQIm2xRVDHNeLv8jmKxqI/s1600/wordcloud_packages.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiENhBTwPXQozLl51x79HCIv76PCNI-iaUICfoSKnBbmA97ORAbwSar_EXeUkxyppEyfV7arxy15jJUu8UYOUFwVp1bC1R2qNswlgxIYOQFIB1oCbsraFRMwoFQIm2xRVDHNeLv8jmKxqI/s320/wordcloud_packages.png" width="320" /></a></span></div>
<br />
As a third example, thanks to Jim's comment, I take advantage of <a href="http://www.omegahat.org/" target="_blank">Duncan Temple Lang</a>'s RNYTimes package to access user-generated content on the NY Times and produce a word cloud of today's comments on articles.<br />
Caveat: in order to use the RNYTimes package you need an API key from The New York Times, which you can get by registering (free of charge) with The New York Times Developer Network <a href="http://developer.nytimes.com/" target="_blank">here</a>.<br />
<pre class="brush: r">require(XML)
require(tm)
require(wordcloud)
require(RColorBrewer)
install.packages("RNYTimes", repos = "http://www.omegahat.org/R", type = "source")  # only needed once
require(RNYTimes)
my.key <- "your API key here"
what= paste("by-date", format(Sys.time(), "%Y-%m-%d"),sep="/")
# what="recent"
recent.news <- community(what=what, key=my.key)
pagetree <- htmlTreeParse(recent.news, error=function(...){}, useInternalNodes = TRUE)
x <- xpathSApply(pagetree, "//*/body", xmlValue)
# do some clean up with regular expressions
x <- unlist(strsplit(x, "\n"))
x <- gsub("\t","",x)
x <- sub("^[[:space:]]*(.*?)[[:space:]]*$", "\\1", x, perl=TRUE)
x <- x[!(x %in% c("", "|"))]
ap.corpus <- Corpus(VectorSource(as.character(x)))  # one document per cleaned line
ap.corpus <- tm_map(ap.corpus, removePunctuation)
ap.corpus <- tm_map(ap.corpus, content_transformer(tolower))
ap.corpus <- tm_map(ap.corpus, function(x) removeWords(x, stopwords("english")))
ap.tdm <- TermDocumentMatrix(ap.corpus)
ap.m <- as.matrix(ap.tdm)
ap.v <- sort(rowSums(ap.m),decreasing=TRUE)
ap.d <- data.frame(word = names(ap.v),freq=ap.v)
table(ap.d$freq)
pal2 <- brewer.pal(8,"Dark2")
png("wordcloud_NewYorkTimes_Community.png", width=1280,height=800)
wordcloud(ap.d$word,ap.d$freq, scale=c(8,.2),min.freq=2,
max.words=Inf, random.order=FALSE, rot.per=.15, colors=pal2)
dev.off()</pre>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-Djwtj6JF_UA/Tv7aN-3E5MI/AAAAAAAABBc/BcogkgjJpEY/s1600/wordcloud_NewYorkTimes_Community.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="https://1.bp.blogspot.com/-Djwtj6JF_UA/Tv7aN-3E5MI/AAAAAAAABBc/BcogkgjJpEY/s320/wordcloud_NewYorkTimes_Community.png" width="320" /></a></div>
<br />
<br />Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com48tag:blogger.com,1999:blog-7556813435224291579.post-89047737469412906242011-07-14T08:33:00.002+01:002011-12-19T15:03:55.650+00:00R meets XKCDBeing a big fan of <a href="http://xkcd.com/">XKCD</a> and, of course, of the R programming language, I thought that a package for displaying my favorite strips would be something (useless) but cool!<br />
So, mimicking the approach (and the code) of the <a href="http://cran.r-project.org/web/packages/fortunes/index.html">fortunes</a> package (thanks Achim Zeileis!), I created a simple package (named RXKCD) which allows the user to display their favorite XKCD strip by selecting a specific number, picking one at random, or simply displaying the current strip.<br />
You can install the package using:<br />
<pre class="brush: r">if (!require('RJSONIO')) install.packages('RJSONIO', repos = 'http://cran.r-project.org')
if (!require('png')) install.packages('png', repos = 'http://cran.r-project.org')
if (!require('ReadImages')) install.packages('ReadImages', repos = 'http://cran.r-project.org')
install.packages("RXKCD", repos="http://R-Forge.R-project.org")</pre>
And you can use it by typing:<br />
<pre class="brush: r">library(RXKCD)
searchXKCD("someone is wrong")
getXKCD(386)</pre>
<span class="Apple-style-span" style="font-family: inherit;">Below is the result (<a href="http://xkcd.com/license.html">xkcd license</a>):</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-5ye_CNxjj8HNioWiMpmxfLVE4AP5asMbYeuNX5ZmycBUmx9GxsEk5nOtzlYnL_ERYR4zsLzgxI5svD4LhK6QwRn8HOgAK12sdxzxiWPdz-kqznchvuT8f7eRPX7xZpyWelM2LXy6kq0/s1600/Duty+Calls.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-5ye_CNxjj8HNioWiMpmxfLVE4AP5asMbYeuNX5ZmycBUmx9GxsEk5nOtzlYnL_ERYR4zsLzgxI5svD4LhK6QwRn8HOgAK12sdxzxiWPdz-kqznchvuT8f7eRPX7xZpyWelM2LXy6kq0/s320/Duty+Calls.png" width="290" /></a></div>
<span class="Apple-style-span" style="font-family: inherit;"><br />
</span><br />
<b>Update</b>: The updated version of the <a href="http://cran.r-project.org/web/packages/RXKCD/index.html" target="_blank">package</a>, which is available from CRAN (just type <span style="font-family: 'Courier New',Courier,monospace;">install.packages("RXKCD")</span>), allows the user to save the xkcd metadata database in a local directory (<i>.Rconfig</i>) and update it in order to have access to the latest XKCD info: see <span style="font-family: 'Courier New',Courier,monospace;">?saveConfig</span> and <span style="font-family: 'Courier New',Courier,monospace;">?updateConfig</span>.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com8tag:blogger.com,1999:blog-7556813435224291579.post-8430421802646505042011-06-24T13:25:00.001+01:002011-06-24T13:25:20.541+01:00Installing Multiple Version of R in parallel on the same machine - Mac OS XIn a few days I'm going to attend a <a href="http://users.unimi.it/marray/2011/">Bioconductor Course</a>; I was asked to install a development version of R (plus ad hoc Bioconductor packages) on my MacBook (Mac OS X 10.5.8). In order to keep my old R installation (2.13) alongside the new one (2.14), I decided to use the RSwitch app (which you can download from <a href="http://r.research.att.com/#other">here</a>) and follow the instructions you can read <a href="http://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#How-can-R-for-Mac-OS-X-be-obtained-and-installed_003f">here</a>.<br />
In practical terms, you type the following commands in Terminal:<br />
<br />
<code>sudo pkgutil --forget org.r-project.R.Leopard.fw.pkg<br />
sudo pkgutil --forget org.r-project.R.Leopard.GUI.pkg<br />
sudo pkgutil --forget org.r-project.R.Leopard.GUI64.pkg</code><br />
<br />
You install the alternative version of R (for example, following the procedure described <a href="http://onertipaday.blogspot.com/search/label/mac%20os%20x">here</a>) and then you can switch between the different versions using the RSwitch GUI (see the screenshot below). So easy!<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3Hyb6Lm73xGJcGNP3qFvXm4lhH2HAV7M_L-2Mt6nFff5L3pWvGqiy3kyANoh2UjUuG9M5Rx3WQWORKpusaDEwdUDvWfo9OFmS7N3oeVspC3OiE-5-pg7_ueVoGXP9tkb9EoOanFJgO9Y/s1600/RSwitch.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3Hyb6Lm73xGJcGNP3qFvXm4lhH2HAV7M_L-2Mt6nFff5L3pWvGqiy3kyANoh2UjUuG9M5Rx3WQWORKpusaDEwdUDvWfo9OFmS7N3oeVspC3OiE-5-pg7_ueVoGXP9tkb9EoOanFJgO9Y/s320/RSwitch.png" width="320" /></a></div><br />
<br />
<br />
Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com0tag:blogger.com,1999:blog-7556813435224291579.post-33830841282932711562011-04-14T12:41:00.001+01:002011-10-31T21:26:02.637+00:00R 2.13.0 is released!<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
The new R 2.13.0 is out! Get the source code from <a href="http://cran.r-project.org/src/base/R-2">here</a>.</div>
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
Take a look at <a href="http://onertipaday.blogspot.com/search/label/upgrade">these</a> posts for some miscellaneous advice to make the upgrade easier.</div>
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
Also <a href="http://stackoverflow.com/questions/1401904/painless-way-to-install-a-new-version-of-r">this</a> thread on stackoverflow and <a href="http://www.r-statistics.com/2011/04/how-to-upgrade-r-on-windows-7/">this</a> post contributed by <a href="http://www.r-statistics.com/about/">Tal Galili</a> can be of some value to make the procedure less painful.</div>
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
Feel free to contribute with suggestions about how to upgrade your R installation.</div>Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com0tag:blogger.com,1999:blog-7556813435224291579.post-64768195015047948032011-02-02T08:52:00.000+00:002011-02-02T08:52:28.740+00:00Plotting images on a grid using R or PythonA thread depicting how to insert a png image in a plot, thanks to <a href="http://stackoverflow.com/">Stackoverflow</a>: <a href="http://stackoverflow.com/questions/4860417/plotting-images-on-a-grid">plotting-images-on-a-grid</a>. <br />
A very basic tip, still useful to someone.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com6tag:blogger.com,1999:blog-7556813435224291579.post-91283400632559913952010-12-31T11:43:00.000+00:002010-12-31T11:43:17.994+00:00R Journal 2/2The last gift of 2010: R Journal 2/2 is out! Get it from <a href="http://journal.r-project.org/archive/2010-2/RJournal_2010-2.pdf">here</a>.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com2tag:blogger.com,1999:blog-7556813435224291579.post-81517704154656944512010-11-08T09:41:00.004+00:002010-12-11T15:32:13.691+00:00A R wrapper for Google Prediction APISince I got access to both Google Storage for Developers and the Google Prediction API (more details <a href="http://code.google.com/apis/storage/">here</a> and <a href="http://code.google.com/apis/predict/">here</a>), I decided to create a simple wrapper (just 4 basic functions so far) to be able to play with the Google Prediction API from R.<br />
<a href="https://github.com/onertipaday/predictionapirwrapper">Here</a> you can find the GitHub repository for the project, and below are a few lines of code reproducing an <a href="http://code.google.com/apis/predict/docs/getting-started.html#start">example</a> from the Google Prediction API website.<br />
<br />
Download the source code from <a href="https://github.com/onertipaday/predictionapirwrapper/archives/master">here</a>.<br />
Either source the functions contained in the R directory or install the package by typing (from the command line in a Unix-like environment):<br />
R CMD INSTALL predictionapirwrapper_1.0.tar.gz<br />
# start R and type (code highlighting thanks to Revolution Analytics <a href="http://www.inside-r.org/pretty-r/tool">Pretty R syntax highlighter</a>):<br />
<div style="overflow: auto;"><div class="geshifilter"><pre class="r geshifilter-R" style="font-family: monospace;"><span style="color: #99ccff;">library</span><span style="color: #009900;">(</span>predictionapirwrapper<span style="color: #009900;">)</span>
<span style="color: #666666; font-style: italic;">## The first stage of using the API is to acquire an authorization token. This can be done via this command:</span>
token <- GetAuthToken<span style="color: #009900;">(</span>email=<span style="color: #99ccff;">"user@gmail.com"</span><span style="color: #339933;">,</span> passwd=<span style="color: #99ccff;">"mypassword"</span><span style="color: #009900;">)</span>
<span style="color: #666666; font-style: italic;">## This command begins training on data that has been previously uploaded to Google Storage.</span>
GoogleTrain<span style="color: #009900;">(</span>auth_token=token$Auth<span style="color: #339933;">,</span> mybucket=<span style="color: #99ccff;">"data_languages"</span><span style="color: #339933;">,</span> mydata=<span style="color: #99ccff;">"language_id.txt"</span><span style="color: #009900;">)</span>
<span style="color: #666666; font-style: italic;">## Once training has started, this command checks the status of the training job and gets meta-information on the model (if available).</span>
GoogleTrainCheck<span style="color: #009900;">(</span>auth_token=token$Auth<span style="color: #339933;">,</span> mybucket=<span style="color: #99ccff;">"data_languages"</span><span style="color: #339933;">,</span> mydata=<span style="color: #99ccff;">"language_id.txt"</span><span style="color: #009900;">)</span>
<span style="color: #666666; font-style: italic;">## When training has finished, this command issues a request for a new prediction from the model. </span>
GooglePredict<span style="color: #009900;">(</span>auth_token=token$Auth<span style="color: #339933;">,</span> mybucket=<span style="color: #99ccff;">"data_languages"</span><span style="color: #339933;">,</span> mydata=<span style="color: #99ccff;">"language_id.txt"</span><span style="color: #339933;">,</span> myinput=<span style="color: #99ccff;">"La idioma mas fina"</span><span style="color: #009900;">)</span></pre></div></div><br />
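As an alternative to <code>R CMD INSTALL</code>, the downloaded tarball can also be installed from within an R session. The following is only a sketch: the helper name <code>install_local_tarball</code> is made up for illustration, and the file path assumes the tarball sits in the current working directory.

```r
# Hypothetical local path to the downloaded tarball; adjust to where you saved it.
pkg_file <- "predictionapirwrapper_1.0.tar.gz"

# Illustrative helper (not part of the wrapper package):
# installs a source tarball from disk, or reports that it is missing.
install_local_tarball <- function(path) {
  if (!file.exists(path)) {
    message("Tarball not found: ", path)
    return(invisible(FALSE))
  }
  # repos = NULL tells install.packages() to treat 'path' as a local file
  install.packages(path, repos = NULL, type = "source")
  invisible(TRUE)
}

install_local_tarball(pkg_file)
```

If the file is not where the path points, the helper only prints a message and returns <code>FALSE</code>, so it is safe to run before the download has finished.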
All comments, corrections, and alternative code are more than welcome!<br />
<br />
Update: a more complete and functional alternative can be found <a href="https://code.google.com/p/google-prediction-api-r-client/">here</a>.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com2tag:blogger.com,1999:blog-7556813435224291579.post-11221681989611900522010-10-15T12:03:00.000+01:002011-10-31T21:26:15.740+00:00R 2.12.0 is released!The new R 2.12.0 is out! Get the source code from <a href="http://cran.r-project.org/src/base/R-2">here</a>.<br />
Take a look at <a href="http://onertipaday.blogspot.com/search/label/upgrade">these</a> posts for some miscellaneous advice on making the upgrade easier.<br />
Also <a href="http://stackoverflow.com/questions/1401904/painless-way-to-install-a-new-version-of-r">this</a> thread on stackoverflow can be of some value.<br />
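One common way to ease an upgrade is to record which packages are installed before upgrading and then reinstall whatever is missing afterwards. The sketch below illustrates that idea; the function names <code>save_package_list</code> and <code>missing_packages</code> and the file location are assumptions made for this example, not something taken from the linked posts.

```r
# Run BEFORE upgrading: save the names of all installed packages.
save_package_list <- function(path) {
  pkgs <- rownames(installed.packages())
  saveRDS(pkgs, path)
  invisible(pkgs)
}

# Run AFTER upgrading: which of the saved packages are absent from the new library?
missing_packages <- function(path) {
  setdiff(readRDS(path), rownames(installed.packages()))
}

# Hypothetical location; tempdir() is used only so the example runs anywhere.
# In practice choose a permanent path that survives the upgrade.
list_file <- file.path(tempdir(), "installed_pkgs.rds")

save_package_list(list_file)
to_install <- missing_packages(list_file)

# Reinstall only what is missing (nothing, if run within the same installation).
if (length(to_install) > 0) install.packages(to_install)
```

Storing only the package names keeps the saved file independent of library paths, which usually change between R versions.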
Feel free to contribute with suggestions about how to upgrade your R installation.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com5tag:blogger.com,1999:blog-7556813435224291579.post-13055531667385062402010-07-21T13:37:00.000+01:002010-07-21T13:37:19.656+01:00R Cheat Sheets and more<a href="http://devcheatsheet.com/tag/r/">Here</a> you can find a collection of cheat sheets useful to R developers.<br />
Visit the devcheatsheet <a href="http://devcheatsheet.com/">homepage</a> to inspect cheat sheets and quick reference card for other programming languages and applications.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com0tag:blogger.com,1999:blog-7556813435224291579.post-31056253022304951382010-06-30T17:00:00.000+01:002010-06-30T17:00:48.297+01:00R Journal 2/1R Journal 2/1 is out! Grab it from <a href="http://journal.r-project.org/index.html">here</a>.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com0tag:blogger.com,1999:blog-7556813435224291579.post-61332625010559007822010-04-22T13:07:00.002+01:002011-10-31T21:26:29.174+00:00R 2.11.0 is released!The new R 2.11.0 is out! Get it from <a href="http://cran.r-project.org/src/base/R-2/R-2.11.0.tar.gz">here</a>.<br />
Take a look at <a href="http://onertipaday.blogspot.com/search/label/upgrade">these</a> posts for some miscellaneous advice on making the upgrade easier.<br />
Also <a href="http://stackoverflow.com/questions/1401904/painless-way-to-install-a-new-version-of-r">this</a> thread on stackoverflow can be of some value.<br />
Feel free to contribute with suggestions about how to upgrade your R installation.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com4tag:blogger.com,1999:blog-7556813435224291579.post-70627224829772676872010-03-19T11:35:00.001+00:002011-11-21T08:55:08.091+00:00Balloon plot using ggplot2Following <a href="http://www.talgalili.com/">Tal Galili</a>'s example and using part of his code, I want to reproduce the balloon plot you can see <a href="http://www.informationisbeautiful.net/play/snake-oil-supplements/">here</a> using R and the excellent <a href="http://cran.r-project.org/web/packages/ggplot2/index.html">ggplot2</a> package by <a href="http://had.co.nz/">Hadley Wickham</a>.<br />
<br />
<pre class="brush: r">### I retrieve the data from the Google document you can find here, using Tal Galili's code:
## I slightly modified Tal's code to include the popularity stats:
supplement.popularity <- supplements.data[ss,7]
supplements.df <- na.omit(data.frame(supplement.name, supplement.benefits, supplement.popularity, supplement.score)) ## remove rows containing NAs
colnames(supplements.df) <- c("name", "benefits", "popularity", "score")
## For the sake of simplicity I select only the cardio metacondition.
## Filter on the data frame's own column so that rows dropped by na.omit() cannot misalign the index.
cardio <- supplements.df[supplements.df$benefits == "cardio", -2]</pre>
<pre class="brush: r">## For reproducibility I add the cardio data.frame so you can use it right away
cardio <- read.table(tc <-textConnection(
" name popularity score
2 'arginine' 1.080 3
10 'vitamin b3' 0.201 3
15 'omega 3' 4.000 3
22 'hawthorn' 0.442 4
27 'red yeast rice' 0.264 4
29 'vitamin d' 6.700 4
31 'omega 6' 2.000 4
35 'green tea' 26.100 5
37 'olive leaf' 0.224 5
41 'fish oil' 4.000 6
43 'red yeast rice' 0.264 6")); close(tc)
cardio$name <- gsub(" ", "\n", cardio$name) #substitute ' ' with '\n' in the names</pre>
<pre class="brush: r">library(ggplot2)
myTheme <- function(base_size = 10) {
structure(list(
panel.background = theme_rect(size = 1, colour = "lightgray"),
panel.grid.major = theme_blank(),
panel.grid.minor = theme_blank(),
axis.line = theme_blank(),
axis.text.x = theme_blank(),
axis.ticks = theme_blank(),
strip.background = theme_blank(),
strip.text.y = theme_blank(),
legend.background = theme_blank(),
legend.key = theme_blank(),
legend.key.size = unit(1.2, "lines"),
legend.title = theme_text(size = 8, face = "bold", hjust = 0),
legend.position = "right"
), class = "options")
}</pre>
<pre class="brush: r">s <- ggplot(cardio, aes(name, score)) + xlab(NULL) + ylab(NULL) + myTheme()
s <- s + geom_point( aes(size=popularity, colour=score, fill=score), legend=TRUE) +
scale_y_continuous( breaks=as.numeric(levels(factor(cardio$score))), labels=c("Conflicting", "Promising", "Good", "Strong") ) +
scale_area( breaks=c(min(cardio$popularity),mean(cardio$popularity),max(cardio$popularity)), to=c(4,60) ) +
geom_text(aes(y=cardio$score, label=cardio$name, size=cardio$popularity/90), legend=FALSE)
#pdf("cardio.pdf",height=8,width=12);s;dev.off()
png("cardio.png",height=700,width=1000);s;dev.off()</pre>
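The theme and scale helpers used above (<code>theme_rect</code>, <code>theme_blank</code>, <code>theme_text</code>, <code>scale_area</code> and the <code>legend</code> argument) come from the ggplot2 version available when this post was written and are no longer present in current releases. What follows is a rough sketch of how the same plot might be written today. It assumes a recent ggplot2, rebuilds the <code>cardio</code> data with <code>data.frame()</code>, and simplifies the labels to a constant text size, so the output will not match the original image exactly. The name <code>balloon_theme</code> is introduced here for illustration.

```r
library(ggplot2)

# Same values as the cardio data.frame above, rebuilt so the block is self-contained.
cardio <- data.frame(
  name = c("arginine", "vitamin b3", "omega 3", "hawthorn", "red yeast rice",
           "vitamin d", "omega 6", "green tea", "olive leaf", "fish oil",
           "red yeast rice"),
  popularity = c(1.080, 0.201, 4.000, 0.442, 0.264, 6.700, 2.000, 26.100,
                 0.224, 4.000, 0.264),
  score = c(3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6)
)
cardio$name <- gsub(" ", "\n", cardio$name)  # one word per line in the labels

# Rough equivalent of myTheme() using the current theme()/element_*() API.
balloon_theme <- theme(
  panel.background  = element_rect(fill = NA, colour = "lightgray"),
  panel.grid.major  = element_blank(),
  panel.grid.minor  = element_blank(),
  axis.line         = element_blank(),
  axis.text.x       = element_blank(),
  axis.ticks        = element_blank(),
  legend.background = element_blank(),
  legend.key        = element_blank(),
  legend.key.size   = grid::unit(1.2, "lines"),
  legend.title      = element_text(size = 8, face = "bold", hjust = 0),
  legend.position   = "right"
)

s <- ggplot(cardio, aes(name, score)) +
  # balloon area encodes popularity, colour encodes the evidence score
  geom_point(aes(size = popularity, colour = score)) +
  # simplification: constant label size instead of popularity-scaled text
  geom_text(aes(label = name), size = 3) +
  scale_y_continuous(breaks = 3:6,
                     labels = c("Conflicting", "Promising", "Good", "Strong")) +
  scale_size(range = c(4, 60),
             breaks = c(min(cardio$popularity), mean(cardio$popularity),
                        max(cardio$popularity))) +
  labs(x = NULL, y = NULL) +
  balloon_theme

# To write the plot to disk, one option is:
# ggsave("cardio.png", s, width = 10, height = 7)
```

Using <code>scale_size()</code> keeps the balloon area (rather than the radius) proportional to popularity, which is what <code>scale_area()</code> did in the original code.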
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgM6qm7ivXkSEAFqvD4XldMeEjVBMHDzmAFusC00r17p3SBM_Y6BrJoCVYuD2Jk4e3HgVB9xdYMoZlWlxQhhr0sSOE49j0bi7FsEzy2Nm9GHUQvgOsePPHebfcliBAVsJ-H181liiiUiGU/s1600-h/cardio.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5450309484623327826" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgM6qm7ivXkSEAFqvD4XldMeEjVBMHDzmAFusC00r17p3SBM_Y6BrJoCVYuD2Jk4e3HgVB9xdYMoZlWlxQhhr0sSOE49j0bi7FsEzy2Nm9GHUQvgOsePPHebfcliBAVsJ-H181liiiUiGU/s320/cardio.png" style="cursor: hand; cursor: pointer; display: block; height: 224px; margin: 0px auto 10px; text-align: center; width: 320px;" /></a>Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com2tag:blogger.com,1999:blog-7556813435224291579.post-64766224745290867252010-03-07T17:43:00.000+00:002010-03-07T17:43:10.263+00:00One R Tip A Day meets Tecnica ArcanaFor italian speaking people only (sorry!).<br />
<br />
Carlo, the curator of the excellent technology podcast <a href="http://www.tecnicaarcana.com/"><b>Tecnica Arcana</b></a>, interviewed me (in Italian) about my profession and about R. <a href="http://www.tecnicaarcana.com/2010/ta-039-linguaggio-di-programmazione-r-e-bioinformatica/">Here</a> you can download the interview in mp3 format.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com3tag:blogger.com,1999:blog-7556813435224291579.post-55135224382181581802010-01-07T16:19:00.000+00:002010-01-07T16:19:10.482+00:00Scatter plot with 4 axes labels and gridRavi, from <a href="http://blog.revolution-computing.com/2010/01/r-package-growth.html">this</a> post (via Revolutions blog), wanted to check the code that produces the left panel of Figure 3 in <a href="http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Fox.pdf">this</a> article from the current issue of the <a href="http://journal.r-project.org/archive/2009-2/2009-2_index.html">R Journal</a>. Below is my attempt to reproduce the plot:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_zct02J1FROM/S0YG0tq6E6I/AAAAAAAAAak/uC6uWHbsA-4/s1600-h/CRAN_packages.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="http://1.bp.blogspot.com/_zct02J1FROM/S0YG0tq6E6I/AAAAAAAAAak/uC6uWHbsA-4/s320/CRAN_packages.png" width="320" /></a><br />
</div><br />
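The red line in the figure is a straight-line fit to the log of the package counts. As a small side calculation, the same kind of fit can be used to estimate how quickly the number of CRAN packages doubles. This is only a sketch: it reuses the counts and release dates from the code in this post, measures time in years since the first date, and introduces the names <code>years</code>, <code>slope</code>, <code>doubling_years</code> and <code>annual_growth</code> for illustration.

```r
# Package counts and release dates, as used in the plotting code of this post.
pckg.num <- c(110, 129, 162, 219, 273, 357, 406, 548, 647, 739,
              911, 1000, 1300, 1427, 1614, 1952)
rv.dates <- as.Date(c("2001-06-21", "2001-12-17", "2002-06-12", "2003-05-27",
                      "2003-11-16", "2004-06-05", "2004-10-12", "2005-06-18",
                      "2005-12-16", "2006-05-31", "2006-12-12", "2007-04-12",
                      "2007-11-16", "2008-03-18", "2008-10-18", "2009-09-17"))

# Time in (approximate) years since the first release in the series.
years <- as.numeric(rv.dates - rv.dates[1]) / 365.25

# Log-linear model: log10(count) = intercept + slope * years
fit   <- lm(log10(pckg.num) ~ years)
slope <- unname(coef(fit)["years"])

# If the count grows by a factor of 10^slope per year, it doubles every log10(2) / slope years.
doubling_years <- log10(2) / slope
annual_growth  <- 10^slope - 1   # fractional growth per year

cat(sprintf("Estimated doubling time: %.2f years\n", doubling_years))
cat(sprintf("Estimated annual growth: %.1f%%\n", 100 * annual_growth))
```

Working in years instead of raw POSIXct seconds changes only the units of the slope, not the fitted line.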
<code>rv <- seq(1.3, 2.9, .1)<br />
rv <- rv[-grep("1.6", rv)] # remove R version 1.6<br />
pckg.num <- c(110,129,162,219,273,357,406,548,647,739,911,1000,1300,1427,1614,1952)<br />
rv.dates <- c("2001-6-21", "2001-12-17","2002-06-12","2003-05-27",<br />
"2003-11-16","2004-06-05","2004-10-12","2005-06-18","2005-12-16", "2006-05-31",<br />
"2006-12-12","2007-04-12","2007-11-16","2008-03-18","2008-10-18","2009-09-17")<br />
pckg.fit <- lm(pckg.num~rv)<br />
png("CRAN_packages.png")<br />
par(mar=c(7, 5, 5, 3), las=2)<br />
plot(as.POSIXct(rv.dates), pckg.num, xlab="",ylab="",col="red", log="y", pch=19, axes=F)<br />
axis.POSIXct(1, 1:16, rv.dates, format="%Y-%m-%d")<br />
mtext("Date", side=1, line=5, las=1)<br />
axis(2, at=c(100,200,300,400,500,600,800,1000,1200,1500,2000))<br />
mtext("Number of CRAN Packages", side=2, line=3, las=3)<br />
axis.POSIXct(3, rv.dates, rv.dates, labels=as.character(rv))<br />
mtext("R Version", side=3, line=3, las=1)<br />
axis(4, pckg.num)<br />
abline(v=as.POSIXct(rv.dates), col="lightgray", lty="dashed")<br />
abline(h=pckg.num, col="lightgray", lty="dashed")<br />
box()<br />
abline(lm(log10(pckg.num)~as.POSIXct(rv.dates)), col="red")<br />
dev.off()<br />
</code>Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com2tag:blogger.com,1999:blog-7556813435224291579.post-23386037863644373752010-01-05T12:46:00.000+00:002010-01-05T12:46:09.195+00:00R Journal 1/2R Journal 1/2 is out! Grab it from <a href="http://journal.r-project.org/archive/2009-2/RJournal_2009-2.pdf">here</a>.Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com0tag:blogger.com,1999:blog-7556813435224291579.post-82471041823236815002009-12-12T16:21:00.002+00:002010-01-05T12:46:44.703+00:00A central hub for R bloggersI would like to suggest to my readers to take a look and bookmark a new blog named <a href="http://www.r-bloggers.com/">R-bloggers</a> which aims to be "a central hub of content collected from bloggers who write about R".<br />It seems a nice idea to me to have a centralized source of information for the R blogger community.<br />Good Luck, Tal!Paolohttp://www.blogger.com/profile/01969817827028660433noreply@blogger.com0