<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" gd:etag="W/&quot;C0cEQXc8eyp7ImA9WhRUFUQ.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035</id><updated>2012-01-26T09:16:40.973-06:00</updated><category term="Policy" /><category term="Stata" /><category term="1000 genomes" /><category term="Recommended Reading" /><category term="Twitter" /><category term="SQL" /><category term="Visualization" /><category term="Statistics" /><category term="Machine Learning" /><category term="Noteworthy blogs" /><category term="Pathways" /><category term="ggplot2" /><category term="Imputation" /><category term="Perl" /><category term="Sequencing" /><category term="Search" /><category term="Tutorials" /><category term="Announcements" /><category term="PubMed" /><category term="Productivity" /><category term="RSS" /><category term="GWAS" /><category term="Journal club" /><category term="Linux" /><category term="Clustering" /><category term="Software" /><category term="Writing" /><category term="Web Apps" /><category term="Ethics" /><category term="PLINK" /><category term="News" /><category term="Bioinformatics" /><category term="R" /><title>Getting Genetics Done</title><subtitle type="html">Software, tips, &amp;amp; productivity hacks for getting things done in genetics research</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/" /><link rel="next" type="application/atom+xml" 
href="http://www.blogger.com/feeds/6232819486261696035/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>297</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/GettingGeneticsDone" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="gettinggeneticsdone" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><link rel="license" type="text/html" href="http://creativecommons.org/licenses/by-sa/3.0/" /><feedburner:emailServiceId xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">GettingGeneticsDone</feedburner:emailServiceId><feedburner:feedburnerHostname xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://feedburner.google.com</feedburner:feedburnerHostname><entry gd:etag="W/&quot;Ak4FQXY8fSp7ImA9WhRUEEs.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-6536223172088465316</id><published>2012-01-20T08:15:00.001-06:00</published><updated>2012-01-20T08:15:10.875-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-20T08:15:10.875-06:00</app:edited><title>Joint Techs Netcast: Enhancing Infrastructure Support for Data Intensive Science</title><content type="html">The winter Joint Techs meeting is next week in Baton Rouge. 
I'm not going, but I plan to follow along via the &lt;a href="http://events.internet2.edu/2012/jt-loni/agenda.cfm?go=netcast" target="_blank"&gt;netcast&lt;/a&gt;. Jim Bottum, Clemson's CIO, is moderating an entire day devoted to the topic "Enhancing Infrastructure Support for Data Intensive Science." Of particular interest to me are the talks from 9:30 to 11am on Tuesday, January 24, from researchers and those supporting climatology, genomics, and the XSEDE projects. That afternoon features talks from academic and government labs that have successfully deployed methods to enhance their infrastructure support for data intensive science. Check out the full agenda for the day &lt;a href="http://events.internet2.edu/2012/jt-loni/agenda.cfm?types=&amp;amp;details=&amp;amp;timespan=2012-01-24" target="_blank"&gt;here&lt;/a&gt;. These sessions sound particularly relevant for those researching and supporting large-scale genomics and bioinformatics projects.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://events.internet2.edu/2012/jt-loni/agenda.cfm?go=netcast" target="_blank"&gt;Joint Techs Meeting Netcast&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-6536223172088465316?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/5HUR8LbGTB0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/6536223172088465316/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/01/joint-techs-netcast-enhancing.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/6536223172088465316?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/6536223172088465316?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/01/joint-techs-netcast-enhancing.html" title="Joint Techs Netcast: Enhancing Infrastructure Support for Data Intensive Science" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;DkUBR3k-eip7ImA9WhRVGE8.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-4779486207616584590</id><published>2012-01-17T12:17:00.000-06:00</published><updated>2012-01-17T12:17:36.752-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-17T12:17:36.752-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Software" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>Annotating limma Results with Gene Names for Affy Microarrays</title><content type="html">Lately I've been using the &lt;a href="http://bioconductor.org/packages/release/bioc/html/limma.html" target="_blank"&gt;limma&lt;/a&gt; package often for analyzing microarray data. When I read in Affy CEL files using ReadAffy(), the resulting ExpressionSet won't contain any featureData annotation. Consequently, when I run topTable to get a list of differentially expressed genes, there's no annotation information other than the Affymetrix probeset IDs or transcript cluster IDs. There are other ways of annotating these results (INNER JOIN to a MySQL database, &lt;a href="http://www.bioconductor.org/packages/2.2/bioc/html/biomaRt.html" target="_blank"&gt;biomaRt&lt;/a&gt;, etc.), but I would like to have the output from topTable already annotated with gene information. Ideally, I could annotate each probeset ID with a gene symbol, gene name, and Ensembl ID, and have that Ensembl ID link out to the Ensembl genome browser. With some &lt;a href="https://stat.ethz.ch/pipermail/bioconductor/2011-February/037866.html" target="_blank"&gt;help from Gordon Smyth&lt;/a&gt; on the Bioconductor mailing list, I found that annotating the ExpressionSet object results in the output from topTable also being annotated.&lt;br /&gt;
&lt;br /&gt;
The results from topTable are pretty uninformative without annotation:&lt;br /&gt;
&lt;script src="https://gist.github.com/1627892.js?file=noanno.txt"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
After annotation:&lt;br /&gt;
&lt;script src="https://gist.github.com/1627896.js?file=afteranno.txt"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
You can generate an HTML file with clickable links to the Ensembl Genome Browser for each gene:&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-gzRPpFeET40/TxW6NqSiDcI/AAAAAAAAnE4/C8ixO4MXx7o/s1600/Picture.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="157" src="http://4.bp.blogspot.com/-gzRPpFeET40/TxW6NqSiDcI/AAAAAAAAnE4/C8ixO4MXx7o/s320/Picture.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-_HE6mHwMggw/TxW6P7e1HzI/AAAAAAAAnFA/1eqL66KET3U/s1600/Picture+1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://1.bp.blogspot.com/-_HE6mHwMggw/TxW6P7e1HzI/AAAAAAAAnFA/1eqL66KET3U/s320/Picture+1.png" width="262" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
Here's the R code to do it:&lt;br /&gt;
&lt;script src="https://gist.github.com/1627927.js?file=annotatelimma.r"&gt;
&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-4779486207616584590?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/sGcx_FKdKPk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/4779486207616584590/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/01/annotating-limma-results-with-gene.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4779486207616584590?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4779486207616584590?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/01/annotating-limma-results-with-gene.html" title="Annotating limma Results with Gene Names for Affy Microarrays" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-gzRPpFeET40/TxW6NqSiDcI/AAAAAAAAnE4/C8ixO4MXx7o/s72-c/Picture.png" height="72" width="72" /><thr:total>2</thr:total></entry><entry 
gd:etag="W/&quot;DEIFQ3w5eSp7ImA9WhRWF0o.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-3801894636132263702</id><published>2012-01-05T09:13:00.000-06:00</published><updated>2012-01-05T09:15:12.221-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-05T09:15:12.221-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Productivity" /><category scheme="http://www.blogger.com/atom/ns#" term="Tutorials" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>New Year's Resolution: Learn How to Code</title><content type="html">Farhad Manjoo at Slate has a good article on &lt;a href="http://www.slate.com/articles/technology/technology/2012/01/learn_to_program_make_a_free_weekly_coding_lesson_your_new_year_s_resolution_.single.html" target="_blank"&gt;why you need to learn how to program&lt;/a&gt;. Chances are, if you're reading this post here you're already fairly adept at some form of programming. But if you're not, you should give it some serious thought.&lt;br /&gt;
&lt;br /&gt;
Gina Trapani, former editor of tech blog Lifehacker, is quoted in the article:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
“Learning to code demystifies tech in a way that empowers and enlightens. When you start coding you realize that every digital tool you have ever used involved lines of code just like the ones you're writing, and that if you want to make an existing app better, you can do just that with the same foreach and if-then statements every coder has ever used.”&lt;/blockquote&gt;
Farhad makes the point that programming is important even in traditionally non-computational fields: if you were a travel agent in the 1990s and knew how to code, not only would you have seen the inevitable collapse of your profession coming, but you might also have been able to get in early on the dot-com travel industry boom.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://gettinggeneticsdone.blogspot.com/2011/02/get-all-your-questions-answered.html" target="_blank"&gt;Q&amp;amp;A sites for biologists&lt;/a&gt; are littered with questions from researchers asking for non-technical, code-free ways of doing a particular analysis. Your friendly bioinformatics or computational biology neighbor can often point to a resource or design a solution that can get you 90% of the way, but usually won't grok the biological problem as truly as you do. By learning even the smallest bit of programming, you can at least be equipped with the knowledge of what is programmatically possible, and collaborations with your bioinformatician can be more fruitful. As every field of biological research becomes more computational in nature, learning how to code is becoming more important than ever.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Where to start&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
Getting started really isn't that difficult. Grab a good text editor like Notepad++ for Windows, TextMate or MacVim for Mac, or vim for Linux/Unix. What language should you start with? This can be a subject of intense debate, but in reality, it doesn't matter - just pick something that's relevant to what you're doing. If you know Perl or Java, you can pick up the basics of Ruby or C++ in a weekend. I started with Perl (using the &lt;a href="http://www.amazon.com/o/ASIN/1449303587/ref=nosim/gettgenedone-20" target="_blank"&gt;Llama book&lt;/a&gt;), but for scientific computing and basic scripting/automation, I would recommend learning Python instead. Perl lets you get away with sloppy coding and terse shortcuts under its motto of "there's more than one way to do it," while Python forces you to keep your code tidy and holds that there's probably one best way to do something, and that's the way you should use. Python has a huge following in the scientific community - chances are you'll find plenty of useful functionality in the &lt;a href="http://biopython.org/" target="_blank"&gt;Biopython&lt;/a&gt; and &lt;a href="http://www.scipy.org/" target="_blank"&gt;SciPy&lt;/a&gt; modules. I learned Python in an afternoon by watching the videos and doing the exercises in&amp;nbsp;&lt;a href="http://code.google.com/edu/languages/google-python-class/" target="_blank"&gt;Google's Python Class&lt;/a&gt;, and the free book &lt;a href="http://www.diveintopython.net/" target="_blank"&gt;Dive Into Python&lt;/a&gt; is a great reference. If you're on Windows, you can get Python from ActiveState; if you're on Mac or Linux, you already have Python.&lt;br /&gt;
&lt;br /&gt;
The Slate article also points to &lt;a href="http://codeyear.com/" target="_blank"&gt;Code Year&lt;/a&gt; - a site that will send you interactive coding projects once a week throughout 2012, starting January 9. Code Year is from the creators of &lt;a href="http://www.codecademy.com/" target="_blank"&gt;Codecademy&lt;/a&gt; - a site with a series of fun, interactive JavaScript tutorials. Lifehacker has a 5-part &lt;a href="http://lifehacker.com/5744113/learn-to-code-the-full-beginners-guide" target="_blank"&gt;"Night School" series on the basics of programming&lt;/a&gt;. Once you have some basic programming chops, take a look at Stanford's free &lt;a href="http://jan2012.ml-class.org/" target="_blank"&gt;machine learning&lt;/a&gt;,&amp;nbsp;&lt;a href="https://www.ai-class.com/" target="_blank"&gt;artificial intelligence&lt;/a&gt;, and &lt;a href="http://www.nlp-class.org/" target="_blank"&gt;natural language processing&lt;/a&gt;&amp;nbsp;classes to hone your scientific computing skills. Need a challenge? Try the &lt;a href="http://www.pythonchallenge.com/" target="_blank"&gt;Python Challenge&lt;/a&gt; for fun puzzles to sharpen your Python skills, or check out &lt;a href="http://projecteuler.net/" target="_blank"&gt;Project Euler&lt;/a&gt; if you want to tackle more math-oriented programming challenges in any language. The point is - there is no lack of free resources to help you get started or get better at programming.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.slate.com/articles/technology/technology/2012/01/learn_to_program_make_a_free_weekly_coding_lesson_your_new_year_s_resolution_.single.html" target="_blank"&gt;Slate - You Need to Learn How to Program&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-3801894636132263702?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/pwJJv7G46eA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/3801894636132263702/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/01/new-years-resolution-learn-how-to-code.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3801894636132263702?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3801894636132263702?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2012/01/new-years-resolution-learn-how-to-code.html" title="New Year's Resolution: Learn How to Code" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>1</thr:total></entry><entry 
gd:etag="W/&quot;Ak8GR3k8fip7ImA9WhRXE00.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-4441580376952065191</id><published>2011-12-15T15:37:00.000-06:00</published><updated>2011-12-19T09:33:46.776-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-19T09:33:46.776-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Query a MySQL Database from R using RMySQL</title><content type="html">I use this all the time, and the setup is dead simple. Follow &lt;a href="https://gist.github.com/1482991"&gt;the code&lt;/a&gt; below to load the RMySQL package, connect to a database (here the UCSC genome browser's public MySQL instance), set up a function to make querying easier, and query the database to return results as a data frame.&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/1482991.js?file=rmysql.r"&gt;
&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-4441580376952065191?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/TYbbdLxe6gI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/4441580376952065191/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/query-mysql-database-from-r-using.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4441580376952065191?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4441580376952065191?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/query-mysql-database-from-r-using.html" title="Query a MySQL Database from R using RMySQL" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>2</thr:total></entry><entry gd:etag="W/&quot;CEUDRns8fCp7ImA9WhRQGUU.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-3250033445526668928</id><published>2011-12-15T14:51:00.000-06:00</published><updated>2011-12-15T14:51:17.574-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-15T14:51:17.574-06:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="Writing" /><category scheme="http://www.blogger.com/atom/ns#" term="Announcements" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>Galaxy Project Group on CiteULike and Mendeley</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://wiki.g2.bx.psu.edu/CiteULike" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-V4mwYdS6Ihw/Tupdfg5TZBI/AAAAAAAAmCI/aR2Kg62Pg1s/s1600/Picture+3.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;br /&gt;&lt;/div&gt;
The &lt;a href="http://usegalaxy.org/"&gt;Galaxy Project&lt;/a&gt; started using CiteULike to organize papers that are about, use, or reference Galaxy. The &lt;a href="http://www.citeulike.org/group/16008"&gt;Galaxy CiteULike group&lt;/a&gt; is open to any CUL user, and once you join, you can add papers to the group, assign tags, and rate papers. &lt;br /&gt;
&lt;br /&gt;
While not a CUL user, I'm a big fan of &lt;a href="http://www.mendeley.com/"&gt;Mendeley&lt;/a&gt; for managing references and PDFs and for creating bibliographies (&lt;a href="http://gettinggeneticsdone.blogspot.com/2011/02/results-from-reference-management-poll.html"&gt;and so are many of you&lt;/a&gt;). I'm happy to hear that the Galaxy folks also set up a &lt;a href="http://www.mendeley.com/groups/1710745/galaxy-project/"&gt;Galaxy Mendeley Group&lt;/a&gt;, which is also open for anyone to join. If you join the public Galaxy Mendeley group, all of the group's references will show up in your Mendeley library (and these won't count against your personal quota).&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-My2xF2i0CgI/TupbZSwRs0I/AAAAAAAAmBo/XNC5p4aB0GI/s1600/galaxymendeley.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="395" src="http://2.bp.blogspot.com/-My2xF2i0CgI/TupbZSwRs0I/AAAAAAAAmBo/XNC5p4aB0GI/s400/galaxymendeley.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
Just one important thing to note: The Mendeley group is a &lt;i&gt;mirror&lt;/i&gt; of the CiteULike group, so if you want to add more publications to the Galaxy Group, &lt;i&gt;add them on CiteULike,&lt;/i&gt; not Mendeley (it doesn't work the other way around - papers added to Mendeley won't make it to the CUL group).&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://wiki.g2.bx.psu.edu/CiteULike"&gt;Galaxy Project Group on CiteULike and Mendeley&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-3250033445526668928?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/4wJkF3fJhJk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/3250033445526668928/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/galaxy-project-group-on-citeulike-and.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3250033445526668928?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3250033445526668928?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/galaxy-project-group-on-citeulike-and.html" title="Galaxy Project Group on CiteULike and Mendeley" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-V4mwYdS6Ihw/Tupdfg5TZBI/AAAAAAAAmCI/aR2Kg62Pg1s/s72-c/Picture+3.png" height="72" width="72" /><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;D0UDQno_fSp7ImA9WhRQE0k.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-2348199117673614095</id><published>2011-12-08T05:48:00.001-06:00</published><updated>2011-12-08T05:54:33.445-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-08T05:54:33.445-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Sequencing" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>RNA-Seq &amp; ChIP-Seq Data Analysis Course at EBI</title><content type="html">I just got this announcement from EMBL-EBI about a hands-on RNA-seq/ChIP-seq analysis course. Find the full details, schedule, and speaker list &lt;a href="http://www.ebi.ac.uk/training/handson/course_120502_RNA.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Title&lt;/i&gt;: &lt;a href="http://www.ebi.ac.uk/training/handson/course_120502_RNA.html"&gt;Advanced RNA-Seq and ChIP-Seq Data Analysis Course&lt;/a&gt;&lt;br /&gt; &lt;i&gt;Date&lt;/i&gt;: May 1-4, 2012&lt;br /&gt; &lt;i&gt;Venue&lt;/i&gt;: EMBL-EBI, Hinxton, Nr Cambridge, CB10 1SD, UK&lt;br /&gt; &lt;i&gt;Registration Closing Date&lt;/i&gt;: March 6, 2012 (12:00 midday GMT)&lt;br /&gt;&lt;br /&gt;This course is aimed at advanced PhD students and post-doctoral researchers who are applying or planning to apply high throughput sequencing technologies and bioinformatics methods in their research. The aim of this course is to familiarize the participants with advanced data analysis methodologies and provide hands-on training on the latest analytical approaches. 
&lt;br /&gt;&lt;br /&gt; Lectures will give insight into how biological knowledge can be generated from RNA-seq and ChIP-seq experiments and illustrate different ways of analyzing such data. Practicals will consist of computer exercises that will enable the participants to apply statistical methods to the analysis of RNA-seq and ChIP-seq data under the guidance of the lecturers and teaching assistants. Familiarity with the technology and biological use cases of high throughput sequencing is required, as is some experience with R/Bioconductor. &lt;br /&gt;&lt;br /&gt; The course covers data analysis of RNA-Seq and ChIP-Seq experiments.&lt;br /&gt; Topics will include: alignment, data handling and visualisation, region identification, differential expression, data quality assessment and statistical analysis, using R/Bioconductor.&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-2348199117673614095?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/A9sC66SmlSc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/2348199117673614095/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/rna-seq-chip-seq-data-analysis-course.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/2348199117673614095?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/2348199117673614095?v=2" /><link rel="alternate" type="text/html" 
href="http://gettinggeneticsdone.blogspot.com/2011/12/rna-seq-chip-seq-data-analysis-course.html" title="RNA-Seq &amp; ChIP-Seq Data Analysis Course at EBI" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>1</thr:total></entry><entry gd:etag="W/&quot;CUUCSHczfip7ImA9WhRQEk0.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-1205612774126081401</id><published>2011-12-06T14:11:00.001-06:00</published><updated>2011-12-06T14:27:49.986-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-06T14:27:49.986-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Sequencing" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>An example RNA-Seq Quality Control and Analysis Workflow</title><content type="html">I found the slides below on the &lt;a href="http://jura.wi.mit.edu/bio/education/"&gt;education page&lt;/a&gt; from Bioinformatics &amp;amp; Research Computing at the Whitehead Institute. The first set (&lt;a href="http://jura.wi.mit.edu/bio/education/hot_topics/QC_HTP/QC_HTP.pdf"&gt;PDF&lt;/a&gt;) gives an overview of the methods and software available for quality assessment of microarray and RNA-seq experiments using the &lt;a href="http://hannonlab.cshl.edu/fastx_toolkit/"&gt;FASTX-Toolkit&lt;/a&gt; and &lt;a href="http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/"&gt;FastQC&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;iframe height="450" src="http://docs.google.com/viewer?url=http%3A%2F%2Fjura.wi.mit.edu%2Fbio%2Feducation%2Fhot_topics%2FQC_HTP%2FQC_HTP.pdf&amp;amp;embedded=true" style="border: none;" width="500"&gt;&lt;/iframe&gt;&lt;br /&gt;
&lt;br /&gt;
The second set (&lt;a href="http://jura.wi.mit.edu/bio/education/hot_topics/RNAseq/RNAseqDE_Dec2011.pdf"&gt;PDF&lt;/a&gt;)&amp;nbsp; gives an example RNA-seq workflow using &lt;a href="http://tophat.cbcb.umd.edu/"&gt;TopHat&lt;/a&gt;, &lt;a href="http://samtools.sourceforge.net/"&gt;SAMtools&lt;/a&gt;, &lt;a href="http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html"&gt;Python/HTseq&lt;/a&gt;, and &lt;a href="http://www-huber.embl.de/users/anders/DESeq/"&gt;R/DEseq&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;iframe height="450" src="http://docs.google.com/viewer?url=http%3A%2F%2Fjura.wi.mit.edu%2Fbio%2Feducation%2Fhot_topics%2FRNAseq%2FRNAseqDE_Dec2011.pdf&amp;amp;embedded=true" style="border: none;" width="500"&gt;&lt;/iframe&gt;&lt;br /&gt;
&lt;br /&gt;
If you're doing any RNA-seq work these are both really nice resources to help you get a command-line based analysis workflow up and running (if you're not using &lt;a href="http://gettinggeneticsdone.blogspot.com/2011/11/guide-to-rna-seq-analysis-in-galaxy.html"&gt;Galaxy for RNA-seq&lt;/a&gt;).&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-1205612774126081401?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/8fhFT_mTsiU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/1205612774126081401/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/example-rna-seq-quality-control-and.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1205612774126081401?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1205612774126081401?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/example-rna-seq-quality-control-and.html" title="An example RNA-Seq Quality Control and Analysis Workflow" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;AkAGQ3k5eyp7ImA9WhRQEU0.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-5444651446771477730</id><published>2011-12-05T12:06:00.001-06:00</published><updated>2011-12-05T12:12:02.723-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-12-05T12:12:02.723-06:00</app:edited><title>Webinar: Applications of Next-Generation Sequencing in Clinical Care</title><content type="html">I just got an email from Illumina about a webinar that looks interesting this Wednesday at 9am PST (noon EST) on clinical applications of next-gen sequencing. &lt;br /&gt;
&lt;br /&gt;
Date: Wednesday, December 7, 2011&lt;br /&gt;Time: 9:00 AM (PST)&lt;br /&gt;Speaker: Rick Dewey, MD, Stanford Center for Inherited Cardiovascular Disease&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Next-generation sequencing (NGS) presents both challenges and opportunities for clinical care. Dr. Dewey will share examples from his experience at Stanford, successful and otherwise, in which NGS has been applied to cases of familial cardiomyopathy, and other inherited conditions. Bring your questions for a Q&amp;amp;A session. In this webinar, Dr. Dewey will discuss approaches to: Data storage and management; Error identification and reduction; Disease risk encoded in the reference sequence; and Variant validation.&lt;br /&gt;
&lt;br /&gt;
The webinar will be recorded and available to you afterwards if you register.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://mkt.illumina.com/Webinar_Landingpage_IGS_112811.html"&gt;Registration - Applications of Next-Generation Sequencing in Clinical Care&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-5444651446771477730?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/DIwC6C1xGEY" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/5444651446771477730/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/webinar-applications-of-next-generation.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/5444651446771477730?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/5444651446771477730?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/12/webinar-applications-of-next-generation.html" title="Webinar: Applications of Next-Generation Sequencing in Clinical Care" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;AkUBQHs_cCp7ImA9WhRSFk8.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-4352481818485473354</id><published>2011-11-18T08:50:00.001-06:00</published><updated>2011-11-18T08:57:31.548-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-18T08:57:31.548-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>BioMart Gene ID Converter</title><content type="html">&lt;a href="http://www.biomart.org/"&gt;BioMart&lt;/a&gt; recently got a facelift. I'm not sure if this was always available in the old BioMart, but there's now a link to a &lt;a href="http://central.biomart.org/converter/#!/ID_converter/gene_ensembl_config_2"&gt;gene ID converter&lt;/a&gt; that worked pretty well for me for converting S. cerevisiae gene IDs to standard gene names. It looks like the tool will convert nearly any ID you could imagine. Looks like it will also map Affy probe IDs to gene, transcript, or protein IDs and names.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://central.biomart.org/converter/#!/ID_converter/gene_ensembl_config_2" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="230" src="http://4.bp.blogspot.com/-SdvoN6SqDoY/TsZxs_LlFPI/AAAAAAAAl94/4_lKYy8u1G0/s400/2011-11-18_095157.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://central.biomart.org/converter/#!/ID_converter/gene_ensembl_config_2"&gt;BioMart Gene ID Converter&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-4352481818485473354?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/yYd6CzJZsNA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/4352481818485473354/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/11/biomart-gene-id-converter.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4352481818485473354?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/4352481818485473354?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/11/biomart-gene-id-converter.html" title="BioMart Gene ID Converter" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-SdvoN6SqDoY/TsZxs_LlFPI/AAAAAAAAl94/4_lKYy8u1G0/s72-c/2011-11-18_095157.png" height="72" width="72" /><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;CkIMQHc-fSp7ImA9WhRSFUs.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-2113732268572589058</id><published>2011-11-17T13:29:00.001-06:00</published><updated>2011-11-17T14:09:41.955-06:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-17T14:09:41.955-06:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R</title><content type="html">&lt;a href="http://www.ncbi.nlm.nih.gov/geo/"&gt;Gene Expression Omnibus&lt;/a&gt; is NCBI's repository for publicly available gene expression data with thousands of datasets having over 600,000 samples with array or sequencing data. You can download data from GEO using FTP, or download and load the data directly into R using the &lt;a href="http://www.bioconductor.org/packages/1.8/bioc/html/GEOquery.html"&gt;GEOquery&lt;/a&gt; bioconductor package written (and &lt;a href="http://www.bioconductor.org/packages/1.8/bioc/vignettes/GEOquery/inst/doc/GEOquery.pdf"&gt;well documented&lt;/a&gt;) by &lt;a href="https://twitter.com/#%21/seandavis12"&gt;Sean Davis&lt;/a&gt;, and analyze the data using the &lt;a href="http://www.bioconductor.org/packages/release/bioc/html/limma.html"&gt;limma package&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.ncbi.nlm.nih.gov/geo/info/geo2r.html"&gt;GEO2R&lt;/a&gt; is a very nice web-based tool to do this graphically and automatically. Enter the GEO series number in the search box (or use &lt;a href="http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE7442"&gt;this one&lt;/a&gt; for an example). Start by creating groups (e.g. control vs treatment, early vs late time points in a time course, etc), then select samples to add to that group.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-AUPHkYFCYzA/TsVj9lORyNI/AAAAAAAAl9k/EXgKM90XNKU/s1600/2011-11-17_144235.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-AUPHkYFCYzA/TsVj9lORyNI/AAAAAAAAl9k/EXgKM90XNKU/s1600/2011-11-17_144235.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
Scroll down to the bottom and click Top 250 to run an analysis in limma (the &lt;a href="http://www.bioconductor.org/packages/2.9/bioc/vignettes/limma/inst/doc/usersguide.pdf"&gt;users guide&lt;/a&gt; documents this well). GEO2R will automatically fetch the data, group your samples, create your design matrix for your differential expression analysis, run the analysis, and annotate the results. A big complaint with point-and-click GUI and web based applications is the lack of reproducibility. GEO2R obviates this problem by giving you all the R code it generated to run the analysis. Click the R script tab to see the R code it generated, and save it for later.&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/1374252.js?file=demo_geo2r.r"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
The options tab allows you to adjust the multiple testing correction method, and the value distribution tab lets you take a look at the distribution of gene expression values among the samples that you assigned to your groups.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-GtAWj2M2H84/TsVmL7YJguI/AAAAAAAAl9s/7dCa4Zr0Nug/s1600/Rplot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-GtAWj2M2H84/TsVmL7YJguI/AAAAAAAAl9s/7dCa4Zr0Nug/s1600/Rplot.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
There are no built-in quality assessment tools in GEO2R, but you can always take the R code it generated and do your own QA/QC. It's also important to verify what values it's pulling from each array into the data matrix. In this example, epithelial cells at various time points were compared to a reference cell line, and the log base 2 fold change was calculated. This was used in the data matrix rather than the actual expression values.&lt;br /&gt;
&lt;br /&gt;
GEO2R is a very nice tool to quickly run an analysis on data in GEO. Now, if we could only see something similar for the European repository, ArrayExpress.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.ncbi.nlm.nih.gov/geo/geo2r/"&gt;GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-2113732268572589058?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/X0ariGAC_Wo" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/2113732268572589058/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/11/geo2r-web-app-to-analyze-gene.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/2113732268572589058?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/2113732268572589058?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/11/geo2r-web-app-to-analyze-gene.html" title="GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-AUPHkYFCYzA/TsVj9lORyNI/AAAAAAAAl9k/EXgKM90XNKU/s72-c/2011-11-17_144235.png" height="72" width="72" /><thr:total>3</thr:total></entry><entry 
gd:etag="W/&quot;C0YFR3gyeCp7ImA9WhRTEUs.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-3184286931372991887</id><published>2011-11-01T10:24:00.001-05:00</published><updated>2011-11-01T10:25:16.690-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-01T10:25:16.690-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Tutorials" /><category scheme="http://www.blogger.com/atom/ns#" term="Recommended Reading" /><category scheme="http://www.blogger.com/atom/ns#" term="Sequencing" /><title>Guide to RNA-seq Analysis in Galaxy</title><content type="html">&lt;a href="http://bx.mathcs.emory.edu/"&gt;James Taylor&lt;/a&gt; came to UVA last week and gave an excellent talk on how &lt;a href="http://usegalaxy.org/"&gt;Galaxy&lt;/a&gt; enables transparent and reproducible research in genomics. I'm gearing up to take on several projects that involve next-generation sequencing, and I'm considering &lt;a href="http://getgalaxy.org/"&gt;installing my own&lt;/a&gt; Galaxy framework on a local cluster or &lt;a href="http://wiki.g2.bx.psu.edu/Admin/Cloud"&gt;on the cloud&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
If you've used Galaxy in the past you're probably aware that it allows you to share data, workflows, and histories with other users. New to me was the &lt;a href="http://main.g2.bx.psu.edu/page/list_published"&gt;pages section&lt;/a&gt;, where an entire analysis is packaged on a single page, and vetting is crowdsourced to other Galaxy users in the form of comments and voting.&lt;br /&gt;
&lt;br /&gt;
I recently found a page published by Galaxy user Jeremy that serves as a &lt;a href="http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise"&gt;guide to RNA-seq analysis using Galaxy&lt;/a&gt;. If you've never done RNA-seq before it's a great place to start. The guide has all the data you need to get started on an experiment where you'll use TopHat/Bowtie to align reads to a reference genome, and Cufflinks to assemble transcripts and quantify differential gene expression, alternative splicing, etc. The dataset is small, so all the analyses start and finish quickly, allowing you to finish the tutorial in just a few hours. The author was kind enough to include links to relevant sections of the TopHat and Cufflinks documentation where it's needed in the tutorial. Hit the link below to get started.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise"&gt;Galaxy Pages: RNA-seq Analysis Exercise&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-3184286931372991887?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/NzLbBkNsgn0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/3184286931372991887/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/11/guide-to-rna-seq-analysis-in-galaxy.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3184286931372991887?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3184286931372991887?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/11/guide-to-rna-seq-analysis-in-galaxy.html" title="Guide to RNA-seq Analysis in Galaxy" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>2</thr:total></entry><entry gd:etag="W/&quot;DkcDQXY6fSp7ImA9WhdaF04.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-7000425127932039812</id><published>2011-10-27T11:47:00.001-05:00</published><updated>2011-10-27T11:47:50.815-05:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2011-10-27T11:47:50.815-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Clustering" /><category scheme="http://www.blogger.com/atom/ns#" term="Visualization" /><category scheme="http://www.blogger.com/atom/ns#" term="GWAS" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>A New Dimension to Principal Components Analysis</title><content type="html">&lt;br /&gt;
&lt;br /&gt;
&lt;div class="MsoNormal"&gt;
In general, the standard practice for correcting for population stratification in genetic studies is to use principal components analysis (PCA) to categorize samples along different&amp;nbsp;&lt;i&gt;ethnic axes&lt;/i&gt;.&amp;nbsp;&amp;nbsp;&lt;a href="http://genepath.med.harvard.edu/~reich/Price%20et%20al.pdf"&gt;Price et al.&lt;/a&gt;&amp;nbsp;published on this in 2006, and since then PCA plots have been a common component of many published GWAS.&amp;nbsp; One key advantage to using PCA for ethnicity is that each sample is given coordinates in a multidimensional space corresponding to the varying components of their ethnic ancestry.&amp;nbsp; Using either full GWAS data or a set of ancestry informative markers (AIMs), PCA can be easily conducted using available software packages like&amp;nbsp;&lt;a href="http://genepath.med.harvard.edu/~reich/Software.htm"&gt;EIGENSOFT&lt;/a&gt;&amp;nbsp;or&amp;nbsp;&lt;a href="http://gump.qimr.edu.au/gcta/"&gt;GCTA&lt;/a&gt;.&amp;nbsp;HapMap samples are sometimes included in the PCA to provide a frame of reference for the ethnic groups. &amp;nbsp;&lt;o:p&gt;&lt;/o:p&gt;&lt;/div&gt;
&lt;div class="MsoNormal"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="MsoNormal"&gt;
Once computed, each sample will have values that correspond to a position in the new coordinate system that effectively clusters samples together by ethnic similarity. &amp;nbsp;The results of this analysis are usually plotted/visualized to identify ethnic outliers or to simply examine the structure of the data. &amp;nbsp;A common problem however is that it may take more than the first two principal components to identify groups. &amp;nbsp;&lt;/div&gt;
&lt;div class="MsoNormal"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="MsoNormal"&gt;
To illustrate, I will plot some PCs generated from 125 AIMs for a recent study of ours. &amp;nbsp;I generated these using GCTA software and loaded the top 5 PCs into R using the read.table() function. &amp;nbsp;I loaded the top 5, but for continental ancestry, I've found that the top 3 are usually enough to separate groups. &amp;nbsp;The values look something like this: &amp;nbsp;&lt;/div&gt;
&lt;blockquote class="tr_bq"&gt;
&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&amp;nbsp; &amp;nbsp; new_ruid &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;pc1 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;pc2 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; pc3 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;pc4 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; pc5&lt;br /&gt;1 &amp;nbsp; &amp;nbsp;11596 &amp;nbsp;4.10996e-03 -0.002883830 &amp;nbsp;0.003100840 -0.00638232 &amp;nbsp;0.00709780&lt;br /&gt;2 &amp;nbsp; &amp;nbsp; 5415 &amp;nbsp;3.22958e-03 -0.000299851 -0.005358910 &amp;nbsp;0.00660643 &amp;nbsp;0.00430520&lt;br /&gt;3 &amp;nbsp; &amp;nbsp;11597 -4.35116e-03 &amp;nbsp;0.013282400 &amp;nbsp;0.006398130 &amp;nbsp;0.01721600 -0.02275470&lt;br /&gt;4 &amp;nbsp; &amp;nbsp; 5416 &amp;nbsp;4.01592e-03 &amp;nbsp;0.001408180 &amp;nbsp;0.005077310 &amp;nbsp;0.00159497 &amp;nbsp;0.00394816&lt;br /&gt;5 &amp;nbsp; &amp;nbsp; 3111 &amp;nbsp;3.04779e-03 -0.002079510 -0.000127967 -0.00420436 &amp;nbsp;0.01257460&lt;br /&gt;6 &amp;nbsp; &amp;nbsp;11598 &amp;nbsp;6.15318e-06 -0.000279919 &amp;nbsp;0.001060880 &amp;nbsp;0.00606267 &amp;nbsp;0.00954331&lt;/span&gt;&lt;/blockquote&gt;
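&lt;div&gt;
For anyone who wants to try the loading step, here is a minimal sketch. It assumes a whitespace-delimited file with no header row, one sample ID column, and five PC columns. The temporary file and the column names are made up for illustration, and a real GCTA output file may be laid out differently.&lt;/div&gt;
&lt;blockquote class="tr_bq"&gt;
# Illustration only: write two rows from the table above to a temporary file&lt;br /&gt;
tmp &amp;lt;- tempfile(fileext = ".txt")&lt;br /&gt;
writeLines(c("11596 4.10996e-03 -0.002883830 0.003100840 -0.00638232 0.00709780",&lt;br /&gt;
"5415 3.22958e-03 -0.000299851 -0.005358910 0.00660643 0.00430520"), tmp)&lt;br /&gt;
# Read the file and name the columns as in the table (assumed names)&lt;br /&gt;
pca &amp;lt;- read.table(tmp, header = FALSE, col.names = c("new_ruid", paste0("pc", 1:5)))&lt;br /&gt;
# Check that the ID and the five PCs were read as numeric columns&lt;br /&gt;
str(pca)&lt;/blockquote&gt;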
&lt;div&gt;
I loaded this into a dataframe called pca, so I can plot the first two PCs using this command:&lt;/div&gt;
&lt;blockquote class="tr_bq"&gt;
plot(pca$pc1, pca$pc2)&lt;/blockquote&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-WuZRG08fmDU/TqheB_z7d_I/AAAAAAAABKc/tLKelT_-ctc/s1600/pc1and2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="238" src="http://1.bp.blogspot.com/-WuZRG08fmDU/TqheB_z7d_I/AAAAAAAABKc/tLKelT_-ctc/s320/pc1and2.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
We might also want to look at the next two PCs:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
plot(pca$pc2, pca$pc3)&lt;/blockquote&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-xqG7cSLxGvc/TqheV71q5oI/AAAAAAAABKk/Escu9Z6nUvE/s1600/pc2and3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="238" src="http://1.bp.blogspot.com/-xqG7cSLxGvc/TqheV71q5oI/AAAAAAAABKk/Escu9Z6nUvE/s320/pc2and3.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
It's probably best to look at all of them together:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
pairs(pca[2:4])&lt;/blockquote&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-qSfFRq3dZcw/TqheurajM8I/AAAAAAAABKs/Gq4eIeZmzGM/s1600/pairs.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="238" src="http://1.bp.blogspot.com/-qSfFRq3dZcw/TqheurajM8I/AAAAAAAABKs/Gq4eIeZmzGM/s320/pairs.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
So this is where my mind plays tricks on me. &amp;nbsp;I can't make much sense out of these plots -- there should be four ethnic groups represented, but it's hard to see who goes where. &amp;nbsp;To look at all of these dimensions simultaneously, we need a 3D plot. &amp;nbsp;Now 3D plots (especially 3D&amp;nbsp;&lt;i&gt;scatterplots)&amp;nbsp;&lt;/i&gt;aren't highly regarded -- in fact I hear that some&amp;nbsp;&lt;a href="https://lh6.googleusercontent.com/-qS7jJhy91Vg/Rn9pBLaq_JI/AAAAAAAAC8Y/3-ar4b8AGIA/s640/Mount2Sound%2525206-24-2007%2525203-28-17%252520PM.JPG"&gt;poor soul&lt;/a&gt;&amp;nbsp;at the University of Washington gets laughed at for showing his 3D plots &amp;nbsp;-- but in this case I found them quite useful.&lt;br /&gt;
&lt;br /&gt;
Using a library called rgl, I generated a 3D scatterplot like so:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
&amp;nbsp;plot3d(pca[2:4])&lt;/blockquote&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-x-K3ZcsMJOM/Tql4wiXVEwI/AAAAAAAABLA/_b3QVFccTT8/s1600/noclust.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-x-K3ZcsMJOM/Tql4wiXVEwI/AAAAAAAABLA/_b3QVFccTT8/s1600/noclust.gif" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
Now, using the mouse I could rotate and play with the cloud of data points, and it became clearer how the ethnic groups sorted out. &amp;nbsp;Just to double check my intuition, I ran a model-based clustering algorithm (&lt;a href="http://www.stat.washington.edu/fraley/mclust/tr504.pdf"&gt;mclust&lt;/a&gt;) on the data. &amp;nbsp;Different parameters obviously produce different cluster patterns, but I found that using an "ellipsoidal model with equal variances" and four clusters (G=4) identified the groups I thought should be there based on the overlay with the HapMap samples.&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
fit &amp;lt;- Mclust(pca[2:4], G=4, modelNames = "EEV")&lt;/blockquote&gt;
&lt;blockquote class="tr_bq"&gt;
plot3d(pca[2:4], col = fit$classification)&amp;nbsp;&lt;/blockquote&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-pgMAHiIWvuw/Tql5HIXNdRI/AAAAAAAABLI/I2zPF5cLRwQ/s1600/clust.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-pgMAHiIWvuw/Tql5HIXNdRI/AAAAAAAABLI/I2zPF5cLRwQ/s1600/clust.gif" /&gt;&lt;/a&gt;&lt;/div&gt;
Basically, the red sphere corresponds to the European descent group, the green indicates the admixed African American group, the black corresponds to the Hispanic group, and the blue identifies the Asian descent group. &amp;nbsp;We are still a bit confused as to why the Asian descent samples don't form a tighter cluster -- it may be due to relatively poor performance of these AIMs in Asian descent groups. &amp;nbsp; Whatever the case, you might notice several individuals falling either outside a clear cluster or at the interface between two groups. &amp;nbsp;The ethnic assignment for these individuals is questionable, but the clustering algorithm gives us a very nice measure of cluster assignment uncertainty. &amp;nbsp;We can plot this like so:&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
plot(pca[2:3], cex = fit$uncertainty*10)&lt;/blockquote&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-uaGDcHAHoIw/Tql_NAuNf1I/AAAAAAAABLQ/71PTT2lUyLA/s1600/uncertainty.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="238" src="http://1.bp.blogspot.com/-uaGDcHAHoIw/Tql_NAuNf1I/AAAAAAAABLQ/71PTT2lUyLA/s320/uncertainty.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;
I had to scale the uncertainty factor by 10 to make the questionable points more visible in this plot, shown as the hollow circles. &amp;nbsp;We will likely drop these samples from any stratified analyses. &amp;nbsp;We can export the cluster assignment by accessing the fit$classification column, and we have our samples assigned to an ethnic group.&lt;br /&gt;
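&lt;br /&gt;
The export step could look something like the sketch below. It uses simulated points rather than the study data, so the group centers, the 0.1 uncertainty cutoff, and the output file are all assumptions for illustration. In mclust, the uncertainty is one minus the largest posterior membership probability for each sample, so a low cutoff keeps only confidently assigned samples.&lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
library(mclust)&lt;br /&gt;
set.seed(1)&lt;br /&gt;
# Simulated stand-in for three PCs: four groups of 50 points around made-up centers&lt;br /&gt;
grp &amp;lt;- rep(1:4, each = 50)&lt;br /&gt;
centers &amp;lt;- rbind(c(0, 0, 0), c(4, 0, 0), c(0, 4, 0), c(0, 0, 4))&lt;br /&gt;
sim &amp;lt;- as.data.frame(centers[grp, ] + matrix(rnorm(600, sd = 0.5), ncol = 3))&lt;br /&gt;
names(sim) &amp;lt;- c("pc1", "pc2", "pc3")&lt;br /&gt;
# Same model and number of clusters as in the post&lt;br /&gt;
fit &amp;lt;- Mclust(sim, G = 4, modelNames = "EEV")&lt;br /&gt;
# Attach the cluster label and its uncertainty to each sample&lt;br /&gt;
sim$cluster &amp;lt;- fit$classification&lt;br /&gt;
sim$uncertainty &amp;lt;- fit$uncertainty&lt;br /&gt;
# Keep confidently assigned samples; 0.1 is an arbitrary cutoff for this sketch&lt;br /&gt;
confident &amp;lt;- sim[sim$uncertainty &amp;lt; 0.1, ]&lt;br /&gt;
# Write the assignments to a temporary CSV file and count samples per cluster&lt;br /&gt;
write.csv(confident, tempfile(fileext = ".csv"), row.names = FALSE)&lt;br /&gt;
table(confident$cluster)&lt;/blockquote&gt;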
&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-7000425127932039812?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/rPEXrjMv2kc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/7000425127932039812/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/10/new-dimension-to-principal-components_27.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7000425127932039812?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7000425127932039812?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/10/new-dimension-to-principal-components_27.html" title="A New Dimension to Principal Components Analysis" /><author><name>Will</name><uri>http://www.blogger.com/profile/09703349044940180835</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-WuZRG08fmDU/TqheB_z7d_I/AAAAAAAABKc/tLKelT_-ctc/s72-c/pc1and2.png" height="72" width="72" /><thr:total>5</thr:total></entry><entry 
gd:etag="W/&quot;Ak4DRH4-eip7ImA9WhdbGUk.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-1475248792265832281</id><published>2011-10-18T09:41:00.000-05:00</published><updated>2011-10-18T09:42:55.052-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-18T09:42:55.052-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Machine Learning" /><category scheme="http://www.blogger.com/atom/ns#" term="Recommended Reading" /><category scheme="http://www.blogger.com/atom/ns#" term="Twitter" /><category scheme="http://www.blogger.com/atom/ns#" term="GWAS" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>My thoughts on ICHG 2011</title><content type="html">&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-85pQXz9VB8E/TpyArwOEY6I/AAAAAAAABKA/bhYrzX2qAhM/s1600/ichg_pg.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-85pQXz9VB8E/TpyArwOEY6I/AAAAAAAABKA/bhYrzX2qAhM/s1600/ichg_pg.png" /&gt;&lt;/a&gt;&lt;/div&gt;
I’m a bit exhausted from a week of excellent science at ICHG.  First, let me say that Montreal is a truly remarkable city with fantastic food and a fascinating blend of architectural styles, all making the meeting a fun place to be…. Now on to the genomics – I’ll recap a few of the most exciting sessions I attended.   You can find a live-stream of tweets from the meeting by searching the &lt;a href="http://twitter.com/#!/search/realtime/%23ICHG2011"&gt;#ICHG2011&lt;/a&gt;&amp;nbsp;and &lt;a href="http://twitter.com/#!/search/realtime/%23ICHG"&gt;#ICHG&lt;/a&gt; hashtags.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
On Wednesday, Marylyn Ritchie (&lt;a href="http://twitter.com/#!/MarylynRitchie"&gt;@MarylynRitchie&lt;/a&gt;) and Nancy Cox organized “Beyond Genome-wide association studies”. &amp;nbsp;Nancy Cox presented some ideas on how to integrate multiple “intermediate” associations for SNPs, such as expression QTLs and newly discovered protein QTLs (more on pQTLs later).  This approach, which she called &lt;i&gt;Functional Unit Analysis&lt;/i&gt;, would group signals together based on the genes they influence.  Nicholas Schork presented some nice examples of the pros and cons of sequence-level annotation algorithms. &amp;nbsp;Trey Ideker gave a very nice talk illustrating some of the properties of epistasis in yeast protein interaction networks.  One of the more striking points he made was that epistasis tends to occur between full protein complexes rather than within elements of the complexes themselves.   Marylyn Ritchie presented the ideas behind her &lt;a href="http://ritchielab.com/method.php?method=athena"&gt;ATHENA&lt;/a&gt; software for machine learning analysis of genetic data, and Manuel Mattheisen from Tim Becker’s group presented the methods in their &lt;a href="http://intersnp.meb.uni-bonn.de/"&gt;INTERSNP&lt;/a&gt; software for doing large-scale interaction analysis.  What was most impressive about this session was that there were clear attempts to incorporate underlying biological complexity into data analysis. &lt;br /&gt;
&lt;br /&gt;
On Thursday, I attended the second Statistical Genetics section called “Expanding Genome-wide Association Studies”, organized by Saurabh Ghosh and Daniel Shriner.  Having recently attended IGES, I feel pretty “up” on newer analysis techniques, but this session had a few talks that sparked my interest.  The first three talks were related to haplotype phasing and the issues surrounding computational accuracy and speed.  The basic goal of all these methods is to efficiently estimate genotypes for a common set of loci for all samples of a study using a set of reference haplotypes, usually from the HapMap or 1000 Genomes data.  Despite these advances, it seems like phasing haplotypes for thousands of samples is still a massive undertaking that requires a high-performance computing cluster.  There were several talks about ongoing epidemiological studies, including the Kaiser Permanente UCSF cohort.  Neil Risch presented an elegant study design implementing four custom GWAS chips for the four targeted populations.  Looks like the data hasn't started to flow from this yet, but when it does we’re sure to learn about lots of interesting ethnic-specific disease effects.  My good friend and colleague Dana Crawford presented an &lt;i&gt;in silico&lt;/i&gt; GWAS study of hypothyroidism.  In her best NPR voice, Dana showed how electronic medical records with GWAS data in the &lt;a href="https://www.mc.vanderbilt.edu/victr/dcc/projects/acc/index.php/Main_Page"&gt;eMERGE&lt;/a&gt; network can be re-used to construct entirely new studies nested within the data collected for other specific disease purposes.  Her excellent postdoc, Logan Dumitrescu, presented several gene-environment interactions between lipid levels and vitamins A and E from Dana’s EAGLE study.  
Finally, Paul O’Reilly presented a cool new way to look at multiple phenotypes by essentially flipping a typical regression equation around, estimating coefficients that relate each phenotype in a study to a single SNP genotype as an outcome.  This rather clever approach, called &lt;a href="http://cran.r-project.org/web/packages/MultiPhen/index.html"&gt;MultiPhen&lt;/a&gt;, is similar to log-linear models I’ve seen used for transmission-based analysis, and allows you to model the “interaction” among phenotypes in much the same way you would look at SNP interactions.&lt;br /&gt;
&lt;br /&gt;
&amp;nbsp;By far the most interesting talks of the meeting (for me) were in the Genomics section on Gene Expression, organized by Tomi Pastinen and Mark Corbett.  Chris Mason started the session off with a fantastic demonstration of the power of RNA-seq.  Examining transcriptomes of 14 non-human primate species, they validated many of the computational predictions in the &lt;a href="http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/"&gt;AceView&lt;/a&gt; gene build, and illustrated that most “exome” sequencing is probably examining less than half of all transcribed sequences.  Rupali Patwardhan talked about a system for examining the impact of promoter and enhancer mutations in whole mice, essentially using mutagenesis screens to localize these regions.  Ron Hause presented work on the protein QTLs that Nancy Cox alluded to earlier in the conference.  Using a high-throughput form of western blots, they systematically examined levels for over 400 proteins in the Yoruba HapMap cell lines.  They also showed that only about 50% of eQTLs identified in these lines actually alter protein levels.  Stephen Montgomery spoke about the impact of rare genetic variants within a transcript on transcript levels.  Essentially he showed an epistatic effect on expression, where transcripts with deleterious alleles are less likely to be expressed – an intuitive and fascinating finding, especially for those considering rare-variant analysis. Athma Pai presented a new type of QTL that influences mRNA decay rates.  By measuring multiple time points using RNA-seq, she found individual-level variants that alter decay, which she calls dQTLs.  Veronique Adoue looked at cis-eQTLs relative to transcription factor binding sites using ChIP, and Alfonso Buil showed how genetic variants influence gene expression networks (or correlation among gene expression) across tissue types.&lt;br /&gt;
&lt;br /&gt;
&amp;nbsp;I must say, despite all the awesome work presented in this session, Michael Snyder stole the show with his talk on the “Snyderome” – his own personal –omics profile collected over 21 months.  His whole genome was sequenced by Complete Genomics, and processed using Rong Chen and Atul Butte’s risk-o-gram to quantify his disease risk.  His profile predicted increased risk of T2D, so he began collecting glucose measures and, lo and behold, he saw a sustained spike in blood glucose levels a few days after a common cold.  His interpretation was that an environmental stress knocked him into a pseudo-diabetic state, and his transcriptome and proteome results corroborated this idea.  Granted, this is an N of 1, and there is still lots of work to be done before this type of analysis revolutionizes medicine, but the take-home message is salient – multiple -omics are better than one, and everyone’s manifestation of a complex disease is different.  This was truly thought-provoking work, and it nicely closed an entire session devoted to understanding the intermediate impact of genetic variants to better understand disease complexity.  


&lt;br /&gt;
&lt;br /&gt;
This is just my take of a really great meeting -- I'm sure I missed lots of excellent talks. &amp;nbsp;If you saw something good please leave a comment and share!&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-1475248792265832281?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/QRGgNIRXLSI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/1475248792265832281/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/10/my-thoughts-on-ichg-2011.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1475248792265832281?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1475248792265832281?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/10/my-thoughts-on-ichg-2011.html" title="My thoughts on ICHG 2011" /><author><name>Will</name><uri>http://www.blogger.com/profile/09703349044940180835</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-85pQXz9VB8E/TpyArwOEY6I/AAAAAAAABKA/bhYrzX2qAhM/s72-c/ichg_pg.png" height="72" width="72" /><thr:total>1</thr:total></entry><entry 
gd:etag="W/&quot;DEQEQX84fyp7ImA9WhdbEEU.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-7553789175644076917</id><published>2011-10-08T09:54:00.003-05:00</published><updated>2011-10-08T10:05:00.137-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-08T10:05:00.137-05:00</app:edited><title>Find me at ICHG!</title><content type="html">This week, I'm off to Montreal for the International Congress on Human Genetics and I hope to see you there!  &lt;br /&gt;&lt;br /&gt;If you are attending and are already a part of the &lt;a href="http://www.twitter.com"&gt;Twitterverse&lt;/a&gt;, bring a tablet or phone and tweet away about the meeting using the official hashtag,  #ICHG2011.  If you are new to twitter, go &lt;a href="http://twitter.com/"&gt;sign up&lt;/a&gt;!  Using nearly any twitter application, you can search for tweets that contain the #ICHG2011 hashtag and follow the thoughts of your fellow conference goers.  &lt;br /&gt;&lt;br /&gt;Using Twitter at an academic conference is a fascinating experience!  Not only can you get fantastic information about what is being presented at the multitude of sessions, you get lots of opinion on what is going on in the field, and sometimes practically useful tips, like the location of the nearest &lt;a href="http://maps.google.com/maps?hl=en&amp;prmd=imvns&amp;resnum=1&amp;bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&amp;biw=1212&amp;bih=794&amp;um=1&amp;ie=UTF-8&amp;q=Starbucks+Montreal&amp;fb=1&amp;gl=us&amp;hq=Starbucks&amp;hnear=0x4cc91a541c64b70d:0x654e3138211fefef,Montreal,+QC,+Canada&amp;ei=XWWQTriFKYHAtgeix6SlDA&amp;sa=X&amp;oi=local_group&amp;ct=image&amp;ved=0CAkQtgM"&gt;Starbucks&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If you see me wandering aimlessly around the conference center, please come say hello!  
It's always great to make new academic friends.&lt;br /&gt;&lt;br /&gt;See you there!&lt;br /&gt;&lt;br /&gt;Will&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-7553789175644076917?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/X-FiRuyDvA0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/7553789175644076917/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/10/find-me-at-ichg.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7553789175644076917?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7553789175644076917?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/10/find-me-at-ichg.html" title="Find me at ICHG!" 
/><author><name>Will</name><uri>http://www.blogger.com/profile/09703349044940180835</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;AkYNQX09eip7ImA9WhdUE04.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-7225595639745584053</id><published>2011-09-29T11:41:00.009-05:00</published><updated>2011-09-29T18:16:30.362-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-29T18:16:30.362-05:00</app:edited><title>The Utility of Network Analysis</title><content type="html">Like most bioinformatics nerds (or anyone with a facebook account), I’m fascinated by networks.  Most people immediately think of protein-protein interaction networks, or biological pathways when thinking about networks, but sometimes representing a problem as a network makes solving problems easier.  &lt;br /&gt;&lt;br /&gt;Recently, some collaborators from the &lt;a href="http://www.pagestudy.org"&gt;PAGE study&lt;/a&gt; had a list of a few hundred SNPs gathered from multiple loci across the genome.  For analysis purposes, they were interested in quantifying the number of loci these SNPs represented – in other words, how many distinct signals were represented by their collection of SNPs.  &lt;br /&gt;&lt;br /&gt;We had linkage disequilibrium data from the HapMap for all pairs of SNPs, and we filtered this using an r-squared cutoff.  What we were left with was a mess of SNP pairs that could be tedious to sort through in a spreadsheet.  Instead, I represented each pair of SNPs as an edge in a network and loaded the data into &lt;a href="http://gephi.org/"&gt;Gephi&lt;/a&gt;, which provides some wonderful analysis tools.  
Suppose my LD data is structured like this:&lt;br /&gt;&lt;span style="font-family:arial"&gt;&lt;br /&gt;&lt;table border="0"&gt;&lt;tr&gt;&lt;td&gt;&lt;span style="font-weight:bold;"&gt;SNP1&lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-weight:bold;"&gt;SNP2&lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-weight:bold;"&gt;d-prime&lt;/span&gt;&lt;/td&gt;&lt;td&gt;&lt;span style="font-weight:bold;"&gt;r-squared&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;16969968&lt;/td&gt;&lt;td&gt;1051730&lt;/td&gt;&lt;td&gt;0.98&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;2036534&lt;/td&gt;&lt;td&gt;1051730&lt;/td&gt;&lt;td&gt;0.92&lt;/td&gt;&lt;td&gt;0.205&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;578776&lt;/td&gt;&lt;td&gt;1051730&lt;/td&gt;&lt;td&gt;0.96&lt;/td&gt;&lt;td&gt;0.23&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;8034191&lt;/td&gt;&lt;td&gt;1051730&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0.961&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;8042374&lt;/td&gt;&lt;td&gt;1051730&lt;/td&gt;&lt;td&gt;0.99&lt;/td&gt;&lt;td&gt;0.205&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;tr&gt;&lt;td&gt;...&lt;/td&gt;&lt;td&gt;...&lt;/td&gt;&lt;td&gt;...&lt;/td&gt;&lt;td&gt;...&lt;/td&gt;&lt;/tr&gt;&lt;br /&gt;&lt;/table&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In a spreadsheet application, I sorted and filtered the LD pairings I wanted using either the r-squared or the d-prime columns.  I then deleted any rows that didn’t meet my cutoff, renamed the header for SNP1 to “Source” and SNP2 to “Target”, and exported the file as a comma-separated file (.csv).  I opened &lt;a href="http://gephi.org/"&gt;Gephi&lt;/a&gt;, clicked the “Data Laboratory” tab, and Import Spreadsheet to load my data.  
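&lt;br /&gt;&lt;br /&gt;If you would rather script the filtering step than do it in a spreadsheet, here is a minimal Python sketch. It assumes the LD table is available as comma-separated text with the column headers shown above. The 0.8 r-squared cutoff and the function names are illustrative placeholders, not values from the analysis described here.&lt;br /&gt;

```python
import csv
import io


def filter_ld_pairs(rows, r2_cutoff=0.8):
    """Keep SNP pairs whose r-squared meets the cutoff, as (source, target) edges."""
    edges = []
    for row in rows:
        if float(row["r-squared"]) >= r2_cutoff:
            edges.append((row["SNP1"], row["SNP2"]))
    return edges


def write_edge_csv(edges, handle):
    """Write the edges with the Source/Target headers used for the Gephi import."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)


# The example rows from the table above, as CSV text.
ld_text = """SNP1,SNP2,d-prime,r-squared
16969968,1051730,0.98,1
2036534,1051730,0.92,0.205
578776,1051730,0.96,0.23
8034191,1051730,1,0.961
8042374,1051730,0.99,0.205
"""

rows = list(csv.DictReader(io.StringIO(ld_text)))
edges = filter_ld_pairs(rows, r2_cutoff=0.8)  # cutoff is an assumed value

out = io.StringIO()
write_edge_csv(edges, out)
print(out.getvalue())
```

To produce a file for Gephi, pass an open file handle to write_edge_csv in place of the in-memory buffer.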
&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-ygNQoXfodMI/ToT5gfVG4xI/AAAAAAAABIs/yISdP-0G7fs/s1600/gephi1.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 314px;" src="http://2.bp.blogspot.com/-ygNQoXfodMI/ToT5gfVG4xI/AAAAAAAABIs/yISdP-0G7fs/s400/gephi1.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5657921368445346578" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Once loaded, I clicked on the “Overview” tab and I can see my graph.  The graph looks like a big mess, but we don’t really care how it looks – we’re going to run an analysis.  In the “statistics” tab on the right-hand side, you’ll see an option for “connected components”.  This runs an algorithm that picks apart and labels collections of nodes that are connected.  Running this only takes a second.  &lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/-7P1BN7bX28o/ToT5uYkubvI/AAAAAAAABI0/t3GJ2acQGDM/s1600/gephi2.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 315px;" src="http://2.bp.blogspot.com/-7P1BN7bX28o/ToT5uYkubvI/AAAAAAAABI0/t3GJ2acQGDM/s400/gephi2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5657921607149973234" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I then click on the “Data Laboratory” tab again, and I can see that my nodes are labeled with an ID.  This corresponds to the Locus those SNPs represent.  
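&lt;br /&gt;&lt;br /&gt;For readers curious what a connected components analysis does, here is a rough Python sketch using a union-find structure. This is not Gephi's own code, only an illustration of the general idea, and the SNP names in the example are made up.&lt;br /&gt;

```python
def connected_components(edges):
    """Label each node with a component ID; nodes joined by any path share an ID."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root.
        parent.setdefault(node, node)
        root = node
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path straight at the root.
        while parent[node] != root:
            next_node = parent[node]
            parent[node] = root
            node = next_node
        return root

    for source, target in edges:
        root_source, root_target = find(source), find(target)
        if root_source != root_target:
            parent[root_target] = root_source

    labels = {}
    component_of = {}
    for node in list(parent):
        root = find(node)
        component_of[node] = labels.setdefault(root, len(labels))
    return component_of


# Hypothetical SNP pairs that passed an LD filter.
example_edges = [("rs1", "rs2"), ("rs3", "rs2"), ("rs4", "rs5")]
components = connected_components(example_edges)
n_loci = len(set(components.values()))
print(components)
print("distinct loci:", n_loci)
```

The number of distinct component IDs is the number of loci, which is the quantity the collaborators were after.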
&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/-5laCf3ZSY0M/ToT6h9gJfFI/AAAAAAAABI8/680gMCKJSzA/s1600/gephi3.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 314px;" src="http://3.bp.blogspot.com/-5laCf3ZSY0M/ToT6h9gJfFI/AAAAAAAABI8/680gMCKJSzA/s400/gephi3.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5657922493236214866" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;If you want to actually SEE how these relationships fall out, we’ll need to run a layout engine.  Back on the “Overview” tab, on the lower left-hand side, there is a drop-down allowing you to choose a layout engine.  I have found Yifan Hu’s Multilevel to be the quickest and most effective for separating small groups like these.  Depending on the size of your graph, it may take a moment to run.  Once it’s finished, you should be able to see the components clearly separated.  If you want, you can color code them by clicking the green “refresh” button in the “partition” tab in the upper left corner.  This reloads the drop-down menu and will provide you with an option to color the nodes by component ID.  Select this, and click apply to see the results!  
&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/-INapvDuRwS0/ToT6pbD0E3I/AAAAAAAABJE/V-YDNOQuElY/s1600/gephi4.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 324px;" src="http://4.bp.blogspot.com/-INapvDuRwS0/ToT6pbD0E3I/AAAAAAAABJE/V-YDNOQuElY/s400/gephi4.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5657922621429519218" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I’ve used Gephi component analysis to do all kinds of fun things, like counting the number of families in a study using pairwise IBD estimates, looking at patterns of phenotype sharing in pedigrees, and even visualizing citation networks.  Sometimes representing a problem as a graph lets you find patterns more easily than examining tables of numbers.&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-7225595639745584053?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/RFisqV6uC7c" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/7225595639745584053/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/utility-of-network-analysis.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7225595639745584053?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7225595639745584053?v=2" /><link rel="alternate" type="text/html" 
href="http://gettinggeneticsdone.blogspot.com/2011/09/utility-of-network-analysis.html" title="The Utility of Network Analysis" /><author><name>Will</name><uri>http://www.blogger.com/profile/09703349044940180835</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-ygNQoXfodMI/ToT5gfVG4xI/AAAAAAAABIs/yISdP-0G7fs/s72-c/gephi1.png" height="72" width="72" /><thr:total>2</thr:total></entry><entry gd:etag="W/&quot;C0QCRno6cSp7ImA9WhdWFUw.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-6681360119986198582</id><published>2011-09-08T14:49:00.000-05:00</published><updated>2011-09-08T14:49:27.419-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-08T14:49:27.419-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Twitter" /><category scheme="http://www.blogger.com/atom/ns#" term="Announcements" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>I'm Starting a New Position at the University of Virginia</title><content type="html">I just accepted an offer for a faculty position at the University of Virginia in the Center for Public Health Genomics / Department of Public Health Sciences. Starting in October I will be developing and directing a new centralized bioinformatics core in the UVA School of Medicine. Over the next few weeks I'm taking a much-needed vacation next door in Kauai and then packing up for the move to Charlottesville. Posts here may be sparse over the next few weeks, but once I start my new gig I'll be sure to make up for it. 
And if you're bioinformatics-savvy and on the job market, keep an eye out here. Once I figure out what I need I will be hiring, and I will post any job announcements here.&lt;br /&gt;
&lt;br /&gt;
I've enjoyed my postdoc here at the University of Hawaii Cancer Center, and there is much I'll miss about island life out here in the Pacific. But I'm very seriously looking forward to getting started in this wonderful opportunity at UVA. Thank you all for your comments, suggestions, and help when I needed it. I'll be back online in a few weeks - until then, follow me on Twitter (&lt;a href="http://twitter.com/#%21/genetics_blog"&gt;@genetics_blog&lt;/a&gt;).&lt;br /&gt;
&lt;br /&gt;
Aloha!&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-6681360119986198582?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/OExPXrX_1aQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/6681360119986198582/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/im-starting-new-position-at-university.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/6681360119986198582?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/6681360119986198582?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/im-starting-new-position-at-university.html" title="I'm Starting a New Position at the University of Virginia" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;DUYGQ30-fip7ImA9WhdWFU0.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-1065081340277534395</id><published>2011-09-08T13:38:00.000-05:00</published><updated>2011-09-08T13:38:42.356-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-08T13:38:42.356-05:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="Recommended Reading" /><category scheme="http://www.blogger.com/atom/ns#" term="Sequencing" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>True Hypotheses are True, False Hypotheses are False</title><content type="html">I just read Gregory Cooper and Jay Shendure's review &lt;a href="http://www.nature.com/nrg/journal/v12/n9/full/nrg3046.html"&gt;"Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data"&lt;/a&gt; in Nature Reviews Genetics. It's a good review about how to narrow down deleterious disease-causing variants from many, many variants throughout the genome when statistics and genetic information alone aren't enough.&lt;br /&gt;
&lt;br /&gt;
I really liked how they framed the multiple-testing problem that routinely plagues large-scale genetic studies, where nominal significance thresholds can yield many false positives when applied to multiple hypothesis tests:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;
However, true hypotheses are true, and false hypotheses are false, 
regardless of how many are tested. As such, the actual 'multiple testing
 burden' depends on the proportion of true and false hypotheses in any 
given set: that is, the 'prior probability'
 that any given hypothesis is true, rather than the number of tests per 
se. This challenge can thus be viewed as a 'naive hypothesis testing' 
problem — that is, when in reality only one or a few variants are causal
 for a given phenotype, but all (or many) variants are &lt;i&gt;&lt;span class="i"&gt;a priori&lt;/span&gt;&lt;/i&gt;
 equally likely candidates, the prior probability of any given variant 
being causal is miniscule. As a consequence, extremely convincing data 
are required to support causality, which is potentially unachievable for
 some true positives.&lt;br /&gt;
&lt;br /&gt;
Defining the challenge in terms of hypothesis quality rather than 
quantity, however, points to a solution. Specifically, experimental or 
computational approaches that provide assessments of variant function 
can be used to better estimate the prior probability that any given 
variant is phenotypically important, and these approaches thereby boost 
discovery power.&lt;/blockquote&gt;
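&lt;br /&gt;
To put rough numbers on the prior probability argument, here is a small Python sketch that applies Bayes' rule to a single significant test. The formula is the standard one, not something taken from the review, and the power and significance threshold values are arbitrary assumptions chosen for illustration.&lt;br /&gt;

```python
def posterior_true(prior, power, alpha):
    """Probability a hypothesis is true given a significant result (Bayes' rule).

    prior: probability the hypothesis is true before testing
    power: probability of a significant result when it is true
    alpha: probability of a significant result when it is false
    """
    true_positive = power * prior
    false_positive = alpha * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Assumed values: 80% power and a nominal 0.05 threshold.
for prior in (1e-6, 1e-3, 0.1):
    print(prior, posterior_true(prior, power=0.8, alpha=0.05))
```

With the test itself held fixed, raising the prior (for example by using functional annotation to prioritize variants) raises the chance that a significant hit is real.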
&lt;br /&gt;
Check out the full review at Nature Reviews Genetics:&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.nature.com/nrg/journal/v12/n9/full/nrg3046.html"&gt;Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-1065081340277534395?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/m7aWES1mgHA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/1065081340277534395/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/true-hypotheses-are-true-false.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1065081340277534395?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1065081340277534395?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/true-hypotheses-are-true-false.html" title="True Hypotheses are True, False Hypotheses are False" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;DkMCQn8zcCp7ImA9WhdWFE8.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-7003954756642604275</id><published>2011-09-07T14:41:00.000-05:00</published><updated>2011-09-07T14:41:03.188-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-07T14:41:03.188-05:00</app:edited><title>Excel Template for Mapping Four 96-Well Plates to One 384-Well Plate</title><content type="html">Daniel Cook in &lt;a href="http://genetics.uiowa.edu/"&gt;Jeff Murray&lt;/a&gt;'s lab at the University of Iowa put together &lt;a href="http://www.stephenturner.us/96_to_384_platemapper.xlsx"&gt;this handy Excel template&lt;/a&gt; for keeping track of how samples from four 96-well plates are interleaved to configure a single 384-well plate using robotic liquid handling systems, like the &lt;a href="http://www.stephenturner.us/96_to_384_platemapper.xlsx"&gt;Hydra II&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
Paste in lists of samples on your 96-well plates:&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-tCfRzLKF62A/TmfH7P6QeJI/AAAAAAAAjmU/gJV0axFAXrA/s1600/2011-09-07_093420.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-tCfRzLKF62A/TmfH7P6QeJI/AAAAAAAAjmU/gJV0axFAXrA/s1600/2011-09-07_093420.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
And you'll get out a map of the 384-well plate layout:&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-0ubwxDPGWS4/TmfIL1h3HgI/AAAAAAAAjmY/sG_aqexZrSQ/s1600/2011-09-07_093513.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="85" src="http://1.bp.blogspot.com/-0ubwxDPGWS4/TmfIL1h3HgI/AAAAAAAAjmY/sG_aqexZrSQ/s400/2011-09-07_093513.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
And a summary list:&lt;br /&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-d0YjwIGjysY/TmfITIFM7HI/AAAAAAAAjmc/jrT8cVFfEE4/s1600/2011-09-07_093600.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="253" src="http://1.bp.blogspot.com/-d0YjwIGjysY/TmfITIFM7HI/AAAAAAAAjmc/jrT8cVFfEE4/s400/2011-09-07_093600.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
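If you'd rather do the mapping in code, here is a minimal R sketch of the interleaving idea. It assumes the common quadrant layout (plate 1 in odd rows and odd columns, plate 2 in odd rows and even columns, plate 3 in even rows and odd columns, plate 4 in even rows and even columns), which may not match what the template or your liquid handler actually does. The function name map_96_to_384 is made up for illustration.&lt;br /&gt;

```r
# Map a well on one of four 96-well plates (8 rows x 12 columns) to its
# position on a 384-well plate (16 rows x 24 columns).
# ASSUMPTION: quadrant interleaving. Plate 1 fills odd rows/odd columns,
# plate 2 odd rows/even columns, plate 3 even rows/odd columns, and
# plate 4 even rows/even columns. Check this against your own instrument.
map_96_to_384 = function(plate, well) {
  row96 = match(substr(well, 1, 1), LETTERS[1:8])   # A-H becomes 1-8
  col96 = as.integer(substring(well, 2))            # 1-12
  stopifnot(plate %in% 1:4, !is.na(row96), col96 %in% 1:12)
  row_offset = (plate - 1) %/% 2   # 0 for plates 1-2, 1 for plates 3-4
  col_offset = (plate - 1) %% 2    # 0 for plates 1 and 3, 1 for plates 2 and 4
  row384 = 2 * (row96 - 1) + 1 + row_offset
  col384 = 2 * (col96 - 1) + 1 + col_offset
  paste0(LETTERS[row384], col384)
}

map_96_to_384(1, "A1")
map_96_to_384(4, "H12")
```

Under this assumed layout the two calls return "A1" and "P24", the first and last wells of the 384-well plate.&lt;br /&gt;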
&lt;br /&gt;You can &lt;a href="http://www.stephenturner.us/96_to_384_platemapper.xlsx"&gt;download the Excel file here&lt;/a&gt;. Thanks for sharing, Daniel. &lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-7003954756642604275?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/NCvAJY2c5EA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/7003954756642604275/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/excel-template-for-mapping-four-96-well.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7003954756642604275?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7003954756642604275?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/09/excel-template-for-mapping-four-96-well.html" title="Excel Template for Mapping Four 96-Well Plates to One 384-Well Plate" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-tCfRzLKF62A/TmfH7P6QeJI/AAAAAAAAjmU/gJV0axFAXrA/s72-c/2011-09-07_093420.png" height="72" width="72" 
/><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;AkMFSXw8eip7ImA9WhdXGEw.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-3372228763074294102</id><published>2011-08-31T14:15:00.003-05:00</published><updated>2011-08-31T14:20:18.272-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-31T14:20:18.272-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Announcements" /><title>Personal Genomics and Data Sharing Survey</title><content type="html">I was recently contacted by a couple of German biologists working on a project evaluating opinions on sharing raw data from DTC genetic testing companies like 23andme. A handful of people like the gang at &lt;a href="http://www.genomesunzipped.org/data"&gt;Genomes Unzipped&lt;/a&gt;, the &lt;a href="http://www.personalgenomes.org/public/"&gt;PGP-10&lt;/a&gt;, and others at &lt;a href="http://www.snpedia.com/index.php/Genomes"&gt;SNPedia&lt;/a&gt; have released their own genotype or sequencing data into the public domain. As of now, data like this is scattered around the web and most of it is not attached to any phenotype data.&lt;br /&gt;
&lt;br /&gt;
These three biologists are working on a website that collects genetic data as well as phenotypic data. The hope is to make it easy to find and access appropriate data and to become a resource for a kind of open-source GWAS - similar to the &lt;a href="https://www.23andme.com/research/"&gt;research&lt;/a&gt; 23andMe performs in its walled garden right now.&lt;br /&gt;
&lt;br /&gt;
But because of privacy concerns, many people (myself included) hesitate to freely publish their genetic data for the world to see. These three biologists are conducting a survey to assess how willing people might be to participate in something like this, and for what reasons they would (or would not). The survey can be accessed at &lt;a href="http://bit.ly/genotyping_survey"&gt;http://bit.ly/genotyping_survey&lt;/a&gt;. It took about 2 minutes for me to complete, and you can optionally sign up to receive an email with their results once they've completed the survey.&lt;br /&gt;
&lt;br /&gt;
Although I'm still hesitant to participate in something like this myself, I like the idea, and I'm very interested to see the results of their survey. Hit the link below if you'd like to take the quick survey. &lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://bit.ly/genotyping_survey"&gt;Personal Genomics and Data Sharing Survey&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-3372228763074294102?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/mQ71wslo4sQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/3372228763074294102/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/personal-genomics-and-data-sharing.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3372228763074294102?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3372228763074294102?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/personal-genomics-and-data-sharing.html" title="Personal Genomics and Data Sharing Survey" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry gd:etag="W/&quot;DUIGSXo5cCp7ImA9WhdXFkk.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-7007398607800893924</id><published>2011-08-29T14:52:00.000-05:00</published><updated>2011-08-29T14:52:08.428-05:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2011-08-29T14:52:08.428-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>Bioinformatics Posters Collection</title><content type="html">I mentioned &lt;a href="http://biostar.stackexchange.com/"&gt;BioStar&lt;/a&gt; in a &lt;a href="http://gettinggeneticsdone.blogspot.com/2011/02/get-all-your-questions-answered.html"&gt;previous post about getting all your questions answered&lt;/a&gt;. I can't emphasize enough how helpful the BioStar and other StackExchange communities are. Whenever I ask a statistics question on &lt;a href="http://stats.stackexchange.com/"&gt;CrossValidated&lt;/a&gt; or a programming question on &lt;a href="http://stackoverflow.com/"&gt;StackOverflow&lt;/a&gt;, I often get multiple answers within 10 minutes.&lt;br /&gt;
&lt;br /&gt;
Recently there was a &lt;a href="http://biostar.stackexchange.com/questions/10597/bioinformatics-posters-collection"&gt;question&lt;/a&gt; on BioStar from someone who was making a poster for a bioinformatics poster presentation and wanted some inspiration for design and layout. No fewer than 7 community members posted responses the same day, linking to sites where you can download poster presentations, including &lt;a href="http://vizbi.org/2011/Posters/Collection/?poster=A05"&gt;VIZBI 2011&lt;/a&gt; (workshop on visualizing biological data), &lt;a href="http://posters.f1000.com/PosterList?facID=8001"&gt;F1000 Posters&lt;/a&gt; (which collects posters from the Intelligent Systems for Molecular Biology conference), &lt;a href="http://precedings.nature.com/documents/type/poster/revisions"&gt;Nature Precedings&lt;/a&gt; (not specifically limited to bioinformatics), and several others.&lt;br /&gt;
&lt;br /&gt;
While you can see plenty of posters at the meeting you're attending, that isn't much help when you're trying to design and lay out your poster beforehand. I've used the same tired old template for poster presentations for years, and it's helpful to see examples of other bioinformatics posters for fresh ideas about design and layout.&lt;br /&gt;
&lt;br /&gt;
I would also encourage you to deposit some of your posters in places like &lt;a href="http://posters.f1000.com/Index?page=Deposit"&gt;F1000&lt;/a&gt; (deposit link) or Nature Precedings (&lt;a href="http://precedings.nature.com/documents/new"&gt;submission link&lt;/a&gt;). While these aren't peer-reviewed, depositing a poster can really increase the visibility of your work, and it gives you a permanent DOI (at least for Nature Precedings) that you can link to or reference in other scientific communication.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://biostar.stackexchange.com/questions/10597/bioinformatics-posters-collection"&gt;See this Q&amp;amp;A at BioStar for more&lt;/a&gt;. &lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-7007398607800893924?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/g1BAGi4G4Wo" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/7007398607800893924/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/bioinformatics-posters-collection.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7007398607800893924?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7007398607800893924?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/bioinformatics-posters-collection.html" title="Bioinformatics Posters Collection" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry><entry 
gd:etag="W/&quot;CUIHSHo-eyp7ImA9WhdXEUw.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-3341529212990763250</id><published>2011-08-22T17:47:00.007-05:00</published><updated>2011-08-23T10:32:19.453-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-23T10:32:19.453-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Recommended Reading" /><category scheme="http://www.blogger.com/atom/ns#" term="Software" /><category scheme="http://www.blogger.com/atom/ns#" term="GWAS" /><title>Estimating Trait Heritability from GWAS Data</title><content type="html">Peter Visscher and colleagues have recently published a flurry of papers employing a new software package called GCTA to estimate the heritability of traits using GWAS data (GCTA stands for Genome-wide Complex Trait Analysis -- clever acronymity!).  The tool, supported (and presumably coded) by Jian Yang is remarkably easy to use, based in part on the familiar PLINK commandline interface.  The &lt;a href="http://gump.qimr.edu.au/gcta/"&gt;GCTA Homepage&lt;/a&gt; provides an excellent walk-through of the available options.  
&lt;br /&gt;
&lt;br /&gt;The basic idea is to use GWAS data to estimate the degree of "genetic sharing" or relatedness among the samples, computing what the authors call a genetic relationship matrix (GRM).  The degree of genetic sharing among samples is then related to the amount of phenotypic sharing using restricted maximum likelihood analysis (REML).  The result is an estimate of the variance explained by the SNPs used to generate the GRM.  Full details of the stats along with all the gory matrix notation can be found in their &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/21167468"&gt;software publication&lt;/a&gt;.
&lt;br /&gt;
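&lt;br /&gt;To make the GRM idea concrete, here is a toy R sketch (not GCTA's code) that simulates genotypes and computes pairwise relatedness as the average product of standardized genotypes, which is the basic form of the estimator described in the software publication. The simulated data and variable names are illustrative assumptions, and the sketch treats the diagonal the same way as the off-diagonal entries, which is a simplification.&lt;br /&gt;

```r
# Toy genetic relationship matrix (GRM) from simulated genotypes.
# geno: individuals in rows, SNPs in columns, coded as 0/1/2 allele copies.
# ASSUMPTION: simulated unrelated individuals; all names are illustrative.
set.seed(1)
n_ind = 50
n_snp = 200
freq = runif(n_snp, 0.1, 0.9)
geno = sapply(freq, function(p) rbinom(n_ind, 2, p))

p_hat = colMeans(geno) / 2                 # sample allele frequencies
keep = p_hat * (1 - p_hat) != 0            # drop monomorphic SNPs
geno = geno[, keep, drop = FALSE]
p_hat = p_hat[keep]

# Standardize each SNP: (x - 2p) / sqrt(2p(1 - p))
z = sweep(geno, 2, 2 * p_hat, "-")
z = sweep(z, 2, sqrt(2 * p_hat * (1 - p_hat)), "/")

# Average the products of standardized genotypes over SNPs
grm = tcrossprod(z) / ncol(z)              # n_ind x n_ind matrix
dim(grm)
```

Off-diagonal entries of grm estimate how much more (or less) two individuals share genetically than expected by chance; relating these values to phenotypic similarity is the job of the REML step, which this sketch does not attempt.&lt;br /&gt;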
&lt;br /&gt;The approach has been applied to &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed?term=Estimating%20Missing%20Heritability%20for%20Disease%20from%20Genome-wide%20Association%20Studies"&gt;several disorders studied by the WTCCC&lt;/a&gt; and to a recent study of &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/20562875"&gt;human height&lt;/a&gt;.  Interestingly, the developers have also used the approach to partition the trait variance &lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/21552263"&gt;across chromosomes&lt;/a&gt;, resulting in something similar to population-based variance-components linkage analysis.  The approach works for both quantitative and dichotomous traits; however, the authors warn that variance estimates of dichotomous trait liability are influenced by genotyping artifacts.
&lt;br /&gt;
&lt;br /&gt;The package also includes several other handy features, including a relatively easy way to estimate principal components for population structure correction, a GWAS simulation tool, and a regression-based LD mapping tool.  &lt;a href="http://gump.qimr.edu.au/gcta/download.html"&gt;Download&lt;/a&gt; and play -- a binary is available for Linux, MacOS, and DOS/Windows.
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-3341529212990763250?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/nG7N53-PSMQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/3341529212990763250/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/estimating-trait-heritability-from-gwas.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3341529212990763250?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3341529212990763250?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/estimating-trait-heritability-from-gwas.html" title="Estimating Trait Heritability from GWAS Data" /><author><name>Will</name><uri>http://www.blogger.com/profile/09703349044940180835</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" /></author><thr:total>1</thr:total></entry><entry gd:etag="W/&quot;CE8BRXczeCp7ImA9WhdQFE4.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-1817658359852315957</id><published>2011-08-15T13:27:00.000-05:00</published><updated>2011-08-15T13:27:34.980-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-15T13:27:34.980-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Productivity" /><category 
scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Sync Your Rprofile Across Multiple R Installations</title><content type="html">Your Rprofile is a script that R executes every time you launch an R session. You can use it to automatically load packages, set your working directory, set options, define useful functions, set up database connections, and run any other code you want every time you start R.&lt;br /&gt;
&lt;br /&gt;
If you're using R on Linux, it's a hidden file in your home directory called &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;~/.Rprofile&lt;/span&gt;, and if you're on Windows, it's usually in the program files directory: &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;C:\Program Files\R\R-2.12.2\library\base\R\Rprofile&lt;/span&gt;. I sync my Rprofile across several machines and operating systems by creating a separate script called syncprofile.R and storing it in my &lt;a href="https://www.dropbox.com/referrals/NTY2MjgxOQ?src=global9"&gt;Dropbox&lt;/a&gt;. Then, on each machine, I edit the real Rprofile to source the syncprofile.R script that resides in my Dropbox.&lt;br /&gt;
&lt;br /&gt;
One of the disadvantages of doing this, however, is that all the functions you define and variables you create are sourced into the global environment (.GlobalEnv). This can clutter your workspace, and if you want to start clean using &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;rm(list=ls(all=TRUE))&lt;/span&gt;, you'll have to re-source your syncprofile.R script every time.&lt;br /&gt;
&lt;br /&gt;
It's easy to get around this problem. Rather than simply appending source("/path/to/dropbox/syncprofile.R") to the end of your actual Rprofile, first create a new environment, source that script into that new environment, and attach that new environment. So you'll add this to the end of your real Rprofile on each machine/installation:&lt;br /&gt;
&lt;br /&gt;
&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;my.env &amp;lt;- new.env()&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;sys.source("C:/Users/st/Dropbox/R/Rprofile.r", my.env)&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;attach(my.env)&lt;/div&gt;&lt;br /&gt;
All the functions and variables you've defined are now available but they no longer clutter up the global environment.&lt;br /&gt;
&lt;br /&gt;
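Here is a small self-contained sketch of that behavior that you can run without touching your real Rprofile. A throwaway script in a temporary file stands in for syncprofile.R, and the function name hello_profile is made up for the demonstration.&lt;br /&gt;

```r
# Stand-in for the synced profile: write a tiny script to a temporary file.
# ASSUMPTION: in practice this would be the syncprofile.R in your Dropbox.
profile_path = tempfile(fileext = ".R")
writeLines("hello_profile = function() 'profile loaded'", profile_path)

# Source the script into its own environment, then attach that environment
my.env = new.env()
sys.source(profile_path, envir = my.env)
attach(my.env)

"hello_profile" %in% ls(globalenv())   # is it cluttering the workspace?
hello_profile()                        # callable via the search path
```

Because attach() puts a copy of the environment on the search path, the function stays callable even though it never appears in the listing of the global environment.&lt;br /&gt;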
If you have code that you only want to run on specific machines, you can still put that into each installation's Rprofile rather than the syncprofile.R script that you sync using Dropbox. Here's what my &lt;a href="https://gist.github.com/1141346"&gt;syncprofile.R&lt;/a&gt; script looks like - feel free to take whatever looks useful to you.&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/1141346.js?file=Rprofile.R"&gt;
&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-1817658359852315957?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/-Fyxqzgnouw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/1817658359852315957/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/sync-your-rprofile-across-multiple-r.html#comment-form" title="10 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1817658359852315957?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/1817658359852315957?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/sync-your-rprofile-across-multiple-r.html" title="Sync Your Rprofile Across Multiple R Installations" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>10</thr:total></entry><entry gd:etag="W/&quot;DkUMRXcyeyp7ImA9WhdRFUs.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-3256760917683539130</id><published>2011-08-05T01:31:00.001-05:00</published><updated>2011-08-05T12:11:24.993-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-05T12:11:24.993-05:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="Tutorials" /><category scheme="http://www.blogger.com/atom/ns#" term="Statistics" /><category scheme="http://www.blogger.com/atom/ns#" term="Recommended Reading" /><category scheme="http://www.blogger.com/atom/ns#" term="Twitter" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><category scheme="http://www.blogger.com/atom/ns#" term="Bioinformatics" /><title>Friday Links: R, OpenHelix Bioinformatics Tips, 23andMe, Perl, Python, Next-Gen Sequencing</title><content type="html">&lt;i&gt;I haven't posted much here recently, but here is a roundup of a few of the links I've shared on Twitter (&lt;a href="http://twitter.com/#%21/genetics_blog"&gt;@genetics_blog&lt;/a&gt;) over the last two weeks.&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
Here is a &lt;a href="http://watson.nci.nih.gov/%7Esdavis/tutorials/publicdatatutorial/"&gt;nice tutorial&lt;/a&gt; on &lt;b&gt;accessing high-throughput public data&lt;/b&gt; (from NCBI) using R and Bioconductor.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;&lt;a href="http://cloudnumbers.com/"&gt;Cloudnumbers.com&lt;/a&gt;&lt;/b&gt;, a startup that allows you to run high-performance computing (HPC) applications in the cloud, now supports the&lt;a href="http://gettinggeneticsdone.blogspot.com/2011/02/rstudio-new-free-ide-for-rstats.html"&gt; previously mentioned&lt;/a&gt; R IDE, &lt;a href="http://www.rstudio.org/"&gt;RStudio&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;23andMe &lt;/b&gt;announced a project to enroll 10,000 African-Americans for research by giving participants their personal genome service for free. You can read about it &lt;a href="https://www.23andme.com/roots/"&gt;here at 23andMe&lt;/a&gt; or &lt;a href="http://www.wired.com/wiredscience/2011/07/personal-genomics-no-longer-just-for-rich-white-folks/"&gt;here at Genetic Future&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
Speaking of 23andMe, they emailed me a coupon code (8WR9U9) for getting &lt;b&gt;$50 off their personal genome service&lt;/b&gt;, making it $49 instead of $99. Not sure how long it will last. &lt;br /&gt;
&lt;br /&gt;
I previously &lt;a href="http://gettinggeneticsdone.blogspot.com/2011/02/results-from-reference-management-poll.html"&gt;took a poll&lt;/a&gt; which showed that most of you use &lt;b&gt;Mendeley &lt;/b&gt;to manage your references. Mendeley recently released version 1.0, which includes some nice features like duplicate detection, better library organization (subfolders!), and a better file organization tool. You can &lt;a href="http://www.mendeley.com/download-mendeley-desktop/"&gt;download it here&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
An interesting &lt;a href="http://www.bioinformaticszen.com/philosophy/bioinformatics-wide-set-of-transferable-skills/"&gt;blog post&lt;/a&gt; by Michael Barton on how training and experience in &lt;b&gt;bioinformatics &lt;/b&gt;leads to a wide set of transferable skills.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Dienekes &lt;/b&gt;&lt;a href="http://dienekes.blogspot.com/2011/07/diy-dodecad.html"&gt;releases a free DIY admixture program&lt;/a&gt; to analyze genomic ancestry.&lt;br /&gt;
&lt;br /&gt;
A few tips from &lt;b&gt;OpenHelix&lt;/b&gt;: the new &lt;a href="http://blog.openhelix.eu/?p=9185"&gt;SIB Bioinformatics Resource Portal&lt;/a&gt;, and testing correlation between SNPs and gene expression using &lt;a href="http://blog.openhelix.eu/?p=9275"&gt;SNPexp&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
A nice &lt;a href="http://www.nejm.org/action/showMediaPlayer?doi=10.1056%2FNEJMoa1106920&amp;amp;aid=NEJMoa1106920_attach_1&amp;amp;area=aop"&gt;animation&lt;/a&gt; describing a Circos plot from&lt;b&gt; PacBio's &lt;i&gt;E. coli &lt;/i&gt;paper &lt;/b&gt;in NEJM.&lt;br /&gt;
&lt;br /&gt;
The Court of Appeals for the Federal Circuit reversed the lower court's invalidation of Myriad Genetics' patents on &lt;b&gt;BRCA1/2&lt;/b&gt;, reinstating most of the claims in full force. &lt;a href="http://www.genomicslawreport.com/index.php/2011/07/31/pigs-return-to-earth-federal-circuit-reinstates-most-but-not-all-of-myriads-patents/"&gt;Thoughtful analysis from Dan Vorhaus here&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
Using the Linux shell and &lt;b&gt;Perl to delete files&lt;/b&gt; in the current directory that don't contain the right number of lines: if you want to get rid of all .txt files in the current directory that don't have exactly 42 lines, run this code at the command line (*be very careful with this one!*): &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;for f in *.txt; do perl -ne 'END{unlink $ARGV unless $.==42}' "${f}"; done&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
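If you'd like to see which files would be affected before deleting anything, a rough dry-run equivalent might look like the sketch below. Note that it swaps Perl for R and only lists files rather than unlinking them; the function name files_without_n_lines is hypothetical.&lt;br /&gt;

```r
# Dry run: list the .txt files in a directory that do not have exactly
# n lines. Nothing is deleted.
# ASSUMPTION: files are small enough to read fully with readLines().
files_without_n_lines = function(dir = ".", n = 42) {
  files = list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  n_lines = vapply(
    files,
    function(f) length(readLines(f, warn = FALSE)),
    integer(1)
  )
  files[n_lines != n]
}

# Inspect the result first; only then decide whether to remove anything
files_without_n_lines(".", 42)
```

Once you're happy with the list, you could pass it to file.remove(), but checking the output by eye first is the whole point of the dry run.&lt;br /&gt;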
The &lt;a href="http://gettinggeneticsdone.blogspot.com/2011/05/golden-helix-hitchhikers-guide-to-next.html"&gt;previously mentioned&lt;/a&gt; &lt;b&gt;Hitchhiker's Guide&lt;/b&gt; to Next-Generation Sequencing by Gabe Rudy at Golden Helix is now available in &lt;a href="http://j.mp/pchrJ7"&gt;PDF format here&lt;/a&gt;. You can also find the related post describing all the various file formats used in NGS in &lt;a href="http://j.mp/qRjG9w"&gt;PDF format here&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
The Washington Post ran an &lt;a href="http://www.washingtonpost.com/lifestyle/magazine/web-site-offering-free-online-math-lessons-catches-on-like-wildfire/2011/07/15/gIQAtL5KuI_story.html?tid=sm_twitter_washingtonpost"&gt;article&lt;/a&gt; about the &lt;b&gt;Khan Academy&lt;/b&gt; (&lt;a href="http://www.khanacademy.org/"&gt;http://www.khanacademy.org/&lt;/a&gt;), which has thousands of free &lt;a href="http://www.khanacademy.org/#browse"&gt;video lectures&lt;/a&gt;, mostly on math. There are also a few computer science lectures that teach Python programming. (Salman Khan also &lt;a href="http://www.colbertnation.com/the-colbert-report-videos/388279/june-02-2011/salman-khan"&gt;appeared on the Colbert Report&lt;/a&gt; a few months ago).&lt;br /&gt;
&lt;br /&gt;
Finally, I stumbled across &lt;a href="http://biostar.stackexchange.com/questions/137/what-methods-do-you-use-for-short-read-mapping"&gt;this old question on BioStar&lt;/a&gt; with lots of answers about &lt;b&gt;methods for short read mapping&lt;/b&gt; with next-generation sequencing data.&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;And here are a few interesting papers I shared:&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.1904.html"&gt;Nature Biotechnology: Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.plosgenetics.org/article/info:doi%2F10.1371%2Fjournal.pgen.1002177"&gt;PLoS Genetics: Gene-Based Tests of Association&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.plosgenetics.org/article/info:doi%2F10.1371%2Fjournal.pgen.1002198"&gt;PLoS Genetics: Fine Mapping of Five Loci Associated with Low-Density Lipoprotein Cholesterol Detects Variants That Double the Explained Heritability&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.nature.com/nrg/journal/vaop/ncurrent/full/nrg3033.html"&gt;Nature Reviews Genetics: Systems-biology approaches for predicting genomic evolution&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.blogger.com/goog_1110107147"&gt;&lt;br /&gt;
&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://genome.cshlp.org/content/early/2011/08/02/gr.125047.111.abstract"&gt;Genome Research: A comprehensively molecular haplotype-resolved genome of a European individual (paper about the importance of phase in genetic studies)&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.ncbi.nlm.nih.gov/pubmed/21407244"&gt;Nature Reviews Microbiology: Unravelling the effects of the environment and host genotype on the gut microbiome.&lt;/a&gt;&lt;br /&gt;
...&lt;div class="blogger-post-footer"&gt;Getting Genetics Done by Stephen Turner is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6232819486261696035-3256760917683539130?l=gettinggeneticsdone.blogspot.com' alt='' /&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/GettingGeneticsDone/~4/YC4V_nTXDvI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/3256760917683539130/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/friday-links-r-openhelix-bioinformatics.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3256760917683539130?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/3256760917683539130?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/08/friday-links-r-openhelix-bioinformatics.html" title="Friday Links: R, OpenHelix Bioinformatics Tips, 23andMe, Perl, Python, Next-Gen Sequencing" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>2</thr:total></entry><entry gd:etag="W/&quot;DkUNSXo7eSp7ImA9WhdSFk8.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-8122570924942869959</id><published>2011-07-25T12:44:00.003-05:00</published><updated>2011-07-25T15:04:58.401-05:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2011-07-25T15:04:58.401-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="ggplot2" /><category scheme="http://www.blogger.com/atom/ns#" term="Visualization" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Scatterplot matrices in R</title><content type="html">I just discovered a handy function in R to produce a scatterplot matrix of selected variables in a dataset. The base graphics function is &lt;a href="http://www.inside-r.org/r-doc/graphics/pairs"&gt;pairs()&lt;/a&gt;. Producing these plots can be helpful in exploring your data, especially using the second method below.&lt;br /&gt;
&lt;br /&gt;
Try it out on the built-in iris dataset. (This data set gives the measurements in cm of sepal length and width and petal length and width for 50 flowers from each of 3 species of iris. The species are &lt;i&gt;Iris setosa&lt;/i&gt;, &lt;i&gt;versicolor&lt;/i&gt;, and &lt;i&gt;virginica&lt;/i&gt;.)&lt;br /&gt;
&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;&lt;div class="geshifilter"&gt;&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&lt;span style="color: #666666; font-style: italic;"&gt;# Load the iris dataset.&lt;/span&gt;
&lt;a href="http://inside-r.org/r-doc/utils/data"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;data&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/datasets/iris"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;iris&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
&amp;nbsp;
&lt;span style="color: #666666; font-style: italic;"&gt;# Plot #1: Basic scatterplot matrix of the four measurements&lt;/span&gt;
&lt;a href="http://inside-r.org/r-doc/graphics/pairs"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;pairs&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/utils/data"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;data&lt;/span&gt;&lt;/a&gt;=&lt;a href="http://inside-r.org/r-doc/datasets/iris"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;iris&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-c5bDhQx8Y-w/TisWWy8s4kI/AAAAAAAAjkY/k-Af-OWK1t0/s1600/Rplot01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-c5bDhQx8Y-w/TisWWy8s4kI/AAAAAAAAjkY/k-Af-OWK1t0/s1600/Rplot01.png" /&gt;&lt;/a&gt;&lt;/div&gt;Looking at the pairs help page, I found that there's another built-in function, panel.smooth(), that can be used to draw a smoothed curve (computed with lowess()) on each plot in a scatterplot matrix. Pass this function to the lower.panel argument of the pairs function. The panel.cor() function below computes the absolute correlation between each pair of variables and displays it in the upper panels, with the font size proportional to the absolute value of the correlation.&lt;br /&gt;
&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;&lt;div class="geshifilter"&gt;&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&lt;span style="color: #666666; font-style: italic;"&gt;# panel.smooth function is built in.&lt;/span&gt;
&lt;span style="color: #666666; font-style: italic;"&gt;# panel.cor puts correlation in upper panels, size proportional to correlation&lt;/span&gt;
panel.cor &amp;lt;- &lt;a href="http://inside-r.org/r-doc/base/function"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;function&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;x&lt;span style="color: #339933;"&gt;,&lt;/span&gt; y&lt;span style="color: #339933;"&gt;,&lt;/span&gt; digits=&lt;span style="color: #cc66cc;"&gt;2&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; prefix=&lt;span style="color: blue;"&gt;""&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; cex.cor&lt;span style="color: #339933;"&gt;,&lt;/span&gt; ...&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
&lt;span style="color: #009900;"&gt;{&lt;/span&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; usr &amp;lt;- &lt;a href="http://inside-r.org/r-doc/graphics/par"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;par&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: blue;"&gt;"usr"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #339933;"&gt;;&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/on.exit"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;on.exit&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/graphics/par"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;par&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;usr&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;a href="http://inside-r.org/r-doc/graphics/par"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;par&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;usr = &lt;a href="http://inside-r.org/r-doc/base/c"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;c&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: #cc66cc;"&gt;0&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #cc66cc;"&gt;1&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #cc66cc;"&gt;0&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #cc66cc;"&gt;1&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; r &amp;lt;- &lt;a href="http://inside-r.org/r-doc/base/abs"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;abs&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/stats/cor"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;cor&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;x&lt;span style="color: #339933;"&gt;,&lt;/span&gt; y&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; txt &amp;lt;- &lt;a href="http://inside-r.org/r-doc/base/format"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;format&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/c"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;c&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;r&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #cc66cc;"&gt;0.123456789&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; digits=digits&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;[&lt;/span&gt;&lt;span style="color: #cc66cc;"&gt;1&lt;/span&gt;&lt;span style="color: #009900;"&gt;]&lt;/span&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; txt &amp;lt;- &lt;a href="http://inside-r.org/r-doc/base/paste"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;paste&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;prefix&lt;span style="color: #339933;"&gt;,&lt;/span&gt; txt&lt;span style="color: #339933;"&gt;,&lt;/span&gt; sep=&lt;span style="color: blue;"&gt;""&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;span style="color: black; font-weight: bold;"&gt;if&lt;/span&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/missing"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;missing&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;cex.cor&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt; cex.cor &amp;lt;- &lt;span style="color: #cc66cc;"&gt;0.8&lt;/span&gt;/strwidth&lt;span style="color: #009900;"&gt;(&lt;/span&gt;txt&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;a href="http://inside-r.org/r-doc/graphics/text"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;text&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;span style="color: #cc66cc;"&gt;0.5&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;span style="color: #cc66cc;"&gt;0.5&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; txt&lt;span style="color: #339933;"&gt;,&lt;/span&gt; cex = cex.cor * r&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
&lt;span style="color: #009900;"&gt;}&lt;/span&gt;
&amp;nbsp;
&lt;span style="color: #666666; font-style: italic;"&gt;# Plot #2: same as above, but add lowess smoother in lower panels and correlation in upper panels&lt;/span&gt;
&lt;a href="http://inside-r.org/r-doc/graphics/pairs"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;pairs&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/utils/data"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;data&lt;/span&gt;&lt;/a&gt;=&lt;a href="http://inside-r.org/r-doc/datasets/iris"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;iris&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lower.panel=&lt;a href="http://inside-r.org/r-doc/graphics/panel.smooth"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;panel.smooth&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; upper.panel=panel.cor&lt;span style="color: #339933;"&gt;,&lt;/span&gt; 
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; pch=&lt;span style="color: #cc66cc;"&gt;20&lt;/span&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; main=&lt;span style="color: blue;"&gt;"Iris Scatterplot Matrix"&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-DbDNdN47KrU/TisXGKpxKGI/AAAAAAAAjkc/c-rWtzr02Yo/s1600/Rplot02.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-DbDNdN47KrU/TisXGKpxKGI/AAAAAAAAjkc/c-rWtzr02Yo/s1600/Rplot02.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
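The values shown in the upper panels can be checked against a plain correlation matrix. A minimal sketch, assuming the four measurements are columns 1 to 4 of iris (the variable name iris_cor is only illustrative):&lt;br /&gt;
&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;&lt;div class="geshifilter"&gt;&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;# Correlation matrix of the four numeric measurements (columns 1 to 4 of iris)
iris_cor &amp;lt;- cor(iris[, 1:4])
# panel.cor displays absolute values, so take abs() before comparing
round(abs(iris_cor), 2)&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;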
Finally, you can produce a similar plot using ggplot2, with the diagonal showing the kernel density.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;&lt;div class="geshifilter"&gt;&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;&lt;span style="color: #666666; font-style: italic;"&gt;# Plot #3: similar plot using ggplot2&lt;/span&gt;
&lt;span style="color: #666666; font-style: italic;"&gt;# install.packages("ggplot2") ## uncomment to install ggplot2&lt;/span&gt;
&lt;a href="http://inside-r.org/r-doc/base/library"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;library&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/packages/cran/ggplot2"&gt;ggplot2&lt;/a&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;
plotmatrix&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/base/with"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;with&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;&lt;a href="http://inside-r.org/r-doc/datasets/iris"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;iris&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #339933;"&gt;,&lt;/span&gt; &lt;a href="http://inside-r.org/r-doc/base/data.frame"&gt;&lt;span style="color: #003399; font-weight: bold;"&gt;data.frame&lt;/span&gt;&lt;/a&gt;&lt;span style="color: #009900;"&gt;(&lt;/span&gt;Sepal.Length&lt;span style="color: #339933;"&gt;,&lt;/span&gt; Sepal.Width&lt;span style="color: #339933;"&gt;,&lt;/span&gt; Petal.Length&lt;span style="color: #339933;"&gt;,&lt;/span&gt; Petal.Width&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;span style="color: #009900;"&gt;)&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;a href="http://www.inside-r.org/pretty-r" title="Created by Pretty R at inside-R.org"&gt;&lt;br /&gt;
&lt;/a&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-d5ym9arW1ZE/TisXj-nXh5I/AAAAAAAAjkg/EYH329WFIOs/s1600/Rplot03.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-d5ym9arW1ZE/TisXj-nXh5I/AAAAAAAAjkg/EYH329WFIOs/s1600/Rplot03.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
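Newer releases of ggplot2 no longer include plotmatrix(). If it is missing from your installation, ggpairs() from the GGally package draws a comparable matrix. A minimal sketch, assuming GGally is installed (the name p is only illustrative):&lt;br /&gt;
&lt;br /&gt;
&lt;div style="overflow: auto;"&gt;&lt;div class="geshifilter"&gt;&lt;pre class="r geshifilter-R" style="font-family: monospace;"&gt;# install.packages("GGally") ## uncomment to install GGally
library(GGally)
# Columns 1 to 4 of iris are the four numeric measurements
p &amp;lt;- ggpairs(iris, columns = 1:4)
print(p)&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;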
&lt;a href="http://www.inside-r.org/r-doc/graphics/pairs"&gt;See more on the pairs function here&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Update:&amp;nbsp; A tip of the hat to &lt;a href="http://had.co.nz/"&gt;Hadley Wickham&lt;/a&gt; (&lt;a href="http://twitter.com/#%21/hadleywickham"&gt;@hadleywickham&lt;/a&gt;) for pointing out two packages useful for scatterplot matrices. The &lt;a href="http://cran.r-project.org/web/packages/gpairs/index.html"&gt;gpairs&lt;/a&gt; package has some useful functionality for showing the relationship between both continuous and categorical variables in a dataset, and the &lt;a href="http://cran.r-project.org/web/packages/GGally/index.html"&gt;GGally&lt;/a&gt; package extends &lt;a href="http://had.co.nz/ggplot2/"&gt;ggplot2&lt;/a&gt; for plot matrices.</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/8122570924942869959/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/07/scatterplot-matrices-in-r.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/8122570924942869959?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/8122570924942869959?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/07/scatterplot-matrices-in-r.html" title="Scatterplot matrices in R" /><author><name>Stephen 
Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-c5bDhQx8Y-w/TisWWy8s4kI/AAAAAAAAjkY/k-Af-OWK1t0/s72-c/Rplot01.png" height="72" width="72" /><thr:total>4</thr:total></entry><entry gd:etag="W/&quot;Ak8FQ3Y4eSp7ImA9WhdTFEU.&quot;"><id>tag:blogger.com,1999:blog-6232819486261696035.post-7512106799130037961</id><published>2011-07-12T11:40:00.000-05:00</published><updated>2011-07-12T11:40:12.831-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-07-12T11:40:12.831-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Sequencing" /><category scheme="http://www.blogger.com/atom/ns#" term="1000 genomes" /><title>Download 69 Complete Human Genomes</title><content type="html">Sequencing company Complete Genomics recently made available &lt;a href="http://www.completegenomics.com/sequence-data/download-data/"&gt;69 ethnically diverse complete human genome sequences&lt;/a&gt;: a Yoruba trio; a Puerto Rican trio; a 17-member, 3-generation pedigree; and a diversity panel representing 9 different populations. Some of the samples overlap with those in HapMap and the 1000 Genomes Project. The data can be downloaded directly from the &lt;a href="ftp://ftp2.completegenomics.com/"&gt;FTP site&lt;/a&gt;. See the link below for more details on the directory contents, and have a look at the &lt;a href="http://media.completegenomics.com/documents/CG_QSG-WorkingCGData_Final.pdf"&gt;quick start guide&lt;/a&gt; to working with Complete Genomics data.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://www.completegenomics.com/sequence-data/download-data/"&gt;Complete Genomics - Sample Human Genome Sequence Data&lt;/a&gt;</content><link rel="replies" type="application/atom+xml" href="http://gettinggeneticsdone.blogspot.com/feeds/7512106799130037961/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/07/download-69-complete-human-genomes.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7512106799130037961?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/6232819486261696035/posts/default/7512106799130037961?v=2" /><link rel="alternate" type="text/html" href="http://gettinggeneticsdone.blogspot.com/2011/07/download-69-complete-human-genomes.html" title="Download 69 Complete Human Genomes" /><author><name>Stephen Turner</name><uri>http://www.blogger.com/profile/06656711316726116187</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="26" height="32" src="http://3.bp.blogspot.com/-aT3qBWI4VYc/TgvR9CnlS0I/AAAAAAAAMDk/KuA2GGqURcc/s220/pic2-cropped-400x500.jpg" /></author><thr:total>0</thr:total></entry></feed>

