R Tutorial Series<p><strong>By <a href="http://www.johnmquick.com">John M Quick</a></strong></p>
<p>The R Tutorial Series provides a collection of user-friendly guides to researchers, students, and others who want to learn how to use R for their statistical analyses.</p>
<h2>R Tutorial Series: Centering Variables and Generating Z-Scores with the Scale() Function</h2>
<p>Posted Thu, 01 Mar 2012</p>
Centering variables and creating z-scores are two common data analysis activities. While they are relatively simple to calculate by hand, R makes these operations extremely easy thanks to the <span class="Apple-style-span" style="color: #cc0000;">scale()</span> function.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_scale.csv" target="_blank">dataset (.csv)</a> used in this tutorial. Be sure to right-click and save the file to your R working directory.<br />
<h3>The Scale() Function</h3>The <span class="Apple-style-span" style="color: #cc0000;">scale()</span> function makes use of the following arguments.<br />
<ul><li>x: a numeric object</li>
<li>center: if TRUE, the object's column means are subtracted from the values in those columns (ignoring NAs); if FALSE, centering is not performed</li>
<li>scale: if TRUE, the column values are divided by the column's standard deviation when center is also TRUE, or by the column's root mean square when center is FALSE; if FALSE, scaling is not performed</li>
</ul><h3>Centering Variables</h3>Normally, to center a variable, you would subtract the mean of all data points from each individual data point. With <span class="Apple-style-span" style="color: #cc0000;">scale()</span>, this can be accomplished in one simple call.<br />
<blockquote class="codeBlock"><ol><li>> #center variable A using the scale() function</li>
<li>> scale(A, center = TRUE, scale = FALSE)</li>
</ol></blockquote>You can verify these results by making the calculation by hand, as demonstrated in the following screenshot.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-WgKo4UZvx5w/T8FKJZ53y2I/AAAAAAAAA8k/lqde6E_8heU/s1600/20120224_scale_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-WgKo4UZvx5w/T8FKJZ53y2I/AAAAAAAAA8k/lqde6E_8heU/s1600/20120224_scale_1.png" /></a></div><br />
<div class="alignCenter"></div><div class="alignCenter" style="text-align: center;">Centering a variable with the scale() function and by hand</div><h3>Generating Z-Scores</h3>Normally, to create z-scores (standardized scores) from a variable, you would subtract the mean of all data points from each individual data point, then divide those points by the standard deviation of all points. Again, this can be accomplished in one call using <span class="Apple-style-span" style="color: #cc0000;">scale()</span>.<br />
<blockquote class="codeBlock"><ol><li>> #generate z-scores for variable A using the scale() function</li>
<li>> scale(A, center = TRUE, scale = TRUE)</li>
</ol></blockquote>Again, the following screenshot demonstrates equivalence between the function results and hand calculation.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-EVDTuoRwlAg/T8FKJzvQSRI/AAAAAAAAA88/JXKEVUKagjM/s1600/20120224_scale_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-EVDTuoRwlAg/T8FKJzvQSRI/AAAAAAAAA88/JXKEVUKagjM/s1600/20120224_scale_2.png" /></a></div><br />
<div class="alignCenter"></div><div class="alignCenter" style="text-align: center;">Generating z-scores from a variable by hand and using the scale() function</div><h3>Complete Scale() Example</h3>To see a complete example of how <span class="Apple-style-span" style="color: #cc0000;">scale()</span> can be used to center variables and generate z-scores in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_scale.txt" target="_blank">scale() example (.txt)</a> file.<br />
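The hand calculations described above can also be sketched in a few lines of code. The vector A below is invented for illustration (it is not the tutorial dataset), and the comparison relies on the fact that <span class="Apple-style-span" style="color: #cc0000;">scale()</span> returns a one-column matrix, which is converted to a plain vector before checking.<br />
<blockquote class="codeBlock"><ol><li>> #hypothetical values standing in for variable A (not the tutorial dataset)</li>
<li>> A <- c(2, 4, 4, 5, 7, 8)</li>
<li>> #center by hand: subtract the mean from each value</li>
<li>> centeredByHand <- A - mean(A)</li>
<li>> #z-scores by hand: divide the centered values by the standard deviation</li>
<li>> zByHand <- (A - mean(A)) / sd(A)</li>
<li>> #scale() returns a one-column matrix, so convert it to a vector before comparing</li>
<li>> all.equal(as.vector(scale(A, center = TRUE, scale = FALSE)), centeredByHand)</li>
<li>> all.equal(as.vector(scale(A, center = TRUE, scale = TRUE)), zByHand)</li>
</ol></blockquote>Both comparisons return TRUE when the hand calculations and the function results agree.<br />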
<h3>References</h3>The official scale function manual page is available from: http://stat.ethz.ch/R-manual/R-patched/library/base/html/scale.html
<h2>R Tutorial Series: Citing R with EndNote</h2>
<p>Posted Fri, 06 Jan 2012</p>
Unfortunately, due to the vexing complexities of academic style guides and the limitations of associated software packages, citing a non-standard name, such as Cher, Prince, or R Development Core Team, can be problematic. Thankfully, I have discovered a simple trick in Word and EndNote that allows for the accurate automatic formatting of R citations. Note that this method was developed using Word 2011 and EndNote X4 for Mac. I am unaware of the differences between operating systems and software versions, but it is anticipated that this method will work for almost anyone.<br />
<h3>
The Intuitive, But Nonworking Way</h3>
If you were going to create your R record in EndNote, you would probably enter something like what is pictured below. In the name field, it makes sense to just type in R Development Core Team.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-swKDkzeP4uU/T8FKIhL_CzI/AAAAAAAAA8M/uhaaB5dTxXQ/s1600/20120106_endnote_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="243" src="http://4.bp.blogspot.com/-swKDkzeP4uU/T8FKIhL_CzI/AAAAAAAAA8M/uhaaB5dTxXQ/s400/20120106_endnote_1.png" width="400" /></a></div>
<br />
<div class="alignCenter">
</div>
However, this is where things take an untimely turn. EndNote will try to interpret that peculiar name as a series of first, last, and middle names, which leads to inaccurate citations.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-9IgLQzbLJA0/T8FKI_BXerI/AAAAAAAAA8Q/qnoKvinVzvE/s1600/20120106_endnote_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-9IgLQzbLJA0/T8FKI_BXerI/AAAAAAAAA8Q/qnoKvinVzvE/s1600/20120106_endnote_2.png" /></a></div>
<br />
<div class="alignCenter">
</div>
<h3>
The Unintuitive, But Working Way</h3>
This is where we basically need to trick EndNote into interpreting our R citation the proper way. All we have to do is add a comma after R Development Core Team in the name field.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-BNY9theSeZI/T8FKJSd7N4I/AAAAAAAAA8c/q9eNrUnUV6A/s1600/20120106_endnote_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="http://4.bp.blogspot.com/-BNY9theSeZI/T8FKJSd7N4I/AAAAAAAAA8c/q9eNrUnUV6A/s400/20120106_endnote_3.png" width="400" /></a></div>
<br />
<div class="alignCenter">
</div>
This tells EndNote that R Development Core Team is the complete last name of an author who has no first name. Hence, EndNote uses what it has (a last name with no first name) in generating its citations.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-iIZ13yrprX8/T8FKJWuQPWI/AAAAAAAAA8g/VHTr4Ee40LI/s1600/20120106_endnote_4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-iIZ13yrprX8/T8FKJWuQPWI/AAAAAAAAA8g/VHTr4Ee40LI/s1600/20120106_endnote_4.png" /></a></div>
<br />
<div class="alignCenter">
</div>
Note: The official citation for R can be found by issuing the citation() command in the R console.
<h2>R Tutorial Series: Exploratory Factor Analysis</h2>
<p>Posted Mon, 24 Oct 2011</p>
Exploratory factor analysis (EFA) is a common technique in the social sciences for explaining the shared variance among several measured variables in terms of a smaller set of latent variables. EFA is often used to consolidate survey data by revealing the groupings (factors) that underlie individual questions. This will be the context for demonstration in this tutorial.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_exploratoryFactorAnalysis.csv" target="_blank">dataset (.csv)</a> used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 300 responses on 6 items from a survey of college students' favorite subject matter. The items range in value from 1 to 5, which represent a scale from Strongly Dislike to Strongly Like. Our 6 items asked students to rate their liking of different college subject matter areas, including biology (BIO), geology (GEO), chemistry (CHEM), algebra (ALG), calculus (CALC), and statistics (STAT). This is where our tutorial ends, because all students rated all of these content areas as Strongly Dislike, thereby rendering insufficient variance for conducting EFA (just kidding).<br />
<h3>Beginning Steps</h3>To begin, we need to read our dataset into R and store its contents in a variable.<br />
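<blockquote class="codeBlock"><ol><li>> #read the dataset into an R variable using the read.csv(file) function</li>
<li>> #the file name must match the file saved from the download link above</li>
<li>> data <- read.csv("dataset_exploratoryFactorAnalysis.csv")</li>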
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-EywFSq_7Tsg/T8FKIWXvBII/AAAAAAAAA70/J4NyA8Xrxbk/s1600/20111024_exploratoryFactorAnalysis_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-EywFSq_7Tsg/T8FKIWXvBII/AAAAAAAAA70/J4NyA8Xrxbk/s1600/20111024_exploratoryFactorAnalysis_1.png" /></a></div><div class="alignCenter"><br />
</div><div class="alignCenter"></div><div class="alignCenter" style="text-align: center;">First 10 rows of the dataset</div><h3>Psych Package</h3>Next, we need to install and load the <em>psych</em> package, which I prefer to use when conducting EFA. In this tutorial, we will make use of the package's <span class="Apple-style-span" style="color: #cc0000;">fa()</span> function.<br />
<blockquote class="codeBlock"><ol><li>> #install the package</li>
<li>> install.packages("psych")</li>
<li>> #load the package</li>
<li>> library(psych)</li>
</ol></blockquote><h3>Number of Factors</h3>For this tutorial, we will assume that the appropriate number of factors has already been determined to be 2, such as through eigenvalues, scree tests, and a priori considerations. Most often, you will want to test solutions above and below the determined number to ensure the optimal number of factors was selected.<br />
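One of the criteria mentioned above, eigenvalues, can be sketched with base R. The data below are simulated purely for illustration (two invented latent traits, not the tutorial dataset), and the eigenvalue-greater-than-one rule shown is only one common heuristic, not necessarily how the two-factor decision was reached for this tutorial.<br />
<blockquote class="codeBlock"><ol><li>> #simulate 300 hypothetical responses driven by two latent traits (illustration only)</li>
<li>> set.seed(42)</li>
<li>> science <- rnorm(300)</li>
<li>> math <- rnorm(300)</li>
<li>> simData <- data.frame(BIO = science + rnorm(300, sd = 0.5), GEO = science + rnorm(300, sd = 0.5), CHEM = science + rnorm(300, sd = 0.5), ALG = math + rnorm(300, sd = 0.5), CALC = math + rnorm(300, sd = 0.5), STAT = math + rnorm(300, sd = 0.5))</li>
<li>> #calculate the eigenvalues of the correlation matrix</li>
<li>> eigenvalues <- eigen(cor(simData))$values</li>
<li>> eigenvalues</li>
<li>> #count the eigenvalues greater than 1 (the Kaiser criterion)</li>
<li>> sum(eigenvalues > 1)</li>
<li>> #draw a simple scree plot</li>
<li>> plot(eigenvalues, type = "b", xlab = "Factor", ylab = "Eigenvalue")</li>
</ol></blockquote>With real survey data, the same three calls (cor, eigen, and plot) would be applied to the dataset read in earlier rather than to simulated values.<br />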
<h3>Factor Solution</h3>To derive the factor solution, we will use the <span class="Apple-style-span" style="color: #cc0000;">fa()</span> function from the <span class="Apple-style-span" style="color: #cc0000;">psych</span> package, which receives the following primary arguments.<br />
<ul><li>r: the correlation matrix</li>
<li>nfactors: number of factors to be extracted (default = 1)</li>
<li>rotate: one of several matrix rotation methods, such as "varimax" or "oblimin"</li>
<li>fm: one of several factoring methods, such as "pa" (principal axis) or "ml" (maximum likelihood)</li>
</ul>Note that several rotation and factoring methods are available when conducting EFA. Rotation methods can be described as <em>orthogonal</em>, which do not allow the resulting factors to be correlated, and <em>oblique</em>, which do allow the resulting factors to be correlated. Factoring methods can be described as <em>common</em>, which are used when the goal is to better describe data, and <em>component</em>, which are used when the goal is to reduce the amount of data. The <span class="Apple-style-span" style="color: #cc0000;">fa()</span> function is used for common factoring. For component analysis, see <span class="Apple-style-span" style="color: #cc0000;">princomp()</span>. The best methods will vary by circumstance and it is therefore recommended that you seek professional counsel in determining the optimal parameters for your future EFAs.<br />
In this tutorial, we will use oblique rotation (<span class="Apple-style-span" style="color: #cc0000;">rotate = "oblimin"</span>), which recognizes that there is likely to be some correlation between students' latent subject matter preference factors in the real world. We will use principal axis factoring (<span class="Apple-style-span" style="color: #cc0000;">fm = "pa"</span>), because we are most interested in identifying the underlying constructs in the data.<br />
<blockquote class="codeBlock"><ol><li>> #calculate the correlation matrix</li>
<li>> corMat <- cor(data)</li>
<li>> #display the correlation matrix</li>
<li>> corMat</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-zv86aI7Auh8/T8FKIvof-LI/AAAAAAAAA8A/DeoF7ZSgIlU/s1600/20111024_exploratoryFactorAnalysis_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-zv86aI7Auh8/T8FKIvof-LI/AAAAAAAAA8A/DeoF7ZSgIlU/s1600/20111024_exploratoryFactorAnalysis_2.png" /></a></div><div class="alignCenter"><br />
</div><div class="alignCenter"></div><div class="alignCenter" style="text-align: center;">The correlation matrix</div><blockquote class="codeBlock"><ol><li>> #use fa() to conduct an oblique principal-axis exploratory factor analysis </li>
<li>> #save the solution to an R variable </li>
<li>> solution <- fa(r = corMat, nfactors = 2, rotate = "oblimin", fm = "pa") </li>
<li>> #display the solution output </li>
<li>> solution</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-R43t9T64GYQ/T8FKImdHrRI/AAAAAAAAA8I/bOKKx4PO27w/s1600/20111024_exploratoryFactorAnalysis_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-R43t9T64GYQ/T8FKImdHrRI/AAAAAAAAA8I/bOKKx4PO27w/s1600/20111024_exploratoryFactorAnalysis_3.png" /></a></div><div class="alignCenter"><br />
</div><div class="alignCenter"></div><div class="alignCenter" style="text-align: center;">Complete solution output</div><br />
By looking at our factor loadings, we can begin to assess our factor solution. We can see that BIO, GEO, and CHEM all have high factor loadings around 0.8 on the first factor (PA1). Therefore, we might call this factor <em>Science</em> and consider it representative of a student's interest in science subject matter. Similarly, ALG, CALC, and STAT load highly on the second factor (PA2), which we might call <em>Math</em>. Note that STAT has a much lower loading on PA2 than ALG or CALC and that it has a slight loading on factor PA1. This suggests that statistics is less related to the concept of <em>Math</em> than algebra and calculus. Just below the loadings table, we can see that each factor accounted for around 30% of the variance in responses, leading to a factor solution that accounted for 66% of the total variance in students' subject matter preference. Lastly, notice that our factors are correlated at 0.21 and recall that our choice of oblique rotation allowed for the recognition of this relationship. <br />
Of course, there are many other considerations to be made in developing and assessing an EFA that will not be presented here. The intent with this tutorial was simply to demonstrate the basic execution of EFA in R. For a detailed and digestible overview of EFA, I recommend the Factor Analysis chapter of <em>Multivariate Data Analysis</em> by Hair, Black, Babin, and Anderson.<br />
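For readers without the <em>psych</em> package installed, a broadly similar oblique solution can be sketched with <span class="Apple-style-span" style="color: #cc0000;">factanal()</span> from base R's stats package. This is a substitute technique, not the method used above: <span class="Apple-style-span" style="color: #cc0000;">factanal()</span> uses maximum likelihood factoring rather than principal axis factoring, and promax rather than oblimin rotation. The data are simulated for illustration, so the loadings will not match the tutorial output.<br />
<blockquote class="codeBlock"><ol><li>> #simulate 300 hypothetical responses driven by two latent traits (illustration only)</li>
<li>> set.seed(42)</li>
<li>> science <- rnorm(300)</li>
<li>> math <- rnorm(300)</li>
<li>> simData <- data.frame(BIO = science + rnorm(300, sd = 0.5), GEO = science + rnorm(300, sd = 0.5), CHEM = science + rnorm(300, sd = 0.5), ALG = math + rnorm(300, sd = 0.5), CALC = math + rnorm(300, sd = 0.5), STAT = math + rnorm(300, sd = 0.5))</li>
<li>> #fit a two-factor maximum likelihood solution with an oblique (promax) rotation</li>
<li>> fit <- factanal(simData, factors = 2, rotation = "promax")</li>
<li>> #display the loadings, hiding values below 0.3 for readability</li>
<li>> print(fit$loadings, cutoff = 0.3)</li>
</ol></blockquote>As in the interpretation above, items that load highly on the same factor are candidates for a shared label, such as <em>Science</em> or <em>Math</em>.<br />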
<h3>Complete EFA Example</h3>To see a complete example of how EFA can be conducted using the <em>psych</em> package in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_exploratoryFactorAnalysis.txt" target="_blank">EFA example (.txt)</a> file. For the code used in this tutorial, download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_exploratoryFactorAnalysis.R" target="_blank">EFA Example (.R)</a> file.<br />
<h3>References</h3>Revelle, W. (2011). psych: Procedures for Personality and Psychological Research. http://personality-project.org/r/psych.manual.pdf
<h2>The R Programming Wikibook</h2>
<p>Posted Fri, 08 Jul 2011</p>
<p>The R Programming wikibook is an open source community project that "aims to create a cross-disciplinary practical guide to the R programming language." It was launched in June 2011 and is seeking content and contributors. The full call for the R Programming wikibook can be found on <a href="http://www.r-statistics.com/2011/06/calling-r-lovers-and-bloggers-to-work-together-on-the-r-programming-wikibook/" target="_blank">Tal Galili's blog</a>. The R Programming wikibook itself is available at <a href="http://en.wikibooks.org/wiki/R_Programming" target="_blank">http://en.wikibooks.org/wiki/R_Programming</a>.</p><p>I am writing to raise awareness of the R Programming wikibook and to formally offer content from the <span dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">R Tutorial Series</span> by <a cc="http://creativecommons.org/ns#" href="http://www.johnmquick.com/" property="cc:attributionName" rel="cc:attributionURL" target="_blank">John M. Quick</a> for use under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" target="_blank">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>. 
This means that articles from the R Tutorial Series may be included in and modified for the R Programming wikibook, so long as proper attribution is given and the resulting content is made available under an equivalent license. A complete list of contributing blogs can be found on the R Programming wikibook's <a href="http://en.wikibooks.org/wiki/R_Programming/Sources" target="_blank">Sources page</a>. I hope that the <a href="http://en.wikibooks.org/wiki/R_Programming" target="_blank">R Programming wikibook</a> will thrive and grow to support a large community of R users, including readers of the R Tutorial Series.</p>
<h2>R Tutorial Series: 2011 ANOVA Article Data</h2>
<p>Posted Mon, 28 Mar 2011</p>
Having wrapped up a recent flurry of R ANOVA articles (and exhausted my knowledge of the subject), I decided to take a look at the R Tutorial Series' Google Analytics data from the past few months. <br />
Since I posted the Two-Way Omnibus ANOVA article on January 17, we have had about 150 visits per day and over 19,000 total page views. The original introduction to R posts are still the most popular ones here, although a few from the regression and ANOVA series are also represented in the most viewed.<br />
I also wanted to share a funny observation about the patterns of visits to the R Tutorial Series. The following graph portrays our daily viewership over the past few months.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-FYrT1afnSwg/T8FKIIWjKBI/AAAAAAAAA7o/GQxSdWW5ems/s1600/20110325_rBlogStats.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-FYrT1afnSwg/T8FKIIWjKBI/AAAAAAAAA7o/GQxSdWW5ems/s1600/20110325_rBlogStats.png" /></a></div>
<br />
<div align="center">
</div>
The valleys on the chart correspond to the weekends (evidently when no one wants to read about statistical computing). The initial peaks are Mondays, which happen to be when I most often make new posts. Typically, a slightly higher peak comes on Tuesday, followed by a gradual decline back into the weekend valley. Thus, our visits to the R Tutorial Series end up creating a nice little wave pattern throughout the year. The good news with these numbers is that people are reading the R Tutorial Series and (hopefully) learning to use R and apply it to their daily work, which is the blog's ultimate purpose. I also appreciate all of the comments, questions, and tips that have been posted by readers. Your feedback really helps to improve the tutorials.<br />
<h3>
Upcoming Plans</h3>
As mentioned, I have concluded my planned coverage of ANOVA in R. Thus, we have reached a sort of break period, much like the one that followed last year's spurt of regression tutorials. I do plan to keep writing R tutorials, but for the time being, they may arrive at less predictable intervals and cover a wider variety of content. As always, I welcome contact regarding guest posts, especially on statistical and R content that I have not covered or am not yet familiar with.<br />
<h2>R Tutorial Series: Applying the Reshape Package to Organize ANOVA Data</h2>
<p>Posted Mon, 14 Mar 2011</p>
As demonstrated in the preceding ANOVA tutorials, data organization is central to conducting ANOVA in R. In standard ANOVA, we used the <span class="Apple-style-span" style="color: #cc0000;">tapply()</span> function to generate a table for a single summary function. In repeated measures ANOVA, we used separate datasets for our omnibus ANOVA and follow-up comparisons. This tutorial will demonstrate how the <em>reshape</em> package can be used to simplify the ANOVA data organization process in R.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_anova_reshape_1.csv" target="_blank">between groups</a> and <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_anova_reshape_2.csv" target="_blank">repeated measures</a> datasets (.csv) used in this tutorial. Be sure to right-click and save the files to your R working directory. The between groups dataset contains a hypothetical sample of 30 cases separated into three groups (a, b, and c). The repeated measures dataset contains a hypothetical sample of 10 cases across three measurements (a, b, and c). In both cases, the values are represented on a scale that ranges from 1 to 5.<br />
<h3>Beginning Steps</h3>To begin, we need to read our datasets into R and store their contents in variables.<br />
<blockquote class="codeBlock"><ol><li>> #read the datasets into R variables using the read.csv(file) function</li>
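<li>> #the file names must match the files saved from the download links above (names are case-sensitive on some systems)</li>
<li>> dataBetween <- read.csv("dataset_anova_reshape_1.csv")</li>
<li>> dataRepeated <- read.csv("dataset_anova_reshape_2.csv")</li>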
</ol></blockquote><h3>Reshape Package</h3>Next, we need to install and load the <em>reshape</em> package. In this tutorial, we will make use of the package's <span class="Apple-style-span" style="color: #cc0000;">cast()</span> and <span class="Apple-style-span" style="color: #cc0000;">melt()</span> functions.<br />
<blockquote class="codeBlock"><ol><li>> #install the package</li>
<li>> install.packages("reshape")</li>
<li>> #load the package</li>
<li>> library(reshape)</li>
</ol></blockquote><h3>Using cast() to Derive ANOVA Descriptives</h3>The <span class="Apple-style-span" style="color: #cc0000;">cast()</span> function can be used to easily derive summary statistics for a between groups ANOVA dataset. The <span class="Apple-style-span" style="color: #cc0000;">cast()</span> function receives the following primary arguments.<br />
<ul><li>data: the dataset</li>
<li>formula: in our case, a one-sided formula indicating the grouping variable</li>
<li>fun.aggregate: a function or vector of functions for deriving summary statistics, such as mean, var, or sd</li>
</ul><blockquote class="codeBlock"><ol><li>> #display the raw between groups data</li>
<li>> dataBetween</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-qENWqmSzDwI/T8FKHaOMw1I/AAAAAAAAA68/ecbHNiPsAPQ/s1600/20110314_anova_reshape_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-qENWqmSzDwI/T8FKHaOMw1I/AAAAAAAAA68/ecbHNiPsAPQ/s1600/20110314_anova_reshape_1.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The raw between groups data</div><blockquote class="codeBlock"><ol><li>> #cast the between groups data using cast(data, formula, fun.aggregate) to get the group means</li>
<li>> cast(dataBetween, formula = ~group, fun.aggregate = mean)</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-O6sKYFRuSio/T8FKHwOpolI/AAAAAAAAA7M/NRdtMP7505E/s1600/20110314_anova_reshape_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-O6sKYFRuSio/T8FKHwOpolI/AAAAAAAAA7M/NRdtMP7505E/s1600/20110314_anova_reshape_2.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The casted data with means</div><br />
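As a cross-check that does not depend on the <em>reshape</em> package, the same kind of group means can be sketched with base R's <span class="Apple-style-span" style="color: #cc0000;">aggregate()</span> function. The data frame below is invented for illustration, and its column names (group and value) are assumptions about how the tutorial dataset is laid out. Note that <span class="Apple-style-span" style="color: #cc0000;">aggregate()</span> returns one row per group, so its layout may differ from the <span class="Apple-style-span" style="color: #cc0000;">cast()</span> output.<br />
<blockquote class="codeBlock"><ol><li>> #hypothetical between groups data (column names are assumed)</li>
<li>> exampleBetween <- data.frame(group = rep(c("a", "b", "c"), each = 3), value = c(1, 2, 3, 2, 3, 4, 3, 4, 5))</li>
<li>> #get the group means using aggregate(formula, data, FUN)</li>
<li>> groupMeans <- aggregate(value ~ group, data = exampleBetween, FUN = mean)</li>
<li>> #display the group means</li>
<li>> groupMeans</li>
</ol></blockquote>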
Note that the <span class="Apple-style-span" style="color: #cc0000;">fun.aggregate</span> argument can also receive a vector of summary statistics functions. This will yield all of the requested descriptives via a single <span class="Apple-style-span" style="color: #cc0000;">cast()</span> function.<br />
<blockquote class="codeBlock"><ol><li>> #cast the between groups data using cast(data, formula, fun.aggregate) to get the group means, variances, and standard deviations</li>
<li>> cast(dataBetween, formula = ~group, fun.aggregate = c(mean, var, sd))</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-L6RrWuSiqQk/T8FKHwiecmI/AAAAAAAAA7Q/LJqgrsh69r0/s1600/20110314_anova_reshape_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="32" src="http://1.bp.blogspot.com/-L6RrWuSiqQk/T8FKHwiecmI/AAAAAAAAA7Q/LJqgrsh69r0/s400/20110314_anova_reshape_3.png" width="400" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The casted data with descriptives</div><h3>Using melt() to Prepare Repeated Measures Data for Pairwise Comparisons</h3>The <span class="Apple-style-span" style="color: #cc0000;">melt()</span> function can be used to morph a repeated measures ANOVA dataset prior to conducting pairwise comparisons. The <span class="Apple-style-span" style="color: #cc0000;">melt()</span> function receives the following primary arguments.<br />
<ul><li>data: the dataset</li>
<li>id.vars: the id variable or a vector of values that can be used as ids</li>
<li>measure.vars: a vector containing the variables to be melted</li>
<li>variable_name: the name of the column containing the melted variables</li>
</ul><blockquote class="codeBlock"><ol><li>> #display the repeated measures data</li>
<li>> dataRepeated</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-GUSmU7e-di8/T8FKH_g9S3I/AAAAAAAAA7U/3H3WDvTKK8A/s1600/20110314_anova_reshape_4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-GUSmU7e-di8/T8FKH_g9S3I/AAAAAAAAA7U/3H3WDvTKK8A/s1600/20110314_anova_reshape_4.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The raw repeated measures data</div><blockquote class="codeBlock"><ol><li>> #melt the repeated measures data using melt(data, id.vars, measure.vars, variable_name) to organize it for pairwise comparisons</li>
<li>> melt(dataRepeated, id.vars = "case", measure.vars = c("valueA", "valueB", "valueC"), variable_name = "abcValues")</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-fd9QdF-ODz0/T8FKIGYFivI/AAAAAAAAA7s/ekBP5LQMkl4/s1600/20110314_anova_reshape_5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-fd9QdF-ODz0/T8FKIGYFivI/AAAAAAAAA7s/ekBP5LQMkl4/s1600/20110314_anova_reshape_5.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The melted repeated measures data</div><br />
Note that the data are now prepared to be used in the <span class="Apple-style-span" style="color: #cc0000;">pairwise.t.test()</span> function. See the <a href="http://rtutorialseries.blogspot.com/2011/01/r-tutorial-series-one-way-anova-with.html" target="_blank">One-Way ANOVA with Pairwise Comparisons</a> tutorial for details on using the <span class="Apple-style-span" style="color: #cc0000;">pairwise.t.test()</span> function.<br />
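For comparison, a similar wide-to-long conversion can be sketched with base R's <span class="Apple-style-span" style="color: #cc0000;">stack()</span> function, followed by a call to <span class="Apple-style-span" style="color: #cc0000;">pairwise.t.test()</span>. The values below are invented, the column names mirror those used in the <span class="Apple-style-span" style="color: #cc0000;">melt()</span> call above, and the paired comparison with a Bonferroni adjustment is only one reasonable choice for repeated measures data.<br />
<blockquote class="codeBlock"><ol><li>> #hypothetical repeated measures data in wide format (values are invented)</li>
<li>> exampleRepeated <- data.frame(case = 1:5, valueA = c(1, 2, 2, 3, 1), valueB = c(2, 3, 3, 4, 3), valueC = c(4, 4, 5, 5, 3))</li>
<li>> #stack the three measurement columns into long format</li>
<li>> exampleLong <- stack(exampleRepeated[, c("valueA", "valueB", "valueC")])</li>
<li>> #rename the stacked columns to match the melted data</li>
<li>> names(exampleLong) <- c("value", "abcValues")</li>
<li>> #carry the case ids along (stack() keeps the rows in column order)</li>
<li>> exampleLong$case <- rep(exampleRepeated$case, times = 3)</li>
<li>> #run paired pairwise comparisons with a Bonferroni adjustment</li>
<li>> pairwise.t.test(exampleLong$value, exampleLong$abcValues, p.adjust.method = "bonferroni", paired = TRUE)</li>
</ol></blockquote>The paired option relies on the cases appearing in the same order within each measurement, which is why the rows are left in the order produced by <span class="Apple-style-span" style="color: #cc0000;">stack()</span>.<br />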
<h3>Complete ANOVA Reshape Example</h3>To see a complete example of how ANOVA data can be organized using the <em>reshape</em> package in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_anova_reshape.txt" target="_blank">ANOVA reshape example (.txt)</a> file.
<h2>R Tutorial Series: ANOVA Pairwise Comparison Methods</h2>
<p>Posted Mon, 07 Mar 2011</p>
When we have a statistically significant effect in ANOVA and an independent variable with more than two levels, we typically want to make follow-up comparisons. There are numerous methods for making pairwise comparisons, and this tutorial will demonstrate how to execute several different techniques in R.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_anova_comparisonMethods.csv" target="_blank">sample data (.csv)</a> used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 30 participants who are divided into three stress reduction treatment groups (mental, physical, and medical). The values are represented on a scale that ranges from 1 to 5. This dataset can be conceptualized as a comparison between three stress treatment programs, one using mental methods, one using physical training, and one using medication. The values represent how effective the treatment programs were at reducing participant's stress levels, with higher numbers indicating higher effectiveness.<br />
<h3>Beginning Steps</h3>To begin, we need to read our dataset into R and store its contents in a variable.<br />
<blockquote class="codeBlock"><ol><li>> #read the dataset into an R variable using the read.csv(file) function</li>
<li>> dataPairwiseComparisons <- read.csv("dataset_ANOVA_OneWayComparisons.csv")</li>
<li>> #display the data</li>
<li>> dataPairwiseComparisons </li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-WxHzPxXIomI/T8FKGlEIFjI/AAAAAAAAA6Q/rtN9ikTjhL8/s1600/20110307_anova_comparisonMethods_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-WxHzPxXIomI/T8FKGlEIFjI/AAAAAAAAA6Q/rtN9ikTjhL8/s1600/20110307_anova_comparisonMethods_1.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The first ten rows of our dataset</div><h3>Omnibus ANOVA</h3>For the purposes of this tutorial, we will assume that the omnibus ANOVA has already been conducted and that the main effect for treatment was statistically significant. For details on this process, see the <a href="http://rtutorialseries.blogspot.com/2011/01/r-tutorial-series-one-way-anova-with.html" target="_blank">One-Way ANOVA with Pairwise Comparisons</a> tutorial, which uses the same dataset.<br />
<h3>Means</h3>Let's also look at the means of our treatment groups. Here, we will use the <span class="Apple-style-span" style="color: #cc0000;">tapply()</span> function, along with the following arguments, to generate a table of means.<br />
<ul><li>X: the data</li>
<li>INDEX: a list() of factor variables</li>
<li>FUN: the function to be applied</li>
</ul><blockquote class="codeBlock"><ol><li>> #use tapply(X, INDEX, FUN) to generate a table displaying each treatment group mean</li>
<li>> tapply(X = dataPairwiseComparisons$StressReduction, INDEX = list(dataPairwiseComparisons$Treatment), FUN = mean)</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-CxsN4gqGd-s/T8FKGizzM5I/AAAAAAAAA6M/gcI3XZwC74Y/s1600/20110307_anova_comparisonMethods_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-CxsN4gqGd-s/T8FKGizzM5I/AAAAAAAAA6M/gcI3XZwC74Y/s1600/20110307_anova_comparisonMethods_2.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The treatment group means</div><h3>Pairwise Comparisons</h3>We will cover five approaches to handling Type I error when making pairwise comparisons: no adjustment, the Bonferroni adjustment, the Holm adjustment, Fisher's LSD, and Tukey's HSD. All of these techniques will be demonstrated on our sample dataset, although the decision as to which to use in a given situation is left up to the reader.<br />
<h4>pairwise.t.test()</h4>Our first three methods will make use of the <span class="Apple-style-span" style="color: #cc0000;">pairwise.t.test()</span> function, which has the following major arguments.<br />
<ul><li>x: the dependent variable</li>
<li>g: the independent variable</li>
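<li>p.adj: the p-value adjustment method (the full argument name is p.adjust.method, which R allows us to abbreviate to p.adj); one of "none", "bonferroni", "holm", "hochberg", "hommel", "BH", or "BY". The "bonferroni", "holm", "hochberg", and "hommel" methods control the family-wise Type I error rate across the comparisons, whereas "BH" and "BY" control the false discovery rate</li>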
</ul><h4>No Adjustment</h4>Using <span class="Apple-style-span" style="color: #cc0000;">p.adj = "none"</span> in the <span class="Apple-style-span" style="color: #cc0000;">pairwise.t.test()</span> function makes no correction for the Type I error rate across the pairwise tests. This technique can be useful for employing methods that are not already built into R functions, such as the Shaffer and Modified Shaffer procedures, which use different alpha level divisors based on the number of levels composing the independent variable. The console results will contain no adjustment, but the researcher can manually evaluate the statistical significance of the p-values under his or her desired alpha level.<br />
<blockquote class="codeBlock"><ol><li>> #use pairwise.t.test(x, g, p.adj) to test the pairwise comparisons between the treatment group means</li>
<li>> #no adjustment</li>
<li>> pairwise.t.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, p.adj = "none")</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-XcyvFcdOW4o/T8FKGzPgwUI/AAAAAAAAA6c/fH5PBt4ahUQ/s1600/20110307_anova_comparisonMethods_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-XcyvFcdOW4o/T8FKGzPgwUI/AAAAAAAAA6c/fH5PBt4ahUQ/s1600/20110307_anova_comparisonMethods_3.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">Pairwise comparisons of treatment group means with no adjustment</div><br />
With no adjustment, the mental-medical and physical-medical comparisons are statistically significant, whereas the mental-physical comparison is not. This suggests that both the mental and physical treatments are superior to the medical treatment, but that there is insufficient statistical support to distinguish between the mental and physical treatments.<br />
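<h4>Bonferroni Adjustment</h4>The Bonferroni adjustment divides the Type I error rate (.05) by the number of tests (in this case, three), so that each comparison is tested at the .017 level. Because this threshold becomes very strict as the number of tests grows, the method is often considered overly conservative. Note that R applies the equivalent correction to the p-values instead: each p-value is multiplied by the number of tests (to a maximum of 1) and can then be compared to the original .05 level. The Bonferroni adjustment can be made using <span class="Apple-style-span" style="color: #cc0000;">p.adj = "bonferroni"</span> in the <span class="Apple-style-span" style="color: #cc0000;">pairwise.t.test()</span> function.<br />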
<blockquote class="codeBlock"><ol><li>> #Bonferroni adjustment</li>
<li>> pairwise.t.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, p.adj = "bonferroni")</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-Ctgt_WxceWU/T8FKG5wRc0I/AAAAAAAAA6k/eHsIsrIZMHY/s1600/20110307_anova_comparisonMethods_4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-Ctgt_WxceWU/T8FKG5wRc0I/AAAAAAAAA6k/eHsIsrIZMHY/s1600/20110307_anova_comparisonMethods_4.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">Pairwise comparisons of treatment group means using Bonferroni adjustment</div><br />
Using the Bonferroni adjustment, only the mental-medical comparison is statistically significant. This suggests that the mental treatment is superior to the medical treatment, but that there is insufficient statistical support to distinguish between the mental and physical treatments and the physical and medical treatments. Notice that these results are more conservative than with no adjustment.<br />
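The arithmetic behind this adjustment can be sketched with the <span class="Apple-style-span" style="color: #cc0000;">p.adjust()</span> function from base R. The p-values below are hypothetical and chosen only for illustration; they are not the results from our dataset.<br />
<blockquote class="codeBlock"><ol><li>> #hypothetical unadjusted p-values for three pairwise comparisons (illustration only)</li>
<li>> pRaw <- c(0.004, 0.020, 0.300)</li>
<li>> #Bonferroni adjustment using p.adjust(p, method)</li>
<li>> pBonferroni <- p.adjust(pRaw, method = "bonferroni")</li>
<li>> #the same adjustment by hand: multiply each p-value by the number of tests, to a maximum of 1</li>
<li>> pManual <- pmin(1, pRaw * length(pRaw))</li>
<li>> #confirm that both approaches agree</li>
<li>> all.equal(pBonferroni, pManual)</li>
</ol></blockquote>Each adjusted p-value can then be compared to the original .05 level, which is equivalent to comparing each unadjusted p-value to .05/3.<br />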
<h4>Holm Adjustment</h4>The Holm adjustment orders the p-values from smallest to largest and tests each one sequentially against a Type I error rate whose divisor shrinks with each consecutive test, stopping at the first comparison that is not statistically significant. In our case, this means that our smallest p-value is tested at the .05/3 level (.017), the second at the .05/2 level (.025), and the third at the .05/1 level (.05). This method is generally considered superior to the Bonferroni adjustment and can be employed using <span class="Apple-style-span" style="color: #cc0000;">p.adj = "holm"</span> in the <span class="Apple-style-span" style="color: #cc0000;">pairwise.t.test()</span> function.<br />
<blockquote class="codeBlock"><ol><li>> #Holm adjustment</li>
<li>> pairwise.t.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, p.adj = "holm")</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-aThW1ZFASJ4/T8FKHGP-8VI/AAAAAAAAA6o/5vxVz5DqX-c/s1600/20110307_anova_comparisonMethods_5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-aThW1ZFASJ4/T8FKHGP-8VI/AAAAAAAAA6o/5vxVz5DqX-c/s1600/20110307_anova_comparisonMethods_5.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">Pairwise comparisons of treatment group means using Holm adjustment</div><br />
Using the Holm procedure, our results are practically (but not mathematically) identical to using no adjustment.<br />
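The sequential logic of the Holm adjustment can be sketched with the <span class="Apple-style-span" style="color: #cc0000;">p.adjust()</span> function from base R. As before, the p-values are hypothetical and serve only as an illustration; the variable names are not part of the original example.<br />
<blockquote class="codeBlock"><ol><li>> #hypothetical unadjusted p-values, already sorted from smallest to largest (illustration only)</li>
<li>> pRaw <- c(0.004, 0.020, 0.300)</li>
<li>> #Holm adjustment using p.adjust(p, method)</li>
<li>> pHolm <- p.adjust(pRaw, method = "holm")</li>
<li>> #the same adjustment by hand: multiply the smallest p-value by 3, the next by 2, and the largest by 1</li>
<li>> multipliers <- length(pRaw):1</li>
<li>> #cummax() keeps the adjusted p-values from decreasing along the sequence</li>
<li>> pManual <- pmin(1, cummax(pRaw * multipliers))</li>
<li>> #confirm that both approaches agree</li>
<li>> all.equal(pHolm, pManual)</li>
</ol></blockquote>Only the smallest p-value receives the full Bonferroni multiplier, which is why the Holm adjustment is never more conservative than the Bonferroni adjustment.<br />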
<h4>LSD Method</h4>The Fisher Least Significant Difference (LSD) method essentially does not correct for the Type I error rate for multiple comparisons and is generally not recommended relative to other options. However, should the need arise to employ this method, one should seek out the <span class="Apple-style-span" style="color: #cc0000;">LSD.test()</span> function in the <em>agricolae</em> package, which has the following major arguments.<br />
<ul><li>y: the dependent variable</li>
<li>trt: the independent variable</li>
<li>DFerror: the degrees of freedom error</li>
<li>MSerror: the mean squared error</li>
</ul>Note that the <span class="Apple-style-span" style="color: #cc0000;">DFerror</span> and <span class="Apple-style-span" style="color: #cc0000;">MSerror</span> can be found in the omnibus ANOVA table.<br />
<blockquote class="codeBlock"><ol><li>> #load the agricolae package (install first, if necessary)</li>
<li>> library(agricolae)</li>
<li>> #LSD method</li>
<li>> #use LSD.test(y, trt, DFerror, MSerror) to test the pairwise comparisons between the treatment group means</li>
<li>> #DFerror = 30 participants - 3 groups = 27; MSerror = 1.13 (both from the omnibus ANOVA table)</li>
<li>> LSD.test(dataPairwiseComparisons$StressReduction, dataPairwiseComparisons$Treatment, 27, 1.13)</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-8go73ribmyY/T8FKHLRNFkI/AAAAAAAAA64/WG_RNorcXFQ/s1600/20110307_anova_comparisonMethods_6.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-8go73ribmyY/T8FKHLRNFkI/AAAAAAAAA64/WG_RNorcXFQ/s1600/20110307_anova_comparisonMethods_6.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">Pairwise comparisons of treatment group means using LSD method</div><br />
Using the LSD method, our results are practically (but not mathematically) identical to using no adjustment or the Holm procedure.<br />
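Rather than copying the <span class="Apple-style-span" style="color: #cc0000;">DFerror</span> and <span class="Apple-style-span" style="color: #cc0000;">MSerror</span> values from the printed ANOVA table, both can be pulled from the fitted model. The following is a minimal sketch that uses only base R; the demoData object and its values are simulated stand-ins and do not reproduce our dataset.<br />
<blockquote class="codeBlock"><ol><li>> #simulated stand-in data: 30 participants in three groups of 10 (hypothetical values)</li>
<li>> set.seed(1)</li>
<li>> demoData <- data.frame(Treatment = factor(rep(c("medical", "mental", "physical"), each = 10)), StressReduction = sample(1:5, 30, replace = TRUE))</li>
<li>> #fit the omnibus one-way ANOVA</li>
<li>> demoModel <- aov(StressReduction ~ Treatment, demoData)</li>
<li>> #degrees of freedom error = number of participants - number of groups</li>
<li>> dfError <- df.residual(demoModel)</li>
<li>> #mean squared error = residual sum of squares / degrees of freedom error</li>
<li>> msError <- sum(residuals(demoModel)^2) / dfError</li>
<li>> #display the values that would be passed to LSD.test(y, trt, DFerror, MSerror)</li>
<li>> dfError</li>
<li>> msError</li>
</ol></blockquote>Extracting the values this way avoids transcription mistakes, such as entering the error sum of squares where the degrees of freedom belong.<br />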
<h4>HSD Method</h4>The Tukey Honest Significant Difference (HSD) method controls for the Type I error rate across multiple comparisons and is generally considered an acceptable technique. This method can be executed using the <span class="Apple-style-span" style="color: #cc0000;">TukeyHSD(x)</span> function, where <span class="Apple-style-span" style="color: #cc0000;">x</span> is a fitted model object created using the <span class="Apple-style-span" style="color: #cc0000;">aov(formula, data)</span> function. Note that in this application, the <span class="Apple-style-span" style="color: #cc0000;">aov(formula, data)</span> function takes the same formula and data arguments as the <span class="Apple-style-span" style="color: #cc0000;">lm(formula, data)</span> function that we are already familiar with from <a href="http://rtutorialseries.blogspot.com/2009/11/r-tutorial-series-simple-linear.html" target="_blank">linear regression</a>.<br />
<blockquote class="codeBlock"><ol><li>> #HSD method</li>
<li>> #use TukeyHSD(x), in tandem with aov(formula, data), to test the pairwise comparisons between the treatment group means</li>
<li>> TukeyHSD(aov(StressReduction ~ Treatment, dataPairwiseComparisons))</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-NbSnQBu_utw/T8FKHRmswHI/AAAAAAAAA7A/53NZVpklO0g/s1600/20110307_anova_comparisonMethods_7.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-NbSnQBu_utw/T8FKHRmswHI/AAAAAAAAA7A/53NZVpklO0g/s1600/20110307_anova_comparisonMethods_7.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">Pairwise comparisons of treatment group means using HSD method</div><br />
Using the HSD method, our results are practically (but not mathematically) identical to using the Bonferroni, Holm, or LSD methods. <br />
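If the HSD results are needed for further work, rather than only being read from the console, they can be stored and indexed. The following is a minimal sketch on simulated data; the demoData and demoTukey names and the values are hypothetical and do not reproduce our dataset.<br />
<blockquote class="codeBlock"><ol><li>> #simulated stand-in data: 30 participants in three groups of 10 (hypothetical values)</li>
<li>> set.seed(1)</li>
<li>> demoData <- data.frame(Treatment = factor(rep(c("medical", "mental", "physical"), each = 10)), StressReduction = sample(1:5, 30, replace = TRUE))</li>
<li>> #store the TukeyHSD(x) results instead of only printing them</li>
<li>> demoTukey <- TukeyHSD(aov(StressReduction ~ Treatment, demoData))</li>
<li>> #the comparisons for each factor are stored in a matrix named after that factor</li>
<li>> demoTukey$Treatment</li>
<li>> #extract only the adjusted p-values for the three comparisons</li>
<li>> demoTukey$Treatment[, "p adj"]</li>
</ol></blockquote>Each row of the matrix is one pairwise comparison, with the mean difference, the confidence interval bounds, and the adjusted p-value in its columns.<br />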
<h3>Complete Pairwise Comparisons Example</h3>To see a complete example of how various pairwise comparison techniques can be applied in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_anova_comparisonMethods.txt" target="_blank">ANOVA pairwise comparisons example (.txt)</a> file.<br />
<p>Original post: <a href="http://rtutorialseries.blogspot.com/2011/03/r-tutorial-series-anova-pairwise.html" target="_blank">http://rtutorialseries.blogspot.com/2011/03/r-tutorial-series-anova-pairwise.html</a></p>
<p><strong>R Tutorial Series: Two-Way ANOVA with Unequal Sample Sizes</strong><br />
John Quick, Mon, 28 Feb 2011</p>
When the sample sizes within the levels of our independent variables are not equal, we have to handle our ANOVA differently than in the typical two-way case. This tutorial will demonstrate how to conduct a two-way ANOVA in R when the sample sizes within each level of the independent variables are not the same.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_anova_twoWay_unequalSample.csv" target="_blank">sample data (.csv)</a> used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 30 students who were exposed to one of two learning environments (offline or online) and one of two methods of instruction (classroom or tutor), then tested on a math assessment. Possible math scores range from 0 to 100 and indicate how well each student performed on the math assessment. Each student participated in either an offline or online learning environment and received either classroom instruction (i.e. one to many) or instruction from a personal tutor (i.e. one to one).<br />
<h3>Beginning Steps</h3>To begin, we need to read our dataset into R and store its contents in a variable.<br />
<blockquote class="codeBlock"><ol><li>> #read the dataset into an R variable using the read.csv(file) function</li>
<li>> dataTwoWayUnequalSample <- read.csv("dataset_ANOVA_TwoWayUnequalSample.csv")</li>
<li>> #display the data</li>
<li>> dataTwoWayUnequalSample</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-rFjx6KNhpcA/T8FKFv3GnVI/AAAAAAAAA5c/vAJ8iqcOMOU/s1600/20110228_anova_twoWay_unequalSample_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-rFjx6KNhpcA/T8FKFv3GnVI/AAAAAAAAA5c/vAJ8iqcOMOU/s1600/20110228_anova_twoWay_unequalSample_1.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The first ten rows of our dataset</div><h3>Unequal Sample Sizes</h3>In our study, 16 students participated in the online environment, whereas only 14 participated in the offline environment. Further, 20 students received classroom instruction, whereas only 10 received personal tutor instruction. As such, we should take action to compensate for the unequal sample sizes in order to retain the validity of our analysis. Generally, this comes down to examining the correlation between the factors and the causes of the unequal sample sizes en route to choosing whether to use weighted or unweighted means - a decision which can drastically impact the results of an ANOVA. This tutorial will demonstrate how to conduct ANOVA using both weighted and unweighted means. Thus, the ultimate decision as to the use of weighted or unweighted means is left up to each individual and his or her specific circumstances.<br />
<h3>Weighted Means</h3>First, let's suppose that we decided to go with weighted means, which take into account the correlation between our factors that results from having treatment groups with different sample sizes. A weighted mean is calculated by simply adding up all of the values and dividing by the total number of values. Consequently, we can easily derive the weighted means for each treatment group using our <span class="Apple-style-span" style="color: #cc0000;">subset(data, condition)</span> and <span class="Apple-style-span" style="color: #cc0000;">mean(data)</span> functions.<br />
<blockquote class="codeBlock"><ol><li>> #use subset(data, condition) to create subsets for each treatment group</li>
<li>> #offline subset</li>
<li>> offlineData <- subset(dataTwoWayUnequalSample, dataTwoWayUnequalSample$environment == "offline")</li>
<li>> #online subset</li>
<li>> onlineData <- subset(dataTwoWayUnequalSample, dataTwoWayUnequalSample$environment == "online")</li>
<li>> #classroom subset</li>
<li>> classroomData <- subset(dataTwoWayUnequalSample, dataTwoWayUnequalSample$instruction == "classroom")</li>
<li>> #tutor subset</li>
<li>> tutorData <- subset(dataTwoWayUnequalSample, dataTwoWayUnequalSample$instruction == "tutor")</li>
<li>> #use mean(data) to calculate the weighted means for each treatment group</li>
<li>> #offline weighted mean</li>
<li>> mean(offlineData$math)</li>
<li>> #online weighted mean</li>
<li>> mean(onlineData$math)</li>
<li>> #classroom weighted mean</li>
<li>> mean(classroomData$math)</li>
<li>> #tutor weighted mean</li>
<li>> mean(tutorData$math)</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-hainF6pJ8MM/T8FKF9WBGjI/AAAAAAAAA5s/4UsxV4UTlvA/s1600/20110228_anova_twoWay_unequalSample_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-hainF6pJ8MM/T8FKF9WBGjI/AAAAAAAAA5s/4UsxV4UTlvA/s1600/20110228_anova_twoWay_unequalSample_2.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The weighted means for the environment and instruction conditions</div><h3>ANOVA using Type I Sums of Squares</h3>When applying weighted means, it is suggested that we use Type I sums of squares (SS) in our ANOVA. Type I happens to be the default SS used in our standard <span class="Apple-style-span" style="color: #cc0000;">anova(object)</span> function, which will be used to execute our analysis. Note that in the case of two-way ANOVA, the ordering of our independent variables matters when using weighted means. Therefore, we must run our ANOVA two times, once with each independent variable taking the lead. However, the interaction effect is not affected by the ordering of the independent variables.<br />
<blockquote class="codeBlock"><ol><li>> #use anova(object) to execute the Type I SS ANOVAs</li>
<li>> #environment ANOVA</li>
<li>> anova(lm(math ~ environment * instruction, dataTwoWayUnequalSample))</li>
<li>> #instruction ANOVA</li>
<li>> anova(lm(math ~ instruction * environment, dataTwoWayUnequalSample))</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-JXle4szoxwc/T8FKF_qt39I/AAAAAAAAA6I/ZkvQRg8zk20/s1600/20110228_anova_twoWay_unequalSample_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-JXle4szoxwc/T8FKF_qt39I/AAAAAAAAA6I/ZkvQRg8zk20/s1600/20110228_anova_twoWay_unequalSample_3.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The Type I SS ANOVA results. Note the differences in main effects based on the ordering of the independent variables.</div><br />
These results indicate statistically nonsignificant main effects for both the environment and instruction variables, as well as a nonsignificant interaction between them.<br />
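The effect of variable ordering on Type I SS can be sketched on simulated data. The group totals below match our study (14 offline, 16 online, 20 classroom, 10 tutor), but the individual cell sizes and the scores are assumptions made only for illustration, as are the demoData, envFirst, and insFirst names.<br />
<blockquote class="codeBlock"><ol><li>> #simulated unbalanced data (hypothetical cell sizes and scores)</li>
<li>> set.seed(42)</li>
<li>> demoData <- data.frame(environment = factor(rep(c("offline", "online"), times = c(14, 16))), instruction = factor(c(rep(c("classroom", "tutor"), times = c(10, 4)), rep(c("classroom", "tutor"), times = c(10, 6)))), math = round(rnorm(30, mean = 70, sd = 10)))</li>
<li>> #Type I SS with environment entered first, then with instruction entered first</li>
<li>> envFirst <- anova(lm(math ~ environment * instruction, demoData))</li>
<li>> insFirst <- anova(lm(math ~ instruction * environment, demoData))</li>
<li>> #the main effect SS depend on the order of entry</li>
<li>> envFirst[c("environment", "instruction"), "Sum Sq"]</li>
<li>> insFirst[c("environment", "instruction"), "Sum Sq"]</li>
<li>> #the interaction SS (third row of each table) do not</li>
<li>> envFirst[3, "Sum Sq"]</li>
<li>> insFirst[3, "Sum Sq"]</li>
</ol></blockquote>In each table, the first variable is tested without adjusting for the second, whereas the second is tested after adjusting for the first. This is why the main effect of interest is usually read from the table in which its variable takes the lead.<br />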
<h3>Unweighted Means</h3>Now let's turn to using unweighted means, which essentially ignore the correlation between the independent variables that arises from unequal sample sizes. An unweighted mean is calculated by taking the average of the individual group means. Thus, we can derive the unweighted mean for each level of one independent variable by summing its group means across the levels of the other independent variable and dividing by the number of those levels. For instance, to find the unweighted mean for the offline environment, we will add the means for our offline classroom and offline tutor groups, then divide by two.<br />
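The difference between the two kinds of means can be sketched compactly with <span class="Apple-style-span" style="color: #cc0000;">tapply()</span> on simulated data. The cell sizes, scores, and object names below are hypothetical and serve only as an illustration.<br />
<blockquote class="codeBlock"><ol><li>> #simulated unbalanced data (hypothetical cell sizes and scores)</li>
<li>> set.seed(42)</li>
<li>> demoData <- data.frame(environment = factor(rep(c("offline", "online"), times = c(14, 16))), instruction = factor(c(rep(c("classroom", "tutor"), times = c(10, 4)), rep(c("classroom", "tutor"), times = c(10, 6)))), math = round(rnorm(30, mean = 70, sd = 10)))</li>
<li>> #cell means: rows are environments, columns are instruction methods</li>
<li>> cellMeans <- tapply(demoData$math, list(demoData$environment, demoData$instruction), mean)</li>
<li>> #unweighted means: the simple average of the cell means within each level</li>
<li>> unweightedEnvironment <- rowMeans(cellMeans)</li>
<li>> unweightedInstruction <- colMeans(cellMeans)</li>
<li>> #weighted means: the average of all individual scores within each level</li>
<li>> weightedEnvironment <- tapply(demoData$math, demoData$environment, mean)</li>
<li>> weightedInstruction <- tapply(demoData$math, demoData$instruction, mean)</li>
<li>> #display both sets of environment means for comparison</li>
<li>> unweightedEnvironment</li>
<li>> weightedEnvironment</li>
</ol></blockquote>The two sets of means differ because the larger cells pull the weighted means toward their own values. The same unweighted means can also be derived step by step with the <span class="Apple-style-span" style="color: #cc0000;">mean(data)</span> and <span class="Apple-style-span" style="color: #cc0000;">subset(data, condition)</span> functions.<br />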
<blockquote class="codeBlock"><ol><li>> #use mean(data) and subset(data, condition) to calculate the unweighted means for each treatment group</li>
<li>> #offline unweighted mean = (classroom offline mean + tutor offline mean) / 2</li>
<li>> (mean(subset(offlineData$math, offlineData$instruction == "classroom")) + mean(subset(offlineData$math, offlineData$instruction == "tutor"))) / 2</li>
<li>> #online unweighted mean = (classroom online mean + tutor online mean) / 2</li>
<li>> (mean(subset(onlineData$math, onlineData$instruction == "classroom")) + mean(subset(onlineData$math, onlineData$instruction == "tutor"))) / 2</li>
<li>> #classroom unweighted mean = (offline classroom mean + online classroom mean) / 2</li>
<li>> (mean(subset(classroomData$math, classroomData$environment == "offline")) + mean(subset(classroomData$math, classroomData$environment == "online"))) / 2</li>
<li>> #tutor unweighted mean = (offline tutor mean + online tutor mean) / 2</li>
<li>> (mean(subset(tutorData$math, tutorData$environment == "offline")) + mean(subset(tutorData$math, tutorData$environment == "online"))) / 2</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-qjGjU34pu74/T8FKGWlemHI/AAAAAAAAA54/SSDMaoV0Gs8/s1600/20110228_anova_twoWay_unequalSample_4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-qjGjU34pu74/T8FKGWlemHI/AAAAAAAAA54/SSDMaoV0Gs8/s1600/20110228_anova_twoWay_unequalSample_4.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The unweighted means for the environment and instruction conditions</div><h3>ANOVA using Type III Sums of Squares</h3>When applying unweighted means, it is suggested that we use Type III sums of squares (SS) in our ANOVA. Type III SS can be set using the <span class="Apple-style-span" style="color: #cc0000;">type</span> argument in the <span class="Apple-style-span" style="color: #cc0000;">Anova(mod, type)</span> function, which is a member of the <em>car</em> package.<br />
<blockquote class="codeBlock"><ol><li>> #load the car package (install first, if necessary)</li>
<li>> library(car)</li>
<li>> #use the Anova(mod, type) function to conduct the Type III SS ANOVA</li>
<li>> Anova(lm(math ~ environment * instruction, dataTwoWayUnequalSample), type = "3")</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-XdfQtjUha6M/T8FKGexEyHI/AAAAAAAAA58/3390cOxn0J4/s1600/20110228_anova_twoWay_unequalSample_5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-XdfQtjUha6M/T8FKGexEyHI/AAAAAAAAA58/3390cOxn0J4/s1600/20110228_anova_twoWay_unequalSample_5.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The Type III SS ANOVA results.</div><br />
Once again, our ANOVA results indicate statistically nonsignificant main effects for both the environment and instruction variables, as well as a nonsignificant interaction between them. However, it is worth noting that both the means and p-values are different when using unweighted means and Type III SS compared to weighted means and Type I SS. In certain cases, this difference can be quite pronounced and lead to entirely different outcomes between the two methods. Hence, choosing the appropriate means and SS for a given analysis is a matter that should be approached with conscious consideration.<br />
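One caveat is worth adding: Type III tests of main effects in a model that contains an interaction depend on how the factors are coded, and sum-to-zero contrasts are generally advised in place of R's default treatment contrasts. The following is a minimal sketch on simulated data that uses only base R. It relies on <span class="Apple-style-span" style="color: #cc0000;">drop1()</span>, a commonly used way to obtain Type III-style tests, rather than on the <em>car</em> package; the data, cell sizes, and object names are hypothetical, and any agreement with <span class="Apple-style-span" style="color: #cc0000;">Anova(mod, type)</span> under the same contrasts should be treated as an assumption to verify.<br />
<blockquote class="codeBlock"><ol><li>> #simulated unbalanced data (hypothetical cell sizes and scores)</li>
<li>> set.seed(42)</li>
<li>> demoData <- data.frame(environment = factor(rep(c("offline", "online"), times = c(14, 16))), instruction = factor(c(rep(c("classroom", "tutor"), times = c(10, 4)), rep(c("classroom", "tutor"), times = c(10, 6)))), math = round(rnorm(30, mean = 70, sd = 10)))</li>
<li>> #fit the model with sum-to-zero contrasts for both factors</li>
<li>> demoModel <- lm(math ~ environment * instruction, demoData, contrasts = list(environment = "contr.sum", instruction = "contr.sum"))</li>
<li>> #test each term after adjusting for all of the other terms in the model</li>
<li>> demoTypeIII <- drop1(demoModel, . ~ ., test = "F")</li>
<li>> demoTypeIII</li>
</ol></blockquote>Unlike the Type I SS tables, this table does not change when the order of the independent variables in the formula is reversed.<br />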
<h3>Pairwise Comparisons</h3>Note that since our independent variables contain only two levels, there is no need to conduct follow-up comparisons. However, should you reach this point with a statistically significant independent variable of more than two levels, you could conduct pairwise comparisons in the same manner as demonstrated in the <a href="http://rtutorialseries.blogspot.com/2011/01/r-tutorial-series-two-way-anova-with.html" target="_blank">Two-Way ANOVA with Comparisons</a> tutorial.<br />
<h3>Complete Two-Way ANOVA with Unequal Sample Sizes Example</h3>To see a complete example of how two-way ANOVA with unequal sample sizes can be conducted in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_anova_twoWay_unequalSample.txt" target="_blank">two-way ANOVA with unequal sample sizes example (.txt)</a> file.<br />
<p>Original post: <a href="http://rtutorialseries.blogspot.com/2011/02/r-tutorial-series-two-way-anova-with_28.html" target="_blank">http://rtutorialseries.blogspot.com/2011/02/r-tutorial-series-two-way-anova-with_28.html</a></p>
<p><strong>R Tutorial Series: Two-Way Repeated Measures ANOVA</strong><br />
John Quick, Mon, 21 Feb 2011</p>
Repeated measures data require a different analysis procedure than our typical two-way ANOVA and consequently follow a different R process. This tutorial will demonstrate how to conduct two-way repeated measures ANOVA in R using the <span class="Apple-style-span" style="color: #cc0000;">Anova()</span> function from the <em>car</em> package.<br />
Note that the two-way repeated measures ANOVA process can be very complex to organize and execute in R. Although it has been distilled into just a few small steps in this guide, it is recommended that you fully and precisely complete the example before experimenting with your own data. As you will see, organization of the raw data is critical to successfully conducting a two-way repeated measures ANOVA using the demonstrated technique.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_anova_twoWay_repeatedMeasures.csv" target="_blank">sample data (.csv)</a> and <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_anova_twoWay_repeatedMeasures_idata.csv" target="_blank">sample idata frame (.csv)</a> used in this tutorial. Be sure to right-click and save the files to your R working directory. This dataset contains a hypothetical sample of 30 participants whose interest in school and interest in work was measured at three different ages (10, 15, and 20). The interest values are represented on a scale that ranges from 1 to 5 and indicate how interested each participant was in a given topic at each given age.<br />
<h3>Data Setup</h3>Notice that our data are arranged differently for a repeated measures ANOVA. In a typical two-way ANOVA, we would place all of the measured values in a single column and identify the levels of each independent variable in additional columns, as demonstrated in this <a href="http://www.dailyi.org/blogFiles/RTutorialSeries/dataset_ANOVA_TwoWay.csv" target="_blank">sample two-way dataset</a>. In a two-way repeated measures ANOVA, we instead combine each independent variable with its time interval, thus yielding columns for each pairing. Hence, rather than having one vertical column for school interest and one for work interest, with a second column for age, we have six separate columns for interest: three for school interest and three for work interest, one at each age level. The following graphic is intended to help demonstrate this organization method.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-JbtPXVx43jE/T8FKE7-rC0I/AAAAAAAAA48/7b5aHtJ-0gk/s1600/20110221_anova_twoWay_repeatedMeasures_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-JbtPXVx43jE/T8FKE7-rC0I/AAAAAAAAA48/7b5aHtJ-0gk/s1600/20110221_anova_twoWay_repeatedMeasures_1.png" /></a></div><br />
<div align="center"></div><div align="center">Treat time as if it were an independent variable. Then combine each independent variable with each level of time and arrange the columns horizontally.</div><h3>Beginning Steps</h3>To begin, we need to read our dataset into R and store its contents in a variable.<br />
<blockquote class="codeBlock"><ol><li>> #read the dataset into an R variable using the read.csv(file) function</li>
<li>> dataTwoWayRepeatedMeasures <- read.csv("dataset_ANOVA_TwoWayRepeatedMeasures.csv")</li>
<li>> #display the data</li>
<li>> #notice the atypical column arrangement for repeated measures data</li>
<li>> dataTwoWayRepeatedMeasures</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-aIoropoyPEQ/T8FKFEByeoI/AAAAAAAAA5I/CK3AohvwnHk/s1600/20110221_anova_twoWay_repeatedMeasures_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-aIoropoyPEQ/T8FKFEByeoI/AAAAAAAAA5I/CK3AohvwnHk/s1600/20110221_anova_twoWay_repeatedMeasures_2.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The first ten rows of our dataset</div><h3>idata Frame</h3>Another item that we need to import for this analysis is our idata frame. This object will be used in our <span class="Apple-style-span" style="color: #cc0000;">Anova()</span> function to define the structure of our analysis.<br />
<blockquote class="codeBlock"><ol><li>> #read the idata frame into an R variable</li>
<li>> idataTwoWayRepeatedMeasures <- read.csv("idata_ANOVA_TwoWayRepeatedMeasures.csv")</li>
<li>> #display the idata frame</li>
<li>> #notice the text values and correspondence between our idata rows and the columns in our dataset</li>
<li>> idataTwoWayRepeatedMeasures</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-ebN1RUVUX1I/T8FKFIjtZDI/AAAAAAAAA5M/KgOO38DmlKo/s1600/20110221_anova_twoWay_repeatedMeasures_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-ebN1RUVUX1I/T8FKFIjtZDI/AAAAAAAAA5M/KgOO38DmlKo/s1600/20110221_anova_twoWay_repeatedMeasures_3.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The idata frame</div><br />
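The idata frame shown above can also be built directly in R instead of being read from a file. This is a minimal sketch: the <em>Interest</em> column name and the idataDemo object name are assumptions, while the row values follow the column names of our dataset.<br />
<blockquote class="codeBlock"><ol><li>> #build an equivalent idata frame in R (the Interest column name is an assumption)</li>
<li>> idataDemo <- data.frame(Interest = factor(rep(c("school", "work"), each = 3)), Age = factor(rep(c("Age10", "Age15", "Age20"), times = 2)))</li>
<li>> #display the idata frame</li>
<li>> idataDemo</li>
<li>> #reading each row horizontally reproduces the column names of the dataset, in order</li>
<li>> paste0(idataDemo$Interest, idataDemo$Age)</li>
</ol></blockquote>Both columns are stored as factors with text labels, which keeps the age values from being treated as numbers.<br />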
Note that it is critical that your idata frame take the demonstrated form for this technique to work. I experimented with several alternative, perhaps more intuitive, layouts without success. It is particularly important to notice that both columns of the idata frame contain text values (not numerical ones - hence the repeated prefixing of <em>Age</em> to the values in every row of the <em>Age</em> column). Additionally, if you read the rows of the idata frame horizontally, you will see that they correspond precisely to the columns of our dataset. The following graphic is intended to help demonstrate this organization method.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-E2NK7JAJYIY/T8FKFWwCG5I/AAAAAAAAA5Y/nNBFeviI59Y/s1600/20110221_anova_twoWay_repeatedMeasures_4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-E2NK7JAJYIY/T8FKFWwCG5I/AAAAAAAAA5Y/nNBFeviI59Y/s1600/20110221_anova_twoWay_repeatedMeasures_4.png" /></a></div><div align="center"><br />
</div><div align="center"><br />
</div><div align="center">Use only text values in your idata frame. Ensure that the rows of your idata frame correspond to the columns in your dataset.</div><h3>Linear Model</h3>Prior to executing our analysis, we must follow two steps to formulate our linear model to be used in the <span class="Apple-style-span" style="color: #cc0000;">Anova()</span> function.<br />
<h4>Step 1: Bind the Columns</h4><blockquote class="codeBlock"><ol><li>> #use cbind() to bind the columns of the original dataset</li>
<li>> interestBind <- cbind(dataTwoWayRepeatedMeasures$schoolAge10, dataTwoWayRepeatedMeasures$schoolAge15, dataTwoWayRepeatedMeasures$schoolAge20, dataTwoWayRepeatedMeasures$workAge10, dataTwoWayRepeatedMeasures$workAge15, dataTwoWayRepeatedMeasures$workAge20)</li>
</ol></blockquote><h4>Step 2: Define the Model</h4><blockquote class="codeBlock"><ol><li>> #use lm() to generate a linear model using the bound columns from step 1</li>
<li>> interestModel <- lm(interestBind ~ 1)</li>
</ol></blockquote><h3><em>Anova(mod, idata, idesign)</em> Function</h3>Typically, researchers will choose one of several techniques for analyzing repeated measures data, such as an epsilon-correction method, like Huynh-Feldt or Greenhouse-Geisser, or a multivariate method, like Wilks' Lambda or Hotelling's Trace. Conveniently, having already prepared our data, we can employ a single <span class="Apple-style-span" style="color: #cc0000;">Anova(mod, idata, idesign)</span> function from the <em>car</em> package to yield all of the relevant repeated measures results. This allows us simplicity in that only a single function is required, regardless of the technique that we wish to employ. Thus, witnessing our outcomes becomes as simple as locating the desired method in the cleanly printed results.<br />
Our <span class="Apple-style-span" style="color: #cc0000;">Anova(mod, idata, idesign)</span> function will be composed of three arguments. First, <span class="Apple-style-span" style="color: #cc0000;">mod</span> will contain our linear model. Second, <span class="Apple-style-span" style="color: #cc0000;">idata</span> will contain our idata frame. Third, <span class="Apple-style-span" style="color: #cc0000;">idesign</span> will contain a multiplication of the column headings from our idata frame (in other words, our independent variables), preceded by a tilde (~). Thus, our final function takes on the following form.<br />
<blockquote class="codeBlock"><ol><li>> #load the car package (install first, if necessary)</li>
<li>> library(car)</li>
<li>> #compose the Anova(mod, idata, idesign) function</li>
<li>> analysis <- Anova(interestModel, idata = idataTwoWayRepeatedMeasures, idesign = ~Interest * Age)</li>
</ol></blockquote><h3>Results Summary</h3>Finally, we can use the <span class="Apple-style-span" style="color: #cc0000;">summary(object)</span> function to visualize the results of our repeated measures ANOVA.<br />
<blockquote class="codeBlock"><ol><li>> #use summary(object) to visualize the results of the repeated measures ANOVA</li>
<li>> summary(analysis)</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-0Gm4chKy_VY/T8FKFQPjb0I/AAAAAAAAA5o/iAWGd-_CEso/s1600/20110221_anova_twoWay_repeatedMeasures_5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-0Gm4chKy_VY/T8FKFQPjb0I/AAAAAAAAA5o/iAWGd-_CEso/s1600/20110221_anova_twoWay_repeatedMeasures_5.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">Relevant segment of repeated measures ANOVA results</div><br />
Supposing that we are interested in the Wilks' Lambda method, we can see that there is a statistically significant interaction between interest type (school versus work) and age group (<em>p</em> < .001). This suggests that we should further examine our data at the level of simple main effects. For more information on investigating simple main effects, see the <a href="http://rtutorialseries.blogspot.com/2011/02/r-tutorial-series-two-way-anova-with.html" target="_blank">Two-Way ANOVA with Interactions and Simple Main Effects</a> tutorial. Of course, in this case of repeated measures ANOVA, another way to break the data down would be to run two <a href="http://rtutorialseries.blogspot.com/2011/02/r-tutorial-series-one-way-repeated.html" target="_blank">one-way repeated measures ANOVAs</a>, one for each of the independent variables. In either instance, <a href="http://rtutorialseries.blogspot.com/2011/01/r-tutorial-series-two-way-anova-with.html" target="_blank">pairwise comparisons</a> can be conducted to determine the significance of the differences between the levels of any significant effects.<br />
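The whole procedure above can be condensed into one short sketch. The data here are simulated random values rather than the tutorial's dataset, the idata level labels are assumed, and the names simData, simIdata, simBind, simModel, and simAnalysis are hypothetical. The analysis step is skipped if the car package is not installed.

```r
# Sketch with simulated data: the values below are random, not the tutorial's dataset
set.seed(1)
simData <- as.data.frame(matrix(sample(1:5, 30 * 6, replace = TRUE), nrow = 30))
names(simData) <- c("schoolAge10", "schoolAge15", "schoolAge20",
                    "workAge10", "workAge15", "workAge20")

# idata rows follow the same order as the response columns (labels are assumed)
simIdata <- data.frame(
  Interest = factor(rep(c("school", "work"), each = 3)),
  Age      = factor(rep(c("Age10", "Age15", "Age20"), times = 2))
)

# bind the response columns and fit the intercept-only multivariate model
simBind  <- as.matrix(simData)
simModel <- lm(simBind ~ 1)

# run the analysis only if the car package is installed
if (requireNamespace("car", quietly = TRUE)) {
  simAnalysis <- car::Anova(simModel, idata = simIdata, idesign = ~Interest * Age)
  print(simAnalysis)  # multivariate tests for Interest, Age, and Interest:Age
}
```

Because the simulated values are random, the printed tests carry no meaning of their own. The sketch is only meant to show how the pieces fit together.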
<h3>Complete Two-Way Repeated Measures ANOVA Example</h3>To see a complete example of how two-way repeated measures ANOVA can be conducted in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_anova_twoWay_repeatedMeasures.txt" target="_blank">two-way repeated measures ANOVA example (.txt)</a> file.<br />
<h3>References</h3>Moore, Colleen. (n.d.). 610 R9 -- Two-way Repeated-measures Anova. Retrieved January 21, 2011 from http://psych.wisc.edu/moore/Rpdf/610-R9_Within2way.pdf<br />
<h2>Book Review: R Graphs Cookbook</h2>Posted Thu, 17 Feb 2011 14:00:00 +0000 by John Quick<br />
Labels: books, R Graphs Cookbook, R Project, review<br />
<h3>Book Information</h3><p>Mittal, H. (2011). <em>R graphs cookbook</em>. Birmingham, UK: Packt Publishing Ltd.</p><h3>Audience</h3><p>The book's stated audience is anyone who is familiar with the basics of R, as well as expert users who are looking for a graphical reference. However, it is my opinion that the book is better suited for advanced users who are already somewhat familiar with R graphics and are very comfortable with programming in R.</p><h3>Content</h3><p>To begin, the first chapter of <em>R Graphs Cookbook</em> rapidly introduces all of the major graphic types covered in the book. Next, in Chapter two, readers are acquainted with various arguments and modification functions that are used throughout the book to customize and enhance visuals. Subsequently, individual chapters focus on specific topics in R graphics, such as:</p><ol><li>scatterplots,</li><li>line and time series charts,</li><li>bar, dot, and pie charts,</li><li>histograms,</li><li>box plots,</li><li>heat and contour maps,</li><li>geographical maps, and</li><li>exporting and annotating graphics.</li></ol><h3>Analysis</h3><p>I will start with some general impressions, before moving into chapter by chapter analyses.</p><p>First, I feel that the book needs both more and larger screenshots. 
Oftentimes, recipes are without any visuals and most of the time only one is present, whereas one per major graphical modification is expected. Furthermore, the screenshots are too small. These are critical items to neglect in a book that explicitly deals with visuals. Fortunately, full-size, full-color images are provided with the downloadable code for the book.</p><p>Second, I feel that the topics presented in the book are glossed over with far too little explanation. This is the main reason that I feel it is not suited for those who are not already well versed in R programming. Moreover, <i>R Graphs Cookbook</i> frequently refers the reader to help documentation or to other books on R, which can be frustrating. I personally feel that a book should be largely self-contained, at least when discussing topics within its scope.</p><p>Third, I believe that the book could be better organized for use as a fast reference guide and that it generally could be better structured to present information. For example, tables are a clearer way to present head-to-head comparisons between objects, and lists are better for describing several function arguments.</p><p>On the other hand, I do like the book's code formatting, which displays one argument per line. While this could confuse novice users into thinking that each argument is a separate line of executable code, most readers should find this a welcome organization style for often lengthy graphics functions. I also enjoyed how the <em>see also</em> sections at the end of each recipe let me know whether more recipes would build on a given topic.</p><p>Continuing, chapter one felt like a whirlwind of information that charged forward with a lack of purpose, organization, and explanation. 
Chapter two was much better, offering several nice recipes that were fast and easy to digest, with just enough information provided.</p><p>Chapter three takes an in-depth look at scatterplots and provides a number of useful recipes, such as how to group data, label points, generate error bars, and create graphical correlation matrices. Similarly, chapter four provides a solid collection of recipes for time series and line charts.</p><p>In contrast, chapters five through seven cover a disappointingly sparse amount of material related to their respective topics. Unfortunately, they do not stretch far beyond what is covered in the two graphics-focused chapters of <em><a href="http://link.packtpub.com/or7f1u" target="_blank">Statistical Analysis with R</a></em>, which is a guide for newcomers and early beginners. From an advanced reference like <em>R Graphs Cookbook</em>, I expected broader coverage. For instance, very few external packages are presented in this book, with the author choosing to focus on built-in graphics functions almost exclusively. An introduction to external options, such as <em>ggplot2</em>, would be warmly welcomed.</p><p>Chapters eight and nine relate a few of the lesser covered topics in R, including heat, contour, and geographical maps. These chapters will likely be informative and valuable for readers interested in these graphical applications.</p><p>Lastly, chapter ten deals with the presentation and exportation of graphics. While I wish a deeper exploration had been made, there are some useful tips in this chapter. Namely, the use of the <em>expression()</em> function to annotate graphics is well covered.</p><h3>Brief Summary</h3><ul><li>Title: <em>R Graphs Cookbook</em></li><li>Author: <a href="http://www.prettygraph.com/" target="_blank">Hrishi Mittal</a></li><li>Where To Find: <a href="http://link.packtpub.com/T7PMbW" target="_blank">Packt Publishing</a>
</li><li>Audience: those who are comfortable programming in R, able to mix, match, apply, and extend recipes for their own purposes, and looking to learn more about R's built-in graphical capabilities.</li><li>Content: a loosely associated collection of recipes for applying R's built-in graphics functions to create the most common types of charts, graphs, plots, and maps.</li><li>Analysis: although it could have better visuals, structure, and coverage, it is likely that almost any reader will be able to take away valuable techniques from this book</li><li>Arbitrary Rating: 6/10</li><li>Recommendation: take a look at the table of contents and count the number of recipes that would both be useful to you and that you do not already know how to accomplish to get an idea of how much you will take away from this book; also read the <a href="https://www.packtpub.com/sites/default/files/3067OS-Chapter-2-beyond-the-basics-adjusting-Key-Parameters.pdf" target="_blank">free sample chapter</a></li><li>Disclaimer: I received a review copy of this book</li></ul>
<h2>R Tutorial Series: One-Way Repeated Measures ANOVA</h2>Posted Mon, 14 Feb 2011 14:00:00 +0000 by John Quick<br />
Labels: ANOVA, one-way, R Project, R Tutorial Series, repeated measures, statistics, tutorial<br />
Repeated measures data require a different analysis procedure than our typical one-way ANOVA and subsequently follow a different R process. This tutorial will demonstrate how to conduct one-way repeated measures ANOVA in R using the <span class="Apple-style-span" style="color: #cc0000;">Anova(mod, idata, idesign)</span> function from the <em>car</em> package.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_anova_oneWay_repeatedMeasures.csv" target="_blank">sample data (.csv)</a> used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 30 participants whose interest in voting was measured at three different ages (10, 15, and 20). The interest values are represented on a scale that ranges from 1 to 5 and indicate how interested each participant was in voting at each given age.<br />
<h3>Data Setup</h3>Notice that our data are arranged differently for a repeated measures ANOVA. In a typical one-way ANOVA, we would place all of the values of our dependent variable in a single column and identify the corresponding levels of our independent variable with a second column, as demonstrated in this <a href="http://www.dailyi.org/blogFiles/RTutorialSeries/dataset_ANOVA_OneWay.csv" target="_blank">sample one-way dataset</a>. In a repeated measures ANOVA, we instead treat each level of our independent variable as if it were a variable, thus placing them side by side as columns. Hence, rather than having one vertical column for voting interest, with a second column for age, we have three separate columns for voting interest, one for each age level.<br />
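To make the difference between the two layouts concrete, here is a minimal sketch that converts a small long-format dataset into the side-by-side repeated measures layout using base R's reshape() function. The three participants and their values are made up for illustration, and the names longData and wideData are hypothetical.

```r
# Sketch: a tiny long-format dataset (made-up values for three hypothetical participants)
longData <- data.frame(
  Participant = rep(1:3, each = 3),
  Age         = rep(c(10, 15, 20), times = 3),
  Interest    = c(1, 3, 5, 2, 3, 4, 1, 2, 4)
)

# reshape() spreads the Interest values into one column per age level
wideData <- reshape(longData,
                    idvar     = "Participant",
                    timevar   = "Age",
                    v.names   = "Interest",
                    direction = "wide",
                    sep       = "")

# one row per participant, one Interest column per age level
names(wideData)
wideData
```

Setting sep = "" is a choice made here so that the new columns come out as Interest10, Interest15, and Interest20, matching the column names used later in this tutorial.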
<h3>Beginning Steps</h3>To begin, we need to read our dataset into R and store its contents in a variable.<br />
<blockquote class="codeBlock"><ol><li>> #read the dataset into an R variable using the read.csv(file) function</li>
<li>> dataOneWayRepeatedMeasures <- read.csv("dataset_ANOVA_OneWayRepeatedMeasures.csv")</li>
<li>> #display the data</li>
<li>> #notice the atypical column arrangement for repeated measures data</li>
<li>> dataOneWayRepeatedMeasures</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-BL7ll9Cg-Fk/T8FKEo4_X6I/AAAAAAAAA4s/7pNnYcA4p0M/s1600/20110214_anova_oneWay_repeatedMeasures_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-BL7ll9Cg-Fk/T8FKEo4_X6I/AAAAAAAAA4s/7pNnYcA4p0M/s1600/20110214_anova_oneWay_repeatedMeasures_1.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The first ten rows of our dataset</div><h3>Preparing the Repeated Measures Factor</h3>Prior to executing our analysis, we must follow a small series of steps in order to prepare our repeated measures factor.<br />
<h4>Step 1: Define the Levels</h4><blockquote class="codeBlock"><ol><li>> #use c() to create a vector containing the number of levels within the repeated measures factor</li>
<li>> #create a vector numbering the levels for our three voting interest age groups</li>
<li>> ageLevels <- c(1, 2, 3)</li>
</ol></blockquote><h4>Step 2: Define the Factor</h4><blockquote class="codeBlock"><ol><li>> #use as.factor() to create a factor using the level vector from step 1</li>
<li>> #convert the age levels into a factor</li>
<li>> ageFactor <- as.factor(ageLevels)</li>
</ol></blockquote><h4>Step 3: Define the Frame</h4><blockquote class="codeBlock"><ol><li>> #use data.frame() to create a data frame using the factor from step 2</li>
<li>> #convert the age factor into a data frame</li>
<li>> ageFrame <- data.frame(ageFactor)</li>
</ol></blockquote><h4>Step 4: Bind the Columns</h4><blockquote class="codeBlock"><ol><li>> #use cbind() to bind the levels of the factor from the original dataset</li>
<li>> #bind the age columns</li>
<li>> ageBind <- cbind(dataOneWayRepeatedMeasures$Interest10, dataOneWayRepeatedMeasures$Interest15, dataOneWayRepeatedMeasures$Interest20)</li>
</ol></blockquote><h4>Step 5: Define the Model</h4><blockquote class="codeBlock"><ol><li>> #use lm() to generate a linear model using the bound factor levels from step 4</li>
<li>> #generate a linear model using the bound age levels</li>
<li>> ageModel <- lm(ageBind ~ 1)</li>
</ol></blockquote><h3>Employing the <em>Anova(mod, idata, idesign)</em> Function</h3>Typically, researchers will choose one of several techniques for analyzing repeated measures data, such as an epsilon-correction method, like Huynh-Feldt or Greenhouse-Geisser, or a multivariate method, like Wilks' Lambda or Hotelling's Trace. Conveniently, having already prepared our data, we can employ a single <span class="Apple-style-span" style="color: #cc0000;">Anova(mod, idata, idesign)</span> function from the <em>car</em> package to yield all of the relevant repeated measures results. This allows us simplicity in that only a single function is required, regardless of the technique that we wish to employ. Thus, witnessing our outcomes becomes as simple as locating the desired method in the cleanly printed results.<br />
Our <span class="Apple-style-span" style="color: #cc0000;">Anova(mod, idata, idesign)</span> function will be composed of three arguments. First, <span class="Apple-style-span" style="color: #cc0000;">mod</span> will contain our linear model from Step 5 in the preceding section. Second, <span class="Apple-style-span" style="color: #cc0000;">idata</span> will contain our data frame from Step 3. Third, <span class="Apple-style-span" style="color: #cc0000;">idesign</span> will contain our factor from Step 2, preceded by a tilde (~). Thus, our final function takes on the following form.<br />
<blockquote class="codeBlock"><ol><li>> #load the car package (install first, if necessary)</li>
<li>> library(car)</li>
<li>> #compose the Anova(mod, idata, idesign) function</li>
<li>> analysis <- Anova(ageModel, idata = ageFrame, idesign = ~ageFactor)</li>
</ol></blockquote><h3>Visualizing the Results</h3>Finally, we can use the <span class="Apple-style-span" style="color: #cc0000;">summary(object)</span> function to visualize the results of our repeated measures ANOVA.<br />
<blockquote class="codeBlock"><ol><li>> #use summary(object) to visualize the results of the repeated measures ANOVA</li>
<li>> summary(analysis)</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-nIpdUa9T_5I/T8FKEi-q9BI/AAAAAAAAA4w/Y5GKIuJfCKE/s1600/20110214_anova_oneWay_repeatedMeasures_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-nIpdUa9T_5I/T8FKEi-q9BI/AAAAAAAAA4w/Y5GKIuJfCKE/s1600/20110214_anova_oneWay_repeatedMeasures_2.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">Relevant segment of repeated measures ANOVA results</div><br />
Supposing that we are interested in the Wilks' Lambda method, we can see that the differences in the means for voting interest at ages 10, 15, and 20 are statistically significant (<em>p</em> < .001).<br />
<h3>Pairwise Comparisons</h3>Note that we could conduct follow-up comparisons on our age factor to determine which age level means are significantly different from one another. For this purpose, it is recommended that the data be rearranged into the standard ANOVA format that we have used throughout our other tutorials. Subsequently, we could conduct pairwise comparisons in the same manner as demonstrated in the <a href="http://rtutorialseries.blogspot.com/2011/01/r-tutorial-series-one-way-anova-with.html" target="_blank">One-Way ANOVA with Comparisons</a> tutorial.<br />
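As a hedged sketch of that rearrangement, base R's stack() function can turn the side-by-side columns into the standard two-column layout, after which pairwise.t.test() can compare the age levels. The data below are simulated random values, not the tutorial's dataset. Setting paired = TRUE is an assumption made here because the same participants are measured at every age. The linked tutorial may use different settings.

```r
# Sketch with simulated data (random values, not the tutorial's dataset)
set.seed(42)
wideSim <- data.frame(
  Interest10 = sample(1:5, 30, replace = TRUE),
  Interest15 = sample(1:5, 30, replace = TRUE),
  Interest20 = sample(1:5, 30, replace = TRUE)
)

# stack() rearranges the wide columns into long format:
# "values" holds the scores and "ind" holds the age level each score came from
longSim <- stack(wideSim)

# paired t-tests between every pair of age levels, with a Bonferroni adjustment;
# pairing relies on participants appearing in the same order within each level
comparisons <- pairwise.t.test(longSim$values, longSim$ind,
                               paired = TRUE,
                               p.adjust.method = "bonferroni")
comparisons
```

The result is a small table of adjusted p-values, one for each pair of age levels.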
<h3>Complete One-Way Repeated Measures ANOVA Example</h3>To see a complete example of how one-way repeated measures ANOVA can be conducted in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_anova_oneWay_repeatedMeasures.txt" target="_blank">one-way repeated measures ANOVA example (.txt)</a> file.<br />
<h2>R Tutorial Series: Two-Way ANOVA with Interactions and Simple Main Effects</h2>Posted Mon, 07 Feb 2011 14:00:00 +0000 by John Quick<br />
Labels: ANOVA, interaction effect, R Project, R Tutorial Series, simple main effects, statistics, tutorial, two-way<br />
When an interaction is present in a two-way ANOVA, we typically choose to ignore the main effects and elect to investigate the simple main effects when making pairwise comparisons. This tutorial will demonstrate how to conduct pairwise comparisons when an interaction is present in a two-way ANOVA.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_anova_twoWay_interactions.csv" target="_blank">sample data (.csv)</a> used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 60 participants who are divided into three stress reduction treatment groups (mental, physical, and medical) and two gender groups (male and female). The stress reduction values are represented on a scale that ranges from 1 to 5. This dataset can be conceptualized as a comparison, across genders, between three stress treatment programs: one using mental methods, one using physical training, and one using medication. The values represent how effective the treatment programs were at reducing participants' stress levels, with higher numbers indicating higher effectiveness.<br />
<h3>Beginning Steps</h3>To begin, we need to read our dataset into R and store its contents in a variable.<br />
<blockquote class="codeBlock"><ol><li>> #read the dataset into an R variable using the read.csv(file) function</li>
<li>> dataTwoWayInteraction <- read.csv("dataset_ANOVA_TwoWayInteraction.csv")</li>
<li>> #display the data</li>
<li>> dataTwoWayInteraction</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-IUa_Lg7wsLA/T8FKEAvX6LI/AAAAAAAAA4c/sCUthAx8OmI/s1600/20110207_anova_twoWay_interactions_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-IUa_Lg7wsLA/T8FKEAvX6LI/AAAAAAAAA4c/sCUthAx8OmI/s1600/20110207_anova_twoWay_interactions_1.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The first ten rows of our dataset.</div><h3>Omnibus Test</h3>Let's run a general omnibus test to assess the main effects and interactions present in the dataset.<br />
<blockquote class="codeBlock"><ol><li>> #use anova(object) to test the omnibus hypothesis</li>
<li>> #Are main effects or interaction effects present in the independent variables?</li>
<li>> anova(lm(StressReduction ~ Treatment * Gender, dataTwoWayInteraction))</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-6fs4WyRC0qk/T8FKEOTZ60I/AAAAAAAAA4Y/diMHkJg9tf8/s1600/20110207_anova_twoWay_interactions_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-6fs4WyRC0qk/T8FKEOTZ60I/AAAAAAAAA4Y/diMHkJg9tf8/s1600/20110207_anova_twoWay_interactions_2.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The omnibus ANOVA test</div><h3>Divide the Data</h3>The significant omnibus interaction suggests that we should ignore the main effects and instead investigate the simple main effects for our independent variables. To do so, we need to divide our dataset along each level of our treatment variable. We can create subsets of our dataset using the <span class="Apple-style-span" style="color: #cc0000;">subset(data, condition)</span> function, where <span class="Apple-style-span" style="color: #cc0000;">data</span> is the original dataset and <span class="Apple-style-span" style="color: #cc0000;">condition</span> contains the parameters defining the subset.<br />
<blockquote class="codeBlock"><ol><li>> #use subset(data, condition) to divide the original dataset</li>
<li>> #medical subset</li>
<li>> dataMedical <- subset(dataTwoWayInteraction, Treatment == "medical")</li>
<li>> #mental subset</li>
<li>> dataMental <- subset(dataTwoWayInteraction, Treatment == "mental")</li>
<li>> #physical subset</li>
<li>> dataPhysical <- subset(dataTwoWayInteraction, Treatment == "physical")</li>
</ol></blockquote><h3>Group ANOVAs</h3>With datasets representing each of our treatment groups, we can now run an ANOVA for each that investigates the impact of gender. You may notice that this is effectively running three one-way ANOVAs with gender being the independent variable. Therefore, we should control for Type I error by dividing our typical .05 alpha level by three (.017).<br />
<blockquote class="codeBlock"><ol><li>> #run ANOVA on the treatment subsets to investigate the impacts of gender within each</li>
<li>> #medical</li>
<li>> anova(lm(StressReduction ~ Gender, dataMedical))</li>
<li>> #mental</li>
<li>> anova(lm(StressReduction ~ Gender, dataMental))</li>
<li>> #physical</li>
<li>> anova(lm(StressReduction ~ Gender, dataPhysical))</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-5ZO2VqyvQNE/T8FKEVWoAlI/AAAAAAAAA4o/Bg7YnFwP4kU/s1600/20110207_anova_twoWay_interactions_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-5ZO2VqyvQNE/T8FKEVWoAlI/AAAAAAAAA4o/Bg7YnFwP4kU/s1600/20110207_anova_twoWay_interactions_3.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The gender within treatment group ANOVA tests</div><br />
At an alpha level of .017, the gender effect within the mental (<em>p</em> = .014) and physical (<em>p</em> < .001) groups was statistically significant. In the mental condition, the means are 3 for males and 4 for females. In the physical condition, the means are 4 for males and 2 for females. These results suggest that the mental treatment is more effective in reducing stress for females than males, while the physical treatment is more effective for males than females. Further, there is insufficient statistical support for a gender difference in the medical treatment.<br />
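The divide-and-test procedure above can also be written more compactly. The following is a minimal sketch using simulated random data rather than the tutorial's dataset. The gender labels ("F" and "M") and the names simStress, byTreatment, genderP, and alphaAdjusted are hypothetical. It splits the data by treatment, collects the gender p-value from each subset, and compares each against the adjusted alpha level.

```r
# Sketch with simulated data (random values, not the tutorial's dataset)
set.seed(7)
simStress <- data.frame(
  Treatment       = rep(c("medical", "mental", "physical"), each = 20),
  Gender          = rep(rep(c("F", "M"), each = 10), times = 3),
  StressReduction = sample(1:5, 60, replace = TRUE)
)

# split the data by treatment, then test the gender effect within each subset
byTreatment <- split(simStress, simStress$Treatment)
genderP <- sapply(byTreatment, function(d) {
  anova(lm(StressReduction ~ Gender, data = d))["Gender", "Pr(>F)"]
})

# adjusted threshold: the usual .05 alpha divided by the number of tests (three)
alphaAdjusted <- 0.05 / length(genderP)
round(alphaAdjusted, 3)

# TRUE marks a gender effect that is significant at the adjusted alpha level
genderP < alphaAdjusted
```

Since the values are random, which comparisons come out significant here is arbitrary. With the real dataset, the same steps should reproduce the per-treatment results discussed above.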
<h3>Pairwise Comparisons</h3>Note that since our gender variable contains only two levels, there is no need to conduct follow-up comparisons. However, should you reach this point with an independent variable of more than two levels, you could conduct pairwise comparisons in the same manner as demonstrated in the <a href="http://rtutorialseries.blogspot.com/2011/01/r-tutorial-series-two-way-anova-with.html" target="_blank">Two-Way ANOVA with Comparisons</a> tutorial. In this case, remember to carry through your reduced Type I error rate from the preceding ANOVA tests.<br />
<h3>Complete Two-Way ANOVA with Interactions Example</h3>To see a complete example of how two-way ANOVA simple main effects can be explored in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_anova_twoWay_interactions.txt" target="_blank">two-way ANOVA interaction example (.txt)</a> file.<br />
<h2>Statistical Analysis with R Book Reviews</h2>Posted Wed, 02 Feb 2011 14:00:00 +0000 by John Quick<br />
Labels: books, R Graphs Cookbook, R Project, review, Statistical Analysis with R, statistics<br />
Reviews of my <a href="http://link.packtpub.com/or7f1u" target="_blank">Statistical Analysis with R</a> book have started to emerge online and I am writing today to share them with potential readers and recommenders.<br />
<h3>
Reviews</h3>
The following is a list of online reviews for <em>Statistical Analysis with R</em>. If you have written a review of the book and would like it to be featured in this post, please <a href="http://www.emailmeform.com/builder/form/612615" target="_blank">contact me</a>.<br />
<ul>
<li><a href="http://slashdot.org/submission/1452180/Book-Review-Statistical-Analysis-with-R" target="_blank">A review from Slashdot.org</a></li>
<li><a href="http://www.stubbornmule.net/2011/01/a-gentle-introduction-to-r/" target="_blank">A review from the Stubborn Mule blog</a></li>
<li><a href="http://nortalktoowise.com/?p=848" target="_blank">A review from the Nor Talk Too Wise blog</a></li>
<li><a href="http://blog.rtwilson.com/review-statistical-analysis-with-r-beginners-guide-by-john-m-quick/" target="_blank">A review from Robin Wilson's blog</a></li>
<li><a href="http://lemire.me/blog/archives/2011/01/19/book-review-statistical-analysis-with-r/" target="_blank">A review from Daniel Lemire's blog</a></li>
</ul>
In summarizing the reviews, a few points are very clear about <em>Statistical Analysis with R</em>.<br />
<ol>
<li>It is for beginners: The book was written for people who have little to no experience with R, statistical software, and programming. It makes no assumptions of prior experience along these lines and starts right from the beginning. If you are new to R and want to learn how to apply it to your work, then this book is for you. If you are already an intermediate or experienced user, perhaps you might recommend it to people you know who are just becoming familiar with R.</li>
<li>It is a learning tool, not a reference: The book is structured with the intent that it is experienced as a holistic learning experience. The chapters build on one another and progressively delve deeper into R. It is not a dictionary-style reference that one might pull out, flip to an entry, and get a brief answer on a single item. Again, this has implications for the audience. Beginners are more likely to enjoy this approach, whereas experienced users may be interested in more of a reference-style book.</li>
<li>It has a story: Woven into the book's learning structure is a storyline based on the Three Kingdoms period of ancient Chinese history. For many, this will be a motivating and engaging way to learn. For others, the story may not inspire the same level of interest. If you would like to get a taste of the story, and the book in general, it is recommended that you read the <a href="https://www.packtpub.com/sites/default/files/2084-chapter-8-briefing-the-emperor.pdf" target="_blank">free sample chapter</a>.</li>
</ol>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://4.bp.blogspot.com/-bKGPx8X7sYA/T8FTtmB2rTI/AAAAAAAABIQ/JC9GhirToag/s1600/000000_statisticalAnalysisWithR_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-bKGPx8X7sYA/T8FTtmB2rTI/AAAAAAAABIQ/JC9GhirToag/s1600/000000_statisticalAnalysisWithR_2.png" /></a></div>
<h3>
New Release: R Graphs Cookbook</h3>
Packt Publishing recently released a second book on R, the <a href="http://link.packtpub.com/T7PMbW" target="_blank">R Graphs Cookbook</a> by Hrishi Mittal. This reference-style guide covers an array of R graphical applications and is geared towards users who are already familiar with the basics of R. A <a href="https://www.packtpub.com/sites/default/files/3067OS-Chapter-2-beyond-the-basics-adjusting-Key-Parameters.pdf" target="_blank">sample chapter</a> is available. I will be reviewing this book in the near future and posting my thoughts here on the R Tutorial Series blog. [Update: <a href="http://rtutorialseries.blogspot.com/2011/02/book-review-r-graphs-cookbook.html">read my review of R Graphs Cookbook</a>]<br />
<hr/>
<p><strong>R Tutorial Series: Two-Way ANOVA with Pairwise Comparisons</strong> (Mon, 31 Jan 2011)<br />
Labels: ANOVA, pairwise comparisons, R Project, R Tutorial Series, statistics, tutorial, two-way</p>
By extending our one-way ANOVA procedure, we can test the pairwise comparisons between the levels of several independent variables. This tutorial will demonstrate how to conduct pairwise comparisons in a two-way ANOVA.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_anova_twoWay_comparisons.csv" target="_blank">sample data (.csv)</a> used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 27 participants who are divided into three stress reduction treatment groups (mental, physical, and medical) and three age groups (young, mid, and old). The stress reduction values are represented on a scale that ranges from 0 to 10. This dataset can be conceptualized as a comparison between three stress treatment programs, one using mental methods, one using physical training, and one using medication, across three age groups. The stress reduction values represent how effective the treatment programs were at reducing participants' stress levels, with higher numbers indicating higher effectiveness. Note that the numbers in this dataset are not very realistic and are simply used to make this example possible.<br />
<h3>Beginning Steps</h3>To begin, we need to read our dataset into R and store its contents in a variable.<br />
<blockquote class="codeBlock"><ol><li>> #read the dataset into an R variable using the read.csv(file) function</li>
<li>> dataTwoWayComparisons <- read.csv("dataset_anova_twoWay_comparisons.csv")</li>
<li>> #display the data</li>
<li>> dataTwoWayComparisons</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-XGEPx6pJoFs/T8FKDcsZT7I/AAAAAAAAA3w/YBokRUW199o/s1600/20110131_anova_twoWay_comparisons_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-XGEPx6pJoFs/T8FKDcsZT7I/AAAAAAAAA3w/YBokRUW199o/s1600/20110131_anova_twoWay_comparisons_1.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The first ten rows of our dataset.</div><h3>Omnibus Test</h3>Let's run a general omnibus test to assess the main effects and interactions present in the dataset.<br />
<blockquote class="codeBlock"><ol><li>> #use anova(object) to test the omnibus hypothesis</li>
<li>> #Are main effects or interaction effects present in the independent variables?</li>
<li>> anova(lm(StressReduction ~ Treatment * Age, dataTwoWayComparisons))</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-kgTwWW4oPHY/T8FKDehFnrI/AAAAAAAAA38/XfCaEY_kdZM/s1600/20110131_anova_twoWay_comparisons_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-kgTwWW4oPHY/T8FKDehFnrI/AAAAAAAAA38/XfCaEY_kdZM/s1600/20110131_anova_twoWay_comparisons_2.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The omnibus ANOVA test</div><h3>Pairwise Comparisons</h3>Since the omnibus test was significant for both variables and no interaction effect was present, we can proceed to testing the main effect pairwise comparisons. To accomplish this, we will apply our <span class="Apple-style-span" style="color: #cc0000;">pairwise.t.test()</span> function to each of our independent variables. For more details on the <span class="Apple-style-span" style="color: #cc0000;">pairwise.t.test()</span> function, see the <a href="http://rtutorialseries.blogspot.com/2011/01/r-tutorial-series-one-way-anova-with.html" target="_blank">One-Way ANOVA with Pairwise Comparisons</a> tutorial.<br />
<blockquote class="codeBlock"><ol><li>> #use pairwise.t.test(x, g, p.adj) to test the pairwise comparisons between the treatment group means</li>
<li>> #What significant differences are present amongst the treatment means?</li>
<li>> pairwise.t.test(dataTwoWayComparisons$StressReduction, dataTwoWayComparisons$Treatment, p.adj = "none")</li>
<li>> #use pairwise.t.test(x, g, p.adj) to test the pairwise comparisons between the age group means</li>
<li>> #What significant differences are present amongst the age group means?</li>
<li>> pairwise.t.test(dataTwoWayComparisons$StressReduction, dataTwoWayComparisons$Age, p.adj = "none")</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-SXChcKcHhnE/T8FKDgrPkUI/AAAAAAAAA4A/hpO7zPoCJZA/s1600/20110131_anova_twoWay_comparisons_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-SXChcKcHhnE/T8FKDgrPkUI/AAAAAAAAA4A/hpO7zPoCJZA/s1600/20110131_anova_twoWay_comparisons_3.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">Pairwise comparisons of treatment group means</div><div align="center"><br />
</div><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-JVTzxJDY3_c/T8FKDogFnhI/AAAAAAAAA4M/h_GUfh4iy50/s1600/20110131_anova_twoWay_comparisons_4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-JVTzxJDY3_c/T8FKDogFnhI/AAAAAAAAA4M/h_GUfh4iy50/s1600/20110131_anova_twoWay_comparisons_4.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">Pairwise comparisons of age group means</div><br />
Note that the desired p-adjustment method will vary by researcher, study, etc. Here, we will assume an alpha level of .05 for all tests, effectively making no adjustment for the family-wise Type I error rate.<br />
These results indicate that there are no statistically significant pairwise differences between the treatment groups and that all of the comparisons between age groups are statistically significant. The age group means are 8 for young, 5 for mid, and 2 for old. Consequently, we are inclined to conclude that, regardless of treatment, young patients are going to be most responsive, followed by middle-aged patients, followed by older ones. However, there is insufficient support to differentiate between the effectiveness of the treatment methods themselves.<br />
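As a minimal sketch of how a different adjustment could be explored, the following code runs both sets of comparisons with the Holm correction. The data frame is a hypothetical stand-in built to match the design described above (27 participants, a 0 to 10 scale, and age group means of 8, 5, and 2). It is not the tutorial's actual file, and the names <em>standIn</em>, <em>ageMean</em>, and <em>treatmentOffset</em> are assumptions made for illustration. Note that <em>p.adj</em> works through R's partial argument matching; the full argument name is <em>p.adjust.method</em>.<br />

```r
# Hypothetical stand-in data (NOT the tutorial's file):
# 3 treatments x 3 age groups x 3 participants per cell = 27 rows
standIn <- expand.grid(
  Deviation = c(-1, 0, 1),
  Treatment = c("mental", "physical", "medical"),
  Age = c("young", "mid", "old"),
  stringsAsFactors = TRUE
)

# Assumed cell structure: an age group mean plus a small treatment offset
ageMean <- c(young = 8, mid = 5, old = 2)
treatmentOffset <- c(mental = 1, physical = 0, medical = -1)
standIn$StressReduction <- unname(
  ageMean[as.character(standIn$Age)] +
    treatmentOffset[as.character(standIn$Treatment)] +
    standIn$Deviation
)

# Pairwise comparisons with the Holm adjustment instead of "none"
treatmentPairs <- pairwise.t.test(standIn$StressReduction, standIn$Treatment,
                                  p.adjust.method = "holm")
agePairs <- pairwise.t.test(standIn$StressReduction, standIn$Age,
                            p.adjust.method = "holm")

# Display the adjusted p-value tables
treatmentPairs
agePairs
```

With this stand-in data, the Holm-adjusted results follow the same pattern as the unadjusted ones reported above: every age group comparison stays below .05 and no treatment comparison does. Results with the real dataset may differ.<br />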
<h3>Complete Two-Way ANOVA with Pairwise Comparisons Example</h3>To see a complete example of how two-way ANOVA pairwise comparisons can be conducted in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_anova_twoWay_comparisons.txt" target="_blank">two-way ANOVA comparisons example (.txt)</a> file.<br />
<hr/>
<p><strong>R Tutorial Series: One-Way ANOVA with Pairwise Comparisons</strong> (Mon, 24 Jan 2011)<br />
Labels: ANOVA, one-way, pairwise comparisons, R Project, R Tutorial Series, statistics, tutorial</p>
When we have more than two groups in a one-way ANOVA, we typically want to statistically assess the differences between each pair of groups. Whereas a <a href="http://rtutorialseries.blogspot.com/2010/10/r-tutorial-series-one-way-omnibus-anova.html" target="_blank">one-way omnibus ANOVA</a> assesses whether a significant difference exists at all amongst the groups, pairwise comparisons can be used to determine which group differences are statistically significant.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_anova_oneWay_comparisons.csv" target="_blank">sample data (.csv)</a> used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 30 participants who are divided into three stress reduction treatment groups (mental, physical, and medical). The values are represented on a scale that ranges from 1 to 5. This dataset can be conceptualized as a comparison between three stress treatment programs, one using mental methods, one using physical training, and one using medication. The values represent how effective the treatment programs were at reducing participants' stress levels, with higher numbers indicating higher effectiveness.<br />
<h3>Beginning Steps</h3>To begin, we need to read our dataset into R and store its contents in a variable.<br />
<blockquote class="codeBlock"><ol><li>> #read the dataset into an R variable using the read.csv(file) function</li>
<li>> dataOneWayComparisons <- read.csv("dataset_anova_oneWay_comparisons.csv")</li>
<li>> #display the data</li>
<li>> dataOneWayComparisons</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-gac8Ch-1jNo/T8FKC740KGI/AAAAAAAAA3s/DW5p1p0UAo0/s1600/20110124_anova_oneWay_comparisons_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-gac8Ch-1jNo/T8FKC740KGI/AAAAAAAAA3s/DW5p1p0UAo0/s1600/20110124_anova_oneWay_comparisons_1.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The first ten rows of our dataset.</div><h3>Omnibus Test</h3>One way to begin an ANOVA is to run a general omnibus test. The advantage of starting here is that if the omnibus test is not significant, you can stop your analysis and treat all pairwise comparisons as not significant. If the omnibus test is significant, you should continue with pairwise comparisons.<br />
<blockquote class="codeBlock"><ol><li>> #use anova(object) to test the omnibus hypothesis</li>
<li>> #Is there a significant difference amongst the treatment means?</li>
<li>> anova(lm(StressReduction ~ Treatment, dataOneWayComparisons))</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-4nKYwSPu8HA/T8FKCzNIO_I/AAAAAAAAA3c/F0b3OVR_YZg/s1600/20110124_anova_oneWay_comparisons_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-4nKYwSPu8HA/T8FKCzNIO_I/AAAAAAAAA3c/F0b3OVR_YZg/s1600/20110124_anova_oneWay_comparisons_2.png" /></a></div><div align="center">The omnibus ANOVA test</div><h3>Pairwise Comparisons</h3>Since the omnibus test was significant, we are safe to continue with our pairwise comparisons. To make pairwise comparisons between the treatment groups, we will use the <span class="Apple-style-span" style="color: #cc0000;">pairwise.t.test()</span> function, which has the following major arguments. <br />
<ul><li>x: the dependent variable</li>
<li>g: the independent variable</li>
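<li>p.adj: the p-value adjustment method applied across the comparisons (the full argument name is p.adjust.method); one of "none", "bonferroni", "holm", "hochberg", "hommel", "BH", or "BY". The "bonferroni", "holm", "hochberg", and "hommel" methods control the family-wise Type I error rate, whereas "BH" and "BY" control the false discovery rate.</li>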
</ul>The <span class="Apple-style-span" style="color: #cc0000;">pairwise.t.test()</span> function can be implemented as follows.<br />
<blockquote class="codeBlock"><ol><li>> #use pairwise.t.test(x, g, p.adj) to test the pairwise comparisons between the treatment group means</li>
<li>> #What significant differences are present amongst the treatment means?</li>
<li>> pairwise.t.test(dataOneWayComparisons$StressReduction, dataOneWayComparisons$Treatment, p.adj = "none")</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-6DBSb9sXZ1I/T8FKDFZsrAI/AAAAAAAAA3k/ueRwzSsRv4U/s1600/20110124_anova_oneWay_comparisons_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-6DBSb9sXZ1I/T8FKDFZsrAI/AAAAAAAAA3k/ueRwzSsRv4U/s1600/20110124_anova_oneWay_comparisons_3.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">Pairwise comparisons of treatment group means</div><br />
Note that the desired p-adjustment method will vary by researcher, study, etc. Here, we will assume an alpha level of .05 for all tests, effectively making no adjustment for the family-wise Type I error rate.<br />
These results indicate that there is a statistically significant difference between the mental and medical (<em>p</em> = .004) and physical and medical (<em>p</em> = .045) treatments, but not between the mental and physical (<em>p</em> = .302) treatments. The treatment means are 3.5 for mental, 3 for physical, and 2 for medical. Consequently, we are inclined to conclude based on this study that the mental and physical treatments lead to greater stress reduction than the medical method, but that there is insufficient statistical support to determine that either the mental or physical treatment method is superior.<br />
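To illustrate how much the choice of adjustment can matter, the three p-values reported above can be passed to <em>p.adjust()</em>, the base R function that implements the adjustment methods listed earlier. This is only a sketch: the inputs are the rounded values quoted in the text rather than the exact test output, and the vector names are made up for readability.<br />

```r
# Rounded unadjusted p-values as reported in the text above
reported <- c(mentalMedical = 0.004, physicalMedical = 0.045, mentalPhysical = 0.302)

# Apply two common family-wise corrections to the same three comparisons
bonferroni <- p.adjust(reported, method = "bonferroni")
holm <- p.adjust(reported, method = "holm")

# Side-by-side view of the unadjusted and adjusted values
round(cbind(none = reported, bonferroni, holm), 3)
```

Under these assumptions, the physical versus medical comparison (unadjusted <em>p</em> = .045) rises to .135 with the Bonferroni method and to .090 with the Holm method, so it no longer falls below .05. The mental versus medical comparison remains significant at .012 under both.<br />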
<h3>Complete One-Way ANOVA with Pairwise Comparisons Example</h3>To see a complete example of how one-way ANOVA pairwise comparisons can be conducted in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_anova_oneWay_comparisons.txt" target="_blank">one-way ANOVA comparisons example (.txt)</a> file.<br />
<h3>References</h3>R Documentation (n.d.). Adjust P-values for Multiple Comparisons. Retrieved January 16, 2011 from http://stat.ethz.ch/R-manual/R-devel/library/stats/html/p.adjust.html<br />
<hr/>
<p><strong>R Tutorial Series: Two-Way Omnibus ANOVA</strong> (Mon, 17 Jan 2011)<br />
Labels: analysis of variance, ANOVA, omnibus, R Project, R Tutorial Series, statistics, tutorial, two-way</p>
As with the <a href="http://rtutorialseries.blogspot.com/2010/10/r-tutorial-series-one-way-omnibus-anova.html" target="_blank">one-way</a> case, testing the omnibus hypothesis via two-way ANOVA is a simple process in R. This tutorial will explore how R can be used to perform a two-way ANOVA to test the difference between two (or more) group means.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_anova_twoWay_omnibus.csv" target="_blank">sample data (.csv)</a> used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 60 participants who are divided by gender (male and female) and treatment group (control and treatment). The values represent a scale that ranges from 1 to 5. For instance, this dataset could be conceptualized as a comparison between two professional training programs, where the control group participated in the company's longstanding program and the treatment group participated in an experimental program. The values could represent the attitudes of employees towards the training programs on a scale from 1 (poor) to 5 (excellent).<br />
<h3>Beginning Steps</h3>To begin, we need to read our dataset into R and store its contents in a variable.<br />
<blockquote class="codeBlock"><ol><li>> #read the two-way ANOVA dataset into an R variable using the read.csv(file) function</li>
<li>> dataTwoWay <- read.csv("dataset_anova_twoWay_omnibus.csv")</li>
<li>> #display the data</li>
<li>> dataTwoWay</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-S1rI6GbjR-k/T8FKClt_j_I/AAAAAAAAA3M/JKM4pnXT2lA/s1600/20110117_anova_twoWay_omnibus_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-S1rI6GbjR-k/T8FKClt_j_I/AAAAAAAAA3M/JKM4pnXT2lA/s1600/20110117_anova_twoWay_omnibus_1.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The first ten rows of our two-way ANOVA dataset.</div><h3>Two-Way ANOVA</h3>Now that our data are ready, we can conduct a two-way omnibus ANOVA test using the <span class="Apple-style-span" style="color: #cc0000;">anova(object)</span> function. Note that the only step necessary to add a second independent variable into our ANOVA model is to incorporate it into our <span class="Apple-style-span" style="color: #cc0000;">lm(formula, data)</span> function using the <span class="Apple-style-span" style="color: #cc0000;">*</span> operator. Whereas our one-way model was <span class="Apple-style-span" style="color: #cc0000;">lm(Values ~ Group)</span>, our two-way model becomes <span class="Apple-style-span" style="color: #cc0000;">lm(Values ~ Group * Gender)</span>. As you can see from the results below, adding a second independent variable in this manner also gives us information about the interaction between our variables.<br />
<blockquote class="codeBlock"><ol><li>> #use anova(object) to test the omnibus hypothesis in two-way ANOVA</li>
<li>> #Are the differences between the group means for treatment and gender statistically significant?</li>
<li>> #Is there a statistically significant interaction between treatment and gender?</li>
<li>> anova(lm(Values ~ Group * Gender, dataTwoWay))</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-tyJ_0NLsmOE/T8FKCpJE5_I/AAAAAAAAA3Q/jNx8y87Ak-M/s1600/20110117_anova_twoWay_omnibus_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-tyJ_0NLsmOE/T8FKCpJE5_I/AAAAAAAAA3Q/jNx8y87Ak-M/s1600/20110117_anova_twoWay_omnibus_2.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">Our two-way ANOVA table.</div><br />
The output of our ANOVA test indicates that the difference between our treatment group means is statistically significant (<em>p</em> < .001) and that the difference between genders is not (<em>p</em> = .585). However, in light of the statistically significant interaction between treatment group and gender (<em>p</em> = .032), we would generally elect to forgo interpreting these main effects on their own. Instead, a series of follow-up procedures could be carried out to examine the simple main effects for treatment group and gender.<br />
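The role of the <em>*</em> operator in the model formula can be checked directly. As a brief sketch that uses only the variable names from the model above, the following code expands the formula into its terms. No data are needed for this step; the object names <em>crossed</em> and <em>expanded</em> are chosen here for illustration.<br />

```r
# The crossed formula used in the tutorial
crossed <- Values ~ Group * Gender
# The same model written out term by term
expanded <- Values ~ Group + Gender + Group:Gender

# List the terms that R derives from each formula
crossedTerms <- attr(terms(crossed), "term.labels")
expandedTerms <- attr(terms(expanded), "term.labels")
crossedTerms

# Check that both formulas describe the same set of terms
identical(crossedTerms, expandedTerms)
```

The three terms correspond to the three effect rows in the ANOVA table: two main effects and their interaction. Writing the formula as Values ~ Group + Gender instead would fit the main effects only, with no interaction row.<br />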
<h3>Two-Way Multiple Group ANOVA</h3>Conducting a two-way omnibus ANOVA with multiple groups is identical to the demonstrated two-group test. The only difference is that the values in your dataset would be associated with more than two groups. Subsequently, the omnibus hypothesis would test for mean differences across all of the groups. The <span class="Apple-style-span" style="color: #cc0000;">anova(object)</span> function and its contained <span class="Apple-style-span" style="color: #cc0000;">lm(formula, data)</span> function would remain the same.<br />
<h3>Complete Two-Way Omnibus ANOVA Example</h3>To see a complete example of how a two-way omnibus ANOVA can be conducted in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_anova_twoWay_omnibus.txt" target="_blank">two-way ANOVA example (.txt)</a> file.<br />
<hr/>
<p><strong>R Beginner's Guide Book Update: Statistical Analysis with R Released</strong> (Mon, 08 Nov 2010)<br />
Labels: books, guide, Packt Publishing, R Project, R Tutorial Series, statistics</p>
In the final days of October, my beginner's guide to R was released. The book's official title is <a href="http://link.packtpub.com/or7f1u" target="_blank"><strong>Statistical Analysis with R</strong></a> and it can be found on the Packt Publishing website.<br />
The primary focus of <em>Statistical Analysis with R</em> is helping new users become accustomed to R and empowering them to apply R to suit their own needs. No prior experience with R, statistical software packages, or programming is necessary to learn from this book. It is written for a broad audience and should be well received by businesspeople, IT professionals, researchers, and students alike. <em>Statistical Analysis with R</em> takes readers on a journey from their first installation and launch of R, to analyzing and assessing data, to communicating and visualizing results. This guide is an excellent way to rapidly become an experienced R user and learn the skills that you need to apply R to your work.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-qc4gZnjuFzw/T8Ffa08SEgI/AAAAAAAABIg/uRBL7vlrxso/s1600/000000_statisticalAnalysisWithR_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-qc4gZnjuFzw/T8Ffa08SEgI/AAAAAAAABIg/uRBL7vlrxso/s1600/000000_statisticalAnalysisWithR_1.png" /></a></div>
<h3>
Samples</h3>
A sample chapter from <em>Statistical Analysis with R</em> is available from the Packt website. This chapter, the book's eighth, introduces the graphical capabilities of R, such as generating, customizing, and exporting various plots, charts, and graphs. You can download the <a href="https://www.packtpub.com/sites/default/files/2084-chapter-8-briefing-the-emperor.pdf" target="_blank">sample chapter</a> and its accompanying <a href="https://www.packtpub.com/support?nid=6478" target="_blank">R files</a> for free. If you like this chapter and are interested in learning more about R's graphical capabilities, you should know that chapter 9 demonstrates in depth how you can build and customize your own R visualizations.<br />
The publisher has also posted a few brief samples from the book, which can be accessed via the following links. These samples are taken from chapters 7 and 8 of <em>Statistical Analysis with R</em>. Respectively, they cover the common process behind all R analyses and introduce the graphical capabilities of R.<br />
<ul>
<li><a href="http://www.packtpub.com/article/organizing-clarifying-communicating-r-data-analyses" target="_blank">Organizing and Communicating R Analyses</a></li>
<li><a href="http://www.packtpub.com/article/customizing-graphics-creating-bar-chart-scatterplot-r" target="_blank">R Graphics Part 1</a></li>
<li><a href="http://www.packtpub.com/article/graphical-capabilities-of-r" target="_blank">R Graphics Part 2</a></li>
</ul>
<h3>
Feedback</h3>
If you decide to read <em>Statistical Analysis with R</em>, please feel free to provide me with your feedback. It would be great to know what you learned from the book, how future guides could be improved, and your overall experience with R.<br />
<hr/>
<p><strong>R Tutorial Series: One-Way Omnibus ANOVA</strong> (Mon, 11 Oct 2010)<br />
Labels: analysis of variance, ANOVA, omnibus, one-way, R Project, R Tutorial Series, statistics, tutorial</p>
Testing the omnibus hypothesis via one-way ANOVA is a simple process in R. This tutorial will explore how R can be used to perform a one-way ANOVA to test the difference between two (or more) group means.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_anova_oneWay_omnibus.csv" target="_blank">sample data (.csv)</a> used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains a hypothetical sample of 60 participants, who are divided into two groups (control and treatment) of 30. The values represent a scale that ranges from 1 to 5. For instance, this dataset could be conceptualized as a comparison between two professional training programs, where the control group participated in the company's longstanding program and the treatment group participated in an experimental program. The values could represent the attitudes of employees towards the training programs on a scale from 1 (poor) to 5 (excellent).<br />
<h3>Beginning Steps</h3>To begin, we need to read our dataset into R and store its contents in a variable.<br />
<blockquote class="codeBlock"><ol><li>> #read the one-way ANOVA dataset into an R variable using the read.csv(file) function</li>
<li>> dataOneWay <- read.csv("dataset_anova_oneWay_omnibus.csv")</li>
<li>> #display the data</li>
<li>> dataOneWay</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-D0OF2-ruDbY/T8FKCWCMi7I/AAAAAAAAA20/k1zXPwRKes8/s1600/20101011_anova_oneWay_omnibus_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/-D0OF2-ruDbY/T8FKCWCMi7I/AAAAAAAAA20/k1zXPwRKes8/s1600/20101011_anova_oneWay_omnibus_1.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">The first ten rows of our one-way ANOVA dataset.</div><h3>One-Way ANOVA</h3>Now that our data are ready, we can conduct a one-way omnibus ANOVA test using the <span class="Apple-style-span" style="color: #cc0000;">anova(object)</span> function.<br />
<blockquote class="codeBlock"><ol><li>> #use anova(object) to test the omnibus hypothesis in one-way ANOVA</li>
<li>> #is the difference between the group means statistically significant?</li>
<li>> anova(lm(Values ~ Group, dataOneWay))</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-MIDkFLREMd8/T8FKCrH627I/AAAAAAAAA3A/jQ182dAZAck/s1600/20101011_anova_oneWay_omnibus_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-MIDkFLREMd8/T8FKCrH627I/AAAAAAAAA3A/jQ182dAZAck/s1600/20101011_anova_oneWay_omnibus_2.png" /></a></div><div align="center"><br />
</div><div align="center"></div><div align="center">Our one-way ANOVA table.</div><br />
The output of our ANOVA test indicates that the difference between our group means is statistically significant (<em>p</em> < .001). Conceptually, this suggests that employee attitudes towards the experimental training program were significantly higher than their attitudes towards the preexisting program.<br />
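With only two groups, the one-way omnibus ANOVA is equivalent to an independent samples t test that assumes equal variances: the F statistic equals the squared t statistic and the two p-values match. The sketch below checks this on a small made-up data frame. The <em>twoGroups</em> data are hypothetical and are not the tutorial's dataset; only the variable names Values and Group follow the tutorial.<br />

```r
# Hypothetical two-group data (NOT the tutorial's file)
twoGroups <- data.frame(
  Group = factor(rep(c("control", "treatment"), each = 5)),
  Values = c(2, 3, 2, 3, 2, 4, 5, 4, 4, 5)
)

# One-way omnibus ANOVA, as in the tutorial
anovaTable <- anova(lm(Values ~ Group, twoGroups))
fValue <- anovaTable[["F value"]][1]
anovaP <- anovaTable[["Pr(>F)"]][1]

# Independent samples t test with pooled (equal) variances
tTest <- t.test(Values ~ Group, data = twoGroups, var.equal = TRUE)
tValue <- unname(tTest$statistic)

# Both comparisons should return TRUE
all.equal(fValue, tValue^2)
all.equal(anovaP, tTest$p.value)
```

The equivalence holds only for the pooled-variance t test; the default <em>t.test()</em> call applies the Welch correction and can give a slightly different result.<br />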
<blockquote class="codeBlock">Note that the <span class="Apple-style-span" style="color: #cc0000;">object</span> argument in our <span class="Apple-style-span" style="color: #cc0000;">anova(object)</span> function contained a linear model generated by the <span class="Apple-style-span" style="color: #cc0000;">lm(formula, data)</span> function. This is the same type of model that is used when conducting linear regression in R. A more detailed explanation of the <span class="Apple-style-span" style="color: #cc0000;">lm(formula, data)</span> function and examples of its use are available in my <a href="http://rtutorialseries.blogspot.com/2009/11/r-tutorial-series-simple-linear.html" target="_blank">Simple Linear Regression</a> article.</blockquote><h3>One-Way Multiple Group ANOVA</h3>Conducting a one-way omnibus ANOVA with multiple groups is identical to the demonstrated two-group test. The only difference is that the values in your dataset would be associated with more than two groups. Subsequently, the omnibus hypothesis would test for mean differences across all of the groups. The <span class="Apple-style-span" style="color: #cc0000;">anova(object)</span> function and its contained <span class="Apple-style-span" style="color: #cc0000;">lm(formula, data)</span> function would remain the same.<br />
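As a rough illustration of the multiple group case, the sketch below runs the same anova(lm()) call on a made-up dataset with three groups. The data frame <em>threeGroups</em> and its group labels are hypothetical; only the function calls and the variable names Values and Group come from the tutorial.<br />

```r
# Hypothetical three-group data (NOT the tutorial's file)
threeGroups <- data.frame(
  Group = factor(rep(c("control", "treatmentA", "treatmentB"), each = 5)),
  Values = c(1, 2, 2, 3, 2,
             3, 4, 3, 4, 4,
             4, 5, 5, 4, 5)
)

# The call is unchanged from the two-group example
omnibus <- anova(lm(Values ~ Group, threeGroups))
omnibus

# Degrees of freedom: (groups - 1) for Group, (n - groups) for Residuals
omnibus$Df
```

The only structural change in the ANOVA table is the degrees of freedom for Group, which equal the number of groups minus one (here, 2). A significant result says that at least one group mean differs, not which ones.<br />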
<h3>Complete One-Way Omnibus ANOVA Example</h3>To see a complete example of how a one-way omnibus ANOVA can be conducted in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_anova_oneWay_omnibus.txt" target="_blank">one-way ANOVA example (.txt)</a> file.<br />
<hr/>
<p><strong>R Beginner's Guide Book Update 10/1/2010</strong> (Fri, 01 Oct 2010)<br />
Labels: books, guide, Packt Publishing, publishing, R Project, R Tutorial Series, tutorial, update</p>
<p><strong>Update:</strong> <a href="http://rtutorialseries.blogspot.com/2010/11/r-beginners-guide-book-update.html"><em>Statistical Analysis with R</em></a> is now available!</p><hr/><p>I recently submitted the final drafts of all chapters of my R Beginner's Guide book, which is to be published through Packt. The official publishing timeline is set to December 2010, although the book may release ahead of schedule if all continues to go well. 
Below is an updated list of the major topics covered in the R Beginner's Guide.</p><p>Over the course of this book, you will acquire the knowledge and skills necessary to:</p><ul><li>Conduct organized data analyses in R </li><li>Communicate data analyses conducted in R</li><li>Generate, customize, and export detailed charts, plots, and graphs</li><li>Build your own custom data visualizations </li><li>Program in the R language </li><li>Create your own custom functions </li><li>Extend the functionality of R via external packages </li><li>Manage the R workspace and console</li><li>Import external data into R</li><li>Manipulate data using variables</li><li>Execute a wide array of multi-argument and variable-argument functions</li><li>Develop and employ predictive regression models</li><li>Assess the practical and statistical significance of predictions </li><li>Understand R, its benefits, and how to use it to maximize the impact of your data analyses</li></ul>
<hr/>
<p><strong>R Tutorial Series: Labeling Data Points on a Plot</strong> (Sun, 19 Sep 2010)<br />
Labels: graphics, R Project, R Tutorial Series, scatterplots, statistics, tutorial</p>
There are times when labeling a plot's data points can be very useful, such as when conveying information in certain visuals or looking for patterns in our data. Fortunately, labeling the individual data points on a plot is a relatively simple process in R. In this tutorial, we will use the <em>calibrate</em> package's <em>textxy</em> function to label the points on a scatterplot.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_plot_labelingPoints.csv" target="_blank">sample data (.csv)</a> used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains information used to estimate undergraduate enrollment at the University of New Mexico (Office of Institutional Research, 1990). Note that this tutorial assumes that this data has already been read into R and saved into a variable named <em>enrollmentData</em>.<br />
<h3>Plot</h3>To begin, we need to create a scatterplot using the <em>plot(x,y)</em> function. With our example data, we will plot the year on the x axis and the unemployment rate on the y axis.<br />
<blockquote class="codeBlock"><ol><li>> #generate a plot using the plot(x,y) function</li>
<li>> #plot year on the x axis and unemployment rate on the y axis</li>
<li>> plot(enrollmentData$YEAR, enrollmentData$UNEM)</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-OFFVGMcECa4/T8FKB8ZUhqI/AAAAAAAAA2s/h66FJpV-MKU/s1600/20100919_plot_labelingPoints_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="373" src="http://2.bp.blogspot.com/-OFFVGMcECa4/T8FKB8ZUhqI/AAAAAAAAA2s/h66FJpV-MKU/s400/20100919_plot_labelingPoints_1.png" width="400" /></a></div><div align="center"><br />
</div><div align="center"></div>For a more detailed description of plotting data in R, see the article on <a href="http://rtutorialseries.blogspot.com/2009/11/r-tutorial-series-scatterplots.html" target="_blank">scatterplots</a>.<br />
<h3>Textxy</h3>Within the <em>calibrate</em> package, the <em>textxy()</em> function can be used to label a plot's data points. The <em>textxy()</em> function accepts the following arguments ("Label points in a plot," n.d.).<br />
Required<br />
<ul><li>x: the x values of the plot's points</li>
<li>y: the y values of the plot's points</li>
<li>labs: the labels to be associated with the plot's points</li>
</ul>Optional<br />
<ul><li>cx: used to resize the label font</li>
<li>dcol: used to set the label color; defaults to black</li>
<li>m: sets the origin of the plot; defaults to (0,0)</li>
</ul>Here, we will use <em>textxy()</em> to add labels for the enrollment at the University of New Mexico to each of our plot's data points.<br />
<blockquote class="codeBlock"><ol><li>> #if necessary, install the calibrate package</li>
<li>> #install.packages("calibrate")</li>
<li>> #load the calibrate package</li>
<li>> library(calibrate)</li>
<li>> #use the textxy() function to add labels to the preexisting plot's points</li>
<li>> #add labels for the total enrollment</li>
<li>> textxy(enrollmentData$YEAR, enrollmentData$UNEM, enrollmentData$ROLL)</li>
</ol></blockquote><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-f3KWEKhmX-0/T8FKB-6TymI/AAAAAAAAA24/XZq1L-MSAaI/s1600/20100919_plot_labelingPoints_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="370" src="http://4.bp.blogspot.com/-f3KWEKhmX-0/T8FKB-6TymI/AAAAAAAAA24/XZq1L-MSAaI/s400/20100919_plot_labelingPoints_2.png" width="400" /></a></div><div align="center"><br />
</div><div align="center"></div>In this case, adding labels to our data points helps us to better assess the relationships in our dataset.<br />
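If the calibrate package is not available, a similar result can be sketched with the <em>text()</em> function from base graphics. This is an alternative to <em>textxy()</em>, not the method used above. The data values below are hypothetical stand-ins for the YEAR, UNEM, and ROLL columns, and the <em>pointLabels</em> name is introduced only for illustration.<br />

```r
# Hypothetical values standing in for the enrollment dataset's columns
enrollmentData <- data.frame(
  YEAR = 1:5,
  UNEM = c(8.1, 7.0, 7.3, 7.5, 7.0),
  ROLL = c(5501, 5945, 6629, 7556, 8716)
)

# Draw to a null device so the sketch also runs without a display;
# remove pdf(NULL) and dev.off() to see the plot on screen
pdf(NULL)
plot(enrollmentData$YEAR, enrollmentData$UNEM,
     xlab = "Year", ylab = "Unemployment rate")

# text() places each label just above (pos = 3) its data point
pointLabels <- as.character(enrollmentData$ROLL)
text(enrollmentData$YEAR, enrollmentData$UNEM,
     labels = pointLabels, pos = 3, cex = 0.7)
dev.off()
```

The <em>pos</em> argument controls which side of the point each label sits on (1 = below, 2 = left, 3 = above, 4 = right), and <em>cex</em> scales the label text.<br />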
<h3>Complete Data Point Labeling Example</h3>To see a complete example of how a plot's data points can be labeled in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_plot_labelingPoints.txt" target="_blank">Data Point Labeling (.txt)</a> file.<br />
<h3>References</h3>Label points in a plot. (n.d.). Retrieved September 19, 2010 from http://rss.acs.unt.edu/Rdoc/library/calibrate/html/textxy.html<br />
Office of Institutional Research (1990). Enrollment Forecast [Data File]. Retrieved November 22, 2009 from http://lib.stat.cmu.edu/DASL/Datafiles/enrolldat.html<img src="http://feeds.feedburner.com/~r/RTutorialSeries/~4/QjNHdDrfhwM" height="1" width="1"/>http://feedproxy.google.com/~r/RTutorialSeries/~3/QjNHdDrfhwM/r-tutorial-series-labeling-data-points.htmlnoreply@blogger.com (John Quick)23http://rtutorialseries.blogspot.com/2010/09/r-tutorial-series-labeling-data-points.htmltag:blogger.com,1999:blog-6710487119650146215.post-1283198749198598037Mon, 19 Jul 2010 17:46:00 +00002010-11-13T09:28:48.020-07:00booksguidePackt PublishingpublishingR ProjectR Tutorial SeriestutorialupdateR Beginner's Guide Book Update 7/19/2010<p><strong>Update:</strong> <a href="http://rtutorialseries.blogspot.com/2010/11/r-beginners-guide-book-update.html"><em>Statistical Analysis with R</em></a> is now available!</p><hr/><p>I am excited to announce that I have submitted the entire first draft of my R Beginner's Guide book, which is to be published through Packt. The tenth and final chapter was submitted a full month ahead of schedule. The printed book could become available in as little as three to four months.</p><p>Below is a list of the major topics covered in the R Beginner's Guide.</p><ul><li>Understanding what R is, its benefits, and why to use it</li><li>Downloading, installing, and running R</li><li>Dissecting the anatomy of R</li><li>Programming in R</li><li>Handling external data</li><li>Using variables</li><li>Managing the R workspace and console</li><li>Using multi-argument and variable-argument functions</li><li>Creating predictive data models</li><li>Assessing practical vs. 
statistical significance</li><li>Regression modeling</li><li>Creating custom functions</li><li>Assessing the viability of predictions</li><li>Organizing and communicating data analyses</li><li>Generating, customizing, and exporting graphics</li><li>Building custom visualizations</li><li>Extending R via packages</li><li>Taking advantage of electronic learning resources</li></ul><img src="http://feeds.feedburner.com/~r/RTutorialSeries/~4/g5OUOIRXqT0" height="1" width="1"/>http://feedproxy.google.com/~r/RTutorialSeries/~3/g5OUOIRXqT0/r-beginners-guide-book-update-7192010.htmlnoreply@blogger.com (John Quick)3http://rtutorialseries.blogspot.com/2010/07/r-beginners-guide-book-update-7192010.htmltag:blogger.com,1999:blog-6710487119650146215.post-7775596228241952551Thu, 29 Apr 2010 00:44:00 +00002010-11-13T09:29:14.487-07:00bookspacktR ProjectupdateR Beginner's Guide Book Update 4/28/2010<p><strong>Update:</strong> <a href="http://rtutorialseries.blogspot.com/2010/11/r-beginners-guide-book-update.html"><em>Statistical Analysis with R</em></a> is now available!</p><hr/><p>I am writing to update you on the progress of my R Beginner's Guide book, which is to be published through Packt. I have really gotten to work over the past couple months and have recently completed the first draft of the first half of the book. Right now, I am operating a few weeks ahead of our planned schedule, which calls for the first draft of all ten chapters by mid-August.</p><p>To give you an idea of its content, the book focuses on most of the topics covered in this blog as well as many more, such as data visualization, custom functions, and online resources. The topics are covered in great depth and numerous opportunities for practice and exploration are offered. The book's theme centers around the Three Kingdoms period of ancient China. The reader takes on the role of the lead strategist for the Shu kingdom at a pivotal point in history. 
Throughout the book, the reader uses R to help devise a course of action for the Shu forces.</p><p>I will continue to make steady progress on this book over the summer months. I am also excited to be able to share more R tutorials and knowledge in the near future through this project.</p><img src="http://feeds.feedburner.com/~r/RTutorialSeries/~4/Xj3-w3tJENU" height="1" width="1"/>http://feedproxy.google.com/~r/RTutorialSeries/~3/Xj3-w3tJENU/r-beginners-guide-book-update-4282010.htmlnoreply@blogger.com (John Quick)0http://rtutorialseries.blogspot.com/2010/04/r-beginners-guide-book-update-4282010.htmltag:blogger.com,1999:blog-6710487119650146215.post-224259975705340065Mon, 15 Mar 2010 14:00:00 +00002012-05-26T13:10:21.023-07:00booksR BloggersR ProjectR Tutorial SeriesstatisticsupdateR Tutorial Series: R Beginner's Guide and R Bloggers Updates<hr /><p><strong>1/1/2011 Update:</strong> Tal Galili wrote an article that revisits <a href="http://www.r-statistics.com/2011/01/r-bloggers-in-2010-top-14-r-posts-site-statistics-and-invitation-for-sponsors/" target="_blank">the first year of R-Bloggers</a> and this post was listed as one of the top 14. Therefore, I decided to make a small update to each section. I start by describing the initial series of tutorials that I wrote. A few more have been added since and even more planned in the upcoming year. As always, an up to date listing of my articles can be found on the <a href="http://rtutorialseries.blogspot.com/" target="_blank">R Tutorial Series blog</a>. New posts will also continue to be offered through the <a href="http://www.r-bloggers.com/" target="_blank">R Bloggers</a> network.</p><hr /><p>Since October 2009, I have written 13 articles [many more now, of course] for the R Tutorial Series blog. The first two introduce new users to R. The remaining 11 cover a wide range of topics related to multiple regression and correlation. This collection of tutorials represents my most recent training in statistics. 
Thus, for the time being, I will not be contributing new articles as frequently as I have over the past few months. However, I will undoubtedly encounter future projects that require new statistical methods and partake in more statistics courses, both of which will provide additional tutorial material. Below is a categorized list of the articles currently offered in the R Tutorial Series.</p><p>Introduction to R</p><ul><li><a href="http://rtutorialseries.blogspot.com/2009/10/r-tutorial-series-introduction-to-r_11.html" target="_blank">Part 1</a></li>
<li><a href="http://rtutorialseries.blogspot.com/2009/10/r-tutorial-series-introduction-to-r.html" target="_blank">Part 2</a></li>
</ul><p>Descriptive Statistics</p><ul><li><a href="http://rtutorialseries.blogspot.com/2009/11/r-tutorial-series-summary-and.html" target="_blank">Summary and Descriptive Statistics</a></li>
</ul><p>Data Visualization</p><ul><li><a href="http://rtutorialseries.blogspot.com/2009/11/r-tutorial-series-scatterplots.html" target="_blank">Scatterplots</a></li>
</ul><p>Correlation</p><ul><li><a href="http://rtutorialseries.blogspot.com/2009/11/r-tutorial-series-zero-order.html" target="_blank">Zero-Order Correlations</a></li>
</ul><p>Regression</p><ul><li><a href="http://rtutorialseries.blogspot.com/2009/11/r-tutorial-series-simple-linear.html" target="_blank">Simple Linear Regression</a></li>
<li><a href="http://rtutorialseries.blogspot.com/2009/12/r-tutorial-series-multiple-linear.html" target="_blank">Multiple Linear Regression</a></li>
<li><a href="http://rtutorialseries.blogspot.com/2009/12/r-tutorial-series-graphic-analysis-of.html" target="_blank">Regression Assumptions</a></li>
<li><a href="http://rtutorialseries.blogspot.com/2010/01/r-tutorial-series-regression-with.html" target="_blank">Regression with Interaction Variables</a></li>
<li><a href="http://rtutorialseries.blogspot.com/2010/02/r-tutorial-series-regression-with.html" target="_blank">Regression with Categorical Variables</a></li>
<li><a href="http://rtutorialseries.blogspot.com/2010/02/r-tutorial-series-basic-polynomial.html" target="_blank">Polynomial Regression</a></li>
<li><a href="http://rtutorialseries.blogspot.com/2010/01/r-tutorial-series-basic-hierarchical.html" target="_blank">Hierarchical Linear Regression</a></li>
</ul><p>I also have two additional R-related items to update you on. The first is the R Bloggers website and the second is my R Beginner's Guide.</p><hr /><p><strong>1/1/2011 Update:</strong> I originally reported that 50 blogs composed the <a href="http://www.r-bloggers.com/" target="_blank">R Bloggers</a> network. Now that number has risen to over 140. I hope that R Bloggers continues to thrive and contribute to the R community.</p><hr /><h3>R Tutorial Series on R Bloggers</h3><p><a href="http://www.r-bloggers.com/" target="_blank">R Bloggers</a> (http://www.r-bloggers.com) is a website that aggregates over 50 different blogs that focus on R. It is an excellent resource for keeping up to date on the many uses of R and for learning about the wide range of work being conducted in R. I recommend using R Bloggers for these purposes. The R Tutorial Series was invited to participate in the R Bloggers collection and is now available to R Bloggers' readers.</p><h3>R Beginner's Guide</h3><hr /><p><strong>11/1/2010 Update:</strong> <a href="http://rtutorialseries.blogspot.com/2010/11/r-beginners-guide-book-update.html" target="_blank"><em>Statistical Analysis with R</em></a> is now available!</p><hr /><p>Lastly, I want to let you know that I am working on a beginner's guide for R. It is primarily focused towards introducing R to information technology, business, and data analyst professionals. The book will be offered through <a href="http://link.packtpub.com/or7f1u" target="_blank">PACKT Publishing</a> (http://www.packtpub.com) and should be available within the next year. If you have enjoyed the R Tutorial Series, then you may be interested in looking for the guide once it is completed. 
In the meantime, keep reading the R Tutorial Series and R Bloggers and I will keep you updated on the book's major milestones.</p><img src="http://feeds.feedburner.com/~r/RTutorialSeries/~4/59iGaXEXk9c" height="1" width="1"/>http://feedproxy.google.com/~r/RTutorialSeries/~3/59iGaXEXk9c/r-tutorial-series-r-beginners-guide-and.htmlnoreply@blogger.com (John Quick)2http://rtutorialseries.blogspot.com/2010/03/r-tutorial-series-r-beginners-guide-and.htmltag:blogger.com,1999:blog-6710487119650146215.post-9121762178098403960Mon, 08 Feb 2010 14:00:00 +00002012-05-28T22:13:05.075-07:00hierarchical linear regressionmultiple linear regressionpolynomial regressionR ProjectR Tutorial SeriesstatisticstutorialR Tutorial Series: Basic Polynomial RegressionOften times, a scatterplot reveals a pattern that seems not so linear. Polynomial regression can be used to explore a predictor at different levels of curvilinearity. This tutorial will demonstrate how polynomial regression can be used in a hierarchical fashion to best represent a dataset in R.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_multipleRegression_polynomial.csv" target="_blank">sample data (.csv)</a> used in this tutorial. Be sure to right-click and save the file to your R working directory. Note that all code samples in this tutorial assume that this data has already been read into an R variable and has been attached. This dataset contains hypothetical student data that uses practice exam scores to predict final exam scores.<br />
<h3>Scatterplot</h3><div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-0tmqX1l4j58/T8FKBjbvTjI/AAAAAAAAA2c/tMd5lcBSIxc/s1600/20100208_multipleRegression_polynomial_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-0tmqX1l4j58/T8FKBjbvTjI/AAAAAAAAA2c/tMd5lcBSIxc/s1600/20100208_multipleRegression_polynomial_1.png" /></a></div><div align="center"><br />
</div><div align="center"></div>The preceding <a href="http://rtutorialseries.blogspot.com/2009/11/r-tutorial-series-scatterplots.html" target="_blank">scatterplot</a> demonstrates that these data may not be linear. Notably, no one scored lower than 50 on the practice exam, and final exam scores taper off at practice scores of approximately 85 and above. These features suggest that the relationship is curvilinear. Furthermore, since exam scores range from 0 to 100, a practice score of 150 can be neither observed nor sensibly used to predict a final exam score.<br />
<h3>Creating The Higher Order Variables</h3>A two-step process, identical to the one used to create <a href="http://rtutorialseries.blogspot.com/2010/01/r-tutorial-series-regression-with.html" target="_blank">interaction variables</a>, can be followed to create higher order variables in R. First, the variables must be centered to mitigate multicollinearity. Second, the predictor must be multiplied by itself a certain number of times to create each higher order variable. In this tutorial, we will explore a linear, a quadratic, and a cubic model. Therefore, the predictor will need to be squared to create the quadratic model and cubed to create the cubic model.<br />
<h4>Step 1: Centering</h4>To center a variable, simply subtract its mean from each data point and save the result into a new R variable, as demonstrated below.<br />
<blockquote class="codeBlock"><ol><li>> #center the dependent variable</li>
<li>> FinalC <- Final - mean(Final)</li>
<li>> #center the predictor</li>
<li>> PracticeC <- Practice - mean(Practice)</li>
</ol></blockquote><h4>Step 2: Multiplication</h4>Once the input variable has been centered, the higher order terms can be created. Since a higher order variable is formed by the product of a predictor with itself, we can simply multiply our centered term from step one and save the result into a new R variable, as demonstrated below.<br />
<blockquote class="codeBlock"><ol><li>> #create the quadratic variable</li>
<li>> PracticeC2 <- PracticeC * PracticeC</li>
<li>> #create the cubic variable</li>
<li>> PracticeC3 <- PracticeC * PracticeC * PracticeC</li>
</ol></blockquote><h3>Creating The Models</h3>Now we have all of the pieces necessary to assemble our linear and curvilinear models.<br />
<blockquote class="codeBlock"><ol><li>> #create the models using lm(FORMULA, DATAVAR)</li>
<li>> #linear model</li>
<li>> linearModel <- lm(FinalC ~ PracticeC, datavar)</li>
<li>> #quadratic model</li>
<li>> quadraticModel <- lm(FinalC ~ PracticeC + PracticeC2, datavar)</li>
<li>> #cubic model</li>
<li>> cubicModel <- lm(FinalC ~ PracticeC + PracticeC2 + PracticeC3, datavar)</li>
</ol></blockquote><h3>Evaluating The Models</h3>As is the case in other forms of regression, it can be helpful to summarize and compare our potential models using the summary(MODEL) and anova(MODEL1, MODEL2,… MODELi) functions.<br />
<blockquote class="codeBlock"><ol><li>> #display summary information about the models</li>
<li>> summary(linearModel)</li>
<li>> summary(quadraticModel)</li>
<li>> summary(cubicModel)</li>
<li>> #compare the models using ANOVA</li>
<li>> anova(linearModel, quadraticModel, cubicModel)</li>
</ol></blockquote>The model summaries and ANOVA comparison chart are displayed below.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-11Vz-2J0dGE/T8FKBjPbXGI/AAAAAAAAA2o/AxjCGPvTjuw/s1600/20100208_multipleRegression_polynomial_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-11Vz-2J0dGE/T8FKBjPbXGI/AAAAAAAAA2o/AxjCGPvTjuw/s1600/20100208_multipleRegression_polynomial_2.png" /></a></div><br />
<div align="center"></div>At this point we can compare the models. In this case, the quadratic and cubic terms are not statistically significant themselves nor are their models statistically significant beyond the linear model. However, in a real research study, there would be other practical considerations to make before deciding on a final model.<br />
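The whole workflow, from centering to model comparison, can be sketched end to end as follows. This is a minimal sketch on simulated data, not the tutorial's dataset: the <em>examData</em> data frame, its generating formula, and the <em>modelComparison</em> name are assumptions made for illustration. The variables are stored as data frame columns and passed through the <em>data</em> argument of <em>lm()</em> rather than being attached.<br />

```r
set.seed(42)  # simulated scores; not the tutorial's dataset
examData <- data.frame(Practice = runif(40, min = 50, max = 100))
examData$Final <- 20 + 0.8 * examData$Practice + rnorm(40, sd = 5)

# Step 1: center the outcome and the predictor
examData$FinalC <- examData$Final - mean(examData$Final)
examData$PracticeC <- examData$Practice - mean(examData$Practice)

# Step 2: build the higher order terms from the centered predictor
examData$PracticeC2 <- examData$PracticeC^2
examData$PracticeC3 <- examData$PracticeC^3

# Fit the nested linear, quadratic, and cubic models
linearModel <- lm(FinalC ~ PracticeC, data = examData)
quadraticModel <- lm(FinalC ~ PracticeC + PracticeC2, data = examData)
cubicModel <- lm(FinalC ~ PracticeC + PracticeC2 + PracticeC3, data = examData)

# Each row after the first tests whether the added term improves on the previous model
modelComparison <- anova(linearModel, quadraticModel, cubicModel)
print(modelComparison)
```

Because the models are nested, the residual sum of squares can only stay the same or shrink as terms are added. The F tests in the comparison table indicate whether each reduction is larger than chance alone would suggest.<br />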
<h3>More On Interactions, Polynomials, and HLR</h3>Certainly, much more can be done with these topics than I have covered in my tutorials. What I have provided is a basic discussion with guided examples. The regression topics covered in these tutorials can be mixed and matched to create exceedingly complex models. For example, multiple interactions and higher order variables could be contained in a single model. The good news is that more complex models can be created using the same techniques covered here. The basic principles remain the same.<br />
<h3>Complete Polynomial Regression Example</h3>To see a complete example of how polynomial regression models can be created in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_multipleRegression_polynomial.txt" target="_blank">polynomial regression example (.txt)</a> file.<img src="http://feeds.feedburner.com/~r/RTutorialSeries/~4/zhQ3mwvkjuM" height="1" width="1"/>http://feedproxy.google.com/~r/RTutorialSeries/~3/zhQ3mwvkjuM/r-tutorial-series-basic-polynomial.htmlnoreply@blogger.com (John Quick)0http://rtutorialseries.blogspot.com/2010/02/r-tutorial-series-basic-polynomial.htmltag:blogger.com,1999:blog-6710487119650146215.post-8942017737289520969Mon, 01 Feb 2010 14:00:00 +00002012-05-28T22:11:15.211-07:00box plotscategorical regressiondummy codingmultiple linear regressionR ProjectR Tutorial SeriesstatisticstutorialR Tutorial Series: Regression With Categorical VariablesCategorical predictors can be incorporated into regression analysis, provided that they are properly prepared and interpreted. This tutorial will explore how categorical variables can be handled in R.<br />
<h3>Tutorial Files</h3>Before we begin, you may want to download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/dataset_multipleRegression_categorical.csv" target="_blank">sample data (.csv)</a> used in this tutorial. Be sure to right-click and save the file to your R working directory. Note that all code samples in this tutorial assume that this data has already been read into an R variable and has been attached. This dataset contains variables for the following information related to NFL quarterback and team salaries in 1991.<br />
<ul><li>TEAM: Name of team</li>
<li>QB: Starting quarterback salary in thousands of dollars</li>
<li>TOTAL: team salary in thousands of dollars</li>
<li>CONF: conference (NFC or AFC)</li>
</ul>In this dataset, the CONF variable is categorical. It can take on one of two values, either NFC or AFC. Suppose for the purposes of this tutorial that our research question is "how well do quarterback salary and conference predict total team salary?" The model that we use to answer this question will need to incorporate the categorical predictor for conference.<br />
<h3>Dummy Coding</h3>To be able to perform regression with a categorical variable, it must first be coded. Here, I will use the as.numeric(VAR) function, where VAR is the categorical variable, to dummy code the CONF predictor. As a result, CONF will represent NFC as 1 and AFC as 0. The sample code below demonstrates this process.<br />
<blockquote class="codeBlock"><ol><li>> #represent a categorical variable numerically using as.numeric(VAR)</li>
<li>> #dummy code the CONF variable into NFC = 1 and AFC = 0</li>
<li>> dCONF <- as.numeric(CONF) - 1</li>
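</ol></blockquote>Note that as.numeric(CONF) returns 1 for AFC and 2 for NFC by default, so subtracting 1 yields the desired 0 and 1 codes. This relies on CONF being a factor, whose levels R orders alphabetically. In R 4.0.0 and later, read.csv() imports text columns as character vectors by default, so convert the variable first with factor(CONF).<br />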
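The coding step can be checked on a small example. The sketch below uses four hypothetical conference labels rather than the tutorial's data, and the <em>confFactor</em> and <em>dCONF2</em> names are introduced only for illustration. It also shows an explicit alternative that does not depend on the order of the factor levels.<br />

```r
# Hypothetical conference labels, stored as character vectors
CONF <- c("NFC", "AFC", "AFC", "NFC")

# factor() orders levels alphabetically: AFC is level 1, NFC is level 2
confFactor <- factor(CONF)
dCONF <- as.numeric(confFactor) - 1   # AFC = 0, NFC = 1

# An equivalent, explicit coding that does not rely on level order
dCONF2 <- ifelse(CONF == "NFC", 1, 0)

print(levels(confFactor))
print(dCONF)
```

The explicit <em>ifelse()</em> form can be easier to audit when a variable's levels might be reordered or renamed later.<br />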
<h3>Interpretation</h3><h4>Visual</h4>One useful way to visualize the relationship between a categorical and continuous variable is through a box plot. When dealing with categorical variables, R automatically creates such a graph via the plot() function (see <a href="http://rtutorialseries.blogspot.com/2009/11/r-tutorial-series-scatterplots.html" target="_blank">Scatterplots</a>). The CONF variable is graphically compared to TOTAL in the following sample code.<br />
<blockquote class="codeBlock"><ol><li>> #use the plot() function to create a box plot</li>
<li>> #what does the relationship between conference and team salary look like?</li>
<li>> plot(CONF, TOTAL, main="Team Salary by Conference", xlab="Conference", ylab="Salary ($1,000s)")</li>
</ol></blockquote>The resulting box plot is shown below.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-pTE9mZUYges/T8FKA76zdwI/AAAAAAAAA10/FbizemwTN78/s1600/20100201_multipleRegression_categorical_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-pTE9mZUYges/T8FKA76zdwI/AAAAAAAAA10/FbizemwTN78/s1600/20100201_multipleRegression_categorical_1.png" /></a></div><br />
<div align="center"></div>From a box plot, we can derive many useful insights, such as the minimum, maximum, and median values. Our box plot of total team salary on conference suggests that, compared to AFC teams, NFC teams have slightly higher salaries on average and the range of these salaries is larger.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-bMIOnf9SUOU/T8FKBA082iI/AAAAAAAAA2Q/mEvPXr4NpaI/s1600/20100201_multipleRegression_categorical_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/-bMIOnf9SUOU/T8FKBA082iI/AAAAAAAAA2Q/mEvPXr4NpaI/s1600/20100201_multipleRegression_categorical_2.png" /></a></div><br />
<div align="center"></div><h4>Routine Analysis</h4>Once a categorical variable has been quantified, it can be used in routine analyses, such as descriptive statistics and correlations. The following code depicts a few examples.<br />
<blockquote class="codeBlock"><ol><li>> #what are the mean and standard deviation of conference?</li>
<li>> mean(dCONF)</li>
<li>[1] 0.5</li>
<li>> sd(dCONF)</li>
<li>[1] 0.5091751</li>
<li>> #this makes sense… there are an equal number of teams in each conference and they are coded as either 0 or 1!</li>
<li>> #what is the correlation between total team salary and conference?</li>
<li>> cor(dCONF, TOTAL)</li>
<li>[1] 0.007019319</li>
</ol></blockquote>The correlation between total team salary and conference indicates that there is little to no linear relationship between the variables.<br />
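To see why a dummy coded variable produces these descriptive statistics, consider the sketch below. It assumes 14 teams coded 0 and 14 coded 1, a split chosen because it reproduces the mean and standard deviation reported above. The <em>confMean</em> and <em>confSD</em> names are introduced only for illustration.<br />

```r
# Assumed split: 14 teams coded 0 and 14 teams coded 1
dCONF <- rep(c(0, 1), each = 14)

# The mean of a 0/1 variable is the proportion of cases coded 1
confMean <- mean(dCONF)

# The sample standard deviation equals sqrt(n * p * (1 - p) / (n - 1))
confSD <- sd(dCONF)

print(confMean)
print(confSD)
```

With a proportion of 0.5 and 28 cases, the formula gives sqrt(28 * 0.25 / 27), which is approximately 0.509.<br />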
<h4>Linear Regression</h4>Let's return to our original question of how well quarterback salary and conference predict team salary. With the categorical predictor quantified, we can create a regression model for this relationship, as demonstrated below.<br />
<blockquote class="codeBlock"><ol><li>> #create a linear model using lm(FORMULA, DATAVAR)</li>
<li>> #predict team salary using quarterback salary and conference</li>
<li>> linearModel <- lm(TOTAL ~ QB + dCONF, datavar)</li>
<li>> #generate model summary</li>
<li>> summary(linearModel)</li>
</ol></blockquote>The model summary is pictured below.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-mGf3yJlNHZU/T8FKBSU7KYI/AAAAAAAAA2E/Dlq2tW_pT8Q/s1600/20100201_multipleRegression_categorical_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-mGf3yJlNHZU/T8FKBSU7KYI/AAAAAAAAA2E/Dlq2tW_pT8Q/s1600/20100201_multipleRegression_categorical_3.png" /></a></div><br />
<div align="center"></div>Because the results of this model are both counterintuitive and statistically nonsignificant, our analysis of the conference variable would likely end or change direction at this point. However, there is one more interpretation method that is worth mentioning for future reference.<br />
<h4>Split Model</h4>With a dummy coded predictor, a regression model can be split into two halves by substituting in the possible values for the categorical variable. For example, we can think of our model as a regression of total salary on quarterback salary for two states of the world - teams in the AFC and teams in the NFC. These derivative models are covered in the following sample code.<br />
<blockquote class="codeBlock"><ol><li>> #input the categorical values to split the linear model into two representations</li>
<li>> #the original model: TOTAL = 19099 + 2.5 * QB - 103 * dCONF</li>
<li>> #substitute 0 for dCONF to derive the AFC model: TOTAL = 19099 + 2.5 * QB</li>
<li>> #substitute 1 for dCONF to derive the NFC model: TOTAL = 18996 + 2.5 * QB</li>
<li>> #what is the predicted salary for a team with a quarterback salary of $2,000,000 in the AFC and NFC conferences?</li>
<li>> #AFC prediction</li>
<li>> 19099 + 2.5 * 2000</li>
<li>[1] 24099</li>
<li>> #NFC prediction</li>
<li>> 18996 + 2.5 * 2000</li>
<li>[1] 23996</li>
</ol></blockquote>Based only on what we have modeled, we can further infer that conference was not a significant predictor of total team salaries in the NFL in 1991. The difference between the team salaries based on conference is less than one-half of one percent on average! Of course, only using quarterback salary and conference to predict an NFL team's overall salary is neglecting quite a few potentially significant predictors. Nonetheless, split model interpretation is a useful way to break down the perspectives captured by a categorical regression model.<br />
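The split model arithmetic can also be wrapped in a small helper function. This is a sketch that hard-codes the rounded coefficients quoted above. The <em>predictTotal</em> function and the other variable names are hypothetical, introduced only for illustration.<br />

```r
# Rounded coefficients from the tutorial's model:
# TOTAL = 19099 + 2.5 * QB - 103 * dCONF (all salaries in thousands of dollars)
predictTotal <- function(qb, dconf) {
  19099 + 2.5 * qb - 103 * dconf
}

afcTotal <- predictTotal(2000, 0)  # AFC: dCONF = 0
nfcTotal <- predictTotal(2000, 1)  # NFC: dCONF = 1

# Relative gap between the two conference predictions, in percent
gapPercent <- 100 * (afcTotal - nfcTotal) / afcTotal

print(c(AFC = afcTotal, NFC = nfcTotal))
print(round(gapPercent, 2))
```

With a fitted model object and the original data frame, the <em>predict()</em> function would serve the same purpose without rounding the coefficients.<br />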
<h3>More On Categorical Predictors</h3>Certainly, much more can be done with categorical variables than the basic dummy coding that was demonstrated here. Individuals whose work requires a deeper inspection into the procedures of categorical regression are encouraged to seek additional resources (and to consider writing a guest tutorial for this series).<br />
<h3>Complete Categorical Regression Example</h3>To see a complete example of how a categorical regression model can be created in R, please download the <a href="http://dl.dropbox.com/u/10246536/Web/RTutorialSeries/example_multipleRegression_categorical.txt" target="_blank">categorical regression example (.txt)</a> file.<br />
<h3>References</h3>The Associated Press. (1991). Q-back and team salaries [Data File]. Retrieved December 14, 2009 from http://lib.stat.cmu.edu/DASL/Datafiles/qbacksalarydat.html<img src="http://feeds.feedburner.com/~r/RTutorialSeries/~4/ACaZRs_OEYI" height="1" width="1"/>http://feedproxy.google.com/~r/RTutorialSeries/~3/ACaZRs_OEYI/r-tutorial-series-regression-with.htmlnoreply@blogger.com (John Quick)19http://rtutorialseries.blogspot.com/2010/02/r-tutorial-series-regression-with.html