<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:admin="http://webns.net/mvcb/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
<channel>
    <title>Blog-Normal Distribution</title>
    <link>http://blogs.sas.com/blognormal/</link>
    <description>Analytic lessons and random observations by John Sall, JMP creator and SAS co-founder</description>
    <dc:language>en</dc:language>
    <admin:errorReportsTo rdf:resource="mailto:noreply@sas.com" />
    <generator>Serendipity 1.0.3 - http://www.s9y.org/</generator>
    <pubDate>Fri, 02 Oct 2009 18:22:09 GMT</pubDate>

    <image>
        <url>http://blogs.sas.com/blognormal/templates/wwwgeneric/img/rss_banner.png</url>
        <title>RSS: Blog-Normal Distribution - Analytic lessons and random observations by John Sall, JMP creator and SAS co-founder</title>
        <link>http://blogs.sas.com/blognormal/</link>
        <width>1</width>
        <height>1</height>
    </image>

<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/Blog-normalDistribution" type="application/rss+xml" /><feedburner:emailServiceId>Blog-normalDistribution</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item>
    <title>JMP Is 20 Years Old</title>
    <link>http://feedproxy.google.com/~r/Blog-normalDistribution/~3/WCj28HY6LZM/index.php</link>
            <category>JMP</category>
            <category>SAS</category>
    
    <comments>http://blogs.sas.com/blognormal/index.php?/archives/11-JMP-Is-20-Years-Old.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/blognormal/wfwcomment.php?cid=11</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/blognormal/rss.php?version=2.0&amp;type=comments&amp;cid=11</wfw:commentRss>
    

    <author>John.Sall@jmp.com (John Sall)</author>
    <content:encoded><![CDATA[
    Today is the 20th anniversary of JMP's first release, and I want to thank everyone who has helped to make <a href="http://www.jmp.com" >JMP</a> a success.<br />
<br />
JMP Version 1 shipped on October 5, 1989 -- or as we claimed at the time September 35 -- so that we could say we shipped in the third quarter of 1989, our goal.<br />
<br />
JMP started as a research project in the late '80s. In the earlier part of that decade, we had spent several years rewriting <a href="http://www.sas.com" >SAS</a> completely (but compatibly) to fit on personal computers.<br />
<br />
But by 1988, we felt three big forces, which can be characterized by:<br />
<ul><li> the Vehicle -- cars as well as trucks<br />
<li> the Roles -- detectives as well as lawyers<br />
<li> the Technology -- pointing as well as writing</ul><br />
As for the Vehicle, SAS was becoming a large enterprise-scale product -- a larger investment than some users, like engineers and scientists, were willing to handle. We were producing analytical trucks, but there was a market for analytical cars, i.e., something with low investment and ease of driving. We needed a more personal-scale tool, one for the desktop project rather than for the enterprise system.<br />
<br />
As for the Roles, statistics itself was seeing the opportunities in exploratory techniques, and the value of graphics and interactivity. The statistics profession had been molded as a testing discipline, a role like a lawyer whose job is to prove things that we already knew. What was missing was the exploratory role, like a detective, whose job is to discover things we didn't already know. Especially since John Tukey's <em>Exploratory Data Analysis</em> and the improvement of statistical graphics, statistics needed to serve in the detective role as well as the lawyer role. Graphics was the key enabler of seeing patterns, and points that don't fit patterns.<br />
<br />
As for the Technology, the graphical user interface arrived with the Macintosh, and later, Windows. It is a huge difference to just point and click rather than look up and type. Applications written for batch computing through languages were not suited for graphical interactivity. It was time for some fresh design.<br />
<br />
In response to these three forces, we formed a small group to put something together. In a year and a half, we released Version 1 of JMP. This was a very small product compared to the JMP of today, but it had all the basics of statistics and graphics, with many innovative features. We thought "jump" was a name to suggest a big step into a new future, a product that jumps in responsiveness to the mouse, and a tool that enables our customers to do the experiments and make the discoveries to take huge strides in their products and processes.<br />
<br />
In the early years, we learned important lessons. We learned that engineers and scientists were our most important customer segment. These people were smart, motivated and in a hurry -- too impatient to spend time learning languages, and eager to just point and click on their data. We had a product that was nearly as easy as walk-up-and-use with enough delights to hold their loyalty.<br />
<br />
We learned that engineers need <a href="http://www.jmp.com/applications/doe/" >design of experiments</a> (DOE), quality and productivity support (Six Sigma), and reliability modeling. We made sure we got better in these areas -- particularly DOE. We thought that engineers should be able to just ask the computer to custom-make a design that fits their needs rather than attempting to find a pre-built design that works.<br />
<br />
We learned how to port to Windows. We made JMP work on Windows with release 3.1, using the Altura library. This was a quick effort. Soon we were busy rewriting the whole product in a different implementation language with a portability host-interface layer, which led to a wait of more than three years before Version 4. Version 4 not only switched languages, but also introduced a new nervous system for the product, including the JMP Scripting Language.<br />
<br />
In the last few years, JMP has matured considerably. The big driving force has been in meeting the needs of those users we talk to, who correspond with us, who sometimes invite us into their sites. We have a very dedicated group of users who keep us directed, and help us serve more and more researchers every year. Recently, I heard the group of passionate JMP users termed the “JMPerati,” analogous to Stephen Baker’s term, the “<a href="http://thenumerati.net/" >numerati</a>.”<br />
<br />
JMP has broadened to become more versatile. JMP now supports business visualization in partnership with <a href="http://www.sas.com/technologies/bi/" >SAS Business Intelligence</a>, and this in turn has encouraged us to introduce more visualization platforms, like the drag-and-drop <a href="http://www.jmp.com/software/jmp8/demos.shtml?panel=1" >Graph Builder</a> in JMP 8. JMP can now handle larger problems because of work we have done to multithread many of the bottleneck methods and to implement JMP on 64-bit systems. And we now work with various SAS teams on projects in several areas, collaborating and sharing efforts.<br />
<br />
JMP is 20 years old, but it seems like it is just getting started. We are growing fast. Last year, our business grew faster than ever, and we are set up to grow even faster in the future.<br />
<br />
Happy birthday, JMP, and thank you, everyone, for your contributions to JMP's success.<br />
 
    <img src="http://feeds.feedburner.com/~r/Blog-normalDistribution/~4/WCj28HY6LZM" height="1" width="1"/>]]></content:encoded>

    <pubDate>Fri, 02 Oct 2009 14:22:09 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/blognormal/index.php?/archives/11-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/blognormal/index.php?/archives/11-JMP-Is-20-Years-Old.html</feedburner:origLink></item>
<item>
    <title>Was Goldilocks’ Negative R-Square Sickness Contrived?</title>
    <link>http://feedproxy.google.com/~r/Blog-normalDistribution/~3/GZI01SVhW80/index.php</link>
            <category>Analytics</category>
            <category>JMP</category>
            <category>Statistics</category>
    
    <comments>http://blogs.sas.com/blognormal/index.php?/archives/10-Was-Goldilocks-Negative-R-Square-Sickness-Contrived.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/blognormal/wfwcomment.php?cid=10</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/blognormal/rss.php?version=2.0&amp;type=comments&amp;cid=10</wfw:commentRss>
    

    <author>John.Sall@jmp.com (John Sall)</author>
    <content:encoded><![CDATA[
    In my previous blog post about <a href="http://blogs.sas.com/blognormal/index.php?/archives/9-Goldilocks-and-the-Negative-R-Square.html" >Goldilocks and the negative R-Square</a>, I think I left you with an impression that regression fits are garbage unless you trim down your models. Basically, your attitude should be that of a sculptor: You cut away at the model until you have the best image of reality. <br />
<br />
Now I should confess that in order to get models that behaved so badly, I had to ratchet up the error variation and ratchet down the coefficient scale. My error standard deviation was 50 times the standard deviation of the coefficients, but that is somewhat unrealistic. Most models are not going to have negative R-Squares, even if they are overfit.<br />
<br />
To see that, consider another set of simulations, where I choose a number of different variances for the residual error and a number of different variances for the scale of the parameters. This is for the same 200-regressor by 512-row problem from the previous blog post. <br />
<br />
In the graphs of crossvalidation R-Square below, the columns are the different error standard deviations, and the rows are the standard deviations used to generate the model coefficients.<br />
<br />
Now we see that the negative R-Squares occur only in the upper right, where the error standard deviation is large, 16, and the coefficient scale is small, with standard deviation 1. All the other combinations behave pretty well; in fact, their R-Squares even rise consistently as the model captures more of the support from all the variables. You see five simulations here, with the generated errors and coefficients different for each track.<br />
<br />
<img width='472' height='515' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="http://blogs.sas.com/blognormal/uploads/overfitOrthogValSmall.gif" alt="Graph in JMP showing five simulations and errors and coefficients for each track" /><br clear="all" /><br />
<br />
That is a reassuring picture. Unfortunately, it is not the picture that actually comes out when you do a random holdback from the experimental data. Here is the actual picture:<br />
<br />
<img width='426' height='497' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="http://blogs.sas.com/blognormal/uploads/overfitRandomXValSmall.gif" alt="Graph in JMP showing the picture that emerges when you perform a random holdback from the data" /><br clear="all" /><br />
<br />
What a mess! There are nine tracks of crossvalidation R-Square as it sequences, adding up to all 200 regressors. There are three different holdback selections, identified by the three colors, and three repeats of the same holdback using different simulations of the error and coefficients. Notice especially how misbehaved the blue tracks are. That was the first holdback, the one I used in <a href="http://blogs.sas.com/blognormal/index.php?/archives/9-Goldilocks-and-the-Negative-R-Square.html" >my last post</a>. The blue tracks not only go bad for low coefficient scale and high error (upper right), but some go bad for large coefficients and small error — notice the cell for column Sigma Error 2 and row Sigma Beta 8 — it has one track with R-Square near zero for most of the stepping to 200 regressors.<br />
<br />
But notice that the red and green tracks are much more reasonable, behaving more like the tracks in the first graph.<br />
<br />
It turns out that the blue holdback must have really damaged the experimental design; it took away enough important points to not support important corners of the experimental region, and thus the estimates were not supported well. <br />
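<br />
Here is a minimal sketch of why a random holdback can hurt a balanced design. It is not the design used for the graphs above: as an assumption for illustration, it builds 20 orthogonal two-level columns from a 512-row Hadamard matrix (the helper names are hypothetical) and checks how far the columns drift from orthogonality once a random 25% of the rows are removed.<br />

```python
import random

def hadamard_entry(i, j):
    """Entry (i, j) of a Sylvester-type Hadamard matrix: +1 or -1."""
    return -1 if bin(i & j).count("1") % 2 else 1

def max_abs_cross_product(rows, cols):
    """Largest |x_j . x_k| over distinct column pairs, using only the given rows."""
    worst = 0
    for a in range(len(cols)):
        for b in range(a + 1, len(cols)):
            dot = sum(
                hadamard_entry(i, cols[a]) * hadamard_entry(i, cols[b]) for i in rows
            )
            worst = max(worst, abs(dot))
    return worst

random.seed(1)
n_rows = 512
cols = list(range(1, 21))                   # 20 factor columns; column 0 would be the intercept
all_rows = list(range(n_rows))
kept_rows = random.sample(all_rows, 384)    # a random 25% holdback leaves 384 rows

full_design = max_abs_cross_product(all_rows, cols)
after_holdback = max_abs_cross_product(kept_rows, cols)
print("max |cross product|, full design:   ", full_design)
print("max |cross product|, after holdback:", after_holdback)
```

In the full design every pair of columns has a cross product of zero. After the holdback the columns are correlated, and how badly depends on which rows happened to be removed, which is consistent with one holdback behaving much worse than the others.<br />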
<br />
The lesson here is that you shouldn’t just take a random holdback from an experimental design that is fairly thin on data (200 variables supported by 384 rows). The holdback that was used in the first picture was carefully selected, basically by making another factor in the design and using that factor to select the random holdback. This ensured that the remaining rows still supported the rest of the variables well. <br />
<br />
So Goldilocks may have seen Papa Bear’s big model and discovered that it was not so bad after all. With big effects and reasonable error variance, data is going to be worth fitting to models, even large ones, as long as you don’t mess up your estimates by a poor choice of validation sample. 
    <img src="http://feeds.feedburner.com/~r/Blog-normalDistribution/~4/GZI01SVhW80" height="1" width="1"/>]]></content:encoded>

    <pubDate>Mon, 24 Aug 2009 10:43:59 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/blognormal/index.php?/archives/10-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/blognormal/index.php?/archives/10-Was-Goldilocks-Negative-R-Square-Sickness-Contrived.html</feedburner:origLink></item>
<item>
    <title>Goldilocks and the Negative R-Square</title>
    <link>http://feedproxy.google.com/~r/Blog-normalDistribution/~3/ZUfXLsFH24E/index.php</link>
            <category>Analytics</category>
            <category>JMP</category>
            <category>Statistics</category>
    
    <comments>http://blogs.sas.com/blognormal/index.php?/archives/9-Goldilocks-and-the-Negative-R-Square.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/blognormal/wfwcomment.php?cid=9</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/blognormal/rss.php?version=2.0&amp;type=comments&amp;cid=9</wfw:commentRss>
    

    <author>John.Sall@jmp.com (John Sall)</author>
    <content:encoded><![CDATA[
    Learning from data is good, so you would think that bigger studies to fit bigger models are always good. How could it ever be worse to learn more through data? It’s time for a story. <br />
<br />
Goldilocks found the home of the research bears and looked into the first lab and found the work of Brother Bear. She found a little study, underpowered and thus unable to find anything significant, so Goldilocks left Brother Bear’s lab to look for better studies. She stepped into the next lab and found the work of Mama Bear. She found a bigger study than Brother Bear's, with some significant factors, but it only whetted Goldilocks' appetite to digest even bigger studies. So she stepped into the next lab and found the work of Papa Bear. Papa Bear’s study had thousands of cases on hundreds of variables and was performed by a careful experimental design. Goldilocks gobbled up the big results from Papa Bear and promptly got deathly sick. <br />
<br />
Goldilocks was rushed to the hospital, where it was learned that she was the victim of a poisonous level of overfitting. When the fitted model was studied on an independent crossvalidation sample, the R-Square of the fit was negative! It fit worse than a model that didn’t know any variables, a fit worse than the simple mean. <br />
<br />
The statistical pathologist had seen plenty of overfit poisonings, but they were fits to happenstance data. This isn’t supposed to happen in healthy well-designed experiments! But it turns out that size matters. Even in a well-designed experiment, you can overfit. <br />
<br />
You don’t believe this fairy tale? Let’s simulate some data and see. <br />
<br />
First, we create a fractional factorial experimental design with 200 factors in 512 observations. Then, we create a response that is just random normal. Here is the summary of fit. <br />
<br />
<img style="BORDER-RIGHT: 0px; PADDING-RIGHT: 5px; BORDER-TOP: 0px; PADDING-LEFT: 5px; BORDER-LEFT: 0px; BORDER-BOTTOM: 0px" height="29" alt="fit summary for Null data, all terms" hspace="0" src="http://blogs.sas.com/blognormal/uploads/overfitAllMeasures.gif" width="464" border="1" /><br clear="all" /><br />
<br />
The R-Square for the model is .46. We are capturing 46% of the variation in the response with the regressors. But the response is random, so the variation that is captured in the coefficients of the estimated model is random too. We are misled by the R-Square. <br />
<br />
But we held back a random selection of 25% of the rows for crossvalidation. What is the R-Square of that? It is negative: -6.26. It is not merely negative, meaning that the predictions are worse than the mean. The sum of squared errors is more than seven times worse than just fitting a mean! Of course, fitting a mean in this case is the correct model, i.e., just fit an intercept parameter. <br />
<br />
The problem is that if we didn’t hold back a crossvalidation sample, we wouldn’t have known that the intercept-only model was the right one. An R-Square of 46% would seem good, and many regressors would test as significant. <br />
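<br />
The NULL case can be sketched in a few lines. This is not the simulation behind the numbers above. As assumptions for illustration, it uses 200 orthogonal two-level columns from a 512-row Hadamard matrix, fits on all 512 rows, and validates against a fresh set of random responses at the same design points rather than a 25% holdback, so the values will not match -6.26. R-Square is computed as 1 - SSE/SST, which is why an R-Square of -6.26 means the sum of squared errors is 7.26 times that of the mean.<br />

```python
import random

def hadamard_entry(i, j):
    """Entry (i, j) of a Sylvester-type Hadamard matrix: +1 or -1."""
    return -1 if bin(i & j).count("1") % 2 else 1

def r_square(actual, predicted):
    """1 - SSE/SST; negative when the predictions do worse than the mean of `actual`."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1 - sse / sst

random.seed(2009)
n, k = 512, 200
X = [[hadamard_entry(i, j) for j in range(1, k + 1)] for i in range(n)]

# Pure-noise responses: there is no real effect to find
y_train = [random.gauss(0, 1) for _ in range(n)]
y_valid = [random.gauss(0, 1) for _ in range(n)]

# With orthogonal +/-1 columns, each least-squares coefficient is just (x_j . y) / n
intercept = sum(y_train) / n
coefs = [sum(X[i][j] * y_train[i] for i in range(n)) / n for j in range(k)]
predicted = [intercept + sum(c * x for c, x in zip(coefs, row)) for row in X]

train_r2 = r_square(y_train, predicted)
valid_r2 = r_square(y_valid, predicted)
print("training R-Square:  ", round(train_r2, 3))
print("validation R-Square:", round(valid_r2, 3))
```

The training R-Square comes out comfortably positive even though the response is noise, while the validation R-Square is negative: the fitted coefficients are nothing but absorbed random variation.<br />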
<br />
This was the NULL case where there was no real effect. Does it improve if there were real effects? We simulated a second example with all effects being real, but with the coefficients of 10 of the 200 being large (around 2) and the rest being small (random with standard deviation .2) with normal random error with standard deviation 10. With plenty of real effects, can we predict well with new data (i.e., on the crossvalidation set)? If you fit all the variables, you still get a negative R-Square for validation. The model fits better when there are real effects, but it still doesn’t fit better than just fitting a mean; in fact, it fits with more than twice the sum of squared errors of fitting the mean!<br />
<br />
<img style="BORDER-RIGHT: 0px; PADDING-RIGHT: 5px; BORDER-TOP: 0px; PADDING-LEFT: 5px; BORDER-LEFT: 0px; BORDER-BOTTOM: 0px" height="28" alt="summary with real effects, all in" hspace="0" src="http://blogs.sas.com/blognormal/uploads/overfitYRealAllIn.gif" width="475" border="1" /><br clear="all" /><br />
<br />
This is a real model with all real effects in a designed experiment! The only problem is that the error is big and the effects are small, and there are a lot of effects. The effects absorb random variation as well as some of their true effect variation. <br />
<br />
With this introduction, we need to consider two questions, first a question of attitude and second a question of criterion. <br />
<br />
The first question is whether we should be reducing the model, removing terms that seem to not be significant. If we are doing the study only to test hypotheses, then maybe we shouldn’t be cutting it down. In any testing situation, we expect to end up with 5% of the null-hypothesis tests being falsely significant -- that is the way we sized the test, with an alpha level of .05. When we cut down the model to just the significant terms, we still expect 5% of the original null terms to be falsely significant, but now they don’t appear with all the other terms, so we may mislead ourselves into thinking of them as 5% of the new set, not of the original set of terms. What used to be an alpha-sized test now becomes selection bias. The correct answer is NO, don’t reduce; reducing the model makes the tests invalid! <br />
<br />
But if we don’t cut down the model, we can’t predict well — our predictions will be loaded up with random variation from the estimates. So we have no choice but to cut down. So the answer is also YES, we have to reduce the model if we want to predict. <br />
<br />
Thus we have two camps, equally valid, answering differently because they have different goals. One camp is trying to test, the other to predict. In industrial statistics, in data mining and in many other fields, we don’t care about tests; we care about prediction, so we need to reduce our models. <br />
<br />
The second question is, given that we need to cut down the model, how do we decide to select a model? The old default in JMP’s Stepwise is to forward select regressors until the effects are no longer significant at some p-value level. Here we see the sequence of p-values — remember that 10 terms have big coefficients, and all the others have small coefficients. It looks like Stepwise gets all the good terms, and at .05 selects about 5% of the weak regressors, as one would expect. The default p-value stopping criterion is .25, meant to ensure that it gets all the good terms. How well is the model predicting across this sequence? What is the validation R-Square at 30 terms? It turns out that it is about 5%, not very good. If you use alpha=.25 to stop, you get 66 terms and a validation R-Square of (negative) -.10, embarrassingly bad. <br />
<br />
<img style="BORDER-RIGHT: 0px; PADDING-RIGHT: 5px; BORDER-TOP: 0px; PADDING-LEFT: 5px; BORDER-LEFT: 0px; BORDER-BOTTOM: 0px" height="256" alt="P value sequence" src="http://blogs.sas.com/blognormal/uploads/pvalueBypYReal.gif" width="484" /><br clear="all" /><br />
<br />
Here is the sequence of R-Squares with the training data in red, and the crossvalidation data in blue. As you would expect, the crossvalidation R-Square stops improving soon after 10 terms, when the selection starts getting the weak regressors. The training R-Square can only increase. <br />
<br />
<img style="BORDER-RIGHT: 0px; PADDING-RIGHT: 5px; BORDER-TOP: 0px; PADDING-LEFT: 5px; BORDER-LEFT: 0px; BORDER-BOTTOM: 0px" height="349" alt="validation R-Square" src="http://blogs.sas.com/blognormal/uploads/RSquareYReal.GIF" width="469" /><br clear="all" /><br />
<br />
A holdback crossvalidation set is the gold standard for determining the size of the model. It measures predictability in a very direct way. If you can afford to hold back data, you should definitely do it! <br />
<br />
What if you don’t have the luxury of holding back data? Is there some other way? There are many criteria, but some of the most popular are the “information” criteria, the corrected Akaike Information Criterion (AICc) and the Bayesian Information Criterion (BIC). You choose the model with the smallest criterion. Here is a plot of the AICc (in red) and BIC (in blue) for this model with 10 strong regressors and 190 weak ones. <br />
<br />
<img style="BORDER-RIGHT: 0px; PADDING-RIGHT: 5px; BORDER-TOP: 0px; PADDING-LEFT: 5px; BORDER-LEFT: 0px; BORDER-BOTTOM: 0px" height="392" alt="AICc and BIC" src="http://blogs.sas.com/blognormal/uploads/AICandBICYReal.gif" width="500" /><br clear="all" /><br />
<br />
BIC will choose smaller models than AIC. In this case, BIC gets it right, choosing 15 to 20 terms, which the crossvalidation shows will be close to the minimum prediction variance. So we recommend BIC, and in the next version of JMP, BIC will be the default stopping rule if there is no holdback crossvalidation set. <br />
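<br />
To see why BIC stops earlier, it helps to compare the penalties. The sketch below uses common textbook forms of the criteria for a least-squares fit, up to an additive constant. These forms and the function names are assumptions for illustration; JMP's exact definitions (for example, how parameters are counted) may differ.<br />

```python
import math

def aic(sse, n, k):
    """Akaike criterion for a least-squares fit with k parameters (constant dropped)."""
    return n * math.log(sse / n) + 2 * k

def aicc(sse, n, k):
    """AIC with the small-sample correction term."""
    return aic(sse, n, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(sse, n, k):
    """Bayesian criterion: each parameter costs log(n) instead of 2."""
    return n * math.log(sse / n) + k * math.log(n)

n = 384                       # training rows left after a 25% holdback from 512
aic_cost_per_term = 2
bic_cost_per_term = math.log(n)
print("cost of one more term under AIC:", aic_cost_per_term)
print("cost of one more term under BIC:", round(bic_cost_per_term, 2))
```

With 384 training rows, each additional term costs roughly three times as much under BIC as under the uncorrected AIC, so a weak regressor has to reduce the error sum of squares by more before BIC will admit it.<br />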
<br />
Let’s go back to the NULL data with no true regression effects. Below are the corresponding plots for the NULL case response. The Validation R-Square actually thinks you should select four terms, too many, but it is not suggesting that very strongly. <br />
<br />
<img style="BORDER-RIGHT: 0px; PADDING-RIGHT: 5px; BORDER-TOP: 0px; PADDING-LEFT: 5px; BORDER-LEFT: 0px; BORDER-BOTTOM: 0px" height="348" alt="Null Model RSquares" src="http://blogs.sas.com/blognormal/uploads/overfitRSquareNULL.gif" width="473" /><br clear="all" /><br />
<br />
The BIC gets it right, selecting the minimum number of terms, and AICc overselects around 30 terms, where the validation R-Square will be around -.33. <br />
<br />
<img style="BORDER-RIGHT: 0px; PADDING-RIGHT: 5px; BORDER-TOP: 0px; PADDING-LEFT: 5px; BORDER-LEFT: 0px; BORDER-BOTTOM: 0px" height="396" alt="Null Model AICc and BIC" src="http://blogs.sas.com/blognormal/uploads/overfitAICBICYNull.gif" width="497" /><br clear="all" /><br />
<br />
So Goldilocks, stay away from Big Papa Bear models unless you are equipped to cut them down to size. SAS has a whole PROC dedicated to cutting models down to size, PROC GLMSELECT. JMP needs more of these kinds of features, which will be coming with its next release.  
    <img src="http://feeds.feedburner.com/~r/Blog-normalDistribution/~4/ZUfXLsFH24E" height="1" width="1"/>]]></content:encoded>

    <pubDate>Fri, 07 Aug 2009 13:06:08 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/blognormal/index.php?/archives/9-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/blognormal/index.php?/archives/9-Goldilocks-and-the-Negative-R-Square.html</feedburner:origLink></item>
<item>
    <title>What Is the Error if the Probability of Rain Is .5 and It Rains?</title>
    <link>http://feedproxy.google.com/~r/Blog-normalDistribution/~3/NotcqM8Ckt8/index.php</link>
            <category>Analytics</category>
            <category>JMP</category>
            <category>Statistics</category>
    
    <comments>http://blogs.sas.com/blognormal/index.php?/archives/8-What-Is-the-Error-if-the-Probability-of-Rain-Is-.5-and-It-Rains.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/blognormal/wfwcomment.php?cid=8</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/blognormal/rss.php?version=2.0&amp;type=comments&amp;cid=8</wfw:commentRss>
    

    <author>John.Sall@jmp.com (John Sall)</author>
    <content:encoded><![CDATA[
    Suppose that a mortgage aggregator came to you and said that this triple-A assemblage of loans had one chance in a billion of losing money. Then you evaluated the package and found out that it really had a one-in-a-million probability of failing. The error in the probability estimate was just .000001, not much. So with so little error in the claimed probability of failure, you ignore the discrepancy. The problem, of course, is that the expected loss in the latter case is a thousand times greater. Sometimes with small probabilities, little errors become big errors. <br />
<br />
So what if we were told that the chance of something was zero -- that the event was impossible? If you believed that, with no reservation, you would bet your life on the certainty. Now suppose that that event claimed to have probability zero actually happens. What should the penalty be for being wrong? Should it be a dollar for getting the probability wrong by 1? Should it be infinity dollars, because it was a lie?<br />
<br />
This post is all about accounting systems for events that happen in the face of fitted probabilities that they would not happen. <br />
<br />
The events are response categories. The categories will be important, e.g., whether someone will buy a product or not, survive a disease or not, make a fraudulent transaction or not, engage in money laundering or not, choose one product versus others, etc. We fit models that attribute the probabilities, and then we need to find out how well these models predict. <br />
<br />
What is the best way to measure how good a prediction is in this situation?<br />
<br />
We have always known pretty well how to measure prediction for continuous responses. We use squared error to measure the fit, estimating to minimize the sum of squared residuals. Least squares is the foundation of most of our fitting arsenal for continuous responses. A measure of fit is the R-square, based on the sum of squared errors.<br />
<br />
With categorical responses, it’s not so obvious how to measure how well a model fits. What is the model supposed to do? We can think of predicting either as classification, i.e., picking a category we predict will result, or fitting probabilities so that the actual response is generally associated with a larger probability. For weather, the first approach would be to predict that it will rain; the second approach would be to assert that the probability of precipitation is 90%. <br />
<br />
For a statistician, the latter expression in terms of probability is preferred, because it expresses the degree of uncertainty. If you are calculating gains and losses from planning an event, if you know the probabilities, you can make decisions so that you can maximize your expected gain. If you are planning an open outdoor concert and it is very expensive to have valuable instruments rained on, you will avoid presenting the concert unless the probability of no rain times the revenue of the event is more than the probability of rain times the lost value in ruining the instruments. If the model only asserted that it will rain or not rain, you won’t be able to calculate your expected gain and make the best decision about whether to go ahead with the outdoor concert.<br />
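<br />
The concert decision can be sketched as a small expected-gain calculation. The dollar amounts and function names below are hypothetical, chosen only to illustrate the rule described above: go ahead when the probability of no rain times the revenue exceeds the probability of rain times the value lost.<br />

```python
def expected_gain(p_rain, revenue, instrument_loss):
    """Expected gain from holding the concert, following the rule in the text."""
    return (1 - p_rain) * revenue - p_rain * instrument_loss

def should_hold_concert(p_rain, revenue, instrument_loss):
    """Hold the concert only when the expected gain is positive."""
    return expected_gain(p_rain, revenue, instrument_loss) > 0

# Hypothetical figures: 20,000 in revenue, 100,000 of instruments at risk
revenue, instrument_loss = 20_000, 100_000
for p_rain in (0.1, 0.5, 0.9):
    gain = expected_gain(p_rain, revenue, instrument_loss)
    print(p_rain, round(gain), should_hold_concert(p_rain, revenue, instrument_loss))

# Probability of rain at which the two sides balance
break_even = revenue / (revenue + instrument_loss)
print("break-even probability of rain:", round(break_even, 3))
```

A forecast that only says "rain" or "no rain" gives no way to evaluate this expression, which is the point of preferring fitted probabilities.<br />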
<br />
Consider the following four measures of error. In each case, p is the probability we attribute to the event that actually occurred.<br />
<br />
<blockquote>Entropy error = -log(p)<br />
Squared error = (1-p)^2<br />
Absolute error = |1-p|<br />
Misclassification error = {0 if p is pmax, 1 otherwise}</blockquote><br />
For all these, we calculate the error for each observation, and average them, except that for squared error, we take the square root of the mean squared error; so we will call the average measure RMSE (root mean squared error). RMSE is really the standard deviation estimate, but here we divide by n instead of n-1.<br />
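<br />
As a minimal sketch, the four measures and their averages can be written out directly. The function names and the three example observations are hypothetical; p is the fitted probability of the category that actually occurred. Note that under this definition a tie for the largest probability counts as a correct classification.<br />

```python
import math

def entropy_error(p):
    # -log(p); an event that was given probability 0 scores infinite error
    return math.inf if p == 0 else -math.log(p)

def squared_error(p):
    return (1 - p) ** 2

def absolute_error(p):
    return abs(1 - p)

def misclassification_error(p, all_probs):
    # 0 if the category that occurred had the largest fitted probability, else 1
    return 0 if p == max(all_probs) else 1

def summarize(observations):
    """observations: list of (fitted probabilities per category, index of the actual category)."""
    n = len(observations)
    ps = [probs[actual] for probs, actual in observations]
    return {
        "entropy": sum(entropy_error(p) for p in ps) / n,
        # For squared error, report the root of the mean (dividing by n, not n-1)
        "rmse": math.sqrt(sum(squared_error(p) for p in ps) / n),
        "absolute": sum(absolute_error(p) for p in ps) / n,
        "misclassification": sum(
            misclassification_error(probs[actual], probs) for probs, actual in observations
        ) / n,
    }

# Hypothetical fitted probabilities for (rain, no rain) and the index of what happened
observations = [([0.9, 0.1], 0), ([0.7, 0.3], 1), ([0.2, 0.8], 1)]
summary = summarize(observations)
for name, value in summary.items():
    print(name, round(value, 4))
```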
<br />
Remember that the goal here is to fit the model so that p is always close to 1, i.e., we associate a high probability with the outcome that actually occurred. Here is a graph showing the four measures of error.<br />
<br />
<img width='486' height='401' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="http://blogs.sas.com/blognormal/uploads/fourMeasuresOfError.gif" alt="the four measures of error, shown in JMP" /><br clear="all"/><br />
<br />
Each measure scores events that happen with fitted probability of 1 as zero. Saying an event is certain, and then the event occurs, has no loss, no error. <br />
<br />
It is also the case that for all measures, the only way to get a perfect score of zero error is to have a fitted probability of 1 for the events that happen. Obtaining zero error can be hard, considering that the models we often use make a fitted probability of 1 very hard to reach. For example, in logistic regression the probability is modeled as:<br />
<br />
<blockquote>p = 1/(1+exp(-Z))</blockquote><br />
where Z is a linear model in terms of regression variables. In order to get a probability of 1, Z will have to be infinite. Getting a zero is just as hard, needing Z to be –infinity. <br />
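<br />
A quick sketch of the logistic function shows how the fitted probability approaches 0 and 1 without reaching them for moderate values of Z. The function name and the Z values are chosen only for illustration.<br />

```python
import math

def logistic(z):
    """p = 1/(1+exp(-Z)), the logistic regression probability from the text."""
    return 1 / (1 + math.exp(-z))

for z in (-10, -5, 0, 5, 10):
    print(z, logistic(z))
```

Mathematically the result is strictly between 0 and 1 for every finite Z. On a computer, very large values of Z will eventually round to exactly 1 in floating-point arithmetic, but that is a limit of the number format, not of the model.<br />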
<br />
Of the four error measures, the only one that gives you a good chance of attributed perfection is misclassification error. It makes fitting like taking a pass-fail course in school. Each trial is likely to give you a perfect score of zero, but over many trials, a few misclassifications will creep in and spoil the grade.<br />
<br />
But of course, the problem with pass-fail courses is that they don’t measure very precisely. Misclassification error doesn’t care if you are slightly good or very good in estimating the probabilities. <br />
<br />
In the middle of the range, absolute error varies linearly with probability, misclassification error is flat except for the sudden jump at .5, and the other two are increasing at increasing rates.<br />
<br />
The squared error and the absolute error are very similar, given that for squared error, you take the square root of the mean of the squared errors. But RMSE and absolute error differ in the middle. Suppose that you have two situations, one where half are all wrong and the other where all are half wrong. The first case can be predicting that it will always rain. The second case would be like predicting that the probability of rain is always .5. We suppose that it rains half the time. In the first case, the error in probability is either 1 or zero, in half of the rows each. In the second case, the error in probability is a half in all the rows. In both cases, the average absolute error is 1/2. The RMSE, however, is 1/sqrt(2) in the first case but only 1/2 in the second. If you believe that the two situations are equally bad, then you should like absolute error. If you believe that half all wrong is worse than all half wrong, then you should like root mean squared error. <br />
<br />
<table border="1"><tr><td>Situation </td><td>Description</td><td>RMSE</td><td>Avg Absolute</td></tr><tr><td>Claim the probability is .5</td><td>All Half Wrong</td><td>1/2</td><td>1/2</td></tr><tr><td>Claim it always rains</td><td>Half All Wrong</td><td>1/sqrt(2)</td><td>1/2</td></tr></table><br />
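<br />
The table can be checked with a few lines of Python. This is only a sketch: the 100 rows and the helper names are made up for illustration, and each error is 1 minus the probability given to what actually happened.<br />

```python
import math

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mean_abs(errors):
    return sum(abs(e) for e in errors) / len(errors)

# It rains on half of the 100 hypothetical days.
half_all_wrong = [0.0, 1.0] * 50   # always claim rain: error 0 on wet days, 1 on dry days
all_half_wrong = [0.5] * 100       # always claim probability .5

print("Half All Wrong:", rmse(half_all_wrong), mean_abs(half_all_wrong))
print("All Half Wrong:", rmse(all_half_wrong), mean_abs(all_half_wrong))
```
<br />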
Consider the case of flipping a coin. The RMSE player will always say that the probability is fifty-fifty. That sounds reasonable. But the Absolute player could just as easily say that the probability is always 1 (or always 0), and be just as well off. The coin flip RMSE player just seems more right to me.<br />
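<br />
To see why the two players behave differently, here is a hedged sketch that computes the expected error for a player who always claims probability q for heads on a fair coin. The function name and the grid of q values are assumptions for illustration.<br />

```python
import math

def expected_errors(q, heads_rate=0.5):
    """Expected absolute error and RMSE when always claiming probability q for heads."""
    # The error is 1 - q when heads comes up and q when tails comes up.
    expected_abs = heads_rate * (1 - q) + (1 - heads_rate) * q
    expected_sq = heads_rate * (1 - q) ** 2 + (1 - heads_rate) * q ** 2
    return expected_abs, math.sqrt(expected_sq)

for q in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(q, expected_errors(q))
```

The expected absolute error stays at 1/2 for every q, so the Absolute player has no reason to prefer any claim, while the RMSE is smallest at q = .5.<br />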
<br />
Now let’s look at the other end of the scale where you have an event that is attributed with probability of zero, and the event actually occurs. Three of the error measures agree that the error should be 1. The entropy measure says it should be infinity. It is a good thing that logistic regression makes it hard to reach zero, because the cost of making an error there is infinitely great. But this makes sense. When you attribute a probability of zero, you are saying that an event will never happen, that it is impossible. If that event ever happens, it is not just an estimation error. It is a refutation. <br />
<br />
I like entropy error most. First, it is precisely the error that we minimize in fitting the logistic regression by maximum likelihood. Maximum likelihood is simply minimizing the sum of the negative logarithms of the probabilities that the model produces for the events that we have in the data. Entropy is a good name for this kind of error because it is the accounting measure for information theory. Taking the log of probabilities makes a lot of sense; just consider doing a binary search among n equally likely items: finding one item, which has probability 1/n, takes log2(n) comparisons. It takes –log2(p) fair coin flips to produce an event with rarity p. Bits of log-probability uncertainty are additive, just as joint probabilities are multiplicative. <br />
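<br />
The bit-counting argument can be sketched in a few lines. The helper names are hypothetical, and the halving loop is a simplified stand-in for a real binary search.<br />

```python
import math

def bits(p):
    # Information, in bits, carried by an event of probability p
    return -math.log2(p)

def halving_steps(n):
    # Count the halvings needed to narrow n equally likely items down to one
    steps = 0
    while n > 1:
        n = (n + 1) // 2
        steps += 1
    return steps

print(bits(1 / 8), halving_steps(8))
# Probabilities multiply, bits add
print(bits(0.5 * 0.25), bits(0.5) + bits(0.25))
```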
<br />
I like misclassification error the least. It is a crude measure that doesn’t care about the probabilities we fit. Nevertheless, it is the easiest to understand, being a simple count on predicted categories being wrong.<br />
<br />
For each measure of error, we can define an “RSquare” measure. In least squares regression, this is the percent of variation in the response that is accounted for by the model. Another way is to say that it is 1 minus the ratio of unexplained variation to total variation. <br />
<br />
<blockquote>RSquare = 1 – (error in model)/(error around the mean)</blockquote><br />
If we have no model terms, the estimate for the response will be the mean, so RSquare is 1 minus the error in the fitted model as a proportion to the error in a simpler model that has no regression terms.<br />
<br />
So if we define it this way, we can have an RSquare for each of the error measures defined above for categorical responses. For example, misclassification RSquare is:<br />
<br />
<blockquote>RSquare(misclass.) = 1 – (number misclassified) / (number of less-common responses)</blockquote><br />
This follows because the reduced model with no regressors will give a constant probability, and the level with the highest probability will have an error of zero, and the other level an error of 1. <br />
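<br />
A minimal sketch of the misclassification RSquare follows, assuming a two-level response coded 0/1. The function names and the sample labels are made up for illustration.<br />

```python
def generic_rsquare(model_error, baseline_error):
    # RSquare = 1 - (error in model) / (error around the mean)
    return 1.0 - model_error / baseline_error

def misclassification_rsquare(actual, predicted):
    """actual and predicted are equal-length sequences of 0/1 labels."""
    n_misclassified = sum(a != p for a, p in zip(actual, predicted))
    ones = sum(actual)
    # The model with no regressors always predicts the more common level,
    # so it misclassifies every less-common response.
    n_less_common = min(ones, len(actual) - ones)
    return generic_rsquare(n_misclassified, n_less_common)

actual    = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 1, 1, 0, 0, 0, 1]
print(misclassification_rsquare(actual, predicted))
```

Note that the sketch divides by zero if only one level appears in the data, a case where the baseline model already has no error.<br />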
<br />
The entropy version of RSquare is a by-product of the estimation process:<br />
<blockquote>RSquare(entropy) = 1 – (–loglikelihood for full model) / (–loglikelihood for reduced model)</blockquote><br />
<br />
McFadden [1973] calls the entropy RSquare the Likelihood Ratio Index.<br />
<br />
The RMSE RSquare also seems to have merit. Efron [1978] calls this the Pseudo-RSquare.<br />
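<br />
As a sketch of the entropy RSquare, the code below takes the fitted probability of each observed outcome under the full and reduced models. The probabilities are hypothetical: six rows of one level and four of the other, so the reduced model assigns .6 and .4.<br />

```python
import math

def neg_loglik(probs_of_observed):
    # Sum of -log(p), where p is the fitted probability of each observed outcome
    return sum(-math.log(p) for p in probs_of_observed)

def entropy_rsquare(full_probs, reduced_probs):
    return 1.0 - neg_loglik(full_probs) / neg_loglik(reduced_probs)

reduced = [0.6] * 6 + [0.4] * 4   # constant-probability model
full    = [0.9] * 6 + [0.8] * 4   # made-up fit from a model with regressors
print(entropy_rsquare(full, reduced))
```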
<br />
So which measure should our software report? In my product group, we have tended to just report the entropy measure, because that is what we are minimizing. But since other software reports results with other measures, we now feel that we should report all four of these measures, so this is what the report will look like in the future:<br />
<br />
<img width='349' height='88' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="http://blogs.sas.com/blognormal/uploads/fourMeasures.gif" alt="what JMP report will look like in the future, showing the four measures" /><br clear="all"/><br />
<br />
We report both the error average and the RSquare. For the misclassification measure, people never like the RSquare scaling. They want the misclassification rate, which in this case is .1203, i.e., 12% misclassification. For the other measures, RSquare is a more natural measure. However, there are some downsides to this measure in categorical models.<br />
<br />
First, it is hard to get a high RSquare, even if your model is very good. People can get used to seeing RSquare of .9 for continuous responses, and then see a much lower RSquare for categorical models. (Maddala [1983] recommends using the (un-logged) likelihood and taking the (n/2)th root of the ratio, preserving the 0 to 1 scale, but making RSquares closer to 1.)<br />
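<br />
One way to read the likelihood-ratio form just described is 1 minus the (n/2)th root of (reduced likelihood / full likelihood). The sketch below computes that from log-likelihoods to avoid underflow; the function name and the sample numbers are assumptions, not values from any fitted model.<br />

```python
import math

def likelihood_ratio_rsquare(loglik_full, loglik_reduced, n):
    # 1 - (L_reduced / L_full) ** (2 / n), evaluated on the log scale
    return 1.0 - math.exp((2.0 / n) * (loglik_reduced - loglik_full))

print(likelihood_ratio_rsquare(loglik_full=-50.0, loglik_reduced=-69.0, n=100))
```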
<br />
If you have lots of data, you should hold back some to get a cross-validation estimate of the error: <br />
<br />
<img width='483' height='89' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="http://blogs.sas.com/blognormal/uploads/fourMeasuresWithValidation.gif" alt="report in JMP with validation information" /><br clear="all"/><br />
 <br />
The report above contains two pairs of measures, one pair for Training, the other for validation. The validation measures are run against a held-back set of data that is not used to estimate the model. In this way, we see how well the model performs against new data that is not in the “training” set used to make the estimates. A future blog post will consider validation in detail.<br />
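<br />
The hold-back idea can be sketched as follows. Everything here is an assumption for illustration: the 30% validation fraction, the fixed seed, the made-up 0/1 responses, and the deliberately trivial "model" that only estimates a constant probability from the training rows.<br />

```python
import math
import random

def holdout_split(rows, validation_fraction=0.3, seed=1):
    """Shuffle a copy of rows and hold back a fraction for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * validation_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]  # training, validation

def mean_entropy_error(outcomes, p_one):
    # Average -log of the probability assigned to each observed 0/1 outcome
    return sum(-math.log(p_one if y == 1 else 1 - p_one) for y in outcomes) / len(outcomes)

# Hypothetical 0/1 responses, made up for illustration
outcomes = [1] * 60 + [0] * 40
training, validation = holdout_split(outcomes)

# Simplest possible model: the training proportion, with no regressors
p_hat = sum(training) / len(training)

print("training error:  ", mean_entropy_error(training, p_hat))
print("validation error:", mean_entropy_error(validation, p_hat))
```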
<br />
We also have to address the zero problem in a future blog entry. With contingency table estimates of probabilities (or in degenerate logistic models), fitted probabilities can go to zero, but validation data can contain the very outcomes that were given zero probability, thus spoiling the calculation with infinities. Solving this problem is harder, but there are some good approaches.<br />
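<br />
The zero problem is easy to reproduce. In this sketch (hypothetical levels and data, with no attempt at a fix), a level that never appears in training gets probability zero, and a single validation row with that level makes the average entropy error infinite.<br />

```python
import math
from collections import Counter

def table_probabilities(levels):
    # Contingency-table estimate: observed proportion of each response level
    counts = Counter(levels)
    return {level: count / len(levels) for level, count in counts.items()}

def entropy_error(p):
    return math.inf if p == 0 else -math.log(p)

# Hypothetical data: level "C" never shows up in training
training = ["A", "A", "A", "B"]
validation = ["A", "B", "C"]

probs = table_probabilities(training)
errors = [entropy_error(probs.get(level, 0.0)) for level in validation]
print(sum(errors) / len(errors))  # prints inf
```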
<br />
And I have neglected to describe one of the most important measures, sorting efficiency, which is developed in ROC curves and lift curves. But enough for now.<br />
<br />
<strong>References</strong><br />
Long [1997] <em>Regression Models for Categorical and Limited Dependent Variables</em>, Sage, 102-109.<br />
Maddala [1983] <em>Limited-dependent and qualitative variables in econometrics</em>, Cambridge University Press. <br />
Efron [1978] “Regression and ANOVA with zero-one data: Measures of residual variation,” <em>JASA</em> 73 113-121.<br />
McFadden [1973] “Conditional logit analysis of qualitative choice behavior,” in Zarembka, <em>Frontiers of Econometrics</em>, 105-142. Academic Press.  
    <img src="http://feeds.feedburner.com/~r/Blog-normalDistribution/~4/NotcqM8Ckt8" height="1" width="1"/>]]></content:encoded>

    <pubDate>Fri, 31 Jul 2009 09:31:45 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/blognormal/index.php?/archives/8-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/blognormal/index.php?/archives/8-What-Is-the-Error-if-the-Probability-of-Rain-Is-.5-and-It-Rains.html</feedburner:origLink></item>
<item>
    <title>Welcome to the SAS Solar Farm, Gov. Perdue</title>
    <link>http://feedproxy.google.com/~r/Blog-normalDistribution/~3/kr9m28ubHGM/index.php</link>
            <category>Economics</category>
            <category>Environment</category>
            <category>Opinion</category>
            <category>SAS</category>
    
    <comments>http://blogs.sas.com/blognormal/index.php?/archives/1-Welcome-to-the-SAS-Solar-Farm,-Gov.-Perdue.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/blognormal/wfwcomment.php?cid=1</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/blognormal/rss.php?version=2.0&amp;type=comments&amp;cid=1</wfw:commentRss>
    

    <author>John.Sall@jmp.com (John Sall)</author>
    <content:encoded><![CDATA[
    <em>I had the privilege to lead off a press event this morning for <a href="http://www.governor.state.nc.us/NewsItems/PressReleaseDetail.aspx?newsItemid=404" >North Carolina Gov. Bev Perdue</a> at the <a href="http://www.sas.com/news/preleases/SolarFarmLive.html" >SAS Solar Farm</a> in Cary, NC. It was a beautiful photo event on a small hill overlooking the field. Here are my remarks:</em><br />
<br />
Good morning and welcome to the SAS Solar Farm. I am <a href="http://www.sas.com/presscenter/bios/jsall.html" >John Sall</a>, co-founder of SAS Institute.<br />
<br />
This field of solar panels was completed last December; it has 5,040 panels, generates 1 megawatt of electric power at peak and is projected to produce 1.7 million kWh per year. The panels swivel to track the sun across the sky. This solar farm will eliminate 1,600 tons of carbon emissions annually.<br />
<br />
The solar field occupies 4.8 acres of land -- and we also use the field as a pasture for Dorper Sheep (short sheep that fit better under the panels).<br />
<br />
Eventually, the revenues from this facility will repay our investment, but only because of the generous state and federal tax credits and NC GreenPower electric rates. Without the incentives, this solar-generating facility would not have been built.<br />
<br />
I hope that federal legislation will be forthcoming to make alternative energy and energy conservation economic by a federal charge on fossil carbon energy sourcing; this would be the most effective, efficient and ultimately the least painful way to a sustainable energy future. Until that becomes politically viable, other measures, such as alternative energy subsidies and quantity limits will at least move us in the right direction toward sustainability and energy security.<br />
<br />
This solar farm is one of several energy initiatives at SAS. We aim to conserve energy use as we grow more jobs here: <ul><br />
<li> Our next building, under construction over there behind those trees, is designed to achieve <a href="http://www.usgbc.org/DisplayPage.aspx?CMSPageID=1988" >LEED certification</a> at least at the Silver level. <br />
<li> Elsewhere, our forthcoming cloud computing facility will use energy-efficient computer systems. <br />
<li> We recently published our second annual corporate sustainability and social responsibility report, which summarizes these and other efforts. <br />
</ul><br />
SAS is happy to call North Carolina home, with the state’s support for business, research and higher education, all of this enabling better jobs, better health and long-term sustainability.<br />
<br />
North Carolina has many opportunities in alternative energy: in solar, in biofuels and in wind. Recently, the federal Department of the Interior, under Ken Salazar, made the first step to <a href="http://www.mms.gov/ooc/press/2009/press0422.htm" >unlock leasing for offshore wind farms,</a> and North Carolina has some of the best opportunities.<br />
<br />
North Carolina is a home for energy research, too. We congratulate NC State University, which last year was appointed the lead institution for a smart grid NSF grant, which led to the <a href="http://www.freedm.ncsu.edu/" >FREEDM Systems Center </a>(Future Renewable Electric Energy Delivery and Management). <br />
<br />
We welcome everyone here to this new energy farm, and we look forward to our governor’s announcements on the subject of sustainability.<br />
<br />
It is my privilege to introduce the Governor of the State of North Carolina, Bev Perdue. <br />
<br />
<img width='477' height='329' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="/jmp/uploads/SAS_solarfarm_event.jpg" alt="photo of John Sall, Executive Vice President and Co-founder of SAS, with North Carolina Gov. Beverly Perdue" /><br />
<br clear="all"/><br />
Dale Carroll, Deputy Secretary of NC Department of Commerce (left); Hilda Pinnix-Ragland, Vice President of Corporate Public Affairs for Progress Energy (second from left); and NC Gov. Bev Perdue (center) join Jerry Williams of SAS (second from right) and me (right) at the SAS Solar Farm this morning. The Dorper Sheep are in the background.<br />
<br />
<strong>Photo by Steve Muir, SAS</strong> 
    <img src="http://feeds.feedburner.com/~r/Blog-normalDistribution/~4/kr9m28ubHGM" height="1" width="1"/>]]></content:encoded>

    <pubDate>Thu, 21 May 2009 12:43:00 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/blognormal/index.php?/archives/1-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/blognormal/index.php?/archives/1-Welcome-to-the-SAS-Solar-Farm,-Gov.-Perdue.html</feedburner:origLink></item>
<item>
    <title>Carbon Supply and Demand</title>
    <link>http://feedproxy.google.com/~r/Blog-normalDistribution/~3/ddpbrjPHt10/index.php</link>
            <category>Economics</category>
            <category>Environment</category>
            <category>Opinion</category>
    
    <comments>http://blogs.sas.com/blognormal/index.php?/archives/2-Carbon-Supply-and-Demand.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/blognormal/wfwcomment.php?cid=2</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/blognormal/rss.php?version=2.0&amp;type=comments&amp;cid=2</wfw:commentRss>
    

    <author>John.Sall@jmp.com (John Sall)</author>
    <content:encoded><![CDATA[
    In my <a href="http://blogs.sas.com/blognormal/index.php?/archives/3-Earth-Day-Ps-and-Qs.html" >Earth Day blog post</a> last week, I introduced some claims that actions on price were better than actions on quantity in the carbon market. Now it is time to back up those claims.<br />
<br />
The main tool of economic thinking is supply and demand curves. Let's draw hypothetical supply and demand curves for carbon in carbon fuels. Economists reverse the usual X and Y and put price on the Y axis and quantity on the X axis. At low prices, people are willing to buy more carbon; and at high prices, people will buy less, substituting other goods (alternative energy) or withdrawing (energy conservation). <br />
<br />
Thus, we have the red demand curve. Supply has the reverse slope, with low prices reducing supply and high prices increasing supply (the blue line). The point where the curves cross, point A, determines the price and quantity: quantity 50 and price 50.<br />
<br />
<img width='465' height='453' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="/jmp/uploads/carbon1.gif" alt="graph showing carbon supply and demand curves" /><br />
<br clear="all"/><br />
<br />
Now consider a charge on carbon sourcing. OK, it is a carbon tax, but please don't let the word <em>tax</em> scare you.<br />
<br />
The carbon tax just shifts the supply curve upward by the amount of the tax, 20, resulting in a new supply curve, the orange line. The new supply curve crosses the demand curve at point B now, resulting in a smaller quantity (40) and higher price (60). <br />
<br />
The alternative to a carbon tax is a carbon cap, restricting quantity to not exceed a cap of 40, the green line. Actually, the supply curve is the gray-shadow that begins with the blue line, but then turns straight up at the green line. The new capped supply also crosses the demand curve at quantity 40 and price 60. We have constructed the lines to show that you can set the tax at an amount that would be equivalent to some cap.<br />
<br />
<img width='466' height='452' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="/jmp/uploads/carbon2.gif" alt="graph titled Carbon Cap and Carbon Tax" /><br />
<br clear="all"/><br />
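<br />
The equivalence of the tax and the cap can be checked with a small sketch. The post gives no equations for the curves, so the straight lines below are assumptions chosen only because they pass through the quoted points A (quantity 50, price 50) and B (quantity 40, price 60); the function names are hypothetical.<br />

```python
def demand_price(q, shift=0.0):
    # Assumed linear demand: price falls as quantity rises
    return 100.0 + shift - q

def supply_price(q, tax=0.0):
    # Assumed linear supply, shifted upward by the amount of any tax
    return q + tax

def equilibrium_with_tax(tax=0.0, shift=0.0):
    # Solve q + tax = 100 + shift - q for the crossing point
    q = (100.0 + shift - tax) / 2.0
    return q, supply_price(q, tax)

def equilibrium_with_cap(cap, shift=0.0):
    # Supply follows the untaxed line until the cap, then turns straight up
    q, p = equilibrium_with_tax(0.0, shift)
    if q > cap:
        q, p = cap, demand_price(cap, shift)
    return q, p

print("A:", equilibrium_with_tax())
print("B with tax of 20:", equilibrium_with_tax(tax=20.0))
print("B with cap of 40:", equilibrium_with_cap(cap=40.0))
```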
<br />
Suppose that we have fluctuations in the economy, business cycles, booms and recessions. These things do happen, you know. In recession, the demand curve shifts down, with less quantity demanded at a given price. In a boom, the demand curve shifts up, with more quantity demanded at a given price. This variation is represented by the three red demand curves below. <br />
<br />
Now see how the different supply curves change the solution. With a carbon tax (orange supply curve), we have the price-quantity solution varying from B up to D1 in a boom, and down to D2 in a recession -- resulting in prices varying between 55 and 65. <br />
<br />
If we have a cap (the green line) instead of a tax, then the three demand curves change the solution from B down to C2 and up to C1 -- resulting in prices varying between 50 and 70.<br />
<br />
<img width='465' height='452' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="/jmp/uploads/carbon3.gif" alt="graph titled Carbon Demand Variation's Effect on Prices" /><br />
<br clear="all"/><br />
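<br />
The price ranges quoted above can be reproduced with assumed straight-line curves. The demand line P = 100 + shift - Q, the taxed supply line P = Q + 20, and the demand shifts of plus or minus 10 are not given in the post; they are inferred here only because they match the quoted prices.<br />

```python
def tax_solution(shift, tax=20.0):
    # Assumed demand: P = 100 + shift - Q; assumed taxed supply: P = Q + tax
    q = (100.0 + shift - tax) / 2.0
    return q, q + tax

def cap_solution(shift, cap=40.0):
    # Vertical supply at the cap (assumes demand is strong enough for the cap to bind)
    return cap, 100.0 + shift - cap

for label, shift in (("recession", -10.0), ("normal", 0.0), ("boom", 10.0)):
    print(label, "tax (Q, P):", tax_solution(shift), "cap (Q, P):", cap_solution(shift))
```

Under these assumptions the tax lets quantity move between 35 and 45 while price moves between 55 and 65; the cap pins quantity at 40 and lets price swing between 50 and 70.<br />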
<br />
I don't know the true locations of carbon supply and demand, but we don't have to know to draw some conclusions. A vertical supply curve (the cap) will always produce more-variable prices than any positively sloped supply curve. The cap produces the maximum price variation possible for any non-negatively sloped supply curve. <br />
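<br />
For straight-line curves the claim can be made concrete. This sketch assumes supply P = a + b*Q with slope b of zero or more, and demand P = c + shift - Q with slope -1; solving for the crossing point gives a price response of b / (1 + b) per unit of demand shift. The function name and the unit demand slope are illustrative assumptions.<br />

```python
def price_sensitivity(supply_slope):
    """Change in equilibrium price per unit shift in demand.

    Assumes linear supply P = a + b*Q with b >= 0 and demand P = c + shift - Q,
    which gives dP/dshift = b / (1 + b).
    """
    b = supply_slope
    return b / (1.0 + b)

for b in (0.0, 0.5, 1.0, 4.0, 100.0):
    print(b, price_sensitivity(b))
```

The response rises toward 1 as the supply curve steepens, and a vertical cap is the limiting case where the full demand shift shows up in price.<br />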
<br />
What about total carbon consumed between a cap and a tax? The cap's quantity will be fixed, assuming the economy is not so bad that demand is less than the cap. The tax's quantity will be variable, but it can be designed so that the average quantity produced is the same as the carbon cap. Just imagine the variations between the three red lines, and the quantity solves between 35 and 45, with a long-run mean of 40, the same as the cap. Carbon goals are long term. We need to limit the average carbon emitted; we don't care whether carbon emissions vary some from year to year.<br />
<br />
So the mean carbon emitted is the same between cap and tax, but the price variation is much higher for the cap. The goal is to limit carbon emissions. But the way you get to the goal is through alternative energy and energy conservation. There is no other mechanism. And the economic driver for alternative energy and energy conservation is price. We have to have higher energy prices to solve demand to lower quantity, whether through a cap or a tax.<br />
<br />
Energy conservation and alternative energy are structural changes driven by investments. For example, when gasoline prices are high, like a year ago, we tend to buy high-mileage cars. And when gasoline prices are low, like now, we tend to buy bigger cars. When prices are high, we make long-term investments in solar and wind energy. When prices are low, we don't. It's not good business. But investments are decided on variability as well as price. Why invest in a solar farm if prices are so variable as to make that investment risky? We need stable prices, as well as high prices to make the structural change happen. <br />
<br />
Carbon taxes produce more stable prices than carbon caps do. The history of previous caps backs that up. The <a href="http://www.cbo.gov/ftpdocs/89xx/doc8934/02-12-Carbon.pdf"  title="null">CBO study, "Policy Options for Reducing CO2 Emissions,"</a> which I referred to in <a href="http://blogs.sas.com/blognormal/index.php?/archives/3-Earth-Day-Ps-and-Qs.html" >my previous post</a>, notes that sulfur-dioxide emission permits had prices that varied much more than the economy. And EU carbon credits collapsed to half the price that prevailed six months earlier, as reported in the issue of <a href="http://www.nature.com/nature/journal/v457/n7228/index.html"  title="null">the journal <em>Nature</em> </a> that I also mentioned last week. <br />
<br />
The <a href="http://energycommerce.house.gov/Press_111/20090331/acesa_summary.pdf" >Waxman climate-change bill</a> only specifies a cap of 3% below 2005 levels by 2012. Will that make prices high enough to stimulate investments in energy conservation and alternative energy? 2005 was a huge boom year, and we are unlikely to demand that much energy unless we have another huge boom economy. Therefore, I would expect almost no price support by then for alternative energy or energy conservation.<br />
<br />
Of course, support for alternative energy and energy conservation can be in the form of subsidies. Subsidies are an inefficient manipulation of the market. The economy adjusts far more efficiently to real prices than it does to subsidies. We want people and businesses to invest in energy conservation and alternative energy because energy prices are high enough for that to make solid business sense. Let the market do what it will. Let taxes that account for externalities (commons costs) make the adjustments we need to supply. <br />
<br />
What happens later when the economy booms again? Then we have a huge price impact from a cap, and this is so worrisome that the Waxman bill has a strategic carbon reserve ready to unload more emissions credits to auction when needed. This makes for an unpredictable wiggle to the right of the cap. It is better than not having it, but a smooth supply curve would be much better.<br />
<br />
You still might not want a carbon tax because it is a new tax burden. Well, a cap is even more of a burden; you either have ineffective limits or even higher energy prices. Carbon taxes can be returned to taxpayers, per capita, to make them revenue-neutral. Caps could be revenue-neutral too, if the emissions credits were auctioned and the proceeds distributed, but that is not the current plan. The current plan needs to be changed.<br />
<br />
There is much more to say in future blog posts about the reserve, market manipulation, time-banking, operational considerations and international issues. 
    <img src="http://feeds.feedburner.com/~r/Blog-normalDistribution/~4/ddpbrjPHt10" height="1" width="1"/>]]></content:encoded>

    <pubDate>Mon, 27 Apr 2009 10:23:00 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/blognormal/index.php?/archives/2-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/blognormal/index.php?/archives/2-Carbon-Supply-and-Demand.html</feedburner:origLink></item>
<item>
    <title>Earth Day P's and Q's</title>
    <link>http://feedproxy.google.com/~r/Blog-normalDistribution/~3/A7LQLXrc1eM/index.php</link>
            <category>Economics</category>
            <category>Environment</category>
            <category>Opinion</category>
    
    <comments>http://blogs.sas.com/blognormal/index.php?/archives/3-Earth-Day-Ps-and-Qs.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/blognormal/wfwcomment.php?cid=3</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/blognormal/rss.php?version=2.0&amp;type=comments&amp;cid=3</wfw:commentRss>
    

    <author>John.Sall@jmp.com (John Sall)</author>
    <content:encoded><![CDATA[
    I suggest you spend some time on Earth Day reading about environmental policy. <br />
<br />
First, you should get the most recent (April 19, 2009) issue of <em>The New York Times Magazine</em> and read the article <a href="http://www.nytimes.com/2009/04/19/magazine/19Science-t.html?_r=6" >"Why isn't the brain green?"</a> by Jon Gertner. The most powerful force in environmental policy is the public’s opinion, and the shaping of this opinion is extremely important. The article profiles the work of <a href="http://www.cred.columbia.edu/" >CRED</a>, the Center for Research on Environmental Decisions, and the work of Elke Weber and her colleagues. In particular, the work of David Hardisty and the thinking of Baruch Fischhoff could make a huge difference in how to frame the current debate. The article says it could be very politically palatable to act on price, rather than quantity, in climate legislation. <br />
<br />
Second, you should be aware of the issue of <a href="http://www.nature.com/nature/journal/v457/n7228/index.html" ><em>Nature</em> dated Jan 22, 2009</a>. Notice the article titled “Not so sunny after all” to see that the solar energy market is deeply depressed now, though it was booming a year ago. Then notice the article “Prices plummet on carbon market” to see that the price of a ton of CO2 emission went from more than 30 euros in July 2008 to only 11.65 euros in January 2009. The direction now is not good.<br />
<br />
Third, you should read Tom Friedman’s book <em>Hot, Flat, and Crowded</em>. Tom has great insight into most current issues, and his insight into solving the climate problem is very good. He basically says that most actions have very small leverage. If you want effective change, we need to change the laws and policies we operate under. <br />
<br />
Fourth, you should read <em>Carbonomics</em>, by Steven Stoft, which puts an economist’s perspective on the current debate. An alternative to buying that book would be to read last year’s economic report by the Congressional Budget Office, <a href="http://www.cbo.gov/ftpdocs/89xx/doc8934/02-12-Carbon.pdf" >“Policy Options for Reducing CO2 Emissions.”</a> <br />
<br />
There are two approaches to addressing the climate problem: one is price-centered, the other is quantity-centered, the P and Q approaches, respectively. Most economists say the P approach is better because it is effective, efficient and gradual. The proponents of the Q approach say their way is more politically palatable and has certainty. <br />
<br />
The current climate legislation, the Waxman bill, is a Q-side approach of “cap and trade.” It is the work of scientists and politicians, who value certainty and palatability. Power companies don’t object to a Q approach as long as they get a lot of free emissions credits in the deal. Most NGOs, such as the ones that are in <a href="http://www.us-cap.org/" >USCAP</a>, strongly favor a Q approach, for a variety of reasons. A Q approach did work for sulfur dioxide. <br />
<br />
But how is a climate policy supposed to actually work? With a Q approach, you cap Q, auction or give emissions credits, trade them; as limits take hold, there is an artificial scarcity that will raise prices. With higher prices, you get energy conservation and alternative energy generation. <br />
<br />
With the P approach, you put a charge on carbon sources to raise prices, leading to energy conservation and alternative energy generation. The P approach is much simpler. The prices in a P approach are much more stable, and stable increasing price is the key to the changes we need in energy conservation and alternative energy. <br />
<br />
A Q approach leads to speculation and potentially wild fluctuation in prices, such as we had in the Enron-induced electricity price spike in California some years ago. The Waxman bill specifically fears fluctuations, creating a reserve of credits to sell when the price goes too high, and also ostensibly prohibiting “speculation.” It also leaves blank how credits are allocated, i.e., who gets free credits, which will be arbitrary and political. I don’t think a Q bill can ever be as good as a P bill, even if you load the Q bill down with gimmicks to make it more P-responsible.<br />
<br />
SAS Institute made a large investment last year in solar energy: Last December, we started generating electricity from an $8 million, nearly 5-acre 1-megawatt solar farm. For SAS, it is actually economic. We will eventually get a small return on the investment. So is the current system really working? Not really. It is artificial. The only reason that it is economic is due to large tax credits we get on the initial investment, together with favorable green-power rates you can get, as pressured by alternative-energy incentives to power companies. Do we actually use any of that solar electricity at SAS? No. If we had to use it ourselves, it would displace the cheap electricity we buy, and the project would no longer be economic. There is no way we could operate off the solar power, anyway -- it would supply only about 3% of our needs and obviously would not work at night or on cloudy days. If the subsidies and credits went away, we would not invest in solar. <br />
<br />
Alternative energy could develop naturally if the price of carbon-based electricity were to rise to several times its current level. That is what we really need: more expensive carbon electricity and more expensive carbon fuel. That is what will be effective in efficiently adjusting to the new future. We need a P signal. A P signal is so much more efficient and businesslike than subsidies and limits. <br />
<br />
I happen to care a lot about environmental issues. I remember my first Earth Day in 1970 when environmental conscience-raising seemed new. I was in college and lived in a special-interest house on campus called Ecology House. In the mid-nineties, I became involved with <a href="http://www.nature.org/" >The Nature Conservancy</a> (TNC) and traveled to many conservation sites all over the world with TNC or <a href="http://www.worldwildlife.org/" >WWF</a>, including Bolivia, Peru, Venezuela, Mexico, Alaska, Panama, Costa Rica, Brazil, China, Indonesia, Malaysia, Zambia, Namibia and Botswana. My interests led to joining the Board of Directors of TNC, serving under a variety of chairs, including Hank Paulson and John Morgridge, and now I am chair of the Audit Committee there. Between my wife and me, we serve on the boards of TNC, WWF, <a href="http://www.care.org/" >CARE</a>, and the <a href="http://www.nicholas.duke.edu/institute/" >Nicholas Institute for Environmental Policy Solutions</a>, and I am joining the <a href="http://www.edf.org/page.cfm?tagID=1768" >National Council of Environmental Defense Fund</a>. So that’s my “green card.”<br />
<br />
However, all these organizations are on the Q side of the current debate, and I feel a little lonely being a P advocate. But I am an economist and businessman in my head and my heart. So I will stay respectfully opposed to the current Q bill and hope the realism provided by the recession will help focus opinion on better policy. <br />
 
    <img src="http://feeds.feedburner.com/~r/Blog-normalDistribution/~4/A7LQLXrc1eM" height="1" width="1"/>]]></content:encoded>

    <pubDate>Wed, 22 Apr 2009 09:15:00 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/blognormal/index.php?/archives/3-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/blognormal/index.php?/archives/3-Earth-Day-Ps-and-Qs.html</feedburner:origLink></item>
<item>
    <title>Not Really an $11 Trillion Hole</title>
    <link>http://feedproxy.google.com/~r/Blog-normalDistribution/~3/O9BxjaHLDRg/index.php</link>
            <category>Economics</category>
            <category>Opinion</category>
    
    <comments>http://blogs.sas.com/blognormal/index.php?/archives/5-Not-Really-an-11-Trillion-Hole.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/blognormal/wfwcomment.php?cid=5</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/blognormal/rss.php?version=2.0&amp;type=comments&amp;cid=5</wfw:commentRss>
    

    <author>John.Sall@jmp.com (John Sall)</author>
    <content:encoded><![CDATA[
    The front page of the <em>Wall Street Journal</em> on March 13 highlighted an "$11 Trillion Hole" and said "<a href="http://online.wsj.com/article/SB123687371369308675.html" >Americans See 18% of Wealth Vanish</a>." I looked at the chart, and the 2008 number indeed looked as if it had fallen off a cliff. <br />
<br />
But then I looked at the rest of the curve and remembered the two big bubbles that were going on, the Internet Bubble in the 1990s and the Housing Bubble in the 1990s and 2000s. I thought I should just exclude those points from the long-term trend. <br />
<br />
So I got the data from the <a href="http://www.federalreserve.gov/" >Federal Reserve's Web site</a> and tried to reproduce the <em>Wall Street Journal</em> plot, adding a trend line that excluded the bubble points. <br />
<br />
So how does our current net worth look with respect to the long-term trend? Not bad at all. We are not in an $11 trillion hole but are back on track after some roller-coaster years.<br />
<br />
<img width='387' height='273' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="/jmp/uploads/householdNetWorth.gif" alt="household net worth by year" /><br />
<br clear="all"/><br />
<br />
<strong>Legend:</strong><br />
green = used to estimate the regression line, 1985 to 1996<br />
red = the points in the bubbles<br />
blue = the current value that was the subject of the Wall Street Journal article<br />
<br />
I don't want to deny in any way that we are in an economic crisis. But I do want to remind everyone that portfolio valuation drops are not quite as bad as they seem if you consider that the last few years of huge yields were somewhat artificial, and just returning to normal valuations will look like a crash.<br />
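<br />
The trend-line idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the series below is synthetic placeholder data, not the Federal Reserve figures, and the function name <em>fit_trend</em> is made up for this sketch. It fits a least-squares line to the 1985 to 1996 points only and then extends that line to 2008.<br />
<br />
```python
def fit_trend(years, values, fit_start, fit_end):
    """Least-squares line through the points whose year lies in [fit_start, fit_end].

    Returns a function that evaluates the fitted line at any year.
    """
    pts = [(y - fit_start, v) for y, v in zip(years, values)
           if fit_start <= y <= fit_end]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxv = sum((x - mean_x) * (v - mean_v) for x, v in pts)
    slope = sxv / sxx
    intercept = mean_v - slope * mean_x
    return lambda year: intercept + slope * (year - fit_start)


# Synthetic placeholder series (NOT real net-worth data): a straight line
# with an artificial upward shift from 1997 on, standing in for the bubbles.
years = list(range(1985, 2009))
values = [15.0 + 2.0 * (y - 1985) + (5.0 if y >= 1997 else 0.0) for y in years]

trend = fit_trend(years, values, 1985, 1996)   # bubble years excluded from the fit
gap_2008 = values[-1] - trend(2008)            # distance of the last point from the trend
```
<br />
With real data, the quantity of interest is the gap between the latest value and the extrapolated line rather than the drop from the previous year's peak.<br />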
<br />
Sources: <ul><br />
<li><a href="http://www.federalreserve.gov/releases/z1/Current/annuals/a1985-1994.pdf" >Flow of Funds Accounts of the United States, 1985-1994 [PDF]</a><br />
<li><a href="http://www.federalreserve.gov/releases/z1/Current/annuals/a1995-2004.pdf" >Flow of Funds Accounts of the United States, 1995-2004 [PDF]</a><br />
<li><a href="http://www.federalreserve.gov/releases/z1/Current/annuals/a2005-2008.pdf" >Flow of Funds Accounts of the United States, 2005-2008 [PDF]</a><br />
<li><a href="http://www.federalreserve.gov/releases/z1/Current/z1r-5.pdf" >Balance Sheet of Households and Nonprofit Organizations, March 12, 2009 [PDF]</a> </ul><br />
 
    <img src="http://feeds.feedburner.com/~r/Blog-normalDistribution/~4/O9BxjaHLDRg" height="1" width="1"/>]]></content:encoded>

    <pubDate>Mon, 06 Apr 2009 09:57:00 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/blognormal/index.php?/archives/5-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/blognormal/index.php?/archives/5-Not-Really-an-11-Trillion-Hole.html</feedburner:origLink></item>
<item>
    <title>Optimal Design of the Choice Experiment</title>
    <link>http://feedproxy.google.com/~r/Blog-normalDistribution/~3/E2yB2AN3ZuI/index.php</link>
            <category>Analytics</category>
            <category>JMP</category>
            <category>Statistics</category>
    
    <comments>http://blogs.sas.com/blognormal/index.php?/archives/6-Optimal-Design-of-the-Choice-Experiment.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/blognormal/wfwcomment.php?cid=6</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/blognormal/rss.php?version=2.0&amp;type=comments&amp;cid=6</wfw:commentRss>
    

    <author>John.Sall@jmp.com (John Sall)</author>
    <content:encoded><![CDATA[
    My <a href="http://blogs.sas.com/blognormal/index.php?/archives/7-Choice-Experimental-Designs-Are-Different.html" >previous blog post</a> covered issues in the design of a choice experiment for laptop computers. The goal was to model the trade-offs among features and price. In this post, I'll show how to design a choice experiment.  <br />
<br />
The Choice Design feature, which you access from the DOE menu in JMP 8, designs choice experiments. This platform was developed by <strong>Bradley Jones</strong>, with help from <strong>Chris Gotwalt</strong>. <br />
<br />
The first job is to enter the factors in the experiment. After adding the factors and specifying the levels, the window looks like this:<br />
<br />
<img width='478' height='203' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="/jmp/uploads/laptop1DOEFactors.gif" alt="" /><br />
<br clear="all"/><br />
<br />
Next, we specify the model. This has to be a small experiment, so we just take the default main-effect model.<br />
<br />
<img width='489' height='154' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="/jmp/uploads/laptop1bModel.gif" alt="" /><br />
<br clear="all"/><br />
<br />
Next, we fill in the Prior Specification. Remember from <a href="http://blogs.sas.com/blognormal/index.php?/archives/7-Choice-Experimental-Designs-Are-Different.html" >my previous post</a> that the optimal design depends on what the answer is, and we don’t know the answer. Actually, we already know a lot about the choices. We already know that people want large disks, higher speeds, longer battery life and lower price.  <br />
<br />
The experiment measures the relative strengths of these characteristics; it measures trade-offs, particularly the trade-off between price and features. The response is in the positive direction, utility. Notice that all the factor levels are ordered so that the least desirable levels are first and the most desirable levels are last. <br />
<br />
Now we can tell the designer that we know the direction of these levels. We do this by entering a prior mean. We say that 80 Gig (GB) is worth 1 utility unit more than 40 Gig. We say that 2.0 GHz CPU is worth 1 utility unit more than 1.5 GHz, etc. Of course we don’t really know the magnitude of these, and the uncertainty of that is expressed in the Prior Variance Matrix, with 1s on the diagonals. The convention is that if the first level is less desirable, then you enter a negative value, as we do here. When there are three levels in increasing utility order, enter negative, then 0. Actually, it doesn’t matter whether we enter the levels in the right order for the parameterization as long as the ordering is consistent across levels.  <br />
<br />
<img width='419' height='273' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="/jmp/uploads/laptop2Priors2.png" alt="" /><br />
<br clear="all"/><br />
<br />
This Prior Specification is important in experiments like this, where the factors all have known preference directions and the goal is to measure trade-offs. If we didn’t specify this, then we could easily get choice-set items where one choice included all of the better factor levels and the other choice included all of the worse factor levels; in such a case, the choice response would be trivially obvious, and the run would be wasted. <br />
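<br />
To see why an all-good versus all-bad choice set is wasted, here is a minimal Python sketch. It assumes a simple binary logit model and, following the prior means above, that each better factor level adds 1 utility unit; the function names are hypothetical, and this is not a description of what JMP computes internally. The quantity p*(1-p) is the usual variance of a binary response and serves here as a rough measure of how informative the response is.<br />
<br />
```python
import math


def choice_probability(utility_a, utility_b):
    """Binary logit probability that profile a is chosen over profile b."""
    return 1.0 / (1.0 + math.exp(-(utility_a - utility_b)))


def information_weight(p):
    """Variance p*(1-p) of a binary response; small when the outcome is nearly certain."""
    return p * (1.0 - p)


# Assumption: four two-level factors, each better level worth 1 utility unit.
p_all_good_vs_all_bad = choice_probability(4.0, 0.0)   # a has every better level
p_balanced_trade_off = choice_probability(2.0, 2.0)    # each profile wins on two factors

w_dominated = information_weight(p_all_good_vs_all_bad)
w_balanced = information_weight(p_balanced_trade_off)
```
<br />
Under these assumptions the dominated comparison is chosen with probability above 0.98, so its information weight is less than a tenth of the weight for the balanced comparison.<br />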
<br />
Now we specify the rest of the experiment we want:<br />
<br />
<img width='375' height='157' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="/jmp/uploads/laptop3Specs.gif" alt="" /><br />
<br clear="all"/><br />
<br />
Suppose we have 16 subjects lined up to take the choice survey. We figure that each subject has the patience to do six comparisons.  Each choice set will be two profiles — we could ask people to choose among more, but that is more work for the subject — two is standard. We choose to do two survey sets. This is a compromise between giving everyone the same questions and giving everyone his or her own separate survey with separately designed choice sets. The total number of subjects is the product of the last two specifications (2*8=16). The total number of choice responses is the product of the last three specifications (6*2*8=96).  <br />
<br />
Now there are two levels of design data here. There are the profiles that go into making each choice set. There are two profiles per choice set times six choice sets per survey times two surveys, making a table of (2*6*2) 24 unique choice profiles. <br />
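<br />
The bookkeeping for these counts is simple enough to check in a few lines of Python. The variable names are my own labels for this sketch, not the field names in the JMP window.<br />
<br />
```python
# Design specifications from the text (variable names are illustrative labels).
profiles_per_choice_set = 2
choice_sets_per_survey = 6
surveys = 2
respondents_per_survey = 8

subjects = surveys * respondents_per_survey
choice_responses = choice_sets_per_survey * surveys * respondents_per_survey
unique_profiles = profiles_per_choice_set * choice_sets_per_survey * surveys
```
<br />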
 <br />
<img width='483' height='375' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="/jmp/uploads/laptop4ProfileTable.gif" alt="" /><br />
<br clear="all"/><br />
<br />
This structures the factor-level data so that you can prepare the raw material for the survey.  <br />
<br />
Then there is the subject-level data for the responses, showing which subjects get which survey and having a slot to enter the response for each choice trial. Here are the rows for the first two subjects. The first subject is taking Survey 1, and the second subject is taking Survey 2. <br />
<br />
<img width='508' height='266' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="/jmp/uploads/laptop5ResponsesTable.gif" alt="" /><br />
<br clear="all"/><br />
<br />
The Choice1 and Choice2 values index the Choice ID value in the Profiles table that matches the Choice Set ID. For example, in row 10, Choice1 is Choice ID 1 for Choice Set 10 in Survey 2, which is Row 19 in the Profile table (80 Gig, 1.5 GHz, 4 hours, $1,000), where the other choice is the next profile in Row 20 (40 Gig, 2.0 GHz, 4 hours, $1,500).<br />
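<br />
The lookup from the response table into the Profiles table can be written as a one-line formula. This sketch assumes that choice sets are numbered consecutively across surveys (1 through 12 here) and that the Profiles table is sorted by choice set and then by Choice ID, as in the tables shown in this post; the function name is hypothetical.<br />
<br />
```python
def profile_row(choice_set_id, choice_id, profiles_per_choice_set=2):
    """1-based row in the Profiles table for a given choice set and Choice ID.

    Assumes rows are sorted by choice set, then Choice ID, with choice sets
    numbered consecutively from 1.
    """
    return (choice_set_id - 1) * profiles_per_choice_set + choice_id


row_choice1 = profile_row(10, 1)   # Choice ID 1 in Choice Set 10
row_choice2 = profile_row(10, 2)   # Choice ID 2 in Choice Set 10
```
<br />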
<br />
Why have two tables instead of one? It turns out that you have a choice of one table or two.<br />
<br />
<img width='308' height='41' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="/jmp/uploads/laptop5bRadio.gif" alt="" /><br />
<br clear="all"/><br />
<br />
Let’s see whether this design follows the guidelines. Every choice must be a trade-off of desirable alternatives: <br />
<table border="1"><tr><td>Survey</td><td>Choice Set</td><td>Choice ID</td><td>hard disk</td><td>speed</td><td>battery life</td><td>price</td></tr><br />
<tr><td>1</td><td>1</td><td>1</td><td>40 Gig</td><td>1.5 GHz</td><td>6 hours</td><td>$1,500</td></tr><br />
<tr><td>1</td><td>1</td><td>2</td><td>40 Gig</td><td>2.0 GHz</td><td>4 hours</td><td>$1,200</td></tr></table><br />
This tests whether you are willing to pay $300 more to get two more hours of battery life even if you also have to sacrifice speed. Trade-off of $300 and speed for battery life.<br />
<br />
<table border="1"><tr><td>Survey</td><td>Choice Set</td><td>Choice ID</td><td>hard disk</td><td>speed</td><td>battery life</td><td>price</td></tr><br />
<tr><td>1</td><td>2</td><td>1</td><td>40 Gig</td><td>1.5 GHz</td><td>6 hours</td><td>$1,000</td></tr><br /><br />
<tr><td>1</td><td>2</td><td>2</td><td>80 Gig</td><td>1.5 GHz</td><td>4 hours</td><td>$1,500</td></tr></table>Trade-off of $500 and battery life against hard disk.<br />
<br />
<table border="1"><tr><td>Survey</td><td>Choice Set</td><td>Choice ID</td><td>hard disk</td><td>speed</td><td>battery life</td><td>price</td></tr><br />
<tr><td>1</td><td>3</td><td>1</td><td>80 Gig</td><td>1.5 GHz</td><td>6 hours</td><td>$1,200</td></tr><br />
<tr><td>1</td><td>3</td><td>2</td><td>40 Gig</td><td>2.0 GHz</td><td>4 hours</td><td>$1,000</td></tr></table>Trade-off of $200 and speed for disk and battery life.<br />
<br />
<table border="1"><tr><td>Survey</td><td>Choice Set</td><td>Choice ID</td><td>hard disk</td><td>speed</td><td>battery life</td><td>price</td></tr><br /><br />
<tr><td>1</td><td>4</td><td>1</td><td>80 Gig</td><td>1.5 GHz</td><td>4 hours</td><td>$1,500</td></tr><br />
<tr><td>1</td><td>4</td><td>2</td><td>40 Gig</td><td>1.5 GHz</td><td>4 hours</td><td>$1,200</td></tr></table>Trade-off  of $300 for disk.<br />
<br />
<table border="1"><tr><td>Survey</td><td>Choice Set</td><td>Choice ID</td><td>hard disk</td><td>speed</td><td>battery life</td><td>price</td></tr><br />
<tr><td>1</td><td>5</td><td>1</td><td>80 Gig</td><td>2.0 GHz</td><td>4 hours</td><td>$1,200</td></tr><br />
<tr><td>1</td><td>5</td><td>2</td><td>40 Gig</td><td>1.5 GHz</td><td>6 hours</td><td>$1,000</td></tr></table>Trade-off of $200 and battery for speed and disk.<br />
<br />
<table border="1"><tr><td>Survey</td><td>Choice Set</td><td>Choice ID</td><td>hard disk</td><td>speed</td><td>battery life</td><td>price</td></tr><br />
<tr><td>1</td><td>6</td><td>1</td><td>80 Gig</td><td>2.0 GHz</td><td>4 hours</td><td>$1,500</td></tr><br />
<tr><td>1</td><td>6</td><td>2</td><td>80 Gig</td><td>1.5 GHz</td><td>6 hours</td><td>$1,200</td></tr></table>Trade-off of $300 and battery for speed.<br />
<br />
<table border="1"><tr><td>Survey</td><td>Choice Set</td><td>Choice ID</td><td>hard disk</td><td>speed</td><td>battery life</td><td>price</td></tr><br />
<tr><td>2</td><td>7</td><td>1</td><td>80 Gig</td><td>2.0 GHz</td><td>6 hours</td><td>$1,200</td></tr><br />
<tr><td>2</td><td>7</td><td>2</td><td>40 Gig</td><td>1.5 GHz</td><td>4 hours</td><td>$1,000</td></tr></table>Trade-off of $200 for disk, speed and battery life.<br />
<br />
<table border="1"><tr><td>Survey</td><td>Choice Set</td><td>Choice ID</td><td>hard disk</td><td>speed</td><td>battery life</td><td>price</td></tr><br />
<tr><td>2</td><td>8</td><td>1</td><td>40 Gig</td><td>2.0 GHz</td><td>6 hours</td><td>$1,200</td></tr><br />
<tr><td>2</td><td>8</td><td>2</td><td>80 Gig</td><td>1.5 GHz</td><td>4 hours</td><td>$1,000</td></tr></table>Trade-off of $200 and disk for speed and battery.<br />
<br />
<table border="1"><tr><td>Survey</td><td>Choice Set</td><td>Choice ID</td><td>hard disk</td><td>speed</td><td>battery life</td><td>price</td></tr><br />
<tr><td>2</td><td>9</td><td>1</td><td>40 Gig</td><td>1.5 GHz</td><td>6 hours</td><td>$1,500</td></tr><br />
<tr><td>2</td><td>9</td><td>2</td><td>40 Gig</td><td>2.0 GHz</td><td>4 hours</td><td>$1,000</td></tr></table>Trade-off of $500 and speed for battery.<br />
<br />
<table border="1"><tr><td>Survey</td><td>Choice Set</td><td>Choice ID</td><td>hard disk</td><td>speed</td><td>battery life</td><td>price</td></tr><br />
<tr><td>2</td><td>10</td><td>1</td><td>80 Gig</td><td>1.5 GHz</td><td>4 hours</td><td>$1,000</td></tr><br />
<tr><td>2</td><td>10</td><td>2</td><td>40 Gig</td><td>2.0 GHz</td><td>4 hours</td><td>$1,500</td></tr></table>Trade-off of $500 and disk for speed.<br />
<br />
<table border="1"><tr><td>Survey</td><td>Choice Set</td><td>Choice ID</td><td>hard disk</td><td>speed</td><td>battery life</td><td>price</td></tr><br />
<tr><td>2</td><td>11</td><td>1</td><td>80 Gig</td><td>1.5 GHz</td><td>4 hours</td><td>$1,200</td></tr><br />
<tr><td>2</td><td>11</td><td>2</td><td>40 Gig</td><td>2.0 GHz</td><td>6 hours</td><td>$1,500</td></tr></table>Trade-off of $300 and disk for speed and battery.<br />
<br />
<table border="1"><tr><td>Survey</td><td>Choice Set</td><td>Choice ID</td><td>hard disk</td><td>speed</td><td>battery life</td><td>price</td></tr><br /><br />
<tr><td>2</td><td>12</td><td>1</td><td>40 Gig</td><td>1.5 GHz</td><td>4 hours</td><td>$1,000</td></tr><br />
<tr><td>2</td><td>12</td><td>2</td><td>80 Gig</td><td>2.0 GHz</td><td>4 hours</td><td>$1,500</td></tr></table> Trade-off of $500 for disk and speed.<br />
<br />
Are there any degenerate choices (i.e., where the choices are equal)? No. That's good.<br />
<br />
For each factor, do we have choices where that factor is constant (so that a dominant factor can’t prevent the other factors from being measured)? Well, no. Price is always different in each choice set, so if price is totally dominant, we can’t measure other effects. If this is a concern, then we need to go back to the Design Generation field and change 4 to 3 in “Number of attributes that can change within a choice set.” <br />
<br />
<img width='411' height='46' style="border: 0px; padding-left: 5px; padding-right: 5px;" src="/jmp/uploads/laptop6AlternateSpec.gif" alt="" /><br />
<br clear="all"/><br />
<br />
How about the polarity question? Polar factors should always have a mixture of polarity. That means the trade-offs should always be meaningful, not just all-good versus all-bad. This is where the Prior Specification works well. All of the choices are working pretty hard to measure values of interest. No choice is uninteresting.<br />
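<br />
These checks can also be done mechanically. The following Python sketch encodes the 12 choice sets listed above as (hard disk in Gig, speed in GHz, battery hours, price in dollars) and tests the three guidelines. The function names and the encoding are my own assumptions for illustration; this is not how the Choice Design platform evaluates a design.<br />
<br />
```python
FACTORS = ("hard disk", "speed", "battery life", "price")
# +1 when a larger value is preferred, -1 when a smaller value is preferred.
DIRECTION = (1, 1, 1, -1)

# The 12 choice sets from the tables above: (profile 1, profile 2).
DESIGN = [
    ((40, 1.5, 6, 1500), (40, 2.0, 4, 1200)),
    ((40, 1.5, 6, 1000), (80, 1.5, 4, 1500)),
    ((80, 1.5, 6, 1200), (40, 2.0, 4, 1000)),
    ((80, 1.5, 4, 1500), (40, 1.5, 4, 1200)),
    ((80, 2.0, 4, 1200), (40, 1.5, 6, 1000)),
    ((80, 2.0, 4, 1500), (80, 1.5, 6, 1200)),
    ((80, 2.0, 6, 1200), (40, 1.5, 4, 1000)),
    ((40, 2.0, 6, 1200), (80, 1.5, 4, 1000)),
    ((40, 1.5, 6, 1500), (40, 2.0, 4, 1000)),
    ((80, 1.5, 4, 1000), (40, 2.0, 4, 1500)),
    ((80, 1.5, 4, 1200), (40, 2.0, 6, 1500)),
    ((40, 1.5, 4, 1000), (80, 2.0, 4, 1500)),
]


def degenerate_sets(design):
    """Choice sets whose two profiles are identical."""
    return [i for i, (a, b) in enumerate(design, start=1) if a == b]


def constant_counts(design):
    """For each factor, the number of choice sets in which it does not change."""
    return {name: sum(1 for a, b in design if a[k] == b[k])
            for k, name in enumerate(FACTORS)}


def dominated_sets(design):
    """Choice sets where one profile is at least as good on every factor."""
    found = []
    for i, (a, b) in enumerate(design, start=1):
        signs = [d * (x - y) for d, x, y in zip(DIRECTION, a, b)]
        if a != b and (min(signs) >= 0 or max(signs) <= 0):
            found.append(i)
    return found


degenerate = degenerate_sets(DESIGN)
constants = constant_counts(DESIGN)
dominated = dominated_sets(DESIGN)
```
<br />
For this design, the sketch finds no degenerate and no dominated choice sets, and it confirms that price is the only factor that is never held constant within a choice set.<br />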
<br />
Now we have an experimental design. Thanks to <strong>Brad Jones</strong> for this example.<br />
 
    <img src="http://feeds.feedburner.com/~r/Blog-normalDistribution/~4/E2yB2AN3ZuI" height="1" width="1"/>]]></content:encoded>

    <pubDate>Thu, 15 Jan 2009 09:21:00 -0500</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/blognormal/index.php?/archives/6-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/blognormal/index.php?/archives/6-Optimal-Design-of-the-Choice-Experiment.html</feedburner:origLink></item>
<item>
    <title>Choice Experimental Designs Are Different</title>
    <link>http://feedproxy.google.com/~r/Blog-normalDistribution/~3/NwQ-X7GB5h8/index.php</link>
            <category>Analytics</category>
            <category>JMP</category>
            <category>Statistics</category>
    
    <comments>http://blogs.sas.com/blognormal/index.php?/archives/7-Choice-Experimental-Designs-Are-Different.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/blognormal/wfwcomment.php?cid=7</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/blognormal/rss.php?version=2.0&amp;type=comments&amp;cid=7</wfw:commentRss>
    

    <author>John.Sall@jmp.com (John Sall)</author>
    <content:encoded><![CDATA[
    Laptop vendors need to know which features are valued in a laptop and how much customers are willing to pay for them. Manufacturers could learn this through a market research technique known as a <b>choice experiment</b>. This post covers the elements of experimental design for choice experiments using JMP 8. <br />
<br />
But first, I've got to give credit where it's due. Both the R&D and examples used here are the work of <b>Bradley Jones</b> and <b>Chris Gotwalt</b>, who implemented the techniques in JMP 8.<br />
<br />
So let's design a choice experiment to figure out how valuable a number of features are to customers. In particular, we focus on the following:<br />
<br />
<table border="1"><tr> <th>Factor</th>    <th>Levels </th>  </tr> <tr> <td>Speed </td>    <td>Fast,&#160;Slow</td> </tr> <tr> <td>Disk Size</td>      <td> Big,&#160;Little</td> </tr><tr> <td>Battery Life</td>      <td>Long,&#160;Short</td> </tr><tr> <td>Price</td>    <td> Cheap,&#160;Expensive</td> </tr></table><br />
<br />
In the experiment, a subject has to choose between two configurations, selecting the configuration that is most appealing. For example, one choice might be between an expensive but full-featured laptop, and a cheap but feature-compromised one.<br />
<br />
<table border="1"><tr> <th>Choice</th> <th>Speed</th> <th>Disk Size</th> <th>Battery life</th> <th>Price </th>  </tr><tr> <td>a</td><td>Fast</td>  <td>Big</td><td>Long</td> <td>Expensive</td></tr><tr> <td>b</td><td>Slow</td>  <td>Small</td>    <td>Short</td> <td>Cheap</td></tr></table><br />
<br />
Suppose we used an ordinary experimental design for use in a choice experiment. Each choice could be specified like a block in a traditional design. Would that be a good idea?<br />
<br />
Consider the following choice between a and b:<br />
<br />
<table border="1"><tr> <th>Choice</th> <th>Speed</th> <th>Disk Size</th> <th>Battery life</th> <th>Price </th> </tr><tr> <td>a</td>      <td>Fast</td>  <td>Small</td>    <td>Long</td>      <td>Expensive</td></tr><tr> <td>b</td>      <td>Fast</td>  <td>Small</td>    <td>Long</td>      <td>Expensive</td></tr></table><br />
<br />
These are runs in which the factors have the same values. There is no real choice here; there is no information to be gained because the choice is arbitrary. Thus, we have our first rule of choice experiments:<br />
<br />
<strong>Guideline: In choice experiments, there are no within-block replicates, i.e., the alternatives tested have to be different in order to learn something.</strong> <br />
<br />
This is quite different from, say, an industrial response surface design, where replicates are valuable in estimating factor effects with greater precision, in getting a more precise estimate of experimental error and in getting an estimate of pure error for a lack-of-fit test. So choice designs are different from ordinary experimental designs. <br />
<br />
Now let's consider a situation in which one choice outweighs all others. This could easily happen if the factor levels are spread out much more in one factor than in another, relative to the situation. For example, suppose that the high price was so high that no one could ever choose it, despite any other factor values. Now consider running the typical classical design, in which all the factors vary within blocks. Suppose each subject is given three choice questions, each with two choices.<br />
<br />
<table border="1"><tr> <th>Set</th> <th>Choice</th> <th>Speed</th> <th>Disk Size</th> <th>Battery life</th> <th>Price</th> </tr><tr> <td>1</td> <td>a</td> <td>Slow</td> <td>Big</td> <td>Long</td> <td>Cheap</td></tr><tr> <td>1</td> <td>b</td> <td>Fast</td> <td>Small</td> <td>Short</td> <td>Expensive</td></tr><tr> <td>2</td> <td>a</td> <td>Slow</td> <td>Small</td> <td>Long</td> <td>Expensive</td></tr><tr> <td>2</td> <td>b</td> <td>Fast</td> <td>Big</td> <td>Short</td> <td>Cheap</td></tr><tr> <td>3</td> <td>a</td> <td>Fast</td> <td>Small</td> <td>Short</td> <td>Expensive</td></tr><tr> <td>3</td> <td>b</td> <td>Slow</td> <td>Big</td> <td>Long</td> <td>Cheap</td></tr></table><br /><br />
<br />
This design can tell us a lot about a dominant factor, like price. Can it show anything else? If price dominates the decision, the user doesn't even have to look at the other factor values to make a decision. The other factors are not even measurable, other than being smaller than the price effect. You sacrifice learning about other factors. To fix this, we need to keep some factors the same within a choice set for some of the trials.<br />
<br />
<strong>Guideline: You must have choice sets where one factor is constant across the choice set, for each factor.</strong> <br />
<br />
There is another reason that you shouldn't vary too many factors across a choice set: Subjects get too confused and fatigued when there are too many differences for them to evaluate the trade-offs. If the two choices are very different, the choice will tend to look like a choice between two very different things; we say it is like comparing apples and oranges -- they are each good or bad in their own way, and they don't really compare against each other.<br />
<br />
<strong>Guideline: Never vary more than three or four factors at most across a choice set.</strong> <br />
<br />
There is another problem with ordinary designs. Consider the following choice set for a laptop: <br />
<br />
<table border="1"><tr> <th>Choice</th> <th>Speed</th> <th>Disk Size</th> <th>Battery life</th> <th>Price </th></tr><tr> <td>a</td>      <td>Fast</td>  <td>Large</td>    <td>Long</td>      <td>Cheap</td></tr><tr> <td>b</td>      <td>Slow</td>  <td>Small</td>    <td>Short</td>      <td>Expensive</td></tr></table><br />
<br />
This choice is going to be easy for the subject. There are no trade-offs to make. The laptop experiment was built for trade-offs because all the factors have a naturally preferred level. Faster is always preferred to slower. Larger disk is always preferred to smaller. Longer battery life is always preferred to shorter. Cheaper price is always preferred to more expensive, other things being equal. This experiment has all polar factors. Thus, this choice set doesn't tell us anything we don't already know. This choice set is an insult to the subject. Yet traditional experimental designs will produce runs like this.<br />
<br />
<strong>Guideline: Polar factors should always have a mixture of polarity. No choice set should have all the polar factors set in the same direction within choices. </strong><br />
<br />
There are other issues with some surveys:<ul><br />
<br />
<li>Some surveys ask too many questions, aiming for "full profile" details that allow you to estimate a different model for each person. But people tire of surveys that are too long; they stop early at any survey that challenges their patience, unless you actually pay them. About 15 questions is the most we can expect from volunteer surveys.<br />
<br />
<li>Some surveys don't vary enough factors. At an extreme, suppose that only one factor is varied across any choice set. In that case, only main effects are estimable; no interactions can be estimated. You can't evaluate trade-offs very well because the relative trade-off across factors is never tested.<br />
<br />
<li>If you balance the need for estimating trade-offs and interactions with the need for not asking too many questions, you should realize that the survey needs to give different sets of questions to different people to accomplish both aims. Though this makes the application of the survey more complex, this is not a real problem because the surveys are computer-generated.</ul><br />
<br />
All these rules sound pretty obvious in retrospect, right? The irony is that these considerations are not usually followed, especially the last few. Often market researchers use the same design of experiments (DOE) software for choice experiments that they use for industrial experiments, thinking that DOE is an abstract and general concept that is the same in every situation. It is not the same. Choice experiments are harder to design well.<br />
<br />
The one good approach to the very specific needs of each choice experiment is to use the tools of optimal experimental design, but adapted to the specific needs of choice experiments and to the specific needs in each individual situation.<br />
<br />
The technique of optimal design in general arranges factor settings in runs so that the most is learned from a given number of runs. In linear models, the optimal arrangement is invariant to what the actual parameter values are, so the situation is straightforward.<br />
<br />
It turns out that choice models, which are fit with a specialized kind of logistic regression, are not linear in the parameters, so the optimal design depends on the true values of the parameters, which are unknown. So some range, or prior distribution, of the parameters is used to represent the range you need to consider. The optimization of the design for this is fairly difficult, involving integrating over prior densities to create the optimal design. <i>[Chaloner, K. and Verdinelli, I. (1995). Bayesian experimental design: a review, Statistical Science 10: 273-304.]</i> But JMP was able to take what it had learned for Nonlinear DOE and apply it to Choice designs. <i>[Gotwalt, C., Jones, B. and Steinberg, D. (2009). Fast Computation of Designs Robust to Parameter Uncertainty for Nonlinear Settings, accepted at Technometrics.]</i><br />
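<br />
For readers who want the dependence on the parameters spelled out, here is one common textbook formulation for choice sets of two profiles, written in LaTeX. The notation is my own for this sketch (x for the coded factor settings of a profile, beta for the utility parameters, pi for the prior density), and it is not a statement of the exact criterion JMP optimizes.<br />
<br />
```latex
% Probability that profile a is chosen over profile b in choice set s
p_s(\beta) = \frac{\exp(x_{a,s}^\top \beta)}
                  {\exp(x_{a,s}^\top \beta) + \exp(x_{b,s}^\top \beta)}

% Information matrix for S paired choice sets, with d_s = x_{a,s} - x_{b,s}
M(\beta) = \sum_{s=1}^{S} p_s(\beta)\,\bigl(1 - p_s(\beta)\bigr)\, d_s d_s^\top

% A Bayesian D-optimality criterion averages over the prior \pi(\beta)
\max_{\text{design}} \int \log \det M(\beta)\, \pi(\beta)\, d\beta
```
<br />
In a linear model the information matrix is X'X, which does not involve the parameters at all. Here the weights p(1-p) depend on beta, which is why a prior is needed before the design can be optimized.<br />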
<br />
So a good experimental design for a choice experiment is different from that for other experiments, and optimal design techniques can handle them best.<br />
<br />
In my next post, we'll see how the laptop experiment was actually designed and run. 
    <img src="http://feeds.feedburner.com/~r/Blog-normalDistribution/~4/NwQ-X7GB5h8" height="1" width="1"/>]]></content:encoded>

    <pubDate>Tue, 16 Dec 2008 08:30:00 -0500</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/blognormal/index.php?/archives/7-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/blognormal/index.php?/archives/7-Choice-Experimental-Designs-Are-Different.html</feedburner:origLink></item>

</channel>
</rss>
