<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Princeton S* Network Systems</title>
	
	<link>http://sns.cs.princeton.edu</link>
	<description>Secure, Scalable, Self-Organizing, Storage, Self-Managing, ...</description>
	<lastBuildDate>Wed, 09 May 2012 12:58:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/PrincetonSNS" /><feedburner:info uri="princetonsns" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId>PrincetonSNS</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>JavaScript in JavaScript (js.js): Sandboxing Third-Party Scripts</title>
		<link>http://feedproxy.google.com/~r/PrincetonSNS/~3/m4mtfC9Ckfw/</link>
		<comments>http://sns.cs.princeton.edu/2012/04/javascript-in-javascript-js-js-sandboxing-third-party-scripts/#comments</comments>
		<pubDate>Thu, 19 Apr 2012 23:09:01 +0000</pubDate>
		<dc:creator>Jeff Terrace</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[emscripten]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://sns.cs.princeton.edu/?p=1889</guid>
		<description><![CDATA[js.js is a JavaScript interpreter (which runs in JavaScript) that allows an application to execute a third-party script inside a completely isolated, sandboxed environment. An application can, at runtime, create and interact with the objects, properties, and methods available from within the sandboxed  environment, giving it complete control over the third-party script. js.js supports the full range of the JavaScript language, is compatible with major browsers, and is resilient to attacks from malicious scripts. In this post, I'll outline what js.js is, how it works, demonstrate a sample application that uses it, and show results of a performance analysis.]]></description>
			<content:encoded><![CDATA[<p>Early last fall, I started working on a project called <a href="https://github.com/jterrace/js.js">js.js</a> with two other graduate students, <a href="http://www.cs.princeton.edu/~nkatta/">Naga Katta</a> and <a href="http://www.cs.princeton.edu/~sbeard/">Stephen Beard</a>. We started using a public Github repository from the start, and at the beginning of January, the author of <a href="https://github.com/mrdoob/three.js/">three.js</a> found and tweeted <a href="https://twitter.com/#!/mrdoob/statuses/153890650083958784">a link</a> to our repository to his many followers. Soon after, a post wound up on <a href="https://news.ycombinator.com/item?id=3417658">Hacker News</a>. Unfortunately, we weren&#8217;t very far along in the project yet, and weren&#8217;t really ready to show anyone our work, so there was a lot of confusion and negative criticism.</p>
<p>We&#8217;ve reached the point where js.js performs reasonably well and has a decent amount of functionality implemented. In this post, I&#8217;ll outline what js.js is, how it works, demonstrate a sample application that uses it, and show results of a performance analysis.</p>
<p>js.js is a JavaScript interpreter (which runs in JavaScript) that allows an application to execute a third-party script inside a completely isolated, sandboxed environment. An application can, at runtime, create and interact with the objects, properties, and methods available from within the sandboxed  environment, giving it complete control over the third-party script. js.js supports the full range of the JavaScript language, is compatible with major browsers, and is resilient to attacks from malicious scripts.</p>
<p>Our initial prototype implementation of the js.js runtime has been created by compiling the <a href="https://developer.mozilla.org/en/SpiderMonkey">SpiderMonkey</a> JavaScript interpreter to <a href="http://llvm.org/">LLVM</a> bytecode using the <a href="http://clang.llvm.org/">Clang</a> compiler and then using <a href="https://github.com/kripken/emscripten">Emscripten</a> to translate the LLVM bytecode to JavaScript.</p>
<h2>Emscripten</h2>
<p>Emscripten, a project by <a href="http://mozakai.blogspot.com/">Alon Zakai</a>, is an LLVM-to-JavaScript compiler. It takes LLVM bitcode (compiled with an LLVM frontend like Clang) and compiles that into JavaScript, which can be run on the web. There is some great technical documentation on how this works on the <a href="https://github.com/kripken/emscripten/wiki">Emscripten wiki</a>, so I won&#8217;t go into too much detail, but I&#8217;ll give an example of this process.</p>
<p>Here&#8217;s a simple C++ functions that calculates a Fibonacci number:</p>
<pre class="brush: cpp; title: ; notranslate">
int fibonacci(unsigned int n) {
  if (n==0 || n==1) {
    return n;
  }
  unsigned int prev2 = 0, prev1 = 1, fib = 1, i;
  for (i=2; i&lt;=n; i++) {
    fib = prev1 + prev2;
    prev2 = prev1;
    prev1 = fib;
  }
  return fib;
}
</pre>
<p>After compiling this with Clang, below is the LLVM bitcode:</p>
<pre class="brush: plain; title: ; notranslate">
define i32 @fibonacci(i32 %n) nounwind readnone {
  %1 = icmp ult i32 %n, 2
  br i1 %1, label %.loopexit, label %.lr.ph

.lr.ph:                                           ; preds = %.lr.ph, %0
  %i.04 = phi i32 [ %3, %.lr.ph ], [ 2, %0 ]
  %prev2.03 = phi i32 [ %fib.02, %.lr.ph ], [ 0, %0 ]
  %fib.02 = phi i32 [ %2, %.lr.ph ], [ 1, %0 ]
  %2 = add i32 %prev2.03, %fib.02
  %3 = add i32 %i.04, 1
  %4 = icmp ugt i32 %3, %n
  br i1 %4, label %.loopexit, label %.lr.ph

.loopexit:                                        ; preds = %.lr.ph, %0
  %.0 = phi i32 [ %n, %0 ], [ %2, %.lr.ph ]
  ret i32 %.0
}
</pre>
<p>This is where Emscripten comes in. It takes this LLVM bitcode and generates JavaScript instructions. Rather than trying to emulate the LLVM, it actually translates operations that it can into their JavaScript equivalent. Here&#8217;s what it looks like after translation:</p>
<pre class="brush: jscript; title: ; notranslate">
Module._fibonacci = (function (a) {
    var b = 2 &gt; a;
    a: do {
        if (b) {
            var d = a
        } else {
            for (var c = 1, e = 0, f = 2;;) {
                var k = e + c,
                    f = f + 1;
                if (f &gt; a) {
                    d = k;
                    break a
                }
                e = c;
                c = k
            }
        }
    } while (0);
    return d
});
</pre>
<p>To see this working in actions, <a href="http://jsfiddle.net/6f759/">here&#8217;s a jsfiddle</a> showing this function calculating the 20th number in the Fibonacci sequence. Notice an important thing happening in this translation: the LLVM bitcode is not getting emulated. It has actually translated addition, assignment, and comparison operators into their equivalent JavaScript form.</p>
<h2>js.js</h2>
<p>To create js.js, we ran Emscripten on SpiderMonkey, the JavaScript engine used in Firefox. SpiderMonkey comprises about 300,000 lines of C and C++ code. Much of our effort was spent patching SpiderMonkey to get it to compile in Emscripten&#8217;s environment, a limited subset of libc. We also had to disable all assembly routines and just-in-time (JIT) compiling features of SpiderMonkey, since assembler is not available in JavaScript.</p>
<p>Once we had the SpiderMonkey API available, we wrote a wrapper script that makes it much easier to use the library from JavaScript. The js.js API wrapper is about 1000 lines of JavaScript and allows you to create sandboxed environments and execute code in them. The following is an example that shows how to use the API to run <code>1+1</code> in a sandbox and get the result as a number:</p>
<pre class="brush: jscript; title: ; notranslate">
var src = &quot;1 + 1&quot;;
var jsObjs = JSJS.Init();
var compiledObj = JSJS.CompileScript(jsObjs.cx, jsObjs.glob,
                                     src);
var rval = JSJS.ExecuteScript(jsObjs.cx, jsObjs.glob,
                              compiledObj);
d = JSJS.ValueToNumber(jsObjs.cx, rval);
</pre>
<p>After executing, the value of <code>d</code> will be <code>2</code>.</p>
<h2>Performance Evaluation</h2>
<p>We wanted to quantify the performance overhead at the microbenchmark level and the macrobenchmark level. For the former, we timed each of the js.js functions, and for the latter, we used the <a href="http://www.webkit.org/perf/sunspider/sunspider.html">SunSpider</a> JavaScript benchmark.</p>
<h3>Microbenchmark</h3>
<p>To quantify the performance of each the js.js API function, the following table shows the mean across 10 runs of execution time (in milliseconds) of each function called by the above <code>1+1</code> example:</p>
<p><center></p>
<table cellspacing="0">
<tr>
<th style="border-top: 1px solid #000000; border-bottom: 1px solid #000000;" align="left">Operation</th>
<th style="border-top: 1px solid #000000; border-bottom: 1px solid #000000;" align="right">Mean Execution Time (ms)</th>
</tr>
<tr>
<td>libjs.min.js load</td>
<td align="right">84.9</td>
</tr>
<tr>
<td>NewRuntime</td>
<td align="right">25.2</td>
</tr>
<tr>
<td>NewContext</td>
<td align="right">35.8</td>
</tr>
<tr>
<td>GlobalClassInit</td>
<td align="right">15.5</td>
</tr>
<tr>
<td>StandardClassesInit</td>
<td align="right">60.1</td>
</tr>
<tr>
<td>Execute 1+1</td>
<td align="right">70.6</td>
</tr>
<tr>
<td>DestroyContext</td>
<td align="right">33.3</td>
</tr>
<tr>
<td style="border-bottom: 1px solid #000000;">DestroyRuntime</td>
<td style="border-bottom: 1px solid #000000;" align="right">1.8</td>
</tr>
</table>
<p></center></p>
<p>The execution time here isn&#8217;t great, but it&#8217;s within the range that the library is usable. It takes about 220ms of setup time to get an execution environment up and running.</p>
<h3>Macrobenchmark</h3>
<p>We also wanted to compare the performance of js.js with native JavaScript execution. We took the SunSpider JavaScript benchmark and ran it natively using the SpiderMonkey js shell (with the JIT turned off). We then ran the benchmarks again using js.js. The following graph shows the median factor of slowdown for running each benchmark inside js.js in both Firefox and Chrome:</p>
<p><a href="http://sns.cs.princeton.edu/wp-content/uploads/2012/04/combined_times.png"><img src="http://sns.cs.princeton.edu/wp-content/uploads/2012/04/combined_times_thumb.png" alt="js.js Sunspider Benchmark" title="js.js Sunspider Benchmark" /></a></p>
<hr/>
<p>This benchmark shows that on average, running code inside js.js is about 200 times slower than native execution. Considering that JavaScript is being run instead of native x86, two orders of magnitude is not terrible. Depending on the scripts being executed in the sandbox, the performance overhead might be acceptable. We&#8217;ve also looked into where most of the execution time is going, and we found that the interpreter loop is being converted into a single JavaScript function that is thousands of lines long. JIT compilers don&#8217;t handle this case very well, so we&#8217;ve been brainstorming ideas on how to break up this interpreter loop into separate functions, that could help improve performance.</p>
<h2>Demo</h2>
<p>As a demo, we took the JavaScript used to render the <a href="https://twitter.com/about/resources/buttons">Twitter button</a> and ran it inside js.js, giving the script virtual access to a DOM. This allows us to run the script, while maintaining complete control over what the sandbox can do to the real DOM.</p>
<p><center><big><a href="http://jterrace.github.com/js.js/twitter/">Live js.js Twitter Demo</a></big></center><br />
<center><big><a href="http://jterrace.github.com/js.js">More Demos</a></big></center></p>
<h2>Conclusion</h2>
<p>This project has been a lot of fun to work, and we recently had a demo paper about js.js accepted to <a href="http://static.usenix.org/events/webapps12/">WebApps 2012</a>. For more details, check out <a href="http://www.cs.princeton.edu/~jterrace/docs/jsjs.pdf">the js.js paper</a>. The <a href="https://github.com/jterrace/js.js">source code for js.js</a> is available on GitHub, so if you&#8217;re interested in the project, you&#8217;re welcome to fork it and try it out. Pull requests accepted!</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=m4mtfC9Ckfw:PZbpQMDEBqE:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=m4mtfC9Ckfw:PZbpQMDEBqE:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=m4mtfC9Ckfw:PZbpQMDEBqE:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=m4mtfC9Ckfw:PZbpQMDEBqE:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=m4mtfC9Ckfw:PZbpQMDEBqE:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=m4mtfC9Ckfw:PZbpQMDEBqE:F7zBnMyn0Lo" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/PrincetonSNS/~4/m4mtfC9Ckfw" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sns.cs.princeton.edu/2012/04/javascript-in-javascript-js-js-sandboxing-third-party-scripts/feed/</wfw:commentRss>
		<slash:comments>37</slash:comments>
		<feedburner:origLink>http://sns.cs.princeton.edu/2012/04/javascript-in-javascript-js-js-sandboxing-third-party-scripts/</feedburner:origLink></item>
		<item>
		<title>Princeton CS hiring again this year</title>
		<link>http://feedproxy.google.com/~r/PrincetonSNS/~3/LwgHyBZ-C7Y/</link>
		<comments>http://sns.cs.princeton.edu/2011/11/princeton-cs-hiring-again-this-year/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 22:00:40 +0000</pubDate>
		<dc:creator>Mike Freedman</dc:creator>
				<category><![CDATA[Princeton]]></category>

		<guid isPermaLink="false">http://sns.cs.princeton.edu/?p=1814</guid>
		<description><![CDATA[This year saw some great faculty additions to the CS department: Mark Braverman in theoretical computer science, Zeev Dvir in TCS and Math, Rebecca Fiebrink in computational music and HCI (hired in 2010), and David Wentzlaff in the EE department, although his work includes the very CS-like topics of computer architecture and operating systems. This [...]]]></description>
			<content:encoded><![CDATA[<p>This year saw some great faculty additions to the CS department:</p>
<ul>
<li><a href="http://www.cs.princeton.edu/~mbraverm/">Mark Braverman</a> in theoretical computer science,</li>
<li><a href="http://www.cs.princeton.edu/~zdvir/">Zeev Dvir</a> in TCS and Math,</li>
<li><a href="http://www.cs.princeton.edu/~fiebrink/">Rebecca Fiebrink</a> in computational music and HCI (hired in 2010), and</li>
<li><a href="http://www.princeton.edu/~wentzlaf/">David Wentzlaff</a> in the EE department, although his work includes the very CS-like topics of computer architecture and operating systems.</li>
</ul>
<p>This coming spring, we&#8217;re happy to be interviewing again at the Assistant Professor level. Rather than limit the search to any particular areas of focus, we&#8217;re looking for top applicants across all areas of computer science.</p>
<p><a style="display:none;" id="te1532438613" href="javascript:expand('#te1532438613')">Announcement</a>
<div class="te_div" id="te1532438613"><script language="JavaScript" type="text/javascript">expander_hide('#te1532438613');</script></p>
<blockquote><p><strong>Tenure-Track Positions.</strong></p>
<p>The Department of Computer Science at Princeton University invites applications for faculty positions at the Assistant Professor level. We are accepting applications in all areas of Computer Science.</p>
<p>Applicants must demonstrate superior research and scholarship potential as well as teaching ability. A PhD in Computer Science or a related area is required. Successful candidates are expected to pursue an active research program and to contribute significantly to the teaching programs of the department. Applicants should include a CV and contact information for at least three people who can comment on the applicant&#8217;s professional qualifications.</p>
<p>There is no deadline, but review of applications will start in December 2011. Princeton University is an equal opportunity employer and complies with applicable EEO and affirmative action regulations. You may apply online at: <a href="http://jobs.cs.princeton.edu/">http://jobs.cs.princeton.edu/</a>.</p>
<p>Requisition Number: 0110422</p></blockquote>
<p></div></p>
<p>Like at many schools, the application cycle has shifted earlier a bit compared to previous years, so applicants are recommended to apply earlier rather than later. The <a href="href=">website</a> is already open.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=LwgHyBZ-C7Y:Wb7bVR2orpM:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=LwgHyBZ-C7Y:Wb7bVR2orpM:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=LwgHyBZ-C7Y:Wb7bVR2orpM:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=LwgHyBZ-C7Y:Wb7bVR2orpM:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=LwgHyBZ-C7Y:Wb7bVR2orpM:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=LwgHyBZ-C7Y:Wb7bVR2orpM:F7zBnMyn0Lo" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/PrincetonSNS/~4/LwgHyBZ-C7Y" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sns.cs.princeton.edu/2011/11/princeton-cs-hiring-again-this-year/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sns.cs.princeton.edu/2011/11/princeton-cs-hiring-again-this-year/</feedburner:origLink></item>
		<item>
		<title>Should We Extend Conference Q&amp;A With Written Responses?</title>
		<link>http://feedproxy.google.com/~r/PrincetonSNS/~3/PXiLoHxVdS0/</link>
		<comments>http://sns.cs.princeton.edu/2011/11/should-we-extend-conference-qa-with-written-responses/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 16:35:08 +0000</pubDate>
		<dc:creator>wlloyd</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Extended Q&A]]></category>
		<category><![CDATA[COPS]]></category>
		<category><![CDATA[Q&A]]></category>
		<category><![CDATA[SOSP]]></category>

		<guid isPermaLink="false">http://sns.cs.princeton.edu/?p=1751</guid>
		<description><![CDATA[The CS community recently discussed extending the Q&#38;A session that occurs after each talk at a conference into a more formal written Q&#38;A.  More specifically, this was raised during the business meeting at SOSP and the proposal was to publish the results in SIGOPS OSR.  The idea was this written extension to Q&#38;A could really [...]]]></description>
			<content:encoded><![CDATA[<p>The CS community recently discussed extending the Q&amp;A session that occurs after each talk at a conference into a more formal written Q&amp;A.  More specifically, this was raised during the business meeting at SOSP and the proposal was to publish the results in SIGOPS OSR.  The idea was this written extension to Q&amp;A could really get to the bottom of the issues raised, and it wouldn&#8217;t let speakers avoid questions by saying, &#8220;Let&#8217;s take that offline.&#8221;  There was some push back against this with arguments like &#8220;most questions are just misunderstandings&#8221; and &#8220;that will add a lot of pointless work for speakers/authors.&#8221;</p>
<p>In this post I&#8217;ll examine the questions asked at the end of my <a title="COPS Talk from SOSP '11" href="http://www.youtube.com/watch?v=jh9P1moDpAc" target="_blank">SOSP talk</a> on <a title="COPS: Scalable Causal Consistency" href="http://sns.cs.princeton.edu/projects/cops/" target="_blank">COPS</a>.  We&#8217;ll look at a summary of each of the questions asked and my written response, and then hopefully we&#8217;ll be able to conclude if a written Q&amp;A is a good idea or not.  The full transcript of each question with comments and clarification added in square brackets is toggable with the transcript links.</p>
<p>&nbsp;</p>
<h2>Question 1</h2>
<p><a style="display:none;" id="te41579355" href="javascript:expand('#te41579355')">Transcript</a>
<div class="te_div" id="te41579355"><script language="JavaScript" type="text/javascript">expander_hide('#te41579355');</script></p>
<p style="padding-left: 30px;">Question from Hussam Abu-Libdeh (Cornell University)</p>
<p style="padding-left: 30px;"><strong>Hussam:</strong> So I&#8217;m actually a bit confused by how you achieve partition tolerance.  If my operations are going to block until a server that has a dependency that I depend on responds back to me, I can talk to a datacenter perform an operation, that datacenter gets partitioned away, I talk to another datacenter but that datacenter didn&#8217;t see any of the operations that I depend on so I&#8217;m going to block.</p>
<p style="padding-left: 30px;"><strong>Wyatt:</strong> Sure, so I think the question, if I can paraphrase the question, is that you have multiple datacenters, it&#8217;s possible you depend on something that&#8217;s being replicated from one datacenter, that datacenter gets partitioned away, and then your updates aren&#8217;t going to show up to a third datacenter until these updates are propagated from the now partitioned datacenter. Is that correct?</p>
<p style="padding-left: 30px;"><strong>Hussam:</strong> Yeah, and I&#8217;m blocking meanwhile.</p>
<p style="padding-left: 30px;"><strong>Wyatt:</strong> You&#8217;re not blocking anywhere.  These operations won&#8217;t show up right away. Causal consistency doesn&#8217;t say, &#8220;I see thing right away.&#8221; It&#8217;s not strong consistency like that.  You&#8217;ll still get to see consistent values, they just won&#8217;t be super up-to-date.</p>
<p style="padding-left: 30px;"><strong>Hussam:</strong> I&#8217;ll take it offline.</div></p>
<p><strong>Question Summary:</strong>  The question could be interpreted two ways, so we&#8217;ll look at both.</p>
<p><strong>Interpretation A:</strong> &#8220;What happens if a client is partitioned from the datacenter they are accessing?&#8221; (Note: Much of the feedback and questions after the talk were questions like A, so I think this is what Hussam meant.)</p>
<p><strong>Written Answer:</strong> The clients of our system are the web servers collocated in the datacenter with the storage cluster, so they won&#8217;t be partitioned.  What you are really asking about are not the direct clients of the storage system, but the human who is a client of a web browser who is a client of a web server who is a client of the storage system.  Our system doesn&#8217;t provide consistency directly for those clients three levels away, but we think it&#8217;s an important and interesting problem, and we&#8217;re actively thinking about it.</p>
<p><strong>Interpretation B:</strong> &#8220;What happens if a datacenter that is replicating data you depend on is partitioned?&#8221; (This is what I interpreted Hussam to mean at the time.)</p>
<p><strong>Written Answer:</strong> No operations will ever block, but your new put operations won&#8217;t show up in other datacenters until their dependencies have shown up in that datacenter.  So there is no blocking, but this comes at the cost of not guaranteeing your updates show up everywhere immediately.</p>
<p>&nbsp;</p>
<h2>Question 2</h2>
<p><a style="display:none;" id="te1044215777" href="javascript:expand('#te1044215777')">Transcript</a>
<div class="te_div" id="te1044215777"><script language="JavaScript" type="text/javascript">expander_hide('#te1044215777');</script></p>
<p style="padding-left: 30px;">Question from Maysam Yabandeh (Yahoo! Research Spain)</p>
<p style="padding-left: 30px;"><strong>Maysam:</strong> Let&#8217;s put details, implementations, and your wide-area setting aside.  From an abstract point of view I see lots of similarities between your model [causal+ consistency] and snapshot isolation.  First, both of you might maintain multiple versions of data.  Second, both of you talk about snapshots.  And third, both of you try to detect and avoid write-write conflicts.  I wonder about the differences.</p>
<p style="padding-left: 30px;">[Note: COPS does <em>not</em> avoid write-write conflicts.  We only allow single key put operations, so we can only have write-write conflicts between two put operations. These <em>can</em> happen and are then either resolved by the last-writer-wins rule or the convergent conflict handler function.]</p>
<p style="padding-left: 30px;"><strong>Wyatt:</strong> Are you asking me about the difference between this [causal+ consistency] and what Jinyang just talked about, PSI, or just Snapshot Isolation in general?</p>
<p style="padding-left: 30px;"><strong>Maysam:</strong> In general.</p>
<p style="padding-left: 30px;"><strong>Wyatt:</strong> In general, snapshot isolation is sort of a database property, so it&#8217;s a stronger consistency then what you get [with causal+].  Snapshot isolation you can do these transaction that have reads and writes and things like that. We don&#8217;t have that in our system.  What we have in our system is we&#8217;re guaranteeing you low latency.  Things will always complete right away, very quickly, no matter what.</p>
<p style="padding-left: 30px;"><strong>Maysam:</strong> But look at this from an abstract point of view. I want to compare causal+ from an abstract point of view to snapshot isolation.</p>
<p style="padding-left: 30px;"><strong>Wyatt:</strong> This is sort of tricky.  In the last talk Jinyang had this spectrum of consistency models. What she was showing you was more from the database side, where you have these transactions that involve multiple keys at the same time, and multiple updates, and multiple operations and things like that.  And we&#8217;re more from the shared memory side or something like that, where all of these things involve one operation at a time.  So how exactly they interact, it&#8217;s a very complex graph of how these consistency models interact.  I would say Snapshot Isolation is definitely a stronger property than what we provide, but we do so with better performance characteristics.</p>
<p style="padding-left: 30px;"><strong>Maysam:</strong> But you talk about write-write conflicts.  Write-write conflict make sense if you have write conflicts between two transactions.  You didn&#8217;t call it transactions, but I guess in the paper you call it context or something like that.  You didn&#8217;t call it transactions but you call it context or something like that.  You give it a different name.  But still it is kind of context, but it is kind of transactions.</p>
<p style="padding-left: 30px;">[Maysam is confused here, the context we describe in the paper is part of the client API for identifying different clients, it has nothing to do with transactions.]</p>
<p style="padding-left: 30px;"><strong>Wyatt:</strong> So we only have read transactions.  You can only read multiple values in a transaction.</p>
<p style="padding-left: 30px;"><strong>Maysam:</strong> So when you talk about write-write conflicts, is it between [trailed off]</p>
<p style="padding-left: 30px;"><strong>Wyatt:</strong> Write-write conflicts?  We can have write-write conflicts in our system, but we have to use the last writer wins rule, or we have to use some sort of application specific function that is going to resolve these conflicts for us.</p>
<p style="padding-left: 30px;">[Again, write-write conflicts are only for two puts to the same key.  There are not general transactions in COPS.]</p>
<p style="padding-left: 30px;"><strong>Maysam:</strong> But, to have write-write conflicts, you first need to [cut off]</p>
<p style="padding-left: 30px;"><strong>Ant Rowstron (Session Chair):</strong> I think we need to take this offline and head onto the next question.</div></p>
<p><strong>Question Summary:</strong> What are the differences between snapshot isolation and causal+ consistency?</p>
<p><strong>Question Answer:</strong> Causal+ consistency deals with single key put operations and single or multi key get operations.  Snapshot isolation is stronger that causal+ because it deals with general transactions that can include many different put and get operations.  In addition, snapshot isolation ensures there are never conflicting transactions in the system (avoids write-write conflicts). While causal+ doesn&#8217;t have the notion of a transaction, but does allow and then resolves conflicting writes to the same key (embraces single key write-write conflicts).</p>
<p>&nbsp;</p>
<h2>Question 3</h2>
<p><a style="display:none;" id="te1912029269" href="javascript:expand('#te1912029269')">Transcript</a>
<div class="te_div" id="te1912029269"><script language="JavaScript" type="text/javascript">expander_hide('#te1912029269');</script></p>
<p style="padding-left: 30px;">Question from Marcos Aguilera (Microsoft Research Silicon Valley)</p>
<p style="padding-left: 30px;"><strong>Marcos:</strong> You made a case that gets are not enough therefore you need get transactions.  [Wyatt says "yes"].  The previous person was asking about other types of transactions.  You could also make the argument that puts are not enough and you need put transactions.  In fact, you need more general transactions.  And you mentioned that you have more the perspective of a shared-memory system, but there we have transactional memory as well.  And so, I&#8217;m wondering without general transactions isn&#8217;t that the same thing as trying to go to war with rocks and stones when you have machine guns available, which is what general transactions are.</p>
<p style="padding-left: 30px;"><strong>Wyatt:</strong> I would agree with the first half of what you said and strongly disagree with the second half.  So I think put transactions are important and it&#8217;s something that I&#8217;m thinking about.  What else did you say? General transactions.  My view of your work in the previous paper and this work is that they&#8217;re sort of complementary approaches.  Like, we really want to have low latency, we say operations must be really really fast.  In your work, you say, &#8220;We have to have these transactions.  We have to avoid write-write conflicts.&#8221;  I think there&#8217;s places for both of these and I think ultimately you&#8217;d have some sort of system that would join the two.  And I don&#8217;t think this is like using rocks, this is like using something that you know is going to be really fast.  I&#8217;m never going to have to do that slow 2PC across the wide area [unlike in walter].</div></p>
<p><strong>Question Summary:</strong>  Can you compare COPS and Walter? (Walter was the system described in the previous talk, one of whose authors asked this question.)</p>
<p><strong>Question Answer:</strong> The two systems provide complementary approaches.  COPS guarantees successful low latency operations at the cost of not providing general transactions.  Walter guarantees conflict-free general transaction at the cost of allowing transactions to abort and (sometimes) having to do wide-area locking via two phase commit, which is directly incompatible with low latency.</p>
<p><strong><br />
</strong></p>
<h2>Question 4</h2>
<p><a style="display:none;" id="te639151758" href="javascript:expand('#te639151758')">Transcript</a>
<div class="te_div" id="te639151758"><script language="JavaScript" type="text/javascript">expander_hide('#te639151758');</script></p>
<p style="padding-left: 30px;">Question from Marc Shapiro (INRIA)</p>
<p style="padding-left: 30px;"><strong>Ant Rowstrom (Session Chair):</strong>  Can we keep the last two questions very short.  Marc, is it a question?</p>
<p style="padding-left: 30px;"><strong>Marc:</strong> A comment and a question.</p>
<p style="padding-left: 30px;"><strong>Ant:</strong> Can we have the question?</p>
<p style="padding-left: 30px;"><strong>Marc:</strong> The comment is, I think your causal+ property is much too strong, you can get exactly the same results with something a lot simpler.  But we can take that offline. The question is, you said explicit dependency tracking is novel. It&#8217;s been around for a long time. It&#8217;s been beaten to death.</p>
<p style="padding-left: 30px;"><strong>Wyatt:</strong> No, no, no.  I didn&#8217;t mean to say explicit dependency tracking itself is a novel technique.  Doing this is conjunction with decentralized replication is a new technique.</p>
<p style="padding-left: 30px;">[Note: The slide read, "Novel Techniques: Explicit dependency tracking and verification <em>with</em> decentralized replication. ..."]</p>
<p style="padding-left: 30px;"><strong>Marc:</strong> The question is, vector clocks were invented because explicit dependency tracking is complicated and slow.  So I&#8217;m really puzzled why didn&#8217;t you just use vector clocks.</p>
<p style="padding-left: 30px;">[Note: I misunderstood Marc's question here, see the response to what he was asking in the "written answer" below.  I thought he wanted to know why do we use lamport timestamps (small fixed size) to establish a causal order instead of (much larger) vector clocks that give a more precise order.]</p>
<p style="padding-left: 30px;"><strong>Wyatt:</strong> So we don&#8217;t use vector clocks because we&#8217;re talking about really big systems.  And when we have this really big system, like let&#8217;s say I have a thousand nodes, then I&#8217;m going to have a vector clock with a thousand entries in it.  [Marc (while Wyatt is still speaking): Yeah but there are compressed versions of that.]  So it&#8217;s going to be huge compared to the small amount of metadata we&#8217;re propagating around normally.</p>
<p style="padding-left: 30px;"><strong>Marc:</strong> That&#8217;s been beaten to death. </div></p>
<p><strong>Question Summary:</strong> Why not use vector clocks instead of explicit dependencies to capture causality?</p>
<p><strong>Written Answer:</strong> We use explicit dependencies because they are compatible with distributed verification, whereas vector clocks are not. They would need a centralized serialization point in each datacenter to ensure that updates from other datacenters are applied in the correct causal order.</p>
<p>&nbsp;</p>
<h2>Question 5</h2>
<p><a style="display:none;" id="te1367824550" href="javascript:expand('#te1367824550')">Transcript</a>
<div class="te_div" id="te1367824550"><script language="JavaScript" type="text/javascript">expander_hide('#te1367824550');</script></p>
<p style="padding-left: 30px;"><strong>Ant Rowstrom(Session Chair):</strong> Okay, let&#8217;s go for the last one.  Is it quick?</p>
<p style="padding-left: 30px;">Question from Unidentified, un for short.</p>
<p style="padding-left: 30px;"><strong>Un:</strong> Where is the metadata stored physically?</p>
<p style="padding-left: 30px;"><strong>Wyatt:</strong> Metadata? It&#8217;s physically stored both on servers and in the client library.</p>
<p style="padding-left: 30px;"><strong>Un:</strong> On the servers where?  Like in memory, or &#8230; My question is actually, &#8220;How do you deal with corruption of metadata or failures on the side?&#8221;</p>
<p style="padding-left: 30px;"><strong>Wyatt:</strong> So failures inside a datacenter.  We looked at this like, this is not what our main contribution is.  And we took existing techniques like chain replication, that give you this strong consistency, that give you this fault tolerance inside these datacenters.  We said we&#8217;ll just build on top of that, that&#8217;s not where our contribution is.  And in terms of dealing with bit flips, you&#8217;d probably want checksums in your system. I think Amazon came out with that, &#8220;we really want that, it screwed things up awhile ago.&#8221;</p>
<p style="padding-left: 30px;"><strong>Un:</strong> Thanks. </div></p>
<p><strong>Question Summary:</strong> How do you deal with different types of failures?</p>
<p><strong>Written Answer:</strong> That&#8217;s not where our innovation is, so we just used existing techniques to deal with failures (currently, chain replication).</p>
<p>&nbsp;</p>
<h2>Discussion</h2>
<p>In reviewing the questions, it seem pretty clear that almost all questions stem from confusion surrounding parts of the system that were gone over quickly or skipped in the talk.  These are good questions to have immediately after a talk, other people in the audience are probably confused about the same things.  However, the questions only make sense with the context from either the talk or the paper and almost all of them would be clarified by reading the paper.</p>
<p>So let&#8217;s break down the potential audience for the extended answers:</p>
<p>1) OSR readers who didn&#8217;t see or don&#8217;t remember the talk and didn&#8217;t read the paper.  The questions and answer wouldn&#8217;t make any sense to these people.</p>
<p>2) OSR readers who saw the talk, didn&#8217;t read the paper, and remember the talk over a month later.  Based on how much I remember from talks I saw a month ago, I don&#8217;t think this will be a very populous group.</p>
<p>3) OSR readers who read the paper. The paper should cover everything that was asked about, so the extra written answers should be unnecessary.  (E.g., Section 2/Fig 1 answer question 1, Related Work answers question 3)</p>
<p>4) People who watched the talk on youtube.  This audience is relatively large, the <a href="http://www.youtube.com/watch?v=jh9P1moDpAc">video of the talk</a> has 224 view after being up for about a week.  They have exactly the same context as IRL audience members, and I know they have some of the same questions.  For instance, <a href="http://toddhoff.com">Todd Hoff</a>, who wrote a <a href="http://highscalability.com/blog/2011/11/23/paper-dont-settle-for-eventual-scalable-causal-consistency-f.html">post about COPS on his high scalability blog</a>, also thought of question 5: why not use vector clocks?  Given I misinterpreted the question at the time, it&#8217;s good to have a correct answer here!</p>
<p>So while the audience for written answers in OSR would be tiny, I think there is an audience for more detailed answers to questions: youtube viewers!  I&#8217;m now all for written answers to questions, but I think that a blog, like this, is the appropriate venue for publishing them and not OSR!</p>
<p>&nbsp;</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=PXiLoHxVdS0:jYSVc8NKIZA:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=PXiLoHxVdS0:jYSVc8NKIZA:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=PXiLoHxVdS0:jYSVc8NKIZA:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=PXiLoHxVdS0:jYSVc8NKIZA:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=PXiLoHxVdS0:jYSVc8NKIZA:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=PXiLoHxVdS0:jYSVc8NKIZA:F7zBnMyn0Lo" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/PrincetonSNS/~4/PXiLoHxVdS0" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sns.cs.princeton.edu/2011/11/should-we-extend-conference-qa-with-written-responses/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		<feedburner:origLink>http://sns.cs.princeton.edu/2011/11/should-we-extend-conference-qa-with-written-responses/</feedburner:origLink></item>
		<item>
		<title>Masquerading ACM Authorizer Links</title>
		<link>http://feedproxy.google.com/~r/PrincetonSNS/~3/wesqqLrhMPY/</link>
		<comments>http://sns.cs.princeton.edu/2011/10/masquerading-acm-authorizer-links/#comments</comments>
		<pubDate>Wed, 26 Oct 2011 11:27:04 +0000</pubDate>
		<dc:creator>Jeff Terrace</dc:creator>
				<category><![CDATA[Tricks]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://sns.cs.princeton.edu/?p=1714</guid>
		<description><![CDATA[ACM recently released the ACM Authorizer service. It allows authors of ACM papers to link to ACM&#8217;s digital library from their website. Users who click the link will receive access to the author&#8217;s paper free of charge. The main concerns I had about changing my website to link to this new service are: If ACM&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>ACM recently released the <a href="http://www.acm.org/publications/acm-author-izer-service">ACM Authorizer</a> service. It allows authors of ACM papers to link to ACM&#8217;s digital library from their website. Users who click the link will receive access to the author&#8217;s paper free of charge.</p>
<p>The main concerns I had about changing my website to link to this new service are:</p>
<ol>
<li>If ACM&#8217;s website is down, my paper is no longer accessible</li>
<li>An independent copy of my paper&#8217;s PDF would not longer be linked in Google&#8217;s search index</li>
</ol>
<p>There is an easy workaround to this problem. You can keep your markup linking to a local PDF of your paper and use javascript to rewrite the link.</p>
<p>What I did was change my paper&#8217;s link from this:</p>
<pre class="brush: xml; gutter: false; title: ; notranslate">
&lt;a href=&quot;docs/feeding-frenzy-sigmod10-web.pdf&quot;&gt;PDF&lt;/a&gt;
</pre>
<p>to this:</p>
<pre class="brush: xml; gutter: false; title: ; notranslate">
&lt;a class=&quot;acm_authorizer&quot; id=&quot;acm_350703&quot;
  href=&quot;docs/feeding-frenzy-sigmod10-web.pdf&quot;&gt;PDF&lt;/a&gt;
</pre>
<p>I chose to use a class for my ACM links because a class-based lookup is very fast. I put the ACM digital library identifier into the <em>id</em> attribute of the element, but it has a prefix because HTML id&#8217;s are not allowed to start with a number (so my HTML still validates).</p>
<p>Now, it&#8217;s easy to add some javascript to rewrite the link on page load:</p>
<pre class="brush: jscript; gutter: false; html-script: true; title: ; notranslate">
&lt;script src=&quot;http://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js&quot;
  type=&quot;text/javascript&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot;&gt;
$(document).ready(function() {
  $('a.acm_authorizer').each(function() {
    $(this).attr('href', 'http://dl.acm.org/authorize?' +
                         $(this).attr('id').substr(4));
  });
});
&lt;/script&gt;
</pre>
<p>The first script tag loads jQuery from google&#8217;s CDN. The second script tag finds all <em>a</em> elements with a CSS class of &#8220;acm_authorizer&#8221; and changes their <em>href</em> attribute to point to the ACM digital library &#8212; getting the ACM ID from the <em>id</em> attribute of the element.</p>
<p>With this, users who visit the site with JavaScript enabled will be directed to ACM, but the markup of the page points to a copy of the paper on my site, keeping it in the index of search engines. You can see this in action at <a href="http://www.cs.princeton.edu/~jterrace/">my website</a>.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=wesqqLrhMPY:GXTEvC8QaY4:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=wesqqLrhMPY:GXTEvC8QaY4:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=wesqqLrhMPY:GXTEvC8QaY4:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=wesqqLrhMPY:GXTEvC8QaY4:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=wesqqLrhMPY:GXTEvC8QaY4:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=wesqqLrhMPY:GXTEvC8QaY4:F7zBnMyn0Lo" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/PrincetonSNS/~4/wesqqLrhMPY" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sns.cs.princeton.edu/2011/10/masquerading-acm-authorizer-links/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		<feedburner:origLink>http://sns.cs.princeton.edu/2011/10/masquerading-acm-authorizer-links/</feedburner:origLink></item>
		<item>
		<title>Conference Presentation Faux Pas</title>
		<link>http://feedproxy.google.com/~r/PrincetonSNS/~3/YcZiwOCYZPQ/</link>
		<comments>http://sns.cs.princeton.edu/2010/10/conference-presentation-faux-pas/#comments</comments>
		<pubDate>Wed, 20 Oct 2010 22:10:30 +0000</pubDate>
		<dc:creator>Jeff Terrace</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[osdi]]></category>
		<category><![CDATA[powerpoint]]></category>
		<category><![CDATA[presentations]]></category>
		<category><![CDATA[slides]]></category>

		<guid isPermaLink="false">http://sns.cs.princeton.edu/?p=1269</guid>
		<description><![CDATA[I recently attended OSDI 2010 where I sat through about 30 presentations on systems-related topics. I was surprised that there were so many occurrences of things that I think should be avoided when giving a presentation. In this post, I&#8217;m going to outline several things that, in my opinion, you should avoid when giving a [...]]]></description>
			<content:encoded><![CDATA[<p>I recently attended <a href="http://www.usenix.org/events/osdi10/index.html">OSDI 2010</a> where I sat through about 30 presentations on systems-related topics. I was surprised that there were so many occurrences of things that I think should be avoided when giving a presentation.</p>
<p>In this post, I&#8217;m going to outline several things that, in my opinion, you should avoid when giving a talk at a conference. Keep in mind that this is just my opinion and you&#8217;re welcome to disagree, but I think you&#8217;re wrong and I&#8217;ll explain why.</p>
<p>Note that the examples below are all taken from the OSDI slides that were posted after the conference. If you&#8217;re an author of one of these presentations, please don&#8217;t take it as criticism of your work or you. The criticism is only aimed at your stylistic choices when creating your presentation.</p>
<p><br/></p>
<h2>Dark Backgrounds</h2>
<p><img style="border: 1px solid #000; padding: 0;" src="http://sns.cs.princeton.edu/wp-content/uploads/2010/10/1_dark_bg.png" alt="Dark Background" title="Dark Background" width="500" height="375" class="aligncenter size-full wp-image-1278" /></p>
<hr/>
This is probably the worst decision you can make. Now it&#8217;s definitely true that dark backgrounds can look nice. In fact, they can look particularly good on the screen where you created the presentation, which is why I think it&#8217;s an attractive choice for a lot of people.</p>
<p>The problem with this is two-fold. First, a more minor reason, is that it&#8217;s difficult to keep things consistent when using a dark background. Often you include things like images, tables, diagrams, and graphs. Many of these will have white backgrounds, so to get them to fit with your presentation, it&#8217;s extra work that you have to do.</p>
<p>Second, the major reason not to use a dark background is that it makes your slides much harder to read when displayed on a projector. The way that the human eye perceives blackness is relative to what&#8217;s around it. When you project black text on a white background, the black color of the text looks very black because of the large, bright, patch of white around it. When you project white text on a black background, the background looks washed out because there isn&#8217;t much light around it. This effect is also amplified because projectors can&#8217;t actually display a real black color. Black is the absence of light, so a projector simply doesn&#8217;t project any light at the pixels that are supposed to be black. Since conferences are always in brightly lit rooms, the black background ends up looking like a pale gray.</p>
<p><br/></p>
<h2>Ugly Fonts</h2>
<p><img style="border: 1px solid #000; padding: 0;" src="http://sns.cs.princeton.edu/wp-content/uploads/2010/10/2_bad_font.png" alt="" title="Bad Font" width="500" height="375" class="aligncenter size-full wp-image-1282" /></p>
<hr/>
This presentation uses the Chalkboard font. It turns what should be a technical diagram into what looks like a 4-year-old&#8217;s drawing. It&#8217;s very distracting. Comic Sans also made an appearance at OSDI. Please use reasonable fonts. You can also see the poor choice of a dark background here as well.</p>
<p><br/></p>
<h2>Outline Slides</h2>
<p><img style="border: 1px solid #000; padding: 0;" src="http://sns.cs.princeton.edu/wp-content/uploads/2010/10/3_outline_slide.png" alt="" title="Outline Slide" width="500" height="376" class="aligncenter size-full wp-image-1286" /></p>
<hr/>
I&#8217;ll admit that outline slides can have their place. For example, if you&#8217;re giving an hour-long talk, it can be nice to give an overview to your audience to tell them what you&#8217;re going to cover. When you only have 22 minutes, however, wasting time telling us <i>what</i> you&#8217;re going to talk about instead of just talking about it is silly. In this particular talk, they went back to the outline slide each time they switched sections. Even if it only takes a few seconds each time, you probably wasted a full minute just telling me what you were <i>going</i> to talk about. Especially when you&#8217;re throwing darts at your bullet points. No clue what they were thinking.</p>
<p><br/></p>
<h2>Underlined Text</h2>
<p><img style="border: 1px solid #000; padding: 0;" src="http://sns.cs.princeton.edu/wp-content/uploads/2010/10/4_underlined_text.png" alt="" title="Underlined Text" width="500" height="375" class="aligncenter size-full wp-image-1297" /></p>
<hr/>
This is not a deal breaker, but I still advise against it. Underlined text usually looks pretty ugly, as it does here. Avoid it.</p>
<p><br/></p>
<h2>Header/Footer on Every Slide</h2>
<p><img style="border: 1px solid #000; padding: 0;" src="http://sns.cs.princeton.edu/wp-content/uploads/2010/10/5_header_footer.png" alt="" title="Header Every Slide" width="500" height="375" class="aligncenter size-full wp-image-1299" /><br />
<img style="border: 1px solid #000; padding: 0;" src="http://sns.cs.princeton.edu/wp-content/uploads/2010/10/5_header_footer2.png" alt="" title="Footer Every Slide" width="500" height="375" class="aligncenter size-full wp-image-1300" /></p>
<hr/>
Sometimes these can be tastefully done, but many people abuse it. In the first example, the presentation had a giant bar that says &#8220;Facebook&#8221; on every single slide. Perhaps it&#8217;s company policy; I don&#8217;t know, but it looks obnoxious and I don&#8217;t need to be reminded every single slide that this presentation is from Facebook. In the second example, it even includes the name of the conference in the footer. I often forget which conference I&#8217;m at in the middle of a talk, so I&#8217;m glad it was there. But really, you don&#8217;t need a header and/or footer on every slide. It&#8217;s distracting and unnecessary.</p>
<p><br/></p>
<h2>Walls of Text</h2>
<p><img style="border: 1px solid #000; padding: 0;"  src="http://sns.cs.princeton.edu/wp-content/uploads/2010/10/6_wall_text.png" alt="" title="Wall of Text" width="500" height="375" class="aligncenter size-full wp-image-1303" /><br />
<img style="border: 1px solid #000; padding: 0;"  src="http://sns.cs.princeton.edu/wp-content/uploads/2010/10/6_wall_text2.png" alt="" title="Wall of Text 2" width="500" height="375" class="aligncenter size-full wp-image-1304" /></p>
<hr/>
I think people are wising up to this, but there were still a few cases. If you put too much text on a slide, then one of two things will happen: either people only pay attention to what you say &#8211; making the slide ineffective, or they spend time reading the slide instead of listening to you. Instead, use the text on your slide to support and enhance what you&#8217;re saying. The first example here just has too much text. The second example has a lot of code. Both make it difficult to simultaneously listen to the speaker and read the slides.</p>
<p><br/></p>
<h2>Conclusion</h2>
<p>Again, this was just my opinion, but if you can avoid these things, please do. It will enhance your presentation and make it so people don&#8217;t get distracted.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=YcZiwOCYZPQ:hZI6oi-fdMI:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=YcZiwOCYZPQ:hZI6oi-fdMI:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=YcZiwOCYZPQ:hZI6oi-fdMI:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=YcZiwOCYZPQ:hZI6oi-fdMI:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=YcZiwOCYZPQ:hZI6oi-fdMI:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=YcZiwOCYZPQ:hZI6oi-fdMI:F7zBnMyn0Lo" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/PrincetonSNS/~4/YcZiwOCYZPQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sns.cs.princeton.edu/2010/10/conference-presentation-faux-pas/feed/</wfw:commentRss>
		<slash:comments>32</slash:comments>
		<feedburner:origLink>http://sns.cs.princeton.edu/2010/10/conference-presentation-faux-pas/</feedburner:origLink></item>
		<item>
		<title>Application-Level (Addon) Messaging in World of Warcraft</title>
		<link>http://feedproxy.google.com/~r/PrincetonSNS/~3/jGUh0QxBgKs/</link>
		<comments>http://sns.cs.princeton.edu/2010/09/application-level-addon-messaging-in-world-of-warcraft/#comments</comments>
		<pubDate>Wed, 29 Sep 2010 22:01:22 +0000</pubDate>
		<dc:creator>Jeff Terrace</dc:creator>
				<category><![CDATA[Messaging]]></category>
		<category><![CDATA[addons]]></category>
		<category><![CDATA[application-level]]></category>
		<category><![CDATA[messaging]]></category>
		<category><![CDATA[sirikata]]></category>
		<category><![CDATA[virtual world]]></category>
		<category><![CDATA[warcraft]]></category>
		<category><![CDATA[world of warcraft]]></category>
		<category><![CDATA[wow]]></category>

		<guid isPermaLink="false">http://sns.cs.princeton.edu/?p=1155</guid>
		<description><![CDATA[Introduction I&#8217;m currently working on a project called Sirikata, an open, programmable, scalable, secure, and extensible virtual world. One aspect of this virtual environment is application-level messaging. Users can create objects in our world, and these objects can communicate with each other. To get an idea of what application-level messaging looks like, we wanted to [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>I&#8217;m currently working on a project called <a href="http://www.sirikata.com/">Sirikata</a>, an open, programmable, scalable, secure, and extensible virtual world.</p>
<p>One aspect of this virtual environment is application-level messaging. Users can create objects in our world, and these objects can communicate with each other. To get an idea of what application-level messaging looks like, we wanted to take a look at one of the world&#8217;s biggest virtual worlds: <a href="http://www.worldofwarcraft.com/">World of Warcraft</a> (WoW).</p>
<p>In WoW, users can create an &#8220;Addon&#8221;: a user-interface modification component that can add enhancements to the game. To communicate, these addons (written in Lua) have access to a function called SendAddonMessage.</p>
<p><br/></p>
<h2>WoW SendAddonMessage Function</h2>
<p><em>Sends a message to the hidden addon channel.</em></p>
<p><code>SendAddonMessage("prefix", "text", "type", "target")</code></p>
<ul>
<li><strong>prefix:</strong> String &#8211; Message prefix, can be used as your addon identifier.</li>
<li><strong>text:</strong> String &#8211; Text to send.</li>
<li><strong>type: </strong>String &#8211; AddOn channel to send to. Valid types are &#8220;PARTY&#8221;, &#8220;RAID&#8221;, &#8220;GUILD&#8221;, &#8220;BATTLEGROUND&#8221;, &#8220;WHISPER&#8221;.</li>
<li><strong>target: </strong>String &#8211; Used only for &#8220;WHISPER&#8221; communication &#8211; the player to whisper to.</li>
</ul>
<h3>Example:</h3>
<p><code>SendAddonMessage( "CTRA", "User Data: XYZ", "RAID" );</code></p>
<h3>Notes:</h3>
<ul>
<li>Calling this function results in the event CHAT_MSG_ADDON being invoked on:
<ul>
<li>&lt;target&gt;&#8217;s client if &lt;type&gt; is &#8220;WHISPER&#8221; or</li>
<li>all clients in the &lt;type&gt; chat channel, otherwise</li>
</ul>
</li>
<li>The combined length of message and prefix can be at most 254 characters (255th is the tab character used to delimit the two)</li>
<li>Length above 254 will disconnect you.</li>
</ul>
<p><br/></p>
<h2>Example Addons</h2>
<p>&nbsp;<br />
<h3>EPGP</h3>
<p><img class="alignleft size-medium wp-image-1171" title="EPGP" src="http://sns.cs.princeton.edu/wp-content/uploads/2010/09/Epgp-300x104.jpg" alt="EPGP" width="300" height="104" /></p>
<p>EPGP is an Effort/Gear reward system and addon for World of Warcraft.</p>
<p>It is a DKP system, described below:</p>
<p>Dragon kill points or DKP are a semi-formal score-keeping system (loot system) used by guilds in massively multiplayer online games. Players in these games are faced with large scale challenges, or raids, which may only be surmounted through the concerted effort of dozens of players at a time. While many players may be involved in defeating a boss, the boss will reward the group with only a small number of items desired by the players. Faced with this scarcity, some system of fairly distributing the items must be established. Used originally in the massively multiplayer online role-playing game Everquest, dragon kill points are points that are awarded to players for defeating bosses and redeemed for items that those bosses would &#8216;drop&#8217;. At the time most of the bosses faced by the players were dragons, hence the name.</p>
<p>EPGP uses the messaging system to send information between users of the addon about the values of loot. For example, if an item drops for the first time, the &#8220;loot master&#8221; might assign a value to the item. This value is then shared with other people using the epgp addon.</p>
<h4>Properties</h4>
<ul>
<li>small message size</li>
<li>very infrequently sent</li>
<li>must be delivered</li>
</ul>
<p><br/></p>
<h3>Gatherer</h3>
<p><img src="http://sns.cs.princeton.edu/wp-content/uploads/2010/09/Gatherer.jpeg" alt="Gatherer" title="Gatherer" width="267" height="188" class="alignleft size-full wp-image-1176" />Gatherer is an addon for herbalists, miners and treasure hunters in World of Warcraft. It&#8217;s main purpose is to track the closest plants, deposits and treasure locations on your minimap.</p>
<p>In WoW, players can gather resources. The resources are spread across the world and spawn in what seems to be a randomized fashion. In fact, though, the spawn locations are actually preset, so this addon tracks the location where users find resources to make it easier to find later.</p>
<p>Gatherer uses the messaging system to share the location of found resources between players. For example, when you find a resource, you can have it automatically share the location with your guild members. This way, if many people in your guild use the addon, you will get a nice compiled database of resource locations.</p>
<h4>Properties</h4>
<ul>
<li>small message size</li>
<li>medium send frequency</li>
<li>not important if messages are delayed</li>
<li>not important if some messages are completely lost</li>
</ul>
<p><br/></p>
<h3>Recount</h3>
<p><img src="http://sns.cs.princeton.edu/wp-content/uploads/2010/09/Recount.jpg" alt="Recount" title="Recount" width="353" height="185" class="alignleft size-full wp-image-1177" />Recount is a graphical damage meter.</p>
<p>In WoW, players fight bad guys. To do this, they have to deal damage. A metric for evaluating players is to look at the amount of damage they deal per second (DPS).</p>
<p>Every action that players within your party take is logged, which includes actions that result in damage being dealt. However, you only get log information for players that are within a short range of you. Because of this, in some encounters, some players might be too far away from you for their actions to be logged.</p>
<p>To make the meter more accurate, Recount uses the messaging system to aggregate logging data about damage being done.</p>
<h4>Properties</h4>
<ul>
<li>Medium size messages</li>
<li>Heavy traffic load during encounters (about 10 seconds to 6 minutes)</li>
<li>Latency effects instantaneous accuracy during an encounter</li>
<li>Syncing algorithm still works if some messages are lost</li>
</ul>
<p><br/></p>
<h2>Data Collection</h2>
<p>To study WoW, I wrote an AddOn called AppMsgLogger. This AddOn collects three pieces of information:</p>
<ol>
<li>A timestamp every time the user logs in or out</li>
<li>For every addon message received (the CHAT_MSG_ADDON event), a timestamp, the zone the user is in, the channel it was received on (PARTY, GUILD, RAID, etc), and the length of the message in bytes</li>
<li>A list of the AddOns the user is running</li>
</ol>
<p>I had several players run this AddOn for me and send me the data they collected. As a result, I have data consisting of 49 hours of time played across eight players.</p>
<p><br/></p>
<h2>Results</h2>
<p>Out of the 173,456 messages received, here is the breakdown of the channel they were sent on:</p>
<ul>
<li>Battleground: 262 (0.15%)</li>
<li>Whisper: 1,240 (0.71%)</li>
<li>Party: 3,185 (1.84%)</li>
<li>Guild: 21,439 (12.36%)</li>
<li>Raid: 147,330 (84.94%)</li>
</ul>
<p>So, only less than 1% of the traffic is unicast (Whisper) while the majority is broadcast traffic related to combat (Raid) and social organization (Guild).</p>
<p>Next, I wanted to look at the breakdown of message size. Here&#8217;s a histogram plotting message size frequency:<br />
<img src="http://sns.cs.princeton.edu/wp-content/uploads/2010/09/message_size_histogram.png" alt="" title="message_size_histogram" width="480" height="360" class="aligncenter size-full wp-image-1195" /></p>
<hr/>
<p>You can see in this histogram that the vast majority of messages are small (< 50 bytes), with a small cluster of messages near the maximum size of 254 bytes. This is probably because most addons use a library called <a href="http://www.wowace.com/addons/ace3/pages/api/ace-comm-3-0/">AceComm</a> which splits messages longer than 254 bytes into multiple messages.</p>
<p>Next, I wanted to see the distribution of messages across the different addons sending them. I matched each prefix string to its corresponding addon and plotted their message frequency:</p>
<p><img src="http://sns.cs.princeton.edu/wp-content/uploads/2010/09/addon_zipf.png" alt="addon_frequency_log" title="addon_frequency_log" width="480" height="360" class="align-center size-full wp-image-1196" /></p>
<hr/>
Note the log-log scale. What we see here is that a small number of addons send the majority of messages, with a long tail of addons that send a very small number of messages.</p>
<p><br/><br />
Next, I wanted to look at how bursty the message traffic was. To get an idea, I took a single user&#8217;s 24-hour play time (over multiple sessions), strung it together, and calculated the average byte rate of each 1-minute interval (click for full size).<br />
<a href="http://sns.cs.princeton.edu/wp-content/uploads/2010/09/message_throughput_time_zones.png"><img src="http://sns.cs.princeton.edu/wp-content/uploads/2010/09/message_throughput_time_zones-150x150.png" alt="message_throughput_time_zones" title="message_throughput_time_zones" width="150" height="150" class="align-center size-thumbnail wp-image-1197" /></a></p>
<hr/>
<p>What you can see in this graph is that there is considerable variability: some periods show a high rate (up to 14 Kbit/s) while others show a rate of almost zero.</p>
<p>To put these numbers in context, I added a background color that alternates every time the user switched zones. For zones that were longer than 10 minutes, I also added the name of the zone in the center of its colored region. As expected, high-rate periods are seen during raids (Icecrown) while the user was engaged in combat. Low-rate periods are seen when the user was presumably idling in capital cities (Orgrimmar, Dalaran).</p>
<p>The last thing I wanted to see was what percentage of messages were wasteful. Since messages are broadcast to their corresponding channels (GUILD, RAID, etc), users in the channel will receive all messages sent to the channel regardless of whether they have an addon actually listening to a given prefix string. Since I had a mapping of prefixes to the sending addon, I calculated the percentage of messages that were received by each user with a prefix string for an addon that the user wasn&#8217;t currently running. The percentages for the users that I collected data from are as follows: 58.9%, 28.2%, 15.9%, 12.6%, 9.5%, 86.1%, 83.9%, 17.0%. Since the sample size is low and variability is high, I&#8217;d be hesitant to draw any conclusions from those numbers, but it leads me to believe that a significant amount of traffic that is broadcast is wasteful.</p>
<p><br/></p>
<h3>Conclusion</h3>
<p>I hope you&#8217;ve found this analysis useful. We&#8217;ll be using some of this information as we continue to design and implement Sirikata.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=jGUh0QxBgKs:8VfEdwwqU5g:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=jGUh0QxBgKs:8VfEdwwqU5g:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=jGUh0QxBgKs:8VfEdwwqU5g:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=jGUh0QxBgKs:8VfEdwwqU5g:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=jGUh0QxBgKs:8VfEdwwqU5g:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=jGUh0QxBgKs:8VfEdwwqU5g:F7zBnMyn0Lo" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/PrincetonSNS/~4/jGUh0QxBgKs" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sns.cs.princeton.edu/2010/09/application-level-addon-messaging-in-world-of-warcraft/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		<feedburner:origLink>http://sns.cs.princeton.edu/2010/09/application-level-addon-messaging-in-world-of-warcraft/</feedburner:origLink></item>
		<item>
		<title>Easing the Hybrid RAM/SSD Memory Management with SSDAlloc</title>
		<link>http://feedproxy.google.com/~r/PrincetonSNS/~3/7ezptflCjqg/</link>
		<comments>http://sns.cs.princeton.edu/2010/07/easing-the-hybrid-ramssd-memory-management-with-ssdalloc/#comments</comments>
		<pubDate>Wed, 14 Jul 2010 17:43:36 +0000</pubDate>
		<dc:creator>Anirudh Badam</dc:creator>
				<category><![CDATA[System Management]]></category>
		<category><![CDATA[Technology for Developing Regions]]></category>

		<guid isPermaLink="false">http://sns.cs.princeton.edu/?p=991</guid>
		<description><![CDATA[Today, I am going to talk about a system called SSDAlloc, a hybrid RAM/SSD memory manager, that I have been working on over the past year and a half or so, jointly with my adviser Vivek Pai. The post is more of a research report briefing. I&#8217;ll first talk about what motivated us to explore [...]]]></description>
			<content:encoded><![CDATA[<p>Today, I am going to talk about a system called SSDAlloc, a hybrid RAM/SSD memory manager, that I have been working on over the past year and a half or so, jointly with my adviser <a href="http://www.cs.princeton.edu/~vivek" target="_blank">Vivek Pai</a>. The post is more of a research report briefing. I&#8217;ll first talk about what motivated us to explore this area and then delve into the technical jargon.</p>
<p>I will start with describing what motivated us to use SSD as program memory and then I will describe why using SSD as a swap space is a bad idea. After that, I will describe a SSD based non-transparent object storage engine we built that provides high performance but requires many application modifications. Later, I discuss the broader research goal, relevant to various situations (from developing regions to datacenters) which is to use as few modifications as possible to provide a nearly-transparent path to enable applications to use flash memory, providing ease of migration, high performance, and when desired, simple ACID transactions (because many applications expect transactions from non-volatile memories). After describing our solution in detail we provide some initial performance numbers.</p>
<p><strong>SSD as memory</strong>: When we first started running <a href="http://www.cs.princeton.edu/~abadam/hashcache.html" target="_blank">HashCache</a> on desktop towers, we ran into problems related to power, dust and slow restarts &#8212; described more formally by Surana et al in this <a href="http://www.usenix.org/events/nsdi08/tech/full_papers/surana/surana_html/">paper</a>. We contemplated running HashCache on netbooks, since they are sturdy, ruggedized, come with a battery, and are fairly immune to dust and a lot less power hungry compared to desktop towers/laptops. The proposition was not without challenges; the small amount of RAM (1-2GB) was not amenable to index large hard drives even with HashCache! We wondered if we could use the internal SSD (netbooks come with a 4-16GB SSD) as a cheap RAM substitute and an external drive as cache. This is a good idea since proxy caches&#8217; performance is disk driven and not limited by the number of index lookups when using HashCache. Most netbooks SSD supported 500 to 2000 random reads/sec, which is an order of magnitude more than disk-seek performance. Also, with the growing gap between CPU and disk speed, flash has attracted system designers in recent times. We thought it was a good time to explore an SSD based solution.</p>
<p><strong>SSD as Swap?</strong>: The next thing we did was to move the operating system to an external drive and use the internal SSD as swap. Another external hard drive, ~512GB, was used as the cache (<a href="http://sns.cs.princeton.edu/wp-content/uploads/2010/07/netbook-cache.jpg" target="_blank">pic</a>) which needed an index of 8GB size. As a proxy cache the setting did just fine. On that small CPU (single core 1.2 Ghz 256KB cache), it was able to swap in and out enough number of times to hit the disk limit (about 200 requests per second). Later, when we tried to use the setting as a WAN Accelerator using <a href="http://www.cs.princeton.edu/~sihm/papers/wanax-usenix10.pdf" target="_blank">Wanax</a>, we realized that using SSD as a swap was a bad idea. Wanax is a WAN Accelerator, which makes multiple index lookups for populating a single HTTP object. Highest performance version of Wanax needs tens of index inserts/lookups per HTTP object insert/lookup. Swap was exhausting the random write throughput on the SSD. Also, RAM turned into a page cache with each 4KB page containing only about 100 bytes of useful data. That was when we realized we needed special techniques to reduce random writes and make the best of use of the limited RAM in the system.</p>
<p><strong>SSD as object store</strong>: Towards that end we built an SSD based object management tool that had non-transparent read and write calls, with a back-end SSD based log-structured object store and used RAM as a compact object cache, described more formally in our workshop <a href="http://www.cs.princeton.edu/~abadam/papers/ssdalloc.pdf" target="_blank">paper</a>. Additionally, in that paper we proposed a minor improvement to HashCache (called HashCache-Striping), which is better suited for the hybrid SSD/RAM setting. HashCache-Striping is an efficient hashtable representation for hybrid RAM/SSD settings where the RAM:SSD ratio is high (typically 1:16 or higher). It requires only 1 byte of RAM for each index entry and provides as many index operations per second as reads/sec of the SSD used. More recently, <a href="pages.cs.wisc.edu/~akella/papers/bufferhash-nsdi10.pdf" target="_blank">other</a> researchers have explored such indexes for hybrid settings.</p>
<p><strong>Broader research goal</strong>: While the non-transparent technique did wonders with respect to performance it did take a lot of development effort to do so. We wondered if one could obtain the same performance benefits without having to take the painful route of having to redesign and tweak ones data structures and algorithms to make them SSD-aware. Essentially, our new goal was to use as few modifications as possible to provide a nearly-transparent path to enable applications to use flash memory, providing ease of migration, high performance, and when desired, simple ACID transactions. I am glad to announce that we were successful in our endeavor. The solution is beneficial for many situations including datacenter servers, which could use a 4TB SSD with 64-128GB RAM to scale performance or gateway servers for developing regions on netbooks with 1-2GB RAM and 4-16GB SSD. Before I delve into to the details of the solution I would first like to summarize the design space and describe the full benefits of our solution called SSDAlloc. The following table provides a nice summary. The table shows analytical performance numbers for a high-end enterprise SSD.</p>
<p><a href="http://sns.cs.princeton.edu/wp-content/uploads/2010/07/tabImage.png"><img class="aligncenter size-full wp-image-1020" src="http://sns.cs.princeton.edu/wp-content/uploads/2010/07/tabImage.png" alt="" width="530" height="300" /></a></p>
<p><strong>Random writes are only half the devil</strong>: One could claim that the only vice of swap was the random writes and hence having a write-logged swap is the panacea, but that is only half-true. Using SSD as swap forces a page level access granularity for every access. Each random write would lead to a full page-write, which would significantly increase the write traffic when the application object is very small (~100 bytes in case of HashCache-Striping). The situation (one with a RAM:SSD ratio of 1:16 or higher) motivates the need to treat SSD as something other than swap space. When most objects reside on the SSD, each object access would transfer a full page from SSD to RAM. If the objects on a page do not have temporal locality, then the rest of the page, while still being populated, really presents little benefit to the application. Instead we propose using a separate memory allocation and management tool that explicitly addresses this situation. If only a few lines of memory allocation code need to be modified to migrate an existing application to an SSD-enabled one, this one-time development cost is low compared to the cost of high-density memory (and a high-end server which can house that much RAM).</p>
<p><a href="http://sns.cs.princeton.edu/wp-content/uploads/2010/07/SSDAlloc-ATC-Image.001-0011.png"><img class="aligncenter size-large wp-image-1011" src="http://sns.cs.princeton.edu/wp-content/uploads/2010/07/SSDAlloc-ATC-Image.001-0011-1024x576.png" alt="" width="530" height="300" /></a></p>
<p><strong>Introduction to SSDAlloc</strong>: I now describe our solution in more detail. SSDAlloc is a memory management tool that eases the hybrid RAM/SSD development. SSDAlloc is a slab based memory allocator whose interface looks similar to <em>calloc</em>. The only additional information needed by SSDAlloc for providing the performance benefits is the size of the application object. Each object allocated by SSDAlloc resides primarily on the SSD, is cached in RAM (RAM Object Cache, as shown above) when needed and also has a permanent virtual memory address where the object gets mapped to on demand. The SSD is managed as a log structured object storage engine, where the location of an object can change over time. Additionally, SSDAlloc is accompanied by various styles in which the virtual memory space is managed. We discuss two of them in this article. The first one, called Memory Pages (MP), is similar to how <em>malloced</em> memory looks. A contiguous memory space containing pages, each of which could contain multiple application objects. When object are allocated using MP the pages containing these objects are managed on the SSD in a log-structured manner (whole pages treated as objects).</p>
<p><strong>Object per page model</strong>: Another virtual memory space management that we use is called Object Per Page (OPP). In this model, each object resides on its own page; if the size of the object is lesser than a page then the rest of the page is kept empty. To avoid wasting too much RAM we map only those objects to their pages, which are currently in use. Frequently accessed objects are cached in RAM Object Cache, which is a compact object store in RAM. Unused objects are pushed to SSD. There are two methods by which we minimize the wastage of RAM from using OPP. In the first method, applications explicitly declare session boundaries, which are used as markers by the SSDAlloc runtime to map objects out of their virtual memory locations. In the second method, the runtime maintains a page buffer, which is a small buffer of pages with currently mapped objects on them. When the buffer fills up, the runtime unmaps all the objects mapped to their pages so far, pushes them to RAM object cache and starts filling the buffer again by mapped objects to their virtual memory page on demand. In case of MP, SSDAlloc deals in full pages and treats each page as an object in itself. For objects spanning multiple pages in both the cases, the individual pages are still managed as separate objects.</p>
<p><strong>Virtual memory management</strong>: As mentioned earlier, SSDAlloc runtime manages the SSD as a log structured object/page storage where the location of an object can change over time. To keep track of the location of an object we maintain a table called ObjectTable (shown in the figure above) that is similar to page tables. ObjectTables indicate the location of object regardless of where they currently reside (on the SSD or in the RAM Object cache or in the page buffer). A new ObjectTable is created for every set of objects obtained via SSDAlloc. When the application faults on a page for accessing an object, the runtime populates the required page &#8220;on-the-fly&#8221; by looking up the ObjectTable and inserts the new page in the page buffer. For this process to be quick, an efficient mechanism is needed for translating a virtual memory address of an object to its ObjectTable. In practice, we found a binary search tree was good enough for such a translation mechanism. Obviously, the lookups will be faster if the number of ObjectTables is minimized (to reduce the size of the binary search tree). Towards that end, SSDAlloc works as a slab allocator so that controlling the number of slabs leads to an optimal balance between SSD space usage and performance.</p>
<p><strong>ACID transactions</strong>: While programmed session information (referred to as soft transactions) and buffered pages (referred to as agnostic transactions) are enough for reducing space wastage they do not provide any kind of guarantees. Some applications need ACID transactions from non-volatile memory. SSDs readily provide durability; atomicity can be ensured by evicting all or none of the objects belonging to a session (specified by the programmer) to the SSD. Isolation can be provided in the runtime library by allowing multiple threads to map the same set of objects at a different virtual memory address. Trapping accesses to different addresses of the same object allow a user defined synchronous data access model to take action when two threads try to modify the same object. Consistency has to be an application level semantic. With ACID transactions, SSDAlloc provides feature improvement over using SSD as swap alongside the performance improvement.</p>
<p><a href="http://sns.cs.princeton.edu/wp-content/uploads/2010/07/SSDAllocGainTable.png"><img class="aligncenter size-full wp-image-1015" src="http://sns.cs.princeton.edu/wp-content/uploads/2010/07/SSDAllocGainTable.png" alt="" width="530" height="300" /></a></p>
<p><strong>Performance</strong>: Using SSDAlloc we modified some applications including memcached (a key-value storage system), a boost library based B+Tree (transactional inserts, lookups and deletes) and HashCache (an efficient hashtable representation for in memory indexes). We also developed a transactional filesystem from scratch. We benchmarked these applications with and without SSDAlloc. When not using SSDAlloc these systems simply used the SSD as a swap space. The table above summarizes the results. In case of memcached, only the values are stored on the SSD using SSDAlloc. In case of the transactional filesystem the modified lines of code represent the number of lines needed to make the operations transactional. With only 15&#8211;36 lines of code modifications we obtain a speedup of upto an order of magnitude improvement over using SSD as a swap. Throughput gains obtained when using hard disk as swap are also indicated.</p>
<p>Additional information on the project can be obtained from the <a href="http://www.cs.princeton.edu/~abadam/ssdalloc.html">SSDAlloc project page</a>. Stay tuned for the first code release which is around the corner.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=7ezptflCjqg:xhcDg6bD6rU:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=7ezptflCjqg:xhcDg6bD6rU:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=7ezptflCjqg:xhcDg6bD6rU:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=7ezptflCjqg:xhcDg6bD6rU:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=7ezptflCjqg:xhcDg6bD6rU:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=7ezptflCjqg:xhcDg6bD6rU:F7zBnMyn0Lo" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/PrincetonSNS/~4/7ezptflCjqg" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sns.cs.princeton.edu/2010/07/easing-the-hybrid-ramssd-memory-management-with-ssdalloc/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		<feedburner:origLink>http://sns.cs.princeton.edu/2010/07/easing-the-hybrid-ramssd-memory-management-with-ssdalloc/</feedburner:origLink></item>
		<item>
		<title>ACM Symposium on Cloud Computing (SOCC 2010) Day 2</title>
		<link>http://feedproxy.google.com/~r/PrincetonSNS/~3/GfO4N8yi0ns/</link>
		<comments>http://sns.cs.princeton.edu/2010/06/acm-symposium-on-cloud-computing-socc-2010-day-2/#comments</comments>
		<pubDate>Sat, 12 Jun 2010 05:35:30 +0000</pubDate>
		<dc:creator>Jeff Terrace</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://sns.cs.princeton.edu/?p=958</guid>
		<description><![CDATA[I’m back in Princeton after spending the week in Indianapolis, Indiana for the first ACM Symposium on Cloud Computing (SOCC 2010). I’m posting here with a brief summary of each talk from Day 2 of the conference as well as some of my thoughts. These are my reactions to the presentations only, as I haven’t [...]]]></description>
			<content:encoded><![CDATA[<p>I’m back in Princeton after spending the week in Indianapolis, Indiana for the first ACM Symposium on Cloud Computing (<a href="http://research.microsoft.com/en-us/um/redmond/events/socc2010/">SOCC 2010</a>). I’m posting here with a brief summary of each talk from Day 2 of the conference as well as some of my thoughts. These are my reactions to the presentations only, as I haven’t read most of the papers.</p>
<p>[See my previous post for <a href="http://sns.cs.princeton.edu/2010/06/acm-symposium-on-cloud-computing-socc-2010-day-1/">Day 1</a>]</p>
<p><br/></p>
<h3>Keynote 2</h3>
<p><b><a href="http://research.microsoft.com/en-us/um/redmond/events/socc2010/sobel.htm">Building Facebook: Performance at Massive Scale</a></b><br />
<i>Jason Sobel (Facebook)</i></p>
<p>Jason gave a great presentation about the architecture at Facebook and some of the lessons their engineering team has learned in the last several years. In the last three years, Facebook has seen a factor of 20 growth in the number of users they support &#8211; from 1 million users in 2007 to 400 million users in 2010. With this high rate of growth, the infrastructure team has had to keep the site running in addition to innovating.</p>
<p>The traditional website scaling technique is to have all of a user&#8217;s data stored in a single server so that data can be partitioned easily and therefore scale. The social network that Facebook has can&#8217;t use this method because the data is interconnected and there is no way to partition the user graph. Instead, Facebook uses a number of techniques &#8211; aggressive caching, a multi-tiered infrastructure, and innovative APIs for developers that allow for applications to easily scale.</p>
<p>The three tiered structure that Facebook uses consists of:
<ol>
<li>a web server front end running PHP that processes user requests. Not much to say here except that Facebook now compiles PHP to C++ to speed up execution</li>
<li>a caching layer that uses memcached which is simple, fast, and reliable. They have 10&#8242;s of TBs of memory for caching!</li>
<li>a database layer that uses MySQL. They have 8,000 server-years of run time experience without data loss or corruption. Since most requests are cached, the traffic to MySQL stays low, but they require a 3:1 ratio of the innodb table size to RAM for each server in order to keep up with request rate.</li>
</ol>
<p>One interesting caveat here is that they are double caching most data. It&#8217;s hard to provide enough caching space to only cache in the memcache layer or the database layer, so the double-caching approach is the simplest option right now for Facebook, but looking into reducing this in the future.</p>
<p>Jason also talked about their Hadoop/Hive/Scribe system used to run MapReduce queries as well as a system called TAO that they use to provide a simple API to their developers for writing applications. Rather than developers having to write database and memcache queries, TAO is an API-aware layer built on top of memcache that provides easy to use abstractions.</p>
<p>Overall, a great talk with nice insight into Facebook&#8217;s infrastructure.</p>
<p><br/></p>
<h3>Applications</h3>
<p><b><a href="http://research.microsoft.com/en-us/um/people/antr/publications/socc0023-karagiannis.pdf">Hermes: Clustering Users in Large-Scale E-mail Services</a></b><br />
<i>Thomas Karagiannis (Microsoft Research), Christos Gkantsidis (Microsoft Research), Dushyanth Narayanan (Microsoft Research) , Antony Rowstron (Microsoft Research)</i></p>
<p>The Hermes project comes out of MSR where they are looking into optimizing the storage of e-mail messages in Microsoft Exchange. The normal implementation for Exchange is to take a new user account and locate their messages on any server in the Exchange cluster that has space available. By doing this, it splits load across the Exchange servers well.</p>
<p>The problem is that when two users that are on different physical servers receive the same message in their mailbox, the message has to be stored twice &#8211; once on each server. Hermes looks into trying to cluster users to minimize the amount of storage redundancy and network traffic between servers.</p>
<p>An example system was given that has 128K users on 68 exchange servers (1800 users/server). The sample size was 22 weeks of e-mail which contained 337M messages. They calculated that there is 3TB of extra storage required per week to store messages. By using a clustering algorithm to cluster together users that frequently e-mail each other, Hermes was able to save 55TB of storage over the 22 weeks of e-mail. Since moving user mailboxes is expensive, they suggest partitioning once and then re-partitioning once a week.</p>
<p>This was nice work, and seems like it solved a big problem with Exchange, although I question the architecture of Exchange to begin with. Services like Gmail do a much better job of storing e-mail messages and don&#8217;t run into problems like this.</p>
<p><b>Defining Future Platform Requirements for e-Science Clouds</b><br />
<i>Lavanya Ramakrishnan (Lawrence Berkeley National Lab), Keith Jackson(Lawrence Berkeley National Lab) , Shane Canon (Lawrence Berkeley National Lab) , Shreyas Cholia (Laurence Berkeley National Lab) , John Shalf (Lawrence Berkeley National Lab)</i><br />
<i>[position paper]</i></p>
<p>Keith talked about the current state of how (non-computer) scientists use cloud services to do their work. Many scientists use things like Hadoop to process large datasets (e.g. Large Hadron Collider). There are three main types:
<ol>
<li>Large scale synchronous workloads &#8211; generally run today on supercomputers</li>
<li>Medium scale synchronous workloads &#8211; can use local clusters</li>
<li>Large scale asynchronous workloads &#8211; these can be parallelized well so can be run on cloud services</li>
</ol>
<p>He talked about how the interfaces to cloud services are not standardized, so once chosen, scientists are locked in to a single provider. The suggestion was that cloud services should be more standardized and make it easier to share and collaborate among scientists.</p>
<p><br/></p>
<h3>Programming Models and Optimization</h3>
<p><b><a href="http://cseweb.ucsd.edu/groups/sysnet/miscpapers/socc10_final.pdf">Fluxo: A System for Internet Service Programming by Non-expert Developers</a></b><br />
<i>Emre Kiciman (Microsoft Research), Benjamin Livshits (Microsoft Research), Madanlal Musuvathi (Microsoft Research), Kevin Webb (University of California San Diego) </i></p>
<p>This work focused on trying to provide a way for non-experts to write Internet services that scale well. Non-experts now have access to large scale cloud platforms like Amazon, Google, and Microsoft, so hardware is no longer the bottleneck when needing to scale an application. Instead, it&#8217;s difficult to architect an application.</p>
<p>The current techniques for scaling Internet services are things like tiering, partitioning, replication, data duplication and de-normalization, pre-computation, and others. These techniques are notoriously difficult to build and deploy correctly. It would be nice for non-experts to have a way to write applications using these techniques easily.</p>
<p>In comes Fluxo &#8211; a compiler that takes input in the form of a restricted programming language and translates it to run on Azure. The restricted programming language simplifies program analysis and allows for the use of the common techniques listed above. Fluxo is profile driven, so it collects metrics, analyzes the program, transforms it, and repeats.</p>
<p>In an evaluation, pre-computation can get benefits of up to 60% in performance, and caching can decrease latency by up to 50%. Seems like a nice idea &#8211; I can see small to medium size companies finding a platform like this useful.</p>
<p><b>Nephele/PACs: A Programming Model and Execution Framework for Web-Scale Analytical Processing</b><br />
<i>Stephan Ewen (TU Berlin) , Fabian Hueske (TU Berlin) , Daniel Warneke (TU Berlin) , Dominic Battré (TU Berlin) , Volker Markl (TU Berlin) , Odej Kao (TU Berlin)</i></p>
<p>Data-centric parallel programming uses schema-free systems like MapReduce and Hadoop to run efficiently. Relational databases, on the other hand, are schema bound, have well-defined properties, are flexible and optimizable, but don&#8217;t scale well. This work looks at extending the capabilities of MapReduce. The notion of first-order and second-order function is introduced. First-order functions are user code, while second order functions are things like the Map and Reduce steps today.</p>
<p>PACTS extends the MapReduce paradigm with some additional functions. For example:
<ul>
<li>Cross (two inputs) &#8211; each combination of records from the two inputs is built and is independently processable</li>
<li>Match (two inputs) &#8211; each combination of records with equal keys from the two inputs is built</li>
<li>CoGroup (many inputs) &#8211; pairs with identical keys are grouped for each input, group of all inputs with identical keys are processed together</li>
</ul>
<p>PACT programs are compiled to parallel data-flows to be executed in Nephele. Nephele is an execution engine for parallel data-flow programs. It provides the process scheduling, communication, and fault-tolerance mechanisms needed to execute the workload. Example given of PACT workloads are TCP-H aggregation and K-Means Clustering. I&#8217;m not too familiar with running MapReduce workloads, so not sure of the impact this work will have.</p>
<p><b><a href="http://www.eecs.berkeley.edu/~franklin/Papers/socc10armbrust.pdf">The Case For PIQL: A Performance Insightful Query Language</a></b><br />
<i>Michael Armbrust (UC Berkeley), Nick Lanham (UC Berkeley), Stephen Tu (UC Berkeley) , Armando Fox (UC Berkeley) , Michael Franklin (UC Berkeley) , David Patterson (UC Berkeley) </i><br />
<i>[position paper]</i></p>
<p>This work looked at the tradeoff between RDBMS&#8217;s and NoSQL-type databases. RDBMS&#8217;s have powerful declarative query languages, have a query optimizer that decides the best execution plan, has automatic parallelism, and is performance-opaque. Contrastingly, NoSQL has a simple query interface (get/set), but complex queries have to be implemented at the application layer and are non-trivial to architect.</p>
<p>The argument is made to use a system called PIQL which is designed to be a middle ground between the two extremes. PIQL is a scale-independent declarative language that only allows developers to write queries with a data-size independent upper bound. PIQL is able to say no to certain queries, where as an RDBMS can&#8217;t.</p>
<p>Most applications have natural cardinality constraints. Example given was that a Facebook user has an upper limit of 5000 friends. PIQL lets developers specify these constraints in their data model so the optimizer can use them to decide on an optimized execution plan. If a query would be too slow, PIQL rejects it. PIQL is designed to run on top of existing key/value data-stores and it scales well. I think this is the right direction for providing easier ways to write NoSQL-type applications. I&#8217;m interested in looking more at what their API provides.</p>
<p><b><a href="http://www.cs.duke.edu/~gang/documents/details.pdf">Towards Automatic Optimization of MapReduce Programs</a></b><br />
<i>Shivnath Babu (Duke University)</i><br />
<i>[position paper]</i></p>
<p>Shivnath described some of the difficulties with using MapReduce today. The lifecycle of a MapReduce job is that the programer first provides the Map and Reduce functions, kicks off the job, which gets split into many Map waves, followed by many Reduce waves, until it finishes.</p>
<p>The complexity here is that the user has to specify things like the number of maps, the number of reduces, and memory usage. There are over 190 parameters that a user can specify in Hadoop, and for many, it isn&#8217;t clear how they impact performance. As an experiment, a 50GB terasort was run on EC2. By changing parameters, the running time drastically changed, and certain parameters depend on each other, so individual parameters can&#8217;t be tuned separately.</p>
<p>The argument was that a better approach should be used for default parameters in Hadoop. There has been a lot of work in traditional RDBMS&#8217;s and these techniques should be leveraged to try and tune MapReduce workloads.</p>
<p><br/></p>
<h3>Benchmarking and Testing</h3>
<p><b><a href="http://research.yahoo.com/files/ycsb.pdf">Benchmarking Cloud Serving Systems with YCSB</a></b><br />
<i>Brian Cooper (Yahoo! Research) , Adam Silberstein (Yahoo! Research) , Erwin Tam (Yahoo! Research) , Raghu Ramakrishnan (Yahoo!) , Russell Sears (Yahoo! Research) </i></p>
<p>This work looks at how to properly evaluate a NoSQL cloud database. There are many systems out there &#8211; PNUTS, BigTable, Megastore, Azure, Cassandra, CouchDB, and many more, but the tradeoffs between systems are not clear. It would be nice if there was a standard benchmark that could be used to analyze the performance of each system so that developers can choose the appropriate solution to meet their needs.</p>
<p>In comes YSCB &#8211; an open source, standard benchmark tool for evaluating NoSQL systems. YCSB focuses on performance (latency and throughput) and scalability (adding servers, latency as system scales, and elasticity). The YCSB tool is written in Java, which provides easily extendable workloads.</p>
<p>A test setup was run on six server-class machines, and systems evaluated are Cassandra, HBase, sharded MySQL, and PNUTS. Some of the example workloads given were (more in paper):
<ul>
<li>A &#8211; Update heavy workload that had 50:50 ratio of reads to updates</li>
<li>B &#8211; Read heavy workload with 95:5 ratio of reads to updates</li>
<li>E &#8211; Short range scans of 1 to 100 records of size 1KB</li>
<li>Elasticity &#8211; Run a read-heavy workload while adding servers from 2 to 6.</li>
</ul>
<p>The graphs showed a clear distinction between different systems and provided insight to which systems would be best in different situations. It was also noted that these tests can be used to improve the database system, as happened with Cassandra which improved dramatically in subsequent version releases.</p>
<p>This looks like a great tool for evaluating the plethora of NoSQL systems out there. It would be nice to have a standard setup (e.g. on Emulab) so that the comparison between systems is fair and repeatable.</p>
<p><b>Automated Software Testing as a Service</b><br />
<i>George Candea (EPFL) , Stefan Bucur (EPFL) , Cristian Zamfir (EPFL)</i><br />
<i>[position paper]</i></p>
<p>Stefan pointed out that when we install privileged software today (like drivers) we have to take their reliability on faith. Third-party drivers run with the highest privilege in the operating system, so if they have bugs or malicious code, serious problems can occur. The analogy made was to cars &#8211; we wouldn&#8217;t give our car keys to a random driver without some guarantee of the car&#8217;s safety.</p>
<p>It would be nice to be able to test a driver by exploring all possible code paths and look for common bugs and problems. Testing as a Service (TaaS) is a system for doing just that. TaaS can be used by home users by uploading a driver to a website and evaluating it, by developers who can integrate testing into their IDE&#8217;s for convenience, and as a commercial certification service for software.</p>
<p>This seems like a nice idea, although I question how much traction it could get. A lot of Windows drivers today don&#8217;t get certified because companies like to cut corners and produce a product for the least amount of money possible. It seems like even if this service was provided, companies would still cut corners and just skip using it to save money. Being able to test as a home user is not that useful, as a failed test would just give you the option of not installing the software, the same option you have today. Perhaps if a big player like Microsoft started requiring certification for driver installation, it could get the traction it needed.</p>
<p><br/></p>
<h3>Keynote 3</h3>
<p><b><a href="http://research.microsoft.com/en-us/um/redmond/events/socc2010/woollen.htm">The Internal Design of Salesforce.com&#8217;s Multi-Tenant Architecture</a></b><br />
<i>Rob Woollen (Salesforce.com)</i></p>
<p>This was an interesting keynote. I wasn&#8217;t very familiar with exactly what Salesforce provided to its customers before this talk. I knew that a lot of companies used their services, but I didn&#8217;t realize the scale they were working at. Salesforce provides cloud services for 75,000 customers running 150,000 custom applications and serving 17,500 websites.</p>
<p>They provide many services, including a query language, API&#8217;s, OLAP, batch processing, search, mobile, content management, packaging, and many others. What&#8217;s interesting about Salesforce&#8217;s architecture, though, is that each customer (or &#8220;tenant&#8221; as they call them) is not given its own architecture. Instead, all tenants run on a single platform, supported by a single code base.</p>
<p>Each username is mapped to a tenant, which is mapped to an &#8220;instance&#8221;. An instance is a cluster of hardware including a SAN, application servers, load balancers, object storage, etc. To provide isolation, each tenant runs as a separate Java thread. They have a three-tier caching layer, first caching locally within each Java thread, then caching with remote requests, and finally providing an application-layer service to users running on memcache.</p>
<p>Underneath the cache is a persistence layer that uses Oracle. Every customer&#8217;s data is stored together in a table, with static (compile time) and dynamic (run time) query analysis that enforces that a customer can only read their own data. All data is stored as a string in the database, so to provide indexes, a separate table has a column for each possible type which can then be indexed.</p>
<p>Salesforce only maintains a single code base for their infrastructure. To provide compatibility with previous releases, functions are annotated with API version numbers, such that if a customer is still using an old version, they don&#8217;t get access to changes in the API. When customers package their own code, they are required to submit unit tests that cover most of their code paths, so when new versions are released, Salesforce uses customer tests to verify compatibility with old API versions. An analytics component is provided that does real-time calculations instead of offline processing like most other analytics systems.</p>
<p>This was a nice talk. The presentation was a little dry (Rob didn&#8217;t seem very enthusiastic about their services!) but it was still interesting.</p>
<p><br/></p>
<h3>Data Services</h3>
<p><b><a href="http://www.cs.ucsb.edu/~sudipto/papers/socc10-das.pdf">G-Store: A Scalable Data Store for Transactional Multi key Access in the Cloud</a></b><br />
<i>Sudipto Das (UC Santa Barbara) , Divyakant Agrawal (UC Santa Barbara) , Amr El Abbadi (UC Santa Barbara) </i></p>
<p>G-Store addresses the weakness of most NoSQL databases that they don&#8217;t provide transactions over multiple keys. Many NoSQL databases provide test-and-set type semantics for a single key, but can&#8217;t provide any guarantees about multiple keys. To do this, users have to implement any transactional logic in the application layer.</p>
<p>G-Store is a layer that runs on top of a key/value store and provides additional functions. Users can submit a list of keys to add to a group. When a group is created, a single node is assigned as the controller for this set of keys. Since a single node controls the group, it can execute transactions over the group. Once a user is finished, the group can be deleted.</p>
<p>This work is similar to our own CRAQ work, which can provide multi-key transactions over a chain. CRAQ only provides static group membership, in that it doesn&#8217;t support explicitly moving keys around after their initial creation, but by leveraging Sinfonia-type mini-transactions, it provides the same service as G-Store. I was disappointed that we weren&#8217;t cited.</p>
<p><b>Google Fusion Tables: Data Management, Integration and Collaboration in the Cloud</b><br />
<i>Alon Halevy (Google) , Hector Gonzalez (Google) , Jayant Madhavan (Google) , Christian Jensen (Aalborg University) , Jonathan Goldberg-Kidon (MIT) , Warren Shen (Google) , Rebecca Shapley (Google) , Anno Langen (Google)</i><br />
<i>[industrial presentation]</i></p>
<p>Google Fusion Tables is a service that allows data management for the web era. Google wants to provide data storage that can integrate seamlessly with the web, is searchable, embeddable, extensible, easy to use, provides incentives for sharing data, and facilitates collaboration.</p>
<p>The example given was biking trails in Europe. There is a website that allows users to upload GPS data of bike trails. They want to allow data like this to be integrated into the Google Maps API for visualization and allow users to query it easily, e.g. find trail lengths < 25 miles.</p>
<p>Google Fusion Tables allows users to extend data tables in order to collaborate. Tables can be merged. For example, a user might merge the bike trails table with a sponsors table and a results table. Users can have discussions about the data, set permissions, and perform other types of operations.</p>
<p>The architecture to support these operations has a front end dispatcher that receives requests, a query processing component, a backend that does plan execution, and a replicated storage layer running on BigTable. This seems like a nice, useful API. Looking forward to checking it out.</p>
<p><br/></p>
<h3>High Availability and Reliability</h3>
<p><b><a href="http://dprg.cs.uiuc.edu/docs/iss-socc/socc077-ko.pdf">Making Cloud Intermediate Data Fault-Tolerant</a></b><br />
<i>Steve Ko (Princeton University) , Imranul Hoque (University of Illinois at Urbana-Champaign) , Brian Cho (University of Illinois at Urbana-Champaign) , Indranil Gupta (University of Illinois at Urbana-Champaign) </i></p>
<p>This work tries to address some shortcomings in Hadoop. In MapReduce, there is a two stage computation: the Map step and the Reduce step. Between processing steps, there is a large amount of intermediate data that has to get shipped from server to server. For example, Google&#8217;s indexing application runs a chain of 24 MapReduce jobs. In many cases, this intermediate data is larger than either the input data or output data.</p>
<p>If a MapReduce node fails, the computation it was supposed to provide has to be rerun on another node. The problem is that to rerun the computation, the intermediate data that was sent to the failed node has to be sent to the new node as input. If it&#8217;s not the first step in the chain, this intermediate data is no longer available &#8211; it was the output of the previous step. To recover it, the previous step has to be rerun as well. This is a problem when chaining multiple steps together, because the failure will cascade and cause every previous step to have to be rerun.</p>
<p>This work extends Hadoop by replicating this intermediate data. Unfortunately, this isn&#8217;t a simple thing to do because most MapReduce jobs are network-bound (verified experimentally), meaning that adding additional replication slows down the job. An observation is made, though, that often the bottleneck is at the inter-rack switches. The network capacity within a rack is not saturated. To take advantage of this, replication can be done at the rack level.</p>
<p>Failures really do occur often in large MapReduce clusters (verified from industry reports), and an experimental evaluation shows that when using rack-level replication, jobs can be sped up significantly when failures occur. When no failures occur, there is only a small overhead associated with the replication. This seems like a nice addition to Hadoop, and I hope to see it added to the main branch.</p>
<p><b><a href="http://research.microsoft.com/pubs/120439/socc088-vishwanath.pdf">Characterizing Cloud Computing Hardware Reliability</a></b><br />
<i>Kashi Vishwanath (Microsoft Research) , Nachi Nagappan (Microsoft Research) </i></p>
<p>This work looks at hardware failures in large scale datacenters. Microsoft has 100K+ servers that are geographically distributed around the world running heterogeneous hardware. They observe lots of failures and have to deal with them with a support staff. Since a person has to diagnose hardware faults and perform the necessary fix, failures come at a high cost.</p>
<p>The dataset used here was from Microsoft&#8217;s online services. This runs things like Messenger, Hotmail, etc. The dataset was over a 14 month period, looking at 100K+ servers, and contained a work log from their datacenter. The work log records which server fails along with what component had to be replaced (e.g. disk, dimm, raid controller, etc).</p>
<p>The analysis they performed looked at characteristics in aggregate, failure rates across individual servers and datacenters, and tried to see if failures could be predicted. The method used was a decision tree. Some interesting takeaways:
<ul>
<li>For every 100 machines, 9.5 machines recorded a failure annually</li>
<li>For every 100 machines, 20 failures were recorded annually (so 2 repairs per machine)</li>
<li>70.7% of failures were disks, 6.5% raid controller, 5.1% memory, 17.6% other</li>
<li>When a failure occurs, there is a 20% chance another failure will occur within 1 day!</li>
<li>Failure probability is correlated with number of disks in a server, slope is greater than 1!</li>
<li>Many times, a faulty raid controller would cause failures to be reported as a disk failure</li>
</ul>
<p><b><a href="http://infoscience.epfl.ch/record/146774/files/socc048-bonvin.pdf">A Self-Organized, Fault-Tolerant and Scalable Replication scheme for Cloud Storage</a></b><br />
<i>Nicolas Bonvin (EPFL), Thanasis Papaioannou (EPFL), Karl Aberer (EPFL) </i></p>
<p>I really have nothing to say about this talk. I couldn&#8217;t understand anything from the presentation at all. I verified it wasn&#8217;t just me &#8211; others that I asked didn&#8217;t understand it, and there were no audience questions either. This is a great example of how not to give a talk.</p>
<p><br/></p>
<h3>Storage and System Modeling</h3>
<p><b><a href="http://www.pdl.cmu.edu/PDL-FTP/Storage/rabbit.pdf">Robust and Flexible Power-Proportional Storage</a></b><br />
<i>Hrishikesh Amur (Georgia Institute of Technology) , James Cipar (Carnegie Mellon University) , Varun Gupta (Carnegie Mellon University) , Michael Kozuch (Intel Corporation) , Gregory Ganger (Carnegie Mellon University) , Karsten Schwan (Georgia Institute of Technology) </i></p>
<p>This work looked at trying to provide a distributed storage service that is power-proportional. This means that as the load on the system increases, the power consumption should also increase proportionally. This is not the case today &#8211; idling servers still use a large percentage of the required power for peak load.</p>
<p>One way to reduce power consumption is just to turn off servers during times of low demand, but traditional distributed storage services use methods like consistent hashing to get good load balancing, so if you shut off a server, the data on that server is no longer available.</p>
<p>Instead, what Rabbit does is store a primary replica of every key in a small number of nodes. A secondary replica goes to a larger number of nodes, a third replica to an even larger set of nodes, and a fourth replica to the largest number of nodes. By doing this, you can turn off 95% of your servers without losing availability. If you have 5 servers on, each serves 20% of requests. If you have 100 servers on, each serves 1% of requests.</p>
<p>This allows for fast, fine-grained scaling by simply powering off machines. An example benchmark of a 100GB terasort shows that read throughput scales linearly with percentage of maximum power used. However, write performance decreases because all data has to be written to a small number of nodes. A method is used from previous work to offload writes to get better write performance.</p>
<p><b><a href="http://www.cs.cornell.edu/w8/~lonnie/racs_socc2010.pdf">RACS: A Case for Cloud Storage Diversity</a></b><br />
<i>Lonnie Princehouse (Cornell University) , Hussam Abu-Libdeh (Cornell University) , Hakim Weatherspoon (Cornell University)</i></p>
<p>RACS is a method for combining multiple cloud platforms together for data storage, such that a single vendor doesn&#8217;t have to be relied on. Today, when choosing a cloud datastore, customers are often locked in because of the high cost of switching providers. To transfer a large amount of data from vendor A to vendor B, the customer must pay for the bandwidth cost of the transfer twice &#8211; once for each provider.</p>
<p>What RACS does is use erasure coding across many cloud platforms to reduce the costs associated with switching vendors. The notation RACS(k,n) is used, where it stripes data over n providers such that any k-subset contains a complete copy of the data. This can tolerate up to (n-k) failures before data is lost.</p>
<p>An evaluation used an FTP trace from the Internet Archive over 1.5 years. The cost of three providers over time were very similar. Using RACS is more expensive: RACS(1,2) is much more expensive, while RACS(8,9) is only slightly more expensive. However, the cost of switching providers is very expensive when not using RACS, on the order of two-months of operating costs, compared to much cheaper cost when using RACS. This is interesting work &#8211; I hope they implement more cloud platforms in addition to their current prototype.</p>
<p><b><a href="http://www.cs.berkeley.edu/~bodikp/publications/socc10-spikes.pdf">Characterizing, Modeling, and Generating Workload Spikes for Stateful Services</a></b><br />
<i>Peter Bodik (UC Berkeley) , Armando Fox (UC Berkeley) , Michael Franklin (UC Berkeley) , Michael Jordan (UC Berkeley) , David Patterson (UC Berkeley)</i></p>
<p>Web applications often experience periods of spikes in traffic that are unpredictable. For example, Wikpedia, after the death of Michael Jackson, saw 13% of total requests to the single page of Michael Jackson. The first contribution of the paper was to characterize these spikes. There are two main types &#8211; workload volume spikes consisting of the time to peak, duration, magnitude, and maximum slope; and data hotspots consisting of the number of hotspots, spacial locality, and entropy.</p>
<p>The conclusion was that there is no typical type of spike because they vary widely. The next contribution was to try and generate workload spikes using the 7 characteristics listed above. They were able to create a generative spike model using two parameters N, the number of hotspots, and L, a clustering parameter.</p>
<p>After using the generative model, they were able to match real world examples they collected with their counterpart from the output of the generative model, validating that the model produces valid workload spikes.</p>
<p><br/></p>
<h3>Conclusion</h3>
<p>SOCC turned out to be a great conference, and was definitely a success overall. The next SOCC conference will be co-hosted with SOSP in 2011 in Portugal, so I&#8217;m looking forward to that if I&#8217;ll be able to attend!</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=GfO4N8yi0ns:ddd6w-ls3Jc:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=GfO4N8yi0ns:ddd6w-ls3Jc:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=GfO4N8yi0ns:ddd6w-ls3Jc:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=GfO4N8yi0ns:ddd6w-ls3Jc:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=GfO4N8yi0ns:ddd6w-ls3Jc:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=GfO4N8yi0ns:ddd6w-ls3Jc:F7zBnMyn0Lo" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/PrincetonSNS/~4/GfO4N8yi0ns" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sns.cs.princeton.edu/2010/06/acm-symposium-on-cloud-computing-socc-2010-day-2/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		<feedburner:origLink>http://sns.cs.princeton.edu/2010/06/acm-symposium-on-cloud-computing-socc-2010-day-2/</feedburner:origLink></item>
		<item>
		<title>ACM Symposium on Cloud Computing (SOCC 2010) Day 1</title>
		<link>http://feedproxy.google.com/~r/PrincetonSNS/~3/buPxsjz1tqc/</link>
		<comments>http://sns.cs.princeton.edu/2010/06/acm-symposium-on-cloud-computing-socc-2010-day-1/#comments</comments>
		<pubDate>Fri, 11 Jun 2010 01:51:32 +0000</pubDate>
		<dc:creator>Jeff Terrace</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://sns.cs.princeton.edu/?p=920</guid>
		<description><![CDATA[I'm currently in Indianapolis, Indiana for the first ACM Symposium on Cloud Computing (<a href="http://research.microsoft.com/en-us/um/redmond/events/socc2010/">SOCC 2010</a>). I'm posting here with a brief summary of each talk at the conference as well as some of my thoughts. These are my reactions to the presentations only, as I haven't read most of the papers.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m currently in Indianapolis, Indiana for the first ACM Symposium on Cloud Computing (<a href="http://research.microsoft.com/en-us/um/redmond/events/socc2010/">SOCC 2010</a>). I&#8217;m posting here with a brief summary of each talk at the conference as well as some of my thoughts. These are my reactions to the presentations only, as I haven&#8217;t read most of the papers. </p>
<p>[See my next post for <a href="http://sns.cs.princeton.edu/2010/06/acm-symposium-on-cloud-computing-socc-2010-day-2/">Day 2</a></p>
<p><br/></p>
<h3>Keynote 1</h3>
<p><i>[Note: My SIGMOD <a href="http://www.cs.princeton.edu/~jterrace/docs/feeding-frenzy-sigmod10-web.pdf">paper</a> was being presented opposite the first keynote, so <a href="http://www.cs.princeton.edu/~steveko/">Steve Ko</a> who is also at the conference wrote this first summary.]</i></p>
<p><b><a href="http://research.microsoft.com/en-us/um/redmond/events/socc2010/dean.htm">Evolution and Future Directions of Large-Scale Storage and Computation Systems at Google</a></b><br />
<i>Jeffrey Dean (Google)</i></p>
<p>Jeff Dean gave a keynote about different services running at Google and general principles on how to build large-scale services. The talk was roughly divided into three parts. The first part was about Google&#8217;s data centers that house a few hundreds of clusters. Each cluster has thousands of machines with one or a handful of configurations. Each machine (at least) runs GFS and Colossus (next-gen GFS), and a cluster scheduling daemon.</p>
<p>The second part was about Google&#8217;s back-end services including MapReduce, BigTable, and Spanner. A few interesting notes about these systems are:
<ol>
<li>BigTable now has a dedicated team that manages BigTable service clusters. Because of this, there has been a lot of work on fair-share scheduling and performance isolation.</li>
<li>BigTable has something called coprocessors, which are basically &#8220;arbitrary code that runs next to each tablet in table&#8221;.</li>
<li>Spanner is a storage &#038; computation system that runs across data centers. It supports the mix of strong &#038; weak consistency models, fine-grained replication, &#8220;zone&#8221;-based hierarchy (1 master per zone), etc.</li>
</ol>
<p>The third part was about experiences and design patterns for building large-scale services. There were too many design patterns to post here (Jeff said he would post slides on his website), but here are a few which I find interesting:
<ol>
<li>Use back-of-the-envelope calculation whenever possible before choosing a design</li>
<li>Design for growth, but don&#8217;t overdo it (e.g., 5x &#8211; 10x OK, but not 100x growth), and</li>
<li>Canary requests: some requests crash every single process in the same service, so try a few machines first.</li>
</ol>
<p>Overall, it was an interesting talk about quite a broad set of topics. There are only a few places where you can accumulate wisdom about building truly large-scale systems, and it is always interesting to see what they are doing to cope with the scale.</p>
<p><br/></p>
<h3>Operating Systems</h3>
<p><b><a href="http://dspace.mit.edu/bitstream/handle/1721.1/51381/MIT-CSAIL-TR-2010-003.pdf?sequence=1">An Operating System for Multicore and Clouds: Mechanisms and Implementation</a></b><br />
<i>David Wentzlaff (MIT) , Charles Gruenwald III (MIT CSAIL) , Nathan Beckmann (MIT CSAIL) , Kevin Modzelewski (MIT CSAIL) , Adam Belay (MIT CSAIL) , Lamia Youseff (MIT CSAIL) , Jason Miller (MIT CSAIL) , Anant Agarwal (MIT CSAIL) </i></p>
<p>This work addresses the issue of how to create applications that run in the cloud. Machines today have many cores (16 today, up to 1000 in five years), and parallelizing across many cores is very difficult. They created a new system called Fos (Factored Operating System) which runs on top of Xen and splits the OS into separate services. The kernel is a microkernel and all services run on top (e.g. File System). The communication between components in the system is done with message passing. There are no shared locks.</p>
<p>Because of this abstraction, each component can be adjusted elastically to handle demand. For example, if an application starts accessing the File System component at too high of a rate, the underlying system can spawn another File System component on another core or another machine and start transparently redirecting requests to the new component.</p>
<p>The implementation of Fos is fairly complete &#8211; some applications include a web server, slide viewer, video transcoder (ffmpeg), and busy box. In fact, it was revealed at the end of the talk that the presentation was running on Fos, and it didn&#8217;t crash! The system&#8217;s <a href="http://fos.csail.mit.edu/">website</a> is running on the Fos web server.</p>
<p>I&#8217;m not sure how this work will play out. I could see this becoming a standard approach as servers start having thousands of cores, but I&#8217;m not sure how applications here would be able to cope with network latency involved in inter-machine message passing.</p>
<p><br/></p>
<h3>Virtualization</h3>
<p><b>Lithium: Virtual Machine Storage for the Cloud</b> (can&#8217;t find this paper online yet)<br />
<i>Jacob Hansen (VMware) , Eric Jul (Bell Labs, Dublin)</i></p>
<p>This work looks at trying to increase the performance of shared file systems that are used by virtual machine clusters. In the traditional setup, a datacenter has a <a href="http://en.wikipedia.org/wiki/Storage_area_network">SAN</a> server that hosts files for VMs in a cluster to access. The SAN is set up such that it is highly reliable and provides high throughput.</p>
<p>The problem with a SAN is that it&#8217;s expensive and it doesn&#8217;t scale very well to hundreds or thousands of servers accessing it simultaneously. It&#8217;s also limited by network bandwidth. VMware&#8217;s Lithium technology does away with the SAN and instead arranges the VM hosts into a peer-to-peer network and uses local storage to store files, replicating data for redundancy and performance.</p>
<p>The system still preserves the features required of a VM file server, e.g. cloning, snapshots, etc. It uses a branching method similar to some source control systems like Mercurial for quick copying of large files.</p>
<p>When compared to a SAN, Lithium doesn&#8217;t perform as well with a small number of hosts, but as the number of VM hosts increases, Lithium scales linearly while the SAN maxes out at a constant throughput. This approach seems like a great idea, and hope to see it pushed to production in future VMware releases.</p>
<p><b><a href="http://www.cc.gatech.edu/grads/m/mukil/papers/dvt_socc10.pdf">Differential Virtual Time (DVT): Rethinking I/O Service Differentiation for Virtual Machines</a></b><br />
<i>Mukil Kesavan (Georgia Institute of Technology), Ada Gavrilovska (Georgia Institute of Technology), Karsten Schwan (Georgia Institute of Technology)</i></p>
<p>The presentation of this work was quite confusing, but the basic idea is that when doing fair scheduling for resources in a VM, ill effects are often encountered. For example, as is well known, TCP doesn&#8217;t react very well to congestion in the network (e.g. why TCP doesn&#8217;t perform very well over wireless networks), so when a VM host artificially limits a guest&#8217;s access to the network, the TCP congestion window will drop dramatically and negatively impact performance.</p>
<p>To try and solve this problem, DVT takes an approach of never sharply decreasing a guest&#8217;s share of the network. For example, if host X is receiving 100% of network access, but hosts W, Y and Z suddenly request access, rather than the traditional approach of reducing X immediately to 25%, DVT instead slowly reduces it over time. To keep access fair, DVT will eventually reduce X to lower than 25% to make up for the increased share it got previously.</p>
<p>By doing this, DVT increases the performance of VM guests by up to 25% since the TCP congestion window doesn&#8217;t drop sharply. It was mentioned that future work will look at applying the same method to disk access, although it isn&#8217;t clear to me how slowly reducing disk access instead of sharply reducing it would increase performance in an application.</p>
<p><b><a href="http://research.microsoft.com/pubs/120435/JoulemeterVM.pdf">Virtual Machine Power Metering and Provisioning</a></b><br />
<i>Aman Kansal (Microsoft Research) , Feng Zhao (Microsoft Research) , Jie Liu (Microsoft Research) , Nupur Kothari (USC) , Arka Bhattacharya (IIT Kharagpur)</i></p>
<p>This work asked the question &#8220;Can we tell how much power a VM guest is consuming?&#8221;. I was wondering what the motivation for measuring this was throughput the talk until it was finally mentioned at the end. I&#8217;ll start with the motivation first instead &#8211; the reasoning given was mainly to use knowledge of a VM guest&#8217;s consumption to provision your datacenter power accordingly. Other usages are to charge your cloud users according to power consumption (although I don&#8217;t buy this, as I don&#8217;t see how it would differ from billing with current methods &#8211; cpu, memory, storage, bandwidth), and to track which VMs are consuming the most power so you can target power reduction for &#8220;green&#8221; initiatives.</p>
<p>To answer this question is a two step process. First, they measure the power consumption of each component in the system when a context switch between VM guests happens by using OS-level events. Next, they have to figure out which VM guest is using each component. Rather than doing this live, they monitor a VM guest for a period of time to determine its power consumption and then use that value for future calculations. The reason for this is that different loads may use different internal power for the same externally visible power state, so learning a VM guest&#8217;s power profile over time is a more accurate measurement of power consumption.</p>
<p>There were a lot of technical details glossed over in this talk, and there were many formulas on the slides that weren&#8217;t explained or accompanied by variable descriptions, so I found the presentation somewhat confusing. I&#8217;m sure reading the paper would make this more clear.</p>
<p><br/></p>
<h3>Distributed and Parallel Processing</h3>
<p><b><a href="http://cseweb.ucsd.edu/~kyocum/pubs/socc122-logothetis.pdf">Stateful Bulk Processing for Incremental Algorithms</a></b><br />
<i>Dionysios Logothetis (UC San Diego), Christopher Olston (Yahoo! Research) ,  Benjamin Reed (Yahoo! Research) , Kevin Webb (UC San Diego) , Kenneth Yocum (UC San Diego)</i></p>
<p>This work targets large data applications. These are things like web analytics, graph mining, log analysis, and PageRank, which use massive amounts of data. An insight here is that these applications have to continually process on the order of TBs of new data per day and they are stateful, but the running time is proportional to the total amount of state, not proportional to the amount of new data.</p>
<p>Continuous Bulk Processing (CBP) provides users with an additional API of a Translate() function and a RouteBy() function similar to the map and reduce stages of MapReduce. Current systems only do &#8220;outer&#8221; grouping, while CBP allows for &#8220;inner&#8221; grouping so that only the state that needs to be accessed is shipped around. In an evaluation, this inner grouping method reduced running time by up to 53%.</p>
<p>Perhaps I&#8217;m not familiar enough with MapReduce, but the presentation went too fast for me to follow the details of the Translate and RouteBy API, so see the paper for details.</p>
<p><b><a href="http://www.se.cuhk.edu.hk/~bshe/socc10.pdf">Comet: Batched Stream Processing for Data Intensive Distributed Computing</a></b><br />
<i>Bingsheng He (Microsoft Research), Mao Yang (Microsoft Research) ,   Zhenyu Guo (Microsoft Research) , Rishan Chen (Beijing University) , Wei Lin (Microsoft Research) , Bing Su (Microsoft Research) , lidong Zhou (Microsoft Research)</i></p>
<p>Comet is a system that tries to reduce the running time of large data-intensive applications that have work in common. The example given was a set of four jobs: one that computes the top ten hottest Chinese pages daily, another that computes the top ten hottest English pages daily, and corresponding jobs that compute the top ten hottest Chinese and English pages weekly. The first two jobs have the same input data, but have different filters in place, so the first step of each of those jobs is the same.</p>
<p>The flow of the system is: query series, normalization, logical optimization, physical optimization, execution plan, and then execution. By doing these optimization steps, Comet is able to reduce the amount of work done by 52% for jobs that have commonalities with previous jobs.</p>
<p>The evaluation showed that computing the top ten pages weekly was able to take advantage of the top ten daily calculation, but the top ten pages for the week don&#8217;t necessarily overlap with the top ten pages of each day, so it&#8217;s not clear how this works. This question was asked in the Q&#038;A, but the author wasn&#8217;t able to answer. The presentation of this work was very confusing, and it was clear that the rest of the audience didn&#8217;t understand either. I&#8217;m sure reading the paper would make more sense.</p>
<p><b><a href="http://nuage.cs.washington.edu/pubs/socc10.pdf">Skew-Resistant Parallel Processing of Feature-Extracting Scientific User-Defined Functions</a></b><br />
<i>YongChul Kwon (University of Washington) , Balazinska Magdalena (University of Washington) , Bill Howe (University of Washington) , Jerome Rolia (HP) </i></p>
<p>This work addresses how to reduce the running time of large scientific experiments. The example given here was an application that takes astronomical images, does feature extraction to identify the celestial objects in the images, and then runs a Friends of Friends algorithm, which is used in astronomy to cluster celestial objects. MapReduce-type systems like Hadoop are a great fit for workloads like this, but it is hard to express complex algorithms and get good performance. As an example, this algorithm when first implemented took 14 hours to run, and after a week of working, they were able to reduce the running time to 70 minutes.</p>
<p>The reason these types of algorithms can take a long time to run is that the same amount of input data doesn&#8217;t always have the same running time (e.g. if the cluster of celestial objects is more dense, takes longer), so a static partitioning scheme doesn&#8217;t get good performance. An alternative is to use micro partitions, which reduces the impact of skew, but there is additional framework overhead and to find the sweet spot, the algorithm must be run many times, which is undesirable.</p>
<p>The SkewReduce algorithm takes a sampling approach to figure out the best partitioning scheme. In evaluation, this SkewReduce algorithm was able to reduce the running time of algorithms by 2-8 times. This seems like a nice scheduling algorithm, and I hope this finds its way into the main branch of Hadoop. A person who works at Google shared a similar optimization that Google uses, but they do their optimizations in the Reduce stage rather than the Map stage.</p>
<p><br/><br />
I will attempt to get my writeup for Day 2 posted late tomorrow.<br />
[Update: <a href="http://sns.cs.princeton.edu/2010/06/acm-symposium-on-cloud-computing-socc-2010-day-2/">Day 2</a> posted.]</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=buPxsjz1tqc:8HlEtXfBKJ8:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=buPxsjz1tqc:8HlEtXfBKJ8:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=buPxsjz1tqc:8HlEtXfBKJ8:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=buPxsjz1tqc:8HlEtXfBKJ8:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=buPxsjz1tqc:8HlEtXfBKJ8:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=buPxsjz1tqc:8HlEtXfBKJ8:F7zBnMyn0Lo" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/PrincetonSNS/~4/buPxsjz1tqc" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sns.cs.princeton.edu/2010/06/acm-symposium-on-cloud-computing-socc-2010-day-1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://sns.cs.princeton.edu/2010/06/acm-symposium-on-cloud-computing-socc-2010-day-1/</feedburner:origLink></item>
		<item>
		<title>Erroneous DMCA notices and copyright enforcement, part deux</title>
		<link>http://feedproxy.google.com/~r/PrincetonSNS/~3/iQAQmfaEFKM/</link>
		<comments>http://sns.cs.princeton.edu/2009/12/erroneous-dmca-notices-and-copyright-enforcement-part-deux/#comments</comments>
		<pubDate>Tue, 15 Dec 2009 16:36:32 +0000</pubDate>
		<dc:creator>Mike Freedman</dc:creator>
				<category><![CDATA[Experiences]]></category>
		<category><![CDATA[Peer-to-peer]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[BitTorrent]]></category>
		<category><![CDATA[CoralCDN]]></category>
		<category><![CDATA[DMCA]]></category>

		<guid isPermaLink="false">http://sns.cs.princeton.edu/?p=791</guid>
		<description><![CDATA[[Given my continued use of Ed's Freedom-To-Tinker blog, I'm reposting this article from there.] A few weeks ago, I wrote about a deluge of DMCA notices and pre-settlement letters that CoralCDN experienced in late August. This article actually received a bit of press, including MediaPost, ArsTechnica, TechDirt, and, very recently, Slashdot. I&#8217;m glad that my [...]]]></description>
			<content:encoded><![CDATA[<p>[<em>Given my continued use of Ed's <a href="http://www.freedom-to-tinker.com/">Freedom-To-Tinker</a> blog, I'm reposting this article from <a href="http://www.freedom-to-tinker.com/blog/mfreed/erroneous-dmca-notices-and-copyright-enforcement-part-deux">there</a>.</em>]</p>
<p>A few weeks ago, I <a href="http://www.freedom-to-tinker.com/blog/mfreed/inaccurate-copyright-enforcement-questionable-best-practices-and-bittorrent-specificatio">wrote</a> about a deluge of DMCA notices and pre-settlement letters that CoralCDN experienced in late August.  This article actually received a bit of press, including <a href="http://www.mediapost.com/publications/?fa=Articles.showArticle&amp;art_aid=117924">MediaPost</a>, <a href="http://arstechnica.com/tech-policy/news/2009/11/using-faulty-data-to-demand-settlements-from-innocent-surfers.ars">ArsTechnica</a>, <a href="http://www.techdirt.com/articles/20091130/0616407124.shtml">TechDirt</a>, and, very recently, <a href="http://yro.slashdot.org/story/09/12/08/2116205/Questionable-Best-Effort-Copyright-Enforcement">Slashdot</a>.  I&#8217;m glad that my own experience was able to shed some light on the more insidious practices that are still going on under the umbrella of copyright enforcement.  More transparency is especially important at this time, given the current debate over the <a href="http://en.wikipedia.org/wiki/Anti-Counterfeiting_Trade_Agreement">Anti-Counterfeiting Trade Agreement</a>.</p>
<p>Given this discussion, I wanted to write a short follow-on to my previous post.</p>
<h3>The VPA drops Nexicon</h3>
<p>First and foremost, I was contacted by the founder of the <a href="http://videoprotectionalliance.com/">Video Protection Alliance</a> not long after this story broke.  I was informed that the VPA has not actually developed its own technology to discover users who are actively uploading or downloading copyrighted material, but rather contracts out this role to <a href="http://nexiconinc.com/">Nexicon</a>.  (You can find a comment from Nexicon&#8217;s CTO to my previous article <a href="http://www.freedom-to-tinker.com/blog/mfreed/inaccurate-copyright-enforcement-questionable-best-practices-and-bittorrent-specificatio#comment-109448">here</a>.)  As I was told, the VPA was contracted by certain content publishers to help reduce copyright infringement of (largely adult) content.  The VPA in turn contracted Nexicon to find IP addresses that are participating in BitTorrent swarms of those specified movies.  Using the IP addresses given them by Nexicon, the VPA subsequently would send pre-settlement letters to the network providers of those addresses.</p>
<p>The VPA&#8217;s founder also assured me that their main goal was to reduce infringement, as opposed to collecting pre-settlement money. (And that users had been let off with only a warning, or, in the cases where infringement might have been due to an open wireless network, informed how to secure their wireless network.)  He also expressed surprise that there were false positives in the addresses given to them (beyond said open wireless), especially to the extent that appropriate verification was lacking.  Given this new knowledge, he stated that the VPA dropped their use of Nexicon&#8217;s technology.</p>
<h3>BitTorrent and Proxies</h3>
<p>Second, I should clarify my claims about BitTorrent&#8217;s usefulness with an on-path proxy.  While it is true that the address registered with the BitTorrent tracker is not usable, peers connecting from behind a proxy can still download content from other addresses learned from the tracker.  If their requests to those addresses are optimistically unchoked, they have the opportunity to even engage in incentivized bilateral exchange.  Furthermore, the use of DHT- and gossip-based discovery with other peers&#8212;the latter is termed PEX, for <a href="http://en.wikipedia.org/wiki/Peer_exchange">Peer EXchange</a>, in BitTorrent&#8212;allows their real address to be learned by others.  Thus, through these more modern discovery means, other peers may initiate connections to them, further increasing the opportunity for tit-for-tat exchanges.</p>
<p>Some readers also pointed out that there is good reason why BitTorrent trackers do not just accept any IP address communicated to it via an HTTP query string, but rather use the end-point IP address of the TCP connection.  Namely, any HTTP query parameter can be spoofed, leading to anybody being able to add another&#8217;s IP address to the tracker list.  That would make them susceptible to receiving DMCA complaints, just we experienced with CoralCDN.  From a more technical perspective, their machine would also start receiving unsolicited TCP connection requests from other BitTorrent peers, an easy DoS amplification attack.</p>
<p>That said, there are some additional checks that BitTorrent trackers could do.  For example, if the IP query string or X-Forwarded-For HTTP headers are present, only add the network IP address if it matches the query string or X-Forwarded-For headers.  Additionally, some BitTorrent tracker operators have mentioned that they have certain IP addresses whitelisted as trusted proxies; in those cases, the X-Forwarded-For address is used already.  Otherwise, I don&#8217;t see a good reason (<a href="http://opentracker.blog.h3q.com/2007/02/12/perfect-deniability/">plausible deniability</a> aside) for recording an IP address that is known to be likely incorrect.</p>
<h3>Best Practices for Online Technical Copyright Enforcement</h3>
<p>Finally, my article pointed out a strategy that I clearly thought was insufficient for copyright enforcement: simply crawling a BitTorrent tracker for a list of registered IP addresses, and issuing a infringement notice to each IP address. I&#8217;ll add to that two other approaches that I think are either insufficient, unethical, or illegal&#8212;or all three&#8212;yet have been bandied about as possible solutions.</p>
<ul>
<li> <strong>Wiretapping:</strong> It has been suggested that network providers can perform deep-packet inspection (DPI) on their customer&#8217;s traffic in order to detect copyrighted content.  This approach probably breaks a number of laws (either in the U.S. or elsewhere), creates a dangerous precedent and existing infrastructure for far-flung Internet surveillance, and yet is of dubious benefit given the move to encrypted communication by file-sharing software.</li>
<li> <strong>Spyware:</strong> By surreptitiously installing spyware/malware on end-hosts, one could scan a user&#8217;s local disk in order to detect the existence of potentially copyrighted material.  This practice has even worse legal and ethical implications than network-level wiretapping, and yet politicians such as Senator Orrin Hatch (Utah) have gone as far as declaring that infringers&#8217; computers <a href="http://news.bbc.co.uk/2/hi/entertainment/2999780.stm">should be destroyed</a>.  And it opens users up to the real danger that their computers or information could  be misused by others; witness, for example, the <a href="http://www.cse.umich.edu/~jhalderm/pub/gd/">security weaknesses</a> of China&#8217;s Green Dam software.</li>
</ul>
<p>So, if one starts from the position that copyrights are valid and should be enforceable&#8212;<a href="http://en.wikipedia.org/wiki/Pirate_Party">some dispute this</a>&#8212;what <strong>would</strong> you like to see as best practices for copyright enforcement?</p>
<p>The approach taken by DRM is to try to build a technical framework that restricts users&#8217; ability to share content or to consume it in a proscribed manner.  But DRM has been largely disliked by end-users, mostly in the way it creates a poor user experience and interferes with expected rights (under <a href="http://en.wikipedia.org/wiki/Fair_use">fair-use</a> doctrine).  But DRM is a misleading argument, as copyright infringement notices are needed precisely after &#8220;unprotected&#8221; content has already flown the coop.</p>
<p>So I&#8217;ll start with two properties that I would want all enforcement agencies to take when issuing DMCA take-down notices.  Let&#8217;s restrict this consideration to complaints about &#8220;whole&#8221; content (e.g., entire movies), as opposed to those DMCA challenges over <a href="http://www.eff.org/deeplinks/2008/08/judge-rules-content-owners-must-consider-fair-use-">sampled</a> or <a href="http://remix.lessig.org/">remixed</a> content, which is a legal debate.</p>
<ul>
<li>For any end client suspected of file-sharing, one MUST verify that the client was actually uploading or downloading content, AND that the content corresponded to a valid portion of a copyrighted file.  In BitTorrent, this might be that the client sends or receives a complete file block, and that the file block hashes to the correct value specified in the .torrent file.</li>
<li>When issuing a DMCA take-down notice, the request MUST be accompanied by logged information that shows (a) the client&#8217;s IP:port network address engaged in content transfer (e.g., a record of a TCP flow); (b) the actual application request/response that was acted upon (e.g., BitTorrent-level logs); and (c) that the transferred content corresponds to a valid file block (e.g., a BitTorrent hash).</li>
</ul>
<p>So my question to the readers: What would you add to or remove from this list?  With what other approaches do you think copyright enforcement should be performed or incentivized?</p>
<p><b>Update (12/21/2009)</b>: Discussion about post on <a href="http://yro.slashdot.org/story/09/12/20/192215/DMCA-Takedown-Scandal-Part-Two">Slashdot</a>.</p>
<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=iQAQmfaEFKM:CI3eJwMEqnU:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=iQAQmfaEFKM:CI3eJwMEqnU:D7DqB2pKExk"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=iQAQmfaEFKM:CI3eJwMEqnU:D7DqB2pKExk" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=iQAQmfaEFKM:CI3eJwMEqnU:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?d=qj6IDK7rITs" border="0"></img></a> <a href="http://feeds.feedburner.com/~ff/PrincetonSNS?a=iQAQmfaEFKM:CI3eJwMEqnU:F7zBnMyn0Lo"><img src="http://feeds.feedburner.com/~ff/PrincetonSNS?i=iQAQmfaEFKM:CI3eJwMEqnU:F7zBnMyn0Lo" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/PrincetonSNS/~4/iQAQmfaEFKM" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sns.cs.princeton.edu/2009/12/erroneous-dmca-notices-and-copyright-enforcement-part-deux/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<feedburner:origLink>http://sns.cs.princeton.edu/2009/12/erroneous-dmca-notices-and-copyright-enforcement-part-deux/</feedburner:origLink></item>
	</channel>
</rss>

