David Simmons-Duffin

The arXiv According to arXiv vs. snarXiv

davidsd — Fri, 17 Sep 2010 14:29:13 +0000

After more than 3/4 of a million guesses, in over 50,000 games played in 67 countries, the results are clear: Science sounds like gobbledygook.

arXiv vs. snarXiv has been live for 6 months now, and it’s time to take a look at the results. Here’s how the game works. The user sees two titles: one is the title of an actual theoretical high energy physics paper on the arXiv, and the other is a completely fake title randomly generated by the snarXiv. The user guesses which one is real, finds out if they’re right or wrong, and then starts over with a new pair of titles.

I’ve been recording the result of each guess, originally just out of curiosity. I never expected to get reasonable statistics on the over 120,000 high energy theory papers on the arXiv. But after more than 750,000 guesses, that’s exactly what I’ve got, which means we can do some fun stuff.

The Most Fake-Sounding Papers

First, let’s take a look at the most fake-sounding papers on the arXiv. These are the papers whose titles get the lowest percentage of correct guesses when users try to distinguish them from a randomly generated title. I designed arXiv vs. snarXiv to cycle such papers through the game more often, generating better statistics for them. Here are the 15 most fake-sounding papers with at least 30 guesses.

correct/total	percent	title	authors	date
36/138	26%	Highlights of the Theory	B. Z. Kopeliovich and R. Peschanski	June 1998
41/153	26%	Heterotic on Half-flat	Sebastien Gurrieri, Andre Lukas and Andrei Micu	August 2004
13/48	27%	Relativistic confinement of neutral fermions with a trigonometric tangent potential	Luis B. Castro and Antonio S. de Castro	November 2006
13/47	27%	Toric Kahler metrics and AdS_5 in ring-like co-ordinates	Bobby S. Acharya, Suresh Govindarajan and Chethan N. Gowdigere	December 2006
9/32	28%	Aspects of U_A(1) breaking in the Nambu and Jona-Lasinio model	Alexander A. Osipov, Brigitte Hiller, Veronique Bernard and Alex H. Blin	July 2005
35/116	30%	Energy’s and amplitudes’ positivity	Alberto Nicolis, Riccardo Rattazzi and Enrico Trincherini	December 2009
16/53	30%	A covariant diquark-quark model of the nucleon in the Salpeter approach	Volker Keiner	March 1996
51/167	30%	Noncommutative Bundles and Instantons in Tehran	Giovanni Landi and Walter van Suijlekom	March 2006
38/124	30%	Baby steps beyond rainbow-ladder	Richard Williams and Christian S. Fischer	May 2009
13/42	30%	Transverse force on a moving vortex with the acoustic geometry	Peng-ming Zhang, Li-ming Cao, Yi-shi Duan and Cheng-kui Zhong	January 2005
44/138	31%	Testing factorization	Gudrun Hiller	November 2001
19/59	32%	Prospects for Mirage Mediation	Aaron Pierce and Jesse Thaler	April 2006
49/152	32%	Determining the dual	Arjan Keurentjes	July 2006
11/34	32%	Gravitational Dressing of Renormalization Group	I. R. Klebanov, I. I. Kogan and A. M. Polyakov	September 1993
53/163	32%	Charging Black Saturn?	Brenda Chng, Robert Mann, Eugen Radu and Cristian Stelea	September 2008
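
The ranking above boils down to dropping papers with too few guesses and sorting the rest by their fraction of correct guesses. Here is a minimal Python sketch of that step, assuming the tallies are available as (title, correct, total) tuples. The function name and data layout are hypothetical, not the actual arXiv vs. snarXiv code. The percentages in the table look like they are rounded down, so the sketch does the same.

```python
def rank_fake_sounding(tallies, min_guesses=30, top=15):
    """Return the `top` papers with the lowest fraction of correct guesses.

    `tallies` is an iterable of (title, correct, total) tuples (an assumed
    layout). Papers with fewer than `min_guesses` guesses are skipped.
    """
    eligible = [t for t in tallies if t[2] >= min_guesses]
    # Sort by the exact fraction correct/total, most fake-sounding first.
    eligible.sort(key=lambda t: t[1] / t[2])
    # Report whole-number percentages, rounded down as in the table.
    return [(title, correct, total, 100 * correct // total)
            for title, correct, total in eligible[:top]]


# Three rows from the table above, plus one paper below the 30-guess cutoff.
sample = [
    ("Testing factorization", 44, 138),
    ("Heterotic on Half-flat", 41, 153),
    ("Highlights of the Theory", 36, 138),
    ("Dyonic solution of Horava-Lifshitz Gravity", 3, 12),
]
for title, correct, total, pct in rank_fake_sounding(sample):
    print(f"{correct}/{total}\t{pct}%\t{title}")
```

On this sample, the paper with only 12 guesses is excluded even though it has the lowest fraction, and the remaining three come out in the same order as in the table.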

A tip for future arXiv vs. snarXiv players: the snarXiv is more grammatical than the arXiv. If you see “Heterotic on Half-flat” and think “uh… Half-flat what?” then you can be nearly certain it’s a real scientific paper that was written to advance the boundaries of human knowledge.

As a bonus, here are some up-and-comers: papers with between 10 and 30 guesses, a spectacularly low percentage of which were correct.

correct/total	percent	title	authors	date
2/13	15%	Actions and Fermionic symmetries for D-branes in bosonic backgrounds	Donald Marolf, Luca Martucci and Pedro J. Silva	June 2003
3/15	20%	CERN LEP2 constraint on 4D QED having dynamically generated spatial dimension	Gi-Chol Cho, Etsuko Izumi and Akio Sugamoto	December 2001
6/27	22%	Families as Neighbors in Extra Dimension	G. Dvali and M. Shifman	January 2000
6/25	24%	The Greening of Quantum Field Theory: George and I	Julian Schwinger	October 1993
3/12	25%	Dyonic solution of Horava-Lifshitz Gravity	Eoin Ó Colgáin and Hossein Yavartanoo	April 2009

Stump the Experts

People with all sorts of backgrounds play arXiv vs. snarXiv. My guess is that non-physicists are extremely suspicious of ridiculous-sounding words that have been co-opted for technical purposes — “‘Mirage Mediation?’ that can’t be real!” High energy physicists, on the other hand, are used to their own unfortunate verbiage. Unfortunately for them, however, there are still plenty of papers on the arXiv that sound like they were written by a computer.

Let’s define an “expert” game to have at least 5 guesses and a score of 80% or higher. So far, there have been 3,916 expert games out of 49,258 total (as of September 16, 2010). Tallying up all the guesses in these games, we can get a sense for which papers stymie even those who excel in arXiv vs. snarXiv. Here are the top 10.

correct/total	percent	title	authors	date
1/6	16%	Fields	W. Siegel	December 1999
1/5	20%	An extended model for monopole catalysis of nucleon decay	Y. Brihaye, D. Yu. Grigoriev, V. A. Rubakov and D. H. Tchrakian	November 2002
1/5	20%	Space-time symmetry restoration in cosmological models with Kalb–Ramond and scalar fields	E. Di Grezia, G. Mangano and G. Miele	July 2004
1/5	20%	Towards Evaluation of Stringy Non-Perturbative Effects	R. Brustein and B. A. Ovrut	November 1995
1/5	20%	Dimensional Reduction	Corinne A. Manogue and Tevian Dray	July 1998
1/5	20%	The Ridge, the Glasma and Flow	Larry McLerran	December 2008
1/5	20%	Generalized Bunching Parameters and Multiplicity Fluctuations in Restricted Phase-Space Bins	S. V. Chekanov, W. Kittel and V. I. Kuvshinov	June 1996
2/7	28%	BGWM as Second Constituent of Complex Matrix Model	A. Alexandrov, A. Mironov and A. Morozov	June 2009
3/10	30%	Testing factorization	Gudrun Hiller	November 2001
2/6	33%	Supersymmetric Potentials in Einstein-Cartan-Brans-Dicke Cosmology	L. C. Garcia de Andrade	April 2001
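
The “expert” filter defined above (at least 5 guesses, score of 80% or higher) followed by a per-paper tally could be sketched like this in Python. The data layout, a game as a list of (title, was_correct) pairs, is an assumption for illustration, and the example games are made up.

```python
from collections import defaultdict


def expert_tallies(games, min_guesses=5, min_percent=80):
    """Tally guesses per paper, counting only guesses made in "expert" games.

    `games` is a list of games; each game is a list of (title, was_correct)
    pairs (an assumed layout). Returns {title: (correct, total)}.
    """
    tallies = defaultdict(lambda: [0, 0])  # title -> [correct, total]
    for game in games:
        correct = sum(ok for _, ok in game)
        # Integer comparison avoids floating-point trouble at exactly 80%.
        if len(game) < min_guesses or 100 * correct < min_percent * len(game):
            continue
        for title, ok in game:
            tallies[title][0] += ok
            tallies[title][1] += 1
    return {title: tuple(counts) for title, counts in tallies.items()}


# Made-up games: only the first one qualifies as an expert game.
games = [
    [("Fields", False), ("Dimensional Reduction", True), ("Testing factorization", True),
     ("Determining the dual", True), ("Baby steps beyond rainbow-ladder", True)],   # 4/5
    [("Fields", True), ("Dimensional Reduction", False), ("Testing factorization", False),
     ("Determining the dual", True), ("Baby steps beyond rainbow-ladder", False)],  # 2/5
    [("Fields", True), ("Dimensional Reduction", True), ("Testing factorization", True)],  # too short
]
print(expert_tallies(games)["Fields"])
```

Only the first game passes the filter, so the correct guess on “Fields” in the other two games does not count toward its tally.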

I have to say, some of these definitely sound like they came straight from the snarXiv. I really don’t know what Glasma is, though. The snarXiv does not have Glasma.

Famous Physicists

Ok, so the papers above sound especially ridiculous. However, the average across all 750,000 guesses on all papers is still only 59% correct. While better than a monkey, this is not particularly good. Who’s responsible? Surely not the world’s top minds?

Here’s a ranking of some of the most-highly cited physicists on the arXiv (H-index of 40 or higher, with a few other notable folks thrown in), according to the percentage of correct guesses on their papers.^[1] A smaller percentage means that their papers sound more like complete flapdoodle. I should note for the sake of my career that this has absolutely nothing to do with the quality of said papers.

178/360	49%	Frederik Denef	246/429	57%	Neil Turok	289/488	59%	Joseph Polchinski
129/253	50%	A. M. Polyakov	310/539	57%	Lisa Randall	573/963	59%	Edward Witten
69/135	51%	Steven Weinberg	353/612	57%	Dimitri Nanopoulos	1194/1981	60%	John Ellis
189/357	52%	Howard Georgi	209/362	57%	Herman Verlinde	368/610	60%	Shamit Kachru
764/1419	53%	Cumrun Vafa	450/778	57%	Gia Dvali	315/520	60%	Roman Jackiw
332/615	53%	Leonard Susskind	297/513	57%	Nima Arkani-Hamed	311/510	60%	Hirosi Ooguri
6/11	54%	H. David Politzer	416/717	58%	Nathan Seiberg	592/966	61%	Hitoshi Murayama
63/115	54%	Gerard ’t Hooft	94/162	58%	Lawrence M. Krauss	118/192	61%	David J. Gross
282/514	54%	Michael B. Green	368/633	58%	Thomas Banks	381/606	62%	Lawrence J. Hall
348/622	55%	Frank Wilczek	535/917	58%	Igor R. Klebanov	118/186	63%	Stephen Hawking
164/293	55%	Erik Verlinde	412/705	58%	Steven S. Gubser	383/600	63%	Aneesh V. Manohar
109/194	56%	Sheldon Glashow	356/603	59%	Juan Maldacena	170/257	66%	Michael Peskin
406/718	56%	Savas Dimopoulos	481/813	59%	Andrew Strominger	222/326	68%	John H. Schwarz
351/615	57%	Mark B. Wise	174/294	59%	Brian R. Greene

I’d especially like to congratulate Frederik on his anomalously low 49 percent. You make us all worse than a monkey, Frederik.^[2]

The Blogosphere

Now let’s turn to even more famous people: physics bloggers (and authors). Here are some of the most prominent, ranked from most fake-sounding papers (smallest percentage) to least fake-sounding papers (largest percentage). I think there’s a lesson here somewhere, though it’s hard to be sure in some cases, due to small statistics.

117/217	53%	Sean M. Carroll [blog]	68/112	60%	Lubos Motl [blog]
129/234	55%	Jacques Distler [blog]	202/331	61%	Mark Trodden [blog]
343/609	56%	Clifford V. Johnson [blog]	98/159	61%	Sabine Hossenfelder [blog]
56/97	57%	John C. Baez [blog]	220/352	62%	Lee Smolin [site]
306/505	60%	JoAnne Hewett [blog]	6/8	75%	Peter Woit [blog]

Fake-Sounding and Real-Sounding Words

Suppose you’re writing a scientific paper, and you want to ensure that the general public doesn’t think it’s complete malarkey. How do you do it? Here are the 10 words with the lowest percentage of correct guesses (most fake-sounding) for titles containing those words (to ensure no single paper dominates this percentage, I’m requiring that each word appear in at least 5 titles).

174/521	33%	Saturn	76/195	38%	multiskyrmions
66/196	33%	half-flat	100/252	39%	secret
69/189	36%	charging	99/249	39%	perturbing
54/147	36%	caustic	78/194	40%	pollution
80/208	38%	highlights	87/214	40%	enough

Avoid these words! Turns out people don’t believe in “multiskyrmions.” Also, you shouldn’t mention “Saturn,” or use normal English words like “secret” or “enough.” By contrast, here’s a list of the 10 words with the highest percentage of correct guesses (most realistic-sounding) for titles containing those words.

76/100	76%	cp-even	87/122	71%	spin-spin
75/101	74%	Argon	90/127	70%	anomaly-free
140/191	73%	two-particle	127/180	70%	atlas
74/102	72%	self-coupling	70/100	70%	supersymmetry-breaking
84/117	71%	unusual	128/183	69%	naked

In other words, if you want to be taken seriously as a scientist, you should call your next paper Unusual Naked, but Anomaly-Free.
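
The per-word numbers above pool the guesses over every title containing a given word, keeping only words that show up in enough different titles. A rough Python sketch of that bookkeeping follows. The tokenization (lowercasing, keeping hyphenated words whole) and the (title, correct, total) layout are assumptions, since the post doesn't say how titles were split into words.

```python
import re
from collections import defaultdict


def word_tallies(tallies, min_titles=5):
    """Pool guesses over all titles containing each word.

    `tallies` holds (title, correct, total) tuples (an assumed layout).
    Returns {word: (correct, total)} for words found in at least
    `min_titles` different titles.
    """
    pooled = defaultdict(lambda: [0, 0, 0])  # word -> [correct, total, titles]
    for title, correct, total in tallies:
        # Assumed tokenization: lowercase, hyphenated words kept whole,
        # each word counted once per title.
        for word in set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower())):
            pooled[word][0] += correct
            pooled[word][1] += total
            pooled[word][2] += 1
    return {w: (c, t) for w, (c, t, n) in pooled.items() if n >= min_titles}


# Three rows from the first table; only "of" appears in all three titles.
sample = [
    ("Highlights of the Theory", 36, 138),
    ("Gravitational Dressing of Renormalization Group", 11, 34),
    ("Relativistic confinement of neutral fermions with a trigonometric tangent potential", 13, 48),
]
print(word_tallies(sample, min_titles=3))
```

The threshold is lowered to 3 here only because the sample is tiny; the post uses 5.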

Incidence of Apparent Hooey in Various Subfields

Papers on the arXiv can be associated with one or more physics subfields. Here’s a ranking of subfields with at least 50 guesses from most fake-sounding to least fake-sounding.

39/81	48%	Adaptation and Self-Organizing Systems	134/226	59%	Combinatorics
48/98	48%	Popular Physics	638/1075	59%	Other
172/340	50%	Data Analysis, Statistics and Probability	116/195	59%	Accelerator Physics
212/414	51%	History of Physics	140/235	59%	Soft Condensed Matter
167/307	54%	Operator Algebras	1898/3172	59%	Differential Geometry
136/248	54%	Rings and Algebras	114/190	60%	Group Theory
101/184	54%	Disordered Systems and Neural Networks	195/324	60%	Fluid Dynamics
282/512	55%	Pattern Formation and Solitons	216/358	60%	Functional Analysis
128/227	56%	Classical Analysis and ODEs	94/155	60%	Dynamical Systems
295/523	56%	Representation Theory	96/158	60%	Algebraic Topology
39/69	56%	Probability	500/820	60%	Atomic Physics
67/118	56%	Geophysics	180/295	61%	Geometric Topology
138/242	57%	Number Theory	206/337	61%	Symplectic Geometry
757/1327	57%	Strongly Correlated Electrons	41/67	61%	Symbolic Computation
4724/8190	57%	Quantum Algebra	103/168	61%	Classical Physics
2696/4666	57%	Exactly Solvable and Integrable Systems	54/88	61%	Complex Variables
766/1317	58%	Superconductivity	758/1229	61%	Mesoscopic Systems and Quantum Hall Effect
88/151	58%	Computational Physics	50/81	61%	Materials Science
1763/3011	58%	Algebraic Geometry	50/80	62%	Category Theory
8306/14161	58%	Mathematical Physics	76/120	63%	K-Theory and Homology
2072/3511	59%	Statistical Mechanics	59/93	63%	Instrumentation and Detectors
279/472	59%	Chaotic Dynamics	55/86	63%	Spectral Theory
58/98	59%	Analysis of PDEs	82/121	67%	Optics
99/167	59%	Plasma Physics

Performance by Country

These last few statistics have less to do with the arXiv, and more to do with arXiv vs. snarXiv itself. I have location data for the most recent quarter-million guesses.^[3] So let’s look at how performance varies across the globe. Here’s a ranking of correct guesses from countries with at least 2000 total guesses.^[4]

1400/2256	62%	Austria	2393/4189	57%	Japan
10665/17467	61%	Germany	89632/158059	56%	United States
3111/5183	60%	Israel	12137/21471	56%	United Kingdom
1658/2825	58%	Spain	7233/13053	55%	Canada
4134/7080	58%	Italy	2281/4183	54%	India
5804/9960	58%	France	1652/3037	54%	Finland
2355/4071	57%	Switzerland	2270/4266	53%	Russian Federation
3485/6083	57%	Australia	3496/6593	53%	Netherlands
1690/2958	57%	Sweden	1061/2001	53%	Argentina

It looks like having English as a first language is not particularly helpful.

Performance by School

Finally, universities account for about 1/8th of the total number of guesses on arXiv vs. snarXiv. Altogether, their performance is almost exactly average (59%). However, there are variations… Here’s a ranking of schools with at least 400 total guesses.

1145/1388	82%	University of Colorado at Boulder	560/963	58%	The University of Chicago
553/785	70%	University of Regensburg	278/481	57%	UC Santa Barbara
317/481	65%	University of Washington	264/461	57%	Madison
718/1097	65%	Penn State	542/967	56%	University of Cambridge
1001/1538	65%	Berkeley	237/426	55%	Cornell University
549/849	64%	Princeton University	476/861	55%	UC Davis
1277/1981	64%	MIT	471/855	55%	Columbia University
444/691	64%	Imperial College London	859/1577	54%	California Institute of Technology
349/544	64%	Monash University	393/723	54%	Harvard University
376/597	62%	University of Illinois at Urbana-Champaign	363/671	54%	Stanford University
287/457	62%	Hebrew University of Jerusalem	219/423	51%	Yale University
284/461	61%	The University of Edinburgh	308/599	51%	University of Minnesota
261/435	60%	Boston University	281/551	50%	University of Warwick

Congratulations to the University of Colorado at Boulder, which is the clear winner here.^[5] Also, I just wanted to say: seriously Harvard? Seriously?

Disclaimer

Finally, before heading into the comments, let me do a crapload of disclaiming. This is obviously the least scientific survey of science ever conducted. The ranking of a paper as “fake-sounding” or “realistic-sounding” has as much to do with the peculiarities of the snarXiv as with the arXiv itself.^[6] Also, although 750,000 guesses is a lot in total — such that I’m fairly certain that the 59% overall average isn’t going anywhere — the statistics get dicey when chopped into small bits (see what I did there?). To be sure of anything, I guess we’ll just have to wait until the blogosphere writes more papers.
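
To put a number on “dicey”: if each guess is treated as an independent coin flip (an assumption, and a generous one), the uncertainty on a percentage from n guesses is roughly sqrt(p(1-p)/n). A small Python sketch, with a hypothetical helper name:

```python
import math


def percent_with_error(correct, total):
    """Return (percent correct, standard error in percentage points).

    Uses the binomial normal approximation sqrt(p * (1 - p) / n), which is
    itself rough when `total` is small.
    """
    p = correct / total
    se = math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * se


# Two tallies from the tables above: one tiny sample, one larger one.
for name, correct, total in [("Peter Woit", 6, 8), ("Frederik Denef", 178, 360)]:
    pct, err = percent_with_error(correct, total)
    print(f"{name}: {pct:.0f}% +/- {err:.0f}")
```

With 8 guesses the error bar is about 15 percentage points; with 360 guesses it shrinks to about 3.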

[1] I couldn’t find an h-index ranking of physicists that was more recent than this one from 2005. I’m probably missing lots of names. Let me know and I’ll add them.
[2] In addition to being a top-notch physicist, Frederik also happens to be one of the world’s best arXiv vs. snarXiv players.
[3] When I told my software-startup friend about arXiv vs. snarXiv and mentioned that I wasn’t logging ip addresses, he looked at me very seriously and said: “You’re not logging ip addresses? You should always log ip addresses.”
[4] The high-scores leader “Ed” is from Portugal, which would have done very well in the rankings had I included his guesses. Unfortunately, “Ed” was cheating (probably already obvious to everyone, though I can also prove it with certitude), so I’ve removed all his guesses from this analysis.
[5] Or rather, congratulations to the two dudes at the University of Colorado at Boulder who together played over 118 games with a total of 1215 guesses and an average score of 85%.
[6] I already got slammed for this on marginalrevolution.com, and I’ll surely get slammed again.

Lie Group Computations With Python

davidsd — Wed, 31 Mar 2010 06:08:12 +0000

lie is a python module for computations with Lie groups, Lie algebras, representations, root systems, and more.

I based it on the computer algebra package LiE, written by M. A. A. van Leeuwen, A. M. Cohen and B. Lisser in the early 90’s. They chose to implement a proprietary scripting language as a wrapper for all the fancy mathematical algorithms. While this language is useful for interactive computations and short scripts, python is more expressive and powerful — definitely what you want when exploring your favorite exceptional group.

A Fun Example

Here’s an example of using lie to do a calculation that’s near and dear to every high energy theorist’s heart. We’ll show how the 10 + 5bar + 1 representation of SU(5) contains a single standard model generation. First we’ll fire up python and import the lie module.

Python 3.1 (r31:73578, Jun 27 2009, 21:49:46)
>>> from lie import *

Let’s let g be the GUT group SU(5) (A4 in Cartan’s classification).

>>> g = A4; g.pprint()
'SU(5)'

We’ll let the representation r be a single GUT generation: 10 + 5bar + 1 of SU(5).

>>> r = g.fund().dual() + g.alt_tensor(2) + g.trivial()
>>> r.pprint()
'(1) + (5b) + (10)'

Here, we’ll check that SU(2)×SU(3) is a subgroup of SU(5). Then we’ll get the restriction matrix for SU(2)×SU(3) in SU(5), and add the U(1) part by hand:

>>> [h.pprint() for h in g.max_subgrps()]
['SU(4)', 'SO(5)', 'SU(2)xSU(3)']
>>> m = g.res_mat(A1*A2)
>>> m = m.transpose().concat(mat([[-2,1,-1,2]])).transpose()

Finally, we decompose r under the group SU(2)×SU(3)×U(1). The first two numbers are the SU(2) and SU(3) rep dimensions, and the third is the U(1) charge (which differs from the conventional hypercharge by a factor of 6).

>>> r.branch(A1*A2*T1, m).pprint() 
'(1,1,0) + (1,1,6) + (1,3b,-4) + (1,3b,2) + (2,1,-3) + (2,3,1)'

These are indeed the representations of a generation of standard model fermions (written as left-handed Weyl spinors). In order, we have: a heavy neutrino, the positron, the anti-up quark, the anti-down quark, the lepton doublet, and the left-handed quark doublet. Yay!
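
As a sanity check that doesn't need the lie module at all, the branching result can be typed in by hand and tested with plain Python. This is only a sketch: the list below is copied from the output above, with “3b” recorded as dimension 3. The dimensions should add up to 16 = 10 + 5 + 1, the U(1) charges should sum to zero over all states (the generator is traceless in SU(5)), and the sum of cubed charges should vanish too, as expected for an anomaly-free generation.

```python
from fractions import Fraction

# (SU(2) dimension, SU(3) dimension, U(1) charge), copied from the branching above.
generation = [(1, 1, 0), (1, 1, 6), (1, 3, -4), (1, 3, 2), (2, 1, -3), (2, 3, 1)]

# Each entry stands for d2 * d3 states sharing the same U(1) charge.
total_dim = sum(d2 * d3 for d2, d3, _ in generation)
charge_sum = sum(d2 * d3 * q for d2, d3, q in generation)
cubic_sum = sum(d2 * d3 * q**3 for d2, d3, q in generation)

# Conventional hypercharge is the U(1) charge divided by 6.
hypercharges = [Fraction(q, 6) for _, _, q in generation]

print(total_dim, charge_sum, cubic_sum)
print([str(y) for y in hypercharges])
```

The hypercharges come out as 0, 1, -2/3, 1/3, -1/2 and 1/6, matching the usual assignments for the six multiplets.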

Background

I decided I needed something like lie last summer while working on Superconformal Flavor Simplified with David Poland. Since none of the mathematical tools that physicists commonly use (e.g. Mathematica) had any routines for Lie group representation theory, I was delighted to stumble upon LiE, which implemented basically all the algorithms I wanted. However, the authors’ choice to create their own scripting language was a little silly, and definitely not future-proof. For instance, while LiE implements lists of numbers (vectors) and lists of lists of numbers (matrices), it doesn’t know anything about lists of lists of lists of numbers. This might seem insignificant, but it was frustrating for two reasons:

Every real programming language on earth implements lists of lists of lists.
I needed lists of lists of lists.

LiE uses a list of integers to represent the highest weight vector of a representation, so if I want lists of representations, like in a model, I need lists of lists. And if I want lists of models, I need lists of lists of lists. Turns out I was trying to write a whole paper about lists of models.
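
In Python the nesting is free. The sketch below writes highest weights as plain lists of Dynkin labels (the usual convention for SU(5); this is not necessarily how lie stores them internally), builds a model as a list of those, and a list of models on top. The dim_su helper is a hypothetical stand-in that applies the Weyl dimension formula for SU(n+1), just so the example prints something checkable.

```python
from fractions import Fraction


def dim_su(weight):
    """Dimension of the SU(n+1) irrep with Dynkin labels `weight` (Weyl formula)."""
    n = len(weight)
    dim = Fraction(1)
    for i in range(n):
        for j in range(i, n):
            # One factor per positive root: sum of (label + 1) over the root's
            # span, divided by the length of the span.
            span = j - i + 1
            dim *= Fraction(sum(weight[i:j + 1]) + span, span)
    return int(dim)


# A representation is a list of integers (a highest weight).
trivial = [0, 0, 0, 0]   # 1 of SU(5)
fund_bar = [0, 0, 0, 1]  # 5bar
antisym = [0, 1, 0, 0]   # 10

# A model is a list of representations: a list of lists.
one_generation = [trivial, fund_bar, antisym]
# A list of models is a list of lists of lists.
models = [one_generation, 3 * one_generation]

for model in models:
    print([dim_su(rep) for rep in model])
```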

Future Work

I developed lie to the point where it was capable of doing what I needed for Superconformal Flavor Simplified. There’s definitely some debugging and restructuring to be done. For instance, the source code currently includes the entire lexer and parser from the original LiE, just because I haven’t bothered to extract them from everything else (I originally took the approach of trying to modify the LiE source as little as possible). It’s a work in progress, but lie has already been useful to me, so I figure others might benefit from it too.

So check out the source, give it a whirl, and if it doesn’t do what you want, help me make it better!

The snarXiv

davidsd — Thu, 11 Mar 2010 02:32:51 +0000

The snarXiv is a random high-energy theory paper generator incorporating all the latest trends, entropic reasoning, and exciting moduli spaces. The arXiv is similar, but occasionally less random.^[1]

Actually, the snarXiv only generates tantalizing titles and abstracts at the moment, while the arXiv delivers matching papers as well. Details of the implementation are below.^[2] I’m the author, and I don’t remember exactly why I decided to do this. I did already have the framework lying around from a previous project, and I swear I spent more time doing research last weekend than implementing snarXiv.org.

Suggested Uses for the snarXiv^[3]

If you’re a graduate student, gloomily read through the abstracts, thinking to yourself that you don’t understand papers on the real arXiv any better.
If you’re a post-doc, reload until you find something to work on.
If you’re a professor, get really excited when a paper claims to solve the hierarchy problem, the little hierarchy problem, the mu problem, and the confinement problem. Then experience profound disappointment.
If you’re a famous physicist, keep reloading until you see your name on something, then claim credit for it.
Everyone else should play arXiv vs. snarXiv.^[4]

Context-Free Grammars

The snarXiv is based on a context free grammar (CFG) — basically a set of rules for computer-generated mad libs.^[5] Each rule in a CFG consists of a term, and a set of choices for how to make that term. The choices can contain text, or other terms, or even refer recursively to the term being defined. The CFG syntax used on the snarXiv is a collection of statements “term ::= choices”, where choices is a list of possibilities separated by “|”. Some possibilities are just text, but the ones that look like “<newterm>” are directions to go find the definition for newterm and fill it in. For instance, the following grammar

nounphrase ::= <noun> | <adj> <adj> <noun> | super <nounphrase>
noun ::= apple | pear | mailman
adj ::= smelly | chartreuse | enormous

can produce nounphrases like “apple,” “enormous smelly mailman,” or “super super smelly chartreuse mailman.” The snarXiv’s grammar is 622 lines long, and ends like this:

...
morecomments ::=  figures | JHEP style | Latex file
  | no figures | BibTeX | JHEP3 | typos corrected
  |  tables | added refs | minor changes
  | minor corrections | published in PRD
  | reference added | pdflatex
  | based on a talk given on 's 0th birthday
  | talk presented at the international  workshop
comments ::= <smallinteger> pages | <comments>, <morecomments>

primarysubj ::= High Energy Physics - Theory (hep-th)|
                High Energy Physics - Phenomenology (hep-ph)|
secondarysubj ::= Nuclear Theory (nucl-th)|
      Cosmology and Extragalactic Astrophysics (astro-ph.CO)|
      General Relativity and Quantum Cosmology (gr-qc)|
      Statistical Mechanics (cond-mat.stat-mech)
papersubjects ::= <primarysubj> | <papersubjects>; <secondarysubj>

paper ::= <title> \\ <authors> \\ <comments> \\ <papersubjects> \\ <abstract>
...
</pre>
<p>The coolest and most natural thing to do with a CFG is exploit recursiveness as much as possible.  The more recursion built in, the less predictable and richer the output.  For instance, the following definition of a “space” has three rules: <em>space</em>, <em>singspace</em>, <em>pluralspace</em>, which refer recursively to each other in many different ways, allowing for a huge number of possibilities.</p>
<pre class="brush: grammar; title: ; notranslate">
space ::= <pluralspace> | <singspace> | <mathspace>

singspace ::= a <spacetype> | a <spaceadj> <spacetype>
   | <properspacename> | <spaceadj> <properspacename>
   | <mathspace> | <mathspace>
   | a <bundletype> bundle over <space>
   | <singspace> fibered over <singspace>
   | the moduli space of <pluralspace>
   | a <spacetype> <spaceproperty>
   | the <spacepart> of <space>
   | a <group> <groupaction> of <singspace>
   | the near horizon geometry of <singspace>
pluralspace ::= <spacetype>s | <spaceadj> <spacetype>s
   | <n> copies of <mathspace>
   | <pluralspace> fibered over <space>
   | <spacetype>s <spaceproperty>
   | <bundletype> bundles over <space>
   | moduli spaces of <pluralspace>
   | <group> <groupaction>s of <pluralspace>
</pre>
<p>Of course, there’s also a danger that in a very small number of cases the output might be a little pathological.  The <em>nounphrase</em> example above, for instance, can produce any phrase of the form “super super … super enormous pear.”  The snarXiv similarly occasionally mentions QFTs living on “the moduli space of moduli spaces of moduli spaces of moduli spaces of moduli spaces of SU(3) bundles over elliptically fibered Enriques surfaces.”  Too much recursion can also quickly lead to exponentially long abstracts, which are even harder to read all the way through than the usual ones on the arXiv.</p>
<h3>The Guts</h3>
<p>To get some actual output from the grammar definition, the most straightforward thing would be to write a script that reads in the grammar, and works its way down the tree, starting with the top term, filling in definitions recursively until it gets a block of text.  Instead of using an external script, the snarXiv compiles each grammar into its own program, a technique that originated from a freshman CS project and evolved minimally from there — it’s less straightforward, not clearly better, but maybe a bit more fun.  A <a href="http://snarxiv.org/grammar/compile-grammar">perl script</a> compiles the grammar file into <a href="http://en.wikipedia.org/wiki/OCaml">OCaml</a> code (<a href="http://snarxiv.org/grammar/snarxiv.ml">snarxiv.ml</a>):</p>
<pre class="brush: ocaml; title: ; notranslate">
type phrase = Str of string | Opts of phrase array array

let _ = Random.self_init ()

let randelt a = a.(Random.int (Array.length a))
let rec print phr = match phr with
  Str  s       -> print_string s
| Opts options ->
    let parts = randelt options in
    Array.iter print parts

(* Grammar definitions *)
let rec top = Opts [|
  [| paper;|];
|]

...

and comments = Opts [|
  [| smallinteger; Str " pages";|];
  [| comments; Str ", "; morecomments;|];
|]

and primarysubj = Opts [|
  [| Str "High Energy Physics - Theory (hep-th)";|];
  [| Str "High Energy Physics - Phenomenology (hep-ph)";|];
|]

and secondarysubj = Opts [|
  [| Str "Nuclear Theory (nucl-th)";|];
  [| Str "Cosmology and Extragalactic Astrophysics (astro-ph.CO)";|];
  [| Str "General Relativity and Quantum Cosmology (gr-qc)";|];
  [| Str "Statistical Mechanics (cond-mat.stat-mech)";|];
|]

and papersubjects = Opts [|
  [| primarysubj;|];
  [| papersubjects; Str "; "; secondarysubj;|];
|]

and paper = Opts [|
  [| title; Str " \\\\ "; authors; Str " \\\\ "; comments; Str " \\\\ "; papersubjects; Str " \\\\ "; abstract; Str " ";|];
|]

let _ = print top
let _ = print_string "\n"
</pre>
<p>And snarxiv.ml is now a specialized program that, when compiled and run, spits out a paper title and abstract.  This setup is more elaborate than necessary, but OCaml is a lovely language for recursive structures, and the code is nice and simple.  OCaml is also <a href="http://shootout.alioth.debian.org/u64q/which-programming-languages-are-fastest.php?calc=chart&ocaml=on&python3=on">fast</a>, allowing the snarXiv to generate papers even more swiftly than your favorite python script, or Ed Witten in the 80’s.</p>
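<p>For comparison, the “most straightforward” approach mentioned earlier (read in the grammar, then fill in definitions recursively from the top term) might look something like the Python sketch below. This is not the snarXiv's code. The function names are made up, and the parser only handles the simple “term ::= choices” syntax described above, using the small nounphrase grammar as input.</p>

```python
import random
import re


def parse_grammar(text):
    """Parse 'term ::= choice | choice' rules; a rule may span several lines."""
    rules = {}
    # A new rule starts at any line that begins with 'name ::='.
    for chunk in re.split(r"\n(?=\S+\s*::=)", text.strip()):
        lhs, rhs = chunk.split("::=", 1)
        # Each choice becomes a list of tokens: '<term>' references or plain text.
        rules[lhs.strip()] = [
            re.findall(r"<[^>]+>|[^<]+", " ".join(choice.split()))
            for choice in rhs.split("|")
        ]
    return rules


def expand(rules, term, rng):
    """Pick a random choice for `term` and expand its references recursively."""
    out = []
    for token in rng.choice(rules[term]):
        if token.startswith("<"):
            out.append(expand(rules, token[1:-1], rng))
        else:
            out.append(token)
    return "".join(out)


GRAMMAR = """
nounphrase ::= <noun> | <adj> <adj> <noun> | super <nounphrase>
noun ::= apple | pear | mailman
adj ::= smelly | chartreuse | enormous
"""

rules = parse_grammar(GRAMMAR)
print(expand(rules, "nounphrase", random.Random()))
```

<p>Each run prints one random phrase, such as “pear” or “super enormous smelly mailman.” Splitting choices on every “|” is a simplification, so this sketch can't handle literal “|” characters in the text.</p>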
<h3>Other CFGs</h3>
<p>A few years ago, the CFG-based CS paper generator <a href="http://pdos.csail.mit.edu/scigen/">SCIgen</a> made a splash by getting one of their papers accepted to the conference SCI 2005.  Their website has details, and links to some other random generators around the web.</p>
<ol class="footnotes"><li id="footnote_0_1959" class="footnote"> For those who aren’t high energy physicists, and are still interested (though I can’t imagine who that would be), the “X” in <em>arXiv</em> or <em>snarXiv</em> is supposed to be a <a href="http://en.wikipedia.org/wiki/Chi_(letter)">greek chi</a>.  We’re meant to pronounce them like <em>archive</em> (as in “archive of physics papers”) and <em>snarchive</em> (as in “snarky archive of physics papers”). </li><li id="footnote_1_1959" class="footnote"> Please don’t sue me, arXiv.org, for stealing your CSS file and your beautiful color scheme.  Also, Werner Heisenberg, if you’re still alive, please don’t sue me or my computer for libel. </li><li id="footnote_2_1959" class="footnote"> If someone pretentious is annoying you, and you use the <a href="http://davidsd.org/theorem">theorem generator</a> instead, you could try something like <a href="http://undergrad.davidsd.org/theorem/applications.html">this</a>. </li><li id="footnote_3_1959" class="footnote">And check out <a href="http://davidsd.org/2010/09/the-arxiv-according-to-arxiv-vs-snarxiv/">the results</a>.  Also, pick up the <a href="http://snarxiv.org/vs-arxiv/img/snarxraft.jpg">unofficial arXiv vs. snarXiv wallpaper</a>.</li><li id="footnote_4_1959" class="footnote"> I first encountered these in freshman year of college in an assignment for <a href="http://www.fas.harvard.edu/~lib51/">CS51: Abstraction and Design in Computer Programming</a>.  We had to implement a CFG in <a href="http://en.wikipedia.org/wiki/Lisp_programming_language">LISP</a>, and the cleverest won its author lunch at the faculty club.  The eventual winner was my friend Matt Gline’s <a href="http://davidsd.org/theorem/">theorem generator</a>, which has since <a href="http://davidsd.org/2009/01/the-real-theorem-generator-a-context-free-grammar/">been enhanced with LaTeX, commutative diagrams, ajax, and stuff like that.</a> </li></ol>
</article>
<article>
<h1>Energy Secretary! Evolve!</h1>
<p>davidsd — Tue, 07 Jul 2009 15:35:17 +0000</p>
<p><a href="http://davidsd.org/wp-content/uploads/2009/07/stevenchu.png"></a></p>

</article>
<article>
<h1>The Real Theorem Generator: a Context Free Grammar</h1>
<p>davidsd — Wed, 21 Jan 2009 03:14:02 +0000</p>
<p class="lead">I should probably document the real origin of the <a href="/theorem">Theorem of the Day</a> and <a href="/2009/01/philosophy-of-the-day/">Philosophy of the Day</a>.  Coffee and Henry David Thoreau are perhaps less involved than originally indicated.
</p>
<p><a href="http://davidsd.org/wp-content/uploads/2009/01/nothoreau.png"></a>The theorem generator was written by a good friend of mine, Matt Gline, as a project for <a href="http://www.fas.harvard.edu/~lib51/">CS51: Abstraction and Design in Computer Programming</a>, which we took together as freshmen.</p>
<p>The assignment was to use <a href="http://en.wikipedia.org/wiki/Lisp_programming_language">LISP</a> to implement a <a href="http://en.wikipedia.org/wiki/Context-free_grammar">context free grammar</a> — basically a set of rules for computer-generated mad libs.  The subject was whatever we wanted.  Good ones from past years include computer-generated mystery novellas, course-guide reports, and performance art directions.  Every year there’s a contest, and Matt’s theorem generator was hysterical enough to win him lunch at the faculty club.<span id="more-476"></span></p>
<h3>Context-Free Grammars</h3>
<p>Each rule in a context-free grammar consists of a term, and a set of choices for how to make that term.  The choices can contain text, or other terms, or even refer recursively to the term being defined.  For instance, a grammar like</p>
<pre class="brush: grammar; title: ; notranslate">
nounphrase ::= <noun> | <adj> <adj> <noun> | super <nounphrase>
noun ::= apple | pear | mailman
adj ::= smelly | chartreuse | enormous
</pre>
<p>can produce phrases like “apple,” “enormous smelly mailman,” or “super super smelly chartreuse mailman.”  The original grammars we wrote for CS51 were hundred-line clumps of LISP code that spit out un-punctuated paragraphs in all-caps.  Over the years, my roommate Mike and I spiffed-up the theorem generator a bit, resulting in the <a href="/theorem">current shiny ajax/latex version</a>.</p>
<p>The current theorem grammar is defined in a grammar file (<a href="/misc/theorem/grammar/thm.gram">thm.gram</a>) that looks like this:</p>
<pre class="brush: grammar; title: ; notranslate">
theorem ::= \begin{theorem}[<thname>]<thmstatement>.
          \end{theorem} \begin{proof} <proof>
          \end{proof}

proof ::= <statement>. <conclusion>. | See <citation>.
    | <statement>. <conclusion>. | <statement>. <conclusion>.

conclusion ::= 
    The result follows by d\'evissage | The theorem follows trivially
    | The conclusion is self-evident | Clearly, the theorem holds
    | We leave the rest as an exercise | This is the desired result
    | The rest follows from <citation> | QED
    | A simple application of <famoustheorem> completes the proof

famoustheorem ::=
    <mathguy>'s theorem
    | Hilbert's problem <nzdigit><zdigit>
    | the Riemann Hypothesis
    | Perelman's theorem (formerly the Poincar\'e Conjecture)
    | horizontal Iwasawa theory | Kummer theory | the different
    | Dynkin diagrams | Gegenbauer polynomials
    | trichotomy
...
</pre>
<p>A perl script (<a href="/misc/theorem/grammar/compile-grammar">compile-grammar</a>):</p>
<pre class="brush: perl; title: ; notranslate">
...
print "parsing grammar '$gram' into '$out.ml'...";
foreach (@entries) {
    s/\s* \# .* $//xmg;
    if (/\s*(\S*)\s*::=\s*(.*)/s) {
        $lhs = $1;
        $rhs = $2;

        if (!defined ($mainphrase)) {
            $mainphrase = $lhs;
            print OUT "let rec $lhs = Opts [|\n";
        } else {
            print OUT "and $lhs = Opts [|\n";
        }

        @opts = split(/\s*\|\s*/, $rhs);
        foreach $opt (@opts) {
            print OUT "  [|";
            $opt =~ s/\s*\n\s*/ /g;

            # split just before < and after >
...
</pre>
<p>processes the grammar file and compiles it into <a href="http://en.wikipedia.org/wiki/OCaml">OCaml</a> code (<a href="/misc/theorem/grammar/thm.ml">thm.ml</a>).</p>
<pre class="brush: ocaml; title: ; notranslate">
type phrase = Str of string | Opts of phrase array array

let _ = Random.self_init ()

let randelt a = a.(Random.int (Array.length a))
let rec print phr = match phr with
  Str  s       -> print_string s
| Opts options ->
    let parts = randelt options in
    Array.iter print parts

(* Grammar definitions *)
let rec top = Opts [|
  [| theorem;|];
|]

and theorem = Opts [|
  [| Str "\\begin{theorem}["; thname; Str "]"; thmstatement; Str ". \\end{theorem} \\begin{proof} ";
 proof; Str " \\end{proof}";|];
|]
...
</pre>
<p>This is almost certainly more elaborate than necessary, but OCaml is a lovely language for recursive structures, and the code is nice and simple.  Running thm.ml spits out a ready-made LaTeX-ed theorem, which gets run through <a href="http://redsymbol.net/software/l2p/">LaTeX to png</a>, and cached, ready for viewing.</p>
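<p>To make the parsing step a little more concrete, here is a small Python sketch that turns one grammar rule into a nested list of options.  It loosely follows the visible part of the Perl excerpt (strip comments, split on <tt>|</tt>, separate <tt>&lt;term&gt;</tt> references from literal text), but it is a stand-in written for illustration, not a port of compile-grammar; the function name and the <tt>("ref", …)</tt>/<tt>("str", …)</tt> tuples are assumptions.</p>

```python
import re

def parse_rule(text):
    """Parse 'lhs ::= opt | opt | ...' into (lhs, options).

    Each option is a list of ("ref", name) or ("str", literal) parts.
    The tuple representation is an assumption made for this sketch.
    """
    # Drop '#' comments running to the end of a line.
    text = re.sub(r"\s*#.*$", "", text, flags=re.MULTILINE)
    match = re.match(r"\s*(\S+)\s*::=\s*(.*)", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("not a grammar rule")
    lhs, rhs = match.group(1), match.group(2).strip()

    options = []
    for opt in re.split(r"\s*\|\s*", rhs):
        opt = re.sub(r"\s*\n\s*", " ", opt)  # join continuation lines
        parts = []
        # Splitting on a capturing group keeps the <term> tokens:
        # odd positions are references, even positions are literal text.
        for i, piece in enumerate(re.split(r"(<[^<>]+>)", opt)):
            if i % 2 == 1:
                parts.append(("ref", piece[1:-1]))
            elif piece:
                parts.append(("str", piece))
        options.append(parts)
    return lhs, options

lhs, options = parse_rule("nounphrase ::= <noun> | <adj> <adj> <noun> | super <nounphrase>")
print(lhs, options)
```

<p>The resulting structure has the same shape as the <tt>Opts</tt> arrays in thm.ml: a list of alternatives, each of which is a sequence of strings and references to other terms.</p>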
<h3>The Philosophy of the Day</h3>
<p>My entry into the CS51 contest was a <a href="http://davidsd.org/2009/01/philosophy-of-the-day/">philosophy generator</a>, which spits out semi-plausible definitions of philosophies from numerous cultural and intellectual traditions.  It was never as clever as the theorem generator (which still cracks me up every time it produces anything attributed to Lipschitz), and the philosophies tend to involve concepts that are only of interest to nerdy Harvard freshmen.  It’s still amusing, though.  Here’s the <a href="/misc/philosophy/grammar/philo.gram">grammar definition</a>.</p>
<h3>Have Fun!</h3>
<p>The system I set up to create and run context-free grammars is pretty easy to use.  Grammar files are simple to write, and from there all you need is compile-grammar and an <a href="http://caml.inria.fr/">OCaml installation</a>.  Here, once again, are the relevant files:</p>
<ol>
<li>
Some example grammar files: <a href="/misc/theorem/grammar/thm.gram">thm.gram</a>, <a href="/misc/philosophy/grammar/philo.gram">philo.gram</a>
</li>
<li>
Parse grammar files into ml code: <a href="/misc/theorem/grammar/compile-grammar">compile-grammar</a>
</li>
</ol>
<p>Anyone’s absolutely welcome to send extensions to thm.gram and philo.gram, or any new grammars you might write.  I’ll ajaxify them and post them here, if you like. Have fun!</p>

</article>
<article>
<h1>Philosophy of the Day</h1>
<p>davidsd — Wed, 21 Jan 2009 02:37:06 +0000</p>
<p><script type="text/javascript" src="/misc/philosophy/ajax.js"></script><span id="loader"></span></p>
<p><a href="javascript:void(0);" id="generate" class="button">New Philosophy</a></p>
<p><span id="more-525"></span></p>
<h3>How it Works</h3>
<p>You press button. Button tell robot hand “pick up Henry David Thoreau, move over next to pond.” Robot hand pick up Henry David Thoreau, put him next to pond. Result recorded.</p>
<div id="attachment_526" style="width: 570px" class="wp-caption aligncenter"><a href="http://davidsd.org/wp-content/uploads/2009/01/philo.jpg"></a><p class="wp-caption-text">How we get the philosophy of the day</p></div>

</article>
<article>
<h1>Honda Needs a Tune-Up</h1>
<p>davidsd — Tue, 23 Dec 2008 12:45:09 +0000</p>
<p class="lead">This is the story of how Honda engineers screwed up a big expensive project with a simple arithmetic mistake, tried to fudge their result with sound editing software, and congratulated themselves for being totally awesome.</p>
<p>When I was a kid, my family used to drive up to The Pinery in Ontario, a beautiful park by Lake Huron.  Very scenic.  My favorite part, though, was a stretch of road a half-hour outside of the park.  To discourage reckless Canadians from barreling past the houses and barns, the local government carved five sets of grooves in the road before every stop sign.  Drive over them, and the car would vibrate: <em>“vbvbvbvb… vbvbvbvb… vbvbvbvb… vbvbvbvb… vbvbvbvb.”</em>  The faster you drive, the higher the pitch.</p>
<p>My <a href="http://music.case.edu/duffin/">Dad</a> is a musicologist, with a <a href="http://www.amazon.com/Equal-Temperament-Ruined-Harmony-Should/dp/0393062279">particular</a> <a href="http://music.case.edu/duffin/Vallotti/default.html">interest</a> <a href="http://music.case.edu/duffin/JustTuning/Index.html">in</a> <a href="http://music.case.edu/duffin/BaroqueTemp/Default.html">tuning</a>.  So there was <em>no way</em> he was going to pass up the chance to experiment with this instrument.  Every time we approached some grooves, he’d start fast over the first set, and try to slow down by the last set, to play a descending scale: G-F-E-D-C.  If there was no oncoming traffic after the stop sign, he’d swing over to the other side of the road and play an ascending scale as we sped up.  <span id="more-12"></span></p>
<p>Ratios of speeds correspond to ratios of vibration frequencies, which correspond to intervals between notes.  To play an ascending scale C-D-E-F-G, you need to drive at these ratios to your starting speed: <tt>1 — 9/8 — 5/4 — 4/3 — 3/2</tt> (for example, <tt>24 — 27 — 30 — 32 — 36</tt> mph)<sup>[<a href="http://davidsd.org/2008/12/honda-needs-a-tune-up/#footnote_0_12" id="identifier_0_12" class="footnote-link footnote-identifier-link" title="If anyone's wondering what happened to the 1/12th powers of 2 in this whole tuning discussion, I'm using what's called Just Intonation, which is an (often better-sounding) approximation to the Equal Temperament system most people know.  Actually, it's really the other way around: the reason we use 12 equal semitones is that it lets us approximate nice integer ratios like 3/2, 4/3, 5/4, etc..  This is a long story that I'm not going to get into here.">1</a>]</sup>.</p>
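<p>The arithmetic behind the example speeds can be checked with a few lines of Python.  This is just a sketch of the ratios quoted above; the names <tt>SCALE_RATIOS</tt> and <tt>scale_speeds</tt> are made up for the example.</p>

```python
from fractions import Fraction

# Just-intonation ratios for C-D-E-F-G relative to the starting note,
# as quoted in the text.
SCALE_RATIOS = [Fraction(1), Fraction(9, 8), Fraction(5, 4), Fraction(4, 3), Fraction(3, 2)]

def scale_speeds(start_mph):
    """Speeds (mph) that play the ascending scale when starting at start_mph."""
    return [start_mph * ratio for ratio in SCALE_RATIOS]

print([str(v) for v in scale_speeds(24)])  # ['24', '27', '30', '32', '36']
```

<p>Starting at 24 mph is convenient because every ratio then lands on a whole number of miles per hour.</p>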
<p>Playing a scale with a ’95 Toyota Previa is not easy.  The notes tend to come out a little wonky — we’d get the half-step between E and F too wide, and with not enough space between F and G.  It usually sounded kinda modal… but still awesome.</p>
<h3>Professionals?</h3>
<p>So imagine my delight when I heard about this <a href="http://reviews.cnet.com/8301-13746_7-10049007-48.html">musical road [CNET]</a> that Honda built in Lancaster, CA.  A team of engineers carved some grooves into a highway that were carefully spaced to play the <a href="http://en.wikipedia.org/wiki/William_Tell_Overture">William Tell Overture</a> as you drive over them at a constant speed.  Awesome, right?  The problem is, it’s <em>spectacularly</em> out of tune.</p>
<p></p>
<p>Here’s the original melody:<br>
[embedded audio clip]</p>
<p>And here’s the Honda road again:<br>
[embedded audio clip]</p>
<p>The Honda version isn’t simply “out of tune”… the notes are just wrong.  The original starts with a rising 4th, F-B♭<sup>[<a href="http://davidsd.org/2008/12/honda-needs-a-tune-up/#footnote_1_12" id="identifier_1_12" class="footnote-link footnote-identifier-link" title="Actually, the starting note in the recording is around a B♭.  I'm going to pretend like everything is in the key of B♭ (so the starting note is F), since that's the key they talk about in the making-of videos.a picture of Honda's scoreSorry to the perfect-pitch people.">2</a>]</sup>, and eventually reaches an octave above the starting note before descending to the tonic F-E♭-D-B♭.<sup>[<a href="http://davidsd.org/2008/12/honda-needs-a-tune-up/#footnote_2_12" id="identifier_2_12" class="footnote-link footnote-identifier-link" title="The original melody actually has a run down to the B♭: F-E♭-D-C-B♭.  Honda apparently decided this was too complicated and used a simplified version.  That's what I'll stick to here.">3</a>]</sup>  But Honda’s version starts with a rising major 3rd, and its top note is a major 6th above the starting note.  Some might have noticed that the last few notes in Honda’s commercial sound OK.  That’s because they edited over them!  I can prove it.</p>
<p id="william-tell-melody" class="wp-caption-text">Basic melody in the William Tell Overture (schematic)</p>
<p>The CNET article above speculates that Honda designed the road specifically for the Honda Civic driving at the speed limit, and other cars might need to drive at a different speed to make it sound better.  But if you’re going at a constant speed, all that matters is the spacing between grooves.  Speeding up or slowing down just transposes everything.  It would be theoretically possible to “correct” the melody by driving at different speeds (like on the road to the Pinery).  But the notes on the musical road are too closely spaced for all but consummate musician Mario Andretti.</p>
<p>It also doesn’t matter what car you drive<sup>[<a href="http://davidsd.org/2008/12/honda-needs-a-tune-up/#footnote_3_12" id="identifier_3_12" class="footnote-link footnote-identifier-link" title="With one exception that can't fix the tuning.  See my comment, below.">4</a>]</sup>.  The vibration frequency is <tt>f = v/d</tt>, where v is the car’s speed, and d is the distance over which the road pattern repeats.  There’s no place in the equation for wheel spacing, tire size, side-impact airbags, etc.  All of these things affect the quality of the sound, but not the pitch.</p>
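<p>Here is a small Python sketch of <tt>f = v/d</tt> with the unit conversion spelled out.  The two repeat distances (5in and 4in) and the two speeds are illustrative values chosen for the example, and <tt>frequency_hz</tt> is a made-up helper name.</p>

```python
INCHES_PER_MILE = 63360
SECONDS_PER_HOUR = 3600

def frequency_hz(speed_mph, repeat_in):
    """f = v/d, with v converted from mph to inches per second."""
    speed_in_per_s = speed_mph * INCHES_PER_MILE / SECONDS_PER_HOUR
    return speed_in_per_s / repeat_in

# Two illustrative repeat distances, driven at two different constant speeds.
for mph in (45, 55):
    low = frequency_hz(mph, 5.0)   # pattern repeats every 5 inches
    high = frequency_hz(mph, 4.0)  # pattern repeats every 4 inches
    print(mph, round(low, 1), round(high, 1), round(high / low, 3))
```

<p>Both frequencies go up with speed, but the ratio between them (the musical interval) is the same at 45mph and at 55mph.  Nothing about the car enters the calculation.</p>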
<p>So why is the musical road so unmusical?</p>
<h3>The Error</h3>
<p>Honda posted a series of 5 ridiculous videos: <a href="http://www.youtube.com/watch?v=gRiJlEte9l0">[Part 1]</a><a href="http://www.youtube.com/watch?v=wVynYbGhDcs">[Part 2]</a><a href="http://www.youtube.com/watch?v=gPWQ_TM6rsU">[Part 3]</a><a href="http://www.youtube.com/watch?v=1qmOR9lS6Pw">[Part 4]</a><a href="http://www.youtube.com/watch?v=Z8aJduZeAFk">[Part 5]</a>, in which they talk about all the hard work they did and congratulate themselves for being so awesome.  There are lots of complicated-sounding numbers, there’s a “Mathematician/Musician,” and plenty of experts.  I’m sure some people behind the project understood what was going on.  But I think they failed to anticipate a basic misunderstanding on the part of the groove-designers.</p>
<p>In the fourth “making of” video, they mention that the initial note, a low F, has a spacing of 4 inches (4in) between grooves (1:47):</p>
<p></p>
<p>From the video, it looks like the grooves themselves are about 1in wide.  Now, suppose you want to make the B♭ a 4th above F.  A perfect 4th is a frequency ratio of <tt>4/3</tt>, so you should multiply the width by a factor of <tt>3/4</tt>…  But the width of what?</p>
<div id="attachment_2039" style="width: 530px" class="wp-caption aligncenter"><a href="http://davidsd.org/wp-content/uploads/2008/12/measurements.jpg"></a><p class="wp-caption-text">Based on the Civic’s 106.3 inch wheelbase, we can see from this picture that s+g is about 5 inches. Honda says the lowest note has a 4 inch spacing, so that’s consistent with 1 inch grooves.</p></div>
<p>The width that really matters is the total width of the spacing plus groove (s+g).  That’s the distance over which the road pattern repeats, so that’s the distance over which the car completes one vibration.<sup>[<a href="http://davidsd.org/2008/12/honda-needs-a-tune-up/#footnote_4_12" id="identifier_4_12" class="footnote-link footnote-identifier-link" title="More precisely, once you know the force driving the vibrations is periodic with period T=d/v, it follows that the vibrations themselves have that periodicity, so the Fourier transform of any resultant sound is only nonzero at integer multiples of f=1/T.  For more explanation, see the second comment, below.">5</a>]</sup>  Suppose you didn’t know this, and only changed the spacing, from <tt>s = 4in</tt> to <tt>s’ = 3/4 × 4in = 3in</tt>.  Then the frequency ratio is <tt>(s+g)/(s’+g) = (4+1)/(3+1) = 5/4</tt>, a major 3rd, not a perfect 4th.  What about the octave above the starting note?  An octave is a frequency ratio of <tt>2/1</tt>, but if you only changed the spacing to <tt>s’ = 1/2 × 4in = 2in</tt>, you’d get an actual ratio of <tt>(s+g)/(s’+g) = (4+1)/(2+1) = 5/3</tt>, a major 6th, not an octave.</p>
<p>Oops.</p>
<div id="attachment_2034" style="width: 460px" class="wp-caption aligncenter"><a href="http://davidsd.org/wp-content/uploads/2008/12/octave-bad.png"></a><p class="wp-caption-text">making an octave, incorrectly</p></div>
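<p>The suspected miscalculation is easy to reproduce in Python.  The sketch below assumes the numbers used above (4in spacing, 1in grooves) and a hypothetical helper, <tt>ratio_if_only_spacing_changes</tt>, that divides only the spacing by the intended ratio.</p>

```python
from fractions import Fraction

def ratio_if_only_spacing_changes(intended, s=Fraction(4), g=Fraction(1)):
    """Frequency ratio actually produced when only the spacing s is divided
    by the intended ratio and the groove width g is left alone."""
    s_new = s / intended
    return (s + g) / (s_new + g)

for name, intended in [("perfect 4th", Fraction(4, 3)), ("octave", Fraction(2))]:
    print(name, intended, "->", ratio_if_only_spacing_changes(intended))
```

<p>The intended 4/3 comes out as 5/4 and the intended 2/1 comes out as 5/3, matching the major 3rd and major 6th heard on the road.</p>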
<p>There are two ways you could correct this problem:</p>
<ol>
<li>Adjust the groove width g as well as the spacing s.  For instance, to make an octave, use a spacing <tt>s’ = 2in</tt> and a groove <tt>g’ = .5in</tt>, giving a frequency ratio <tt>(s+g)/(s’+g’) = 5/2.5 = 2/1</tt>.  This is probably hard with typical cutting tools.  Also, the engineers may have found that they need to make the grooves bigger than some minimum width to get a good sound. So on to method 2…
</li>
<li>
Over-adjust the groove spacing so that the total <tt>g+s</tt> is correct.  For instance, to make an octave, adjust the groove spacing to <tt>s’ = 1.5in</tt>, so you get a frequency ratio of <tt>(s+g)/(s’+g) = 5/2.5 = 2/1</tt>.
</li>
</ol>
<div id="attachment_2035" style="width: 460px" class="wp-caption aligncenter"><a href="http://davidsd.org/wp-content/uploads/2008/12/octave-good.png"></a><p class="wp-caption-text">making an octave, correctly</p></div>
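<p>Both fixes can be written down as short formulas.  The Python sketch below is one way to express them, again assuming a 4in spacing and 1in groove for the reference note; the function names are invented for the example.</p>

```python
from fractions import Fraction

def fix_by_scaling_both(intended, s=Fraction(4), g=Fraction(1)):
    """Method 1: shrink spacing and groove width by the same factor."""
    return s / intended, g / intended

def fix_by_overadjusting_spacing(intended, s=Fraction(4), g=Fraction(1)):
    """Method 2: keep g fixed and pick s' so that s' + g = (s + g) / intended."""
    s_new = (s + g) / intended - g
    if s_new <= 0:
        raise ValueError("groove is too wide to reach this interval")
    return s_new, g

def ratio(s_new, g_new, s=Fraction(4), g=Fraction(1)):
    """Frequency ratio relative to the reference note."""
    return (s + g) / (s_new + g_new)

octave = Fraction(2)
for fix in (fix_by_scaling_both, fix_by_overadjusting_spacing):
    s_new, g_new = fix(octave)
    print(fix.__name__, float(s_new), float(g_new), ratio(s_new, g_new))
```

<p>For the octave this gives 2in spacing with a 0.5in groove under method 1, and 1.5in spacing with the original 1in groove under method 2.  Both produce a ratio of exactly 2.  Method 2 has a built-in limit: with a fixed 1in groove, the total repeat distance can never drop below 1in.</p>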
<h3>The Coverup</h3>
<p>Armed with this theory for why the musical road sounds so bad, I crunched some numbers in Mathematica, and was able to reproduce Honda’s result, sort of…</p>
<p>Here’s Mathematica playing the correct William Tell Overture:<br>
[embedded audio clip]</p>
<p>And here’s Mathematica programmed to make the mistake I think Honda’s engineers made:<br>
[embedded audio clip]</p>
<p>And here’s Honda’s commercial version again:<br>
[embedded audio clip]</p>
<p>Notice that a few notes in the commercial sound different from Mathematica’s version.  Particularly at the end.  Honda’s last few notes are sort of… in tune!  Turns out that’s a bit of Hollywood magic.  Here’s a recording I stole from a <a href="http://jalopnik.com/5053214/hondas-musical-road-to-be-paved-over">different video</a> of someone driving down the Musical Road<sup>[<a href="http://davidsd.org/2008/12/honda-needs-a-tune-up/#footnote_5_12" id="identifier_5_12" class="footnote-link footnote-identifier-link" title="I've actually transposed it up to be in approximately the same key as the other recordings in this article.  By the way, there are hundreds of such videos on Youtube.">6</a>]</sup>:<br>
[embedded audio clip]</p>
<p>What happened to the ending?  It’s all funky again.  Go back and listen to the Mathematica version that mimics Honda’s mistake.  Same funky ending<sup>[<a href="http://davidsd.org/2008/12/honda-needs-a-tune-up/#footnote_6_12" id="identifier_6_12" class="footnote-link footnote-identifier-link" title="Aside from a single passing note.  If I change the closing notes from F-E♭-D-B♭-D-B♭ to F-E♮-D-B♭-D-B♭, and apply the Honda miscalculation, it sounds almost exactly like the undoctored recording of the musical road.  So it appears that there are two errors at work here: the groove spacing miscalculation, and replacing an E♭ with an E♮.">7</a>]</sup>.  Whoever put together the Honda commercial must have edited over the ending, assuming that as long as the last few notes were correct, no one would notice anything wrong.<sup>[<a href="http://davidsd.org/2008/12/honda-needs-a-tune-up/#footnote_7_12" id="identifier_7_12" class="footnote-link footnote-identifier-link" title="It seems like Honda fixed up some of the other notes, too, to get a more pleasant sound.  Some might object that it's easy to make the notes sound bad by speeding up or slowing down as you drive down the road.  However, I don't hear anything like that in the random person's recording.  The melody returns to previous notes with reasonable accuracy, which it wouldn't do if the speed were varying.">8</a>]</sup></p>
<p>What I don’t understand is: if they were going to doctor the sound, why didn’t they just correct the whole thing?  It’s not that hard.  My dad did this version in about 20 minutes:<br>
[embedded audio clip]</p>
<h3>Aftermath</h3>
<p>I learned something else kind of ridiculous from this analysis: if Honda didn’t doctor the overall pitch of the melody in their commercial, then they were speeding.  The opening frequency is about 238Hz, which corresponds to a speed of about 67mph if the road pattern repeats over 5in.  But they mention in one of the videos that the speed limit is 55! Crap.</p>
<p>In fact, in <a href="http://www.youtube.com/watch?v=z9-s7RI4tOs">this youtube video</a>, where they explicitly state they’re going 55mph, the melody starts a minor third below the Honda commercial.  A minor third is a frequency ratio of <tt>6/5</tt>, so this is consistent with Honda’s driver doing <tt>6/5 × 55mph = </tt>more than 10mph over the speed limit…</p>
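<p>Both speed estimates are quick to check.  The Python sketch below inverts <tt>f = v/d</tt> using the 238Hz and 5in figures from above; <tt>speed_mph</tt> is a made-up helper name.</p>

```python
INCHES_PER_MILE = 63360
SECONDS_PER_HOUR = 3600

def speed_mph(frequency_hz, repeat_in):
    """Invert f = v/d: v = f * d, converted from inches per second to mph."""
    return frequency_hz * repeat_in * SECONDS_PER_HOUR / INCHES_PER_MILE

# Speed implied by the commercial's opening note over a 5in repeat distance.
print(round(speed_mph(238, 5), 1))

# 55mph transposed up a minor third (frequency ratio 6/5).
print(55 * 6 / 5)
```

<p>The two estimates come from independent recordings and land within a couple of mph of each other, both well above 55.</p>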
<p>Another funny point is that some of the intervals you get from Honda’s miscalculation are pretty bizarre.  The D, a major 6th above the starting F, should have a frequency ratio of <tt>5/3</tt> above the starting frequency.  Instead, it has a ratio <tt>5/(4 × 3/5+1) = 25/17</tt>.  This isn’t really in the Western scale.  It’s about 2/3rds of the way between an augmented 4th and a pure 5th.  Microtonal composers like <a href="http://en.wikipedia.org/wiki/Easley_Blackwood_Jr.">Easley Blackwood</a> might have found a use for it, but I don’t think it’s what Honda was after.</p>
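<p>To see where 25/17 sits, it helps to convert ratios to cents (1200 per octave).  The Python sketch below does that; it assumes the just augmented 4th of 45/32 as the lower reference, which is one common choice and not something stated above.</p>

```python
import math
from fractions import Fraction

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

# The ratio from the text: only the spacing is scaled by 3/5, the groove stays 1in.
honda_d = Fraction(5) / (Fraction(4) * Fraction(3, 5) + 1)

aug_4th = Fraction(45, 32)  # assumption: the just augmented 4th
pure_5th = Fraction(3, 2)

# Fraction of the way from the augmented 4th up to the pure 5th, measured in cents.
position = (cents(honda_d) - cents(aug_4th)) / (cents(pure_5th) - cents(aug_4th))
print(honda_d, round(cents(honda_d)), round(position, 2))
```

<p>With these references the note lands a bit more than two thirds of the way from the augmented 4th to the 5th.  Using the equal-tempered tritone (600 cents) as the lower reference instead moves the figure only slightly.</p>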
<p>If I were them, I’d seriously consider paving over the road.  In fact, it seems like some local residents <a href="http://cbs2.com/local/William.Tell.Road.2.822008.html">might do it for them</a>.  There is another option, though.  If they bring in the bulldozers, and shuffle around a few chunks of asphalt at the end of the road, they might get a decent rendition of “When The Saints Go Marching In.”<br>
[embedded audio clip]</p>
<p>Update [12/30/08]: Added picture comparing grooves to Civic wheelbase</p>
<p>Update [5/2/11]: I am both sorry and delighted to hear that they rebuilt the musical road (see, e.g., <a href="http://9teen87spostcards.blogspot.com/2011/05/musical-road-in-lancaster-california.html">here</a>), and they fixed nothing.  Here it is on April 28, 2011:</p>
<p></p>
<p>Just… wow.</p>
<p>Update [4/15/18]: This post was recently featured on Tom Scott’s YouTube channel “Amazing Places.” As of today, the video has about 5 million views. It was also mentioned in <a href="https://www.nytimes.com/2018/04/12/world/europe/netherlands-singing-road.html">The New York Times</a>.</p>
<p></p>
<ol class="footnotes"><li id="footnote_0_12" class="footnote">If anyone’s wondering what happened to the 1/12th powers of 2 in this whole tuning discussion, I’m using what’s called <a href="http://en.wikipedia.org/wiki/Just_intonation">Just Intonation</a>, which is an (often better-sounding) approximation to the <a href="http://en.wikipedia.org/wiki/Equal_temperament">Equal Temperament</a> system most people know.  Actually, it’s really the other way around: the reason we use 12 equal semitones is that it lets us approximate nice integer ratios like 3/2, 4/3, 5/4, etc..  This is a <a href="http://www.amazon.com/Equal-Temperament-Ruined-Harmony-Should/dp/0393062279">long story</a> that I’m not going to get into here.</li><li id="footnote_1_12" class="footnote">Actually, the starting note in the recording is around a B♭.  I’m going to pretend like everything is in the key of B♭ (so the starting note is F), since that’s the key they talk about in the making-of videos.<span id="attachment_197" class="wp-caption aligncenter" style="width: 350px"><span class="wp-caption-text">a picture of Honda’s score</span></span>Sorry to the perfect-pitch people.</li><li id="footnote_2_12" class="footnote">The original melody actually has a run down to the B♭: F-E♭-D-C-B♭.  Honda apparently decided this was too complicated and used a simplified version.  That’s what I’ll stick to here.</li><li id="footnote_3_12" class="footnote">With one exception that can’t fix the tuning.  See <a href="#comment-4">my comment</a>, below.</li><li id="footnote_4_12" class="footnote">More precisely, once you know the force driving the vibrations is periodic with period T=d/v, it follows that the vibrations themselves have that periodicity, so the Fourier transform of any resultant sound is only nonzero at integer multiples of f=1/T.  
For more explanation, see the <a href="#comment-4">second comment</a>, below.</li><li id="footnote_5_12" class="footnote">I’ve actually transposed it up to be in approximately the same key as the other recordings in this article.  By the way, there are hundreds of such videos on Youtube.</li><li id="footnote_6_12" class="footnote">Aside from a single passing note.  If I change the closing notes from F-E♭-D-B♭-D-B♭ to F-E♮-D-B♭-D-B♭, and apply the Honda miscalculation, it sounds <em>almost exactly</em> like the undoctored recording of the musical road.  So it appears that there are two errors at work here: the groove spacing miscalculation, and replacing an E♭ with an E♮.</li><li id="footnote_7_12" class="footnote">It seems like Honda fixed up some of the other notes, too, to get a more pleasant sound.  Some might object that it’s easy to make the notes sound bad by speeding up or slowing down as you drive down the road.  However, I don’t hear anything like that in the random person’s recording.  The melody returns to previous notes with reasonable accuracy, which it wouldn’t do if the speed were varying.</li></ol>
</article>
</main></body></html>