Behind the Guesses

Quaternions -- Part 1: How many?

2009-11-30T23:42:00.001-05:00

The Setup
Quaternions seem to be one of the least understood mathematical things amongst physicists. I have sat in countless lectures where at some point the lecturer pointed out that a particular topic could be understood or explained using quaternions, but, when pressed, could not really explain what, precisely, one of these quaternion thingies actually is.

The first encounter people have with quaternions is generally after they learn about the complex plane and its relationship to the regular 2D Cartesian plane. After seeing all sorts of nifty properties and uses of this relationship (we'll see one shortly) it's only natural to ask if there's a 3D complex analogue to the 2D complex plane. And, therefore, most books ask precisely this question. However, they usually give less-than-satisfactory attempts at generalizing, highlighting the mysterious algebraic problem of ``closure'' or something to that effect.

Then, often retelling the story of Hamilton and a bridge¹ they pull some strange, ``4D'' quaternions out of a hat and show how they happily resolve all the algebraic problems. This, it seems, should be enough to placate even the most thoughtful reader, and stands in place of an actual explanation. And even though there is a lot of information about these buggers out there on the intertubes, all that I've seen is of the same approach.

So, it doesn't surprise me, honestly, that ``quaternion'' is also one of the most popular searches on this blog. The topic of quaternions is really too big to handle fully in one post (and, for full disclosure, I do not completely understand them myself), so this post will deal primarily with a rationale for the initial guess of a ``4D'' quaternion.

This post assumes you have read and thoroughly grokked my discussion of dot and cross products,[1] and have a solid understanding of traditional complex numbers.

Complex Numbers and 2D Vectors
My approach in this section is based on the fantastic book, Visual Complex Analysis by Tristan Needham.[2] If something here isn't clear (and it's not the fault of my writing), or is different from the way you learned complex numbers, read this book. Even if everything is perfectly clear, read this book. What I'm trying to say is: Read this book.²

Recall that a complex number $z = x + iy$ can be represented by a vector in a 2D $xy$-plane. Also, if $r = |z|$ -- i.e. the ``modulus'' of $z$ or the length of the vector -- and $\theta$ is the ``argument'' of $z$ or the angle between the vector and the $x$-axis, we can also write $z = re^{i\theta}$ in ``polar form.'' See [2] for pictures. We also have Euler's identity

$$e^{i\theta} = \cos\theta + i\sin\theta \tag{1}$$

Furthermore, recall that multiplying two complex numbers together effects a rotation and scaling. For example, multiplying a complex number $z = re^{i\theta}$ -- graphically, a vector of length $r$ making angle $\theta$ with respect to the real axis ($x$-axis) -- by $Ae^{i\phi}$ gives $Are^{i(\theta+\phi)}$. This can be understood graphically as a scaling of $z$ by $A$ and a rotation of the direction of $z$ by the angle $\phi$. Finally, the complex conjugate of $z$ is given either by $\bar{z} = x - iy$ or $\bar{z} = re^{-i\theta}$.

From Complex Multiplication to Vector Products
For two complex numbers $z_1 = r_1e^{i\theta_1}$ and $z_2 = r_2e^{i\theta_2}$ let's see what $\bar{z}_1 z_2$ is. This demonstration (at least initially) is based on [2].³ Anyway,

$$\bar{z}_1 z_2 = r_1 r_2\, e^{i(\theta_2 - \theta_1)} \tag{2}$$

Graphically this is a vector with length $r_1 r_2$ at an angle $\theta_2 - \theta_1$ from the $x$-axis. Expanding this into a real and an imaginary part using Euler's identity (1) gives:

$$\bar{z}_1 z_2 = r_1 r_2\cos(\theta_2 - \theta_1) + i\, r_1 r_2\sin(\theta_2 - \theta_1) \tag{3}$$

We now note that the real part of this expression corresponds to the dot product between the two vectors $\vec{z}_1 = (x_1, y_1)$ and $\vec{z}_2 = (x_2, y_2)$. But what should we do with the imaginary part?

Well, the magnitude of the imaginary part certainly corresponds to the magnitude of one dimension of the cross product between the two vectors. That is, if we relate the complex plane to the Cartesian $xy$-plane then the imaginary part of $\bar{z}_1 z_2$ is the $z$-component of $\vec{z}_1 \times \vec{z}_2$. This important point is often lost in passing, and thus this property of complex multiplication is relegated to the realm of ``cool trick.'' However, we'll make good use of this detail.
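This correspondence is easy to check numerically. What follows is a minimal Python sketch, not something from the original derivation: the helper names `conj_product`, `dot2` and `cross_z` are made up for illustration, and each complex number $x + iy$ is treated as the 2D vector $(x, y)$.

```python
def conj_product(z1: complex, z2: complex) -> complex:
    """Conjugate of z1 multiplied by z2."""
    return z1.conjugate() * z2


def dot2(z1: complex, z2: complex) -> float:
    """Dot product of the 2D vectors (x1, y1) and (x2, y2)."""
    return z1.real * z2.real + z1.imag * z2.imag


def cross_z(z1: complex, z2: complex) -> float:
    """z-component of the cross product of (x1, y1, 0) and (x2, y2, 0)."""
    return z1.real * z2.imag - z1.imag * z2.real


z1, z2 = 3 + 4j, 1 - 2j
product = conj_product(z1, z2)

# Real part vs. dot product, imaginary part vs. cross-product z-component.
print(product.real, dot2(z1, z2))
print(product.imag, cross_z(z1, z2))
```

Swapping the order of the two numbers flips the sign of the imaginary part but not the real part, just as swapping the order of the vectors flips the cross product but not the dot product.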

Rethinking complex numbers
Now we are ready for the conceptual jump. Although we got to the representation of dot and cross products through use of a 2D complex plane, we're going to distance ourselves from this wonderful visualization for the moment and note that an arbitrary complex number has two parts: One corresponds to a dot product, the other corresponds to one dimension of a cross product of two vectors.[3]⁴ If we want to find a relationship between complex numbers and 3D vectors we need to pick one of these parts to generalize.

Now, recall that the dot product yields a scalar quantity equal to the amount that two vectors point in the same direction. Since there is no directionality or dimensionality inherent in this quantity -- it's just a length -- there's really no way to add extra bits here. Length stays a scalar in any dimension.

So, instead we turn to the cross-product part. In the preceding section I repeatedly stressed that the imaginary part of the product $\bar{z}_1 z_2$ corresponds to one dimension of a 3D cross product. However, which single dimension of the cross product we choose is completely arbitrary: Just as with the calculation of area for the cross product, the 2D Cartesian plane we choose to map to the complex plane could just as easily be the $xy$-, $yz$- or the $zx$-planes.

Recall that to resolve this ambiguity in cross-product land we chose to identify which plane we were talking about by a right-hand rule normal vector to the plane. However, here we're attempting to generalize complex numbers, not cross products per se. So, instead of assigning different normal vectors to each cross product term, let's assign a different complex number to each term. That is, $i^2 = -1$ and $j^2 = -1$, but $i \neq j$ for example. Then, we assign $i$ to the cross product of two vectors in one coordinate plane and $j$ to the cross product of two vectors in a second coordinate plane.

The one question remaining, though, as we generalize our complex plane, is how many additional complex numbers do we need? Maybe, naively, we can try adding just one extra cross-product dimension. That is, $i$ and $j$ only. The problem, though, can be seen in cross-product land.

Closure
Remember that a cross product resultant vector is a normal vector to an arbitrary plane in 3D Cartesian space, and thus always requires all three unit vectors $\hat{x}$, $\hat{y}$ and $\hat{z}$. For example, the cross product $\hat{x} \times \hat{y}$ is $\hat{z}$. That is, in order to make sense of $\hat{x}$ cross $\hat{y}$, which can exist in 2D, you must already have a third unit vector $\hat{z}$.

Physically, in Cartesian vector space, it means that you must be able to add any arbitrary 3D cross product resultant vectors and still get a 3D vector. In fact, if this wasn't true, there'd be no way to even write the 3D cross product in the first place since you need to project the arbitrary vectors to three (independent) 2D planes and then add the resulting normal vectors. You can't have just two cross-product parts and get a result that always makes sense. This is the requirement of ``closure.''

The reason there's no problem in the 2D plane version is simply because there's only one possible normal vector, so we only look at the magnitude of the cross product -- i.e. the amount of area -- and the sign. And that is just a scalar! In 2D land nothing is preventing you from adding the cross product to the dot product -- they're both scalars -- so you can write a two-element complex number combination with no trouble.

However, in 3D we can't simply add a vector to a scalar, and therefore we need all three parts of the cross product. So too, then, if we want a generalized complex number to have a dot-product part and a cross-product part that makes physical sense, we need three complex numbers: $i$ and $j$ from above, plus a $k$, corresponding to the cross product of the projection of vectors in the remaining coordinate plane.

Thus, we now have a generalized complex number -- quaternion -- of the form

$$q = a + bi + cj + dk \tag{4}$$

References
[1] E. Lansey. The dot product and cross products [online]. April 2009. Available from: http://behindtheguesses.blogspot.com/2009/04/dot-and-cross-products.html.
[2] Tristan Needham. Visual Complex Analysis. Oxford University Press, Oxford, UK, 1997.
[3] C. Doran and A.N. Lasenby. Geometric Algebra for Physicists. Cambridge University Press, Cambridge, UK, 1st edition, 2003.

¹ Just Google it, it's not really worth retelling, in my opinion.
² If I was stranded on an island forever but could bring only one math book, this would be it.
³ Just go out and get that book already! What are you waiting for?
⁴ Thanks, Peeter, for recommending the book ([3]) which highlighted this point.

Nothing new

2009-09-30T09:02:00.004-04:00

Unless I miraculously complete 3 problem sets, find the errors in the calculations I've been working on for 4 months and the bugs in the code based on those calculations, and mow the lawn by a reasonable hour today, there will likely not be a new Behind the Guesses post of substance this month.

However, I am considering a new method of posting, using Google Document Viewer, rather than converting LaTeX into pictures, etc. This is the last post, Noncommuting Rotation and Angular Momentum Operators, using the new method. Would you prefer this, or the old way? Or do you just download the PDF, and it makes no difference?

Noncommuting Rotation and Angular Momentum Operators

2009-08-31T16:49:00.020-04:00

The Setup
Avi Ziskind¹ asked me to cover non-commuting operators in quantum mechanics, specifically why angular momentum operators do not commute. He pointed out that Griffiths [1] gives an intuitive argument for understanding why position and momentum operators do not commute but does not present any rationale for why the different components of angular momentum have the commutation relation

$$[\hat{L}_i, \hat{L}_j] = i\hbar\,\epsilon_{ijk}\hat{L}_k \tag{1}$$

Additionally, Schwabl [2], for example, defines the angular momentum operator, presents the commutation relations, and at least attempts (I think) to show (in a post-facto way) why they should have such relations. Likewise, in a related (as we'll see) problem, Goldstein et al. [3] discuss the commutation relations of generators of rotation without any physical argument.

However, both Sakurai [4] and Landau and Lifshitz [5], to some degree, present physical rationales for these relations. Landau and Lifshitz derive the notion of angular momentum in quantum theory quite nicely, and succinctly, but do not argue for why the commutation relations should hold. Sakurai develops a set of commutation relations independently of QM (as I will, shortly), but, I feel, bridges the gap to angular momentum rather poorly.

This post assumes familiarity with the ``generator of transformation'' ideas in [6].

The Generator of Rotation

(a) In 3D.

(b) The projection of (a) onto the $xy$-plane.

Figure 1: The rotation of a vector around the $z$-axis.

In a previous post I covered the notion of ``generators of transformations,''[6] and claimed, as an example, that the ``generator of rotation'' is the angular momentum. Actually, I was getting ahead of myself there, and the statement in that context was not entirely correct. As I did not derive this result in that post, I will now, and will hopefully clear things up.

Suppose we have a function $f(x, y, z)$ and we want to rotate it in space around the $z$ axis through some angle $\phi$ to $f(x', y', z')$. To do this, we'll find an ``angle rotation'' operator $\hat{R}_z(\phi)$, which, when applied to $f(x, y, z)$, gives $f(x', y', z')$. That is,

(2)

The shift in coordinates can be derived from regular vector analysis, see Fig. 1 and Ref. [7], applied inside the arguments of the function.

Now the tricky part -- the Taylor expansion. Unlike the last time, where the translated function had a simple argument, here we have the angle inside sines and cosines. Since I'm really too lazy to do this expansion by hand I had Mathematica do it for me:

(3)

In Mathematica's notation, $f$ raised to those parenthetical powers denotes partial derivatives. Say, $f^{(1,0,0)}$ means $\partial f/\partial x$, for example. This expression is a bit of a mess, but we are not completely lost. From our discussion in the beginning of [6], we know that at least one similar operator takes an exponential form. So, we'll guess that here, as well, our operator will take an exponential form. We just need to process the mess of (3) to find that hidden exponential.

The first two terms in the series give us hope. They can be written as

(4)

which are, indeed, what we would expect to see at the beginning of an exponential expansion, where is the generator. Now we check that this keeps up for higher powers.

Continuing with the quadratic term, let's see if we can write as which would be the next term in an exponential. We check:

(5)

which does match the mess for the term in the expansion (3). You can verify on your own that this pattern continues in the higher powers.

Thus we conclude that

(6)

where we now identify

(7)

as the generator of the rotation.

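This generator is the $z$-component of the cross product

$$\mathbf{r} \times \nabla$$

where $\mathbf{r} = (x, y, z)$ and $\nabla = (\partial/\partial x, \partial/\partial y, \partial/\partial z)$. Thus, we can simplify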

(8)
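As a sanity check on the generator, here is a small numerical sketch in Python. It is not the Mathematica calculation from the post. It assumes the counterclockwise convention $x' = x\cos\phi - y\sin\phi$, $y' = x\sin\phi + y\cos\phi$ for the rotated arguments (the opposite convention flips the overall sign), and the test function and helper names are made up for illustration. It compares the derivative of the rotated function with respect to the angle, taken at zero angle, against $(x\,\partial/\partial y - y\,\partial/\partial x)f$, which is the $z$-component of $\mathbf{r} \times \nabla$ acting on $f$.

```python
import math


def f(x: float, y: float) -> float:
    """Arbitrary smooth test function (hypothetical choice)."""
    return x**2 * y + y


def rotated(func, phi: float):
    """Return func with its arguments rotated by phi about the z-axis."""
    def g(x: float, y: float) -> float:
        return func(x * math.cos(phi) - y * math.sin(phi),
                    x * math.sin(phi) + y * math.cos(phi))
    return g


def generator_applied(x: float, y: float) -> float:
    """(x d/dy - y d/dx) f, worked out by hand: df/dx = 2xy, df/dy = x^2 + 1."""
    return x * (x**2 + 1) - y * (2 * x * y)


def phi_derivative_at_zero(func, x: float, y: float, h: float = 1e-6) -> float:
    """Central finite difference of the rotated function with respect to phi."""
    return (rotated(func, h)(x, y) - rotated(func, -h)(x, y)) / (2 * h)


x0, y0 = 1.3, -0.7
print(phi_derivative_at_zero(f, x0, y0), generator_applied(x0, y0))
```

The two printed numbers should agree to within finite-difference error, which is the first-order statement that the rotation operator starts out as "one plus angle times generator."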

If we carry through these same calculations for rotations around the $x$ or $y$ axes (try it yourself!) we get similar generators

(9)

(10)

This allows us to write the rotation operator for a rotation around an arbitrary axis , as

(11)

where for

(12)

is the generator of the transformation.

Commutators in general
In general, rotations do not commute. That is, rotating an object first around the $x$-axis and then around the $y$-axis will give a different result than rotating in the opposite order. You can convince yourself of this by the ultimate hand-waving argument² -- twist your hand around different axes in different orders. Or see Fig. 2.

We'd like to find a way to quantify the difference between applying the rotations in different order, but, for the sake of generality, we'll discuss this for any two arbitrary operators $\hat{A}$ and $\hat{B}$. The most natural way to quantify a difference is to look at, well, the difference. That is, if these operators act on a vector $\mathbf{v}$, we'd like to know what

$$\hat{A}\hat{B}\,\mathbf{v} - \hat{B}\hat{A}\,\mathbf{v} \tag{13}$$

is. The operator that produces this difference (for linear operators) does not depend on the particular vector $\mathbf{v}$, so we'll define the commutator of two operators as

$$[\hat{A}, \hat{B}] \equiv \hat{A}\hat{B} - \hat{B}\hat{A} \tag{14}$$

Thus, a commutator of two operators is another operator which enacts this difference. If the order of operator application does not affect the end result the commutator is 0, and the operators are said to ``commute.''
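To make this concrete, the following is a minimal Python sketch of the hand-twisting argument, using ordinary 3x3 rotation matrices acting on a vector. The helper names (`rot_x`, `rot_y`, `apply`, `commutator_on`) are invented for this example, and the counterclockwise (right-handed) convention for the matrices is an assumption.

```python
import math


def rot_x(angle: float):
    """Right-handed rotation matrix about the x-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(angle: float):
    """Right-handed rotation matrix about the y-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def apply(matrix, vector):
    """Matrix times column vector."""
    return [sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3)]


def commutator_on(a_op, b_op, vector):
    """(AB - BA) acting on a vector: B-then-A minus A-then-B."""
    ab = apply(a_op, apply(b_op, vector))
    ba = apply(b_op, apply(a_op, vector))
    return [p - q for p, q in zip(ab, ba)]


quarter_turn = math.pi / 2
v = [0.0, 0.0, 1.0]  # a vector pointing along z

# Quarter turn about y then x, versus quarter turn about x then y.
print(apply(rot_x(quarter_turn), apply(rot_y(quarter_turn), v)))
print(apply(rot_y(quarter_turn), apply(rot_x(quarter_turn), v)))
print(commutator_on(rot_x(quarter_turn), rot_y(quarter_turn), v))
```

Up to rounding, the two orderings send the same starting vector to two different axes, so the commutator applied to that vector is clearly not zero.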

In quantum mechanics, the issue of non-commuting operators is closely tied to the problem of measurement and the uncertainty principle. For example, if I have a state $|\psi\rangle$ and I want to measure the position I apply the position operator $\hat{x}$. Likewise, if I want to measure the momentum I apply the momentum operator $\hat{p}$. However, in quantum mechanics, the order of taking these measurements affects the results, such that $\hat{x}\hat{p}|\psi\rangle \neq \hat{p}\hat{x}|\psi\rangle$, for example. However, the applicability of commutators is not relegated only to quantum mechanics.

Commutators for rotation
This brings us back to our original question of the commutator of rotations. Because any two rotations through arbitrary angles, done in opposite orders give drastically different results depending on the angles, we'll consider rotations through small angles , such that we can approximate (11) by the first two terms in the expansion:

(15)

This simpler expression makes calculating the commutator much simpler. For rotations around and , the commutator depends only on the commutator of the generators .³ This commutator is the generator of the transformation for ``the difference between the order of the rotations.'' That is

(16)

where is the parameter for this transformation. Then, just as any rotation can then be built up from repeated applications of the generator (as in that exponential), the commutators for larger angles can be built up from repeated applications of the commutators of the generators.

Figure 3: Graphical commutator of . Blue vector is application of either or . Red is further application of to and green is further application of to . Brown is difference between the two.

For ease of illustration, we'll consider small rotations around the - and -axes (i.e. and ). There are two ways to find the commutator . One way is by brute force calculation which I encourage you to try on your own (use the expressions for (9) and (10)). However, I prefer showing it graphically, see Fig. 3. Starting with a vector in the -plane, we apply a small rotation around . This directs the vector upwards (blue in the picture). Then we apply another small rotation around , which directs the vector along the red line.

If we start with the same vector, and apply a small rotation around , the vector follows the blue line again. However, when we then rotate around , the vector veers off in the opposite direction at the same rate. The difference between the red and green vectors, as well as that difference added to the initial vector is shown in brown. The picture illustrates that

(17)

i.e. the generator of rotation around the $z$-axis. Similar relationships

(18)

hold for other permutations of $x$, $y$ and $z$.
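The small-angle statement can also be checked numerically. The sketch below is an illustration under stated assumptions, not the post's own calculation: it uses 3x3 rotation matrices acting on vectors, where the generator of rotations about $z$ is the matrix `J_Z` defined in the code. Sign conventions for operators acting on functions, as used earlier in this post, may differ from the matrix convention by an overall sign. All names here are hypothetical.

```python
import math


def rot_x(angle: float):
    """Right-handed rotation matrix about the x-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(angle: float):
    """Right-handed rotation matrix about the y-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def matmul(p, q):
    """Product of two 3x3 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_commutator(eps: float):
    """R_x(eps) R_y(eps) - R_y(eps) R_x(eps)."""
    xy = matmul(rot_x(eps), rot_y(eps))
    yx = matmul(rot_y(eps), rot_x(eps))
    return [[xy[i][j] - yx[i][j] for j in range(3)] for i in range(3)]


# Matrix generator of rotations about z (acting on vectors).
J_Z = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

eps = 1e-3
commutator = rotation_commutator(eps)
scaled = [[commutator[i][j] / eps**2 for j in range(3)] for i in range(3)]
for row in scaled:
    print([round(entry, 3) for entry in row])
```

Dividing by the square of the angle removes the leading dependence on the rotation size; what remains is, to within corrections of order `eps`, the generator of rotations about the third axis.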

Angular momentum
Looking back at the expression for the generator of rotations (12), we see that we can re-write this in terms of the momentum operator

$$\hat{\mathbf{p}} = -i\hbar\nabla \tag{19}$$

in quantum mechanics:

(20)

where we call the ``quantum mechanical angular momentum'' operator.⁴ Flipping this around to solve for in terms of :

(21)

In other words, the quantum mechanical angular momentum is the same (up to a constant) as the generator of rotations. Thus, the reason that quantum angular momentum has commutation relations (1) is due to the fact that it's simply a generator of rotation masquerading as a quantum mechanical operator.

References
[1] D.J. Griffiths. Introduction to Electrodynamics. Pearson Prentice Hall, 3rd edition, 1999.
[2] F. Schwabl. Quantum Mechanics. Springer, 3rd edition, 2005.
[3] H. Goldstein, C. Poole, and J. Safko. Classical Mechanics. Addison-Wesley, San Francisco, CA, 3rd edition, 2002.
[4] J.J. Sakurai. Modern Quantum Mechanics. Addison-Wesley, San Francisco, CA, revised edition, 1993.
[5] L.D. Landau and E.M. Lifshitz. Quantum Mechanics. Butterworth-Heinemann, Oxford, UK, 3rd edition, 1977.
[6] E. Lansey. The Schrodinger Equation -- Corrections [online]. June 2009. Available from: http://behindtheguesses.blogspot.com/2009/06/schrodinger-equation-corrections.html.
[7] D.C. Lay. Linear Algebra and Its Applications. Addison-Wesley, Reading, MA, 3rd edition, 2003.
[8] C.T.J. Dodson and T. Poston. Tensor Geometry: The Geometric Viewpoint and its Uses. Springer, 2nd edition, 1997.

¹ Everyone congratulate him on the birth of a son!
² Borrowing a joke from Dodson and Poston, [8]
³ If this isn't obvious, work it out for yourself. Hint: The identity operator 1 commutes with everything.
⁴ There are better arguments (see [5]) using symmetry for why should actually be the angular momentum, not just called it, as I've argued, but they require much more talking. And this post is long enough already.

Transverse Electric and Magnetic Fields in a Waveguide

2009-07-29T13:32:00.022-04:00

The Setup

Figure 1: An example of a section of cylindrical waveguide with embedded coordinate axes.

A conducting waveguide is a metal tube -- think pipe or air conditioning duct, for example -- through which electromagnetic waves can propagate. If you want to know what real-life waveguides look like, just do a quick internet image search. We'll assume the length of the tube is oriented along the $z$-direction, see Fig. 1. There is no loss of generality in doing this, since we can always choose a coordinate system as we like. So really, we're picking a coordinate system such that the $z$-axis points along the tube.

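Now, we can decompose the electric field $\mathbf{E}$ and magnetic (induction) field $\mathbf{B}$ vectors into two parts each. One part points along the $z$ (normal) direction while the other is pointing somewhere in the $xy$ (transverse) plane. Explicitly:

$$\mathbf{E} = \mathbf{E}_z + \mathbf{E}_t \tag{1a}$$

$$\mathbf{B} = \mathbf{B}_z + \mathbf{B}_t \tag{1b}$$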

In the first ([1], Eq. (8.24)) and third ([2], Eq. (8.26)) editions of Classical Electrodynamics, J.D. Jackson gives the transverse fields in terms of the $z$-components of the fields. (I have no idea why he left the complete expression out of the second edition.) In the third edition, for example, he assumes plane wave propagation in the positive $z$ direction -- that is an $e^{ikz}$ dependence -- and simply states, without any real explanation:

the transverse fields are

where I've converted his new choice of MKSA units back into the clearer CGS units. However, back in the first edition he does not insist on the assumption of positive propagation. Moreover, he does not just state the fields; he suggests a method for getting them -- namely, manipulation of the curl equations in Maxwell's equations. However, in that edition, he does not expand the curl equations in light of the separation of the fields into transverse and parallel components as he does in the second and third editions.

Because of all this confusion, I'm going to derive the cavity modes fully, starting from Maxwell's equations, once and for all. This derivation is based on a combination of all three editions of Jackson's book. This is a tedious, although not completely trivial exercise. Brace yourselves for quite a bit of algebra.

Maxwell's Equations - The Curls
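Here we'll deal with the two curl equations in Maxwell's equations:

$$\nabla \times \mathbf{E} = -\frac{1}{c}\frac{\partial \mathbf{B}}{\partial t} \tag{2a}$$

$$\nabla \times \mathbf{H} = \frac{4\pi}{c}\mathbf{J} + \frac{1}{c}\frac{\partial \mathbf{D}}{\partial t} \tag{2b}$$

where $\mathbf{H}$ is the magnetic field and $\mathbf{D}$ is the electric displacement field. We will assume the inside of the waveguide has uniform permittivity and permeability, so $\mathbf{D} = \epsilon\mathbf{E}$ and $\mathbf{B} = \mu\mathbf{H}$. Also, we'll assume the absence of any currents, so $\mathbf{J} = 0$ and we'll drop it from here on. Additionally, we'll assume the same sinusoidal time dependence $e^{-i\omega t}$ for both the fields. Thus, the time derivatives ``bring down'' a factor of $-i\omega$.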

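Furthermore, since we're splitting up $\mathbf{E}$ and $\mathbf{B}$ into normal and transverse parts, we'll do the same with the gradient operator $\nabla$:

$$\nabla = \nabla_t + \hat{\mathbf{z}}\frac{\partial}{\partial z}$$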

Because curl equations are annoying, and because we're ultimately looking for an equation for the transverse fields, I'm going to try and get rid of the 's. The symmetry of form in (2) means that we'll only need to do these calculations once; I will use in place of either or .
First, we'll expand :

(3)

We've killed one term through this expansion. However, the leftmost cross product term gives a quantity with only a $z$ component. The right-hand sides of these equations also have a $z$ term. We can get rid of both by multiplying the entire equation(s) by $\hat{\mathbf{z}} \times$:

(4)

Figure 2: Vectors , and .

For why see Fig. 2. Also, we note that

(5)

for the same reason. We could have used the vector multiplication identity

to simplify both of these expressions, or expanded and and carried through even more algebra, but I think the picture is clearer.
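For readers who prefer a numerical check to either the picture or the algebra, here is a small Python sketch. It assumes the identity in question is the standard ``BAC-CAB'' rule $\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) = \mathbf{b}(\mathbf{a} \cdot \mathbf{c}) - \mathbf{c}(\mathbf{a} \cdot \mathbf{b})$, and it checks the special case that crossing a transverse vector twice with $\hat{\mathbf{z}}$ returns minus that vector. The function names and sample vectors are made up for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(p * q for p, q in zip(a, b))


def bac_cab(a, b, c):
    """Right-hand side of a x (b x c) = b (a . c) - c (a . b)."""
    return [b[i] * dot(a, c) - c[i] * dot(a, b) for i in range(3)]


z_hat = [0.0, 0.0, 1.0]
transverse = [2.0, -3.0, 0.0]  # any vector lying in the xy-plane

# z x (z x A_t) should equal -A_t for a transverse vector A_t.
print(cross(z_hat, cross(z_hat, transverse)))

# General check of the identity on arbitrary vectors.
a, b, c = [1.0, 2.0, 3.0], [-1.0, 0.5, 4.0], [2.0, -2.0, 1.0]
print(cross(a, cross(b, c)), bac_cab(a, b, c))
```

The first cross product with $\hat{\mathbf{z}}$ rotates a transverse vector by a quarter turn within the transverse plane; the second does it again, which is why the vector comes back reversed.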

Thus,

(6)

and we can write (2) as

(7a)

(7b)

At this point, it's time to introduce the explicit $z$ dependence and process the $z$ derivatives.

Some $\pm$ and $\mp$ notes
Unlike Jackson, who works with the assumption of upward propagating waves -- i.e. an $e^{ikz}$ dependence -- we'll work with an assumed $e^{\pm ikz}$ dependence, thus allowing both upward and downward propagating waves. Thus, the $z$ derivatives ``bring down'' a factor of $\pm ik$. Whenever we have $\pm$ or $\mp$ the upper symbol is the sign for upward propagating waves, the lower symbol is for downward propagating. Because we'll be mucking about with these plus-minus guys in some algebra, I want to get a few issues out of the way.

The first thing to keep in mind about these plus-minus operators is that an equation like

(8)

is shorthand for two different equations:

(9a)

(9b)

So, there are essentially two ways to approach these things. One way is to carefully trace at the outset what happens to $\pm$ or $\mp$ under various arithmetic operations like addition, multiplication, etc. This has the benefit of being more concise -- you only need to write each equation once -- but it makes errors a lot easier and hides the double-equation nature of the symbol. I'll admit, though, that when I'm writing a paper I'm generally inclined to take this path.

However, for the purposes of this blog post, I'll explicitly carry out the calculations in parallel equations. (This really looks much better in the PDF. If anyone has any suggestions for improving the web version, please, let me know!) The left-hand column corresponds to $e^{+ikz}$, the right-hand column to $e^{-ikz}$. At the end I will also show what the results look like in the shorthand notation and I encourage you to work out the rules on your own. Perhaps in another post I'll address the shorthand notation in detail.

Some more algebra
Now, it's time for some more algebra.¹ Taking the $z$ derivative in (7) gives:

(10)

and

(11)

Solving (10) for gives

(12)

Substituting this into (11) and simplifying:

(13)

Solving this for gives:

(14)

Or, in $\pm$ form:

(15)

In the first edition, Jackson converts the back into to get rid of the , but I feel this confuses things, as this expression only holds for a plane wave in the direction. In any case, we now substitute this expression for back into (12) and simplify:

(16)

Or, in $\pm$ form:

(17)

So, we've finally achieved Jackson's result, allowing for both upward and downward propagating waves.

References
[1] J.D. Jackson. Classical Electrodynamics. John Wiley & Sons, Inc., 1st edition, 1966.
[2] J.D. Jackson. Classical Electrodynamics. John Wiley & Sons, Inc., 3rd edition, 1998.

¹ In case you were wondering why Jackson left out the whole calculation...

Derivative and Integral of the Heaviside Step Function

2009-06-30T10:04:00.011-04:00

The Setup

(a) Large horizontal scale

(b) ``Zoomed in''

Figure 1: The Heaviside step function. Note how it doesn't matter how close we get to $x = 0$; the function looks exactly the same.

The Heaviside step function $H(x)$, sometimes called the Heaviside theta function, appears in many places in physics, see [1] for a brief discussion. Simply put, it is a function whose value is zero for $x < 0$ and one for $x > 0$. Explicitly,

$$H(x) = \begin{cases} 0 & x < 0 \\ 1 & x > 0 \end{cases} \tag{1}$$

We won't worry about precisely what its value is at zero for now, since it won't affect our discussion, see [2] for a lengthier discussion. Fig. 1 plots $H(x)$. The key point is that crossing zero flips the function from 0 to 1.

Derivative -- The Dirac Delta Function

(a) Dirac delta function

(b) Ramp function

Figure 2: The derivative (a), and the integral (b) of the Heaviside step function.

Say we wanted to take the derivative of $H(x)$. Recall that a derivative is the slope of the curve at a point. One way of formulating this is

$$\frac{dH}{dx} = \lim_{\Delta x \to 0} \frac{\Delta H}{\Delta x} \tag{2}$$

Now, for any points $x < 0$ or $x > 0$, graphically, the derivative is very clear: $H$ is a flat line in those regions, and the slope of a flat line is zero. In terms of (2), $H$ does not change, so $\Delta H = 0$ and $dH/dx = 0$. But if we pick two points, equally spaced on opposite sides of $x = 0$, say $x_- = -a$ and $x_+ = +a$, then $\Delta H = 1$ and $\Delta x = 2a$. It doesn't matter how small we make $a$, $\Delta H$ stays the same. Thus, the fraction in (2) is

$$\frac{dH}{dx} = \lim_{a \to 0} \frac{1}{2a} = \infty \tag{3}$$

Graphically, again, this is very clear: $H$ jumps from 0 to 1 at zero, so its slope is essentially vertical, i.e. infinite. So basically, we have

$$\delta(x) \equiv \frac{dH}{dx} = \begin{cases} 0 & x \neq 0 \\ \infty & x = 0 \end{cases} \tag{4}$$

This function is, loosely speaking, a ``Dirac Delta'' function, usually written as $\delta(x)$, which has seemingly endless uses in physics.

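We'll note a few properties of the delta function that we can derive from (4). First, integrating it from $-\infty$ to any $x < 0$:

$$\int_{-\infty}^{x} \delta(x')\,dx' = \int_{-\infty}^{x} \frac{dH}{dx'}\,dx' = H(x) - H(-\infty) = 0 \tag{5}$$

since $H(x) = 0$ for all $x < 0$. On the other hand, integrating the delta function to any point $x$ greater than $0$:

$$\int_{-\infty}^{x} \delta(x')\,dx' = H(x) - H(-\infty) = 1 \tag{6}$$

since $H(x) = 1$ for $x > 0$.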

At this point, I should point out that although the delta function blows up to infinity at $x = 0$, it still has a finite integral. An easy way of seeing how this is possible is shown in Fig. 2(a). If the width of the box is $1/a$ and the height is $a$, the area of the box (i.e. its integral) is $1$, no matter how large $a$ is. By letting $a$ go to infinity we have a box with infinite height that, when integrated, still has finite area.
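The box argument can be tried out numerically. This is only a rough Python sketch: the box is centered on zero, with height `a` and width `1/a` (these symbols and the function names are choices made for this example), and the integral is approximated by a simple midpoint sum.

```python
def box(x: float, a: float) -> float:
    """Box of height a and width 1/a centered on x = 0."""
    return a if abs(x) < 0.5 / a else 0.0


def midpoint_integral(func, lower: float, upper: float, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    dx = (upper - lower) / steps
    return sum(func(lower + (k + 0.5) * dx) for k in range(steps)) * dx


# Taller and narrower boxes, same area.
for a in (1.0, 10.0, 100.0):
    area = midpoint_integral(lambda x: box(x, a), -1.0, 1.0)
    print(f"height {a:6.1f}   width {1 / a:.3f}   area {area:.4f}")
```

The printed area stays at one (up to discretization error) while the height grows, which is the finite-area, infinite-height behavior described above.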

Integral -- The Ramp Function
Now that we know about the derivative, it's time to evaluate the integral. I have two methods of doing this. The most straightforward way, which I first saw from Prof. T.H. Boyer, is to integrate piece by piece. The integral of a function is the area under the curve,¹ and when $x < 0$ there is no area, so the integral from $-\infty$ to any point less than zero is zero. On the right side, the integral to a point $x$ is the area of a rectangle of height 1 and length $x$, see Fig. 1(a). So, we have

$$\int_{-\infty}^{x} H(x')\,dx' = \begin{cases} 0 & x < 0 \\ x & x > 0 \end{cases} \tag{7}$$

We'll call this function a ``ramp function,'' $R(x)$. We can actually make use of the definition of $H(x)$ and simplify the notation:

$$R(x) \equiv \int_{-\infty}^{x} H(x')\,dx' = x\,H(x) \tag{8}$$

since $x \cdot 0 = 0$ and $x \cdot 1 = x$. See Fig. 2(b) for a graph -- and the reason for calling this a ``ramp'' function.
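As a quick check of the piece-by-piece result, here is a small Python sketch comparing a numerical integral of the step function with $x$ times the step function. The value assigned at exactly zero (one half) is an arbitrary assumption that does not affect the integral, the lower limit of $-5$ stands in for $-\infty$, and the function names are invented for the example.

```python
def heaviside(x: float) -> float:
    """Step function; the value at exactly zero is an arbitrary choice here."""
    if x < 0:
        return 0.0
    if x > 0:
        return 1.0
    return 0.5


def ramp(x: float) -> float:
    """Ramp function written as x times the step function."""
    return x * heaviside(x)


def integral_of_step(x: float, lower: float = -5.0, steps: int = 100_000) -> float:
    """Midpoint-rule integral of the step function from lower up to x."""
    dx = (x - lower) / steps
    return sum(heaviside(lower + (k + 0.5) * dx) for k in range(steps)) * dx


for x in (-1.5, 0.5, 2.0):
    print(x, integral_of_step(x), ramp(x))
```

Because the step function is zero everywhere to the left of the origin, pushing the lower limit further out changes nothing.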

But I have another way of doing this which makes use of a trick that's often used by physicists: We can always add zero for free, since $a + 0 = a$. Often we do this by adding and subtracting the same thing,

$$a = a + b - b \tag{9}$$

for example. But we can use the delta function (4) to add zero in the form

$$x\,\delta(x) = 0 \tag{10}$$

Since $\delta(x)$ is zero for $x \neq 0$, the $\delta(x)$ part doesn't do anything in those regions and this expression is zero. And, although $\delta(x) = \infty$ at $x = 0$, $x = 0$ there, so the expression is still zero.

So we'll add this on to $H(x)$:

$$\int H(x)\,dx = \int \left[ H(x) + x\,\delta(x) \right] dx = \int \left[ H(x) + x\frac{dH}{dx} \right] dx = \int \frac{d}{dx}\left[ x\,H(x) \right] dx \tag{11}$$

where the last step follows from the ``product rule'' for differentiation. At this point, to take the integral of a full differential is trivial, and we get (8).

References
[1] M. Springer. Sunday function [online]. February 2009. Available from: http://scienceblogs.com/builtonfacts/2009/02/sunday_function_22.php [cited 30 June 2009].
[2] E.W. Weisstein. Heaviside step function [online]. Available from: http://mathworld.wolfram.com/HeavisideStepFunction.html [cited 30 June 2009].

¹ To be completely precise, it's the (signed) area between the curve and the line .

The Schrödinger Equation - Corrections

2009-06-04T09:43:00.010-04:00

In my last post, I claimed

Additionally, we can extend from here that any quantum operator is written in terms of its classical counterpart by

Peeter Joot correctly pointed out that this result does not follow from the argument involving the Hamiltonian. While it is true that

any arbitrary unitary transformation, $\hat{U}$, can be written as $\hat{U} = e^{i\hat{A}}$
where $\hat{A}$ is an Hermitian operator,

the relationship between a classical quantity and its quantum counterpart is not as straightforward as I claimed. In reality, we can only relate the classical Poisson brackets to the quantum mechanical commutators, and we must work from there. Perhaps I will discuss this further in a later post.

In any case, though, the derivation of the Schrödinger equation only makes use of the relationship between the classical and quantum mechanical Hamiltonians, so the remainder of the derivation still holds. I am leaving the original post up as reference, but the corrected, restructured version (with some additional, although slight, notation changes) is below.

A brief walk through classical mechanics
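Say we have a function $f(x)$ of $x$ and we want to translate it in space to a point $x + a$, where $a$ need not be small. To do this, we'll find a ``space translation'' operator $\hat{T}(a)$ which, when applied to $f(x)$, gives $f(x + a)$. That is,

$$\hat{T}(a) f(x) = f(x + a) \tag{1}$$

We'll expand $f(x + a)$ in a Taylor series:

$$f(x + a) = f(x) + a\frac{df}{dx} + \frac{a^2}{2!}\frac{d^2 f}{dx^2} + \cdots = \sum_{n=0}^{\infty} \frac{a^n}{n!}\frac{d^n}{dx^n} f(x) \tag{2}$$

which can be simplified using the series expansion of the exponential¹ to

$$f(x + a) = e^{a\frac{d}{dx}} f(x) \tag{3}$$

from which we can conclude that

$$\hat{T}(a) = e^{a\frac{d}{dx}} \tag{4}$$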
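Here is a short Python sketch of the same idea for one concrete function. It assumes the translation operator is the exponential of the translation distance times $d/dx$, expanded as a power series, and applies it to $\sin x$, whose derivatives are known in closed form. The function names and the sample values are hypothetical choices for this example.

```python
import math


def sin_derivative(n: int, x: float) -> float:
    """n-th derivative of sin at x; the derivatives cycle with period 4."""
    return (math.sin(x), math.cos(x), -math.sin(x), -math.cos(x))[n % 4]


def translate_sin(x: float, a: float, terms: int = 40) -> float:
    """Sum of a^n / n! times the n-th derivative of sin at x (truncated series)."""
    return sum(a**n / math.factorial(n) * sin_derivative(n, x)
               for n in range(terms))


x, a = 0.4, 2.5  # the shift a is deliberately not small
print(translate_sin(x, a), math.sin(x + a))
```

Even with a shift that is not small, summing enough terms of the series reproduces the translated function, which is the content of writing the operator as an exponential.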

If you do a similar thing with rotations around the $z$-axis, you'll find that the rotation operator is

(5)

where $L_z$ is the $z$-component of the angular momentum.

Comparing (4) and (5), we see that both have an exponential with a parameter (distance or angle) multiplied by something ( or ). We'll call the something the ``generator of the transformation.'' So, the generator of space translation is and the generator of rotation is . So, we'll write an arbitrary transformation operator through a parameter

(6)

where is the generator of this particular transformation.² See [1] for an example with Lorentz transformations.

From classical to quantum
Generalizing (6), we'll postulate that any arbitrary quantum mechanical (unitary) transformation operator through a parameter can be written as

(7)

where is the quantum mechanical version of the classical operator . We'll call this the ``quantum mechanical generator of the transformation.'' If we have a way of relating a classical generator to a quantum mechanical one, then we have a way of finding a quantum mechanical transformation operator.

For example, in classical dynamics, the time derivative of a quantity is given by the Poisson bracket:

(8)

where is the classical Hamiltonian of the system and is shorthand for a messy equation.[2] In quantum mechanics this equation is replaced with

(9)

where the square brackets signify a commutation relation and is the quantum mechanical Hamiltonian.[3] This holds true for any quantity , and is a number which commutes with everything, so we can argue that the quantum mechanical Hamiltonian operator is related to the classical Hamiltonian by

(10)

Time translation of a quantum state
Consider a quantum state at time described by the wavefunction . To see how the state changes with time, we want to find a ``time-translation'' operator which, when applied to the state , will give . That is,

(11)

From our previous discussion we know that if we know the classical generator of time translation we can write using (7). Classically, the generator of time translations is the Hamiltonian![4] So we can write

(12)

where we've made the substitution from (10). Then (11) becomes

(13)

This holds true for any time translation, so we'll consider a small time translation and expand (13) using a Taylor expansion³ dropping all quadratic and higher terms:

(14)

Moving things around gives

(15)

In the limit the right-hand side becomes a partial derivative giving the Schrödinger equation

(16)
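To see why dropping the quadratic and higher terms is harmless in the limit, here is a small numerical sketch. It is only an illustration under my own assumptions: the standard sign convention $i\hbar\,\partial\psi/\partial t = H\psi$, units with $\hbar = 1$, and a state of definite energy `E` so that the Hamiltonian acts as plain multiplication. The function names are hypothetical.

```python
import cmath

HBAR = 1.0   # assumption: units with hbar = 1
E = 2.0      # assumption: a state of definite energy E, so H acts as multiplication by E

def exact_step(psi, dt):
    """Full time translation exp(-i E dt / hbar) applied to psi."""
    return cmath.exp(-1j * E * dt / HBAR) * psi

def first_order_step(psi, dt):
    """Time translation expanded to first order in dt: (1 - i E dt / hbar) psi."""
    return (1 - 1j * E * dt / HBAR) * psi

psi0 = 1.0 + 0.0j
for dt in (0.1, 0.01, 0.001):
    error = abs(exact_step(psi0, dt) - first_order_step(psi0, dt))
    print(dt, error)   # the error shrinks roughly like dt**2
```

Shrinking the time step by a factor of ten shrinks the discrepancy by about a factor of a hundred, so the first-order expression becomes exact as the step goes to zero.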

For a system with conserved total energy, the classical Hamiltonian is the total energy

(17)

which, making the substitution for quantum mechanical momentum and substituting into (16), gives the familiar differential equation form of the Schrödinger equation

(18)

References
[1] J.D. Jackson. Classical Electrodynamics. John Wiley & Sons, Inc., 3rd edition, 1998.
[2] L.D. Landau and E.M. Lifshitz. Mechanics. Pergamon Press, Oxford, UK.
[3] L.D. Landau and E.M. Lifshitz. Quantum Mechanics. Butterworth-Heinemann, Oxford, UK.
[4] H. Goldstein, C. Poole, and J. Safko. Classical Mechanics. Cambridge University Press, San Francisco, CA, 3rd edition, 2002.

¹ $e^{x} = \sum_{n=0}^{\infty} \frac{x^{n}}{n!}$
² There are other ways to do this, differing by factors of in the definition of the generators and in the construction of the exponential, but I'm sticking with this one for now.
³ Kind of the reverse of how we got to this whole exponential notation in the first place...

The Schrödinger Equation

2009-05-26T11:43:00.006-04:00

Update: A corrected and improved version of this post is now up: http://behindtheguesses.blogspot.com/2009/06/schrodinger-equation-corrections.html

[Click here for a PDF of this post with nicer formatting]
notElon asked me to discuss, and to try to derive, the Schrödinger equation, so I'll give it a shot. This derivation is partially based on Sakurai,[1] with some differences.

A brief walk through classical mechanics
Say we have a function of and we want to translate it in space to a point . To do this, we'll find a ``space translation'' operator which, when applied to , gives . That is,

(1)

We'll expand in a Taylor series:

(2)

which can be simplified using the series expansion of the exponential¹ to

(3)

from which we can conclude that

(4)

If you do a similar thing with rotations around the -axis, you'll find that the rotation operator is

(5)

(6)

where is the generator of this particular transformation.² See [2] for an example with Lorentz transformations.

From classical to quantum
In classical dynamics, the time derivative of a quantity is given by the Poisson bracket:

(7)

where is the classical Hamiltonian of the system and is shorthand for a messy equation.[3] In quantum mechanics this equation is replaced with

(8)

where the square brackets signify a commutation relation and is the quantum mechanical Hamiltonian.[4] This holds true for any quantity , and is a number which commutes with everything, so we can argue that the quantum mechanical Hamiltonian operator is related to the classical Hamiltonian by

(9)

specifically.

Additionally, we can extend from here that any quantum operator is written in terms of its classical counterpart by

(10)

So, using (4) the quantum mechanical space translation operator is given by

(11)

and, using (5), the rotation operator by

(12)

or, from (6) any arbitrary (unitary) transformation, , can be written as

(13)

where is (an Hermitian operator and is) the classical generator of the transformation.

Time translation of a quantum state
Consider a quantum state at time described by the wavefunction . To see how the state changes with time, we want to find a ``time-translation'' operator which, when applied to the state , will give . That is,

(14)

From our previous discussion we know that if we know the classical generator of time translation we can write using (13). Well, classically, the generator of time translations is the Hamiltonian![5] So we can write

(15)

and (14) becomes

(16)

This holds true for any time translation, so we'll consider a small time translation and expand (16) using a Taylor expansion³ dropping all quadratic and higher terms:

(17)

Moving things around gives

(18)

In the limit the right-hand side becomes a partial derivative, giving the Schrödinger equation

(19)

For a system with conserved total energy, the classical Hamiltonian is the total energy

(20)

which, making the substitution for quantum mechanical momentum and substituting into (19) gives the familiar differential equation form of the Schrödinger equation

(21)

References
[1] J.J. Sakurai. Modern Quantum Mechanics. Addison-Wesley, San Francisco, CA, revised edition, 1993.
[2] J.D. Jackson. Classical Electrodynamics. John Wiley & Sons, Inc., 3rd edition, 1998.
[3] L.D. Landau and E.M. Lifshitz. Mechanics. Pergamon Press, Oxford, UK.
[4] L.D. Landau and E.M. Lifshitz. Quantum Mechanics. Butterworth-Heinemann, Oxford, UK.
[5] H. Goldstein, C. Poole, and J. Safko. Classical Mechanics. Cambridge University Press, San Francisco, CA, 3rd edition, 2002.

The Dot and Cross Products

2009-04-27T06:30:00.003-04:00

[Click here for a PDF of this post with nicer formatting]

A bad way

The dot product and cross product of two vectors are tools which are heavily used in physics. As such, they are typically introduced at the beginning of first semester physics courses, just after vector addition, subtraction, etc. Although they are not strictly required for these intro courses (see [1], for example), they make the development and computations of work and energy, torque, and electromagnetism far simpler.

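Unfortunately, they are consistently introduced in an awful way: by straight definition. That is, using the dot product for example, for two vectors $\vec{A}$ and $\vec{B}$ they say something like

We define the dot product between $\vec{A}$ and $\vec{B}$ as:
$$\vec{A}\cdot\vec{B} = A_xB_x + A_yB_y + A_zB_z$$
or,
$$\vec{A}\cdot\vec{B} = AB\cos\theta$$
where $\theta$ is the angle between them.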

Then, for the cross product, either they use an equation like the latter of the above two equations coupled with the ``right-hand rule,'' or a strange algebraic combination of the components of $\vec{A}$ and $\vec{B}$, often ``simplified'' with the help of a startling determinant.¹ See [2], [3], [4], [5] and [6] as a few examples. Although a few of these give a geometric interpretation after the fact, it is usually in passing, and does not really contribute to their discussion. These approaches are not limited to textbooks, either. See [7] for an in-class lecture example.

In these examples, the dot product is introduced first and then the cross product. From one standpoint this makes some sense -- the dot product is definitionally simpler and usually easier to calculate. However, from a conceptual standpoint, I think this order is backwards. Furthermore, in my experience, students, by and large, miss the physical and graphical significance of these definitions, and upon encountering the concepts of work or torque later on, take the resulting expressions purely as definitions as well.² This is yet another example of the fact that definition $\neq$ explanation.

Personally, it is my inclination to wait to introduce these products until they're needed, thus motivating the discussion in the first place. However, I do understand the notion of ``getting it over with,'' and it's possible that introducing them as abstract concepts lends itself to easier application of the concepts to general problems. In any case, my discussion follows the latter approach (for better insertion into standard texts) and presupposes understanding of vector basics: addition, decomposition, etc.

A better way

The Cross Product

(a) Geometrical view of the cross product as the parallelogram area.

(b) Graphical derivation of area for two 2D arbitrary vectors, from [8].

Figure 1: The 2D cross product of vectors $\vec{A}$ and $\vec{B}$.

Say we have two vectors $\vec{A}$ and $\vec{B}$ with lengths $A$ and $B$, and we want to find something which is a measure of how much of $\vec{B}$ is perpendicular to $\vec{A}$. Looking at Fig. 1(a), we can see that the area of the parallelogram sided by the two vectors is such a measure. The area of a parallelogram is

(1)

which, for our case, is the same as

(2)

That is, you can only have an area if you have a ``base'' and a ``height'' perpendicular to the base. Thus area is a good measure of perpendicularity.³

There are two different ways of calculating this area. If the angle between the two vectors is $\theta$, as in Fig. 1(a), we see that, choosing $A$ as the ``base'' we can write the ``height'' as $B\sin\theta$. Alternatively, choosing $B$ as the base, we write the perpendicular part as $A\sin\theta$. Then the area is

$$\text{Area} = A(B\sin\theta) = B(A\sin\theta) = AB\sin\theta \tag{3}$$

However, if we don't know the angle between them, we're not completely out of luck. If you look at Fig. 1(b), you can see that for a simple, two-dimensional case, we can express the area in terms of the $x$ and $y$ components of $\vec{A}$ and $\vec{B}$:

(4a)

Of course, I could just as easily have labeled the axes $y$ and $z$, which would give a different area

(4b)

or $z$ and $x$, which would give yet another area

(4c)

If all we've done is relabel our axes, keeping $\vec{A}$ and $\vec{B}$ fixed, then we wouldn't expect the size of these areas to be different -- and they're not. However, although the amount of area is the same, in a way the areas are different in that they're facing different directions in each case. So, we need a way to distinguish these three areas from each other, and from an arbitrarily oriented area. What we'll do is pick a vector perpendicular to both $\vec{A}$ and $\vec{B}$ -- and thus perpendicular to the area of the parallelogram -- with magnitude equal to the area. We'll call this vector

(5)

and say it's the result of a ``cross product'' of $\vec{A}$ and $\vec{B}$. However, in principle, we have a choice of two such perpendicular vectors. In Fig. 1, for example, we could choose the vector pointing in either the $+\hat{z}$ or $-\hat{z}$ direction. Additionally, this arbitrariness can be seen in choosing whether to measure the angle in (3) from $\vec{A}$ to $\vec{B}$ or vice versa.

So, as a matter of convention, we'll decide to always measure angles from the first term in the cross product ($\vec{A}$ in (5)) such that

(6)

so if the fingers in your right hand point along the little arcs we draw for angles, your thumb points in the direction that this vector goes. Thus,

$$\vec{B}\times\vec{A} = -\vec{A}\times\vec{B} \tag{7}$$

since your hand would curl in the other direction. This is called the ``Right-Hand Rule.'' Then, the areas we discussed in equations (4) become

(8a)

(8b)

and

(8c)

where the subscripts tell us which coordinate plane the two crossed vectors are in. Thus, the cross product represents how much these two vectors point in perpendicular directions, and is a signed area vector perpendicular to the plane described by $\vec{A}$ and $\vec{B}$.

(a) Geometrical view of the 3D cross product as the parallelogram area.

(b) Looking at the area from the xy-plane (dashed outline), the yz-plane (shaded) and the zx-plane (solid).

Figure 2: The 3D cross product of vectors $\vec{A}$ and $\vec{B}$ and the decomposed area.

So far, though, we've only discussed vectors which have only two coplanar components. But it's fairly straightforward to generalize to arbitrary 3D vectors. See Fig. 2(a), for example. Here the area vector, and hence the cross product vector, is pointing in a complicated direction. However, we know we can decompose any vector into its $x$, $y$ and $z$ components, and this area vector is no different:

(9)

All we need to do is find out how much area is pointing in each direction. To do that, look at Fig. 2(b). This picture shows what the area between the two vectors looks like if we look only at two coplanar components at a time -- in other words the $x$, $y$ and $z$ components of the area. But we already know what each of these areas is from (8)! So, then we can combine these equations and write the cross product

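$$\vec{A}\times\vec{B} = (A_yB_z - A_zB_y)\,\hat{x} + (A_zB_x - A_xB_z)\,\hat{y} + (A_xB_y - A_yB_x)\,\hat{z} \tag{10}$$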
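The recipe above, one signed area per coordinate plane, translates almost word for word into code. The following is a minimal sketch of my own (the function names and the sample vectors are arbitrary choices, and `dot` and `norm` are just helpers for the checks). It builds the cross product from the three planar areas and then checks the two properties we asked for: the result is perpendicular to both vectors, and its length is the parallelogram area $AB\sin\theta$.

```python
import math

def cross(a, b):
    """Cross product assembled from the signed areas seen in each coordinate plane."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,   # area seen in the yz-plane -> x component
            az * bx - ax * bz,   # area seen in the zx-plane -> y component
            ax * by - ay * bx)   # area seen in the xy-plane -> z component

def dot(a, b):
    """Helper: sum of products of matching components."""
    return sum(p * q for p, q in zip(a, b))

def norm(a):
    """Helper: length of a vector."""
    return math.sqrt(dot(a, a))

A = (1.0, 2.0, 3.0)
B = (-2.0, 0.5, 4.0)
C = cross(A, B)

cos_theta = dot(A, B) / (norm(A) * norm(B))
sin_theta = math.sqrt(1 - cos_theta ** 2)

print(C)                                         # the area vector
print(dot(C, A), dot(C, B))                      # both vanish: C is perpendicular to A and B
print(norm(C), norm(A) * norm(B) * sin_theta)    # both are the parallelogram area
```

Swapping the arguments, `cross(B, A)`, flips the sign of every component, which is the right-hand rule convention at work.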

The Dot Product

(a) When B < A.

(b) When B > A.

Figure 3: The projection of vector on to vector .

Having discussed the perpendicularity of two vectors, it's natural to ask if there's a similar measure of the parallelity of two vectors. There are two ways of doing this. The way I'll do it first is explicitly geometrical, the second way is only implicitly geometrical.

Say we have two vectors $\vec{A}$ and $\vec{B}$ again, and we want to know how much of $\vec{B}$ is pointing (projected) along $\vec{A}$. From Fig. 3 we see that, with $\theta$ the angle between them, this is equal to

$$B\cos\theta \tag{11}$$

Similarly, the amount of $\vec{A}$ that is projected along $\vec{B}$ is

$$A\cos\theta \tag{12}$$

Now, it would be nice if we could have one statement which somehow combined these two statements and gave a measure both of how much of $\vec{B}$ is pointing along $\vec{A}$ and of how much of $\vec{A}$ is pointing along $\vec{B}$; that is, a measure of how much these two vectors point in the same direction. Additionally, since (2) used a multiplicative combination of the two vectors as a measure of perpendicularity, we'll try a similar multiplicative measure here, as well.

If we multiply (11) by $A$ and (12) by $B$ we can write a single, symmetric statement

$$\vec{A}\cdot\vec{B} = AB\cos\theta \tag{13}$$

and say it's the result of a ``dot product'' of $\vec{A}$ and $\vec{B}$, which amounts to multiplying together the parallel parts of two vectors. Here, too, if we don't know the angle between them, we're not out of luck. For a vector written in component form, it's straightforward to multiply the parallel parts together:

$$\vec{A}\cdot\vec{B} = A_xB_x + A_yB_y + A_zB_z \tag{14}$$
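To check that the component form and the angle form really are the same thing, here is a small sketch of my own. The lengths and directions are arbitrary values picked for illustration: I build two vectors in the $xy$-plane from known lengths and directions, so the angle between them is known in advance, and compare the two ways of computing the product.

```python
import math

def dot(a, b):
    """Component form: multiply the parallel parts and add them up."""
    return sum(p * q for p, q in zip(a, b))

# Two vectors in the xy-plane built from known lengths and directions.
A_len, A_angle = 3.0, math.radians(20)
B_len, B_angle = 2.0, math.radians(80)
A = (A_len * math.cos(A_angle), A_len * math.sin(A_angle))
B = (B_len * math.cos(B_angle), B_len * math.sin(B_angle))

theta = B_angle - A_angle   # angle between the two vectors (60 degrees)

print(dot(A, B))                           # component form
print(A_len * B_len * math.cos(theta))     # length-and-angle form
```

Both lines print a value close to 3, as expected for lengths 3 and 2 at 60 degrees, without the component form ever being told the angle.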

However, unlike the cross product which gave us an actual area with a natural direction, this area-like structure is actually a measure of ``non-area'' and doesn't really have a natural direction. Although we could, completely arbitrarily, define a direction for this dot product,⁴ and thus make it a vector as well, to the best of my knowledge such a quantity does not have any uses in physics, so we'll leave it alone and treat it only as a number (scalar).

Alternatively, we know that the largest area possible between two vectors occurs when they are perpendicular to each other, where the area is $AB$ (you can also see this from (3)). If we are interested in the maximal ``amount perpendicular'' we can write

(15)

where they are squared to take care of sign problems. Now, when they are completely parallel there is no area, and we're left only with non-area, which, also, can't be larger than the total maximum area, so

(16)

as well.

Then using a rough analogue to the Pythagorean theorem we see that

(17)

which, choosing the positive root, is the same as (13).
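The ``rough analogue to the Pythagorean theorem'' can be checked numerically. The sketch below is my own illustration, under the assumption that the relation in question is (area)$^2$ + (non-area)$^2$ = (maximum area)$^2$, where the maximum area is the product of the two lengths. The vectors are arbitrary, and `dot` and `cross` are the usual component formulas.

```python
def dot(a, b):
    """Sum of products of matching components."""
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    """Area vector of the parallelogram spanned by a and b."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

A = (1.0, 2.0, 3.0)
B = (-2.0, 0.5, 4.0)

area_sq = dot(cross(A, B), cross(A, B))   # (amount perpendicular) squared
non_area_sq = dot(A, B) ** 2              # (amount parallel) squared
max_area_sq = dot(A, A) * dot(B, B)       # (length of A times length of B) squared

print(area_sq + non_area_sq, max_area_sq)
```

The two printed numbers agree, so whatever part of the maximum area is not taken up by the cross product is accounted for by the dot product.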

References

[1] F.W. Sears and M.Z. Zemansky. University Physics. Addison-Wesley, Reading, MA, 2nd edition, 1955.

[2] D. Halliday, R. Resnick, and J. Walker. Fundamentals of Physics. John Wiley & Sons, Inc., 7th edition, 2005.

[3] G.R. Fowles and G.L. Cassiday. Analytical Mechanics. Thomson Brooks/Cole, Belmont, CA, 7th edition, 2005.

[4] J.R. Reitz, F.J. Milford, and R.W. Christy. Foundations of Electromagnetic Theory. Addison-Wesley, 4th edition, 1992.

[5] D.J. Griffiths. Introduction to Quantum Mechanics. Pearson Prentice Hall, 2nd edition, 2005.

[6] J. Stewart. Multivariable Calculus. Brooks/Cole Publishing Company, Pacific Grove, CA, 4th edition, 1999.

[7] W. Lewin. Lec 3 | 8.01 Physics I: Classical Mechanics, Fall 1999 [online]. Available from: http://www.youtube.com/watch?v=fwNQKjTj-0w#t=13m45s [cited 16 March 2009].

[8] C.T.J. Dodson and T. Poston. Tensor Geometry: The Geometric Viewpoint and its Uses. Springer, 2nd edition, 1997.

¹ Of course, not all first semester physics students even know what a determinant is, but that is not my point.

² Work $W = \vec{F}\cdot\vec{d}$, and torque $\vec{\tau} = \vec{r}\times\vec{F}$

³ Another way to approach this is to start by calculating the area, and then explain that this can also be viewed as a measure of perpendicularity.

⁴ i.e. along either $\vec{A}$ or $\vec{B}$, or along a line midway between them, or perpendicular to them, or some other arbitrary choice

New Layout

2009-04-20T16:20:00.000-04:00

I'm trying a new, wider layout for the blog to allow for wider equations. I'm curious what you think about the new layout. Please, let me know in the comments, via email, or via the poll on the top-right of the page. For comparison, this is what the site used to look like:

The Quantum Harmonic Oscillator Ladder Operators

2009-03-30T07:00:00.005-04:00

[Click here for a PDF of this post with nicer formatting]

The Setup

As a first example, I'll discuss a particular pet peeve of mine, which is something covered in many introductory quantum mechanics classes: the algebraic solution to the quantum (1D) simple harmonic oscillator.¹ The one-dimensional, time-independent Schrödinger equation is:

(1)

where $H$ is the Hamiltonian of the system. Explicitly, this Hamiltonian is

$$H = \frac{p^2}{2m} + V(x) \tag{2}$$

where $p$ is the particle's momentum, $m$ is its mass and $V(x)$ is the potential the particle is placed into. The potential associated with a classical harmonic oscillator is

(3)

where . For the sake of convenience, so we don't get bogged down with various factors,² we'll consider . Then, if we substitute (3) back into (2) we write the Hamiltonian as:

(4)

A bad way

Figure 1: How Einstein developed his famous expression.

Now, at this point, many texts (see [1] or [2], for example) define, with no motivation other than future ``convenience'', two operators

(5a)

(5b)

and proceed to show how these can be used to simplify the Hamiltonian and easily solve the problem. While it is an elegant and quick solution, this presentation is completely useless. I find it highly unlikely that Dirac sat down to solve this problem and tried a whole series of random operators

and so on, along with their complex conjugates, until he lucked out with the solution in (5), see Fig. 1.

A better way

What is far more likely is the argument given by Griffiths in [3], which I'll loosely follow. He presents a rationale and a method for approaching this problem. Namely, he suggests factoring the Hamiltonian (4) into terms linear in $x$ and $p$. If we ignore the operator properties of $x$ and $p$ momentarily, and consider the classical quantities, we can factor the Hamiltonian

(6)
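The classical factorization is easy to check with ordinary complex numbers. The sketch below is my own illustration and makes an assumption about units: with all constants set to one, I take the Hamiltonian to be $\tfrac{1}{2}(p^2 + x^2)$. The function names are hypothetical.

```python
def hamiltonian(x, p):
    """Classical oscillator Hamiltonian with all constants set to one (an assumption)."""
    return 0.5 * (p ** 2 + x ** 2)

def factored(x, p):
    """The same quantity written as a product of two complex-conjugate linear factors."""
    return 0.5 * (x + 1j * p) * (x - 1j * p)

for x, p in [(3.0, 4.0), (-1.5, 0.25), (0.0, 2.0)]:
    print(hamiltonian(x, p), factored(x, p))   # the imaginary part of the product is zero
```

Because `x` and `p` are plain numbers here, the order of the two factors is irrelevant. For quantum operators the order does matter, which is why the factorization is only a starting point for the guess and not the end of the story.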

Now we see a reason why (5) makes sense to try. Each term, either on the right or left side of (6),³ contains two terms which are complex conjugates of each other. If this were a classical problem, we could, in principle, make a change of variables converting the Hamiltonian to something of the form

(7)

where is any of the combinations of and in (6). Although this is not a ``canonical transformation,'' the symmetric form⁴ of the Hamiltonian allows us to reduce the Hamiltonian from a function of two dynamical variables to a function of a single dynamical variable.

Switching back to quantum mechanics, we now see a rationale for choosing and as we did.⁵ Although deciding which variable to attach the to and its choice of sign is a guess,⁶ we now have a general method for approaching Hamiltonians that look like they might be easily factored classically -- try using the classical factorizations with quantum quantities and see what happens.

References
[1] J.J. Sakurai. Modern Quantum Mechanics. Addison-Wesley, San Francisco, CA, revised edition, 1993.
[2] F. Schwabl. Quantum Mechanics. Springer, 3rd edition, 2005.
[3] D.J. Griffiths. Introduction to Electrodynamics. Pearson Prentice Hall, 3rd edition, 1999.
[4] P.A.M. Dirac. The Principles of Quantum Mechanics. Oxford University Press, USA.

¹ I'll be discussing this in the boring algebraic sense of symbols and whatnot, leaving off the geometric/visual interpretation of the algebra for another time.
² Or maybe I'm just lazy
³ And, since this is a classical problem, the order does not matter, either.
⁴ Yes, it's only up to a constant which I've set to one, but you can still symmetrize things by changing to unitless variables, see [2].
⁵ Recall that the for a quantum-mechanical operator/matrix serves the role of the in (7).
⁶ It actually does not matter to the solution of the problem. The only change is which operator acts to `step up' the state. Dirac actually defined his operators with the attached to the variable, see [4].

Behind the Guesses

2009-03-23T15:39:00.010-04:00

[Click here for a PDF of this post with nicer formatting]
Why I'm starting this blog

Figure 1: How physics does not work. Comic (cropped) from [1].

If you sit in enough physics lectures (and, to a lesser degree, mathematics lectures), and/or read enough books, you're bound to encounter the phrase ``We guess the solution...'' or ``If we define ... it turns out that...'' Now, this is not inherently problematic. After all, if the goal is only to solve some problem which does not have an obvious solution, or prove a random mathematical statement, these approaches will suffice. Then, the student or reader, assuming they memorize the solution (or remember where to find it), will now be exquisitely equipped to solve exactly the same problem.

However, this approach does not teach students any skills, tools or methods for approaching new and exciting problems. Additionally, I find, it is difficult to gain any meaningful understanding of the material through the ``Guess and Show'' method. It's possible that for many people definition is the same as explanation, but from my personal experience, and from discussions with other students, it seems that such people are not the majority of physics students.

So I'm starting this blog to attempt to show some of the reasoning that goes into these guesses. After all, the great physicists didn't just pull their solutions out of their butt, see Fig. 1. Even if getting the correct solution ultimately was a result of guesswork, physicists don't sit around solving problems by randomly throwing darts at variables to see what sticks.

Now, I am not claiming that science should be taught according to the historical development of the theories. In fact, usually doing that leads to needless confusion. What I am saying, however, is that even if we know the answer already, it is still essential to explain some rationale as to why such an answer makes sense -- even if it's not the original idea behind the guess.

What to expect
So what can you expect to see on this blog? Well, a lot of quantum mechanics, to be sure. The conceptual difficulty and widespread lack of understanding¹ of this subject, together with the plethora of ``known solutions'' make it a prime target. But I'll hopefully cover topics in electrodynamics, classical mechanics, thermodynamics, and so on. Maybe even some mathematics, if I'm feeling up for it.

In general, I will not carry through the solution to the end; you can always follow my references to see the complete thing. Instead I will try to provide some physical or mathematical reasoning behind some so-called ``guesses'' which ``turn out'' to be the correct solution. Additionally, with each post I will try to highlight a key methodological point that going behind the guesses gives us. These points will be written in a different typeface so they are easy to spot.

I do not expect to always get things right. This is a learning experience for me, too. If I make mistakes, please point them out in the comments or email me. If I make corrections, I will leave the original, unedited version up as reference -- sometimes we learn more from mistakes! I will try to post about once a month until I run out of ideas. I suggest you use one of the subscription options on the side so you don't have to keep checking the site to see if I've updated. So, if you have a particular solution (or problem) that's always bothered you, let me know, and I'll see if I can come up with something. And if I can't I'll post it anyway, and we'll see if anyone out there in the vast Intertubes who stumbles across the blog can help out.

Each blog post will have a link to a nicely formatted PDF with the same content. Come back here this Monday (March 30, 2009) for my first real post.

References

[1] Z. Weiner. Saturday morning breakfast cereal [online]. March 2009. Available from: http://www.smbc-comics.com/index.php?db=comics&id=1452 [cited 9 March 2009].

¹ I don't claim to understand quantum mechanics. But at least I don't pretend to.