Of Galileo, Groups and What’s So Special about the Speed of Light

When I was ten years old, I first read about Einstein’s famous, but non-general, equation $E=m\,c^2$. The American children’s series “How and Why Wonder Books”, number 5015, “Atomic Energy”, naturally told me that $c$ stood for the speed of light. I was utterly mystified. What on earth has light got to do with energy locked up in nuclei, for example? As I thought about this some more, I slowly decided that this equation must mean that light had a fundamental role in existence, that perhaps everything was made of a kind of light. Much to my chagrin, I swiftly found that I was hardly alone or original in these thoughts. Indeed several of my fellow teenage students put the very same question: “what’s the speed of light got to do with the energy content of matter?” In the decades since, almost every teenager interested in physics whom I have met seems to arrive at this same question independently: these bright young people would number at least in the tens. And the needless mystery that the “speed of light” bears with it drives them nearly to distraction. Light still seems to be centerstage for much of elementary relativity education. This is because it seems customary to stick to something near Einstein’s own, original, approach[1] to the subject and to begin from the first and second special relativity postulates. In the past few years I have been active on the internet forum Physics Stack Exchange and the relativity questions there suggest that the study of light is still very much the mainstream pedagogical approach to special relativity. There are notable exceptions which begin special relativity with an approach much more like Ignatowski’s[2]; see, for example, the excellent introduction to special relativity from scratch in Crowell’s book[3] or those cited in §2.

When I was fifteen, I read about special relativity and followed essentially Einstein’s first, 1905 conception[1] of it wherein light was centerstage, time dilation was derived from bouncing pulses of light held between moving mirrors and Maxwell’s equations were put on some kind of exalted pedestal as the ultimate truth: for me at that age this was a desirable situation for in those days, it was a pretty good brag for a fifteen year old to say he had grasped Maxwell’s equations, or at least thought he had, and so special relativity was a chance to apply my newly gotten knowledge. In many ways my belief that I had mastered the notions of special relativity through Einstein’s Maxwellian approach was mistaken; not wrong, in the sense of being physically incorrect, but it is certainly not how I now describe special relativity when asked the question, “what’s so special about the speed of light in relativity?”. To which I would give (and have on several occasions given) an answer summarized by:

The constant $c$ encodes certain symmetries of our universe that arise from the equivalence of physical laws described from different frames in uniform relative motion, as well as the way wherein causality manifests itself in our universe. It comes to mean a speed that is observed to be the same by all inertial observers and that only objects without rest mass can be observed to move at this speed (and indeed must always be observed to move at this speed). The symmetries we speak of were perfectly understood by Galileo, but he added the further assumption that time intervals between two given events would be independent of the observer. Special relativity can be thought of as what arises from Galileo’s relativity if one relaxes this last assumption, and then $c$ determines which of a whole family of possible transformation laws consistent with this generalized Galilean relativity is followed by our universe. So $c$ is primarily about symmetry and causality, not light. It is experimentally found, however, that the speed of light is the same as this speed and this last fact can be taken as an experimental observation that light is mediated by a massless particle.

Einstein and his theory of special relativity are still spoken of in our culture (hopefully, though, not in our physics departments), fully one hundred and ten years after the fact and notwithstanding overwhelming experimental evidence, as being abstract and at the theoretical fringe. The idea of special relativity as being unfathomable is like the attitude “maths is hard” or “boring” that can damage child numeracy, whereas one needs to learn relativity as a centerpiece of any physics education and as an expression of the most basic symmetries of the everyday World. Hence, like Ignatowski, we shall hereafter use Galileo’s relativity postulate alone together with isotropy and continuity assumptions:

Principle of relativity (Galileo) [4]: “No experiment can measure the absolute velocity of an observer; the results of any experiment performed by an observer do not depend on his speed relative to other observers who are not involved in the experiment.” (italics mine)

treating Einstein’s postulate as a summary of the Michelson-Morley experiment; more of this in §6. As we shall show, Galileo’s postulate together with some highly intuitive assumptions but without Einstein’s postulate foretells the possibility of a universal invariant speed alongside the possibility that we wontedly think of as “Galilean relativity”, wherein this universal speed is infinite. This speed also comes to mean the maximum speed that a cause-effect relationship can propagate at; we’ll make that statement more precise in §3. This prediction follows independently of light. The fact that the speed of light is experimentally found to be invariant means that the speed of light must be the same as the invariant speed, and special relativity then becomes a tool to study the properties of light with, rather than a theory predicated on light itself.

What does Galileo’s postulate mean really? When special relativity is first introduced to students, it is fair to say that many of them have not had wide practice at reconstituting glib principles and axioms like Galileo’s principle of relativity to grasp their full meaning. So, in the spirit of this paper, seeking to emphasize the soundness of everyday physical intuition, it’s helpful to lay before students some of the thoroughly everyday observations summarized by this principle. One really can’t do better in this task than Galileo himself as his character Salviati tells the Allegory of Salviati’s Ship[5]:

Shut yourself up with some friend in the main cabin below decks on some large ship, and have with you there some flies, butterflies, and other small flying animals. Have a large bowl of water with some fish in it; hang up a bottle that empties drop by drop into a wide vessel beneath it. With the ship standing still, observe carefully how the little animals fly with equal speed to all sides of the cabin. The fish swim indifferently in all directions; the drops fall into the vessel beneath; and, in throwing something to your friend, you need throw it no more strongly in one direction than another, the distances being equal; jumping with your feet together, you pass equal spaces in every direction. When you have observed all these things carefully (though doubtless when the ship is standing still everything must happen in this way), have the ship proceed with any speed you like, so long as the motion is uniform and not fluctuating this way and that. You will discover not the least change in all the effects named, nor could you tell from any of them whether the ship was moving or standing still. In jumping, you will pass on the floor the same spaces as before ….throwing something to your companion, you will need no more force to get it to him whether he is in the direction of the bow or the stern….. The droplets will fall as before into the vessel beneath without dropping toward the stern …..the fish in their water will swim toward the front of their bowl with no more effort than toward the back…. Finally the butterflies and flies will continue their flights indifferently toward every side, nor will it ever happen that they are concentrated toward the stern, as if tired out from keeping up with the course of the ship …. The cause of all these … effects is the fact that the ship’s motion is common to all the things contained in it, and to the air also. 
That is why I said you should be below decks; for if this took place above in the open air, which would not follow the course of the ship, more or less noticeable differences would be seen in some of the effects noted.

Galileo here poetically broadens the reach of the Copernican idea that we on Earth dwell in no particularly privileged or atypical place in the universe to the idea that also there is no particularly privileged inertial frame of reference. Galileo’s famous principle underwent a hibernation whilst luminiferous aether theory was taken seriously, as the latter implies a detectability of one’s uniform motion relative to the aether. There are, of course, known conceptual difficulties defining “uniform” motion, but without an exposition of general relativity there is a simple solution for students. We simply say an inertial frame is one wherein any system of accelerometers at rest relative to the frame registers unanimously nought. This should be satisfying for students in 2015; invite your students to load an accelerometer monitor app onto their phone or tablet computer and explore it, suggesting the experiment of dropping the device onto a pillow so that they can witness the accelerometer detecting the inertial, freefalling frame. But, once the existence of inertial frames is accepted, we shall, like Galileo, show that (1) Galileo’s relativity principle, (2) Copernican notions of invariance of physics with respect to co-ordinate translation and (3) intuitive continuity and monotonicity of “swiftness” assumptions together suffice to imply the correct form of the co-ordinate transformation between inertial frames. Copernican spatial isotropy assumptions then narrow down the free parameters to three independent, unspecified constants: the universal “Ignatowski Speed” $c_I$, a signature $\zeta=\pm1$ and a parameter defining rotation between co-ordinate axes of relatively uniformly moving frames in the plane orthogonal to the motion and proportional to the motion’s rapidity. If we, like Galileo and Newton, further impose an assumption of absolute time, we find that $c_I$ is infinite.
Lastly, with a finite $c_I$, causality considerations define which signature prevails in our universe. I propose henceforth to call the speed $c_I$ the universal Ignatowski speed, to emphasize its universality and its independence from the particular phenomenon of light and the so-called second postulate of special relativity. It also seems fitting to honor Vladimir Ignatowski, largely forgotten amongst mainstream physicists and mathematicians, who was the first person to derive special relativity in the way we are studying in this paper.

The Group and Linearity: Sailing on Salviati’s Ship with Copernicus

There are other approaches to the derivation that follows: a survey of the literature on Ignatowski’s approach is given in the paper by Liberati, Sonego and Visser[6]; two of the more readable ones cited by this paper are those by Lévy-Leblond[7] and the approach in Rindler’s book[8]. Unlike all of these, the present paper will not make differentiability assumptions for co-ordinate transformations.

“Galilean Relativity” is wontedly understood as essentially the vectorial velocity addition rule used to calculate transformation of velocities in different inertial frames or, equivalently, as the ten dimensional Galilean group of all possible transformations between the co-ordinate systems of “inertial frames” (as defined, e.g., by our accelerometer test of §1) comprising rotations and translations, each in three dimensions, Galilean boosts in three dimensions, following the vector velocity addition law, and displacement in time. One can represent the Galilean group as a group of $5\times5$ real matrices acting on homogeneous co-ordinates containing the three spatial co-ordinates, the time co-ordinate and a unit, and this should be demonstrated to students. In my experience as a volunteer with my local primary school’s extra curricular science program, small children about the age of eight to ten or older can begin to build an intuition for one-dimensional Galilean Relativity and affine co-ordinates by counting along number lines that are imagined to slide at a set number of notches per unit time relative to one another. This relativity that seems to arise from mostly unconscious intuitions we all seem to have from a very young age is indeed in keeping with all of the Salviati Allegory, the Copernican and continuity assumptions of §1, and, if we further require absolute time, i.e. that the time interval between two events is measured to be the same by all inertial observers, then this relativity is the unique relativity in keeping with these assumptions. When we make the uniquely Einsteinian step of relaxing the absolute time assumption, special relativity follows as a “generalized Galilean Relativity”. Let us see how this comes about.
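Before doing so, the $5\times5$ representation mentioned above can be made concrete for a classroom. What follows is only a minimal sketch under stated assumptions: the co-ordinate ordering $(x,\,y,\,z,\,t,\,1)$, matrices acting on column vectors from the left, the sign convention $x^\prime = x + v\,t$ and the helper names `galilean`, `matmul` and `apply` are all choices made for this illustration, and rotations are left out for brevity.

```python
# Sketch: Galilean boosts and translations as 5x5 matrices acting on
# homogeneous co-ordinates (x, y, z, t, 1).
# Assumptions of this illustration (not taken from the text): the column
# ordering, column vectors acted on from the left, and x' = x + v t.

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(m, v):
    """Matrix m acting on the column vector v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

def galilean(v=(0, 0, 0), a=(0, 0, 0), s=0):
    """Boost by velocity v, spatial translation a, time translation s:
    x' = x + v t + a,  t' = t + s  (no rotation)."""
    return [
        [1, 0, 0, v[0], a[0]],
        [0, 1, 0, v[1], a[1]],
        [0, 0, 1, v[2], a[2]],
        [0, 0, 0, 1,    s],
        [0, 0, 0, 0,    1],
    ]

boost_u = galilean(v=(3, 0, 0))
boost_w = galilean(v=(4, 1, 0))

# Composing two boosts multiplies their matrices; the velocities add as vectors.
composed = matmul(boost_w, boost_u)

event = [2, 5, 7, 10, 1]          # x=2, y=5, z=7, t=10, trailing unit
moved = apply(composed, event)
```

With these sample numbers `composed` equals `galilean(v=(7, 1, 0))`, which is the vector velocity addition rule expressed as matrix multiplication, and the time entry of `moved` is unchanged, which is the absolute time assumption.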

Although the Salviati Allegory describes measurements from within one’s own frame, it also describes transformations between frames through its particular heed to motions within one’s own laboratory: the motions, relative to us as the below-decks seafarers, of the allegory’s butterflies and fishes are unchanged by inertial transformation. For example, instead of fishes in one’s own laboratory, one can look at high speed muons, which, by decay lifetime statistics defined by a mean lifetime $\tau_m$, can report to us their lengthened lifetime as measured by the laboratory’s time co-ordinate, and this lengthening is unchanged by any uniform motion of our laboratory. More abstractly, the allegory calls on us to imagine inertial frames “nested” inside one another (Fig. (1)): any frame can be a “below-decks ship frame” and any other inertial frame our “fish frame” nested “inside” the other. With this observation of Salviati’s Ship in mind, we imagine the co-ordinates of events as measured from different inertial frames. Thus, let Frame 2 be our “ship frame” and Frame 3 our “fish frame”; Frame 2 measures the spacetime co-ordinates of an event $A$ in the co-ordinates $X_2$ of its own frame. It can also calculate the co-ordinates $X_3$ that Frame 3 would measure in Frame 3’s own co-ordinates for $A$, i.e. $X_3=T_{3\,2}(X_2)$, where $T_{3\,2}$ is the transformation between co-ordinates of an event in the two inertial frames in question, whose structure it is our aim to find.

Figure 1: Nested Laboratories, each in uniform relative motion to the other, generalizing Salviati’s Allegory

Now we think of Frame 1 as our “ship frame”, Frame 2 our “fish frame” and take heed that Frame 1 measures the co-ordinates of $A$ as $X_1$ and can also calculate the co-ordinates $X_2$ that Frame 2 would measure as $X_2=T_{2\,1}(X_1)$. From this last calculation, it can then apply the transformation $X_3=T_{3\,2}(X_2)$ to get $X_3=T_{3\,2}(T_{2\,1}(X_1))$ to find the co-ordinates in Frame 3. The transformation $T_{3\,2}$ is the same transformation as Frame 2 would use to calculate $X_3=T_{3\,2}(X_2)$ with: the transformation is unaltered by the motion of Frame 2 relative to Frame 1, or indeed any other putative “absolute” motion of Frame 2, in accordance with Galileo’s principle. If Galileo’s principle did not hold, the transformation would in general need to be some function $\tilde{T}_{3\,2}(X_2,\,V_{1\,2})$ of both the co-ordinates $X_2$ in Frame 2 and parameters $V_{1\,2}$ describing the putative absolute motion of Frame 2 relative to some other preferred reference frame. In summary, Galileo’s principle means that the transformation between any two inertial frames is wholly defined by the relative motion between the two frames alone and the spacetime co-ordinates of an event in any inertial frame serve as a kind of state, a complete specification of that event. So, given a set of inertial frames in relative uniform motion, we can define co-ordinate transformations $T_{k\,j}$ between any pair $(j,\,k)$ of them and the set $\mathfrak{L}$ of all such transformations is closed under the operation of function composition. Given our discussion so far, they compose following the rule $T_{j\,\ell} = T_{j\,k}\circ T_{k\,\ell}$, when we think of a single transformation as being made up of two or more “hops” between intermediate inertial frames.
Each equation of the form $T_{j\,\ell} =t \circ T_{k\,\ell}$ has a unique solution: it is $t=T_{j\,k}$, the transformation wholly defined by the relative motion between Frames $j$ and $k$, and the pair $(\mathfrak{L},\,\circ)$ is therefore a quasigroup. There is, of course, the identity transformation $\mathrm{id}$ between any frame and itself, so that our quasigroup is seen to be a loop, i.e. a quasigroup with identity, and this fact also implies that each transformation has a unique left and a unique right inverse, if we think of the identity as comprising the “hop” $T_{j\,k}$ followed by the “hop” $T_{k\,j}$. Lastly, if we think of a transformation’s being made up of three or more “hops” between two or more intermediate inertial frames, the composition can depend only on the beginning and end frame, being defined by the relative motion between these two, so the composition is both altogether independent from how we bracket the transformations for the constituent hops and independent of the infinite choices of the intermediate frames one could make for a decomposition. Inertial motions are true motion states: we can think of them as nodes on a state transition diagram (Fig. (2)) and they are specified by the nodes alone: any sequence of co-ordinate transformation hops, thought of as paths comprising links through the diagram, leading from one state to the other must be independent of path. Composition is therefore associative and so our pair $(\mathfrak{L},\,\circ)$ is in fact a group. The grouphood of $(\mathfrak{L},\,\circ)$ is how the Galileo relativity principle encodes itself into the algebra of co-ordinate transformations.

Figure 2: Inertial Frames Define Motion States: Independent of Paths Between Them in State Transitions
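The path independence pictured in the state diagram can be tried out on a toy instance. The sketch below is purely illustrative: it lets one-dimensional Galilean maps stand in for the as-yet-unknown transformations $T_{k\,j}$, and the frame velocities, the dictionary `velocity` and the helpers `T` and `compose` are hypothetical names and numbers chosen for this example.

```python
# Illustrative instance only: one-dimensional Galilean maps stand in for the
# unknown transformations T_{k j}. Frame labels and velocities are hypothetical.

velocity = {1: 0, 2: 3, 3: -5, 4: 11}   # velocity of each frame as seen from Frame 1

def T(k, j):
    """Return T_{k j}, mapping co-ordinates (t, x) of Frame j to those of Frame k.
    In this toy model it depends only on the velocity of k relative to j."""
    u = velocity[k] - velocity[j]
    return lambda X: (X[0], X[1] - u * X[0])

def compose(f, g):
    """Function composition f o g: apply g first, then f."""
    return lambda X: f(g(X))

X1 = (2, 7)                                    # an event (t, x) in Frame 1

direct = T(3, 1)(X1)                           # one hop: 1 -> 3
via_2 = compose(T(3, 2), T(2, 1))(X1)          # two hops through Frame 2
via_4 = compose(T(3, 4), T(4, 1))(X1)          # two hops through Frame 4
round_trip = compose(T(1, 3), T(3, 1))(X1)     # there and back: the identity
```

Here `direct`, `via_2` and `via_4` agree, and `round_trip` returns the starting co-ordinates, which is the composition rule $T_{j\,\ell} = T_{j\,k}\circ T_{k\,\ell}$ and the existence of inverses in miniature.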

Although the group was only formalized in the second half of the nineteenth century as an abstract notion in its own right, we see that it arises very naturally from Galileo’s principle. As soon as one makes the revolutionary conjecture, hallmark of the science of the Enlightenment, that identical systems behave in the same way, thus that science might have predictive power, one arrives at the notion of state and the notion that a system’s state at any time (or, more abstractly, other system evolution defining parameter(s)) is a reversible transformation of its state at any other time (or values of evolution parameters). Informally, the information contained in a state is conserved by a system’s evolution. Fig. (2) shows that the behaviors of our system of inertial frames and the group of transformations between them generalize to any description of a physical system with well defined states. Thus, at the very core of their being, the notions of state and group are all about information conservation, so that they were, latently, important from the Renaissance through the age of Enlightenment onwards. Many physics problems are manifestly about symmetry and the group notion’s usefulness to them is obvious, but the group as the most basic state information conserving mathematical system underlies a much broader range of physical science than simply these. The black hole information paradox and the relevance today of the question of whether natural transformations are truly in-principle reversible show how important the notion has been and continues to be to science. That Galileo’s principle implies information conservation is not surprising for it is saying, in summary, that any inertial frame must be as good as any other for the description of physics, and so descriptions in any frame must be derivable from those in any other inertial frame and contrariwise.

So far our discussion has been very general. So we now investigate how our most general group acts on the co-ordinates: the exact nature of the transformations. We write the action of $T\in\mathfrak{L}$ on the co-ordinates $X$ by $f(T,\,X)$ where $f$ is some arbitrary, possibly nonlinear, function of the co-ordinates $X$\footnote{In this notation, our transformation composition respecting Galileo’s principle is expressed as $f(T_1,\,f(T_2,\,X)) = f(T_1\circ T_2,\,X)$, i.e. we have a generalized flow. When we study one parameter groups of transformations wrought by collinear relative motions in \eqref{LieGroup} and we replace the transformations by their rapidities, this equation becomes $f(\eta_1,\,f(\eta_2,\,X))=f(\eta_1+\eta_2,\,X)$ and we indeed recover the flow equation.}. Our discussion so far allows for very general spacetime co-ordinates as well: we’ve said nothing about them other than that they form some unique labelling of spacetime events. We now specialize them to affine co-ordinates defined by rational multiples of displacements along linearly independent directions in space and time of uniform intervals marked out by unit measuring rods and clock ticks in each of the inertial frames. An excellent, fuller summary of affine geometry as applied to special relativity is given in Chapter 2 of Crowell[3]. For a first reading, the student can think of Cartesian co-ordinates augmented by a time co-ordinate. Here the first Copernican notion enters:

Homogeneity Postulate (Copernican Principle, Part 1)\footnote{The modern Copernican principle, that neither the Solar system nor the Earth holds any privileged place in the Universe, is named for its analogy in Copernicus’s thought, not for his exact postulate of a Sun-centered universe unthroning the Earth in his De revolutionibus orbium coelestium.}: Physical laws are unaffected by any translation in either space or time of the origins of the affine co-ordinates labelling an inertial frame. In particular, the image of a vector $\vec{AB}$ joining events $A$ and $B$ under a co-ordinate transformation wrought by uniform relative motion between inertial frames is unaffected by the addition of any offset added to both ends $A$ and $B$. In symbols, if $X\mapsto f(T,\,X)$ defines the image of the point with position vector $X$ under transformation $T\in\mathfrak{L}$, then $f(T,\,X+Y) - f(T,\,0+Y) = f(T,\,X)-f(T,\,0);\,\forall\,T\in\mathfrak{L}$, where here $0$ is the chosen origin of affine co-ordinates and $Y$ is any chosen offset vector in any direction in spacetime. “Nature does not care where we put our origin”.

We can think of this postulate as partly a definition of “uniform relative motion”: it is, further to motion not begetting any positive result from our accelerometer test, motion that preserves the affine structure of the space: it is uniformly “rigid” so that all points undergo the same motion in each unit time interval. We experimentally find that these two notions are the same thing. Given this postulate:

\begin{equation}\label{CauchyFunctionalEquation}\begin{array}{cl}&\displaystyle{f(T,\,X+Y)-f(T,\,Y) = f(T,\,X)-f(T,\,0),\,\forall\,X,\,Y\,\in\mathbb{R}^{N+1}}\\\Leftrightarrow & \displaystyle{h(X+Y)=h(X)+h(Y);\;h(Z)\stackrel{def}{=}f(T,\,Z)-f(T,\,0)\,\forall\,X,\,Y,\,Z\,\in\mathbb{R}^{N+1}}\end{array}\end{equation}

where $\mathbb{R}^{N+1}$ is our $N$ dimensional affine co-ordinates augmented by the time co-ordinate. Our condition of spatial Copernican homogeneity is then the famous Cauchy functional equation generalized to $N+1$ dimensions. For one, real dimension, the only continuous solution is $h(X)\propto X$; there are other solutions, but they are everywhere discontinuous[9]. To broaden the analysis from a function $h:\mathbb{R}\to\mathbb{R}$ to $h:\mathcal{V}\to\mathcal{V}$ for a real vector space $\mathcal{V}$ we argue analogously with the method of Hewitt \& Stromberg[9] to find that $h(q\,X) = q\,h(X)$ for any vector $X\in\mathcal{V}$ and rational $q\in\mathbb{Q}$, whence the only continuous extension is $h(\alpha\,X) = \alpha\,h(X);\,\forall\,\alpha\in\mathbb{R}$. Then, since $h(X+Y)=h(X)+h(Y)$, $h$ is a linear homogeneous function defined by a square matrix “proportionality constant”. So now we make the following continuity postulate:

Continuity of Transformation Postulate: Transformations between the co-ordinate systems of inertial frames in uniform relative motion are continuous functions of the space and time co-ordinates.

This postulate encodes our utterly everyday observation that in uniform motion as we ride in a bus, the image of our World does not become shredded and shivered into an everywhere discontinuous picture: we still see walkers and roadside trees as connected continuous sets with their shape still altogether recognizable. Therefore our most general co-ordinate transformation consistent with both our homogeneity and continuity of transformation postulates is the transformation $f(T,\,X)=\Gamma(T)\,X+\Delta(T)$ where $\Gamma$ is an invertible $(N+1)\times (N+1)$ square matrix and $\Delta(T) = f(T,\,0)$ measures the “offset” between the origins of the two co-ordinate systems. Again, “Nature does not care where we put our origin”, so we can shift the right hand side co-ordinate system to that of another relatively stationary frame displaced in space and time so as to cancel the “offset”. Therefore, without loss of generalness, we can take our co-ordinate transformations to be both linear and homogeneous (i.e. here having the sense of aligning the spatial/temporal origins of both co-ordinate systems). We only need to make an assumption of the transformation’s continuity but nothing about its differentiability in the co-ordinates; the latter falls out “for free” from the Galilean/Copernican assumptions once we add continuity. Therefore, our most general transformation\footnote{Here we write transformations as operators acting on row vectors from the right, so that operation $T_a$ followed by $T_b$ is more straightforwardly written $T_a\,T_b$ and transformations compose left to right.
This approach is of course wholly equivalent to the more conventional one where matrices act on column vectors from the left so that transformations compose right to left.} at last is the $4\times 4$ matrix equation $X^\prime =X\,\boldsymbol{\Lambda}$ where $X,\,X^\prime$ are the $1\times4$ row vectors of spacetime co-ordinates in the two frames, and our group of transformations between inertial frames is a matrix group acting linearly on affine co-ordinates.
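Two of the claims above lend themselves to a quick exact check: that subtracting the offset $f(T,\,0)$ from an affine map leaves an additive function $h$, and that row vectors acted on from the right compose left to right. The following is a minimal sketch, assuming a two-dimensional $(t,\,x)$ space for brevity; the matrices `GAMMA`, `LAMBDA_A`, `LAMBDA_B`, the offset `DELTA` and all helper names are hypothetical values chosen for the illustration.

```python
from fractions import Fraction as F

def vecmat(x, m):
    """Row vector times matrix: (x m)_j = sum_i x_i m_ij."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

def matmul(a, b):
    """Ordinary matrix product a b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(u, v):
    return [p + q for p, q in zip(u, v)]

GAMMA = [[F(2), F(1, 3)], [F(-1), F(5, 2)]]   # hypothetical invertible matrix
DELTA = [F(4), F(-7)]                          # hypothetical offset f(T, 0)

def f(X):
    """Affine map in row-vector form: f(T, X) = X Gamma + Delta for one fixed T."""
    return add(vecmat(X, GAMMA), DELTA)

def h(Z):
    """h(Z) = f(T, Z) - f(T, 0): the offset-free part of f."""
    zero = [F(0)] * len(Z)
    return [p - q for p, q in zip(f(Z), f(zero))]

X = [F(1, 2), F(3)]
Y = [F(-4), F(2, 7)]

h_is_additive = h(add(X, Y)) == add(h(X), h(Y))   # Cauchy's equation holds for h
f_is_additive = f(add(X, Y)) == add(f(X), f(Y))   # but not for f when Delta != 0

# Row vectors compose left to right: T_a followed by T_b is X (Lambda_a Lambda_b).
LAMBDA_A = [[F(1), F(2)], [F(0), F(1)]]
LAMBDA_B = [[F(3), F(0)], [F(1), F(1)]]
step_by_step = vecmat(vecmat(X, LAMBDA_A), LAMBDA_B)
in_one_go = vecmat(X, matmul(LAMBDA_A, LAMBDA_B))
wrong_order = vecmat(X, matmul(LAMBDA_B, LAMBDA_A))
```

Exact rational arithmetic is used so that the equalities are tested without rounding; `step_by_step` and `in_one_go` coincide while `wrong_order` does not, which is the left-to-right composition described in the footnote.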

We now specialize to collinear motions, so that we can characterize our group by one signed real “swiftness” or “velocity” parameter $\eta$. We shall henceforth talk about collinear motions, regardless of their sense, as being “along the same ray” and collinear motions with the same heading (sense) as being “in the same direction”. Thus, for example, motions at unit speed in the $+z$ direction and unit speed in the $-z$ direction are motions along the same ray, but motions in opposite directions. Our intuitive everyday experience suggests that there is a continuously linked set of transformations between the identity transformation (e.g. bus stopped at the roadside) and the bus-street transformation with the bus at its cruising speed. At each instant of its acceleration, there is a valid inertial frame momentarily co-moving with the bus. Each such frame is defined by a real-valued “velocity” parameter lying between nought and the cruise speed, and there is precisely one such transformation for every distinct real value in this interval. We are to think of $\eta$ as a generalized relative velocity parameter: any continuous, monotonic map of the real line $\ell:\mathbb{R}\to\mathbb{R}$ gives us another parameterization $\ell(\eta)$ that will do equally well, and so there are many possible parameterizations. The parameter $\eta$ could be the distance-over-time velocity measured by the bus’s speedometer, as in §3. But we shall also make use of another more natural parameterization, a transformed velocity with simpler mathematical properties, called the rapidity. For now, one is to think of both as equally good generalized speedometer readings. Furthermore, the geometry of a frame moving relatively to us is a continuous function of the velocity parameter; again, as with the continuity of transformation property, we don’t see a tree’s motion state out of our bus window discontinuously change as the bus accelerates.
We also take heed of a monotonicity notion that the composition of two transformations of nonzero velocity in the same direction is a transformation corresponding to a “swifter” relative velocity in that direction. Thus, we now add a continuity assumption expressing this smooth continuum idea. Note that we haven’t yet rigorously defined “collinear” motion; this notion will fall out of our continuity postulate:

Continuity and Monotonicity of Group Composition Postulate (Version 1): There exist collinear motions, whose co-ordinate transformations form a matrix group parameterized by a lone real parameter (“generalized velocity” or “rapidity”) $\eta$, such that the velocity parameter of the composition of two such transformations is a continuous function of both the velocity parameters of the individual transformations, and the “velocity” is a monotonic notion: higher magnitude velocities always correspond to “swifter” relative motions.

It then follows from the theory of Lie groups that the most general possible group of transformations between inertial frames fulfilling our four postulates (Galilean, Copernican Part 1, Continuity of Transformation and Continuity of Group Composition) so far is the one parameter matrix Lie group:

\begin{equation}\label{LieGroup}\mathfrak{L} = \{\exp(\eta\,K)|\,\eta\in\mathbb{R}\};\quad \boldsymbol{\Lambda}(\eta) = e^{\eta\,K}\end{equation}

for some constant $4\times 4$ matrix $K$ wholly characterizing our group and the direction of motion and where $\eta$, our real rapidity parameter, fully characterizes the motion’s “swiftness” and is a one-to-one function of the relative velocity. Depending on the level and sophistication of the students, the teacher might choose to leave the statement of \eqref{LieGroup} as an unproven but plausible statement, or use a treatment more like Ignatowski’s[7], [6] and leave the Lie group discussion and \eqref{LieGroup} out altogether. For a higher level first course, or a second course in special relativity, there is a more careful proof of \eqref{LieGroup} in the Appendix. We take as the most convenient (i.e. most straightforwardly deduced in the Appendix) definition the one where $\boldsymbol{\Lambda}(\eta_1)\,\boldsymbol{\Lambda}(\eta_2)=\boldsymbol{\Lambda}(\eta_1+\eta_2)$, i.e. with an additive rapidity parameter.
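For students who have not met matrix exponentials, the additive property $\boldsymbol{\Lambda}(\eta_1)\,\boldsymbol{\Lambda}(\eta_2)=\boldsymbol{\Lambda}(\eta_1+\eta_2)$ can be checked numerically. The sketch below is hedged accordingly: it works on a $2\times2$ $(t,\,x)$ block only, it approximates the exponential by a truncated Taylor series, and the two generators `K_NILPOTENT` and `K_SYMMETRIC` are hypothetical examples picked for illustration, not the form of $K$ that the postulates will eventually single out.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(m, terms=40):
    """Matrix exponential by truncated Taylor series: sum of m^k / k!.
    Adequate for the small matrices of modest norm used here."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]              # current term m^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def boost(eta, K):
    """Lambda(eta) = exp(eta K)."""
    return expm([[eta * x for x in row] for row in K])

def close(a, b, tol=1e-9):
    """Entrywise comparison of two matrices within a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

K_NILPOTENT = [[0.0, -1.0], [0.0, 0.0]]   # hypothetical generator with K^2 = 0
K_SYMMETRIC = [[0.0, 1.0], [1.0, 0.0]]    # hypothetical generator with K^2 = I

eta1, eta2 = 0.3, 1.1
additive = {
    name: close(matmul(boost(eta1, K), boost(eta2, K)), boost(eta1 + eta2, K))
    for name, K in (("nilpotent", K_NILPOTENT), ("symmetric", K_SYMMETRIC))
}
```

Both sample generators satisfy the additive law, as any constant $K$ must. With row vectors $(t,\,x)$, the nilpotent one gives $\exp(\eta\,K) = \mathrm{id}+\eta\,K$, a shear that leaves $t$ untouched, while the symmetric one gives entries $\cosh\eta$ and $\sinh\eta$.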

It may not yet be altogether obvious that this can be a group of “collinear” motions as we would intuitively understand this notion, but to understand this, we now add the spatial isotropy postulate:

Isotropy Postulate (Copernican Principle, Part 2): The laws of physics are isotropic in space. No direction in space is preferred.

Therefore, without loss of generalness, we assume that the ray defining collinear motion is the $x$-axis, that the $y$ and $z$ co-ordinates are orthogonal to that motion and the transformations for relative motions along any other ray are simply found by a rotation of co-ordinates. Our task is reduced to finding the most general form of our $4\times 4$ matrix $K$. We also assume that we write our co-ordinates as row vectors of the form $(t,\,x,\,y,\,z)$, i.e. with the time co-ordinate at index “0”. By our isotropy postulate, no direction in the plane orthogonal to our motion can have any preference over any other. Therefore, both our transformation matrix $e^{\eta\,K}$ in \eqref{LieGroup} and $K$ itself must be unchanged if we transform our co-ordinates by any rotation $R_x(\phi)$ through any angle $\phi$ about the $x$-axis. Otherwise put: $R_x(\phi)\,K\,R_x(\phi)^{-1} = K$, thus $K$ must commute with $R_x(\phi)$ and so the invariant subspaces of $R_x(\phi)$ and $K$ must be the same. The eigenvectors of $R_x(\phi)$ are $(0,\,0,\,1,\,\pm i)$ together with any pair of linearly independent superpositions of $\hat{T}=(1,\,0,\,0,\,0)$ and $\hat{X}=(0,\,1,\,0,\,0)$. This assertion together with the understanding that $K$ must be real implies that the most general $K$ matrix must have the form:

\begin{equation}\label{RotationalSymmetryForm}K = \left(\begin{array}{cccc}\kappa_{t\,t}&\kappa_{t\,x}&0&0\\\kappa_{x\,t}&\kappa_{x\,x}&0&0\\0&0&\kappa_{y\,y}&-\kappa_{y\,z}\\0&0&\kappa_{y\,z}&\kappa_{y\,y}\end{array}\right)\end{equation}

The matrix $K$ comprises two, decoupled $2\times 2$ transformations: the $x$ and $t$ co-ordinates can mix only with each other and not with $y$ and $z$. How does isotropy bear on the direction of motion? We can perfectly well use the $-x$ direction instead of our $+x$ and all the results above would follow. So if we make the co-ordinate transformation $M$ defined by $t\mapsto t,\,x\mapsto-x,\,y\mapsto -y,\,z\mapsto z$ (a half turn rotation about the $z$-axis), we still have a right-handed basis indistinguishable from our original, with motion again along the $x$-axis (although in the opposite sense), so our $K$ matrix is unchanged and our transformation in this new co-ordinate system must belong to the $\mathfrak{L}$ in \eqref{LieGroup} with the same constant $K$. Thus, for any $\eta\in\mathbb{R}$, there is another rapidity $\eta^\prime=h_s(\eta)$ for some as yet unknown function $h_s$ such that $M\,\exp(\eta\,K)\,M^{-1}=\exp(\eta\,M\,K\,M^{-1})=\exp(h_s(\eta)\,K)$. Since applying $M$ twice restores the original co-ordinates, there are only two possibilities: $h_s(\eta) = \pm\eta$, at least when $\eta$ is small enough that the matrix logarithm of $\exp(\eta\,M\,K\,M^{-1})$ is defined by its Taylor series about the identity. However, the logarithm has discrete branches, so, outside the region where the Taylor series converges, $h_s(\eta)$ cannot skip between branches continuously, so that $h_s(\eta) = \pm\eta,\,\forall\,\eta\in\mathbb{R}$ given the continuity postulate. Given the alternative $h_s(\eta)=+\eta$, our most general $K$ matrix is diagonal, $e^{\eta\,K}$ is diagonal and the only motions consistent with our postulates so far are ones that do not mix $t$ and $x$: there is no translational spatial motion at all, only dilation of the spatial and time co-ordinates separately. So this alternative, whilst describing a possible behavior, is not describing relative motion. Hence $h_s(\eta)=-\eta$ is the only alternative left and its fulfilling implies:

\begin{equation}\label{MostGeneral}\begin{array}{lclclcl}(t^\prime,\,x^\prime) &=&(t,\,x)\, \boldsymbol{\Lambda}_{2\times 2}(\eta)&\quad&(y^\prime,\,z^\prime) &=&(y,\,z)\,\boldsymbol{R}_{2\times 2}(\eta)\\\\\boldsymbol{\Lambda}_{2\times 2}(\eta) &=& \exp\left(\eta\,\left(\begin{array}{cc}0&\kappa_{t\,x}\\\kappa_{x\,t}&0\end{array}\right)\right)&\quad &\boldsymbol{R}_{2\times 2}(\eta) &=& \exp\left(\eta\,\left(\begin{array}{cc}0&-\kappa_{y\,z}\\\kappa_{y\,z}&0\end{array}\right)\right)\end{array}\end{equation}
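
The step from \eqref{RotationalSymmetryForm} to \eqref{MostGeneral} can be sketched entry by entry. The symbols $m_i$ are introduced here only as shorthand for the diagonal entries of $M$, taken in the order $(t,\,x,\,y,\,z)$:

$$\left(M\,K\,M^{-1}\right)_{i\,j}=m_i\,m_j\,K_{i\,j},\qquad(m_t,\,m_x,\,m_y,\,m_z)=(1,\,-1,\,-1,\,1)$$

Conjugation by $M$ therefore leaves every diagonal entry of $K$ unchanged and reverses the sign of $\kappa_{t\,x}$, $\kappa_{x\,t}$ and $\kappa_{y\,z}$. The condition $M\,K\,M^{-1}=-K$ then holds exactly when all the diagonal entries of $K$ vanish, which leaves the two off-diagonal blocks above, whereas $M\,K\,M^{-1}=+K$ would force $\kappa_{t\,x}=\kappa_{x\,t}=\kappa_{y\,z}=0$ and hence a diagonal $K$.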

There are only two unspecified physical constants in \eqref{MostGeneral}, the ratios $\kappa_{x\,t}/\kappa_{t\,x}$ and $\kappa_{y\,z}/\kappa_{t\,x}$, as can be understood by defining the positive constant $c_I$ with dimensions of length per time by:

\begin{equation}\label{IgnatowskiSpeed}\kappa_{t\,x} = \frac{\zeta}{c_I^2}\,\kappa_{x\,t}\end{equation}

where $\zeta=\pm1$ is the sign of the product $\kappa_{t\,x} \kappa_{x\,t}$. We then scale our time measurements by $c_I$ so that they have the dimensions of length. Let our “natural” co-ordinates be defined by $(t_n,\,x,\,y,\,z)=(t,\,x,\,y,\,z)\,\mathrm{diag}(c_I,\,1,\,1,\,1)$, then our reduced ($2\times 2$) transformation matrix $e^{\eta\,K_{2\times2}}$ becomes $\mathrm{diag}(c_I,\,1)\,e^{\eta\,K_{2\times2}}\,\mathrm{diag}(c_I^{-1},\,1)=\exp(\eta\,\mathrm{diag}(c_I,\,1)\,K_{2\times2}\,\mathrm{diag}(c_I^{-1},\,1))$. Since we can absorb any real constant we like into the rapidity parameter and still get an additive rapidity parameter, we replace $\eta\,\kappa_{x\,t}/c_I$ by $\eta$. In these new co-ordinates:

\begin{equation}\label{MostGeneralNatural}\boldsymbol{\Lambda}_{2\times2}(\eta) =\left(\begin{array}{cc}\cosh \left(\sqrt{\zeta}\, \eta\right) & \sqrt{\zeta}\,\sinh\left(\sqrt{\zeta}\,\eta \right) \\\frac{1}{\sqrt{\zeta}}\,\sinh\left(\sqrt{\zeta}\, \eta\right)& \cosh\left(\sqrt{\zeta}\, \eta\right)\end{array}\right)\quad\boldsymbol{R}_{2\times2}(\eta) =\left(\begin{array}{cc}\cos \left(\kappa_s\,\eta\right) &-\sin\left(\kappa_s\,\eta\right) \\\sin\left(\kappa_s\,\eta\right)& \cos\left(\kappa_s\,\eta \right)\end{array}\right)\end{equation}

if we use the definition \eqref{IgnatowskiSpeed} of our Ignatowski speed. Henceforth, we recall that any time co-ordinate is to be divided by $c_I$ if we wish to retrieve our everyday, SI-defined time. $\kappa_s$ is the ratio $c_I\,\kappa_{y\,z}/\kappa_{x\,t}$.
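
As a numerical cross-check of \eqref{MostGeneralNatural}, the sketch below compares a truncated Taylor series for $\exp(\eta\,K_{2\times2})$ with the closed form. It assumes the rescaled generator is $\left(\begin{smallmatrix}0&\zeta\\1&0\end{smallmatrix}\right)$, as the absorption of $\kappa_{x\,t}/c_I$ into $\eta$ suggests, and it writes the closed form in real terms ($\cosh,\,\sinh$ for $\zeta=+1$; $\cos,\,\sin$ for $\zeta=-1$). The function names are illustrative only.

```python
import math

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(m, terms=40):
    """Truncated Taylor series for exp(m), with m a 2x2 matrix."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        # After this step, term holds m**n / n!
        term = [[x / n for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def generator(zeta, eta):
    """eta times the assumed rescaled generator [[0, zeta], [1, 0]]."""
    return [[0.0, zeta * eta], [eta, 0.0]]

def closed_form(eta, zeta):
    """Real form of Lambda_2x2(eta) for zeta = +1 or -1."""
    if zeta == 1:
        c, s = math.cosh(eta), math.sinh(eta)
    else:
        c, s = math.cos(eta), math.sin(eta)
    # s stands for sinh(sqrt(zeta)*eta)/sqrt(zeta), so sqrt(zeta)*sinh(...) = zeta*s
    return [[c, zeta * s], [s, c]]

def max_abs_diff(a, b):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

worst = max(max_abs_diff(mat_exp(generator(z, e)), closed_form(e, z))
            for z in (1, -1) for e in (-2.0, -0.3, 0.0, 0.7, 2.0))
```

Under these assumptions `worst` should be at the level of floating point rounding, for both signs of $\zeta$.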

Let’s look at the $2\times2$ block $\boldsymbol{R}_{2\times2}(\eta)$, a rotation about the $x$ axis for which the rapidity is proportional to the angular displacement. A universe wherein a twist in the plane orthogonal to motion always arises with relative uniform motion with the twist angle proportional to the rapidity is perfectly in keeping with all of our postulates so far. But we need not consider this further, for four reasons: (1) there is no experimental evidence for it in our universe, (2) it is decoupled from the mixing of the time and $x$ co-ordinates, so our study of the latter is not influenced by $\kappa_s$ in any way, (3) $\kappa_s=0$ can be shown to follow from a further assumption that the co-ordinate transformation $t\mapsto-t$ should imply $K\mapsto-K$, i.e. that “playing a movie backwards” should imply the inverse co-ordinate transformation\footnote{Indeed, as an alternative approach, one can use the time inversion behavior assumption instead of $\eta\,M\,K\,M^{-1}=h_s(\eta)\,K$ to derive \eqref{MostGeneralNatural} and $\kappa_s=0$ from \eqref{RotationalSymmetryForm}.} and (4) we can simply rotate the co-ordinate frames of two observers without changing their relative motion. When we come to discuss the full Lorentz group, this object contains rotations about any axis as well, so that a nonzero $\kappa_s$ does not change the total group of transformations. If motions in a ray compose to a one parameter group defined by \eqref{MostGeneralNatural}, we can also define another, equally valid one-parameter group of motions along the ray where the group member defined for the rapidity $\eta$ is the group member defined by \eqref{MostGeneralNatural} followed by the inverse co-ordinate rotation $\boldsymbol{R}_{2\times2}(\eta)^{-1}=\boldsymbol{R}_{2\times2}(-\eta)$ without changing the relative motion state. Indeed we have constructed \eqref{RotationalSymmetryForm} expressly to make rotations in the orthogonal plane commute with group members.
So, if the one-parameter group of motions defined by \eqref{MostGeneral} is possible with nonzero $\kappa_s$, then so is the one parameter group defined by \eqref{MostGeneral} with $\kappa_s=0$. There is a whole family of one parameter groups for motion along a given ray, one member for each $\kappa_s\in\mathbb{R}$, and we can choose the canonical family member defined by $\kappa_s=0$ and define an arbitrary transformation consistent with motion along a given ray at a given rapidity to be the unique transformation with that rapidity with $\kappa_s=0$, followed by an arbitrary rotation. We thus come to understand that a nonzero $\kappa_s$ does not needfully mean there is a twist with inertial relative motion and so $\kappa_s$ may not be a universal constant of Nature, but one that we can choose through composing rotations with our chosen canonical transformations with $\kappa_s=0$.

There are two singular cases that $\boldsymbol{\Lambda}_{2\times2}(\eta) $ in \eqref{MostGeneralNatural} does not describe, to wit, $\kappa_{t\,x}=0$ and $\kappa_{x\,t} =0$. The former can be thought of as the limit as $c_I\to\infty$ and in this case \eqref{MostGeneral} becomes (on absorbing $\kappa_{x\,t}$ into the rapidity parameter):

\begin{equation}\label{InfiniteIgnatowskiSpeed}\boldsymbol{\Lambda}_{2\times2}(\eta) = \exp\left(\eta\,\left(\begin{array}{cc}0&0\\1&0\end{array}\right)\right) = \left(\begin{array}{cc}1&0\\\eta&1\end{array}\right);\quad\begin{array}{lcl}t^\prime &=&t\\x^\prime &=& x + \eta\,t\end{array}\end{equation}

which is simply the intuitive notion we all have that we add signed speeds for relative motion along the same ray, and the time co-ordinate in both frames is the same. The everyday velocity is the same as the rapidity in this case. That is, the case $c_I\to\infty$ is what is normally called Galilean relativity. The case $\kappa_{x\,t} =0$ becomes (on absorbing $\kappa_{t\,x}$ into the rapidity parameter):

\begin{equation}\label{ZeroIgnatowskiSpeed}\boldsymbol{\Lambda}_{2\times2}(\eta) = \exp\left(\eta\,\left(\begin{array}{cc}0&1\\0&0\end{array}\right)\right) = \left(\begin{array}{cc}1&\eta\\0&1\end{array}\right);\quad\begin{array}{lcl}t^\prime &=&t+\eta\,x\\x^\prime &=& x\\\end{array}\end{equation}

where no uniform relative spatial motion can be consistent with our postulates. Here the rapidity becomes instead a weird “time machine speed” where simultaneous events in one frame are pushed into each other’s past or future by an amount proportional to the spatial separation between them and dependent on the rapidity’s sign. This is not a causal universe: the sign of the time difference between events at some pairs of different spatial points is changed by any $\boldsymbol{\Lambda}_{2\times2}$ of nonzero rapidity.

All other values of $c_I$ define universes wherein uniform relative motion begets differences between the time measured between any two given events by relatively uniformly moving observers. Therefore, we see that what is wontedly called Galilean relativity is the unique relativity applying in a universe fulfilling our Galilean and continuity postulates and wherein time intervals between events are invariant. Other, more interesting universes such as our own are defined by \eqref{MostGeneralNatural} and we must examine the meanings of the Ignatowski speed $c_I$ and the sign $\zeta$. The transformations of \eqref{MostGeneralNatural} can therefore be thought of as the most general transformations consistent with Galilean relativity that result from the relaxation of an assumption of absolute time.

Note that $c_I$, if finite, must be universal in the sense that two scientists who could both potentially (1) change their states of motions to be at rest relative to one another and (2) communicate with each other to compare experimental results at any time in the future must agree on any measured value of $c_I$; otherwise at least one of our postulates so far is falsified, as two different values of $c_I$ are not consistent with our derived transformation. Experimental measurements of the sign $\zeta$ by like pairs of scientists must all agree for the same reason. In particular, as we shall see in §3, when $\zeta=+1$ there is one, but only one, invariant speed.

Before going forward, we “calibrate” our rapidity in terms of the more wonted notion of distance per time velocity. If I see an observer moving uniformly relative to me at speed $v$ in the $-x$ direction, I see their spatial origin $(t^\prime,\,0)$ at the point $(t,\,-v\,t/c_I)$ in my co-ordinates, given that my everyday clock time is $t / c_I$ units when I measure $t$ “natural units”. From \eqref{MostGeneral} and \eqref{MostGeneralNatural}, therefore, $0=- \sqrt{\zeta}\,\cosh(\sqrt{\zeta}\,\eta)\,\frac{v}{c_I}\,t +\sinh(\sqrt{\zeta}\,\eta)\,t$ is true for all real values of $t$:

\begin{equation}\label{RapidityCalibration}\frac{v}{c_I} = \frac{1}{\sqrt{\zeta}}\,\tanh(\sqrt{\zeta}\,\eta);\quad\eta = \frac{1}{\sqrt{\zeta}}\mathrm{artanh}\left(\sqrt{\zeta}\,\frac{v}{c_I}\right)\end{equation}

Given that $0=- \sqrt{\zeta}\,\cosh(\sqrt{\zeta}\,\eta)\,\frac{v}{c_I} +\sinh(\sqrt{\zeta}\,\eta)$, it follows that my spatial origin $(t,\,0)$ transforms to the point $(t^\prime,\,+v\,t^\prime/c_I)$ and the observer sees me moving in the $+x$ direction at speed $+v$, so inverse transformations correspond to opposite sign distance over time velocities just as they correspond to opposite sign rapidities; this observation is called the “reciprocity relationship” in the Liberati, Sonego \& Visser paper[6]. We now write our group $\mathfrak{L}_{2\times2}$ both in terms of rapidity and the form more wonted to students of a first course on special relativity:
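
Substituting \eqref{RapidityCalibration} into \eqref{MostGeneralNatural} gives $\cosh(\sqrt{\zeta}\,\eta)=1/\sqrt{1-\zeta\,v^2}$ with $v$ in natural units (for $\zeta=-1$ this holds on the branch $|\eta|<\pi/2$), so the two forms can be written, as a reconstruction from those two equations, as:

\begin{equation}\label{NaturalTansformationGroup}\mathfrak{L}_{2\times2} = \left\{\left.\boldsymbol{\Lambda}_{2\times2}(\eta)\right|\,\eta\in\mathbb{R}\right\};\quad\boldsymbol{\Lambda}_{2\times2}(\eta) = \frac{1}{\sqrt{1-\zeta\,v^2}}\left(\begin{array}{cc}1&\zeta\,v\\v&1\end{array}\right),\quad v=\frac{1}{\sqrt{\zeta}}\,\tanh(\sqrt{\zeta}\,\eta)\in\mathbb{I}\end{equation}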


where the set $\mathbb{I}=\mathbb{R}$ when $\zeta=-1$ and, when $\zeta=+1$, $\mathbb{I}$ becomes the open interval $\mathbb{I}=(-1,\,1)$. Here, of course, the velocity is in “natural units”. When we compose two transformations, with relative velocities $v_1$ and $v_2$ from $\mathfrak{L}_{2\times2}$ as defined by \eqref{NaturalTansformationGroup}, we get an equation of the form $\boldsymbol{\Lambda}(v_1)\,\boldsymbol{\Lambda}(v_2)=\boldsymbol{\Lambda}(v_3)$; on doing this and solving for $v_3$ we find the relativistic velocity addition law:

\begin{equation}\label{RelativisticAddition}v_3 = \frac{v_1+v_2}{1+\zeta\,v_1\,v_2}\end{equation}

With velocities in “natural” units, our rapidity calibration relationship is $v= \tanh(\sqrt{\zeta}\,\eta)/\sqrt{\zeta};\;\eta =\sqrt{\zeta}\, \mathrm{artanh}( v/\sqrt{\zeta})$. $\eta$ ranges over the whole real line and so, if $\zeta$ is positive, $v=c_I\,\tanh\eta$ and the only valid velocities lie in the interval $(-c_I,\,+c_I)$, reflecting the fact that no finite sequence of finite accelerations can beget relative motion of speed $c_I$ or greater; if $\zeta$ is negative, then $v=c_I\,\tan(\eta)$ and, although $v$ can now range over the whole real line as $\eta$ does so, $v$ switches sign discontinuously at the pole singularities at $\eta = (2\,n+1)\,\pi/2;\,n\in\mathbb{Z}$.
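
The link between additive rapidities and \eqref{RelativisticAddition} can be checked numerically. The sketch below works in natural units and uses the real forms of the calibration ($v=\tanh\eta$ for $\zeta=+1$, $v=\tan\eta$ for $\zeta=-1$); the function names are illustrative, not part of any library.

```python
import math

def velocity_from_rapidity(eta, zeta):
    """v in natural units: tanh(eta) for zeta=+1, tan(eta) for zeta=-1."""
    return math.tanh(eta) if zeta == 1 else math.tan(eta)

def rapidity_from_velocity(v, zeta):
    """Inverse calibration: artanh(v) for zeta=+1 (needs |v|<1), arctan(v) for zeta=-1."""
    return math.atanh(v) if zeta == 1 else math.atan(v)

def add_velocities(v1, v2, zeta):
    """Addition law v3 = (v1 + v2) / (1 + zeta*v1*v2)."""
    return (v1 + v2) / (1 + zeta * v1 * v2)

def add_via_rapidity(v1, v2, zeta):
    """The same composition, done by adding the two rapidities."""
    eta = rapidity_from_velocity(v1, zeta) + rapidity_from_velocity(v2, zeta)
    return velocity_from_rapidity(eta, zeta)

# zeta = +1: two large velocities still compose to something below 1
subluminal = add_velocities(0.9, 0.9, 1)

# zeta = -1: two positive velocities can compose to a negative one,
# because the summed rapidity has passed the pole at pi/2
sign_switch = add_velocities(2.0, 3.0, -1)
```

With $\zeta=-1$, composing $v_1=2$ and $v_2=3$ gives $v_3=-1$, a small instance of the discontinuous sign switch at the poles described above.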

Causality is the last assumption we shall need, and it both sets the sign $\zeta$ and is defined by $\zeta$. The sign $\zeta$, in turn, sets whether the Ignatowski speed has the following important Popper-falsifiable property: anything moving at $c_I$ relative to any inertial observer must be measured by all inertial observers to move at $c_I$. This little matter of sign overflows with deep physical meaning and $\zeta$ is perhaps the most interesting constant in special relativity.

Causality and Faster-Than-Light Mexican Waves

The most general group of co-ordinate transformations between collinearly uniformly moving reference frames is defined by the one parameter, $2\times 2$ matrix Lie group which, in “natural” co-ordinates\footnote{i.e. wherein time is the everyday clock time multiplied by the positive $c_I$ to yield “natural time” in length units. The same effect can be gotten by instead dividing the spatial lengths by $c_I$, thus yielding “natural length units” with dimensions of time corresponding to the astronomer’s notion of light second, light-year and so forth.} is given by \eqref{MostGeneralNatural} and \eqref{NaturalTansformationGroup}. We get more insight into the peculiar relationship \eqref{RapidityCalibration} by noting that, if $\mathscr{Q} = \zeta\,c_I^2\,t^2-x^2$ in one inertial frame of reference, then a general transformation from \eqref{NaturalTansformationGroup} leaves this quadratic form invariant. If we now rotate our spatial co-ordinate system, so that $\boldsymbol{\Lambda}(\eta)\mapsto R\,\boldsymbol{\Lambda}(\eta) \,R^{-1}$, where $R$ is any rotation of the spatial co-ordinates, we know that rotations are isometries leaving invariant the Pythagorean form $x^2+y^2+z^2$. Therefore, allowing for such a rotation, any transformation $R\,\boldsymbol{\Lambda}(\eta) \,R^{-1}$ between relatively uniformly moving frames must leave invariant the generalized quadratic form $\mathscr{Q} =\zeta\,c_I^2\,t^2- x^2-y^2-z^2$ and it follows from this invariance alone that the “pseudo inner product” between spacetime vectors $\mathscr{Q}:\mathbb{R}^{3+1}\times\mathbb{R}^{3+1}\to\mathbb{R};\; \mathscr{Q}(X_1,\,X_2)=(t_1,\,x_1,\,y_1,\,z_1)\,\mathrm{diag}(\zeta,-1,\,-1,\,-1)\,(t_2,\,x_2,\,y_2,\,z_2)^T$ (with times in natural units) is also invariant with respect to our most general transformations.

Given this invariance, we can study the sets of transitivity of our transformations, asking the question: given an initial vector $X$ in spacetime, where can it shift to under a transformation of the form $X\mapsto\boldsymbol{\Lambda}(\eta)\,X$, or, what is the set $\{e^{\eta\,K}\,X|\,\eta\in\mathbb{R}\}$? Without loss of generalness, we use natural time units so that, in the one-spatial dimension case, these surfaces are of the form $\zeta\,t^2 - x^2 = \mathrm{const}$. If $\zeta=-1$, then these surfaces are circles $x^2 + t^2 = x_0^2$, where $x_0$ is the $x$ co-ordinate at $t=0$, and our group in \eqref{NaturalTansformationGroup} is a rotation group, since $\cosh(i\,\eta) = \cos(\eta)$, $\sinh(i\,\eta)/i=\sin(\eta)$, with the rapidity $\eta=\arctan(v/c_I)$ becoming the rotation angle. If $\zeta=+1$, then these surfaces are hyperbolas. There are two cases: $x^2-t^2 = x_0^2$, where $x_0$ is the $x$ co-ordinate at $t=0$, and $t^2-x^2 = t_0^2$, where $t_0$ is the time co-ordinate of some event either in the future ($+t_0$) or in the past ($-t_0$) in our spacetime that happens at $x=0$. These alternatives, with their corresponding sets of transitivity, are sketched in Fig. (3).

Figure 3: Transitivity Sets for $\zeta=\pm1$: Paths of images of events: (1) $T_f$, initially in the future when $\zeta=+1$, (2) $T_p$, initially in the past when $\zeta=+1$, (3) $S_l$ and $S_r$, initially elsewhere (initially with spacelike separation from origin) and (4) $A$, path of an event when $\zeta=-1$. All paths plotted as $\eta$ varies from $-\infty$ to $+\infty$; arrows show direction of increasing $\eta$.

Now we can instantly understand how $\zeta$ bears on the causality of the universe in question. Fig. (3) shows our spacetime co-ordinates, with time axis vertical. If $\zeta=-1$, $\mathfrak{L}_{2\times2}$ is a rotation group so that, for any future event ($t>0$), there is at least one transformation in $\mathfrak{L}_{2\times2}$ that will rotate it so that its $t$ co-ordinate becomes negative, i.e. for at least one observer uniformly moving relative to us whose origin is co-incident with ours at $t=0$, the event in question will be in their past. As we pass through all the transformations linking our frame and those uniformly moving relative to us, an event such as $b$ follows the circle $A$ (for “acausal”) centered on the origin that passes through it in Fig. (3). This is truly weird if there is a causal link between us and an event at $b$: suppose we must give a trigger signal to make an expressly designed device we’re holding to flash a light at $t=1$. If $\zeta=-1$ then there is an inertial observer uniformly moving relative to us who sees our signal given after the event it triggers. So causal links in a universe where $\zeta=-1$ cannot be uniquely associated with a positive time interval and effects are seen to come before their causes for some observers.

If, however, $\zeta=1$, then the transitivity sets are the hyperbolas in Fig. (3), and the plane in Fig. (3) is then partitioned into four disconnected components sundered by the asymptotes $x=\pm t$. In the $1+1$ dimensional case, events can be thought of as split complex numbers (also called hyperbolic numbers) with contours of constant modulus $x^2-t^2$ as transitivity sets. The event at $b$, in our future at the point $x=1,\,t=1$ moves, under transformation by an element of $\mathfrak{L}_{2\times2}$ when $\zeta=1$, along the hyperbola $T_f$ and thus is in the future of every inertial observer whose co-ordinate origin is collocated with ours. Likewise, the event $d$ , in our past is also in the past of every such inertial observer. Event $d$ follows the hyperbola $T_p$ as the transformation in question ranges over $\mathfrak{L}_{2\times2}$.

The same is true of any point in the two open disjoint regions defined by $t^2>x^2+y^2+z^2$: these two regions are mapped to themselves by $\mathfrak{L}$. If $\mathscr{F}$ is the open cone region $\mathscr{F}=\{(x,\,y,\,z,\,t)|\,t^2>x^2+y^2+z^2;\,t>0\}$ and $\mathscr{P}=\{(x,\,y,\,z,\,t)|\,t^2>x^2+y^2+z^2;\,t<0\}$ then in symbols $\mathfrak{L}\,\mathscr{F}=\mathscr{F}$ and $\mathfrak{L}\,\mathscr{P}=\mathscr{P}$. A point inside one of the regions is shifted by transformations from $\mathfrak{L}$ so that $t^2-x^2-y^2-z^2$ is invariant; if, in our frame, $t^2>x^2+y^2+z^2$ then $t^2=\epsilon^2+x^2+y^2+z^2$ for some $\mathfrak{L}$-invariant $\epsilon>0$ and $t$ is excluded from the interval $(-|\epsilon|,\,+|\epsilon|)$. Any transformation $e^K\in\mathfrak{L}$ can be linked to the identity through the path $\sigma:[0,\,1]\to\mathfrak{L};\,\sigma(\eta) =e^{\eta\,K}$ so that the image $e^{\eta\,K}\,X$ of any point $X$ moves along a continuous path and thus cannot cross the excluded interval. Thus points within $\mathscr{F}$ or $\mathscr{P}$ are confined to those sets under a transformation in $\mathfrak{L}$.
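
The behavior of the transitivity sets can be sketched numerically in the $1+1$ dimensional case. The code below assumes natural units, the real closed form of $\boldsymbol{\Lambda}_{2\times2}(\eta)$ and the row-vector convention $(t^\prime,\,x^\prime)=(t,\,x)\,\boldsymbol{\Lambda}_{2\times2}(\eta)$ of \eqref{MostGeneral}; the sample event and the function names are illustrative choices only.

```python
import math

def boost(eta, zeta):
    """Real form of Lambda_2x2(eta): cosh/sinh for zeta=+1, cos/sin for zeta=-1."""
    if zeta == 1:
        c, s = math.cosh(eta), math.sinh(eta)
    else:
        c, s = math.cos(eta), math.sin(eta)
    return [[c, zeta * s], [s, c]]

def transform(event, eta, zeta):
    """Apply (t', x') = (t, x) Lambda(eta) to an event (t, x)."""
    t, x = event
    m = boost(eta, zeta)
    return (t * m[0][0] + x * m[1][0], t * m[0][1] + x * m[1][1])

def quadratic_form(event, zeta):
    """The invariant zeta*t^2 - x^2 in natural units."""
    t, x = event
    return zeta * t * t - x * x

def orbit(event, zeta, etas):
    """Images of one event under a list of rapidities."""
    return [transform(event, eta, zeta) for eta in etas]

etas = [0.5 * k for k in range(-6, 7)]
future_event = (1.0, 0.5)          # t^2 > x^2 and t > 0

# zeta = +1: the event slides along a hyperbola and stays in the future
lorentz_orbit = orbit(future_event, 1, etas)
stays_future = all(t > 0 for t, _ in lorentz_orbit)

# zeta = -1: a rotation by pi carries the same event into the past
rotated_time = transform(future_event, math.pi, -1)[0]
```

For $\zeta=+1$ every image keeps $t>0$ and the same value of $t^2-x^2$, whereas for $\zeta=-1$ the rotation through $\pi$ gives a negative time co-ordinate.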

The open connected region $\mathscr{E}=\{(x,\,y,\,z,\,t)|\,t^2<x^2+y^2+z^2\}$, i.e. the “elsewhere” in Fig. (3), is also mapped to itself by $\mathfrak{L}$: $\mathfrak{L}\,\mathscr{E}=\mathscr{E}$. However, points in $\mathscr{E}$ in our future $t>0$ can be shifted to past points ($t<0$) by transformations in $\mathfrak{L}$ and contrariwise: witness the path followed by a point on $S_r$ or $S_l$ in our future. So there are always relatively moving inertial observers, with origins collocated with ours, who disagree with us about whether any particular event in the elsewhere $\mathscr{E}$ is a future or a past event. We understand, therefore, that almost all of our everyday intuitions about time, to wit that causes must come before effects, are perfectly compatible with a Galileo relativity postulate generalized to include nonabsolute time if and only if causal links are confined to the closure of the cone $\mathscr{F}$. That is, the statement “A causes B” can imply the statements “B is in the future of A in all physical laboratories” or “the effect B cannot come before the cause A in all physical laboratories” only if the link $A-B$ lies within the closed cone $\bar{\mathscr{F}} = \{(x,\,y,\,z,\,t)|\,t^2\geq x^2+y^2+z^2;\,t\geq0\}$. Otherwise, an inertial observer moving relative to us could detect his or her motion by testing for causal links pointing backwards in time, in violation of Galileo’s postulate. Thus we are led to the following definition:

Causal Link Definition: An event $A$ is said to be a cause of event $B$ if $A$ is necessary for $B$; accordingly we write $A\rightarrow B$ for “$A$ causes $B$”\footnote{Of course the logical relationship is the other way around: $A\Leftarrow B$ is precisely the same relationship, if $A,\,B$ are replaced by the predicates “$A$ happens” and “$B$ happens”.}.

and we add the following postulates to our list:

Postulate of Principle of Causality: For any causal link $A\rightarrow B$ between events $A$ and $B$, the spacetime displacement $B-A$ must have a positive time component in any inertial frame of reference.

Postulate of Nontrivial Existence of Causality: There is at least one causal link $A\rightarrow B$ between events $A$ and $B$ with different co-ordinates in spacetime.

Thus Galileo’s postulate in the generalized, variable-time case together with the two postulates above does two things: firstly, it rules out the case $\zeta=-1$, because there is then at least one member of $\mathfrak{L}$ that reverses the time component of any nonzero spacetime vector $B-A$, in contradiction of the nontrivial causality postulate. Secondly, Galileo’s postulate leads to the following conclusion:

Conclusion of No Faster Than Light Signalling Principle: For all causal links $A\rightarrow B$ between events $A$ and $B$, the spacetime displacement $B-A$ must lie in the closed cone $\bar{\mathscr{F}} = \{(x,\,y,\,z,\,t)|\,t^2\geq x^2+y^2+z^2;\,t\geq0\}$.

Thus given the causal chain $A\rightarrow B\rightarrow C \rightarrow D \rightarrow\cdots$ between events $A,\,B,\,C,\,\cdots$, we have the time ordering $t_A\leq t_B \leq t_C\leq\cdots$ and, as long as each of the path segments $B-A,\,C-B,\,D-C,\,\cdots$ lies in the closed cone $\bar{\mathscr{F}}$, every inertial observer will see the same causal chain with the same time-ordering. The statement of “no faster than $c_I$ signalling” is by far a better informal description than the blunter rendering “nothing can travel faster than $c_I$” because the latter, unqualified, is patently untrue and begets a great deal of confusion because it is messy to qualify correctly. Students first pick up on the notion that optical material phase velocities can be greater than $c_I$ (the speed of light $c_L$ being experimentally found to be the same as $c_I$). The standard answer: that’s OK because the real velocity for the rule is the group velocity. But in some cases this latter can be greater than $c_I$ as well. One then must explain that it is the signal velocity that is important here, i.e. one needs to calculate the full impulse response in question to show that the delay wrought by a medium of thickness $\ell$ is greater than $\ell/c_I$, but the student is left with the impression of a tricky and shaky piecemeal rule. The right rule is very simple and watertight: how do we tell whether a sequence of events is in keeping with the no faster than light signalling principle? We simply draw all the putative causal links and check that they lie within the cone $\bar{\mathscr{F}}$. Let’s illustrate this with a Mexican Wave thought experiment.


Figure 4: Two Mexican Waves travelling at speed greater than $c_I$: A Mexican Wave by Prior Arrangement (Event set $A\rightarrow B\rightarrow C\rightarrow\cdots$ together with $a\leftarrow A,\,b\leftarrow B,\,\cdots$) and a “Whisper Down the Lane” Propagating Mexican Wave comprising events $\alpha,\,\beta,\,\cdots$. The “disturbance” $a,\,b,\,c,\,\cdots$ can move at speed greater than $c_I$ without violating causality; the sequence $\alpha,\,\beta,\,\gamma,\,\cdots$ at the same speed cannot

Fig. (4) shows two versions of a “Mexican Wave” metachronal wave disturbance, both propagating at the same, greater than $c_I$ speed: the set of events $a,\,b,\,c,\,\cdots$ and $\alpha,\,\beta,\,\gamma,\,\cdots$. The former is perfectly in keeping with the principle of causality, the latter is not. In the former case, a preprogrammer visits positions on the $x$ axis one after the other, leaving instructions with protagonists at these positions to make a wave movement with their arms at a mutually agreed time in the future. These acts of “preprogramming” are the events $A,\,B,\,C,\,\cdots$. The mutually agreed times arrange for the motions of each protagonist’s body (the events $a,\,b,\,c,\,\cdots$) in very swift succession to one another, begetting a metachronal wave pattern that travels at greater than speed $c_I$ from our frame. However, there are no direct causal links $a\rightarrow b,\,b\rightarrow c,\,\cdots$, so when a relatively uniformly moving observer sees the sequence $\cdots,\,c,\,b,\,a$ reversed in their time order, there is no contradiction: all the causal links in the whole graph $\{A,\,B,\,C,\,\cdots\}\cup\{a,\,b,\,c,\,\cdots\}$ lie within the cone $\mathscr{F}$ and still do so even after any transformation from $\mathfrak{L}$. However, the sequence of events $\alpha,\,\beta,\,\gamma,\,\cdots$ with putative direct causal links between them (i.e. the “whisper down the lane” signal propagation that governs most natural metachronal motion) violates the principle of causality together with Galileo’s postulate because some inertial observers will see these events happening in backwards time order. Very like arguments show that the motion of a laser pointer spot across e.g. the surface of the Moon when the laser origin is on Earth, rotating in a plane at an angular speed of greater than about $45^\circ$ per second so that the spot sweeps across the Moon at greater than $300\,000\,{\rm km\,s^{-1}}$, is also in keeping with the principle of causality.
Because the spot is moving at greater than $c_I$, it is seen to move in the opposite direction by some inertial observers, which fact is not a problem because there is no direct causal relationship between neighboring reflecting positions on the Moon, as for the events $a,\,b,\,c,\,\cdots$ in Fig. (4). So, we can certainly see sequences of events (propagating “effects” or “things”) in Nature travelling at greater than $c_I$; it’s simply that such an observation rules out direct causal relationships between neighboring events in such a sequence. However, as we have seen with the Mexican Wave by Prior Arrangement, such a sequence does not rule out causal relationships between the causal forerunners or antecedents of such a sequence. The causal antecedents $A,\,B,\,C,\,\cdots$ of $a,\,b,\,c,\,\cdots$ are causally related, even though there is no direct causal relationship between members of the latter sequence, which can therefore be seen evolving at a speed faster than $c_I$ without violating the causality principle. So, the observation of a sequence moving at a speed faster than $c_I$ does not rule out all relationships between the sequence members, only direct causal ones. Any inertial observer, on observing a Mexican Wave moving faster than $c_I$, could make the null hypothesis that the motion is owing to statistically independent random jumps made by each of the Mexican Wavers. However, if there are $N$ wavers, then the likelihood of seeing them wave in order by chance alone is $2/(N!)$, so one is forced to reject the null hypothesis at pretty much any reasonable statistical significance for even a handful of wavers. The observation tells us that something is going on between them, even if it’s not causal (as with any other noncausal statistical correlation).
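
The rule “draw all the putative causal links and check that they lie within the cone” can be sketched for the two Mexican Waves in $1+1$ dimensions. The event co-ordinates below are hypothetical numbers in natural units, chosen only so that the wave pattern moves at ten times $c_I$; the function names are likewise illustrative.

```python
import math

def minus(b, a):
    """Spacetime displacement b - a between two (t, x) events."""
    return (b[0] - a[0], b[1] - a[1])

def in_closed_future_cone(displacement):
    """True if a (t, x) displacement in natural units has t >= |x|."""
    dt, dx = displacement
    return dt >= abs(dx)

def boosted_time(displacement, eta):
    """Time component after the zeta = +1 transformation of rapidity eta."""
    dt, dx = displacement
    return math.cosh(eta) * dt + math.sinh(eta) * dx

# Hypothetical events (t, x): preprogramming visits and the arm waves they trigger
A, B, C = (0.0, 0.0), (10.0, 1.0), (20.0, 2.0)
a, b, c = (100.0, 0.0), (100.1, 1.0), (100.2, 2.0)   # pattern speed 10 in natural units

arranged_links = [(A, B), (B, C), (A, a), (B, b), (C, c)]   # wave by prior arrangement
whisper_links = [(a, b), (b, c)]   # putative direct links at the same pattern speed

arranged_ok = all(in_closed_future_cone(minus(q, p)) for p, q in arranged_links)
whisper_ok = all(in_closed_future_cone(minus(q, p)) for p, q in whisper_links)

# One observer for whom neighboring waves happen in reversed time order ...
eta = -0.5
order_reversed = boosted_time(minus(b, a), eta) < 0
# ... while every arranged causal link still points forward in time
links_still_forward = all(boosted_time(minus(q, p), eta) > 0
                          for p, q in arranged_links)
```

With these numbers the arranged links all pass the cone test and the putative whisper links do not, even though both wave patterns occupy events with the same spacing in space and time.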

We see that our causality postulates, describing very everyday physics, tell us that $\zeta =1$. This deduction, of course, is also backed up experimentally when we measure the lengthening of metastable particle lifetimes that is observed in particle accelerators as we drive metastable particles to very high speeds. By decaying with decay lifetime statistics defined by its mean lifetime $\tau_m$, an ensemble of metastable particles moving relative to us in a particle accelerator can communicate to us when a time $\tau_m$ has passed relative to its reference frame. We observe the lengthened lifetime $\tau_m^\prime = \tau_m/\sqrt{1-\zeta\,v^2/c_I^2}$ as a function of the measured particle speed, so experimentally we know $\zeta$ is positive and curve fitting to our $\tau_m^\prime$ against $v$ results yields an experimental value of $c_I$. The lack of signal speed limit and resultant lack of causality is not the only consequence of a negative $\zeta$ gainsaying everyday experience. Other physics and relationships that would arise with a Euclidean spacetime norm are explored by science fiction author Greg Egan in his trilogy Orthogonal[10]. A wonderful and correct summary of some of these weird changes in a non-Lorentzian universe is given as a primer for his trilogy on Egan’s website[11]; they include a variable lightspeed depending on wavelength, thus a spectral spread of colors in the night sky, the decrease of a body’s total energy as its speed increases and the emission of light by plants to allow them to gain energy by photosynthesis.
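
The curve fitting mentioned above can be sketched as follows. Since $1/\tau_m^{\prime\,2} = (1-\zeta\,v^2/c_I^2)/\tau_m^2$ is a straight line in $v^2$, an ordinary least-squares line gives $\zeta$ from the sign of the slope and $c_I$ from the ratio of intercept to slope. The data here are synthetic, generated from the formula itself with an assumed $c_I$ and $\tau_m$, so this only illustrates the procedure and is not a measurement.

```python
import math

def dilated_lifetime(tau, v, c_i, zeta=1):
    """tau' = tau / sqrt(1 - zeta * v^2 / c_i^2)."""
    return tau / math.sqrt(1 - zeta * v * v / (c_i * c_i))

def fit_zeta_and_c(speeds, lifetimes):
    """Least-squares line through (v^2, 1/tau'^2); returns (zeta, c_I, tau)."""
    xs = [v * v for v in speeds]
    ys = [1.0 / (t * t) for t in lifetimes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals -zeta / (tau^2 * c_I^2)
    intercept = mean_y - slope * mean_x   # equals 1 / tau^2
    zeta = 1 if slope < 0 else -1
    c_i = math.sqrt(intercept / abs(slope))
    tau = 1.0 / math.sqrt(intercept)
    return zeta, c_i, tau

# Synthetic "measurements": assumed values, in km/s and seconds
C_ASSUMED = 299792.458
TAU_ASSUMED = 2.2e-6
speeds = [f * C_ASSUMED for f in (0.1, 0.3, 0.5, 0.7, 0.9)]
lifetimes = [dilated_lifetime(TAU_ASSUMED, v, C_ASSUMED) for v in speeds]

zeta_fit, c_fit, tau_fit = fit_zeta_and_c(speeds, lifetimes)
```

A falling line (negative slope) signals $\zeta=+1$; data generated with $\zeta=-1$ would instead give shortened lifetimes and a rising line.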

How wonderfully elegantly the Lorentz transformation and the notion of a universal cause speed limit fall out of Galileo’s postulate and a careful contemplation of everyday observation!

Einstein’s Postulate and Relationships Between Experimental Results

From our analysis so far, $c_I$ is not needfully the speed of any actual existing thing, but its value can nonetheless in principle be measured experimentally quite independently from light by experimentally comparing the velocity addition between three collinear frames for high relative speeds and testing for deviation from the absolute time Galilean formula $v_3=v_1+v_2$. We should (and do) see the generalized addition law in \eqref{RelativisticAddition}. This is actually equivalent to seeking to measure the time dilation as evidenced by the metastable particle mean lifetime measurements already described, either in a particle accelerator or cosmic-ray-begotten muons. An everyday example of time dilation is the observation that metastable muons reach the Earth’s surface from interactions between cosmic rays and the high atmosphere. At rest, their lifetime is $2.2{\rm \mu s}$, so that, even at lightspeed, their passage through the atmosphere takes of the order of ten lifetimes, yet the flux of muons at the Earth’s surface is much greater than the flux attenuation implied by ten lifetimes should allow. The experiments of Rossi and Hall in 1941[12] and of Frisch and Smith in 1963[13] explored this attenuation in detail. A particularly simple and striking instance of \eqref{RelativisticAddition} is when $v_2=1$ in natural units: we get $v_3 = 1$ no matter what velocity $v_1$ is added to it. Otherwise put: the eigenvectors of any transformation in $\mathfrak{L}_{2\times2}$ when $\zeta=1$ are $(1,\,\pm1)$, i.e. along the dashed lines in Fig. (3) and corresponding to something moving at velocity $\pm c_I$. So, another potential experimental result arises if we do happen to find something that moves at $c_I$ relative to us. Without a numerical value for $c_I$ we simply look for something, anything, whose measured relative speed is experimentally invariant. 
If we observe such a thing, its speed relative to us must be $c_I$, and, by the argument for the universality of $c_I$ and $\zeta$ in §2 above, we cannot observe more than one speed transforming in this special way without gainsaying our postulates. Of course, the Michelson-Morley experiment did find such a thing: the speed of light transforms thus. The observation of an invariant speed also confirms, along with time dilation, that $\zeta=+1$, for if $\zeta=-1$, the addition rule in \eqref{RelativisticAddition} is $v_3=(v_1+v_2)/(1-v_1\,v_2)$ and the invariance condition $v=(v+v_2)/(1-v\,v_2)$ has no real solution $v$ for any $v_2\neq0$, since it would need $v^2=-1$. It turns out of course (from a simple, but outside-the-scope-of-this-paper analysis) that only things with zero rest mass can be observed to move at $c_I$ and indeed they cannot be observed to move at any other speed, without acquiring rest mass. Thus, from our universality of $c_I$ and $\zeta$ argument, if the massless graviton truly emerges from a future full quantum gravity theory, for example, its speed in all inertial reference frames must be the selfsame $c_I$ we are talking about here: otherwise it must have a nonzero rest mass. Likewise, the neutrino was for many years thought to have zero rest mass and indeed so far neutrino speeds are measured, to within the experimental error (including that of the infamous 2011 OPERA measurement[14]), to be the same as $c_I$. Only indirectly through the phenomenon of flavor oscillation do we know that there must be a difference between $c_I$ and neutrino speeds[15]. The same can be said for the speed of light: all experiments show $c_L$ to be invariant and bounds on the photon rest mass are astoundingly low[16], [17]. Our most rigorous test so far of the existence of the Ignatowski speed is perhaps the analysis of gamma ray bursts with the Fermi Gamma Ray Space Telescope, which confirms the absence of photon speed variation even over journeys of cosmological time frames[18].
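The behavior of the addition law described above can be tried out numerically. What follows is a minimal Python sketch, assuming natural units ($c_I=1$) and assuming that \eqref{RelativisticAddition} takes the form $v_3=(v_1+v_2)/(1+\zeta\,v_1\,v_2)$, which agrees with the $\zeta=-1$ case quoted above; the name `add_velocities` is purely illustrative.

```python
def add_velocities(v1, v2, zeta=1):
    """Compose two collinear velocities in natural units (c_I = 1).

    Assumed form of the addition law: (v1 + v2) / (1 + zeta*v1*v2).
    """
    return (v1 + v2) / (1 + zeta * v1 * v2)


# zeta = +1: slow speeds add almost as the absolute time formula says ...
print(add_velocities(1e-4, 2e-4))

# ... but v2 = 1 comes out as 1 whatever v1 is added to it.
for v1 in (-0.9, 0.0, 0.5, 0.99):
    print(v1, add_velocities(v1, 1.0))

# zeta = -1: composing with v2 = 0.5 shifts every sampled speed,
# so no speed in the sampled range is left invariant.
samples = [k / 100 for k in range(-99, 100)]
smallest_shift = min(abs(add_velocities(v, 0.5, zeta=-1) - v) for v in samples)
print(smallest_shift)
```

The sampled range for the $\zeta=-1$ case is kept inside $(-1,\,1)$ only to steer clear of the pole of the formula at $v\,v_2=1$; the algebraic argument in the text covers all real $v$.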

In the light of this thinking, the Michelson-Morley experiment is logically altogether independent from special relativity, aside from its historical role: its negative result gave us a direct measurement of the invariant Ignatowski speed and the experimental prompting needed for Einstein, Poincaré, Lorentz, Minkowski and others to formulate the special theory of relativity. A positive result from the Michelson-Morley experiment would show that lightspeed $c_L$ is less than $c_I$; if $c_L$ were an appreciable fraction of $c_I$, the latter could still be measured by this experiment. In the light of Ignatowskian thinking, a positive Michelson-Morley result could be interpreted as follows:

  1. Light is mediated by a massive particle, i.e. that has a nonzero rest mass;
  2. The Earth itself is steeped in a material medium with nonunity refractive index that we have not otherwise identified, and light would have a true speed of $c_I>c_L$ in parts of the Universe that were empty of this material medium;
  3. Light requires a medium for propagation which can in principle be removed; this alternative begets the possibility of “dark regions” in space that cannot transmit light.

None of this is necessary, of course, because lightspeed has been experimentally confirmed to be equal to $c_I$ to an extreme precision; see for example, the experiments of Herrmann and others[19]. The point of the above discussion is to show the logical independence of the Michelson-Morley experiment and special relativity: one can conceive of a logically consistent universe wherein there were a positive result and there would be an invariant speed $c_I$ unrelated to light. We, along with Ignatowski, derived an invariant speed $c_I$ from our Galilean, continuity and causality postulates. The Michelson-Morley experiment was strongly suggestive, to Lorentz, Poincaré, Einstein and others, that $c_L$ was an invariant speed, which assumption afforded Einstein his derivation and interpretation of the special theory of relativity. Einstein’s own postulate (the so-called “second” relativity postulate) becomes, from the standpoint of the Ignatowskian approach, more like a summary of experimental results that supports Ignatowski’s derivation:

Einstein Assertion Summarizing Aether Detection Experiments: The speed of light $c_L$ equals the Ignatowski speed $c_I$ and is thus observed to be the same for all inertial observers.

Non Collinear Motions, Wigner Rotation and the Thomas Precession

Now unfortunately things become a little more complicated when motions are not all collinear. The smallest Lie group containing the three separate one-parameter groups of boosts in the $x$, $y$ and $z$ directions (and indeed for boosts along any three, linearly independent directions) also contains all spatial rotations along with boosts in any direction. This Lie group is $SO(1,\,3)$ and a discussion of it is likely to be a little beyond what most instructors would want to put in a first course on special relativity. However, once the student has mastered the one dimensional application above, he or she should be able to accept that when one allows the composition of many relative motions in different directions, the Galileo principle that the composition of two Lorentz transformations is a Lorentz transformation still holds, but in a somewhat more complicated form. One needs to broaden one’s notion of the most general transformation from a group of collinear boosts to a group wherein every transformation is a rotation composed with a boost. It is also worth stating that this fact is a mathematical one, not one induced from experimental results: experiment cannot vary from this foretold behavior without falsifying our postulates themselves. We are of course talking about the polar decomposition of a general member of $SO(1,\,3)$: that every proper Lorentz transformation can be written $\Lambda = B\,R$ where $R$ is a rotation operator that leaves the time co-ordinate unchanged and $B$ a pure boost in any direction. One can also use a perfectly good decomposition where the rotation operator is on the right, as long as one is consistent with the order. The phenomenon of boosts composing to rotating boosts is called Thomas Precession[20], [21] or Wigner Rotation[22].
How much one includes of the discussion of $SO(1,\,3)$ depends on the level of the students, but in all cases some informal mathematical experiments, particularly with a computerized symbolic mathematical exploration environment such as Mathematica, are an excellent introduction and intuition building exercise for students of all levels.
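For readers without Mathematica, an informal experiment of this kind can also be sketched in plain Python. The sketch below is only illustrative: it works in $2+1$ dimensions with co-ordinates $(t,\,x,\,y)$ and natural units, assumes the boost generators are the symmetric matrices `KX` and `KY` shown, and uses hypothetical helper names (`boost`, `polar_decompose`, `wigner_angle`). It composes an $x$-boost with a $y$-boost and splits the product as $\Lambda=B\,R$, reading the boost $B$ off the first column of $\Lambda$ (which $R$ leaves alone).

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(M, terms=40):
    """Matrix exponential by its Taylor series (fine for small matrices)."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

KX = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]   # assumed generator of x-boosts
KY = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]   # assumed generator of y-boosts

def boost(eta_x, eta_y):
    """Pure boost exp(eta_x*KX + eta_y*KY) acting on (t, x, y)."""
    G = [[eta_x * KX[i][j] + eta_y * KY[i][j] for j in range(3)] for i in range(3)]
    return expm(G)

def polar_decompose(L):
    """Split L = B R with B a pure boost and R a rotation fixing the time axis."""
    s = math.hypot(L[1][0], L[2][0])          # sinh(rapidity) of B
    eta = math.asinh(s)
    nx, ny = (L[1][0] / s, L[2][0] / s) if s > 0 else (1.0, 0.0)
    B = boost(eta * nx, eta * ny)
    R = matmul(boost(-eta * nx, -eta * ny), L)   # R = B^{-1} L
    return B, R

def wigner_angle(eta1, eta2):
    """Rotation angle left over when an x-boost is followed by a y-boost."""
    _, R = polar_decompose(matmul(boost(eta1, 0.0), boost(0.0, eta2)))
    return math.atan2(R[1][2], R[1][1])

print(wigner_angle(0.1, 0.1))   # small rapidities: magnitude close to 0.1*0.1/2
print(wigner_angle(0.8, 0.6))
```

Two collinear boosts fed through `polar_decompose` give back a rotation indistinguishable from the identity, which makes a useful contrast for students.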

Another wonderful scientific experience of a simplified Thomas precession discussion is that everyday physical intuitions, on careful examination through the analysis of this paper, lead to something that is well beyond everyday intuition: the Thomas precession is a subtle and weird effect, Popper-falsifiable and likely to be genuinely new to the student. Every teenage physics nerd in 2015 has heard of and understands the phenomenon of time dilation but the same is not true of the Thomas Precession. Let’s sketch out an exercise exploring what is usually quite an advanced topic, but one which both (1) can be linked to compelling, supporting, easily stated experimental results and (2) lends itself well to simple computerized mathematical exploration; it’s surprising just how deeply one can make an inroad into this topic in this way. Thomas used this phenomenon to calculate the splitting between the two $\ell=1,\,s=\pm1/2$ states in hydrogen successfully[20], [21] (also see §2.54 of Crowell’s book[3]); before this, calculations were off the observed result by a famous unexplained factor of $1/2$.

Naturally, in thinking of the composition of two boosts, one aligns one’s co-ordinate system so that the $X\wedge Y$ plane is the plane defined by the two boosts, and, without loss of generalness, we align one of the boosts to the $x$ direction. We now only need three spacetime co-ordinates; our boost matrices act on row vectors of the form $(t,\,x,\,y)$. We use natural time and normalize all velocities so that $c_I=1$. Using a rotation $R(\theta)$ in the plane and the boost $\boldsymbol{\Lambda}_x(\eta)$ in the $x$-direction, we can understand that a boost along the ray defined by the unit vector $(0,\,\cos\theta,\,\sin\theta)$ is $R^{-1}(\theta)\, \boldsymbol{\Lambda}_x(\eta)\,R(\theta)$ (rotate one’s co-ordinates so that the new $x$ direction points along the boost-ray, write down the boost using \eqref{MostGeneralNatural} and then rotate your co-ordinates back again):

\begin{equation}\label{ThetaBoost}\begin{array}{l}\displaystyle{\boldsymbol{\Lambda}(\theta,\,\eta) = R(\theta)^{-1}\,e^{\eta\,K_x}\,R(\theta) = \exp\left(\eta\,R(\theta)^{-1}\,K_x\,R(\theta)\right)=\exp\left(\eta\,e^{-\theta\,H}\,K_x\,e^{\theta\,H}\right)}\\\\
K_x =\left(\begin{array}{ccc}0&1&0\\1&0&0\\0&0&0\end{array}\right);\quad\,H=\left(\begin{array}{ccc}0&0&0\\0&0&1\\0&-1&0\end{array}\right);\quad R(\theta)^{-1}\,K_x\,R(\theta)=\left(\begin{array}{ccc}0 & \cos\theta & \sin\theta \\\cos\theta & 0 & 0 \\\sin\theta & 0 & 0 \\\end{array}\right)\\\\\displaystyle{\boldsymbol{\Lambda}(\theta,\,\eta)= \exp\left(\eta\,(\cos\theta\,K_x+\sin\theta\,K_y)\right)}\end{array}\end{equation}
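The two facts packed into this equation, namely that conjugating $K_x$ by the rotation gives $\cos\theta\,K_x+\sin\theta\,K_y$ and that the boost along direction $\theta$ is the exponential of $\eta$ times that combination, can be spot-checked numerically. The following Python sketch assumes $R(\theta)=e^{\theta\,H}$ and the matrices $K_x$, $K_y$, $H$ as written in this paper; the helper names and the sample values of $\theta$ and $\eta$ are arbitrary choices for illustration.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(M, terms=40):
    """Matrix exponential by its Taylor series (fine for small matrices)."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def scale(c, M):
    return [[c * x for x in row] for row in M]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def max_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

KX = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
KY = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
H = [[0, 0, 0], [0, 0, 1], [0, -1, 0]]

theta, eta = 0.7, 1.2                      # arbitrary sample values
R = expm(scale(theta, H))                  # R(theta), assumed to be exp(theta*H)
R_inv = expm(scale(-theta, H))

# Generator level: R^{-1} Kx R versus cos(theta) Kx + sin(theta) Ky
conjugated = matmul(matmul(R_inv, KX), R)
direction = add(scale(math.cos(theta), KX), scale(math.sin(theta), KY))
generator_error = max_diff(conjugated, direction)

# Group level: R^{-1} exp(eta Kx) R versus exp(eta * direction)
rotated_boost = matmul(matmul(R_inv, expm(scale(eta, KX))), R)
group_error = max_diff(rotated_boost, expm(scale(eta, direction)))

print(generator_error, group_error)        # both should be at rounding level
```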

Witness that any boost in this plane (have the student prove this from \eqref{ThetaBoost}) can be represented as $\exp(\eta_x\,K_x+\eta_y\,K_y)$ (see \eqref{SO12} for definition of $K_y$). A fact from Lie theory that shouldn’t be too hard for students to swallow (the full proof is to be found in §2.5 of Rossmann[23]) is that the smallest group containing any set of matrices of the form $\exp(x_j\,X_j);\,x_j\in\mathbb{R}$ is a matrix group $\mathfrak{G}$ whose members “near” the identity are all of the form $e^Y$ where $Y\in\mathfrak{g}$ and $\mathfrak{g}$ is the smallest Lie algebra containing the original matrices $X_j$ in question and indeed all matrices of the form $e^Y;\,Y\in\mathfrak{g}$ are in the smallest group $\mathfrak{G}$ just spoken about. $\mathfrak{g}$ is most simply described as the set of all matrices gotten from the $X_j$ by a finite number of scaling, addition and Lie bracket $\mathrm{Lie}(X,\,Y)=[X,\,Y]=X\,Y-Y\,X$ operations. All our boosts in one plane are of the form $\exp(\eta_x\,K_x+\eta_y\,K_y)$ but the Lie bracket of two matrices of the form $\eta_x\,K_x+\eta_y\,K_y$ is not of this same form. Again, have the student prove this, preferably with the help of a computerized mathematical exploration environment as you want the student to see the proof happening without being bogged down by algebraic slipups. Now have them show that the algebra spanned by $K_x,\,K_y$ and $H$:

\begin{equation}\label{SO12}K_x =\left(\begin{array}{ccc}0&1&0\\1&0&0\\0&0&0\end{array}\right);\quad K_y =\left(\begin{array}{ccc}0&0&1\\0&0&0\\1&0&0\end{array}\right);\quad\,H=\left(\begin{array}{ccc}0&0&0\\0&0&1\\0&-1&0\end{array}\right)\end{equation}

is closed under linear and Lie bracket operations and $\mathrm{Lie}(K_x,\,K_y)=[K_x,\,K_y]=H$. This algebra generates not the full group of boosts and rotations, but rather the subgroup $SO(1,\,2)$ of boosts and rotations in one plane. So, our general composition of two boosts $\exp(\eta_{1\,x}\,K_x+\eta_{1\,y}\,K_y)$ and $\exp(\eta_{2\,x}\,K_x+\eta_{2\,y}\,K_y)$ is of the form $\exp(\eta_{3\,x}^\prime\,K_x+\eta_{3\,y}^\prime\,K_y+\phi^\prime\,H)$; this in turn can be written in polar form $\exp(\eta_{3\,x}\,K_x+\eta_{3\,y}\,K_y)\,\exp(\phi\,H)$. So now we address the Thomas Precession, asking the question in Fig. (5).
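The closure exercise can be carried out with exact integer arithmetic, so no rounding clouds the result. This is a small Python sketch using the three matrices of \eqref{SO12}; `bracket` and `boost_generator` are illustrative names, not anything taken from a library.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def bracket(X, Y):
    """Lie bracket [X, Y] = XY - YX."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(3)] for i in range(3)]

def scale(c, M):
    return [[c * x for x in row] for row in M]

KX = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
KY = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
H = [[0, 0, 0], [0, 0, 1], [0, -1, 0]]

def boost_generator(ex, ey):
    """The matrix ex*KX + ey*KY."""
    return [[ex * KX[i][j] + ey * KY[i][j] for j in range(3)] for i in range(3)]

# Brackets among the three basis matrices stay inside their span.
print(bracket(KX, KY) == H)
print(bracket(H, KX) == scale(-1, KY))
print(bracket(H, KY) == KX)

# The bracket of two boost generators is a multiple of H, hence not itself
# a boost generator unless the two are parallel.
print(bracket(boost_generator(2, 3), boost_generator(5, 7)) == scale(2 * 7 - 3 * 5, H))
```

All four comparisons should print `True`; the last one illustrates the general pattern $[a_1\,K_x+b_1\,K_y,\,a_2\,K_x+b_2\,K_y]=(a_1\,b_2-b_1\,a_2)\,H$.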

Figure 5: A short history within the life of an accelerated electron: we boost to the electron’s instantaneously uniformly comoving frame through transformation $e^{\eta\,K_x}\,e^{\phi\,H}$; an instant later, the electron’s co-moving frame is boosted relative to our new frame by $e^{\delta_x\,K_x+\delta_y\,K_y}$, where $\phi$ is the initial angle between the electron’s frame and our frame. The question: what generalized polar Lorentz transformation $\exp((\eta+\delta\eta)(\cos\delta\theta\,K_x+\sin\delta\theta\,K_y))\,e^{(\phi+\delta\phi)\,H}$ relative to our initial frame defines the electron’s new instantaneously uniformly comoving frame?

The figure is meant to evoke matrix composition by vector addition; as already observed, this cannot capture the full picture. We have the final boost vector $\exp((\eta+\delta\eta)(\cos\delta\theta\,K_x+\sin\delta\theta\,K_y))$ composed with the changed rotation $e^{(\phi+\delta\phi)\,H}$ of the electron’s Cartesian $x$ and $y$ axes relative to ours. The full calculation is defined by:

\begin{equation}\label{ThomasPrecession}\begin{array}{ccl}&&\displaystyle{\exp((\eta+\delta\eta)(\cos\delta\theta\,K_x+\sin\delta\theta\,K_y))\,e^{(\phi+\delta\phi)\,H}= e^{\eta\,K_x}\,e^{\phi\,H}\,e^{\delta_x\,K_x+\delta_y\,K_y}}\\\\\Leftrightarrow &&\displaystyle{\exp((\eta+\delta\eta)(\cos\delta\theta\,K_x+\sin\delta\theta\,K_y))\,e^{\delta\phi\,H}\,e^{-\eta\,K_x}=e^{\eta\,K_x}\,e^{\phi\,H}\,e^{\delta_x\,K_x+\delta_y\,K_y}\,e^{-\phi\,H}\,e^{-\eta\,K_x}}\\&=&\displaystyle{\exp\left(e^{\eta\,K_x}\,e^{\phi\,H}\,(\delta_x\,K_x+\delta_y\,K_y)\,e^{-\phi\,H}\,e^{-\eta\,K_x}\right)}\end{array}\end{equation}

and, since in the final line of \eqref{ThomasPrecession} the matrices on both sides of the equation are near the identity, this equation fully, exactly defines $\delta\eta(\delta_x,\,\delta_y),\,\delta\phi(\delta_x,\,\delta_y)$ and $\delta\theta(\delta_x,\,\delta_y)$ ($\delta\theta$ representing the change in boost direction and $\delta\phi$ the change in orientation between the two co-ordinate systems) as functions of the boost change parameters $\delta_x,\,\delta_y$ when $\delta\eta,\,\delta\theta$ are small enough, but non-infinitesimal\footnote{i.e. through the matrix logarithm, which converges so long as $|\lambda_j-1|<1$ holds for all eigenvalues of the matrices on either side of the last equation in \eqref{ThomasPrecession}, and the Campbell-Baker-Hausdorff series.}. This is a highly involved calculation, but approximating it to first order and passing to the limit lets us work out the instantaneous angular velocity of precession of the electron’s axes relative to ours. This is done with the following Mathematica session, assuming the changes $\delta\eta,\,\delta\phi,\,\delta\theta,\,\delta_x,\,\delta_y$ are all proportional to a parameter $a$, and the first order behavior with respect to $a$ is derived:

In[1]:= Kx={{0,1,0},{1,0,0},{0,0,0}};
In[2]:= Ky={{0,0,1},{0,0,0},{1,0,0}};
In[3]:= H={{0,0,0},{0,0,1},{0,-1,0}};

In[4]:= ExpToTrig[Series[MatrixExp[(eta+a deltaE)(Cos[a deltaTh] Kx+Sin[a deltaTh] Ky)].
MatrixExp[a deltaPh H].MatrixExp[-eta Kx]-MatrixExp[MatrixExp[eta Kx].
MatrixExp[phi H].(a deltaX Kx+a deltaY Ky).MatrixExp[-phi H].MatrixExp[-eta Kx]],
{a,0,1}]]//MatrixForm
Out[4]//MatrixForm= ........

In[5]:= Solve[%==0,{deltaTh,deltaPh,deltaE}]//Simplify
Out[5]= {{deltaTh->Csch[eta] (deltaY Cos[phi]-deltaX Sin[phi]),
deltaPh->(deltaY Cos[phi]-deltaX Sin[phi]) Tanh[eta/2],
deltaE->deltaX Cos[phi]+deltaY Sin[phi]}}

In[6]:= deltaPh/deltaTh/.%[[1]]//Simplify
Out[6]= Sinh[eta] Tanh[eta/2]

In[7]:= %-(Cosh[eta]-1)//Simplify
Out[7]= 0

The last output in the Mathematica calculation is identically nought, showing that $\Omega/\omega=\lim\limits_{a\to0}\delta\phi/\delta\theta=\mathrm{d}_\theta\phi=\cosh\eta-1$; that is, the rate of precession $\Omega$ of the electron frame’s co-ordinate axes with respect to ours is $\Omega = (\cosh\eta-1)\,\omega$, where $\omega=\dot{\theta}$ is the rate of change of the boost direction. Thus, if the electron undergoes circular motion and completes one “orbit”, its axes precess through an angle of $2\,\pi\,(\cosh\eta-1)=2\,\pi\,(\gamma-1)$ relative to ours, which is the well-kenned Thomas Precession Rate formula[23], [21]. The large output after line 4 in the above session has been suppressed.
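As a cross-check that does not lean on symbolic algebra, one can plug the first order solution from `Out[5]` back into the last line of \eqref{ThomasPrecession} and watch the mismatch shrink faster than linearly in $a$. The Python sketch below is one such check; the sample values of $\eta$, $\phi$, $\delta_x$, $\delta_y$ are arbitrary, and `residual` and `precession_ratio` are illustrative names.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def chain(*Ms):
    out = Ms[0]
    for M in Ms[1:]:
        out = matmul(out, M)
    return out

def expm(M, terms=40):
    """Matrix exponential by its Taylor series (fine for small matrices)."""
    result = [[float(i == j) for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return result

KX = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
KY = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
H = [[0, 0, 0], [0, 0, 1], [0, -1, 0]]

def combo(cx, cy, ch=0.0):
    """The matrix cx*KX + cy*KY + ch*H."""
    return [[cx * KX[i][j] + cy * KY[i][j] + ch * H[i][j] for j in range(3)]
            for i in range(3)]

def residual(a, eta=0.5, phi=0.3, dx=0.7, dy=0.4):
    """Largest entry of (left side - right side), using the Out[5] solution."""
    d_th = (dy * math.cos(phi) - dx * math.sin(phi)) / math.sinh(eta)
    d_ph = (dy * math.cos(phi) - dx * math.sin(phi)) * math.tanh(eta / 2)
    d_e = dx * math.cos(phi) + dy * math.sin(phi)
    r = eta + a * d_e
    lhs = chain(expm(combo(r * math.cos(a * d_th), r * math.sin(a * d_th))),
                expm(combo(0, 0, a * d_ph)),
                expm(combo(-eta, 0)))
    inner = chain(expm(combo(eta, 0)), expm(combo(0, 0, phi)),
                  combo(a * dx, a * dy),
                  expm(combo(0, 0, -phi)), expm(combo(-eta, 0)))
    rhs = expm(inner)
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))

def precession_ratio(eta):
    """deltaPh/deltaTh from Out[6]; should equal cosh(eta) - 1."""
    return math.sinh(eta) * math.tanh(eta / 2)

print(residual(1e-2), residual(1e-3))   # shrinks much faster than a itself
print(precession_ratio(0.5), math.cosh(0.5) - 1)
```

A mismatch that falls by roughly a hundredfold when $a$ falls tenfold is what one expects if the solution is right to first order and the leftover is second order.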

§2.5.3 of Crowell’s book[3] deals with the “simple” problem where the relativistic electron undergoes four, orthogonal, instantaneous, discrete accelerations, to make the particle follow a rectangular path. The total transformation is $e^{-\eta\,K_x}\,e^{-\eta\,K_y}\,e^{\eta\,K_x}\,e^{\eta\,K_y}$ which is easy to expand as a Taylor series in Mathematica as follows:

In[1]:= Kx={{0,1,0},{1,0,0},{0,0,0}};
In[2]:= Ky={{0,0,1},{0,0,0},{1,0,0}};
In[3]:= H={{0,0,0},{0,0,1},{0,-1,0}};

In[4]:= Series[MatrixLog[MatrixExp[-eta Kx].MatrixExp[-eta Ky].MatrixExp[eta Kx].
MatrixExp[eta Ky]],{eta,0,4}]//Normal//Simplify

Out[4]= {{0,eta^3/2,-(eta^3/2)},{eta^3/2,0,eta^2+eta^4/3},
{-(eta^3/2),-(1/3) eta^2 (3+eta^2),0}}

Indeed, the last line above outputs the expression $\log\left(e^{-\eta\,K_x}\,e^{-\eta\,K_y}\,e^{\eta\,K_x}\,e^{\eta\,K_y}\right)\approx \eta^2\,[K_x,\,K_y]+O(\eta^3)$; the expression is second order in the rapidity and the Lie bracket relationship for the second order term is general for any matrices $K_x,\,K_y$; in this special case, the Lie bracket $[K_x,\,K_y]=H$ and so, to second order in $\eta$, our four boosts compose to a rotation together with higher order residual boosts: the output from the last line of the above Mathematica session shows that there are nonzero, but third and higher order, $K_x$ and $K_y$ terms.
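The same expansion can be compared against a direct numerical evaluation. In the Python sketch below the matrix logarithm is computed from its Taylor series about the identity, which is adequate here because the product of the four boosts is close to the identity for small $\eta$; the value $\eta=0.05$ and the helper names are arbitrary illustrative choices, and `series` simply transcribes `Out[4]` from the session above.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def expm(M, terms=40):
    """Matrix exponential by its Taylor series."""
    result = [[float(i == j) for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return result

def logm(M, terms=30):
    """Matrix logarithm by the series for log(id + X); needs M near the identity."""
    X = [[M[i][j] - (i == j) for j in range(3)] for i in range(3)]
    power = [[float(i == j) for j in range(3)] for i in range(3)]
    result = [[0.0] * 3 for _ in range(3)]
    for k in range(1, terms + 1):
        power = matmul(power, X)
        sign = 1.0 if k % 2 else -1.0
        result = [[result[i][j] + sign * power[i][j] / k for j in range(3)]
                  for i in range(3)]
    return result

def scale(c, M):
    return [[c * x for x in row] for row in M]

KX = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
KY = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]

def four_boost_log(eta):
    """log( e^{-eta Kx} e^{-eta Ky} e^{eta Kx} e^{eta Ky} )."""
    product = expm(scale(-eta, KX))
    for M in (scale(-eta, KY), scale(eta, KX), scale(eta, KY)):
        product = matmul(product, expm(M))
    return logm(product)

eta = 0.05
numeric = four_boost_log(eta)
# Out[4] of the session above, evaluated at this eta
series = [[0.0, eta**3 / 2, -eta**3 / 2],
          [eta**3 / 2, 0.0, eta**2 + eta**4 / 3],
          [-eta**3 / 2, -(eta**2 + eta**4 / 3), 0.0]]
mismatch = max(abs(numeric[i][j] - series[i][j]) for i in range(3) for j in range(3))
print(numeric[1][2], eta**2)   # the rotation part, close to eta^2
print(mismatch)                # should be small compared with eta^3
```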

Conclusion: Where Einstein Fits In

We have given a full derivation of the Lorentz transformation using Galileo’s relativity, together with isotropy, continuity and causality assumptions but without Einstein’s postulate, as well as arguing that homogeneity is contained within Galileo’s principle and does not need to be a separate postulate. In short, Galileo’s relativity alone with the assumption of absolute time relaxed is essentially special relativity. Let us take stock of our assumptions and how they lead to our results:

  1. The Galileo relativity postulate implies the transformations wrought between co-ordinates of relatively uniformly moving frames must be a group because a description of physics from all frames must contain the same information, and a group is the minimum structure we must impose on the transformations with their composition operation to ensure information is not destroyed by transformation;
  2. The Copernican assumption that “Nature does not care where we put our co-ordinate origin” together with an assumption of continuity of each transformation itself (e.g. we don’t see images of passing trees shattered into disconnected sets by our motion inside a bus), also implies that the group of transformations acts linearly on the frame co-ordinates;
  3. An assumption of continuity of the composition and inversion operations – intuitively that we can accelerate smoothly and not jaggedly between different inertial motion states – together with a spatial isotropy assumption implies motions along a given spatial ray compose to motions along that ray and that the group of corresponding co-ordinate transformations must be a one-parameter Lie group of the form $\{\exp(\eta\,K)|\,\eta\in\mathbb{R}\}$ for some constant matrix $K$;
  4. Isotropy of space also shows that the transformation group members, as well as $K$, must commute with rotations in the plane orthogonal to relative motion and become inverted by switching the sense of motion; at this point, the assumptions completely define the form of the Lorentz transformation aside from the signature $\zeta=\pm1$;
  5. $\zeta=-1$ is found to be incompatible with Galileo’s postulate because there is always some inertial observer who will see an effect come before its cause in a relatively uniformly moving frame. The case $\zeta=+1$ restores our intuitive notion of causality but only with a further genuinely new condition: only cause-effect links travelling at less than or equal to the speed $c_I$ can be both cause-effect links and be in keeping with our intuitive notions: that is, such links are forwards pointing in time for all inertial observers;
  6. In the $\zeta=+1$ case with finite $c_I$ we find that $c_I$ also has the property that it will be observed to be the same in all inertial frames, an experimentally observed behavior of the speed of light propagation. Speed of light measurements confirm that we live in a universe with a finite $c_I$, not that with $c_I=\infty$ traditionally thought of as Galilean relativity. The Galilean postulate also implies there can only be one value of $c_I$ with the invariance property.
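Items 5 and 6 of this list lend themselves to a quick numerical illustration. The Python sketch below is hedged on two assumptions that are not spelled out in this section: that in natural units the $\zeta=+1$ boost sends a time separation $\Delta t$ to $\cosh\eta\,\Delta t+\sinh\eta\,\Delta x$, and that the $\zeta=-1$ transformation is the corresponding rotation mixing $t$ and $x$. It asks, for a cause-effect link of speed $u=\Delta x/\Delta t$, whether every sampled observer agrees that the effect follows the cause. The function names are illustrative only.

```python
import math

def boosted_dt(dt, dx, eta, zeta=1):
    """Time separation of two events as seen after a transformation of parameter eta."""
    if zeta > 0:
        return math.cosh(eta) * dt + math.sinh(eta) * dx   # assumed Lorentz form
    return math.cos(eta) * dt + math.sin(eta) * dx         # assumed rotation form

ETAS = [k / 10 for k in range(-60, 61)]   # sampled observers

def order_preserved(u, zeta=1):
    """True if a link of speed u points forward in time for every sampled observer."""
    return all(boosted_dt(1.0, u, eta, zeta) > 0 for eta in ETAS)

for u in (0.0, 0.5, 1.0, 2.0):
    print(u, order_preserved(u, zeta=1), order_preserved(u, zeta=-1))
```

With these assumptions, links with $|u|\leq1$ keep their time ordering when $\zeta=+1$, a link with $u=2$ does not, and when $\zeta=-1$ even a link at rest can be reversed.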

Therefore, not only do we discover the full form of the Lorentz transformation, but we also discover the remarkable universal speed limit $c_I$ for cause-effect links. Once we accept this speed limit, cause and effect are related in the way we should expect from our everyday experience. This is perhaps the most surprising aspect of special relativity with a nonabsolute notion of time: for all the talk of twin-paradoxes and of the “overthrow” of our traditional and intuitive notions of time, special relativity, at an only slightly more abstracted level, actually preserves pretty much all of our intuitive, everyday notions of time. For, although our timelines may become stretched and shrunken relative to those of other observers somewhat in extreme conditions far beyond those we are ever likely to encounter, even if they do, yet still we are born, grow through our childhood, inexorably grow old and die and, most crucially, the topology of the web of causal links that binds each of us to the World around us is utterly unchanged and is the same for all inertial observers who look upon that web. Part of my underlying thesis is that special relativity is all about everyday observations and readily grasped symmetries that we have all appreciated at some level since our childhoods. Of course, the phenomenon of time dilation is a little weird, but in 2015 its existence is overwhelmingly confirmed experimentally.

Special relativity is not so arcane or counter-intuitive after all; it is simply that we need to examine our own physical intuitions carefully and look a little beyond the obvious to understand this conclusion. Before Einstein, we extrapolated our everyday experience to conclusions that were initially the simplest, but not the only possible conclusions from our everyday experience. Indeed, as Brian Greene in his documentary “The Illusion of Time”[24] points out, before the late nineteenth century, very few people had seriously thought about the issues of synchronization or fine measurement of time, whereas during the late nineteenth century the widespread uptake of train networks brought with it the need to synchronize clocks everywhere, precisely. Whilst Greene’s thesis overlooks notable early exceptions, for example, John Harrison’s marine chronometer\cite{Sobel} and the longitude problem itself, his point is well made: firstly, the common woman and man had very few innate intuitions about time aside from the circadian rhythms of the Sun’s apparent motion, the causality we have discussed and the growth and ageing of themselves and their fellows until the industrial age forced a new conception of time on them. Secondly, the sudden mass need for time synchronization technology pressed on Einstein the task of examining patents to do with the problems of synchronization of electromechanical devices[25], gaining him both practical knowledge and the mindroom wherein to foster thoughts that would serve him well in 1905.

A superficial reading of this paper might imply that I have sought to excise Einstein, but this is false. Einstein’s uniquely Einstein contribution to special relativity was that there could and should be a relaxation of the notion of absolute time in the first place, that the different time rates experienced by different inertial observers were real and not some diminished local time that Lorentz seems to have beheld or trick of convention that Poincaré suspected. He did it in a way that reflected the experimental results and pressing issues of his day: by studying light i.e. by a different approach from that of this paper. Einstein promoted time transformation to a full reality, and only then was science ready for the birth of a geometrical description of spacetime, with Minkowski as midwife in 1908[26]. The word “spacetime” has become so much a part even of our broader culture that we forget that there was once a time before the birth of Minkowski’s geometrical thought picture; indeed what seems perfectly natural to us was thought by Einstein himself, one of the last century’s greatest minds, at first to be a mere mathematical trick[27].

So, whilst I believe the method of this paper is sometimes a better teaching model for some people – those multicellular organisms who think somewhat like I do – than Einstein’s approach beginning with the invariant lightspeed postulate together with Galileo’s, we really can only make this approach work standing on Einstein’s and Minkowski’s shoulders. It is telling that Ignatowski’s paper came in 1910, two years after Minkowski: pretty much the brooding time one would expect once Minkowski had made his contribution. It was Einstein who passionately reinvigorated Galileo’s principle, and it was Einstein who taught us how to read Galileo properly. And when we do read Galileo correctly, we come upon the remarkable universal speed limit for causal links, and with it the germ of the idea of locality, which of course was central to Einstein’s biggest work to come in 1915. One day therefore, I believe people will remember Galileo more than Einstein as the father of relativity; beside him Einstein as the father of locality, who at last freed science from the action at a distance notion that had served Newton well, but which had also troubled him so long ago.

Appendix: Proof that the One-Parameter Lie Group Follows from the Continuity / Monotonicity Postulate

Here follows a proof that the form of the transformation group is that stated in \eqref{LieGroup}, i.e. that the continuity of the group operations for a one parameter matrix group implies a one parameter Lie group. This is a very first baby step in the greatly more difficult solution of Hilbert’s famous fifth problem[29]. We first begin with a more precise statement of our group operation continuity and monotonicity postulate suitable for mathematical inferences.

Continuity and Monotonicity of Group Composition Postulate (Version 2): The matrix group of transformations between inertial frames whose relative motion is defined by a velocity in one direction characterizing the group is the continuous image of the real line (the line of generalized velocities) i.e. $\sigma:\mathbb{R}\to\mathcal{M}(N,\,\mathbb{R})$ is the whole group and defines a continuous ($C^0$) path through the set $\mathcal{M}(N,\,\mathbb{R})$ of $N\times N$ real matrices. Furthermore:

  1. We define $\sigma(0)=\mathrm{id}$ and $\sigma(\eta)$ for $\eta>0$ always defines relative motion in the same direction whilst $\eta<0$ always defines motion in the opposite direction ($\eta$ has the same sign as the relative velocity along the chosen direction);
  2. The group operations are continuous so that the function $p:\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ defined by $\sigma(\eta_1)\,\sigma(\eta_2) \stackrel{def}{=}\sigma(p(\eta_1,\,\eta_2))$ is a continuous function of $\eta_1$ and $\eta_2$ and the function $\iota:\mathbb{R}\to\mathbb{R}$ defined by $\sigma(\eta)^{-1} = \sigma(\iota(\eta))$ is also a continuous function of $\eta$;
  3. The mapping $\sigma$ is monotonic, in the sense that if $\sigma(\eta_1)\,\sigma(\eta_2)=\sigma(\eta_3)$ with $\eta_3=p(\eta_1,\,\eta_2)$ and if $\eta_1,\,\eta_2>0$, then $\eta_3>\eta_1$ and $\eta_3>\eta_2$. Likewise, if $\eta_1,\,\eta_2<0$, then $\eta_3<\eta_1$ and $\eta_3<\eta_2$. Intuitively: the composition of two relative motions in the same direction is a “swifter” relative motion in the same direction.

We now work a “trick” that Henry Briggs used to calculate his 1624 Arithmetica Logarithmica tables of logarithms with[30] and which was broadened to closed matrix groups by von Neumann in 1929[31]. We rescale the argument of the function $\sigma:\mathbb{R}\to\mathcal{M}(N,\,\mathbb{R})$ so as to choose a “unit” relative motion to be slow enough that all the transformation matrices in the set $\{\sigma(\eta)|\,\eta\in[0,\,1]\}=\sigma([0,\,1])$, i.e. the continuous path of transformation matrices in the group linking the identity $\sigma(0)=\mathrm{id}$ and $\sigma(1)$, all fulfill the bound $\|\sigma(\eta) - \mathrm{id}\| < 1;\,\eta\in[0,\,1]$. Such a “unit” motion exists by dint of continuity of $\sigma$. The significance of this bound is that, when it is fulfilled, the matrix logarithm Taylor series converges, so that every $\sigma(\eta)$ in the path segment $\sigma([0,\,1])$ has a logarithm defined by $\log(\sigma(\eta)) = K(\eta) = (\sigma(\eta) - \mathrm{id}) -\frac{1}{2} (\sigma(\eta) - \mathrm{id})^2 + \frac{1}{3}(\sigma(\eta) - \mathrm{id})^3 - \cdots$. Now we consider the function $\mathrm{sqr}:\mathbb{R}\to\mathbb{R}$ defined by $\sigma(\eta)\,\sigma(\eta)=\sigma(\mathrm{sqr}(\eta))$. By the group composition continuity postulate, this is a continuous function of a real variable, with $\mathrm{sqr}(0)=0$ and, by the monotonicity axiom in that postulate we see that $\mathrm{sqr}(1)>1$. By the intermediate value theorem, therefore, there is an $\eta_{\frac{1}{2}}\in[0,\,1]$ such that $\sigma(\eta_{\frac{1}{2}})\,\sigma(\eta_{\frac{1}{2}}) = \sigma(1)$. That is, $\sigma(1)$ has a square root in the path segment $\sigma([0,\,1])$.
But now, by dint of the convergence of the logarithm series, every matrix within the ball defined by $\mathcal{V}=\{\gamma|\,\|\gamma-\mathrm{id}\|<1\}$ has a unique square root inside that ball\footnote{Although it may very well have other square roots outside the ball} defined by $\sqrt{\sigma(\eta)} = \exp\left(\frac{1}{2}\log(\sigma(\eta))\right)$ because the logarithm is defined and maps the ball $\mathcal{V}$ into a neighborhood $\mathcal{U}=\log(\mathcal{V})=\{K|\,\exp(K)\in\mathcal{V}\}$ and both the matrix exponential, defined by the universally convergent matrix exponential Taylor series, and the logarithm are bijective maps between $\mathcal{U}$ and $\mathcal{V}$. Therefore, if there were two square roots $\varsigma_1,\,\varsigma_2$ inside the ball $\mathcal{V}$, then both have logarithms so their squares are $\sigma(1)=\exp(2\,\log\varsigma_1) = \exp(2\,\log\varsigma_2)$. Then because $\sigma(1)$ also belongs to $\mathcal{V}$, it has a unique logarithm so that $\log\varsigma_1=\log\varsigma_2$, whence $\varsigma_1=\varsigma_2$. So we now know that the path segment contains a square root $\sigma(\eta_{\frac{1}{2}})$ of $\sigma(1)$ where $\eta_{\frac{1}{2}}\in[0,\,1]$ and that square root must be the unique square root $\exp\left(\frac{1}{2}\,K(1)\right)$, where $K(1)=\log(\sigma(1))$, since the whole path segment $\sigma([0,\,1])$ lies in the ball $\mathcal{V}$.

Now we repeat this trick with $\sigma(\eta_{\frac{1}{2}})=\exp\left(\frac{1}{2}\,K(1)\right)$ to show that there is a $\sigma(\eta_{\frac{1}{4}}) = \exp\left(\frac{1}{4}\,K(1)\right)$ with $\eta_{\frac{1}{4}}\in[0,\,\eta_{\frac{1}{2}}]$. It also follows, by the monotonicity axiom in our group composition continuity postulate, that $\sigma(\eta_{\frac{3}{4}}) = \exp\left(\frac{3}{4}\,K(1)\right)$ with $\eta_{\frac{3}{4}}\in[\eta_{\frac{1}{2}},\,1]$ lies in our path segment. This is because $\sigma(\eta_{\frac{3}{4}}) = \sigma(\eta_{\frac{1}{4}})\,\sigma(\eta_{\frac{1}{2}})$, so $\eta_{\frac{3}{4}}>\eta_{\frac{1}{2}}$, whilst $\sigma(1) = \sigma(\eta_{\frac{1}{4}})\,\sigma(\eta_{\frac{3}{4}})$, so $\eta_{\frac{3}{4}}<1$. Next we repeat this trick for $\sigma(\eta_{\frac{1}{4}})$, thus showing $\{\exp\left(\frac{k}{8} K(1)\right)|\,k\in\{0,\,1,\,\cdots,\,8\}\}\subset\sigma([0,\,1])$. Repeating this process inductively shows that the matrix $\exp\left(n\,2^{-m}\,K(1)\right)$ for any integer $n$ and $m$ belongs to our group, that is, every power $\exp\left(q\,K(1)\right)$ of $\exp\left(K(1)\right)$ lies in the path segment $\sigma([0,\,1])$ where $q$ is a rational number in the interval $[0,\,1]$ with a finite binary expansion. The path segment $\sigma([0,\,1])$ being a closed subset of the set of square matrices, it must contain the closure of the set of transformation matrices we have shown to belong to it, to wit, $\{\exp\left(n\,2^{-m}\,K(1)\right)|\,n,\,m\in\mathbb{N};\,n\,2^{-m}\in[0,\,1]\}$. But the set of numbers of the form $n\,2^{-m}$ is dense in the interval $[0,\,1]$, therefore the path segment contains the path segment $\{e^{\eta\,K(1)}|\,\eta\in[0,\,1]\}$ which also links $\sigma(0)$ and $\sigma(1)$.
It follows that our path segment must be the path segment $\{e^{\eta\,K(1)}|\,\eta\in[0,\,1]\}$ because the function $\eta\mapsto e^{\eta\,K(1)}$ is the unique possible continuous function of one real variable whose values coincide with the values determined above on the dense subset of $\{n\,2^{-m}|\,n,\,m\in\mathbb{Z}\}$ of the real line. Thus we are lead to the conclusion in \eqref{LieGroup} and the understanding that some constant $4\times 4$ matrix $K$ wholly characterizes our transformation group of inertial frames in our World and which group acts linearly on the spacetime co-ordinates, at least if our four postulates hold good.
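The iterated square root construction can be tried out numerically. What follows is a minimal sketch, not a proof: the generator `K` (a single nonzero symmetric pair of entries of size $0.3$), the series truncation lengths and the names `mat_exp`, `mat_log` and `sqrt_in_ball` are all assumptions made purely for illustration. The exponential and the logarithm are summed directly from their series, so the logarithm is only used where $\|\gamma-\mathrm{id}\|<1$.

```python
import numpy as np

def mat_exp(K, terms=40):
    """Matrix exponential summed from its Taylor series (converges for every K)."""
    result = np.eye(K.shape[0])
    term = np.eye(K.shape[0])
    for k in range(1, terms):
        term = term @ K / k          # term now holds K^k / k!
        result = result + term
    return result

def mat_log(G, terms=80):
    """Matrix logarithm from the series sum_k (-1)^(k+1) (G - id)^k / k."""
    X = G - np.eye(G.shape[0])
    # The series is only trusted inside the unit ball about the identity.
    assert np.linalg.norm(X, 2) < 1, "G must lie within unit distance of the identity"
    result = np.zeros_like(X)
    power = np.eye(G.shape[0])
    for k in range(1, terms):
        power = power @ X            # power now holds (G - id)^k
        result = result + (-1) ** (k + 1) * power / k
    return result

def sqrt_in_ball(G):
    """The square root exp(log(G) / 2) lying inside the ball."""
    return mat_exp(0.5 * mat_log(G))

# Illustrative generator (an assumption): mixes the first two co-ordinates.
K = np.zeros((4, 4))
K[0, 1] = K[1, 0] = 0.3
sigma_1 = mat_exp(K)

# Take square roots repeatedly and compare every dyadic power of the 2^m-th
# root with exp(n 2^-m K) computed directly.
root = sigma_1
max_error = 0.0
for m in range(1, 6):
    root = sqrt_in_ball(root)
    for n in range(0, 2 ** m + 1):
        dyadic_power = np.linalg.matrix_power(root, n)
        expected = mat_exp(n * 2.0 ** (-m) * K)
        max_error = max(max_error, np.abs(dyadic_power - expected).max())

print(max_error)
```

In this sketch the printed discrepancy should sit at the level of floating point rounding, in keeping with the claim that the dyadic powers $\exp\left(n\,2^{-m}\,K(1)\right)$ are exactly the matrices reached by repeated square roots and products.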

There are several ways to argue the form of \eqref{LieGroup}; another simple way is to understand that any closed matrix group with elements arbitrarily near to the identity (i.e. there is a member $\gamma(\epsilon)$ with $\|\gamma(\epsilon)-\mathrm{id}\|<\epsilon$ for any $\epsilon>0$) must contain at least one ray of the form $\{\exp(\eta\,K)|\,\eta\in\mathbb{R}\}$ for some square matrix $K$. von Neumann began exploring such thoughts in 1929[31]; a most excellent, very readable, sophomore level account of these ideas is given in Chapter 7 of Stillwell’s book[32].


[1] Albert Einstein, “Zur Elektrodynamik bewegter Körper”, Annalen der Physik, 18, p891, 1905; English translation “On the Electrodynamics of Moving Bodies (1920 Edition)” by Meghnad Saha from Wikisource \url{en.wikisource.org/wiki/On_the_Electrodynamics_of_Moving_Bodies_(1920_edition)}; also English translation in W. Perrett \& G. B. Jeffery (translators), The Principle of Relativity, Dover Books, 1952

[2] Vladimir Ignatowski, “Einige allgemeine Bemerkungen über das Relativitätsprinzip”, Physikalische Zeitschrift, 11, 1910, pp. 972–976; English translation “Some General Remarks on the Relativity Principle”, from Wikisource \url{en.wikisource.org/wiki/Translation:Some_General_Remarks_on_the_Relativity_Principle}

[3] Benjamin Crowell, General Relativity, Light and Matter, 2009, \url{lightandmatter.com}

[4] Stefano Liberati, Sebastiano Sonego \& Matt Visser, “Faster-than-$c$ signals, special relativity and causality”, Annals of Physics 298 pp167-285, \url{arXiv:gr-qc/0107091}

[5] Jean-Marc Lévy-Leblond, “One more derivation of the Lorentz transformation”, American Journal of Physics, 44 (3), pp271-277 1976

[6] Wolfgang Rindler, Essential Relativity, second edition, Springer, 1977 pp51-53

[7] Bernard Schutz, A First Course in General Relativity, Cambridge University Press, 2009; the quote is in \S1.1, “Fundamental principles of special relativity (SR) theory” and the whole of chapter 1 is an excellent introduction to special relativity.

[8] Galileo Galilei, Dialogue Concerning the Two Chief World Systems, translation of Dialogo sopra i due massimi sistemi del mondo (1632) by Stillman Drake, University of California Press, 1953 pp 186 – 187; the Allegory of Salviati’s Ship

[9] Hermann Minkowski, “Raum und Zeit”, Vortrag auf der 80. Naturforscher-Versammlung, Köln, 21. Sept. 1908; Wikisource \url{de.wikisource.org/wiki/Raum_und_Zeit_(Minkowski)}; English translation “Space and Time (1920 Edition)”, address from the $80^{th}$ Assembly of Natural Scientists at Cologne, $21^{st}$ Sept. 1908 by Meghnad Saha, from Wikisource \url{https://en.wikisource.org/wiki/Space_and_Time_(Saha)}; also English translation in W. Perrett \& G. B. Jeffery (translators), “The Principle of Relativity”, Dover Books, 1952

[10] E. Hewitt \& K. R. Stromberg, Real and Abstract Analysis (Graduate Texts in Mathematics), Springer-Verlag, Berlin, 1965. Chapter 1, \S5 constructs all solutions to the Cauchy equation $f:\mathbb{R}\to\mathbb{R};\,f(x+y)=f(x)+f(y)$

[11] B. Rossi \& D. B. Hall, “Variation of the Rate of Decay of Mesotrons with Momentum”, Phys. Rev., 59 (3), 1941, pp 223–228

[12] D. H. Frisch \& J. H. Smith, “Measurement of the Relativistic Time Dilation Using $\mu$-Mesons”, Am. J. Phys. 31 (5), 1963, pp342–355

[13] Greg Egan, The Orthogonal Trilogy: book 1 The Clockwork Rocket, (2011) book 2 The Eternal Flame, (2012) both published Nightshade Books; book three The Arrows of Time, (2013) Orion Publishing Group

[14] Greg Egan, “Plus, Minus: A Gentle Introduction to the Physics of Orthogonal”, Retrieved from Egan’s website $16^{th}$ June, 2015: \url{gregegan.customer.netspace.net.au/ORTHOGONAL/00/PM.html}

[15] Edwin Cartlidge, “Official Word on Superluminal Neutrinos Leaves Warp-Drive Fans a Shred of Hope—Barely”, Science Insider, February 24, 2012, retrieved $16^{th}$ June, 2015: \url{http://news.sciencemag.org/2012/02/official-word-superluminal-neutrinos-leaves-warp-drive-fans-shred-hope%E2%80%94barely}

[16] M. C. Gonzalez-Garcia \& Michele Maltoni, “Phenomenology with Massive Neutrinos”, Physics Reports 460, Issues 1-3, pp1–129, 2008. \url{arxiv.org/abs/0704.1800}

[17] Antonio Accioly, José Helayël-Neto \& Eslley Scatena, “Upper bounds on the photon mass”, Phys. Rev. D 82, p065026, 2010. \url{arxiv.org/abs/1012.2717}

[18] Alfred Scharff Goldhaber \& Michael Martin Nieto “Photon and Graviton Mass Limits” Rev. Mod. Phys. 82, pp939-979, 2010. \url{arxiv.org/abs/0809.1003}

[19] Fermi GBM/LAT Collaborations, “Testing Einstein’s special relativity with Fermi’s short hard gamma-ray burst GRB090510”, Nature 462, pp331-334, 2009. \url{arxiv.org/abs/0908.1832}

[20] S. Herrmann, A. Senger, K. Möhle, M. Nagel, E. V. Kovalchuk \& A. Peters, “Rotating optical cavity experiment testing Lorentz invariance at the $10^{-17}$ level”. Phys. Rev. D 80 (100), p105011, 2008. \url{arxiv.org/abs/1002.1284}

[21] Brian Greene, Randall MacLowry \& Joseph McMaster, The Illusion of Time, Video documentary by WGBH-Boston (TV) of the PBS network. Reference to the historical co-incidence of Einstein and the need to precisely synchronize railway networks begins eight minutes in.

[22] Dava Sobel, Longitude, Bloomsbury, 1995

[23] Peter Galison, “Einstein’s Clocks: The Question of Time”, Critical Inquiry, 26, (2), pp355-389 Winter, 2000. \url{www.jstor.org/stable/1344127}

[24] Llewellyn Thomas, “The Motion of the Spinning Electron”, Letter to the Editor, Nature, $10^{th}$ April 1926

[25] Llewellyn Thomas, “The kinematics of an electron with an axis”, Philosophical Magazine Series 7, 3, (13), pp1-22, 1927. We derive equations 2.2 and 2.3 in Thomas’s paper.

[26] E. P. Wigner, “On unitary representations of the inhomogeneous Lorentz group”, Ann. Math. 40, 1939, pp149-204

[27] Wulf Rossmann, Lie Groups: An Introduction Through Linear Groups, Oxford Graduate Texts in Mathematics, 2006. The Lie Correspondence is proven in detail in §2.5.

[28] Scott Walter, “The Non-Euclidean Style of Minkowskian Relativity”, \url{http://www.fisica.net/relatividade/the_non_euclidean_style_of_minkowskian_relativity_by_scott_walter.pdf}. Summary of the history of Minkowski’s geometrical ideas.

[29] Deane Montgomery \& Leo Zippin, Topological Transformation Groups, Interscience Publishing, 1955. The classic text by the discoverers of the accepted solution to Hilbert’s fifth problem; remarkably readable, although with effort.

[30] Richard P. Feynman, Robert B. Leighton, and Matthew Sands, The Feynman Lectures on Physics, Vol.1 (Addison-Wesley, 1964), Chapter 22, “Algebra”. Also online \url{http://www.feynmanlectures.caltech.edu/I_22.html}. Feynman describes here, amongst other wonderful things, the iterated square root approach Briggs used to calculate his tables.

[31] John von Neumann, “Über die analytischen Eigenschaften von Gruppen linearer Transformationen und ihrer Darstellungen”, Mathematische Zeitschrift, 30, 1929, pp3-42. Here von Neumann began his journey to his breakthrough in Hilbert’s Fifth Problem for compact groups in 1933, and he uses the matrix logarithm to broaden Briggs’s iterated square root method.

[32] John C. Stillwell, Naïve Lie Theory, Springer Science + Business Media, 2008. Chapter 7 discusses the closed matrix Lie group version of these ideas.