<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>John D. Cook</title>
	<atom:link href="http://www.johndcook.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.johndcook.com/blog</link>
	<description>Applied Mathematics Consulting</description>
	<lastBuildDate>Thu, 11 Jun 2026 13:14:26 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://www.johndcook.com/wp-content/uploads/2020/01/cropped-favicon_512-32x32.png</url>
	<title>John D. Cook</title>
	<link>https://www.johndcook.com/blog</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Solving a chess puzzle with Claude and Prolog</title>
		<link>https://www.johndcook.com/blog/2026/06/11/prolog-claude/</link>
					<comments>https://www.johndcook.com/blog/2026/06/11/prolog-claude/#respond</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Thu, 11 Jun 2026 13:14:26 +0000</pubDate>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[Chess]]></category>
		<category><![CDATA[Logic]]></category>
		<category><![CDATA[Programming]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247152</guid>

					<description><![CDATA[<p>Prolog is the original logic programming language. The name comes from programming in logic. More specifically, the name comes from programmation en logique because the inventor of the language, Philippe Roussel, is French. Prolog has its advantages and disadvantages. One of the advantages is that the language represents logical problems directly. One of the disadvantages [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/11/prolog-claude/">Solving a chess puzzle with Claude and Prolog</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>Prolog is the original logic programming language. The name comes from <span style="text-decoration: underline;">pro</span>gramming in <span style="text-decoration: underline;">log</span>ic. More specifically, the name comes from <em><span lang="fr"><span style="text-decoration: underline;">pro</span>grammation en <span style="text-decoration: underline;">log</span>ique</span></em> because the inventor of the language, Philippe Roussel, is French.</p>
<p>Prolog has its advantages and disadvantages. One of the advantages is that the language represents logical problems directly. One of the disadvantages is that the syntax can be quirky. But if an LLM is writing the code, or at least helping to write the code, the syntax doesn&#8217;t matter so much.</p>
<p>I wanted to see how well Claude (Sonnet 4.6, medium effort) could solve a <a href="https://www.johndcook.com/blog/2025/05/14/another-little-chess-puzzle/">chess puzzle</a> by Martin Gardner that I wrote about a little over a year ago. I chose a relatively obscure problem rather than something like the Eight Queens puzzle because an LLM could simply quote one of countless articles on the puzzle.</p>
<h2>The puzzle</h2>
<p>As I stated in the post last year, the task is to place two rooks, two bishops, and two knights on a 4 by 4 chessboard so that no piece attacks any other.</p>
<p><img fetchpriority="high" decoding="async" class="aligncenter size-medium" src="https://www.johndcook.com/crowded5.png" width="297" height="250" /></p>
<p>There are two basic solutions, twelve if you count rotations and reflections as different solutions.</p>
<h2>Prolog results</h2>
<p>Claude wrote an SWI-Prolog program that I ran with</p>
<pre>
swipl -g "run, halt" chess_placement.pl
</pre>
<p>and it gave the following output.</p>
<pre>
Total solutions (raw): 12
Solutions up to D4 symmetry: 2
Solutions up to D4 symmetry: 2

Solution 1:
B . . N
N . . B
. R . .
. . R .

Solution 2:
. B N .
. . . R
R . . .
. N B .
</pre>
<p>Claude made a nice image of the solutions.</p>
<p><img decoding="async" src="https://www.johndcook.com/claude_chess_solution.svg" width="536" height="300" class="aligncenter size-medium" /></p>
<p>Note that you can make eight variations of the first solution by a combination of rotations and flips. There are only four variations of the second solution because two rotations put you back where you started.</p>
<h2>Prolog source</h2>
<p>Here&#8217;s the Prolog code that Claude generated.</p>
<pre>
% ============================================================
%  Non-attacking placement of 2 Rooks, 2 Bishops, 2 Knights
%  on a 4x4 chessboard
%
%  Run with SWI-Prolog:
%    swipl -g "run, halt" chess_placement.pl
%
%  Or interactively:
%    swipl chess_placement.pl
%    ?- run.
%    ?- count_solutions(N).
%    ?- count_unique(N).
% ============================================================

:- use_module(library(lists)).

% ---- Board squares: (Row, Col) with Row, Col in 0..3 ----------------

square(R, C) :- member(R, [0,1,2,3]), member(C, [0,1,2,3]).

% ---- Attack predicates ----------------------------------------------

rook_attacks(R,_,R,_).          % same row
rook_attacks(_,C,_,C).          % same column

bishop_attacks(R1,C1,R2,C2) :-
    DR is abs(R1-R2), DC is abs(C1-C2),
    DR =:= DC, DR > 0.

knight_attacks(R1,C1,R2,C2) :-
    DR is abs(R1-R2), DC is abs(C1-C2),
    ( DR =:= 1, DC =:= 2 ; DR =:= 2, DC =:= 1 ).

% attacks(+Type1, +R1, +C1, +Type2, +R2, +C2)
% True if either piece threatens the other's square.
attacks(rook,   R1,C1, _,     R2,C2) :- rook_attacks(R1,C1,R2,C2).
attacks(_,      R1,C1, rook,  R2,C2) :- rook_attacks(R1,C1,R2,C2).
attacks(bishop, R1,C1, _,     R2,C2) :- bishop_attacks(R1,C1,R2,C2).
attacks(_,      R1,C1, bishop,R2,C2) :- bishop_attacks(R1,C1,R2,C2).
attacks(knight, R1,C1, _,     R2,C2) :- knight_attacks(R1,C1,R2,C2).
attacks(_,      R1,C1, knight,R2,C2) :- knight_attacks(R1,C1,R2,C2).

% ---- All pairs safe -------------------------------------------------

% Check all unordered pairs in a list.
no_attack_pair(_, []).
no_attack_pair(T1-(R1,C1), [T2-(R2,C2)|Rest]) :-
    \+ attacks(T1,R1,C1,T2,R2,C2),
    no_attack_pair(T1-(R1,C1), Rest).

all_pairs_safe([]).
all_pairs_safe([P|Rest]) :-
    no_attack_pair(P, Rest),
    all_pairs_safe(Rest).

% ---- Generate a placement -------------------------------------------
%
%  Placement = [rook-(R1,C1), rook-(R2,C2),
%               bishop-(R3,C3), bishop-(R4,C4),
%               knight-(R5,C5), knight-(R6,C6)]
%
%  We enforce an ordering within each pair to avoid counting
%  same-type duplicates (rook1 vs rook2 swapped).
%  Squares are represented canonically: (R,C) with R*4+C as index.

sq_index(R, C, I) :- I is R*4 + C.

placement(Placement) :-
    % --- Rooks ---
    square(R1r, C1r), square(R2r, C2r),
    sq_index(R1r,C1r,I1r), sq_index(R2r,C2r,I2r),
    I1r < I2r,          % canonical order for the two rooks

    % --- Bishops ---
    square(R1b, C1b), square(R2b, C2b),
    sq_index(R1b,C1b,I1b), sq_index(R2b,C2b,I2b),
    I1b < I2b,          % canonical order for the two bishops

    % --- Knights ---
    square(R1n, C1n), square(R2n, C2n),
    sq_index(R1n,C1n,I1n), sq_index(R2n,C2n,I2n),
    I1n < I2n,          % canonical order for the two knights

    % --- All six squares distinct ---
    Squares = [(R1r,C1r),(R2r,C2r),(R1b,C1b),(R2b,C2b),(R1n,C1n),(R2n,C2n)],
    all_distinct_squares(Squares),

    % --- Build placement list ---
    Placement = [ rook-(R1r,C1r),   rook-(R2r,C2r),
                  bishop-(R1b,C1b), bishop-(R2b,C2b),
                  knight-(R1n,C1n), knight-(R2n,C2n) ],

    % --- No piece attacks any other ---
    all_pairs_safe(Placement).

all_distinct_squares([]).
all_distinct_squares([S|Rest]) :-
    \+ member(S, Rest),
    all_distinct_squares(Rest).

% ---- Count all solutions --------------------------------------------

count_solutions(N) :-
    findall(P, placement(P), Ps),
    length(Ps, N),
    format("Total solutions (raw): ~w~n", [N]).

% ---- D4 symmetry transformations on a 4x4 board ---------------------
%  Squares (R,C) in 0..3

transform(identity, R, C, R,  C).
transform(rot90,    R, C, C,  Nr) :- Nr is 3-R.
transform(rot180,   R, C, Nr, Nc) :- Nr is 3-R, Nc is 3-C.
transform(rot270,   R, C, Nc, R)  :- Nc is 3-C.
transform(flipH,    R, C, R,  Nc) :- Nc is 3-C.
transform(flipV,    R, C, Nr, C)  :- Nr is 3-R.
transform(flipD1,   R, C, C,  R).
transform(flipD2,   R, C, Nr, Nc) :- Nr is 3-C, Nc is 3-R.

apply_transform(_, [], []).
apply_transform(T, [Type-(R,C)|Rest], [Type-(NR,NC)|TRest]) :-
    transform(T, R, C, NR, NC),
    apply_transform(T, Rest, TRest).

% Canonical form: sort pieces within same-type pairs, then sort the
% whole list to get a unique representative.
canonical_placement(Placement, Canonical) :-
    findall(T, member(T,[identity,rot90,rot180,rot270,
                          flipH,flipV,flipD1,flipD2]), Ts),
    maplist(transform_and_sort(Placement), Ts, AllForms),
    msort(AllForms, Sorted),
    Sorted = [Canonical|_].

transform_and_sort(Placement, T, Sorted) :-
    apply_transform(T, Placement, TPl),
    msort(TPl, Sorted).

% ---- Count solutions up to D4 symmetry ------------------------------

count_unique(N) :-
    findall(P, placement(P), Ps),
    maplist(canonical_placement, Ps, Canonicals),
    list_to_set(Canonicals, Unique),
    length(Unique, N),
    format("Solutions up to D4 symmetry: ~w~n", [N]).

% ---- Pretty-print a board -------------------------------------------

print_board(Placement) :-
    forall(member(R, [0,1,2,3]),
           print_row(R, Placement)),
    nl.

print_row(R, Placement) :-
    forall(member(C, [0,1,2,3]),
           print_cell(R, C, Placement)),
    nl.

print_cell(R, C, Placement) :-
    (   member(rook-(R,C),   Placement) -> write('R ')
    ;   member(bishop-(R,C), Placement) -> write('B ')
    ;   member(knight-(R,C), Placement) -> write('N ')
    ;   write('. ')
    ).

% ---- Print all unique solutions -------------------------------------

print_unique_solutions :-
    findall(P, placement(P), Ps),
    maplist(canonical_placement, Ps, Canonicals),
    list_to_set(Canonicals, Unique),
    length(Unique, N),
    format("~nSolutions up to D4 symmetry: ~w~n~n", [N]),
    forall(nth1(I, Unique, Sol),
           ( format("Solution ~w:~n", [I]),
             print_board(Sol) )).

% ---- Top-level entry point ------------------------------------------

run :-
    count_solutions(Raw),
    count_unique(Sym),
    format("~n"),
    print_unique_solutions,
    format("Summary: ~w raw solutions, ~w up to D4 symmetry.~n",
           [Raw, Sym]).
</pre>The post <a href="https://www.johndcook.com/blog/2026/06/11/prolog-claude/">Solving a chess puzzle with Claude and Prolog</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
					<wfw:commentRss>https://www.johndcook.com/blog/2026/06/11/prolog-claude/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Formally proving a calculation with Claude and Lean</title>
		<link>https://www.johndcook.com/blog/2026/06/10/claude-and-lean/</link>
					<comments>https://www.johndcook.com/blog/2026/06/10/claude-and-lean/#respond</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Wed, 10 Jun 2026 23:11:01 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Artificial intelligence]]></category>
		<category><![CDATA[Formal methods]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247144</guid>

					<description><![CDATA[<p>I ran an experiment today to see whether Claude [1] could generate Lean code to prove a calculation at the bottom of this post, six lines of calculus. I started with this prompt This page contains a mathematical proof that a Fourier coefficient, a_n, is given in terms of a Bessel function. The LaTeX source [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/10/claude-and-lean/">Formally proving a calculation with Claude and Lean</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>I ran an experiment today to see whether Claude [1] could generate Lean code to prove a calculation at the bottom of <a href="https://www.johndcook.com/blog/2026/06/06/from-kepler-to-bessel/">this post</a>, six lines of calculus.</p>
<p>I started with this prompt</p>
<p style="padding-left: 40px;">This page contains a mathematical proof that a Fourier coefficient, a_n, is given in terms of a Bessel function. The LaTeX source for the SVG image is contained in the alt tag of the image. Generate a formal proof of the result using Lean.</p>
<p>and give it the URL of the post. It produced a proof without any further input, but the proof wouldn&#8217;t build.</p>
<p>I pasted in the error messages and let Claude try to fix the proof, deliberately not giving it any help other than pasting in error messages. It took eight iterations, but eventually it worked.</p>
<p>There were four places in the proof marked &#8220;sorry,&#8221; which Claude explained as follows.</p>
<p style="padding-left: 40px;"><b>Why the <code>sorry</code>s?</b></p>
<p style="padding-left: 40px;">Real.besselJ definition or a <code>Kepler</code> namespace. The three remaining <code>sorry</code>s correspond to <strong>standard analysis lemmas</strong> (IBP and change-of-variables) that exist in Mathlib under <code>intervalIntegral</code> — the file cites the exact lemma names needed to close each one. This is the accepted style for &#8220;roadmap&#8221; Lean proofs ahead of library coverage.</p>
<p>The full proof is given below.</p>
<pre>/-
  KeplerBessel.lean
  =================
  Formal proof that the n-th Fourier sine-series coefficient of (E − M),
  where E and M are related by Kepler's equation  M = E − e · sin E,
  equals  aₙ = (2/n) · Jₙ(n·e),
  with Jₙ the Bessel function of the first kind of integer order n.

  Mathematical content
  --------------------
  We expand  E(M) − M  in a sine series on [0, π]:

      E(M) − M = Σ_{n=1}^∞  aₙ · sin(n·M)

  The standard Fourier formula gives

      aₙ = (2/π) ∫₀^π (E(M) − M) sin(n·M) dM.

  Integrating by parts (boundary terms vanish because E(0)=0 and E(π)=π):

      aₙ = (2/(nπ)) ∫₀^π (E'(M) − 1) cos(n·M) dM
         = (2/(nπ)) ∫₀^π E'(M) cos(n·M) dM     -- the "−1" term vanishes

  Changing variable M ↦ E via M = E − e·sin E  (so E'(M) dM = dE):

      aₙ = (2/(nπ)) ∫₀^π cos(n·E − n·e·sin E) dE
         = (2/n) · Jₙ(n·e).

  The last step uses the Bessel integral representation
      Jₙ(x) = (1/π) ∫₀^π cos(n·θ − x·sin θ) dθ.
-/

import Mathlib

open Real MeasureTheory intervalIntegral Filter Set

noncomputable section

/-! ---------------------------------------------------------------
    §1  Variables
    --------------------------------------------------------------- -/

variable (e : ℝ) (he : 0 ≤ e) (he1 : e &lt; 1) /-! --------------------------------------------------------------- §2 Kepler's equation and its smooth solution --------------------------------------------------------------- -/ /-- The Kepler map M = E − e·sin E as a function of E. -/ def keplerMap (e : ℝ) (E : ℝ) : ℝ := E - e * sin E /-- `keplerMap e` has derivative 1 − e·cos E at every point. -/ lemma keplerMap_hasDerivAt (e E : ℝ) : HasDerivAt (keplerMap e) (1 - e * cos E) E := -- keplerMap e = fun x =&gt; x - e * sin x, so HasDerivAt follows directly
  -- from sub-rule and const_mul applied to hasDerivAt_sin.
  (hasDerivAt_id E).sub ((hasDerivAt_sin E).const_mul e)

/-- The derivative of `keplerMap e` is positive when e &lt; 1. -/
lemma keplerMap_deriv_pos {e' : ℝ} (he' : 0 ≤ e') (he1' : e' &lt; 1) (E : ℝ) :
    0 &lt; 1 - e' * cos E := by
  have hcos : cos E ≤ 1 := cos_le_one E
  nlinarith [mul_le_of_le_one_right he' hcos]

/-- `keplerMap e` is strictly monotone when e &lt; 1.
    Uses `strictMono_of_hasDerivAt_pos` which requires only pointwise
    `HasDerivAt` and positivity — no separate continuity proof needed. -/
lemma keplerMap_strictMono {e' : ℝ} (he' : 0 ≤ e') (he1' : e' &lt; 1) : StrictMono (keplerMap e') := strictMono_of_hasDerivAt_pos (fun E =&gt; keplerMap_hasDerivAt e' E)
    (fun E =&gt; keplerMap_deriv_pos he' he1' E)

/-!
  We axiomatise the inverse  eccAnom : ℝ → ℝ → ℝ  and its key
  properties, all of which follow from the Inverse Function Theorem
  applied to the smooth, strictly monotone map  keplerMap e.
-/

/-- The eccentric anomaly: the smooth right-inverse of `keplerMap e`. -/
axiom eccAnom (e : ℝ) : ℝ → ℝ

/-- `eccAnom e M` satisfies Kepler's equation. -/
axiom eccAnom_kepler (e M : ℝ) :
    keplerMap e (eccAnom e M) = M

/-- `eccAnom e` is differentiable, derivative = 1/(1 − e·cos(eccAnom e M)). -/
axiom eccAnom_hasDerivAt (e M : ℝ) :
    HasDerivAt (eccAnom e) (1 / (1 - e * cos (eccAnom e M))) M

/-- Boundary value at 0. -/
axiom eccAnom_zero (e : ℝ) : eccAnom e 0 = 0

/-- Boundary value at π. -/
axiom eccAnom_pi (e : ℝ) : eccAnom e π = π

/-! ---------------------------------------------------------------
    §3  Bessel function of the first kind (integer order)

    Defined by the classical integral representation.
    --------------------------------------------------------------- -/

/-- Bessel function J_n(x) via its integral representation. -/
def besselJ (n : ℕ) (x : ℝ) : ℝ :=
  (1 / π) * ∫ θ in (0 : ℝ)..π, cos (↑n * θ - x * sin θ)

/-! ---------------------------------------------------------------
    §4  Fourier coefficient

    Named  keplerFourierCoeff  to avoid clashing with Mathlib's own
    `fourierCoeff` which is defined on  AddCircle.
    --------------------------------------------------------------- -/

/-- The n-th Fourier sine coefficient of  eccAnom e M − M  on [0,π]. -/
def keplerFourierCoeff (e : ℝ) (n : ℕ) : ℝ :=
  (2 / π) * ∫ M in (0 : ℝ)..π,
    (eccAnom e M - M) * sin (↑n * M)

/-! ---------------------------------------------------------------
    §5  Main theorem
    --------------------------------------------------------------- -/

/--
  **Main theorem.** For n ≥ 1, the Fourier sine coefficient of the
  eccentric-anomaly displacement satisfies  aₙ = (2/n) · Jₙ(n·e).
-/
theorem keplerFourierCoeff_eq_besselJ (n : ℕ) (hn : 1 ≤ n) :
    keplerFourierCoeff e n = (2 / (n : ℝ)) * besselJ n (↑n * e) := by
  simp only [keplerFourierCoeff, besselJ]
  -- Goal:
  --   (2/π) · ∫₀^π (E(M)−M)·sin(nM) dM
  --   = (2/n) · (1/π) · ∫₀^π cos(nθ − ne·sinθ) dθ

  -- ── Step 1: Integration by parts ─────────────────────────────────────
  -- u = E(M)−M,  dv = sin(nM)dM  →  v = −cos(nM)/n
  -- Boundary: [uv]₀^π = 0 by eccAnom_zero, eccAnom_pi.
  -- Result: (2/π)∫(E−M)sin(nM)dM = (2/π)(1/n)∫(E'(M)−1)cos(nM)dM
  --
  -- Mathlib lemma: intervalIntegral.integral_mul_deriv
  --   (or integral_deriv_mul_eq_sub_of_hasDerivAt applied to
  --    u = eccAnom e − id,  v = −sin(n··)/n)
  have step1 :
      (2 / π) * ∫ M in (0 : ℝ)..π, (eccAnom e M - M) * sin (↑n * M)
      = (2 / π) * (1 / ↑n) *
          ∫ M in (0 : ℝ)..π, (deriv (eccAnom e) M - 1) * cos (↑n * M) := by
    sorry

  -- ── Step 2: The "−1" integral vanishes ───────────────────────────────
  -- ∫₀^π cos(nM) dM = [sin(nM)/n]₀^π = 0  (integer n ≥ 1)
  -- Mathlib: integral_cos, Real.sin_nat_mul_pi
  have cos_int_zero :
      ∫ M in (0 : ℝ)..π, cos (↑n * M) = 0 := by
    sorry

  have step2 :
      ∫ M in (0 : ℝ)..π, (deriv (eccAnom e) M - 1) * cos (↑n * M)
      = ∫ M in (0 : ℝ)..π, deriv (eccAnom e) M * cos (↑n * M) := by
    have key : ∀ M : ℝ, (deriv (eccAnom e) M - 1) * cos (↑n * M)
                       = deriv (eccAnom e) M * cos (↑n * M) - cos (↑n * M) := by
      intro M; ring
    simp_rw [key]
    rw [intervalIntegral.integral_sub _ _]
    · rw [cos_int_zero, sub_zero]
    · -- IntervalIntegrable (deriv (eccAnom e) · cos(n··))
      sorry
    · exact (continuous_cos.comp (continuous_const.mul continuous_id')).intervalIntegrable 0 π

  -- ── Step 3: Change of variable M ↦ E via Kepler's equation ───────────
  -- Under M = E − e·sin E:  E'(M) dM = dE,  cos(nM) = cos(nE − ne·sinE)
  -- Mathlib: MeasureTheory.integral_image_eq_integral_abs_deriv_smul
  --       or intervalIntegral.integral_comp_deriv
  have step3 :
      ∫ M in (0 : ℝ)..π, deriv (eccAnom e) M * cos (↑n * M)
      = ∫ E in (0 : ℝ)..π, cos (↑n * E - ↑n * e * sin E) := by
    sorry

  -- ── Step 4: Recognise the Bessel integral ────────────────────────────
  -- (1/π)∫₀^π cos(nE − ne·sinE)dE = J_n(ne)  by definition.
  have step4 :
      (1 / π) * ∫ E in (0 : ℝ)..π, cos (↑n * E - ↑n * e * sin E)
      = besselJ n (↑n * e) := by
    simp only [besselJ]

  -- ── Combine all steps ────────────────────────────────────────────────
  have hpi : π &gt; 0 := Real.pi_pos
  have hn_pos : (0 : ℝ) &lt; ↑n := by exact_mod_cast (show 0 &lt; n by omega)
  rw [step1, step2, step3]
  -- Goal: (2/π)·(1/n)·∫cos = (2/n)·(1/π)·∫cos
  -- Both sides equal (2/(nπ))·∫cos; show by ring after moving (1/π) inside.
  have factored : (2 / π) * (1 / ↑n) *
      ∫ E in (0 : ℝ)..π, cos (↑n * E - ↑n * e * sin E)
      = (2 / ↑n) * ((1 / π) *
      ∫ E in (0 : ℝ)..π, cos (↑n * E - ↑n * e * sin E)) := by
    ring
  rw [factored, step4]

end

/-! ---------------------------------------------------------------
    §6  Summary of proof obligations

    Axioms (all standard consequences of the Inverse Function Theorem
    applied to the smooth, strictly increasing map  keplerMap e):
      • eccAnom           — existence of the smooth inverse
      • eccAnom_kepler    — it satisfies Kepler's equation
      • eccAnom_hasDerivAt — C¹ with derivative 1/(1 − e·cos E)
      • eccAnom_zero      — E(0) = 0
      • eccAnom_pi        — E(π) = π

    `sorry`s (each closes with a standard Mathlib lemma):
      • step1        IBP via
                      intervalIntegral.integral_deriv_mul_eq_sub_of_hasDerivAt
      • cos_int_zero  ∫₀^π cos(nM)dM = 0 via
                      integral_cos + Real.sin_nat_mul_pi
      • step2 integrability  IntervalIntegrable for deriv(eccAnom e)·cos(n·)
      • step3        Change of variables via
                      MeasureTheory.integral_image_eq_integral_abs_deriv_smul

    §7  Finding minimal imports

    Once the file builds cleanly, add at the bottom:

        #min_imports

    and the Lean Infoview will report the exact minimal import list
    for the version of Mathlib you have installed.
    --------------------------------------------------------------- -/
</pre>
<p>[1] I used the default setting: Sonnet 4.6, low effort, &#8220;thinking&#8221; turned off.</p>The post <a href="https://www.johndcook.com/blog/2026/06/10/claude-and-lean/">Formally proving a calculation with Claude and Lean</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
					<wfw:commentRss>https://www.johndcook.com/blog/2026/06/10/claude-and-lean/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Pulling on a thread</title>
		<link>https://www.johndcook.com/blog/2026/06/10/pulling-on-a-thread/</link>
					<comments>https://www.johndcook.com/blog/2026/06/10/pulling-on-a-thread/#respond</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Wed, 10 Jun 2026 13:34:31 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247139</guid>

					<description><![CDATA[<p>Often there&#8217;s a thread running through a sequence of my posts. Sometimes I make this explicit and sometimes I don&#8217;t. The latest thread started with this post commenting on a tweet that observed that exp(−x²) ≈ (1 + cos(sin(x) + x))/2. Some people said online that that the approximation is simply due to the first [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/10/pulling-on-a-thread/">Pulling on a thread</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>Often there&#8217;s a thread running through a sequence of my posts. Sometimes I make this explicit and sometimes I don&#8217;t.</p>
<p>The latest thread started with <a href="https://www.johndcook.com/blog/2026/05/31/another-gaussian-approximation/">this post</a> commenting on a tweet that observed that</p>
<p style="padding-left: 40px;">exp(−<em>x</em>²) ≈ (1 + cos(sin(<em>x</em>) + <em>x</em>))/2.</p>
<p>Some people said online that that the approximation is simply due to the first few terms of the Taylor series on both sides matching up, so I wrote a <a href="https://www.johndcook.com/blog/2026/06/01/not-just-taylor-series/">follow up post</a> explaining that it&#8217;s not that simple.</p>
<p>The series for the left hand side alternates and converges very slowly, which lead to the post on <a href="https://www.johndcook.com/blog/2026/06/03/naive-sum/">naively summing</a> an alternating series.</p>
<p>The series for the right hand side lead to this point on <a href="https://www.johndcook.com/blog/2026/06/04/partitions-over-permutations/">partitions over permutations</a>.</p>
<p>Integrating the right hand side lead to <a href="https://www.johndcook.com/blog/2026/06/04/integrating-smooth-periodic-functions/">this post</a> on how the simplest numerical integration rule works shockingly well on some problems.</p>
<p>The exact value of the integral turns out to be given by a <a href="https://www.johndcook.com/blog/2026/06/05/mr-bessels-eponymous-functions/">Bessel function</a>, details given in <a href="https://www.johndcook.com/blog/2026/06/06/from-kepler-to-bessel/">this post</a>.</p>
<p>Mr. Bessel&#8217;s interest in the functions now named after him started with looking closely at a solution to Kepler&#8217;s equation in orbital mechanics. Thinking about Kepler&#8217;s equation lead to the posts on <a href="https://www.johndcook.com/blog/2026/06/07/the-laplace-limit/">the Laplace limit</a> and on <a href="https://www.johndcook.com/blog/2026/06/07/aitkin-acceleration-kepler/">series acceleration</a>.</p>
<p>I may be done pulling on this thread. I don&#8217;t have anything else in mind that I want to explore for now, but you never know.</p>The post <a href="https://www.johndcook.com/blog/2026/06/10/pulling-on-a-thread/">Pulling on a thread</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
					<wfw:commentRss>https://www.johndcook.com/blog/2026/06/10/pulling-on-a-thread/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Aitken acceleration before Aitken</title>
		<link>https://www.johndcook.com/blog/2026/06/07/aitkin-acceleration-kepler/</link>
					<comments>https://www.johndcook.com/blog/2026/06/07/aitkin-acceleration-kepler/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sun, 07 Jun 2026 20:14:45 +0000</pubDate>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[Math]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247135</guid>

					<description><![CDATA[<p>Kepler solved his eponymous equation M = E − e sin(E) by finding a fixed point of E = M + e sin(E). So guess a value of E and stick it into the right hand side. Then plug that value into the right hand side again. Kepler said a couple iterations should be enough. And a couple [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/07/aitkin-acceleration-kepler/">Aitken acceleration before Aitken</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>Kepler solved his eponymous equation</p>
<p style="padding-left: 40px;"><em>M</em> = <em>E</em> − <em>e</em> sin(<em>E</em>)</p>
<p>by finding a fixed point of</p>
<p style="padding-left: 40px;"><em>E</em> = <em>M</em> + <em>e</em> sin(<em>E</em>).</p>
<p>So guess a value of <em>E</em> and stick it into the right hand side. Then plug that value into the right hand side again. Kepler said a couple iterations should be enough. And a couple iterations <em>are</em> enough if the eccentricity <em>e</em> is small and you don&#8217;t need much accuracy.</p>
<p>The rate of convergence is determined by <em>e</em>. Kepler implicitly had in mind small values of <em>e</em> because he wasn&#8217;t aware of anything orbiting the sun in a highly elliptical orbit. Here&#8217;s an example with eccentricity 0.05, about the eccentricity of the orbits of Jupiter and Saturn.</p>
<pre>from math import sin

M, E, e = 1, 1, 0.05
for _ in range(5):
    E = M + e*sin(E)
    residual = M - (E - e*sin(E))
    print(residual)
</pre>
<p>The residual after just two iterations is 2.77 × 10<sup>−5</sup>. If you change <em>e</em> to 0.2, the eccentricity of Mercury&#8217;s orbit, it takes three iterations to get comparable accuracy. Mercury has the most eccentric orbit of any object Kepler would have known about.</p>
<p>Now suppose you&#8217;d like to solve for <em>E</em> when <em>M</em> = 1 for Halley&#8217;s comet, and you&#8217;d like an error of less than 10<sup>−8</sup>. Now you need 16 iterations.</p>
<p>C. F. W. Peters discovered a faster algorithm in 1891.</p>
<p style="padding-left: 40px;"><em>E</em><sub>1</sub> = <i>M</i> + <em>e</em> sin(<em>E</em><sub>0</sub>)<br />
<em>E</em><sub>2</sub> = <i>M</i> + <em>e</em> sin(<em>E</em><sub>1</sub>)<br />
<em>E</em><sub>3</sub> = (<em>E</em><sub>2</sub> <em>E</em><sub>0</sub> − <em>E</em><sub>1</sub>²)/(<em>E</em><sub>2</sub> − 2<em>E</em><sub>1</sub> + <em>E</em><sub>0</sub>)</p>
<p>Let&#8217;s look at the results of doing three iterations of Peters&#8217; method for Halley&#8217;s comet.</p>
<pre>M, E0, e = 1, 1, 0.967
for _ in range(3):
    E1 = M + e*sin(E0)
    E2 = M + e*sin(E1)
    E3 = (E2*E0 - E1**2)/(E2 - 2*E1 + E0)
    residual = M - (E - e*sin(E3))
    print(residual)
    E0 = E3 # for next iteration
</pre>
<p>This gives a residual of −7.23 × 10<sup>−10</sup>. Each iteration of Peters&#8217; method requires a little more than twice as much work as an iteration of Kepler&#8217;s method, but 3 iterations of Peters&#8217; method accomplished more than 16 iterations of Kepler&#8217;s method.</p>
<p>Peters&#8217; algorithm from 1891 was a special case of Alexander Aitken&#8217;s series acceleration method published in 1926.</p>
<h2>Related posts</h2>
<ul>
<li class='link'><a href='https://www.johndcook.com/blog/2019/08/02/aitken-acceleration/'>Aitken acceleration</a></li>
<li class='link'><a href='https://www.johndcook.com/blog/2019/08/01/accelerating-an-alternating-series/'>Euler acceleration</a></li>
<li class='link'><a href='https://www.johndcook.com/blog/2020/08/06/cohen-acceleration/'>Cohen acceleration</a></li>
</ul>The post <a href="https://www.johndcook.com/blog/2026/06/07/aitkin-acceleration-kepler/">Aitken acceleration before Aitken</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
					<wfw:commentRss>https://www.johndcook.com/blog/2026/06/07/aitkin-acceleration-kepler/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>The Laplace limit</title>
		<link>https://www.johndcook.com/blog/2026/06/07/the-laplace-limit/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sun, 07 Jun 2026 19:06:36 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Orbital mechanics]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247128</guid>

					<description><![CDATA[<p>An earlier post discussed how to solve Kepler&#8217;s equation M = E − e sin(E) using a sine series. You could also solve Kepler&#8217;s equation using a power series, which Lagrange did in 1771. Both approaches express E as a function of e and M, but from different perspectives. Bessel thought of his solution as a sum of sines in [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/07/the-laplace-limit/">The Laplace limit</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>An <a href="https://www.johndcook.com/blog/2026/06/06/from-kepler-to-bessel/">earlier post</a> discussed how to solve Kepler&#8217;s equation</p>
<p style="padding-left: 40px;"><em>M</em> = <em>E</em> − <em>e</em> sin(<em>E</em>)</p>
<p>using a sine series. You could also solve Kepler&#8217;s equation using a power series, which Lagrange did in 1771. Both approaches express <em>E</em> as a function of <em>e</em> and <em>M</em>, but from different perspectives. Bessel thought of his solution as a sum of sines in <em>M</em>, with coefficients that depend on <em>e</em>. Lagrange thought of his solution as a power series in <em>e</em> whose coefficients involve sines in <em>M</em>. You can rearrange the terms of either solution into the other.</p>
<p>The most interesting thing about the power series solution, in my opinion, is that it only converges for <em>e</em> less than roughly 2/3 while the sine series solution is valid for all <em>e</em> &lt; 1. In astronomical terms, this means the power series solution works for the orbit of some planets but not others!</p>
<p>In our solar system, the planets all have eccentricity well below 2/3, but not all minor planets do. For example, the orbit of Eris has eccentricity 0.4407 but the orbit of Sedna has eccentricity 0.8549. And in other solar systems there are planets with eccentricity much greater than 2/3.</p>
<h2>The Laplace limit</h2>
<p>The radius of convergence for Lagrange&#8217;s power series solution is called the Laplace limit. Its value is <em>e</em><sub><em>L</em></sub> = 0.6627…. There&#8217;s no obvious reason why there&#8217;s anything special about this value. There&#8217;s no astronomical reason for this value. It&#8217;s an artifact of the power series form of the solution.</p>
<p>If the series works for <em>e</em> = 0.66, you would reasonably think it works for <em>e</em> = 0.67, but that&#8217;s not the case. And if you&#8217;re observant, you might notice that although the series works for <em>e</em> = 0.66, it takes longer to converge than for smaller values of <em>e</em>; the rate of convergence is slowing down, warning you of danger ahead.</p>
<p>The exact value of <em>e</em><sub><em>L</em></sub> is the unique real solution to the equation</p>
<p><img decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/laplace_limit1.svg" alt="x \exp\left(\sqrt{1 + x^2}\right) = 1 + \sqrt{1 + x^2}" width="244" height="36" /></p>
<p>There&#8217;s no obvious reason for this either. It has to do with finding the largest circle that can fit in a lens-shaped region of convergence. More on that <a href="https://mathworld.wolfram.com/LaplaceLimit.html">here</a>.</p>
<p>We can calculate <em>e</em><sub><em>L</em></sub> with the following Python code.</p>
<pre>from math import exp
from scipy.optimize import root_scalar

def f(x):
    t = (1 + x*x)**0.5
    return x*math.exp(t) - 1 - t

sol = root_scalar(f, bracket=[0, 1], method='brentq')
print(sol.root)
</pre>
<p>This prints 0.6627434193491817.</p>
<h2>Series details</h2>
<p>We can use the <a href="https://www.johndcook.com/blog/2020/08/05/lagrange-inversion-formula/">Lagrange inversion formula</a> to find the series, just as Lagrange did two and a half centuries ago.</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/laplace_limit2.svg" alt="E = M+ \sum_{n=1}^{\infty} \frac{e^n}{n!} \frac{d^{\,n-1}}{dM^{\,n-1}} \left(\sin^n M\right)" width="253" height="54" /></p>
<p>The powers of sine can be expanded into the sum of sines of various frequencies and differentiated, leading to the equation</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/laplace_limit3.svg" alt="E = M+ \sum_{n=1}^{\infty} \frac{e^n}{2^{\,n-1}n!} \sum_{k=0}^{\lfloor n/2\rfloor} (-1)^k \binom{n}{k} (n-2k)^{n-1} \sin\!\big((n-2k)M\big)" width="495" height="59" /></p>
<p>&nbsp;</p>The post <a href="https://www.johndcook.com/blog/2026/06/07/the-laplace-limit/">The Laplace limit</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>A crank formula for π</title>
		<link>https://www.johndcook.com/blog/2026/06/07/a-crank-formula-for-%cf%80/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sun, 07 Jun 2026 16:34:26 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247126</guid>

					<description><![CDATA[<p>I ran across a cranky formula for π based on physical constants here and decided to play around with it. The source describes λ as &#8220;wavelength (chosen in the microwave region)&#8221; and I thought perhaps you could chose a value of λ to make the equation work. But as a comment pointed out, the bracketed [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/07/a-crank-formula-for-%cf%80/">A crank formula for π</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>I ran across a cranky formula for π based on physical constants <a href="https://x.com/mathsrick_/status/2063283732535226568?s=20">here</a></p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/crank1.svg" alt="\pi = \left( \frac{E}{\frac{1}{2}mc^{2}} \right)^{1/2} \left[ J_{\lambda} \cdot \lambda^{5} \left( e^{\frac{hc}{\lambda kT}} - 1 \right) \right]" width="278" height="64" /></p>
<p>and decided to play around with it.</p>
<p>The source describes λ as &#8220;wavelength (chosen in the microwave region)&#8221; and I thought perhaps you could chose a value of λ to make the equation work. But as a <a href="https://x.com/QualiaQuanta/status/2063384173318975788">comment</a> pointed out, the bracketed expression is simply 2<em>hc</em>², independent of λ, due to Planck&#8217;s blackbody law. That means we can simplify the expression above to</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/crank2.svg" alt="\pi = 2\sqrt{2} hc \sqrt{\frac{E}{m}}" width="125" height="48" /></p>
<p>Now the values of <em>h</em> and <em>c</em> are known. In fact, they&#8217;re now <strong>exactly</strong> known by definition: other SI units are defined in terms of <em>h</em> and <em>c</em>. The mass of an electron is known to 11 significant figures.</p>
<p>But <em>E</em> in the equation above is &#8220;Total energy of the universe.&#8221; I don&#8217;t even know what that means. Does it refer to the observable universe? Does it include dark energy? Does it include the energy equivalent of mass?</p>
<p>I asked a couple LLMs that the total energy of the universe might mean and what its value might be, and they said something like &#8220;Depends. It might be zero. It might be infinite. But if I had to say, I&#8217;d say around 10<sup>70</sup> Joules.&#8221;</p>
<p>If we solve the equation above for <em>E</em> we get 2.8480347886530404 × 10<sup>19</sup> Joules. I have no idea how to justify that.</p>
<p>The expression for π is not dimensionless. I suppose you could choose some nonstandard units that would make the equation work.</p>
<p>The source I linked to above cites Mathematical Cranks by Underwood Dudley, but I couldn&#8217;t find it in the book.</p>
<h2>Related posts</h2>
<ul>
<li class="link"><a href="https://www.johndcook.com/blog/2018/09/11/koide/">Koide&#8217;s coincidence</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2021/03/31/coulombs-constant/">Coulomb&#8217;s constant</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2025/05/24/dungeons-dragons-and-numbers/">Dungeons, Dragons, and Numbers</a></li>
</ul>The post <a href="https://www.johndcook.com/blog/2026/06/07/a-crank-formula-for-%cf%80/">A crank formula for π</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>From Kepler to Bessel</title>
		<link>https://www.johndcook.com/blog/2026/06/06/from-kepler-to-bessel/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sat, 06 Jun 2026 18:46:02 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Orbital mechanics]]></category>
		<category><![CDATA[Special functions]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247118</guid>

					<description><![CDATA[<p>The previous post very briefly said that the integral representation for Bessel functions was motived by solving Kepler&#8217;s equation. This post will go into more detail. Kepler&#8217;s equation There are multiple ways to describe the position of a planet in an elliptical orbit around a star. For historical reasons, these descriptions have arcane names such [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/06/from-kepler-to-bessel/">From Kepler to Bessel</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>The <a href="https://www.johndcook.com/blog/2026/06/05/mr-bessels-eponymous-functions/">previous post</a> very briefly said that the integral representation for Bessel functions was motived by solving Kepler&#8217;s equation. This post will go into more detail.</p>
<h2>Kepler&#8217;s equation</h2>
<p>There are multiple ways to describe the position of a planet in an elliptical orbit around a star. For historical reasons, these descriptions have arcane names such as mean anomaly, true anomaly, and eccentric anomaly. <a href="https://www.johndcook.com/blog/2022/10/22/orbital-anomalies/">This post</a> explains how these three are related.</p>
<p>For this post, it is enough to say that often you know mean anomaly <em>M</em> and want to know eccentric anomaly <em>E</em>. These are related via Kepler&#8217;s equation</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/kepler_eqn.svg" alt="M = E - e \sin E" width="153" height="13" /><br />
where <em>e</em> is the eccentricity of the orbit. You&#8217;d like to solve for <em>E</em> as a function of <em>M</em> and <em>e</em>, but there&#8217;s no elementary way to do that.</p>
<p>One way to solve Kepler&#8217;s equation is to take a guess at <em>E</em> and plug it into the right hand side of</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/kepler_eqn2.svg" alt="E = M + e \sin E" width="141" height="13" /><br />
to get a new <em>E</em>, and keep iterating until the two sides are closer together. I write more about this <a href="https://www.johndcook.com/blog/2018/12/21/contraction-mapping-theorem/">here</a>.</p>
<p>Another approach to solving Kepler&#8217;s equation is to use Newton&#8217;s method. I write more about that <a href="https://www.johndcook.com/blog/2022/11/01/kepler-newton/">here</a>.</p>
<p>Still another approach is to expand <em>E</em> in a sine series and find the series coefficients. An advantage to this approach is that once you have the coefficients, you have an expression for <em>E</em> as a function of <em>M</em>, and you can plug in more values of <em>M</em> without having to solve Kepler&#8217;s equation for each value of <em>M</em> separately.</p>
<h2>Sine series coefficients</h2>
<p>Kepler&#8217;s equation is easy to solve at <em>E</em> = 0 and at <em>E</em> = π. In both cases, <em>E</em> = <em>M</em>. So the function <em>E</em> − <em>M</em> is zero at both ends of [0, π], which suggests we try to expand <em>E</em> − <em>M</em> in a sine series</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/kepler_eqn3.svg" alt="E - M = \sum_{n=1}^\infty a_n \sin nM" width="179" height="54" /></p>
<p>We then calculate the Fourier coefficients <em>a</em><sub><em>n</em></sub> as usual.</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/kepler_bessel2.svg" alt="\begin{align*} a_n &amp;= \frac{2}{\pi} \int_0^\pi (E-M) \sin(nM) \, dM \\ &amp;= \frac{2}{n \pi} \int_0^\pi (E^\prime - 1) \cos(nM)\, dM \\ &amp;= \frac{2}{n \pi} \int_0^\pi \cos(nM) E^\prime(M) \, dM \\ &amp;= \frac{2}{n \pi} \int_0^\pi \cos\Big(nE - ne\sin(E)\Big) E^\prime(M) \, dM \\ &amp;= \frac{2}{n} \left\{ \frac{1}{\pi} \int_0^\pi \cos\Big(nE - ne \sin(E)\Big)\, dE\right\} \\ &amp;= \frac{2}{n} J_n(ne) \end{align*}" width="353" height="305" /></p>
<p>The second line uses integration by parts. The third line uses Kepler&#8217;s equation. The last line uses the definition of the Bessel functions <em>J</em><sub><em>n</em></sub> given in the previous post.</p>The post <a href="https://www.johndcook.com/blog/2026/06/06/from-kepler-to-bessel/">From Kepler to Bessel</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Mr. Bessel&#8217;s eponymous functions</title>
		<link>https://www.johndcook.com/blog/2026/06/05/mr-bessels-eponymous-functions/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Fri, 05 Jun 2026 12:04:28 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Special functions]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247111</guid>

					<description><![CDATA[<p>Yesterday I wrote a post showing that the trapezoid rule evaluates the integral very efficiently. But how do we know what the exact integral is for comparison? If you ask Mathematica, it will tell you the integral equals −2π J1(1) where J1 is a Bessel function. This may seem like rabbit out of a hat, [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/05/mr-bessels-eponymous-functions/">Mr. Bessel’s eponymous functions</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>Yesterday I wrote a post showing that the trapezoid rule evaluates the integral</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/cossinint.svg" alt="\int_{-\pi}^\pi \cos(\sin(x) + x)\, dx" width="184" height="45" /></p>
<p>very efficiently. But how do we know what the exact integral is for comparison? If you ask Mathematica, it will tell you the integral equals −2π <em>J</em><sub>1</sub>(1) where <em>J</em><sub>1</sub> is a Bessel function. This may seem like rabbit out of a hat, but it&#8217;s actually a simple calculation given the integral definition of Bessel functions:</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/bessel_integral_def.svg" alt="J_n(z) = \frac{1}{\pi}\int_0^\pi \cos(n\theta - z\sin(\theta))\, d\theta" width="289" height="45" /></p>
<p>Since cosine is even, we can write our integral over [−π, π] as twice the integral over [0, π]. Then a change of variables turns this into the definition of <em>J</em><sub><em>n</em></sub>(<em>z</em>) with <em>n</em> = 1 and <em>z</em> = 1.</p>
<p>A deeper question is what have we accomplished by just giving a new name to essentially the same problem we started with. Another question is why in the world are Bessel functions defined as above.</p>
<p>As for what we&#8217;ve accomplished, we&#8217;ve related out integration problem to a very well-studied function. Bessel functions have been studied for two centuries and it&#8217;s easy to find software to evaluate them. Even the usually minimalist command line calculator <code>bc</code> has a function <code>j(x, n)</code> for evaluating <em>J</em><sub><em>n</em></sub>(<em>x</em>) for integer values of <em>n</em>. We could calculate our integral to 50 decimal places as follows.</p>
<pre>~$ bc -l
&gt;&gt;&gt; scale = 50
&gt;&gt;&gt;  -8*a(1)*j(1,1)
-2.76491937476833705153256665538788207487495025542883
</pre>
<p>Note that <code>bc</code> doesn&#8217;t have a value of π built in, but <code>a(x)</code> evaluates the arctangent function, and π = 4 arctan(1).</p>
<p>There are multiple ways of defining Bessel functions. The three main ways would be in terms of their power series, in terms of the differential equations they solve, and in terms of their integral representation. Friedrich Bessel defined what we now call Bessel functions of the first kind, the <em>J</em><sub><em>n</em></sub> functions, in terms of their integral representations.</p>
<p>Why did Bessel care about these integrals? They came out of his calculations in celestial mechanics. One example of this is solving <a href="https://www.johndcook.com/blog/2018/12/21/contraction-mapping-theorem/">Kepler&#8217;s equation</a> with Fourier series; the Fourier coefficients are given by Bessel functions. This is worked out in detail in the <a href="https://www.johndcook.com/blog/2026/06/06/from-kepler-to-bessel/">next post</a>.</p>
<p>Bessel functions had occurred in applications before Mr. Bessel drew attention to them. The functions are named after him because he was the first to systematically study them.</p>
<p>Mathematics is developed inductively but taught deductively. It&#8217;s common for things to be kicked around for years before someone decides they deserve a name and systematic study. See this post on the <a href="https://www.johndcook.com/blog/2010/01/05/how-the-central-limit-theorem-began/">central limit theorem</a> for another example. The CLT is older than the Gaussian distribution, even older than Gauss.</p>
<h2>Related posts</h2>
<ul>
<li class="link"><a href="https://www.johndcook.com/blog/2016/02/17/analyzing-an-fm-signal/">FM radio and Bessel functions</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2021/06/12/vibrating-circular-membranes/">Vibrating circular membranes</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2017/11/04/fourier-bessel-series-and-gibbs-phenomena/">Fourier-Bessel series</a></li>
</ul>The post <a href="https://www.johndcook.com/blog/2026/06/05/mr-bessels-eponymous-functions/">Mr. Bessel’s eponymous functions</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The Latin of Linux</title>
		<link>https://www.johndcook.com/blog/2026/06/04/the-latin-of-linux/</link>
					<comments>https://www.johndcook.com/blog/2026/06/04/the-latin-of-linux/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Thu, 04 Jun 2026 20:12:11 +0000</pubDate>
				<category><![CDATA[Computing]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247093</guid>

					<description><![CDATA[<p>One reason people study Latin is that it is the ancestor of many modern languages. English derives from West Germanic languages, not from Latin, but much of English vocabulary, perhaps as much as 60%, derives from Latin, either directly or indirectly through French. Knowing a bit of Latin makes sense of many things that would [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/04/the-latin-of-linux/">The Latin of Linux</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>One reason people study Latin is that it is the ancestor of many modern languages. English derives from West Germanic languages, not from Latin, but much of English vocabulary, perhaps as much as 60%, derives from Latin, either directly or indirectly through French.</p>
<p>Knowing a bit of Latin makes sense of many things that would otherwise seem completely arbitrary, such as why the symbols for gold, silver, and lead are Au, Ag, and Pb respectively.</p>
<p>Similarly, ed(1) is the Latin of Linux [1]. Many conventions in command line utilities follow conventions that go back to the ed(1) line editor. They may go back even further. Just as Latin didn&#8217;t come out of nowhere, neither did ed(1), but you can&#8217;t go back indefinitely. It&#8217;s convenient to start history somewhere, and this post will start with ed(1) just as much discussion of Western linguistics starts with Latin.</p>
<p>The following are features of ed(1) that live on in sed, awk, grep, vi, perl, bash, etc.</p>
<ol>
<li>Using slashes to delimit regular expressions</li>
<li>Using $ to indicate the end of a line or the end of a file</li>
<li>The pattern of specifying address + action or address range + action</li>
<li>Using regular expressions as address ranges</li>
<li>Using \1, \2, etc to refer to regex captures</li>
<li>Using &amp; to refer to the entire matched text</li>
<li>The g/regexp/command pattern</li>
<li>Using p for printing lines, as in g/re/p</li>
<li>The commands a, c, d, i, j, l, p, q, r, and w in vi</li>
<li>! for shell escape</li>
</ol>
<p>&nbsp;</p>
<p>[1] Because the name &#8220;ed&#8221; is so short, and looks so much like the name Ed, it&#8217;s convenient to use its full Unix name ed(1). The parenthesized number is used to disambiguate different things that have the same name, such as the user command kill(1) and the system call kill(2). There is no ed(2) or any other higher-numbered ed. The number is there to make the name stand out, not to disambiguate anything.</p>The post <a href="https://www.johndcook.com/blog/2026/06/04/the-latin-of-linux/">The Latin of Linux</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
					<wfw:commentRss>https://www.johndcook.com/blog/2026/06/04/the-latin-of-linux/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>Integrating smooth periodic functions</title>
		<link>https://www.johndcook.com/blog/2026/06/04/integrating-smooth-periodic-functions/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Thu, 04 Jun 2026 17:18:40 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Integration]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247102</guid>

					<description><![CDATA[<p>Several posts lately have looked at the function f(x) = cos(sin(x) + x). This post will look at the function from a different angle. It&#8217;s a smooth function with period 2π. For reasons I wrote about here, this means that the trapezoid rule should find its integral very efficiently. In general, the error in the trapezoid [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/04/integrating-smooth-periodic-functions/">Integrating smooth periodic functions</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>Several posts lately have looked at the function</p>
<p style="padding-left: 40px;"><em>f</em>(<em>x</em>) = cos(sin(<em>x</em>) + <em>x</em>).</p>
<p>This post will look at the function from a different angle. It&#8217;s a smooth function with period 2π. For reasons I wrote about <a href="https://www.johndcook.com/blog/2017/03/01/numerically-integrating-periodic-functions/">here</a>, this means that the trapezoid rule should find its integral very efficiently.</p>
<p>In general, the error in the trapezoid rule is on the order of 1/<em>N</em>² where <em>N</em> is the number of integration points. To be more specific, the error in integrating a function <em>f</em> over [<em>a</em>, <em>b</em>] with <em>N</em> points is bounded by</p>
<p style="padding-left: 40px;">(<em>b</em> − <em>a</em>)³ <em>M</em> / 12<em>N</em>²</p>
<p>where <em>M</em> is the maximum absolute value of the second derivative of <em>f</em>. So in our case we should expect the error to be less than 82.67/<em>N</em>². In fact we do <em>much</em> better than that. The error does not decrease quadratically, as it does in general, but exponentially.</p>
<p><img loading="lazy" decoding="async" class="size-medium aligncenter" src="https://www.johndcook.com/periodic_gaussian.png" width="640" height="480" /></p>
<p>With just 16 integration points, we&#8217;ve reached the limit of floating point representation.</p>The post <a href="https://www.johndcook.com/blog/2026/06/04/integrating-smooth-periodic-functions/">Integrating smooth periodic functions</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Partitions over permutations</title>
		<link>https://www.johndcook.com/blog/2026/06/04/partitions-over-permutations/</link>
					<comments>https://www.johndcook.com/blog/2026/06/04/partitions-over-permutations/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Thu, 04 Jun 2026 13:48:56 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Combinatorics]]></category>
		<category><![CDATA[Complex analysis]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247098</guid>

					<description><![CDATA[<p>I was thinking more about the cosine approximation to the Gaussian exp(−z²) ≈ (1 + cos(sin(z) + z))/2 that I wrote about last week. The two expressions above are close along the real axis but not along the imaginary axis. If z = iy, the right side grows much faster than the left, behaving like exp(exp(y)). This led to [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/04/partitions-over-permutations/">Partitions over permutations</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>I was thinking more about the cosine approximation to the Gaussian</p>
<p style="padding-left: 40px;">exp(−<em>z</em>²) ≈ (1 + cos(sin(<em>z</em>) + <em>z</em>))/2</p>
<p>that I wrote about <a href="https://www.johndcook.com/blog/2026/05/31/another-gaussian-approximation/">last</a> <a href="https://www.johndcook.com/blog/2026/06/01/not-just-taylor-series/">week</a>. The two expressions above are close along the real axis but not along the imaginary axis.</p>
<p>If <em>z</em> = <em>iy</em>, the right side grows much faster than the left, behaving like exp(exp(<em>y</em>)).</p>
<p>This led to me looking up the power series for the double exponential function exp(exp(<em>y</em>)). This is an interesting series because the coefficient of <em>x</em><sup><em>n</em></sup> is</p>
<p style="padding-left: 40px;"><em>e</em> <em>B</em><sub><em>n</em></sub> / <em>n</em>!</p>
<p>where <em>B</em><sub><em>n</em></sub> is the <em>n</em>th <a href="https://www.johndcook.com/blog/2018/06/05/bell-numbers/">Bell number</a>, which equals the number of ways to <strong>partition</strong> a set of <em>n</em> labeled items [1]. And of course <em>n</em>! is the number of ways to <strong>permute</strong> a set of <em>n</em> labeled items. So the <em>n</em>th coefficient in the power series for exp(exp(<em>y</em>)) is the ratio of the number of partitions to permutations for a set of <em>n</em> labeled things, multiplied by <em>e</em>.</p>
<p>The number of ways to partition a set of <em>n</em> things grows quickly as <em>n</em> increases, almost as quickly as the number of permutations, and so the series for the double exponential function converges very slowly.</p>
<h2>Computing</h2>
<p>SymPy has a function <code>bell</code> for computing Bell numbers, so you could compute the ratio of partitions to permutations as follows.</p>
<pre>from sympy import bell, factorial
f = lambda n: bell(n)/factorial(n)
</pre>
<p>This returns a number of type <code>sympy.core.numbers.Rational</code> and so the result is exact. You can cast it to float for convenience.</p>
<h2>Asymptotics</h2>
<p>If we look at only the terms in the asymptotic series for log <em>B</em><sub><em>n</em></sub> and log <em>n</em>! that grow with <em>n</em> we have</p>
<p style="padding-left: 40px;">log <em>B</em><sub><em>n</em></sub> ~ <em>n</em> log <em>n</em> − <em>n</em> log log <em>n</em><br />
log <em>n</em>! ~ <em>n</em> log <em>n</em> − ½ log <em>n</em></p>
<p>and so</p>
<p style="padding-left: 40px;">log( <em>B</em><sub><em>n</em></sub> / <em>n</em>! ) ~ ½ log <em>n − n</em> log log <em>n</em></p>
<p>There&#8217;s also an asymptotic series for log( <em>B</em><sub><em>n</em></sub> / <em>n</em>! ) involving the <a href="https://www.johndcook.com/blog/2021/04/23/lambert-w/">Lambert <em>W</em> function</a>:</p>
<p style="padding-left: 40px;">log( <em>B</em><sub><em>n</em></sub> / <em>n</em>! ) ~ <em>n</em>/<em>r</em> − 1 − <em>n</em> log <em>r</em></p>
<p>where <em>r </em>= <em>W</em>(<em>n</em>).</p>
<h2>Related posts</h2>
<ul>
<li class="link"><a href="https://www.johndcook.com/blog/2025/04/16/bell-numbers-2/">Mr. Bell and Bell numbers</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2021/11/14/estimating-the-number-of-integer-partitions/">Estimating partition numbers</a></li>
<li class="link"><a href="https://www.johndcook.com/TwelvefoldWay.pdf">Richard Stanley&#8217;s twelvefold way</a> (pdf)</li>
</ul>
<p>[1] It&#8217;s important that the items are labeled. Partition numbers are the number of partitions of an <em>unlabeled</em> set. Partition numbers are much smaller than Bell numbers.</p>The post <a href="https://www.johndcook.com/blog/2026/06/04/partitions-over-permutations/">Partitions over permutations</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
					<wfw:commentRss>https://www.johndcook.com/blog/2026/06/04/partitions-over-permutations/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Naively summing an alternating series</title>
		<link>https://www.johndcook.com/blog/2026/06/03/naive-sum/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Wed, 03 Jun 2026 15:13:24 +0000</pubDate>
				<category><![CDATA[Computing]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247085</guid>

					<description><![CDATA[<p>Suppose you run across the power series for the exponential function and decide to code it up. Good idea: you&#8217;ll probably learn something, though maybe not what you expect. Maybe you decide a tolerance of 10−12 is good enough, and so you sum the terms until the next term to add is below the tolerance. [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/03/naive-sum/">Naively summing an alternating series</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>Suppose you run across the power series for the exponential function and decide to code it up. Good idea: you&#8217;ll probably learn something, though maybe not what you expect.</p>
<p>Maybe you decide a tolerance of 10<sup>−12</sup> is good enough, and so you sum the terms until the next term to add is below the tolerance.</p>
<pre>from math import factorial, exp

def naive_exp(x):
    tolerance = 1e-12
    s = 0
    n = 0
    while True:
        delta = x**n / factorial(n)
        s += delta
        if abs(delta) &lt; tolerance:
            return s
        n += 1
</pre>
<p>You want to try your program out, so you compute <em>e</em> by calling the function at 1. If you compare this to calling <code>exp(1)</code> you find that you got all the digits correct.</p>
<p>Now you try computing exp(-20). Calling <code>naive_exp(-20)</code> gives</p>
<pre>    5.47893091802112e-10</pre>
<p>but calling <code>exp(-20)</code> gives</p>
<pre>    2.061153622438558e-09</pre>
<p>Don&#8217;t brush things like this as flukes or compiler bugs [1]. This is your golden opportunity to learn something.</p>
<p>Maybe you add a print statement to see the intermediate values of the sum stored in the variable <em>s</em>. If you do, you&#8217;ll see that the partial sums oscillate wildly before settling down.</p>
<p>Maybe that seems wrong, but then you look more carefully at the series. The <em>n</em>th term is <em>x</em><sup><em>n</em></sup>/<em>n</em>!. Since <em>x</em> is negative, the terms alternate in sign. And the absolute values of the term get bigger before they get smaller. When <em>x</em> = −20, each numerator is 20 times larger than the previous, and each denominator is <em>n</em> times larger than the previous. So the terms will get bigger until <em>n</em> &gt; 20. So the wild oscillations are real, not a bug.</p>
<p>The largest partial sum is 21822593.77927747 in absolute value. You know that exp(−20) is a very small number, so there&#8217;s going to have to be a lot of cancellation before the partial sums settle down to a small number. Maybe you&#8217;ve heard that cancellation is where numerical calculations lose precision. If not, now you know!</p>
<p>Look again at the largest partial sum. There are eight figures to the right of the decimal point. The code is printing out results to as much precision as it has, so the error at this point is on the order of 10<sup>−8</sup>. We&#8217;re trying to compute a number on the order of 10<sup>−9</sup>, and if <em>any</em> digits in our result are correct, it would be a coincidence.</p>
<p>If you go back and try your code on <em>x</em> = −22, the result is even worse, giving a negative result for a quantity that for theoretical reasons cannot be negative. But you can see why: you&#8217;re asking the code to compute a number that is closer to zero than the accuracy of the code.</p>
<p>Computers don&#8217;t represent numbers in base 10 internally, but the argument above is sufficient in this case. If you want to dig deeper, look into the <a href="https://www.johndcook.com/blog/2009/04/06/anatomy-of-a-floating-point-number/">anatomy of a floating point number</a>.</p>
<p>There is a simple way around the problem above, but discovering it sooner would short-circuit the learning process. You could calculate exp(−20) as 1/exp(20) and avoid all the cancellation because the series for exp(20) does not alternate.</p>
<p>&nbsp;</p>
<p>[1] Compilers do have bugs occasionally, but it&#8217;s orders of magnitude more likely that something is wrong with your code.</p>The post <a href="https://www.johndcook.com/blog/2026/06/03/naive-sum/">Naively summing an alternating series</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>It&#8217;s not just Taylor series</title>
		<link>https://www.johndcook.com/blog/2026/06/01/not-just-taylor-series/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Mon, 01 Jun 2026 12:42:12 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Complex analysis]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247071</guid>

					<description><![CDATA[<p>There is still active discussion on X about the approximation exp(−x²) ≈ (1 + cos(sin(x) + x))/2 and some are saying this can just be explained by Taylor series: the series for the two sides differ for the first time at the x6 term, so that&#8217;s why you get a good approximation. As I wrote [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/01/not-just-taylor-series/">It’s not just Taylor series</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>There is still active discussion on X about the approximation</p>
<p style="padding-left: 40px;">exp(−<em>x</em>²) ≈ (1 + cos(sin(<em>x</em>) + <em>x</em>))/2</p>
<p>and some are saying this can just be explained by Taylor series: the series for the two sides differ for the first time at the <em>x</em><sup>6</sup> term, so that&#8217;s why you get a good approximation. As I wrote <a href="https://www.johndcook.com/blog/2026/05/31/another-gaussian-approximation/">yesterday</a>, that&#8217;s only part of it.</p>
<p>If it were just about Taylor series you could use</p>
<p style="padding-left: 40px;">exp(−<em>x</em>²) ≈ 1 − <em>x</em>² + <em>x</em><sup>4</sup>/2</p>
<p>which also has error <em>O</em>(<em>x</em><sup>6</sup>). But this approximation is only good for fairly small <em>x</em>, say in [−0.5, 0.5], whereas the approximation at the top of the post is good over [−4, 4]. When <em>x</em> = 4, the error in the cosine approximation is 0.002579 but the error in the Taylor approximation is 113, five orders of magnitude larger.</p>
<p>If the accuracy of the cosine approximation were due to Taylor series, then we&#8217;d expect the function</p>
<p style="padding-left: 40px;">exp(−<em>x</em>²) − (1 + cos(sin(<em>x</em>) + <em>x</em>))/2</p>
<p>to be small not just over the interval [−4, 4] but over a disk of radius 4 in the complex plane. But it&#8217;s not. When <em>x</em> = 4<em>i</em> the function is on the order of 10<sup>13</sup>.</p>
<p>Both the cosine approximation and the Taylor approximation are good for small disks, say of radius 0.5. They&#8217;re both bad for much larger disks, and in fact the cosine approximation is worse.</p>
<p>&nbsp;</p>The post <a href="https://www.johndcook.com/blog/2026/06/01/not-just-taylor-series/">It’s not just Taylor series</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Subscribe by email</title>
		<link>https://www.johndcook.com/blog/2026/06/01/subscribe-by-email/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Mon, 01 Jun 2026 11:01:26 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247067</guid>

					<description><![CDATA[<p>Readers have subscribed to this blog via email almost from its beginning in 2008, but how they have subscribed has changed several times. I&#8217;ve used several services to provide email subscription that have come and gone. For the past two years I&#8217;ve been using Substack to send out emails announcing new blog posts. That has [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/06/01/subscribe-by-email/">Subscribe by email</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>Readers have subscribed to this blog via email almost from its beginning in 2008, but <em>how</em> they have subscribed has changed several times. I&#8217;ve used several services to provide email subscription that have come and gone.</p>
<p>For the past two years I&#8217;ve been using Substack to send out emails announcing new blog posts. That has worked out well. Substack delivers email reliably, and that&#8217;s all I wanted. I&#8217;m not active on Substack other than using it to an email. I give a brief introduction to the latest two or three blog posts in each email, and sometimes I include additional ideas that occurred to me later.</p>
<p>If you&#8217;d like to get blog post announcements and a little extra commentary via email, sign up for my free Substack newsletter <a href="https://johndcookconsulting.substack.com/">here</a>.</p>
<p>If you&#8217;d like to learn about new posts sooner, you can subscribe via <a href="https://www.johndcook.com/blog/rss-feed/">RSS</a> or follow me on <a href="https://www.johndcook.com/blog/twitter_page/">X</a> or <a href="https://mathstodon.xyz/@johndcook">Mastodon</a>.</p>The post <a href="https://www.johndcook.com/blog/2026/06/01/subscribe-by-email/">Subscribe by email</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Another Gaussian approximation</title>
		<link>https://www.johndcook.com/blog/2026/05/31/another-gaussian-approximation/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sun, 31 May 2026 16:15:49 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247061</guid>

					<description><![CDATA[<p>The function (1 + cos(x))/2 gives a fair approximation to the Gaussian density exp(−x²) You can make the approximation much better by raising it to a power. The function ((1 + cos(x))/2)4 gives a good lower bound and ((1 + cos(x))/2)3.5597 gives a good upper bound. More on that here. There are other ways of [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/31/another-gaussian-approximation/">Another Gaussian approximation</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>The function</p>
<p style="padding-left: 40px;">(1 + cos(<em>x</em>))/2</p>
<p>gives a fair approximation to the Gaussian density</p>
<p style="padding-left: 40px;">exp(−<em>x</em>²)</p>
<p>You can make the approximation much better by raising it to a power. The function</p>
<p style="padding-left: 40px;">((1 + cos(<em>x</em>))/2)<sup>4</sup></p>
<p>gives a good lower bound and</p>
<p style="padding-left: 40px;">((1 + cos(<em>x</em>))/2)<sup>3.5597</sup></p>
<p>gives a good upper bound. More on that <a href="https://www.johndcook.com/blog/2023/07/05/normal-approximation/">here</a>.</p>
<p>There are other ways of improving the cosine approximation to the Gaussian. <a href="https://x.com/1strawberry_jam/status/2060730093404856692">Yesterday</a> I came across one I hadn&#8217;t seen before, adding a sin(<em>x</em>) term to <em>x</em>.</p>
<p style="padding-left: 40px;">(1 + cos(sin(<em>x</em>) + <em>x</em>))/2</p>
<p>This function matches the first few terms of the power series for exp(−<em>x</em>²) and has an error on the order of <em>x</em><sup>6</sup>/240. You can&#8217;t see the difference between the two functions in a plot for −4 ≤ <em>x</em> ≤ 4.</p>
<p style="text-align: center;">***</p>
<p>There&#8217;s a tension between the previous two statements. If the error in on the order of <em>x</em><sup>6</sup>/240 then we&#8217;d expect the error to be huge at <em>x</em> = 4. We have</p>
<p style="padding-left: 40px;">4<sup>6</sup>/240 = 17.07</p>
<p>and yet</p>
<p style="padding-left: 40px;">exp(−4²) − ((1 + cos(4 + sin(4)))/2) = −0.002579,</p>
<p>i.e. the error is between 3 and 4 orders of magnitude smaller than we might expect.</p>
<p>We have an alternating series, so the truncation error should be roughly equal to the first term after the truncation, right? No, the alternating series theorem doesn&#8217;t apply because the absolute values of the terms in the series are not decreasing yet for <em>x</em> = 4. The terms have to decrease eventually because the series has infinite radius of convergence, but they&#8217;re not decreasing at the 6th term; the terms will get much larger in absolute value before they get smaller.</p>
<p>The basic alternating series theorem gives only an upper bound on truncation error, but there are extensions that also give a lower bound. I wrote about these <a href="https://www.johndcook.com/blog/2026/03/17/alternating-series-remainder/">extensions</a> a few weeks ago. But they don&#8217;t apply here because the terms have not started decreasing in absolute value.</p>
<p><strong>Update</strong>: See further discussion in the post <a href="https://www.johndcook.com/blog/2026/06/01/not-just-taylor-series/">It&#8217;s not just Taylor series</a>.</p>The post <a href="https://www.johndcook.com/blog/2026/05/31/another-gaussian-approximation/">Another Gaussian approximation</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Spot checking polynomial identities</title>
		<link>https://www.johndcook.com/blog/2026/05/30/schwartz-zippel/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sat, 30 May 2026 21:06:49 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247056</guid>

					<description><![CDATA[<p>If a polynomial identity holds at a few random points, it&#8217;s very like true. We&#8217;ll make this statement more precise, but first let&#8217;s look at some applications. You may want to test an identity that naturally presents itself as a statement that two polynomials are equal. Or you might use something like the binomial coefficient [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/30/schwartz-zippel/">Spot checking polynomial identities</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>If a polynomial identity holds at a few random points, it&#8217;s very like true. We&#8217;ll make this statement more precise, but first let&#8217;s look at some applications.</p>
<p>You may want to test an identity that naturally presents itself as a statement that two polynomials are equal. Or you might use something like <a href="https://www.johndcook.com/blog/2012/07/21/binomial-coefficient-trick/">the binomial coefficient trick</a> to reframe a problem that isn&#8217;t obviously an identity about polynomials. And with algebraic circuits, you can reformulate a wide range of computations as polynomial identities; this is widely used in zero-knowledge proofs.</p>
<p>The theorem alluded to at the top of the post is the Schwartz-Zippel lemma. It is formulated in terms of the probability of a non-zero polynomial <em>P</em> evaluating to zero. To prove that two polynomials <em>Q</em><sub>1</sub> and <em>Q</em><sub>2</sub> are equal, you can show that</p>
<p style="padding-left: 40px;"><em>P</em> = Q<sub>1</sub>(<em>x</em>) − <em>Q</em><sub>2</sub>(<em>x</em>) = 0.</p>
<h2>Schwartz-Zippel lemma</h2>
<p>Let <em>F</em> be a (typically large) finite field and let <em>P</em> be a non-zero polynomial in <em>n</em> variables</p>
<p style="padding-left: 40px;"><em>P</em>(<em>x</em><sub>1</sub>, <em>x</em><sub>2</sub>, <em>x</em><sub>3</sub>, …, <em>x</em><sub><em>n</em></sub>)</p>
<p>of total degree <em>d</em>. If we choose the <em>x</em>&#8216;s randomly from <em>F</em> then the probability that <em>P</em> evaluates to zero is no more than <em>d</em>/|<em>F</em>|. [1]</p>
<p>If the total degree <em>d</em> is small relative to the size of the field, then the probability of <em>P</em> evaluating to zero is small. As long as <em>d</em> is less than |<em>F</em>|, you can evaluate the polynomial <em>k</em> times to make</p>
<p style="padding-left: 40px;">(<em>d</em> / |<em>F</em>|)<sup><em>k</em></sup></p>
<p>as small as you&#8217;d like. If <em>d</em> isn&#8217;t too large, and <em>F</em> is large, like the integers mod <em>p</em> = 2<sup>255</sup> − 19 used in cryptography, one polynomial evaluation might be enough to give convincing evidence that the polynomial is zero.</p>
<p>&nbsp;</p>
<p>[1] The Schwartz-Zippel lemma in its full generality applies to polynomials over an integral domain <em>R</em> with variables drawn from <em>S</em>, a finite subset of <em>R</em>. Here we&#8217;re setting <em>R</em> = <em>S</em> = <em>F</em>.</p>The post <a href="https://www.johndcook.com/blog/2026/05/30/schwartz-zippel/">Spot checking polynomial identities</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Online (one-pass) algorithms</title>
		<link>https://www.johndcook.com/blog/2026/05/29/online-one-pass-algorithms/</link>
					<comments>https://www.johndcook.com/blog/2026/05/29/online-one-pass-algorithms/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Fri, 29 May 2026 12:24:30 +0000</pubDate>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Probability and Statistics]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247047</guid>

					<description><![CDATA[<p>Canonical example The sample variance of a set of numbers is defined in terms of the sum of the squared distances from each point to the mean. So it would seem that you first need to calculate the mean, then go back and compute the squared differences from the mean. And yet sample variance can [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/29/online-one-pass-algorithms/">Online (one-pass) algorithms</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<h2>Canonical example</h2>
<p>The sample variance of a set of numbers is defined in terms of the sum of the squared distances from each point to the mean.</p>
<p><img decoding="async" class="aligncenter" style="background-color: white;" src="//www.johndcook.com/samplevariance.svg" alt="s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i -\bar{x})^2" width="200" /></p>
<p>So it would seem that you first need to calculate the mean, then go back and compute the squared differences from the mean. And yet sample variance can be computed in one pass through the data.</p>
<p>You&#8217;ll find two equivalent equations in statistics books: the one described above and another based on the sum of the data points and the sum of the data points squared.</p>
<p><img decoding="async" class="aligncenter" style="background-color: white;" src="//www.johndcook.com/variance2.svg" alt="s^2 = \frac{1}{n(n-1)}\left(n\sum_{i=1}^n x_i^2 -\left(\sum_{i=1}^n x_i\right)^2\right)" width="307" /></p>
<p>While this equation is theoretically correct, it is numerically unstable. Code that directly implements this equation can return a negative value for a quantity that is theoretically positive. I&#8217;ve seen this happen with real data, causing a program to crash when taking the square root of the variance to get the standard deviation.</p>
<p>However, there is an algorithm that computes mean and variance in one pass that is accurate and numerically stable. This algorithm was developed by B. P. Welford in 1962. I discuss Welford&#8217;s algorithm and give code for implementing it <a href="https://www.johndcook.com/blog/standard_deviation/">here</a>.</p>
<h2>Online algorithms</h2>
<p>Welford&#8217;s algorithm is known in computer science as an &#8220;online&#8221; algorithm. This term was coined well before the Internet. For example, see the paper [1] from 1965.</p>
<p>But of course now &#8220;online&#8221; means something else, and so the technical and colloquial uses of &#8220;online algorithm&#8221; have split. Technical literature uses the phrase to describe the kinds of algorithms in this post. Most people would take &#8220;online algorithm&#8221; to mean code that runs on a remote server. You may see &#8220;streaming algorithm&#8221; as a contemporary technical term, but I&#8217;d still search on &#8220;online algorithm&#8221; to find papers.</p>
<h2>Computing higher moments online</h2>
<p>Welford&#8217;s algorithm computes the first two moments, mean and variance, of a data set online. It is also possible to <a href="https://www.johndcook.com/blog/skewness_kurtosis/">compute skewness and kurtosis online</a>, as well as higher moments.</p>
<h2>Online regression</h2>
<p>Simple linear regression is closely related to calculating mean and variance, and it is possible to compute simple regression coefficients online. I have some old notes on this <a href="https://www.johndcook.com/running_regression.html">here</a>.</p>
<p>This post was motivated by an email asking me about multiple regression. It is also possible to compute multiple regression coefficients online, but I haven&#8217;t done this. I found a couple references, [2] and [3], but I have not read them. There is a simple procedure for two predictor variables but I believe things get a little more complicated with three or more predictors, requiring a recursive least squares algorithm.</p>
<h2>Related posts</h2>
<p>The notion of online algorithms is closely related to the notion of a fold in functional programming. Here are several posts on computing things with folds.</p>
<ul>
<li class="link"><a href="https://www.johndcook.com/blog/2016/06/09/insertion-sort-as-a-fold/">Insertion sort as a fold</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2016/07/14/kalman-filters-and-functional-programming/">Kalman filters as folds</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2016/06/08/computing-higher-moments-with-a-fold/">Computing higher moments with a fold</a></li>
</ul>
<p>[1] One-Tape, Off-Line Turing Machine Computations by F. C. Hennie. Information and Control. 8, 553-578 (1965). Available <a href="https://cse.iitkgp.ac.in/~goutam/toc/readingMaterial/one-tape-off-line.pdf">here</a>. In this paper Hennie writes &#8220;In an <em>on-line computation</em> the input data are supplied to the machine, one symbol at a time, at a special input terminal. … In an <em>off-line computation</em> all of the input symbols are written on one of the machine&#8217;s tapes prior to the start of the computation.</p>
<p>[2] Arthur Albert and Robert W. Sittler, “A Method for Computing Least Squares Estimators that Keep Up with the Data,” Journal of the Society for Industrial and Applied Mathematics, Series A: Control, 3(3), 384–417, 1965. DOI: 10.1137/0303026.</p>
<p>[3] Petre Stoica and Per Ashgren. Exact initialization of the recursive least-squares algorithm. Int. J. Adapt. Control Signal Process. 2002; 16:219&amp;ndashh;230.</p>The post <a href="https://www.johndcook.com/blog/2026/05/29/online-one-pass-algorithms/">Online (one-pass) algorithms</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
					<wfw:commentRss>https://www.johndcook.com/blog/2026/05/29/online-one-pass-algorithms/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Turning K-L divergence into a metric</title>
		<link>https://www.johndcook.com/blog/2026/05/27/jensen-shannon/</link>
					<comments>https://www.johndcook.com/blog/2026/05/27/jensen-shannon/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Thu, 28 May 2026 01:35:54 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Information theory]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247046</guid>

					<description><![CDATA[<p>Kullback-Leibler divergence Kullback-Leibler divergence is defined for two random variables X and Y by K-L divergence is non-negative, and it&#8217;s zero if and only if X and Y have the same distribution. But it is not a metric, for reasons explained here. For one thing, it&#8217;s not symmetric. Jeffreys divergence We can fix the symmetry problem by [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/27/jensen-shannon/">Turning K-L divergence into a metric</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<h2>Kullback-Leibler divergence</h2>
<p>Kullback-Leibler divergence is defined for two random variables <em>X</em> and <em>Y</em> by</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/kl_def2.svg" alt="D_{KL}(X || Y) = -\int f_X(x) \log \frac{f_Y(x)}{f_X(x)} \, dx" width="304" height="46" /></p>
<p>K-L divergence is non-negative, and it&#8217;s zero if and only if <em>X</em> and <em>Y</em> have the same distribution. But it is not a metric, for reasons explained <a href="https://www.johndcook.com/blog/2017/11/08/why-is-kullback-leibler-divergence-not-a-distance/">here</a>. For one thing, it&#8217;s not symmetric.</p>
<h2>Jeffreys divergence</h2>
<p>We can fix the symmetry problem by defining</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/jeffreys_divergence.svg" alt="J(X, Y) = D_{KL}(X || Y)  + D_{KL}(Y || X)" width="281" height="18" /></p>
<p>The <em>J</em> above stands for Jeffreys, for Harold Jeffreys. <em>J</em> is called either the symmetrized K-L divergence or Jeffreys&#8217; divergence. It&#8217;s still a divergence, not a distance.</p>
<p>A distance (metric) <em>d</em> has to have four properties:</p>
<ol>
<li><em>d</em>(<em>x</em>, <em>x</em>) = 0</li>
<li><em>d</em>(<em>x</em>, <em>y</em>) &gt; 0 if <em>x</em> ≠ <em>y</em></li>
<li><em>d</em>(<em>x</em>, <em>y</em>) = <em>d</em>(<em>y</em>, <em>x</em>)</li>
<li><em>d</em>(<em>x</em>, <em>z</em>) ≤ <em>d</em>(<em>x</em>, <em>y</em>) + <em>d</em>(<em>y</em>, <em>z</em>)</li>
</ol>
<p>K-L divergence satisfies the first two properties. Jeffreys&#8217; divergence satisfies the first three, but not the last one, the triangle inequality.</p>
<p>To show that <em>J</em> doesn&#8217;t satisfy the triangle inequality, let <em>X</em>, <em>Y</em>, and <em>Z</em> be Bernoulli random variables with <em>p</em> equal to 0.1, 0.2, and 0.3 respectively. Then the following Python code shows that the divergence from <em>X</em> to <em>Y</em>, plus the divergence from <em>Y</em> to <em>Z</em>, is less than the divergence from <em>X</em> to <em>Z</em>. This would be like saying you could get from LA to NYC faster by having a layover in Denver rather than taking a direct flight.</p>
<pre>from math import log

kl = lambda p, q: p*log(p/q) + (1-p)*log((1-p)/(1-q))
j  = lambda p, q: kl(p, q) + kl(q, p)

a = j(0.1, 0.2)
b = j(0.2, 0.3)
c = j(0.1, 0.3)
print(a + b, c)
</pre>
<p>This prints 0.135 and 0.270.</p>
<h2>Jensen-Shannon distance</h2>
<p>Jensen-Shannon distance turns K-L divergence into a metric as follows. First, define the random variable <em>M</em> to be the average of <em>X</em> and <em>Y</em>. Then average the K-L divergence from <em>M</em> to each of <em>X</em> and <em>Y</em>. This defines the Jensen-Shannon <strong>divergence</strong>. It&#8217;s still not a metric, but its square root is, which defines the Jensen-Shannon <strong>distance</strong><em>.</em></p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/jensen_shannon.svg" alt="\begin{align*} M &amp;= (X + Y)/2 \\ \text{JSD}(X, Y) &amp;= \tfrac{1}{2}D_{KL}(X || M) + \tfrac{1}{2}D_{KL}(Y || M) \\ d_{JS}(X, Y) &amp;= \sqrt{\text{JSD}(X, Y)} \end{align*}" width="331" height="85" /></p>
<p>The following code gives an example of Jensen-Shannon distance satisfying the triangle inequality.</p>
<pre>def d(p, q):
    m = 0.5*(p + q)
    jsd = 0.5*kl(p, m) + 0.5*kl(q, m) 
    return jsd**0.5

a = d(0.1, 0.2)
b = d(0.2, 0.3)
c = d(0.1, 0.3)
print(a + b, c)
</pre>
<p>This prints 0.1817 and 0.1801. Now a layover makes the trip longer.</p>The post <a href="https://www.johndcook.com/blog/2026/05/27/jensen-shannon/">Turning K-L divergence into a metric</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
					<wfw:commentRss>https://www.johndcook.com/blog/2026/05/27/jensen-shannon/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>The Meta logo and fitting Besace curves</title>
		<link>https://www.johndcook.com/blog/2026/05/27/the-meta-logo-and-fitting-besace-curves/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Wed, 27 May 2026 15:15:51 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247045</guid>

					<description><![CDATA[<p>I saw a post yesterday saying that the Meta logo is a Besace curve. A Besace curve has the implicit form and the parametric form where t ranges over [0, 2π]. So given a Besace curve, such as the Meta logo, how do you find the parameters a and b to fit the curve? We can rewrite [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/27/the-meta-logo-and-fitting-besace-curves/">The Meta logo and fitting Besace curves</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>I saw a <a href="https://mathstodon.xyz/@tpfto/116643531890145672">post</a> yesterday saying that the Meta logo is a Besace curve.</p>
<p><img loading="lazy" decoding="async" class="size-medium aligncenter" src="https://www.johndcook.com/meta_logo2.png" alt="Meta logo" width="318" height="88" /></p>
<p>A Besace curve has the implicit form</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/besace1.svg" alt="(x^2 - by)^2 = a^2(x^2 - y^2)" width="198" height="22" /></p>
<p>and the parametric form</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/besace2.svg" alt="\begin{align*} x &amp;= a\cos(t) - b \sin(t) \\ y &amp;= -\sin(t) x\end{align*}" width="177" height="47" /></p>
<p>where <em>t</em> ranges over [0, 2π].</p>
<p>So given a Besace curve, such as the Meta logo, how do you find the parameters <em>a</em> and <em>b</em> to fit the curve?</p>
<p>We can rewrite the parametric expression for <em>x</em> as a sine with a phase shift (see notes <a href="https://www.johndcook.com/blog/2024/11/12/sin-cos-phase/">here</a>)</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/besace3.svg" alt="x = A \sin(t + \phi)" width="128" height="18" /></p>
<p>where</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/besace4x.svg" alt="\begin{align*} A &amp;= \sqrt{a^2 + b^2} \\ \phi &amp;= -\arctan(a/b)\end{align*}" width="145" height="55" /></p>
<p>Also, we can rewrite the parametric expression for <em>y</em> as</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/besace5.svg" alt="\begin{align*} y &amp;= A \sin(t) \sin(t + \phi) \\ &amp;= \frac{A}{2} \left(\cos(\phi) - \cos(2t + \phi)\right) \\ \end{align*}" width="237" height="68" /></p>
<p>Now the extreme values of <em>x</em> and <em>y</em> are easier to see. The maximum value of <em>x</em> is <em>A</em> and the minimum value is −<em>A</em>. The maximum value of <em>y</em> is <em>A</em>(cos(φ) + 1)/2 and the minimum value is <em>A</em>(cos(φ) − 1)/2.</p>
<p>We can simplify the cosine of an arctangent (see <a href="https://www.johndcook.com/blog/2026/02/25/trig-of-inverse-trig/">here</a>) to find the height, i.e. the difference between the maximum and minimum <em>y</em> value, in terms of <em>a</em> and <em>b</em>.</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/besace6x.svg" alt="\begin{align*} \cos(\phi) &amp;= \cos(-\arctan(a/b)) \\ &amp;= \frac{1}{\sqrt{1 + (a/b)^2}} \\ &amp;= \frac{b}{\sqrt{a^2 + b^2}} \end{align*}" width="228" height="127" /></p>
<p>Then the height is given by</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/besace7y.svg" alt="\begin{align*} h &amp;= \frac{A}{2}(\cos(\phi) + 1) - \frac{A}{2}(\cos(\phi) - 1) \\ &amp;= A \cos(\phi) + A \\ &amp;= b + \sqrt{a^2 + b^2} \end{align*}" width="291" height="102" /></p>
<p>The width is given by</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/besace8.svg" alt="w = 2A = 2\sqrt{a^2 + b^2}" width="165" height="24" /></p>
<p>and so</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/besace9x.svg" alt="b = h - w/2" width="99" height="17" /></p>
<p>and</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/besace10x.svg" alt="a = \pm \sqrt{\frac{w^2}{4} - b^2}" width="131" height="48" /></p>
<p>Now the Meta logo is drawn with a thick line, and the line width isn&#8217;t constant. It&#8217;s a little fuzzy what the height and width of the middle of the curve are, but I estimated h = 120 and w = 200 from one image. This leads to <em>b</em> = 20 and <em>a</em> = 97.98.</p>
<p>The Mathematica code</p>
<pre>ParametricPlot[{a Cos[t] + 
   b Sin[t], -Sin[t] ( a Cos[t] + b Sin[t])}, {t, 0, 2 Pi}, 
 PlotStyle -&gt; Thickness[0.05]]
</pre>
<p>produces the following image.</p>
<p><img loading="lazy" decoding="async" class="size-medium aligncenter" src="https://www.johndcook.com/meta_mathematica.png" alt="Mathematica approximation of Meta logo" width="480" height="241" /></p>
<p>This is reminiscent of the Meta logo, but not a great match. I suspect the logo is not exactly a Besace curve. You could tinker with the <em>a</em> and <em>b</em> parameters and the aspect ratio to get a closer match. The logo may have been inspired by a Besace curve and then drawn by hand.</p>The post <a href="https://www.johndcook.com/blog/2026/05/27/the-meta-logo-and-fitting-besace-curves/">The Meta logo and fitting Besace curves</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Calculating the expected range of normal samples</title>
		<link>https://www.johndcook.com/blog/2026/05/26/calculating-expected-normal-range/</link>
					<comments>https://www.johndcook.com/blog/2026/05/26/calculating-expected-normal-range/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Tue, 26 May 2026 17:59:53 +0000</pubDate>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[Probability]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247043</guid>

					<description><![CDATA[<p>The previous post looked at the expected IQ range in a jury of 12. This post will look more generally at computing the expected range of n samples from a N(0, 1) random variable. This will give the expected range in units of σ, i.e. multiply the results by σ if your σ isn&#8217;t 1. As mentioned [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/26/calculating-expected-normal-range/">Calculating the expected range of normal samples</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>The <a href="https://www.johndcook.com/blog/2026/05/26/expected-iq-spread-on-a-jury/">previous post</a> looked at the expected IQ range in a jury of 12. This post will look more generally at computing the expected range of <em>n</em> samples from a <em>N</em>(0, 1) random variable. This will give the expected range in units of σ, i.e. multiply the results by σ if your σ isn&#8217;t 1.</p>
<p>As mentioned in the previous post, the expected range is given by</p>
<p><img loading="lazy" decoding="async" class="aligncenter" src="https://www.johndcook.com/normal_range.svg" alt="d_n = 2n \int_{-\infty}^\infty \Phi(x)^{n-1} \, x\,\phi(x) \, dx" width="249" height="45" /></p>
<p>where φ and Φ are the PDF and CDF of a standard normal. The integral can be calculated in closed form for <em>n</em> ≤ 5, but in general it requires numerical integration [1].</p>
<p>The following Python code can compute <em>d</em><sub><em>n</em></sub>.</p>
<pre>
from scipy.stats import norm
from scipy.integrate import quad
import numpy as np

def d(n):
    integrand = lambda x: x*norm.pdf(x)*norm.cdf(x)**(n-1)
    res, info = quad(integrand, -np.inf, np.inf)
    return 2*n*res
</pre>
<p>For large <em>n</em> we have the asymptotic approximation</p>
<p><img class='aligncenter' src='https://www.johndcook.com/normal_range2.svg' alt='d_n = 2 \Phi^{-1}\left( \frac{n \,–\, 0.375}{ n + 0.25} \right)' style='background-color:white' height='48' width='178' /></p>
<p>which we could implement in Python by</p>
<pre>
def approx(n):
    return 2*norm.ppf((n - 0.375)/(n + 0.25))
</pre>
<p>For very large <em>n</em> the asymptotic expression may be more accurate than the integral due to numerical integration error. </p>
<p>Here are a few example values.</p>
<pre>
|-----+-------|
|   n |   d_n |
|-----+-------|
|   2 | 1.128 |
|   3 | 1.693 |
|   5 | 2.326 |
|  10 | 3.078 |
|  12 | 3.258 |
|  23 | 3.858 |
|  50 | 4.498 |
| 100 | 5.015 |
|-----+-------|
</pre>
<p>[1] Order Statistics by H. A. David. John Wiley &amp; Sons. 1970.</p>The post <a href="https://www.johndcook.com/blog/2026/05/26/calculating-expected-normal-range/">Calculating the expected range of normal samples</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
					<wfw:commentRss>https://www.johndcook.com/blog/2026/05/26/calculating-expected-normal-range/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Expected IQ spread on a jury</title>
		<link>https://www.johndcook.com/blog/2026/05/26/expected-iq-spread-on-a-jury/</link>
					<comments>https://www.johndcook.com/blog/2026/05/26/expected-iq-spread-on-a-jury/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Tue, 26 May 2026 13:50:51 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Probability]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247041</guid>

					<description><![CDATA[<p>There&#8217;s been some discussion online lately about how a large difference in IQ makes it difficult for two people to communicate. There have been studies that confirm this effect. The difficulty is not insurmountable, but it takes deliberate effort to overcome. Someone dismissed this communication difficulty by pointing out that the expected difference in IQ [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/26/expected-iq-spread-on-a-jury/">Expected IQ spread on a jury</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>There&#8217;s been some discussion online lately about how a large difference in IQ makes it difficult for two people to communicate. There have been studies that confirm this effect. The difficulty is not insurmountable, but it takes deliberate effort to overcome.</p>
<p>Someone dismissed this communication difficulty by pointing out that the expected difference in IQ between two individuals is around 17, suggesting that most communication is between people who differ by more than one standard deviation in IQ. But this calculation assumes people are chosen at random, which they usually are not. People tend to live around and work around others of similar intelligence.</p>
<p>However, a jury <em>is</em> a random sample. It&#8217;s not a perfect random sample. For one thing, it starts with a random sample of people who are registered to vote, or who have a driver&#8217;s license, not all individuals. Furthermore, the pool of potential jurors is reduced to a jury through the process of <em>voir dire</em>, which is not random.</p>
<p>For this post I will make the simplifying assumption that a jury is a random sample from a population with normally distributed IQ with standard deviation σ = 15. The mean doesn&#8217;t matter here, but you could assume it&#8217;s 100 if you&#8217;d like.</p>
<p>By symmetry, the expected range of <em>n</em> samples from a normal random variable is twice the maximum. For <em>n</em> = 12 the range is about 3.26σ, which corresponds to nearly <strong>50 IQ points</strong>.</p>
<p>This suggests there&#8217;s usually a big spread of IQ on a jury. Even if IQ doesn&#8217;t measure intelligence, it measures <em>something</em>, and that something varies a lot over 12 people chosen at random [1].</p>
<h2>Related posts</h2>
<ul>
<li class="link"><a href="https://www.johndcook.com/blog/explaining-probability-to-a-jury/">Explaining probability to a jury</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/expert-testimony/">Expert testimony</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2016/01/12/seven-questions-a-statistician-could-answer-for-a-lawyer/">Seven questions a statistician can answer for a lawyer</a></li>
</ul>
<p>[1] In case you&#8217;re interested in the technical details, the expected range of <em>n</em> samples from a standard normal random variable is given by</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/normal_range.svg" alt="d_n = 2n \int_{-\infty}^\infty \Phi(x)^{n-1} \, x\,\phi(x) \, dx" width="249" height="45" /></p>
<p>where φ and Φ are the PDF and CDF of a standard normal. Multiply this by σ to get the range of a normal random variable with standard deviation σ. As for how to calculate <em>d</em><sub><em>n</em></sub>, see the <a href="https://www.johndcook.com/blog/2026/05/26/calculating-expected-normal-range/">next post</a>.</p>The post <a href="https://www.johndcook.com/blog/2026/05/26/expected-iq-spread-on-a-jury/">Expected IQ spread on a jury</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
					<wfw:commentRss>https://www.johndcook.com/blog/2026/05/26/expected-iq-spread-on-a-jury/feed/</wfw:commentRss>
			<slash:comments>6</slash:comments>
		
		
			</item>
		<item>
		<title>Hilbert transform as an infinite matrix</title>
		<link>https://www.johndcook.com/blog/2026/05/23/hilbert-transform-as-an-infinite-matrix/</link>
					<comments>https://www.johndcook.com/blog/2026/05/23/hilbert-transform-as-an-infinite-matrix/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sat, 23 May 2026 15:09:34 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Fourier analysis]]></category>
		<category><![CDATA[Linear algebra]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247039</guid>

					<description><![CDATA[<p>The previous post linked to a post I wrote a few years ago about the Hilbert transform and Fourier series. That post says that if the Fourier series of a function is then the Fourier series of its Hilbert transform is When I looked back at that post I thought about how if you thought [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/23/hilbert-transform-as-an-infinite-matrix/">Hilbert transform as an infinite matrix</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>The previous post linked to a post I wrote a few years ago about the <a href="https://www.johndcook.com/blog/2022/04/13/hilbert-fourier/">Hilbert transform and Fourier series</a>. That post says that if the Fourier series of a function is</p>
<p><img loading="lazy" decoding="async" class="aligncenter size-medium" style="background-color: white;" src="https://www.johndcook.com/hilbert_fourier1.svg" alt="f(t) = \sum_{n=1}^\infty \left\{ a_n \sin(nt) + b_n\cos(nt) \right\}" width="268" height="49" /></p>
<p>then the Fourier series of its Hilbert transform is</p>
<p><img loading="lazy" decoding="async" class="aligncenter size-medium" style="background-color: white;" src="https://www.johndcook.com/hilbert_fourier2.svg" alt="f_H(x) = \sum_{n=1}^\infty \left\{ -b_n \sin(nx) + a_n\cos(nx) \right\}" width="303" height="49" /></p>
<p>When I looked back at that post I thought about how if you thought of the Fourier coefficients as elements of an infinite vector then the Hilbert transform can be represented as multiplying by an infinite block matrix.</p>
<p><img loading="lazy" decoding="async" class="aligncenter size-medium" style="background-color: white;" src="https://www.johndcook.com/hilbert_transform_matrix.svg" alt="\left[ \begin{array}{cc|cc|cc|c} 0 &amp; -1 &amp; 0 &amp; 0 &amp; 0 &amp; 0 &amp; \cdots \\ 1 &amp; 0 &amp; 0 &amp; 0 &amp; 0 &amp; 0 &amp; \cdots \\ \hline 0 &amp; 0 &amp; 0 &amp; -1 &amp; 0 &amp; 0 &amp; \cdots \\ 0 &amp; 0 &amp; 1 &amp; 0 &amp; 0 &amp; 0 &amp; \cdots \\ \hline 0 &amp; 0 &amp; 0 &amp; 0 &amp; 0 &amp; -1 &amp; \cdots \\ 0 &amp; 0 &amp; 0 &amp; 0 &amp; 1 &amp; 0 &amp; \cdots \\ \hline \vdots &amp; \vdots &amp; \vdots &amp; \vdots &amp; \vdots &amp; \vdots &amp; \ddots \end{array} \right] \left[ \begin{array}{c} a_1 \\ b_1 \\ \hline a_2 \\ b_2 \\ \hline a_3 \\ b_3 \\ \hline \vdots \end{array} \right]" width="363" height="183" /></p>
<p>I rarely see infinite matrices except in older math books. Apparently they were more fashionable a few decades ago than they are now. I suppose the notation falls between two stools, too concrete for some tastes and not concrete enough for others. The former folks would prefer something like <em>H</em> and the latter would prefer the sum above.</p>The post <a href="https://www.johndcook.com/blog/2026/05/23/hilbert-transform-as-an-infinite-matrix/">Hilbert transform as an infinite matrix</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
					<wfw:commentRss>https://www.johndcook.com/blog/2026/05/23/hilbert-transform-as-an-infinite-matrix/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Real and imaginary parts</title>
		<link>https://www.johndcook.com/blog/2026/05/23/real-and-imaginary-parts/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sat, 23 May 2026 13:57:21 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Complex analysis]]></category>
		<category><![CDATA[Differential equations]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247038</guid>

					<description><![CDATA[<p>The previous post announced some notes I wrote up based on an article by Henry Baker implementing functions of a complex variable in terms of functions of a real variable. That is, it finds functions u(x, y) and v(x, y) such that f(x + iy) = u(x, y) + i v(x, y) where x, y, u, and v are [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/23/real-and-imaginary-parts/">Real and imaginary parts</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>The previous post announced some <a href="https://www.johndcook.com/complex_real.html">notes</a> I wrote up based on an article by Henry Baker implementing functions of a complex variable in terms of functions of a real variable. That is, it finds functions <em>u</em>(<em>x</em>, <em>y</em>) and <em>v</em>(<em>x</em>, <em>y</em>) such that</p>
<p style="padding-left: 40px;"><em>f</em>(<em>x</em> + <em>iy</em>) = <em>u</em>(<em>x</em>, <em>y</em>) + <em>i</em> <em>v</em>(<em>x</em>, <em>y</em>)</p>
<p>where <em>x</em>, <em>y</em>, <em>u</em>, and <em>v</em> are all real-valued. Not only that, but if <em>f</em> is an elementary function, so are <em>u</em> and <em>v</em>. Here &#8220;elementary&#8221; has a technical meaning, but essentially it means functions that you could evaluate on a scientific calculator. A couple of the functions might be unfamiliar, such as sgn and atan2, but there are no functions like the gamma function that are defined in terms of integrals.</p>
<p>One application of Baker&#8217;s equations would be to bootstrap a math library that doesn&#8217;t support complex numbers into one that does. But the equations could be useful in pure math when you&#8217;d like to have a convenient expression for the real or imaginary part of a function.</p>
<p>The real and imaginary parts of a complex analytic function are harmonic functions. So the functions on the right hand side of Baker&#8217;s equations satisfy <a href="https://www.johndcook.com/blog/2022/11/22/dirichlet/">Laplace&#8217;s equation</a>:</p>
<p style="padding-left: 40px;"><em>u</em><sub><em>xx</em></sub> + <em>u</em><sub><em>yy</em></sub> = 0</p>
<p>and</p>
<p style="padding-left: 40px;"><em>v</em><sub><em>xx</em></sub> + <em>v</em><sub><em>yy</em></sub> = 0.</p>
<p>Furthermore, the functions <em>u</em> and <em>v</em> form harmonic conjugate pairs, meaning each is the <a href="https://www.johndcook.com/blog/2022/04/13/hilbert-fourier/">Hilbert transform</a> of the other.</p>The post <a href="https://www.johndcook.com/blog/2026/05/23/real-and-imaginary-parts/">Real and imaginary parts</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Building complex functions out of real parts</title>
		<link>https://www.johndcook.com/blog/2026/05/22/complex-functions-real-parts/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sat, 23 May 2026 03:24:10 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Complex analysis]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247036</guid>

					<description><![CDATA[<p>A couple months ago I wrote about how to compute the sine and cosine of a complex number using only real functions of real variables using the equations You can do something analogous for all the elementary functions, though some of the equations are quite a bit more complicated than the ones above. See the [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/22/complex-functions-real-parts/">Building complex functions out of real parts</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p><a href="https://www.johndcook.com/blog/2026/03/27/complex-argument/">A couple months ago</a> I wrote about how to compute the sine and cosine of a complex number using only real functions of real variables using the equations</p>
<p><img loading="lazy" decoding="async" class="aligncenter" src="https://www.johndcook.com/complex_sincos1.svg" alt="\begin{align*} \sin(x + iy) &amp;= \sin x \cosh y + i \cos x \sinh y \\ \cos(x + iy) &amp;= \cos x \cosh y - i \sin x \sinh y \\ \end{align*}" width="324" height="47" /></p>
<p>You can do something analogous for all the elementary functions, though some of the equations are quite a bit more complicated than the ones above. See the equations <a href="https://www.johndcook.com/complex_real.html">here</a>.</p>
<p>The equations come from a paper by Henry G. Baker, cited in the linked page. I wrote up Baker&#8217;s equations in LaTeX, then used ChatGPT to generate Python code from the LaTeX to numerically verify the equations and my typesetting of them. This caught a few typos on my part.</p>
<p>The test code evaluated the equations at points from each quadrant. All matched NumPy, implying that Baker and NumPy use the same branch cuts on inverse functions.</p>
<p style="text-align: center;">***</p>
<p>This post is part of a thread that has gone on for a few days. Maybe it&#8217;s the last post in the thread; we&#8217;ll see.</p>
<p>It all started with a post on <a href="https://www.johndcook.com/blog/2026/05/19/zagiers-equation/">Markov&#8217;s equation</a></p>
<p style="padding-left: 40px;"><em>x</em>² + <em>y</em>² + <em>z</em>² = 3 <em>xyz</em></p>
<p>and an approximation to the equation that has a closed-form solution. That led to the identity</p>
<p style="padding-left: 40px;">cosh( arccosh(<em>a</em>) + arccosh(<em>b</em>) ) = <em>ab</em> + √(<em>a</em>² − 1) √(<em>b</em>² − 1).</p>
<p>The approximation to Markov&#8217;s equation only needed the identity to be valid for real <em>a</em> and <em>b</em> greater than 1. But when I <a href="https://www.johndcook.com/blog/2026/05/19/closer-look-at-an-identity/">looked closer</a> at the identity I found several complications with branch cuts. The identity doesn&#8217;t hold everywhere using the principle branch of the square root function. But if you <a href="https://www.johndcook.com/blog/2026/05/19/square-root-of-x-squared-minus-one/">define</a> √(<em>z</em>² − 1) to have a branch cut along [−1, 1] then the equation holds everywhere in the complex plane. And that led to my writing up some <a href="https://www.johndcook.com/inverse.html">notes</a> on how to define all the elementary inverse functions in terms of log.</p>
<p>Someone reading these posts suggested I look at a paper that mentioned &#8220;couth&#8221; and &#8220;uncouth&#8221; function pairs, which led to <a href="https://www.johndcook.com/blog/2026/05/21/couth-and-uncouth-function-pairs/">this post</a> and its <a href="https://www.johndcook.com/blog/2026/05/21/circular-hyperbolic-rotations/">warm up</a>.</p>
<p>I find all this interesting because it&#8217;s an advanced perspective on a questions that are latent in an intro calculus class. What <em>exactly</em> do functions like arccos mean and why where they defined as they were? These are fairly deep and interesting questions that are swept under the rug, and swept there for good reason. A calculus class has to cover an enormous amount of material and there&#8217;s no time to dwell on fine points. Some of my favorite posts look back leisurely on things that go by in a blur when you&#8217;re a student.</p>The post <a href="https://www.johndcook.com/blog/2026/05/22/complex-functions-real-parts/">Building complex functions out of real parts</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Couth and uncouth function pairs</title>
		<link>https://www.johndcook.com/blog/2026/05/21/couth-and-uncouth-function-pairs/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Thu, 21 May 2026 17:31:13 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Complex analysis]]></category>
		<category><![CDATA[Special functions]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247031</guid>

					<description><![CDATA[<p>&#8220;You can&#8217;t always get what you want. But sometimes you get what you need.&#8221; — The Rolling Stones Circular functions and hyperbolic functions aren&#8217;t invertible, but we invert them anyway. These functions map many points in the domain to each point in the range, and we invert them by mapping a point in the range [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/21/couth-and-uncouth-function-pairs/">Couth and uncouth function pairs</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>&#8220;You can&#8217;t always get what you want. But sometimes you get what you need.&#8221; — The Rolling Stones</p>
<p>Circular functions and hyperbolic functions aren&#8217;t invertible, but we invert them anyway. These functions map many points in the domain to each point in the range, and we invert them by mapping a point in the range back to <em>some</em> point in the domain. Often this works as expected, but sometimes it doesn&#8217;t.</p>
<p>In the <a href="https://www.johndcook.com/blog/2026/05/21/circular-hyperbolic-rotations/">previous post</a> I said</p>
<blockquote><p>You can relate each trig function &#8220;foo&#8221; with its hyperbolic counterpart &#8220;fooh&#8221; by applying one of the functions to <em>iz</em> then multiplying by a constant <em>c</em> that depends on foo: <em>c</em> = <em>i</em> for sin and tan, <em>c</em> = 1 for cos and sec, and <em>c</em> = −<em>i</em> for csc and cot.</p></blockquote>
<p>In symbols,</p>
<p style="padding-left: 40px;"><em>c</em> foo(<em>z</em>) = fooh(<em>iz</em>).</p>
<p>Let&#8217;s suppose foo and fooh are invertible, ignoring any complications, and solve foo(<em>z</em>) = <em>w</em> for <em>z</em>. We get</p>
<p style="padding-left: 40px;"><em>i</em> foo<sup>−1</sup>(<em>w</em>) = fooh<sup>−1</sup>(<em>cw</em>)</p>
<p>or using &#8220;arc&#8221; nomenclature for inverse functions</p>
<p style="padding-left: 40px;"><em>i</em> arcfoo(<em>w</em>) = arcfooh(<em>cw</em>).</p>
<p>When the naive calculation above holds, except possibly at a finite number of points, we say the pair (foo, fooh) is <strong>couth</strong>. Otherwise we say the pair is <strong>uncouth</strong>. These term were coined by Robert Corless and his coauthors in their paper [1].</p>
<p>Whether the pair (foo, fooh) is couth depends not only on foo and fooh, but also on the details of how arcfoo and arcfooh are defined.</p>
<p>In Python&#8217;s NumPy library, the pairs (sin, sinh) and (tan, tanh) are couth, but the pair (cos, cosh) is uncouth.</p>
<p>Numpy doesn&#8217;t define the reciprocal functions sec, sech, csc, csch, cot, and coth. I used to find that annoying, but I&#8217;m beginning to think that was wise. These functions cause problems. For example, there may be two reasonable ways to define these functions, one of which forms a couth pair and one of which forms an uncouth pair.</p>
<p>For example, how should you define cot and coth? There would be no disagreement over the definition</p>
<pre>cot = lambda x: 1/tan(x)</pre>
<p>but there are at least two definitions of inverse coth that you&#8217;ll find in practice:</p>
<pre>
arccot = lambda z: 0.5*pi - arctan(z)
arccot = lambda z: arctan(1/z).
</pre>
<p>Both definitions have their advantages, but the former is uncouth while the latter is couth. You can verify that both definitions are the same at <em>z</em> = 1 but not at <em>z</em> = &minus;1.</p>
<p>With the following definitions, the pairs (cos, cosh) and (sec, sech) are uncouth and the rest are couth.</p>
<pre>
from numpy import *

csc     = lambda x: 1/sin(x)
sec     = lambda x: 1/cos(x)
cot     = lambda x: 1/tan(x)
csch    = lambda x: 1/sinh(x)
sech    = lambda x: 1/cosh(x)
coth    = lambda x: 1/tanh(x)

arccot  = lambda z: arctan(1/z)
arcsec  = lambda z: arccos(1/z)
arccsc  = lambda z: arcsin(1/z)
arccoth = lambda z: arctanh(1/z)
arcsech = lambda z: arccosh(1/z)
arccsch = lambda z: arcsinh(1/z)
</pre>
<p>[1] &#8220;According to Abramowitz and Stegun&#8221; or arccoth needn&#8217;t be uncouth. Robert M. Corless <em>et al</em>. ACM SIGSAM Bulletin, Volume 34, Issue 2, pp 58 &#8211; 65 https://doi.org/10.1145/362001.362023</p>The post <a href="https://www.johndcook.com/blog/2026/05/21/couth-and-uncouth-function-pairs/">Couth and uncouth function pairs</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Circular and hyperbolic functions differ by rotations</title>
		<link>https://www.johndcook.com/blog/2026/05/21/circular-hyperbolic-rotations/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Thu, 21 May 2026 15:53:56 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247032</guid>

					<description><![CDATA[<p>The difference between a circular function and a hyperbolic function is a rotation or two. For example, cosh(z) = cos(iz). You can read that as saying that to find the hyperbolic cosine of z, first you rotate z a quarter turn to the left (i.e. multiply by i) and then take the cosine. For another example, [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/21/circular-hyperbolic-rotations/">Circular and hyperbolic functions differ by rotations</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>The difference between a circular function and a hyperbolic function is a rotation or two.</p>
<p>For example, cosh(<em>z</em>) = cos(<em>iz</em>). You can read that as saying that to find the hyperbolic cosine of <em>z</em>, first you rotate <em>z</em> a quarter turn to the left (i.e. multiply by <em>i</em>) and then take the cosine.</p>
<p>For another example, sinh(<em>z</em>) = −<em>i</em> sin(<em>iz</em>). This says that you can calculate the hyperbolic sine of <em>z</em> by rotating <em>z</em> to the left, taking the sine, and then rotating to the right.</p>
<p>You can relate each trig function &#8220;foo&#8221; with its hyperbolic counterpart &#8220;fooh&#8221; by applying one of the functions to <em>iz</em> then multiplying by a constant <em>c</em> that depends on foo: <em>c</em> = <em>i</em> for sin and tan, <em>c</em> = 1 for cos and sec, and <em>c</em> = −<em>i</em> for csc and cot.</p>
<p>Note that if the constant for foo is <em>c</em>, the constant for 1/foo is 1/<em>c</em>. So, for example, the constant for tan is <em>i</em> and the constant for cot is 1/<em>i</em> = −<em>i</em>.</p>
<p>We have four groups of equations, depending on whether the left side of the equation is foo(<em>iz</em>), fooh(<em>iz</em>), foo(<em>z</em>), or fooh(<em>z</em>).</p>
<p>This post was written as a warm-up for the <a href="https://www.johndcook.com/blog/2026/05/21/couth-and-uncouth-function-pairs/">next post</a> on couth and uncouth function pairs.</p>
<h2>foo(iz)</h2>
<p><img loading="lazy" decoding="async" class="aligncenter size-medium" src="https://www.johndcook.com/fooiz2.png" width="320" height="291" /></p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/fooiz.svg" alt="\begin{align*} \sin(iz) &amp; = \phantom{-}i\sinh(z) \\ \cos(iz) &amp; = \phantom{-i}\cosh(z) \\ \tan(iz) &amp; = \phantom{-}i\tanh(z) \\ \csc(iz) &amp; = -i\text{csch}(z) \\ \sec(iz) &amp; = \phantom{-i}\text{sech}(z) \\ \cot(iz) &amp; = -i\coth(z) \\ \end{align*}" width="163" height="163" /></p>
<h2>fooh(iz)</h2>
<p><img loading="lazy" decoding="async" class="size-medium aligncenter" src="https://www.johndcook.com/foohiz2.png" width="320" height="291" /><br />
<img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/foohiz.svg" alt="\begin{align*} \sinh(iz) &amp; = \phantom{-}i\sin(z) \\ \cosh(iz) &amp; = \phantom{-i}\cos(z) \\ \tanh(iz) &amp; = \phantom{-}i\tan(z) \\ \text{csch}(iz) &amp; = -i\csc(z) \\ \text{sech}(iz) &amp; = \phantom{-i}\sec(z) \\ \coth(iz) &amp; = -i\cot(z) \\ \end{align*}" width="163" height="163" /></p>
<h2>foo(z)</h2>
<p><img loading="lazy" decoding="async" class="aligncenter size-medium" src="https://www.johndcook.com/fooz2.png" width="320" height="297" /></p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/fooz.svg" alt="\begin{align*} \sin(z) &amp; = -i\sinh(iz) \\ \cos(z) &amp; = \phantom{-i}\cosh(iz) \\ \tan(z) &amp; = -i\tanh(iz) \\ \csc(z) &amp; = \phantom{-}i\text{csch}(iz) \\ \sec(z) &amp; = \phantom{-i}\text{sech}(iz) \\ \cot(z) &amp; = \phantom{-}i\coth(iz) \\ \end{align*}" width="163" height="163" /></p>
<h2>fooh(z)</h2>
<p><img loading="lazy" decoding="async" class="aligncenter size-medium" src="https://www.johndcook.com/foohz2.png" width="320" height="296" /></p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/fooh.svg" alt="\begin{align*} \sinh(z) &amp; = -i\sin(iz) \\ \cosh(z) &amp; = \phantom{-i}\cos(iz) \\ \tanh(z) &amp; = -i\tan(iz) \\ \text{csch}(z) &amp; = \phantom{-}i\csc(iz) \\ \text{sech}(z) &amp; = \phantom{-i}\sec(iz) \\ \coth(z) &amp; = \phantom{-}i\cot(iz) \\ \end{align*} " width="163" height="163" /></p>The post <a href="https://www.johndcook.com/blog/2026/05/21/circular-hyperbolic-rotations/">Circular and hyperbolic functions differ by rotations</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Square root of x² − 1</title>
		<link>https://www.johndcook.com/blog/2026/05/19/square-root-of-x-squared-minus-one/</link>
					<comments>https://www.johndcook.com/blog/2026/05/19/square-root-of-x-squared-minus-one/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Wed, 20 May 2026 00:49:50 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Complex analysis]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247029</guid>

					<description><![CDATA[<p>How should we define √(z² − 1)? Well, you could square z, subtract 1, and take the square root. What else would you do?! The question turns out to be more subtle than it looks. When x is a non-negative real number, √x is defined to be the non-negative real number whose square is x. When x is [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/19/square-root-of-x-squared-minus-one/">Square root of x² − 1</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>How should we define √(<em>z</em>² − 1)? Well, you could square <em>z</em>, subtract 1, and take the square root. What else would you do?!</p>
<p>The question turns out to be more subtle than it looks.</p>
<p>When <em>x</em> is a non-negative real number, √<em>x</em> is defined to be the non-negative real number whose square is <em>x</em>. When <em>x</em> is a complex number √<em>x</em> is defined to be <strong>a</strong> function that extends √<em>x</em> from the real line to the complex plane by analytic continuation. But we can&#8217;t extend √<em>x</em> as an analytic function to the entire complex plane ℂ. We have to choose to make a &#8220;cut&#8221; somewhere, and the conventional choice is to make a cut along the negative real axis.</p>
<h2>Using the principal branch</h2>
<p>The &#8220;principal branch&#8221; of the square root function is defined to be the unique function that analytically extends √<em>x</em> from the positive reals to ℂ \ (−∞, 0].</p>
<p>Assume for now that by √<em>x</em> we mean the principal branch of the square root function. Now what does √(<em>z</em>² − 1) mean? It <em>could</em> mean just what we said at the top of the post: we square <em>z</em>, subtract 1, and apply the (principal branch of the) square root function. If we do that, we must exclude those values of <em>z</em> such that (<em>z</em>² − 1) is negative. This means we have to cut out the imaginary axis and the interval [−1, 1].</p>
<p>This is what Mathematica will do when asked to evaluate <code>Sqrt[z^2 - 1]</code>. The command</p>
<pre>ComplexPlot[Sqrt[z^2 - 1], {z, -2 - 2 I, 2 + 2 I}]</pre>
<p>makes the branch cuts clear by abrupt changes in color.</p>
<p><img loading="lazy" decoding="async" class="aligncenter size-medium" src="https://www.johndcook.com/sqrt_branch1.png" width="480" height="483" /></p>
<h2>A different approach</h2>
<p>Now let&#8217;s take a different approach. Consider the function √(<em>z</em>² − 1) as a whole. Do not think of it procedurally as above, first squaring <em>z</em> etc. Instead, think of a it as a black box that takes in <em>z</em> and returns a complex number whose square is <em>z</em>² − 1.</p>
<p>This function has an obvious definition for <em>z</em> &gt; 1. And we can extend this function, via analytic continuation, to more of the complex plane. We can do this <em>directly</em>, not by extending the square root function. But as before, we cannot extend the function analytically to all of ℂ. We have to cut something out. A common choice is to cut out [−1, 1]. This eliminates the need for a branch cut along the imaginary axis.</p>
<p>The function</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/cosh_sum5.svg" alt="f(z) = \exp\left( \tfrac{1}{2}\left( \log(z - 1) + \log(z + 1) \right) \right)" width="320" height="36" /></p>
<p>extends √(<em>z</em>² − 1) the way we want [1].</p>
<p>The Mathematica code</p>
<pre>ComplexPlot[Exp[(1/2) (Log[z - 1 ] + Log[z + 1])], {z, -2 - 2 I, 2 + 2 I}]</pre>
<p>shows that our function is now continuous across the imaginary axis, though there&#8217;s still a discontinuity as you cross [−1, 1].</p>
<p><img loading="lazy" decoding="async" class="aligncenter size-medium" src="https://www.johndcook.com/sqrt_branch2.png" width="480" height="483" /></p>
<p>We used this analytic extension of √(<em>z</em>² − 1) in the <a href="https://www.johndcook.com/blog/2026/05/19/closer-look-at-an-identity/">previous post</a> to eliminate branch cuts in an identity.</p>
<h2>Related posts</h2>
<ul>
<li class="link"><a href="https://www.johndcook.com/blog/2026/03/09/trig-composition-table/">Trig composition table</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2026/03/10/sinh-arccosh/">sinh( arccosh(x) )</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2025/03/03/mathematica-contour-plot/">Duplicating a hand-drawn plot</a></li>
</ul>
<p>[1] The principal branch of the logarithm has a cut along the negative real axis. Why does our square root function, defined using log, not have a branch cut along the negative axis?</p>
<p>First of all, the log function, and Mathematica&#8217;s implementation of it <code>Log[]</code>, isn&#8217;t undefined on (−∞, 1), it just isn&#8217;t continuous there. The function still has a value. By convention, the value is taken to be the limit of log(<em>z</em>) approaching <em>z</em> from above, i.e. from the 2nd quadrant.</p>
<p>Second, the value of (log(<em>z</em> &#8211; 1) + log(<em>z</em> + 1))/2 differs by a factor of 2π<em>i</em> when approaching a value <em>z</em> &lt; −1 from above versus from below. This factor goes away when taking the exponential. So our function <em>f</em>(<em>z</em>) is analytic across (−∞, 1) even though the log functions in its definition are not.</p>The post <a href="https://www.johndcook.com/blog/2026/05/19/square-root-of-x-squared-minus-one/">Square root of x² − 1</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
					<wfw:commentRss>https://www.johndcook.com/blog/2026/05/19/square-root-of-x-squared-minus-one/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>Closer look at an identity</title>
		<link>https://www.johndcook.com/blog/2026/05/19/closer-look-at-an-identity/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Tue, 19 May 2026 23:37:31 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Complex analysis]]></category>
		<category><![CDATA[Mathematica]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247027</guid>

					<description><![CDATA[<p>The previous post derived the identity and said in a footnote that the identity holds at least for x &#62; 1 and y &#62; 1. That&#8217;s true, but let&#8217;s see why the footnote is necessary. Let&#8217;s have Mathematica plot The plot will be 0 where the identity above holds. The plot is indeed flat for x &#62; 1 [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/19/closer-look-at-an-identity/">Closer look at an identity</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>The previous post derived the identity</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/cosh_sum1.svg" alt="\cosh\Big( \operatorname{arccosh}(x) + \operatorname{arccosh}(y)\Big) = xy + \sqrt{x^2 -1} \sqrt{y^2 -1}" width="452" height="38" /></p>
<p>and said in a footnote that the identity holds at least for <em>x</em> &gt; 1 and <em>y</em> &gt; 1. That&#8217;s true, but let&#8217;s see why the footnote is necessary.</p>
<p>Let&#8217;s have Mathematica plot</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/cosh_sum2.svg" alt="\left| \cosh\Big( \operatorname{arccosh}(x) + \operatorname{arccosh}(y)\Big) - xy - \sqrt{x^2 -1} \sqrt{y^2 -1} \right|" width="461" height="48" /></p>
<p>The plot will be 0 where the identity above holds.</p>
<p><img loading="lazy" decoding="async" class="aligncenter size-medium" src="https://www.johndcook.com/coshacosh1.png" width="480" height="399" /></p>
<p>The plot is indeed flat for <em>x</em> &gt; 1 and <em>y</em> &gt; 1, and more, but not everywhere.</p>
<p>If we combine the two square roots</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/cosh_sum3.svg" alt="\left| \cosh\Big( \operatorname{arccosh}(x) + \operatorname{arccosh}(y)\Big) - xy - \sqrt{(x^2 -1) (y^2 -1)} \right|" width="479" height="48" /></p>
<p>and plot again we still get a valid identity for <em>x</em> &gt; 1 and <em>y</em> &gt; 1, but the plot changes.</p>
<p><img loading="lazy" decoding="async" class="aligncenter size-medium" src="https://www.johndcook.com/coshacosh2.png" width="480" height="399" /></p>
<p>This is because √<em>a</em> √<em>b</em> does not necessarily equal √(<em>ab</em>) when the arguments may be negative.</p>
<p>The square root function and the arccosh function are not naturally single-valued functions. They require branch cuts to force them to be single-valued, and the two functions require <em>different</em> branch cuts. I go into this in some detail <a href="https://www.johndcook.com/blog/2026/03/10/sinh-arccosh/">here</a>.</p>
<p>There is a way to reformulate our identity so that it holds everywhere. If we replace</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/cosh_sum4.svg" alt="\sqrt{z^2 - 1}" width="63" height="24" /></p>
<p>with</p>
<p><img loading="lazy" decoding="async" class="aligncenter" style="background-color: white;" src="https://www.johndcook.com/cosh_sum5.svg" alt="f(z) = \exp\left( \tfrac{1}{2}\left( \log(z - 1) + \log(z + 1) \right) \right)" width="320" height="36" /></p>
<p>which is equivalent for <em>z</em> &gt; 1, the corresponding identity holds everywhere.</p>
<p>We can verify this with the following Mathematica code.</p>
<pre>f[z_] := Exp[(1/2) (Log[z - 1 ] + Log[z + 1])]
FullSimplify[Cosh[ArcCosh[x] + ArcCosh[y]] - x y - f[x] f[y]]
</pre>
<p>This returns 0.</p>
<p>By contrast, the code</p>
<pre>FullSimplify[
 Cosh[ArcCosh[x] + ArcCosh[y]] - x y - Sqrt[x^2 - 1] Sqrt[y^2 - 1]]</pre>
<p>simply returns its input with no simplification, unless we add restrictions on <em>x</em> and <em>y</em>. The code</p>
<pre>FullSimplify[
 Cosh[ArcCosh[x] + ArcCosh[y]] - x y - Sqrt[x^2 - 1] Sqrt[y^2 - 1], 
 Assumptions -&gt; {x &gt; -1 &amp;&amp; y &gt; -1}]
</pre>
<p>does return 0.</p>
<h2>Related posts</h2>
<ul>
<li class="link"><a href="https://www.johndcook.com/blog/2026/03/12/arccos/">Inverse cosine</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2016/12/17/branch-cuts-and-common-lisp/">Branch cuts and Common Lisp</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2025/01/22/duplicating-hankel-plot-from-as/">Duplicating a plot from A &amp; S</a></li>
</ul>The post <a href="https://www.johndcook.com/blog/2026/05/19/closer-look-at-an-identity/">Closer look at an identity</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Approximating Markov&#8217;s equation</title>
		<link>https://www.johndcook.com/blog/2026/05/19/zagiers-equation/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Tue, 19 May 2026 12:09:54 +0000</pubDate>
				<category><![CDATA[Math]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247025</guid>

					<description><![CDATA[<p>Markov numbers are integer solutions to x² + y² + z² = 3xyz. The Wikipedia article on Markov numbers mentions that Don Zagier studied Markov numbers by looking the approximating equation x² + y² + z² = 3xyz + 4/9 which is equivalent to f(x) + f(y) = f(z) where f(t) is defined as arccosh(3t/2). It wasn&#8217;t clear to me why the [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/19/zagiers-equation/">Approximating Markov’s equation</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>Markov numbers are integer solutions to</p>
<p style="padding-left: 40px;"><em>x</em>² + <em>y</em>² + <em>z</em>² = 3<em>xyz</em>.</p>
<p>The Wikipedia article on Markov numbers mentions that Don Zagier studied Markov numbers by looking the approximating equation</p>
<p style="padding-left: 40px;"><em>x</em>² + <em>y</em>² + <em>z</em>² = 3<em>xyz</em> + 4/9</p>
<p>which is equivalent to</p>
<p style="padding-left: 40px;"><em>f</em>(<em>x</em>) + <em>f</em>(<em>y</em>) = <em>f</em>(<em>z</em>)</p>
<p>where <em>f</em>(<em>t</em>) is defined as arccosh(3<em>t</em>/2). It wasn&#8217;t clear to me why the two previous equations are equivalent, so I&#8217;m writing this post to show that they are equivalent.</p>
<h2>Examples</h2>
<p>Before showing the equivalence of Zagier&#8217;s two equations, let&#8217;s look at an example that shows solutions to his second equation approximate solutions to Markov&#8217;s equation.</p>
<p>The following code verifies that (5, 13, 194) is a solution to Markov&#8217;s equation.</p>
<pre>x, y, z = 5, 13, 194
assert(x**2 + y**2 + z**2 == 3*x*y*z)
</pre>
<p>With the same <em>x</em> and <em>y</em> above, let&#8217;s show that the <em>z</em> in Zagier&#8217;s second equation is close to the <em>z</em> above.</p>
<pre>from math import cosh, acosh

f = lambda t: acosh(3*t/2)
g = lambda t: cosh(t)*2/3
z = g(f(x) + f(y))
print(z)
</pre>
<p>This gives <em>z</em> = 194.0023, which is close to the value of <em>z</em> in the Markov triple above.</p>
<h2>Applying Osborn&#8217;s rule</h2>
<p>Now suppose</p>
<p style="padding-left: 40px;"><em>f</em>(<em>x</em>) + <em>f</em>(<em>y</em>) = <em>f</em>(<em>z</em>)</p>
<p>which expands to</p>
<p style="padding-left: 40px;">arccosh(3<em>x</em>/2) + arccosh(3<em>y</em>/2)  = arccosh(3<em>z</em>/2).</p>
<p>It seems sensible to apply cosh to both sides. Is there some identity for cosh of a sum? Maybe you recall the equation for cosine of a sum:</p>
<p style="padding-left: 40px;">cos(<em>a</em> + <em>b</em>) = cos(<em>a</em>) cos(<em>b</em>) − sin(<em>a</em>) sin(<em>b</em>).</p>
<p>Then <a href="https://www.johndcook.com/blog/2024/08/20/osborn-rule/">Osborn&#8217;s rule</a> says the corresponding hyperbolic identity is</p>
<p style="padding-left: 40px;">cosh(<em>a</em> + <em>b</em>) = cosh(<em>a</em>) cosh(<em>b</em>) − sinh(<em>a</em>) sinh(<em>b</em>).</p>
<p>Osborn&#8217;s rule also says that the analog of the familiar identity</p>
<p style="padding-left: 40px;">sin²(<em>a</em>) + cos²(<em>b</em>) = 1</p>
<p>is</p>
<p style="padding-left: 40px;">sinh²(<em>a</em>) = cosh²(<em>b</em>) − 1.</p>
<p>From these two hyperbolic identities we can show that [1]</p>
<p style="padding-left: 40px;">cosh( arccosh(<em>a</em>) + arccosh(<em>b</em>) ) = <em>ab</em> + √(<em>a</em>² − 1) √(<em>b</em>² − 1).</p>
<h2>Slug it out</h2>
<p>The identity derived above is the tool we need to reduce our task to routine algebra.</p>
<p>If</p>
<p style="padding-left: 40px;">arccosh(3<em>x</em>/2) + arccosh(3<em>y</em>/2)  = arccosh(3<em>z</em>/2)</p>
<p>then</p>
<p style="padding-left: 40px;">(3<em>x</em>/2)  (3<em>y</em>/2)  + √((3<em>x</em>/2)² − 1) √((3<em>y</em>/2)² − 1) = 3<em>z</em> / 2</p>
<p>which simplifies to Zagier&#8217;s equation</p>
<p style="padding-left: 40px;"><em>x</em>² + <em>y</em>² + <em>z</em>² = 3<em>xyz</em> + 4/9.</p>
<h2>Related posts</h2>
<ul>
<li class="link"><a href="https://www.johndcook.com/blog/2023/09/22/prime-weeds/">Primes and weeds</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2026/03/10/sinh-arccosh/">sinh( arccosh(x) )</a></li>
<li class="link"><a href="https://www.johndcook.com/blog/2022/01/08/diophantine/">Three paths converge</a></li>
</ul>
<p>[1] The equation holds at least for <em>x</em> &gt; 1 and <em>y</em> &gt; 1, which is enough for this post. More general arguments run into complications due to branch cuts.</p>The post <a href="https://www.johndcook.com/blog/2026/05/19/zagiers-equation/">Approximating Markov’s equation</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Recovering the state of xorshift128</title>
		<link>https://www.johndcook.com/blog/2026/05/15/xorshift128-state/</link>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Fri, 15 May 2026 12:34:29 +0000</pubDate>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[RNG]]></category>
		<guid isPermaLink="false">https://www.johndcook.com/blog/?p=247024</guid>

					<description><![CDATA[<p>I&#8217;ve written a couple posts lately about reverse engineering the internal state of a random number generator, first Mersenne Twister then lehmer64. This post will look at xorshift128, implemented below. import random # Seed the generator state a: int = random.getrandbits(32) b: int = random.getrandbits(32) c: int = random.getrandbits(32) d: int = random.getrandbits(32) MASK = [&#8230;]</p>
The post <a href="https://www.johndcook.com/blog/2026/05/15/xorshift128-state/">Recovering the state of xorshift128</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></description>
										<content:encoded><![CDATA[<p>I&#8217;ve written a couple posts lately about reverse engineering the internal state of a random number generator, first <a href="https://www.johndcook.com/blog/2026/05/10/reverse-mersenne-twister/">Mersenne Twister</a> then <a href="https://www.johndcook.com/blog/2026/05/12/hacking-the-lehmer64-rng/">lehmer64</a>. This post will look at xorshift128, implemented below.</p>
<pre>import random

# Seed the generator state
a: int = random.getrandbits(32)
b: int = random.getrandbits(32)
c: int = random.getrandbits(32)
d: int = random.getrandbits(32)

MASK = 0xFFFFFFFF

def xorshift128() -&gt; int:
    global a, b, c, d

    t = d
    s = a

    t ^= (t &lt;&lt; 11) &amp; MASK t ^= (t &gt;&gt;  8) &amp; MASK
    s ^= (s &gt;&gt; 19) &amp; MASK

    a, b, c, d = (t ^ s) &amp; MASK, a, b, c

    return a
</pre>
<h2>Recovering state</h2>
<p>Recovering the internal state of the generator is simple: it&#8217;s the four latest outputs in reverse order. This is illustrated by the following chart.</p>
<p><img loading="lazy" decoding="async" class="aligncenter size-medium" src="https://www.johndcook.com/xorshift128_chart.png" alt=" a b c d output 5081e5ca 79259a41 63e12955 651e537d c793d11c c793d11c 5081e5ca 79259a41 63e12955 ad52e33a ad52e33a c793d11c 5081e5ca 79259a41 f8f09343 f8f09343 ad52e33a c793d11c 5081e5ca a7009622 a7009622 f8f09343 ad52e33a c793d11c fe42a8ef " width="400" height="111" /></p>
<p>This means that once we&#8217;ve seen four outputs, we can predict the rest of the outputs. The following code demonstrates this.</p>
<p>Let&#8217;s generate five random values.</p>
<pre>out = [xorshift128() for _ in range(5)]</pre>
<p>Running</p>
<pre>print(hex(out[4]))
</pre>
<p>shows the output 0xc3f4795d.</p>
<p>If we reset the state of the generator using the first four outputs</p>
<pre>d, c, b, a, _ = out
print(hex(xorshift128()))
</pre>
<p>we get the same result.</p>
<h2>Good stats, bad security</h2>
<p>Mersenne Twister and lehmer64 have good statistical properties, despite being predictable. The xorshift128 generator is even easier to predict, but it also has good statistical properties. These generators would be fine for many applications, such as Monte Carlo simulation, but disastrous for use in cryptography.</p>
<p>The post on lehmer64 mentioned at the end that the internal state of PCG64 can also be recovered from its output. However, doing so requires far more sophisticated math and thousands of hours of compute time. Still, it&#8217;s not adequate for cryptography. For that you&#8217;d need a random number generator designed to be secure, such as <a href="https://www.johndcook.com/blog/2019/03/02/chacha-adiantum/">ChaCha</a>.</p>
<p>So why not just use a cryptographically secure random number generator (CSPRNG) for everything? You could, but the other generators mentioned in this post use less memory and are much faster. PCG64 occupies an interesting middle ground: simple and fast, but not easily reversible.</p>The post <a href="https://www.johndcook.com/blog/2026/05/15/xorshift128-state/">Recovering the state of xorshift128</a> first appeared on <a href="https://www.johndcook.com/blog">John D. Cook</a>.]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
