On Thomas Bayes

This post is part of a larger piece on Thomas Bayes, and two other nonconformist ministers, Richard Price and William Godwin (!). It started life as a footnote and grew, so I gave it a new home, with a bit more room to stretch its legs. The first part is a brief introduction to Bayesian probability, and the second part talks about Bayes himself and (very briefly) about Price, Bayes’ literary executor.

Bayesian Probability

A lot of people have been talking about Bayesian probability over the last decade or so. I don’t think this is a new thing, but it seems to have taken on a new kind of urgency, perhaps because a lot of important ideas in machine learning and text mining are Bayesian, or at least Bayesianish. If you’re not familiar with Bayesianism, it boils down to the assertion that probability measures our ignorance about the world, rather than the world’s own irreducible uncertainty. This is sometimes called a “subjectivist” way of thinking about probability, although I’ve never found that an especially useful term. What does it mean to say that ignorance is subjective?

I like the way Thomas Bayes himself put it, at least once it’s rephrased into straightforward language. It has to do with expectation values: say, the expected weight of a randomly chosen apple. According to an ordinary way of thinking about probability, if we wanted to find out the weight of “the average apple,” we’d start with a probability distribution over apple weights, and then calculate the weighted mean. Let’s reduce the size of the problem and say we’re bobbing for apples. There are five eight-ounce apples (the good ones) and three four-ounce apples (the bad ones) in the barrel; that’s a total of 5 \times 8 + 3 \times 4 = 52 ounces, distributed over eight apples, for an expected value of six-and-a-half ounces. Assuming you have no particular apple-bobbing skills, but that you don’t stop until you get an apple, you’d get a good one five times out of eight on average. That means the expected value of one of the good ones is \frac{5}{8} \times 8 = 5 ounces, and the expected value of one of the bad ones is \frac{3}{8} \times 4 = 1.5 ounces. This sums to 6.5 ounces, the weight of “the average apple” in the barrel.
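For readers who like to check arithmetic by machine, here is a minimal Python sketch of the apple barrel. The variable names are mine, and the only assumption beyond the text is the one already stated there: every apple is equally likely to be caught.

```python
from fractions import Fraction

# Barrel contents from the example: weight in ounces -> number of apples.
barrel = {8: 5, 4: 3}

total_apples = sum(barrel.values())

# Probability of catching an apple of each weight, assuming every apple
# in the barrel is equally likely to be caught.
probabilities = {w: Fraction(n, total_apples) for w, n in barrel.items()}

# Each weight's share of the expectation: probability times weight.
contributions = {w: p * w for w, p in probabilities.items()}

# The expected weight of "the average apple" is the sum of those shares.
expected_weight = sum(contributions.values())

print(float(contributions[8]))   # 5.0
print(float(contributions[4]))   # 1.5
print(float(expected_weight))    # 6.5
```

Exact fractions are used instead of floats only so that the numbers match the ones in the paragraph above without rounding noise.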

Bayes turned that on its head. He suggested that we start not with probabilities, but with the expectations themselves:

The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon it’s [sic] happening.

In other words, Bayes defined probability by starting with expected values, and calculated the resulting probabilities based on them. If you are used to being offered probabilities as givens, this may look like a strange way of doing things. But mathematically, it’s no different from the more familiar approach. To find out the probability of getting a good apple, we just take the expected value of a good apple (the proportion of apple we can expect, in the long run, to come from good apples if we keep on bobbing [1]), which is 5 ounces, and divide it by the actual value of that good apple we just got, which is 8 ounces: \frac{5}{8}. The result is the same as the probability we started with above.
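Bayes’ definition can be run backwards over the same barrel. The sketch below is only an illustration: the function name `probability_from_expectation` is made up for this post, and it does nothing more than take the ratio Bayes describes.

```python
from fractions import Fraction

def probability_from_expectation(expectation, value):
    """Bayes' ratio: the expectation that ought to be computed for an event,
    divided by the value of the thing expected if the event happens."""
    return Fraction(expectation) / Fraction(value)

# Good apples: an expectation of 5 ounces on a prize worth 8 ounces.
p_good = probability_from_expectation(5, 8)

# Bad apples: an expectation of 1.5 ounces on a prize worth 4 ounces.
p_bad = probability_from_expectation(Fraction(3, 2), 4)

print(p_good, p_bad, p_good + p_bad)
```

The two ratios come out as the same probabilities the ordinary approach started from, and they sum to one, which is the sense in which the two ways of thinking are mathematically identical.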

A skeptic might now ask “but how do we know what the expected value is if we don’t already know the probability?” This is a great question. To answer it, I’m going to use a different example: a fiery plane crash.

We are told all the time that the probability of dying in a fiery plane crash is very low. “You’re more likely to be struck by lightning!” people say. Fair enough. But do we really know that? Isn’t that just hearsay? Somebody told somebody else told somebody else some numbers, and they did some math and told somebody else, and so on. I didn’t see it with my own eyes; why should I trust those people? When I step into an airplane cabin and momentarily imagine myself and all the people around me screaming in terror as we plunge towards our doom, you’ll forgive me if I have a moment of doubt.

For Bayes, the question is this: how much doubt? And the answer, in my case, is “not much.” I’ve ridden on planes many times now and nothing has gone wrong. I’m still alive. Everyone I know has done the same — I’ve never known someone who has died in a plane crash. I’ve never even known someone who knows someone who has died in a plane crash, which is probably why I’m being so insensitive to plane crash victims and their close relations right now. (Apologies!) I am strongly plane-crash averse, but I keep riding planes, and I don’t worry about it very much. This tells my inner Bayesian that plane crashes aren’t very probable. In fact,

\begin{aligned}probability\ of\ a\\plane\ crash\end{aligned} = \frac{\begin{aligned}how\ worried\ I\\am\ about\ dying\ in\\a\ fiery\ plane\ crash\end{aligned}}{\begin{aligned}how\ unpleasant\ it\ would\\be\ to\ actually\ die\ that\ way\end{aligned}}

According to this point of view, we can tell plane crashes aren’t very probable because I’m not worried about dying in one, even though it would be a truly awful way to die.

This way of thinking might seem bonkers, epistemologically speaking. But it’s mathematically identical to the other way of thinking about probability. And to tell the truth, I’ve made it sound more bonkers than it is by suggesting that my level of worry directly influences the probability of a plane crash.

The reason it sounds that way is that I left out a word: “ought.” If you look back at Bayes’ definition, you’ll notice that he didn’t talk about the expectation that we speculate is correct — he talked about the expectation that “ought to be computed.” That is a wonderful sleight of hand, isn’t it? Look what happens if I encounter someone who is extremely worried about dying in a plane crash. Armed with Bayes’ definition, I simply say “you’re more worried than you ought to be.”

\begin{aligned}probability\ of\ a\\plane\ crash\end{aligned} = \frac{\begin{aligned}how\ worried\ \\\boldsymbol{you\ ought\ to\ be}\ about\\ dying\ in\ a\ fiery\ plane\ crash\end{aligned}}{\begin{aligned}how\ unpleasant\ it\ would\\be\ to\ actually\ die\ that\ way\end{aligned}}

How do we know how worried we ought to be about dying in a plane crash? One way might be to add up all the pain and suffering caused by plane crashes, and divide it by the number of times everyone on earth has ever ridden a plane. Then we could compare that to the pain and suffering caused by lightning strikes, automobile accidents, the scratching of one’s finger, and so on. This again requires us to believe in some hearsay, but now it’s not so abstract. We’re talking about more than numbers now.

On the other hand, it requires us to quantify some very hard-to-quantify things, like human suffering. Maybe that’s another reason why people call this a “subjectivist” interpretation of probability. But a value isn’t subjective just because it’s hard to quantify, and an interpretation of probability isn’t epistemologically faulty just because it’s hard to act on. As someone who believes in the reality of human suffering, and who is interested in reducing it, I find this way of thinking about probability quite appealing, because it tells me something specific about the stakes.

Note also that the haziness of this “subjective” factor divides out of the equation. It ought to affect both the top of the fraction (our worry) and the bottom of the fraction (our actual suffering) in equal proportion. So, in other words, if you know with great certainty that you won’t suffer in a crash because you will just pass out the instant the turbulence gets bad, then you ought to be proportionally less worried about a crash. (You might then be more worried about other aspects of flying though!)

In a short post like this, it’s hard to cover all the disagreements that arise over Bayesian interpretations of probability, and I have used a different starting point than most introductions. They generally start with Bayes’ rule,

p(C|E) = \frac{p(E|C) \times p(C)}{p(E)}

which I left until now, because it tells you less about Bayesian probability than many people seem to think. This formula just tells us how to combine prior beliefs about a claim (p(C)) with new evidence (E and its probabilities p(E) and p(E|C)) to get an updated belief (p(C|E)). It’s just as useful to non-Bayesians as Bayesians, as long as you have a prior belief. The disagreement between Bayesians and non-Bayesians is over what makes a good prior belief. Thinking back to the plane crash example, this is the belief you have before you ever step on a plane. And here, Bayes’ “ought” comes in again; Bayesians are generally committed to the point of view that we can choose a correct (or good enough) prior belief without direct evidence, and let repeated applications of the above rule take care of the rest. You have to have a prior belief that allows you to step onto the plane at all. But once you’ve taken that first step, you can adjust your level of worry about calefactostratospheric mortality in a way that’s governed by the evidence you’ve collected.
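To make “repeated applications of the above rule” concrete, here is a rough Python sketch. Everything numerical in it is invented for illustration: the claim C is “flying is dangerous,” the evidence E is “I took a flight and it landed safely,” and the two per-flight probabilities and the starting belief are hypothetical values, not real aviation statistics. The helper `bayes_update` is likewise just a name chosen here.

```python
def bayes_update(prior, p_e_given_c, p_e_given_not_c):
    """Return p(C|E) from p(C), p(E|C) and p(E|not C) using Bayes' rule."""
    # p(E) by the law of total probability over C and not-C.
    p_e = p_e_given_c * prior + p_e_given_not_c * (1.0 - prior)
    return p_e_given_c * prior / p_e

# Hypothetical numbers, chosen only to make the update visible.
P_SAFE_LANDING_IF_DANGEROUS = 0.99      # assumed: 1 crash per 100 flights
P_SAFE_LANDING_IF_SAFE = 0.999999       # assumed: 1 crash per 1,000,000 flights

belief = 0.5            # assumed prior belief that flying is dangerous
history = [belief]

# One application of the rule per uneventful flight.
for _ in range(100):
    belief = bayes_update(belief,
                          P_SAFE_LANDING_IF_DANGEROUS,
                          P_SAFE_LANDING_IF_SAFE)
    history.append(belief)

print(f"belief that flying is dangerous after 100 safe flights: {belief:.3f}")
```

Each safe landing nudges the belief downward a little. The rule itself is indifferent to where the starting value of 0.5 came from, which is exactly the part Bayesians and non-Bayesians argue about.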

But we may not ever agree about the choice of prior — it’s “subjective,” at least if you think that anything we can’t agree on is subjective. This means that if you are really, really, really worried about dying in a plane crash, there’s nothing I can do to persuade you to change your mind.

But then — that’s the simple truth. There isn’t anything I can do to change your mind. Non-Bayesians want a more principled way of choosing a prior distribution, one that isn’t “subjective.” They want to be able to change your mind about plane crashes by reasoning with you. But since there’s disagreement about what principles to use, and there probably always will be, I’m not sure anything is gained by that approach. At least the Bayesian framework formalizes the disagreement, showing us what we can and what we can’t agree about, and why.
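One way to see what “formalizes the disagreement” might mean is to hand the same evidence to two people with different priors. The sketch below uses the odds form of Bayes’ rule (posterior odds equal prior odds times the likelihood ratio). The two priors, the per-flight probabilities, and the function name are all hypothetical, made up for this illustration.

```python
def posterior_after_safe_flights(prior, flights,
                                 p_safe_if_dangerous=0.99,
                                 p_safe_if_safe=0.999999):
    """Belief that flying is dangerous after some number of safe flights.

    Uses the odds form of Bayes' rule. The default per-flight
    probabilities are invented for illustration only.
    """
    prior_odds = prior / (1.0 - prior)
    # Every safe flight multiplies the odds by the same likelihood ratio.
    likelihood_ratio = (p_safe_if_dangerous / p_safe_if_safe) ** flights
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Two hypothetical flyers who see exactly the same 100 safe flights.
moderate = posterior_after_safe_flights(prior=0.5, flights=100)
very_worried = posterior_after_safe_flights(prior=0.999999, flights=100)

print(f"moderate prior:     {moderate:.6f}")
print(f"very worried prior: {very_worried:.6f}")
```

Both flyers multiply their odds by the same likelihood ratio, so they agree completely about what the evidence says. What remains different afterwards is traceable entirely to the priors they started with.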

So that’s Bayesian probability in brief — probably too brief, but so be it.

Thomas Bayes and Richard Price

Thomas Bayes himself has a strange position in history. On Wikipedia, the list of things named after Bayes is fully half as long as the list of things named after Einstein, and almost as long as the list of things named after Newton. I’m not quite sure what that measures, but it must not be productivity: Bayes published just two works in his lifetime, and two short papers of his were published posthumously. One of those, “An Essay towards solving a Problem in the Doctrine of Chances,” is the source of all his fame, and of the quotation about expectations and oughts above. It contained a proof of the above formula for conditional probability, and used it to give a mathematically precise way to reason about the conclusions that a given body of evidence can support. His other works have been almost entirely forgotten.

That might be because Bayes was not so much a mathematician as a minister with a challenging hobby. One of the two works published while he was alive was a learned treatise on God’s Benevolence, not a genre that has much currency today. There has been some speculation that the second work, an explication and defense of Newton’s formulation of calculus, was well-received, and may have led to his election to the Royal Society [2], but I haven’t found much primary reception evidence.

Even that work appears to have been motivated to some degree by Bayes’ religious views. The text is nominally a defense of Newton’s formulation of calculus against George Berkeley’s criticisms in The Analyst. But remarkably (at least to me), the book begins with a discussion of theology. In his preface, Bayes attempts to disentangle questions about mathematics and questions about religion. Berkeley, he says, had inappropriately linked the two:

Had it been his only design to bring this point to a fair issue, whether a demonstration by the method of Fluxions be truly scientific or not, I should have heartily applauded his conduct… but the invidious light in which he has put this debate, by representing it as of consequence to the interests of religion, is, I think, truly unjustifiable, as well as highly imprudent.

In Bayes’ account, Berkeley argued that because calculus was based on mysterious infinitesimal values, belief in the truth of calculus is no different from religious belief. This account of Berkeley’s argument strikes me as oversimplified, or at least a bit glib. Many of Berkeley’s basic mathematical objections were substantive, and would not be fully addressed by mathematicians until more than a century later. Bayes largely ignored these objections. But the broader argument that he made in the preface — that the mathematical questions at hand have no bearing on religious questions at all — has a modern ring. It reminds me a bit of Stephen Jay Gould’s famous characterization of science and religion as “non-overlapping magisteria.”

Nearly thirty years after he published his work on calculus, Bayes’ most famous work appeared in the Philosophical Transactions of the Royal Society, with a preface by Richard Price, Bayes’ literary executor, fellow minister, and long-time friend. Price’s preface to the essay made the remarkable claim that Bayes’ proof could be the basis of a probabilistic argument for the existence of a creator. The goal of such an argument would be this:

to shew what reason we have for believing that there are in the constitution of things fixt laws according to which things happen, and that, therefore, the frame of the world must be the effect of the wisdom and power of an intelligent cause; and thus to confirm the argument taken from final causes for the existence of the Deity.

Bayes had disentangled mathematical and theological questions; Price re-entangled them by arguing from intelligent design to the existence of an intelligent creator.

Was Price speaking for Bayes? It’s difficult to believe that Bayes would have chosen a literary executor who he knew would distort or misrepresent his argument. But if Bayes really did change his mind this way, it would be a bit like Gould dropping out of the evolutionary theory business, and setting up shop as a creationist. It would not have been quite as extreme, since Bayes was already a minister. But it nonetheless would represent a significant philosophical shift.

Three possible ways of explaining this tension occur to me. The first is that Bayes’ views did indeed shift, and that he came to the conclusion that mathematical questions really can have theological relevance. The second is that Price was simply using the publication of Bayes’ paper as an opportunity to push his own philosophical agenda. And the third is that Bayes’ argument in the preface to his defense of Newton’s calculus was not a global epistemological claim, as Gould’s was, but a local intervention without bearing on other possible links between mathematical and theological debates.

That third possibility intrigues me the most. I have no particular interest in developing an argument for the existence of a creator deity. But suppose Bayes did feel that while questions about the foundation of calculus had no relevance to theology, his work on probability did. That suggests to me that even when it was first proposed, Bayesian probability had a kind of philosophical oomph that mathematical theorems don’t always have.

Price himself would go on to write a number of influential works, including one on the statistics of annuities and life insurance. Apparently Edmund Burke thought this was an unseemly topic for a member of the clergy, and gave Price a nickname: “the calculating divine.” Some of the actuarial tables Price created, the so-called Northampton tables, were in wide use for almost a century. At the same time, Price was an influential moral philosopher; his Review of the Principal Questions and Difficulties in Morals probably influenced William Godwin’s Enquiry Concerning Political Justice. This leaves us with a final puzzle: one might expect an actuary and statistician to be an empiricist — a skeptic like Hume for whom no evidence is fully persuasive. But Price argued against the mainstay of empiricist moral philosophy, moral sense theory. Like Godwin, Price was a moral rationalist.

  1. This requires you to replace the apples, so bite gingerly. Also, as a non-apple-bobber, I have no idea how the size of an apple affects its… um… bobbability. I take it for granted here that large and small apples are equally bobbable. 
  2. See Bellhouse, “The Reverend Thomas Bayes, FRS: A Biography to Celebrate the Tercentenary of His Birth,” Statistical Science 2004, 19:1, 3–43. The work is attributed to James Hodgson in one Google Books copy, but I know of no reason to doubt the usual attribution represented here. However, if you want to actually read it, I recommend the misattributed copy; the scan quality is much higher.