Brownian Noise and Plot Arcs

Figure: a tenth of a second of Brownian noise in PCM format. Large samples of Brownian noise give results similar to those reported by Jockers and by Reagan et al.

A couple of months ago, a research group released a paper on the arXiv titled “The emotional arcs of stories are dominated by six basic shapes.” In it, they replicate results similar to those first described by Matt Jockers, using a somewhat different technique.

I’ve written a Jupyter notebook that raises doubts about their argument. They claim that their work has shown that there are “basic shapes” that dominate human stories, but the results they’ve described provide no basis for such generalizations. Given what we know so far, it’s much more likely that the emotional arcs that these techniques reveal are, in general, noise. The notebook is available for examination and reuse as a GitHub repository.
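To make that concrete, here is a minimal, freestanding sketch of the kind of demonstration the notebook makes (this is not the notebook’s own code; the text length, window size, and moving-average smoother are arbitrary choices for illustration): generate word-level “sentiment” as a pure random walk and smooth it, and the result rises and falls in a few broad movements that read like an emotional arc.

```python
# Minimal sketch: an "emotional arc" extracted from pure Brownian noise.
# All parameters below are illustrative, not taken from the notebook or the papers.
import numpy as np

rng = np.random.default_rng(0)

n_words = 50_000                    # a novel-length run of word-level scores
kicks = rng.normal(size=n_words)    # independent random "kicks"
walk = np.cumsum(kicks)             # Brownian noise: the running sum of the kicks

window = 5_000                      # arbitrary smoothing window
kernel = np.ones(window) / window
arc = np.convolve(walk, kernel, mode="valid")   # moving-average "arc"

# The smoothed curve shows only a few broad rises and falls, even though
# there is no plot, no sentiment, and no text behind it.
print(arc.size, round(arc.min(), 1), round(arc.max(), 1))
```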

What does it mean to say that these plot shapes are “noise”? The notebook linked above focuses on technical issues, but I want to write a few brief words here about the broader implications my argument has — if it’s correct. Initially, it may seem that if sentiment data is “noise,” then these measurements must be entirely meaningless. And yet many researchers have now done at least some preliminary work validating these measurements against responses from human readers. Jockers’ most recent work shows fairly strong correlations between human sentiment assessments and those produced by his Syuzhet package. If these sentiment measurements are meaningless, does that mean that human assessments are meaningless as well?

That conclusion does not sit well with me, and I think it is based on an incorrect understanding of the relationship between noise and meaning. In fact, according to one point of view, the most “meaningful” data is precisely the most random data, since maximally random data is in some sense maximally dense with information — provided one can extract the information in a coherent way. Should we find that sentiment data from novels does indeed amount to “mere noise,” literary critics will have some very difficult questions to ask themselves about the conditions under which noise signifies.
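One small way to make that information-theoretic point concrete, using compressibility as a rough stand-in for information density (the byte strings below are made up for illustration): a maximally random sequence is incompressible, while a highly patterned sequence of the same length shrinks to almost nothing.

```python
# Compressibility as a rough proxy for information density.
import os
import zlib

random_bytes = os.urandom(10_000)   # maximally random: every byte is a surprise
patterned_bytes = b"ab" * 5_000     # highly structured, same length

print(len(zlib.compress(random_bytes)))     # roughly 10,000 bytes: nothing to squeeze out
print(len(zlib.compress(patterned_bytes)))  # a few dozen bytes
```

The hard part, of course, is the proviso above: extracting that information in a coherent way.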

6 Responses

  1. Great work here.

    One minor point: you say

    They report using a window of 10,000 words, and they also state that some of their texts are only 10,000 words long. My instinct says that the majority of their texts must have been at least 50,000 words long, but without looking into it, we can’t be certain; they don’t seem to have released that information in an easy-to-investigate way. If the average length of texts in their corpus is only 20,000 words, we should probably disregard their results entirely.

    I think you can roughly read this off the various plots on page S2; the (log) average length that meets their length criterion is about 63,000 words, give or take; if you assume there’s no strong relationship between downloads and length (there doesn’t seem to be in panel 3), then that’s probably the average length for their corpus.

    1. Ah, yes! I think I saw that early on and then couldn’t find it later, when an interlocutor was pointing out the stated 10k minimum. I was convinced that most of the texts had to be >50k words long but couldn’t remember why! So I backed down on that point. I’ll have to finesse that section. (A toy sketch of the window-versus-length issue follows the comments below.)

  2. Another way to cast the significance of noise is not about meaninglessness or information but about unpredictability. At any given moment in a novel, do you know whether sentiment is about to get more or less positive on the next page? Subjectively, I don’t think I do.

    This would in fact make for a useful additional experiment in this space. (Matt, are you reading???) If readers can forecast sentiment as well as reproduce it, that suggests that modeling has some cognitive behaviors it can reproduce. If they can’t, it suggests that there’s an efficient markets hypothesis for plots. And I can handwave an explanation for why that might be: maybe the whole thing that makes narrative compelling is that we don’t know what’s going to happen–that even when the mood gets ominous as the “Jaws” theme starts playing, we adjust for the darkness and then hope that this is the scene they kill the shark in.

    My armchair wager would be that people are quite bad at forecasting except in certain constrained situations, particularly around beginnings and ends of novels.

    1. The “efficient plots” hypothesis is pretty wonderful! While pondering this, I’ve been thinking about unpredictability in very different terms — as an environmental effect on an unconstrained “author particle” in n-space; environmental noise produces the same kind of “kicks” that produce Brownian motion in good old-fashioned physics. But that seemed just too elaborate and strange to build an argument from. This is way more concrete.

      It’s also a readerly rather than a writerly explanation, if that makes sense. The “Homo Narrativus” stuff strongly suggests (without saying it, of course) a fixed biological behavioral constraint, and in a sense what I was trying to argue is that writers are essentially unconstrained in the kinds of plots they produce, perhaps with the exception that we can expect some local correlation — e.g. if page 55 is very sad, page 56 won’t be very happy most of the time — or, more abstractly, high-frequency movements won’t have high amplitude. But casting this in terms of reception is more natural, especially since that’s what Jockers is validating, and since that’s essentially what most sentiment dictionaries will be measuring. (But that suggests an interesting aside: is there a dictionary produced by asking people to write something that they think matches a fixed sentiment rating? Would it look different from the ones we have now?)

      This way of thinking also suggests that there really ought to be a genre signal here. I am more confident in my ability to predict the sentiment of the next page of a work of sharply defined genre fiction than of a less, well, generic work.
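As promised above, here is a toy sketch of the window-versus-length issue from the first exchange (the lengths and the moving-average smoother are assumptions for illustration, not Reagan et al.’s pipeline): a 10,000-word window leaves almost nothing to look at when a text is barely longer than the window.

```python
# Toy illustration of the window-size worry: a 10,000-word moving average
# applied to stand-in sentiment series of different lengths.
import numpy as np

rng = np.random.default_rng(1)
window = 10_000
kernel = np.ones(window) / window

for n_words in (10_000, 20_000, 63_000):
    # Stand-in word-level sentiment: a random walk, per the post's argument.
    sentiment = np.cumsum(rng.normal(size=n_words))
    arc = np.convolve(sentiment, kernel, mode="valid")  # smoothed "emotional arc"
    print(f"{n_words:>6} words -> {arc.size:>6} smoothed values, "
          f"~{n_words // window} non-overlapping windows")
```

At 10,000 words the “arc” collapses to a single value; even at roughly 63,000 words there are only about six non-overlapping windows behind the curve.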
