A Random Entry

There’s a way of telling a history of the digital humanities that does not follow the well-known trajectory from Father Busa’s Index Thomisticus, Mosteller and Wallace’s study of the Federalist Papers, and the Text Encoding Initiative to Distant Reading, data mining, and the present day. It does not describe the slow transformation of a once-peripheral field into an increasingly mainstream one. Instead, it describes a series of missed opportunities.

It’s a polemical history that inverts many unspoken assumptions about the relationship between the humanities and the sciences. I’m not sure I entirely believe it myself. But I think it’s worth telling.

It starts like this: there once was a guy named Frank Rosenblatt. In 1957, Rosenblatt created the design for a device he called the perceptron. It was an early attempt at simulating the behavior of networks of biological neurons, and it initially provoked a frenzy of interest, including the following New York Times report:

The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.

Needless to say, the perceptron never managed to do any of those things. But what it did do was exceptional in its own way. It used to great practical effect the following insight: that many small, inaccurate rules can be combined in simple ways to create a larger, more accurate rule. This insight is now central to statistical learning theory.1 But it was not understood as particularly important at the time.
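
To make that insight concrete, here is a minimal sketch (in present-day Python, not Rosenblatt’s hardware or notation) of a perceptron-style classifier: each of the small “rules” below is a crude one-feature threshold test, and the learning procedure simply adjusts how heavily each rule counts in a combined vote. The toy task, thresholds, and learning rate are all invented for illustration.

```python
# A toy perceptron in the spirit of Rosenblatt's insight: combine many small,
# individually unreliable rules into one more accurate rule by learning how
# much to trust each of them. Everything here (task, thresholds, learning
# rate) is invented for illustration.

import random

random.seed(0)

# Toy task: is the point (x, y) above the line y = x?
def label(point):
    x, y = point
    return 1 if y > x else -1

# Small, inaccurate rules: each one checks a single coordinate against a
# single threshold, so none of them is very accurate on its own.
rules = [lambda p, t=t: 1 if p[1] > t else -1 for t in (-0.5, 0.0, 0.5)]
rules += [lambda p, t=t: 1 if p[0] < t else -1 for t in (-0.5, 0.0, 0.5)]

data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
weights = [0.0] * len(rules)
bias = 0.0

def combined(point):
    """The perceptron's prediction: a weighted vote over all the small rules."""
    score = bias + sum(w * r(point) for w, r in zip(weights, rules))
    return 1 if score > 0 else -1

# Perceptron learning rule: when the combined vote is wrong, shift weight
# toward the rules that voted correctly and away from those that didn't.
for _ in range(20):
    for point in data:
        error = label(point) - combined(point)   # 0 when correct, +/-2 when wrong
        if error:
            for i, r in enumerate(rules):
                weights[i] += 0.1 * error * r(point)
            bias += 0.1 * error

single = [sum(r(p) == label(p) for p in data) / len(data) for r in rules]
print("individual rules:", [round(a, 2) for a in single])
print("combined rule:   ",
      round(sum(combined(p) == label(p) for p in data) / len(data), 2))
```

On most runs the combined rule comes out noticeably more accurate than any of the individual rules, which is the whole trick.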

In fact, when people began to realize that the perceptron in its simplest form was limited, a backlash ensued. Marvin Minsky and Seymour Papert wrote a book called Perceptrons that enumerated the limits of simple, single-layer perceptrons2; people misunderstood the book’s arguments as applying to all neural networks; and the early promise of perceptrons was forgotten.

This turn of events may have delayed the emergence of interesting machine learning technologies by a couple of decades. After the perceptron backlash, artificial intelligence researchers focused on using logic to model thought — creating ever more complex sets of logical rules that could be combined to generate new rules within a unified and coherent system of concepts. This approach was closely related to the kinds of transformational grammars that Noam Chomsky has been exploring since the 1950s, and it largely displaced statistical approaches — with a few exceptions — until the 1990s.

Unsurprisingly, Chomsky remains hostile to statistical and probabilistic approaches to machine learning and artificial intelligence. Nonetheless, there does seem to be some evidence that those approaches have gotten something right. Peter Norvig offers the following summary:

Chomsky said words to the effect that statistical language models have had some limited success in some application areas. Let’s look at computer systems that deal with language, and at the notion of “success” defined by “making accurate predictions about the world.” First, the major application areas:

  • Search engines: 100% of major players are trained and probabilistic. Their operation cannot be described by a simple function.
  • Speech recognition: 100% of major systems are trained and probabilistic…
  • Machine translation: 100% of top competitors in competitions such as NIST use statistical methods…
  • Question answering: this application is less well-developed, and many systems build heavily on the statistical and probabilistic approach used by search engines…

Now let’s look at some components that are of interest only to the computational linguist, not to the end user:

  • Word sense disambiguation: 100% of top competitors at the SemEval-2 competition used statistical techniques; most are probabilistic…
  • Coreference resolution: The majority of current systems are statistical…
  • Part of speech tagging: Most current systems are statistical…
  • Parsing: There are many parsing systems, using multiple approaches. Almost all of the most successful are statistical, and the majority are probabilistic…

Clearly, it is inaccurate to say that statistical models (and probabilistic models) have achieved limited success; rather they have achieved a dominant (although not exclusive) position.

In the past fifteen years, these approaches to machine learning have produced a number of substantial leaps forward — consider Google’s famous creation of a neural network that (in at least some sense) reinvented the concept of “cat,” or this recurrent neural network capable of imitating various styles of human handwriting. These extraordinary successes have been made possible by a dramatic increase in computing power. But without an equally dramatic shift in ways of thinking about what constitutes knowledge, that increase in computing power would have accomplished far less. What has changed is that the people doing the math have stopped trying to find logical models of knowledge by hand, and have started trying to find probabilistic models of knowledge — models that embrace heterogeneity, invite contradiction, and tolerate or even seek out ambiguity and uncertainty. As machine learning researchers have discovered, the forms these models take can be defined with mathematical precision, but the models themselves tolerate inconsistencies in ways that appear to be unbound by rigid logic.3

I’d like to suggest that by embracing that kind of knowledge, computer scientists have started walking down a trail that humanists were blazing fifty years ago.

The kind of knowledge that these machines have does not take the form of a rich, highly structured network of immutable concepts and relations with precise and predictable definitions. It takes the form of a loose assembly of inconsistent and mutually incompatible half-truths, always open to revision and transformation, and definable only by the particular distinctions it can make or elide at any given moment. It’s the kind of knowledge that many literary scholars and humanists have found quite interesting for the last few decades.

Since the decline of structuralism, humanists have been driven by a conviction that the loosely or multiply structured behaviors that constitute human culture produce important knowledge that cannot be produced in more structured ways. Those humanities scholars who remained interested in structured ways of producing knowledge — like many of the early practitioners of humanities computing — were often excluded from conversations in the humanistic mainstream.

Now something has changed. The change has certainly brought computational methods closer to the mainstream of the humanities. But we mustn’t misread that change by imagining that humanists have somehow adopted a new scientism. A better explanation is that computer scientists, as they have learned to embrace the kinds of knowledge produced by randomness, have reached a belated understanding of the value of — dare I say it? — post-structuralist ways of knowing.

It’s a shame it didn’t happen earlier.


  1. I first encountered the above formulation of this idea in the first video in Geoffrey Hinton’s online course on neural networks. But you can see it being used by other researchers (p. 21) working in machine learning on a regular basis. 
  2. In machine learning lingo, it could not learn nonlinear decision boundaries. It didn’t even have the ability to calculate the logical XOR operation on two inputs, which at the time probably made logic-based approaches look far more promising. (See the short sketch after these notes.) 
  3. I say “appear” because it’s not entirely clear what it would mean to be unbound by rigid logic. The mathematical formulation of machine learning models is itself perfectly strict and internally consistent, and if it weren’t, it would be irreparably broken. Why don’t the statistical models represented by that formulation break in the same way? I suspect that it has something to do with the curse (or rather, the blessing) of dimensionality. They don’t break because every time a contradiction appears, a new dimension appears to accommodate it in an ad-hoc way — at least until the model’s “capacity” for such adjustments is exhausted. I’m afraid I’m venturing a bit beyond my existential pay grade with these questions — but I hope this sliver of uncertainty doesn’t pierce to the core of my argument. 
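
The sketch promised in note 2, for the curious: a single threshold unit (one layer of adjustable weights) cannot compute XOR, because no straight line separates the inputs that should answer “yes” from those that should answer “no”; adding one intermediate layer of units removes the obstacle. The weights below are hand-picked to make the point, not learned.

```python
# XOR and the single-layer perceptron (note 2). The weights are hand-wired
# for illustration; a learning rule is beside the point here.

def unit(inputs, weights, bias):
    """One threshold neuron: fire (1) iff the weighted sum plus bias exceeds zero."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def one_layer(x1, x2):
    # A single unit cannot compute XOR for any choice of weights; these
    # particular weights compute OR, which already fails on the input (1, 1).
    return unit((x1, x2), weights=(1, 1), bias=-0.5)

def two_layer(x1, x2):
    h1 = unit((x1, x2), weights=(1, 1), bias=-0.5)    # hidden unit: OR
    h2 = unit((x1, x2), weights=(-1, -1), bias=1.5)   # hidden unit: NAND
    return unit((h1, h2), weights=(1, 1), bias=-1.5)  # output: AND(OR, NAND) = XOR

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "| xor:", x1 ^ x2,
          "| one layer:", one_layer(x1, x2),
          "| two layers:", two_layer(x1, x2))
```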

7 Responses

  1. “This turn of events may have delayed the emergence of interesting machine learning technologies by a couple of decades.” Well, I don’t know. Given that, as you point out, these machine learning technologies require a lot of computing “horsepower” I’m not sure they would have been very practical much earlier than they were in fact adopted. And when Stanley Fish was critiquing computational stylistics back in the 1970s, he wasn’t criticising symbolic AI, he was criticizing statistical methodology. But then he also criticised MAK Halliday’s (non-computational) linguistic analysis at the same time. What bothered Fish was mechanism; the difference between symbolic and statistical was irrelevant. At the same time, many in AI and cognitive science were aware of the limitations of logic in the mid-1970s, if not earlier.

    It seems to me that you paper over a lot of territory in footnote 3. It’s not as though the computationalists have given up on formalism. As you point out, they haven’t. But it’s not at all clear to me that “mainstream” humanists are prepared to move beyond discursive modes of thought expressed entirely in natural language free of graphs, charts, diagrams, math of any kind, much less computer programs running in the Other Room.

    Moreover I note that the perceptron was proposed as a way of thinking about, well, perception. And a lot of people in AI and cognitive science think that the statistical techniques they’ve come to use are about how the mind works. But today’s computational humanists for the most part scrupulously avoid any hint that computation (in any form) might have something to say about the mind. They’re using these techniques because they produce useful results. The idea that minds might, in some measure, work like this is of no interest to them.

    It seems to me you’ve got an argument based on broad intellectual themes, which is one thing. But the themes themselves are rather distant – to use that pesky spatial schematization – from the working details of intellectual practice by humanists and AI researchers/cognitive scientists.

    1. Well — my post was definitely a broad-brush sketch! Thank you for this response — it’s really insightful.

      I actually love the way you formulate things in that last paragraph. I often find that those working details defy conventional narratives about the development and history of different fields. But those narratives have force, and I think collaboration between fields is often hindered by high-level narratives that are suggestive of “disciplinary colonization” and the like. So part of my goal was to suggest a counter-narrative — not necessarily a more accurate one than the colonial narrative told in, for example, Suvir Kaul’s reply to Herbert Simon — but one that could open up imaginative space for balanced collaboration between humanists and technologists.

      A few specific replies:

      “This turn of events may have delayed the emergence of interesting machine learning technologies by a couple of decades.” Well, I don’t know. Given that, as you point out, these machine learning technologies require a lot of computing “horsepower” I’m not sure they would have been very practical much earlier than they were in fact adopted.

      They wouldn’t have been practical, no question. But this still feels to me like a big missed opportunity! My real interest in all of this is statistical learning theory, which didn’t require big CPUs to develop — it’s purely theoretical. I think it pairs nicely with many humanistic assumptions. And unless I’m mistaken, it didn’t really get going until statistical approaches went mainstream. (One of the foundational results, the No Free Lunch theorem, didn’t arrive until 1997.) I still feel fairly certain that the backlash against perceptrons is at least partially responsible for that delay. So here, it’s not a question of individual practitioners for me so much as a question of large-scale funding patterns.

      And when Stanley Fish was critiquing computational stylistics back in the 1970s, he wasn’t criticising symbolic AI, he was criticizing statistical methodology. But then he also criticised MAK Halliday’s (non-computational) linguistic analysis at the same time. What bothered Fish was mechanism; the difference between symbolic and statistical was irrelevant. At the same time, many in AI and cognitive science were aware of the limitations of logic in the mid-1970s, if not earlier.

      Yes — this is a part of the above narrative that is thin. The unfortunate fact is that while many humanistic critiques were sound (where applied appropriately), they were offered and taken as blanket critiques, as you say, of mechanism. I think part of the problem here is that humanists didn’t have the background knowledge to clearly recognize and embrace those developments that were actually promising. So here, maybe I’ve gone a little too easy on the humanities.

      It seems to me that you paper over a lot of territory in footnote 3. It’s not as though the computationalists have given up on formalism. As you point out, they haven’t. But it’s not at all clear to me that “mainstream” humanists are prepared to move beyond discursive modes of thought expressed entirely in natural language free of graphs, charts, diagrams, math of any kind, much less computer programs running in the Other Room.

      I have mixed feelings about this one. I’d say this is a transitional moment. At least here at Penn, there seem to be a lot of people with an open mind about charts, and so on — especially younger humanists, and especially for teaching purposes. But you’re right that there’s still a lot of disciplinary bridge-building to be done at this point of contact. This connects to the following point about “how the mind works” — I think that there might indeed be some humanists who want to go in this direction. But the path that’s most comfortable (I think) is a top-down one — starting with practical results about large-scale phenomena that are recognizable given current disciplinary knowledge. These are areas where literary scholars, at least, can propose and test new hypotheses quickly, without having to worry about how “the mind” actually works.

      And I think that’s the way it should be. Starting from “the mind” is a problem in at least two ways. First, it evokes the kinds of colonial narratives that I mention above, because it implies a hierarchy of “purity” that is not appealing. And it also has the problem that we appear to be talking about The Mind — not the many different human minds in the world, all of which work a little differently! Personally, I’m tremendously interested in minds, but not very interested in The Mind at all. At some point, there might be some generalizations about The Mind that humanists find useful and valid, but so far, I haven’t seen anything. We just don’t know enough about The Mind for that knowledge to be especially useful to humanists (with a few exceptions, some of which were actually driven by literary scholars — Mark Turner comes to mind — Edit (2016-5-17): and Elaine Scarry!).

      Eventually, I think we’ll see some convergence between these bottom-up and top-down approaches, but it’s going to be impossible to get to questions of interest in the humanities using the bottom-up approach. And we shouldn’t have to try! This is where my strongest disciplinary partisanship comes out. We don’t need to have perfect atomic accounts of how culture works to produce sound knowledge.

      1. Thanks for your response, Scott. “But those narratives have force, and I think collaboration between fields is often hindered by high-level narratives that are suggestive of “disciplinary colonization” and the like.” Yes. And Simon’s piece on lit crit was unfortunate in various ways.

        I could go on and on about this stuff, as I’ve been thinking about it for decades. FWIW I came of intellectual age in the late 1960s into the 1970s when structuralism was giving way to deconstruction etc. and symbolic computation was beginning to stutter. You might be interested in this crude chronology in which I ran cognitive science in parallel with literary theory. You might also want to look at this working paper where, among other things, I have a brief account of the 1970s in lit crit and computational linguistics (pp. 8 ff.). Finally, I leave you with a passage from Peter Gärdenfors (Conceptual Spaces 2000, p. 253) on computing:

        On the symbolic level, searching, matching of symbol strings, and rule following are central. On the subconceptual level, pattern recognition, pattern transformation, and dynamic adaptation of values are some examples of typical computational processes. And on the intermediate conceptual level, vector calculations, coordinate transformations, as well as other geometrical operations are in focus. Of course, one type of calculation can be simulated by one of the others (for example, by symbolic methods on a Turing machine). A point that is often forgotten, however, is that the simulations will, in general, be computationally more complex than the process that is simulated.

        Seems related to No Free Lunch, no?

        1. Re: NFL — yes — I’ll have to take a look at that.

          I love the long view of the field that both of your pieces offer, and the way they diverge from so many conventional histories of DH. I’m going to stew on all this for a while but I bet I’ll have more to say afterwards — thank you!

          1. I suppose the point is that I don’t see DH as a (quasi)autonomous formation but as something that has emerged in a long process of intellectual growth and change. I’ll be interested in your thoughts after you’ve had time to think this through.

  2. Here’s an interesting (and recent) article that speaks to statistical thought in linguistics: The Unmaking of a Modern Synthesis: Noam Chomsky, Charles Hockett, and the Politics of Behaviorism, 1955–1965, by Gregory Radick (abstract below). Commenting on it at Dan Everett’s FB page, Yorick Wilks observed: “It is a nice irony that statistical grammars, in the spirit of Hockett at least, have turned out to be the only ones that do effective parsing of sentences by computer.”

    Abstract: A familiar story about mid-twentieth-century American psychology tells of the abandonment of behaviorism for cognitive science. Between these two, however, lay a scientific borderland, muddy and much traveled. This essay relocates the origins of the Chomskyan program in linguistics there. Following his introduction of transformational generative grammar, Noam Chomsky (b. 1928) mounted a highly publicized attack on behaviorist psychology. Yet when he first developed that approach to grammar, he was a defender of behaviorism. His antibehaviorism emerged only in the course of what became a systematic repudiation of the work of the Cornell linguist C. F. Hockett (1916–2000). In the name of the positivist Unity of Science movement, Hockett had synthesized an approach to grammar based on statistical communication theory; a behaviorist view of language acquisition in children as a process of association and analogy; and an interest in uncovering the Darwinian origins of language. In criticizing Hockett on grammar, Chomsky came to engage gradually and critically with the whole Hockettian synthesis. Situating Chomsky thus within his own disciplinary matrix suggests lessons for students of disciplinary politics generally and—famously with Chomsky—the place of political discipline within a scientific life.
