The Size of the Signifier

What is a feature?

It’s worth thinking about the way machine learning researchers use the word “feature”. They speak of “feature selection,” “feature engineering,” and even “automated feature learning.” These processes generally produce a structured body of input — a set of “feature vectors” that machine learning algorithms use to make predictions. But the definition of “feature” is remarkably loose. It amounts to something like “a value we can measure.”

[Figure: Convolutional neural network feature visualization. Three levels of features learned by a convolutional neural network. From Lee, Grosse, Ranganath, and Ng, “Convolutional Deep Belief Networks.”]

People unfamiliar with machine learning might imagine that in the context of (say) face recognition, a “feature” would be the computational equivalent of “high cheekbones” or “curly hair.” And they wouldn’t be far off the mark to think so, in a way — but they might be surprised to learn that often, the features used by image recognition software are nothing more than the raw pixels of an image, in no particular order. It’s hard to imagine recognizing something in an image based on pixels alone. But for many machine learning algorithms, the measurements that count as features can be so small as to seem almost insignificant.
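To make the "raw pixels, in no particular order" point concrete, here is a minimal sketch. It is not taken from any real image recognition system: the 3x3 toy images, the labels, and the helper names (`flatten`, `nearest_label`, `scramble`) are all invented for illustration, and a nearest-neighbor comparison stands in for whatever algorithm one might actually use. It shows only that a simple distance-based classifier gives the same answer when every image's pixels are shuffled in the same arbitrary way.

```python
import random

def flatten(image):
    """Turn a 2D grid of pixel values into a flat feature vector."""
    return [pixel for row in image for pixel in row]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_label(examples, features):
    """Return the label of the stored example closest to `features`."""
    return min(examples, key=lambda ex: squared_distance(ex[1], features))[0]

# Invented 3x3 toy images: 1 is a dark pixel, 0 is a light one.
vertical = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
horizontal = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
query = [[0, 1, 0], [0, 1, 0], [0, 0, 0]]  # a vertical bar missing one pixel

examples = [("vertical", flatten(vertical)), ("horizontal", flatten(horizontal))]

# Pick one arbitrary pixel ordering and apply it to every image alike.
order = list(range(9))
random.Random(0).shuffle(order)

def scramble(features):
    """Reorder a feature vector according to the shared shuffled order."""
    return [features[i] for i in order]

scrambled_examples = [(label, scramble(f)) for label, f in examples]

original_guess = nearest_label(examples, flatten(query))
scrambled_guess = nearest_label(scrambled_examples, scramble(flatten(query)))
print(original_guess)   # prints: vertical
print(scrambled_guess)  # prints: vertical
```

The order of the pixels carries no weight here because the distance is a sum over positions, and a sum does not care which position comes first. What matters is only that the same position means the same thing in every image.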

This is possible because researchers have found ways to train computers to assemble higher-level features from lower-level ones. We do the same thing ourselves, in our own way; we just don’t have conscious access to the raw data from our retinas. The features that we consciously recognize are higher-level features. But those had to be assembled too, and that’s part of what our brain does.[1] So features at one level are composites of smaller features. Those composite features might be composed to form even larger features, and so on.

When do things stop being features?

One answer is that they stop when we stop caring about their feature-ness, and start caring about what they mean. For example, let’s change our domain of application and talk about word counts. Word counts are vaguely interesting to humanists, but they aren’t very interesting until we start to build arguments with them. And even then, they’re just features. Suppose we have an authorship attribution program. We feed passages labeled with their authors into the program; it counts the words in the passages and does some calculations. Now the program can examine texts it hasn’t seen before and guess the author. In this context, that looks like an interpretive move: the word counts are just features, but the guess the program makes is an interpretation.
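The authorship attribution program described above can be sketched in a few lines. This is a toy under stated assumptions, not a description of any real attribution system: the passages and the author labels ("Author A", "Author B") are invented, the function names are hypothetical, and cosine similarity between word-count vectors stands in for the unspecified "calculations."

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Count the words in a passage. These counts are the features."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def train(labeled_passages):
    """Build one combined word-count profile per author."""
    profiles = {}
    for author, passage in labeled_passages:
        profiles.setdefault(author, Counter()).update(word_counts(passage))
    return profiles

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def guess_author(profiles, text):
    """Return the author whose profile best matches the unseen text."""
    counts = word_counts(text)
    return max(profiles, key=lambda author: cosine(profiles[author], counts))

# Invented training passages, labeled with invented authors.
profiles = train([
    ("Author A", "the sea the sea and the ship upon the sea"),
    ("Author B", "of course she said of course it is of no matter"),
])

guess = guess_author(profiles, "the ship and the sea")
print(guess)  # prints: Author A
```

Everything up to `guess_author` deals in features: counts, sums, and ratios. Only the returned name looks like a claim about the text, and even that claim is nothing more than the largest of a handful of similarity scores.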

We can imagine another context in which the author of a text is a feature of that text. In such a context, we would be interested in some other property of the text: a higher-level property that depends somehow on the identity of the text’s author. But that property would not itself be a feature — unless we keep moving up the feature hierarchy. This is an anthropocentric definition of the word “feature”: it depends on what humans do, and so gives up some generality. But for that same reason, it’s useful: it shows that we determine by our actions what counts as a feature. If we try to force ourselves into a non-anthropocentric perspective, it might start to look like everything is a feature, which would render the word altogether useless.

I think this line of reasoning is useful for thinking through this moment in the humanities. What does it mean for the humanities to be “digital,” and when will the distinction fade? I would guess that it will have something to do with coming shifts in the things that humanists consider to be features.

In my examples above, I described a hierarchy of features, and without saying so directly, suggested that we should want to move up that hierarchy. But I don’t actually think we should always want that — quite the opposite. It may become necessary to move back down a level or two at times, and I think this is one of those times. Things that used to be just features are starting to feel more like interpretations again. This is how I’m inclined to think of the idea of “Surface Reading” as articulated by Stephen Best and Sharon Marcus a few years ago. The metaphor of surface and depth is useful; pixels are surface features, and literary pixels have become interesting again.

Why should that be so? When a learning algorithm isn’t able to answer a given question, sometimes it makes sense to keep using the same algorithm, but return to the data to look for more useful features. I suspect that this approach is as useful to humans as to computers; in fact, I think many literary scholars have adopted it in the last few years. And I don’t think it’s the first time this has happened. Consider this observation that Best and Marcus make about surface reading in that article:

This valorization of surface reading as willed, sustained proximity to the text recalls the aims of New Criticism, which insisted that the key to understanding a text’s meaning lay within the text itself, particularly in its formal properties.

From a twenty-first century perspective, it’s sometimes tempting to see the New Critics as conservative or even reactionary, as rejecting social and political questions as unsuited to the discipline. But if the analogy I’m drawing is sound, then there’s a case to be made that the New Critics were laying the groundwork necessary to ask and answer those very questions.[2]

In his recent discussion of the new sociologies of literature, Ted Underwood expresses concern that “if social questions can only be addressed after you solve all the linguistic ones, you never get to any social questions.” I agree with Underwood’s broader point — that machine learning techniques are allowing us to ask and answer social questions more effectively. But I would flip the argument on its head. The way we were answering linguistic questions before was no longer helping us answer social questions of interest. By reexamining the things we consider features — by reframing the linguistic surfaces we study — we are enabling new social questions to be asked and answered.

  1. Although neural networks can recognize images from unordered pixels, it does help to impose some order on them. One way to do that is to use a convolutional neural network. There is evidence that the human visual cortex, too, has a convolutional structure.
  2. In the early twentieth century, the relationships between physics, chemistry, biology, and psychology were under investigation. The question at hand was whether such distinctions were indeed meaningful: were chemists and biologists effectively “physicists in disguise”? A group of philosophers sometimes dubbed the “British Emergentists” developed a set of arguments defending the distinction, arguing that some kinds of properties are emergent, and cannot be meaningfully investigated at the atomic or chemical level. It seems to me that linguistics, literary study, and sociology have a similar relationship; linguists are the physicists, literary scholars the chemists, and sociologists the biologists. We all study language as a social object, differing more in the levels at which we divide things into features and interpretations: morpheme, text, or field. And in this schema, I think the emergent relationships do not “stack” on top of psychology and go up from there. That would suggest that sociology is more distant from psychology than linguistics is. But I don’t think that’s true! All three fields seem to me to depend on psychology in similar ways. The emergent relationships between them are orthogonal to emergent relationships in the sciences.