The Radical Potential of RDF Dimension Reduction?

Note (2016-06-03): This revises an earlier post by removing some dated information and expanding the conclusion.

This brief post was inspired by the abstract for a talk by Hanna Wallach at the fourth IPAM Culture Analytics workshop. I didn’t even attend it, so take the first part of this with a grain of salt! In her talk, Wallach discussed a new Bayesian dimension-reduction technique that operates over timestamped triples. In other words — as she and her co-authors put it in the paper summarizing their research — “records of the form ‘country i took action a toward country j at time t’ — known as dyadic events.”

The abstract as well as the paper framed this as a way of analyzing international relations. But what struck me immediately about this model is that it works with data that could be represented very naturally as RDF triples (if you add in a timestamp, that is). That means that this method might be able to do for RDF triples what topic modeling does for texts.

This probably seems like an odd thing to care about if you haven’t read Miriam Posner’s keynote on the radical potential of DH together with Matthew Lincoln’s response. In her keynote, Posner poses a question to practitioners of DH: why do we so willingly accept data models that rely on simplistic categories? She observes, for example, that the Getty’s Union List of Artist Names relies on a purely binary model of gender. But this is strangely regressive in the wider context of the humanities:

no self-respecting humanities scholar would ever get away with such a crude representation of gender… So why do we allow widely shared, important databases like ULAN to deal so naively with identity?

She elaborates on this point using the example of context-dependent racial categories:

a useful data model for race would have to be time- and place-dependent, so that as a person moved from Brazil to the United States, she might move from white to black. Or perhaps the categories themselves would be time- and place-dependent, so that certain categories would edge into whiteness over time. Or! Perhaps you could contrast the racial makeup of a place as the Census understands it with the way it’s articulated by the people who live there.

Matt Lincoln’s brilliant response takes this idea and gives it a concrete computational structure: RDF. Rather than having fixed categories of race, we can represent multiple different conceptualizations of race within the same data structure. The records of these conceptualizations take the form of {Subject, Verb, Object} triples, which can then form a network:

A diagram of a network of perceived racial categories.

Given that Posner’s initial model included time as well, adding timestamps to these verbs seems natural, even if it’s not, strictly speaking, included in the RDF standard. (Or is it? I don’t know RDF that well!) But once we have actors, timestamped verbs, and objects, then I think we can probably use this new dimension reduction technique on networks of this kind.[1]
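To make the shape of the data a little more concrete, here is a minimal Python sketch of how timestamped triples might be aggregated into the kind of count array that, as far as I can tell, dyadic-event models take as input: one axis for the actor, one for the target, one for the action, and one for the time step. Everything here is an assumption for illustration. The `Event` type, the `count_events`, `index`, and `to_dense` helpers, and the sample labels are all hypothetical, and the sketch stops short of the Bayesian model itself, which would then reduce this array to a small number of latent components. Because actors and objects are of the same kind in the model, subjects and objects share a single actor index.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical timestamped triple: subject did verb toward obj at time."""
    subject: str
    verb: str
    obj: str
    time: int  # assumed to be a discrete time step, e.g. a year


def count_events(events: Iterable[Event]) -> Counter:
    """Count how often each actor directed each verb at each target per time step.

    Keys follow the "i took action a toward j at time t" pattern,
    stored as (subject, obj, verb, time).
    """
    return Counter((e.subject, e.obj, e.verb, e.time) for e in events)


def index(labels: Iterable) -> Dict:
    """Assign each distinct label a fixed integer position along one axis."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def to_dense(counts: Counter):
    """Lay the sparse counts out as a nested actor x actor x verb x time array."""
    # Subjects and objects share one index, since both are actors.
    actors = index([s for s, _, _, _ in counts] + [o for _, o, _, _ in counts])
    verbs = index(v for _, _, v, _ in counts)
    times = index(t for _, _, _, t in counts)
    tensor = [[[[0] * len(times) for _ in verbs] for _ in actors] for _ in actors]
    for (s, o, v, t), n in counts.items():
        tensor[actors[s]][actors[o]][verbs[v]][times[t]] = n
    return tensor, actors, verbs, times


if __name__ == "__main__":
    # Hypothetical sample data, not drawn from any real dataset.
    events = [
        Event("A", "identifies_with", "B", 1900),
        Event("A", "identifies_with", "B", 1900),
        Event("C", "classifies", "A", 1910),
    ]
    tensor, actors, verbs, times = to_dense(count_events(events))
    print(tensor[actors["A"]][actors["B"]][verbs["identifies_with"]][times[1900]])  # → 2
```

Most cells in an array like this would be zero, so a real implementation would probably keep the sparse `Counter` form; the dense layout is shown only to make the four axes visible.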

What would be the result? Think about what topic modeling does with language: it finds clusters of words that appear together in ways that seem coherent to human readers. But it does so in a way that is not predictable from the outset; it produces different clusters for different sets of texts, and those differences are what make it so valuable. They allow us to pick out the most salient concepts and discourses within a particular corpus, which might differ case-by-case. This technique appears to do the very same thing, but with relationships between groups of people over time. We might be able to capture local variations in models of identity within different communities.

I am not entirely certain that this would work, and I’d love to hear any feedback about difficulties that this approach might face! I also doubt I’ll get around to exploring the possibilities more thoroughly right now. But I would really like to see more humanists actively seeking out collaborators in statistics and computer science to work on projects like this. We have an opportunity in the next decade to actively influence research in those fields, which will in turn have widespread influence over political structures in the future. By abstaining from that kind of collaboration, we are reinforcing existing power structures. Let’s not pretend otherwise.

  1. With a few small adjustments. Since actors and objects are of the same kind in the model, the verbs would need to have a slightly different structure — possibly linking individuals through identity perceptions or acts of self-identification. 

Neoliberal Citations (and Singularities)

A chart showing the use of the word postmodernism over time.
Ever wonder where “postmodernism” went? I suspect “digital humanities” is headed there too. (Both Google Ngrams and COCA show the same pattern. COCA even lets you limit your search to academic prose!)

My first impulse upon reading last week’s essay in the LA Review of Books, “Neoliberal Tools (and Archives),” was to pay no attention. Nobody I know especially likes the name “digital humanities.” Many people are already adopting backlash-avoidant stances against it. “‘Digital Humanities’ means nothing” (Moretti); “I avoid the phrase when I can” (Underwood). As far as I can tell, the main advantage of “digital humanities” has been that it sounds better than “humanities computing.” Is it really worth defending? It’s an umbrella term, and umbrella terms are fairly easy to jettison when they become the targets of umbrella critiques.

Still, we don’t have a replacement for it. I hope we’ll find in a few years that we don’t need one. In the meantime we’re stuck in terminological limbo, which is all the more reason to walk away from debates like this. Daniel Allington, Sarah Brouillette, and David Golumbia (ABG hereafter) have not really written an essay about the digital humanities, because no single essay could ever be about something so broadly defined.

That’s what I told myself last week. But something about the piece has continued to nag at me. To figure out what it was, I did what any self-respecting neoliberal apologist would do: I created a dataset.

A number of responses to the essay have discussed its emphasis on scholars associated with the University of Virginia (Alan Jacobs), its focus on English departments (Brian Greenspan), and its strangely dismissive attitude towards collaboration and librarianship (Stewart Varner, Ted Underwood). Building on comments from Schuyler Esprit, Roopika Risam has drawn attention to a range of other occlusions (of scholars of color, scholars outside the US, and scholars at regional institutions) that obscure the very kinds of work that ABG want to see more of. To get an overview of these occlusions, I created a list of all the scholars ABG mention in the course of their essay, along with each scholar’s gender, field, last degree or first academic publication, year of degree or publication, and granting institution or current affiliation.[1]

The list isn’t long, and you might ask what the point is of creating such a dataset, given that we can all just — you know — read the article. But I found that in addition to supporting all the critiques described above, this list reveals another occlusion: ABG almost entirely ignore early-career humanists. With the welcome exception of Miriam Posner, they do not cite a single humanities scholar who received a PhD after 2004. Instead, they cite three scholars trained in the sciences:

Two, Erez Aiden and Jean-Baptiste Michel, are biostatisticians who were involved with the “culturomics” paper that, well, let’s just say it has some problems. The other, Michael Dalvean, is a political scientist who seems to claim that when used to train a regression algorithm, the subjective beliefs of anthology editors suddenly become objective facts about poetic value.[2] Are these really the most representative examples of DH work by scholars entering the field?

I’m still not entirely certain what to make of this final occlusion. Given the polemical character of their essay, I’m not surprised that ABG emphasize scholars from prestigious universities, or that, given who tends to hold positions at such universities, most of them turn out to be white men. ABG offer a rationale for their exclusions:

Exceptions too easily function as alibis. “Look, not everyone committed to Digital Humanities is a white man.” “Look, there are Digital Humanities projects committed to politically engaged scholarly methods and questions.” We are not negating the value of these exceptions when we ask: What is the dominant current supported even by the invocation of these exceptions?

I disagree with their strategy, because I don’t think invoking exceptions inevitably supports an exclusionary mainstream. But I see the logic behind it.

When it comes to early-career scholars, I no longer see the logic. Among all the humanities scholars who received a PhD in the last ten years, I would expect there to be representatives of the dominant current. I would also expect them to feature prominently in an essay like this, since they would be likely to play important roles directing that current in the future. The fact that they are almost entirely absent casts some doubt on one of the essay’s central arguments. Where there are only exceptions, no dominant current exists.

I share with ABG the institutional concerns they discuss in their essay. I do not believe that all value can be reduced to monetary value, and I am not interested in the digital humanities because it increases ROI. Universities and colleges are changing in ways that worry me. I just think those changes have little to do with technology in particular; they are fundamentally social and political changes. Read with that in mind, the essay’s neglect of early-career humanists looks like a symptom of a broader problem. Let’s provisionally accept the limited genealogy that ABG offer, despite all it leaves out. Should we then assume that the field’s future will follow a linear trajectory determined only by its past? That a field that used to create neoliberal tools will mechanically continue to do so in spite of all efforts to the contrary? That would be a terrible failure of imagination.

In 1993, Vernor Vinge wrote a short paper called “The Coming Technological Singularity,” which argued that the rate of technological change will eventually outstrip our ability to predict or control that change. In it, he offers a quotation from Stanislaw Ulam’s “Tribute to John von Neumann” in which Ulam describes a conversation with von Neumann that

centered on the ever accelerating progress of technology and changes in the mode of human life, which gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue.

Over the last decade, many optimistic technologists have described this concept with an evangelistic fervor that the staunchest Marxist revolutionary could admire. And yet this transformation is always framed as a technological transformation, rather than a social or political one. The governing fantasy of the singularity is the fantasy of an apolitical revolution, a transformation of society that requires no social intervention at all.[3] In this fantasy, the realms of technology and politics are separate, and remain so even after all recognizable traces of the preceding social order have been erased. “Neoliberal Tools (and Archives)” seems to work with the same fantasy, transformed into a nightmare. In both versions, to embrace technology is to leave conscious political engagement behind.

But the singularity only looks like a singularity to technologists because they are used to being able to predict the behavior of technology. From the perspective of this humanist, things look very different: the singularity already happened, and we call it human society. When in the last five hundred years has it ever been possible for human affairs, as known at a given moment, to continue? What have the last five centuries of human history been if not constant, turbulent, unpredictable change? If a technological singularity arrives, it will arrive because our technological lives will have become as complex and unpredictable as our social and political lives already are. If that time comes, the work of technologists and humanists will be the same.

That might be an unsettling prospect. But we can’t resist it by refusing to build tools, or assuming that the politics of tool-building are predetermined from the outset. Instead, we should embrace building and using tools as inherently complex, unpredictable social and political acts. If they aren’t already, they will be soon.

  1. Please let me know if you see any errors in this data. 
  2. This would be like claiming that when carefully measured, an electorate’s subjective beliefs about a candidate become objective truths about that candidate. To be fair, I’m sure Dalvean would recognize the error when framed that way. I actually like the paper otherwise! 
  3. The genius of The 100 (yes, that show on the CW) is that it breaks through this mindset to show what a technological singularity looks like when you add the politics of human societies back in.