Old English Phonotactic Constraints?

Something interesting happens when you train a neural network to predict the next character given a text string. Or at least I think it might be interesting. Whether it’s actually interesting depends on whether the following lines of text obey the phonotactic constraints of Old English:

amancour of whad sorn on thabenval ty are orid ingcowes puth lee sonlilte te ther ars iufud it ead irco side mureh

It’s gibberish, mind you — totally meaningless babble that the network produces before it has learned to produce proper English words. Later in the training process, the network tends to produce words that are not actually English words, but that look like English words — “foppion” or “ondish.” Or phrases like this:

so of memmed the coutled

That looks roughly like modern English, even though it isn’t. But the earlier lines are clearly (to me) not even pseudo-English. Could they be pseudo-Old-English (the absence of thorns and eths notwithstanding)? Unfortunately I don’t know a thing about Old English, so I am uncertain how one might test this vague hunch.

Nonetheless, it seems plausible to me that the network might be picking up on the rudiments of Old English lingering in modern (-but-still-weird) English orthography. And it makes sense — of at least the dream-logic kind — that the oldest phonotactic constraints might be the ones the network learns first. Perhaps they are in some sense more fully embedded in the written language, and so are more predictable than other patterns that are distinctive to modern English.

It might be possible to test this hypothesis by looking at which phonotactic constraints the network learns first. If it happened to learn “wrong” constraints that are “right” in Old English — if such constraints even exist — that might provide evidence in favor of this hypothesis.
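One crude starting point: list the letter pairs that show up in the network's early babble but never in a reference corpus of modern English. Those pairs are the candidate "wrong" constraints worth checking against Old English. This is a minimal sketch that assumes adjacent-letter pairs are a usable stand-in for phonotactic constraints. That is a big simplification, since spelling is not sound. The function names and the tiny reference string are hypothetical.

```python
from collections import Counter

def char_bigrams(text):
    """Count adjacent letter pairs within words, with '#' marking word edges."""
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        if not letters:
            continue
        padded = f"#{letters}#"  # so word-initial and word-final pairs count too
        for a, b in zip(padded, padded[1:]):
            counts[a + b] += 1
    return counts

def unattested_bigrams(sample, reference):
    """Return {bigram: count} for bigrams in `sample` never seen in `reference`."""
    seen = char_bigrams(reference)
    return {bg: n for bg, n in char_bigrams(sample).items() if bg not in seen}

# Tiny illustrative reference; a real test would need a large modern-English corpus.
reference = "the cat sat on the mat with the other cats"
sample = "amancour of whad sorn on thabenval"
for bigram, n in sorted(unattested_bigrams(sample, reference).items()):
    print(bigram, n)
```

Running the same comparison at several points during training would show which unattested pairs the network drops first.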

If you’d like to investigate this and see the kind of output the network produces, I’ve put all the necessary tools online. I’ve only tested this code on OS X; if you have a Mac, you should be able to get this up and running pretty quickly. All of the commands below can be copied and pasted directly into Terminal. (Look for it in Applications/Utilities if you’re not sure where to find it.)

  1. Install homebrew — their site has instructions, or you can just trust me:
    ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
  2. Use homebrew to install hdf5:
    brew tap homebrew/science
    echo 'Now you are a scientist!'
    brew install hdf5
  3. Use pip to install PyTables:
    echo 'To avoid `sudo pip`, get to know'
    echo 'virtualenv & virtualenvwrapper.'
    echo 'But this will do for now.'
    sudo -H pip install tables
  4. Get pann:
    git clone https://github.com/senderle/pann
  5. Generate the training data from text files of your choice (preferably at least 2 MB of text in total):
    cd pann
    ./gen_table.py default_alphabet new_table.npy
    ./gen_features.py -t new_table.npy \
        -a default_alphabet -c 101 \
        -s new_features.h5 \
        SOME_TEXT.txt SOME_MORE_TEXT.txt
  6. Train the neural network:
    ./pann.py -I new_features.h5 \
        -C new_features.h5 -s 4000 1200 300 40 \
        -S new_theta.npy -g 0.1 -b 100 -B 1000 \
        -i 1 -o 500 -v markov
  7. Watch it learn!

It mesmerizes me to watch it slowly deduce new rules. It never quite gets to the level of sophistication that a true recurrent neural network might, but it gets close. If you don’t get interesting results after a good twenty-four or forty-eight hours of training, play with the settings — or contact me!
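If "predict the next character" sounds abstract, here is a minimal NumPy sketch of how a fixed-width training set for that task can be built. It is not a description of what `gen_features.py` actually writes. The function name, the 28-symbol alphabet, and the window length are assumptions chosen for illustration. A fixed window like this also shows one general limit of a feed-forward design compared with a recurrent one: the model never sees anything more than `context` characters back.

```python
import numpy as np

def next_char_examples(text, alphabet, context=100):
    """Turn text into (window, next character) training pairs, one-hot encoded.

    Characters not in `alphabet` are dropped first. Returns X with shape
    (n, context * len(alphabet)) and y with shape (n, len(alphabet)).
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    codes = np.array([index[ch] for ch in text if ch in index], dtype=np.int64)
    n = len(codes) - context
    if n <= 0:
        raise ValueError("text is shorter than the context window")
    one_hot = np.eye(len(alphabet), dtype=np.float32)
    # Row i holds the codes of characters i .. i + context - 1 (needs NumPy >= 1.20).
    windows = np.lib.stride_tricks.sliding_window_view(codes[:-1], context)
    X = one_hot[windows].reshape(n, -1)  # flatten each window into one long vector
    y = one_hot[codes[context:]]         # target: the character right after each window
    return X, y

alphabet = "abcdefghijklmnopqrstuvwxyz ."  # hypothetical 28-symbol alphabet
X, y = next_char_examples("the cat sat on the mat.", alphabet, context=5)
print(X.shape, y.shape)  # (18, 140) (18, 28)
```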

Log-Scale Reading

When I started this blog, I promised myself that I’d post once every two weeks. Recently, that has begun to feel like a challenge. According to that schedule, dear reader, I owed you a post approximately eleven days ago. Unfortunately, I have only a massive backlog of half-baked ideas.[1]

But the show must go on![2] I’ve decided to take inspiration from Andrew Piper, who recently introduced “Weird Idea Wednesday” with this astute observation:

Our field is in its infancy and there is no road map. Weird ideas have an important role to play, even if they help profile the good ideas more clearly.

’Tis wrote only for the curious and inquisitive.
Sound advice from the first volume of Tristram Shandy (1760).

I couldn’t agree more. In fact, I thought the inaugural weird idea was quite wonderful! It involved using a “market basket” algorithm on texts — the amazon.com approach to diction analysis. My philosophy is, why shouldn’t a sentence be like a shopping cart? I have no idea whether this approach could be useful, but then — that’s the point.
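I don't know the details of Piper's implementation, so this is only a generic sketch of the two classic market-basket measures, support and confidence from association-rule mining, with sentences standing in for shopping carts. Whitespace tokenization, counting support as a raw number of sentences, and the `min_count` threshold are simplifying assumptions. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def word_pair_rules(sentences, min_count=2):
    """Score rules 'A -> B' with each sentence treated as a basket of words.

    Returns (A, B, count, confidence) tuples, where count is the number of
    sentences containing both words and confidence = count / (sentences with A).
    """
    baskets = [set(s.lower().split()) for s in sentences]
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        # sorted() gives each unordered pair one canonical key
        pair_counts.update(combinations(sorted(basket), 2))
    rules = []
    for (a, b), n in pair_counts.items():
        if n < min_count:
            continue
        for x, y in ((a, b), (b, a)):  # a pair yields a rule in each direction
            rules.append((x, y, n, n / item_counts[x]))
    # Most confident rules first; ties broken by count, then alphabetically.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))

sentences = ["The cat sat", "the cat ran", "a dog ran"]
print(word_pair_rules(sentences))  # [('cat', 'the', 2, 1.0), ('the', 'cat', 2, 1.0)]
```

Confidence is asymmetric in general. "If a sentence has A, it probably has B" need not imply the reverse, and that asymmetry is part of what makes the shopping-cart framing interesting for diction.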

I will forgive the skeptics in my audience for thinking that this is just a highfalutin justification for writing filler to meet an arbitrary publication schedule.[3] You others: read on.

My half-baked idea for this Friday is that we should come up with a new kind of reading in addition to close and distant reading: log-scale reading. I’m not certain this is a good idea; I’m not even totally certain what it means. But think for a moment about why people use log scales: they show data at many different levels of resolution at once, revealing patterns that no single linear scale would make obvious.

For example, consider this chart:

[Figure: an exponential-looking chart.]

It’s a line chart of the following function:

y = 10 ** x + 0.5 * 6 ** x * np.sin(10 * x)
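Factoring out the dominant term shows how the two parts of the function relate:

```latex
y = 10^{x} + 0.5 \cdot 6^{x} \sin(10x) = 10^{x}\left(1 + 0.5\,(0.6)^{x} \sin(10x)\right)
```

The oscillation is never more than a fraction 0.5·(0.6)^x of the exponential trend, and that fraction shrinks as x grows (at x = 10 it is about 0.003). Once the vertical axis is stretched to fit the largest values, the wiggles are too small to see.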

Now, you can see from the equation that there’s a lot of complexity here; but if you had only seen the graph, you’d only notice the pattern of exponential growth. We’re zoomed way too far out. What happens when we zoom in?

[Figure: a weird, squiggly chart.]

Now we get a much better sense of the low-level behavior of the function. But we get no information about what happens to the right. Does it keep going up? Does it level off? We have no idea. Log scale to the rescue:

[Figure: the same function plotted on a log scale.]

This doesn’t make smaller patterns invisible, nor does it cut off our global view. It’s a much better representation of the function.
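To reproduce the three views, here is a minimal matplotlib sketch. The x ranges (0 to 10 for the full view, 0 to 2 for the zoom) and the output filename are assumptions, since the original charts' ranges aren't stated. On the log axis the 10^x trend becomes a straight line, and the oscillation rides on top of it instead of being swamped.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np

def f(x):
    """The function from the post: exponential trend plus growing wiggles."""
    return 10 ** x + 0.5 * 6 ** x * np.sin(10 * x)

x_full = np.linspace(0, 10, 5000)  # assumed "zoomed out" range
x_zoom = np.linspace(0, 2, 1000)   # assumed "zoomed in" range

fig, (ax_lin, ax_zoom, ax_log) = plt.subplots(1, 3, figsize=(12, 3.5))
ax_lin.plot(x_full, f(x_full))
ax_lin.set_title("Linear scale, zoomed out")
ax_zoom.plot(x_zoom, f(x_zoom))
ax_zoom.set_title("Linear scale, zoomed in")
ax_log.plot(x_full, f(x_full))
ax_log.set_yscale("log")  # valid here because f(x) > 0 for all x >= 0
ax_log.set_title("Log scale, full range")
fig.tight_layout()
fig.savefig("log_scale_demo.png")
```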

Now, this may seem dreadfully obvious. It’s data visualization 101: choose the right scale for your data. But I find myself wondering whether there’s a way of representing texts that does the same thing. When discussing distant reading and macroanalysis, people talk a lot about “zooming out” and “zooming in,” but in other fields, that’s frequently not the best way to deal with data at differing scales. It’s often better to view your data at multiple scales simultaneously. I’d like to be able to do that with text.

So that illustrates, in a very hazy and metaphorical way, what log-scale reading would be. But that’s all I’ve got so far. In some future Filler Friday[4] post, I’ll explore some possibilities, probably with no useful outcome. I’ll try to make some pretty pictures though.
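Purely as a guess at one possible ingredient, not a method: measure the same thing (here, how often a single word appears) over windows whose sizes grow by powers of ten. A single pass then yields views from roughly sentence length up to a few pages, side by side. The feature, the window sizes, the function name, and the file name in the usage comment are all hypothetical.

```python
import numpy as np

def multiscale_rate(words, target, exponents=(1, 2, 3)):
    """Share of words equal to `target` (lowercase) in non-overlapping windows
    of 10, 100, 1000, ... words; returns {window_size: array_of_rates}."""
    hits = np.array([w.lower() == target for w in words], dtype=float)
    rates = {}
    for e in exponents:
        size = 10 ** e
        n = len(hits) // size  # number of full windows; any ragged tail is dropped
        if n:
            rates[size] = hits[: n * size].reshape(n, size).mean(axis=1)
    return rates

# Hypothetical usage on a plain-text file:
# words = open("novel.txt", encoding="utf-8").read().split()
# for size, values in multiscale_rate(words, "she").items():
#     print(size, values.round(3))
```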

One final question: Am I missing something? Does log-scale reading already exist? I’d love to see examples.[5]

  1. This is partially the result of personal circumstances; I spent the last month moving from Saratoga Springs to Philadelphia. But that’s no excuse! 
  2. This is not literally true. There’s no reason for the show to go on. 
  3. I decided not to call this feature of my blog “Filler Friday,” but I won’t object if you do. 
  4. OK, actually, it’s a pretty good name. 
  5. Since I posted this, there have been some interesting developments along these lines. In the Winter 2016 issue of Critical Inquiry, Hoyt Long and Richard Jean So make some persuasive arguments for this kind of multi-scale reading, although the task of visualizing it remains elusive, as does (I would argue) the task of developing arguments across multiple scales in fully theorized ways.