Do style and statistics mix? I’m not sure. But it gets me thinking.
Coding Robots’s I Write Like… has been around for a while, but I’d never checked it out until now, when I came across the German version, Ich schreibe wie …, published last year by the online edition of the German newspaper Frankfurter Allgemeine Zeitung and based on the Coding Robots code.
I have no idea how the algorithms work. The code has been open-sourced, but I’d never heard of the programming language “Racket,” and to go through it, I’d have to take a day or two off. The F.A.Z. article about the tool, F.A.Z.-Stiltest »Ich schreibe wie«, isn’t particularly enlightening either. But at heart it’s a statistical analysis tool, as its developer explains:
Actually, the algorithm is not rocket science, and you can find it on every computer today. It’s a Bayesian classifier, which is widely used to fight spam on the Internet. Take, for example, the “Mark as spam” button in Gmail or Outlook. When you receive a message that you think is spam, you click this button, and the internal database gets trained to recognize future messages similar to this one as spam. This is basically how “I Write Like” works on my side: I feed it “Frankenstein” and tell it, “This is Mary Shelley. Recognize works similar to this as Mary Shelley.” Of course, the algorithm is slightly different from the one used to detect spam, because it takes into account more stylistic features of the text, such as the number of words in sentences, the number of commas and semicolons, and whether the sentence is direct speech or a quotation.
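The recipe described above can be sketched roughly like this: count features per author during training, then pick the author under whom a new text is most probable. What follows is a generic multinomial naive Bayes classifier in Python, not the actual I Write Like code (which I haven’t read). It assumes that “stylistic features” can be folded in as extra tokens: punctuation marks plus a bucketed sentence length. The names `features`, `AuthorClassifier`, the `__sentlen_N__` tokens, the bucket size, and the toy training sentences are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def features(text):
    """Turn a text into tokens: words, punctuation, and sentence-length buckets.

    Assumption: stylistic features are folded in as extra tokens. This is a
    hypothetical choice for illustration, not I Write Like's actual scheme.
    """
    tokens = re.findall(r"\w+|[,;:!?]", text.lower())
    # One hypothetical token per sentence, bucketing its word count by fives.
    for sentence in re.split(r"[.!?]+", text):
        n = len(re.findall(r"\w+", sentence))
        if n:
            tokens.append(f"__sentlen_{min(n // 5, 8)}__")
    return tokens


class AuthorClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # author -> token frequencies
        self.doc_counts = Counter()               # author -> number of training texts
        self.vocab = set()                        # every token seen in training

    def train(self, author, text):
        # Like clicking "Mark as spam", except the label is an author's name.
        tokens = features(text)
        self.token_counts[author].update(tokens)
        self.doc_counts[author] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = [t for t in features(text) if t in self.vocab]  # skip unseen tokens
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for author, counts in self.token_counts.items():
            total = sum(counts.values())
            # Log prior plus log likelihoods, summed to avoid float underflow.
            score = math.log(self.doc_counts[author] / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / (total + v))
            if score > best_score:
                best, best_score = author, score
        return best


clf = AuthorClassifier()
clf.train("Author A", "The pigs and the dogs ran across the farm; the animals watched.")
clf.train("Author B", "Politics, snark, and all that: the minister resigned, again, today.")
print(clf.classify("The dogs watched the pigs on the farm."))  # Author A
```

Working in log space keeps long texts from underflowing to zero, and add-one smoothing stops a single word an author never used from vetoing that author outright. A text full of pigs and dogs lands on whichever author was trained on pigs and dogs, which may say more about vocabulary than about style.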
First, I fed the German version my blogpost Das Guttenberg-Krislein und die Modellierung einer Post-Demokratie, which (politics, snark, and all that) is fairly typical of the style and tone I write in German. So, wie schreibe ich (how do I write)?
Okay. For the English version, I picked my blogpost Pig Stories, Dog Stories, Our Stories, minus the Hofstaedter quote. Being more essayistic and more emotional, it’s a fairly typical example of how I write in English. How do I write?
George Orwell? One of the best essayists who’s ever lived? Now that’s flattering! But I suspect it’s more likely because my blogpost has pigs and dogs in abundance, not because of my stylistic brilliance.
And Freud? Well, that’s flattering too, in many ways. Come to think of it, maybe statistical algorithms have become our new kind of Über-Ich (superego) for the Digital Age, filtering out everything deemed unsuitable and socially disruptive.