What is the difference between a novelist and a language model?
#3/n in the 'Language Models and Writing' Series
Last week I wrote about Johnny Dawes and style, which oddly enough has a lot of resonance with the essay below on AI and novel writing, so I would encourage you to go and read that one if you haven’t already. It’s fundamentally about ‘how-ness’. Otherwise, welcome back to the AI + lit. series. The previous post is here, while the introduction is here.
This year at NeurIPS – the important global AI research conference – there is a ‘Creative AI’ track, specifically focussed on ‘ambiguity’. It’s a nice theme, because it is a rebalancing. There is an awful lot of forecasting and foreseeing that goes on in AI, when there should be a lot more shrugging and waiting.
Lots of the creative work produced by language models (LMs) is ambiguous, not simply in terms of its content, but also because we don’t yet know quite what to make of it. It’s often taken as read that AI agents will become good writers – if they aren’t already – simply because they have excellent text manipulation skills. Yet after experimenting with some fairly successful consumer AI novel-writing agents, such as SudoWrite and DeepAI (though these are just wrappers around OpenAI and Anthropic models), it’s quite hard to tell whether what they are producing is promising and will in time become good writing, or whether they are entirely barking up the wrong tree.
I think there are a few concrete capabilities that LM writers are lacking which are preventing them from producing great work. In the spirit of ambiguity, I remain undecided as to whether they’ll be developed or not.
How do LMs know what matters?
Human writers alight upon their subject matter through a complicated combination of factors: intellectual/aesthetic interest (‘Paris is so beautiful/interesting’), political resonance (‘The world needs someone to explain Paris’), self-promotion (‘I’ll get noticed if I write about Paris’), contingency (‘my train broke down in Paris’). Many of these factors also include predictions about states of affairs: for example, my goal of self-promotion requires me to have a guess at what might matter to readers ‘out there’ now and in the future.
The way that writing ‘matters’ is not just by responding to whatever seems most pressing in the news at that moment. Bernard Williams once made the simple but under-appreciated point that important writing – in this case philosophical, but the point extends to other genres – “is not likely to make it immediately obvious what the work has to do with our most urgent concerns, because its interest is in the less obvious roots and consequences of our concerns”. It is precisely this ‘less obvious’-ness which poses the problem, since almost by definition the less obvious roots of our concerns are those which are poorly described or perhaps not described at all, and language models currently gain their knowledge through things that have been described or represented in some way.
In other words, I think it is highly plausible that an LM author will write an accomplished, stylish work, but will it write one that matters to us?
To the shape-rotators out there, this is a prompt for something like more experimentation with reinforcement learning in the LM architecture. In The Gay Science (1882) Nietzsche explained well the difference between the wrist-slapping RLHF currently tacked onto the end of the training pipeline and what an AI agent properly embedded in language and human culture would be like.
“The thinker,” he writes, meaning the ideal LM agent, “sees his own actions and experiments as questions, as seeking explanations of something: to him, success and failures are primarily answers. To be vexed or even to feel remorse because something goes wrong — he leaves that to those who act because they were ordered to do so” – by which he means agents trained with RLHF – “and who expect a beating when his gracious lordship” – the reward model – “is not pleased with the result.”
Reasoning over style
It is of course very easy to point out the crimes against literary style that current LM authors commit, but it is also very midwit to do so. If you are mocking a model for being bad at writing, you are missing the point that none of the models you use on a regular basis are rewarded for their ability to produce the kind of writing that you like. As a user, you have to override the default style of the models with few-shot prompting or something similar. At that point, the difficulty lies in specifying what kind of writing you want. If you could somehow rigorously and exhaustively describe the type of literary voice that you want to effect on the page, then of course the model would have no problem generating reams of prose.
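To make the mechanics concrete, here is a minimal sketch of what that overriding looks like in practice. Everything in it is an illustrative assumption – the prompt template, the bracketed placeholder excerpts, and the ‘generate’ callable standing in for whichever model API you happen to use:

```python
from typing import Callable

# A minimal sketch of overriding a model's default register with
# few-shot prompting. The bracketed examples would be real passages
# in the voice you want.

FEW_SHOT_STYLE_PROMPT = """\
Rewrite the draft sentence in the voice of the examples.

Example 1: <a passage in the target voice>
Example 2: <another passage in the target voice>
Example 3: <a third passage in the target voice>

Draft: {draft}
Rewrite:"""


def rewrite_in_style(draft: str, generate: Callable[[str], str]) -> str:
    """Steer the model towards the exemplified voice rather than its
    default register. 'generate' is any prompt-to-completion function."""
    return generate(FEW_SHOT_STYLE_PROMPT.format(draft=draft))
```

The hard part, note, is not the plumbing but the examples: choosing a few passages that pin down a voice is precisely the thing that resists specification.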
Of course, anyone who has tried to talk about literary writing for even a second will acknowledge that you can’t pin down exactly what good style is. It’s not a very beautiful passage, but Bourdieu basically gets this in Distinction (1979) when he explains that stylistic mastery
“is, for the most part, acquired simply by contact with works of art—that is, through an implicit learning analogous to that which makes it possible to recognise familiar faces without explicit rules or criteria—and it generally remains at a practical level; it is what makes it possible to identify styles, i.e., modes of expression characteristic of a period, a civilisation or a school, without having to distinguish clearly, or state explicitly, the features which constitute their originality”.
While I can’t in writing ‘order up’ a style of thought and expression from an LM writer, I think I can do so from an appropriately gifted human writer. It would not be absurd of me to go to someone and ask them – for example – to write something like Charles Dickens crossed with Ayn Rand (ignoring for the moment the question of why). The writer would be able to have a go, and then, if asked, be able to explain why they did things a certain way.
While I could certainly prompt an LM to write like Dickens plus Rand, the crucial difference with a human is that the LM could not then reason about the result it has given me: an LM cannot introspect on the specific decisions it has made with a piece of prose. It can reread something it has written, and make up some explanation for why an agent might have done it like that, but that explanation will never, strictly speaking, be true: it won’t actually be the reason why it chose one phrase over another.
It therefore becomes extremely hard to see how a current LM could contribute anything to literary style. If it is not able to introspect on moments of stylistic difficulty and divergence, then how can it know that these are the moments it needs to seek out in order to make advances in literary expression?
Incidentally, it is tempting at this juncture to inject some randomness into the equation, and say that an LM can break new ground in literary style simply by making more random choices in its production of language. But this gets two things wrong. First, it misses that such randomness already exists in LM architectures, in the form of the sampling temperature. Second, it misses that innovations in form and expression catch on because they matter to someone, and so they are motivated with respect to a culture, and are not merely random.
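For the shape-rotators again, here is a minimal, illustrative sketch of that dial – how temperature sampling works in principle, not any particular model’s implementation:

```python
import numpy as np


def sample_next_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Sample a token index from a model's output logits.

    Low temperature sharpens the distribution (safer, more predictable
    words); high temperature flattens it, producing more 'random' choices.
    """
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # softmax, numerically stable
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```

Turning the temperature up does exactly what the objection asks for – more surprising word choices – and what comes out is noise, not style, which is the point.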
The anticipation and management of artistic risk
It doesn’t apply to all good writing, but clearly lots of your favourite works will do something ‘risky’. Risk doesn’t just mean writing something that offends someone, despite what many seem to think. Risk might also be staking your reputation by doing something formally strange, laughing where others are serious, or creating challenging and difficult work that others might simply interpret as ‘bad’ or ‘boring’.
To create work which will be highly regarded for long stretches of time, LM authors need some way not only of gauging which risks are worth taking, but also of distinguishing what counts as a ‘risk’ in the first place. That’s because there is lots of literature which appears to think it is taking risks when in fact it is not.
This idea is gestured at by a great scene in Don DeLillo’s first book Americana (1971), in which one of the characters, Brand, tries to explain the novel he’s working on:
[The main character] is the former president of the United States. He’s completed his two terms but he’s still very popular and he’s always speaking at important banquets. At the same time he’s turning into a woman. He’s beginning to grow breasts and his genitals are shrinking. [...] It’ll be over a thousand pages long. It’s called Coitus Interruptus. The theme is whatever you want it to be because appearance is all that matters, man. The whole country’s going to puke blood when they read it.
At first blush, the image of ‘puking blood’ materialises a type of literary risk: you’ve done something, sure, but will people thank you for it? It sounds like a risk in the sense that you’ll either be remembered as the genius who made America puke, or the lunatic who made America puke. And yet, in the scene, no one takes Brand seriously. The narrator is not really paying attention to him, and Brand is throughout characterised as a minor maniac.
We as readers do not think that, in writing his book, Brand is taking a risk which could go either way. Instead, we are led simply to think that his book will join the infinitely long list of books that no one will much care for because they have an over-inflated sense of their own riskiness. There is no category safer, more predictable and more reified than that of the self-consciously risky book. Literature thus has to delicately balance the taking of risks with the concealment that it is doing so. I won’t go into detail here, but if you take a roll call of some paradigm-shifting works in English – Tristram Shandy, Lyrical Ballads, The Waste Land – I don’t think your overriding perception is of an author writing something merely for the sake of taking literary risks.
If all these capabilities arrive – roughly put, the prediction of moral concern, the perception of lived fashion and style, and the management of artistic risk – then AI agents will not just be better at novel writing, but also much better at other things such as therapy, political consulting, brand design, and much more. By describing the absence of such capabilities, I am potentially taking a first step towards building them. Now, you might think that building these is impossible, but I would warn against becoming too attached to saying such things. Heed Hugh Kenner, in The Invisible Poet (1965): “The eloquence of inadequacy is richly comforting, entoiling the futile man in beglamouring postulations of the impossible.”