Critiquing the Critic: AI vs Human Feedback

Despite the many functions AI models can perform, a human partner remains an essential step in the development process.

As AI tools become more accessible, many writers have started to dabble with them as part of their creative process. They might use an LLM (large language model) to brainstorm new story ideas, ask where to find primary sources for research, or condense large swaths of information into brief summaries.

The appeal is understandable. What AI can provide, it can deliver in minutes, if not seconds. It’s available at any time of day or night. And its responses are tuned, first and foremost, to please the user.

This article doesn’t get into the creative or legal ramifications of generating scripts (and other narrative formats) with AI. Rather, today’s focus is to explore the extent to which AI might be used to evaluate a finished story.

First, let’s consider how LLMs like ChatGPT and Claude work. At their core, these systems are pattern-matching engines trained to predict language. What does that mean, you might ask? To start with, the AI is given a huge dataset of text files that cover various topics, such as film and TV scripts, novels and short stories, as well as text that covers the craft of writing: articles on standard story structures, blog posts about character archetypes, etc. Once it takes in this mega-sized dataset, the AI looks for patterns in those text files. With these patterns, it can then construct output for the user, based on what it predicts will most likely come next in a sequence of words, given the prompt it receives.
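To make that idea concrete, here is a toy sketch in Python. This is emphatically not how ChatGPT or Claude work internally (real LLMs use neural networks trained on enormous datasets), and the tiny “corpus” below is invented purely for illustration. But it shows the core loop in miniature: count which words follow which, then predict the most likely next word.

```python
from collections import Counter, defaultdict

# A toy stand-in for an LLM's training data. Real systems ingest
# billions of words; this invented snippet just makes the idea visible.
corpus = (
    "the hero enters the basement . "
    "the hero hears a noise . "
    "the hero hears a scream ."
).split()

# "Training": count how often each word follows each other word.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("hero"))  # -> "hears" (seen twice, vs. "enters" once)
print(predict_next("the"))   # -> "hero"
```

Notice what this little predictor is doing: it has no opinion about heroes or basements. It simply reports what usually comes next in the text it was fed, which is the same basic move an LLM makes at vastly greater scale.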

Therefore, when a scribe uploads their draft and asks an AI tool to critique the story, what the tool can do is report the degree to which the draft matches the patterns it has memorized from its dataset. In short, a machine can tell the writer whether the story lands well with a machine.

This raises a simple but important question. Is your intended audience a machine?

If that’s not the case for you, and you want a living human to enjoy the material, then at some point, it behooves you to check in with a real-life person for their opinion on the draft.

Of course, it’s entirely possible that someone would want to create content for other people and still not choose to seek out human feedback before sharing the work publicly. Reaching out to a trusted reader requires coordination. Waiting for their feedback takes time. In some cases, consulting with a professional may also involve a financial investment that goes beyond the $20 monthly cost of an AI subscription.

To consult, or not to consult a fellow human, is a choice every writer has the freedom to make. Yet skipping human critique often results in a gap between how the work is intended to land and how it’s actually received by a wide audience. Consider this line from William Shakespeare’s tragic play Hamlet: “Our thoughts are ours, their ends none of our own.”

(Looking for a trustworthy reader? Take a peek at this article: How to Find a Compatible Critique Partner.)

When a wordsmith does seek out feedback before launching their work into the public eye, it shows respect for the community of viewers, readers, fellow artists, etc. who engage with the content. Audiences are smart; if material has been generated simply to please ego and be seen, rather than to connect with a thinking, feeling populace, they will often notice right away.

Which brings us to the next point. Because humans genuinely feel, they provide better judgment than a machine on three key elements in any story: tone, humor, and emotional impact. Let’s go over each one.

Tone. It’s true that AI can analyze sentence structure and suggest different lines that better adhere to the patterns it has memorized. LLMs, however, possess no instinctive reactions to those sentences. For example, if a scribe wanted to write an intense thriller, AI could offer a list of common plot tropes, character types, settings, and props that routinely appear in the thriller genre. But to know whether the scene in the basement is truly gripping, whether it can actually make someone anxious and put them on the edge of their seat, you would have to consult a real person.

Humor. A tool like ChatGPT or Claude can process a huge dataset of written jokes, identify patterns in the data, and generate a list of jokes based on those patterns. But it cannot react with a genuine response of “This made me laugh.”

A human reader can. They’ll tell you whether a joke lands, whether it feels predictable, or whether it interrupts the flow of a scene. They can also point out when something reads as unintentionally funny. That kind of response is intuitive and experiential. It reflects how humor functions in practice, not just how it is constructed on the page.

Emotional impact. Many writers would consider this to be the most important of the three. What an AI tool can do is locate passages that describe emotional interactions. Yet it cannot determine the full extent to which emotional moments feel realistic. Nor does it experience investment in a particular character.    

A living, breathing reader, on the other hand, can gush with excitement about a particular scene that blew them away. They can articulate where in the draft they felt drawn into the action, as well as where they “bumped” or felt apathetic about the conflict. They can assess whether a draft is palpably impactful and appealing, or flat and predictable.

So despite the many functions AI models can perform, a human partner remains an essential step in the development process.

Here’s a real-world example that further illustrates the kind of machine-based shortcomings that can pop up. I recently asked an LLM to help me plan a writing schedule so that I could meet a specific deadline for a large project… and the results were wonky. It knew how to put together a nice-looking document with colorful charts, but it couldn’t successfully calculate that writing 3 pages a day, 5 days a week, for 3 weeks would result in 45 pages. Even after several requests for it to try again and focus on the math, the results were still significantly off.
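For the curious, here is the entire calculation the tool kept fumbling, written out in a few lines of Python. The numbers match my original request; the variable names are just for illustration.

```python
# The schedule math from my request, spelled out step by step.
pages_per_day = 3
days_per_week = 5
weeks = 3

total_pages = pages_per_day * days_per_week * weeks
print(total_pages)  # 3 x 5 x 3 = 45
```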

If a “reader” can’t even compute objective math like 3 x 5 x 3 = 45, what does that say about its capacity to grasp subjective art?

H. S. Fishbrook is a freelance writer and story analyst from LA who finds great joy in fostering stories that elevate thought, for both film and print. Her experience includes studying abroad at The Shakespeare Birthplace Trust in Stratford-upon-Avon and the Globe Theatre in London, which quickly fanned the flames of her love for dramatic storytelling. As a story analyst her biggest client to date is Amazon Studios, but she also enjoys supporting writers 1-on-1. To learn more about her and her work as a creative writing consultant for screenwriters and novelists, visit HSFishbrook.com.