Buyer Beware in the Age of AI Garbage
The most recent kerfuffle in the writing world and why it’s bigger than one crappy website
First—a quick bit of news! Our Kickstarter for the Worldbuilding for Masochists Anthology Traveling Light is 100% funded! There’s still time—mere hours now!—to pledge if you want to jump in on rewards. Plus, we have stretch goals in hopes of opening to more submissions! Because hey, we want to pay writers fairly for their work!
Which is, apparently, not a given these days.
In the past couple days, writers got wind of an online writing tool that, to be fair, has been festering on the webbernets awhile but apparently avoided detection. Prosecraft.io analyzed thousands of works of fiction and aggregated the data into a tool that let you see a work of fiction’s “vividness” and “adverb use,” among other attempts at quantitative analysis (more on that later). Once a couple writers pinged the creator of the website publicly on The Social Media Site Formerly Known as Twitter and received less than reassuring answers about why and how their work had been fed into this data mill, all bets were off and many other writers joined the chorus—and started involving their publishers’ legal departments. The site was taken down, and under continued pressure, the creator added that all of the data had been expunged.
Despite some tech-bro apologists’ attempts to excuse this, the problems are myriad. For one, the sheer volume of books and the creator’s explanations of obtaining material strongly suggest that the books fed into the chipper were pirated to begin with. In case anyone needs a refresher, book piracy—both making AND using unauthorized copies of books—is in fact illegal. The creator of the tool skirts open confession of using pirated material, saying "When I ran out of books on my own shelves, I looked to the internet for more text that I could analyze, and I used web crawlers to find more books." Color me skeptical that The Free Internet was only serving up legit material. Whether anything that followed was fair use or not, Prosecraft made crappy book data cookies out of stolen chocolate chips.
And that question is a bit of a cluster. Fair use is still a squishy concept when it comes to training AI or feeding copyrighted material into AI. I don’t pretend to know the ins and outs of all of it—and apparently neither did the Prosecraft creator, though to be fair, we’re still sorting a lot of these questions out the old-fashioned way, with lawsuits. Though he claims his initial inquiry into this data was educational in nature, to learn better writing craft himself, he developed a for-profit writing tool called Shaxpir (yes, pronounced…yeah) out of it. That alone undercuts any claim he may have had to “scholarly” or “academic” use of the material—because, yes, the field of studying literature in terms of data sets does exist, but attempting to turn a profit changes the equation considerably. It goes without saying that no publishers, authors, or anyone else gave permission in any way to be part of a for-profit writing scam tool.
And scam it is, ultimately. The tool analyzed elements of writing that have little to no bearing on the ultimate “quality” of the prose. Part of this is the obvious element that software is only as good as its programmer, that the robot will only do the task as well as it has been taught. Prosecraft was taught badly. The most obvious of errors is a feature that quantifies how much “passive voice” a book has—but the creator obviously, embarrassingly, does not seem to know what passive voice is. (Passive voice is putting the object of the verb first—as in “The bill was passed by the council”—as opposed to active voice, which places the subject first—“The council passed the bill.”) The tool counts usage of the helping verbs (like was and were) that passive voice relies on—but clever readers, you’ll notice right away that past tense, descriptive phrasing, and other necessary grammatical structures require these words, too.
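To see why helper-verb counting fails, here’s a toy sketch of that kind of detector—an illustration of the flawed approach described above, not Prosecraft’s actual code:

```python
# Naive "passive voice" detector: it only counts helper verbs,
# which is the flawed proxy described above.
HELPERS = {"was", "were", "is", "are", "be", "been", "being"}

def naive_passive_score(text: str) -> int:
    """Count helper verbs -- a crude (and wrong) stand-in for passive voice."""
    return sum(1 for word in text.lower().split()
               if word.strip(".,!?;:") in HELPERS)

# Genuinely passive -- flagged, as intended:
print(naive_passive_score("The bill was passed by the council."))   # 1

# Active voice, past progressive -- flagged anyway:
print(naive_passive_score("She was running through the rain."))     # 1

# Plain description, no passive in sight -- flagged twice:
print(naive_passive_score("The sky was red. The fields were dry.")) # 2
```

Every one of those sentences scores the same way, which is exactly the problem: the metric can’t tell passive construction from ordinary past-tense description.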
Of course, knowing how much passive voice a work has also assumes something about the quality of writing that uses passive voice—that passive voice is, apparently, bad. It’s true that too much passive voice is a Writing 101 No-No, but as any writer will tell you, it’s not memorizing rules but applying language well that makes for good writing. Sometimes passive voice is better than active. Similarly, the tool tracked adverb use—following some stale old advice, perhaps, that adverbs are bad. Adverbs can be great, actually. (Consider the difference between “she smiled” and “she smiled viciously.”)
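The adverb tracking has the same weakness. A counter built on the obvious heuristic—flagging words that end in “-ly”—both over- and under-counts. (Again, this is a toy illustration of the approach, not Prosecraft’s actual code.)

```python
# Naive adverb counter: flags anything ending in -ly.
def naive_adverb_count(text: str) -> int:
    """Count words ending in -ly -- a crude proxy for adverb use."""
    return sum(1 for word in text.lower().split()
               if word.strip(".,!?;:").endswith("ly"))

# A real adverb, correctly counted:
print(naive_adverb_count("She smiled viciously."))              # 1

# Three -ly words, zero adverbs (noun, adjective, noun):
print(naive_adverb_count("The family held a friendly rally."))  # 3

# Two real adverbs, neither ends in -ly -- both missed:
print(naive_adverb_count("She smiled very often."))             # 0
```

Even if the count were accurate, it would only tell you how many adverbs are on the page, not whether any given one earns its place.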
The tool also attempted to categorize the “vividness” of words and then count how many “vivid” words works had—but the entire concept of “vivid” is a bit nebulous, and, again, assumes something about quality, not only of each individual word but of the goals of writing. Valuing “vividness” assumes that “painting bright word pictures” is the main goal of writing. This may be a goal for many writers, but even then it’s far from universal. And it is certainly far from universal that the word “eyelash” is more affecting, gripping, or colorful than “apple.” Much depends on context and use. “She batted her eyelashes” is perhaps less “vivid” than “Long eyelashes of cloud closed over the reddening sun” (or maybe not, I don’t even know what vivid means here, YMMV).
Therein lies the biggest problem with Prosecraft, the free web tool, and its $8-a-month buddy Shaxpir Pro—the belief that you can use data to crack your way into good writing without doing the work yourself. It made broad assumptions about “good writing” and then attempted to shill these to newbie writers looking for guidance. I won’t give the newbies full passes here—y’all should know that work is the only way you get better at craft, not shortcuts. Buyer beware: The snake oil probably doesn’t work, and it might actually harm you.
So there’s lesson the first—if you want to learn craft, AI can’t do it for you. Prosecraft isn’t the first and won’t be the last scam leveled at inexperienced writers who want to learn How To Story. (Worth noting that Writer Beware exists and is a good first stop when investigating anything peddled to writers, from agents to publishers.) Even if it were assessing any kind of viable metric of writing (and I’m not even able to start to get into the question of “is it possible to find objectivity in art” here), it can only show you Things About that metric, not help you understand how YOU engage with the process of writing. Because that’s the thing—writing is a craft, a process. You can’t only look at the end result and learn how to make it. You can learn a lot by looking at plenty of pots and vases, but you can’t learn how clay feels under your own hands.
Now, I will never say that data has no place in creative endeavors like writing. It’s simply how some people think—to approach a process by the numbers, to analyze patterns, to pick out trends and then play with replicating them. Neat! If that’s you, do you! But the thing is, you are DOING the work of reading, picking apart, and analyzing—not letting someone else tell you what is important in the process.
That letting someone else tell you part—it’s older than AI, of course, but the advent of computer learning models only intensifies that particular treachery to art. There’s something a bit darker and ickier than just falling for the snake oil pitch. When we hand over the “how” of any creative work to a computer, we might just forget to think about the “why.” Lesson the second is broader, then, than writers learning to write. It’s that when we program a thing, all of our biases, beliefs, judgments, and errors go into it, too. With Prosecraft it’s just that adverbs are bad and a misunderstanding of passive voice. But when we consider all the biases baked into our art, would we want to replicate them with AI? Do we even want computer models teaching human artists, as was the case here? I think not—we want real artists, sometimes insightful, sometimes fumbling, to rap at the sides of the containers we’ve built around good art and see where the weak spots are. We want artists’ craft to build upwards, outwards, to change and challenge the parameters of any metric we can apply, to continue to delight and mystify us. Delight and mystery both require novelty, to some degree—novelty that we can’t get by replication of the same hits on the same metrics.
We are only at the beginning, I’m sure, of the debates over AI and art. It’s not going away. But it’s our responsibility to interrogate it, question it, never assume that a computer is smarter than an artist at creating art. It probably isn’t.
Your requisite chickens: