Do AI Humanizers Actually Work? We Tested Them And This Is What We Found
First, we had ChatGPT and other Generative Pre-Trained Transformers, which created AI-generated text. Next, we had AI Detectors, which used AI to determine whether people were using artificial intelligence. AI Humanizers — the next step in the evolution of artificial intelligence — use AI to change AI text so other AI won’t spot that it’s AI-generated. If that sounds mad, well, it probably is. However, there are dozens of AI Humanizer tools on the market that promise to do exactly that.
So, what do humanizers do? AI Humanizers rewrite AI-generated text to generate content that detection tools can’t identify as machine-written. There are “tells” in AI-generated content that detectors look for, like using the most common words and phrases, a lack of variation in tone, and having sentences of a uniform length. Humanizers, therefore, try to circumvent these giveaways by swapping out common words for less common ones, paraphrasing sentences, and varying the tone or lengths of sentences.
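At its crudest, the word-swapping half of that process can be sketched in a few lines. This is a toy illustration, not how any of the tested products actually work; the synonym table is invented for the example, and real humanizers use far larger language models to choose replacements.

```python
import re

# Hypothetical synonym table for illustration only -- real humanizers
# draw on much larger models, but the basic move is the same
# word-for-word substitution of common words with less common ones.
SYNONYMS = {
    "use": "employ",
    "help": "assist",
    "important": "crucial",
    "show": "demonstrate",
}

def naive_humanize(text: str) -> str:
    """Swap common words for rarer ones, one of the 'tells'
    that humanizers try to remove from AI-generated text."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = SYNONYMS.get(word.lower())
        if replacement is None:
            return word
        # Preserve the capitalization of the original word.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)

print(naive_humanize("Use this guide to show which tools help."))
# -> Employ this guide to demonstrate which tools assist.
```

Done blindly like this, with no sense of context, the output quickly drifts toward the "Faucet your Apple ID" territory described later in this article.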
Each AI Humanizer will have a different algorithm, which is why some of the ones we tested produced reasonably human-sounding writing, and some gave us unreadable gobbledegook. The testing process is explained in more detail in the Methodology section, along with a list of all 10 humanizers we tried.
Humanized text often fails AI detectors
The easiest way to test the effectiveness of AI humanizers is to run the text through AI detection software. These are tools designed to identify and evaluate whether content has been generated by artificial intelligence. Detectors use algorithms to spot patterns, inconsistencies, or markers that indicate it’s been created by AI.
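One pattern detectors are widely believed to measure is "burstiness": human writing tends to mix long and short sentences, while AI text is often more uniform. The sketch below is a deliberately crude stand-in for that idea, assuming variance in sentence length as the only signal; commercial detectors combine many signals and trained models.

```python
import re
import statistics

def sentence_length_variance(text: str) -> float:
    """Toy detector signal: the variance of sentence lengths
    (in words). Lower variance reads as more 'AI-like' under
    this simplistic heuristic."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pvariance(lengths)

uniform = "The tool is fast. The tool is cheap. The tool is easy."
varied = ("It works. Honestly, I was surprised by how well it "
          "handled the messy input we threw at it.")

# The uniform text scores lower variance than the varied one.
print(sentence_length_variance(uniform) < sentence_length_variance(varied))
# -> True
```

This also hints at why humanizers vary sentence lengths during rewriting: it directly pushes up the statistic a detector like this would check.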
Smodin, Quillbot, and Undetectable AI offer both AI detection and AI humanizer features, and some of their humanized output didn’t even pass their own detectors. Smodin’s AI detection tool gave its humanized text a 0% human score, although Quillbot was completely fooled and rated it 100% human. Detection scores are not consistent across different AI detectors: when I put the humanized text I got from Quillbot through Quillbot’s own AI detection tool, it returned a score of 100% AI. Undetectable AI did manage to pass its own (and others’) AI detection.
Other AI humanizers whose results were flagged by AI detectors included Humanize AI, which Sapling’s AI detection considered 80% AI, and WriteHuman, which got a 100% AI score on Quillbot.
Humanized text can pass AI detectors but still sound like AI
Being able to pass an AI detection test isn’t the whole story, though. Humanized text can be determined 100% human by detection software and still be robotic-sounding, unreadable, or outright gibberish. Undetectable AI’s humanized text scored well with detectors but contained nonsense sentences like “LinkedIn Premium is a profile subscription views which and enhances LinkedIn the Learning.”
Readability often doesn’t correspond to a text’s human score, either. Smodin’s humanized text scored well on some AI detection tools but contained phrases that sounded distinctly less human than whatever ChatGPT wrote in the first place. Humanized text from Humboldt scored 40% and 25% AI on different detectors, suggesting the text was considered more human than not. However, the actual text was garbled and bore little relation to the original: it rewrote “Tap your Apple ID” as “Faucet your Apple ID.”
I couldn’t predict which results would perform best on detection tests. ContentShake AI provides several options for its humanized output, including Rephrase, Casual, and Improve. I found the Rephrase option the least pleasant to read: it changed a heading titled “Understanding LinkedIn Premium” to “Grasping LinkedIn Premium,” which doesn’t convey the same meaning at all. Although Rephrase didn’t score well, it scored better than the outputs generated with the Casual and Improve modes, even though those seemed much better written to this human.
Some AI Humanizers are actually quite good
As a human writer, I’d prefer not to be replaced by a robot, but I do need to give credit to a couple of AI Humanizers that performed pretty well. Surfer SEO and AI Text Humanizer both produced perfectly readable copy. There weren’t any jarringly odd choices of synonyms or mangled sentence structures. However, AI Text Humanizer’s output would need a clean-up as the text missed punctuation like commas and slipped between British and American spellings, spelling the word “canceling” two different ways in the same article. Both Surfer and AI Text Humanizer passed AI Detection tests, with Quillbot pronouncing both pieces of writing to be 100% human. The resulting text was still boring, of course, because the original text was boring, and these products aren’t designed to make it more enjoyable to read.
I tried to produce something more interesting on ChatGPT by using a prompt that specified it should be “witty, friendly, and engaging” and “read as though it’s one friend giving advice to another.” This approach makes it slightly harder to detect AI writing, and its score before humanization was 54% AI. It scored 29% after it had been rewritten by Surfer, which wasn’t quite as successful as its first attempt. AI Text Humanizer’s version succeeded in getting a 0% AI score. In both cases, the humanized text wasn’t noticeably better than ChatGPT’s original, but it wasn’t any worse either.
Are humanizers just automated thesauruses?
The humanizers we tried didn’t seem to have any sense of the meaning of the article as a whole, and many made changes that didn’t make any sense in the context of the example we gave them. Many of the humanizers just took words from the text and did the digital equivalent of copying and pasting from a thesaurus.
Merlin (the same software that scored lowest in our AI Detector roundup) changed the sentence “However, as needs evolve, some users may decide to discontinue their subscription” to this needlessly complicated equivalent: “Nevertheless, with the radical changes in the life, many of the users will possibly take a choice to discontinue the subscription as per their needs-orientated strategy.” This kind of result is like the episode of Friends when Joey discovers using a thesaurus and rewrites “They’re warm, nice people with big hearts.” as “They’re humid, pre-possessing homo sapiens with full-sized aortic pumps.”
I tried manually replacing words in the sample text using Word’s synonym option. This is a fairly labor-intensive way of disguising AI-generated content (it involves a lot of right-clicking), but I was curious to find out if it would replicate the results of the AI humanizers. The results were not dissimilar. My updated text was slightly less deranged than Joey’s version and scored 57% AI on Quillbot’s AI detector. This was the same score we got for humanized text from ContentShake AI and better than Quillbot’s or WriteHuman’s scores. However, humanizers have other tricks up their sleeves besides just replacing words. They changed the order of sentences and rewrote entire phrases. Some introduced grammatical errors, but it’s hard to say if this was a deliberate ploy to fool AI detectors or a limitation of the software.
Do humanizers actually work?
Yes and no. They’re all certainly doing something. Each of the humanizers rewrote the text in a different way, and you can repeat the process as many times as you like on each one and get different results every time. As we found in our article about AI Detectors, AI scores will vary wildly depending on which detector you — or the person checking your work — choose to use. To avoid detection, you would need to know exactly which detector you’re up against (or be prepared to run your humanized text through dozens of them).
If you’re determined to generate text through an AI large language model like ChatGPT, your best bet would be to rewrite the copy yourself in order to avoid detection. Alternatively, you can use ChatGPT to generate a general outline of your subject and write your piece from there. Although using AI to do your writing for you should save time, humanized AI text needs to be checked thoroughly to ensure that it reads well, makes sense, and has retained the meaning of what you were trying to say.
Using these tools also compounds the AI regression problem, where AI large language models are increasingly being trained on AI-generated content, leading to even more homogenous and bland AI content.
Methodology
To write this article, we tested 10 humanizers: Quillbot, Smodin, Undetectable AI, Humanize AI, ContentShake AI, Surfer SEO, AI Text Humanizer, Merlin, WriteHuman, and Humboldt. We stuck to free plans, all of which had usage limitations and would require an upgrade to a paid plan for regular use. I used the same piece of ChatGPT text on all the humanizers: the AI-generated LinkedIn piece I used when testing AI detectors, shortened where it exceeded a humanizer’s free-plan character limit.
When testing the results against AI detection software, I tried to use a variety of AI detectors to get a range of results, although I was constrained by the usage caps on the detectors’ free plans. I used Quillbot a lot because it scored highly in our AI Detection article and allows unlimited free use. If I have quoted an AI detection score without specifying which detector I used, it’s Quillbot.
As a final note, I would like to reassure you that this article was written entirely by a non-artificial, human writer. I’m also assuming, possibly naively, that you, the reader, are another human and not a robot. Out there on the web, there may be AI-written, AI-humanized, AI-checked articles being consumed entirely by AI bots, but not here.