AI detectors are tools designed to recognise when a text was written partly or wholly by AI tools such as ChatGPT. They estimate how likely it is that a piece of writing was generated by AI, which is useful for instructors who want to ensure their students are writing independently, as well as for moderators who want to remove fake product reviews and other spam content. Here’s how AI detectors function in practice:
1. Perplexity:
Perplexity measures how unpredictable a text is. In the simplest terms, it reflects how likely the text is to perplex (confuse) the average reader. AI models aim for low perplexity, striving for text that reads smoothly and logically. Human writing tends to have higher perplexity: it contains more inventive word choices, but also more mistakes. Language models work by predicting the word most likely to come next in a sentence and adding it. As illustrated in the table below, some continuations of the sentence “I couldn’t get to sleep last…” are more plausible than others.
| Example sentence | Perplexity |
|---|---|
| I couldn’t get to sleep last night. | Low: probably the most likely continuation |
| I couldn’t get to sleep last time I drank coffee in the evening. | Low to medium: less likely, but it makes grammatical and logical sense |
| I couldn’t get to sleep last summer on many nights because of how hot it was at that time. | Medium: the sentence is coherent but unusually structured and long-winded |
| I couldn’t get to sleep last pleased to meet you. | High: grammatically incorrect and illogical |
Low perplexity is treated as evidence that a text was generated by AI.
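To make the idea concrete, here is a minimal sketch of how a text's perplexity can be scored with a small language model. It assumes the Hugging Face transformers library with GPT-2 as the scoring model; real detectors use their own models and thresholds, so the exact numbers are only illustrative.

```python
# Minimal perplexity sketch, assuming the "transformers" library and GPT-2
# as the scoring model (an illustrative choice, not any particular detector).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return exp(average next-token cross-entropy) for the given text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels equal to the input ids, the model returns the mean
        # negative log-likelihood of each token given the ones before it.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

print(perplexity("I couldn't get to sleep last night."))                # lower
print(perplexity("I couldn't get to sleep last pleased to meet you."))  # higher
```

The predictable continuation should score a noticeably lower perplexity than the nonsensical one, which is exactly the signal detectors look for.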
2. Burstiness:
Burstiness measures variation in sentence structure and length; it is similar to perplexity, but operates at the sentence level rather than the word level. It is low in a text with little variation in sentence structure and length, and high in a text with more variation. AI text is typically less “bursty” than human text: because language models predict the most likely word to come next, they tend to produce sentences of average length (say, 10-20 words) with conventional structures. This is why AI writing can sometimes feel monotonous.
Low burstiness implies that the text was most likely created by AI.
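The sketch below shows one simple way to quantify burstiness: the spread of sentence lengths relative to their average. The sentence-splitting rule and the use of standard deviation are simplifying assumptions; real detectors use more sophisticated measures of structural variety.

```python
# A rough burstiness measure: how much sentence lengths vary within a text.
# The regex-based splitting and the std/mean ratio are illustrative choices.
import re
from statistics import mean, pstdev

def burstiness(text: str) -> float:
    """Standard deviation of sentence length (in words), normalised by the mean."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2 or mean(lengths) == 0:
        return 0.0
    return pstdev(lengths) / mean(lengths)

uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = ("Silence. After hours of tossing and turning in the summer heat, "
          "I finally gave up on sleep. Coffee helped.")
print(burstiness(uniform))  # low: every sentence has a similar length
print(burstiness(varied))   # higher: sentence lengths vary a lot
```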
3. Temperature:
When working with AI-generated text, it also helps to understand the concept of temperature. Temperature controls how random the model’s predictions are. At a low temperature, the model almost always picks the most probable next word, producing very predictable text. But that text tends to be dull because there is so little variation.
At a high temperature, the generated text is more diverse, but the model is also more likely to make grammatical errors. If a piece of text consistently chooses the most predictable word, paragraph after paragraph, you are almost certainly dealing with AI-generated content.
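The sketch below illustrates the mechanism with a toy next-word distribution. The candidate words and their scores are made up for illustration; the point is how lowering the temperature concentrates probability on the safest continuation, while raising it flattens the distribution.

```python
# Temperature-scaled sampling over a hypothetical next-word distribution
# for "I couldn't get to sleep last ...". Words and scores are invented.
import math

def softmax_with_temperature(scores, temperature):
    """Convert raw model scores into probabilities at a given temperature."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

words = ["night", "time", "summer", "pleased"]
scores = [5.0, 3.0, 2.0, -1.0]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(f"T={t}:", {w: round(p, 3) for w, p in zip(words, probs)})
# At T=0.2 nearly all probability mass sits on "night" (predictable but dull);
# at T=2.0 the distribution flattens and riskier continuations become plausible.
```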
How reliable are AI detectors?
AI detectors generally perform well, particularly on longer texts, but they can easily fail if the AI output was prompted to be less predictable or was edited after generation. They can also misidentify human-written text as AI-generated, especially when it happens to be unusually predictable or uniform.
According to research into the top AI detectors, no tool achieves 100% accuracy; the highest accuracy reported was 84% for a premium product and 68% for the best free tool. These tools give a good indication of how likely it is that a text was generated by AI, but they should not be used as evidence on their own. Even the most confident vendors frequently state that their tools can’t prove a text was generated by AI, and universities have so far been reluctant to rely on them.