The machine in the room

How to tell AI writing from human writing

A practical guide to the structural, linguistic, and behavioural patterns that separate generated text from writing produced by an actual thinking person.

“The barrier is not access to the right words. It is the absence of a specific mind behind them.”
– Paraphrased from Wikipedia’s WikiProject AI Cleanup research

There is a quiet skill spreading through offices, classrooms, and newsrooms: the ability to sense, before being able to explain, that a piece of writing was produced by a language model. Editors feel it in a pitch. Teachers flag an essay that passes every plagiarism check but still reads wrong. Hiring managers delete cover letters that are technically flawless.

The feeling is not random. It is picking up on something structural. Once you understand what that something is, you cannot stop seeing it.


Why AI text has a fingerprint at all

Language models do not “know” things in the way people do. They calculate the statistical probability of the next word in a sequence, trained on an enormous body of human-produced text. This produces a specific problem: the model gravitates toward the most statistically expected output.

When asked to write something, it reaches for the words, structures, and phrasings that appeared most frequently in similar contexts. It does not reach for the strange word, the surprising turn, or the slightly-off sentence that a particular human writer might produce simply because that is how their brain works.

Researchers call this tendency “regression to the mean.” The result is average writing, reliably produced at any scale. That is not the same thing as good writing.

Two measurable properties follow from this directly. The first is perplexity. The second is burstiness. Neither is a perfect detector on its own, but together they describe the shape of the problem with some precision.


Perplexity and burstiness

Perplexity measures predictability: how easy it is to guess what word comes next in a given sequence. A low-perplexity sentence is one where each word follows very naturally from the one before it. A high-perplexity sentence surprises you.

Human writers produce high-perplexity text without trying. They make unexpected connections. They pick words from their own reading history, professional vocabulary, regional dialect, and personal habits. A writer who says “the policy landed with a thud” instead of “the policy was met with criticism” is producing higher-perplexity text. The language model will almost never produce the first version unprompted, because “landed with a thud” is too specific, too informal, too tied to a particular sensibility. “Was met with criticism” is more probable. More expected. Safer.

Burstiness measures how much perplexity varies across a document. Human writing tends to vary its patterns considerably, while language models write with a consistent level of uniformity throughout. A human writer might open with a short, blunt sentence, follow it with a longer qualifying one, cut back to something spare, then write three complex sentences in a row. The rhythm changes because thought changes. AI text moves at a steady pace because the probability calculations that govern it stay roughly constant from the first paragraph to the last.
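Both quantities can be sketched in a few lines. The toy unigram model and word-level tokenisation below are illustrative assumptions, not how production detectors work; real tools score text against a large language model's token probabilities. The reference corpus, the fallback probability, and the use of per-sentence perplexity spread as a burstiness proxy are all choices made for the sake of the sketch.

```python
import math
from collections import Counter

def perplexity(words, probs):
    """Perplexity = exp of the average negative log-probability per word.
    Lower means more predictable text. Unseen words get a tiny fallback
    probability, so they drive perplexity up sharply."""
    nll = -sum(math.log(probs.get(w, 1e-6)) for w in words) / len(words)
    return math.exp(nll)

def burstiness(sentences, probs):
    """Burstiness, in this sketch, is the standard deviation of
    per-sentence perplexity: how much predictability swings
    across a document."""
    scores = [perplexity(s.split(), probs) for s in sentences if s.split()]
    mean = sum(scores) / len(scores)
    return (sum((x - mean) ** 2 for x in scores) / len(scores)) ** 0.5

# Toy "model": word probabilities estimated from a tiny corpus of
# conventional phrasing, so conventional sentences score as predictable.
corpus = "the policy was met with criticism the report was met with approval"
counts = Counter(corpus.split())
total = sum(counts.values())
probs = {w: c / total for w, c in counts.items()}

# "landed with a thud" uses words the model has never seen,
# so it scores far higher perplexity than the expected phrasing.
print(perplexity("the policy was met with criticism".split(), probs))
print(perplexity("the policy landed with a thud".split(), probs))
```

Mixing the two sentences in one document also yields nonzero burstiness, while repeating the conventional sentence yields none.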

You can see the shift even at the vocabulary level:

  • 400% increase in “delve” appearing in published articles since late 2022
  • 3,900% rise in “meticulously researched” in the generative AI era
  • ~90% accuracy among expert readers who use LLMs heavily, when judging whether text is AI‑generated

Pangram Labs, whose detection technology has been verified by researchers at the University of Chicago and the University of Maryland, notes that perplexity and burstiness are insufficient as standalone metrics, because they are relative to a particular model. What counts as unexpectedly high perplexity for one model may be normal output from another. This is part of why the same text gets flagged by one tool and cleared by another.


The vocabulary shift, era by era

Wikipedia’s project to clean AI-generated content from its pages has produced one of the most detailed public analyses of AI linguistic habits currently available. It runs to thousands of words, is updated regularly, and draws on the combined experience of editors who process millions of AI submissions each year. One of its more striking findings is that the vocabulary AI models favour has shifted over time, era by era, as models have improved and been trained to avoid their previous tells.

GPT‑4 era (2023 to mid‑2024) – most flagged words

delve · pivotal · tapestry · meticulous · intricate · underscore · additionally · boasts · bolstered · garner · interplay · testament · enduring · vibrant

GPT‑4o era (mid‑2024 to mid‑2025) – evolved vocabulary

showcasing · fostering · highlighting · align with · bolstered · crucial · emphasizing · enhance · enduring · vibrant · pivotal · underscore

This era-by-era drift is useful for something more specific than detection: it can help estimate roughly when AI text was added to a document. An article that leans heavily on “delve” and “intricate” was most likely generated in 2023 or early 2024. One full of “showcasing” and “fostering” points to a more recent model. The word “delve” dropped off sharply in 2025, not because AI stopped generating text, but because so many people flagged it that developers trained it out of their models’ default output.
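The drift can even be put to crude use in code. The sketch below counts hits against each era's flagged vocabulary and reports which list matches more often. The word sets are drawn from the lists above (omitting words that appear in both eras), exact-word matching ignores inflections, and the scoring is a rough heuristic, not a calibrated dating method.

```python
import re

# Era-specific flagged vocabulary from the lists above; words shared
# between both eras (e.g. "pivotal", "vibrant") are deliberately omitted.
GPT4_ERA = {"delve", "tapestry", "meticulous", "intricate",
            "boasts", "garner", "interplay", "testament"}
GPT4O_ERA = {"showcasing", "fostering", "highlighting", "crucial",
             "emphasizing", "enhance"}

def era_hint(text):
    """Count distinct hits against each era's word list and return
    a rough hint about when the text may have been generated."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    old = len(words & GPT4_ERA)
    new = len(words & GPT4O_ERA)
    if old == new:
        return "inconclusive"
    return ("2023-early 2024 (GPT-4 era)" if old > new
            else "mid-2024 onward (GPT-4o era)")

print(era_hint("We delve into a meticulous and intricate tapestry."))
```

A text with no flagged vocabulary, or an even split, comes back as inconclusive, which is the honest answer for most documents.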

Researchers at the Max Planck Institute have documented an uncomfortable side effect of this: humans are beginning to absorb AI vocabulary unconsciously. They call it AI linguistic imprinting. People exposed to large volumes of AI output start using its vocabulary because it feels polished, formal, and safe. “Delve” and “pivotal” appear in human-written documents more often than they did in 2022. Vocabulary alone is therefore increasingly unreliable as a detection signal. The structural patterns are more durable.


The structural tells that don’t go away

The vocabulary has shifted with each model generation, but certain structural habits have remained consistent across all of them. These are the patterns that Wikipedia editors, experienced teachers, and professional editors have learned to recognise on sight.

Six structural patterns to look for

  1. Importance inflation
    AI rarely describes something without also explaining why it matters, usually in vague terms like “a pivotal moment” or “a broader movement.” The significance is asserted rather than earned.
  2. The false range
    “From intimate gatherings to global movements.” Two loosely related items are dressed up as a comprehensive spectrum. The middle is missing.
  3. Rule of three
    Traits grouped in threes, always parallel and vague: “fast, reliable, and scalable.” A human under time pressure usually picks the one that actually matters.
  4. Uniform rhythm
    Sentence length stays remarkably consistent throughout. No fragments. No very long subordinate clauses. Every paragraph feels like the same number of words, moving at the same speed.
  5. Pre‑announced summaries
    “In this article, we will explore…” Human writers in professional contexts rarely announce what they are about to do in this way. They just do it.
  6. Trailing significance clauses
    Sentences that end with “emphasizing the importance of” or “reflecting the continued relevance of” add zero information and exist purely to sound considered.
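A few of these patterns are mechanical enough to check automatically. The sketch below flags pre-announced summaries and trailing significance clauses with simple regular expressions. The phrase list is a small illustrative sample, not an exhaustive catalogue, and a match is a hint worth a second look, never a verdict on its own.

```python
import re

# A small illustrative sample of two structural tells, not a full catalogue.
TELL_PATTERNS = {
    "pre-announced summary": re.compile(
        r"\bin this (article|post|essay),? we will\b", re.IGNORECASE),
    "trailing significance clause": re.compile(
        r",\s*(emphasizing|underscoring|highlighting|reflecting) the "
        r"(importance|significance|continued relevance) of\b", re.IGNORECASE),
}

def structural_tells(text):
    """Return the names of the tell patterns that match the text."""
    return [name for name, pat in TELL_PATTERNS.items() if pat.search(text)]

sample = ("In this article, we will explore the harbour's history, "
          "reflecting the continued relevance of maritime trade.")
print(structural_tells(sample))
```

Running it on a sentence like “The harbour opened in 1882.” returns nothing, because a plain descriptive claim triggers neither pattern.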

Wikipedia’s analysis points to one pattern that sits underneath all of these: AI text spends a disproportionate amount of time explaining the importance of the subject rather than describing the subject itself. The tell is a sentence that contains no information but exists to signal that the writer understands what they just wrote. A confident writer never needs to signal that.


How automated detection tools work (and where they fail)

GPTZero, one of the most widely deployed detection tools, claims a 99% accuracy rate when distinguishing AI-generated text from human writing, verified by Penn State’s AI Research Lab. Grammarly’s AI detector ranks first on RAID, an independent quality benchmark for the field. These numbers are impressive in controlled conditions.

Independent faculty researchers, however, note a consistent gap between benchmark performance and real-world use. In practice, the tools produce significant numbers of false positives and false negatives. They are also biased against writers who are not native English speakers, producing false positives for these writers in up to 70% of cases in documented studies. Academic writing, legal writing, and technical documentation all have low perplexity by design, because those styles require precision and uniformity. This led to the well‑publicised case of AI detection tools flagging portions of the Bible and the U.S. Constitution as machine‑generated.

The arms race dynamic makes this worse over time. Every time researchers publish which patterns detectors use, developers train new models to avoid those exact patterns.

Tools like QuillBot’s AI Detector now attempt to distinguish between four categories rather than two:

  • AI‑generated
  • AI‑generated and AI‑refined
  • Human‑written and AI‑refined
  • Purely human‑written

This is a more honest framing of what detection actually looks like in 2026, because most AI‑assisted text is not raw model output. It has been edited, paraphrased, or prompted in ways that blend machine and human input in proportions that no current tool can reliably quantify.


What trained human readers catch that machines miss

A 2025 preprint found that heavy users of large language models correctly identify AI-generated articles roughly 90% of the time. People who rarely use language models perform only slightly better than chance, mistaking AI for human and human for AI alike. The gap between these groups is the interesting finding.

Expert readers are catching something real, something that statistical metrics do not fully capture. That something is best described as the absence of a specific mind. Human writing, even poor human writing, carries traces of an actual person: their knowledge gaps, enthusiasms, blind spots, and vocabulary preferences from their particular reading history.

A restaurant critic who writes “the pasta was fine” when every other sentence in their review is expansive is doing something intentional. Deliberate understatement. A language model does not produce deliberate understatement because it has no reason to restrain itself. More practically, human writing makes claims that are contestable and specific. It names things. It describes rather than evaluates. It takes positions that create friction with other positions. AI text trends toward positions designed not to offend anyone, which in practice means positions so generic they contain no real information.

There is also what researchers call the perfect grammar problem. Even the best human writers make mistakes or intentionally break grammatical rules for emphasis. Perfect grammar sustained across a long document is itself a mild signal that something automated was involved.


What the detection debate is really about

The technical question of whether a given piece of text was machine‑generated is, in many contexts, less important than a different question: does this text contain the work of an actual thinking person?

A press release can be entirely human‑written and still contain no real information. An AI‑drafted summary of a research paper can be accurate and genuinely useful. The provenance of the text often matters less than whether the text reflects genuine engagement with the subject.

What the detection conversation has surfaced, usefully, is that there are identifiable properties separating engaged writing from mechanical writing: specific language over vague language; sentences that vary in length and energy; claims that are contestable rather than universally safe; phrasing that could only have come from someone with a particular history with the subject.

Those properties do not require a human author to exist. But they do require genuine thought to produce. And that, more than any vocabulary list or perplexity score, is what most readers are trying to detect when something reads wrong.


Genuine thought, it turns out, is inseparable from genuine language — the kind learned through real conversation, cultural context, and human connection. But language is only one dimension of what it means to communicate with precision and purpose in today’s professional world.

Germanbhashi brings all of it together. Learn German the way it was meant to be learned: through meaning, not memorization. Go deeper into the technology reshaping how we write and think, through workshops on generative AI that move beyond the hype. And when it comes to entering or advancing in the job market, Germanbhashi’s CV optimisation and interview preparation programmes help you present not just your credentials, but the specific, thinking mind behind them.

Because the sharpest professionals don’t just adapt to change — they understand it, articulate it, and make it work for them.


Further reading / Sources

Wikipedia’s WikiProject AI Cleanup is where most of the structural analysis in this article originates. It is updated regularly and goes considerably deeper than anything summarised here. Pangram Labs has published technical findings on perplexity and burstiness, verified by the University of Chicago and the University of Maryland. For the research on AI linguistic imprinting, search Max Planck Institute directly. GPTZero, Grammarly, and QuillBot each publish notes on how their detection tools work and where they get it wrong.

About Author

Niti Dua Breja

Niti’s expertise extends beyond teaching—she has worked extensively with students, professionals, and organizations, helping them navigate language barriers, explore academic opportunities, and integrate into Germany’s cultural and professional landscape.