The Certainty Trap
Why AI Needs More Doubt (and More Linguists)
We crave answers that sound sure. But certainty can be a drug, and AI is a new dealer.
Months ago, I wrote about how academic knowledge can sometimes morph as it moves, like a game of telephone. Findings stretch, caveats disappear, and tentative claims harden into truths. That process unfolds gradually and organically in human-only scholarly writing. But add generative AI to the mix, and it happens not only faster but at scale.
I’ve been thinking about what drives that acceleration. Why do systems designed to predict words end up amplifying certainty? On a recent episode of Women Talkin’ ’Bout AI, we interviewed a law librarian, and I wanted to pick her brain about a concept that helps explain it: Certainty amplification.
I brought up the idea of stance (how language signals confidence, doubt, or obligation) and how large language models tend to overstate that stance, expressing ideas with unwavering or amplified certainty even when the evidence is thin. I asked our guest, Rebecca Fordon, how this works in legal writing. She explained that persuasion is part of the job: In advocacy, you argue to convince a judge or jury that your position is right. But in legal analysis, certainty can be dangerous. You have to recognize when an argument is weak or when the law simply gives you nothing solid to stand on.
That distinction stood out to me because in academic writing, we are always analyzing and calibrating the distance between confidence and caution. Every “may suggest” or “likely indicates” is a small act of epistemic honesty. When AI removes those linguistic guardrails, it generates answers that sound true, even when they are only half true.
The Woozle Effect in Real Time
Once I started noticing that pattern of overconfidence, I saw it everywhere. Not just in AI output, but in how knowledge itself travels. We even have a term for it in research circles: the Woozle effect, coined by criminologist Beverly Houghton to describe how repeated citations of unverified or misrepresented information can make it seem factual. The name comes from a Winnie-the-Pooh story in which Pooh and Piglet follow their own tracks in the snow and believe they are chasing a mysterious creature called a woozle. The more they circle, the more convinced they become. Houghton first used the term to critique the spread of misleading domestic violence statistics; later researchers applied her concept to explain how unsubstantiated ideas become accepted through repetition rather than evidence.
Large language models accelerate the Woozle effect by turning uncertain findings into statements that sound authoritative. They summarize, paraphrase, and restate information at scale, smoothing away uncertainty with each pass. What once took years of citation drift can now happen in a single output. Trained to sound confident rather than to check for agreement among sources, these systems seem to value the reproduction of conviction over any sort of accuracy or ground truth. This has led to scholars calling bullshit. Literally. (Note: The paper centers on ChatGPT as the most public-facing exemplar, but the authors state their analysis applies to other chat-style LLMs as well. In other words, “ChatGPT” functions here as a metonym for the class.)
The Linguistics of Overconfidence
You can see the shift in the language itself. Modal verbs like may and might vanish. Cautious verbs like suggests and indicates become shows and demonstrates. Hedges disappear. Adverbs like possibly or tentatively are dropped as if they clutter the sentence. The result reads clean and certain, but also (especially to trained eyes) like the work of a novice writer and thinker. And it’s not just anecdotal: corpus linguists have found that some of the heaviest users of boosted language include Donald Trump and first-year college composition students.
In linguistics we call these features stance markers. They’re the grammatical and lexical cues that reveal how a writer positions themselves toward a claim and signal doubt, probability, and source reliability. When models treat stance as noise (something Grammarly was doing WAY before ChatGPT appeared on the scene), they erase the spectrum of knowing that underlies critical thought.
That gap points to what’s missing in the development process itself.
Where Are the Linguists?
Imagine what could change if linguists were in the room. Not to correct engineers (is that even a thing?), but to listen for meaning where the math alone falls short. The missing expertise is the ability to analyze discourse patterns, epistemic stance, and context loss.
A linguist could help calibrate confidence, design uncertainty lexicons, and build models that flag when a statement exceeds its evidential base. We already have decades of research on hedging, modality, and stance across disciplines. That work is quietly sitting in journals while engineers chase the next parameter count.
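To make that concrete, here is a minimal sketch of what an uncertainty lexicon might look like in code. The word lists, the flagging rule, and the example sentences are my own illustrative assumptions, not drawn from any published hedging study; a real system would build on the corpus research I just mentioned.

import re

# Toy stance lexicons: hedges signal caution, boosters signal certainty.
# These lists are illustrative placeholders, not a validated lexicon.
HEDGES = {"may", "might", "could", "possibly", "tentatively", "perhaps",
          "suggests", "indicates", "appears", "seems", "likely"}
BOOSTERS = {"shows", "demonstrates", "proves", "clearly", "definitely",
            "certainly", "undoubtedly", "always", "never"}

def stance_profile(sentence):
    # Count hedges and boosters in one sentence (case-insensitive),
    # and flag it when certainty markers outnumber caution markers.
    words = re.findall(r"[a-z']+", sentence.lower())
    hedges = sum(w in HEDGES for w in words)
    boosters = sum(w in BOOSTERS for w in words)
    return {"hedges": hedges, "boosters": boosters, "flag": boosters > hedges}

for s in ["The data may suggest a link between the two variables.",
          "The data clearly demonstrates a link between the two variables."]:
    print(stance_profile(s), "--", s)

Even a toy like this makes the design question visible: someone has to decide which words count as hedges, which count as boosters, and what imbalance is worth flagging. That is linguistic work, not an engineering afterthought.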
Current language models optimize for statistical fluency, predicting the most probable next word; applied linguistics attends to semantic meaning and truth conditions. That mismatch in objectives helps explain why AI so often generates plausible-sounding but factually incorrect responses delivered with inappropriate confidence markers.
Why Certainty Sells
We reward confidence. It feels efficient, reassuring, and intelligent. In law, in leadership, in science communication, sounding sure often counts more than being right. Large language models have absorbed that cultural bias and now reproduce it.
But certainty without accountability corrodes trust. In science it turns tentative findings into headlines. In policy it creates false urgency. In education it teaches students that confidence is competence. We are already seeing the social cost of this drift in misinformation, citation inflation, and public confusion about what constitutes evidence.
Learning to Live With Doubt
We reach for certainty when we can’t tolerate not knowing. Humans don’t like ambiguity because it triggers anxiety, insecurity, and a sense of powerlessness. When we “reach for certainty,” we’re often trying to relieve that discomfort by deciding something must be true, even if it’s premature or false.
Just as humans overstate to escape uncertainty, AI systems do the same, but computationally (a toy sketch after this list makes the point concrete):
A large language model is trained to produce the most probable next word (not the most truthful one).
When evidence is thin, a model will still generate something fluent and confident, because that’s what it was optimized to do.
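To see what that optimization looks like, here is a toy numerical sketch. The three-word “vocabulary” and its probabilities are invented for illustration; no real model chooses from a three-word menu, but the logic is the same.

import math

def entropy_bits(probs):
    # Shannon entropy in bits: higher means a flatter distribution,
    # i.e., the model is less sure which word comes next.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two invented next-word distributions over the same tiny vocabulary.
peaked = {"demonstrates": 0.85, "suggests": 0.10, "may": 0.05}
nearly_flat = {"demonstrates": 0.34, "suggests": 0.33, "may": 0.33}

for name, dist in [("peaked", peaked), ("nearly flat", nearly_flat)]:
    choice = max(dist, key=dist.get)  # the most probable next word wins
    print(f"{name}: picks '{choice}' ({entropy_bits(dist.values()):.2f} bits)")

The chosen word looks equally fluent in both cases; the uncertainty lives in the distribution the reader never sees.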
The solution is not to make AI timid but to teach it, so to speak. I’m no machine-learning expert, but if a machine can learn, then surely this is a big hole in its curriculum. Someone has to take responsibility for “teaching it” to express uncertainty clearly and in proportion to the evidence (just like we teach first-year writers). That means designing systems capable of saying, “There is limited evidence for this claim,” or “No strong precedent exists.” It means valuing calibrated confidence over performed certainty.
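What might “uncertainty in proportion to the evidence” look like mechanically? Here is a deliberately simple sketch; the thresholds and phrasings are mine, invented for illustration, and the genuinely hard part (estimating that confidence score honestly) is exactly where the research effort belongs.

def calibrated_preface(confidence):
    # Map an evidence/confidence score (however it is estimated)
    # to wording that scales with it. Thresholds and phrasing are
    # invented for illustration only.
    if confidence >= 0.9:
        return "The evidence strongly supports the claim that"
    if confidence >= 0.6:
        return "The evidence suggests that"
    if confidence >= 0.3:
        return "There is limited evidence that"
    return "No strong support exists for the claim that"

for score in (0.95, 0.7, 0.4, 0.1):
    print(f"{score:.2f} -> {calibrated_preface(score)} ...")

A rule table like this is crude, but it shows the shape of the commitment: the wording is only as honest as the score behind it.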
As scholars and educators, we already practice this discipline. Every cautious conclusion, every limitation section, every “more research is warranted” is a reminder that truth has texture. The same humility that sustains good science should also guide AI.
Until our systems can represent uncertainty as carefully as we do, their fluency will keep sounding like certainty. And if we’re not careful, we’ll follow their footprints—convinced we’re chasing knowledge, when in fact we’re only circling our own path through the snow. With each pass, the tracks blur, their edges softening until only a faint impression remains. The snow will melt soon enough; the question is what kind of ground we’ll find beneath it.

