Column
When Everyone Can 'Write Well' With AI, Where Does an Organization Measure Individuality?
In an era where anyone can mass-produce polished prose with AI, the thinking of the whole organization is quietly being homogenized. A 5-minute read drawing on our latest research on argument rarity, seen from an emotion-AI perspective.
This article is an English translation of the original Japanese column. Some phrasing has been adapted for English readers.
Hello, this is Inoshita from Affectosphere Group.
A while back, someone in HR at one company raised this with me.
“We have more and more essay submissions that look like they were written with AI. The writing is good, but somehow everyone is saying the same kind of thing.”
I had to stop for a second when I heard that.
There was a time when “good writing” was a real differentiator in screening. But today, anyone using ChatGPT or Claude can produce prose above a certain bar. Readable, well-structured, no typos — getting there is no longer about individual ability.
That means organizations that used to hire and evaluate on “writing skill” may be leaning on something they still treat as a signal but that no longer is one. A slightly unsettling thought.
In 2026 we published two studies [1][2] that tackle this problem head-on. One proposes a framework for measuring “argument rarity.” The other is a large-scale experiment on how much AI assistance homogenizes student thinking.
Today I want to share what we found, for people working in HR, education, and recruiting.
Today’s takeaway in 3 lines
- Value: AI assistance lifting everyone to a basic prose quality is a clear plus for overall productivity.
- Pitfall: but “good writing” is dying as an evaluation axis. Without a different axis, you stop being able to tell who is actually thinking independently.
- Hidden cost: when everyone leans on the same tool, the organization’s thinking itself quietly homogenizes. It does not show up in the numbers, but it will hit you in the long run.
Let me take them in order.
① The value side — AI eliminating “bad prose” is genuinely huge
First, I want to give the value side its due.
Proposals written by new hires, first-draft internal documents, English translations of outbound emails — the time cost paid by “the person who has to read the unreadable” has historically been enormous.
AI assistance raised that floor in one go. That, honestly, is amazing.
Our research also confirmed that essays written with AI assistance reach writing-quality scores comparable to the top tier of human writers. Anyone can get there on the first draft. Many companies are probably recovering hours of staff time from this alone.
So today’s piece is not “stop letting people use AI.”
The problem is that, swept along by the convenience, organizations unconsciously equate “good writing = good thinking.” From here on, it gets a little tricky.
② What the research showed on the flip side
What stuck with me from our work was this.
When we compared 1,375 human-written essays with 1,000 AI-generated essays, the AI essays matched the top tier of human writers on writing-quality scores.
But on claim rarity — “is this making a point that others are not making?” — AI essays sat at roughly one-fifth of the human baseline.
In other words, “the ability to write well” and “the ability to raise points others do not” turned out to be different abilities. Until now, a single metric could stand in for both. AI has pulled them apart.
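To make “claim rarity” a little more concrete, here is a minimal sketch of one way such a score could work. It assumes each claim is reduced to a bag of words and treats a claim as rare when even its most similar peers in the pool are not very similar to it. The function names, the `k` parameter, and the bag-of-words stand-in for a proper semantic embedding are all illustrative assumptions; this is not the AROA scoring procedure from the study.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts: a crude stand-in for a real semantic embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rarity_scores(claims: list[str], k: int = 2) -> list[float]:
    """Hypothetical rule: rarity = 1 - mean similarity to the k most similar peers.

    Illustration only; this is not the AROA scoring procedure.
    """
    vectors = [bag_of_words(c) for c in claims]
    scores = []
    for i, v in enumerate(vectors):
        sims = sorted(
            (cosine(v, u) for j, u in enumerate(vectors) if j != i),
            reverse=True,
        )
        nearest = sims[:k]
        if not nearest:  # a lone claim has no peers, so treat it as fully rare
            scores.append(1.0)
            continue
        scores.append(1.0 - sum(nearest) / len(nearest))
    return scores


# Made-up example claims, not data from the study.
claims = [
    "remote work improves productivity",
    "remote work improves productivity for most teams",
    "remote work improves team productivity",
    "remote work erodes the informal mentoring juniors rely on",
]
for claim, score in zip(claims, rarity_scores(claims, k=2)):
    print(f"{score:.2f}  {claim}")
```

With these made-up claims, the last one, which makes a point the others do not, gets the highest score. Nothing in the calculation looks at how well a claim is written, which mirrors the point above: rarity and prose quality are separate axes.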
The second study compared 6,875 essays across five conditions, from “fully human” to “AI-only generation.”
The result: as AI assistance intensifies, writing quality rises cleanly, but the variance in argumentative-link structure — the patterns connecting claims to evidence — drops by 68 to 78 percent.
The paper calls this the “quality-homogenization trade-off.”
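For readers who want to see what a drop in variance means in practice, here is a toy calculation. The numbers are invented for illustration and are not from the study: each value stands for a hypothetical structural feature of one essay, such as its count of claim-to-evidence links. The sketch simply measures how much the population variance shrinks from one group to the other.

```python
import statistics

# Invented toy numbers, NOT data from the study: each value is a hypothetical
# per-essay count of claim-to-evidence links.
human_links = [1, 3, 0, 4, 2, 5, 1, 4]
ai_links = [2, 3, 2, 3, 3, 2, 3, 2]


def variance_reduction(baseline: list[float], treated: list[float]) -> float:
    """Fraction by which population variance shrinks from baseline to treated."""
    return 1.0 - statistics.pvariance(treated) / statistics.pvariance(baseline)


print("mean (human, AI):", statistics.fmean(human_links), statistics.fmean(ai_links))
print(f"variance reduction: {variance_reduction(human_links, ai_links):.0%}")
```

In this toy data both groups average 2.5 links per essay, yet the variance shrinks by about 91 percent. A dashboard that tracks only averages would see no change at all, which is why variance is the quantity worth watching here.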
Translated into business terms:
AI assistance lets everyone produce “above-average prose.” At the same time, the convergence that could leave “the whole organization thinking alike” has now been quantitatively confirmed, at least in essays.
Short-term KPIs go up. Long-term decision quality drifts down.
③ What does this look like through an emotion-AI lens?
Here is the point Affectosphere Group wants to emphasize.
Our lab’s core stance is to handle emotion “as ambiguous and polysemous as it actually is.” The reason is simple: human feelings and opinions are not the kind of thing that survives being collapsed into averages or majority votes.
The same structure shows up in homogenized thinking.
Picture a hiring round where everyone submits AI-written essays. The prose is uniformly good. But “the person with an unusual sense of unease, who raises a point from a weird angle” becomes invisible if you judge on writing quality alone.
This is structurally identical to what happens when emotion-label datasets capture only the majority emotion and discard minority judgments.
In other words —
If you evaluate only on “accuracy, efficiency, quality,” the minority originality you most wanted to find sinks below the resolution of your evaluator.
The problem is not the AI tool. It is organizations that have not redesigned their evaluator. That is my honest read as someone who studies emotion AI.
So what can you do starting tomorrow?
Here are three practical steps, so this is not just fearmongering.
- Redesign your prompts: rewrite “questions you can pass by writing well” into “questions where you only score if you bring a point, counterexample, or experience others did not.” This alone starts surfacing the difference between AI-outsourcers and original thinkers.
- Deliberately build “no-AI” stretches into training programs: do problem framing and hypothesis formation without AI. Use AI for the polishing phase. This protects the muscle of independent thinking surprisingly well.
- Periodically inventory the diversity of your internal documents: once everyone uses the same AI tool, proposal decks and meeting docs start converging. Once a quarter is enough — make it a habit to review “how wide is the argumentative range in our recent internal documents.”
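A quarterly inventory can start very simply. The sketch below assumes documents are available as plain text and uses the mean pairwise Jaccard distance between their vocabularies as a rough proxy for argumentative range. The metric, the function names, and the sample documents are all hypothetical, and vocabulary overlap is only a crude stand-in for comparing the arguments themselves.

```python
from itertools import combinations


def word_set(text: str) -> set[str]:
    """Distinct lowercased words: a deliberately crude document signature."""
    return set(text.lower().split())


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; 0 means identical vocabularies, 1 means disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_distance(docs: list[str]) -> float:
    """Average Jaccard distance over all document pairs (higher = more varied)."""
    sets = [word_set(d) for d in docs]
    pairs = list(combinations(sets, 2))
    if not pairs:  # fewer than two documents: nothing to compare
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical one-line summaries of proposals from two quarters.
quarters = {
    "Q1": [
        "expand into the regional market with a pilot store",
        "cut logistics costs by renegotiating carrier contracts",
        "launch a subscription tier for repeat customers",
    ],
    "Q2": [
        "leverage AI to drive efficiency and unlock growth",
        "leverage AI to drive efficiency and reduce costs",
        "leverage AI to unlock growth and drive efficiency",
    ],
}
for quarter, docs in quarters.items():
    print(quarter, round(mean_pairwise_distance(docs), 2))
```

Here the second quarter scores much lower because its proposals reuse the same vocabulary. A falling score is a prompt to go read the documents themselves, not a verdict on its own.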
A side note for L&D vendors and HR-tech players: there is another opportunity here. Productizing rarity scoring along the lines of AROA (Argument Rarity-based Originality Assessment, the framework from our first study) could become a new evaluation foundation for the AI-assisted era.
Closing
The evaluator we built to measure “good writing” is, I think, half-retired now thanks to AI.
This is not a sad story. It is more a chance to ask: “what did we actually want to measure?”
Writing quality was always a proxy. What we really wanted was people who can think independently. And now, at the research-lab level, we are getting tools that measure that core thing directly.
Just as emotion AI is moving toward “feelings that do not get collapsed into averages,” the evaluation of thinking should be moving toward “originality that does not get collapsed into averages.” That is what I believe.
So, that is all for today.
If “we thought we were screening on writing skill, but maybe we were screening on AI skill” rang a bell, please open up your evaluation rubric one more time.
References
- Keito Inoshita, Michiaki Omura, Tsukasa Yamanaka, Go Maeda, Kentaro Tsuji (2026). Argument Rarity-based Originality Assessment for AI-Assisted Writing. arXiv preprint.
- Keito Inoshita, Michiaki Omura, Tsukasa Yamanaka, Go Maeda, Kentaro Tsuji (2026). Does AI Homogenize Student Thinking? A Multi-Dimensional Analysis of Structural Convergence in AI-Augmented Essays. arXiv preprint.
* This article was written in part with AI assistance and may contain inaccuracies.
Footnotes
[1] Keito Inoshita, Michiaki Omura, Tsukasa Yamanaka, Go Maeda, Kentaro Tsuji (2026). Argument Rarity-based Originality Assessment for AI-Assisted Writing. arXiv preprint.
[2] Keito Inoshita, Michiaki Omura, Tsukasa Yamanaka, Go Maeda, Kentaro Tsuji (2026). Does AI Homogenize Student Thinking? A Multi-Dimensional Analysis of Structural Convergence in AI-Augmented Essays. arXiv preprint.