AI Ready School

Scientists Gave AI a Test Used on the Human Brain — and Found Its Biggest Weakness

July 10, 2026

Chiranjeevi Maddala

The test that has measured human focus for nearly a century

Researchers gave today's top AI models a classic attention test from psychology and found a major flaw. The models could correctly name colours in short lists, but their performance deteriorated sharply as the lists became longer and more complex. Some leading systems fell from over 90% accuracy to nearly complete failure.

The study, conducted by Suketu Patel and colleagues, explored how transformer-based machine attention differs from human attention by testing AI models on the Stroop task. In this test, colour words are printed in coloured ink, and participants are asked to name the ink colour of each word while deliberately ignoring what the word says. If the word "red" is printed in blue ink, the correct answer is blue. The instinct to read the word rather than name the colour is exactly what the test is designed to catch. (Fortune)
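The task described above can be sketched in a few lines of Python. This is only an illustration of how a Stroop list is built and scored, not the study's actual materials or code: the colour set, the function names `make_stroop_list` and `score`, and the two simulated participants are all assumptions made for the example.

```python
import random

# Hypothetical colour set; the study's actual stimuli are not given in this article.
COLOURS = ["red", "blue", "green", "yellow"]

def make_stroop_list(length, congruent, seed=0):
    """Build a list of (word, ink) pairs.

    congruent=True  -> the word always matches its ink colour.
    congruent=False -> the word always names a different colour.
    """
    rng = random.Random(seed)
    items = []
    for _ in range(length):
        ink = rng.choice(COLOURS)
        if congruent:
            word = ink
        else:
            word = rng.choice([c for c in COLOURS if c != ink])
        items.append((word, ink))
    return items

def score(items, responses):
    """Fraction of responses that correctly name the ink colour."""
    correct = sum(resp == ink for (_, ink), resp in zip(items, responses))
    return correct / len(items)

# A five-word mismatched list, answered by two simulated participants.
trials = make_stroop_list(5, congruent=False)
ink_namer = [ink for _, ink in trials]       # follows the instruction
word_reader = [word for word, _ in trials]   # reads the word instead

print(score(trials, ink_namer))    # 1.0
print(score(trials, word_reader))  # 0.0
```

On a mismatched list, a participant who reads the word instead of naming the ink gets every item wrong, which is why the mismatched condition is the revealing one.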

Psychologists use this task to measure executive control: the set of mental processes that helps people regulate attention, resist distractions, and stay focused on a goal, even when something keeps pulling their focus elsewhere. It is one of the most widely used tools in cognitive psychology, deployed in clinics and research labs to assess everything from childhood attention disorders to age-related cognitive decline. The findings were published by Suketu Chandrakant Patel, Hongbin Wang, and Jin Fan in PNAS Nexus, an open-access journal of the National Academy of Sciences.

What the researchers found

Even when the word and ink colour did not match, the AI models performed well on a list of five words. But that is where the similarity to human performance ends.

Humans generally take slightly longer to answer correctly when words and colours are mismatched than when they match, but they remain stable and highly accurate even on long word lists. Transformer-based large language models, by contrast, show a dramatic decline in accuracy on the Stroop task as the word list grows, particularly when word meaning and ink colour are mismatched. Instead of naming the colour, the models default to reading the word, and they fail to sustain task focus.

Some systems dropped from over 90% accuracy to near-complete failure as the lists grew longer and more intricate. This is not a small or marginal effect. It is a collapse: the kind of cliff-edge failure that suggests something structurally different is happening inside an AI's attention mechanism compared with a human's, not simply a difference in degree. (DeepSeek)

Dr. Elena Martinez, a cognitive scientist at the University of California, Berkeley, who was not involved in the study, noted that the results align with previous research on AI's struggles with sequential reasoning: "These models excel at pattern recognition but lack the flexible attention spans seen in human cognition. This could explain why they sometimes produce logically inconsistent outputs in extended interactions."

Why this happens — and why it matters more than it sounds

The study drew parallels with the known limitations of transformer-based architectures, which dominate modern AI systems. Transformers rely on self-attention mechanisms to process input data, but their effectiveness diminishes as sequences grow longer.
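One commonly cited intuition for why attention can weaken over long inputs is that softmax attention spreads a fixed budget of weight across every token. The toy calculation below illustrates that intuition only. It is not the mechanism the study identifies, and the function name `target_attention_weight` and the logit values are assumptions chosen for the example. Real models use learned scores, many attention heads, and many layers.

```python
import math

def target_attention_weight(n_tokens, target_logit=2.0, distractor_logit=0.0):
    """Softmax weight on one relevant token among n_tokens - 1 distractors.

    Both logit values are illustrative assumptions, not measured quantities.
    """
    target = math.exp(target_logit)
    distractors = (n_tokens - 1) * math.exp(distractor_logit)
    return target / (target + distractors)

# The relevant token keeps the same score, yet its share of attention
# shrinks as more distractor tokens compete for the same total weight.
for n in (5, 50, 500):
    print(n, round(target_attention_weight(n), 3))
```

In this simplified setting the relevant token's share of attention falls steadily as the list grows, even though nothing about that token has changed.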

This finding cuts directly against one of the most common assumptions people make about AI in 2026: that bigger, more capable models with longer context windows simply do everything better as tasks scale up. AI Ready School has covered exactly that trend all year — Gemini's 2-million-token context window, Fable 5's expanded capabilities, models that can hold entire textbooks in memory. Those advances are real. But this study identifies something different and more fundamental: even when a model's memory can technically hold a long task, its capacity to actively resist distraction and stay focused on the right goal throughout that task is a separate capability — and it is one where AI, even today's most advanced systems, behaves nothing like a focused human mind.

A model can have a vast context window and still lose its grip on what it is supposed to be doing the moment the task gets long enough and the distractions get loud enough. Remembering information and staying disciplined about a goal while filtering out noise are not the same skill. Humans, even young children, are remarkably good at the second one. The Stroop study suggests that today's AI, for all its fluency, is not.

What this means for how schools should think about AI

This is precisely the kind of finding that deserves a place in every conversation about AI literacy — not because it is alarming, but because it is genuinely clarifying about what AI is and is not.

AI Ready School's philosophy has never asked whether AI is "smart" in some single, simple sense. The more useful question — for a teacher, a parent, or a student deciding how much to trust an AI's output — is: smart at what, specifically, and under what conditions does that intelligence hold up? This study gives a precise, scientifically grounded answer to part of that question. AI is excellent at naming a colour in a short, simple test. Its accuracy can collapse toward failure on a longer, more demanding version of the exact same task — quietly, without necessarily flagging to the user that anything has gone wrong.

This is the executive function gap that Cypher is built to address in students themselves, and it is worth naming explicitly here: the capacity to sustain focus on a goal, resist an easier or more obvious but wrong response, and stay disciplined through a long, demanding task is precisely the kind of cognitive strength that well-designed education should be building in children. The Stroop study is a reminder that AI has not simply inherited this capacity by being trained on vast amounts of human-generated text. It is a distinct cognitive skill, and right now, the evidence suggests AI does not reliably have it once a task grows long and complicated.

For a student using AI to help with a long research project, a multi-step maths problem, or an extended piece of writing, this has a very practical implication. The longer and more complex the task an AI is asked to perform, the more important it becomes for the human directing it to stay actively engaged — checking the output, watching for the moment where focus may have quietly slipped, rather than assuming that a model which got the first few steps right will reliably sustain that accuracy all the way through. This is exactly the habit of mind that AI Ready School's products are designed to cultivate: not passive trust in an AI's output, but active, ongoing evaluation of it, especially as tasks become longer and more demanding.

It is also a genuinely hopeful finding, read the right way. The cognitive skill this study shows AI struggling with — sustained executive control, the ability to stay disciplined on a real goal while the easier, wrong answer keeps calling for attention — is precisely the skill that good teaching, good mentorship, and a well-designed school environment have always been uniquely positioned to build in a child. It is not a skill that comes from passively absorbing information. It comes from practice, from being asked repeatedly to hold focus on something hard, from teachers and mentors who notice when a child's attention has drifted and gently bring it back. That is not a task that can be outsourced to an AI that, according to this study, struggles with sustaining its own attention under exactly the same kind of pressure.

The sentence worth remembering

"Humans can often sustain focus on a specific goal while filtering out competing information. The results suggest that current AI models may struggle with this type of cognitive control when tasks become increasingly demanding." (TNW)

This is not a reason to distrust AI broadly, or to dismiss its genuine and growing capabilities. It is a precise, well-evidenced reminder that intelligence is not one single thing, and that the particular kind of disciplined, sustained focus that defines a genuinely well-educated mind remains, for now, a distinctly human achievement — one that schools are still the best place in the world to build.

